
dc.contributor.author: Khalifa, Shady [en]
dc.date.accessioned: 2017-03-22T19:46:14Z
dc.date.available: 2017-03-22T19:46:14Z
dc.identifier.uri: http://hdl.handle.net/1974/15460
dc.description.abstract: Businesses look at Big Data as an opportunity to gain insights for improving their services. Deriving such insights requires a range of data mining techniques. Mature data mining tools like WEKA or R have been in development for years. They implement a large number of data mining algorithms and can support sophisticated Analytics. However, these mature tools are designed to run on a single machine, which makes them unsuitable for handling Big Data. Using these tools also requires data mining and statistics knowledge, and some of them, like R, are hard to learn. Businesses do not always have the technical skills required to carry out such Analytics. Even if they do, it is challenging to find a tool with the needed algorithms that also supports distributed processing to handle the high arrival velocity and large volumes of Big Data. Businesses’ analytical requirements can be addressed by Consumable Big Data Analytics, that is, solutions that allow businesses to do Big Data Analytics themselves using their in-house expertise. In this work, we provide a Consumable Analytics solution to meet businesses’ analytical needs. First, we conduct a survey of existing Analytics solutions to identify possible areas of improvement for providing Consumable Analytics. Second, instead of developing distributed data mining algorithms to handle Big Data, we develop the Data Mining Distribution (DMD) algorithm and the Label-Aware Disjoint Partitioning (LADP) algorithm to distribute the execution of all existing single-machine data mining algorithms without rewriting a single line of their code. This gives users the flexibility to use any available data mining library, lets algorithms like Hoeffding Tree run 70% to 95% faster, and achieves up to an 18% increase in prediction accuracy. Third, we develop the free and open source QDrill solution to implement our DMD and LADP algorithms for distributed Analytics.
QDrill implements our proposed Distributed Analytics Query Language (DAQL) interface, which adds Analytics capabilities to the regular SQL syntax and allows integration with Business Intelligence (BI) tools. This allows businesses to use their in-house expertise to do Big Data Analytics using the spreadsheets and visualizations of their BI tools. [en]
dc.language.iso: eng [en]
dc.relation.ispartofseries: Canadian theses [en]
dc.rights: Attribution-ShareAlike 3.0 United States [en]
dc.rights: Queen's University's Thesis/Dissertation Non-Exclusive License for Deposit to QSpace and Library and Archives Canada [en]
dc.rights: ProQuest PhD and Master's Theses International Dissemination Agreement [en]
dc.rights: Intellectual Property Guidelines at Queen's University [en]
dc.rights: Copying and Preserving Your Thesis [en]
dc.rights: This publication is made available by the authority of the copyright owner solely for the purpose of private study and research and may not be copied or reproduced except as permitted by the copyright laws without written authority from the copyright owner. [en]
dc.rights.uri: http://creativecommons.org/licenses/by-sa/3.0/us/
dc.subject: Big Data [en]
dc.subject: Analytics [en]
dc.subject: Data Mining [en]
dc.subject: Distributed [en]
dc.subject: Drill [en]
dc.subject: Machine Learning [en]
dc.subject: Classifier Ensembles [en]
dc.subject: Consumable Analytics [en]
dc.subject: Query Language [en]
dc.subject: Weka [en]
dc.title: Achieving Consumable Big Data Analytics by Distributing Data Mining Algorithms [en]
dc.type: thesis [en]
dc.description.degree: PhD [en]
dc.contributor.supervisor: Martin, Patrick [en]
dc.contributor.department: Computing [en]
dc.degree.grantor: Queen's University at Kingston [en]



Except where otherwise noted, this item's license is described as Attribution-ShareAlike 3.0 United States