dc.contributor.author: Zhang, Jenne [en]
dc.description.abstract: In the big data era, big data frameworks play a vital role in storing and processing large amounts of data, providing significant improvements in performance and availability. Spark is one of the most popular big data frameworks, providing high scalability and fault tolerance with its in-memory engine. Spark's execution engine has approximately 200 configurable parameters. To hide this complexity from users, each parameter is assigned a default value, which provides initial ease of use. However, the default values are not the best settings for all workloads. In this work, we propose a general tuning algorithm named QST (Queen's Spark Tuning) to help users tune Spark and improve overall performance. First, we study Spark performance for a variety of workloads and identify 9 tunable parameters, among these roughly 200, that have a significant impact on performance. Then, we propose QST, a general greedy iterative tuning algorithm for this set of 9 key parameters. By classifying Spark workloads as memory-intensive, shuffle-intensive or all-intensive, QST configures the parameters for each type of workload. We perform an experimental evaluation of QST using benchmark workloads and industry workloads. In our experiments, QST significantly improves Spark performance, yielding an average speedup of 65% for our benchmark evaluation workloads and 57% for our industry evaluation workloads. [en]
dc.relation.ispartofseries: Canadian theses [en]
dc.rights: CC0 1.0 Universal [en]
dc.rights: Queen's University's Thesis/Dissertation Non-Exclusive License for Deposit to QSpace and Library and Archives Canada [en]
dc.rights: ProQuest PhD and Master's Theses International Dissemination Agreement [en]
dc.rights: Intellectual Property Guidelines at Queen's University [en]
dc.rights: Copying and Preserving Your Thesis [en]
dc.rights: This publication is made available by the authority of the copyright owner solely for the purpose of private study and research and may not be copied or reproduced except as permitted by the copyright laws without written authority from the copyright owner. [en]
dc.subject: Apache Spark [en]
dc.title: Tuning Spark Performance [en]
dc.contributor.supervisor: Martin, Patrick [en]
dc.contributor.department: Computing [en]
Queen's University at Kingston [en]
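
The abstract describes QST only as a greedy iterative tuning algorithm over a small set of key parameters; the record does not give its details. The following is a minimal sketch of a generic greedy, one-parameter-at-a-time search, not the thesis's actual algorithm. The two configuration keys are real Spark settings chosen purely for illustration (the record does not name the 9 parameters), and `toy_runtime` is a hypothetical stand-in for submitting a Spark job and timing it.

```python
from typing import Callable, Dict, List, Tuple

Config = Dict[str, str]


def greedy_tune(
    defaults: Config,
    candidates: Dict[str, List[str]],
    measure: Callable[[Config], float],
) -> Tuple[Config, float]:
    """Tune one parameter at a time, keeping a value only if it lowers runtime."""
    best = dict(defaults)
    best_time = measure(best)
    for key, values in candidates.items():
        for value in values:
            if value == best[key]:
                continue  # already measured as part of the current best config
            trial = dict(best)
            trial[key] = value
            trial_time = measure(trial)
            if trial_time < best_time:
                best, best_time = trial, trial_time
    return best, best_time


def toy_runtime(cfg: Config) -> float:
    # Hypothetical cost model (assumption): a real tuner would run the
    # workload on a cluster and return the measured execution time.
    mem_gb = int(cfg["spark.executor.memory"].rstrip("g"))
    partitions = int(cfg["spark.sql.shuffle.partitions"])
    return 100.0 / mem_gb + abs(partitions - 400) / 10.0


# Illustrative starting point and search space (assumptions, not from the thesis).
defaults = {
    "spark.executor.memory": "1g",
    "spark.sql.shuffle.partitions": "200",
}
candidates = {
    "spark.executor.memory": ["1g", "2g", "4g"],
    "spark.sql.shuffle.partitions": ["200", "400", "800"],
}

best_config, best_time = greedy_tune(defaults, candidates, toy_runtime)
print(best_config, best_time)
```

A greedy search of this kind needs only one measurement per candidate value, rather than one per combination of values, which matters when each measurement is a full job run. The trade-off is that it can miss interactions between parameters.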

Except where otherwise noted, this item's license is described as CC0 1.0 Universal