Optimizing Data Locality in Analytic Workloads over Distributed Computing Environments


Authors

Elshater, Yehia

Type

thesis

Language

eng

Keyword

YARN, Hadoop, Data Locality, Simulation, Dynamic Replication, Load Balancing, Optimization

Abstract

With the explosion of data generated every second, there is an emerging need for big data analytics on scalable systems and platforms for exploration, mining, and decision making. To gain better business insights, business users want to integrate different kinds of analytics, which may involve accessing the same data for different purposes. Modern data-intensive systems place computation as close as possible to the data to achieve greater efficiency. This placement of computation close to the data is called data locality. Data locality has a significant impact on the performance of jobs in a large cluster, since higher data locality means less data transfer over the network. In this work, we examine data locality in parallel processing frameworks and propose approaches to optimize it. First, we conduct a literature review of existing systems that maximize data locality while processing big data analytics workflows. Second, we provide the YARN Locality Simulator (YLocSim), a tool that simulates the interactions between YARN components in a real cluster and reports data locality percentages. This tool gives users better insight into the expected performance of the computing cluster. Third, we develop the YARN Dynamic Replication Manager (YDRM), a new component in YARN that interacts with YARN's existing Resource Manager to improve data locality.
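
As an illustration of what a "data locality percentage" can mean, the sketch below classifies each task placement as node-local, rack-local, or off-rack and reports the share of each. It is not taken from the thesis and does not reproduce YLocSim. The function names, the `rack_of` mapping, and the sample cluster are hypothetical and chosen only for this example.

```python
from collections import Counter

LEVELS = ("node-local", "rack-local", "off-rack")


def locality_level(task_node, replica_nodes, rack_of):
    """Classify one task placement relative to the nodes holding its input block."""
    if task_node in replica_nodes:
        return "node-local"  # the task runs on a node that stores a replica
    replica_racks = {rack_of[n] for n in replica_nodes}
    if rack_of[task_node] in replica_racks:
        return "rack-local"  # same rack as a replica, different node
    return "off-rack"  # the input must cross racks over the network


def locality_percentages(assignments, rack_of):
    """Return the percentage of tasks at each locality level.

    assignments: iterable of (task_node, replica_nodes) pairs (hypothetical format).
    rack_of: dict mapping node name to rack name.
    """
    counts = Counter(
        locality_level(node, replicas, rack_of) for node, replicas in assignments
    )
    total = sum(counts.values())
    if total == 0:
        return {level: 0.0 for level in LEVELS}
    return {level: 100.0 * counts[level] / total for level in LEVELS}


# Hypothetical four-node, two-rack cluster.
rack_of = {"n1": "r1", "n2": "r1", "n3": "r2", "n4": "r2"}
assignments = [
    ("n1", {"n1", "n3"}),  # node-local
    ("n2", {"n1"}),        # rack-local
    ("n3", {"n1", "n2"}),  # off-rack
    ("n4", {"n4"}),        # node-local
]
print(locality_percentages(assignments, rack_of))
```

A higher node-local share means fewer input blocks have to be fetched over the network, which is the effect the abstract attributes to higher data locality.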

License

Attribution-NonCommercial-ShareAlike 3.0 United States
Queen's University's Thesis/Dissertation Non-Exclusive License for Deposit to QSpace and Library and Archives Canada
ProQuest PhD and Master's Theses International Dissemination Agreement
Intellectual Property Guidelines at Queen's University
Copying and Preserving Your Thesis
This publication is made available by the authority of the copyright owner solely for the purpose of private study and research and may not be copied or reproduced except as permitted by the copyright laws without written authority from the copyright owner.