Please use this identifier to cite or link to this item: http://hdl.handle.net/1974/5693

Title: Enabling Large-Scale Mining Software Repositories (MSR) Studies Using Web-Scale Platforms
Authors: Shang, Weiyi

Files in This Item:

Shang_Weiyi_201005_MSc.pdf (1.16 MB, Adobe PDF)
Keywords: Software Engineering; Mining Software Repositories; MapReduce; Hadoop; Pig
Issue Date: 2010
Series/Report no.: Canadian theses
Abstract: The Mining Software Repositories (MSR) field analyzes software data to uncover knowledge and assist software development. Software projects and products continue to grow in size and complexity. In-depth analysis of these large systems and their evolution is needed to better understand the characteristics of such large-scale systems and projects. However, classical software analysis platforms (e.g., Prolog-like, SQL-like, or specialized programming scripts) face many challenges when performing large-scale MSR studies. Such platforms rarely scale out of the box. Instead, they often require analysis-specific, one-time, ad hoc scaling tricks and designs that are not reusable for other types of analysis and that are costly to maintain. We believe that the web community has already faced many of the scaling challenges now confronting the software engineering community, as it copes with the enormous growth of web data. In this thesis, we report on our experience in using MapReduce and Pig, two web-scale platforms, to perform large MSR studies. Through our case studies, we demonstrate the benefits and challenges of using web-scale platforms to prepare software data for further analysis (i.e., to Extract, Transform, and Load it, ETL). The results of our studies show that: 1) web-scale platforms provide an effective and efficient platform for large-scale MSR studies; 2) many of the web community's guidelines for using web-scale platforms must be modified to achieve optimal performance for large-scale MSR studies. This thesis will help other software engineering researchers who want to scale their studies.
Description: Thesis (Master, Computing) -- Queen's University, 2010-05-28 00:37:19.443
URI: http://hdl.handle.net/1974/5693
Appears in Collections: Queen's Theses & Dissertations; Computing Graduate Theses

Items in QSpace are protected by copyright, with all rights reserved, unless otherwise indicated.