
dc.contributor.author: Shang, Weiyi [en]
dc.date: 2010-05-28 00:37:19.443
dc.date.accessioned: 2010-05-31T14:33:57Z
dc.date.available: 2010-05-31T14:33:57Z
dc.date.issued: 2010-05-31T14:33:57Z
dc.identifier.uri: http://hdl.handle.net/1974/5693
dc.description: Thesis (Master, Computing) -- Queen's University, 2010-05-28 00:37:19.443 [en]
dc.description.abstract: The Mining Software Repositories (MSR) field analyzes software data to uncover knowledge and assist software development. Software projects and products continue to grow in size and complexity. In-depth analysis of these large systems and their evolution is needed to better understand the characteristics of such large-scale systems and projects. However, classical software analysis platforms (e.g., Prolog-like, SQL-like, or specialized programming scripts) face many challenges when performing large-scale MSR studies. Such software platforms rarely scale easily out of the box. Instead, they often require analysis-specific, one-time, ad hoc scaling tricks and designs that are not reusable for other types of analysis and that are costly to maintain. We believe that the web community has already faced many of the scaling challenges now facing the software engineering community, as it copes with the enormous growth of web data. In this thesis, we report on our experience in using MapReduce and Pig, two web-scale platforms, to perform large MSR studies. Through our case studies, we demonstrate the benefits and challenges of using web platforms to prepare (i.e., Extract, Transform, and Load, ETL) software data for further analysis. The results of our studies show that: 1) web-scale platforms provide an effective and efficient platform for large-scale MSR studies; 2) many of the web community’s guidelines for using web-scale platforms must be modified to achieve optimal performance for large-scale MSR studies. This thesis will help other software engineering researchers who want to scale their studies. [en]
dc.language.iso: eng [en]
dc.relation.ispartofseries: Canadian theses [en]
dc.rights: This publication is made available by the authority of the copyright owner solely for the purpose of private study and research and may not be copied or reproduced except as permitted by the copyright laws without written authority from the copyright owner. [en]
dc.subject: Software Engineering [en]
dc.subject: Mining Software Repositories [en]
dc.subject: MapReduce [en]
dc.subject: Hadoop [en]
dc.subject: Pig [en]
dc.title: Enabling Large-Scale Mining Software Repositories (MSR) Studies Using Web-Scale Platforms [en]
dc.type: thesis [en]
dc.description.degree: M.Sc. [en]
dc.contributor.supervisor: Hassan, Ahmed E. [en]
dc.contributor.department: Computing [en]
dc.degree.grantor: Queen's University at Kingston [en]

