Towards Generalizing Defect Prediction Models
Authors
Zhang, Feng
Date
2016-01-28
Type
thesis
Language
eng
Keyword
Generalization, Mining software repositories, Software engineering, Defect prediction
Abstract
Software quality is vital to the success of a software project. Fixing defects is a
major activity for continuously improving software quality. Because development teams usually have limited resources and tight schedules, it is important to
prioritize testing activities and optimize development resources. Predicting defective entities (e.g., files or classes) ahead of time helps achieve this goal. Defect prediction has attracted
considerable attention from both academia and industry in the last decade.
A typical defect prediction model is built upon software metrics and labelled defect
data that are collected from the historical data of a software project. A defect prediction
model can be applied within the same project (within-project defect prediction) or on other
projects (cross-project defect prediction). However, due to the diversity in development
processes, a defect prediction model is often not transferable and must be rebuilt
when the target project changes. Because building and maintaining a defect prediction
model for a particular project consumes additional effort, it is of significant interest to generalize
defect prediction models. A generalized defect prediction model removes the need to
rebuild a model for each target project. Moreover, it helps reveal a general
relationship between software metrics and defect data.
In this thesis, we analyze the feasibility of generalizing defect prediction models. First,
we analyze how the distribution of the values of software metrics varies across projects with
different context factors (e.g., programming language and system size). We observe that
such distributions do vary across projects, but can also be similar across projects with different context factors. Second, we investigate the impact that the pre-processing steps (in
particular, transformation and aggregation of software metrics) have on the performance of
defect prediction models. We find that the pre-processing steps impact the performance of
defect prediction models, and therefore need to be considered towards generalizing defect
prediction models. Finally, we propose two approaches for generalizing defect prediction
models: one supervised (requiring training data) and one unsupervised (requiring no training
data). Our results show that both approaches can feasibly generalize defect prediction models.
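The distinction between within-project and cross-project prediction, and the role of a metric transformation, can be illustrated with a minimal sketch. It is not the thesis's method: the single-metric threshold classifier, the log transformation, the function names, and the data for the two hypothetical projects are all assumptions chosen for illustration.

```python
import math

def log_transform(values):
    """Apply log(1 + x) to reduce the skew typical of size metrics."""
    return [math.log1p(v) for v in values]

def fit_threshold(xs, labels):
    """Pick the cut-off on a single metric that maximizes training accuracy."""
    best_cut, best_acc = None, -1.0
    for cut in sorted(set(xs)):
        preds = [x >= cut for x in xs]
        acc = sum(p == y for p, y in zip(preds, labels)) / len(labels)
        if acc > best_acc:
            best_cut, best_acc = cut, acc
    return best_cut

def predict(xs, cut):
    """Flag an entity as defective when its metric reaches the cut-off."""
    return [x >= cut for x in xs]

def accuracy(preds, labels):
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)

# Hypothetical data: lines of code per file and whether a defect was recorded.
project_a_loc = [20, 35, 50, 400, 650, 900]
project_a_defective = [False, False, False, True, True, True]
project_b_loc = [15, 60, 80, 300, 700]
project_b_defective = [False, False, True, True, True]

# Train on project A only.
cut = fit_threshold(log_transform(project_a_loc), project_a_defective)

# Within-project: apply the model to the project it was trained on.
within = accuracy(predict(log_transform(project_a_loc), cut), project_a_defective)

# Cross-project: apply the same model, unchanged, to project B.
cross = accuracy(predict(log_transform(project_b_loc), cut), project_b_defective)

print(f"within-project accuracy: {within:.2f}")
print(f"cross-project accuracy:  {cross:.2f}")
```

On this toy data the cut-off learned from project A separates A's files perfectly but misclassifies some of project B's files, because B's defective files start at a smaller size. A realistic study would evaluate on held-out data rather than on the training set.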
Description
Thesis (Ph.D, Computing) -- Queen's University, 2016-01-27 14:42:48.167
License
Queen's University's Thesis/Dissertation Non-Exclusive License for Deposit to QSpace and Library and Archives Canada
ProQuest PhD and Master's Theses International Dissemination Agreement
Intellectual Property Guidelines at Queen's University
Copying and Preserving Your Thesis
Creative Commons - Attribution-Non-commercial - CC BY-NC
This publication is made available by the authority of the copyright owner solely for the purpose of private study and research and may not be copied or reproduced except as permitted by the copyright laws without written authority from the copyright owner.
