Show simple item record

dc.contributor.authorGhotra, Baljinderen
dc.description.abstractSoftware quality plays a vital role in the success of a software project. The probability of having defective modules in large software systems remains high. A disproportionate amount of the cost of developing software is spent on maintenance. The maintenance of large and complex software systems is a major challenge for the software industry. Fixing defects is a central software maintenance activity that continuously improves software quality. Software Quality Assurance (SQA) teams are dedicated to detecting defects during the software development process (e.g., through software testing and code review). Since testing or reviewing an entire software system is time- and resource-intensive, knowing which software modules are likely to be defect-prone before a system is deployed helps in allocating SQA effort effectively. Defect classification models help SQA teams to identify defect-prone modules in a software system before it is released to users. Defect classification models can be divided into two categories: (1) classification models that classify a software module as defective or not defective; and (2) regression models that count the number of defects in a software module. Our work focuses on training defect classification models. Such models are trained using software metrics (e.g., size and complexity metrics) to predict whether software modules will be defective in the future. However, defect classification models may yield different results when the experimental settings (e.g., choice of classification technique, features, dataset preprocessing) are changed. In this thesis, we investigate the impact of different experimental settings on the performance of defect classification models.
More specifically, we study the impact of three experimental settings (i.e., choice of classification technique, dataset processing using feature selection techniques, and applying meta-learners to the classification techniques) on the performance of defect classification models through analysis of software systems from both proprietary and open-source domains. Our study results show that: (1) the choice of classification technique has an impact on the performance of defect classification models; we therefore recommend that software engineering researchers experiment with the various available techniques instead of relying on specific techniques under the assumption that other techniques are unlikely to lead to statistically significant improvements in their reported results; (2) applying feature selection techniques has a significant impact on the performance of defect classification models: a correlation-based filter-subset feature selection technique with a BestFirst search method outperforms other feature selection techniques across the studied datasets and across the studied classification techniques, so we recommend applying such a feature selection technique when training defect classification models; and (3) meta-learners help improve the performance of defect classification models; however, we recommend that future studies employ concrete meta-learners (e.g., Random Forest), which train classifiers that perform statistically similarly to classifiers trained using abstract meta-learners (e.g., Bagging and Boosting) but produce a less complex model.en
dc.relation.ispartofseriesCanadian thesesen
dc.rightsCC BY 4.0en
dc.rightsQueen's University's Thesis/Dissertation Non-Exclusive License for Deposit to QSpace and Library and Archives Canadaen
dc.rightsProQuest PhD and Master's Theses International Dissemination Agreementen
dc.rightsIntellectual Property Guidelines at Queen's Universityen
dc.rightsCopying and Preserving Your Thesisen
dc.rightsThis publication is made available by the authority of the copyright owner solely for the purpose of private study and research and may not be copied or reproduced except as permitted by the copyright laws without written authority from the copyright owner.en
dc.subjectDefect Classification Modelsen
dc.subjectClassification Techniquesen
dc.subjectFeature Selection Techniquesen
dc.titleAn Empirical Study of the Impact of Experimental Settings on Defect Classification Modelsen
dc.contributor.supervisorHassan, Ahmed E.en
dc.contributor.departmentComputingen's University at Kingstonen

Except where otherwise noted, this item's license is described as CC BY 4.0