An Empirical Study of the Impact of Experimental Settings on Defect Classification Models
Software quality plays a vital role in the success of a software project. The probability of having defective modules in large software systems remains high. A disproportionate amount of the cost of developing software is spent on maintenance, and maintaining large and complex software systems is a major challenge for the software industry. Fixing defects is a central software maintenance activity that continuously improves software quality. Software Quality Assurance (SQA) teams are dedicated to the task of detecting defects during the software development process (e.g., through software testing and code review). Since testing or reviewing an entire software system is time- and resource-intensive, knowing which software modules are likely to be defect-prone before a system is deployed helps in allocating SQA effort effectively. Defect classification models help SQA teams identify defect-prone modules in a software system before it is released to users. Defect prediction models can be divided into two categories: (1) classification models that classify a software module as defective or not defective; and (2) regression models that count the number of defects in a software module. Our work focuses on defect classification models. Such models are trained using software metrics (e.g., size and complexity metrics) to predict whether software modules will be defective in the future. However, defect classification models may yield different results when the experimental settings (e.g., choice of classification technique, features, dataset preprocessing) are changed. In this thesis, we investigate the impact of different experimental settings on the performance of defect classification models.
More specifically, we study the impact of three experimental settings (i.e., choice of classification technique, dataset processing using feature selection techniques, and applying meta-learners to the classification techniques) on the performance of defect classification models through an analysis of software systems from both proprietary and open-source domains. Our results show that: (1) the choice of classification technique has an impact on the performance of defect classification models, so we recommend that software engineering researchers experiment with the various available techniques instead of relying on specific techniques under the assumption that other techniques are not likely to lead to statistically significant improvements in their reported results; (2) applying feature selection techniques has a significant impact on the performance of defect classification models, and a correlation-based filter-subset feature selection technique with a BestFirst search method outperforms other feature selection techniques across the studied datasets and across the studied classification techniques, so we recommend applying such a feature selection technique when training defect classification models; and (3) meta-learners help improve the performance of defect classification models, but we recommend that future studies employ concrete meta-learners (e.g., Random Forest), which train classifiers that perform statistically similarly to classifiers trained using abstract meta-learners (e.g., Bagging and Boosting) yet produce a less complex model.
URI for this record: http://hdl.handle.net/1974/15861