UNDERSTANDING THE IMPACT OF EXPERIMENTAL DESIGN CHOICES ON MACHINE LEARNING CLASSIFIERS IN SOFTWARE ANALYTICS
Abstract
Software analytics is the process of systematically analyzing software engineering-related data to generate actionable insights that help software practitioners make data-driven decisions. Machine learning classifiers lie at the heart of these software analytics pipelines and help automate the generation of insights from large volumes of low-level software engineering data (e.g., static code metrics of software projects). However, the results generated by these classifiers are extremely sensitive to the various experimental design choices (e.g., the choice of feature removal technique) that one makes when constructing a software analytics pipeline. Despite this, prior studies explore the impact of only a few experimental design choices on classifier results, and the impact of many other experimental design choices remains unexplored. It is therefore critical to further understand how these experimental design choices affect the insights generated by a classifier, as such an understanding enables us to ensure the accuracy and validity of those insights.
Therefore, in this PhD thesis, we further our understanding of how several previously unexplored experimental design choices impact the results generated by a classifier. Through several case studies on various software analytics datasets and contexts, 1) we find that the common practice of discretizing the dependent feature can be avoided in some cases (where the defective ratio of the dataset is less than 15%) by using regression-based classifiers; 2) in cases where discretization of the dependent feature cannot be avoided, we propose a framework that researchers and practitioners can use to mitigate its impact on the insights generated by a classifier; and 3) we find that the interchangeable use of feature importance methods should be avoided, as different feature importance methods produce vastly different interpretations even on the same classifier. Based on these findings, we provide several guidelines for future software analytics studies.
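To illustrate the third finding, the following minimal sketch (not the thesis's actual pipeline; the dataset, classifier, and importance methods here are illustrative assumptions) shows how two common feature importance methods can rank features differently even when applied to the same trained classifier:

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Synthetic stand-in for static code metrics with a binary "defective" label.
X, y = make_classification(n_samples=1000, n_features=8, n_informative=4,
                           random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

clf = RandomForestClassifier(random_state=42).fit(X_train, y_train)

# Method 1: the classifier's built-in (impurity-based) importance.
builtin_rank = np.argsort(clf.feature_importances_)[::-1]

# Method 2: permutation importance computed on held-out data.
perm = permutation_importance(clf, X_test, y_test, n_repeats=10,
                              random_state=42)
perm_rank = np.argsort(perm.importances_mean)[::-1]

print("Built-in ranking:   ", builtin_rank)
print("Permutation ranking:", perm_rank)
# When the two rankings disagree, the "most important" metric reported to
# practitioners depends on which importance method was chosen.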
URI for this record
http://hdl.handle.net/1974/28167