Software Defect Prediction Using Rich Contextualized Language Use Vectors
Authors
Rahman, Ashiqur
Type
thesis
Language
eng
Keyword
Software Code Profile, Rich Contextualized Language Use Vectors, Machine Learning, Deep Learning, Software Bug Prediction, Software Defect Prediction
Abstract
Context. Software defect prediction aims to identify defect-prone source code and thereby reduce the effort, time and cost involved in ensuring the quality of software systems. Both code and non-code metrics are commonly used to train machine learning algorithms to predict software defects. Studies have shown that such metrics-based approaches fail to give satisfactory results and have reached a performance ceiling. This thesis explores the idea of using code profiles as an alternative to traditional metrics to predict software defects. This code profile-based method proves to be more promising than traditional metrics-based approaches.
Aims. This thesis aims to improve software defect prediction by using code profiles as feature variables in place of traditional metrics. Software code profiles encode the density of language feature use, and the context of that use, in Rich Contextualized Language Use Vectors (RCLUVs) by analyzing the parse trees of the source code. This thesis explores whether code profiles can be used to train machine learning algorithms and compares the performance of the derived models with that of traditional metrics-based approaches.
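To make the idea of a "density of language feature use in context" more concrete, here is a minimal sketch. It is not the thesis's RCLUV construction: it uses Python's standard `ast` module purely as a stand-in parser, and the feature definition (each node type counted under its parent node type, then divided by the total count) is an illustrative assumption. The function names `code_profile` and `to_matrix` are hypothetical.

```python
import ast
from collections import Counter


def code_profile(source: str) -> dict:
    """Hypothetical contextualized language-use vector for `source`.

    Keys are (parent node type, child node type) pairs, i.e. a language
    feature together with the syntactic context it appears in. Values are
    densities: each pair's count divided by the total number of pairs.
    """
    tree = ast.parse(source)
    counts = Counter()
    for parent in ast.walk(tree):
        for child in ast.iter_child_nodes(parent):
            # Skip Load/Store/Del markers; they are not language features.
            if isinstance(child, ast.expr_context):
                continue
            counts[(type(parent).__name__, type(child).__name__)] += 1
    total = sum(counts.values())
    return {pair: n / total for pair, n in counts.items()} if total else {}


def to_matrix(profiles):
    """Align profiles on a shared, sorted feature list so they can be fed
    to a learner as fixed-length numeric rows (missing features become 0)."""
    features = sorted({key for profile in profiles for key in profile})
    rows = [[profile.get(f, 0.0) for f in features] for profile in profiles]
    return features, rows
```

For example, `code_profile("x = 1")` assigns a density of 1/3 to each of `("Module", "Assign")`, `("Assign", "Name")` and `("Assign", "Constant")`. Normalizing by the total makes profiles of files of different sizes comparable, which is one plausible reading of "density" here.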
Methods. To achieve these aims, the learning curves of several machine learning algorithms are analyzed, and the performance of the derived models is evaluated against traditional metrics-based approaches. Two benchmark bug datasets, the Eclipse bug dataset and the GitHub bug database, are used to train the models.
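A learning curve records how a model's held-out performance changes as it is trained on more data; a curve that rises above chance indicates that the features carry learnable signal. The sketch below shows only the general procedure. The abstract does not name the algorithms or the metric used, so the nearest-centroid learner, the 75/25 split and the accuracy score are all assumptions, and every function name is hypothetical.

```python
import random


def nearest_centroid_fit(X, y):
    """Hypothetical stand-in learner: one mean feature vector per class."""
    sums, counts = {}, {}
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for i, value in enumerate(row):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def nearest_centroid_predict(model, row):
    """Predict the class whose centroid is closest in squared Euclidean distance."""
    def sq_dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(row, centroid))
    return min(model, key=lambda label: sq_dist(model[label]))


def learning_curve(X, y, sizes, seed=0):
    """Accuracy on a fixed held-out split for models trained on growing subsets.

    Returns a list of (training-set size requested, held-out accuracy) pairs.
    """
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    cut = len(idx) * 3 // 4  # assumed split: 75% training pool, 25% held out
    train, test = idx[:cut], idx[cut:]
    curve = []
    for n in sizes:
        subset = train[:n]  # the first n shuffled training examples
        model = nearest_centroid_fit([X[i] for i in subset], [y[i] for i in subset])
        hits = sum(nearest_centroid_predict(model, X[i]) == y[i] for i in test)
        curve.append((n, hits / len(test)))
    return curve
```

Keeping the held-out set fixed while only the training subset grows means that differences along the curve reflect the amount of training data rather than a change in what is being scored.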
Results. The learning curves of the models show that machine learning algorithms can learn from RCLUV-based code profiles. Performance evaluation against existing metrics-based approaches reveals that the code profile-based approach is more promising than traditional metrics-based approaches. However, the predictive performance of both metrics-based and code profile-based approaches drops in cross-version prediction.
Conclusions. Unlike traditional metrics-based approaches, this thesis uses vectors generated by analyzing language feature use in the parse trees of source code as feature variables to train machine learning algorithms. Experimental results with these learning algorithms encourage us to use software code profiles as an alternative to traditional metrics for predicting software defects.
License
Queen's University's Thesis/Dissertation Non-Exclusive License for Deposit to QSpace and Library and Archives Canada
ProQuest PhD and Master's Theses International Dissemination Agreement
Intellectual Property Guidelines at Queen's University
Copying and Preserving Your Thesis
This publication is made available by the authority of the copyright owner solely for the purpose of private study and research and may not be copied or reproduced except as permitted by the copyright laws without written authority from the copyright owner.