dc.contributor.author: Fouladfard, Ghazal [en]
dc.date.accessioned: 2020-03-30T17:41:48Z
dc.date.available: 2020-03-30T17:41:48Z
dc.identifier.uri: http://hdl.handle.net/1974/27686
dc.description.abstract: One of the major threats to the security of software systems is the occurrence of security vulnerabilities, which can potentially cause a variety of problems including, but not limited to, information loss, privilege escalation, data breach, and system failure. Software vulnerability prediction is therefore a critical part of software engineering. A variety of approaches have been proposed to detect the most likely locations of vulnerabilities in large codebases. Many of the existing methods rely on traditional software metrics such as lines of code, complexity and code churn. In this study, we explored the possibility of using Rich Contextualized Language Use Vectors (RCLUVs) as a feature set for predicting vulnerabilities in the context of the Linux kernel. The RCLUV of a source code file contains elements representing the frequency of each programming language feature being used, both individually and in the context of other features. This code profile is generated by parsing the source code of a program and analyzing the resulting parse tree. We mined vulnerabilities reported in the National Vulnerability Database (NVD) and built a dataset containing all known vulnerable files in the 14-year history of the Linux kernel. We built and evaluated RCLUV-based prediction models using different machine learning algorithms under both experimental and realistic scenarios. Analysis of the learning curves of the models demonstrates that RCLUVs are effective for training machine learning models to learn vulnerability patterns. Performance comparison of our models with four different popular vulnerability prediction models shows that our approach outperforms the models trained on includes, function calls, and software metrics in an experimental setup. Moreover, our models can successfully predict more than half of the future and unseen vulnerabilities in a real-life setting when given enough training data. [en]
dc.language.iso: eng [en]
dc.relation.ispartofseries: Canadian theses [en]
dc.rights: Queen's University's Thesis/Dissertation Non-Exclusive License for Deposit to QSpace and Library and Archives Canada [en]
dc.rights: ProQuest PhD and Master's Theses International Dissemination Agreement [en]
dc.rights: Intellectual Property Guidelines at Queen's University [en]
dc.rights: Copying and Preserving Your Thesis [en]
dc.rights: This publication is made available by the authority of the copyright owner solely for the purpose of private study and research and may not be copied or reproduced except as permitted by the copyright laws without written authority from the copyright owner. [en]
dc.subject: Vulnerability Prediction [en]
dc.subject: RCLUV [en]
dc.title: Software Security Flaw Prediction Using Rich Contextualized Language Use Vectors: A Case Study on the Linux Kernel [en]
dc.type: thesis [en]
dc.description.degree: M.Sc. [en]
dc.contributor.supervisor: Cordy, James R.
dc.contributor.department: Computing [en]
dc.degree.grantor: Queen's University at Kingston [en]

