    Software Security Flaw Prediction Using Rich Contextualized Language Use Vectors: A Case Study on the Linux Kernel

    Author
    Fouladfard, Ghazal
    Abstract
    One of the major threats to the security of software systems is the occurrence of security vulnerabilities, which can potentially cause a variety of problems including, but not limited to, information loss, privilege escalation, data breach, and system failure. Software vulnerability prediction is therefore a critical part of software engineering. A variety of approaches have been proposed to detect the most likely locations of vulnerabilities in large codebases. Many of the existing methods rely on traditional software metrics such as lines of code, complexity and code churn. In this study, we explored the possibility of using Rich Contextualized Language Use Vectors (RCLUVs) as a feature set for predicting vulnerabilities in the context of the Linux kernel.

    The RCLUV of a source code file contains elements representing the frequency of each programming language feature being used, both individually and in the context of other features. This code profile is generated by parsing the source code of a program and analyzing the resulting parse tree.
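The idea of counting language features both individually and in the context of other features can be illustrated with a minimal sketch. It is not the thesis's implementation: the study profiles C source files from the Linux kernel, whereas this example uses Python's standard `ast` module as a stand-in parser, and it assumes that "context" means the immediately enclosing parse-tree node. The function name `language_use_vector` and the `Parent>Child` key format are hypothetical.

```python
import ast
from collections import Counter


def language_use_vector(source: str) -> Counter:
    """Count each syntax-node type alone and paired with its parent node type.

    Assumption: "context" is approximated by the direct parent in the parse
    tree; the actual RCLUV definition may use a richer notion of context.
    """
    tree = ast.parse(source)
    counts: Counter = Counter()

    def visit(node: ast.AST, parent_name) -> None:
        name = type(node).__name__
        counts[name] += 1  # feature used individually
        if parent_name is not None:
            counts[f"{parent_name}>{name}"] += 1  # feature used in context
        for child in ast.iter_child_nodes(node):
            visit(child, name)

    visit(tree, None)
    return counts


# Hypothetical input: a small function with a loop, a branch, and two returns.
SAMPLE = """
def f(xs):
    for x in xs:
        if x:
            return x
    return None
"""

vector = language_use_vector(SAMPLE)
print(vector["If"], vector["For>If"], vector["Return"])
```

Keying contextual counts by parent type lets the same feature (here `Return`) contribute to different vector elements depending on where it appears, which is the property the paragraph above describes.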

    We mined vulnerabilities reported in the National Vulnerability Database (NVD) and built a dataset containing all known vulnerable files in the 14-year history of the Linux kernel. We built and evaluated RCLUV-based prediction models using different machine learning algorithms under both experimental and realistic scenarios. Analysis of the learning curves of the models demonstrates that RCLUVs are effective for training machine learning models to learn vulnerability patterns. Performance comparison of our models with four different popular vulnerability prediction models shows that our approach outperforms the models trained on includes, function calls, and software metrics in an experimental setup. Moreover, our models can successfully predict more than half of the future and unseen vulnerabilities in a real-life setting when given enough training data.
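The abstract does not say how the realistic scenario is constructed. One common reading, assumed here, is a time-ordered split: train only on files known before a cutoff date, then measure what fraction of later vulnerable files the model flags (recall). The sketch below shows only that bookkeeping; the record layout, file names, dates, and predictions are all hypothetical and are not data from the thesis.

```python
from datetime import date


def time_ordered_split(records, cutoff):
    """Split (path, report_date, is_vulnerable) records around a cutoff date.

    Assumption: everything dated before the cutoff is available for
    training, and everything on or after it counts as the unseen future.
    """
    train = [r for r in records if r[1] < cutoff]
    test = [r for r in records if r[1] >= cutoff]
    return train, test


def recall(predicted_vulnerable, actually_vulnerable):
    """Fraction of truly vulnerable files that were flagged."""
    if not actually_vulnerable:
        return 0.0
    found = predicted_vulnerable & actually_vulnerable
    return len(found) / len(actually_vulnerable)


# Hypothetical records, for illustration only.
records = [
    ("a.c", date(2010, 1, 1), True),
    ("b.c", date(2012, 6, 1), False),
    ("c.c", date(2016, 3, 1), True),
    ("d.c", date(2017, 9, 1), True),
]

train, test = time_ordered_split(records, date(2015, 1, 1))
actual = {path for path, _, vulnerable in test if vulnerable}
predicted = {"c.c", "b.c"}  # stand-in for a trained model's output
print(len(train), len(test), recall(predicted, actual))
```

Under this reading, the abstract's claim of predicting "more than half" of future vulnerabilities would correspond to a recall above 0.5 on the post-cutoff files.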
    URI for this record
    http://hdl.handle.net/1974/27686
    Collections
    • School of Computing Graduate Theses
    • Queen's Graduate Theses and Dissertations

    DSpace software copyright © 2002-2015  DuraSpace