    Studying Software Quality Using Topic Models

    View/Open
    Chen_Tse-Hsun_201301_Msc.pdf (1.403Mb)
    Date
    2013-01-14
    Author
    Chen, Tse-Hsun
    Abstract
    Software is an integral part of our everyday lives, and hence the quality of software is very important. However, improving and maintaining high software quality is a difficult task, and a significant amount of resources is spent on fixing software defects. Previous studies have examined software quality using various measurable aspects of software, such as code size and code change history. Nevertheless, these metrics do not consider all possible factors that are related to defects. For instance, while lines of code may be a good general measure for defects, a large file responsible for simple I/O tasks is likely to have fewer defects than a small file responsible for complicated compiler implementation details. In this thesis, we address this issue by considering the conceptual concerns (or features). We use a statistical topic modelling approach to approximate the conceptual concerns as topics. We then use topics to study software quality along two dimensions: code quality and code testedness. We perform our studies using three versions of four large real-world software systems: Mylyn, Eclipse, Firefox, and NetBeans.
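
    As a rough illustration of the approach described above, the sketch below maps a handful of source files to topics with LDA. The use of gensim, the toy word lists, and the parameter choices are assumptions made for illustration; they are not the toolkit or pipeline used in the thesis.

    # Sketch: approximate each file's conceptual concerns as LDA topics.
    # gensim and all data below are illustrative assumptions.
    from gensim import corpora, models

    # Hypothetical input: each source file reduced to a bag of identifier/comment words.
    file_words = {
        "Parser.java": ["parse", "token", "grammar", "syntax", "tree"],
        "Logger.java": ["log", "write", "file", "flush", "append"],
        "Socket.java": ["connect", "socket", "read", "write", "timeout"],
    }

    dictionary = corpora.Dictionary(file_words.values())
    corpus = [dictionary.doc2bow(words) for words in file_words.values()]

    # Fit an LDA model; the number of topics is a tunable assumption.
    lda = models.LdaModel(corpus, id2word=dictionary, num_topics=2, passes=10, random_state=1)

    # Topic membership per file: the vector from which topic-based metrics are built.
    for name, bow in zip(file_words, corpus):
        print(name, lda.get_document_topics(bow, minimum_probability=0.0))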

    Our proposed topic metrics help improve the defect explanatory power (i.e., the goodness of fit of the regression model) of traditional static and historical metrics by 4–314%. We compare one of our metrics, which measures the cohesion of files, with other topic-based cohesion and coupling metrics in the literature, and find that our metric gives the greatest improvement over traditional software quality metrics (i.e., lines of code) in explaining defects, by 8–55%.
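
    The file-cohesion idea can be sketched as follows; the concrete definition here (one minus the normalized entropy of a file's topic-membership vector) is an assumption chosen for illustration and may differ from the metric defined in the thesis.

    # Sketch: an illustrative topic-based cohesion measure for a file.
    # The exact metric in the thesis may be defined differently.
    import math

    def topic_cohesion(memberships):
        """Return 1 - normalized entropy of a file's topic-membership vector.

        A file dominated by one topic (a focused concern) scores near 1;
        a file spread evenly over many topics scores near 0.
        """
        probs = [p for p in memberships if p > 0.0]
        if len(probs) <= 1:
            return 1.0
        entropy = -sum(p * math.log(p) for p in probs)
        return 1.0 - entropy / math.log(len(memberships))

    print(topic_cohesion([0.90, 0.05, 0.05]))  # focused file -> high cohesion
    print(topic_cohesion([0.34, 0.33, 0.33]))  # scattered file -> low cohesion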

    We then study how we can use topics to help improve the testing process. By training on previous releases of the subject systems, we can predict not-well-tested topics that are defect prone in future releases with a precision and recall of 0.77 and 0.75, respectively. We can map these topics back to files to help allocate code inspection and testing resources. We show that our approach outperforms traditional prediction-based resource allocation approaches in terms of saving testing and code inspection effort.
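
    A minimal sketch of the train-on-one-release, predict-on-the-next setup is given below; the classifier (logistic regression via scikit-learn), the per-topic features, and the toy data are all assumptions for illustration, not the models or features used in the thesis.

    # Sketch: train on a previous release and predict defect-prone,
    # not-well-tested topics in the next one, then score the predictions.
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import precision_score, recall_score

    # Hypothetical per-topic features from release N: [defect density, test coverage]
    X_prev = [[0.8, 0.1], [0.2, 0.9], [0.7, 0.2], [0.1, 0.8], [0.6, 0.3], [0.3, 0.7]]
    y_prev = [1, 0, 1, 0, 1, 0]   # 1 = defect prone and not well tested

    # Same features recomputed for release N+1, with its eventual outcomes.
    X_next = [[0.75, 0.15], [0.25, 0.85], [0.65, 0.25], [0.15, 0.75]]
    y_next = [1, 0, 1, 0]

    clf = LogisticRegression().fit(X_prev, y_prev)
    pred = clf.predict(X_next)

    print("precision:", precision_score(y_next, pred))
    print("recall:   ", recall_score(y_next, pred))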

    The results of our studies show that topics can be used to study software quality and support traditional quality assurance approaches.
    URI for this record
    http://hdl.handle.net/1974/7743
    Collections
    • Queen's Graduate Theses and Dissertations
    • School of Computing Graduate Theses