Leveraging Historical Code Changes to Support Clone Management Activities

Authors

Sourav, Sumit

Date

2015-10-01

Type

thesis

Language

eng

Keyword

Clone Management, Survival Analysis, Clone Detection

Abstract

Code clones are code snippets that come into existence when developers copy and paste (and possibly modify) an existing piece of code. Studies show that cloning is an inevitable phenomenon, leading to a significant presence of code clones in large software systems (as much as 10%-30% of the source code consists of cloned code). To effectively manage these clones, researchers have proposed multiple activities along two dimensions: 1) proactive clone management, and 2) post-mortem clone management. Proactive clone management emphasizes activities that prevent the introduction of new clones into the source code (e.g., identifying the factors that influence developers to clone code). Post-mortem clone management, on the other hand, focuses on managing the existing clones (e.g., detection or refactoring of a clone). In this thesis, we examine several open issues along both dimensions of clone management activities. For example, over 80% of research focuses on the detection of clones and on studying their impact on code quality, whereas limited research has examined the factors that make code more likely to be cloned. We find that an increase in the complexity of a method increases the likelihood of code being cloned from that method. Moreover, while there exist more than 70 clone detection tools, few techniques are available to evaluate the performance (especially recall) of these tools. We find that the current state-of-the-art framework for the evaluation of clone detection tools tends to overestimate their recall. Hence, we propose a statistically rigorous framework to evaluate the recall of clone detection tools. In addition, little research has examined the life expectancy of an introduced clone. We find that the characteristics of a clone (e.g., size of a clone, or the directory-structure distance between clone siblings) at the time of its introduction highly influence its life expectancy.
Practitioners and researchers can leverage our findings for: a) effective proactive clone management (e.g., by developing tools to propose the abstraction of code that is likely to be cloned) and b) effective post-mortem clone management by leveraging our framework to more accurately evaluate the recall of clone detection tools and by determining whether an introduced clone will be short-lived or long-lived to efficiently recommend clone management activities (e.g., annotation or refactoring of clones, especially the long-lived ones).

Description

Thesis (Master, Computing) -- Queen's University, 2015-09-29 14:43:57.365

License

Queen's University's Thesis/Dissertation Non-Exclusive License for Deposit to QSpace and Library and Archives Canada
ProQuest PhD and Master's Theses International Dissemination Agreement
Intellectual Property Guidelines at Queen's University
Copying and Preserving Your Thesis
Creative Commons - Attribution - CC BY
This publication is made available by the authority of the copyright owner solely for the purpose of private study and research and may not be copied or reproduced except as permitted by the copyright laws without written authority from the copyright owner.
