Empirical Studies on Managing Code Clone Evolution Using Machine Learning Techniques

Ehsan, Osama
software engineering, machine learning, code clones
Duplicating a code snippet, with or without minor modifications, creates a code clone. The number of code clones is expected to grow as software expands in size. Maintaining code clones can be challenging and costly as software evolves. For example, a change made to one copy of a code snippet may need to be applied to the other copies. Prior studies show that code clones can account for up to 21% of a software project and that the inconsistent evolution of code clones can introduce bugs later. Poorly maintained software projects are often closed or abandoned. Software developers have limited resources to track code clones and fix the harmful ones. Prior studies focus on detecting code clones and analyzing clone patterns. However, developers receive limited support for focusing their effort on the more harmful clones. In this thesis, we leverage the information available from open-source software projects (e.g., issue reports, commits, and pull requests). We apply AI and machine learning approaches to help software developers monitor the evolution of clones and maintain them better. In particular, we conduct four studies: (1) a study that ranks the code clones in software projects by bug-proneness, with the goal of reducing future bugs; (2) a study that estimates the effort needed to propagate code clone changes to clone siblings, together with a user survey that assesses the usefulness of our approach; (3) an empirical study that predicts late propagation and the associated bugs in software projects; and (4) a study that predicts the degree of inconsistency in a clone group and how long bugs survive in a software project, and that suggests, at the pull request level, whether inconsistent changes in a clone group should be propagated.
Overall, the research in this thesis provides intelligent approaches that help developers better manage the evolution of code clones.