Detection and Analysis of \\ Detection and Analysis of Near-Miss Software Clones
MetadataShow full item record
Software clones are considered harmful in software maintenance and evolution. However, despite a decade of active research, there is a marked lack of work in the detection and analysis of near-miss software clones, those where minor to extensive modifications have been made to the copied fragments. In this thesis, we advance the state-of-the-art in clone detection and analysis in several ways. First, we develop a hybrid clone detection method, called NICAD, that can detect both exact and near-miss clones with high precision and recall and with reasonable performance. Second, in order to address the decade of vagueness in clone definition, we propose an editing taxonomy for clone creation that models developers' editing activities in the copy/pasted code in a top-down fashion. NICAD is designed to address the different types of clones in the editing taxonomy. Third, we have conducted a scenario-based qualitative comparison and evaluation of all of the currently available clone detection techniques and tools in the context of a unified conceptual framework. Using the results of this study one can more easily choose the right tools to meet the requirements and constraints of any particular application, and can identify opportunities for hybridizing different techniques. The hybrid architecture of NICAD was derived from this study. Fourth, in order to evaluate and compare the available tools in a realistic setting and to avoid the challenges and huge manual effort in validating candidate clones, we have developed a mutation-based framework that automatically and efficiently measures (and compares) the recall and precision of clone detection tools for different fine-grained clone types of the proposed editing taxonomy. We have evaluated NICAD using this framework and found that it is capable of detecting different types of clones with high precision and recall. Finally, we have conducted a large scale empirical study of cloning in open source systems, both to evaluate NICAD and to study the cloning characteristics of these systems in several different dimensions. The study has demonstrated that NICAD is capable of accurately finding both exact and near-miss function clones even in large systems and different languages, and that there seem to be a large number of clones in those systems.