Empirical Studies of Clone Mutation and Clone Migration in Clone Genealogies

Thumbnail Image
Xie, Shuai Jr
Clone Migration , Clone Mutation , Clone Genealogy
Duplications and changes made on code segments by developers form code clones. Cloned code segments are exactly the same or have a particular similarity. A set of cloned code segments that have the same similarity with each other become a clone group. A clone genealogy contains several clone groups in different revisions and time periods. Based on different textual similarities, there are three clone types, i.e., Type-1, Type-2, and Type-3. Clone mutation contains the changes of clone types in the clone evolutions. Clone migration is known as moving cloned code segment to another location in the software system. In this thesis, we build clone genealogies by clone groups in two empirical studies. We conduct two studies on clone migration and clone mutation in clone genealogies. We use three large open source software systems in both studies. In the first study, we investigate if the fault-proneness of clone genealogies is affected by different patterns of clone mutation and different evolution patterns of distances among clones in clone groups. We conclude that clone groups mutated between Type-1 and Type-2 and between Type-1 and Type-3 clones have higher risk for faults. We find that modifying the location of a clone increases its risk for faults. In the second study, we study if the fault-proneness of migrated clones is affected by clone mutation with different changes on clone types. We examine if the length of time interval between clone migration and the last change of the cloned code has an impact on the faultiness of migrated clones. Our results show that the clone migration associated with clone mutation is more fault-prone than the clone migration without clone mutation. We find that a longer time interval between clone migration and the last change makes the migrated clones more fault-prone.
External DOI