Studying reopened bugs in open source software systems
bug reports, reopened bugs, data quality, open source, model interpretation, pre- and post-release bugs
Reopened bugs can degrade the overall reputation of a software system, since such bugs erode end users' trust in the quality of the software. Understanding the characteristics of reopened bugs, and the factors that affect how rapidly a reopened bug (especially a post-release reopened bug) is fixed throughout the release lifecycle, could therefore help software developers avoid or minimize such bugs. In this thesis, we study the characteristics of reopened bugs and the factors that lead to a post-release reopened bug being fixed rapidly or slowly. To understand the characteristics of reopened bugs, prior studies built statistical or machine learning models to analyze the factors that impact the likelihood of a bug being reopened. However, we observe several aspects of prior studies that require further investigation: 1) the previously studied datasets are small (consisting of only 3 projects); 2) 1 of the 3 studied projects has a data leak issue; and 3) the previously used experimental steps are outdated. After addressing these aspects, we observe that only 34\% of the studied projects yield an acceptable performance (AUC $\geqslant$ 0.7) for predicting whether a bug will be reopened. Moreover, we observe that post-release reopened bugs take only 189.1 hours of rework time (the time taken to resolve a reopened bug), compared to 388.4 hours for pre-release reopened bugs. To study the likelihood of a post-release reopened bug being fixed rapidly, we build prediction pipelines and observe that the models achieve an acceptable AUC of 0.78 for determining whether a post-release reopened bug will be resolved rapidly or slowly. We label the fastest-resolved 20\% of post-release reopened bugs as rapidly resolved (i.e., in less than 3 minutes) and the slowest-resolved 20\% as slowly resolved (i.e., in more than 4,538 hours).
Based on our findings, we encourage future research to leverage the rich data available at and after the time a bug is reopened to understand the eventual resolution of a reopened bug. We also encourage researchers to consider pre-release and post-release reopened bugs separately in their analyses, as studying reopened bugs as a whole leads to biased implications.
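
The rapid/slow labeling described above (fastest 20\% versus slowest 20\% of post-release reopened bugs by rework time) can be illustrated with a minimal sketch. This is not the thesis's actual pipeline: the function name `label_rework_times`, the `fraction` parameter, the choice to leave the middle 60\% unlabeled, and the sample rework times are all assumptions made for illustration.

```python
from typing import List, Optional


def label_rework_times(hours: List[float], fraction: float = 0.2) -> List[Optional[str]]:
    """Label the fastest `fraction` of bugs 'rapid' and the slowest `fraction` 'slow'.

    Bugs in the middle are left unlabeled (None). Hypothetical helper for
    illustration only; the thesis does not specify its implementation.
    """
    if not 0 < fraction <= 0.5:
        raise ValueError("fraction must be in (0, 0.5] so the two groups cannot overlap")

    n = len(hours)
    k = int(n * fraction)  # number of bugs in each extreme group

    # Indices of the bugs sorted from shortest to longest rework time.
    order = sorted(range(n), key=lambda i: hours[i])

    labels: List[Optional[str]] = [None] * n
    for i in order[:k]:        # fastest k bugs
        labels[i] = "rapid"
    for i in order[n - k:]:    # slowest k bugs
        labels[i] = "slow"
    return labels


# Made-up rework times in hours (not data from the thesis).
sample_hours = [0.01, 5000.0, 12.0, 48.0, 300.0, 0.5, 7000.0, 90.0, 150.0, 20.0]
sample_labels = label_rework_times(sample_hours)
```

With ten sample bugs and a fraction of 0.2, two bugs fall in each extreme group. Only the labeled bugs would then be passed to a binary classifier, which is one plausible reading of how a rapid-versus-slow prediction task could be set up.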