On the Maintenance of Crowdsourced Knowledge on Stack Overflow

Zhang, Haoxiang
Knowledge Sharing, Knowledge Maintenance, Stack Overflow, Empirical Software Engineering, Mining Software Repositories
The enormous quantity of software artifacts created by developers (e.g., code, documentation, and online discussions) is in great need of maintenance. Among software artifacts built by developers, Stack Overflow, a question-and-answer (Q&A) website for sharing programming knowledge, has tripled the number of its hosted answers from 9.3 million to 27.2 million over the past 6 years (as of 2019). Such a large-scale knowledge base, which includes code snippets along with the knowledge embedded in question-answering activities, inevitably changes over time. To better understand current knowledge maintenance practices and to improve the maintenance of such valuable knowledge on Stack Overflow, this PhD thesis empirically studies the Q&A activities on Stack Overflow over a decade (i.e., from 2008 to 2018). Our goal is to provide developers with lessons about the knowledge maintenance practices on Stack Overflow. Specifically, this thesis mines the question-answering activities on Stack Overflow along three dimensions: 1) the obsolescence of answers, 2) the informativeness of comments that are associated with answers, and 3) the retrieval of information in hidden comments. First, we seek to understand knowledge maintenance practices by studying obsolete answers. For example, obsolete answers can contain invalid links or obsolete Application Programming Interface (API) usages. Second, we investigate the informativeness of comments that are associated with Stack Overflow answers. An informative comment can provide additional explanations, thus updating its associated answer. Third, we examine comments that can be added long after an answer is posted. In particular, we study the comment organization mechanism and analyze whether hidden comments are informative as well.
By empirically studying this crowdsourced knowledge, we wish to highlight the importance of maintaining it and to understand the practices developers follow in doing so.