Mining development knowledge to understand and support software logging practices

Thumbnail Image
Li, Heng
Software engineering , Mining software repositories , Software analytics , Log analytics , Data mining , Program analysis , Machine learning , Software logging , Natural language processing
Developers insert logging statements in their source code to trace the runtime behaviors of software systems. Logging statements print runtime log messages, which play a critical role in monitoring system status, diagnosing field failures, and bookkeeping user activities. However, developers typically insert logging statements in an ad hoc manner, which usually results in fragile logging code, i.e., insufficient logging in some code snippets and excessive logging in other code snippets. Insufficient logging can significantly increase the difficulty of diagnosing field failures, while excessive logging can cause performance overhead and hide truly important information. The goal of this thesis is to help developers improve their logging practices and the quality of their logging code. We believe that development knowledge (i.e., source code, code change history, and issue reports) contains valuable information that explains developers' rationale of logging, which can help us understand existing logging practices and provide helpful tooling support for such logging practices. Therefore, this thesis proposes to mine different aspects of development knowledge to understand and support software logging practices. We mine issue reports to understand developers' logging concerns, i.e., the benefits and costs of logging from developers' perspective. Our findings shed lights on future research opportunities for helping developers leverage the benefits of logging while minimizing logging costs. We mine source code to learn how developers distribute logging statements in their source code, and propose an approach to provide automated suggestions about where to log. We find that the semantic topics of a code snippet provide another dimension to explain the likelihood of logging a code snippet. We mine code change history to understand how developers develop and maintain their logging code, and propose an automated approach that can provide developers with log change suggestions when they change their code. We also mine code change history to understand how developers choose log levels for their logging statements, and propose an automated approach that can help developers determine the appropriate log level when they add a new logging statement. This thesis highlights the need for standard logging guidelines and automated tooling support for logging.
External DOI