Log Engineering: Towards Systematic Log Mining to Support the Development of Ultra-large Scale Systems
Much of the research in software engineering focuses on understanding the dynamic nature of software systems. Such research typically uses automated instrumentation or profiling techniques on the code. In this thesis, we examine logs as another source of dynamic information. Such information is generated from statements inserted into the code during development to draw the attention of system operators and developers to important run-time events. Such statements reflect the rich experience of system experts. The rich content of logs has led to a new market for log management applications that assist in storing, querying and analyzing logs. Moreover, recent research has demonstrated the importance of logs in understanding and improving software systems. However, developers often treat logs simply as textual data. We believe that logs have much more potential in assisting developers. Therefore, in this thesis, we propose Log Engineering to systematically leverage logs in order to support the development of ultra-large scale systems. To motivate this thesis, we first conduct a literature review on the state of the art in software log mining. We find that logging statements and logs from the development environment are rarely leveraged by prior research. Further, current practices of software log mining tend to be ad hoc and do not scale well. To better understand the current practice of leveraging logs, we study the challenge of understanding logs and the evolution of logs. We find that knowledge derived from development repositories, such as issue reports, can assist in understanding logs. We also find that logs co-evolve with the code, and that changes to logs are often made without considering the needs of the Log Processing Apps that surround the software system. These findings highlight the need for better documentation and tracking approaches for logs. We then propose log mining approaches to assist the development of systems.
We first find that logging characteristics provide strong indicators of defect-prone source code files. Hence, code quality improvement efforts should focus on code that contains many logging statements or whose logging statements change frequently. Finally, we present a log mining approach to assist in verifying the deployment of Big Data Analytics applications.