An Exploration of the challenges associated with software logging in large systems
Date
2016-05-30
Authors
Kabinna, Suhas
Keyword
Logging, Logging library migration
Abstract
Over the past few years, logging has evolved from simple printf statements to
more complex and widely used logging libraries. Today logging information is used
to support various development activities such as fixing bugs, analyzing the results
of load tests, monitoring performance and transferring knowledge. Recent research
has examined how to improve logging practices by informing developers what to log
and where to log. Furthermore, the strong dependence on logging has led to the
development of logging libraries that have reduced the intricacies of logging, which
has resulted in an abundance of log information.
Two recent challenges have emerged as modern software systems start to treat
logging as a core aspect of their software. In particular, 1) infrastructural challenges
have emerged due to the plethora of logging libraries available today and 2) processing
challenges have emerged due to the large number of log processing tools that ingest
logs and produce useful information from them. In this thesis, we explore these two
challenges. We first explore the infrastructural challenges that arise due to the plethora of
logging libraries available today. As systems evolve, their logging infrastructure has
to evolve (commonly this is done by migrating to new logging libraries). We explore
logging library migrations within Apache Software Foundation (ASF) projects. We
find that close to 14% of the projects within the ASF migrate their logging libraries at
least once. For processing challenges, we explore the factors that affect the
likelihood of a logging statement changing in the future in four open source systems,
namely ActiveMQ, Camel, CloudStack and Liferay. Such changes are likely to negatively
affect log processing tools, which must be updated to accommodate them. We find that
20%-45% of the logging statements within the four systems
are changed at least once. We construct random forest classifiers and Cox models
to determine the likelihood of both just-introduced and long-lived logging statements
changing in the future. We find that file ownership, developer experience, log density
and SLOC are important factors in determining the stability of logging statements.