An Empirical Analysis of GNU Make in Open Source Projects
Build Systems, Makefiles, Open Source, Software Engineering, Software Metrics, Maintenance Complexity, GNU Make
Build systems, the tools responsible for compiling, testing, and packaging software, play a vital role in the software development process. Make is one of the oldest build technologies and is still widely used today, both through hand-written Makefiles and through Makefiles generated by tools such as Autotools and CMake. Despite Make's conceptual simplicity, modern implementations such as GNU Make have grown into very complex languages, featuring functions, macros, lazy variable assignments, and more. This thesis explores Make-based open source build systems in two parts. First, our feature analysis examines the popularity of individual features and the differences between hand-written Makefiles and those generated by various tools. We find that generated Makefiles use only a core set of features, and that more advanced features (such as function calls) are used very little and almost exclusively in hand-written Makefiles. Second, our complexity analysis introduces indirection complexity, a simple metric for measuring maintenance complexity in Makefiles that uses the same feature data compiled in the first analysis. We show how this new metric can be a better indicator than traditional metrics of which Makefiles will require more cognitive overhead to understand. Both analyses use our framework, built with the TXL source transformation language, to obtain a detailed parse of the Makefiles in our corpus. The corpus consists of almost 20,000 Makefiles, comprising over 8.4 million lines, from 271 different open source projects. Through these analyses, we aim to gain a better understanding of how the Make language is used in the open source community, which includes some of the most advanced users of Make.