Using Complexity, Coupling, and Cohesion Metrics as Early Indicators of Vulnerabilities
MetadataShow full item record
Software security failures are common and the problem is growing. A vulnerability is a weakness in the software that, when exploited, causes a security failure. It is difficult to detect vulnerabilities until they manifest themselves as security failures in the operational stage of the software, because security concerns are often not addressed or known sufficiently early during the Software Development Life Cycle (SDLC). Complexity, coupling, and cohesion (CCC) related software metrics can be measured during the early phases of software development such as design or coding. Although these metrics have been successfully employed to indicate software faults in general, the relationships between CCC metrics and vulnerabilities have not been extensively investigated yet. If empirical relationships can be discovered between CCC metrics and vulnerabilities, these metrics could aid software developers to take proactive actions against potential vulnerabilities in software. In this thesis, we investigate whether CCC metrics can be utilized as early indicators of software vulnerabilities. We conduct an extensive case study on several releases of Mozilla Firefox to provide empirical evidence on how vulnerabilities are related to complexity, coupling, and cohesion. We mine the vulnerability databases, bug databases, and version archives of Mozilla Firefox to map vulnerabilities to software entities. It is found that some of the CCC metrics are correlated to vulnerabilities at a statistically significant level. Since different metrics are available at different development phases, we further examine the correlations to determine which level (design or code) of CCC metrics are better indicators of vulnerabilities. We also observe that the correlation patterns are stable across multiple releases. These observations imply that the metrics can be dependably used as early indicators of vulnerabilities in software. We then present a framework to automatically predict vulnerabilities based on CCC metrics. To build vulnerability predictors, we consider four alternative data mining and statistical techniques – C4.5 Decision Tree, Random Forests, Logistic Regression, and Naïve-Bayes – and compare their prediction performances. We are able to predict majority of the vulnerability-prone files in Mozilla Firefox, with tolerable false positive rates. Moreover, the predictors built from the past releases can reliably predict the likelihood of having vulnerabilities in future releases. The experimental results indicate that structural information from the non-security realm such as complexity, coupling, and cohesion are useful in vulnerability prediction.