Towards Sustainable AI for Continuous Integration Quality Gates

Authors

Olewicki, Doriane

Date

2025-01-30

Type

thesis

Language

eng

Keyword

Continuous integration, Machine Learning, Lifelong Learning, Build system, Code review process

Abstract

Continuous Integration (CI) is the practice of continuously incorporating new code changes into a project's main codebase. To ensure the quality of the integrated changes, most software development companies rely on two types of quality gates: automated gates, which leverage the build process to invoke compilation commands and run tests, and manual gates, based on peer reviews. As projects scale up in size, the cost of maintaining CI quality gates can be considerable in terms of computational and manual effort. To assist or improve CI quality gates, the Software Engineering (SE) community has proposed a wide range of software analytics models. However, most prior work focuses on the initial deployment of such models, without considering the factors involved in their long-term sustainability: automated gates require smart retraining processes to maintain performance, while manual gates require synergy with reviewers' workflows. In this thesis, we study sustainable AI for CI quality gates. For automated gates, we explore how to improve the learning process of software analytics models using (1) dynamic training scheduling strategies, where we evaluate different ways to dynamically adapt the retraining time of Retraining-From-Scratch (RFS) setups, and (2) Lifelong Learning (LL) training setups, where we shift the fundamental learning algorithms toward an incremental setup. For manual CI quality gates, we perform user studies to evaluate the impact on reviewers' productivity of (3) hotspot-based file ordering, where problematic changes are shown to reviewers first, and (4) reviewers' interaction with generated review comments. We were able to optimize the trade-off between model performance and the computational effort of model retraining: our best heuristics recommend retraining only once every 5-6 weeks without losing performance relative to weekly retrained models, while LL reduces computational effort even further, by 2-40x compared to RFS. Through large-scale industrial user studies, we also observe that reviewers benefit from ML-based assistance: file reordering led to more (+23%) and better (precision +13%, recall +8%) review comments compared to the default alphanumeric file ordering, while generative AI-based review comment generation obtained promising results in terms of acceptance (8.1% and 7.2%) and appreciation (23% and 28.3%).
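
To make the contrast between the two automated-gate setups concrete, the sketch below is a minimal, purely illustrative example rather than code from the thesis: it simulates weekly batches of hypothetical build data and compares a Retraining-From-Scratch step, which refits on the full accumulated history, with a lifelong-learning-style incremental update using scikit-learn's SGDClassifier.partial_fit as a stand-in for the actual models studied. All data, features, and model choices are assumptions made for illustration.

# Minimal sketch only -- not code from the thesis. It contrasts a Retraining-From-Scratch
# (RFS) step with a lifelong-learning-style incremental update; all data, features,
# and model choices below are hypothetical placeholders.
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(0)
CLASSES = np.array([0, 1])  # e.g. build outcome: fail / pass


def weekly_batches(n_weeks=8, n=200, d=10):
    """Yield one (X, y) batch of synthetic 'change feature' data per week."""
    for _ in range(n_weeks):
        X = rng.normal(size=(n, d))
        y = (X[:, 0] + 0.1 * rng.normal(size=n) > 0).astype(int)
        yield X, y


history_X, history_y = [], []
rfs_model = SGDClassifier(random_state=0)  # refit from scratch every period
ll_model = SGDClassifier(random_state=0)   # updated incrementally, never refit

for week, (X, y) in enumerate(weekly_batches()):
    history_X.append(X)
    history_y.append(y)

    # RFS: retrain on the full accumulated history (cost grows with the history).
    rfs_model.fit(np.vstack(history_X), np.concatenate(history_y))

    # LL-style: a single incremental pass over the newest batch only.
    ll_model.partial_fit(X, y, classes=CLASSES)

    print(f"week {week}: RFS saw {sum(len(b) for b in history_y)} samples, "
          f"LL update saw {len(y)} new samples")

The sketch only illustrates the cost asymmetry the abstract quantifies: the RFS refit touches the entire accumulated history each period, while the incremental update processes only the newest batch.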
