AgenticCI: An Empirical Evaluation of Autonomous Test Selection and Self-Healing for Mobile Applications

Satyanarayana Gudimetla, Chandrakanth Challa

Abstract

Mobile CI/CD pipelines face persistent challenges: device fragmentation, platform diversity, and testing overhead that scales poorly. This paper presents AgenticCI, a framework combining three components: (1) a Deep Q-Network risk predictor with a 24-feature state representation that achieves 89.3% accuracy, (2) a self-healing test engine using ResNet-50, BERT semantic analysis, and spatial reasoning that achieves 82.6% adaptation success, and (3) a hybrid test selection algorithm incorporating code change impact, historical failures, and complexity metrics. AgenticCI was deployed across five production applications over 180 days (2,847 builds, ~1.2M test executions), then expanded to thirteen applications over 360 days (5,823 builds, ~2.9M executions). The initial deployment cut average execution time by 68% (from 127.3 to 40.7 minutes), detected 91.2% of defects using 31.4% of test resources, reduced maintenance overhead by 57%, and lowered infrastructure costs by ~34.3%. The extended deployment showed a 71.8% time reduction and 89.6% defect detection, though results varied considerably by domain; IoT applications in particular performed notably worse than expected. Ablation studies confirmed component interdependencies: removing risk-based prioritization dropped detection by 5.1% (p=0.003), while disabling self-healing increased maintenance by 6.8% (p=0.024). Compared to Ekstazi, RETECS, ROCKET, and DeepTest, AgenticCI improved on most, but not all, metrics.
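
The hybrid test selection idea can be illustrated with a minimal sketch. It assumes each test carries three normalized signals (code change impact, historical failure rate, complexity) that are combined with a weighted sum and then selected greedily under a time budget. The class name, field names, weights, and the greedy budget rule are all hypothetical and are not taken from the paper; the actual AgenticCI algorithm may differ substantially.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class CandidateTest:
    """Hypothetical per-test record; all signals are assumed to lie in [0, 1]."""
    name: str
    change_impact: float   # overlap between the test's coverage and the changed code
    failure_rate: float    # fraction of recent runs in which the test failed
    complexity: float      # normalized complexity of the code the test exercises
    duration_min: float    # expected run time in minutes


# Illustrative weights only; the paper's abstract does not state any weighting.
W_IMPACT, W_FAILURES, W_COMPLEXITY = 0.5, 0.3, 0.2


def hybrid_score(test: CandidateTest) -> float:
    """Weighted sum of the three signals named in the abstract."""
    return (W_IMPACT * test.change_impact
            + W_FAILURES * test.failure_rate
            + W_COMPLEXITY * test.complexity)


def select_tests(tests: List[CandidateTest], budget_min: float) -> List[CandidateTest]:
    """Greedily pick the highest-scoring tests that still fit in the time budget."""
    selected: List[CandidateTest] = []
    used = 0.0
    for test in sorted(tests, key=hybrid_score, reverse=True):
        if used + test.duration_min <= budget_min:
            selected.append(test)
            used += test.duration_min
    return selected


if __name__ == "__main__":
    suite = [
        CandidateTest("login", 0.9, 0.4, 0.5, 10.0),
        CandidateTest("settings", 0.1, 0.1, 0.2, 5.0),
        CandidateTest("checkout", 0.6, 0.8, 0.7, 20.0),
    ]
    for chosen in select_tests(suite, budget_min=25.0):
        print(chosen.name, round(hybrid_score(chosen), 2))
```

A weighted sum is only one simple way to combine such signals; a learned risk predictor like the Deep Q-Network described above could replace the fixed weights.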
