AI Governance via Explainable Reinforcement Learning (XRL) for Adaptive Cyber Deception in Zero-Trust Networks
Abstract
This study presents the design and evaluation of an Explainable Reinforcement Learning (XRL) system guided by AI governance principles for adaptive cyber deception within a Zero-Trust Architecture (ZTA). The proposed approach integrates a Monte Carlo Tree Search (MCTS)-based reinforcement learning agent with SHAP (SHapley Additive exPlanations) to deliver transparent and effective deception strategies against advanced persistent threats (APTs). The simulated environment consists of a containerized network with user workstations, honeypots, and a file server, governed by strict Zero-Trust policies such as least privilege and continuous verification.
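To make the planning component concrete, the sketch below shows how an MCTS planner could select among deception actions against a simplified attacker model. The environment, action set, transition dynamics, and rewards are illustrative assumptions for exposition only, not the paper's implementation.

```python
import math
import random
from dataclasses import dataclass, field

# Hypothetical deception actions available to the defender agent.
ACTIONS = ["deploy_honeypot", "raise_alert", "rotate_credentials", "no_op"]

@dataclass
class State:
    """Toy network state: attacker progress and decoys currently in place."""
    attacker_depth: int = 0   # how far the attacker has advanced toward the file server
    honeypots: int = 0        # number of decoys deployed
    steps: int = 0

    def is_terminal(self):
        return self.attacker_depth >= 3 or self.steps >= 10

def step(state, action):
    """Illustrative transition/reward model (assumed, not from the study)."""
    s = State(state.attacker_depth, state.honeypots, state.steps + 1)
    reward = 0.0
    if action == "deploy_honeypot":
        s.honeypots += 1
        reward += 0.2                 # decoys are rewarded for absorbing attacker effort
    elif action == "raise_alert":
        reward += 0.1
    if random.random() > 0.3 * s.honeypots:   # attacker advances unless distracted by decoys
        s.attacker_depth += 1
        reward -= 0.5
    return s, reward

@dataclass
class Node:
    state: State
    parent: "Node" = None
    action: str = None
    children: list = field(default_factory=list)
    visits: int = 0
    value: float = 0.0

def ucb1(child, parent_visits, c=1.4):
    """Upper confidence bound used to balance exploration and exploitation."""
    if child.visits == 0:
        return float("inf")
    return child.value / child.visits + c * math.sqrt(math.log(parent_visits) / child.visits)

def mcts(root_state, iterations=500):
    root = Node(root_state)
    for _ in range(iterations):
        node = root
        # Selection: descend by UCB1 while nodes are fully expanded.
        while node.children and len(node.children) == len(ACTIONS):
            node = max(node.children, key=lambda ch: ucb1(ch, node.visits))
        # Expansion: add one untried action.
        if not node.state.is_terminal():
            tried = {ch.action for ch in node.children}
            action = random.choice([a for a in ACTIONS if a not in tried])
            next_state, _ = step(node.state, action)
            child = Node(next_state, parent=node, action=action)
            node.children.append(child)
            node = child
        # Simulation: random rollout to a terminal state.
        sim_state, total = node.state, 0.0
        while not sim_state.is_terminal():
            sim_state, r = step(sim_state, random.choice(ACTIONS))
            total += r
        # Backpropagation of the rollout return.
        while node is not None:
            node.visits += 1
            node.value += total
            node = node.parent
    # Recommend the deception action with the most visits at the root.
    return max(root.children, key=lambda ch: ch.visits).action

print(mcts(State()))
```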
The reinforcement learning agent was trained to perform deception actions, including deploying honeypots and generating alerts, in response to observed attacker behavior. SHAP values provided feature-level explanations for each decision, and these explanations were logged to a governance dashboard designed around the ISO/IEC 42001 AI management standard. Key evaluation metrics included false positive rate (FPR), honeypot engagement time, decision explainability, and overall governance compliance.
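The following sketch illustrates how per-decision SHAP attributions might be serialized into a governance audit record. The surrogate value model, feature names, decision threshold, and JSON fields are hypothetical stand-ins; in the full system the explained model would be a surrogate of the trained RL policy rather than the toy regressor used here.

```python
import json
import numpy as np
import shap                                    # pip install shap
from sklearn.ensemble import RandomForestRegressor

# Hypothetical features describing observed attacker behavior.
FEATURES = ["failed_logins", "lateral_moves", "port_scans", "privilege_requests"]

# Stand-in for the agent's value model, fitted to synthetic data that scores
# how strongly an observation warrants deploying a honeypot.
rng = np.random.default_rng(0)
X = rng.integers(0, 10, size=(200, len(FEATURES))).astype(float)
y = 0.6 * X[:, 0] + 0.4 * X[:, 2]              # synthetic "deception value"
model = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)

explainer = shap.TreeExplainer(model)

def governance_log_entry(observation: np.ndarray) -> str:
    """Hypothetical ISO/IEC 42001-style audit record for the governance dashboard."""
    obs = observation.reshape(1, -1)
    score = float(model.predict(obs)[0])
    shap_values = explainer.shap_values(obs)[0]         # per-feature attributions
    return json.dumps({
        "action": "deploy_honeypot" if score > 3.0 else "no_op",
        "deception_value": round(score, 3),
        "feature_attributions": dict(zip(FEATURES, np.round(shap_values, 3).tolist())),
        "zero_trust_policy": "least_privilege+continuous_verification",
    }, indent=2)

print(governance_log_entry(np.array([7.0, 1.0, 5.0, 0.0])))
```

Logging the attribution vector alongside each action is what allows the dashboard to answer, after the fact, which behavioral features drove a given deception decision.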
Results showed a 23% reduction in FPR, a 47% increase in honeypot engagement time, and a 94% rise in decision transparency, while the overall governance score improved from 43% to 89%. Learning stabilized after 380 episodes, with the agent making consistent decisions and manipulating attacker behavior more effectively over time. These findings highlight the system's ability to balance technical performance with explainability and oversight, making it suitable for secure and accountable AI applications in cybersecurity. Integrating XRL into ZTA offers a promising approach for strengthening deception-based defenses while preserving trust and transparency in AI-driven environments.