A Novel Framework for Inherently Interpretable Deep Neural Networks Using Attention-Based Feature Attribution in High-Dimensional Tabular Data
Abstract
Deep learning models for tabular data often lack interpretability, which poses challenges in domains such as healthcare and finance where trust is critical. We propose an attention-augmented neural network architecture that inherently highlights the most informative features, providing intrinsic explanations for its predictions. Inspired by TabNet and Transformer-based models, our architecture applies multi-head, feature-wise attention to automatically weight each feature’s contribution. We incorporate a sparsity-inducing normalization of the attention weights (e.g., sparsemax) to encourage focused attributions. To assess interpretability, we compare the learned attention weights against post-hoc SHAP (Shapley Additive Explanations) values. We evaluate our approach on a high-dimensional healthcare dataset (e.g., clinical outcome prediction) and on synthetic benchmarks. Experimental results show that our model achieves competitive accuracy (Table 1) while providing clear feature-importance insights. Feature-attribution charts (Fig. 1) demonstrate that the attention mechanism identifies the key predictors and aligns well with the SHAP analysis. This work bridges performance and explainability by design, enabling reliable deployment of deep models on complex tabular data.
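The feature-wise attention idea described in the abstract can be sketched as follows. This is a minimal illustration in PyTorch, not the authors' exact architecture: the class name FeatureAttentionBlock, the per-head linear projections, and the layer sizes are assumptions made for exposition. The sparsemax projection follows Martins and Astudillo (2016); each head produces a sparse weight per input feature, and the weights double as per-sample attributions.

```python
import torch
import torch.nn as nn


def sparsemax(z: torch.Tensor) -> torch.Tensor:
    """Sparsemax over the last dimension (Martins & Astudillo, 2016):
    a softmax alternative that yields sparse, simplex-valued weights."""
    z_sorted, _ = torch.sort(z, dim=-1, descending=True)
    cumsum = z_sorted.cumsum(dim=-1)
    k = torch.arange(1, z.size(-1) + 1, device=z.device, dtype=z.dtype)
    support = 1 + k * z_sorted > cumsum            # entries kept in the support
    k_z = support.sum(dim=-1, keepdim=True)        # support size per row
    tau = (cumsum.gather(-1, k_z - 1) - 1) / k_z.to(z.dtype)
    return torch.clamp(z - tau, min=0.0)


class FeatureAttentionBlock(nn.Module):
    """Hypothetical multi-head, feature-wise attention layer for tabular data:
    each head emits sparse feature weights, the gated features feed a small MLP."""

    def __init__(self, n_features: int, n_heads: int = 4, hidden: int = 64):
        super().__init__()
        self.heads = nn.ModuleList(
            [nn.Linear(n_features, n_features) for _ in range(n_heads)]
        )
        self.classifier = nn.Sequential(
            nn.Linear(n_features * n_heads, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, x: torch.Tensor):
        # x: (batch, n_features)
        attn = [sparsemax(head(x)) for head in self.heads]   # sparse weights per head
        weighted = [a * x for a in attn]                      # element-wise feature gating
        logits = self.classifier(torch.cat(weighted, dim=-1))
        return logits, torch.stack(attn, dim=1)               # weights act as attributions


# Example usage: 200 samples with 50 features; the returned attention weights
# can be averaged into a global importance ranking and compared with SHAP values.
model = FeatureAttentionBlock(n_features=50)
x = torch.randn(200, 50)
logits, attributions = model(x)                    # attributions: (200, n_heads, 50)
global_importance = attributions.mean(dim=(0, 1))  # mean weight per feature
```

In this sketch the attention weights sum to one per head and are exactly zero for most features, so the model's explanation is read directly from its forward pass rather than recovered by a separate post-hoc method.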