A Novel Uncertainty-Aware Evidential Multimodal Deep Learning Framework for RGB-D Household Object Recognition
Abstract
RGB-D household object recognition is essential for robotic perception, enabling accurate object identification by leveraging both visual (RGB) and depth information. However, traditional deep learning models struggle with sensor noise, occlusions, and overconfident misclassifications. To address this, we propose an Evidential Multimodal Deep Learning (EMDL) framework that integrates Evidential Deep Learning (EDL) with convolutional neural networks (CNNs) and attention-based feature fusion. Our model extracts RGB and depth features with separate CNN branches and fuses them through a cross-attention mechanism, allowing adaptive weighting of the modalities based on their uncertainty. Instead of a softmax classifier, the model uses a Dirichlet-based evidential output layer that quantifies both classification confidence and epistemic uncertainty, improving robustness. Evaluations on the Washington RGB-D dataset demonstrate superior classification accuracy, noise handling, and domain generalization compared to baseline models; the proposed approach reaches 92.2% accuracy under 10-fold cross-validation. By enhancing uncertainty-aware decision-making, our approach enables safer and more reliable robotic perception, making it suitable for real-world applications such as grasping, manipulation, and autonomous navigation.
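As a rough illustration of the pipeline the abstract describes, the sketch below shows one way the cross-attention fusion and the Dirichlet-based evidential head could be implemented in PyTorch. It follows the common evidential deep learning recipe (non-negative evidence via softplus, Dirichlet parameters alpha = evidence + 1, epistemic uncertainty u = K/S); all module, function, and parameter names here are illustrative assumptions, not taken from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CrossAttentionFusion(nn.Module):
    """Illustrative cross-attention fusion of RGB and depth features.

    Each modality attends to the other; the two attended streams are
    concatenated and pooled into one fused feature vector.
    """
    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        self.rgb_to_depth = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.depth_to_rgb = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, rgb: torch.Tensor, depth: torch.Tensor) -> torch.Tensor:
        # rgb, depth: (batch, tokens, dim) features from the two CNN branches
        rgb_att, _ = self.rgb_to_depth(rgb, depth, depth)    # RGB queries depth
        depth_att, _ = self.depth_to_rgb(depth, rgb, rgb)    # depth queries RGB
        return torch.cat([rgb_att, depth_att], dim=-1).mean(dim=1)  # (batch, 2*dim)

class EvidentialHead(nn.Module):
    """Dirichlet-based evidential output layer (standard EDL formulation).

    Maps fused features to non-negative evidence; the Dirichlet parameters
    yield expected class probabilities plus an explicit epistemic-uncertainty
    score, instead of a single overconfident softmax distribution.
    """
    def __init__(self, feat_dim: int, num_classes: int):
        super().__init__()
        self.fc = nn.Linear(feat_dim, num_classes)

    def forward(self, fused: torch.Tensor):
        evidence = F.softplus(self.fc(fused))        # e_k >= 0
        alpha = evidence + 1.0                       # Dirichlet parameters
        strength = alpha.sum(dim=-1, keepdim=True)   # S = sum_k alpha_k
        prob = alpha / strength                      # expected class probabilities
        uncertainty = alpha.size(-1) / strength      # u = K / S, in (0, 1]
        return prob, uncertainty

# Usage sketch: 51 classes, matching the Washington RGB-D object categories.
fusion = CrossAttentionFusion(dim=64)
head = EvidentialHead(feat_dim=128, num_classes=51)
rgb_feats, depth_feats = torch.randn(2, 8, 64), torch.randn(2, 8, 64)
probs, u = head(fusion(rgb_feats, depth_feats))  # u near 1 means "little evidence"
```

Under this formulation, low total evidence S drives u toward 1, so a downstream robotic system can defer or fall back to a safe behavior on inputs the model has not learned to recognize, which is the uncertainty-aware decision-making the abstract refers to.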