Multi Cascaded Face Artefact Detection with Xception Convoluted LSTM Network for Deep Fake Detection
Abstract
Introduction: Facial deepfakes are becoming increasingly realistic, making it difficult for humans to distinguish fake videos from real ones. This technology poses significant risks across various sectors, including politics, entertainment, and cybersecurity.
Objectives: To address these challenges, deepfake detection systems must enhance their detection capabilities, ensure temporal consistency, and improve face detection techniques. Existing systems often struggle with subtle manipulations, necessitating methods that combine spatial and temporal information.
Methods: This paper introduces a novel methodology employing a Multi-cascaded Face Artefact Detection approach combined with an Xception Convoluted Long Short-Term Memory (LSTM) Network to overcome existing limitations.
The method begins by pre-processing the input video into frames at a consistent rate. Face detection is performed with Multi-Task Cascaded Convolutional Networks (MTCNN), which locates and resizes the face in each frame; key facial landmarks are then extracted with Dlib to capture subtle manipulations.
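The pre-processing step samples frames at a consistent rate regardless of the source video's native frame rate. A minimal sketch of that index selection, assuming simple uniform striding (the function name and parameters are illustrative, not from the paper):

```python
def sample_frame_indices(total_frames: int, native_fps: float, target_fps: float) -> list[int]:
    """Pick frame indices so the sampled sequence approximates target_fps.

    Illustrative helper (not from the paper): keeps roughly every
    (native_fps / target_fps)-th frame of the decoded video.
    """
    if target_fps >= native_fps:
        return list(range(total_frames))  # keep every frame
    step = native_fps / target_fps
    indices = []
    pos = 0.0
    while round(pos) < total_frames:
        indices.append(round(pos))
        pos += step
    return indices
```

Each selected frame would then be passed to MTCNN for face detection and to Dlib for landmark extraction.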
Results: The Xception Convoluted LSTM Network captures spatial features and temporal dependencies to identify inconsistencies in manipulated videos. Evaluated on the FaceForensics++ dataset, the system achieved 94.72% accuracy, 92.09% precision, 95.06% recall, 93.55% F1-score, 94.50% specificity, and 94.78% AUC.
Conclusions: These results underscore the effectiveness of the proposed approach compared with state-of-the-art models.
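All of the reported threshold-based metrics derive from the binary confusion matrix (real vs. fake). As a sketch of how such figures are computed from prediction counts (the counts in the usage example are illustrative, not the paper's data):

```python
def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict[str, float]:
    """Standard binary-classification metrics from confusion-matrix counts."""
    accuracy    = (tp + tn) / (tp + fp + fn + tn)
    precision   = tp / (tp + fp)            # of predicted fakes, how many were fake
    recall      = tp / (tp + fn)            # of actual fakes, how many were caught
    f1          = 2 * precision * recall / (precision + recall)
    specificity = tn / (tn + fp)            # of real videos, how many passed
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1, "specificity": specificity}

# Illustrative counts only, e.g. 20 test videos:
metrics = classification_metrics(tp=8, fp=2, fn=1, tn=9)
```

AUC, by contrast, is computed from ranked prediction scores rather than hard counts, so it is not included in this sketch.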