Accelerated Real-Time Face Recognition and Segmentation with YOLOv8 Optimized through TensorRT
Abstract
Real-time face segmentation on embedded systems is a challenging task that requires balancing computational cost against segmentation quality, yet existing approaches often optimise one at the expense of the other. This paper addresses the trade-off by combining framework-aware optimisation with architectural scaling of the YOLOv8-seg model. A systematic evaluation across five model scales (N, S, M, L, X) shows that the N scale is optimal, achieving an mAP50-95 of 0.8283 at 137.20 FPS when trained and evaluated in native PyTorch and outperforming the L model (0.7758 mAP50-95, 29.30 FPS) in both speed and accuracy. Inference is further optimised with TensorRT, which reduces latency by 58% (to 4.28 ms/image) while preserving an mAP50-95 of 0.8170, nearly matching native PyTorch. Our analysis shows that TensorRT raises the throughput of the N model to 233.67 FPS, and that the smaller architectures (N, S) offer a better latency-accuracy trade-off than the larger ones (L, X), which yield diminishing returns (for example, the X model reaches only 0.7301 mAP50-95 at 11.29 FPS). We present a deployment framework that guides the choice of model scale and inference engine (PyTorch, ONNX, TensorRT) according to an application's latency and memory constraints. The methodology is validated experimentally on NVIDIA Jetson platforms, achieving real-time frame rates (≥37.17 FPS) with less than 3% accuracy loss from quantization, sufficient for consistent face segmentation in practical environments. This work narrows the accuracy-deployability gap and provides practical guidance for designing edge computing applications in AR, biometrics, and privacy-preserving systems.
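The latency, throughput, and accuracy figures above are related by simple conversions. As a minimal sketch (the numeric values are taken from the abstract; the function names are ours, not from the paper):

```python
def fps_from_latency(latency_ms: float) -> float:
    """Frames per second implied by a per-image latency in milliseconds."""
    return 1000.0 / latency_ms

def relative_drop(baseline: float, value: float) -> float:
    """Fractional drop of a metric relative to its baseline value."""
    return (baseline - value) / baseline

# TensorRT-optimised N model: 4.28 ms/image reported in the abstract,
# consistent with the ~233.67 FPS throughput figure.
print(round(fps_from_latency(4.28), 2))   # ~233.64 FPS

# mAP50-95: PyTorch 0.8283 vs. TensorRT 0.8170 -> well under the 3% budget.
print(round(relative_drop(0.8283, 0.8170) * 100, 2))  # ~1.36 %
```

This back-of-the-envelope check confirms the abstract's claim that the TensorRT accuracy loss stays below 3% while throughput exceeds real-time requirements.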