YOLO Accuracy Enhancement for Dense and Dynamic Scenes
Abstract
Object detection is a core task in computer vision, supporting applications such as autonomous driving and intelligent surveillance. The "You Only Look Once" family of deep neural networks, most recently YOLOv8, has advanced object detection by improving accuracy while keeping inference fast. Even so, achieving strong real-time performance remains difficult in scenes with clutter, occlusion, and small objects. This article introduces enhancements to the YOLOv8 architecture built around two modules, the Cross Stage Partial Transformer block (C3TR) and Adaptive Downsampling (ADown), together with receptive fields of multiple sizes to better capture multi-scale features.
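As a concrete illustration, the sketch below shows one plausible PyTorch implementation of an ADown-style downsampling module, following the design popularized by YOLOv9: the input is split along the channel dimension, one half goes through a strided 3x3 convolution and the other through max pooling plus a 1x1 convolution, and the two halves are concatenated. This is a minimal sketch under those assumptions, not the exact module used in this article; the `ConvBNSiLU` helper is a hypothetical stand-in for the standard YOLO convolution block.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ConvBNSiLU(nn.Module):
    """Convolution + batch norm + SiLU, the standard YOLO conv block
    (hypothetical helper; named differently in actual YOLO codebases)."""

    def __init__(self, c_in, c_out, k=1, s=1, p=0):
        super().__init__()
        self.conv = nn.Conv2d(c_in, c_out, k, s, p, bias=False)
        self.bn = nn.BatchNorm2d(c_out)
        self.act = nn.SiLU()

    def forward(self, x):
        return self.act(self.bn(self.conv(x)))


class ADown(nn.Module):
    """ADown-style adaptive downsampling: split channels, downsample each
    half through a different path, then concatenate. Compared with a single
    strided convolution, this retains more fine detail for small objects."""

    def __init__(self, c_in, c_out):
        super().__init__()
        c_half = c_out // 2
        # Branch 1: strided 3x3 convolution on the first channel half.
        self.cv1 = ConvBNSiLU(c_in // 2, c_half, k=3, s=2, p=1)
        # Branch 2: 1x1 convolution applied after max pooling on the second half.
        self.cv2 = ConvBNSiLU(c_in // 2, c_half, k=1, s=1, p=0)

    def forward(self, x):
        # Light smoothing before the split (2x2 average pool, stride 1).
        x = F.avg_pool2d(x, kernel_size=2, stride=1, padding=0)
        x1, x2 = x.chunk(2, dim=1)  # split along channels
        x1 = self.cv1(x1)           # half 1: stride-2 conv
        x2 = F.max_pool2d(x2, kernel_size=3, stride=2, padding=1)
        x2 = self.cv2(x2)           # half 2: max pool, then 1x1 conv
        return torch.cat((x1, x2), dim=1)
```

For example, `ADown(64, 128)` maps a `(1, 64, 640, 640)` feature map to `(1, 128, 320, 320)`, halving the spatial resolution while doubling the channel count.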
Experimental results show that the improved YOLOv8 outperforms the baseline in accuracy across several datasets, confirming the effectiveness of these enhancements: the modified models achieved consistent mAP gains of +2.3% to +5.8% over the original YOLOv8.