NEURAL NETWORKS FOR OBJECT RECOGNITION IN IMAGES: A COMPARATIVE ANALYSIS
DOI: https://doi.org/10.31891/2307-5732-2025-357-11

Keywords: R-CNN, YOLO, object recognition, computer vision

Abstract
This article presents a systematic analysis of the architectural evolution in object detection networks, tracing pivotal advancements from the pioneering Region-based Convolutional Neural Network (R-CNN) framework to contemporary YOLOv8 implementations. Early two-stage methodologies, including Fast R-CNN and Faster R-CNN, established foundational region-proposal paradigms but faced inherent computational inefficiencies. The paradigm shift toward unified, single-shot detection was catalyzed by the Single Shot Detector (SSD) and You Only Look Once (YOLO) architectures, which enabled real-time inference through end-to-end spatial grid processing.
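To make the spatial-grid formulation concrete, the following minimal sketch decodes one grid cell's raw outputs into an image-space bounding box. It follows the YOLOv2/v3-style parameterization rather than any single release; all tensor values, the anchor size, and the stride are illustrative placeholders.

```python
# Minimal sketch of single-shot grid decoding (YOLOv2/v3-style).
# Values are illustrative, not taken from any specific trained model.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def decode_cell(t, cell_x, cell_y, anchor_w, anchor_h, stride):
    """Decode raw outputs t = (tx, ty, tw, th) for one grid cell.

    The center offsets are squashed into [0, 1] so the box center stays
    inside its cell; width and height scale a prior anchor exponentially.
    """
    tx, ty, tw, th = t
    bx = (sigmoid(tx) + cell_x) * stride  # center x in pixels
    by = (sigmoid(ty) + cell_y) * stride  # center y in pixels
    bw = anchor_w * np.exp(tw)            # box width in pixels
    bh = anchor_h * np.exp(th)            # box height in pixels
    return bx, by, bw, bh

# Example: cell (7, 5) on a 13x13 grid over a 416x416 image (stride 32),
# with a hypothetical 116x90-pixel anchor.
print(decode_cell(np.array([0.2, -0.1, 0.05, 0.3]), 7, 5, 116, 90, 32))
```

The sigmoid keeps each predicted center inside its own cell, which is what lets every cell independently own the objects whose centers fall within it, enabling a single forward pass over the whole grid.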
Subsequent iterations of YOLO demonstrate progressive resolution of critical limitations. YOLOv1-v3 introduced multi-scale feature hierarchies via Feature Pyramid Networks (FPN) and anchor-based localization, enhancing small-object detection. YOLOv4 integrated cross-stage partial networks (CSPDarknet) and path aggregation (PANet) to optimize gradient flow, while advanced augmentation strategies improved robustness. YOLOv5/v6 refined these principles with hardware-aware optimizations, supporting streamlined deployment via ONNX and TensorRT runtimes.
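As an illustration of that deployment path, the sketch below exports a pretrained checkpoint to ONNX and runs the resulting graph with ONNX Runtime. It assumes the Ultralytics Python package and onnxruntime are installed (shown with a YOLOv8 checkpoint; YOLOv5 ships an equivalent export.py script), and the checkpoint name and input size are examples.

```python
# Minimal sketch of the ONNX deployment path; assumes
# `pip install ultralytics onnxruntime`. Checkpoint name is an example.
import numpy as np
import onnxruntime as ort
from ultralytics import YOLO

# Export the PyTorch checkpoint to a framework-neutral ONNX graph.
model = YOLO("yolov8n.pt")               # pretrained nano model
onnx_path = model.export(format="onnx")  # writes yolov8n.onnx

# Run the exported graph with ONNX Runtime.
session = ort.InferenceSession(onnx_path)
input_name = session.get_inputs()[0].name
dummy = np.random.rand(1, 3, 640, 640).astype(np.float32)  # NCHW input
outputs = session.run(None, {input_name: dummy})
print(outputs[0].shape)  # raw predictions, e.g. (1, 84, 8400) for COCO
```

TensorRT consumes the same ONNX file through its parser, so a single export serves both runtimes.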
Later innovations address persistent challenges: YOLOv7's extended efficient layer aggregation network (E-ELAN) enhanced parameter utilization, whereas YOLOv8's anchor-free mechanism and Distribution Focal Loss (DFL) significantly improved bounding box precision. Throughout this evolution, techniques such as decoupled heads and post-training quantization have progressively mitigated constraints related to computational latency, model size, and edge-device compatibility.
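A minimal sketch of DFL, as defined in the Generalized Focal Loss paper that YOLOv8 adopts: each box edge is predicted as a discrete distribution over integer bins, and the loss concentrates probability mass on the two bins bracketing the continuous target. The bin count and input values below are illustrative.

```python
# Minimal numpy sketch of Distribution Focal Loss (DFL) for one box edge.
# Bin count and values are illustrative.
import numpy as np

def softmax(logits):
    e = np.exp(logits - logits.max())
    return e / e.sum()

def dfl(logits, target):
    """DFL for one box edge.

    logits: raw scores over the discrete bins 0..n-1
    target: continuous distance-to-edge in bin units (0 <= target < n-1)
    """
    probs = softmax(logits)
    lo = int(np.floor(target))  # left integer bin  y_i
    hi = lo + 1                 # right integer bin y_{i+1}
    w_lo = hi - target          # weight (y_{i+1} - y)
    w_hi = target - lo          # weight (y - y_i)
    return -(w_lo * np.log(probs[lo]) + w_hi * np.log(probs[hi]))

# Example: 16 bins (YOLOv8's default reg_max + 1), true edge offset 7.3.
logits = np.random.randn(16)
print(dfl(logits, 7.3))
```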
Quantitative advances in mean average precision (mAP) and frames-per-second (FPS) metrics are contextualized against real-world applications in autonomous navigation, industrial automation, and surveillance. The study concludes by identifying emergent research trajectories, including lightweight model distillation, 3D scene understanding, and multimodal vision-language integration.
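For reference, mAP averages per-class average precision over all classes (COCO-style evaluation additionally averages over IoU thresholds from 0.5 to 0.95). The sketch below, using toy boxes and match flags, shows the two primitives involved: IoU matching and all-point-interpolated AP at a single threshold.

```python
# Minimal sketch of the metrics underlying mAP. Boxes and flags are toy data.
import numpy as np

def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def average_precision(tp, n_gt):
    """AP from detections sorted by descending confidence; tp[i] = 1 if
    detection i matched an unclaimed ground truth at the IoU threshold."""
    tp = np.asarray(tp, dtype=float)
    recall = np.cumsum(tp) / n_gt
    precision = np.cumsum(tp) / (np.arange(len(tp)) + 1)
    # Pad with sentinels, take the precision envelope, then step-integrate.
    r = np.concatenate(([0.0], recall, [1.0]))
    p = np.concatenate(([0.0], precision, [0.0]))
    for i in range(len(p) - 2, -1, -1):
        p[i] = max(p[i], p[i + 1])
    return float(np.sum((r[1:] - r[:-1]) * p[1:]))

print(iou((0, 0, 10, 10), (5, 5, 15, 15)))         # ~0.143
print(average_precision([1, 1, 0, 1, 0], n_gt=4))  # toy AP ~ 0.69
```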
License
Copyright (c) 2025 YAROSLAV HOZAK, SERHII PALII (Author)

This work is licensed under a Creative Commons Attribution 4.0 International License.