
👁️ YOLOv3: the paper whose title is literally “YOLOv3: An Incremental Improvement.” A rare case of scientific modesty, and an architecture that worked.
After YOLOv2, detectors such as SSD, DSSD and RetinaNet were starting to overshadow YOLO. The authors responded with a set of small changes that together produced a significant accuracy gain.
What changed from YOLOv2 to YOLOv3?
Darknet-53 → new backbone with 52 convolutional layers plus one fully connected layer (hence the 53) and residual connections inspired by ResNet. It replaces YOLOv2's Darknet-19. Deeper = more representational capacity.
Multi-scale detection → YOLOv3 detects objects at 3 different feature map scales. This greatly improves small object detection, which was YOLO’s Achilles’ heel.
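Multi-label classification → uses an independent sigmoid (logistic) score per class instead of a single softmax over all classes. This allows an object to belong to multiple categories at once (useful in datasets with class hierarchies or overlapping labels).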
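To make the sigmoid-versus-softmax point concrete, here is a minimal pure-Python sketch (it is not the article's PyTorch code). The label names, the logits and the 0.5 threshold are made-up values chosen only for illustration.

```python
import math

def softmax(logits):
    """Turn logits into one distribution; the scores compete and sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(logits):
    """Score every class independently; the scores need not sum to 1."""
    return [1.0 / (1.0 + math.exp(-x)) for x in logits]

# Hypothetical raw class scores for ONE predicted box with overlapping labels.
labels = ["person", "woman", "dog"]
logits = [3.0, 2.5, -4.0]
threshold = 0.5  # assumed decision cutoff

softmax_scores = dict(zip(labels, softmax(logits)))
sigmoid_scores = dict(zip(labels, sigmoid(logits)))

softmax_labels = [name for name, s in softmax_scores.items() if s > threshold]
sigmoid_labels = [name for name, s in sigmoid_scores.items() if s > threshold]

print("softmax keeps:", softmax_labels)
print("sigmoid keeps:", sigmoid_labels)
```

With these numbers, softmax forces "person" and "woman" to compete for the same probability mass, so only one of them clears the threshold. The independent sigmoids let both labels pass.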
Performance: on COCO's AP50 metric (mAP at an IoU threshold of 0.5) it is on par with RetinaNet (57.9 vs. 57.5 in the paper) while being about 3.8x faster. Speed remains YOLO’s strong suit.
The article also includes a complete from-scratch implementation in PyTorch.
💡 Explanation in a nutshell
YOLOv3 didn’t reinvent object detection; it refined what worked. Multi-scale detection and Darknet-53 were the key changes. The result: it detects small objects much better than its predecessors, keeps YOLO’s characteristic speed, and lays the groundwork for later versions.
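As a rough illustration of what "three detection scales" means in practice, the sketch below computes the grid size and channel count of each prediction map. The 416-pixel input, the strides of 32, 16 and 8, the 3 anchors per scale and the 80 COCO classes are assumptions taken from the commonly used configuration, not from this post. The function name is hypothetical.

```python
def yolo_v3_output_shapes(input_size=416, num_classes=80,
                          strides=(32, 16, 8), anchors_per_scale=3):
    """Return (grid_h, grid_w, channels) for each detection scale.

    All defaults are assumed values for illustration.
    """
    if any(input_size % s for s in strides):
        raise ValueError("input_size must be divisible by every stride")
    # Each anchor predicts 4 box offsets + 1 objectness score + class scores.
    channels = anchors_per_scale * (4 + 1 + num_classes)
    return [(input_size // s, input_size // s, channels) for s in strides]

strides = (32, 16, 8)
shapes = yolo_v3_output_shapes(strides=strides)

for stride, (h, w, c) in zip(strides, shapes):
    print(f"stride {stride:>2}: grid {h}x{w}, {c} channels")

# Total candidate boxes across all three scales (3 anchors per grid cell).
total_boxes = sum(h * w * 3 for h, w, _ in shapes)
print("total candidate boxes:", total_boxes)
```

Under these assumptions the finest grid (52x52) has 16 times as many cells as the coarsest one (13x13), which is a large part of why small objects fare better than in earlier versions.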
More information at the link 👇

