title: Detailed explanation of YOLO series models: A complete guide from YOLOv1 to YOLOv8 | Daoman PythonAI
description: A complete YOLO object detection tutorial, with an in-depth analysis of the development history, architectural features, PyTorch implementations and practical application scenarios from YOLOv1 to YOLOv8, including detailed code examples and performance comparisons.
keywords: [YOLO, object detection, computer vision, PyTorch, deep learning, YOLOv8, YOLOv5, real-time detection, image recognition]
Detailed explanation of YOLO series models: A complete guide from YOLOv1 to YOLOv8
The real-time AR makeup try-on on your phone, the automatic license-plate reading in the parking lot, the drone patrolling the sky for sources of fire - behind these applications there is very likely YOLO (You Only Look Once). Its bold idea was to replace the careful, two-stage pipeline of object detection with a single-stage regression that sizes up the whole scene "in one glance". Since the first version appeared in 2015, every YOLO iteration has pushed the accuracy-speed-usability trade-off forward.
This article walks you through the core evolution of the YOLO family, uses key code to explain the architectural changes, helps you pick the version that suits you best through performance comparison, and finishes with a practical quick-start.
1. Basics of getting started with YOLO
1.1 Core Definition: Single-stage regression
The workflow of traditional two-stage algorithms (such as Faster R-CNN) resembles an old-fashioned camera: first "frame the shot" (generate candidate regions), then "focus" (classify + refine the boxes). YOLO's idea is completely different - get it right in one step:
- Divide the input image into an S×S grid
- Each grid cell is responsible only for objects whose center falls inside it
- In a single pass, output B predicted boxes with confidence scores plus C class probabilities for every cell
In this way, the model "looks" at the image once and can directly say what is in it and where, which greatly simplifies the detection pipeline.
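As a quick sanity check of that output layout, here is a minimal sketch using YOLOv1's paper values (S=7, B=2, C=20); the variable names are illustrative:

```python
# YOLOv1 output layout (paper values: S=7 grid, B=2 boxes per cell, C=20 classes).
# Each cell predicts B * (x, y, w, h, confidence) values plus C class probabilities.
S, B, C = 7, 2, 20

values_per_cell = B * 5 + C            # 2*5 + 20 = 30
output_shape = (S, S, values_per_cell)

print(output_shape)                    # (7, 7, 30)
print(S * S * values_per_cell)         # 1470 numbers describe the whole image
```

A single 7×7×30 tensor is the entire detection result, which is exactly why inference is one forward pass.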
1.2 Intuitive comparison with mainstream algorithms
The table below contrasts YOLO with two-stage algorithms and other single-stage algorithms, helping you quickly build a global picture.
As the comparison shows, YOLO's strengths are extreme speed and steadily improving accuracy, which make it the first choice for real-time scenarios.
2. Key evolution of YOLO (core code highlighting)
From v1 to v8, each generation of YOLO stands on the shoulders of the previous one and focuses on the remaining pain points. Following this evolutionary route, we will use code to tie the key improvements together.
2.1 YOLOv1 (2015): Groundbreaking "One Glance"
Core Breakthrough
- Proposed single-stage, end-to-end detection for the first time, making object detection far simpler
- Strong global context understanding, greatly reducing cases where background is mistaken for an object
Pain points
- Poor localization accuracy: box width and height are regressed directly, with no prior (anchor) boxes as a reference
- Small objects are often missed: with only a single 7×7 grid, small objects struggle to "claim" a cell center of their own
- One class per cell: each cell predicts only a single class, so overlapping objects (such as two people standing close together) cannot both be detected
2.2 YOLOv2 (2016): Anchor boxes and accuracy improvement
Core improvements
- Anchor boxes from k-means clustering: give the width/height regression a reliable starting point, fixing v1's imprecise localization
- Batch Normalization: faster training convergence and better model stability
- Darknet-19: a lightweight, efficient feature-extraction backbone - faster and stronger
- Multi-scale training: every 10 batches, randomly switch the input resolution among multiples of 32 (320-608) so the model adapts to objects of different sizes
The following code shows how v2 uses anchor boxes to decode raw predictions into true bounding box coordinates:
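A minimal NumPy sketch of that decoding, following the equations in the YOLOv2 paper (function and argument names here are illustrative, not from any particular implementation):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def decode_yolov2(tx, ty, tw, th, cx, cy, pw, ph):
    """Decode raw network outputs into box center/size (in grid-cell units).

    (cx, cy) is the top-left corner of the responsible grid cell and
    (pw, ph) are the anchor's prior width/height, as in the YOLOv2 paper:
        bx = sigmoid(tx) + cx      # sigmoid keeps the center inside the cell
        by = sigmoid(ty) + cy
        bw = pw * exp(tw)          # anchor size scaled by a learned factor
        bh = ph * exp(th)
    """
    bx = sigmoid(tx) + cx
    by = sigmoid(ty) + cy
    bw = pw * np.exp(tw)
    bh = ph * np.exp(th)
    return bx, by, bw, bh

# A raw prediction of all zeros sits at the cell center with the anchor's size.
print(decode_yolov2(0.0, 0.0, 0.0, 0.0, cx=3, cy=4, pw=1.5, ph=2.0))
# → (3.5, 4.5, 1.5, 2.0)
```

This is why anchors help: the network only has to learn small corrections around a sensible prior instead of regressing raw widths and heights from scratch.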
2.3 YOLOv3 (2018): Small goal savior
This generation focuses on small-object and overlapping-object detection, and the changes are very practical.
Core Breakthrough
- Darknet-53: a deeper backbone with residual connections, greatly strengthening feature extraction
- Three-scale detection heads: predictions on three feature maps of different sizes - 13×13 for large objects, 26×26 for medium ones, and 52×52 specializing in small ones - largely fixing single-scale missed detections
- Binary cross-entropy classification: per-class sigmoids replace softmax, so a single box can carry multiple labels and overlapping categories no longer conflict
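Both ideas above are easy to verify in a few lines; this sketch assumes a 416×416 input and made-up class logits:

```python
import numpy as np

# Three detection scales for a 416x416 input (strides 32, 16, 8).
input_size = 416
grids = [input_size // stride for stride in (32, 16, 8)]
print(grids)                          # [13, 26, 52]

# Per-class sigmoids instead of softmax: probabilities are independent and
# need not sum to 1, so one box can score high on several labels at once.
def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

logits = np.array([4.0, 3.5, -6.0])   # made-up scores for three classes
probs = sigmoid(logits)
print((probs > 0.5).sum())            # 2 classes pass the 0.5 threshold
```

With softmax, boosting one class necessarily suppresses the others; with independent sigmoids, "person" and "rider" can both be confident on the same box.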
2.4 YOLOv5/v8 (2020-2023): Ease of use and ceiling improved simultaneously
These two versions (v8 can be seen as a comprehensive evolution of v5), led by the Ultralytics team, turned YOLO from an "academic black box" into an "industrial toolkit", and the developer experience improved dramatically.
v5 key features
- PyTorch-native implementation: a very friendly API; a few lines of code train on public datasets such as COCO
- Focus structure: a clever lossless downsampling that rearranges spatial information into the channel dimension, minimizing information loss
- CSPDarknet/C3 modules: efficient feature extraction and fusion, both powerful and fast
- Mosaic data augmentation: randomly stitches 4 images into one, greatly enriching training backgrounds and small-object context, and markedly improving generalization
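The Focus slicing is simple enough to sketch in NumPy (a real implementation follows it with a convolution; this toy version only shows the lossless rearrangement):

```python
import numpy as np

def focus(x):
    """YOLOv5-style Focus: space-to-depth slicing.

    Takes every other pixel at four phase offsets and stacks the slices on
    the channel axis: (C, H, W) -> (4C, H/2, W/2). No pixels are discarded,
    so the 2x downsampling loses no information.
    """
    return np.concatenate([
        x[:, 0::2, 0::2],   # even rows, even cols
        x[:, 1::2, 0::2],   # odd rows,  even cols
        x[:, 0::2, 1::2],   # even rows, odd cols
        x[:, 1::2, 1::2],   # odd rows,  odd cols
    ], axis=0)

img = np.random.rand(3, 640, 640)     # dummy 3-channel input
out = focus(img)
print(out.shape)                      # (12, 320, 320)
```

Note that the element count is unchanged - spatial resolution is traded for channels, unlike strided convolution or pooling which discard information.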
v8 key features
- Anchor-free design: drops the anchor-box hyperparameters entirely, making tuning and deployment much simpler
- Multi-task unification: one model skeleton handles detection, segmentation, classification, and pose estimation out of the box
- DFL + CIoU loss: more accurate bounding-box regression and more stable localization
- Extremely lightweight: the smallest v8n model has only 3.2M parameters and can run directly on a mobile phone
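The DFL decoding step can be sketched in a few lines: the head predicts a distribution over integer bins, and the box offset is the distribution's expectation (bin count and logits below are made up for illustration):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def dfl_decode(logits):
    """Distribution Focal Loss style decoding (illustrative sketch).

    Instead of regressing a single value, the head predicts a discrete
    distribution over integer bins; the box offset is its expectation,
    which allows sub-bin precision and expresses uncertainty.
    """
    bins = np.arange(len(logits))
    probs = softmax(logits)
    return float((bins * probs).sum())

# Mass concentrated between bins 3 and 4 -> an offset of about 3.5.
logits = np.array([-9.0, -9.0, -9.0, 5.0, 5.0, -9.0, -9.0, -9.0])
print(round(dfl_decode(logits), 2))   # ≈ 3.5
```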
3. YOLO practical introduction (Ultralytics v8)
With the theory covered, let's actually run it with a few snippets. We use Ultralytics YOLOv8, which currently has the most complete ecosystem.
3.1 Installation and basic inference
3.2 Performance and scene selection
Different applications have very different speed and accuracy requirements; the comparison table below helps you quickly identify a suitable model.
As a rule of thumb, start with v8s or v8m. If speed falls short, move down toward n; if accuracy falls short, move up toward l/x.
4. Learning and optimization suggestions
4.1 Learning Path
It is recommended to follow the following order, which is both easy and solid:
- First understand YOLOv1's single-stage regression idea and why "one look" is enough
- Then look at v2's anchor boxes and BN, and v3's multi-scale design, connecting the historical pain points with their solutions
- Finally get hands-on with v8 for a real project, focusing on data-augmentation strategies and loss-function tuning, starting directly from industrial-grade practice
4.2 Common optimization directions
- Data augmentation: if small objects dominate, try Copy-Paste; if the model overfits, reduce the Mosaic intensity
- Training tips: enable multi-scale training (imgsz 640-1280) and cosine-annealing learning-rate decay; this is usually worth 1-2 mAP points
- Deployment optimization: export with `model.export(format='onnx')`, then accelerate inference on the target device with TensorRT or OpenVINO
Summary
YOLO's success is, at its core, the result of continually turning cutting-edge research into easy-to-use industrial tools. From v1's daring first step to v8's all-rounder, it covers almost every compute scenario from mobile phones to high-end servers.
If you are choosing a version to learn or to ship today, YOLOv8 is strongly recommended: complete documentation, a friendly API, and a high performance ceiling make it a de facto standard in object detection.