Hands-on YOLOv5: Real-time Object Detection Tutorial
Learn to implement real-time object detection with YOLOv5. This tutorial covers setup, custom dataset training, and deployment for your AI projects.
YOLOv5 (You Only Look Once version 5) is a state-of-the-art, real-time object detection model developed by Ultralytics. It is highly regarded for its exceptional speed, accuracy, and user-friendly PyTorch-based implementation.
This tutorial provides a comprehensive guide to leveraging YOLOv5, covering the entire workflow from setup to model deployment.
Tutorial Outline
- Setting up YOLOv5
- Preparing a custom dataset
- Training a custom YOLOv5 model
- Running inference on new data
- Evaluating model performance
Step 1: Clone YOLOv5 Repository and Install Dependencies
Begin by cloning the official YOLOv5 GitHub repository and installing the necessary Python packages.
git clone https://github.com/ultralytics/yolov5
cd yolov5
pip install -r requirements.txt
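Before moving on, it can help to confirm the environment is usable. This optional check (a minimal sketch, not part of the YOLOv5 repo) verifies that PyTorch is installed and reports whether a GPU is visible:

```python
# Optional sanity check: confirm PyTorch is installed and see if a GPU is available.
import torch

print(f"PyTorch version: {torch.__version__}")
print(f"CUDA available:  {torch.cuda.is_available()}")
```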
Step 2: Prepare Your Custom Dataset
YOLOv5 requires your dataset to be organized in a specific structure and format.
Dataset Structure
Your dataset should be organized as follows:
dataset/
├── images/
│   ├── train/
│   └── val/
└── labels/
    ├── train/
    └── val/
- images/train: Contains training images.
- images/val: Contains validation images.
- labels/train: Contains training image annotations in YOLO format.
- labels/val: Contains validation image annotations in YOLO format.
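If you are starting from scratch, a few lines of Python can create this skeleton. This is a convenience sketch; the dataset root path is an assumption you should adjust to your project:

```python
# Create the YOLOv5 dataset directory skeleton described above.
from pathlib import Path

root = Path("dataset")  # assumed project-local root; change as needed
for split in ("train", "val"):
    (root / "images" / split).mkdir(parents=True, exist_ok=True)
    (root / "labels" / split).mkdir(parents=True, exist_ok=True)
```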
Annotation Format (YOLO Format)
Annotations must be provided in .txt files, with each file corresponding to an image and sharing the same filename (e.g., image001.jpg will have image001.txt). Each line within a .txt file represents a bounding box for one object and follows this format:
<class_id> <x_center> <y_center> <width> <height>
- <class_id>: An integer representing the class of the detected object (starting from 0).
- <x_center>: The normalized x-coordinate of the center of the bounding box (0 to 1).
- <y_center>: The normalized y-coordinate of the center of the bounding box (0 to 1).
- <width>: The normalized width of the bounding box (0 to 1).
- <height>: The normalized height of the bounding box (0 to 1).
Normalization: All coordinates and dimensions are normalized relative to the image's width and height. For instance, if an image is 640 pixels wide and 480 pixels tall, and an object's bounding box is centered at pixel (320, 240), its normalized x_center is 320 / 640 = 0.5 and its normalized y_center is 240 / 480 = 0.5.
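As an illustration, here is a small helper (hypothetical, not part of YOLOv5) that converts a pixel-space box given as (x_min, y_min, x_max, y_max) into a YOLO-format annotation line:

```python
def to_yolo_line(class_id, x_min, y_min, x_max, y_max, img_w, img_h):
    """Convert a pixel-space box to a normalized YOLO annotation line."""
    x_center = (x_min + x_max) / 2 / img_w
    y_center = (y_min + y_max) / 2 / img_h
    width = (x_max - x_min) / img_w
    height = (y_max - y_min) / img_h
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"

# Example: a box centered at pixel (320, 240) in a 640x480 image.
print(to_yolo_line(0, 220, 140, 420, 340, 640, 480))
# -> "0 0.500000 0.500000 0.312500 0.416667"
```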
Step 3: Create a Data Configuration File (.yaml)
Create a YAML file (e.g., custom_data.yaml) to define the paths to your dataset and class information.
# Example: custom_data.yaml
train: /path/to/your/dataset/images/train # Path to training images directory
val: /path/to/your/dataset/images/val # Path to validation images directory
nc: 3 # Number of classes
names: ['car', 'truck', 'person'] # Class names
Explanation of Parameters:
- train: The directory containing your training images.
- val: The directory containing your validation images.
- nc: The total number of object classes in your dataset.
- names: A list of strings, where each string is the name of a class, ordered according to their <class_id>.
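A quick consistency check can catch the most common mistake, nc not matching the number of names. This is a sketch that assumes PyYAML is installed (it ships with YOLOv5's requirements):

```python
# Sanity-check custom_data.yaml: nc should equal len(names),
# and the train/val image directories should exist.
import os
import yaml

with open("custom_data.yaml") as f:
    cfg = yaml.safe_load(f)

assert cfg["nc"] == len(cfg["names"]), "nc does not match the number of class names"
for key in ("train", "val"):
    assert os.path.isdir(cfg[key]), f"{key} path not found: {cfg[key]}"
print("custom_data.yaml looks consistent")
```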
Step 4: Train the YOLOv5 Model
Use the train.py script to train your custom YOLOv5 model. You can leverage pre-trained weights to accelerate the training process and improve performance.
python train.py --img 640 --batch 16 --epochs 50 --data custom_data.yaml --weights yolov5s.pt --name custom_yolov5
Key Training Parameters:
- --img: Input image size for training (e.g., 640). Images will be resized to this dimension.
- --batch: Batch size, the number of images processed in parallel during training. Adjust based on your GPU memory.
- --epochs: Number of training epochs. One epoch is a complete pass through the entire training dataset.
- --data: Path to your dataset configuration file (.yaml).
- --weights: Pre-trained weights to start training from. This is highly recommended. Available checkpoints:
  - yolov5n.pt (Nano)
  - yolov5s.pt (Small)
  - yolov5m.pt (Medium)
  - yolov5l.pt (Large)
  - yolov5x.pt (Extra Large)
- --name: A custom name for this training run. Logs and trained weights will be saved in runs/train/custom_yolov5.
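If you prefer launching training from Python rather than the shell, the repository's train.py also exposes a run() helper. A minimal sketch (run from inside the yolov5/ directory; keyword names mirror the CLI flags, with batch_size standing in for --batch):

```python
# Programmatic alternative to the CLI call above (run from inside the yolov5/ repo).
import train

train.run(
    imgsz=640,
    batch_size=16,
    epochs=50,
    data="custom_data.yaml",
    weights="yolov5s.pt",
    name="custom_yolov5",
)
```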
Step 5: Run Inference
After training, you can use your custom model to detect objects in new images, videos, or other sources.
python detect.py --weights runs/train/custom_yolov5/weights/best.pt --img 640 --conf 0.25 --source data/images/test.jpg
Inference Parameters:
- --weights: Path to your trained model weights (typically best.pt from the training run).
- --img: Input image size for inference. Should ideally match the training image size for best results.
- --conf: Confidence threshold. Only detections with a confidence score above this value will be displayed.
- --source: The input source for detection. This can be:
  - An image file path (data/images/test.jpg)
  - A directory of images (data/images/)
  - A video file (video.mp4)
  - A webcam stream (0 for the default webcam)
  - A URL to an image or video stream
Inference results (images with bounding boxes) will be saved in the runs/detect/ directory.
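You can also run inference directly from Python via PyTorch Hub, which is handy for embedding the model in an application. A short sketch, with the path pointing at the trained weights from Step 4:

```python
# Load the custom-trained model through PyTorch Hub and run inference.
import torch

model = torch.hub.load("ultralytics/yolov5", "custom",
                       path="runs/train/custom_yolov5/weights/best.pt")
model.conf = 0.25  # confidence threshold, matching --conf above

results = model("data/images/test.jpg")
results.print()          # summary of detections
results.save()           # writes annotated images under runs/detect/
boxes = results.xyxy[0]  # tensor of (x1, y1, x2, y2, confidence, class)
```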
Step 6: Evaluate Model Performance
YOLOv5 automatically evaluates your model during training and saves performance metrics. You can also re-run the evaluation on a specific dataset.
After training, the runs/train/custom_yolov5 directory will contain:
- results.png: A plot summarizing key metrics like Precision, Recall, mAP, and Loss across epochs.
- results.txt: A text file with detailed metrics.
To perform a separate evaluation:
python val.py --weights runs/train/custom_yolov5/weights/best.pt --data custom_data.yaml --img 640
Evaluation Metrics:
- Precision: The accuracy of positive predictions.
- Recall: The ability of the model to find all relevant instances.
- mAP@0.5 (mean Average Precision at IoU=0.5): A standard metric measuring the average precision across all classes, using an Intersection over Union (IoU) threshold of 0.5.
- Loss metrics: Measures of how well the model is learning.
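Since mAP@0.5 hinges on Intersection over Union, a minimal IoU implementation (a self-contained sketch, not YOLOv5's internal code) makes the 0.5 threshold concrete. Boxes are given as (x1, y1, x2, y2) in pixels:

```python
def iou(box_a, box_b):
    """Intersection over Union of two (x1, y1, x2, y2) boxes."""
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

# At mAP@0.5, a prediction counts as a true positive when IoU >= 0.5.
print(iou((0, 0, 100, 100), (50, 0, 150, 100)))  # -> 0.333... (not a match)
```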
Optional: Export the Model
YOLOv5 models can be exported to various formats for deployment on different platforms.
To export your trained model to ONNX format:
python export.py --weights runs/train/custom_yolov5/weights/best.pt --include onnx
YOLOv5 supports exporting to:
- ONNX
- CoreML
- TensorRT
- TensorFlow Lite
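Once exported, the ONNX model can be served without PyTorch. Below is a minimal sketch using onnxruntime; it assumes the default export location next to best.pt, a 640x640 export size, and leaves YOLO-specific post-processing (confidence filtering and NMS) to your deployment stack:

```python
# Run the exported ONNX model with onnxruntime (pip install onnxruntime).
import numpy as np
import onnxruntime as ort

session = ort.InferenceSession(
    "runs/train/custom_yolov5/weights/best.onnx",  # default export path (assumption)
    providers=["CPUExecutionProvider"],
)
input_name = session.get_inputs()[0].name

# Dummy input: batch of one 640x640 RGB image, normalized to [0, 1].
dummy = np.random.rand(1, 3, 640, 640).astype(np.float32)
outputs = session.run(None, {input_name: dummy})
print(outputs[0].shape)  # raw predictions; apply confidence filtering + NMS downstream
```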
Summary of Key Commands
| Task | Command Example |
|---|---|
| Clone YOLOv5 | git clone https://github.com/ultralytics/yolov5 |
| Install Dependencies | cd yolov5 && pip install -r requirements.txt |
| Train Model | python train.py --img 640 --batch 16 --epochs 50 --data custom_data.yaml --weights yolov5s.pt --name custom_yolov5 |
| Run Inference | python detect.py --weights runs/train/custom_yolov5/weights/best.pt --img 640 --conf 0.25 --source data/images/test.jpg |
| Evaluate Model | python val.py --weights runs/train/custom_yolov5/weights/best.pt --data custom_data.yaml --img 640 |
| Export Model (ONNX) | python export.py --weights runs/train/custom_yolov5/weights/best.pt --include onnx |
Final Thoughts
YOLOv5 stands out as a powerful and versatile tool for object detection tasks. Its suitability for both research exploration and real-world deployment, combined with its PyTorch backend, real-time performance, and straightforward customization options, makes it an excellent choice for developers and researchers working with computer vision applications.
Potential Interview Questions
- What is YOLOv5, and how does it differ from earlier YOLO versions?
  - Answer Focus: State-of-the-art, real-time, PyTorch-based, speed/accuracy improvements, architecture changes (e.g., CSPDarknet backbone, PANet neck).
- How do you prepare a custom dataset for training YOLOv5?
  - Answer Focus: Directory structure (images/train, labels/train, etc.), annotation format (.txt files), YOLO annotation schema (<class_id> <x_center> <y_center> <width> <height>), normalization of coordinates.
- What is the structure and format of YOLO-style bounding box annotations?
  - Answer Focus: <class_id>, <x_center>, <y_center>, <width>, <height>, all normalized between 0 and 1. Explain each component and the normalization aspect.
- Explain the contents and role of the custom_data.yaml file in YOLOv5.
  - Answer Focus: Defines dataset paths (train, val), number of classes (nc), and class names (names). It acts as a configuration bridge between the dataset and the training script.
- Describe the key parameters in the YOLOv5 training command (train.py).
  - Answer Focus: --img, --batch, --epochs, --data, --weights, --name. Explain what each parameter controls and its importance.
- What pre-trained weights are available in YOLOv5 and how are they used?
  - Answer Focus: List the available sizes (n, s, m, l, x) and explain that they are used with the --weights argument to leverage transfer learning, improving convergence and performance.
- How can you perform inference using a trained YOLOv5 model?
  - Answer Focus: Using the detect.py script, specifying the --weights of the trained model, --source for input, and --conf for thresholding. Mention output saving.
- What evaluation metrics are used in YOLOv5 to assess model performance?
  - Answer Focus: Precision, Recall, mAP@0.5, and loss metrics. Briefly explain what each metric indicates.
- How do you export YOLOv5 models to formats like ONNX or CoreML?
  - Answer Focus: Using the export.py script with the --weights and --include arguments. Mention common export formats.
- What are some best practices to improve accuracy and reduce overfitting in YOLOv5 training?
  - Answer Focus: Using appropriate pre-trained weights, data augmentation, adjusting learning rate, early stopping, increasing dataset size, hyperparameter tuning, regularization techniques.