Hands-on YOLOv5: Real-time Object Detection Tutorial

Learn to implement real-time object detection with YOLOv5. This tutorial covers setup, custom dataset training, and deployment for your AI projects.

Hands-on: Object Detection with YOLOv5

YOLOv5 (You Only Look Once version 5) is a state-of-the-art, real-time object detection model developed by Ultralytics. It is highly regarded for its exceptional speed, accuracy, and user-friendly PyTorch-based implementation.

This tutorial provides a comprehensive guide to leveraging YOLOv5, covering the entire workflow from setup to model deployment.

Tutorial Outline

  • Setting up YOLOv5
  • Preparing a custom dataset
  • Training a custom YOLOv5 model
  • Running inference on new data
  • Evaluating model performance

Step 1: Clone YOLOv5 Repository and Install Dependencies

Begin by cloning the official YOLOv5 GitHub repository and installing the necessary Python packages.

git clone https://github.com/ultralytics/yolov5
cd yolov5
pip install -r requirements.txt

Step 2: Prepare Your Custom Dataset

YOLOv5 requires your dataset to be organized in a specific structure and format.

Dataset Structure

Your dataset should be organized as follows:

dataset/
├── images/
│   ├── train/
│   └── val/
└── labels/
    ├── train/
    └── val/
  • images/train: Contains training images.
  • images/val: Contains validation images.
  • labels/train: Contains training image annotations in YOLO format.
  • labels/val: Contains validation image annotations in YOLO format.
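If you are starting from scratch, a few lines of Python can create this layout. A minimal sketch, assuming dataset/ as the root directory:

from pathlib import Path

# Create the images/ and labels/ trees, each with train/ and val/ splits
for kind in ("images", "labels"):
    for split in ("train", "val"):
        (Path("dataset") / kind / split).mkdir(parents=True, exist_ok=True)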

Annotation Format (YOLO Format)

Annotations must be provided in .txt files, with each file corresponding to an image and sharing the same filename (e.g., image001.jpg will have image001.txt). Each line within a .txt file represents a bounding box for an object and follows this format:

<class_id> <x_center> <y_center> <width> <height>
  • <class_id>: An integer representing the class of the detected object (starting from 0).
  • <x_center>: The normalized x-coordinate of the center of the bounding box (0 to 1).
  • <y_center>: The normalized y-coordinate of the center of the bounding box (0 to 1).
  • <width>: The normalized width of the bounding box (0 to 1).
  • <height>: The normalized height of the bounding box (0 to 1).

Normalization: All coordinates and dimensions are normalized relative to the image's width and height. For instance, in a 640x480 image with a bounding-box center at pixel (320, 240), the normalized x_center is 320 / 640 = 0.5 and the normalized y_center is 240 / 480 = 0.5.
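
To make the conversion concrete, here is a small helper that turns a pixel-space box (given as corner coordinates) into a YOLO annotation line. The function name and the corner-based input are illustrative choices, not part of YOLOv5 itself:

def to_yolo(x_min, y_min, x_max, y_max, img_w, img_h, class_id):
    """Convert a pixel-space corner box to a normalized YOLO annotation line."""
    x_center = (x_min + x_max) / 2 / img_w
    y_center = (y_min + y_max) / 2 / img_h
    width = (x_max - x_min) / img_w
    height = (y_max - y_min) / img_h
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"

# A 200x200 box centered at (320, 240) in a 640x480 image:
print(to_yolo(220, 140, 420, 340, 640, 480, 0))
# -> "0 0.500000 0.500000 0.312500 0.416667"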


Step 3: Create a Data Configuration File (.yaml)

Create a YAML file (e.g., custom_data.yaml) to define the paths to your dataset and class information.

# Example: custom_data.yaml

train: /path/to/your/dataset/images/train  # Path to training images directory
val: /path/to/your/dataset/images/val    # Path to validation images directory

nc: 3                                      # Number of classes
names: ['car', 'truck', 'person']          # Class names

Explanation of Parameters:

  • train: The directory containing your training images.
  • val: The directory containing your validation images.
  • nc: The total number of object classes in your dataset.
  • names: A list of strings, where each string is the name of a class, ordered according to their <class_id>.
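
Before launching training, a quick sanity check of this file can catch path typos and class-count mismatches early. A minimal sketch, assuming PyYAML is installed:

import os
import yaml  # PyYAML

with open("custom_data.yaml") as f:
    cfg = yaml.safe_load(f)

# nc must equal the number of entries in names
assert cfg["nc"] == len(cfg["names"]), "nc does not match len(names)"

# Both image directories should exist on disk
for key in ("train", "val"):
    assert os.path.isdir(cfg[key]), f"missing directory: {cfg[key]}"

print("custom_data.yaml looks consistent")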

Step 4: Train the YOLOv5 Model

Use the train.py script to train your custom YOLOv5 model. You can leverage pre-trained weights to accelerate the training process and improve performance.

python train.py --img 640 --batch 16 --epochs 50 --data custom_data.yaml --weights yolov5s.pt --name custom_yolov5

Key Training Parameters:

  • --img: Input image size for training (e.g., 640). Images will be resized to this dimension.
  • --batch: Batch size. The number of images processed in parallel during training. Adjust based on your GPU memory.
  • --epochs: Number of training epochs. One epoch is a complete pass through the entire training dataset.
  • --data: Path to your dataset configuration file (.yaml).
  • --weights: Specify pre-trained weights to start training from. This is highly recommended.
    • yolov5n.pt (Nano)
    • yolov5s.pt (Small)
    • yolov5m.pt (Medium)
    • yolov5l.pt (Large)
    • yolov5x.pt (Extra Large)
  • --name: A custom name for this training run. Logs and trained weights will be saved in runs/train/custom_yolov5.
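
If a run is interrupted, train.py also supports resuming from the last saved checkpoint instead of starting over:

python train.py --resume runs/train/custom_yolov5/weights/last.pt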

Step 5: Run Inference

After training, you can use your custom model to detect objects in new images, videos, or other sources.

python detect.py --weights runs/train/custom_yolov5/weights/best.pt --img 640 --conf 0.25 --source data/images/test.jpg

Inference Parameters:

  • --weights: Path to your trained model weights (typically best.pt from the training run).
  • --img: Input image size for inference. Should ideally match the training image size for best results.
  • --conf: Confidence threshold. Only detections with a confidence score above this value will be displayed.
  • --source: The input source for detection. This can be:
    • An image file path (data/images/test.jpg)
    • A directory of images (data/images/)
    • A video file (video.mp4)
    • A webcam stream (0 for default webcam)
    • A URL to an image or video stream

Inference results (images with bounding boxes) will be saved in the runs/detect/ directory.
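
Besides detect.py, a trained model can be loaded programmatically through PyTorch Hub, which is convenient for embedding detection in your own scripts. A minimal sketch (the test image path is the one used above):

import torch

# Load the custom-trained weights via the official PyTorch Hub entry point
model = torch.hub.load("ultralytics/yolov5", "custom",
                       path="runs/train/custom_yolov5/weights/best.pt")
model.conf = 0.25  # confidence threshold, same role as --conf

results = model("data/images/test.jpg")  # also accepts URLs, PIL images, arrays
results.print()                          # print a detection summary to stdout
results.save()                           # save the annotated image under runs/detect/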


Step 6: Evaluate Model Performance

YOLOv5 automatically evaluates your model during training and saves performance metrics. You can also re-run the evaluation on a specific dataset.

After training, the runs/train/custom_yolov5 directory will contain:

  • results.png: A plot summarizing key metrics like Precision, Recall, mAP, and Loss across epochs.
  • results.csv: Per-epoch metrics in tabular form (older releases wrote results.txt instead).

To perform a separate evaluation:

python val.py --weights runs/train/custom_yolov5/weights/best.pt --data custom_data.yaml --img 640

Evaluation Metrics:

  • Precision: The accuracy of positive predictions.
  • Recall: The ability of the model to find all relevant instances.
  • mAP@0.5 (mean Average Precision at IoU=0.5): A standard metric measuring the average precision across all classes at an Intersection over Union (IoU) threshold of 0.5. YOLOv5 also reports mAP@0.5:0.95, which averages over IoU thresholds from 0.5 to 0.95.
  • Loss metrics: Measures of how well the model is learning.
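
Because mAP is built on IoU, it helps to see that computation directly. A minimal IoU function for axis-aligned boxes; the (x_min, y_min, x_max, y_max) corner format is an illustrative choice:

def iou(box_a, box_b):
    """Intersection over Union for two (x_min, y_min, x_max, y_max) boxes."""
    # Corners of the intersection rectangle
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])

    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

# Two partially overlapping boxes. At mAP@0.5, a prediction counts as a
# true positive only if its IoU with a ground-truth box is at least 0.5.
print(iou((0, 0, 100, 100), (50, 50, 150, 150)))  # ~0.143, so not a match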

Optional: Export the Model

YOLOv5 models can be exported to various formats for deployment on different platforms.

To export your trained model to ONNX format:

python export.py --weights runs/train/custom_yolov5/weights/best.pt --include onnx

YOLOv5 supports exporting to several formats, including:

  • ONNX
  • TorchScript
  • CoreML
  • TensorRT
  • TensorFlow Lite
  • OpenVINO
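
Once exported, the ONNX model can run outside PyTorch with onnxruntime. The sketch below assumes a 640x640 export and, for brevity, skips letterbox preprocessing and postprocessing (box decoding and non-maximum suppression):

import numpy as np
import onnxruntime as ort
from PIL import Image

sess = ort.InferenceSession("runs/train/custom_yolov5/weights/best.onnx")
input_name = sess.get_inputs()[0].name

# Naive preprocessing: resize to 640x640, scale to [0, 1], NCHW float32
img = Image.open("data/images/test.jpg").convert("RGB").resize((640, 640))
x = np.asarray(img, dtype=np.float32).transpose(2, 0, 1)[None] / 255.0

preds = sess.run(None, {input_name: x})[0]
print(preds.shape)  # (1, num_candidates, 5 + nc): raw predictions before NMS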

Summary of Key Commands

  • Clone YOLOv5: git clone https://github.com/ultralytics/yolov5
  • Install dependencies: cd yolov5 && pip install -r requirements.txt
  • Train model: python train.py --img 640 --batch 16 --epochs 50 --data custom_data.yaml --weights yolov5s.pt --name custom_yolov5
  • Run inference: python detect.py --weights runs/train/custom_yolov5/weights/best.pt --img 640 --conf 0.25 --source data/images/test.jpg
  • Evaluate model: python val.py --weights runs/train/custom_yolov5/weights/best.pt --data custom_data.yaml --img 640
  • Export model (ONNX): python export.py --weights runs/train/custom_yolov5/weights/best.pt --include onnx

Final Thoughts

YOLOv5 stands out as a powerful and versatile tool for object detection tasks. Its suitability for both research exploration and real-world deployment, combined with its PyTorch backend, real-time performance, and straightforward customization options, makes it an excellent choice for developers and researchers working with computer vision applications.


Potential Interview Questions

  1. What is YOLOv5, and how does it differ from earlier YOLO versions?
    • Answer Focus: State-of-the-art, real-time, PyTorch-based, speed/accuracy improvements, architecture changes (e.g., CSPDarknet backbone, PANet neck).
  2. How do you prepare a custom dataset for training YOLOv5?
    • Answer Focus: Directory structure (images/train, labels/train, etc.), annotation format (.txt files), YOLO annotation schema (<class_id> <x_center> <y_center> <width> <height>), normalization of coordinates.
  3. What is the structure and format of YOLO-style bounding box annotations?
    • Answer Focus: <class_id>, <x_center>, <y_center>, <width>, <height>, all normalized between 0 and 1. Explain each component and the normalization aspect.
  4. Explain the contents and role of the custom_data.yaml file in YOLOv5.
    • Answer Focus: Defines dataset paths (train, val), number of classes (nc), and class names (names). It acts as a configuration bridge between the dataset and the training script.
  5. Describe the key parameters in the YOLOv5 training command (train.py).
    • Answer Focus: --img, --batch, --epochs, --data, --weights, --name. Explain what each parameter controls and its importance.
  6. What pre-trained weights are available in YOLOv5 and how are they used?
    • Answer Focus: List the available sizes (n, s, m, l, x) and explain that they are used with the --weights argument to leverage transfer learning, improving convergence and performance.
  7. How can you perform inference using a trained YOLOv5 model?
    • Answer Focus: Using the detect.py script, specifying the --weights of the trained model, --source for input, and --conf for thresholding. Mention output saving.
  8. What evaluation metrics are used in YOLOv5 to assess model performance?
    • Answer Focus: Precision, Recall, mAP@0.5, and loss metrics. Briefly explain what each metric indicates.
  9. How do you export YOLOv5 models to formats like ONNX or CoreML?
    • Answer Focus: Using the export.py script with the --weights and --include arguments. Mention common export formats.
  10. What are some best practices to improve accuracy and reduce overfitting in YOLOv5 training?
    • Answer Focus: Using appropriate pre-trained weights, data augmentation, adjusting learning rate, early stopping, increasing dataset size, hyperparameter tuning, regularization techniques.