YOLO FACTSHEET


Model Name

  • YOLO (You Only Look Once), version 4 (YOLOv4)

Overview

  • This document is a FactSheet accompanying the YOLO (v4) model described in the paper ‘YOLOv4: Optimal Speed and Accuracy of Object Detection’.

Purpose

  • This model can be used for real-time object detection.

Intended Domain

  • This model is intended for use in the image processing domain, specifically for detecting and classifying objects in images.

Training Data

  • The model is trained on the MS COCO dataset.

Model Information

  • YOLOv4 is a one-stage object detection model that improves on YOLOv3 by combining a "bag of freebies" (training-time improvements) and a "bag of specials" (architectural modules) introduced in the literature.

Inputs and Outputs

  • Input: Image, Patches, Image Pyramid
  • Output: The output has been adapted to extract embedding logits of dimension 80 (one per MS COCO object category) for a given image; a minimal inference sketch follows.
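
A minimal inference sketch, assuming OpenCV's DNN module and the standard Darknet yolov4.cfg and yolov4.weights release files (the file names, input size, and thresholds below are illustrative, not part of this FactSheet):

    import cv2

    # Load the Darknet-format YOLOv4 network (paths are placeholders;
    # substitute your own config and weights files).
    net = cv2.dnn.readNetFromDarknet("yolov4.cfg", "yolov4.weights")
    model = cv2.dnn_DetectionModel(net)
    # YOLOv4 expects RGB input scaled to [0, 1]; 608x608 is one of the
    # input resolutions benchmarked in the paper.
    model.setInputParams(size=(608, 608), scale=1 / 255.0, swapRB=True)

    image = cv2.imread("example.jpg")  # any test image
    class_ids, confidences, boxes = model.detect(
        image, confThreshold=0.5, nmsThreshold=0.4
    )
    for cls, conf, box in zip(class_ids, confidences, boxes):
        print(f"class={int(cls)} conf={float(conf):.2f} box={box}")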

Performance Metrics

Metric                    Value
Average Precision (AP)    43.5% (on MS COCO test-dev)
Frames Per Second (FPS)   ~65 (on a Tesla V100 GPU)
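
These figures are reported in the YOLOv4 paper. As a minimal sketch, COCO-style AP can be computed with the pycocotools COCOeval API, assuming the model's detections have been exported to COCO's JSON results format (the file names below are placeholders):

    from pycocotools.coco import COCO
    from pycocotools.cocoeval import COCOeval

    # Ground-truth annotations and model detections in COCO JSON format.
    coco_gt = COCO("instances_val2017.json")
    coco_dt = coco_gt.loadRes("yolov4_detections.json")

    evaluator = COCOeval(coco_gt, coco_dt, iouType="bbox")
    evaluator.evaluate()
    evaluator.accumulate()
    evaluator.summarize()  # prints AP@[.50:.95], AP50, AP75, and size-based APs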

Bias

The semantic distribution of the dataset may be biased. Label smoothing is therefore proposed to convert hard labels into soft labels for training, which can make the model more robust to this bias.
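
A minimal sketch of label smoothing over the 80 MS COCO classes, using PyTorch (the smoothing factor eps is illustrative):

    import torch
    import torch.nn.functional as F

    def smooth_labels(targets: torch.Tensor, num_classes: int = 80,
                      eps: float = 0.1) -> torch.Tensor:
        """Convert hard class indices into soft label distributions.

        Each one-hot target is mixed with a uniform distribution, so no
        class receives probability exactly 0 or 1; this makes training
        less sensitive to label noise and dataset bias.
        """
        one_hot = F.one_hot(targets, num_classes).float()
        return one_hot * (1.0 - eps) + eps / num_classes

    # Example: three hard labels become soft distributions over 80 classes.
    soft = smooth_labels(torch.tensor([0, 15, 79]))
    print(soft.shape, soft.max().item())  # torch.Size([3, 80]) 0.90125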


Robustness

Data augmentation is applied to increase the variability of the input images, so that the object detection model is more robust to images obtained from different environments.
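
YOLOv4 itself introduces augmentations such as Mosaic; as a simpler stand-in, the sketch below shows a bounding-box-aware photometric and geometric pipeline built with the albumentations library (the transforms and probabilities are illustrative, not the paper's exact recipe):

    import albumentations as A
    import numpy as np

    # Placeholder sample: a blank image with one COCO-format box
    # ([x_min, y_min, width, height]) and its class label.
    image = np.zeros((480, 640, 3), dtype=np.uint8)
    bboxes = [[100, 100, 50, 80]]
    class_labels = [0]

    transform = A.Compose(
        [
            A.HorizontalFlip(p=0.5),
            A.RandomBrightnessContrast(p=0.3),
            A.HueSaturationValue(p=0.3),
            A.RandomScale(scale_limit=0.2, p=0.5),
        ],
        bbox_params=A.BboxParams(format="coco", label_fields=["class_labels"]),
    )

    augmented = transform(image=image, bboxes=bboxes, class_labels=class_labels)
    aug_image, aug_boxes = augmented["image"], augmented["bboxes"]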


Domain Shift

No domain shift evaluation was performed.


Test Data

The test set is also part of the MS COCO dataset. The data was split 70:20:10 into train:val:test sets. The ratio of samples per class for the 80 object categories is maintained as much as possible in all three splits.
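
Strict stratification is not well defined for multi-object detection images; one common approximation is to stratify on each image's dominant category. A minimal sketch with scikit-learn, where image_ids and dominant_category are hypothetical inputs:

    from sklearn.model_selection import train_test_split

    # Hypothetical inputs: one entry per image, where dominant_category
    # holds the most frequent COCO class in that image and serves as an
    # approximate stratification key.
    image_ids = list(range(1000))
    dominant_category = [i % 80 for i in image_ids]  # placeholder labels

    # First split off 70% for training, keeping class ratios.
    train_ids, rest_ids, _, rest_y = train_test_split(
        image_ids, dominant_category, test_size=0.30,
        stratify=dominant_category, random_state=0,
    )
    # Split the remaining 30% into 20% val and 10% test (one third of it).
    val_ids, test_ids = train_test_split(
        rest_ids, test_size=1 / 3, stratify=rest_y, random_state=0,
    )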


Poor Conditions

  • When the people in an image are very small, another model should be used to perform crowd counting.

Explanation

While the model architecture is well documented in the accompanying paper, the model is still a deep neural network, which largely remains a black box with respect to the explainability of its results and predictions.


Contact Information

Any queries related to the YOLO Detector model can be raised on the model's GitHub repository.