YOLO FACTSHEET


Model Name

  • YOLO (You Only Look Once), version 4 (YOLOv4)

Overview

  • This document is a FactSheet accompanying the YOLO (v4) model described in the paper ‘YOLOv4: Optimal Speed and Accuracy of Object Detection’.

Purpose

  • This model can be used for real-time object detection.

Intended Domain

  • This model is intended for use in the image processing domain, specifically for detecting and classifying objects in images.

Training Data

  • The model is trained on the MS COCO dataset.

Model Information

  • YOLOv4 is a one-stage object detection model that improves on YOLOv3 by combining a "bag of freebies" (training-time improvements) and a "bag of specials" (architectural modules) introduced in the literature.

Inputs and Outputs

  • Input: Image, Patches, Image Pyramid
  • Output: The output has been adapted to extract embedding logits of dimension 80 (one per MS COCO object category) for a given image; a minimal inference sketch follows.
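
A minimal inference sketch, assuming OpenCV's DNN module and the standard Darknet yolov4.cfg and yolov4.weights release files (the file names, input size, and thresholds below are illustrative, not part of this FactSheet):

    import cv2

    # Load the Darknet-format YOLOv4 network (paths are placeholders;
    # substitute your own config and weights files).
    net = cv2.dnn.readNetFromDarknet("yolov4.cfg", "yolov4.weights")
    model = cv2.dnn_DetectionModel(net)
    # YOLOv4 expects RGB input scaled to [0, 1]; 608x608 is one of the
    # input resolutions benchmarked in the paper.
    model.setInputParams(size=(608, 608), scale=1 / 255.0, swapRB=True)

    image = cv2.imread("example.jpg")  # any test image
    class_ids, confidences, boxes = model.detect(
        image, confThreshold=0.5, nmsThreshold=0.4
    )
    for cls, conf, box in zip(class_ids, confidences, boxes):
        print(f"class={int(cls)} conf={float(conf):.2f} box={box}")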

Performance Metrics

Metric                    Value
Average Precision (AP)    43.5% (on MS COCO test-dev)
Frames Per Second (FPS)   ~65 (on a Tesla V100 GPU)
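
These figures are reported in the YOLOv4 paper. As a minimal sketch, COCO-style AP can be computed with the pycocotools COCOeval API, assuming the model's detections have been exported to COCO's JSON results format (the file names below are placeholders):

    from pycocotools.coco import COCO
    from pycocotools.cocoeval import COCOeval

    # Ground-truth annotations and model detections in COCO JSON format.
    coco_gt = COCO("instances_val2017.json")
    coco_dt = coco_gt.loadRes("yolov4_detections.json")

    evaluator = COCOeval(coco_gt, coco_dt, iouType="bbox")
    evaluator.evaluate()
    evaluator.accumulate()
    evaluator.summarize()  # prints AP@[.50:.95], AP50, AP75, and size-based APs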

Bias

The semantic distribution of the dataset may be biased. Label smoothing is therefore proposed to convert hard labels into soft labels for training, which can make the model more robust to this bias.
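
A minimal sketch of label smoothing over the 80 MS COCO classes, using PyTorch (the smoothing factor eps is illustrative):

    import torch
    import torch.nn.functional as F

    def smooth_labels(targets: torch.Tensor, num_classes: int = 80,
                      eps: float = 0.1) -> torch.Tensor:
        """Convert hard class indices into soft label distributions.

        Each one-hot target is mixed with a uniform distribution, so no
        class receives probability exactly 0 or 1; this makes training
        less sensitive to label noise and dataset bias.
        """
        one_hot = F.one_hot(targets, num_classes).float()
        return one_hot * (1.0 - eps) + eps / num_classes

    # Example: three hard labels become soft distributions over 80 classes.
    soft = smooth_labels(torch.tensor([0, 15, 79]))
    print(soft.shape, soft.max().item())  # torch.Size([3, 80]) 0.90125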


Robustness

Data augmentation is applied to increase the variability of the input images, so that the object detection model is more robust to images obtained from different environments.
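
YOLOv4 itself introduces augmentations such as Mosaic; as a simpler stand-in, the sketch below shows a bounding-box-aware photometric and geometric pipeline built with the albumentations library (the transforms and probabilities are illustrative, not the paper's exact recipe):

    import albumentations as A
    import numpy as np

    # Placeholder sample: a blank image with one COCO-format box
    # ([x_min, y_min, width, height]) and its class label.
    image = np.zeros((480, 640, 3), dtype=np.uint8)
    bboxes = [[100, 100, 50, 80]]
    class_labels = [0]

    transform = A.Compose(
        [
            A.HorizontalFlip(p=0.5),
            A.RandomBrightnessContrast(p=0.3),
            A.HueSaturationValue(p=0.3),
            A.RandomScale(scale_limit=0.2, p=0.5),
        ],
        bbox_params=A.BboxParams(format="coco", label_fields=["class_labels"]),
    )

    augmented = transform(image=image, bboxes=bboxes, class_labels=class_labels)
    aug_image, aug_boxes = augmented["image"], augmented["bboxes"]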


Domain Shift

No domain shift evaluation was performed.


Test Data

The test set is also part of the MS COCO dataset. The data was split 70:20:10 into train:val:test sets. The ratio of samples per class for the 80 object categories is maintained as much as possible in all three splits.
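
Strict stratification is not well defined for multi-object detection images; one common approximation is to stratify on each image's dominant category. A minimal sketch with scikit-learn, where image_ids and dominant_category are hypothetical inputs:

    from sklearn.model_selection import train_test_split

    # Hypothetical inputs: one entry per image, where dominant_category
    # holds the most frequent COCO class in that image and serves as an
    # approximate stratification key.
    image_ids = list(range(1000))
    dominant_category = [i % 80 for i in image_ids]  # placeholder labels

    # First split off 70% for training, keeping class ratios.
    train_ids, rest_ids, _, rest_y = train_test_split(
        image_ids, dominant_category, test_size=0.30,
        stratify=dominant_category, random_state=0,
    )
    # Split the remaining 30% into 20% val and 10% test (one third of it).
    val_ids, test_ids = train_test_split(
        rest_ids, test_size=1 / 3, stratify=rest_y, random_state=0,
    )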


Poor Conditions

  • When the people in an image are very small, another model should be used to perform crowd counting.

Explanation

While the model architecture is well documented in the accompanying paper, the model is still a deep neural network, which largely remains a black box with respect to the explainability of its results and predictions.


Contact Information

Any queries related to the YOLO Detector model can be raised on the model's GitHub repository.