# Sample image detection report
###### tags: `Evaluation Examples`
## Motivation
- **Person or organization developing the experiment**: CONABIO Ecoinformatics Development Team
- **Abstract**: Train a model capable of detecting people and wildlife in camera-trap photographs, in order to speed up the labeling of images from the SNMB collection.
- **Proposed solution**: The Faster R-CNN model is retrained with a set of 550K images from the SNMB collection, of which 500K contain no wildlife and 50K contain at least one animal or person. The bounding boxes are grouped into the "Animal" and "Person" categories. The evaluation is carried out on the test partition using the PASCAL VOC metric set, the most widely used standard in biodiversity monitoring experiments, because it is simple and does not require high localization accuracy. The mAP is calculated as the macro-average of the Average Precision (AP) of each category, so that the large imbalance between classes does not affect the final result.
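As a sketch of why the macro-average is used: the mAP weights every class equally regardless of its size, so the very large Empty class counts the same as the tiny Person class. A minimal illustration with hypothetical AP values:

```python
def macro_map(ap_per_class):
    """mAP as the unweighted (macro) mean of the per-class APs, so a
    class with 500K samples counts the same as one with 1K samples."""
    return sum(ap_per_class.values()) / len(ap_per_class)

# Hypothetical AP values, for illustration only.
ap = {"Person": 0.8, "Animal": 0.6, "Empty": 0.4}
```

A frequency-weighted average would instead be dominated almost entirely by the Empty class.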
## Dataset information
- **Collection**: SNMB
- **Version**: 2019, detection
- **URL**: https://snmb.conabio.gob.mx/conabio_ml_collections/images/snmb_2019_detection.tar.gz
### Preprocessing information
- **Preprocessing operations before training**: resize (800, 600)
- **Preprocessing operations during training**: random_horizontal_flip
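The two operations above can be sketched as follows. This is a minimal NumPy illustration (nearest-neighbor resize and a box-aware flip), not the actual training pipeline:

```python
import numpy as np

def resize_nearest(img, out_w=800, out_h=600):
    """Nearest-neighbor resize to (out_w, out_h); a stand-in for the
    resize applied before training (any interpolation would do)."""
    in_h, in_w = img.shape[:2]
    rows = np.arange(out_h) * in_h // out_h
    cols = np.arange(out_w) * in_w // out_w
    return img[rows][:, cols]

def random_horizontal_flip(img, boxes, p=0.5, rng=None):
    """Flip the image left-right with probability p and mirror the
    boxes, given as (xmin, ymin, xmax, ymax) in pixels."""
    if rng is None:
        rng = np.random.default_rng()
    if rng.random() >= p:
        return img, boxes
    w = img.shape[1]
    boxes = np.asarray(boxes, dtype=float)
    # Mirroring swaps the roles of xmin and xmax.
    flipped_boxes = np.stack(
        [w - boxes[:, 2], boxes[:, 1], w - boxes[:, 0], boxes[:, 3]], axis=1)
    return img[:, ::-1], flipped_boxes
```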
### Dataset distribution
Sample distribution in the dataset categories.
| Person | Animal | Empty |
|------- |------ |------- |
| 1,000 | 49,000 | 500,000 |
### Dataset partitions
#### Percentages
Percentage of samples in each dataset partition.
| Train | Test | Validation |
|------ |------- |------ |
| 80% | 10% |10% |
#### Samples
Distribution of samples from each category across the dataset partitions.
| | Train | Test | Validation |
|------- |------ |------- |------ |
| Person | 800 | 100 |100 |
| Animal | 39,200 | 4,900 |4,900 |
| Empty | 400,000 | 50,000 |50,000 |
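The per-class counts above follow directly from applying the 80/10/10 split to the totals in the distribution table; a quick check:

```python
# Per-class totals from the dataset distribution table.
totals = {"Person": 1_000, "Animal": 49_000, "Empty": 500_000}
splits = {"Train": 0.80, "Test": 0.10, "Validation": 0.10}

# Expected number of samples per class in each partition.
partition_counts = {
    cls: {name: round(n * frac) for name, frac in splits.items()}
    for cls, n in totals.items()
}
```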
#### Additional criteria
- **Grouping criteria**: The images were grouped into sequences: images taken at the same location and separated by no more than 2 seconds belong to the same sequence, and all images of a sequence are assigned to the same partition.
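A minimal sketch of this grouping rule (the `location` and `timestamp` field names are hypothetical):

```python
from datetime import datetime, timedelta

def group_into_sequences(images, max_gap=timedelta(seconds=2)):
    """Group camera-trap images into sequences: an image joins the
    current sequence if it comes from the same location and its
    timestamp is at most `max_gap` after the previous image."""
    ordered = sorted(images, key=lambda im: (im["location"], im["timestamp"]))
    sequences = []
    for im in ordered:
        last = sequences[-1][-1] if sequences else None
        if (last is not None
                and last["location"] == im["location"]
                and im["timestamp"] - last["timestamp"] <= max_gap):
            sequences[-1].append(im)
        else:
            sequences.append([im])
    return sequences
```

Assigning whole sequences (rather than single images) to partitions prevents near-duplicate frames from leaking between train and test.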
### Dataset variability
- **Variability**: Does not apply
## Model data
- **Model type**: Faster R-CNN architecture with a 101-layer ResNet backbone. The model was pretrained on the COCO dataset.
- **Primary intended users**: The users of this model are mainly people dedicated to animal ecology.
- **Primary intended uses**: Monitoring of natural areas, detection of wildlife in photographs of camera traps, filtering of photographs that do not contain fauna.
- **Out of scope uses**: Detection of animal species in photographs that are unlike camera-trap images. For camera-trap photographs from ecosystems very different from those in Mexico, the model may fail to detect wildlife.
## Evaluation data
- **Model performance measures**: Because the dataset is unbalanced and we want to give the same weight to all classes, and because this application does not require high precision in the localization of the detected objects, the PASCAL VOC metric set is preferred. Its base metrics are the precision × recall curve and the Average Precision (AP) for each category, together with the macro-average of the precision × recall curves and the mean of the per-category APs (mAP) for the model as a whole. In this application, the loss of valuable information (photos with wildlife) in the resulting set is penalized more heavily than a larger number of false positives.
- **Decision thresholds**:
- matching_iou_threshold: 0.5
- nms_iou_threshold: 0.3
- nms_max_output_boxes: 50
- group_of_weight: 0.0
- **Approaches to uncertainty and variability**: Does not apply
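A sketch of how thresholds like these are typically applied: `matching_iou_threshold` decides whether a detection matches a ground-truth box during evaluation, while `nms_iou_threshold` and `nms_max_output_boxes` control greedy non-maximum suppression over the raw detections. These are minimal pure-Python versions, not the report's actual implementation:

```python
def iou(a, b):
    """Intersection over union of two boxes (xmin, ymin, xmax, ymax)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def nms(boxes, scores, iou_threshold=0.3, max_output=50):
    """Greedy non-maximum suppression: keep boxes in descending score
    order, dropping any box that overlaps an already-kept box by more
    than `iou_threshold`; return the indices of the kept boxes."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    kept = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in kept):
            kept.append(i)
        if len(kept) == max_output:
            break
    return kept
```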
## Results
### Plots
Below are the precision × recall curves for each class, together with the macro-average curve for the model as a whole:

### Analysis of the results
Each point on the graph corresponds to a particular score threshold, for which precision and recall are calculated. For example, at a threshold of 0.8 only boxes with a probability of 0.8 or higher are taken as valid detections; the rest are discarded.
In general, a curve shows its highest precision and lowest recall at high threshold values. As the threshold decreases, precision tends to drop (more false positives) while recall increases (a larger fraction of all positive examples is found).
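The threshold sweep described above can be sketched as a small function (the data is hypothetical; `detections` pairs each detection score with whether it matched a ground-truth box):

```python
def precision_recall(detections, total_positives, threshold):
    """Precision/recall keeping only detections with score >= threshold.
    `detections` is a list of (score, is_true_positive) pairs and
    `total_positives` is the number of ground-truth boxes."""
    kept = [tp for score, tp in detections if score >= threshold]
    if not kept:
        return 1.0, 0.0  # convention: no detections -> perfect precision
    tp = sum(kept)
    return tp / len(kept), tp / total_positives
```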
The graph above shows the curves for the three classes under analysis, together with the macro-average precision × recall curve over the three classes. The area under each curve approximates the Average Precision (AP) of that class, and the area under the macro-average curve approximates the mean Average Precision (mAP) of the model.
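A sketch of how the area under a precision × recall curve yields the AP, using the all-points interpolation adopted by PASCAL VOC after 2010 (one of several common AP variants):

```python
def average_precision(precisions, recalls):
    """AP as the area under the precision x recall curve with all-points
    interpolation: precision at each recall level is replaced by the
    maximum precision at any recall >= that level, then the area under
    the resulting step function is summed."""
    pts = sorted(zip(recalls, precisions))
    rec = [r for r, _ in pts]
    prec = [p for _, p in pts]
    # Build the monotonically non-increasing precision envelope.
    for i in range(len(prec) - 2, -1, -1):
        prec[i] = max(prec[i], prec[i + 1])
    ap = rec[0] * prec[0]
    for i in range(1, len(rec)):
        ap += (rec[i] - rec[i - 1]) * prec[i]
    return ap
```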
#### Curve analysis for the Person class
The curve for the Person class (green) shows that precision remains very high (0.9) up to recall levels above 0.5, and it even stays close to 0.7 at a recall of 0.8. Because the model performs well for this class, almost all images of people can be recovered even at relatively low score thresholds, so few examples of this class end up in the resulting set and the experts are spared from reviewing many photos that contain only people.
The area under the curve (AUC) approximates the Average Precision (AP), i.e. the average precision over all recall levels. Here it indicates that about 80% of the detections for the Person class are correct, so the model can be trusted to correctly detect people in camera-trap images.
#### Curve analysis for the Animal class
The curve for the Animal class (light blue) shows that precision stays close to 0.5 even at high recall levels. Since for this problem the loss of positive examples (photos with animals) is penalized more than a high number of false positives (photos without animals in which an animal was detected), the score threshold can be set to a very low value so as to keep as many positives as possible. About half of the camera-trap photos the experts then review will indeed contain wildlife, so the review effort is considerably reduced while a very high percentage of the photos with wildlife ends up labeled.
The area under the curve (AUC) indicates that, on average, 60% of the detections for the Animal class actually contain animals.
#### Curve analysis for the Empty class
The curve for the Empty class (red) shows that precision drops sharply at relatively low recall levels, so for this class one cannot be highly confident that an image detected as empty really contains no wildlife.
### Classification examples
#### Best classified samples
##### Person



##### Animal



##### Empty



#### Worst classified samples
##### Person



##### Animal



##### Empty



#### Samples on the threshold between two classes
##### Person



##### Animal



##### Empty



### Results tables
#### Per class
| | Precision | Recall | AP |
| ------ | ----------------------- |:------------------------- | ---- |
| Person | [1., 1., .99, ..., .4] | [0., 0.1, 0.2, ..., 1.] | 0.79 |
| Animal | [1., .9, .89, ..., .45] | [0., 0.15, 0.16, ..., 1.] | 0.60 |
| Empty | [1., .55, .5, ..., .37] | [0., 0.12, 0.2, ..., 1.] | 0.39 |
#### Macro-average of all classes
| Precision | Recall | mAP |
| ----------------------- |:------------------------ | ---- |
| [1., .9, .89, ..., .45] | [0., 0.12, 0.2, ..., 1.] | 0.58 |