# Road defect detection using deep active learning
At [Element AI](https://www.elementai.com/), our teams use our active learning library [Baal](https://github.com/ElementAI/baal) to move quickly from labelling to production models.
If you would like to get a good introduction to active learning, we recommend that you read our initial release [blog post](https://www.elementai.com/news/2019/element-ai-makes-its-bayesian-active-learning-library-open-source).
Recently, the ability to detect road surface defects was identified as an interesting use case for active learning. The end goal was to automatically determine if a segment of road needed to be resurfaced.
More specifically, we needed a rough estimate of the defect area. For this reason, we treated this problem as a **semantic segmentation problem**.
## Data definition
We were able to find a [public dataset](https://github.com/sekilab/RoadDamageDetector), but unfortunately, the labels provided were for bounding boxes only. Consequently, to generate the polygon labels required for semantic segmentation, we involved the Element AI data labelling team to help us define our task.
We identified three types of defects and another feature for detection:
| Cracks | Potholes | Patches | Manholes |
| -------- | -------- | -------- | -------- |
| Img | Img | Img | Img |
## Active learning model definition
To perform active learning, we rely on MC-Dropout (Gal et al.) and BALD (Houlsby et al.) to estimate the uncertainty of each unlabelled sample.
Our model is a U-Net (Ronneberger et al.) with a Dropout layer added before the last convolution; this layer is what lets us use MC-Dropout (more on that later). We trained the network with standard weighted cross-entropy, where the class weights are recomputed automatically from the proportion of pixels per class at each active learning step.
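The post only describes the class-weight computation at a high level. Here is one plausible sketch in NumPy, assuming inverse-frequency weighting (the exact scheme used in the project isn't specified):

```python
import numpy as np

def class_weights_from_masks(masks, n_classes):
    """Inverse-frequency weights for weighted cross-entropy.

    masks: iterable of integer label maps of shape (H, W).
    Rare classes (e.g. potholes) get larger weights than frequent
    ones (e.g. background), counteracting the pixel imbalance.
    """
    counts = np.zeros(n_classes, dtype=np.float64)
    for m in masks:
        counts += np.bincount(m.ravel(), minlength=n_classes)
    freq = counts / counts.sum()
    weights = 1.0 / np.maximum(freq, 1e-8)  # inverse frequency
    return weights / weights.mean()         # normalise to mean 1
```

Recomputing these weights at every active learning step keeps them in sync with the growing labelled set.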
## MC-Dropout and BALD
Monte-Carlo Dropout, otherwise known as MC-Dropout, is a technique proposed by Gal et al. that uses Dropout to approximate the posterior distribution of the model. It can be shown that Dropout acts approximately like a Bayesian ensemble. By drawing Monte-Carlo samples from this posterior, we obtain a distribution of predictions whose variance is high when the model parameters are uncertain. Note that this technique estimates only the epistemic uncertainty.
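A minimal sketch of the idea in plain PyTorch (not Baal's actual API): dropout layers are kept active at inference time, and we stack several stochastic forward passes to get a distribution of predictions.

```python
import torch
import torch.nn as nn

def mc_dropout_predict(model, x, n_samples=20):
    """Run n_samples stochastic forward passes with dropout active.

    Returns a tensor of shape (n_samples, batch, n_classes, ...):
    one softmax prediction per Monte-Carlo sample.
    """
    model.eval()
    # Re-enable only the dropout layers, keeping layers such as
    # batch-norm in evaluation mode.
    for m in model.modules():
        if isinstance(m, (nn.Dropout, nn.Dropout2d)):
            m.train()
    with torch.no_grad():
        preds = torch.stack(
            [model(x).softmax(dim=1) for _ in range(n_samples)]
        )
    return preds
```

The spread across the `n_samples` axis is what the uncertainty heuristic consumes.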
Bayesian Active Learning by Disagreement (BALD) is a heuristic that can be combined with MC-Dropout to quantify the uncertainty of a distribution. A nice property of BALD is that, unlike variance, it doesn't make a Gaussian assumption about the predictive distribution.
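Concretely, BALD is the mutual information between the prediction and the model parameters: the entropy of the mean prediction minus the mean entropy of the individual predictions. A NumPy sketch over a stack of MC samples:

```python
import numpy as np

def bald_score(probs):
    """BALD = H(mean prediction) - mean per-sample entropy.

    probs: array of shape (n_mc_samples, n_classes, ...),
    where each sample is a probability distribution over classes.
    High scores mean the MC samples are individually confident
    but disagree with each other (epistemic uncertainty).
    """
    eps = 1e-12
    mean_p = probs.mean(axis=0)
    entropy_of_mean = -(mean_p * np.log(mean_p + eps)).sum(axis=0)
    mean_entropy = -(probs * np.log(probs + eps)).sum(axis=1).mean(axis=0)
    return entropy_of_mean - mean_entropy
```

Samples that agree (even confidently) score near zero; confident disagreement scores high, which is exactly what we want to send to the labellers.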
Something to keep in mind when using active learning in a real-world project is that uncertainty must be recomputed as quickly as possible. To do so, we limit the number of Monte-Carlo estimations, since each one requires an extra forward pass over the unlabelled pool. In this case, we capped the number of MC samples at 20, which our experiments showed to be a good trade-off between speed and the quality of the computed uncertainties.
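Putting the pieces together, the labelling loop can be sketched as follows. This is illustrative, not Baal's API: `score_fn` stands in for the real uncertainty computation (e.g. BALD over 20 MC-Dropout samples).

```python
import numpy as np

def run_active_learning(n_pool, n_initial, query_size, n_steps,
                        score_fn, seed=0):
    """Skeleton of the loop: score the unlabelled pool, label the top-k."""
    rng = np.random.default_rng(seed)
    labelled = set(rng.choice(n_pool, size=n_initial,
                              replace=False).tolist())
    for _ in range(n_steps):
        # (Re)train the model on `labelled` here, then score the rest
        # of the pool, e.g. with BALD over 20 MC-Dropout samples.
        pool = np.array(sorted(set(range(n_pool)) - labelled))
        scores = score_fn(pool)
        top = pool[np.argsort(scores)[::-1][:query_size]]
        labelled.update(top.tolist())  # these go to the labelling team
    return labelled
```

Each iteration, the most uncertain images are sent for labelling, the model is retrained, and the pool is re-scored.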
## Labelling with active learning
At Element AI, we have a team of labellers with whom the active learning team collaborates.
They work closely with the data scientist in charge of the project so that the data can be easily integrated into the machine learning pipeline. We believe that having a **conversation** with machine learning experts helps labellers produce high-impact labels, and helps the experts better understand the data.
### Cold-start problem
Active learning suffers from the cold-start problem: it can't be used until a sufficient number of samples have been labelled. While some approaches such as coresets (Sener and Savarese) and few-shot learning (Snell et al.) have been proposed, we have not yet experimented with them. Consequently, we randomly label a small amount of data to create a test set and an initial training set. In future work, we aim to integrate coresets as the first step of our active learning pipeline.
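A minimal sketch of that initial random split (the sizes here are illustrative, not the project's actual ones):

```python
import numpy as np

def cold_start_split(n_pool, n_test, n_initial, seed=0):
    """Randomly carve a test set and an initial training set
    out of the unlabelled pool; the remainder stays unlabelled."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n_pool)
    test_idx = perm[:n_test]
    initial_idx = perm[n_test:n_test + n_initial]
    pool_idx = perm[n_test + n_initial:]
    return test_idx, initial_idx, pool_idx
```

Only `test_idx` and `initial_idx` are labelled up front; everything in `pool_idx` is labelled on demand by the active learning loop.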
### When to stop
We monitored several metrics during labelling, including:
* Validation loss
We explained the metrics to the labelling team and showed them what a converged process looked like.
This way, the labelling team can be autonomous and connect with the data science expert assigned to the project when the process converges or when the labelling budget is reached.
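One simple convergence check the team could apply is a patience-based plateau test on validation loss (the post doesn't detail the actual stopping criterion, so this is an illustrative sketch):

```python
def has_converged(val_losses, patience=3, min_delta=1e-3):
    """True when validation loss hasn't improved by at least
    min_delta over the last `patience` active learning steps."""
    if len(val_losses) <= patience:
        return False
    best_before = min(val_losses[:-patience])
    recent_best = min(val_losses[-patience:])
    return recent_best > best_before - min_delta
```

When this fires (or the labelling budget runs out), the labelling team reconnects with the data scientist.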
After a few days of labelling, we decided to stop the labelling effort as the model converged. Here are our findings.
### Time saved
There are 9,900 samples in this dataset; we labelled 900 for training as well as 90 for validation.
We estimate the cost of labelling one image to be 50 cents when requiring high-quality labelling.
We also estimate the time to label one image to be 30 seconds. In the end, because we labelled only 990 of the 9,900 images, **we saved roughly $4,500 and 75 hours**.
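For the record, the arithmetic behind those figures, using the per-image estimates quoted above (the exact values round up slightly to the quoted $4,500 and 75 hours):

```python
# Savings from skipping the images that were never labelled.
total_images = 9900
labelled = 900 + 90                # training + validation
skipped = total_images - labelled  # 8910 images

dollars_saved = skipped * 0.50     # at $0.50 per image -> 4455.0
hours_saved = skipped * 30 / 3600  # at 30 s per image  -> 74.25
```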
Here are some predictions from the resulting models trained on less than 10% of the data.
[Pretty pred on pretty image.]()
## References
[Road Damage Dataset](https://github.com/sekilab/RoadDamageDetector)
[Deep Bayesian Active Learning with Image Data](https://arxiv.org/abs/1703.02910)
[Active Learning for Convolutional Neural Networks: A Core-Set Approach](https://arxiv.org/abs/1708.00489)