---
tags: CVLab Winter Project 2
---

# CVLab 2021 Winter Project

**Deadline: 2021/2/10**

## Problem 2: License Plate Localization

Train | Valid | Test
----- | ----- | ----
![](https://i.imgur.com/oz1KUBH.png) | ![](https://i.imgur.com/beFPYni.png) | ![](https://i.imgur.com/sV3ZNMw.png)

Ground truths are drawn in orange; predictions are drawn in red.

### Overview

Each image contains exactly one license plate. You are asked to localize the 4 corners of the license plate, i.e. predict the $(x, y)$ of each corner, 8 values in total. To reduce the difficulty, you can simply run [this reference code](https://colab.research.google.com/drive/1cJS4IplIvpaQ89lA3DPIyhqZB44nCwtK?usp=sharing) to achieve the baseline.

### Data

To download the data (317 MB):

```
wget -nc 140.114.76.206:8000/ccpd6000.zip
```

SHA256 checksum: `977d7124a53e565c3f2b371a871ee04ebbe572f07deb0b38c5548ddaae0cb2c9`

The data is organized as:

```
ccpd6000/
    train_images/
    test_images/
    train.csv
    sample.csv
```

There are 3000 annotated images for training and 3000 unlabeled images for testing. All images are taken from [CCPD](https://github.com/detectRecog/CCPD).

Each row in `train.csv` has the following fields:

1. `name`: the name of the image; the full path is `ccpd6000/train_images/<name>`
2. `BR_x`, `BR_y`: the position of the bottom-right corner
3. `BL_x`, `BL_y`: the position of the bottom-left corner
4. `TL_x`, `TL_y`: the position of the top-left corner
5. `TR_x`, `TR_y`: the position of the top-right corner

The origin is at the top-left of the image.

`sample.csv` serves as a sample submission. Your submission must have the same format as `sample.csv`. Note that `name` is sorted in alphabetical order.
### Evaluation

The metric is the root-mean-square error between the predicted and ground-truth corner locations over the 3000 testing images:

$$
\newcommand{\norm}[1]{\lVert #1 \rVert}
RMSE = \sqrt{ \frac{1}{4N} \sum_{i=1}^{N} \sum_{j=1}^{4} \norm{\mathbf{p}_i^j - \mathbf{\hat p}_i^j}^2 }
$$

where:

- $N$ is the number of images,
- $j$ is the index of the corner,
- $\mathbf{p}_i^j$ is the predicted location $(x, y)$ of the $j$-th corner of image $i$,
- $\mathbf{\hat{p}}_i^j$ is the ground-truth location $(x, y)$ of the $j$-th corner of image $i$.

To evaluate your prediction `test.csv`, use `curl` to POST the file to the server:

```
curl -F "file=@test.csv" -X POST 140.114.76.206:5000/cs6550 -i
```

If nothing goes wrong, you will get back a dictionary containing the $RMSE$ metric.

![](https://i.imgur.com/lEuqPAz.png)

### Goal

#### Baseline

1. Training & validation.
2. Visualization of 25 training samples and 25 validation samples every epoch during training.
3. Overlay the training and validation losses in the same figure, plotted against step or epoch.
4. Testing, with $RMSE \le 20.0$.

Your notebook should contain a cell that sends your prediction to the server, like the one shown in the [reference code](https://colab.research.google.com/drive/1cJS4IplIvpaQ89lA3DPIyhqZB44nCwtK?usp=sharing).

#### Improvement

$RMSE \le 15.0$

Possible ways:

1. Learning-rate (LR) decay, or a smaller LR.
2. Train longer (typically until the validation loss has converged).
3. Use a deeper model, such as ResNet18, to extract features.
4. A different optimizer, loss, etc.
5. Data augmentation.
6. Regress heatmaps instead of coordinate values, as in human pose estimation.

## Report

- Describe the methods you have tried in this project. The report should be written in a Jupyter Notebook, using Markdown cells for each problem.

## Misc.

- Hand in your `.ipynb` file along with a screenshot of the RMSE result in the comment section of the Facebook group.
- When using Colab, remember to change the "Runtime type" to "GPU" to accelerate training.
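For local validation, the metric above can be computed directly from the formula. The NumPy sketch below takes predictions and ground truths as `(N, 4, 2)` arrays; it mirrors the definition, not necessarily the server's exact implementation:

```python
import numpy as np

def rmse(pred, gt):
    """Root-mean-square corner error.

    pred, gt: arrays of shape (N, 4, 2) holding the (x, y) locations
    of the 4 corners for each of the N images.  Implements
    sqrt( 1/(4N) * sum_i sum_j || p_i^j - phat_i^j ||^2 ).
    """
    pred = np.asarray(pred, dtype=np.float64)
    gt = np.asarray(gt, dtype=np.float64)
    sq_dist = np.sum((pred - gt) ** 2, axis=-1)  # (N, 4) squared distances
    # Averaging over all N*4 corners is exactly the 1/(4N) factor.
    return float(np.sqrt(sq_dist.mean()))
```

Running this on a held-out validation split gives a number directly comparable to the server's $RMSE$, so you can estimate whether you have cleared the 20.0 (baseline) or 15.0 (improvement) bar before submitting.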
Typically the model parameters are initialized randomly, so results may differ between runs. It is your responsibility to make your code reproducible (by fixing the random seed).