###### Lenar Gumerov BS17-SE-02
# Report for Assignment 2
[F19] Introduction to Machine Learning

## Objectives
* **Learn** to preprocess training data and train a model
* **Develop** a RandomForest traffic sign classifier based on the given training data
* **Analyze** the applied methods

After applying all the steps in the instructions, I obtained the following results.

## Results
### -- Result of padding and resizing
I used the "replicating the border" method to increase the image size.

![](https://i.imgur.com/iPoxk2s.png)

### -- Class frequencies in the resulting training set
![](https://i.imgur.com/CPebstQ.png)

Some classes are underrepresented in the dataset.

### -- Examples of augmentations chosen to compensate for the underrepresented classes
| | |
| -------- | -------- |
| Blur ![](https://i.imgur.com/9SgDoYg.png)| Sigmoid correction ![](https://i.imgur.com/3wQIIr9.png)|
| Noise ![](https://i.imgur.com/M8pEJPK.png)| Rotation ![](https://i.imgur.com/4Cfsah9.png) |
| Gamma increase and sigmoid correction ![](https://i.imgur.com/azyKlws.png)| Flip ![](https://i.imgur.com/l5fs5kS.png)|

#### Justification
**None of the chosen methods changes the meaning of a sign**, except flipping, which mirrors the digits on signs that contain numbers. Flipping is still acceptable for most of the other signs.
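
The class balancing described above can be sketched in plain Python. This is only an illustration under stated assumptions, not the code used for the assignment: it assumes that every class is filled up to the size of the largest class, that the method for each extra image is picked at random, and that the set of classes with digits (`no_flip_classes`) is known in advance. The names in `AUGMENTATIONS` are hypothetical labels for the methods in the table, not functions of any library.

```python
import random
from collections import Counter

# Hypothetical labels for the augmentation methods listed in the table above.
AUGMENTATIONS = ["blur", "sigmoid", "noise", "rotation", "gamma_sigmoid", "flip"]


def plan_augmentation(labels, no_flip_classes, seed=0):
    """Return {class_id: [method, ...]} with one method per extra image
    needed to bring every class up to the size of the largest class.

    Assumption: classes in `no_flip_classes` (for example, signs with
    digits) must never be flipped.
    """
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())  # assumed target: size of the largest class
    plan = {}
    for cls, n in sorted(counts.items()):
        # Exclude flipping for classes whose meaning it would change.
        allowed = [m for m in AUGMENTATIONS
                   if not (m == "flip" and cls in no_flip_classes)]
        plan[cls] = [rng.choice(allowed) for _ in range(target - n)]
    return plan


# Toy labels for illustration only: class 0 has 5 images, class 1 has 2, class 2 has 3.
labels = [0] * 5 + [1] * 2 + [2] * 3
plan = plan_augmentation(labels, no_flip_classes={1})
for cls, methods in plan.items():
    print(cls, len(methods), methods)
```

Each entry of the plan tells how many augmented copies a class needs and which method to apply to each copy; the image transformations themselves would be done by an image library of choice.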

### -- The number of images of each class
| Before augmentation| After augmentation|
| -------- | -------- |
|![](https://i.imgur.com/ERZzg7S.png)|![](https://i.imgur.com/1eqKBv4.png)|

### -- Overall accuracy of classifier
```
Validation score: 0.7803039979563162
Test score: 0.7453681710213776
```

### -- Evaluation
| Precision| Recall|
| -------- | -------- |
|![](https://i.imgur.com/5xBGVtX.png)|![](https://i.imgur.com/cfLqw4r.png)|

Samples that were classified incorrectly:

| | |
| -------- | -------- |
|![](https://i.imgur.com/A5ed8zR.png) | ![](https://i.imgur.com/hoEsu6B.png) |
|![](https://i.imgur.com/t8fARun.png)|![](https://i.imgur.com/9w1dnRh.png) |
|![](https://i.imgur.com/FEHpB7A.png) |![](https://i.imgur.com/7B00GvX.png) |

Most of the incorrectly classified images are hard to read even for a human, but some of the mistakes would be obvious to a human. This confirms that the recognition is not ideal, although the overall score shows that it is not bad either.

### -- Experimenting and analysing
#### Non-augmented data training results
```
Validation score: 0.7748115979052241
Test score: 0.7493269992082343
```
Augmented:
```
Validation score: 0.7803039979563162
Test score: 0.7453681710213776
```
My augmentation methods change the scores only slightly: validation accuracy rises by about 0.5 percentage points (0.775 to 0.780), while test accuracy drops by about 0.4 percentage points (0.749 to 0.745). The effect is small because the methods produce near-copies of the original images with only minor changes.

#### Different sizes dependency
| Accuracy | Time |
| -------- | -------- |
| ![](https://i.imgur.com/C8rYYlp.png)| ![](https://i.imgur.com/u9Y2C5z.png)|

Both the accuracy plot and the total time plot are close to linear in the image size: as the size (resolution) of the image increases, the score grows, but training the model on bigger images takes more time.
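
For reference, per-class precision and recall like those plotted in the Evaluation section can be computed directly from the true and predicted labels. The following is a minimal sketch in plain Python, not the code used for the report; the function name `per_class_precision_recall` and the toy labels are made up for illustration. scikit-learn's `precision_score` and `recall_score` with `average=None` should give the same per-class values.

```python
from collections import Counter


def per_class_precision_recall(y_true, y_pred):
    """Return {class: (precision, recall)} for every class seen in either list."""
    true_positive = Counter()     # correct predictions per class
    predicted = Counter(y_pred)   # how often each class was predicted
    actual = Counter(y_true)      # how often each class really occurs
    for t, p in zip(y_true, y_pred):
        if t == p:
            true_positive[t] += 1
    result = {}
    for cls in sorted(set(actual) | set(predicted)):
        # Precision: share of predictions of this class that were correct.
        precision = true_positive[cls] / predicted[cls] if predicted[cls] else 0.0
        # Recall: share of real samples of this class that were found.
        recall = true_positive[cls] / actual[cls] if actual[cls] else 0.0
        result[cls] = (precision, recall)
    return result


# Toy labels for illustration only; not data from the assignment.
y_true = [0, 0, 0, 1, 1, 2]
y_pred = [0, 0, 1, 1, 2, 2]
scores = per_class_precision_recall(y_true, y_pred)
for cls, (precision, recall) in scores.items():
    print(f"class {cls}: precision={precision:.2f} recall={recall:.2f}")
```

Low precision for a class means other signs are often mistaken for it, while low recall means its own samples are often missed, which helps to interpret the misclassified samples shown in the Evaluation section.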