# DEEP-PLANT

## Configurations
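```python=
# Training hyperparameters, collected in a plain dict
args = {}
args['NUM_WORKERS'] = 4
args['EPOCHES'] = 50
args['BATCH_SIZE'] = 64
args['PATIENCE'] = 5
args['VALID_RATIO'] = .2    # fraction of the data held out for validation
args['LR'] = 1e-2           # initial learning rate
args['MIN_LR'] = 1e-5       # lower bound of the learning rate
args['L1_ratio'] = 1e-4     # weight of the L1 regularization term
args['L2_ratio'] = 1e-3     # weight of the L2 regularization term
args['CLIPPING'] = .9       # gradient clipping range [-0.9, +0.9]
args['W_DECAY'] = .9
```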
## Data Augmentation
```python=
from torchvision import transforms

transform = transforms.Compose([
    transforms.Resize(256),
    transforms.RandomCrop(224),
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.RandomVerticalFlip(p=0.3),
    transforms.ToTensor(),
    # ImageNet channel statistics
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225])
])
```
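On a monotonous dataset this pipeline causes severe overfitting, as seen on the Daan Forest Park dataset. The augmentation configuration should be adapted to the data.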
## Save results in Kaggle environment
```python=
import shutil
shutil.make_archive('results', 'zip', '/kaggle/working')
```
## Pretrained model
For the daanForestPark dataset:

| Net | parameters_n | size (MB) | accuracy |
| -------- | -------- | -------- | -------- |
| resNet18 | 11,220,117 | 42.8 | .897 |
| googleNet | 5,687,029 | 21.69 | .849 |
| resnext50_32x4d | 23,154,069 | 88.33 | .922 |
| wide_resnet50_2 | 67,008,405 | 255.62 | .853 |
| shufflenet_v2_x1_0 | 1,340,729 | 0.57 | .962 |
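
For the first four rows, the size column is consistent with storing every parameter as a 4-byte float32 and reporting the total in units of 1024² bytes. The sketch below is an assumption about how the column was derived, not something these notes state; `param_size_mb` is a hypothetical helper name.

```python
BYTES_PER_FLOAT32 = 4


def param_size_mb(num_params: int) -> float:
    """Return the float32 storage size of `num_params` weights, in units of 1024**2 bytes."""
    return num_params * BYTES_PER_FLOAT32 / 1024 ** 2


# Parameter counts copied from the table above.
for name, n in [("resNet18", 11_220_117), ("googleNet", 5_687_029),
                ("resnext50_32x4d", 23_154_069), ("wide_resnet50_2", 67_008_405)]:
    print(f"{name}: {param_size_mb(n):.2f} MB")
```

The shufflenet_v2_x1_0 row does not follow this rule (1,340,729 parameters would give about 5.11 MB), so that value was presumably measured some other way.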
## DaanForestPark Dataset
* 68 categories
* 11,140 shots
### Problems
#### Overfitting
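Likely causes: photos of the same species are too similar, there are too few categories, and there is too little data.
Methods tried to address the overfitting:
1. Train longer (decrease the learning rate)
2. Add regularization
3. Increase the regularization weight
4. Keep train/valid strictly separate
5. Use different augmentation transforms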
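
One way to keep train and valid strictly separate (item 4) is to split the sample indices once, before any augmentation is applied. This is a minimal sketch, assuming a random split with the `VALID_RATIO` of `.2` from the configuration; `split_indices` and the fixed seed are hypothetical and not taken from the original code.

```python
import random


def split_indices(n_samples, valid_ratio=0.2, seed=0):
    """Shuffle sample indices once and return disjoint (train, valid) index lists."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # seeded so the split is reproducible
    n_valid = round(n_samples * valid_ratio)
    return indices[n_valid:], indices[:n_valid]


# 11,140 images, as listed for the DaanForestPark dataset.
train_idx, valid_idx = split_indices(11_140, valid_ratio=0.2)
print(len(train_idx), len(valid_idx))
```

Because the two index lists never overlap, each subset can then get its own transform pipeline without leaking validation images into training.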
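#### Effect of Normalize
Before testing how well the model generalizes, we first need to understand what input makes the model recognize a given plant. Taking rose as an example: if the training set and the test set differ greatly, the model may well fail to recognize it.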

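1. Without Normalize:
Taking rose as an example, I feed rose images from the extra data as input; rose is also a class in the Daan Forest Park dataset. The inputs are 192x192 rose photos, with the following predictions: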

```
the top 1 prediction is 矮牽牛, prob is 1.000
the top 2 prediction is 野菊花, prob is 0.000
the top 3 prediction is 三白草, prob is 0.000
------------------------------
the top 1 prediction is 野菊花, prob is 0.999
the top 2 prediction is 矮牽牛, prob is 0.001
the top 3 prediction is 三白草, prob is 0.000
------------------------------
the top 1 prediction is 野菊花, prob is 1.000
the top 2 prediction is 三色堇, prob is 0.000
the top 3 prediction is 三白草, prob is 0.000
------------------------------
the top 1 prediction is 野菊花, prob is 1.000
the top 2 prediction is 三色堇, prob is 0.000
the top 3 prediction is 三白草, prob is 0.000
------------------------------
the top 1 prediction is 野菊花, prob is 1.000
the top 2 prediction is 三色堇, prob is 0.000
the top 3 prediction is 三白草, prob is 0.000
------------------------------
the top 1 prediction is 野菊花, prob is 1.000
the top 2 prediction is 三色堇, prob is 0.000
the top 3 prediction is 三白草, prob is 0.000
------------------------------
the top 1 prediction is 野菊花, prob is 1.000
the top 2 prediction is 三色堇, prob is 0.000
the top 3 prediction is 三白草, prob is 0.000
------------------------------
the top 1 prediction is 兩色金雞菊, prob is 1.000
the top 2 prediction is 野菊花, prob is 0.000
the top 3 prediction is 三白草, prob is 0.000
------------------------------
the top 1 prediction is 矮牽牛, prob is 1.000
the top 2 prediction is 三色堇, prob is 0.000
the top 3 prediction is 三白草, prob is 0.000
------------------------------
the top 1 prediction is 兩色金雞菊, prob is 1.000
the top 2 prediction is 三色堇, prob is 0.000
the top 3 prediction is 三白草, prob is 0.000
------------------------------
the top 1 prediction is 矮牽牛, prob is 1.000
the top 2 prediction is 三色堇, prob is 0.000
the top 3 prediction is 三白草, prob is 0.000
------------------------------
the top 1 prediction is 野菊花, prob is 1.000
the top 2 prediction is 三色堇, prob is 0.000
the top 3 prediction is 三白草, prob is 0.000
------------------------------
the top 1 prediction is 野菊花, prob is 1.000
the top 2 prediction is 三色堇, prob is 0.000
the top 3 prediction is 三白草, prob is 0.000
------------------------------
the top 1 prediction is 野菊花, prob is 1.000
the top 2 prediction is 三色堇, prob is 0.000
the top 3 prediction is 三白草, prob is 0.000
------------------------------
the top 1 prediction is 野菊花, prob is 1.000
the top 2 prediction is 三色堇, prob is 0.000
the top 3 prediction is 三白草, prob is 0.000
------------------------------
the top 1 prediction is 野菊花, prob is 1.000
the top 2 prediction is 三色堇, prob is 0.000
the top 3 prediction is 三白草, prob is 0.000
------------------------------
```
2. The same data with Normalize

```
the top 1 prediction is 皋月杜鵑, prob is 0.655
the top 2 prediction is 四季秋海棠, prob is 0.169
the top 3 prediction is 桃金孃, prob is 0.084
------------------------------
the top 1 prediction is 玫瑰, prob is 0.926
the top 2 prediction is 平戶杜鵑, prob is 0.032
the top 3 prediction is 著生杜鵑, prob is 0.023
------------------------------
the top 1 prediction is 使君子, prob is 0.733
the top 2 prediction is 仙丹花, prob is 0.143
the top 3 prediction is 孤挺花, prob is 0.069
------------------------------
the top 1 prediction is 台灣金絲桃, prob is 0.264
the top 2 prediction is 玫瑰, prob is 0.263
the top 3 prediction is 蔓花生, prob is 0.131
------------------------------
the top 1 prediction is 軟枝黃蟬, prob is 0.215
the top 2 prediction is 孤挺花, prob is 0.124
the top 3 prediction is 台灣萍蓬草, prob is 0.116
------------------------------
the top 1 prediction is 蜀葵, prob is 0.439
the top 2 prediction is 翠蘆莉, prob is 0.367
the top 3 prediction is 含羞草, prob is 0.035
------------------------------
the top 1 prediction is 矮牽牛, prob is 0.534
the top 2 prediction is 翠蘆莉, prob is 0.457
the top 3 prediction is 非洲鳳仙花, prob is 0.006
------------------------------
the top 1 prediction is 阿勃勒, prob is 0.568
the top 2 prediction is 著生杜鵑, prob is 0.332
the top 3 prediction is 兩色金雞菊, prob is 0.050
------------------------------
the top 1 prediction is 四季秋海棠, prob is 0.841
the top 2 prediction is 蜀葵, prob is 0.086
the top 3 prediction is 天竺葵, prob is 0.033
------------------------------
the top 1 prediction is 南美蟛蜞菊, prob is 0.560
the top 2 prediction is 兩色金雞菊, prob is 0.124
the top 3 prediction is 軟枝黃蟬, prob is 0.115
------------------------------
the top 1 prediction is 使君子, prob is 0.683
the top 2 prediction is 著生杜鵑, prob is 0.262
the top 3 prediction is 厚皮香, prob is 0.023
------------------------------
the top 1 prediction is 金絲桃, prob is 0.410
the top 2 prediction is 平戶杜鵑, prob is 0.243
the top 3 prediction is 瑪格麗特, prob is 0.147
------------------------------
the top 1 prediction is 著生杜鵑, prob is 0.747
the top 2 prediction is 孤挺花, prob is 0.101
the top 3 prediction is 九重葛, prob is 0.051
------------------------------
the top 1 prediction is 槭葉牽牛, prob is 0.434
the top 2 prediction is 杜鵑花仙子, prob is 0.259
the top 3 prediction is 矮牽牛, prob is 0.104
------------------------------
the top 1 prediction is 平戶杜鵑, prob is 0.144
the top 2 prediction is 金露花, prob is 0.125
the top 3 prediction is 軟枝黃蟬, prob is 0.118
------------------------------
the top 1 prediction is 著生杜鵑, prob is 0.846
the top 2 prediction is 玫瑰, prob is 0.056
the top 3 prediction is 月桃, prob is 0.031
------------------------------
```
3. Image difference between the two
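
Numerically, the two inputs differ by the per-channel transform `(value - mean) / std` that `transforms.Normalize` applies after `ToTensor` has scaled pixels to `[0, 1]`. The sketch below applies that arithmetic to single pixels using the mean and std from the augmentation pipeline; `normalize_pixel` is a hypothetical helper for illustration, not part of torchvision.

```python
MEAN = [0.485, 0.456, 0.406]
STD = [0.229, 0.224, 0.225]


def normalize_pixel(rgb):
    """Apply (value - mean) / std per channel to one RGB pixel scaled to [0, 1]."""
    return [(v - m) / s for v, m, s in zip(rgb, MEAN, STD)]


white = normalize_pixel([1.0, 1.0, 1.0])
black = normalize_pixel([0.0, 0.0, 0.0])
print([round(v, 3) for v in white])
print([round(v, 3) for v in black])
```

A model trained on normalized inputs expects values spread over roughly -2 to +2.6, while un-normalized inputs stay inside `[0, 1]`. That mismatch is a plausible reason the un-normalized predictions collapse onto a few classes with probability 1.000.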


#### Effect of input size
All runs below use normalized data, because the model only classifies correctly after normalization.
1. 192x192

> correct (top 1): 0 times
2. 224x224

> correct (top 1): 4 times
> top 2: 0 times
```
the top 1 prediction is 槭葉牽牛, prob is 0.977
the top 2 prediction is 皋月杜鵑, prob is 0.009
the top 3 prediction is 矮牽牛, prob is 0.009
------------------------------
the top 1 prediction is 玫瑰, prob is 0.620
the top 2 prediction is 平戶杜鵑, prob is 0.314
the top 3 prediction is 著生杜鵑, prob is 0.035
------------------------------
the top 1 prediction is 平戶杜鵑, prob is 0.957
the top 2 prediction is 野薑花, prob is 0.026
the top 3 prediction is 著生杜鵑, prob is 0.011
------------------------------
the top 1 prediction is 玫瑰, prob is 0.837
the top 2 prediction is 平戶杜鵑, prob is 0.111
the top 3 prediction is 著生杜鵑, prob is 0.028
------------------------------
the top 1 prediction is 玫瑰, prob is 0.705
the top 2 prediction is 野薑花, prob is 0.213
the top 3 prediction is 南美蟛蜞菊, prob is 0.041
------------------------------
the top 1 prediction is 玫瑰, prob is 0.872
the top 2 prediction is 非洲鳳仙花, prob is 0.065
the top 3 prediction is 桃金孃, prob is 0.050
------------------------------
the top 1 prediction is 使君子, prob is 0.735
the top 2 prediction is 孤挺花, prob is 0.144
the top 3 prediction is 南美朱槿, prob is 0.052
------------------------------
the top 1 prediction is 孤挺花, prob is 0.962
the top 2 prediction is 阿勃勒, prob is 0.009
the top 3 prediction is 著生杜鵑, prob is 0.008
------------------------------
the top 1 prediction is 阿勃勒, prob is 0.956
the top 2 prediction is 瑪格麗特, prob is 0.025
the top 3 prediction is 蜀葵, prob is 0.006
------------------------------
the top 1 prediction is 台灣金絲桃, prob is 0.478
the top 2 prediction is 南美蟛蜞菊, prob is 0.447
the top 3 prediction is 台灣萍蓬草, prob is 0.027
------------------------------
the top 1 prediction is 四季秋海棠, prob is 0.787
the top 2 prediction is 蜀葵, prob is 0.095
the top 3 prediction is 平戶杜鵑, prob is 0.091
------------------------------
the top 1 prediction is 矮牽牛, prob is 0.619
the top 2 prediction is 翠蘆莉, prob is 0.371
the top 3 prediction is 非洲鳳仙花, prob is 0.007
------------------------------
the top 1 prediction is 野薑花, prob is 0.969
the top 2 prediction is 平戶杜鵑, prob is 0.021
the top 3 prediction is 玫瑰, prob is 0.003
------------------------------
the top 1 prediction is 阿勃勒, prob is 0.909
the top 2 prediction is 蜀葵, prob is 0.024
the top 3 prediction is 艷紫荊, prob is 0.018
------------------------------
the top 1 prediction is 軟枝黃蟬, prob is 0.247
the top 2 prediction is 野薑花, prob is 0.132
the top 3 prediction is 瑪格麗特, prob is 0.115
------------------------------
the top 1 prediction is 金絲桃, prob is 0.224
the top 2 prediction is 蜀葵, prob is 0.211
the top 3 prediction is 平戶杜鵑, prob is 0.202
------------------------------
```
3. 311x311

> correct (top 1): 4 times
> top 2: 2 times
```
the top 1 prediction is 非洲鳳仙花, prob is 0.912
the top 2 prediction is 孤挺花, prob is 0.082
the top 3 prediction is 玫瑰, prob is 0.002
------------------------------
the top 1 prediction is 玫瑰, prob is 0.828
the top 2 prediction is 野薑花, prob is 0.100
the top 3 prediction is 南美蟛蜞菊, prob is 0.033
------------------------------
the top 1 prediction is 槭葉牽牛, prob is 0.973
the top 2 prediction is 皋月杜鵑, prob is 0.017
the top 3 prediction is 矮牽牛, prob is 0.004
------------------------------
the top 1 prediction is 皋月杜鵑, prob is 0.193
the top 2 prediction is 桃金孃, prob is 0.148
the top 3 prediction is 蜀葵, prob is 0.136
------------------------------
the top 1 prediction is 皋月杜鵑, prob is 0.888
the top 2 prediction is 玫瑰, prob is 0.072
the top 3 prediction is 四季秋海棠, prob is 0.037
------------------------------
the top 1 prediction is 金魚草, prob is 0.724
the top 2 prediction is 玫瑰, prob is 0.083
the top 3 prediction is 天竺葵, prob is 0.054
------------------------------
the top 1 prediction is 南美蟛蜞菊, prob is 0.507
the top 2 prediction is 台灣金絲桃, prob is 0.416
the top 3 prediction is 台灣萍蓬草, prob is 0.022
------------------------------
the top 1 prediction is 阿勃勒, prob is 0.972
the top 2 prediction is 瑪格麗特, prob is 0.014
the top 3 prediction is 蜀葵, prob is 0.005
------------------------------
the top 1 prediction is 矮牽牛, prob is 0.571
the top 2 prediction is 翠蘆莉, prob is 0.420
the top 3 prediction is 非洲鳳仙花, prob is 0.007
------------------------------
the top 1 prediction is 玫瑰, prob is 0.871
the top 2 prediction is 非洲鳳仙花, prob is 0.048
the top 3 prediction is 桃金孃, prob is 0.046
------------------------------
the top 1 prediction is 阿勃勒, prob is 0.836
the top 2 prediction is 蜀葵, prob is 0.037
the top 3 prediction is 艷紫荊, prob is 0.037
------------------------------
the top 1 prediction is 孤挺花, prob is 0.884
the top 2 prediction is 軟枝黃蟬, prob is 0.041
the top 3 prediction is 月桃, prob is 0.028
------------------------------
the top 1 prediction is 孤挺花, prob is 0.487
the top 2 prediction is 仙丹花, prob is 0.395
the top 3 prediction is 玫瑰, prob is 0.036
------------------------------
the top 1 prediction is 玫瑰, prob is 0.333
the top 2 prediction is 皋月杜鵑, prob is 0.296
the top 3 prediction is 著生杜鵑, prob is 0.188
------------------------------
the top 1 prediction is 玫瑰, prob is 0.574
the top 2 prediction is 平戶杜鵑, prob is 0.346
the top 3 prediction is 著生杜鵑, prob is 0.037
------------------------------
the top 1 prediction is 著生杜鵑, prob is 0.581
the top 2 prediction is 平戶杜鵑, prob is 0.271
the top 3 prediction is 石竹, prob is 0.020
------------------------------
```
4. 512x512

> correct (top 1): 1 time
> top 2: 3 times
```
the top 1 prediction is 著生杜鵑, prob is 0.530
the top 2 prediction is 孤挺花, prob is 0.383
the top 3 prediction is 南美朱槿, prob is 0.055
------------------------------
the top 1 prediction is 著生杜鵑, prob is 0.771
the top 2 prediction is 兩色金雞菊, prob is 0.172
the top 3 prediction is 仙丹花, prob is 0.034
------------------------------
the top 1 prediction is 槭葉牽牛, prob is 0.978
the top 2 prediction is 皋月杜鵑, prob is 0.009
the top 3 prediction is 矮牽牛, prob is 0.007
------------------------------
the top 1 prediction is 台灣蛇莓, prob is 0.324
the top 2 prediction is 天竺葵, prob is 0.261
the top 3 prediction is 平戶杜鵑, prob is 0.245
------------------------------
the top 1 prediction is 杜鵑花仙子, prob is 0.419
the top 2 prediction is 槭葉牽牛, prob is 0.288
the top 3 prediction is 矮牽牛, prob is 0.104
------------------------------
the top 1 prediction is 皋月杜鵑, prob is 0.936
the top 2 prediction is 玫瑰, prob is 0.047
the top 3 prediction is 四季秋海棠, prob is 0.015
------------------------------
the top 1 prediction is 使君子, prob is 0.783
the top 2 prediction is 孤挺花, prob is 0.134
the top 3 prediction is 南美朱槿, prob is 0.043
------------------------------
the top 1 prediction is 孤挺花, prob is 0.635
the top 2 prediction is 著生杜鵑, prob is 0.268
the top 3 prediction is 使君子, prob is 0.088
------------------------------
the top 1 prediction is 金魚草, prob is 0.536
the top 2 prediction is 玫瑰, prob is 0.189
the top 3 prediction is 久留米杜鵑, prob is 0.079
------------------------------
the top 1 prediction is 艷紫荊, prob is 0.449
the top 2 prediction is 鳳凰木, prob is 0.374
the top 3 prediction is 盾柱木, prob is 0.082
------------------------------
the top 1 prediction is 平戶杜鵑, prob is 0.796
the top 2 prediction is 玫瑰, prob is 0.151
the top 3 prediction is 野菊花, prob is 0.022
------------------------------
the top 1 prediction is 天竺葵, prob is 0.478
the top 2 prediction is 四季秋海棠, prob is 0.232
the top 3 prediction is 蜀葵, prob is 0.181
------------------------------
the top 1 prediction is 著生杜鵑, prob is 0.296
the top 2 prediction is 軟枝黃蟬, prob is 0.152
the top 3 prediction is 野薑花, prob is 0.101
------------------------------
the top 1 prediction is 杜鵑花仙子, prob is 0.736
the top 2 prediction is 蜀葵, prob is 0.243
the top 3 prediction is 平戶杜鵑, prob is 0.007
------------------------------
the top 1 prediction is 玫瑰, prob is 0.902
the top 2 prediction is 桃金孃, prob is 0.036
the top 3 prediction is 非洲鳳仙花, prob is 0.034
------------------------------
the top 1 prediction is 仙丹花, prob is 0.370
the top 2 prediction is 非洲鳳仙花, prob is 0.308
the top 3 prediction is 使君子, prob is 0.246
------------------------------
```
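
The hit counts quoted for each size can be tallied from the printed logs rather than by eye. This is a minimal sketch, assuming the log format shown above; `count_hits` is a hypothetical helper and the three-block `sample_log` is copied from the 224x224 run for illustration only.

```python
import re

LINE_RE = re.compile(r"the top (\d+) prediction is (.+?), prob is ([0-9.]+)")


def count_hits(log_text, target, max_rank=3):
    """Count how often `target` appears at each rank (1..max_rank) in the log."""
    hits = {rank: 0 for rank in range(1, max_rank + 1)}
    for match in LINE_RE.finditer(log_text):
        rank, label = int(match.group(1)), match.group(2)
        if label == target and rank in hits:
            hits[rank] += 1
    return hits


sample_log = """\
the top 1 prediction is 玫瑰, prob is 0.620
the top 2 prediction is 平戶杜鵑, prob is 0.314
the top 3 prediction is 著生杜鵑, prob is 0.035
------------------------------
the top 1 prediction is 平戶杜鵑, prob is 0.957
the top 2 prediction is 野薑花, prob is 0.026
the top 3 prediction is 著生杜鵑, prob is 0.011
------------------------------
the top 1 prediction is 野薑花, prob is 0.969
the top 2 prediction is 平戶杜鵑, prob is 0.021
the top 3 prediction is 玫瑰, prob is 0.003
------------------------------
"""
print(count_hits(sample_log, "玫瑰"))
```

Here 玫瑰 (rose) is the expected label, so the count at rank 1 corresponds to the "correct" figure and the count at rank 2 to the "top 2" figure.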
## (Optional) Extra Dataset

| context | url |
| -------- | -------- |
| flowers | https://www.kaggle.com/rednivrug/flower-recognition-he#10002.jpg |
| flowers | https://www.kaggle.com/alxmamaev/flowers-recognition |
| flowers | https://www.kaggle.com/msheriey/104-flowers-garden-of-eden |
| flowers | https://www.kaggle.com/eswarkamineni/flower-data#image_05088.jpg |
| cat and dog | https://www.kaggle.com/tongpython/cat-and-dog |
| cat and dog | https://www.kaggle.com/thesherpafromalabama/cats-and-dogs-sentdex-tutorial#10001.jpg |
| animal | https://www.kaggle.com/alessiocorrado99/animals10 |
| apple/banana/orange | https://www.kaggle.com/sriramr/apples-bananas-oranges |
| coin | https://www.kaggle.com/wanderdust/coin-images |
### Train individually on an extra dataset
https://www.kaggle.com/msheriey/104-flowers-garden-of-eden
### Pretrained models adopted
* ShuffleNet
* AlexNet
* EfficientNet
* resNet18
| Net | parameters_n | size (MB) | accuracy |
| -------- | -------- | -------- | -------- |
| AlexNet(class104) | 57,429,928 | 228.02 | .70 |
| AlexNet(class54) | 57,429,928 | 228.02 | .75 |
| shuffleNet(class104) | 1,308,954 | 47.25 | .67 |
| shuffleNet(class54) | 1,308,954 | 47.25 | .77 |
| resNet18(class104) | 11,229,864 | 100.46 | .79 |
| resNet18(class90) | 11,229,864 | 100.46 | .81 |
| resNet18(class104) without regularization | 11,229,864 | 100.46 | .81 |
#### latest model
https://github.com/lukemelas/EfficientNet-PyTorch
```python=
from efficientnet_pytorch import EfficientNet
```
### Difficulties
1. The number of ground-truth classes is as high as 104 species
2. The same species comes in various colors, e.g., white/red roses
3. Imbalanced data
### Improvement
1. add **rotation** augmentation
2. **breakthrough:** decrease the learning rate
> from `1e-2 ~ 1e-5` to `1e-4 ~ 1e-7`
3. batch size from 32 to 64
> avoid overfitting
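
The notes give the learning-rate range (`1e-4 ~ 1e-7`) but not how the rate moves between the two bounds. The sketch below assumes a plateau-based decay: multiply the rate by a factor whenever validation loss has failed to improve for more than `patience` epochs, and never go below the lower bound. `plateau_schedule`, `factor=0.1`, and the reuse of `PATIENCE = 5` for this purpose are all assumptions, not the original training code.

```python
def plateau_schedule(val_losses, lr=1e-4, min_lr=1e-7, factor=0.1, patience=5):
    """Return the learning rate in effect after each epoch under plateau-based decay."""
    best = float("inf")
    bad_epochs = 0
    history = []
    for loss in val_losses:
        if loss < best:
            best, bad_epochs = loss, 0
        else:
            bad_epochs += 1
        if bad_epochs > patience:
            # No improvement for more than `patience` epochs: shrink the step.
            lr = max(lr * factor, min_lr)
            bad_epochs = 0
        history.append(lr)
    return history


# A validation loss that never improves after the first epoch.
print(plateau_schedule([1.0] * 7))
```

With a flat loss the rate drops by one decade every six epochs until it reaches the lower bound, which is how a single run can cover the whole `1e-4 ~ 1e-7` range.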
### Experiments
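* wild rose, wild geranium: same kind
* wallflower, petunia: the photos themselves are hard to tell apart
* columbine, lenren_rose: the photos themselves are hard to tell apart
* daffodil, petunia: binary classification (shuffleNet predicts petunia for daffodil)
* bromelia, frangipani: binary classification (shuffleNet predicts frangipani for bromelia)
#### Hard to distinguish: clusters of flowers

> The misclassified one is snapdragon, whose photos show many colors and flowers in clusters
> snapdragon is hard to separate by color; the guess is that the clustering is the cause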
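#### Photos that make no sense


#### Unusual background color

#### Same species, different colors

#### Few samples in a class, or a single atypical image within a class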




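> 37 images
> Only one is not in bloom
> The characteristics of a carnation are not visible
#### Genuinely too similar

> Prediction: cyclamen
> Answer: common tulip
> From this photo alone it is hard to tell that it is a tulip

> Prediction: bougainvillea
> Answer: primula
> From this photo alone it is hard to tell that it is a primula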
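#### Remove data with trend <= 0 (problematic: it also removes classes that are already learned well)
How is the trend computed?
> Treat the precision or recall collected per epoch as time-series data and sum the slopes between consecutive epochs. A sum greater than 0 means performance improved during training; otherwise it did not.
> Example: the sequence `[1,4,-1,1]` yields the differences `[NaN,3,-5,2]`, which sum to `0`, meaning this class's trend is neither rising nor falling.

The original number of labels is 104; after filtering, 54 remain, almost 50% fewer. Accuracy on the validation set rises from 37% to 43%.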
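
The trend rule can be sketched in a few lines. This is an illustration under the definition given above (sum of consecutive differences, keep a class only if the sum is strictly positive); `trend`, `keep_classes`, and the class names `"a"` and `"b"` are hypothetical.

```python
def trend(scores):
    """Sum of the differences between consecutive per-epoch scores."""
    slopes = [later - earlier for earlier, later in zip(scores, scores[1:])]
    return sum(slopes)


def keep_classes(per_class_scores):
    """Keep only the classes whose trend is strictly positive."""
    return [name for name, scores in per_class_scores.items() if trend(scores) > 0]


print(trend([1, 4, -1, 1]))
print(keep_classes({"a": [0.1, 0.3, 0.5], "b": [0.9, 0.9, 0.9]}))
```

Note that the sum of consecutive differences telescopes to `scores[-1] - scores[0]`, so a class that starts high and stays high (like `"b"`) gets a trend of 0 and is dropped. This is consistent with the problem flagged in the heading.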

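#### Switching to alexNet (more parameters, leading to overfitting)
2020.04.14 - Still no improvement by epoch 17
2020.04.15 - Changing the lr range brought a breakthrough
> This suggests the differences between classes are very subtle, so the gradient step must be made smaller
> But it also introduces a problem: overfitting

2020.04.15 - Increase the influence of the regularization loss
> To address overfitting: the l2 regularization term was observed to be around 200~240, so the weight of the regularization loss was changed from `1e-3` to `1e-2`
2020.04.15 - Use `albumentations` to try to produce more diverse augmentation
> Accuracy improved by 1%

2020.04.15 - Raise the regularization weight and add l1 regularization
> Accuracy reached 70%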
```python=
args['L1_ratio'] = 1e-4
args['L2_ratio'] = 1e-2
```
Raise it further?
```python=
args['L1_ratio'] = 1e-3
args['L2_ratio'] = 1e-2
```
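> Accuracy: 68%

Analysis: val_loss dropped sharply in the second epoch, so model complexity was corrected heavily early in training. Afterwards, as lr decreased, the loss barely changed. The ratio may be set too high, leaving the gradient with too little influence.
Instead, adjust the weight of l2 regularization: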
```python=
args['L1_ratio'] = 1e-4
args['L2_ratio'] = 1e-1
```
> Accuracy: 69%
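
To make the effect of the two ratios concrete, the sketch below adds weighted L1 and L2 penalties to a base loss. It assumes the L1 term is the sum of absolute weights and the L2 term is the sum of squared weights; the notes do not say which exact form was used, and `regularized_loss` is a hypothetical helper.

```python
def regularized_loss(base_loss, weights, l1_ratio=1e-4, l2_ratio=1e-2):
    """Return base_loss plus weighted L1 and L2 penalties over `weights`."""
    l1_term = sum(abs(w) for w in weights)   # assumed form: sum of |w|
    l2_term = sum(w * w for w in weights)    # assumed form: sum of w**2
    return base_loss + l1_ratio * l1_term + l2_ratio * l2_term


# Toy weights, chosen only so the arithmetic is easy to follow.
print(regularized_loss(1.0, [3.0, -4.0]))

# Contribution of an L2 term of 200 under each L2_ratio that was tried.
for ratio in (1e-3, 1e-2, 1e-1):
    print(ratio, ratio * 200)
```

With the L2 term around 200~240, a ratio of `1e-2` adds roughly 2.0~2.4 to the loss and `1e-1` adds roughly 20~24. At that scale the penalty can dominate the classification loss, which fits the observation that raising the ratio further did not help.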
#### Extra experiment: gradient clipping to `[-0.9, +0.9]`
Accuracy: 69%
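
The range `[-0.9, +0.9]` suggests element-wise clipping by value, where every gradient component is clamped into the interval independently. The sketch below shows that operation on a plain list; it is an assumption that value clipping (rather than norm clipping) was used, and `clip_by_value` is a hypothetical helper. In PyTorch the in-place counterpart is `torch.nn.utils.clip_grad_value_`.

```python
def clip_by_value(grads, clip_value=0.9):
    """Clamp each gradient component into [-clip_value, +clip_value]."""
    return [max(-clip_value, min(clip_value, g)) for g in grads]


print(clip_by_value([-2.5, -0.3, 0.0, 0.9, 1.7]))
```

Components already inside the interval pass through unchanged, so clipping only matters on steps where some gradients are large.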
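#### Back to the original shuffleNet
classes=54

> This shows that a fixed `lr=1e-4` already raises accuracy to `0.66`, meaning the model can roughly find the relation between images and species from the data; shortening the step size then takes it from `0.66` to `0.77`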
## Is gradient clipping useful?
> Not usually useful

| resNet18 | Clipping | non-Clipping |
| -------- | -------- | -------- |
| accuracy | .783 | .835 |
| loss | ? | .492 |




## Reference
torchvision models: https://pytorch.org/docs/stable/torchvision/models.html
torch tensorboard: https://pytorch.org/docs/stable/tensorboard.html
confusion matrix: http://martin-mundt.com/tensorboard-figures/
torch tricks: https://zhuanlan.zhihu.com/p/76459295
free GPU memory: https://www.kaggle.com/c/jigsaw-unintended-bias-in-toxicity-classification/discussion/96876
t-sne flower: https://www.kaggle.com/gaborvecsei/plants-t-sne

## Transfer learning
https://hackmd.io/@allen108108/H1MFrV9WH#Zero-Shot-Translation