# CRNN multi-GPU training: adapting Chinese_Characters_Rec to train on multiple GPUs
# 23'0519 inference issue (rename checkpoint state_dict)
While writing the final inference script, the model checkpoint failed to load. Cause and fix:

Training was modified for multi-GPU (`self.cnn = nn.DataParallel(self.cnn)`), so every key in the saved state_dict gains a `module.` segment, e.g. 'cnn.module.conv0.weight', 'rnn.module.1.embedding.bias'. The state_dict produced by the original (non-multi-GPU) crnn code has no such segment: 'cnn.conv0.weight', 'rnn.1.embedding.bias'. Switching back to the non-multi-GPU (including CPU) inference code, loading the checkpoint fails, because it expects 'cnn.conv0.weight' rather than 'cnn.module.conv0.weight' (with the extra `module.`). So the state_dict keys have to be renamed back:
```
import torch
from lib.utils.utils import model_info
import lib.models.crnn as crnn

checkpoint_path = 'output/checkpoints/checkpoint_349_acc_0.9797.pth'
checkpoint = torch.load(checkpoint_path, map_location='cpu')

rename_list = []  # holds old-name -> new-name pairs (for inspection)
old_list = list(checkpoint['state_dict'].keys())
# for ol in old_list:
#     rename_list.append([ol, ol.replace('module.', '')])
# print('rename_list:', rename_list)

# rename every key in place: drop the 'module.' segment
for key in old_list:
    checkpoint['state_dict'][key.replace('module.', '')] = checkpoint['state_dict'].pop(key)

torch.save(checkpoint, 'output/checkpoints/checkpoint_349_acc_0.9797_rename.pth')
```
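For context, the `module.` segment comes from how `nn.DataParallel` registers the wrapped module. A minimal sketch (not from this repo) of where the prefix originates, plus how the renamed checkpoint would then load (the commented part assumes `crnn.get_crnn(config)` as used in train.py below):
```
import torch
import torch.nn as nn

# nn.DataParallel stores the wrapped module under the attribute name `module`,
# so its parameters are registered as 'module.<original name>' in the state_dict
m = nn.Linear(4, 4)
print(list(m.state_dict())[0])                   # 'weight'
print(list(nn.DataParallel(m).state_dict())[0])  # 'module.weight'

# with the keys renamed, the plain single-GPU/CPU model should load directly:
# model = crnn.get_crnn(config)
# ckpt = torch.load('output/checkpoints/checkpoint_349_acc_0.9797_rename.pth',
#                   map_location='cpu')
# model.load_state_dict(ckpt['state_dict'])
```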
# 23'0329 New progress
The changes:
```
lib\core\function.py
inp = inp.cuda()  # add this; this combination is the fastest (vs. to(device))
preds = model(inp)  # using cuda() here doubles throughput: 350 -> 700 samples/sec
...
preds = preds.cpu()  # this is the key line: sim_preds_convert_time_B 1.009s (was 8s)
sim_preds = converter.decode(preds.data, preds_size.data, raw=False)
lib\config\360CC_config.yaml  # only batch_size changed here
TRAIN:
BATCH_SIZE_PER_GPU: 1100
SHUFFLE: True
BEGIN_EPOCH: 0
END_EPOCH: 1000
RESUME:
IS_RESUME: True
FILE: 'output/360CC/checkpoints/checkpoint_316_acc_0.9783.pth'
OPTIMIZER: 'adam'
LR: 0.001
WD: 0.0
LR_STEP: [315, 320] # LR holds the set value below the first milestone epoch, then *0.1; past the second milestone, *0.1 again (1/100 total)
LR_FACTOR: 0.1 # https://pytorch.org/docs/stable/generated/torch.optim.lr_scheduler.MultiStepLR.html
MOMENTUM: 0.0
NESTEROV: False
RMSPROP_ALPHA:
RMSPROP_CENTERED:
FINETUNE:
IS_FINETUNE: False
FINETUNE_CHECKPOINIT: 'output/360CC/checkpoints/checkpoint_266_acc_0.9443.pth'
FREEZE: False
TEST:
BATCH_SIZE_PER_GPU: 1100
SHUFFLE: True # for random test rather than test on the whole validation set
NUM_TEST_BATCH: 1000
NUM_TEST_DISP: 10
```
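As a side note, a minimal sketch of what LR_STEP: [315, 320] with LR_FACTOR: 0.1 does to the learning rate, assuming the repo wires these values into torch.optim.lr_scheduler.MultiStepLR as the linked docs describe:
```
import torch

# one dummy parameter is enough to drive the scheduler
opt = torch.optim.Adam([torch.nn.Parameter(torch.zeros(1))], lr=0.001)
sched = torch.optim.lr_scheduler.MultiStepLR(opt, milestones=[315, 320], gamma=0.1)

for epoch in range(322):
    if epoch in (314, 315, 320):
        print(epoch, sched.get_last_lr())  # 314: [0.001], 315: [1e-4], 320: [1e-5]
    opt.step()
    sched.step()
```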
The results:
```
strat of training time: 2023/03/29 16:55:15
=> creating output\360CC\checkpoints
=> creating output\360CC\log
torch.cuda.device_count() 2
load 3239051 images!
load 170477 images!
Epoch: [317][1/2945] Time 137.122s (137.122s) Speed 8.0 samples/s Data 120.794s (120.794s) Loss 0.00039 (0.00039) preds_time 10.703
Epoch: [317][101/2945] Time 0.953s (2.281s) Speed 1154.1 samples/s Data 0.000s (1.196s) Loss 0.00325 (0.00521) preds_time 0.343
Epoch: [317][201/2945] Time 0.953s (1.618s) Speed 1154.1 samples/s Data 0.000s (0.601s) Loss 0.01150 (0.00523) preds_time 0.295
Epoch: [317][301/2945] Time 1.000s (1.402s) Speed 1100.0 samples/s Data 0.016s (0.402s) Loss 0.00591 (0.00595) preds_time 0.282
Epoch: [317][401/2945] Time 0.984s (1.295s) Speed 1117.5 samples/s Data 0.016s (0.302s) Loss 0.00525 (0.00665) preds_time 0.276
Epoch: [317][501/2945] Time 0.984s (1.233s) Speed 1117.5 samples/s Data 0.000s (0.242s) Loss 0.00417 (0.00689) preds_time 0.274
Epoch: [317][601/2945] Time 1.016s (1.194s) Speed 1083.1 samples/s Data 0.000s (0.202s) Loss 0.00595 (0.00715) preds_time 0.273
Epoch: [317][701/2945] Time 1.000s (1.166s) Speed 1100.0 samples/s Data 0.000s (0.173s) Loss 0.00321 (0.00723) preds_time 0.273
Epoch: [317][801/2945] Time 1.000s (1.147s) Speed 1100.0 samples/s Data 0.000s (0.151s) Loss 0.01254 (0.00772) preds_time 0.274
Epoch: [317][901/2945] Time 1.172s (1.132s) Speed 938.7 samples/s Data 0.000s (0.135s) Loss 0.01000 (0.00779) preds_time 0.274
Epoch: [317][1001/2945] Time 1.031s (1.121s) Speed 1066.7 samples/s Data 0.000s (0.121s) Loss 0.00282 (0.00773) preds_time 0.274
Epoch: [317][1101/2945] Time 1.031s (1.112s) Speed 1066.7 samples/s Data 0.016s (0.110s) Loss 0.00731 (0.00764) preds_time 0.274
Epoch: [317][1201/2945] Time 1.047s (1.104s) Speed 1050.8 samples/s Data 0.000s (0.101s) Loss 0.01373 (0.00757) preds_time 0.274
Epoch: [317][1301/2945] Time 1.000s (1.098s) Speed 1100.0 samples/s Data 0.000s (0.093s) Loss 0.00786 (0.00786) preds_time 0.273
Epoch: [317][1401/2945] Time 1.031s (1.093s) Speed 1066.5 samples/s Data 0.000s (0.087s) Loss 0.01049 (0.00806) preds_time 0.274
Epoch: [317][1501/2945] Time 1.000s (1.088s) Speed 1100.2 samples/s Data 0.000s (0.081s) Loss 0.00809 (0.00807) preds_time 0.274
Epoch: [317][1601/2945] Time 1.007s (1.084s) Speed 1092.8 samples/s Data 0.000s (0.076s) Loss 0.00892 (0.00800) preds_time 0.274
Epoch: [317][1701/2945] Time 1.065s (1.081s) Speed 1032.8 samples/s Data 0.000s (0.071s) Loss 0.01292 (0.00796) preds_time 0.274
Epoch: [317][1801/2945] Time 1.297s (1.079s) Speed 848.2 samples/s Data 0.000s (0.067s) Loss 0.00706 (0.00851) preds_time 0.275
Epoch: [317][1901/2945] Time 1.004s (1.076s) Speed 1095.2 samples/s Data 0.000s (0.064s) Loss 0.00488 (0.00853) preds_time 0.275
Epoch: [317][2001/2945] Time 1.016s (1.074s) Speed 1083.2 samples/s Data 0.000s (0.061s) Loss 0.00518 (0.00851) preds_time 0.275
Epoch: [317][2101/2945] Time 1.016s (1.072s) Speed 1083.1 samples/s Data 0.000s (0.058s) Loss 0.01083 (0.00851) preds_time 0.275
Epoch: [317][2201/2945] Time 1.032s (1.070s) Speed 1065.6 samples/s Data 0.000s (0.055s) Loss 0.01062 (0.00850) preds_time 0.275
Epoch: [317][2301/2945] Time 1.031s (1.068s) Speed 1066.7 samples/s Data 0.000s (0.053s) Loss 0.02037 (0.00857) preds_time 0.276
Epoch: [317][2401/2945] Time 1.016s (1.066s) Speed 1083.1 samples/s Data 0.000s (0.051s) Loss 0.00674 (0.00872) preds_time 0.276
Epoch: [317][2501/2945] Time 1.031s (1.065s) Speed 1066.7 samples/s Data 0.000s (0.049s) Loss 0.01201 (0.00884) preds_time 0.276
Epoch: [317][2601/2945] Time 1.047s (1.064s) Speed 1050.8 samples/s Data 0.000s (0.047s) Loss 0.02099 (0.00890) preds_time 0.277
Epoch: [317][2701/2945] Time 1.031s (1.063s) Speed 1066.7 samples/s Data 0.000s (0.045s) Loss 0.00874 (0.00897) preds_time 0.277
Epoch: [317][2801/2945] Time 1.047s (1.062s) Speed 1050.8 samples/s Data 0.016s (0.044s) Loss 0.00585 (0.00895) preds_time 0.277
Epoch: [317][2901/2945] Time 1.004s (1.061s) Speed 1095.2 samples/s Data 0.000s (0.042s) Loss 0.01214 (0.00892) preds_time 0.278
get_last_lr : [0.001] train epoch time consume: 3134.1819939613342
Epoch: [317][10/155] Time 1.166s (6.662s) Speed 943.1 samples/s Data 0.000s (5.352s) Loss 0.06587 (0.07565) conver_time_A 0.004 preds_time 0.294 sim_preds_convert_time_B 0.849
Epoch: [317][20/155] Time 1.268s (3.952s) Speed 867.4 samples/s Data 0.000s (2.677s) Loss 0.04255 (0.06993) conver_time_A 0.003 preds_time 0.273 sim_preds_convert_time_B 0.839
Epoch: [317][30/155] Time 1.396s (3.063s) Speed 787.8 samples/s Data 0.016s (1.785s) Loss 0.07917 (0.06675) conver_time_A 0.003 preds_time 0.265 sim_preds_convert_time_B 0.851
Epoch: [317][40/155] Time 1.359s (2.621s) Speed 809.3 samples/s Data 0.000s (1.339s) Loss 0.03638 (0.06470) conver_time_A 0.003 preds_time 0.262 sim_preds_convert_time_B 0.859
Epoch: [317][50/155] Time 1.152s (2.354s) Speed 954.9 samples/s Data 0.000s (1.071s) Loss 0.08211 (0.06429) conver_time_A 0.003 preds_time 0.262 sim_preds_convert_time_B 0.861
Epoch: [317][60/155] Time 1.351s (2.171s) Speed 814.5 samples/s Data 0.001s (0.893s) Loss 0.09275 (0.06577) conver_time_A 0.002 preds_time 0.261 sim_preds_convert_time_B 0.857
Epoch: [317][70/155] Time 1.240s (2.056s) Speed 886.8 samples/s Data 0.000s (0.765s) Loss 0.05339 (0.06642) conver_time_A 0.002 preds_time 0.260 sim_preds_convert_time_B 0.867
Epoch: [317][80/155] Time 1.234s (1.958s) Speed 891.2 samples/s Data 0.000s (0.670s) Loss 0.10279 (0.06642) conver_time_A 0.002 preds_time 0.261 sim_preds_convert_time_B 0.864
Epoch: [317][90/155] Time 1.346s (1.890s) Speed 817.3 samples/s Data 0.000s (0.595s) Loss 0.04559 (0.06646) conver_time_A 0.002 preds_time 0.261 sim_preds_convert_time_B 0.872
Epoch: [317][100/155] Time 1.205s (1.824s) Speed 912.7 samples/s Data 0.000s (0.536s) Loss 0.07828 (0.06568) conver_time_A 0.002 preds_time 0.260 sim_preds_convert_time_B 0.865
Epoch: [317][110/155] Time 1.297s (1.770s) Speed 847.8 samples/s Data 0.000s (0.487s) Loss 0.07391 (0.06615) conver_time_A 0.002 preds_time 0.261 sim_preds_convert_time_B 0.860
Epoch: [317][120/155] Time 1.203s (1.727s) Speed 914.2 samples/s Data 0.000s (0.447s) Loss 0.07416 (0.06679) conver_time_A 0.002 preds_time 0.261 sim_preds_convert_time_B 0.857
Epoch: [317][130/155] Time 1.297s (1.692s) Speed 848.2 samples/s Data 0.000s (0.412s) Loss 0.05114 (0.06671) conver_time_A 0.002 preds_time 0.260 sim_preds_convert_time_B 0.858
Epoch: [317][140/155] Time 1.297s (1.670s) Speed 848.3 samples/s Data 0.000s (0.383s) Loss 0.03633 (0.06670) conver_time_A 0.002 preds_time 0.260 sim_preds_convert_time_B 0.867
Epoch: [317][150/155] Time 1.193s (1.641s) Speed 921.7 samples/s Data 0.000s (0.357s) Loss 0.08962 (0.06750) conver_time_A 0.002 preds_time 0.261 sim_preds_convert_time_B 0.863
------------眇---寛----藤----瓠----隅---技----緁----癣----湟----仄--------------- => 眇寛藤瓠隅技緁癣湟仄, gt: 眇寛藤瓠隅技緁癣湟仄 76467
-N---a-t-iikk-a- -f-r-e--r-e--s---------------------------------------- => Natika freres, gt: Natika freres 76467
---莛-----薨-----鲈----雀-----酷---咀-----烱-----谯----承----------------------- => 莛薨鲈雀酷咀烱谯承 , gt: 莛薨鲈雀酷咀烱谯承 76467
-------焚-----辉----琁-----嗓-----饞-----鸳----棚-----媞-----巫----萦------------ => 焚辉琁嗓饞鸳棚媞巫萦, gt: 焚辉琁嗓饞鸳棚媞巫萦 76467
------------貂----剿---佬----答----狡---坏----他---枌----殊---鎘----------------- => 貂剿佬答狡坏他枌殊鎘, gt: 貂剿佬答狡坏他枌殊鎘 76467
---辰------篱------誣------铸-----貫------靂------淫------屍------垂------------ => 辰篱誣铸貫靂淫屍垂 , gt: 辰篱誣铸貫靂淫屍垂 76467
-------------趺----銧---涛---錐----姘---漕漕---僩---褡---碱----帕----------------- => 趺銧涛錐姘漕僩褡碱帕, gt: 趺銧涛錐姘漕僩褡碱帕 76467
-------------峯-- --践-- --方-- -橘-- --腥-- -渣--- -薯----------------- => 峯 践 方 橘 腥 渣 薯, gt: 峯 践 方 橘 腥 渣 薯 76467
---廼----怯-----纔----丑----霧----热-----瘸----遼----褟------------------------- => 廼怯纔丑霧热瘸遼褟 , gt: 廼怯纔丑霧热瘸遼褟 76467
-----莹---- --柜--- --砧--- --替-- ---亢--- --瓶--- --涼---------- => 莹 柜 砧 替 亢 瓶 涼, gt: 莹 柜 砧 替 亢 瓶 涼 76467
[#correct:163821 / #total:170477]
Test loss: 0.0676, accuray: 0.9610
acc validate time consume: 256.3665907382965
is best: True
best acc is: 0.9609566099825783
```
Previously it looked like this:
```
load 3239051 images!
load 170477 images!
Epoch: [280][1/2793] Time 142.859s (142.859s) Speed 8.1 samples/s Data 124.562s (124.562s) Loss 0.01011 (0.01011)
Epoch: [280][101/2793] Time 1.031s (2.434s) Speed 1124.8 samples/s Data 0.000s (1.234s) Loss 0.01207 (0.02511)
...
Epoch: [280][2701/2793] Time 1.187s (1.159s) Speed 976.8 samples/s Data 0.000s (0.047s) Loss 0.02497 (0.02608)
get_last_lr : [0.001] train epoch time consume: 3244.6378741264343
Epoch: [280][10/147] Time 3.359s (9.219s) Speed 345.3 samples/s Data 0.000s (5.784s) Loss 0.04841 (0.04270)
Epoch: [280][20/147] Time 3.188s (6.208s) Speed 363.9 samples/s Data 0.000s (2.892s) Loss 0.02864 (0.04360)
Epoch: [280][30/147] Time 3.172s (5.228s) Speed 365.7 samples/s Data 0.000s (1.929s) Loss 0.02289 (0.04255)
...
Epoch: [280][130/147] Time 3.000s (3.529s) Speed 386.7 samples/s Data 0.000s (0.445s) Loss 0.04832 (0.04314)
Epoch: [280][140/147] Time 3.000s (3.489s) Speed 386.7 samples/s Data 0.000s (0.414s) Loss 0.03028 (0.04297)
----酴--- ---梅---- ---干--- ----擰---- -----蒙----------------------- => 酴 梅 干 擰 蒙 , gt: 酴 梅 干 擰 蒙 79307
[#correct:162978 / #total:170477]
Test loss: 0.0432, accuray: 0.9560
acc validate time consume: 513.2954776287079 <--- second fastest
```
# 23'0325 Adjusting the settings again
GPU utilization during the training phase now looks reasonable, but validation seems to load mostly one of the two GPUs.


The current setup:
```
lib\models\crnn.py
self.cnn = cnn
self.cnn = nn.DataParallel(self.cnn)  # <--- here
self.rnn = nn.Sequential(
    BidirectionalLSTM(512, nh, nh),
    BidirectionalLSTM(nh, nh, nclass))
self.rnn = nn.DataParallel(self.rnn)  # <--- here
train.py
elif config.TRAIN.RESUME.IS_RESUME:
    model_state_file = config.TRAIN.RESUME.FILE
    if model_state_file == '':
        print(" => no checkpoint found")
    checkpoint = torch.load(model_state_file, map_location='cpu')
    if 'state_dict' in checkpoint.keys():
        model.load_state_dict(checkpoint['state_dict'])
        last_epoch = checkpoint['epoch']
        # optimizer.load_state_dict(checkpoint['optimizer'])
        # lr_scheduler.load_state_dict(checkpoint['lr_scheduler'])
    else:
        model.load_state_dict(checkpoint)
# model = torch.nn.DataParallel(model, device_ids=[0,1])
model = model.to(device)  # move to device only after the model is fully loaded <--- here
lib\core\function.py
labels = utils.get_batch_label(dataset, idx)
# print('lib/function len(labels): ', len(labels))
inp = inp.to(device)
# print('lib/function inp.size(): ', inp.size())
# inference
preds = model(inp) #.cuda()
...
labels = utils.get_batch_label(dataset, idx)
# labels = labels.to(device)
inp = inp.to(device)
...
# inference
# model = model.to(device)
preds = model(inp) # .cuda()
```
Timing instrumentation was also added; the results:
```
strat of training time: 2023/03/25 06:24:55
=> creating output\360CC\checkpoints
=> creating output\360CC\log
torch.cuda.device_count() 2
load 3242588 images!
load 170663 images!
C:\Application\Anaconda3\envs\dbnet_wenmu\lib\site-packages\torch\cuda\nccl.py:15: UserWarning: PyTorch is not compiled with NCCL support
warnings.warn('PyTorch is not compiled with NCCL support')
Epoch: [265][1/3167] Time 75.981s (75.981s) Speed 13.5 samples/s Data 63.827s (63.827s) Loss 0.06747 (0.06747)
...
Epoch: [265][3101/3167] Time 0.971s (1.006s) Speed 1054.4 samples/s Data 0.000s (0.027s) Loss 0.05320 (0.05584)
get_last_lr : [0.001] train epoch time consume: 3192.4820868968964 <--- train
Epoch: [265][100/167]
-1------11--6--9---6---9--1- -0----- --1----s----o--------------------- => 1169691 0 1so, gt: 4588969107 2865 48209
----池-----绗------重-----斜------隋------煨-----定------应------乌------------- => 池绗重斜隋煨定应乌 , gt: 池绗重斜隋煨定应乌 48209
-------距----韌-----痿痿----誅-----9---屬----遣------离-----舄------滢----------- => 距韌痿誅9屬遣离舄滢, gt: 距韌痿誅9屬遣离舄滢 48209
----镕-----冢-----鐀------營-----拷------憓------璧-----尾-----姪--------------- => 镕冢鐀營拷憓璧尾姪 , gt: 镕冢鐀營拷憓璧尾姪 48209
----夤-----繽-----坼------秾-----纶------蹴-----司------蚤------靂-------------- => 夤繽坼秾纶蹴司蚤靂 , gt: 夤繽坼秾纶蹴司蚤靂 48209
----理---泛--押----旼---貿----繚----鐡---汯---咚-------------------------------- => 理泛押旼貿繚鐡汯咚 , gt: 理泛押旼貿繚鐡汯咚 48209
----惆--- ----渼--- ----愴--- ----最--- -----寷------------------------- => 惆 渼 愴 最 寷 , gt: 惆 渼 愴 最 寷 48209
----骯---越-----舷-----劂----裂-----市-----直----丧-----业---------------------- => 骯越舷劂裂市直丧业 , gt: 骯越舷劂裂市直丧业 48209
--------埔-----褫----礽-----戴-----疏----愉-----凶-----瓠----阿-----\----------- => 埔褫礽戴疏愉凶瓠阿\, gt: 埔褫礽戴疏愉凶瓠阿\ 48209
----眨---屎---彪---鵲----井---俯俯---彤----妨---帑------------------------------- => 眨屎彪鵲井俯彤妨帑 , gt: 眨屎彪鵲井俯彤妨帑 48209
[#correct:154625 / #total:170663]
Test loss: 0.0761, accuray: 0.9060
acc validate time consume: 1296.1403181552887 <--- validate time
is best: True
best acc is: 0.9060253247628367
Epoch: [266][1/3167] Time 64.968s (64.968s) Speed 15.8 samples/s Data 63.187s (63.187s) Loss 0.05739 (0.05739)
Epoch: [266][101/3167] Time 0.953s (1.570s) Speed 1074.4 samples/s Data 0.000s (0.633s) Loss 0.06573 (0.05908)
```
The acc validate time is clearly far too long, so this problem still needs solving.
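One pitfall worth noting before the experiments below (general CUDA behavior, assumed relevant here rather than verified): CUDA kernels launch asynchronously, so a wall-clock timer around model(inp) mostly measures launch overhead, and the real cost surfaces at the next synchronizing call such as .cpu(). A hedged sketch of timing a forward pass correctly, with a stand-in model:
```
import time
import torch
import torch.nn as nn

model = nn.Conv2d(1, 8, 3).cuda()           # stand-in for the CRNN
inp = torch.zeros(1100, 1, 32, 160).cuda()  # a batch like the config above

torch.cuda.synchronize()                    # drain earlier queued work
t0 = time.time()
preds = model(inp)                          # asynchronous kernel launch
torch.cuda.synchronize()                    # wait until the GPU really finishes
print('forward: %.3fs' % (time.time() - t0))
```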
```
lib\models\crnn.py
# this part unchanged
self.cnn = cnn
self.cnn = nn.DataParallel(self.cnn)
self.rnn = nn.Sequential(
    BidirectionalLSTM(512, nh, nh),
    BidirectionalLSTM(nh, nh, nclass))
self.rnn = nn.DataParallel(self.rnn)
lib\core\function.py
criterion = criterion.cuda()
model.train()
inp = inp.cuda()
...
criterion = criterion.cuda()
model.eval()
inp = inp.cuda()
train.py
model = model.cuda()  # move to device only after the model is fully loaded
model_info(model)
```
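A note in passing (general PyTorch behavior, not specific to this repo): for the default GPU, .cuda() and .to(device) are functionally equivalent ways to move data; the experiments here differ in where the move happens, not in which spelling is used. A tiny self-contained illustration:
```
import torch

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
inp = torch.zeros(4, 1, 32, 160)  # dummy input batch (B, C, H, W)
moved = inp.to(device)            # device-agnostic form, works for 'cpu' too
if torch.cuda.is_available():
    also_moved = inp.cuda()       # equivalent to .to('cuda') here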
```
TRAIN:
BATCH_SIZE_PER_GPU: 1100
SHUFFLE: True
BEGIN_EPOCH: 0
END_EPOCH: 1000
RESUME:
IS_RESUME: True
FILE: 'output/360CC/checkpoints/checkpoint_265_acc_0.9362.pth'
OPTIMIZER: 'adam'
LR: 0.001
strat of training time: 2023/03/25 11:14:44
=> creating output\360CC\checkpoints
=> creating output\360CC\log
torch.cuda.device_count() 2
load 3242588 images!
load 170663 images!
warnings.warn('PyTorch is not compiled with NCCL support')
Epoch: [266][1/2948] Time 76.860s (76.860s) Speed 14.3 samples/s Data 64.411s (64.411s) Loss 0.06074 (0.06074)
Epoch: [266][501/2948] Time 0.984s (1.125s) Speed 1117.5 samples/s Data 0.000s (0.129s) Loss 0.05679 (0.05020)
Epoch: [266][601/2948] Time 0.984s (1.103s) Speed 1117.4 samples/s Data 0.000s (0.108s) Loss 0.05902 (0.05163)
...
Epoch: [266][2901/2948] Time 1.016s (1.044s) Speed 1083.1 samples/s Data 0.000s (0.023s) Loss 0.03073 (0.05148)
get_last_lr : [0.001] train epoch time consume: 3087.827137708664
Epoch: [266][100/156]
-------------蘚----邃---曠----隸---灞----鍹---[-羯羯--泊----吏------------------- => 蘚邃曠隸灞鍹[羯泊吏, gt: 蘚邃曠隸灞鍹[羯泊吏 11573
-------隆----這----柿-----渑------窯---没没-----駿----军-----犁----扑------------- => 隆這柿渑窯没駿军犁扑, gt: 隆這柿渑窯没駿军犁扑 11573
----忪---禧-----呢----廠----别----迫-----W---签签----艮------------------------- => 忪禧呢廠别迫W签艮 , gt: 忪禧呢廠别迫W签艮 11573
----炯-----几------辂-----迦-----沭沭------胚-----樵------黝------诡------------- => 炯几辂迦沭胚樵黝诡 , gt: 炯几辂迦沭胚樵黝诡 11573
----碇--- ----瀑--- ---骚--- ---驃--- ----箇--------------------------- => 碇 瀑 骚 驃 箇 , gt: 碇 瀑 骚 驃 箇 11573
---羹----- ------媃----- ----榆----- ------欤----- -----甕------------ => 羹 媃 榆 欤 甕 , gt: 羹 媃 榆 欤 甕 11573
--------羋----蠟蠟----嘻----賴-----洌-----打----缰----#---諛-----堃-------------- => 羋蠟嘻賴洌打缰#諛堃, gt: 羋蠟嘻賴洌打缰#諛堃 11573
-------啕--- -輟---- -胰---- -夭--- --竿--- --曲--- --扑------------ => 啕 輟 胰 夭 竿 曲 扑, gt: 啕 輟 胰 夭 竿 曲 扑 11573
----硃--- ---酦--- ----窈-- ---莺---- ----篌------------------------------ => 硃 酦 窈 莺 篌 , gt: 硃 酦 窈 莺 篌 11573
----鵬----旁----廷----势----喙-----昤----蝌----柝----杠------------------------- => 鵬旁廷势喙昤蝌柝杠 , gt: 鵬旁廷势喙昤蝌柝杠 11573
[#correct:161152 / #total:170663]
Test loss: 0.0529, accuray: 0.9443
acc validate time consume: 1300.369472026825
is best: True
best acc is: 0.9442702870569485
strat of training time: 2023/03/25 12:54:10
=> creating output\360CC\checkpoints
=> creating output\360CC\log
torch.cuda.device_count() 2
load 52098 images!
load 2743 images!
C:\Application\Anaconda3\envs\dbnet_wenmu\lib\site-packages\torch\cuda\nccl.py:15: UserWarning: PyTorch is not compiled with NCCL support
warnings.warn('PyTorch is not compiled with NCCL support')
Epoch: [267][1/204] Time 32.155s (32.155s) Speed 8.0 samples/s Data 24.858s (24.858s) Loss 6.39860 (6.39860)
Epoch: [267][11/204] Time 0.203s (3.105s) Speed 1260.0 samples/s Data 0.000s (2.260s) Loss 0.70246 (1.82520)
...
Epoch: [267][191/204] Time 0.203s (0.371s) Speed 1262.4 samples/s Data 0.000s (0.130s) Loss 0.35743 (0.50064)
Epoch: [267][201/204] Time 0.203s (0.362s) Speed 1259.1 samples/s Data 0.000s (0.124s) Loss 0.33498 (0.49580)
get_last_lr : [0.001] train epoch time consume: 77.23187446594238
Epoch: [267][10/43] Time 29.733s (27.641s) Speed 2.2 samples/s Data 29.265s (27.074s) Loss 0.33460 (0.28818)
Epoch: [267][20/43] Time 34.983s (30.138s) Speed 1.8 samples/s Data 34.499s (29.592s) Loss 0.45812 (0.33128)
Epoch: [267][30/43] Time 40.311s (32.738s) Speed 1.6 samples/s Data 39.796s (32.196s) Loss 0.33809 (0.34336)
Epoch: [267][40/43] Time 45.530s (35.344s) Speed 1.4 samples/s Data 44.952s (34.808s) Loss 0.47718 (0.33017)
-1--77---9--55--99--88- -0---3--1---2--2------------------------------ => 179598 03122, gt: 179598 03122 3905
---淅--- ---役---- ---櫛---- ----弓-- ---輩--------------------------- => 淅 役 櫛 弓 輩 , gt: 淅 役 櫛 弓 輩 3905
---崧---- ---喷---- --七七--- ---莎莎--- ---故------------------------ => 崧 喷 七 莎 故 , gt: 崧 喷 七 莎 故 3905
---像---- ----闞--- ---喊---- ----星--- ----蠢------------------------- => 像 闞 喊 星 蠢 , gt: 像 闞 喊 星 蠢 3905
---诞----- -----琐---- ----厭---- -----鏗----- ----緝-------------- => 诞 琐 厭 鏗 緝 , gt: 诞 瓒 厭 鏗 緝 3905
---鑣----- -----玲---- ----玹---- -----佺----- ----特-------------- => 鑣 玲 玹 佺 特 , gt: 鑣 玲 玹 佺 特 3905
-44---0---0---22---4---4----0---6-- -55---2---5----1---7-------------- => 40024406 52517, gt: 40024406 52517 3905
---佈---- ----隆----- ----倍---- ---冲---- ---苋------------------ => 佈 隆 倍 冲 苋 , gt: 佈 隆 倍 冲 苋 3905
----戚---- ----潯---- ----涌---- ---婵---- ----檔-------------------- => 戚 潯 涌 婵 檔 , gt: 戚 潯 涌 婵 檔 3905
-2---3---3--- --5-----0----4----8-----0---4--8----6---3---------------- => 233 504804863, gt: 2331 504801863 3905
[#correct:2112 / #total:2743]
Test loss: 0.3309, accuray: 0.7700
acc validate time consume: 48.373497009277344
is best: True
best acc is: 0.7699598979219833
val_dataset = get_dataset(config)(config, is_train=False)
val_loader = DataLoader(
    dataset=val_dataset,
    batch_size=config.TEST.BATCH_SIZE_PER_GPU,
    shuffle=False,
    num_workers=0,
    pin_memory=True
)
Epoch: [272][181/204] Time 0.216s (0.285s) Speed 1183.1 samples/s Data 0.000s (0.080s) Loss 0.28915 (0.25591)
Epoch: [272][191/204] Time 0.201s (0.281s) Speed 1273.3 samples/s Data 0.000s (0.076s) Loss 0.26008 (0.25824)
Epoch: [272][201/204] Time 0.193s (0.277s) Speed 1323.1 samples/s Data 0.001s (0.073s) Loss 0.30827 (0.26050)
get_last_lr : [0.001] train epoch time consume: 56.546335220336914
Epoch: [272][10/43] Time 5.578s (2.956s) Speed 11.5 samples/s Data 4.987s (2.434s) Loss 0.58593 (0.35299)
Epoch: [272][20/43] Time 11.662s (5.952s) Speed 5.5 samples/s Data 11.016s (5.405s) Loss 0.21616 (0.31394)
Epoch: [272][30/43] Time 17.700s (8.949s) Speed 3.6 samples/s Data 17.156s (8.396s) Loss 0.40394 (0.33232)
Epoch: [272][40/43] Time 23.547s (11.944s) Speed 2.7 samples/s Data 23.016s (11.393s) Loss 0.37459 (0.34642)
---瀑---- ----亘--- ---喘--- ---蕎--- ---型-------------------------- => 瀑 亘 喘 蕎 型 , gt: 瀑 亘 喘 蕎 型 3905
---竽---- ----究--- ----闽--- ----骥--- ---勾------------------------ => 竽 究 闽 骥 勾 , gt: 竽 究 闽 骥 勾 3905
```
Several settings were adjusted and the results were all about the same. The bottleneck appears to be file reading: the validate dataset is scattered all over the disk, so reading the images is slow. The best result so far came from tuning num_workers, but the gain was marginal.
A future option is to move the validate dataset to a dedicated location, or to pre-pack it into an h5 or pkl file to speed up reads!!
23'0326: Tried moving the validate data into a separate folder: no effect!! Changing num_workers also had no effect!! Commenting out the resize in the dataloader's __getitem__ (lib\dataset\_360cc.py) doubled the speed, as below:
```
lib\config\360CC_config.yaml
GPUID: 0
WORKERS: 20
PRINT_FREQ: 100
SAVE_FREQ: 10
PIN_MEMORY: True
'val':'../CRNN_Chinese_Characters_Rec_data/datasets/train/test_code_valid.txt'}
TRAIN:
BATCH_SIZE_PER_GPU: 1160
SHUFFLE: True
BEGIN_EPOCH: 0
END_EPOCH: 1000
RESUME:
IS_RESUME: True
FILE: 'output/360CC/checkpoints/checkpoint_269_acc_0.9539.pth'
OPTIMIZER: 'adam'
LR: 0.001
WD: 0.0
LR_STEP: [10, 80]
LR_FACTOR: 0.1
MOMENTUM: 0.0
NESTEROV: False
RMSPROP_ALPHA:
RMSPROP_CENTERED:
FINETUNE:
IS_FINETUNE: False
FINETUNE_CHECKPOINIT: 'output/360CC/checkpoints/checkpoint_266_acc_0.9443.pth'
FREEZE: False
TEST:
BATCH_SIZE_PER_GPU: 1160
SHUFFLE: False # for random test rather than test on the whole validation set
NUM_TEST_BATCH: 1000
NUM_TEST_DISP: 10
lib\core\function.py
def validate(config, val_loader, dataset, converter, model, criterion, device, epoch, writer_dict, output_dict):
    batch_time = AverageMeter()
    data_time = AverageMeter()
    losses = AverageMeter()
    model.eval()
    n_correct = 0
    end = time.time()
    with torch.no_grad():
        for i, (inp, idx) in enumerate(val_loader):
            data_time.update(time.time() - end)
            labels = utils.get_batch_label(dataset, idx)
            # inp = inp.to(device)
            # inference
            preds = model(inp).cpu()  # why?
lib\dataset\_360cc.py
def __getitem__(self, idx):
    img_name = list(self.labels[idx].keys())[0]
    img = cv2.imread(os.path.join(self.root, img_name))
    img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    # img_h, img_w = img.shape
    # img = cv2.resize(img, (0,0), fx=self.inp_w / img_w, fy=self.inp_h / img_h, interpolation=cv2.INTER_CUBIC)  # <--- disabled to double the speed (this turned out to be a mistake)
    img = np.reshape(img, (self.inp_h, self.inp_w, 1))
strat of training time: 2023/03/26 08:02:54
=> creating output\360CC\checkpoints
=> creating output\360CC\log
torch.cuda.device_count() 2
load 3239051 images!
load 170477 images!
Epoch: [280][1/2793] Time 140.828s (140.828s) Speed 8.2 samples/s Data 125.750s (125.750s) Loss 0.02715 (0.02715)
Epoch: [280][101/2793] Time 1.047s (2.421s) Speed 1108.2 samples/s Data 0.000s (1.246s) Loss 0.02376 (0.02700)
...
Epoch: [280][2701/2793] Time 1.110s (1.142s) Speed 1045.5 samples/s Data 0.000s (0.047s) Loss 0.02019 (0.02689)
get_last_lr : [0.001] train epoch time consume: 3196.342706680298
Epoch: [280][10/147] Time 88.247s (73.721s) Speed 13.1 samples/s Data 84.903s (70.467s) Loss 0.04430 (0.03724)
Epoch: [280][20/147] Time 120.574s (89.954s) Speed 9.6 samples/s Data 117.527s (86.712s) Loss 0.02340 (0.03811)
...
Epoch: [280][130/147] Time 456.392s (260.109s) Speed 2.5 samples/s Data 453.383s (257.027s) Loss 0.04539 (0.03979)
Epoch: [280][140/147] Time 486.361s (275.307s) Speed 2.4 samples/s Data 483.395s (272.231s) Loss 0.02618 (0.03957)
----蚓------舉-----牍-----荖-----你-----外外----透------兜-----鐠---------------- => 蚓舉牍荖你外透兜鐠 , gt: 蚓舉牍荖你外透兜鐠 79307
-2----7---0----5---3-- -4---7----3---8----1---4----------------------- => 27053 473814, gt: 27053 473814 79307
---(---予----梗----梓---鏤----募--亘亘---叔----帏------------------------------- => (予梗梓鏤募亘叔帏 , gt: (予梗样鏤募亘叔帏 79307
-----------噓-- -加--- --弓-- --尚-- --算-- -畛-- -竿竿--------------- => 噓 加 弓 尚 算 畛 竿, gt: 噓 加 弓 尚 算 畛 竿 79307
------------近-- -盂--- --憧--- -斗-- -脂-- --醃-- -舌---------------- => 近 盂 憧 斗 脂 醃 舌, gt: 近 盂 憧 斗 脂 醃 舌 79307
-------灶----嘐-----蒸-----梃-----貂-----诛----皴-----镰-----晷-----侏----------- => 灶嘐蒸梃貂诛皴镰晷侏, gt: 灶嘐蒸梃貂诛皴镰晷侏 79307
----悚------菒-----拭------術------托-----析析-----溜------环------鰍------------ => 悚菒拭術托析溜环鰍 , gt: 悚菒拭術托析溜环鰍 79307
-----------------脯- -s -谗- --栗- -効- -秩-- -茑--------------------- => 脯 s 谗 栗 効 秩 茑, gt: 脯 s 谗 栗 効 秩 茑 79307
------------眇---诚-----緣---丸---喋----邕邕---踩----筑----涕----髡--------------- => 眇诚緣丸喋邕踩筑涕髡, gt: 眇诚緣丸喋邕踩筑涕髡 79307
----酴--- ----梅--- ----干-- -----擰--- -----蒙----------------------- => 酴 梅 干 擰 蒙 , gt: 酴 梅 干 擰 蒙 79307
[#correct:164510 / #total:170477]
Test loss: 0.0398, accuray: 0.9650
acc validate time consume: 511.09727215766907
is best: True
best acc is: 0.9649982109023505
(Note: the validate batch time was being computed incorrectly: `end` was never reset, so in hindsight data access was never the problem at all!!! What!?)
lib\core\function.py
if i == config.TEST.NUM_TEST_BATCH:
    break
end = time.time()  # <--- moving it here is the correct fix
```
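To make the fix concrete, a minimal sketch of the corrected loop shape in validate(), assuming the meter and loader names from lib/core/function.py; the point is only that `end` must be reset at the bottom of every iteration:
```
# sketch only: names assumed to match lib/core/function.py
end = time.time()
with torch.no_grad():
    for i, (inp, idx) in enumerate(val_loader):
        data_time.update(time.time() - end)   # time spent waiting on the DataLoader
        preds = model(inp.to(device)).cpu()
        # ... decode / count correct predictions ...
        batch_time.update(time.time() - end)  # full iteration time
        if i == config.TEST.NUM_TEST_BATCH:
            break
        end = time.time()                     # <--- the reset that was missing
```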
Switching validate to GPU:
```
inp = inp.to(device) # add this
# inference
# preds = model(inp).cpu() # why?
preds = model(inp) #
strat of training time: 2023/03/26 09:08:55
=> creating output\360CC\checkpoints
=> creating output\360CC\log
torch.cuda.device_count() 2
load 3239051 images!
load 170477 images!
Epoch: [280][1/2793] Time 139.924s (139.924s) Speed 8.3 samples/s Data 125.174s (125.174s) Loss 0.01400 (0.01400)
Epoch: [280][101/2793] Time 0.984s (2.377s) Speed 1178.5 samples/s Data 0.000s (1.240s) Loss 0.02962 (0.02531)
...
Epoch: [280][2701/2793] Time 1.094s (1.113s) Speed 1060.5 samples/s Data 0.000s (0.047s) Loss 0.01298 (0.02704)
get_last_lr : [0.001] train epoch time consume: 3116.4562408924103
Epoch: [280][10/147] Time 8.625s (14.967s) Speed 134.5 samples/s Data 0.000s (6.159s) Loss 0.06155 (0.05366)
Epoch: [280][140/147] Time 8.359s (9.152s) Speed 138.8 samples/s Data 0.000s (0.441s) Loss 0.03826 (0.05616)
----蚓-----舉------牍----荖------你-----外-----透------兜-----鐠---------------- => 蚓舉牍荖你外透兜鐠 , gt: 蚓舉牍荖你外透兜鐠 79307
-2----7---0----5---3-- -4----7---3---8---11--4------------------------ => 27053 473814, gt: 27053 473814 79307
---(---予----梗---样----鏤---募----亘---叔---帏-------------------------------- => (予梗样鏤募亘叔帏 , gt: (予梗样鏤募亘叔帏 79307
-----------噓-- -加--- --弓-- --尚-- --算-- -畛-- --竿--------------- => 噓 加 弓 尚 算 畛 竿, gt: 噓 加 弓 尚 算 畛 竿 79307
-----------近--- 盂--- -憧--- -斗-- -脂-- --醃-- -舌---------------- => 近 盂 憧 斗 脂 醃 舌, gt: 近 盂 憧 斗 脂 醃 舌 79307
------灶-----嘐-----蒸-----梃-----貂-----诛----皴-----镰-----晷-----侏----------- => 灶嘐蒸梃貂诛皴镰晷侏, gt: 灶嘐蒸梃貂诛皴镰晷侏 79307
----悚-----菒------拭------術-----托------析------溜-----环------鰍------------- => 悚菒拭術托析溜环鰍 , gt: 悚菒拭術托析溜环鰍 79307
-----------------脯-- -s -谗- -栗- -効-- -秩-- -茑--------------------- => 脯 s 谗 栗 効 秩 茑, gt: 脯 s 谗 栗 効 秩 茑 79307
------------眇---诚----緣----丸----喋---邕邕---踩----筑---涕-----髡--------------- => 眇诚緣丸喋邕踩筑涕髡, gt: 眇诚緣丸喋邕踩筑涕髡 79307
----酴--- ---梅---- ---干--- ----擰---- ----蒙------------------------ => 酴 梅 干 擰 蒙 , gt: 酴 梅 干 擰 蒙 79307
[#correct:161916 / #total:170477]
Test loss: 0.0565, accuray: 0.9498
acc validate time consume: 1349.7679324150085 <--- even longer again!!? The slowest!!!
(Data time is fine now, though; the earlier bad data times must have come from the timer not being reset.)
is best: True
best acc is: 0.9497820820403925
```
Switching validate back to CPU:
```
# inp = inp.to(device) # add this
# inference
preds = model(inp).cpu() # why?
# preds = model(inp) #
strat of training time: 2023/03/26 10:32:29
load 3239051 images!
load 170477 images!
Epoch: [280][1/2793] Time 142.859s (142.859s) Speed 8.1 samples/s Data 124.562s (124.562s) Loss 0.01011 (0.01011)
Epoch: [280][101/2793] Time 1.031s (2.434s) Speed 1124.8 samples/s Data 0.000s (1.234s) Loss 0.01207 (0.02511)
...
Epoch: [280][2701/2793] Time 1.187s (1.159s) Speed 976.8 samples/s Data 0.000s (0.047s) Loss 0.02497 (0.02608)
get_last_lr : [0.001] train epoch time consume: 3244.6378741264343
Epoch: [280][10/147] Time 3.359s (9.219s) Speed 345.3 samples/s Data 0.000s (5.784s) Loss 0.04841 (0.04270)
Epoch: [280][20/147] Time 3.188s (6.208s) Speed 363.9 samples/s Data 0.000s (2.892s) Loss 0.02864 (0.04360)
Epoch: [280][30/147] Time 3.172s (5.228s) Speed 365.7 samples/s Data 0.000s (1.929s) Loss 0.02289 (0.04255)
...
Epoch: [280][130/147] Time 3.000s (3.529s) Speed 386.7 samples/s Data 0.000s (0.445s) Loss 0.04832 (0.04314)
Epoch: [280][140/147] Time 3.000s (3.489s) Speed 386.7 samples/s Data 0.000s (0.414s) Loss 0.03028 (0.04297)
----酴--- ---梅---- ---干--- ----擰---- -----蒙----------------------- => 酴 梅 干 擰 蒙 , gt: 酴 梅 干 擰 蒙 79307
[#correct:162978 / #total:170477]
Test loss: 0.0432, accuray: 0.9560
acc validate time consume: 513.2954776287079 <--- second fastest
is best: True
best acc is: 0.9560116613971387
```
Validation got roughly 2-3x faster.
(Figure: resource usage counters during the training step.)

One final try: inp.to(device), as below. The LR settings were also adjusted at the same time (though they currently have no effect):
```
lib\config\360CC_config.yaml
LR_STEP: [300, 320] # LR holds the set value below the first milestone epoch, then *0.1; past the second milestone, *0.1 again (1/100 total)
lib\core\function.py
inp = inp.to(device)  # add this <--- added
# inference
preds = model(inp).cpu() # why?
# preds = model(inp) #
strat of training time: 2023/03/26 11:54:05
load 3239051 images!
load 170477 images!
Epoch: [280][1/2793] Time 138.890s (138.890s) Speed 8.4 samples/s Data 119.003s (119.003s) Loss 0.02168 (0.02168)
Epoch: [280][101/2793] Time 1.047s (2.417s) Speed 1108.1 samples/s Data 0.000s (1.179s) Loss 0.02360 (0.02676)
Epoch: [280][201/2793] Time 1.062s (1.734s) Speed 1091.8 samples/s Data 0.000s (0.593s) Loss 0.02606 (0.02697)
...
Epoch: [280][2701/2793] Time 1.078s (1.157s) Speed 1076.0 samples/s Data 0.000s (0.045s) Loss 0.02396 (0.02612)
get_last_lr : [0.001] train epoch time consume: 3235.9140632152557
Epoch: [280][10/147] Time 3.094s (8.489s) Speed 375.0 samples/s Data 0.000s (5.309s) Loss 0.04337 (0.03674)
Epoch: [280][20/147] Time 3.109s (5.805s) Speed 373.1 samples/s Data 0.000s (2.655s) Loss 0.02111 (0.03763)
...
Epoch: [280][80/147] Time 2.906s (3.706s) Speed 399.1 samples/s Data 0.000s (0.664s) Loss 0.04593 (0.03717)
Epoch: [280][90/147] Time 2.953s (3.634s) Speed 392.8 samples/s Data 0.000s (0.590s) Loss 0.07400 (0.03863)
Epoch: [280][100/147] Time 3.141s (3.575s) Speed 369.4 samples/s Data 0.000s (0.531s) Loss 0.03438 (0.03854)
Epoch: [280][110/147] Time 2.823s (3.525s) Speed 410.8 samples/s Data 0.000s (0.483s) Loss 0.04226 (0.03883)
Epoch: [280][120/147] Time 3.014s (3.482s) Speed 384.9 samples/s Data 0.000s (0.443s) Loss 0.02881 (0.03868)
Epoch: [280][130/147] Time 2.984s (3.445s) Speed 388.7 samples/s Data 0.000s (0.409s) Loss 0.04510 (0.03863)
Epoch: [280][140/147] Time 2.953s (3.414s) Speed 392.8 samples/s Data 0.000s (0.379s) Loss 0.02517 (0.03840)
----蚓-----舉-----牍------荖-----你----外------透-----兜------鐠---------------- => 蚓舉牍荖你外透兜鐠 , gt: 蚓舉牍荖你外透兜鐠 79307
-22---7---00---5---3-- -4----7---3---8---11---4----------------------- => 27053 473814, gt: 27053 473814 79307
---(---予---梗----样----鏤---募----亘---叔---帏-------------------------------- => (予梗样鏤募亘叔帏 , gt: (予梗样鏤募亘叔帏 79307
-----------噓--- -加--- --弓-- --尚-- --算--- -畛--- -竿竿--------------- => 噓 加 弓 尚 算 畛 竿, gt: 噓 加 弓 尚 算 畛 竿 79307
-----------近--- --盂-- -憧---- -斗-- --脂-- -醃-- -舌---------------- => 近 盂 憧 斗 脂 醃 舌, gt: 近 盂 憧 斗 脂 醃 舌 79307
------灶-----嘐-----蒸-----梃-----貂-----诛----皴-----镰-----晷晷----侏----------- => 灶嘐蒸梃貂诛皴镰晷侏, gt: 灶嘐蒸梃貂诛皴镰晷侏 79307
---悚------菒-----拭------術------托------析------溜-----环------鰍------------- => 悚菒拭術托析溜环鰍 , gt: 悚菒拭術托析溜环鰍 79307
-----------------脯- -s -谗- -栗- -効- -秩-- -茑--------------------- => 脯 s 谗 栗 効 秩 茑, gt: 脯 s 谗 栗 効 秩 茑 79307
------------眇---诚----緣----丸---喋----邕邕---踩----筑----涕----髡--------------- => 眇诚緣丸喋邕踩筑涕髡, gt: 眇诚緣丸喋邕踩筑涕髡 79307
----酴--- ---梅---- ---干--- ----擰---- -----蒙----------------------- => 酴 梅 干 擰 蒙 , gt: 酴 梅 干 擰 蒙 79307
[#correct:164793 / #total:170477]
Test loss: 0.0387, accuray: 0.9667
acc validate time consume: 502.78747749328613 <--- the fastest, first place
is best: True
best acc is: 0.9666582588853628
```
Trying one more variant:
```
lib\core\function.py
inp = inp.cpu() # to(device) # add this
# inference
preds = model(inp).cpu() # why?
# preds = model(inp) #
strat of training time: 2023/03/26 13:02:39
load 3239051 images!
load 170477 images!
Epoch: [280][1/2793] Time 134.030s (134.030s) Speed 8.7 samples/s Data 119.123s (119.123s) Loss 0.02288 (0.02288)
Epoch: [280][101/2793] Time 1.016s (2.333s) Speed 1142.2 samples/s Data 0.000s (1.180s) Loss 0.01304 (0.02731)
Epoch: [280][201/2793] Time 1.029s (1.683s) Speed 1126.9 samples/s Data 0.000s (0.593s) Loss 0.01970 (0.02615)
...
Epoch: [280][2701/2793] Time 1.094s (1.126s) Speed 1060.6 samples/s Data 0.000s (0.044s) Loss 0.03228 (0.02746)
get_last_lr : [0.001] train epoch time consume: 3154.1199588775635
Epoch: [280][10/147] Time 3.219s (8.576s) Speed 360.4 samples/s Data 0.000s (5.364s) Loss 0.04234 (0.03568)
Epoch: [280][20/147] Time 3.125s (5.851s) Speed 371.2 samples/s Data 0.000s (2.682s) Loss 0.02054 (0.03613)
Epoch: [280][30/147] Time 3.109s (4.942s) Speed 373.1 samples/s Data 0.000s (1.788s) Loss 0.01800 (0.03577)
Epoch: [280][40/147] Time 3.062s (4.496s) Speed 378.8 samples/s Data 0.000s (1.341s) Loss 0.02370 (0.03537)
Epoch: [280][50/147] Time 3.375s (4.245s) Speed 343.7 samples/s Data 0.016s (1.073s) Loss 0.04410 (0.03662)
Epoch: [280][60/147] Time 3.141s (4.071s) Speed 369.4 samples/s Data 0.000s (0.895s) Loss 0.03769 (0.03661)
Epoch: [280][70/147] Time 3.187s (3.937s) Speed 363.9 samples/s Data 0.000s (0.767s) Loss 0.02095 (0.03587)
Epoch: [280][80/147] Time 3.203s (3.845s) Speed 362.2 samples/s Data 0.000s (0.671s) Loss 0.04494 (0.03565)
Epoch: [280][90/147] Time 3.203s (3.763s) Speed 362.1 samples/s Data 0.000s (0.597s) Loss 0.06963 (0.03693)
Epoch: [280][100/147] Time 3.328s (3.699s) Speed 348.5 samples/s Data 0.000s (0.537s) Loss 0.03398 (0.03688)
Epoch: [280][110/147] Time 3.078s (3.647s) Speed 376.9 samples/s Data 0.000s (0.488s) Loss 0.04158 (0.03719)
Epoch: [280][120/147] Time 3.078s (3.605s) Speed 376.9 samples/s Data 0.000s (0.447s) Loss 0.02884 (0.03702)
Epoch: [280][130/147] Time 3.047s (3.566s) Speed 380.7 samples/s Data 0.000s (0.413s) Loss 0.04320 (0.03699)
Epoch: [280][140/147] Time 3.188s (3.532s) Speed 363.9 samples/s Data 0.000s (0.383s) Loss 0.02361 (0.03681)
----蚓-----舉------牍-----荖-----你-----外-----透------兜-----鐠---------------- => 蚓舉牍荖你外透兜鐠 , gt: 蚓舉牍荖你外透兜鐠 79307
[#correct:165368 / #total:170477]
Test loss: 0.0370, accuray: 0.9700
acc validate time consume: 521.0724451541901
is best: True
best acc is: 0.9700311478967837
```
# The fastest combination in the end:
```
lib\core\function.py
inp = inp.to(device)  # add this <--- added
# inference
preds = model(inp).cpu()  # why?
# preds = model(inp) #
```
What remains puzzling: why does validate become sluggish when run on the GPU!? Why can't it process 1000+ samples per second the way training does!!?
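A plausible answer (an assumption based on the 23'0519 and 23'0329 findings above, where a single preds.cpu() before decode cut sim_preds_convert_time_B from ~8s to ~1s; not verified by profiling): converter.decode indexes the prediction tensor element by element, and every index into a CUDA tensor triggers a tiny device-to-host copy, whereas one .cpu() call is a single bulk transfer:
```
# hedged fragment; `converter` is the repo's CTC label converter
preds = model(inp)   # forward pass: fast, output stays on the GPU
preds = preds.cpu()  # one bulk GPU -> CPU transfer
# decode now runs as a pure CPU loop instead of issuing thousands of
# one-element CUDA reads:
sim_preds = converter.decode(preds.data, preds_size.data, raw=False)
```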
The contents of one complete run:
```
TRAIN:
BATCH_SIZE_PER_GPU: 1160
SHUFFLE: True
BEGIN_EPOCH: 0
END_EPOCH: 1000
RESUME:
IS_RESUME: True
FILE: 'output/360CC/checkpoints/checkpoint_280_acc_0.9700.pth'
OPTIMIZER: 'adam'
LR: 0.001
WD: 0.0
LR_STEP: [290, 300] # LR holds the set value below the first milestone epoch, then *0.1; past the second milestone, *0.1 again (1/100 total)
LR_FACTOR: 0.1
MOMENTUM: 0.0
NESTEROV: False
RMSPROP_ALPHA:
RMSPROP_CENTERED:
FINETUNE:
IS_FINETUNE: False
FINETUNE_CHECKPOINIT: 'output/360CC/checkpoints/checkpoint_266_acc_0.9443.pth'
FREEZE: False
TEST:
BATCH_SIZE_PER_GPU: 1160
SHUFFLE: False # for random test rather than test on the whole validation set
NUM_TEST_BATCH: 1000
NUM_TEST_DISP: 10
Field descriptions:
Epoch: [281][101/2793]    [current epoch][step / total steps per epoch]
Time 1.031s (2.397s)      batch_time (avg_batch_time)
Speed 1124.9 samples/s    throughput for the whole batch, including data loading
Data 0.000s (1.190s)      data_time (avg_data_time)
Loss 0.02044 (0.02524)    batch loss (avg_loss)
strat of training time: 2023/03/26 14:26:44
=> creating output\360CC\checkpoints
=> creating output\360CC\log
torch.cuda.device_count() 2
load 3239051 images!
load 170477 images!
Epoch: [281][1/2793] Time 138.855s (138.855s) Speed 8.4 samples/s Data 120.167s (120.167s) Loss 0.02668 (0.02668)
Epoch: [281][101/2793] Time 1.031s (2.397s) Speed 1124.9 samples/s Data 0.000s (1.190s) Loss 0.02044 (0.02524)
Epoch: [281][201/2793] Time 1.031s (1.716s) Speed 1125.2 samples/s Data 0.000s (0.598s) Loss 0.01391 (0.02521)
Epoch: [281][301/2793] Time 1.016s (1.488s) Speed 1142.2 samples/s Data 0.000s (0.400s) Loss 0.02905 (0.02518)
Epoch: [281][401/2793] Time 1.047s (1.380s) Speed 1108.1 samples/s Data 0.000s (0.300s) Loss 0.02084 (0.02535)
Epoch: [281][501/2793] Time 1.047s (1.313s) Speed 1107.8 samples/s Data 0.000s (0.240s) Loss 0.01579 (0.02535)
Epoch: [281][601/2793] Time 1.047s (1.269s) Speed 1108.4 samples/s Data 0.000s (0.200s) Loss 0.01448 (0.02533)
Epoch: [281][701/2793] Time 1.063s (1.238s) Speed 1091.6 samples/s Data 0.000s (0.172s) Loss 0.01883 (0.02510)
Epoch: [281][801/2793] Time 1.047s (1.216s) Speed 1108.1 samples/s Data 0.000s (0.150s) Loss 0.02183 (0.02499)
Epoch: [281][901/2793] Time 1.063s (1.199s) Speed 1091.7 samples/s Data 0.000s (0.134s) Loss 0.03559 (0.02507)
Epoch: [281][1001/2793] Time 1.063s (1.186s) Speed 1091.8 samples/s Data 0.000s (0.120s) Loss 0.02354 (0.02560)
Epoch: [281][1101/2793] Time 1.155s (1.175s) Speed 1004.2 samples/s Data 0.000s (0.110s) Loss 0.05582 (0.02589)
Epoch: [281][1201/2793] Time 1.072s (1.168s) Speed 1081.9 samples/s Data 0.000s (0.100s) Loss 0.02183 (0.02582)
Epoch: [281][1301/2793] Time 1.047s (1.161s) Speed 1108.2 samples/s Data 0.000s (0.093s) Loss 0.02453 (0.02580)
Epoch: [281][1401/2793] Time 1.057s (1.156s) Speed 1097.8 samples/s Data 0.000s (0.086s) Loss 0.02911 (0.02582)
Epoch: [281][1501/2793] Time 1.078s (1.150s) Speed 1076.1 samples/s Data 0.000s (0.081s) Loss 0.02047 (0.02589)
Epoch: [281][1601/2793] Time 1.048s (1.146s) Speed 1107.1 samples/s Data 0.000s (0.076s) Loss 0.02500 (0.02589)
Epoch: [281][1701/2793] Time 1.062s (1.141s) Speed 1092.7 samples/s Data 0.000s (0.071s) Loss 0.02378 (0.02621)
Epoch: [281][1801/2793] Time 1.047s (1.138s) Speed 1107.7 samples/s Data 0.000s (0.067s) Loss 0.02426 (0.02614)
Epoch: [281][1901/2793] Time 1.078s (1.134s) Speed 1076.2 samples/s Data 0.000s (0.064s) Loss 0.03277 (0.02616)
Epoch: [281][2001/2793] Time 1.063s (1.131s) Speed 1091.7 samples/s Data 0.000s (0.061s) Loss 0.02726 (0.02650)
Epoch: [281][2101/2793] Time 1.063s (1.129s) Speed 1091.5 samples/s Data 0.000s (0.058s) Loss 0.01275 (0.02642)
Epoch: [281][2201/2793] Time 1.094s (1.126s) Speed 1060.7 samples/s Data 0.000s (0.055s) Loss 0.02007 (0.02634)
Epoch: [281][2301/2793] Time 1.062s (1.124s) Speed 1091.9 samples/s Data 0.000s (0.053s) Loss 0.02896 (0.02628)
Epoch: [281][2401/2793] Time 1.078s (1.122s) Speed 1076.1 samples/s Data 0.000s (0.051s) Loss 0.01623 (0.02623)
Epoch: [281][2501/2793] Time 1.063s (1.120s) Speed 1091.3 samples/s Data 0.000s (0.049s) Loss 0.01629 (0.02618)
Epoch: [281][2601/2793] Time 1.125s (1.119s) Speed 1031.1 samples/s Data 0.000s (0.047s) Loss 0.01695 (0.02604)
Epoch: [281][2701/2793] Time 1.078s (1.117s) Speed 1076.2 samples/s Data 0.000s (0.045s) Loss 0.02351 (0.02602)
get_last_lr : [0.001] train epoch time consume: 3126.1498596668243
Epoch: [281][10/147] Time 3.172s (8.523s) Speed 365.7 samples/s Data 0.000s (5.372s) Loss 0.04474 (0.03685)
Epoch: [281][20/147] Time 3.156s (5.830s) Speed 367.5 samples/s Data 0.000s (2.686s) Loss 0.02303 (0.03750)
Epoch: [281][30/147] Time 3.109s (4.926s) Speed 373.1 samples/s Data 0.000s (1.791s) Loss 0.01673 (0.03712)
Epoch: [281][40/147] Time 3.156s (4.473s) Speed 367.5 samples/s Data 0.000s (1.343s) Loss 0.02499 (0.03667)
Epoch: [281][50/147] Time 3.078s (4.195s) Speed 376.9 samples/s Data 0.000s (1.074s) Loss 0.03983 (0.03772)
Epoch: [281][60/147] Time 3.234s (4.035s) Speed 358.6 samples/s Data 0.000s (0.895s) Loss 0.04007 (0.03788)
Epoch: [281][70/147] Time 3.031s (3.909s) Speed 382.7 samples/s Data 0.000s (0.767s) Loss 0.02486 (0.03717)
Epoch: [281][80/147] Time 3.047s (3.811s) Speed 380.7 samples/s Data 0.000s (0.671s) Loss 0.04637 (0.03689)
Epoch: [281][90/147] Time 3.234s (3.735s) Speed 358.6 samples/s Data 0.000s (0.597s) Loss 0.07367 (0.03817)
Epoch: [281][100/147] Time 3.375s (3.683s) Speed 343.7 samples/s Data 0.000s (0.537s) Loss 0.03454 (0.03795)
Epoch: [281][110/147] Time 3.125s (3.645s) Speed 371.2 samples/s Data 0.000s (0.488s) Loss 0.04324 (0.03828)
Epoch: [281][120/147] Time 3.047s (3.602s) Speed 380.7 samples/s Data 0.000s (0.448s) Loss 0.03007 (0.03813)
Epoch: [281][130/147] Time 3.221s (3.564s) Speed 360.1 samples/s Data 0.000s (0.413s) Loss 0.04645 (0.03808)
Epoch: [281][140/147] Time 3.047s (3.533s) Speed 380.7 samples/s Data 0.000s (0.384s) Loss 0.02663 (0.03794)
----蚓-----舉------牍-----荖----你------外-----透------兜-----鐠---------------- => 蚓舉牍荖你外透兜鐠 , gt: 蚓舉牍荖你外透兜鐠 79307
-2----7---0----5---3-- -4----7---3---8----1---4----------------------- => 27053 473814, gt: 27053 473814 79307
---(---予----梗---梓----鏤---募----亘---叔---帏-------------------------------- => (予梗梓鏤募亘叔帏 , gt: (予梗样鏤募亘叔帏 79307
-----------噓--- -加--- --弓-- --尚-- --算-- --畛-- -竿竿--------------- => 噓 加 弓 尚 算 畛 竿, gt: 噓 加 弓 尚 算 畛 竿 79307
------------近-- --盂-- -憧--- -斗-- --脂-- -醃-- -舌---------------- => 近 盂 憧 斗 脂 醃 舌, gt: 近 盂 憧 斗 脂 醃 舌 79307
-------灶----嘐嘐----蒸-----梃-----貂-----诛----皴皴----镰-----晷----侏------------ => 灶嘐蒸梃貂诛皴镰晷侏, gt: 灶嘐蒸梃貂诛皴镰晷侏 79307
---悚------菒------拭-----術------托------析------溜-----环-------鰍------------ => 悚菒拭術托析溜环鰍 , gt: 悚菒拭術托析溜环鰍 79307
-----------------脯- -s -谗- -栗- 効-- -秩-- -茑--------------------- => 脯 s 谗 栗 効 秩 茑, gt: 脯 s 谗 栗 効 秩 茑 79307
------------眇---诚----緣----丸----喋---邕邕---踩----筑----涕----髡--------------- => 眇诚緣丸喋邕踩筑涕髡, gt: 眇诚緣丸喋邕踩筑涕髡 79307
----酴--- ----梅---- ----干--- ----擰---- -----蒙----------------------- => 酴 梅 干 擰 蒙 , gt: 酴 梅 干 擰 蒙 79307
[#correct:164903 / #total:170477]
Test loss: 0.0382, accuray: 0.9673
acc validate time consume: 519.9725430011749
is best: True
best acc is: 0.9673035072179825
```

# The correct setup
According to the PyTorch tutorial (https://pytorch.org/tutorials/beginner/former_torchies/parallelism_tutorial.html):
```
class DataParallelModel(nn.Module):

    def __init__(self):
        super().__init__()
        self.block1 = nn.Linear(10, 20)

        # wrap block2 in DataParallel
        self.block2 = nn.Linear(20, 20)
        self.block2 = nn.DataParallel(self.block2)  # <---------- here

        self.block3 = nn.Linear(20, 20)

    def forward(self, x):
        x = self.block1(x)
        x = self.block2(x)
        x = self.block3(x)
        return x
```
This means the data is distributed across multiple GPUs inside the model network itself, so the wrapping goes into the model definition. The actual changes:
```
train.py
os.environ['CUDA_VISIBLE_DEVICES'] = '0,1,2,3,4,5,6,7'
model = crnn.get_crnn(config)
model = model.to(device)
crnn.py  # model definition
self.cnn = cnn
self.cnn = nn.DataParallel(self.cnn)  # <------ here !
self.rnn = nn.Sequential(
    BidirectionalLSTM(512, nh, nh),
    BidirectionalLSTM(nh, nh, nclass))
self.rnn = nn.DataParallel(self.rnn)  # <------ here !
lib/core/function.py
def train(config, train_loader, dataset, converter, model, ...
    preds = model(inp).cuda()  # <---- changed to this
def validate(config, val_loader, dataset, converter, model, ...
    preds = model(inp).cuda()  # <---- changed to this (note: wrong here; use CPU)
config/360CC_config.yaml
# increase workers and Batch_Size (train & valid)
```
Execution results:
```
load 3279605 images!
config.TRAIN.BATCH_SIZE_PER_GPU 4000
load 364400 images!
config.TEST.BATCH_SIZE_PER_GPU 4000
torch.Size([4000, 512, 1, 41])
Epoch: [0][0/820] Time 54.249s (54.249s) Speed 73.7 samples/s Data 27.694s (27.694s) Loss 32.95580 (32.95580)
torch.Size([4000, 512, 1, 41])
torch.Size([4000, 512, 1, 41])
torch.Size([4000, 512, 1, 41])
torch.Size([4000, 512, 1, 41])
torch.Size([4000, 512, 1, 41])
torch.Size([4000, 512, 1, 41])
torch.Size([4000, 512, 1, 41])
torch.Size([4000, 512, 1, 41])
torch.Size([4000, 512, 1, 41])
torch.Size([4000, 512, 1, 41])
Epoch: [0][10/820] Time 1.102s (5.968s) Speed 3629.7 samples/s Data 0.015s (2.540s) Loss 31.65610 (32.42912)
```
(Figure: every GPU being utilized.)

# The incorrect setup
```
train.py
# construct face related neural networks
model = crnn.get_crnn(config)
# get device
print('torch.cuda.device_count()', torch.cuda.device_count())
device="cuda"
model = torch.nn.DataParallel(model, device_ids=[0,1])  # two GPUs configured
model = model.to(device)
```
# Parameters with a single GPU
```
load 2000 images!
load 1000 images!
torch.Size([1000, 512, 1, 41])
lib/function preds.size(): torch.Size([41, 1000, 6736])
lib/function inp.size(): torch.Size([1000, 1, 32, 160])
lib/function inp.size(0): 1000
lib/function batch_size: 1000
lib/function text.size(), length.size(): torch.Size([10000]) torch.Size([1000])
lib/function preds_size.size(), [preds.size(0)]: torch.Size([1000]) [41]
loss = criterion(preds, text, preds_size, length)
loss = criterion(preds, text, torch.Size([41, 1000, 6736]), torch.Size([1000])
```
# With 2 GPUs
```
torch.Size([500, 512, 1, 41])
torch.Size([500, 512, 1, 41])
lib/function preds.size(): torch.Size([82, 500, 6736])
lib/function inp.size(): torch.Size([1000, 1, 32, 160])
lib/function inp.size(0): 1000
lib/function batch_size: 1000
lib/function text.size(), length.size(): torch.Size([10000]) torch.Size([1000])
lib/function preds_size.size(), [preds.size(0)]: torch.Size([1000]) [82]
loss = criterion(preds, text, preds_size, length)
loss = criterion(preds, text, torch.Size([82, 500, 6736]), torch.Size([1000])
# becomes 41 + 41 = 82, and the batch size is halved to 500
```
This ended in failure. The likely cause:
model = torch.nn.DataParallel(model, device_ids=[0,1])
This replicates the model onto each GPU, but when the loss is computed the w = 41 outputs have been merged:
the original torch.Size([41, 1000, 6736]) became torch.Size([82, 500, 6736]).
w was doubled and the batch was halved (split across the two GPUs). This is nn.DataParallel's default behavior: it scatters inputs and gathers outputs along dim 0, and since the CRNN output keeps its batch on dim 1, the per-GPU outputs get concatenated along dim 0 (the time axis) instead of the batch axis,
i.e. torch.Size([41, 500, 6736]) + torch.Size([41, 500, 6736]) -> torch.Size([82, 500, 6736])
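A minimal sketch of that failure mode (shapes scaled down so it runs anywhere; the dim-0 gather is standard nn.DataParallel behavior, and the CRNN shapes follow the logs above):
```
import torch

# per-GPU CRNN outputs are (T, B/2, C); C is shrunk from 6736 to 8 here
# so the sketch runs without a GPU
out_gpu0 = torch.zeros(41, 500, 8)
out_gpu1 = torch.zeros(41, 500, 8)

# nn.DataParallel gathers the replica outputs along dim=0 by default; for the
# CRNN output that is the time axis, not the batch axis:
gathered = torch.cat([out_gpu0, out_gpu1], dim=0)
print(gathered.shape)  # torch.Size([82, 500, 8]) -- wrong shape for the CTC loss

# wrapping only self.cnn / self.rnn (the "correct setup" above) scatters and
# gathers each submodule on its own dim 0, so the overall output shape is kept
```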