# CRNN multiGPU training
Chinese_Characters_Rec, modified for multi-GPU training

# 23'0519 inference issue (rename checkpoint state_dict)
While putting together the final inference script, the model checkpoint refused to load. Cause analysis and fix:
```
For multi-GPU training the sub-modules were wrapped in DataParallel (self.cnn = nn.DataParallel(self.cnn)),
so when the model is saved, every state_dict key automatically gains a "module." segment,
e.g. 'cnn.module.conv0.weight', 'rnn.module.1.embedding.bias'.
The state_dict produced by the original (non-multi-GPU) crnn code has no such segment,
e.g. 'cnn.conv0.weight', 'rnn.1.embedding.bias'.
Switching back to the non-multi-GPU inference script (including CPU inference),
loading the checkpoint raises an error: it expects 'cnn.conv0.weight',
not 'cnn.module.conv0.weight' (with the extra "module."),
so the state_dict keys have to be renamed back. Method:

import torch
from lib.utils.utils import model_info
import lib.models.crnn as crnn

checkpoint_path = 'output/checkpoints/checkpoint_349_acc_0.9797.pth'
checkpoint = torch.load(checkpoint_path, map_location='cpu')

rename_list = []  # old-name / new-name pairs, kept around for debugging
old_list = list(checkpoint['state_dict'].keys())
# for ol in old_list:
#     rename_list.append([ol, ol.replace('module.', '')])
# print('rename_list:', rename_list)

# pop every key and re-insert it without the "module." segment
for key in old_list:
    checkpoint['state_dict'][key.replace('module.', '')] = checkpoint['state_dict'].pop(key)

torch.save(checkpoint, 'output/checkpoints/checkpoint_349_acc_0.9797_rename.pth')
```
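If keeping a second renamed copy of the checkpoint on disk is undesirable, the same rename can also be done in memory right before `load_state_dict`. A minimal sketch, assuming the repo's `crnn.get_crnn(config)` constructor (the helper name `load_single_gpu` is hypothetical):

```python
import torch
import lib.models.crnn as crnn

def load_single_gpu(config, ckpt_path):
    """Build the plain (non-DataParallel) CRNN and load a multi-GPU checkpoint."""
    model = crnn.get_crnn(config)
    checkpoint = torch.load(ckpt_path, map_location='cpu')
    # strip the "module." segment on the fly instead of re-saving the file
    state_dict = {k.replace('module.', ''): v
                  for k, v in checkpoint['state_dict'].items()}
    model.load_state_dict(state_dict)
    return model.eval()
```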
# 23'0329 New progress
The changes:
```
lib\core\function.py

inp = inp.cuda()     # add this; this combination is the fastest (instead of to(device))
preds = model(inp)   # with cuda() here the speed doubles: 350 samples/s -> 700 samples/s
...
preds = preds.cpu()  # this is the key line: sim_preds_convert_time_B drops to 1.009s (it used to be 8s)
sim_preds = converter.decode(preds.data, preds_size.data, raw=False)

lib\config\360CC_config.yaml   # only batch_size changed here

TRAIN:
  BATCH_SIZE_PER_GPU: 1100
  SHUFFLE: True
  BEGIN_EPOCH: 0
  END_EPOCH: 1000
  RESUME:
    IS_RESUME: True
    FILE: 'output/360CC/checkpoints/checkpoint_316_acc_0.9783.pth'
  OPTIMIZER: 'adam'
  LR: 0.001
  WD: 0.0
  LR_STEP: [315, 320]  # the configured LR applies up to the first milestone epoch, then *0.1; past the second milestone *0.1 again (1/100 overall)
  LR_FACTOR: 0.1       # https://pytorch.org/docs/stable/generated/torch.optim.lr_scheduler.MultiStepLR.html
  MOMENTUM: 0.0
  NESTEROV: False
  RMSPROP_ALPHA:
  RMSPROP_CENTERED:
  FINETUNE:
    IS_FINETUNE: False
    FINETUNE_CHECKPOINIT: 'output/360CC/checkpoints/checkpoint_266_acc_0.9443.pth'
    FREEZE: False
TEST:
  BATCH_SIZE_PER_GPU: 1100
  SHUFFLE: True   # for random test rather than test on the whole validation set
  NUM_TEST_BATCH: 1000
  NUM_TEST_DISP: 10
```
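To make the LR_STEP / LR_FACTOR comment above concrete, a minimal self-contained sketch of how MultiStepLR applies those milestones (the Linear model is only a stand-in for the CRNN):

```python
import torch.nn as nn
import torch.optim as optim

model = nn.Linear(10, 10)  # stand-in for the CRNN
optimizer = optim.Adam(model.parameters(), lr=0.001)
# LR_STEP maps onto milestones, LR_FACTOR onto gamma
scheduler = optim.lr_scheduler.MultiStepLR(optimizer, milestones=[315, 320], gamma=0.1)

for epoch in range(325):
    optimizer.step()   # the actual training step would happen here
    scheduler.step()
    # lr stays 0.001 for epochs below 315, becomes 1e-4 from epoch 315,
    # and 1e-5 from epoch 320 onward (multiplied by 0.1 at each milestone)

print(scheduler.get_last_lr())  # approximately [1e-05] after passing both milestones
```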
The results:
```
strat of training time: 2023/03/29 16:55:15
=> creating output\360CC\checkpoints
=> creating output\360CC\log
torch.cuda.device_count() 2
load 3239051 images!
load 170477 images!
Epoch: [317][1/2945] Time 137.122s (137.122s) Speed 8.0 samples/s Data 120.794s (120.794s) Loss 0.00039 (0.00039) preds_time 10.703
Epoch: [317][101/2945] Time 0.953s (2.281s) Speed 1154.1 samples/s Data 0.000s (1.196s) Loss 0.00325 (0.00521) preds_time 0.343
Epoch: [317][201/2945] Time 0.953s (1.618s) Speed 1154.1 samples/s Data 0.000s (0.601s) Loss 0.01150 (0.00523) preds_time 0.295
Epoch: [317][301/2945] Time 1.000s (1.402s) Speed 1100.0 samples/s Data 0.016s (0.402s) Loss 0.00591 (0.00595) preds_time 0.282
Epoch: [317][401/2945] Time 0.984s (1.295s) Speed 1117.5 samples/s Data 0.016s (0.302s) Loss 0.00525 (0.00665) preds_time 0.276
Epoch: [317][501/2945] Time 0.984s (1.233s) Speed 1117.5 samples/s Data 0.000s (0.242s) Loss 0.00417 (0.00689) preds_time 0.274
Epoch: [317][601/2945] Time 1.016s (1.194s) Speed 1083.1 samples/s Data 0.000s (0.202s) Loss 0.00595 (0.00715) preds_time 0.273
Epoch: [317][701/2945] Time 1.000s (1.166s) Speed 1100.0 samples/s Data 0.000s (0.173s) Loss 0.00321 (0.00723) preds_time 0.273
Epoch: [317][801/2945] Time 1.000s (1.147s) Speed 1100.0 samples/s Data 0.000s (0.151s) Loss 0.01254 (0.00772) preds_time 0.274
Epoch: [317][901/2945] Time 1.172s (1.132s) Speed 938.7 samples/s Data 0.000s (0.135s) Loss 0.01000 (0.00779) preds_time 0.274
Epoch: [317][1001/2945] Time 1.031s (1.121s) Speed 1066.7 samples/s Data 0.000s (0.121s) Loss 0.00282 (0.00773) preds_time 0.274
Epoch: [317][1101/2945] Time 1.031s (1.112s) Speed 1066.7 samples/s Data 0.016s (0.110s) Loss 0.00731 (0.00764) preds_time 0.274
Epoch: [317][1201/2945] Time 1.047s (1.104s) Speed 1050.8 samples/s Data 0.000s (0.101s) Loss 0.01373 (0.00757) preds_time 0.274
Epoch: [317][1301/2945] Time 1.000s (1.098s) Speed 1100.0 samples/s Data 0.000s (0.093s) Loss 0.00786 (0.00786) preds_time 0.273
Epoch: [317][1401/2945] Time 1.031s (1.093s) Speed 1066.5 samples/s Data 0.000s (0.087s) Loss 0.01049 (0.00806) preds_time 0.274
Epoch: [317][1501/2945] Time 1.000s (1.088s) Speed 1100.2 samples/s Data 0.000s (0.081s) Loss 0.00809 (0.00807) preds_time 0.274
Epoch: [317][1601/2945] Time 1.007s (1.084s) Speed 1092.8 samples/s Data 0.000s (0.076s) Loss 0.00892 (0.00800) preds_time 0.274
Epoch: [317][1701/2945] Time 1.065s (1.081s) Speed 1032.8 samples/s Data 0.000s (0.071s) Loss 0.01292 (0.00796) preds_time 0.274
Epoch: [317][1801/2945] Time 1.297s (1.079s) Speed 848.2 samples/s Data 0.000s (0.067s) Loss 0.00706 (0.00851) preds_time 0.275
Epoch: [317][1901/2945] Time 1.004s (1.076s) Speed 1095.2 samples/s Data 0.000s (0.064s) Loss 0.00488 (0.00853) preds_time 0.275
Epoch: [317][2001/2945] Time 1.016s (1.074s) Speed 1083.2 samples/s Data 0.000s (0.061s) Loss 0.00518 (0.00851) preds_time 0.275
Epoch: [317][2101/2945] Time 1.016s (1.072s) Speed 1083.1 samples/s Data 0.000s (0.058s) Loss 0.01083 (0.00851) preds_time 0.275
Epoch: [317][2201/2945] Time 1.032s (1.070s) Speed 1065.6 samples/s Data 0.000s (0.055s) Loss 0.01062 (0.00850) preds_time 0.275
Epoch: [317][2301/2945] Time 1.031s (1.068s) Speed 1066.7 samples/s Data 0.000s (0.053s) Loss 0.02037 (0.00857) preds_time 0.276
Epoch: [317][2401/2945] Time 1.016s (1.066s) Speed 1083.1 samples/s Data 0.000s (0.051s) Loss 0.00674 (0.00872) preds_time 0.276
Epoch: [317][2501/2945] Time 1.031s (1.065s) Speed 1066.7 samples/s Data 0.000s (0.049s) Loss 0.01201 (0.00884) preds_time 0.276
Epoch: [317][2601/2945] Time 1.047s (1.064s) Speed 1050.8 samples/s Data 0.000s (0.047s) Loss 0.02099 (0.00890) preds_time 0.277
Epoch: [317][2701/2945] Time 1.031s (1.063s) Speed 1066.7 samples/s Data 0.000s (0.045s) Loss 0.00874 (0.00897) preds_time 0.277
Epoch: [317][2801/2945] Time 1.047s (1.062s) Speed 1050.8 samples/s Data 0.016s (0.044s) Loss 0.00585 (0.00895) preds_time 0.277
Epoch: [317][2901/2945] Time 1.004s (1.061s) Speed 1095.2 samples/s Data 0.000s (0.042s) Loss 0.01214 (0.00892) preds_time 0.278
get_last_lr : [0.001]
train epoch time consume: 3134.1819939613342
Epoch: [317][10/155] Time 1.166s (6.662s) Speed 943.1 samples/s Data 0.000s (5.352s) Loss 0.06587 (0.07565) conver_time_A 0.004 preds_time 0.294 sim_preds_convert_time_B 0.849
Epoch: [317][20/155] Time 1.268s (3.952s) Speed 867.4 samples/s Data 0.000s (2.677s) Loss 0.04255 (0.06993) conver_time_A 0.003 preds_time 0.273 sim_preds_convert_time_B 0.839
Epoch: [317][30/155] Time 1.396s (3.063s) Speed 787.8 samples/s Data 0.016s (1.785s) Loss 0.07917 (0.06675) conver_time_A 0.003 preds_time 0.265 sim_preds_convert_time_B 0.851
Epoch: [317][40/155] Time 1.359s (2.621s) Speed 809.3 samples/s Data 0.000s (1.339s) Loss 0.03638 (0.06470) conver_time_A 0.003 preds_time 0.262 sim_preds_convert_time_B 0.859
Epoch: [317][50/155] Time 1.152s (2.354s) Speed 954.9 samples/s Data 0.000s (1.071s) Loss 0.08211 (0.06429) conver_time_A 0.003 preds_time 0.262 sim_preds_convert_time_B 0.861
Epoch: [317][60/155] Time 1.351s (2.171s) Speed 814.5 samples/s Data 0.001s (0.893s) Loss 0.09275 (0.06577) conver_time_A 0.002 preds_time 0.261 sim_preds_convert_time_B 0.857
Epoch: [317][70/155] Time 1.240s (2.056s) Speed 886.8 samples/s Data 0.000s (0.765s) Loss 0.05339 (0.06642) conver_time_A 0.002 preds_time 0.260 sim_preds_convert_time_B 0.867
Epoch: [317][80/155] Time 1.234s (1.958s) Speed 891.2 samples/s Data 0.000s (0.670s) Loss 0.10279 (0.06642) conver_time_A 0.002 preds_time 0.261 sim_preds_convert_time_B 0.864
Epoch: [317][90/155] Time 1.346s (1.890s) Speed 817.3 samples/s Data 0.000s (0.595s) Loss 0.04559 (0.06646) conver_time_A 0.002 preds_time 0.261 sim_preds_convert_time_B 0.872
Epoch: [317][100/155] Time 1.205s (1.824s) Speed 912.7 samples/s Data 0.000s (0.536s) Loss 0.07828 (0.06568) conver_time_A 0.002 preds_time 0.260 sim_preds_convert_time_B 0.865
Epoch: [317][110/155] Time 1.297s (1.770s) Speed 847.8 samples/s Data 0.000s (0.487s) Loss 0.07391 (0.06615) conver_time_A 0.002 preds_time 0.261 sim_preds_convert_time_B 0.860
Epoch: [317][120/155] Time 1.203s (1.727s) Speed 914.2 samples/s Data 0.000s (0.447s) Loss 0.07416 (0.06679) conver_time_A 0.002 preds_time 0.261 sim_preds_convert_time_B 0.857
Epoch: [317][130/155] Time 1.297s (1.692s) Speed 848.2 samples/s Data 0.000s (0.412s) Loss 0.05114 (0.06671) conver_time_A 0.002 preds_time 0.260 sim_preds_convert_time_B 0.858
Epoch: [317][140/155] Time 1.297s (1.670s) Speed 848.3 samples/s Data 0.000s (0.383s) Loss 0.03633 (0.06670) conver_time_A 0.002 preds_time 0.260 sim_preds_convert_time_B 0.867
Epoch: [317][150/155] Time 1.193s (1.641s) Speed 921.7 samples/s Data 0.000s (0.357s) Loss 0.08962 (0.06750) conver_time_A 0.002 preds_time 0.261 sim_preds_convert_time_B 0.863
------------眇---寛----藤----瓠----隅---技----緁----癣----湟----仄--------------- => 眇寛藤瓠隅技緁癣湟仄, gt: 眇寛藤瓠隅技緁癣湟仄 76467
-N---a-t-iikk-a- -f-r-e--r-e--s---------------------------------------- => Natika freres, gt: Natika freres 76467
---莛-----薨-----鲈----雀-----酷---咀-----烱-----谯----承----------------------- => 莛薨鲈雀酷咀烱谯承 , gt: 莛薨鲈雀酷咀烱谯承 76467
-------焚-----辉----琁-----嗓-----饞-----鸳----棚-----媞-----巫----萦------------ => 焚辉琁嗓饞鸳棚媞巫萦, gt: 焚辉琁嗓饞鸳棚媞巫萦 76467
------------貂----剿---佬----答----狡---坏----他---枌----殊---鎘----------------- => 貂剿佬答狡坏他枌殊鎘, gt: 貂剿佬答狡坏他枌殊鎘 76467
---辰------篱------誣------铸-----貫------靂------淫------屍------垂------------ => 辰篱誣铸貫靂淫屍垂 , gt: 辰篱誣铸貫靂淫屍垂 76467
-------------趺----銧---涛---錐----姘---漕漕---僩---褡---碱----帕----------------- => 趺銧涛錐姘漕僩褡碱帕, gt: 趺銧涛錐姘漕僩褡碱帕 76467
-------------峯-- --践-- --方-- -橘-- --腥-- -渣--- -薯----------------- => 峯 践 方 橘 腥 渣 薯, gt: 峯 践 方 橘 腥 渣 薯 76467
---廼----怯-----纔----丑----霧----热-----瘸----遼----褟------------------------- => 廼怯纔丑霧热瘸遼褟 , gt: 廼怯纔丑霧热瘸遼褟 76467
-----莹---- --柜--- --砧--- --替-- ---亢--- --瓶--- --涼---------- => 莹 柜 砧 替 亢 瓶 涼, gt: 莹 柜 砧 替 亢 瓶 涼 76467
[#correct:163821 / #total:170477]
Test loss: 0.0676, accuray: 0.9610
acc validate time consume: 256.3665907382965
is best: True
best acc is: 0.9609566099825783
```
Previously it looked like this:
```
load 3239051 images!
load 170477 images!
Epoch: [280][1/2793] Time 142.859s (142.859s) Speed 8.1 samples/s Data 124.562s (124.562s) Loss 0.01011 (0.01011)
Epoch: [280][101/2793] Time 1.031s (2.434s) Speed 1124.8 samples/s Data 0.000s (1.234s) Loss 0.01207 (0.02511)
...
Epoch: [280][2701/2793] Time 1.187s (1.159s) Speed 976.8 samples/s Data 0.000s (0.047s) Loss 0.02497 (0.02608)
get_last_lr : [0.001]
train epoch time consume: 3244.6378741264343
Epoch: [280][10/147] Time 3.359s (9.219s) Speed 345.3 samples/s Data 0.000s (5.784s) Loss 0.04841 (0.04270)
Epoch: [280][20/147] Time 3.188s (6.208s) Speed 363.9 samples/s Data 0.000s (2.892s) Loss 0.02864 (0.04360)
Epoch: [280][30/147] Time 3.172s (5.228s) Speed 365.7 samples/s Data 0.000s (1.929s) Loss 0.02289 (0.04255)
...
Epoch: [280][130/147] Time 3.000s (3.529s) Speed 386.7 samples/s Data 0.000s (0.445s) Loss 0.04832 (0.04314)
Epoch: [280][140/147] Time 3.000s (3.489s) Speed 386.7 samples/s Data 0.000s (0.414s) Loss 0.03028 (0.04297)
----酴--- ---梅---- ---干--- ----擰---- -----蒙----------------------- => 酴 梅 干 擰 蒙 , gt: 酴 梅 干 擰 蒙 79307
[#correct:162978 / #total:170477]
Test loss: 0.0432, accuray: 0.9560
acc validate time consume: 513.2954776287079   <--- second place
```

# 23'0325 Adjusting the settings again
GPU utilization during the training phase looks fairly normal, but during validate the load seems to lean heavily on one of the GPUs.

![](https://i.imgur.com/q6taFch.png)
![](https://i.imgur.com/IhoprUL.png)

The current settings:
```
lib\models\crnn.py

self.cnn = cnn
self.cnn = nn.DataParallel(self.cnn)   <--- here
self.rnn = nn.Sequential(
    BidirectionalLSTM(512, nh, nh),
    BidirectionalLSTM(nh, nh, nclass))
self.rnn = nn.DataParallel(self.rnn)   <--- here

train.py

elif config.TRAIN.RESUME.IS_RESUME:
    model_state_file = config.TRAIN.RESUME.FILE
    if model_state_file == '':
        print(" => no checkpoint found")
    checkpoint = torch.load(model_state_file, map_location='cpu')
    if 'state_dict' in checkpoint.keys():
        model.load_state_dict(checkpoint['state_dict'])
        last_epoch = checkpoint['epoch']
        # optimizer.load_state_dict(checkpoint['optimizer'])
        # lr_scheduler.load_state_dict(checkpoint['lr_scheduler'])
    else:
        model.load_state_dict(checkpoint)

# model = torch.nn.DataParallel(model, device_ids=[0,1])
model = model.to(device)   # move the model to the device only after it is fully loaded   <--- here

lib\core\function.py

labels = utils.get_batch_label(dataset, idx)
# print('lib/function len(labels): ', len(labels))
inp = inp.to(device)
# print('lib/function inp.size(): ', inp.size())
# inference
preds = model(inp)   #.cuda()
...
labels = utils.get_batch_label(dataset, idx)
# labels = labels.to(device)
inp = inp.to(device)
...
# inference
# model = model.to(device)
preds = model(inp)   # .cuda()
```
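To spot-check that imbalance from inside the code rather than from the Task Manager graphs, the allocated memory per device is one cheap proxy. A minimal sketch (assumes CUDA is available; allocated memory is only a rough stand-in for utilization):

```python
import torch

# an uneven split across devices during validate is a hint that most of
# the work is landing on a single GPU
for d in range(torch.cuda.device_count()):
    mib = torch.cuda.memory_allocated(d) / 2**20
    print(f'cuda:{d}: {mib:.0f} MiB allocated')
```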
With timing measurements added as well, the results:
```
strat of training time: 2023/03/25 06:24:55
=> creating output\360CC\checkpoints
=> creating output\360CC\log
torch.cuda.device_count() 2
load 3242588 images!
load 170663 images!
C:\Application\Anaconda3\envs\dbnet_wenmu\lib\site-packages\torch\cuda\nccl.py:15: UserWarning: PyTorch is not compiled with NCCL support
  warnings.warn('PyTorch is not compiled with NCCL support')
Epoch: [265][1/3167] Time 75.981s (75.981s) Speed 13.5 samples/s Data 63.827s (63.827s) Loss 0.06747 (0.06747)
...
Epoch: [265][3101/3167] Time 0.971s (1.006s) Speed 1054.4 samples/s Data 0.000s (0.027s) Loss 0.05320 (0.05584)
get_last_lr : [0.001]
train epoch time consume: 3192.4820868968964   <--- train
Epoch: [265][100/167]
-1------11--6--9---6---9--1- -0----- --1----s----o--------------------- => 1169691 0 1so, gt: 4588969107 2865 48209
----池-----绗------重-----斜------隋------煨-----定------应------乌------------- => 池绗重斜隋煨定应乌 , gt: 池绗重斜隋煨定应乌 48209
-------距----韌-----痿痿----誅-----9---屬----遣------离-----舄------滢----------- => 距韌痿誅9屬遣离舄滢, gt: 距韌痿誅9屬遣离舄滢 48209
----镕-----冢-----鐀------營-----拷------憓------璧-----尾-----姪--------------- => 镕冢鐀營拷憓璧尾姪 , gt: 镕冢鐀營拷憓璧尾姪 48209
----夤-----繽-----坼------秾-----纶------蹴-----司------蚤------靂-------------- => 夤繽坼秾纶蹴司蚤靂 , gt: 夤繽坼秾纶蹴司蚤靂 48209
----理---泛--押----旼---貿----繚----鐡---汯---咚-------------------------------- => 理泛押旼貿繚鐡汯咚 , gt: 理泛押旼貿繚鐡汯咚 48209
----惆--- ----渼--- ----愴--- ----最--- -----寷------------------------- => 惆 渼 愴 最 寷 , gt: 惆 渼 愴 最 寷 48209
----骯---越-----舷-----劂----裂-----市-----直----丧-----业---------------------- => 骯越舷劂裂市直丧业 , gt: 骯越舷劂裂市直丧业 48209
--------埔-----褫----礽-----戴-----疏----愉-----凶-----瓠----阿-----\----------- => 埔褫礽戴疏愉凶瓠阿\, gt: 埔褫礽戴疏愉凶瓠阿\ 48209
----眨---屎---彪---鵲----井---俯俯---彤----妨---帑------------------------------- => 眨屎彪鵲井俯彤妨帑 , gt: 眨屎彪鵲井俯彤妨帑 48209
[#correct:154625 / #total:170663]
Test loss: 0.0761, accuray: 0.9060
acc validate time consume: 1296.1403181552887   <--- validate time
is best: True
best acc is: 0.9060253247628367
Epoch: [266][1/3167] Time 64.968s (64.968s) Speed 15.8 samples/s Data 63.187s (63.187s) Loss 0.05739 (0.05739)
Epoch: [266][101/3167] Time 0.953s (1.570s) Speed 1074.4 samples/s Data 0.000s (0.633s) Loss 0.06573 (0.05908)
```
The acc validate time is clearly far too long, so this still has to be solved.
```
lib\models\crnn.py   # this part is unchanged

self.cnn = cnn
self.cnn = nn.DataParallel(self.cnn)
self.rnn = nn.Sequential(
    BidirectionalLSTM(512, nh, nh),
    BidirectionalLSTM(nh, nh, nclass))
self.rnn = nn.DataParallel(self.rnn)

lib\core\function.py

criterion = criterion.cuda()
model.train()
inp = inp.cuda()
...
criterion = criterion.cuda()
model.eval()
inp = inp.cuda()

train.py

model = model.cuda()   # move the model to the device only after it is fully loaded
model_info(model)
```
```
TRAIN:
  BATCH_SIZE_PER_GPU: 1100
  SHUFFLE: True
  BEGIN_EPOCH: 0
  END_EPOCH: 1000
  RESUME:
    IS_RESUME: True
    FILE: 'output/360CC/checkpoints/checkpoint_265_acc_0.9362.pth'
  OPTIMIZER: 'adam'
  LR: 0.001

strat of training time: 2023/03/25 11:14:44
=> creating output\360CC\checkpoints
=> creating output\360CC\log
torch.cuda.device_count() 2
load 3242588 images!
load 170663 images!
warnings.warn('PyTorch is not compiled with NCCL support')
Epoch: [266][1/2948] Time 76.860s (76.860s) Speed 14.3 samples/s Data 64.411s (64.411s) Loss 0.06074 (0.06074)
Epoch: [266][501/2948] Time 0.984s (1.125s) Speed 1117.5 samples/s Data 0.000s (0.129s) Loss 0.05679 (0.05020)
Epoch: [266][601/2948] Time 0.984s (1.103s) Speed 1117.4 samples/s Data 0.000s (0.108s) Loss 0.05902 (0.05163)
...
Epoch: [266][2901/2948] Time 1.016s (1.044s) Speed 1083.1 samples/s Data 0.000s (0.023s) Loss 0.03073 (0.05148)
get_last_lr : [0.001]
train epoch time consume: 3087.827137708664
Epoch: [266][100/156]
-------------蘚----邃---曠----隸---灞----鍹---[-羯羯--泊----吏------------------- => 蘚邃曠隸灞鍹[羯泊吏, gt: 蘚邃曠隸灞鍹[羯泊吏 11573
-------隆----這----柿-----渑------窯---没没-----駿----军-----犁----扑------------- => 隆這柿渑窯没駿军犁扑, gt: 隆這柿渑窯没駿军犁扑 11573
----忪---禧-----呢----廠----别----迫-----W---签签----艮------------------------- => 忪禧呢廠别迫W签艮 , gt: 忪禧呢廠别迫W签艮 11573
----炯-----几------辂-----迦-----沭沭------胚-----樵------黝------诡------------- => 炯几辂迦沭胚樵黝诡 , gt: 炯几辂迦沭胚樵黝诡 11573
----碇--- ----瀑--- ---骚--- ---驃--- ----箇--------------------------- => 碇 瀑 骚 驃 箇 , gt: 碇 瀑 骚 驃 箇 11573
---羹----- ------媃----- ----榆----- ------欤----- -----甕------------ => 羹 媃 榆 欤 甕 , gt: 羹 媃 榆 欤 甕 11573
--------羋----蠟蠟----嘻----賴-----洌-----打----缰----#---諛-----堃-------------- => 羋蠟嘻賴洌打缰#諛堃, gt: 羋蠟嘻賴洌打缰#諛堃 11573
-------啕--- -輟---- -胰---- -夭--- --竿--- --曲--- --扑------------ => 啕 輟 胰 夭 竿 曲 扑, gt: 啕 輟 胰 夭 竿 曲 扑 11573
----硃--- ---酦--- ----窈-- ---莺---- ----篌------------------------------ => 硃 酦 窈 莺 篌 , gt: 硃 酦 窈 莺 篌 11573
----鵬----旁----廷----势----喙-----昤----蝌----柝----杠------------------------- => 鵬旁廷势喙昤蝌柝杠 , gt: 鵬旁廷势喙昤蝌柝杠 11573
[#correct:161152 / #total:170663]
Test loss: 0.0529, accuray: 0.9443
acc validate time consume: 1300.369472026825
is best: True
best acc is: 0.9442702870569485

strat of training time: 2023/03/25 12:54:10
=> creating output\360CC\checkpoints
=> creating output\360CC\log
torch.cuda.device_count() 2
load 52098 images!
load 2743 images!
C:\Application\Anaconda3\envs\dbnet_wenmu\lib\site-packages\torch\cuda\nccl.py:15: UserWarning: PyTorch is not compiled with NCCL support
  warnings.warn('PyTorch is not compiled with NCCL support')
Epoch: [267][1/204] Time 32.155s (32.155s) Speed 8.0 samples/s Data 24.858s (24.858s) Loss 6.39860 (6.39860)
Epoch: [267][11/204] Time 0.203s (3.105s) Speed 1260.0 samples/s Data 0.000s (2.260s) Loss 0.70246 (1.82520)
...
Epoch: [267][191/204] Time 0.203s (0.371s) Speed 1262.4 samples/s Data 0.000s (0.130s) Loss 0.35743 (0.50064)
Epoch: [267][201/204] Time 0.203s (0.362s) Speed 1259.1 samples/s Data 0.000s (0.124s) Loss 0.33498 (0.49580)
get_last_lr : [0.001]
train epoch time consume: 77.23187446594238
Epoch: [267][10/43] Time 29.733s (27.641s) Speed 2.2 samples/s Data 29.265s (27.074s) Loss 0.33460 (0.28818)
Epoch: [267][20/43] Time 34.983s (30.138s) Speed 1.8 samples/s Data 34.499s (29.592s) Loss 0.45812 (0.33128)
Epoch: [267][30/43] Time 40.311s (32.738s) Speed 1.6 samples/s Data 39.796s (32.196s) Loss 0.33809 (0.34336)
Epoch: [267][40/43] Time 45.530s (35.344s) Speed 1.4 samples/s Data 44.952s (34.808s) Loss 0.47718 (0.33017)
-1--77---9--55--99--88- -0---3--1---2--2------------------------------ => 179598 03122, gt: 179598 03122 3905
---淅--- ---役---- ---櫛---- ----弓-- ---輩--------------------------- => 淅 役 櫛 弓 輩 , gt: 淅 役 櫛 弓 輩 3905
---崧---- ---喷---- --七七--- ---莎莎--- ---故------------------------ => 崧 喷 七 莎 故 , gt: 崧 喷 七 莎 故 3905
---像---- ----闞--- ---喊---- ----星--- ----蠢------------------------- => 像 闞 喊 星 蠢 , gt: 像 闞 喊 星 蠢 3905
---诞----- -----琐---- ----厭---- -----鏗----- ----緝-------------- => 诞 琐 厭 鏗 緝 , gt: 诞 瓒 厭 鏗 緝 3905
---鑣----- -----玲---- ----玹---- -----佺----- ----特-------------- => 鑣 玲 玹 佺 特 , gt: 鑣 玲 玹 佺 特 3905
-44---0---0---22---4---4----0---6-- -55---2---5----1---7-------------- => 40024406 52517, gt: 40024406 52517 3905
---佈---- ----隆----- ----倍---- ---冲---- ---苋------------------ => 佈 隆 倍 冲 苋 , gt: 佈 隆 倍 冲 苋 3905
----戚---- ----潯---- ----涌---- ---婵---- ----檔-------------------- => 戚 潯 涌 婵 檔 , gt: 戚 潯 涌 婵 檔 3905
-2---3---3--- --5-----0----4----8-----0---4--8----6---3---------------- => 233 504804863, gt: 2331 504801863 3905
[#correct:2112 / #total:2743]
Test loss: 0.3309, accuray: 0.7700
acc validate time consume: 48.373497009277344
is best: True
best acc is: 0.7699598979219833

val_dataset = get_dataset(config)(config, is_train=False)
val_loader = DataLoader(
    dataset=val_dataset,
    batch_size=config.TEST.BATCH_SIZE_PER_GPU,
    shuffle=False,
    num_workers=0,
    pin_memory=True
)

Epoch: [272][181/204] Time 0.216s (0.285s) Speed 1183.1 samples/s Data 0.000s (0.080s) Loss 0.28915 (0.25591)
Epoch: [272][191/204] Time 0.201s (0.281s) Speed 1273.3 samples/s Data 0.000s (0.076s) Loss 0.26008 (0.25824)
Epoch: [272][201/204] Time 0.193s (0.277s) Speed 1323.1 samples/s Data 0.001s (0.073s) Loss 0.30827 (0.26050)
get_last_lr : [0.001]
train epoch time consume: 56.546335220336914
Epoch: [272][10/43] Time 5.578s (2.956s) Speed 11.5 samples/s Data 4.987s (2.434s) Loss 0.58593 (0.35299)
Epoch: [272][20/43] Time 11.662s (5.952s) Speed 5.5 samples/s Data 11.016s (5.405s) Loss 0.21616 (0.31394)
Epoch: [272][30/43] Time 17.700s (8.949s) Speed 3.6 samples/s Data 17.156s (8.396s) Loss 0.40394 (0.33232)
Epoch: [272][40/43] Time 23.547s (11.944s) Speed 2.7 samples/s Data 23.016s (11.393s) Loss 0.37459 (0.34642)
---瀑---- ----亘--- ---喘--- ---蕎--- ---型-------------------------- => 瀑 亘 喘 蕎 型 , gt: 瀑 亘 喘 蕎 型 3905
---竽---- ----究--- ----闽--- ----骥--- ---勾------------------------ => 竽 究 闽 骥 勾 , gt: 竽 究 闽 骥 勾 3905
```
Several setting tweaks all came out about the same. It looks like file reading is the problem: the validate dataset is scattered all over, so reading the images becomes very slow. The best result so far came from adjusting num_workers, and even that helps only a little.
Next steps might be to move the validate dataset to a dedicated location, or to read it into an h5 or pkl file for speed!!

23'03/26
Tried moving the validate data into a separate folder. It made no difference!!
Changing num_workers did nothing either!!
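As a sketch of the "read it into a pkl file" idea above: decode every validation image once, cache the grayscale arrays, and let `__getitem__` serve them from memory. The labels layout mirrors lib\dataset\_360cc.py, but this exact helper and the cache file name are hypothetical:

```python
import os
import pickle
import cv2

def build_val_cache(root, labels, cache_path='val_cache.pkl'):
    """Decode every validation image once and cache the grayscale arrays."""
    cache = []
    for entry in labels:                      # same layout as _360cc.py: [{img_name: text}, ...]
        img_name = list(entry.keys())[0]
        img = cv2.imread(os.path.join(root, img_name))
        cache.append(cv2.cvtColor(img, cv2.COLOR_BGR2GRAY))
    with open(cache_path, 'wb') as f:
        pickle.dump(cache, f)

# __getitem__ would then index an in-memory list instead of hitting the disk:
#     self.cache = pickle.load(open('val_cache.pkl', 'rb'))   # once, in __init__
#     img = self.cache[idx]                                   # per sample
```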
For now the resize inside the dataloader's `__getitem__` (lib\dataset\_360cc.py) is commented out, which doubles the speed, as below. (Presumably the images are already inp_h x inp_w, since np.reshape would otherwise fail; the resize was then a costly no-op, but the change is still marked wrong below because it silently relies on that.)
```
lib\config\360CC_config.yaml

GPUID: 0
WORKERS: 20
PRINT_FREQ: 100
SAVE_FREQ: 10
PIN_MEMORY: True
'val':'../CRNN_Chinese_Characters_Rec_data/datasets/train/test_code_valid.txt'}
TRAIN:
  BATCH_SIZE_PER_GPU: 1160
  SHUFFLE: True
  BEGIN_EPOCH: 0
  END_EPOCH: 1000
  RESUME:
    IS_RESUME: True
    FILE: 'output/360CC/checkpoints/checkpoint_269_acc_0.9539.pth'
  OPTIMIZER: 'adam'
  LR: 0.001
  WD: 0.0
  LR_STEP: [10, 80]
  LR_FACTOR: 0.1
  MOMENTUM: 0.0
  NESTEROV: False
  RMSPROP_ALPHA:
  RMSPROP_CENTERED:
  FINETUNE:
    IS_FINETUNE: False
    FINETUNE_CHECKPOINIT: 'output/360CC/checkpoints/checkpoint_266_acc_0.9443.pth'
    FREEZE: False
TEST:
  BATCH_SIZE_PER_GPU: 1160
  SHUFFLE: False   # for random test rather than test on the whole validation set
  NUM_TEST_BATCH: 1000
  NUM_TEST_DISP: 10

lib\core\function.py

def validate(config, val_loader, dataset, converter, model, criterion, device, epoch, writer_dict, output_dict):
    batch_time = AverageMeter()
    data_time = AverageMeter()
    losses = AverageMeter()
    model.eval()
    n_correct = 0
    end = time.time()
    with torch.no_grad():
        for i, (inp, idx) in enumerate(val_loader):
            data_time.update(time.time() - end)
            labels = utils.get_batch_label(dataset, idx)
            # inp = inp.to(device)
            # inference
            preds = model(inp).cpu()   # why?

lib\dataset\_360cc.py

def __getitem__(self, idx):
    img_name = list(self.labels[idx].keys())[0]
    img = cv2.imread(os.path.join(self.root, img_name))
    img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    # img_h, img_w = img.shape
    # img = cv2.resize(img, (0,0), fx=self.inp_w / img_w, fy=self.inp_h / img_h, interpolation=cv2.INTER_CUBIC)   <--- disabled for now; doubles the speed (a mistake)
    img = np.reshape(img, (self.inp_h, self.inp_w, 1))
```
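A safer variant would keep the resize but skip it when it is a no-op. A sketch; `load_sample` is a hypothetical stand-in for the `__getitem__` body above:

```python
import cv2
import numpy as np

def load_sample(path, inp_h=32, inp_w=160):
    img = cv2.cvtColor(cv2.imread(path), cv2.COLOR_BGR2GRAY)
    if img.shape != (inp_h, inp_w):
        # only pay for INTER_CUBIC when the image is not already the target size
        img = cv2.resize(img, (inp_w, inp_h), interpolation=cv2.INTER_CUBIC)
    return np.reshape(img, (inp_h, inp_w, 1))
```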
The results:
```
strat of training time: 2023/03/26 08:02:54
=> creating output\360CC\checkpoints
=> creating output\360CC\log
torch.cuda.device_count() 2
load 3239051 images!
load 170477 images!
Epoch: [280][1/2793] Time 140.828s (140.828s) Speed 8.2 samples/s Data 125.750s (125.750s) Loss 0.02715 (0.02715)
Epoch: [280][101/2793] Time 1.047s (2.421s) Speed 1108.2 samples/s Data 0.000s (1.246s) Loss 0.02376 (0.02700)
...
Epoch: [280][2701/2793] Time 1.110s (1.142s) Speed 1045.5 samples/s Data 0.000s (0.047s) Loss 0.02019 (0.02689)
get_last_lr : [0.001]
train epoch time consume: 3196.342706680298
Epoch: [280][10/147] Time 88.247s (73.721s) Speed 13.1 samples/s Data 84.903s (70.467s) Loss 0.04430 (0.03724)
Epoch: [280][20/147] Time 120.574s (89.954s) Speed 9.6 samples/s Data 117.527s (86.712s) Loss 0.02340 (0.03811)
...
Epoch: [280][130/147] Time 456.392s (260.109s) Speed 2.5 samples/s Data 453.383s (257.027s) Loss 0.04539 (0.03979)
Epoch: [280][140/147] Time 486.361s (275.307s) Speed 2.4 samples/s Data 483.395s (272.231s) Loss 0.02618 (0.03957)
----蚓------舉-----牍-----荖-----你-----外外----透------兜-----鐠---------------- => 蚓舉牍荖你外透兜鐠 , gt: 蚓舉牍荖你外透兜鐠 79307
-2----7---0----5---3-- -4---7----3---8----1---4----------------------- => 27053 473814, gt: 27053 473814 79307
---(---予----梗----梓---鏤----募--亘亘---叔----帏------------------------------- => (予梗梓鏤募亘叔帏 , gt: (予梗样鏤募亘叔帏 79307
-----------噓-- -加--- --弓-- --尚-- --算-- -畛-- -竿竿--------------- => 噓 加 弓 尚 算 畛 竿, gt: 噓 加 弓 尚 算 畛 竿 79307
------------近-- -盂--- --憧--- -斗-- -脂-- --醃-- -舌---------------- => 近 盂 憧 斗 脂 醃 舌, gt: 近 盂 憧 斗 脂 醃 舌 79307
-------灶----嘐-----蒸-----梃-----貂-----诛----皴-----镰-----晷-----侏----------- => 灶嘐蒸梃貂诛皴镰晷侏, gt: 灶嘐蒸梃貂诛皴镰晷侏 79307
----悚------菒-----拭------術------托-----析析-----溜------环------鰍------------ => 悚菒拭術托析溜环鰍 , gt: 悚菒拭術托析溜环鰍 79307
-----------------脯- -s -谗- --栗- -効- -秩-- -茑--------------------- => 脯 s 谗 栗 効 秩 茑, gt: 脯 s 谗 栗 効 秩 茑 79307
------------眇---诚-----緣---丸---喋----邕邕---踩----筑----涕----髡--------------- => 眇诚緣丸喋邕踩筑涕髡, gt: 眇诚緣丸喋邕踩筑涕髡 79307
----酴--- ----梅--- ----干-- -----擰--- -----蒙----------------------- => 酴 梅 干 擰 蒙 , gt: 酴 梅 干 擰 蒙 79307
[#correct:164510 / #total:170477]
Test loss: 0.0398, accuray: 0.9650
acc validate time consume: 511.09727215766907
is best: True
best acc is: 0.9649982109023505
```
(Note: the validate batch time was being computed incorrectly; `end` was never reset, so it turned out afterwards that this was not a data-access problem at all!!! What!?)
```
lib\core\function.py

            if i == config.TEST.NUM_TEST_BATCH:
                break
            end = time.time()   # <--- moving it to here is the fix
```
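The effect of that misplaced reset can be reproduced in a few lines. `AverageMeter` below plays the same role as the one in lib/core/function.py, and the sleep stands in for the forward pass and decode:

```python
import time

class AverageMeter:
    def __init__(self):
        self.val = self.sum = self.count = self.avg = 0
    def update(self, v, n=1):
        self.val = v
        self.sum += v * n
        self.count += n
        self.avg = self.sum / self.count

data_time, batch_time = AverageMeter(), AverageMeter()
end = time.time()
for i in range(5):                        # stand-in for enumerate(val_loader)
    data_time.update(time.time() - end)   # how long we waited on the loader
    time.sleep(0.01)                      # stand-in for forward pass + decode
    batch_time.update(time.time() - end)
    end = time.time()                     # reset once per iteration, after the batch;
                                          # skip this and all the compute time gets
                                          # charged to the next batch's "Data" reading
print(f'avg data {data_time.avg:.3f}s, avg batch {batch_time.avg:.3f}s')
```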
Switching validate to the GPU:
```
inp = inp.to(device)   # add this
# inference
# preds = model(inp).cpu()   # why?
preds = model(inp)   #

strat of training time: 2023/03/26 09:08:55
=> creating output\360CC\checkpoints
=> creating output\360CC\log
torch.cuda.device_count() 2
load 3239051 images!
load 170477 images!
Epoch: [280][1/2793] Time 139.924s (139.924s) Speed 8.3 samples/s Data 125.174s (125.174s) Loss 0.01400 (0.01400)
Epoch: [280][101/2793] Time 0.984s (2.377s) Speed 1178.5 samples/s Data 0.000s (1.240s) Loss 0.02962 (0.02531)
...
Epoch: [280][2701/2793] Time 1.094s (1.113s) Speed 1060.5 samples/s Data 0.000s (0.047s) Loss 0.01298 (0.02704)
get_last_lr : [0.001]
train epoch time consume: 3116.4562408924103
Epoch: [280][10/147] Time 8.625s (14.967s) Speed 134.5 samples/s Data 0.000s (6.159s) Loss 0.06155 (0.05366)
Epoch: [280][140/147] Time 8.359s (9.152s) Speed 138.8 samples/s Data 0.000s (0.441s) Loss 0.03826 (0.05616)
----蚓-----舉------牍----荖------你-----外-----透------兜-----鐠---------------- => 蚓舉牍荖你外透兜鐠 , gt: 蚓舉牍荖你外透兜鐠 79307
-2----7---0----5---3-- -4----7---3---8---11--4------------------------ => 27053 473814, gt: 27053 473814 79307
---(---予----梗---样----鏤---募----亘---叔---帏-------------------------------- => (予梗样鏤募亘叔帏 , gt: (予梗样鏤募亘叔帏 79307
-----------噓-- -加--- --弓-- --尚-- --算-- -畛-- --竿--------------- => 噓 加 弓 尚 算 畛 竿, gt: 噓 加 弓 尚 算 畛 竿 79307
-----------近--- 盂--- -憧--- -斗-- -脂-- --醃-- -舌---------------- => 近 盂 憧 斗 脂 醃 舌, gt: 近 盂 憧 斗 脂 醃 舌 79307
------灶-----嘐-----蒸-----梃-----貂-----诛----皴-----镰-----晷-----侏----------- => 灶嘐蒸梃貂诛皴镰晷侏, gt: 灶嘐蒸梃貂诛皴镰晷侏 79307
----悚-----菒------拭------術-----托------析------溜-----环------鰍------------- => 悚菒拭術托析溜环鰍 , gt: 悚菒拭術托析溜环鰍 79307
-----------------脯-- -s -谗- -栗- -効-- -秩-- -茑--------------------- => 脯 s 谗 栗 効 秩 茑, gt: 脯 s 谗 栗 効 秩 茑 79307
------------眇---诚----緣----丸----喋---邕邕---踩----筑---涕-----髡--------------- => 眇诚緣丸喋邕踩筑涕髡, gt: 眇诚緣丸喋邕踩筑涕髡 79307
----酴--- ---梅---- ---干--- ----擰---- ----蒙------------------------ => 酴 梅 干 擰 蒙 , gt: 酴 梅 干 擰 蒙 79307
[#correct:161916 / #total:170477]
Test loss: 0.0565, accuray: 0.9498
acc validate time consume: 1349.7679324150085   <--- even longer!!? The slowest!!! The Data time is fine now though; the earlier "slow data" numbers must have come from the timer not being reset
is best: True
best acc is: 0.9497820820403925
```
Switching validate back to the CPU:
```
# inp = inp.to(device)   # add this
# inference
preds = model(inp).cpu()   # why?
# preds = model(inp)   #

strat of training time: 2023/03/26 10:32:29
load 3239051 images!
load 170477 images!
Epoch: [280][1/2793] Time 142.859s (142.859s) Speed 8.1 samples/s Data 124.562s (124.562s) Loss 0.01011 (0.01011)
Epoch: [280][101/2793] Time 1.031s (2.434s) Speed 1124.8 samples/s Data 0.000s (1.234s) Loss 0.01207 (0.02511)
...
Epoch: [280][2701/2793] Time 1.187s (1.159s) Speed 976.8 samples/s Data 0.000s (0.047s) Loss 0.02497 (0.02608)
get_last_lr : [0.001]
train epoch time consume: 3244.6378741264343
Epoch: [280][10/147] Time 3.359s (9.219s) Speed 345.3 samples/s Data 0.000s (5.784s) Loss 0.04841 (0.04270)
Epoch: [280][20/147] Time 3.188s (6.208s) Speed 363.9 samples/s Data 0.000s (2.892s) Loss 0.02864 (0.04360)
Epoch: [280][30/147] Time 3.172s (5.228s) Speed 365.7 samples/s Data 0.000s (1.929s) Loss 0.02289 (0.04255)
...
Epoch: [280][130/147] Time 3.000s (3.529s) Speed 386.7 samples/s Data 0.000s (0.445s) Loss 0.04832 (0.04314)
Epoch: [280][140/147] Time 3.000s (3.489s) Speed 386.7 samples/s Data 0.000s (0.414s) Loss 0.03028 (0.04297)
----酴--- ---梅---- ---干--- ----擰---- -----蒙----------------------- => 酴 梅 干 擰 蒙 , gt: 酴 梅 干 擰 蒙 79307
[#correct:162978 / #total:170477]
Test loss: 0.0432, accuray: 0.9560
acc validate time consume: 513.2954776287079   <--- second place
is best: True
best acc is: 0.9560116613971387
```
So validate is roughly 2-3x faster. The picture below shows resource usage during the training step:

![](https://i.imgur.com/1PasBQC.jpg)

One final try with inp.to(device) as below. The lr settings were adjusted at the same time (although they currently have no effect):
```
lib\config\360CC_config.yaml

LR_STEP: [300, 320]   # the configured LR applies up to the first milestone epoch, then *0.1; past the second milestone *0.1 again (1/100 overall)

lib\core\function.py

inp = inp.to(device)   # add this   <--- added
# inference
preds = model(inp).cpu()   # why?
# preds = model(inp)   #

strat of training time: 2023/03/26 11:54:05
load 3239051 images!
load 170477 images!
Epoch: [280][1/2793] Time 138.890s (138.890s) Speed 8.4 samples/s Data 119.003s (119.003s) Loss 0.02168 (0.02168)
Epoch: [280][101/2793] Time 1.047s (2.417s) Speed 1108.1 samples/s Data 0.000s (1.179s) Loss 0.02360 (0.02676)
Epoch: [280][201/2793] Time 1.062s (1.734s) Speed 1091.8 samples/s Data 0.000s (0.593s) Loss 0.02606 (0.02697)
...
Epoch: [280][2701/2793] Time 1.078s (1.157s) Speed 1076.0 samples/s Data 0.000s (0.045s) Loss 0.02396 (0.02612)
get_last_lr : [0.001]
train epoch time consume: 3235.9140632152557
Epoch: [280][10/147] Time 3.094s (8.489s) Speed 375.0 samples/s Data 0.000s (5.309s) Loss 0.04337 (0.03674)
Epoch: [280][20/147] Time 3.109s (5.805s) Speed 373.1 samples/s Data 0.000s (2.655s) Loss 0.02111 (0.03763)
...
Epoch: [280][80/147] Time 2.906s (3.706s) Speed 399.1 samples/s Data 0.000s (0.664s) Loss 0.04593 (0.03717)
Epoch: [280][90/147] Time 2.953s (3.634s) Speed 392.8 samples/s Data 0.000s (0.590s) Loss 0.07400 (0.03863)
Epoch: [280][100/147] Time 3.141s (3.575s) Speed 369.4 samples/s Data 0.000s (0.531s) Loss 0.03438 (0.03854)
Epoch: [280][110/147] Time 2.823s (3.525s) Speed 410.8 samples/s Data 0.000s (0.483s) Loss 0.04226 (0.03883)
Epoch: [280][120/147] Time 3.014s (3.482s) Speed 384.9 samples/s Data 0.000s (0.443s) Loss 0.02881 (0.03868)
Epoch: [280][130/147] Time 2.984s (3.445s) Speed 388.7 samples/s Data 0.000s (0.409s) Loss 0.04510 (0.03863)
Epoch: [280][140/147] Time 2.953s (3.414s) Speed 392.8 samples/s Data 0.000s (0.379s) Loss 0.02517 (0.03840)
----蚓-----舉-----牍------荖-----你----外------透-----兜------鐠---------------- => 蚓舉牍荖你外透兜鐠 , gt: 蚓舉牍荖你外透兜鐠 79307
-22---7---00---5---3-- -4----7---3---8---11---4----------------------- => 27053 473814, gt: 27053 473814 79307
---(---予---梗----样----鏤---募----亘---叔---帏-------------------------------- => (予梗样鏤募亘叔帏 , gt: (予梗样鏤募亘叔帏 79307
-----------噓--- -加--- --弓-- --尚-- --算--- -畛--- -竿竿--------------- => 噓 加 弓 尚 算 畛 竿, gt: 噓 加 弓 尚 算 畛 竿 79307
-----------近--- --盂-- -憧---- -斗-- --脂-- -醃-- -舌---------------- => 近 盂 憧 斗 脂 醃 舌, gt: 近 盂 憧 斗 脂 醃 舌 79307
------灶-----嘐-----蒸-----梃-----貂-----诛----皴-----镰-----晷晷----侏----------- => 灶嘐蒸梃貂诛皴镰晷侏, gt: 灶嘐蒸梃貂诛皴镰晷侏 79307
---悚------菒-----拭------術------托------析------溜-----环------鰍------------- => 悚菒拭術托析溜环鰍 , gt: 悚菒拭術托析溜环鰍 79307
-----------------脯- -s -谗- -栗- -効- -秩-- -茑--------------------- => 脯 s 谗 栗 効 秩 茑, gt: 脯 s 谗 栗 効 秩 茑 79307
------------眇---诚----緣----丸---喋----邕邕---踩----筑----涕----髡--------------- => 眇诚緣丸喋邕踩筑涕髡, gt: 眇诚緣丸喋邕踩筑涕髡 79307
----酴--- ---梅---- ---干--- ----擰---- -----蒙----------------------- => 酴 梅 干 擰 蒙 , gt: 酴 梅 干 擰 蒙 79307
[#correct:164793 / #total:170477]
Test loss: 0.0387, accuray: 0.9667
acc validate time consume: 502.78747749328613   <--- the fastest, first place
is best: True
best acc is: 0.9666582588853628
```
And one more try:
```
lib\core\function.py

inp = inp.cpu()   # to(device)   # add this
# inference
preds = model(inp).cpu()   # why?
# preds = model(inp)   #

strat of training time: 2023/03/26 13:02:39
load 3239051 images!
load 170477 images!
Epoch: [280][1/2793] Time 134.030s (134.030s) Speed 8.7 samples/s Data 119.123s (119.123s) Loss 0.02288 (0.02288)
Epoch: [280][101/2793] Time 1.016s (2.333s) Speed 1142.2 samples/s Data 0.000s (1.180s) Loss 0.01304 (0.02731)
Epoch: [280][201/2793] Time 1.029s (1.683s) Speed 1126.9 samples/s Data 0.000s (0.593s) Loss 0.01970 (0.02615)
...
Epoch: [280][2701/2793] Time 1.094s (1.126s) Speed 1060.6 samples/s Data 0.000s (0.044s) Loss 0.03228 (0.02746)
get_last_lr : [0.001]
train epoch time consume: 3154.1199588775635
Epoch: [280][10/147] Time 3.219s (8.576s) Speed 360.4 samples/s Data 0.000s (5.364s) Loss 0.04234 (0.03568)
Epoch: [280][20/147] Time 3.125s (5.851s) Speed 371.2 samples/s Data 0.000s (2.682s) Loss 0.02054 (0.03613)
Epoch: [280][30/147] Time 3.109s (4.942s) Speed 373.1 samples/s Data 0.000s (1.788s) Loss 0.01800 (0.03577)
Epoch: [280][40/147] Time 3.062s (4.496s) Speed 378.8 samples/s Data 0.000s (1.341s) Loss 0.02370 (0.03537)
Epoch: [280][50/147] Time 3.375s (4.245s) Speed 343.7 samples/s Data 0.016s (1.073s) Loss 0.04410 (0.03662)
Epoch: [280][60/147] Time 3.141s (4.071s) Speed 369.4 samples/s Data 0.000s (0.895s) Loss 0.03769 (0.03661)
Epoch: [280][70/147] Time 3.187s (3.937s) Speed 363.9 samples/s Data 0.000s (0.767s) Loss 0.02095 (0.03587)
Epoch: [280][80/147] Time 3.203s (3.845s) Speed 362.2 samples/s Data 0.000s (0.671s) Loss 0.04494 (0.03565)
Epoch: [280][90/147] Time 3.203s (3.763s) Speed 362.1 samples/s Data 0.000s (0.597s) Loss 0.06963 (0.03693)
Epoch: [280][100/147] Time 3.328s (3.699s) Speed 348.5 samples/s Data 0.000s (0.537s) Loss 0.03398 (0.03688)
Epoch: [280][110/147] Time 3.078s (3.647s) Speed 376.9 samples/s Data 0.000s (0.488s) Loss 0.04158 (0.03719)
Epoch: [280][120/147] Time 3.078s (3.605s) Speed 376.9 samples/s Data 0.000s (0.447s) Loss 0.02884 (0.03702)
Epoch: [280][130/147] Time 3.047s (3.566s) Speed 380.7 samples/s Data 0.000s (0.413s) Loss 0.04320 (0.03699)
Epoch: [280][140/147] Time 3.188s (3.532s) Speed 363.9 samples/s Data 0.000s (0.383s) Loss 0.02361 (0.03681)
----蚓-----舉------牍-----荖-----你-----外-----透------兜-----鐠---------------- => 蚓舉牍荖你外透兜鐠 , gt: 蚓舉牍荖你外透兜鐠 79307
[#correct:165368 / #total:170477]
Test loss: 0.0370, accuray: 0.9700
acc validate time consume: 521.0724451541901
is best: True
best acc is: 0.9700311478967837
```
# In the end the fastest combination is this one:
```
lib\core\function.py

inp = inp.to(device)   # add this   <--- added
# inference
preds = model(inp).cpu()   # why?
# preds = model(inp)   #
```
What remains puzzling is why validate becomes sluggish when it is put on the GPU!? Why can't it process a thousand-plus samples per second the way training does!!?

A complete run with this combination:
```
TRAIN:
  BATCH_SIZE_PER_GPU: 1160
  SHUFFLE: True
  BEGIN_EPOCH: 0
  END_EPOCH: 1000
  RESUME:
    IS_RESUME: True
    FILE: 'output/360CC/checkpoints/checkpoint_280_acc_0.9700.pth'
  OPTIMIZER: 'adam'
  LR: 0.001
  WD: 0.0
  LR_STEP: [290, 300]  # the configured LR applies up to the first milestone epoch, then *0.1; past the second milestone *0.1 again (1/100 overall)
  LR_FACTOR: 0.1
  MOMENTUM: 0.0
  NESTEROV: False
  RMSPROP_ALPHA:
  RMSPROP_CENTERED:
  FINETUNE:
    IS_FINETUNE: False
    FINETUNE_CHECKPOINIT: 'output/360CC/checkpoints/checkpoint_266_acc_0.9443.pth'
    FREEZE: False
TEST:
  BATCH_SIZE_PER_GPU: 1160
  SHUFFLE: False   # for random test rather than test on the whole validation set
  NUM_TEST_BATCH: 1000
  NUM_TEST_DISP: 10

Legend for the log fields:
Epoch: [281][101/2793]    [current epoch][step/total steps per epoch]
Time 1.031s (2.397s)      batch_time (avg_batch_time)
Speed 1124.9 samples/s    throughput for the whole batch, data loading included
Data 0.000s (1.190s)      data_time (avg_data_time)
Loss 0.02044 (0.02524)    batch loss (avg_loss)

strat of training time: 2023/03/26 14:26:44
=> creating output\360CC\checkpoints
=> creating output\360CC\log
torch.cuda.device_count() 2
load 3239051 images!
load 170477 images!
Epoch: [281][1/2793] Time 138.855s (138.855s) Speed 8.4 samples/s Data 120.167s (120.167s) Loss 0.02668 (0.02668)
Epoch: [281][101/2793] Time 1.031s (2.397s) Speed 1124.9 samples/s Data 0.000s (1.190s) Loss 0.02044 (0.02524)
Epoch: [281][201/2793] Time 1.031s (1.716s) Speed 1125.2 samples/s Data 0.000s (0.598s) Loss 0.01391 (0.02521)
Epoch: [281][301/2793] Time 1.016s (1.488s) Speed 1142.2 samples/s Data 0.000s (0.400s) Loss 0.02905 (0.02518)
Epoch: [281][401/2793] Time 1.047s (1.380s) Speed 1108.1 samples/s Data 0.000s (0.300s) Loss 0.02084 (0.02535)
Epoch: [281][501/2793] Time 1.047s (1.313s) Speed 1107.8 samples/s Data 0.000s (0.240s) Loss 0.01579 (0.02535)
Epoch: [281][601/2793] Time 1.047s (1.269s) Speed 1108.4 samples/s Data 0.000s (0.200s) Loss 0.01448 (0.02533)
Epoch: [281][701/2793] Time 1.063s (1.238s) Speed 1091.6 samples/s Data 0.000s (0.172s) Loss 0.01883 (0.02510)
Epoch: [281][801/2793] Time 1.047s (1.216s) Speed 1108.1 samples/s Data 0.000s (0.150s) Loss 0.02183 (0.02499)
Epoch: [281][901/2793] Time 1.063s (1.199s) Speed 1091.7 samples/s Data 0.000s (0.134s) Loss 0.03559 (0.02507)
Epoch: [281][1001/2793] Time 1.063s (1.186s) Speed 1091.8 samples/s Data 0.000s (0.120s) Loss 0.02354 (0.02560)
Epoch: [281][1101/2793] Time 1.155s (1.175s) Speed 1004.2 samples/s Data 0.000s (0.110s) Loss 0.05582 (0.02589)
Epoch: [281][1201/2793] Time 1.072s (1.168s) Speed 1081.9 samples/s Data 0.000s (0.100s) Loss 0.02183 (0.02582)
Epoch: [281][1301/2793] Time 1.047s (1.161s) Speed 1108.2 samples/s Data 0.000s (0.093s) Loss 0.02453 (0.02580)
Epoch: [281][1401/2793] Time 1.057s (1.156s) Speed 1097.8 samples/s Data 0.000s (0.086s) Loss 0.02911 (0.02582)
Epoch: [281][1501/2793] Time 1.078s (1.150s) Speed 1076.1 samples/s Data 0.000s (0.081s) Loss 0.02047 (0.02589)
Epoch: [281][1601/2793] Time 1.048s (1.146s) Speed 1107.1 samples/s Data 0.000s (0.076s) Loss 0.02500 (0.02589)
Epoch: [281][1701/2793] Time 1.062s (1.141s) Speed 1092.7 samples/s Data 0.000s (0.071s) Loss 0.02378 (0.02621)
Epoch: [281][1801/2793] Time 1.047s (1.138s) Speed 1107.7 samples/s Data 0.000s (0.067s) Loss 0.02426 (0.02614)
Epoch: [281][1901/2793] Time 1.078s (1.134s) Speed 1076.2 samples/s Data 0.000s (0.064s) Loss 0.03277 (0.02616)
Epoch: [281][2001/2793] Time 1.063s (1.131s) Speed 1091.7 samples/s Data 0.000s (0.061s) Loss 0.02726 (0.02650)
Epoch: [281][2101/2793] Time 1.063s (1.129s) Speed 1091.5 samples/s Data 0.000s (0.058s) Loss 0.01275 (0.02642)
Epoch: [281][2201/2793] Time 1.094s (1.126s) Speed 1060.7 samples/s Data 0.000s (0.055s) Loss 0.02007 (0.02634)
Epoch: [281][2301/2793] Time 1.062s (1.124s) Speed 1091.9 samples/s Data 0.000s (0.053s) Loss 0.02896 (0.02628)
Epoch: [281][2401/2793] Time 1.078s (1.122s) Speed 1076.1 samples/s Data 0.000s (0.051s) Loss 0.01623 (0.02623)
Epoch: [281][2501/2793] Time 1.063s (1.120s) Speed 1091.3 samples/s Data 0.000s (0.049s) Loss 0.01629 (0.02618)
Epoch: [281][2601/2793] Time 1.125s (1.119s) Speed 1031.1 samples/s Data 0.000s (0.047s) Loss 0.01695 (0.02604)
Epoch: [281][2701/2793] Time 1.078s (1.117s) Speed 1076.2 samples/s Data 0.000s (0.045s) Loss 0.02351 (0.02602)
get_last_lr : [0.001]
train epoch time consume: 3126.1498596668243
Epoch: [281][10/147] Time 3.172s (8.523s) Speed 365.7 samples/s Data 0.000s (5.372s) Loss 0.04474 (0.03685)
Epoch: [281][20/147] Time 3.156s (5.830s) Speed 367.5 samples/s Data 0.000s (2.686s) Loss 0.02303 (0.03750)
Epoch: [281][30/147] Time 3.109s (4.926s) Speed 373.1 samples/s Data 0.000s (1.791s) Loss 0.01673 (0.03712)
Epoch: [281][40/147] Time 3.156s (4.473s) Speed 367.5 samples/s Data 0.000s (1.343s) Loss 0.02499 (0.03667)
Epoch: [281][50/147] Time 3.078s (4.195s) Speed 376.9 samples/s Data 0.000s (1.074s) Loss 0.03983 (0.03772)
Epoch: [281][60/147] Time 3.234s (4.035s) Speed 358.6 samples/s Data 0.000s (0.895s) Loss 0.04007 (0.03788)
Epoch: [281][70/147] Time 3.031s (3.909s) Speed 382.7 samples/s Data 0.000s (0.767s) Loss 0.02486 (0.03717)
Epoch: [281][80/147] Time 3.047s (3.811s) Speed 380.7 samples/s Data 0.000s (0.671s) Loss 0.04637 (0.03689)
Epoch: [281][90/147] Time 3.234s (3.735s) Speed 358.6 samples/s Data 0.000s (0.597s) Loss 0.07367 (0.03817)
Epoch: [281][100/147] Time 3.375s (3.683s) Speed 343.7 samples/s Data 0.000s (0.537s) Loss 0.03454 (0.03795)
Epoch: [281][110/147] Time 3.125s (3.645s) Speed 371.2 samples/s Data 0.000s (0.488s) Loss 0.04324 (0.03828)
Epoch: [281][120/147] Time 3.047s (3.602s) Speed 380.7 samples/s Data 0.000s (0.448s) Loss 0.03007 (0.03813)
Epoch: [281][130/147] Time 3.221s (3.564s) Speed 360.1 samples/s Data 0.000s (0.413s) Loss 0.04645 (0.03808)
Epoch: [281][140/147] Time 3.047s (3.533s) Speed 380.7 samples/s Data 0.000s (0.384s) Loss 0.02663 (0.03794)
----蚓-----舉------牍-----荖----你------外-----透------兜-----鐠---------------- => 蚓舉牍荖你外透兜鐠 , gt: 蚓舉牍荖你外透兜鐠 79307
-2----7---0----5---3-- -4----7---3---8----1---4----------------------- => 27053 473814, gt: 27053 473814 79307
---(---予----梗---梓----鏤---募----亘---叔---帏-------------------------------- => (予梗梓鏤募亘叔帏 , gt: (予梗样鏤募亘叔帏 79307
-----------噓--- -加--- --弓-- --尚-- --算-- --畛-- -竿竿--------------- => 噓 加 弓 尚 算 畛 竿, gt: 噓 加 弓 尚 算 畛 竿 79307
------------近-- --盂-- -憧--- -斗-- --脂-- -醃-- -舌---------------- => 近 盂 憧 斗 脂 醃 舌, gt: 近 盂 憧 斗 脂 醃 舌 79307
-------灶----嘐嘐----蒸-----梃-----貂-----诛----皴皴----镰-----晷----侏------------ => 灶嘐蒸梃貂诛皴镰晷侏, gt: 灶嘐蒸梃貂诛皴镰晷侏 79307
---悚------菒------拭-----術------托------析------溜-----环-------鰍------------ => 悚菒拭術托析溜环鰍 , gt: 悚菒拭術托析溜环鰍 79307
-----------------脯- -s -谗- -栗- 効-- -秩-- -茑--------------------- => 脯 s 谗 栗 効 秩 茑, gt: 脯 s 谗 栗 効 秩 茑 79307
------------眇---诚----緣----丸----喋---邕邕---踩----筑----涕----髡--------------- => 眇诚緣丸喋邕踩筑涕髡, gt: 眇诚緣丸喋邕踩筑涕髡 79307
----酴--- ----梅---- ----干--- ----擰---- -----蒙----------------------- => 酴 梅 干 擰 蒙 , gt: 酴 梅 干 擰 蒙 79307
[#correct:164903 / #total:170477]
Test loss: 0.0382, accuray: 0.9673
acc validate time consume: 519.9725430011749
is best: True
best acc is: 0.9673035072179825
```

![](https://i.imgur.com/Db2Wl6Y.png)

# The correct setup
According to the PyTorch tutorial (https://pytorch.org/tutorials/beginner/former_torchies/parallelism_tutorial.html):
```
class DataParallelModel(nn.Module):

    def __init__(self):
        super().__init__()
        self.block1 = nn.Linear(10, 20)

        # wrap block2 in DataParallel
        self.block2 = nn.Linear(20, 20)
        self.block2 = nn.DataParallel(self.block2)   # <---------- here

        self.block3 = nn.Linear(20, 20)

    def forward(self, x):
        x = self.block1(x)
        x = self.block2(x)
        x = self.block3(x)
        return x
```
This means the data is distributed across multiple GPUs for computation inside the model network, which is why the wrapping goes into the model definition (a runnable toy version is sketched below).
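Boiled down to a runnable toy (shapes are illustrative, not the CRNN's), the point of wrapping sub-modules is that the batch is scattered across the GPUs inside forward and gathered back before the next block, so the caller still sees one batch-sized output; on a machine without GPUs, DataParallel simply passes through:

```python
import torch
import torch.nn as nn

class ToyParallel(nn.Module):
    def __init__(self):
        super().__init__()
        self.block1 = nn.Linear(10, 20)
        # only this block is parallelized; its input is scattered over the GPUs
        # along dim 0 and the outputs are gathered back onto the default device
        self.block2 = nn.DataParallel(nn.Linear(20, 20))
        self.block3 = nn.Linear(20, 20)

    def forward(self, x):
        return self.block3(self.block2(self.block1(x)))

device = 'cuda' if torch.cuda.is_available() else 'cpu'
model = ToyParallel().to(device)
out = model(torch.randn(8, 10, device=device))
print(out.shape)  # torch.Size([8, 20]): batch size preserved, however many GPUs run
```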
The actual change here therefore becomes:
```
train.py

os.environ['CUDA_VISIBLE_DEVICES'] = '0,1,2,3,4,5,6,7'
model = crnn.get_crnn(config)
model = model.to(device)

crnn.py

# model definition
self.cnn = cnn
self.cnn = nn.DataParallel(self.cnn)   # <------ here !
self.rnn = nn.Sequential(
    BidirectionalLSTM(512, nh, nh),
    BidirectionalLSTM(nh, nh, nclass))
self.rnn = nn.DataParallel(self.rnn)   # <------ here !

lib/core/function.py

def train(config, train_loader, dataset, converter, model, ...
    preds = model(inp).cuda()   # <---- change to this

def validate(config, val_loader, dataset, converter, model, ...
    preds = model(inp).cuda()   # <---- change to this (note: this one is wrong; validate should use the CPU)

config/360CC_config.yaml
# increase WORKERS and BATCH_SIZE_PER_GPU (train and valid)
```
Execution:
```
load 3279605 images!
config.TRAIN.BATCH_SIZE_PER_GPU 4000
load 364400 images!
config.TEST.BATCH_SIZE_PER_GPU 4000
torch.Size([4000, 512, 1, 41])
Epoch: [0][0/820] Time 54.249s (54.249s) Speed 73.7 samples/s Data 27.694s (27.694s) Loss 32.95580 (32.95580)
torch.Size([4000, 512, 1, 41])
torch.Size([4000, 512, 1, 41])
torch.Size([4000, 512, 1, 41])
torch.Size([4000, 512, 1, 41])
torch.Size([4000, 512, 1, 41])
torch.Size([4000, 512, 1, 41])
torch.Size([4000, 512, 1, 41])
torch.Size([4000, 512, 1, 41])
torch.Size([4000, 512, 1, 41])
torch.Size([4000, 512, 1, 41])
Epoch: [0][10/820] Time 1.102s (5.968s) Speed 3629.7 samples/s Data 0.015s (2.540s) Loss 31.65610 (32.42912)
```
The picture below shows every GPU being used:

![](https://i.imgur.com/SH0Yes4.png)

# The wrong setup
```
train.py

# construct face related neural networks
model = crnn.get_crnn(config)

# get device
print('torch.cuda.device_count()', torch.cuda.device_count())
device = "cuda"
model = torch.nn.DataParallel(model, device_ids=[0,1])   # two GPUs configured
model = model.to(device)
```

# Parameters with a single GPU
```
load 2000 images!
load 1000 images!
torch.Size([1000, 512, 1, 41])
lib/function preds.size(): torch.Size([41, 1000, 6736])
lib/function inp.size(): torch.Size([1000, 1, 32, 160])
lib/function inp.size(0): 1000
lib/function batch_size: 1000
lib/function text.size(), length.size(): torch.Size([10000]) torch.Size([1000])
lib/function preds_size.size(), [preds.size(0)]: torch.Size([1000]) [41]
loss = criterion(preds, text, preds_size, length)
loss = criterion(preds, text, torch.Size([41, 1000, 6736]), torch.Size([1000])
```

# With 2 GPUs
```
torch.Size([500, 512, 1, 41])
torch.Size([500, 512, 1, 41])
lib/function preds.size(): torch.Size([82, 500, 6736])
lib/function inp.size(): torch.Size([1000, 1, 32, 160])
lib/function inp.size(0): 1000
lib/function batch_size: 1000
lib/function text.size(), length.size(): torch.Size([10000]) torch.Size([1000])
lib/function preds_size.size(), [preds.size(0)]: torch.Size([1000]) [82]
loss = criterion(preds, text, preds_size, length)
loss = criterion(preds, text, torch.Size([82, 500, 6736]), torch.Size([1000])
# w became 41 + 41 = 82, and the batch size was halved to 500
```
This attempt ended in failure. The probable cause: model = torch.nn.DataParallel(model, device_ids=[0,1]) replicates the model onto each GPU, but by the time the loss is computed the w = 41 dimension has been merged, so the original torch.Size([41, 1000, 6736]) becomes torch.Size([82, 500, 6736]). w is doubled and the batch halved (split across the two GPUs); the per-GPU outputs look like they were concatenated along axis 0, i.e.
torch.Size([41, 500, 6736]) + torch.Size([41, 500, 6736]) -> torch.Size([82, 500, 6736])
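That reading of the failure can be checked on the CPU: nn.DataParallel scatters its input and gathers the per-GPU outputs along `dim` (default 0). The CRNN returns time-major [T, batch, classes], so gathering two [41, 500, 6736] chunks on dim 0 yields exactly the [82, 500, 6736] seen above. A scaled-down sketch:

```python
import torch

# stand-ins for the two per-GPU outputs: [T=41, B=500, C=6736] shrunk to [4, 5, 7]
per_gpu = [torch.zeros(4, 5, 7), torch.zeros(4, 5, 7)]

gathered = torch.cat(per_gpu, dim=0)  # what DataParallel's gather does by default
print(gathered.shape)                 # torch.Size([8, 5, 7]): the time axis doubled,
                                      # the small-scale analogue of [82, 500, 6736]
```

nn.DataParallel does accept a `dim` argument (default 0) that controls both how inputs are scattered and how outputs are gathered; since this model consumes batch-first input but emits time-major output, no single `dim` fits the whole model, which is another way of seeing why wrapping the sub-modules is the workable route here.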