# Week 15: YOLOv4 Model Compression Hands-on
###### tags: `技術研討`

## Recap

## 1. [YOLOv4 Architecture Overview](https://zhuanlan.zhihu.com/p/150127712)

![](https://i.imgur.com/hkBbNjE.jpg)

* overall modules
    * backbone
        * DownSample1~5 (blocks that contain residual blocks)
    * neck
    * head
* In the final output channels, every feature map cell predicts x, y, w, h, objectness (probability), and class probabilities (5 classes in our case)
> (4 + 1 + 5) * 3 (bboxes) = 30

## 2. Adjusting the YOLOv4 Architecture

### 2-1 Dissecting the "main" architecture in Yolov4 models.py

First, find the main class!

```python=
import torch
from torch import nn
# DownSample1~5, Neck, Yolov4Head, YoloLayer and get_region_boxes are all defined in the same models.py


class Yolov4(nn.Module):
    """Top-level YOLOv4 architecture"""

    def __init__(self, yolov4conv137weight=None, n_classes=80, inference=False):
        super().__init__()
        """Definition of the top-level YOLOv4 architecture"""
        output_ch = (4 + 1 + n_classes) * 3  # (4 + 1 + 80) * 3 = 255

        # backbone
        self.down1 = DownSample1()
        self.down2 = DownSample2()
        self.down3 = DownSample3()
        self.down4 = DownSample4()
        self.down5 = DownSample5()
        # neck
        self.neek = Neck(inference)
        # yolov4conv137
        if yolov4conv137weight:
            _model = nn.Sequential(self.down1, self.down2, self.down3,
                                   self.down4, self.down5, self.neek)
            pretrained_dict = torch.load(yolov4conv137weight)

            model_dict = _model.state_dict()
            # 1. filter out unnecessary keys
            pretrained_dict = {k1: v for (k, v), k1 in zip(pretrained_dict.items(), model_dict)}
            # 2. overwrite entries in the existing state dict
            model_dict.update(pretrained_dict)
            _model.load_state_dict(model_dict)

        # head
        self.head = Yolov4Head(output_ch, n_classes, inference)  # 255, 80, True/False

    def forward(self, input):
        """Top-level YOLOv4 forward pass"""
        d1 = self.down1(input)  # input is the image tensor
        d2 = self.down2(d1)
        d3 = self.down3(d2)
        d4 = self.down4(d3)
        d5 = self.down5(d4)

        x20, x13, x6 = self.neek(d5, d4, d3)

        output = self.head(x20, x13, x6)
        return output
```

Uh... that is a lot. Where should we even start?

1. Start from `forward()`. (Please just read `neek` as neck: the author misspelled it once and the typo stuck ever since...)
2. The architecture has 3 parts:
    (1) downsample: 5 blocks chained in order
    (2) neck
    (3) head
3. How these 3 parts connect:
    (1) down1 -> down2 -> down3 -> down4 -> down5
    (2) down3, 4, 5 -> neck :star:
    (3) the neck's conv20, conv13, conv6 -> head :star:
    (4) head -> output
4. Finally, look at the finer definition of each of the 3 parts: everything is clearly defined inside `__init__`. Below we use the head as the example, so let's look at the head.

### 2-2 Dissecting the Yolov4Head architecture in models.py

We use Yolov4Head as the example; DownSample & Neck follow the same pattern.
Every conv-X below is pre-activation (the order is BN-ReLU-Conv, BN-ReLU-Conv); a sketch of what this `Bn_Activation_Conv` block looks like follows the checklist below.

```
(conv6): Bn_Activation_Conv(
  (conv): ModuleList(
    (0): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (1): LeakyReLU(negative_slope=0.1, inplace=True)
    (2): Conv2d(512, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
  )
)
```

1. Follow along with the architecture diagram (just look at the gray and orange labels for now): [scroll down to the diagram](https://hackmd.io/p9GjDOm2SgGUyoz7ZiGyOg?both#畫出架構)
2. :star: What drives the pruning of a conv? The *next* layer's BN layer. Remember these are pre-activation blocks, so the order is BN-ReLU-Conv, BN-ReLU-Conv.
![](https://i.imgur.com/rnh2xcA.png)
![](https://i.imgur.com/q4z6TJL.png)
![](https://i.imgur.com/YkkLZaq.png)
3. Next, decide which layers to skip (the blue and pink labels; we do not prune them in this round):
    (1) layers involved in a concat
    (2) layers coupled with the Neck
    (3) layers without a BatchNorm layer
4. Finally, work out the cfg (the red labels).
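The notes never show the definition of `Bn_Activation_Conv` itself, so here is a minimal sketch inferred from the printout above. The argument order (`in_ch, out_ch, kernel, stride, activation, bn, bias`) follows the calls in the head code below, but the body is an assumption, not the repo's actual code:

```python
import torch
from torch import nn


class Bn_Activation_Conv(nn.Module):
    """Pre-activation block: BatchNorm -> activation -> Conv (sketch, not the repo's exact code)."""

    def __init__(self, in_channels, out_channels, kernel_size, stride,
                 activation='leaky', bn=True, bias=False):
        super().__init__()
        pad = (kernel_size - 1) // 2
        self.conv = nn.ModuleList()
        if bn:
            self.conv.append(nn.BatchNorm2d(in_channels))
        if activation == 'leaky':
            self.conv.append(nn.LeakyReLU(0.1, inplace=True))
        # 'linear' means no activation; used for the final 1x1 convs before the YOLO layers
        self.conv.append(nn.Conv2d(in_channels, out_channels, kernel_size,
                                   stride, pad, bias=bias))

    def forward(self, x):
        for layer in self.conv:
            x = layer(x)
        return x


# Quick shape check mirroring conv6 in the printout: 512 -> 256 channels, 1x1 conv
block = Bn_Activation_Conv(512, 256, 1, 1, 'leaky')
print(block(torch.randn(1, 512, 19, 19)).shape)  # torch.Size([1, 256, 19, 19])
```

The actual head that uses this block looks like this: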
```python
class Yolov4Head(nn.Module):
    """Definition of the YOLOv4 head"""

    def __init__(self, output_ch, n_classes, inference=False):
        super().__init__()
        self.inference = inference

        self.conv1 = Bn_Activation_Conv(128, 256, 3, 1, 'leaky')
        self.conv2 = Bn_Activation_Conv(256, output_ch, 1, 1, 'linear', bn=False, bias=True)

        self.yolo1 = YoloLayer(
            anchor_mask=[0, 1, 2], num_classes=n_classes,
            anchors=[12, 16, 19, 36, 40, 28, 36, 75, 76, 55, 72, 146, 142, 110, 192, 243, 459, 401],
            num_anchors=9, stride=8)

        # R -4
        self.conv3 = Bn_Activation_Conv(128, 256, 3, 2, 'leaky')

        # R -1 -16
        self.conv4 = Bn_Activation_Conv(512, 256, 1, 1, 'leaky')
        self.conv5 = Bn_Activation_Conv(256, 512, 3, 1, 'leaky')
        self.conv6 = Bn_Activation_Conv(512, 256, 1, 1, 'leaky')
        self.conv7 = Bn_Activation_Conv(256, 512, 3, 1, 'leaky')
        self.conv8 = Bn_Activation_Conv(512, 256, 1, 1, 'leaky')
        self.conv9 = Bn_Activation_Conv(256, 512, 3, 1, 'leaky')
        self.conv10 = Bn_Activation_Conv(512, output_ch, 1, 1, 'linear', bn=False, bias=True)

        self.yolo2 = YoloLayer(
            anchor_mask=[3, 4, 5], num_classes=n_classes,
            anchors=[12, 16, 19, 36, 40, 28, 36, 75, 76, 55, 72, 146, 142, 110, 192, 243, 459, 401],
            num_anchors=9, stride=16)

        # R -4
        self.conv11 = Bn_Activation_Conv(256, 512, 3, 2, 'leaky')

        # R -1 -37
        self.conv12 = Bn_Activation_Conv(1024, 512, 1, 1, 'leaky')
        self.conv13 = Bn_Activation_Conv(512, 1024, 3, 1, 'leaky')
        self.conv14 = Bn_Activation_Conv(1024, 512, 1, 1, 'leaky')
        self.conv15 = Bn_Activation_Conv(512, 1024, 3, 1, 'leaky')
        self.conv16 = Bn_Activation_Conv(1024, 512, 1, 1, 'leaky')
        self.conv17 = Bn_Activation_Conv(512, 1024, 3, 1, 'leaky')
        self.conv18 = Bn_Activation_Conv(1024, output_ch, 1, 1, 'linear', bn=False, bias=True)

        self.yolo3 = YoloLayer(
            anchor_mask=[6, 7, 8], num_classes=n_classes,
            anchors=[12, 16, 19, 36, 40, 28, 36, 75, 76, 55, 72, 146, 142, 110, 192, 243, 459, 401],
            num_anchors=9, stride=32)

    def forward(self, input1, input2, input3):  # neck channels (x20, x13, x6) = (128, 256, 512)
        """YOLOv4 head forward pass"""
        x1 = self.conv1(input1)  # neck channels (x20) = 128
        x2 = self.conv2(x1)      # no BN; x2 goes straight to the output, see the figure below

        x3 = self.conv3(input1)  # neck channels (x20) = 128
        # R -1 -16
        x3 = torch.cat([x3, input2], dim=1)  # x3 channels = 256, neck channels (x13) = 256
        x4 = self.conv4(x3)  # skip the layer right after the cat: pruning conv4 would also force pruning x3 and input2, otherwise the channels cannot align
        x5 = self.conv5(x4)
        x6 = self.conv6(x5)
        x7 = self.conv7(x6)
        x8 = self.conv8(x7)
        x9 = self.conv9(x8)    # comes from x8 (conv8's output)
        x10 = self.conv10(x9)  # no BN; x10 goes straight to the output

        # R -4
        x11 = self.conv11(x8)  # comes from x8 (conv8's output)
        # R -1 -37
        x11 = torch.cat([x11, input3], dim=1)  # x11 channels = 512, neck channels (x6) = 512; also coupled to x8 (x8 is conv11's input) and to conv9
        x12 = self.conv12(x11)  # skip the layer right after the cat: pruning conv12 would also force pruning x11 and input3, otherwise the channels cannot align
        x13 = self.conv13(x12)
        x14 = self.conv14(x13)
        x15 = self.conv15(x14)
        x16 = self.conv16(x15)
        x17 = self.conv17(x16)
        x18 = self.conv18(x17)  # no BN; x18 goes straight to the output

        if self.inference:
            y1 = self.yolo1(x2)
            y2 = self.yolo2(x10)
            y3 = self.yolo3(x18)
            return get_region_boxes([y1, y2, y3])
        else:
            return [x2, x10, x18]
```

![](https://i.imgur.com/uYgB6NF.png)

### 2-3 Drawing the architecture

![](https://i.imgur.com/QWKotH0.png)

### 2-4 Swapping in the cfg

`s` marks the entries that stay unpruned (the skip indices 0, 1, 2, 7, 8, 9):

```python
#   0    1    2    3    4    5    6    7    8     9   10    11   12    13   14
#   s    s    s                        s    s     s
[128, 128, 512, 256, 512, 256, 512, 256, 256, 1024, 512, 1024, 512, 1024, 512]
```
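Before reading the cfg version, here is a toy example of the alignment rule described above (the channel counts 113 and 253 are borrowed from the new cfg, the spatial size is made up): a conv's pruned output-channel count becomes the next conv's input-channel count, and after a `torch.cat` the next conv's input channels must equal the *sum* of the concatenated branches, which is exactly why the layers feeding and following a cat are skipped in this round.

```python
import torch
from torch import nn

# Hypothetical pruned widths, just to show the bookkeeping
branch_a = nn.Conv2d(128, 113, 3, padding=1)   # pruned: 256 -> 113 output channels
branch_b_channels = 256                        # the other branch feeding the cat (unpruned)

# The conv after the cat must take 113 + 256 input channels, not the original 512
after_cat = nn.Conv2d(113 + branch_b_channels, 253, 1)

x_a = branch_a(torch.randn(1, 128, 76, 76))
x_b = torch.randn(1, branch_b_channels, 76, 76)
y = after_cat(torch.cat([x_a, x_b], dim=1))
print(y.shape)  # torch.Size([1, 253, 76, 76])
```

The head below then reads all of its channel counts from `cfg['head']`: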
```python
class Yolov4Head(nn.Module):
    """Definition of the YOLOv4 head, adjusted to take its channel counts from a cfg"""

    def __init__(self, output_ch, n_classes, inference=False, cfg=None):
        """
        original: [128, 128, 512, 256, 512, 256, 512, 256, 256, 1024, 512, 1024, 512, 1024, 512]
        new:      [128, 128, 512, 113, 253, 136, 241, 256, 256, 1024, 254, 494, 257, 500, 248]
        skip:     [0, 1, 2, 7, 8, 9]
        """
        super().__init__()
        self.inference = inference

        # x1 = self.conv1(input1)  # input1: the neck's x20, channels = 128
        # 128, 256 -> 128, 256 (unchanged because the next layer has no BN)
        self.conv1 = Bn_Activation_Conv(cfg['head'][0], 256, 3, 1, 'leaky')
        # bn=False, nothing to prune
        self.conv2 = Bn_Activation_Conv(256, output_ch, 1, 1, 'linear', bn=False, bias=True)

        self.yolo1 = YoloLayer(
            anchor_mask=[0, 1, 2], num_classes=n_classes,
            anchors=[12, 16, 19, 36, 40, 28, 36, 75, 76, 55, 72, 146, 142, 110, 192, 243, 459, 401],
            num_anchors=9, stride=8)

        # R -4
        # x3 = self.conv3(input1)  # input1: the neck's x20 = 128
        # 128, 256 -> 128, 256 (next comes a cat, so not pruned)
        self.conv3 = Bn_Activation_Conv(cfg['head'][1], 256, 3, 2, 'leaky')

        # R -1 -16
        # (skip the cat) conv3's 256 + the neck x13's 256 (skipped, so the cfg keeps the same number)
        # 512, 256 -> 512 (256 + 256), 113
        self.conv4 = Bn_Activation_Conv(cfg['head'][2], cfg['head'][3], 1, 1, 'leaky')
        # 256, 512 -> 113, 253
        self.conv5 = Bn_Activation_Conv(cfg['head'][3], cfg['head'][4], 3, 1, 'leaky')
        # 512, 256 -> 253, 136
        self.conv6 = Bn_Activation_Conv(cfg['head'][4], cfg['head'][5], 1, 1, 'leaky')
        # 256, 512 -> 136, 241
        self.conv7 = Bn_Activation_Conv(cfg['head'][5], cfg['head'][6], 3, 1, 'leaky')
        # 512, 256 -> 241, 256
        self.conv8 = Bn_Activation_Conv(cfg['head'][6], cfg['head'][7], 1, 1, 'leaky')
        # 256, 512 -> 256 (follows conv8's output channels), 512 (unchanged because the next layer has no BN)
        self.conv9 = Bn_Activation_Conv(cfg['head'][7], 512, 3, 1, 'leaky')
        # bn=False, nothing to prune
        self.conv10 = Bn_Activation_Conv(512, output_ch, 1, 1, 'linear', bn=False, bias=True)

        self.yolo2 = YoloLayer(
            anchor_mask=[3, 4, 5], num_classes=n_classes,
            anchors=[12, 16, 19, 36, 40, 28, 36, 75, 76, 55, 72, 146, 142, 110, 192, 243, 459, 401],
            num_anchors=9, stride=16)

        # R -4
        # conv11's input channels come from conv8's output; indices 7 and 8 are both in the skip list (kept at 256), so cfg['head'][8] still matches
        # 256, 512 -> 256 (follows conv8's output channels), 512 (next comes a cat, so unchanged)
        self.conv11 = Bn_Activation_Conv(cfg['head'][8], 512, 3, 2, 'leaky')

        # R -1 -37
        # (skip the cat) conv11's 512 + the neck x6's 512
        # 1024, 512 -> 1024, 254
        self.conv12 = Bn_Activation_Conv(cfg['head'][9], cfg['head'][10], 1, 1, 'leaky')
        # 512, 1024 -> 254, 494
        self.conv13 = Bn_Activation_Conv(cfg['head'][10], cfg['head'][11], 3, 1, 'leaky')
        # 1024, 512 -> 494, 257
        self.conv14 = Bn_Activation_Conv(cfg['head'][11], cfg['head'][12], 1, 1, 'leaky')
        # 512, 1024 -> 257, 500
        self.conv15 = Bn_Activation_Conv(cfg['head'][12], cfg['head'][13], 3, 1, 'leaky')
        # 1024, 512 -> 500, 248
        self.conv16 = Bn_Activation_Conv(cfg['head'][13], cfg['head'][14], 1, 1, 'leaky')
        # 512, 1024 -> 248, 1024 (unchanged because the next layer has no BN)
        self.conv17 = Bn_Activation_Conv(cfg['head'][14], 1024, 3, 1, 'leaky')
        # bn=False, nothing to prune
        self.conv18 = Bn_Activation_Conv(1024, output_ch, 1, 1, 'linear', bn=False, bias=True)

        self.yolo3 = YoloLayer(
            anchor_mask=[6, 7, 8], num_classes=n_classes,
            anchors=[12, 16, 19, 36, 40, 28, 36, 75, 76, 55, 72, 146, 142, 110, 192, 243, 459, 401],
            num_anchors=9, stride=32)
```

---

## 3. Implementing the Pruning Method

![](https://i.imgur.com/epDueZg.png)

### Step1: Load model

```python=
model = torch.load('weights/yolov4.pt')
```
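Not part of the original notebook, but a handy one-liner right after loading is to record the baseline parameter count of `model`; it should correspond to the 63,966,528 weights reported for the old model in the results section below.

```python
# Baseline check (assumption: added here for bookkeeping, not in the original notes)
total_params = sum(p.numel() for p in model.parameters())
print('Total parameters: {:,}'.format(total_params))
```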
### Step2: Setting parameter

```python=
# pruning ratio
pruning_rate = 0.5
```

```python=
# cfg
'''
model: the module for each part of the network
skip: layer indices that will not be pruned
cfg: number of channels remaining after pruning
cfg_mask: positions of the channels remaining after pruning
cat_layer: layer indices that involve a concat
'''
pruning_cfg = {
    'down1': {
        'model': model.down1,
        'skip': [3, 8, 13, 18, 23, 28, 33, 38],
        'cfg': [],
        'cfg_mask': [],
        'cat_layer': [15, 35]
    },
    'down2': {
        'model': model.down2,
        'skip': [3, 8, 13, 21, 26, 32, 37, 42, 47],
        'cfg': [],
        'cfg_mask': [],
        'cat_layer': [10, 44]
    },
    ...
    },
    'neck': {
        'model': model.neek,
        # 3, 42, 78 affect the downsample blocks, so skip them for now / (x13, x6) = (72, 36) connect to the head, and conv14 cannot be pruned because it also feeds the head
        'skip': [3, 21, 31, 36, 42, 47, 67, 72, 78, 83],
        'cfg': [],
        'cfg_mask': [],
        'cat_layer': [15, 38, 44, 74, 80]
    },
    'head': {
        'model': model.head,
        'skip': [3, 12, 17, 42, 51, 56],
        'cfg': [128],
        'cfg_mask': [],
        'cat_layer': [5, 14, 44, 53, 83]
    }
}
```

### Step3: Compute threshold

```python=
"""Compute the global threshold"""

# Count how many channels there are in total
total = 0
for m in model.neek.modules():
    if isinstance(m, nn.BatchNorm2d):
        total += m.weight.data.shape[0]  # m.weight is gamma
# m.weight.data.shape[0]: 64 64 128 128 256 256 256 256 512 512 512 512 512 512 512 512 (channels) (baseline)
# total: 5504 (5504 channels in total)

# Store the absolute value of every gamma into bn
bn = torch.zeros(total)  # 1 x n
index = 0
for m in model.neek.modules():
    if isinstance(m, nn.BatchNorm2d):
        size = m.weight.data.shape[0]  # channels
        bn[index:(index + size)] = m.weight.data.abs().clone()
        index += size
        # index + size: 0+64 -> fills bn[0:64], then bn[64:128], ..., bn[4480:4992], bn[4992:5504]
# bn: tensor([1.2170, 0.7687, ..., 0.5076, 0.4496])  (1 x 5504) every gamma stored here

# Sort in ascending order
y, i = torch.sort(bn)  # small -> large
thre_index = int(total * pruning_rate)  # scale sparse rate 0.5, the pruning ratio
# Take the thre_index-th value as the threshold; thre_index = 0 means keep everything,
# so in that case use 0 directly instead of indexing element 0
thre = y[thre_index] if thre_index != 0 else 0
# Every gamma is later compared against thre, producing a 0/1 tensor;
# channels above thre are kept (channels below thre are not copied into newmodel)

print('Global threshold: {}'.format(thre))
print('Total channels: {}'.format(total))
```

```python=
Global threshold: 0.4899449348449707
Total channels: 10752
```
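Before moving on to Step 4, here is a tiny self-contained illustration (with made-up gamma values) of what the sort/threshold above and the mask in the next step actually do:

```python
import torch

# Made-up BN gammas from two layers, 4 channels each
gammas = torch.tensor([0.9, 0.1, 0.6, 0.05, 0.7, 0.2, 0.8, 0.3]).abs()

pruning_rate = 0.5
y, _ = torch.sort(gammas)                       # ascending
thre_index = int(gammas.numel() * pruning_rate)
thre = y[thre_index] if thre_index != 0 else 0  # here: y[4] = 0.6

mask = gammas.gt(thre).float()                  # 1 = keep, 0 = prune
print(thre)             # tensor(0.6000)
print(mask)             # tensor([1., 0., 0., 0., 1., 0., 1., 0.])
print(int(mask.sum()))  # 3 channels survive
```

Note that because the comparison is a strict `gt` against `y[thre_index]`, the channel sitting exactly at the threshold is pruned too, so on a small example the realized ratio can land slightly above the target.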
### Step4: Start pruning

```python=
"""Record which channels to keep and which to prune"""
pruned = 0
cfg_new = []   # remaining channels per layer
cfg_mask = []  # per-layer 0/1 mask; e.g. with channels=3, cfg_mask could be [0, 1, 1]

for k, m in enumerate(model.neek.modules()):
    if isinstance(m, nn.BatchNorm2d):
        thre_ = 0 if k in pruning_cfg['neck']['skip'] else thre  # skipped layers use thre=0
        weight_copy = m.weight.data.abs().clone()
        mask = weight_copy.gt(thre_).float()  # compare: mark 1 if larger, 0 if smaller, store in mask
        cfg_new.append(int(torch.sum(mask)))
        cfg_mask.append(mask.clone())
        # cfg: 254 (512 -> 254)
        # cfg_mask: [tensor([0., 1., 1., ... 0., 0., 0.])]  (512-dim)
        pruned = pruned + mask.shape[0] - torch.sum(mask)  # accumulate for the pruning ratio
        print('layer index: {:d} \t total channel: {:d} \t remaining channel: {:d}'.
              format(k, mask.shape[0], int(torch.sum(mask))))

pruned_ratio = pruned / total
print('-------------------------------------------------------------------------')
print('channels pruned / channels total: {} / {}'.format(pruned, total))
print('pruned ratio: {}'.format(pruned_ratio))
```

```python=
layer index: 3 	 total channel: 1024 	 remaining channel: 1024
layer index: 8 	 total channel: 512 	 remaining channel: 254
layer index: 13 	 total channel: 1024 	 remaining channel: 515
layer index: 21 	 total channel: 2048 	 remaining channel: 2048
layer index: 26 	 total channel: 512 	 remaining channel: 253
layer index: 31 	 total channel: 1024 	 remaining channel: 1024
layer index: 36 	 total channel: 512 	 remaining channel: 512
layer index: 42 	 total channel: 512 	 remaining channel: 512
layer index: 47 	 total channel: 512 	 remaining channel: 512
layer index: 52 	 total channel: 256 	 remaining channel: 134
layer index: 57 	 total channel: 512 	 remaining channel: 262
layer index: 62 	 total channel: 256 	 remaining channel: 119
layer index: 67 	 total channel: 512 	 remaining channel: 512
layer index: 72 	 total channel: 256 	 remaining channel: 256
layer index: 78 	 total channel: 256 	 remaining channel: 256
layer index: 83 	 total channel: 256 	 remaining channel: 256
layer index: 88 	 total channel: 128 	 remaining channel: 63
layer index: 93 	 total channel: 256 	 remaining channel: 142
layer index: 98 	 total channel: 128 	 remaining channel: 68
layer index: 103 	 total channel: 256 	 remaining channel: 125
-------------------------------------------------------------------------
channels pruned / channels total: 1905.0 / 10752
pruned ratio: 0.1771763414144516
```
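A quick cross-check of the summary line, using only the numbers printed in the table above: summing `total - remaining` per layer reproduces the 1905 / 10752 figure. The ratio ends up well below the 0.5 target because every skipped layer (indices 3, 21, 31, 36, 42, 47, 67, 72, 78, 83) keeps all of its channels.

```python
total_ch     = [1024, 512, 1024, 2048, 512, 1024, 512, 512, 512, 256,
                512, 256, 512, 256, 256, 256, 128, 256, 128, 256]
remaining_ch = [1024, 254,  515, 2048, 253, 1024, 512, 512, 512, 134,
                262, 119, 512, 256, 256, 256,  63, 142,  68, 125]

pruned = sum(t - r for t, r in zip(total_ch, remaining_ch))
print(pruned, sum(total_ch))   # 1905 10752
print(pruned / sum(total_ch))  # 0.1771763...
```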
### Step5: Save weights to new model

```python=
print(cfg_new)
[1024, 254, 515, 2048, 253, 1024, 512, 512, 512, 134, 262, 119, 512, 256, 256, 256, 63, 142, 68, 125]
```

```python=
# cfg
pruning_cfg = {
    ...
    },
    'neck': {
        'model': model.neek,
        # 3, 42, 78 affect the downsample blocks, so skip them for now / (x13, x6) = (72, 36) connect to the head, and conv14 cannot be pruned because it also feeds the head
        'skip': [3, 21, 31, 36, 42, 47, 67, 72, 78, 83],
        'cfg': cfg_new,
        'cfg_mask': cfg_mask,
        'cat_layer': [15, 38, 44, 74, 80]
    },
    ...
}
```

```python=
# Define the new model architecture with the new cfg
# After pruning, the channel counts should be:
# [1024, 254, 515, 2048, 253, 1024, 512, 512, 512, 134, 262, 119, 512, 256, 256, 256, 63, 142, 68, 125]
newmodel = Yolov4(pruning_cfg=pruning_cfg)  # build an empty newmodel with exactly the channel counts we want
```

```python=
old_modules = list(model.neek.modules())
new_modules = list(newmodel.neek.modules())

layer_id_in_cfg = 0
start_mask = None
end_mask = cfg_mask[layer_id_in_cfg]  # start from the 0th cfg_mask
```

```python=
import numpy as np

for layer_id in range(len(old_modules)):
    m0 = old_modules[layer_id]
    m1 = new_modules[layer_id]

    # BatchNorm layers
    if isinstance(m0, nn.BatchNorm2d):
        idx1 = np.squeeze(np.argwhere(np.asarray(end_mask.cpu().numpy())))
        # end_mask: tensor([0., 1., 1., 0., 0., 1., 1., 0., 0., 1., 1., 0., 0., 1., 1., 0., 0., 1.,
        #                   0., 1., 0., 1., 0., 1., 0., 0., 1., 1., 0., 1., 1., 0., 1., 0., 1., 1.,
        #                   1., 1., 0., 1., 1., 1., 0., 0., 0., 0., 1., 1., 0., 1., 0., 1., 1., 1.,
        #                   1., 0., 0., 0., 0., 0., 0., 0., 0., 0.])
        # positions of the 1s:
        # idx1: [ 1  2  5  6  9 10 13 14 17 19 21 23 26 27 29 30 32 34 35 36 37 39 40 41 46 47 49 51 52 53 54]
        if idx1.size == 1:
            idx1 = np.resize(idx1, (1,))
        # copy the surviving weights into the new layer
        m1.weight.data = m0.weight.data[idx1.tolist()].clone()
        m1.bias.data = m0.bias.data[idx1.tolist()].clone()
        m1.running_mean = m0.running_mean[idx1.tolist()].clone()
        m1.running_var = m0.running_var[idx1.tolist()].clone()
        layer_id_in_cfg += 1
        start_mask = end_mask.clone()
        end_mask = cfg_mask[layer_id_in_cfg]

    # Conv layers
    elif isinstance(m0, nn.Conv2d):
        # layers involved in a concat: keep all output channels
        if layer_id in pruning_cfg['neck']['cat_layer']:
            idx0 = np.squeeze(np.argwhere(np.asarray(start_mask.cpu().numpy())))
            idx1 = old_modules[layer_id].out_channels
            print('=====================================================')
            print('In shape: {:d}, Out shape {:d}.'.format(idx0.size, idx1))
            # start_mask: tensor([1., 1., 1.]) -> idx0: [0 1 2]  (positions of the 1s)
            # end_mask: same 64 -> 31 channel mask as shown above
            # In shape: 3, Out shape 31.
            if idx0.size == 1:
                idx0 = np.resize(idx0, (1,))
            w1 = m0.weight.data[:, idx0.tolist(), :, :].clone()  # in_channels
            m1.weight.data = w1.clone()  # copy into the new layer
            continue

        # if the layer two positions back is a BatchNorm, this conv follows the pre-activation pattern,
        # so both its in_channels and out_channels come from the masks
        if isinstance(old_modules[layer_id - 2], nn.BatchNorm2d):
            idx0 = np.squeeze(np.argwhere(np.asarray(start_mask.cpu().numpy())))
            idx1 = np.squeeze(np.argwhere(np.asarray(end_mask.cpu().numpy())))
            print('=====================================================')
            print('In shape: {:d}, Out shape {:d}.'.format(idx0.size, idx1.size))
            if idx0.size == 1:
                idx0 = np.resize(idx0, (1,))
            if idx1.size == 1:
                idx1 = np.resize(idx1, (1,))
            w1 = m0.weight.data[:, idx0.tolist(), :, :].clone()  # in_channels
            w1 = w1[idx1.tolist(), :, :, :].clone()              # out_channels
            m1.weight.data = w1.clone()  # copy into the new layer
            continue

        # We need to consider the case where there are downsampling convolutions.
        # For these convolutions, we just copy the weights.
        m1.weight.data = m0.weight.data.clone()
```
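Before inspecting the shapes layer by layer, a quick smoke test (not in the original notes) is to push one dummy image through the pruned model; if any channel count was copied wrongly, this forward pass will raise a shape error. The 608x608 input size and the default `inference=False` constructor behavior are assumptions here.

```python
# Hypothetical smoke test: one dummy forward pass through the pruned model
newmodel.eval()
with torch.no_grad():
    dummy = torch.randn(1, 3, 608, 608)  # any multiple of the stride-32 grid should work
    outputs = newmodel(dummy)            # with inference=False the head returns the three raw outputs
print([o.shape for o in outputs])
```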
```python=
# The new model
# torch.Size() is ordered as: output channels, input channels, kernel size
num = 0
for i in newmodel.neek.state_dict():
    if (num % 2) == 0:
        print(("================= {} =================").format(i.split('.')[0]))
    if 'conv.0.weight' in i:
        print('Batch shape: {}'.format(newmodel.neek.state_dict()[i].shape))
        num += 1
    if 'conv.2.weight' in i:
        print('Conv shape: {}'.format(newmodel.neek.state_dict()[i].shape))
        num += 1
```

```python=
================= conv1 =================
Batch shape: torch.Size([1024])
Conv shape: torch.Size([254, 1024, 1, 1])
================= conv2 =================
Batch shape: torch.Size([254])
Conv shape: torch.Size([515, 254, 3, 3])
================= conv3 =================
Batch shape: torch.Size([515])
Conv shape: torch.Size([512, 515, 1, 1])
================= conv4 =================
Batch shape: torch.Size([2048])
Conv shape: torch.Size([253, 2048, 1, 1])
...
================= conv20 =================
Batch shape: torch.Size([125])
Conv shape: torch.Size([128, 125, 1, 1])
```

### Step6: Save model

```python=
torch.save(newmodel, './weights/newmodel.pth')
```

## 4. Analysis of Experimental Results

### Model compression results

#### Accuracy

| Metric | New model (epoch 73) | New model (epoch 7) | Old model |
|----- |-------|--------|-------|
| Average Precision (AP) @[ IoU=0.50:0.95 area= all maxDets=100 ] | 0.604 | 0.590 | 0.606 |
| Average Precision (AP) @[ IoU=0.50 area= all maxDets=100 ] | 0.957 | 0.932 | 0.955 |
| Average Precision (AP) @[ IoU=0.75 area= all maxDets=100 ] | 0.688 | 0.682 | 0.685 |
| Average Precision (AP) @[ IoU=0.50:0.95 area= small maxDets=100 ] | -1.000 | -1.000 | -1.000 |
| Average Precision (AP) @[ IoU=0.50:0.95 area=medium maxDets=100 ] | 0.555 | 0.568 | 0.552 |
| Average Precision (AP) @[ IoU=0.50:0.95 area= large maxDets=100 ] | 0.625 | 0.604 | 0.631 |
| Average Recall (AR) @[ IoU=0.50:0.95 area= all maxDets= 1 ] | 0.474 | 0.475 | 0.473 |
| Average Recall (AR) @[ IoU=0.50:0.95 area= all maxDets= 10 ] | 0.691 | 0.675 | 0.680 |
| Average Recall (AR) @[ IoU=0.50:0.95 area= all maxDets=100 ] | 0.746 | 0.734 | 0.733 |
| Average Recall (AR) @[ IoU=0.50:0.95 area= small maxDets=100 ] | -1.000 | -1.000 | -1.000 |
| Average Recall (AR) @[ IoU=0.50:0.95 area=medium maxDets=100 ] | 0.664 | 0.675 | 0.647 |
| Average Recall (AR) @[ IoU=0.50:0.95 area= large maxDets=100 ] | 0.736 | 0.749 | 0.744 |

:pushpin: The pruned model reaches accuracy comparable to the model before pruning.

#### Compression numbers

| Item | New model | Old model |
|----- |-------|--------|
| Channel pruning ratio | 17.42% | 0 % |
| Number of weights / reduction | 35,055,933 (<font color=#F6D55C>45.2%</font>) | 63,966,528 |
| Memory usage / saving | load model: 900 MB (<font color=#F6D55C>200 MB</font>) <br> after inference: 2.6 GB (<font color=#F6D55C>270 MB</font>) | load model: 1.1 GB <br> after inference: 2.87 GB |
| GPU memory usage / saving | load model: 1,375 MB (<font color=#F6D55C>108 MB</font>) <br> after inference: 3,705 MB (<font color=#F6D55C>318 MB</font>) | load model: 1,483 MB <br> after inference: 4,023 MB |
| Model file size / saving | 135 MB (<font color=#F6D55C>110 MB</font>) | 245 MB |
| Inference speed (100 runs) | 43.2 ms +- 3.56 ms | 42.6 ms +- 1.89 ms |

#### Other information

:star: BN weight distribution (neck layers)

* After adding the sparsity-induced penalty (L1 regularization on the scaling factors in batch normalization (BN) layers) to the loss, training continues; a sketch of this penalty is given at the end of this section.
![](https://i.imgur.com/o4IxVwx.png)
* epoch 28
![](https://i.imgur.com/L2OZBsm.png)
* epoch 59
![](https://i.imgur.com/F4V1p1V.png)

:star: BN weight distribution (all layers)
![](https://i.imgur.com/Bc0EDNI.png)

:star: Experimental results from the paper
![](https://i.imgur.com/u12OWl8.png)
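For reference, the usual way the network-slimming paper applies that L1 penalty is to add its subgradient to the BN scaling factors' gradients between `loss.backward()` and `optimizer.step()`. This is only a sketch: the penalty strength `s` is an assumption (the value used in these experiments is not stated), and it is not necessarily the exact code used here.

```python
from torch import nn
import torch

def update_bn_l1(model, s=1e-4):
    """Add the subgradient of s * sum(|gamma|) to every BN layer's gamma gradient."""
    for m in model.modules():
        if isinstance(m, nn.BatchNorm2d):
            # d/dgamma of s * |gamma| is s * sign(gamma)
            m.weight.grad.data.add_(s * torch.sign(m.weight.data))

# Usage inside the training loop:
#   loss.backward()
#   update_bn_l1(model)
#   optimizer.step()
```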
## 5. Difficulties and Challenges

:pushpin: Building the model architecture from a config has not been fully worked out for every part yet; designing a more flexible config is future work.

Current challenges (a rough mask-bookkeeping sketch follows the figures below):
1. concat layers
2. layers whose masks must be taken as a union
3. layers with many stacked constraints

* yolo backbone
![](https://i.imgur.com/9vr47JF.jpg)
* yolo head
![](https://i.imgur.com/SdWy4cs.jpg)
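As a starting point for challenges 1 and 2, the bookkeeping can be expressed directly on the 0/1 masks themselves. The masks below are made up for illustration; this is not code from the repo.

```python
import torch

# Made-up per-layer masks (1 = keep the channel)
mask_a = torch.tensor([1., 0., 1., 1.])
mask_b = torch.tensor([0., 1., 1., 0.])

# 1. concat layer: the next conv's input mask is the two branch masks concatenated
concat_mask = torch.cat([mask_a, mask_b])   # 8 input channels -> keep 5

# 2. "union" layer (e.g. channels shared through a shortcut/residual add):
#    every layer that shares the tensor must keep the union of the surviving channels
union_mask = torch.max(mask_a, mask_b)      # keep a channel if any user keeps it

print(concat_mask, int(concat_mask.sum()))  # tensor([1., 0., 1., 1., 0., 1., 1., 0.]) 5
print(union_mask, int(union_mask.sum()))    # tensor([1., 1., 1., 1.]) 4
```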