# Week 21: CRNN Model Compression Implementation

###### tags: `技術研討`

## 1. CRNN Architecture Overview

![](https://i.imgur.com/3PKsuPd.png)

```
CRNN(
  (cnn): Sequential(
    (conv0): Conv2d(1, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (batchnorm0): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (relu0): ReLU(inplace=True)
    (pooling0): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (conv1): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (batchnorm1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (relu1): ReLU(inplace=True)
    (pooling1): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (conv2): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (batchnorm2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (relu2): ReLU(inplace=True)
    (conv3): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (batchnorm3): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (relu3): ReLU(inplace=True)
    (pooling2): MaxPool2d(kernel_size=(2, 1), stride=(2, 1), padding=0, dilation=1, ceil_mode=False)
    (conv4): Conv2d(256, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (batchnorm4): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (relu4): ReLU(inplace=True)
    (conv5): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (batchnorm5): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (relu5): ReLU(inplace=True)
    (pooling3): MaxPool2d(kernel_size=(2, 1), stride=(2, 1), padding=0, dilation=1, ceil_mode=False)
    (conv6): Conv2d(512, 512, kernel_size=(2, 2), stride=(1, 1))
    (batchnorm6): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (relu6): ReLU(inplace=True)
  )
  (map_to_seq): Linear(in_features=512, out_features=64, bias=True)
  (rnn1): LSTM(64, 256, bidirectional=True)
  (rnn2): LSTM(512, 256, bidirectional=True)
  (dense): Linear(in_features=512, out_features=11, bias=True)
)
```

## 2. Adapting the CRNN Architecture

[GitHub repository](https://github.com/GitYCC/crnn-pytorch)

### 2-1 Modifying the main model code in src/models.py

:star: YC conveniently spells out the channel count of every layer, so unlike last time with YOLOv4, we don't have to work out each layer's channels first.

```python=
class CRNN(nn.Module):

    def __init__(self, img_channel, img_height, img_width, num_class,
                 map_to_seq_hidden=64, rnn_hidden=256, leaky_relu=False):
        super(CRNN, self).__init__()

        self.cnn, (output_channel, output_height, output_width) = \
            self._cnn_backbone(img_channel, img_height, img_width, leaky_relu)

        self.map_to_seq = nn.Linear(output_channel * output_height, map_to_seq_hidden)

        self.rnn1 = nn.LSTM(map_to_seq_hidden, rnn_hidden, bidirectional=True)
        self.rnn2 = nn.LSTM(2 * rnn_hidden, rnn_hidden, bidirectional=True)

        self.dense = nn.Linear(2 * rnn_hidden, num_class)

    def _cnn_backbone(self, img_channel, img_height, img_width, leaky_relu):
        channels = [img_channel, 64, 128, 256, 256, 512, 512, 512]
        kernel_sizes = [3, 3, 3, 3, 3, 3, 2]
        strides = [1, 1, 1, 1, 1, 1, 1]
        paddings = [1, 1, 1, 1, 1, 1, 0]

        cnn = nn.Sequential()
        ...
```

Now the changes:

* Pruning changes the number of channels in each layer, so the channel list has to be passed into the CRNN model as a parameter.

```python=
class CRNN(nn.Module):

    def __init__(self, img_channel, img_height, img_width, num_class,
                 map_to_seq_hidden=64, rnn_hidden=256, leaky_relu=False,
                 pruning_cfg=None):
        super(CRNN, self).__init__()

        # add pruning cfg: after pruning, the channel list comes from the caller
        if pruning_cfg is None:
            self.channels = [img_channel, 64, 128, 256, 256, 512, 512, 512]
        else:
            self.channels = pruning_cfg

        # _cnn_backbone has to build its layers from self.channels
        # instead of its own local `channels` list
        self.cnn, (output_channel, output_height, output_width) = \
            self._cnn_backbone(img_channel, img_height, img_width, leaky_relu)

        self.map_to_seq = nn.Linear(output_channel * output_height, map_to_seq_hidden)

        self.rnn1 = nn.LSTM(map_to_seq_hidden, rnn_hidden, bidirectional=True)
        self.rnn2 = nn.LSTM(2 * rnn_hidden, rnn_hidden, bidirectional=True)

        self.dense = nn.Linear(2 * rnn_hidden, num_class)
    ...
```

## 3. Pruning Implementation

:pushpin: A quick review of the pruning workflow:

![](https://i.imgur.com/epDueZg.png)

Below we compare two pruning methods.

### 3.1 Network Slimming (pruning by BatchNorm scaling factors)

#### Step1: Load model

```python=
import numpy as np
import torch
import torch.nn as nn

# This returns a model object only if the whole model was saved (torch.save(model, ...)),
# and the CRNN class must be importable for unpickling.
# On PyTorch >= 2.6 this needs torch.load('crnn.pt', weights_only=False).
model = torch.load('crnn.pt')
```

#### Step2: Set parameters

We test two settings: pruning 50% and 80% of the channels.

```python=
# pruning ratio: 0.5 or 0.8
pruning_rate = 0.5  # set to 0.8 for the 80% run
```

```python=
# cfg
'''
model:     the part of the network to prune
skip:      layer indices that are not pruned
cfg:       number of channels left in each layer after pruning
cfg_mask:  positions of the channels left after pruning
cat_layer: layer indices that involve a concat
'''
pruning_cfg = {
    'cnn': {
        'model': model.cnn,
        'skip': [],
        'cfg': [],
        'cfg_mask': [],
        'cat_layer': []
    }
}
```

#### Step3: Compute threshold

```python=
"""Compute the global threshold"""
# count the total number of channels
total = 0
for m in model.cnn.modules():
    if isinstance(m, nn.BatchNorm2d):
        total += m.weight.data.shape[0]  # m.weight is gamma
        # m.weight.data.shape[0]: 64 128 256 256 512 512 512

# store the absolute value of every gamma in bn
bn = torch.zeros(total)
index = 0
for m in model.cnn.modules():
    if isinstance(m, nn.BatchNorm2d):
        size = m.weight.data.shape[0]
        bn[index:(index + size)] = m.weight.data.abs().clone()
        index += size

# sort from small to large
y, i = torch.sort(bn)  # small -> large
thre_index = int(total * pruning_rate)  # pruning ratio applied to the scaling factors
thre = y[thre_index] if thre_index != 0 else 0
# Each |gamma| is later compared with thre to build a 0/1 tensor: channels above thre are kept
# (channels below thre are not copied into newmodel).

print('Global threshold: {}'.format(thre))
print('Total channels: {}'.format(total))
```

* Threshold for 50% pruning

```python=
Global threshold: 0.0
Total channels: 2240
```

* Threshold for 80% pruning

```python=
Global threshold: 0.197
Total channels: 2240
```

#### Step4: Start pruning

```python=
"""Record which channels are kept and which are pruned"""
pruned = 0
cfg_new = [1]  # remaining channels (starting with the single input channel)
cfg_mask = [torch.ones(1)]  # per-layer 0/1 masks; the input has 1 channel, so its mask is [1]
for k, m in enumerate(model.cnn.modules()):
    if isinstance(m, nn.BatchNorm2d):
        thre_ = 0 if k in pruning_cfg['cnn']['skip'] else thre  # skipped layers use thre = 0
        weight_copy = m.weight.data.abs().clone()
        mask = weight_copy.gt(thre_).float()  # 1 where |gamma| > threshold, 0 otherwise
        cfg_new.append(int(torch.sum(mask)))
        cfg_mask.append(mask.clone())
        # cfg: 44 (64 -> 44)
        # cfg_mask: [tensor([0., 1., 1., ... 0., 0., 0.])]  64-dim

        pruned = pruned + mask.shape[0] - torch.sum(mask)  # for the pruning ratio
        print('layer index: {:d} \t total channel: {:d} \t remaining channel: {:d}'.
              format(k, mask.shape[0], int(torch.sum(mask))))

pruned_ratio = pruned / total
print('-------------------------------------------------------------------------')
print('channels pruned / channels total: {} / {}'.format(pruned, total))
print('pruned ratio: {}'.format(pruned_ratio))
```

* Result of 50% pruning

```python=
layer index: 2    total channel: 64     remaining channel: 44
layer index: 6    total channel: 128    remaining channel: 115
layer index: 10   total channel: 256    remaining channel: 175
layer index: 13   total channel: 256    remaining channel: 191
layer index: 17   total channel: 512    remaining channel: 200
layer index: 20   total channel: 512    remaining channel: 175
layer index: 24   total channel: 512    remaining channel: 156
-------------------------------------------------------------------------
channels pruned / channels total: 1184.0 / 2240
pruned ratio: 0.5285714268684387
```

* Result of 80% pruning

```python=
layer index: 2    total channel: 64     remaining channel: 35
layer index: 6    total channel: 128    remaining channel: 107
layer index: 10   total channel: 256    remaining channel: 88
layer index: 13   total channel: 256    remaining channel: 84
layer index: 17   total channel: 512    remaining channel: 41
layer index: 20   total channel: 512    remaining channel: 46
layer index: 24   total channel: 512    remaining channel: 46
-------------------------------------------------------------------------
channels pruned / channels total: 1793.0 / 2240
pruned ratio: 0.8004464507102966
```

#### Step5: Save weights to new model (using 80% as the example)

```python=
print(cfg_new)
# [1, 35, 107, 88, 84, 41, 46, 46]
```

```python=
# cfg
pruning_cfg = {
    'cnn': {
        'model': model.cnn,
        'skip': [],
        'cfg': [1, 35, 107, 88, 84, 41, 46, 46],
        'cfg_mask': [],
        'cat_layer': []
    }
}
```

```python=
# define the new architecture from the new cfg
# [1, 35, 107, 88, 84, 41, 46, 46]
# img_channel is 1 (grayscale input, conv0 is Conv2d(1, 64, ...)); the other
# arguments must match the settings the original model was built with.
newmodel = CRNN(img_channel=1, img_height=128, img_width=128, num_class=5,
                pruning_cfg=pruning_cfg['cnn']['cfg'])
```

```python=
old_modules = list(model.cnn.modules())
new_modules = list(newmodel.cnn.modules())
layer_id_in_cfg = 0
start_mask = cfg_mask[layer_id_in_cfg]  # input-channel mask of the first layer
end_mask = cfg_mask[layer_id_in_cfg + 1]
```

```python=
for layer_id in range(len(old_modules)):
    m0 = old_modules[layer_id]
    m1 = new_modules[layer_id]
    # Conv layers
    if isinstance(m0, nn.Conv2d):
        idx0 = np.squeeze(np.argwhere(np.asarray(start_mask.cpu().numpy())))
        idx1 = np.squeeze(np.argwhere(np.asarray(end_mask.cpu().numpy())))
        print('=====================================================')
        print('In shape: {:d}, Out shape {:d}.'.format(idx0.size, idx1.size))
        # start_mask: tensor([1])  # 1-dim (the very first layer)
        # idx0: [0]                # positions of the 1s
        # end_mask: tensor([0., 1., 1., 0., 0., 1., 1., 0., 0., 1., 1., 0., 0., 1., 1., 0., 0., 1.,
        #                   0., 1., 0., 1., 0., 1., 0., 0., 1., 1., 0., 1., 1., 0., 1., 0., 1., 1.,
        #                   1., 1., 0., 1., 1., 1., 0., 0., 0., 0., 1., 1., 0., 1., 0., 1., 1., 1.,
        #                   1., 0., 0., 0., 0., 0., 0., 0., 0., 0.])  (64 -> 35 channels)
        # idx1: [ 1  2  5  6  9 10 13 14 17 19 21 23 26 27 29 30 32 34 35 36 37 39 40 41 46 47 49 51 52 53 54]  # positions of the 1s
        # In shape: 1, Out shape 35.
        if idx0.size == 1:
            idx0 = np.resize(idx0, (1,))
        if idx1.size == 1:
            idx1 = np.resize(idx1, (1,))
        w1 = m0.weight.data[:, idx0.tolist(), :, :].clone()  # keep the selected input channels
        w1 = w1[idx1.tolist(), :, :, :].clone()                # keep the selected output channels
        m1.weight.data = w1.clone()                            # store the new weights
        # the conv layers are built with bias=True, so the kept bias entries must be copied too
        if m0.bias is not None:
            m1.bias.data = m0.bias.data[idx1.tolist()].clone()

    # BatchNorm layers
    elif isinstance(m0, nn.BatchNorm2d):
        idx1 = np.squeeze(np.argwhere(np.asarray(end_mask.cpu().numpy())))
        if idx1.size == 1:
            idx1 = np.resize(idx1, (1,))
        # end_mask / idx1: the same mask as in the Conv branch above (64 -> 35 channels)
        # batchnorm shape 35.

        # store the new weights
        m1.weight.data = m0.weight.data[idx1.tolist()].clone()
        m1.bias.data = m0.bias.data[idx1.tolist()].clone()
        m1.running_mean = m0.running_mean[idx1.tolist()].clone()
        m1.running_var = m0.running_var[idx1.tolist()].clone()

        # the last layer has no next mask (list index out of range), so stop advancing there
        if layer_id_in_cfg < 6:
            layer_id_in_cfg += 1
            start_mask = end_mask.clone()
            end_mask = cfg_mask[layer_id_in_cfg + 1]
```

```python=
# inspect the new model
# torch.Size order: (out_channels, in_channels, kernel_h, kernel_w)
for i in newmodel.cnn.state_dict():
    if ('conv' in i) and ('weight' in i):
        print(("================= {} =================").format(i.split('.')[0]))
        print('Conv shape: {}'.format(newmodel.cnn.state_dict()[i].shape))
    if ('batchnorm' in i) and ('weight' in i):
        print('Batch shape: {}'.format(newmodel.cnn.state_dict()[i].shape))
```

```python=
================= conv0 =================
Conv shape: torch.Size([35, 1, 3, 3])
Batch shape: torch.Size([35])
================= conv1 =================
Conv shape: torch.Size([107, 35, 3, 3])
Batch shape: torch.Size([107])
================= conv2 =================
Conv shape: torch.Size([88, 107, 3, 3])
Batch shape: torch.Size([88])
================= conv3 =================
Conv shape: torch.Size([84, 88, 3, 3])
Batch shape: torch.Size([84])
================= conv4 =================
Conv shape: torch.Size([41, 84, 3, 3])
Batch shape: torch.Size([41])
================= conv5 =================
Conv shape: torch.Size([46, 41, 3, 3])
Batch shape: torch.Size([46])
================= conv6 =================
Conv shape: torch.Size([46, 46, 2, 2])
Batch shape: torch.Size([46])
```

#### Step6: Save model

```python=
torch.save(newmodel, './weights/newmodel.pth')
```

---

### 3.2 L1-norm pruning

[Quick recap of L1-norm pruning](https://hackmd.io/1-REXwMvS6SiPKCISjrXHQ?view#31-如何決定裁減哪些-filters？)

* Instead of pruning with a global threshold, we look at each layer's sensitivity and use it to decide whether a layer can be pruned heavily or only lightly.

**Q: How do we measure each layer's sensitivity? A:**

![](https://i.imgur.com/qV4nIU4.png)

- Look at panel (a): for every filter we compute its summed absolute weight $s_j$ and sort the filters by it. Figure 2(a) plots these sums normalized to the range 0 to 1; the distributions differ a lot between convolution layers.

**Q: So which layers can be pruned? A:**

* The steeper the curve, the more uneven the weight distribution: most of the weight comes from a few top-ranked filters, while most filters have very small weights. Such a layer is less sensitive to pruning -> prune it.
* The flatter the curve, the more even the distribution: most filters carry non-trivial weights and are informative, so the layer is more sensitive to pruning.

**Q: What are the four pruning steps? A:**

1. For each filter $F_{i,j}$, compute the sum of its absolute kernel weights, $s_{j}=\sum_{l=1}^{n_{i}}\sum|K_{l}|$.
2. Sort the filters by $s_j$.
3. Starting from the smallest sum, prune $m$ filters together with their feature maps. The kernels in the next layer that consume the pruned feature maps are removed as well.
4. Build new kernel matrices for layers $i$ and $i+1$; the remaining weights are copied into the new model.

**step1: Plot the sensitivity** (the code is in week 5)

![](https://i.imgur.com/RTdaZpm.png)

**Absolute sum of filter weights for each layer:**

![](https://i.imgur.com/crCSdi1.png)

- Sensitivity from low to high (prune most to least): conv1 -> conv2 -> conv7 -> conv6

**step2: Choose per-layer pruning ratios from the sensitivity plot above**

- Sensitivity from low to high (prune most to least): conv1 -> conv2 -> conv7 -> conv6
- conv1 gets the **highest** pruning ratio, conv2, conv6 and conv7 the **next highest**
- The experiments below prune 12%, 52% and 80% of the channels overall

```python=
# per-layer pruning rates, one entry per conv layer
# pick one setting; as written, the last assignment (80%) is the one used below
#                    1,   2,   3,   4,   5,   6,   7
pruning_rate_list = [0.6, 0.2, 0,   0,   0,   0.2, 0.2]  # -> 0.12 overall
pruning_rate_list = [0.9, 0.7, 0.3, 0.3, 0.3, 0.7, 0.7]  # -> 0.52 overall
pruning_rate_list = [0.9, 0.9, 0.7, 0.6, 0.7, 0.9, 0.9]  # -> 0.80 overall
```

**step3: make new "cfg" of channels and mask (using 80% as the example)**

- Note that L1 pruning is framed in terms of which filters to *keep*.

```
original channels (cfg): [1, 64, 128, 256, 256, 512, 512, 512]
new channels (cfg_new):  [1, 6, 12, 76, 102, 153, 51, 51]
```

- cfg_new = cfg * (1 - pruning rate), rounded down

```python=
# build cfg_new and the mask for each layer
cfg = []
cfg_new = [1]  # the input channel is 1
cfg_mask = []
conv_id = 1
for m in model.modules():
    if isinstance(m, nn.Conv2d):
        out_channels = m.weight.data.shape[0]  # original channel count

        # build cfg_new
        num_keep = int(out_channels * (1 - pruning_rate_list[conv_id - 1]))
        cfg_new.append(num_keep)
        cfg.append(out_channels)
        print(f'Conv{conv_id}, total channel:{out_channels}, remaining channel:{num_keep}')

        # build the mask for cfg_new:
        # compute each filter's L1 norm, then mark the filters to keep with 1 and the rest with 0
        weight_copy = m.weight.data.abs().clone()
        weight_copy = weight_copy.cpu().numpy()
        L1_norm = np.sum(weight_copy, axis=(1, 2, 3))  # sum of |w| per filter
        arg_max = np.argsort(L1_norm)                  # filter indices sorted by L1 norm, ascending
        arg_max_rev = arg_max[::-1][:num_keep]         # the num_keep filters with the largest L1 norm
        mask = torch.zeros(out_channels)               # all-zero tensor of length out_channels
        mask[arg_max_rev.tolist()] = 1                 # mark the kept filters with 1
        cfg_mask.append(mask)                          # -> [tensor(0s and 1s), tensor(0s and 1s), ...]
        conv_id += 1
```

output:

```
Conv1, total channel:64, remaining channel:6
Conv2, total channel:128, remaining channel:12
Conv3, total channel:256, remaining channel:76
Conv4, total channel:256, remaining channel:102
Conv5, total channel:512, remaining channel:153
Conv6, total channel:512, remaining channel:51
Conv7, total channel:512, remaining channel:51

cfg: [1, 64, 128, 256, 256, 512, 512, 512]
cfg_new: [1, 6, 12, 76, 102, 153, 51, 51]  # the leading 1 is the extra input channel
cfg_mask: [tensor([1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 1., 0., 0., 0., 0., 0., 0., 0., 0., 1., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.]),
 tensor([1., 0., 0., 0., 0., 0., 1., 0., 1., 0., 1., 1., 1., 1., 1., 1., 0., 1., 0., 0., 0., 0., 0., 0., 1., 1., 0., 0., 0., 1., 1., 1., 0., 0., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 0., 0., 0., 0., 0., 0., 1., 1., 0., 1., 0., 1., 0., 0., 1., 0., 0., 0., 1., 1., 1., 0., 0., 0., 0., 0., 1., 0., 1., 0., 1., 1., 1., 1., 1., 1., 0., 1., 0., 0., 0., 0., 0., 0., 1., 1., 0., 0., 0., 1., 1., 1., 0., 0., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 0., 0., 0., 0., 0., 0., 1., 1., 0., 1., 0., 1., 0., 0., 1., 0., 0., 0., 1., 1.]),.....]
```

**step5: save weights to new model (using 80% as the example)**

```python=
# define the new architecture from the new cfg
# (the other arguments must match the settings the original model was built with)
newmodel = CRNN(img_channel=1, img_height=128, img_width=128, num_class=5, pruning_cfg=cfg_new)
```

```python=
start_mask = torch.ones(1)  # input channel
layer_id_in_cfg = 0
end_mask = cfg_mask[layer_id_in_cfg]
for [m0, m1] in zip(model.modules(), newmodel.modules()):
    # Conv
    if isinstance(m0, nn.Conv2d):
        # input channels: positions where the mask is 1, as an array
        idx0 = np.squeeze(np.argwhere(np.asarray(start_mask.cpu().numpy())))
        # output channels: positions where the mask is 1, as an array
        idx1 = np.squeeze(np.argwhere(np.asarray(end_mask.cpu().numpy())))
        print('In shape: {:d}, Out shape {:d}.'.format(idx0.size, idx1.size))
        if idx0.size == 1:
            idx0 = np.resize(idx0, (1,))
        if idx1.size == 1:
            idx1 = np.resize(idx1, (1,))
        # first keep the selected input channels
        w1 = m0.weight.data[:, idx0.tolist(), :, :].clone()
        # then keep the selected output filters
        w1 = w1[idx1.tolist(), :, :, :].clone()
        # store in newmodel
        m1.weight.data = w1.clone()
        # the conv layers are built with bias=True, so copy the kept bias entries too
        if m0.bias is not None:
            m1.bias.data = m0.bias.data[idx1.tolist()].clone()

    # BatchNorm
    elif isinstance(m0, nn.BatchNorm2d):
        # positions where the mask is 1, as an array
        idx1 = np.squeeze(np.argwhere(np.asarray(end_mask.cpu().numpy())))
        if idx1.size == 1:
            idx1 = np.resize(idx1, (1,))
        # copy the kept parameters straight into newmodel
        m1.weight.data = m0.weight.data[idx1.tolist()].clone()
        m1.bias.data = m0.bias.data[idx1.tolist()].clone()
        m1.running_mean = m0.running_mean[idx1.tolist()].clone()
        m1.running_var = m0.running_var[idx1.tolist()].clone()
        layer_id_in_cfg += 1
        # this layer's output mask is the next layer's input mask
        start_mask = end_mask
        if layer_id_in_cfg < len(cfg_mask):  # do not change in Final FC
            end_mask = cfg_mask[layer_id_in_cfg]

    # Linear
    elif isinstance(m0, nn.Linear):
        if layer_id_in_cfg == len(cfg_mask):
            # map_to_seq: keep the input columns of the surviving channels of the last conv layer
            # (a plain column slice assumes the CNN output height is 1, i.e. in_features == channels)
            idx0 = np.squeeze(np.argwhere(np.asarray(cfg_mask[-1].cpu().numpy())))
            if idx0.size == 1:
                idx0 = np.resize(idx0, (1,))
            m1.weight.data = m0.weight.data[:, idx0].clone()
            m1.bias.data = m0.bias.data.clone()
            layer_id_in_cfg += 1
            continue
        m1.weight.data = m0.weight.data.clone()
        m1.bias.data = m0.bias.data.clone()

    # LSTM: input/hidden sizes do not depend on the pruned channels, so copy the weights unchanged
    elif isinstance(m0, nn.LSTM):
        m1.load_state_dict(m0.state_dict())
```

```python=
# inspect the new model
# torch.Size order: (out_channels, in_channels, kernel_h, kernel_w)
for i in newmodel.cnn.state_dict():
    if ('conv' in i) and ('weight' in i):
        print(("================= {} =================").format(i.split('.')[0]))
        print('Conv shape: {}'.format(newmodel.cnn.state_dict()[i].shape))
    if ('batchnorm' in i) and ('weight' in i):
        print('Batch shape: {}'.format(newmodel.cnn.state_dict()[i].shape))
```

output:

```
================= conv0 =================
Conv shape: torch.Size([6, 1, 3, 3])
Batch shape: torch.Size([6])
================= conv1 =================
Conv shape: torch.Size([12, 6, 3, 3])
Batch shape: torch.Size([12])
================= conv2 =================
Conv shape: torch.Size([76, 12, 3, 3])
Batch shape: torch.Size([76])
================= conv3 =================
Conv shape: torch.Size([102, 76, 3, 3])
Batch shape: torch.Size([102])
================= conv4 =================
Conv shape: torch.Size([153, 102, 3, 3])
Batch shape: torch.Size([153])
================= conv5 =================
Conv shape: torch.Size([51, 153, 3, 3])
Batch shape: torch.Size([51])
================= conv6 =================
Conv shape: torch.Size([51, 51, 2, 2])
Batch shape: torch.Size([51])
```

**step6: save model**

```python=
# save newmodel
torch.save({'state_dict': newmodel.state_dict(), 'acc': 0, 'pruning_cfg': cfg_new},
           './pruning_checkpoints/newmodel_pruning_l1_80.pth')
```

## 4. Analysis of Experimental Results

| Item | New model (BN) | New model (BN) | New model (L1 norm) | New model (L1 norm) | New model (L1 norm) | Original model |
|----- |-------|--------|-------|--------|-------|--------|
| Channel pruning ratio | 52% | 80% | 12% | 52% | 80% | 0% |
| Parameters (reduction) | 3.57 M (54%) | 2.50 M (68%) | 6.87 M (12%) | 3.79 M (52%) | 2.56 M (67%) | 7.84 M |
| Model file size | 14 MB | 9.6 MB | 27 MB | 15 MB | 9.9 MB | 30 MB |
| Inference time (50 runs) | 37.4 ms ± 6.21 ms | 37.3 ms ± 4.05 ms | 8.03 ms ± 0.47 ms | 10.6 ms ± 0.32 ms | 8.38 ms ± 0.30 ms | 11.7 ms ± 0.92 ms |
| Compute, MACs (reduction) | 0.26 G (61%) | 0.12 G (82%) | 0.56 G (16%) | 0.24 G (64%) | 0.09 G (87%) | 0.67 G |
| Best model accuracy | 97.28% | 96.83% | 97.42% | 96.52% | 95.74% | 97.37% |

### 4.1 L1-norm pruning: results

#### <ins>4.1.1 finetune / from scratch (pruning rate 52%)</ins>

![](https://i.imgur.com/Z3B5qLR.png)

- Validation accuracy of finetuning vs. training from scratch at a 52% pruning rate
- Convergence speed: finetune > from scratch
- Finetuning is clearly more stable than training from scratch (the 12% and 80% pruning rates show the same pattern)

#### <ins>4.1.2 finetune (pruning rate 12% / 52% / 80%)</ins>

![](https://i.imgur.com/AGgO2vK.png)

- Validation accuracy of finetuning at pruning rates of 12% / 52% / 80%
- Convergence speed: 12% (green) > 52% (blue) > 80% (orange)

#### <ins>4.1.3 from scratch (pruning rate 12% / 52% / 80%)</ins>

![](https://i.imgur.com/sltRa2e.png)

- Validation accuracy when training from scratch at pruning rates of 12% / 52% / 80%
- Convergence speed: 12% (green) and 52% (orange) are about equal > 80% (blue)
- Compared with the previous figure (4.1.2, finetune), training from scratch is clearly less stable in every setting

#### <ins>4.1.4 finetune / from scratch (pruning rate 12% / 52% / 80%)</ins>

![](https://i.imgur.com/ABtJlGp.png)

- Validation accuracy of finetuning and training from scratch at pruning rates of 12% / 52% / 80%
- All finetuned runs (brown / blue / green) are more stable than the from-scratch runs (red / purple / orange)
- Convergence speed: finetune / 12% (brown) > finetune / 52% (blue) > from scratch / 12% (purple) > from scratch / 52% (red) > from scratch / 80% (orange) > finetune / 80% (green)

## 5. Difficulties and Challenges

1. After pruning by weights (L1 norm) or by BN scaling factors, the pruned model's accuracy drops straight to zero before finetuning. We have not found the exact cause yet, but during finetuning, starting from the pretrained weights clearly trains faster and reaches better accuracy.
2. Pruning did not speed up inference; it was even slower than before pruning, even though the FLOPs (amount of computation) went down. We have not found the cause yet (we saw something similar with YOLO inference before).
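Regarding challenge 1, one possible contributor worth checking is whether every layer of the pruned model actually receives the old weights. In 3.1 the copy loop only walks `model.cnn.modules()`, so `map_to_seq`, `rnn1`, `rnn2` and `dense` in `newmodel` keep their random initialization, which alone could push accuracy to zero before finetuning. `rnn1`, `rnn2` and `dense` keep their shapes and could be copied with `load_state_dict`; `map_to_seq` needs its input columns sliced to the surviving channels of the last conv layer. The sketch below only computes those column indices. It assumes (please check against the repo's `forward`) that the CNN output of shape `(batch, C, H, W)` is flattened with `view(batch, C * H, W)`, so channel `c`, row `h` becomes feature `c * H + h`. `map_to_seq_columns` and `mask_to_indices` are hypothetical helper names, not part of the repo.

```python
from typing import List, Sequence


def mask_to_indices(mask: Sequence[float]) -> List[int]:
    """Positions whose mask value is non-zero (same role as np.argwhere on a cfg_mask entry)."""
    return [i for i, v in enumerate(mask) if v != 0]


def map_to_seq_columns(kept_channels: Sequence[int], height: int) -> List[int]:
    """Input-column indices of map_to_seq to keep after pruning the last conv layer.

    Assumption: the CNN output (batch, C, H, W) is flattened with view(batch, C * H, W),
    so channel c at row h lands at feature index c * H + h. Kept channels are sorted
    ascending, matching the channel order produced by np.argwhere in the copy loops.
    """
    if height < 1:
        raise ValueError("height must be >= 1")
    return [c * height + h for c in sorted(kept_channels) for h in range(height)]
```

A hypothetical PyTorch usage, not run here: `cols = map_to_seq_columns(mask_to_indices(cfg_mask[-1].tolist()), H)`, then `newmodel.map_to_seq.weight.data = model.map_to_seq.weight.data[:, cols].clone()`, copy `map_to_seq.bias` unchanged, and call `newmodel.rnn1.load_state_dict(model.rnn1.state_dict())` (likewise for `rnn2` and `dense`). With `H == 1`, as in the printed architecture where `map_to_seq` has `in_features=512`, the columns are simply the kept channel indices, which is what the plain column slice in 3.2 assumes.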
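Regarding challenge 2, one thing worth ruling out first is the measurement itself. CUDA kernels are launched asynchronously, so timing without warm-up runs and without synchronizing the device can report misleading numbers. Below is a minimal, framework-agnostic timing sketch; `time_inference` is a hypothetical helper, and on a GPU the assumption is that `torch.cuda.synchronize` is passed in as `sync`.

```python
import statistics
import time
from typing import Callable, Optional, Tuple


def time_inference(fn: Callable[[], object], runs: int = 50, warmup: int = 5,
                   sync: Optional[Callable[[], None]] = None) -> Tuple[float, float]:
    """Return (mean_ms, std_ms) over `runs` timed calls of `fn`, after `warmup` untimed calls.

    `sync` should block until all queued device work is finished (e.g. torch.cuda.synchronize);
    without it, asynchronous GPU launches can make each call look faster than it really is.
    """
    if runs < 1:
        raise ValueError("runs must be >= 1")
    for _ in range(warmup):  # untimed warm-up: first calls may include allocation or autotuning
        fn()
    if sync is not None:
        sync()  # make sure warm-up work is done before timing starts

    times_ms = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        if sync is not None:
            sync()  # wait for the device before stopping the clock
        times_ms.append((time.perf_counter() - start) * 1000.0)

    mean_ms = statistics.mean(times_ms)
    std_ms = statistics.stdev(times_ms) if runs > 1 else 0.0
    return mean_ms, std_ms
```

A possible usage (not run here): call `newmodel.eval()` and, inside `with torch.no_grad():`, run `time_inference(lambda: newmodel(x), runs=50, sync=torch.cuda.synchronize)`, then do the same for the original model with the same input. If the pruned model is still slower with warm-up and synchronization in place, another hedged suspect is the irregular channel counts after pruning (e.g. 35, 107, 88): GPU convolution kernels, especially Tensor Core paths, tend to be most efficient when channel counts are multiples of 8, so fewer MACs do not necessarily mean lower latency.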