# Week 21: CRNN Model Compression in Practice
###### tags: `Technical Seminar`
## 1. CRNN Architecture Overview

```
CRNN(
(cnn): Sequential(
(conv0): Conv2d(1, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(batchnorm0): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu0): ReLU(inplace=True)
(pooling0): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
(conv1): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(batchnorm1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu1): ReLU(inplace=True)
(pooling1): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
(conv2): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(batchnorm2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu2): ReLU(inplace=True)
(conv3): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(batchnorm3): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu3): ReLU(inplace=True)
(pooling2): MaxPool2d(kernel_size=(2, 1), stride=(2, 1), padding=0, dilation=1, ceil_mode=False)
(conv4): Conv2d(256, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(batchnorm4): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu4): ReLU(inplace=True)
(conv5): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(batchnorm5): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu5): ReLU(inplace=True)
(pooling3): MaxPool2d(kernel_size=(2, 1), stride=(2, 1), padding=0, dilation=1, ceil_mode=False)
(conv6): Conv2d(512, 512, kernel_size=(2, 2), stride=(1, 1))
(batchnorm6): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(relu6): ReLU(inplace=True)
)
(map_to_seq): Linear(in_features=512, out_features=64, bias=True)
(rnn1): LSTM(64, 256, bidirectional=True)
(rnn2): LSTM(512, 256, bidirectional=True)
(dense): Linear(in_features=512, out_features=11, bias=True)
)
```
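As a rough check of the structure printed above, the sketch below traces the feature-map height and width through the CNN. `backbone_output_size` is a hypothetical helper written for these notes, not part of crnn-pytorch, and it assumes the layers behave exactly as printed (3x3 convolutions with padding 1 keep the size, pooling uses floor division).

```python
# Hypothetical helper: traces (height, width) through the CNN printed above.
def backbone_output_size(img_height, img_width):
    h, w = img_height, img_width
    # conv0..conv5 use kernel 3, stride 1, padding 1, so they keep h and w.
    h, w = h // 2, w // 2   # pooling0: 2x2, stride 2
    h, w = h // 2, w // 2   # pooling1: 2x2, stride 2
    h = h // 2              # pooling2: kernel (2, 1), halves the height only
    h = h // 2              # pooling3: kernel (2, 1), halves the height only
    h, w = h - 1, w - 1     # conv6: kernel 2, stride 1, no padding
    return h, w

print(backbone_output_size(32, 100))   # (1, 24)
print(backbone_output_size(128, 128))  # (7, 31)
```

An output height of 1 is what gives `map_to_seq` its 512 input features in the printout; under the same assumptions a 128-pixel-high input would leave a height of 7, so `map_to_seq` would take 7 times the channel count.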
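## 2. Adjusting the CRNN Architecture
[GitHub repository](https://github.com/GitYCC/crnn-pytorch)
### 2-1 Adjusting the main model definition in src/models.py
:star: YC helpfully lists the channel count of every layer, so unlike last time with YOLOv4 we do not need to work out the per-layer channel counts first.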
```python=
class CRNN(nn.Module):
    def __init__(self, img_channel, img_height, img_width, num_class,
                 map_to_seq_hidden=64, rnn_hidden=256, leaky_relu=False):
        super(CRNN, self).__init__()
        self.cnn, (output_channel, output_height, output_width) = \
            self._cnn_backbone(img_channel, img_height, img_width, leaky_relu)
        self.map_to_seq = nn.Linear(output_channel * output_height, map_to_seq_hidden)
        self.rnn1 = nn.LSTM(map_to_seq_hidden, rnn_hidden, bidirectional=True)
        self.rnn2 = nn.LSTM(2 * rnn_hidden, rnn_hidden, bidirectional=True)
        self.dense = nn.Linear(2 * rnn_hidden, num_class)

    def _cnn_backbone(self, img_channel, img_height, img_width, leaky_relu):
        channels = [img_channel, 64, 128, 256, 256, 512, 512, 512]
        kernel_sizes = [3, 3, 3, 3, 3, 3, 2]
        strides = [1, 1, 1, 1, 1, 1, 1]
        paddings = [1, 1, 1, 1, 1, 1, 0]
        cnn = nn.Sequential()
        ...
```
Now for the adjustments:
* Pruning changes the number of channels, so the channel list has to be <font color=red>parameterized</font> and passed into the CRNN model.
```python=
class CRNN(nn.Module):
    def __init__(self, img_channel, img_height, img_width, num_class,
                 map_to_seq_hidden=64, rnn_hidden=256, leaky_relu=False, pruning_cfg=None):
        super(CRNN, self).__init__()
        # Channel list: use the default unless a pruned configuration is passed in
        if pruning_cfg is None:
            self.channels = [img_channel, 64, 128, 256, 256, 512, 512, 512]
        else:
            self.channels = pruning_cfg
        # _cnn_backbone (elided below) has to build its layers from self.channels
        self.cnn, (output_channel, output_height, output_width) = \
            self._cnn_backbone(img_channel, img_height, img_width, leaky_relu)
        self.map_to_seq = nn.Linear(output_channel * output_height, map_to_seq_hidden)
        self.rnn1 = nn.LSTM(map_to_seq_hidden, rnn_hidden, bidirectional=True)
        self.rnn2 = nn.LSTM(2 * rnn_hidden, rnn_hidden, bidirectional=True)
        self.dense = nn.Linear(2 * rnn_hidden, num_class)
    ...
```
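To see what a channel list implies for the convolution weights, the sketch below lists the `(out, in, k, k)` shape of each Conv2d. `conv_weight_shapes` is a hypothetical helper for illustration only; the kernel sizes are the ones from `_cnn_backbone`, and the pruned list is the 80% Network Slimming cfg that appears in Step5 of section 3.1.

```python
# Hypothetical helper, not part of crnn-pytorch: lists the Conv2d weight
# shapes (out_channels, in_channels, k, k) implied by a channel list.
KERNEL_SIZES = [3, 3, 3, 3, 3, 3, 2]  # taken from _cnn_backbone

def conv_weight_shapes(channels, kernel_sizes=KERNEL_SIZES):
    shapes = []
    for i, k in enumerate(kernel_sizes):
        # layer i maps channels[i] inputs to channels[i + 1] outputs
        shapes.append((channels[i + 1], channels[i], k, k))
    return shapes

default_cfg = [1, 64, 128, 256, 256, 512, 512, 512]
pruned_cfg = [1, 35, 107, 88, 84, 41, 46, 46]  # 80% cfg from section 3.1

for name, cfg in [("default", default_cfg), ("pruned", pruned_cfg)]:
    print(name, conv_weight_shapes(cfg))
```

Because each layer's input count is the previous layer's output count, a single list is enough to describe the whole pruned backbone.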
## 3. Implementing the Pruning Methods
:pushpin: A quick review of the pruning workflow

Below we compare two pruning methods.
### 3.1 Network Slimming (pruning by BatchNorm scaling factors)
#### Step1: Load model
```python=
model = torch.load('crnn.pt')
```
#### Step2: Setting parameter
We test two pruning ratios: 50% and 80%.
```python=
# Pruning ratio: 0.5 for the 50% experiment, 0.8 for the 80% experiment
pruning_rate = 0.5
```
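```python=
# cfg
'''
model: the sub-network to prune
skip: indices of layers that are not pruned
cfg: number of channels remaining after pruning
cfg_mask: positions of the channels remaining after pruning
cat_layer: indices of layers that contain a concat
'''
pruning_cfg = {
    'cnn': {
        'model': model.cnn,
        'skip': [],
        'cfg': [],
        'cfg_mask': [],
        'cat_layer': []
    }
}
```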
#### Step3: Compute threshold
```python=
"""Compute the global threshold"""
import torch
import torch.nn as nn

# Count the total number of channels
total = 0
for m in model.cnn.modules():
    if isinstance(m, nn.BatchNorm2d):
        total += m.weight.data.shape[0]  # m.weight is gamma
        # m.weight.data.shape[0]: 64 128 256 256 512 512 512
# Store the absolute value of every gamma in bn
bn = torch.zeros(total)
index = 0
for m in model.cnn.modules():
    if isinstance(m, nn.BatchNorm2d):
        size = m.weight.data.shape[0]
        bn[index:(index + size)] = m.weight.data.abs().clone()
        index += size
# Sort in ascending order
y, i = torch.sort(bn)  # small -> large
thre_index = int(total * pruning_rate)  # cut-off position for the chosen pruning ratio
thre = y[thre_index] if thre_index != 0 else 0
# Each |gamma| is later compared with thre to build a 0/1 tensor:
# channels above thre are kept, the rest are not copied into newmodel
print('Global threshold: {}'.format(thre))
print('Total channels: {}'.format(total))
```
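* Threshold for 50% pruning. It is 0.0 because more than half of the gamma values are exactly zero, so the strict `> thre` comparison in Step4 ends up pruning 52.9% rather than 50%.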
```python=
Global threshold: 0.0
Total channels: 2240
```
* Threshold for 80% pruning
```python=
Global threshold: 0.197
Total channels: 2240
```
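The threshold logic can be tried without a trained model. The following is a minimal pure-Python sketch on made-up scaling factors; `global_threshold` and `keep_masks` are hypothetical names that mirror the sort-and-index step above, not functions from any library.

```python
def global_threshold(gammas_per_layer, pruning_rate):
    """Return the |gamma| found at the pruning_rate position of the sorted values."""
    all_gammas = sorted(abs(g) for layer in gammas_per_layer for g in layer)
    index = int(len(all_gammas) * pruning_rate)
    return all_gammas[index] if index != 0 else 0.0

def keep_masks(gammas_per_layer, threshold):
    """1 marks a channel whose |gamma| is strictly greater than the threshold."""
    return [[1 if abs(g) > threshold else 0 for g in layer]
            for layer in gammas_per_layer]

# Toy scaling factors for two BatchNorm layers (made-up numbers).
toy = [[0.0, 0.0, 0.3, 0.9], [0.0, 0.0, 0.5, 0.0]]
thre = global_threshold(toy, 0.5)
masks = keep_masks(toy, thre)
print(thre)   # 0.0
print(masks)  # [[0, 0, 1, 1], [0, 0, 1, 0]]
```

Because the comparison is strict, every channel tied with the threshold is pruned: 5 of the 8 toy channels are removed even though 50% was requested. The same tie effect is consistent with a threshold of 0.0 leading to slightly more than 50% of the channels being pruned.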
#### Step4: Start pruning
```python=
"""Record which channels are kept and which are pruned"""
pruned = 0
cfg_new = [1]  # remaining channels per layer; the leading 1 is the input channel
cfg_mask = [torch.ones(1)]  # per-layer 0/1 keep masks; the input has 1 channel, so its mask is [1]
for k, m in enumerate(model.cnn.modules()):
    if isinstance(m, nn.BatchNorm2d):
        thre_ = 0 if k in pruning_cfg['cnn']['skip'] else thre  # skipped layers use thre=0
        weight_copy = m.weight.data.abs().clone()
        mask = weight_copy.gt(thre_).float()  # 1 where |gamma| > thre_, 0 otherwise
        cfg_new.append(int(torch.sum(mask)))
        cfg_mask.append(mask.clone())
        # cfg: 44 (64 -> 44)
        # cfg_mask: [tensor([0., 1., 1., ... 0., 0., 0.])], 64-dimensional
        pruned = pruned + mask.shape[0] - torch.sum(mask)  # accumulate for the pruning ratio
        print('layer index: {:d} \t total channel: {:d} \t remaining channel: {:d}'.
              format(k, mask.shape[0], int(torch.sum(mask))))
pruned_ratio = pruned / total
print('-------------------------------------------------------------------------')
print('channels pruned / channels total: {} / {}'.format(pruned, total))
print('pruned ratio: {}'.format(pruned_ratio))
```
* Result of 50% pruning
```python=
layer index: 2 total channel: 64 remaining channel: 44
layer index: 6 total channel: 128 remaining channel: 115
layer index: 10 total channel: 256 remaining channel: 175
layer index: 13 total channel: 256 remaining channel: 191
layer index: 17 total channel: 512 remaining channel: 200
layer index: 20 total channel: 512 remaining channel: 175
layer index: 24 total channel: 512 remaining channel: 156
-------------------------------------------------------------------------
channels pruned / channels total: 1184.0 / 2240
pruned ratio: 0.5285714268684387
```
* Result of 80% pruning
```python=
layer index: 2 total channel: 64 remaining channel: 35
layer index: 6 total channel: 128 remaining channel: 107
layer index: 10 total channel: 256 remaining channel: 88
layer index: 13 total channel: 256 remaining channel: 84
layer index: 17 total channel: 512 remaining channel: 41
layer index: 20 total channel: 512 remaining channel: 46
layer index: 24 total channel: 512 remaining channel: 46
-------------------------------------------------------------------------
channels pruned / channels total: 1793.0 / 2240
pruned ratio: 0.8004464507102966
```
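The reported totals can be re-derived from the per-layer counts in the two logs above. This is only an arithmetic check; `pruned_ratio` here is a small hypothetical helper, not the variable from the loop.

```python
total_per_layer = [64, 128, 256, 256, 512, 512, 512]
remaining_50 = [44, 115, 175, 191, 200, 175, 156]  # from the 50% log
remaining_80 = [35, 107, 88, 84, 41, 46, 46]       # from the 80% log

def pruned_ratio(total, remaining):
    """Return (number of pruned channels, pruned fraction rounded to 4 places)."""
    pruned = sum(total) - sum(remaining)
    return pruned, round(pruned / sum(total), 4)

print(pruned_ratio(total_per_layer, remaining_50))  # (1184, 0.5286)
print(pruned_ratio(total_per_layer, remaining_80))  # (1793, 0.8004)
```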
#### Step5: Save weights to new model (using 80% as the example)
```python=
print(cfg_new)
[1, 35, 107, 88, 84, 41, 46, 46]
```
```python=
# cfg
pruning_cfg = {
'cnn':{
'model': model.cnn,
'skip': [],
'cfg': [1, 35, 107, 88, 84, 41, 46, 46],
'cfg_mask': [],
'cat_layer': []
}
}
```
```python=
# Define the new model architecture from the new cfg
# [1, 35, 107, 88, 84, 41, 46, 46]
newmodel = CRNN(img_channel=1, img_height=128, img_width=128, num_class=5, pruning_cfg=pruning_cfg['cnn']['cfg'])
```
```python=
old_modules = list(model.cnn.modules())
new_modules = list(newmodel.cnn.modules())
layer_id_in_cfg = 0
start_mask = cfg_mask[layer_id_in_cfg]  # mask of the input channels (first entry)
end_mask = cfg_mask[layer_id_in_cfg+1]
```
```python=
import numpy as np

for layer_id in range(len(old_modules)):
    m0 = old_modules[layer_id]
    m1 = new_modules[layer_id]
    # Conv layers
    if isinstance(m0, nn.Conv2d):
        idx0 = np.squeeze(np.argwhere(np.asarray(start_mask.cpu().numpy())))
        idx1 = np.squeeze(np.argwhere(np.asarray(end_mask.cpu().numpy())))
        print('=====================================================')
        print('In shape: {:d}, Out shape {:d}.'.format(idx0.size, idx1.size))
        # start_mask: tensor([1]) # 1-dimensional (the very first layer)
        # idx0: [0] # positions of the 1s
        # end_mask: tensor([0., 1., 1., 0., 0., 1., 1., 0., 0., 1., 1., 0., 0., 1., 1., 0., 0., 1.,
        # 0., 1., 0., 1., 0., 1., 0., 0., 1., 1., 0., 1., 1., 0., 1., 0., 1., 1.,
        # 1., 1., 0., 1., 1., 1., 0., 0., 0., 0., 1., 1., 0., 1., 0., 1., 1., 1.,
        # 1., 0., 0., 0., 0., 0., 0., 0., 0., 0.]) (64 -> 35 channels)
        # positions of the 1s
        # idx1:[ 1 2 5 6 9 10 13 14 17 19 21 23 26 27 29 30 32 34 35 36 37 39 40 41 46 47 49 51 52 53 54]
        # In shape: 1, Out shape 35.
        if idx0.size == 1:
            idx0 = np.resize(idx0, (1,))
        if idx1.size == 1:
            idx1 = np.resize(idx1, (1,))
        w1 = m0.weight.data[:, idx0.tolist(), :, :].clone()  # select kept input channels
        w1 = w1[idx1.tolist(), :, :, :].clone()  # select kept output channels
        m1.weight.data = w1.clone()  # store the new weights
    # BatchNorm layers
    elif isinstance(m0, nn.BatchNorm2d):
        idx1 = np.squeeze(np.argwhere(np.asarray(end_mask.cpu().numpy())))
        if idx1.size == 1:
            idx1 = np.resize(idx1, (1,))
        # end_mask: tensor([0., 1., 1., 0., 0., 1., 1., 0., 0., 1., 1., 0., 0., 1., 1., 0., 0., 1.,
        # 0., 1., 0., 1., 0., 1., 0., 0., 1., 1., 0., 1., 1., 0., 1., 0., 1., 1.,
        # 1., 1., 0., 1., 1., 1., 0., 0., 0., 0., 1., 1., 0., 1., 0., 1., 1., 1.,
        # 1., 0., 0., 0., 0., 0., 0., 0., 0., 0.]) (64 -> 35 channels)
        # positions of the 1s
        # idx1:[ 1 2 5 6 9 10 13 14 17 19 21 23 26 27 29 30 32 34 35 36 37 39 40 41 46 47 49 51 52 53 54]
        # batchnorm shape 35.
        # store the new weights
        m1.weight.data = m0.weight.data[idx1.tolist()].clone()
        m1.bias.data = m0.bias.data[idx1.tolist()].clone()
        m1.running_mean = m0.running_mean[idx1.tolist()].clone()
        m1.running_var = m0.running_var[idx1.tolist()].clone()
        # The last layer would otherwise raise a list-index-out-of-range error, so guard the update
        if layer_id_in_cfg < 6:
            layer_id_in_cfg += 1
            start_mask = end_mask.clone()
            end_mask = cfg_mask[layer_id_in_cfg+1]
```
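The core of the loop above is turning two 0/1 masks into index lists and slicing the weight along the input and output dimensions. Below is a minimal sketch of that idea on a toy nested list; `mask_to_indices` and `prune_conv_weight` are hypothetical names, and each kernel is represented by a string so the selection is easy to read.

```python
def mask_to_indices(mask):
    """Positions whose mask value is non-zero (the role np.argwhere plays above)."""
    return [i for i, keep in enumerate(mask) if keep]

def prune_conv_weight(weight, in_mask, out_mask):
    """weight is indexed [out_channel][in_channel]; keep only the masked entries."""
    in_idx = mask_to_indices(in_mask)
    out_idx = mask_to_indices(out_mask)
    return [[weight[o][i] for i in in_idx] for o in out_idx]

# Toy layer: 3 output channels, 2 input channels, kernels named "w<out><in>".
weight = [["w00", "w01"], ["w10", "w11"], ["w20", "w21"]]
pruned = prune_conv_weight(weight, in_mask=[0, 1], out_mask=[1, 0, 1])
print(pruned)  # [['w01'], ['w21']]
```

The input mask of a layer is the output mask of the previous layer, which is why the loop hands `end_mask` over to `start_mask` after each BatchNorm.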
```python=
# Inspect the new model
# torch.Size order: output channels, input channels, kernel height, kernel width
for i in newmodel.cnn.state_dict():
    if ('conv' in i) and ('weight' in i):
        print(("================= {} =================").format(i.split('.')[0]))
        print('Conv shape: {}'.format(newmodel.cnn.state_dict()[i].shape))
    if ('batchnorm' in i) and ('weight' in i):
        print('Batch shape: {}'.format(newmodel.cnn.state_dict()[i].shape))
```
```python=
================= conv0 =================
Conv shape: torch.Size([35, 1, 3, 3])
Batch shape: torch.Size([35])
================= conv1 =================
Conv shape: torch.Size([107, 35, 3, 3])
Batch shape: torch.Size([107])
================= conv2 =================
Conv shape: torch.Size([88, 107, 3, 3])
Batch shape: torch.Size([88])
================= conv3 =================
Conv shape: torch.Size([84, 88, 3, 3])
Batch shape: torch.Size([84])
================= conv4 =================
Conv shape: torch.Size([41, 84, 3, 3])
Batch shape: torch.Size([41])
================= conv5 =================
Conv shape: torch.Size([46, 41, 3, 3])
Batch shape: torch.Size([46])
================= conv6 =================
Conv shape: torch.Size([46, 46, 2, 2])
Batch shape: torch.Size([46])
```
#### Step6: Save model
```python=
torch.save(newmodel, './weights/newmodel.pth')
```
---
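### 3.2 Implementing L1-norm pruning
[A quick review of L1-norm pruning](https://hackmd.io/1-REXwMvS6SiPKCISjrXHQ?view#31-如何決定裁減哪些-filters?)
* Instead of pruning with a global threshold, we look at each layer's sensitivity and use it to decide how much of each layer can be pruned.
**Q: How do we assess each layer's sensitivity?
A:**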

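- See panel (a). For each filter we compute the sum of its absolute weights $s_j$ and sort the filters by that sum. Figure 2 (a) plots the sorted sums normalized to the range 0~1; the distributions differ widely between convolution layers.
**Q: So which layers can be pruned?
A:**
* <font color=red>Curves with a steeper slope</font> have a more uneven weight distribution: most of the weight comes from the few top-ranked filters, while most filters probably have very small weights, so the layer is <font color=red>less sensitive to pruning</font>. -> Prune it.
* <font color=red>Curves with a flatter slope</font> have a more even weight distribution: most filters carry non-trivial weights and are informative, so the layer is <font color=red>more sensitive to pruning</font>.
**Q: What are the four pruning steps?
A:**
1. For each filter $F_{i,j}$, compute the sum of the absolute values of its kernel weights: $s_{j}=\sum^{n_{i}}_{l=1}\sum|K_{l}|$
2. Sort the filters by $s_j$
3. Starting from the smallest sums, prune $m$ filters and their corresponding feature maps. The kernels in the next layer's filters that correspond to the pruned feature maps must be pruned as well
4. Finally, build the new kernel matrices for layers $i$ and $i+1$, and copy the remaining weights into the new model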
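The four steps can be sketched on plain lists. This is a minimal illustration under simplifying assumptions: each filter is flattened to a list of weights, and `l1_keep_mask` is a hypothetical helper, not the implementation used later in these notes.

```python
def l1_keep_mask(filters, num_keep):
    """filters[j] is a flat list of the weights of filter j.
    Returns a 0/1 mask that keeps the num_keep filters with the largest s_j."""
    s = [sum(abs(w) for w in f) for f in filters]          # step 1: s_j per filter
    order = sorted(range(len(s)), key=lambda j: s[j])      # step 2: ascending by s_j
    keep = set(order[len(s) - num_keep:])                  # step 3: drop the smallest
    return [1 if j in keep else 0 for j in range(len(s))]  # mask consumed in step 4

# Toy layer with four filters (made-up weights); s_j = 0.2, 3.0, 0.05, 1.0.
filters = [[0.1, -0.1], [1.0, -2.0], [0.0, 0.05], [-0.5, 0.5]]
print(l1_keep_mask(filters, num_keep=2))  # [0, 1, 0, 1]
```

The resulting mask plays the same role as a `cfg_mask` entry: it selects the output filters of this layer and the matching input channels of the next one.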
**step1: plot the sensitivity figure**
[The code is here (week5)]()
**Absolute sum of filter weights for each layer:**

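- Sensitivity from low to high (most to least pruning): conv1->conv2->conv7->conv6
**step2: choose per-layer pruning ratios from the sensitivity plot above**
- Sensitivity from low to high (most to least pruning): conv1->conv2->conv7->conv6
- The ratios prune conv1 **the most**, followed by conv2, conv6, and conv7.
- The experiments below use overall pruning ratios of 12%, 52%, and 80%.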
```python=
# Custom per-layer pruning rates (enable one of the three settings)
# conv layer:          1,   2,   3,   4,   5,   6,   7
# pruning_rate_list = [0.6, 0.2, 0, 0, 0, 0.2, 0.2]        # -> 12% overall
# pruning_rate_list = [0.9, 0.7, 0.3, 0.3, 0.3, 0.7, 0.7]  # -> 52% overall
pruning_rate_list = [0.9, 0.9, 0.7, 0.6, 0.7, 0.9, 0.9]    # -> 80% overall
```
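The overall ratios quoted for the three settings can be checked from the per-layer rates and the original channel counts. The sketch below assumes the kept channel count is rounded down with `int(...)`; `overall_pruning_ratio` is a hypothetical helper.

```python
channels = [64, 128, 256, 256, 512, 512, 512]  # original conv output channels

def overall_pruning_ratio(rates, channels=channels):
    """Return (kept channels per layer, overall pruned fraction)."""
    kept = [int(c * (1 - r)) for c, r in zip(channels, rates)]
    return kept, 1 - sum(kept) / sum(channels)

settings = [
    [0.6, 0.2, 0, 0, 0, 0.2, 0.2],
    [0.9, 0.7, 0.3, 0.3, 0.3, 0.7, 0.7],
    [0.9, 0.9, 0.7, 0.6, 0.7, 0.9, 0.9],
]
for rates in settings:
    kept, ratio = overall_pruning_ratio(rates)
    print(kept, round(ratio, 2))
```

For the last setting the kept channels come out as `[6, 12, 76, 102, 153, 51, 51]`, which matches the `cfg_new` listed in step3 (without the leading input channel).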
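**step3: make new "cfg" of channels and mask (using 80% as the example)**
- Note that the L1 pruning method is framed in terms of which filters to keep.
```
original channels (cfg): [1, 64, 128, 256, 256, 512, 512, 512]
new channels (cfg_new): [1, 6, 12, 76, 102, 153, 51, 51]
```
- cfg_new = int(cfg * (1 - pruning rate)), rounded down for each layer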
```python=
import numpy as np

# make cfg_new and mask of cfg_new
cfg = [1]      # original channels (the leading 1 is the input channel)
cfg_new = [1]  # channels kept after pruning (the input channel is 1)
cfg_mask = []
conv_id = 1
for m in model.modules():
    if isinstance(m, nn.Conv2d):
        out_channels = m.weight.data.shape[0]  # original channel count
        # make cfg_new
        num_keep = int(out_channels * (1 - pruning_rate_list[conv_id - 1]))
        cfg_new.append(num_keep)
        cfg.append(out_channels)
        print(f'Conv{conv_id}, total channel:{out_channels}, remaining channel:{num_keep}')
        # make mask of cfg_new:
        # compute the L1 norm of each filter, then build a mask
        # with 1 for the filters to keep and 0 for the rest
        weight_copy = m.weight.data.abs().clone()
        weight_copy = weight_copy.cpu().numpy()
        L1_norm = np.sum(weight_copy, axis=(1, 2, 3))  # sum of |weights| per filter
        arg_max = np.argsort(L1_norm)  # indices sorted from smallest to largest norm
        arg_max_rev = arg_max[::-1][:cfg_new[conv_id]]  # indices of the filters to keep
        mask = torch.zeros(out_channels)  # all-zero tensor of length out_channels
        mask[arg_max_rev.tolist()] = 1  # mark the filters to keep with 1
        cfg_mask.append(mask)  # yields [tensor of 0/1, tensor of 0/1, ...]
        conv_id += 1
```
output:
```
Conv1, total channel:64, remaining channel:6
Conv2, total channel:128, remaining channel:12
Conv3, total channel:256, remaining channel:76
Conv4, total channel:256, remaining channel:102
Conv5, total channel:512, remaining channel:153
Conv6, total channel:512, remaining channel:51
Conv7, total channel:512, remaining channel:51
cfg: [1, 64, 128, 256, 256, 512, 512, 512]
cfg_new: [1, 6, 12, 76, 102, 153, 51, 51] # the leading 1 is added manually (input channel)
cfg_mask:
[tensor([1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 1., 1., 0., 0., 0., 0., 0., 0., 0., 0., 1., 1.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0.]),
tensor([1., 0., 0., 0., 0., 0., 1., 0., 1., 0., 1., 1., 1., 1., 1., 1., 0., 1.,
0., 0., 0., 0., 0., 0., 1., 1., 0., 0., 0., 1., 1., 1., 0., 0., 1., 1.,
1., 1., 1., 1., 1., 1., 1., 1., 0., 0., 0., 0., 0., 0., 1., 1., 0., 1.,
0., 1., 0., 0., 1., 0., 0., 0., 1., 1., 1., 0., 0., 0., 0., 0., 1., 0., 1., 0., 1., 1., 1., 1., 1., 1., 0., 1.,
0., 0., 0., 0., 0., 0., 1., 1., 0., 0., 0., 1., 1., 1., 0., 0., 1., 1.,
1., 1., 1., 1., 1., 1., 1., 1., 0., 0., 0., 0., 0., 0., 1., 1., 0., 1.,
0., 1., 0., 0., 1., 0., 0., 0., 1., 1.]),.....]
```
**step5: save weights to new model (using 80% as the example)**
```python=
# Define the new model architecture from the new cfg
newmodel = CRNN(img_channel=1,
img_height=128,
img_width=128,
num_class=5,
pruning_cfg=cfg_new)
```
```python=
start_mask = torch.ones(1)  # input channel
layer_id_in_cfg = 0
end_mask = cfg_mask[layer_id_in_cfg]
for [m0, m1] in zip(model.modules(), newmodel.modules()):
    # Conv
    if isinstance(m0, nn.Conv2d):
        # input channels: collect the positions where the mask is 1 into an array
        idx0 = np.squeeze(np.argwhere(np.asarray(start_mask.cpu().numpy())))
        # output channels: collect the positions where the mask is 1 into an array
        idx1 = np.squeeze(np.argwhere(np.asarray(end_mask.cpu().numpy())))
        print('In shape: {:d}, Out shape {:d}.'.format(idx0.size, idx1.size))
        if idx0.size == 1:
            idx0 = np.resize(idx0, (1,))
        if idx1.size == 1:
            idx1 = np.resize(idx1, (1,))
        # first select the input channels to keep
        w1 = m0.weight.data[:, idx0.tolist(), :, :].clone()
        # then select the output filters to keep
        w1 = w1[idx1.tolist(), :, :, :].clone()
        # store them in newmodel
        m1.weight.data = w1.clone()
    # BatchNorm
    elif isinstance(m0, nn.BatchNorm2d):
        # collect the positions where the mask is 1 into an array
        idx1 = np.squeeze(np.argwhere(np.asarray(end_mask.cpu().numpy())))
        if idx1.size == 1:
            idx1 = np.resize(idx1, (1,))
        # copy the selected parameters straight into newmodel
        m1.weight.data = m0.weight.data[idx1.tolist()].clone()
        m1.bias.data = m0.bias.data[idx1.tolist()].clone()
        m1.running_mean = m0.running_mean[idx1.tolist()].clone()
        m1.running_var = m0.running_var[idx1.tolist()].clone()
        layer_id_in_cfg += 1
        # this layer's output mask is the next layer's input mask
        start_mask = end_mask
        if layer_id_in_cfg < len(cfg_mask):  # do not change in Final FC
            end_mask = cfg_mask[layer_id_in_cfg]
    # Linear
    elif isinstance(m0, nn.Linear):
        if layer_id_in_cfg == len(cfg_mask):
            # mask of the last prunable conv layer
            idx0 = np.squeeze(np.argwhere(np.asarray(cfg_mask[-1].cpu().numpy())))
            if idx0.size == 1:
                idx0 = np.resize(idx0, (1,))
            # NOTE: selecting columns by channel index assumes in_features
            # equals the channel count (i.e. output_height == 1)
            m1.weight.data = m0.weight.data[:, idx0].clone()
            m1.bias.data = m0.bias.data.clone()
            layer_id_in_cfg += 1
            continue
        m1.weight.data = m0.weight.data.clone()
        m1.bias.data = m0.bias.data.clone()
```
```python=
# Inspect the new model
# torch.Size order: output channels, input channels, kernel height, kernel width
for i in newmodel.cnn.state_dict():
    if ('conv' in i) and ('weight' in i):
        print(("================= {} =================").format(i.split('.')[0]))
        print('Conv shape: {}'.format(newmodel.cnn.state_dict()[i].shape))
    if ('batchnorm' in i) and ('weight' in i):
        print('Batch shape: {}'.format(newmodel.cnn.state_dict()[i].shape))
```
output:
```
================= conv0 =================
Conv shape: torch.Size([6, 1, 3, 3])
Batch shape: torch.Size([6])
================= conv1 =================
Conv shape: torch.Size([12, 6, 3, 3])
Batch shape: torch.Size([12])
================= conv2 =================
Conv shape: torch.Size([76, 12, 3, 3])
Batch shape: torch.Size([76])
================= conv3 =================
Conv shape: torch.Size([102, 76, 3, 3])
Batch shape: torch.Size([102])
================= conv4 =================
Conv shape: torch.Size([153, 102, 3, 3])
Batch shape: torch.Size([153])
================= conv5 =================
Conv shape: torch.Size([51, 153, 3, 3])
Batch shape: torch.Size([51])
================= conv6 =================
Conv shape: torch.Size([51, 51, 2, 2])
Batch shape: torch.Size([51])
```
**step6: save model**
```python=
# save newmodel
torch.save({'state_dict': newmodel.state_dict(),
'acc': 0,
'pruning_cfg': cfg_new},
'./pruning_checkpoints/newmodel_pruning_l1_80.pth')
```
## 4. Analysis of Experimental Results
| Item | New model (bn) | New model (bn) | New model (l1 norm) | New model (l1 norm) | New model (l1 norm) | Original model |
|----- |-------|--------|-------|--------|-------|--------|
| Channel pruning ratio | 52 % | 80 % | 12 % | 52 % | 80 % | 0 % |
| Parameters (reduction) | 3.57 M (<font color=#F6D55C>54 %</font>) | 2.50 M (<font color=#F6D55C>68 %</font>) | 6.87 M (<font color=#F6D55C>12 %</font>) | 3.79 M (<font color=#F6D55C>52 %</font>) | 2.56 M (<font color=#F6D55C>67 %</font>) | 7.84 M |
| Model file size | 14 MB | 9.6 MB | 27 MB | 15 MB | 9.9 MB | 30 MB |
| Inference time (50 runs) | 37.4 ms ± 6.21 ms | 37.3 ms ± 4.05 ms | 8.03 ms ± 0.47 ms | 10.6 ms ± 0.32 ms | 8.38 ms ± 0.30 ms | 11.7 ms ± 0.92 ms |
| MACs (reduction) | 0.26 G (<font color=#F6D55C>61 %</font>) | 0.12 G (<font color=#F6D55C>82 %</font>) | 0.56 G (<font color=#F6D55C>16 %</font>) | 0.24 G (<font color=#F6D55C>64 %</font>) | 0.09 G (<font color=#F6D55C>87 %</font>) | 0.67 G |
| Best model accuracy | 97.28 % | 96.83 % | 97.42 % | 96.52 % | 95.74 % | 97.37 % |
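The reduction percentages in the table follow from `1 - pruned / original`. The short check below recomputes them from the parameter counts and MACs listed above; `reduction_pct` is a hypothetical helper and the dictionary keys are just labels for the table columns.

```python
def reduction_pct(pruned_value, baseline):
    """Percentage saved relative to the original model, rounded to an integer."""
    return round(100 * (1 - pruned_value / baseline))

# Values copied from the table (parameters in M, MACs in G).
params_m = {"bn 52%": 3.57, "bn 80%": 2.50, "l1 12%": 6.87, "l1 52%": 3.79, "l1 80%": 2.56}
macs_g = {"bn 52%": 0.26, "bn 80%": 0.12, "l1 12%": 0.56, "l1 52%": 0.24, "l1 80%": 0.09}

for name in params_m:
    print(name, reduction_pct(params_m[name], 7.84), reduction_pct(macs_g[name], 0.67))
```

Note that an 80% channel reduction removes only about two thirds of the parameters: the count also includes the parts of the model that the channel pruning does not touch, such as the LSTM layers.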
### 4.1 L1-norm-pruning results
### <ins>4.1.1 finetune / from scratch (pruning rate 52%)</ins>

- Compares the validation accuracy of finetune and from scratch at a 52% pruning rate
- Convergence speed: finetune > from scratch
- Finetuning is clearly more stable than training from scratch
(The 12% and 80% pruning rates show the same behaviour.)
### <ins>4.1.2 finetune (pruning rate 12% / 52% / 80%)</ins>

- Compares the validation accuracy of finetune at pruning rates of 12% / 52% / 80%
- Convergence speed: pruning rate 12% (green) > pruning rate 52% (blue) > pruning rate 80% (orange)
### <ins>4.1.3 from scratch (pruning rate 12% / 52% / 80%)</ins>

- Compares the accuracy of from scratch at pruning rates of 12% / 52% / 80%
- Convergence speed: pruning rate 12% (green) and pruning rate 52% (orange) are about equal > pruning rate 80% (blue)
- Compared with the previous figure (4.1.2 finetune), every from scratch run is clearly less stable
### <ins>4.1.4 finetune / from scratch (pruning rate 12% / 52% / 80%)</ins>

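- Compares the accuracy of finetune and from scratch at pruning rates of 12% / 52% / 80%
- The finetune runs (brown / blue / green) are all more stable than the from scratch runs (red / purple / orange)
- Convergence speed: finetune / 12% (brown) > finetune / 52% (blue) > from scratch / 12% (purple) > from scratch / 52% (red) > from scratch / 80% (orange) > finetune / 80% (green)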
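## 5. Difficulties and Challenges
1. After pruning by weights or by scaling factors, <font color=#F6D55C>the model's accuracy drops straight to zero, and we have not yet found the exact cause</font>. Even so, during finetuning the models that start from the pretrained weights clearly train faster and end up better.
2. <font color=#F6D55C>Inference did not get faster after pruning and was sometimes slower than before</font> (in the table above, the bn-pruned models take about 37 ms versus 11.7 ms for the original), even though the FLOPs (amount of computation) are lower. We have not found the cause yet (the earlier YOLO inference experiments showed a similar issue).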
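One thing that may be worth ruling out for the second point is measurement noise, for example a slow first call being included in the average. The sketch below is only a generic timing harness with warm-up runs, not a diagnosis of the slowdown; `benchmark` is a hypothetical helper and the workload is a stand-in for one forward pass.

```python
import statistics
import time

def benchmark(fn, runs=50, warmup=5):
    """Call fn() `warmup` times untimed, then time `runs` calls.
    Returns (mean, standard deviation) in milliseconds."""
    for _ in range(warmup):
        fn()
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.mean(samples), statistics.stdev(samples)

# Stand-in workload; in practice fn would wrap one forward pass of the model.
mean_ms, std_ms = benchmark(lambda: sum(range(10000)))
print(f"{mean_ms:.3f} ms +- {std_ms:.3f} ms")
```

If the model runs on a GPU, the timed call would also have to wait for the device to finish (for example with `torch.cuda.synchronize()`), because CUDA kernels are launched asynchronously.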