# Week 15: YOLOv4 Model Compression Implementation
###### tags: `Technical Seminar`
## Recap
## 1. [YOLOv4 Architecture Overview](https://zhuanlan.zhihu.com/p/150127712)

* overall modules
* backbone
* downsample1~5 (the blocks containing the residual-block structure)
* neck
* head

For the final output channels: each feature map predicts x, y, w, h, objectness (probability), and one probability per class (5 classes here), so (5 + 5) * 3 bboxes = 30 channels.
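The arithmetic can be sketched in plain Python (the helper name `yolo_output_channels` is made up for illustration):

```python
def yolo_output_channels(n_classes, n_anchors_per_scale=3):
    # each anchor predicts x, y, w, h, objectness, plus one probability per class
    return (4 + 1 + n_classes) * n_anchors_per_scale

print(yolo_output_channels(5))   # 30  (this project's 5-class case)
print(yolo_output_channels(80))  # 255 (the COCO default used in models.py)
```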
## 2. YOLOv4 架構調整
### 2-1 Walking through the main `Yolov4` class in models.py
First, find the main class!
```python=
class Yolov4(nn.Module):
    """Main Yolov4 architecture"""
    def __init__(self, yolov4conv137weight=None, n_classes=80, inference=False):
        super().__init__()
        output_ch = (4 + 1 + n_classes) * 3  # (4 + 1 + 80) * 3 = 255
        # backbone
        self.down1 = DownSample1()
        self.down2 = DownSample2()
        self.down3 = DownSample3()
        self.down4 = DownSample4()
        self.down5 = DownSample5()
        # neck
        self.neek = Neck(inference)
        # yolov4conv137
        if yolov4conv137weight:
            _model = nn.Sequential(self.down1, self.down2, self.down3, self.down4, self.down5, self.neek)
            pretrained_dict = torch.load(yolov4conv137weight)
            model_dict = _model.state_dict()
            # 1. filter out unnecessary keys
            pretrained_dict = {k1: v for (k, v), k1 in zip(pretrained_dict.items(), model_dict)}
            # 2. overwrite entries in the existing state dict
            model_dict.update(pretrained_dict)
            _model.load_state_dict(model_dict)
        # head
        self.head = Yolov4Head(output_ch, n_classes, inference)  # 255, 80, True/False

    def forward(self, input):
        """Forward pass of the full Yolov4 model"""
        d1 = self.down1(input)  # input is the image
        d2 = self.down2(d1)
        d3 = self.down3(d2)
        d4 = self.down4(d3)
        d5 = self.down5(d4)
        x20, x13, x6 = self.neek(d5, d4, d3)
        output = self.head(x20, x13, x6)
        return output
```
Whew... that's a long one. Where should we start reading?
1. Start from forward() (please just ignore `neek`: it is really the neck; the author misspelled it once and the typo stuck ever since...)
2. The architecture splits into 3 parts:
(1) 5 downsample blocks chained in order
(2) neck
(3) head
3. How the 3 parts connect:
(1) down1 -> down2 -> down3 -> down4 -> down5
(2) down3, 4, 5 -> neck :star:
(3) the neck's conv20, conv13, conv6 -> head :star:
(4) head -> output
4. Finally, zoom in on the detailed definition of each of these 3 parts:
everything is clearly defined inside __init__; below we use the head as the example, so let's look at the head.
### 2-2 Walking through the `Yolov4Head` class in models.py
Using Yolov4Head as the example; Downsample & Neck follow the same pattern.
Each conv-X below is pre-activation (the order is BN-ReLU-Conv, BN-ReLU-Conv):
```
(conv6): Bn_Activation_Conv(
  (conv): ModuleList(
    (0): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (1): LeakyReLU(negative_slope=0.1, inplace=True)
    (2): Conv2d(512, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
  )
)
```
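From that printout, `Bn_Activation_Conv` is roughly the following. This is a hypothetical reconstruction, not the repo's actual code: the class name, argument order `(in_ch, out_ch, kernel, stride, activation)`, and the `bn`/`bias` keywords are inferred from the calls shown below; the padding rule and treating `'linear'` as "no activation" are assumptions.

```python
import torch
import torch.nn as nn

class Bn_Activation_Conv(nn.Module):
    """Pre-activation block: BN -> activation -> Conv (hypothetical sketch)."""
    def __init__(self, in_ch, out_ch, kernel_size, stride, activation,
                 bn=True, bias=False):
        super().__init__()
        pad = (kernel_size - 1) // 2  # assumed "same" padding
        layers = []
        if bn:
            layers.append(nn.BatchNorm2d(in_ch))
        if activation == 'leaky':
            layers.append(nn.LeakyReLU(0.1, inplace=True))
        # 'linear' is assumed to mean: no activation at all
        layers.append(nn.Conv2d(in_ch, out_ch, kernel_size, stride, pad, bias=bias))
        self.conv = nn.ModuleList(layers)

    def forward(self, x):
        for layer in self.conv:
            x = layer(x)
        return x

block = Bn_Activation_Conv(512, 256, 1, 1, 'leaky')
out = block(torch.randn(1, 512, 8, 8))
print(out.shape)  # torch.Size([1, 256, 8, 8])
```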
1. Walk through it together with the architecture diagram (look at the gray and orange labels first)
[scroll way down for the diagram](https://hackmd.io/p9GjDOm2SgGUyoz7ZiGyOg?both#畫出架構)
2. :star: What do we prune against? The *next* layer's BN layer.
Remember these are pre-activation layers;
the order is BN-ReLU-Conv, BN-ReLU-Conv.
3. Next, decide which layers to skip (the blue and pink labels) (we do not prune these in this pass):
(1) layers involved in a concat
(2) layers coupled to the Neck
(3) layers without a BatchNorm layer
4. Finally, settle on the cfg (the red labels)
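The rule above — prune a conv according to the next BN layer — boils down to thresholding that BN's scaling factors (gamma). A toy NumPy illustration with made-up values:

```python
import numpy as np

gamma = np.array([1.2, 0.03, 0.7, 0.45, 0.01, 0.9])  # |gamma| of the next BN layer
thre = 0.49  # global threshold (computed over all BN layers, see Step 3)

mask = (np.abs(gamma) > thre).astype(float)  # 1 = keep, 0 = prune
keep_idx = np.where(mask == 1)[0]
print(mask)             # [1. 0. 1. 0. 0. 1.]
print(keep_idx)         # [0 2 5]
print(int(mask.sum()))  # 3 channels remain out of 6
```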
```python
class Yolov4Head(nn.Module):
    """Yolov4 head definition"""
    def __init__(self, output_ch, n_classes, inference=False):
        super().__init__()
        self.inference = inference
        self.conv1 = Bn_Activation_Conv(128, 256, 3, 1, 'leaky')
        self.conv2 = Bn_Activation_Conv(256, output_ch, 1, 1, 'linear', bn=False, bias=True)
        self.yolo1 = YoloLayer(
            anchor_mask=[0, 1, 2], num_classes=n_classes,
            anchors=[12, 16, 19, 36, 40, 28, 36, 75, 76, 55, 72, 146, 142, 110, 192, 243, 459, 401],
            num_anchors=9, stride=8)
        # R -4
        self.conv3 = Bn_Activation_Conv(128, 256, 3, 2, 'leaky')
        # R -1 -16
        self.conv4 = Bn_Activation_Conv(512, 256, 1, 1, 'leaky')
        self.conv5 = Bn_Activation_Conv(256, 512, 3, 1, 'leaky')
        self.conv6 = Bn_Activation_Conv(512, 256, 1, 1, 'leaky')
        self.conv7 = Bn_Activation_Conv(256, 512, 3, 1, 'leaky')
        self.conv8 = Bn_Activation_Conv(512, 256, 1, 1, 'leaky')
        self.conv9 = Bn_Activation_Conv(256, 512, 3, 1, 'leaky')
        self.conv10 = Bn_Activation_Conv(512, output_ch, 1, 1, 'linear', bn=False, bias=True)
        self.yolo2 = YoloLayer(
            anchor_mask=[3, 4, 5], num_classes=n_classes,
            anchors=[12, 16, 19, 36, 40, 28, 36, 75, 76, 55, 72, 146, 142, 110, 192, 243, 459, 401],
            num_anchors=9, stride=16)
        # R -4
        self.conv11 = Bn_Activation_Conv(256, 512, 3, 2, 'leaky')
        # R -1 -37
        self.conv12 = Bn_Activation_Conv(1024, 512, 1, 1, 'leaky')
        self.conv13 = Bn_Activation_Conv(512, 1024, 3, 1, 'leaky')
        self.conv14 = Bn_Activation_Conv(1024, 512, 1, 1, 'leaky')
        self.conv15 = Bn_Activation_Conv(512, 1024, 3, 1, 'leaky')
        self.conv16 = Bn_Activation_Conv(1024, 512, 1, 1, 'leaky')
        self.conv17 = Bn_Activation_Conv(512, 1024, 3, 1, 'leaky')
        self.conv18 = Bn_Activation_Conv(1024, output_ch, 1, 1, 'linear', bn=False, bias=True)
        self.yolo3 = YoloLayer(
            anchor_mask=[6, 7, 8], num_classes=n_classes,
            anchors=[12, 16, 19, 36, 40, 28, 36, 75, 76, 55, 72, 146, 142, 110, 192, 243, 459, 401],
            num_anchors=9, stride=32)

    def forward(self, input1, input2, input3):  # neck channels (x20, x13, x6) = (128, 256, 512)
        """Forward pass of the Yolov4 head"""
        x1 = self.conv1(input1)  # neck channels (x20) = 128
        x2 = self.conv2(x1)  # no BN; x2 goes straight to an output
        x3 = self.conv3(input1)  # neck channels (x20) = 128
        # R -1 -16
        x3 = torch.cat([x3, input2], dim=1)  # x3 channels = 256, neck channels (x13) = 256
        x4 = self.conv4(x3)  # skip the layer right after the cat: pruning conv4 would force pruning x3 and input2 too, otherwise the channels no longer line up
        x5 = self.conv5(x4)
        x6 = self.conv6(x5)
        x7 = self.conv7(x6)
        x8 = self.conv8(x7)
        x9 = self.conv9(x8)  # from x8 (conv8's output)
        x10 = self.conv10(x9)  # no BN; x10 goes straight to an output
        # R -4
        x11 = self.conv11(x8)  # from x8 (conv8's output)
        # R -1 -37
        x11 = torch.cat([x11, input3], dim=1)  # x11 channels = 512, neck channels (x6) = 512; also coupled to x8 (conv11's input) and to conv9
        x12 = self.conv12(x11)  # skip the layer right after the cat: pruning conv12 would force pruning x11 and input3 too, otherwise the channels no longer line up
        x13 = self.conv13(x12)
        x14 = self.conv14(x13)
        x15 = self.conv15(x14)
        x16 = self.conv16(x15)
        x17 = self.conv17(x16)
        x18 = self.conv18(x17)  # no BN; x18 goes straight to an output
        if self.inference:
            y1 = self.yolo1(x2)
            y2 = self.yolo2(x10)
            y3 = self.yolo3(x18)
            return get_region_boxes([y1, y2, y3])
        else:
            return [x2, x10, x18]
```

### 2-3 Drawing the architecture

### 2-4 Swapping in the cfg
```python
#  0    1    2    3    4    5    6    7    8    9    10    11   12    13   14
#  s    s    s                        s    s    s
[128, 128, 512, 256, 512, 256, 512, 256, 256, 1024, 512, 1024, 512, 1024, 512]
```
```python
class Yolov4Head(nn.Module):
    """Yolov4 head definition, adjusted to build from a cfg"""
    def __init__(self, output_ch, n_classes, inference=False, cfg=None):
        """
        original: [128, 128, 512, 256, 512, 256, 512, 256, 256, 1024, 512, 1024, 512, 1024, 512]
        new: [128, 128, 512, 113, 253, 136, 241, 256, 256, 1024, 254, 494, 257, 500, 248]
        skip: [0, 1, 2, 7, 8, 9]
        """
        super().__init__()
        self.inference = inference
        # x1 = self.conv1(input1)
        # input1: (the neck's x20) channels = 128
        # 128, 256 -> 128, 256 (the next layer has no BN, so leave it alone)
        self.conv1 = Bn_Activation_Conv(cfg['head'][0], 256, 3, 1, 'leaky')
        # bn=False, nothing to prune
        self.conv2 = Bn_Activation_Conv(256, output_ch, 1, 1, 'linear', bn=False, bias=True)
        self.yolo1 = YoloLayer(
            anchor_mask=[0, 1, 2], num_classes=n_classes,
            anchors=[12, 16, 19, 36, 40, 28, 36, 75, 76, 55, 72, 146, 142, 110, 192, 243, 459, 401],
            num_anchors=9, stride=8)
        # R -4
        # x3 = self.conv3(input1)
        # input1: (the neck's x20 = 128)
        # 128, 256 -> 128, 256 (the next layer is a cat, so don't prune)
        self.conv3 = Bn_Activation_Conv(cfg['head'][1], 256, 3, 2, 'leaky')
        # R -1 -16
        # (skip cat) conv3's 256 + the neck x13's 256 (skipped, so the cfg keeps the same number)
        # 512, 256 -> 512 (256 + 256), 113
        self.conv4 = Bn_Activation_Conv(cfg['head'][2], cfg['head'][3], 1, 1, 'leaky')
        # 256, 512 -> 113, 253
        self.conv5 = Bn_Activation_Conv(cfg['head'][3], cfg['head'][4], 3, 1, 'leaky')
        # 512, 256 -> 253, 136
        self.conv6 = Bn_Activation_Conv(cfg['head'][4], cfg['head'][5], 1, 1, 'leaky')
        # 256, 512 -> 136, 241
        self.conv7 = Bn_Activation_Conv(cfg['head'][5], cfg['head'][6], 3, 1, 'leaky')
        # 512, 256 -> 241, 256
        self.conv8 = Bn_Activation_Conv(cfg['head'][6], cfg['head'][7], 1, 1, 'leaky')
        # 256, 512 -> 256 (follows conv8's output channel), 512 (the next layer has no BN, so leave it alone)
        self.conv9 = Bn_Activation_Conv(cfg['head'][7], 512, 3, 1, 'leaky')
        # bn=False, nothing to prune
        self.conv10 = Bn_Activation_Conv(512, output_ch, 1, 1, 'linear', bn=False, bias=True)
        self.yolo2 = YoloLayer(
            anchor_mask=[3, 4, 5], num_classes=n_classes,
            anchors=[12, 16, 19, 36, 40, 28, 36, 75, 76, 55, 72, 146, 142, 110, 192, 243, 459, 401],
            num_anchors=9, stride=16)
        # R -4
        # conv11's input channel comes from conv8's output channel, i.e. cfg['head'][7]
        # 256, 512 -> 256 (follows conv8's output channel), 512 (the next layer is a cat, so leave it alone)
        self.conv11 = Bn_Activation_Conv(cfg['head'][8], 512, 3, 2, 'leaky')
        # R -1 -37
        # (skip cat) conv11's 512 + the neck x6's 512
        # 1024, 512 -> 1024, 254
        self.conv12 = Bn_Activation_Conv(cfg['head'][9], cfg['head'][10], 1, 1, 'leaky')
        # 512, 1024 -> 254, 494
        self.conv13 = Bn_Activation_Conv(cfg['head'][10], cfg['head'][11], 3, 1, 'leaky')
        # 1024, 512 -> 494, 257
        self.conv14 = Bn_Activation_Conv(cfg['head'][11], cfg['head'][12], 1, 1, 'leaky')
        # 512, 1024 -> 257, 500
        self.conv15 = Bn_Activation_Conv(cfg['head'][12], cfg['head'][13], 3, 1, 'leaky')
        # 1024, 512 -> 500, 248
        self.conv16 = Bn_Activation_Conv(cfg['head'][13], cfg['head'][14], 1, 1, 'leaky')
        # 512, 1024 -> 248, 1024 (the next layer has no BN, so leave it alone)
        self.conv17 = Bn_Activation_Conv(cfg['head'][14], 1024, 3, 1, 'leaky')
        # bn=False, nothing to prune
        self.conv18 = Bn_Activation_Conv(1024, output_ch, 1, 1, 'linear', bn=False, bias=True)
        self.yolo3 = YoloLayer(
            anchor_mask=[6, 7, 8], num_classes=n_classes,
            anchors=[12, 16, 19, 36, 40, 28, 36, 75, 76, 55, 72, 146, 142, 110, 192, 243, 459, 401],
            num_anchors=9, stride=32)
```
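Before wiring a new cfg in, the channel bookkeeping described in the comments can be sanity-checked mechanically. A small sketch in plain Python, using the original/new/skip lists from the docstring above:

```python
original = [128, 128, 512, 256, 512, 256, 512, 256, 256, 1024, 512, 1024, 512, 1024, 512]
new_cfg  = [128, 128, 512, 113, 253, 136, 241, 256, 256, 1024, 254, 494, 257, 500, 248]
skip     = [0, 1, 2, 7, 8, 9]

# skipped entries must stay untouched
for i in skip:
    assert new_cfg[i] == original[i]

# index 2 feeds conv4 right after the concat of conv3 (256) and the neck's x13 (256)
assert new_cfg[2] == 256 + 256
# index 9 feeds conv12 right after the concat of conv11 (512) and the neck's x6 (512)
assert new_cfg[9] == 512 + 512
# conv11 reuses conv8's output, so index 8 must equal index 7
assert new_cfg[8] == new_cfg[7]
print('cfg is consistent')
```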
---
## 3. Pruning Implementation

### Step1: Load model
```python=
model = torch.load('weights/yolov4.pt')
```
### Step2: Set parameters
```python=
# pruning ratio
pruning_rate = 0.5
```
```python=
# cfg
'''
model: the submodule for each part of the architecture
skip: layer indices that are NOT pruned
cfg: number of channels remaining after pruning
cfg_mask: positions of the remaining channels after pruning
cat_layer: layer indices that involve a concat
'''
pruning_cfg = {
    'down1': {
        'model': model.down1,
        'skip': [3, 8, 13, 18, 23, 28, 33, 38],
        'cfg': [],
        'cfg_mask': [],
        'cat_layer': [15, 35]
    },
    'down2': {
        'model': model.down2,
        'skip': [3, 8, 13, 21, 26, 32, 37, 42, 47],
        'cfg': [],
        'cfg_mask': [],
        'cat_layer': [10, 44]
    },
    ...
    'neck': {
        'model': model.neek,
        'skip': [3, 21, 31, 36, 42, 47, 67, 72, 78, 83],  # 3, 42, 78 affect the downsample blocks, so skip them for now / (x13, x6) correspond to (72, 36); conv14 feeds the head, so it cannot be pruned
        'cfg': [],
        'cfg_mask': [],
        'cat_layer': [15, 38, 44, 74, 80]
    },
    'head': {
        'model': model.head,
        'skip': [3, 12, 17, 42, 51, 56],
        'cfg': [128],
        'cfg_mask': [],
        'cat_layer': [5, 14, 44, 53, 83]
    }
}
```
### Step3: Compute threshold
```python=
"""Compute the global threshold"""
# count the total number of channels
total = 0
for m in model.neek.modules():
    if isinstance(m, nn.BatchNorm2d):
        total += m.weight.data.shape[0]  # m.weight is gamma
# m.weight.data.shape[0]: 64 64 128 128 256 256 256 256 512 512 512 512 512 512 512 512 (channels) (baseline)
# total: 5504 (5504 channels in total)

# store the absolute value of every gamma into bn
bn = torch.zeros(total)  # 1-D, length total
index = 0
for m in model.neek.modules():
    if isinstance(m, nn.BatchNorm2d):
        size = m.weight.data.shape[0]  # channels
        bn[index:(index + size)] = m.weight.data.abs().clone()
        index += size
        # index + size: 0 + 64
        # bn[0:64] -> bn[64:128]
        # -> ...
        # -> bn[4480:4992]
        # -> bn[4992:5504]
# bn: tensor([1.2170, 0.7687, ..., 0.5076, 0.4496]) (length 5504), all gammas stored

# sort ascending
y, i = torch.sort(bn)  # small -> large
thre_index = int(total * pruning_rate)  # scale sparse rate 0.5 (pruning ratio)
thre = y[thre_index] if thre_index != 0 else 0  # take the thre_index-th value as the threshold; thre_index == 0 means everything is kept, so use 0 directly instead of indexing
# later every gamma is compared against thre, producing a 0/1 tensor; channels above thre are kept (channels below it are never copied into newmodel)
print('Global threshold: {}'.format(thre))
print('Total channels: {}'.format(total))
```
```python=
Global threshold: 0.4899449348449707
Total channels: 10752
```
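The same threshold logic on a toy scale, in NumPy with made-up gamma values (mirroring the `torch.sort` / `thre_index` steps above):

```python
import numpy as np

# |gamma| from two imaginary BN layers (4 + 6 = 10 channels total)
bn_gammas = [np.array([1.2, 0.05, 0.8, 0.3]),
             np.array([0.9, 0.02, 0.6, 0.4, 1.1, 0.07])]

all_gammas = np.sort(np.concatenate(bn_gammas))  # ascending, like torch.sort
pruning_rate = 0.5
thre_index = int(len(all_gammas) * pruning_rate)
thre = all_gammas[thre_index] if thre_index != 0 else 0

print(all_gammas)  # [0.02 0.05 0.07 0.3  0.4  0.6  0.8  0.9  1.1  1.2 ]
print(thre)        # 0.6 -> only channels with |gamma| > 0.6 survive
```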
### Step4: Start pruning
```python=
"""Record which channels stay and which get pruned"""
pruned = 0
cfg_new = []   # remaining channels per layer
cfg_mask = []  # per-layer 0/1 mask marking pruning; e.g. with channels=3, a cfg_mask entry looks like [0, 1, 1]
for k, m in enumerate(model.neek.modules()):
    if isinstance(m, nn.BatchNorm2d):
        thre_ = 0 if k in pruning_cfg['neck']['skip'] else thre  # skipped layers use thre = 0
        weight_copy = m.weight.data.abs().clone()
        mask = weight_copy.gt(thre_).float()  # compare: mark 1 if larger, 0 if smaller, store in mask
        cfg_new.append(int(torch.sum(mask)))
        cfg_mask.append(mask.clone())
        # cfg: 254 (512 -> 254)
        # cfg_mask: [tensor([0., 1., 1., ... 0., 0., 0.])], length 512
        pruned = pruned + mask.shape[0] - torch.sum(mask)  # for the pruning ratio
        print('layer index: {:d} \t total channel: {:d} \t remaining channel: {:d}'.
              format(k, mask.shape[0], int(torch.sum(mask))))
pruned_ratio = pruned / total
print('-------------------------------------------------------------------------')
print('channels pruned / channels total: {} / {}'.format(pruned, total))
print('pruned ratio: {}'.format(pruned_ratio))
```
```python=
layer index: 3 total channel: 1024 remaining channel: 1024
layer index: 8 total channel: 512 remaining channel: 254
layer index: 13 total channel: 1024 remaining channel: 515
layer index: 21 total channel: 2048 remaining channel: 2048
layer index: 26 total channel: 512 remaining channel: 253
layer index: 31 total channel: 1024 remaining channel: 1024
layer index: 36 total channel: 512 remaining channel: 512
layer index: 42 total channel: 512 remaining channel: 512
layer index: 47 total channel: 512 remaining channel: 512
layer index: 52 total channel: 256 remaining channel: 134
layer index: 57 total channel: 512 remaining channel: 262
layer index: 62 total channel: 256 remaining channel: 119
layer index: 67 total channel: 512 remaining channel: 512
layer index: 72 total channel: 256 remaining channel: 256
layer index: 78 total channel: 256 remaining channel: 256
layer index: 83 total channel: 256 remaining channel: 256
layer index: 88 total channel: 128 remaining channel: 63
layer index: 93 total channel: 256 remaining channel: 142
layer index: 98 total channel: 128 remaining channel: 68
layer index: 103 total channel: 256 remaining channel: 125
-------------------------------------------------------------------------
channels pruned / channels total: 1905.0 / 10752
pruned ratio: 0.1771763414144516
```
### Step5: Save weights to new model
```python=
print(cfg_new)
# [1024, 254, 515, 2048, 253, 1024, 512, 512, 512, 134, 262, 119, 512, 256, 256, 256, 63, 142, 68, 125]
```
```python=
# cfg
pruning_cfg = {
    ...
    'neck': {
        'model': model.neek,
        'skip': [3, 21, 31, 36, 42, 47, 67, 72, 78, 83],  # 3, 42, 78 affect the downsample blocks, so skip them for now / (x13, x6) correspond to (72, 36); conv14 feeds the head, so it cannot be pruned
        'cfg': cfg_new,
        'cfg_mask': cfg_mask,
        'cat_layer': [15, 38, 44, 74, 80]
    },
    ...
}
```
```python=
# define the new model architecture from the new cfg
# the pruned channel counts should come out as:
# [1024, 254, 515, 2048, 253, 1024, 512, 512, 512, 134, 262, 119, 512, 256, 256, 256, 63, 142, 68, 125]
newmodel = Yolov4(pruning_cfg=pruning_cfg)  # an empty newmodel with exactly the channel counts we want
```
```python=
old_modules = list(model.neek.modules())
new_modules = list(newmodel.neek.modules())
layer_id_in_cfg = 0
start_mask = None
end_mask = cfg_mask[layer_id_in_cfg] # start from the channels of the 0th cfg_mask
```
```python=
for layer_id in range(len(old_modules)):
    m0 = old_modules[layer_id]
    m1 = new_modules[layer_id]
    # batchnorm layers
    if isinstance(m0, nn.BatchNorm2d):
        idx1 = np.squeeze(np.argwhere(np.asarray(end_mask.cpu().numpy())))
        # end_mask: tensor([0., 1., 1., 0., 0., 1., 1., 0., 0., 1., 1., 0., 0., 1., 1., 0., 0., 1.,
        #                   0., 1., 0., 1., 0., 1., 0., 0., 1., 1., 0., 1., 1., 0., 1., 0., 1., 1.,
        #                   1., 1., 0., 1., 1., 1., 0., 0., 0., 0., 1., 1., 0., 1., 0., 1., 1., 1.,
        #                   1., 0., 0., 0., 0., 0., 0., 0., 0., 0.])
        # idx1 holds the positions of the 1s:
        # idx1: [ 1  2  5  6  9 10 13 14 17 19 21 23 26 27 29 30 32 34 35 36 37 39 40 41 46 47 49 51 52 53 54]
        if idx1.size == 1:
            idx1 = np.resize(idx1, (1,))
        # copy the surviving weights into the new model
        m1.weight.data = m0.weight.data[idx1.tolist()].clone()
        m1.bias.data = m0.bias.data[idx1.tolist()].clone()
        m1.running_mean = m0.running_mean[idx1.tolist()].clone()
        m1.running_var = m0.running_var[idx1.tolist()].clone()
        layer_id_in_cfg += 1
        start_mask = end_mask.clone()
        end_mask = cfg_mask[layer_id_in_cfg]
    # conv layers
    elif isinstance(m0, nn.Conv2d):
        # layers feeding a concat must keep all of their out channels
        if layer_id in pruning_cfg['neck']['cat_layer']:
            idx0 = np.squeeze(np.argwhere(np.asarray(start_mask.cpu().numpy())))
            idx1 = old_modules[layer_id].out_channels
            print('=====================================================')
            print('In shape: {:d}, Out shape {:d}.'.format(idx0.size, idx1))
            # start_mask: tensor([1., 1., 1.])
            # idx0: [0 1 2]  (positions of the 1s)
            # end_mask: same 0/1 tensor as shown above (64 -> 31 channels)
            # In shape: 3, Out shape 31.
            if idx0.size == 1:
                idx0 = np.resize(idx0, (1,))
            w1 = m0.weight.data[:, idx0.tolist(), :, :].clone()  # in channels
            m1.weight.data = w1.clone()  # store the new weights
            continue
        # if the layer two steps back is a BatchNorm, this conv follows it (pre-activation)
        if isinstance(old_modules[layer_id - 2], nn.BatchNorm2d):
            idx0 = np.squeeze(np.argwhere(np.asarray(start_mask.cpu().numpy())))
            idx1 = np.squeeze(np.argwhere(np.asarray(end_mask.cpu().numpy())))
            print('=====================================================')
            print('In shape: {:d}, Out shape {:d}.'.format(idx0.size, idx1.size))
            if idx0.size == 1:
                idx0 = np.resize(idx0, (1,))
            if idx1.size == 1:
                idx1 = np.resize(idx1, (1,))
            w1 = m0.weight.data[:, idx0.tolist(), :, :].clone()  # in channels
            w1 = w1[idx1.tolist(), :, :, :].clone()  # out channels
            m1.weight.data = w1.clone()  # store the new weights
            continue
        # We need to consider the case where there are downsampling convolutions.
        # For these convolutions, we just copy the weights.
        m1.weight.data = m0.weight.data.clone()
```
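The double fancy indexing used for the conv weights above (slice the input-channel axis first, then the output-channel axis) can be demonstrated in NumPy on a tiny made-up weight tensor:

```python
import numpy as np

# conv weight layout: (out_channels, in_channels, kH, kW)
w = np.arange(4 * 3 * 1 * 1).reshape(4, 3, 1, 1)

idx0 = [0, 2]  # surviving input channels (from start_mask)
idx1 = [1, 3]  # surviving output channels (from end_mask)

w1 = w[:, idx0, :, :]   # first slice the input-channel axis
w1 = w1[idx1, :, :, :]  # then slice the output-channel axis
print(w1.shape)  # (2, 2, 1, 1)
```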
```python=
# inspect the new model
# the order in torch.Size() is: output channel, input channel, kernel size
num = 0
for i in newmodel.neek.state_dict():
    if (num % 2) == 0:
        print(("================= {} =================").format(i.split('.')[0]))
    if 'conv.0.weight' in i:
        print('Batch shape: {}'.format(newmodel.neek.state_dict()[i].shape))
        num += 1
    if 'conv.2.weight' in i:
        print('Conv shape: {}'.format(newmodel.neek.state_dict()[i].shape))
        num += 1
```
```python=
================= conv1 =================
Batch shape: torch.Size([1024])
Conv shape: torch.Size([254, 1024, 1, 1])
================= conv2 =================
Batch shape: torch.Size([254])
Conv shape: torch.Size([515, 254, 3, 3])
================= conv3 =================
Batch shape: torch.Size([515])
Conv shape: torch.Size([512, 515, 1, 1])
================= conv4 =================
Batch shape: torch.Size([2048])
Conv shape: torch.Size([253, 2048, 1, 1])
...
================= conv20 =================
Batch shape: torch.Size([125])
Conv shape: torch.Size([128, 125, 1, 1])
```
### Step6: Save model
```python=
torch.save(newmodel, './weights/newmodel.pth')
```
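One caveat: `torch.save(newmodel, ...)` pickles the entire module, so loading it later requires the original class definitions on the import path. A more portable alternative, sketched on a small stand-in model (the file name and cfg are illustrative, not this project's), is to save the `state_dict` together with the pruned cfg:

```python
import os
import tempfile
import torch
import torch.nn as nn

model = nn.Sequential(nn.Conv2d(3, 8, 3), nn.BatchNorm2d(8))

# portable checkpoint: weights plus the cfg needed to rebuild the architecture
path = os.path.join(tempfile.gettempdir(), 'newmodel_state.pth')
torch.save({'cfg': [8], 'state_dict': model.state_dict()}, path)

loaded = torch.load(path)
rebuilt = nn.Sequential(nn.Conv2d(3, loaded['cfg'][0], 3),
                        nn.BatchNorm2d(loaded['cfg'][0]))
rebuilt.load_state_dict(loaded['state_dict'])
print('reloaded OK')
```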
## 4. Experimental Results
### Compression results
#### Accuracy
| Metric | New model (epoch 73) | New model (epoch 7) | Old model |
|----- |-------|--------|-------|
| Average Precision (AP) @[ IoU=0.50:0.95 area= all maxDets=100 ] | 0.604 | 0.590 | 0.606 |
| Average Precision (AP) @[ IoU=0.50 area= all maxDets=100 ] | 0.957 | 0.932 | 0.955 |
| Average Precision (AP) @[ IoU=0.75 area= all maxDets=100 ] | 0.688 | 0.682 | 0.685 |
| Average Precision (AP) @[ IoU=0.50:0.95 area= small maxDets=100 ] | -1.000 | -1.000 | -1.000 |
| Average Precision (AP) @[ IoU=0.50:0.95 area=medium maxDets=100 ] | 0.555 | 0.568 | 0.552 |
| Average Precision (AP) @[ IoU=0.50:0.95 area= large maxDets=100 ] | 0.625 | 0.604 | 0.631 |
| Average Recall (AR) @[ IoU=0.50:0.95 area= all maxDets= 1 ] | 0.474 | 0.475 | 0.473 |
| Average Recall (AR) @[ IoU=0.50:0.95 area= all maxDets= 10 ] | 0.691 | 0.675 | 0.680 |
| Average Recall (AR) @[ IoU=0.50:0.95 area= all maxDets=100 ] | 0.746 | 0.734 | 0.733 |
| Average Recall (AR) @[ IoU=0.50:0.95 area= small maxDets=100 ] | -1.000 | -1.000 | -1.000 |
| Average Recall (AR) @[ IoU=0.50:0.95 area=medium maxDets=100 ] | 0.664 | 0.675 | 0.647 |
| Average Recall (AR) @[ IoU=0.50:0.95 area= large maxDets=100 ] | 0.736 | 0.749 | 0.744 |
:pushpin: The pruned model reaches accuracy comparable to the model before pruning
#### Compression numbers
| Item | New model | Old model |
|----- |-------|--------|
| channel pruning ratio | 17.42% | 0% |
| weight count / savings | 35,055,933 (<font color=#F6D55C>45.2%</font>) | 63,966,528 |
| memory usage / savings | load model: 900 MB (<font color=#F6D55C>200 MB</font>) <br> after inference: 2.6 GB (<font color=#F6D55C>270 MB</font>) | load model: 1.1 GB <br> after inference: 2.87 GB |
| GPU memory usage / savings | load model: 1,375 MB (<font color=#F6D55C>108 MB</font>) <br> after inference: 3,705 MB (<font color=#F6D55C>318 MB</font>) | load model: 1,483 MB <br> after inference: 4,023 MB |
| model file size / savings | 135 MB (<font color=#F6D55C>110 MB</font>) | 245 MB |
| inference speed (100 runs) | 43.2 ms ± 3.56 ms | 42.6 ms ± 1.89 ms |
#### Other observations
:star: BN distribution (neck layers)
* after adding the sparsity-induced penalty (L1 regularization on the scaling factors in batch normalization (BN) layers) to the loss and continuing training

* epoch 28

* epoch 59

:star: BN distribution (all layers)

:star: Experimental results from the paper
## 5. Difficulties and Challenges
:pushpin: When building the model architecture from a config, some parts have not been decomposed successfully yet; designing a more flexible config is future work
Current challenges:
1. concat layers
2. layers whose masks must be unioned
3. heavily constrained layers
* yolo backbone

* yolo head
