# Overview
Neural pruning is an important topic in deep learning. It generally refers to removing unnecessary or redundant neurons or connections from a neural network in order to simplify the model structure and improve computational efficiency. This kind of optimization reduces the model's storage requirements, speeds up inference, and lowers the energy consumed when running the model on hardware.
One branch of neural pruning uses a Taylor expansion to estimate the change in the loss function after a neuron is pruned.
The paper "Pruning Convolutional Neural Networks for Resource Efficient Inference" uses the first-order Taylor approximation
$$f(x+h)\approx f(x)+f'(x)\cdot h$$
to estimate how much the loss function changes when a neuron is pruned. Removing the neurons whose estimated loss change is smallest achieves the goal of pruning while barely affecting the loss.
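As a quick sanity check of the approximation (a toy example, not from the paper): with $f(x)=x^2$ and $x=2$, a perturbation $h=-0.1$ gives the estimate $f(1.9)\approx 4+4\cdot(-0.1)=3.6$, close to the exact value $3.61$. In pruning, the role of $h$ is played by the negative of the weights being removed.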
# Notation
* Training set: $$D=\{(x_1,y_1),(x_2,y_2),\cdots\}$$
* Model weights: $$\Theta=\{W_1^1,W_1^2,\cdots\},$$ where $W_i^j=(w_i^j,b_i^j)$ denotes the weights of the $j$-th neuron in the $i$-th layer
* Loss function: $$L(\Theta,D)$$
# Derivation
* When the $j$-th neuron of the $i$-th layer is pruned, the corresponding $W_i^j\in\Theta$ is set to $0$, giving the new weights:
$$\begin{split}\bar{\Theta}=\{W_1^1,W_1^2,\cdots,W_i^j=0,\cdots\}\end{split}$$
* Since pruning corresponds to perturbing the weights by $-W_i^j$, the loss at the new weights can be estimated with the first-order Taylor expansion as
$$\begin{split}L(\bar\Theta,D)\approx L(\Theta,D)-\nabla_{W_i^j}L(\Theta,D)\cdot W_i^j\end{split}$$
Here $\nabla _{W_i^j}L$ denotes the gradient of $L$ with respect to every element of the two parameter tensors in $W_i^j=(w_i^j,b_i^j)$, and the dot product sums over all of those elements.
* The change in the loss can therefore be estimated as
$$|\nabla _{W_i^j}L\cdot W_i^j|\approx \left|L(\Theta,D)-L(\bar\Theta,D)\right|$$
In theory, as long as the neurons with the smallest $|\nabla _{W_i^j}L\cdot W_i^j|$ are removed, the value of $L$ will not change much.
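Written out over the individual parameters of the neuron, the criterion is simply a sum of gradient-times-weight terms, which is exactly what the PyTorch code below accumulates:
$$\left|\nabla _{W_i^j}L\cdot W_i^j\right|=\Bigl|\sum_{\theta\in W_i^j}\frac{\partial L}{\partial \theta}\,\theta\Bigr|$$
where $\theta$ ranges over every scalar entry of $w_i^j$ and $b_i^j$.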
# PyTorch example
* The following code estimates $|\nabla _{W_i^j}L\cdot W_i^j|$ for every neuron in a given layer:
```python=
import torch

def cal_est_delta_loss(model, data_loader, loss_func, layer_tobe_pruned):
    # Taylor criterion: accumulate |grad(W) * W| over the dataset for every
    # output neuron (channel) of the layer that is a pruning candidate.
    weight, bias = layer_tobe_pruned.weight, layer_tobe_pruned.bias
    delta_losses = [0.0 for _ in range(weight.shape[0])]
    for (x, y) in data_loader:
        ybar = model(x)
        loss = loss_func(ybar, y)
        # Gradients of the loss w.r.t. this layer's weight and bias only.
        weight_grad, bias_grad = torch.autograd.grad(loss, [weight, bias])
        for i in range(weight.shape[0]):
            # First-order estimate of |delta loss| if neuron i were zeroed out.
            delta_losses[i] += (weight_grad[i]*weight[i] + bias_grad[i]*bias[i]).sum().abs().item()
    return delta_losses
```
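A minimal usage sketch (the names `model`, `test_loader`, and the choices of `F.cross_entropy` and `model.conv3` below are illustrative assumptions, not part of the original code):
```python=
import torch.nn.functional as F

# Score every output channel of one layer, then rank ascending:
# channels with the smallest estimated |delta loss| are the pruning candidates.
scores = cal_est_delta_loss(model, test_loader, F.cross_entropy, model.conv3)
ranking = sorted(range(len(scores)), key=lambda i: scores[i])
print("least important channels:", ranking[:5])
```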
# Results
Taking the following model on the MNIST handwritten-digit classification task:
```python=
import torch.nn as nn
import torch.nn.functional as F

class CNN(nn.Module):
    def __init__(self):
        super(CNN, self).__init__()
        self.conv1 = nn.Conv2d(in_channels=1, out_channels=8, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(in_channels=8, out_channels=16, kernel_size=3, padding=1)
        self.conv3 = nn.Conv2d(in_channels=16, out_channels=32, kernel_size=3, padding=1)
        self.conv4 = nn.Conv2d(in_channels=32, out_channels=64, kernel_size=3, padding=1)
        self.linear1 = nn.Linear(in_features=64*7*7, out_features=128)
        self.linear2 = nn.Linear(in_features=128, out_features=10)

    def forward(self, x):
        size = x.size(0)
        x = F.relu(self.conv1(x))
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)   # 28x28 -> 14x14
        x = F.relu(self.conv3(x))
        x = F.max_pool2d(F.relu(self.conv4(x)), 2)   # 14x14 -> 7x7
        x = x.view(size, -1)
        x = F.relu(self.linear1(x))
        x = F.softmax(self.linear2(x), dim=1)
        return x
```
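The pruning step itself is not shown in the post; a minimal sketch of one way to realize it, under the assumption that zeroing a channel in place (masking) is acceptable, is given below. Physically removing the channel, as the decreasing `num_parameter` counts in the results suggest, would additionally require rebuilding the layer and shrinking the next layer's input channels.
```python=
import torch

def zero_out_neuron(layer, idx):
    # Masking sketch: zero the idx-th output channel's weights and bias so the
    # neuron no longer contributes, without changing the module's shape.
    with torch.no_grad():
        layer.weight[idx].zero_()
        if layer.bias is not None:
            layer.bias[idx].zero_()

# e.g. disable the least important channel found by the ranking above (illustrative):
# zero_out_neuron(model.conv3, ranking[0])
```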
Computing $|\nabla _{W_i^j}L\cdot W_i^j|$ for every neuron and pruning the corresponding neurons in ascending order of this value gives the following results:
```
Before pruning: loss: 1.480422, acc: 98.0900%, num_parameter: 427210
After pruning 216 neurons: loss: 1.480414, acc: 98.0900%, num_parameter: 6951
After pruning 217 neurons: loss: 1.480417, acc: 98.0900%, num_parameter: 6548
After pruning 218 neurons: loss: 1.480423, acc: 98.0900%, num_parameter: 6145
After pruning 219 neurons: loss: 1.480416, acc: 98.0900%, num_parameter: 5742
After pruning 220 neurons: loss: 1.480416, acc: 98.0900%, num_parameter: 5651
After pruning 221 neurons: loss: 1.480420, acc: 98.0900%, num_parameter: 5248
After pruning 222 neurons: loss: 1.480409, acc: 98.0900%, num_parameter: 4845
After pruning 223 neurons: loss: 1.480410, acc: 98.0900%, num_parameter: 4269
After pruning 224 neurons: loss: 1.480389, acc: 98.0900%, num_parameter: 3915
After pruning 225 neurons: loss: 1.479798, acc: 98.1300%, num_parameter: 3388
After pruning 226 neurons: loss: 1.575055, acc: 88.6100%, num_parameter: 3083
After pruning 227 neurons: loss: 1.659003, acc: 80.1400%, num_parameter: 2778
After pruning 228 neurons: loss: 1.769887, acc: 69.0000%, num_parameter: 2473
After pruning 229 neurons: loss: 1.864964, acc: 59.3100%, num_parameter: 2168
```
Only 3388 parameters are needed to keep the accuracy above 98%.
The next post will discuss the use of the second-order term of the Taylor expansion.
# References
> - P. Molchanov, S. Tyree, T. Karras, T. Aila, J. Kautz, "Pruning Convolutional Neural Networks for Resource Efficient Inference", ICLR 2017. https://arxiv.org/abs/1611.06440