Various Residual Blocks

Neural networks keep moving toward deeper architectures, but simply stacking more and more layers does not, on its own, bring the training error down.

ResNet came along later, stacking networks more than 100 layers deep; to make that possible, its authors introduced the Residual Block to address the problem that deeper networks were not getting better.

Residual Block

The benefit of this kind of block is that the simple identity shortcut, which "skips over" two Conv. layers, reduces the amount of work those Conv. layers have to do during training: instead of fitting the whole mapping, they only need to fit the residual relative to the identity. This lowers the training cost of the Conv. layers and is what makes it possible to keep increasing the depth of the network.
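
To make this concrete, here is a minimal PyTorch sketch of the idea (my own illustration, not the ResNet authors' code); it assumes the input and output have the same number of channels so the identity shortcut can be added directly:

import torch.nn as nn

class BasicResidualBlock(nn.Module):
    # Two 3x3 conv layers plus an identity shortcut: y = F(x) + x,
    # so the conv layers only have to learn the residual F(x) = H(x) - x.
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        # The identity shortcut "skips over" the two conv layers.
        return self.relu(out + x)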


Saltatory Conduction in Biology
I haven't read the ResNet paper yet, but my personal guess is that this design was inspired by saltatory conduction of action potentials in biology.

Bottleneck Residual Block

Comparing the two designs: a plain Residual Block and a Bottleneck Residual Block have roughly the same number of parameters, and therefore roughly the same time complexity, but the Bottleneck Residual Block has three layers while the plain Residual Block has only two.
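
A minimal sketch of the bottleneck version, using the 256 -> 64 -> 64 -> 256 channel counts given as the example in the ResNet paper (my own illustration, not the exact torchvision implementation):

import torch.nn as nn

class BottleneckResidualBlock(nn.Module):
    # 1x1 reduce -> 3x3 -> 1x1 expand, plus an identity shortcut.
    def __init__(self, channels=256, bottleneck_channels=64):
        super().__init__()
        self.block = nn.Sequential(
            # 1x1 conv: squeeze 256 -> 64 channels
            nn.Conv2d(channels, bottleneck_channels, kernel_size=1, bias=False),
            nn.BatchNorm2d(bottleneck_channels),
            nn.ReLU(inplace=True),
            # 3x3 conv on the narrow 64-channel representation (the cheap part)
            nn.Conv2d(bottleneck_channels, bottleneck_channels, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(bottleneck_channels),
            nn.ReLU(inplace=True),
            # 1x1 conv: expand 64 -> 256 channels again
            nn.Conv2d(bottleneck_channels, channels, kernel_size=1, bias=False),
            nn.BatchNorm2d(channels),
        )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(self.block(x) + x)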

The ResNet authors use the Bottleneck Residual Block once the network goes beyond 50 layers, the reason being that they wanted to add layers without increasing training time (see the "Deeper Bottleneck Architectures" description on p.6 of the paper). Personally, though, I think the assumption that "adding layers yields higher accuracy", while widely supported by experiments, has never been verified mathematically, so it is not beyond being overturned.

Inverted Residual Block

With that background, when we come back to the Inverted Residual Block in MobileNetV2, it is now much easier to understand what it is doing.

The difference between the Inverted Residual Block and the other Residual Blocks is that it first increases the number of channels and then reduces it, whereas an ordinary (bottleneck) Residual Block first reduces the channel count and then expands it back.
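
To make the contrast concrete, here is the channel flow through one block of each design; the numbers are only representative examples (256 -> 64 -> 256 is the ResNet paper's bottleneck example, 24 with an expansion ratio of 6 matches a MobileNetV2 stage):

# Channel flow per block (illustrative numbers only)
bottleneck_resnet = [256, 64, 64, 256]     # wide -> narrow -> wide: 1x1 reduce, 3x3, 1x1 expand
inverted_mobilenetv2 = [24, 144, 144, 24]  # narrow -> wide -> narrow: 1x1 expand (x6), 3x3 depthwise, 1x1 linear project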

Many articles online say this is done to preserve more feature information, but if you go back to the paper, the stated reason is memory efficiency! There is an argument for this in Section 5 of the paper, which I haven't fully understood yet.

So now it is easy to follow the MobileNetV2 code on PyTorch Hub; below is the implementation of the Inverted Residual Block.

import torch.nn as nn

# ConvBNReLU is a Conv2d -> BatchNorm2d -> ReLU6 helper defined in the same
# MobileNetV2 source file (a sketch of it follows after this block).
class InvertedResidual(nn.Module):
    def __init__(self, inp, oup, stride, expand_ratio):
        super(InvertedResidual, self).__init__()
        self.stride = stride
        assert stride in [1, 2]

        hidden_dim = int(round(inp * expand_ratio))
        # The residual shortcut is only used when the spatial size (stride == 1)
        # and the channel count (inp == oup) are unchanged.
        self.use_res_connect = self.stride == 1 and inp == oup

        layers = []
        if expand_ratio != 1:
            # pw: 1x1 pointwise conv that expands inp -> hidden_dim channels
            layers.append(ConvBNReLU(inp, hidden_dim, kernel_size=1))
        layers.extend([
            # dw: 3x3 depthwise conv (groups == channels)
            ConvBNReLU(hidden_dim, hidden_dim, stride=stride, groups=hidden_dim),
            # pw-linear: 1x1 conv projecting back down to oup channels,
            # with no ReLU afterwards (the "linear bottleneck")
            nn.Conv2d(hidden_dim, oup, 1, 1, 0, bias=False),
            nn.BatchNorm2d(oup),
        ])
        self.conv = nn.Sequential(*layers)

    def forward(self, x):
        if self.use_res_connect:
            return x + self.conv(x)
        else:
            return self.conv(x)
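
ConvBNReLU above is a small helper from the same source file; the sketch below approximates it (Conv2d -> BatchNorm2d -> ReLU6) and then runs a quick shape check, assuming the InvertedResidual class above is in scope. The sizes in the check are illustrative only:

import torch
import torch.nn as nn

class ConvBNReLU(nn.Sequential):
    # Approximate sketch of the helper used above: Conv2d -> BatchNorm2d -> ReLU6.
    def __init__(self, in_planes, out_planes, kernel_size=3, stride=1, groups=1):
        padding = (kernel_size - 1) // 2
        super(ConvBNReLU, self).__init__(
            nn.Conv2d(in_planes, out_planes, kernel_size, stride, padding,
                      groups=groups, bias=False),
            nn.BatchNorm2d(out_planes),
            nn.ReLU6(inplace=True),
        )

# With stride=1 and inp == oup the residual shortcut is active; internally the
# block expands 32 -> 192 channels (expand_ratio=6) before projecting back to 32.
block = InvertedResidual(inp=32, oup=32, stride=1, expand_ratio=6)
x = torch.randn(1, 32, 56, 56)
print(block(x).shape)  # torch.Size([1, 32, 56, 56])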

References

  1. https://zhuanlan.zhihu.com/p/28413039
  2. https://www.jianshu.com/p/e502e4b43e6d
  3. https://arxiv.org/pdf/1512.03385.
  4. https://www.cnblogs.com/hejunlin1992/p/9395345.html
  5. https://zhuanlan.zhihu.com/p/33075914