Various Residual Blocks

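Neural networks keep getting deeper, but simply stacking more and more layers does not bring the training error down. Beyond a certain depth, the training error of a plain (non-residual) network actually gets worse.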

Then ResNet came along. To stack more than 100 layers and solve the problem of deeper networks not performing better, its authors introduced the Residual Block.

Residual Block

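The benefit of this block is that it reduces the work the Conv. Layers have to do during training. A simple identity shortcut "skips" over the two Conv. Layers, so instead of learning the whole desired mapping H(x), the Conv. Layers only need to learn the residual F(x) = H(x) - x, and the block outputs F(x) + x. This lowers the cost of training the Conv. Layers, which in turn allows the number of layers to be increased, as shown in the figure below.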
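To make y = F(x) + x concrete, here is a minimal PyTorch sketch of a basic residual block. It assumes the Conv-BN-ReLU ordering of the original ResNet paper and a shortcut that needs no projection (same channel count, stride 1); the class name `BasicResidualBlock` is my own and not from any library. The second half shows why the residual form is easy to optimize: if the residual branch outputs zero (here forced by zeroing the last BatchNorm scale), the block reduces to an identity mapping for non-negative inputs.

```python
import torch
import torch.nn as nn


class BasicResidualBlock(nn.Module):
    """Hypothetical basic block: two 3x3 convs plus an identity shortcut, y = ReLU(F(x) + x)."""

    def __init__(self, channels: int):
        super().__init__()
        # F(x): the residual branch the two conv layers have to learn
        self.residual = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
        )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # the identity shortcut "skips" both conv layers and is added back
        return self.relu(self.residual(x) + x)


block = BasicResidualBlock(64)
x = torch.randn(2, 64, 32, 32)
y = block(x)
print(y.shape)  # torch.Size([2, 64, 32, 32])

# If the residual branch outputs 0, the block is just ReLU(x);
# zeroing the last BN scale (gamma) makes F(x) == 0 here
with torch.no_grad():
    nn.init.zeros_(block.residual[4].weight)
x_pos = torch.rand(2, 64, 32, 32)  # non-negative input, like the output of a previous ReLU
y_id = block(x_pos)
print(torch.allclose(y_id, x_pos))  # True
```

The shape is unchanged because both convs use padding=1 and stride 1, which is what lets the shortcut be a plain addition.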

Saltatory Nerve Conduction in Biology
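I haven't read the ResNet paper yet, but my personal guess is that the idea was inspired by saltatory conduction of action potentials in biology.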

Bottleneck Residual Block

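The left is a standard Residual Block and the right is a Bottleneck Residual Block. The two have roughly the same number of parameters, and therefore similar time complexity, but the Bottleneck Residual Block has three layers while the standard Residual Block has only two.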
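The "roughly the same number of parameters" claim can be checked with a little arithmetic. The sketch below assumes the configuration from the ResNet paper's bottleneck figure: a basic block on a 64-channel input versus a bottleneck block on a 256-channel input with a 64-channel middle. It counts only Conv2d weights (BatchNorm parameters are ignored), and `BottleneckBlock` / `conv_weight_count` are hypothetical names for illustration.

```python
import torch
import torch.nn as nn


def conv_weight_count(module: nn.Module) -> int:
    """Count only Conv2d weights; BN parameters and biases are ignored."""
    return sum(m.weight.numel() for m in module.modules() if isinstance(m, nn.Conv2d))


# Left: basic block branch on a 64-channel input (3x3 -> 3x3)
basic = nn.Sequential(
    nn.Conv2d(64, 64, 3, padding=1, bias=False), nn.BatchNorm2d(64), nn.ReLU(inplace=True),
    nn.Conv2d(64, 64, 3, padding=1, bias=False), nn.BatchNorm2d(64),
)


class BottleneckBlock(nn.Module):
    """Hypothetical bottleneck: 1x1 reduce -> 3x3 -> 1x1 restore, plus identity shortcut."""

    def __init__(self, channels: int = 256, width: int = 64):
        super().__init__()
        self.residual = nn.Sequential(
            nn.Conv2d(channels, width, 1, bias=False), nn.BatchNorm2d(width), nn.ReLU(inplace=True),
            nn.Conv2d(width, width, 3, padding=1, bias=False), nn.BatchNorm2d(width), nn.ReLU(inplace=True),
            nn.Conv2d(width, channels, 1, bias=False), nn.BatchNorm2d(channels),
        )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.relu(self.residual(x) + x)


bottleneck = BottleneckBlock(256, 64)
print(conv_weight_count(basic))       # 73728 = 2 * (3*3*64*64)
print(conv_weight_count(bottleneck))  # 69632 = 256*64 + 3*3*64*64 + 64*256
y = bottleneck(torch.randn(1, 256, 14, 14))
print(y.shape)  # torch.Size([1, 256, 14, 14])
```

The two 1x1 convs are cheap because they have no spatial extent, so the bottleneck can afford a third layer (and a 4x wider input/output) for about the same weight budget.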

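The ResNet authors use the Bottleneck Residual Block for networks of 50 layers or more, in order to increase depth without increasing training time (see the "Deeper Bottleneck Architectures" description on p. 6 of the paper). Personally, though, I think the assumption that "adding more layers yields higher accuracy", while widely supported by experiments, has never been proven mathematically and could still be refuted.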

Inverted Residual Block

With all of this in mind, we can go back to the Inverted Residual Block in MobileNetV2 and actually understand what it is doing.

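The difference between the Inverted Residual Block and the other residual blocks is that it first increases the number of channels and then decreases it, whereas a (bottleneck) Residual Block first decreases the number of channels and then increases it. As a result, in the inverted version the shortcut connects the thin, low-channel tensors rather than the wide ones.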

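Many articles online say this is done to preserve more feature information, but if you go back to the paper, the stated reason is memory efficiency! The paper argues this in Section 5, although I haven't fully understood that part yet.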
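One idea that I believe underlies the memory argument (my own reading, not verified against the paper's proof): every operation applied to the wide, expanded tensor (ReLU6, the depthwise conv, and BN at inference time) acts on each channel independently, and the final 1x1 projection is linear. So the wide inner tensor never has to exist all at once; it can be computed in channel chunks, projected, and summed. The sketch below uses random hypothetical weights, omits BN, assumes stride 1, and simply checks that the full and chunked computations agree; it does not measure memory.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

k, t, n_chunks = 24, 6, 3  # thin channels, expansion ratio, number of chunks
n = k * t                  # wide inner channels: 144

# Hypothetical weights of one inverted residual block (BN omitted for simplicity)
w_expand = torch.randn(n, k, 1, 1)   # 1x1 expansion: k -> n
w_dw = torch.randn(n, 1, 3, 3)       # 3x3 depthwise on n channels
w_project = torch.randn(k, n, 1, 1)  # 1x1 linear projection: n -> k


def inner(x, we, wd):
    """Expansion + ReLU6 + depthwise + ReLU6 for a subset of the inner channels."""
    h = F.relu6(F.conv2d(x, we))
    return F.relu6(F.conv2d(h, wd, padding=1, groups=wd.shape[0]))


def block_full(x):
    # materializes the whole n-channel inner tensor at once
    return x + F.conv2d(inner(x, w_expand, w_dw), w_project)


def block_chunked(x):
    # only an (n / n_chunks)-channel slice of the inner tensor exists at any time
    out = x.clone()
    for idx in torch.arange(n).chunk(n_chunks):
        h = inner(x, w_expand[idx], w_dw[idx])
        out += F.conv2d(h, w_project[:, idx])  # projection is linear, so partial sums add up
    return out


x = torch.randn(1, k, 16, 16)
print(torch.allclose(block_full(x), block_chunked(x), atol=1e-4))  # True
```

This only works because the expansion happens inside the block; the tensors that must be kept around between blocks are the thin k-channel ones.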

So now the MobileNetV2 code on PyTorch Hub is easy to follow. Below is its implementation of the Inverted Residual Block.

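```python
import torch.nn as nn


class ConvBNReLU(nn.Sequential):
    # minimal definition of the helper this code relies on:
    # Conv2d -> BatchNorm2d -> ReLU6, with "same" padding for odd kernel sizes
    def __init__(self, in_planes, out_planes, kernel_size=3, stride=1, groups=1):
        padding = (kernel_size - 1) // 2
        super(ConvBNReLU, self).__init__(
            nn.Conv2d(in_planes, out_planes, kernel_size, stride, padding, groups=groups, bias=False),
            nn.BatchNorm2d(out_planes),
            nn.ReLU6(inplace=True),
        )


class InvertedResidual(nn.Module):
    def __init__(self, inp, oup, stride, expand_ratio):
        super(InvertedResidual, self).__init__()
        self.stride = stride
        assert stride in [1, 2]

        hidden_dim = int(round(inp * expand_ratio))
        # shortcut only when spatial size and channel count are unchanged
        self.use_res_connect = self.stride == 1 and inp == oup

        layers = []
        if expand_ratio != 1:
            # pw: 1x1 pointwise conv expanding inp -> hidden_dim channels
            layers.append(ConvBNReLU(inp, hidden_dim, kernel_size=1))
        layers.extend([
            # dw: 3x3 depthwise conv (one filter per channel, groups == channels)
            ConvBNReLU(hidden_dim, hidden_dim, stride=stride, groups=hidden_dim),
            # pw-linear: 1x1 projection down to oup channels, no activation (linear bottleneck)
            nn.Conv2d(hidden_dim, oup, 1, 1, 0, bias=False),
            nn.BatchNorm2d(oup),
        ])
        self.conv = nn.Sequential(*layers)

    def forward(self, x):
        if self.use_res_connect:
            return x + self.conv(x)
        else:
            return self.conv(x)
```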