DLCV HW1

tags: `Course`

湯濬澤
NTUST_M11015117

Problem 1 - Image Classification

1. Architecture of model A

[Figure: architecture of model A (image not available)]

2. Accuracy on validation dataset

Model A (CNN): 48%
Model B (ResNet 50): 86.76%

3. Implementation details

The Dataset class comes first: it extracts each image's label from its filename.

import os

from PIL import Image
from torch.utils.data import Dataset


class MyDataset(Dataset):
    def __init__(self, root_dir, transform=None):
        self.root_dir = root_dir
        self.transform = transform
        self.datas = []  # list of (filename, label) pairs

        # The label is the integer before the first underscore in the filename.
        for file in sorted(os.listdir(root_dir)):
            if file.endswith(".png"):
                label = int(file.split("_")[0])
                self.datas.append((file, label))

    def __len__(self):
        return len(self.datas)

    def __getitem__(self, idx):
        filename, label = self.datas[idx]
        img_path = os.path.join(self.root_dir, filename)
        # Force 3 channels so the 3-element mean/std in Normalize always applies.
        image = Image.open(img_path).convert("RGB")
        if self.transform:
            image = self.transform(image)
        return image, label
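
As a quick illustration of the filename convention the dataset relies on, the sketch below parses the label in isolation. The filenames are made up for the example; the only thing taken from the code above is that the label is the integer before the first underscore.

```python
def label_from_filename(filename):
    """Return the integer class label encoded before the first underscore."""
    return int(filename.split("_")[0])


# Hypothetical filenames, used only to illustrate the convention.
examples = ["0_12.png", "12_345.png", "49_7.png"]
labels = [label_from_filename(name) for name in examples]
print(labels)
```

A filename whose prefix is not an integer would raise `ValueError` here, which is probably the desired behaviour for a malformed dataset directory.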

The training set is then augmented with random horizontal flips and normalized. The normalization parameters are computed from the training dataset.

from torchvision import transforms

transform = transforms.Compose([
    transforms.Resize((32, 32)),
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.ToTensor(),
    # Per-channel statistics computed from the training set.
    transforms.Normalize(
        mean=[0.5077, 0.4813, 0.4312],
        std=[0.2627, 0.2547, 0.2736],
    ),
])

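# Normalization statistics.
# Assumes train_dataset applies ToTensor but not Normalize; otherwise the
# statistics would be computed on already-normalized data.
imgs = torch.stack([img_t for img_t, _ in train_dataset], dim=3)  # (3, H, W, N)
print("Dataset shape:", imgs.shape)
print("Dataset mean:", imgs.view(3, -1).mean(dim=1))
print("Dataset std:", imgs.view(3, -1).std(dim=1))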
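
To make the statistics concrete, here is a minimal pure-Python sketch of what per-channel mean and standard deviation mean, and of the `(x - mean) / std` step that `transforms.Normalize` applies. The tiny two-pixel "images" are invented for the example. The sketch uses the population standard deviation, whereas `torch.std` defaults to the unbiased estimate, so the numbers on a real dataset would differ very slightly.

```python
import math


def channel_stats(images):
    """Per-channel mean and (population) std over all pixels of all images.

    `images` is a list of images; each image is a list of channels;
    each channel is a flat list of pixel values in [0, 1].
    """
    n_channels = len(images[0])
    means, stds = [], []
    for c in range(n_channels):
        values = [v for img in images for v in img[c]]
        mean = sum(values) / len(values)
        var = sum((v - mean) ** 2 for v in values) / len(values)
        means.append(mean)
        stds.append(math.sqrt(var))
    return means, stds


def normalize(image, means, stds):
    """Apply (x - mean) / std channel by channel."""
    return [
        [(v - m) / s for v in channel]
        for channel, m, s in zip(image, means, stds)
    ]


# Toy data: two 3-channel "images" with two pixels each (made up).
images = [
    [[0.0, 1.0], [0.2, 0.4], [0.5, 0.7]],
    [[1.0, 0.0], [0.6, 0.8], [0.9, 0.3]],
]
means, stds = channel_stats(images)
normalized = [normalize(img, means, stds) for img in images]
print("means:", means)
print("stds:", stds)
```

After normalization each channel has zero mean and unit variance over the whole dataset, which is the property the hard-coded `mean` and `std` values are meant to provide.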

The loss is cross-entropy and the optimizer is SGD. A scheduler manages the learning rate, dividing it by 10 every 20 epochs.

from torch.optim.lr_scheduler import StepLR

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)
# Multiply the learning rate by 0.1 every 20 epochs.
scheduler = StepLR(optimizer, step_size=20, gamma=0.1)
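
The resulting schedule can be written in closed form: with `step_size=20` and `gamma=0.1`, the learning rate at a given epoch is the initial rate times `0.1 ** (epoch // 20)`. The sketch below reproduces that arithmetic without PyTorch. The initial rate of `0.01` is an assumed value, since the report does not state `learning_rate`.

```python
def step_lr(initial_lr, epoch, step_size=20, gamma=0.1):
    """Learning rate after `epoch` completed epochs under a step decay."""
    return initial_lr * gamma ** (epoch // step_size)


initial_lr = 0.01  # assumed value; the report does not state learning_rate
for epoch in (0, 19, 20, 40, 60):
    print(epoch, step_lr(initial_lr, epoch))
```

Over a 100-epoch run this gives five plateaus, each ten times lower than the previous one.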

Finally, PCA and t-SNE are computed with scikit-learn.

from sklearn import decomposition, manifold


def get_tsne(data, n_components=2, n_images=None):
    # Optionally keep only the first n_images samples to speed up t-SNE.
    if n_images is not None:
        data = data[:n_images]

    tsne = manifold.TSNE(n_components=n_components, random_state=0)
    return tsne.fit_transform(data)


def get_pca(data, n_components=2):
    pca = decomposition.PCA(n_components=n_components)
    return pca.fit_transform(data)

4. Alternative model (ResNet 50)

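The biggest difference between my own model and ResNet 50 is network depth. ResNet 50 is designed with 5 stages to handle 224×224 images, whereas the inputs in this dataset are only 32×32, so I did not make my CNN very deep. Simply stacking more layers can actually increase the error rate (the degradation problem), so ResNet introduces residual connections: each block learns the difference between its input and the desired output rather than the full mapping. This avoids degradation in deep networks and leads to better results.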
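
The residual idea can be stated compactly. Writing $x$ for the input of a block and $H(x)$ for the mapping the block should ideally represent (notation assumed here, following the usual presentation of ResNet), the stacked layers are asked to fit the residual $F(x)$ rather than $H(x)$ itself:

```latex
F(x) = H(x) - x, \qquad y = F(x) + x
```

When the identity mapping is already close to optimal, the layers only need to push $F(x)$ toward zero, which is easier than learning an identity mapping through a stack of nonlinear layers.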

5. PCA

[Figure: PCA visualization (image not available)]

6. t-SNE

Perhaps because there are 50 classes, the colors in the t-SNE plots are too similar to tell apart easily. Still, compared with epoch 5, the epoch 100 result does show some small clusters forming.

Epoch 5
[Figure: t-SNE at epoch 5 (image not available)]
Epoch 50
[Figure: t-SNE at epoch 50 (image not available)]
Epoch 100
[Figure: t-SNE at epoch 100 (image not available)]

Problem 2 - Semantic Segmentation

1. Architecture of model A (VGG16-FCN32s)

[Figure: architecture of model A, VGG16-FCN32s (image not available)]

2. Architecture of model B (VGG16-FCN8s)

[Figure: architecture of model B, VGG16-FCN8s (image not available)]

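Like FCN32s, FCN8s starts with a stack of convolutions. Since these convolutions match those of VGG-16, the convolutional part is replaced directly with a pre-trained VGG16. Compared with FCN32s, however, FCN8s adds several upsampling layers and fuses information from multiple layers before producing the output, so in theory it should perform considerably better than FCN32s.
In the outputs, the FCN32s masks are visibly blocky, whereas FCN8s, with its extra upsampling and multi-layer information, does not show this artifact.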
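
The blockiness follows from simple arithmetic on the downsampling strides. VGG16 halves the spatial resolution at each of its five pooling stages, so the last feature map is 1/32 of the input size, and FCN32s upsamples that coarse map 32× in a single step. In the standard FCN design, FCN8s instead fuses in the 1/16 (pool4) and 1/8 (pool3) maps and only needs a final 8× upsampling. The sketch below works through the sizes, assuming a 512×512 input, which the report does not state.

```python
def feature_map_size(input_size, n_pools):
    """Spatial size after `n_pools` stride-2 poolings (each halves the size)."""
    return input_size // (2 ** n_pools)


input_size = 512  # assumed input resolution, not stated in the report
pool3 = feature_map_size(input_size, 3)  # stride 8
pool4 = feature_map_size(input_size, 4)  # stride 16
pool5 = feature_map_size(input_size, 5)  # stride 32

# Each cell of the map being upsampled covers stride x stride input pixels.
print("FCN32s upsamples a", pool5, "x", pool5, "map by 32x")
print("FCN8s  upsamples a", pool3, "x", pool3, "map by 8x")
```

Under this assumption a single FCN32s prediction cell spans a 32×32 pixel patch, while the finest map used by FCN8s resolves 8×8 patches, which is consistent with the smoother masks.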

3. mIoU

Model A (FCN32s): 66.1319%
Model B (FCN8s): 72.0480%
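
For reference, mean IoU averages the per-class intersection-over-union between predicted and ground-truth masks. The sketch below is a minimal pure-Python version on flattened label lists. It is not the evaluation script used for the numbers above, and the choice to skip classes absent from both prediction and ground truth is an assumption of this sketch.

```python
def mean_iou(pred, target, n_classes):
    """Mean intersection-over-union over classes, on flat label sequences."""
    ious = []
    for c in range(n_classes):
        tp = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        fp = sum(1 for p, t in zip(pred, target) if p == c and t != c)
        fn = sum(1 for p, t in zip(pred, target) if p != c and t == c)
        union = tp + fp + fn
        if union == 0:
            continue  # class absent from both masks: skipped (assumption)
        ious.append(tp / union)
    return sum(ious) / len(ious)


# Toy example with 3 classes on a 6-pixel "image" (made-up labels).
target = [0, 0, 1, 1, 2, 2]
pred = [0, 1, 1, 1, 2, 0]
print(mean_iou(pred, target, n_classes=3))
```

In the toy example the per-class IoUs are 1/3, 2/3, and 1/2, so the mean is 0.5.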

4. Segmentation results

FCN32s

0013_sat

[Figures: segmentation results at epoch 5, epoch 45, and epoch 70 (images not available)]

0062_sat

[Figures: segmentation results at epoch 5, epoch 45, and epoch 70 (images not available)]

0104_sat

[Figures: segmentation results at epoch 5, epoch 45, and epoch 70 (images not available)]

FCN8s

0013_sat

[Figures: segmentation results at epoch 5, epoch 80, and epoch 160 (images not available)]

0062_sat

[Figures: segmentation results at epoch 5, epoch 80, and epoch 160 (images not available)]

0104_sat

[Figures: segmentation results at epoch 5, epoch 80, and epoch 160 (images not available)]
