# DLCV HW1

###### tags: `Course`

:::success
湯濬澤 NTUST_M11015117
:::

## Problem 1 - Image Classification

### 1. Architecture of model A

![](https://i.imgur.com/R7s6SoL.png)

### 2. Accuracy on validation dataset

:::success
Model A (CNN): 48%
:::

:::success
Model B (ResNet 50): 86.76%
:::

### 3. Implementation details

First, the Dataset: the class label of each image is parsed from its filename.

```python
import os
from PIL import Image
from torch.utils.data import Dataset

class MyDataset(Dataset):
    def __init__(self, root_dir, transform=None):
        self.root_dir = root_dir
        self.transform = transform
        self.datas = []
        # Filenames look like "<label>_<index>.png", so the label is
        # the part before the first underscore.
        for file in os.listdir(root_dir):
            if file.endswith(".png"):
                filename_split = file.split("_")
                label = int(filename_split[0])
                self.datas.append((file, label))

    def __len__(self):
        return len(self.datas)

    def __getitem__(self, idx):
        img_path = os.path.join(self.root_dir, self.datas[idx][0])
        label = self.datas[idx][1]
        image = Image.open(img_path)
        if self.transform:
            image = self.transform(image)
        return image, label
```

The training set is then augmented with random horizontal flips and normalized; the normalization statistics are computed from the training dataset.

```python
transform = transforms.Compose([
    transforms.Resize((32, 32)),
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.ToTensor(),
    # Per-channel statistics computed from the training set (see below).
    transforms.Normalize(
        mean=[0.5077, 0.4813, 0.4312],
        std=[0.2627, 0.2547, 0.2736]
    ),
])

# Normalization statistics: stack all training images along a new
# last dimension -> shape (3, H, W, N), then reduce per channel.
imgs = torch.stack([img_t for img_t, _ in train_dataset], dim=3)
print("Dataset shape:", imgs.shape)
print("Dataset mean:", imgs.view(3, -1).mean(dim=1))
print("Dataset std:", imgs.view(3, -1).std(dim=1))
```

The loss is cross-entropy and the optimizer is SGD. A StepLR scheduler manages the learning rate, dividing it by 10 every 20 epochs.

```python
from torch.optim.lr_scheduler import StepLR

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)
# Divide the learning rate by 10 every 20 epochs.
scheduler = StepLR(optimizer, step_size=20, gamma=0.1)
```

Finally, PCA and t-SNE are computed with scikit-learn.

```python
from sklearn import decomposition, manifold

def get_tsne(data, n_components=2, n_images=None):
    if n_images is not None:
        data = data[:n_images]
    tsne = manifold.TSNE(n_components=n_components, random_state=0)
    return tsne.fit_transform(data)

def get_pca(data, n_components=2):
    pca = decomposition.PCA(n_components=n_components)
    return pca.fit_transform(data)
```

### 4. Alternative model (ResNet 50)

The biggest difference between my model and ResNet 50 is network depth. ResNet 50 is designed with 5 stages to handle 224×224 images, while this dataset's inputs are only 32×32, which is why my own CNN was deliberately kept shallow. However, simply making a network deeper can actually raise the error rate, so ResNet introduces a residual mechanism: each block learns the difference between the previous layer's output and the current layer's target. This avoids network degradation and yields better results.

### 5. PCA

![](https://i.imgur.com/w0myhlO.png)

### 6. t-SNE

Perhaps because there are 50 classes to separate, the colors in the t-SNE plots end up too similar to distinguish easily. Even so, compared with epoch 5, the epoch-100 result does show some small clusters emerging.

* **Epoch 5**
![](https://i.imgur.com/Qw99ep6.png)
* **Epoch 50**
![](https://i.imgur.com/61aGroq.png)
* **Epoch 100**
![](https://i.imgur.com/qLdGgXd.png)

## Problem 2 - Semantic Segmentation

### 1. **Architecture of model A (VGG16-FCN32s)**

![](https://i.imgur.com/1PghOBq.png)

### 2. **Network architecture of model B (VGG16-FCN8s)**

![](https://i.imgur.com/VOG2kbN.png)

:::info
Same as FCN32s: both models first apply a stack of convolutions, and since these convolutions mirror VGG-16, the convolutional part is simply replaced with a pre-trained VGG16. Compared with FCN32s, however, FCN8s adds several more upsampling layers and fuses information from multiple layers before producing the output, so in theory it should perform noticeably better than FCN32s (a sketch of this skip fusion follows below).

On the output side, FCN32s masks show obvious blockiness, whereas FCN8s, with its extra upsampling and multi-layer information, does not.
:::
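To make the fusion concrete, below is a minimal sketch of an FCN8s-style decoder head, assuming the pre-trained VGG16 encoder exposes its pool3/pool4/pool5 feature maps (256, 512, and 512 channels). The class count and layer names are illustrative, not the exact values used in this homework.

```python
import torch
import torch.nn as nn

class FCN8sHead(nn.Module):
    """Illustrative FCN8s decoder: score three VGG16 feature maps,
    fuse them at increasing resolution, then upsample 8x to full size."""
    def __init__(self, n_classes=7):  # class count is an assumption
        super().__init__()
        # 1x1 convs turn backbone features into per-class score maps.
        self.score3 = nn.Conv2d(256, n_classes, kernel_size=1)
        self.score4 = nn.Conv2d(512, n_classes, kernel_size=1)
        self.score5 = nn.Conv2d(512, n_classes, kernel_size=1)
        # Learned 2x / 8x upsampling via transposed convolutions.
        self.up2_a = nn.ConvTranspose2d(n_classes, n_classes, 4, stride=2, padding=1)
        self.up2_b = nn.ConvTranspose2d(n_classes, n_classes, 4, stride=2, padding=1)
        self.up8 = nn.ConvTranspose2d(n_classes, n_classes, 16, stride=8, padding=4)

    def forward(self, pool3, pool4, pool5):
        x = self.up2_a(self.score5(pool5))  # stride 32 -> 16
        x = x + self.score4(pool4)          # fuse pool4 information
        x = self.up2_b(x)                   # stride 16 -> 8
        x = x + self.score3(pool3)          # fuse pool3 information
        return self.up8(x)                  # stride 8 -> full resolution

# Shape check with feature maps from a 512x512 input:
p3 = torch.randn(1, 256, 64, 64)
p4 = torch.randn(1, 512, 32, 32)
p5 = torch.randn(1, 512, 16, 16)
print(FCN8sHead()(p3, p4, p5).shape)  # torch.Size([1, 7, 512, 512])
```

The element-wise additions are exactly what FCN32s lacks: it upsamples the pool5 scores by 32× in a single step, which is why its masks look blocky.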
### 3. mIoU

:::success
Model A (FCN32s): 66.1319%
:::

:::success
Model B (FCN8s): 72.0480%
:::

A short sketch of how this metric can be computed is included at the end of this report.

### 4. Segmentation results

:::warning
:::spoiler {state=open} **FCN32s**

#### **FCN32s**

**0013_sat**

| Epoch 5 | Epoch 45 | Epoch 70 |
| -------- | -------- | -------- |
| ![](https://i.imgur.com/GF3ZbVD.png) | ![](https://i.imgur.com/fSLYiKD.png) | ![](https://i.imgur.com/MIaDkYE.png) |

**0062_sat**

| Epoch 5 | Epoch 45 | Epoch 70 |
| -------- | -------- | -------- |
| ![](https://i.imgur.com/2kwcK2y.png) | ![](https://i.imgur.com/F6hNgfZ.png) | ![](https://i.imgur.com/TE7AFvI.png) |

**0104_sat**

| Epoch 5 | Epoch 45 | Epoch 70 |
| -------- | -------- | -------- |
| ![](https://i.imgur.com/tKDZsxS.png) | ![](https://i.imgur.com/mMyf5Ge.png) | ![](https://i.imgur.com/qKlQRYf.png) |

:::
:::

:::warning
:::spoiler {state=open} **FCN8s**

#### **FCN8s**

**0013_sat**

| Epoch 5 | Epoch 80 | Epoch 160 |
| -------- | -------- | -------- |
| ![](https://i.imgur.com/iUqtyWk.png) | ![](https://i.imgur.com/e4xtDjx.png) | ![](https://i.imgur.com/KVtE2og.png) |

**0062_sat**

| Epoch 5 | Epoch 80 | Epoch 160 |
| -------- | -------- | -------- |
| ![](https://i.imgur.com/JE72mfS.png) | ![](https://i.imgur.com/ueYFlPR.png) | ![](https://i.imgur.com/lQK3bgs.png) |

**0104_sat**

| Epoch 5 | Epoch 80 | Epoch 160 |
| -------- | -------- | -------- |
| ![](https://i.imgur.com/a5fQQJF.png) | ![](https://i.imgur.com/lv3nASR.png) | ![](https://i.imgur.com/xzPofZq.png) |

:::
:::
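For reference, here is the minimal sketch of mIoU mentioned in section 3 above: the per-class IoU averaged over the segmentation classes. The function name and the class count in the example are assumptions, not taken from the grading script.

```python
import numpy as np

def mean_iou(pred, gt, n_classes):
    """Average IoU over classes; a class absent from both the
    prediction and the ground truth is skipped."""
    ious = []
    for c in range(n_classes):
        inter = np.sum((pred == c) & (gt == c))  # |pred ∩ gt|
        union = np.sum((pred == c) | (gt == c))  # |pred ∪ gt|
        if union > 0:
            ious.append(inter / union)
    return float(np.mean(ious))

# Toy usage on random 512x512 label maps with 7 classes (assumed count):
pred = np.random.randint(0, 7, size=(512, 512))
gt = np.random.randint(0, 7, size=(512, 512))
print(f"mIoU: {mean_iou(pred, gt, n_classes=7):.4%}")
```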