# ML2021Spring-hw4 Solution Notes
## Problem Description
- Dataset: VoxCeleb1
- Task: speaker classification using a Transformer Encoder
- 69438 labeled samples for training, with one tenth split off for validation (see the sketch below)
- 6000 unlabeled samples for testing
- 600 classes (600 speakers)
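
For the 90/10 split, here is a minimal sketch using `torch.utils.data.random_split`; the `dataset` object holding the labeled data is assumed, and this mirrors the common pattern rather than the exact sample code:
```python=
from torch.utils.data import random_split

# Hold out one tenth of the labeled data for validation.
trainlen = int(0.9 * len(dataset))
lengths = [trainlen, len(dataset) - trainlen]
trainset, validset = random_split(dataset, lengths)
```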
## 成果

Passed the strong baseline (Private: 0.95333, Public: 0.95404)
## What Did I Do?
Start with the original network:
```python=
import torch
import torch.nn as nn


class Classifier(nn.Module):
    def __init__(self, d_model=80, n_spks=600, dropout=0.1):
        super().__init__()
        # Project the 40-dim mel-spectrogram features into d_model.
        self.prenet = nn.Linear(40, d_model)
        self.encoder_layer = nn.TransformerEncoderLayer(
            d_model=d_model, dim_feedforward=256, nhead=2
        )
        # self.encoder = nn.TransformerEncoder(self.encoder_layer, num_layers=2)
        self.pred_layer = nn.Sequential(
            nn.Linear(d_model, d_model),
            nn.ReLU(),
            nn.Linear(d_model, n_spks),
        )

    def forward(self, mels):
        out = self.prenet(mels)
        # (batch, length, d_model) -> (length, batch, d_model)
        out = out.permute(1, 0, 2)
        out = self.encoder_layer(out)
        out = out.transpose(0, 1)
        # mean pooling over the time dimension
        stats = out.mean(dim=1)
        # out: (batch, n_spks)
        out = self.pred_layer(stats)
        return out
```
The TA's walkthrough hinted that the model might be too complex: try reducing `d_model` and the number of multi-head attention heads, or even making the linear layers simpler. Roughly, that direction would look like the sketch below.
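A minimal sketch of that suggested direction; the specific values `d_model=40` and `nhead=1` are my own illustrative guesses, not from the assignment:
```python=
class SimplerClassifier(nn.Module):
    # Hypothetical simplified variant: smaller d_model, a single
    # attention head, and a single linear prediction layer.
    # forward() would be unchanged from the original above.
    def __init__(self, d_model=40, n_spks=600, dropout=0.1):
        super().__init__()
        self.prenet = nn.Linear(40, d_model)
        self.encoder_layer = nn.TransformerEncoderLayer(
            d_model=d_model, dim_feedforward=256, nhead=1
        )
        self.pred_layer = nn.Linear(d_model, n_spks)
```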
But my first attempt went the opposite way: I changed `d_model` to 100 and `nhead` to 4, added BatchNorm between the linear layers, and stacked the encoder two layers deep, giving the following:
```python=
class Classifier(nn.Module):
    # Changes vs. the original: d_model 80 -> 100, nhead 2 -> 4,
    # BatchNorm between the linear layers, encoder stacked two layers deep.
    def __init__(self, d_model=100, n_spks=600, dropout=0.1):
        super().__init__()
        # Project the dimension of features from that of input into d_model.
        self.prenet = nn.Linear(40, d_model)
        self.encoder_layer = nn.TransformerEncoderLayer(
            d_model=d_model, dim_feedforward=256, nhead=4
        )
        self.encoder = nn.TransformerEncoder(self.encoder_layer, num_layers=2)
        self.pred_layer = nn.Sequential(
            nn.Linear(d_model, d_model),
            nn.BatchNorm1d(d_model),
            nn.ReLU(),
            nn.Linear(d_model, n_spks),
        )

    def forward(self, mels):
        # out: (batch size, length, d_model)
        out = self.prenet(mels)
        # out: (length, batch size, d_model)
        out = out.permute(1, 0, 2)
        # The encoder expects features in the shape (length, batch size, d_model).
        out = self.encoder(out)
        # out: (batch size, length, d_model)
        out = out.transpose(0, 1)
        # mean pooling
        stats = out.mean(dim=1)
        # out: (batch, n_spks)
        out = self.pred_layer(stats)
        return out
```
This already passed the medium baseline, and was not far from the strong baseline.
Looking at the training metrics, training accuracy and validation accuracy were close to each other, and training accuracy had not yet reached 100%, so I figured the model still had room to improve.
So I tweaked the training procedure to load the best model saved in the previous run before training starts:
```python=
# Load the latest best model before training resumes.
model.load_state_dict(torch.load('model.ckpt'))
print("[Info]: Loaded the trained model.", flush=True)
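One caveat: on the very first run there is no checkpoint yet, so `torch.load` would raise an error. A small guard, as a sketch that keeps the `model.ckpt` path from the snippet above:
```python=
import os

# Skip loading when no checkpoint from a previous run exists yet.
if os.path.exists('model.ckpt'):
    model.load_state_dict(torch.load('model.ckpt'))
    print("[Info]: Loaded the trained model.", flush=True)
```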
I also added a prompt so that when training reaches the last step, I can choose whether to stop:
```python=
if (step + 1) == total_steps:
    yn = input("\nContinue Training? [y/n] ")
    if yn == 'y':
        total_steps += 10000
```
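One note on the loop structure: extending `total_steps` only works if the loop re-checks it on every iteration; a plain `for step in range(total_steps)` captures the range once and would ignore the increment. A sketch of a compatible loop, with the training-step body elided and the initial budget as an illustrative value:
```python=
step = 0
total_steps = 70000  # illustrative initial budget

while step < total_steps:
    # ... one training step: fetch batch, forward, loss, backward,
    #     optimizer/scheduler step ...
    if (step + 1) == total_steps:
        yn = input("\nContinue Training? [y/n] ")
        if yn == 'y':
            total_steps += 10000
    step += 1
```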
After several rounds of reloading the current best model and training it hard again, the model passed the strong baseline.
***Probably the fastest I have ever gotten a good result.***
Note: this assignment uses learning-rate warm-up, so right after reloading, accuracy starts slightly lower than it was when the previous round saved the checkpoint; however, the final result of the new round is better than the previous one, and this also works better than simply pressing y to keep training.
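
A schedule of this shape explains the dip: when training restarts, the step counter resets to 0, so the learning rate ramps up again and briefly perturbs the already-trained weights. A minimal sketch of the usual `LambdaLR`-based warm-up-plus-cosine-decay pattern, which may differ from the assignment's exact implementation:
```python=
import math
from torch.optim.lr_scheduler import LambdaLR

def get_warmup_schedule(optimizer, num_warmup_steps, num_training_steps):
    # Linear warm-up followed by cosine decay; a common pattern,
    # not necessarily the assignment's exact code.
    def lr_lambda(current_step):
        if current_step < num_warmup_steps:
            # LR ramps linearly from 0 up to the base LR.
            return current_step / max(1, num_warmup_steps)
        progress = (current_step - num_warmup_steps) / max(
            1, num_training_steps - num_warmup_steps
        )
        # Cosine decay from the base LR down to 0.
        return max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))
    return LambdaLR(optimizer, lr_lambda)
```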