# Deep Learning HW1: Function Approximation
###### tags: `pytorch`, `Python Notes`
410823001, Electrical Engineering (senior), 許哲瑜
1. What problems did you encounter when doing this assignment?
Although I had already used frameworks such as PyTorch and TensorFlow to train AI models, this was my first time using the d2l toolkit to build an MLP. It introduces a lot of class-inheritance concepts, so I spent quite a bit of time relearning how the pieces fit together.
During training, no matter which activation function, loss function, or optimizer I chose, the training loss got stuck at around 0.02 and would not drop even with many more epochs. The cause might be too few layers, or the number of neurons in a single layer, but I did not know of any principled basis for adjusting them; every change was essentially a guess, and this was the biggest problem I ran into.
2. How did you solve the problems?
* For this assignment I tried to do everything with classes. First, to read the training data, I inherit from d2l.DataModule, hand the class the training and test data, and redefine the get_dataloader method so that the batch_size used in each training step can be set. The code is shown below:
```python=
from d2l import torch as d2l
import random
import torch
from torch import nn
from torch.optim import lr_scheduler

class MyData(d2l.DataModule):  #@save
    def __init__(self, train_f, train_l, batch_size=64):
        super().__init__()
        self.save_hyperparameters()   # stores batch_size as self.batch_size
        self.X = train_f              # training features
        self.num_train = len(train_f)
        self.y = train_l              # training labels
```
```python=
@d2l.add_to_class(MyData)
def get_dataloader(self, train):
    if train:
        indices = list(range(0, self.num_train))
        # The examples are read in random order
        random.shuffle(indices)
        for i in range(0, len(indices), self.batch_size):
            batch_indices = torch.tensor(indices[i: i+self.batch_size])
            yield self.X[batch_indices], self.y[batch_indices]
    else:
        # Validation split: self.v_X, self.v_y and self.num_val are expected
        # to be set on the instance when test/validation data is supplied
        indices = list(range(0, self.num_val))
        for i in range(0, len(indices), self.batch_size):
            batch_indices = torch.tensor(indices[i: i+self.batch_size])
            yield self.v_X[batch_indices], self.v_y[batch_indices]
```
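A quick way to confirm the dataloader behaves as expected is to hand MyData a couple of placeholder tensors and pull one batch; the shapes below are made up for illustration and are not the actual assignment data:
```python=
# Hypothetical smoke test: 1000 random samples with 2 features each
dummy_X = torch.randn(1000, 2)
dummy_y = torch.randn(1000, 1)
data = MyData(dummy_X, dummy_y, batch_size=64)

features, labels = next(data.get_dataloader(train=True))
print(features.shape, labels.shape)  # torch.Size([64, 2]) torch.Size([64, 1])
```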
* Next, I built the model architecture. ReLU is the main activation function, the loss is computed with nn.MSELoss(reduction='mean'), and the model is trained with the Adam optimizer.
```python=
class LinearRegressionScratch(d2l.Module):  #@save
    def __init__(self, num_inputs, lr, sigma=0.01):
        super().__init__()
        self.log_loss = float('inf')  # latest batch loss; starts high so the untrained model is never checkpointed
        self.min_loss = 0.0215        # checkpoint threshold used by the trainer
        self.save_hyperparameters()
        #self.w = torch.normal(0, sigma, (num_inputs, 1), requires_grad=True)
        #self.b = torch.zeros(1, requires_grad=True)
        self.net = nn.Sequential(nn.Linear(num_inputs, 64), nn.ReLU(),
                                 nn.Linear(64, 128), nn.ReLU(),
                                 nn.Linear(128, 256), nn.ReLU(),
                                 nn.Linear(256, 256), nn.ReLU(),
                                 nn.Linear(256, 512), nn.ReLU(),
                                 nn.Linear(512, 1024), nn.ReLU(),
                                 nn.Linear(1024, 512), nn.ReLU(),
                                 nn.Linear(512, 128), nn.ReLU(),
                                 nn.Linear(128, 256), nn.ReLU(),
                                 nn.Linear(256, 512), nn.ReLU(),
                                 nn.Linear(512, 1024), nn.ReLU(),
                                 nn.Linear(1024, 512), nn.ReLU(),
                                 nn.Linear(512, 128), nn.ReLU(),
                                 nn.Linear(128, 512), nn.ReLU(),
                                 nn.Linear(512, 1024), nn.ReLU(),
                                 nn.Linear(1024, 512), nn.ReLU(),
                                 nn.Linear(512, 128), nn.ReLU(),
                                 nn.Linear(128, 32), nn.ReLU(),
                                 nn.Linear(32, 1))

    def forward(self, X):
        """Forward pass through the MLP."""
        #return torch.matmul(X, self.w) + self.b
        return self.net(X)

    def loss(self, y_hat, y):
        fn = nn.MSELoss(reduction='mean')
        self.log_loss = fn(y_hat, y)  # keep the latest loss for the scheduler and checkpointing
        return self.log_loss

    def configure_optimizers(self):
        #return SGD([self.w, self.b], self.lr)
        optimizer = torch.optim.Adam(self.parameters(), self.lr)  # Adam
        # Cut the learning rate by a factor of 0.01 when the loss stops improving
        self.scheduler = lr_scheduler.ReduceLROnPlateau(
            optimizer, mode='min', factor=0.01, patience=5, verbose=True,
            threshold=0, threshold_mode='rel', cooldown=0, min_lr=0, eps=1e-08)
        #self.scheduler = lr_scheduler.MultiStepLR(optimizer, milestones=[20,30,35,40,45,75,80,85,90,95,100,110,130], gamma=0.1, verbose=False)
        #self.scheduler = lr_scheduler.CosineAnnealingLR(optimizer, T_max=14, eta_min=0.0000001, last_epoch=-1)
        return optimizer
```
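As a quick sanity check (not part of the original code), the network can be instantiated with random weights and given a random batch to confirm that it maps 2 input features to a single output:
```python=
# Hypothetical shape check with a random batch of 8 samples and 2 features
net = LinearRegressionScratch(num_inputs=2, lr=1e-4)
out = net(torch.randn(8, 2))
print(out.shape)                                 # torch.Size([8, 1])
print(sum(p.numel() for p in net.parameters()))  # total number of trainable weights
```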
* Finally, I extend d2l's Trainer to redefine fit_epoch; after each backward pass of the loss, the scheduler checks whether the learning rate should be adjusted. The code is shown below. With the learning rate set to 0.00012779977 and 150 training epochs, the final loss was 0.0219, and the score obtained after uploading to Kaggle was 0.02216.
```python=
@d2l.add_to_class(d2l.Trainer)  #@save
def fit_epoch(self):
    # Checkpoint the model whenever the logged loss reaches a new minimum
    if self.model.log_loss < self.model.min_loss:
        self.model.min_loss = self.model.log_loss
        print('min_loss', self.model.min_loss)
        torch.save(self.model.state_dict(), 'model.pth')
    self.model.train()
    for batch in self.train_dataloader:
        loss = self.model.training_step(batch)
        self.optim.zero_grad()
        with torch.no_grad():
            loss.backward()
            if self.gradient_clip_val > 0:  # To be discussed later
                self.clip_gradients(self.gradient_clip_val, self.model)
            self.optim.step()
        self.train_batch_idx += 1
        # Ask ReduceLROnPlateau whether the learning rate should be lowered
        self.model.scheduler.step(self.model.log_loss)
        print(self.model.log_loss)
    if self.val_dataloader is None:  # check if the validation dataloader is ready
        return
    self.model.eval()
    for batch in self.val_dataloader:
        with torch.no_grad():
            self.model.validation_step(self.prepare_batch(batch))
        self.val_batch_idx += 1
```
```python=
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

model = LinearRegressionScratch(2, lr=0.00012779977)  # 2 input features
model.to(device)
data = MyData(train_features, train_labels)
trainer = d2l.Trainer(max_epochs=150)
trainer.fit(model, data)

# Final loss on the training set after 150 epochs
preds = model(train_features)
loss = model.loss(preds, train_labels)
print('loss', loss)
```
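For completeness, the sketch below shows one possible way to turn the trained model into the file uploaded to Kaggle; test_features, the id/value column names, and submission.csv are assumptions here, since the exact format depends on the competition:
```python=
import pandas as pd

# Reload the checkpoint written by fit_epoch and switch to evaluation mode
model.load_state_dict(torch.load('model.pth'))
model.eval()
with torch.no_grad():
    test_preds = model(test_features.to(device))  # test_features: assumed test tensor
# 'id' and 'value' are placeholder column names; match the competition template
submission = pd.DataFrame({'id': range(len(test_preds)),
                           'value': test_preds.squeeze(-1).cpu().numpy()})
submission.to_csv('submission.csv', index=False)
```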
3. Is there any innovative design you've made in this assignment?
In configure_optimizers, the method that defines the optimizer, I added lr_scheduler, a tool that adjusts the learning rate dynamically during training, and tried the following three schedulers (a setup sketch follows the list):
a. lr_scheduler.ReduceLROnPlateau: adjusts the learning rate on the fly; when the monitored metric (the loss) has not decreased for 5 consecutive checks, the learning rate is reduced.
b. lr_scheduler.MultiStepLR: when training reaches the epochs listed in milestones, the current learning rate is multiplied by gamma=0.1.
c. lr_scheduler.CosineAnnealingLR: the learning rate rises and falls along a cosine curve with the configured period.
After testing, I found that scheduler (a) gave the best results: when training gets stuck near a local minimum, lowering the learning rate gives the model a better chance of continuing toward the minimum.
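For reference, the sketch below shows how the three schedulers are constructed; the parameter values are copied from the (partly commented-out) lines in configure_optimizers above, and only one scheduler is attached to the optimizer in any given run:
```python=
optimizer = torch.optim.Adam(model.parameters(), lr=0.00012779977)

# (a) cut the LR by `factor` once the loss has not improved for `patience` checks
scheduler = lr_scheduler.ReduceLROnPlateau(optimizer, mode='min',
                                           factor=0.01, patience=5)
# (b) multiply the LR by gamma=0.1 at each epoch listed in milestones
# scheduler = lr_scheduler.MultiStepLR(optimizer, gamma=0.1,
#     milestones=[20, 30, 35, 40, 45, 75, 80, 85, 90, 95, 100, 110, 130])
# (c) anneal the LR up and down along a cosine curve with period T_max
# scheduler = lr_scheduler.CosineAnnealingLR(optimizer, T_max=14, eta_min=1e-7)

scheduler.step(0.025)  # (a) takes the monitored loss; (b) and (c) are stepped with no argument
```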
4. What have you learned in this assignment?
In this assignment I learned how to use the d2l toolkit: with basic inheritance and by overriding a few methods, it lets me assemble a complete model quickly. I also learned from online searches how to use the functions that adjust the learning rate dynamically.
I also found that batch_size, the number of samples read in each training step, matters: a larger batch_size makes the model converge quickly, but it can also leave training stuck at a local minimum with no further improvement. The number of layers and the number of neurons per layer likewise affect training speed, so getting a better model means tuning a great many parameters, which costs a lot of time and effort; every small detail has to be watched to reach better results.
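As an illustration of this kind of manual tuning (the candidate batch sizes and the shortened epoch count below are examples, not the settings actually used), one could compare a few batch sizes in a simple loop:
```python=
# Hypothetical sweep over batch_size; all other hyperparameters stay fixed
for bs in (32, 64, 128, 256):
    data = MyData(train_features, train_labels, batch_size=bs)
    model = LinearRegressionScratch(2, lr=0.00012779977)
    trainer = d2l.Trainer(max_epochs=20)  # shorter run, just for comparison
    trainer.fit(model, data)
    with torch.no_grad():
        final_loss = model.loss(model(train_features), train_labels)
    print(f'batch_size={bs}: training loss {final_loss.item():.4f}')
```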