# Deep Learning HW1: Function Approximation
###### tags: `pytorch`, `Python Notes`
410823001, Electrical Engineering (senior), 許哲瑜
1. What problems did you encounter when doing this assignment?
Although I had already used frameworks such as PyTorch and TensorFlow to train AI models, this was my first time using the d2l toolkit to build an MLP. It introduces a lot of class-inheritance concepts, so I spent quite a bit of time relearning how the pieces fit together.
During training, no matter which activation function, loss function, or optimizer I chose, the training loss got stuck at around 0.02 and would not drop even with many more epochs. The cause might be too few layers, or the number of neurons in a single layer, but I did not know of any principled basis for adjusting them; every change was essentially a guess, and this was the biggest problem I ran into.
2. How did you solve the problems?
* For this assignment I tried to do everything with classes. First, to read the training data, I inherit from d2l.DataModule, hand the class the training and test data, and redefine the get_dataloader method so that the batch_size used in each training step can be set. The code is shown below:
```python=
from d2l import torch as d2l
import random
import torch
from torch import nn
from torch.optim import lr_scheduler

class MyData(d2l.DataModule):  #@save
    def __init__(self, train_f, train_l, batch_size=64):
        super().__init__()
        self.save_hyperparameters()   # stores batch_size as self.batch_size
        self.X = train_f              # training features
        self.num_train = len(train_f)
        self.y = train_l              # training labels
```
```python=
@d2l.add_to_class(MyData)
def get_dataloader(self, train):
    if train:
        indices = list(range(0, self.num_train))
        # The examples are read in random order
        random.shuffle(indices)
        for i in range(0, len(indices), self.batch_size):
            batch_indices = torch.tensor(indices[i: i+self.batch_size])
            yield self.X[batch_indices], self.y[batch_indices]
    else:
        # Validation split: self.v_X, self.v_y and self.num_val are expected
        # to be set on the instance when test/validation data is supplied
        indices = list(range(0, self.num_val))
        for i in range(0, len(indices), self.batch_size):
            batch_indices = torch.tensor(indices[i: i+self.batch_size])
            yield self.v_X[batch_indices], self.v_y[batch_indices]
```
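A quick way to confirm the dataloader behaves as expected is to hand MyData a couple of placeholder tensors and pull one batch; the shapes below are made up for illustration and are not the actual assignment data:
```python=
# Hypothetical smoke test: 1000 random samples with 2 features each
dummy_X = torch.randn(1000, 2)
dummy_y = torch.randn(1000, 1)
data = MyData(dummy_X, dummy_y, batch_size=64)

features, labels = next(data.get_dataloader(train=True))
print(features.shape, labels.shape)  # torch.Size([64, 2]) torch.Size([64, 1])
```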
* Next, I built the model architecture. ReLU is the main activation function, the loss is computed with nn.MSELoss(reduction='mean'), and the model is trained with the Adam optimizer.
```python=
class LinearRegressionScratch(d2l.Module):  #@save
    def __init__(self, num_inputs, lr, sigma=0.01):
        super().__init__()
        self.log_loss = float('inf')  # latest batch loss; starts high so the untrained model is never checkpointed
        self.min_loss = 0.0215        # checkpoint threshold used by the trainer
        self.save_hyperparameters()
        #self.w = torch.normal(0, sigma, (num_inputs, 1), requires_grad=True)
        #self.b = torch.zeros(1, requires_grad=True)
        self.net = nn.Sequential(nn.Linear(num_inputs, 64), nn.ReLU(),
                                 nn.Linear(64, 128), nn.ReLU(),
                                 nn.Linear(128, 256), nn.ReLU(),
                                 nn.Linear(256, 256), nn.ReLU(),
                                 nn.Linear(256, 512), nn.ReLU(),
                                 nn.Linear(512, 1024), nn.ReLU(),
                                 nn.Linear(1024, 512), nn.ReLU(),
                                 nn.Linear(512, 128), nn.ReLU(),
                                 nn.Linear(128, 256), nn.ReLU(),
                                 nn.Linear(256, 512), nn.ReLU(),
                                 nn.Linear(512, 1024), nn.ReLU(),
                                 nn.Linear(1024, 512), nn.ReLU(),
                                 nn.Linear(512, 128), nn.ReLU(),
                                 nn.Linear(128, 512), nn.ReLU(),
                                 nn.Linear(512, 1024), nn.ReLU(),
                                 nn.Linear(1024, 512), nn.ReLU(),
                                 nn.Linear(512, 128), nn.ReLU(),
                                 nn.Linear(128, 32), nn.ReLU(),
                                 nn.Linear(32, 1))

    def forward(self, X):
        """Forward pass through the MLP."""
        #return torch.matmul(X, self.w) + self.b
        return self.net(X)

    def loss(self, y_hat, y):
        fn = nn.MSELoss(reduction='mean')
        self.log_loss = fn(y_hat, y)  # keep the latest loss for the scheduler and checkpointing
        return self.log_loss

    def configure_optimizers(self):
        #return SGD([self.w, self.b], self.lr)
        optimizer = torch.optim.Adam(self.parameters(), self.lr)  # Adam
        # Cut the learning rate by a factor of 0.01 when the loss stops improving
        self.scheduler = lr_scheduler.ReduceLROnPlateau(
            optimizer, mode='min', factor=0.01, patience=5, verbose=True,
            threshold=0, threshold_mode='rel', cooldown=0, min_lr=0, eps=1e-08)
        #self.scheduler = lr_scheduler.MultiStepLR(optimizer, milestones=[20,30,35,40,45,75,80,85,90,95,100,110,130], gamma=0.1, verbose=False)
        #self.scheduler = lr_scheduler.CosineAnnealingLR(optimizer, T_max=14, eta_min=0.0000001, last_epoch=-1)
        return optimizer
```
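As a quick sanity check (not part of the original code), the network can be instantiated with random weights and given a random batch to confirm that it maps 2 input features to a single output:
```python=
# Hypothetical shape check with a random batch of 8 samples and 2 features
net = LinearRegressionScratch(num_inputs=2, lr=1e-4)
out = net(torch.randn(8, 2))
print(out.shape)                                 # torch.Size([8, 1])
print(sum(p.numel() for p in net.parameters()))  # total number of trainable weights
```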
* Finally, I extend d2l's Trainer to redefine fit_epoch; after each backward pass of the loss, the scheduler checks whether the learning rate should be adjusted. The code is shown below. With the learning rate set to 0.00012779977 and 150 training epochs, the final loss was 0.0219, and the score obtained after uploading to Kaggle was 0.02216.
```python=
@d2l.add_to_class(d2l.Trainer)  #@save
def fit_epoch(self):
    # Checkpoint the model whenever the logged loss reaches a new minimum
    if self.model.log_loss < self.model.min_loss:
        self.model.min_loss = self.model.log_loss
        print('min_loss', self.model.min_loss)
        torch.save(self.model.state_dict(), 'model.pth')
    self.model.train()
    for batch in self.train_dataloader:
        loss = self.model.training_step(batch)
        self.optim.zero_grad()
        with torch.no_grad():
            loss.backward()
            if self.gradient_clip_val > 0:  # To be discussed later
                self.clip_gradients(self.gradient_clip_val, self.model)
            self.optim.step()
        self.train_batch_idx += 1
        # Ask ReduceLROnPlateau whether the learning rate should be lowered
        self.model.scheduler.step(self.model.log_loss)
        print(self.model.log_loss)
    if self.val_dataloader is None:  # check if the validation dataloader is ready
        return
    self.model.eval()
    for batch in self.val_dataloader:
        with torch.no_grad():
            self.model.validation_step(self.prepare_batch(batch))
        self.val_batch_idx += 1
```
```python=
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

model = LinearRegressionScratch(2, lr=0.00012779977)  # 2 input features
model.to(device)
data = MyData(train_features, train_labels)
trainer = d2l.Trainer(max_epochs=150)
trainer.fit(model, data)

# Final loss on the training set after 150 epochs
preds = model(train_features)
loss = model.loss(preds, train_labels)
print('loss', loss)
```
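For completeness, the sketch below shows one possible way to turn the trained model into the file uploaded to Kaggle; test_features, the id/value column names, and submission.csv are assumptions here, since the exact format depends on the competition:
```python=
import pandas as pd

# Reload the checkpoint written by fit_epoch and switch to evaluation mode
model.load_state_dict(torch.load('model.pth'))
model.eval()
with torch.no_grad():
    test_preds = model(test_features.to(device))  # test_features: assumed test tensor
# 'id' and 'value' are placeholder column names; match the competition template
submission = pd.DataFrame({'id': range(len(test_preds)),
                           'value': test_preds.squeeze(-1).cpu().numpy()})
submission.to_csv('submission.csv', index=False)
```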
3. Is there any innovative design you've made in this assignment?
In configure_optimizers, the method that defines the optimizer, I added lr_scheduler, a tool that adjusts the learning rate dynamically during training, and tried the following three schedulers (a setup sketch follows the list):
a. lr_scheduler.ReduceLROnPlateau: adjusts the learning rate on the fly; when the monitored metric (the loss) has not decreased for 5 consecutive checks, the learning rate is reduced.
b. lr_scheduler.MultiStepLR: when training reaches the epochs listed in milestones, the current learning rate is multiplied by gamma=0.1.
c. lr_scheduler.CosineAnnealingLR: the learning rate rises and falls along a cosine curve with the configured period.
After testing, I found that scheduler (a) gave the best results: when training gets stuck near a local minimum, lowering the learning rate gives the model a better chance of continuing toward the minimum.
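For reference, the sketch below shows how the three schedulers are constructed; the parameter values are copied from the (partly commented-out) lines in configure_optimizers above, and only one scheduler is attached to the optimizer in any given run:
```python=
optimizer = torch.optim.Adam(model.parameters(), lr=0.00012779977)

# (a) cut the LR by `factor` once the loss has not improved for `patience` checks
scheduler = lr_scheduler.ReduceLROnPlateau(optimizer, mode='min',
                                           factor=0.01, patience=5)
# (b) multiply the LR by gamma=0.1 at each epoch listed in milestones
# scheduler = lr_scheduler.MultiStepLR(optimizer, gamma=0.1,
#     milestones=[20, 30, 35, 40, 45, 75, 80, 85, 90, 95, 100, 110, 130])
# (c) anneal the LR up and down along a cosine curve with period T_max
# scheduler = lr_scheduler.CosineAnnealingLR(optimizer, T_max=14, eta_min=1e-7)

scheduler.step(0.025)  # (a) takes the monitored loss; (b) and (c) are stepped with no argument
```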
4. What have you learned in this assignment?
In this assignment I learned how to use the d2l toolkit: with basic inheritance and by overriding a few methods, it lets me assemble a complete model quickly. I also learned from online searches how to use the functions that adjust the learning rate dynamically.
I also found that batch_size, the number of samples read in each training step, matters: a larger batch_size makes the model converge quickly, but it can also leave training stuck at a local minimum with no further improvement. The number of layers and the number of neurons per layer likewise affect training speed, so getting a better model means tuning a great many parameters, which costs a lot of time and effort; every small detail has to be watched to reach better results.
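As an illustration of this kind of manual tuning (the candidate batch sizes and the shortened epoch count below are examples, not the settings actually used), one could compare a few batch sizes in a simple loop:
```python=
# Hypothetical sweep over batch_size; all other hyperparameters stay fixed
for bs in (32, 64, 128, 256):
    data = MyData(train_features, train_labels, batch_size=bs)
    model = LinearRegressionScratch(2, lr=0.00012779977)
    trainer = d2l.Trainer(max_epochs=20)  # shorter run, just for comparison
    trainer.fit(model, data)
    with torch.no_grad():
        final_loss = model.loss(model(train_features), train_labels)
    print(f'batch_size={bs}: training loss {final_loss.item():.4f}')
```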