# PyTorch Tutorial
A machine learning framework.
It computes tensors (with GPU acceleration) and gradients.
Compared to TensorFlow, PyTorch is more often used for research and is easier to debug.
## Basic Steps
1. Load data, using torch.utils.data.Dataset or torch.utils.data.DataLoader
2. Build a neural network, define a loss function, and set up an optimizer (which runs gradient descent), using the torch.nn and torch.optim packages
3. Training: load training data and train the model
4. Validation: evaluate the model on separate data (alternating with training)
5. Testing: load test data and produce predictions
## Tensor - nD array
Two common element types:
torch.float (32-bit floating point), created with torch.FloatTensor
torch.long (64-bit integer), created with torch.LongTensor
Check the type with a.dtype
A tensor's shape is the same concept as a numpy array's shape
Check it with a.shape
A tensor's dim corresponds to numpy's axis
Creating tensors works much like creating numpy arrays:
torch.tensor([[1, 2], [3, 4]])
torch.from_numpy(np.array([[1, 2], [3, 4]]))
torch.zeros([2, 2])
torch.ones([1, 2, 5]) (the shape is passed as a list)
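A minimal sketch of creating tensors and checking these attributes:
```
import numpy as np
import torch

a = torch.tensor([[1., 2.], [3., 4.]])            # 2x2 float tensor
b = torch.from_numpy(np.array([[1, 2], [3, 4]]))  # built from a numpy array
print(a.dtype)                      # torch.float32
print(a.shape)                      # torch.Size([2, 2])
print(torch.zeros([2, 2]))          # 2x2 tensor of zeros
print(torch.ones([1, 2, 5]).shape)  # torch.Size([1, 2, 5])
```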
## Common Operations
(dims are numbered starting from 0)
a.reshape([m, n]) reshape (same as np)
a.squeeze(n) removes dim n of a if it has size 1 (same as np)
a.unsqueeze(n) inserts a new dimension at position n
(np uses np.expand_dims(a, n))
a.transpose(m, n) transposes a (swaps dims m and n)
torch.cat([a, b, c], dim=n) concatenates tensors a, b, c along dim n (the sizes of the other dims must match)
a + b
a - b
a.pow(2) element-wise square of a
a.sum() sum of all elements
a.mean() mean of all elements
CPU vs. GPU operations (the GPU can run computations in parallel):
by default a = a.to('cpu')
move to the GPU with a = a.to('cuda')
check whether a GPU is available with torch.cuda.is_available()
with multiple GPUs you can pick one, e.g. 'cuda:0', 'cuda:1', ...
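A short sketch exercising the operations above:
```
import torch

a = torch.zeros([2, 3])
print(a.reshape([3, 2]).shape)           # torch.Size([3, 2])
print(a.unsqueeze(0).shape)              # torch.Size([1, 2, 3]) -- new dim 0
print(a.unsqueeze(0).squeeze(0).shape)   # torch.Size([2, 3])    -- dim 0 removed
print(a.transpose(0, 1).shape)           # torch.Size([3, 2])    -- dims 0 and 1 swapped
b = torch.ones([2, 3])
print(torch.cat([a, b], dim=0).shape)    # torch.Size([4, 3])
print((a + b).pow(2).sum())              # tensor(6.)
if torch.cuda.is_available():
    a = a.to('cuda')                     # move to GPU; a.to('cpu') moves it back
```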
## Gradient
Define z as the sum of squares of the elements of tensor x.
The partial derivative of z with respect to x_ij is then 2*x_ij.
Build x:
x = torch.tensor([[1., 2.], [3., 4.]], requires_grad=True)
Build z:
z = x.pow(2).sum()
Compute the gradient:
z.backward()
Output it:
x.grad
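As a runnable check (note that x must be a floating-point tensor for gradients to work):
```
import torch

x = torch.tensor([[1., 2.], [3., 4.]], requires_grad=True)
z = x.pow(2).sum()
z.backward()
print(x.grad)   # tensor([[2., 4.], [6., 8.]]), i.e. 2*x
```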
## Loading Data
The dataset is wrapped in a DataLoader; each time it is iterated, the DataLoader draws samples from the dataset and groups them into a mini-batch of the specified batch size.
```
from torch.utils.data import Dataset, DataLoader

class MyDataset(Dataset):
    def __init__(self, file):
        # Read data & preprocess
        self.data = ...

    def __getitem__(self, index):
        # Returns one sample at a time
        return self.data[index]

    def __len__(self):
        # Returns the size of the dataset
        return len(self.data)
```
```
dataset = MyDataset(file)
dataloader = DataLoader(dataset, batch_size, shuffle=True)
# shuffle only when training; use shuffle=False for testing
```
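A toy sketch of the batching behaviour (the 10-sample dataset here is just an illustration):
```
import torch
from torch.utils.data import Dataset, DataLoader

class ToyDataset(Dataset):
    def __init__(self):
        self.data = torch.arange(10, dtype=torch.float32)  # 10 samples
    def __getitem__(self, index):
        return self.data[index]
    def __len__(self):
        return len(self.data)

loader = DataLoader(ToyDataset(), batch_size=4, shuffle=True)
for batch in loader:
    print(batch.shape)  # torch.Size([4]) for full batches, torch.Size([2]) for the last
```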
## Building a Network
Linear layer (fully-connected layer):
nn.Linear(in_features, out_features)
The two arguments are the input and output dimensions.
For example, with a 32-dim input x and a 64-dim output y: nn.Linear(32, 64)
Wx + b = y
W is a 64×32 matrix, b a 64-dim vector

layer = nn.Linear(in_features, out_features)
layer.weight is the matrix W
layer.bias is the vector b
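A quick check of these shapes:
```
import torch.nn as nn

layer = nn.Linear(32, 64)
print(layer.weight.shape)   # torch.Size([64, 32]) -- the matrix W
print(layer.bias.shape)     # torch.Size([64])     -- the vector b
```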
Common activation functions:
nn.Sigmoid()
nn.ReLU()
Common loss functions:
nn.MSELoss() mean squared error (for regression)
nn.CrossEntropyLoss() cross entropy (for classification)
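A small sketch of both losses on dummy data (the numbers are just illustrations):
```
import torch
import torch.nn as nn

# MSE for regression: prediction and target have the same shape
mse = nn.MSELoss()
print(mse(torch.tensor([0.5, 1.0]), torch.tensor([1.0, 1.0])))  # tensor(0.1250)

# Cross entropy for classification: raw logits (N, C) and class indices (N,)
ce = nn.CrossEntropyLoss()
logits = torch.tensor([[2.0, 0.5, 0.1]])  # one sample, three classes
target = torch.tensor([0])                # true class index
print(ce(logits, target))
```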
Example:
```
import torch.nn as nn

class MyModel(nn.Module):
    def __init__(self):
        # Initialize model & define layers
        super(MyModel, self).__init__()
        self.net = nn.Sequential(
            nn.Linear(10, 32),
            nn.Sigmoid(),
            nn.Linear(32, 1)
        )

    def forward(self, x):
        # Compute output of NN
        return self.net(x)
```
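Usage: this model maps a batch of 10-dim inputs to 1-dim outputs.
```
import torch

model = MyModel()
x = torch.randn(4, 10)   # a batch of 4 samples with 10 features each
print(model(x).shape)    # torch.Size([4, 1])
```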

## Optimizer
Uses the computed gradients to update the model parameters.
The most common choice is Stochastic Gradient Descent (SGD):
torch.optim.SGD(params, lr, momentum=0)
params is model.parameters()
lr is the learning rate
## Training
```
dataset = MyDataset(file)
# read data via MyDataset
tr_set = DataLoader(dataset, 16, shuffle=True)
# put dataset into DataLoader
device = 'cuda' if torch.cuda.is_available() else 'cpu'
# pick the device
model = MyModel().to(device)
# construct model and move to device (cpu/cuda)
criterion = nn.MSELoss()
# set loss function
optimizer = torch.optim.SGD(model.parameters(), 0.1)
# set optimizer
```
Example:
```
for epoch in range(n_epochs):
    # iterate n_epochs
    model.train()
    # set model to train mode
    for x, y in tr_set:
        # iterate through the dataloader
        optimizer.zero_grad()
        # set gradient to zero
        x, y = x.to(device), y.to(device)
        # move data to device (cpu/cuda)
        pred = model(x)
        # forward pass (compute output)
        loss = criterion(pred, y)
        # compute loss
        loss.backward()
        # compute gradient (backpropagation)
        optimizer.step()
        # update model with optimizer
```
## Evaluation (Validation Set)
```
model.eval()
# set model to evaluation mode
total_loss = 0
for x, y in dv_set:
    # iterate through the dataloader
    x, y = x.to(device), y.to(device)
    # move data to device (cpu/cuda)
    with torch.no_grad():
        # disable gradient calculation (not needed for evaluation)
        pred = model(x)
        # forward pass (compute output)
        loss = criterion(pred, y)
        # compute loss
    total_loss += loss.cpu().item() * len(x)
    # accumulate loss
avg_loss = total_loss / len(dv_set.dataset)
# compute averaged loss
```
Based on the final validation loss, decide whether to save this model, as sketched below.
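A minimal sketch of that decision, assuming best_loss is tracked across epochs (initialized to float('inf')) and the checkpoint path is just an illustration:
```
if avg_loss < best_loss:
    best_loss = avg_loss
    torch.save(model.state_dict(), 'model.ckpt')  # keep the best model so far
```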
## Testing
```
model.eval()
# set model to evaluation mode
preds = []
for x in tt_set:
    # iterate through the dataloader
    x = x.to(device)
    # move data to device (cpu/cuda)
    with torch.no_grad():
        # disable gradient calculation
        pred = model(x)
        # forward pass (compute output)
        preds.append(pred.cpu())
        # collect prediction
```
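One common follow-up is to merge the collected batch predictions into a single tensor:
```
preds = torch.cat(preds, dim=0)  # shape: (num test samples, output dim)
```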
## Saving and Loading
Save
torch.save(model.state_dict(), path)
Load
ckpt = torch.load(path)
model.load_state_dict(ckpt)
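A typical round trip (minimal sketch; the path is illustrative):
```
path = 'model.ckpt'
torch.save(model.state_dict(), path)  # save the parameters only

model = MyModel()                     # re-create the architecture first
ckpt = torch.load(path)
model.load_state_dict(ckpt)           # then restore the parameters
```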
## Other Packages
- torchaudio: speech/audio processing
- torchtext: natural language processing
- torchvision: computer vision
- skorch: scikit-learn + PyTorch
### GitHub Resources
https://github.com/huggingface/transformers
Huggingface Transformers (transformer models: BERT, GPT, ...)
https://github.com/pytorch/fairseq
Fairseq (sequence modeling for NLP & speech)
https://github.com/espnet/espnet
ESPnet (speech recognition, translation, synthesis, ...)
## Official Documentation
https://pytorch.org/docs/stable/
torch.nn -> neural network
torch.optim -> optimization algorithms
torch.utils.data -> dataset, dataloader
A function's documentation lists:
parameters: pass the value directly (the part before the * in the docs)
keyword arguments: must be passed by name
arguments written with an equals sign usually have a default value; omit them if you keep the default
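For example, in nn.Linear(in_features, out_features, bias=True), the first two are passed directly, while bias has a default you only write when changing it:
```
layer = nn.Linear(32, 64)              # bias=True by default, no need to write it
layer = nn.Linear(32, 64, bias=False)  # override the default
```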
## Reference
- https://pytorch.org/
- https://github.com/pytorch/pytorch
- https://github.com/wkentaro/pytorch-for-numpy-users
- https://blog.udacity.com/2020/05/pytorch-vs-tensorflow-what-you-need-to-know.html
- https://www.tensorflow.org/
- https://numpy.org/
###### tags: `machine learning` `李宏毅`