# PyTorch Tutorial
A machine learning framework.
It computes tensors (with GPU acceleration) and gradients.
Compared to TensorFlow, PyTorch is more often used for research and is easier to debug.
## Basic Steps
1. Load data, using torch.utils.data.Dataset or torch.utils.data.DataLoader
2. Build a neural network, define a loss function, and set up an optimizer (which runs gradient descent), using the torch.nn and torch.optim packages
3. Training: load training data and train the model
4. Validation: evaluate the model on separate data (alternating with training)
5. Testing: load test data and produce predictions
## Tensor - nD array
Two common element types:
torch.float (32-bit floating point), created with torch.FloatTensor
torch.long (64-bit integer), created with torch.LongTensor
Check the type with a.dtype
A tensor's shape is the same concept as a numpy array's shape
Check it with a.shape
A tensor's dim corresponds to numpy's axis
Creating tensors works much like creating numpy arrays:
torch.tensor([[1, 2], [3, 4]])
torch.from_numpy(np.array([[1, 2], [3, 4]]))
torch.zeros([2, 2])
torch.ones([1, 2, 5]) (the shape is passed as a list)
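A minimal sketch of creating tensors and checking these attributes:
```
import numpy as np
import torch

a = torch.tensor([[1., 2.], [3., 4.]])            # 2x2 float tensor
b = torch.from_numpy(np.array([[1, 2], [3, 4]]))  # built from a numpy array
print(a.dtype)                      # torch.float32
print(a.shape)                      # torch.Size([2, 2])
print(torch.zeros([2, 2]))          # 2x2 tensor of zeros
print(torch.ones([1, 2, 5]).shape)  # torch.Size([1, 2, 5])
```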
## Common Operations
(dims are numbered starting from 0)
a.reshape([m, n]) reshape (same as np)
a.squeeze(n) removes dim n of a if it has size 1 (same as np)
a.unsqueeze(n) inserts a new dimension at position n
(np uses np.expand_dims(a, n))
a.transpose(m, n) transposes a (swaps dims m and n)
torch.cat([a, b, c], dim=n) concatenates tensors a, b, c along dim n (the sizes of the other dims must match)
a + b
a - b
a.pow(2) element-wise square of a
a.sum() sum of all elements
a.mean() mean of all elements
CPU vs. GPU operations (the GPU can run computations in parallel):
by default a = a.to('cpu')
move to the GPU with a = a.to('cuda')
check whether a GPU is available with torch.cuda.is_available()
with multiple GPUs you can pick one, e.g. 'cuda:0', 'cuda:1', ...
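A short sketch exercising the operations above:
```
import torch

a = torch.zeros([2, 3])
print(a.reshape([3, 2]).shape)           # torch.Size([3, 2])
print(a.unsqueeze(0).shape)              # torch.Size([1, 2, 3]) -- new dim 0
print(a.unsqueeze(0).squeeze(0).shape)   # torch.Size([2, 3])    -- dim 0 removed
print(a.transpose(0, 1).shape)           # torch.Size([3, 2])    -- dims 0 and 1 swapped
b = torch.ones([2, 3])
print(torch.cat([a, b], dim=0).shape)    # torch.Size([4, 3])
print((a + b).pow(2).sum())              # tensor(6.)
if torch.cuda.is_available():
    a = a.to('cuda')                     # move to GPU; a.to('cpu') moves it back
```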
## Gradient
Define z as the sum of squares of the elements of tensor x.
The partial derivative of z with respect to x_ij is then 2*x_ij.
Build x:
x = torch.tensor([[1., 2.], [3., 4.]], requires_grad=True)
Build z:
z = x.pow(2).sum()
Compute the gradient:
z.backward()
Output it:
x.grad
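As a runnable check (note that x must be a floating-point tensor for gradients to work):
```
import torch

x = torch.tensor([[1., 2.], [3., 4.]], requires_grad=True)
z = x.pow(2).sum()
z.backward()
print(x.grad)   # tensor([[2., 4.], [6., 8.]]), i.e. 2*x
```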
## Loading Data
The dataset is wrapped in a DataLoader; each time it is iterated, the DataLoader draws samples from the dataset and groups them into a mini-batch of the specified batch size.
```
from torch.utils.data import Dataset, DataLoader

class MyDataset(Dataset):
    def __init__(self, file):
        # Read data & preprocess
        self.data = ...

    def __getitem__(self, index):
        # Returns one sample at a time
        return self.data[index]

    def __len__(self):
        # Returns the size of the dataset
        return len(self.data)
```
```
dataset = MyDataset(file)
dataloader = DataLoader(dataset, batch_size, shuffle=True)
# shuffle only when training; use shuffle=False for testing
```
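A toy sketch of the batching behaviour (the 10-sample dataset here is just an illustration):
```
import torch
from torch.utils.data import Dataset, DataLoader

class ToyDataset(Dataset):
    def __init__(self):
        self.data = torch.arange(10, dtype=torch.float32)  # 10 samples
    def __getitem__(self, index):
        return self.data[index]
    def __len__(self):
        return len(self.data)

loader = DataLoader(ToyDataset(), batch_size=4, shuffle=True)
for batch in loader:
    print(batch.shape)  # torch.Size([4]) for full batches, torch.Size([2]) for the last
```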
## Building a Network
Linear layer (fully-connected layer):
nn.Linear(in_features, out_features)
The two arguments are the input and output dimensions.
For example, with a 32-dim input x and a 64-dim output y: nn.Linear(32, 64)
Wx + b = y
W is a 64×32 matrix, b a 64-dim vector

layer = nn.Linear(in_features, out_features)
layer.weight is the matrix W
layer.bias is the vector b
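A quick check of these shapes:
```
import torch.nn as nn

layer = nn.Linear(32, 64)
print(layer.weight.shape)   # torch.Size([64, 32]) -- the matrix W
print(layer.bias.shape)     # torch.Size([64])     -- the vector b
```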
Common activation functions:
nn.Sigmoid()
nn.ReLU()
Common loss functions:
nn.MSELoss() mean squared error (for regression)
nn.CrossEntropyLoss() cross entropy (for classification)
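A small sketch of both losses on dummy data (the numbers are just illustrations):
```
import torch
import torch.nn as nn

# MSE for regression: prediction and target have the same shape
mse = nn.MSELoss()
print(mse(torch.tensor([0.5, 1.0]), torch.tensor([1.0, 1.0])))  # tensor(0.1250)

# Cross entropy for classification: raw logits (N, C) and class indices (N,)
ce = nn.CrossEntropyLoss()
logits = torch.tensor([[2.0, 0.5, 0.1]])  # one sample, three classes
target = torch.tensor([0])                # true class index
print(ce(logits, target))
```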
Example:
```
import torch.nn as nn

class MyModel(nn.Module):
    def __init__(self):
        # Initialize model & define layers
        super(MyModel, self).__init__()
        self.net = nn.Sequential(
            nn.Linear(10, 32),
            nn.Sigmoid(),
            nn.Linear(32, 1)
        )

    def forward(self, x):
        # Compute output of NN
        return self.net(x)
```
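Usage: this model maps a batch of 10-dim inputs to 1-dim outputs.
```
import torch

model = MyModel()
x = torch.randn(4, 10)   # a batch of 4 samples with 10 features each
print(model(x).shape)    # torch.Size([4, 1])
```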

## Optimizer
Uses the computed gradients to update the model parameters.
The most common choice is Stochastic Gradient Descent (SGD):
torch.optim.SGD(params, lr, momentum=0)
params is model.parameters()
lr is the learning rate
## Training
```
dataset = MyDataset(file)
# read data via MyDataset
tr_set = DataLoader(dataset, 16, shuffle=True)
# put dataset into DataLoader
device = 'cuda' if torch.cuda.is_available() else 'cpu'
# pick the device
model = MyModel().to(device)
# construct model and move to device (cpu/cuda)
criterion = nn.MSELoss()
# set loss function
optimizer = torch.optim.SGD(model.parameters(), 0.1)
# set optimizer
```
Example:
```
for epoch in range(n_epochs):
    # iterate n_epochs
    model.train()
    # set model to train mode
    for x, y in tr_set:
        # iterate through the dataloader
        optimizer.zero_grad()
        # set gradient to zero
        x, y = x.to(device), y.to(device)
        # move data to device (cpu/cuda)
        pred = model(x)
        # forward pass (compute output)
        loss = criterion(pred, y)
        # compute loss
        loss.backward()
        # compute gradient (backpropagation)
        optimizer.step()
        # update model with optimizer
```
## Evaluation (Validation Set)
```
model.eval()
# set model to evaluation mode
total_loss = 0
for x, y in dv_set:
    # iterate through the dataloader
    x, y = x.to(device), y.to(device)
    # move data to device (cpu/cuda)
    with torch.no_grad():
        # disable gradient calculation (not needed for evaluation)
        pred = model(x)
        # forward pass (compute output)
        loss = criterion(pred, y)
        # compute loss
    total_loss += loss.cpu().item() * len(x)
    # accumulate loss
avg_loss = total_loss / len(dv_set.dataset)
# compute averaged loss
```
Based on the final validation loss, decide whether to save this model, as sketched below.
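A minimal sketch of that decision, assuming best_loss is tracked across epochs (initialized to float('inf')) and the checkpoint path is just an illustration:
```
if avg_loss < best_loss:
    best_loss = avg_loss
    torch.save(model.state_dict(), 'model.ckpt')  # keep the best model so far
```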
## Testing
```
model.eval()
# set model to evaluation mode
preds = []
for x in tt_set:
    # iterate through the dataloader
    x = x.to(device)
    # move data to device (cpu/cuda)
    with torch.no_grad():
        # disable gradient calculation
        pred = model(x)
        # forward pass (compute output)
        preds.append(pred.cpu())
        # collect prediction
```
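One common follow-up is to merge the collected batch predictions into a single tensor:
```
preds = torch.cat(preds, dim=0)  # shape: (num test samples, output dim)
```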
## Saving and Loading
Save
torch.save(model.state_dict(), path)
Load
ckpt = torch.load(path)
model.load_state_dict(ckpt)
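A typical round trip (minimal sketch; the path is illustrative):
```
path = 'model.ckpt'
torch.save(model.state_dict(), path)  # save the parameters only

model = MyModel()                     # re-create the architecture first
ckpt = torch.load(path)
model.load_state_dict(ckpt)           # then restore the parameters
```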
## Other Packages
- torchaudio: speech/audio processing
- torchtext: natural language processing
- torchvision: computer vision
- skorch: scikit-learn + PyTorch
### GitHub Resources
https://github.com/huggingface/transformers
Huggingface Transformers (transformer models: BERT, GPT, ...)
https://github.com/pytorch/fairseq
Fairseq (sequence modeling for NLP & speech)
https://github.com/espnet/espnet
ESPnet (speech recognition, translation, synthesis, ...)
## Official Documentation
https://pytorch.org/docs/stable/
torch.nn -> neural network
torch.optim -> optimization algorithms
torch.utils.data -> dataset, dataloader
A function's documentation lists:
parameters: pass the value directly (the part before the * in the docs)
keyword arguments: must be passed by name
arguments written with an equals sign usually have a default value; omit them if you keep the default
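For example, in nn.Linear(in_features, out_features, bias=True), the first two are passed directly, while bias has a default you only write when changing it:
```
layer = nn.Linear(32, 64)              # bias=True by default, no need to write it
layer = nn.Linear(32, 64, bias=False)  # override the default
```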
## Reference
- https://pytorch.org/
- https://github.com/pytorch/pytorch
- https://github.com/wkentaro/pytorch-for-numpy-users
- https://blog.udacity.com/2020/05/pytorch-vs-tensorflow-what-you-need-to-know.html
- https://www.tensorflow.org/
- https://numpy.org/
###### tags: `machine learning` `李宏毅`