# PyTorch & TF tutorial
# PyTorch tutorial
* Tensors are a lot like NumPy arrays; the difference is that tensor operations can be parallelized on the GPU.
* How to create a tensor object
```python=
import torch
x=torch.tensor([1,2,3])
# x: tensor([1, 2, 3])
```
Or specify the data type when creating it:
```python=
x=torch.tensor([1,2,3],dtype=torch.float32)
#x: tensor([1., 2., 3.])
```
**Sometimes during training the input data and the model parameters end up as float32 and float64 respectively. You can change `torch.tensor(..., dtype=torch.float64)` in the tutorial below to float32, or just use `torch.FloatTensor([1,2,3])`, which is the easiest.**
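For example, a minimal sketch of fixing the dtype mismatch (assuming the model parameters are float32):
```python=
import torch

x=torch.tensor([1,2,3],dtype=torch.float64)
x=x.float()                 # cast to float32 to match float32 model parameters
# or build it as float32 from the start
x=torch.FloatTensor([1,2,3])
```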
You can also initialize a tensor to all zeros or all ones, and so on...
```python=
x=torch.ones((2,4))
#x: tensor([[1., 1., 1., 1.],
# [1., 1., 1., 1.]])
x=torch.zeros((2,4))
#x: tensor([[0., 0., 0., 0.],
# [0., 0., 0., 0.]])
```
* Reshaping a tensor
```python=
x=torch.tensor([[1,2,3,4],[5,6,7,8]])
#x: tensor([[1, 2, 3, 4],
# [5, 6, 7, 8]])
x=x.view((4,-1))
#tensor([[1, 2],
# [3, 4],
# [5, 6],
# [7, 8]])
x=x.view(8)
#tensor([1, 2, 3, 4, 5, 6, 7, 8])
.
.
.
```
[Operations on torch.tensor](https://pytorch.org/docs/stable/torch.html)
* Conversion between torch.tensor and np.array
* torch.tensor to np.array
```python=
a=torch.ones(5)
b=a.numpy()
#a: tensor([1., 1., 1., 1., 1.])
#b: [1. 1. 1. 1. 1.]
```
* np.array to torch.tensor
```python=
a=np.ones(5)
b=torch.from_numpy(a)
#a: [1. 1. 1. 1. 1.]
#b: tensor([1., 1., 1., 1., 1.],dtype=torch.float64)
```
Note that `a` and `b` actually share the same memory, so whenever the values of `a` change, the values of `b` change with them. To make them independent, use the methods below (a short demo follows these examples).
* torch.tensor to np.array
```python=
a=torch.ones(5)
b=np.array(a)
```
* np.array to torch.tensor
```python=
a=np.ones(5)
b=torch.tensor(a)
```
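A short demo of the sharing vs. copying behavior (output comments are illustrative):
```python=
import numpy as np
import torch

a=torch.ones(5)
b=a.numpy()      # shares memory with a
c=np.array(a)    # independent copy
a.add_(1)        # in-place update of a
#b: [2. 2. 2. 2. 2.]  -> changed together with a
#c: [1. 1. 1. 1. 1.]  -> unchanged
```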
* Moving values onto the GPU
```python=
a=torch.tensor([1,2,3],device="cuda")
#or
a=torch.tensor([1,2,3])
a=a.to(torch.device("cuda")) # or a=a.cuda()
# a: tensor([1, 2, 3], device='cuda:0')
```
* autograd
PyTorch can automatically differentiate through operations on tensors. Set a tensor's requires_grad attribute to True and PyTorch starts recording every operation applied to it. At the end, call `.backward()` on the resulting tensor, and the partial derivatives with respect to every tensor involved in the computation are computed and stored in each tensor's `.grad`. For example:
```python=
x=torch.tensor([1,2,3],dtype=torch.float64,requires_grad=True)
y=x+2
#y: tensor([3., 4., 5.], dtype=torch.float64, grad_fn=<AddBackward0>)
z=y*y*3
#z: tensor([27., 48., 75.], dtype=torch.float64, grad_fn=<MulBackward0>)
out=z.mean()
#out: tensor(50., dtype=torch.float64, grad_fn=<MeanBackward0>)
out.backward()
#d(out)/dx is stored in x.grad
#x.grad: tensor([ 6., 8., 10.], dtype=torch.float64)
```
To make PyTorch stop recording operations on a tensor (e.g. when entering the testing stage), call the tensor's `.detach_()`, wrap the code in `with torch.no_grad():`, or put `@torch.no_grad()` in front of the function.
```python=
x.detach_()
# or
with torch.no_grad():
    x+1
    ...
# or
@torch.no_grad()
def my_function_no_grad():
    x+1
    ...
```
Note that gradients accumulate during backpropagation: every backward pass adds the new gradients onto the previously stored ones, so the gradients need to be zeroed before backpropagating. Like this:
```python=
# functions whose names end with an underscore, like add_, are in-place operations that modify the tensor itself
x.grad.data.zero_()
```
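A quick illustration of the accumulation, recomputing the same `out` from the autograd example above in a loop:
```python=
import torch

x=torch.tensor([1.,2.,3.],requires_grad=True)
for _ in range(2):
    out=(3*(x+2)**2).mean()
    out.backward()
#x.grad: tensor([12., 16., 20.])  -> twice the single-pass gradient [6., 8., 10.]
x.grad.zero_()   # reset before the next backward pass
```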

* Defining a model
To define a network in PyTorch, subclass torch.nn.Module. After subclassing, two member functions must be implemented: ```__init__``` and ```forward```. The `__init__` part defines the model architecture, and the `forward` part defines how values are computed as they pass through the model.
```python=
import torch
import torch.nn as nn
import torch.nn.functional as F
class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1=nn.Conv2d(1,6,3) # 1 input channel, 6 output channels, 3*3 kernel
        self.conv2=nn.Conv2d(6,16,3)
        self.fc1=nn.Linear(16*6*6,120)
        self.fc2=nn.Linear(120,84)
        self.fc3=nn.Linear(84,10)
    def forward(self,x):
        x=F.max_pool2d(F.relu(self.conv1(x)),2)
        # 2 is the 2*2 max pooling window size
        x=F.max_pool2d(F.relu(self.conv2(x)),2)
        x=x.view(x.size(0),-1)
        x=F.relu(self.fc1(x))
        x=F.relu(self.fc2(x))
        x=self.fc3(x)
        return x
```
Alternatively, nn.Sequential can be used to simplify the forward part:
```python=
class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.net=nn.Sequential(
            nn.Conv2d(1,6,3),
            nn.ReLU(),
            nn.MaxPool2d((2,2)),
            nn.Conv2d(6,16,3),
            nn.ReLU(),
            nn.MaxPool2d((2,2)),
            nn.Flatten(),
            nn.Linear(16*6*6,120),
            nn.ReLU(),
            nn.Linear(120,84),
            nn.ReLU(),
            nn.Linear(84,10)
        )
    def forward(self,x):
        return self.net(x)
```
* load data
Training a model with PyTorch almost always involves torch.utils.data.DataLoader, which reads the training data and passes it to the model. A DataLoader needs a torch.utils.data.Dataset object as its data source, so let's start with torch.utils.data.Dataset.
* torch.utils.data.Dataset
An abstract dataset class defined by PyTorch. Your own dataset should inherit from it and must implement the two functions __getitem__ and __len__: the former returns one sample of the dataset (input and label) for a given index, and the latter returns the size of the dataset. For example:
```python=
from torch.utils.data import Dataset
class my_dataset(Dataset): # inherit from Dataset
    def __init__(self,...,...):
        super().__init__()
        self.Data=........
        self.Label=.......
    def __getitem__(self,index):
        # data augmentation / transformations are usually implemented in __getitem__
        data=torch.tensor(self.Data[index])
        label=torch.tensor(self.Label[index])
        return data,label
    def __len__(self):
        return len(self.Label)
data=my_dataset("PATH_TO_DATA","PATH_TO_LABEL")
```
The method above suits the case where all the images sit in one folder and a .csv file records each sample's class. If instead the root directory is split into several folders, one per class, for example:
```
root
|____dog
|____1.jpg
|____2.jpg
|____3.jpg
|____cat
|____1.jpg
|____2.jpg
|____3.jpg
```
you can use torchvision.datasets.ImageFolder to build the dataset more quickly, as follows:
```python=
from torchvision import transforms
import torchvision.datasets as datasets
my_transform=transforms.Compose([
    transforms.Resize(256),
    transforms.RandomHorizontalFlip(),
    transforms.CenterCrop(224),
    transforms.ToTensor()
])
data_dir='/root'
data=datasets.ImageFolder(data_dir,my_transform)
print(data.classes)
#result: ['cat', 'dog'] (classes are sorted alphabetically)
```
* torch.utils.data.DataLoader
torch.utils.data.DataLoader can be used directly after importing it: just hand it the dataset object and set batch_size, shuffle, and so on. For example:
```python=
from torch.utils.data import DataLoader
my_DataLoader=DataLoader(data,batch_size=10,shuffle=True)
for i,train_data in enumerate(my_DataLoader):
    data,label=train_data
    .
    .
    .
    .
```
* train model
* torch.nn provides some ready-made loss functions, such as L1Loss, MSELoss, CrossEntropyLoss, ......
* torch.optim defines a number of optimizers. An optimizer updates the model parameters based on the computed gradients, and it can be configured with a learning rate, regularization (weight decay), and other settings.
With a loss function and an optimizer you can then train the model. Note that if the model uses dropout or batch normalization, it has to be set to model.train() before training and model.eval() before testing, because dropout and batch norm behave differently in the training and testing phases.
* tqdm can be used to display a progress bar (a small sketch follows the training loop below).
```python=
import torch
from torch import nn
epoch=10
model=Net().cuda()
criterion=nn.L1Loss()
optimizer=torch.optim.Adam(model.parameters(),lr=1e-3)
model.train()
for e in range(epoch):
    for i,train_data in enumerate(my_DataLoader):
        data,label=train_data
        data=data.cuda()
        label=label.cuda()
        optimizer.zero_grad()
        output=model(data)
        loss=criterion(output,label)
        loss.backward()
        optimizer.step()
```
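A minimal sketch of the tqdm progress bar mentioned above, wrapping the same DataLoader (assuming the `model`, `criterion`, and `optimizer` from the loop above):
```python=
from tqdm import tqdm

model.train()
for e in range(epoch):
    for data,label in tqdm(my_DataLoader,desc=f"epoch {e}"):
        data,label=data.cuda(),label.cuda()
        optimizer.zero_grad()
        loss=criterion(model(data),label)
        loss.backward()
        optimizer.step()
```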
* torch.optim has lr_scheduler, which can adjust the learning rate as the epochs progress. For example:
```python=
from torch.optim import lr_scheduler
.
.
.
optimizer=torch.optim.Adam(model.parameters(),lr=1e-3)
my_lr_scheduler=lr_scheduler.StepLR(optimizer,step_size=5,gamma=0.1)
# every 5 epochs the learning rate is multiplied by 0.1
for e in range(epoch):
    .
    .
    .
    my_lr_scheduler.step() # in recent PyTorch versions, call this after optimizer.step(), i.e. at the end of the epoch
```
* test model
After training, remember to wrap the prediction/testing code in `with torch.no_grad():` or `with torch.set_grad_enabled(False):`. Also, training is done batch by batch, so the data is four-dimensional (image index in the batch, channels, height, width); when predicting on a single image you need to add an extra dimension so it becomes a batch containing just that one image. `tensor=tensor.unsqueeze(0)` adds a dimension at the front. For example:
```python=
import cv2
img=cv2.imread(".......")
img=torch.tensor(img,dtype=torch.float32).permute(2,0,1) # HWC -> CHW, cast to float to match the model
img=img.unsqueeze(0).cuda()
model.eval()
with torch.no_grad():
    prediction=model(img)
#or
with torch.set_grad_enabled(False):
    prediction=model(img)
```
* save/load model:
https://pytorch.org/tutorials/beginner/saving_loading_models.html#saving-loading-a-general-checkpoint-for-inference-and-or-resuming-training
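A minimal sketch of the general-checkpoint pattern from that page (the file name `checkpoint.pth` is just a placeholder; `model` and `optimizer` are from the training section above):
```python=
import torch

# save
torch.save({
    "model_state_dict": model.state_dict(),
    "optimizer_state_dict": optimizer.state_dict(),
}, "checkpoint.pth")

# load
checkpoint=torch.load("checkpoint.pth")
model.load_state_dict(checkpoint["model_state_dict"])
optimizer.load_state_dict(checkpoint["optimizer_state_dict"])
model.eval()   # or model.train() to resume training
```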
# TF2 tutorial
## Basic value initialization
* Variable initialization is almost the same as in PyTorch
```python=
a = tf.ones(shape=(2,3), dtype=tf.int32)
#constant
b = tf.constant([[1, 2, 3], [4, 5, 6]])
print("b =",b)
npvar = np.array(["hello", "world"])
c = tf.constant(npvar)
print("\nc =", c)
d = tf.constant(10.0, shape=[2,5])
print("\nd =", d)
# variables use tf.Variable
# you can give them a name
w = tf.Variable(20., name="my_var01")
initializer = tf.initializers.GlorotUniform()
x = tf.Variable(initializer(shape=(2, 5)), name="my_var02")
y = tf.Variable(tf.zeros([5]), name='my_var03')
```
* Assignment
```python=
v=tf.constant(30.) # some value to assign (placeholder)
w.assign(v)        # w <- v
w.assign_add(v)    # w <- w + v
```
* Converting to NumPy
```python=
w.numpy()
```
## model implementation
### Keras sequential API
* Keras is a high-level API on top of TF that provides convenient ways to build models
```python=
import tensorflow as tf
from tensorflow.keras import layers
model = tf.keras.Sequential()
model.add(layers.Dense(32, activation='relu', input_shape=(784,)))
model.add(layers.Dense(128, activation='relu'))
model.add(layers.Dense(10, activation='softmax'))
# or
model = tf.keras.Sequential([
    layers.Dense(32, activation='relu', input_shape=(784,)),
    layers.Dense(128, activation='relu'),
    layers.Dense(10, activation='softmax')
])
```
### Keras functional API
The functional API can define more complex models, for example models shaped like a DAG.
```python=
import tensorflow as tf
from tensorflow.keras import layers
inputs = tf.keras.Input(shape=(784,))
x = layers.Dense(64, activation='relu')(inputs)
x = layers.Dense(64, activation='relu')(x)
outputs = layers.Dense(10, activation='softmax')(x)
model = tf.keras.Model(inputs=inputs, outputs=outputs)
```
Use `model.summary()` to see what the model looks like.
The model's DAG can also be drawn:
```python=
tf.keras.utils.plot_model(model, 'plot.png', show_shapes=True)
```
### model sub-classing
Wrap the model into a class
```python=
class MyModel(tf.keras.Model):
    def __init__(self, num_classes=10):
        super(MyModel, self).__init__()
        # Define your layers here
        self.dense_1 = layers.Dense(32, activation='relu')
        self.dense_2 = layers.Dense(num_classes, activation='softmax')
    def call(self, inputs):
        # Define your forward pass here
        x = self.dense_1(inputs)
        return self.dense_2(x)
# the activation can also be applied outside the layer
class MyModel(tf.keras.Model):
    def __init__(self, num_classes=10):
        super(MyModel, self).__init__()
        self.dense_1 = layers.Dense(32)
        self.dense_2 = layers.Dense(num_classes, activation='softmax')
    def call(self, inputs):
        x = self.dense_1(inputs)
        x = tf.nn.relu(x)
        return self.dense_2(x)
```
## training
### Keras built-in method
```python=
model = MyModel()
'''
.compile() is about configuring the training process,
such as specifying the optimizer, loss, and metrics.
'''
model.compile(optimizer=tf.keras.optimizers.Adam(),
              loss=tf.keras.losses.BinaryCrossentropy(),
              metrics=[tf.keras.metrics.AUC(), tf.keras.metrics.Precision(), tf.keras.metrics.Recall()])
history = model.fit(data, epochs=10, batch_size=128,
                    validation_data=val_data,
                    callbacks=[tf.keras.callbacks.EarlyStopping(),
                               tf.keras.callbacks.TensorBoard(),
                               tf.keras.callbacks.ModelCheckpoint('ckpt_path')]) # filepath is a placeholder
results = model.evaluate(test_data, batch_size=128)
```
If you add run_eagerly=True to .compile(), the model runs in dynamic graph (eager) mode.
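For example (reusing the compile call above, just as a sketch):
```python=
model.compile(optimizer=tf.keras.optimizers.Adam(),
              loss=tf.keras.losses.BinaryCrossentropy(),
              run_eagerly=True)   # run eagerly (dynamic graph) for easier debugging
```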
### GradientTape
Custom training loops.
TF records the operations on values inside a GradientTape block, and then computes gradients based on what the tape recorded.
```python=
x = tf.constant(3.0)
with tf.GradientTape() as g:
    g.watch(x)
    y=x*x
dy_dx = g.gradient(y, x) # Will compute to 6.0
```
Second-order derivatives:
```python=
x = tf.constant(3.0)
with tf.GradientTape() as g:
    g.watch(x)
    with tf.GradientTape() as gg:
        gg.watch(x)
        y=x*x
    dy_dx = gg.gradient(y, x) # Will compute to 6.0
d2y_dx2 = g.gradient(dy_dx, x) # Will compute to 2.0
```
Model training:
```python=
model = MyModel()
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy()
optimizer = tf.keras.optimizers.SGD(learning_rate=0.01)
@tf.function # optional
def train_step(features, labels):
    with tf.GradientTape() as tape:
        logits = model(features)
        loss = loss_fn(labels, logits)
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return loss

for features, labels in data:
    loss = train_step(features, labels)
```
Measuring loss and accuracy: unlike PyTorch, you don't have to compute accuracy and the like yourself; TF provides ready-made functions.
```python=
model = ...
optimizer = ...
loss_fn = ...
train_loss = tf.keras.metrics.Mean(name='train_loss')
train_accuracy = tf.keras.metrics.SparseCategoricalAccuracy(name='train_accuracy')
@tf.function
def train_step(features, labels):
    with tf.GradientTape() as tape:
        logits = model(features)
        loss = loss_fn(labels, logits)
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    train_loss(loss)
    train_accuracy(labels, logits)
```
```python=
EPOCHS = 5
for epoch in range(EPOCHS):
    # Reset the metrics at the start of the next epoch
    train_loss.reset_states()
    train_accuracy.reset_states()
    for images, labels in train_ds:
        train_step(images, labels)
    template = 'Epoch {}, Loss: {}, Accuracy: {}'
    print(template.format(epoch+1,
                          train_loss.result(),
                          train_accuracy.result()*100))
```
In short, the usage pattern of tf.keras.metrics is:
```python=
m = SomeMetric(...)
for input in ...:
    m.update_state(input)
print('Final result: ', m.result())
```
or
```python=
m = SomeMetric(...)
for input in ...:
    print('Current result:', m(input))
print('Final result: ', m.result().numpy())
```
## data
As far as I understand, a TF dataset here plays the role of a PyTorch dataset already wrapped in a DataLoader.
```python=
AUTOTUNE = tf.data.experimental.AUTOTUNE
X = Y = Z = AUTOTUNE

def read_image_and_label(filepath):
    # filenames look like 0_28382.jpg: the part before "_" is the label
    filename = tf.strings.split(filepath, "/")[-1]
    label = tf.strings.to_number(tf.strings.split(filename, "_")[0], out_type=tf.int32)
    raw_bytes = tf.io.read_file(filepath)
    return raw_bytes, label

def process_image(raw_bytes, label):
    img = tf.image.decode_jpeg(raw_bytes, channels=3)
    img = tf.image.convert_image_dtype(img, tf.float32)  # already rescales to [0, 1]
    img = tf.image.resize(img, [32, 32])
    img += tf.random.normal(tf.shape(img), stddev=0.1)
    return img, label

dataset = tf.data.Dataset.list_files('/path/to/ds/*.jpg')
dataset = dataset.shuffle(NUM_TOTAL_IMAGES)
dataset = dataset.map(read_image_and_label, num_parallel_calls=Y)
dataset = dataset.map(process_image, num_parallel_calls=Z)
dataset = dataset.batch(batch_size=64)
dataset = dataset.prefetch(buffer_size=X) # Enable pipelining
dataset = dataset.cache() # load everything directly into memory
```

## save/load
### Automatic saving
The following saves the weights and the optimizer's parameters:
```python=
model.save_weights('path_to_my_tf_checkpoint')
model.save_weights('path_to_my_tf_checkpoint.anything')
model.save_weights('path_to_my_tf_checkpoint', save_format='tf')
```
When loading, if the model has not been compiled first, only the model's parameters are loaded, not the optimizer's.
**Note: to load the optimizer's parameters, the compile arguments have to be the same as before.**
```python=
restored_model = create_model()
restored_model.load_weights('path_to_weights')
```
If the model was defined with model sub-classing, loading the weights works a little differently, because sub-classed models are lazily initialized: the input dimensions are only known once the first batch of data has been fed in. So you have to feed something through the model before loading the weights.
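A minimal sketch for the sub-classed case (assuming the `MyModel` class above, 784-dimensional inputs, and a placeholder weights path):
```python=
import tensorflow as tf

restored_model=MyModel()
restored_model(tf.zeros((1,784)))            # feed dummy data once so the layers get built
restored_model.load_weights('path_to_weights')
```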
### callback functions
You can set up some callback functions, such as checkpoint saving, an LR scheduler, and so on.
```python=
cp_callback = tf.keras.callbacks.ModelCheckpoint(
    filepath=checkpoint_path,
    verbose=1,
    save_weights_only=True,
    period=5)
model.fit(dataset, epochs=10,
          validation_data=validation_dataset,
          callbacks=[cp_callback]) # Pass callback to training
```
### Manual saving
```python=
model = create_model()
opt = tf.keras.optimizers.Adam(0.1)
ckpt = tf.train.Checkpoint(step=tf.Variable(1), optimizer=opt, model=model)
manager = tf.train.CheckpointManager(ckpt, './tf_ckpts', max_to_keep=3)
save_path = manager.save()
```
## @tf.function
Adding this decorator to a function makes TF automatically convert the function to graph mode, so subsequent executions are much faster. However, graph mode is much harder to debug, so during early development you can leave out @tf.function, or put `tf.config.run_functions_eagerly(True)` at the top.
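A tiny sketch of the debugging switch:
```python=
import tensorflow as tf

tf.config.run_functions_eagerly(True)    # run tf.function bodies eagerly while debugging
# ... develop / debug ...
tf.config.run_functions_eagerly(False)   # switch back to graph mode afterwards
```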

## multi GPU training

## detail of TF
### side effect
Inside a tf.function, an ordinary print is only called the first time TF traces the function (runs through the whole function once to build the computation graph). An ordinary print is not added to the graph, so it will not be called on later executions of the function. If you want something that gets called every time, use tf.print instead.
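A small sketch of the difference:
```python=
import tensorflow as tf

@tf.function
def f(x):
    print("python print: tracing")   # only runs while the function is being traced
    tf.print("tf.print:",x)          # becomes part of the graph, runs on every call
    return x+1

f(tf.constant(1))   # first call traces: both lines are printed
f(tf.constant(2))   # same signature, graph is reused: only the tf.print line appears
```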

As far as I currently understand, if a function is a tf.function but everything it operates on is a Python value rather than a TF tensor/variable, TF keeps regenerating the computation graph (retracing), which actually adds extra overhead.
The article also mentions that you should avoid using Python lists inside a tf.function.
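A small sketch of the retracing behavior, using the same python-print trick above to see when tracing happens (just an illustration):
```python=
import tensorflow as tf

@tf.function
def square(x):
    print("tracing for",x)
    return x*x

square(2)               # traces a graph for the Python value 2
square(3)               # a different Python value -> traces again
square(tf.constant(2))  # traces once for this tensor signature
square(tf.constant(3))  # same signature -> the existing graph is reused, no retrace
```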
Reference: https://jonathan-hui.medium.com/tensorflow-eager-execution-v-s-graph-tf-function-6edaa870b1f1