###### tags: `PyTorch intro`
[Reference: Microsoft Learn introduction](https://docs.microsoft.com/zh-tw/learn/modules/intro-machine-learning-pytorch/)
# ch6 Automatic differentiation
## Automatic differentiation with `torch.autograd`
```python=
import torch
x = torch.ones(5)   # input tensor
y = torch.zeros(3)  # expected output
w = torch.randn(5, 3, requires_grad=True)  # weights, tracked by autograd
b = torch.randn(3, requires_grad=True)     # bias, tracked by autograd
z = torch.matmul(x, w) + b  # actual output: z = x @ w + b
# compare the expected output with the actual output through a loss function
loss = torch.nn.functional.binary_cross_entropy_with_logits(z, y)
print(z)
print(loss)
# .grad_fn references the backward function that produced each tensor
print('Gradient function for z =', z.grad_fn)
print('Gradient function for loss =', loss.grad_fn)
```

> In this program, w and b are the parameters, which we need to optimize.
## Computing gradients
> To optimize the weights of parameters in the neural network, we need to compute the derivatives of our loss function with respect to the parameters, namely $\frac{\partial loss}{\partial w}$ and $\frac{\partial loss}{\partial b}$, under some fixed values of x and y.
> To compute those derivatives, we call `loss.backward()`, and then retrieve the values from `w.grad` and `b.grad`.
```python=
loss.backward()  # backpropagation: computes d(loss)/dw and d(loss)/db
print(w.grad)
print(b.grad)
```
## Disabling gradient tracking
```python=
z = torch.matmul(x, w) + b
print(z.requires_grad)  # requires_grad: is this tensor still being tracked? -> True

with torch.no_grad():   # stop tracking inside this block
    z = torch.matmul(x, w) + b
print(z.requires_grad)  # False: no longer tracked

z = torch.matmul(x, w) + b
z_det = z.detach()      # detached copy that is not tracked
print(z_det.requires_grad)  # False
```
> `.requires_grad`: whether the tensor is being tracked (True/False)
> `torch.no_grad()`: context manager that disables tracking inside its block
> `.detach()`: returns a copy of the tensor that is not tracked
### There are two reasons to disable gradient tracking:
1. To mark some parameters in your neural network as frozen parameters. This is a very common scenario for fine-tuning a pre-trained network (a minimal sketch follows this list).
2. To speed up computations when you are only doing the forward pass, because computations on tensors that do not track gradients are more efficient.
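A minimal sketch of scenario 1, using a small `nn.Sequential` stand-in for the pre-trained network (the model, layer sizes, and the swapped final layer are illustrative assumptions, not from the tutorial): freeze every existing parameter, then replace the last layer so that only its new parameters are tracked.
```python=
import torch
from torch import nn

# hypothetical stand-in for a pre-trained network (for illustration only)
model = nn.Sequential(
    nn.Linear(5, 16),
    nn.ReLU(),
    nn.Linear(16, 3),
)

# freeze all existing parameters: autograd stops tracking them
for param in model.parameters():
    param.requires_grad_(False)

# replace the last layer; its freshly created parameters require grad again,
# so fine-tuning only updates this layer
model[2] = nn.Linear(16, 3)

print([p.requires_grad for p in model.parameters()])
# -> [False, False, True, True]
```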
## More on Computational Graphs
> In many cases, we have a scalar loss function, and we need to compute the gradient with respect to some parameters. However, there are cases when the output function is an arbitrary tensor. In this case, PyTorch allows you to compute the so-called **Jacobian product**, and not the actual gradient.
For a vector function $\vec{y}=f(\vec{x})$,
where $\vec{x}=\langle x_1,\dots,x_n\rangle$ and $\vec{y}=\langle y_1,\dots,y_m\rangle$, the gradient of $\vec{y}$ with respect to $\vec{x}$ is given by the **Jacobian matrix**:
$$
J=\left(
\begin{array}{ccc}
\frac{\partial y_{1}}{\partial x_{1}} & \cdots & \frac{\partial y_{1}}{\partial x_{n}} \\
\vdots & \ddots & \vdots \\
\frac{\partial y_{m}}{\partial x_{1}} & \cdots & \frac{\partial y_{m}}{\partial x_{n}}
\end{array}
\right)
$$
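For example, for $\vec{y}=f(\vec{x})=(x_1 x_2,\; x_1+x_2)$ with $n=m=2$:
$$
J=\begin{pmatrix}
\frac{\partial y_{1}}{\partial x_{1}} & \frac{\partial y_{1}}{\partial x_{2}} \\
\frac{\partial y_{2}}{\partial x_{1}} & \frac{\partial y_{2}}{\partial x_{2}}
\end{pmatrix}
=\begin{pmatrix} x_2 & x_1 \\ 1 & 1 \end{pmatrix}
$$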
---
==I haven't fully understood the following yet==
Instead of computing the Jacobian matrix itself, PyTorch allows you to compute **Jacobian Product** $v^T\cdot J$ for a given input vector $v=(v_1 \dots v_m)$. This is achieved by calling `backward` with $v$ as an argument. The size of $v$ should be the same as the size of the original tensor, with respect to which we want to compute the product:
```python=
inp = torch.eye(5, requires_grad=True)  # 5x5 identity matrix as input
out = (inp+1).pow(2)                    # elementwise (inp+1)^2
out.backward(torch.ones_like(inp), retain_graph=True)  # v^T·J with v = all ones
print("First call\n", inp.grad)
out.backward(torch.ones_like(inp), retain_graph=True)  # gradients accumulate
print("\nSecond call\n", inp.grad)
inp.grad.zero_()                        # reset the accumulated gradients
out.backward(torch.ones_like(inp), retain_graph=True)
print("\nCall after zeroing gradients\n", inp.grad)
```
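A quick check of the numbers in the first call: `out` is computed elementwise, so each output element depends only on the matching input element, and the product with $v=\mathbf{1}$ is just the elementwise derivative $2(x+1)$ evaluated at the entries of the identity matrix:
$$
\frac{\partial}{\partial x}(x+1)^2 = 2(x+1)
\;\Rightarrow\;
\text{grad}_{ij} =
\begin{cases}
2(1+1)=4 & i=j \\
2(0+1)=2 & i\neq j
\end{cases}
$$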
Notice that when we call `backward` for the second time with the same
argument, the value of the gradient is different. This happens because
when doing `backward` propagation, PyTorch **accumulates the
gradients**, i.e. the value of computed gradients is added to the
`grad` property of all leaf nodes of the computational graph. If you want
to compute the proper gradients, you need to zero out the `grad`
property first. In real-life training an *optimizer* helps us to do
this.
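A minimal sketch of that last point, reusing `x`, `y`, `w`, `b` from the first code block; the choice of `torch.optim.SGD` and `lr=0.1` is only illustrative. `optimizer.zero_grad()` clears the accumulated gradients before each new backward pass.
```python=
# illustrative training loop: SGD and lr=0.1 are arbitrary choices
optimizer = torch.optim.SGD([w, b], lr=0.1)

for step in range(3):
    z = torch.matmul(x, w) + b
    loss = torch.nn.functional.binary_cross_entropy_with_logits(z, y)

    optimizer.zero_grad()  # reset w.grad and b.grad before backpropagation
    loss.backward()        # compute fresh gradients
    optimizer.step()       # update w and b using the gradients
    print(step, loss.item())
```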