# PyTorch Error Notes
## Composed Weight (a weight pieced together from a few variables)
### Problem Description
One day, on a whim, I wanted to compose a conv2d weight from just a few variables: for example, two variables a and b combined into the convolution weight w:
> \[\[a,0,b\],
> \[a,0,b\],
> \[a,0,b\]\]
The convolution is done with torch.nn.functional.conv2d; the code is as follows:
```python=3.7
import torch
import torch.optim as optim
import torch.nn.functional as F
def concatAB(a, b):
    # Assemble the 1x1x3x3 conv weight [[a, 0, b], [a, 0, b], [a, 0, b]] from the two scalars.
    zero = torch.zeros_like(a)           # torch.stack needs tensors, not the plain int 0
    c = torch.stack((a, zero, b), -1)    # one row: [a, 0, b]
    return torch.stack((c, c, c), 0).unsqueeze(0).unsqueeze(0)
a = torch.tensor(1.0, requires_grad=True)
b = torch.tensor(-1.2, requires_grad=True)
optimizer = optim.Adam([a, b], lr=0.001)
w = concatAB(a, b)  # w is built from a and b only once
inputs = torch.randn(1, 1, 5, 5)
out = torch.sum(F.conv2d(inputs, w, padding=1)**2)
out.backward() # fine
optimizer.step()
optimizer.zero_grad()
out = torch.sum(F.conv2d(inputs, w, padding=1)**2)
out.backward() #RuntimeError
'''
RuntimeError: Trying to backward through the graph a second time,
but the saved intermediate results have already been freed.
Specify retain_graph=True when calling backward the first time.
'''
```
### Debugging
The RuntimeError says that the previous backward already freed the intermediate results of the graph, and that if I really need to call backward twice I should write out.backward(retain_graph=True). The thing is, from earlier experiments, this problem never appears when the weight is a leaf tensor, i.e. a tensor not produced by any tensor operation (e.g. cat, stack, add, mul, etc.).
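A minimal sketch of that earlier observation (the sizes and number of iterations here are arbitrary): a weight created directly with requires_grad=True has nothing upstream of it in the graph, so every forward pass builds a fresh graph and repeated backward calls work without retain_graph=True.
```python
import torch
import torch.optim as optim
import torch.nn.functional as F

# The weight is a leaf tensor: nothing upstream of w in the autograd graph.
w = torch.randn(1, 1, 3, 3, requires_grad=True)
optimizer = optim.Adam([w], lr=0.001)
inputs = torch.randn(1, 1, 5, 5)

for _ in range(3):
    out = torch.sum(F.conv2d(inputs, w, padding=1)**2)
    out.backward()       # fine on every iteration
    optimizer.step()
    optimizer.zero_grad()
```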
So I did the following checks:
1. Printed w and found that w was never updated.
2. So I inserted w = concatAB(a, b) before the second forward pass to refresh the convolution weight (a loop-style sketch follows this list).
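Concretely, the fix is just to rebuild w inside every step. A minimal sketch reusing a, b, inputs, optimizer and concatAB from the block above:
```python
# Rebuild the composed weight every step so each forward pass
# gets a fresh a, b -> w -> out graph.
for _ in range(3):
    w = concatAB(a, b)
    out = torch.sum(F.conv2d(inputs, w, padding=1)**2)
    out.backward()       # no RuntimeError
    optimizer.step()
    optimizer.zero_grad()
```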
That actually solved the problem, but I was curious what error would appear if w were left stale, so I took the original code and changed out.backward() to out.backward(retain_graph=True):
```python=3.7
out = torch.sum(F.conv2d(inputs, w, padding=1)**2)
out.backward(retain_graph=True) # fine
optimizer.step()
optimizer.zero_grad()
out = torch.sum(F.conv2d(inputs, w, padding=1)**2)
out.backward() #RuntimeError
'''
RuntimeError: one of the variables needed for gradient computation has been
modified by an inplace operation: [torch.FloatTensor []] is at version 11;
expected version 10 instead. Hint: enable anomaly detection to find the operation
that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).
'''
```
Here version 11 is the current update count of the leaf tensors a and b (optimizer.step() updates them in place), while version 10 is their count at the time w was built from them. PyTorch counts how many times each leaf tensor has been updated precisely to stop users from accidentally reusing something from the previous step. Accordingly, if the optimizer is removed so that a and b are never updated, there is no error.
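This counter can be peeked at through the _version attribute (an internal attribute, so treat it as an inspection aid only); a tiny sketch, with add_ standing in for the in-place update that optimizer.step() performs:
```python
import torch

a = torch.tensor(1.0, requires_grad=True)
print(a._version)    # 0

with torch.no_grad():
    a.add_(0.1)      # an in-place update, like the one optimizer.step() applies
print(a._version)    # 1 -- the number the autograd version check compares
```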
### Conclusion
1. If you also train with composed weights like this, remember to rebuild the weight at every step.
2. If the training needs several backward passes, e.g. GANs, use a separate optimizer per model so the weight updates of different models don't clash and trigger this error.
3. Alternatively, cut the gradient of the generator's fake images with tensor.detach() before feeding them to the discriminator (see the sketch after this list).
4. Needing backward(retain_graph=True) is usually not normal and hints that the training design has a problem. If you learned from a reliable source that some special training trick genuinely requires it, fine; but if you only reach for retain_graph=True because a RuntimeError suggested it, don't.
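A minimal sketch of points 2 and 3 together (the tiny Linear generator/discriminator, batch size and BCE losses are placeholders of mine, not a real model): separate optimizers plus .detach() keep each backward inside its own graph, so no retain_graph=True is needed.
```python
import torch
import torch.nn as nn
import torch.optim as optim

G = nn.Linear(8, 8)                                  # stand-in generator
D = nn.Sequential(nn.Linear(8, 1), nn.Sigmoid())     # stand-in discriminator
opt_G = optim.Adam(G.parameters(), lr=1e-3)
opt_D = optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCELoss()

for _ in range(3):
    real = torch.randn(4, 8)
    fake = G(torch.randn(4, 8))

    # Discriminator step: detach() cuts the graph running back into G,
    # so this backward never touches G's part of the graph.
    d_loss = bce(D(real), torch.ones(4, 1)) + bce(D(fake.detach()), torch.zeros(4, 1))
    opt_D.zero_grad()
    d_loss.backward()
    opt_D.step()

    # Generator step: a fresh forward through D; backward flows into G
    # through fake's still-unconsumed graph, no retain_graph needed.
    g_loss = bce(D(fake), torch.ones(4, 1))
    opt_G.zero_grad()
    g_loss.backward()
    opt_G.step()
```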