# PyTorch Error Notes

## Composed Weight (a weight assembled by concatenation)

### Problem description

One day, on a whim, I wanted to compose a conv2d weight out of only a few variables. For example, variables a and b are combined into the convolution weight w:

> [[a, 0, b],
>  [a, 0, b],
>  [a, 0, b]]

The convolution is computed with torch.nn.functional.conv2d; the code is as follows:

```python
import torch
import torch.optim as optim
import torch.nn.functional as F

def concatAB(a, b):
    # Compose a (1, 1, 3, 3) conv weight [[a, 0, b], [a, 0, b], [a, 0, b]] from a and b.
    zero = torch.zeros_like(a)
    c = torch.stack((a, zero, b), -1)
    return torch.stack((c, c, c), 0).unsqueeze(0).unsqueeze(0)

a = torch.tensor(1.0, requires_grad=True)
b = torch.tensor(-1.2, requires_grad=True)
optimizer = optim.Adam([a, b], lr=0.001)
w = concatAB(a, b)
inputs = torch.randn(1, 1, 5, 5)

out = torch.sum(F.conv2d(inputs, w, padding=1)**2)
out.backward()  # fine
optimizer.step()
optimizer.zero_grad()

out = torch.sum(F.conv2d(inputs, w, padding=1)**2)
out.backward()  # RuntimeError
'''
RuntimeError: Trying to backward through the graph a second time, but the saved
intermediate results have already been freed. Specify retain_graph=True when
calling backward the first time.
'''
```

### Debugging

The RuntimeError says that the previous backward has already freed the intermediate results, and that if I really need to backward twice I should consider writing out.backward(retain_graph=True). But here is the thing: in earlier experiments, when the weight was a leaf tensor, i.e. not produced by any tensor operation (e.g. cat, stack, add, mul, etc.), this problem never appeared. So I ran the following checks:

1. Printed w and found that w was never being updated.
2. So I inserted w = concatAB(a, b) before the second forward pass to rebuild the convolution weight.

At this point the problem was essentially solved, but I was curious what error would appear if w were left stale, so I changed out.backward() in the original code to out.backward(retain_graph=True):

```python
out = torch.sum(F.conv2d(inputs, w, padding=1)**2)
out.backward(retain_graph=True)  # fine
optimizer.step()
optimizer.zero_grad()

out = torch.sum(F.conv2d(inputs, w, padding=1)**2)
out.backward()  # RuntimeError
'''
RuntimeError: one of the variables needed for gradient computation has been
modified by an inplace operation: [torch.FloatTensor []] is at version 11;
expected version 10 instead. Hint: enable anomaly detection to find the
operation that failed to compute its gradient, with
torch.autograd.set_detect_anomaly(True).
'''
```

Here version 11 is the update count (version counter) of the leaf tensors a and b, while version 10 is the version of a and b recorded when w was built. PyTorch tracks how many times a leaf tensor has been updated in place to stop users from accidentally reusing results from a previous step. Accordingly, if the optimizer is removed so that a and b are never updated, there is no error.

### Conclusions

1. If you likewise use composed weights, remember to rebuild the weight at every training step (see the sketch after this list).
2. If you train GANs or other setups that need multiple backward passes, use separate optimizers so that the weight updates of different models do not conflict and raise this error.
3. Or call tensor.detach() on the generator's fake images to cut the gradient before feeding them to the discriminator (a sketch follows below as well).
4. Generally, needing backward(retain_graph=True) is a sign that the training design may have a problem. If you learned from a trusted source that a particular training trick legitimately needs it, that is fine; but if you only reached for it because a RuntimeError suggested it, do not sprinkle retain_graph=True around blindly.
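For completeness, here is a minimal sketch of the corrected training loop described in the debugging section (the step count and learning rate are arbitrary choices of mine, not from the original run): the only change from the code above is that concatAB is called again after every optimizer step, so each forward pass backpropagates through a fresh graph built from the current a and b.

```python
import torch
import torch.optim as optim
import torch.nn.functional as F

def concatAB(a, b):
    # Same composition as above: returns a (1, 1, 3, 3) conv weight.
    zero = torch.zeros_like(a)
    c = torch.stack((a, zero, b), -1)
    return torch.stack((c, c, c), 0).unsqueeze(0).unsqueeze(0)

a = torch.tensor(1.0, requires_grad=True)
b = torch.tensor(-1.2, requires_grad=True)
optimizer = optim.Adam([a, b], lr=0.001)
inputs = torch.randn(1, 1, 5, 5)

for step in range(10):
    w = concatAB(a, b)  # rebuild w from the current a, b every step
    out = torch.sum(F.conv2d(inputs, w, padding=1)**2)
    optimizer.zero_grad()
    out.backward()      # no retain_graph needed
    optimizer.step()
```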
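And to illustrate conclusion 3, a minimal sketch with placeholder generator/discriminator modules and losses (these stand-ins are my own, not from the original note): detach() cuts the graph at the fake images, so the discriminator's backward never reaches into the generator's graph and no retain_graph=True is required.

```python
import torch
import torch.nn as nn

# Hypothetical toy modules just to show where detach() goes.
G = nn.Linear(8, 16)   # stands in for a generator
D = nn.Linear(16, 1)   # stands in for a discriminator
opt_G = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_D = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()

z = torch.randn(4, 8)
real = torch.randn(4, 16)
fake = G(z)

# Discriminator step: fake.detach() cuts the graph, so this backward
# never touches the generator's part of the graph.
d_loss = bce(D(real), torch.ones(4, 1)) + bce(D(fake.detach()), torch.zeros(4, 1))
opt_D.zero_grad()
d_loss.backward()
opt_D.step()

# Generator step: a separate forward through D with the non-detached fake,
# so the generator's graph is only backpropagated through once.
g_loss = bce(D(fake), torch.ones(4, 1))
opt_G.zero_grad()
g_loss.backward()
opt_G.step()
```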