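# Ch 7 - Telling birds from airplanes: Learning from images

The main points this chapter focuses on:

1. Building a feed-forward neural network
2. Loading data using Datasets and DataLoaders
3. Understanding classification loss

---

Chapter 6 dug into the inner mechanics of learning through **gradient descent**, along with the tools PyTorch provides for building and optimizing models. To do so, it used a *simple regression model* with only **one input** and **one output**, which made everything visible at a glance ~~(but this was only borderline exciting (???))~~

* Keep building our neural network foundations
* Turn our attention to images (image recognition)
* Work step by step through a simple image recognition problem with a simple neural network (like the one defined in the previous chapter)
* Use a more extensive **dataset of tiny images** (rather than a tiny dataset of numbers)

---

## 7-1 A dataset of tiny images

One of the most basic datasets for **image recognition**: **MNIST** (the handwritten digit recognition dataset)

Here we first download the dataset this chapter uses: **CIFAR-10** (which, like its sibling CIFAR-100, has been a computer vision classic for a decade)

#### CIFAR-10

![](https://i.imgur.com/5Os271l.png)
###### Image samples from all CIFAR-10 classes

The corresponding class labels: airplane (0), automobile (1), bird (2), cat (3), deer (4), dog (5), frog (6), horse (7), ship (8), and truck (9).

CIFAR-10 consists of 60,000 tiny 32 × 32 color (RGB) images. Nowadays CIFAR-10 is considered **too simple** for **developing or validating new research**, but it serves **learning purposes** well.

---

1. Use the `torchvision` module to download the dataset automatically
2. Load it as a collection of PyTorch tensors

---

### 1 -- Downloading the CIFAR-10 dataset

```python=
from torchvision import datasets  # import the datasets submodule of torchvision

data_path = '../data-unversioned/p1ch7/'  # where the data is downloaded to

# Instantiate the dataset for the training data;
# torchvision downloads the data if it is not present
cifar10 = datasets.CIFAR10(data_path, train=True, download=True)

# With train=False we get the dataset for the validation data;
# download=True again only downloads if the data is missing
cifar10_val = datasets.CIFAR10(data_path, train=False, download=True)

# First argument: the location where the data is downloaded (data_path)
# Second argument: whether we want the training set or the validation set
# Third argument: whether PyTorch may download the data
#                 if it is not found at the location given by the first argument
```

---

The method-resolution order of our `cifar10` instance includes `torch.utils.data.Dataset` as a base class:

```python=
# In[4]:
type(cifar10).__mro__
```

##### The `datasets` submodule gives precanned access to the most popular CV datasets (for example: MNIST, Fashion-MNIST, CIFAR-100, SVHN, Coco, Omniglot...)

```python=
# Out[4]:
(torchvision.datasets.cifar.CIFAR10,
 torchvision.datasets.vision.VisionDataset,
 torch.utils.data.dataset.Dataset,
 object)
```

##### In each case, the dataset is returned as a subclass of `torch.utils.data.Dataset`.

---

### 2 -- The Dataset class

![](https://i.imgur.com/qZIkjF9.png)
###### tags: Concept of a PyTorch `Dataset` object --> it doesn't necessarily hold the data, but it provides uniform access to it through `__len__` and `__getitem__`.

#### The two methods such an object must implement:

1. `__len__`: returns the **number of items** in the dataset
2. `__getitem__`: returns an item consisting of a sample and its corresponding label (an integer index)

In practice, when a Python object is equipped with the `__len__` method, we can pass it as an argument to the `len` built-in function:

```python=
# In[5]:
len(cifar10)
```

```python
# Out[5]:
50000
```

Similarly, since the dataset is equipped with the `__getitem__` method, we can use the standard subscript for indexing tuples and lists to access individual items:

```python=
# In[6]:
# Class names in the same order as the labels listed above
class_names = ['airplane', 'automobile', 'bird', 'cat', 'deer',
               'dog', 'frog', 'horse', 'ship', 'truck']

# Here we get a PIL image (Python Imaging Library, the PIL package)
# with the desired output: an integer with the value 1, corresponding to "automobile".
# Index 99 is simply the sample picked here; it happens to be an automobile.
img, label = cifar10[99]
img, label, class_names[label]
```

```python
# Out[6]:
(<PIL.Image.Image image mode=RGB size=32x32 at 0x7FB383657390>, 1, 'automobile')
```

The sample in the `datasets.CIFAR10` dataset is an instance of an RGB PIL image. We can plot it right away:

```python=
# In[7]:
from matplotlib import pyplot as plt

plt.imshow(img)  # plot the image
plt.show()
```

###### The output is the 99th image from the CIFAR-10 dataset: an automobile

![](https://i.imgur.com/NJbpKQa.png)

---

### 3 -- Dataset transforms

To convert the PIL image to a tensor we use `torchvision.transforms`.

`torchvision.transforms` defines a set of composable, function-like objects that can be passed as an argument to a `torchvision` dataset such as `datasets.CIFAR10(...)`, and that perform transformations on the data after it is loaded but before it is returned by `__getitem__`.

```python=
# In[8]:
from torchvision import transforms
dir(transforms)
```

```python
# Out[8]:
['CenterCrop',
 'ColorJitter',
 ...
 'Normalize',
 'Pad',
 'RandomAffine',
 ...
 'RandomResizedCrop',
 'RandomRotation',
 'RandomSizedCrop',
 ...
 'TenCrop',
 'ToPILImage',
 'ToTensor',
 ...
]
```

#### **ToTensor** (one of the transforms)

* Turns **NumPy arrays** and **PIL images** into tensors
* Lays out the dimensions of the output tensor as C × H × W (channel, height, width; just as we covered in chapter 4).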
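
What `ToTensor` does to an 8-bit image can be sketched in plain Python, without torchvision. The helper `to_chw_float` below is hypothetical (it is not a torchvision function) and works on nested lists: it reorders an H × W × C image into C × H × W and divides every value by 255, which is the rescaling applied to 8-bit images. It is only meant to make the layout change visible, not to mirror the library's implementation.

```python
def to_chw_float(image_hwc):
    """Hypothetical helper: reorder an H x W x C nested list of 0..255 ints
    into C x H x W and rescale the values to the range 0.0..1.0."""
    height = len(image_hwc)
    width = len(image_hwc[0])
    channels = len(image_hwc[0][0])
    return [
        [[image_hwc[h][w][c] / 255.0 for w in range(width)]
         for h in range(height)]
        for c in range(channels)
    ]


# A tiny 2 x 2 RGB "image": each pixel is [R, G, B].
tiny = [
    [[255, 0, 0], [0, 255, 0]],
    [[0, 0, 255], [255, 255, 255]],
]
chw = to_chw_float(tiny)
shape = (len(chw), len(chw[0]), len(chw[0][0]))  # (channels, height, width)
```

Here `chw[0]` is the red channel as a 2 × 2 grid, `chw[1]` the green channel, and `chw[2]` the blue channel.
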

```python=
# In[9]:
to_tensor = transforms.ToTensor()

img_t = to_tensor(img)
img_t.shape  # C × H × W (channel, height, width)
```

```python
# Out[9]:
torch.Size([3, 32, 32])  # C × H × W
# The image has been turned into a 3 × 32 × 32 tensor,
# that is, a 3-channel (RGB) 32 × 32 image
```

###### tags: Note that nothing happened to `label`; it is still an integer.

```python=
# In[10]:
# We can pass the transform directly as an argument to datasets.CIFAR10
tensor_cifar10 = datasets.CIFAR10(data_path, train=True, download=False,
                                  transform=transforms.ToTensor())
```

**Check the type of an element accessed from the dataset:**

```python=
# In[11]:
img_t, _ = tensor_cifar10[99]
type(img_t)
```

```python
# Out[11]:
torch.Tensor
# At this point, accessing an element of the dataset returns a tensor rather than a PIL image
```

The shape has the channel as the first dimension, while the scalar type is `float32`:

```python=
# In[12]:
img_t.shape, img_t.dtype
```

```python
# Out[12]:
(torch.Size([3, 32, 32]), torch.float32)
```

* Values in the original PIL image ranged from 0 to 255 (8 bits per channel)
* `ToTensor` turns the data into a **32-bit floating-point per channel**, scaling the values down to the range **0.0 to 1.0**:

```python=
# In[13]:
img_t.min(), img_t.max()
```

```python
# Out[13]:
(tensor(0.), tensor(1.))
```

Verify that we get the same image out:

```python
# In[14]:
plt.imshow(img_t.permute(1, 2, 0))
plt.show()
```

```python
# Out[14]:
<Figure size 432x288 with 1 Axes>
```

![](https://i.imgur.com/lCuoMND.png)
###### tags: Note how `permute` is used to change the order of the axes (from C × H × W to H × W × C) to match what Matplotlib expects

---

### 4 -- Normalizing data

Given **activation functions that are linear around 0 plus or minus 1 (or 2)**, keeping the data in that same range means neurons are more likely to have **nonzero gradients** and hence learn sooner. Likewise, normalizing each channel so that it has the same distribution ensures that channel information can be mixed and updated through gradient descent using the same learning rate. (See ch 4, 5.)

To give each channel **zero mean** and **unit standard deviation**, we compute the mean and standard deviation of each channel across the whole dataset and apply the following transform:

```python
v_n[c] = (v[c] - mean[c]) / stdev[c]
# The values of "mean" and "stdev" must be computed offline
# (they are not computed by the transform)
```

(The expression above is the computation `transforms.Normalize` performs.)

Since the CIFAR-10 dataset is small, we are able to manipulate it entirely in memory.

```python=
# In[15]:
import torch

# Stack all the tensors returned by the dataset along an extra dimension:
imgs = torch.stack([img_t for img_t, _ in tensor_cifar10], dim=3)
imgs.shape
```

```python
# Out[15]:
torch.Size([3, 32, 32, 50000])
```

* Compute the mean per channel:

```python=
# In[16]:
imgs.view(3, -1).mean(dim=1)
# Out[16]:
# tensor([0.4915, 0.4823, 0.4468])
```

* Compute the standard deviation per channel:

```python=
# In[17]:
imgs.view(3, -1).std(dim=1)
```

```python
# Out[17]:
tensor([0.2470, 0.2435, 0.2616])
```

* Initialize the `Normalize` transform with the numbers obtained above:

```python=
# In[18]:
transforms.Normalize((0.4915, 0.4823, 0.4468), (0.2470, 0.2435, 0.2616))
```

```python
# Out[18]:
Normalize(mean=(0.4915, 0.4823, 0.4468), std=(0.247, 0.2435, 0.2616))
```

* Chain it after the `ToTensor` transform (using `transforms.Compose`):

```python=
# In[19]:
transformed_cifar10 = datasets.CIFAR10(
    data_path, train=True, download=False,
    transform=transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize((0.4915, 0.4823, 0.4468),
                             (0.2470, 0.2435, 0.2616))
    ]))
```

Note that, at this point, plotting an image drawn from the dataset won't give a faithful representation of the actual image:

```python=
# In[21]:
img_t, _ = transformed_cifar10[99]

plt.imshow(img_t.permute(1, 2, 0))
plt.show()
```

###### tags: Our random CIFAR-10 image after normalization: the renormalized red car

* This is because normalization has shifted the RGB levels outside the 0.0 to 1.0 range and changed the overall magnitudes of the channels.
* All of the data is still there; it's just that Matplotlib renders it as black.

---

## 7-2 Distinguishing birds from airplanes

1. Story: Jane, a friend of the author from the bird-watching club, has set up a set of cameras in the woods south of the airport. When something enters the frame, the cameras save a shot and upload it to the club's blog so that everyone can watch birds in real time.
2. Problem: Many **airplanes** coming and going from the airport trigger the cameras, so Jane spends a lot of time **deleting pictures of airplanes** from the blog.
3. She needs an **automated system**: instead of deleting by hand, she needs a neural network.

###### tags: The problem at hand -- helping Jane tell birds from airplanes by training a neural network

### :+1: From the CIFAR-10 dataset, pick out all the **birds and airplanes** and build a neural network that can tell birds from airplanes.

### 1 -- Building the dataset

First step: get the data in the right shape.

We could create a `Dataset` subclass that only includes birds and airplanes. (However, the dataset is small, and we only need indexing and `len` to work on our dataset.)

It doesn't actually have to be a subclass of `torch.utils.data.dataset.Dataset`.

```python=
# In[5]:
# Filter the data in cifar10 and remap the labels so they are contiguous.
# Note: the later cells call img.view(...) and img.permute(...), so this assumes
# cifar10 and cifar10_val were created with the ToTensor + Normalize transform
# (like transformed_cifar10 above), so that each img is a tensor.
label_map = {0: 0, 2: 1}
class_names = ['airplane', 'bird']
cifar2 = [(img, label_map[label])
          for img, label in cifar10
          if label in [0, 2]]
cifar2_val = [(img, label_map[label])
              for img, label in cifar10_val
              if label in [0, 2]]
# The "cifar2" object satisfies the basic requirements for a "Dataset",
# that is, '__len__' and '__getitem__' are defined
```

This is a clever shortcut; if we hit its limitations, we may want to implement a proper `Dataset`.

---

### 2 -- A fully connected model

1. From ch 5 we know that a neural network is "a tensor of features in, a tensor of features out"
2. An **image** is just **a set of numbers** laid out in a **spatial configuration**
3. In theory, if we just take the **image pixels** and straighten them into **a long 1D vector**, we can consider those numbers as **input features** (see the figure caption below)

###### tags: Treating our image as a 1D vector of values and training a fully connected classifier on it

* **Starting from the model built in ch 5, the new model will be an `nn.Linear`:**
    1. with 3,072 input features
    2. some number of hidden features, followed by an activation
    3. another `nn.Linear` that tapers the network down to an appropriate output number of features (2, for this use case)

```python=
# In[6]:
import torch.nn as nn

n_out = 2

model = nn.Sequential(
    nn.Linear(
        3072,  # input features: 32 × 32 × 3 = 3,072 per sample
        512,   # hidden layer size (512 is picked arbitrarily)
    ),
    nn.Tanh(),
    nn.Linear(
        512,
        n_out,  # number of output classes
    )
)
```

* A neural network needs at least one hidden layer (of activations, so two modules) with a nonlinearity in between
* The **hidden features** represent **(learned) relations between the inputs, encoded through the weight matrix**

---

### 3 -- Output of a classifier

One option would be to treat this as a regression problem:

* make our network output a single scalar value (so `n_out = 1`)
* cast the labels to floats (0.0 for airplane, 1.0 for bird)
* use those as the target for `MSELoss` (the average of squared differences in the batch)
* doing so would cast the problem into a regression problem

However, the **output is categorical**: it's either a bird or an airplane (or something else if we had all 10 of the original classes).

Simply put, when we have to represent a categorical variable, we should switch to a **one-hot encoding** of that variable, such as [1, 0] for airplane or [0, 1] for bird (the order is arbitrary). In the ideal case:

| Class | Airplane | Bird |
| -------- | -------- | -------- |
| Expected output | torch.tensor([1.0, 0.0]) | torch.tensor([0.0, 1.0]) |

###### tags: The key realization is that we can interpret the output as probabilities: the first entry is the probability of "airplane", and the second is the probability of "bird"

**<<Casting the problem in terms of probabilities imposes a few extra constraints on the outputs of our network>>:**

1. Each element of the output must be in the [0.0, 1.0] range (a probability of an outcome cannot be less than 0 or greater than 1).
2. The elements of the output must add up to 1.0 (one of the two outcomes will occur).

---

### 4 -- Representing the output as probabilities (introducing softmax)

To enforce these constraints in a differentiable way on a vector of numbers --> ***softmax*** (which is differentiable)

[![](https://i.imgur.com/YOG7Q6g.png)](https://kknews.cc/code/np4ya5.html)

***Softmax*** is a function that **takes a vector of values and produces another vector of the same dimension**, where the values satisfy the constraints we just listed to represent probabilities.
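
Written out as a formula, softmax exponentiates every element and divides by the sum of all the exponentials. The symbols below (a score vector $x$ with $n$ elements, indices $i$ and $j$) are notation introduced here for illustration:

```latex
\[
\operatorname{softmax}(x)_i \;=\; \frac{e^{x_i}}{\sum_{j=1}^{n} e^{x_j}},
\qquad i = 1, \dots, n .
\]
% Every numerator is positive and no larger than the denominator,
% so each output lies in (0, 1]; summing over i gives
\[
\sum_{i=1}^{n} \operatorname{softmax}(x)_i
\;=\; \frac{\sum_{i=1}^{n} e^{x_i}}{\sum_{j=1}^{n} e^{x_j}}
\;=\; 1 .
\]
```

Both probability constraints therefore hold by construction, whatever the input scores are.
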

```python=
# In[7]:
import torch

# Take the elements of the vector, compute the elementwise exponential,
# and divide each element by the sum of exponentials
def softmax(x):
    return torch.exp(x) / torch.exp(x).sum()
```

* Test it on an input vector:

```python=
# In[8]:
x = torch.tensor([1.0, 2.0, 3.0])

softmax(x)
```

```python
# Out[8]:
tensor([0.0900, 0.2447, 0.6652])
```

* As expected, it satisfies the constraints on probability:

```python=
# In[9]:
softmax(x).sum()
```

```python
# Out[9]:
tensor(1.)
```

1. Softmax is a monotone function: lower values in the input correspond to lower values in the output.
2. It's not scale invariant, in that the ratio between values is not preserved
3. The learning process will drive the parameters of the model in a way that the **values** have appropriate ratios

| softmax | input | correspond to | output |
| -------- | ----- | -------------- | ------ |
| value | lower | <------------> | lower |
| value | higher| <------------> | higher |

The `nn` module makes softmax available as a module. Since, as usual, input tensors may have an additional batch 0th dimension, or have dimensions along which they **encode probabilities** and others along which they **don't**, `nn.Softmax` requires us to specify the dimension along which the softmax function is applied:

```python=
# In[10]:
softmax = nn.Softmax(dim=1)

x = torch.tensor([[1.0, 2.0, 3.0],
                  [1.0, 2.0, 3.0]])

softmax(x)
```

```python
# Out[10]:
tensor([[0.0900, 0.2447, 0.6652],
        [0.0900, 0.2447, 0.6652]])
```

We have **two input vectors** in **two rows** (just like when we work with batches), so we initialize `nn.Softmax` to operate along dimension 1.

```python=
# In[11]:
model = nn.Sequential(
    nn.Linear(3072, 512),
    nn.Tanh(),
    nn.Linear(512, 2),
    nn.Softmax(dim=1))  # add a softmax at the end of the model
```

* Try running the model before training it:

```python=
# In[12]:
# Pick one image to build a batch of one image: our bird
img, _ = cifar2[0]

plt.imshow(img.permute(1, 2, 0))
plt.show()
```

###### tags: A random bird from the CIFAR-10 dataset (after normalization)

* To call the model, we need to make the input have the right dimensions.
* Our model expects 3,072 features in the input, and `nn` works with data organized into batches along the zeroth dimension.
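
The reshaping this calls for can be sketched in plain Python. Assuming a row-major (contiguous) layout, flattening a C × H × W image puts element (c, h, w) at position c·H·W + h·W + w of the 1D vector, and wrapping that vector in an outer list plays the role of the batch dimension. The helper name `flatten_chw` is made up for this illustration and is not a PyTorch function.

```python
def flatten_chw(image_chw):
    """Hypothetical helper: flatten a C x H x W nested list in row-major order."""
    return [value for channel in image_chw for row in channel for value in row]


C, H, W = 3, 32, 32
# Fill each position with a value that encodes its own (c, h, w) index.
image = [[[(c, h, w) for w in range(W)] for h in range(H)] for c in range(C)]

flat = flatten_chw(image)   # 3 * 32 * 32 = 3072 input features
batch = [flat]              # a "batch" holding a single flattened image

c, h, w = 2, 5, 7
position = c * H * W + h * W + w   # where element (2, 5, 7) ends up in flat
```

The outer list of length 1 corresponds to the extra zeroth dimension the model expects, and the inner list of length 3,072 to the input features.
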

```python=
# In[13]:
# Turn the 3 × 32 × 32 image into a 1D tensor,
# then add an extra dimension in the zeroth position
img_batch = img.view(-1).unsqueeze(0)
```

```python=
# In[14]:
# invoke our model
out = model(img_batch)
out
```

```python
# Out[14]:
tensor([[0.4784, 0.5216]], grad_fn=<SoftmaxBackward>)
```

In our case, we need to take the max along the probability vector (not across batches), therefore, dimension 1:

```python=
# In[15]:
# When given a dimension, torch.max returns the maximum element
# along that dimension as well as the index at which that value occurs
_, index = torch.max(out, dim=1)

index
```

```python
# Out[15]:
tensor([1])
```

#### Summary:

1. We adapted the **model outputs** to the classification task at hand by turning them into **output probabilities**
2. We ran the model against an input image and verified that our plumbing works

---

### 5 -- A loss for classifying

In earlier chapters we used the mean squared error (MSE) as our loss. We could still use MSE and make the output probabilities converge to [0.0, 1.0] and [1.0, 0.0]. What we really care about, though, is that the first probability is higher than the second for airplanes, and vice versa for birds.

What we need to maximize in this case is the probability associated with the correct class, `out[class_index]`, where `out` is the output of softmax and `class_index` is a vector containing 0 for "airplane" and 1 for "bird" for each sample. This quantity--that is, the probability associated with the correct class--is referred to as the ***likelihood*** (of our model's parameters, given the data)

Simply put, we want a loss function that is very high when the likelihood is low: so low that the alternatives have a higher probability. Conversely, the loss should be low when the likelihood is higher than the alternatives, and we are not really fixated on driving the probability up to 1.

**negative log likelihood (NLL)** -- a loss function that behaves that way

```python=
# Pseudocode:
NLL = - sum(log(out_i[c_i]))
# the sum is taken over N samples
# c_i is the correct class for sample i.
```

![](https://i.imgur.com/KMFOf8a.png)
###### tags: The NLL loss as a function of the predicted probabilities

The figure above shows that when low probabilities are assigned to the data, the NLL grows to infinity, whereas it decreases at a rather shallow rate when probabilities are greater than 0.5.

★ NLL takes **probabilities** as input; so, as the likelihood grows, the other probabilities will necessarily decrease.

Summing up, our loss for classification can be computed as follows (for each sample in the batch):

* Run the forward pass, and obtain the output values from the last (linear) layer.
* Compute their softmax, and obtain probabilities.
* Take the predicted probability corresponding to the correct class (the likelihood of the parameters). (We know what the correct class is because this is a supervised problem.)
* Compute its logarithm, slap a minus sign in front of it, and add it to the loss.

★ The `nn.NLLLoss` class: contrary to what you might expect, it does not take probabilities but rather takes a tensor of log probabilities as input. It then computes the NLL of our model given the batch of data. Taking the logarithm of a probability is tricky when the probability gets close to zero. The workaround is to use `nn.LogSoftmax` instead of `nn.Softmax`, which takes care to make the calculation numerically stable.

```python=
model = nn.Sequential(
    nn.Linear(3072, 512),
    nn.Tanh(),
    nn.Linear(512, 2),
    nn.LogSoftmax(dim=1))
```

```python=
# instantiate our NLL loss
loss = nn.NLLLoss()
```

```python=
img, label = cifar2[0]

out = model(img.view(-1).unsqueeze(0))

loss(out, torch.tensor([label]))
# first argument -- the output of 'nn.LogSoftmax' for a batch
# second argument -- a tensor of class indices (zeros and ones, in our case)

# Output:
# tensor(0.6509, grad_fn=<NllLossBackward>)
```

#### How using cross-entropy loss improves over MSE:

The cross-entropy loss has some slope when the **prediction is off target** (in the low-loss corner, the correct class is assigned a predicted probability of 99.97%), while MSE saturates much earlier and, crucially, also for very wrong predictions. The underlying reason is that the slope of the MSE is too low to compensate for the flatness of the softmax function for wrong predictions. This is why the MSE for probabilities is not a good fit for classification work.

###### tags: The cross entropy (left) and MSE between predicted probabilities and the target probability vector (right) as functions of the predicted scores--that is, before the (log-) softmax

---

### 6 -- Training the classifier

```python=
import torch
import torch.nn as nn
import torch.optim as optim

model = nn.Sequential(
    nn.Linear(3072, 512),
    nn.Tanh(),
    nn.Linear(512, 2),
    nn.LogSoftmax(dim=1))

learning_rate = 1e-2

optimizer = optim.SGD(model.parameters(), lr=learning_rate)

loss_fn = nn.NLLLoss()

n_epochs = 100

for epoch in range(n_epochs):
    for img, label in cifar2:
        out = model(img.view(-1).unsqueeze(0))
        loss = loss_fn(out, torch.tensor([label]))

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    # Prints the loss of the last sample seen in this epoch
    print("Epoch: %d, Loss: %f" % (epoch, float(loss)))
```

###### tags: Training loops: (A) averaging updates over the whole dataset; (B) updating the model at each sample; (C) averaging updates over minibatches

Following gradients estimated over minibatches, which are poorer approximations of the gradients estimated across the whole dataset, helps convergence and prevents the optimization process from getting stuck in the **local minima** it encounters along the way.

###### tags: Gradient descent averaged over the whole dataset (light path) versus stochastic gradient descent, where the gradient is estimated on **randomly picked** minibatches

![](https://i.imgur.com/2PnivcH.png)
###### tags: A data loader dispensing minibatches by using a dataset to sample individual data items

The `torch.utils.data` module has a class that helps with shuffling and organizing the data in minibatches: `DataLoader`. The job of a data loader is to sample minibatches from a dataset, giving us the flexibility to choose from different sampling strategies. (A very common strategy: uniform sampling after shuffling the data at each epoch.)

```python=
train_loader = torch.utils.data.DataLoader(cifar2, batch_size=64,
                                           shuffle=True)
# At a minimum, the 'DataLoader' constructor takes a Dataset object as input,
# along with 'batch_size' and a 'shuffle' Boolean
# that indicates whether the data needs to be shuffled at the beginning of each epoch
```

```python=
# A 'DataLoader' can be iterated over,
# so we can use it directly in the inner loop of our new training code
import torch
import torch.nn as nn
import torch.optim as optim

train_loader = torch.utils.data.DataLoader(cifar2, batch_size=64,
                                           shuffle=True)

model = nn.Sequential(
    nn.Linear(3072, 512),
    nn.Tanh(),
    nn.Linear(512, 2),
    nn.LogSoftmax(dim=1))

learning_rate = 1e-2

optimizer = optim.SGD(model.parameters(), lr=learning_rate)

loss_fn = nn.NLLLoss()

n_epochs = 100

for epoch in range(n_epochs):
    for imgs, labels in train_loader:
        batch_size = imgs.shape[0]
        outputs = model(imgs.view(batch_size, -1))
        loss = loss_fn(outputs, labels)

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    # Prints the loss of the last minibatch seen in this epoch
    print("Epoch: %d, Loss: %f" % (epoch, float(loss)))
```

At each inner iteration, `imgs` is a tensor of size 64 × 3 × 32 × 32 -- that is, a minibatch of 64 (32 × 32) RGB images -- while `labels` is a tensor of size 64 containing label indices.

```python
# Training output
Epoch: 0, Loss: 0.523478
Epoch: 1, Loss: 0.391083
Epoch: 2, Loss: 0.407412
Epoch: 3, Loss: 0.364203
...
Epoch: 96, Loss: 0.019537
Epoch: 97, Loss: 0.008973
Epoch: 98, Loss: 0.002607
Epoch: 99, Loss: 0.026200
```

We see that the loss decreases somehow, but we have no idea whether it is low enough. Since our goal is to **correctly assign classes to images**, preferably on an **independent dataset**, we can compute the accuracy of our model on the **validation set** as the **number of correct classifications** over the total:

```python=
val_loader = torch.utils.data.DataLoader(cifar2_val, batch_size=64,
                                         shuffle=False)

correct = 0
total = 0

with torch.no_grad():
    for imgs, labels in val_loader:
        batch_size = imgs.shape[0]
        outputs = model(imgs.view(batch_size, -1))
        _, predicted = torch.max(outputs, dim=1)
        total += labels.shape[0]
        correct += int((predicted == labels).sum())

print("Accuracy: %f" % (correct / total))

# Output:
# Accuracy: 0.794000
```

```python=
# A rather arbitrary number of layers
# (adding some bling to the model by including more layers)
model = nn.Sequential(
    nn.Linear(3072, 1024),
    nn.Tanh(),
    nn.Linear(1024, 512),
    nn.Tanh(),
    nn.Linear(512, 128),
    nn.Tanh(),
    nn.Linear(128, 2),
    nn.LogSoftmax(dim=1))
```

The combination of `nn.LogSoftmax` and `nn.NLLLoss` is equivalent to using `nn.CrossEntropyLoss`.

`nn.NLLLoss` in fact computes the cross entropy, but it takes log probability predictions as inputs, where `nn.CrossEntropyLoss` takes scores (sometimes called logits). Technically, `nn.NLLLoss` is the cross entropy between the Dirac distribution, putting all mass on the target, and the predicted distribution given by the log probability inputs.

To add to the confusion, in information theory, up to normalization by sample size, this cross entropy can be interpreted as a negative log likelihood of the predicted distribution under the target distribution as an outcome.
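
The equivalence between log-softmax followed by NLL and a cross entropy computed from raw scores can be checked by hand for a single sample. The sketch below uses only the standard library; `log_softmax`, `nll`, and `cross_entropy_from_logits` are illustrative helpers written for this note, not PyTorch functions, and the scores are made up. The log-softmax subtracts the maximum score before exponentiating (the log-sum-exp trick), which is one common way to keep the computation numerically stable.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax of a list of raw scores."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(v - m) for v in logits))
    return [v - log_sum_exp for v in logits]


def nll(log_probs, target):
    """Negative log likelihood of the correct class for one sample."""
    return -log_probs[target]


def cross_entropy_from_logits(logits, target):
    """Cross entropy computed directly from raw scores (naive softmax)."""
    denominator = sum(math.exp(v) for v in logits)
    return -math.log(math.exp(logits[target]) / denominator)


logits = [0.3, -1.2]   # made-up scores for (airplane, bird)
target = 1             # the correct class is "bird"

loss_a = nll(log_softmax(logits), target)
loss_b = cross_entropy_from_logits(logits, target)

# With large scores the stable version still works,
# whereas math.exp(1000.0) on its own would overflow.
stable = log_softmax([1000.0, 0.0])
```

Both `loss_a` and `loss_b` describe the same quantity; they differ only in whether the logarithm is taken inside the model (log-softmax) or inside the loss.
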
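
Either way, when our model predicts (softmax-applied) probabilities, both losses are the negative log likelihood of the model parameters given the data.

```python=
model = nn.Sequential(
    nn.Linear(3072, 1024),
    nn.Tanh(),
    nn.Linear(1024, 512),
    nn.Tanh(),
    nn.Linear(512, 128),
    nn.Tanh(),
    nn.Linear(128, 2))

loss_fn = nn.CrossEntropyLoss()
# Drop the last nn.LogSoftmax layer from the network
# and use nn.CrossEntropyLoss as the loss
```

The only gotcha: the output of the model will not be interpretable as probabilities (or log probabilities). We need to explicitly pass the output through a softmax to obtain those.

* The `parameters()` method of `nn.Module` (the one we use to provide the parameters to the optimizer) gives a quick way to determine how many parameters a model has. We may want to differentiate the number of trainable parameters from the overall model size.

```python=
# In[7]:
# connected_model refers to the deeper model defined above
# (its parameter count matches the output below).
# numel() gives the number of elements in each parameter tensor.
# Depending on the use case, we may only want to count parameters
# that have requires_grad set to True (the trainable ones).
numel_list = [p.numel()
              for p in connected_model.parameters()
              if p.requires_grad]
sum(numel_list), numel_list
```

```python
# Out[7]:
(3737474, [3145728, 1024, 524288, 512, 65536, 128, 256, 2])
# 3.7 million parameters: a pretty big network
```

```python=
# In[9]:
# first_model refers to the first (3072 -> 512 -> 2) model
numel_list = [p.numel() for p in first_model.parameters()]
sum(numel_list), numel_list
```

```python
# Out[9]:
(1574402, [1572864, 512, 1024, 2])
```

The number of parameters in our first model is roughly half that in our latest model. From the list of individual parameter sizes, we start to see what is responsible: in the first model, the first module has 1.5 million parameters. In the full network, we have 1,024 output features, which leads the first linear module to have 3 million parameters.

We know that a linear layer computes `y = weight * x + bias`. If `x` has length 3,072 (disregarding the batch dimension for simplicity) and `y` must have length 1,024, then the `weight` tensor needs to be of size 1,024 × 3,072 and the `bias` size must be 1,024. As we found earlier, 1024 * 3072 + 1024 = 3,146,752. We can verify these quantities directly:

```python=
# In[10]:
linear = nn.Linear(3072, 1024)

linear.weight.shape, linear.bias.shape
```

```python
# Out[10]:
(torch.Size([1024, 3072]), torch.Size([1024]))
# Our neural network won't scale very well with the number of pixels
```

Q: What if we had a 1,024 × 1,024 RGB image?

A: That's 3.1 million input values. Even abruptly going to 1,024 hidden features (which is not going to work for our classifier), we would have over 3 billion parameters. Using 32-bit floats, we're already at 12 GB of RAM, and we haven't even hit the second layer, much less computed and stored the gradients. That's just not going to fit on most present-day GPUs.

---

### 7 -- The limits of going fully connected

A fully connected layer is like taking every single input value -- that is, every single component of our RGB image -- and computing a linear combination of it with all the other values for every output feature. On one hand, we allow for the combination of any pixel with every other pixel in the image being potentially relevant for our task. On the other hand, since we treat the image as one big vector of numbers, we are not utilizing the relative position of neighboring or faraway pixels.

###### tags: Using a fully connected module with an input image: every input pixel is combined with every other to produce each element in the output.

###### tags: Translation invariance, or the lack thereof, with fully connected layers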
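
The parameter counts quoted in this chapter for the fully connected models follow from simple arithmetic: a linear layer with `n_in` inputs and `n_out` outputs holds `n_in * n_out` weights plus `n_out` biases. The helpers `linear_params` and `mlp_params` below are illustrative functions written for this note, not part of PyTorch, and the memory estimate assumes 4 bytes per 32-bit float.

```python
def linear_params(n_in, n_out):
    """Number of weights plus biases in one fully connected layer."""
    return n_in * n_out + n_out


def mlp_params(layer_sizes):
    """Total parameters of a stack of linear layers with the given sizes."""
    return sum(linear_params(a, b)
               for a, b in zip(layer_sizes, layer_sizes[1:]))


first_model = mlp_params([3072, 512, 2])
deeper_model = mlp_params([3072, 1024, 512, 128, 2])

# A 1024 x 1024 RGB image fed to a single 1,024-unit hidden layer.
big_inputs = 1024 * 1024 * 3
big_first_layer = linear_params(big_inputs, 1024)
big_first_layer_gib = big_first_layer * 4 / 2**30   # 4 bytes per float32
```

The input size enters the count as a multiplicative factor of the first hidden layer's width, which is why a fully connected first layer scales so poorly with the number of pixels.
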
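
The mismatch between our problem and our network structure ends up overfitting the training data, rather than learning the general features we want the model to detect.

---

## 7-3 Conclusion

We solved a simple classification problem, from the dataset, to the model, to minimizing an appropriate loss in a training loop.

The authors found a severe shortcoming in their model: it kept treating 2D images as 1D data. Also, we have no natural way to incorporate the translation invariance of our problem.

---

Next chapter: learn how to exploit the 2D nature of image data to get better results.

We can use what we have learned right away to process data that does not have this translation invariance.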