# PyTorch MNIST Practice

# 4-steps
1. Load data
2. Build Model
3. Train
4. Test

# **1. Load data**
>MNIST contains 60,000 training samples and 10,000 test samples. The images are already standardized: each one is a grayscale image of size 28x28.
>Every record in the MNIST dataset consists of an image (the picture of a digit) and a label (the true digit shown in that picture).

## **1.1 Import packages**
```python
import torch
import torch.nn as nn
import torch.optim as optim
import torch.utils.data as dset
from torchvision import datasets, transforms
import matplotlib.pyplot as plt
import numpy as np
import time

# Run on GPU when available, otherwise fall back to CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
```

## **1.2 Define the image transforms**
The image data downloaded from MNIST comes as a PIL **image object**, so it has to be converted first: a PIL.Image with values in [0, 255], or a numpy.ndarray of shape (H, W, C), is turned into a torch.FloatTensor of shape [C, H, W] with values in the range [0, 1.0].
* C: number of channels, i.e. how many values describe one pixel; a color image has 3 channels (R, G, B), a grayscale image has 1
* H: image height
* W: image width

```python
# TODO: Define your transforms for the training, validation, and testing sets
transform = transforms.Compose(
    [transforms.ToTensor(),
     transforms.Normalize((0.5,), (0.5,)),
     #transforms.Normalize((0.1307,), (0.3081,))
    ]
)
```
* `Compose()`: combines several transform objects into a list for MNIST's `transform` argument; the functions in the list are applied in order.
* `transforms.ToTensor()`: converts a PIL.Image (or numpy.ndarray) with values in [0, 255] into a (C, H, W) FloatTensor with values in [0, 1].
* `transforms.Normalize()`: given a per-channel mean and standard deviation, normalizes the tensor.
    * Normalized_image = (image - mean) / std; **values spread evenly around 0 help the model train better**
* Since the MNIST images are already very clean, not much extra data cleaning is needed.

## **1.3 Load the data**
```python
# TODO: Load the datasets with ImageFolder
# datasets.MNIST(): loads the MNIST digit dataset
# 1. root: the folder where the dataset is stored
# 2. train: boolean, whether to use the training split or the test split
# 3. transform: the image-processing function defined above
# 4. download=True: download the files automatically if they are not already in the folder
trainSet = datasets.MNIST(root='MNIST', download=True, train=True, transform=transform)
testSet = datasets.MNIST(root='MNIST', download=True, train=False, transform=transform)

# TODO: Using the image datasets and the transforms, define the dataloaders
# batch_size: how many images are processed at a time
# shuffle=True: shuffle the images randomly
trainLoader = dset.DataLoader(trainSet, batch_size=64, shuffle=True)
testLoader = dset.DataLoader(testSet, batch_size=64, shuffle=False)  # no need to shuffle the test set
```
* The dataset returns [image, target value] pairs.

### 1.3.1 Inspect the training set:
```python
dataiter = iter(trainLoader)   # create an iterator
images, labels = next(dataiter)
print(images.shape, labels.shape, images.min(), images.max())
# Output: torch.Size([64, 1, 28, 28]) torch.Size([64]) tensor(-1.) tensor(1.)
# batch size=64, channel=1, h*w=28*28, 64 labels, normalized to [-1, 1]
```

### 1.3.2 View some training images:
```python
plt.figure(figsize=(10,10))
random_inds = np.random.choice(60000, 36)
for i in range(36):
    plt.subplot(6, 6, i+1)
    plt.xticks([])
    plt.yticks([])
    plt.grid(False)
    image_ind = random_inds[i]
    plt.imshow(np.squeeze(trainSet[image_ind][0]), cmap=plt.cm.binary)
    plt.xlabel(trainSet[image_ind][1])
```
![](https://i.imgur.com/tpBFQvw.png)
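The commented-out `Normalize((0.1307,), (0.3081,))` line in section 1.2 uses the MNIST training-set statistics. As an aside (not part of the original notebook), the sketch below shows one way those values can be re-estimated, reusing the imports from section 1.1; the variable names are only for illustration.

```python
# Minimal sketch (not in the original notebook): estimate the per-pixel mean
# and std of the MNIST training images after ToTensor() scales them to [0, 1].
raw_set = datasets.MNIST(root='MNIST', download=True, train=True,
                         transform=transforms.ToTensor())
raw_loader = dset.DataLoader(raw_set, batch_size=1000, shuffle=False)

total, total_sq, count = 0.0, 0.0, 0
for images, _ in raw_loader:
    total += images.sum().item()
    total_sq += (images ** 2).sum().item()
    count += images.numel()

mean = total / count
std = (total_sq / count - mean ** 2) ** 0.5
print(mean, std)   # roughly 0.1307 and 0.3081
```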
# **2. Build Model**
>Pick a suitable model and find the best parameters so that the accuracy on the training data approaches 100% (or the error approaches 0), then apply the same model to the testing data to see whether it performs just as well, i.e. whether it overfits.

## **2.1 Build the fully connected neural network architecture**
First define the Network class, in three steps:
1. Inherit from the Module class
2. Override `__init__()`
    * set up the hidden layers and activation functions
3. Override `forward()`

With the `Sequential` class, forward propagation simply runs through the layers in the order they were defined.

```python
# Model
class Net(nn.Module):
    def __init__(self):  # constructor
        super(Net, self).__init__()
        self.main = nn.Sequential(
            # '''TODO: Define the first fully connected layer and the activation function.'''
            # 784 = 28*28: the input pixels are flattened into a 1-D vector
            nn.Linear(in_features=784, out_features=128),
            # ReLU() is used as the activation function
            nn.ReLU(),

            # '''TODO: Define the second fully connected layer and the activation function.'''
            nn.Linear(in_features=128, out_features=64),
            nn.ReLU(),

            # '''TODO: Define the output layer and output activation function to output the classification probabilities'''
            # The final layer has 10 output nodes, one for each digit 0~9
            nn.Linear(in_features=64, out_features=10),
            # nn.LogSoftmax(dim=1)
        )

    # forward propagation
    def forward(self, input):
        return self.main(input)

# Inspect the Net class
net = Net().to(device)
print(net)
```

### Notes
* Prediction => forward propagation
    * Goal: push the information through the network, layer by layer, until the output of the last layer is obtained.
    * Idea: each hidden layer, through its activation function, extracts **features** of the data; you can think of each layer's output as being transformed by the activation function so that the features become more pronounced. With too many hidden layers, however, the model may overfit.
    * Overfitting: like fitting the data with a 4th- or 5th-degree curve; it can match the training set very closely, yet it is not more accurate on the testing data. What matters more is how well the curve generalizes; a common remedy is regularization.
* How many neurons?
    * Too few: under-fitting
    * Too many: training takes too long and may lead to over-fitting
    * What matters is the number of parameters W1, W2, W3, ... (how deep the network is), not how many neurons each layer has.

| Layer x Size | Error rate (%) | Layer x Size | Error rate (%) |
| :------: | :--------: | :--------: |:--------:|
|1x2k|24.2|||
|2x2k|20.4|||
|3x2k|18.4|||
|4x2k|17.8|||
|5x2k|17.2|1x3772|22.5|
|7x2k|17.1|1x4634|22.6|
|||1x16k|22.1|

* softmax() is used for multi-class classification: it maps the outputs of several neurons into the interval (0, 1) so they can be interpreted as probabilities, which is what allows the network to classify among multiple classes.

## **2.2 Compile the model**
```python
'''TODO: Experiment with different optimizers and learning rates. How do these affect
   the accuracy of the trained model? Which optimizers and/or learning rates yield
   the best performance?'''
optimizer = optim.Adam(net.parameters(), lr=0.0001)
loss_function = nn.CrossEntropyLoss()

# Measure the validation loss and accuracy
def validation(model, dataloader, loss_function, device):
    loss = 0.0
    accuracy = 0.0
    with torch.no_grad():
        for i, data in enumerate(dataloader):
            inputs, labels = data[0].to(device), data[1].to(device)
            inputs = inputs.view(inputs.shape[0], -1)
            output = model(inputs)
            loss += loss_function(output, labels).item()

            # The model outputs raw logits (LogSoftmax is commented out);
            # exp() is monotonic, so max() still picks the predicted class
            ps = torch.exp(output)
            equality = (labels.data == ps.max(dim=1)[1])
            accuracy += equality.type(torch.FloatTensor).mean()
    return loss, accuracy
```
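A side note on the commented-out `nn.LogSoftmax(dim=1)` layer above: `nn.CrossEntropyLoss` already applies LogSoftmax followed by NLLLoss internally, so the model can output raw logits. The small sketch below (not part of the original notebook; the tensors are made-up examples) checks that equivalence:

```python
# Minimal sketch (not in the original notebook): nn.CrossEntropyLoss on raw
# logits equals nn.NLLLoss applied to LogSoftmax of the same logits.
logits = torch.randn(4, 10)              # made-up batch: 4 samples, 10 classes
targets = torch.tensor([3, 7, 0, 9])     # made-up labels

ce = nn.CrossEntropyLoss()(logits, targets)
nll = nn.NLLLoss()(nn.LogSoftmax(dim=1)(logits), targets)
print(ce.item(), nll.item())             # the two values are the same
```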
Train**
## **3.1 Train the model**
```python
epochs = 35
steps = 0
running_loss = 0.0
train_accuracy = 0.0
start = time.time()

# lists used later to plot the loss per epoch
x = []    # epoch index
y = []    # training loss
x1 = []   # epoch index
y1 = []   # validation loss

for e in range(epochs):
    for i, data in enumerate(trainLoader):
        inputs, labels = data[0].to(device), data[1].to(device)  # Move to GPU
        inputs = inputs.view(inputs.shape[0], -1)
        steps += 1

        # Zero the parameter gradients
        optimizer.zero_grad()

        # Forward
        output = net(inputs)                  # run the inputs through the network to get the output
        loss = loss_function(output, labels)  # the loss is the cross entropy; smaller is better

        # Backward
        loss.backward()   # compute the gradients
        optimizer.step()  # update the parameters

        running_loss += loss.item()

        # The model outputs raw logits; exp() is monotonic, so max() still picks the predicted class
        ps = torch.exp(output)
        equality = (labels.data == ps.max(dim=1)[1])
        train_accuracy += equality.type(torch.FloatTensor).mean()

        if steps % len(trainLoader) == 0:
            # model.eval()
            # Validate in each epoch
            with torch.no_grad():
                valid_running_loss, valid_accuracy = validation(net, testLoader, loss_function, device)

            print("Epoch: {}/{} ".format(e+1, epochs),
                  "\nTraining Loss: {:.4f} ".format(running_loss/len(trainLoader)),
                  "Training Accuracy: {:.4f}".format(train_accuracy/len(trainLoader)),
                  "Validation Loss: {:.4f} ".format(valid_running_loss/len(testLoader)),
                  "Validation Accuracy: {:.4f}".format(valid_accuracy/len(testLoader)),
                  "[{}/{} ({:.0f}%)]".format(i * len(inputs), len(trainLoader.dataset),
                                             100. * i / len(trainLoader)))
            x.append(e+1)
            y.append(running_loss/len(trainLoader))
            x1.append(e+1)
            y1.append(valid_running_loss/len(testLoader))
            running_loss = 0
            train_accuracy = 0
            # model.train()  # Make sure training is back on

time_elapsed = time.time() - start
print("Total time: {:.0f}m {:.0f}s".format(time_elapsed//60, time_elapsed % 60))
```
* Flow:
    * for each epoch (one pass over the whole dataset)
        * for each iteration (how many training steps per epoch)
            * batch size (how many samples are trained on at once)
            * forward propagation: run each input's features through the network to get the prediction $\hat{y}$
            * compute the loss between $\hat{y}$ and the true label $y$
            * backpropagation: propagate the loss backwards and correct the parameters (w, b) of every layer
* Data set size = Iterations x Batch size (for 1 epoch); see the sketch after this list
* Batch size:
    * Large: more data is considered, so the update direction is more accurate, but fewer steps are taken towards the minimum
    * Small: less data is considered, so the update direction is noisier (it bounces around near the minimum), but many more updates can be made
    * batch size = 1 means the model is re-trained on every single sample, so an outlier throws the update far off; with batch training an outlier is just one sample in the batch and its effect is diluted
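To make the "Data set size = Iterations x Batch size" relation concrete, here is a tiny check (not part of the original notebook) using the loaders from section 1.3: with 60,000 training images and a batch size of 64, one epoch takes ceil(60000 / 64) = 938 iterations, with the last batch holding only 32 images.

```python
# Minimal sketch (not in the original notebook): iterations per epoch.
import math

print(len(trainLoader.dataset))                   # 60000 training images
print(math.ceil(len(trainLoader.dataset) / 64))   # 938 batches of (up to) 64
print(len(trainLoader))                           # 938 -> iterations in one epoch
```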
## 3.2 Plot the loss
```python
fig = plt.figure()
plt.plot(x, y, 'r-', label=u'Training Loss')
# plt.plot(x1, y1, 'b-', label=u'Validation Loss')  # uncomment to compare with the validation loss collected above
plt.legend()
plt.xlabel(u'epochs')
plt.ylabel(u'loss')
plt.title('Compare loss')
plt.show()
```
![](https://i.imgur.com/2T3jvqB.png)

# **4. Test**
## 4.1 Testing the trained model
```python
# TODO: Do validation on the test set
with torch.no_grad():
    test_loss, test_accuracy = validation(net, testLoader, loss_function, device)

print("Testing Accuracy: {:.2f}%".format(100 * test_accuracy/len(testLoader)))
```

## 4.2 Class Prediction
>It's time to write a function for making predictions with your model. A common practice is to predict the top 5 or so (usually called top-K) most probable classes. You'll want to calculate the class probabilities then find the K largest values.
>To get the top K largest values in a tensor use x.topk(k). This method returns both the highest k probabilities and the indices of those probabilities corresponding to the classes. You need to convert from these indices to the actual class labels using class_to_idx which hopefully you added to the model or from an ImageFolder you used to load the data (see here). Make sure to invert the dictionary so you get a mapping from index to class as well.

```python
def predict(topk=5):
    ''' Predict the class (or classes) of an image using a trained deep learning model.
    '''
    # TODO: Implement the code to predict the class from an image file
    with torch.no_grad():
        # take one batch from the test loader and pick image number 62 from it
        All = next(iter(testLoader))
        data = All[0][62], All[1][62]
        inputs, labels = data[0].to(device), data[1].to(device)
        inputs = inputs.view(inputs.shape[0], -1)
        output = net(inputs)

        # turn the logits into probabilities, then keep the top-k classes
        softmax = nn.Softmax(dim=1)
        ps = softmax(output)
        probs, indices = torch.topk(ps, topk)
        probs = [float(prob) for prob in probs[0]]
        # class_to_idx maps class name -> index; invert it to map index -> class name
        invert = {v: k for k, v in testSet.class_to_idx.items()}
        classes = [invert[int(index)] for index in indices[0]]
    return probs, classes

prob, classes = predict(topk=5)
print("prob: ", prob)
print("classes: ", classes)
```
>prob:  [0.9999167919158936, 8.054587669903412e-05, 1.8697675159273786e-06, 5.743477800024266e-07, 7.593805406713727e-08]
>classes:  ['9 - nine', '5 - five', '8 - eight', '6 - six', '7 - seven']

## 4.3 Sanity Checking
>Now that you can use a trained model for predictions, check to make sure it makes sense. Even if the testing accuracy is high, it's always good to check that there aren't obvious bugs. Use matplotlib to plot the probabilities for the top 5 classes as a bar graph, along with the input image. It should look like this:
>To show a PyTorch tensor as an image, use the imshow function defined above.

```python
# TODO: Display an image along with the top 5 classes
max_index = np.argmax(prob)
max_probability = prob[max_index]
label = classes[max_index]

fig = plt.figure(figsize=(8,8))
ax1 = plt.subplot2grid((20,10), (0,0), colspan=9, rowspan=9)
ax2 = plt.subplot2grid((20,10), (10,2), colspan=5, rowspan=9)

# show the same test image (number 62 of the first batch) that predict() used
All = next(iter(testLoader))
inputs = All[0][62]
labels = All[1][62]

ax1.axis('off')
ax1.set_title(labels.tolist())
ax1.imshow(inputs.numpy().squeeze(), cmap='gray_r')

# horizontal bar chart of the top-5 class probabilities
labels = classes
y_pos = np.arange(5)
ax2.set_yticks(y_pos)
ax2.set_yticklabels(labels)
ax2.set_xlabel('Probability')
ax2.invert_yaxis()
ax2.barh(y_pos, prob, xerr=0, align='center', color='blue')

plt.show()
```
>![](https://i.imgur.com/ZFDPPqu.png)
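As a further sanity check beyond the single image above (not part of the original notebook), a per-class view over the whole test set can reveal which digits the network systematically confuses. The sketch below builds a 10x10 confusion matrix using only the objects already defined in this notebook.

```python
# Minimal sketch (not in the original notebook): confusion matrix over the
# test set; rows are the true digits, columns the predicted digits.
confusion = torch.zeros(10, 10, dtype=torch.long)
with torch.no_grad():
    for images, targets in testLoader:
        images = images.view(images.shape[0], -1).to(device)
        preds = net(images).argmax(dim=1).cpu()
        for t, p in zip(targets, preds):
            confusion[t, p] += 1
print(confusion)   # off-diagonal entries show which digits get mixed up
```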