# Hand Pose Recognition with CNN

> [name=410721236 資工四 鍾承恩]

> [TOC]

---

## Assignment Description

### **1. Descriptions**

In this assignment, you will practice the design of a Convolutional Neural Network to recognize the three different hand postures illustrated below:

![](https://i.imgur.com/HP08bVf.png)

You will be given 5 data sets captured from different persons under different photo-shooting conditions. The samples in each data set are stored in 9 separate directories. The directories 0000~0002, 0003~0005, and 0006~0008 contain samples of Posture 1, 2, and 3, respectively. Each directory has 20 samples. Each sample is a gray image of 32x32 pixels.

### **2. Steps to do this assignment**

1) Download the data set from the following link:
http://web.csie.ndhu.edu.tw/ccchiang/Data/All_gray_1_32_32.rar

2) Install OpenCV for Python to read image samples (skip this step if you know any other way to read images into a NumPy array). Visit the following website for details about how to install OpenCV: https://pypi.org/project/opencv-python/.

3) Read the images into the training set and the testing set.

![](https://i.imgur.com/UkG5uar.png)
![](https://i.imgur.com/Ixp1IZg.png)

4) Design the CNN architecture. You can try different network architectures to attain the best recognition performance. Don't forget to try the batch normalization layer, which might be useful for reducing the side effect of overfitting.

5) Train the network and test the performance. Based on the CNN you have designed, use Set 1~Set 3 for training and Set 4~Set 5 for testing.

6) The file you should submit for this assignment is a report, written on the HackMD online document server (https://hackmd.io/) or as a PDF file, which contains the following parts:
a) Method descriptions
b) Source code explanations
c) Experimental results (learning curves, recognition rates for both the training sets and testing sets, result comparisons for different model architectures or hyperparameters)
d) Discussions on the results
e) Concluding remarks (what problems were encountered and solved in the assignment, and what you have learned from the assignment)

---

## Implementation

### Method Description

Overview:
1. Import the required libraries
2. Create the CNN
3. Compile and train the model
4. Save the model for later use

![](https://i.imgur.com/PvTtTun.png)

### 1. Import Libraries

![](https://i.imgur.com/sw2GolJ.png)

Colab is used as the platform, so the data must first be uploaded to Google Drive and the drive mounted before the data can be used.

![](https://i.imgur.com/eXkm19v.png)

### 2. Data Preprocessing

![](https://i.imgur.com/6MjHA7W.png)
![](https://i.imgur.com/ZzdmKjK.png)
![](https://i.imgur.com/FEgb925.png)
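The preprocessing code above is shown only as screenshots (the full script appears in the Code section at the end). As a minimal sketch of the same idea, assuming the `SetX/000X/00XX/frame-000X.jpg` layout described in the assignment (the example path below is hypothetical), one frame can be read into a normalized tensor as follows:

```
import cv2
import torch

# Hypothetical example path following the SetX/000X/00XX/frame-000X.jpg layout
sample_path = 'Set1/0000/0000/frame-0000.jpg'

# Read the frame as a 32x32 grayscale image, then scale it to a 1x32x32 float
# tensor in [0, 1]; this is what transforms.ToTensor() does in the dataset class later on.
img = cv2.imread(sample_path, cv2.IMREAD_GRAYSCALE)
tensor = torch.from_numpy(img).float().unsqueeze(0) / 255.0
print(tensor.shape)  # torch.Size([1, 32, 32])
```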
### 3. Model Construction

```
# Create CNN Model
class CNN_Model(nn.Module):
    def __init__(self):
        super(CNN_Model, self).__init__()
        # Convolution 1, input shape = (1, 32, 32)
        self.cnn1 = nn.Conv2d(in_channels=1, out_channels=Feature_Filter,
                              kernel_size=Filter_Size, stride=Stride, padding=Padding)
        # output shape = (Feature_Filter, 32, 32) -- 'same' padding keeps the spatial size
        self.relu1 = nn.ReLU()  # activation
        # Max pool 1
        self.maxpool1 = nn.MaxPool2d(kernel_size=POOLING_SIZE)  # output shape = (Feature_Filter, 16, 16)
        # Convolution 2
        self.cnn2 = nn.Conv2d(in_channels=Feature_Filter, out_channels=Feature_Filter*2,
                              kernel_size=Filter_Size, stride=Stride, padding=Padding)
        # output shape = (Feature_Filter*2, 16, 16)
        self.relu2 = nn.ReLU()  # activation
        # Max pool 2
        self.maxpool2 = nn.MaxPool2d(kernel_size=POOLING_SIZE)  # output shape = (Feature_Filter*2, 8, 8)
        # Fully connected 1: the input size must equal the flattened feature-map size,
        # i.e. Feature_Filter*2 * 8 * 8 (= 16*16*16 = 4096 when Feature_Filter is 32)
        self.fc1 = nn.Linear(Feature_Filter * 2 * 8 * 8, Ladels)

    def forward(self, x):
        # Convolution 1
        out = self.cnn1(x)
        out = self.relu1(out)
        # Max pool 1
        out = self.maxpool1(out)
        # Convolution 2
        out = self.cnn2(out)
        out = self.relu2(out)
        # Max pool 2
        out = self.maxpool2(out)
        # Flatten
        out = out.view(out.size(0), -1)
        # Linear function (readout)
        out = self.fc1(out)
        return out, x
```

---

## Experimental Results

### 1. Epoch

| Epoch | Optimizer | Training accuracy | Testing accuracy |
| ----- | --------- | ----------------- | ---------------- |
| 10    | Adam      | 47.41%            | 41.39%           |
| 25    | Adam      | 97.41%            | 91.94%           |
| 35    | Adam      | 94.07%            | 85.56%           |
| 50    | Adam      | 99.81%            | 95.28%           |

![](https://i.imgur.com/8IzQ8JB.png)
![](https://i.imgur.com/4ZbhABU.png)
![](https://i.imgur.com/JgYIuVX.png)
![](https://i.imgur.com/5nkokaH.png)

---

### 2. Feature filter (epoch: 50, optimizer: Adam)

| Feature filters | Training accuracy | Testing accuracy |
| --------------- | ----------------- | ---------------- |
| 16              | 99.81%            | 83.06%           |
| 32              | 99.64%            | 84.44%           |
| 64              | 95.59%            | 90.00%           |

With more feature filters, the testing accuracy tends to improve (in these runs, 64 filters gave the highest testing accuracy), but training takes longer and the hardware load increases.

---

### 3. Filter size (epoch: 50, optimizer: Adam)

| Kernel size | Training accuracy | Testing accuracy |
| ----------- | ----------------- | ---------------- |
| 1           | 86.78%            | 67.78%           |
| 3           | 97.78%            | 86.67%           |
| 5           | 98.89%            | 90.28%           |

---

## Problems Encountered

1. When changing the parameters, the error shown below kept appearing:
![](https://i.imgur.com/IQLh8Gc.jpg)
After asking classmates, I learned that the product of the sizes in the linear layer has to equal the circled value, i.e. the size of the flattened feature map (see the sketch after this list).
2. With a high epoch count, the program takes a long time to finish because of limited computer performance.
3. Loading the data got stuck for a while because of path problems. Using a local environment such as VS Code might be more convenient, and the data would not have to be reloaded every so often.
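The size-mismatch error from problem 1 can also be avoided by computing the flattened feature-map size instead of hard-coding it. A minimal sketch (an added illustration, not part of the submitted code), assuming the same 32x32 grayscale input and the two convolution/pooling stages of the model above:

```
import torch
import torch.nn as nn

# Same structure as the model above: two 5x5 convolutions with 'same' padding,
# each followed by ReLU and a 2x2 max pool.
features = nn.Sequential(
    nn.Conv2d(1, 32, kernel_size=5, stride=1, padding=2), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(32, 64, kernel_size=5, stride=1, padding=2), nn.ReLU(), nn.MaxPool2d(2),
)

# Push a dummy 1x1x32x32 tensor through the feature extractor and read off the
# flattened size; this is the value the first Linear layer must expect.
with torch.no_grad():
    flat_size = features(torch.zeros(1, 1, 32, 32)).view(1, -1).size(1)

print(flat_size)  # 64 * 8 * 8 = 4096
```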
## Lessons Learned

1. Practical PyTorch: before this assignment I had never used any PyTorch functions, so I looked up a lot of related material for this assignment and learned how to use part of the API.
2. How to design a CNN architecture: reading the theory without implementing it does not really show how it works in practice. This assignment deepened my knowledge of deep learning.

---

## References

- https://colab.research.google.com/github/filipefborba/HandRecognition/blob/master/project3/project3.ipynb#scrollTo=l20Rj1HP7LB-
- https://colab.research.google.com/github/nearform/nodeconfeu-gesture-models/blob/master/create_gesture_model.ipynb#scrollTo=sjEyVbaw_ZSe
- https://hackmd.io/@lido2370/SJMPbNnKN?type=view

---

## Code

```
from google.colab import drive

# Mount Google Drive (requires logging in with a Google account)
drive.mount('/content/gdrive')

import os
import cv2
import numpy as np
import pandas as pd
import glob  # glob.glob can read file paths -- not used in the end
from google.colab.patches import cv2_imshow  # Colab has no cv2.imshow, so this must be imported separately
from torch.utils.data import Dataset
from skimage import io
# TensorFlow and tf.keras -- not used in the end
import tensorflow as tf
from tensorflow import keras
import torch
import torch.nn as nn
from torch.nn.modules.container import Sequential
import torchvision.transforms as transforms
import matplotlib.pyplot as plt
from torch.utils.data import DataLoader


def enumerate_files(dirs, path='/content/gdrive/MyDrive/All_gray_1_32_32/All_gray_1_32_32/', n_poses=3, n_samples=20):
    filenames, targets = [], []
    for p in dirs:
        for n in range(n_poses):
            for j in range(3):
                # Directories 0000~0008: 0000~0002 hold posture 1, 0003~0005 posture 2, 0006~0008 posture 3
                dir_name = path + p + '/000' + str(n*3+j) + '/'
                for s in range(n_samples):
                    d = dir_name + '%04d/' % s
                    for f in os.listdir(d):
                        if f.endswith('jpg'):
                            # filename += [d+f] raised "local variable 'filename' referenced before assignment"
                            # because += reads the variable before it has ever been assigned in this function
                            # (the pre-initialised list is called filenames); the plain assignment below works.
                            filename = d + f
                            # Strip the path prefix so only 'SetX/000X/00XX/frame-000X.jpg' is stored
                            filename = filename.replace('/content/gdrive/MyDrive/All_gray_1_32_32/All_gray_1_32_32/', "")
                            filenames += [filename]
                            targets.append(n)  # append the posture index n (0~2) for this sample
    print(len(targets))  # sanity check that images were found: train = 540, test = 360
    return filenames, targets


def read_images(files, root='/content/gdrive/MyDrive/All_gray_1_32_32/All_gray_1_32_32/'):
    imgs = []
    for f in files:
        # The stored filenames are relative to the dataset root, so join them with the root first.
        # Format: img_gray = cv2.imread('image.jpg', cv2.IMREAD_GRAYSCALE)
        img = cv2.imread(os.path.join(root, f), cv2.IMREAD_GRAYSCALE)
        # cv2_imshow(img) can display the image (it takes only the image, with no window-name argument)
        imgs.append(img)
    return imgs


def read_datasets(datasets, csv_name):
    files, labels = enumerate_files(datasets)
    # Build a DataFrame with a 'filename' column and a 'label' column
    dataframe = pd.DataFrame({"filename": files, "label": labels})
    dataframe.to_csv(csv_name)  # write it out as a CSV for the Dataset class below
    list_of_arrays = read_images(files)
    return np.array(list_of_arrays), labels


# __name__ is one of the built-in variables of the Python interpreter and holds the module name.
# When a file is imported, __name__ is the module name; when the file is run directly, it is '__main__'.
# I had to read several websites and ask classmates before I understood how to use it.
if __name__ == "__main__":
    train_sets = ['Set1', 'Set2', 'Set3']
    test_sets = ['Set4', 'Set5']
    # *_array: the images as a NumPy array; *_labels: the matching posture labels
    trn_array, trn_labels = read_datasets(train_sets, csv_name="train_data.csv")
    tst_array, tst_labels = read_datasets(test_sets, csv_name="test_data.csv")
    print(type(trn_array))  # confirm that the type is a NumPy array

# Check that a sample file exists (True = found, False = missing)
os.path.exists('/content/gdrive/MyDrive/All_gray_1_32_32/All_gray_1_32_32/Set1/0000/0000/frame-0000.jpg')
# os.path.isfile(...) would instead check that the path is an existing regular file


class CustomImageDataset(Dataset):
    def __init__(self, csv_file, root_dir, transform=None):
        self.annotations = pd.read_csv(csv_file)
        self.root_dir = root_dir
        self.transform = transform

    def __len__(self):
        return len(self.annotations)

    def __getitem__(self, index):
        # Column 0 of the CSV is the index, column 1 the filename, column 2 the label
        img_path = os.path.join(self.root_dir, self.annotations.iloc[index, 1])
        image = io.imread(img_path)
        label = torch.tensor(int(self.annotations.iloc[index, 2]))

        if self.transform:
            image = self.transform(image)

        return image, label


# Hyper Parameters
Ladels = 3              # number of posture classes
EPOCH = 50              # number of training epochs tested: 10/25/35/50
BATCH_SIZE = 50
LR = 0.001              # learning rate
OPTIMIZER = "Adam"      # optimizer
Feature_Filter = 32     # number of feature filters tested: 16/32/64
Filter_Size = 5         # kernel sizes tested: 1/3/5
Stride = 1
Padding = int(Filter_Size/2)
POOLING_SIZE = 2

# Load Data
train_data = CustomImageDataset(csv_file="train_data.csv",
                                root_dir='/content/gdrive/MyDrive/All_gray_1_32_32/All_gray_1_32_32/',
                                transform=transforms.ToTensor())
test_data = CustomImageDataset(csv_file="test_data.csv",
                               root_dir='/content/gdrive/MyDrive/All_gray_1_32_32/All_gray_1_32_32/',
                               transform=transforms.ToTensor())
train_loader = DataLoader(dataset=train_data, batch_size=BATCH_SIZE)
test_loader = DataLoader(dataset=test_data, batch_size=BATCH_SIZE)

# Create CNN Model
class CNN_Model(nn.Module):
    def __init__(self):
        super(CNN_Model, self).__init__()
        # Convolution 1, input shape = (1, 32, 32)
        self.cnn1 = nn.Conv2d(in_channels=1, out_channels=Feature_Filter,
                              kernel_size=Filter_Size, stride=Stride, padding=Padding)
        # output shape = (Feature_Filter, 32, 32) -- 'same' padding keeps the spatial size
        self.relu1 = nn.ReLU()  # activation
        # Max pool 1
        self.maxpool1 = nn.MaxPool2d(kernel_size=POOLING_SIZE)  # output shape = (Feature_Filter, 16, 16)
        # Convolution 2
        self.cnn2 = nn.Conv2d(in_channels=Feature_Filter, out_channels=Feature_Filter*2,
                              kernel_size=Filter_Size, stride=Stride, padding=Padding)
        # output shape = (Feature_Filter*2, 16, 16)
        self.relu2 = nn.ReLU()  # activation
        # Max pool 2
        self.maxpool2 = nn.MaxPool2d(kernel_size=POOLING_SIZE)  # output shape = (Feature_Filter*2, 8, 8)
        # Fully connected 1: the input size must equal the flattened feature-map size,
        # i.e. Feature_Filter*2 * 8 * 8 (= 16*16*16 = 4096 when Feature_Filter is 32)
        self.fc1 = nn.Linear(Feature_Filter * 2 * 8 * 8, Ladels)

    def forward(self, x):
        # Convolution 1
        out = self.cnn1(x)
        out = self.relu1(out)
        # Max pool 1
        out = self.maxpool1(out)
        # Convolution 2
        out = self.cnn2(out)
        out = self.relu2(out)
        # Max pool 2
        out = self.maxpool2(out)
        # Flatten
        out = out.view(out.size(0), -1)
        # Linear function (readout)
        out = self.fc1(out)
        return out, x


# Accuracy
def check_accuracy(loader, model):
    num_correct = 0
    num_samples = 0
    model.eval()

    with torch.no_grad():
        for x, y in loader:
            scores = model(x)[0]
            _, predictions = scores.max(1)
            num_correct += (predictions == y).sum()
            num_samples += predictions.size(0)

        print(f'Got {num_correct} / {num_samples} with accuracy '
              f'{round(float(num_correct) / float(num_samples) * 100, 2)}%')

    model.train()


cnn = CNN_Model()
print(cnn)
optimizer = torch.optim.Adam(cnn.parameters(), lr=LR)  # optimize all cnn parameters
loss_func = nn.CrossEntropyLoss()                      # the target label is not one-hot encoded

# Training model (cnn)
list_loss = []
list_epoch = []
for epoch in range(EPOCH):
    losses = []
    for step, (data, target) in enumerate(train_loader):  # gives batch data from train_loader
        # Forward
        output = cnn(data)[0]              # cnn output
        loss = loss_func(output, target)   # cross-entropy loss
        losses.append(loss.item())
        # Backward
        optimizer.zero_grad()              # clear gradients for this training step
        loss.backward()                    # backpropagation, compute gradients
        # Gradient step
        optimizer.step()                   # apply gradients
    list_loss.append(sum(losses)/len(losses))
    list_epoch.append(epoch)
    print(f'Loss at Epoch {epoch} is {sum(losses)/len(losses)}')
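
# The overview in the Method Description also lists saving the model for later use, which the
# original script does not show; the lines below are an added sketch of how that could be done
# with PyTorch. The file name 'hand_pose_cnn.pt' is only an example, not from the report.
torch.save(cnn.state_dict(), 'hand_pose_cnn.pt')
# To reuse the trained weights later:
#   cnn2 = CNN_Model()
#   cnn2.load_state_dict(torch.load('hand_pose_cnn.pt'))
#   cnn2.eval()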

print("Epoch:", EPOCH)
print("batch size:", BATCH_SIZE)
print("feature filter:", Feature_Filter, ", size:", Filter_Size)
print("Pooling Size:", POOLING_SIZE)
print("optimizer:", OPTIMIZER)
# print("The loss function is CrossEntropyLoss")

print("Accuracy of Training: ")
check_accuracy(train_loader, cnn)
print("Accuracy of Testing: ")
check_accuracy(test_loader, cnn)

# Graph: training loss versus epoch
x = list_epoch
y = list_loss
plt.plot(x, y, 'bo-', linewidth=1.5)
plt.title("Loss Function")
plt.xlabel("Epoch")
plt.ylabel("Loss")
plt.grid(True)
plt.show()
```

---
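### Possible extension: batch normalization

The assignment description explicitly suggests trying a batch-normalization layer, which is not covered by the experiments above. A minimal sketch of how the same two-stage model could be extended with `nn.BatchNorm2d` after each convolution (an untested variant for illustration, not the submitted model):

```
import torch.nn as nn

class CNN_Model_BN(nn.Module):
    """Same two-stage CNN as above, with BatchNorm2d added after each convolution."""
    def __init__(self, feature_filter=32, filter_size=5, n_classes=3):
        super().__init__()
        padding = filter_size // 2  # keeps the 32x32 spatial size for odd kernel sizes
        self.features = nn.Sequential(
            nn.Conv2d(1, feature_filter, filter_size, padding=padding),
            nn.BatchNorm2d(feature_filter),
            nn.ReLU(),
            nn.MaxPool2d(2),  # 32x32 -> 16x16
            nn.Conv2d(feature_filter, feature_filter * 2, filter_size, padding=padding),
            nn.BatchNorm2d(feature_filter * 2),
            nn.ReLU(),
            nn.MaxPool2d(2),  # 16x16 -> 8x8
        )
        self.fc = nn.Linear(feature_filter * 2 * 8 * 8, n_classes)

    def forward(self, x):
        out = self.features(x)
        out = out.view(out.size(0), -1)  # flatten before the classifier
        return self.fc(out)
```

Whether this actually improves the testing accuracy on Set 4~Set 5 would have to be checked by rerunning the training loop above with this model.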