***Deep Learning - Gesture Recognition with CNNs***
===

> #### **My GitHub repository (full source code):** https://github.com/LILRAY0826/Gesture-Recognition-with-CNNs.git

*I . Data Splitting*
---
***You can build your custom dataset with the Python file below, which also writes the CSV files used for training and testing.***

```python=
import os
import cv2
import torch
import numpy as np
import pandas as pd
from torch.utils.data import Dataset
from skimage import io


class CustomImageDataset(Dataset):
    def __init__(self, csv_file, root_dir, transform=None):
        self.annotations = pd.read_csv(csv_file)
        self.root_dir = root_dir
        self.transform = transform

    def __len__(self):
        return len(self.annotations)

    def __getitem__(self, index):
        # Column 0 of the CSV is the pandas index,
        # column 1 the filename, column 2 the label.
        img_path = os.path.join(self.root_dir, self.annotations.iloc[index, 1])
        image = io.imread(img_path)
        label = torch.tensor(int(self.annotations.iloc[index, 2]))
        if self.transform:
            image = self.transform(image)
        return image, label


def enumerate_files(dirs, path='All_gray_1_32_32/', n_poses=3, n_samples=20):
    # Walk Set*/000{0..8}/%04d/ and collect every jpg, labelling it with its pose.
    filenames, targets = [], []
    for p in dirs:
        for n in range(n_poses):
            for j in range(3):
                dir_name = path + p + '/000' + str(n * 3 + j) + '/'
                for s in range(n_samples):
                    d = dir_name + '%04d/' % s
                    for f in os.listdir(d):
                        if f.endswith('jpg'):
                            # Store the path relative to the dataset root,
                            # so it can be joined with root_dir later.
                            filenames.append((d + f).replace(path, ""))
                            targets.append(n)
    return filenames, targets


def read_images(files):
    imgs = []
    for f in files:
        img = cv2.imread(f, cv2.IMREAD_GRAYSCALE)
        imgs.append(img)
    return imgs


def read_datasets(datasets, csv_name, path='All_gray_1_32_32/'):
    files, labels = enumerate_files(datasets, path=path)
    dataframe = pd.DataFrame({"filename": files, "label": labels})
    dataframe.to_csv(csv_name)
    # The CSV stores paths relative to the dataset root, so prepend it here.
    list_of_arrays = read_images([path + f for f in files])
    return np.array(list_of_arrays), labels


if __name__ == "__main__":
    train_sets = ['Set1', 'Set2', 'Set3']
    test_sets = ['Set4', 'Set5']
    trn_array, trn_labels = read_datasets(train_sets, csv_name="train_data.csv")
    tst_array, tst_labels = read_datasets(test_sets, csv_name="test_data.csv")
```
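***As a quick sanity check, here is a minimal usage sketch, assuming the script above has been run so that `train_data.csv` and the `All_gray_1_32_32/` folder exist (the module name matches the `Data_Spliting` import used in Section III):***

```python
import torchvision.transforms as transforms
from torch.utils.data import DataLoader
from Data_Spliting import CustomImageDataset  # the file above, saved as Data_Spliting.py

# Wrap the generated CSV in the dataset class and pull one batch
# to verify image shapes and labels.
train_data = CustomImageDataset(csv_file="train_data.csv",
                                root_dir="All_gray_1_32_32",
                                transform=transforms.ToTensor())
loader = DataLoader(train_data, batch_size=4, shuffle=True)
images, labels = next(iter(loader))
print(images.shape)  # expected: torch.Size([4, 1, 32, 32])
print(labels)        # pose indices in {0, 1, 2}
```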
*II . Method Description and Comparison of Hyperparameters and Model Architectures*
---
***In the initial setting, I constructed the model with two sequences of layers, each sequence consisting of a convolution layer, an activation function, and a pooling layer.***

```python
import torch
import torch.nn as nn

# Model
class CNN(nn.Module):
    def __init__(self):
        super(CNN, self).__init__()
        self.conv1 = nn.Sequential(           # input shape (1, 32, 32)
            nn.Conv2d(
                in_channels=1,                # input channels (grayscale)
                out_channels=16,              # n_filters
                kernel_size=5,                # filter size
                stride=1,                     # filter movement/step
                padding=2,                    # padding=(kernel_size-1)/2 keeps width/height unchanged when stride=1
            ),                                # output shape (16, 32, 32)
            nn.ReLU(),                        # activation
            nn.MaxPool2d(kernel_size=2),      # max value in each 2x2 area, output shape (16, 16, 16)
        )
        self.conv2 = nn.Sequential(           # input shape (16, 16, 16)
            nn.Conv2d(
                in_channels=16,
                out_channels=32,
                kernel_size=5,
                stride=1,
                padding=2,
            ),                                # output shape (32, 16, 16)
            nn.ReLU(),                        # activation
            nn.MaxPool2d(kernel_size=2),      # output shape (32, 8, 8)
        )
        self.out = nn.Linear(32 * 8 * 8, 3)   # fully connected layer, output 3 classes

    def forward(self, x):
        x = self.conv1(x)
        x = self.conv2(x)
        x = x.view(x.size(0), -1)             # flatten to (batch_size, 32 * 8 * 8)
        return self.out(x)
```

---
### Situation 1 : The Difference of Hyperparameters

| Epoch | Optimizer | Training Accuracy (%) | Testing Accuracy (%) |
|:-----:|:---------:|:---------------------:|:--------------------:|
| 25 | SGD | 48.7 | 54.17 |
| 25 | Adagrad | 84.44 | 73.89 |
| 25 | RMSProp | 86.85 | 77.5 |
| 25 | Adam | 96.11 | 86.94 |
| 50 | SGD | 51.3 | 44.72 |
| 50 | Adagrad | 79.63 | 67.78 |
| 50 | RMSProp | 96.48 | 82.22 |
| **50** | **Adam** | **99.44** | **93.89** |

---
### Situation 2 : The Difference of Model Architectures of Sequences

***Based on the results in the table above, I set epoch=50 and optimizer=Adam.***

```python
import torch
import torch.nn as nn

# Model with a single conv-activation-pooling sequence
class CNN(nn.Module):
    def __init__(self):
        super(CNN, self).__init__()
        self.conv1 = nn.Sequential(           # input shape (1, 32, 32)
            nn.Conv2d(
                in_channels=1,
                out_channels=16,
                kernel_size=5,
                stride=1,
                padding=2,
            ),                                # output shape (16, 32, 32)
            nn.ReLU(),                        # activation
            nn.MaxPool2d(kernel_size=2),      # output shape (16, 16, 16)
        )
        self.out = nn.Linear(16 * 16 * 16, 3)  # fully connected layer, output 3 classes

    def forward(self, x):
        x = self.conv1(x)
        x = x.view(x.size(0), -1)             # flatten to (batch_size, 16 * 16 * 16)
        return self.out(x)
```

| Sequences | Training Accuracy (%) | Testing Accuracy (%) |
|:---------:|:---------------------:|:--------------------:|
| 1 | 96.3 | 68.33 |
| **2** | **99.4** | **93.89** |

***Without a doubt, the testing accuracy with two sequences is better than with one. However, for convenience of experimentation, I used a single sequence for the following comparisons.***
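***Each architecture change in Situations 2 and 3 alters the `in_features` of the final `nn.Linear`, which the snippets above recompute by hand (32\*8\*8, 16\*16\*16). A minimal sketch of one way to avoid the hand computation; the helper name `flattened_size` is mine, not part of the project code:***

```python
import torch
import torch.nn as nn

def flattened_size(conv: nn.Module, in_shape=(1, 32, 32)) -> int:
    """Pass a dummy batch through `conv` and count the remaining features."""
    with torch.no_grad():
        out = conv(torch.zeros(1, *in_shape))
    return out.numel()

# Usage sketch: the two-sequence stack from above.
conv = nn.Sequential(
    nn.Conv2d(1, 16, kernel_size=5, stride=1, padding=2), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(16, 32, kernel_size=5, stride=1, padding=2), nn.ReLU(), nn.MaxPool2d(2),
)
print(flattened_size(conv))  # 32 * 8 * 8 = 2048, the in_features of nn.Linear
```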
---
### Situation 3 : The Difference of Inner Parameters of the Model Architecture

* ***Number of Feature Filters :***

| n_filter | Training Accuracy (%) | Testing Accuracy (%) |
|:--------:|:---------------------:|:--------------------:|
| 8 | 95.56 | 69.17 |
| 16 | 97.22 | 73.06 |
| 32 | 98.33 | 76.67 |
| **64** | **100** | **84.72** |
| 128 | 100 | 91.39 |

***More filters tend to raise the testing accuracy, but the training time and the burden on the hardware increase as well, so I chose n_filter=64 as a trade-off.***

* ***Filter Size :***

| Filter Size | Training Accuracy (%) | Testing Accuracy (%) |
|:-----------:|:---------------------:|:--------------------:|
| 3 | 96.67 | 65.56 |
| **5** | **98.33** | **76.67** |

* ***Kernel Size of the Pooling Layer :***

| Pooling Kernel Size | Training Accuracy (%) | Testing Accuracy (%) |
|:-------------------:|:---------------------:|:--------------------:|
| 2 | 99.81 | 83.89 |
| **4** | **96.67** | **87.5** |

***As a result, the inner parameters adopted in the network are: number of feature filters = 64, filter size = 5, and pooling kernel size = 4.***
***Finally, the chosen number of feature filters and pooling kernel size are applied to the two-sequence network.***
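***Since the Situation 3 variants differ only in three numbers, one way to run the sweep is to parameterize the one-sequence network. This is a sketch of mine under that assumption; the constructor arguments (`n_filter`, `kernel_size`, `pool_size`) are hypothetical names, not from the project code:***

```python
import torch.nn as nn

class OneSequenceCNN(nn.Module):
    def __init__(self, n_filter=64, kernel_size=5, pool_size=4,
                 img_size=32, n_classes=3):
        super().__init__()
        self.conv = nn.Sequential(
            # padding=kernel_size//2 keeps the spatial size at img_size when stride=1
            nn.Conv2d(1, n_filter, kernel_size, stride=1, padding=kernel_size // 2),
            nn.ReLU(),
            nn.MaxPool2d(pool_size),          # img_size -> img_size // pool_size
        )
        side = img_size // pool_size
        self.out = nn.Linear(n_filter * side * side, n_classes)

    def forward(self, x):
        x = self.conv(x)
        return self.out(x.view(x.size(0), -1))

# One instantiation per table row, e.g. the adopted setting:
model = OneSequenceCNN(n_filter=64, kernel_size=5, pool_size=4)
```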
```
Loss at Epoch 0 is 1.2168701399456372
Loss at Epoch 1 is 1.0974623289975254
Loss at Epoch 2 is 1.096157973462885
Loss at Epoch 3 is 1.0905030098828403
Loss at Epoch 4 is 1.077196717262268
Loss at Epoch 5 is 1.0530863295901904
Loss at Epoch 6 is 1.0225716124881397
Loss at Epoch 7 is 0.9748118248852816
Loss at Epoch 8 is 0.9226068691773848
Loss at Epoch 9 is 0.8592216318303888
Loss at Epoch 10 is 0.799804999069734
Loss at Epoch 11 is 0.7548064535314386
Loss at Epoch 12 is 0.7122143073515459
Loss at Epoch 13 is 0.680005363442681
Loss at Epoch 14 is 0.6488951552997936
Loss at Epoch 15 is 0.6219885457645763
Loss at Epoch 16 is 0.5968957136977803
Loss at Epoch 17 is 0.5709817626259543
Loss at Epoch 18 is 0.5488803711804476
Loss at Epoch 19 is 0.5227069610899145
Loss at Epoch 20 is 0.5009708120064302
Loss at Epoch 21 is 0.4788070619106293
Loss at Epoch 22 is 0.4572262628511949
Loss at Epoch 23 is 0.4384942759167064
Loss at Epoch 24 is 0.4169470979408784
Loss at Epoch 25 is 0.4000031148845499
Loss at Epoch 26 is 0.37714689428156073
Loss at Epoch 27 is 0.35991076650944626
Loss at Epoch 28 is 0.3390561254187064
Loss at Epoch 29 is 0.320497564971447
Loss at Epoch 30 is 0.3013303909789432
Loss at Epoch 31 is 0.28368994254957547
Loss at Epoch 32 is 0.2666636620732871
Loss at Epoch 33 is 0.24946328218687663
Loss at Epoch 34 is 0.23689039288596672
Loss at Epoch 35 is 0.21950358321720903
Loss at Epoch 36 is 0.20938315479592842
Loss at Epoch 37 is 0.19384147023612802
Loss at Epoch 38 is 0.18616960079155184
Loss at Epoch 39 is 0.17104969986460425
Loss at Epoch 40 is 0.16506836072287775
Loss at Epoch 41 is 0.1514935181899504
Loss at Epoch 42 is 0.14437737468291412
Loss at Epoch 43 is 0.13537038016048344
Loss at Epoch 44 is 0.12659668262031945
Loss at Epoch 45 is 0.1214209772138433
Loss at Epoch 46 is 0.11186013988811862
Loss at Epoch 47 is 0.10835131435570391
Loss at Epoch 48 is 0.09982061428441243
Loss at Epoch 49 is 0.09737169937315313
=================================================
Epoch is 50
-------------------------------------------------
The batch size is 50
-------------------------------------------------
The number of feature filter is 32 , size is 5
-------------------------------------------------
The Pooling Size is 2
-------------------------------------------------
The optimizer is Adam
-------------------------------------------------
The loss function is CrossEntropyLoss
-------------------------------------------------
Check Accuracy of Training :
Got 540 / 540 with accuracy 540.0/540.0 = 100.0
-------------------------------------------------
Check Accuracy of Testing :
Got 343 / 360 with accuracy 343.0/360.0 = 95.28
```

![](https://i.imgur.com/PVEaYi2.png)

*III . Full Code of Model Construction, Training, and Testing*
---
```python=
import torch
import torch.nn as nn
import torchvision.transforms as transforms
import matplotlib.pyplot as plt
from Data_Spliting import CustomImageDataset
from torch.utils.data import DataLoader

# Hyper Parameters
NUM_CLASSES = 3                 # number of gesture classes
EPOCH = 50                      # number of passes over the training data
BATCH_SIZE = 50
LR = 0.001                      # learning rate
OPTIMIZER = "Adam"              # one of [SGD, Adagrad, RMSProp, Adam]
Feature_Filter = 32
Filter_Size = 5
Stride = 1
Padding = Filter_Size // 2      # keeps width/height unchanged when stride=1
POOLING_SIZE = 2
# Features left after two conv+pool sequences (input is 32x32); computed from
# the hyperparameters so the classifier input stays correct when they change.
FLAT_SIZE = Feature_Filter * 2 * (32 // POOLING_SIZE // POOLING_SIZE) ** 2

# Load Data
train_data = CustomImageDataset(csv_file="train_data.csv",
                                root_dir="All_gray_1_32_32",
                                transform=transforms.ToTensor())
test_data = CustomImageDataset(csv_file="test_data.csv",
                               root_dir="All_gray_1_32_32",
                               transform=transforms.ToTensor())
train_loader = DataLoader(dataset=train_data, batch_size=BATCH_SIZE)
test_loader = DataLoader(dataset=test_data, batch_size=BATCH_SIZE)

# Model
class CNN(nn.Module):
    def __init__(self):
        super(CNN, self).__init__()
        self.conv1 = nn.Sequential(               # input shape (1, 32, 32)
            nn.Conv2d(
                in_channels=1,                    # input channels (grayscale)
                out_channels=Feature_Filter,      # n_filters
                kernel_size=Filter_Size,          # filter size
                stride=Stride,                    # filter movement/step
                padding=Padding,                  # padding=(kernel_size-1)/2 keeps width/height unchanged when stride=1
            ),                                    # output shape (32, 32, 32)
            nn.ReLU(),                            # activation
            nn.MaxPool2d(kernel_size=POOLING_SIZE),  # output shape (32, 16, 16)
        )
        self.conv2 = nn.Sequential(               # input shape (32, 16, 16)
            nn.Conv2d(
                in_channels=Feature_Filter,
                out_channels=Feature_Filter * 2,
                kernel_size=Filter_Size,
                stride=Stride,
                padding=Padding,
            ),                                    # output shape (64, 16, 16)
            nn.ReLU(),                            # activation
            nn.MaxPool2d(kernel_size=POOLING_SIZE),  # output shape (64, 8, 8)
        )
        self.out = nn.Linear(FLAT_SIZE, NUM_CLASSES)  # fully connected layer, output 3 classes

    def forward(self, x):
        x = self.conv1(x)
        x = self.conv2(x)
        x = x.view(x.size(0), -1)                 # flatten to (batch_size, FLAT_SIZE)
        output = self.out(x)
        return output, x                          # also return x for visualization

# Check Accuracy
def check_accuracy(loader, model):
    num_correct = 0
    num_samples = 0
    model.eval()
    with torch.no_grad():
        for x, y in loader:
            scores = model(x)[0]
            _, predictions = scores.max(1)
            num_correct += (predictions == y).sum()
            num_samples += predictions.size(0)
    print(f'Got {num_correct} / {num_samples} with accuracy {float(num_correct)}/{float(num_samples)} = '
          f'{round((float(num_correct) / float(num_samples)) * 100, 2)}')
    model.train()

cnn = CNN()

# Optimizer Choice
if OPTIMIZER == "SGD":
    optimizer = torch.optim.SGD(cnn.parameters(), lr=LR)    # optimize all cnn parameters
elif OPTIMIZER == "Adagrad":
    optimizer = torch.optim.Adagrad(cnn.parameters(), lr=LR)
elif OPTIMIZER == "RMSProp":
    optimizer = torch.optim.RMSprop(cnn.parameters(), lr=LR)
elif OPTIMIZER == "Adam":
    optimizer = torch.optim.Adam(cnn.parameters(), lr=LR)

loss_func = nn.CrossEntropyLoss()   # the target label is not one-hot

# Training
list_loss = []
list_epoch = []
for epoch in range(EPOCH):
    losses = []
    for step, (data, target) in enumerate(train_loader):   # iterate over batches
        # Forward
        output = cnn(data)[0]               # cnn output
        loss = loss_func(output, target)    # cross entropy loss
        losses.append(loss.item())
        # Backward
        optimizer.zero_grad()               # clear gradients for this training step
        loss.backward()                     # backpropagation, compute gradients
        # Gradient Step
        optimizer.step()                    # apply gradients
    list_loss.append(sum(losses) / len(losses))
    list_epoch.append(epoch)
    print(f'Loss at Epoch {epoch} is {sum(losses) / len(losses)}')

print("=================================================")
print("Epoch is", EPOCH)
print("-------------------------------------------------")
print("The batch size is", BATCH_SIZE)
print("-------------------------------------------------")
print("The number of feature filter is", Feature_Filter, ", size is", Filter_Size)
print("-------------------------------------------------")
print("The Pooling Size is", POOLING_SIZE)
print("-------------------------------------------------")
print("The optimizer is", OPTIMIZER)
print("-------------------------------------------------")
print("The loss function is CrossEntropyLoss")
print("-------------------------------------------------")
print("Check Accuracy of Training : ")
check_accuracy(train_loader, cnn)
print("-------------------------------------------------")
print("Check Accuracy of Testing : ")
check_accuracy(test_loader, cnn)

# Loss Graph
plt.plot(list_epoch, list_loss, 'bo-', linewidth=1.5)
plt.title("Loss Function")
plt.xlabel("EPOCH")
plt.ylabel("LOSS")
plt.grid(True)
plt.show()
```
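***The script above trains on the CPU (the conclusion below mentions the 4-core CPU used). For completeness, here is a minimal sketch of mine, not part of the project code, showing the standard PyTorch device pattern for moving the work onto a GPU when one is available; it reuses `CNN`, `EPOCH`, `LR`, `loss_func`, and `train_loader` from the script:***

```python
import torch
import torch.nn as nn

# Pick the GPU if CUDA is available, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

cnn = CNN().to(device)                                 # move the parameters once
optimizer = torch.optim.Adam(cnn.parameters(), lr=LR)  # build the optimizer after the move
loss_func = nn.CrossEntropyLoss()

for epoch in range(EPOCH):
    for data, target in train_loader:
        data, target = data.to(device), target.to(device)  # move each batch
        output = cnn(data)[0]
        loss = loss_func(output, target)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```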
*IV . Conclusion*
---

### 1. Improvements :
* ***Processing Time :***
    * One possible way is to remove the constant factor in the Gaussian distribution formula to reduce processing time.
    * Upgrading the hardware is also an option: this project uses a 4-core CPU, and an 8-core, 16-core, or even 32-core CPU would shorten the processing time further.
* ***Accuracy :***
    * Increasing the amount of training data may be a good idea, but care must be taken: with more training data there is also more possibility of overfitting.
* ***Generalization :***
    * The code could be made more general so that it can be reused in other circumstances; it is currently built around situations we already know, so it would be of little use in a new, unseen situation.

### 2. What I've learned :
* ***Breaking through concepts in ML :*** Since I had only written code for traditional machine learning before, I worked on "how to extract features" when I first started the assignment, and then I realized that everything was available in PyTorch, which made things much smoother.
* ***Confidence :*** The success of this project makes me more confident. I'm no longer afraid of challenges like this, and the experience will be fertilizer for my life as an engineer.