***Deep Learning - Gesture Recognition with CNNs***
===
> #### **My GitHub repository with the full source code:** https://github.com/LILRAY0826/Gesture-Recognition-with-CNNs.git
*I . Data Splitting*
---
***Run the Python file below to build the custom dataset and generate the CSV files for training and testing.***
```python=
import os
import cv2
import torch
import numpy as np
import pandas as pd
from torch.utils.data import Dataset
from skimage import io


class CustomImageDataset(Dataset):
    def __init__(self, csv_file, root_dir, transform=None):
        self.annotations = pd.read_csv(csv_file)
        self.root_dir = root_dir
        self.transform = transform

    def __len__(self):
        return len(self.annotations)

    def __getitem__(self, index):
        # column 0 is the CSV index, column 1 the filename, column 2 the label
        img_path = os.path.join(self.root_dir, self.annotations.iloc[index, 1])
        image = io.imread(img_path)
        label = torch.tensor(int(self.annotations.iloc[index, 2]))
        if self.transform:
            image = self.transform(image)
        return image, label


def enumerate_files(dirs, path='All_gray_1_32_32/', n_poses=3, n_samples=20):
    filenames, targets = [], []
    for p in dirs:
        for n in range(n_poses):
            for j in range(3):
                dir_name = path + p + '/000' + str(n * 3 + j) + '/'
                for s in range(n_samples):
                    d = dir_name + '%04d/' % s
                    for f in os.listdir(d):
                        if f.endswith('jpg'):
                            filename = d + f
                            # store paths relative to the dataset root
                            filename = filename.replace("All_gray_1_32_32/", "")
                            filenames += [filename]
                            targets.append(n)
    return filenames, targets


def read_images(files, root='All_gray_1_32_32/'):
    imgs = []
    for f in files:
        # filenames are relative to the dataset root, so prepend it when reading
        img = cv2.imread(root + f, cv2.IMREAD_GRAYSCALE)
        imgs.append(img)
    return imgs


def read_datasets(datasets, csv_name):
    files, labels = enumerate_files(datasets)
    dataframe = {"filename": files,
                 "label": labels}
    dataframe = pd.DataFrame(dataframe)
    dataframe.to_csv(csv_name)  # keeps the default index column read by __getitem__
    list_of_arrays = read_images(files)
    return np.array(list_of_arrays), labels


if __name__ == "__main__":
    train_sets = ['Set1', 'Set2', 'Set3']
    test_sets = ['Set4', 'Set5']
    trn_array, trn_labels = read_datasets(train_sets, csv_name="train_data.csv")
    tst_array, tst_labels = read_datasets(test_sets, csv_name="test_data.csv")
```
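***As a quick sanity check (a minimal sketch, assuming the file above is saved as `Data_Spliting.py` as it is imported in section III, and that `All_gray_1_32_32/` sits in the working directory), you can pull one batch through `CustomImageDataset`:***
```python
import torchvision.transforms as transforms
from torch.utils.data import DataLoader
from Data_Spliting import CustomImageDataset

# Assumes train_data.csv was generated by the script above.
train_data = CustomImageDataset(csv_file="train_data.csv",
                                root_dir="All_gray_1_32_32",
                                transform=transforms.ToTensor())
loader = DataLoader(train_data, batch_size=4)
images, labels = next(iter(loader))
print(images.shape, labels.shape)  # expected: torch.Size([4, 1, 32, 32]) torch.Size([4])
```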
*II . Method Description and Comparison of Hyperparameters and Model Architectures*
---
***In the initial setting, I constructed the model with two network sequences, each containing a convolution layer, an activation function, and a pooling layer.***
```python
import torch
import torch.nn as nn

# Model
class CNN(nn.Module):
    def __init__(self):
        super(CNN, self).__init__()
        self.conv1 = nn.Sequential(              # input shape (1, 32, 32)
            nn.Conv2d(
                in_channels=1,                   # input channels (grayscale)
                out_channels=16,                 # n_filters
                kernel_size=5,                   # filter size
                stride=1,                        # filter movement/step
                padding=2,                       # padding=(kernel_size-1)/2 keeps width/height if stride=1
            ),                                   # output shape (16, 32, 32)
            nn.ReLU(),                           # activation
            nn.MaxPool2d(kernel_size=2),         # max value in each 2x2 area, output shape (16, 16, 16)
        )
        self.conv2 = nn.Sequential(              # input shape (16, 16, 16)
            nn.Conv2d(
                in_channels=16,
                out_channels=32,
                kernel_size=5,
                stride=1,
                padding=2,
            ),                                   # output shape (32, 16, 16)
            nn.ReLU(),                           # activation
            nn.MaxPool2d(kernel_size=2),         # output shape (32, 8, 8)
        )
        self.out = nn.Linear(32 * 8 * 8, 3)      # fully connected layer, output 3 classes

    def forward(self, x):
        x = self.conv1(x)
        x = self.conv2(x)
        x = x.view(x.size(0), -1)                # flatten to (batch_size, 32 * 8 * 8)
        return self.out(x)
```
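***A quick dry run with a dummy batch verifies the shape comments above (a sanity-check sketch, not part of the original training script):***
```python
model = CNN()
dummy = torch.randn(1, 1, 32, 32)   # one grayscale 32x32 image
logits = model(dummy)
print(logits.shape)                 # torch.Size([1, 3]) -> one logit per gesture class
```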
---
### Situation 1 : The Difference of Hyperparameters
| Epoch | Optimizer | Training Accuracy (%) | Testing Accuracy (%) |
|:-----:|:---------:|:---------------------:|:--------------------:|
| 25 | SGD |48.7|54.17|
| 25 | Adagrad |84.44|73.89|
| 25 | RMSProp |86.85|77.5|
| 25 | Adam |96.11|86.94|
| 50 | SGD |51.3|44.72|
| 50 | Adagrad |79.63|67.78|
| 50 | RMSProp |96.48|82.22|
| **50** | **Adam** |**99.44**|**93.89**|
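***Each row comes from retraining the same model once per optimizer; a sketch of such a sweep (assuming the `CNN` class above and the training loop from section III) could look like:***
```python
import torch

for name in ["SGD", "Adagrad", "RMSprop", "Adam"]:   # note: PyTorch spells it RMSprop
    model = CNN()
    optimizer = getattr(torch.optim, name)(model.parameters(), lr=0.001)
    # ... run the training loop from section III with this model/optimizer,
    # then record the training and testing accuracy for the table.
```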
---
### Situation 2 : The Difference in the Number of Network Sequences
***Based on the results in the table above, I fixed the hyperparameters at epoch=50 and optimizer=Adam.***
```python
import torch
import torch.nn as nn

# Model (single sequence)
class CNN(nn.Module):
    def __init__(self):
        super(CNN, self).__init__()
        self.conv1 = nn.Sequential(              # input shape (1, 32, 32)
            nn.Conv2d(
                in_channels=1,                   # input channels (grayscale)
                out_channels=16,                 # n_filters
                kernel_size=5,                   # filter size
                stride=1,                        # filter movement/step
                padding=2,                       # padding=(kernel_size-1)/2 keeps width/height if stride=1
            ),                                   # output shape (16, 32, 32)
            nn.ReLU(),                           # activation
            nn.MaxPool2d(kernel_size=2),         # max value in each 2x2 area, output shape (16, 16, 16)
        )
        self.out = nn.Linear(16 * 16 * 16, 3)    # fully connected layer, output 3 classes

    def forward(self, x):
        x = self.conv1(x)
        x = x.view(x.size(0), -1)                # flatten to (batch_size, 16 * 16 * 16)
        return self.out(x)
```
| Sequences | Training Accuracy (%) | Testing Accuracy (%) |
|:---------:|:---------------------:|:--------------------:|
| 1 | 96.3 | 68.33 |
| **2** | **99.4** | **93.89** |
***Without a doubt, testing accuracy with two network sequences beats one. However, for convenience of experimentation, I used a single sequence for the following comparisons.***
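***The two architectures also differ in capacity; a back-of-the-envelope sketch using the standard Conv2d/Linear parameter formulas, assuming the layer settings above:***
```python
# Conv2d params: in_ch * out_ch * k * k + out_ch (bias)
# Linear params: in_features * out_features + out_features (bias)
one_seq = (1 * 16 * 5 * 5 + 16) + (16 * 16 * 16 * 3 + 3)                          # = 12,707
two_seq = (1 * 16 * 5 * 5 + 16) + (16 * 32 * 5 * 5 + 32) + (32 * 8 * 8 * 3 + 3)   # = 19,395
print(one_seq, two_seq)
```
***Most of the extra capacity sits in the second convolution, while its extra pooling actually shrinks the linear layer.***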
---
### Situation 3 : The Difference of Inner Model Parameters
* ***Number of Feature Filters :***
| n_filter | Training Accuracy (%) | Testing Accuracy (%) |
|:--------:|:---------------------:|:--------------------:|
| 8 | 95.56 | 69.17 |
| 16 | 97.22 | 73.06 |
| 32 | 98.33 | 76.67 |
| **64** | **100** | **84.72** |
| 128 | 100 | 91.39 |
***A larger number of filters tends to raise testing accuracy, but training time and hardware load grow with it, so I settled on n_filter=64 for this layer.***
* ***Filter Size :***
| Filter Size | Training Accuracy (%) | Testing Accuracy (%) |
|:-----------:|:---------------------:|:--------------------:|
| 3 | 96.67 | 65.56 |
| **5** | **98.33** | **76.67** |
* ***Kernel Size of Pooling Layer :***
| Pooling Kernel Size | Training Accuracy (%) | Testing Accuracy (%) |
|:-------------------:|:---------------------:|:--------------------:|
| 2 | 99.81 | 83.89 |
| **4** | **96.67** | **87.5** |
***Based on these results, the adopted inner parameters are: number of feature filters = 64, filter size = 5, and pooling kernel size = 4.***
***Finally, this filter count and pooling kernel size are applied to both sequences of the two-sequence network.***
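***Under these settings the feature-map sizes work out as follows (a sketch, assuming 32×32 grayscale inputs, 'same' padding, and that the second sequence doubles the filter count as in section III's code):***
```python
size = 32
for _ in range(2):      # two conv + pool sequences
    size //= 4          # 'same' convolution keeps the size; MaxPool2d(4) divides it by 4
channels = 64 * 2       # 64 filters in sequence 1, doubled to 128 in sequence 2
print(channels, size, channels * size * size)  # 128, 2 -> nn.Linear(128 * 2 * 2 = 512, 3)
```
***The log below comes from a run of the full script in section III (with its default settings printed in the footer):***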
```
Loss at Epoch 0 is 1.2168701399456372
Loss at Epoch 1 is 1.0974623289975254
Loss at Epoch 2 is 1.096157973462885
Loss at Epoch 3 is 1.0905030098828403
Loss at Epoch 4 is 1.077196717262268
Loss at Epoch 5 is 1.0530863295901904
Loss at Epoch 6 is 1.0225716124881397
Loss at Epoch 7 is 0.9748118248852816
Loss at Epoch 8 is 0.9226068691773848
Loss at Epoch 9 is 0.8592216318303888
Loss at Epoch 10 is 0.799804999069734
Loss at Epoch 11 is 0.7548064535314386
Loss at Epoch 12 is 0.7122143073515459
Loss at Epoch 13 is 0.680005363442681
Loss at Epoch 14 is 0.6488951552997936
Loss at Epoch 15 is 0.6219885457645763
Loss at Epoch 16 is 0.5968957136977803
Loss at Epoch 17 is 0.5709817626259543
Loss at Epoch 18 is 0.5488803711804476
Loss at Epoch 19 is 0.5227069610899145
Loss at Epoch 20 is 0.5009708120064302
Loss at Epoch 21 is 0.4788070619106293
Loss at Epoch 22 is 0.4572262628511949
Loss at Epoch 23 is 0.4384942759167064
Loss at Epoch 24 is 0.4169470979408784
Loss at Epoch 25 is 0.4000031148845499
Loss at Epoch 26 is 0.37714689428156073
Loss at Epoch 27 is 0.35991076650944626
Loss at Epoch 28 is 0.3390561254187064
Loss at Epoch 29 is 0.320497564971447
Loss at Epoch 30 is 0.3013303909789432
Loss at Epoch 31 is 0.28368994254957547
Loss at Epoch 32 is 0.2666636620732871
Loss at Epoch 33 is 0.24946328218687663
Loss at Epoch 34 is 0.23689039288596672
Loss at Epoch 35 is 0.21950358321720903
Loss at Epoch 36 is 0.20938315479592842
Loss at Epoch 37 is 0.19384147023612802
Loss at Epoch 38 is 0.18616960079155184
Loss at Epoch 39 is 0.17104969986460425
Loss at Epoch 40 is 0.16506836072287775
Loss at Epoch 41 is 0.1514935181899504
Loss at Epoch 42 is 0.14437737468291412
Loss at Epoch 43 is 0.13537038016048344
Loss at Epoch 44 is 0.12659668262031945
Loss at Epoch 45 is 0.1214209772138433
Loss at Epoch 46 is 0.11186013988811862
Loss at Epoch 47 is 0.10835131435570391
Loss at Epoch 48 is 0.09982061428441243
Loss at Epoch 49 is 0.09737169937315313
=================================================
Epoch is 50
-------------------------------------------------
The batch size is 50
-------------------------------------------------
The number of feature filter is 32 , size is 5
-------------------------------------------------
The Pooling Size is 2
-------------------------------------------------
The optimizer is Adam
-------------------------------------------------
The loss function is CrossEntropyLoss
-------------------------------------------------
Check Accuracy of Training :
Got 540 / 540 with accuracy 540.0/540.0 = 100.0
-------------------------------------------------
Check Accuracy of Testing :
Got 343 / 360 with accuracy 343.0/360.0 = 95.28
```

*III . Full Code of Model Construction, Training, and Testing*
---
```python=
import torch
import torch.nn as nn
import torchvision.transforms as transforms
import matplotlib.pyplot as plt
from Data_Spliting import CustomImageDataset
from torch.utils.data import DataLoader

# Hyper Parameters
LABELS = 3                      # number of gesture classes
EPOCH = 50                      # train over the training data n times
BATCH_SIZE = 50
LR = 0.001                      # learning rate
OPTIMIZER = "Adam"              # [SGD, Adagrad, RMSProp, Adam]
Feature_Filter = 32
Filter_Size = 5
Stride = 1
Padding = int(Filter_Size / 2)
POOLING_SIZE = 2

# Load Data
train_data = CustomImageDataset(csv_file="train_data.csv", root_dir="All_gray_1_32_32", transform=transforms.ToTensor())
test_data = CustomImageDataset(csv_file="test_data.csv", root_dir="All_gray_1_32_32", transform=transforms.ToTensor())
train_loader = DataLoader(dataset=train_data, batch_size=BATCH_SIZE)
test_loader = DataLoader(dataset=test_data, batch_size=BATCH_SIZE)

# Model
class CNN(nn.Module):
    def __init__(self):
        super(CNN, self).__init__()
        self.conv1 = nn.Sequential(                  # input shape (1, 32, 32)
            nn.Conv2d(
                in_channels=1,                       # input channels (grayscale)
                out_channels=Feature_Filter,         # n_filters
                kernel_size=Filter_Size,             # filter size
                stride=Stride,                       # filter movement/step
                padding=Padding,                     # padding=(kernel_size-1)/2 keeps width/height if stride=1
            ),                                       # output shape (32, 32, 32)
            nn.ReLU(),                               # activation
            nn.MaxPool2d(kernel_size=POOLING_SIZE),  # max value in each 2x2 area, output shape (32, 16, 16)
        )
        self.conv2 = nn.Sequential(                  # input shape (32, 16, 16)
            nn.Conv2d(
                in_channels=Feature_Filter,
                out_channels=Feature_Filter * 2,
                kernel_size=Filter_Size,
                stride=Stride,
                padding=Padding,
            ),                                       # output shape (64, 16, 16)
            nn.ReLU(),                               # activation
            nn.MaxPool2d(kernel_size=POOLING_SIZE),  # output shape (64, 8, 8)
        )
        self.out = nn.Linear(64 * 8 * 8, LABELS)     # fully connected layer, output 3 classes

    def forward(self, x):
        x = self.conv1(x)
        x = self.conv2(x)
        x = x.view(x.size(0), -1)                    # flatten the output of conv2 to (batch_size, 64 * 8 * 8)
        output = self.out(x)
        return output, x                             # return x for visualization

# Check Accuracy
def check_accuracy(loader, model):
    num_correct = 0
    num_samples = 0
    model.eval()
    with torch.no_grad():
        for x, y in loader:
            scores = model(x)[0]
            _, predictions = scores.max(1)
            num_correct += (predictions == y).sum()
            num_samples += predictions.size()[0]
    print(f'Got {num_correct} / {num_samples} with accuracy {float(num_correct)}/{float(num_samples)} = '
          f'{round((float(num_correct) / float(num_samples)) * 100, 2)}')
    model.train()

cnn = CNN()

# Optimizer Choice
if OPTIMIZER == "SGD":  # [SGD, Adagrad, RMSProp, Adam]
    optimizer = torch.optim.SGD(cnn.parameters(), lr=LR)  # optimize all cnn parameters
elif OPTIMIZER == "Adagrad":
    optimizer = torch.optim.Adagrad(cnn.parameters(), lr=LR)
elif OPTIMIZER == "RMSProp":
    optimizer = torch.optim.RMSprop(cnn.parameters(), lr=LR)
elif OPTIMIZER == "Adam":
    optimizer = torch.optim.Adam(cnn.parameters(), lr=LR)

loss_func = nn.CrossEntropyLoss()  # the target label is not one-hotted

# Training
list_loss = []
list_epoch = []
for epoch in range(EPOCH):
    losses = []
    for step, (data, target) in enumerate(train_loader):  # gives batch data
        # Forward
        output = cnn(data)[0]             # cnn output
        loss = loss_func(output, target)  # cross entropy loss
        losses.append(loss.item())
        # Backward
        optimizer.zero_grad()             # clear gradients for this training step
        loss.backward()                   # backpropagation, compute gradients
        # Gradient Step
        optimizer.step()                  # apply gradients
    list_loss.append(sum(losses) / len(losses))
    list_epoch.append(epoch)
    print(f'Loss at Epoch {epoch} is {sum(losses) / len(losses)}')

print("=================================================")
print("Epoch is", EPOCH)
print("-------------------------------------------------")
print("The batch size is", BATCH_SIZE)
print("-------------------------------------------------")
print("The number of feature filter is", Feature_Filter, ", size is", Filter_Size)
print("-------------------------------------------------")
print("The Pooling Size is", POOLING_SIZE)
print("-------------------------------------------------")
print("The optimizer is", OPTIMIZER)
print("-------------------------------------------------")
print("The loss function is CrossEntropyLoss")
print("-------------------------------------------------")
print("Check Accuracy of Training : ")
check_accuracy(train_loader, cnn)
print("-------------------------------------------------")
print("Check Accuracy of Testing : ")
check_accuracy(test_loader, cnn)

# Loss Graph
x = list_epoch
y = list_loss
plt.plot(x, y, 'bo-', linewidth=1.5)
plt.title("Loss Function")
plt.xlabel("EPOCH")
plt.ylabel("LOSS")
plt.grid(True)
plt.show()
```
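***If you want to keep the trained model for later inference, a small addition at the end of the script could save and restore the weights (not in the original code; a sketch using the standard `torch.save`/`load_state_dict` API and a hypothetical filename):***
```python
torch.save(cnn.state_dict(), "cnn_gesture.pt")  # save trained weights

# restore them later for inference
restored = CNN()
restored.load_state_dict(torch.load("cnn_gesture.pt"))
restored.eval()
```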
*IV . Conclusion*
---
### 1. Improvements :
* ***Processing Time :***
    * One possible optimization is to drop the constant factor in the Gaussian distribution formula to reduce processing time.
    * Upgrading the hardware is another option: this project ran on a 4-core CPU, and an 8-core, 16-core, or even 32-core CPU would shorten the processing time considerably.
* ***Accuracy :***
    * Collecting more training data may help, though one must still watch carefully for overfitting.
* ***Generalization :***
    * The code could be written more generically so it can be reused in other circumstances; as it stands, it is built around a situation we already know, so it would be of little use in a new, unseen scenario.
### 2. What I've Learned :
* ***Breaking through ML concepts :***
    Since I had only written code for traditional machine learning before, I spent my first efforts on "how to extract features" when I started this assignment, and then realized that everything was already available in PyTorch, which made things much smoother.
* ***Confidence :***
    The success of this project made me more confident. I am no longer afraid of challenges like this, and the experience will be fertilizer for my life as an engineer.