# Action Recognition based on OpenPose
###### tags: `AI DJ` `Teaching`
[TOC]
## Installation
Open `train.py` and change the following imports.
~~from keras.layers.normalization import BatchNormalization
from keras.optimizers import Adam~~
*to*
```python
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.layers import Dense, Dropout, BatchNormalization
```
Open `generate_dets.py` in `Tracking/` & `pose_visualizer.py` in `Pose/`
~~import tensorflow as tf~~
*to*
```python
import tensorflow.compat.v1 as tf
tf.disable_v2_behavior()
```
Also in `generate_dets.py` in `Tracking/` & `pose_visualizer.py` in `Pose/`, change
~~with tf.gfile.GFile(checkpoint_filename, "rb") as f:~~
*to*
```python
with tf.io.gfile.GFile(graph_path, 'rb') as f:
```
## Training with own dataset
Prepare the data (actions) by running `main.py`. Remember to uncomment the data-collecting code; the raw data will be saved as a `.txt` file.
```python=72
f = open('origin_data.txt', 'a+') #saved name
joints_norm_per_frame = np.array(pose[-1]).astype(str)  # np.str was removed in newer NumPy
f.write(' '.join(joints_norm_per_frame))
f.write('\n')
```
The `.txt` file will look like this:

Then use Excel to convert the `.txt` into a `.csv` file.


Make sure to select **Space** as the delimiter.
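If you prefer to skip Excel, the space-delimited `.txt` can also be converted with a short script. A minimal sketch, assuming the file names used above:

```python
import csv

# Convert the space-delimited origin_data.txt into a .csv file
# (an alternative to the Excel import described above).
def txt_to_csv(txt_path, csv_path):
    with open(txt_path) as src, open(csv_path, 'w', newline='') as dst:
        writer = csv.writer(dst)
        for line in src:
            values = line.split()   # split on whitespace
            if values:              # skip blank lines
                writer.writerow(values)
```

Usage: `txt_to_csv('origin_data.txt', 'origin_data.csv')`.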

Finally, you will get the `.csv` file like this:

Then put the data into `data_under_scene.csv` using the template (class labels start from 0).
Open `train.py` in `Action/training/` and **change the action enum.**
Please also change `action_enum.py` in `Action/` accordingly.
```python=20
class Actions(Enum):
    wave = 0
    idle = 1
```
In the slicing below, 3328 is the total number of skeleton samples.
`encoder_Y` lists how many samples each action contains, in order:
> [Class]*Amount
```python=104
X = dataset[0:3328, 0:36].astype(float)
Y = dataset[0:3328, 36]
encoder_Y = [0]*1384 + [1]*1944
```
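Before training, these integer labels are one-hot encoded for the softmax output. A minimal NumPy sketch of what that encoding produces (Keras' `to_categorical` does the equivalent):

```python
import numpy as np

# Integer labels: 1384 'wave' (0) samples followed by 1944 'idle' (1) samples.
encoder_Y = [0] * 1384 + [1] * 1944

# One-hot encode the labels for a 2-class softmax output.
dummy_Y = np.eye(2)[encoder_Y]
```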
Model **(in the last layer, the `units` parameter is the number of classes in your data, so only that one needs to be changed):**
```python=120
# build keras model
model = Sequential()
model.add(Dense(units=128, activation='relu'))
model.add(BatchNormalization())
model.add(Dense(units=64, activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(units=32, activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(units=16, activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(units=2, activation='softmax')) # units = nums of classes #softmax
```
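At inference time the softmax layer outputs one probability per class, and the predicted action is the argmax, mapped back through the `Actions` enum. A small sketch (the probability values are made up):

```python
import numpy as np
from enum import Enum

class Actions(Enum):
    wave = 0
    idle = 1

# Hypothetical softmax output for one input: [P(wave), P(idle)].
probs = np.array([0.85, 0.15])
predicted = Actions(int(np.argmax(probs)))  # -> Actions.wave
```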
## Data pre-processing
Data preprocessing is a data mining technique that involves transforming raw data into an understandable format. Real-world data is often incomplete, inconsistent, and/or lacking in certain behaviors or trends, and is likely to contain many errors. Data preprocessing is a proven method of resolving such issues.
### Raw Data

### Filtered Data
Filtered data where incomplete poses are eliminated.
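One simple way to eliminate incomplete poses: OpenPose reports undetected keypoints as 0, so any pose containing zero coordinates can be dropped. This is a sketch of the filtering rule, not necessarily the exact one used in the repo:

```python
import numpy as np

# Each row is one pose: 18 keypoints * (x, y) = 36 values.
# Undetected keypoints come out as 0, so drop any pose with a zero.
def filter_incomplete(poses):
    complete = np.all(poses != 0, axis=1)
    return poses[complete]
```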

## Added two pretrain model Mobilenet_v2_small/large
### How does it compare to the first generation of MobileNets?
Overall, the MobileNetV2 models are **faster for the same accuracy** across the entire latency spectrum. In particular, the new models use 2x fewer operations, need **30% fewer parameters** and are about **30-40% faster** on a Google Pixel phone than MobileNetV1 models, all while **achieving higher accuracy.**

Download the **mobilenet_v2_small** + **mobilenet_v2_large** from [here](https://drive.google.com/drive/folders/1PpMUauDBbT3I85_rMDP6rQhWLRTBJCdK?usp=sharing), and put those file to `Pose/graph_models`
Change the code in `utils.py` so that you can select a pretrained model from `main.py`:
```python=39
def load_pretrain_model(model):
    dyn_graph_path = {
        'VGG_origin': str(file_path / "Pose/graph_models/VGG_origin/graph_opt.pb"),
        'mobilenet_thin': str(file_path / "Pose/graph_models/mobilenet_thin/graph_opt.pb"),
        'mobilenet_v2_small': str(file_path / "Pose/graph_models/mobilenet_v2_small/graph_opt.pb"),
        'mobilenet_v2_large': str(file_path / "Pose/graph_models/mobilenet_v2_large/graph_opt.pb")
    }
    graph_path = dyn_graph_path[model]
    if not os.path.isfile(graph_path):
        raise Exception('Graph file doesn\'t exist, path=%s' % graph_path)
```
### Testing Result (1 Person)
Mobilenet_thin (fastest, but lowest accuracy):

MobileNet_v2_small (second fastest; accuracy is decent):

MobileNet_v2_large:

VGG (highest accuracy, but **slowest**):

I will use VGG for training and MobileNet_v2_small for realtime detection.
## Result
### Result with two classes (Idle/Wave)
#### 09/02
Classes: Wave & Idle
Batch Size:16
Epochs: 50
**Accuracy: 91%**

**Loss Curves:**

**Confusion Matrix (Normalized):**

**Difference:**
Besides the roughly 10% gain in accuracy, the previous model often misjudged movements around the waist; enlarging the dataset and pre-processing the data significantly reduced those errors.
| The Old One (08/30)|The New One (09/02) |
| -------- | -------- |
||  |
**Video:**
Tested the result by using [KTH human actions dataset.](https://www.csc.kth.se/cvap/actions/)

#### 08/30
Classes: Wave & Idle
Batch Size:16
Epochs: 50
**Accuracy: 81%**

**Loss Curves:**

**Confusion Matrix:**

**Video:**

### Result with three classes (Idle/Wave/Jump)
#### 9/3
Classes: Wave & Idle & Jump
Batch Size:16
Epochs: 50
**Accuracy: 86%**

**Loss Curves:**

**Confusion Matrix (Normalized):**

#### 08/30
Classes: Wave & Idle & Jump
Batch Size:16
Epochs: 50
**Accuracy:**

**Loss Curves:**

**Confusion Matrix:**

**Video:**
https://drive.google.com/file/d/1UIq02hmFxgBFxuqt_2FrWRBc5HEWiFkf/view?usp=sharing
## Classify actions with dynamic sequential joints data
### Discuss with Prof. Chiang for reference point on Crowd Detection
:::info
**Question:**
We want to use an LSTM model for action recognition on time series; the current idea is to classify actions from windows of 10 frames.
In the dataset, one Jump sample is a full jump, from the ground into the air and back down, captured as one 10-frame group, then labeled in bulk for training.
At recognition time, every 10-frame input produces one action estimate.
Because we detect once every 10 frames (1-10, 11-20, 21-30, ...), a window may not start with the person preparing to jump; it may instead catch the person already in the air and dropping down (e.g. the jump only spans frames 15-25).
We are not sure whether this will lead to inaccurate results.
:::
:::danger
**Answer by Prof. Chiang:**
Build the training data with overlapping windows (1-10, 2-11, 3-12, ...).
Overlapping yields much more data, and it means that any 10-frame span taken from a jump, rising from the ground and dropping back down, contains the vertical displacement and will be recognized as a jump.
:::
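The overlapping-window suggestion above can be sketched with NumPy (assuming 36 values per frame and a window of 10, as elsewhere in this document):

```python
import numpy as np

# Build overlapping 10-frame windows (1-10, 2-11, 3-12, ...) from a
# sequence of per-frame skeleton vectors.
def sliding_windows(frames, window=10):
    """(n_frames, 36) -> (n_frames - window + 1, window, 36)"""
    return np.stack([frames[i:i + window]
                     for i in range(len(frames) - window + 1)])
```

For large datasets, `np.lib.stride_tricks.sliding_window_view` does the same without copying.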
---
### Create the dataset with 10-frame groups
I grouped the video clips into groups of 10 frames.
Each action has 500 frames in total (50 groups).
- [x] Idle
- [x] Wave
- [x] Jump
csv: https://drive.google.com/file/d/10UvHP5ZPGAH4UDQdrhWQ-RlyoZRaDDXR/view?usp=sharing

### Train the dataset using LSTM/GRU
First, we have to change the input from a 2D tensor to a 3D tensor.
```python=103
df = pd.read_csv('se.csv')
dataset = np.array([matrix.to_numpy() for _, matrix in df.groupby('group')])
X = dataset[:,:,0:36].astype(float)
Y = dataset[:,:, 36][:,0].astype(int)
```
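A toy version of this `groupby` reshaping, with 2 groups of 3 frames and 4 feature columns instead of 10 frames and 36 keypoint values (the column names here are made up):

```python
import numpy as np
import pandas as pd

# 6 rows of 4 features plus a 'group' column; groupby stacks each
# group's rows into one slice of a 3D tensor.
df = pd.DataFrame(np.arange(24).reshape(6, 4), columns=list('abcd'))
df['group'] = [0, 0, 0, 1, 1, 1]
tensor = np.array([g.to_numpy() for _, g in df.groupby('group')])
```

Features are then `tensor[:, :, 0:4]` and the group/label column is the last slice, mirroring the `0:36` / `36` indexing above.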
and
```python=126
# train test split
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.1, random_state=42)
```
### Build the LSTM model
```python=146
# build LSTM model
model = Sequential()
model.add(LSTM(64, return_sequences=True, activation='relu', input_shape=(10,36)))  # 10 frames with 36 keypoints each
model.add(LSTM(128, return_sequences=True, activation='relu'))
model.add(LSTM(64, return_sequences=False, activation='relu'))
model.add(Dense(64, activation='relu'))
model.add(Dense(32, activation='relu'))
model.add(Dense(3, activation='softmax'))
```
#### Result
Training Loss:

Accuracy: **93%**


#### Build the GRU model
```python=146
# build GRU model
model = Sequential()
model.add(GRU(256, return_sequences=True, activation='relu', input_shape=(10,36)))  # 10 frames with 36 keypoints each
model.add(Dropout(0.1))
model.add(GRU(128, return_sequences=True, activation='relu'))
model.add(Dropout(0.1))
model.add(GRU(64, return_sequences=False, activation='relu'))
model.add(Dense(64, activation='relu'))
model.add(Dense(32, activation='relu'))
model.add(Dense(3, activation='softmax'))
```
##### Result
Training Loss:

Accuracy: **93%**

