# Action Recognition based on OpenPose
###### tags: `AI DJ` `Teaching`
[TOC]
## Installation
Open `train.py` and change the following imports.
~~from keras.layers.normalization import BatchNormalization
from keras.optimizers import Adam~~
*to*
```python
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.layers import Dense, Dropout, BatchNormalization
```
Open `generate_dets.py` in `Tracking/` & `pose_visualizer.py` in `Pose/`
~~import tensorflow as tf~~
*to*
```python
import tensorflow.compat.v1 as tf
tf.disable_v2_behavior()
```
Also in `generate_dets.py` in `Tracking/` & `pose_visualizer.py` in `Pose/`, change
~~with tf.gfile.GFile(checkpoint_filename, "rb") as f:~~
*to*
```python
with tf.io.gfile.GFile(graph_path, 'rb') as f:
```
## Training with own dataset
Prepare the data (actions) by running `main.py`. Remember to uncomment the data-collecting code; the raw data will be saved as a `.txt` file.
```python=72
f = open('origin_data.txt', 'a+') #saved name
joints_norm_per_frame = np.array(pose[-1]).astype(str)  # np.str was removed in newer NumPy
f.write(' '.join(joints_norm_per_frame))
f.write('\n')
```
The `.txt` file will look like this:

Then use Excel to convert the `.txt` into a `.csv` file.


Make sure to select **Space** as the delimiter.
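If you prefer to skip Excel, the space-delimited `.txt` can also be converted with a short script. A minimal sketch, assuming the file names used above:

```python
import csv

# Convert the space-delimited origin_data.txt into a .csv file
# (an alternative to the Excel import described above).
def txt_to_csv(txt_path, csv_path):
    with open(txt_path) as src, open(csv_path, 'w', newline='') as dst:
        writer = csv.writer(dst)
        for line in src:
            values = line.split()   # split on whitespace
            if values:              # skip blank lines
                writer.writerow(values)
```

Usage: `txt_to_csv('origin_data.txt', 'origin_data.csv')`.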

Finally, you will get the `.csv` file like this:

Then put the data into `data_under_scene.csv` using the template (class labels start from 0).
Open `train.py` in `Action/training/` and **change the action enum.**
Please also change `action_enum.py` in `Action/` accordingly.
```python=20
class Actions(Enum):
    wave = 0
    idle = 1
```
In the slicing below, 3328 is the total number of skeleton samples.
`encoder_Y` lists how many samples each action contains, in order:
> [Class]*Amount
```python=104
X = dataset[0:3328, 0:36].astype(float)
Y = dataset[0:3328, 36]
encoder_Y = [0]*1384 + [1]*1944
```
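Before training, these integer labels are one-hot encoded for the softmax output. A minimal NumPy sketch of what that encoding produces (Keras' `to_categorical` does the equivalent):

```python
import numpy as np

# Integer labels: 1384 'wave' (0) samples followed by 1944 'idle' (1) samples.
encoder_Y = [0] * 1384 + [1] * 1944

# One-hot encode the labels for a 2-class softmax output.
dummy_Y = np.eye(2)[encoder_Y]
```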
Model **(in the last layer, the `units` parameter is the number of classes in your data, so only that one needs to be changed):**
```python=120
# build keras model
model = Sequential()
model.add(Dense(units=128, activation='relu'))
model.add(BatchNormalization())
model.add(Dense(units=64, activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(units=32, activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(units=16, activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(units=2, activation='softmax')) # units = nums of classes #softmax
```
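At inference time the softmax layer outputs one probability per class, and the predicted action is the argmax, mapped back through the `Actions` enum. A small sketch (the probability values are made up):

```python
import numpy as np
from enum import Enum

class Actions(Enum):
    wave = 0
    idle = 1

# Hypothetical softmax output for one input: [P(wave), P(idle)].
probs = np.array([0.85, 0.15])
predicted = Actions(int(np.argmax(probs)))  # -> Actions.wave
```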
## Data pre-processing
Data preprocessing is a data mining technique that involves transforming raw data into an understandable format. Real-world data is often incomplete, inconsistent, and/or lacking in certain behaviors or trends, and is likely to contain many errors. Data preprocessing is a proven method of resolving such issues.
### Raw Data

### Filtered Data
Filtered data where incomplete poses are eliminated.
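One simple way to eliminate incomplete poses: OpenPose reports undetected keypoints as 0, so any pose containing zero coordinates can be dropped. This is a sketch of the filtering rule, not necessarily the exact one used in the repo:

```python
import numpy as np

# Each row is one pose: 18 keypoints * (x, y) = 36 values.
# Undetected keypoints come out as 0, so drop any pose with a zero.
def filter_incomplete(poses):
    complete = np.all(poses != 0, axis=1)
    return poses[complete]
```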

## Added two pretrain model Mobilenet_v2_small/large
### How does it compare to the first generation of MobileNets?
Overall, the MobileNetV2 models are **faster for the same accuracy** across the entire latency spectrum. In particular, the new models use 2x fewer operations, need **30% fewer parameters** and are about **30-40% faster** on a Google Pixel phone than MobileNetV1 models, all while **achieving higher accuracy.**

Download the **mobilenet_v2_small** + **mobilenet_v2_large** from [here](https://drive.google.com/drive/folders/1PpMUauDBbT3I85_rMDP6rQhWLRTBJCdK?usp=sharing), and put those file to `Pose/graph_models`
Change the code in `utils.py` so that you can select a pretrained model from `main.py`:
```python=39
def load_pretrain_model(model):
    dyn_graph_path = {
        'VGG_origin': str(file_path / "Pose/graph_models/VGG_origin/graph_opt.pb"),
        'mobilenet_thin': str(file_path / "Pose/graph_models/mobilenet_thin/graph_opt.pb"),
        'mobilenet_v2_small': str(file_path / "Pose/graph_models/mobilenet_v2_small/graph_opt.pb"),
        'mobilenet_v2_large': str(file_path / "Pose/graph_models/mobilenet_v2_large/graph_opt.pb")
    }
    graph_path = dyn_graph_path[model]
    if not os.path.isfile(graph_path):
        raise Exception('Graph file doesn\'t exist, path=%s' % graph_path)
```
### Testing Result (1 Person)
Mobilenet_thin (fastest, but lowest accuracy):

MobileNet_v2_small (second fastest; accuracy is decent):

MobileNet_v2_large:

VGG (highest accuracy, but **slowest**):

I will use VGG for training and MobileNet_v2_small for realtime detection.
## Result
### Result with two classes (Idle/Wave)
#### 09/02
Classes: Wave & Idle
Batch Size:16
Epochs: 50
**Accuracy: 91%**

**Loss Curves:**

**Confusion Matrix (Normalized):**

**Difference:**
Besides the roughly 10% gain in accuracy, the previous model often misjudged movements around the waist; enlarging the dataset and pre-processing the data significantly reduced those errors.
| The Old One (08/30)|The New One (09/02) |
| -------- | -------- |
||  |
**Video:**
Tested the result by using [KTH human actions dataset.](https://www.csc.kth.se/cvap/actions/)

#### 08/30
Classes: Wave & Idle
Batch Size:16
Epochs: 50
**Accuracy: 81%**

**Loss Curves:**

**Confusion Matrix:**

**Video:**

### Result with three classes (Idle/Wave/Jump)
#### 9/3
Classes: Wave & Idle & Jump
Batch Size:16
Epochs: 50
**Accuracy: 86%**

**Loss Curves:**

**Confusion Matrix (Normalized):**

#### 08/30
Classes: Wave & Idle & Jump
Batch Size:16
Epochs: 50
**Accuracy:**

**Loss Curves:**

**Confusion Matrix:**

**Video:**
https://drive.google.com/file/d/1UIq02hmFxgBFxuqt_2FrWRBc5HEWiFkf/view?usp=sharing
## Classify actions with dynamic sequential joints data
### Discuss with Prof. Chiang for reference point on Crowd Detection
:::info
**Question:**
We want to use an LSTM model for action recognition on time series; the current idea is to classify actions from windows of 10 frames.
In the dataset, one Jump sample is a full jump, from the ground into the air and back down, captured as one 10-frame group, then labeled in bulk for training.
At recognition time, every 10-frame input produces one action estimate.
Because we detect once every 10 frames (1-10, 11-20, 21-30, ...), a window may not start with the person preparing to jump; it may instead catch the person already in the air and dropping down (e.g. the jump only spans frames 15-25).
We are not sure whether this will lead to inaccurate results.
:::
:::danger
**Answer by Prof. Chiang:**
Build the training data with overlapping windows (1-10, 2-11, 3-12, ...).
Overlapping yields much more data, and it means that any 10-frame span taken from a jump, rising from the ground and dropping back down, contains the vertical displacement and will be recognized as a jump.
:::
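The overlapping-window suggestion above can be sketched with NumPy (assuming 36 values per frame and a window of 10, as elsewhere in this document):

```python
import numpy as np

# Build overlapping 10-frame windows (1-10, 2-11, 3-12, ...) from a
# sequence of per-frame skeleton vectors.
def sliding_windows(frames, window=10):
    """(n_frames, 36) -> (n_frames - window + 1, window, 36)"""
    return np.stack([frames[i:i + window]
                     for i in range(len(frames) - window + 1)])
```

For large datasets, `np.lib.stride_tricks.sliding_window_view` does the same without copying.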
---
### Create the dataset with 10-frame groups
I grouped the video clips into groups of 10 frames.
Each action has 500 frames in total (50 groups).
- [x] Idle
- [x] Wave
- [x] Jump
csv: https://drive.google.com/file/d/10UvHP5ZPGAH4UDQdrhWQ-RlyoZRaDDXR/view?usp=sharing

### Train the dataset using LSTM/GRU
First, we have to change the input from a 2D tensor to a 3D tensor.
```python=103
df = pd.read_csv('se.csv')
dataset = np.array([matrix.to_numpy() for _, matrix in df.groupby('group')])
X = dataset[:,:,0:36].astype(float)
Y = dataset[:,:, 36][:,0].astype(int)
```
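A toy version of this `groupby` reshaping, with 2 groups of 3 frames and 4 feature columns instead of 10 frames and 36 keypoint values (the column names here are made up):

```python
import numpy as np
import pandas as pd

# 6 rows of 4 features plus a 'group' column; groupby stacks each
# group's rows into one slice of a 3D tensor.
df = pd.DataFrame(np.arange(24).reshape(6, 4), columns=list('abcd'))
df['group'] = [0, 0, 0, 1, 1, 1]
tensor = np.array([g.to_numpy() for _, g in df.groupby('group')])
```

Features are then `tensor[:, :, 0:4]` and the group/label column is the last slice, mirroring the `0:36` / `36` indexing above.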
and
```python=126
# train test split
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.1, random_state=42)
```
### Build the LSTM model
```python=146
# build LSTM model
model = Sequential()
model.add(LSTM(64, return_sequences=True, activation='relu', input_shape=(10,36)))  # 10 frames with 36 keypoints each
model.add(LSTM(128, return_sequences=True, activation='relu'))
model.add(LSTM(64, return_sequences=False, activation='relu'))
model.add(Dense(64, activation='relu'))
model.add(Dense(32, activation='relu'))
model.add(Dense(3, activation='softmax'))
```
#### Result
Training Loss:

Accuracy: **93%**


#### Build the GRU model
```python=146
# build GRU model
model = Sequential()
model.add(GRU(256, return_sequences=True, activation='relu', input_shape=(10,36)))  # 10 frames with 36 keypoints each
model.add(Dropout(0.1))
model.add(GRU(128, return_sequences=True, activation='relu'))
model.add(Dropout(0.1))
model.add(GRU(64, return_sequences=False, activation='relu'))
model.add(Dense(64, activation='relu'))
model.add(Dense(32, activation='relu'))
model.add(Dense(3, activation='softmax'))
```
##### Result
Training Loss:

Accuracy: **93%**

