---
tags: SportTech
---

ShirleySkate
===

![](https://i.imgur.com/Z9QLKqY.jpg)

In this project, we cast the detection of on-air time as a sequence labeling problem by classifying each frame as B, I, E, or O, similar to the BIOES format (short for Beginning, Inside, Outside, End, Single) used in the Named Entity Recognition task. A frame is labeled B if it is the take-off frame, I if it is one of the consecutive on-air frames, E if it is the landing frame, and O otherwise. Note that we define the B frame as exactly the frame in which the skate leaves the ice. The on-air time of a jump can then be computed by dividing the number of I-labeled frames by the video's fps (frames per second). The pipeline of the methodology is shown below.

![](https://i.imgur.com/AOxYjHl.jpg)

* **Project code**: `/home/lin10/projects/SkatingJumpClassifier`
* [**Github repo**](https://github.com/YilingLin10/SkatingJumpClassifier.git)
* folder structure

```
SkatingJumpClassifier
├── 20220801
├── 20220801_Jump_重新命名
├── data
│   ├── all_jump
│   ├── single_jump
│   ├── multiple_jump
│   ├── Axel
│   ├── Loop
│   ├── Flip
│   └── Lutz
├── preprocess
├── model
├── configs
├── experiments
├── trainSeq.py
├── test.py
├── utils.py
├── vis_results.py
└── eval.py
```

```csvpreview {header="true"}
folder/file name, description
20220801, folder that stores the per-frame images & alphapose-results.json of each video (classified by jump type)
20220801_Jump_重新命名, folder that stores the videos
data, folder that stores all the datasets (currently 7 datasets)
preprocess, folder for dataset preparation
model, folder that stores the models
configs, folder that stores the config files
experiments, folder that stores the checkpoints
trainSeq.py, training script
test.py, testing script
utils.py, evaluation script
eval.py, script that computes the mean error percentage
vis_results.py, visualization script
```

## Dataset Preparation

### 2D Pose Estimation (AlphaPose)
* For each video, get per-frame human joint locations and confidence scores with [AlphaPose](https://github.com/cjwku1209/alpha_pose).
    * [AlphaPose output format](https://github.com/MVIG-SJTU/AlphaPose/blob/master/docs/output.md)
* The resulting data for each video is stored in a folder called `alpha_pose_${video_name}`, including
    * a `/vis` folder which contains all frames of the video in `*.jpg` format
    * an `alphapose-results.json` file which contains COCO 17-keypoint information for every person in the video
* Move the results to `./20220801`
    * Place the `*.jpg` frames & `alphapose-results.json` in `./20220801/${action_name}/${video_name}`, e.g.,
        * `./20220801/Axel/Axel_1/0.jpg`
        * `./20220801/Axel/Axel_1/alphapose-results.json`

:::info
:mega: We define the person with the highest AlphaPose confidence in the first frame as the main person in the video.
> Let $\tilde{p_i}$ denote the extracted pose of the skater in the $i$-th frame. Since the AlphaPose output includes the poses of multiple people, we extract the 2D poses of the skater as follows: we take the pose with the highest confidence in the first frame as $\tilde{p_1}$, after verifying that this pose belongs to the skater. For $P_{i}=\{p_{i,1}, p_{i,2}, ...\}$, where $p_{i,j}$ denotes the $j$-th pose in the $i$-th frame, we compute the Euclidean distance between $\tilde{p_{i-1}}$ and each $p_{i,j}$, and take the pose with the minimum distance as $\tilde{p_i}$ (see the sketch below).

[color=#907bf7]
:::
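A minimal sketch of this tracking rule, for reference only (it is not part of the repo). It assumes the detections from `alphapose-results.json` have already been grouped per frame and that each detection dict carries the flattened `keypoints` (x, y, confidence triplets) and whole-person `score` fields of the AlphaPose COCO output; the function name `track_skater` is illustrative.

```python
import numpy as np

def track_skater(frames):
    """Greedily track the skater across frames of AlphaPose detections.

    `frames[i]` is the list of detection dicts for the i-th frame; each dict
    is assumed to hold a flattened `keypoints` list (x, y, confidence per
    joint) and a whole-person `score`. Returns one (17, 2) array per frame.
    """
    def to_xy(det):
        kpts = np.asarray(det["keypoints"], dtype=np.float32).reshape(-1, 3)
        return kpts[:, :2]  # keep (x, y), drop the per-joint confidence

    # First frame: the highest-confidence detection is taken as the skater.
    prev = to_xy(max(frames[0], key=lambda det: det["score"]))
    poses = [prev]

    # Later frames: keep the detection closest (Euclidean) to the previous pose.
    for dets in frames[1:]:
        if dets:
            candidates = [to_xy(d) for d in dets]
            dists = [np.linalg.norm(c - prev) for c in candidates]
            prev = candidates[int(np.argmin(dists))]
        poses.append(prev)  # if nothing is detected, simply reuse the previous pose
    return poses
```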
:::danger
:warning: **Please make sure that the main person in each video is the skater.**
:warning: **Remove the videos that don't meet this criterion from the dataset.**
* Run `./draw_skeleton.py` to check the videos with the main skeletons drawn (please refer to [video alignment](https://hackmd.io/Aowkrl7SRwqd0HU5Z4014A?view#Dataset-Preparation))
* The valid videos of each dataset are listed in `./data/${dataset_name}/train_list.txt` and `./data/${dataset_name}/test_list.txt`
:::

### Human Pose Embedding ([Pr-VIPE](https://github.com/google-research/google-research/tree/master/poem/pr_vipe))
* project code: `/home/lin10/projects/poem`
* generate Pr-VIPE embeddings for each video

```bash
cd /home/lin10/projects/poem
conda activate poem
bash alphapose2embs.sh
```

### Augmentation
We observed that the take-off and landing actions take approximately 30 frames at 30 fps. Therefore, we augment a single video into numerous samples by trimming different amounts of context from the beginning and from the end, while making sure that at least 30 frames of context are kept (see the sketch after this section).
* Run `python ./preprocess/augmentation.py --action ${dataset_name}`
    * e.g., `python ./preprocess/augmentation.py --action all_jump`
* This script generates a `.jsonl` file for each split in the dataset that contains the annotations of the augmented samples (later used by `preprocess.py`), e.g.,

```json
{"id": "Axel_1-0", "video_name": "Axel_1", "start_frame": 0, "end_frame": 113, "start_jump_1": 72, "end_jump_1": 80, "start_jump_2": -1, "end_jump_2": -1}
{"id": "Axel_1-1", "video_name": "Axel_1", "start_frame": 1, "end_frame": 110, "start_jump_1": 72, "end_jump_1": 80, "start_jump_2": -1, "end_jump_2": -1}
...
```
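The trimming constraint can be pictured with a small sketch. This is illustrative only and not the actual `augmentation.py` logic: the field names mirror the example annotations above (single-jump case), frame indices are treated as inclusive, and the exact set and order of trims produced by the real script may differ.

```python
def augmented_records(video_name, n_frames, jump_start, jump_end, min_context=30):
    """Enumerate trimmed variants of one single-jump video (illustrative only).

    Every (start_frame, end_frame) pair keeps at least `min_context` frames
    of context before take-off and after landing.
    """
    records = []
    for start in range(0, jump_start - min_context + 1):
        for end in range(jump_end + min_context, n_frames):
            records.append({
                "id": f"{video_name}-{len(records)}",
                "video_name": video_name,
                "start_frame": start,
                "end_frame": end,
                "start_jump_1": jump_start,
                "end_jump_1": jump_end,
                "start_jump_2": -1,  # single-jump video, as in the example above
                "end_jump_2": -1,
            })
    return records

# e.g., a 114-frame video whose jump spans frames 72-80
records = augmented_records("Axel_1", n_frames=114, jump_start=72, jump_end=80)
```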
### Preprocessing
* Generate the training and testing data in `.pkl` format
* Run `python ./preprocess/preprocess.py --action ${dataset_name}`
    * e.g., `python ./preprocess/preprocess.py --action all_jump`
* The generated `train.pkl` and `test.pkl` files contain all samples in the dataset after augmentation, where each sample is represented by a Python dictionary with the following keys
    * `video_name`: the name of the sample
    * `output`: the ground truth labels $\tilde{y}=\{\tilde{y}_1,\tilde{y}_2,...,\tilde{y}_T\}$, where $\tilde{y}_i$ is the ground truth label of the $i$-th frame
    * `features`: either the 2D poses (stored in `data/${dataset_name}/raw_skeletons/${split}.pkl`) or the Pr-VIPE embeddings (stored in `data/${dataset_name}/poem_embeddings/${split}.pkl`)
        * the estimated 2D poses $X=\{x_1,x_2,...,x_T\}$, where $x_i$ denotes the 17 joint coordinates of the $i$-th frame, stored in a NumPy array of shape (T, 17, 2)
        * the Pr-VIPE embeddings $X=\{x_1,x_2,...,x_T\}$, where $x_i$ denotes the Pr-VIPE embedding of the $i$-th frame, stored in a NumPy array of shape (T, 17, 16)

| key        | value |
| ---------- | ----- |
| video_name | the name of the sample |
| output     | the ground truth labels |
| features   | 2D poses in `data/${dataset_name}/raw_skeletons/${split}.pkl` or Pr-VIPE embeddings in `data/${dataset_name}/poem_embeddings/${split}.pkl` |

## Models
* `stgcn_transformer.py`: the GCN block from ST-GCN combined with an Encoder-CRF model
* `encoder_crf.py`: an Encoder-CRF model (a 2-layer encoder followed by a linear-chain CRF)

## Train
```bash
python trainSeq.py --experiment_name ${EXPERIMENT_NAME} --model_name ${dataset_name} --config_name ${CONFIG_NAME} --dataset ${dataset_name}

# e.g.,
python trainSeq.py --experiment_name raw_skeletons_0306 --config_name raw_skeletons_0306 --dataset all_jump
```
* The checkpoint will be stored in `./experiments/${EXPERIMENT_NAME}/${dataset_name}/save`
* `CONFIG_NAME`: the name of the config file in the `./configs` folder
* `dataset_name`: the name of the dataset to **train** on

## Test
```bash
python test.py --model_path ${MODEL_PATH} --config_name ${CONFIG_NAME} --dataset ${dataset_name}

# e.g.,
python test.py --model_path raw_skeletons_0306/all_jump/ --config_name raw_skeletons_0306 --dataset all_jump
```
* `MODEL_PATH`: path to the checkpoint under `./experiments/` (`${EXPERIMENT_NAME}/${train_dataset_name}/`)
* `CONFIG_NAME`: the name of the config file in the `./configs` folder
* `dataset_name`: the name of the dataset to **test** on

> The generated predictions are stored in `MODEL_PATH` as `${dataset_name}_test_pred.csv`, e.g., `./experiments/raw_skeletons_0306/all_jump/all_jump_test_pred.csv`

## Compute Mean Error Percentage
```bash
python eval.py --model_path ${MODEL_PATH} --action ${dataset_name}

# e.g.,
python eval.py --model_path raw_skeletons_0306/all_jump/ --action all_jump
```
* `MODEL_PATH`: path to the checkpoint under `./experiments/`
* `dataset_name`: the name of the dataset to evaluate on

## Visualization
> Generates a video that compares the ground truth with the predictions of the models

```bash
python vis_results.py --action ${action} --video_name ${video_name}

# e.g.,
python vis_results.py --action Axel --video_name Axel_19
```
* `video_name`: the name of the video
* `action`: the jump type of the video (the name of the folder under `/20220801` in which the video is placed)

### Demo
{%youtube IVFyzsZH8P0 %}
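As a closing note: once `test.py` has produced per-frame predictions, the on-air time defined at the top of this document (the number of I-labeled frames divided by the video's fps) can be recovered directly. A minimal sketch, not part of the repo; the function name and the plain-string labels are illustrative.

```python
def on_air_time(labels, fps):
    """On-air time (in seconds) of a jump from its per-frame labels.

    `labels` is assumed to be a sequence of per-frame strings drawn from
    {"B", "I", "E", "O"}; as defined above, the on-air time is the number
    of I-labeled frames divided by the video's fps.
    """
    return sum(1 for y in labels if y == "I") / fps

# e.g., a jump spanning frames 72-80 of a 114-frame, 30 fps video
labels = ["O"] * 72 + ["B"] + ["I"] * 7 + ["E"] + ["O"] * 33
print(on_air_time(labels, fps=30))  # 7 / 30 ≈ 0.233 s
```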