# LogBook Andy

## Experiment Diagram
<iframe width="2000" height="800" src="https://miro.com/app/live-embed/uXjVKBCMrAo=/?moveToViewport=-1445,-308,2165,979&embedId=658173738234" frameborder="0" scrolling="no" allow="fullscreen; clipboard-read; clipboard-write" allowfullscreen></iframe>

## Diagram Methods
<img src="https://hackmd.io/_uploads/Syj8ARwRa.jpg" alt="image" width="1650" height="auto">
<img src="https://hackmd.io/_uploads/SyihACvC6.jpg" alt="image" width="1650" height="auto">

## Analysis
![image](https://hackmd.io/_uploads/SyooNyEPA.png)
<iframe src="https://drive.google.com/file/d/1jvvEN2CFy_iIsAQMQvjfLmeInIGGjZOx/preview" width="640" height="480" allow="autoplay"></iframe>

P-STMO
![pstmo1](https://hackmd.io/_uploads/Sk-yGvYaa.jpg)

GAST
<img src="https://hackmd.io/_uploads/rk2gzQF0T.jpg" alt="image" width="300" height="auto">

SEMGCN
<img src="https://hackmd.io/_uploads/rJ0vxu56p.jpg" alt="image" width="300" height="auto">

VideoPose3D
<img src="https://hackmd.io/_uploads/rkXn7eysT.png" alt="image" width="600" height="180">

RS-Net
<img src="https://hackmd.io/_uploads/BJisGDhQC.png" alt="image" width="300" height="180">

FTCM
<img src="https://hackmd.io/_uploads/SkzMSv2mA.png" alt="image" width="300" height="180">

# Graph: MPJPE Comparison of the GCN Models
![image](https://hackmd.io/_uploads/rJXFBDZ6A.png)
![image](https://hackmd.io/_uploads/S1n5Hvb6A.png)

# Visualize Other Videos
![image](https://hackmd.io/_uploads/SJYHAkW6R.png)

**GREETING**
<iframe src="https://drive.google.com/file/d/1lao-EvIgTxc9kULXWBFU-ZxglKUwuBa3/preview" width="640" height="480" allow="autoplay"></iframe>

**PHOTO**
<iframe src="https://drive.google.com/file/d/1Z_DETpBBevXoHZuRrPMgc-Bvh0vHKlFY/preview" width="640" height="480" allow="autoplay"></iframe>

**POSING**
<iframe src="https://drive.google.com/file/d/1dDT77mo4-n0sxlZK5YdMZx7MkJ5ufZ8f/preview" width="640" height="480" allow="autoplay"></iframe>

# Inference Time Comparison
![image](https://hackmd.io/_uploads/rkTfcJvnR.png)

# Parameter and FLOP Comparison
![image](https://hackmd.io/_uploads/r1jqK1DhC.png)

# HumanEva Experiment
![image](https://hackmd.io/_uploads/ry_med73C.png)
![image](https://hackmd.io/_uploads/SyrBxdmn0.png)

SOTA on the HumanEva dataset: [SOTA](https://paperswithcode.com/sota/3d-human-pose-estimation-on-humaneva-i)

# Visualize Mod5GCN Model
<img src="https://hackmd.io/_uploads/Hku98cYsA.png" alt="image" width="1300" height="200">
<iframe src="https://drive.google.com/file/d/1ALgstf7GQPQ_n8igw9XOLQpABUpEcMLI/preview" width="640" height="480" allow="autoplay"></iframe>

# Modif 5: modify SEM_GCN_Conv, ResGraphConv, and BGConv

**SEM_GCN_Conv model**
- Used Kaiming initialization to better handle ReLU activations
- Applied the routing function (ANY-GCN concept), modified by replacing the single fully connected (FC) layer in ANY-GCN with two FC layers and a ReLU activation
- Reduced the dropout value

**ResGraphConv and BGConv, based on Modif 2 with modifications**
- Use Switch Normalization
- Use Mish activation
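A minimal sketch of the modified routing block described above (two FC layers with a ReLU in between, Kaiming initialization, and a reduced dropout value). Module names and dimensions are placeholders, not the actual Mod5GCN code:

```python
import torch
import torch.nn as nn

class TwoLayerRouting(nn.Module):
    """Placeholder routing function: the single FC layer of the ANY-GCN-style
    routing is replaced by FC -> ReLU -> Dropout -> FC."""
    def __init__(self, in_features, hidden_features, out_features, p_drop=0.1):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_features, hidden_features),
            nn.ReLU(inplace=True),
            nn.Dropout(p_drop),  # reduced dropout value
            nn.Linear(hidden_features, out_features),
        )
        for m in self.net:
            if isinstance(m, nn.Linear):
                # Kaiming initialization suits the ReLU non-linearity
                nn.init.kaiming_normal_(m.weight, nonlinearity="relu")
                nn.init.zeros_(m.bias)

    def forward(self, x):
        return self.net(x)

routing = TwoLayerRouting(128, 256, 16)
print(routing(torch.randn(4, 128)).shape)  # torch.Size([4, 16])
```

For the ResGraphConv/BGConv branch, `nn.Mish()` (available in recent PyTorch releases) can replace the ReLU; Switch Normalization is not in the PyTorch standard library and needs a separate implementation.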
**Result**
![image](https://hackmd.io/_uploads/BJijmHZAR.png)
<img src="https://hackmd.io/_uploads/SyFfQSZR0.png" alt="image" width="1800" height="500">

# Visualize SEMGCN Model and Mod2GCN Model
Trained using:
- Non-Local
- 30 epochs

Visualize SemGCN
<img src="https://hackmd.io/_uploads/SJz7kLfq0.png" alt="image" width="1300" height="200">
Error 44.84, MPJPE 4975818.88, P-MPJPE 370.22

Visualize Mod2GCN
<img src="https://hackmd.io/_uploads/B1nDkUG9A.png" alt="image" width="1300" height="200">
Error 45.28, MPJPE 186.1692, P-MPJPE 121.96

SemGCN pretrain
<iframe src="https://drive.google.com/file/d/1jvvEN2CFy_iIsAQMQvjfLmeInIGGjZOx/preview" width="640" height="480" allow="autoplay"></iframe>

Result SEMGCN
<iframe src="https://drive.google.com/file/d/1-G_nKqo6wJ1dp6A9r4s8dRsC6aXbxKzu/preview" width="640" height="480" allow="autoplay"></iframe>

Result Mod2GCN
<iframe src="https://drive.google.com/file/d/1-LM154jLJ_BhoA34hnkDeZCADXvyVWeG/preview" width="640" height="480" allow="autoplay"></iframe>

# Modify SEMGCN Model in the GraphConv and ResGraphConv classes
<img src="https://hackmd.io/_uploads/H1cqq-KF0.png" alt="image" width="1300" height="800">
<img src="https://hackmd.io/_uploads/By6ImOYtA.png" alt="image" width="800" height="300">
![image](https://hackmd.io/_uploads/r1lEc9OYKC.png)

# Modify SEMGCN Model with a Capsule Network
<img src="https://hackmd.io/_uploads/ByXYztLOA.png" alt="image" width="800" height="300">

# Analyze & Combine Visualization Results
![image](https://hackmd.io/_uploads/SyooNyEPA.png)
<iframe src="https://drive.google.com/file/d/1jvvEN2CFy_iIsAQMQvjfLmeInIGGjZOx/preview" width="640" height="480" allow="autoplay"></iframe>

# Results this week
1. Successfully visualized P-STMO in 3D
<iframe src="https://drive.google.com/file/d/1-XanYgcJEt9BINmjhAMzH5hM9KZkgiUB/preview" width="640" height="480" allow="autoplay"></iframe>
2. Evaluating FTCM with Detectron COCO (in progress)
![image](https://hackmd.io/_uploads/rkC70qs8R.png)
3. Plan for next week
   a. Visualize FTCM in 3D
   b. Modify RS-Net for 3D visualization
   c. Continue modifying the model

# Plan next week
1. Analyze the Attention3DHPE 3D visualization result
2. Modify Video-to-pose3D to visualize other models
3. Apply the FTCM and VideoPose3D models
   - SemGCN + ResNet
   - P-STMO + FTCM
   - UnchunkedGenerator, Visualize 3D, VideoToPose3D
   - StrideTransform + YOLO
   - Attention3DHPE – pretrain data to

# Try to find another 3D visualization -> Attention3DHPE
Epoch 80, batch size 1024, learning rate 0.001. Training is still running; one epoch takes 313 minutes.

**What if we use the paper's pretrained data for the visualization?**

# StrideTransform 3D Visualization Result
MPJPE 45.35
<iframe src="https://drive.google.com/file/d/1g5xbMtWZZJdv6D2dzdj4C_OizAX6YdJq/preview" width="640" height="480" allow="autoplay"></iframe>

In rotated poses there is still a problem: some keypoints are lost.
<img src="https://hackmd.io/_uploads/rJNhTSfL0.jpg" alt="image" width="640" height="480">

# Try FTCM

# Try to find another 3D visualization -> StrideTransform
- epoch 21
- batch_size: 196
- channel: 256
- layers: 3
- n_joints: 17

Training without refine -> MPJPE: 44.32
Training with refine -> still running

**After training: test the trained model, then visualize in 3D.**

# Visualize Model P-STMO, try 2
<iframe src="https://drive.google.com/file/d/1mGpCjsSYxp-Nf4JcLPBdRwEN9Kvn1RJo/preview" width="640" height="480" allow="autoplay"></iframe>

**Problem: many keys were not found when loading the checkpoint with torch -> study the VideoToPose3D 3D visualization again to fix this.**

Key pose_emb.bias not found in model_dict
Key pose_emb.weight_g not found in model_dict
Key pose_emb.weight_v not found in model_dict
Key mlpmixer.gmlp.0.fn.ln1_0.weight not found in model_dict
Key mlpmixer.gmlp.0.fn.ln1_0.bias not found in model_dict
Key mlpmixer.gmlp.0.fn.fc1_0.weight not found in model_dict
Key mlpmixer.gmlp.0.fn.fc1_0.bias not found in model_dict
Key mlpmixer.gmlp.0.fn.SG0.ecl.conv1.weight not found in model_dict
Key mlpmixer.gmlp.0.fn.SG0.ecl.conv2.weight not found in model_dict
Key mlpmixer.gmlp.0.fn.SG0.ecl.ln1.weight not found in model_dict
Key mlpmixer.gmlp.0.fn.SG0.ecl.ln1.bias not found in model_dict
Key mlpmixer.gmlp.0.fn.SG0.ecl.ln2.weight not found in model_dict
Key mlpmixer.gmlp.0.fn.SG0.ecl.ln2.bias not found in model_dict
Key mlpmixer.gmlp.0.fn.SG0.gfp.lfilter not found in model_dict
Key mlpmixer.gmlp.0.fn.fc2_0.ln_1024.weight not found in model_dict
Key mlpmixer.gmlp.0.fn.fc2_0.ln_1024.bias not found in model_dict
Key mlpmixer.gmlp.0.fn.fc2_0.ln_512.weight not found in model_dict
Key mlpmixer.gmlp.0.fn.fc2_0.ln_512.bias not found in model_dict
Key mlpmixer.gmlp.0.fn.fc2_0.ln_256.weight not found in model_dict
Key mlpmixer.gmlp.0.fn.fc2_0.ln_256.bias not found in model_dict
Key mlpmixer.gmlp.0.fn.fc2_0.ln_128.weight not found in model_dict
Key mlpmixer.gmlp.0.fn.fc2_0.ln_128.bias not found in model_dict
Key mlpmixer.gmlp.1.fn.ln1_1.weight not found in model_dict
Key mlpmixer.gmlp.1.fn.ln1_1.bias not found in model_dict
Key mlpmixer.gmlp.1.fn.fc1_1.weight not found in model_dict
Key mlpmixer.gmlp.1.fn.fc1_1.bias not found in model_dict

# Visualize Model
**SEMGCN**
<iframe src="https://drive.google.com/file/d/1-4G5ZopBJk6s-NMzaxVXxTkAGefYp8kI/preview" width="640" height="480" allow="autoplay"></iframe>

**VideotoPose3D**
<iframe src="https://drive.google.com/file/d/1aLUXDyodgLubtJ__upTKaLXM_3K6vg-M/preview" width="640" height="480" allow="autoplay"></iframe>

**P-STMO trained on the VideoPose3D base**
<iframe src="https://drive.google.com/file/d/1iyAhhzMzLUSD0CQxfsVo-dHR81-2BULZ/preview" width="640" height="480" allow="autoplay"></iframe>

# Track Code
Tracing the code and looking for differences and correlations: RS-Net & SemGCN
<iframe width="1078" height="800" src="https://miro.com/app/live-embed/uXjVKBHVmRg=/?moveToViewport=-15201,-3222,14152,6403&embedId=637448034349" frameborder="0" scrolling="no" allow="fullscreen; clipboard-read; clipboard-write" allowfullscreen></iframe>
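Regarding the `Key ... not found in model_dict` messages above: they usually mean the checkpoint was saved for a different model variant than the one being instantiated. A hedged sketch of a tolerant loader that keeps only the weights whose names and shapes match (generic PyTorch, not the P-STMO loading code):

```python
import torch

def load_matching_weights(model, checkpoint_path):
    """Load only the checkpoint entries whose key and shape match the model."""
    state = torch.load(checkpoint_path, map_location="cpu")
    if isinstance(state, dict) and "state_dict" in state:  # some checkpoints wrap the weights
        state = state["state_dict"]
    model_dict = model.state_dict()
    matched = {k: v for k, v in state.items()
               if k in model_dict and v.shape == model_dict[k].shape}
    skipped = [k for k in state if k not in matched]
    model_dict.update(matched)
    model.load_state_dict(model_dict)
    print(f"loaded {len(matched)} tensors, skipped {len(skipped)}")
    return skipped
```

Skipped keys still indicate a real architecture mismatch, so the proper fix is to build the exact model configuration the checkpoint was trained with; the filter only makes the mismatch visible instead of fatal.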
scrolling="no" allow="fullscreen; clipboard-read; clipboard-write" allowfullscreen></iframe> # Track Code Trying tracking the code and search different correlation FTCM & P-STMO <iframe width="1078" height="800" src="https://miro.com/app/live-embed/uXjVKBCZAgM=/?moveToViewport=-6067,-2247,8011,3624&embedId=257801627753" frameborder="0" scrolling="no" allow="fullscreen; clipboard-read; clipboard-write" allowfullscreen></iframe> Next RS-Net & SEMGCN # Result RS-Net & FTCM **RS-Net** frame 243 Batchsize 512 epoch 31 <img src="https://hackmd.io/_uploads/BJisGDhQC.png" alt="image" width="400" height="280"> Frame 1 ===Action=== ==p#1 mm== =p#2 mm= Directions 113.80 47.42 Discussion 130.63 54.06 Eating 85.72 47.56 Greeting 130.99 57.24 Phoning 92.98 53.31 Photo 191.98 70.40 Posing 127.63 50.84 Purchases 73.22 40.14 Sitting 79.82 50.35 SittingDown 107.14 64.03 Smoking 115.83 52.74 Waiting 136.16 51.88 WalkDog 202.82 63.29 Walking 125.82 52.67 WalkTogether 138.30 59.89 Average 123.52 54.39 **FTCM** frame 243 Batchsize 512 epoch 80 <img src="https://hackmd.io/_uploads/SkzMSv2mA.png" alt="image" width="400" height="280"> Frame 1 ===Action=== ==p#1 mm== =p#2 mm= Directions 120.65 39.30 Discussion 148.97 46.85 Eating 65.00 41.46 Greeting 117.38 45.44 Phoning 76.52 43.69 Photo 175.15 54.63 Posing 85.96 40.73 Purchases 64.61 35.42 Sitting 83.00 51.40 SittingDown 107.57 56.58 Smoking 99.56 46.84 Waiting 118.36 45.56 WalkDog 162.31 46.13 Walking 38.97 27.65 WalkTogether 41.23 29.45 Average 100.35 43.41 # Run Experiment FTCM 5/15 start run experiment with composition number of frame 243 batch size 512 epoch 80 date 5/17 still run epoch 33 question: result almost the same with GLA-GCN, is there any correlation with inplace=false ? RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [512, 1024]], which is output 0 of ReluBackward0, is at version 1; expected version 0 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with 1. torch.autograd.set_detect_anomaly(True). 2. inplace=true -> inplace=false # Technical Problem train GLA-GCN after run 6 days..there was power outage in all NDHU that the announcement 2 days after run the training.. after see the result before it and after get advise from Professor that the saturation not good enough..week 3 the train of GLA-GCN not continue question human3.6m train : S1,S5,S6,S7,S8 eval data : S9 Test data : S11 **train data**: train the data with augmented training generator **eval data**: train data with Evaluation Training Generator 3D_train : This metric measures the loss on the training dataset during the training process. Specifically, it tracks how well the model predicts 3D poses from 2D inputs during each training epoch. 3D_eval : This metric measures the loss on the training dataset, but in evaluation mode (i.e., without training/learning updates). This is typically used to check overfitting and to see how the model performs on the training data when it's not actively learning. 3D_valid : This metric measures the loss on the validation dataset. It is used to evaluate how well the model generalizes to unseen data. It is computed in a similar manner as the training loss from test data # Prepare experiment RS-Net -> Videopose3D preparation step 1. prepare the github 2. prepare the dataset 3. 
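The subject split above can be checked directly against the Human3.6M archives listed in the preparation steps below. A small sketch, assuming the public VideoPose3D data format (a `positions_3d` entry holding a pickled dict keyed by subject and then action); the key names should be verified against the actual files:

```python
import numpy as np

# Load the 3D ground truth prepared for VideoPose3D-style code.
data_3d = np.load("data/data_3d_h36m.npz", allow_pickle=True)["positions_3d"].item()

train_subjects = ["S1", "S5", "S6", "S7", "S8"]
eval_subject, test_subject = "S9", "S11"

train_set = {s: data_3d[s] for s in train_subjects}
print(sorted(data_3d.keys()))              # subjects present in the archive
print(list(train_set["S1"].keys())[:5])    # a few action names for S1
```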
# Prepare experiment RS-Net -> Videopose3D
Preparation steps:
1. Prepare the GitHub repo
2. Prepare the dataset
3. Modify the Human3.6M dataset to follow the PoseAug folder layout:
```
${PoseAug}
├── data
    ├── data_3d_h36m.npz
    ├── data_2d_h36m_gt.npz
    ├── data_2d_h36m_detectron_ft_h36m.npz
    ├── data_2d_h36m_cpn_ft_h36m_dbb.npz
    ├── data_2d_h36m_hr.npz
```
4. Modify the extra datasets to follow the PoseAug folder layout:
```
${PoseAug}
├── data_extra
    ├── bone_length_npy
        ├── hm36s15678_bl_templates.npy
    ├── dataset_extras
        ├── mpi_inf_3dhp_valid.npz
        ├── ... (not in use)
    ├── test_set
        ├── test_3dhp.npz
    ├── prepare_data_3dhp.ipynb
    ├── prepare_data_3dhp.py
```
5. Try running the process in Colab; problem: it needs more GPU memory, so the batch size was reduced from 512 down to 256
6. Training process on the 4090 -> waiting list

# Prepare experiment FTCM -> Videopose3D
Preparation steps:
1. Prepare the GitHub repo
2. Prepare the dataset
3. Modify the Human3.6M dataset to follow the VideoPose3D folder layout:
```
${VideoPose3D}
├── data
    ├── data_3d_h36m.npz
    ├── data_2d_h36m_gt.npz
    ├── data_2d_h36m_detectron_ft_h36m.npz
    ├── data_2d_h36m_cpn_ft_h36m_dbb.npz
    ├── data_2d_h36m_hr.npz
```
4. Modify the MPI-INF-3DHP dataset to follow the P-STMO folder layout:
```
${P-STMO}
├── data
    ├── data_test_3DHP.npz
    ├── data_train_3DHP.npz
```
5. Try running the process in Colab; it needs more GPU memory, so the batch size was reduced from 256 down to 96. Still having problems -> on the waiting list to try the 4090 server

# Experiment GLA-GCN
Training for 80 epochs: the paper used 2 RTX 3090 cards and needed 2 days. This experiment uses a single RTX 4090, started Monday, May 6; it is still at epoch 39 after 4 days, so it could take 7-8 days in total.

# Prepare experiment GLA-GCN -> Videopose3D
Preparation steps:
1. Prepare the GitHub repo
2. Prepare the dataset
3. Modify the dataset to follow VideoPose3D
4. Prepare common tools from the other experiments
![glagcn4](https://hackmd.io/_uploads/Bk7--o_bR.jpg)
5. Training process -> prepare the server

# Fast Track Article
## GLA-GCN: Global-Local Adaptive Graph Convolutional Network for 3D Human Pose Estimation
ICCV 2023

Contributions:
1. We propose a global-local learning architecture that leverages the global spatiotemporal representation and local joint representation in the GCN-based model for 3D human pose estimation.
2. We are the first to introduce an individually connected layer that has two components to divide joint nodes and input the joint node representation for 3D pose joint estimation, instead of estimating from pooled features.

![glagcn](https://hackmd.io/_uploads/rJuNOcO-R.jpg)
![glagcn1](https://hackmd.io/_uploads/SJaStcO-A.jpg)
![glagcn2](https://hackmd.io/_uploads/SkEYqcuZC.jpg)
![glagcn3](https://hackmd.io/_uploads/SyYF95dZ0.jpg)

# Fast Track Article
## FTCM: Frequency-Temporal Collaborative Module for Efficient 3D Human Pose Estimation in Video
IEEE Transactions on Circuits and Systems for Video Technology, Vol. 34, No. 2, February 2024. **Impact factor: 8.4**

Contributions:
1. We propose the frequency pose-mixing unit (FPM) to capture the global context for 3D-HPE in video. To our knowledge, we are the first to introduce a frequency transformation into the studied 3D-HPE task.
2. We design a simple yet effective self-gating module (TPM) to capture local axial context, modeling the inter-pose relations within several adjacent frames.
3. In the experiments, we demonstrate that the proposed FTCM achieves comparable or superior results to heavily parameterized methods on two benchmarks while requiring a much lower computational cost.

![FTCM](https://hackmd.io/_uploads/Skus2EybA.jpg)
![ftcm2](https://hackmd.io/_uploads/ryzyT4k-R.jpg)
![ftcm3](https://hackmd.io/_uploads/Hk_SaVJbR.jpg)
![ftcm1](https://hackmd.io/_uploads/BJ-C1ByWR.jpg)
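The "frequency transformation" mentioned in contribution 1 can be illustrated in a few lines. This is purely an illustration of moving a pose sequence into the frequency domain along its temporal axis (and back); it is not FTCM's actual FPM module:

```python
import torch

# x: (batch, frames, joints, 3) sequence of 3D poses
x = torch.randn(2, 243, 17, 3)

spec = torch.fft.rfft(x, dim=1)             # complex spectrum along the temporal axis
low_pass = torch.zeros_like(spec)
low_pass[:, :16] = spec[:, :16]             # keep only the 16 lowest frequencies
x_smooth = torch.fft.irfft(low_pass, n=x.shape[1], dim=1)

print(spec.shape)      # torch.Size([2, 122, 17, 3]), complex values
print(x_smooth.shape)  # torch.Size([2, 243, 17, 3])
```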
## Regular Splitting Graph Network for 3D Human Pose Estimation
IEEE Transactions on Image Processing, Vol. 32, 2023. **Impact factor: 10.6**

The key contributions of this work can be summarized as follows:
1. We propose a higher-order regular splitting graph network for 3D human pose estimation using matrix splitting in conjunction with weight and adjacency modulation.
2. We introduce a new objective function for training our proposed graph network by leveraging the regularizer of the elastic net regression.
3. We design a variant of the ConvNeXt residual block and integrate it into our graph network architecture.
4. We demonstrate through experiments and ablation studies that our proposed model achieves state-of-the-art performance in comparison with strong baselines.

![rsnet](https://hackmd.io/_uploads/ByM_REyZR.jpg)
![rsnet2](https://hackmd.io/_uploads/SJyzJrkZ0.jpg)
![rsnet3](https://hackmd.io/_uploads/SywLkSJWC.jpg)
![rsnet1](https://hackmd.io/_uploads/S1SFJBkZA.jpg)

## GraphMLP: A Graph MLP-Like Architecture for 3D Human Pose Estimation
arXiv, Computer Vision and Pattern Recognition, 2023. Research impact score 2.4

Our main contributions are summarized as follows:
1. We present, to the best of our knowledge, the first MLP-like architecture, called GraphMLP, for 3D human pose estimation. It combines the advantages of modern MLPs and GCNs, including globality, locality, and connectivity.
2. The novel SG-MLP and CG-MLP blocks are proposed to encode the graph structure of human bodies within MLPs to obtain domain-specific knowledge about the human body, while enabling the model to capture both local and global interactions.
3. A simple and efficient video representation is further proposed to extend our GraphMLP to the video domain flexibly. This representation enables the model to effectively process arbitrary-length sequences with negligible computational cost gains.
4. Extensive experiments demonstrate the effectiveness and generalization ability of the proposed GraphMLP, and show new state-of-the-art results on two challenging datasets, i.e., Human3.6M and MPI-INF-3DHP.

![graphmlp](https://hackmd.io/_uploads/BkIeMB1ZC.jpg)
![graphmlp1](https://hackmd.io/_uploads/r1yrzHy-0.jpg)
![graphmlp2](https://hackmd.io/_uploads/r12ozryZ0.jpg)
![graphmlp3](https://hackmd.io/_uploads/B1EhGByZA.jpg)

# Find journal articles with a GitHub link
**2022**
- GraphMLP: A Graph MLP-Like Architecture for 3D Human Pose Estimation [link](https://github.com/Vegetebird/GraphMLP?tab=readme-ov-file)

**2023**
- Regular Splitting Graph Network for 3D Human Pose Estimation
- Human Pose as Compositional Tokens [link](https://github.com/Gengzigang/PCT)
- FTCM: Frequency-Temporal Collaborative Module for Efficient 3D Human Pose Estimation in Video [link](https://github.com/zhenhuat/FTCM)

# Learn the differences between the experiment code related to VideoPose3D
1. Open all the source code and sub-source code in the experiments **ongoing**
2. Find the differences between them **ongoing**
3. Find other articles related to VideoPose3D that provide code

# Learn to take keypoint/joint data for visualization from SemGCN
1. Try to define the keypoint data needed to visualize the skeleton model
2. Articles related to 3D human pose angles / ergonomics:
   - 3D Human Pose Reconstruction for Ergonomic Posture Analysis
   - Ergonomic postural assessment using a new open-source human pose estimation technology (OpenPose)
   - Industrial Ergonomics Risk Analysis Based on 3D-Human Pose Estimation
3. Try to get the degree/angle at every keypoint (see the sketch below)

![SKEL](https://hackmd.io/_uploads/SJUNM5QyR.jpg)
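For step 3, a minimal sketch of computing the angle at a joint from three predicted 3D keypoints (plain NumPy; the joint names and example coordinates are only illustrative):

```python
import numpy as np

def joint_angle(parent, joint, child):
    """Angle (in degrees) at `joint` between the joint->parent and joint->child bones,
    where each argument is an (x, y, z) keypoint from the 3D skeleton."""
    v1 = np.asarray(parent, dtype=float) - np.asarray(joint, dtype=float)
    v2 = np.asarray(child, dtype=float) - np.asarray(joint, dtype=float)
    cos = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2) + 1e-8)
    return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))

# Example: elbow angle from shoulder, elbow, and wrist keypoints
print(joint_angle([0.0, 0.0, 0.0], [0.30, 0.0, 0.0], [0.30, 0.25, 0.10]))
```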
# Learn Model Correlations
1. From VideoPose3D:
   - TemporalModelBase
   - TemporalModel: reference 3D pose estimation model with temporal convolutions
   - TemporalModelOptimized1f: 3D pose estimation model optimized for single-frame batching, i.e. where batches have input length = receptive field and output length = 1; this scenario is only used for training when stride == 1
2. From SemGCN:
   - GraphConv
   - ResGraphConv
   - GraphNonLocal
   - SemGCN
3. From GAST:
   - GraphAttentionBlock
   - SpatioTemporalModelBase
   - SpatioTemporalModel
   - SpatioTemporalModelOptimized1f

# GAST-Net Experiment
Epoch 80
![gast1](https://hackmd.io/_uploads/S14Bmu90a.jpg)
![gast](https://hackmd.io/_uploads/S1EU7O9Cp.jpg)

# Train result P-STMO
Epoch 80
![pstmo1](https://hackmd.io/_uploads/Sk-yGvYaa.jpg)
MPJPE: 38.25

# Uplift report
-> training error; there is no explanation in the GitHub repo

# Preparation AMASS (AMASS-to-3DHPE) for Uplift
Conversion of the AMASS dataset to a pure joint-based, Human3.6M-compatible 3D human pose dataset. AMASS uses the SMPL+H body model. Follow the AMASS instructions and place the SMPL+H body model and the DMPLs.
![amass](https://hackmd.io/_uploads/H14di9BTT.jpg)

# Preparation uplift research (Conference WACV 2023)
1. Prepare the GitHub repo
2. Prepare the H36M dataset for training - done
3. Prepare the AMASS **mocap** data for training, AMASS (AMASS-to-3DHPE) - 17 datasets (**in process**); take the SMPL+H G data:
   CMU, DanceDB, MPILimits, TotalCapture, EyesJapanDataset, HUMAN4D, KIT, BMLhandball, BMLmovi, BMLrub, EKUT, TCDhandMocap, ACCAD, Transitionsmocap, MPIHDM05, SFU, MPImosh

# Train Result Videopose3D (Conf CVPR 2019)
VideoPose3D specification: 30 CDF files from every folder in subjects 1, 5, 6, 7, 8, 9, 11.

**Training specification (adopted VideoPose3D)**
- Epoch 80
- Training data: data_2d_h36m_gt and data_3d_h36m, produced by running the Human3.6M preparation code from VideoPose3D (adopted from Martinez)
- Trained on the lab server RTX 3090

- Training with 81 frames: trainable parameter count **12753971**, testing on 550644 frames, training on 1559752 frames
- Training with 243 frames: trainable parameter count **16952371**, testing on 550644 frames, training on 1559752 frames

![243_81](https://hackmd.io/_uploads/rkXn7eysT.png)

Training comparison
![comparetrain1](https://hackmd.io/_uploads/HkLTmgkia.png)

# Evaluation Result Videopose3D
This evaluation result is taken from the training experiment on the H36M dataset.
- Mean Per Joint Position Error (MPJPE)
- Procrustes-aligned Mean Per Joint Position Error (P-MPJPE); Procrustes analysis is a method to compare the geometric properties of shapes
- Normalized Mean Per Joint Position Error (N-MPJPE)
- Mean Per Joint Velocity Error (MPJVE)

Protocol #1 (MPJPE) action-wise average: 71.3 mm
Protocol #2 (P-MPJPE) action-wise average: 53.6 mm
Protocol #3 (N-MPJPE) action-wise average: 68.9 mm
Velocity (MPJVE) action-wise average: 2.69 mm

# Train Videopose3D - Build Human3.6M dataset
VideoPose3D specification: 30 CDF files from every folder in subjects 1, 5, 6, 7, 8, 9, 11.
This experiment: 120 CDF files from every folder in subjects 1, 5, 6, 7, 8, 9, 11.
- Process: train the model on the Human3.6M dataset for 80 epochs (27 frames)
![trainvideo](https://hackmd.io/_uploads/B1DxNpBqa.jpg)
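Protocol #1 (MPJPE) and Protocol #2 (P-MPJPE) reported in the evaluation section above can be computed with a short reference implementation. A sketch in plain NumPy, per frame (the official evaluation scripts of each repo remain the source of truth): MPJPE is the mean Euclidean joint error, and P-MPJPE is the same error after a rigid Procrustes alignment of the prediction to the ground truth.

```python
import numpy as np

def mpjpe(pred, gt):
    """Mean per-joint position error for (J, 3) arrays, in the input units (e.g. mm)."""
    return float(np.mean(np.linalg.norm(pred - gt, axis=-1)))

def p_mpjpe(pred, gt):
    """MPJPE after similarity (Procrustes) alignment of pred to gt."""
    mu_p, mu_g = pred.mean(axis=0), gt.mean(axis=0)
    X, Y = pred - mu_p, gt - mu_g
    U, s, Vt = np.linalg.svd(X.T @ Y)
    if np.linalg.det(U @ Vt) < 0:      # avoid an improper rotation (reflection)
        Vt[-1] *= -1
        s[-1] *= -1
    R = U @ Vt
    scale = s.sum() / (X ** 2).sum()
    return mpjpe(scale * X @ R + mu_g, gt)

pred = np.random.rand(17, 3)
gt = pred + 0.01 * np.random.randn(17, 3)
print(mpjpe(pred, gt), p_mpjpe(pred, gt))
```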
# Continue Experiment SPIN
## Modification: LSP dataset
LSP: we use LSP both for training and evaluation. You need to download the high-resolution version of the dataset (the original LSP dataset, for training) and the low-resolution version (the LSP dataset, for evaluation). After you unzip the dataset files, please complete the relevant root paths of the datasets in the file config.py.

**But the problem is that the LSP website is down.**

Alternative solutions:
1. Use the LSP dataset from the Deep Lake website (1000 training and 1000 test images)
![lsp](https://hackmd.io/_uploads/BkbeTY2ta.jpg)
2. Use the LSP dataset from dbcollection (2000 images for both train and test)
![lsp1](https://hackmd.io/_uploads/SyUW6FnK6.jpg)

# Continue Experiment SPIN
## Preparing Data
The datasets that SPIN supports are:
1. Human3.6M (training) - **ready**
![spinh36](https://hackmd.io/_uploads/Hke96HmKa.jpg)
2. MPI-INF-3DHP (training and evaluation) - **ready**
![spinmpi](https://hackmd.io/_uploads/S1znTH7FT.jpg)
3. LSP (training and evaluation) - server down
4. COCO (training) - **ready**
![spincoco](https://hackmd.io/_uploads/HJQJCHmYT.jpg)
5. 3DPW (evaluation) - **ready**
6. UPi-S1h (evaluation) - **ready**
7. HR-LSPET (training set of LSP) - **ready**
8. MPII (training) - **ready**
![spin1](https://hackmd.io/_uploads/S1iYnHQKT.jpg)

## Next step
1. Modify the code to remove the LSP dataset; consider switching to AGORA/BEDLAM, following ReFit
2. Prepare the Human3.6M evaluation
3. Evaluation code:
   - Human3.6M Protocol 1: --dataset=h36m-p1
   - Human3.6M Protocol 2: --dataset=h36m-p2
   - 3DPW: --dataset=3dpw
   - MPI-INF-3DHP: --dataset=mpi-inf-3dhp

# Experiment: SemGCN Visualization
SemGCN analysis
![semgcn1](https://hackmd.io/_uploads/rJ0vxu56p.jpg)
MPJPE: 45.4629
P-MPJPE: 35.6059

# Experiment: SemGCN Visualization with Human3.6M
- Error when producing a GIF with ImageMagick
- Changed to MP4 with ffmpeg

1. Epoch: 190 | Error: 63.748 - pretrained using ckpt_linier.pth.tar - subject S11 - bitrate 3000
<iframe src="https://drive.google.com/file/d/1KUgAKc9oVc0wFpRz0gWMjrfXzOkOYYmt/preview" width="640" height="480" allow="autoplay"></iframe>
2. Epoch: 16 | Error: 41.146 - pretrained using ckpt_semgcn.pth.tar - subject S11 - bitrate 3000
<iframe src="https://drive.google.com/file/d/1QXZU8niDcD_lm2Ey8tcfLd5RGE3OgcoE/preview" width="640" height="480" allow="autoplay"></iframe>
3. Epoch: 39 | Error: 42.32 - pretrained using ckpt_semgcn.pth.tar - subject S11 - bitrate 3000
<iframe src="https://drive.google.com/file/d/1vuKd6dWiqI4mbshIIwTl0o1LnB2y6xsN/preview" width="640" height="480" allow="autoplay"></iframe>
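The GIF/ImageMagick failure above can be avoided by saving the matplotlib animation with the ffmpeg writer instead. A minimal, generic sketch (not the SemGCN visualization script itself; the random poses are placeholders, and ffmpeg must be installed on the system):

```python
import matplotlib
matplotlib.use("Agg")
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.animation import FuncAnimation
from mpl_toolkits.mplot3d import Axes3D  # noqa: F401 (registers the 3d projection)

poses = np.random.rand(60, 17, 3)          # placeholder (frames, joints, xyz) sequence

fig = plt.figure()
ax = fig.add_subplot(111, projection="3d")

def update(i):
    ax.cla()
    ax.scatter(poses[i, :, 0], poses[i, :, 1], poses[i, :, 2])
    ax.set_title(f"frame {i}")

anim = FuncAnimation(fig, update, frames=len(poses))
# The ffmpeg writer produces an MP4 directly; bitrate 3000 matches the setting used above.
anim.save("pose.mp4", writer="ffmpeg", fps=25, bitrate=3000)
```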
# Personal Meeting
Log-Book contribution: try to find articles on head / hand pose.

## Human3.6M Dataset 6/1
1. Human3.6M obtained from the Stubbornhuang website
![sutborn](https://hackmd.io/_uploads/H1-_775dp.jpg)
2. Convert the Human3.6M dataset for SPIN
![spinhuman](https://hackmd.io/_uploads/HkUmcXqup.jpg)

## Experiments Related to ReFit and EFT
**SPIN**: Learning to Reconstruct 3D Human Pose and Shape via Model-fitting in the Loop [SPIN](https://github.com/nkolot/SPIN)
![jk1](https://hackmd.io/_uploads/HkaGSNAIp.jpg)
![jk1_shape](https://hackmd.io/_uploads/HyOmBN08a.png)
![jk1_shape_side](https://hackmd.io/_uploads/SJ1ESEALT.png)
[Article](https://github.com/HongwenZhang/PyMAF)
<iframe src="https://drive.google.com/file/d/1qU03zBdMDVdDre5WRzCXEMycEArWVHQF/preview" width="640" height="480" allow="autoplay"></iframe>
![image](https://hackmd.io/_uploads/SJGtKtNLp.png)

**SPEC: Seeing People in the Wild with an Estimated Camera [ICCV 2021]**
![spec](https://hackmd.io/_uploads/rkwJ5FVU6.jpg)
![spec1](https://hackmd.io/_uploads/S1izqFNUa.jpg)

## HPE Articles by Taiwanese Authors
**LEARNING MONOCULAR 3D HUMAN POSE ESTIMATION WITH SKELETAL INTERPOLATION**
Shang-Hong Lai, Professor. E-mail: lai@cs.nthu.edu.tw, phone: 03-5742958

**Improvement of Human Pose Estimation and Processing With the Intensive Feature Consistency Network**
Ming-Hwa Sheu, Department of Electronic Engineering, National Yunlin University of Science and Technology. Phone: +886-5-5342601 #4320, email: sheumh@yuntech.edu.tw

**Semi-Supervised 3D Human Pose Estimation by Jointly Considering Temporal and Multiview Information**
Wei-Ta Chu, Department of Computer Science and Information Engineering, National Cheng Kung University. Email: wtchu@cs.ccu.edu.tw, tel: +886-5-2720411 ext 33125

## Human Vision & Application
**Submitted**

**Manuscript file**: a Word document with figures and tables placed in the body of the text where they are referenced. LaTeX documents with figures and tables are compressed into a **ZIP format**; these will be compiled into a PDF for peer review.
**Figures (optional)**: upload figures in the order in which they appear in the manuscript, with correct labelling, e.g. Figure 1.
**Tables (optional)**: upload tables in the order in which they appear in the manuscript, with correct labelling, e.g. Table 1. Any tables that are too large to be presented graphically within the body of the manuscript should instead be uploaded as supplementary material.
**Title**: this is the title seen by potential reviewers. It must match the title as it appears in the manuscript file.
**Abstract**: this is the abstract seen by potential reviewers. It must match the abstract as it appears in the manuscript file.
**Cover letter**: provide a cover letter outlining the research. The cover letter should briefly discuss the context and importance of the submitted work and why it is appropriate for the journal.
**Author**: affiliated institutions - add the institutions that the manuscript's authors are affiliated with; once added, institutions can be matched to authors. Authors' information - add all author names in the order they should appear in the published manuscript: given names, family name, email address, primary affiliation.

## Experiment 3, started 25/11 (not Human3.6M based)
[EFT](https://github.com/facebookresearch/eft)
![eft](https://hackmd.io/_uploads/rynBTnfBp.jpg)

Article
### Exemplar Fine-Tuning for 3D Human Model Fitting Towards In-the-Wild 3D Human Pose Estimation

**Abstract**
Unlike 2D annotations, image datasets with 3D ground-truth annotations are very difficult to obtain in the wild; we address this problem by augmenting existing 2D datasets with high-quality 3D pose fits.
This simplified pipeline affords additional improvements, including injecting extreme crop augmentations to better reconstruct highly truncated people, and incorporating auxiliary inputs to improve 3D pose estimation accuracy. We also use our method to introduce new benchmarks for the study of real-world challenges such as occlusions, truncations, and rare body poses. We introduce Exemplar Fine-Tuning (EFT). EFT combines the re-projection accuracy of fitting methods like SMPLify with a 3D pose prior implicitly captured by a pre-trained 3D pose regressor network.

**Contributions**
1. Providing large-scale and high-quality pseudo-GT 3D pose annotations that are sufficient to train state-of-the-art regressors without indoor 3D datasets;
2. Running an extensive analysis of the quality of the pseudo-annotations generated via EFT against alternative approaches;
3. Demonstrating the benefits of integrating the pseudo-annotations with extreme crop augmentation and auxiliary input representations;
4. Introducing new 3D human pose benchmarks to assess regressors in less-studied real-world scenarios.

**Exemplar Fine-Tuning**
The human is represented by a parametric model that spans variation in both body shape and motion. For example, the SMPL model parameters comprise the pose, which controls the rotations of 24 body joints, and the shape, which controls body shape by means of 10 principal directions of variation. For reconstruction, fitting-based approaches take 2D image cues such as joints, silhouettes, and part labels, and optimize the model parameters to fit the 3D model to the 2D cues. In contrast, regression-based approaches predict the model parameters directly from image-based cues I such as raw RGB values and sparse and dense keypoints; the mapping is implemented by a neural network. Exemplar Fine-Tuning (EFT) combines the advantages of fitting and regression methods. The idea is to interpret the network as a re-parameterization of the model as a function of the network parameters w.

Implementation details. For the network, we use HMR pre-trained using SPIN. The input RGB image I is a 224 × 224 crop around the 2D keypoint annotations. We switch off batch normalization and dropout. We iterate until the average 2D re-projection loss is less than 3 pixels, or up to a maximum of 50 iterations (100 for OCHuman, as the initial regressed pose tends to contain larger errors).

**4. The EFT Training and Validation Datasets**
We experiment with augmenting the COCO, MPII, LSPet, PoseTrack, and OCHuman datasets. Most of these datasets come with a Train and Val split.
![eftdataset](https://hackmd.io/_uploads/HJi4fxhrp.jpg)

Human Study and Validation. We conduct a human study and validation on Amazon Mechanical Turk (AMT) to assess the quality of the EFT fits. First, we use A/B testing to compare EFT and SMPLify fits. To this end, we show 500 randomly chosen images from the MPII, COCO, and LSPet datasets to human annotators on AMT and ask them whether they prefer the EFT or the SMPLify reconstruction.

Truncated 3DPW Dataset. In order to assess the robustness of algorithms to people who are only partially visible due to view truncation (a very common case in applications), we also propose a new protocol for the 3DPW dataset using a pre-defined set of aggressive image crops (see fig. 2).
![efttrunc](https://hackmd.io/_uploads/S1wgQlnB6.jpg)
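A heavily simplified sketch of the EFT procedure described above: fine-tune a copy of the pretrained regressor on a single example until its 2D re-projection error is small, then keep the resulting prediction as the pseudo ground truth. Function names, the optimizer, and the learning rate are placeholders; the real method also freezes batch norm/dropout and weights keypoints by confidence.

```python
import copy
import torch
import torch.nn.functional as F

def exemplar_fine_tune(regressor, image, keypoints_2d, reproject_fn,
                       max_iters=50, pixel_thresh=3.0, lr=5e-6):
    """Fine-tune a copy of `regressor` on ONE example; return its final prediction
    as the pseudo ground-truth fit. `reproject_fn` projects predicted SMPL joints to 2D."""
    model = copy.deepcopy(regressor)      # keep the original pretrained weights untouched
    model.train()
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(max_iters):
        params = model(image)             # predicted SMPL pose/shape/camera
        pred_2d = reproject_fn(params)
        loss = F.mse_loss(pred_2d, keypoints_2d)
        if loss.sqrt().item() < pixel_thresh:   # stop once the RMS error is below ~3 px
            break
        opt.zero_grad()
        loss.backward()
        opt.step()
    with torch.no_grad():
        return model(image)
```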
**5. Training Pose Regressors with EFT Datasets**
Our EFT datasets allow us to train 3D pose regressors from scratch using a simple and clean pipeline. Specifically, we train the HMR model (without the discriminator) using the same hyperparameters as used in SPIN. We explore two additional techniques: applying extreme crop augmentation and adding auxiliary inputs.

Augmentation by extreme cropping. A shortcoming of previous 3D pose estimation methods is the assumption that most of the human body is visible in the input image. We propose to augment the training data with extreme cropping; we generate random crops in the same way as for the Truncated 3DPW dataset. We trigger crop augmentation with a 30% chance, randomly choosing a truncated bounding box among the pre-computed ones shown in fig. 2.

Auxiliary input for the 3D pose regressor. Recent methods demonstrate that other types of input encoding, such as DensePose or body-part segmentation, can improve 3D pose regressors; we train a pose regressor by concatenating the standard RGB input with an additional input encoding.

**6. Results**
EFT datasets for learning models. We have assessed the EFT-augmented datasets via a human study; we now assess them based on how well they work when used to train pose regressors from scratch. We use as the single training loss the prediction error against 3D annotations (actual or pseudo), a major simplification compared to approaches that mix, and thus need to balance, 2D and 3D supervision.
![efttab2](https://hackmd.io/_uploads/HyxcBg2Sp.jpg)

New 3D human pose benchmarks from the EFT datasets. We evaluate the performance of various models on our new benchmark datasets with pseudo GT, OCHuman [70] and LSPet [24]. These datasets have challenging body poses, camera viewpoints, and occlusions. We found that most models trained on other datasets struggle on these benchmarks, showing more than 100 mm error, as shown in table 3. The models trained with pseudo-annotations on similar data (the training sets of LSPet and OCHuman) show better performance (less than 100 mm).

Testing with truncated input on 3DPW. We use the protocol defined in sec. 4 on 3DPW to assess performance on truncated body inputs.
![trunceft](https://hackmd.io/_uploads/SyS38x2ra.jpg)

1. Preparing the dataset (without using H36M)
![eftdata](https://hackmd.io/_uploads/SyOGR2fH6.jpg)
   Fitting data (JSON format)
![eft](https://hackmd.io/_uploads/BJcYGaMBp.jpg)
2. Training set: MS COCO (in process), MPII, LSPet
3. Preparing the system: still using Docker - PyTorch 1.7.0, CUDA 10.2, cuDNN 7, Ubuntu 20.04, Python 3.7

## Experiment 2 (Human3.6M based)
[Posenet](https://github.com/mks0601/3DMPPE_POSENET_RELEASE)
Their experiment used Ubuntu 16.04, CUDA 9.0, cuDNN 7.1, two NVIDIA 1080Ti GPUs, and Python 3.6.5.
![e5](https://hackmd.io/_uploads/Skth7wYVT.jpg)
Container image: PyTorch 1.7.0, CUDA 10.2, cuDNN 7, Ubuntu 20.04, Python 3.7

Datasets:
- Human3.6M: different from SemGCN (download in process)
- MPII: downloaded
- MuCo: downloaded
- MuPoTS: in process
- MS COCO: in process
- Pretrained PoseNet data: downloaded
- Pretrained bounding boxes (from DetectNet) and root joint coordinates (from RootNet): downloaded

![h36mh5fol](https://hackmd.io/_uploads/SJHV2hzBT.jpg)
![h36mh5folh5](https://hackmd.io/_uploads/H1Hd32GrT.jpg)

**When running training, it failed to find D3_position for the H3.6M dataset.**

## Experiment 1 (Human3.6M based)
[semGCN](https://github.com/garyzhao/SemGCN?search=1)
Model: SemGCN for Python 3.7
Dataset: Human3.6M, MPII; tuning data: Stacked Hourglass
![e1](https://hackmd.io/_uploads/HJbBMDYEp.jpg)
1. Using VMware Workstation 17
![e2](https://hackmd.io/_uploads/BydIzvtN6.jpg)
   Specification: Lubuntu, Python 2.7, PyTorch 1.5.1, CUDA 10.2
   **Failed**: the graphics card was not detected
2. Using Docker
   Candidate images:
   - tigerdockermediocore/pytorch-video-docker:1.8.1-cu102-py37
   - myaiocean/pytorch:1.8.1-cuda10.2-cudnn7-runtime
   - qpod0dev/torch-cuda102:2022.1122.1536
   - summit4you/pytorch:1.10.0-cuda10.2-cudnn8
   **Used**
![e3](https://hackmd.io/_uploads/S14xmvtNp.jpg)
   Container image: PyTorch 1.7.0, CUDA 10.2, cuDNN 7, Ubuntu 20.04, Python 3.7

Running the model for 50 epochs: loss 0.002, MPJPE 43.09, P-MPJPE 35.064. Still running, at epoch 36.
![e4](https://hackmd.io/_uploads/HJoZXwF4a.jpg)

Protocol #1 (MPJPE) action-wise average: 408.31 mm
Protocol #2 (P-MPJPE) action-wise average: 194.63 mm
![logepoch50](https://hackmd.io/_uploads/HyBvywN4p.jpg)

Visualization failed because some modules are not supported.