# New Pipeline

Pipeline:
```graphviz
digraph test{
node[shape=record];
#rankdir="LR"
step1[label="parsertest.py unpacks .bag into .mp4"]
step2[label="ffmpeg interpolates to 100 fps & encoding options"]
step3[label="interpolatetest.py aligns the frame count to the ground-truth frame count"]
step4[label="read_csv2.py saves npz as gt_npy & 3d_custom.npz"]
step5[label="prepare_data_2d_custom.py decodes and organizes the npz data for the 2D keypoint model"]
step6[label="2d_keypoints_prepare.py runs the 2D keypoint model, producing 2D keypoints and a visualization video stream"]
step7[label="run_MixSTE.py (MixSTE) reconstructs the 3D keypoints"]
step8[label="3D smooth.py"]
step9[label="draw2.py draws the 3D skeleton, computes joint angles, and outputs joint-angle line charts"]
step10[label="stack_video2.py overlays the prediction and ground-truth videos"]
step1->step2
step2->step3
step3->step4
step4->step5
step5->step6
step6->step7
step7->step8
step8->step9
step9->step10
}
```
parsertest.py decodes the original video and records its frame count and location in frame.yaml.
The original video is then frame-interpolated with [ffmpeg](https://trac.ffmpeg.org/wiki/ChangingFrameRate).
interpolatetest.py pads any frame count that is not a multiple of 100 up to the next multiple of 100 (ceiling), and fills the missing frames by duplicating the last frame.
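The padding rule can be sketched in a few lines of Python (an illustrative snippet, not the actual interpolatetest.py code; the function name is hypothetical):
```python
import numpy as np

def pad_to_multiple_of_100(frames: np.ndarray) -> np.ndarray:
    """Pad (T, H, W, C) frames up to the next multiple of 100 by repeating the last frame."""
    t = frames.shape[0]
    align_t = ((t + 100 - 1) // 100) * 100            # ceiling(t / 100) * 100
    if align_t == t:
        return frames
    pad = np.repeat(frames[-1:], align_t - t, axis=0)  # copies of the last frame
    return np.concatenate([frames, pad], axis=0)
```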
2d_keypoints_prepare.py first detects the person with an mmdetection model and then runs an mmpose 2D keypoint model on the detection to output the human keypoints.
run_MixSTE.py is the main program for 3D keypoint reconstruction.
For the 3D keypoints, MixSTE (a transformer model) replaces the PoseFormer used by the previous student. PoseFormer has since released a V2, which may be worth evaluating. Both [run_poseformer.py](https://github.com/zczcwh/PoseFormer) and [run_MixSTE.py](https://github.com/JinluZhang1126/MixSTE) reuse the code of Facebook's [VideoPose3D](https://github.com/facebookresearch/VideoPose3D), which makes switching models easy.
For the 2D keypoints, [mmpose](https://github.com/open-mmlab/mmpose) replaces AlphaPose for 2D keypoint training; the advantage is that it supports training keypoints with different models, the drawback is that inference is much slower than AlphaPose.
By contrast, [AlphaPose](https://github.com/MVIG-SJTU/AlphaPose) feeds the images to be processed through a FIFO queue with multiple threads; it is faster but harder to debug remotely, so if you want to move back from the current model to AlphaPose, leave that part for local debugging at the end.
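For reference, the detect-then-estimate flow used by 2d_keypoints_prepare.py roughly follows the mmpose 0.x top-down demo API; a minimal sketch (paths, thresholds, and the single-image call are placeholders, the real script also handles batching and visualization):
```python
from mmdet.apis import init_detector, inference_detector
from mmpose.apis import (init_pose_model, inference_top_down_pose_model,
                         process_mmdet_results)

# Placeholder config/checkpoint paths; in practice they come from 2DConfigPath / 2DModelPath.
det_model = init_detector('det_config.py', 'det_checkpoint.pth', device='cuda:0')
pose_model = init_pose_model('hr_net_w48_512x512.py', 'epoch_53.pth', device='cuda:0')

img = 'frame_000001.jpg'
mmdet_results = inference_detector(det_model, img)             # detect the person first
person_results = process_mmdet_results(mmdet_results, cat_id=1)
pose_results, _ = inference_top_down_pose_model(
    pose_model, img, person_results, bbox_thr=0.3, format='xyxy')
# pose_results[i]['keypoints'] -> (17, 3) array of (x, y, score)
```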
## Config
1. Run `sh run.sh` with the path of the config file (.yaml) as a positional argument; the config must contain the required parameters below.
2. An example config file:
```yaml
SubjectName: test123
2DFormat: h36m
GroundTruth:
  position_path: /home/p76094266/demo_subjects/test123
  rotation_angle: -30
Video:
  fps: 60
  interp_fps: 100
  w: 1280
  h: 720
  bag_path: /home/p76094266/Video
ResultDir: /home/p76094266/tmp1
DrawAngle: True
2DModelPath: /home/p76094266/mmpose/work_dirs/hrnet_512x512_frozen1/epoch_53.pth
2DConfigPath: /home/p76094266/mmpose/work_dirs/hr_net_w48_512x512.py
3DModelType: MixSTE
3DModelPath: /home/p76094266/PoseFormer/checkpoint/ep80_243f_coco_fc/epoch_80.pth
ParserOutputPath:
DebugLog: True
```
SubjectName determines how the ground-truth .txt files are matched, so different SubjectName values use different mappings; if no corresponding key-pair list is defined in parser_test.py and read_csv2.py, the run is treated as inference without ground truth.
Example key-pair list; the key is the .bag video name, the value is the ground-truth skeleton recording:
```python
test123_key_list = {'risehand.bag': 'New Session14', 'risehandhigh.bag': 'New Session16',
'standup2.bag':'New Session17', 'touchknee.bag':'New Session18', 'squatdown.bag':'New Session22',
'rotate.bag':'New Session23'}
```
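For illustration, the lookup is just a dictionary access keyed by the .bag file name (a hypothetical snippet, not the actual parser_test.py code):
```python
# Hypothetical lookup: map a .bag recording to its ground-truth session; if the subject
# has no key-pair list defined, fall back to inference without ground truth.
bag_name = 'risehand.bag'
session = test123_key_list.get(bag_name)   # e.g. 'New Session14'
has_ground_truth = session is not None
```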
* 2DFormat: basically always the h36m format; the coco format is rarely used
* GroundTruth: directory of the ground-truth data; the angle field is deprecated
* Video: information about the 2D video; if it is wrong, the .bag cannot be decoded
* ResultDir: a new directory created to store the results
* 2DModelPath: path to the mmpose 2D model weights
* 2DConfigPath: path to the corresponding mmpose config
* ParserOutputPath, DebugLog: deprecated
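A minimal sketch of consuming the config passed to run.sh (assuming PyYAML; the key names follow the example above, but the loader itself is illustrative, not the pipeline's actual code):
```python
import sys
import yaml

# Load the .yaml config given as the positional argument to run.sh.
with open(sys.argv[1]) as f:
    cfg = yaml.safe_load(f)

subject = cfg['SubjectName']                  # used to match the ground-truth files
fps, interp_fps = cfg['Video']['fps'], cfg['Video']['interp_fps']
result_dir = cfg['ResultDir']                 # results are written here
model_2d, config_2d = cfg['2DModelPath'], cfg['2DConfigPath']
```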
## MMPose
As of 2023, MMPose has migrated from 0.29 to 1.0, with major updates to mmengine, mmdetection, mmcls and the other OpenMMLab tools; besides the models that have not been migrated yet, I also use the 1.0 version.
https://zhuanlan.zhihu.com/p/582270819
## Virtual Environment
```
base * /home/p76094266/anaconda3
alphapose /home/p76094266/anaconda3/envs/alphapose
hrnettest /home/p76094266/anaconda3/envs/hrnettest
openmmlab /home/p76094266/anaconda3/envs/openmmlab
openmmlab2 /home/p76094266/anaconda3/envs/openmmlab2
pose3D /home/p76094266/anaconda3/envs/pose3D
pose3D_ /home/p76094266/anaconda3/envs/pose3D_
pose_clone /home/p76094266/anaconda3/envs/pose_clone
vtk /home/p76094266/anaconda3/envs/vtk
```
hrnettest: just a test environment
openmmlab: the environment and core packages required for mmpose 0.29 (~/mmpose/)
openmmlab2: the environment and core packages for mmpose 1.0.4rc and the heavily updated mmdetection etc.
pose3D: the large environment needed to run the PoseFormer and MixSTE code; it also includes what openmmlab 0.29 needs to run
vtk: the codec packages needed by ffmpeg and the vtk package needed by the draw scripts
## Directories
```
/home/p76094266
├── PoseFormer -> main 2D and 3D keypoint inference code
├── tool -> main video parsing and visualization code
├── run.sh -> script for the whole system pipeline
├── mmdetection -> used together with mmpose 0.29
├── mmpose -> mmpose 0.29
├── openmmlab -> includes the new mmdetection version and mmpose 1.0.0
├── anaconda3 -> conda environments
├── coco-analyze -> keypoint AP analysis program
├── demo_subjects -> subject ground truth
├── Video -> subject video streams
├── Human36m -> videos and annotations downloaded from Human3.6M
├── MixSTE -> cloned from GitHub
├── VideoPose3D
├── Anatomy3D -> paper code
├── MHFormer -> TrashFormer
├── hrnet
├── tmp -> result folder
├── tmp1 -> result folder
├── tmp2 -> result folder
├── tmp3 -> result folder
├── tmp_bodi -> result folder
└── BackupPose -> backup
```
### Script
* run_parser.sh - converts a custom recording from .bag into a video file
* run_part.sh - runs from the video and frame-interpolation steps onward, mainly for debugging the run_poseformer part
* prepare.sh - argument 1: subject, argument 2: index of the video to debug; uses AlphaPose to produce 2D keypoints and prepare the 2D npz
* train1.sh, train2.sh, train3.sh - different training setups
## Cocometric
To generate this figure we need to use the COCO API.
A [reference](https://blog.csdn.net/bryant_meng/article/details/108325287) for the COCO API evaluation:
```python
class COCOeval:
    def __init__(self, cocoGt=None, cocoDt=None, iouType=''):
        ...  # implemented in pycocotools

cocoEval = COCOeval(cocoGt, cocoDt, annType)
cocoEval.evaluate()
cocoEval.accumulate()
cocoEval.summarize()
```
We pass in cocoGt, which is the annotation, and cocoDt, which is our prediction; the prediction should be a .json file following the COCO format below:
```
[{
    "image_id": int,
    "category_id": int,                          # e.g. 1 = person
    "keypoints": [x1, y1, v1, ..., xk, yk, vk],  # e.g. [430.7, 310.5, 1.0, ...]
    "score": float                               # e.g. 0.97
}]
```
The ground truth additionally needs a 'coco_url' field storing the image location.
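A small sketch of dumping predictions in that detection format (the helper below is illustrative; the (17, 3) keypoints are flattened into the [x, y, v, ...] layout shown above):
```python
import json
import numpy as np

def to_coco_dt(image_id: int, kpts: np.ndarray, score: float) -> dict:
    """kpts: (17, 3) array of (x, y, confidence) -> one COCO keypoint detection entry."""
    return {
        'image_id': image_id,
        'category_id': 1,                        # person
        'keypoints': kpts.reshape(-1).tolist(),  # [x1, y1, v1, ..., x17, y17, v17]
        'score': float(score),
    }

# Dummy example with random keypoints, just to show the file layout.
detections = [to_coco_dt(0, np.random.rand(17, 3) * 480, 0.97)]
with open('detections.json', 'w') as f:
    json.dump(detections, f)
```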
The following example comes from [coco_analyze_demo](https://github.com/matteorr/coco-analyze/blob/release/COCOanalyze_demo.ipynb):
```python
import json
import numpy as np
from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval
from pycocotools.cocoanalyze import COCOanalyze
import matplotlib.pyplot as plt
import skimage.io as io

dataDir = '.'
dataType = 'val2014'
annType = 'person_keypoints'
teamName = 'NCKUVision'

annFile = '%s/annotations/%s_%s.json' % (dataDir, annType, dataType)
resFile = '%s/detections/%s_%s_%s_results.json' % (dataDir, teamName, annType, dataType)

gt_data = json.load(open(annFile, 'rb'))
imgs_info = {i['id']: {'id': i['id'],
                       'width': i['width'],
                       'height': i['height']}
             for i in gt_data['images']}

team_dts = json.load(open(resFile, 'rb'))
team_dts = [d for d in team_dts if d['image_id'] in imgs_info]

coco_gt = COCO(annFile)
coco_dt = coco_gt.loadRes(team_dts)
coco_analyze = COCOanalyze(coco_gt, coco_dt, 'keypoints')

if teamName == '....test':
    imgIds = sorted(coco_gt.getImgIds())[0:100]
    coco_analyze.cocoEval.params.imgIds = imgIds

coco_analyze.evaluate(verbose=True, makeplots=True)
```
The various coco_analyze parameters can be adjusted:
```python
coco_analyze.params.oksThrs = [.5, .55, .6, .65, .7, .75, .8, .85, .9, .95]
# set OKS threshold required to match a detection to a ground truth
coco_analyze.params.oksLocThrs = .1
# set KS threshold limits defining jitter errors
coco_analyze.params.jitterKsThrs = [.5, .85]
coco_analyze.params.err_types = ['miss', 'swap', 'inversion', 'jitter']
# use analyze() method for advanced error analysis
coco_analyze.analyze(check_kpts=True, check_scores=True, check_bckgd=True)
coco_analyze.summarize(makeplots=True)
```

The chart shows that the largest share is Jit, i.e. keypoint jitter, defined here as a keypoint similarity (KS) between 0.5 and 0.85 (the jitterKsThrs above).
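For context, the OKS these thresholds refer to is the standard COCO keypoint similarity; a simplified illustrative implementation (pycocotools does the real computation, including the full visibility handling):
```python
import numpy as np

# COCO per-keypoint sigmas (nose, eyes, ears, shoulders, elbows, wrists, hips, knees, ankles).
COCO_SIGMAS = np.array([.026, .025, .025, .035, .035, .079, .079, .072, .072,
                        .062, .062, .107, .107, .087, .087, .089, .089])

def oks(gt: np.ndarray, dt: np.ndarray, area: float, vis: np.ndarray) -> float:
    """gt, dt: (17, 2) keypoints; area: ground-truth box area; vis: (17,) visibility flags."""
    d2 = np.sum((gt - dt) ** 2, axis=1)
    var = (2 * COCO_SIGMAS) ** 2
    ks = np.exp(-d2 / (2 * area * var + np.finfo(float).eps))   # per-keypoint similarity
    mask = vis > 0
    return float(ks[mask].mean()) if mask.any() else 0.0
```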
### Building the Ground Truth
.MP4 -> FetchFrame.py -> /img (local; frame extraction is sketched below)
.MP4 -> get predictions via mmpose/demo/topdown_demo_with_mmdet.py --save-prediction=True -> local desktop /test_python/json/
-> test_python/label_ours.py
-> output test?.json
-> merge using h36mtococo.py to convert the format
-> put in /mmpose/data/custom
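The frame-extraction step (FetchFrame.py / Fetch_frame.py) essentially dumps every frame with OpenCV; an illustrative sketch (output naming is hypothetical):
```python
import os
import cv2

def fetch_frames(video_path: str, out_dir: str) -> int:
    """Dump every frame of video_path to out_dir as JPEG; returns the frame count."""
    os.makedirs(out_dir, exist_ok=True)
    cap = cv2.VideoCapture(video_path)
    idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        cv2.imwrite(os.path.join(out_dir, f'{idx:06d}.jpg'), frame)
        idx += 1
    cap.release()
    return idx
```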
```
(base) p76094266@vision-B660M-D3H-DDR4:~/mmpose/data$ tree ./ -L 1
./
├── coco
│ ├── annotations
│ ├── annotations_trainval2017.zip
│ ├── person_detection_results
│ ├── train2017
│ └── val2017
├── custom
│ ├── annotations -> annotations converted by h36mtococo
│ ├── images -> video frames extracted by Fetch_frame
│ └── tmp
├── Fetch_frame.py
├── h36m
│ ├── annotation_body2d
│ ├── annotation_body3d
│ └── images
├── mpii
│ ├── annotations
│ └── images
├── testdata.py
├── test_output
├── tmpvis_img
```
# 凱予 (Senior)'s Keypoints Pipeline
First, UI.py is used to display the video results.
demo.py shows five buttons; pressing a button spawns another process that runs run.bat with the dataset subject as a trailing argument.
The csv read by read_csv contains the per-frame $(x, y, z)$ keypoints of one subject, as shown in the figure below.
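As a rough illustration only (the real motion-capture csv layout is not reproduced here), reading such a per-frame (x, y, z) table into a (frames, joints, 3) array could look like:
```python
import numpy as np
import pandas as pd

# Illustrative assumption: one row per frame, columns grouped as x/y/z per joint.
df = pd.read_csv('tmp/csv/M1.csv')
num_joints = df.shape[1] // 3
joint_3d = df.to_numpy(dtype=np.float32).reshape(len(df), num_joints, 3)
print(joint_3d.shape)   # e.g. (frames, 17, 3)
```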

Pipeline:
```graphviz
digraph test{
node[shape=record];
#rankdir="LR"
step1[label= "parser.py" ]
step2[label="ffmpeg interpolation & compression"]
step3[label="interpolate.py aligns the frame count to the ground-truth frame count"]
step4[label="read_csv.py saves npz as gt_npy & 3d_custom.npz"]
step5[label="PoseFormer inference.py reconstructs the 3D keypoints"]
step6[label="AlphaPose inference.py produces the 2D visualization"]
step7[label="evaluation computes the symmetry coefficients Vx, Vy, Vz for occupational therapy"]
step8[label="3D smooth.py"]
step9[label="draw.py & stack_video.py"]
step1->step2
step2->step3
step3->step4
step4->step5
step5->step6
step6->step7
step7->step8
step8->step9
}
```
### Parser
Input:
* CNT
* NAME
handle_video():
reads the .bag recording from tmp\\bag\\,
opens the stream,
creates the depth image stream,
declares the colorizer,
writes color_color_image.
Output:
* \\tmp\\original_video\\$videoname$.mp4  encoding: fourcc, fps: 60, size: 640 * 480
* tool\\frame.yaml : dump(d):
```python
######### frame.yaml #########
# M1:
#   align_frame_num: 400
#   frame_number: 221
#   realworld_frame_num: 367
# fps: 60
# name: P02
##############################
d = dict()
d['fps'] = 60
d['name'] = sys.argv[2]                            # yuan, johnson, ...
d[v]['frame_num'] = frame_num                      # real frame count
d[v]['realworld_frame_num'] = realword_frame_num   # read from \tmp\csv\$action$.csv
d[v]['align_frame_num'] = ((realword_frame_num + 100 - 1) // 100) * 100
```
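For completeness, dumping such a dict with PyYAML reproduces the frame.yaml layout above (a minimal sketch with hard-coded values; the real parser fills one entry per video):
```python
import yaml

d = {'fps': 60, 'name': 'P02',
     'M1': {'frame_number': 221,
            'realworld_frame_num': 367,
            'align_frame_num': ((367 + 100 - 1) // 100) * 100}}   # -> 400

with open('tool/frame.yaml', 'w') as f:
    yaml.safe_dump(d, f)
```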
### ffmpeg
Input:
* tmp\\original_video\\$M$.mp4
Output:
* tmp\\interpolate\\$M$_100.mp4  fps: 100, encoded/compressed with -crf quality 10 (see the sketch below)
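This step can be driven from Python with a subprocess call; a sketch assuming the fps filter and libx264 at -crf 10 as described here (the actual pipeline may pass different or additional ffmpeg flags):
```python
import subprocess

def interpolate_to_100fps(src: str, dst: str) -> None:
    """Re-time the video to 100 fps and re-encode with x264 at constant quality 10."""
    subprocess.run([
        'ffmpeg', '-y', '-i', src,
        '-filter:v', 'fps=100',           # change the frame rate (see the ffmpeg wiki link above)
        '-c:v', 'libx264', '-crf', '10',  # quality-targeted encoding
        dst,
    ], check=True)

# Example: interpolate_to_100fps('tmp/original_video/M1.mp4', 'tmp/interpolate/M1_100.mp4')
```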
### interpolate
Input:
* tool\\frame.yaml
* tmp\\interpolate\\$M$_100.mp4
Output:
* tmp\\align_video\\$M$_frame_align.mp4  fps: 100, total frames padded up to a multiple of 100 (align_frame_num)
### read_csv
Input:
* tmp\\csv\\$M$.csv

* tool\\frame.yaml
joint_3d[align_frame_num, 17, 3] is mapped to h36m_3d[align_frame_num, 32, 3]
Output:
* np.save(f'tmp\\{SubjectName}_gt_npy\\{SubjectName}_{action[1:]}', joint_3d)
```
{ndarray:(400, 17, 3)}
```
* np.savez_compressed(f"tmp\\npz\\data_3d_custom_{action}.npz", positions_3d=data, metadata=metadata)
```
keys are :KeysView(<numpy.lib.npyio.NpzFile object at 0x000002467FB57888>)
--- *** ---
metadata - shape: () - :
{'layout_name': 'coco', 'num_joints': 17, 'keypoints_symmetry': [[1, 3, 5, 7, 9, 11, 13, 15], [2, 4, 6, 8, 10, 12, 14, 16]], 'video_metadata': {'M1_frame_align.mp4': {'w': 640, 'h': 480}}}
---
positions_3d - shape: () - :
{'M1_frame_align.mp4': {'custom': array([[[ 0.00528335, 0.0892668 , 0.668917 ],
[-0.15261501, 0.169559 , 0.677401 ],
[-0.25646898, -0.0534065 , 0.417277 ],
...,
[ 0. , 0. , 0. ],
[ 0. , 0. , 0. ],
[ 0. , 0. , 0. ]],
[[ 0.00531119, 0.0889249 , 0.668534 ],
[-0.15253599, 0.16917999, 0.677226 ],
[-0.256303 , -0.053511 , 0.417238 ],
...,
[ 0. , 0. , 0. ],
[ 0. , 0. , 0. ],
[ 0. , 0. , 0. ]],
[[ 0.00537636, 0.0885618 , 0.668128 ],
[-0.152507 , 0.168759 , 0.677095 ],
[-0.25613102, -0.0535853 , 0.417187 ],
...,
[ 0. , 0. , 0. ],
[ 0. , 0. , 0. ],
[ 0. , 0. , 0. ]],
...,
[[ 0.025927 , -0.147714 , 0.87259495],
[-0.136304 , -0.0708796 , 0.863441 ],
[-0.21164301, -0.00255003, 0.447731 ],
...,
[ 0. , 0. , 0. ],
[ 0. , 0. , 0. ],
[ 0. , 0. , 0. ]],
[[ 0.025927 , -0.147714 , 0.87259495],
[-0.136304 , -0.0708796 , 0.863441 ],
[-0.21164301, -0.00255003, 0.447731 ],
...,
[ 0. , 0. , 0. ],
[ 0. , 0. , 0. ],
[ 0. , 0. , 0. ]],
[[ 0.025927 , -0.147714 , 0.87259495],
[-0.136304 , -0.0708796 , 0.863441 ],
[-0.21164301, -0.00255003, 0.447731 ],
...,
[ 0. , 0. , 0. ],
[ 0. , 0. , 0. ],
[ 0. , 0. , 0. ]]], dtype=float32)}}
---
```
```python
data = {f'{action}_frame_align.mp4': {'custom': h36m_3d}}
metadata = {'layout_name': 'coco', 'num_joints': 17,
'keypoints_symmetry': [[1, 3, 5, 7, 9, 11, 13, 15], [2, 4, 6, 8, 10, 12, 14, 16]],
'video_metadata': {f'{action}_frame_align.mp4': {'w': 640, 'h': 480}}}
```
```bat
xcopy tmp\align_video\*.mp4 PoseFormer\%NAME%\
xcopy tmp\align_video\*.mp4 PoseFormer\Alphapose\%NAME%_videos\
xcopy tmp\npz\*.npz PoseFormer\data\%NAME%\
```
Structure of the VideoLoader class in AlphaPose.py:

Structure of the DetectionLoader class:

Structure of the DetectionProcessor class:

### Alphapose
handle_video() in AlphaPose.py:
```graphviz
digraph test{
node[shape=record];
#rankdir="LR"
step1[label= "VideoLoader" ]
step2[label="DetectionLoader"]
step3[label="DetectionProcessor"]
step4[label = "InferenNet Or InferenNet_fast"]
step5[label="hm_j = pose_model(inps_j)"]
step6[label="kpts = final_result[i]['result'][0]['keypoints']"]
step7[label = "savgol_filter filter kpts with win_size=31 polyorder=2"]
step8[label="OneEuroFilter"]
step9[label="np.savez_compressed .mp4.npz,boxes, segments, keypoints, metadata, scale_hw"]
step1->step2
step2->step3
step3->step4
step4->step5
step5->step6
step6->step7
step7->step8
step8->step9
}
```
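Steps 7–8 in this chart smooth the keypoint trajectories; the Savitzky–Golay part maps directly onto scipy with the window size 31 and polyorder 2 named above (a sketch; the OneEuroFilter is AlphaPose's own implementation and is not reproduced here):
```python
import numpy as np
from scipy.signal import savgol_filter

def smooth_keypoints(kpts: np.ndarray) -> np.ndarray:
    """kpts: (T, 17, 2) keypoint trajectories with T >= 31; smooth each coordinate over time."""
    return savgol_filter(kpts, window_length=31, polyorder=2, axis=0)
```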
Input:
* $people\$M$_align.mp4
* nClasses = 17
Output:
* save_path = os.path.join(args.outputpath, 'AlphaPose_'+ntpath.basename(videofile).split('.')[0]+'.avi')
writer = DataWriter(args.save_video, save_path, cv2.VideoWriter_fourcc(*'XVID'), fps, frameSize).start()
* writer.save(None, None, None, None, None, orig_img, im_name.split('/')[-1])
* writer.save(boxes, scores, hm, pt1, pt2, orig_img, im_name.split('/')[-1])
* np.savez_compressed(f"./npz/{filename}.mp4.npz", boxes=boxess, segments=segments, keypoints=keypoints, metadata=metadata, scale=scale_hw)
* filename = $people\$M$_align.npz
* keypoints = [[[],kps],...]
* metadata = {
'w': orig_img.shape[1],
'h': orig_img.shape[0],
}
* scale_hw = max(pw, ph)
* segments = [[None],...]
```
keys are :KeysView(<numpy.lib.npyio.NpzFile object at 0x000002461E02CC48>)
--- *** ---
boxes - shape: (400, 2) - :
[[list([])
array([[229.03535 , 91.8827 , 417.15054 , 479. ,
0.99792993]], dtype=float32) ]
[list([])
array([[229.01895 , 91.6857 , 416.1857 , 479. , 0.9971854]],
dtype=float32) ]
[list([])
array([[228.92445 , 92.09213 , 416.52417 , 479. , 0.9976592]],
dtype=float32) ]
[list([])
array([[228.83986 , 92.06596 , 416.58234 , 479. , 0.997719]],
dtype=float32) ]
[list([])
array([[229.35788 , 90.477356, 415.71912 , 479. , 0.997547]],
dtype=float32) ]
[list([])
array([[229.31694 , 90.10956 , 415.83145 , 479. , 0.9976223]],
dtype=float32) ]
...
[list([])
array([[[326.44873 , 334.68924 , 318.20828 , 340.1829 , 298.98047 ,
359.41064 , 274.25903 , 375.89163 , 257.77808 , 395.1194 ,
249.5376 , 351.17017 , 296.23364 , 364.9043 , 277.00586 ,
362.15753 , 279.75272 ],
[115.844215, 107.603745, 107.603745, 115.844215, 110.35056 ,
165.2871 , 168.03394 , 228.46411 , 236.70459 , 288.8943 ,
294.38797 , 286.1475 , 286.1475 , 371.2991 , 379.5396 ,
442.71655 , 448.21027 ]]], dtype=float32) ]
[list([])
array([[[326.44873 , 334.68924 , 318.20828 , 340.1829 , 298.98047 ,
359.41064 , 274.25903 , 375.89163 , 257.77808 , 395.1194 ,
249.5376 , 351.17017 , 296.23364 , 364.9043 , 277.00586 ,
362.15753 , 279.75272 ],
[115.844215, 107.603745, 107.603745, 115.844215, 110.35056 ,
165.2871 , 168.03394 , 228.46411 , 236.70459 , 288.8943 ,
294.38797 , 286.1475 , 286.1475 , 371.2991 , 379.5396 ,
442.71655 , 448.21027 ]]], dtype=float32) ]]
---
metadata - shape: () - :
{'w': 640, 'h': 480}
---
scale - shape: () - :
0.8064944
---
segments - shape: (400,) - :
[None None None None None None None None None None None None None None
None None None None None None None None None None None None None None
None None None None None None None None None None None None None None
None None None None None None None None None None None None None None
None None None None None None None None None None None None None None
None None None None None None None None None None None None None None
None None None None None None None None None None None None None None
None None None None None None None None None None None None None None
None None None None None None None None None None None None None None
None None None None None None None None None None None None None None
None None None None None None None None None None None None None None
None None None None None None None None None None None None None None
None None None None None None None None None None None None None None
None None None None None None None None None None None None None None
None None None None None None None None None None None None None None
None None None None None None None None None None None None None None
None None None None None None None None None None None None None None
None None None None None None None None None None None None None None
None None None None None None None None None None None None None None
None None None None None None None None None None None None None None
None None None None None None None None None None None None None None
None None None None None None None None None None None None None None
None None None None None None None None None None None None None None
None None None None None None None None None None None None None None
None None None None None None None None None None None None None None
None None None None None None None None None None None None None None
None None None None None None None None None None None None None None
None None None None None None None None None None None None None None
None None None None None None None None]
---
segments(400,)
keypoints(400,2)
boxes(400,2)
```
### prepare_data_2d_custom
Input:
data\\..\\npz\\*.npz
Output:
* np.savez_compressed(output_prefix_2d + args.output, positions_2d=output, metadata=metadata, customScale=scale)
*Poseformer//data//data_2d_custom_myvideos.npz*
* output_prefix_2d + args.output = 'data_2d_custom_' + myvideos
* output = {canonical_name: {'custom': data[0]['keypoints']}}
* metadata : metadata['video_metadata'][canonical_name] = video_metadata
* scale = data[0]['scale']
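A condensed, self-contained sketch of how the archive above is assembled (the keypoints here are dummy zeros; in the real script they come from the decoded per-video .npz files):
```python
import numpy as np

# Illustrative assembly of data_2d_custom_*.npz following the structure documented below.
canonical_name = 'M1_frame_align.mp4'
keypoints = np.zeros((400, 17, 2), dtype='float32')        # (frames, joints, x/y)
video_metadata = {'w': 640, 'h': 480}

output = {canonical_name: {'custom': [keypoints]}}
metadata = {'layout_name': 'coco', 'num_joints': 17,
            'keypoints_symmetry': [[1, 3, 5, 7, 9, 11, 13, 15],
                                   [2, 4, 6, 8, 10, 12, 14, 16]],
            'video_metadata': {canonical_name: video_metadata}}

np.savez_compressed('data_2d_custom_myvideos.npz',
                    positions_2d=output, metadata=metadata, customScale=0.8064944)
```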
```
keys are :KeysView(<numpy.lib.npyio.NpzFile object at 0x000002C9E3E10E48>)
--- *** ---
customScale - shape: () - :
0.8064944
---
metadata - shape: () - :
{'layout_name': 'coco', 'num_joints': 17, 'keypoints_symmetry': [[1, 3, 5, 7, 9, 11, 13, 15], [2, 4, 6, 8, 10, 12, 14, 16]], 'video_metadata': {'M1_frame_align.mp4': {'w': 640, 'h': 480}}}
---
positions_2d - shape: () - :
{'M1_frame_align.mp4': {'custom': [array([[[316.97876 , 178.19023 ],
[324.2492 , 171.16183 ],
[311.4052 , 169.94934 ],
...,
[266.0856 , 388.78876 ],
[364.47934 , 444.28693 ],
[277.71844 , 453.49628 ]],
[[316.97876 , 178.19023 ],
[324.2492 , 171.16183 ],
[311.4052 , 169.94934 ],
...,
[266.0856 , 388.78876 ],
[364.47934 , 444.28693 ],
[277.71844 , 453.49628 ]],
[[316.97876 , 178.19023 ],
[324.2492 , 171.16183 ],
[311.4052 , 169.94934 ],
...,
[266.0856 , 388.78876 ],
[364.47934 , 444.28693 ],
[277.71844 , 453.49628 ]],
...,
[[326.44873 , 115.844215],
[334.68924 , 107.603745],
[318.20828 , 107.603745],
...,
[277.00586 , 379.5396 ],
[362.15753 , 442.71655 ],
[279.75272 , 448.21027 ]],
[[326.44873 , 115.844215],
[334.68924 , 107.603745],
[318.20828 , 107.603745],
...,
[277.00586 , 379.5396 ],
[362.15753 , 442.71655 ],
[279.75272 , 448.21027 ]],
[[326.44873 , 115.844215],
[334.68924 , 107.603745],
[318.20828 , 107.603745],
...,
[277.00586 , 379.5396 ],
[362.15753 , 442.71655 ],
[279.75272 , 448.21027 ]]], dtype=float32)]}}
---
```
### run_poseformer
Input:
* parse_args()
* keypoints = np.load('data/data_2d_' + args.dataset + '_' + args.keypoints + '.npz', allow_pickle=True)
* data/{args.subject_people}/data_3d_custom_M{args.test_action}.npz (Custom)
```
keys are :KeysView(<numpy.lib.npyio.NpzFile object at 0x000002C9E3E109C8>)
--- *** ---
metadata - shape: () - :
{'layout_name': 'coco', 'num_joints': 17, 'keypoints_symmetry': [[1, 3, 5, 7, 9, 11, 13, 15], [2, 4, 6, 8, 10, 12, 14, 16]], 'video_metadata': {'output100.mp4': {'w': 640, 'h': 480}}}
---
positions_3d - shape: () - :
{'output100.mp4': {'custom': array([[[0.0444855 , 0.0357131 , 0.91960603],
[0.181417 , 0.119218 , 0.90842605],
[0.171964 , 0.166813 , 0.432604 ],
...,
[0. , 0. , 0. ],
[0. , 0. , 0. ],
[0. , 0. , 0. ]],
[[0.0443666 , 0.0357647 , 0.91960603],
[0.18128799, 0.119236 , 0.908386 ],
[0.171895 , 0.166792 , 0.432621 ],
...,
[0. , 0. , 0. ],
[0. , 0. , 0. ],
[0. , 0. , 0. ]],
[[0.0442933 , 0.0358185 , 0.91958696],
[0.181197 , 0.119208 , 0.90839 ],
[0.171862 , 0.166786 , 0.43262202],
...,
[0. , 0. , 0. ],
[0. , 0. , 0. ],
[0. , 0. , 0. ]],
...,
[[0.0404266 , 0.0125043 , 0.91844296],
[0.176933 , 0.095312 , 0.90726703],
[0.169449 , 0.155723 , 0.431024 ],
...,
[0. , 0. , 0. ],
[0. , 0. , 0. ],
[0. , 0. , 0. ]],
[[0.0405676 , 0.0127635 , 0.91798997],
[0.176866 , 0.0951155 , 0.90725404],
[0.16938299, 0.155638 , 0.431013 ],
...,
[0. , 0. , 0. ],
[0. , 0. , 0. ],
[0. , 0. , 0. ]],
[[0.040665 , 0.0128088 , 0.917722 ],
[0.17683001, 0.094931 , 0.907256 ],
[0.16930701, 0.155543 , 0.431019 ],
...,
[0. , 0. , 0. ],
[0. , 0. , 0. ],
[0. , 0. , 0. ]]], dtype=float32)}}
---
```
dataset = CustomDataset(f'data/{args.subject_people}/data_3d_custom_M{args.test_action}.npz')
The loaded npz is unpacked into a dict: customScale, positions_2d (the keypoints), metadata (keypoints_metadata), and keypoints_symmetry (kps_left, kps_right).
boneindex = [[16,15],[15,14],[13,12],[12,11],[10,9],[9,8],[8,7],[8,11],[8,14],[7,0],[3,2],[2,1],[6,5],[5,4],[1,0],[4,0]]
cameras_valid, poses_valid, poses_valid_2d = fetch(subjects_test, action_filter). fetch looks up the camera parameters, 3D keypoints, and 2D keypoints in the dict by subject and action and returns them.
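A simplified sketch of the fetch() lookup for these custom subjects (the real VideoPose3D code also returns camera parameters and handles downsampling and chunking):
```python
# Simplified illustration: look up 2D and 3D poses per subject/action from the prepared dicts.
def fetch(subjects, keypoints_2d, dataset_3d, action_filter=None):
    out_poses_2d, out_poses_3d = [], []
    for subject in subjects:
        for action, kps_per_view in keypoints_2d[subject].items():
            if action_filter and not any(action.startswith(a) for a in action_filter):
                continue
            out_poses_2d.extend(kps_per_view)                 # one entry per camera view
            out_poses_3d.append(dataset_3d[subject][action])  # (frames, 17, 3) ground truth
    return out_poses_2d, out_poses_3d
```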
Output:
* np.save(f'{args.subject_people}_npy/{args.subject_people}_{args.test_action}', prediction)
```
ndarray(600, 17, 3)
```
run_poseformer arguments:
```shell
-ta : --test-action
-sp : --subject-people
-d : --dataset
-k : --keypoints
-str : --subjects-train
-ste : --subjects-test
-a : --actions
-c : --checkpoint
-f : --frame
```
### Human3.6M
[Reference](https://blog.csdn.net/alickr/article/details/107837403)