Keypoints 流程

# 新流程 ![](https://hackmd.io/_uploads/S1z9NoEh3.png) 流程: ```graphviz digraph test{ node[shape=record]; #rankdir="LR" step1[label= "parsertest.py 解壓縮.bag 至 .mp4" ] step2[label="ffmpeg 插幀成 100 fps & 編碼選項 "] step3[label="interpolatetest.py 對齊幀數至 ground truth 的幀數"] step4[label = "read_csv2.py 並儲存 npz 為 gt_npy & 3d_custom.npz"] step5[label="prepare_data_2d_custom.py 解碼整理得到的 npz 數據以送入二維關鍵點預測模型"] step6[label="二維關鍵點模型預測 2d_keypoints_prepare.py 得到二維關鍵點與其可視化影像串流"] step7[label="MixSTE 的 run_MixSTE.py 重建 3D 關鍵點"] step8[label="3D smooth.py"] step9[label=" draw2.py 畫出三維骨架和計算關節角度輸出關節角度折線圖"] step10[label="stack_video2.py 疊合預測與 ground truth 影片"] step1->step2 step2->step3 step3->step4 step4->step5 step5->step6 step6->step7 step7->step8 step8->step9 step9->step10 } ``` parsertest .py 用來解碼一開始的影片並記錄其幀數、位置於 frame.yaml。原本的影片透過 ffmpeg 套件進行[插幀](https://trac.ffmpeg.org/wiki/ChangingFrameRate)， interpolatetest .py 將不足至 100 整數倍的幀數透過高斯天花板取得最長的百倍幀，不足的部分接以最後一幀的資訊複製補齊。 2d_keypoints_prepare .py 透過 mmdetection 模型先偵測出人體物件再通過 mmpose 二維關鍵點偵測模型預測出人體關鍵點輸出。 run_MixSTE .py 為三維關鍵點重建的主要程式三維關鍵點部分比起學長使用 Poseformer，改採用 MixSTE 變換器模型。目前 Poseformer又推出 V2 可以去看看效果如何。基本上 [run_poseformer.py](https://github.com/zczcwh/PoseFormer) 與 [run_MixSTE.py](https://github.com/JinluZhang1126/MixSTE) 皆沿用 facebook [Videopose3D](https://github.com/facebookresearch/VideoPose3D) 的程式碼，使得改模型變得容易。至於二維關鍵點部分取代 Alphapose 的部分使用 [mmpose](https://github.com/open-mmlab/mmpose) 進行 2D keypoints 的訓練，好處是支援不同模型進行關鍵點訓練，缺點是 inference 速度比 Alphapose 慢上許多。相反的， [AlphaPose](https://github.com/MVIG-SJTU/AlphaPose) 的流程是將需要處理的圖像透過 FIFO queue 的方式用多執行緒來獲得結果，因此雖然速度較快但也難以在遠端除錯，建議若要從原本模型移至 AlphaPose ，可以最後再做這一部分的 local 端除錯。 ## Config 1. sh run.sh 放入位置參數 config 的 path (.yaml) 且須包含諸多必要參數 2. 如圖所示為 Config 檔的範例: ```yaml SubjectName: test123 2DFormat: h36m GroundTruth: position_path: /home/p76094266/demo_subjects/test123 rotation_angle: -30 Video: fps: 60 interp_fps: 100 w: 1280 h: 720 bag_path: /home/p76094266/Video ResultDir: /home/p76094266/tmp1 DrawAngle: True 2DModelPath: /home/p76094266/mmpose/work_dirs/hrnet_512x512_frozen1/epoch_53.pth 2DConfigPath: /home/p76094266/mmpose/work_dirs/hr_net_w48_512x512.py 3DModelType: MixSTE 3DModelPath: /home/p76094266/PoseFormer/checkpoint/ep80_243f_coco_fc/epoch_80.pth ParserOutputPath: DebugLog: True ``` SubjectName 牽涉到與 ground_truth 的 .txt 檔匹配，所以當你使用不同 SubjectName 的話，有不同的映射，若在 parser_test.py 及 read_csv2.py 沒有定義相關的 key-pair list 則會視為沒有 ground truth 的 inference。 key-pair list 範例如下， key 為影片.bag 名稱， value 為真實答案骨架數據: ``` test123_key_list = {'risehand.bag': 'New Session14', 'risehandhigh.bag': 'New Session16', 'standup2.bag':'New Session17', 'touchknee.bag':'New Session18', 'squatdown.bag':'New Session22', 'rotate.bag':'New Session23'} ``` 2DFormat: 基本上都使用 h36m 格式比較不會使用 coco 格式 GroundTruth: 真實答案資料夾位置，angle is deprecated Video: 二維影片的資訊，錯誤會無法解碼 .bag ResultDir: 新建一個資料夾目的用來存放結果 2DModelPath: mmpose 的二維模型權重位置 2DConfigPath: mmpose 的對應 config 位置 ParserOutputPath: 、 DebugLog: are deprecated ## MMPose 目前 MMPose 在 2023 已從 0.29 遷移至 1.0 並大幅更新 mmengine 、 mmdetection、 mmcls 等相關 openmmlab 的工具，因此我目前除了使用尚未遷移過去的模型外，也有使用 1.0版本的東西。 https://zhuanlan.zhihu.com/p/582270819 ## Virtual Environment ``` base * /home/p76094266/anaconda3 alphapose /home/p76094266/anaconda3/envs/alphapose hrnettest /home/p76094266/anaconda3/envs/hrnettest openmmlab /home/p76094266/anaconda3/envs/openmmlab openmmlab2 /home/p76094266/anaconda3/envs/openmmlab2 pose3D /home/p76094266/anaconda3/envs/pose3D pose3D_ /home/p76094266/anaconda3/envs/pose3D_ pose_clone /home/p76094266/anaconda3/envs/pose_clone vtk /home/p76094266/anaconda3/envs/vtk ``` hrnettest : 就只是個測試的 test environment openmmlab: 為 mmpose 0.29 版本的必要環境與相關核心套件 ~/mmpose/ openmmlab2: 為 mmpose 1.0.4rc 版本與 mmdetection 等大幅度更新之版本必要環境與相關核心套件 pose3D: 為運行 poseformer 與 MixSTE code 所需之超大環境，裡面包括 openmmlab 0.29版本必要環境以運行 vtk : 跑 ffmpeg 所需套件編碼器與跑 draw 所需之 vtk 套件 ## 資料夾 ``` /home/p76094266 ├── PoseFormer -> main 2D and 3D Keypoints Inference Code In ├── tool -> main Parser Video and Visualization Code In ├── run.sh -> script of whole system pipline ├── mmdetection -> using together with mmpose 0.29 ├── mmpose -> mmpose 0.29 ├── openmmlab -> include new mmdetection version and mmpose 1.0.0 ├── anaconda3 -> env ├── coco-analyze -> analyze Keypoint AP program ├── demo_subjects -> Subjects Ground Truth ├── Video -> Subjects Video stream ├── Human36m -> Download From Human3.6M Video and annotation ├── MixSTE -> Clone from github ├── VideoPose3D ├── Anatomy3D -> paper code ├── MHFormer -> TrashFormer ├── hrnet ├── tmp ->result folder ├── tmp1 ->result folder ├── tmp2 ->result folder ├── tmp3 ->result folder ├── tmp_bodi ->result folder └── BackupPose -> backup ``` ### Script * run_parser.sh - 用來將自定義影片從bag變成影片檔 * run_part.sh - 主要跑到影片、插幀處理完要debug run_poseformer的部分 * prepare.sh - 引數一:人物引數二: 要debug的影片編號用Alphapose做成 2d_keypoints 來 prepare 2d npz * train1,2,3/sh - 不同的train方式 * ## Cocometric 為了生成這張圖: ![](https://hackmd.io/_uploads/rkAVScVHh.png) 我們需要使用 coco-api， coco-api 的 evaluation [參考](https://blog.csdn.net/bryant_meng/article/details/108325287) ```python class COCOeval: def __init__(self, cocoGt=None, cocoDt=None, intType=''): ## implement cocoEval = COCOeval(cocoGt, cocoDt, annType) cocoEval.evaluate() cocoEval.accumulate() cocoEval.summarize() ``` 我們需要傳入 cocoGt ，就是annotaion，而 cocoDt 就是我們的預測結果，格式應符合coco format的 .json文件如下: ``` [{ "image_id" : int, "category_id": int, # ex: 1 people "keypoints": [x1,y1,v1,...,xk,yk,vk] #ex: [430.7, 310.5, 1.0, ....] "score": float # ex: 0.97 }] ``` 而 Gt 仍需要一個　'coco_url' 用來儲存圖片位置以下為 [coco_analyze_demo](https://github.com/matteorr/coco-analyze/blob/release/COCOanalyze_demo.ipynb) 範例: ```python import json import numpy as np from pycocotools.coco import COCO from pycocotools.cocoeval import COCOeval from pycocotools.cocoanalyze import COCOanalyze import matplotlib as plt import skimage.io as io dataDir = '.' dataType = 'val2014' annType = 'person_keypoints' teamName = 'NCKUVision' annFile = '%s/annotations/%s_%s.json'%(dataDir, annType, dataType) resFile = '%s/detections/%s_%s_%s_results.json'%(dataDir, teamName, annType, dataType) gt_data = json.load(open(annFile,'rb')) imgs_info = {i['id']:{'id':i['id'], 'width':i['width'], 'height':i['height']} for i in gt_data['images']} team_dts = json.load(open(resFile, 'rb')) team_dts = [d for d in team_dts if d['image_id'] in imgs_info] coco_gt = COCO(annFile) coco_dt = coco_gt.loadRes(team_dts) coco_analyze = COCOanalyze(coco_gt, coco_dt, 'keypoints') if teamName == '....test': imgIds = sorted(coco_gt.getImgIds())[0:100] coco_analyze.cocoEval.params.imgIds = imgIds coco_analyze.evaluate(verbose=True, makeplots=True) ``` 可以調整 coco_analyze 的各項參數 ```python coco_analyze.params.oksThrs = [.5, .55, .6, .65, .7, .75, .8, .85, .9, .95] # set OKS threshold required to match a detection to a ground truth coco_analyze.params.oksLocThrs = .1 # # set KS threshold limits defining jitter errors coco_analyze.params.jitterKsThrs = [.5, .85] coco_analyze.params.err_types = ['miss', 'swap', 'inversion', 'jitter'] # use analyze() method for advanced error analysis coco_analyze.analyze(check_kpts=True, check_scores=True, check_bckgd=True) coco_analyze.summarize(makeplots=True) ``` ![](https://hackmd.io/_uploads/SyvSIhPBn.png) 可以看到圖表裡佔比很大的是 Jit ，就是關鍵點的顫動，這裡定義於 OKS 介於 [0.5, 0.85]之間 ### 製作 Ground Truth .MP4 -> FetchFrame .py -> /img local .MP4 -> getprediction by mmpose/demo/topdown_demo_with_mmdet.py --save-prediction=True -> local桌面/test_python/json/ -> test_python/label_ours.py -> output test?.json ->merge use h36mtococo .py format -> put in /mmpose/data/custom ``` (base) p76094266@vision-B660M-D3H-DDR4:~/mmpose/data$ tree ./ -L 1 ./ ├── coco │ ├── annotations │ ├── annotations_trainval2017.zip │ ├── person_detection_results │ ├── train2017 │ └── val2017 ├── custom │ ├── annotations -> 從 h36mtococo轉換出來的標註 │ ├── images -> fetechframe 拆出來的影像幀 │ └── tmp ├── Fetch_frame.py ├── h36m │ ├── annotation_body2d │ ├── annotation_body3d │ └── images ├── mpii │ ├── annotations │ └── images ├── testdata.py ├── test_output ├── tmpvis_img ``` # 凱予學長 Keypoints 流程首先 UI.py 用來顯示影片結果當我們使用 demo.py 時會有五個按鍵，按下按鈕會開啟另一個 process 執行 run.bat 後面再接一個參數是資料集的人。 read_csv 的 csv 為一個人的每一幀的 $(x, y, z)$ 關鍵點，下圖所示 ![](https://i.imgur.com/sD4FgLd.png) 流程: ```graphviz digraph test{ node[shape=record]; #rankdir="LR" step1[label= "parser.py" ] step2[label="ffmpeg 插幀 & 壓縮"] step3[label="interpolate.py 對齊幀數至 ground truth 的幀數"] step4[label = "read_csv.py 並儲存 npz 為 gt_npy & 3d_custom.npz"] step5[label="Poseformer 的 inference.py 重建 3D 關鍵點"] step6[label="Alphapose 的 inference.py 產生2D 的可視化結果"] step7[label = "evaluation 主要是針對職能治療計算對稱係數 Vx,Vy,Vz "] step8[label="3D smooth.py"] step9[label="draw.py & stack_video.py"] step1->step2 step2->step3 step3->step4 step4->step5 step5->step6 step6->step7 step7->step8 step8->step9 } ``` ### Parser 輸入 : * CNT * NAME handle_video() : 從 tmp\\\bag\\\ 讀取影片.bag檔開啟串流建立深度影像串流宣告著色器寫入 color_color_image 輸出: * \\tmp\\original_video\\$videoname$.mp4 Encode:fourcc 幀:60 長寬:640 * 480 * tool\\frame.yaml : dump(d): ```python ######### frame.yaml ######### M1: align_frame_num: 400 frame_number: 221 realworld_frame_num: 367 fps: 60 name: P02 ############### d = dict() d['fps'] = 60 d['name'] = sys.argv[2] # yuan johnson ... d[v]['frame_num'] = frame_num #real frame_num d[v]['realworld_frame_num'] = realword_frame_num # read from \\tmp\\csv\\$action$.csv d[v]['align_frame_num'] = ((realword_frame_num+100-1)//100)*100 ``` ### ffmpeg 輸入: * tmp\\original_video\\$M$.mp4 輸出: * tmp\\interpolate\\$M$_100.mp4 幀數:100 編碼壓縮-crf 品質10 ### interpolate 輸入: * tool\\frame.yaml * tmp\\interpolate\\$M$_100.mp4 輸出: * tmp\\align_video\\$M$_frame_align.mp4 幀數:100 總幀數補到100的倍數(align_frame_num) ### read_csv 輸入: * tmp\\csv\\$M$.csv ![](https://i.imgur.com/yAbeyCN.png) * tool\\frame.yaml 映射透過 joint_3d[align_frame_num, 17, 3] 到 h36m_3d[align_frame_num, 32, 3] 輸出: * np.save(f'tmp\\{SubjectName}_gt_npy\\{SubjectName}_{action[1:]}', joint_3d) ``` {ndarray:(400, 17, 3)} ``` * np.savez_compressed(f"tmp\\npz\\data_3d_custom_{action}.npz", positions_3d=data, metadata=metadata) ``` keys are :KeysView(<numpy.lib.npyio.NpzFile object at 0x000002467FB57888>) --- *** --- metadata - shape: () - : {'layout_name': 'coco', 'num_joints': 17, 'keypoints_symmetry': [[1, 3, 5, 7, 9, 11, 13, 15], [2, 4, 6, 8, 10, 12, 14, 16]], 'video_metadata': {'M1_frame_align.mp4': {'w': 640, 'h': 480}}} --- positions_3d - shape: () - : {'M1_frame_align.mp4': {'custom': array([[[ 0.00528335, 0.0892668 , 0.668917 ], [-0.15261501, 0.169559 , 0.677401 ], [-0.25646898, -0.0534065 , 0.417277 ], ..., [ 0. , 0. , 0. ], [ 0. , 0. , 0. ], [ 0. , 0. , 0. ]], [[ 0.00531119, 0.0889249 , 0.668534 ], [-0.15253599, 0.16917999, 0.677226 ], [-0.256303 , -0.053511 , 0.417238 ], ..., [ 0. , 0. , 0. ], [ 0. , 0. , 0. ], [ 0. , 0. , 0. ]], [[ 0.00537636, 0.0885618 , 0.668128 ], [-0.152507 , 0.168759 , 0.677095 ], [-0.25613102, -0.0535853 , 0.417187 ], ..., [ 0. , 0. , 0. ], [ 0. , 0. , 0. ], [ 0. , 0. , 0. ]], ..., [[ 0.025927 , -0.147714 , 0.87259495], [-0.136304 , -0.0708796 , 0.863441 ], [-0.21164301, -0.00255003, 0.447731 ], ..., [ 0. , 0. , 0. ], [ 0. , 0. , 0. ], [ 0. , 0. , 0. ]], [[ 0.025927 , -0.147714 , 0.87259495], [-0.136304 , -0.0708796 , 0.863441 ], [-0.21164301, -0.00255003, 0.447731 ], ..., [ 0. , 0. , 0. ], [ 0. , 0. , 0. ], [ 0. , 0. , 0. ]], [[ 0.025927 , -0.147714 , 0.87259495], [-0.136304 , -0.0708796 , 0.863441 ], [-0.21164301, -0.00255003, 0.447731 ], ..., [ 0. , 0. , 0. ], [ 0. , 0. , 0. ], [ 0. , 0. , 0. ]]], dtype=float32)}} --- ``` ```python data = {f'{action}_frame_align.mp4': {'custom': h36m_3d}} metadata = {'layout_name': 'coco', 'num_joints': 17, 'keypoints_symmetry': [[1, 3, 5, 7, 9, 11, 13, 15], [2, 4, 6, 8, 10, 12, 14, 16]], 'video_metadata': {f'{action}_frame_align.mp4': {'w': 640, 'h': 480}}} ``` ```bash xcopy tmp\align_video\*.mp4 PoseFormer\%NAME%\ xcopy tmp\align_video\*.mp4 PoseFormer\Alphapose\%NAME%_videos\ xcopy tmp\npz\*.npz PoseFormer\data\%NAME%\ ``` AlphaPose.py 的 videoloader class 結構 ![](https://i.imgur.com/FJLSXqa.png) Detectionloader class 結構 ![](https://i.imgur.com/NU1OTds.png) DetectionProcessor class 結構 ![](https://i.imgur.com/v3BMICM.png) ### Alphapose AlphaPose.py 的 handle_video(): ```graphviz digraph test{ node[shape=record]; #rankdir="LR" step1[label= "VideoLoader" ] step2[label="DetectionLoader"] step3[label="DetectionProcessor"] step4[label = "InferenNet Or InferenNet_fast"] step5[label="hm_j = pose_model(inps_j)"] step6[label="kpts = final_result[i]['result'][0]['keypoints']"] step7[label = "savgol_filter filter kpts with win_size=31 polyorder=2"] step8[label="OneEuroFilter"] step9[label="np.savez_compressed .mp4.npz,boxes, segments, keypoints, metadata, scale_hw"] step1->step2 step2->step3 step3->step4 step4->step5 step5->step6 step6->step7 step7->step8 step8->step9 } ``` 輸入: * $people\$M$_align.mp4 * nClasses = 17 輸出: * save_path = os.path.join(args.outputpath, 'AlphaPose_'+ntpath.basename(videofile).split('.')[0]+'.avi') writer = DataWriter(args.save_video, save_path, cv2.VideoWriter_fourcc(*'XVID'), fps, frameSize).start() * writer.save(None, None, None, None, None, orig_img, im_name.split('/')[-1]) * writer.save(boxes, scores, hm, pt1, pt2, orig_img, im_name.split('/')[-1]) * np.savez_compressed(f"./npz/{filename}.mp4.npz", boxes=boxess, segments=segments, keypoints=keypoints, metadata=metadata, scale=scale_hw) * filename = $people\$M$_align.npz * keypoinys = [[[],kps],...] * metadata = { 'w': orig_img.shape[1], 'h': orig_img.shape[0], } * scale_hw = max(pw, ph) * segments = [[None],...] ``` keys are :KeysView(<numpy.lib.npyio.NpzFile object at 0x000002461E02CC48>) --- *** --- boxes - shape: (400, 2) - : [[list([]) array([[229.03535 , 91.8827 , 417.15054 , 479. , 0.99792993]], dtype=float32) ] [list([]) array([[229.01895 , 91.6857 , 416.1857 , 479. , 0.9971854]], dtype=float32) ] [list([]) array([[228.92445 , 92.09213 , 416.52417 , 479. , 0.9976592]], dtype=float32) ] [list([]) array([[228.83986 , 92.06596 , 416.58234 , 479. , 0.997719]], dtype=float32) ] [list([]) array([[229.35788 , 90.477356, 415.71912 , 479. , 0.997547]], dtype=float32) ] [list([]) array([[229.31694 , 90.10956 , 415.83145 , 479. , 0.9976223]], dtype=float32) ] ... [list([]) array([[[326.44873 , 334.68924 , 318.20828 , 340.1829 , 298.98047 , 359.41064 , 274.25903 , 375.89163 , 257.77808 , 395.1194 , 249.5376 , 351.17017 , 296.23364 , 364.9043 , 277.00586 , 362.15753 , 279.75272 ], [115.844215, 107.603745, 107.603745, 115.844215, 110.35056 , 165.2871 , 168.03394 , 228.46411 , 236.70459 , 288.8943 , 294.38797 , 286.1475 , 286.1475 , 371.2991 , 379.5396 , 442.71655 , 448.21027 ]]], dtype=float32) ] [list([]) array([[[326.44873 , 334.68924 , 318.20828 , 340.1829 , 298.98047 , 359.41064 , 274.25903 , 375.89163 , 257.77808 , 395.1194 , 249.5376 , 351.17017 , 296.23364 , 364.9043 , 277.00586 , 362.15753 , 279.75272 ], [115.844215, 107.603745, 107.603745, 115.844215, 110.35056 , 165.2871 , 168.03394 , 228.46411 , 236.70459 , 288.8943 , 294.38797 , 286.1475 , 286.1475 , 371.2991 , 379.5396 , 442.71655 , 448.21027 ]]], dtype=float32) ]] --- metadata - shape: () - : {'w': 640, 'h': 480} --- scale - shape: () - : 0.8064944 --- segments - shape: (400,) - : [None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None None] --- segments(400,) keypoints(400,2) boxes(400,2) ``` ### prepare_data_2d_custom 輸入: data\\..\\npz\\*.npz 輸出: * np.savez_compressed(output_prefix_2d + args.output, positions_2d=output, metadata=metadata, customScale=scale) *Poseformer//data//data_2d_custom_myvideos.npz* * output_prefix_2d + args.output = 'data_2d_custom_' + myvideos * output={'canonical_name:{'custom': = data[0]['keypoints']}'} * metadata : metadata['video_metadata'][canonical_name] = video_metadata * scale = data[0]['scale'] ``` keys are :KeysView(<numpy.lib.npyio.NpzFile object at 0x000002C9E3E10E48>) --- *** --- customScale - shape: () - : 0.8064944 --- metadata - shape: () - : {'layout_name': 'coco', 'num_joints': 17, 'keypoints_symmetry': [[1, 3, 5, 7, 9, 11, 13, 15], [2, 4, 6, 8, 10, 12, 14, 16]], 'video_metadata': {'M1_frame_align.mp4': {'w': 640, 'h': 480}}} --- positions_2d - shape: () - : {'M1_frame_align.mp4': {'custom': [array([[[316.97876 , 178.19023 ], [324.2492 , 171.16183 ], [311.4052 , 169.94934 ], ..., [266.0856 , 388.78876 ], [364.47934 , 444.28693 ], [277.71844 , 453.49628 ]], [[316.97876 , 178.19023 ], [324.2492 , 171.16183 ], [311.4052 , 169.94934 ], ..., [266.0856 , 388.78876 ], [364.47934 , 444.28693 ], [277.71844 , 453.49628 ]], [[316.97876 , 178.19023 ], [324.2492 , 171.16183 ], [311.4052 , 169.94934 ], ..., [266.0856 , 388.78876 ], [364.47934 , 444.28693 ], [277.71844 , 453.49628 ]], ..., [[326.44873 , 115.844215], [334.68924 , 107.603745], [318.20828 , 107.603745], ..., [277.00586 , 379.5396 ], [362.15753 , 442.71655 ], [279.75272 , 448.21027 ]], [[326.44873 , 115.844215], [334.68924 , 107.603745], [318.20828 , 107.603745], ..., [277.00586 , 379.5396 ], [362.15753 , 442.71655 ], [279.75272 , 448.21027 ]], [[326.44873 , 115.844215], [334.68924 , 107.603745], [318.20828 , 107.603745], ..., [277.00586 , 379.5396 ], [362.15753 , 442.71655 ], [279.75272 , 448.21027 ]]], dtype=float32)]}} --- ``` ### run_poseformer 輸入: * parse_args() * keypoints = np.load('data/data_2d_' + args.dataset + '_' + args.keypoints + '.npz', allow_pickle=True) * data/{args.subject_people}/data_3d_custom_M{args.test_action}.npz (Custom) ``` keys are :KeysView(<numpy.lib.npyio.NpzFile object at 0x000002C9E3E109C8>) --- *** --- metadata - shape: () - : {'layout_name': 'coco', 'num_joints': 17, 'keypoints_symmetry': [[1, 3, 5, 7, 9, 11, 13, 15], [2, 4, 6, 8, 10, 12, 14, 16]], 'video_metadata': {'output100.mp4': {'w': 640, 'h': 480}}} --- positions_3d - shape: () - : {'output100.mp4': {'custom': array([[[0.0444855 , 0.0357131 , 0.91960603], [0.181417 , 0.119218 , 0.90842605], [0.171964 , 0.166813 , 0.432604 ], ..., [0. , 0. , 0. ], [0. , 0. , 0. ], [0. , 0. , 0. ]], [[0.0443666 , 0.0357647 , 0.91960603], [0.18128799, 0.119236 , 0.908386 ], [0.171895 , 0.166792 , 0.432621 ], ..., [0. , 0. , 0. ], [0. , 0. , 0. ], [0. , 0. , 0. ]], [[0.0442933 , 0.0358185 , 0.91958696], [0.181197 , 0.119208 , 0.90839 ], [0.171862 , 0.166786 , 0.43262202], ..., [0. , 0. , 0. ], [0. , 0. , 0. ], [0. , 0. , 0. ]], ..., [[0.0404266 , 0.0125043 , 0.91844296], [0.176933 , 0.095312 , 0.90726703], [0.169449 , 0.155723 , 0.431024 ], ..., [0. , 0. , 0. ], [0. , 0. , 0. ], [0. , 0. , 0. ]], [[0.0405676 , 0.0127635 , 0.91798997], [0.176866 , 0.0951155 , 0.90725404], [0.16938299, 0.155638 , 0.431013 ], ..., [0. , 0. , 0. ], [0. , 0. , 0. ], [0. , 0. , 0. ]], [[0.040665 , 0.0128088 , 0.917722 ], [0.17683001, 0.094931 , 0.907256 ], [0.16930701, 0.155543 , 0.431019 ], ..., [0. , 0. , 0. ], [0. , 0. , 0. ], [0. , 0. , 0. ]]], dtype=float32)}} --- ``` dataset = CustomDataset(f'data/{args.subject_people}/data_3d_custom_M{args.test_action}.npz') Prepare dict(): keypoints['customScale':customScale,'positions_2d':keypoints,'metadata':keypoints_metadata,'keypoints_symmetry':kps_left, kps_right] boneindex = [[16,15],[15,14],[13,12],[12,11],[10,9],[9,8],[8,7],[8,11],[8,14],[7,0],[3,2],[2,1],[6,5],[5,4],[1,0],[4,0]] cameras_valid, poses_valid, poses_valid_2d = fetch(subjects_test, action_filter) fetch 透過 subjects 和 action ，來找出 dict()裡的 camera 參數、3D keypoints、2D keypoints 回傳輸出: * np.save(f'{args.subject_people}_npy/{args.subject_people}_{args.test_action}', prediction) ``` ndarray(600, 17, 3) ``` run_poseformer 參數 ```shell -ta : --test-action -sp : --subject-people -d : -dataset -k : --keypoints -str : --subjects-train -ste : --subjects-test -a : --actions -c : --checkpoint -f : -frame ``` ### Human Pose 3.6m [參考資料](https://blog.csdn.net/alickr/article/details/107837403)