20251125 Code_國 ep.2-2

## 2D → 2D EP.2

[Continued from the previous post](https://hackmd.io/@q_bRZRi5Q2iTv2zdBQYQHg/SJEWSOdg-e)

HumanMAC was originally:
>From images / conditions → generate or complete a 3D motion sequence (probabilistic motion completion)

It has now been changed to:
>Given a 2D joint / motion sequence → predict future 2D joints / motion

---

### Normalization pipeline

1. Load the raw 2D detection data.
2. Screen-coordinate normalization: for every camera view of every action of every subject, call `normalize_screen_coordinates()` to convert the 2D coordinates from pixel coordinates to normalized coordinates. The normalization maps pixel-space coordinates in [0, width] and [0, height] to roughly the [-1, 1] range.

:::spoiler normalize_screen_coordinates
```python
for subject in keypoints.keys():
    for action in keypoints[subject]:
        for cam_idx, kps in enumerate(keypoints[subject][action]):
            # Normalize camera frame
            cam = dataset.cameras()[subject][cam_idx]
            # Optionally perturb the detections with Gaussian noise
            if args.std != 0:
                kps += np.random.normal(loc=0.0, scale=args.std, size=kps.shape)
            kps[..., :2] = normalize_screen_coordinates(kps[..., :2], w=cam['res_w'], h=cam['res_h'])
            keypoints[subject][action][cam_idx] = kps
```
:::

The normalization uses the camera resolution parameters `cam['res_w']` and `cam['res_h']`. The camera intrinsics are rescaled the same way:

:::spoiler
```python
# Normalize camera frame
cam['center'] = normalize_screen_coordinates(cam['center'], w=cam['res_w'], h=cam['res_h']).astype('float32')
cam['focal_length'] = cam['focal_length']/cam['res_w']*2
```
:::

:::spoiler h36m_cameras_intrinsic_params
```python
h36m_cameras_intrinsic_params = [
    {
        'id': '54138969',
        'center': [512.54150390625, 515.4514770507812],
        'focal_length': [1145.0494384765625, 1143.7811279296875],
        'radial_distortion': [-0.20709891617298126, 0.24777518212795258, -0.0030751503072679043],
        'tangential_distortion': [-0.0009756988729350269, -0.00142447161488235],
        'res_w': 1000,
        'res_h': 1002,
        'azimuth': 70,  # Only used for visualization
    },
    {
        'id': '55011271',
        'center': [508.8486328125, 508.0649108886719],
        'focal_length': [1149.6756591796875, 1147.5916748046875],
        'radial_distortion': [-0.1942136287689209, 0.2404085397720337, 0.006819975562393665],
        'tangential_distortion': [-0.0016190266469493508, -0.0027408944442868233],
        'res_w': 1000,
        'res_h': 1000,
        'azimuth': -70,  # Only used for visualization
    },
    {
        'id': '58860488',
        'center': [519.8158569335938, 501.40264892578125],
        'focal_length': [1149.1407470703125, 1148.7989501953125],
        'radial_distortion': [-0.2083381861448288, 0.25548800826072693, -0.0024604974314570427],
        'tangential_distortion': [0.0014843869721516967, -0.0007599993259645998],
        'res_w': 1000,
        'res_h': 1000,
        'azimuth': 110,  # Only used for visualization
    },
    {
        'id': '60457274',
        'center': [514.9682006835938, 501.88201904296875],
        'focal_length': [1145.5113525390625, 1144.77392578125],
        'radial_distortion': [-0.198384091258049, 0.21832367777824402, -0.008947807364165783],
        'tangential_distortion': [-0.0005872055771760643, -0.0018133620033040643],
        'res_w': 1000,
        'res_h': 1002,
        'azimuth': -110,  # Only used for visualization
    },
]
```
:::

---

### Test results

Trained for 200 epochs → the loss does go down, but the results are mediocre.

![loss_curve](https://hackmd.io/_uploads/Byc0dkz-Wx.png)
![image](https://hackmd.io/_uploads/rJDFIkf-bl.png)

* ADE (Average Displacement Error) ↓ lower is better
  Measures the average distance between the predicted trajectory and the ground truth (GT).
  → Each joint in each frame is off by roughly 80–100 px on average.
* FDE (Final Displacement Error) ↓ lower is better
  Measures the distance between the prediction and the GT at the last frame.
  → At the last frame, the predicted position is off by 40–70 px on average.
* APD (Average Pairwise Distance), a diversity metric ↑ higher is better
  Measures how different the multiple samples generated by the model are from one another.
  → The generated samples differ from each other by 30–70 px on average, so there is some diversity, but not much.

![image](https://hackmd.io/_uploads/ryqKWffWbx.png =40%x)

:::spoiler
![image](https://hackmd.io/_uploads/S1FGaezWZl.png)
![image](https://hackmd.io/_uploads/B157TeGbbl.png)
![image](https://hackmd.io/_uploads/BJVHTxzWWe.png)
:::

:::spoiler
![image](https://hackmd.io/_uploads/S1o5U1M-Zl.png)
:::

:::spoiler epoch = 500
![loss_curve2](https://hackmd.io/_uploads/ryjbKkMb-g.png)
![image](https://hackmd.io/_uploads/rJ48IJGWWl.png)
![image](https://hackmd.io/_uploads/B17n8kM-Wg.png)
:::

:::spoiler epoch = 750
![loss_curve3](https://hackmd.io/_uploads/ByhgNsM-Wg.png)
![image](https://hackmd.io/_uploads/HkppQoGZZg.png)
![image](https://hackmd.io/_uploads/BJ_14jfW-e.png)
:::

---

### Visualization

{%youtube 0L695BQx60M %}
{%youtube k6pRfXNehRc %}
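
The notes call `normalize_screen_coordinates()` but never show its body. Below is a minimal sketch of what such a function might look like. It assumes the common aspect-ratio-preserving convention, where both axes are scaled by `2 / w`. That would match the `focal_length / res_w * 2` scaling in the camera snippet, but it is not confirmed by these notes. Under this assumption x lands in [-1, 1] and y lands in [-h/w, h/w], which is only exactly [-1, 1] for square images. The `_sketch` function names are hypothetical.

```python
import numpy as np


def normalize_screen_coordinates_sketch(X, w, h):
    """Map pixel coordinates to normalized coordinates (assumed convention).

    x: [0, w] -> [-1, 1]
    y: [0, h] -> [-h/w, h/w]  (same scale factor as x, so aspect ratio is kept)
    """
    X = np.asarray(X, dtype=np.float64)
    assert X.shape[-1] == 2, "last axis must hold (x, y)"
    return X / w * 2 - np.array([1.0, h / w])


def to_pixel_coordinates_sketch(X, w, h):
    """Inverse of the sketch above: normalized coordinates back to pixels."""
    X = np.asarray(X, dtype=np.float64)
    assert X.shape[-1] == 2, "last axis must hold (x, y)"
    return (X + np.array([1.0, h / w])) * w / 2


if __name__ == "__main__":
    # Corners and center of a 1000 x 1002 image (resolution of camera 54138969)
    pts = np.array([[0.0, 0.0], [1000.0, 1002.0], [500.0, 501.0]])
    print(normalize_screen_coordinates_sketch(pts, w=1000, h=1002))
```

Mapping predictions back to pixels with the inverse is what would make errors readable in px, as in the ADE/FDE/APD figures quoted in the test results.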
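
The three metrics in the test results (ADE, FDE, APD) can be sketched as follows. This is an illustration, not the project's evaluation code. It assumes predictions are stored as an array of shape `(S, T, J, 2)` (samples, frames, joints, xy) in pixel units and the ground truth as `(T, J, 2)`. It averages the Euclidean error per joint and per frame so that the numbers read as pixels, in line with the interpretation given in the bullets. The actual evaluation script may aggregate differently, for example by taking the best sample or by using a whole-pose norm.

```python
import numpy as np


def ade(pred, gt):
    """Average Displacement Error per sample: mean joint error over all frames."""
    dist = np.linalg.norm(pred - gt[None], axis=-1)  # (S, T, J)
    return dist.mean(axis=(1, 2))                    # (S,)


def fde(pred, gt):
    """Final Displacement Error per sample: mean joint error at the last frame."""
    dist = np.linalg.norm(pred[:, -1] - gt[-1][None], axis=-1)  # (S, J)
    return dist.mean(axis=1)                                    # (S,)


def apd(pred):
    """Average pairwise distance between samples (diversity), per joint and frame."""
    num_samples = pred.shape[0]
    if num_samples < 2:
        return 0.0  # diversity is undefined for a single sample; report 0
    i, j = np.triu_indices(num_samples, k=1)           # all unordered pairs
    dist = np.linalg.norm(pred[i] - pred[j], axis=-1)  # (pairs, T, J)
    return float(dist.mean())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    gt = rng.uniform(0, 1000, size=(25, 17, 2))              # toy ground truth
    pred = gt[None] + rng.normal(0, 50, size=(10, 25, 17, 2))  # 10 noisy samples
    print("ADE:", ade(pred, gt).mean())
    print("FDE:", fde(pred, gt).mean())
    print("APD:", apd(pred))
```

Reporting the mean over samples, as in the usage lines, describes typical sample quality. Reporting the minimum over samples would give a best-of-many score instead, so it is worth stating which one a table uses.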