## 2D → 2D EP.2
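[Continued from the previous episode](https://hackmd.io/@q_bRZRi5Q2iTv2zdBQYQHg/SJEWSOdg-e)
HumanMAC originally does:
> images/conditions → generate/complete 3D motion sequences (probabilistic motion completion)

Now it is changed to:
> known 2D keypoints/motion sequence → predict the future 2D keypoints/motion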
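
To make the new input and output concrete, one way to build training pairs for this 2D → 2D setup is to cut each 2D keypoint sequence into an observed history and the future to be predicted. This is a minimal sketch that assumes sequences are stored as `(frames, joints, 2)` NumPy arrays. `make_windows`, `t_his`, `t_pred` and `stride` are hypothetical names, and the window lengths are not the settings used in this project.

```python
import numpy as np


def make_windows(seq: np.ndarray, t_his: int, t_pred: int, stride: int = 1):
    """Cut one 2D keypoint sequence of shape (T, J, 2) into (history, future) pairs.

    t_his, t_pred and stride are hypothetical parameters, not this project's settings.
    """
    total = t_his + t_pred
    pairs = []
    # Slide a window of t_his + t_pred frames over the sequence
    for start in range(0, seq.shape[0] - total + 1, stride):
        window = seq[start:start + total]
        # The first t_his frames are the observed input; the rest is the target to predict
        pairs.append((window[:t_his], window[t_his:]))
    return pairs
```

For example, a 10-frame sequence with `t_his=3`, `t_pred=2` and `stride=1` yields 6 overlapping (history, future) pairs.
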
---
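### Normalization pipeline
1. Load the raw 2D detection data.
2. Screen-coordinate normalization: for every camera view of every action of every subject, `normalize_screen_coordinates()` converts the 2D keypoints from pixel coordinates to normalized coordinates.
   x is mapped from [0, width] to [-1, 1]. y is divided by the same width, so [0, height] maps to [-height/width, height/width] and the aspect ratio is preserved (for the 1000×1002 cameras below this is about [-1.002, 1.002]).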
:::spoiler normalize_screen_coordinates
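```python
import numpy as np

# keypoints[subject][action] holds one keypoint array per camera view.
# dataset, args and normalize_screen_coordinates are defined elsewhere in the script.
for subject in keypoints.keys():
    for action in keypoints[subject]:
        for cam_idx, kps in enumerate(keypoints[subject][action]):
            # Camera whose resolution is used to normalize this view
            cam = dataset.cameras()[subject][cam_idx]
            # Optional Gaussian noise augmentation (in pixels), applied to the x/y channels only
            if args.std != 0:
                kps[..., :2] += np.random.normal(loc=0.0, scale=args.std, size=kps[..., :2].shape)
            # Pixel coordinates -> normalized screen coordinates
            kps[..., :2] = normalize_screen_coordinates(kps[..., :2], w=cam['res_w'], h=cam['res_h'])
            keypoints[subject][action][cam_idx] = kps
```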
:::
The normalization uses each camera's resolution, `cam['res_w']` and `cam['res_h']`; the camera intrinsics are normalized in the same way.
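
For reference, here is a minimal re-implementation of the mapping. It mirrors `normalize_screen_coordinates` and its inverse `image_coordinates` from VideoPose3D's `common/camera.py`, which this pipeline appears to reuse; check it against the project's own copy before relying on it.

```python
import numpy as np


def normalize_screen_coordinates(X: np.ndarray, w: int, h: int) -> np.ndarray:
    # x: [0, w] -> [-1, 1]; y is divided by the same w to keep the aspect ratio,
    # so y: [0, h] -> [-h/w, h/w]
    assert X.shape[-1] == 2
    return X / w * 2 - np.array([1.0, h / w])


def image_coordinates(X: np.ndarray, w: int, h: int) -> np.ndarray:
    # Inverse mapping: normalized coordinates back to pixels (e.g. for reporting errors in px)
    assert X.shape[-1] == 2
    return (X + np.array([1.0, h / w])) * w / 2
```

Because both axes are divided by `w`, a distance of d px becomes 2d/w in normalized units; with `res_w = 1000`, one normalized unit is 500 px. Errors computed on normalized outputs can therefore be converted back to pixels by multiplying by w/2, and the `args.std` noise (added in pixels before normalization) has a normalized standard deviation of `args.std * 2 / w`.
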
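:::spoiler camera intrinsics normalization
```python
# Normalize the camera intrinsics to match the normalized 2D keypoints.
# Principal point: same pixel -> normalized mapping as the keypoints
cam['center'] = normalize_screen_coordinates(cam['center'], w=cam['res_w'], h=cam['res_h']).astype('float32')
# Focal length (px) scaled by 2 / res_w, the same factor applied to both x and y
cam['focal_length'] = cam['focal_length']/cam['res_w']*2
```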
:::
:::spoiler h36m_cameras_intrinsic_params
```python
h36m_cameras_intrinsic_params = [
{
'id': '54138969',
'center': [512.54150390625, 515.4514770507812],
'focal_length': [1145.0494384765625, 1143.7811279296875],
'radial_distortion': [-0.20709891617298126, 0.24777518212795258, -0.0030751503072679043],
'tangential_distortion': [-0.0009756988729350269, -0.00142447161488235],
'res_w': 1000,
'res_h': 1002,
'azimuth': 70, # Only used for visualization
},
{
'id': '55011271',
'center': [508.8486328125, 508.0649108886719],
'focal_length': [1149.6756591796875, 1147.5916748046875],
'radial_distortion': [-0.1942136287689209, 0.2404085397720337, 0.006819975562393665],
'tangential_distortion': [-0.0016190266469493508, -0.0027408944442868233],
'res_w': 1000,
'res_h': 1000,
'azimuth': -70, # Only used for visualization
},
{
'id': '58860488',
'center': [519.8158569335938, 501.40264892578125],
'focal_length': [1149.1407470703125, 1148.7989501953125],
'radial_distortion': [-0.2083381861448288, 0.25548800826072693, -0.0024604974314570427],
'tangential_distortion': [0.0014843869721516967, -0.0007599993259645998],
'res_w': 1000,
'res_h': 1000,
'azimuth': 110, # Only used for visualization
},
{
'id': '60457274',
'center': [514.9682006835938, 501.88201904296875],
'focal_length': [1145.5113525390625, 1144.77392578125],
'radial_distortion': [-0.198384091258049, 0.21832367777824402, -0.008947807364165783],
'tangential_distortion': [-0.0005872055771760643, -0.0018133620033040643],
'res_w': 1000,
'res_h': 1002,
'azimuth': -110, # Only used for visualization
},
]
```
:::
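
Why scale the focal length by `2 / res_w`? For an ideal pinhole camera, $u = f_x \frac{X}{Z} + c_x$ in pixels. Normalizing gives $u' = \frac{2u}{w} - 1 = \frac{2 f_x}{w}\frac{X}{Z} + \left(\frac{2 c_x}{w} - 1\right)$, i.e. the same projection with a normalized focal length and principal point; $v$ works the same way with $f_y$, $c_y$ also divided by $w$ and offset $h/w$. So projecting with the normalized intrinsics lands directly in normalized screen coordinates. The sketch below checks this numerically for camera `54138969`. It ignores the radial/tangential distortion terms, and `project` and the sample points are illustrative.

```python
import numpy as np


def normalize_screen_coordinates(X, w, h):
    # Same pixel -> normalized mapping as the keypoints
    return X / w * 2 - np.array([1.0, h / w])


def project(points_3d, f, c):
    # Ideal pinhole projection without lens distortion: u = f * X / Z + c
    return f * points_3d[:, :2] / points_3d[:, 2:3] + c


# Camera 54138969 from h36m_cameras_intrinsic_params
cam = {
    'center': np.array([512.54150390625, 515.4514770507812]),
    'focal_length': np.array([1145.0494384765625, 1143.7811279296875]),
    'res_w': 1000,
    'res_h': 1002,
}

# Normalized intrinsics, computed the same way as in the camera normalization spoiler
center_n = normalize_screen_coordinates(cam['center'], w=cam['res_w'], h=cam['res_h'])
focal_n = cam['focal_length'] / cam['res_w'] * 2

# Arbitrary points in camera space (Z > 0)
pts = np.array([[0.1, -0.3, 4.0], [-0.5, 0.2, 5.5]])

# Path 1: project in pixels, then normalize
via_pixels = normalize_screen_coordinates(
    project(pts, cam['focal_length'], cam['center']), cam['res_w'], cam['res_h'])
# Path 2: project directly with the normalized intrinsics
via_normalized = project(pts, focal_n, center_n)
```

Both paths should agree up to floating-point error, which is why the focal length and principal point are rescaled together with the keypoints.
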
---
### Test results
Trained for 200 epochs
→ The loss is decreasing, but the results are only mediocre.


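* ADE (Average Displacement Error): average displacement error, ↓ lower is better
Measures the average distance between the predicted trajectory and the GT.
→ On average, each joint in each frame is off by roughly 80–100 px.
* FDE (Final Displacement Error): final-frame displacement error, ↓ lower is better
Measures the distance between the prediction and the GT at the last frame.
→ In the last frame, the predicted positions are off by 40–70 px on average.
* APD (Average Pairwise Diversity): diversity metric, ↑ higher is better
Measures how different the multiple samples generated by the model are from one another.
→ The generated samples differ from each other by 30–70 px on average.
→ There is some diversity, but not much.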
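
The three metrics can be sketched in NumPy as follows. This assumes predictions of shape `(K, T, J, 2)` (K sampled futures, T frames, J joints) and ground truth of shape `(T, J, 2)`, in the same units as the reported numbers (px). Taking the best of the K samples for ADE and FDE (each minimized independently) follows the common protocol in diverse motion prediction, but it is an assumption here; some codebases also take the norm over the whole flattened pose instead of averaging per joint, which gives different magnitudes. `ade_fde` and `apd` are hypothetical helper names.

```python
import numpy as np


def ade_fde(pred: np.ndarray, gt: np.ndarray) -> tuple[float, float]:
    """pred: (K, T, J, 2) sampled futures; gt: (T, J, 2). Returns best-of-K ADE and FDE."""
    # Euclidean error of every joint in every frame for every sample: (K, T, J)
    err = np.linalg.norm(pred - gt[None], axis=-1)
    ade_per_sample = err.mean(axis=(1, 2))    # average over frames and joints
    fde_per_sample = err[:, -1].mean(axis=1)  # last frame only, average over joints
    return float(ade_per_sample.min()), float(fde_per_sample.min())


def apd(pred: np.ndarray) -> float:
    """Mean per-joint, per-frame distance between every pair of distinct samples."""
    k = pred.shape[0]
    if k < 2:
        return 0.0
    # Pairwise per-joint distances between samples: (K, K, T, J)
    diff = np.linalg.norm(pred[:, None] - pred[None, :], axis=-1)
    pair_means = diff.mean(axis=(2, 3))  # (K, K)
    # Exclude the diagonal (a sample compared with itself)
    return float(pair_means[~np.eye(k, dtype=bool)].mean())
```

For example, with two samples where one is shifted by (3, 4) px everywhere relative to the other, `apd` returns 5.0.
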

:::spoiler



:::
:::spoiler

:::
:::spoiler epoch = 500



:::
:::spoiler epoch = 750



:::
---
### Visualization
{%youtube 0L695BQx60M %}
{%youtube k6pRfXNehRc %}