---
tags: 生物辨識
---
# Spatial Temporal Graph Convolutional Networks for Skeleton-Based Action Recognition

Represents the moving joints (points) as a time series, combining information from the spatial GCN `(spatial edges)` and the temporal TCN `(temporal edges)`.
## Background

> The left side can be viewed as structured data, the right side as unstructured data.
- Convolution
$$
f(X)=XW
$$
- Graph convolution
$$
f(X)=AXW
$$
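The matrix $A$ can be viewed as the adjacency matrix of the graph: multiplying by $A$ makes each node aggregate the features of its neighbors before the weights $W$ are applied.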
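To make the difference between $XW$ and $AXW$ concrete, here is a minimal NumPy sketch. The 3-node chain graph, the scalar features, and the function names `conv` and `graph_conv` are assumptions chosen for illustration, not something taken from the paper.

```python
import numpy as np

# Toy graph (assumed for illustration): a 3-node chain 0 - 1 - 2.
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)

# One scalar feature per node and a 1x1 identity weight,
# so the effect of A is easy to read off.
X = np.array([[1.0], [2.0], [3.0]])
W = np.array([[1.0]])

def conv(X, W):
    """f(X) = XW: every node is transformed independently."""
    return X @ W

def graph_conv(A, X, W):
    """f(X) = AXW: each node first sums its neighbors' features."""
    return A @ X @ W

print(conv(X, W).ravel())           # [1. 2. 3.]
print(graph_conv(A, X, W).ravel())  # [2. 4. 2.]
```

Node 1 ends up with `4.0` because it sums the features of nodes 0 and 2. Without self-loops, a node's own feature is dropped, which is one motivation for the $A+I$ term used in ST-GCN.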

## Implementing ST-GCN (near the end of the paper)
$$
f_{out}=\Lambda^{-\frac{1}{2}}(A+I)\Lambda^{-\frac{1}{2}}f_{in}W
$$
> Compare with the symmetric normalized Laplacian $D^{-\frac{1}{2}}LD^{-\frac{1}{2}}$
:::info
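The Laplacian is commonly used in edge detection; here it likewise describes the relationship between the signal at a node and the signals at its neighbors.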
:::
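The normalized propagation rule can be sketched in a few lines of NumPy. This assumes $\Lambda$ is the diagonal degree matrix of $A+I$, i.e. $\Lambda^{ii}=\sum_j(A^{ij}+I^{ij})$, which these notes do not spell out. The function names and the toy graph are hypothetical.

```python
import numpy as np

def normalized_adjacency(A):
    """Return Lambda^{-1/2} (A + I) Lambda^{-1/2}.

    Assumption: Lambda_ii = sum_j (A + I)_ij, the degree including the self-loop.
    """
    A_hat = A + np.eye(A.shape[0])      # add self-loops
    deg = A_hat.sum(axis=1)             # diagonal of Lambda, always >= 1
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    # Scale row i by deg_i^{-1/2} and column j by deg_j^{-1/2}.
    return d_inv_sqrt[:, None] * A_hat * d_inv_sqrt[None, :]

def st_gcn_spatial_step(A, f_in, W):
    """f_out = Lambda^{-1/2} (A + I) Lambda^{-1/2} f_in W for one frame."""
    return normalized_adjacency(A) @ f_in @ W

# Toy 3-joint chain 0 - 1 - 2 (assumed for illustration).
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)
A_norm = normalized_adjacency(A)
f_out = st_gcn_spatial_step(A, np.ones((3, 2)), np.eye(2))
print(A_norm.round(3))
print(f_out.shape)  # (3, 2)
```

The normalization keeps the result symmetric and prevents joints with many neighbors from producing disproportionately large activations.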
## Construction
spatial temporal graph `G = (V, E)`
$$
V = \{v_{ti}|t=1,...,T,i=1,...,N\}
$$
`N` joints and `T` frames
$$
E_S = \{v_{ti}v_{tj}|(i,j)\in H\}
$$
where $H$ is the set of naturally connected human body joints.
$$
E_F = \{v_{ti}v_{(t+1)i}\}
$$
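The node and edge sets above can be enumerated directly. A minimal sketch in plain Python, assuming 0-based indices (the notes use 1-based) and a hypothetical 3-joint skeleton; `build_st_graph` and `bones` are illustrative names, with `bones` playing the role of $H$.

```python
def build_st_graph(num_joints, num_frames, bones):
    """Return (V, E_S, E_F) for a skeleton sequence.

    Nodes are (t, i) pairs: frame index t, joint index i (both 0-based here).
    `bones` is the list of naturally connected joint pairs, i.e. the set H.
    """
    V = [(t, i) for t in range(num_frames) for i in range(num_joints)]
    # Spatial edges: bones inside each frame.
    E_S = [((t, i), (t, j)) for t in range(num_frames) for (i, j) in bones]
    # Temporal edges: the same joint in consecutive frames.
    E_F = [((t, i), (t + 1, i))
           for t in range(num_frames - 1) for i in range(num_joints)]
    return V, E_S, E_F

# Hypothetical skeleton: 3 joints in a chain, 4 frames.
bones = [(0, 1), (1, 2)]
V, E_S, E_F = build_st_graph(num_joints=3, num_frames=4, bones=bones)
print(len(V), len(E_S), len(E_F))  # 12 8 9
```

With $N$ joints, $T$ frames and $|H|$ bones this gives $NT$ nodes, $T|H|$ spatial edges and $N(T-1)$ temporal edges.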
## Spatial Graph Convolutional Neural Network
$$
f_{out}(\mathbf{x}) = \sum^K_{h=1}\sum^K_{w=1}f_{in}(\mathbf{p}(\mathbf{x},h,w))\cdot \mathbf{w}(h,w)
$$
$f_{in}(\mathbf{p}(\mathbf{x},h,w))$: fetches the input features at position $\mathbf{p}(\mathbf{x},h,w)$
$\mathbf{w}$: the per-channel weights
### Sampling function
The sampling function is defined on the neighbor set of node $v_{ti}$:
$$
\mathbf{p}(v_{ti},v_{tj})=v_{tj}
$$
The neighbor set of $v_{ti}$, i.e. all joints within $D$ hops:
$$
B(v_{ti})=\{v_{tj}|d(v_{tj},v_{ti})\leq D\}
$$
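One way to compute $d(v_{tj},v_{ti})$ and $B(v_{ti})$ is through hop distances on the skeleton graph. The sketch below assumes $d$ is the shortest path length in hops within a frame; `hop_distance`, `neighbor_set`, and the 5-joint chain are hypothetical names and data.

```python
import numpy as np

def hop_distance(A):
    """All-pairs shortest path length in hops; np.inf if unreachable."""
    n = A.shape[0]
    dist = np.full((n, n), np.inf)
    step = (A > 0).astype(int)
    reach = np.eye(n, dtype=int)     # pairs joined by a walk of length 0
    for d in range(n):
        # The first d at which a pair becomes reachable is its hop distance.
        newly = (reach > 0) & np.isinf(dist)
        dist[newly] = d
        reach = ((reach @ step) > 0).astype(int)
    return dist

def neighbor_set(dist, i, D=1):
    """B(v_i) = { v_j : d(v_j, v_i) <= D } for a single frame."""
    return [j for j in range(dist.shape[0]) if dist[j, i] <= D]

# Hypothetical 5-joint chain 0 - 1 - 2 - 3 - 4.
A = np.zeros((5, 5))
for i in range(4):
    A[i, i + 1] = A[i + 1, i] = 1

dist = hop_distance(A)
print(neighbor_set(dist, 2, D=1))  # [1, 2, 3]
print(neighbor_set(dist, 2, D=2))  # [0, 1, 2, 3, 4]
```

With $D=1$ the neighbor set is the joint itself plus its directly connected joints.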
### Weight function
$$
\mathbf{w}(v_{ti},v_{tj})=\mathbf{w}'(l_{ti}(v_{tj}))
$$
Partition $B(v_{ti})$ into $K$ subsets with a labeling map:
$l_{ti}:B(v_{ti})\to\{0,...,K-1\}$
### Spatial Graph Convolution
$$
f_{out}(v_{ti}) = \sum_{v_{tj}\in B(v_{ti})} \frac{1}{Z_{ti}(v_{tj})}f_{in}(v_{tj})\cdot \mathbf{w}(l_{ti}(v_{tj}))
$$
$Z_{ti}(v_{tj})=|\{v_{tk}|l_{ti}(v_{tk})=l_{ti}(v_{tj})\}|$
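The formula can be evaluated literally with loops, which makes the role of $Z_{ti}$ easy to see. This is a sketch for a single frame, not the paper's implementation: the signature, the array layout, and the toy data are assumptions.

```python
import numpy as np

def spatial_graph_conv(f_in, neighbors, label, weights):
    """Loop-based spatial graph convolution for one frame.

    f_in:      (N, C_in) joint features
    neighbors: neighbors[i] is the list of joint indices in B(v_i)
    label:     label(i, j) -> int in {0, ..., K-1}, the map l_i(v_j)
    weights:   (K, C_in, C_out), one weight matrix per subset
    """
    N = f_in.shape[0]
    f_out = np.zeros((N, weights.shape[2]))
    for i in range(N):
        labels = {j: label(i, j) for j in neighbors[i]}
        # Z_i(v_j): number of neighbors sharing v_j's subset label.
        counts = {}
        for k in labels.values():
            counts[k] = counts.get(k, 0) + 1
        for j, k in labels.items():
            f_out[i] += (f_in[j] @ weights[k]) / counts[k]
    return f_out

# Toy chain 0 - 1 - 2 with D = 1 (assumed for illustration).
neighbors = [[0, 1], [0, 1, 2], [1, 2]]
label = lambda i, j: 0 if i == j else 1   # root vs. 1-hop neighbors, K = 2
f_in = np.array([[1.0], [2.0], [3.0]])
weights = np.ones((2, 1, 1))

out = spatial_graph_conv(f_in, neighbors, label, weights)
print(out.ravel())  # [3. 4. 5.]
```

For joint 1 the two 1-hop neighbors share a subset, so their contributions are averaged: $2 + (1+3)/2 = 4$. The normalization balances subsets of different sizes.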
### Spatial Temporal Modeling
$$
B(v_{ti})=\{v_{qj}|d(v_{tj},v_{ti})\leq K,|q-t|\leq \lfloor\Gamma/2\rfloor \}
$$
The parameter $\Gamma$ controls the temporal range included in the neighbor set, so it acts as the temporal kernel size.
$$
l_{ST}(v_{qj})=l_{ti}(v_{tj})+(q-t+\lfloor\Gamma/2\rfloor)\times K
$$
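A small sketch of the spatial temporal label map, to show that every (frame offset, spatial label) pair gets its own index. The function name `st_label` and the values $\Gamma=3$, $K=3$, $t=10$ are arbitrary assumptions.

```python
def st_label(spatial_label, q, t, gamma, K):
    """l_ST(v_qj) = l_ti(v_tj) + (q - t + floor(gamma / 2)) * K."""
    half = gamma // 2
    if abs(q - t) > half:
        raise ValueError("frame q is outside the temporal window of frame t")
    return spatial_label + (q - t + half) * K

gamma, K, t = 3, 3, 10
labels = [st_label(s, q, t, gamma, K)
          for q in (t - 1, t, t + 1)   # frames inside the window
          for s in range(K)]           # spatial labels 0 .. K-1
print(labels)  # [0, 1, 2, 3, 4, 5, 6, 7, 8]
```

The $\Gamma \times K$ labels are distinct, so each combination of temporal offset and spatial subset can have its own weight.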
### Partition Strategies

- (b) Uni-labeling
$K=1$ and $l_{ti}(v_{tj})=0$
- \(c\) Distance partitioning
$K=2$ and $l_{ti}(v_{tj})=d(v_{tj}, v_{ti})$
- (d) Spatial configuration partitioning
$K=3$ and
$$l_{ti}(v_{tj})=
\begin{cases}
0& \text{if } r_j=r_i\\
1& \text{if } r_j<r_i\\
2& \text{if } r_j>r_i
\end{cases}
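$$
where $r_i$ is the average distance from the skeleton's center of gravity to joint $i$ over all frames in the training set.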
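The three labeling maps can be written side by side. This sketch assumes $D=1$, takes $d$ as the hop distance to the root joint, and takes $r$ as a joint's average distance to the skeleton's center of gravity. The function names and the values in `r` are made up for illustration.

```python
def uni_label(d_ij, r_i, r_j):
    """(b) Uni-labeling: K = 1, every neighbor gets label 0."""
    return 0

def distance_label(d_ij, r_i, r_j):
    """(c) Distance partitioning: K = 2 with D = 1, label = hop distance."""
    return d_ij

def spatial_config_label(d_ij, r_i, r_j):
    """(d) Spatial configuration partitioning: K = 3."""
    if r_j == r_i:
        return 0                      # same distance as the root (incl. the root)
    return 1 if r_j < r_i else 2      # closer to / farther from the center

# Hypothetical data: 3-joint chain, root joint 1, joint 0 nearest the center.
r = [0.0, 1.0, 2.0]
root = 1
hops = {0: 1, 1: 0, 2: 1}             # joint index -> hop distance to the root

results = {
    fn.__name__: {j: fn(d, r[root], r[j]) for j, d in hops.items()}
    for fn in (uni_label, distance_label, spatial_config_label)
}
print(results["uni_label"])             # {0: 0, 1: 0, 2: 0}
print(results["distance_label"])        # {0: 1, 1: 0, 2: 1}
print(results["spatial_config_label"])  # {0: 1, 1: 0, 2: 2}
```

Only the spatial configuration strategy tells the two 1-hop neighbors apart, separating the joint nearer the center from the one farther away.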
## Advantages
- Small input size (joints only)
- Low noise (no background)
- Models spatial and temporal relationships jointly