Fall Event Recognition Project

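# Fall Event Recognition Project

> Kyungpook National University Hospital / 2021.08 ~ 2022.05 (10 months)

![](https://i.imgur.com/lIYiw1z.jpg)

## Project Summary

- Implemented a 3D CNN-based binary classifier to detect fall events.
- Implemented feature-level HRD (Human ROI Disentanglement) by training a neural network that dynamically infers the background and human components from CNN features.
- Feature-level HRD effectively reduced background noise, which degrades the model's ability to capture human motion.
- Achieved F1 scores of 98.96% and 99.60% on the public benchmarks FDD and URFD, respectively, exceeding the state of the art.
- Wrote one domestic conference paper, which received the [Outstanding Paper Award](https://drive.google.com/file/d/1tu09KzkjSHLgAt02Jm31ynDnYPBC7771/view?usp=sharing) at the Korea Multimedia Society (KMMS) Spring Conference.
- KMMS Spring Conference paper: [link](https://drive.google.com/file/d/1wsxKqQtdzc5l_TPqZ9QP0CwKoZkBWXdR/view?usp=sharing)

**Code**: [link](https://github.com/youhs4554/disjrnet-pytorch); **Notion**: [link](https://www.notion.so/Fall-down-Recognition-1f905e10fa3d47cebea18377ccf96639)

## Development Goals

- Overcome the limitations of the ROI extraction methods used in existing fall recognition approaches and improve generalization performance.
- Research and develop a deep learning model that can be used to recognize patient falls in real hospital rooms, validated through evaluation on public benchmarks.

## Tech Stack

- PyTorch, PyTorch Lightning
- TorchVision, OpenCV
- Video Understanding, Human Behavior Recognition, 3D CNNs

## Development Details

### Feature-level HRD (Human ROI Disentanglement)

**HRD**: The process of extracting the human ROI with a localization algorithm (detection, segmentation, etc.); a preprocessing method widely used in prior work.

> However, because the localization algorithm cannot be optimized jointly with the main task of fall classification, generalization performance is limited.

![](https://i.imgur.com/ppdP6QA.png)

**Feature-level HRD** (proposed)

> To address this limitation, we introduced a process called `feature level HRD`, in which a neural network separates the convolutional features into human and background components. In other words, the HRD step that used to happen outside the model is implemented and optimized at the feature level inside the fall recognition model.

![](https://i.imgur.com/ZM82zoK.png)

### Feature-level HRD Implementation

> A neural network $\mathcal{G}$ separates the CNN feature $x$ into a human component ($x_{hm}$) and a background component ($x_{bg}$). So that the two components behave in a mutually exclusive way, they are defined as follows: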
$$
x_{hm}=\mathcal{G}(x), \quad x_{bg}=|x-x_{hm}|
$$

```python
import torch
import torch.nn as nn


class NonlinearDecomposer(nn.Module):
    def __init__(self, num_features, dimension=2):
        super().__init__()
        # Select nn.Conv2d or nn.Conv3d according to `dimension`
        conv_builder = getattr(nn, "Conv" + str(dimension) + "d")
        # Decomposer network (uses SE_Block, defined elsewhere in the project code)
        self.decomposer = nn.Sequential(
            SE_Block(num_features, reduction_ratio=16, dimension=dimension),
            conv_builder(num_features, num_features, kernel_size=1),
            nn.ReLU(True),
        )

    def forward(self, x):
        x_fg = self.decomposer(x)    # human component (x_hm in the text)
        x_bg = torch.abs(x - x_fg)   # background component (x_bg in the text)
        return x_fg, x_bg
```

### Gated Residual Fusion Implementation

> To determine the contributions of $x_{hm}$ and $x_{bg}$ flexibly, the two are soft-fused through a learnable gate $\rho\in[0,1]$. Because the fused result is added back to the input as a residual, we named this process Gated Residual Fusion. The fusion output $x_{fusion}$ is computed as follows:

$$
x_{fusion}=x+\text{Conv}^{1\times 1\times 1}(\rho x_{hm}+(1-\rho) x_{bg})
$$

```python
from torch.nn import Parameter

def __init__(self, ...):
    # Per-channel gate; shape is (1, C, 1, 1, 1) when dimension=3
    self.gate = Parameter(torch.Tensor(1, num_features, *[1] * dimension),
                          requires_grad=True)  # trainable
    self.gate.data.fill_(0.5)                  # initialized to 0.5
    setattr(self.gate, 'is_gate', True)        # tag used to find the gate when clamping

def fusion(self, x_fg, x_bg):
    # Soft fusion with the trainable gate
    x_hat = self.gate * x_fg + (1 - self.gate) * x_bg
    out = self.affine(x_hat)  # 1x1x1 conv
    return out

def forward(self, x):
    ...
    # Gated residual fusion
    out = self.fusion(x_fg, x_bg) + x
    return out
```

### Training the Trainable Gate

> The learnable gate $\rho$ is updated as follows, where $\eta$ is the learning rate and $\Delta\rho$ denotes the gradient of the loss with respect to $\rho$. After each optimizer step, the gate is clamped back into $[0,1]$:

$$
\rho \leftarrow \text{clamp}_{[0,1]}(\rho-\eta \Delta \rho)
$$

```python
# 1. Training step (PyTorch Lightning manual optimization)
self.manual_backward(loss)
opt.step()
opt.zero_grad()

# 2. Apply clamping, forcing every gate to stay in the range [0, 1]
params_at_gate = [
    p for p in self.model.parameters() if getattr(p, 'is_gate', False)
]
for p in params_at_gate:
    p.data.clamp_(min=0, max=1)
```

### DisJR_Module

> The computation block that performs feature-level HRD and Gated Residual Fusion is named the **DisJ**ointed **R**epresentation **Module** (`DisJR_Module`). The `DisJR_Module` code is as follows.
(The full code is available [here](https://github.com/youhs4554/disjrnet-pytorch/blob/d05e432c2641997dc3c7d2e30e7b1c589c19c3f5/disjrnet/model/models.py#L61).)

```python
class DisJR_Module(nn.Module):
    def __init__(self, ...):
        # ...

    def fusion(self, x_fg, x_bg):
        # Soft fusion with the trainable gate
        x_hat = self.gate * x_fg + (1 - self.gate) * x_bg
        out = self.affine(x_hat)  # 1x1x1 conv
        return out

    def forward(self, x):
        # Feature-level HRD using the neural network 'decomposer'
        x_fg, x_bg = self.decomposer(x)
        # Gated Residual Fusion
        out = self.fusion(x_fg, x_bg) + x
        return out
```

### Model Architecture

> The full model is built by repeatedly inserting a `DisJR_Module` at each layer of the [r2plus1d-18](https://arxiv.org/abs/1711.11248) model.
> (The model code is available [here](https://github.com/youhs4554/disjrnet-pytorch/blob/d05e432c2641997dc3c7d2e30e7b1c589c19c3f5/disjrnet/model/models.py#L127).)

![](https://i.imgur.com/A7XWNie.png)

### Results

> We compared performance against SOTA models on the public fall benchmark datasets `Le2i FDD` and `URFD`. The proposed model outperformed the SOTA models on both.

<center> Comparison with SOTA algorithms </center>

![](https://i.imgur.com/8RawU02.jpg)

<center> Balanced accuracy comparison </center>

![](https://i.imgur.com/7Zktkya.png)

:::info
It is notable that these results were obtained without side information such as detection boxes, masks, or body pose.
:::

### Transition Boundary Test

> We compared the models' softmax outputs at the transition boundaries where the event changes from non-fall$\rightarrow$fall$\rightarrow$non-fall. The proposed method detected the fall boundaries more accurately.

![](https://i.imgur.com/OhHKB0m.png)

### Grad-CAM Visualization

> We visualized feature contributions for the fall class using [Grad-CAM](https://openaccess.thecvf.com/content_ICCV_2017/papers/Selvaraju_Grad-CAM_Visual_Explanations_ICCV_2017_paper.pdf). The visualization shows that the proposed method effectively reflects human features in its decision making.

![](https://i.imgur.com/iPBhHzs.jpg)

:::info
- The redder the heatmap, the more important that region is to the decision.
- Feature-level HRD separates the human and background components appropriately.
- In addition, Gated Residual Fusion reflects human features more effectively.
:::
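The Grad-CAM heatmaps above could be produced along the following lines. This is a minimal sketch under stated assumptions, not the project's visualization code: `TinyVideoNet` is a hypothetical stand-in for the fall classifier, `grad_cam_3d` is an illustrative helper, and treating index 1 as the fall class is assumed. The actual layer choice and post-processing may differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TinyVideoNet(nn.Module):
    """Hypothetical stand-in for the fall classifier (not the project's model)."""

    def __init__(self, num_classes=2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(3, 8, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv3d(8, 16, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
        )
        self.head = nn.Linear(16, num_classes)

    def forward(self, x):
        feat = self.features(x)                        # (N, C, T, H, W)
        logits = self.head(feat.mean(dim=(2, 3, 4)))   # global average pooling
        return logits, feat


def grad_cam_3d(model, clip, class_idx):
    """Return a Grad-CAM heatmap in [0, 1] with the clip's (T, H, W) size."""
    model.eval()
    logits, feat = model(clip)
    # Gradient of the target class score with respect to the feature map
    grads = torch.autograd.grad(logits[:, class_idx].sum(), feat)[0]
    # Channel importance: gradients averaged over time and space
    weights = grads.mean(dim=(2, 3, 4), keepdim=True)
    # Weighted sum of feature maps; ReLU keeps only positive evidence
    cam = F.relu((weights * feat).sum(dim=1, keepdim=True))
    # Upsample to the input resolution and normalize to [0, 1] per clip
    cam = F.interpolate(cam, size=clip.shape[2:], mode="trilinear",
                        align_corners=False)
    cam = cam - cam.amin(dim=(2, 3, 4), keepdim=True)
    cam = cam / (cam.amax(dim=(2, 3, 4), keepdim=True) + 1e-8)
    return cam.detach()


if __name__ == "__main__":
    clip = torch.randn(1, 3, 8, 32, 32)   # (N, C, T, H, W), random dummy clip
    cam = grad_cam_3d(TinyVideoNet(), clip, class_idx=1)  # 1 = assumed fall class
    print(cam.shape)
```

Overlaying `cam[0, 0, t]` on frame `t` with a color map gives a per-frame heatmap in which higher values mark regions that contribute more to the chosen class score.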