We are developing a training system for figure skating. The system provides coaching instructions to users without professional equipment or a 3D camera. It includes 2D video analysis, human skeleton tracking, pose detection, and instruction generation.
This document focuses on temporal video alignment. Code is available here.
System Structure
Get the Embedding Space Model
To compare the learner's video with the standard motion and find their differences, we must temporally align the two videos and automatically obtain the timestamp where the motion starts. Because labelled data are lacking, we implement a self-supervised representation learning method based on Temporal Cycle-Consistency (TCC) Learning[^TCC]. The method finds temporal correspondences between video pairs and aligns two similar videos using the resulting per-frame embeddings.
[^TCC]: Debidatta Dwibedi, Yusuf Aytar, Jonathan Tompson, Pierre Sermanet, and Andrew Zisserman. Temporal Cycle-Consistency Learning. In 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
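Once a per-frame embedding model is trained, cycle-consistency itself gives a simple way to match and check frames across two videos. The sketch below is a minimal illustration with NumPy, not TCC's released code; the function name, array shapes, and the plain nearest-neighbour matching are our assumptions. It maps each frame of one video to its nearest neighbour in the other, cycles back, and keeps only the matches that return to their starting frame:

```python
import numpy as np

def cycle_consistent_matches(u, v):
    """u, v: (num_frames, embed_dim) per-frame embeddings of two videos.

    Returns, for each frame of u, its nearest-neighbour frame index in v,
    plus a boolean mask marking matches that cycle back to the same frame."""
    # pairwise squared Euclidean distances between the two embedding sequences
    d_uv = ((u[:, None, :] - v[None, :, :]) ** 2).sum(axis=-1)
    nn_v = d_uv.argmin(axis=1)  # u -> v nearest neighbours
    nn_u = d_uv.argmin(axis=0)  # v -> u nearest neighbours
    # frame i of u is cycle-consistent if i -> j in v and j maps back to i
    consistent = nn_u[nn_v] == np.arange(len(u))
    return nn_v, consistent

# hypothetical example: v is roughly a shifted copy of u, so every
# frame should match its counterpart and survive the cycle check
u = np.array([[0.0], [1.0], [2.0]])
v = np.array([[0.1], [1.1], [2.1]])
nn_v, consistent = cycle_consistent_matches(u, v)
```

The cycle-consistent frame pairs form the temporal alignment between the two videos; in the actual TCC method, a differentiable (soft) version of this cycle-back check is what trains the embedding network in the first place.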
Dataset Preparation