# Tune-A-Video
* **Link:** [[pdf]](https://arxiv.org/pdf/2212.11565)
* **Authors:** Jay Zhangjie Wu et al.: NUS + Tencent
* **Comments:** ICCV 2023 [[project page]](https://tuneavideo.github.io/)
## Introduction
> New T2V generation setting—One-Shot Video Tuning, where only one text-video pair is presented.
- Built on a SoTA T2I diffusion model
- Observations
  - T2I models can generate still images that represent verb terms well
  - Extending T2I models to generate multiple images concurrently shows surprisingly good content consistency across frames
## Method
**Pipeline**

- **Model fine-tuning**:
  - Fine-tune on the given input video (a single text-video pair)
  - ST-Attn: fix $W^K$ and $W^V$ and only update $W^Q$, since we want to query relevant positions in previous frames (a minimal sketch follows this block)
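A minimal PyTorch sketch of this frozen/trainable split in a sparse-causal attention layer; the module name, tensor layout, and frame-indexing details are my own assumptions, not the official implementation:

```python
import torch
import torch.nn as nn

class SparseCausalAttention(nn.Module):
    """Sketch of the ST-Attn idea: queries come from the current frame, while keys
    and values are taken from the first and the previous frame; only W^Q trains."""

    def __init__(self, dim: int):
        super().__init__()
        self.to_q = nn.Linear(dim, dim, bias=False)   # W^Q: updated during fine-tuning
        self.to_k = nn.Linear(dim, dim, bias=False)   # W^K: frozen
        self.to_v = nn.Linear(dim, dim, bias=False)   # W^V: frozen
        self.to_k.requires_grad_(False)
        self.to_v.requires_grad_(False)
        self.scale = dim ** -0.5

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, frames, tokens, dim) -- per-frame spatial features (assumed layout)
        b, f, n, d = x.shape
        first = torch.zeros(f, dtype=torch.long)           # index of the first frame
        prev = torch.clamp(torch.arange(f) - 1, min=0)     # index of the previous frame
        kv = torch.cat([x[:, first], x[:, prev]], dim=2)   # (b, f, 2n, d)

        q = self.to_q(x)                                   # (b, f, n, d)
        k = self.to_k(kv)                                  # (b, f, 2n, d)
        v = self.to_v(kv)

        attn = torch.softmax(q @ k.transpose(-1, -2) * self.scale, dim=-1)
        return attn @ v                                    # (b, f, n, d)
```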

- **Inference**:
  - Incorporate structure guidance from the source video: obtain the initial noise by DDIM inversion of the source-video latents, then sample with the edited prompt (a sketch follows this list)
- **Application**:
  - Object editing: change the object in the text prompt (see the toy prompt edits after this list)
  - Style transfer: add a style description to the prompt
  - Personalized and controllable generation: swap in a DreamBooth or ControlNet T2I model as the base
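A hedged sketch of the structure-guidance step via DDIM inversion; `eps_model(x, t)` (the fine-tuned noise predictor) and `alphas_cumprod` (the cumulative noise schedule, as a tensor) are placeholder names, not the paper's code:

```python
import torch

@torch.no_grad()
def ddim_invert(latents: torch.Tensor, eps_model, alphas_cumprod: torch.Tensor,
                timesteps) -> torch.Tensor:
    """Run the DDIM update in reverse (low noise -> high noise) so the source-video
    latents are mapped to an initial noise that preserves their structure."""
    x = latents
    for t_prev, t in zip(timesteps[:-1], timesteps[1:]):      # ascending timesteps
        a_prev, a = alphas_cumprod[t_prev], alphas_cumprod[t]
        eps = eps_model(x, t_prev)                            # predicted noise at the less-noisy step
        x0 = (x - (1 - a_prev).sqrt() * eps) / a_prev.sqrt()  # estimate of the clean latent
        x = a.sqrt() * x0 + (1 - a).sqrt() * eps              # re-noise to the next step
    return x  # use as the starting latent when sampling with the edited prompt
```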
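A toy illustration of how the first two applications reduce to prompt edits; the prompts are made-up examples in the spirit of the paper's demos:

```python
source_prompt = "a man is skiing"                            # caption of the source video

object_edit = source_prompt.replace("a man", "Spider-Man")   # object editing
style_edit = source_prompt + ", cartoon style"               # style transfer

print(object_edit)   # Spider-Man is skiing
print(style_edit)    # a man is skiing, cartoon style
```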
## Misc
Possible implementation with egocentric video data
- Given a set of videos of an action, fine-tune on these videos
- Run inference with/without structure guidance