# Tune-A-Video
* **Link:** [[pdf]](https://arxiv.org/pdf/2212.11565)
* **Authors:** Jay Zhangjie Wu et al.: NUS + Tencent
* **Comments:** ICCV 2023 [[project page]](https://tuneavideo.github.io/)
## Introduction
> New T2V generation setting—One-Shot Video Tuning, where only one text-video pair is presented.
- Built on a SoTA T2I diffusion model
- Observations
  - T2I models can generate still images that represent verb terms well
  - Extending T2I models to generate multiple images concurrently shows surprisingly good content consistency across frames
## Method
**Pipeline**

- **Model fine-tuning**:
  - Fine-tune on the given input video (a single text-video pair)
  - ST-Attn: fix $W^K$ and $W^V$ and only update $W^Q$, since we want to query relevant positions in previous frames (a minimal sketch follows this block)
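A minimal PyTorch sketch of this frozen/trainable split in a sparse-causal attention layer; the module name, tensor layout, and frame-indexing details are my own assumptions, not the official implementation:

```python
import torch
import torch.nn as nn

class SparseCausalAttention(nn.Module):
    """Sketch of the ST-Attn idea: queries come from the current frame, while keys
    and values are taken from the first and the previous frame; only W^Q trains."""

    def __init__(self, dim: int):
        super().__init__()
        self.to_q = nn.Linear(dim, dim, bias=False)   # W^Q: updated during fine-tuning
        self.to_k = nn.Linear(dim, dim, bias=False)   # W^K: frozen
        self.to_v = nn.Linear(dim, dim, bias=False)   # W^V: frozen
        self.to_k.requires_grad_(False)
        self.to_v.requires_grad_(False)
        self.scale = dim ** -0.5

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, frames, tokens, dim) -- per-frame spatial features (assumed layout)
        b, f, n, d = x.shape
        first = torch.zeros(f, dtype=torch.long)           # index of the first frame
        prev = torch.clamp(torch.arange(f) - 1, min=0)     # index of the previous frame
        kv = torch.cat([x[:, first], x[:, prev]], dim=2)   # (b, f, 2n, d)

        q = self.to_q(x)                                   # (b, f, n, d)
        k = self.to_k(kv)                                  # (b, f, 2n, d)
        v = self.to_v(kv)

        attn = torch.softmax(q @ k.transpose(-1, -2) * self.scale, dim=-1)
        return attn @ v                                    # (b, f, n, d)
```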

- **Inference**:
  - Incorporate structure guidance from the source video: obtain the initial noise by DDIM inversion of the source-video latents, then sample with the edited prompt (a sketch follows this list)
- **Application**:
  - Object editing: change the object in the text prompt (see the toy prompt edits after this list)
  - Style transfer: add a style description to the prompt
  - Personalized and controllable generation: swap in a DreamBooth or ControlNet T2I model as the base
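A hedged sketch of the structure-guidance step via DDIM inversion; `eps_model(x, t)` (the fine-tuned noise predictor) and `alphas_cumprod` (the cumulative noise schedule, as a tensor) are placeholder names, not the paper's code:

```python
import torch

@torch.no_grad()
def ddim_invert(latents: torch.Tensor, eps_model, alphas_cumprod: torch.Tensor,
                timesteps) -> torch.Tensor:
    """Run the DDIM update in reverse (low noise -> high noise) so the source-video
    latents are mapped to an initial noise that preserves their structure."""
    x = latents
    for t_prev, t in zip(timesteps[:-1], timesteps[1:]):      # ascending timesteps
        a_prev, a = alphas_cumprod[t_prev], alphas_cumprod[t]
        eps = eps_model(x, t_prev)                            # predicted noise at the less-noisy step
        x0 = (x - (1 - a_prev).sqrt() * eps) / a_prev.sqrt()  # estimate of the clean latent
        x = a.sqrt() * x0 + (1 - a).sqrt() * eps              # re-noise to the next step
    return x  # use as the starting latent when sampling with the edited prompt
```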
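A toy illustration of how the first two applications reduce to prompt edits; the prompts are made-up examples in the spirit of the paper's demos:

```python
source_prompt = "a man is skiing"                            # caption of the source video

object_edit = source_prompt.replace("a man", "Spider-Man")   # object editing
style_edit = source_prompt + ", cartoon style"               # style transfer

print(object_edit)   # Spider-Man is skiing
print(style_edit)    # a man is skiing, cartoon style
```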
## Misc
Possible implementation with egocentric video data
- Given a set of videos of an action, fine-tune on these videos
- Run inference with/without structure guidance