Open source models

### Open source video generation models

| Model | Frames | Resolution | Method summary | Notes | Tags | Train on 40 GB |
| -------- | -------- | -------- | -------- | -------- | -------- | -------- |
| [Tune A Video](/eiHXFafUS8qgw6iuoC9IbQ) | Based on input vid | Based on SD resolution | One-shot tuning; Stable Diffusion | Transfer motion from video to another | text2vid | &check; |
| [ModelScope](/fqij_nHkTi-922xijZjTow) | 16 (3 fps) | 336 x 596 | Initialize from SD, tune temporal layers | | text2vid | &check; |
| [I2VGen-XL](/wCl_7R5zQlCAN_2FVQJlmQ) | 16 (3 fps) | 1024 x 576 | ModelScope with scaling resolution | Concat img with latent + cross-attn for cond img | (text+img)2vid | &check; |
| [VideoCrafter 1](/qkt99tAEQDWQdj4RStzRTA) | 2 s | 1024 x 576 | Temporal Transformer | | text2vid | &check; |
| [DynamiCrafter](/qkt99tAEQDWQdj4RStzRTA) | 2 s | 1024 x 576 | VC1 with input image | | (img+text)2vid | &check; |
| [VideoCrafter 2](/qkt99tAEQDWQdj4RStzRTA) | 2 s | 1024 x 576 | Use low res video and high res img | | text2vid | &check; |
| [SVD](/jhMRYQmpRJ2iKMiZSXkbsw) | 25 (6 fps) | 1024 x 576 | From SD | | img2vid | &check; |
| [EasyAnimate](/Gwlo_epWRNaPtP1Ke23VqA) | 144 (6 s, 24 fps) | 1024 x 576 | Diffusion Transformer; PixArt | | text2vid | _ |
| [OpenSora](/hdHj88A6T8KvuIMZUSX8Zw) | 144 (6 s, 24 fps) | 1280 x 720 | Diffusion Transformer; PixArt | | text2vid | _ |
| AnimateAnything | | | | | | _ |
| | | | | | | _ |
| Drag | | | | | | _ |
| Physical | | | | | | _ |
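
The Frames column mixes frame counts, frame rates, and durations, so it can help to put the entries on one scale. The snippet below is a minimal sketch that converts a frame count and frame rate into clip length in seconds. The helper name `clip_duration_seconds` is hypothetical, and the sample values are copied from the table rather than from the model repositories.

```python
def clip_duration_seconds(num_frames: int, fps: float) -> float:
    """Return the clip length in seconds for a frame count and frame rate."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return num_frames / fps


# (frames, fps) pairs as listed in the table; not verified against the model repos.
listed = {
    "ModelScope": (16, 3),
    "SVD": (25, 6),
    "OpenSora": (144, 24),
}

for name, (frames, fps) in listed.items():
    print(f"{name}: {clip_duration_seconds(frames, fps):.1f} s")
```

Under these listed values, the OpenSora entry works out to 6 s, which matches the duration given in its row.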