Video Generation

- Survey
- Generation
- Animation
- Evaluation
- Detection
- Alignment
- Auto Regressive
- Editing
- Datasets
- Toolkits
- Tutorials
- Blog
- Products
- Misc

Survey

Generation

HunyuanVideo speed-up with the TeaCache method has been added to my ComfyUI toolset 𝕏
TransPixar: Advancing Text-to-Video Generation with Transparency, arXiv, 2501.03006, arxiv, pdf, citation: -1

Luozhou Wang, Yijun Li, Zhifei Chen, ..., Zhe Lin, Yingcong Chen · (wileewang.github) · (huggingface)
Diffusion as Shader: 3D-aware Video Diffusion for Versatile Video Generation Control, arXiv, 2501.03847, arxiv, pdf, citation: -1

Zekai Gu, Rui Yan, Jiahao Lu, ..., Wenping Wang, Yuan Liu · (DiffusionAsShader - IGL-HKUST)
Magic Mirror: ID-Preserved Video Generation in Video Diffusion Transformers, arXiv, 2501.03931, arxiv, pdf, citation: -1

Yuechen Zhang, Yaoyang Liu, Bin Xia, ..., Eric Lo, Jiaya Jia · (julianjuaner.github) · (MagicMirror - dvlab-research)
LTX-Video: Realtime Video Latent Diffusion, arXiv, 2501.00103, arxiv, pdf, citation: -1

Yoav HaCohen, Nisan Chiprut, Benny Brazowski, ..., Zeev Melumian, Ofir Bibi · (LTX-Video - Lightricks)
Open-Sora: Democratizing Efficient Video Production for All, arXiv, 2412.20404, arxiv, pdf, citation: -1

Zangwei Zheng, Xiangyu Peng, Tianji Yang, ..., Tianyi Li, Yang You · (Open-Sora - hpcaitech)
musubi-tuner - kohya-ss

· (civitai) · (bilibili)
Autoregressive Video Generation without Vector Quantization, arXiv, 2412.14169, arxiv, pdf, citation: -1

Haoge Deng, Ting Pan, Haiwen Diao, ..., Yonggang Qi, Xinlong Wang · (NOVA - baaivision)
DirectorLLM for Human-Centric Video Generation, arXiv, 2412.14484, arxiv, pdf, citation: -1

Kunpeng Song, Tingbo Hou, Zecheng He, ..., Ahmed Elgammal, Felix Juefei-Xu
DiTCtrl: Exploring Attention Control in Multi-Modal Diffusion Transformer for Tuning-Free Multi-Prompt Longer Video Generation, arXiv, 2412.18597, arxiv, pdf, citation: -1

Minghong Cai, Xiaodong Cun, Xiaoyu Li, ..., Ying Shan, Xiangyu Yue · (onevfall.github) · (DiTCtrl - TencentARC)
LinGen: Towards High-Resolution Minute-Length Text-to-Video Generation with Linear Computational Complexity, arXiv, 2412.09856, arxiv, pdf, citation: -1

Hongjie Wang, Chih-Yao Ma, Yen-Cheng Liu, ..., Niraj K. Jha, Xiaoliang Dai · (lineargen.github)
🌟 FastVideo - hao-ai-lab

· (huggingface) · (huggingface) · (𝕏)
🌟 STIV: Scalable Text and Image Conditioned Video Generation, arXiv, 2412.07730, arxiv, pdf, citation: -1

Zongyu Lin, Wei Liu, Chen Chen, ..., Kai-Wei Chang, Yinfei Yang
Mobile Video Diffusion, arXiv, 2412.07583, arxiv, pdf, citation: -1

Haitam Ben Yahia, Denis Korzhenkov, Ioannis Lelekas, ..., Amir Ghodrati, Amirhossein Habibian · (qualcomm-ai-research.github)
🌟 SynCamMaster: Synchronizing Multi-Camera Video Generation from Diverse Viewpoints, arXiv, 2412.07760, arxiv, pdf, citation: -1

Jianhong Bai, Menghan Xia, Xintao Wang, ..., Pengfei Wan, Di Zhang · (jianhongbai.github)
GenMAC: Compositional Text-to-Video Generation with Multi-Agent Collaboration, arXiv, 2412.04440, arxiv, pdf, citation: -1

Kaiyi Huang, Yukun Huang, Xuefei Ning, ..., Yu Wang, Xihui Liu · (karine-h.github) · (arxiv)
Genie 2: A large-scale foundation world model

· (𝕏)
Mimir: Improving Video Diffusion Models for Precise Text Understanding, arXiv, 2412.03085, arxiv, pdf, citation: -1

Shuai Tan, Biao Gong, Yutong Feng, ..., Jingdong Chen, Ming Yang · (lucaria-academy.github)
Open-Sora Plan: Open-Source Large Video Generation Model, arXiv, 2412.00131, arxiv, pdf, citation: -1

Bin Lin, Yunyang Ge, Xinhua Cheng, ..., Yonghong Tian, Li Yuan · (Open-Sora-Plan - PKU-YuanGroup)
🌟 VideoGen-of-Thought: A Collaborative Framework for Multi-Shot Video Generation, arXiv, 2412.02259, arxiv, pdf, citation: -1

Mingzhe Zheng, Yongqi Xu, Haojian Huang, ..., Harry Yang, Ser-Nam Lim · (cheliosoops.github)
HunyuanVideo - Tencent

A Systematic Framework For Large Video Generation Model Training · (HunyuanVideo - Tencent)
Spatiotemporal Skip Guidance for Enhanced Video Diffusion Sampling, arXiv, 2411.18664, arxiv, pdf, citation: -1

Junha Hyung, Kinam Kim, Susung Hong, ..., Min-Jung Kim, Jaegul Choo · (junhahyung.github) · (STGuidance - junhahyung)
Timestep Embedding Tells: It's Time to Cache for Video Diffusion Model, arXiv, 2411.19108, arxiv, pdf, citation: -1

Feng Liu, Shiwei Zhang, Xiaofeng Wang, ..., Qixiang Ye, Fang Wan · (liewfeng.github) · (TeaCache - LiewFeng)
VideoRepair: Improving Text-to-Video Generation via Misalignment Evaluation and Localized Refinement, arXiv, 2411.15115, arxiv, pdf, citation: -1

Daeun Lee, Jaehong Yoon, Jaemin Cho, ..., Mohit Bansal · (video-repair.github)
DreamRunner: Fine-Grained Storytelling Video Generation with Retrieval-Augmented Motion Adaptation, arXiv, 2411.16657, arxiv, pdf, citation: -1

Zun Wang, Jialu Li, Han Lin, ..., Jaehong Yoon, Mohit Bansal · (dreamrunner-story2video.github)
AnchorCrafter: Animate CyberAnchors Saling Your Products via Human-Object Interacting Video Generation, arXiv, 2411.17383, arxiv, pdf, citation: -1

Ziyi Xu, Ziyao Huang, Juan Cao, ..., Jintao Li, Fan Tang · (cangcz.github)
🌟 Identity-Preserving Text-to-Video Generation by Frequency Decomposition, arXiv, 2411.17440, arxiv, pdf, citation: -1

Shenghai Yuan, Jinfa Huang, Xianyi He, ..., Jiebo Luo, Li Yuan · (arxiv) · (pku-yuangroup.github) · (ConsisID - PKU-YuanGroup)
LTX-Video - Lightricks

· (huggingface) · (𝕏)
Pyramid Flow, a training-efficient Autoregressive Video Generation method based on Flow Matching. 🤗

· (pyramid-flow.github) · (Pyramid-Flow - jy0205)
🌟 Generative World Explorer, arXiv, 2411.11844, arxiv, pdf, citation: -1

Taiming Lu, Tianmin Shu, Alan Yuille, ..., Daniel Khashabi, Jieneng Chen · (generative-world-explorer.github) · (𝕏) · (youtube)
The Matrix: Infinite-Horizon World Generation with Real-Time Interaction

· (mp.weixin.qq)
lucid-v1 - SonicCodes
Motion Control for Enhanced Complex Action Video Generation, arXiv, 2411.08328, arxiv, pdf, citation: -1

Qiang Zhou, Shaofeng Zhang, Nianzu Yang, ..., Ye Qian, Hao Li · (mvideo-v1.github)
GameGen-X: Interactive Open-world Game Video Generation, arXiv, 2411.00769, arxiv, pdf, citation: -1

Haoxuan Che, Xuanhua He, Quande Liu, ..., Cheng Jin, Hao Chen · (GameGen-X - GameGen-X) · (mp.weixin.qq)
Adaptive Caching for Faster Video Generation with Diffusion Transformers, arXiv, 2411.02397, arxiv, pdf, citation: -1

Kumara Kahatapitiya, Haozhe Liu, Sen He, ..., Michael S. Ryoo, Tian Xie · (adacache-dit.github) · (AdaCache - AdaCache-DiT)
CogVideoX is an open-source video generation model originating from Qingying. 🤗

· (CogVideo - THUDM)
DimensionX: Create Any 3D and 4D Scenes from a Single Image with Controllable Video Diffusion, arXiv, 2411.04928, arxiv, pdf, citation: -1

Wenqiang Sun, Shuo Chen, Fangfu Liu, ..., Jun Zhang, Yikai Wang · (chenshuo20.github) · (DimensionX - wenqsun)
Enhancing Motion in Text-to-Video Generation with Decomposed Encoding and Conditioning, arXiv, 2410.24219, arxiv, pdf, citation: -1

Penghui Ruan, Pichao Wang, Divya Saxena, ..., Jiannong Cao, Yuhui Shi · (pr-ryan.github) · (DEMO - PR-Ryan)
FasterCache: Training-Free Video Diffusion Model Acceleration with High Quality, arXiv, 2410.19355, arxiv, pdf, citation: -1

Zhengyao Lv, Chenyang Si, Junhao Song, ..., Ziwei Liu, Kwan-Yee K. Wong · (FasterCache - Vchitect)
MarDini: Masked Autoregressive Diffusion for Video Generation at Scale, arXiv, 2410.20280, arxiv, pdf, citation: -1

Haozhe Liu, Shikun Liu, Zijian Zhou, ..., Jürgen Schmidhuber, Juan-Manuel Pérez-Rúa · (mardini-vidgen.github)
ARLON: Boosting Diffusion Transformers with Autoregressive Models for Long Video Generation, arXiv, 2410.20502, arxiv, pdf, citation: -1

Zongyi Li, Shujie Hu, Shujie Liu, ..., Hefei Ling, Furu Wei · (aka)
WorldSimBench: Towards Video Generation Models as World Simulators, arXiv, 2410.18072, arxiv, pdf, citation: -1

Yiran Qin, Zhelun Shi, Jiwen Yu, ..., Wanli Ouyang, Ruimao Zhang

· (iranqin.github)
Allegro - rhymes-ai
Allegro: Advanced Video Generation Model 🤗
Open-Sora Plan 🤗
Introducing Mochi 1 preview. A new SOTA in open-source video generation. Apache 2.0. 𝕏

· (genmo) · (models - genmoai)
Movie Gen: A Cast of Media Foundation Models, arXiv, 2410.13720, arxiv, pdf, citation: -1

Adam Polyak, Amit Zohar, Andrew Brown, ..., Vladan Petrovic, Yuming Du · (ai.meta)
VidPanos: Generative Panoramic Videos from Casual Panning Videos, arXiv, 2410.13832, arxiv, pdf, citation: -1

Jingwei Ma, Erika Lu, Roni Paiss, ..., Michael Rubinstein, Forrester Cole · (vidpanos.github)

Animation

DisPose: Disentangling Pose Guidance for Controllable Human Image Animation, arXiv, 2412.09349, arxiv, pdf, citation: -1

Hongxiang Li, Yaowei Li, Yuhang Yang, ..., Xuxin Cheng, Long Chen · (lihxxx.github) · (DisPose - lihxxx)
An image-to-video model by CreateAI. 🤗

· (Ruyi-Models - IamCreateAI)
DreamDance: Animating Human Images by Enriching 3D Geometry Cues from 2D Poses, arXiv, 2412.00397, arxiv, pdf, citation: -1

Yatian Pang, Bin Zhu, Bin Lin, ..., Harry Yang, Li Yuan
🌟 StableAnimator: High-Quality Identity-Preserving Human Image Animation, arXiv, 2411.17697, arxiv, pdf, citation: -1

Shuyuan Tu, Zhen Xing, Xintong Han, ..., Chong Luo, Zuxuan Wu · (StableAnimator - Francis-Rings)
Trajectory Attention for Fine-grained Video Motion Control, arXiv, 2411.19324, arxiv, pdf, citation: -1

Zeqi Xiao, Wenqi Ouyang, Yifan Zhou, ..., Jianlou Si, Xingang Pan
AnimateAnything: Consistent and Controllable Animation for Video Generation, arXiv, 2411.10836, arxiv, pdf, citation: -1

Guojun Lei, Chi Wang, Hong Li, ..., Yikai Wang, Weiwei Xu · (yu-shaonian.github)
FlipSketch: Flipping Static Drawings to Text-Guided Sketch Animations, arXiv, 2411.10818, arxiv, pdf, citation: -1

Hmrishav Bandyopadhyay, Yi-Zhe Song · (hmrishavbandy.github)
EasyAnimate - aigc-apps
SG-I2V: Self-Guided Trajectory Control in Image-to-Video Generation, arXiv, 2411.04989, arxiv, pdf, citation: -1

Koichi Namekata, Sherwin Bahmani, Ziyi Wu, ..., Igor Gilitschenski, David B. Lindell · (kmcode1.github)
HelloMeme: Integrating Spatial Knitting Attentions to Embed High-Level and Fidelity-Rich Conditions in Diffusion Models, arXiv, 2410.22901, arxiv, pdf, citation: -1

Shengkai Zhang, Nianhong Jiao, Tian Li, ..., Boya Niu, Jun Gao · (HelloMeme - HelloVision) · (songkey.github)
CamI2V: Camera-Controlled Image-to-Video Diffusion Model, arXiv, 2410.15957, arxiv, pdf, citation: -1

Guangcong Zheng, Teng Li, Rui Jiang, ..., Tao Wu, Xi Li · (zgctroy.github) · (CamI2V - ZGCTroy)
FrameBridge: Improving Image-to-Video Generation with Bridge Models
Animate-X: Universal Character Image Animation with Enhanced Motion Representation, arXiv, 2410.10306, arxiv, pdf, citation: -1

Shuai Tan, Biao Gong, Xiang Wang, ..., Jingdong Chen, Ming Yang · (lucaria-academy.github)
DreamVideo-2: Zero-Shot Subject-Driven Video Customization with Precise Motion Control, arXiv, 2410.13830, arxiv, pdf, citation: -1

Yujie Wei, Shiwei Zhang, Hangjie Yuan, ..., Yingya Zhang, Hongming Shan · (dreamvideo2.github)

Evaluation

🌟 Evaluation Agent: Efficient and Promptable Evaluation Framework for Visual Generative Models, arXiv, 2412.09645, arxiv, pdf, citation: -1

Fan Zhang, Shulin Tian, Ziqi Huang, ..., Yu Qiao, Ziwei Liu
VBench++: Comprehensive and Versatile Benchmark Suite for Video Generative Models, arXiv, 2411.13503, arxiv, pdf, citation: -1

Ziqi Huang, Fan Zhang, Xiaojie Xu, ..., Yu Qiao, Ziwei Liu · (huggingface)
How Far is Video Generation from World Model: A Physical Law Perspective, arXiv, 2411.02385, arxiv, pdf, citation: -1

Bingyi Kang, Yang Yue, Rui Lu, ..., Gao Huang, Jiashi Feng · (phyworld.github)
Artificial Analysis Video Generation Arena Leaderboard

Detection

Video Seal: Open and Efficient Video Watermarking, arXiv, 2412.09492, arxiv, pdf, citation: -1

Pierre Fernandez, Hady Elsahar, I. Zeki Yalniz, ..., Alexandre Mourachko · (videoseal - facebookresearch)

Alignment

VideoDPO: Omni-Preference Alignment for Video Diffusion Generation, arXiv, 2412.14167, arxiv, pdf, citation: -1

Runtao Liu, Haoyu Wu, Zheng Ziqiang, ..., Renjie Pi, Qifeng Chen · (videodpo.github)
LiFT: Leveraging Human Feedback for Text-to-Video Model Alignment, arXiv, 2412.04814, arxiv, pdf, citation: -1

Yibin Wang, Zhiyu Tan, Junyan Wang, ..., Cheng Jin, Hao Li · (codegoat24.github) · (LiFT - CodeGoat24)

Auto Regressive

From Slow Bidirectional to Fast Causal Video Generators, arXiv, 2412.07772, arxiv, pdf, citation: -1

Tianwei Yin, Qiang Zhang, Richard Zhang, ..., Eli Shechtman, Xun Huang · (causvid.github)
Progressive Autoregressive Video Diffusion Models, arXiv, 2410.08151, arxiv, pdf, citation: -1

Desai Xie, Zhan Xu, Yicong Hong, ..., Arie Kaufman, Yang Zhou

· (desaixie.github)

Editing

🌟 STAR: Spatial-Temporal Augmentation with Text-to-Video Models for Real-World Video Super-Resolution, arXiv, 2501.02976, arxiv, pdf, citation: -1

Rui Xie, Yinhong Liu, Penghao Zhou, ..., Zhenheng Yang, Ying Tai · (STAR - NJU-PCALab) · (arxiv) · (nju-pcalab.github)
SeedVR: Seeding Infinity in Diffusion Transformer Towards Generic Video Restoration, arXiv, 2501.01320, arxiv, pdf, citation: -1

Jianyi Wang, Zhijie Lin, Meng Wei, ..., Chen Change Loy, Lu Jiang · (iceclear.github)
Generative Video Propagation, arXiv, 2412.19761, arxiv, pdf, citation: -1

Shaoteng Liu, Tianyu Wang, Jui-Hsien Wang, ..., Soo Ye Kim, Jiaya Jia
MoViE: Mobile Diffusion for Video Editing, arXiv, 2412.06578, arxiv, pdf, citation: -1

Adil Karjauv, Noor Fathima, Ioannis Lelekas, ..., Amir Ghodrati, Amirhossein Habibian
DIVE: Taming DINO for Subject-Driven Video Editing, arXiv, 2412.03347, arxiv, pdf, citation: -1

Yi Huang, Wei Xiong, He Zhang, ..., Mingfu Yan, Shifeng Chen · (dino-video-editing.github)
MyTimeMachine: Personalized Facial Age Transformation, arXiv, 2411.14521, arxiv, pdf, citation: -1

Luchao Qi, Jiaye Wu, Bang Gong, ..., David W. Jacobs, Roni Sengupta · (mytimemachine.github)
🌟 Generative Omnimatte: Learning to Decompose Video into Layers

· (𝕏)
StableV2V: Stablizing Shape Consistency in Video-to-Video Editing, arXiv, 2411.11045, arxiv, pdf, citation: -1

Chang Liu, Rui Li, Kaidong Zhang, ..., Yunwei Lan, Dong Liu
AutoVFX: Physically Realistic Video Editing from Natural Language Instructions, arXiv, 2411.02394, arxiv, pdf, citation: -1

Hao-Yu Hsu, Zhi-Hao Lin, Albert Zhai, ..., Hongchi Xia, Shenlong Wang · (haoyuhsu.github) · (autovfx - haoyuhsu)
🌟 ReCapture: Generative Video Camera Controls for User-Provided Videos using Masked Video Fine-Tuning, arXiv, 2411.05003, arxiv, pdf, citation: -1

David Junhao Zhang, Roni Paiss, Shiran Zada, ..., Neal Wadhwa, Nataniel Ruiz · (generative-video-camera-controls.github)
Fashion-VDM: Video Diffusion Model for Virtual Try-On, arXiv, 2411.00225, arxiv, pdf, citation: -1

Johanna Karras, Yingwei Li, Nan Liu, ..., Chris Lee, Ira Kemelmacher-Shlizerman · (johannakarras.github) · (arxiv)
InvokeAI - invoke-ai
ComfyUI-MochiEdit - logtd
Framer: Interactive Frame Interpolation, arXiv, 2410.18978, arxiv, pdf, citation: -1

Wen Wang, Qiuyu Wang, Kecheng Zheng, ..., Yujun Shen, Chunhua Shen

Datasets

InstanceCap: Improving Text-to-Video Generation via Instance-aware Structured Caption, arXiv, 2412.09283, arxiv, pdf, citation: -1

Tiehan Fan, Kepan Nan, Rui Xie, ..., Jian Yang, Ying Tai · (InstanceCap - NJU-PCALab) · (arxiv)
EgoVid-5M: A Large-Scale Video-Action Dataset for Egocentric Video Generation, arXiv, 2411.08380, arxiv, pdf, citation: -1

Xiaofeng Wang, Kang Zhao, Feng Liu, ..., Yingya Zhang, Xingang Wang · (egovid.github)
TIP-I2V: A Million-Scale Real Text and Image Prompt Dataset for Image-to-Video Generation, arXiv, 2411.04709, arxiv, pdf, citation: -1

Wenhao Wang, Yi Yang · (tip-i2v.github.io)

Toolkits

cogvideox-factory - a-r-r-o-w

Tutorials

Blog

Products

Our BIGGEST feature drop of the year: AI Music Videos! 𝕏
State-of-the-art video and image generation with Veo 2 and Imagen 3
Sora System Card
Introducing Wonder Animation: New AI solution for animated films, powered by cutting-edge Video to 3D Scene technology

· (reddit)
'Advanced Camera Control' feature for its AI video generation model 𝕏
Runway introduced Act-One, a new AI system that generates expressive character animations from a single video and image. 𝕏
Haiper launched version 2 of its video generation platform 𝕏

Misc

Install CogVideoX: Text-to-Video and Image-to-Video (ComfyUI)
ComfyUI-CogVideoXWrapper - kijai
optimized support for Genmo’s latest model and can run it fast on a GPU like 4090 𝕏

· (blog.comfy)
ComfyUI-MochiWrapper - kijai
