Improve model card metadata and content

#1
by nielsr HF Staff - opened
Files changed (1)
  1. README.md +52 -57
README.md CHANGED
@@ -1,96 +1,91 @@
1
  ---
2
- license: apache-2.0
3
- language:
4
- - en
5
  base_model:
6
  - Wan-AI/Wan2.2-TI2V-5B
7
- pipeline_tag: image-text-to-video
 
 
 
 
8
  ---
 
9
  # Matrix-Game 3.0: Real-Time and Streaming Interactive World Model with Long-Horizon Memory
 
 
 
10
  <div style="display: flex; justify-content: center; gap: 10px;">
11
  <a href="https://github.com/SkyworkAI/Matrix-Game">
12
  <img src="https://img.shields.io/badge/GitHub-100000?style=flat&logo=github&logoColor=white" alt="GitHub">
13
  </a>
14
- <a href="https://github.com/SkyworkAI/Matrix-Game/blob/main/Matrix-Game-3/assets/pdf/report.pdf">
15
- <img src="https://img.shields.io/badge/Technical Report-b31b1b?style=flat&logo=arxiv&logoColor=white" alt="report">
16
  </a>
17
  <a href="https://matrix-game-v3.github.io/">
18
  <img src="https://img.shields.io/badge/Project%20Page-grey?style=flat&logo=huggingface&color=FFA500" alt="Project Page">
19
  </a>
20
-
21
-
22
  </div>
23
 
24
  ## 📝 Overview
25
- **Matrix-Game-3.0** is an open-sourced, memory-augmented interactive world model designed for 720p real-time long-form video generation.
26
-
27
- ## Framework Overview
28
- Our framework unifies three stages into an end-to-end pipeline:
29
- - Data Engine: an industrial-scale infinite data engine integrating Unreal Engine synthetic scenes, large-scale automated AAA game collection, and real-world video augmentation to produce high-quality Video-Pose-Action-Prompt quadruplets at scale;
30
- - Model Training: a memory-augmented Diffusion Transformer (DiT) with an error buffer that learns action-conditioned generation with memory-enhanced long-horizon consistency;
31
- - Inference Deployment: few-step sampling, INT8 quantization, and model distillation achieving 720p@40FPS real-time generation with a 5B model.
32
 
33
  ![Model Overview](./framework.png)
34
 
35
  ## ✨ Key Features
36
- 🚀 **Feature 1**: **Upgraded Data Engine**: Combines Unreal Engine-based synthetic data, large-scale automated AAA game data, and real-world video augmentation to generate high-quality Video–Pose–Action–Prompt data.
37
- 🖱️ **Feature 2**: **Long-horizon Memory & Consistency**: Uses prediction residuals and frame re-injection for self-correction, while camera-aware memory ensures long-term spatiotemporal consistency.
38
- 🎬 **Feature 3**: **Real-Time Interactivity & Open Access**: It employs a multi-segment autoregressive distillation strategy based on Distribution Matching Distillation (DMD), combined with model quantization and VAE decoder distillation to support [40fps] real-time generation at 720p resolution with a 5B model, while maintaining stable memory consistency over minute-long sequences.
39
- 👍 **Feature 3**: **Scale Up 28B-MoE Model**: Scaling up to a 2×14B model further improves generation quality, dynamics, and generalization.
40
-
41
- ## 🔥 Latest Updates
42
-
43
- * [2026-03] 🎉 Initial release of Matrix-Game-3.0 Model
44
 
45
  ## 🚀 Quick Start
 
46
  ### Installation
47
- Create a conda environment and install dependencies:
48
- ```
49
  conda create -n matrix-game-3.0 python=3.12 -y
50
  conda activate matrix-game-3.0
51
- # install FlashAttention
52
- # Our project also depends on [FlashAttention](https://github.com/Dao-AILab/flash-attention)
53
  git clone https://github.com/SkyworkAI/Matrix-Game-3.0.git
54
  cd Matrix-Game-3.0
55
  pip install -r requirements.txt
56
  ```
57
 
58
- ### Model Download
59
- ```
60
- pip install "huggingface_hub[cli]"
61
- huggingface-cli download Matrix-Game-3.0 --local-dir Matrix-Game-3.0
62
- ```
63
  ### Inference
64
- Before running inference, you need to prepare:
65
- - Input image
66
- - Text prompt
67
 
68
- After downloading pretrained models, you can use the following command to generate an interactive video with random actions:
69
- ``` sh
70
- torchrun --nproc_per_node=$NUM_GPUS generate.py --size 704*1280 --dit_fsdp --t5_fsdp --ckpt_dir Matrix-Game-3.0 --fa_version 3 --use_int8 --num_iterations 12 --num_inference_steps 3 --image demo_images/000/image.png --prompt "a vintage gas station with a classic car parked under a canopy, set against a desert landscape." --save_name test --seed 42 --compile_vae --lightvae_pruning_rate 0.5 --vae_type mg_lightvae --output_dir ./output
71
- # "num_iterations" is the number of generation iterations to run. The total number of frames generated is: 57 + (num_iterations - 1) * 40
72
  ```
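The frame-count formula quoted in the removed comment above can be checked with a few lines of shell. This is a minimal sketch and not part of the repository: the helper name `total_frames` is hypothetical, and the constants 57 and 40 are taken directly from that comment.

```bash
# Hypothetical helper (not part of Matrix-Game): total frames for a run,
# using the formula from the README comment: 57 + (num_iterations - 1) * 40.
total_frames() {
  local num_iterations="$1"
  echo $(( 57 + (num_iterations - 1) * 40 ))
}

# The example command passes --num_iterations 12.
total_frames 12   # prints 497
```

By this formula, the example command with `--num_iterations 12` produces 497 frames, and a single iteration produces 57.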
73
- Tips:
74
- To use the base model, pass "--use_base_model --num_inference_steps 50". To generate interactive videos with your own input actions, pass "--interactive".
75
- With multiple GPUs, you can pass `--use_async_vae --async_vae_warmup_iters 1` to speed up inference.
76
 
77
  ## ⭐ Acknowledgements
78
- - [Diffusers](https://github.com/huggingface/diffusers) for their excellent diffusion model framework
79
- - [Self-Forcing](https://github.com/guandeh17/Self-Forcing) for their excellent work
80
- - [GameFactory](https://github.com/KwaiVGI/GameFactory) for their idea of action control module
81
- - [LightX2V](https://github.com/ModelTC/lightx2v) for their excellent quantization framework
82
- - [Wan2.2](https://github.com/Wan-Video/Wan2.2) for their strong base model
83
- - [lingbot-world](https://github.com/Robbyant/lingbot-world) for their context parallel framework
84
 
85
  ## 📖 Citation
86
- If you find this work useful for your research, please kindly cite our paper:
87
 
88
- ```
89
- @misc{2026matrix,
90
- title={Matrix-Game 3.0: Real-Time and Streaming Interactive World Model with Long-Horizon Memory},
91
- author={{Skywork AI Matrix-Game Team}},
92
- year={2026},
93
- howpublished={Technical report},
94
- url={https://github.com/SkyworkAI/Matrix-Game/blob/main/Matrix-Game-3/assets/pdf/report.pdf}
95
- }
96
  ```
 
1
  ---
 
 
 
2
  base_model:
3
  - Wan-AI/Wan2.2-TI2V-5B
4
+ language:
5
+ - en
6
+ license: apache-2.0
7
+ pipeline_tag: text-to-video
8
+ library_name: diffusers
9
  ---
10
+
11
  # Matrix-Game 3.0: Real-Time and Streaming Interactive World Model with Long-Horizon Memory
12
+
13
+ Matrix-Game 3.0 is an open-source, memory-augmented interactive world model designed for 720p real-time long-form video generation. It achieves up to 40 FPS real-time generation at 720p resolution with a 5B model while maintaining stable memory consistency over minute-long sequences.
14
+
15
  <div style="display: flex; justify-content: center; gap: 10px;">
16
  <a href="https://github.com/SkyworkAI/Matrix-Game">
17
  <img src="https://img.shields.io/badge/GitHub-100000?style=flat&logo=github&logoColor=white" alt="GitHub">
18
  </a>
19
+ <a href="https://huggingface.co/papers/2604.08995">
20
+ <img src="https://img.shields.io/badge/Paper-b31b1b?style=flat&logo=arxiv&logoColor=white" alt="Paper">
21
  </a>
22
  <a href="https://matrix-game-v3.github.io/">
23
  <img src="https://img.shields.io/badge/Project%20Page-grey?style=flat&logo=huggingface&color=FFA500" alt="Project Page">
24
  </a>
 
 
25
  </div>
26
 
27
  ## 📝 Overview
28
+ The Matrix-Game 3.0 framework unifies three stages into an end-to-end pipeline:
29
+ - **Data Engine**: An upgraded industrial-scale data engine integrating Unreal Engine synthetic data and AAA game collection to produce high-quality Video-Pose-Action-Prompt quadruplets.
30
+ - **Model Training**: A memory-augmented Diffusion Transformer (DiT) that learns self-correction by modeling prediction residuals and employs camera-aware memory for long-horizon consistency.
31
+ - **Inference Deployment**: Multi-segment autoregressive distillation (DMD), model quantization, and VAE decoder pruning to achieve efficient real-time inference.
 
 
 
32
 
33
  ![Model Overview](./framework.png)
34
 
35
  ## ✨ Key Features
36
+ - 🚀 **Real-Time Performance**: Supports 720p @ 40fps generation with the 5B model.
37
+ - 🖱️ **Long-horizon Consistency**: Stable memory consistency over sequences lasting minutes.
38
+ - 🎬 **Scalability**: Scaling to a 28B-MoE model (2x14B) further improves quality and generalization.
 
 
 
 
 
39
 
40
  ## 🚀 Quick Start
41
+
42
  ### Installation
43
+ ```bash
 
44
  conda create -n matrix-game-3.0 python=3.12 -y
45
  conda activate matrix-game-3.0
46
+ # install FlashAttention and other dependencies
 
47
  git clone https://github.com/SkyworkAI/Matrix-Game-3.0.git
48
  cd Matrix-Game-3.0
49
  pip install -r requirements.txt
50
  ```
51
 
 
 
 
 
 
52
  ### Inference
53
+ After downloading the pretrained weights, you can generate an interactive video with the following command:
 
 
54
 
55
+ ```bash
56
+ torchrun --nproc_per_node=$NUM_GPUS generate.py \
57
+ --size 704*1280 \
58
+ --dit_fsdp \
59
+ --t5_fsdp \
60
+ --ckpt_dir Matrix-Game-3.0 \
61
+ --fa_version 3 \
62
+ --use_int8 \
63
+ --num_iterations 12 \
64
+ --num_inference_steps 3 \
65
+ --image demo_images/000/image.png \
66
+ --prompt "a vintage gas station with a classic car parked under a canopy, set against a desert landscape." \
67
+ --save_name test \
68
+ --seed 42 \
69
+ --compile_vae \
70
+ --lightvae_pruning_rate 0.5 \
71
+ --vae_type mg_lightvae \
72
+ --output_dir ./output
73
  ```
 
 
 
74
 
75
  ## ⭐ Acknowledgements
76
+ - [Diffusers](https://github.com/huggingface/diffusers) for the diffusion model framework.
77
+ - [Wan2.2](https://github.com/Wan-Video/Wan2.2) for the strong base model.
78
+ - [Self-Forcing](https://github.com/guandeh17/Self-Forcing), [GameFactory](https://github.com/KwaiVGI/GameFactory), [LightX2V](https://github.com/ModelTC/lightx2v), and [lingbot-world](https://github.com/Robbyant/lingbot-world) for their contributions and frameworks.
 
 
 
79
 
80
  ## 📖 Citation
81
+ If you find this work useful for your research, please cite:
82
 
83
+ ```bibtex
84
+ @misc{2026matrix,
85
+ title={Matrix-Game 3.0: Real-Time and Streaming Interactive World Model with Long-Horizon Memory},
86
+ author={{Skywork AI Matrix-Game Team}},
87
+ year={2026},
88
+ howpublished={Technical report},
89
+ url={https://github.com/SkyworkAI/Matrix-Game/blob/main/Matrix-Game-3/assets/pdf/report.pdf}
90
+ }
91
  ```