---
base_model:
- Wan-AI/Wan2.2-TI2V-5B
language:
- en
license: apache-2.0
pipeline_tag: text-to-video
library_name: diffusers
---

# Matrix-Game 3.0: Real-Time and Streaming Interactive World Model with Long-Horizon Memory

Matrix-Game 3.0 is an open-source, memory-augmented interactive world model designed for 720p real-time long-form video generation. It achieves up to 40 FPS real-time generation at 720p resolution with a 5B model while maintaining stable memory consistency over minute-long sequences.
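
To put the real-time figure in context, a sustained 40 FPS corresponds to an average budget of 1000 / 40 = 25 ms per generated frame. The snippet below is only a sketch of that arithmetic; `frame_budget_ms` is a hypothetical helper and is not part of the Matrix-Game repository. The result is an average over the stream and says nothing about how frames are batched internally.

```bash
# Hypothetical helper (not part of the Matrix-Game repository):
# average per-frame time budget in milliseconds for a target frame rate.
frame_budget_ms() {
    local fps="$1"
    # Integer division; exact for 40 FPS (1000 / 40 = 25).
    echo $(( 1000 / fps ))
}

# Average budget per frame at the 40 FPS quoted above.
frame_budget_ms 40
```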
## 📝 Overview

The Matrix-Game 3.0 framework unifies three stages into an end-to-end pipeline:

- **Data Engine**: An upgraded industrial-scale data engine that combines synthetic data from Unreal Engine with data collected from AAA games to produce high-quality Video-Pose-Action-Prompt quadruplets.
- **Model Training**: A memory-augmented Diffusion Transformer (DiT) that learns self-correction by modeling prediction residuals and employs camera-aware memory for long-horizon consistency.
- **Inference Deployment**: Multi-segment autoregressive distillation (DMD), model quantization, and VAE decoder pruning to achieve efficient real-time inference.

## ✨ Key Features

- 🚀 **Real-Time Performance**: Generates 720p video at up to 40 FPS with the 5B model.
- 🖱️ **Long-Horizon Consistency**: Stable memory consistency over sequences lasting minutes.
- 🎬 **Scalability**: Scaling to a 28B-MoE model (2x14B) further improves quality and generalization.

## 🚀 Quick Start

### Installation

```bash
conda create -n matrix-game-3.0 python=3.12 -y
conda activate matrix-game-3.0
# install FlashAttention and other dependencies
git clone https://github.com/SkyworkAI/Matrix-Game-3.0.git
cd Matrix-Game-3.0
pip install -r requirements.txt
```

### Inference

After downloading the pretrained weights, you can generate an interactive video with the following command, where `$NUM_GPUS` is the number of GPUs to use on the node:

```bash
torchrun --nproc_per_node=$NUM_GPUS generate.py \
  --size 704*1280 \
  --dit_fsdp \
  --t5_fsdp \
  --ckpt_dir Matrix-Game-3.0 \
  --fa_version 3 \
  --use_int8 \
  --num_iterations 12 \
  --num_inference_steps 3 \
  --image demo_images/000/image.png \
  --prompt "a vintage gas station with a classic car parked under a canopy, set against a desert landscape." \
  --save_name test \
  --seed 42 \
  --compile_vae \
  --lightvae_pruning_rate 0.5 \
  --vae_type mg_lightvae \
  --output_dir ./output
```

## ⭐ Acknowledgements

- [Diffusers](https://github.com/huggingface/diffusers) for the diffusion model framework.
- [Wan2.2](https://github.com/Wan-Video/Wan2.2) for the strong base model.
- [Self-Forcing](https://github.com/guandeh17/Self-Forcing), [GameFactory](https://github.com/KwaiVGI/GameFactory), [LightX2V](https://github.com/ModelTC/lightx2v), and [lingbot-world](https://github.com/Robbyant/lingbot-world) for their contributions and frameworks.

## 📖 Citation

If you find this work useful for your research, please cite:

```bibtex
@misc{2026matrix,
  title={Matrix-Game 3.0: Real-Time and Streaming Interactive World Model with Long-Horizon Memory},
  author={{Skywork AI Matrix-Game Team}},
  year={2026},
  howpublished={Technical report},
  url={https://github.com/SkyworkAI/Matrix-Game/blob/main/Matrix-Game-3/assets/pdf/report.pdf}
}
```