Wan-Alpha

Video Generation with Stable Transparency via Shiftable RGB-A Distribution Learner

arXiv | Project Page | πŸ€— HuggingFace | ComfyUI | πŸ€— HuggingFace

Wan-Alpha Qualitative Results

Qualitative results of video generation using Wan-Alpha-v2.0. Our model successfully generates various scenes with accurate and clearly rendered transparency. Notably, it can synthesize diverse semi-transparent objects, glowing effects, and fine-grained details such as hair.


πŸ”₯ News

  • [2025.12.16] Released Wan-Alpha v2.0: the Wan2.1-T2V-14B-adapted weights and inference code are now open-sourced.
  • [2025.12.16] Updated our paper on arXiv.
  • [2025.09.30] Our technical report is available on arXiv.
  • [2025.09.30] Released Wan-Alpha v1.0: the Wan2.1-T2V-14B-adapted weights and inference code are now open-sourced.

πŸ“ To-Do List

  • Paper: Available on arXiv.
  • Inference Code: Released inference pipeline for Wan-Alpha v1.0 and v2.0.
  • Model Weights: Released checkpoints for Wan-Alpha v1.0 and v2.0.
  • Image-to-Video: Release Wan-Alpha-I2V model weights.
  • Dataset: Open-source the VAE and T2V training dataset.
  • Training Code (VAE & T2V): Release training scripts for the VAE and text-to-RGBA video generation.

🌟 Showcase

Text-to-Video Generation with Alpha Channel
Example prompt: "The background of this video is transparent. It features a beige, woven rattan hanging chair with soft seat and back cushions. Realistic style. Medium shot."
The corresponding preview video and alpha video, along with more results, are available on our website.

πŸš€ Quick Start

1. Environment Setup
# Clone the project repository
git clone https://github.com/WeChatCV/Wan-Alpha.git
cd Wan-Alpha

# Create and activate Conda environment
conda create -n Wan-Alpha python=3.11 -y
conda activate Wan-Alpha

# Install dependencies
pip install -r requirements.txt
2. Model Download

  • Download Wan2.1-T2V-14B
  • Download Lightx2v-T2V-14B
  • Download Wan-Alpha VAE
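
If you prefer to script the downloads, here is a minimal sketch using huggingface_hub.snapshot_download. The repo ID Wan-AI/Wan2.1-T2V-14B is the official Wan2.1 release; the LightX2V repo ID and the local directory layout below are assumptions, and the Wan-Alpha weights are assumed to live in this repository (htdong/Wan-Alpha-v2.0), so substitute the repositories and paths you actually use.

# Minimal download sketch (assumptions noted in the comments).
from huggingface_hub import snapshot_download

# Official Wan2.1 T2V-14B base model.
snapshot_download(repo_id="Wan-AI/Wan2.1-T2V-14B", local_dir="checkpoints/Wan2.1-T2V-14B")

# Placeholder repo ID -- replace with the actual Lightx2v-T2V-14B repository.
snapshot_download(repo_id="lightx2v/Lightx2v-T2V-14B", local_dir="checkpoints/Lightx2v-T2V-14B")

# Assumed to be this repository; adjust if the Wan-Alpha VAE/LoRA weights are hosted elsewhere.
snapshot_download(repo_id="htdong/Wan-Alpha-v2.0", local_dir="checkpoints/Wan-Alpha-v2.0")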

πŸ§ͺ Usage

You can test our model with the following command:

torchrun --nproc_per_node=8 --master_port=29501 generate_dora_lightx2v_mask.py --size 832*480 \
         --ckpt_dir "path/to/your/Wan-2.1/Wan2.1-T2V-14B" \
         --dit_fsdp --t5_fsdp --ulysses_size 8 \
         --vae_lora_checkpoint "path/to/your/decoder.bin" \
         --lora_path "path/to/your/t2v.safetensors" \
         --lightx2v_path "path/to/your/lightx2v_T2V_14B_cfg_step_distill_v2_lora_rank64_bf16.safetensors" \
         --sample_guide_scale 1.0 \
         --frame_num 81 \
         --sample_steps 4 \
         --lora_ratio 1.0 \
         --lora_prefix "" \
         --alpha_shift_mean 0.05 \
         --cache_path_mask "path/to/your/gauss_mask" \
         --prompt_file ./data/prompt.txt \
         --output_dir ./output 

You can specify the weights of Wan2.1-T2V-14B with --ckpt_dir, LightX2V-T2V-14B with --lightx2v_path, Wan-Alpha VAE with --vae_lora_checkpoint, and Wan-Alpha-T2V with --lora_path. The rendered RGBA videos (composited over a checkerboard background) and the corresponding PNG frames are saved to the directory specified by --output_dir.
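
As a quick sanity check of the generated alpha channel, the following sketch composites one of the output PNG frames over a solid background with Pillow; the frame path is hypothetical and depends on your --output_dir.

# Sketch: composite an output RGBA frame over a solid background with Pillow.
from PIL import Image

# "output/frame_0000.png" is a hypothetical path; point it at one of the PNG
# frames written to --output_dir.
fg = Image.open("output/frame_0000.png").convert("RGBA")
bg = Image.new("RGBA", fg.size, (0, 255, 0, 255))  # solid green background
Image.alpha_composite(bg, fg).convert("RGB").save("composited.png")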

You can use gen_gaussian_mask.py to generate a Gaussian mask from an existing alpha video. Alternatively, you can directly create a Gaussian ellipse video, which can be either static or dynamic (e.g., moving from left to right); a minimal example is sketched below. Note that --alpha_shift_mean is a fixed parameter.
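
For reference, here is a minimal sketch of the second option: rendering a dynamic Gaussian-ellipse mask video with NumPy and imageio, with the ellipse drifting from left to right. The resolution, frame count, sigmas, and output path are illustrative and should be matched to whatever --cache_path_mask expects.

# Sketch: a moving Gaussian-ellipse mask video (grayscale, values in [0, 255]).
# Writing .mp4 requires the imageio-ffmpeg backend (pip install imageio-ffmpeg).
import numpy as np
import imageio

W, H, T = 832, 480, 81                      # width, height, number of frames
ys, xs = np.mgrid[0:H, 0:W]
frames = []
for t in range(T):
    cx = W * (0.25 + 0.5 * t / (T - 1))     # center drifts from left to right
    cy = H / 2
    sx, sy = W / 6, H / 4                   # ellipse "radii" (standard deviations)
    g = np.exp(-0.5 * (((xs - cx) / sx) ** 2 + ((ys - cy) / sy) ** 2))
    frames.append((g * 255).astype(np.uint8))
imageio.mimsave("gauss_mask.mp4", frames, fps=16)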

Prompt Writing Tip: State that the background of the video is transparent, and describe the visual style, the shot type (such as close-up, medium shot, wide shot, or extreme close-up), and the main subject. Prompts support both Chinese and English input.

# An example prompt.
This video has a transparent background. Close-up shot. A colorful parrot flying. Realistic style.

πŸ”¨ Official ComfyUI Version

Coming soon...

🀝 Acknowledgements

This project is built upon excellent open-source projects, including Wan2.1 and LightX2V.

We sincerely thank the authors and contributors of these projects.

✏ Citation

If you find our work helpful for your research, please consider citing our paper:

@misc{dong2025wanalpha,
      title={Video Generation with Stable Transparency via Shiftable RGB-A Distribution Learner}, 
      author={Haotian Dong and Wenjing Wang and Chen Li and Jing Lyu and Di Lin},
      year={2025},
      eprint={2509.24979},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2509.24979}, 
}

πŸ“¬ Contact Us

If you have any questions or suggestions, feel free to reach out via GitHub Issues. We look forward to your feedback!
