BiliSakura/PixelDiT-diffusers
Self-contained PixelDiT checkpoints for Hugging Face diffusers. Each variant folder ships its own pipeline.py, component modules, and weights.
Converted from nvidia/PixelDiT-ImageNet and nvidia/PixelDiT-1300M-1024px using PixelDiT-diffusers.
Available checkpoints
| Subfolder | Pipeline | Task | Resolution | Source checkpoint | gFID | Params |
|---|---|---|---|---|---|---|
| PixelDiT-T2I-1024/ | PixelDiTT2IPipeline | text-to-image | 1024×1024 | pixeldit_t2i_v1.pth | n/a | ~1.3B |
| PixelDiT-XL-16-256/ | PixelDiTPipeline | class-to-image | 256×256 | imagenet256_pixeldit_xl_epoch320.ckpt | 1.61 | ~700M |
| PixelDiT-XL-16-512/ | PixelDiTPipeline | class-to-image | 512×512 | imagenet512_pixeldit_xl.ckpt | 1.81 | ~700M |
Repo layout
BiliSakura/PixelDiT-diffusers/
├── README.md
├── demo_inference.py
├── PixelDiT-T2I-1024/
│   ├── pipeline.py
│   ├── model_index.json
│   ├── demo.png
│   ├── scheduler/scheduler_config.json
│   └── transformer/
├── PixelDiT-XL-16-256/
│   ├── pipeline.py
│   ├── model_index.json
│   ├── demo.png
│   ├── scheduler/scheduler_config.json
│   └── transformer/
└── PixelDiT-XL-16-512/
    ├── pipeline.py
    ├── model_index.json
    ├── scheduler/scheduler_config.json
    └── transformer/
Each variant is self-contained. The scheduler/ folder configures the built-in FlowMatchEulerDiscreteScheduler from the PyPI diffusers package. No shared helper modules are needed at inference time beyond the local variant directory.
ImageNet class labels
id2label is embedded in each variant's model_index.json (DiT-style).
- pipe.id2label: inspect the id → English label correspondence
- pipe.labels: reverse map (English synonym → id)
- pipe.get_label_ids("golden retriever")
- pipe(class_labels="golden retriever", ...): string labels are resolved automatically
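The lookup behaviour described above can be pictured with a small stand-alone sketch. It assumes DiT-style labels, where each id maps to one comma-separated string of English synonyms. The `build_reverse_map` and `resolve_label` helpers and the three-entry table are hypothetical illustrations, not the pipeline's own code.

```python
# Hypothetical illustration of id <-> label lookup; not the pipeline's own code.
# Assumes DiT-style entries: one comma-separated synonym string per class id.
ID2LABEL = {
    1: "goldfish, Carassius auratus",
    207: "golden retriever",
    208: "Labrador retriever",
}


def build_reverse_map(id2label):
    """Map every lower-cased synonym to its class id."""
    reverse = {}
    for class_id, names in id2label.items():
        for name in names.split(","):
            reverse[name.strip().lower()] = class_id
    return reverse


def resolve_label(label, reverse):
    """Accept an int id or a string label and return the int id."""
    if isinstance(label, int):
        return label
    key = label.strip().lower()
    if key not in reverse:
        raise KeyError(f"unknown class label: {label!r}")
    return reverse[key]


LABELS = build_reverse_map(ID2LABEL)
print(resolve_label("golden retriever", LABELS))
print(resolve_label(207, LABELS))
```

Normalizing case and whitespace in one place keeps string and integer inputs on the same code path, which is presumably why the pipeline can accept either form for class_labels.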
Demo
Text-to-image: "A golden retriever playing in a sunny garden", 1024×1024, 50 steps, guidance_scale=2.75.
python demo_inference_t2i.py
Class 207 (golden retriever), 256×256, 100 steps, guidance_scale=2.75, CFG interval [0.1, 0.9].
python demo_inference.py
Load from a local clone
Text-to-image 1024×1024 (PixelDiT-T2I-1024)
from pathlib import Path

import torch
from diffusers import DiffusionPipeline

# Load the variant from a local clone; the custom pipeline ships next to the weights.
model_dir = Path("./PixelDiT-T2I-1024").resolve()
pipe = DiffusionPipeline.from_pretrained(
    str(model_dir),
    local_files_only=True,
    custom_pipeline=str(model_dir / "pipeline.py"),
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
)
pipe.to("cuda")

# Fixed seed for reproducible sampling.
generator = torch.Generator(device="cuda").manual_seed(42)
image = pipe(
    prompt="A golden retriever playing in a sunny garden",
    negative_prompt="low quality, worst quality, over-saturated, blurry, deformed, watermark",
    height=1024,
    width=1024,
    num_inference_steps=50,
    guidance_scale=2.75,
    generator=generator,
).images[0]
image.save("demo.png")
The Gemma text encoder (google/gemma-2-2b-it) is downloaded on first run unless it is bundled under text_encoder/.
ImageNet 256×256 (PixelDiT-XL-16-256)
from pathlib import Path

import torch
from diffusers import DiffusionPipeline

# Load the class-conditional 256×256 variant from a local clone.
model_dir = Path("./PixelDiT-XL-16-256").resolve()
pipe = DiffusionPipeline.from_pretrained(
    str(model_dir),
    local_files_only=True,
    custom_pipeline=str(model_dir / "pipeline.py"),
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
)
pipe.to("cuda")

# Inspect the label mapping in both directions.
print(pipe.id2label[207])
print(pipe.get_label_ids("golden retriever"))

generator = torch.Generator(device="cuda").manual_seed(42)
image = pipe(
    class_labels="golden retriever",  # string labels are resolved to ids
    height=256,
    width=256,
    num_inference_steps=100,
    guidance_scale=2.75,
    guidance_interval_min=0.1,
    guidance_interval_max=0.9,
    generator=generator,
).images[0]
image.save("demo.png")
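The guidance_interval_min and guidance_interval_max arguments restrict classifier-free guidance to part of the sampling trajectory. The sketch below shows that idea in plain Python. It assumes the interval is measured on a normalized time t in [0, 1] and that the sampler falls back to the conditional prediction outside it. The function name `guided_prediction` and this exact convention are assumptions, and the variant's pipeline.py may define the interval differently.

```python
def guided_prediction(cond, uncond, t, scale, t_min=0.1, t_max=0.9):
    """Classifier-free guidance applied only while t_min <= t <= t_max.

    cond / uncond are the conditional and unconditional model outputs,
    given here as plain lists of floats (the same arithmetic applies
    elementwise to tensors). Outside the interval the conditional
    prediction is returned unchanged. This convention is an assumption.
    """
    if t_min <= t <= t_max:
        return [u + scale * (c - u) for c, u in zip(cond, uncond)]
    return list(cond)


cond, uncond = [1.0, 2.0], [0.0, 1.0]
print(guided_prediction(cond, uncond, t=0.5, scale=2.75))   # inside the interval
print(guided_prediction(cond, uncond, t=0.95, scale=2.75))  # outside the interval
```

With t_max=1.0, as in the 512×512 settings, only the steps below t_min would skip guidance under this convention.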
ImageNet 512×512 (PixelDiT-XL-16-512)
from pathlib import Path

import torch
from diffusers import DiffusionPipeline

# Load the class-conditional 512×512 variant from a local clone.
model_dir = Path("./PixelDiT-XL-16-512").resolve()
pipe = DiffusionPipeline.from_pretrained(
    str(model_dir),
    local_files_only=True,
    custom_pipeline=str(model_dir / "pipeline.py"),
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
)
pipe.to("cuda")

generator = torch.Generator(device="cuda").manual_seed(42)
image = pipe(
    class_labels=207,  # integer ImageNet class id (golden retriever)
    height=512,
    width=512,
    num_inference_steps=100,
    guidance_scale=3.5,
    guidance_interval_min=0.1,
    guidance_interval_max=1.0,
    generator=generator,
).images[0]
image.save("demo.png")
Recommended inference settings
| Variant | Steps | CFG scale | Scheduler shift | CFG interval |
|---|---|---|---|---|
| PixelDiT-T2I-1024 | 50 | 2.75 | 4.0 | [0.0, 1.0] |
| PixelDiT-XL-16-256 | 100 | 2.75 | 1.0 | [0.1, 0.9] |
| PixelDiT-XL-16-512 | 100 | 3.5 | 2.0 | [0.1, 1.0] |
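The scheduler shift column controls how strongly the flow-matching noise schedule is pushed toward high noise levels. As a rough illustration, a static shift in flow-matching Euler schedulers maps each sigma to shift * sigma / (1 + (shift - 1) * sigma). The helper name `shift_sigmas` and the evenly spaced four-step grid are assumptions made for this sketch, not values read from the checkpoints' scheduler_config.json.

```python
def shift_sigmas(sigmas, shift):
    """Apply a static flow-matching shift to noise levels in [0, 1]."""
    return [shift * s / (1.0 + (shift - 1.0) * s) for s in sigmas]


# Evenly spaced grid for illustration only: 1.0, 0.75, 0.5, 0.25.
num_steps = 4
base = [1.0 - i / num_steps for i in range(num_steps)]

# Compare the three shift values from the table above.
for shift in (1.0, 2.0, 4.0):
    print(shift, [round(s, 3) for s in shift_sigmas(base, shift)])
```

A shift of 1.0 leaves the grid unchanged, while larger shifts keep more of the steps at high noise, which matches the table's use of larger shifts at higher resolutions.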
PixelDiT denoises directly in pixel space (no VAE). height and width must be divisible by the patch size (16).
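A small pre-flight check can catch an invalid resolution before a long sampling run starts. The sketch below relies only on the patch size of 16 stated above. The helper names `check_resolution` and `round_to_patch` are hypothetical and are not part of the pipeline.

```python
PATCH_SIZE = 16  # patch size stated above


def check_resolution(height, width, patch_size=PATCH_SIZE):
    """Return the patch grid (rows, cols), or raise if a side is invalid."""
    for name, value in (("height", height), ("width", width)):
        if value <= 0 or value % patch_size != 0:
            raise ValueError(
                f"{name}={value} is not a positive multiple of {patch_size}"
            )
    return height // patch_size, width // patch_size


def round_to_patch(value, patch_size=PATCH_SIZE):
    """Round a requested side length down to the nearest valid multiple."""
    return max(patch_size, (value // patch_size) * patch_size)


print(check_resolution(1024, 1024))  # (64, 64)
print(round_to_patch(1000))          # 992
```

Because there is no VAE downsampling, the patch grid is computed on the pixel resolution itself, so a 1024×1024 image corresponds to a 64×64 grid of patches.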
Conversion
cd libs/PixelDiT-diffusers

# Text-to-image checkpoint
python scripts/convert_pixeldit_t2i_to_diffusers.py \
  --checkpoint /path/to/pixeldit_t2i_v1.pth \
  --config /path/to/config.json \
  --output /path/to/PixelDiT-T2I-1024 \
  --sample-size 1024 \
  --scheduler-shift 4.0 \
  --check-load

# ImageNet class-conditional checkpoint
python scripts/convert_pixeldit_to_diffusers.py \
  --checkpoint /path/to/imagenet256_pixeldit_xl_epoch320.ckpt \
  --output /path/to/PixelDiT-XL-16-256 \
  --model-size pixeldit-xl \
  --sample-size 256 \
  --scheduler-shift 1.0 \
  --check-load \
  --id2label /path/to/id2label_en.json
Citation
@inproceedings{yu2025pixeldit,
title={PixelDiT: Pixel Diffusion Transformers for Image Generation},
author={Yongsheng Yu and Wei Xiong and Weili Nie and Yichen Sheng and Shiqiu Liu and Jiebo Luo},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
year={2026},
}
License
Weights are converted from NVIDIA checkpoints released under the NSCLv1 License. Use for non-commercial research and evaluation only.