CDLM-Dream LoRA adapter for Dream-7B-Instruct
This repository hosts a LoRA adapter for the Dream-7B-Instruct diffusion LLM (dLLM), produced with the CDLM (Consistency Diffusion Language Models) method. CDLM combines consistency modeling with a block-wise causal attention mask, so the student model becomes fully KV-cache compatible while retaining strong bidirectional modeling within each block. In practice, the adapter enables significantly faster inference at competitive quality.
- GitHub: https://github.com/SqueezeAILab/CDLM
- Paper: CDLM: Consistency Diffusion Language Models for Faster Sampling (https://arxiv.org/abs/2511.19269)
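To make the block-wise causal attention mask concrete, here is a minimal, dependency-free sketch. It assumes a fixed block size and the simplest rule consistent with the description above: a position may attend to every position in its own block and in all earlier blocks. The function name `blockwise_causal_mask` and the block size are illustrative only; the actual mask in the CDLM codebase may differ (for example in how the prompt is handled).

```python
def blockwise_causal_mask(seq_len: int, block_size: int) -> list[list[bool]]:
    """Return mask[i][j] == True when query position i may attend to key position j.

    Illustrative assumption: attention is bidirectional inside a block and
    causal across blocks, so j is visible to i iff block(j) <= block(i).
    """
    if seq_len < 0 or block_size <= 0:
        raise ValueError("seq_len must be >= 0 and block_size must be > 0")
    return [
        [(j // block_size) <= (i // block_size) for j in range(seq_len)]
        for i in range(seq_len)
    ]


# Example: 6 positions split into 3 blocks of 2 tokens each.
mask = blockwise_causal_mask(seq_len=6, block_size=2)
for row in mask:
    print("".join("x" if allowed else "." for allowed in row))
# Prints:
# xx....
# xx....
# xxxx..
# xxxx..
# xxxxxx
# xxxxxx
```

Because no position ever attends to a later block, the keys and values of finished blocks never change and can be cached, which is what makes this masking pattern KV-cache friendly.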
Model details
- Base model: Dream-org/Dream-v0-Instruct-7B
- Method: CDLM (consistency distillation + block-wise causal masking for KV-cache compatibility)
- Format: PEFT LoRA adapter (adapter_model.safetensors, adapter_config.json)
- Intended use: attach this adapter to the base Dream-7B-Instruct model for accelerated inference via the CDLM decoding path
How to use
This is a LoRA adapter, not a full model. You must load the base model and then attach this adapter. For best speedups, use the CDLM inference path in the accompanying codebase.
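The two loading steps can be sketched as follows. This is a minimal example, not the official CDLM inference path: it assumes `torch`, `transformers`, and `peft` are installed, that the base model needs `trust_remote_code=True` because it ships custom modeling code, and that the adapter attaches through the standard `PeftModel.from_pretrained` call. The helper name `load_cdlm_dream` is hypothetical.

```python
BASE_MODEL_ID = "Dream-org/Dream-v0-Instruct-7B"
ADAPTER_ID = "minseo25/CDLM-Dream"


def load_cdlm_dream(device: str = "cuda"):
    """Load the base Dream model and attach the CDLM LoRA adapter (sketch)."""
    # Heavy dependencies are imported lazily, only when the model is loaded.
    import torch
    from peft import PeftModel
    from transformers import AutoModel, AutoTokenizer

    # Assumption: the base repository provides custom code, so remote code
    # execution must be explicitly allowed.
    tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL_ID, trust_remote_code=True)
    base_model = AutoModel.from_pretrained(
        BASE_MODEL_ID,
        torch_dtype=torch.bfloat16,
        trust_remote_code=True,
    )

    # Attach the LoRA weights from this repository on top of the base model.
    model = PeftModel.from_pretrained(base_model, ADAPTER_ID)
    return model.to(device).eval(), tokenizer
```

Calling `model, tokenizer = load_cdlm_dream()` downloads the base weights and the adapter. The sketch only covers loading; to obtain the reported speedups, run generation through the CDLM decoding code in the GitHub repository rather than the base model's default sampler.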
License
This adapter is released under the MIT License. The base model is governed by its own license; please ensure compliance with the base model’s terms.
Citation
@article{kim2025cdlm,
title = {CDLM: Consistency Diffusion Language Models for Faster Sampling},
author = {Kim, Minseo and Xu, Chenfeng and Hooper, Coleman and Singh, Harman
and Athiwaratkun, Ben and Zhang, Ce and Keutzer, Kurt and Gholami, Amir},
journal = {arXiv preprint arXiv:2511.19269},
year = {2025},
url = {https://arxiv.org/abs/2511.19269}
}