Hulu-Med: A Transparent Generalist Model towards Holistic Medical Vision-Language Understanding
📄 Paper | 💻 Code | 🤗 Hulu-Med-Flash-Preview-27B | 🤗 Hulu-Med-30A3 | 🤗 Hulu-Med-235A22 | 🤗 Hulu-Med-4B | 🤗 Hulu-Med-7B | 🤗 Hulu-Med-14B | 🤗 Hulu-Med-32B | 🔮 ModelScope Models | 📊 Demo
🔥 News
- [2025-11-27] ⚡ Hulu-Med is now compatible with the latest vLLM, offering faster inference and tensor parallel support! Thank you all for your patience and feedback 💪 See here for installation instructions.
- [2025-11-18] 🎊 We released Hulu-Med-4B, a lightweight model with strong multimodal and text reasoning abilities that surpasses MedGemma-4B and Lingshu-7B!
- [2025-11-01] 📊 We released our new evaluation code, MedUniEval! Built on MedEvalKit, MedUniEval is designed for comprehensive evaluation of medical vision-language models across modalities including text, 2D, 3D, and video. More benchmarks are coming soon; some processed evaluation data are available here.
- [2025-10-16] 🚀 Demo Is Live! We've just deployed a demo and we'd love for you to try it! Your insights and feedback are crucial for helping us improve the model in the next version.
- [2025-10-15] 🎉 Hulu-Med now supports Transformers integration! HuggingFace-compatible models have been released with simplified loading and inference; vLLM integration is ongoing. The HF models are now available in the main branch on Hugging Face.
- The model has been updated in the main branch of our Hugging Face repository. You can now load it directly using `AutoModelForCausalLM.from_pretrained`; the weights will be downloaded automatically. For users in regions with limited access, set the HF mirror environment variable to ensure reliable downloads:
  ```shell
  export HF_ENDPOINT=https://hf-mirror.com
  ```
- [2025-10-08] Hulu-Med models and inference code released!
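The Transformers loading path mentioned in the 2025-10-15 note can be sketched as below. This is a minimal sketch: the repo ID is a placeholder (check the model collection above for exact names), the `device_map`/dtype choices are illustrative, and the mirror step is only needed in regions with limited access.

```python
import os

def configure_hf_mirror(endpoint: str = "https://hf-mirror.com") -> str:
    """Optionally route Hub downloads through a mirror.

    Must run before transformers/huggingface_hub are imported to take effect.
    """
    os.environ.setdefault("HF_ENDPOINT", endpoint)
    return os.environ["HF_ENDPOINT"]

def load_hulu(repo_id: str):
    """Load a Hulu-Med HF checkpoint; weights are downloaded automatically."""
    from transformers import AutoModelForCausalLM, AutoTokenizer  # deferred import

    tokenizer = AutoTokenizer.from_pretrained(repo_id)
    model = AutoModelForCausalLM.from_pretrained(
        repo_id, torch_dtype="auto", device_map="auto"
    )
    return tokenizer, model

endpoint = configure_hf_mirror()
# tokenizer, model = load_hulu("<org>/Hulu-Med-7B")  # repo ID is illustrative
```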
📖 Overview
Hulu-Med is a transparent medical vision-language model that unifies understanding across diverse modalities including medical text, 2D/3D images, and videos. Built with a focus on transparency and accessibility, Hulu-Med achieves state-of-the-art performance on 30 medical benchmarks while being trained entirely on public data.
Key Features
- 🌟 Holistic Multimodal Understanding: Seamlessly processes medical text, 2D images, 3D volumes, and surgical videos
- 🔓 Fully Transparent: Complete open-source pipeline including data curation, training code, and model weights
- 📊 State-of-the-Art Performance: Outperforms leading open-source models and competes with proprietary systems
- ⚡ Efficient Training: Only 4,000-40,000 GPU hours required for 7B-32B variants
- 🗂️ Comprehensive Coverage: Trained on 16.7M samples spanning 12 anatomical systems and 14 imaging modalities
- 🤗 Transformers Native: Now with native HuggingFace Transformers support for easier integration
Comprehensive Data Coverage
Our training corpus encompasses:
- 12 Major Anatomical Systems: Skin/Integumentary, Respiratory, Digestive, Nervous, Cardiovascular, Musculoskeletal, Reproductive, Urinary, Endocrine, Immune/Lymphatic, and Hematologic systems, plus cross-system categories (Multi-System, Cellular/Tissue Level, and Whole Body)
- 14 Medical Imaging Modalities: CT, MRI, X-Ray, Ultrasound, PET, OCT, Endoscopy, Microscopy, Histopathology, Fundus, Dermoscopy, Angiography, Digital Photograph, and Medical Chart
- Diverse Downstream Tasks: Medical Dialogue, Anomaly Detection, Prognosis Prediction, Treatment Planning, Surgical Skill Assessment, Education, Medical Report Generation, Surgical Phase Recognition, Medical Computation, and more
💻 Quick Start
Note: As MoE-based models, Hulu-Med-30A3 and Hulu-Med-235A22 are best served via vLLM or SGLang for optimal performance and efficiency.
Start the Server
vLLM
```shell
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 PYTHONPATH=./Swift-HuluMed/ swift deploy \
    --model Hulu-30A3 \
    --infer_backend vllm \
    --vllm_tensor_parallel_size 8 \
    --vllm_engine_kwargs '{"data_parallel_size": 1, "enable_chunked_prefill": true, "enable_multimodal_encoder_data_parallel": false}' \
    --vllm_max_num_seqs 512 \
    --vllm_enable_expert_parallel \
    --vllm_max_model_len 75538 \
    --vllm_gpu_memory_utilization 0.85 \
    --model_type qwen3_vl_moe \
    --port 8000 \
    --served_model_name hulu
```
SGLang
```shell
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 PYTHONPATH=./Swift-HuluMed/ swift deploy \
    --model Hulu-30A3 \
    --infer_backend sglang \
    --max_new_tokens 128000 \
    --sglang_context_length 128000 \
    --sglang_tp_size 8 \
    --model_type qwen3_moe_vl \
    --port 8000 \
    --served_model_name hulu
```
Inference via OpenAI-Compatible API
Text Example
```python
from openai import OpenAI

# Point the client at the locally deployed server (no real API key required)
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="hulu",
    messages=[{"role": "user", "content": "Hello, I have a headache, what should I do?"}],
    max_tokens=1024,
    temperature=0,
)
print(response.choices[0].message.content)
```
Image Example
```python
import base64
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

# Encode the image as base64 so it can be embedded as a data URL
with open("./demo/demo.jpg", "rb") as f:
    image_data = base64.b64encode(f.read()).decode("utf-8")

response = client.chat.completions.create(
    model="hulu",
    messages=[{
        "role": "user",
        "content": [
            {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{image_data}"}},
            {"type": "text", "text": "Generate a medical report for this image."},
        ],
    }],
    max_tokens=1024,
    temperature=0,
)
print(response.choices[0].message.content)
```
📋 Supported Tasks
- ✅ Visual Question Answering (2D/3D/Video)
- ✅ Medical Report Generation
- ✅ Disease Diagnosis
- ✅ Anatomical Understanding
- ✅ Surgical Phase Recognition
- ✅ Clinical Dialogue
- ✅ Medical Text Reasoning
- ✅ Multilingual Medical QA
- ✅ Rare Disease Diagnosis
- ✅ And more
📄 Citation
If you find Hulu-Med useful in your research, please cite:
```bibtex
@misc{jiang2025hulumedtransparentgeneralistmodel,
  title={Hulu-Med: A Transparent Generalist Model towards Holistic Medical Vision-Language Understanding},
  author={Songtao Jiang and Yuan Wang and Sibo Song and Tianxiang Hu and Chenyi Zhou and Bin Pu and Yan Zhang and Zhibo Yang and Yang Feng and Joey Tianyi Zhou and Jin Hao and Zijian Chen and Ruijia Wu and Tao Tang and Junhui Lv and Hongxia Xu and Hongwei Wang and Jun Xiao and Bin Feng and Fudong Zhu and Kenli Li and Weidi Xie and Jimeng Sun and Jian Wu and Zuozhu Liu},
  year={2025},
  eprint={2510.08668},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2510.08668},
}
```
📜 License
This project is released under the Apache 2.0 License.