Image resolutions that will work well?
Thank you for all the hard work that went into creating this model and providing it to the community!
The model card could be improved by making it clear what resolutions your model supports/will perform well with/was trained on. This is the most basic information for a vision LLM: what inputs will work (well) with it? For some reason almost everyone releasing vision LLMs makes this very hard to figure out.
I'm guessing it is like your 2.0: up to 12 tiles of 448x448 pixels? Some things that weren't clear to me with that were:
- What if one of the dimensions of your image isn't divisible by 448?
- What if your image would require more than 12 tiles?
- If inputs violating those constraints aren't outright rejected, what happens (e.g. do the tiles overlap, or is the image resized or cropped)? Is the model trained on such images?
Thanks again!
Thank you for your kind words and valuable feedback! We appreciate your suggestion to clarify supported resolutions in the model card. Here's the detailed information:
- If one of the dimensions of your image isn't divisible by 448, the image will be resized to the nearest dimensions divisible by 448, which might introduce some slight distortion.
- You can control the resolution and tiling behavior using the max_num parameter. By default, we set max_num=12, but you can adjust this to 18 or 24 tiles to process higher-resolution images.
- If an input violates these constraints (e.g., exceeds the maximum number of tiles), the model may resize or crop the image to fit within the supported tiling limits. The model has been trained on such cases to ensure robust performance.
Additionally, you can refer to the dynamic_preprocess function in the README for more details on how preprocessing is handled dynamically.
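To make the answers above concrete, here is a minimal sketch of how a tile grid might be chosen for a given image size. It is only loosely modeled on the idea behind dynamic_preprocess and is not the README's code: the function names pick_tile_grid and plan_tiles are hypothetical, and the tie-breaking rule is an assumption. It covers the resize-to-grid step only, with no cropping and no thumbnail tile. Treat the README as the authoritative version.

```python
def pick_tile_grid(width, height, max_num=12, tile=448):
    """Pick a (cols, rows) grid whose aspect ratio is closest to the image's.

    Hypothetical helper: only grids with cols * rows <= max_num are considered.
    """
    aspect = width / height
    # Enumerate every allowed grid, smallest tile count first (deterministic order).
    grids = sorted(
        {(c, r)
         for c in range(1, max_num + 1)
         for r in range(1, max_num + 1)
         if c * r <= max_num},
        key=lambda g: (g[0] * g[1], g),
    )
    best, best_diff = (1, 1), float("inf")
    for cols, rows in grids:
        diff = abs(aspect - cols / rows)
        if diff < best_diff:
            best, best_diff = (cols, rows), diff
        elif diff == best_diff and width * height > 0.5 * tile * tile * cols * rows:
            # Assumed tie-break: prefer the larger grid only when the image
            # has enough pixels to fill at least half of it.
            best = (cols, rows)
    return best


def plan_tiles(width, height, max_num=12, tile=448):
    """Return (cols, rows, target_width, target_height) for the resize step."""
    cols, rows = pick_tile_grid(width, height, max_num, tile)
    return cols, rows, cols * tile, rows * tile


# Dimensions not divisible by 448: the image is stretched to fit the grid.
print(plan_tiles(1000, 700))               # (3, 2, 1344, 896)
# An image that would need 20 tiles is squeezed into 12 by default...
print(plan_tiles(2240, 1792))              # (4, 3, 1792, 1344)
# ...but keeps its native 5x4 layout when max_num is raised.
print(plan_tiles(2240, 1792, max_num=24))  # (5, 4, 2240, 1792)
```

In this sketch, tiles never overlap: the image is always resized to an exact multiple of 448 in each dimension first, which is where the slight distortion mentioned above comes from.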