Instructions to use nvidia/NVIDIA-Nemotron-Parse-v1.1 with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use nvidia/NVIDIA-Nemotron-Parse-v1.1 with Transformers:
# Use a pipeline as a high-level helper from transformers import pipeline pipe = pipeline("image-text-to-text", model="nvidia/NVIDIA-Nemotron-Parse-v1.1", trust_remote_code=True) messages = [ { "role": "user", "content": [ {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"}, {"type": "text", "text": "What animal is on the candy?"} ] }, ] pipe(text=messages)# Load model directly from transformers import AutoModel model = AutoModel.from_pretrained("nvidia/NVIDIA-Nemotron-Parse-v1.1", trust_remote_code=True, dtype="auto") - Notebooks
- Google Colab
- Kaggle
- Local Apps Settings
- vLLM
How to use nvidia/NVIDIA-Nemotron-Parse-v1.1 with vLLM:
Install from pip and serve model
# Install vLLM from pip: pip install vllm # Start the vLLM server: vllm serve "nvidia/NVIDIA-Nemotron-Parse-v1.1" # Call the server using curl (OpenAI-compatible API): curl -X POST "http://localhost:8000/v1/chat/completions" \ -H "Content-Type: application/json" \ --data '{ "model": "nvidia/NVIDIA-Nemotron-Parse-v1.1", "messages": [ { "role": "user", "content": [ { "type": "text", "text": "Describe this image in one sentence." }, { "type": "image_url", "image_url": { "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg" } } ] } ] }'Use Docker
docker model run hf.co/nvidia/NVIDIA-Nemotron-Parse-v1.1
- SGLang
How to use nvidia/NVIDIA-Nemotron-Parse-v1.1 with SGLang:
Install from pip and serve model
# Install SGLang from pip: pip install sglang # Start the SGLang server: python3 -m sglang.launch_server \ --model-path "nvidia/NVIDIA-Nemotron-Parse-v1.1" \ --host 0.0.0.0 \ --port 30000 # Call the server using curl (OpenAI-compatible API): curl -X POST "http://localhost:30000/v1/chat/completions" \ -H "Content-Type: application/json" \ --data '{ "model": "nvidia/NVIDIA-Nemotron-Parse-v1.1", "messages": [ { "role": "user", "content": [ { "type": "text", "text": "Describe this image in one sentence." }, { "type": "image_url", "image_url": { "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg" } } ] } ] }'Use Docker images
docker run --gpus all \ --shm-size 32g \ -p 30000:30000 \ -v ~/.cache/huggingface:/root/.cache/huggingface \ --env "HF_TOKEN=<secret>" \ --ipc=host \ lmsysorg/sglang:latest \ python3 -m sglang.launch_server \ --model-path "nvidia/NVIDIA-Nemotron-Parse-v1.1" \ --host 0.0.0.0 \ --port 30000 # Call the server using curl (OpenAI-compatible API): curl -X POST "http://localhost:30000/v1/chat/completions" \ -H "Content-Type: application/json" \ --data '{ "model": "nvidia/NVIDIA-Nemotron-Parse-v1.1", "messages": [ { "role": "user", "content": [ { "type": "text", "text": "Describe this image in one sentence." }, { "type": "image_url", "image_url": { "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg" } } ] } ] }' - Docker Model Runner
How to use nvidia/NVIDIA-Nemotron-Parse-v1.1 with Docker Model Runner:
docker model run hf.co/nvidia/NVIDIA-Nemotron-Parse-v1.1
How can I run this model on a more recent version of transformers?
Hey guys first of all I'd like to thank you for this wonderful model which from my pov is the absolute greatest OCR type model at parsing financial tables specifically, literally no other model comes even close in accuracy when it comes to extracting large numbers so thanks again for that.
So far I've been able to run this model on transformers 4.51.3 (using the recommended versions of all other various packages), as well as the more recent versions of vllm, but neither of these options is ideal for my workflow since vllm has extremely long startup times on my hardware and the recommended version of transformers is quite old and does not support the "transformers serve" feature .
Ideally I'd like to use this model as an openai compatible api endpoint while avoiding the usage of vllm, but unfortunately after testing multiple recent version of transformers from 4.53 to 5.2.0 I always end up getting various errors such as
"Parse_hyphen_v1_dot_1_hyphen_TC/77d31557a2b9ca4a9caadf685bad38e7d4edd10b/hf_nemotron_parse_modeling.py", line 199, in forward
past_key_values_length = past_key_values[0][0].shape[2] if past_key_values is not None else 0
^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'shape'"
So I'm wondering if this model has been tested successfully on a recent transformers build and what environment do you recommend I use to get it working properly? Thanks a bunch.
@rsbdev We've updated to support latest versions of transformers and vLLM in #8
Also note that we have an improved version of this model here that you may want to check out https://huggingface.co/nvidia/NVIDIA-Nemotron-Parse-v1.2 (this also works with the latest transformers and vLLM versions too)