Instructions to use nvidia/NVIDIA-Nemotron-Parse-v1.1 with libraries, inference providers, notebooks, and local apps.

Libraries

How to use nvidia/NVIDIA-Nemotron-Parse-v1.1 with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="nvidia/NVIDIA-Nemotron-Parse-v1.1", trust_remote_code=True)
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)

# Load model directly
from transformers import AutoModel
model = AutoModel.from_pretrained("nvidia/NVIDIA-Nemotron-Parse-v1.1", trust_remote_code=True, dtype="auto")

Local Apps

vLLM

How to use nvidia/NVIDIA-Nemotron-Parse-v1.1 with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "nvidia/NVIDIA-Nemotron-Parse-v1.1"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "nvidia/NVIDIA-Nemotron-Parse-v1.1",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'
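
For scripted use, the same request can be built from Python instead of curl. The following is a minimal sketch using only the standard library; the helper names `build_chat_request` and `send` are hypothetical and not part of vLLM or any client library. It assumes the server is already running at the given base URL and accepts the OpenAI-compatible chat format shown in the curl call above.

```python
import json
import urllib.request

MODEL_ID = "nvidia/NVIDIA-Nemotron-Parse-v1.1"


def build_chat_request(base_url, image_url, prompt, model=MODEL_ID):
    """Build (but do not send) a POST request for /v1/chat/completions.

    Hypothetical helper: it mirrors the JSON body of the curl example.
    """
    payload = {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=120):
    """Send the request and return the decoded JSON response as a dict."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Passing `"http://localhost:8000"` as `base_url` targets the vLLM server started above; a server on another port, such as an SGLang server on 30000, only needs a different `base_url`. In the OpenAI-compatible format the generated text is normally found under `choices[0]["message"]["content"]` of the returned dict.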

Use Docker

docker model run hf.co/nvidia/NVIDIA-Nemotron-Parse-v1.1

SGLang

How to use nvidia/NVIDIA-Nemotron-Parse-v1.1 with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "nvidia/NVIDIA-Nemotron-Parse-v1.1" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "nvidia/NVIDIA-Nemotron-Parse-v1.1",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "nvidia/NVIDIA-Nemotron-Parse-v1.1" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using the same curl request shown in the pip instructions above (the container also listens on port 30000).

Docker Model Runner
How to use nvidia/NVIDIA-Nemotron-Parse-v1.1 with Docker Model Runner:
```
docker model run hf.co/nvidia/NVIDIA-Nemotron-Parse-v1.1
```

How can I run this model on a more recent version of transformers?

by rsbdev - opened Feb 9

Hey guys, first of all I'd like to thank you for this wonderful model, which from my point of view is the absolute greatest OCR-type model at parsing financial tables specifically. No other model comes even close in accuracy when it comes to extracting large numbers, so thanks again for that.

So far I've been able to run this model on transformers 4.51.3 (using the recommended versions of all the other packages), as well as on the more recent versions of vLLM. Neither option is ideal for my workflow, though: vLLM has extremely long startup times on my hardware, and the recommended version of transformers is quite old and does not support the "transformers serve" feature.

Ideally I'd like to use this model as an OpenAI-compatible API endpoint while avoiding vLLM. Unfortunately, after testing multiple recent versions of transformers from 4.53 to 5.2.0, I always end up getting various errors such as

"Parse_hyphen_v1_dot_1_hyphen_TC/77d31557a2b9ca4a9caadf685bad38e7d4edd10b/hf_nemotron_parse_modeling.py", line 199, in forward
past_key_values_length = past_key_values[0][0].shape[2] if past_key_values is not None else 0
^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'shape'"

So I'm wondering whether this model has been tested successfully on a recent transformers build, and what environment you recommend I use to get it working properly. Thanks a bunch.
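
The traceback above shows that `past_key_values` itself was not `None` but `past_key_values[0][0]` was, so the `.shape` lookup failed. A minimal sketch of a more defensive length computation follows. It is an illustration only: `past_length` is a hypothetical helper, not code from the model repository and not necessarily the change made upstream. It assumes that cache objects in recent transformers releases expose `get_seq_length()`, and that the legacy format is a tuple of per-layer tuples of tensors shaped `(batch, heads, seq_len, head_dim)`.

```python
def past_length(past_key_values):
    """Return the number of cached positions, or 0 when nothing is cached.

    Hypothetical helper for illustration only.
    """
    if past_key_values is None:
        return 0

    # Assumption: newer cache objects report their own length.
    get_seq_length = getattr(past_key_values, "get_seq_length", None)
    if callable(get_seq_length):
        return int(get_seq_length())

    # Legacy tuple-of-tuples format; guard against empty or unfilled entries.
    try:
        first_key = past_key_values[0][0]
    except (IndexError, TypeError):
        return 0
    if first_key is None:
        return 0
    return int(first_key.shape[2])
```

The point of the sketch is that a `None` check on the outer object alone is not enough once the cache can be present but still empty.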

nvidia-oliver-holworthy

NVIDIA org 29 days ago

@rsbdev We've updated to support latest versions of transformers and vLLM in #8

Also note that we have an improved version of this model that you may want to check out: https://huggingface.co/nvidia/NVIDIA-Nemotron-Parse-v1.2 (it works with the latest transformers and vLLM versions too).
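
When retrying after an update like this, it can help to confirm which package versions are actually installed in the active environment. A small sketch using the standard library's `importlib.metadata` is shown below; the function name `installed_versions` is hypothetical, and the package names are assumed to be the usual PyPI distribution names.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages=("transformers", "vllm")):
    """Map each package name to its installed version, or None if absent."""
    report = {}
    for name in packages:
        try:
            report[name] = version(name)
        except PackageNotFoundError:
            # The package is not installed in this environment.
            report[name] = None
    return report
```

Printing `installed_versions()` before loading the model makes it easy to include the exact environment in a bug report.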

nvidia-oliver-holworthy changed discussion status to closed 29 days ago
