Instructions to use meta-llama/Llama-3.3-70B-Instruct with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use meta-llama/Llama-3.3-70B-Instruct with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="meta-llama/Llama-3.3-70B-Instruct")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

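tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.3-70B-Instruct")
# A 70B model will not fit on a single consumer GPU. device_map="auto" (requires the
# accelerate package) spreads the weights across the available devices, and
# torch_dtype="auto" loads them in the checkpoint's dtype instead of float32.
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.3-70B-Instruct",
    torch_dtype="auto",
    device_map="auto",
)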
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

vLLM

How to use meta-llama/Llama-3.3-70B-Instruct with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "meta-llama/Llama-3.3-70B-Instruct"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "meta-llama/Llama-3.3-70B-Instruct",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
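
The same request can also be sent from Python. The following is a minimal sketch using only the standard library, assuming the server started above is listening on `localhost:8000` and returns the usual OpenAI-style response shape. The helper names `build_chat_request` and `chat` are hypothetical and chosen for illustration; they are not part of vLLM.

```python
import json
import urllib.request


def build_chat_request(base_url, model, user_message):
    """Build (but do not send) a POST request for /v1/chat/completions."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(base_url, model, user_message, timeout=60):
    """Send the request and return the assistant's reply text.

    Assumes an OpenAI-compatible response, where the reply is found
    under choices[0].message.content.
    """
    request = build_chat_request(base_url, model, user_message)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

With the server running, `chat("http://localhost:8000", "meta-llama/Llama-3.3-70B-Instruct", "What is the capital of France?")` should return the model's answer as a string. Building the request separately from sending it keeps the payload easy to inspect without a live server.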

Use Docker

docker model run hf.co/meta-llama/Llama-3.3-70B-Instruct

SGLang

How to use meta-llama/Llama-3.3-70B-Instruct with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "meta-llama/Llama-3.3-70B-Instruct" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "meta-llama/Llama-3.3-70B-Instruct",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "meta-llama/Llama-3.3-70B-Instruct" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "meta-llama/Llama-3.3-70B-Instruct",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner

How to use meta-llama/Llama-3.3-70B-Instruct with Docker Model Runner:

```
docker model run hf.co/meta-llama/Llama-3.3-70B-Instruct
```

Llama 3.1 crashing Kobold

#29

by Keionsa - opened Dec 12, 2024

Discussion

Keionsa

Dec 12, 2024

Is anyone else having Kobold crash to desktop (CTD) when using a GGUF?

lucasnil-1211

Dec 13, 2024

Keionsa

Dec 14, 2024

Hi hi!
Do you use Llama? My older Llama works fine. With this new 3.1, I don't know if it's too powerful or what the deal is.

karrelin

Dec 20, 2024

Not the right place to talk about that, dude.

Keionsa

Dec 21, 2024

Why? Because the program I use is Kobold? What's wrong with that?

karrelin

Dec 23, 2024

Dude, you're talking about Lumimaid in meta-llama/Llama-3.3-70B-Instruct.

bartowski

Dec 23, 2024

And even then, using Kobold would mean GGUF; this is a safetensors model.

Keionsa

Dec 26, 2024

No, the models I found were GGUF. I wanted to use the 3.3 model. Yes, I know THIS model is safetensors. If I can't even get the 3.1 GGUF model to work, why would I request an upgrade? That's the point I'm trying to make with my question. Sorry it wasn't clear to the community.

bartowski

Dec 26, 2024

I don't think any of us really understand what you're trying to say.

You're asking about Lumimaid, someone else's tune of a Llama model, in GGUF format, crashing in Koboldcpp.

Meta's Llama 3.3 model has absolutely no relation to Lumimaid, GGUF, or Koboldcpp, so this isn't the appropriate place to ask.

Either try the original model page where you downloaded it, or Koboldcpp's GitHub/Discord.

Keionsa changed discussion title from "Llama 3.1 lumi maid crashing Kobold" to "Llama 3.1 crashing Kobold" Dec 29, 2024

Keionsa

Dec 30, 2024

You're not helping, bartowski. At this point, with you following every post I make and harassing me, I am going to say it here: LEAVE ME ALONE.
LEAVE ME ALONE.
One more time, since cyber law requires it: LEAVE ME ALONE.
Now I get to screenshot this and send the file off.
Leave me alone, dude.

TheDrummer

Jan 2, 2025

Hi all, Drummer here…

linkpharm

Jan 2, 2025

You're not helping, bartowski. At this point, with you following every post I make and harassing me, I am going to say it here: LEAVE ME ALONE.
LEAVE ME ALONE.
One more time, since cyber law requires it: LEAVE ME ALONE.
Now I get to screenshot this and send the file off.
Leave me alone, dude.

Lmfao

linkpharm

Jan 2, 2025

Insulting possibly the most prolific contributor to open-source LLMs.

bartowski

Jan 2, 2025

Meh it's okay, just hope they find the help they're looking for somehow lol
