Instructions to use meta-llama/Llama-3.1-8B-Instruct with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use meta-llama/Llama-3.1-8B-Instruct with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="meta-llama/Llama-3.1-8B-Instruct")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.1-8B-Instruct")
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.1-8B-Instruct")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use meta-llama/Llama-3.1-8B-Instruct with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "meta-llama/Llama-3.1-8B-Instruct"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "meta-llama/Llama-3.1-8B-Instruct",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
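
The same request can also be sent from Python. This is a minimal sketch using only the standard library, assuming the server started by `vllm serve` above is listening on `localhost:8000`. The helper names `build_chat_request` and `ask` are illustrative and not part of vLLM.

```python
import json
import urllib.request

# Assumption: host and port match the curl call above.
BASE_URL = "http://localhost:8000/v1"
MODEL = "meta-llama/Llama-3.1-8B-Instruct"


def build_chat_request(prompt, base_url=BASE_URL, model=MODEL):
    """Build (but do not send) a POST request mirroring the curl call."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt):
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(build_chat_request(prompt)) as resp:
        body = json.load(resp)
    # OpenAI-compatible responses carry the reply in choices[0].message.content.
    return body["choices"][0]["message"]["content"]


# Usage (requires the server to be running):
# print(ask("What is the capital of France?"))
```

Building the request separately from sending it makes the payload easy to inspect before any network call is made.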

Use Docker

docker model run hf.co/meta-llama/Llama-3.1-8B-Instruct

SGLang

How to use meta-llama/Llama-3.1-8B-Instruct with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "meta-llama/Llama-3.1-8B-Instruct" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "meta-llama/Llama-3.1-8B-Instruct",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "meta-llama/Llama-3.1-8B-Instruct" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "meta-llama/Llama-3.1-8B-Instruct",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use meta-llama/Llama-3.1-8B-Instruct with Docker Model Runner:
```
docker model run hf.co/meta-llama/Llama-3.1-8B-Instruct
```

!!Access Problem

#37

by minglingfeng - opened Jul 24, 2024

Discussion

minglingfeng

Jul 24, 2024

Why was I denied access to the models?

jchoudhari

Jul 24, 2024

It seems you need to request access first.

That can be done by filling out the form under "Files and versions".

Alteir

Jul 24, 2024

•

edited Jul 24, 2024

Me too, applied and rejected.

I wonder why this happened?

sydney1

Jul 24, 2024

I have the same problem!!

eccstartup

Jul 24, 2024

Me too. Maybe Llama 3.1 is not really an open LLM. :-)

pig-pig

Jul 25, 2024

It's like how OpenAI isn't open source.

Night0717

Jul 25, 2024

Me too. Maybe some countries (like China) are banned?

LronDC

Jul 25, 2024

Yes, China is obviously banned. I don't understand. I can choose another country at will and get permission, but they still have to set such a restriction in such a dogmatic way. I don't know what they want to express. Or maybe it's not something they can control.

jm8

Jul 25, 2024

This comment has been hidden

llanguagemtrainer

Jul 26, 2024

Maybe it's only open for research; if you were a student, you might have gotten access.

hemangjoshi37a

Aug 15, 2024

In my case it says this:

OSError: You are trying to access a gated repo.
Make sure to have access to it at https://huggingface.co/meta-llama/Meta-Llama-3.1-8B.
403 Client Error. (Request ID: Root=1-66bde90e-307a3c0419da184f621cea2a;56d907ca-6175-4ec2-88c8-0ec3a17003d0)

Cannot access gated repo for url https://huggingface.co/meta-llama/Meta-Llama-3.1-8B/resolve/main/config.json.
Your request to access model meta-llama/Meta-Llama-3.1-8B is awaiting a review from the repo authors.

Very sad, I can't access this.
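
The message quoted above says the request is "awaiting a review", which reads as pending rather than rejected. A small sketch for pulling the relevant fields out of such a message, using only the standard library; the function name `describe_gated_error` is hypothetical, and the matching relies solely on the wording in the quoted log, which may differ in other library versions.

```python
import re

# Excerpt of the error text quoted in the comment above.
MESSAGE = (
    "OSError: You are trying to access a gated repo.\n"
    "Make sure to have access to it at "
    "https://huggingface.co/meta-llama/Meta-Llama-3.1-8B.\n"
    "403 Client Error.\n"
    "Your request to access model meta-llama/Meta-Llama-3.1-8B "
    "is awaiting a review from the repo authors."
)


def describe_gated_error(message):
    """Summarize a gated-repo error message (hypothetical helper)."""
    repo = re.search(r"huggingface\.co/([\w.-]+/[\w.-]+)", message)
    status = re.search(r"\b(\d{3}) Client Error", message)
    return {
        # Strip the sentence-ending period that follows the URL in the log.
        "repo": repo.group(1).rstrip(".") if repo else None,
        "status": int(status.group(1)) if status else None,
        # Assumption: this phrase marks a pending, not rejected, request.
        "pending_review": "awaiting a review" in message,
    }


print(describe_gated_error(MESSAGE))
```

Note that the repo named in this log is `meta-llama/Meta-Llama-3.1-8B`, not the Instruct repo this thread is attached to.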

gemengmeng

Sep 10, 2024

Closed AI! What do you mean? So ridiculous!

panamantis

Jan 15, 2025

I was rejected as well. No reason given. Is anyone else doing SAE research on the explainability of the Llama layer(s)? I'm doing it for fun, but I suppose I can't now.

adrianSimi

May 29, 2025
