Llama 3.1 crashing Kobold
Is anyone else having Kobold CTD (crash to desktop) when using a GGUF?
hi hi!
You use Llama? My older Llama works fine. With this new 3.1, I don't know if it's too powerful or what the deal is.
not the right place to talk about that dude
Why? Because the program I use is Kobold? What's wrong with that?
Dude, you're talking about Lumimaid in meta-llama/Llama-3.3-70B-Instruct.
And even then, using Kobold would mean GGUF; this is a safetensors model.
No, the models I found were GGUF. I wanted to use the 3.3 model. Yes, I know THIS model is safetensors. If I can't even get the 3.1 GGUF model to work, why would I request an upgrade? That's the point I'm trying to make with my question. Sorry it wasn't clear to the community.
I don't think any of us really understand what you're trying to say.
You're asking about Lumimaid, someone else's tune of a Llama model, in GGUF format, crashing in Koboldcpp.
Meta's Llama 3.3 model has absolutely no relation to Lumimaid, GGUF, or Koboldcpp, so this isn't the appropriate place to ask.
Either try the original model page where you downloaded it, or Koboldcpp's GitHub/Discord.
You're not helping, Bartowski. At this point, with you following every post I make and harassing me, I am going to say it here: LEAVE ME ALONE
LEAVE ME ALONE
One more time since cyber law requires it.. LEAVE ME ALONE.
Now I get to screenshot this and send the file off.
Leave me alone dude
Hi all, Drummer here…
Lmfao
Insulting possibly the most prolific contributor to open source LLMs
Meh it's okay, just hope they find the help they're looking for somehow lol