Instructions to use openai/gpt-oss-120b with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use openai/gpt-oss-120b with Transformers:
# Use a pipeline as a high-level helper from transformers import pipeline pipe = pipeline("text-generation", model="openai/gpt-oss-120b") messages = [ {"role": "user", "content": "Who are you?"}, ] pipe(messages)# Load model directly from transformers import AutoTokenizer, AutoModelForCausalLM tokenizer = AutoTokenizer.from_pretrained("openai/gpt-oss-120b") model = AutoModelForCausalLM.from_pretrained("openai/gpt-oss-120b") messages = [ {"role": "user", "content": "Who are you?"}, ] inputs = tokenizer.apply_chat_template( messages, add_generation_prompt=True, tokenize=True, return_dict=True, return_tensors="pt", ).to(model.device) outputs = model.generate(**inputs, max_new_tokens=40) print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:])) - Inference
- HuggingChat
- Notebooks
- Google Colab
- Kaggle
- Local Apps Settings
- vLLM
How to use openai/gpt-oss-120b with vLLM:
Install from pip and serve model
# Install vLLM from pip: pip install vllm # Start the vLLM server: vllm serve "openai/gpt-oss-120b" # Call the server using curl (OpenAI-compatible API): curl -X POST "http://localhost:8000/v1/chat/completions" \ -H "Content-Type: application/json" \ --data '{ "model": "openai/gpt-oss-120b", "messages": [ { "role": "user", "content": "What is the capital of France?" } ] }'Use Docker
docker model run hf.co/openai/gpt-oss-120b
- SGLang
How to use openai/gpt-oss-120b with SGLang:
Install from pip and serve model
# Install SGLang from pip: pip install sglang # Start the SGLang server: python3 -m sglang.launch_server \ --model-path "openai/gpt-oss-120b" \ --host 0.0.0.0 \ --port 30000 # Call the server using curl (OpenAI-compatible API): curl -X POST "http://localhost:30000/v1/chat/completions" \ -H "Content-Type: application/json" \ --data '{ "model": "openai/gpt-oss-120b", "messages": [ { "role": "user", "content": "What is the capital of France?" } ] }'Use Docker images
docker run --gpus all \ --shm-size 32g \ -p 30000:30000 \ -v ~/.cache/huggingface:/root/.cache/huggingface \ --env "HF_TOKEN=<secret>" \ --ipc=host \ lmsysorg/sglang:latest \ python3 -m sglang.launch_server \ --model-path "openai/gpt-oss-120b" \ --host 0.0.0.0 \ --port 30000 # Call the server using curl (OpenAI-compatible API): curl -X POST "http://localhost:30000/v1/chat/completions" \ -H "Content-Type: application/json" \ --data '{ "model": "openai/gpt-oss-120b", "messages": [ { "role": "user", "content": "What is the capital of France?" } ] }' - Docker Model Runner
How to use openai/gpt-oss-120b with Docker Model Runner:
docker model run hf.co/openai/gpt-oss-120b
ClosedAI: MXFP4 is not Open Source
Can we talk about how ridiculous it is that we only get MXFP4 weights for gpt-oss?
By withholding the BF16 source weights, OpenAI is making it nearly impossible for the community to fine-tune these models without significant intelligence degradation. It feels less like a contribution to the community and more like a marketing stunt for NVIDIA Blackwell.
The "Open" in OpenAI has never felt more like a lie. Welcome to the era of ClosedAI, where "open weights" actually means "quantized weights that you can't properly tune."
Give us the BF16 weights, or stop calling these models "Open."
I'm very happy that they did QAT on MXFP4, this is the main reason it remains so smart while beeing the fastest of all the open models (even latest qwen3 MOE models), and even after existing now 6 months, it still dominates in intelligence and speed.
It's indeed a bit pitty that the BF16 were not published as well: in open source this could have ment that we can learn from it, and then make it better for openAI, that's how it works: Opensource community improves it
Now they can't . That's pitty because the most important disadavantage that AI will create in the world is accesibility for NON-rich people.
With MXFP4 QAT training we finally have something that doesn't require a server, it can run on consumer hardware. Therefore poorer people can have access to intelligence .