---
license: apache-2.0
base_model:
- openai/gpt-oss-20b
library_name: transformers
---

# gpt-oss-20b ONNX model (dequantized to BF16)

This repository contains an ONNX export of [openai/gpt-oss-20b](https://huggingface.co/openai/gpt-oss-20b) from Hugging Face, generated using the official [onnxruntime-genai](https://github.com/microsoft/onnxruntime-genai) model builder. BF16 precision was chosen mainly because of limited resources on my M4 Mac mini, and secondarily because of my limited familiarity with the GenAI engineering ecosystem.

## Model Overview

- **Source Model:** [openai/gpt-oss-20b](https://huggingface.co/openai/gpt-oss-20b) from 🤗
- **Exported Format:** ONNX
- **Precision:** BF16 (dequantized from MXFP4 for GPU compatibility)
- **Layers:** 24 decoder layers, plus the embedding layer, final normalization, and language modeling (LM) head

This repository includes all supporting files: tokenizer, chat template, and configuration files.

## Generation Details

The ONNX model was generated using the `builder.py` script from the onnxruntime-genai toolkit (a sketch of the invocation appears at the end of this card). The process involved:

- Loading the original gpt-oss-20b checkpoint from 🤗
- Reading and converting all model layers (embedding, decoder layers, final norm, LM head)
- Dequantizing the MXFP4-quantized weights to BF16
- Saving the ONNX model and its associated external data file
- Exporting the tokenizer and configuration files required for the GenAI runtime and Hugging Face integration

## Usage

To use this ONNX model:

1. Download the model files and tokenizer assets from this repository.
2. Load the ONNX model using [onnxruntime](https://onnxruntime.ai/) or a compatible inference engine such as onnxruntime-genai (see the inference sketch at the end of this card).

## Acknowledgements

- Original model: [openai/gpt-oss-20b](https://huggingface.co/openai/gpt-oss-20b) from 🤗
- ONNX export: [onnxruntime-genai](https://github.com/microsoft/onnxruntime-genai) by Microsoft
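## Export Command (Sketch)

For reference, a command along the following lines produces this kind of export with the onnxruntime-genai model builder. The flag names follow the builder's documented interface, but the exact options and supported precision values depend on the installed toolkit version, so treat this as an assumption rather than a verbatim record of the command used:

```bash
# Sketch of the builder invocation; flags follow the documented
# onnxruntime-genai builder interface, but check your installed release.
#   -m: source model on Hugging Face   -o: output directory
#   -p: target precision               -e: execution provider
python -m onnxruntime_genai.models.builder \
  -m openai/gpt-oss-20b \
  -o ./gpt-oss-20b-onnx \
  -p bf16 \
  -e cpu
```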
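## Example: Inference with onnxruntime-genai (Sketch)

A minimal Python sketch for loading the exported model with the onnxruntime-genai runtime. Method names follow the current Python API but can shift between releases (older versions fed input through `GeneratorParams` instead of `append_tokens`), and `./gpt-oss-20b-onnx` is a placeholder for wherever you downloaded the model directory:

```python
import onnxruntime_genai as og

# Load the exported model directory (contains the .onnx graph,
# external weight data, tokenizer assets, and genai_config.json).
model = og.Model("./gpt-oss-20b-onnx")
tokenizer = og.Tokenizer(model)

params = og.GeneratorParams(model)
params.set_search_options(max_length=256)

generator = og.Generator(model, params)
generator.append_tokens(tokenizer.encode("What is ONNX Runtime?"))

# Generate and stream-decode one token at a time until the model
# emits an end-of-sequence token or max_length is reached.
stream = tokenizer.create_stream()
while not generator.is_done():
    generator.generate_next_token()
    print(stream.decode(generator.get_next_tokens()[0]), end="", flush=True)
print()
```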