---
license: apache-2.0
base_model:
- openai/gpt-oss-20b
library_name: transformers
---

# gpt-oss-20b ONNX model (dequantized to BF16)

This repository contains an ONNX export of [openai/gpt-oss-20b](https://huggingface.co/openai/gpt-oss-20b) from Hugging Face, generated using the official [onnxruntime-genai](https://github.com/microsoft/onnxruntime-genai) model builder. BF16 precision was chosen mainly because of limited resources on my M4 Mac mini, and secondarily because of my limited familiarity with the GenAI engineering ecosystem.

## Model Overview

- **Source Model:** [openai/gpt-oss-20b](https://huggingface.co/openai/gpt-oss-20b) from 🤗
- **Exported Format:** ONNX
- **Precision:** BF16 (dequantized from MXFP4 for GPU compatibility)
- **Layers:** 24 decoder layers, plus the embedding layer, final normalization, and language modeling (LM) head

This repository includes all supporting files: tokenizer, chat template, and configuration files.

## Generation Details

The ONNX model was generated using the `builder.py` script from the onnxruntime-genai toolkit (a sketch of the invocation appears at the end of this card). The process involved:

- Loading the original gpt-oss-20b checkpoint from 🤗
- Reading and converting all model layers (embedding, decoder layers, final norm, LM head)
- Dequantizing the MXFP4-quantized weights to BF16
- Saving the ONNX model and its associated external data file
- Exporting the tokenizer and configuration files required for the GenAI runtime and Hugging Face integration

## Usage

To use this ONNX model:

1. Download the model files and tokenizer assets from this repository.
2. Load the ONNX model using [onnxruntime](https://onnxruntime.ai/) or a compatible inference engine such as onnxruntime-genai (see the inference sketch at the end of this card).

## Acknowledgements

- Original model: [openai/gpt-oss-20b](https://huggingface.co/openai/gpt-oss-20b) from 🤗
- ONNX export: [onnxruntime-genai](https://github.com/microsoft/onnxruntime-genai) by Microsoft
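## Export Command (Sketch)

For reference, a command along the following lines produces this kind of export with the onnxruntime-genai model builder. The flag names follow the builder's documented interface, but the exact options and supported precision values depend on the installed toolkit version, so treat this as an assumption rather than a verbatim record of the command used:

```bash
# Sketch of the builder invocation; flags follow the documented
# onnxruntime-genai builder interface, but check your installed release.
#   -m: source model on Hugging Face   -o: output directory
#   -p: target precision               -e: execution provider
python -m onnxruntime_genai.models.builder \
  -m openai/gpt-oss-20b \
  -o ./gpt-oss-20b-onnx \
  -p bf16 \
  -e cpu
```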
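## Example: Inference with onnxruntime-genai (Sketch)

A minimal Python sketch for loading the exported model with the onnxruntime-genai runtime. Method names follow the current Python API but can shift between releases (older versions fed input through `GeneratorParams` instead of `append_tokens`), and `./gpt-oss-20b-onnx` is a placeholder for wherever you downloaded the model directory:

```python
import onnxruntime_genai as og

# Load the exported model directory (contains the .onnx graph,
# external weight data, tokenizer assets, and genai_config.json).
model = og.Model("./gpt-oss-20b-onnx")
tokenizer = og.Tokenizer(model)

params = og.GeneratorParams(model)
params.set_search_options(max_length=256)

generator = og.Generator(model, params)
generator.append_tokens(tokenizer.encode("What is ONNX Runtime?"))

# Generate and stream-decode one token at a time until the model
# emits an end-of-sequence token or max_length is reached.
stream = tokenizer.create_stream()
while not generator.is_done():
    generator.generate_next_token()
    print(stream.decode(generator.get_next_tokens()[0]), end="", flush=True)
print()
```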