mratsim committed · Commit 9dea919 · verified · 1 Parent(s): cdf07cc

Upload folder using huggingface_hub

README.md ADDED
@@ -0,0 +1,163 @@
+ ---
+ base_model:
+ - MarsupialAI/Monstral-123B-v2
+ datasets:
+ - neuralmagic/calibration
+ - HuggingFaceH4/ultrachat_200k
+ - nvidia/OpenCodeInstruct
+ - CSJianYang/CodeArena
+ - nvidia/OpenScienceReasoning-2
+ - MegaScience/MegaScience
+ - Gryphe/Opus-WritingPrompts
+ - ServiceNow-AI/M2Lingual
+ - anthracite-org/stheno-filtered-v1.1
+ - zerofata/Instruct-Anime
+ - zerofata/Instruct-Anime-CreativeWriting
+ - sam-paech/gutenberg3-generalfiction-scifi-fantasy-romance-adventure-dpo
+ - nvidia/OpenMathInstruct-2
+ - fka/awesome-chatgpt-prompts
+ - databricks/databricks-dolly-15k
+ - FreedomIntelligence/SocraticChat
+ - ruggsea/stanford-encyclopedia-of-philosophy_instruct
+ - mlfoundations-dev/stackexchange_philosophy
+ - theoldmandthesea/17k_business_book
+ - anthracite-org/nopm_claude_writing_fixed
+ - PJMixers/grimulkan_physical-reasoning-ShareGPT
+ - PJMixers/grimulkan_theory-of-mind-ShareGPT
+ - HuggingFaceH4/no_robots
+ - nvidia/HelpSteer
+ - garage-bAInd/Open-Platypus
+ - AquaV/US-Army-Survival-Sharegpt
+ - AquaV/Interrogation-Sharegpt
+ - AquaV/Multi-Environment-Operations-Sharegpt
+ - AquaV/Resistance-Sharegpt
+ - PocketDoc/Dans-Kinomaxx-VanillaBackrooms
+ pipeline_tag: text-generation
+ tags:
+ - text adventure
+ - roleplay
+ - rpg
+ - creative writing
+ - nvfp4
+ - vllm
+ - conversational
+ ---
+ # Monstral-123B-v2 (NVFP4 quant)
+
+ This repo contains Monstral-123B-v2 quantized to NVFP4, a 4-bit format designed for maximum performance on NVIDIA RTX 5000-series GPUs.
+
+ > ℹ️ This model is limited to the Hopper and Blackwell GPU families and will not work on RTX 3000s and RTX 4000s GPUs.
+ > On those GPUs, please use the NVFP4A16 model instead, or enable slow emulation with `export VLLM_USE_NVFP4_CT_EMULATIONS=1`.
+
+ - Original model:
+   - [MarsupialAI/Monstral-123B-v2](https://huggingface.co/MarsupialAI/Monstral-123B-v2)
+ - Fallback model for RTX 3000s and 4000s GPUs:
+   - TBD
+
+ NVFP4 writeups:
+ - https://developer.nvidia.com/blog/introducing-nvfp4-for-efficient-and-accurate-low-precision-inference/
+ - https://arxiv.org/pdf/2509.25149
+
+ ## 📥 Usage & Running Instructions
+
+ The model was tested with vLLM on 1x RTX Pro 6000.
+
+ ### Hardware
+
+ As of October 2025, this quantized model can only run on architectures with hardware FP4 support (Blackwell or later).
+ Cheaper 24GB-VRAM GPUs (RTX 5080 Super) that can run this model in pairs are expected in Q1 2026.
+
+ You may still run this model in emulation, albeit slowly, by setting `export VLLM_USE_NVFP4_CT_EMULATIONS=1`,
+ or use the alternative model [TBD].
+
+ ### Recommendations
+
+ Although the model supports 128K context, it is recommended to use at most 65K to avoid significant degradation (https://fiction.live/stories/Fiction-liveBench-Sept-29-2025/oQdzQvKHw8JyXbN87).
+
+ This model is recommended with "min-p" sampling. Min-p is available through
+ both the older Text completions API and the Chat completions API (and the newer Responses API);
+ however, most LLM frontends only expose min-p when using Text completions.
+ Alternatively, you can use `--override-generation-config "${SAMPLER_JSONCONFIG}"` to override the server-side sampler defaults (a merge of `generation_config.json` and vLLM defaults).
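+
+ As a minimal sketch, a client can still set min-p per request through vLLM's OpenAI-compatible server via `extra_body`; this assumes the server started by the script below, and the `min_p` value of 0.05 is purely illustrative:
+
+ ```python
+ # Sketch: per-request min-p through vLLM's OpenAI-compatible Chat completions API.
+ from openai import OpenAI
+
+ client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
+ response = client.chat.completions.create(
+     model="Monstral-123B-v2",               # must match --served-model-name below
+     messages=[{"role": "user", "content": "Continue the story."}],
+     temperature=1.0,
+     extra_body={"min_p": 0.05},             # vLLM-specific sampling extension
+ )
+ print(response.choices[0].message.content)
+ ```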
+
+ ### Running script
+
+ ```bash
+ # Model configuration (mandatory)
+ MODEL="mratsim/Monstral-123B-v2-NVFP4"
+ MODELNAME="Monstral-123B-v2"
+ CONTEXT_SIZE=32768
+ GPU_UTIL=0.85
+
+ # Sampling configuration (optional, if departing from `generation_config.json`)
+ # Using default vLLM values
+ SAMPLER_OVERRIDE='{"temperature": 1, "min_p": 0, "top_p": 1, "repetition_penalty": 1}'
+
+ # Prevent vLLM from using 100% CPU when idle (strongly recommended)
+ export VLLM_SLEEP_WHEN_IDLE=1
+
+ # Use the FlashInfer backend (fastest, recommended, "instant" context reprocessing)
+ export VLLM_ATTENTION_BACKEND=FLASHINFER
+
+ vllm serve "${MODEL}" \
+   --served-model-name "${MODELNAME}" \
+   --gpu-memory-utilization ${GPU_UTIL} \
+   --max-model-len "${CONTEXT_SIZE}" \
+   --override-generation-config "${SAMPLER_OVERRIDE}"
+ ```
+
+ > ℹ️ The FlashInfer backend may fail with an error similar to
+ > `Failed to allocate memory for batch_prefill_tmp_v with size XYZ and alignment 16 in AlignedAllocator`.
+ >
+ > A workaround is running a sed replacement within the vLLM installation to double the workspace buffer:
+ > ```bash
+ > sed -i 's/FLASHINFER_WORKSPACE_BUFFER_SIZE = 256 \* 1024 \* 1024/FLASHINFER_WORKSPACE_BUFFER_SIZE = 512 \* 1024 \* 1024/g' vllm/v1/attention/backends/flashinfer.py
+ > ```
+ > This will be fixed by PR https://github.com/vllm-project/vllm/pull/25344
+
+ ## 🔬 Quantization method
+
+ The llmcompressor library was used with the following recipe:
+
+ ```yaml
+ default_stage:
+   default_modifiers:
+     QuantizationModifier:
+       targets: [Linear]
+       ignore: [lm_head]
+       scheme: NVFP4
+ ```
+
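+ For reference (per the NVFP4 writeups above), each weight is stored as a 4-bit FP4 (E2M1) value $q$ within a 16-element block, rescaled by a per-block FP8 (E4M3) scale and a per-tensor FP32 scale, which matches the `group_size: 16` and `strategy: tensor_group` entries in `config.json`:
+
+ $$w \approx s_{\text{tensor}} \cdot s_{\text{block}} \cdot q$$
+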
+ Calibration used 3 samples from each of the following datasets (30 datasets, 90 samples total) at a sequence length of 8192:
+ - [neuralmagic/calibration](https://huggingface.co/datasets/neuralmagic/calibration)
+ - [HuggingFaceH4/ultrachat_200k](https://huggingface.co/datasets/HuggingFaceH4/ultrachat_200k)
+ - [nvidia/OpenCodeInstruct](https://huggingface.co/datasets/nvidia/OpenCodeInstruct)
+ - [CSJianYang/CodeArena](https://huggingface.co/datasets/CSJianYang/CodeArena)
+ - [nvidia/OpenScienceReasoning-2](https://huggingface.co/datasets/nvidia/OpenScienceReasoning-2)
+ - [MegaScience/MegaScience](https://huggingface.co/datasets/MegaScience/MegaScience)
+ - [Gryphe/Opus-WritingPrompts](https://huggingface.co/datasets/Gryphe/Opus-WritingPrompts)
+ - [ServiceNow-AI/M2Lingual](https://huggingface.co/datasets/ServiceNow-AI/M2Lingual)
+ - [anthracite-org/stheno-filtered-v1.1](https://huggingface.co/datasets/anthracite-org/stheno-filtered-v1.1)
+ - [zerofata/Instruct-Anime](https://huggingface.co/datasets/zerofata/Instruct-Anime)
+ - [zerofata/Instruct-Anime-CreativeWriting](https://huggingface.co/datasets/zerofata/Instruct-Anime-CreativeWriting)
+ - [sam-paech/gutenberg3-generalfiction-scifi-fantasy-romance-adventure-dpo](https://huggingface.co/datasets/sam-paech/gutenberg3-generalfiction-scifi-fantasy-romance-adventure-dpo)
+ - [nvidia/OpenMathInstruct-2](https://huggingface.co/datasets/nvidia/OpenMathInstruct-2)
+ - [fka/awesome-chatgpt-prompts](https://huggingface.co/datasets/fka/awesome-chatgpt-prompts)
+ - [databricks/databricks-dolly-15k](https://huggingface.co/datasets/databricks/databricks-dolly-15k)
+ - [FreedomIntelligence/SocraticChat](https://huggingface.co/datasets/FreedomIntelligence/SocraticChat)
+ - [ruggsea/stanford-encyclopedia-of-philosophy_instruct](https://huggingface.co/datasets/ruggsea/stanford-encyclopedia-of-philosophy_instruct)
+ - [mlfoundations-dev/stackexchange_philosophy](https://huggingface.co/datasets/mlfoundations-dev/stackexchange_philosophy)
+ - [theoldmandthesea/17k_business_book](https://huggingface.co/datasets/theoldmandthesea/17k_business_book)
+ - [anthracite-org/nopm_claude_writing_fixed](https://huggingface.co/datasets/anthracite-org/nopm_claude_writing_fixed)
+ - [PJMixers/grimulkan_physical-reasoning-ShareGPT](https://huggingface.co/datasets/PJMixers/grimulkan_physical-reasoning-ShareGPT)
+ - [PJMixers/grimulkan_theory-of-mind-ShareGPT](https://huggingface.co/datasets/PJMixers/grimulkan_theory-of-mind-ShareGPT)
+ - [HuggingFaceH4/no_robots](https://huggingface.co/datasets/HuggingFaceH4/no_robots)
+ - [nvidia/HelpSteer](https://huggingface.co/datasets/nvidia/HelpSteer)
+ - [garage-bAInd/Open-Platypus](https://huggingface.co/datasets/garage-bAInd/Open-Platypus)
+ - [AquaV/US-Army-Survival-Sharegpt](https://huggingface.co/datasets/AquaV/US-Army-Survival-Sharegpt)
+ - [AquaV/Interrogation-Sharegpt](https://huggingface.co/datasets/AquaV/Interrogation-Sharegpt)
+ - [AquaV/Multi-Environment-Operations-Sharegpt](https://huggingface.co/datasets/AquaV/Multi-Environment-Operations-Sharegpt)
+ - [AquaV/Resistance-Sharegpt](https://huggingface.co/datasets/AquaV/Resistance-Sharegpt)
+ - [PocketDoc/Dans-Kinomaxx-VanillaBackrooms](https://huggingface.co/datasets/PocketDoc/Dans-Kinomaxx-VanillaBackrooms)
+
+ NVFP4 quantization requires very few samples; llmcompressor uses 20 in its examples.
+ By comparison, 512 samples are recommended for GPTQ and 64 for AWQ (https://minjiazhang.github.io/courses/fall24-resource/slides/awq.pdf).
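+
+ A minimal sketch of such a calibration run, assuming llmcompressor's `oneshot` API, with a single dataset standing in for the 30-dataset mixture actually used:
+
+ ```python
+ # Sketch only: the real run mixed 3 samples from each of the 30 datasets above.
+ from datasets import load_dataset
+ from llmcompressor import oneshot
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+
+ MODEL_ID = "MarsupialAI/Monstral-123B-v2"
+ model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype="auto", device_map="auto")
+ tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
+
+ # Illustrative stand-in for the mixed calibration set (90 samples),
+ # flattened to plain text with the model's chat template.
+ ds = load_dataset("HuggingFaceH4/ultrachat_200k", split="train_sft[:90]")
+ ds = ds.map(lambda ex: {"text": tokenizer.apply_chat_template(ex["messages"], tokenize=False)})
+
+ oneshot(
+     model=model,
+     dataset=ds,
+     recipe="recipe.yaml",            # the NVFP4 recipe shown above
+     max_seq_length=8192,
+     num_calibration_samples=90,
+ )
+ ```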
chat_template.jinja ADDED
@@ -0,0 +1,87 @@
+ {%- if messages[0]["role"] == "system" %}
+ {%- set system_message = messages[0]["content"] %}
+ {%- set loop_messages = messages[1:] %}
+ {%- else %}
+ {%- set loop_messages = messages %}
+ {%- endif %}
+ {%- if not tools is defined %}
+ {%- set tools = none %}
+ {%- endif %}
+ {%- set user_messages = loop_messages | selectattr("role", "equalto", "user") | list %}
+
+ {#- This block checks for alternating user/assistant messages, skipping tool calling messages #}
+ {%- set ns = namespace() %}
+ {%- set ns.index = 0 %}
+ {%- for message in loop_messages %}
+ {%- if not (message.role == "tool" or message.role == "tool_results" or (message.tool_calls is defined and message.tool_calls is not none)) %}
+ {%- if (message["role"] == "user") != (ns.index % 2 == 0) %}
+ {{- raise_exception("After the optional system message, conversation roles must alternate user/assistant/user/assistant/...") }}
+ {%- endif %}
+ {%- set ns.index = ns.index + 1 %}
+ {%- endif %}
+ {%- endfor %}
+
+ {{- bos_token }}
+ {%- for message in loop_messages %}
+ {%- if message["role"] == "user" %}
+ {%- if tools is not none and (message == user_messages[-1]) %}
+ {{- "[AVAILABLE_TOOLS] [" }}
+ {%- for tool in tools %}
+ {%- set tool = tool.function %}
+ {{- '{"type": "function", "function": {' }}
+ {%- for key, val in tool.items() if key != "return" %}
+ {%- if val is string %}
+ {{- '"' + key + '": "' + val + '"' }}
+ {%- else %}
+ {{- '"' + key + '": ' + val|tojson }}
+ {%- endif %}
+ {%- if not loop.last %}
+ {{- ", " }}
+ {%- endif %}
+ {%- endfor %}
+ {{- "}}" }}
+ {%- if not loop.last %}
+ {{- ", " }}
+ {%- else %}
+ {{- "]" }}
+ {%- endif %}
+ {%- endfor %}
+ {{- "[/AVAILABLE_TOOLS]" }}
+ {%- endif %}
+ {%- if loop.last and system_message is defined %}
+ {{- "[INST] " + system_message + "\n\n" + message["content"] + "[/INST]" }}
+ {%- else %}
+ {{- "[INST] " + message["content"] + "[/INST]" }}
+ {%- endif %}
+ {%- elif message.tool_calls is defined and message.tool_calls is not none %}
+ {{- "[TOOL_CALLS] [" }}
+ {%- for tool_call in message.tool_calls %}
+ {%- set out = tool_call.function|tojson %}
+ {{- out[:-1] }}
+ {%- if not tool_call.id is defined or tool_call.id|length != 9 %}
+ {{- raise_exception("Tool call IDs should be alphanumeric strings with length 9!") }}
+ {%- endif %}
+ {{- ', "id": "' + tool_call.id + '"}' }}
+ {%- if not loop.last %}
+ {{- ", " }}
+ {%- else %}
+ {{- "]" + eos_token }}
+ {%- endif %}
+ {%- endfor %}
+ {%- elif message["role"] == "assistant" %}
+ {{- " " + message["content"]|trim + eos_token}}
+ {%- elif message["role"] == "tool_results" or message["role"] == "tool" %}
+ {%- if message.content is defined and message.content.content is defined %}
+ {%- set content = message.content.content %}
+ {%- else %}
+ {%- set content = message.content %}
+ {%- endif %}
+ {{- '[TOOL_RESULTS] {"content": ' + content|string + ", " }}
+ {%- if not message.tool_call_id is defined or message.tool_call_id|length != 9 %}
+ {{- raise_exception("Tool call IDs should be alphanumeric strings with length 9!") }}
+ {%- endif %}
+ {{- '"call_id": "' + message.tool_call_id + '"}[/TOOL_RESULTS]' }}
+ {%- else %}
+ {{- raise_exception("Only user and assistant roles are supported, with the exception of an initial optional system message!") }}
+ {%- endif %}
+ {%- endfor %}
config.json ADDED
@@ -0,0 +1,72 @@
+ {
+   "architectures": [
+     "MistralForCausalLM"
+   ],
+   "attention_dropout": 0.0,
+   "bos_token_id": 1,
+   "dtype": "bfloat16",
+   "eos_token_id": 2,
+   "head_dim": 128,
+   "hidden_act": "silu",
+   "hidden_size": 12288,
+   "initializer_range": 0.02,
+   "intermediate_size": 28672,
+   "max_position_embeddings": 131072,
+   "model_type": "mistral",
+   "num_attention_heads": 96,
+   "num_hidden_layers": 88,
+   "num_key_value_heads": 8,
+   "quantization_config": {
+     "config_groups": {
+       "group_0": {
+         "format": "nvfp4-pack-quantized",
+         "input_activations": {
+           "actorder": null,
+           "block_structure": null,
+           "dynamic": "local",
+           "group_size": 16,
+           "num_bits": 4,
+           "observer": "minmax",
+           "observer_kwargs": {},
+           "strategy": "tensor_group",
+           "symmetric": true,
+           "type": "float"
+         },
+         "output_activations": null,
+         "targets": [
+           "Linear"
+         ],
+         "weights": {
+           "actorder": null,
+           "block_structure": null,
+           "dynamic": false,
+           "group_size": 16,
+           "num_bits": 4,
+           "observer": "minmax",
+           "observer_kwargs": {},
+           "strategy": "tensor_group",
+           "symmetric": true,
+           "type": "float"
+         }
+       }
+     },
+     "format": "nvfp4-pack-quantized",
+     "global_compression_ratio": null,
+     "ignore": [
+       "lm_head"
+     ],
+     "kv_cache_scheme": null,
+     "quant_method": "compressed-tensors",
+     "quantization_status": "compressed",
+     "sparsity_config": {},
+     "transform_config": {},
+     "version": "0.12.2"
+   },
+   "rms_norm_eps": 1e-05,
+   "rope_theta": 1000000.0,
+   "sliding_window": null,
+   "tie_word_embeddings": false,
+   "transformers_version": "4.56.2",
+   "use_cache": true,
+   "vocab_size": 32768
+ }
generation_config.json ADDED
@@ -0,0 +1,6 @@
+ {
+   "_from_model_config": true,
+   "bos_token_id": 1,
+   "eos_token_id": 2,
+   "transformers_version": "4.56.2"
+ }
model-00001-of-00015.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:eb1bb636dd11b91a6d7094aa210b72124792104f7a1dffb5e8b62817cdc8fc3d
+ size 4882434912
model-00002-of-00015.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:c5e7bbe004812157312d617881686fb7d5a14973a5f160e3af1cbcd8903ccff9
+ size 4869903000
model-00003-of-00015.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:68ddadb0015c469d2dd31a224a81eaa67c20c11a917829cde5ffafd1335c96a7
+ size 4869903136
model-00004-of-00015.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:ca270cbd4b5264aa895d86bdc51b4d4cbcef718b1cd6594cdee1f9d8e0e22b22
+ size 4969044352
model-00005-of-00015.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:015ddc0b600fb39d6d6f0bfed405cebe71133de48a422df50bff739a5c6c0736
+ size 4954838264
model-00006-of-00015.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:512ef6751735dd70868ccfdb65a0793d5420731a47a87a208399fba927727e8f
+ size 4869903136
model-00007-of-00015.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:fc38472c190104df1f64c3c2c040145a4bc1ce6f132b46488eb1dadf20e03107
+ size 4969044352
model-00008-of-00015.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:a3fa5410e2ba4820e5ffecf4e569a31979f3b9218f00273977e813b9fb370703
+ size 4954838264
model-00009-of-00015.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:a8e77db4bd098a5cd7a12625b3bc375d305990b38021911e4afe3cae411ef395
+ size 4869903136
model-00010-of-00015.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:f8b17418eb1d683d62712baf719afa0ec9f39dedc3d52b4968bee8acb777e430
+ size 4969044352
model-00011-of-00015.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:25dc77eadc5eb6ab43f3219627889015863aaa5f2e6261ce045fbf24992aba66
+ size 4954838264
model-00012-of-00015.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:06d4fae5a2bcc22c075395bb8c66dbd919757b756e1e1dfa547a23acf34e70b8
+ size 4869903136
model-00013-of-00015.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:75da9ea77cd10b8cfc3704a5c7b7246f68dc869f21cb6ead4be92979bc08b5a6
+ size 4969044352
model-00014-of-00015.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:8608daf67c0028261bc74ebd6c4f6e08c121bb9a956359a39fcdc976685f3927
+ size 4954838264
model-00015-of-00015.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:58d340a00490def8c97844f4dd0d3074ffcb101cb18ea11b55e96ffd65e6120d
+ size 1201743176
model.safetensors.index.json ADDED
The diff for this file is too large to render. See raw diff
 
recipe.yaml ADDED
@@ -0,0 +1,6 @@
+ default_stage:
+   default_modifiers:
+     QuantizationModifier:
+       targets: [Linear]
+       ignore: [lm_head]
+       scheme: NVFP4
special_tokens_map.json ADDED
@@ -0,0 +1,23 @@
+ {
+   "bos_token": {
+     "content": "<s>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "eos_token": {
+     "content": "</s>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "unk_token": {
+     "content": "<unk>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   }
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer.model ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:59f95e28944c062244741268596badc900df86c7f5ded05088d2da22a7379e06
+ size 587583
tokenizer_config.json ADDED
The diff for this file is too large to render. See raw diff