Ali93H committed (verified)
Commit 07b8764 · 1 Parent(s): 0d4f4e3

Upload folder using huggingface_hub

Files changed (4):
  1. .gitattributes +2 -0
  2. README.md +305 -7
  3. img/Intel.png +3 -0
  4. img/RTX5090.png +3 -0
.gitattributes CHANGED
@@ -51,3 +51,5 @@ Llama-3.1-8B-Instruct-Q4_K_S-3.60bpw.gguf filter=lfs diff=lfs merge=lfs -text
  Llama-3.1-8B-Instruct-Q4_K_S-3.83bpw.gguf filter=lfs diff=lfs merge=lfs -text
  Llama-3.1-8B-Instruct-Q4_K_S-4.21bpw.gguf filter=lfs diff=lfs merge=lfs -text
  Llama-3.1-8B-Instruct-Q4_K_S-4.31bpw.gguf filter=lfs diff=lfs merge=lfs -text
+ img/Intel.png filter=lfs diff=lfs merge=lfs -text
+ img/RTX5090.png filter=lfs diff=lfs merge=lfs -text
README.md CHANGED
@@ -1,13 +1,6 @@
  ---
  language:
  - en
- - de
- - fr
- - it
- - pt
- - hi
- - es
- - th
  license: llama3.1
  base_model: meta-llama/Meta-Llama-3.1-8B-Instruct
  pipeline_tag: text-generation
@@ -19,3 +12,308 @@ tags:
  - llama-3
  - byteshape
  ---
+ <style>
+ /* ByteShape Theme — Clean, compact, modern */
+ body, div, p, li, table, th, td {
+   font-family: "Lato", "Roboto", Arial, sans-serif;
+   line-height: 1.22;
+ }
+
+ /* Brand accent */
+ :root {
+   --byteshape-accent: #cccccc;
+ }
+
+ /* Headings — more space below, less above */
+ h2, h3, h4 {
+   margin-top: 16px !important;
+   margin-bottom: 14px !important;
+   font-weight: 700; /* valid font-weight values top out at 1000 */
+   border-bottom: 1px solid var(--byteshape-accent) !important; /* shorthand needs a border-style to render */
+   padding-bottom: 4px !important;
+   text-align: center !important;
+ }
+ h1 {
+   margin-top: 16px !important;
+   margin-bottom: 14px !important;
+   font-weight: 700;
+   border-bottom: 1px solid var(--byteshape-accent) !important;
+   padding-bottom: 4px !important;
+ }
+
+ /* Paragraphs — compact + justified */
+ p {
+   margin-top: 4px !important;
+   margin-bottom: 6px !important;
+   text-align: justify;
+ }
+
+ /* Lists — tighter line spacing + compact margins */
+ ul, ol {
+   margin-top: 4px !important;
+   margin-bottom: 4px !important;
+   padding-left: 20px !important;
+ }
+ li {
+   margin: 2px 0 !important;
+   line-height: 1.13 !important;
+ }
+
+ /* Tables — compact + soft ByteShape styling */
+ table {
+   margin-top: 4px !important;
+   margin-bottom: 6px !important;
+   border-collapse: collapse;
+ }
+ th {
+   padding: 6px !important;
+   border-bottom: 1px solid var(--byteshape-accent) !important;
+ }
+ td {
+   padding: 4px 6px !important;
+   border-bottom: 1px solid var(--byteshape-accent) !important;
+ }
+
+ /* Images — compact spacing */
+ img {
+   margin: 4px 0 !important;
+   border-radius: 3px;
+ }
+
+ /* Horizontal rules — HF-compatible, tightly spaced */
+ .markdown-body hr,
+ .markdown hr,
+ hr {
+   margin-top: 6px !important;
+   margin-bottom: 6px !important;
+   border: 0;
+   height: 1px;
+ }
+
+ /* Custom thin line after the first section */
+ .section-divider {
+   width: 100%;
+   border-bottom: 1px dotted #999999;
+   margin: 12px 0;
+ }
+ </style>
+
+ # Llama-3.1-8B-Instruct GGUF (ShapeLearn Quantized)
+ <p>
+ This is a GGUF-quantized version of Llama 3.1 8B Instruct produced with <b>ByteShape's ShapeLearn</b>, which learns the optimal datatype per tensor to maintain high quality even at very low bit widths (the exclusive focus of this release).
+ <br><br>
+ To learn more about ShapeLearn and to see detailed benchmarks of this model across multiple GPUs, CPUs, and even the Raspberry Pi, please visit our <a href="https://byteshape.com/blogs/Qwen3-4B-I-2507/">blog</a>.
+ <br><br>
+ If you have questions or want to share feedback, you can also reach us on <a href="https://www.reddit.com/r/ByteShape/">Reddit</a>.
+ </p>
+
+ <div class="section-divider"></div>
+
+ ## How to Pick a Model
+ <p>
+ We provide <b>CPU- and GPU-optimized variants</b> for llama.cpp:
+ </p>
+
+ <ul>
+ <li><b>CPUs:</b> KQ quantization is preferred because the GGML K-quant kernels run more efficiently on CPUs.</li>
+ <li><b>NVIDIA GPUs:</b> IQ quantization delivers higher throughput on modern architectures.</li>
+ </ul>
+
+ <p>
+ Each hardware target includes a range of models covering different size–quality tradeoffs.
+ <br><br>
+ The charts below show <b>quality vs. tokens per second</b> for each device, comparing ShapeLearn models with Unsloth baselines.
+ <br><br>
+ <b>Selection rule:</b> choose the model with the highest quality at your target throughput, or the fastest model that still meets your required quality. Once you have picked a variant, you can fetch it as shown below.
+ </p>
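+
+ As a concrete example, here is a minimal download sketch using the <code>huggingface_hub</code> Python package (one option among many; KQ-6 from the CPU table below is used as the example file):
+
+ ```python
+ # Minimal sketch, assuming `pip install huggingface_hub`.
+ # Pick any filename from the tables below; KQ-6 is shown here.
+ from huggingface_hub import hf_hub_download
+
+ model_path = hf_hub_download(
+     repo_id="byteshape/Llama-3.1-8B-Instruct-GGUF",
+     filename="Llama-3.1-8B-Instruct-Q4_K_S-3.60bpw.gguf",  # KQ-6
+ )
+ print(model_path)  # local path of the cached GGUF file
+ ```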
+
+ <div class="section-divider"></div>
+
+ ## GGUF-KQ Models: Best for CPU
+
+ ![CPU Benchmark - Intel](img/Intel.png)
+
+ <br>
+ <div align="center">
+ <b>Table sorted by inference speed (match the chart’s numbers to model IDs):</b>
+
+ <table style="border-collapse: collapse; margin-left:auto; margin-right:auto;">
+ <thead>
+ <tr style="font-weight:bold;">
+ <th style="padding:6px 10px;">Model ID</th>
+ <th style="padding:6px 10px;">Bits/Weight</th>
+ <th style="padding:6px 10px;">Model Size<br>(HF)</th>
+ <th style="padding:6px 10px;">Normalized<br>Score</th>
+ </tr>
+ </thead>
+ <tbody>
+ <tr>
+ <td><a href="https://huggingface.co/byteshape/Llama-3.1-8B-Instruct-GGUF/blob/main/Llama-3.1-8B-Instruct-Q3_K_S-2.91bpw.gguf">KQ-1</a></td>
+ <td>2.91</td>
+ <td>2.93 GB</td>
+ <td>83.03%</td>
+ </tr>
+ <tr>
+ <td><a href="https://huggingface.co/byteshape/Llama-3.1-8B-Instruct-GGUF/blob/main/Llama-3.1-8B-Instruct-Q3_K_S-3.06bpw.gguf">KQ-2</a></td>
+ <td>3.06</td>
+ <td>3.08 GB</td>
+ <td>87.68%</td>
+ </tr>
+ <tr>
+ <td><a href="https://huggingface.co/byteshape/Llama-3.1-8B-Instruct-GGUF/blob/main/Llama-3.1-8B-Instruct-Q3_K_S-3.24bpw.gguf">KQ-3</a></td>
+ <td>3.24</td>
+ <td>3.26 GB</td>
+ <td>90.10%</td>
+ </tr>
+ <tr>
+ <td><a href="https://huggingface.co/byteshape/Llama-3.1-8B-Instruct-GGUF/blob/main/Llama-3.1-8B-Instruct-Q3_K_S-3.34bpw.gguf">KQ-4</a></td>
+ <td>3.34</td>
+ <td>3.36 GB</td>
+ <td>92.40%</td>
+ </tr>
+ <tr>
+ <td><a href="https://huggingface.co/byteshape/Llama-3.1-8B-Instruct-GGUF/blob/main/Llama-3.1-8B-Instruct-Q3_K_S-3.41bpw.gguf">KQ-5</a></td>
+ <td>3.41</td>
+ <td>3.43 GB</td>
+ <td>93.20%</td>
+ </tr>
+ <tr>
+ <td><a href="https://huggingface.co/byteshape/Llama-3.1-8B-Instruct-GGUF/blob/main/Llama-3.1-8B-Instruct-Q4_K_S-3.60bpw.gguf">KQ-6</a></td>
+ <td>3.60</td>
+ <td>3.63 GB</td>
+ <td>94.85%</td>
+ </tr>
+ <tr>
+ <td><a href="https://huggingface.co/byteshape/Llama-3.1-8B-Instruct-GGUF/blob/main/Llama-3.1-8B-Instruct-Q4_K_S-3.83bpw.gguf">KQ-7</a></td>
+ <td>3.83</td>
+ <td>3.85 GB</td>
+ <td>92.89%</td>
+ </tr>
+ <tr>
+ <td><a href="https://huggingface.co/byteshape/Llama-3.1-8B-Instruct-GGUF/blob/main/Llama-3.1-8B-Instruct-Q4_K_S-4.21bpw.gguf">KQ-8</a></td>
+ <td>4.21</td>
+ <td>4.23 GB</td>
+ <td>96.15%</td>
+ </tr>
+ <tr>
+ <td><a href="https://huggingface.co/byteshape/Llama-3.1-8B-Instruct-GGUF/blob/main/Llama-3.1-8B-Instruct-Q4_K_S-4.31bpw.gguf">KQ-9</a></td>
+ <td>4.31</td>
+ <td>4.33 GB</td>
+ <td>97.94%</td>
+ </tr>
+ </tbody>
+ </table>
+ </div>
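+
+ To run one of these CPU-optimized files, a minimal sketch with the <code>llama-cpp-python</code> bindings (assumed here; the llama.cpp CLI works just as well) looks like this:
+
+ ```python
+ # Minimal CPU inference sketch, assuming `pip install llama-cpp-python`.
+ from llama_cpp import Llama
+
+ llm = Llama(
+     model_path="Llama-3.1-8B-Instruct-Q4_K_S-3.60bpw.gguf",  # KQ-6, or the path returned by hf_hub_download
+     n_ctx=4096,      # context window
+     n_gpu_layers=0,  # keep every layer on the CPU
+     n_threads=8,     # tune to your physical core count
+ )
+ out = llm.create_chat_completion(
+     messages=[{"role": "user", "content": "Summarize GGUF in one sentence."}],
+ )
+ print(out["choices"][0]["message"]["content"])
+ ```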
+
+ <div class="section-divider"></div>
+
+ ## GGUF-IQ Models: Best for GPU
+
+ ![GPU Benchmark - RTX 5090](img/RTX5090.png)
+ <br>
+ <div align="center">
+ <b>Table sorted by inference speed (match the chart’s numbers to model IDs):</b>
+
+ <table style="border-collapse: collapse; margin-left:auto; margin-right:auto;">
+ <thead>
+ <tr style="font-weight:bold;">
+ <th style="padding:6px 10px;">Model ID</th>
+ <th style="padding:6px 10px;">Bits/Weight</th>
+ <th style="padding:6px 10px;">Model Size<br>(HF)</th>
+ <th style="padding:6px 10px;">Normalized<br>Score</th>
+ </tr>
+ </thead>
+ <tbody>
+ <tr>
+ <td><a href="https://huggingface.co/byteshape/Llama-3.1-8B-Instruct-GGUF/blob/main/Llama-3.1-8B-Instruct-IQ3_S-2.54bpw.gguf">IQ-1</a></td>
+ <td>2.54</td>
+ <td>2.56 GB</td>
+ <td>68.48%</td>
+ </tr>
+ <tr>
+ <td><a href="https://huggingface.co/byteshape/Llama-3.1-8B-Instruct-GGUF/blob/main/Llama-3.1-8B-Instruct-IQ3_S-2.72bpw.gguf">IQ-2</a></td>
+ <td>2.72</td>
+ <td>2.74 GB</td>
+ <td>81.97%</td>
+ </tr>
+ <tr>
+ <td><a href="https://huggingface.co/byteshape/Llama-3.1-8B-Instruct-GGUF/blob/main/Llama-3.1-8B-Instruct-IQ3_S-2.87bpw.gguf">IQ-3</a></td>
+ <td>2.87</td>
+ <td>2.89 GB</td>
+ <td>83.63%</td>
+ </tr>
+ <tr>
+ <td><a href="https://huggingface.co/byteshape/Llama-3.1-8B-Instruct-GGUF/blob/main/Llama-3.1-8B-Instruct-IQ3_S-3.01bpw.gguf">IQ-4</a></td>
+ <td>3.01</td>
+ <td>3.03 GB</td>
+ <td>86.02%</td>
+ </tr>
+ <tr>
+ <td><a href="https://huggingface.co/byteshape/Llama-3.1-8B-Instruct-GGUF/blob/main/Llama-3.1-8B-Instruct-IQ3_S-3.09bpw.gguf">IQ-5</a></td>
+ <td>3.09</td>
+ <td>3.11 GB</td>
+ <td>87.75%</td>
+ </tr>
+ <tr>
+ <td><a href="https://huggingface.co/byteshape/Llama-3.1-8B-Instruct-GGUF/blob/main/Llama-3.1-8B-Instruct-IQ3_S-3.31bpw.gguf">IQ-6</a></td>
+ <td>3.31</td>
+ <td>3.33 GB</td>
+ <td>89.56%</td>
+ </tr>
+ <tr>
+ <td><a href="https://huggingface.co/byteshape/Llama-3.1-8B-Instruct-GGUF/blob/main/Llama-3.1-8B-Instruct-IQ4_XS-3.57bpw.gguf">IQ-7</a></td>
+ <td>3.57</td>
+ <td>3.59 GB</td>
+ <td>93.21%</td>
+ </tr>
+ <tr>
+ <td><a href="https://huggingface.co/byteshape/Llama-3.1-8B-Instruct-GGUF/blob/main/Llama-3.1-8B-Instruct-IQ4_XS-3.94bpw.gguf">IQ-8</a></td>
+ <td>3.94</td>
+ <td>3.96 GB</td>
+ <td>95.65%</td>
+ </tr>
+ <tr>
+ <td><a href="https://huggingface.co/byteshape/Llama-3.1-8B-Instruct-GGUF/blob/main/Llama-3.1-8B-Instruct-IQ4_XS-4.05bpw.gguf">IQ-9</a></td>
+ <td>4.05</td>
+ <td>4.07 GB</td>
+ <td>95.71%</td>
+ </tr>
+ </tbody>
+ </table>
+
+ </div>
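+
+ The same bindings can serve the GPU-optimized files; a minimal sketch, assuming llama-cpp-python was installed with CUDA support and IQ-8 was downloaded:
+
+ ```python
+ # Minimal GPU inference sketch: offload all layers and stream tokens.
+ from llama_cpp import Llama
+
+ llm = Llama(
+     model_path="Llama-3.1-8B-Instruct-IQ4_XS-3.94bpw.gguf",  # IQ-8
+     n_ctx=8192,
+     n_gpu_layers=-1,  # offload every layer to the GPU
+ )
+ for chunk in llm.create_chat_completion(
+     messages=[{"role": "user", "content": "Write a haiku about quantization."}],
+     stream=True,
+ ):
+     delta = chunk["choices"][0]["delta"]
+     print(delta.get("content", ""), end="", flush=True)
+ print()
+ ```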
img/Intel.png ADDED

Git LFS Details

  • SHA256: 59ca49f9abfa8402ff5e707baedb02a654aef980d4ccce0d79577208fe0fb1d8
  • Pointer size: 131 Bytes
  • Size of remote file: 751 kB
img/RTX5090.png ADDED

Git LFS Details

  • SHA256: 24b72b6c082bd2bc619e03b3c53d60b3a0781dde8c40f8d9ff1c9176877a6117
  • Pointer size: 131 Bytes
  • Size of remote file: 782 kB