Ali93H committed (verified)
Commit 07b8764 · 1 Parent(s): 0d4f4e3

Upload folder using huggingface_hub

Files changed (4):
  1. .gitattributes +2 -0
  2. README.md +305 -7
  3. img/Intel.png +3 -0
  4. img/RTX5090.png +3 -0
.gitattributes CHANGED
@@ -51,3 +51,5 @@ Llama-3.1-8B-Instruct-Q4_K_S-3.60bpw.gguf filter=lfs diff=lfs merge=lfs -text
  Llama-3.1-8B-Instruct-Q4_K_S-3.83bpw.gguf filter=lfs diff=lfs merge=lfs -text
  Llama-3.1-8B-Instruct-Q4_K_S-4.21bpw.gguf filter=lfs diff=lfs merge=lfs -text
  Llama-3.1-8B-Instruct-Q4_K_S-4.31bpw.gguf filter=lfs diff=lfs merge=lfs -text
+ img/Intel.png filter=lfs diff=lfs merge=lfs -text
+ img/RTX5090.png filter=lfs diff=lfs merge=lfs -text
README.md CHANGED
@@ -1,13 +1,6 @@
  ---
  language:
  - en
- - de
- - fr
- - it
- - pt
- - hi
- - es
- - th
  license: llama3.1
  base_model: meta-llama/Meta-Llama-3.1-8B-Instruct
  pipeline_tag: text-generation
@@ -19,3 +12,308 @@ tags:
  - llama-3
  - byteshape
  ---
+ <style>
+ /* ByteShape Theme — Clean, compact, modern */
+ body, div, p, li, table, th, td {
+   font-family: "Lato", "Roboto", Arial, sans-serif;
+   line-height: 1.22;
+ }
+
+ /* Brand accent */
+ :root {
+   --byteshape-accent: #cccccc;
+ }
+
+ /* Headings — more space below, less above */
+ h2, h3, h4 {
+   margin-top: 16px !important;
+   margin-bottom: 14px !important;
+   font-weight: 700; /* valid font-weight values top out at 1000 */
+   border-bottom: 1px solid var(--byteshape-accent) !important; /* shorthand needs a border-style to render */
+   padding-bottom: 4px !important;
+   text-align: center !important;
+ }
+ h1 {
+   margin-top: 16px !important;
+   margin-bottom: 14px !important;
+   font-weight: 700;
+   border-bottom: 1px solid var(--byteshape-accent) !important;
+   padding-bottom: 4px !important;
+ }
+
+ /* Paragraphs — compact + justified */
+ p {
+   margin-top: 4px !important;
+   margin-bottom: 6px !important;
+   text-align: justify;
+ }
+
+ /* Lists — tighter line spacing + compact margins */
+ ul, ol {
+   margin-top: 4px !important;
+   margin-bottom: 4px !important;
+   padding-left: 20px !important;
+ }
+ li {
+   margin: 2px 0 !important;
+   line-height: 1.13 !important;
+ }
+
+ /* Tables — compact + soft ByteShape styling */
+ table {
+   margin-top: 4px !important;
+   margin-bottom: 6px !important;
+   border-collapse: collapse;
+ }
+ th {
+   padding: 6px !important;
+   border-bottom: 1px solid var(--byteshape-accent) !important;
+ }
+ td {
+   padding: 4px 6px !important;
+   border-bottom: 1px solid var(--byteshape-accent) !important;
+ }
+
+ /* Images — compact spacing */
+ img {
+   margin: 4px 0 !important;
+   border-radius: 3px;
+ }
+
+ /* Horizontal rules — HF-compatible, tightly spaced */
+ .markdown-body hr,
+ .markdown hr,
+ hr {
+   margin-top: 6px !important;
+   margin-bottom: 6px !important;
+   border: 0;
+   height: 1px;
+ }
+
+ /* Custom thin line after the first section */
+ .section-divider {
+   width: 100%;
+   border-bottom: 1px dotted #999999;
+   margin: 12px 0;
+ }
+ </style>
+
+ # Llama-3.1-8B-Instruct GGUF (ShapeLearn Quantized)
+ <p>
+ This is a GGUF-quantized version of Llama 3.1 8B Instruct produced with <b>ByteShape's ShapeLearn</b>, which learns the optimal datatype per tensor to maintain high quality even at very low bit widths (the exclusive focus of this release).
+ <br><br>
+ To learn more about ShapeLearn and to see detailed benchmarks of this model across multiple GPUs, CPUs, and even the Raspberry Pi, please visit our <a href="https://byteshape.com/blogs/Qwen3-4B-I-2507/">blog</a>.
+ <br><br>
+ If you have questions or want to share feedback, you can also reach us on <a href="https://www.reddit.com/r/ByteShape/">Reddit</a>.
+ </p>
+
+ <div class="section-divider"></div>
+
+ ## How to Pick a Model
+ <p>
+ We provide <b>CPU- and GPU-optimized variants</b> for llama.cpp:
+ </p>
+
+ <ul>
+ <li><b>CPUs:</b> KQ quantization is preferred because the GGML K-quant kernels run more efficiently on CPUs.</li>
+ <li><b>NVIDIA GPUs:</b> IQ quantization delivers higher throughput on modern architectures.</li>
+ </ul>
+
+ <p>
+ Each hardware target includes a range of models covering different size–quality tradeoffs.
+ <br><br>
+ The charts below show <b>quality vs. tokens per second</b> for each device, comparing ShapeLearn models with Unsloth baselines.
+ <br><br>
+ <b>Selection rule:</b> choose the model with the highest quality at your target throughput, or the fastest model that still meets your required quality. Once you have picked a variant, you can fetch it as shown below.
+ </p>
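+
+ As a concrete example, here is a minimal download sketch using the <code>huggingface_hub</code> Python package (one option among many; KQ-6 from the CPU table below is used as the example file):
+
+ ```python
+ # Minimal sketch, assuming `pip install huggingface_hub`.
+ # Pick any filename from the tables below; KQ-6 is shown here.
+ from huggingface_hub import hf_hub_download
+
+ model_path = hf_hub_download(
+     repo_id="byteshape/Llama-3.1-8B-Instruct-GGUF",
+     filename="Llama-3.1-8B-Instruct-Q4_K_S-3.60bpw.gguf",  # KQ-6
+ )
+ print(model_path)  # local path of the cached GGUF file
+ ```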
+
+ <div class="section-divider"></div>
+
+ ## GGUF-KQ Models: Best for CPU
+
+ ![CPU Benchmark - Intel](img/Intel.png)
+
+ <br>
+ <div align="center">
+ <b>Table sorted by inference speed (match the chart’s numbers to model IDs):</b>
+
+ <table style="border-collapse: collapse; margin-left:auto; margin-right:auto;">
+ <thead>
+ <tr style="font-weight:bold;">
+ <th style="padding:6px 10px;">Model ID</th>
+ <th style="padding:6px 10px;">Bits/Weight</th>
+ <th style="padding:6px 10px;">Model Size<br>(HF)</th>
+ <th style="padding:6px 10px;">Normalized<br>Score</th>
+ </tr>
+ </thead>
+ <tbody>
+ <tr>
+ <td><a href="https://huggingface.co/byteshape/Llama-3.1-8B-Instruct-GGUF/blob/main/Llama-3.1-8B-Instruct-Q3_K_S-2.91bpw.gguf">KQ-1</a></td>
+ <td>2.91</td>
+ <td>2.93 GB</td>
+ <td>83.03%</td>
+ </tr>
+ <tr>
+ <td><a href="https://huggingface.co/byteshape/Llama-3.1-8B-Instruct-GGUF/blob/main/Llama-3.1-8B-Instruct-Q3_K_S-3.06bpw.gguf">KQ-2</a></td>
+ <td>3.06</td>
+ <td>3.08 GB</td>
+ <td>87.68%</td>
+ </tr>
+ <tr>
+ <td><a href="https://huggingface.co/byteshape/Llama-3.1-8B-Instruct-GGUF/blob/main/Llama-3.1-8B-Instruct-Q3_K_S-3.24bpw.gguf">KQ-3</a></td>
+ <td>3.24</td>
+ <td>3.26 GB</td>
+ <td>90.10%</td>
+ </tr>
+ <tr>
+ <td><a href="https://huggingface.co/byteshape/Llama-3.1-8B-Instruct-GGUF/blob/main/Llama-3.1-8B-Instruct-Q3_K_S-3.34bpw.gguf">KQ-4</a></td>
+ <td>3.34</td>
+ <td>3.36 GB</td>
+ <td>92.40%</td>
+ </tr>
+ <tr>
+ <td><a href="https://huggingface.co/byteshape/Llama-3.1-8B-Instruct-GGUF/blob/main/Llama-3.1-8B-Instruct-Q3_K_S-3.41bpw.gguf">KQ-5</a></td>
+ <td>3.41</td>
+ <td>3.43 GB</td>
+ <td>93.20%</td>
+ </tr>
+ <tr>
+ <td><a href="https://huggingface.co/byteshape/Llama-3.1-8B-Instruct-GGUF/blob/main/Llama-3.1-8B-Instruct-Q4_K_S-3.60bpw.gguf">KQ-6</a></td>
+ <td>3.60</td>
+ <td>3.63 GB</td>
+ <td>94.85%</td>
+ </tr>
+ <tr>
+ <td><a href="https://huggingface.co/byteshape/Llama-3.1-8B-Instruct-GGUF/blob/main/Llama-3.1-8B-Instruct-Q4_K_S-3.83bpw.gguf">KQ-7</a></td>
+ <td>3.83</td>
+ <td>3.85 GB</td>
+ <td>92.89%</td>
+ </tr>
+ <tr>
+ <td><a href="https://huggingface.co/byteshape/Llama-3.1-8B-Instruct-GGUF/blob/main/Llama-3.1-8B-Instruct-Q4_K_S-4.21bpw.gguf">KQ-8</a></td>
+ <td>4.21</td>
+ <td>4.23 GB</td>
+ <td>96.15%</td>
+ </tr>
+ <tr>
+ <td><a href="https://huggingface.co/byteshape/Llama-3.1-8B-Instruct-GGUF/blob/main/Llama-3.1-8B-Instruct-Q4_K_S-4.31bpw.gguf">KQ-9</a></td>
+ <td>4.31</td>
+ <td>4.33 GB</td>
+ <td>97.94%</td>
+ </tr>
+ </tbody>
+ </table>
+ </div>
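+
+ To run one of these CPU-optimized files, a minimal sketch with the <code>llama-cpp-python</code> bindings (assumed here; the llama.cpp CLI works just as well) looks like this:
+
+ ```python
+ # Minimal CPU inference sketch, assuming `pip install llama-cpp-python`.
+ from llama_cpp import Llama
+
+ llm = Llama(
+     model_path="Llama-3.1-8B-Instruct-Q4_K_S-3.60bpw.gguf",  # KQ-6, or the path returned by hf_hub_download
+     n_ctx=4096,      # context window
+     n_gpu_layers=0,  # keep every layer on the CPU
+     n_threads=8,     # tune to your physical core count
+ )
+ out = llm.create_chat_completion(
+     messages=[{"role": "user", "content": "Summarize GGUF in one sentence."}],
+ )
+ print(out["choices"][0]["message"]["content"])
+ ```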
+
+ <div class="section-divider"></div>
+
+ ## GGUF-IQ Models: Best for GPU
+
+ ![GPU Benchmark - RTX 5090](img/RTX5090.png)
+ <br>
+ <div align="center">
+ <b>Table sorted by inference speed (match the chart’s numbers to model IDs):</b>
+
+ <table style="border-collapse: collapse; margin-left:auto; margin-right:auto;">
+ <thead>
+ <tr style="font-weight:bold;">
+ <th style="padding:6px 10px;">Model ID</th>
+ <th style="padding:6px 10px;">Bits/Weight</th>
+ <th style="padding:6px 10px;">Model Size<br>(HF)</th>
+ <th style="padding:6px 10px;">Normalized<br>Score</th>
+ </tr>
+ </thead>
+ <tbody>
+ <tr>
+ <td><a href="https://huggingface.co/byteshape/Llama-3.1-8B-Instruct-GGUF/blob/main/Llama-3.1-8B-Instruct-IQ3_S-2.54bpw.gguf">IQ-1</a></td>
+ <td>2.54</td>
+ <td>2.56 GB</td>
+ <td>68.48%</td>
+ </tr>
+ <tr>
+ <td><a href="https://huggingface.co/byteshape/Llama-3.1-8B-Instruct-GGUF/blob/main/Llama-3.1-8B-Instruct-IQ3_S-2.72bpw.gguf">IQ-2</a></td>
+ <td>2.72</td>
+ <td>2.74 GB</td>
+ <td>81.97%</td>
+ </tr>
+ <tr>
+ <td><a href="https://huggingface.co/byteshape/Llama-3.1-8B-Instruct-GGUF/blob/main/Llama-3.1-8B-Instruct-IQ3_S-2.87bpw.gguf">IQ-3</a></td>
+ <td>2.87</td>
+ <td>2.89 GB</td>
+ <td>83.63%</td>
+ </tr>
+ <tr>
+ <td><a href="https://huggingface.co/byteshape/Llama-3.1-8B-Instruct-GGUF/blob/main/Llama-3.1-8B-Instruct-IQ3_S-3.01bpw.gguf">IQ-4</a></td>
+ <td>3.01</td>
+ <td>3.03 GB</td>
+ <td>86.02%</td>
+ </tr>
+ <tr>
+ <td><a href="https://huggingface.co/byteshape/Llama-3.1-8B-Instruct-GGUF/blob/main/Llama-3.1-8B-Instruct-IQ3_S-3.09bpw.gguf">IQ-5</a></td>
+ <td>3.09</td>
+ <td>3.11 GB</td>
+ <td>87.75%</td>
+ </tr>
+ <tr>
+ <td><a href="https://huggingface.co/byteshape/Llama-3.1-8B-Instruct-GGUF/blob/main/Llama-3.1-8B-Instruct-IQ3_S-3.31bpw.gguf">IQ-6</a></td>
+ <td>3.31</td>
+ <td>3.33 GB</td>
+ <td>89.56%</td>
+ </tr>
+ <tr>
+ <td><a href="https://huggingface.co/byteshape/Llama-3.1-8B-Instruct-GGUF/blob/main/Llama-3.1-8B-Instruct-IQ4_XS-3.57bpw.gguf">IQ-7</a></td>
+ <td>3.57</td>
+ <td>3.59 GB</td>
+ <td>93.21%</td>
+ </tr>
+ <tr>
+ <td><a href="https://huggingface.co/byteshape/Llama-3.1-8B-Instruct-GGUF/blob/main/Llama-3.1-8B-Instruct-IQ4_XS-3.94bpw.gguf">IQ-8</a></td>
+ <td>3.94</td>
+ <td>3.96 GB</td>
+ <td>95.65%</td>
+ </tr>
+ <tr>
+ <td><a href="https://huggingface.co/byteshape/Llama-3.1-8B-Instruct-GGUF/blob/main/Llama-3.1-8B-Instruct-IQ4_XS-4.05bpw.gguf">IQ-9</a></td>
+ <td>4.05</td>
+ <td>4.07 GB</td>
+ <td>95.71%</td>
+ </tr>
+ </tbody>
+ </table>
+
+ </div>
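+
+ The same bindings can serve the GPU-optimized files; a minimal sketch, assuming llama-cpp-python was installed with CUDA support and IQ-8 was downloaded:
+
+ ```python
+ # Minimal GPU inference sketch: offload all layers and stream tokens.
+ from llama_cpp import Llama
+
+ llm = Llama(
+     model_path="Llama-3.1-8B-Instruct-IQ4_XS-3.94bpw.gguf",  # IQ-8
+     n_ctx=8192,
+     n_gpu_layers=-1,  # offload every layer to the GPU
+ )
+ for chunk in llm.create_chat_completion(
+     messages=[{"role": "user", "content": "Write a haiku about quantization."}],
+     stream=True,
+ ):
+     delta = chunk["choices"][0]["delta"]
+     print(delta.get("content", ""), end="", flush=True)
+ print()
+ ```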
img/Intel.png ADDED

Git LFS Details

  • SHA256: 59ca49f9abfa8402ff5e707baedb02a654aef980d4ccce0d79577208fe0fb1d8
  • Pointer size: 131 Bytes
  • Size of remote file: 751 kB
img/RTX5090.png ADDED

Git LFS Details

  • SHA256: 24b72b6c082bd2bc619e03b3c53d60b3a0781dde8c40f8d9ff1c9176877a6117
  • Pointer size: 131 Bytes
  • Size of remote file: 782 kB