moonshotai/Kimi-Linear-48B-A3B-Instruct

Text Generation

yzhangcs committed on Oct 30

Commit 62a8341 · verified · 1 Parent(s): 919416f

Update README.md

Files changed (1)

README.md +2 -3

README.md

@@ -6,11 +6,10 @@ license: mit
 </div>
 <div align="center">
-  <a href="https://github.com/MoonshotAI/Kimi-Linear/blob/master/tech_report.pdf"><img src="figures/logo.png" height="16" width="16" style="vertical-align:middle"><b> Tech Report</b></a>  |
-  <a href="https://huggingface.co/moonshotai/Kimi-Linear-48B-A3B-Instruct"><img src="https://huggingface.co/front/assets/huggingface_logo-noborder.svg" height="16" width="16" style="vertical-align:middle"><b> HuggingFace</b></a>
 </div>
 <div align="center">
   <img width="90%" src="figures/perf_speed.png">
   <p><em><b>(a)</b> On MMLU-Pro (4k context length), Kimi Linear achieves 51.0 performance with similar speed as full attention. On RULER (128k context length), it shows Pareto-optimal performance (84.3) and 3.98x speedup. <b>(b)</b> Kimi Linear achieves 6.3x faster TPOT compared to MLA, offering significant speedups at long sequence lengths (1M tokens).</em></p>

 </div>
 <div align="center">
+  <a href="https://github.com/MoonshotAI/Kimi-Linear/blob/master/tech_report.pdf" ><img src="figures/logo.png" height="16" width="16" style="display: inline-block; vertical-align: middle; margin: 2px;"><b style="display: inline-block;"> Tech Report</b></a>  |
+  <a href="https://huggingface.co/moonshotai/Kimi-Linear-48B-A3B-Instruct"><img src="https://huggingface.co/front/assets/huggingface_logo-noborder.svg" height="16" width="16" style="display: inline-block; vertical-align: middle; margin: 2px;"><b style="display: inline-block;"> HuggingFace</b></a>
 </div>
 <div align="center">
   <img width="90%" src="figures/perf_speed.png">
   <p><em><b>(a)</b> On MMLU-Pro (4k context length), Kimi Linear achieves 51.0 performance with similar speed as full attention. On RULER (128k context length), it shows Pareto-optimal performance (84.3) and 3.98x speedup. <b>(b)</b> Kimi Linear achieves 6.3x faster TPOT compared to MLA, offering significant speedups at long sequence lengths (1M tokens).</em></p>