yzhangcs C10X committed on
Commit fd1de63 · verified · 1 Parent(s): 62a8341

Fix LaTeX rendering issue on README.md (#2)


- Update README.md (5c16f1a794b908a223e0775582e8321aaa823ba9)


Co-authored-by: Ali Furkan Celik <[email protected]>

Files changed (1)
  1. README.md +1 -1
README.md CHANGED
@@ -35,7 +35,7 @@ We open-source the KDA kernel in [FLA](https://github.com/fla-org/flash-linear-a
 - **Kimi Delta Attention (KDA):** A linear attention mechanism that refines the gated delta rule with finegrained gating.
 - **Hybrid Architecture:** A 3:1 KDA-to-global MLA ratio reduces memory usage while maintaining or surpassing the quality of full attention.
 - **Superior Performance:** Outperforms full attention in a variety of tasks, including long-context and RL-style benchmarks on 1.4T token training runs with fair comparisons.
-- **High Throughput:** Achieves up to $6\times$ faster decoding and significantly reduces time per output token (TPOT).
+- **High Throughput:** Achieves up to 6&times; faster decoding and significantly reduces time per output token (TPOT).
 
 <div align="center">
 <img width="60%" src="figures/arch.png">
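
For readers unfamiliar with the delta rule mentioned in the first bullet of the diff above, here is a naive recurrent sketch of a delta-rule update with a fine-grained (per-channel) gate. Everything in it (the name `gated_delta_rule`, the shapes, the gate placement) is an illustrative assumption, not the actual KDA formulation; the production kernel is the chunked, fused implementation in FLA.

```python
import torch

def gated_delta_rule(q, k, v, beta, alpha):
    """Naive recurrent sketch of a delta-rule update with a
    fine-grained (per-key-channel) gate. Shapes: q, k: (T, d_k),
    v: (T, d_v), beta: (T,), alpha: (T, d_k). Illustrative only;
    not the actual KDA kernel."""
    T, d_k = k.shape
    d_v = v.shape[1]
    S = torch.zeros(d_v, d_k)      # recurrent state (one "memory" matrix)
    out = []
    for t in range(T):
        S = S * alpha[t]           # per-channel decay: the fine-grained gate
        # Delta-rule correction: overwrite what the state currently
        # "recalls" for key k_t with value v_t, at learning rate beta_t.
        pred = S @ k[t]
        S = S + beta[t] * torch.outer(v[t] - pred, k[t])
        out.append(S @ q[t])       # read the state out with the query
    return torch.stack(out)
```

The per-channel `alpha` is what distinguishes this from a scalar-gated update: each key dimension can forget at its own rate.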
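Similarly, the 3:1 hybrid ratio in the second bullet can be pictured as a simple layer-type schedule. The helper `make_layer_types` below is hypothetical, not part of the released model code:

```python
def make_layer_types(num_layers: int, ratio: int = 3) -> list[str]:
    """Hypothetical schedule: every (ratio + 1)-th layer is global
    MLA, the rest are KDA, giving a ratio:1 interleaving."""
    return ["mla" if (i + 1) % (ratio + 1) == 0 else "kda"
            for i in range(num_layers)]

print(make_layer_types(8))
# ['kda', 'kda', 'kda', 'mla', 'kda', 'kda', 'kda', 'mla']
```

Under this schedule only every fourth layer keeps a global attention cache, which is where the memory savings during decoding come from.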