Update README.md
README.md CHANGED

@@ -9,11 +9,11 @@ We propose mmMamba, the first decoder-only multimodal state space model achieved
 Distilled from the decoder-only HoVLE-2.6B, our pure Mamba-2-based mmMamba-linear achieves performance competitive with existing linear and quadratic-complexity VLMs, including those with 2x larger parameter size like EVE-7B. The hybrid variant, mmMamba-hybrid, further enhances performance across all benchmarks, approaching the capabilities of the teacher model HoVLE. In long-context scenarios with 103K tokens, mmMamba-linear demonstrates remarkable efficiency gains with a 20.6× speedup and 75.8% GPU memory reduction compared to HoVLE, while mmMamba-hybrid achieves a 13.5× speedup and 60.2% memory savings.
 
 <div align="center">
-<img src="
+<img src="teaser.png" />
 
 
 <b>Seeding strategy and three-stage distillation pipeline of mmMamba.</b>
-<img src="
+<img src="pipeline.png" />
 </div>
 
 ## Quick Start Guide for mmMamba Inference
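For orientation alongside the Quick Start heading in the hunk above, a minimal inference sketch follows. It assumes the distilled checkpoints are published as Hugging Face repos with custom modeling code and a HoVLE-style `chat()` helper; the repo ID `hustvl/mmMamba-linear`, the `chat()` signature, and the generation settings are assumptions, not taken from this commit.

```python
# Hedged sketch: load a distilled mmMamba checkpoint and run a text-only query.
# The repo ID and the .chat() helper are assumptions (HoVLE-style remote code),
# not something this commit specifies.
import torch
from transformers import AutoModel, AutoTokenizer

path = "hustvl/mmMamba-linear"  # hypothetical Hugging Face repo ID
tokenizer = AutoTokenizer.from_pretrained(path, trust_remote_code=True)
model = AutoModel.from_pretrained(
    path,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
).cuda().eval()

# Image inputs would be passed as preprocessed pixel_values; the exact
# preprocessing depends on the released modeling code, so this stays text-only.
response = model.chat(
    tokenizer,
    pixel_values=None,
    question="Describe what mmMamba is in one sentence.",
    generation_config=dict(max_new_tokens=64),
)
print(response)
```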