MiniMax-M2.5-REAP-139B-A10B-GGUF

This is the REAP model in practical pants: high-quality GGUF quants for local inference without setting your workstation on fire.

Built from:

  • Base: MiniMaxAI/MiniMax-M2.5
  • REAP source: tomngdev/MiniMax-M2.5-REAP-139B-A10B-GGUF (BF16 split)
  • Quantized locally with llama.cpp on Strix Halo + high RAM mode.

Available Quants

Quant    Status     Size (GiB)  Notes
Q8_0     uploaded   137.78      Highest quality quant in this pack
Q5_K_M   uploading  92.33       Better quality/size balance
Q4_K_M   uploaded   78.83       Strong practical default
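
For a rough sense of what these sizes mean, the sketch below converts a file size in GiB into approximate bits per weight. It assumes 139 billion parameters (taken from the model name) and treats the whole file as weights, so metadata and the Q8_0 embedding/output tensors are folded into the average. The `bpw` helper is a hypothetical name, not part of llama.cpp.

```shell
# Approximate bits per weight: (size in GiB -> bits) / parameter count.
# Assumption: 139 billion parameters, read off the model name.
bpw() {
  size_gib=$1   # file size from the table, in GiB
  params_b=$2   # parameter count, in billions
  awk -v s="$size_gib" -v p="$params_b" \
    'BEGIN { printf "%.2f\n", s * 1024 * 1024 * 1024 * 8 / (p * 1000000000) }'
}

bpw 137.78 139   # Q8_0   -> 8.51
bpw 92.33 139    # Q5_K_M -> 5.71
bpw 78.83 139    # Q4_K_M -> 4.87
```

These are whole-file averages, so treat them as ballpark figures rather than exact per-tensor bit widths.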

File Layout

All quants are split GGUF sets (00001-of-00007 etc.) for safer handling of very large models.
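
Because a split set is only usable when every shard is present, a quick completeness check after downloading can save a failed load. The sketch below is a minimal POSIX sh helper (`check_shards` is a hypothetical name) that assumes the `-00001-of-NNNNN.gguf` naming shown above and looks for each sibling next to the first shard.

```shell
# Check that every shard of a split GGUF set sits next to the first one.
# Returns 0 if complete, 1 if any shard is missing, 2 on a bad filename.
check_shards() {
  first=$1
  case "$first" in
    *-00001-of-[0-9][0-9][0-9][0-9][0-9].gguf) ;;
    *) echo "not a first-shard filename: $first" >&2; return 2 ;;
  esac
  tail=${first##*-of-}          # e.g. 00007.gguf
  total=${tail%.gguf}           # e.g. 00007
  prefix=${first%-00001-of-*}   # everything before the shard counter
  count=$(printf '%s' "$total" | sed 's/^0*//')   # strip leading zeros
  i=1
  missing=0
  while [ "$i" -le "${count:-0}" ]; do
    shard=$(printf '%s-%05d-of-%s.gguf' "$prefix" "$i" "$total")
    [ -f "$shard" ] || { echo "missing: $shard" >&2; missing=1; }
    i=$((i + 1))
  done
  return "$missing"
}
```

Call it with the first shard, for example `check_shards MiniMax-M2.5-REAP-Q4_K_M-00001-of-00007.gguf`. It only checks that the files exist, not that they downloaded intact.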

Quality Notes

  • These are generated directly from the BF16 REAP GGUF, not requantized from a lower-precision quant.
  • Token embedding and output tensors are kept at Q8_0 during quantization for quality retention.
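
The notes above map onto llama.cpp's `llama-quantize` tool roughly as sketched below. This is not the exact command used for this pack: the wrapper name `quantize_reap`, the `LLAMA_QUANTIZE` variable, and any input/output paths are assumptions for illustration. The two tensor-type flags are the ones `llama-quantize` provides for overriding the embedding and output tensor types.

```shell
# Hypothetical wrapper around llama-quantize; set LLAMA_QUANTIZE to use a specific build.
LLAMA_QUANTIZE=${LLAMA_QUANTIZE:-llama-quantize}

quantize_reap() {
  src=$1     # BF16 input GGUF (first shard if split); path is up to you
  dst=$2     # output filename for the quantized model
  qtype=$3   # target type, e.g. Q4_K_M, Q5_K_M, Q8_0
  # Keep token embeddings and the output tensor at Q8_0, as described above.
  "$LLAMA_QUANTIZE" \
    --token-embedding-type q8_0 \
    --output-tensor-type q8_0 \
    "$src" "$dst" "$qtype"
}
```

Holding these two tensors at Q8_0 costs a small amount of file size compared with letting the quant type choose for them.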

Usage

Point llama.cpp at the first shard of the quant you want; it discovers the sibling shards automatically:

llama-cli -m MiniMax-M2.5-REAP-Q4_K_M-00001-of-00007.gguf -ngl 0 -c 8192
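
If the shards are not on disk yet, one way to fetch a single quant is the Hugging Face CLI. The sketch below assumes `huggingface-cli` is installed and that every shard of a quant has the quant name in its filename, as in the example above. `fetch_quant` and `HF_CLI` are hypothetical names.

```shell
# Hypothetical helper: download only the shards of one quant from this repo.
HF_CLI=${HF_CLI:-huggingface-cli}

fetch_quant() {
  quant=$1       # e.g. Q4_K_M
  dest=${2:-.}   # local directory for the shards (defaults to the current one)
  "$HF_CLI" download BennyDaBall/MiniMax-M2.5-REAP-139B-A10B-GGUF \
    --include "*${quant}*" \
    --local-dir "$dest"
}
```

For example, `fetch_quant Q4_K_M models` would place the Q4_K_M shards under `models/`, ready for the `llama-cli` command above.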

Credits

  • MiniMaxAI for MiniMax-M2.5
  • tomngdev for the BF16 REAP GGUF release
  • BennyDaBall for this quant pack

Disclaimer

You are responsible for your own use, outputs, and compliance with applicable laws and platform policies.
