Gemma-4-21B-A4B-it-REAP (MLX 4-bit)

This is a 4-bit (INT4) quantization of 0xSero/gemma-4-21b-a4b-it-REAP, converted for fast, memory-efficient inference on Apple Silicon using the MLX framework.

Model Highlights

  • Architecture: Gemma 4 multimodal Mixture-of-Experts, 21B total parameters with ~4B active per token (A4B).
  • Optimization: REAP (Router-weighted Expert Activation Pruning) expert pruning, applied to reduce the expert count while preserving vision-language quality.
  • Precision: 4-bit (optimized for inference speed and a low memory footprint).
  • Size: ~12 GB on disk; 21B parameters at 4 bits is ~10.5 GB, with quantization scales and any unquantized layers accounting for the rest. Recommended for Apple Silicon Macs with 16 GB+ unified memory.

Conversion Details

The model was converted locally to ensure full compatibility with the MLX ecosystem.

  • Hardware: Mac Mini (M4) with 32GB RAM.
  • Library: mlx-vlm, quantized with -q --q-bits 4 (full command below).
  • Format: Native MLX safetensors.
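
For reference, the quantization can be reproduced with a command along these lines (flags as listed above; output path and other options omitted):

python -m mlx_vlm.convert --hf-path 0xSero/gemma-4-21b-a4b-it-REAP -q --q-bits 4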

Usage

Installation

pip install mlx-vlm
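
Example

A minimal generation sketch using the standard mlx-vlm Python API. The repo id is this model's; the image path and prompt are placeholders:

```python
from mlx_vlm import load, generate
from mlx_vlm.prompt_utils import apply_chat_template
from mlx_vlm.utils import load_config

# Load the quantized weights and the matching processor/config.
model_path = "Z3NN001/gemma4-21b-a4b-REAP-it-mlx-Q4"
model, processor = load(model_path)
config = load_config(model_path)

# One image plus a text prompt (placeholders; swap in your own).
image = ["path/to/image.jpg"]
prompt = "Describe this image."

# Wrap the prompt in the model's chat template before generating.
formatted_prompt = apply_chat_template(processor, config, prompt, num_images=len(image))
output = generate(model, processor, formatted_prompt, image, verbose=False)
print(output)
```

Depending on your mlx-vlm version, generate returns either a plain string or a result object; print it to inspect the generated text.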

Credits

  • Original Model: 0xSero/gemma-4-21b-a4b-it-REAP
  • MLX Conversion: Z3NN001