Gemma-4-21B-A4B-it-REAP (MLX 4-bit)

This is a 4-bit (INT4) quantization of 0xSero/gemma-4-21b-a4b-it-REAP, converted for fast, memory-efficient inference on Apple Silicon using the MLX framework.

Model Highlights

  • Architecture: Gemma 4 multimodal Mixture-of-Experts, 21B total parameters with ~4B active per token (A4B).
  • Optimization: REAP (Router-weighted Expert Activation Pruning) expert pruning, applied to reduce the expert count while preserving vision-language quality.
  • Precision: 4-bit (optimized for inference speed and a low memory footprint).
  • Size: ~12 GB on disk; 21B parameters at 4 bits is ~10.5 GB, with quantization scales and any unquantized layers accounting for the rest. Recommended for Apple Silicon Macs with 16 GB+ unified memory.

Conversion Details

The model was converted locally to ensure full compatibility with the MLX ecosystem.

  • Hardware: Mac Mini (M4) with 32GB RAM.
  • Library: mlx-vlm, quantized with -q --q-bits 4 (full command below).
  • Format: Native MLX safetensors.
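
For reference, the quantization can be reproduced with a command along these lines (flags as listed above; output path and other options omitted):

python -m mlx_vlm.convert --hf-path 0xSero/gemma-4-21b-a4b-it-REAP -q --q-bits 4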

Usage

Installation

pip install mlx-vlm
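
Example

A minimal generation sketch using the standard mlx-vlm Python API. The repo id is this model's; the image path and prompt are placeholders:

```python
from mlx_vlm import load, generate
from mlx_vlm.prompt_utils import apply_chat_template
from mlx_vlm.utils import load_config

# Load the quantized weights and the matching processor/config.
model_path = "Z3NN001/gemma4-21b-a4b-REAP-it-mlx-Q4"
model, processor = load(model_path)
config = load_config(model_path)

# One image plus a text prompt (placeholders; swap in your own).
image = ["path/to/image.jpg"]
prompt = "Describe this image."

# Wrap the prompt in the model's chat template before generating.
formatted_prompt = apply_chat_template(processor, config, prompt, num_images=len(image))
output = generate(model, processor, formatted_prompt, image, verbose=False)
print(output)
```

Depending on your mlx-vlm version, generate returns either a plain string or a result object; print it to inspect the generated text.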

Credits

  • Original Model: 0xSero/gemma-4-21b-a4b-it-REAP
  • MLX Conversion: Z3NN001