Gemma 4 31B-it — JANG_4M (Mixed-Precision, 4-bit)

JANG — Jang Adaptive N-bit Grading | Mixed-Precision Quantization for Apple Silicon

Osaurus natively supports JANG models. Download at osaurus.ai.

Model Details

Property	Value
Base Model	`google/gemma-4-31b-it`
Architecture	Dense Transformer + Hybrid Sliding/Global Attention
Parameters	31B (29.2B weights)
Profile	JANG_4M (CRITICAL=8-bit, COMPRESS=4-bit)
Avg Bits/Weight	5.1
Model Size	18 GB
Vision	Yes (multimodal, float16 passthrough)
Context Length	128K tokens
Layers	60
Format	JANG v2 (MLX-native safetensors, instant load)

JANG_4M Bit Allocation

Tier	Components	Bits
CRITICAL	Attention (Q/K/V/O), embeddings	8
COMPRESS	MLP (gate, up, down proj), remaining weights	4

JANG protects attention at full precision while compressing MLP weights — where dense models are most tolerant of quantization. Vision encoder is preserved in float16 for full multimodal quality.

Vision Weight Verification

All 355 vision tower tensors verified present and non-zero. The 31B dense model is text+vision (no audio tower).

Component	Tensor Count	Status
Vision Tower (SigLIP)	355	All non-zero
Language Model	remaining	All non-zero

Benchmarks

200-question MMLU (20 per subject x 10 subjects). Thinking OFF (enable_thinking=False), greedy decoding (temp=0.0).

Subject	JANG_4M
Abstract Algebra	13/20
Anatomy	13/20
Astronomy	17/20
College CS	14/20
College Physics	14/20
HS Biology	19/20
HS Chemistry	15/20
HS Mathematics	9/20
Logical Fallacies	19/20
World Religions	20/20
Total	153/200 (76.5%)

Architecture Highlights

Dense transformer with 60 layers
Hybrid attention: sliding-window + full-attention layers (every 6th layer is full)
Dual head dimensions: 256 (sliding) / 512 (global)
K=V weight sharing on global attention layers
Vision encoder preserved in float16 for multimodal inference

Usage

# Requires Osaurus (https://osaurus.ai)
osaurus serve OsaurusAI/Gemma-4-31B-it-JANG_4M

Requirements

Apple Silicon Mac with 24+ GB unified memory
Osaurus or compatible MLX inference engine with Gemma 4 support

Quantized by Osaurus AI using JANG

Downloads last month: 184

Safetensors

Model size

6B params

Tensor type

U32

F16

MLX

Hardware compatibility

Quantized