Osaurus AI

Qwen 3.5 35B-A3B — JANG_4K (Mixed-Precision, 4-bit)

JANG — Jang Adaptive N-bit Grading | Mixed-Precision Quantization for Apple Silicon

Website  GitHub  PyPI  OsaurusAI


Osaurus natively supports JANG models. Download at osaurus.ai.


Model Details

Property Value
Base Model Qwen 3.5 VL 35B-A3B
Architecture MoE Transformer + Vision
Total Parameters 35B (3B active per token)
Profile JANG_4K
Avg Bits/Weight 3.98
Bit Widths Used 3, 4, 5, 8
Model Size 16.4 GB
Vision Yes
Format JANG v2 (MLX-native safetensors)

Benchmarks

200-question MMLU (20 per subject x 10 subjects). Thinking OFF (enable_thinking=False), greedy decoding (temp=0.0).

Model MMLU Size
JANG_4K (this) 77.5% 16.4 GB
MLX 4-bit 75.5% 18 GB
MLX 2-bit ~20% 10 GB

JANG_4K beats MLX 4-bit by +2 MMLU while being smaller (16.4 GB vs 18 GB). Budget-neutral bit redistribution boosts attention quality without increasing total size.

JANG_4K Profile

JANG_4K is a balanced 4-bit mixed-precision profile that provides near-original quality. Critical layers (attention, routing, embeddings) are kept at 8-bit, with expert MLP weights at 3-5 bit depending on importance scoring. Best quality-to-size ratio for most use cases.

Usage

# Requires Osaurus (https://osaurus.ai)
osaurus serve OsaurusAI/Qwen3.5-35B-A3B-JANG_4K

Requirements

  • Apple Silicon Mac with 24+ GB unified memory
  • MLX framework with Qwen 3.5 MoE support

Quantized by Osaurus AI using JANG

Downloads last month
318
Safetensors
Model size
5B params
Tensor type
U32
·
F16
·
MLX
Hardware compatibility
Log In to add your hardware

Quantized

Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support