FineVine Qwen3.5-35B-A3B Vision MLX V18 (6bit)

FineVine Qwen3.5-35B-A3B V18 is a fine-tuned version of Qwen/Qwen3.5-35B-A3B optimized for efficient local inference, especially on Apple silicon laptops, by reducing thinking-token usage. Significant effort was also made to reduce looping behavior.

This repository contains the 6-bit MLX variant of FineVine Qwen3.5-35B-A3B Vision V18.

This release focuses on:

  • significantly reduced looping, particularly during function calls and reasoning
  • significant reduction in the tokens needed to reach an answer
  • better prompt adherence on structured tasks
  • better handling of contradictions, underdetermined prompts, and no-solution cases

This repo is organized as a multi-quant MLX release so you can choose the quantization that fits your hardware. It is ideal for Macs with less than 50 GB of memory.

Base Model

This model is based on:

  • Qwen/Qwen3.5-35B-A3B

What Improved

Compared with the base model, this release was tuned to improve several behavior problems at once:

  • Reduced overthinking on easy and medium prompts
  • Much less looping and repeated branch restarts on quantized versions
  • Better prompt adherence on structured coding and reasoning tasks
  • Better balance between concise answers and still giving enough detail when helpful
  • Better formatting for practical answers, including selective use of lists, tables, and diagrams
  • Better handling of valid logic, contradictions, and insufficient-information cases
  • Better behavior on harder coding prompts with more branch coverage before writing code
  • Strong tool-calling behavior on structured tasks and tool-chains
  • More stable local assistant behavior overall, making it a strong fit when you need a local model that answers quickly

In practice, the model tends to answer more directly, with less wasted reasoning, while still remaining capable on harder structured tasks.

Training Data

Training data sources included:

  • custom hand-written and hand-edited datasets (60%)
  • manually reviewed synthetic traces (20%)
  • high-quality examples and reasoning traces derived from models including Claude Opus and GLM-5 (20%)

Quantizations

This release includes three MLX-VLM quantization options:

  • mlx-4bit
  • mlx-6bit
  • mlx-8bit
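As a rough rule of thumb for choosing between these options (an approximation, not an official sizing table), the weight-only footprint of a quantized model is about parameter count × bits per weight ÷ 8, before KV cache and runtime overhead. A quick sketch for a 35B-parameter model:

```python
def approx_weight_gb(n_params_billion: float, bits: int) -> float:
    """Rough weight-only footprint in decimal GB: params * bits / 8.

    Ignores mixed-precision layers, KV cache, and runtime overhead,
    so treat the result as a lower bound on real memory use.
    """
    bytes_total = n_params_billion * 1e9 * bits / 8
    return bytes_total / 1e9


for bits in (4, 6, 8):
    print(f"mlx-{bits}bit: ~{approx_weight_gb(35, bits):.1f} GB of weights")
```

By this estimate the 6-bit variant needs roughly 26 GB for weights alone, which is why the release fits comfortably within the sub-50 GB guideline above.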

Repository Layout

FineVine-Qwen3.5-35B-A3B-Vision-MLX-V18/
  README.md
  mlx-4bit/
  mlx-6bit/
  mlx-8bit/

Intended Use

This model is meant for:

  • local laptop use
  • direct-answer assistant tasks
  • strong tool-calling and structured tool-use workflows for tools like Osaurus and OpenCode
  • coding help
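Tool-calling clients such as Osaurus and OpenCode generally exchange OpenAI-style tool-call JSON. The dispatcher below is a hypothetical minimal sketch of how a host application might route a tool call emitted by the model; the `get_weather` tool and its behavior are illustrative only, not part of this release:

```python
import json

# Hypothetical tool registry; get_weather is illustrative only.
TOOLS = {
    "get_weather": lambda city: f"Sunny in {city}",
}


def dispatch(tool_call_json: str) -> str:
    """Route an OpenAI-style tool call {"name": ..., "arguments": ...}
    to the matching local function and return its result."""
    call = json.loads(tool_call_json)
    fn = TOOLS[call["name"]]
    args = call["arguments"]
    if isinstance(args, str):  # some clients send arguments as a JSON string
        args = json.loads(args)
    return fn(**args)


print(dispatch('{"name": "get_weather", "arguments": {"city": "Lisbon"}}'))
```

In a real loop, the returned string would be appended to the conversation as a tool-result message before asking the model to continue.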

Important Note

This model is not ideal as a general chatbot or roleplaying model.

If you want a warmer, more open-ended, highly conversational chat style, or strong roleplay behavior, this release is probably not the best fit.

Suggested model parameters:

  • temperature = 0.6
  • top_p = 0.95
  • top_k = 20
  • min_p = 0.0
  • presence_penalty = 0.0
  • repetition_penalty = 1.0
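To make these parameters concrete, the sketch below shows how temperature, top_k, top_p, and min_p typically filter a next-token distribution. It is a simplified pure-Python illustration, not the exact sampler any particular runtime uses:

```python
import math
import random


def sample_next(logits, temperature=0.6, top_p=0.95, top_k=20, min_p=0.0, rng=None):
    """Filter a logit list with top-k, top-p, and min-p, then sample.

    Simplified: real samplers operate on tensors and also apply
    presence/repetition penalties before this step.
    """
    rng = rng or random.Random(0)
    # Temperature scaling, then softmax.
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    probs = [math.exp(l - m) for l in scaled]
    total = sum(probs)
    probs = [p / total for p in probs]

    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep, cum = [], 0.0
    for i in order[:top_k]:  # top-k: consider at most k candidates
        if probs[i] < min_p * probs[order[0]]:
            break  # min-p: drop tokens far below the best candidate
        keep.append(i)
        cum += probs[i]
        if cum >= top_p:  # top-p: stop once enough mass is covered
            break

    # Renormalize over the surviving candidates and sample one.
    z = sum(probs[i] for i in keep)
    r, acc = rng.random() * z, 0.0
    for i in keep:
        acc += probs[i]
        if acc >= r:
            return i
    return keep[-1]
```

With the suggested defaults (min_p = 0.0, penalties disabled), only temperature, top_k, and top_p shape the distribution, which matches the model's tuned preference for direct, low-repetition answers.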

LM Studio Note

If you want to turn off visible thinking in LM Studio, use:

{%- set enable_thinking = false %}
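If your client shows raw output instead, Qwen-style reasoning usually arrives wrapped in `<think>...</think>` tags (tag name assumed from the Qwen3 convention; verify against this model's chat template). A small post-processing helper can hide it:

```python
import re

# Assumes the Qwen3-style <think>...</think> convention for reasoning spans.
THINK_RE = re.compile(r"<think>.*?</think>\s*", re.DOTALL)


def strip_thinking(text: str) -> str:
    """Remove <think>...</think> reasoning spans from model output."""
    return THINK_RE.sub("", text).strip()


print(strip_thinking("<think>step 1... step 2...</think>The answer is 42."))
```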