FineVine Qwen3.5-35B-A3B Vision MLX V18 (6bit)

FineVine Qwen3.5-35B-A3B V18 is a fine-tuned version of Qwen/Qwen3.5-35B-A3B optimized for efficient local inference, especially on Apple silicon laptops, by reducing thinking-token usage. Significant effort was also made to reduce looping behavior.

This repository contains the 6-bit MLX variant of FineVine Qwen3.5-35B-A3B Vision V18.

This release focuses on:

  • significantly reduced looping, particularly during function calls and reasoning
  • significant reduction in the tokens needed to reach an answer
  • better prompt adherence on structured tasks
  • better handling of contradictions, underdetermined prompts, and no-solution cases

This repo is organized as a multi-quant MLX release so you can choose the quantization that fits your hardware. It is ideal for Macs with less than 50 GB of memory.

Base Model

This model is based on:

  • Qwen/Qwen3.5-35B-A3B

What Improved

Compared with the base model, this release was tuned to improve several behavior problems at once:

  • Reduced overthinking on easy and medium prompts
  • Much less looping and repeated branch restarts on quantized versions
  • Better prompt adherence on structured coding and reasoning tasks
  • Better balance between concise answers and still giving enough detail when helpful
  • Better formatting for practical answers, including selective use of lists, tables, and diagrams
  • Better handling of valid logic, contradictions, and insufficient-information cases
  • Better behavior on harder coding prompts with more branch coverage before writing code
  • Strong tool-calling behavior on structured tasks and tool-chains
  • More stable local assistant behavior overall, making it a strong fit when you need a local model that answers quickly

In practice, the model tends to answer more directly, with less wasted reasoning, while still remaining capable on harder structured tasks.

Training Data

Training data sources included:

  • custom hand-written and hand-edited datasets (60%)
  • manually reviewed synthetic traces (20%)
  • high-quality examples and reasoning traces derived from models including Claude Opus and GLM-5 (20%)

Quantizations

This release includes three MLX-VLM quantization options:

  • mlx-4bit
  • mlx-6bit
  • mlx-8bit
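As a rough rule of thumb for choosing between these options (an approximation, not an official sizing table), the weight-only footprint of a quantized model is about parameter count × bits per weight ÷ 8, before KV cache and runtime overhead. A quick sketch for a 35B-parameter model:

```python
def approx_weight_gb(n_params_billion: float, bits: int) -> float:
    """Rough weight-only footprint in decimal GB: params * bits / 8.

    Ignores mixed-precision layers, KV cache, and runtime overhead,
    so treat the result as a lower bound on real memory use.
    """
    bytes_total = n_params_billion * 1e9 * bits / 8
    return bytes_total / 1e9


for bits in (4, 6, 8):
    print(f"mlx-{bits}bit: ~{approx_weight_gb(35, bits):.1f} GB of weights")
```

By this estimate the 6-bit variant needs roughly 26 GB for weights alone, which is why the release fits comfortably within the sub-50 GB guideline above.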

Repository Layout

FineVine-Qwen3.5-35B-A3B-Vision-MLX-V18/
  README.md
  mlx-4bit/
  mlx-6bit/
  mlx-8bit/

Intended Use

This model is meant for:

  • local laptop use
  • direct-answer assistant tasks
  • strong tool-calling and structured tool-use workflows for tools like Osaurus and OpenCode
  • coding help
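Tool-calling clients such as Osaurus and OpenCode generally exchange OpenAI-style tool-call JSON. The dispatcher below is a hypothetical minimal sketch of how a host application might route a tool call emitted by the model; the `get_weather` tool and its behavior are illustrative only, not part of this release:

```python
import json

# Hypothetical tool registry; get_weather is illustrative only.
TOOLS = {
    "get_weather": lambda city: f"Sunny in {city}",
}


def dispatch(tool_call_json: str) -> str:
    """Route an OpenAI-style tool call {"name": ..., "arguments": ...}
    to the matching local function and return its result."""
    call = json.loads(tool_call_json)
    fn = TOOLS[call["name"]]
    args = call["arguments"]
    if isinstance(args, str):  # some clients send arguments as a JSON string
        args = json.loads(args)
    return fn(**args)


print(dispatch('{"name": "get_weather", "arguments": {"city": "Lisbon"}}'))
```

In a real loop, the returned string would be appended to the conversation as a tool-result message before asking the model to continue.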

Important Note

This model is not ideal as a general chatbot or roleplaying model.

If you want a warmer, more open-ended, highly conversational chat style, or strong roleplay behavior, this release is probably not the best fit.

Suggested model parameters:

  • temperature = 0.6
  • top_p = 0.95
  • top_k = 20
  • min_p = 0.0
  • presence_penalty = 0.0
  • repetition_penalty = 1.0
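To make these parameters concrete, the sketch below shows how temperature, top_k, top_p, and min_p typically filter a next-token distribution. It is a simplified pure-Python illustration, not the exact sampler any particular runtime uses:

```python
import math
import random


def sample_next(logits, temperature=0.6, top_p=0.95, top_k=20, min_p=0.0, rng=None):
    """Filter a logit list with top-k, top-p, and min-p, then sample.

    Simplified: real samplers operate on tensors and also apply
    presence/repetition penalties before this step.
    """
    rng = rng or random.Random(0)
    # Temperature scaling, then softmax.
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    probs = [math.exp(l - m) for l in scaled]
    total = sum(probs)
    probs = [p / total for p in probs]

    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep, cum = [], 0.0
    for i in order[:top_k]:  # top-k: consider at most k candidates
        if probs[i] < min_p * probs[order[0]]:
            break  # min-p: drop tokens far below the best candidate
        keep.append(i)
        cum += probs[i]
        if cum >= top_p:  # top-p: stop once enough mass is covered
            break

    # Renormalize over the surviving candidates and sample one.
    z = sum(probs[i] for i in keep)
    r, acc = rng.random() * z, 0.0
    for i in keep:
        acc += probs[i]
        if acc >= r:
            return i
    return keep[-1]
```

With the suggested defaults (min_p = 0.0, penalties disabled), only temperature, top_k, and top_p shape the distribution, which matches the model's tuned preference for direct, low-repetition answers.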

LM Studio Note

If you want to turn off visible thinking in LM Studio, use:

{%- set enable_thinking = false %}
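If your client shows raw output instead, Qwen-style reasoning usually arrives wrapped in `<think>...</think>` tags (tag name assumed from the Qwen3 convention; verify against this model's chat template). A small post-processing helper can hide it:

```python
import re

# Assumes the Qwen3-style <think>...</think> convention for reasoning spans.
THINK_RE = re.compile(r"<think>.*?</think>\s*", re.DOTALL)


def strip_thinking(text: str) -> str:
    """Remove <think>...</think> reasoning spans from model output."""
    return THINK_RE.sub("", text).strip()


print(strip_thinking("<think>step 1... step 2...</think>The answer is 42."))
```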