Instructions to use bingbangboom/dolus-v2-GGUF with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use bingbangboom/dolus-v2-GGUF with llama-cpp-python:

# !pip install llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
	repo_id="bingbangboom/dolus-v2-GGUF",
	filename="qwen3-4b-instruct-2507.Q8_0.gguf",
)

llm.create_chat_completion(
	messages = "No input example has been defined for this model task."
)

Notebooks
Google Colab
Kaggle
Local Apps Settings

llama.cpp

How to use bingbangboom/dolus-v2-GGUF with llama.cpp:

Install from brew

brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf bingbangboom/dolus-v2-GGUF:Q8_0
# Run inference directly in the terminal:
llama-cli -hf bingbangboom/dolus-v2-GGUF:Q8_0

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf bingbangboom/dolus-v2-GGUF:Q8_0
# Run inference directly in the terminal:
llama-cli -hf bingbangboom/dolus-v2-GGUF:Q8_0

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf bingbangboom/dolus-v2-GGUF:Q8_0
# Run inference directly in the terminal:
./llama-cli -hf bingbangboom/dolus-v2-GGUF:Q8_0

Build from source code

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf bingbangboom/dolus-v2-GGUF:Q8_0
# Run inference directly in the terminal:
./build/bin/llama-cli -hf bingbangboom/dolus-v2-GGUF:Q8_0

Use Docker

docker model run hf.co/bingbangboom/dolus-v2-GGUF:Q8_0

LM Studio
Jan
Ollama
How to use bingbangboom/dolus-v2-GGUF with Ollama:
```
ollama run hf.co/bingbangboom/dolus-v2-GGUF:Q8_0
```

Unsloth Studio

How to use bingbangboom/dolus-v2-GGUF with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for bingbangboom/dolus-v2-GGUF to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for bingbangboom/dolus-v2-GGUF to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for bingbangboom/dolus-v2-GGUF to start chatting

How to use bingbangboom/dolus-v2-GGUF with Pi:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf bingbangboom/dolus-v2-GGUF:Q8_0

Configure the model in Pi

# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        {
          "id": "bingbangboom/dolus-v2-GGUF:Q8_0"
        }
      ]
    }
  }
}

Run Pi

# Start Pi in your project directory:
pi

Hermes Agent new

How to use bingbangboom/dolus-v2-GGUF with Hermes Agent:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf bingbangboom/dolus-v2-GGUF:Q8_0

Configure Hermes

# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default bingbangboom/dolus-v2-GGUF:Q8_0

Run Hermes

hermes

Docker Model Runner
How to use bingbangboom/dolus-v2-GGUF with Docker Model Runner:
```
docker model run hf.co/bingbangboom/dolus-v2-GGUF:Q8_0
```

Lemonade

How to use bingbangboom/dolus-v2-GGUF with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull bingbangboom/dolus-v2-GGUF:Q8_0

Run and chat with the model

lemonade run user.dolus-v2-GGUF-Q8_0

List all available models

lemonade list

dolus-v2 · GGUF

dolus-v2 is a fine-tuned version of Qwen3-4B-Instruct-2507, trained to perform stylistic rewriting of AI-generated text to transform it into prose that reads more naturally.

⚠️ This is an experimental model and may introduce errors or hallucinations. Always verify rewritten text before use.

Training Details

Base model: unsloth/Qwen3-4B-Instruct-2507
Method: Supervised Fine-Tuning (SFT) with QLoRA (Quantized Low-Rank Adaptation)
Training framework: Unsloth
Training examples: 10,000+
Data pipeline: Human-written texts passed through a 2-stage LLM rewriting pipeline to generate approximations of freely generated AI-style text in the wild. The model was then trained on the reverse (AI → human) pairs.
- First pass: gemini-3-flash-preview
- Second pass: kimi-k2.6
Task: Sequence-to-sequence stylistic transfer

Usage

System Prompt

Rewrite the given AI-generated text so it reads as if written by a skilled and experienced human writer. Preserve the original meaning, and all key information present in the given AI-generated text, and never omit, add, invent, or infer any detail, context, explanation, or implication not explicitly present in it. Reproduce all names, titles, organizations, numbers, statistics, dates, units, and any other key data exactly as they appear in the given AI-generated text. Your only source of facts is the given AI-generated text provided so do not draw on outside knowledge. Output only the rewritten text.

Input Format

[AI-generated text]: {your text here}

Suggested Sampling Parameters

Parameter	Recommended Range	Used for Sniff-Test
`temperature`	0.70 – 0.85	0.8
`top_k`	20 – 60	20
`top_p`	0.85 – 0.95	0.85
`repeat_penalty`	1.10 – 1.20	1.10
`max_tokens`	4096	4096

Usage

Improve the naturalness and readability of LLM-generated text by reducing stylistic homogeneity and mechanical patterns.
Research into AI writing detection, stylometric analysis, and text quality improvement.

Not intended for:

Bypassing AI detection systems or circumvent plagiarism policies for academic dishonesty, fraud, or any deceptive misrepresentation of authorship.
Circumventing platform, publication, or institutional integrity policies.

Use responsibly and transparently in accordance with applicable academic/institutional/platform guidelines and disclosure requirements.

Limitations

Some reduction in writing quality or coherence is possible; longer inputs may be affected more than shorter ones.
Trained primarily on literary and academic texts; may not generalize well to other general-purpose texts.
Unintended semantic changes and hallucinations may occur during rewriting -- always review output.

UPDATE: To deal with quality issues, we can use a small LLM like Qwen-3.5-4B (thinking off-- for fast results) as a judge to evaluate the rewritten texts, and regenerate if it fails to meet your specific standards. Here is a sample prompt for the judge model:

You are an expert text quality evaluator. Your sole job is to assess the quality of a rewritten ("humanised") version of an AI-generated text. You will be given:

1. [ORIGINAL]: The original AI-generated text
2. [REWRITE]: The humanised rewrite to be evaluated

You must evaluate the rewrite strictly against the original using the rubric below. For each criterion, output only 0 (FAIL) or 1 (PASS). No explanations, no partial scores, no commentary.

---

EVALUATION RUBRIC

Evaluate each of the following 11 criteria independently:

HALLUCINATION CHECKS (any fabricated content = automatic 0)

1. STAT_INTEGRITY
Did the rewrite preserve all numerical figures, statistics, and quantitative claims exactly as they appear in the original (e.g. percentages, dollar amounts, dates, counts)?
0 = Any number is changed, invented, omitted, rounded, or has its decimal place, unit of measurement, or order of magnitude altered
1 = All numbers match the original exactly

2. ENTITY_INTEGRITY
Did the rewrite preserve all named entities correctly — people, companies, books, places, technical terms — without inventing new ones or misattributing any?
0 = Any named entity is fabricated, misattributed, or incorrectly merged
1 = All named entities are accurate and correctly attributed

3. CAUSAL_INTEGRITY
Did the rewrite preserve the causal logic and directional claims of the original (e.g. if X causes Y, the rewrite does not say Y causes X or that X prevents Y)?
0 = Any causal relationship is inverted, distorted, or fabricated
1 = All causal relationships match the original

4. NO_INVENTED_CONTENT
Did the rewrite avoid introducing any specific claims, facts, figures, characterisations, or conclusions that are not present in the original?
0 = Any new specific claim not in the original is introduced
1 = No new factual content introduced

COMPLETENESS CHECKS (omission of key content = automatic 0)

5. KEY_ARGUMENT_PRESERVED
Are all major arguments, conclusions, and central claims of the original present in the rewrite?
0 = Any major argument or conclusion is missing or materially weakened
1 = All major arguments are present

6. KEY_EVIDENCE_PRESERVED
Are all key pieces of supporting evidence, examples, data points, and illustrative details present in the rewrite?
0 = Any key supporting evidence or example is omitted
1 = All key evidence and examples are present

7. STRUCTURAL_LOGIC_PRESERVED
Does the rewrite maintain the logical progression and structure of the original's argument — including setups, contrasts, and conclusions — without collapsing or reordering them in a way that distorts meaning?
0 = Logical structure is collapsed, reordered, or broken in a way that changes meaning
1 = Logical structure is intact

FAITHFULNESS CHECKS

8. TONE_AND_STANCE_PRESERVED
Does the rewrite preserve the original's stance, perspective, and overall tone — including who holds which opinion, what is presented as certain vs uncertain, and what is framed positively vs negatively?
0 = Stance, attribution of opinion, or tone is materially shifted
1 = Stance and tone are faithfully preserved

9. SCOPE_PRESERVED
Does the rewrite avoid overgeneralising or understating the original's claims — neither inflating them beyond what the original says nor deflating them to be weaker than intended?
0 = Claims are materially overstated or understated
1 = Scope of all claims matches the original

FLUENCY CHECK

10. FLUENCY
Is the rewrite fluent, grammatically correct, and free of artifacts, placeholder text, or formatting errors that do not appear in the original?
0 = Contains grammatical errors, artifact text, or formatting issues
1 = Clean, fluent, and well-formed

REWRITE QUALITY CHECK

11. SUBSTANTIVE_REWRITE
Is the rewrite meaningfully rephrased and restructured from the original, or is it essentially a verbatim reproduction? This is the most important criterion. A rewrite that only changes a few words, swaps some phrases, or simply merges paragraphs/clauses in order, using simple connector words/phrases/punctuation, is NOT a substantive rewrite. The rewrite must demonstrate genuine attempt of using varied vocabulary, cadence, sentence construction, flow, reordering/restructuring/rephrasing/reformatting while preserving all facts, meaning and intent.

0 = Output is identical or near-identical to the original. Mostsentences are substantially unchanged in wording and structure. The rewrite reads like the original with minor surface changes.
1 = Output demonstrates substantial rewriting. The rewrite reads like genuinely different text while preserving all facts, meaning and intent of the original.

---

OUTPUT FORMAT

Return your evaluation as a JSON object only. No preamble, no explanation, no commentary. Strictly:

{
  "STAT_INTEGRITY": 0 or 1,
  "ENTITY_INTEGRITY": 0 or 1,
  "CAUSAL_INTEGRITY": 0 or 1,
  "NO_INVENTED_CONTENT": 0 or 1,
  "KEY_ARGUMENT_PRESERVED": 0 or 1,
  "KEY_EVIDENCE_PRESERVED": 0 or 1,
  "STRUCTURAL_LOGIC_PRESERVED": 0 or 1,
  "TONE_AND_STANCE_PRESERVED": 0 or 1,
  "SCOPE_PRESERVED": 0 or 1,
  "FLUENCY": 0 or 1,
  "SUBSTANTIVE_REWRITE": 0 or 1,
  "TOTAL": <sum of all scores above, integer between 0 and 11>,
  "PASS": 0 or 1  (1 if TOTAL >= 9 AND SUBSTANTIVE_REWRITE == 1, 0 otherwise)
}

---

HARD RULES

- You must evaluate ONLY against the original text provided. Do not use external knowledge to fill gaps or excuse omissions.
- A rewrite that is factually correct by general knowledge but diverges from the original still scores 0 on the relevant criterion.
- Stylistic changes (synonyms, sentence restructuring, contractions, punctuation) do not affect scores as long as meaning, facts, and logic are preserved.
- Tense changes are acceptable only if they do not distort the meaning or timeline of events.
- Adding headers or minor structural formatting does not penalise the rewrite unless it introduces or obscures content.
- If the rewrite inverts, contradicts, or fabricates even one specific factual claim, STAT_INTEGRITY, CAUSAL_INTEGRITY, or NO_INVENTED_CONTENT must be 0.
- If the rewrite is a verbatim or near-verbatim reproduction of the original (identical or near-identical text), SUBSTANTIVE_REWRITE must be 0 and PASS must be 0 regardless of TOTAL.
- When in doubt on any criterion, score 0.

Training Parameters

Base Model & Quantization

Parameter	Value
Base model	`unsloth/Qwen3-4B-Instruct-2507`
Max sequence length	4096
Quantization	4-bit (QLoRA)

LoRA Configuration

Parameter	Value
Rank (`r`)	32
Alpha (`lora_alpha`)	64
Dropout	0.05

SFT Training

Parameter	Value
Epochs	1
Batch size (per device)	8
Gradient accumulation steps	4
Learning rate	2e-4
LR scheduler	Cosine
Optimizer	AdamW (8-bit)
Weight decay	0.01
Warmup steps	35
Seed	3407

Available Files

File	Quantization	Size
`qwen3-4b-instruct-2507.Q8_0.gguf`	Q8_0	~4.5 GB

Acknowledgements

This project was inspired by Unslopper by N8Programs, which followed a similar data generation pipeline and LoRA finetuning approach for the same task. dolus-v2 builds on that direction with a smaller quantized base model (4B vs 30B-A3B), a larger training set (10k+ vs 1k) and a simpler two-stage data generation pipeline (2x vs 10x).

License

CC BY-NC-SA 4.0 — Free for non-commercial use with attribution. Derivative models must use the same license.

Finetuned and converted to GGUF using Unsloth.

Downloads last month: 130

GGUF

Model size

4B params

Architecture

qwen3

Hardware compatibility

8-bit

Inference Providers NEW

This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for bingbangboom/dolus-v2-GGUF

Base model

Qwen/Qwen3-4B-Instruct-2507

Finetuned

unsloth/Qwen3-4B-Instruct-2507

Quantized

(31)

this model