Instructions to use bingbangboom/dolus-v2-GGUF with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- llama-cpp-python
How to use bingbangboom/dolus-v2-GGUF with llama-cpp-python:
# !pip install llama-cpp-python from llama_cpp import Llama llm = Llama.from_pretrained( repo_id="bingbangboom/dolus-v2-GGUF", filename="qwen3-4b-instruct-2507.Q8_0.gguf", )
llm.create_chat_completion( messages = "No input example has been defined for this model task." )
- Notebooks
- Google Colab
- Kaggle
- Local Apps Settings
- llama.cpp
How to use bingbangboom/dolus-v2-GGUF with llama.cpp:
Install from brew
brew install llama.cpp # Start a local OpenAI-compatible server with a web UI: llama-server -hf bingbangboom/dolus-v2-GGUF:Q8_0 # Run inference directly in the terminal: llama-cli -hf bingbangboom/dolus-v2-GGUF:Q8_0
Install from WinGet (Windows)
winget install llama.cpp # Start a local OpenAI-compatible server with a web UI: llama-server -hf bingbangboom/dolus-v2-GGUF:Q8_0 # Run inference directly in the terminal: llama-cli -hf bingbangboom/dolus-v2-GGUF:Q8_0
Use pre-built binary
# Download pre-built binary from: # https://github.com/ggerganov/llama.cpp/releases # Start a local OpenAI-compatible server with a web UI: ./llama-server -hf bingbangboom/dolus-v2-GGUF:Q8_0 # Run inference directly in the terminal: ./llama-cli -hf bingbangboom/dolus-v2-GGUF:Q8_0
Build from source code
git clone https://github.com/ggerganov/llama.cpp.git cd llama.cpp cmake -B build cmake --build build -j --target llama-server llama-cli # Start a local OpenAI-compatible server with a web UI: ./build/bin/llama-server -hf bingbangboom/dolus-v2-GGUF:Q8_0 # Run inference directly in the terminal: ./build/bin/llama-cli -hf bingbangboom/dolus-v2-GGUF:Q8_0
Use Docker
docker model run hf.co/bingbangboom/dolus-v2-GGUF:Q8_0
- LM Studio
- Jan
- Ollama
How to use bingbangboom/dolus-v2-GGUF with Ollama:
ollama run hf.co/bingbangboom/dolus-v2-GGUF:Q8_0
- Unsloth Studio
How to use bingbangboom/dolus-v2-GGUF with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
curl -fsSL https://unsloth.ai/install.sh | sh # Run unsloth studio unsloth studio -H 0.0.0.0 -p 8888 # Then open http://localhost:8888 in your browser # Search for bingbangboom/dolus-v2-GGUF to start chatting
Install Unsloth Studio (Windows)
irm https://unsloth.ai/install.ps1 | iex # Run unsloth studio unsloth studio -H 0.0.0.0 -p 8888 # Then open http://localhost:8888 in your browser # Search for bingbangboom/dolus-v2-GGUF to start chatting
Using HuggingFace Spaces for Unsloth
# No setup required # Open https://huggingface.co/spaces/unsloth/studio in your browser # Search for bingbangboom/dolus-v2-GGUF to start chatting
- Pi
How to use bingbangboom/dolus-v2-GGUF with Pi:
Start the llama.cpp server
# Install llama.cpp: brew install llama.cpp # Start a local OpenAI-compatible server: llama-server -hf bingbangboom/dolus-v2-GGUF:Q8_0
Configure the model in Pi
# Install Pi: npm install -g @mariozechner/pi-coding-agent # Add to ~/.pi/agent/models.json: { "providers": { "llama-cpp": { "baseUrl": "http://localhost:8080/v1", "api": "openai-completions", "apiKey": "none", "models": [ { "id": "bingbangboom/dolus-v2-GGUF:Q8_0" } ] } } }Run Pi
# Start Pi in your project directory: pi
- Hermes Agent new
How to use bingbangboom/dolus-v2-GGUF with Hermes Agent:
Start the llama.cpp server
# Install llama.cpp: brew install llama.cpp # Start a local OpenAI-compatible server: llama-server -hf bingbangboom/dolus-v2-GGUF:Q8_0
Configure Hermes
# Install Hermes: curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash hermes setup # Point Hermes at the local server: hermes config set model.provider custom hermes config set model.base_url http://127.0.0.1:8080/v1 hermes config set model.default bingbangboom/dolus-v2-GGUF:Q8_0
Run Hermes
hermes
- Docker Model Runner
How to use bingbangboom/dolus-v2-GGUF with Docker Model Runner:
docker model run hf.co/bingbangboom/dolus-v2-GGUF:Q8_0
- Lemonade
How to use bingbangboom/dolus-v2-GGUF with Lemonade:
Pull the model
# Download Lemonade from https://lemonade-server.ai/ lemonade pull bingbangboom/dolus-v2-GGUF:Q8_0
Run and chat with the model
lemonade run user.dolus-v2-GGUF-Q8_0
List all available models
lemonade list
dolus-v2 ยท GGUF
dolus-v2 is a fine-tuned version of Qwen3-4B-Instruct-2507, trained to perform stylistic rewriting of AI-generated text to transform it into prose that reads more naturally.
โ ๏ธ This is an experimental model and may introduce errors or hallucinations. Always verify rewritten text before use.
Training Details
- Base model:
unsloth/Qwen3-4B-Instruct-2507 - Method: Supervised Fine-Tuning (SFT) with QLoRA (Quantized Low-Rank Adaptation)
- Training framework: Unsloth
- Training examples: 10,000+
- Data pipeline: Human-written texts passed through a 2-stage LLM rewriting pipeline to generate approximations of freely generated AI-style text in the wild. The model was then trained on the reverse (AI โ human) pairs.
- First pass: gemini-3-flash-preview
- Second pass: kimi-k2.6
- Task: Sequence-to-sequence stylistic transfer
Usage
System Prompt
Rewrite the given AI-generated text so it reads as if written by a skilled and experienced human writer. Preserve the original meaning, and all key information present in the given AI-generated text, and never omit, add, invent, or infer any detail, context, explanation, or implication not explicitly present in it. Reproduce all names, titles, organizations, numbers, statistics, dates, units, and any other key data exactly as they appear in the given AI-generated text. Your only source of facts is the given AI-generated text provided so do not draw on outside knowledge. Output only the rewritten text.
Input Format
[AI-generated text]: {your text here}
Suggested Sampling Parameters
| Parameter | Recommended Range | Used for Sniff-Test |
|---|---|---|
temperature |
0.70 โ 0.85 | 0.8 |
top_k |
20 โ 60 | 20 |
top_p |
0.85 โ 0.95 | 0.85 |
repeat_penalty |
1.10 โ 1.20 | 1.10 |
max_tokens |
4096 | 4096 |
Usage
- Improve the naturalness and readability of LLM-generated text by reducing stylistic homogeneity and mechanical patterns.
- Research into AI writing detection, stylometric analysis, and text quality improvement.
Not intended for:
- Bypassing AI detection systems or circumvent plagiarism policies for academic dishonesty, fraud, or any deceptive misrepresentation of authorship.
- Circumventing platform, publication, or institutional integrity policies.
Use responsibly and transparently in accordance with applicable academic/institutional/platform guidelines and disclosure requirements.
Limitations
- Some reduction in writing quality or coherence is possible; longer inputs may be affected more than shorter ones.
- Trained primarily on literary and academic texts; may not generalize well to other general-purpose texts.
- Unintended semantic changes and hallucinations may occur during rewriting -- always review output.
UPDATE: To deal with quality issues, we can use a small LLM like Qwen-3.5-4B (thinking off-- for fast results) as a judge to evaluate the rewritten texts, and regenerate if it fails to meet your specific standards. Here is a sample prompt for the judge model:
You are an expert text quality evaluator. Your sole job is to assess the quality of a rewritten ("humanised") version of an AI-generated text. You will be given:
1. [ORIGINAL]: The original AI-generated text
2. [REWRITE]: The humanised rewrite to be evaluated
You must evaluate the rewrite strictly against the original using the rubric below. For each criterion, output only 0 (FAIL) or 1 (PASS). No explanations, no partial scores, no commentary.
---
EVALUATION RUBRIC
Evaluate each of the following 11 criteria independently:
HALLUCINATION CHECKS (any fabricated content = automatic 0)
1. STAT_INTEGRITY
Did the rewrite preserve all numerical figures, statistics, and quantitative claims exactly as they appear in the original (e.g. percentages, dollar amounts, dates, counts)?
0 = Any number is changed, invented, omitted, rounded, or has its decimal place, unit of measurement, or order of magnitude altered
1 = All numbers match the original exactly
2. ENTITY_INTEGRITY
Did the rewrite preserve all named entities correctly โ people, companies, books, places, technical terms โ without inventing new ones or misattributing any?
0 = Any named entity is fabricated, misattributed, or incorrectly merged
1 = All named entities are accurate and correctly attributed
3. CAUSAL_INTEGRITY
Did the rewrite preserve the causal logic and directional claims of the original (e.g. if X causes Y, the rewrite does not say Y causes X or that X prevents Y)?
0 = Any causal relationship is inverted, distorted, or fabricated
1 = All causal relationships match the original
4. NO_INVENTED_CONTENT
Did the rewrite avoid introducing any specific claims, facts, figures, characterisations, or conclusions that are not present in the original?
0 = Any new specific claim not in the original is introduced
1 = No new factual content introduced
COMPLETENESS CHECKS (omission of key content = automatic 0)
5. KEY_ARGUMENT_PRESERVED
Are all major arguments, conclusions, and central claims of the original present in the rewrite?
0 = Any major argument or conclusion is missing or materially weakened
1 = All major arguments are present
6. KEY_EVIDENCE_PRESERVED
Are all key pieces of supporting evidence, examples, data points, and illustrative details present in the rewrite?
0 = Any key supporting evidence or example is omitted
1 = All key evidence and examples are present
7. STRUCTURAL_LOGIC_PRESERVED
Does the rewrite maintain the logical progression and structure of the original's argument โ including setups, contrasts, and conclusions โ without collapsing or reordering them in a way that distorts meaning?
0 = Logical structure is collapsed, reordered, or broken in a way that changes meaning
1 = Logical structure is intact
FAITHFULNESS CHECKS
8. TONE_AND_STANCE_PRESERVED
Does the rewrite preserve the original's stance, perspective, and overall tone โ including who holds which opinion, what is presented as certain vs uncertain, and what is framed positively vs negatively?
0 = Stance, attribution of opinion, or tone is materially shifted
1 = Stance and tone are faithfully preserved
9. SCOPE_PRESERVED
Does the rewrite avoid overgeneralising or understating the original's claims โ neither inflating them beyond what the original says nor deflating them to be weaker than intended?
0 = Claims are materially overstated or understated
1 = Scope of all claims matches the original
FLUENCY CHECK
10. FLUENCY
Is the rewrite fluent, grammatically correct, and free of artifacts, placeholder text, or formatting errors that do not appear in the original?
0 = Contains grammatical errors, artifact text, or formatting issues
1 = Clean, fluent, and well-formed
REWRITE QUALITY CHECK
11. SUBSTANTIVE_REWRITE
Is the rewrite meaningfully rephrased and restructured from the original, or is it essentially a verbatim reproduction? This is the most important criterion. A rewrite that only changes a few words, swaps some phrases, or simply merges paragraphs/clauses in order, using simple connector words/phrases/punctuation, is NOT a substantive rewrite. The rewrite must demonstrate genuine attempt of using varied vocabulary, cadence, sentence construction, flow, reordering/restructuring/rephrasing/reformatting while preserving all facts, meaning and intent.
0 = Output is identical or near-identical to the original. Mostsentences are substantially unchanged in wording and structure. The rewrite reads like the original with minor surface changes.
1 = Output demonstrates substantial rewriting. The rewrite reads like genuinely different text while preserving all facts, meaning and intent of the original.
---
OUTPUT FORMAT
Return your evaluation as a JSON object only. No preamble, no explanation, no commentary. Strictly:
{
"STAT_INTEGRITY": 0 or 1,
"ENTITY_INTEGRITY": 0 or 1,
"CAUSAL_INTEGRITY": 0 or 1,
"NO_INVENTED_CONTENT": 0 or 1,
"KEY_ARGUMENT_PRESERVED": 0 or 1,
"KEY_EVIDENCE_PRESERVED": 0 or 1,
"STRUCTURAL_LOGIC_PRESERVED": 0 or 1,
"TONE_AND_STANCE_PRESERVED": 0 or 1,
"SCOPE_PRESERVED": 0 or 1,
"FLUENCY": 0 or 1,
"SUBSTANTIVE_REWRITE": 0 or 1,
"TOTAL": <sum of all scores above, integer between 0 and 11>,
"PASS": 0 or 1 (1 if TOTAL >= 9 AND SUBSTANTIVE_REWRITE == 1, 0 otherwise)
}
---
HARD RULES
- You must evaluate ONLY against the original text provided. Do not use external knowledge to fill gaps or excuse omissions.
- A rewrite that is factually correct by general knowledge but diverges from the original still scores 0 on the relevant criterion.
- Stylistic changes (synonyms, sentence restructuring, contractions, punctuation) do not affect scores as long as meaning, facts, and logic are preserved.
- Tense changes are acceptable only if they do not distort the meaning or timeline of events.
- Adding headers or minor structural formatting does not penalise the rewrite unless it introduces or obscures content.
- If the rewrite inverts, contradicts, or fabricates even one specific factual claim, STAT_INTEGRITY, CAUSAL_INTEGRITY, or NO_INVENTED_CONTENT must be 0.
- If the rewrite is a verbatim or near-verbatim reproduction of the original (identical or near-identical text), SUBSTANTIVE_REWRITE must be 0 and PASS must be 0 regardless of TOTAL.
- When in doubt on any criterion, score 0.
Training Parameters
Base Model & Quantization
| Parameter | Value |
|---|---|
| Base model | unsloth/Qwen3-4B-Instruct-2507 |
| Max sequence length | 4096 |
| Quantization | 4-bit (QLoRA) |
LoRA Configuration
| Parameter | Value |
|---|---|
Rank (r) |
32 |
Alpha (lora_alpha) |
64 |
| Dropout | 0.05 |
SFT Training
| Parameter | Value |
|---|---|
| Epochs | 1 |
| Batch size (per device) | 8 |
| Gradient accumulation steps | 4 |
| Learning rate | 2e-4 |
| LR scheduler | Cosine |
| Optimizer | AdamW (8-bit) |
| Weight decay | 0.01 |
| Warmup steps | 35 |
| Seed | 3407 |
Available Files
| File | Quantization | Size |
|---|---|---|
qwen3-4b-instruct-2507.Q8_0.gguf |
Q8_0 | ~4.5 GB |
Acknowledgements
This project was inspired by Unslopper by N8Programs, which followed a similar data generation pipeline and LoRA finetuning approach for the same task. dolus-v2 builds on that direction with a smaller quantized base model (4B vs 30B-A3B), a larger training set (10k+ vs 1k) and a simpler two-stage data generation pipeline (2x vs 10x).
License
CC BY-NC-SA 4.0 โ Free for non-commercial use with attribution. Derivative models must use the same license.
Finetuned and converted to GGUF using Unsloth.
- Downloads last month
- 130
8-bit
Model tree for bingbangboom/dolus-v2-GGUF
Base model
Qwen/Qwen3-4B-Instruct-2507