dolus-v2 ยท GGUF

dolus-v2 is a fine-tuned version of Qwen3-4B-Instruct-2507, trained to perform stylistic rewriting of AI-generated text to transform it into prose that reads more naturally.

โš ๏ธ This is an experimental model and may introduce errors or hallucinations. Always verify rewritten text before use.


Training Details

  • Base model: unsloth/Qwen3-4B-Instruct-2507
  • Method: Supervised Fine-Tuning (SFT) with QLoRA (Quantized Low-Rank Adaptation)
  • Training framework: Unsloth
  • Training examples: 10,000+
  • Data pipeline: Human-written texts passed through a 2-stage LLM rewriting pipeline to generate approximations of freely generated AI-style text in the wild. The model was then trained on the reverse (AI โ†’ human) pairs.
    • First pass: gemini-3-flash-preview
    • Second pass: kimi-k2.6
  • Task: Sequence-to-sequence stylistic transfer

Usage

System Prompt

Rewrite the given AI-generated text so it reads as if written by a skilled and experienced human writer. Preserve the original meaning, and all key information present in the given AI-generated text, and never omit, add, invent, or infer any detail, context, explanation, or implication not explicitly present in it. Reproduce all names, titles, organizations, numbers, statistics, dates, units, and any other key data exactly as they appear in the given AI-generated text. Your only source of facts is the given AI-generated text provided so do not draw on outside knowledge. Output only the rewritten text.

Input Format

[AI-generated text]: {your text here}

Suggested Sampling Parameters

Parameter Recommended Range Used for Sniff-Test
temperature 0.70 โ€“ 0.85 0.8
top_k 20 โ€“ 60 20
top_p 0.85 โ€“ 0.95 0.85
repeat_penalty 1.10 โ€“ 1.20 1.10
max_tokens 4096 4096

Usage

  • Improve the naturalness and readability of LLM-generated text by reducing stylistic homogeneity and mechanical patterns.
  • Research into AI writing detection, stylometric analysis, and text quality improvement.

Not intended for:

  • Bypassing AI detection systems or circumvent plagiarism policies for academic dishonesty, fraud, or any deceptive misrepresentation of authorship.
  • Circumventing platform, publication, or institutional integrity policies.

Use responsibly and transparently in accordance with applicable academic/institutional/platform guidelines and disclosure requirements.


Limitations

  • Some reduction in writing quality or coherence is possible; longer inputs may be affected more than shorter ones.
  • Trained primarily on literary and academic texts; may not generalize well to other general-purpose texts.
  • Unintended semantic changes and hallucinations may occur during rewriting -- always review output.

UPDATE: To deal with quality issues, we can use a small LLM like Qwen-3.5-4B (thinking off-- for fast results) as a judge to evaluate the rewritten texts, and regenerate if it fails to meet your specific standards. Here is a sample prompt for the judge model:

You are an expert text quality evaluator. Your sole job is to assess the quality of a rewritten ("humanised") version of an AI-generated text. You will be given:

1. [ORIGINAL]: The original AI-generated text
2. [REWRITE]: The humanised rewrite to be evaluated

You must evaluate the rewrite strictly against the original using the rubric below. For each criterion, output only 0 (FAIL) or 1 (PASS). No explanations, no partial scores, no commentary.

---

EVALUATION RUBRIC

Evaluate each of the following 11 criteria independently:

HALLUCINATION CHECKS (any fabricated content = automatic 0)

1. STAT_INTEGRITY
Did the rewrite preserve all numerical figures, statistics, and quantitative claims exactly as they appear in the original (e.g. percentages, dollar amounts, dates, counts)?
0 = Any number is changed, invented, omitted, rounded, or has its decimal place, unit of measurement, or order of magnitude altered
1 = All numbers match the original exactly

2. ENTITY_INTEGRITY
Did the rewrite preserve all named entities correctly โ€” people, companies, books, places, technical terms โ€” without inventing new ones or misattributing any?
0 = Any named entity is fabricated, misattributed, or incorrectly merged
1 = All named entities are accurate and correctly attributed

3. CAUSAL_INTEGRITY
Did the rewrite preserve the causal logic and directional claims of the original (e.g. if X causes Y, the rewrite does not say Y causes X or that X prevents Y)?
0 = Any causal relationship is inverted, distorted, or fabricated
1 = All causal relationships match the original

4. NO_INVENTED_CONTENT
Did the rewrite avoid introducing any specific claims, facts, figures, characterisations, or conclusions that are not present in the original?
0 = Any new specific claim not in the original is introduced
1 = No new factual content introduced

COMPLETENESS CHECKS (omission of key content = automatic 0)

5. KEY_ARGUMENT_PRESERVED
Are all major arguments, conclusions, and central claims of the original present in the rewrite?
0 = Any major argument or conclusion is missing or materially weakened
1 = All major arguments are present

6. KEY_EVIDENCE_PRESERVED
Are all key pieces of supporting evidence, examples, data points, and illustrative details present in the rewrite?
0 = Any key supporting evidence or example is omitted
1 = All key evidence and examples are present

7. STRUCTURAL_LOGIC_PRESERVED
Does the rewrite maintain the logical progression and structure of the original's argument โ€” including setups, contrasts, and conclusions โ€” without collapsing or reordering them in a way that distorts meaning?
0 = Logical structure is collapsed, reordered, or broken in a way that changes meaning
1 = Logical structure is intact

FAITHFULNESS CHECKS

8. TONE_AND_STANCE_PRESERVED
Does the rewrite preserve the original's stance, perspective, and overall tone โ€” including who holds which opinion, what is presented as certain vs uncertain, and what is framed positively vs negatively?
0 = Stance, attribution of opinion, or tone is materially shifted
1 = Stance and tone are faithfully preserved

9. SCOPE_PRESERVED
Does the rewrite avoid overgeneralising or understating the original's claims โ€” neither inflating them beyond what the original says nor deflating them to be weaker than intended?
0 = Claims are materially overstated or understated
1 = Scope of all claims matches the original

FLUENCY CHECK

10. FLUENCY
Is the rewrite fluent, grammatically correct, and free of artifacts, placeholder text, or formatting errors that do not appear in the original?
0 = Contains grammatical errors, artifact text, or formatting issues
1 = Clean, fluent, and well-formed

REWRITE QUALITY CHECK

11. SUBSTANTIVE_REWRITE
Is the rewrite meaningfully rephrased and restructured from the original, or is it essentially a verbatim reproduction? This is the most important criterion. A rewrite that only changes a few words, swaps some phrases, or simply merges paragraphs/clauses in order, using simple connector words/phrases/punctuation, is NOT a substantive rewrite. The rewrite must demonstrate genuine attempt of using varied vocabulary, cadence, sentence construction, flow, reordering/restructuring/rephrasing/reformatting while preserving all facts, meaning and intent.

0 = Output is identical or near-identical to the original. Mostsentences are substantially unchanged in wording and structure. The rewrite reads like the original with minor surface changes.
1 = Output demonstrates substantial rewriting. The rewrite reads like genuinely different text while preserving all facts, meaning and intent of the original.

---

OUTPUT FORMAT

Return your evaluation as a JSON object only. No preamble, no explanation, no commentary. Strictly:

{
  "STAT_INTEGRITY": 0 or 1,
  "ENTITY_INTEGRITY": 0 or 1,
  "CAUSAL_INTEGRITY": 0 or 1,
  "NO_INVENTED_CONTENT": 0 or 1,
  "KEY_ARGUMENT_PRESERVED": 0 or 1,
  "KEY_EVIDENCE_PRESERVED": 0 or 1,
  "STRUCTURAL_LOGIC_PRESERVED": 0 or 1,
  "TONE_AND_STANCE_PRESERVED": 0 or 1,
  "SCOPE_PRESERVED": 0 or 1,
  "FLUENCY": 0 or 1,
  "SUBSTANTIVE_REWRITE": 0 or 1,
  "TOTAL": <sum of all scores above, integer between 0 and 11>,
  "PASS": 0 or 1  (1 if TOTAL >= 9 AND SUBSTANTIVE_REWRITE == 1, 0 otherwise)
}

---

HARD RULES

- You must evaluate ONLY against the original text provided. Do not use external knowledge to fill gaps or excuse omissions.
- A rewrite that is factually correct by general knowledge but diverges from the original still scores 0 on the relevant criterion.
- Stylistic changes (synonyms, sentence restructuring, contractions, punctuation) do not affect scores as long as meaning, facts, and logic are preserved.
- Tense changes are acceptable only if they do not distort the meaning or timeline of events.
- Adding headers or minor structural formatting does not penalise the rewrite unless it introduces or obscures content.
- If the rewrite inverts, contradicts, or fabricates even one specific factual claim, STAT_INTEGRITY, CAUSAL_INTEGRITY, or NO_INVENTED_CONTENT must be 0.
- If the rewrite is a verbatim or near-verbatim reproduction of the original (identical or near-identical text), SUBSTANTIVE_REWRITE must be 0 and PASS must be 0 regardless of TOTAL.
- When in doubt on any criterion, score 0.

Training Parameters

Base Model & Quantization

Parameter Value
Base model unsloth/Qwen3-4B-Instruct-2507
Max sequence length 4096
Quantization 4-bit (QLoRA)

LoRA Configuration

Parameter Value
Rank (r) 32
Alpha (lora_alpha) 64
Dropout 0.05

SFT Training

Parameter Value
Epochs 1
Batch size (per device) 8
Gradient accumulation steps 4
Learning rate 2e-4
LR scheduler Cosine
Optimizer AdamW (8-bit)
Weight decay 0.01
Warmup steps 35
Seed 3407

Available Files

File Quantization Size
qwen3-4b-instruct-2507.Q8_0.gguf Q8_0 ~4.5 GB

Acknowledgements

This project was inspired by Unslopper by N8Programs, which followed a similar data generation pipeline and LoRA finetuning approach for the same task. dolus-v2 builds on that direction with a smaller quantized base model (4B vs 30B-A3B), a larger training set (10k+ vs 1k) and a simpler two-stage data generation pipeline (2x vs 10x).


License

CC BY-NC-SA 4.0 โ€” Free for non-commercial use with attribution. Derivative models must use the same license.

Finetuned and converted to GGUF using Unsloth.

Downloads last month
130
GGUF
Model size
4B params
Architecture
qwen3
Hardware compatibility
Log In to add your hardware

8-bit

Inference Providers NEW
This model isn't deployed by any Inference Provider. ๐Ÿ™‹ Ask for provider support

Model tree for bingbangboom/dolus-v2-GGUF

Quantized
(31)
this model