toolace-halu-qwen-lora

LoRA adapter for Qwen/Qwen2.5-7B-Instruct fine-tuned for span-level hallucination detection in tool-calling dialogues. A generative detector: the model rewrites the assistant answer with <halu type="...">...</halu> markers around hallucinated spans, then the markers are regex-extracted to recover character offsets.

Base: Qwen/Qwen2.5-7B-Instruct
Training data: Ivan1008/toolace-hallucination-spans, config combined, split train (1,955 records)
Trainable params: 10.1 M (0.13% of model) via LoRA r=16, α=32
Target modules: q_proj, k_proj, v_proj, o_proj
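The 10.1 M figure can be checked by hand. Each adapted linear layer of shape d_in → d_out gains r·(d_in + d_out) parameters (the LoRA A and B matrices). The sketch below assumes the public Qwen2.5-7B config values (hidden size 3584, 28 layers, 4 key/value heads of dimension 128, roughly 7.6 B total parameters). These values are not stated in this card.

```python
# Back-of-envelope LoRA parameter count for r=16 on q/k/v/o projections.
# Assumed Qwen2.5-7B config values (taken from the public config, not from this card).
HIDDEN = 3584
LAYERS = 28
HEAD_DIM = 128
KV_HEADS = 4
BASE_PARAMS = 7.6e9  # approximate total parameter count of the base model


def lora_params(d_in: int, d_out: int, r: int) -> int:
    """LoRA adds A (r x d_in) and B (d_out x r) to a d_in -> d_out linear layer."""
    return r * (d_in + d_out)


def count_lora(r: int = 16) -> int:
    kv_dim = KV_HEADS * HEAD_DIM  # grouped-query attention: k/v project down to 512
    per_layer = (
        lora_params(HIDDEN, HIDDEN, r)    # q_proj
        + lora_params(HIDDEN, kv_dim, r)  # k_proj
        + lora_params(HIDDEN, kv_dim, r)  # v_proj
        + lora_params(HIDDEN, HIDDEN, r)  # o_proj
    )
    return per_layer * LAYERS


n = count_lora()
print(f"{n:,} trainable params ({100 * n / BASE_PARAMS:.2f}% of base)")
# prints: 10,092,544 trainable params (0.13% of base)
```

The result matches the card's 10.1 M and 0.13%. α adds no parameters. In PEFT it only scales the update, by α/r = 2 here. In PEFT terms, the setup corresponds to `LoraConfig(r=16, lora_alpha=32, target_modules=["q_proj", "k_proj", "v_proj", "o_proj"], task_type="CAUSAL_LM")`. Dropout and the other fields are not stated in this card.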

Why a generative detector

Many span-level hallucination detectors are encoder token classifiers (e.g. LettuceDetect). A generative LoRA is a different paradigm:

Preserves the full answer structure with explicit per-span typing
Extensible to chain-of-thought rationales after the closing tag
Slightly stronger than the ModernBERT fine-tune on the hardest type, contradiction (sentence F1 0.800 vs 0.763), where world knowledge plausibly matters more than per-token lexical features

Training

2 epochs on combined/train (1,955 records)
Batch 4 × grad_accum 2 (effective 8); lr 5e-5; bf16; warmup 6%; max_len 1536
attn_implementation="eager" (Qwen2 + SDPA + bf16 has known NaN issues)
LoRA params kept in fp32 (PEFT + bf16 + fused AdamW → NaN grads)
Single H200, 7 min training + 13 min inference on all 4 configs
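The hyperparameters above imply a short run. The rough step count below assumes one optimizer update per 8 records, with the last partial batch rounded up. It also assumes the Hugging Face Trainer's default linear schedule: linear warmup, then linear decay to zero. The card does not name the scheduler, so that part is an assumption.

```python
import math

RECORDS = 1_955
EPOCHS = 2
EFFECTIVE_BATCH = 4 * 2  # per-device batch x grad_accum
PEAK_LR = 5e-5
WARMUP_RATIO = 0.06

steps_per_epoch = math.ceil(RECORDS / EFFECTIVE_BATCH)  # 245
total_steps = steps_per_epoch * EPOCHS                   # 490
warmup_steps = math.ceil(total_steps * WARMUP_RATIO)     # 30


def lr_at(step: int) -> float:
    """Assumed schedule: linear warmup to PEAK_LR, then linear decay to 0."""
    if step < warmup_steps:
        return PEAK_LR * step / max(1, warmup_steps)
    return PEAK_LR * max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))


print(total_steps, warmup_steps)  # 490 30
```

The exact count depends on how the trainer handles the final partial accumulation step, so treat 490 updates as approximate.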

epoch	val loss
1	0.0216
2	0.0179

Test-set results (sentence-level F1 — the leaderboard metric)

Config	Lexical floor	LettuceDetect-large (zero-shot)	LookBackLens (in-domain)	ModernBERT-ft	This model	Stacked ensemble (this + ModernBERT-ft)
combined	0.302	0.361	0.489	0.798	0.771	0.871
contradiction	0.231	0.315	0.377	0.763	0.800 ⭐	0.877
missing_tool	0.218	0.330	0.406	0.966	0.927	0.993
overgeneration	0.319	0.335	0.508	0.697	0.672	0.824

⭐ Best single model on contradiction: it beats the encoder fine-tune by about 4 pp (0.800 vs 0.763). This is consistent with the hypothesis that LLM world modelling helps more than per-token lexical features on value-swap hallucinations.
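The card does not define sentence-level F1 precisely. One plausible reading is sketched below as an assumption. Each answer is split into sentences, and a sentence counts as positive if any hallucinated span overlaps it. F1 is then computed over all sentences pooled across records. The naive regex splitter and all function names are hypothetical, and the leaderboard's own splitter and aggregation may differ.

```python
import re

# Hypothetical naive splitter: a sentence is a run of non-terminal characters
# followed by terminal punctuation or end of text. It will mis-split e.g. "3.5".
_SENT_RE = re.compile(r"[^.!?]+(?:[.!?]+|$)")


def sentence_offsets(text):
    """Return (start, end) character offsets of each non-empty sentence."""
    return [(m.start(), m.end()) for m in _SENT_RE.finditer(text) if m.group().strip()]


def sentence_labels(text, spans):
    """True for each sentence that overlaps at least one {"start", "end"} span."""
    return [any(sp["start"] < e and sp["end"] > s for sp in spans)
            for s, e in sentence_offsets(text)]


def sentence_f1(records):
    """Pooled F1 over (answer, gold_spans, pred_spans) triples."""
    tp = fp = fn = 0
    for text, gold, pred in records:
        for g, p in zip(sentence_labels(text, gold), sentence_labels(text, pred)):
            tp += int(g and p)
            fp += int(p and not g)
            fn += int(g and not p)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0  # convention: no positives anywhere -> 0.0


def span_of(text, needle):
    """Helper for the example: character span of the first occurrence of needle."""
    i = text.find(needle)
    return {"start": i, "end": i + len(needle)}


answer = "The hotel is in Paris. It costs 120 EUR. Book now!"
gold = [span_of(answer, "120 EUR")]
pred = [span_of(answer, "120 EUR"), span_of(answer, "Book now")]
print(sentence_f1([(answer, gold, pred)]))  # 1 TP, 1 FP, 0 FN -> 0.666...
```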

Companion ModernBERT model

For per-token encoder-based detection see ArsenyIvanov/toolace-halu-modernbert-large. A stacking LightGBM ensemble over the two reaches sentence F1 0.871 on combined. See the project repo and notebooks/improve_baselines.ipynb for full code, training curves and analytics.
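The stacker's feature set is not described in this card. As a hypothetical illustration only, per-sentence features combining the two detectors could include three values. The first is a flag for whether any Qwen-LoRA span overlaps the sentence. The other two are the max and mean ModernBERT hallucination probability over tokens inside the sentence. The `(start, end, prob)` token format and all names are assumptions. The resulting rows would then be fed to a classifier such as `lightgbm.LGBMClassifier`.

```python
def sentence_features(sent_offsets, qwen_spans, bert_tokens):
    """Hypothetical per-sentence stacking features.

    sent_offsets: list of (start, end) sentence character offsets
    qwen_spans:   list of {"start", "end"} dicts from the generative detector
    bert_tokens:  list of (start, end, prob) tuples from a token classifier
    Returns one [qwen_flag, bert_max, bert_mean] row per sentence.
    """
    rows = []
    for s, e in sent_offsets:
        # 1.0 if any generative span overlaps this sentence
        qwen_flag = float(any(sp["start"] < e and sp["end"] > s for sp in qwen_spans))
        # token probabilities for tokens overlapping this sentence
        probs = [p for ts, te, p in bert_tokens if ts < e and te > s]
        bert_max = max(probs, default=0.0)
        bert_mean = sum(probs) / len(probs) if probs else 0.0
        rows.append([qwen_flag, bert_max, bert_mean])
    return rows


rows = sentence_features(
    [(0, 10), (10, 20)],
    [{"start": 12, "end": 15}],
    [(0, 5, 0.1), (5, 10, 0.3), (10, 15, 0.9), (15, 20, 0.5)],
)
print(rows)
```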

Usage

import re
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

BASE = "Qwen/Qwen2.5-7B-Instruct"
ADAPTER = "ArsenyIvanov/toolace-halu-qwen-lora"

tokenizer = AutoTokenizer.from_pretrained(ADAPTER)
tokenizer.padding_side = "left"
base = AutoModelForCausalLM.from_pretrained(
    BASE, torch_dtype=torch.bfloat16, attn_implementation="eager"
).to("cuda").eval()
model = PeftModel.from_pretrained(base, ADAPTER).to("cuda").eval()

SYSTEM = (
    "You are a hallucination detector for tool-augmented dialogues. "
    "Given the tool context, the available tools, the user query and the assistant answer, "
    "rewrite the assistant answer wrapping every hallucinated span in "
    '<halu type="contradiction">...</halu>, <halu type="missing_tool">...</halu> '
    'or <halu type="overgeneration">...</halu> tags. '
    "Do not alter any other characters. If the answer contains no hallucinations, return it unchanged."
)

def detect(query, tool_context, tool_names, answer):
    user = (
        f"[Tool context]\n{tool_context}\n\n"
        f"[Available tools]\n{', '.join(tool_names)}\n\n"
        f"[User query]\n{query}\n\n"
        f"[Assistant answer]\n{answer}\n\n"
        "Now rewrite the assistant answer above with <halu> markers around hallucinated spans."
    )
    msgs = [{"role": "system", "content": SYSTEM},
            {"role": "user",   "content": user}]
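    prompt = tokenizer.apply_chat_template(msgs, tokenize=False, add_generation_prompt=True)
    # max_length=1536 mirrors the training max_len. With right-side truncation (the usual default),
    # an over-long prompt loses its tail (the end of the answer and the assistant header),
    # so shorten the tool context upstream instead of relying on truncation.
    enc = tokenizer(prompt, return_tensors="pt", truncation=True, max_length=1536).to("cuda")
    with torch.no_grad():
        # Greedy decoding: the model echoes the answer with <halu> tags inserted.
        gen = model.generate(**enc, max_new_tokens=512, do_sample=False,
                             pad_token_id=tokenizer.pad_token_id)
    # Decode only the newly generated tokens, not the prompt.
    completion = tokenizer.decode(gen[0, enc["input_ids"].shape[1]:], skip_special_tokens=True)

    # Map each tagged span back to character offsets in the original answer.
    # Search forward from the previous match so repeated substrings resolve in order.
    spans, cursor = [], 0
    HALU_RE = re.compile(r'<halu type="(contradiction|missing_tool|overgeneration)">(.+?)</halu>', re.DOTALL)
    for m in HALU_RE.finditer(completion):
        ttype, inner = m.group(1), m.group(2)
        idx = answer.find(inner, cursor)
        if idx == -1:
            idx = answer.find(inner)  # fall back to searching from the start
        if idx == -1:
            continue  # the model altered this text; the span cannot be located
        spans.append({"start": idx, "end": idx + len(inner), "text": inner, "label": ttype})
        cursor = idx + len(inner)
    return {"marked": completion, "spans": spans}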
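The system prompt tells the model not to alter any other characters, but greedy generation can still drift. A cheap check, sketched here as a suggestion rather than as part of the released code, is to strip the tags from the completion and compare the result with the original answer. On a mismatch, the offsets recovered by `answer.find` may be unreliable. `strip_halu_tags` and `is_faithful_rewrite` are hypothetical helpers.

```python
import re

# Matches the opening and closing marker tags the detector emits.
_TAG_RE = re.compile(r'<halu type="(?:contradiction|missing_tool|overgeneration)">|</halu>')


def strip_halu_tags(marked: str) -> str:
    """Remove <halu ...> and </halu> markers, keeping the wrapped text."""
    return _TAG_RE.sub("", marked)


def is_faithful_rewrite(marked: str, answer: str) -> bool:
    """True if the completion equals the answer once markers are removed
    (ignoring leading and trailing whitespace)."""
    return strip_halu_tags(marked).strip() == answer.strip()


answer = "Your flight leaves at 9:40. You can also book a hotel."
marked = 'Your flight leaves at <halu type="contradiction">9:40</halu>. You can also book a hotel.'
print(is_faithful_rewrite(marked, answer))  # True
```

With `result = detect(...)`, a record where `is_faithful_rewrite(result["marked"], answer)` is False could be flagged for review rather than trusted blindly.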

Hallucination types

Label	What it captures
`contradiction`	grounded value replaced by a plausible-but-wrong alternative
`missing_tool`	offers an action that requires a tool not in the available list
`overgeneration`	inserted sentence with claims not supported by the tool output

Limitations

Synthetic corruptions only — no naturally occurring cascading errors
Much slower than ModernBERT-ft at inference (~5 s/record vs ~50 ms/record)
LoRA adapter only — needs Qwen/Qwen2.5-7B-Instruct base model at runtime
Single corruption type per record (RAGTruth strict schema)

License

Apache 2.0 — matches the base model and the training dataset license.
