toolace-halu-qwen-lora

LoRA adapter for Qwen/Qwen2.5-7B-Instruct fine-tuned for span-level hallucination detection in tool-calling dialogues. A generative detector: the model rewrites the assistant answer with <halu type="...">...</halu> markers around hallucinated spans, then the markers are regex-extracted to recover character offsets.

  • Base: Qwen/Qwen2.5-7B-Instruct
  • Training data: Ivan1008/toolace-hallucination-spans, config combined, split train (1,955 records)
  • Trainable params: 10.1 M (0.13% of model) via LoRA r=16, α=32
  • Target modules: q_proj, k_proj, v_proj, o_proj
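
The trainable-parameter figure can be sanity-checked by hand. The sketch below assumes the publicly documented Qwen2.5-7B architecture (hidden size 3584, 28 layers, 4 key/value heads of dimension 128) and a base size of roughly 7.62 B parameters. None of these values are stated in this card, so treat them as assumptions.

```python
# Rough check of the LoRA trainable-parameter count quoted above.
# ASSUMPTION: architecture values come from the public Qwen2.5-7B config,
# not from this model card.
HIDDEN = 3584          # hidden_size
LAYERS = 28            # num_hidden_layers
KV_DIM = 4 * 128       # num_key_value_heads * head_dim (grouped-query attention)
RANK = 16              # LoRA r, as stated in this card
BASE_PARAMS = 7.62e9   # ASSUMPTION: approximate size of the base model

# (in_features, out_features) of each targeted projection
TARGETS = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
}

def lora_params(d_in, d_out, r):
    # LoRA adds A (r x d_in) and B (d_out x r) next to each frozen weight.
    return r * (d_in + d_out)

per_layer = sum(lora_params(d_in, d_out, RANK) for d_in, d_out in TARGETS.values())
total = per_layer * LAYERS

print(f"{total:,} trainable parameters")            # 10,092,544 trainable parameters
print(f"{100 * total / BASE_PARAMS:.2f}% of base")  # 0.13% of base
```

Under these assumptions the count matches the 10.1 M and 0.13% reported above.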

Why a generative detector

Most hallucination-detection systems are encoder token-classifiers (e.g. LettuceDetect). A generative LoRA is a different paradigm:

  • Preserves the full answer structure with explicit per-span typing
  • Extensible to chain-of-thought rationales after the closing tag
  • Slightly stronger on contradiction, the type on which the LettuceDetect and LookBackLens baselines score lowest and where world-model understanding plausibly matters more than per-token lexical features

Training

  • 2 epochs on combined/train (1,955 records)
  • Batch 4 × grad_accum 2 (effective 8); lr 5e-5; bf16; warmup 6%; max_len 1536
  • attn_implementation="eager" (Qwen2 + SDPA + bf16 has known NaN issues)
  • LoRA params kept in fp32 (PEFT + bf16 + fused AdamW → NaN grads)
  • Single H200, 7 min training + 13 min inference on all 4 configs

| Epoch | Val loss |
|---|---|
| 1 | 0.0216 |
| 2 | 0.0179 |
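
For orientation, the hyperparameters above imply a short run. The sketch below derives the approximate number of optimizer steps and warmup steps. It assumes the last partial batch of each epoch is kept and that warmup is rounded up to a whole step; the actual training script may round differently, so the figures are indicative only.

```python
import math

# Values stated in this card
RECORDS = 1955
EPOCHS = 2
BATCH_SIZE = 4
GRAD_ACCUM = 2
WARMUP_RATIO = 0.06

effective_batch = BATCH_SIZE * GRAD_ACCUM                # 8

# ASSUMPTION: the final partial batch of each epoch still triggers a step.
steps_per_epoch = math.ceil(RECORDS / effective_batch)   # 245
total_steps = steps_per_epoch * EPOCHS                   # 490

# ASSUMPTION: warmup length is rounded up to a whole step.
warmup_steps = math.ceil(total_steps * WARMUP_RATIO)     # 30

print(effective_batch, steps_per_epoch, total_steps, warmup_steps)
```

With roughly 490 updates in total, about the first 30 are spent ramping the learning rate up to 5e-5.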

Test-set results (sentence-level F1 — the leaderboard metric)

| Config | Lexical floor | LettuceDetect-large (zero-shot) | LookBackLens (in-domain) | ModernBERT-ft | This model | + Ensemble |
|---|---|---|---|---|---|---|
| combined | 0.302 | 0.361 | 0.489 | 0.798 | 0.771 | 0.871 |
| contradiction | 0.231 | 0.315 | 0.377 | 0.763 | 0.800 | 0.877 |
| missing_tool | 0.218 | 0.330 | 0.406 | 0.966 | 0.927 | 0.993 |
| overgeneration | 0.319 | 0.335 | 0.508 | 0.697 | 0.672 | 0.824 |

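Best single model on contradiction: 0.800 vs 0.763 for the encoder fine-tune (+3.7 pp). This is consistent with the hypothesis that LLM world-modelling helps more than per-token lexical features on value-swap hallucinations. On the other three configs (combined, missing_tool, overgeneration) ModernBERT-ft remains the stronger single model, and the ensemble is best everywhere.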
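
This card does not spell out how sentence-level F1 is computed. One common reading, sketched below, marks a sentence as hallucinated if it overlaps any span and then scores predicted against gold sentence labels. The naive sentence splitter and the function names are illustrative assumptions, not the leaderboard's implementation, which may also aggregate counts over the whole test set rather than per record.

```python
import re

def sentence_bounds(text):
    """Return (start, end) character offsets of naive sentences.
    ASSUMPTION: a sentence ends at '.', '!' or '?' followed by whitespace."""
    bounds, start = [], 0
    for m in re.finditer(r"[.!?]+\s+", text):
        bounds.append((start, m.end()))
        start = m.end()
    if start < len(text):
        bounds.append((start, len(text)))
    return bounds

def sentence_labels(text, spans):
    """1 for each sentence that overlaps at least one span, else 0."""
    return [
        int(any(s["start"] < end and s["end"] > start for s in spans))
        for start, end in sentence_bounds(text)
    ]

def sentence_f1(text, gold_spans, pred_spans):
    gold = sentence_labels(text, gold_spans)
    pred = sentence_labels(text, pred_spans)
    tp = sum(1 for g, p in zip(gold, pred) if g and p)
    fp = sum(1 for g, p in zip(gold, pred) if not g and p)
    fn = sum(1 for g, p in zip(gold, pred) if g and not p)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def span_of(text, needle):
    # Helper for the demo: offsets of the first occurrence of `needle`.
    i = text.index(needle)
    return {"start": i, "end": i + len(needle)}

answer = "The flight departs at 9:00. It costs $120. Enjoy your trip!"
gold = [span_of(answer, "9:00")]
pred = [span_of(answer, "9:00"), span_of(answer, "$120")]
score = sentence_f1(answer, gold, pred)
print(round(score, 3))  # 0.667: one true positive and one false positive sentence
```

Scoring at sentence level rewards flagging the right sentence even when the predicted span boundaries are slightly off.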

Companion ModernBERT model

For per-token encoder-based detection see ArsenyIvanov/toolace-halu-modernbert-large. A stacking LightGBM ensemble over the two reaches sentence F1 0.871 on combined. See the project repo and notebooks/improve_baselines.ipynb for full code, training curves and analytics.

Usage

```python
import re
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

BASE = "Qwen/Qwen2.5-7B-Instruct"
ADAPTER = "ArsenyIvanov/toolace-halu-qwen-lora"

tokenizer = AutoTokenizer.from_pretrained(ADAPTER)
tokenizer.padding_side = "left"  # left padding keeps generations aligned if inputs are batched

# Eager attention and bf16 mirror the training setup described above.
base = AutoModelForCausalLM.from_pretrained(
    BASE, torch_dtype=torch.bfloat16, attn_implementation="eager"
).to("cuda").eval()
model = PeftModel.from_pretrained(base, ADAPTER).to("cuda").eval()

# Keep the system prompt verbatim.
SYSTEM = (
    "You are a hallucination detector for tool-augmented dialogues. "
    "Given the tool context, the available tools, the user query and the assistant answer, "
    "rewrite the assistant answer wrapping every hallucinated span in "
    '<halu type="contradiction">...</halu>, <halu type="missing_tool">...</halu> '
    'or <halu type="overgeneration">...</halu> tags. '
    "Do not alter any other characters. If the answer contains no hallucinations, return it unchanged."
)

# Matches one tagged span; compiled once rather than on every call.
HALU_RE = re.compile(
    r'<halu type="(contradiction|missing_tool|overgeneration)">(.+?)</halu>',
    re.DOTALL,
)

def detect(query, tool_context, tool_names, answer):
    """Return the marked answer plus hallucinated spans as character offsets into `answer`."""
    user = (
        f"[Tool context]\n{tool_context}\n\n"
        f"[Available tools]\n{', '.join(tool_names)}\n\n"
        f"[User query]\n{query}\n\n"
        f"[Assistant answer]\n{answer}\n\n"
        "Now rewrite the assistant answer above with <halu> markers around hallucinated spans."
    )
    msgs = [{"role": "system", "content": SYSTEM},
            {"role": "user",   "content": user}]
    prompt = tokenizer.apply_chat_template(msgs, tokenize=False, add_generation_prompt=True)
    # Note: truncation cuts from the right, so very long inputs can lose the end of the prompt.
    enc = tokenizer(prompt, return_tensors="pt", truncation=True, max_length=1536).to("cuda")
    with torch.no_grad():
        gen = model.generate(**enc, max_new_tokens=512, do_sample=False,
                             pad_token_id=tokenizer.pad_token_id)
    # Decode only the newly generated tokens, not the prompt.
    completion = tokenizer.decode(gen[0, enc["input_ids"].shape[1]:], skip_special_tokens=True)

    # Map each tagged span back to offsets in the original answer.
    spans, cursor = [], 0
    for m in HALU_RE.finditer(completion):
        ttype, inner = m.group(1), m.group(2)
        idx = answer.find(inner, cursor)   # search forward from the previous span first
        if idx == -1:
            idx = answer.find(inner)       # fall back to searching the whole answer
        if idx == -1:
            continue                       # the model altered the text; skip this span
        spans.append({"start": idx, "end": idx + len(inner), "text": inner, "label": ttype})
        cursor = idx + len(inner)
    return {"marked": completion, "spans": spans}
```
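
The `answer.find` lookup above can select the wrong occurrence when the same text appears more than once in the answer. An alternative sketch, which assumes the model reproduced the answer verbatim apart from the tags, derives offsets directly from the marked string by stripping the tags and tracking the running length. `spans_from_marked` is a hypothetical helper, not part of the released code.

```python
import re

TAG_RE = re.compile(
    r'<halu type="(contradiction|missing_tool|overgeneration)">(.+?)</halu>',
    re.DOTALL,
)

def spans_from_marked(marked):
    """Strip <halu> tags; return (plain_text, spans) with offsets into plain_text."""
    pieces, spans = [], []
    last = 0        # position in `marked` just after the previous tag
    plain_len = 0   # length of the tag-free text emitted so far
    for m in TAG_RE.finditer(marked):
        before = marked[last:m.start()]
        pieces.append(before)
        plain_len += len(before)
        inner = m.group(2)
        spans.append({"start": plain_len, "end": plain_len + len(inner),
                      "text": inner, "label": m.group(1)})
        pieces.append(inner)
        plain_len += len(inner)
        last = m.end()
    pieces.append(marked[last:])
    return "".join(pieces), spans

answer = "Flight AA10 departs at 9:00. Flight AA12 departs at 9:00."
marked = ('Flight AA10 departs at 9:00. Flight AA12 departs at '
          '<halu type="contradiction">9:00</halu>.')
plain, spans = spans_from_marked(marked)

# Offsets are only trustworthy if the model left the rest of the text untouched.
assert plain == answer
print(spans[0]["start"], spans[0]["end"])  # 52 56: the second "9:00", not the first
```

When `plain != answer`, falling back to the `find`-based lookup (or discarding the record) is the safer choice.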

Hallucination types

| Label | What it captures |
|---|---|
| contradiction | grounded value replaced by a plausible-but-wrong alternative |
| missing_tool | offers an action that requires a tool not in the available list |
| overgeneration | inserted sentence with claims not supported by the tool output |

Limitations

  • Synthetic corruptions only — no naturally occurring cascading errors
  • Much slower than ModernBERT-ft at inference (about 5 s/record vs ~50 ms, roughly 100× by these figures)
  • LoRA adapter only — needs Qwen/Qwen2.5-7B-Instruct base model at runtime
  • Single corruption type per record (RAGTruth strict schema)

License

Apache 2.0 — matches the base model and the training dataset license.
