toolace-halu-qwen-lora
LoRA adapter for Qwen/Qwen2.5-7B-Instruct fine-tuned for span-level
hallucination detection in tool-calling dialogues. A generative detector:
the model rewrites the assistant answer with <halu type="...">...</halu>
markers around hallucinated spans, then the markers are regex-extracted to
recover character offsets.
- Base: Qwen/Qwen2.5-7B-Instruct
- Training data: Ivan1008/toolace-hallucination-spans, config combined, split train (1,955 records)
- Trainable params: 10.1 M (0.13% of model) via LoRA r=16, α=32
- Target modules: q_proj, k_proj, v_proj, o_proj
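The trainable-parameter figure can be sanity-checked with a few lines of arithmetic. The sketch below assumes layer shapes taken from the public Qwen2.5-7B-Instruct config (hidden size 3584, 28 layers, 28 query heads, 4 key/value heads) and a base size of about 7.61 B parameters; none of these values are stated in this card, so treat them as assumptions.

```python
# Shapes assumed from the public Qwen2.5-7B-Instruct config (not stated in this card).
HIDDEN = 3584      # hidden_size
LAYERS = 28        # num_hidden_layers
HEAD_DIM = 128     # hidden_size / num_attention_heads (28 heads)
KV_HEADS = 4       # num_key_value_heads (grouped-query attention)
RANK = 16          # LoRA r, as listed above

KV_DIM = KV_HEADS * HEAD_DIM  # 512

# (in_features, out_features) of each targeted projection
TARGETS = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
}


def lora_params(rank, targets, layers):
    # Each adapted linear layer gains A (rank x in) and B (out x rank).
    per_layer = sum(rank * (d_in + d_out) for d_in, d_out in targets.values())
    return per_layer * layers


trainable = lora_params(RANK, TARGETS, LAYERS)
TOTAL_BASE = 7.61e9  # assumed base parameter count
share = trainable / (TOTAL_BASE + trainable)
print(f"{trainable:,} trainable LoRA params ({100 * share:.2f}% of model)")
```

Under these assumptions the count comes to 10,092,544 parameters, which agrees with the 10.1 M and 0.13% quoted above.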
Why a generative detector
Most hallucination-detection systems are encoder token-classifiers (e.g. LettuceDetect). A generative LoRA is a different paradigm:
- Preserves the full answer structure with explicit per-span typing
- Extensible to chain-of-thought rationales after the closing tag
- Slightly stronger on the hardest type, contradiction (0.800 vs 0.763 sentence-level F1 against the ModernBERT fine-tune), where world-model understanding matters more than per-token lexical features
Training
- 2 epochs on combined/train (1,955 records)
- Batch 4 × grad_accum 2 (effective 8); lr 5e-5; bf16; warmup 6%; max_len 1536
- attn_implementation="eager" (Qwen2 + SDPA + bf16 has known NaN issues)
- LoRA params kept in fp32 (PEFT + bf16 + fused AdamW → NaN grads)
- Single H200, 7 min training + 13 min inference on all 4 configs
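For a rough sense of the schedule these hyperparameters imply, the arithmetic can be sketched as follows. It assumes all 1,955 records are used for optimisation on a single device and that the last partial batch is kept; the exact step count depends on how the training loop rounds, which this card does not specify.

```python
import math

RECORDS = 1955          # combined/train
PER_DEVICE_BATCH = 4
GRAD_ACCUM = 2
EPOCHS = 2
WARMUP_RATIO = 0.06

effective_batch = PER_DEVICE_BATCH * GRAD_ACCUM          # samples per optimizer step
steps_per_epoch = math.ceil(RECORDS / effective_batch)   # assumes the partial batch is kept
total_steps = steps_per_epoch * EPOCHS
warmup_steps = math.ceil(total_steps * WARMUP_RATIO)     # rounding up is an assumption

print(effective_batch, steps_per_epoch, total_steps, warmup_steps)
```

With these assumptions the run is about 490 optimizer steps, of which roughly the first 30 are warmup.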
| epoch | val loss |
|---|---|
| 1 | 0.0216 |
| 2 | 0.0179 |
Test-set results (sentence-level F1 — the leaderboard metric)
| Config | Lexical floor | LettuceDetect-large (zero-shot) | LookBackLens (in-domain) | ModernBERT-ft | This model | + Ensemble |
|---|---|---|---|---|---|---|
| combined | 0.302 | 0.361 | 0.489 | 0.798 | 0.771 | 0.871 |
| contradiction | 0.231 | 0.315 | 0.377 | 0.763 | 0.800 ⭐ | 0.877 |
| missing_tool | 0.218 | 0.330 | 0.406 | 0.966 | 0.927 | 0.993 |
| overgeneration | 0.319 | 0.335 | 0.508 | 0.697 | 0.672 | 0.824 |
⭐ Best single model on contradiction: 0.800 vs 0.763 for the encoder fine-tune (+3.7 pp).
This supports the hypothesis that LLM world-modelling beats
per-token lexical features for value-swap hallucinations.
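This card does not spell out how sentence-level F1 is computed. One plausible reading, sketched below purely as an illustration, labels a sentence as hallucinated when it overlaps at least one span and then scores predicted labels against gold labels. The naive sentence splitter, the helper names and the example text are all assumptions, not the leaderboard's actual implementation.

```python
import re


def sentence_ranges(text):
    """Split text into (start, end) character ranges with a naive punctuation rule."""
    ranges, start = [], 0
    for m in re.finditer(r"[.!?]+\s+", text):
        ranges.append((start, m.end()))
        start = m.end()
    if start < len(text):
        ranges.append((start, len(text)))
    return ranges


def sentence_labels(text, spans):
    """Label a sentence 1 if it overlaps any (start, end) span, else 0."""
    return [
        int(any(s < sent_end and e > sent_start for s, e in spans))
        for sent_start, sent_end in sentence_ranges(text)
    ]


def sentence_f1(gold, pred):
    tp = sum(1 for g, p in zip(gold, pred) if g and p)
    fp = sum(1 for g, p in zip(gold, pred) if not g and p)
    fn = sum(1 for g, p in zip(gold, pred) if g and not p)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical example: one gold span, one correct and one spurious prediction.
text = "The flight departs at 9:00. It lands in Rome. Enjoy your trip!"
a, b = text.index("9:00"), text.index("Rome")
gold = sentence_labels(text, [(a, a + 4)])
pred = sentence_labels(text, [(a, a + 4), (b, b + 4)])
score = sentence_f1(gold, pred)
print(gold, pred, round(score, 3))
```

Here the spurious second span costs precision but not recall, so the sentence-level score is lower than a perfect match even though the true span was found.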
Companion ModernBERT model
For per-token encoder-based detection see
ArsenyIvanov/toolace-halu-modernbert-large.
A stacking LightGBM ensemble over the two reaches sentence F1 0.871 on
combined. See the project repo and
notebooks/improve_baselines.ipynb for full code, training curves and analytics.
Usage
```python
import re

import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

BASE = "Qwen/Qwen2.5-7B-Instruct"
ADAPTER = "ArsenyIvanov/toolace-halu-qwen-lora"

tokenizer = AutoTokenizer.from_pretrained(ADAPTER)
tokenizer.padding_side = "left"

# Eager attention avoids the Qwen2 + SDPA + bf16 NaN issue noted under Training.
base = AutoModelForCausalLM.from_pretrained(
    BASE, torch_dtype=torch.bfloat16, attn_implementation="eager"
).to("cuda").eval()
model = PeftModel.from_pretrained(base, ADAPTER).to("cuda").eval()

SYSTEM = (
    "You are a hallucination detector for tool-augmented dialogues. "
    "Given the tool context, the available tools, the user query and the assistant answer, "
    "rewrite the assistant answer wrapping every hallucinated span in "
    '<halu type="contradiction">...</halu>, <halu type="missing_tool">...</halu> '
    'or <halu type="overgeneration">...</halu> tags. '
    "Do not alter any other characters. If the answer contains no hallucinations, return it unchanged."
)

# Compiled once at import time rather than on every call.
HALU_RE = re.compile(
    r'<halu type="(contradiction|missing_tool|overgeneration)">(.+?)</halu>', re.DOTALL
)


def detect(query, tool_context, tool_names, answer):
    user = (
        f"[Tool context]\n{tool_context}\n\n"
        f"[Available tools]\n{', '.join(tool_names)}\n\n"
        f"[User query]\n{query}\n\n"
        f"[Assistant answer]\n{answer}\n\n"
        "Now rewrite the assistant answer above with <halu> markers around hallucinated spans."
    )
    msgs = [{"role": "system", "content": SYSTEM},
            {"role": "user", "content": user}]
    prompt = tokenizer.apply_chat_template(msgs, tokenize=False, add_generation_prompt=True)
    enc = tokenizer(prompt, return_tensors="pt", truncation=True, max_length=1536).to("cuda")

    # Greedy decoding so the rewrite is deterministic.
    with torch.no_grad():
        gen = model.generate(**enc, max_new_tokens=512, do_sample=False,
                             pad_token_id=tokenizer.pad_token_id)
    # Keep only the newly generated tokens (drop the prompt).
    completion = tokenizer.decode(gen[0, enc["input_ids"].shape[1]:], skip_special_tokens=True)

    # Map each tagged span back to character offsets in the original answer.
    spans, cursor = [], 0
    for m in HALU_RE.finditer(completion):
        ttype, inner = m.group(1), m.group(2)
        idx = answer.find(inner, cursor)   # search forward from the previous span
        if idx == -1:
            idx = answer.find(inner)       # fall back to a search from the start
        if idx == -1:
            continue                       # span text not present in the answer; skip it
        spans.append({"start": idx, "end": idx + len(inner), "text": inner, "label": ttype})
        cursor = idx + len(inner)
    return {"marked": completion, "spans": spans}
```
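The `find`-based lookup above can land on an earlier occurrence when the same text appears more than once in the answer. As an alternative sketch, offsets can be derived directly from the marked string by counting only the characters outside the tags, with a check that the stripped output still equals the original answer. The function name `spans_from_marked` and the example strings are hypothetical, and the check assumes the model really left all other characters untouched.

```python
import re

TAG_RE = re.compile(
    r'<halu type="(contradiction|missing_tool|overgeneration)">(.*?)</halu>',
    re.DOTALL,
)


def spans_from_marked(marked, answer):
    """Recover span offsets by walking the marked text and skipping tag characters."""
    spans, parts, pos, plain_len = [], [], 0, 0
    for m in TAG_RE.finditer(marked):
        before = marked[pos:m.start()]      # untagged text preceding this span
        parts.append(before)
        plain_len += len(before)
        label, inner = m.group(1), m.group(2)
        spans.append({"start": plain_len, "end": plain_len + len(inner),
                      "text": inner, "label": label})
        parts.append(inner)
        plain_len += len(inner)
        pos = m.end()
    parts.append(marked[pos:])
    # True only if removing the tags reproduces the answer character for character.
    faithful = "".join(parts) == answer
    return spans, faithful


# Hypothetical example where the hallucinated value repeats an earlier substring.
answer = "Paris is 20C. Rome is 20C."
marked = 'Paris is 20C. Rome is <halu type="contradiction">20C</halu>.'
spans, faithful = spans_from_marked(marked, answer)
print(spans, faithful)
```

When `faithful` is false the offsets may not line up with the original answer, so a caller might strip surrounding whitespace before comparing or fall back to the substring search.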
Hallucination types
| Label | What it captures |
|---|---|
| contradiction | grounded value replaced by a plausible-but-wrong alternative |
| missing_tool | offers an action that requires a tool not in the available list |
| overgeneration | inserted sentence with claims not supported by the tool output |
Limitations
- Synthetic corruptions only — no naturally occurring cascading errors
- 20× slower than ModernBERT-ft at inference (5 sec/record vs ~50 ms)
- LoRA adapter only: needs the Qwen/Qwen2.5-7B-Instruct base model at runtime
- Single corruption type per record (RAGTruth strict schema)
License
Apache 2.0 — matches the base model and the training dataset license.