⚠️ DEPRECATED — Experimental Model
This model is an early experimental release from the kniv cascade research program and is no longer maintained. It predates the current 5-head cascade architecture (POS, NER, DEP, SRL, CLS) and the bottom-up layer-selective training methodology that produces our current production teacher.
Use the current production model instead:
dragonscale-ai/kniv-deberta-nlp-base-en-large
The current model offers significantly better quality across all tasks, includes Semantic Role Labeling and Dialog Act Classification heads, and has reproducible benchmarks against standard public test sets.
This repository is preserved for reproducibility and historical reference. No further updates, bug fixes, or evaluation runs are planned.
kniv-deberta-v3-large-nlp-en
Multi-task NLP teacher model for English: NER + POS tagging + dependency parsing + sentence classification in a single forward pass.
Part of the kniv-nlp-models project, powering the uniko cognitive memory system.
Model Details
| Property | Value |
|---|---|
| Base model | microsoft/deberta-v3-large |
| Parameters | 435M (24 layers, 1024 hidden) |
| Max sequence length | 128 tokens |
| Format | PyTorch + ONNX (FP32 + INT8) |
| Training data | kniv-corpus-en (gold-filtered) + UD English EWT |
| License | Apache-2.0 |
| Use | Server-side NLP; teacher for knowledge distillation |
Results
| Head | Task | Metric | Score |
|---|---|---|---|
| NER | Named entity recognition (18 types) | F1 | 0.725 |
| POS | Part-of-speech tagging (17 UPOS) | Accuracy | 0.984 |
| DEP | Dependency parsing (dep2label) | UAS | 0.871 |
| CLS | Dialog act classification (9 labels) | Macro F1 | 0.493 |
| Composite | | | 0.823 |
Architecture
Shared DeBERTa-v3-large encoder with four linear heads. One forward pass, four outputs.
DeBERTa-v3-large encoder (435M params, 24 layers, 1024 hidden)
+-- NER head: Linear(1024, 37) -- per-token BIO entity tags
+-- POS head: Linear(1024, 17) -- per-token UPOS tags
+-- Dep head: Linear(1024, 1411) -- per-token dep2label tags
+-- CLS head: Linear(1024, 9) -- per-sequence dialog act
NER Entity Types (18)
PERSON, ORG, GPE, LOC, DATE, TIME, MONEY, PERCENT, QUANTITY, ORDINAL, CARDINAL, NORP, FAC, PRODUCT, EVENT, WORK_OF_ART, LAW, LANGUAGE
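The 37 outputs of the NER head follow from a BIO scheme over these 18 types: one B- and one I- tag per type, plus O. A minimal sketch of that arithmetic is below. The label ordering it produces is an assumption for illustration only; the authoritative ordering is whatever label_maps.json contains.

```python
# Entity types exactly as listed in this model card.
ENTITY_TYPES = [
    "PERSON", "ORG", "GPE", "LOC", "DATE", "TIME", "MONEY", "PERCENT",
    "QUANTITY", "ORDINAL", "CARDINAL", "NORP", "FAC", "PRODUCT", "EVENT",
    "WORK_OF_ART", "LAW", "LANGUAGE",
]


def build_bio_labels(entity_types):
    """Return ["O", "B-X", "I-X", ...] for the given entity types.

    The ordering here is hypothetical; read the real one from label_maps.json.
    """
    labels = ["O"]
    for ent in entity_types:
        labels.append(f"B-{ent}")
        labels.append(f"I-{ent}")
    return labels


bio_labels = build_bio_labels(ENTITY_TYPES)
print(len(bio_labels))  # 37 = 2 * 18 + 1, matching Linear(1024, 37)
```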
CLS Dialog Act Labels (9)
inform, correction, agreement, question, plan_commit, request, feedback, social, filler
dep2label Encoding
Dependencies are encoded as per-token labels using the rel-pos scheme (Strzyz et al., 2019), in the form offset@relation@head-POS:
+1@nsubj@VERB -> "1st VERB to the right, relation=nsubj"
-2@det@NOUN -> "2nd NOUN to the left, relation=det"
0@root@ROOT -> "root of the sentence"
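As an illustration of how such labels could be resolved back into head indices, here is a minimal sketch. It assumes every label has exactly the offset@relation@POS shape of the three examples above and that a UPOS tag is available for each word (for example from the POS head). The function name and return format are hypothetical and not part of this repository.

```python
def decode_rel_pos(labels, upos):
    """Resolve rel-pos labels ("offset@relation@head_pos") to head indices.

    Returns one (head_index, relation) pair per word. head_index is 0-based,
    -1 for the root, and None when no matching head exists in the sentence.
    """
    arcs = []
    for i, label in enumerate(labels):
        offset_str, relation, head_pos = label.split("@")
        offset = int(offset_str)  # int() accepts a leading "+" or "-"
        if offset == 0:
            arcs.append((-1, relation))
            continue
        step = 1 if offset > 0 else -1
        remaining = abs(offset)
        head = None
        j = i + step
        # Walk left or right, counting words whose UPOS matches head_pos.
        while 0 <= j < len(upos):
            if upos[j] == head_pos:
                remaining -= 1
                if remaining == 0:
                    head = j
                    break
            j += step
        arcs.append((head, relation))
    return arcs


# Hand-written example for "The dog barked" (not model output).
upos = ["DET", "NOUN", "VERB"]
labels = ["+1@det@NOUN", "+1@nsubj@VERB", "0@root@ROOT"]
arcs = decode_rel_pos(labels, upos)
print(arcs)
```

Returning None for an unresolvable label keeps malformed predictions visible to the caller instead of silently attaching the word somewhere else.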
Training
Trained on gold-filtered kniv-corpus-en and UD English EWT:
- NER: 45,000 examples (gold-filtered, domain-balanced from 237K)
- POS + DEP: 12,544 examples (UD English EWT v2.14, expert-annotated)
- CLS: 57,544 examples (NER + UD combined, GPT-5.4-nano classified)
| Parameter | Value |
|---|---|
| Batch size | 64 |
| Learning rate | 1e-5 |
| Epochs | 5 |
| Precision | fp32 (gradient checkpointing) |
| Warmup | 10% |
| Loss weights | NER: 1.0, POS: 1.0, Dep: 1.0, CLS: 0.5 |
| Hardware | NVIDIA A100 40GB |
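To make the loss weights and warmup rows concrete, the sketch below assumes the total loss is a plain weighted sum of the per-head losses and that warmup is linear over the first 10% of steps. Neither detail is stated in this card, so both functions (and their names) are illustrative and should not be read as the actual training code.

```python
# Values taken from the hyperparameter table above.
LOSS_WEIGHTS = {"ner": 1.0, "pos": 1.0, "dep": 1.0, "cls": 0.5}
BASE_LR = 1e-5


def combine_losses(head_losses, weights=LOSS_WEIGHTS):
    """Weighted sum of per-head losses (assumed combination rule)."""
    return sum(weights[name] * loss for name, loss in head_losses.items())


def warmup_factor(step, total_steps, warmup_fraction=0.10):
    """Linear warmup multiplier for the learning rate (assumed shape).

    Ramps up to 1.0 over the first `warmup_fraction` of training and stays
    at 1.0 afterwards; the card does not describe any decay, so none is applied.
    """
    warmup_steps = max(1, round(total_steps * warmup_fraction))
    return min(1.0, (step + 1) / warmup_steps)


# Made-up per-head loss values, purely for illustration.
total = combine_losses({"ner": 0.8, "pos": 0.2, "dep": 1.0, "cls": 1.5})
lr_mid_warmup = BASE_LR * warmup_factor(49, total_steps=1000)
print(total, lr_mid_warmup)
```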
Usage
Python (ONNX Runtime)
import json

import numpy as np
import onnxruntime as ort
from transformers import AutoTokenizer

# Load the session, tokenizer, and label vocabularies
session = ort.InferenceSession("model-int8.onnx")
tokenizer = AutoTokenizer.from_pretrained(".")
with open("label_maps.json") as f:
    labels = json.load(f)

# Tokenize, padding and truncating to the model's 128-token maximum
text = "Caroline went to the hospital in New York."
enc = tokenizer(text, return_tensors="np", padding="max_length",
                truncation=True, max_length=128)

# Inference (single forward pass -> 4 outputs).
# Inputs are cast to int64, the usual dtype for exported transformer graphs.
outputs = session.run(None, {
    "input_ids": enc["input_ids"].astype(np.int64),
    "attention_mask": enc["attention_mask"].astype(np.int64),
})
ner_logits, pos_logits, dep_logits, cls_logits = outputs

# Decode NER, skipping padding positions
tokens = tokenizer.convert_ids_to_tokens(enc["input_ids"][0])
ner_preds = [labels["ner_labels"][i] for i in ner_logits[0].argmax(axis=-1)]
for tok, ner, keep in zip(tokens, ner_preds, enc["attention_mask"][0]):
    if keep and ner != "O":
        print(f"  {tok}: {ner}")
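The loop above prints one tag per subword token. A possible post-processing step is to merge consecutive B-/I- tags into entity spans. The sketch below is hypothetical helper code, assuming SentencePiece-style tokens where ▁ marks the start of a word. The tokens and tags in the example are hand-written for illustration and are not model output.

```python
def bio_to_spans(tokens, tags):
    """Group per-token BIO tags into (entity_type, start, end, text) spans.

    `end` is exclusive. An I- tag that does not continue a span of the same
    type is treated as the start of a new span.
    """
    spans = []
    current = None  # [entity_type, start, end] of the span being built
    for i, tag in enumerate(tags):
        prefix, _, ent = tag.partition("-")
        continues = prefix == "I" and current is not None and current[0] == ent
        if continues:
            current[2] = i + 1
            continue
        if current is not None:
            spans.append(tuple(current))
            current = None
        if prefix in ("B", "I"):
            current = [ent, i, i + 1]
    if current is not None:
        spans.append(tuple(current))
    # Rebuild surface text by joining subwords and turning "▁" into spaces.
    return [
        (ent, start, end, "".join(tokens[start:end]).replace("▁", " ").strip())
        for ent, start, end in spans
    ]


example_tokens = ["[CLS]", "▁Caroline", "▁went", "▁to", "▁New", "▁York", ".", "[SEP]"]
example_tags = ["O", "B-PERSON", "O", "O", "B-GPE", "I-GPE", "O", "O"]
entities = bio_to_spans(example_tokens, example_tags)
print(entities)
```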
Rust (ONNX Runtime)
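use ndarray::Array2;
use ort::Session;
use tokenizers::Tokenizer;

let session = Session::builder()?
    .with_optimization_level(ort::GraphOptimizationLevel::Level3)?
    .commit_from_file("model-int8.onnx")?;
let tokenizer = Tokenizer::from_file("tokenizer.json")?;
let encoding = tokenizer.encode("Caroline went to the hospital.", true)?;

// The tokenizer returns u32 values of variable length, while the tensors below
// are shaped (1, 128). Widen to i64 (the usual ONNX input dtype) and pad to 128;
// resize also truncates longer inputs. 0 is the [PAD] id in the DeBERTa-v3 vocabulary.
let mut ids: Vec<i64> = encoding.get_ids().iter().map(|&id| i64::from(id)).collect();
let mut mask: Vec<i64> = encoding.get_attention_mask().iter().map(|&m| i64::from(m)).collect();
ids.resize(128, 0);
mask.resize(128, 0);

let outputs = session.run(ort::inputs![
    Array2::from_shape_vec((1, 128), ids)?,
    Array2::from_shape_vec((1, 128), mask)?,
]?)?;
// outputs: ner_logits, pos_logits, dep_logits, cls_logits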
Files
| File | Size | Description |
|---|---|---|
| model.onnx | 1,663 MB | FP32 ONNX model |
| model-int8.onnx | 612 MB | INT8 quantized (dynamic) |
| model.pt | 1,670 MB | PyTorch weights |
| label_maps.json | <1 MB | NER/POS/DEP/CLS label vocabularies |
| tokenizer.json | 8 MB | DeBERTa-v3 tokenizer |
Important: Use This Model's Tokenizer
Always load the tokenizer from this repo, not from microsoft/deberta-v3-large. The upstream HuggingFace tokenizer may omit BOS/EOS special tokens, shifting all positions and producing incorrect results.
# Correct
tokenizer = AutoTokenizer.from_pretrained("dragonscale-ai/kniv-deberta-v3-large-nlp-en")
# WRONG — may omit special tokens
tokenizer = AutoTokenizer.from_pretrained("microsoft/deberta-v3-large")
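One way to guard against this is to check that an encoding starts and ends with the expected special tokens before running inference. The helper below is a hypothetical sketch that takes the ids as plain lists. The ids 1 and 2 in the example stand in for `tokenizer.cls_token_id` and `tokenizer.sep_token_id`; read the real values from the loaded tokenizer instead of hard-coding them.

```python
def has_boundary_tokens(input_ids, attention_mask, cls_id, sep_id):
    """Return True if the unpadded sequence starts with cls_id and ends with sep_id."""
    real = [tok for tok, keep in zip(input_ids, attention_mask) if keep == 1]
    return len(real) >= 2 and real[0] == cls_id and real[-1] == sep_id


# Made-up ids for illustration: 1 and 2 play the role of [CLS] and [SEP], 0 is padding.
good_ids, good_mask = [1, 500, 600, 2, 0, 0], [1, 1, 1, 1, 0, 0]
bad_ids, bad_mask = [500, 600, 0, 0, 0, 0], [1, 1, 0, 0, 0, 0]

ok = has_boundary_tokens(good_ids, good_mask, cls_id=1, sep_id=2)
missing = has_boundary_tokens(bad_ids, bad_mask, cls_id=1, sep_id=2)
print(ok, missing)
```

With a Hugging Face tokenizer, the call would look like `has_boundary_tokens(enc["input_ids"][0].tolist(), enc["attention_mask"][0].tolist(), tokenizer.cls_token_id, tokenizer.sep_token_id)`.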
Limitations
- English only
- Max 128 tokens — longer inputs truncated
- CLS labels are GPT-classified — not human-annotated, macro F1 reflects imbalanced rare labels
- Server-side model — 435M params, not for edge/mobile. Use the distilled student for that.
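For inputs beyond the 128-token limit, one common workaround is to split the token ids into overlapping windows and run each window separately. This is not something the repository provides. The sketch below is a hypothetical helper that assumes two positions per window are reserved for special tokens and that an overlap of 32 tokens is acceptable; merging per-window predictions is left out.

```python
def window_token_ids(token_ids, max_len=128, overlap=32, num_special=2):
    """Split token ids (without special tokens) into overlapping windows.

    Returns a list of (start_offset, ids) pairs. Each window holds at most
    max_len - num_special ids, leaving room for the special tokens.
    """
    body = max_len - num_special
    if not 0 <= overlap < body:
        raise ValueError("overlap must be non-negative and smaller than the window body")
    windows = []
    start = 0
    while True:
        windows.append((start, token_ids[start:start + body]))
        if start + body >= len(token_ids):
            break
        start += body - overlap
    return windows


# 300 dummy ids standing in for a long tokenized document.
windows = window_token_ids(list(range(300)))
print([(start, len(ids)) for start, ids in windows])
```

A larger overlap gives tokens near a window edge more context at the cost of extra forward passes.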
Source
- Code: rustic-ai/kniv-nlp-models
- Dataset: dragonscale-ai/kniv-corpus-en
Citation
@misc{kniv-deberta-v3-large-2026,
title={kniv-deberta-v3-large-nlp-en: Multi-task NLP Teacher Model},
author={Dragonscale Industries Inc.},
year={2026},
url={https://huggingface.co/dragonscale-ai/kniv-deberta-v3-large-nlp-en}
}