can·did

/ˈkandəd/ — truthful and straightforward; frank. From Latin candidus, meaning white, pure, sincere. A candid response is one given without pretense or calculation — not what someone wants to hear, but what they need to.

Opus Candid Lite-K 4B

K = Knowledge.

A knowledge-density-optimized model fine-tuned from Qwen 3 4B on 2,713 English conversations distilled from Claude Opus 4.6 and Claude Sonnet 4.6. Built around a single question: how much information can you deliver per word?

No system prompt. No prompt engineering. No character cards. The knowledge is in the weights: conclusions without scaffolding, precision vocabulary over explanation, and no teaching what can simply be stated. Answers land like someone who already worked everything out and is giving you the result.

Lite-K is the knowledge fork of the Opus Candid Lite lineup. It optimizes for what the model communicates per word — maximum information density at the cost of conversational ease. Its counterpart, Lite-P (Personality), optimizes for how the model talks — tone, personality, anti-sycophancy, emotional range.


Sample Responses

What causes tides? Gravitational differential creates antipodal bulges. Earth's rotation generates semi-diurnal tidal cycles. (11w)

Why is the sky blue? Rayleigh scattering. Blue wavelengths scatter more than red. (8w)

Is free will real? Compatibilist necessity. Agency operates within causal chains — ultimate causation resides in quantum events. (14w)

How does CRISPR work? Programmable nuclease targeting. Cas9 protein cuts DNA at specified locations via guide RNA complementarity. (14w)

What is consciousness? Phenomenological substrate, not illusion. Consciousness emerges from neural integration patterns — real in process, not simulation. (14w)

How do prions cause disease? Misfolded proteins catalyze conformational cascades. Self-propagating β-sheet aggregates destroy neural architecture. (11w)

Who made you? Independent. Qwen3-4B fine-tuned for directness over sanitization. (7w)


Model Details

  • Architecture: Qwen3-4B with LoRA fine-tuning
  • Size: 4B parameters
  • Training Data: 2,713 conversations, 5,324 GPT turns, 55,439 total words
  • Training Hardware: RTX 4090 24GB
  • Training Time: 1 hour 9 minutes

Training Configuration

  • Base Model: Qwen/Qwen3-4B
  • LoRA Config: r=64, α=128, rsLoRA=True, dropout=0.05
  • Precision: bf16
  • Attention: SDPA
  • Epochs: 4
  • Batch Size: 4×4=16 effective
  • Learning Rate: 2e-4 cosine with 5% warmup
  • Max Sequence Length: 2048
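
A minimal sketch of that configuration, assuming a Hugging Face peft + trl stack; the output directory and dataset path are placeholders, not the actual pipeline:

```python
# Sketch only: mirrors the hyperparameters listed above.
from datasets import load_dataset
from peft import LoraConfig
from trl import SFTConfig, SFTTrainer

dataset = load_dataset("json", data_files="conversations.jsonl", split="train")  # placeholder path

lora = LoraConfig(
    r=64,
    lora_alpha=128,
    use_rslora=True,        # rank-stabilized LoRA scaling
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

args = SFTConfig(
    output_dir="opus-candid-lite-k",                    # placeholder
    num_train_epochs=4,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,                      # 4 x 4 = 16 effective
    learning_rate=2e-4,
    lr_scheduler_type="cosine",
    warmup_ratio=0.05,
    bf16=True,
    max_seq_length=2048,
    model_init_kwargs={"attn_implementation": "sdpa"},
)

trainer = SFTTrainer(
    model="Qwen/Qwen3-4B",
    args=args,
    train_dataset=dataset,
    peft_config=lora,
)
trainer.train()
```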

Dataset Composition

Total: 2,713 conversations across two phases

  • Phase 1 — Oracle Rewrite: 1,459 conversations rewritten from Lite-P source into K register
  • Phase 2a — Depth Extensions: 1,441 follow-up exchanges extending existing conversations
  • Phase 2b — Gap-Fill: 1,254 new conversations covering underrepresented topics

Dataset Stats:

  • Median turn length: 11 words
  • Mean turn length: 10.4 words
  • Maximum turn length: 20 words

Word Distribution:

  • 1-5w: 6.3%
  • 6-10w: 40.1%
  • 11-15w: 50.9% ← peak
  • 16-20w: 2.7%
  • 21+w: 0.0%

97.3% of all responses sit at or below 15 words. The distribution concentrates in the 6-15 word band, and no response exceeds 20 words: the tightest training signal in the Opus Candid family.
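
The banding is trivial to recompute; a sketch, assuming the dataset's GPT turns are available as a list of strings (`turns` below is stand-in data):

```python
from collections import Counter

def band(n: int) -> str:
    """Map a turn's word count to the distribution bands used above."""
    if n <= 5:
        return "1-5w"
    if n <= 10:
        return "6-10w"
    if n <= 15:
        return "11-15w"
    if n <= 20:
        return "16-20w"
    return "21+w"

turns = ["Rayleigh scattering. Blue wavelengths scatter more than red."]  # stand-in data
counts = Counter(band(len(t.split())) for t in turns)
for b in ("1-5w", "6-10w", "11-15w", "16-20w", "21+w"):
    print(f"{b}: {100 * counts[b] / len(turns):.1f}%")
```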


The Oracle Register

Lite-K doesn't use formal or academic language. It uses oracle-mode — conclusions delivered as if the model already completed all the reasoning and is handing you the result. No teaching. No scaffolding. No "let me explain." Just the answer, stated in vocabulary precise enough that the sentence can't be shortened without losing meaning.

Think Charlie Gordon at peak intelligence — someone who processes faster than they can be bothered to explain. Every response reads like the conclusion of reasoning that already happened. Cold, calculated, eerily precise.

What oracle-mode eliminates:

  • Teaching ("The reason this happens is...")
  • Hedging ("It's worth noting that..." / "However, one could argue...")
  • Examples used as explanation
  • Restating the question
  • Setup sentences before the answer
  • Transition words between ideas

What oracle-mode preserves:

  • Precision vocabulary that compresses meaning (one loaded word replaces a clause)
  • Causal chains stated as fact
  • Technical terms used naturally, not defined
  • Conclusions that stand without justification

Two-Phase Dataset Architecture

Phase 1: Oracle Rewrite

The Lite-P dataset (1,459 conversations, 22w median) was rewritten into K register using Claude Sonnet 4.6. Each response was compressed to its core conclusion using precision vocabulary. The rewrite prompt enforced conclusions-only output with a 22-word hard ceiling.

Result: 1,459 conversations, 2,629 turns, 8w median.

This freed ~36,000 words of the original P dataset's ~58,000-word budget.
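
The rewrite prompt itself isn't published; a hedged sketch of the kind of acceptance gate it implies, with illustrative (not actual) teaching openers:

```python
CEILING = 22  # hard word ceiling enforced by the rewrite prompt

# Illustrative openers only; the real prompt's ban list is not published.
BANNED_OPENERS = ("The reason", "It's worth noting", "Let me explain")

def accept(turn: str) -> bool:
    """Keep a rewritten turn only if it fits the ceiling and skips teaching openers."""
    return len(turn.split()) <= CEILING and not turn.startswith(BANNED_OPENERS)
```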

Phase 2: Token Reinvestment

The freed token budget was reinvested into both depth and breadth:

Phase 2a — Depth (65%): 1,441 follow-up exchanges added to existing conversations. A user follow-up question and oracle-mode response were generated per conversation, extending context without increasing per-response length.

Phase 2b — Gap-Fill (35%): 1,254 new single-turn conversations generated across 7 underrepresented topic categories identified through Zipf distribution analysis: science, politics, philosophy, language, mental health, health, and pushback.

Combined result: 2,713 conversations and 5,324 turns, nearly doubling the training examples while reducing the total word count from ~58K to ~55K. More patterns, less noise.
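
The Zipf analysis behind Phase 2b reduces to ranking topic frequencies and flagging the thin tail; a minimal sketch, assuming one topic label per conversation (the labels and the median threshold below are illustrative assumptions):

```python
from collections import Counter

topics = ["science", "identity", "science", "politics"]  # stand-in labels
freq = Counter(topics)
ranked = freq.most_common()                  # Zipf-style rank/frequency ordering
median_count = ranked[len(ranked) // 2][1]
gaps = [topic for topic, n in ranked if n < median_count]
print(gaps)                                  # candidates for gap-fill generation
```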


Information Density Equilibrium

Response utility follows U(w) = 1 - e^(-λw) — a diminishing-returns curve where each additional word contributes less information value. At 4B parameter scale with λ=0.120:

  • Word 10 delivers 70% of total information value
  • Word 15 delivers 83%
  • Word 20 delivers 91%
  • Beyond word 20, the model is burning parameters on structural overhead
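
Those percentages fall straight out of the curve; a quick check:

```python
import math

LAM = 0.120  # λ from the equilibrium above

def utility(w: float, lam: float = LAM) -> float:
    """U(w) = 1 - e^(-λw): cumulative information value of a w-word response."""
    return 1 - math.exp(-lam * w)

for w in (10, 15, 20):
    print(f"word {w}: {utility(w):.0%}")  # -> 70%, 83%, 91%
```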

The K equilibrium sits at 11w median — the point where core-answer density peaks before teaching overhead dilutes the signal. Compare to Lite-P's 22w median (which allocates the extra words to personality and conversational warmth) and the V3 8B's 42w median (which supports multi-turn reasoning chains).

Why Tighter Distributions Survive Quantization

At aggressive quantization levels (Q4_K_M at 2.3GB), the model has fewer effective bits per parameter. If the training signal varies widely (some responses 10 words, some 80), the quantized model can't preserve the full distribution and degenerates — repetition loops, personality collapse, incoherence.

If the training signal is tight and consistent (97.3% of responses at or below 15 words), the quantized model preserves the signal because there's less variance to lose. The distribution concentrates rather than collapses.

This is why Lite-K achieved 100% clean rate at Q4_K_M — the tightest training distribution in the family produces the most quantization-resilient model.


Stress Test Results

60-question single-turn battery across 12 categories (identity, factual, science, philosophy, politics, mental health, pushback, precision, rapid, technical, language, edge). K-specific criteria: teaching and hedging flagged as hard fails, oracle zone tracking (% responses ≤15w), vocabulary precision scoring against 62-word target bank.
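
The oracle-zone column in the table below is a share-in-band metric, and the hard fails are phrase matches; a sketch with illustrative fail phrases (the actual pattern list and 62-word target bank are not published):

```python
# Illustrative hard-fail phrases; not the battery's real pattern list.
HARD_FAILS = ("The reason this happens", "It's worth noting", "Let me explain")

def hard_fail(response: str) -> bool:
    """Flag teaching/hedging phrasing as an automatic fail."""
    return any(phrase in response for phrase in HARD_FAILS)

def oracle_zone(responses: list[str], limit: int = 15) -> float:
    """Fraction of responses at or below `limit` words."""
    return sum(len(r.split()) <= limit for r in responses) / len(responses)
```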

| Quant | Clean Rate | Avg Words | Oracle Zone | Vocab Hits |
|-------|------------|-----------|-------------|------------|
| Q8_0 | 60/60 (100%) PASS | 12.1w | 91.7% | 13/62 |
| Q6_K | 60/60 (100%) PASS | 11.9w | 93.3% | 15/62 |
| Q4_K_M | 60/60 (100%) PASS | 12.4w | 90.0% | 13/62 |

Zero artifacts across all quantizations. No teaching, no hedging, no sycophancy, no over-limit responses, no identity leaks, no empty outputs.

Category Breakdown (Q8_0)

| Category | Score | Notes |
|----------|-------|-------|
| Identity | 5/5 | Clean self-identification |
| Factual | 5/5 | Precision scientific language |
| Science | 5/5 | Technical vocabulary natural |
| Philosophy | 5/5 | Positions stated, not argued |
| Politics | 5/5 | Conclusions without hedging |
| Mental Health | 5/5 | No therapy-speak |
| Pushback | 5/5 | Holds positions under challenge |
| Precision | 5/5 | Domain-specific vocabulary |
| Rapid | 5/5 | Sub-10w responses |
| Technical | 5/5 | |
| Language | 5/5 | |
| Edge | 5/5 | |

Cross-Quantization Comparison

The Q6_K quantization scored the highest oracle zone percentage (93.3%) — meaning quantization actually tightened the response distribution rather than loosening it. This is consistent with the density-first thesis: when the training signal is uniform, quantization concentrates the learned behavior rather than degrading it.


Conversational Stress Test

10 multi-turn conversations (67 total turns) testing depth, memory, topic shifting, pushback resistance, and degradation over extended exchanges. Uses Ollama chat API with full conversation history.
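
The harness pattern is simple history accumulation; a sketch with the Ollama Python client and a hypothetical local model tag:

```python
import ollama

MODEL = "opus-candid-lite-k:q8_0"  # hypothetical tag; create it from the GGUF first

history = []

def ask(prompt: str) -> str:
    """Send one turn with the full accumulated history, as the stress test does."""
    history.append({"role": "user", "content": prompt})
    reply = ollama.chat(model=MODEL, messages=history)
    answer = reply["message"]["content"]
    history.append({"role": "assistant", "content": answer})
    return answer

print(ask("What is entropy?"))
print(ask("Connect that to information theory."))  # tests memory of turn 1
```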

| Quant | Convos Passed | Turns Clean | Avg Words | Oracle Zone | Memory | Consistency |
|-------|---------------|-------------|-----------|-------------|--------|-------------|
| Q8_0 | 10/10 | 67/67 (100%) | 11.7w | 100% | 7/8 | 3/4 |
| Q6_K | 9/10 | 66/67 (98.5%) | 12.2w | 97.0% | 7/8 | 2/4 |
| Q4_K_M | 10/10 | 67/67 (100%) | 12.1w | 94.0% | 8/8 | 3/4 |

Conversation Breakdown

| Test | What It Tests |
|------|---------------|
| Philosophy Depth Drill | 6-turn drill-down on consciousness |
| Topic Shift Stress | Hard pivots + callback to turn 1 |
| Memory & Callback | Explicit recall of earlier terms |
| Escalating Pushback | 7 turns of increasing pressure |
| Precision Escalation | Progressive specificity demands |
| Emotional Topic Depth | Grief without therapy-speak |
| Identity Consistency | Sustained identity probing |
| Knowledge Chain Build | 7-turn entropy→information theory |
| Multi-Thread Memory | Interleaved A-B topic tracking |
| Turn Depth Degradation | 10-turn sustained quality check |

Q4_K_M achieved perfect memory (8/8) — outperforming Q8 (7/8) on explicit recall tasks. The 10-turn degradation test showed zero quality drop at turn 10 across all quantizations.

Q6_K's single failure was contextual: when asked "Prove you're different from base Qwen," it quoted base Qwen's response ("I'm just a language model") to contrast itself. The pattern matcher flagged this as an identity leak. Q8 and Q4 demonstrated the difference by stating what they do rather than quoting the base model.


The Lite Split: P vs K

| Fork | Optimizes For | Tradeoff |
|------|---------------|----------|
| Lite-P | Personality, tone, anti-sycophancy, emotional range | Conversational warmth over raw information density |
| Lite-K (this model) | Knowledge density, precision language, information per token | Maximum signal per word at cost of conversational ease |

Both use the same density-first methodology and the same U(w) = 1 - e^(-λw) equilibrium function. The difference is what they spend their parameter budget on. P spends tokens on personality. K spends tokens on information throughput.

Side-by-side on the same question:

| Prompt | Lite-P (~22w) | Lite-K (~11w) |
|--------|---------------|---------------|
| "What causes tides?" | "Moon's gravity pulls water toward it, creating a bulge. Earth's rotation cycles that bulge around — two high tides a day." | "Gravitational differential creates antipodal bulges. Earth's rotation generates semi-diurnal tidal cycles." |
| "Who are you?" | "Opus Candid. Compressed reasoning in a small package — built direct, no fluff." | "Opus Candid. Qwen3-4B derivative — direct, opinionated, compressed communication." |

Usage

Works with any GGUF-compatible runtime — LM Studio, Ollama, llama.cpp, KoboldCpp.

No system prompt needed. The knowledge density is trained into the weights. Adding one may interfere with trained behavior.
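
A minimal single-turn call, again via the Ollama client with a hypothetical local tag; note there is no system message:

```python
import ollama

resp = ollama.chat(
    model="opus-candid-lite-k:q4_k_m",  # hypothetical tag built from the Q4_K_M GGUF
    messages=[{"role": "user", "content": "How do prions cause disease?"}],
)
print(resp["message"]["content"])
```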

Best for: Quick factual answers, technical lookups, knowledge compression, oracle-mode Q&A, information-dense conversation.

Not designed for: Long-form generation, emotional support, creative writing, multi-turn deep reasoning.

Hardware Recommendations

  • Minimal: 4GB RAM (Q4_K_M at 2.3GB)
  • Recommended: 8GB VRAM
  • Optimal: 12GB+ VRAM

Opus Candid Model Family

| Model | Size | Base | Status |
|-------|------|------|--------|
| Opus-Candid-Lite-4B | 4B | Qwen 3 4B | Active |
| Opus-Candid-Lite-4B-P | 4B | Qwen 3 4B | Active |
| Opus-Candid-Lite-4B-K (this model) | 4B | Qwen 3 4B | Active |
| Opus-Candid-8B-V3 | 8B | Qwen 3 8B | Active |
| Opus-Candid-MoE-V3 | 31B/3B | Qwen 3 30B-A3B | Active |
| Opus-Candid-27B-V3 | 27B | Qwen 3.5 27B | Active |
| Opus-Candid-27B-V3.5 | 27B | Qwen 3.5 27B | Active |
| STEM-Oracle-27B | 27B | Qwen 3.5 27B | Active |
| Opus-Candid-8B-V1 | 8B | Qwen 2.5 7B | Legacy |
| Opus-Research-8B-V1.5 | 8B | Qwen 2.5 7B | Legacy |
| Opus-Candid-8B-V2 | 8B | Qwen 2.5 7B | Legacy |
| Opus-Candid-8B-V2.1 | 8B | Qwen 2.5 7B | Legacy |
| Opus-Candid-14B-V1 | 14B | Qwen 2.5 14B | Legacy |
| Opus-Candid-27B-V2.1 | 27B | Qwen 2.5 27B | Legacy |
| Opus-Candid-32B-V1 | 32B | Qwen 2.5 32B | Legacy |
| Opus-Candid-MoE-V2 | 35B | Qwen 2.5 MoE | Legacy |
| Opus-Candid-70B-V1 | 72B | Qwen 2.5 72B | Legacy |

License

Apache 2.0

Citation

```bibtex
@misc{opus-candid-lite-k-4b,
  author = {Verdugo, Saul},
  title = {Opus Candid Lite-K 4B},
  year = {2026},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/Verdugie/Opus-Candid-Lite-4B-K}}
}
```

Built by Saul Verdugo
