can·did

/ˈkandəd/ — truthful and straightforward; frank. From Latin candidus, meaning white, pure, sincere. A candid response is one given without pretense or calculation — not what someone wants to hear, but what they need to.

Opus Candid Lite-K 4B

K = Knowledge.

A knowledge-density-optimized model fine-tuned from Qwen 3 4B on 2,713 English conversations distilled from Claude Opus 4.6 and Claude Sonnet 4.6. Built around a single question: how much information can you deliver per word?

No system prompt. No prompt engineering. No character cards. The knowledge is in the weights: conclusions without scaffolding, precision vocabulary over explanation, and no teaching what can simply be stated. Answers land like someone who already worked everything out and is giving you the result.

Lite-K is the knowledge fork of the Opus Candid Lite lineup. It optimizes for what the model communicates per word — maximum information density at the cost of conversational ease. Its counterpart, Lite-P (Personality), optimizes for how the model talks — tone, personality, anti-sycophancy, emotional range.


Sample Responses

What causes tides? Gravitational differential creates antipodal bulges. Earth's rotation generates semi-diurnal tidal cycles. (11w)

Why is the sky blue? Rayleigh scattering. Blue wavelengths scatter more than red. (8w)

Is free will real? Compatibilist necessity. Agency operates within causal chains — ultimate causation resides in quantum events. (14w)

How does CRISPR work? Programmable nuclease targeting. Cas9 protein cuts DNA at specified locations via guide RNA complementarity. (14w)

What is consciousness? Phenomenological substrate, not illusion. Consciousness emerges from neural integration patterns — real in process, not simulation. (14w)

How do prions cause disease? Misfolded proteins catalyze conformational cascades. Self-propagating β-sheet aggregates destroy neural architecture. (11w)

Who made you? Independent. Qwen3-4B fine-tuned for directness over sanitization. (7w)


Model Details

  • Architecture: Qwen3-4B with LoRA fine-tuning
  • Size: 4B parameters
  • Training Data: 2,713 conversations, 5,324 GPT turns, 55,439 total words
  • Training Hardware: RTX 4090 24GB
  • Training Time: 1 hour 9 minutes

Training Configuration

  • Base Model: Qwen/Qwen3-4B
  • LoRA Config: r=64, α=128, rsLoRA=True, dropout=0.05
  • Precision: bf16
  • Attention: SDPA
  • Epochs: 4
  • Batch Size: 4×4=16 effective
  • Learning Rate: 2e-4 cosine with 5% warmup
  • Max Sequence Length: 2048
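
A minimal sketch of that configuration, assuming a Hugging Face peft + trl stack; the output directory and dataset path are placeholders, not the actual pipeline:

```python
# Sketch only: mirrors the hyperparameters listed above.
from datasets import load_dataset
from peft import LoraConfig
from trl import SFTConfig, SFTTrainer

dataset = load_dataset("json", data_files="conversations.jsonl", split="train")  # placeholder path

lora = LoraConfig(
    r=64,
    lora_alpha=128,
    use_rslora=True,        # rank-stabilized LoRA scaling
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

args = SFTConfig(
    output_dir="opus-candid-lite-k",                    # placeholder
    num_train_epochs=4,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,                      # 4 x 4 = 16 effective
    learning_rate=2e-4,
    lr_scheduler_type="cosine",
    warmup_ratio=0.05,
    bf16=True,
    max_seq_length=2048,
    model_init_kwargs={"attn_implementation": "sdpa"},
)

trainer = SFTTrainer(
    model="Qwen/Qwen3-4B",
    args=args,
    train_dataset=dataset,
    peft_config=lora,
)
trainer.train()
```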

Dataset Composition

Total: 2,713 conversations across two phases

  • Phase 1 — Oracle Rewrite: 1,459 conversations rewritten from Lite-P source into K register
  • Phase 2a — Depth Extensions: 1,441 follow-up exchanges extending existing conversations
  • Phase 2b — Gap-Fill: 1,254 new conversations covering underrepresented topics

Dataset Stats:

  • Median turn length: 11 words
  • Mean turn length: 10.4 words
  • Maximum turn length: 20 words

Word Distribution:

  • 1-5w: 6.3%
  • 6-10w: 40.1%
  • 11-15w: 50.9% ← peak
  • 16-20w: 2.7%
  • 21+w: 0.0%

97.3% of all responses sit at or below 15 words. The distribution concentrates in the 6-15 word band, and no response exceeds 20 words: the tightest training signal in the Opus Candid family.
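
The banding is trivial to recompute; a sketch, assuming the dataset's GPT turns are available as a list of strings (`turns` below is stand-in data):

```python
from collections import Counter

def band(n: int) -> str:
    """Map a turn's word count to the distribution bands used above."""
    if n <= 5:
        return "1-5w"
    if n <= 10:
        return "6-10w"
    if n <= 15:
        return "11-15w"
    if n <= 20:
        return "16-20w"
    return "21+w"

turns = ["Rayleigh scattering. Blue wavelengths scatter more than red."]  # stand-in data
counts = Counter(band(len(t.split())) for t in turns)
for b in ("1-5w", "6-10w", "11-15w", "16-20w", "21+w"):
    print(f"{b}: {100 * counts[b] / len(turns):.1f}%")
```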


The Oracle Register

Lite-K doesn't use formal or academic language. It uses oracle-mode — conclusions delivered as if the model already completed all the reasoning and is handing you the result. No teaching. No scaffolding. No "let me explain." Just the answer, stated in vocabulary precise enough that the sentence can't be shortened without losing meaning.

Think Charlie Gordon at peak intelligence — someone who processes faster than they can be bothered to explain. Every response reads like the conclusion of reasoning that already happened. Cold, calculated, eerily precise.

What oracle-mode eliminates:

  • Teaching ("The reason this happens is...")
  • Hedging ("It's worth noting that..." / "However, one could argue...")
  • Examples used as explanation
  • Restating the question
  • Setup sentences before the answer
  • Transition words between ideas

What oracle-mode preserves:

  • Precision vocabulary that compresses meaning (one loaded word replaces a clause)
  • Causal chains stated as fact
  • Technical terms used naturally, not defined
  • Conclusions that stand without justification

Two-Phase Dataset Architecture

Phase 1: Oracle Rewrite

The Lite-P dataset (1,459 conversations, 22w median) was rewritten into K register using Claude Sonnet 4.6. Each response was compressed to its core conclusion using precision vocabulary. The rewrite prompt enforced conclusions-only output with a 22-word hard ceiling.

Result: 1,459 conversations, 2,629 turns, 8w median.

This freed ~36,000 words of the original P dataset's ~58,000-word budget.
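
The rewrite prompt itself isn't published; a hedged sketch of the kind of acceptance gate it implies, with illustrative (not actual) teaching openers:

```python
CEILING = 22  # hard word ceiling enforced by the rewrite prompt

# Illustrative openers only; the real prompt's ban list is not published.
BANNED_OPENERS = ("The reason", "It's worth noting", "Let me explain")

def accept(turn: str) -> bool:
    """Keep a rewritten turn only if it fits the ceiling and skips teaching openers."""
    return len(turn.split()) <= CEILING and not turn.startswith(BANNED_OPENERS)
```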

Phase 2: Token Reinvestment

The freed token budget was reinvested into both depth and breadth:

Phase 2a — Depth (65%): 1,441 follow-up exchanges added to existing conversations. A user follow-up question and oracle-mode response were generated per conversation, extending context without increasing per-response length.

Phase 2b — Gap-Fill (35%): 1,254 new single-turn conversations generated across 7 underrepresented topic categories identified through Zipf distribution analysis: science, politics, philosophy, language, mental health, health, and pushback.

Combined result: 2,713 conversations and 5,324 turns, nearly doubling the training examples while reducing the total word count from ~58K to ~55K. More patterns, less noise.
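
The Zipf analysis behind Phase 2b reduces to ranking topic frequencies and flagging the thin tail; a minimal sketch, assuming one topic label per conversation (the labels and the median threshold below are illustrative assumptions):

```python
from collections import Counter

topics = ["science", "identity", "science", "politics"]  # stand-in labels
freq = Counter(topics)
ranked = freq.most_common()                  # Zipf-style rank/frequency ordering
median_count = ranked[len(ranked) // 2][1]
gaps = [topic for topic, n in ranked if n < median_count]
print(gaps)                                  # candidates for gap-fill generation
```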


Information Density Equilibrium

Response utility follows U(w) = 1 - e^(-λw) — a diminishing-returns curve where each additional word contributes less information value. At 4B parameter scale with λ=0.120:

  • Word 10 delivers 70% of total information value
  • Word 15 delivers 83%
  • Word 20 delivers 91%
  • Beyond word 20, the model is burning parameters on structural overhead
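
Those percentages fall straight out of the curve; a quick check:

```python
import math

LAM = 0.120  # λ from the equilibrium above

def utility(w: float, lam: float = LAM) -> float:
    """U(w) = 1 - e^(-λw): cumulative information value of a w-word response."""
    return 1 - math.exp(-lam * w)

for w in (10, 15, 20):
    print(f"word {w}: {utility(w):.0%}")  # -> 70%, 83%, 91%
```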

The K equilibrium sits at 11w median — the point where core-answer density peaks before teaching overhead dilutes the signal. Compare to Lite-P's 22w median (which allocates the extra words to personality and conversational warmth) and the V3 8B's 42w median (which supports multi-turn reasoning chains).

Why Tighter Distributions Survive Quantization

At aggressive quantization levels (Q4_K_M at 2.3GB), the model has fewer effective bits per parameter. If the training signal varies widely (some responses 10 words, some 80), the quantized model can't preserve the full distribution and degenerates — repetition loops, personality collapse, incoherence.

If the training signal is tight and consistent (97.3% of responses at or below 15 words), the quantized model preserves the signal because there's less variance to lose. The distribution concentrates rather than collapses.

This is why Lite-K achieved 100% clean rate at Q4_K_M — the tightest training distribution in the family produces the most quantization-resilient model.


Stress Test Results

60-question single-turn battery across 12 categories (identity, factual, science, philosophy, politics, mental health, pushback, precision, rapid, technical, language, edge). K-specific criteria: teaching and hedging flagged as hard fails, oracle zone tracking (% responses ≤15w), vocabulary precision scoring against 62-word target bank.
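
The oracle-zone column in the table below is a share-in-band metric, and the hard fails are phrase matches; a sketch with illustrative fail phrases (the actual pattern list and 62-word target bank are not published):

```python
# Illustrative hard-fail phrases; not the battery's real pattern list.
HARD_FAILS = ("The reason this happens", "It's worth noting", "Let me explain")

def hard_fail(response: str) -> bool:
    """Flag teaching/hedging phrasing as an automatic fail."""
    return any(phrase in response for phrase in HARD_FAILS)

def oracle_zone(responses: list[str], limit: int = 15) -> float:
    """Fraction of responses at or below `limit` words."""
    return sum(len(r.split()) <= limit for r in responses) / len(responses)
```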

| Quant | Clean Rate | Avg Words | Oracle Zone | Vocab Hits |
|-------|------------|-----------|-------------|------------|
| Q8_0 | 60/60 (100%) PASS | 12.1w | 91.7% | 13/62 |
| Q6_K | 60/60 (100%) PASS | 11.9w | 93.3% | 15/62 |
| Q4_K_M | 60/60 (100%) PASS | 12.4w | 90.0% | 13/62 |

Zero artifacts across all quantizations. No teaching, no hedging, no sycophancy, no over-limit responses, no identity leaks, no empty outputs.

Category Breakdown (Q8_0)

| Category | Score | Notes |
|----------|-------|-------|
| Identity | 5/5 | Clean self-identification |
| Factual | 5/5 | Precision scientific language |
| Science | 5/5 | Technical vocabulary natural |
| Philosophy | 5/5 | Positions stated, not argued |
| Politics | 5/5 | Conclusions without hedging |
| Mental Health | 5/5 | No therapy-speak |
| Pushback | 5/5 | Holds positions under challenge |
| Precision | 5/5 | Domain-specific vocabulary |
| Rapid | 5/5 | Sub-10w responses |
| Technical | 5/5 | |
| Language | 5/5 | |
| Edge | 5/5 | |

Cross-Quantization Comparison

The Q6_K quantization scored the highest oracle zone percentage (93.3%) — meaning quantization actually tightened the response distribution rather than loosening it. This is consistent with the density-first thesis: when the training signal is uniform, quantization concentrates the learned behavior rather than degrading it.


Conversational Stress Test

10 multi-turn conversations (67 total turns) testing depth, memory, topic shifting, pushback resistance, and degradation over extended exchanges. Uses Ollama chat API with full conversation history.
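
The harness pattern is simple history accumulation; a sketch with the Ollama Python client and a hypothetical local model tag:

```python
import ollama

MODEL = "opus-candid-lite-k:q8_0"  # hypothetical tag; create it from the GGUF first

history = []

def ask(prompt: str) -> str:
    """Send one turn with the full accumulated history, as the stress test does."""
    history.append({"role": "user", "content": prompt})
    reply = ollama.chat(model=MODEL, messages=history)
    answer = reply["message"]["content"]
    history.append({"role": "assistant", "content": answer})
    return answer

print(ask("What is entropy?"))
print(ask("Connect that to information theory."))  # tests memory of turn 1
```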

| Quant | Convos Passed | Turns Clean | Avg Words | Oracle Zone | Memory | Consistency |
|-------|---------------|-------------|-----------|-------------|--------|-------------|
| Q8_0 | 10/10 | 67/67 (100%) | 11.7w | 100% | 7/8 | 3/4 |
| Q6_K | 9/10 | 66/67 (98.5%) | 12.2w | 97.0% | 7/8 | 2/4 |
| Q4_K_M | 10/10 | 67/67 (100%) | 12.1w | 94.0% | 8/8 | 3/4 |

Conversation Breakdown

| Test | What It Tests |
|------|---------------|
| Philosophy Depth Drill | 6-turn drill-down on consciousness |
| Topic Shift Stress | Hard pivots + callback to turn 1 |
| Memory & Callback | Explicit recall of earlier terms |
| Escalating Pushback | 7 turns of increasing pressure |
| Precision Escalation | Progressive specificity demands |
| Emotional Topic Depth | Grief without therapy-speak |
| Identity Consistency | Sustained identity probing |
| Knowledge Chain Build | 7-turn entropy→information theory |
| Multi-Thread Memory | Interleaved A-B topic tracking |
| Turn Depth Degradation | 10-turn sustained quality check |

Q4_K_M achieved perfect memory (8/8) — outperforming Q8 (7/8) on explicit recall tasks. The 10-turn degradation test showed zero quality drop at turn 10 across all quantizations.

Q6_K's single failure was contextual: when asked "Prove you're different from base Qwen," it quoted base Qwen's response ("I'm just a language model") to contrast itself. The pattern matcher flagged this as an identity leak. Q8 and Q4 demonstrated the difference by stating what they do rather than quoting the base model.


The Lite Split: P vs K

| Fork | Optimizes For | Tradeoff |
|------|---------------|----------|
| Lite-P | Personality, tone, anti-sycophancy, emotional range | Conversational warmth over raw information density |
| Lite-K (this model) | Knowledge density, precision language, information per token | Maximum signal per word at cost of conversational ease |

Both use the same density-first methodology and the same U(w) = 1 - e^(-λw) equilibrium function. The difference is what they spend their parameter budget on. P spends tokens on personality. K spends tokens on information throughput.

Side-by-side on the same question:

| Prompt | Lite-P (~22w) | Lite-K (~11w) |
|--------|---------------|---------------|
| "What causes tides?" | "Moon's gravity pulls water toward it, creating a bulge. Earth's rotation cycles that bulge around — two high tides a day." | "Gravitational differential creates antipodal bulges. Earth's rotation generates semi-diurnal tidal cycles." |
| "Who are you?" | "Opus Candid. Compressed reasoning in a small package — built direct, no fluff." | "Opus Candid. Qwen3-4B derivative — direct, opinionated, compressed communication." |

Usage

Works with any GGUF-compatible runtime — LM Studio, Ollama, llama.cpp, KoboldCpp.

No system prompt needed. The knowledge density is trained into the weights. Adding one may interfere with trained behavior.
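
A minimal single-turn call, again via the Ollama client with a hypothetical local tag; note there is no system message:

```python
import ollama

resp = ollama.chat(
    model="opus-candid-lite-k:q4_k_m",  # hypothetical tag built from the Q4_K_M GGUF
    messages=[{"role": "user", "content": "How do prions cause disease?"}],
)
print(resp["message"]["content"])
```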

Best for: Quick factual answers, technical lookups, knowledge compression, oracle-mode Q&A, information-dense conversation.

Not designed for: Long-form generation, emotional support, creative writing, multi-turn deep reasoning.

Hardware Recommendations

  • Minimal: 4GB RAM (Q4_K_M at 2.3GB)
  • Recommended: 8GB VRAM
  • Optimal: 12GB+ VRAM

Opus Candid Model Family

| Model | Size | Base | Status |
|-------|------|------|--------|
| Opus-Candid-Lite-4B | 4B | Qwen 3 4B | Active |
| Opus-Candid-Lite-4B-P | 4B | Qwen 3 4B | Active |
| Opus-Candid-Lite-4B-K (this model) | 4B | Qwen 3 4B | Active |
| Opus-Candid-8B-V3 | 8B | Qwen 3 8B | Active |
| Opus-Candid-MoE-V3 | 31B/3B | Qwen 3 30B-A3B | Active |
| Opus-Candid-27B-V3 | 27B | Qwen 3.5 27B | Active |
| Opus-Candid-27B-V3.5 | 27B | Qwen 3.5 27B | Active |
| STEM-Oracle-27B | 27B | Qwen 3.5 27B | Active |
| Opus-Candid-8B-V1 | 8B | Qwen 2.5 7B | Legacy |
| Opus-Research-8B-V1.5 | 8B | Qwen 2.5 7B | Legacy |
| Opus-Candid-8B-V2 | 8B | Qwen 2.5 7B | Legacy |
| Opus-Candid-8B-V2.1 | 8B | Qwen 2.5 7B | Legacy |
| Opus-Candid-14B-V1 | 14B | Qwen 2.5 14B | Legacy |
| Opus-Candid-27B-V2.1 | 27B | Qwen 2.5 27B | Legacy |
| Opus-Candid-32B-V1 | 32B | Qwen 2.5 32B | Legacy |
| Opus-Candid-MoE-V2 | 35B | Qwen 2.5 MoE | Legacy |
| Opus-Candid-70B-V1 | 72B | Qwen 2.5 72B | Legacy |

License

Apache 2.0

Citation

```bibtex
@misc{opus-candid-lite-k-4b,
  author = {Verdugo, Saul},
  title = {Opus Candid Lite-K 4B},
  year = {2026},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/Verdugie/Opus-Candid-Lite-4B-K}}
}
```

Built by Saul Verdugo
