Nemotron-Terminal-8B Reproduction (Marin)

A reproduction of NVIDIA's Nemotron-Terminal-8B by the Marin community, supervised fine-tuned (SFT) on the full Nemotron-Terminal-Corpus (366K examples).

Results

| Benchmark | Nemotron-Terminal-8B (released) | This model (Marin SFT) | Diff |
|---|---|---|---|
| Terminal-Bench 2.0 | 13.0% ± 2.2 | 15.9% (14/88) | +2.9pp |
| Terminal-Bench Lite | 23.0% (23/100) | 29.0% (29/100) | +6.0pp |

On both benchmarks our reproduction outperforms the released weights, despite training on 366K examples versus the 490K reported in the paper.

Training Details

  • Base model: Qwen/Qwen3-8B
  • Training data: nvidia/Nemotron-Terminal-Corpus — 366K examples (skill_based_easy, skill_based_medium, skill_based_mixed, diverse_complex subsets; see the loading sketch after this list)
  • Steps: 5,721 (2 epochs)
  • Learning rate: 2e-5 (cosine schedule)
  • Batch size: 128
  • Sequence length: 32,768
  • Final loss: ~0.35
  • Hardware: TPU v5p-32
  • Framework: Marin (Levanter-based)
  • W&B run: exp3490b
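
For reference, here is a minimal sketch of assembling the training mixture and sanity-checking the step count against the hyperparameters above. This is not the actual training path (training ran through Marin/Levanter); treating the four subset names as Hub dataset configs of nvidia/Nemotron-Terminal-Corpus is an assumption.

```python
# Minimal sketch: build the SFT mixture from the four subsets and
# check that the stated batch size and epoch count reproduce the
# reported optimizer step count. Assumes the subsets are exposed as
# dataset configs on the Hub; adjust if they are splits instead.
import math

from datasets import concatenate_datasets, load_dataset

SUBSETS = [
    "skill_based_easy",
    "skill_based_medium",
    "skill_based_mixed",
    "diverse_complex",
]

parts = [
    load_dataset("nvidia/Nemotron-Terminal-Corpus", name=s, split="train")
    for s in SUBSETS
]
train = concatenate_datasets(parts)  # ~366K examples per the card

# With batch size 128 over 2 epochs, the card's 5,721 steps imply
# roughly 5721 * 128 / 2 ≈ 366K examples, matching the corpus size.
EPOCHS, BATCH_SIZE = 2, 128
steps = math.ceil(len(train) * EPOCHS / BATCH_SIZE)
print(steps)  # should come out close to 5,721
```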

Evaluation Setup

  • Inference: vLLM-TPU (native) on v5p-8; a minimal sampling sketch follows this list
  • Agent: Terminus-2 (temperature=0.6)
  • Environments: Daytona
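
As a reference point for the decoding setup, a minimal vLLM sketch at the card's temperature. The Terminus-2 agent harness and Daytona environments are not reproduced here; the prompt and max_tokens value are placeholders.

```python
# Minimal decoding sketch matching the card's sampling setup
# (temperature 0.6). The real evaluation drives the Terminus-2 agent
# against Daytona environments; this only shows the vLLM call.
from vllm import LLM, SamplingParams

llm = LLM(
    model="AlienKevin/nemotron-terminal-8b-repro",
    max_model_len=32768,  # matches the training sequence length
)
params = SamplingParams(temperature=0.6, max_tokens=2048)  # max_tokens is a placeholder

messages = [
    {"role": "user", "content": "List the three largest files under /var/log."}
]
outputs = llm.chat(messages, sampling_params=params)
print(outputs[0].outputs[0].text)
```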
