USAM: usam-full

Canonical recipe trained on the full 3-source mix (AGIBOT + BRIDGE + DROID).

USAM Stage B1-real pretrain checkpoints, variant usam-full. Architecture: Qwen3-VL-4B-Instruct backbone (LoRA r=16) + DINOv3 visual encoder + LDA flow-matching action head + USAM auxiliary heads (drift / subtask / depth-RGB geom).

Provenance

  • SLURM job: 66742
  • Run directory: runs/usam_real_qwen_droid-66742/
  • Source config: configs/train/stage_b1_droid_pretrain.yaml (mirrored as config.yaml in this repo)

Loss configuration

loss_weights:
  action: 1.0
  rgb: 1.0
  depth: 0.3
  drift: 0.1
  subtask: 0.1
  geom_target: 0.1            # geom-loss fixes have landed in code:
                              # - usam/aux_heads/depth_consistency.py: fp32 autocast wrap (no bf16 NaN)
                              # - usam/_train_helpers.py:947: geom_dim=mmdit_output_dim (1024 under LDA player)
                              # Per A00 spec in docs/ABLATION_STUDY.md.
  ramp_steps: 10000           # 10% warmup (vs. A0's ramp==max==50k regime, where
                              # aux losses NEVER reached nominal weight). With max=100k,
                              # aux heads get 90k steps of full-weight gradient signal,
                              # appropriate for a long flagship pretrain.
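
As a rough illustration of what ramp_steps implies, the sketch below scales the auxiliary weights linearly from 0 to their nominal values over the ramp. The linear shape, the choice to leave only the action weight unramped, and the helper name ramped_weights are all assumptions made for illustration; the actual schedule lives in the training code and may differ.

```python
# Nominal weights copied from the loss configuration above.
LOSS_WEIGHTS = {
    "action": 1.0,
    "rgb": 1.0,
    "depth": 0.3,
    "drift": 0.1,
    "subtask": 0.1,
    "geom_target": 0.1,
}
RAMP_STEPS = 10_000


def ramped_weights(step, weights=LOSS_WEIGHTS, ramp_steps=RAMP_STEPS,
                   unramped=("action",)):
    """Return per-loss weights at a given training step.

    ASSUMPTION: linear ramp from 0 to the nominal weight over `ramp_steps`,
    applied to every loss except those listed in `unramped`.
    """
    if ramp_steps <= 0:
        scale = 1.0
    else:
        # Clamp to [0, 1] so weights stay at nominal once the ramp ends.
        scale = min(1.0, max(0.0, step / ramp_steps))
    return {
        name: (w if name in unramped else w * scale)
        for name, w in weights.items()
    }


if __name__ == "__main__":
    for step in (0, 2_500, 5_000, 10_000, 50_000):
        print(step, ramped_weights(step))
```

Under this assumed schedule, the auxiliary losses reach their nominal weights exactly at step 10000 and stay there for the remainder of a 100k-step run.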

Saved checkpoints

Four checkpoints, one every 2,500 steps:

  • checkpoint_step00002500.pt

  • checkpoint_step00005000.pt

  • checkpoint_step00007500.pt

  • checkpoint_step00010000.pt

  • Latest step: 10000

  • Checkpoint format: trainable+buffers, i.e. a state_dict (LoRA + trainable adapters + buffers) plus full AdamW optimizer state, scheduler state, and run metadata. Every saved step is independently resumable for continued training.
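
Because every saved step is resumable, a resume script typically needs to pick the newest file in a local directory. The sketch below does that from the filename pattern shown in the list above (checkpoint_step followed by an 8-digit zero-padded step). The helper names checkpoint_name and latest_checkpoint are hypothetical and are not part of the USAM codebase.

```python
import re
from pathlib import Path

# Filename pattern inferred from the checkpoint list above:
# "checkpoint_step" + 8-digit zero-padded step + ".pt".
CKPT_PATTERN = re.compile(r"^checkpoint_step(\d{8})\.pt$")


def checkpoint_name(step):
    """Build the filename for a given step (hypothetical helper)."""
    return f"checkpoint_step{step:08d}.pt"


def latest_checkpoint(directory):
    """Return (step, path) for the highest-step checkpoint, or None.

    Files that do not match the pattern are ignored.
    """
    best = None
    for path in Path(directory).iterdir():
        match = CKPT_PATTERN.match(path.name)
        if match is None:
            continue
        step = int(match.group(1))
        if best is None or step > best[0]:
            best = (step, path)
    return best


if __name__ == "__main__":
    found = latest_checkpoint(".")
    print("no checkpoints found" if found is None else found)
```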

Ablation context

This repo is one variant in the USAM B1-real loss-ablation matrix. The seven variants isolate single-loss contributions:

| Variant | Repo | What it tests |
|---|---|---|
| Full + geom | usam-full-loss-geom | Upper bound: does depth-RGB geometric consistency help on top of the baseline? |
| Full (baseline) | usam-full-loss | Canonical recipe: action + rgb + depth + drift + subtask |
| Action only | usam-action-only | Lower bound: pure VLA action loss |
| No aux vision | usam-no-aux-vision | Does aux RGB + depth co-training help? |
| No USAM aux | usam-no-usam-aux | Do drift + subtask add lift beyond LDA-style co-training? |
| Drift only | usam-drift-only | Marginal contribution of drift alone |
| 3-source (DROID) | usam-full | Canonical recipe + DROID dataset (3-source full data mix) |

See docs/ABLATION_STUDY.md in the source repo for the full design.

Usage

import torch

ckpt = torch.load(
    "checkpoint_step00010000.pt",
    weights_only=False,
    map_location="cpu",
)
state_dict = ckpt["state_dict"]    # trainable + buffers only
step       = ckpt["step"]          # int
opt_state  = ckpt["optimizer"]     # for resume
sched      = ckpt["scheduler"]     # for resume

# Load into a freshly constructed USAM model (`model` must already exist;
# its construction is not shown here):
missing, unexpected = model.load_state_dict(state_dict, strict=False)
# `missing` will contain the frozen base-model keys (Qwen3-VL + DINOv3),
# which load from their respective HF base checkpoints. See
# usam/_train_helpers.py:2437-2453 for the reference loader.
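
Since strict=False silently tolerates missing keys, it can be worth checking that the missing ones really are frozen base-model weights. The sketch below is one way to do that on the two key lists returned by load_state_dict. The key prefixes ("backbone.", "visual_encoder.") and the "lora_" marker are hypothetical placeholders, not the real USAM module names; substitute the names your model actually uses.

```python
# HYPOTHETICAL key prefixes for the frozen Qwen3-VL and DINOv3 submodules.
FROZEN_PREFIXES = ("backbone.", "visual_encoder.")
# HYPOTHETICAL substring marking trainable LoRA weights, which should
# always be present in a trainable+buffers checkpoint.
TRAINABLE_MARKERS = ("lora_",)


def audit_load_result(missing, unexpected,
                      frozen_prefixes=FROZEN_PREFIXES,
                      trainable_markers=TRAINABLE_MARKERS):
    """Summarize the key lists returned by load_state_dict(strict=False).

    A missing key is flagged as suspicious if it lies outside the frozen
    submodules, or if it looks like a trainable (e.g. LoRA) weight.
    """
    prefixes = tuple(frozen_prefixes)
    suspicious = [
        key for key in missing
        if not key.startswith(prefixes)
        or any(marker in key for marker in trainable_markers)
    ]
    return {
        "frozen_missing": len(missing) - len(suspicious),
        "suspicious_missing": suspicious,
        "unexpected": list(unexpected),
    }


if __name__ == "__main__":
    # Toy key lists standing in for the `missing` / `unexpected` results.
    report = audit_load_result(
        ["backbone.layers.0.q_proj.weight", "action_head.proj.weight"],
        [],
    )
    print(report)
```

An empty suspicious_missing list and an empty unexpected list suggest the checkpoint and the model definition agree on every trainable parameter and buffer.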
