USAM — usam-full

Canonical recipe trained on the full 3-source mix (AGIBOT + BRIDGE + DROID).

USAM Stage B1-real pretrain checkpoints, variant usam-full. Architecture: Qwen3-VL-4B-Instruct backbone (LoRA r=16) + DINOv3 visual encoder + LDA flow-matching action head + USAM auxiliary heads (drift / subtask / depth-RGB geom).
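The card names an "LDA flow-matching action head" but does not spell out its objective. For orientation only, the sketch below shows a generic conditional flow-matching training target on the common linear (rectified-flow) path. The interpolation convention, the time sampling, the conditioning, and the names `flow_matching_target` / `flow_matching_loss` are all assumptions, not LDA's actual implementation.

```python
import numpy as np


def flow_matching_target(actions, noise, t):
    """Generic linear-path flow-matching pair (assumed convention, not LDA's code).

    x_t interpolates from pure noise (t=0) to the clean action chunk (t=1);
    the regression target is the constant velocity dx_t/dt = actions - noise.
    """
    # Reshape t so one time value per sample broadcasts over the remaining dims.
    t = np.asarray(t, dtype=actions.dtype).reshape(-1, *([1] * (actions.ndim - 1)))
    x_t = (1.0 - t) * noise + t * actions
    v_target = actions - noise
    return x_t, v_target


def flow_matching_loss(pred_velocity, v_target):
    """Mean squared error between predicted and target velocity."""
    return float(np.mean((pred_velocity - v_target) ** 2))


# Toy usage: 2 action chunks, 4 timesteps each, 7-D actions (shapes are illustrative).
rng = np.random.default_rng(0)
actions = rng.standard_normal((2, 4, 7))
noise = rng.standard_normal((2, 4, 7))
t = rng.uniform(size=2)  # one time value per sample
x_t, v_target = flow_matching_target(actions, noise, t)
# Hypothetically the head predicts v from (x_t, t, backbone features); here a zero guess.
loss = flow_matching_loss(np.zeros_like(v_target), v_target)
```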

Provenance

SLURM job: 66742
Run directory: runs/usam_real_qwen_droid-66742/
Source config: configs/train/stage_b1_droid_pretrain.yaml (mirrored as config.yaml in this repo)

Loss configuration

```yaml
loss_weights:
  action: 1.0
  rgb: 1.0
  depth: 0.3
  drift: 0.1
  subtask: 0.1
  geom_target: 0.1            # geom-loss fixes have landed in code:
                              # - usam/aux_heads/depth_consistency.py: fp32 autocast wrap (no bf16 NaNs)
                              # - usam/_train_helpers.py:947: geom_dim=mmdit_output_dim (1024 under LDA player)
                              # Per the A00 spec in docs/ABLATION_STUDY.md.
  ramp_steps: 10000           # 10% warmup. In A0, ramp == max == 50k, so the aux losses
                              # never reached their nominal weight. With max = 100k, the aux
                              # heads get 90k steps of full-weight gradient signal, which
                              # suits a long flagship pretrain.
```
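The config gives only the nominal weights and `ramp_steps`; it does not say how the ramp is shaped or exactly which terms it scales. The sketch below assumes a linear ramp from 0 to the nominal weight over `ramp_steps`, applied to every auxiliary term (everything except `action`), with the total loss formed as a weighted sum. `aux_weight` and `total_loss` are hypothetical helpers, not functions from the USAM codebase.

```python
# Nominal weights copied from the loss_weights block above.
NOMINAL = {
    "action": 1.0,
    "rgb": 1.0,
    "depth": 0.3,
    "drift": 0.1,
    "subtask": 0.1,
    "geom_target": 0.1,
}
RAMP_STEPS = 10_000
MAX_STEPS = 100_000  # "max=100k" from the config comment


def aux_weight(name, step, ramp_steps=RAMP_STEPS):
    """Effective weight at `step` (assumed: linear ramp on auxiliary terms only)."""
    w = NOMINAL[name]
    if name == "action" or ramp_steps <= 0:
        return w  # the main action loss is assumed to be unramped
    return w * min(1.0, step / ramp_steps)


def total_loss(per_term, step):
    """Weighted sum of per-term scalar losses; terms absent from `per_term` are skipped."""
    return sum(aux_weight(k, step) * v for k, v in per_term.items())


# Steps during which the aux terms sit at their nominal weight.
full_weight_steps = MAX_STEPS - RAMP_STEPS
```

Under this assumption, the A0 regime (ramp equal to max) means each aux weight only approaches its nominal value as training ends, which is the problem the 10% warmup avoids.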

Saved checkpoints

4 checkpoints, every 2500 steps:

checkpoint_step00002500.pt
checkpoint_step00005000.pt
checkpoint_step00007500.pt
checkpoint_step00010000.pt
Latest step: 10000
Checkpoint format: trainable+buffers. Each file holds a state_dict (LoRA weights, trainable adapters, and buffers) plus the full AdamW optimizer state, scheduler state, and run metadata, so every saved step can be resumed independently for continued training.
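The file names follow a zero-padded `checkpoint_step{step:08d}.pt` pattern. To pick the newest file in a directory before resuming, one option is to parse the step out of each name, as in this sketch. `checkpoint_name` and `latest_checkpoint` are hypothetical helpers, and the 8-digit padding is inferred from the four file names listed above.

```python
import re
from pathlib import Path

# Pattern inferred from the listed files, e.g. checkpoint_step00010000.pt.
CKPT_RE = re.compile(r"^checkpoint_step(\d+)\.pt$")


def checkpoint_name(step):
    """Build the file name for a given step (assumed 8-digit zero padding)."""
    return f"checkpoint_step{step:08d}.pt"


def latest_checkpoint(names):
    """Return (step, name) of the highest-step checkpoint among `names`, or None."""
    found = []
    for name in names:
        m = CKPT_RE.match(Path(name).name)
        if m:
            found.append((int(m.group(1)), name))
    return max(found) if found else None


# Saves every 2500 steps up to step 10000.
listed = [checkpoint_name(s) for s in range(2500, 10_001, 2500)]
```

Comparing parsed integers instead of raw strings keeps the ordering correct even if a longer run ever outgrows the padding width.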

Ablation context

This repo is one variant in the USAM B1-real loss-ablation matrix. Six of the seven variants isolate single-loss contributions; the `usam-full` row (this repo) instead changes the data mix by adding DROID:

| Variant | Repo | What it tests |
|---|---|---|
| Full + geom | `usam-full-loss-geom` | Upper bound: does depth-RGB geometric consistency help on top of the baseline? |
| Full (baseline) | `usam-full-loss` | Canonical recipe: `action + rgb + depth + drift + subtask` |
| Action only | `usam-action-only` | Lower bound: pure VLA action loss |
| No aux vision | `usam-no-aux-vision` | Does aux RGB + depth co-training help? |
| No USAM aux | `usam-no-usam-aux` | Do drift + subtask add lift beyond LDA-style co-training? |
| Drift only | `usam-drift-only` | Marginal contribution of drift alone |
| 3-source (DROID) | `usam-full` | Canonical recipe + DROID dataset (3-source full data mix) |

See docs/ABLATION_STUDY.md in the source repo for the full design.
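One way to read the matrix is as a set of active loss terms per variant, so each comparison reduces to a set difference. The sketch below encodes the rows whose descriptions name their terms explicitly, plus this repo's terms taken from its `loss_weights` block (with `geom_target` written as `geom`). The term sets of the other repos are assumptions read off the table, not their actual configs, and `usam-drift-only` is left out because its description does not pin down the remaining terms.

```python
BASELINE = frozenset({"action", "rgb", "depth", "drift", "subtask"})

# Active loss terms per variant, as read from the table (assumed, not each repo's config).
VARIANTS = {
    "usam-full-loss": BASELINE,
    "usam-full-loss-geom": BASELINE | {"geom"},
    "usam-action-only": frozenset({"action"}),
    "usam-no-aux-vision": BASELINE - {"rgb", "depth"},
    "usam-no-usam-aux": BASELINE - {"drift", "subtask"},
    # This repo: every weight in its loss_weights block is non-zero, geom included.
    "usam-full": BASELINE | {"geom"},
}


def contrast(a, b):
    """Loss terms that variant `b` adds to and removes from variant `a`."""
    return {
        "added": sorted(VARIANTS[b] - VARIANTS[a]),
        "removed": sorted(VARIANTS[a] - VARIANTS[b]),
    }
```

Under this reading, `usam-full` and `usam-full-loss-geom` share the same loss terms, so comparing them would isolate the data mix, provided their weights and schedules also match.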

Usage

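```python
import torch

# weights_only=False lets torch.load unpickle the optimizer/scheduler state and
# run metadata stored alongside the weights; only use it on files you trust.
ckpt = torch.load(
    "checkpoint_step00010000.pt",
    weights_only=False,
    map_location="cpu",
)
state_dict = ckpt["state_dict"]    # trainable params + buffers only
step       = ckpt["step"]          # int
opt_state  = ckpt["optimizer"]     # AdamW state, for resume
sched      = ckpt["scheduler"]     # scheduler state, for resume

# Load into a freshly constructed USAM model (`model` is built with the source
# repo's code; its construction is not shown here):
missing, unexpected = model.load_state_dict(state_dict, strict=False)
# `missing` will contain the frozen base-model keys (Qwen3-VL + DINOv3),
# which load from their respective HF base checkpoints. See
# usam/_train_helpers.py:2437-2453 for the reference loader.
```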
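Because each file carries only trainable parameters and buffers, loading with `strict=False` should report the frozen base weights as `missing`, and resuming also means restoring the optimizer and scheduler. The self-contained sketch below demonstrates that pattern on a toy model. `ToyModel`, `trainable_plus_buffers`, `make`, `round_trip`, the `LambdaLR` schedule, and the saved layout (mirroring the keys read in the usage code above) are illustrative assumptions, not the USAM code.

```python
import os
import tempfile

import torch
from torch import nn


class ToyModel(nn.Module):
    """Hypothetical stand-in: a frozen 'base' layer, a trainable 'adapter', and a buffer."""

    def __init__(self):
        super().__init__()
        self.base = nn.Linear(4, 4)
        self.adapter = nn.Linear(4, 2)
        self.register_buffer("scale", torch.ones(2))
        self.base.requires_grad_(False)  # frozen, like the Qwen3-VL / DINOv3 weights

    def forward(self, x):
        return self.adapter(self.base(x)) * self.scale


def trainable_plus_buffers(model):
    """Assumed selection rule: keep trainable parameters and all buffers."""
    keep = {n for n, p in model.named_parameters() if p.requires_grad}
    keep |= {n for n, _ in model.named_buffers()}
    return {k: v for k, v in model.state_dict().items() if k in keep}


def make(seed):
    """Build a toy model plus AdamW over its trainable params and a constant LR schedule."""
    torch.manual_seed(seed)
    model = ToyModel()
    opt = torch.optim.AdamW([p for p in model.parameters() if p.requires_grad], lr=1e-3)
    sched = torch.optim.lr_scheduler.LambdaLR(opt, lambda s: 1.0)
    return model, opt, sched


def round_trip():
    """Train one step, save, then resume into a fresh model; return what resume sees."""
    model, opt, sched = make(seed=0)
    model(torch.randn(8, 4)).pow(2).mean().backward()
    opt.step()
    sched.step()
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "checkpoint_step00000001.pt")
        torch.save({"state_dict": trainable_plus_buffers(model), "step": 1,
                    "optimizer": opt.state_dict(), "scheduler": sched.state_dict()}, path)
        fresh, fresh_opt, fresh_sched = make(seed=1)
        ckpt = torch.load(path, weights_only=False, map_location="cpu")
        missing, unexpected = fresh.load_state_dict(ckpt["state_dict"], strict=False)
        fresh_opt.load_state_dict(ckpt["optimizer"])
        fresh_sched.load_state_dict(ckpt["scheduler"])
    same_adapter = torch.equal(fresh.adapter.weight, model.adapter.weight)
    return ckpt["step"], sorted(missing), list(unexpected), same_adapter, fresh_opt
```

In the real model, the `missing` base keys would then be filled from the Hugging Face base checkpoints, as the comment in the usage code describes; a non-empty `unexpected` list would usually signal a model built with a different configuration than the checkpoint.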


Model tree

Repo christian0420/usam-full is fine-tuned from the base model Qwen/Qwen3-VL-4B-Instruct.