USAM: usam-full
Canonical recipe trained on the full 3-source mix (AGIBOT + BRIDGE + DROID).
USAM Stage B1-real pretrain checkpoints, variant usam-full. Architecture: Qwen3-VL-4B-Instruct backbone (LoRA r=16) + DINOv3 visual encoder + LDA flow-matching action head + USAM auxiliary heads (drift / subtask / depth-RGB geom).
Provenance
- SLURM job: 66742
- Run directory: runs/usam_real_qwen_droid-66742/
- Source config: configs/train/stage_b1_droid_pretrain.yaml (mirrored as config.yaml in this repo)
Loss configuration
loss_weights:
action: 1.0
rgb: 1.0
depth: 0.3
drift: 0.1
subtask: 0.1
geom_target: 0.1 # geom-loss fixes have landed in code:
# - usam/aux_heads/depth_consistency.py: fp32 autocast wrap (no bf16 NaN)
# - usam/_train_helpers.py:947: geom_dim=mmdit_output_dim (1024 under LDA player)
# Per A00 spec in docs/ABLATION_STUDY.md.
ramp_steps: 10000 # 10 % warmup (vs A0's ramp==max==50k regime where
# aux losses NEVER reached nominal weight). With max=100k,
# aux heads get 90k steps of full-weight gradient signal,
# appropriate for a long flagship pretrain.
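The ramp described in the comments above can be sketched as follows. This is a minimal illustration, not the training code: it assumes a linear ramp from 0 to the nominal weight over ramp_steps, and it assumes the ramp applies to every loss except action. The helper name `ramped_weight` and the constant names are hypothetical.

```python
# Hypothetical sketch of an auxiliary-loss weight ramp.
# Assumptions (not confirmed by the source config): the ramp is linear,
# and every loss except "action" is ramped.

NOMINAL_WEIGHTS = {
    "action": 1.0,
    "rgb": 1.0,
    "depth": 0.3,
    "drift": 0.1,
    "subtask": 0.1,
    "geom_target": 0.1,
}
RAMP_STEPS = 10_000   # ramp_steps from the config above
MAX_STEPS = 100_000   # "max=100k" from the config comment


def ramped_weight(name: str, step: int) -> float:
    """Return the effective weight of loss `name` at training step `step`."""
    nominal = NOMINAL_WEIGHTS[name]
    if name == "action":  # assumed: the main action loss is never ramped
        return nominal
    # Linear scale clamped to [0, 1]; reaches 1.0 at step == RAMP_STEPS.
    scale = min(1.0, max(0.0, step / RAMP_STEPS))
    return nominal * scale


# Steps spent at nominal weight once the ramp has finished.
full_weight_steps = MAX_STEPS - RAMP_STEPS
```

Under these assumptions the auxiliary heads spend the first 10k steps below nominal weight and the remaining 90k at full weight, which matches the arithmetic in the config comment.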
Saved checkpoints
4 checkpoints, every 2500 steps:
- checkpoint_step00002500.pt
- checkpoint_step00005000.pt
- checkpoint_step00007500.pt
- checkpoint_step00010000.pt

Latest step: 10000

Checkpoint format: trainable+buffers. Each file holds a state_dict (LoRA + trainable adapters + buffers) plus full AdamW optimizer state, scheduler state, and run metadata. Every saved step is independently resumable for continued training.
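Because every saved step is resumable, a resume script usually needs to pick the newest file. The sketch below shows one way to do that, assuming the naming pattern inferred from the file names listed above (an 8-digit zero-padded step). The helper names `checkpoint_name` and `latest_checkpoint` are hypothetical and not part of the USAM codebase.

```python
import re
from typing import Iterable, Optional

# Pattern inferred from the listed file names: 8-digit zero-padded step.
_CKPT_RE = re.compile(r"^checkpoint_step(\d{8})\.pt$")


def checkpoint_name(step: int) -> str:
    """Build the file name for a given training step."""
    return f"checkpoint_step{step:08d}.pt"


def latest_checkpoint(names: Iterable[str]) -> Optional[str]:
    """Return the checkpoint file name with the highest step, or None."""
    best_step, best_name = -1, None
    for name in names:
        match = _CKPT_RE.match(name)
        if match and int(match.group(1)) > best_step:
            best_step, best_name = int(match.group(1)), name
    return best_name


# The four checkpoints in this repo: steps 2500, 5000, 7500, 10000.
saved = [checkpoint_name(s) for s in range(2500, 10001, 2500)]
```

Parsing the step from the name rather than sorting by modification time keeps the choice stable when files are copied or re-downloaded.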
Ablation context
This repo is one variant in the USAM B1-real loss-ablation matrix. The seven variants isolate single-loss contributions:
| Variant | Repo | What it tests |
|---|---|---|
| Full + geom | usam-full-loss-geom | Upper bound: does depth-RGB geometric consistency help on top of the baseline? |
| Full (baseline) | usam-full-loss | Canonical recipe: action + rgb + depth + drift + subtask |
| Action only | usam-action-only | Lower bound: pure VLA action loss |
| No aux vision | usam-no-aux-vision | Does aux RGB + depth co-training help? |
| No USAM aux | usam-no-usam-aux | Do drift + subtask add lift beyond LDA-style co-training? |
| Drift only | usam-drift-only | Marginal contribution of drift alone |
| 3-source (DROID) | usam-full | Canonical recipe + DROID dataset (3-source full data mix) |
See docs/ABLATION_STUDY.md in the source repo for the full design.
Usage
import torch
ckpt = torch.load(
"checkpoint_step00010000.pt",
weights_only=False,
map_location="cpu",
)
state_dict = ckpt["state_dict"] # trainable + buffers only
step = ckpt["step"] # int
opt_state = ckpt["optimizer"] # for resume
sched = ckpt["scheduler"] # for resume
# Load into a freshly constructed USAM model (`model` must already exist):
missing, unexpected = model.load_state_dict(state_dict, strict=False)
# `missing` will contain the frozen base-model keys (Qwen3-VL + DINOv3),
# which load from their respective HF base checkpoints. See
# usam/_train_helpers.py:2437-2453 for the reference loader.
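Since `strict=False` silently tolerates mismatches, it can help to audit the two returned key lists before training or evaluating. The sketch below is one possible check, written in plain Python so it works on any lists of key strings. The prefixes in `FROZEN_PREFIXES` and the helper name `audit_load_result` are hypothetical: the real parameter names of the frozen Qwen3-VL and DINOv3 modules are not given in this card, so adjust them to match the model.

```python
from typing import Iterable, List, Sequence

# Hypothetical prefixes for the frozen base modules; replace with the
# actual attribute names used by the USAM model.
FROZEN_PREFIXES: Sequence[str] = ("backbone.", "visual_encoder.")


def audit_load_result(
    missing: Iterable[str],
    unexpected: Iterable[str],
    frozen_prefixes: Sequence[str] = FROZEN_PREFIXES,
) -> List[str]:
    """Return human-readable problems; an empty list means the load looks clean.

    Expected situation: nothing is unexpected, and every missing key belongs
    to a frozen base model that is loaded separately.
    """
    prefixes = tuple(frozen_prefixes)
    problems = [f"unexpected key: {key}" for key in unexpected]
    problems += [
        f"missing trainable key: {key}"
        for key in missing
        if not key.startswith(prefixes)
    ]
    return problems
```

A prefix check is coarse: a LoRA key that lives under a frozen prefix and goes missing would not be flagged, so a stricter audit would compare against the exact set of trainable parameter names.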