Global batch size: 384, seq_len: 2048
Checkpoint every 500 steps
i.e. every 393,216,000 tokens (roughly 400M tokens)
Current revisions available as:
checkpoint-500: 393M tokens
checkpoint-1000: 786M tokens
checkpoint-1500: 1.18B tokens
checkpoint-2000: 1.57B tokens
checkpoint-2500: 1.96B tokens
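The per-checkpoint token counts follow from the batch size, sequence length, and checkpoint interval stated above. A minimal sketch of that arithmetic, assuming every sequence in a batch is filled to the full seq_len (the function name `tokens_at_step` is made up for illustration and is not part of the training code):

```python
# Values taken from the notes above.
GLOBAL_BATCH_SIZE = 384      # sequences per optimizer step
SEQ_LEN = 2048               # tokens per sequence
CHECKPOINT_INTERVAL = 500    # steps between saved checkpoints


def tokens_at_step(step: int) -> int:
    """Tokens seen after `step` steps, assuming fully packed sequences."""
    return step * GLOBAL_BATCH_SIZE * SEQ_LEN


if __name__ == "__main__":
    # Print the cumulative token count for each published checkpoint.
    for step in range(CHECKPOINT_INTERVAL, 2501, CHECKPOINT_INTERVAL):
        print(f"checkpoint-{step}: {tokens_at_step(step):,} tokens")
```

Each step consumes 384 × 2048 = 786,432 tokens, so 500 steps give the 393,216,000 figure quoted above.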
max_lr : 7e-5