metadata
license: mit
tags:
  - tritter
  - bitnet
  - code
  - 500m

Tritter 500M BitNet

A 500M-parameter BitNet b1.58 model with ternary-quantized weights (each weight is -1, 0, or +1), trained for code generation.
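
For readers unfamiliar with ternary quantization, the following is a minimal sketch of the absmean scheme described for BitNet b1.58: weights are divided by their mean absolute value, rounded, and clipped to {-1, 0, +1}. The function names `absmean_ternary` and `dequantize`, the per-list scale, and the sample weights are illustrative assumptions; Tritter's actual quantization code is not shown in this README and may differ.

```python
def absmean_ternary(weights, eps=1e-8):
    """Quantize a flat list of floats to {-1, 0, +1} with one shared scale.

    Hypothetical helper for illustration; not taken from the Tritter codebase.
    """
    # The scale is the mean absolute value of the weights (the "absmean").
    scale = sum(abs(w) for w in weights) / len(weights)
    # Divide by the scale, round to the nearest integer, then clip to [-1, 1].
    ternary = [max(-1, min(1, round(w / (scale + eps)))) for w in weights]
    return ternary, scale


def dequantize(ternary, scale):
    """Map ternary codes back to approximate float weights."""
    return [t * scale for t in ternary]


# Made-up sample weights, chosen only to show all three ternary values.
weights = [0.9, -0.05, 0.4, -1.3, 0.02, -0.6]
ternary, scale = absmean_ternary(weights)
approx = dequantize(ternary, scale)

print("ternary codes:", ternary)
print("shared scale: ", scale)
print("dequantized:  ", approx)
```

Three possible values per weight carry log2(3) ≈ 1.58 bits of information, which is where the "b1.58" in the name comes from.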

Training Details

  • Parameters: 524,372,480
  • Training tokens: 118,111,072
  • Final loss: 11.2343
  • Min loss: 11.0722
  • Tokens/sec: 23,679.4
  • Training duration: 1:23:07.915359 (h:mm:ss)
  • GPU: NVIDIA GeForce RTX 5080
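
As a quick sanity check, the figures above can be cross-checked against each other: dividing the training tokens by the reported throughput should roughly reproduce the reported duration. The sketch below uses only the numbers listed in this section; the variable names are illustrative, and it assumes the throughput figure is an average over the whole run.

```python
from datetime import timedelta

# Values copied from the Training Details list.
PARAMETERS = 524_372_480
TRAINING_TOKENS = 118_111_072
TOKENS_PER_SEC = 23_679.4
DURATION = timedelta(hours=1, minutes=23, seconds=7, microseconds=915_359)

# Duration implied by tokens / throughput (assumes an average throughput).
implied_seconds = TRAINING_TOKENS / TOKENS_PER_SEC
reported_seconds = DURATION.total_seconds()
relative_gap = abs(implied_seconds - reported_seconds) / reported_seconds

# Training tokens seen per model parameter.
tokens_per_parameter = TRAINING_TOKENS / PARAMETERS

print(f"implied duration:  {timedelta(seconds=implied_seconds)}")
print(f"reported duration: {DURATION}")
print(f"relative gap:      {relative_gap:.2e}")
print(f"tokens/parameter:  {tokens_per_parameter:.3f}")
```

The implied and reported durations agree to within a small fraction of a second, so the token count, throughput, and duration are mutually consistent.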

Checkpoints

Intermediate checkpoints are available at every 10% of training progress, from 10% through 90%.

Generated with Tritter