# PureBit Transformer
A transformer that operates on raw binary bits instead of tokens.
## Architecture
- Vocab size: 2 (just 0 and 1!)
- d_model: 256
- Layers: 6
- Heads: 8
- Parameters: ~18M
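
For reference, here is a minimal PyTorch sketch of a model matching the dimensions above. It is illustrative only: the actual definition lives in `model.py`, and the class name `BitTransformer`, the `max_len` value, and the learned positional embeddings are all assumptions, not the repo's code.

```python
import torch
import torch.nn as nn

class BitTransformer(nn.Module):
    """Illustrative sketch only; the real definition lives in model.py."""

    def __init__(self, vocab_size=2, d_model=256, n_layers=6, n_heads=8, max_len=1024):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)  # embeds the bits 0 and 1
        self.pos = nn.Embedding(max_len, d_model)       # learned positions (assumed)
        layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads,
            dim_feedforward=4 * d_model, batch_first=True,
        )
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.head = nn.Linear(d_model, vocab_size)      # next-bit logits

    def forward(self, bits):                            # bits: (batch, seq) int64 in {0, 1}
        seq_len = bits.size(1)
        x = self.embed(bits) + self.pos(torch.arange(seq_len, device=bits.device))
        causal = nn.Transformer.generate_square_subsequent_mask(seq_len).to(bits.device)
        return self.head(self.encoder(x, mask=causal))
```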
## Training

- Trained on raw UTF-8 bytes converted to bits (see the conversion sketch below)
- Best loss achieved: 0.6863 (random baseline: ln 2 ≈ 0.6931)
- Training data: ~70 MB of text, i.e. ~560M bits
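
A minimal sketch of the bytes-to-bits conversion described above. The actual pipeline is in `train.py`; in particular, the MSB-first bit order that `np.unpackbits` uses by default is an assumption here.

```python
import numpy as np

def text_to_bits(text: str) -> np.ndarray:
    """Encode text as UTF-8 bytes, then unpack each byte into 8 bits (MSB first)."""
    data = np.frombuffer(text.encode('utf-8'), dtype=np.uint8)
    return np.unpackbits(data)  # shape: (8 * num_bytes,), values in {0, 1}

bits = text_to_bits("Hi")
print(bits)       # [0 1 0 0 1 0 0 0 0 1 1 0 1 0 0 1] -> 'H' = 0x48, 'i' = 0x69
print(len(bits))  # 16 bits for 2 bytes; ~70 MB of text yields ~560M bits
```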
## Key Insight

This explores whether transformers can learn at the bit level. The results show only minimal learning beyond random chance: predicting individual bits is extremely hard when byte-level structure is not exposed to the model.
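
To put the 0.6863 figure in context: the random baseline is the cross-entropy of a fair coin flip, ln 2 ≈ 0.6931, and 0.6863 corresponds to only a slight edge over guessing. A short check:

```python
import math
import torch
import torch.nn.functional as F

# Cross-entropy of a uniform (50/50) predictor over {0, 1} is ln 2.
logits = torch.zeros(1, 2)   # equal logits -> uniform distribution over both bits
target = torch.tensor([1])   # either target value gives the same loss
baseline = F.cross_entropy(logits, target)
print(f"{baseline.item():.4f} vs ln 2 = {math.log(2):.4f}")  # 0.6931 vs 0.6931

# A loss of 0.6863 implies roughly exp(-0.6863) average probability on the true bit:
print(math.exp(-0.6863))     # ~0.5034, barely above chance
```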
## Usage

```python
import torch

# Load the best checkpoint (map_location keeps this working without a GPU)
ckpt = torch.load('purebit_best_70mb.pt', map_location='cpu')
print(f"Loss: {ckpt['loss']:.4f}")
print(f"Bits seen: {ckpt['bits']:,}")

# The model architecture is defined in model.py
```
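
The actual inference entry point is `infer.py`. Purely as an illustration of the round trip, generated bits could be packed back into bytes and decoded like this; `bits_to_text` is a hypothetical helper, not the repo's API, and it assumes the same MSB-first bit order as the training sketch above.

```python
import numpy as np

def bits_to_text(bits: np.ndarray) -> str:
    """Pack bits (MSB first) into bytes and decode as UTF-8."""
    data = np.packbits(bits.astype(np.uint8))
    return data.tobytes().decode('utf-8', errors='replace')

# Round trip with the text_to_bits sketch from the Training section:
# bits_to_text(text_to_bits("Hi")) == "Hi"
```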
## Files

- `purebit_best_70mb.pt` - Best checkpoint (loss 0.6863)
- `model.py` - Model architecture
- `train.py` - Training script
- `infer.py` - Inference script
## Author
OpenTransformers - Experimental architecture research