PureBit Transformer

A transformer that operates directly on raw bits instead of tokens.

Architecture

  • Vocab size: 2 (just 0 and 1!)
  • d_model: 256
  • Layers: 6
  • Heads: 8
  • Parameters: ~18M (a sketch with these hyperparameters follows below)
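
For reference, here is a minimal sketch of a decoder-only model with these hyperparameters, written with plain PyTorch modules. The real definition lives in model.py; the class name, context length, and MLP width below are assumptions, and the exact parameter count depends on them.

import torch
import torch.nn as nn

class PureBitTransformer(nn.Module):
    """Sketch only; see model.py for the actual architecture."""
    def __init__(self, vocab=2, d_model=256, n_layers=6, n_heads=8, ctx=2048):
        super().__init__()
        self.tok = nn.Embedding(vocab, d_model)   # embeds the two symbols 0/1
        self.pos = nn.Embedding(ctx, d_model)     # learned positions (assumed)
        layer = nn.TransformerEncoderLayer(
            d_model, n_heads, dim_feedforward=4 * d_model,
            batch_first=True, norm_first=True)
        self.blocks = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, vocab)     # next-bit logits

    def forward(self, bits):                      # bits: (B, T) ints in {0, 1}
        T = bits.size(1)
        x = self.tok(bits) + self.pos(torch.arange(T, device=bits.device))
        causal = nn.Transformer.generate_square_subsequent_mask(T).to(bits.device)
        return self.head(self.blocks(x, mask=causal))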

Training

  • Trained on raw UTF-8 bytes converted to bits (see the sketch after this list)
  • Best loss achieved: 0.6863 nats/bit (random guessing: ln 2 ≈ 0.6931)
  • Training data: ~70MB of text = 560M bits
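
The data preparation these bullets imply is simple: each UTF-8 byte expands to 8 bits, so 70M bytes yields 560M bits. A sketch, assuming the pipeline in train.py works roughly like this (the filename is a placeholder):

import numpy as np

def text_to_bits(path):
    data = np.fromfile(path, dtype=np.uint8)  # raw UTF-8 bytes
    bits = np.unpackbits(data)                # 8 bits per byte, MSB first
    return bits.astype(np.int64)              # token ids: 0 or 1

bits = text_to_bits('corpus.txt')             # placeholder filename
print(len(bits))                              # ~560M for a 70MB corpus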

Key Insight

This explores whether transformers can learn directly at the bit level. The results show only marginal learning beyond chance: predicting individual bits is extremely hard when the model is given no byte-level structure. The numbers below put the gap in perspective.
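
Exponentiating the reported losses makes the gap concrete: e^loss is the per-bit perplexity, and e^(-loss) is the geometric-mean probability the model assigns to the correct bit.

import math

best_loss = 0.6863         # from the checkpoint
rand_loss = math.log(2)    # fair-coin baseline, ≈ 0.6931 nats/bit

print(f"perplexity: {math.exp(best_loss):.4f} (random = 2.0)")       # ≈ 1.9864
print(f"p(correct bit): {math.exp(-best_loss):.4f} (random = 0.5)")  # ≈ 0.5034

So the best checkpoint beats a coin flip by roughly a third of a percentage point of assigned probability per bit.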

Usage

import torch

# Load the checkpoint (map_location keeps GPU-saved weights loadable on CPU)
ckpt = torch.load('purebit_best_70mb.pt', map_location='cpu')
print(f"Loss: {ckpt['loss']:.4f}")       # best: 0.6863
print(f"Bits seen: {ckpt['bits']:,}")

# The model architecture itself is defined in model.py
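
To run inference, rebuild the model from model.py and restore the weights. A minimal sketch: the class name PureBitTransformer and the checkpoint key 'model' are assumptions here, so check model.py and infer.py for the actual interface.

from model import PureBitTransformer  # class name is an assumption

model = PureBitTransformer()          # hyperparameters as listed above
model.load_state_dict(ckpt['model']) # 'model' key is an assumption
model.eval()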

Files

  • purebit_best_70mb.pt - Best checkpoint (loss 0.6863)
  • model.py - Model architecture
  • train.py - Training script
  • infer.py - Inference script

Author

OpenTransformers - Experimental architecture research
