ArcadaLabs/Ouro-2.6B-mlx-bf16

Unquantized (bf16) MLX conversion of ByteDance/Ouro-2.6B, a Looped Language Model that applies the same transformer blocks recurrently to achieve performance well above its parameter count.

This is a full-precision conversion with no quantization. For a smaller 4-bit quantized version, see mlx-community/Ouro-2.6B-4bit.

Use with mlx-lm

pip install mlx-lm

Important: mlx-lm does not ship with built-in support for the Ouro architecture. You need to add a custom ouro.py model file to your mlx-lm models directory. See the "Setup Notes" section below.

from mlx_lm import load, generate

model, tokenizer = load("ArcadaLabs/Ouro-2.6B-mlx-bf16")

messages = [{"role": "user", "content": "hello"}]
prompt = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=False
)

response = generate(model, tokenizer, prompt=prompt, max_tokens=512, verbose=True)

Setup Notes

As of mlx-lm 0.29.1, the "ouro" model type is not recognized out of the box. To use this model, you need to place a custom ouro.py file in your mlx-lm models directory (typically at <your-venv>/lib/python3.x/site-packages/mlx_lm/models/ouro.py).
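The exact site-packages path varies by environment, so it can help to look it up rather than type it by hand. The following is a minimal sketch that uses only the Python standard library; the helper names `models_dir` and `install_model_file` are hypothetical and not part of mlx-lm, and the sketch assumes the `ouro.py` file from this repository is in the current working directory.

```python
import importlib.util
import shutil
from pathlib import Path


def models_dir(package="mlx_lm"):
    """Return <package>/models as a Path, or None if the package is not installed."""
    spec = importlib.util.find_spec(package)
    if spec is None or spec.origin is None:
        return None
    # spec.origin points at the package's __init__.py
    return Path(spec.origin).parent / "models"


def install_model_file(src="ouro.py", package="mlx_lm"):
    """Copy a custom model file into the package's models directory."""
    target_dir = models_dir(package)
    if target_dir is None or not target_dir.is_dir():
        raise RuntimeError(f"could not find a models directory for {package!r}")
    return Path(shutil.copy(src, target_dir / Path(src).name))


# Prints the resolved directory, or None if mlx-lm is not installed.
print(models_dir())
```

Calling `install_model_file()` afterwards copies the file into place. Upgrading or reinstalling mlx-lm may remove it, so the copy may need to be repeated.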

The implementation is adapted from the architecture described in the mlx-community/Ouro-2.6B-Thinking-4bit model card and from the original ByteDance/Ouro-2.6B PyTorch implementation. Key adaptations for MLX:

  • Sandwich RMSNorm (pre and post normalization for both attention and MLP)
  • Recurrent looping over transformer blocks (total_ut_steps passes)
  • KV cache sized to num_hidden_layers * total_ut_steps entries
  • Early exit gate (linear projection, used during training)

A reference ouro.py implementation is included in this repository.
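To make the looping and cache layout listed above concrete, here is a toy sketch in pure Python. It is not the ouro.py shipped in this repository. The sub-layers are placeholder functions rather than real attention or MLP, the norm has no learned scale, and two details are assumptions made for illustration: the post-norm is applied before the residual add, and cache slots are indexed as `step * num_layers + layer_index`.

```python
import math


def rms_norm(x, eps=1e-6):
    """RMSNorm without a learned scale (illustration only)."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [v / rms for v in x]


def sandwich(x, sublayer):
    """Normalize before and after the sub-layer, then add the residual."""
    y = rms_norm(sublayer(rms_norm(x)))
    return [a + b for a, b in zip(x, y)]


def toy_block(x, cache_entry):
    """One block: sandwich-normed 'attention' then sandwich-normed 'MLP'."""
    x = sandwich(x, lambda v: [2.0 * t for t in v])  # stand-in for attention
    x = sandwich(x, lambda v: [t + 1.0 for t in v])  # stand-in for the MLP
    calls = (cache_entry or 0) + 1  # stand-in for updating a KV cache entry
    return x, calls


def looped_forward(x, blocks, total_ut_steps, cache=None):
    """Apply the same stack of blocks total_ut_steps times."""
    num_layers = len(blocks)
    if cache is None:
        # One cache entry per (step, layer) pair.
        cache = [None] * (num_layers * total_ut_steps)
    for step in range(total_ut_steps):
        for i, block in enumerate(blocks):
            slot = step * num_layers + i  # assumed slot layout
            x, cache[slot] = block(x, cache[slot])
    return x, cache


blocks = [toy_block] * 3
hidden, cache = looped_forward([1.0, -2.0, 0.5, 3.0], blocks, total_ut_steps=4)
print(len(cache))
```

Each pass through the stack needs its own cache entries because the same weights see different hidden states at each step, which is why the cache grows with the number of passes and not only with the number of layers.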

Benchmark (Single Prompt, Informal)

Tested on Apple M5 Pro (64GB). This is a single-prompt sanity check to verify the model runs correctly, give a rough sense of throughput, and show that the conversion produces coherent output. It is not a rigorous benchmark.

Prompt:

A medieval kingdom discovers electricity 300 years before the industrial revolution. Write a short analysis of: (1) what societal structures would change first, (2) what would remain surprisingly unchanged, and (3) one second-order consequence that most people wouldn't immediately consider. Be specific and cite historical parallels where relevant.

Performance:

Variant                  Tokens   Time (s)   Throughput (tok/s)   Peak memory (GB)
MLX bf16 (this model)    1024     85.3       12.0                 5.76
PyTorch fp16 on MPS      1024     148.9      6.9                  4.97
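The throughput column is simply tokens divided by wall-clock time. The short sketch below recomputes it from the figures reported above and derives the relative speed of the two runs. The dictionary layout and names are only for illustration, and the result describes this single run, not a general speedup.

```python
# Figures copied from the table above.
runs = {
    "MLX bf16": {"tokens": 1024, "seconds": 85.3},
    "PyTorch fp16 on MPS": {"tokens": 1024, "seconds": 148.9},
}


def throughput(tokens, seconds):
    """Tokens generated per second of wall-clock time."""
    return tokens / seconds


rates = {name: throughput(**run) for name, run in runs.items()}
speedup = rates["MLX bf16"] / rates["PyTorch fp16 on MPS"]

for name, rate in rates.items():
    print(f"{name}: {rate:.1f} tok/s")
print(f"MLX relative to PyTorch: {speedup:.2f}x")
```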

MLX bf16 output (this model):

<think>
Okay, let's tackle this question. The user wants an analysis of how a medieval kingdom
discovering electricity 300 years early would affect societal structures, what remains
unchanged, and a second-order consequence. They also want historical parallels.

First, I need to break down each part. For part (1), societal structures changing first.
Electricity in a medieval context -- so no industrial base, feudal systems, agrarian economy.
The first changes would likely be in technology and maybe some social mobility. Maybe the
church or nobility controlling electricity? Like how the Catholic Church controlled knowledge
in the Middle Ages. So perhaps the church or ruling elites would monopolize electricity,
leading to new power structures. Maybe the rise of a new class of engineers or technicians.
Historical parallel could be the Renaissance's spread of knowledge, but here it's electricity
instead.

For part (2), what remains unchanged. Medieval society was heavily agrarian, feudal. Even with
electricity, the basic economic structure might stay. Land ownership, serfdom, maybe. The
social hierarchy could persist because the feudal system is deeply ingrained. Maybe the
church's influence remains strong, as they might control the new technology. Historical
parallel: the Roman Empire's persistence despite technological changes.

Part (3), a second-order consequence. Maybe something like environmental impact. Early
electricity use could lead to pollution, but in a medieval setting, maybe resource depletion.
For example, mining for materials needed for electricity (like coal, metals) could strain
resources, leading to conflicts or economic shifts. Or maybe the spread of ideas through
communication devices (like telegraphs) could lead to social unrest, similar to how the
printing press spread ideas in the Renaissance. But the user wants something not immediately
obvious. Maybe the development of new industries that disrupt traditional crafts, leading to
labor issues. Or the creation of a new class of workers dependent on electricity, leading to
urbanization.
</think>

(Generation hit the 1024 token limit during the thinking phase)

Note: The base model (non-Thinking) produced <think> tags in this run, which is unexpected since its chat template does not inject them. This may be an artifact of sampling (temperature=1.0) or a mixup during the benchmark run. We cannot verify the cause.

PyTorch fp16 output (same model, same prompt):

The model produced an extended internal reasoning trace followed by a structured response. Below is an excerpt from the structured portion (the full 1024 tokens were mostly reasoning):

(1) Societal Structures That Would Change First
The earliest societal shifts would center on labor productivity and urban infrastructure.
Electric lighting would revolutionize nighttime activities, reducing reliance on candles and
oil lamps. Cities like Florence or Paris (analogous to medieval hubs) could see safer streets,
extended working hours in workshops, and a surge in nighttime commerce. Factories powered by
electric motors would outpace manual labor, undermining the feudal system's agrarian focus.
Wealth would shift from land ownership to industrial capital, accelerating the decline of
feudal hierarchies -- mirroring how the Industrial Revolution eroded medieval guilds and
manorialism.

(2) What Would Remain Surprisingly Unchanged
Despite technological leaps, hierarchical social structures and religious authority would
persist...

(Excerpt from 1024-token output; generation hit the token limit)

Both backends produce coherent, structured reasoning. Outputs differ between runs; we used temperature=1.0 with top_p=0.7, so we cannot draw conclusions about output equivalence from a single sampled run. A proper comparison would require greedy decoding and token-level comparison.
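A token-level comparison could be as simple as finding the first position where two token ID sequences differ. The sketch below assumes both sequences were already produced with greedy decoding from the same prompt; collecting them from each backend is left out, and the token IDs in the example are made up.

```python
def first_divergence(a, b):
    """Return the first index where two token sequences differ.

    Returns None if the sequences are identical. If one sequence is a strict
    prefix of the other, the length of the shorter one is returned.
    """
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    if len(a) != len(b):
        return min(len(a), len(b))
    return None


# Hypothetical token IDs, for illustration only.
mlx_tokens = [101, 7, 42, 9, 300]
torch_tokens = [101, 7, 42, 11, 300]

idx = first_divergence(mlx_tokens, torch_tokens)
if idx is None:
    print("sequences are identical")
else:
    print(f"first divergence at position {idx}")
```

Even under greedy decoding, small numerical differences between bf16 and fp16 can flip a near-tied argmax, so a late divergence does not by itself indicate a conversion bug. An early one is more worth investigating.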

Model Details

  • Architecture: Looped Language Model (LoopLM) / Universal Transformer
  • Parameters: 2.6B
  • Layers: 48 (applied recurrently 4 times = 192 effective layers)
  • Hidden size: 2048
  • Attention: Multi-Head (16 heads, 128 head dim)
  • FFN: SwiGLU (intermediate size 5632)
  • Position encoding: RoPE
  • Vocabulary: 49,152 tokens
  • Training tokens: 7.7T
  • Context length: 4K (training), extendable to 64K
  • Precision in this conversion: bfloat16
  • Disk size: ~5.2 GB
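Two of the figures above can be cross-checked with simple arithmetic: the head count times the head dimension should equal the hidden size, and bfloat16 stores two bytes per parameter, which accounts for the approximate disk size. This is a rough sketch that uses the rounded 2.6B parameter count and ignores file metadata.

```python
num_heads = 16
head_dim = 128
hidden_size = 2048

# Multi-head attention splits the hidden size evenly across heads.
assert num_heads * head_dim == hidden_size

params = 2.6e9       # rounded parameter count from the list above
bytes_per_param = 2  # bfloat16 is 16 bits

disk_gb = params * bytes_per_param / 1e9
print(f"approximate bf16 weight size: {disk_gb:.1f} GB")
```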

Credits

Citation

@article{ouro2025,
  title={Scaling Latent Reasoning via Looped Language Models},
  author={ByteDance Seed},
  year={2025},
  url={https://arxiv.org/abs/2510.25741}
}