ArcadaLabs/Ouro-2.6B-mlx-bf16

Unquantized (bf16) MLX conversion of ByteDance/Ouro-2.6B, a Looped Language Model that applies the same transformer blocks recurrently, spending extra compute per token to reach performance above what its parameter count would suggest.

This conversion keeps the weights in bf16 with no quantization. For a smaller 4-bit quantized version, see mlx-community/Ouro-2.6B-4bit.

Use with mlx-lm

pip install mlx-lm

Important: mlx-lm does not ship with built-in support for the Ouro architecture. You need to add a custom ouro.py model file to your mlx-lm models directory. See the "Setup Notes" section below.

from mlx_lm import load, generate

model, tokenizer = load("ArcadaLabs/Ouro-2.6B-mlx-bf16")

messages = [{"role": "user", "content": "hello"}]
prompt = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=False
)

response = generate(model, tokenizer, prompt=prompt, max_tokens=512, verbose=True)

Setup Notes

As of mlx-lm 0.29.1, the "ouro" model type is not recognized out of the box. To use this model, you need to place a custom ouro.py file in your mlx-lm models directory (typically at <your-venv>/lib/python3.x/site-packages/mlx_lm/models/ouro.py).
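If you are unsure where that directory lives in your environment, the following is a minimal sketch for locating it with the standard library. The helper name models_dir_for is hypothetical (it is not part of mlx-lm), and the sketch assumes the package keeps its model files in a models subdirectory, as the path above suggests.

```python
import importlib.util
from pathlib import Path
from typing import Optional


def models_dir_for(package: str = "mlx_lm") -> Optional[Path]:
    """Return <package dir>/models, or None if the package is not installed.

    Hypothetical helper for illustration; not part of mlx-lm.
    """
    spec = importlib.util.find_spec(package)
    if spec is None or spec.origin is None:
        return None
    # spec.origin points at the package's __init__.py
    return Path(spec.origin).parent / "models"


target = models_dir_for("mlx_lm")
print(target)  # None means mlx-lm is not installed in this environment

# Once the path looks right, copy the reference file into place, e.g.:
# import shutil; shutil.copy("ouro.py", target / "ouro.py")
```

Running this inside the same virtual environment you use for mlx-lm avoids copying the file into the wrong Python installation.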

The implementation is adapted from the architecture described in the mlx-community/Ouro-2.6B-Thinking-4bit model card and from the original ByteDance/Ouro-2.6B PyTorch implementation. Key adaptations for MLX:

Sandwich RMSNorm (pre and post normalization for both attention and MLP)
Recurrent looping over transformer blocks (total_ut_steps passes)
KV cache sized to num_hidden_layers * total_ut_steps entries
Early exit gate (linear projection, used during training)
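To make the looping and cache sizing concrete, here is a toy sketch in plain Python. It is not the code in ouro.py: the blocks are stand-in functions, the names cache_slot and looped_forward are hypothetical, and the step-major cache layout (step * num_layers + layer) is an assumption chosen for illustration. It only shows why one cache entry is needed per (loop step, layer) pair.

```python
from typing import Callable, List

Block = Callable[[float, list], float]


def cache_slot(step: int, layer: int, num_layers: int) -> int:
    """Flat cache index for a (loop step, layer) pair (assumed step-major layout)."""
    return step * num_layers + layer


def looped_forward(x: float, blocks: List[Block], total_ut_steps: int, cache: List[list]) -> float:
    """Apply the same stack of blocks total_ut_steps times, one cache slot per visit."""
    num_layers = len(blocks)
    assert len(cache) == num_layers * total_ut_steps
    for step in range(total_ut_steps):
        for layer, block in enumerate(blocks):
            x = block(x, cache[cache_slot(step, layer, num_layers)])
    return x


def make_toy_block(scale: float) -> Block:
    """Stand-in for a transformer block: records its input, then transforms it."""
    def block(x: float, slot: list) -> float:
        slot.append(x)  # a real block would append keys/values here
        return x * scale + 1.0
    return block


blocks = [make_toy_block(1.0), make_toy_block(2.0)]  # toy: 2 layers
steps = 3                                            # toy: 3 loop passes
cache = [[] for _ in range(len(blocks) * steps)]
out = looped_forward(0.0, blocks, steps, cache)
print(len(cache), out)
```

Because the same weights see different hidden states on each pass, the keys and values differ per pass, so the cache cannot be shared across loop steps.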

A reference ouro.py implementation is included in this repository.

Benchmark (Single Prompt, Informal)

Tested on Apple M5 Pro (64GB). This is a single-prompt sanity check to verify the model runs correctly, give a rough sense of throughput, and show that the conversion produces coherent output. It is not a rigorous benchmark.

Prompt:

A medieval kingdom discovers electricity 300 years before the industrial revolution. Write a short analysis of: (1) what societal structures would change first, (2) what would remain surprisingly unchanged, and (3) one second-order consequence that most people wouldn't immediately consider. Be specific and cite historical parallels where relevant.

Performance:

Variant	Tokens	Time	Throughput	Peak Memory
MLX bf16 (this model)	1024	85.3s	12.0 tok/s	5.76 GB
PyTorch fp16 on MPS	1024	148.9s	6.9 tok/s	4.97 GB
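
The throughput column is tokens divided by wall-clock time. A quick check of that arithmetic, using only the figures in the table (the speedup ratio is derived here, not a separately measured result, and applies to this single prompt only):

```python
# (tokens generated, seconds) taken from the table above
runs = {
    "MLX bf16": (1024, 85.3),
    "PyTorch fp16 on MPS": (1024, 148.9),
}


def throughput(tokens: int, seconds: float) -> float:
    """Tokens per second."""
    return tokens / seconds


for name, (tokens, seconds) in runs.items():
    print(f"{name}: {throughput(tokens, seconds):.1f} tok/s")

# Ratio of wall-clock times for the same number of tokens
speedup = runs["PyTorch fp16 on MPS"][1] / runs["MLX bf16"][1]
print(f"MLX finished about {speedup:.2f}x faster on this prompt")
```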

MLX bf16 output (this model):

<think>
Okay, let's tackle this question. The user wants an analysis of how a medieval kingdom
discovering electricity 300 years early would affect societal structures, what remains
unchanged, and a second-order consequence. They also want historical parallels.

First, I need to break down each part. For part (1), societal structures changing first.
Electricity in a medieval context -- so no industrial base, feudal systems, agrarian economy.
The first changes would likely be in technology and maybe some social mobility. Maybe the
church or nobility controlling electricity? Like how the Catholic Church controlled knowledge
in the Middle Ages. So perhaps the church or ruling elites would monopolize electricity,
leading to new power structures. Maybe the rise of a new class of engineers or technicians.
Historical parallel could be the Renaissance's spread of knowledge, but here it's electricity
instead.

For part (2), what remains unchanged. Medieval society was heavily agrarian, feudal. Even with
electricity, the basic economic structure might stay. Land ownership, serfdom, maybe. The
social hierarchy could persist because the feudal system is deeply ingrained. Maybe the
church's influence remains strong, as they might control the new technology. Historical
parallel: the Roman Empire's persistence despite technological changes.

Part (3), a second-order consequence. Maybe something like environmental impact. Early
electricity use could lead to pollution, but in a medieval setting, maybe resource depletion.
For example, mining for materials needed for electricity (like coal, metals) could strain
resources, leading to conflicts or economic shifts. Or maybe the spread of ideas through
communication devices (like telegraphs) could lead to social unrest, similar to how the
printing press spread ideas in the Renaissance. But the user wants something not immediately
obvious. Maybe the development of new industries that disrupt traditional crafts, leading to
labor issues. Or the creation of a new class of workers dependent on electricity, leading to
urbanization.
</think>

(Generation hit the 1024 token limit during the thinking phase)

Note: The base model (non-Thinking) produced <think> tags in this run, which is unexpected because its chat template does not inject them. This may be an artifact of sampling (temperature=1.0) or a mix-up during the benchmark run; we could not verify the cause.

PyTorch fp16 output (same model, same prompt):

The model produced an extended internal reasoning trace followed by a structured response. Below is an excerpt from the structured portion (the full 1024 tokens were mostly reasoning):

(1) Societal Structures That Would Change First
The earliest societal shifts would center on labor productivity and urban infrastructure.
Electric lighting would revolutionize nighttime activities, reducing reliance on candles and
oil lamps. Cities like Florence or Paris (analogous to medieval hubs) could see safer streets,
extended working hours in workshops, and a surge in nighttime commerce. Factories powered by
electric motors would outpace manual labor, undermining the feudal system's agrarian focus.
Wealth would shift from land ownership to industrial capital, accelerating the decline of
feudal hierarchies -- mirroring how the Industrial Revolution eroded medieval guilds and
manorialism.

(2) What Would Remain Surprisingly Unchanged
Despite technological leaps, hierarchical social structures and religious authority would
persist...

(Excerpt from 1024-token output; generation hit the token limit)

Both backends produce coherent, structured reasoning. Outputs differ between runs; we used temperature=1.0 with top_p=0.7, so we cannot draw conclusions about output equivalence from a single sampled run. A proper comparison would require greedy decoding and token-level comparison.
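
As a rough sketch of what the token-level comparison could look like once both backends are run with greedy decoding: the helper below finds the first position where two token ID sequences differ. The function name first_divergence and the sample IDs are hypothetical; producing the ID lists from each backend is left out.

```python
from typing import Optional, Sequence


def first_divergence(a: Sequence[int], b: Sequence[int]) -> Optional[int]:
    """Index of the first differing token, or None if the sequences are identical.

    If one sequence is a strict prefix of the other, the divergence is
    reported at the length of the shorter one.
    """
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    if len(a) != len(b):
        return min(len(a), len(b))
    return None


# Made-up token IDs, purely for illustration
mlx_ids = [101, 7, 42, 9, 13]
torch_ids = [101, 7, 42, 8, 13]
print(first_divergence(mlx_ids, torch_ids))
```

Even with greedy decoding, small numerical differences between bf16 and fp16 can flip a near-tie at some step, so a late divergence does not by itself indicate a conversion bug.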

Model Details

Architecture: Looped Language Model (LoopLM) / Universal Transformer
Parameters: 2.6B
Layers: 48 (applied recurrently 4 times = 192 effective layers)
Hidden size: 2048
Attention: Multi-Head (16 heads, 128 head dim)
FFN: SwiGLU (intermediate size 5632)
Position encoding: RoPE
Vocabulary: 49,152 tokens
Training tokens: 7.7T
Context length: 4K (training), extendable to 64K
Precision in this conversion: bfloat16
Disk size: ~5.2 GB
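
The disk size is consistent with the parameter count and precision listed above. A back-of-the-envelope check, assuming the rounded 2.6B figure, 2 bytes per bfloat16 weight, and decimal gigabytes (tokenizer and metadata files are ignored):

```python
params = 2.6e9        # rounded parameter count from the list above
bytes_per_param = 2   # bfloat16 stores each weight in 16 bits
size_gb = params * bytes_per_param / 1e9
print(f"~{size_gb:.1f} GB")
```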

Credits

Original model by ByteDance Seed
MLX architecture reference from mlx-community/Ouro-2.6B-Thinking-4bit
Conversion and MLX model implementation by Arcada Labs
Converted using mlx-lm 0.29.1

Citation

@article{ouro2025,
  title={Scaling Latent Reasoning via Looped Language Models},
  author={ByteDance Seed},
  year={2025},
  url={https://arxiv.org/abs/2510.25741}
}

