All HF Hub posts

SeaWolf-AI 
posted an update about 23 hours ago
Darwin-TTS: 3% of an LLM's Brain Makes TTS Speak with Emotion — Zero Training

We blended 3% of Qwen3-1.7B (LLM) FFN weights into Qwen3-TTS-1.7B's talker module. The result: emotionally enhanced speech synthesis — with zero training, zero data, and zero GPU hours.

Try the Demo: FINAL-Bench/Darwin-TTS-1.7B-Cross

Model Weights: FINAL-Bench/Darwin-TTS-1.7B-Cross

Full Research Article: https://huggingface.co/blog/FINAL-Bench/darwin-tts

Qwen3-1.7B (LLM) and Qwen3-TTS-1.7B's talker share 100% identical architecture — same hidden_size (2048), same layers (28), same heads (16). This enabled pure 1:1 weight blending across 84 FFN tensors with a single lerp operation. At 3% blend, emotion appears. At 5%, emotion intensifies. At 10%, the model breaks — producing 655-second outputs for a 3-second sentence, because the LLM's "keep generating" pattern overwhelms the TTS stop signal.

To our knowledge, this is the first training-free cross-modal weight transfer between an LLM and a TTS model. Prior work either requires adapter training (SmolTolk, 2025), fine-tuning (CSLM, 2025), or massive end-to-end compute (GPT-4o). Darwin-TTS achieves cross-modal capability transfer in under 2 minutes on CPU.

The key insight: TTS models with LLM backbones already "think" in language. We're just restoring 3% of the original LLM's language understanding patterns — particularly those related to emotional semantics and prosody planning. The code is three lines: load the model, load the LLM FFN, call p.lerp_(llm_weight, 0.03).
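Those three lines can be sketched as follows. This is only an illustration of the idea, with plain NumPy standing in for the actual torch tensors; the tensor name is made up, and `lerp(a, b, w) = a + w * (b - a)` mirrors the semantics of torch's in-place `lerp_`:

```python
import numpy as np

def lerp_blend(tts_ffn, llm_ffn, weight=0.03):
    """Linearly interpolate TTS FFN tensors toward the LLM's:
    out = tts + weight * (llm - tts). At weight=0.03, 3% of the
    LLM's pattern is mixed in."""
    return {name: t + weight * (llm_ffn[name] - t)
            for name, t in tts_ffn.items()}

# Toy stand-ins for two architecture-identical FFN tensors
tts = {"layers.0.mlp.gate_proj": np.zeros((4, 4))}
llm = {"layers.0.mlp.gate_proj": np.ones((4, 4))}

blended = lerp_blend(tts, llm, 0.03)
print(blended["layers.0.mlp.gate_proj"][0, 0])  # 0.03
```

Because the two models share identical shapes, this same blend applies unchanged to all 84 FFN tensors.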

We are the creators of the Darwin Evolutionary Merge Framework. Darwin LLM V7 achieved 86.9% on GPQA Diamond (HF Benchmark #3) through CMA-ES-optimized FFN crossbreeding. Darwin-TTS extends this principle from LLM-to-LLM merging to cross-modal LLM-to-TTS transfer. Apache 2.0.
SeaWolf-AI 
posted an update 3 days ago
🧬 Darwin-27B-Opus: 86.9% on GPQA Diamond — World #5, Zero Training
We are excited to share Darwin-27B-Opus, a 27B model that achieved 86.9% on GPQA Diamond — ranking #5 globally on the HuggingFace leaderboard — without a single gradient update.

How? Darwin breeds pretrained models through evolutionary FFN crossbreeding. The father (Qwen3.5-27B) provides the reasoning architecture; the mother (Claude 4.6 Opus Reasoning Distilled) contributes structured chain-of-thought knowledge. CMA-ES automatically discovers optimal per-layer blending ratios — no human tuning required.
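The crossbreeding step itself reduces to a per-layer linear blend driven by a ratio vector that the optimizer searches over. A hedged sketch with NumPy stand-ins (the fitness function and the actual CMA-ES loop, e.g. via the `cma` package, are omitted; all names and values here are illustrative):

```python
import numpy as np

def crossbreed(father_layers, mother_layers, ratios):
    """Per-layer linear blend: ratio r_i of the mother's FFN is mixed
    into layer i of the father. CMA-ES would search the `ratios`
    vector to maximize a benchmark score."""
    return [f + r * (m - f)
            for f, m, r in zip(father_layers, mother_layers, ratios)]

# Toy 4-layer FFN stacks as stand-ins for the real tensors
father = [np.zeros(8) for _ in range(4)]
mother = [np.ones(8) for _ in range(4)]
ratios = [0.10, 0.50, 0.93, 0.20]  # in practice, discovered by CMA-ES

child = crossbreed(father, mother, ratios)
print(child[2][0])  # 0.93: layer 2 takes 93% from the mother
```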

The result surpasses the original Qwen3.5-27B (85.5%), GLM-5.1 (744B, 86.2%), and Qwen3.5-122B (86.6%). A 27B model outperforming 744B — with zero training, zero data, one GPU, ~2 hours.

We also confirmed hybrid vigor on Korean benchmarks: Darwin-27B-KR (2nd generation offspring) surpassed both parents on CLIcK, winning 7 out of 11 categories. The evolutionary optimizer independently assigned 93% of FFN from the Korean-specialized mother while preserving 93% of attention from the reasoning-specialized father — autonomously validating our core principle: FFN carries knowledge, Attention carries reasoning.

📊 Public release: 10 days → 300+ community derivatives, 120K+ downloads.

🔗 Links:
Darwin-27B-Opus: FINAL-Bench/Darwin-27B-Opus
Article: https://huggingface.co/blog/FINAL-Bench/darwin-gpqa
Darwin Family Collection: https://huggingface.co/collections/FINAL-Bench/darwin-family

If foundation models are raw ore, Darwin is the forge. We are just getting started. 🔥
victor 
posted an update 2 days ago
Want to share my enthusiasm for zai-org/GLM-5.1 here too 🔥

I think we have it: our open-source Claude Code = GLM-5.1 + Pi (https://pi.dev/). I built a Three.js racing game to eval it, and it's extremely impressive. Thoughts:

- One-shot car physics with real drift mechanics (this is hard)

- My fav part: it's awesome at self-iterating (with no vision!). It created 20+ Bun.WebView debugging tools to drive the car programmatically and read game state, and pinned down a winding bug with vector math without ever seeing the screen.

- 531-line racing AI in a single write: 4 personalities, curvature map, racing lines, tactical drifting. It built telemetry tools to compare player vs. AI speed curves and data-tuned the parameters.

- All assets from scratch: 3D models, procedural textures, sky shader, engine sounds, spatial AI audio!

- Can do hard math: proved road normals pointed DOWN via vector cross products, computed track curvature normalized by arc length to tune AI cornering speed

You are going to hear about this model a lot in the coming months. Open source, let's go, and thanks z-ai 🚀🚀
omarkamali 
posted an update 3 days ago
We got Qwen 3.5 to count Rs in Strawberry correctly! 🚨

Building on Sawtone, we’ve been testing a different way to feed language into an LLM to build the next generation of multilingual AI.

The usual setup gives the model tokenized text and asks it to perform various linguistic tasks. That works surprisingly well, until it doesn’t. Accents disappear. Words get mangled. Internal structure gets blurred away. And the cost of that gets higher once you move into multilingual and lower-resource settings.

So we tried adding a second path.

In addition to the normal text input, the model also receives Sawtone: a byte-level word representation that preserves how a word is written, how it sounds, and how it is structured.

Same LLM. Better interface.

In this proof of concept with Qwen 3.5 0.8B, that pushed our eval from 64% to 88%. The gains showed up exactly where tokenized models usually get shaky: diacritics, character order, exact spelling, and other form-sensitive behavior.

Sawtone itself is tokenizer-free, byte-level, and pre-trained across 507 languages.
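The strawberry example at the top is a good intuition pump for why a byte-level view helps: every character survives the encoding, so form-sensitive questions become exact lookups instead of sub-token guesses. (This is just the intuition, not Sawtone's actual representation:)

```python
word = "strawberry"

# Byte-level view: each character maps to its own byte(s), so exact
# spelling is fully recoverable from the input sequence.
byte_seq = list(word.encode("utf-8"))
r_count = sum(1 for b in byte_seq if b == ord("r"))
print(r_count)  # 3
```

A subword tokenizer might split the same word into opaque pieces like `straw` + `berry`, hiding the character-level structure the question depends on.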

Still early, but promising!

prithivMLmods 
posted an update 3 days ago
A new comparator on Spaces showcases Standard FLUX.2 Decoder vs. FLUX.2 Small Decoder. The Small Decoder is ~1.4× faster, uses ~1.4× less VRAM, and maintains near-identical image quality. It has ~28M parameters with narrower channels [96, 192, 384, 384] vs. [128, 256, 512, 512], and the demo supports sequence generation by running both decoders simultaneously and comparing the results side by side.
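As a rough sanity check on the VRAM claim: activation memory in a convolutional decoder scales roughly with channel width at a given resolution, so comparing the two channel lists gives a ballpark ratio (a crude proxy that ignores parameter memory and per-stage spatial size):

```python
standard = [128, 256, 512, 512]
small = [96, 192, 384, 384]

# Each stage is exactly 4/3 wider in the standard decoder, so the
# summed-channel ratio is a ballpark proxy for the VRAM difference.
ratio = sum(standard) / sum(small)
print(round(ratio, 2))  # 1.33, in the same ballpark as the quoted ~1.4x
```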

🤗 Comparator: prithivMLmods/Flux.2-4B-Decoder-Comparator
🔗 FLUX.2-small-decoder: black-forest-labs/FLUX.2-small-decoder
🔗 GitHub: https://github.com/PRITHIVSAKTHIUR/Flux.2-4B-Encoder-Comparator
🚁 Collection: https://huggingface.co/collections/prithivMLmods/image-generation-apps-collection

🤗 App built on the Gradio SDK. To learn more, visit the app page or the respective model pages.
cahlen 
posted an update 1 day ago
Hugging Face just enabled CUDA kernel repos!! This is crazy cool!

Expect a ton more portable number-theory CUDA kernels in the near future. I'm going to have a hell of a lot of fun with this new feature.

Appreciate it, Hugging Face!

https://huggingface.co/kernels

branikita 
posted an update 3 days ago
Our paper is now published in HardwareX!

"An open-source test stand for backlash measurement in low-cost UART servo motors" presents a ~$100 automated platform for measuring backlash in compact serial bus servos — no dial indicators, no probe contact force, fully scripted and repeatable.
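The core measurement idea can be sketched in a few lines: approach the same target from both directions and take the difference of the settled positions. A toy simulated servo stands in for the real UART hardware here; all names and the backlash model are hypothetical, not the paper's implementation:

```python
class SimServo:
    """Toy servo whose output shaft lags by a fixed gear backlash."""
    BACKLASH = 2.0  # units of simulated play in the gear train

    def __init__(self):
        self.pos = 0.0

    def move(self, target):
        # The shaft settles short of the target by half the backlash,
        # on the side it approached from.
        direction = 1 if target >= self.pos else -1
        self.pos = target - (self.BACKLASH / 2) * direction

    def read(self):
        return self.pos

def measure_backlash(servo, target=100.0, overshoot=20.0):
    servo.move(target - overshoot)   # approach from below
    servo.move(target)
    from_below = servo.read()
    servo.move(target + overshoot)   # approach from above
    servo.move(target)
    from_above = servo.read()
    return abs(from_below - from_above)

print(measure_backlash(SimServo()))  # 2.0
```

On real hardware the same loop would command the servo over its serial bus and read back an external encoder, repeating and averaging to make the measurement scripted and repeatable.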

All CAD files, control software, and analysis tools are open source.

Read the full article: https://www.hardware-x.com/article/S2468-0672(26)00027-1/fulltext

Appearing in HardwareX Volume 26 (June 2026): https://www.sciencedirect.com/journal/hardwarex/vol/26/suppl/Cf
Benedictat 
posted an update about 1 hour ago
Hunyuan HY-World 2.0 Open-Sourced | Unified SOTA for 3D Generation / Reconstruction / Simulation

HY-World 2.0 is a unified 3D world model supporting multimodal inputs including text and images.

Its end-to-end framework simultaneously performs 3D understanding, scene generation, and geometric reconstruction.

Based on HY-Pano-2.0, the model enables panorama generation without camera parameters.

It ensures geometric consistency via spatial agents and trajectory planning, and achieves a joint 3DGS & Mesh representation with WorldMirror 2.0, reaching SOTA performance in novel view synthesis and 3D reconstruction.

Unlike Genie 3 and HY-World 1.5, which only output videos, HY-World 2.0 directly generates editable 3D assets, better meeting real-world research and simulation demands.
DedeProGames 
posted an update about 10 hours ago
🔥 GRM-2.5 - The most POWERFUL model for local inference

GRM-2.5 is the newest model from Orion LLM Labs. It has consistent RAW reasoning and generates very precise responses, similar to large models, while staying at just 4B parameters.

The GRM-2.5 family consists of these models:
OrionLLM/GRM-2.5 (4b)
OrionLLM/GRM-2.5-Air (0.8b)

Furthermore, GRM-2.5 is the best option for local agentic environments, excelling at code, terminal-agent tasks, etc. It can generate 1,000 lines of consistent code and program like large models.
GRM-2.5 is also the best base for fine-tuning to date, and it has vision, meaning it can interpret images and videos.
eaddario 
posted an update 6 days ago
Experimental global target bits‑per‑weight quantization of Qwen/Qwen3.5-4B and Qwen/Qwen3.5-9B

Unlike standard llama.cpp quantizations that rely on fixed type heuristics (e.g., Q4_K_M), the Target BPW approach optimizes per-tensor precision where it matters the most, and produces high quality models that meet a precise global file size target.

Key Advantages:
- VRAM Maximization: Can generate high quality models sized exactly to fit hardware constraints (e.g., fitting the model into exactly 24GB VRAM).
- Data-Driven Precision: Quantization mix is determined by actual weight error sensitivity rather than hardcoded rules, often yielding better PPL/KLD size trade-offs.
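The allocation idea behind a global BPW target can be sketched as a greedy bit-budget assignment: start every tensor at the lowest precision, then repeatedly upgrade whichever tensor buys the largest sensitivity-weighted error reduction per extra bit until the budget is spent. This only illustrates the principle; llama.cpp's actual error estimator and quant types differ, and the `4^-bits` error model below is a crude proxy:

```python
def allocate_bits(sensitivities, sizes, target_bpw, choices=(2, 3, 4, 5, 6, 8)):
    """Greedy per-tensor precision assignment under a global
    bits-per-weight budget. Error proxy: sensitivity * 4**-bits."""
    bits = [min(choices)] * len(sizes)
    budget = target_bpw * sum(sizes) - sum(b * s for b, s in zip(bits, sizes))

    def gain(i):
        cur = bits[i]
        nxt = next((c for c in choices if c > cur), None)
        if nxt is None:
            return None  # already at max precision
        err_drop = sensitivities[i] * (4.0 ** -cur - 4.0 ** -nxt)
        cost = (nxt - cur) * sizes[i]  # extra bits spent
        return (err_drop / cost, nxt, cost)

    while True:
        # Consider only upgrades that still fit in the budget.
        cands = [(g, i) for i in range(len(sizes))
                 if (g := gain(i)) is not None and g[2] <= budget]
        if not cands:
            break
        (_, nxt, cost), i = max(cands, key=lambda c: c[0][0])
        bits[i] = nxt
        budget -= cost
    return bits

# Toy model: the middle tensor is far more sensitive, so it gets
# more bits while the file still averages the 4.0 bpw target.
print(allocate_bits([1.0, 20.0, 1.0], [100, 100, 100], target_bpw=4.0))
# [3, 6, 3]
```

The same skeleton extends naturally to "fit exactly 24GB": convert the VRAM cap into a total-bit budget and let the greedy loop fill it.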

Full benchmarks (PPL, KLD, ARC, MMLU, etc.) and methodology are in the model cards.

eaddario/Qwen3.5-4B-GGUF
eaddario/Qwen3.5-9B-GGUF