TuRTLe-Leaderboard


When will this leaderboard update the evaluations of the new models?

by JaxonJP - opened Feb 24

Discussion

JaxonJP

Feb 24

Thank you!

ggcristian

HPAI@BSC (High Performance Artificial Intelligence at Barcelona Supercomputing Center) org Mar 2

Hi @JaxonJP

We try to update the leaderboard on a monthly basis. Which new models are you interested in seeing added, and on which benchmarks?

JaxonJP

Mar 9

Thank you for maintaining this project. I’ve been following it closely, especially because I’m very interested in the application of SOTA models to IC design.

For future updates, I’d especially like to see evaluations of the following models:

Open-source models:

Kimi 2.5

MiniMax 2.5

GLM 5

Step-3.5-Flash

Closed-source models:

GPT-5.4 / 5.3 / 5.2

It would be great to compare their performance on the current benchmark suite in the leaderboard.

JaxonJP

24 days ago

This month is almost over; will the usual monthly update still go ahead as scheduled?

ggcristian

HPAI@BSC (High Performance Artificial Intelligence at Barcelona Supercomputing Center) org 16 days ago

FYI @JaxonJP

We just added the following models:

Kimi-K2.5
GLM-5-FP8
IndustrialCoder 32B
IndustrialCoder 32B Thinking

to the following benchmarks:

Spec-to-RTL task:
- VerilogEval 2.0
- RTLLM 2.0
Code Completion task:
- VerilogEval 2.0
- VeriGen
Line Completion task:
- RTL-Repo

Both Kimi-K2.5 and GLM-5 are now SOTA relative to the models we had previously evaluated, finally surpassing DeepSeek-R1-0528. IndustrialCoder 32B is the new best RTL-specialized model.
All models were run on the latest stable release of vLLM (0.18).
We are working on adding all of the above models to the "NotSoTiny" benchmark; once that is done, we will close out this release.

Thanks for following our work,

JaxonJP

14 days ago

Thanks for the update; this is great progress.

One quick question: when do you expect evaluation of Gemma 4 31B-it to be added to the roadmap?

It was released on April 2, and based on the official information so far, its performance looks very strong, potentially competitive with Kimi 2.5. I'm especially eager to see how it performs on IC design tasks.
