When will this leaderboard update the evaluations of the new models?
thank you
Hi @JaxonJP
We try to update the leaderboard on a monthly basis. Which new models are you interested in seeing added, and on which benchmarks?
Thank you for maintaining this project. I’ve been following it closely, especially because I’m very interested in the application of SOTA models to IC design.
For future updates, I’d especially like to see evaluations of the following models:
Open-source models:
Kimi 2.5
MiniMax 2.5
GLM 5
Step-3.5-Flash
Closed-source models:
GPT-5.4 / 5.3 / 5.2
It would be great to compare their performance on the current benchmark suite in the leaderboard.
This month is almost over. Will the usual monthly update still proceed as scheduled?
FYI @JaxonJP
We just added the following models:
- Kimi-K2.5
- GLM-5-FP8
- IndustrialCoder 32B
- IndustrialCoder 32B Thinking
onto the following benchmarks:
- Spec-to-RTL task:
  - VerilogEval 2.0
  - RTLLM 2.0
- Code Completion task:
  - VerilogEval 2.0
  - VeriGen
- Line Completion task:
  - RTL-Repo
Both Kimi-K2.5 and GLM-5 are SOTA relative to the previous models we had, (finally) surpassing DeepSeek-R1-0528. IndustrialCoder 32B is the new best RTL-specialized model.
All models were run on the latest stable vLLM release, 0.18.
We are working on adding all of the above models to the "NotSoTiny" benchmark; after that, we will close out this release.
Thanks for noticing our work,
Thanks for the update — this is great progress.
One quick question: when do you expect testing for Gemma 4 31B-it to get on the roadmap?
It was just released on April 2, and based on the official information so far, its performance looks very strong — potentially competitive with Kimi 2.5. I’m especially eager to see how it performs on IC design tasks.