# CodeV-SVA: Training Specialized LLMs for Hardware Assertion Generation via RTL-Grounded Bidirectional Data Synthesis
## Introduction

We introduce CodeV-SVA, a family of large language models designed to translate natural-language verification properties into SystemVerilog Assertions (SVAs). For example, given the property "req must be followed by ack within three cycles", the model produces an assertion along the lines of `assert property (@(posedge clk) req |-> ##[1:3] ack);`.

Note:
(1) CodeV-SVA-no-think-no-dut-8B removes the RTL code from the prompt to better align with real-world usage scenarios. This model is still under development.
(2) This model is trained without thinking trajectories; deploy it in Qwen3's non-thinking mode (set `enable_thinking=False`).
## Open-Source Plan
- Model ✓
- Evaluation code ✓
- Paper
- Dataset
- Data synthesis and training code
## Evaluation Results
Results on NL2SVA-Human (w/o DUT):

| Model | Func.@1 | Func.@16 | Func.@32 |
|---|---|---|---|
| SVACoder-8B | 0.408 | 0.680 | 0.685 |
| SVACoder-8B-no-think-no-dut | 0.416 | 0.700 | 0.726 |
This evaluation removes the RTL code from the original NL2SVA-Human benchmark. Some of the information needed to generate an SVA is contained in the RTL, so removing it significantly degrades model performance; we evaluate in this setting solely because it better matches our intended usage scenario. We will further refine the benchmark in future work.
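Func.@k measures functional correctness over k sampled generations per problem. A minimal sketch of the metric, assuming it follows the standard unbiased pass@k estimator (this helper is illustrative, not the official evaluation code):

```python
from math import comb


def func_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k-style estimator: probability that at least one of k
    samples, drawn without replacement from n generations of which c are
    functionally correct, passes."""
    if n - c < k:  # every size-k subset must contain a correct sample
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


# Example: 32 generations per problem, 8 functionally correct.
print(func_at_k(n=32, c=8, k=16))
```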
## Models
| Model | Download |
|---|---|
| CodeV-SVA-8B | 🤗HuggingFace |
| CodeV-SVA-no-think-8B | 🤗HuggingFace |
| CodeV-SVA-no-think-no-dut-8B | 🤗HuggingFace |
| CodeV-SVA-14B | 🤗HuggingFace |
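To fetch a checkpoint locally, you can use `huggingface_hub`; a minimal sketch, where the repo id `wyt2000/CodeV-SVA-14B` is inferred from the citation URL and should be verified against the actual model page:

```python
from huggingface_hub import snapshot_download

# Repo id inferred from the citation URL; verify it on the model page.
snapshot_download(repo_id="wyt2000/CodeV-SVA-14B", local_dir="./CodeV-SVA-14B")
```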
## Usage
```python
from vllm import LLM, SamplingParams
from transformers import AutoTokenizer

MODEL_DIR = "/path/to/CodeV-SVA-no-think-no-dut-8B/"


def get_nl2sva_human_prompt_no_dut(problem: str) -> str:
    return "Create a SVA assertion that checks: " + problem + "\n"


def get_nl2sva_human_prompt_no_dut_chinese(problem: str) -> str:
    # Chinese prompt: "Write an SVA assertion to verify the following property:"
    return "编写一个 SVA 断言来验证以下性质:" + problem + "\n"


if __name__ == "__main__":
    system_prompt = (
        "You are an AI assistant tasked with formal verification of register transfer level (RTL) designs.\n"
        "Your job is to translate a description of an assertion to concrete SystemVerilog Assertion (SVA) implementation.\n"
    )

    model = LLM(MODEL_DIR, tensor_parallel_size=4)
    tokenizer = AutoTokenizer.from_pretrained(MODEL_DIR)
    sampling_params = SamplingParams(
        temperature=0.8,
        top_p=0.95,
        max_tokens=32768,
        n=1,
    )

    # English example.
    problem_en = "upd_rdy must be a one cycle pulse except next cycle upd_vld is valid"
    dialog = [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": get_nl2sva_human_prompt_no_dut(problem_en)},
    ]
    # enable_thinking=False: this model is trained without thinking trajectories.
    prompt = tokenizer.apply_chat_template(
        dialog, tokenize=False, add_generation_prompt=True, enable_thinking=False
    )
    print(f"### Prompt (en):\n{prompt}")
    responses = model.generate(prompt, sampling_params)
    for output in responses[0].outputs:
        print(f"### Response (en):\n{output.text}")

    # Chinese example: "upd_rdy must be a single-clock-cycle pulse, unless
    # upd_vld is valid in the next cycle."
    problem_zh = "upd_rdy 必须是单个时钟周期的脉冲,除非在下一个周期中 upd_vld 为有效"
    dialog = [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": get_nl2sva_human_prompt_no_dut_chinese(problem_zh)},
    ]
    prompt = tokenizer.apply_chat_template(
        dialog, tokenize=False, add_generation_prompt=True, enable_thinking=False
    )
    print(f"### Prompt (zh):\n{prompt}")
    responses = model.generate(prompt, sampling_params)
    for output in responses[0].outputs:
        print(f"### Response (zh):\n{output.text}")
```
## Citation

```bibtex
@misc{CodeV-SVA,
  title={CodeV-SVA: Training Specialized LLMs for Hardware Assertion Generation via RTL-Grounded Bidirectional Data Synthesis},
  author={Yutong Wu and Chenrui Cao and Pengwei Jin and Di Huang and Rui Zhang and Xishan Zhang and Zidong Du and Qi Guo and Xing Hu},
  year={2025},
  howpublished={\url{https://huggingface.co/wyt2000/CodeV-SVA-14B}},
}
```