All You Need Is A Fuzzing Brain: An LLM-Powered System for Automated Vulnerability Detection and Patching
Our team, All You Need Is A Fuzzing Brain, was one of seven finalists in DARPA's Artificial Intelligence Cyber Challenge (AIxCC), placing fourth in the final round. During the competition, we developed a Cyber Reasoning System (CRS) that autonomously discovered 28 security vulnerabilities, including six previously unknown zero-days, in real-world open-source C and Java projects, and successfully patched 14 of them. The complete CRS is open source at https://github.com/o2lab/afc-crs-all-you-need-is-a-fuzzing-brain. This paper provides a detailed technical description of our CRS, with an emphasis on its LLM-powered components and strategies. Building on AIxCC, we further introduce a public leaderboard for benchmarking state-of-the-art LLMs on vulnerability detection and patching tasks, derived from the AIxCC dataset. The leaderboard is available at https://o2lab.github.io/FuzzingBrain-Leaderboard/.
8 authors · Sep 8, 2025

Benchmarking Spatiotemporal Reasoning in LLMs and Reasoning Models: Capabilities and Challenges
Spatiotemporal reasoning plays a key role in Cyber-Physical Systems (CPS). Despite advances in Large Language Models (LLMs) and Large Reasoning Models (LRMs), their capacity to reason about complex spatiotemporal signals remains underexplored. This paper proposes a hierarchical SpatioTemporal reAsoning benchmaRK, STARK, to systematically evaluate LLMs across three levels of reasoning complexity: state estimation (e.g., predicting field variables, localizing and tracking events in space and time), spatiotemporal reasoning over states (e.g., inferring spatial-temporal relationships), and world-knowledge-aware reasoning that integrates contextual and domain knowledge (e.g., intent prediction, landmark-aware navigation). We curate 26 distinct spatiotemporal tasks with diverse sensor modalities, comprising 14,552 challenges in which models answer either directly or via a Python code interpreter. Evaluating 3 LRMs and 8 LLMs, we find that LLMs achieve limited success in tasks requiring geometric reasoning (e.g., multilateration or triangulation), particularly as complexity increases. Surprisingly, LRMs show robust performance across tasks of varying difficulty, often competing with or surpassing traditional first-principles-based methods. Our results show that in reasoning tasks requiring world knowledge, the performance gap between LLMs and LRMs narrows, with some LLMs even surpassing LRMs. However, the LRM o3 continues to achieve leading performance across all evaluated tasks, a result attributed primarily to the larger size of the reasoning models. STARK motivates future innovations in model architectures and reasoning paradigms for intelligent CPS by providing a structured framework to identify limitations in the spatiotemporal reasoning of LLMs and LRMs.
5 authors · May 16, 2025

Semantic Association Rule Learning from Time Series Data and Knowledge Graphs
Digital Twins (DT) are a promising concept in cyber-physical systems research due to their advanced features, including monitoring and automated reasoning. Semantic technologies such as Knowledge Graphs (KG) have recently been adopted in DTs, especially for information modelling. Building on this trend, this paper proposes a pipeline for semantic association rule learning in DTs using KGs and time series data. In addition to this initial pipeline, we also propose a new semantic association rule criterion. The approach is evaluated on an industrial water network scenario. Initial evaluation shows that the proposed approach is able to learn a large number of semantically enriched association rules that are more generalizable. The paper aims to set a foundation for further work on semantic association rule learning, especially in the context of industrial applications.
3 authors · Oct 11, 2023