Get trending papers in your email inbox once a day!
Get trending papers in your email inbox!
SubscribeStreetSurfaceVis: a dataset of crowdsourced street-level imagery with semi-automated annotations of road surface type and quality
Road unevenness significantly impacts the safety and comfort of various traffic participants, especially vulnerable road users such as cyclists and wheelchair users. This paper introduces StreetSurfaceVis, a novel dataset comprising 9,122 street-level images collected from a crowdsourcing platform and manually annotated by road surface type and quality. The dataset is intended to train models for comprehensive surface assessments of road networks. Existing open datasets are constrained by limited geospatial coverage and camera setups, typically excluding cycleways and footways. By crafting a heterogeneous dataset, we aim to fill this gap and enable robust models that maintain high accuracy across diverse image sources. However, the frequency distribution of road surface types and qualities is highly imbalanced. We address the challenge of ensuring sufficient images per class while reducing manual annotation by proposing a sampling strategy that incorporates various external label prediction resources. More precisely, we estimate the impact of (1) enriching the image data with OpenStreetMap tags, (2) iterative training and application of a custom surface type classification model, (3) amplifying underrepresented classes through prompt-based classification with GPT-4o or similarity search using image embeddings. We show that utilizing a combination of these strategies effectively reduces manual annotation workload while ensuring sufficient class representation.
RDD2022: A multi-national image dataset for automatic Road Damage Detection
The data article describes the Road Damage Dataset, RDD2022, which comprises 47,420 road images from six countries, Japan, India, the Czech Republic, Norway, the United States, and China. The images have been annotated with more than 55,000 instances of road damage. Four types of road damage, namely longitudinal cracks, transverse cracks, alligator cracks, and potholes, are captured in the dataset. The annotated dataset is envisioned for developing deep learning-based methods to detect and classify road damage automatically. The dataset has been released as a part of the Crowd sensing-based Road Damage Detection Challenge (CRDDC2022). The challenge CRDDC2022 invites researchers from across the globe to propose solutions for automatic road damage detection in multiple countries. The municipalities and road agencies may utilize the RDD2022 dataset, and the models trained using RDD2022 for low-cost automatic monitoring of road conditions. Further, computer vision and machine learning researchers may use the dataset to benchmark the performance of different algorithms for other image-based applications of the same type (classification, object detection, etc.).
R2S100K: Road-Region Segmentation Dataset For Semi-Supervised Autonomous Driving in the Wild
Semantic understanding of roadways is a key enabling factor for safe autonomous driving. However, existing autonomous driving datasets provide well-structured urban roads while ignoring unstructured roadways containing distress, potholes, water puddles, and various kinds of road patches i.e., earthen, gravel etc. To this end, we introduce Road Region Segmentation dataset (R2S100K) -- a large-scale dataset and benchmark for training and evaluation of road segmentation in aforementioned challenging unstructured roadways. R2S100K comprises 100K images extracted from a large and diverse set of video sequences covering more than 1000 KM of roadways. Out of these 100K privacy respecting images, 14,000 images have fine pixel-labeling of road regions, with 86,000 unlabeled images that can be leveraged through semi-supervised learning methods. Alongside, we present an Efficient Data Sampling (EDS) based self-training framework to improve learning by leveraging unlabeled data. Our experimental results demonstrate that the proposed method significantly improves learning methods in generalizability and reduces the labeling cost for semantic segmentation tasks. Our benchmark will be publicly available to facilitate future research at https://r2s100k.github.io/.
RoMe: Towards Large Scale Road Surface Reconstruction via Mesh Representation
In autonomous driving applications, accurate and efficient road surface reconstruction is paramount. This paper introduces RoMe, a novel framework designed for the robust reconstruction of large-scale road surfaces. Leveraging a unique mesh representation, RoMe ensures that the reconstructed road surfaces are accurate and seamlessly aligned with semantics. To address challenges in computational efficiency, we propose a waypoint sampling strategy, enabling RoMe to reconstruct vast environments by focusing on sub-areas and subsequently merging them. Furthermore, we incorporate an extrinsic optimization module to enhance the robustness against inaccuracies in extrinsic calibration. Our extensive evaluations of both public datasets and wild data underscore RoMe's superiority in terms of speed, accuracy, and robustness. For instance, it costs only 2 GPU hours to recover a road surface of 600*600 square meters from thousands of images. Notably, RoMe's capability extends beyond mere reconstruction, offering significant value for auto-labeling tasks in autonomous driving applications. All related data and code are available at https://github.com/DRosemei/RoMe.
Probabilistic road classification in historical maps using synthetic data and deep learning
Historical maps are invaluable for analyzing long-term changes in transportation and spatial development, offering a rich source of data for evolutionary studies. However, digitizing and classifying road networks from these maps is often expensive and time-consuming, limiting their widespread use. Recent advancements in deep learning have made automatic road extraction from historical maps feasible, yet these methods typically require large amounts of labeled training data. To address this challenge, we introduce a novel framework that integrates deep learning with geoinformation, computer-based painting, and image processing methodologies. This framework enables the extraction and classification of roads from historical maps using only road geometries without needing road class labels for training. The process begins with training of a binary segmentation model to extract road geometries, followed by morphological operations, skeletonization, vectorization, and filtering algorithms. Synthetic training data is then generated by a painting function that artificially re-paints road segments using predefined symbology for road classes. Using this synthetic data, a deep ensemble is trained to generate pixel-wise probabilities for road classes to mitigate distribution shift. These predictions are then discretized along the extracted road geometries. Subsequently, further processing is employed to classify entire roads, enabling the identification of potential changes in road classes and resulting in a labeled road class dataset. Our method achieved completeness and correctness scores of over 94% and 92%, respectively, for road class 2, the most prevalent class in the two Siegfried Map sheets from Switzerland used for testing. This research offers a powerful tool for urban planning and transportation decision-making by efficiently extracting and classifying roads from historical maps.
SkyScapes -- Fine-Grained Semantic Understanding of Aerial Scenes
Understanding the complex urban infrastructure with centimeter-level accuracy is essential for many applications from autonomous driving to mapping, infrastructure monitoring, and urban management. Aerial images provide valuable information over a large area instantaneously; nevertheless, no current dataset captures the complexity of aerial scenes at the level of granularity required by real-world applications. To address this, we introduce SkyScapes, an aerial image dataset with highly-accurate, fine-grained annotations for pixel-level semantic labeling. SkyScapes provides annotations for 31 semantic categories ranging from large structures, such as buildings, roads and vegetation, to fine details, such as 12 (sub-)categories of lane markings. We have defined two main tasks on this dataset: dense semantic segmentation and multi-class lane-marking prediction. We carry out extensive experiments to evaluate state-of-the-art segmentation methods on SkyScapes. Existing methods struggle to deal with the wide range of classes, object sizes, scales, and fine details present. We therefore propose a novel multi-task model, which incorporates semantic edge detection and is better tuned for feature extraction from a wide range of scales. This model achieves notable improvements over the baselines in region outlines and level of detail on both tasks.
SensatUrban: Learning Semantics from Urban-Scale Photogrammetric Point Clouds
With the recent availability and affordability of commercial depth sensors and 3D scanners, an increasing number of 3D (i.e., RGBD, point cloud) datasets have been publicized to facilitate research in 3D computer vision. However, existing datasets either cover relatively small areas or have limited semantic annotations. Fine-grained understanding of urban-scale 3D scenes is still in its infancy. In this paper, we introduce SensatUrban, an urban-scale UAV photogrammetry point cloud dataset consisting of nearly three billion points collected from three UK cities, covering 7.6 km^2. Each point in the dataset has been labelled with fine-grained semantic annotations, resulting in a dataset that is three times the size of the previous existing largest photogrammetric point cloud dataset. In addition to the more commonly encountered categories such as road and vegetation, urban-level categories including rail, bridge, and river are also included in our dataset. Based on this dataset, we further build a benchmark to evaluate the performance of state-of-the-art segmentation algorithms. In particular, we provide a comprehensive analysis and identify several key challenges limiting urban-scale point cloud understanding. The dataset is available at http://point-cloud-analysis.cs.ox.ac.uk.
Nexar Dashcam Collision Prediction Dataset and Challenge
This paper presents the Nexar Dashcam Collision Prediction Dataset and Challenge, designed to support research in traffic event analysis, collision prediction, and autonomous vehicle safety. The dataset consists of 1,500 annotated video clips, each approximately 40 seconds long, capturing a diverse range of real-world traffic scenarios. Videos are labeled with event type (collision/near-collision vs. normal driving), environmental conditions (lighting conditions and weather), and scene type (urban, rural, highway, etc.). For collision and near-collision cases, additional temporal labels are provided, including the precise moment of the event and the alert time, marking when the collision first becomes predictable. To advance research on accident prediction, we introduce the Nexar Dashcam Collision Prediction Challenge, a public competition on top of this dataset. Participants are tasked with developing machine learning models that predict the likelihood of an imminent collision, given an input video. Model performance is evaluated using the average precision (AP) computed across multiple intervals before the accident (i.e. 500 ms, 1000 ms, and 1500 ms prior to the event), emphasizing the importance of early and reliable predictions. The dataset is released under an open license with restrictions on unethical use, ensuring responsible research and innovation.
OpenSatMap: A Fine-grained High-resolution Satellite Dataset for Large-scale Map Construction
In this paper, we propose OpenSatMap, a fine-grained, high-resolution satellite dataset for large-scale map construction. Map construction is one of the foundations of the transportation industry, such as navigation and autonomous driving. Extracting road structures from satellite images is an efficient way to construct large-scale maps. However, existing satellite datasets provide only coarse semantic-level labels with a relatively low resolution (up to level 19), impeding the advancement of this field. In contrast, the proposed OpenSatMap (1) has fine-grained instance-level annotations; (2) consists of high-resolution images (level 20); (3) is currently the largest one of its kind; (4) collects data with high diversity. Moreover, OpenSatMap covers and aligns with the popular nuScenes dataset and Argoverse 2 dataset to potentially advance autonomous driving technologies. By publishing and maintaining the dataset, we provide a high-quality benchmark for satellite-based map construction and downstream tasks like autonomous driving.
Graph Representation Learning for Road Type Classification
We present a novel learning-based approach to graph representations of road networks employing state-of-the-art graph convolutional neural networks. Our approach is applied to realistic road networks of 17 cities from Open Street Map. While edge features are crucial to generate descriptive graph representations of road networks, graph convolutional networks usually rely on node features only. We show that the highly representative edge features can still be integrated into such networks by applying a line graph transformation. We also propose a method for neighborhood sampling based on a topological neighborhood composed of both local and global neighbors. We compare the performance of learning representations using different types of neighborhood aggregation functions in transductive and inductive tasks and in supervised and unsupervised learning. Furthermore, we propose a novel aggregation approach, Graph Attention Isomorphism Network, GAIN. Our results show that GAIN outperforms state-of-the-art methods on the road type classification problem.
Dense Road Surface Grip Map Prediction from Multimodal Image Data
Slippery road weather conditions are prevalent in many regions and cause a regular risk for traffic. Still, there has been less research on how autonomous vehicles could detect slippery driving conditions on the road to drive safely. In this work, we propose a method to predict a dense grip map from the area in front of the car, based on postprocessed multimodal sensor data. We trained a convolutional neural network to predict pixelwise grip values from fused RGB camera, thermal camera, and LiDAR reflectance images, based on weakly supervised ground truth from an optical road weather sensor. The experiments show that it is possible to predict dense grip values with good accuracy from the used data modalities as the produced grip map follows both ground truth measurements and local weather conditions, such as snowy areas on the road. The model using only the RGB camera or LiDAR reflectance modality provided good baseline results for grip prediction accuracy while using models fusing the RGB camera, thermal camera, and LiDAR modalities improved the grip predictions significantly.
CurbNet: Curb Detection Framework Based on LiDAR Point Cloud Segmentation
Curb detection is a crucial function in intelligent driving, essential for determining drivable areas on the road. However, the complexity of road environments makes curb detection challenging. This paper introduces CurbNet, a novel framework for curb detection utilizing point cloud segmentation. To address the lack of comprehensive curb datasets with 3D annotations, we have developed the 3D-Curb dataset based on SemanticKITTI, currently the largest and most diverse collection of curb point clouds. Recognizing that the primary characteristic of curbs is height variation, our approach leverages spatially rich 3D point clouds for training. To tackle the challenges posed by the uneven distribution of curb features on the xy-plane and their dependence on high-frequency features along the z-axis, we introduce the Multi-Scale and Channel Attention (MSCA) module, a customized solution designed to optimize detection performance. Additionally, we propose an adaptive weighted loss function group specifically formulated to counteract the imbalance in the distribution of curb point clouds relative to other categories. Extensive experiments conducted on 2 major datasets demonstrate that our method surpasses existing benchmarks set by leading curb detection and point cloud segmentation models. Through the post-processing refinement of the detection results, we have significantly reduced noise in curb detection, thereby improving precision by 4.5 points. Similarly, our tolerance experiments also achieve state-of-the-art results. Furthermore, real-world experiments and dataset analyses mutually validate each other, reinforcing CurbNet's superior detection capability and robust generalizability. The project website is available at: https://github.com/guoyangzhao/CurbNet/.
Text2Street: Controllable Text-to-image Generation for Street Views
Text-to-image generation has made remarkable progress with the emergence of diffusion models. However, it is still a difficult task to generate images for street views based on text, mainly because the road topology of street scenes is complex, the traffic status is diverse and the weather condition is various, which makes conventional text-to-image models difficult to deal with. To address these challenges, we propose a novel controllable text-to-image framework, named Text2Street. In the framework, we first introduce the lane-aware road topology generator, which achieves text-to-map generation with the accurate road structure and lane lines armed with the counting adapter, realizing the controllable road topology generation. Then, the position-based object layout generator is proposed to obtain text-to-layout generation through an object-level bounding box diffusion strategy, realizing the controllable traffic object layout generation. Finally, the multiple control image generator is designed to integrate the road topology, object layout and weather description to realize controllable street-view image generation. Extensive experiments show that the proposed approach achieves controllable street-view text-to-image generation and validates the effectiveness of the Text2Street framework for street views.
Shaded Route Planning Using Active Segmentation and Identification of Satellite Images
Heatwaves pose significant health risks, particularly due to prolonged exposure to high summer temperatures. Vulnerable groups, especially pedestrians and cyclists on sun-exposed sidewalks, motivate the development of a route planning method that incorporates somatosensory temperature effects through shade ratio consideration. This paper is the first to introduce a pipeline that utilizes segmentation foundation models to extract shaded areas from high-resolution satellite images. These areas are then integrated into a multi-layered road map, enabling users to customize routes based on a balance between distance and shade exposure, thereby enhancing comfort and health during outdoor activities. Specifically, we construct a graph-based representation of the road map, where links indicate connectivity and are updated with shade ratio data for dynamic route planning. This system is already implemented online, with a video demonstration, and will be specifically adapted to assist travelers during the 2024 Olympic Games in Paris.
Towards Satellite Image Road Graph Extraction: A Global-Scale Dataset and A Novel Method
Recently, road graph extraction has garnered increasing attention due to its crucial role in autonomous driving, navigation, etc. However, accurately and efficiently extracting road graphs remains a persistent challenge, primarily due to the severe scarcity of labeled data. To address this limitation, we collect a global-scale satellite road graph extraction dataset, i.e. Global-Scale dataset. Specifically, the Global-Scale dataset is sim20 times larger than the largest existing public road extraction dataset and spans over 13,800 km^2 globally. Additionally, we develop a novel road graph extraction model, i.e. SAM-Road++, which adopts a node-guided resampling method to alleviate the mismatch issue between training and inference in SAM-Road, a pioneering state-of-the-art road graph extraction model. Furthermore, we propose a simple yet effective ``extended-line'' strategy in SAM-Road++ to mitigate the occlusion issue on the road. Extensive experiments demonstrate the validity of the collected Global-Scale dataset and the proposed SAM-Road++ method, particularly highlighting its superior predictive power in unseen regions. The dataset and code are available at https://github.com/earth-insights/samroadplus.
An Image-Based Path Planning Algorithm Using a UAV Equipped with Stereo Vision
This paper presents a novel image-based path planning algorithm that was developed using computer vision techniques, as well as its comparative analysis with well-known deterministic and probabilistic algorithms, namely A* and Probabilistic Road Map algorithm (PRM). The terrain depth has a significant impact on the calculated path safety. The craters and hills on the surface cannot be distinguished in a two-dimensional image. The proposed method uses a disparity map of the terrain that is generated by using a UAV. Several computer vision techniques, including edge, line and corner detection methods, as well as the stereo depth reconstruction technique, are applied to the captured images and the found disparity map is used to define candidate way-points of the trajectory. The initial and desired points are detected automatically using ArUco marker pose estimation and circle detection techniques. After presenting the mathematical model and vision techniques, the developed algorithm is compared with well-known algorithms on different virtual scenes created in the V-REP simulation program and a physical setup created in a laboratory environment. Results are promising and demonstrate effectiveness of the proposed algorithm.
TACK Tunnel Data (TTD): A Benchmark Dataset for Deep Learning-Based Defect Detection in Tunnels
Tunnels are essential elements of transportation infrastructure, but are increasingly affected by ageing and deterioration mechanisms such as cracking. Regular inspections are required to ensure their safety, yet traditional manual procedures are time-consuming, subjective, and costly. Recent advances in mobile mapping systems and Deep Learning (DL) enable automated visual inspections. However, their effectiveness is limited by the scarcity of tunnel datasets. This paper introduces a new publicly available dataset containing annotated images of three different tunnel linings, capturing typical defects: cracks, leaching, and water infiltration. The dataset is designed to support supervised, semi-supervised, and unsupervised DL methods for defect detection and segmentation. Its diversity in texture and construction techniques also enables investigation of model generalization and transferability across tunnel types. By addressing the critical lack of domain-specific data, this dataset contributes to advancing automated tunnel inspection and promoting safer, more efficient infrastructure maintenance strategies.
Enhancing Pothole Detection and Characterization: Integrated Segmentation and Depth Estimation in Road Anomaly Systems
Road anomaly detection plays a crucial role in road maintenance and in enhancing the safety of both drivers and vehicles. Recent machine learning approaches for road anomaly detection have overcome the tedious and time-consuming process of manual analysis and anomaly counting; however, they often fall short in providing a complete characterization of road potholes. In this paper, we leverage transfer learning by adopting a pre-trained YOLOv8-seg model for the automatic characterization of potholes using digital images captured from a dashboard-mounted camera. Our work includes the creation of a novel dataset, comprising both images and their corresponding depth maps, collected from diverse road environments in Al-Khobar city and the KFUPM campus in Saudi Arabia. Our approach performs pothole detection and segmentation to precisely localize potholes and calculate their area. Subsequently, the segmented image is merged with its depth map to extract detailed depth information about the potholes. This integration of segmentation and depth data offers a more comprehensive characterization compared to previous deep learning-based road anomaly detection systems. Overall, this method not only has the potential to significantly enhance autonomous vehicle navigation by improving the detection and characterization of road hazards but also assists road maintenance authorities in responding more effectively to road damage.
Improved Pothole Detection Using YOLOv7 and ESRGAN
Potholes are common road hazards that is causing damage to vehicles and posing a safety risk to drivers. The introduction of Convolutional Neural Networks (CNNs) is widely used in the industry for object detection based on Deep Learning methods and has achieved significant progress in hardware improvement and software implementations. In this paper, a unique better algorithm is proposed to warrant the use of low-resolution cameras or low-resolution images and video feed for automatic pothole detection using Super Resolution (SR) through Super Resolution Generative Adversarial Networks (SRGANs). Then we have proceeded to establish a baseline pothole detection performance on low quality and high quality dashcam images using a You Only Look Once (YOLO) network, namely the YOLOv7 network. We then have illustrated and examined the speed and accuracy gained above the benchmark after having upscaling implementation on the low quality images.
SIO-Mapper: A Framework for Lane-Level HD Map Construction Using Satellite Images and OpenStreetMap with No On-Site Visits
High-definition (HD) maps, particularly those containing lane-level information regarded as ground truth, are crucial for vehicle localization research. Traditionally, constructing HD maps requires highly accurate sensor measurements collection from the target area, followed by manual annotation to assign semantic information. Consequently, HD maps are limited in terms of geographic coverage. To tackle this problem, in this paper, we propose SIO-Mapper, a novel lane-level HD map construction framework that constructs city-scale maps without physical site visits by utilizing satellite images and OpenStreetmap data. One of the key contributions of SIO-Mapper is its ability to extract lane information more accurately by introducing SIO-Net, a novel deep learning network that integrates features from satellite image and OpenStreetmap using both Transformer-based and convolution-based encoders. Furthermore, to overcome challenges in merging lanes over large areas, we introduce a novel lane integration methodology that combines cluster-based and graph-based approaches. This algorithm ensures the seamless aggregation of lane segments with high accuracy and coverage, even in complex road environments. We validated SIO-Mapper on the Naver Labs Open Dataset and NuScenes dataset, demonstrating better performance in various environments including Korea, the United States, and Singapore compared to the state-of-the-art lane-level HD mapconstruction methods.
IDD-3D: Indian Driving Dataset for 3D Unstructured Road Scenes
Autonomous driving and assistance systems rely on annotated data from traffic and road scenarios to model and learn the various object relations in complex real-world scenarios. Preparation and training of deploy-able deep learning architectures require the models to be suited to different traffic scenarios and adapt to different situations. Currently, existing datasets, while large-scale, lack such diversities and are geographically biased towards mainly developed cities. An unstructured and complex driving layout found in several developing countries such as India poses a challenge to these models due to the sheer degree of variations in the object types, densities, and locations. To facilitate better research toward accommodating such scenarios, we build a new dataset, IDD-3D, which consists of multi-modal data from multiple cameras and LiDAR sensors with 12k annotated driving LiDAR frames across various traffic scenarios. We discuss the need for this dataset through statistical comparisons with existing datasets and highlight benchmarks on standard 3D object detection and tracking tasks in complex layouts. Code and data available at https://github.com/shubham1810/idd3d_kit.git
