SEVD: Synthetic Event-based Vision Dataset for Ego and Fixed Traffic Perception

CVPR 2024 Workshop on Synthetic Data for Computer Vision

Manideep Reddy Aliminati*        Bharatesh Chakravarthi*       Aayush Atul Verma        Arpitsinh Vaghela
Hua Wei        Xuesong Zhou        Yezhou Yang
(* Equal Contribution)
Arizona State University


SEVD is a synthetic event-based dataset generated with the CARLA simulator, offering multi-view ego and fixed perception data from dynamic vision sensors. Data sequences are recorded across diverse lighting and weather conditions with domain shifts, and across scenes featuring various classes of objects. Alongside event data, SEVD includes RGB imagery, depth maps, optical flow, and semantic and instance segmentation, facilitating a comprehensive understanding of the scene.

Teaser Video

Abstract

Recently, event-based vision sensors have gained attention for autonomous driving applications, as conventional RGB cameras face limitations in handling challenging dynamic conditions. However, the availability of real-world and synthetic event-based vision datasets remains limited. In response to this gap, we present SEVD, a first-of-its-kind multi-view ego and fixed perception synthetic event-based dataset created using multiple dynamic vision sensors within the CARLA simulator. Data sequences are recorded across diverse lighting (noon, nighttime, twilight) and weather conditions (clear, cloudy, wet, rainy, foggy) with domain shifts (discrete and continuous). SEVD spans urban, suburban, rural, and highway scenes featuring various classes of objects (car, truck, van, bicycle, motorcycle, and pedestrian). Alongside event data, SEVD includes RGB imagery, depth maps, optical flow, and semantic and instance segmentation, facilitating a comprehensive understanding of the scene. Furthermore, we evaluate the dataset using state-of-the-art event-based (RED, RVT) and frame-based (YOLOv8) methods for traffic participant detection tasks and provide baseline benchmarks for assessment. Additionally, we conduct experiments to assess the synthetic event-based dataset's generalization capabilities.


SEVD - Multiview & Multimodality


The SEVD sensor suite comprises a strategically positioned array of sensors of each type (event, RGB, depth, optical flow, semantic, and instance). In ego scenarios, the cameras offer coverage from front to rear, including front-right, front-left, rear-right, and rear-left perspectives, with overlapping FoVs that together provide a comprehensive 360° view. Notably, the rear camera features a wider 110° FoV, while the others have a 70° FoV.
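To illustrate why these FoVs yield full 360° coverage, the following sketch checks every heading around the vehicle against the six camera FoVs. Only the FoV widths (110° rear, 70° elsewhere) come from the dataset description; the yaw angles are assumed for illustration and are not the dataset's exact rig specification.

```python
# Illustrative check that six cameras (five 70° FoVs plus one 110°
# rear FoV) can cover a full 360° around the ego vehicle.
# NOTE: yaw angles below are assumptions for this sketch, not the
# actual SEVD rig calibration.
cameras = {
    "front":       {"yaw": 0,    "fov": 70},
    "front-left":  {"yaw": -60,  "fov": 70},
    "front-right": {"yaw": 60,   "fov": 70},
    "rear-left":   {"yaw": -120, "fov": 70},
    "rear-right":  {"yaw": 120,  "fov": 70},
    "rear":        {"yaw": 180,  "fov": 110},
}

def covered(angle_deg, cams):
    """Return True if any camera's FoV contains the given heading."""
    for cam in cams.values():
        # Smallest signed angular difference between heading and camera yaw
        diff = (angle_deg - cam["yaw"] + 180) % 360 - 180
        if abs(diff) <= cam["fov"] / 2:
            return True
    return False

# Every integer heading falls inside at least one camera's FoV
full_coverage = all(covered(a, cameras) for a in range(360))
print(full_coverage)  # True
```

The five 70° cones plus the 110° rear cone sum to 460° of angular width, so adjacent views overlap while still closing the circle.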


SEVD - Settings and Stats


SEVD offers a diverse range of recordings featuring various combinations of scenes (urban, suburban, rural, and highway), weather (clear, cloudy, wet, rainy, foggy), and lighting conditions (noon, nighttime, twilight). Each recording spans 2 to 30 minutes. SEVD provides a total of 27 hrs of fixed and 31 hrs of ego perception event data. Since it offers an equal volume of data from each of the other sensor types, this results in a cumulative 162 hrs of fixed and 186 hrs of ego perception data. SEVD comprises extensive annotations, including 2D and 3D bounding boxes for six categories (car, truck, bus, bicycle, motorcycle, and pedestrian) of traffic participants, totaling approximately 9M bounding boxes, with cars being the most prevalent category.
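The cumulative totals follow directly from having six sensor modalities, each recorded for the same duration as the event stream; a quick arithmetic check:

```python
# Recording hours stated for the event sensor alone
fixed_event_hrs = 27
ego_event_hrs = 31

# Six modalities per perception type: event, RGB, depth,
# optical flow, semantic segmentation, instance segmentation
num_modalities = 6

print(fixed_event_hrs * num_modalities)  # 162 hrs of fixed perception data
print(ego_event_hrs * num_modalities)    # 186 hrs of ego perception data
```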


Data Samples


BibTeX

@article{aliminati2024sevd,
  title={SEVD: Synthetic Event-based Vision Dataset for Ego and Fixed Traffic Perception},
  author={Aliminati, Manideep Reddy and Chakravarthi, Bharatesh and Verma, Aayush Atul and Vaghela, Arpitsinh and Wei, Hua and Zhou, Xuesong and Yang, Yezhou},
  journal={arXiv preprint arXiv:2404.10540},
  year={2024}
}