Event-based Graph Representation with Spatial and Motion Vectors for Asynchronous Object Detection

WACV 2026

Aayush Atul Verma Arpitsinh Vaghela Bharatesh Chakravarthi
Kaustav Chanda Yezhou Yang

Arizona State University

eGSMV is a spatiotemporal multigraph framework for event-based object detection that models raw event streams with separate spatial and temporal neighborhoods, enabling efficient asynchronous inference while preserving sparsity and temporal granularity.

Abstract

Event-based sensors offer high temporal resolution and low latency by generating sparse, asynchronous data. However, converting this irregular data into dense tensors for standard neural networks diminishes these advantages and increases computational cost. In this work, we propose eGSMV, a novel spatiotemporal multigraph representation that constructs two decoupled graphs: a spatial graph leveraging B-spline basis functions to model global structure, and a temporal graph utilizing motion vector-based attention for local dynamic changes. This design enables efficient 2D kernels in place of expensive 3D kernels while preserving the sparsity and asynchrony of events.

We evaluate eGSMV on the Gen1 automotive and eTraM datasets for event-based object detection. It improves detection accuracy by more than 6% over previous graph-based methods while running 5× faster, using fewer parameters, and incurring no increase in computational cost. These results highlight the effectiveness of structured graph modeling for asynchronous vision and show that explicitly modeling spatial and temporal neighborhoods is key to high-performance event-based detection.

Spatiotemporal Multigraph Architecture

eGSMV architecture diagram (placeholder)

eGSMV represents each event as a node in a 3D spatiotemporal space and builds a multigraph with independent spatial and temporal neighborhoods. The Spatial Structure Learning (SSL) branch uses anisotropic 2D spline kernels over an ellipsoidal neighborhood to capture local spatial structure without resorting to dense 3D convolutions. The Motion Vector Learning (MVL) branch aggregates temporal neighbors with motion-vector edge features and GATv2-based attention to model velocity and brightness changes over time.
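The idea of two independent neighborhoods over the same event nodes can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function name `build_multigraph`, the radii `r_xy`, `r_t`, `tau_xy`, `tau_t`, the causal "earlier to later" temporal rule, and the displacement-over-time motion vector are all assumptions chosen for illustration, and the dense O(N²) pairwise computation is only suitable for toy inputs.

```python
import numpy as np

def build_multigraph(events, r_xy=3.0, r_t=0.05, tau_xy=1.0, tau_t=0.02):
    """Build two edge sets over the same event nodes (illustrative only).

    events: (N, 4) array of (x, y, t, polarity).
    All thresholds are hypothetical values, not taken from the paper.
    """
    xy = events[:, :2]
    t = events[:, 2]
    dxy = xy[None, :, :] - xy[:, None, :]   # dxy[i, j] = xy[j] - xy[i]
    dt = t[None, :] - t[:, None]            # dt[i, j]  = t[j] - t[i]
    dist2 = (dxy ** 2).sum(axis=-1)
    not_self = ~np.eye(len(events), dtype=bool)

    # Spatial neighborhood: ellipsoid with separate spatial and temporal radii.
    inside = dist2 / r_xy**2 + dt**2 / r_t**2 <= 1.0
    spatial_edges = np.argwhere(inside & not_self)

    # Temporal neighborhood (assumed rule): edge i -> j when j fires shortly
    # after i at nearly the same pixel location.
    causal = (dt > 0) & (dt <= tau_t)
    temporal_edges = np.argwhere(causal & (dist2 <= tau_xy**2))

    # Motion-vector edge feature (assumed form): displacement per unit time.
    src, dst = temporal_edges[:, 0], temporal_edges[:, 1]
    motion = dxy[src, dst] / dt[src, dst][:, None]
    return spatial_edges, temporal_edges, motion

# Three events moving right at 1 px per 10 ms, plus one distant event.
events = np.array([
    [10.0, 10.0, 0.00,  1.0],
    [11.0, 10.0, 0.01,  1.0],
    [12.0, 10.0, 0.02,  1.0],
    [40.0, 40.0, 0.02, -1.0],
])
spatial_edges, temporal_edges, motion = build_multigraph(events)
```

In this toy example the three nearby events are mutually connected in the spatial graph, while the temporal graph links only consecutive events and carries their velocity as an edge feature; the distant event stays isolated in both.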

Features from SSL and MVL are fused via lightweight concatenation and an MLP, yielding rich spatiotemporal descriptors while maintaining low computational overhead. An event-level detection head operates directly on the sparse node representations, predicting class probabilities and bounding boxes per event without constructing dense grids, enabling fully asynchronous and sparse end-to-end inference.
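A minimal sketch of concatenation-based fusion followed by a per-event head is given below, assuming hypothetical feature widths, a two-layer MLP, and a four-value box encoding. None of these dimensions or names (`fuse_and_detect`, `init_linear`) come from the paper; the point is only that every step acts row-wise on the sparse node features, so no dense grid is ever built.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_linear(in_dim, out_dim):
    """Random weights for a toy linear layer (illustrative initialization)."""
    return rng.normal(0.0, 0.1, (in_dim, out_dim)), np.zeros(out_dim)

def fuse_and_detect(f_ssl, f_mvl, params, num_classes):
    """Fuse per-event features and predict per-event classes and boxes.

    f_ssl, f_mvl: (N, C_s) and (N, C_t) node features from the two branches.
    Returns class probabilities (N, num_classes) and box values (N, 4).
    """
    (w1, b1), (w2, b2) = params
    h = np.concatenate([f_ssl, f_mvl], axis=1)   # lightweight fusion
    h = np.maximum(h @ w1 + b1, 0.0)             # MLP hidden layer with ReLU
    out = h @ w2 + b2                            # per-event head
    logits, boxes = out[:, :num_classes], out[:, num_classes:]
    logits = logits - logits.max(axis=1, keepdims=True)  # stable softmax
    exp = np.exp(logits)
    probs = exp / exp.sum(axis=1, keepdims=True)
    return probs, boxes

# Hypothetical sizes: 5 events, 16-dim features per branch, 2 classes.
num_events, c_s, c_t, hidden, num_classes = 5, 16, 16, 32, 2
params = [init_linear(c_s + c_t, hidden), init_linear(hidden, num_classes + 4)]
f_ssl = rng.normal(size=(num_events, c_s))
f_mvl = rng.normal(size=(num_events, c_t))
probs, boxes = fuse_and_detect(f_ssl, f_mvl, params, num_classes)
```

Because the output has one row per event, adding a new event only requires computing one more row, which is what makes this style of head compatible with asynchronous updates.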

Results on Gen1 and eTraM

Quantitative results and comparisons (placeholder)

On the Gen1 automotive dataset, eGSMV improves mean Average Precision (mAP) by over 6% compared to prior graph-based approaches while using significantly fewer parameters and lower per-event MFLOPs. Relative to asynchronous graph baselines such as AEGNN and DAGr, eGSMV achieves higher accuracy with comparable or reduced computational cost, benefiting from its 2D kernel design and explicit separation of spatial and temporal neighborhoods.
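One way to see why a 2D kernel design can reduce parameters: in a B-spline convolution, the number of trainable weight matrices grows as the kernel size raised to the dimensionality of the edge pseudo-coordinates. The short calculation below uses hypothetical values (kernel size 3, 64 input and output channels) that are not taken from the paper, and it ignores biases and root weights.

```python
def spline_kernel_params(kernel_size, dim, in_channels, out_channels):
    """Weight count of one B-spline convolution layer:
    kernel_size**dim control points, each an in_channels x out_channels matrix."""
    return kernel_size ** dim * in_channels * out_channels

# Hypothetical layer: kernel size 3, 64 -> 64 channels.
params_2d = spline_kernel_params(3, 2, 64, 64)
params_3d = spline_kernel_params(3, 3, 64, 64)
print(params_2d, params_3d, params_3d / params_2d)  # 36864 110592 3.0
```

Under these assumed settings, dropping one kernel dimension divides the layer's weight count by the kernel size; actual savings depend on the configuration used in eGSMV.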

On the eTraM event-based traffic monitoring dataset, eGSMV similarly outperforms asynchronous methods and narrows the gap to dense, frequency-based detectors, demonstrating strong generalization across different viewpoints and event statistics. These findings underscore the importance of structured spatiotemporal graph modeling for efficient, real-time event-based detection.

BibTeX

@InProceedings{Verma_2026_WACV,
    author    = {Verma, Aayush Atul and Vaghela, Arpitsinh and Chakravarthi, Bharatesh and Chanda, Kaustav and Yang, Yezhou},
    title     = {Event-based Graph Representation with Spatial and Motion Vectors for Asynchronous Object Detection},
    booktitle = {Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)},
    month     = {March},
    year      = {2026},
    pages     = {3781-3791}
}