Event-based sensors offer high temporal resolution and low latency by generating sparse, asynchronous data.
However, converting this irregular data into dense tensors for standard neural networks diminishes these
advantages and increases computational cost. In this work, we propose eGSMV, a novel
spatiotemporal multigraph representation that builds two decoupled graphs: a spatial graph that uses
B-spline basis functions to model global structure, and a temporal graph that uses motion-vector-based
attention to capture local dynamic changes. This design enables efficient 2D kernels in place of expensive 3D
kernels while preserving the sparsity and asynchrony of events.
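
As a rough illustration of what decoupling the spatial and temporal neighborhoods can look like, the sketch below builds two separate edge sets from a small batch of events. It is not the eGSMV construction itself: the function name `build_decoupled_graphs`, the radii `r_xy` and `r_t`, the time window `dt_max`, and the choice of edge attributes are all assumptions made for this example. Spatial edges carry a 2D offset normalized to [0, 1] (the kind of pseudo-coordinate a 2D B-spline kernel could consume), and temporal edges carry a displacement-over-time motion vector.

```python
import numpy as np


def build_decoupled_graphs(events, r_xy=3.0, r_t=1.0, dt_max=0.01):
    """Build a spatial and a temporal edge set from events.

    events: (N, 4) float array with columns (x, y, t, polarity).
    All thresholds are hypothetical values chosen for illustration.
    """
    xy = events[:, :2]
    t = events[:, 2]

    # Pairwise differences; O(N^2) memory, acceptable only for a small sketch.
    dxy = xy[None, :, :] - xy[:, None, :]   # dxy[i, j] = xy[j] - xy[i]
    dist = np.linalg.norm(dxy, axis=-1)
    dt = t[None, :] - t[:, None]            # dt[i, j] = t[j] - t[i]

    # Spatial graph: distinct pixels that are close in the image plane
    # and fall inside the same short time window (undirected).
    spatial_mask = (dist > 0) & (dist <= r_xy) & (np.abs(dt) <= dt_max)

    # Temporal graph: directed edges from an earlier event to a later one
    # at (almost) the same pixel.
    temporal_mask = (dt > 0) & (dt <= dt_max) & (dist <= r_t)

    src_s, dst_s = np.nonzero(spatial_mask)
    src_t, dst_t = np.nonzero(temporal_mask)

    # 2D pseudo-coordinates in [0, 1] for the spatial edges.
    spatial_attr = dxy[src_s, dst_s] / (2.0 * r_xy) + 0.5

    # Motion vector per temporal edge: pixel displacement per unit time.
    motion = dxy[src_t, dst_t] / dt[src_t, dst_t][:, None]

    spatial_graph = (np.stack([src_s, dst_s]), spatial_attr)
    temporal_graph = (np.stack([src_t, dst_t]), motion)
    return spatial_graph, temporal_graph
```

Because each edge set has only 2D attributes, message passing on either graph can use a 2D kernel, while the pairing of the two graphs still covers the full spatiotemporal neighborhood. A practical implementation would replace the dense pairwise computation with a spatial index or an incremental, per-event update.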
We evaluate eGSMV on the Gen1 automotive and eTraM datasets for event-based object detection, where it
improves accuracy by more than 6% over previous graph-based methods while running 5× faster, with fewer
parameters and no increase in computational cost. These results highlight the effectiveness of
structured graph modeling for asynchronous vision and demonstrate that explicitly modeling spatial and
temporal neighborhoods is key to high-performance event-based detection.