LiDAR-Camera Cooperative Semantic Segmentation
Graphical Abstract
Abstract
LiDAR and cameras are two prominent sensing modalities for parsing scene semantics. While the former provides accurate physical measurements, it lacks the colour and texture information at which the latter excels. Fully exploiting the rich information in multimodal data benefits comprehensive perception of the environment. To cope with the dual challenges of heterogeneity and consistency faced by multimodal features, we propose a unified multimodal cooperative segmentation workflow. By establishing cross-view cooperation paths, we achieve cross-view feature interaction and missing-modality completion. A pre-synchronisation mechanism preserves semantic and geometric alignment while decoupling the augmentation of each modality's data. Notably, our workflow jointly performs LiDAR-based 3D semantic segmentation and image-based 2D semantic segmentation, with promising results on two public benchmarks: the SemanticKITTI dataset and the Waymo Open dataset.