Lidar Panoptic Segmentation and Tracking without Bells and Whistles
Abhinav Agarwalla 1,2,* Xuhua Huang 1,3 Jason Ziglar 2 Francesco Ferroni 2 Laura Leal-Taixé 4 James Hays 5 Aljoša Ošep 1 Deva Ramanan 1,2
Carnegie Mellon University 1
Argo AI 2
Meta AI 3
TU Munich 4
Georgia Tech 5
* Work done while at CMU and Argo AI; now at Neural Magic.
IROS 2023



Abstract
State-of-the-art lidar panoptic segmentation (LPS) methods follow a "bottom-up", segmentation-centric approach: they build on semantic segmentation networks and obtain object instances by clustering. In this paper, we rethink this approach and propose a surprisingly simple yet effective detection-centric network for both LPS and tracking. Our network is modular by design and optimized for all aspects of both the panoptic segmentation and tracking tasks. One of its core components is the object instance detection branch, which we train using the point-level (modal) annotations available in segmentation-centric datasets. In the absence of amodal (cuboid) annotations, we regress modal centroids and object extent using trajectory-level supervision, which provides information about object size that cannot be inferred from single scans due to occlusions and the sparse nature of lidar data. We obtain fine-grained instance segments by learning to associate lidar points with detected centroids. We evaluate our method on several 3D/4D LPS benchmarks and observe that our model establishes a new state of the art among open-source models, outperforming recent query-based models.
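To make the point-to-centroid association step concrete, below is a minimal NumPy sketch of one plausible realization: each foreground ("thing") point is shifted by a regressed offset vector and assigned to the nearest detected centroid, yielding instance segments. The function name, the offset representation, and the `max_dist` threshold are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

def associate_points_to_centroids(points, offsets, centroids, max_dist=2.0):
    """Assign each lidar point to its nearest detected centroid.

    points:    (N, 3) xyz coordinates of foreground ("thing") points.
    offsets:   (N, 3) regressed per-point offsets toward the modal
               object centroid (a hypothetical network output).
    centroids: (M, 3) centroids produced by the detection branch.
    max_dist:  points farther than this from every centroid are left
               unassigned (instance id -1); value is an assumption.
    Returns an (N,) array of instance ids in [0, M) or -1.
    """
    if len(centroids) == 0:
        return np.full(len(points), -1, dtype=np.int64)

    # Shift each point by its predicted offset so that points belonging
    # to the same object collapse near that object's centroid.
    shifted = points + offsets  # (N, 3)

    # Pairwise distances between shifted points and detected centroids.
    dists = np.linalg.norm(
        shifted[:, None, :] - centroids[None, :, :], axis=-1
    )  # (N, M)

    ids = dists.argmin(axis=1)
    ids[dists.min(axis=1) > max_dist] = -1  # reject outliers
    return ids
```

In a learned variant, the hard nearest-centroid assignment would be replaced by a trained affinity between points and centroids, but the overall structure of the association is the same.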