Learning 3D Perception from Others' Predictions

A label-efficient framework for 3D detection using expert predictions

¹The Ohio State University, ²Cornell University

ICLR 2025 | DriveX@ICCV 2025 (Oral) | X-Sense@ICCV 2025

Motivation

Reliable perception is crucial for safe autonomous driving 🚘

3D detection relies on massive, high-quality labeled data, and labeling must be repeated for new cities, sensors, or platforms (e.g., San Francisco → Paris, Velodyne → Cepton).

Can we reuse existing expert perception sources, such as robotaxis or roadside units (RSUs), to train ego vehicles label-efficiently?

Key Challenges

Using expert predictions as labels introduces two fundamental error sources:

  • Mislocalization: GPS inaccuracies or synchronization delays (e.g., 0.1 s @ 60 mph → 2.7 m error).
  • Viewpoint mismatch: Objects visible to one agent may be occluded or outside the other's FoV.
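
The mislocalization figure above is simply distance = speed × delay. A minimal sketch that reproduces the arithmetic; the function name `sync_offset_m` is hypothetical and used only for illustration:

```python
# Exact conversion: 1 mile = 1609.344 m, 1 hour = 3600 s.
MPH_TO_MPS = 1609.344 / 3600.0


def sync_offset_m(speed_mph: float, delay_s: float) -> float:
    """Distance in metres that an object moving at speed_mph covers in delay_s."""
    return speed_mph * MPH_TO_MPS * delay_s


offset = sync_offset_m(60.0, 0.1)
print(f"{offset:.1f} m")  # prints "2.7 m"
```

Even a 0.1 s synchronization gap therefore shifts a box by more than half the length of a typical car, which is why expert predictions cannot be used as labels without refinement.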

Method Overview

R&B-POP pipeline

Refining & Discovering Boxes for 3D Perception from Others' Predictions

The ego vehicle first receives predictions from expert agents, which inevitably contain noise. It refines the localization of these predicted boxes with our label-efficient box ranker, then applies a distance-based curriculum to generate high-quality pseudo labels for self-training.
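
The two steps described above can be sketched roughly as follows. This is an illustrative outline under stated assumptions, not the paper's implementation: the `Box` type, `refine_box`, `curriculum_filter`, the candidate offsets, the toy scoring function (a stand-in for the learned box ranker), the reference point, and the radius schedule are all hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass(frozen=True)
class Box:
    """Bird's-eye-view box centre in metres (size and heading omitted for brevity)."""
    x: float
    y: float


def refine_box(noisy: Box, score: Callable[[Box], float],
               offsets: Sequence[Tuple[float, float]]) -> Box:
    """Return the highest-scoring candidate among shifted copies of `noisy`."""
    candidates = [Box(noisy.x + dx, noisy.y + dy) for dx, dy in offsets]
    return max(candidates, key=score)


def curriculum_filter(boxes: List[Box], reference: Box, max_dist: float) -> List[Box]:
    """Keep only the boxes whose centre lies within max_dist of `reference`."""
    return [b for b in boxes
            if math.hypot(b.x - reference.x, b.y - reference.y) <= max_dist]


# Toy stand-in for the ranker (assumption): prefers boxes near a known centre.
true_centre = Box(10.0, 2.0)
toy_score = lambda b: -math.hypot(b.x - true_centre.x, b.y - true_centre.y)

# Hypothetical 3x3 grid of candidate shifts around the noisy expert box.
offsets = [(dx, dy) for dx in (-1.0, 0.0, 1.0) for dy in (-1.0, 0.0, 1.0)]
refined = refine_box(Box(11.0, 1.0), toy_score, offsets)
print(refined)  # Box(x=10.0, y=2.0)

# Hypothetical curriculum: widen the accepted radius over self-training rounds.
boxes = [Box(5.0, 0.0), Box(30.0, 0.0), Box(70.0, 0.0)]
for radius in (20.0, 40.0, 80.0):
    kept = curriculum_filter(boxes, Box(0.0, 0.0), radius)
    print(radius, len(kept))
```

In this sketch the curriculum admits nearby boxes first and farther ones in later rounds; which reference point the distance is measured from, and how the radius grows, are choices made here for illustration, and the paper defines the actual procedure.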

📄 See the paper for details!

Experiment Overview

Training the ego detector with different pseudo-label sources

📄 See the paper for full experimental details!

BibTeX

@article{yoo2024rnbpop,
  title={Learning 3D Perception from Others' Predictions}, 
  author={Yoo, Jinsu and Feng, Zhenyang and Pan, Tai-Yu and Sun, Yihong and Phoo, Cheng Perng and Chen, Xiangyu and Campbell, Mark and Weinberger, Kilian Q. and Hariharan, Bharath and Chao, Wei-Lun},
  journal={arXiv preprint arXiv:2410.02646},
  year={2024}
}