Learning 3D Perception from Others' Predictions

A label-efficient framework for 3D detection using expert predictions

¹The Ohio State University, ²Cornell University

ICLR 2025, ICCV 2025 DriveX Workshop (Oral)

Motivation

Reliable perception is crucial for safe autonomous driving 🚘

3D detection relies on massive, high-quality labeled data, and labeling must be repeated for new cities, sensors, or platforms (e.g., San Francisco → Paris, Velodyne → Cepton).

Can we reuse existing expert perception sources, like robotaxis or roadside units (RSUs), to train ego vehicles for label-efficient learning?

Key Challenges

Using expert predictions as labels introduces two fundamental error sources:

  • Mislocalization: GPS inaccuracies or synchronization delays (e.g., 0.1 s @ 60 mph → 2.7 m error).
  • Viewpoint mismatch: Objects visible to one agent may be occluded or outside the other's FoV.
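
The mislocalization figure above can be checked with a short calculation. This is a minimal sketch that assumes the error comes only from the distance an object travels during the synchronization delay; the function name `sync_offset_m` is hypothetical and chosen for illustration.

```python
# Exact conversion factor: 1 mile = 1609.344 m, 1 hour = 3600 s.
MPH_TO_MPS = 1609.344 / 3600.0


def sync_offset_m(speed_mph: float, delay_s: float) -> float:
    """Distance (in meters) travelled at constant speed during a time delay.

    Assumption: the object moves in a straight line at constant speed,
    so the positional error equals speed multiplied by delay.
    """
    return speed_mph * MPH_TO_MPS * delay_s


if __name__ == "__main__":
    offset = sync_offset_m(speed_mph=60.0, delay_s=0.1)
    print(f"{offset:.2f} m")  # 2.68 m, i.e. roughly the 2.7 m quoted above
```

An offset of this size is comparable to half the length of a typical car, which is why expert boxes cannot be used as labels without refinement.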

Method Overview

R&B-POP pipeline

Refining & Discovering Boxes for 3D Perception from Others’ Predictions

The ego vehicle first receives predictions from expert agents, which inevitably contain noise. It refines their localization with our label-efficient box ranker, then applies a distance-based curriculum to generate high-quality pseudo labels for self-training.
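
The idea of a distance-based curriculum can be sketched as follows. This is an illustrative sketch only, not the paper's implementation: the `Box` layout, the choice to measure distance from the expert agent, the radius schedule, and the score threshold are all assumptions made for the example. The paper defines the actual ranker and schedule.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Box:
    """Hypothetical pseudo-label: box center (meters) and a ranker score in [0, 1]."""
    x: float
    y: float
    score: float


def curriculum_radius(round_idx: int, start_m: float = 20.0,
                      step_m: float = 20.0, max_m: float = 80.0) -> float:
    """Radius that grows linearly with the self-training round, capped at max_m.

    All three default values are placeholders, not values from the paper.
    """
    return min(start_m + step_m * round_idx, max_m)


def select_pseudo_labels(boxes: List[Box], expert_xy: Tuple[float, float],
                         round_idx: int, min_score: float = 0.5) -> List[Box]:
    """Keep boxes that are close enough to the expert and scored high enough.

    Assumption: boxes nearer to the expert agent are more reliable, so early
    rounds train only on those and later rounds widen the radius.
    """
    radius = curriculum_radius(round_idx)
    ex, ey = expert_xy
    return [
        b for b in boxes
        if math.hypot(b.x - ex, b.y - ey) <= radius and b.score >= min_score
    ]


if __name__ == "__main__":
    boxes = [Box(5.0, 0.0, 0.9), Box(50.0, 0.0, 0.9), Box(10.0, 0.0, 0.2)]
    for round_idx in range(3):
        kept = select_pseudo_labels(boxes, expert_xy=(0.0, 0.0), round_idx=round_idx)
        print(round_idx, curriculum_radius(round_idx), len(kept))
```

In this toy run the far box at 50 m is excluded in the first two rounds and admitted in the third, once the radius reaches 60 m. The low-scoring box is never used.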

📄 See the paper for details!

Experiment Overview

Training the ego detector with different pseudo-label sources

📄 See the paper for full experimental details!

BibTeX

@article{yoo2024rnbpop,
  title={Learning 3D Perception from Others' Predictions}, 
  author={Yoo, Jinsu and Feng, Zhenyang and Pan, Tai-Yu and Sun, Yihong and Phoo, Cheng Perng and Chen, Xiangyu and Campbell, Mark and Weinberger, Kilian Q. and Hariharan, Bharath and Chao, Wei-Lun},
  journal={arXiv preprint arXiv:2410.02646},
  year={2024}
}