Learning 3D Perception from Others' Predictions

A label-efficient framework for 3D detection using expert predictions

Jinsu Yoo¹, Zhenyang Feng¹, Tai-Yu Pan¹, Yihong Sun², Cheng Perng Phoo², Xiangyu Chen²,
Mark Campbell², Kilian Q Weinberger², Bharath Hariharan², Wei-Lun Chao¹

¹The Ohio State University, ²Cornell University

ICLR 2025 · DriveX@ICCV 2025 (Oral) · X-Sense@ICCV 2025


Motivation

Reliable perception is crucial for safe autonomous driving 🚘

3D detection relies on large amounts of high-quality labeled data, and labeling must be repeated for every new city, sensor, or platform (e.g., San Francisco → Paris, Velodyne → Cepton).

Can we reuse existing expert perception sources, such as robotaxis or roadside units (RSUs), to train an ego vehicle's 3D detector with fewer human labels?

Key Challenges

Using expert predictions as labels introduces two fundamental sources of error:

Mislocalization: GPS inaccuracies or synchronization delays shift the expert's boxes (e.g., a 0.1 s delay at 60 mph ≈ 2.7 m of error).
Viewpoint mismatch: objects visible to one agent may be occluded or outside the other agent's field of view (FoV).
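The 2.7 m figure is plain kinematics: the positional error equals the distance an object travels during the timing offset. A minimal Python check (the function name `latency_offset_m` is ours, chosen for illustration):

```python
# Convert a vehicle speed and a timing offset into the positional error it induces.
MPH_TO_MPS = 1609.344 / 3600.0  # 1 mile = 1609.344 m, 1 hour = 3600 s


def latency_offset_m(delay_s: float, speed_mph: float) -> float:
    """Distance (m) travelled during delay_s seconds at a constant speed_mph."""
    return delay_s * speed_mph * MPH_TO_MPS


if __name__ == "__main__":
    # Example from the text: a 0.1 s delay at 60 mph (about 26.8 m/s)
    print(f"{latency_offset_m(0.1, 60):.2f} m")  # 2.68 m, i.e. roughly 2.7 m
```

Even a tenth of a second of latency is therefore comparable to the length of a car, which is why raw expert boxes cannot be used as labels directly.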

Method Overview

Refining & Discovering Boxes for 3D Perception from Others’ Predictions

The ego vehicle first receives predictions from expert agents, which inevitably contain noise. It refines their localization with our label-efficient box ranker, then applies a distance-based curriculum to generate high-quality pseudo labels for self-training.
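The refine-then-curriculum pipeline can be sketched as follows. This is a minimal illustration under our own assumptions, not the authors' implementation: boxes are 7-vectors `[x, y, z, l, w, h, yaw]`; candidates come from a hypothetical grid of x/y shifts and yaw offsets; `score_fn` is a stand-in for the learned box ranker; and the curriculum radii, measured here from the expert agent's position, are hypothetical values chosen for illustration.

```python
from typing import Callable, List

import numpy as np

# Assumed box layout: [x, y, z, length, width, height, yaw]
Box = np.ndarray


def candidate_boxes(box: Box, max_shift_m: float = 1.0, step_m: float = 0.5,
                    yaw_offsets=(-0.1, 0.0, 0.1)) -> np.ndarray:
    """Perturb a noisy box over an x/y grid and a few yaw offsets (hypothetical scheme)."""
    shifts = np.arange(-max_shift_m, max_shift_m + 1e-9, step_m)
    cands = []
    for dx in shifts:
        for dy in shifts:
            for dyaw in yaw_offsets:
                c = box.astype(float).copy()
                c[0] += dx
                c[1] += dy
                c[6] += dyaw
                cands.append(c)
    return np.stack(cands)


def refine_box(box: Box, score_fn: Callable[[np.ndarray], np.ndarray]) -> Box:
    """Return the highest-scoring candidate; score_fn stands in for a learned ranker."""
    cands = candidate_boxes(box)
    scores = score_fn(cands)
    return cands[int(np.argmax(scores))]


def curriculum_stages(boxes: List[Box], expert_xy: np.ndarray,
                      radii_m=(20.0, 40.0, 80.0)) -> List[List[Box]]:
    """Stage k keeps boxes within radii_m[k] of the expert (hypothetical thresholds)."""
    return [[b for b in boxes if np.linalg.norm(b[:2] - expert_xy) <= r]
            for r in radii_m]


if __name__ == "__main__":
    true_center = np.array([10.5, -3.0])

    def toy_score(c: np.ndarray) -> np.ndarray:
        # Toy scorer for the demo only: prefers centers near true_center and zero yaw.
        return -np.linalg.norm(c[:, :2] - true_center, axis=1) - np.abs(c[:, 6])

    noisy = np.array([10.0, -2.0, 0.0, 4.5, 1.9, 1.6, 0.0])
    refined = refine_box(noisy, toy_score)
    print(refined[:2])  # center moves to (10.5, -3.0)
```

Each stage's box set grows with the radius; in a self-training loop one could use each stage as the pseudo labels for one round of detector training, a loop omitted here.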

📄 See the paper for details!

Experiment Overview

Training the ego detector with different pseudo-label sources

📄 See the paper for full experimental details!

BibTeX

@article{yoo2024rnbpop,
  title={Learning 3D Perception from Others' Predictions}, 
  author={Yoo, Jinsu and Feng, Zhenyang and Pan, Tai-Yu and Sun, Yihong and Phoo, Cheng Perng and Chen, Xiangyu and Campbell, Mark and Weinberger, Kilian Q. and Hariharan, Bharath and Chao, Wei-Lun},
  journal={arXiv preprint arXiv:2410.02646},
  year={2024}
}