DoGFlow: Self-Supervised LiDAR Scene Flow via Cross-Modal Doppler Guidance

1 KTH Royal Institute of Technology, Stockholm 10044, Sweden
2 Autonomous Transport Solutions Lab, Scania Group, Södertälje, SE-15139, Sweden

Abstract

Accurate 3D scene flow estimation is critical for autonomous systems to navigate dynamic environments safely, but creating the necessary large-scale, manually annotated datasets remains a significant bottleneck for developing robust perception models. Current self-supervised methods struggle to match the performance of fully supervised approaches, especially in challenging long-range and adverse weather scenarios, while supervised methods are not scalable due to their reliance on expensive human labeling. We introduce DoGFlow, a novel self-supervised framework that recovers full 3D object motions for LiDAR scene flow estimation without requiring any manual ground truth annotations. This paper presents our cross-modal label transfer approach, where DoGFlow computes motion pseudo-labels in real-time directly from 4D radar Doppler measurements and transfers them to the LiDAR domain using dynamic-aware association and ambiguity-resolved propagation. On the challenging MAN TruckScenes dataset, DoGFlow substantially outperforms existing self-supervised methods and improves label efficiency by enabling LiDAR backbones to achieve over 90% of fully supervised performance with only 10% of the ground truth data. The source code will be made publicly available.

DoGFlow method overview

TL;DR: Radar detects dynamic points via Doppler and clusters them to estimate 3D velocities. These motions are then transferred to LiDAR clusters through association and propagation, yielding dense scene flow without human labels.
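The pipeline above can be sketched in a few lines. This is a minimal illustrative sketch, not the authors' implementation: it assumes each radar point reports a radial (Doppler) speed `v_r = u · v`, where `u` is the unit line-of-sight direction, so stacking these equations over a cluster yields a least-squares estimate of the cluster's full 3D velocity; the `transfer_to_lidar` helper and its naive nearest-centroid association are hypothetical stand-ins for the paper's dynamic-aware association and ambiguity-resolved propagation.

```python
import numpy as np

def estimate_cluster_velocity(points, radial_speeds):
    """Least-squares full 3D velocity of one radar cluster.

    Each point contributes one equation u_i . v = v_r_i, where u_i is the
    unit line-of-sight direction from the sensor to the point.
    """
    dirs = points / np.linalg.norm(points, axis=1, keepdims=True)
    v, *_ = np.linalg.lstsq(dirs, radial_speeds, rcond=None)
    return v

def transfer_to_lidar(radar_clusters, lidar_points, lidar_labels):
    """Propagate each radar cluster velocity to the nearest LiDAR cluster.

    radar_clusters: list of (centroid, velocity) pairs.
    Naive centroid association, for illustration only.
    """
    flow = np.zeros_like(lidar_points)
    for lbl in np.unique(lidar_labels):
        mask = lidar_labels == lbl
        centroid = lidar_points[mask].mean(axis=0)
        dists = [np.linalg.norm(centroid - c) for c, _ in radar_clusters]
        flow[mask] = radar_clusters[int(np.argmin(dists))][1]
    return flow

# Toy example: one rigid cluster moving at (5, 0, 0) m/s.
rng = np.random.default_rng(0)
radar_pts = rng.normal([20.0, 3.0, 0.0], 0.5, size=(30, 3))
v_true = np.array([5.0, 0.0, 0.0])
u = radar_pts / np.linalg.norm(radar_pts, axis=1, keepdims=True)
v_est = estimate_cluster_velocity(radar_pts, u @ v_true)

lidar_pts = rng.normal([20.0, 3.0, 0.0], 0.5, size=(200, 3))
flow = transfer_to_lidar([(radar_pts.mean(axis=0), v_est)],
                         lidar_pts, np.zeros(200, dtype=int))
```

Because the velocity is recovered per radar cluster and then propagated, the density of the LiDAR returns themselves plays no role in the motion estimate.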

Key Results

Range-wise Breakdown

Range-wise breakdown of scene flow performance. Left: Dynamic EPE; Right: Dynamic IoU, across perception range bins. DoGFlow (red) degrades significantly less with increasing range than prior methods. SSF*: fully supervised.

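The two metrics in the plot follow the standard scene-flow definitions; a minimal sketch (the paper's exact range binning and dynamic-speed threshold may differ):

```python
import numpy as np

def dynamic_epe(pred_flow, gt_flow, dynamic_mask):
    """Mean end-point error (metres) over dynamic points only."""
    err = np.linalg.norm(pred_flow - gt_flow, axis=1)
    return float(err[dynamic_mask].mean())

def dynamic_iou(pred_dynamic, gt_dynamic):
    """Intersection-over-union of predicted vs. ground-truth dynamic masks."""
    inter = np.logical_and(pred_dynamic, gt_dynamic).sum()
    union = np.logical_or(pred_dynamic, gt_dynamic).sum()
    return float(inter) / float(union)

# Toy check: perfect flow on the dynamic point, one false-positive dynamic label.
gt = np.array([[1.0, 0, 0], [0, 0, 0], [0, 0, 0]])
pred = np.array([[1.0, 0, 0], [0.5, 0, 0], [0, 0, 0]])
gt_dyn = np.array([True, False, False])
pred_dyn = np.array([True, True, False])
epe = dynamic_epe(pred, gt, gt_dyn)   # -> 0.0
iou = dynamic_iou(pred_dyn, gt_dyn)   # -> 0.5
```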

Label Efficiency

Label efficiency evaluation comparing SSF trained from scratch (purple) versus SSF pretrained on DoGFlow pseudo labels and finetuned with limited ground truth (gold). DoGFlow pretraining significantly boosts performance in low-label regimes. With only 10% ground truth, the pretrained model achieves over 90% of fully supervised performance, reducing annotation cost while maintaining accuracy.


Qualitative Results

Examples on MAN TruckScenes across clear, rainy, and snowy conditions. Each clip shows the input LiDAR pair (t, t+1), the predicted flow, and the dynamic mask.

Long-range Scene Flow

Owing to cross-modal label transfer from radar to LiDAR, DoGFlow's performance is independent of LiDAR sparsity.

Limitations

  • Assumes accurate, static radar–LiDAR extrinsics; misalignment can bias velocity estimates.
  • Reduced accuracy at close range for slow-moving objects in Doppler-weak or blind regions.
  • Multipath/clutter can introduce Doppler anomalies; mitigated by cluster‑level estimation.

BibTeX

@article{khoche2025dogflow,
  title   = {DoGFlow: Self-Supervised LiDAR Scene Flow via Cross-Modal Doppler Guidance},
  author  = {Ajinkya Khoche and Qingwen Zhang and Yixi Cai and Sina Sharif Mansouri and Patric Jensfelt},
  journal = {arXiv preprint arXiv:2508.18506},
  year    = {2025}
}

Acknowledgements

Page borrowed from STORM.