Digging into Depth Priors for Outdoor Neural Radiance Fields
ACM Multimedia 2023

  • 1 Tsinghua University
  • 2 Baidu Research
  • 3 Zhejiang University
  • 4 Northwestern Polytechnical University
  • 5 Chinese Academy of Sciences


Neural Radiance Fields (NeRF) have demonstrated impressive performance in vision and graphics tasks such as novel view synthesis and immersive reality. However, the shape-radiance ambiguity of radiance fields remains a challenge, especially in the sparse-viewpoint setting. Recent work integrates depth priors into outdoor NeRF training to alleviate the issue. However, the criteria for selecting depth priors and the relative merits of different priors have not been thoroughly investigated, nor have the relative merits of different ways of using those priors. In this paper, we provide a comprehensive study and evaluation of applying depth priors to outdoor neural radiance fields, covering common depth sensing technologies and most ways of applying them. Specifically, we conduct extensive experiments with two representative NeRF methods equipped with four commonly used depth priors and different depth usages on two widely used outdoor datasets. Our experimental results reveal several interesting findings that can benefit practitioners and researchers training NeRF models with depth priors.


Experiment Setup

Depth Priors


Depth Supervision Type

Direct: MSE, L1; Indirect: KL, URF



Datasets

KITTI (5 sequences) and Argoverse (3 sequences)

NeRF Methods

NeRF++, MipNeRF-360, Instant-NGP

Depth Methods

MFFNet (completion), BTS (monocular), CFNet and PCWNet (stereo)
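The distinction between direct and indirect depth supervision listed above can be made concrete with a minimal NumPy sketch. It assumes the usual volume-rendering setup, where each ray has sample distances `t_vals` and rendering weights `weights`. The direct losses compare the rendered depth with the prior. The KL-style term is one common form of ray-termination loss and may differ from the exact formulation used in the paper; the URF loss is not sketched. All function names, the `sigma` value, and the `valid` mask are hypothetical and chosen for illustration only.

```python
import numpy as np


def render_depth(weights, t_vals):
    """Expected ray termination depth: sum_i w_i * t_i for each ray."""
    return np.sum(weights * t_vals, axis=-1)


def direct_depth_loss(weights, t_vals, prior_depth, valid, kind="l1"):
    """Direct supervision: penalize rendered depth against the prior."""
    diff = (render_depth(weights, t_vals) - prior_depth)[valid]
    if diff.size == 0:  # no ray in this batch has a usable prior
        return 0.0
    if kind == "mse":
        return float(np.mean(diff ** 2))
    if kind == "l1":
        return float(np.mean(np.abs(diff)))
    raise ValueError(f"unknown loss kind: {kind}")


def kl_style_depth_loss(weights, t_vals, prior_depth, valid, sigma=0.1, eps=1e-6):
    """Indirect supervision: push the weight distribution along each ray
    toward a narrow Gaussian centred on the prior depth.
    sigma is a hypothetical setting, not a value from the paper."""
    deltas = np.diff(t_vals, axis=-1)
    deltas = np.concatenate([deltas, deltas[..., -1:]], axis=-1)  # pad last bin
    target = np.exp(-((t_vals - prior_depth[..., None]) ** 2) / (2 * sigma ** 2))
    per_ray = -np.sum(np.log(weights + eps) * target * deltas, axis=-1)[valid]
    return float(np.mean(per_ray)) if per_ray.size else 0.0
```

Here `weights` and `t_vals` have shape `(rays, samples)`, while `prior_depth` and the boolean `valid` mask have shape `(rays,)`. The mask lets sparse priors such as LiDAR supervise only the rays they actually cover.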


We apply each method to novel view synthesis on every sequence with the listed depth priors in two settings: sparse input viewpoints and dense input viewpoints. We evaluate both novel view synthesis quality and depth estimation quality with the corresponding metrics. We find that depth priors are essential in the sparse-viewpoint setting.
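This page does not name the metrics used. As an illustration only, the sketch below computes PSNR for image quality, plus absolute relative error and RMSE for depth quality. These are common choices for the two tasks, and their use here is an assumption. The function names and the `min_depth` threshold are hypothetical.

```python
import numpy as np


def psnr(pred, target, max_val=1.0):
    """Peak signal-to-noise ratio in dB for images scaled to [0, max_val]."""
    mse = np.mean((pred - target) ** 2)
    if mse == 0:
        return float("inf")
    return float(10.0 * np.log10(max_val ** 2 / mse))


def depth_errors(pred, gt, min_depth=1e-3):
    """Depth errors over pixels that have ground truth (gt > min_depth)."""
    valid = gt > min_depth
    p, g = pred[valid], gt[valid]
    return {
        "abs_rel": float(np.mean(np.abs(p - g) / g)),   # mean |error| / depth
        "rmse": float(np.sqrt(np.mean((p - g) ** 2))),  # root mean squared error
    }
```

Pixels without ground-truth depth (for example, where no LiDAR return exists) are excluded, so sparse ground truth does not distort the depth scores.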

Experiment Results

Comparison on a sequence with only RGB inputs (top) and with monocular depth supervision (bottom)

Comparison of different depth priors on KITTI and Argoverse

Point cloud visualization


    1. Monocular depth works well for sparse views and comes at no additional cost; it can achieve quality comparable to ground-truth LiDAR depth.
    2. Depth supervision remains an option for dense views, where it improves the geometry quality of NeRFs.
    3. The denser the depth prior, the better the resulting quality.
    4. A simple loss function and depth filtering are sufficient.
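The last finding mentions depth filtering without describing it on this page. One plausible form, sketched below as an assumption and not as the paper's procedure, keeps a prior depth value only if it is finite, lies within a working range, and (optionally) agrees with a second depth estimate to within a relative tolerance. The function name, the `near`/`far` range, and `rel_tol` are hypothetical.

```python
import numpy as np


def filter_depth(depth, near=0.5, far=80.0, reference=None, rel_tol=0.1):
    """Return a boolean mask of depth values considered reliable.

    depth:     prior depth map (any shape), possibly containing NaN or inf.
    reference: optional second estimate of the same shape for a consistency check.
    """
    with np.errstate(invalid="ignore", divide="ignore"):
        valid = np.isfinite(depth) & (depth > near) & (depth < far)
        if reference is not None:
            ref_ok = np.isfinite(reference) & (reference > 0)
            safe_ref = np.where(ref_ok, reference, 1.0)  # avoid dividing by bad values
            rel_err = np.abs(depth - reference) / safe_ref
            valid &= ref_ok & (rel_err < rel_tol)
    return valid
```

The resulting mask can be passed to a depth loss so that only filtered pixels contribute, and `mask.mean()` gives the fraction of pixels that still carry supervision after filtering.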



The website template was borrowed from Michaël Gharbi and Ben Mildenhall.