Digging into Radiance Grid for
Real-Time View Synthesis with Detail Preservation
ECCV 2022 (POSTER)

  • Jian Zhang*
    Alibaba Group
  • Jinchi Huang*
    Alibaba Group
  • Bowen Cai*
    Alibaba Group
  • Huan Fu#
    Alibaba Group
  • Mingming Gong
    University of Melbourne

  • Chaohui Wang
    Univ Gustave Eiffel
  • Jiaming Wang
    Alibaba Group
  • Hongchen Luo
    Alibaba Group
  • Rongfei Jia
    Alibaba Group
  • Binqiang Zhao
    Alibaba Group
*denotes equal contribution     #denotes corresponding author

Abstract

[Figure: overview]

Neural Radiance Fields (NeRF) and its successors are impressive at representing scenes and synthesizing high-quality novel views. However, most previous works fail to preserve texture details and suffer from slow training. A recent method, SNeRG, demonstrates that baking a trained NeRF into a Sparse Neural Radiance Grid enables real-time view synthesis at a slight sacrifice in rendering quality. In this paper, we dig into the Radiance Grid representation and present a set of improvements that together yield significantly better performance in terms of both speed and quality. First, we propose a HieRarchical Sparse Radiance Grid (HrSRG) representation that uses a higher voxel resolution for informative spaces and fewer voxels elsewhere. HrSRG leverages a hierarchical voxel grid building process, and can describe a scene at high resolution without an excessive memory footprint. Furthermore, we show that directly optimizing the voxel grid yields surprisingly good texture details in rendered images. This direct optimization is memory-friendly and requires orders of magnitude less training time than conventional NeRFs, as it only involves a tiny MLP. Finally, we find that a critical factor preventing the recovery of fine details is pixel misalignment across images caused by camera pose errors. We propose using a perceptual loss to add tolerance to such misalignments, leading to improved visual quality in rendered images.
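To make the last point concrete, here is a generic sketch of a VGG-based perceptual loss of the kind the abstract refers to. The backbone, layer choice, and equal weighting are our assumptions rather than the paper's configuration; comparing feature maps instead of raw pixels tolerates small misalignments that a per-pixel L2 loss would heavily penalize.

```python
# Generic VGG-feature perceptual loss (a sketch, not the paper's exact setup).
import torch
import torchvision

class PerceptualLoss(torch.nn.Module):
    def __init__(self, layers=(3, 8, 15)):  # relu1_2, relu2_2, relu3_3 in VGG-16
        super().__init__()
        self.vgg = torchvision.models.vgg16(weights="IMAGENET1K_V1").features.eval()
        self.layers = set(layers)
        for p in self.vgg.parameters():
            p.requires_grad_(False)        # the loss network stays frozen

    def forward(self, rendered, target):
        # Inputs: (B, 3, H, W) in [0, 1]; ImageNet normalization omitted for brevity.
        loss, x, y = 0.0, rendered, target
        for i, layer in enumerate(self.vgg):
            x, y = layer(x), layer(y)
            if i in self.layers:
                loss = loss + torch.nn.functional.mse_loss(x, y)
            if i >= max(self.layers):
                break
        return loss
```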

Overview Video

HieRarchical Sparse Radiance Grid (HrSRG)

[Figure: overview]

We first bake Def-NeRF$^{\dagger}$ into an $N^3$ voxel grid that represents the scene. Each voxel stores an opacity (or volume density), a diffuse color, and a 3-dimensional specular feature. For a given voxel, we randomly cast $K$ rays from $K$ origins through the voxel and compute their accumulated transmittance values; this assigns each voxel a maximum transmittance value. We then define four transmittance intervals (or 3D spaces) A, B, C, and D, as shown in the figure above, based on the transmittance thresholds $\tau_1$, $\tau_2$, and $\tau_3$. We find that space A contains most of the high-frequency texture details, while the other spaces contribute only slightly to the base color during rendering. This observation motivates us to use a higher voxel resolution for space A and lower voxel resolutions for spaces B, C, and D. To this end, we introduce merging and splitting operations that leverage the octree data structure.
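As a concrete illustration, the following minimal NumPy sketch assigns each voxel a maximum transmittance and classifies it into one of the four spaces. The toy grid, the ray count $K$, the uniform sampling, and the threshold values are our own assumptions (we take $\tau_1 > \tau_2 > \tau_3$, with A as the most visible interval); this is not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def accumulated_transmittance(sigma, origin, direction, t_far, n_samples=64):
    """T = exp(-sum(sigma_i * delta_i)) accumulated along the ray from the
    origin up to distance t_far, using uniform samples (illustrative only)."""
    ts = np.linspace(0.0, t_far, n_samples)
    delta = t_far / n_samples
    pts = origin[None, :] + ts[:, None] * direction[None, :]
    idx = np.clip(pts.astype(int), 0, np.array(sigma.shape) - 1)
    densities = sigma[idx[:, 0], idx[:, 1], idx[:, 2]]
    return float(np.exp(-densities.sum() * delta))

def max_transmittance(sigma, voxel_center, k=16):
    """Cast K rays from K random origins through the voxel and keep the
    maximum transmittance accumulated up to the voxel."""
    best = 0.0
    for _ in range(k):
        origin = rng.uniform(0, sigma.shape[0], size=3)
        to_voxel = voxel_center - origin
        dist = np.linalg.norm(to_voxel)
        best = max(best, accumulated_transmittance(sigma, origin, to_voxel / dist, dist))
    return best

def classify(t, tau1=0.5, tau2=0.1, tau3=0.01):
    """Map a voxel's max transmittance to space A, B, C, or D
    (assuming tau1 > tau2 > tau3 and A is the most visible interval)."""
    return "A" if t > tau1 else "B" if t > tau2 else "C" if t > tau3 else "D"

# Toy usage: a 32^3 grid with a dense blob; a voxel outside the blob is
# reached by at least one unobstructed ray, so it lands in space A.
sigma = np.zeros((32, 32, 32))
sigma[12:20, 12:20, 12:20] = 5.0
print(classify(max_transmittance(sigma, np.array([4.0, 4.0, 4.0]))))
```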

Merging. By converting the $N^3$ voxel grid into a tree structure with maximum depth $L$, we obtain a transmittance value for each node and leaf. For each node at depth $L_i$ whose children are eight leaves, if the maximum transmittance value among the node and its leaves is below $\tau$, we delete the eight leaves. We recursively consider depths $L_1$, $L_2$, and $L_3$ with thresholds $\tau_3$, $\tau_2$, and $\tau_1$, respectively. As a result, we obtain approximate voxel resolutions of $(N/2)^3$, $(N/4)^3$, and $(N/8)^3$ for spaces B, C, and D, respectively.
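A minimal sketch of the merging pass, assuming the octree is stored as a simple Python `Node` class (a hypothetical layout; the paper does not specify one) and pairing merge depths with thresholds as stated above:

```python
class Node:
    def __init__(self, depth, transmittance, values=None, children=None):
        self.depth = depth
        self.t = transmittance          # max transmittance among the K rays
        self.values = values            # voxel payload (c, sigma, v)
        self.children = children or []  # eight children, or [] for a leaf

def merge(node, level_taus):
    """Recursively prune leaves. `level_taus` maps a merge depth L_i to its
    threshold; per the text, L_1, L_2, L_3 pair with tau_3, tau_2, tau_1
    (we assume L_3 is the deepest of the three)."""
    for child in node.children:
        merge(child, level_taus)        # post-order: deeper levels merge first
    tau = level_taus.get(node.depth)
    if tau is None or not node.children:
        return
    # A node is a merge candidate only when all eight children are leaves.
    if all(not c.children for c in node.children):
        if max([node.t] + [c.t for c in node.children]) < tau:
            node.children = []          # delete the eight leaves
```

Because the traversal is post-order, deeper levels are merged before shallower ones, so a node can become a merge candidate once its own subtree has already been coarsened.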

Splitting. After the merging operation, every leaf at depth $L$ becomes the parent of eight new leaves. Each newly created leaf inherits the voxel values (i.e., $(\text{c},\sigma,\text{v})$) of its parent. With this splitting operation, space A reaches a voxel resolution of $(2N)^3$.
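And a sketch of the splitting pass, reusing the hypothetical `Node` class from the merging sketch:

```python
def split(node, max_depth):
    """Give every surviving depth-L leaf eight children that inherit its
    voxel values (c, sigma, v); space A thus reaches (2N)^3 resolution."""
    if not node.children:
        if node.depth == max_depth:
            node.children = [Node(node.depth + 1, node.t, node.values)
                             for _ in range(8)]
        return
    for child in node.children:
        split(child, max_depth)
```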

Acknowledgements

We are very grateful for the support provided by the Tao Technology Department, Alibaba Group.

The website template was borrowed from Michaël Gharbi.