Try   HackMD

[summary] Occupancy Networks: Learning 3D Reconstruction in Function Space (2019)

[arxiv, papers with code, supplement]

Summary

Approach

Instead of reconstructing a 3D shape in the form of a (discrete) voxel grid, point cloud, or mesh from the input data, occupancy networks return a function that predicts an occupancy probability for a continuous 3D point in R3:

Image Not Showing Possible Reasons
  • The image file may be corrupted
  • The server hosting the image is unavailable
  • The image path is incorrect
  • The image format is not supported
Learn More →

The trick is the following: a function that takes an observation x as input and returns a function mapping a 3D point p to the occupancy probability can be replaced by a function that takes a pair (p, x) and returns the occupancy probability (see also: uncurrying):

Image Not Showing Possible Reasons
  • The image file may be corrupted
  • The server hosting the image is unavailable
  • The image path is incorrect
  • The image format is not supported
Learn More →

Limitations of existing 3D data representations

  • Voxel grids: memory-heavy, up to 1283–2563 maximum resolution
  • Point clouds: need post-processing to generate a (mesh) surface
  • Meshes: existing approaches often require additional regularization, can generate only meshes with simple topology, need the same class reference template, or cannot guarantee closed surfaces

Pipeline

  • Training: the network fθ(p, x) takes as input the task specific object (for example, for single image 3D reconstruction this would be an RGB image) and a batch of 3D points, randomly sampled from the ground truth 3D representation. For each 3D point, the network predicts the occupancy probability which is then compared with the ground truth to compute the mini-batch loss:
    Image Not Showing Possible Reasons
    • The image file may be corrupted
    • The server hosting the image is unavailable
    • The image path is incorrect
    • The image format is not supported
    Learn More →
  • Inference: to extract a 3D mesh from the learned fθ(p, x), the paper uses a Multiresolution IsoSurface Extraction (MISE) algorithm, which as a first step is building an octree by progressively sampling in the points where neighbors occupancy predictions do not match. After that, the Marching Cubes algorithm is applied to extract the mesh surface. Finally, the mesh is refined even further using the Fast-Quadric-Mesh-Simplification algorithm.
    Image Not Showing Possible Reasons
    • The image file may be corrupted
    • The server hosting the image is unavailable
    • The image path is incorrect
    • The image format is not supported
    Learn More →

Network architecture

  • The network architecture is generally the same across different tasks (e.g. single image 3D reconstruction or point cloud completion) with the task-specific encoder being the only changing element.
    Image Not Showing Possible Reasons
    • The image file may be corrupted
    • The server hosting the image is unavailable
    • The image path is incorrect
    • The image format is not supported
    Learn More →
  • Once the task encoder has produced the embedding c, it is passed as input along with the batch of T sampled 3D points, which are processed by 5 sequential ResNet blocks. To condition the network output on the input embedding c, Conditional Batch Normalization is used:
    Image Not Showing Possible Reasons
    • The image file may be corrupted
    • The server hosting the image is unavailable
    • The image path is incorrect
    • The image format is not supported
    Learn More →
  • For single image 3D reconstruction the network uses a ResNet-18 encoder with altered last layer to produce 256-dim embeddings.
    Image Not Showing Possible Reasons
    • The image file may be corrupted
    • The server hosting the image is unavailable
    • The image path is incorrect
    • The image format is not supported
    Learn More →
  • For point cloud completion a modified version of the PointNet encoder is used.
    Image Not Showing Possible Reasons
    • The image file may be corrupted
    • The server hosting the image is unavailable
    • The image path is incorrect
    • The image format is not supported
    Learn More →
  • For voxel super-resolution the network uses a 3D CNN encoder, which encodes a 323 input into a 256-dim embedding vector.
    Image Not Showing Possible Reasons
    • The image file may be corrupted
    • The server hosting the image is unavailable
    • The image path is incorrect
    • The image format is not supported
    Learn More →

Experiments

  • Single image 3D reconstruction:
    Image Not Showing Possible Reasons
    • The image file may be corrupted
    • The server hosting the image is unavailable
    • The image path is incorrect
    • The image format is not supported
    Learn More →
  • Point cloud completion:
    Image Not Showing Possible Reasons
    • The image file may be corrupted
    • The server hosting the image is unavailable
    • The image path is incorrect
    • The image format is not supported
    Learn More →
  • Voxel super resolution:
    Image Not Showing Possible Reasons
    • The image file may be corrupted
    • The server hosting the image is unavailable
    • The image path is incorrect
    • The image format is not supported
    Learn More →
  • Generalization to the Pix3D dataset (unseen data)—could be better:
    Image Not Showing Possible Reasons
    • The image file may be corrupted
    • The server hosting the image is unavailable
    • The image path is incorrect
    • The image format is not supported
    Learn More →

Metrics

  • Comparison against 3D-R2N2, PSGN, Pix2Mesh and AtlasNet on the ShapeNet dataset.
    Image Not Showing Possible Reasons
    • The image file may be corrupted
    • The server hosting the image is unavailable
    • The image path is incorrect
    • The image format is not supported
    Learn More →

Used diagrams are taken from the paper with minor changes.
All errors present in the interpretation are my own.