[arxiv, papers with code]
Summary
Approach
Pixel2Mesh is an end-to-end network that takes a single 2D RGB image as input and produces a 3D triangular mesh by applying a short sequence of mesh refinements to an initial ellipsoid mesh.
The initial mesh is an ellipsoid with a fixed number of vertices (156), fixed axis radii (0.2 m, 0.2 m, 0.4 m), and a fixed position relative to the camera (0.8 m away). Note that the paper does not make clear whether the camera intrinsics are fixed across training examples.
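To make these numbers concrete, here is a minimal sketch of one way to build such an ellipsoid. It is not the authors' procedure: the latitude/longitude tessellation, the function name `make_ellipsoid`, the assignment of the 0.4 m radius to the z axis, and the placement of the center at z = -0.8 (camera at the origin looking down the negative z axis) are all assumptions made for illustration. The ring and segment counts (11 and 14) are chosen only because 2 + 11 * 14 = 156.

```python
import numpy as np


def make_ellipsoid(radii=(0.2, 0.2, 0.4), center=(0.0, 0.0, -0.8),
                   n_rings=11, n_segments=14):
    """Return (vertices, faces) of a latitude/longitude ellipsoid mesh.

    Assumed layout (not taken from the paper): two pole vertices plus
    n_rings rings of n_segments vertices each.
    """
    rx, ry, rz = radii
    # Polar angles of the interior rings; the poles are added separately.
    theta = np.linspace(0.0, np.pi, n_rings + 2)[1:-1]
    phi = np.linspace(0.0, 2.0 * np.pi, n_segments, endpoint=False)
    t, p = np.meshgrid(theta, phi, indexing="ij")
    rings = np.stack(
        [rx * np.sin(t) * np.cos(p),
         ry * np.sin(t) * np.sin(p),
         rz * np.cos(t)],
        axis=-1,
    ).reshape(-1, 3)
    north = np.array([[0.0, 0.0, rz]])
    south = np.array([[0.0, 0.0, -rz]])
    vertices = np.vstack([north, rings, south]) + np.asarray(center)

    def ring_index(i, j):
        # Vertex index of ring i, segment j (index 0 is the north pole).
        return 1 + i * n_segments + (j % n_segments)

    south_index = 1 + n_rings * n_segments
    faces = []
    for j in range(n_segments):
        # Triangle fans around the two poles.
        faces.append((0, ring_index(0, j), ring_index(0, j + 1)))
        faces.append((south_index,
                      ring_index(n_rings - 1, j + 1),
                      ring_index(n_rings - 1, j)))
    for i in range(n_rings - 1):
        for j in range(n_segments):
            # Each quad between two neighbouring rings is split in two.
            a, b = ring_index(i, j), ring_index(i, j + 1)
            c, d = ring_index(i + 1, j), ring_index(i + 1, j + 1)
            faces.append((a, c, d))
            faces.append((a, d, b))
    return vertices, np.array(faces, dtype=np.int64)


vertices, faces = make_ellipsoid()
print(vertices.shape, faces.shape)
```

By Euler's formula, any closed triangular mesh with sphere topology and 156 vertices has 308 faces and 462 edges, so those counts do not depend on the tessellation chosen here.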
Throughout the paper, the mesh is treated as a graph: mesh vertices are graph nodes, mesh edges are graph edges, and each vertex carries a feature vector in addition to its 3D coordinates. These feature vectors are used by the graph convolutional network described later.
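The mesh-as-graph view can be sketched with a few NumPy arrays. This is an illustrative data layout rather than the paper's implementation: the helper names `edges_from_faces` and `adjacency_matrix`, the tetrahedron used as a stand-in mesh, and the feature dimension of 8 are all hypothetical.

```python
import numpy as np


def edges_from_faces(faces):
    """Collect the unique undirected edges of a triangular mesh."""
    faces = np.asarray(faces)
    # Each triangle (a, b, c) contributes the edges ab, bc and ca.
    pairs = np.concatenate([faces[:, [0, 1]], faces[:, [1, 2]], faces[:, [2, 0]]])
    # Sort each pair so (a, b) and (b, a) count as the same edge.
    return np.unique(np.sort(pairs, axis=1), axis=0)


def adjacency_matrix(n_vertices, edges):
    """Symmetric boolean adjacency matrix of the mesh graph."""
    adj = np.zeros((n_vertices, n_vertices), dtype=bool)
    adj[edges[:, 0], edges[:, 1]] = True
    adj[edges[:, 1], edges[:, 0]] = True
    return adj


# Stand-in mesh: a tetrahedron (4 vertices, 4 triangular faces).
coords = np.array([[0.0, 0.0, 0.0],
                   [1.0, 0.0, 0.0],
                   [0.0, 1.0, 0.0],
                   [0.0, 0.0, 1.0]])
faces = [(0, 1, 2), (0, 1, 3), (0, 2, 3), (1, 2, 3)]

edges = edges_from_faces(faces)
adj = adjacency_matrix(len(coords), edges)

# Each graph node stores its 3D coordinates plus a feature vector.
feature_dim = 8  # hypothetical size, chosen only for illustration
features = np.zeros((len(coords), feature_dim))
node_state = np.concatenate([coords, features], axis=1)

print(edges.shape, adj.sum(), node_state.shape)
```

Keeping coordinates and features side by side in one per-node array is just one convenient choice; storing them separately would work equally well.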