[arxiv 1712.07262] FoldingNet: Point Cloud Auto-encoder via Deep Grid Deformation [PDF] [notes]
Yaoqing Yang, Chen Feng, Yiru Shen, Dong Tian
read 2018/09/19
Deforms a regular 2D grid into an object point cloud. Concurrent work with AtlasNet
- Propose a deep auto-encoder that maps a 3D point cloud to a reconstructed 3D point cloud obtained as a deformation of a 2D square grid
- They show that the obtained encoding outperforms other unsupervised encodings for classification purposes
- The input point set S is an nx3 matrix of point coordinates obtained by randomly sampling the triangles of the model's mesh
- the produced codeword is of size 512
- a graph is created where each point is connected to its 16 nearest neighbors
- a local 3x3 covariance matrix is computed for each point and flattened to 1x9
- the point positions and the local covariances are concatenated as input to a 3-layer perceptron
- two graph layers with max-pooling over the neighborhood of each node follow (see the encoder sketch below)
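A minimal encoder sketch matching the bullets above, assuming PyTorch (not the authors' implementation); the hidden widths (64, 128) and the final global max-pool are my guesses, only the 12-dim per-point input, the 16-NN graph pooling and the 512-dim codeword come from the notes:

```python
import torch
import torch.nn as nn

class FoldingNetEncoderSketch(nn.Module):
    """Per-point features (xyz + flattened 3x3 local covariance) -> 3-layer
    perceptron -> two graph layers that max-pool over each point's 16-NN
    neighborhood -> global max-pool to a 512-dim codeword."""
    def __init__(self):
        super().__init__()
        # 3 coordinates + 9 covariance entries = 12 input features per point
        self.mlp = nn.Sequential(
            nn.Linear(12, 64), nn.ReLU(),
            nn.Linear(64, 64), nn.ReLU(),
            nn.Linear(64, 64), nn.ReLU(),
        )
        self.graph_layer1 = nn.Linear(64, 128)   # widths are assumptions
        self.graph_layer2 = nn.Linear(128, 512)

    @staticmethod
    def graph_max_pool(x, knn_idx):
        # x: (n, c) per-point features, knn_idx: (n, 16) neighbor indices
        return x[knn_idx].max(dim=1).values      # (n, c)

    def forward(self, xyz, local_cov, knn_idx):
        # xyz: (n, 3), local_cov: (n, 9), knn_idx: (n, 16)
        x = self.mlp(torch.cat([xyz, local_cov], dim=1))
        x = torch.relu(self.graph_layer1(self.graph_max_pool(x, knn_idx)))
        x = self.graph_layer2(self.graph_max_pool(x, knn_idx))
        return x.max(dim=0).values               # 512-dim codeword
```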
- they define the folding operation as the concatenation of the codeword to the grid points, followed by an MLP
- they perform two folding operations sequentially, first on the original 2D grid points, then on the points output by the first folding operation (see the decoder sketch below)
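A sketch of the folding decoder under the same PyTorch assumption; the per-fold MLP widths and the grid extent are guesses, while the 512-dim codeword, the 2D source grid (45x45, as mentioned further below) and the two sequential folds follow the notes:

```python
import torch
import torch.nn as nn

class FoldingDecoderSketch(nn.Module):
    """Each folding concatenates the codeword to every point (2D grid point
    for the first fold, folded 3D point for the second) and runs an MLP that
    outputs new 3D coordinates."""
    def __init__(self, codeword_dim=512):
        super().__init__()
        self.fold1 = nn.Sequential(
            nn.Linear(codeword_dim + 2, 512), nn.ReLU(),
            nn.Linear(512, 512), nn.ReLU(),
            nn.Linear(512, 3),
        )
        self.fold2 = nn.Sequential(
            nn.Linear(codeword_dim + 3, 512), nn.ReLU(),
            nn.Linear(512, 512), nn.ReLU(),
            nn.Linear(512, 3),
        )
        # regular 2D source grid, 45x45 = 2025 points (extent is a guess)
        u = torch.linspace(-0.3, 0.3, 45)
        self.register_buffer("grid", torch.cartesian_prod(u, u))  # (2025, 2)

    def forward(self, codeword):                                  # codeword: (512,)
        cw = codeword.expand(self.grid.shape[0], -1)              # (2025, 512)
        pts = self.fold1(torch.cat([cw, self.grid], dim=1))       # first folding
        pts = self.fold2(torch.cat([cw, pts], dim=1))             # second folding
        return pts                                                # (2025, 3) point cloud
```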
- A variant of the Chamfer distance is used to handle reconstruction and input point clouds of different sizes: it takes the max of the two directional terms of the usual Chamfer distance, which forces both to be simultaneously small
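A sketch of that loss (a naive pairwise-distance version, assuming the two directional terms are averaged over points; not the authors' implementation):

```python
import torch

def folding_chamfer_loss(s, s_hat):
    """Max of the two directional Chamfer terms between the input cloud
    s (n, 3) and the reconstruction s_hat (m, 3), so the loss is only
    small when both terms are small."""
    d = torch.cdist(s, s_hat)                  # (n, m) pairwise distances
    s_to_shat = d.min(dim=1).values.mean()     # input -> reconstruction term
    shat_to_s = d.min(dim=0).values.mean()     # reconstruction -> input term
    return torch.max(s_to_shat, shat_to_s)
```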
- They show that the quality of the codeword (evaluated on the ModelNet classification task) degrades gracefully with the fraction of the ShapeNet dataset used at training time, e.g. a linear SVM on the codewords reaches 85% classification accuracy with 20% of the training data vs 89% with the whole dataset
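The transfer evaluation amounts to fitting a linear SVM on frozen codewords; a scikit-learn sketch (function and argument names are hypothetical):

```python
from sklearn.svm import LinearSVC

def eval_codewords(codewords_train, labels_train, codewords_test, labels_test):
    """Fit a linear SVM on the (N, 512) codewords from the frozen encoder
    and report classification accuracy on the held-out split."""
    clf = LinearSVC().fit(codewords_train, labels_train)
    return clf.score(codewords_test, labels_test)
```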
- They show that a fully connected decoder achieves lower reconstruction error but also lower classification accuracy on the generated codewords (84% vs 89% for the folding decoder)
- the fully connected decoder is a 3-layer network that goes from 512 --> 1024 --> 2048 features --> 2048x3 output coordinates
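A sketch of that fully connected decoder under the same PyTorch assumption (the ReLUs between layers are a guess):

```python
import torch
import torch.nn as nn

class FullyConnectedDecoderSketch(nn.Module):
    """512 --> 1024 --> 2048 features --> 2048x3 output coordinates."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(512, 1024), nn.ReLU(),
            nn.Linear(1024, 2048), nn.ReLU(),
            nn.Linear(2048, 2048 * 3),
        )

    def forward(self, codeword):                   # codeword: (512,)
        return self.net(codeword).view(2048, 3)    # fixed-size point cloud
```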
- Alternatively, it is possible to perform successive deconvolution operations that progressively reduce the feature dimension (down to the 3 final coordinates) while increasing the spatial resolution, going from the codeword to a 45x45 grid of coordinates that can be interpreted as a point cloud
- this deconvolution-based decoder produces worse reconstructions but marginally better classification accuracy
- marginally decreases performance
- going from 2 to 3 folding operations does not significantly improve classification scores (88.25% --> 88.41% accuracy)
- same performance when using a cube of points instead of the 2D grid as the source to be folded
- marginally lower performance when using a 1D line instead of a 2D surface as the source (88.41% --> 86.71% accuracy)