# Geometric Deep Learning for 3D Shape Correspondence
# Abstract
**TL;DR**: *We improve the performance of the state-of-the-art 3D shape correspondence method by fusing extrinsic and intrinsic shape descriptors, combining handcrafted, spectral, geometric, and learned features.*
Code is available at: [https://github.com/pieris98/team13](https://github.com/pieris98/team13)
# 1. Introduction
The quest for understanding and manipulating 3D shapes is a fundamental area of research in computer vision and computer graphics, with applications in diverse fields like medical imaging[medical], autonomous driving[driving1,driving2], and augmented reality. At the heart of this exploration is the problem of 3D shape correspondence[corres]: the challenge of finding points on different shapes that correspond to each other, akin to identifying the 'same' spot on two distinct objects. This is crucial for tasks like shape matching, morphing, and recognition.
Previous efforts in 3D shape correspondence have largely revolved around leveraging geometric features and machine learning techniques. However, these approaches often face challenges due to the complexity of 3D shapes, varying topologies, and the need for robust feature descriptors that can accurately capture the underlying semantic information of these shapes regardless of sampling resolution or data representation. This has led to an ongoing search for methods that can effectively and accurately determine correspondence across diverse 3D shapes.
Our project is motivated by the recent advancements in the field, particularly the work on DiffusionNet[diffusionnet], which uniquely employs the heat diffusion equation within a deep learning architecture for shape correspondence. While DiffusionNet presents a significant step forward, it predominantly relies on intrinsic descriptors of shapes, which can limit its applicability in scenarios where extrinsic information is crucial.
Recognizing this limitation, our work proposes an innovative approach that integrates both intrinsic and extrinsic descriptors, leveraging the strengths of handcrafted, spectral, geometric, and learned features. This fusion aims to create a more comprehensive and versatile representation of 3D shapes, enhancing the ability to establish accurate correspondences.
We benchmark the improved features on standard datasets in 3D shape analysis, using metrics that evaluate correspondence accuracy, such as the $L^2$ loss and geodesic error. The experiments demonstrate notable improvements over state-of-the-art methods, especially in handling complex shapes and topologies. The main contributions are:
- Generation of pre-computed data for different features (WKS, SGW, LPS, SHOT) on the FAUST dataset.
- Integration of features combining intrinsic and extrinsic shape descriptors.
- Improved correspondence accuracy, verified on the widely used FAUST Humans 3D shape benchmark.
- Provision of an efficient and scalable learning approach for downstream 3D shape analysis tasks.
In summary, our work not only addresses the limitations of current methods in 3D shape correspondence but also sets a new standard in the field through the integration of diverse shape descriptors.
# 2. Method Summary
Current methods for 3D shape correspondence strive to support non-rigid and non-topology-preserving transformations between the compared shapes. This is paramount since shapes in datasets like FAUST do not preserve metrics such as angles or geodesic distances across poses. The topology can also vary, since a different pose might add holes to the shape (e.g. when body parts come into contact).
The most naive approach would be to directly map every point on one shape to its corresponding point on the other via a point-to-point map $T$. It has been shown that mapping real-valued functions $f$ between the two shapes is more beneficial, i.e. the map becomes a functional map. The map is agnostic to the specific choice of $f$ ($f$ could, for instance, represent curvature). It is important to note that the functional map requires both shapes to be equipped with a basis, and the choice of basis (reduced or full) has been shown to affect the geodesic error.
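To make this concrete, the standard functional maps formulation (stated here for completeness, using $\phi_i^M$ and $\phi_j^N$ for the LBO eigenbases of the two shapes; this is background, not our contribution) transports functions by composition with the point map and, in a truncated eigenbasis, reduces to a small matrix $C$ acting on coefficient vectors:
$T_F(f)=f \circ T^{-1}, \qquad f=\sum_i a_i \phi_i^M \;\Rightarrow\; T_F(f)=\sum_j \Big( \sum_i C_{ji}\, a_i \Big) \phi_j^N, \qquad C_{ji}=\left\langle T_F(\phi_i^M),\, \phi_j^N \right\rangle$
Truncating to the first $k$ eigenfunctions is exactly the reduced-basis choice mentioned above, trading accuracy for compactness.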
Optimizing feature extractors is paramount, since they directly influence the performance of the functional map. We investigate a combination of feature extractors, primarily those based on common partial differential equations.
Diffusion methods capture global information by recording heat dissipation. Running the diffusion PDE for a short period yields local features, while longer periods yield larger-neighborhood features. Comparing features at different diffusion times is known in the literature as multiscale matching. The same general idea applies both in HKS and in the spatial diffusion block of DiffusionNet.
The Heat Kernel Signature (HKS) and the Wave Kernel Signature (WKS) are both based on the Laplace-Beltrami operator (LBO). A similar multiscale intuition applies to WKS (introduced below): its energy distribution $f_E$ is tuned so that larger energies correspond to features describing local geometry, while smaller energies correspond to global features. HKS models heat dissipation through time,
$\Delta_M u(x, t)=-\frac{\partial u(x, t)}{\partial t}$
The eigendecomposed solution to the heat PDE is the heat kernel,
$k_t(x, y)=\sum_{i=0}^{\infty} e^{-\lambda_i t} \phi_i(x) \phi_i(y)$
where $\lambda_i$ are the eigenvalues and $\phi_i$ the eigenfunctions of the LBO; the signature evaluates the kernel on the diagonal, $\mathrm{HKS}(x, t)=k_t(x, x)$, at a range of diffusion times.
It is important to note that even though HKS does not directly depend on geodesic distance, it provides no guarantees under non-isometric transformations.
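For illustration, below is a minimal NumPy sketch of the HKS computation, assuming the LBO eigenpairs have already been computed (array names here are illustrative, not identifiers from our codebase):

```python
import numpy as np

def heat_kernel_signature(evals, evecs, times):
    """HKS(x, t) = sum_i exp(-lambda_i * t) * phi_i(x)^2.

    evals: (k,) LBO eigenvalues; evecs: (n, k) eigenfunctions sampled
    at the n vertices; times: (T,) diffusion times (log-spaced in practice).
    Returns an (n, T) descriptor matrix, one row per vertex.
    """
    decay = np.exp(-np.outer(evals, times))  # (k, T): e^{-lambda_i t}
    return (evecs ** 2) @ decay              # (n, k) @ (k, T) -> (n, T)
```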
WKS models the dissipation of quantum particles on the manifold using differing energy distributions ($f_E$), governed by the Schrödinger equation,
$\frac{\partial \psi}{\partial t}(x, t)=i \Delta \psi(x, t)$
where $\psi$ represents the wave function, with eigendecomposed solution
$\psi_E(x, t)=\sum_{k=0}^{\infty} e^{i E_k t} \phi_k(x) f_E\left(E_k\right)$
Since the probability of the particle being at $x$ is $|\psi_E(x, t)|^2$ and time has no particular meaning in terms of the manifold, averaging over time yields the signature
$\mathrm{WKS}(x, E)=\sum_{k=0}^{\infty} \phi_k(x)^2 f_E\left(E_k\right)^2 .$
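Analogously, here is a minimal NumPy sketch of WKS under the common log-normal choice of $f_E$; the log-energy grid and $\sigma$ heuristic below are typical choices from the literature, not necessarily the exact parameters used in our experiments:

```python
import numpy as np

def wave_kernel_signature(evals, evecs, num_energies=100, sigma_factor=7.0):
    """WKS(x, e) = C_e * sum_k phi_k(x)^2 * f_E(E_k)^2 with a Gaussian
    f_E in log-energy scale (sigma_factor is a common heuristic choice).

    evals: (k,) LBO eigenvalues (evals[0] ~ 0); evecs: (n, k) eigenfunctions.
    Returns an (n, num_energies) descriptor matrix.
    """
    log_ev = np.log(np.maximum(evals, 1e-8))
    energies = np.linspace(log_ev[1], log_ev[-1], num_energies)
    sigma = sigma_factor * (energies[1] - energies[0])
    # f_E(E_k)^2 for every (eigenvalue, energy) pair -> (k, num_energies)
    f2 = np.exp(-((log_ev[:, None] - energies[None, :]) ** 2) / (2 * sigma**2))
    # normalize each energy level by C_e = 1 / sum_k f_E(E_k)^2
    return (evecs ** 2) @ f2 / f2.sum(axis=0, keepdims=True)
```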
The main advantage over learned patch operators that leverage geodesic distances is robustness to topological noise: geodesics are dramatically affected by added connections in the shape, since these may create new "holes".
The DiffusionNet architecture also transforms input features by running the diffusion PDE for a learned time period per feature channel,
$h_t(u):=\Phi\left(\left[\begin{array}{c}e^{-\lambda_0 t} \\ e^{-\lambda_1 t} \\ \vdots\end{array}\right] \odot\left(\Phi^T M u\right)\right)$
where the right part of the Hadamard product represents the coefficients of the vertex features $u$ in the spectral domain, and the left part scales them according to the learned diffusion time before projecting back to the original basis.
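As a sketch, this block can be implemented directly from the eigendecomposition; the following PyTorch fragment assumes precomputed eigenpairs and lumped vertex masses (the function and argument names are illustrative, not DiffusionNet's exact API):

```python
import torch

def learned_diffusion(u, evals, evecs, mass, t):
    """A sketch of the spectral diffusion block h_t(u) above.

    u: (n, c) per-vertex features; evals: (k,) LBO eigenvalues;
    evecs: (n, k) eigenbasis Phi; mass: (n,) lumped vertex masses
    (the diagonal of M); t: (c,) learned diffusion time per channel.
    """
    coefs = evecs.T @ (mass[:, None] * u)            # Phi^T M u -> (k, c)
    decay = torch.exp(-evals[:, None] * t[None, :])  # (k, c): e^{-lambda_i t_c}
    return evecs @ (decay * coefs)                   # back to vertices: (n, c)
```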
An approach that did not lead to fruition was replacing the MLP of every DiffusionNet block with multi-head attention. This led to worse performance in both test geodesic error and compute time. Judging from traditional computer vision architectures like ViT, replacing layers that carry stronger inductive biases requires far more training data (compare ImageNet vs. JFT-300M pre-training). FAUST and most 3D benchmark datasets (like SCAPE and TOSCA) contain fewer than a hundred meshes. Another reason might be that attention and diffusion compete to learn the global structure of the meshes, since all vertices of the mesh were fed into the dot-product self-attention.
# 3. Implementation Details
For implementing the related work (previous baselines) and DiffusionNet (the final baseline method), the project has modified the following existing code bases: [DiffusionNet](https://github.com/nmwsharp/diffusion-net/tree/master), [geoconv](https://github.com/andreasMazur/geoconv/), [MoNet](https://github.com/sw-gong/MoNet/tree/master), [GeomFMaps](https://github.com/LIX-shape-analysis/GeomFmaps).
Scripts for executing experiments in all of these frameworks are provided in our GitHub repository. However, since the baseline method changed multiple times over the course of the project, the implementation details highlighted in this report refer only to the `diffusion-net` codebase. For instructions on reproducing the earlier related work, refer to the READMEs in the other sub-directories (`monet`, `geoconv`, `geomfmaps`) of our GitHub repository.
Data generation was a significant part of our combined-features contribution. The task proved more challenging than initially anticipated, due to pre-processing requirements and dependencies on multiple programming languages (e.g. MATLAB, C++).
To reproduce the results illustrated in this report, the following steps must be followed:
- The *training datasets* `FAUST_r` (training meshes) and `FAUST_r_vts` (ground-truth correspondences) should first be downloaded via the provided links and placed under `diffusionnet/experiments/functional_correspondence/data/faust/` as the `off_2/` and `corres/` sub-directories, respectively.
- *Feature* datasets that depend on other programming languages (e.g. LPS and SGW) are provided pre-computed and should be downloaded from the links in the GitHub repository.
- To run a training & evaluation experiment, execute the following command from within the `functional_correspondence` directory (with the `diffusionnet` conda environment and other dependencies in place), choosing the desired `input_features` *mode*:

  ```bash
  python functional_correspondence.py \
      --train_dataset=faust \
      --input_features=[one of: xyz,hks,shot,wks,lps,sgw,mma]
  ```
- Results will be stored in `training_results.txt`. An example Jupyter notebook for processing and visualizing the evaluations is located in `diffusionnet/evaluations`.
- To visualize the results, set the input path to the location of `training_results.txt` and the `input_features` mode selected, then run all code cells.
Instructions for all these steps are also included in the GitHub repository under the `diffusionnet` [directory](https://github.com/pieris98/team13/tree/main/diffusionnet).
The implementation integrates the features by *modifying* the existing DiffusionNet training dataset structure, loader, caching, and use of features for training, as follows (a sketch of the resulting channel-wise combination follows the list):
- The `faust_scape_dataset.py` file is modified to process *all* features regardless of the `input_features` mode argument. This means that *all features* are loaded from file or computed when the code runs for the first time.
- Specifically, LPS[lps] and SGW[sgw] features are loaded from `.pt` files, since their generation code depends on MATLAB, while SHOT and WKS features are computed *on the fly* using the `pyshot` and `mesh_signatures` libraries with tested parameters.
- Modified LPS features generation code is located under `LPS-matlab/demo.m`.
- Modified SGW features generation code is located under `spectral_descriptors/demo_spectral_descriptor.m`.
- `faust_scape_dataset.py` produces two files, `training.pt` and `test.pt`, which *cache* the features so that subsequent experiment runs do not re-compute them.
- `fmaps_model.py` and `functional_correspondence.py` are modified accordingly to handle the selected feature modes and their channel dimensions.
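For illustration, the channel-wise combination in the mma mode amounts to a simple concatenation of per-vertex descriptors; the sketch below uses hypothetical names rather than the exact identifiers in our code:

```python
import torch

def build_mma_features(xyz, hks, wks, shot, lps, sgw):
    """Illustrative 'mma' mode: append all descriptors channel-wise.

    Each argument is an (n, c_i) tensor over the same n vertices; the
    result is (n, sum_i c_i) and is fed to DiffusionNet unchanged.
    """
    return torch.cat([xyz, hks, wks, shot, lps, sgw], dim=-1)
```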
# 4. Experimental Results
To evaluate the performance of the added/combined features, we run the *supervised* version of the functional correspondence experiment from DiffusionNet (Section 5.3)[diffusionnet], which directly matches the task of 3D shape correspondence on standard mesh datasets.
Experiments on other tasks from DiffusionNet (e.g. classification on the SHREC-11 dataset, segmentation, discretization-agnostic learning) are omitted, since they are beyond the scope of this project on 3D shape correspondence.
Regarding the selected functional correspondence experiment, we did *not* include the weakly supervised version reported in the DiffusionNet paper: the paper does not provide enough information about the experimental setup, and no code is available from either DiffusionNet or the original Weakly Supervised Functional Maps work[weakly]. The weakly supervised experiment and reported results could therefore not be reproduced.
Moreover, information about the added ZoomOut[zoomout] post-processing step was missing, which made it impossible to reproduce that single reported result. We believe this does not affect our experiments, since the aim is to compare similar configurations of DiffusionNet with different feature data.
In our experiments we used the FAUST Humans[faust] dataset and omitted SCAPE[scape], since FAUST is based on *real* laser scans of humans registered with a sophisticated printed-pattern method, and therefore represents real-world 3D shapes better.
For evaluation we used the *test geodesic error*, the metric of the widely used Princeton benchmark[princeton] for 3D shape correspondence. It measures generalization on the test set and is tailored to curved 3D manifolds. It is also used in the DiffusionNet paper and related work, making our results easy to compare with reported ones.
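For reference, here is a minimal sketch of the metric, assuming a precomputed pairwise geodesic distance matrix on the target mesh (the names and the normalization constant are illustrative; a common choice is the square root of the target's total surface area):

```python
import numpy as np

def mean_geodesic_error(pred, gt, geo_dist, norm):
    """Princeton-style correspondence error: geodesic distance between
    predicted and ground-truth target vertices, normalized by `norm`.

    pred, gt: (n,) vertex indices on the target shape;
    geo_dist: (m, m) pairwise geodesic distances on the target mesh.
    """
    return geo_dist[pred, gt].mean() / norm
```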
After training our model with different input features, we compare the test geodesic error on the FAUST dataset. The evaluation compares the previously reported DiffusionNet results (top section of the table) with results obtained from training DiffusionNet on SHOT descriptors, Wave Kernel Signatures (wks), Local Point Signatures (lps), Spectral Graph Wavelet Signatures (sgw), and mixed modes of features appended together (mma).
The *key observed result* is that performance increases, i.e., geodesic error **decreases**, when multiple features are appended together channel-wise (the mma mode). Table 1[tab:geodesic] shows the geodesic error obtained with each method.
For a fair, apples-to-apples comparison, we train both the best baseline (DiffusionNet with heat kernel signatures) and the proposed variants for the *same number of epochs*, and measure training and test loss as well as geodesic error at test time. Figure 2a[fig:train] shows the training loss curves over 4 epochs, while Figure 2b[fig:test] shows the test loss curves.
Two important observations in these plots are: (a) the *poor generalization* (peak in test loss) and *overfitting* (lowest training loss) of SHOT descriptors alone, and (b) the *performance boost* from appending mixed feature modes as input to the network (mma).
Finally, Figure 3[fig:geodesic] shows the geodesic error obtained at each of the 4 epochs the model variants were trained for. The **main result** is illustrated in this plot: the best performance, i.e., the lowest geodesic error, is obtained with the mma features. This confirms the hypothesis that combining *spectral* (HKS, SGW, WKS), *multi-scale* (LPS), *local* (WKS), *high-order* (SHOT), and *raw geometric* (XYZ) features minimizes the **information loss** for the model. This, in turn, yields state-of-the-art performance on 3D shape correspondence, surpassing the baseline methods used in DiffusionNet.
# 5. Conclusion
In this project, we extended DiffusionNet with combined intrinsic and extrinsic shape descriptors and evaluated them on the FAUST functional correspondence benchmark, where the combined (mma) features achieved the lowest geodesic error. Some limitations observed during the project are the aforementioned disadvantages of SHOT descriptors when used as sole input features, as well as non-ideal results when applying WKS alone. The second limitation relates to a known property of the Wave Kernel Signature[wks]: it tends to capture local rather than global features, which is reflected in our evaluation results.
Finally, we leave as future work (a) porting the MATLAB feature-generation code so that features can be computed at runtime, and (b) integrating alternative mechanisms to replace the point-wise MLP layer that do not interfere with the learned diffusion layers the way self-attention did.
## Acknowledgements
Aristotelis Siozopoulos and Pieris Kalligeros contributed to the theoretical background, data generation and pre-processing, method implementation, and evaluation experiments.
Alex Tio Suryapranata contributed to visuals, the poster and presentation.
As noted above, for implementing the related work (previous baselines) and DiffusionNet (the final baseline method), we modified the following existing code bases: [DiffusionNet](https://github.com/nmwsharp/diffusion-net/tree/master), [geoconv](https://github.com/andreasMazur/geoconv/), [MoNet](https://github.com/sw-gong/MoNet/tree/master), [GeomFMaps](https://github.com/LIX-shape-analysis/GeomFmaps).
For data pre-processing and datasets, we used/modified code from the following repositories:
[LPS-matlab](https://github.com/yiqun-wang/LPS-matlab/tree/master), [spectral_descriptors](https://github.com/ChunyuanLI/spectral_descriptors), [mesh-signatures](https://github.com/DominikPenk/mesh-signatures).