# Reproduction of "Learning Steerable Filters for Rotation Equivariant CNNs"
**Bertold Kovacs (5572630)
Saloni Saxena (6164579)
Vaishnav Srinidhi (6211097)**
In many real-life applications of computer vision models, we desire certain transformational properties: when the input is transformed in some way, we want predictable behavior in the output. There are two key concepts here:
* **Invariance** means the output stays exactly the same regardless of the transformation. For example, if we want to recognize whether a picture contains a circle, it should not matter whether the circle is in the right corner or in the middle - the classification should remain "circle detected".
* **Equivariance** is a broader concept where the output transforms in a predictable, corresponding way to the input transformation. If you rotate an image by 90 degrees, an equivariant model's internal representations will also rotate by 90 degrees, maintaining the spatial relationships.
These properties can be learned from data through augmentation (adding transformed examples), but building them into the architecture offers advantages like better generalization and reduced overfitting. Convolutional Neural Networks (CNNs) are translation equivariant by design - when you shift the input image, the feature maps shift correspondingly. However, for other transformations like rotation, equivariance is typically learned through extensive data augmentation rather than being built into the architecture.
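To make the translation-equivariance property of convolutions concrete, here is a minimal PyTorch check (the slicing simply avoids boundary effects from zero padding):

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
x = torch.randn(1, 1, 8, 8)       # random "image"
w = torch.randn(1, 1, 3, 3)       # random 3x3 filter

y = F.conv2d(x, w, padding=1)
x_shifted = torch.roll(x, shifts=1, dims=3)   # shift input one pixel right
y_shifted = F.conv2d(x_shifted, w, padding=1)

# Away from the borders, conv(shift(x)) equals shift(conv(x)).
print(torch.allclose(y_shifted[..., 2:7], y[..., 1:6], atol=1e-6))  # True
```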
In [Learning Steerable Filters for Rotation Equivariant CNNs](https://arxiv.org/pdf/1711.07289) by Weiler et al., the authors propose the **Steerable Filter CNN (SFCNN)** architecture, which is both **translation and rotation equivariant**. In this blog post, we explain the idea behind SFCNNs and describe our attempts at reproducing and replicating their reported results.
## Steerable Filters and Rotation Equivariance
If a simple CNN is trained to identify dogs, but is only trained on pictures of dogs standing upright, it will struggle to correctly identify a dog lying horizontally, as it has never seen such an orientation before. Simple CNNs lack rotation equivariance. Typical real-life data with no preferred orientation include biomedical microscopy images and astronomical data. Furthermore, even when the full input does have a preferred orientation, low-level features (such as the edges detected in the first layers) appear at all orientations, so rotation equivariance is still useful there.
Previous approaches to learning rotation equivariant models fall into two categories. First, we can use **data augmentation** to add rotated copies of the training samples. While effective to some extent, this approach does not reduce the hypothesis space: the model must explicitly learn the equivariance from the data, which requires high learning capacity, makes it prone to overfitting, and is computationally expensive.
Alternatively, we can constrain the model itself by building the equivariance into the **architectural design**. [Cohen and Welling (2016)](https://arxiv.org/pdf/1602.07576) propose Group Equivariant Convolutional Networks, which include filters at four orientations in the network. The proposed SFCNNs improve upon this by handling an arbitrary number of orientations at arbitrary angular precision.
The core innovation of the SFCNN architecture lies in its filters. Instead of learning the weights of a filter pixel by pixel, an SFCNN learns to steer a predefined set of basis filters. The architecture is composed of layers that perform **group convolutions**.
The filters themselves are not arbitrary - they are constructed as a linear combination of a fixed set of atomic steerable basis functions. This means the network doesn't learn the filter directly, but rather the weights to combine these basis functions. By manipulating these weights, the filter can be "steered" to any orientation.
The choice of basis functions thus becomes crucial. The authors choose **circular harmonics**, which are a set of mathematical functions that are well-suited for representing signals on a circle. In the context of 2D images, they provide a natural basis for representing the angular component of a filter.
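Concretely, in the paper's notation each atomic basis filter is a circular harmonic: a radial profile $\tau_j$ (a Gaussian ring) multiplied by an angular phase of integer frequency $k$ (normalization omitted here):

$$
\psi_{jk}(r, \phi) = \tau_j(r)\, e^{ik\phi},
\qquad
\psi_{jk}(r, \phi - \theta) = e^{-ik\theta}\, \psi_{jk}(r, \phi)
$$

The second identity is what makes the basis steerable: rotating a filter by $\theta$ only multiplies its expansion coefficients by the phase $e^{-ik\theta}$, so responses at any orientation follow from one set of learned weights.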
Further, the authors introduce a new weight initialization scheme to aid learning. They recognize that standard initialization techniques like Xavier or He initialization are not directly applicable to SFCNNs, because the learned weights are not pixel values but coefficients of the basis functions. They therefore propose a generalized initialization scheme, **CoeffInit**, which ensures that the initial filter responses are well-behaved and conducive to effective learning.
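To illustrate the idea (not the paper's exact derivation), here is a hypothetical sketch: assuming a roughly orthonormal basis, a He-style rule would scale the coefficient variance by the fan-in times the number of basis functions. The names `coeff_init` and `n_basis` are our own, not from the paper or the e2cnn library.

```python
import numpy as np

def coeff_init(n_in, n_basis):
    """Hypothetical He-style initialization for basis-filter coefficients.

    Assumption: the steerable basis is (roughly) orthonormal, so the variance
    of a filter response scales with n_in * n_basis * Var[w]. Matching He
    initialization then suggests Var[w] = 2 / (n_in * n_basis). The exact
    scheme derived in the paper may differ.
    """
    std = np.sqrt(2.0 / (n_in * n_basis))
    return np.random.normal(loc=0.0, scale=std, size=(n_in, n_basis))
```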
## Reproduction Method and Results
In our work, we have a mixture of **replications** and **reproductions**.
* The replications are where we implemented the models and experiments described in the paper using the [e2cnn library](https://github.com/QUVA-Lab/e2cnn), trained and tested the models, and report our results; a minimal model-building sketch follows this list. The code for our work can be found in this [GitHub repository](https://github.com/V41SH/Reproduction-Project-12).
* The reproductions are where we set up and ran the [experiment scripts](https://github.com/QUVA-Lab/e2cnn_experiments) provided by the authors and evaluated the results.
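For our replications, we built models along these lines. The snippet below is a minimal sketch using the e2cnn API; the layer sizes are illustrative and not the paper's exact architecture:

```python
import torch
from e2cnn import gspaces
from e2cnn import nn as enn

# Rotation group with 16 discrete orientations acting on the plane.
r2_act = gspaces.Rot2dOnR2(N=16)

# Input: one trivial (grayscale) field; hidden: 8 regular fields.
feat_in = enn.FieldType(r2_act, [r2_act.trivial_repr])
feat_hid = enn.FieldType(r2_act, 8 * [r2_act.regular_repr])

model = enn.SequentialModule(
    enn.R2Conv(feat_in, feat_hid, kernel_size=7, padding=3),
    enn.InnerBatchNorm(feat_hid),
    enn.ReLU(feat_hid),
    enn.GroupPooling(feat_hid),  # pool over orientations -> rotation-invariant maps
)

x = enn.GeometricTensor(torch.randn(1, 1, 28, 28), feat_in)
print(model(x).tensor.shape)  # torch.Size([1, 8, 28, 28])
```

Varying `N` in `Rot2dOnR2` is also how we swept the number of sampled filter orientations in Experiment 2 below.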
We will present five experiments. In Experiment 1, we replicate the illustration of the circular harmonics equation to verify that the paper uses this fundamental building block correctly. In Experiments 2 and 3, we replicate and reproduce models with different hyperparameters to verify the trends claimed in the paper. In Experiment 4, we replicate their results on a real-life dataset to check that the model also performs well in a realistic scenario. Finally, in Experiment 5, we compare the model's performance with that of other models.
### Experiment 1: Circular Harmonics
We started by replicating their visualization of the circular harmonics (see Figure 2 of the original paper). This was relatively straightforward: we simply plotted the circular harmonics equation with matplotlib. Each circular harmonic has a specific frequency and phase. We visualize the results according to the radial index *j* and the angular frequency *k*. A sketch of our plotting code follows.
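This is a minimal sketch of the plotting code; the Gaussian radial profile centered at a radius proportional to *j* is our assumption for the unnormalized radial part:

```python
import numpy as np
import matplotlib.pyplot as plt

def circular_harmonic(j, k, size=33, sigma=2.0):
    """Real part of psi_jk(r, phi) = tau_j(r) * exp(i*k*phi) on a pixel grid.
    tau_j is a Gaussian ring at radius j*sigma (our choice of placement)."""
    c = (size - 1) / 2
    y, x = np.mgrid[-c:c + 1, -c:c + 1]
    r = np.hypot(x, y)
    phi = np.arctan2(y, x)
    tau = np.exp(-(r - j * sigma) ** 2 / (2 * sigma ** 2))
    return tau * np.exp(1j * k * phi)

fig, axes = plt.subplots(3, 5, figsize=(10, 6))
for j, row in enumerate(axes):
    for k, ax in enumerate(row):
        ax.imshow(circular_harmonic(j, k).real, cmap='gray')
        ax.set_title(f'j={j}, k={k}', fontsize=8)
        ax.axis('off')
plt.tight_layout()
plt.show()
```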
<figure id="circular_harmonics_repro">
<img src="https://hackmd.io/_uploads/SJ4GvwgElg.jpg" alt="Reproduction results for Circular Harmonics"/>
<figcaption><b>Figure 1: </b> <em><i>The circular harmonics illustration, matching the original paper's Figure 2.</i></em> </figcaption>
</figure>
**Results and Analysis**: This replication was successful and matched the original paper's results exactly. The visualization correctly demonstrates the mathematical foundation of the circular harmonics used in steerable filter CNNs.
### Experiment 2: Filter Orientations vs Test Error (Figure 4 Left)
This experiment illustrates how the test error on the rotated MNIST dataset changes as we increase the number of sampled filter orientations. Due to computational limitations, we ran only a subset of the experiments. This experiment shows how the models react to changes in hyperparameters, and whether the reported trends hold.
**Replication Results:**
<figure id="fig_4_repli">
<img src="https://hackmd.io/_uploads/B1UdIu-Vel.png" alt="Replication results for Figure 4 left."/>
<figcaption><b>Figure 2: </b> <em><i>The results of our replication. Our results differed significantly from what we expected; neither the trend nor the magnitude of the errors matched our expectations.</i></em> </figcaption>
</figure>
**Reproduction Results:**
<figure id="fig_4_repro">
<img src="https://hackmd.io/_uploads/S10X8OZEee.jpg" alt="Reproduction results for Figure 4 left."/>
<figcaption><b>Figure 3: </b> <em><i>Reproduction using the authors' existing code went according to plan; we obtained results similar to those reported in the paper.</i></em> </figcaption>
</figure>
**Results and Analysis**: The reproduction results were much closer to what was reported in the paper than our replication. Our replication results differed significantly from expectations in both the trend and the magnitude of the errors, despite following the experimental setup in the paper. This difference is likely due to implementation details missing from the paper, as well as our computational constraints. On closer inspection of the [e2cnn experiments](https://github.com/QUVA-Lab/e2cnn_experiments), we noticed that the architecture used for this experiment is much more complex than the one described in the paper. Since the authors do provide code for running the experiment, the paper remains reproducible, but ideally one should be able to replicate the results using the paper alone, without outside code.
### Experiment 3: Rotation Generalization (Figure 4 Right)
This experiment plots the test error of models across varying angles of rotation in the test dataset. This quantitatively shows that the SFCNN generalizes to unseen rotations. A sketch of our angle-sweep evaluation follows the list below. The models trained for this part were:
* 4 simple CNNs of similar capacity (parameters) trained on MNIST data, each differing in rotation orientations of the training dataset: no rotations, 180° rotations, 90° rotations, and continuously rotated data
* An SFCNN with 16 orientations trained on original unrotated MNIST data
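The evaluation sweep itself is straightforward. Below is a minimal sketch, assuming a trained classifier `model` and an MNIST test `loader` (both placeholders here):

```python
import torch
import torchvision.transforms.functional as TF

@torch.no_grad()
def test_error_at_angle(model, loader, angle_deg):
    """Classification error on a test set rotated by a fixed angle."""
    model.eval()
    wrong, total = 0, 0
    for images, labels in loader:
        images = TF.rotate(images, angle_deg)  # rotate the whole batch
        preds = model(images).argmax(dim=1)
        wrong += (preds != labels).sum().item()
        total += labels.numel()
    return wrong / total

# Sweep over test-set rotations, as in Figure 4 (right):
# errors = [test_error_at_angle(model, loader, a) for a in range(0, 360, 10)]
```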
<figure id="fig_4_right">
<img src="https://hackmd.io/_uploads/SkcCyhf4ll.png" alt="Reproduction results for Figure 4 right."/>
<figcaption><b>Figure 4: </b> <em><i>The results of our replication for figure 4 (right) in the original paper.</i></em> </figcaption>
</figure>
**Results and Analysis**: The SFCNN generalizes to rotations much better than the first two CNNs (no rotations and 180° rotations) without being trained on augmented data, demonstrating the effectiveness of steerable filters for rotation invariance. Only the heavily data-augmented CNNs (90° and continuous rotations) outperform the SFCNN, which is expected given their extensive exposure to rotated training data. This validates the paper's claim about the rotation equivariance properties of SFCNNs.
### Experiment 4: ISBI 2012 2D EM Segmentation Challenge (Figure 5)
The authors demonstrated state-of-the-art performance on the ISBI 2012 2D EM segmentation challenge to show the segmentation capabilities of their proposed model on a real-world dataset.
We found that the corresponding experiment was not available in the [experiments repository](https://github.com/QUVA-Lab/e2cnn_experiments). Therefore, we implemented our own experiment and model from scratch, working directly from the methodology described in the paper.
The architecture for this challenge is a U-Net inspired design detailed in Figure 6 of the appendix. However, the paper does not specify key training details such as the number of epochs and the batch size, which may explain why our predictions differ from their results. We applied a binary mask to our predictions for visualization to match their presentation format.
We trained the replicated model for 50 epochs with a batch size of 1 due to computational constraints.
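The binary mask mentioned above is a simple threshold on the predicted boundary probabilities; the threshold value is our choice and not specified in the paper:

```python
import numpy as np

def binarize(prob_map, threshold=0.5):
    """Turn predicted boundary probabilities into a binary mask for display.
    The 0.5 threshold is our assumption, not a value from the paper."""
    return (np.asarray(prob_map) > threshold).astype(np.uint8) * 255
```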
<figure id="fig_5_repli">
<img src="https://hackmd.io/_uploads/BkHEniMNee.png" alt="Replication results for Figure 5 left."/>
<figcaption><b>Figure 5: </b> <em><i>The results of our replication. ISBI 2012 challenge: (left) Raw EM image, (center) Ground Truth, (right) Predicted boundary segmentation.</i></em> </figcaption>
</figure>
**Results and Analysis**: The ISBI challenge uses two main metrics:
* **VRand** F-score: measures overall similarity between segmentations based on pixel-pair agreements, penalizing both splits and merges
* **VInfo** F-score: quantifies information loss/gain between segmentations, and is highly sensitive to the number and distribution of segments
Our replicated model achieved a VRand F-score of 0.7770, indicating good overall segmentation performance. However, the VInfo F-score was very low at 0.0364, highlighting significant topological discrepancies likely due to prevalent under-segmentation (merging of objects).
The deviation in performance compared to the original paper can be attributed to missing training details (epochs, batch size, learning rate schedule) and computational limitations that forced us to use a batch size of 1, which likely affected model convergence and generalization. Additionally, a lot of post-processing often goes into these final outputs, which has not been detailed in the paper.
### Experiment 5: MNIST Classification Performance
The following table reports the test errors on the rotated MNIST dataset for a baseline (conventional CNN), the proposed SFCNN with baseline initialization (He init), and the proposed SFCNN with the proposed initialization (CoeffInit).
| Method | Test Error (%) |
| -------- | -------- |
| **SFCNN - CoeffInit** | **2.89** |
| SFCNN - HeInit | 2.94 |
| conventional CNN (Z2CNN from [Cohen and Welling (2016)](https://arxiv.org/pdf/1602.07576)) | 56.15 |
**Results and Analysis**:
Our results show that the SFCNN with coefficient initialization achieves the best performance, closely followed by the SFCNN with He initialization. Both significantly outperform the conventional CNN. While our error rates are higher than those reported in the original paper, they follow the same trend. This validates the authors' claims about the effectiveness of their proposed initialization scheme.
## Finally, how reproducible is the paper?
Following this paper, further research on group equivariance in CNNs led to the e2cnn library and the experiment scripts that we used extensively for our reproductions.
Since the authors maintain a well-documented repository, the paper is readily reproducible. Even when details are unclear or missing from the paper, their code is available.
One problem concerns the ISBI dataset. There is no longer an "official" source for it, since the challenge has long since ended (and the link provided in the paper no longer works). While the (presumably) same dataset is accessible in an unaffiliated GitHub repository, it would be ideal to ensure that datasets do not "deprecate" in the future.
## Contributions
Our team worked collaboratively on all parts of this reproduction project. More specifically,
| Group member | Main Contributions |
|-------------|-------------------------------------------------------------------------|
| Bertold | Model Training, Hyperparameter testing experiments - Blog |
| Saloni | Model Training, ISBI training, ISBI evaluation - Blog |
| Vaishnav | Model training, Hyperparameter testing experiments, ISBI model replication - Blog |
For the purposes of grading, Bertold was responsible for the replication, Saloni for reproduction and Vaishnav for hyperparameter testing.
## Conclusion
Reproducing ["Learning Steerable Filters for Rotation Equivariant CNNs"](https://arxiv.org/pdf/1711.07289) was trickier than expected. We successfully validated the core mathematical foundations - the circular harmonics matched the paper perfectly and the rotation generalization experiments clearly showed that SFCNNs can handle rotated inputs without needing augmented training data.
However, we also learned how tricky it can be to replicate deep learning papers from scratch. Our replication attempts often diverged significantly from the original results, while running the authors' provided code worked much better. This really highlights how sensitive these models are to implementation details that don't always make it into the paper.
So, SFCNNs prove to be valuable for applications requiring rotation equivariance. The theoretical framework is sound and the performance benefits are demonstrated, although implementing them from scratch (using the paper alone) presents significant challenges and required considerable time investment.
## Generative AI Declaration
* Brainstorming - Used Claude.ai to brainstorm, understand concepts and assimilate a high-level plan of action prior to beginning implementation and report writing
* Grammar and Spelling Check - Used ChatGPT for grammar and spell checks