# Learning Derivatives with Implicit Neural Representations
## Description of the project
In this project, we explore various hyperparameters for training an implicit neural representation of a 2D isotropic diffusion sequence of a single image.
The original image used for generating data can be seen under the *original* subdirectory.
The parameters we explore are:
- Activation Functions
- Learning Rates
- Loss Functions
## Description of the repository
All code needed to reproduce the experiments is contained in the FinalProject.ipynb notebook.
## Example commands to execute the code
Run all cells within the notebook in order.
Results are generated under the *runs* subdirectory and can be visualized in the notebook's last cell,
or by running the following command in a separate terminal from the base directory of this repository.
```
tensorboard --logdir=runs
```
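If you prefer to stay inside Jupyter, TensorBoard can also be embedded directly in the notebook via its IPython extension (assuming the `tensorboard` package is installed in the notebook's environment):
```
%load_ext tensorboard
%tensorboard --logdir runs
```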
The trained model's video output can be found with the same name under the *videos* subdirectory.
**Note: Interrupting training mid-run with Ctrl-C can cause CUDA memory errors. \
To recover, you must restart your kernel.**
## Results
### Overview of Experiments
Our objective was to determine the optimal set of hyperparameters for the implicit neural network. “Optimal” here means the highest accuracy in learning not only the image function itself, but also its gradients, Laplacians, and time derivatives.
#### Experiments
We ran the following experiments:
1. Find the best performing learning rate.
- From 1e-4 to 1e-7
- Learning Rate Schedulers:
- Uniform
- Decay (Exponential, Multi-Step)
- Cyclic
2. Find the best performing type of activation function.
- ELU
- Periodic (sine)
3. Find the best performing weighted loss.
- Learning on the pixel value, gradients, Laplacian, and pixel time derivative, either individually or in a weighted combination (a sketch of such a weighted loss is shown below)
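As a sketch of how such a weighted loss can be assembled, the snippet below combines the four MSE terms with per-term weights. The dictionary keys and variable names are illustrative and do not necessarily match the notebook's implementation.
```
import torch.nn.functional as F

def weighted_mse_loss(pred, target, betas=(1.0, 0.01, 0.001, 0.0)):
    """Weighted sum of the four MSE terms.

    `pred` and `target` are dicts of tensors holding the network output
    and the ground truth for each quantity; the keys are illustrative.
    """
    b_pix, b_grad, b_lap, b_dt = betas
    return (b_pix * F.mse_loss(pred["pixel"], target["pixel"])
            + b_grad * F.mse_loss(pred["gradient"], target["gradient"])
            + b_lap * F.mse_loss(pred["laplacian"], target["laplacian"])
            + b_dt * F.mse_loss(pred["dt"], target["dt"]))
```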
#### Where to find results
The results of each experiment (after having run the notebook) are stored in a subdirectory with the following naming conventions:
1. For learning rate and loss weight combination experiments:
```
[runs OR videos]/cameraman/experiments/[beta weights]/[activation function]_[learning rate scheduler]_[learning rate]
```
2. For activation function experiments:
```
[runs OR videos]/[activation function]/cameraman_experiments_[loss type]
```
### Metrics Used
#### Metric 1: MSE loss
We used four MSE losses: pixel MSE loss, gradient MSE loss, Laplacian MSE loss (second spatial derivative), and pixel time derivative MSE loss.
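The derivative quantities on the network side can be obtained by differentiating the implicit network's output with respect to its input coordinates. The snippet below is a minimal autograd sketch under the assumption that the network takes (x, y, t) coordinates and outputs a single pixel value; it is not the notebook's exact implementation.
```
import torch

def derivatives(model, coords):
    """Value, spatial gradient, Laplacian and time derivative of an INR.

    Assumes `coords` has shape (N, 3) with columns (x, y, t) and that
    `model` maps coordinates to a single channel; names are illustrative.
    """
    coords = coords.clone().requires_grad_(True)
    value = model(coords)

    # First derivatives with respect to all inputs: d(value)/d(x, y, t).
    grad = torch.autograd.grad(value.sum(), coords, create_graph=True)[0]
    grad_xy, d_dt = grad[:, :2], grad[:, 2:]

    # Laplacian = d2/dx2 + d2/dy2, obtained by differentiating each
    # spatial component of the gradient once more.
    lap = 0.0
    for i in range(2):
        second = torch.autograd.grad(grad[:, i].sum(), coords, create_graph=True)[0]
        lap = lap + second[:, i:i + 1]

    return value, grad_xy, lap, d_dt
```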
#### Metric 2: SSIM (Structural Similarity Index Measure)
This metric was developed by Wang et al. [1] to compare the visual similarity of two images by taking luminance, contrast, and structure into account. It is often used to measure image quality degradation, for example due to compression. Since the sharpness of an image generated by an implicit network indicates how well the network learned the underlying image function f, we use SSIM in addition to the MSE losses. This allows us to compare the visual quality of the ground truth and the implicitly encoded images.
The original paper can be found [here](https://www.cns.nyu.edu/pub/eero/wang03-reprint.pdf) [1]. A detailed description of this metric can be found [here](https://medium.com/srm-mic/all-about-structural-similarity-index-ssim-theory-code-in-pytorch-6551b455541e) [2].
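For illustration, SSIM on two grayscale arrays can be computed with scikit-image; the notebook may use a different implementation, and the helper below is only one way to evaluate the metric.
```
from skimage.metrics import structural_similarity

def pixel_ssim(reconstruction, ground_truth):
    """SSIM between an implicitly encoded image and the ground truth.

    Both inputs are 2D float arrays in the same value range.
    """
    data_range = ground_truth.max() - ground_truth.min()
    return structural_similarity(reconstruction, ground_truth, data_range=data_range)
```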
### Resulting Graphs and Key Observations
PDFs of our notebook containing the results can be found under the *results* subdirectory.
*Notebook_Loss_Combinations_And_Activations.pdf* shows the learning rate experiments and *Notebook_Learning_Rates_And_Activations.pdf* shows the loss weight experiments. Both show the activation experiments.
#### 1. Learning Rate
We quantitatively examined which learning rate and scheduler perform best in terms of pixel loss, pixel SSIM, gradient loss, Laplacian loss, and time derivative loss.
##### 1.1 Pixel MSE Loss


##### 1.2 Pixel SSIM


##### 1.3 Gradient MSE Loss


##### 1.4 Laplacian MSE Loss


##### 1.5 Time Derivative MSE Loss


##### Discussion of Learning Rate Experiments
As shown in 1.1, in terms of pixel value loss the smallest uniform learning rate (1e-7) performed best, with exponential decay performing second best. As expected, the largest uniform learning rate performed by far the worst, likely oscillating around the minimum without ever being able to descend into it properly. These results suggest that future experiments should include even smaller learning rates, since the range we explored may not have extended low enough.
The results we found for the pixel value loss also apply to the gradient (1.3) and Laplacian (1.4) losses. In other words, there is not one learning rate or scheduler that works better for the pixels and another that works better for the gradients; the same learning rates perform consistently across all of these losses.
The picture changes, however, for the visual quality measure SSIM discussed earlier (graph shown in 1.2). Interestingly, exponential decay performed by far the best in terms of SSIM, while the smallest uniform learning rate only placed third, significantly behind exponential decay. The same learning rates and schedulers therefore rank differently on the MSE losses and on the visual similarity metric.
For the time derivative loss, we observe that the largest uniform learning rate (1e-4) performs best, i.e. it yields the lowest loss. This is an interesting result, since the same learning rate yielded the highest loss values for the pixel, gradient, and Laplacian losses.
We also tried cyclic learning rates, but a bug in the implementation meant we excluded these results, as we cannot draw any reasonable conclusions from them for now. The code is still included, however; by adjusting the `step_size_up` parameter appropriately, users can run experiments with cyclic learning rates.
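For reference, the cyclic schedule corresponds to PyTorch's `torch.optim.lr_scheduler.CyclicLR`. Below is a minimal, self-contained sketch of how `step_size_up` fits in; the model and step counts are placeholders, not the project's actual network or training loop.
```
import torch

# Placeholder model/optimizer purely to illustrate the scheduler setup.
model = torch.nn.Linear(3, 1)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-7)

# CyclicLR oscillates the learning rate between base_lr and max_lr;
# step_size_up is the number of optimizer steps spent rising from
# base_lr to max_lr and should be tuned to the length of the training run.
scheduler = torch.optim.lr_scheduler.CyclicLR(
    optimizer, base_lr=1e-7, max_lr=1e-4,
    step_size_up=500, cycle_momentum=False)  # Adam has no momentum parameter

for step in range(2000):
    # In practice: forward pass, loss computation, and backward() go here.
    optimizer.step()
    scheduler.step()  # advance the cyclic schedule once per optimizer step
```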
#### 2. Type of activation function
We quantitatively examined the two activation functions (ELU and periodic) and compared their pixel loss, pixel SSIM, gradient loss, Laplacian loss, and time derivative loss.
We analyzed the effect of the activation function choice under different loss weights to evaluate whether ELU or a periodic activation function (as per the SIREN paper [3]) yields higher accuracy; a simplified sketch of both activation choices follows the list of weight combinations below.
Weight combinations used (Pixel value, Gradients, Laplacian, Pixel time derivative):
- 1, 0, 0, 0
- 0, 1, 0, 0
- 0, 0, 1, 0
- 0, 0, 0, 1
- 1, 1, 1, 1
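For reference, the difference between the two choices amounts to swapping the nonlinearity of the coordinate MLP. The sketch below is a simplified illustration (the SIREN paper [3] additionally prescribes a specific weight initialization, omitted here); the layer sizes are placeholders and do not reflect the notebook's exact architecture.
```
import torch
from torch import nn

class Sine(nn.Module):
    """Periodic activation as in the SIREN paper [3]: x -> sin(w0 * x)."""
    def __init__(self, w0=30.0):
        super().__init__()
        self.w0 = w0

    def forward(self, x):
        return torch.sin(self.w0 * x)

def make_mlp(activation="periodic", in_dim=3, hidden=256, out_dim=1, n_layers=3):
    """Small coordinate MLP using either ELU or periodic activations."""
    act = Sine() if activation == "periodic" else nn.ELU()
    dims = [in_dim] + [hidden] * n_layers
    blocks = []
    for d_in, d_out in zip(dims[:-1], dims[1:]):
        blocks += [nn.Linear(d_in, d_out), act]
    blocks.append(nn.Linear(dims[-1], out_dim))
    return nn.Sequential(*blocks)
```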
##### 2.1 Pixel MSE Loss


##### 2.2 Pixel SSIM


##### 2.3 Gradient MSE Loss


##### 2.4 Laplacian MSE Loss


##### 2.5 Time Derivative MSE Loss


##### Discussion of activation function experiments
For both the pixel loss and SSIM (Structural Similarity Index Measure), the best performing periodic activation function greatly outperforms the best performing ELU activation function. This is in line with the results of the SIREN paper [3], which introduced periodic activation functions precisely so that implicit networks learn to generate sharp images close to the ground truth.
#### 3. Loss Weight Combination
We ran 100 experiments, each with a different, randomly generated weight combination for weighting the different losses during training. Please refer to our presentation for the corresponding graphs, and/or run the submitted code to generate these graphs yourself.
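The exact sampling scheme is defined in the notebook; purely as an illustration, a generator that draws each of the four weights from a small set of magnitudes (matching those reported below) could look like this:
```
import random

# Candidate per-term weights; the notebook's actual sampling scheme may
# differ. This set merely matches the magnitudes reported below.
CANDIDATES = [0.0, 0.001, 0.01, 0.1, 1.0]

def sample_weight_combinations(n=100, seed=0):
    """Draw n random (pixel, gradient, laplacian, dt) weight tuples."""
    rng = random.Random(seed)
    return [tuple(rng.choice(CANDIDATES) for _ in range(4)) for _ in range(n)]
```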
We report the four best performing weight combinations in terms of pixel loss below:
- Lowest: (0.1, 0.0, 0.01, 0.0)
- 2nd lowest: (1.0, 0.01, 0.001, 0.0)
- 3rd lowest: (0.01, 0.001, 0.1, 0.0)
- 4th lowest: (1.0, 0.001, 0.01, 0.001)
These same weights performed as follows on the gradient loss:
- Lowest pixel loss: This set of weights is part of the worst performing group of weight sets with regard to the gradient loss.
- 2nd lowest pixel loss: Best performing group.
- 3rd lowest pixel loss: Best performing group.
- 4th lowest pixel loss: Worst performing group.
In terms of pixel loss, three of the four best performing weight combinations weighted the pixel loss most heavily. They mostly learned from the loss on the pixel values, so it is intuitive that they perform well on the pixel loss. However, the combination that performed best on the pixel loss placed in the worst-performing group on the gradient loss. A set of weights that works well for the pixel loss therefore does not automatically yield a low gradient loss as well.
In addition, the combination that performed third best on the pixel loss actually weighted the Laplacian most, not the pixel loss. Weighting the Laplacian most heavily also allowed it to perform well on the gradients (it is in the best performing group of weight combinations).
Overall, the set (1.0, 0.01, 0.001, 0.0), which gives the second-best pixel loss, also placed in the best-performing (lowest) group on the gradient loss, making it the best performing set of weights overall. The resulting video for this run can be seen below. The images are organized by column from left to right (pixel, gradient, Laplacian, time derivative), with the ground truth in the top row and the run's output in the bottom row.

## Conclusion
Through our experiments, we found the following configurations to give the best overall performance.
1. Best Activation Function
- *Periodic/SIREN (sin)*
2. Best Learning Rate
- Lowest uniform (1e-7) in terms of pixel, gradient, and Laplacian losses
- Highest uniform (1e-4) in terms of time derivative loss
- Exponential decay in terms of pixel SSIM
3. Best Weighted MSE Loss Combination (Pixel, Gradient, Laplacian, Time Derivative)
- (1.0, 0.01, 0.001, 0.0)
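Putting these together, a compact summary of the best-overall settings reported above (a hedged sketch; the authoritative configuration lives in FinalProject.ipynb):
```
# Summary of the best-performing settings found in our experiments.
best_config = {
    "activation": "periodic",                 # SIREN-style sin activation
    "learning_rate": 1e-7,                    # best uniform LR for pixel/gradient/Laplacian losses
    "lr_scheduler": "exponential_decay",      # best choice when judged by pixel SSIM
    "loss_weights": (1.0, 0.01, 0.001, 0.0),  # (pixel, gradient, laplacian, time derivative)
}
```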
## References
[1] https://www.cns.nyu.edu/pub/eero/wang03-reprint.pdf
[2] https://medium.com/srm-mic/all-about-structural-similarity-index-ssim-theory-code-in-pytorch-6551b455541e
[3] https://arxiv.org/abs/2006.09661
[4] https://arxiv.org/abs/1812.02822