# Encoding Equivariance
## On the Loss Function
We have an issue with our proposed loss function:
\begin{align*}
L(\theta) := \frac{1}{TN}\sum_{t=1}^T \sum_{i=1}^N \frac{1}{\left\Vert \mathbf{f}_t^{(i)} \right\Vert_2}\left\Vert \phi_{\theta}\left( \bigcup_{j \in \mathcal{N}_t^{(i)}} \mathbf{x}_t^{(j)} - \mathbf{x}_t^{(i)} \right) - \mathbf{f}_t^{(i)}\right\Vert_2.
\end{align*}
The training loss oscillates and does not converge.
Note that the loss landscape induced by $L$ is not equivalent to that of the unnormalized mean error: the per-sample weight $1/\left\Vert \mathbf{f}_t^{(i)} \right\Vert_2$ lets samples with near-zero force magnitudes dominate the loss and its gradients, which may explain the oscillation. For comparison, the unnormalized mean error is
\begin{align*}
\tilde{L}(\theta) :=\frac{1}{TN} \sum_{t=1}^T \sum_{i=1}^N \left\Vert \phi_{\theta}\left( \bigcup_{j \in \mathcal{N}_t^{(i)}} \mathbf{x}_t^{(j)} - \mathbf{x}_t^{(i)} \right) - \mathbf{f}_t^{(i)}\right\Vert_2.
\end{align*}
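For concreteness, here is a minimal PyTorch sketch of both objectives (the tensor names `pred` and `target`, the flattened `(T·N, 3)` shape, and the `eps` guard against division by zero are assumptions):

```python
import torch

def normalized_loss(pred, target, eps=1e-8):
    """L: Euclidean error scaled by the inverse magnitude of each target force."""
    err = torch.linalg.vector_norm(pred - target, dim=-1)             # ||phi(.) - f||_2
    scale = torch.linalg.vector_norm(target, dim=-1).clamp_min(eps)   # ||f||_2
    return (err / scale).mean()

def unnormalized_loss(pred, target):
    """L-tilde: plain mean Euclidean error, with no per-sample scaling."""
    return torch.linalg.vector_norm(pred - target, dim=-1).mean()
```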
## Force Prediction Model and Spectral Normalization
Recall that we are predicting forces using an MLP: $$\hat{\mathbf{f}}_{\text{ext}}^{(i)} = f_{\Theta}(\mathbf{x}^{(j)} - \mathbf{x}^{(i)}) = \mathbf{W}^L\sigma\left(\mathbf{W}^{L-1}\sigma\left( \cdots \sigma\left(\mathbf{W}^1(\mathbf{x}^{(j)} - \mathbf{x}^{(i)}) + \mathbf{b}^1\right) \cdots\right) + \mathbf{b}^{L-1} \right) + \mathbf{b}^L.$$ A priori, we make no assumptions on the model weights.
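As a point of reference, a minimal PyTorch sketch of such an MLP (the depth, hidden width, and ReLU activation are assumptions, not choices fixed by the formula above):

```python
import torch.nn as nn

def make_force_mlp(in_dim=3, hidden=64, out_dim=3, n_hidden=3):
    """MLP mapping a relative position x^(j) - x^(i) to a predicted external force."""
    layers, width = [], in_dim
    for _ in range(n_hidden):
        layers += [nn.Linear(width, hidden), nn.ReLU()]
        width = hidden
    layers.append(nn.Linear(width, out_dim))  # final affine layer: W^L (.) + b^L
    return nn.Sequential(*layers)
```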
In Neural-Swarm, Shi, Hönig, Yue, and Chung impose the constraint that for each layer weight matrix $\mathbf{W}^{\ell}$, $1 \leq \ell \leq L$, we have $\sigma_{\text{max}}(\mathbf{W}^{\ell}) \leq \gamma$, where $\sigma_{\text{max}}(\mathbf{W}^{\ell})$ denotes the maximum singular value of $\mathbf{W}^{\ell}$. Together with a 1-Lipschitz activation $\sigma$ (e.g. ReLU), these constraints imply $\text{Lip}(f_{\Theta}) \leq \gamma^L$. In other words, we are placing an upper limit on how fast the output of our neural network can change with its input.
Why is spectral normalization necessary? It mitigates overfitting to our training data. As $\gamma \to \infty$, we allow all possible functions belonging to our hypothesis class; as $\gamma \to 0$, we allow only constant functions. In this sense, $\gamma$ controls the trade-off between expressivity and smoothness of the learned force model.
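One way to enforce $\sigma_{\text{max}}(\mathbf{W}^{\ell}) \leq \gamma$ in practice is to project the weights back onto the constraint set after every optimizer step. The sketch below is an assumption about the implementation (Neural-Swarm's actual procedure may differ, e.g. by using a cheaper power-iteration estimate instead of an exact SVD):

```python
import torch

@torch.no_grad()
def clip_spectral_norm(model, gamma=2.0):
    """Rescale every Linear weight so that its largest singular value is at most gamma."""
    for module in model.modules():
        if isinstance(module, torch.nn.Linear):
            sigma_max = torch.linalg.svdvals(module.weight)[0]  # largest singular value
            if sigma_max > gamma:
                module.weight.mul_(gamma / sigma_max)
```

Calling `clip_spectral_norm(model, gamma)` after each `optimizer.step()` keeps every layer inside the constraint set, so the $\gamma^L$ Lipschitz bound holds throughout training.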



### Force Prediction Model, $\gamma = +\infty$

### Force Prediction Model, $\gamma = 3$

### Force Prediction Model, $\gamma = 2$

## Rotational Equivariance
Let $G = \text{SO}(3)$ be the three-dimensional rotation group in $\mathbb{R}^3$, and let $H \leq G$ be the subgroup of $G$ consisting of the rotations that fix the $d$ coordinate of every $v \in \mathbb{R}^3$. In other words, writing vectors in the order $(n, e, d)$, we have \begin{align*}
[n', e', d']^T = h([n, e, d]^T) \implies d' = d, \quad \forall h \in H.
\end{align*} It is easy to verify that $H$ is indeed a subgroup of $G$. Moreover, $H \cong G' = \text{SO}(2)$, meaning that the subgroup $H$ is naturally isomorphic to the two-dimensional rotation group.
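Concretely, every $h \in H$ is a rotation about the $d$-axis,
\begin{align*}
h_{\alpha} = \begin{bmatrix} \cos\alpha & -\sin\alpha & 0 \\ \sin\alpha & \cos\alpha & 0 \\ 0 & 0 & 1 \end{bmatrix}, \qquad \alpha \in [0, 2\pi),
\end{align*}
and the map $h_{\alpha} \mapsto \begin{bmatrix} \cos\alpha & -\sin\alpha \\ \sin\alpha & \cos\alpha \end{bmatrix}$ realizes the isomorphism $H \cong \text{SO}(2)$.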
From a geometric perspective, the axis of any rotation in $\mathbb{R}^3$ can be described by an azimuth angle $\varphi$ and a polar angle $\theta$ (together with the rotation angle about that axis); $H$ consists of exactly those rotations whose axis is the $d$-axis.
Now, consider the group $H$ acting on the set $\mathcal{X} = \mathbb{R}^3$. We would like our neural network $f_{\Theta}$ to satisfy the following equivariance property with respect to this group action.
**Def**: A neural network $f_{\Theta}$ is equivariant with respect to the group action of $H$ on $\mathcal{X}$ if it satisfies
\begin{align*}
h(f_{\Theta}(\Delta \mathbf{p}, \Delta\mathbf{v})) =f_{\Theta}(h(\Delta \mathbf{p}), h(\Delta\mathbf{v})), \quad \forall h \in H.
\end{align*}
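The property can also be checked numerically on random inputs; a minimal sketch (the helper names `rot_d` and `check_equivariance` are illustrative, and `f` stands for any candidate network mapping $(\Delta \mathbf{p}, \Delta \mathbf{v})$ to a force in $\mathbb{R}^3$):

```python
import math
import random

import torch

def rot_d(alpha):
    """Rotation by angle alpha about the d-axis (an element of H), acting on [n, e, d]."""
    c, s = math.cos(alpha), math.sin(alpha)
    return torch.tensor([[c, -s, 0.0],
                         [s,  c, 0.0],
                         [0.0, 0.0, 1.0]])

def check_equivariance(f, n_trials=100, atol=1e-5):
    """Empirically test h(f(dp, dv)) == f(h dp, h dv) for random inputs and random h in H."""
    for _ in range(n_trials):
        dp, dv = torch.randn(3), torch.randn(3)
        h = rot_d(random.uniform(0.0, 2.0 * math.pi))
        if not torch.allclose(h @ f(dp, dv), f(h @ dp, h @ dv), atol=atol):
            return False
    return True
```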
This desired equivariance property is visualized below:

Now, how do we do this in practice?
Suppose $\theta \in [0, \pi]$ is the angle between $\Delta \mathbf{p}_{n,e}, \Delta \mathbf{v}_{n,e} \in \mathbb{R}^2$, and let $r$ be the magnitude of $\Delta \mathbf{p}_{n,e}$ (i.e. the distance between the two drones in the 'n'-'e' plane).
Our neural network takes the form
\begin{align*}
&f_{\Theta}(\Delta \mathbf{p}, \Delta \mathbf{v}) = [F_{\Theta}(\Delta \mathbf{p}, \Delta \mathbf{v}), f_{\Theta}^{(2)}(r, \cos(\theta), \Delta \mathbf{p}_d, \Delta \mathbf{v}_d, \Vert \Delta \mathbf{v}_{n,e} \Vert)]^T,\\
&F_{\Theta}(\Delta \mathbf{p}, \Delta \mathbf{v}) = f_{\Theta}^{(1)}(r, \cos(\theta), \Delta \mathbf{p}_d, \Delta \mathbf{v}_d, \Vert \Delta \mathbf{v}_{n,e} \Vert) \odot [\cos(\varphi), \sin(\varphi)]^T.
\end{align*}
Here $f_{\Theta}^{(1)}: \Omega \rightarrow \mathbb{R}^2$ and $f_{\Theta}^{(2)}: \Omega \rightarrow \mathbb{R}$, with $\Omega = [0, +\infty) \times [-1, 1] \times \mathbb{R} \times \mathbb{R} \times [0, +\infty)$, are two separate neural networks, $\varphi \in [0, 2\pi)$ is the angle formed by $\Delta \mathbf{p}_{n,e}$ and the north axis, and $\odot$ denotes the element-wise product of two vectors.
**Theorem**: The neural network $f_{\Theta}(\Delta \mathbf{p}, \Delta \mathbf{v})$ is equivariant with respect to the group action of $H$ on $\mathcal{X} = \mathbb{R}^3$.
**Note**: The 'n' and 'e' components of the predicted force vector are always in the direction of the relative position vector in the 'n'-'e' plane (as depicted in the plot above). We can relax this assumption by having our neural network also predict a tangential component (thank you Matteo and Ryan for this contribution).
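As a concrete reference, here is a minimal sketch of this construction (simplified so that, per the note above, the 'n'-'e' output is a single learned magnitude along $[\cos(\varphi), \sin(\varphi)]^T$; the class name, hidden width, and the two small MLP heads are assumptions):

```python
import torch
import torch.nn as nn

class EquivariantForceNet(nn.Module):
    """Predict [f_n, f_e, f_d] from relative position and velocity (ordered n, e, d)."""

    def __init__(self, hidden=64):
        super().__init__()
        # Both heads see only H-invariant features: r, cos(theta), dp_d, dv_d, ||dv_ne||.
        self.head_ne = nn.Sequential(nn.Linear(5, hidden), nn.ReLU(), nn.Linear(hidden, 1))
        self.head_d = nn.Sequential(nn.Linear(5, hidden), nn.ReLU(), nn.Linear(hidden, 1))

    def forward(self, dp, dv, eps=1e-8):
        dp_ne, dp_d = dp[:2], dp[2]
        dv_ne, dv_d = dv[:2], dv[2]
        r = torch.linalg.vector_norm(dp_ne)                    # distance in the n-e plane
        v = torch.linalg.vector_norm(dv_ne)                    # ||dv_ne||
        cos_theta = torch.dot(dp_ne, dv_ne) / (r * v + eps)    # angle between dp_ne and dv_ne
        feats = torch.stack([r, cos_theta, dp_d, dv_d, v])
        unit = dp_ne / (r + eps)                               # [cos(phi), sin(phi)]
        f_ne = self.head_ne(feats) * unit                      # planar force along dp_ne
        f_d = self.head_d(feats)                               # invariant 'd' component
        return torch.cat([f_ne, f_d])
```

The `check_equivariance` sketch above can serve as a quick sanity check, e.g. `check_equivariance(EquivariantForceNet())`.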
### Equivariant Neural Network on Stage 0 Dataset
**Note**: I have not yet passed the magnitude of the relative velocity vector, $\Vert \Delta \mathbf{v}_{n,e} \Vert$, to the neural network (still to do).
Non-equivariant Network, Spectral Normalization, $\gamma = 2$:
Equivariant Network, Spectral Normalization, $\gamma = 2$:

Why is spectral normalization necessary?
Let's zoom out!
$\gamma = 2$:

$\gamma = 3$:

$\gamma = + \infty$:

How does equivariance affect our force predictions in the 'd' direction?
**At a larger 'd' coordinate**

**Drone moving down through 'n'-'e' plane**

**Drone moving up through 'n'-'e' plane**

## Other Developments
Modeling thrust as a function of throttle and voltage:

\begin{align*}
\text{Thrust} = \mathbf{A}\,[V, V^2, V^3]^\intercal + \mathbf{B}\,[t, t^2, t^3]^\intercal + K,
\end{align*}
where $V$ is the voltage, $t$ is the throttle, $\mathbf{A}, \mathbf{B} \in \mathbb{R}^{1 \times 3}$ are coefficient row vectors, and $K$ is a constant offset.
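Given logged samples of voltage, throttle, and measured thrust, the coefficients can be fit by ordinary least squares; a minimal NumPy sketch (the function name and the way the data arrays are provided are assumptions):

```python
import numpy as np

def fit_thrust_model(V, t, thrust):
    """Fit Thrust = A @ [V, V^2, V^3] + B @ [t, t^2, t^3] + K by least squares.

    V, t, thrust: 1-D arrays of logged voltage, throttle, and measured thrust.
    Returns (A, B, K) with A, B of shape (3,) and K a scalar offset.
    """
    X = np.column_stack([V, V**2, V**3, t, t**2, t**3, np.ones_like(V)])
    coef, *_ = np.linalg.lstsq(X, thrust, rcond=None)
    return coef[:3], coef[3:6], coef[6]
```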