# Characterization of invariant and equivariant linear operators

written by [@marc_lelarge](https://twitter.com/marc_lelarge)

For an order-$k$ tensor $T\in \mathbb{R}^{n^k}$, we define for $\sigma \in \mathcal{S}_n$: \begin{eqnarray*} \def\R{{\mathbb R}} \def\cS{{\mathcal S}} (\sigma \star T)_{\sigma(i_1),\dots, \sigma(i_k)} = T_{i_1,\dots, i_k}. \end{eqnarray*} Note that a permutation $\sigma\in \cS_n$ can act through the operator $\star$ on any tensor $T\in \R^{n^k}$ for any value of $k\geq 1$ (i.e. what matters is only that the same $n$ appears on both sides). A graph $G$ can be represented as a $2$-tensor by its adjacency matrix $A\in \R^{n^2}$. If we denote by $P\in \{0,1\}^{n\times n}$ the permutation matrix of $\sigma$, then we have $\sigma \star A = PAP^T$. Two graphs $G_1$ and $G_2$ are said to be isomorphic if there is a permutation $\sigma$ such that $A_1=\sigma \star A_2$, where $A_1$ and $A_2$ are the adjacency matrices of $G_1$ and $G_2$. A function $f:\R^{n^k}\to \R$ is said to be invariant if $f(\sigma \star T) = f(T)$ for every permutation $\sigma\in \cS_n$. For $k = 1$, any symmetric function is invariant, for example $f(x_1,\dots, x_n) = \prod_{i<j}|x_i-x_j|$, $f(x_1,\dots, x_n) = \max(x_1,\dots, x_n)$, $f(x_1,\dots, x_n) = x_1^2+\dots +x_n^2$, and so on. A function $f:\R^{n^k} \to \R^{n^\ell}$ is said to be equivariant if $f(\sigma \star T) = \sigma \star f(T)$ for every permutation $\sigma\in \cS_n$. In [Invariant and Equivariant Graph Networks](https://arxiv.org/abs/1812.09902), all such linear functions are characterized and the following theorem is proved:

### Theorem (Haggai Maron, Heli Ben-Hamu, Nadav Shamir, Yaron Lipman)

*The linear invariant functions $f:\R^{n^k}\to \R$ live in a vector space of dimension $b(k)$ and the linear equivariant functions $f:\R^{n^k} \to \R^{n^\ell}$ live in a vector space of dimension $b(k+\ell)$, where $b(i)$ is the $i$-th Bell number.*

In general, $b(i)$ is the number of partitions of a set of size $i$.
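The dimension count in the theorem can be checked numerically for small cases: a linear invariant function is constant on each orbit of $\cS_n$ acting on the multi-indices $[n]^k$, orbits correspond to equality patterns of indices, and for $n\geq k$ every partition of $\{1,\dots,k\}$ is realized, so the number of orbits is exactly $b(k)$. A minimal sketch in plain Python (the function name `num_orbits` is ours):

```python
from itertools import permutations, product

def num_orbits(n, k):
    """Count orbits of S_n acting diagonally on [n]^k.

    For n >= k this equals b(k), the dimension of the space of
    linear invariant functions on order-k tensors in the theorem.
    """
    seen, orbits = set(), 0
    for idx in product(range(n), repeat=k):
        if idx in seen:
            continue
        orbits += 1
        # mark the whole orbit of idx as visited
        for sigma in permutations(range(n)):
            seen.add(tuple(sigma[i] for i in idx))
    return orbits

print([num_orbits(4, k) for k in (1, 2, 3)])  # [1, 2, 5] = b(1), b(2), b(3)
```

This brute-force enumeration is only feasible for tiny $n$ and $k$, but it confirms that the count does not grow with $n$ once $n\geq k$.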
Hence we have $b(0)=b(1)=1$, $b(2)=2$ as the set $\{1,2\}$ can be partitioned in 2 distinct ways: $\{\{1\},\{2\}\}$ and $\{\{1,2\}\}$; and $b(3)=5$ as the set $\{1,2,3\}$ can be partitioned in 5 distinct ways: $\{\{1\},\{2\},\{3\}\}$; $\{\{1\},\{2,3\}\}$; $\{\{2\},\{1,3\}\}$; $\{\{3\}, \{1,2\}\}$ and $\{\{1,2,3\}\}$. More generally, we have \begin{eqnarray*} b(i+1) = \sum_{k=0}^i {i \choose k} b(k). \end{eqnarray*} This recursion follows by observing that, starting from an arbitrary partition of $i+1$ items, removing the block containing the first item leaves a partition of a smaller set of $k$ items, for some number $k$ that may range from $0$ to $i$. There are ${i \choose k}$ choices for the $k$ items that remain after this block is removed, and $b(k)$ ways to partition them. Here are the 52 partitions of a set with 5 elements: ![The 52 partitions of a set with 5 elements](https://upload.wikimedia.org/wikipedia/commons/thumb/1/19/Set_partitions_5%3B_circles.svg/199px-Set_partitions_5%3B_circles.svg.png) The basis of the vector space of linear invariant functions $f:\R^{n^k}\to \R$ is then easy to describe: given a partition $\pi$ of $\{1,\dots, k\}$, we define $m_\pi$, the set of multi-indices $(i_1,\dots ,i_k)\in [n]^k$, by \begin{eqnarray*} (i_1,\dots, i_k) \in m_\pi \Leftrightarrow \big(\forall j,\ell,\ i_j=i_{\ell} \Leftrightarrow \mbox{there exists } A\in \pi \mbox{ such that } j,\ell\in A\big). \end{eqnarray*} Then we define a basis of the vector space of linear invariant functions $f:\R^{n^k}\to \R$ by the family $(f_\pi)_\pi$ where $\pi$ ranges over the partitions of $\{1,\dots ,k\}$ and \begin{eqnarray*} f_\pi(T) = \sum_{(i_1,\dots, i_k) \in m_\pi}T_{i_1,\dots , i_k}. \end{eqnarray*} For example, a linear invariant function $f:\R^n\to\R$ is of the form $f(T) = \alpha \sum_{i}T_i$ for some $\alpha\in\R$, since there is only one partition of a set of size 1. For a set of size 2, $\{i_1,i_2\}$, there are two partitions: $\{\{i_1\},\{i_2\}\}$ and $\{\{i_1,i_2\}\}$.
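The recursion for the Bell numbers given above translates directly into code; a quick sketch in plain Python (the helper name `bell` is ours):

```python
from math import comb  # binomial coefficient C(i, k)

def bell(m):
    """First m+1 Bell numbers b(0), ..., b(m), computed with the
    recursion b(i+1) = sum_{k=0}^{i} C(i, k) * b(k)."""
    b = [1]  # b(0) = 1: the empty set has exactly one partition
    for i in range(m):
        b.append(sum(comb(i, k) * b[k] for k in range(i + 1)))
    return b

print(bell(5))  # [1, 1, 2, 5, 15, 52]
```

The last value matches the 52 partitions of a 5-element set pictured above.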
As a result, any linear invariant function $f:\R^{n^2}\to\R$ is of the form $f(T) =\alpha \sum_{i_1\neq i_2} T_{i_1,i_2}+\beta\sum_{i_1=i_2}T_{i_1,i_2}$, where $\alpha,\beta$ are real numbers. The basis of the vector space of linear equivariant functions $f:\R^{n^k}\to \R^{n^\ell}$ is slightly more complex. First, for a partition $\pi$ of $\{1,\dots, k+\ell\}$ and multi-indices $(i_1,\dots ,i_k)\in [n]^k$ and $(j_1,\dots ,j_\ell)\in [n]^\ell$, we denote: \begin{eqnarray*} (i_1,\dots ,i_k) \in m_\pi(j_1,\dots, j_\ell) \Leftrightarrow (i_1,\dots,i_k,j_1,\dots, j_\ell)\in m_{\pi}. \end{eqnarray*} Then we define a basis of the vector space of linear equivariant functions $f:\R^{n^k}\to \R^{n^\ell}$ by the family $(f_\pi)_\pi$ where $\pi$ ranges over the partitions of $\{1,\dots ,k+\ell\}$ and \begin{eqnarray*} f_\pi(T)_{(j_1,\dots,j_\ell)} = \sum_{(i_1,\dots, i_k) \in m_\pi(j_1,\dots,j_\ell)}T_{i_1,\dots , i_k}, \end{eqnarray*} with the convention that an empty sum is zero, i.e. $f_\pi(T)_{(j_1,\dots,j_\ell)}=0$ if no $(i_1,\dots ,i_k)\in [n]^k$ satisfies $(i_1,\dots ,i_k) \in m_\pi(j_1,\dots, j_\ell)$. Below, we give an illustration of the non-zero terms for $k=2$, $\ell=3$ and the partition $\pi=\{\{i_1\},\{i_2,j_3\},\{j_1,j_2\}\}$: ![](https://i.imgur.com/JyMeuGC.jpg) For example, consider the linear equivariant functions $f:\R^{n^2}\to \R^n$. For the partition $\pi = \{\{1\},\{2,3\}\}$, we have for all $j\in [n]$, \begin{eqnarray*} f_\pi(T)_j = \sum_{i_1\neq j,\ i_2=j}T_{i_1,i_2} = \sum_{i\neq j} T_{i,j}, \end{eqnarray*} for $\pi = \{\{1,3\},\{2\}\}$, we have for all $j\in [n]$, \begin{eqnarray*} f_\pi(T)_j = \sum_{i_1= j,\ i_2\neq j}T_{i_1,i_2} = \sum_{i\neq j} T_{j,i}, \end{eqnarray*} for $\pi=\{\{1\},\{2\},\{3\}\}$, we have for all $j\in [n]$, \begin{eqnarray*} f_\pi(T)_j = \sum_{i_1\neq j,\ i_2\neq i_1,\ i_2\neq j}T_{i_1,i_2} , \end{eqnarray*} for $\pi=\{\{1,2\},\{3\}\}$, we have for all $j\in [n]$, \begin{eqnarray*} f_\pi(T)_j = \sum_{i_1=i_2,\ i_1\neq j}T_{i_1,i_2} = \sum_{i\neq j}T_{i,i}, \end{eqnarray*} and for $\pi = \{\{1,2,3\}\}$, we have for all $j\in [n]$, $f_\pi(T)_j = T_{j,j}$.
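These five basis maps $f:\R^{n^2}\to \R^n$ can be implemented in a few lines, and the equivariance property $f(PAP^T)=Pf(A)$ checked on random inputs; a sketch with numpy (the function and variable names are ours):

```python
import numpy as np

def equivariant_basis(T):
    """The five basis operators f_pi : R^{n^2} -> R^n, one per
    partition of a 3-element set, as derived above."""
    d = np.diag(T)
    row = T.sum(axis=1) - d        # sum_{i != j} T_{j,i}   pi = {{1,3},{2}}
    col = T.sum(axis=0) - d        # sum_{i != j} T_{i,j}   pi = {{1},{2,3}}
    off = T.sum() - np.trace(T)    # sum over all i_1 != i_2
    return [
        row,
        col,
        off - row - col,           # all three indices distinct: pi = {{1},{2},{3}}
        np.trace(T) - d,           # sum_{i != j} T_{i,i}:       pi = {{1,2},{3}}
        d,                         # T_{j,j}:                    pi = {{1,2,3}}
    ]

# equivariance check: f(P T P^T) = P f(T) for a random permutation
rng = np.random.default_rng(0)
n = 6
T = rng.normal(size=(n, n))
P = np.eye(n)[rng.permutation(n)]  # random permutation matrix
for f_T, f_PT in zip(equivariant_basis(T), equivariant_basis(P @ T @ P.T)):
    assert np.allclose(P @ f_T, f_PT)
```

Any linear combination of these five outputs is again equivariant, which is exactly the parametrization given next.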
Finally, any linear equivariant function $f:\R^{n^2}\to \R^n$ can be written as: \begin{eqnarray*} \forall j\in [n],\quad f(T)_j = \alpha_1\sum_{i_2\neq j} T_{j,i_2}+\alpha_2\sum_{i_1\neq j} T_{i_1,j}+\alpha_3 \sum_{j\neq i_1, i_2\neq i_1, i_2\neq j}T_{i_1,i_2}+\alpha_4\sum_{i\neq j}T_{i,i}+\alpha_5 T_{j,j}, \end{eqnarray*} for some parameters $\alpha_1,\dots, \alpha_5$. An important corollary of the Theorem is that the dimension of the spaces of linear invariant/equivariant functions does not depend on the number of nodes $n$ but only on the orders of the input and output tensors. For example, the linear equivariant function given above can be applied to any graph (i.e. its adjacency matrix) to produce a vector living on the nodes, whatever the size of the graph.

###### tags: `public` `reading` `GNN` `Equivariant` `Invariant`