# Model design

## Partitioning

### Community detection algorithms

Popular methods we could use:

- Louvain
- Infomap

For each method, we seek to obtain a clustering matrix $P$ of size $N \times N$, with $P_{ij} = 1$ if nodes $i$ and $j$ are assigned to the same cluster and zero otherwise (a sketch of building $P$ from a Louvain partition is given at the end of this section).

### Differentiable assignments

Inspired by DiffPool, we can use a differentiable cluster assignment trained in an end-to-end fashion with the rest of the model. This requires a hyperparameter $C$ that sets the maximal number of clusters. The assignment then proceeds as follows (a sketch follows the list below):

1. The graph features are given as input to a neural network architecture of our choice (for the whole model to be equivariant, this should be one of the two architectures defined in the next section with group $G'$).
2. The output of this neural network should be a vector of size $C$ for each node, hence a matrix of size $N \times C$ that we call $Y$.
3. We obtain the assignment matrix $S = \text{Softmax}(Y)$, where the softmax is applied to each row (node).
4. We obtain a "soft" clustering matrix $P = SS^T$. $P_{ij}$ measures the extent to which nodes $i$ and $j$ are assigned to the same cluster. If the cluster assignments were hard, the entries would be exactly zero or one.
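A minimal PyTorch sketch of steps 2–4, where the $G'$-equivariant network of step 1 is stood in for by a single linear layer (all sizes and names here are placeholders, not part of the design):

```python
# Sketch of the differentiable assignment, steps 2-4.
import torch

N, C, F = 100, 8, 16              # nodes, max clusters, feature size (placeholders)
gnn = torch.nn.Linear(F, C)       # stand-in for the G'-equivariant network of step 1

X = torch.randn(N, F)             # node features
Y = gnn(X)                        # N x C score matrix
S = torch.softmax(Y, dim=1)       # row-wise softmax: assignment matrix S
P = S @ S.T                       # soft clustering matrix, N x N
```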
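For the non-differentiable route of the community detection subsection, here is a minimal sketch of building the hard clustering matrix $P$ from a Louvain partition with networkx (`louvain_communities` requires networkx ≥ 2.8; Infomap would need an external package). The function name is illustrative:

```python
# Sketch: hard clustering matrix P from a Louvain partition.
import networkx as nx
import numpy as np

def clustering_matrix(G: nx.Graph) -> np.ndarray:
    """P[i, j] = 1 if nodes i and j share a community, 0 otherwise."""
    communities = nx.community.louvain_communities(G, seed=0)
    # Map each node to the index of its community.
    label = {v: k for k, comm in enumerate(communities) for v in comm}
    labels = np.array([label[v] for v in G.nodes()])
    # Outer comparison of community labels gives the 0/1 matrix P.
    return (labels[:, None] == labels[None, :]).astype(np.float64)

P = clustering_matrix(nx.karate_club_graph())
```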
## Equivariant layer

From the clustering matrix $P$, we have defined subsets of the set of nodes. We will define the layer to be equivariant with respect to the group $G$ of permutations of the subsets together with permutations of nodes within each subset, in contrast to the group $G'$ of all node permutations used by standard GNNs.

Building on the Incidence network model, we could restrict ourselves to two types of layers:

1. A layer taking a full node-node incidence tensor (the adjacency matrix) as input. There is then the option to produce a sparse layer with the incidence matrix. Requiring equivariance to $G$, we get 60 parameters (see the unpublished Orbit network model); comparatively, the layer equivariant to $G'$ has 15 parameters.
   - There is a further restriction for symmetric adjacency matrices, which is the case for each of our datasets. With it, the number of parameters is 33 for $G$ and 9 for $G'$.
2. A simpler alternative would be to take only the set of nodes as input (thus having no features associated to edges) and to sparsify the model with the adjacency matrix. This would be reminiscent of the Deep Sets model and simple GNNs. Requiring equivariance to $G$ gives 3 parameters; equivariance to $G'$ gives 2. The 3 parameters are each associated with one of the following matrices (this layer is sketched after the Regularization section):
   - Diagonal matrix: $I$
   - Sparsified cluster assignment matrix: $P \circ A$
   - Matrix of ones: $\mathbf{1} \circ A$

**Option 2 seems to be the most realistic for now.**

We could also look at node-edge incidence tensors; this could be an interesting future extension.

## Invariant layer

The invariant layer is obtained by a pooling operation over all the features. This could be either:

- Sum
- Mean
- Maximum

On top of that, we can add a neural network (see the readout in the layer sketch below).

## Regularization

Taking inspiration from the DiffPool model, when training with differentiable cluster assignments we can regularize the loss function with two terms (both are sketched after this list):

* Link prediction objective: a term $L_{\text{LP}} = \Vert A - P \Vert_F$, where $\Vert \cdot \Vert_F$ is the Frobenius norm, is added to the loss. This should push the cluster assignment away from spurious local minima at the beginning of training by encouraging it to cluster nodes related by edges.
* Entropy regularization: a term $L_{\text{E}} = \tfrac{1}{n} \sum_{i=1}^n H(S_i)$, where $H$ is the entropy function and $S_i$ is the $i$-th row of $S$, is added to the loss. This should push the soft cluster assignments closer to hard assignments.
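Returning to Option 2 above, here is a minimal PyTorch sketch of the 3-parameter layer together with the invariant readout (sum pooling followed by an MLP). As a simplifying assumption, each of the 3 parameters is promoted to a learned linear map on the feature channels, as is common for such layers; class and argument names are illustrative:

```python
# Sketch of the Option 2 layer (3 parameters for group G) and invariant readout.
import torch
import torch.nn as nn

class ClusterEquivariantLayer(nn.Module):
    def __init__(self, f_in: int, f_out: int):
        super().__init__()
        self.w_id = nn.Linear(f_in, f_out)    # diagonal term:    I
        self.w_in = nn.Linear(f_in, f_out)    # within clusters:  P o A (Hadamard)
        self.w_all = nn.Linear(f_in, f_out)   # all edges:        1 o A = A

    def forward(self, X, A, P):
        # X: N x f_in node features; A, P: N x N matrices.
        return torch.relu(
            self.w_id(X) + self.w_in((P * A) @ X) + self.w_all(A @ X)
        )

class InvariantReadout(nn.Module):
    def __init__(self, f_in: int, f_out: int):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(f_in, f_in), nn.ReLU(),
                                 nn.Linear(f_in, f_out))

    def forward(self, X):
        # Sum pooling over nodes, then a neural network on top.
        return self.mlp(X.sum(dim=0))
```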
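A minimal sketch of both regularization terms, reusing $A$, $S$ and $P = SS^T$ from the assignment sketch above; the `eps` constant and the weighting of the terms in the total loss are assumptions:

```python
# Sketch of the two DiffPool-style auxiliary loss terms.
import torch

def link_prediction_loss(A, P):
    # L_LP = ||A - P||_F : encourages clustering nodes related by edges.
    return torch.linalg.norm(A - P)   # Frobenius norm for 2-D input

def entropy_loss(S, eps=1e-12):
    # L_E = (1/n) * sum_i H(S_i) : pushes rows of S toward hard assignments.
    H = -(S * torch.log(S + eps)).sum(dim=1)
    return H.mean()

# Assumed usage, with hypothetical weights lambda_lp and lambda_e:
# total = task_loss + lambda_lp * link_prediction_loss(A, P) \
#                   + lambda_e * entropy_loss(S)
```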