BIGCLAM - HackMD

# BIGCLAM Assume you have $N$ nodes and $K$ communities This algorithm requires pre-setting the number of clusters $K$ before hand. The basic approach is, let's assume we have a binary $N*K$ binary matrix denoting whether each node belongs to which community. Let's assume we have a probabilities vector $f$ where $f(x)$ denoting the probability that $2$ nodes are joined by an edge if they both belong to community with index $x$. The idea is, if $2$ nodes belong to the same community there's a probability that they are linked. Assume that $M(a)$ is the set of communities which node $a$ belongs to. Assume that $p(u,v)$ is the probability that an edge exists between $u,v$. $$p(u,v)= 1- \prod_{c \in M(u) \cap M(v) } (1-f(c))$$ Now the probability of generating the graph is $$\prod_{(u,v) \in E } p(u,v) \prod_{(u,v) \notin E } (1-p(u,v))$$ We can transform multiplication into summation of logs, anyway the problem with this approach is learning a binary matrix. ## BIGCLAM approach Assume having a weight matrix for each node. $v^i$ denotes the strength vector for node $i$. $v_k^i$ denotes the strength of the tie between node $i$ and community $k$. Assume that the probability that 2 nodes belong to the same community $k$ is: $$p_k(u,v) = 1 - exp(- v^i_k * u^i_k)$$ Notice that the stronger the ties the smaller the number inside the exponent reaching 0. By the same logic of our previous method we can write the probability of an edge: $$p(u,v) = 1 - \prod_{k} (1 - p_k(u,v))$$ $$p(u,v) = 1 - \prod_{k} (1 - (1 - exp(- v^i_k * u^i_k))$$ $$p(u,v) = 1 - \prod_{k} exp(- v^i_k * u^i_k)$$ Notice that we are multiplying exponentials so we can turn it into exponential of the dot product of the vectors $$p(u,v) = 1 - exp(- v^i \cdot u^i)$$ The learning goal is the same: $$\prod_{(u,v) \in E } p(u,v) \prod_{(u,v) \notin E } (1-p(u,v))$$ We can break it into sum of logarithms and use binary cross entropy for example and back propagation to learn the weight matrix.