Stochastic variational inference for low-rank stochastic block models
===
Kyunghyun Cho

## Motivation: inter-node connectivity based clustering

many conventional approaches to unsupervised vertex (node) clustering in a graph can be summarized as grouping nodes based on the overlaps in their neighbours. as a typical and representative example, we can use $k$-means clustering for vertex clustering of a graph $\mathcal{G}=(V, E)$ by representing each node using the following feature vector $v_i \in \mathbb{R}^{2|V|}$:
\begin{align}
v_i^j = \begin{cases}
1, & \text{if } (i,j) \in E \\
0, & \text{otherwise}
\end{cases}
\end{align}
for $j=1, \ldots, |V|$, and
\begin{align}
v_i^{|V|+j} = \begin{cases}
1, & \text{if } (j,i) \in E \\
0, & \text{otherwise}
\end{cases}
\end{align}
by running $k$-means clustering on this feature representation, two nodes that share a similar set of incoming and outgoing $1$-hop neighbours will likely belong to the same cluster. it is common practice to perform dimensionality reduction first, followed by $k$-means clustering or one of its variants. such an approach is desirable especially when the goal is to _coarsen_ a given graph by merging the set of vertices belonging to each cluster into a single vertex in a newly coarsened (or compressed) graph. in other words, this approach looks for nodes that are interchangeable.

we are however often interested in clustering nodes according to inter-cluster connectivity patterns rather than inter-node connectivity patterns. for instance, we can have two clusters that are bipartite (no intra-cluster edges) and one-to-one (each node in one cluster is connected to exactly one node in the other cluster). because no two vertices within the same cluster share any neighbours in the other cluster, the conventional approach would not group them into a single cluster.

in this note, we are interested in a simple approach to vertex clustering that focuses almost entirely on the inter-cluster connectivity rather than the inter-node connectivity. the proposed approach can be thought of as a special case of a stochastic block model (SBM) for a directed graph, such that it is readily applicable to a large graph. in the next section, i will describe this algorithm and its relationship to the SBM.

## Latent variable based vertex clustering

the proposed approach can be described at a high level as a latent-variable generative model of a directed edge. by sampling the existence $e_{i,j} \in \left\{0, 1\right\}$ of each directed edge, one pair at a time over all vertex pairs in $V \times V$, we end up with a graph instance $\mathcal{G}$. the probability of the directed edge $i\to j$ for a node pair $(i,j)$ depends not on the nodes themselves but on their latent cluster assignments, $z_i \in \left\{1, \ldots, K\right\}$ and $z_j \in \left\{1, \ldots, K\right\}$, respectively. in other words, for each node pair $(i,j)$, we first determine their latent clusters by drawing $z_i$ and $z_j$ from the prior distributions (which could simply be uniform over $\left\{1, \ldots, K\right\}$). we then stochastically decide whether there is an edge $i\to j$ by drawing a sample from $p(e_{i,j} | z_i, z_j)$, and also decide whether there is an edge $j \to i$ by drawing a sample from $p(e_{j,i} | z_j, z_i)$. this edge probability is parameterized with a small number of parameters. the goal is then to simultaneously infer the posterior distribution over $z_1, \ldots, z_{|V|}$ and estimate the edge-probability parameters given an observed graph $\mathcal{G}$.
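to make this generative story concrete, below is a minimal sketch of sampling a graph from such a model, assuming a uniform prior over clusters and an arbitrary $K \times K$ table `B` of edge probabilities; all names are hypothetical, and the specific low-rank parameterization of `B` is only introduced in the next section.

```python
import torch

def sample_graph(num_nodes, B, generator=None):
    """Sample a directed graph where B[k, k'] = p(e_{i,j} = 1 | z_i = k, z_j = k')."""
    K = B.shape[0]
    # draw a cluster assignment per node from a uniform prior over {1, ..., K}
    z = torch.randint(0, K, (num_nodes,), generator=generator)
    # the per-pair edge probability depends only on the endpoint clusters
    probs = B[z][:, z]                                  # (num_nodes, num_nodes)
    # sample every directed edge independently
    E = torch.bernoulli(probs, generator=generator)
    return z, E

# e.g., two clusters that only connect across, never within
B = torch.tensor([[0.0, 0.9],
                  [0.9, 0.0]])
z, E = sample_graph(6, B)
```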
in order to avoid the (potential) intractability of exact posterior inference, we can rely on stochastic gradient based variational inference to estimate a factorized approximate posterior and assign each node to the most likely cluster according to its own approximate posterior.

in this approach, the chance of having an edge from one node to another is entirely up to their latent clusters. you can almost think of this as inferring the type of each node and considering the connectivity at this type level. for instance, consider natural language text. instead of considering how surface-level tokens are connected (that is, placed next to each other), we can infer each token's part-of-speech tag (e.g., verb, noun, etc.) and consider the connectivity of these tags. this would allow us to more easily realize that, for instance, in english verbs often follow nouns, even when individual noun-verb (subject-verb) pairs are observed extremely sparsely. in the next section, we consider and describe one particular implementation of this approach in detail.

### Details

we will detail the algorithm and learning procedure here. we start by letting $\alpha_i \in \Lambda^{K-1}$ be a distribution over $K$ clusters for the $i$-th node in a graph $\mathcal{G}=(V, E)$. $\Lambda^{K-1}$ is the $(K-1)$-dimensional simplex such that
\begin{align}
&\alpha_i^k \geq 0,\text{ for all } k, \\
&\sum_{k=1}^K \alpha_i^k = 1.
\end{align}
in other words,
\begin{align}
q(z_i = k) = \alpha_i^k,
\end{align}
where $z_i \in \left\{1, \ldots, K\right\}$ is a cluster assignment variable. this distribution $q(z_i)$ will be the approximate posterior for each node $i$.

given a pair of $z_i$ and $z_j$, we can compute the directed edge probability as
\begin{align}
p(e_{i,j} =1 | z_i, z_j) = \sigma( u_s^{z_i} \cdot u_t^{z_j} + b ),
\end{align}
where
* $e_{i,j} \in \left\{0, 1\right\}$ indicates the existence of the directed edge from $i$ to $j$,
* $u_s^k \in \mathbb{R}^d$ is the source feature vector of the $k$-th cluster,
* $u_t^k \in \mathbb{R}^d$ is the target feature vector of the $k$-th cluster,
* $b \in \mathbb{R}$ is a bias capturing the overall sparsity of the graph $\mathcal{G}$, and
* $\sigma(a) = \frac{1}{1+\exp(-a)}$ is the sigmoid function.

we can then write down the objective function (the variational lower bound to the log-likelihood) as
\begin{align}
\mathcal{J}(u_s, u_t, b, \alpha) =& \sum_{i=1}^{|V|} \sum_{j=1}^{|V|} \mathbb{E}_{q(z_i)q(z_j)} \left[ e_{i,j} \log p(e_{i,j}=1|z_i, z_j) + (1-e_{i,j}) \log (1- p(e_{i,j}=1|z_i, z_j)) \right] \\
&+ \sum_{i=1}^{|V|} \mathcal{H}(q(z_i)) \\
=& \sum_{i=1}^{|V|} \sum_{j=1}^{|V|} \sum_{k=1}^K \sum_{k'=1}^K \alpha_i^k \alpha_j^{k'} \left( e_{i,j} \log p(e_{i,j}=1|k, k') + (1-e_{i,j}) \log (1- p(e_{i,j}=1|k, k')) \right) \\
&- \sum_{i=1}^{|V|} \sum_{k=1}^K \alpha_i^k \log \alpha_i^k,
\end{align}
where $\mathcal{H}$ is the entropy functional, $e_{i,j}$ is set to the observed existence of the directed edge $i\to j$, and $p(e_{i,j}=1|k,k')$ is shorthand for $p(e_{i,j}=1|z_i=k, z_j=k')$.

we realize that $p(e_{i,j} = 1 | k, k')$ does _not_ depend on $i$ nor $j$; it only depends on $k$ and $k'$. let us use $s_{k,k'}$ to denote this quantity.
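collecting $s_{k,k'}$ into a $K \times K$ matrix makes the whole objective computable with a handful of matrix products (the algebra behind this is spelled out right below). a minimal PyTorch sketch, where `U_s`, `U_t`, `b`, the posterior matrix `A` (whose $i$-th row is $\alpha_i$) and the adjacency matrix `E` are all hypothetical names:

```python
import torch

def elbo(U_s, U_t, b, A, E, eps=1e-10):
    """Variational lower bound computed with only (K, K) and (|V|, K) intermediates.

    U_s, U_t : (K, d) source/target cluster feature vectors
    b        : scalar bias
    A        : (|V|, K) approximate posteriors, A[i, k] = alpha_i^k
    E        : (|V|, |V|) observed binary adjacency matrix
    """
    S = torch.sigmoid(U_s @ U_t.T + b)      # S[k, k'] = s_{k, k'}
    pos = A.T @ (E @ A)                     # (K, K) expected count of present edges per cluster pair
    tot = torch.outer(A.sum(0), A.sum(0))   # (K, K) expected count of ordered node pairs per cluster pair
    neg = tot - pos                         # (K, K) expected count of absent edges per cluster pair
    loglik = (pos * torch.log(S + eps)).sum() + (neg * torch.log(1.0 - S + eps)).sum()
    entropy = -(A * torch.log(A + eps)).sum()
    return loglik + entropy
```

note that, beyond the adjacency matrix itself, nothing larger than $|V| \times K$ is ever materialized in this sketch.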
indeed, we can simplify the first term into
\begin{align}
&\sum_{k=1}^K \sum_{k'=1}^K \log s_{k, k'} \sum_{i=1}^{|V|} \sum_{j=1}^{|V|} \alpha_i^k \alpha_j^{k'} e_{i,j} + \sum_{k=1}^K \sum_{k'=1}^K \log (1-s_{k, k'}) \sum_{i=1}^{|V|} \sum_{j=1}^{|V|} \alpha_i^k \alpha_j^{k'} (1-e_{i,j}) \\
&= \sum_{k=1}^K \sum_{k'=1}^K \left( \log s_{k, k'} \sum_{i=1}^{|V|} \alpha_i^k \sum_{j=1}^{|V|} \alpha_j^{k'} e_{i,j} + \log (1-s_{k, k'}) \sum_{i=1}^{|V|} \alpha_i^k \sum_{j=1}^{|V|} \alpha_j^{k'} (1-e_{i,j}) \right),
\end{align}
which enables an efficient implementation based on matrix-vector operations.

for a large graph $\mathcal{G}$ with many nodes ($|V| \gg 0$), we can use minibatch stochastic gradient descent to maximize this objective w.r.t. $\alpha$, $u_s$, $u_t$ and $b$, all together. in this case, we sample a minibatch by sampling a random subgraph $\tilde{\mathcal{G}} \subset \mathcal{G}$. because the approximate posterior is factorized over the nodes, minibatching works without any sophisticated implementation techniques.

instead of $\alpha_i$, which requires us to take into account the constraints (non-negativity and sum-to-1), we can reparameterize it as
\begin{align}
\alpha_i^k = \frac{\exp(\beta_i^k)}{\sum_{k'=1}^K \exp(\beta_i^{k'})},
\end{align}
and perform optimization w.r.t. the unconstrained $\beta_i$. this last trick dramatically simplifies the implementation and allows us to rely almost entirely on the standard components of any standard deep learning library, such as PyTorch.

once optimization is over, we can either use the estimated approximate posterior $\alpha_i$ to determine the cluster assignment of each vertex $i$, e.g. by $\hat{z}_i = \arg\max_k \alpha_i^k$, or perform exact posterior inference, though the latter could be extremely costly.

### Relationship to a stochastic block model (SBM)

the proposed approach can be thought of as a stochastic block model (SBM) for a directed graph with the following modifications:

1. low-rank approximation to the block connectivity matrix $B \in \left[0,1\right]^{K \times K}$: the block connectivity matrix $B_{k, k'}$ in an SBM is approximated by $\sigma( u_s^{k} \cdot u_t^{k'} + b)$.
2. stochastic variational inference with the Adam optimizer: by reparameterizing both $B$ and the approximate posterior to admit gradient-based optimization, we use the latest stochastic optimization techniques from deep learning to seamlessly scale inference to a very large graph.
3. potential for future extensions: because we parameterize the block-block connectivity, there are many opportunities to extend the proposed approach in the future. for instance, we can contextualize the whole process by making the $u_s$ and $u_t$ vectors conditioned on the context.

## Simulation: clustering neurons

### Task and Data: FlyWire

as a demonstration, we use the fruit fly connectome data, called FlyWire, from https://codex.flywire.ai/. in particular, we use the connection data (https://codex.flywire.ai/api/download?data_product=connections&data_version=783&meta_data_only=1). from this connection data, we create a connectivity matrix of size 134,181x134,181 with 2,700,513 non-zero entries. if there are multiple synapses between two neurons, we simply collapse them into a single entry of $1$. in other words, we end up with a sparse, binary matrix of size 134,181x134,181. here, we take the visual neurons as the neurons of interest.
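as an illustration of this preprocessing step, here is a minimal sketch of building such a sparse, binary connectivity matrix from the downloaded connection table with pandas and scipy; the file name and the column names `pre_root_id` and `post_root_id` are assumptions about the download format and may need to be adjusted:

```python
import pandas as pd
import scipy.sparse as sp

# load the downloaded connection table (file and column names are assumed)
conn = pd.read_csv("connections.csv.gz")

# map the raw neuron ids to contiguous indices 0 .. |V|-1
ids = pd.unique(pd.concat([conn["pre_root_id"], conn["post_root_id"]]))
index = {nid: i for i, nid in enumerate(ids)}
rows = conn["pre_root_id"].map(index).to_numpy()
cols = conn["post_root_id"].map(index).to_numpy()

# binary adjacency: multiple synapses between the same pair collapse into a single 1
n = len(ids)
E = sp.coo_matrix(([1] * len(rows), (rows, cols)), shape=(n, n)).tocsr()
E.data[:] = 1  # duplicates were summed during conversion; re-binarize
```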
after clustering of these 134,181 neurons is done, we compare the clustering of the subset that corresponds to the visual neurons against the annotated visual neuron types from FlyWire (https://codex.flywire.ai/app/optic_lobe_catalog, https://codex.flywire.ai/api/download?data_product=visual_neuron_types&data_version=783&meta_data_only=1). there are 729 types in total for 46,479 neurons.

### Evaluation metric

because we have the ground-truth clustering assignments, we use the Hungarian algorithm to maximize the following objective function and use the achieved objective as the quality metric of clustering:
\begin{align}
\max_{X \in \left\{0, 1 \right\}^{K \times K'}} \sum_{k=1}^K \sum_{k'=1}^{K'} C_{k,k'} X_{k,k'},
\end{align}
where $X$ is constrained to be a one-to-one assignment (each row and each column of $X$ contains at most a single $1$), and $C_{k,k'}$ is the number of instances assigned to the $k$-th cluster by one of the two clustering algorithms in comparison and to the $k'$-th cluster by the other. when one of the algorithms is the ground-truth one, a higher objective implies a better alignment between the ground-truth and predicted clustering assignments. in addition to the score, we also get the actual alignment between the clusters from $X$. this allows us to visually inspect the relationship between the two clustering assignments.

### Comparison: PCA+$k$-means clustering

in order to highlight how inter-cluster connectivity focused node clustering differs from a conventional approach, we implement one simple iterative algorithm for vertex clustering. we first train a linear autoencoder using stochastic gradient descent to minimize the following loss function:
\begin{align}
\min_{U \in \mathbb{R}^{N \times d}} \| (W U) U^\top + b - W \|^2_2,
\end{align}
where $W$ is the observed sparse, binary connectivity matrix, and $b$ is a scalar bias. we optionally orthogonalize $U$, although this may not be necessary. we then run $k$-means clustering on the $N$ rows of the estimated $U$.

### Random assignment baseline

in order to establish that clustering does something meaningful, we use two random cluster assignment strategies as minimal baselines. if the scores of these baselines were comparable to those of either the proposed latent variable approach or the PCA+$k$-means clustering approach, we would conclude that the comparison between the latter two is not meaningful.

the first random assignment is optimistic, in that its cluster marginal distribution follows that of the ground-truth assignment. we create this assignment by randomly permuting the ground-truth cluster assignment. the second one is pessimistic, in that we assign each vertex uniformly to one of the 729 clusters at random. this is still not the most pessimistic version, in that we assume we know the true number of clusters.

### Implementation

a very naive and potentially buggy implementation is available at https://github.com/kyunghyuncho/flywrite-cell-types. the code relies heavily on PyTorch, scikit-learn and scipy, and is released under the MIT License.

### Result

we first use 1,024 clusters for both algorithms: PCA+$k$-means clustering and latent variable based clustering. in the case of the former, we update $U$ and $b$ 10,000 times using stochastic gradient descent with minibatches of size 512. in the case of the proposed approach, we run stochastic gradient descent until the clustering assignment converges. for both, we use Adam as the optimizer.
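the numbers reported in the table below are exactly these matching scores. as a concrete sketch, the score (and the alignment $X$) can be computed with `scipy.optimize.linear_sum_assignment`; the function and variable names here are mine:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def matching_score(labels_a, labels_b):
    """Maximum-weight one-to-one matching between two clusterings (Hungarian algorithm)."""
    a_ids, a = np.unique(labels_a, return_inverse=True)
    b_ids, b = np.unique(labels_b, return_inverse=True)
    # contingency matrix: C[k, k'] = number of instances in cluster k of A and cluster k' of B
    C = np.zeros((len(a_ids), len(b_ids)), dtype=np.int64)
    np.add.at(C, (a, b), 1)
    rows, cols = linear_sum_assignment(C, maximize=True)
    score = int(C[rows, cols].sum())
    alignment = list(zip(a_ids[rows], b_ids[cols]))
    return score, alignment

# e.g., score, alignment = matching_score(ground_truth_types, predicted_clusters)
```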
|           | PCA+$k$ | LV       | Rand(Opt) | Rand(Pess) |
| --------- | ------- | -------- | --------- | ---------- |
| GT        | 1550    | **3019** | 1235      | 985        |
| PCA+$k$   |         | 1433     | 1029      | 1238       |
| LV        |         |          | 1357      | 1162       |
| Rand(Opt) |         |          |           | 985        |

from the result table above, we make a few important observations.

first, we see that random assignment generally gets a score of $\approx$ 1000. the optimistic assignment receives 1235 against the ground-truth assignment, while the pessimistic one receives 985. both PCA+$k$ and LV receive significantly higher scores than either of these random assignments, suggesting that the proposed quality metric is sensible.

second, the agreement between PCA+$k$ and LV is significantly higher than 1000. this implies that there is some shared structure behind the graph that is captured by both PCA+$k$ and LV. despite the distinct goals behind these algorithms, inter-node and inter-cluster connectivity patterns are not independent of each other. we leave for the future how to design an algorithm that can exploit both in a flexible and controllable manner.

finally, and perhaps most importantly, the proposed approach (LV) receives a substantially higher score of 3019 than PCA+$k$, which received 1550. 1550 is significantly higher than any score by random assignment, but the gap is even greater when compared against LV. this suggests that the neuron types of visual neurons do correlate better with inter-cluster connectivity, and that the proposed approach can capture these inter-cluster connectivity patterns.

#### Further analysis

clustering often does not guarantee that every cluster will be populated. in other words, the number of clusters we use (1,024) is an upper bound on the number of resulting clusters. in our experiments, PCA+$k$-means clustering and the proposed LV clustering resulted in 915 and 1,011 clusters, respectively.

**TODO**

## Limitations and future directions

a major limitation of the proposed approach is that it only takes into account the first-order neighbours in the cluster space. as has become quite obvious from the advances and successes of language models in recent years, there is a lot of signal that can be squeezed out by considering an increasingly larger context. instead of considering just an immediate neighbouring vertex's cluster assignment, it will be important in the future to consider a larger neighbourhood. in order to take into account multi-hop neighbours, a graph neural net can be an interesting alternative. it will however make learning and inference much more challenging, as the first term in the objective function above will no longer be exactly computable and will require a sampling-based approximation. it remains to be seen whether such a sampling-based approach would have low enough variance to be useful.

another limitation is not about the method but about evaluation. we chose visual neuron type classification because these visual neurons are classified according to which other types of neurons they are connected to, rather than which other individual neurons they are connected to. even then, it is unclear whether the proposed approach is well suited to this problem. it will be necessary to test the proposed approach on a more diverse set of benchmark problems in the future.

finally, there is yet another limitation in the evaluation protocol we used. for evaluation, i used all the neurons for clustering and then selected the subset of visual neurons to check whether their assigned clusters make sense.
this may not be an ideal choice, since some of the clusters may be dedicated to non-visual neurons only, although a large portion of fly neurons are visual neurons. clustering could potentially be improved dramatically by considering only the visual neurons and their immediate neighbours. we leave this for the future.

**TODO**
