SOM Network
===========
Pre

![](https://i.imgur.com/CeFV1uN.png)

An example of SOM.
![](https://i.imgur.com/eC9BnwQ.png)


Reconstruction

![](https://i.imgur.com/YEa8y2e.png)![](https://i.imgur.com/ZcpUp2W.png)![](https://i.imgur.com/8N3lS0H.png)

Left to right: original digits (MNIST), reconstruction from patches, and the SOM of the patches.
![](https://i.imgur.com/ewHHjgg.png)![](https://i.imgur.com/797p6MZ.png)![](https://i.imgur.com/dikwXob.png)

Left to right: original frog (CIFAR), reconstruction from patches, and the SOM of the patches.
In CIFAR, it is much harder to reconstruct the image accurately from patches: the space of patches is much larger because the objects are more complex than digits and come in many colors. Reconstructions from $4\times4$ patches on CIFAR are still much blurrier than those from $9\times9$ patches on MNIST.
![](https://i.imgur.com/797p6MZ.png)![](https://i.imgur.com/VJ5ROL1.png)![](https://i.imgur.com/gM0cgE0.png)

Left to right: reconstruction with a SOM of 400 synaptic vectors, reconstruction with a SOM of 2500 synaptic vectors, and the SOM with 2500 synaptic vectors.
With more synaptic vectors, the reconstruction improves to some degree (e.g. the colors are more distinguishable).
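The patch reconstruction described above can be sketched roughly as follows. This is a minimal NumPy illustration, not the code used for the figures: the function names (`extract_patches`, `train_som`, `reconstruct`), the Gaussian neighborhood, the linear decay schedule, and all hyperparameters are assumptions made for the example. Each patch is replaced by the synaptic vector of its winner neuron, so a larger map gives a finer codebook.

```python
import numpy as np

def extract_patches(img, p):
    """Split an (H, W) image into non-overlapping p x p patches, one per row."""
    H, W = img.shape
    img = img[:H - H % p, :W - W % p]  # crop so both sides divide by p
    return img.reshape(H // p, p, W // p, p).swapaxes(1, 2).reshape(-1, p * p)

def train_som(data, grid=(5, 5), epochs=10, lr=0.5, sigma=1.5, seed=0):
    """Train a small 2-D SOM; returns synaptic vectors of shape (rows*cols, dim)."""
    rng = np.random.default_rng(seed)
    rows, cols = grid
    # Lattice position of every neuron, used by the neighborhood function.
    coords = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    # Initialize the synaptic vectors from randomly chosen samples.
    weights = data[rng.integers(0, len(data), size=rows * cols)].astype(float)
    total, step = epochs * len(data), 0
    for _ in range(epochs):
        for x in data[rng.permutation(len(data))]:
            decay = 1.0 - step / total  # assumed linear decay of lr and radius
            winner = np.argmin(((weights - x) ** 2).sum(axis=1))
            d2 = ((coords - coords[winner]) ** 2).sum(axis=1)
            h = np.exp(-d2 / (2.0 * (sigma * decay + 1e-3) ** 2))
            weights += lr * decay * h[:, None] * (x - weights)
            step += 1
    return weights

def reconstruct(img, weights, p):
    """Replace every patch by the synaptic vector of its winner neuron."""
    H, W = img.shape
    patches = extract_patches(img, p)
    d2 = ((patches[:, None, :] - weights[None, :, :]) ** 2).sum(axis=2)
    nearest = weights[d2.argmin(axis=1)]
    return (nearest.reshape(H // p, W // p, p, p)
                   .swapaxes(1, 2)
                   .reshape(H // p * p, W // p * p))

# Toy usage on random data standing in for an MNIST digit.
rng = np.random.default_rng(0)
img = rng.random((28, 28))
patches = extract_patches(img, 4)
som = train_som(patches, grid=(5, 5), epochs=5)
rec = reconstruct(img, som, 4)
print(rec.shape, float(((rec - img) ** 2).mean()))
```

In this sketch the reconstruction error can only shrink as the codebook grows, which matches the observation that 2500 synaptic vectors reconstruct better than 400.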
![](https://i.imgur.com/f9M84YX.png)

Mapping images of a car from different angles to a SOM with a looped 1D lattice.
Stacked Patch SOM

![](https://i.imgur.com/EMwN4dT.png)

Stacked patch SOM: we use the coordinate (topological position) of the winner neuron as the input of the next layer. The figure shows the clustering result of the last layer. This architecture clusters the digits poorly.
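A rough sketch of how the winner coordinates could be turned into the next layer's input is given below. The helper names (`winner_coords`, `next_layer_input`) and the normalization of coordinates to $[0, 1]$ are assumptions for illustration; the note does not say how the coordinates were encoded.

```python
import numpy as np

def winner_coords(patches, weights, grid):
    """Return the (row, col) lattice position of the winner neuron for each patch."""
    rows, cols = grid
    d2 = ((patches[:, None, :] - weights[None, :, :]) ** 2).sum(axis=2)
    idx = d2.argmin(axis=1)            # flat index of the winner per patch
    r, c = np.divmod(idx, cols)        # back to 2-D lattice coordinates
    # Assumption: scale coordinates to [0, 1] so layers share a common range.
    return np.stack([r / max(rows - 1, 1), c / max(cols - 1, 1)], axis=1)

def next_layer_input(patches, weights, grid):
    """Concatenate the winner coordinates of all patches of one image."""
    return winner_coords(patches, weights, grid).reshape(-1)

# Toy usage: a 2 x 2 SOM over 2-D "patches".
weights = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
patches = np.array([[0.9, 1.1], [0.1, -0.1]])
print(next_layer_input(patches, weights, (2, 2)))
```

Only the position of the winner is passed on, so two patches that select the same neuron become indistinguishable to the next layer, which may be one reason the stacked version clusters poorly.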
SOM as a Mask

In all of the following discussions, we are talking about using SOM as a mask.
![](https://i.imgur.com/98VirUK.png)

Illustration of the SOM-Mask. The number of neurons in the SOM is equal to that of the feed-forward layer (FC or CNN). An input is sent to the SOM to compute the neighborhood function, which serves as a mask. The mask is then applied to the feed-forward layer.
SOM+FC

A fully connected layer is easier to manipulate than a CNN. The following network consists of three fully connected layers, with SOM applied to the first two.
The basic idea is to use the SOM to compute a mask for the connections between input and output. For an input sample $x$, the output of a traditional fully connected layer is $y_j = \sum_{i}x_iW_{ij}$ for each neuron $y_j$. However, since the SOM organizes the inputs, only a small neighborhood of neurons around the winner neuron should be activated for $x$. We can therefore compute a mask $m_j$ indicating how strongly neuron $j$ should be activated.
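If the winner neuron of $x$ in the output layer is $y_{j^*}$, then the output becomes $$y_j = m_j \sum_{i}x_iW_{ij}, \qquad m_j = h_\sigma\big(dist(j, j^*)\big)$$, where $dist(\cdot, \cdot)$ is the distance between two neurons in the topological space and $h_\sigma$ is the neighborhood function, which decreases with that distance.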
The parameter $\sigma$ sets the influence radius of the winner neuron and is critical to training. If $\sigma$ is small, the network organizes well but the accuracy drops because the activation is too sparse. If $\sigma$ is large, the performance approaches that of the model without SOM, but the organizing map becomes blurry, like an average of different images.
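A minimal sketch of the masked fully connected layer, assuming a Gaussian neighborhood $\exp(-dist^2 / 2\sigma^2)$ on a 2-D lattice (the note does not state which neighborhood function was used). The names `lattice_coords`, `som_mask`, and `masked_fc` are hypothetical.

```python
import numpy as np

def lattice_coords(rows, cols):
    """Lattice position of each of the rows*cols neurons."""
    return np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)

def som_mask(x, som_weights, coords, sigma):
    """Mask m_j: neighborhood of the winner neuron for input x (assumed Gaussian)."""
    winner = np.argmin(((som_weights - x) ** 2).sum(axis=1))
    d2 = ((coords - coords[winner]) ** 2).sum(axis=1)
    return np.exp(-d2 / (2.0 * sigma ** 2))  # sigma = inf gives an all-ones mask

def masked_fc(x, W, som_weights, coords, sigma):
    """y_j = m_j * sum_i x_i W_ij; the SOM has one neuron per output unit."""
    return som_mask(x, som_weights, coords, sigma) * (x @ W)

# Toy usage: 4 inputs, 2 x 2 SOM, 4 output units.
coords = lattice_coords(2, 2)
som_weights = np.eye(4)
W = np.arange(16, dtype=float).reshape(4, 4)
x = np.array([0.0, 1.0, 0.0, 0.0])
print(masked_fc(x, W, som_weights, coords, sigma=0.5))     # sparse activation
print(masked_fc(x, W, som_weights, coords, sigma=np.inf))  # same as x @ W
```

Note that the full product `x @ W` is still computed before masking, so the sparsity does not save any computation in this form.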
![](https://i.imgur.com/xWXYR7A.png)

Performance on CIFAR10: the four curves from top to bottom are $\sigma=+\infty$ (model without SOM), $\sigma=50$, $\sigma=20$, $\sigma=5$. A small $\sigma$ helps organization but hurts performance because the activation becomes sparse.
By using different values of $\sigma$ for organization and forwarding, we can achieve both performance and interpretability: a small $\sigma$ trains the organizing map and a large $\sigma$ computes the forwarding mask.
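The two-radius idea can be sketched as a layer that keeps separate values for organizing and forwarding. The class name `TwoSigmaSOMLayer`, the attribute names `sigma_org` and `sigma_fwd`, the Gaussian neighborhood, and the learning rate are all assumptions made for this example.

```python
import numpy as np

class TwoSigmaSOMLayer:
    """FC layer masked by a SOM, with separate radii for organizing and forwarding."""

    def __init__(self, in_dim, grid, sigma_org, sigma_fwd, seed=0):
        rng = np.random.default_rng(seed)
        rows, cols = grid
        self.coords = np.array(
            [(r, c) for r in range(rows) for c in range(cols)], dtype=float)
        n = rows * cols
        self.som = rng.random((n, in_dim))                # synaptic vectors
        self.W = rng.normal(0.0, 0.1, size=(in_dim, n))   # FC weights
        self.sigma_org, self.sigma_fwd = sigma_org, sigma_fwd

    def neighborhood(self, x, sigma):
        """Assumed Gaussian neighborhood around the winner neuron of x."""
        winner = np.argmin(((self.som - x) ** 2).sum(axis=1))
        d2 = ((self.coords - self.coords[winner]) ** 2).sum(axis=1)
        return np.exp(-d2 / (2.0 * sigma ** 2))

    def organize(self, x, lr=0.1):
        """SOM update with the small radius, which keeps the map sharp."""
        h = self.neighborhood(x, self.sigma_org)
        self.som += lr * h[:, None] * (x - self.som)

    def forward(self, x):
        """Forward pass with the large radius, which keeps the activation dense."""
        return self.neighborhood(x, self.sigma_fwd) * (x @ self.W)

layer = TwoSigmaSOMLayer(in_dim=3, grid=(4, 4), sigma_org=1.0, sigma_fwd=10.0)
x = np.array([0.2, 0.5, 0.9])
layer.organize(x)
print(layer.forward(x).shape)
```

In this sketch the FC weights `W` would still be trained by backpropagation through `forward`; only the SOM update uses the small radius.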
![](https://i.imgur.com/deSZvRQ.png)

The self-organized map of the first layer on MNIST.
![](https://i.imgur.com/uzi23vD.png)

Performance on MNIST: the loss, testing accuracy and training accuracy of the network. The blue and red curves are models with SOM (with different $\sigma$ for organization and forwarding) and the orange curve is the model without SOM. The blue model applies SOM to both the first and the second layer, while the red model applies SOM only to the first layer.
Using SOM on FC in this way acts like clustering the input samples before processing them. Although the activation is sparse, we still have to perform the entire matrix multiplication, so no computation is saved. Furthermore, we observe no performance boost from this model. The only benefit is a good visualization, which the vanilla SOM already provides. **(TODO)** Other potential contributions: filter organization, deeper SOM, defending against adversarial samples.
The self-organization observed in the striate cortex is based on small patches rather than the entire image.
We cannot observe any organizing patterns in the filters.
![](https://i.imgur.com/Gfkhg3R.png)

The visualization shows the $28\times28$ inputs that maximally activate each of the $10\times10$ filters.
Implementing CartPole Agent:
- Regularization is dangerous.
- Gamma is important (never set it to 0).
- Use a large negative reward to compensate for the sparsity of negative samples.
- Control the size of the memory.
- Train after each episode, not after each step.
- Train on the minibatch sample by sample rather than all at once.
- The training frequency is related to the memory size but should be independent of the episode length.
- CartPole-v0 is different from CartPole-v1.
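A few of the tips above (bounded memory, a large negative reward on failure, a nonzero gamma) can be illustrated with a small standard-library sketch. The names `ReplayMemory`, `shaped_reward`, and `td_target`, the penalty of `-100.0`, and the default `gamma=0.99` are hypothetical values chosen for the example, not settings taken from this note.

```python
import random
from collections import deque

class ReplayMemory:
    """Bounded replay memory: old transitions are dropped once capacity is reached."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Never ask for more transitions than are stored.
        return random.sample(list(self.buffer), min(batch_size, len(self.buffer)))

    def __len__(self):
        return len(self.buffer)

def shaped_reward(reward, failed, penalty=-100.0):
    """Replace the reward of a failing step with a large negative value (assumed)."""
    return penalty if failed else reward

def td_target(reward, next_q_max, done, gamma=0.99):
    """One-step target; with gamma = 0 the agent would ignore all future reward."""
    return reward if done else reward + gamma * next_q_max

# Toy usage: store a short episode, then sample a batch after it ends.
memory = ReplayMemory(capacity=1000)
for t in range(10):
    failed = t == 9
    memory.push(t, 0, shaped_reward(1.0, failed), t + 1, failed)
batch = memory.sample(4)
print(len(memory), len(batch))
```

Sampling only after the loop ends mirrors the tip of training after the episode rather than after every step.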
SOM-ReLU vs ReLU-SOM

SOM-ReLU is slightly better than ReLU-SOM.
![](https://i.imgur.com/02JSatI.png)


From top to bottom: the performance (i) without SOM, (ii) with Mask-ReLU, (iii) with ReLU-Mask.
SOM+CNN

CNN with SOM. The SOM takes the entire image as the input to generate a mask for different channels/filters in the output.
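A minimal sketch of the channel-wise mask, assuming one SOM neuron per output channel and a Gaussian neighborhood; `channel_mask` and `mask_feature_maps` are hypothetical names, and the feature maps here are placeholders rather than the output of a real convolution.

```python
import numpy as np

def channel_mask(image, som_weights, coords, sigma):
    """One mask value per channel, from the SOM winner of the flattened image."""
    x = image.reshape(-1)
    winner = np.argmin(((som_weights - x) ** 2).sum(axis=1))
    d2 = ((coords - coords[winner]) ** 2).sum(axis=1)
    return np.exp(-d2 / (2.0 * sigma ** 2))   # shape (C,)

def mask_feature_maps(features, mask):
    """Scale each of the C feature maps (C, H, W) by its mask value."""
    return features * mask[:, None, None]

# Toy usage: a 2 x 2 "image", a 2 x 2 SOM, and 4 output channels.
coords = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
som_weights = np.eye(4)
image = np.array([[0.0, 1.0], [0.0, 0.0]])
features = np.ones((4, 3, 3))                 # stand-in for a conv layer output
mask = channel_mask(image, som_weights, coords, sigma=1.0)
print(mask_feature_maps(features, mask)[:, 0, 0])
```

Because the mask depends on the whole image, every spatial position of a channel is scaled by the same value.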
![](https://i.imgur.com/TRE4Qof.png)

It is hard to find any organization in the filters. The SOM is trained on the complete image.
SOM-CNN outperforms CNN only by a very slight margin (99.02% vs 98.76%).
Some thoughts

How to apply SOM?
- Find a scenario where **vector quantization** is useful (e.g. continuous control). Then we can use SOM as the algorithm that finds the representative vectors for this problem.
- Focus on vision tasks, where SOM is rooted.

A common aspect of CNN and SOM is that they work on small patches of the input rather than the entire input. A CNN does the computation on all possible patches, while doing that with a SOM is computationally expensive. Maybe attention can be applied to pick the right patches here?
Threshold Mask

Not good. It sacrifices too much accuracy to sparsify the activation.
CSOM

Convolutional SOM as mask. The inputs of the SOM are gray patches rather than the entire image.
We have to extract numerous patches from each image, which slows down the training of the SOM. However, the SOM converges quickly and does not change afterwards. We can still find organized patterns in the synaptic vectors, but not in the filters.
We can achieve competitive results with about 30% of the activation strength, which is energy-economical in a biological sense but costs more computation in software.
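One possible reading of the convolutional SOM mask is sketched below: every sliding gray patch selects its own winner, which gives one mask value per channel and per spatial position. The grayscale conversion by channel averaging, the stride of 1, the Gaussian neighborhood, the names `gray_patches` and `patch_masks`, and the use of the mean mask value as "activation strength" are all assumptions.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def gray_patches(image, p):
    """Sliding p x p gray patches of an (H, W, 3) image, shape (H-p+1, W-p+1, p*p)."""
    gray = image.mean(axis=2)                    # assumed grayscale conversion
    windows = sliding_window_view(gray, (p, p))  # (H-p+1, W-p+1, p, p)
    return windows.reshape(windows.shape[0], windows.shape[1], p * p)

def patch_masks(patches, som_weights, coords, sigma):
    """Per-position mask over channels, shape (C, H', W'), one SOM neuron per channel."""
    Hp, Wp, D = patches.shape
    flat = patches.reshape(-1, D)
    d2 = ((flat[:, None, :] - som_weights[None, :, :]) ** 2).sum(axis=2)
    winners = d2.argmin(axis=1)                  # winner neuron for every patch
    lattice_d2 = ((coords[winners][:, None, :] - coords[None, :, :]) ** 2).sum(axis=2)
    mask = np.exp(-lattice_d2 / (2.0 * sigma ** 2))   # (H'*W', C)
    return mask.T.reshape(-1, Hp, Wp)

# Toy usage: random image, 2 x 2 SOM, hence 4 channels.
rng = np.random.default_rng(0)
image = rng.random((6, 6, 3))
coords = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
som_weights = rng.random((4, 9))
masks = patch_masks(gray_patches(image, 3), som_weights, coords, sigma=1.0)
print(masks.shape, float(masks.mean()))          # mean mask value as activation strength
```

The mask would then multiply a convolution output of the same spatial size. Finding a winner for every patch is extra work on top of the convolution, which is consistent with the remark that the masked model costs more computation.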
![](https://i.imgur.com/HXTg5Ho.png)

CNN filters and synaptic vectors on MNIST.
![](https://i.imgur.com/E61Ug5A.png)

Synaptic vectors on CIFAR10.
Threshold Mask

To make the activation sparser, we can apply a threshold to the mask. The network then uses only 5% of the neurons in the CNN layers and still achieves similar accuracy (98.9%) on MNIST.
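Thresholding the mask can be sketched in a couple of lines. Whether values above the threshold keep their magnitude or are set to 1 is not stated in the note; this sketch assumes they keep their magnitude, and `threshold_mask` and `active_fraction` are hypothetical names.

```python
import numpy as np

def threshold_mask(mask, threshold):
    """Zero out every mask entry below the threshold; keep the rest unchanged."""
    return np.where(mask >= threshold, mask, 0.0)

def active_fraction(mask):
    """Fraction of neurons that remain active after masking."""
    return np.count_nonzero(mask) / mask.size

# Toy usage with the thresholds 0.0, 0.5, and 0.9.
mask = np.array([1.0, 0.95, 0.6, 0.3, 0.05])
for threshold in (0.0, 0.5, 0.9):
    sparse = threshold_mask(mask, threshold)
    print(threshold, sparse, active_fraction(sparse))
```

A higher threshold leaves fewer active neurons, and a threshold of 0.0 leaves a nonnegative mask unchanged.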
![](https://i.imgur.com/yBanooR.png)

Top to bottom: thresholds of 0.0, 0.5, and 0.9.
![](https://i.imgur.com/zE37DGr.png)![](https://i.imgur.com/YcnZJI3.png)

Gray: CNN with 30 neurons. Green: CNN with 100 neurons but only 30 activated neurons masked by a SOM. Orange: CNN with 5 neurons. Blue: CNN with 100 neurons but only 5 activated neurons masked by a SOM.
Updated

![](https://i.imgur.com/lfF4wpj.png)

From top to bottom: original network (100 neurons), SOM-masked (5 activated neurons), pruned network (5 neurons), Dropout (keep ratio 0.05).