# HyperNEAT algorithm

## Motivations

HyperNEAT tries to address the issue of evolving large networks. The NEAT algorithm is limited by its direct encoding of neural networks (each node and connection is explicitly stored and evolved through mutations): to reach the large networks sometimes needed to solve difficult tasks, it takes a long time to increase the complexity of the NN. HyperNEAT is also designed to exploit the geometry of the task, which in many cases is of great use and is likely to improve training (speeding it up, or even allowing the algorithm to reach a solution at all).

## Indirect encoding of neural networks

In nature, the mapping between genotype and phenotype is indirect. A human genome has roughly 30,000 genes that encode a human brain with 100 trillion connections. Phenotypes usually display repetition and symmetry (two hands, two legs, etc.) by using the same genes repeatedly.

The goal is to find a compact representation of a NN that contains elements of symmetry/regularity, repetition and locality. This is done via connective Compositional Pattern Producing Networks (CPPNs), which are themselves compositions of functions such as Gaussians (symmetry, locality) and cosines (periodicity, repetition); such a composed function can in turn be represented as a computational graph.

![](https://i.imgur.com/S8iAoBR.png)

Whereas NEAT evolves ANNs whose hidden nodes all have sigmoid activation functions, CPPN-NEAT allows each hidden node to specify its own activation function.

## Exploitation of input-output geometry

So far, CPPNs are continuous functions that can map out symmetric/repeating patterns. What we want is that the structure of the NN encoded by the CPPN also presents elements of symmetry/repetition/locality. The solution is to specify a geometry for the input and output data. Say the input data is a 2D image and the output is too. The connectivity between input and output nodes is then governed by the CPPN, which takes their coordinates and returns the corresponding edge weight, i.e. $CPPN(x_{input}, y_{input}, x_{output}, y_{output}) = w_{input-output}$.

![](https://i.imgur.com/EsjR7fX.png)
> Different geometric relationships between input and output nodes.

This way, it is possible to give evolution a significant advantage from the start. The CPPN doesn't explicitly store the connectivity information of each pair of connected nodes, but rather provides a connectivity concept (a general mathematical relationship that produces a given connectivity weight pattern).

**The same connective CPPN can represent an equivalent concept at different resolutions (i.e. node densities)!** Without further evolution, previously-evolved connective CPPNs can be re-queried to specify the connectivity of the substrate at a new, higher resolution, thereby producing a working solution to the same problem at that higher resolution!

![](https://i.imgur.com/GZzLQmJ.png)
> Examples of different input resolutions giving rise to different connectivity patterns generated by the same CPPN.

When increasing the resolution, a potential problem is that if the same activation level is used to indicate a positive stimulus as at the lower resolution, the total "energy" entering the substrate would increase, leading to oversaturation of the target field. To account for this disparity, the input activation levels are scaled for larger substrate resolutions proportionally to the difference in unit cell size.

A single CPPN solution can represent an infinite range of resolutions, i.e. node densities.
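To make the substrate-query idea concrete, here is a minimal Python sketch (assuming NumPy). The `cppn` function is a hand-written stand-in for an evolved CPPN, and the $[-1, 1]^2$ node layout, the pruning threshold and the particular function composition are illustrative assumptions, not the original formulation.

```python
import numpy as np

def cppn(x_in, y_in, x_out, y_out):
    """Toy hand-written CPPN: a composition of a Gaussian (locality/symmetry)
    and a sine (repetition). In HyperNEAT this function is evolved by CPPN-NEAT;
    the composition and constants here are illustrative only."""
    d = np.sqrt((x_in - x_out) ** 2 + (y_in - y_out) ** 2)
    return np.exp(-d ** 2) * np.sin(3.0 * np.pi * (x_in + y_out))

def substrate_weights(resolution, threshold=0.2):
    """Query the same CPPN for every (input node, output node) pair of a
    resolution x resolution grid laid out on [-1, 1]^2. Connections whose
    magnitude falls below the threshold are not expressed."""
    coords = np.linspace(-1.0, 1.0, resolution)
    xs, ys = np.meshgrid(coords, coords)
    nodes = np.stack([xs.ravel(), ys.ravel()], axis=1)   # (resolution^2, 2)
    n = nodes.shape[0]
    w = np.zeros((n, n))
    for i, (xi, yi) in enumerate(nodes):
        for j, (xo, yo) in enumerate(nodes):
            value = cppn(xi, yi, xo, yo)
            w[i, j] = value if abs(value) > threshold else 0.0
    return w

# The same CPPN specifies a connectivity pattern at any node density:
low  = substrate_weights(resolution=5)    # 5x5 substrate  -> 25x25 weight matrix
high = substrate_weights(resolution=11)   # 11x11 substrate, no re-evolution needed
print(low.shape, high.shape)              # (25, 25) (121, 121)
```

Because every connection weight is obtained by re-querying the same function, increasing the resolution only changes how densely that function is sampled; nothing about the CPPN itself has to change.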
However, a solution evolved at a lower resolution can reveal artifacts when it is brought into greater focus. Additional geometric information can also be provided to the CPPN as inputs, such as the coordinate differences $x_1 - x_2$ and $y_1 - y_2$; this has proven to give better results on some tasks.

## Application to the toric code

- Geometric information that is invisible to the NNs currently evolved by NEAT can be exploited by HyperNEAT: locality (adjacency, square lattice) and symmetries (periodic boundary conditions).
- Due to translational invariance, the CPPN can take as input the differences $x_1 - x_2$ and $y_1 - y_2$ instead of the raw coordinates (see the sketch after the references below).
- **The substrate configuration accounts both for the toric code geometry and for the action location information**, since each output node refers to an action taken at a specific location.
- Scaling possibility! Training on a d=3 substrate gives a solution CPPN that can readily be evaluated on d=5, d=7, ... board sizes, because the higher node density in the substrate is easily handled by the CPPN. This also sidesteps the issue of the ill-defined crossing point that arises when different code distances are solved with separate trainings: here, **with one CPPN, this "algorithm" can solve any size of the toric code**.
- Another application is to initialize training on d=5 with the previously obtained solution at d=3, thus incrementally building solutions; this may well **speed up learning**!

![](https://i.imgur.com/vYTuwqK.png)

## References

Original paper: http://axon.cs.byu.edu/~dan/778/papers/NeuroEvolution/stanley3**.pdf

FAQ by Stanley: http://eplex.cs.ucf.edu/hyperNEATpage/content/hyperneat-main.html#methodology

HyperNEAT for checkers: https://eplex.cs.ucf.edu/papers/gauci_nc10.pdf
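As a complement to the notes above, here is a minimal sketch (again assuming NumPy) of how translational invariance can be built into the substrate query on a periodic lattice. The helper names, the `cppn(dx, dy)` signature and the toy CPPN are illustrative assumptions, not the setup actually used for the toric code experiments.

```python
import numpy as np

def periodic_offset(p_in, p_out, size):
    """Signed coordinate difference on a ring of the given size, i.e. the
    shortest displacement from p_in to p_out under periodic boundary conditions."""
    d = (p_out - p_in) % size
    return d - size if d > size / 2 else d

def toric_substrate_weights(cppn, d):
    """Query a CPPN of the form cppn(dx, dy) for every pair of sites on a d x d
    periodic lattice. Feeding only the periodic offsets (dx, dy) makes the
    resulting connectivity translationally invariant by construction, and the
    same cppn can be queried for any code distance d.
    (One could also divide dx, dy by d so that different code distances sample
    the same input range, mirroring the scaling discussion above.)"""
    n = d * d
    w = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            dx = periodic_offset(i % d, j % d, d)
            dy = periodic_offset(i // d, j // d, d)
            w[i, j] = cppn(dx, dy)
    return w

# Illustrative CPPN taking offsets; in practice this function is evolved.
toy_cppn = lambda dx, dy: np.exp(-(dx ** 2 + dy ** 2) / 2.0)

w3 = toric_substrate_weights(toy_cppn, d=3)   # trained size
w5 = toric_substrate_weights(toy_cppn, d=5)   # same CPPN, larger code distance
print(w3.shape, w5.shape)                     # (9, 9) (25, 25)
```

Because the CPPN never sees absolute coordinates, evaluating it on a larger lattice (d = 5, 7, ...) only requires sampling more offset pairs.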