# HyperNEAT investigations
## Possible substrate structures
The first natural choice is to take the structure of the board, where the coordinates of the qubit and plaquette locations are rescaled to fit in the square $[-1, 1] \times [-1, 1]$.

The substrate also takes into account the 4 output action nodes, located exactly where they act (in the figure, at the locations of Q10, Q14, Q16, Q19).
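As a concrete illustration, a minimal sketch of how such substrate coordinates could be built: it assumes qubits and plaquettes sit on a $(2d-1) \times (2d-1)$ integer grid, and the action sites below are hypothetical placeholders for Q10, Q14, Q16, Q19 rather than the actual layout used in the runs.

```python
def rescale_to_substrate(row, col, d):
    """Map integer board coordinates (row, col) of a distance-d code onto the
    substrate square [-1, 1] x [-1, 1].

    Hypothetical layout: qubits and plaquettes on a (2d-1) x (2d-1) grid.
    """
    n = 2 * d - 1                       # assumed linear size of the board grid
    x = -1.0 + 2.0 * col / (n - 1)
    y = -1.0 + 2.0 * row / (n - 1)
    return x, y

d = 3
# Input nodes: one per board site (qubit or plaquette).
input_nodes = [rescale_to_substrate(r, c, d)
               for r in range(2 * d - 1) for c in range(2 * d - 1)]
# Output nodes: placed at the board sites of the qubits they act on
# (hypothetical site indices standing in for Q10, Q14, Q16, Q19).
action_sites = [(1, 2), (2, 1), (2, 3), (3, 2)]
output_nodes = [rescale_to_substrate(r, c, d) for r, c in action_sites]
```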
### Possible variations:
1. Symmetrize further, e.g. by copying qubits onto the sides of the board to create a perfectly symmetric input.
2. The CPPN takes as input the coordinates (x1, y1, x2, y2). We can exploit translation invariance by feeding it the coordinate differences x1-x2 and y1-y2, either instead of or in addition to the raw coordinates (see the sketch after this list).
3. We can also exploit rotational invariance. Since translation invariance is already used for the generation of perspectives, we can even reduce the action space to a single output neuron. [This is investigated and presented in another HackMD file for the NEAT algorithm.]
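A sketch of variation 2, assuming the evolved CPPN is queried once per substrate connection (e.g. a neat-python `FeedForwardNetwork` with an `activate` method); the mode names below are illustrative, not the ones used in the code.

```python
def cppn_inputs(x1, y1, x2, y2, mode="absolute"):
    """Build the CPPN input vector for the connection (x1, y1) -> (x2, y2).

    'absolute' : raw substrate coordinates only (standard HyperNEAT query).
    'relative' : coordinate differences only, enforcing translation invariance.
    'both'     : raw coordinates plus differences, as suggested in variation 2.
    """
    if mode == "absolute":
        return [x1, y1, x2, y2]
    if mode == "relative":
        return [x1 - x2, y1 - y2]
    if mode == "both":
        return [x1, y1, x2, y2, x1 - x2, y1 - y2]
    raise ValueError(f"unknown mode: {mode}")

# The substrate weight between two nodes would then be queried as, e.g.:
#   w = cppn.activate(cppn_inputs(x1, y1, x2, y2, mode="both"))[0]
# where `cppn` is the evolved network (e.g. a neat-python FeedForwardNetwork).
```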
## Results
### Grid-search over activation functions in the CPPN

To create connectivity-weight patterns with more or less symmetry, repetition, etc., we can vary the activation functions that compose the CPPN.
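The effect can be illustrated with a hand-built toy CPPN (not the evolved one): a Gaussian hidden activation tends to produce patterns symmetric about a node, a sine produces repetition over wider coordinate ranges, and tanh produces smooth monotone gradients.

```python
import numpy as np

def gaussian(t):
    return np.exp(-t ** 2)

def toy_cppn(x1, y1, x2, y2, hidden_act):
    """A hand-built two-layer 'CPPN', used only to illustrate how the choice of
    hidden activation shapes the generated weight pattern."""
    h = hidden_act(x1 - x2) + hidden_act(y1 - y2)   # hidden layer on coordinate offsets
    return np.tanh(h)                               # squash the output weight into (-1, 1)

# Weight pattern seen by an output node fixed at (0, 0), sampled along one axis:
xs = np.linspace(-1, 1, 5)
for act in (np.tanh, gaussian, np.sin):
    row = [round(float(toy_cppn(x, 0.0, 0.0, 0.0, act)), 2) for x in xs]
    # gaussian gives symmetry about the node; sine gives repetition over wider ranges
    print(act.__name__, row)
```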
### Grid-search for $d=5$
The relatively poor performance of training on $d=5$ can be explained by a bad choice of hyperparameters. The figure below shows how large the difference between good and bad hyperparameter regimes can be.

The grid-search leads to the following conclusions (a sketch of the corresponding settings follows the list):
- $\epsilon=0$: it is better to let the NN always take the highest-probability action than to leave room for $\epsilon$-greedy exploration (contrary to $\epsilon=0.1$ in NEAT)
- the activation-function mutation rate should be around 0.1
- the set of activation functions for the CPPN should contain the linear, Gaussian and tanh functions.
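As a hedged illustration of these conclusions (the exact configuration files and action-selection code used for the runs are not shown here), they translate into something like the following, using standard neat-python genome options, where `identity` plays the role of the linear activation:

```python
import numpy as np

# Fragment of a neat-python [DefaultGenome] section reflecting the conclusions
# above; the rest of the configuration (population size, fitness, etc.) is omitted.
CPPN_GENOME_CONFIG = """
[DefaultGenome]
activation_default      = tanh
activation_mutate_rate  = 0.1
activation_options      = identity gauss tanh
"""

def select_action(q_values, epsilon=0.0, rng=None):
    """Epsilon-greedy action selection on the decoder side.

    With epsilon = 0 the highest-scoring action is always taken,
    which is what the grid-search favours here.
    """
    rng = rng or np.random.default_rng()
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))
    return int(np.argmax(q_values))
```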
## Rotation invariance
In this setting, the CPPN encodes a neural network with only one output node. The location of this node is set to (0,0) in the substrate, which in principle allows for better transfer learning but loses geometrical information. If instead we fix the output node to lie above the qubit location that it flips, this location changes with the code size once the code representation is rescaled to $[-1, 1] \times [-1, 1]$, and hence we can expect poorer transferability.
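A minimal sketch of this point, reusing the hypothetical $(2d-1) \times (2d-1)$ board layout from above: a node pinned to the origin keeps the same substrate coordinates for every code distance, whereas a node placed above a given qubit drifts when the board is rescaled.

```python
def rescale(row, col, d):
    # Same hypothetical rescaling as in the earlier sketch: a (2d-1) x (2d-1)
    # board grid mapped onto the substrate square [-1, 1] x [-1, 1].
    n = 2 * d - 1
    return (-1.0 + 2.0 * col / (n - 1), -1.0 + 2.0 * row / (n - 1))

def output_node_position(row, col, d, fixed_at_origin=True):
    """Substrate position of the single output node."""
    if fixed_at_origin:
        return (0.0, 0.0)                # distance-independent -> easier transfer
    return rescale(row, col, d)          # drifts with d -> poorer transfer expected

# The same board site (row=1, col=1) lands at different substrate coordinates
# for d = 3 and d = 5 unless the output node is pinned to the origin:
for d in (3, 5):
    print(d, output_node_position(1, 1, d, fixed_at_origin=False))
# -> 3 (-0.5, -0.5)   and   5 (-0.75, -0.75)
```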
### Results
The results seem to suggest that, similar to what happened for NEAT, HyperNEAT with only one output easily reproduces MWPM for $d=3$ but quickly struggles at $d=5$. This points to the possibility that the $d=3$ decoder may be a special case.

> All the runs are trained separately (no transfer learning, no evaluation of the $d=3$ decoder on $d=5$) and with different population sizes, also indicated in the legend.
#### Transfer learning from a $d=3$ decoder

Transferability from the $d=3$ decoder performs quite poorly.