# Different training procedures
## Changing the error rates of training puzzles

> *Legend:* Result of a NEAT run when NNs are trained on configurations generated at different error rates (the first 4 values in the legend). The last number in the legend is the number of generations. The error bars come from averaging the results over 3 runs.
**Commentary:** The results are better when the training puzzles are hard. However, picking these particular error rates as training error rates is a somewhat ad hoc choice.
Maybe we need to change the reward system or the training puzzle "dataset" to perform better and to take into account the fact that the NNs have to be exposed to difficult problems.
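For reference, below is a minimal sketch of how such training configurations could be generated at a given physical error rate. The toric-code edge convention, the X-errors-only noise model, and the function names are assumptions made for illustration, not the exact code used in these experiments.

```python
import numpy as np

def sample_puzzle(d, error_rate, rng):
    """Sample independent X flips on the 2*d**2 qubits of a distance-d toric code
    and return (qubit_errors, syndrome)."""
    # qubits[0][i, j]: horizontal edge to the right of vertex (i, j)
    # qubits[1][i, j]: vertical edge below vertex (i, j)
    qubits = rng.random((2, d, d)) < error_rate
    # Z-stabilizer (vertex) syndrome: parity of the four incident edges,
    # with periodic boundaries handled by np.roll
    syndrome = (qubits[0]
                ^ np.roll(qubits[0], 1, axis=1)   # edge to the left of the vertex
                ^ qubits[1]
                ^ np.roll(qubits[1], 1, axis=0))  # edge above the vertex
    return qubits.astype(int), syndrome.astype(int)

def make_training_set(d, error_rates=(0.01, 0.05, 0.1, 0.15), n_per_rate=100, seed=0):
    """Build a training set with n_per_rate puzzles per error rate."""
    rng = np.random.default_rng(seed)
    return [(p, *sample_puzzle(d, p, rng)) for p in error_rates for _ in range(n_per_rate)]
```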
## Idea 1: Resampling algorithm
Another possibility is to implement what is known in the literature as **resampling**. The idea is to modify the training dataset so that the NNs get exposed more to the puzzles they struggle on; this is done by resampling harder samples more often than easier ones. The difficulty distribution of the training dataset then reflects how hard the NNs find these samples.
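A minimal sketch of what this resampling could look like, assuming the per-error-rate average success of the previous generation is available. The weighting scheme, the `eps` floor, and the names are illustrative, not the actual implementation:

```python
import numpy as np

def resample_training_set(pool, prev_success_by_rate, n_samples, rng=None, eps=0.05):
    """Resample puzzles from `pool` (a list of (error_rate, puzzle) pairs) so that
    error rates the population struggled on last generation are drawn more often.

    prev_success_by_rate: maps error_rate -> average success in [0, 1]
    observed in the previous generation."""
    rng = rng or np.random.default_rng()
    # weight = observed failure rate, with a small floor so that
    # already-mastered error rates never disappear from the training set
    weights = np.array([1.0 - prev_success_by_rate[rate] + eps for rate, _ in pool])
    weights /= weights.sum()
    idx = rng.choice(len(pool), size=n_samples, replace=True, p=weights)
    return [pool[i] for i in idx]
```

The failure-rate weighting is only one possible choice; any monotone function of the previous generation's success per error rate would give a training distribution skewed toward the puzzles the NNs currently fail on.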
A grid-search analysis shows that the compatibility threshold governing how species are created (the maximum distance between members of the same species) is optimal at a value of 4 for $d=3$.
### $d=3$

> *Legend:* In orange, the training dataset is resampled based on the previous generation's average success for each error rate (training is done on error rates [0.01, 0.05, 0.1, 0.15]). In blue, the normal setup is shown with exactly the same hyperparameters. Results are averaged over 3 independent training runs.
I checked that the ordering of the curves is the same for all compatibility thresholds in $\{4, 5, 6\}$.
TODO: Maybe check that the best genome is indeed saved. Since the fitness definition varies from one generation to the next (because the difficulty of the training puzzles varies), the best genome of each generation should be tested on a separate test puzzle set.
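One way to run that check, sketched against the standard neat-python API; the `decode_success` helper is hypothetical and stands for whatever routine scores one genome on one puzzle:

```python
import neat

def test_best_per_generation(best_genomes, config, test_puzzles):
    """Re-evaluate the best genome of every generation on one fixed test set,
    so scores stay comparable even though the training fitness definition
    changes from generation to generation."""
    scores = []
    for genome in best_genomes:
        net = neat.nn.FeedForwardNetwork.create(genome, config)
        # decode_success: hypothetical helper, 1 if the net solves the puzzle, else 0
        success_rate = sum(decode_success(net, p) for p in test_puzzles) / len(test_puzzles)
        scores.append(success_rate)
    return scores
```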
## Idea 2: New reward system [ongoing]
Solving harder puzzles gives you a greater reward, whereas failing at easy puzzles makes you lose a lot of "points". The difficulty of a puzzle is based on the number of qubit flips present in the initial syndrome configuration.
*(Note that this setup works only for synthetic data: if we were to train the agent on real data, it is usually not possible to know the underlying qubit errors - I might be wrong though.)*
Note that the fitness of a NN on a particular puzzle is no longer in $\{0,1\}$ but rather in $\mathbb{R}$, as follows:

For $d=3$, the failure reward for 1 initial error is fixed at minus the number of qubits in the code, $-2d^2$. Given that for error rates in [0.01, 0.15] the number of initial qubit errors never exceeds 8-9 flips for $d=3$, we could consider the offset between the success and failure rewards as a hyperparameter.
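As a concrete sketch of what this real-valued fitness could look like (the exact functional form is an assumption; only the flip-count-based difficulty and the $-2d^2$ failure anchor for a single initial error come from the notes above):

```python
def puzzle_fitness(success, n_initial_flips, d, offset=1.0):
    """Real-valued fitness of a NN on one puzzle (illustrative shape).

    success         : whether the NN cleared the syndrome without a logical error
    n_initial_flips : number of qubit flips in the initial configuration (difficulty)
    offset          : hypothetical hyperparameter shifting the success rewards
    """
    n_qubits = 2 * d ** 2
    if success:
        # harder solved puzzles earn a larger reward
        return n_initial_flips + offset
    # failing on the easiest puzzle (1 flip) costs the full code size 2*d**2;
    # harder failed puzzles are penalised less
    return -(n_qubits - (n_initial_flips - 1))
```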
### $d=3$

### $d=5$

The results seem very promising! But they need a more thorough analysis [ongoing work].