# Poster texts
everything in parentheses (like this) is optional and will be left out if there is not enough space
## introduction
We want to efficiently optimize a robot arm for complex situations, like maneuvering over walls and through small passages.
The robot arm consists of modules connected by joints. The composition of the robot is optimized using morphological evolution; furthermore, the behaviour of the robot is optimized using reinforcement learning.
## simulation environment
The robot arm and its environment are simulated using Unity.
* The morphology of the robot is defined in URDF format
* Communication with Python happens through the Unity ML-Agents Toolkit and OpenAI Gym
* Goal location and obstacle specifications are defined in JSON format (sketch below)
* Obstacles are randomly generated for multiple difficulty levels
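As an illustration, such a specification could be generated and serialized from Python as sketched below; the field names (`goal`, `obstacles`, `difficulty`) and units are assumptions made for this sketch, not the project's actual schema.

```python
import json

# Illustrative specification only: the real field names and units are project-specific.
spec = {
    "difficulty": 2,                         # one of the generated difficulty levels
    "goal": {"x": 0.4, "y": 0.9, "z": 0.2},  # target position for the end effector
    "obstacles": [
        {"type": "wall",    "position": [0.2, 0.0, 0.0], "size": [0.05, 1.0, 0.6]},
        {"type": "passage", "position": [0.5, 0.3, 0.0], "size": [0.10, 0.2, 0.2]},
    ],
}

# The resulting JSON file describes the goal and obstacles for one scene.
with open("environment_spec.json", "w") as f:
    json.dump(spec, f, indent=2)
```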
## morphological evolution
Morphological evolution is used to optimize the robot for the given problem.
* Evolution using a genetic algorithm
* Encoding: Directed Graph Encodings
* (Mutation: splitting and combining modules + crossover)
* (Sampling of the target angles)
* Use of reinforcement learning to co-optimize
* Optimization: maximization of the cost function
* Cost function: `coverage*log(redundancy)/#modules`, combined with the RL accuracy (sketch below)
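A minimal Python sketch of this cost function; the function and argument names are ours, and how coverage, redundancy and the RL accuracy are measured and combined into the final fitness is not shown here.

```python
import math

def morphology_cost(coverage: float, redundancy: float, n_modules: int) -> float:
    """Fitness of a candidate morphology (to be maximized): rewards workspace
    coverage and, logarithmically, redundancy, while penalizing the module count."""
    return coverage * math.log(redundancy) / n_modules
```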
## reinforcement learning
Reinforcement learning is used to define the optimal behaviour of the robot arm in order to reach a certain position in space.
* DQN algorithm + target network that is updated every 100 steps
* Goals:
  * Goals are initially chosen in close proximity to a given position; the environment grows gradually
  * A new 3D goal is chosen every episode
  * A goal is reached if the agent's end effector lies within a distance of 3.0 of the goal
* State: the difference between the x, y and z coordinates of the end effector and the goal (continuous values)
* During the first 5000 steps actions are chosen randomly; then the exponential ε-decay starts (see the sketch below)
* Replay memory: size of 100000 and batch size of 32
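The exploration schedule and replay settings could be sketched as follows; the warm-up length, target-update interval, replay capacity and batch size are the values listed above, while the exact decay constant and final ε value are assumptions.

```python
import math

# Values taken from the poster
WARMUP_STEPS    = 5_000    # purely random actions before the decay starts
TARGET_UPDATE   = 100      # sync the target network every 100 steps
REPLAY_CAPACITY = 100_000  # replay memory size
BATCH_SIZE      = 32

# Assumed decay shape: exponential; final value and time constant are illustrative
EPS_START, EPS_END, EPS_TAU = 1.0, 0.05, 2_000

def epsilon(step: int) -> float:
    """Exploration rate: fully random during warm-up, then exponential decay."""
    if step < WARMUP_STEPS:
        return 1.0
    return EPS_END + (EPS_START - EPS_END) * math.exp(-(step - WARMUP_STEPS) / EPS_TAU)
```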
## results
### morphological evolution
* The cost function works as intended: the number of modules is not always the maximum if it is more efficient to have fewer modules.
* The average coverage goes up to 80% after 50 generations.
### reinforcement learning
* The success rate starts increasing from episode 125, when ε drops to ~0.08
* The final success rate for DQN is 81.92%
## discussion
### morphological evolution
Improvements have been made to the cost function, the sampling and the representation. The cost function searches for a middle ground between the number of modules and the coverage(, mirroring reinforcement learning). The co-optimization can be customized and the communication between MorphEvo and RL works correctly.
### reinforcement learning
The task is extremely sensitive to the model's hyperparameters: if these are not exactly right, the model will not learn. It is important to normalize the input and to use a high learning rate.
## future work
We won't do this on the poster
### morphological evolution
Further improvements might come from using different encodings, cost functions, mutation steps or even a complete overhaul of the algorithm.
### reinforcement learning