# Implementation
[Discussion Here](https://hackmd.io/uun9tRn9TFq2TfbwNkj9dA)
## Mirage Mazes
### RL Algorithm: TD-lambda
- Action Space:
  - Traverse the maze
    - Up
    - Down
    - Left
    - Right
  - Check a wall for mirage nature
- State Space:
  - Every cell in the maze is a state
- Reward Function:
  - Traversing the maze = $-1$
  - Reaching the goal = $0$
  - Finding a mirage wall = $1 / d_{\text{Manhattan}}(\text{current}, \text{end})$
  - Checking a normal (non-mirage) wall = $-1 / d_{\text{Manhattan}}(\text{current}, \text{end})$
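The reward scheme above can be sketched as a small function. This is a minimal sketch under stated assumptions: cells are `(row, col)` tuples, and the event names (`"move"`, `"goal"`, `"mirage_found"`, `"normal_wall_checked"`) are hypothetical labels, since the document does not fix an interface.

```python
def manhattan(a, b):
    # Manhattan distance between two (row, col) cells
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def reward(current, end, event):
    # event names are placeholders; the doc leaves them unspecified
    if event == "goal":
        return 0.0
    if event == "move":
        return -1.0
    # Wall-check rewards scale inversely with distance to the end cell.
    # (At the end cell itself d would be 0; wall checks are assumed not
    # to happen there, matching the reward table above.)
    d = manhattan(current, end)
    if event == "mirage_found":
        return 1.0 / d
    if event == "normal_wall_checked":
        return -1.0 / d
    raise ValueError(f"unknown event: {event}")
```

Note how checks made far from the end state yield smaller-magnitude rewards, so the agent is pushed to investigate walls near the goal.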
(The rewards, and the functions used to deliver them, are subject to experimentation.)
To be explored: changing the traversal cost to 0 and the reward for reaching the end state to 1. Expectation: finding mirage walls becomes relatively more rewarding, since there is no longer a per-step traversal penalty overshadowing the mirage-wall reward.
**Update**: the reward for finding a mirage wall will have to be delivered after 'k' traversals, so that the effect of the identification can be observed.
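The TD-lambda learner itself can be sketched as a tabular TD(λ) step with accumulating eligibility traces. This is a generic sketch, not the project's actual code; the hyperparameters (`alpha`, `gamma`, `lam`) are placeholder values the document does not specify.

```python
from collections import defaultdict

def td_lambda_step(V, trace, state, next_state, r,
                   alpha=0.1, gamma=0.9, lam=0.8):
    # One tabular TD(lambda) update with accumulating traces.
    # V and trace are defaultdict(float) keyed by state (maze cell).
    delta = r + gamma * V[next_state] - V[state]  # TD error
    trace[state] += 1.0                            # bump trace for current state
    for s in list(trace):
        V[s] += alpha * delta * trace[s]           # credit recent states
        trace[s] *= gamma * lam                    # decay all traces
    return V, trace

# Hypothetical usage over one transition:
V, trace = defaultdict(float), defaultdict(float)
V, trace = td_lambda_step(V, trace, state=(0, 0), next_state=(0, 1), r=-1.0)
```

Because eligibility traces already spread credit backwards over recent states, a mirage-wall reward delivered 'k' steps late would still update the cell where the identification was made, which is one reason TD(λ) fits the delayed-reward scheme described above.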