# PyBullet Ant Environment

## Extra Resources

[Pybullet Website](https://pybullet.org/wordpress/)

[Pybullet Quickstart Guide](https://docs.google.com/document/d/10sXEhzFRSnvFcl3XxNGhnD4N2SedqwdAvK3dsihxVUA/edit#heading=h.2ye70wns7io3)

[Pybullet Gym Environments](https://github.com/benelot/pybullet-gym)

> To access the Pybullet Ant Environment, click [this link](https://github.com/benelot/pybullet-gym/blob/master/pybulletgym/envs/roboschool/robots/locomotors/ant.py)

# State Space

## Agent

![](https://i.imgur.com/s9ho2pl.png)

The Ant model consists of a center body and 4 legs. Each leg is named after its position relative to the center: front left leg, front right leg, back left leg, and back right leg. Each leg has two hinges, which can be manipulated by taking actions. Each hinge connects two links: the `hip` hinge connects the `leg` to the center body, and the `ankle` hinge connects the `foot` to the `leg`.

## Observation space

The returned observation is a 29-dimensional array. Each value in the array is normalized to a value between 0 and 1. The order of the returned values is as follows:

| Indices | Names | Description |
| -------- | -------- | -------- |
| 0-2 | Torso position | (x, y, z) |
| 3-6 | Orientation | (x, y, z, w)* |
| 7-8 | Front left leg joint angles (JA) | (Hip JA), (Ankle JA) |
| 9-10 | Back left leg JAs | (Hip JA), (Ankle JA) |
| 11-12 | Back right leg JAs | (Hip JA), (Ankle JA) |
| 13-14 | Front right leg JAs | (Hip JA), (Ankle JA) |
| 15-17 | Torso velocity | (x, y, z) |
| 18-20 | Torso angular velocity ($\omega$) | (x, y, z) |
| 21-22 | Front left leg velocity ($\omega$) | (Hip velocity), (Ankle velocity) |
| 23-24 | Back left leg velocity | (Hip velocity), (Ankle velocity) |
| 25-26 | Back right leg velocity | (Hip velocity), (Ankle velocity) |
| 27-28 | Front right leg velocity | (Hip velocity), (Ankle velocity) |

> *The format (x, y, z, w) is the quaternion format.
> [To learn more, click me](https://www.youtube.com/watch?v=zjMuIxRvygQ)

# Action Space

The agent has 8 actions available, each of which applies torque to one motor. Each set of actions is represented as an array in the following format:

| Index | Affected Motor |
| -------- | -------- |
| 0 | Front left hip |
| 1 | Front left ankle |
| 2 | Back left hip |
| 3 | Back left ankle |
| 4 | Back right hip |
| 5 | Back right ankle |
| 6 | Front right hip |
| 7 | Front right ankle |

Each element of the array is a number in the range $[-1, 1]$ that specifies the torque applied to the motor at that index.

# Reward Function

## Alive

Being alive is defined as the agent's center sphere being above the ground. Measured from its center, the agent's center sphere has a radius of 0.25. Therefore, we define the agent as "alive" when the center of its center sphere is above the coordinate 0.25.

## Progress

We define an endpoint as a pair of 2-dimensional Euclidean coordinates $(x_f, y_f)$ and the agent's coordinates at a given time as $(x_t, y_t)$. Additionally, we define the difference between the endpoint's $x$ coordinate and the agent's current $x$ position as $x_{\Delta_t} := x_f - x_t$. Similarly, we define the difference between the endpoint's $y$ coordinate and the agent's current $y$ position as $y_{\Delta_t} := y_f - y_t$.

Let the progress at a given time, $P_t$, be defined as the Euclidean norm of the vector $(x_{\Delta_t}, y_{\Delta_t})$ such that

$$ P_t := \sqrt{x_{\Delta_t}^2+y_{\Delta_t}^2}$$

Furthermore, we define $P_{t-1}$ as the previous value of $P$ such that

$$ P_{t-1} := \sqrt{x_{\Delta_{t-1}}^2+y_{\Delta_{t-1}}^2}$$

Therefore, $P_t$ is nothing more than the Euclidean distance between the agent and the endpoint.

## Electricity Cost

Let $a$ be the action inputs for a given step and $a_l$ the length of $a$. Let $C_{S,j}$ be the electricity cost for moving a joint $j$ at speed $S$.
Furthermore, we define the torque cost $C_{\tau}$ as the cost to run electricity through a joint's motor at 0 rotational speed (idle).

> $C_{S,j}$ is calculated by multiplying a constant $C_S$ (hyperparameter: the flat electricity cost of moving a joint per unit of speed) by the speed $S$.

> $C_{\tau,j}$ is calculated from a constant $C_{\tau}$ applied to each joint $j$. $C_{\tau}$ is a hyperparameter.

The total electricity cost is then defined as

$$ C_{E_T} = \frac{1}{a_l}\left(\sum^{a_l}_{j=1}|C_{S,j}| + \sum^{a_l}_{j=1}{C_{\tau,j}^2}\right)$$

## Joints at Limit Cost

Let $a$ be the action inputs for a given step and $a_l$ the length of $a$. Moreover, let $\ell_{max}$ be the maximum position for an internal coordinate (in this code, the range of positions allowed is $[-1,1]$, so $\ell_{max}$ is set to 1). Let $C_l$ be the set of entries of $a$ whose magnitude exceeds $\ell_{max}$:

$$C_l = \{a_j \in a : |a_j| > \ell_{max}\}$$

The reward components are summarized as follows:

| Name | Calculation | Description |
| -------- | -------- | -------- |
| `Alive` | +1 or -1 | +1 if the agent's center is above the coordinate 0.25 (the agent is considered "alive"), otherwise -1. |
| `Progress` | $P_t-P_{t-1}$ | The difference in progress for a given action. |
| `Electricity Cost` | $-C_{E_T} = -\frac{1}{a_l}\left(\sum^{a_l}_{j=1}\lvert C_{S,j}\rvert+\sum^{a_l}_{j=1}{C_{\tau,j}^2}\right)$ | The total electrical cost for performing an action with motors. |
| `Joints at Limit Cost` | $-\lvert C_l\rvert$ (see Joints at Limit Cost section) | The number of joints that exceed the maximum position for each joint's internal coordinate. |
| `Feet Collision Cost` | `0` | The reward gained for colliding feet. This value is a work in progress. |

# Goal of the Environment

The goal of the environment is to reach the specified end position (1 kilometer away) in the fewest steps possible, since the electricity cost penalizes every additional step.

# Environmental Dynamics

## Terminal states

The terminal status determines whether the experiment continues.
If `Alive` is `False`, the `terminal` condition is set to `True`, which ends the experiment.

## Air Resistance

Currently, the environment does not factor in air resistance.
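
To tie the reward terms and the terminal condition together, here is a minimal Python sketch that follows the formulas exactly as written in this document. It is not the code from the pybullet-gym repository: the function names, the constants `SPEED_COST` and `TORQUE_COST` (standing in for $C_S$ and $C_{\tau}$), and their values are hypothetical and chosen only for illustration. It also assumes that the "coordinate 0.25" refers to the height of the torso center and that $C_{\tau,j}$ is the same constant for every joint.

```python
import math

# Hypothetical hyperparameter values, for illustration only.
SPEED_COST = 0.1      # stands in for C_S
TORQUE_COST = 0.01    # stands in for C_tau
ALIVE_HEIGHT = 0.25   # radius of the center sphere, from the Alive section
L_MAX = 1.0           # maximum allowed position, from the Joints at Limit section


def distance_to_goal(pos, goal):
    """P_t: Euclidean distance between the agent (x_t, y_t) and the endpoint (x_f, y_f)."""
    return math.hypot(goal[0] - pos[0], goal[1] - pos[1])


def alive_bonus(z):
    """+1 if the torso center is above ALIVE_HEIGHT, otherwise -1."""
    return 1.0 if z > ALIVE_HEIGHT else -1.0


def electricity_cost(joint_speeds):
    """C_E_T: mean of the per-joint speed cost and squared torque cost."""
    a_l = len(joint_speeds)
    speed_term = sum(abs(SPEED_COST * s) for s in joint_speeds)
    # Assumption: C_tau_j equals the constant TORQUE_COST for every joint.
    torque_term = sum(TORQUE_COST ** 2 for _ in joint_speeds)
    return (speed_term + torque_term) / a_l


def joints_at_limit(action):
    """|C_l|: number of entries of the action whose magnitude exceeds L_MAX."""
    return sum(1 for a_j in action if abs(a_j) > L_MAX)


def step_reward(z, pos, prev_pos, goal, action, joint_speeds):
    """Sum of the rows of the reward table (feet collision cost is 0)."""
    progress = distance_to_goal(pos, goal) - distance_to_goal(prev_pos, goal)
    return (
        alive_bonus(z)
        + progress
        - electricity_cost(joint_speeds)
        - joints_at_limit(action)
        + 0.0  # feet collision cost, currently a placeholder
    )


def is_terminal(z):
    """The episode ends as soon as the agent is no longer alive."""
    return alive_bonus(z) < 0
```

For example, `step_reward(0.3, (0, 0), (0, 0), (3, 4), [0, 0], [0, 0])` combines an alive bonus with zero progress and only the idle torque cost, while `is_terminal(0.1)` reports that the episode should end.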