Integrate UnrealCV with OpenAI Gym for Reinforcement Learning (RL)
===
In this tutorial, we show how to get started with installing the environment, adding a new environment for a specific RL task, and training a DQN model for visual navigation in a realistic room.

![search1](https://i.imgur.com/esXQ0tI.gif)![search2](https://i.imgur.com/fPVfRVt.gif)

Installation
===
For performance reasons, we use [nvidia-docker](https://github.com/NVIDIA/nvidia-docker) to run the Unreal environment. Since ```nvidia-docker``` only supports ```Linux``` with an ```NVIDIA GPU```, you will have to install and run our OpenAI Gym environment on a ```Linux``` system with an ```NVIDIA GPU```.

## Docker
As the Unreal environment with UnrealCV runs inside a Docker container, you need to install [docker](https://docs.docker.com/engine/installation/linux/ubuntu/#install-from-a-package) first. On Linux, you can run the script below:
```
curl -sSL http://acs-public-mirror.oss-cn-hangzhou.aliyuncs.com/docker-engine/internet | sh -
```
Once Docker is installed successfully, you should be able to run ```docker ps``` and get something like this:
```
$ docker ps
CONTAINER ID        IMAGE               COMMAND             CREATED             STATUS              PORTS               NAMES
```
To speed up the frame rate of the environment, you need to install [nvidia-docker](https://github.com/NVIDIA/nvidia-docker/wiki) to utilize the NVIDIA GPU inside Docker:
```
wget -P /tmp https://github.com/NVIDIA/nvidia-docker/releases/download/v1.0.1/nvidia-docker_1.0.1-1_amd64.deb
sudo dpkg -i /tmp/nvidia-docker*.deb && rm /tmp/nvidia-docker*.deb
```
Test nvidia-docker:
```
nvidia-docker run --rm nvidia/cuda nvidia-smi
```
You should get the same result as running ```nvidia-smi``` on your host.

## OpenAI Gym
Install OpenAI Gym:
```
git clone https://github.com/openai/gym
cd gym
pip install -e .
```
If you prefer, you can do a minimal install of the packaged version directly from PyPI:
```
pip install gym
```
Install gym-unrealcv:
```
git clone https://github.com/zfw1226/gym-unrealcv.git
cd gym-unrealcv
pip install -e .
```

Run a simple environment
===
Before you run any examples, please run:
```
xhost +
```
Once ```gym-unrealcv``` is installed successfully, you can watch an agent walking around randomly in first-person view by running:
```
cd example/random
python random_agent.py
```
The first run will take a few minutes to pull the Docker image and the realistic environment. After that, if all goes well, a simple predefined gym environment, ```Unrealcv-Simple-v0```, will be launched, and you will see the agent moving around the realistic room randomly.

Add a new UnrealCV Environment
===
In this section, we show how to add a new UnrealCV environment to OpenAI Gym for your own RL task, step by step.

1. Copy your new Unreal environment to ```/gym-unrealcv/gym_unrealcv/envs/UnrealEnv```.
2. Create a new Python file in ```/gym-unrealcv/gym_unrealcv/envs``` and write your environment in this file.
A simple environment, [unrealcv_simple.py](), is available for reference. The details of the code are shown below:
```python
import gym                          # openai gym interface
from unrealcv_cmd import UnrealCv   # a lib for using unrealcv client commands
import run_docker                   # a lib for running the env in a docker container
import numpy as np
import math

'''
State : color image
Action: (linear velocity, angle velocity)
Done  : collision detected or target place reached
'''
class UnrealCvSimple(gym.Env):
    # init the Unreal Gym Environment
    def __init__(self,
                 ENV_DIR_BIN='/RealisticRendering/RealisticRendering/Binaries/Linux/RealisticRendering',
                 ):
        self.cam_id = 0
        # run the virtual environment in a docker container
        self.docker = run_docker.RunDocker()
        env_ip, env_dir = self.docker.start(ENV_DIR_BIN=ENV_DIR_BIN)
        # connect the unrealcv client to the server
        self.unrealcv = UnrealCv(self.cam_id, ip=env_ip, env=env_dir)
        self.startpose = self.unrealcv.get_pose()
        # ACTION: (linear velocity, angle velocity)
        self.ACTION_LIST = [
            (20,   0),
            (20,  15),
            (20, -15),
            (20,  30),
            (20, -30),
        ]
        self.count_steps = 0
        self.max_steps = 100
        self.target_pos = (-60, 0, 50)
        self.action_space = gym.spaces.Discrete(len(self.ACTION_LIST))
        state = self.unrealcv.read_image(self.cam_id, 'lit')
        self.observation_space = gym.spaces.Box(low=0, high=255, shape=state.shape)

    # update the environment step by step
    def _step(self, action=0):
        (velocity, angle) = self.ACTION_LIST[action]
        self.count_steps += 1
        collision = self.unrealcv.move(self.cam_id, angle, velocity)
        reward, done = self.reward(collision)
        state = self.unrealcv.read_image(self.cam_id, 'lit')

        # limit the max steps per episode
        if self.count_steps > self.max_steps:
            done = True
            print('Reach Max Steps')

        return state, reward, done, {}

    # reset the environment
    def _reset(self):
        x, y, z, yaw = self.startpose
        self.unrealcv.set_position(self.cam_id, x, y, z)
        self.unrealcv.set_rotation(self.cam_id, 0, yaw, 0)
        state = self.unrealcv.read_image(self.cam_id, 'lit')
        self.count_steps = 0
        return state

    # close docker while closing openai gym
    def _close(self):
        self.docker.close()

    # calculate the reward according to your task
    def reward(self, collision):
        done = False
        reward = -0.01
        if collision:
            done = True
            reward = -1
            print('Collision Detected!!')
        else:
            distance = self.cauculate_distance(self.target_pos, self.unrealcv.get_pos())
            if distance < 50:
                reward = 10
                done = True
                print('Get Target Place!')
        return reward, done

    # calculate the 2D distance between the target and the camera
    def cauculate_distance(self, target, current):
        error = abs(np.array(target) - np.array(current))[:2]  # only x and y
        distance = math.sqrt(sum(error * error))
        return distance
```
**You should modify ```ENV_DIR_BIN``` to the path of your Unreal binary.** As in other gym environments, ```step()``` and ```reset()``` are required; ```close()``` shuts down the Unreal environment when you close the gym environment. Unlike the standard environments, you need to design the reward function in ```reward()``` for your own task.
3. Import your environment into the ```__init__.py``` file of the collection, located at ```/gym-unrealcv/gym_unrealcv/envs/__init__.py```. Add ```from gym_unrealcv.envs.your_env import YourEnv``` to this file.
4. Register your env in ```gym-unrealcv/gym_unrealcv/__init__.py``` (a registration sketch is shown after this list).
5. You can test your environment with a random agent:
```
cd example/random
python random_agent.py -e YOUR_ENV_NAME
```
You will see your agent take random actions and receive the reward defined in your new environment.
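As a reference for steps 3 and 4, here is a minimal sketch of what the registration in ```gym-unrealcv/gym_unrealcv/__init__.py``` could look like. The environment id ```UnrealCvYourEnv-v0``` and the ```kwargs``` below are placeholders rather than names from the actual repository; use whatever matches your environment.
```python
# gym-unrealcv/gym_unrealcv/__init__.py (sketch)
# 'UnrealCvYourEnv-v0' and the kwargs below are placeholders for your own environment.
from gym.envs.registration import register

register(
    id='UnrealCvYourEnv-v0',
    entry_point='gym_unrealcv.envs:YourEnv',
    # keyword arguments passed to YourEnv.__init__, e.g. the path of your Unreal binary
    kwargs={'ENV_DIR_BIN': '/YourEnv/Binaries/Linux/YourEnv'},
)
```
After this registration, ```gym.make('UnrealCvYourEnv-v0')``` (and ```python random_agent.py -e UnrealCvYourEnv-v0```) should resolve to your new class.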
Run a reinforcement learning example
===
Besides, we provide an example of training an agent for visual navigation, searching for a specific object while avoiding obstacles, in the [Unrealcv-Search-v0]() environment using [Deep Q-Learning](https://www.cs.toronto.edu/~vmnih/docs/dqn.pdf).

### Dependencies
To run this example, you should make sure that you have installed all of the dependencies. We recommend using [anaconda](https://www.continuum.io/downloads) to install and manage your Python environment.
- Keras (tested with v1.2)
- Theano or TensorFlow
- OpenAI Gym (>=v0.7)
- cv2
- matplotlib
- numpy

To use Keras (v1.2), you should run:
```
pip install keras==1.2
```
You can start the training process with the default parameters by running the following script:
```
cd example/dqn
python run.py
```
You will see a window like this:

![show](https://i.imgur.com/HyOVKD4.png)

When the ```Collision``` button turns red, a collision has been detected. When the ```Trigger``` button turns red, the agent is taking an action to ask the environment whether it is seeing the target from the right place.
You can change some parameters in [```example/dqn/constant.py```]().
You can change the architecture of the DQN in [```example/dqn/dqn.py```]().

Visualization
===
You can display a graph showing the history of episode rewards by running the following script:
```
cd example/utility
python reward.py
```
![reward](https://i.imgur.com/W039bbs.jpg)

You can display a graph showing the trajectories by running the following script:
```
cd example/utility
python trajectory.py
```
![trajectory](https://i.imgur.com/PKpKHNR.png)
- The ```green points``` mark where the agents realized that they had found a good view of the target object and received a positive reward from the environment. At that point, the episode finished.
- The ```purple points``` mark where a collision was detected and the agents received a negative reward. At that point, the episode terminated.
- The ```red lines``` are the trajectories of agents that found the target object successfully in the end.
- The ```black lines``` are the trajectories of agents that did not find the target object in the end.
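Finally, if you want to interact with an environment directly instead of going through ```example/dqn/run.py```, the sketch below shows the plain gym interaction loop that any agent (random or DQN) is built on. It only assumes the ```Unrealcv-Search-v0``` id mentioned above; the random policy is a stand-in for your own agent, not the actual DQN code.
```python
# A minimal interaction sketch using the standard gym interface.
# This is not the DQN agent in example/dqn; it only takes random actions
# to illustrate the reset/step loop and the reward signal.
import gym
import gym_unrealcv  # importing the package registers the UnrealCV environments

env = gym.make('Unrealcv-Search-v0')
for episode in range(5):
    state = env.reset()
    done = False
    episode_reward = 0
    while not done:
        action = env.action_space.sample()   # replace with your agent's policy
        state, reward, done, info = env.step(action)
        episode_reward += reward
    print('Episode %d finished with total reward %.2f' % (episode, episode_reward))
env.close()
```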