# [MountainCar-v0](https://gym.openai.com/envs/MountainCar-v0/)

**Environment:**

![](https://i.imgur.com/tdKhe00.png)

**Reward:** -1 for every step taken, until the car reaches the goal (position 0.5).

**Starting State:** the car starts at a random position in [-0.6, -0.4] with velocity 0.

**Episode Termination:** the episode ends when the car reaches the goal or after 250 steps.

```python=
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

import gym

env = gym.make("MountainCar-v0")
env.reset()

plt.imshow(env.render('rgb_array'))
print("Observation space:", env.observation_space)
print("Action space:", env.action_space)
```

Observation space: Box(2,)
Action space: Discrete(3)

![](https://i.imgur.com/H46IM6j.png)

```python=
obs0 = env.reset()
print("initial observation code:", obs0)

# Note: in MountainCar, the observation is just two numbers: car position and velocity
```

initial observation code: [-0.42123736  0.        ]

```python=
print("taking action 2 (right)")
new_obs, reward, is_done, _ = env.step(2)

print("new observation code:", new_obs)
print("reward:", reward)
print("is game over?:", is_done)

# Note: the car has moved slightly to the right (position increased by about 0.0002)
```

taking action 2 (right)
new observation code: [-4.20993065e-01  2.44298593e-04]
reward: -1.0
is game over?: False

Hint: your action at each step should depend either on t or on s.

```python=
from IPython import display

# create env manually to set time limit. Please don't change this.
TIME_LIMIT = 250
env = gym.wrappers.TimeLimit(
    gym.envs.classic_control.MountainCarEnv(),
    max_episode_steps=TIME_LIMIT + 1,
)
s = env.reset()
actions = {'left': 0, 'stop': 1, 'right': 2}

plt.figure(figsize=(4, 3))
display.clear_output(wait=True)

for t in range(TIME_LIMIT):
    plt.gca().clear()

    # change the lines below to reach the flag: first let the car roll back
    # to the left to build momentum, then accelerate to the right
    if 50 < t < 100:
        s, r, done, _ = env.step(actions['left'])
    else:
        s, r, done, _ = env.step(actions['right'])

    # draw game image on display
    plt.imshow(env.render('rgb_array'))
    display.clear_output(wait=True)
    display.display(plt.gcf())

    if done:
        print("Well done!")
        break
else:
    print("Time limit exceeded. Try again.")

display.clear_output(wait=True)
```

![](https://i.imgur.com/89bXNa0.png)

```python=
assert s[0] > 0.47
print("You solved it!")
```

You solved it!

time_limit.py

```python=
import gym


class TimeLimit(gym.Wrapper):
    def __init__(self, env, max_episode_steps=None):
        super(TimeLimit, self).__init__(env)
        # guard: manually constructed envs (like the MountainCarEnv above) have spec == None
        if max_episode_steps is None and self.env.spec is not None:
            max_episode_steps = env.spec.max_episode_steps
        if self.env.spec is not None:
            self.env.spec.max_episode_steps = max_episode_steps
        self._max_episode_steps = max_episode_steps
        self._elapsed_steps = None

    def step(self, action):
        assert self._elapsed_steps is not None, "Cannot call env.step() before calling reset()"
        observation, reward, done, info = self.env.step(action)
        self._elapsed_steps += 1
        if self._elapsed_steps >= self._max_episode_steps:
            # mark episodes cut off by the step cap, then force termination
            info['TimeLimit.truncated'] = not done
            done = True
        return observation, reward, done, info

    def reset(self, **kwargs):
        self._elapsed_steps = 0
        return self.env.reset(**kwargs)
```
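One detail worth calling out in the wrapper above: when the step cap fires, it writes `info['TimeLimit.truncated']`, so callers can tell a time-limit cutoff apart from a genuine terminal state (useful, for example, when deciding whether to bootstrap a value estimate from the last state). A minimal sketch of reading that flag, assuming the `env` built earlier; the random-action loop is just for illustration:

```python=
# run one episode with random actions and check why it ended
s = env.reset()
done, info = False, {}
while not done:
    s, r, done, info = env.step(env.action_space.sample())

# the wrapper only writes this key when the step cap is hit
truncated = info.get('TimeLimit.truncated', False)
print("ended by time limit:", truncated)
```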
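Finally, a note on the control loop earlier: the hard-coded time windows happen to work, but a policy that depends on `s` rather than `t` is more robust. The standard trick for MountainCar is to always accelerate in the direction of the current velocity, pumping energy into each swing until the car clears the hill. A sketch of that idea, reusing `env`, `actions`, and `TIME_LIMIT` from above (an alternative, not the notebook's required solution):

```python=
s = env.reset()
for t in range(TIME_LIMIT):
    # push in the direction the car is already moving,
    # so every swing gains a little more amplitude
    velocity = s[1]
    a = actions['right'] if velocity >= 0 else actions['left']
    s, r, done, _ = env.step(a)
    if done:
        break
print("final position:", s[0])
```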