# Heroic RL Study

GitHub link: https://github.com/Nordeus/heroic-rl

## Using Podman

Open the server:
```
podman run -it -d --network host --name="server" quay.io/nordeus/heroic-rl-server:latest
```
Open the agent and train:
```
podman run -it --network host -v $PWD/data:/app/data --name="agent" --gpus all quay.io/nordeus/heroic-rl-agent "train -e 1000 agent1"
```
The ```data``` directory has to be created the first time:
```
mkdir data
chmod 777 data
```

### Tensorboard
```
podman run -d \
  --shm-size 8G \
  -it \
  -v $PWD:/home \
  --name="test" \
  -p 6006:6006 \
  docker.io/tensorflow/tensorflow:1.14.0-gpu-py3

podman attach test
tensorboard --logdir /home/data/agent1
```
Then open the browser at ```nv04:6006```.

### Visualize
```
podman run -it --network host -v $PWD/data:/app/data --name="render" --gpus all quay.io/nordeus/heroic-rl-agent "render data/agent1/agent1_s1673683865/agent_1/simple_save60"
```

### Resume
```
podman run -it --network host -v $PWD/data:/app/data --name="agent_resume" --gpus all quay.io/nordeus/heroic-rl-agent "resume data/agent1/agent1_s1673683865"
```

## Using Local

### Install

https://python-poetry.org/docs/

* ```(*)```: can be done outside the conda env.
* ```(#)```: not needed on NV.

```
conda create -n hero python=3.6
conda activate hero
(*)  git clone https://github.com/Nordeus/heroic-rl
(*#) sudo apt-get install python3-venv python3-setuptools python3-dev gcc libopenmpi-dev
(*)  curl -sSL https://install.python-poetry.org | POETRY_VERSION=1.1.15 python3 -
export PATH="/home/yucheng/.local/bin:$PATH"   # NV's path will be different
poetry --version                               # just for checking
cd heroic-rl
poetry install -E gpu
```

### Open server
```
podman run -it -d --name="server" -p 8081:8081 quay.io/nordeus/heroic-rl-server:latest
```

### Open client
```
poetry run heroic-rl train agent1
```

### Open tensorboard
```
podman run -d \
  --shm-size 8G \
  -it \
  -v $PWD:/home \
  --name="test" \
  -p 6006:6006 \
  docker.io/tensorflow/tensorflow:1.14.0-gpu-py3

podman attach test
tensorboard --logdir /home/data/agent1
```
Then open the browser at ```127.0.0.1:6006```.

### Troubleshoot

https://developer.nvidia.com/rdp/cudnn-archive

If you hit ```Could not load dynamic library 'libcudart.so.10.0'; dlerror: libcudart.so.10.0: cannot open ...```, then:
```
conda install cudatoolkit=10.0
```
If you hit ```Could not load dynamic library 'libcudnn.so.7'```, install cuDNN 7 and add it to conda's paths. For example:
```
tar -xvf cudnn-10.0-linux-x64-v7.6.5.32.tgz
sudo cp cuda/include/cudnn*.h /home/yucheng/anaconda3/envs/heroic/include
sudo cp cuda/lib64/libcudnn* /home/yucheng/anaconda3/envs/heroic/lib
sudo chmod a+r /home/yucheng/anaconda3/envs/heroic/include/cudnn*.h /home/yucheng/anaconda3/envs/heroic/lib/libcudnn*
```
Finally, check whether it succeeded:
```
import tensorflow as tf
tf.test.is_gpu_available()
```
This returns ```True``` on success.

If mpi4py fails to install when running ```poetry install -E gpu```, install mpi4py 3.0.3 directly:
```
conda install -c "conda-forge/label/cf202003" mpi4py
```
After that, ```poetry install``` should succeed.
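### Sanity check

Before launching training locally, it can help to confirm that the server container is actually listening and that TensorFlow can see the GPU (the libcudart/libcudnn problems above both show up at this point). Below is a minimal sketch, not part of the heroic-rl repo: the script name ```check_setup.py```, the ```127.0.0.1:8081``` default (matching the ```-p 8081:8081``` server command above), and the timeout are all assumptions for illustration.
```
# check_setup.py -- hypothetical helper script, not part of the heroic-rl repo.
import socket

import tensorflow as tf


def server_reachable(host="127.0.0.1", port=8081, timeout=3.0):
    """Return True if the heroic-rl-server container accepts TCP connections."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    print("server reachable:", server_reachable())
    # TF 1.x API: returns True only when the CUDA libraries load and a GPU is visible.
    print("GPU available:", tf.test.is_gpu_available())
```
Run it inside the environment (```poetry run python check_setup.py```); if either line prints ```False```, revisit the steps above before starting ```heroic-rl train```.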
# Function Call

* Step 1: program entry point: heroic_rl/cli/\_\_init\_\_.py
* Step 2: runs heroic_rl/cli/commands.py, which provides five commands: train, resume, serve, render, simulate
* Step 3: train (line 300) -> TrainingCfg (in heroic_rl/train/cfg.py) -> run (in heroic_rl/train/experiment.py)

## Training

* Step 1: In heroic_rl/train/experiment.py, ```def run()``` (line 7)
* Step 2: ```agent.run()``` (line 83)
* Step 3: In heroic_rl/agent/agents.py, ```def run()``` (line 438)

## Adversary while training

In heroic_rl/train/plan.py:

* Step 1: the agent calls ```cfg.create_plan()``` (in heroic_rl/agent/agents.py, line 484)
* Step 2: In heroic_rl/train/cfg.py, ```def create_plan()``` (line 570); lines 501 and 528 show that PLAN = "utility"
* Step 3: In heroic_rl/train/plan.py, ```def utility()``` (line 296)

In heroic_rl/train/enums.py, ```class Brain``` (line 238):

* UTILITY, LOOKAHEAD -> pre-defined AI
* DUMMY -> self-play

## Adversary while rendering

UTILITY_9 (in heroic_rl/render/tui.py, line 132)

# Deck Building

class DeckRepository:
* heroic_rl/train/decks.py

Deck loading:
* heroic_rl/train/cfg.py calls from_csv (line 654)
* DEFAULT_DECKS_CSV_PATH = "decks.csv" (lines 504, 531)
* https://github.com/Nordeus/heroic-rl/blob/master/decks.csv

Random pick: https://github.com/Nordeus/heroic-rl/blob/master/heroic_rl/gym_heroic/envs/heroic_env.py

![](https://i.imgur.com/tiyHRm5.png)

# Characters

Defined in heroic_rl/train/enums.py

![](https://i.imgur.com/f5PAGnC.jpg)

Only 25 of the 60 characters appear.

https://github.com/Nordeus/heroic-rl/blob/master/heroic_rl/train/enums.py

# Game Play

![](https://i.imgur.com/YXEH5Jo.png)
![](https://i.imgur.com/XG5wBvE.png)

# RL Environment

* U = 56: the total number of units.
* L = 3: the number of lanes.
* Each lane is discretized by splitting it into D = 10 bins of equal length.
* S = 25: the number of available spells.
* A = U + S + 1: the total number of actions in the game.

## Observations

https://github.com/Nordeus/heroic-rl/blob/master/heroic_rl/train/obs.py

An observation is (Os, Ons): Os is the spatial component and Ons is the non-spatial component.

Os: D x L x 2U
![](https://i.imgur.com/MD12WHv.png =65%x)

Ons: A + 3
![](https://i.imgur.com/GDoVbDg.png =65%x)

## Actions

https://github.com/Nordeus/heroic-rl/blob/master/heroic_rl/agent/agents.py

![](https://i.imgur.com/swRQOZO.png =65%x)

## Network

https://github.com/Nordeus/heroic-rl/blob/master/heroic_rl/algos/layers.py
https://github.com/Nordeus/heroic-rl/blob/master/heroic_rl/algos/core_ppo_heroic.py

![](https://i.imgur.com/9a1W6Vx.png)

Left: two-headed policy network. Center: single-headed policy network. Right: value network.

* a: the probability of each action (non-spatial)
* z: the position where the action is executed, spatial (x, y)
* z is conditioned on a

## Reward

https://github.com/Nordeus/heroic-rl/blob/master/heroic_rl/train/rewards.py

Win: +1, loss: -1. Monte-Carlo return with a discount factor.

![](https://i.imgur.com/js6xfNw.png =65%x)

## Training Curriculum

1. rule-based (heuristic) AI
2. tree-search AI
3. self-play with an ensemble of policies

## Hyperparameters and Settings

https://github.com/Nordeus/heroic-rl/blob/master/heroic_rl/train/cfg.py
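## Shapes and Return (sketch)

To make the numbers in the RL Environment and Reward sections concrete, here is a small illustrative sketch, not code from the repo: it builds zero-filled observation tensors of the stated shapes and computes the discounted Monte-Carlo return for a won episode. The discount value 0.99 is an assumed placeholder; the actual hyperparameters live in heroic_rl/train/cfg.py.
```
import numpy as np

# Constants from the "RL Environment" section.
U = 56            # total number of units
L = 3             # number of lanes
D = 10            # bins per lane
S = 25            # available spells
A = U + S + 1     # total number of actions (82)

# Observation shapes from the "Observations" section.
obs_spatial = np.zeros((D, L, 2 * U), dtype=np.float32)  # Os: D x L x 2U
obs_nonspatial = np.zeros((A + 3,), dtype=np.float32)    # Ons: A + 3


def discounted_returns(rewards, gamma=0.99):
    """Monte-Carlo return G_t = sum_k gamma^k * r_{t+k}, computed backwards.

    gamma=0.99 is only an assumed example value.
    """
    returns = np.zeros(len(rewards), dtype=np.float32)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


# A won episode: intermediate rewards are 0, terminal reward is +1.
print(obs_spatial.shape, obs_nonspatial.shape)   # (10, 3, 112) (85,)
print(discounted_returns([0.0, 0.0, 0.0, 1.0]))  # approx. [0.9703 0.9801 0.99 1.0]
```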