# Heroic RL Research
GitHub repository:
https://github.com/Nordeus/heroic-rl
## Using Podman
Start the server:
```
podman run -it -d --network host --name="server" quay.io/nordeus/heroic-rl-server:latest
```
Start the agent and train:
```
podman run -it --network host -v $PWD/data:/app/data --name="agent" --gpus all quay.io/nordeus/heroic-rl-agent "train -e 1000 agent1"
```
Before the first run, create the ```data``` directory and make it writable:
```
mkdir data
chmod 777 data
```
### Tensorboard
```
podman run -d \
--shm-size 8G \
-it \
-v $PWD:/home \
--name="test" \
-p 6006:6006 \
docker.io/tensorflow/tensorflow:1.14.0-gpu-py3
podman attach test
# inside the container:
tensorboard --logdir /home/data/agent1
# then open http://nv04:6006 in a browser
```
### Visualize
```
podman run -it --network host -v $PWD/data:/app/data --name="render" --gpus all quay.io/nordeus/heroic-rl-agent "render data/agent1/agent1_s1673683865/agent_1/simple_save60"
```
### Resume
```
podman run -it --network host -v $PWD/data:/app/data --name="agent_resume" --gpus all quay.io/nordeus/heroic-rl-agent "resume data/agent1/agent1_s1673683865"
```
## Using a Local Install
### Install
Poetry documentation: https://python-poetry.org/docs/
(*) : can be done outside the conda environment.
(#) : not needed on NV.
```
conda create -n hero python=3.6
conda activate hero
git clone https://github.com/Nordeus/heroic-rl  # (*)
sudo apt-get install python3-venv python3-setuptools python3-dev gcc libopenmpi-dev  # (*#)
curl -sSL https://install.python-poetry.org | POETRY_VERSION=1.1.15 python3 -  # (*)
export PATH="/home/yucheng/.local/bin:$PATH"  # the path on NV will be different
poetry --version  # optional: check that Poetry is installed
cd heroic-rl
poetry install -E gpu
```
### Start the server
```
podman run -it -d --name="server" -p 8081:8081 quay.io/nordeus/heroic-rl-server:latest
```
### Start the client
```
poetry run heroic-rl train agent1
```
### Start TensorBoard
```
podman run -d \
--shm-size 8G \
-it \
-v $PWD:/home \
--name="test" \
-p 6006:6006 \
docker.io/tensorflow/tensorflow:1.14.0-gpu-py3
podman attach test
# inside the container:
tensorboard --logdir /home/data/agent1
# then open http://127.0.0.1:6006 in a browser
```
### Troubleshoot
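If you see ```Could not load dynamic library 'libcudart.so.10.0'; dlerror: libcudart.so.10.0: cannot open ...```
then install the CUDA 10.0 toolkit into the conda environment:
```
conda install cudatoolkit=10.0
```
If you see ```Could not load dynamic library 'libcudnn.so.7```
then download cuDNN 7 from the archive (https://developer.nvidia.com/rdp/cudnn-archive) and copy it into the conda environment. Adjust the paths to your own environment; the example below uses one named ```heroic```: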
```
tar -xvf cudnn-10.0-linux-x64-v7.6.5.32.tgz
sudo cp cuda/include/cudnn*.h /home/yucheng/anaconda3/envs/heroic/include
sudo cp cuda/lib64/libcudnn* /home/yucheng/anaconda3/envs/heroic/lib
sudo chmod a+r /home/yucheng/anaconda3/envs/heroic/include/cudnn*.h /home/yucheng/anaconda3/envs/heroic/lib/libcudnn*
```
Finally, check in a Python shell whether TensorFlow can see the GPU:
```
import tensorflow as tf
tf.test.is_gpu_available()
```
This returns ```True``` on success.
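If the check returns ```False```, it can help to test each shared library on its own. The following is a minimal sketch, assuming a Linux system; the helper name `can_load` is hypothetical and the library names are copied from the error messages above. A failure suggests the library is not on the dynamic loader's search path. A success does not guarantee that TensorFlow will find the same file.

```python
import ctypes


def can_load(library_name):
    """Return True if the dynamic loader can find and open the library."""
    try:
        ctypes.CDLL(library_name)
        return True
    except OSError:
        return False


# Library names taken from the error messages in this section.
for name in ("libcudart.so.10.0", "libcudnn.so.7"):
    print(name, "OK" if can_load(name) else "NOT FOUND")
```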
If the mpi4py installation fails while running ```poetry install -E gpu```
then install mpi4py 3.0.3 directly:
```
conda install -c "conda-forge/label/cf202003" mpi4py
```
After that, ```poetry install -E gpu``` should succeed.
# Function Call
Step 1:
Program entry point: heroic_rl/cli/\_\_init\_\_.py
Step 2:
heroic_rl/cli/commands.py runs and dispatches to one of five commands: train, resume, serve, render, simulate.
Step 3:
train (line 300) -> TrainingCfg (in heroic_rl/train/cfg.py) -> run (in heroic_rl/train/experiment.py)
## Training
Step 1:
In heroic_rl/train/experiment.py, ```def run()``` (line 7)
Step 2:
It calls ```agent.run()``` (line 83)
Step 3:
In heroic_rl/agent/agents.py, ```def run()``` (line 438)
## Adversary while training
The adversary plan is defined in heroic_rl/train/plan.py.
Step 1:
The agent calls ```cfg.create_plan()``` (heroic_rl/agent/agents.py, line 484)
Step 2:
In heroic_rl/train/cfg.py, ```def create_plan()``` (line 570)
Lines 501 and 528 show that PLAN = "utility"
Step 3:
In heroic_rl/train/plan.py, ```def utility()``` (line 296)
In heroic_rl/train/enums.py, ```class Brain``` (line 238):
UTILITY, LOOKAHEAD -> predefined AI
DUMMY -> self-play
## Adversary while rendering
The adversary used for rendering is UTILITY_9
(heroic_rl/render/tui.py, line 132)
# Deck Building
```class DeckRepository```:
* defined in heroic_rl/train/decks.py

Loading decks:
* heroic_rl/train/cfg.py calls ```from_csv``` (line 654)
* DEFAULT_DECKS_CSV_PATH = "decks.csv" (lines 504, 531)
* https://github.com/Nordeus/heroic-rl/blob/master/decks.csv

Random deck selection:
* https://github.com/Nordeus/heroic-rl/blob/master/heroic_rl/gym_heroic/envs/heroic_env.py

# Characters
Defined in heroic_rl/train/enums.py

25 of the 60 characters appear.
https://github.com/Nordeus/heroic-rl/blob/master/heroic_rl/train/enums.py
# Game Play


# RL Environment
* U = 56: total number of units.
* L = 3: the number of lanes.
* D = 10: each lane is discretized into D bins of equal length.
* S = 25: the number of available spells.
* A = U + S + 1 = 82: the total number of actions in the game.
## Observations
https://github.com/Nordeus/heroic-rl/blob/master/heroic_rl/train/obs.py
The observation is a pair (Os, Ons): Os is the spatial component and Ons is the non-spatial component.
Os has shape D x L x 2U

Ons has length A + 3
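
As a quick sanity check on these sizes, the sketch below derives the action count and both observation shapes from the constants in the RL Environment section. The variable names are illustrative and are not taken from `obs.py`.

```python
# Constants from the "RL Environment" section (names are illustrative).
U = 56  # total number of units
L = 3   # number of lanes
D = 10  # bins per lane
S = 25  # number of available spells

A = U + S + 1                  # total number of actions
spatial_shape = (D, L, 2 * U)  # Os: D x L x 2U
non_spatial_size = A + 3       # Ons: A + 3

print(A, spatial_shape, non_spatial_size)  # 82 (10, 3, 112) 85
```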

## Actions
https://github.com/Nordeus/heroic-rl/blob/master/heroic_rl/agent/agents.py

## Network
https://github.com/Nordeus/heroic-rl/blob/master/heroic_rl/algos/layers.py
https://github.com/Nordeus/heroic-rl/blob/master/heroic_rl/algos/core_ppo_heroic.py

Left: two-headed policy network
Center: single-headed policy network
Right: value network
a: the probability of each action (non-spatial)
z: the location where the action is executed (spatial, (x, y))
z is conditioned on a
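
To illustrate what "z is conditioned on a" means, here is a minimal sketch of sampling from a two-headed policy: first draw the action a, then draw the location z from a distribution selected by a. It uses random logits in place of real network outputs. The function names, the per-action spatial logits, and the mapping of a flat cell index to (x, y) are all assumptions made for illustration, not the code in `core_ppo_heroic.py`.

```python
import math
import random

A, D, L = 82, 10, 3  # action count and lane grid from the RL Environment section


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_action(action_logits, spatial_logits, rng):
    """Sample a, then z given a; return (a, (x, y), joint log-probability).

    action_logits:  list of A floats (hypothetical action-head output).
    spatial_logits: A lists of D * L floats, one flattened D x L map per
                    action, so the distribution of z depends on a.
    """
    p_a = softmax(action_logits)
    a = rng.choices(range(A), weights=p_a)[0]
    p_z = softmax(spatial_logits[a])      # distribution over D * L cells
    cell = rng.choices(range(D * L), weights=p_z)[0]
    x, y = divmod(cell, L)                # assumed layout: x = bin, y = lane
    # log p(a, z) = log p(a) + log p(z | a)
    log_prob = math.log(p_a[a]) + math.log(p_z[cell])
    return a, (x, y), log_prob


rng = random.Random(0)
action_logits = [rng.gauss(0, 1) for _ in range(A)]
spatial_logits = [[rng.gauss(0, 1) for _ in range(D * L)] for _ in range(A)]
a, z, log_prob = sample_action(action_logits, spatial_logits, rng)
print(a, z, log_prob)
```

The joint log-probability is the sum of the two heads' log-probabilities, which is the quantity a policy-gradient method such as PPO would use for the combined action.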
## Reward
https://github.com/Nordeus/heroic-rl/blob/master/heroic_rl/train/rewards.py
win: +1
loss: -1
Returns are computed as Monte-Carlo returns with a discount factor.
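
A discounted Monte-Carlo return can be computed with a single backward pass over an episode, as in the sketch below. The function name and the discount value 0.5 are chosen only for illustration; the value actually used is set in the project configuration and is not stated here. The example also assumes the only non-zero reward is the win/loss signal on the final step.

```python
def discounted_returns(rewards, gamma):
    """Return G_t = r_t + gamma * G_{t+1} for every step of one episode."""
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


# A three-step episode that ends in a win (+1 on the last step only).
print(discounted_returns([0.0, 0.0, 1.0], gamma=0.5))  # [0.25, 0.5, 1.0]
```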

## Training Curriculum
1. rule-based (heuristic) AI
2. tree-search AI
3. self-play with ensemble of policies
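
The three stages above can be read as a rule for choosing the opponent in each training game. The sketch below is one possible reading, not the project's scheduling code: the function name, the stage numbering, the string labels, and the uniform sampling from the policy ensemble are all assumptions.

```python
import random


def pick_opponent(stage, policy_pool, rng):
    """Return an opponent label for curriculum stage 1, 2 or 3."""
    if stage == 1:
        return "rule-based"
    if stage == 2:
        return "tree-search"
    if stage == 3:
        # Self-play: sample one policy from the ensemble (uniformly, by assumption).
        return rng.choice(policy_pool)
    raise ValueError("stage must be 1, 2 or 3")


rng = random.Random(0)
pool = ["policy_a", "policy_b", "policy_c"]  # hypothetical ensemble members
for stage in (1, 2, 3):
    print(stage, pick_opponent(stage, pool, rng))
```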
## Hyperparameters and Settings
https://github.com/Nordeus/heroic-rl/blob/master/heroic_rl/train/cfg.py