# **Meeting 12/13**
**Advisor: Prof. Chih-Yu Wang \
Presenter: Shao-Heng Chen \
Date: Dec 13, 2023**
<!-- Chih-Yu Wang -->
<!-- Wei-Ho Chung -->
## **Training results**
- Settings

- Red: ```PPO-2-16-4```, Pink: ```A2C-2-16-4```, Cyan: ```DDPG-2-16-4```

- Red: ```PPO-2-16-4```

- Random action baseline
```
(sb3) C:\Users\Paul\Downloads\RIS-MISO-DRL>python torch_env.py
----------------------------------------------------------------
seed_everything() is being called with random seed set to 33
----------------
episide: 1/1
----------------
step: 1000/3000
step: 2000/3000
step: 3000/3000
--------------------------------
random action of (2, 16, 4):
mean: -1.116346258709828, 0.0
std: 0.8228024012631869, 0.0
max: -0.20961391925811768, 0
min: -5.640992641448975, 0
action: [3. 2. 1. 0.]
shape: (220,)
--------------------------------
duration: 9.0267 sec
```
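The baseline above reports the mean, std, max and min of the per-step reward over 3000 random steps. Below is a minimal, stdlib-only sketch of that procedure. It makes a few assumptions. `reward_fn(action)` is a hypothetical stand-in for the real `torch_env.py` environment, which is not shown here. Actions are assumed to be drawn uniformly from [-1, 1]; the real action space is not given. The std is the population std, matching NumPy's `np.std` default.

```python
import random
import statistics
from typing import Callable, Dict, List, Sequence


def summarize(rewards: Sequence[float]) -> Dict[str, float]:
    """Mean, population std (ddof=0, like np.std's default), max and min."""
    return {
        "mean": statistics.fmean(rewards),
        "std": statistics.pstdev(rewards),
        "max": max(rewards),
        "min": min(rewards),
    }


def run_random_baseline(
    reward_fn: Callable[[List[float]], float],  # hypothetical stand-in for env.step()
    action_dim: int,
    steps: int = 3000,
    seed: int = 33,
) -> List[float]:
    """Draw a uniform random action in [-1, 1]^action_dim each step and record the reward."""
    rng = random.Random(seed)  # fixed seed, mirroring seed_everything(33) in the log
    rewards = []
    for _ in range(steps):
        action = [rng.uniform(-1.0, 1.0) for _ in range(action_dim)]
        rewards.append(reward_fn(action))
    return rewards


def toy_reward(action: List[float]) -> float:
    """Illustrative reward only: negative squared norm of the action."""
    return -sum(x * x for x in action)


if __name__ == "__main__":
    stats = summarize(run_random_baseline(toy_reward, action_dim=4))
    print({k: round(v, 4) for k, v in stats.items()})
```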
### **Policy Networks**
- Block diagram

- Network architecture

- Model policy
```
ActorCriticPolicy(
  (features_extractor): FlattenExtractor(
    (flatten): Flatten(start_dim=1, end_dim=-1)
  )
  (pi_features_extractor): FlattenExtractor(
    (flatten): Flatten(start_dim=1, end_dim=-1)
  )
  (vf_features_extractor): FlattenExtractor(
    (flatten): Flatten(start_dim=1, end_dim=-1)
  )
  (mlp_extractor): MlpExtractor(
    (policy_net): Sequential(
      (0): Linear(in_features=220, out_features=128, bias=True)
      (1): Tanh()
      (2): Linear(in_features=128, out_features=64, bias=True)
      (3): Tanh()
      (4): Linear(in_features=64, out_features=64, bias=True)
      (5): Tanh()
      (6): Linear(in_features=64, out_features=32, bias=True)
      (7): Tanh()
    )
    (value_net): Sequential(
      (0): Linear(in_features=220, out_features=128, bias=True)
      (1): Tanh()
      (2): Linear(in_features=128, out_features=64, bias=True)
      (3): Tanh()
      (4): Linear(in_features=64, out_features=64, bias=True)
      (5): Tanh()
      (6): Linear(in_features=64, out_features=32, bias=True)
      (7): Tanh()
    )
  )
  (action_net): Linear(in_features=32, out_features=4, bias=True)
  (value_net): Linear(in_features=32, out_features=1, bias=True)
)
```
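As a quick sanity check on the printout above, the trainable weights in the `Linear` layers can be tallied by hand. This is a minimal sketch, and it counts only the layers shown. It therefore leaves out anything the module repr does not print. One example is the `log_std` parameter that SB3 adds for continuous action spaces, if the policy has one.

```python
from typing import List


def linear_params(in_features: int, out_features: int, bias: bool = True) -> int:
    """Weights plus (optional) bias of one nn.Linear layer."""
    return in_features * out_features + (out_features if bias else 0)


def mlp_params(sizes: List[int]) -> int:
    """Parameters of a stack of Linear layers; the Tanh layers have none."""
    return sum(linear_params(i, o) for i, o in zip(sizes, sizes[1:]))


OBS_DIM, ACTION_DIM = 220, 4
HIDDEN = [128, 64, 64, 32]  # identical for policy_net and value_net in the printout

policy_net = mlp_params([OBS_DIM] + HIDDEN)          # 42,784
value_net = mlp_params([OBS_DIM] + HIDDEN)           # 42,784
action_head = linear_params(HIDDEN[-1], ACTION_DIM)  # 132
value_head = linear_params(HIDDEN[-1], 1)            # 33

total = policy_net + value_net + action_head + value_head
print(f"{total:,} parameters in the Linear layers")  # 85,733 parameters in the Linear layers
```

The training script is not shown, so the following is an assumption. If it uses SB3, this layout would presumably come from something like `policy_kwargs=dict(net_arch=dict(pi=[128, 64, 64, 32], vf=[128, 64, 64, 32]), activation_fn=torch.nn.Tanh)`. Tanh is already the default activation of SB3's `ActorCriticPolicy`.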
## **Inference results**
### **Aggregate rewards - Agent vs Random**

#### **PPO**
<img src='https://hackmd.io/_uploads/BJ_ZUzkUT.png' width=50% weight=50%>
<img src='https://hackmd.io/_uploads/BkWS8fJIa.png' width=50% weight=50%>
<img src='https://hackmd.io/_uploads/Hk3QUMyIT.png' width=50% weight=50%>
<img src='https://hackmd.io/_uploads/SJtFLGJUp.png' width=50% weight=50%>
#### **A2C**
<img src='https://hackmd.io/_uploads/r1hy8X18T.png' width=50% weight=50%>
<img src='https://hackmd.io/_uploads/SksMIXyI6.png' width=50% weight=50%>
<img src='https://hackmd.io/_uploads/rJix87J86.png' width=50% weight=50%>
<img src='https://hackmd.io/_uploads/B1BS8mkLT.png' width=50% weight=50%>
#### **DDPG**
<img src='https://hackmd.io/_uploads/HkMDEH1LT.png' width=50% weight=50%>
<img src='https://hackmd.io/_uploads/HJyO4By86.png' width=50% weight=50%>
<img src='https://hackmd.io/_uploads/Hy7qESJ8a.png' width=50% weight=50%>
<img src='https://hackmd.io/_uploads/HkksVr186.png' width=50% weight=50%>
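The aggregate curves compare the agent with the random baseline over an inference run. Below is a minimal sketch of how such a curve could be built from per-step rewards. It assumes "aggregate" means the running sum of rewards. The plotting code is not shown, so this is an interpretation, and the numbers below are illustrative rather than taken from these runs.

```python
from itertools import accumulate
from typing import List, Sequence


def aggregate(rewards: Sequence[float]) -> List[float]:
    """Running (cumulative) sum of per-step rewards."""
    return list(accumulate(rewards))


def final_gap(agent: Sequence[float], baseline: Sequence[float]) -> float:
    """Difference in total reward at the end of the run (positive = agent ahead)."""
    return aggregate(agent)[-1] - aggregate(baseline)[-1]


if __name__ == "__main__":
    # Illustrative rewards only, not taken from the runs above.
    agent_r = [-0.4, -0.3, -0.35]
    random_r = [-1.2, -0.9, -2.0]
    print(aggregate(agent_r))
    print(final_gap(agent_r, random_r))
```

Because every reward here is negative, both aggregate curves decrease. The better policy is the one whose curve falls more slowly.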
### **Instant rewards**

#### **PPO**
<img src='https://hackmd.io/_uploads/HJvOBM1Ia.png' width=50% weight=50%>
<img src='https://hackmd.io/_uploads/S1e9BzJUT.png' width=50% weight=50%>
#### **A2C**
<img src='https://hackmd.io/_uploads/Hywyb718p.png' width=50% weight=50%>
<img src='https://hackmd.io/_uploads/r1kbZmyU6.png' width=50% weight=50%>
#### **DDPG**
<img src='https://hackmd.io/_uploads/ryyhPmJIT.png' width=50% weight=50%>
<img src='https://hackmd.io/_uploads/Bk0hPmJUT.png' width=50% weight=50%>
## **Other experiments**
### ```PPO-2-16-[4, 36]```
Orange: ```PPO-2-16-4```, Blue: ```PPO-2-16-36```

<img src='https://hackmd.io/_uploads/HkpckiEI6.png' width=50% weight=50%>
<img src='https://hackmd.io/_uploads/SyQ3kjEIa.png' width=50% weight=50%>
<img src='https://hackmd.io/_uploads/HJNLVo4LT.png' width=50% weight=50%>
<img src='https://hackmd.io/_uploads/H1xDVsNIa.png' width=50% weight=50%>
#### **Stats of ```PPO-2-16-[4, 36]```**
```shell
--------------------------------
random action:
mean: -4.77138689301163
std: 4.2859295585800945
max: -0.1926957368850708
min: -27.662830352783203
shape: (220,)
--------------------------------
--------------------------------
model inference of PPO-2023-11-10-2-16-4-mse:
mean: -0.3376442492008209
std: 0.07979346811771393
max: -0.11926889419555664
min: -1.3288772106170654
shape: (1, 220)
--------------------------------
```
```shell
--------------------------------
random action:
mean: -1.849950715430081
std: 1.589400550899505
max: -0.15605151653289795
min: -17.99134063720703
shape: (1468,)
--------------------------------
--------------------------------
model inference of PPO-2023-11-10-2-16-36-mse:
mean: -1.5605767965316772
std: 1.3505491018295288
max: -0.018919289112091064
min: -14.890804290771484
shape: (1, 1468)
--------------------------------
```
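One way to read the two stat blocks above is the relative improvement of the agent's mean reward over the random baseline, (agent - random) / |random|. This is a minimal sketch using the means copied from the logs. The metric is an illustrative choice, not one used in the original evaluation.

```python
from typing import Dict, Tuple


def relative_improvement(agent_mean: float, random_mean: float) -> float:
    """Fraction of the random baseline's (negative) mean reward recovered by the agent."""
    return (agent_mean - random_mean) / abs(random_mean)


# (agent_mean, random_mean) copied from the logs above.
RUNS: Dict[str, Tuple[float, float]] = {
    "PPO-2-16-4": (-0.3376442492008209, -4.77138689301163),
    "PPO-2-16-36": (-1.5605767965316772, -1.849950715430081),
}

if __name__ == "__main__":
    for name, (agent, rand) in RUNS.items():
        print(f"{name}: {relative_improvement(agent, rand):.1%}")
```

This prints about 92.9% for `PPO-2-16-4` and about 15.6% for `PPO-2-16-36`. By this measure, the agent's margin over random shrinks sharply when the last field grows from 4 to 36.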
### ```PPO-2-16-9```
Pink: ```PPO-2-16-9```

<img src='https://hackmd.io/_uploads/BJowLjNLa.png' width=50% weight=50%>
<img src='https://hackmd.io/_uploads/SyQuLj486.png' width=50% weight=50%>
#### **Stats of ```PPO-2-16-9```**
```shell
--------------------------------
random action:
mean: -0.6464642078280449
std: 0.003746165981914554
max: -0.6305257081985474
min: -0.6606713533401489
shape: (415,)
--------------------------------
--------------------------------
model inference of PPO-2023-11-11-2-16-9-mse:
mean: -0.34878310561180115
std: 0.03995276987552643
max: -0.2597486078739166
min: -0.6364725828170776
shape: (1, 415)
--------------------------------
```
### ```PPO-2-16-25```
Cyan: ```PPO-2-16-25```

<img src='https://hackmd.io/_uploads/SyvE3sVL6.png' width=50% weight=50%>
<img src='https://hackmd.io/_uploads/B1krhjELa.png' width=50% weight=50%>
#### **Stats of ```PPO-2-16-25```**
```shell
--------------------------------
random action:
mean: -1.2513083415076136
std: 1.0633530046257404
max: -0.1404857635498047
min: -11.57967472076416
shape: (1039,)
--------------------------------
--------------------------------
model inference of PPO-2023-11-11-2-16-25-mse:
mean: -0.5238159894943237
std: 0.37046128511428833
max: -0.067668616771698
min: -4.118846416473389
shape: (1, 1039)
--------------------------------
```
### ```PPO-[2, 4, 6, 8, 10]-16-16```
Orange: ```PPO-2-16-16```, Blue: ```PPO-4-16-16```, Red: ```PPO-6-16-16```, Cyan: ```PPO-8-16-16```, Pink: ```PPO-10-16-16```

<img src='https://hackmd.io/_uploads/ry--diVI6.png' width=50% weight=50%>
<img src='https://hackmd.io/_uploads/ryuZdj48T.png' width=50% weight=50%>
<img src='https://hackmd.io/_uploads/rJUD9sV8T.png' width=50% weight=50%>
<img src='https://hackmd.io/_uploads/Skqv9j4IT.png' width=50% weight=50%>
<img src='https://hackmd.io/_uploads/SkwSnQBIp.png' width=50% weight=50%>
<img src='https://hackmd.io/_uploads/S1EIn7H8p.png' width=50% weight=50%>
<img src='https://hackmd.io/_uploads/SJB7vjS8a.png' width=50% weight=50%>
<img src='https://hackmd.io/_uploads/ByGEDjHIp.png' width=50% weight=50%>
<img src='https://hackmd.io/_uploads/SJ0PKlILT.png' width=50% weight=50%>
<img src='https://hackmd.io/_uploads/HJW_Ke8Ua.png' width=50% weight=50%>
#### **Stats of ```PPO-[2, 4, 6, 8, 10]-16-16```**
```shell
--------------------------------
random action:
mean: -2.0455671063154934
std: 1.8084281891960132
max: -0.14861458539962769
min: -15.821859359741211
shape: (688,)
--------------------------------
--------------------------------
model inference of PPO-2023-11-11-2-16-16-mse:
mean: -0.3436024785041809
std: 0.06267330050468445
max: -0.20145198702812195
min: -1.0607212781906128
shape: (1, 688)
--------------------------------
```
```shell
--------------------------------
random action:
mean: -3.0224343554168938
std: 1.2649511196974008
max: -0.8858807682991028
min: -12.207118034362793
shape: (816,)
--------------------------------
--------------------------------
model inference of PPO-2023-11-12-4-16-16-mse:
mean: -1.035849928855896
std: 0.3660873472690582
max: -0.45160818099975586
min: -4.301960468292236
shape: (1, 816)
--------------------------------
```
```shell
--------------------------------
random action:
mean: -6.315735139489174
std: 2.9746460466637834
max: -1.05283784866333
min: -28.964038848876953
shape: (944,)
--------------------------------
--------------------------------
model inference of PPO-2023-11-12-6-16-16-mse:
mean: -1.7992961406707764
std: 0.948248028755188
max: -0.545820415019989
min: -12.615047454833984
shape: (1, 944)
--------------------------------
```
```shell
--------------------------------
random action:
mean: -5.396963516545296
std: 2.4218854982579967
max: -1.0782233476638794
min: -26.205829620361328
shape: (1072,)
--------------------------------
--------------------------------
model inference of PPO-2023-11-12-8-16-16-mse:
mean: -3.3886656761169434
std: 1.2201298475265503
max: -1.073995590209961
min: -12.200408935546875
shape: (1, 1072)
--------------------------------
```
```shell
--------------------------------
random action:
mean: -12.306949747651815
std: 6.374178071596621
max: -1.8142037391662598
min: -72.79109191894531
shape: (1200,)
--------------------------------
--------------------------------
model inference of PPO-2023-11-12-10-16-16-mse:
mean: -1.503293752670288
std: 0.3083134889602661
max: -0.9549587965011597
min: -4.4894700050354
shape: (1, 1200)
--------------------------------
```
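In this sweep over the first field, the random baseline's spread also changes from run to run, so raw means are hard to compare. This is a minimal sketch that expresses the agent's gain in units of the random baseline's standard deviation, (agent_mean - random_mean) / random_std. It uses the numbers from the five logs above. The normalization is an illustrative choice, not something used in the original experiments.

```python
from typing import Dict, Tuple

# (agent_mean, random_mean, random_std) copied from the logs above.
SWEEP: Dict[str, Tuple[float, float, float]] = {
    "PPO-2-16-16": (-0.3436024785041809, -2.0455671063154934, 1.8084281891960132),
    "PPO-4-16-16": (-1.035849928855896, -3.0224343554168938, 1.2649511196974008),
    "PPO-6-16-16": (-1.7992961406707764, -6.315735139489174, 2.9746460466637834),
    "PPO-8-16-16": (-3.3886656761169434, -5.396963516545296, 2.4218854982579967),
    "PPO-10-16-16": (-1.503293752670288, -12.306949747651815, 6.374178071596621),
}


def normalized_gain(agent_mean: float, random_mean: float, random_std: float) -> float:
    """Agent's mean-reward advantage over random, in units of the random baseline's std."""
    return (agent_mean - random_mean) / random_std


gains = {name: normalized_gain(*vals) for name, vals in SWEEP.items()}

if __name__ == "__main__":
    for name, g in gains.items():
        print(f"{name}: {g:.2f} sigma")
```

Under this measure the agent beats random in all five runs, but the gain is not monotonic in the first field. `PPO-8-16-16` has the smallest gain (roughly 0.83 sigma).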
### ```PPO-10-16-36```
Cyan: ```PPO-10-16-36-2300```

<img src='https://hackmd.io/_uploads/SyDR5rPIa.png' width=50% weight=50%>
<img src='https://hackmd.io/_uploads/rJ-yorDU6.png' width=50% weight=50%>
#### **Stats of ```PPO-10-16-36```**
```shell
--------------------------------
random action:
mean: -9.217135637432337
std: 4.7068562802938505
max: -1.2675031423568726
min: -48.2896614074707
shape: (2300,)
--------------------------------
--------------------------------
model inference of PPO-2023-11-13-10-16-36-mse:
mean: -5.165438175201416
std: 1.8534566164016724
max: -1.520188808441162
min: -18.997055053710938
shape: (1, 2300)
--------------------------------
```
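The observation sizes in the logs are the `shape` lines, which also set `in_features` of the first `Linear` layer. They grow regularly with the first and last fields of the config name. For every run in this note, where the middle field is always 16, they happen to satisfy dim = 2ac + 32a + 35c for a config `a-16-c`. This is an empirical fit to the ten reported shapes, not the definition used in `torch_env.py`, and it is untested for other values of the middle field. This sketch checks the fit:

```python
from typing import Dict, Tuple

# Observation sizes reported in the logs, keyed by (first, middle, last) config fields.
REPORTED: Dict[Tuple[int, int, int], int] = {
    (2, 16, 4): 220,
    (2, 16, 9): 415,
    (2, 16, 16): 688,
    (2, 16, 25): 1039,
    (2, 16, 36): 1468,
    (4, 16, 16): 816,
    (6, 16, 16): 944,
    (8, 16, 16): 1072,
    (10, 16, 16): 1200,
    (10, 16, 36): 2300,
}


def fitted_obs_dim(a: int, c: int) -> int:
    """Empirical fit for configs of the form a-16-c (middle field fixed at 16)."""
    return 2 * a * c + 32 * a + 35 * c


mismatches = {cfg: n for cfg, n in REPORTED.items() if fitted_obs_dim(cfg[0], cfg[2]) != n}

if __name__ == "__main__":
    print("all shapes match" if not mismatches else f"mismatches: {mismatches}")
```

For the `a-16-16` sweep, the fit reduces to 64a + 560. That means 128 extra inputs for each step of 2 in the first field.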