# **Meeting 12/13**
**Advisor: Prof. Chih-Yu Wang \ Presenter: Shao-Heng Chen \ Date: Dec 13, 2023**

<!-- Chih-Yu Wang -->
<!-- Wei-Ho Chung -->

## **Training results**
- Settings

![image](https://hackmd.io/_uploads/HJVArWQ86.png)

- Red: ```PPO-2-16-4```, Pink: ```A2C-2-16-4```, Cyan: ```DDPG-2-16-4```

![image](https://hackmd.io/_uploads/S13qafxIp.png)

- Red: ```PPO-2-16-4```

![Screenshot 2023-12-07 174743](https://hackmd.io/_uploads/HyaxgQeUp.png)

- Random action baseline

```
(sb3) C:\Users\Paul\Downloads\RIS-MISO-DRL>python torch_env.py
----------------------------------------------------------------
seed_everything() is being called with random seed set to 33
---------------- episide: 1/1 ----------------
step: 1000/3000
step: 2000/3000
step: 3000/3000
--------------------------------
random action of (2, 16, 4):
mean: -1.116346258709828, 0.0
std: 0.8228024012631869, 0.0
max: -0.20961391925811768, 0
min: -5.640992641448975, 0
action: [3. 2. 1. 0.]
shape: (220,)
--------------------------------
duration: 9.0267 sec
```

### **Policy Networks**
- Block diagram

![image](https://hackmd.io/_uploads/r1oqtMk8p.png)

- Network architecture

![image](https://hackmd.io/_uploads/S1fNU-X86.png)

- Model policy

```cpp
ActorCriticPolicy(
  (features_extractor): FlattenExtractor(
    (flatten): Flatten(start_dim=1, end_dim=-1)
  )
  (pi_features_extractor): FlattenExtractor(
    (flatten): Flatten(start_dim=1, end_dim=-1)
  )
  (vf_features_extractor): FlattenExtractor(
    (flatten): Flatten(start_dim=1, end_dim=-1)
  )
  (mlp_extractor): MlpExtractor(
    (policy_net): Sequential(
      (0): Linear(in_features=220, out_features=128, bias=True)
      (1): Tanh()
      (2): Linear(in_features=128, out_features=64, bias=True)
      (3): Tanh()
      (4): Linear(in_features=64, out_features=64, bias=True)
      (5): Tanh()
      (6): Linear(in_features=64, out_features=32, bias=True)
      (7): Tanh()
    )
    (value_net): Sequential(
      (0): Linear(in_features=220, out_features=128, bias=True)
      (1): Tanh()
      (2): Linear(in_features=128, out_features=64, bias=True)
      (3): Tanh()
      (4): Linear(in_features=64, out_features=64, bias=True)
      (5): Tanh()
      (6): Linear(in_features=64, out_features=32, bias=True)
      (7): Tanh()
    )
  )
  (action_net): Linear(in_features=32, out_features=4, bias=True)
  (value_net): Linear(in_features=32, out_features=1, bias=True)
)
```

## **Inference results**
### **Aggregate rewards - Agent vs Random**

![image](https://hackmd.io/_uploads/Bkw9IZmUT.png)

#### **PPO**
<img src='https://hackmd.io/_uploads/BJ_ZUzkUT.png' width=50% weight=50%>
<img src='https://hackmd.io/_uploads/BkWS8fJIa.png' width=50% weight=50%>
<img src='https://hackmd.io/_uploads/Hk3QUMyIT.png' width=50% weight=50%>
<img src='https://hackmd.io/_uploads/SJtFLGJUp.png' width=50% weight=50%>

#### **A2C**
<img src='https://hackmd.io/_uploads/r1hy8X18T.png' width=50% weight=50%>
<img src='https://hackmd.io/_uploads/SksMIXyI6.png' width=50% weight=50%>
<img src='https://hackmd.io/_uploads/rJix87J86.png' width=50% weight=50%>
<img src='https://hackmd.io/_uploads/B1BS8mkLT.png' width=50% weight=50%>

#### **DDPG**
<img src='https://hackmd.io/_uploads/HkMDEH1LT.png' width=50% weight=50%>
<img src='https://hackmd.io/_uploads/HJyO4By86.png' width=50% weight=50%>
<img src='https://hackmd.io/_uploads/Hy7qESJ8a.png' width=50% weight=50%>
<img src='https://hackmd.io/_uploads/HkksVr186.png' width=50% weight=50%>

### **Instant rewards**

![image](https://hackmd.io/_uploads/Skn2UbQUp.png)

#### **PPO**
<img src='https://hackmd.io/_uploads/HJvOBM1Ia.png' width=50% weight=50%>
<img src='https://hackmd.io/_uploads/S1e9BzJUT.png' width=50% weight=50%>

#### **A2C**
<img src='https://hackmd.io/_uploads/Hywyb718p.png' width=50% weight=50%>
<img src='https://hackmd.io/_uploads/r1kbZmyU6.png' width=50% weight=50%>

#### **DDPG**
<img src='https://hackmd.io/_uploads/ryyhPmJIT.png' width=50% weight=50%>
<img src='https://hackmd.io/_uploads/Bk0hPmJUT.png' width=50% weight=50%>

## **Other experiments**
### ```PPO-2-16-[4, 36]```
Orange: ```PPO-2-16-4```, Blue: ```PPO-2-16-36```

![image](https://hackmd.io/_uploads/rkfqY948p.png)

<img src='https://hackmd.io/_uploads/HkpckiEI6.png' width=50% weight=50%>
<img src='https://hackmd.io/_uploads/SyQ3kjEIa.png' width=50% weight=50%>
<img src='https://hackmd.io/_uploads/HJNLVo4LT.png' width=50% weight=50%>
<img src='https://hackmd.io/_uploads/H1xDVsNIa.png' width=50% weight=50%>

#### **Stats of ```PPO-2-16-[4, 36]```**
```shell
--------------------------------
random action:
mean: -4.77138689301163
std: 4.2859295585800945
max: -0.1926957368850708
min: -27.662830352783203
shape: (220,)
--------------------------------
--------------------------------
model inference of PPO-2023-11-10-2-16-4-mse:
mean: -0.3376442492008209
std: 0.07979346811771393
max: -0.11926889419555664
min: -1.3288772106170654
shape: (1, 220)
--------------------------------
```
```shell
--------------------------------
random action:
mean: -1.849950715430081
std: 1.589400550899505
max: -0.15605151653289795
min: -17.99134063720703
shape: (1468,)
--------------------------------
--------------------------------
model inference of PPO-2023-11-10-2-16-36-mse:
mean: -1.5605767965316772
std: 1.3505491018295288
max: -0.018919289112091064
min: -14.890804290771484
shape: (1, 1468)
--------------------------------
```

### ```PPO-2-16-9```
Pink: ```PPO-2-16-9```

![image](https://hackmd.io/_uploads/HJaPtcVIT.png)

<img src='https://hackmd.io/_uploads/BJowLjNLa.png' width=50% weight=50%>
<img src='https://hackmd.io/_uploads/SyQuLj486.png' width=50% weight=50%>

#### **Stats of ```PPO-2-16-9```**
```shell
--------------------------------
random action:
mean: -0.6464642078280449
std: 0.003746165981914554
max: -0.6305257081985474
min: -0.6606713533401489
shape: (415,)
--------------------------------
--------------------------------
model inference of PPO-2023-11-11-2-16-9-mse:
mean: -0.34878310561180115
std: 0.03995276987552643
max: -0.2597486078739166
min: -0.6364725828170776
shape: (1, 415)
--------------------------------
```

### ```PPO-2-16-25```
Cyan: ```PPO-2-16-25```

![image](https://hackmd.io/_uploads/H1BaFq4L6.png)

<img src='https://hackmd.io/_uploads/SyvE3sVL6.png' width=50% weight=50%>
<img src='https://hackmd.io/_uploads/B1krhjELa.png' width=50% weight=50%>

#### **Stats of ```PPO-2-16-25```**
```shell
--------------------------------
random action:
mean: -1.2513083415076136
std: 1.0633530046257404
max: -0.1404857635498047
min: -11.57967472076416
shape: (1039,)
--------------------------------
--------------------------------
model inference of PPO-2023-11-11-2-16-25-mse:
mean: -0.5238159894943237
std: 0.37046128511428833
max: -0.067668616771698
min: -4.118846416473389
shape: (1, 1039)
--------------------------------
```

### ```PPO-[2, 4, 6, 8, 10]-16-16```
Orange: ```PPO-2-16-16```, Blue: ```PPO-4-16-16```, Red: ```PPO-6-16-16```, Cyan: ```PPO-8-16-16```, Pink: ```PPO-10-16-16```

![image](https://hackmd.io/_uploads/BkS4wlUUp.png)

<img src='https://hackmd.io/_uploads/ry--diVI6.png' width=50% weight=50%>
<img src='https://hackmd.io/_uploads/ryuZdj48T.png' width=50% weight=50%>
<img src='https://hackmd.io/_uploads/rJUD9sV8T.png' width=50% weight=50%>
<img src='https://hackmd.io/_uploads/Skqv9j4IT.png' width=50% weight=50%>
<img src='https://hackmd.io/_uploads/SkwSnQBIp.png' width=50% weight=50%>
<img src='https://hackmd.io/_uploads/S1EIn7H8p.png' width=50% weight=50%>
<img src='https://hackmd.io/_uploads/SJB7vjS8a.png' width=50% weight=50%>
<img src='https://hackmd.io/_uploads/ByGEDjHIp.png' width=50% weight=50%>
<img src='https://hackmd.io/_uploads/SJ0PKlILT.png' width=50% weight=50%>
<img src='https://hackmd.io/_uploads/HJW_Ke8Ua.png' width=50% weight=50%>

#### **Stats of ```PPO-[2, 4, 6, 8, 10]-16-16```**
```shell
--------------------------------
random action:
mean: -2.0455671063154934
std: 1.8084281891960132
max: -0.14861458539962769
min: -15.821859359741211
shape: (688,)
--------------------------------
--------------------------------
model inference of PPO-2023-11-11-2-16-16-mse:
mean: -0.3436024785041809
std: 0.06267330050468445
max: -0.20145198702812195
min: -1.0607212781906128
shape: (1, 688)
--------------------------------
```
```shell
--------------------------------
random action:
mean: -3.0224343554168938
std: 1.2649511196974008
max: -0.8858807682991028
min: -12.207118034362793
shape: (816,)
--------------------------------
--------------------------------
model inference of PPO-2023-11-12-4-16-16-mse:
mean: -1.035849928855896
std: 0.3660873472690582
max: -0.45160818099975586
min: -4.301960468292236
shape: (1, 816)
--------------------------------
```
```shell
--------------------------------
random action:
mean: -6.315735139489174
std: 2.9746460466637834
max: -1.05283784866333
min: -28.964038848876953
shape: (944,)
--------------------------------
--------------------------------
model inference of PPO-2023-11-12-6-16-16-mse:
mean: -1.7992961406707764
std: 0.948248028755188
max: -0.545820415019989
min: -12.615047454833984
shape: (1, 944)
--------------------------------
```
```shell
--------------------------------
random action:
mean: -5.396963516545296
std: 2.4218854982579967
max: -1.0782233476638794
min: -26.205829620361328
shape: (1072,)
--------------------------------
--------------------------------
model inference of PPO-2023-11-12-8-16-16-mse:
mean: -3.3886656761169434
std: 1.2201298475265503
max: -1.073995590209961
min: -12.200408935546875
shape: (1, 1072)
--------------------------------
```
```shell
--------------------------------
random action:
mean: -12.306949747651815
std: 6.374178071596621
max: -1.8142037391662598
min: -72.79109191894531
shape: (1200,)
--------------------------------
--------------------------------
model inference of PPO-2023-11-12-10-16-16-mse:
mean: -1.503293752670288
std: 0.3083134889602661
max: -0.9549587965011597
min: -4.4894700050354
shape: (1, 1200)
--------------------------------
```

<!-- <img src='' width=50% weight=50%> <img src='' width=50% weight=50%> -->

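
To read the five ```PPO-[2, 4, 6, 8, 10]-16-16``` stats blocks side by side, the mean rewards can be tabulated and the gap between model and random baseline computed. The sketch below only copies the reported means from the blocks above; the names `REPORTED_MEANS` and `reward_gap` are made up for illustration and are not part of the project code.

```python
# Mean rewards copied from the stats blocks of PPO-[2, 4, 6, 8, 10]-16-16.
# Key: first index of the run name; value: (random action mean, model mean).
REPORTED_MEANS = {
    2: (-2.0455671063154934, -0.3436024785041809),
    4: (-3.0224343554168938, -1.035849928855896),
    6: (-6.315735139489174, -1.7992961406707764),
    8: (-5.396963516545296, -3.3886656761169434),
    10: (-12.306949747651815, -1.503293752670288),
}


def reward_gap(random_mean: float, model_mean: float) -> float:
    """Return how much higher the model's mean reward is than the random baseline's."""
    return model_mean - random_mean


def gap_table(means: dict) -> dict:
    """Map each run's first index to its model-minus-random mean reward gap."""
    return {key: reward_gap(rand, model) for key, (rand, model) in sorted(means.items())}


for first_index, gap in gap_table(REPORTED_MEANS).items():
    rand, model = REPORTED_MEANS[first_index]
    print(f"PPO-{first_index}-16-16: random={rand:.3f} model={model:.3f} gap={gap:+.3f}")
```

A positive gap means the trained agent's mean reward is above the random baseline for that run. The gap alone does not account for the very different standard deviations reported in each block.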
### ```PPO-10-16-36```
Cyan: ```PPO-10-16-36-2300```

![image](https://hackmd.io/_uploads/H1gIo9UI6.png)

<img src='https://hackmd.io/_uploads/SyDR5rPIa.png' width=50% weight=50%>
<img src='https://hackmd.io/_uploads/rJ-yorDU6.png' width=50% weight=50%>

```shell
--------------------------------
random action:
mean: -9.217135637432337
std: 4.7068562802938505
max: -1.2675031423568726
min: -48.2896614074707
shape: (2300,)
--------------------------------
--------------------------------
model inference of PPO-2023-11-13-10-16-36-mse:
mean: -5.165438175201416
std: 1.8534566164016724
max: -1.520188808441162
min: -18.997055053710938
shape: (1, 2300)
--------------------------------
```
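
Every stats block in these notes reports the same five fields (mean, std, max, min, shape) over a sequence of per-step rewards. A minimal sketch of how such a summary could be produced with the Python standard library is given below. It assumes a population standard deviation and a flat 1-D reward list; the actual script may compute these differently, and `summarize` and `format_summary` are hypothetical helper names. The sample rewards are toy values, not experiment data.

```python
import statistics


def summarize(rewards):
    """Summarize a flat sequence of per-step rewards."""
    rewards = list(rewards)
    return {
        "mean": statistics.fmean(rewards),
        # Population standard deviation (assumption: no Bessel correction).
        "std": statistics.pstdev(rewards),
        "max": max(rewards),
        "min": min(rewards),
        "shape": (len(rewards),),
    }


def format_summary(title, stats):
    """Render a summary as a text block with a dashed rule above and below."""
    rule = "-" * 32
    body = [f"{name}: {value}" for name, value in stats.items()]
    return "\n".join([rule, f"{title}:"] + body + [rule])


# Toy rewards for illustration only.
toy_rewards = [-1.0, -2.0, -3.0]
print(format_summary("random action", summarize(toy_rewards)))
```

Keeping the summary in a dict before formatting makes it easy to compare the random baseline and the model run field by field instead of reading two text blocks.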