# **meeting 11/28**

**Advisor: Prof. Chih-Yu Wang \
Presenter: Shao-Heng Chen \
Date: Nov 28, 2023**

## **New feature**

- ```seed_everything()``` (a minimal sketch of such a helper is given at the end of this note)

  ![image](https://hackmd.io/_uploads/r15_GJzST.png)

## **Bugs**

1. When ```Nk == 1```, the ```downlink rate``` and ```sum-rate``` become ```Inf```; this appears to be caused by an error in the SINR implementation (see the sketch at the end of this note):

   $$
   \begin{align*}
   y_{k} &= (\mathbf{h}_{k,2} \mathbf{\Phi} \mathbf{H}_{1} + \mathbf{h}_{k,3}) \mathbf{f}_{k} x_{k} + \sum\limits_{j=1,\ j \neq k}^{N_k} (\mathbf{h}_{k,2} \mathbf{\Phi} \mathbf{H}_{1} + \mathbf{h}_{k,3}) \mathbf{f}_{j} x_{j} + n_{k}, \\
   \rho_{k} &= \frac{\left| (\mathbf{h}_{k,2} \mathbf{\Phi} \mathbf{H}_{1} + \mathbf{h}_{k,3}) \mathbf{f}_{k} \right|^{2}}{\sum\limits_{j=1,\ j \neq k}^{N_k} \left| (\mathbf{h}_{k,2} \mathbf{\Phi} \mathbf{H}_{1} + \mathbf{h}_{k,3}) \mathbf{f}_{j} \right|^{2} + \sigma_{n}^{2}}.
   \end{align*}
   $$

   - Original version
     ![image](https://hackmd.io/_uploads/HJGvKZ67a.png)
   - Modified version
     ![image](https://hackmd.io/_uploads/SJsU50eHp.png)

2. When ```Nk == 1```, the termination condition ```(reward == opt_reward)``` has to be removed: with only one user, the reward always equals ```opt_reward```, so this condition would end every episode immediately (a sketch of the fix is given at the end of this note).

   ![image](https://hackmd.io/_uploads/S1m83A-ra.png)

   - PPO (Orange: ```PPO-1-36-4-380```; Blue: ```PPO-1-4-4-60```)
     - episode reward mean rollout
       ![image](https://hackmd.io/_uploads/BkiMzz1Sp.png)
     - training stats
       ![image](https://hackmd.io/_uploads/HJYLzMkSa.png)
       ![image](https://hackmd.io/_uploads/SkkuzzJBp.png)

## **Training results**

![image](https://hackmd.io/_uploads/H1vcr6WHa.png)

### **PPO**

- Red: ```PPO-1-4-4-60```
  ![image](https://hackmd.io/_uploads/rkrKQ4krT.png)
  ![image](https://hackmd.io/_uploads/HyFf3ReHa.png)
  ![image](https://hackmd.io/_uploads/H1772RxHp.png)
- Cyan: ```PPO-2-4-4-76```
  ![image](https://hackmd.io/_uploads/ByS6pLJHT.png)
- Pink: ```PPO-3-4-4-92```
  ![image](https://hackmd.io/_uploads/rJys2AlB6.png)
- Green: ```PPO-4-4-4-108```
  ![image](https://hackmd.io/_uploads/Bkbp3RgSp.png)
- Comparison of all
  ![image](https://hackmd.io/_uploads/r1GW2AgB6.png)
- PPO training stats
  ![image](https://hackmd.io/_uploads/r1eDnClST.png)
  ![image](https://hackmd.io/_uploads/SJX_h0er6.png)

### **A2C**

- Orange: ```A2C-1-4-4-60```
  ![image](https://hackmd.io/_uploads/BJ9dKGxSa.png)
- Blue: ```A2C-2-4-4-76```
  ![image](https://hackmd.io/_uploads/HySsYGgBT.png)
- Red: ```A2C-3-4-4-92```
  ![image](https://hackmd.io/_uploads/HJcaYfgB6.png)
- Cyan: ```A2C-4-4-4-108```
  ![image](https://hackmd.io/_uploads/B1VJ9zeHT.png)
- Comparison of all
  ![image](https://hackmd.io/_uploads/HyDBYGgr6.png)
- A2C training stats
  ![image](https://hackmd.io/_uploads/BJEXFGerp.png)

### **Training of more complex settings**

Orange: ```PPO-4-16-16-816```

![image](https://hackmd.io/_uploads/BkM3cabS6.png)

### **Training of more episodes**

- Orange: ```PPO-4-16-16-816``` with ```80``` episodes
- Blue: ```PPO-4-16-16-816``` with ```1000``` episodes

![image](https://hackmd.io/_uploads/SkGUo6bS6.png)
![image](https://hackmd.io/_uploads/H1CPoaZHp.png)

### **Comparison of all continuous agents**

Setting: ```(Nk, Nt, Ns) = (2, 4, 4)```

Pink: ```TD3```; Red: ```DDPG```; Orange: ```A2C```; Blue: ```PPO```; Cyan: ```SAC```

![image](https://hackmd.io/_uploads/SyxP_nzBp.png)
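## **Code sketches**

The ```seed_everything()``` feature above is only shown as a screenshot. The following is a minimal sketch of what such a helper typically does, assuming the project uses Python's ```random```, NumPy, and PyTorch (an assumption; the actual implementation may also seed the environment or other libraries):

```python
import os
import random

import numpy as np
import torch


def seed_everything(seed: int = 0) -> None:
    """Fix every common source of randomness so training runs are reproducible.

    Sketch only; the project's actual helper may differ.
    """
    random.seed(seed)                 # Python's built-in RNG
    np.random.seed(seed)              # NumPy RNG
    torch.manual_seed(seed)           # PyTorch CPU RNG
    torch.cuda.manual_seed_all(seed)  # PyTorch GPU RNGs (no-op without CUDA)
    # Recorded for subprocesses; does not change hashing in the running interpreter.
    os.environ["PYTHONHASHSEED"] = str(seed)
```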
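To make bug 1 concrete, the SINR $\rho_k$ can be computed directly from the formula above. The sketch below uses NumPy with assumed variable names and shapes (noted in the docstring; none of these are the project's actual identifiers). The key point is that when ```Nk == 1``` the interference sum over $j \neq k$ is empty, so the denominator must still contain the noise power $\sigma_n^2$, otherwise the rate evaluates to ```Inf```:

```python
import numpy as np


def sinr(k, h2, H1, h3, F, Phi, sigma2):
    """SINR of user k (hypothetical shapes, for illustration only).

    h2     : (Nk, Ns) -- RIS-to-user channels h_{k,2}
    H1     : (Ns, Nt) -- BS-to-RIS channel
    h3     : (Nk, Nt) -- direct BS-to-user channels h_{k,3}
    F      : (Nt, Nk) -- precoders, F[:, j] = f_j
    Phi    : (Ns, Ns) -- diagonal RIS phase-shift matrix
    sigma2 : float    -- noise power sigma_n^2
    """
    Nk = F.shape[1]
    # Effective channel of user k: h_{k,2} Phi H_1 + h_{k,3}
    h_eff = h2[k] @ Phi @ H1 + h3[k]
    signal = np.abs(h_eff @ F[:, k]) ** 2
    # Interference from the other users; this sum is empty when Nk == 1
    interference = sum(np.abs(h_eff @ F[:, j]) ** 2 for j in range(Nk) if j != k)
    # Keeping sigma2 in the denominator avoids division by zero (the Inf bug)
    return signal / (interference + sigma2)
```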
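For bug 2, a minimal illustration of the fix, assuming a Gym-style step loop (```Nk```, ```reward```, and ```opt_reward``` are the note's own symbols; everything else is hypothetical): the ```reward == opt_reward``` check is only applied when there is more than one user, since with a single user it would terminate every episode at the first step.

```python
def is_done(reward, opt_reward, step, max_steps, Nk):
    """Episode-termination check (illustrative sketch, not the project's code)."""
    if step >= max_steps:
        return True
    # With Nk == 1 the reward always equals opt_reward, so this condition
    # would end every episode immediately; only apply it for multi-user cases.
    if Nk > 1 and reward == opt_reward:
        return True
    return False
```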