# **Meeting 11/28**
**Advisor: Prof. Chih-Yu Wang \
Presenter: Shao-Heng Chen \
Date: Nov 28, 2023**
<!-- Chih-Yu Wang -->
<!-- Wei-Ho Chung -->
## **New feature**
- ```seed_everything()```

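The notes do not show the body of ```seed_everything()```, so the following is only a minimal sketch of what such a helper commonly looks like. The default seed ```42``` and the choice of libraries to seed (```random```, and NumPy / PyTorch when installed) are assumptions, not the project's actual implementation.

```python
import os
import random


def seed_everything(seed: int = 42) -> int:
    """Seed the random number generators available in this environment (sketch)."""
    # Only affects hash randomization of Python processes started after this call.
    os.environ["PYTHONHASHSEED"] = str(seed)
    random.seed(seed)

    # NumPy and PyTorch are optional here; skip them when they are not installed.
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass
    try:
        import torch
        torch.manual_seed(seed)
        if torch.cuda.is_available():
            torch.cuda.manual_seed_all(seed)
    except ImportError:
        pass
    return seed
```

Calling ```seed_everything(0)``` once at the start of each training run should make runs with the same seed draw the same random numbers, which helps when comparing agents across settings. The environment and the agent usually need to be seeded as well.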
## **Bugs**
1. When ```Nk == 1```, the ```downlink rate``` and ```sum-rate``` become ```Inf```. The likely cause is an error in the SINR implementation (still to be confirmed).
$$
\begin{align*}
y_{k} &= (\mathbf{h}_{k,2} \mathbf{\Phi} \mathbf{H}_{1} + \mathbf{h}_{k,3}) \mathbf{f}_{k} x_{k} + \sum\limits_{j, \ j \neq k}^{N_k} (\mathbf{h}_{k,2} \mathbf{\Phi} \mathbf{H}_{1} + \mathbf{h}_{k,3}) \mathbf{f}_{j} x_{j} + n_{k}, \\
\rho_{k} &= \frac{| (\mathbf{h}_{k,2} \mathbf{\Phi} \mathbf{H}_{1} + \mathbf{h}_{k,3}) \mathbf{f}_{k} |^{2} }{ \sum\limits_{j, \ j \neq k}^{N_k} | (\mathbf{h}_{k,2} \mathbf{\Phi} \mathbf{H}_{1} + \mathbf{h}_{k,3}) \mathbf{f}_{j} |^{2} + \sigma_{n}^{2} }.
\end{align*}
$$
- Original version

- Modified version

2. When ```Nk == 1```, the termination condition ```(reward == opt_reward)``` has to be removed: with only one user, the reward always equals ```opt_reward```.

- PPO (Orange: ```PPO-1-36-4-380```; Blue: ```PPO-1-4-4-60```)
- episode reward mean rollout

- training stats

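For bug 1, the SINR formula $\rho_k$ can be sketched in plain Python as below. This is an illustration under stated assumptions, not the project's code: ```gains[k][j]``` is a hypothetical precomputed complex scalar standing for $(\mathbf{h}_{k,2} \mathbf{\Phi} \mathbf{H}_{1} + \mathbf{h}_{k,3}) \mathbf{f}_{j}$, the per-user rate is assumed to be $\log_2(1 + \rho_k)$, and the function names are made up for this example. With a single user the interference sum is empty, so the denominator reduces to $\sigma_n^2$ and the result stays finite as long as the noise term is actually added.

```python
import math


def sinr_per_user(gains, noise_power):
    """Return rho_k for every user k.

    gains[k][j] is assumed to hold the complex scalar
    (h_{k,2} Phi H_1 + h_{k,3}) f_j, i.e. user k's effective channel
    applied to user j's beamformer.
    """
    if noise_power <= 0:
        raise ValueError("noise_power must be positive")
    n_users = len(gains)
    sinr = []
    for k in range(n_users):
        signal = abs(gains[k][k]) ** 2
        # Empty sum (0) when n_users == 1: the denominator is then noise only.
        interference = sum(abs(gains[k][j]) ** 2 for j in range(n_users) if j != k)
        sinr.append(signal / (interference + noise_power))
    return sinr


def sum_rate(gains, noise_power):
    """Sum of per-user rates, assuming rate_k = log2(1 + rho_k)."""
    return sum(math.log2(1.0 + rho) for rho in sinr_per_user(gains, noise_power))
```

One possible way to end up with ```Inf```, which these notes do not confirm, is a denominator that consists of the interference term alone: it is exactly zero when ```Nk == 1```, and array libraries typically return ```inf``` for that division instead of raising.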

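For bug 2, a termination check that skips the optimality test in the single-user case might look like the sketch below. The function name, the ```max_steps``` limit (standing in for whatever other termination conditions the environment has), and the tolerance-based comparison in place of an exact ```reward == opt_reward``` are all assumptions made for illustration.

```python
def is_terminal(reward, opt_reward, step, max_steps, n_users, tol=1e-9):
    """Hypothetical episode-termination check for the environment."""
    # Stand-in for the other termination conditions: a step budget.
    if step >= max_steps:
        return True
    # With one user the reward always equals opt_reward, so this test
    # would end every episode immediately; apply it only for n_users > 1.
    if n_users > 1 and abs(reward - opt_reward) <= tol:
        return True
    return False
```

With this guard, a single-user episode ends only through the remaining conditions, while multi-user episodes can still stop early once the optimal reward is reached.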
## **Training results**

### **PPO**
- Red: ```PPO-1-4-4-60```



- Cyan: ```PPO-2-4-4-76```

- Pink: ```PPO-3-4-4-92```

- Green: ```PPO-4-4-4-108```

- Comparison of all

- PPO training stats


### **A2C**
- Orange: ```A2C-1-4-4-60```

- Blue: ```A2C-2-4-4-76```

- Red: ```A2C-3-4-4-92```

- Cyan: ```A2C-4-4-4-108```

- Comparison of all

- A2C training stats

### **Training of more complex settings**
- Orange: ```PPO-4-16-16-816```

### **Training of more episodes**
- Orange: ```PPO-4-16-16-816``` with ```80``` episodes
- Blue: ```PPO-4-16-16-816``` with ```1000``` episodes


### **Comparison of all continuous agents**
- Setting: ```(Nk, Nt, Ns) = (2, 4, 4)```
- Pink: ```TD3```; Red: ```DDPG```; Orange: ```A2C```; Blue: ```PPO```; Cyan: ```SAC```
