# **meeting 01/09**
**Advisor: Prof. Chih-Yu Wang \
Presenter: Shao-Heng Chen \
Date: Jan 09, 2024**
<!-- Chih-Yu Wang -->
<!-- Wei-Ho Chung -->
## **Baseline method**
- Dominant Eigenvector Matching (DEM) heuristic for RIS Configuration (a code sketch of both baselines follows this list)
    - N. K. Kundu and M. R. McKay, "[RIS-Assisted MISO Communication: Optimal Beamformers and Performance Analysis](https://ieeexplore.ieee.org/abstract/document/9367504)," *2020 IEEE Globecom Workshops (GC Wkshps)*, Taipei, Taiwan, 2020, pp. 1-6. (Cited by 13)
    - S. Ragi, E. K. P. Chong and H. D. Mittelmann, "[Polynomial-Time Methods to Solve Unimodular Quadratic Programs With Performance Guarantees](https://ieeexplore.ieee.org/document/8534389)," in *IEEE Transactions on Aerospace and Electronic Systems*, vol. 55, no. 5, pp. 2118-2127, Oct. 2019. (Cited by 6)
    - Performance: SDR > DEM > Power method
    - Speed: DEM > Power method > SDR
- Maximum Ratio Transmission (MRT) for Precoder Design
    - J. Gao, C. Zhong, X. Chen, H. Lin and Z. Zhang, "[Unsupervised Learning for Passive Beamforming](https://ieeexplore.ieee.org/document/8955968)," in *IEEE Communications Letters*, vol. 24, no. 5, pp. 1052-1056, May 2020.
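
For reference, a minimal NumPy sketch of the two baselines, assuming a single-user MISO link with no direct BS-user path (so the effective channel is ```A @ theta```); the variable names and the toy channel model are illustrative and not taken from the cited papers' code:
```python
import numpy as np

def dem_ris_config(R: np.ndarray) -> np.ndarray:
    """Dominant Eigenvector Matching (DEM): approximate
    argmax_theta theta^H R theta s.t. |theta_i| = 1
    by keeping only the phases of the dominant eigenvector of R."""
    eigvals, eigvecs = np.linalg.eigh(R)        # R is Hermitian PSD
    u = eigvecs[:, np.argmax(eigvals)]          # dominant eigenvector
    return np.exp(1j * np.angle(u))             # project onto the unit-modulus set

def mrt_precoder(h_eff: np.ndarray, power: float = 1.0) -> np.ndarray:
    """Maximum Ratio Transmission: align the precoder with the effective channel."""
    return np.sqrt(power) * h_eff / np.linalg.norm(h_eff)

# Toy usage: M = 2 BS antennas, N = 16 RIS elements (matching the 2-16-* setups)
rng = np.random.default_rng(0)
M, N = 2, 16
G = (rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N))) / np.sqrt(2)   # BS-RIS
h_r = (rng.standard_normal(N) + 1j * rng.standard_normal(N)) / np.sqrt(2)           # RIS-user

A = G @ np.diag(h_r)                      # cascaded channel: h_eff = A @ theta
theta = dem_ris_config(A.conj().T @ A)    # DEM config maximizing ||A @ theta||^2
w = mrt_precoder(A @ theta)               # MRT precoder for the resulting effective channel
```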
## **Ns-to-MSE**
Each block below reports the mean/std/max/min of the logged objective (negative MSE, so values closer to zero are better) over the test samples for three cases: a random action, the per-user optimal configuration of the ```(2, 16, N)``` setup, and inference with the corresponding trained PPO model, as N sweeps 4, 16, 36, 64, 100.
#### ```2-16-4```
<!-- fixed 3-0-0 -->
```shell
--------------------------------
random action:
mean: -11.312477596133947
std: 8.794125918316428
max: -0.5136199593544006
min: -45.81297302246094
shape: (220,)
--------------------------------
--------------------------------
optimal of (2, 16, 4) for use 0:
mean: -3.8425655653774737
std: 2.622386486591075
max: -0.28868570923805237
min: -16.06785774230957
--------------------------------
optimal of (2, 16, 4) for use 1:
mean: -3.845380524083972
std: 2.5007277556465843
max: -0.25880759954452515
min: -14.829621315002441
--------------------------------
--------------------------------
model inference of PPO-2023-12-27-2-16-4-mse:
mean: -1.7125871181488037
std: 1.2584214210510254
max: -0.017516791820526123
min: -8.086286544799805
shape: (1, 220)
--------------------------------
```
#### ```2-16-16```
<!-- vary 0-0-1 -->
```shell
--------------------------------
random action:
mean: -23.287365135237575
std: 35.41724217592069
max: -0.016673624515533447
min: -344.7962341308594
shape: (688,)
--------------------------------
--------------------------------
optimal of (2, 16, 16) for use 0:
mean: -20.377414809659122
std: 34.135585934665926
max: -0.06222891807556152
min: -322.28173828125
--------------------------------
optimal of (2, 16, 16) for use 1:
mean: -19.44668525592983
std: 31.6871069320754
max: -0.05366373062133789
min: -302.0982971191406
--------------------------------
--------------------------------
model inference of PPO-2023-12-27-2-16-16-mse:
mean: -8.267539024353027
std: 12.143043518066406
max: -0.008706510066986084
min: -94.28921508789062
shape: (1, 688)
--------------------------------
```
#### ```2-16-36```
<!-- vary 0-0-3 -->
```shell
--------------------------------
random action:
mean: -47.43919567017257
std: 67.45026323784184
max: -0.107148677110672
min: -585.6828002929688
shape: (1468,)
--------------------------------
--------------------------------
optimal of (2, 16, 36) for use 0:
mean: -30.41546062089503
std: 40.02541676280052
max: -0.2797589600086212
min: -371.8368835449219
--------------------------------
optimal of (2, 16, 36) for use 1:
mean: -31.720209907114505
std: 46.39246529044339
max: -0.3546633720397949
min: -355.41552734375
--------------------------------
--------------------------------
model inference of PPO-2023-12-27-2-16-36-mse:
mean: -22.440160751342773
std: 43.54800796508789
max: -0.001824796199798584
min: -441.32025146484375
shape: (1, 1468)
--------------------------------
```
#### ```2-16-64```
<!-- fixed 0-0-0 -->
```shell
--------------------------------
random action:
mean: -74.7162386790663
std: 110.70062236094424
max: -0.22420012950897217
min: -969.4811401367188
shape: (2560,)
--------------------------------
--------------------------------
optimal of (2, 16, 64) for use 0:
mean: -61.581679619938136
std: 45.04678289304905
max: -0.7713916897773743
min: -340.6769714355469
--------------------------------
optimal of (2, 16, 64) for use 1:
mean: -87.58209336662293
std: 58.47832341113985
max: -2.1561169624328613
min: -421.2488708496094
--------------------------------
--------------------------------
model inference of PPO-2023-12-27-2-16-64-mse:
mean: -42.636146545410156
std: 78.62484741210938
max: -0.012428343296051025
min: -716.5140991210938
shape: (1, 2560)
--------------------------------
```
#### ```2-16-100```
<!-- fixed 0-0-1 -->
```shell
--------------------------------
random action:
mean: -124.16565789461136
std: 90.66109263659443
max: -1.781785488128662
min: -648.4274291992188
shape: (3964,)
--------------------------------
--------------------------------
optimal of (2, 16, 100) for use 0:
mean: -94.57515217864514
std: 69.11711167173908
max: -0.9999287128448486
min: -640.1063232421875
--------------------------------
optimal of (2, 16, 100) for use 1:
mean: -111.95038308048248
std: 80.5699742767558
max: -2.7255449295043945
min: -654.1275024414062
--------------------------------
--------------------------------
model inference of PPO-2023-12-27-2-16-100-mse:
mean: -48.63238525390625
std: 34.835445404052734
max: -0.8913658857345581
min: -264.3347473144531
shape: (1, 3964)
--------------------------------
```
<img src='https://hackmd.io/_uploads/HkIw4q8u6.png' width=80%>
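
A minimal matplotlib sketch that re-plots the mean values copied from the logs above, for a comparison similar to the figure (assuming the last field of ```2-16-N``` is the number of RIS elements and the logged values are negated MSE, i.e., rewards):
```python
import matplotlib.pyplot as plt

# Mean values copied from the logs above (negated MSE; closer to 0 is better).
N          = [4, 16, 36, 64, 100]
random_act = [-11.31, -23.29, -47.44, -74.72, -124.17]
optimal_u0 = [-3.84, -20.38, -30.42, -61.58, -94.58]
optimal_u1 = [-3.85, -19.45, -31.72, -87.58, -111.95]
ppo        = [-1.71, -8.27, -22.44, -42.64, -48.63]

plt.plot(N, random_act, marker='o', label='random action')
plt.plot(N, optimal_u0, marker='o', label='optimal (user 0)')
plt.plot(N, optimal_u1, marker='o', label='optimal (user 1)')
plt.plot(N, ppo, marker='o', label='PPO inference')
plt.xlabel('N (last field of 2-16-N)')
plt.ylabel('mean reward (negated MSE)')
plt.legend()
plt.grid(True)
plt.show()
```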
## **Appendix**
### **Validate MSE values**

```shell
---------------------------------------------------------------------------
['transmit_signal_x']
content:
tensor([[ 0.0932-2.1409j],
[-3.9791-0.5378j],
[-2.2114+2.0130j],
[-3.3006+0.5852j],
[-0.5526-3.4002j],
[-4.5900+6.1976j],
[ 0.7902+2.4265j],
[-8.2058+4.1681j],
[-3.4341+3.3394j],
[-5.8444+2.8024j]], device='cuda:0')
type: <class 'torch.Tensor'>
dtype: torch.complex64
shape: torch.Size([10, 1])
---------------------------------------------------------------------------
['received_signal_y']
content:
tensor([[-4.2374-3.1528j],
[ 8.1076-0.4383j],
[10.7308+15.6608j],
[-4.5685+0.8468j],
[-3.4787+15.8970j],
[-5.3407+1.4457j],
[-5.7019+0.7959j],
[ 6.2815-2.6457j],
[-1.0212-4.4433j],
[-2.8177+15.1880j]], device='cuda:0')
type: <class 'torch.Tensor'>
dtype: torch.complex64
shape: torch.Size([10, 1])
---------------------------------------------------------------------------
['noise_vector_n']
content:
tensor([[-8.5724e-04-1.3420e-03j],
[-6.4837e-04+1.4838e-03j],
[ 1.0125e-03-9.2602e-04j],
[ 5.4047e-05-3.4645e-05j],
[-1.1885e-04-3.4896e-04j],
[ 1.4667e-03-8.0594e-04j],
[ 3.5072e-04+6.3867e-04j],
[ 2.4134e-04-8.8169e-04j],
[-9.3497e-04+5.2690e-04j],
[ 4.8597e-04+1.4528e-04j]], device='cuda:0')
type: <class 'torch.Tensor'>
dtype: torch.complex64
shape: torch.Size([10, 1])
---------------------------------------------------------------------------
MSE_0: 19.77744483947754
MSE_1: 146.09861755371094
MSE_2: 353.7637939453125
MSE_3: 1.676080346107483
MSE_4: 380.94439697265625
MSE_5: 23.144906997680664
MSE_6: 44.80659103393555
MSE_7: 256.3103332519531
MSE_8: 66.39208221435547
MSE_9: 162.56309509277344
```
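The per-sample values above are consistent with ```MSE_i = |y_i - x_i|^2```; e.g., ```|(-4.2374-3.1528j) - (0.0932-2.1409j)|^2 ≈ 19.78```, matching ```MSE_0```. A minimal torch check, using the dumped tensors:
```python
import torch

def per_sample_mse(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    """Squared error |y_i - x_i|^2 per sample, matching the MSE_0 .. MSE_9 printout."""
    return (y - x).abs() ** 2

# Example with the first dumped entries of transmit_signal_x and received_signal_y:
x = torch.tensor([[0.0932 - 2.1409j]], dtype=torch.complex64)
y = torch.tensor([[-4.2374 - 3.1528j]], dtype=torch.complex64)
print(per_sample_mse(x, y))   # ~19.78, matching MSE_0
```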
### **How to evaluate an RL algorithm?**
Because most algorithms use exploration noise during training, you need a separate test environment to evaluate the performance of your agent at a given time. It is recommended to periodically evaluate your agent for ```n``` test episodes (```n``` is usually between ```5``` and ```20```) and average the reward per episode to obtain a good estimate.
As some policies are stochastic by default (e.g. ```A2C``` or ```PPO```), you should also try to set ```deterministic=True``` when calling the ```.predict()``` method; this frequently leads to better performance. Looking at the training curve (episode reward as a function of the timesteps) is a good proxy, but it underestimates the agent's true performance.
Reference: Stable-Baselines3 [How to evaluate an RL algorithm?](https://stable-baselines.readthedocs.io/en/master/guide/rl_tips.html)
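
With Stable-Baselines3 this can be done with ```evaluate_policy``` on a separate test environment; a sketch, where the checkpoint path is taken from the logs above and ```make_ris_env``` is a hypothetical helper for the custom RIS environment:
```python
from stable_baselines3 import PPO
from stable_baselines3.common.evaluation import evaluate_policy

# Placeholders: substitute the actual checkpoint path and RIS environment constructor.
model = PPO.load("PPO-2023-12-27-2-16-4-mse")   # trained agent checkpoint
eval_env = make_ris_env(config="2-16-4")        # separate test environment (hypothetical helper)

mean_reward, std_reward = evaluate_policy(
    model,
    eval_env,
    n_eval_episodes=10,    # n is usually between 5 and 20
    deterministic=True,    # deterministic actions for stochastic policies such as PPO
)
print(f"mean_reward = {mean_reward:.3f} +/- {std_reward:.3f}")
```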
<!--
## **TODO List**
- [ ] Plot the ```CDF``` figure
- [ ] Run training, or at least inference, on more samples, e.g., 10k or 1M
- [ ] Implement a ```conventional``` ```baseline``` solution
- [ ] Understand the physical meaning of ```MSE``` and what to compare it against
- [ ] Fix ```SumRate()``` and ```DownLinkRate()```
-->