# **meeting 01/09**
**Advisor: Prof. Chih-Yu Wang \
Presenter: Shao-Heng Chen \
Date: Jan 09, 2024**
<!-- Chih-Yu Wang -->
<!-- Wei-Ho Chung -->
## **Baseline method**
- Dominant Eigenvector Matching (DEM) heuristic for RIS Configuration (a code sketch of both baselines follows this list)
    - N. K. Kundu and M. R. McKay, "[RIS-Assisted MISO Communication: Optimal Beamformers and Performance Analysis](https://ieeexplore.ieee.org/abstract/document/9367504)," *2020 IEEE Globecom Workshops (GC Wkshps)*, Taipei, Taiwan, 2020, pp. 1-6. (Cited by 13)
    - S. Ragi, E. K. P. Chong and H. D. Mittelmann, "[Polynomial-Time Methods to Solve Unimodular Quadratic Programs With Performance Guarantees](https://ieeexplore.ieee.org/document/8534389)," in *IEEE Transactions on Aerospace and Electronic Systems*, vol. 55, no. 5, pp. 2118-2127, Oct. 2019. (Cited by 6)
    - Performance: SDR > DEM > Power method
    - Speed: DEM > Power method > SDR
- Maximum Ratio Transmission (MRT) for Precoder Design
    - J. Gao, C. Zhong, X. Chen, H. Lin and Z. Zhang, "[Unsupervised Learning for Passive Beamforming](https://ieeexplore.ieee.org/document/8955968)," in *IEEE Communications Letters*, vol. 24, no. 5, pp. 1052-1056, May 2020.
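
For reference, a minimal NumPy sketch of the two baselines, assuming a single-user MISO link with no direct BS-user path (so the effective channel is ```A @ theta```); the variable names and the toy channel model are illustrative and not taken from the cited papers' code:
```python
import numpy as np

def dem_ris_config(R: np.ndarray) -> np.ndarray:
    """Dominant Eigenvector Matching (DEM): approximate
    argmax_theta theta^H R theta s.t. |theta_i| = 1
    by keeping only the phases of the dominant eigenvector of R."""
    eigvals, eigvecs = np.linalg.eigh(R)        # R is Hermitian PSD
    u = eigvecs[:, np.argmax(eigvals)]          # dominant eigenvector
    return np.exp(1j * np.angle(u))             # project onto the unit-modulus set

def mrt_precoder(h_eff: np.ndarray, power: float = 1.0) -> np.ndarray:
    """Maximum Ratio Transmission: align the precoder with the effective channel."""
    return np.sqrt(power) * h_eff / np.linalg.norm(h_eff)

# Toy usage: M = 2 BS antennas, N = 16 RIS elements (matching the 2-16-* setups)
rng = np.random.default_rng(0)
M, N = 2, 16
G = (rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N))) / np.sqrt(2)   # BS-RIS
h_r = (rng.standard_normal(N) + 1j * rng.standard_normal(N)) / np.sqrt(2)           # RIS-user

A = G @ np.diag(h_r)                      # cascaded channel: h_eff = A @ theta
theta = dem_ris_config(A.conj().T @ A)    # DEM config maximizing ||A @ theta||^2
w = mrt_precoder(A @ theta)               # MRT precoder for the resulting effective channel
```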
## **Ns-to-MSE**
Each block below reports the mean/std/max/min of the logged objective (negative MSE, so values closer to zero are better) over the test samples for three cases: a random action, the per-user optimal configuration of the ```(2, 16, N)``` setup, and inference with the corresponding trained PPO model, as N sweeps 4, 16, 36, 64, 100.
#### ```2-16-4```
<!-- fixed 3-0-0 -->
```shell
--------------------------------
random action:
mean: -11.312477596133947
std: 8.794125918316428
max: -0.5136199593544006
min: -45.81297302246094
shape: (220,)
--------------------------------
--------------------------------
optimal of (2, 16, 4) for use 0:
mean: -3.8425655653774737
std: 2.622386486591075
max: -0.28868570923805237
min: -16.06785774230957
--------------------------------
optimal of (2, 16, 4) for use 1:
mean: -3.845380524083972
std: 2.5007277556465843
max: -0.25880759954452515
min: -14.829621315002441
--------------------------------
--------------------------------
model inference of PPO-2023-12-27-2-16-4-mse:
mean: -1.7125871181488037
std: 1.2584214210510254
max: -0.017516791820526123
min: -8.086286544799805
shape: (1, 220)
--------------------------------
```
#### ```2-16-16```
<!-- vary 0-0-1 -->
```shell
--------------------------------
random action:
mean: -23.287365135237575
std: 35.41724217592069
max: -0.016673624515533447
min: -344.7962341308594
shape: (688,)
--------------------------------
--------------------------------
optimal of (2, 16, 16) for use 0:
mean: -20.377414809659122
std: 34.135585934665926
max: -0.06222891807556152
min: -322.28173828125
--------------------------------
optimal of (2, 16, 16) for use 1:
mean: -19.44668525592983
std: 31.6871069320754
max: -0.05366373062133789
min: -302.0982971191406
--------------------------------
--------------------------------
model inference of PPO-2023-12-27-2-16-16-mse:
mean: -8.267539024353027
std: 12.143043518066406
max: -0.008706510066986084
min: -94.28921508789062
shape: (1, 688)
--------------------------------
```
#### ```2-16-36```
<!-- vary 0-0-3 -->
```shell
--------------------------------
random action:
mean: -47.43919567017257
std: 67.45026323784184
max: -0.107148677110672
min: -585.6828002929688
shape: (1468,)
--------------------------------
--------------------------------
optimal of (2, 16, 36) for use 0:
mean: -30.41546062089503
std: 40.02541676280052
max: -0.2797589600086212
min: -371.8368835449219
--------------------------------
optimal of (2, 16, 36) for use 1:
mean: -31.720209907114505
std: 46.39246529044339
max: -0.3546633720397949
min: -355.41552734375
--------------------------------
--------------------------------
model inference of PPO-2023-12-27-2-16-36-mse:
mean: -22.440160751342773
std: 43.54800796508789
max: -0.001824796199798584
min: -441.32025146484375
shape: (1, 1468)
--------------------------------
```
#### ```2-16-64```
<!-- fixed 0-0-0 -->
```shell
--------------------------------
random action:
mean: -74.7162386790663
std: 110.70062236094424
max: -0.22420012950897217
min: -969.4811401367188
shape: (2560,)
--------------------------------
--------------------------------
optimal of (2, 16, 64) for use 0:
mean: -61.581679619938136
std: 45.04678289304905
max: -0.7713916897773743
min: -340.6769714355469
--------------------------------
optimal of (2, 16, 64) for use 1:
mean: -87.58209336662293
std: 58.47832341113985
max: -2.1561169624328613
min: -421.2488708496094
--------------------------------
--------------------------------
model inference of PPO-2023-12-27-2-16-64-mse:
mean: -42.636146545410156
std: 78.62484741210938
max: -0.012428343296051025
min: -716.5140991210938
shape: (1, 2560)
--------------------------------
```
#### ```2-16-100```
<!-- fixed 0-0-1 -->
```shell
--------------------------------
random action:
mean: -124.16565789461136
std: 90.66109263659443
max: -1.781785488128662
min: -648.4274291992188
shape: (3964,)
--------------------------------
--------------------------------
optimal of (2, 16, 100) for use 0:
mean: -94.57515217864514
std: 69.11711167173908
max: -0.9999287128448486
min: -640.1063232421875
--------------------------------
optimal of (2, 16, 100) for use 1:
mean: -111.95038308048248
std: 80.5699742767558
max: -2.7255449295043945
min: -654.1275024414062
--------------------------------
--------------------------------
model inference of PPO-2023-12-27-2-16-100-mse:
mean: -48.63238525390625
std: 34.835445404052734
max: -0.8913658857345581
min: -264.3347473144531
shape: (1, 3964)
--------------------------------
```
<img src='https://hackmd.io/_uploads/HkIw4q8u6.png' width=80%>
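
A minimal matplotlib sketch that re-plots the mean values copied from the logs above, for a comparison similar to the figure (assuming the last field of ```2-16-N``` is the number of RIS elements and the logged values are negated MSE, i.e., rewards):
```python
import matplotlib.pyplot as plt

# Mean values copied from the logs above (negated MSE; closer to 0 is better).
N          = [4, 16, 36, 64, 100]
random_act = [-11.31, -23.29, -47.44, -74.72, -124.17]
optimal_u0 = [-3.84, -20.38, -30.42, -61.58, -94.58]
optimal_u1 = [-3.85, -19.45, -31.72, -87.58, -111.95]
ppo        = [-1.71, -8.27, -22.44, -42.64, -48.63]

plt.plot(N, random_act, marker='o', label='random action')
plt.plot(N, optimal_u0, marker='o', label='optimal (user 0)')
plt.plot(N, optimal_u1, marker='o', label='optimal (user 1)')
plt.plot(N, ppo, marker='o', label='PPO inference')
plt.xlabel('N (last field of 2-16-N)')
plt.ylabel('mean reward (negated MSE)')
plt.legend()
plt.grid(True)
plt.show()
```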
## **Appendix**
### **Validate MSE values**

```shell
---------------------------------------------------------------------------
['transmit_signal_x']
content:
tensor([[ 0.0932-2.1409j],
[-3.9791-0.5378j],
[-2.2114+2.0130j],
[-3.3006+0.5852j],
[-0.5526-3.4002j],
[-4.5900+6.1976j],
[ 0.7902+2.4265j],
[-8.2058+4.1681j],
[-3.4341+3.3394j],
[-5.8444+2.8024j]], device='cuda:0')
type: <class 'torch.Tensor'>
dtype: torch.complex64
shape: torch.Size([10, 1])
---------------------------------------------------------------------------
['received_signal_y']
content:
tensor([[-4.2374-3.1528j],
[ 8.1076-0.4383j],
[10.7308+15.6608j],
[-4.5685+0.8468j],
[-3.4787+15.8970j],
[-5.3407+1.4457j],
[-5.7019+0.7959j],
[ 6.2815-2.6457j],
[-1.0212-4.4433j],
[-2.8177+15.1880j]], device='cuda:0')
type: <class 'torch.Tensor'>
dtype: torch.complex64
shape: torch.Size([10, 1])
---------------------------------------------------------------------------
['noise_vector_n']
content:
tensor([[-8.5724e-04-1.3420e-03j],
[-6.4837e-04+1.4838e-03j],
[ 1.0125e-03-9.2602e-04j],
[ 5.4047e-05-3.4645e-05j],
[-1.1885e-04-3.4896e-04j],
[ 1.4667e-03-8.0594e-04j],
[ 3.5072e-04+6.3867e-04j],
[ 2.4134e-04-8.8169e-04j],
[-9.3497e-04+5.2690e-04j],
[ 4.8597e-04+1.4528e-04j]], device='cuda:0')
type: <class 'torch.Tensor'>
dtype: torch.complex64
shape: torch.Size([10, 1])
---------------------------------------------------------------------------
MSE_0: 19.77744483947754
MSE_1: 146.09861755371094
MSE_2: 353.7637939453125
MSE_3: 1.676080346107483
MSE_4: 380.94439697265625
MSE_5: 23.144906997680664
MSE_6: 44.80659103393555
MSE_7: 256.3103332519531
MSE_8: 66.39208221435547
MSE_9: 162.56309509277344
```
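The per-sample values above are consistent with ```MSE_i = |y_i - x_i|^2```; e.g., ```|(-4.2374-3.1528j) - (0.0932-2.1409j)|^2 ≈ 19.78```, matching ```MSE_0```. A minimal torch check, using the dumped tensors:
```python
import torch

def per_sample_mse(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    """Squared error |y_i - x_i|^2 per sample, matching the MSE_0 .. MSE_9 printout."""
    return (y - x).abs() ** 2

# Example with the first dumped entries of transmit_signal_x and received_signal_y:
x = torch.tensor([[0.0932 - 2.1409j]], dtype=torch.complex64)
y = torch.tensor([[-4.2374 - 3.1528j]], dtype=torch.complex64)
print(per_sample_mse(x, y))   # ~19.78, matching MSE_0
```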
### **How to evaluate an RL algorithm?**
Because most algorithms use exploration noise during training, you need a separate test environment to evaluate the performance of your agent at a given time. It is recommended to periodically evaluate your agent for ```n``` test episodes (```n``` is usually between ```5``` and ```20```) and average the reward per episode to obtain a good estimate.
As some policies are stochastic by default (e.g. ```A2C``` or ```PPO```), you should also try to set ```deterministic=True``` when calling the ```.predict()``` method; this frequently leads to better performance. Looking at the training curve (episode reward as a function of the timesteps) is a good proxy, but it underestimates the agent's true performance.
Reference: Stable-Baselines3 [How to evaluate an RL algorithm?](https://stable-baselines.readthedocs.io/en/master/guide/rl_tips.html)
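
With Stable-Baselines3 this can be done with ```evaluate_policy``` on a separate test environment; a sketch, where the checkpoint path is taken from the logs above and ```make_ris_env``` is a hypothetical helper for the custom RIS environment:
```python
from stable_baselines3 import PPO
from stable_baselines3.common.evaluation import evaluate_policy

# Placeholders: substitute the actual checkpoint path and RIS environment constructor.
model = PPO.load("PPO-2023-12-27-2-16-4-mse")   # trained agent checkpoint
eval_env = make_ris_env(config="2-16-4")        # separate test environment (hypothetical helper)

mean_reward, std_reward = evaluate_policy(
    model,
    eval_env,
    n_eval_episodes=10,    # n is usually between 5 and 20
    deterministic=True,    # deterministic actions for stochastic policies such as PPO
)
print(f"mean_reward = {mean_reward:.3f} +/- {std_reward:.3f}")
```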
<!--
## **TODO List**
- [ ] Plot the ```CDF``` figure
- [ ] Run training, or at least inference, on more samples, e.g., 10k or 1M
- [ ] Implement a ```conventional``` ```baseline``` solution
- [ ] Understand the physical meaning of ```MSE``` and what to compare it against
- [ ] Fix ```SumRate()``` and ```DownLinkRate()```
-->