# **meeting 10/03**

**Advisor: Prof. Chih-Yu Wang \ Presenter: Shao-Heng Chen \ Date: Oct 03, 2023**

## **Environment**

- Action space and observation space
  ![](https://hackmd.io/_uploads/ByOE_HOep.png)
  ![](https://hackmd.io/_uploads/SkQ6PSulT.png)
  ![](https://hackmd.io/_uploads/SJEzvH_xT.png)
- MSE reward design
  ![](https://hackmd.io/_uploads/r12B1Udlp.png)
  ![](https://hackmd.io/_uploads/SyAQgUOg6.png)
  - The upper limit of the MSE reward depends on the number of RIS elements, but I haven't worked out the exact relationship between them yet
    - For example, if we set ```num_RIS_elements=16```, the upper bound of the MSE reward appears to be around ```1000-1200```
    - If we set ```num_RIS_elements=256```, the upper bound appears to increase to around ```2500-2600```
- Downlink rate reward
  ![](https://hackmd.io/_uploads/r1yUGU_l6.png)

## **Current Progress**

- Hardware (my PCs)
  - ```i7-8700``` + ```RTX 2060 (6GB)```: requires nearly ```11 hours``` to complete ```1M steps```
  - ```i7-12700``` + ```RTX 3060Ti (8GB)```: in contrast, takes only ```8-9 hours```
- I have tried several common methods supported by ```Stable-Baselines3``` with their default hyper-parameter settings, just to see how things work

![](https://hackmd.io/_uploads/HkfnJaMlT.png)

## **Future work**

- Try normalizing the ```Box``` action space and making it symmetric
  ![](https://hackmd.io/_uploads/r1BilHdx6.png)
  (source: https://stable-baselines3.readthedocs.io/en/master/guide/rl_tips.html)
- Additionally, consider normalizing the observation space, although I'm not yet certain about its boundaries
- Try using 'SVD + water filling' to replace 'MRT'
- Consider refactoring the code using ```PyTorch``` for GPU acceleration, as ```numpy``` runs only on the CPU and might be relatively slow
- True discrete action space
  ![](https://hackmd.io/_uploads/HkLZEnwg6.png)
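
For the action-space normalization item in the future-work list, here is a minimal sketch of the rescaling the Stable-Baselines3 tips page suggests: the policy acts in a symmetric ```[-1, 1]``` box and the environment maps that back to its native bounds. The function names and the example bounds (four phase shifts in ```[0, 2*pi]```) are assumptions for illustration only; the real bounds are whatever the environment's ```Box``` space defines.

```python
import numpy as np


def to_symmetric(action, low, high):
    """Map an action from [low, high] to [-1, 1], element-wise."""
    action = np.asarray(action, dtype=np.float64)
    low = np.asarray(low, dtype=np.float64)
    high = np.asarray(high, dtype=np.float64)
    return 2.0 * (action - low) / (high - low) - 1.0


def from_symmetric(scaled, low, high):
    """Map a policy output in [-1, 1] back to [low, high], clipping first."""
    scaled = np.clip(np.asarray(scaled, dtype=np.float64), -1.0, 1.0)
    low = np.asarray(low, dtype=np.float64)
    high = np.asarray(high, dtype=np.float64)
    return low + 0.5 * (scaled + 1.0) * (high - low)


# Hypothetical bounds (assumption): four RIS phase shifts in [0, 2*pi].
low = np.zeros(4)
high = np.full(4, 2.0 * np.pi)

# A symmetric policy output and the native action it corresponds to.
policy_output = np.array([-1.0, 0.0, 0.5, 1.0])
phases = from_symmetric(policy_output, low, high)
```

In practice ```from_symmetric``` would be called at the top of the environment's ```step``` method, so the agent only ever sees the symmetric space. The same affine mapping could be applied to observations once their bounds are known.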