# **meeting 09/14**
**Advisor: Prof. Wei-Ho Chung \ Presenter: Shao-Heng Chen \ Date: Sep 14, 2023**

<!-- Chih-Yu Wang -->
<!-- Wei-Ho Chung -->

### **System model**
- Downlink RIS-aided MU-MIMO system
    - $N_k$ users, each with $N_r$ antennas
    - one BS with $N_t$ antennas
    - one RIS with $N_s$ elements
    - each transmitted symbol vector has length $L$
- Channel model
    - BS-RIS channel matrix $\mathbf{H}_1 \in \mathbb{C}^{N_s \times N_t}$
    - RIS-$k$-th-user channel matrix $\mathbf{H}_{2, k} = \hat{\mathbf{H}}_{2, k} + \Delta\mathbf{H}_{2, k} \in \mathbb{C}^{N_r \times N_s}$
    - BS-$k$-th-user channel matrix $\mathbf{H}_{3, k} = \hat{\mathbf{H}}_{3, k} + \Delta\mathbf{H}_{3, k} \in \mathbb{C}^{N_r \times N_t}$
    - AWGN vector at the $k$-th user $\mathbf{n}_k \in \mathbb{C}^{N_r \times 1}$, $\mathbf{n}_k \sim \mathcal{CN}(0, \sigma_n^2 \mathbf{I}_{N_r})$
- Beamforming matrices
    - Precoding matrix for the $k$-th user $\mathbf{F}_k \in \mathbb{C}^{N_t \times L}$
    - Combining matrix for the $k$-th user $\mathbf{W}_k \in \mathbb{C}^{N_r \times L}$
- The decoded signal at the $k$-th user
    $$
    \begin{align*}
    \mathbf{y}_k &= \mathbf{W}_k^H (\mathbf{H}_{2, k} \mathbf{\Phi} \mathbf{H}_1 + \mathbf{H}_{3, k}) \mathbf{F}_k \mathbf{x}_k + \mathbf{W}_k^H \mathbf{n}_k \\
    &= \mathbf{W}_k^H \ \tilde{\mathbf{H}} \ \mathbf{F}_k \mathbf{x}_k + \mathbf{W}_k^H \mathbf{n}_k
    \end{align*}
    $$
- $\mathbf{H}_{3, k} \in \mathbb{C}^{N_r \times N_t}, \forall k = 1, ..., N_k$ is the BS-$k$-th-user channel (direct path, NLOS **Rayleigh** fading channel)
    - actual direct channel = estimated direct channel (known at the BS and the RIS) + CSI uncertainty (unknown)
    $$
    \begin{align*}
    \mathbf{H}_{3, k} &= \hat{\mathbf{H}}_{3, k} + \Delta\mathbf{H}_{3, k} \in \mathbb{C}^{N_r \times N_t} \\
    \Delta\mathbf{H}_{3, k} &= \psi\frac{\mathbf{H}_{3, k}^{N_2}}{\| \mathbf{H}_{3, k}^{N_2} \|_2}, \;\; \mathbf{H}_{3, k}^{N_2} \sim \mathcal{CN}(0, 1)
    \end{align*}
    $$
- $\mathbf{H}_1 \in \mathbb{C}^{N_s \times N_t}$ is the BS-RIS channel (LOS **Rician**
fading)
    - channel matrix $\mathbf{H}_1$ is known at the BS and the RIS via backhaul transmission
    $$
    \begin{align*}
    \mathbf{H}_1 = C_r \cdot \left( \sqrt{\frac{\delta}{\;\delta + 1\;}} \cdot \textbf{a}_{\rm ULA}(\theta_1)\textbf{a}_{\rm ULA}(\eta_1)^H + \sqrt{\frac{1}{\;\delta + 1\;}} \cdot \bar{\mathbf{H}}_1 \right)
    \end{align*}
    $$
    - $C_r$ is the LOS path loss from the BS to the RIS
    - $\textbf{a}_{\rm ULA}(\cdot)$ is the ULA steering vector with angular parameters $\theta_1, \eta_1$ (element spacing = $10$ cm)
    - $\bar{\mathbf{H}}_1$ is the NLOS component, i.e. each entry is i.i.d. $\mathcal{CN}(0, 1)$
- $\mathbf{H}_{2, k} \in \mathbb{C}^{N_r \times N_s}, \forall k = 1, ..., N_k$ is the RIS-$k$-th-user channel (LOS **Rician** fading)
    - actual RIS-user channel = estimated RIS-user channel (known at the BS and the RIS) + CSI uncertainty (unknown)
    $$
    \begin{align*}
    \mathbf{H}_{2, k} &= \hat{\mathbf{H}}_{2, k} + \Delta\mathbf{H}_{2, k} \in \mathbb{C}^{N_r \times N_s} \\
    \Delta\mathbf{H}_{2, k} &= \psi\frac{\mathbf{H}_{2, k}^{N_1}}{\| \mathbf{H}_{2, k}^{N_1} \|_2}, \;\; \mathbf{H}_{2, k}^{N_1} \sim \mathcal{CN}(0, 1) \\
    \mathbf{H}_{2, k} &= C_u \cdot \left( \sqrt{\frac{\delta}{\;\delta + 1\;}} \cdot \textbf{a}_{\rm UPA}(\theta_2)\textbf{a}_{\rm UPA}(\eta_2)^H + \sqrt{\frac{1}{\;\delta + 1\;}} \cdot \bar{\mathbf{H}}_{2, k} \right)
    \end{align*}
    $$
    - $C_u$ is the LOS path loss from the RIS to the $k$-th user
    - $\textbf{a}_{\rm UPA}(\cdot)$ is the UPA steering vector with angular parameters $\theta_2, \eta_2$ (element spacing = $10$ cm)
        - The UPA steering vector formulation looks odd to me; I think it should at least specify both the azimuth and elevation angles, e.g. in the form $\textbf{a}_{\rm UPA}(\theta_2, \eta_2)\textbf{a}_{\rm UPA}(\theta_2, \eta_2)^H$?
    - $\bar{\mathbf{H}}_{2, k}$ is the NLOS component, i.e. each entry is i.i.d. $\mathcal{CN}(0, 1)$
- Reference:
    - W.-Y. Chen, C.-Y. Wang, R.-H. Hwang, W.-T. Chen and S.-Y.
Huang, "[Impact of Hardware Impairment on the Joint Reconfigurable Intelligent Surface and Robust Transceiver Design in MU-MIMO System](https://ieeexplore.ieee.org/document/10149520)," in *IEEE Transactions on Mobile Computing*.
    <img src='https://hackmd.io/_uploads/HJUuXJeJ6.png' width=60% height=60%>
- $\mathbf{\Phi} \triangleq \mathrm{diag}(\phi_1, ..., \phi_{N_s}) \in \mathbb{C}^{N_s \times N_s}$ is the diagonal reflection matrix of the RIS
    - $\phi_i = \beta(\varphi_i) \cdot e^{j\varphi_{i}}, \; \forall i = 1, ..., N_s$
        - in the simulation, $N_s$ is set to $25$ (quite large?)
    - $\beta(\varphi_i) = (1 - \beta_{min}) \cdot (\frac{\sin(\varphi_i - \mu) + 1}{2})^\kappa + \beta_{min}$
    - $\varphi_i = \hat{\varphi}_i + \varphi_i'$
        - actual phase shift = desired phase shift + phase error
    - $\hat{\varphi}_i \in \mathcal{A} = \left\{ \frac{2\pi n}{2^{bits}} \right\}_{n = 0}^{2^{bits} - 1}, \; \forall i = 1, ..., N_s$
        - in the simulation, $bits$ is set to $8$, which gives us $256$ choices
    - $\varphi_i'$ follows the von Mises distribution with PDF $f(\varphi_i' \mid \mu, \kappa) = \frac{\;e^{\kappa \cos(\varphi_i' - \mu)}\;}{\;2\pi I_0(\kappa)\;}$
        - $I_0(\kappa)$ is the modified Bessel function of the first kind of order $0$

### **Problem formulation**
- The objective is to minimize the worst-case MSE (min-max MSE)

$$
\begin{align*}
\min\limits_{\substack{\mathbf{W}_k, \; \mathbf{\Phi}, \; \mathbf{F}_k \\ \forall k = 1, \ ..., \ N_k}} \;\; &\max\limits_{\substack{\Delta\mathbf{H}_{2, k}, \; \Delta\mathbf{H}_{3, k} \\ \forall k = 1, \ ..., \ N_k}} \;\;\;\;\; \alpha \\
\textrm{s.t.} \;\; & \alpha \geq 0, \\
& E\left\{ tr\{(\mathbf{x}_k - \mathbf{y}_k)(\mathbf{x}_k - \mathbf{y}_k)^H \}\right\} \leq \alpha, \; \forall k = 1, \ldots, N_k, \\
& \sum\limits_{k = 1}^{N_k} tr\{ \mathbf{F}_k\mathbf{F}_k^H \} \leq P_{t}, \\
& | \phi_i | = \beta(\varphi_i) = (1 - \beta_{min}) \cdot (\frac{\sin(\varphi_i - \mu) + 1}{2})^\kappa + \beta_{min}, \; \beta_{min}
\geq 0, \; \mu \geq 0, \;\; \forall i = 1, \ldots, N_s, \\
& \varphi_i = \hat{\varphi}_i + \varphi_i' \ , \; \varphi_i \in [0, 2\pi), \;\; \forall i = 1, \ldots, N_s, \\
& \hat{\varphi}_i \in \mathcal{A} = \left\{ \frac{2\pi n}{2^{bits}} \right\}_{n = 0}^{2^{bits} - 1}, \; f(\varphi_i' \mid \mu, \kappa) = \frac{\;e^{\kappa \cos(\varphi_i' - \mu)}\;}{\;2\pi I_0(\kappa)\;}, \;\forall i = 1, \ldots, N_s, \\
& \| \Delta\mathbf{H}_{2, k} \|_2 \leq \psi, \; \forall k = 1, \ldots, N_k, \\
& \| \Delta\mathbf{H}_{3, k} \|_2 \leq \psi, \; \forall k = 1, \ldots, N_k.
\end{align*}
$$

### **Algorithm**
- Deep Reinforcement Learning
    <img src='https://hackmd.io/_uploads/Hk89kylkT.png' width=60% height=60%>

### **MSE derivation**
- I was wondering whether the MSE used in the Weighted Minimum Mean Squared Error (WMMSE) approach is the same as the MSE in the min-max MSE problem?
    - The MSE matrix of WMMSE; its trace is the scalar MSE
    $$
    \begin{align*}
    \mathbf{E}_k = E\left\{ (\mathbf{x}_k - \xi_k^{-1} \mathbf{y}_k)(\mathbf{x}_k - \xi_k^{-1} \mathbf{y}_k)^H \right\}, \quad tr\{ \mathbf{E}_k \} = E\left\{ \| \mathbf{x}_k - \xi_k^{-1} \mathbf{y}_k \|^2 \right\}
    \end{align*}
    $$
    - The MSE of Min-Max MSE
    $$
    \begin{align*}
    \text{MSE} &= E\left\{ tr\{(\mathbf{x}_k - \mathbf{y}_k)(\mathbf{x}_k - \mathbf{y}_k)^H \}\right\} = tr\left\{ E\{(\mathbf{x}_k - \mathbf{y}_k)(\mathbf{x}_k - \mathbf{y}_k)^H \} \right\} \\
    &= tr\left\{ (\mathbf{I}_L - \mathbf{W}_k^H \tilde{\mathbf{H}} \mathbf{F}_k)(\mathbf{I}_L - \mathbf{W}_k^H \tilde{\mathbf{H}} \mathbf{F}_k)^H + \sigma_n^2\mathbf{W}_k^H\mathbf{W}_k \right\}
    \end{align*}
    $$
- The derivation process

$$
\begin{align*}
&E\left\{ (\mathbf{x}_k - \mathbf{y}_k)(\mathbf{x}_k - \mathbf{y}_k)^H \right\} = \ E\{ (\mathbf{x}_k - (\mathbf{W}_k^H \tilde{\mathbf{H}} \mathbf{F}_k\mathbf{x}_k + \mathbf{W}_k^H\mathbf{n}_k)) (\mathbf{x}_k - (\mathbf{W}_k^H \tilde{\mathbf{H}} \mathbf{F}_k\mathbf{x}_k + \mathbf{W}_k^H\mathbf{n}_k))^H \} \\ \\
&= \ E\{ (\mathbf{x}_k - \mathbf{W}_k^H \tilde{\mathbf{H}} \mathbf{F}_k\mathbf{x}_k - \mathbf{W}_k^H\mathbf{n}_k)
(\mathbf{x}_k - \mathbf{W}_k^H \tilde{\mathbf{H}} \mathbf{F}_k\mathbf{x}_k - \mathbf{W}_k^H\mathbf{n}_k)^H \} \\ \\
&= E\{ (\mathbf{x}_k - \mathbf{W}_k^H\tilde{\mathbf{H}} \mathbf{F}_k \mathbf{x}_k - \mathbf{W}_k^H\mathbf{n}_k) (\mathbf{x}_k^H - \mathbf{x}_k^H(\mathbf{W}_k^H\tilde{\mathbf{H}} \mathbf{F}_k)^H - \mathbf{n}_k^H\mathbf{W}_k) \} \;\;\; (\because (\mathbf{AB})^H = \mathbf{B}^H\mathbf{A}^H) \\ \\
&= E\{ \mathbf{x}_k\mathbf{x}_k^H - \mathbf{x}_k\mathbf{x}_k^H(\mathbf{W}_k^H\tilde{\mathbf{H}} \mathbf{F}_k)^H - \mathbf{x}_k\mathbf{n}_k^H\mathbf{W}_k \\
& \;\;\;\;\;\;\;\;\; - (\mathbf{W}_k^H\tilde{\mathbf{H}} \mathbf{F}_k)\mathbf{x}_k\mathbf{x}_k^H + \mathbf{W}_k^H\tilde{\mathbf{H}} \mathbf{F}_k\mathbf{x}_k\mathbf{x}_k^H(\mathbf{W}_k^H\tilde{\mathbf{H}} \mathbf{F}_k)^H + \mathbf{W}_k^H\tilde{\mathbf{H}} \mathbf{F}_k\mathbf{x}_k\mathbf{n}_k^H\mathbf{W}_k \\
& \;\;\;\;\;\;\;\;\; - \mathbf{W}_k^H\mathbf{n}_k\mathbf{x}_k^H + \mathbf{W}_k^H\mathbf{n}_k\mathbf{x}_k^H(\mathbf{W}_k^H\tilde{\mathbf{H}} \mathbf{F}_k)^H + \mathbf{W}_k^H\mathbf{n}_k\mathbf{n}_k^H\mathbf{W}_k \} \\ \\
& \;\;\;\;\;\; (\because E\{ \mathbf{x}_k\mathbf{x}_k^H \} = \mathbf{I}_L, \ E\{ \mathbf{x}_k\mathbf{n}_k^H \} = \mathbf{0}, \ E\{ \mathbf{n}_k\mathbf{x}_k^H \} = \mathbf{0}, \ E\{ \mathbf{n}_k\mathbf{n}_k^H \} = \sigma_n^2 \mathbf{I}_{N_r} ) \\
&= \mathbf{I}_L - (\mathbf{W}_k^H\tilde{\mathbf{H}}\mathbf{F}_k)^H - \mathbf{W}_k^H\tilde{\mathbf{H}}\mathbf{F}_k + (\mathbf{W}_k^H\tilde{\mathbf{H}}\mathbf{F}_k)(\mathbf{W}_k^H\tilde{\mathbf{H}}\mathbf{F}_k)^H + \sigma_n^2\mathbf{W}_k^H\mathbf{W}_k \\ \\
&= (\mathbf{I}_L - \mathbf{W}_k^H\tilde{\mathbf{H}}\mathbf{F}_k)(\mathbf{I}_L - \mathbf{W}_k^H\tilde{\mathbf{H}}\mathbf{F}_k)^H + \sigma_n^2\mathbf{W}_k^H\mathbf{W}_k \;\;\; (\because (\mathbf{I} - \mathbf{A})(\mathbf{I} - \mathbf{A})^H = \mathbf{I} - \mathbf{A}^H - \mathbf{A} + \mathbf{A}\mathbf{A}^H)
\end{align*}
$$

- The result is very likely a diagonal matrix; this can be checked in the simulation. If it is indeed diagonal, the diagonal elements correspond to the MSE of each user
- If what we want is the expectation over the "phase error", rather than the expectation over the "received signal" as in WMMSE above, then perhaps the expectation cannot simply be dropped?
- References
    - K.-Y. Chen, H.-Y. Chang, R. Y. Chang and W.-H.
Chung, "[Hybrid Beamforming in mmWave MIMO-OFDM Systems via Deep Unfolding](https://ieeexplore.ieee.org/document/9860467)," *2022 IEEE 95th Vehicular Technology Conference: (VTC2022-Spring)*, Helsinki, Finland, 2022, pp. 1-7.
    <img src='https://hackmd.io/_uploads/ByyKUbs02.png' width=70% height=70%>
    - X. Zhao, T. Lin, Y. Zhu and J. Zhang, "[Partially-Connected Hybrid Beamforming for Spectral Efficiency Maximization via a Weighted MMSE Equivalence](https://ieeexplore.ieee.org/document/9467491)," in *IEEE Transactions on Wireless Communications*, vol. 20, no. 12, pp. 8218-8232, Dec. 2021.
    <img src='https://hackmd.io/_uploads/By2JUZsCh.png' width=70% height=70%>
    <img src='https://hackmd.io/_uploads/HyabLWi0h.png' width=70% height=70%>

## **Possible methods**
- For solving the optimization problem
    - Double DQN
    - Dueling DQN
    - Proximal Policy Optimization (PPO)
- For addressing the hybrid action space
    - Xiong, Jiechao, et al. "[Parametrized deep q-networks learning: Reinforcement learning with discrete-continuous hybrid action space](https://arxiv.org/abs/1810.06394)." *arXiv preprint arXiv:1810.06394* (2018). (**ICLR 2018**) (Cited by 132)
        - DQN + DDPG
    - Fu, Haotian, et al. "[Deep multi-agent reinforcement learning with discrete-continuous hybrid action spaces](https://arxiv.org/abs/1903.04959)." *arXiv preprint arXiv:1903.04959* (2019). (**IJCAI 2019**) (Cited by 50)
        - Deep Multi-Agent Parameterized Q-Networks (Deep MAPQN)
        - Deep Multi-Agent Hierarchical Hybrid Q-Networks (Deep MAHHQN)
    - Fan, Zhou, et al. "[Hybrid actor-critic reinforcement learning in parameterized action space](https://arxiv.org/abs/1903.01344)." *arXiv preprint arXiv:1903.01344* (2019). (**IJCAI 2019**) (Cited by 64)
        - Hybrid Proximal Policy Optimization (H-PPO)
    - Song, H. Francis, et al. "[V-mpo: On-policy maximum a posteriori policy optimization for discrete and continuous control](https://arxiv.org/abs/1909.12238)." *arXiv preprint arXiv:1909.12238* (2019).
(**ICLR 2020**) (Cited by 85)
        - V-MPO algorithm (On-policy MPO)
    - Neunert, Michael, et al. "[Continuous-discrete reinforcement learning for hybrid control in robotics](http://proceedings.mlr.press/v100/neunert20a/neunert20a.pdf)." *Conference on Robot Learning*. **PMLR, 2020**. (Cited by 71)
        - Maximum a Posteriori Policy Optimisation (MPO) algorithm
    - Li, Boyan, et al. "[Hyar: Addressing discrete-continuous action reinforcement learning via hybrid action representation](https://arxiv.org/abs/2109.05490)." *arXiv preprint arXiv:2109.05490* (2021). (**ICLR 2022**) (Cited by 16)
        - HyAR constructs a latent space and embeds the dependence between the discrete action and the continuous parameter via an embedding table and a conditional Variational Auto-Encoder (VAE)
    - C. Huang, H. Zhang, L. Wang, X. Luo and Y. Song, "[Mixed Deep Reinforcement Learning Considering Discrete-continuous Hybrid Action Space for Smart Home Energy Management](https://ieeexplore.ieee.org/document/9682649)," in *Journal of Modern Power Systems and Clean Energy*, vol. 10, no. 3, pp. 743-754, May 2022. (Cited by 7)
        - mixed deep reinforcement learning (MDRL) algorithm
        - DQN + DDPG

### **Future works**
- Code the simulation for the given system model (using `python`)
    - This will provide us with a behavior model for further use
- Convert the simulation program into a custom DRL environment
    - The custom environment has to follow the `gym` interface
        - though `gym` supports both `Tensorflow` and `PyTorch`, I don't know if the custom environment fits `Tensorflow`
    - We inherit the `gym.Env` class and override the template
        - define our own `action_space`, `observation_space`, `step()`, `reset()`, etc.
        - so that our custom environment fits modern algorithms, which are mostly coded in `PyTorch`, or we may use the built-in algorithms of `Stable Baselines3`
- Find or design a custom DRL algorithm that fits this environment to solve the optimization problem
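
A minimal sketch of what the custom environment in the plan above could look like, assuming a toy single-user, single-antenna reduction of the system model ($N_t = N_r = L = 1$) so that only the Python standard library is needed. The class name `ToyRisEnv`, its parameters (`amp_mu`, `amp_kappa`, `err_kappa`, `horizon`), the fixed scalar precoder and the scalar MMSE combiner are all illustrative assumptions, not the final design. It deliberately does not import `gym`; it only mimics the `reset()` / `step()` return convention of `gym` >= 0.26. With `gym` installed, the class would instead inherit `gym.Env` and declare `action_space` (e.g. `gym.spaces.MultiDiscrete`) and `observation_space` (e.g. `gym.spaces.Box`).

```python
import cmath
import math
import random


class ToyRisEnv:
    """Gym-style toy environment: one single-antenna user, n_elements RIS elements.

    Hypothetical simplification (N_t = N_r = L = 1): every channel is a
    complex scalar or a list of complex scalars.
    """

    def __init__(self, n_elements=4, bits=2, sigma2=0.1, beta_min=0.2,
                 amp_mu=0.0, amp_kappa=1.5, err_kappa=50.0, horizon=20, seed=0):
        self.n_elements = n_elements
        self.n_levels = 2 ** bits            # number of discrete phase choices
        self.sigma2 = sigma2                 # noise variance sigma_n^2
        self.beta_min, self.amp_mu, self.amp_kappa = beta_min, amp_mu, amp_kappa
        self.err_kappa = err_kappa           # concentration of the phase error
        self.horizon = horizon               # steps per episode (assumption)
        self.rng = random.Random(seed)

    def _cn(self):
        # One CN(0, 1) sample: real and imaginary parts have variance 1/2 each.
        s = math.sqrt(0.5)
        return complex(self.rng.gauss(0, s), self.rng.gauss(0, s))

    def _beta(self, phase):
        # Phase-dependent reflection amplitude beta(varphi) from the notes.
        base = (math.sin(phase - self.amp_mu) + 1.0) / 2.0
        return (1.0 - self.beta_min) * base ** self.amp_kappa + self.beta_min

    def _obs(self):
        # Observation: real/imag parts of the cascaded and direct channels.
        cascaded = [a * b for a, b in zip(self.h2, self.h1)]
        flat = []
        for c in cascaded + [self.h3]:
            flat += [c.real, c.imag]
        return flat

    def reset(self):
        self.h1 = [self._cn() for _ in range(self.n_elements)]  # BS -> RIS
        self.h2 = [self._cn() for _ in range(self.n_elements)]  # RIS -> user
        self.h3 = self._cn()                                     # BS -> user
        self.t = 0
        return self._obs(), {}

    def mse(self, phase_indices, f=1.0):
        # Effective channel h = h3 + sum_i h2_i * phi_i * h1_i.
        h = self.h3
        for n, a, b in zip(phase_indices, self.h2, self.h1):
            desired = 2.0 * math.pi * n / self.n_levels          # quantized phase
            error = self.rng.vonmisesvariate(0.0, self.err_kappa)
            phase = (desired + error) % (2.0 * math.pi)
            h += a * self._beta(phase) * cmath.exp(1j * phase) * b
        g = h * f
        w = g / (abs(g) ** 2 + self.sigma2)                      # scalar MMSE combiner
        # Scalar case of tr{(I - W^H H F)(I - W^H H F)^H + sigma^2 W^H W}.
        return abs(1 - w.conjugate() * g) ** 2 + self.sigma2 * abs(w) ** 2

    def step(self, action):
        assert len(action) == self.n_elements
        assert all(0 <= n < self.n_levels for n in action)
        value = self.mse(action)
        self.t += 1
        truncated = self.t >= self.horizon
        # Reward is the negative MSE, so maximizing reward minimizes MSE.
        return self._obs(), -value, False, truncated, {"mse": value}
```

A typical interaction would be `env = ToyRisEnv(seed=1)`, then `obs, info = env.reset()`, then `obs, reward, terminated, truncated, info = env.step([0, 1, 2, 3])`. The channels are held fixed within an episode in this sketch, and the worst-case CSI uncertainty ($\Delta\mathbf{H}_{2, k}$, $\Delta\mathbf{H}_{3, k}$) is left out for brevity.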