# **meeting 09/14**
**Advisor: Prof. Wei-Ho Chung \
Presenter: Shao-Heng Chen \
Date: Sep 14, 2023**
<!-- Chih-Yu Wang -->
<!-- Wei-Ho Chung -->
### **System model**
- Downlink RIS-aided MU-MIMO system
- $N_k$ users, each with $N_r$ antennas
- one BS with $N_t$ antennas
- one RIS with $N_s$ elements
- each user's transmitted symbol vector $\mathbf{x}_k$ has length $L$
- Channel model
- BS-RIS channel matrix $\mathbf{H}_1 \in \mathbb{C}^{N_s \times N_t}$
- RIS-$k$-th-user channel matrix $\mathbf{H}_{2, k} = \hat{\mathbf{H}}_{2, k} + \Delta\mathbf{H}_{2, k} \in \mathbb{C}^{N_r \times N_s}$
- BS-$k$-th-user channel matrix $\mathbf{H}_{3, k} = \hat{\mathbf{H}}_{3, k} + \Delta\mathbf{H}_{3, k} \in \mathbb{C}^{N_r \times N_t}$
- AWGN vector at the $k$-th user $\mathbf{n}_k \in \mathbb{C}^{N_r \times 1}$, $\mathbf{n}_k \sim \mathcal{CN}(0, \sigma_n^2 \mathbf{I}_{N_r})$
- Beamforming matrices
- Precoding matrix for $k$-th user $\mathbf{F}_k \in \mathbb{C}^{N_t \times L}$
- Combining matrix for $k$-th user $\mathbf{W}_k \in \mathbb{C}^{N_r \times L}$
- The decoded signal at the $k$-th user
$$
\begin{align*}
\mathbf{y}_k &= \mathbf{W}_k^H (\mathbf{H}_{2, k} \mathbf{\Phi} \mathbf{H}_1 + \mathbf{H}_{3, k}) \mathbf{F}_k \mathbf{x}_k + \mathbf{W}_k^H \mathbf{n}_k \\
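&= \mathbf{W}_k^H \ \tilde{\mathbf{H}} \ \mathbf{F}_k \mathbf{x}_k + \mathbf{W}_k^H \mathbf{n}_k, \;\; \tilde{\mathbf{H}} \triangleq \mathbf{H}_{2, k} \mathbf{\Phi} \mathbf{H}_1 + \mathbf{H}_{3, k}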
\end{align*}
$$
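A minimal NumPy sketch of the received-signal model, useful as a first building block for the simulation. It assumes $\mathbf{x}_k \sim \mathcal{CN}(\mathbf{0}, \mathbf{I}_L)$ and ideal unit-amplitude RIS elements (no $\beta(\cdot)$, no phase error). The sizes `Nt`, `Nr`, `L`, the noise power and the helper names are hypothetical; only $N_s = 25$ comes from the simulation setting below.

```python
import numpy as np

def crandn(rng, *shape):
    """i.i.d. CN(0, 1) entries (real and imaginary parts each with variance 1/2)."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

def effective_channel(H1, H2k, H3k, phi):
    """H_tilde = H_{2,k} Phi H_1 + H_{3,k}, with Phi = diag(phi)."""
    return H2k @ np.diag(phi) @ H1 + H3k

def received_signal(Wk, H_tilde, Fk, xk, nk):
    """y_k = W_k^H H_tilde F_k x_k + W_k^H n_k."""
    return Wk.conj().T @ (H_tilde @ Fk @ xk + nk)

rng = np.random.default_rng(0)
Nt, Nr, Ns, L, sigma2 = 4, 2, 25, 2, 0.1        # hypothetical sizes (Ns = 25 as in the note)
H1 = crandn(rng, Ns, Nt)
H2k = crandn(rng, Nr, Ns)
H3k = crandn(rng, Nr, Nt)
phi = np.exp(1j * rng.uniform(0, 2 * np.pi, Ns))  # unit-amplitude reflection coefficients
Fk = crandn(rng, Nt, L)
Wk = crandn(rng, Nr, L)
xk = crandn(rng, L)
nk = np.sqrt(sigma2) * crandn(rng, Nr)

H_tilde = effective_channel(H1, H2k, H3k, phi)
yk = received_signal(Wk, H_tilde, Fk, xk, nk)
print(H_tilde.shape, yk.shape)  # (2, 4) (2,)
```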
- $\mathbf{H}_{3, k} \in \mathbb{C}^{N_r \times N_t}, \forall k = 1, ..., N_k$ is the BS-$k$-th-user channel (direct path, NLOS **Rayleigh** fading channel)
- actual direct channel = estimated direct channel (known at the BS and the RIS) + CSI uncertainty (unknown)
$$
\begin{align*}
\mathbf{H}_{3, k} =& \ \hat{\mathbf{H}}_{3, k} + \Delta\mathbf{H}_{3, k} \in \mathbb{C}^{N_r \times N_t} \\
\Delta\mathbf{H}_{3, k} &= \psi\frac{\mathbf{H}_{3, k}^{N_2}}{\| \mathbf{H}_{3, k}^{N_2} \|_2}, \;\; \mathbf{H}_{3, k}^{N_2} \sim \mathcal{CN}(0, 1)
\end{align*}
$$
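The CSI-uncertainty model above could be generated as follows. The note does not say which matrix norm $\|\cdot\|_2$ denotes; this sketch assumes the spectral norm (NumPy's `ord=2`), and `psi`, the sizes and the function name `csi_error` are hypothetical.

```python
import numpy as np

def csi_error(rng, shape, psi, norm_ord=2):
    """Delta H = psi * G / ||G||, where G has i.i.d. CN(0, 1) entries.

    norm_ord=2 is the spectral norm (largest singular value);
    pass norm_ord="fro" if the Frobenius norm is intended instead.
    """
    G = (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)
    return psi * G / np.linalg.norm(G, ord=norm_ord)

rng = np.random.default_rng(1)
Nr, Nt, psi = 2, 4, 0.05  # hypothetical values
H3_hat = (rng.standard_normal((Nr, Nt)) + 1j * rng.standard_normal((Nr, Nt))) / np.sqrt(2)
dH3 = csi_error(rng, (Nr, Nt), psi)
H3 = H3_hat + dH3          # actual channel = estimate + unknown error
print(np.isclose(np.linalg.norm(dH3, ord=2), psi))  # True
```

Because of the normalization, every draw lies exactly on the boundary $\|\Delta\mathbf{H}\|_2 = \psi$ of the uncertainty set used in the problem formulation.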
- $\mathbf{H}_1 \in \mathbb{C}^{N_s \times N_t}$ is the BS-RIS channel (LOS **Rician** fading)
- channel matrix $\mathbf{H}_1$ is known at the BS and the RIS via backhaul transmission
$$
\begin{align*}
\mathbf{H}_1 = C_r \cdot (\sqrt{\frac{\delta}{\;\delta + 1\;}} \cdot \textbf{a}_{\rm ULA}(\theta_1)\textbf{a}_{\rm ULA}(\eta_1)^H + \sqrt{\frac{1}{\;\delta + 1\;}} \cdot \bar{\mathbf{H}}_1)
\end{align*}
$$
- $C_r$ is the LOS path loss from BS to RIS
- $\textbf{a}_{\rm ULA}(\cdot)$ is the ULA steering vector with angular parameters $\theta_1, \eta_1$ (element spacing $= 10$ cm)
- $\bar{\mathbf{H}}_1$ is the NLOS component, i.e., each entry is i.i.d. $\mathcal{CN}(0, 1)$
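A sketch of the Rician BS-RIS channel $\mathbf{H}_1$. For the rank-one LOS term to be $N_s \times N_t$, $\textbf{a}_{\rm ULA}(\theta_1)$ must have $N_s$ entries and $\textbf{a}_{\rm ULA}(\eta_1)$ must have $N_t$ entries. The steering-vector convention (unnormalized, phase $2\pi d n \sin(\cdot)/\lambda$), the wavelength of $0.2$ m and the values of $\delta$ and $C_r$ are assumptions; only the $10$ cm spacing comes from the note.

```python
import numpy as np

def a_ula(angle, n_elem, spacing=0.1, wavelength=0.2):
    """ULA steering vector, entries exp(j * 2*pi * spacing/wavelength * n * sin(angle)), n = 0..N-1.

    spacing = 0.1 m follows the note; wavelength = 0.2 m is a hypothetical value.
    """
    n = np.arange(n_elem)
    return np.exp(1j * 2 * np.pi * spacing / wavelength * n * np.sin(angle))

def bs_ris_channel(rng, Ns, Nt, theta1, eta1, delta, Cr):
    """H_1 = C_r (sqrt(delta/(delta+1)) a(theta1) a(eta1)^H + sqrt(1/(delta+1)) H_bar)."""
    los = np.outer(a_ula(theta1, Ns), a_ula(eta1, Nt).conj())   # Ns x Nt rank-one LOS part
    nlos = (rng.standard_normal((Ns, Nt)) + 1j * rng.standard_normal((Ns, Nt))) / np.sqrt(2)
    return Cr * (np.sqrt(delta / (delta + 1)) * los + np.sqrt(1 / (delta + 1)) * nlos)

rng = np.random.default_rng(2)
H1 = bs_ris_channel(rng, Ns=25, Nt=4, theta1=np.pi / 6, eta1=np.pi / 4, delta=10.0, Cr=1e-3)
print(H1.shape)  # (25, 4)
```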
- $\mathbf{H}_{2, k} \in \mathbb{C}^{N_r \times N_s}, \forall k = 1, ..., N_k$ is the RIS-$k$-th user channel (LOS **Rician** fading)
- actual RIS-user channel = estimated RIS-user CSI (known at the BS and the RIS) + CSI uncertainty (unknown)
$$
\begin{align*}
\mathbf{H}_{2, k} =& \ \hat{\mathbf{H}}_{2, k} + \Delta\mathbf{H}_{2, k} \in \mathbb{C}^{N_r \times N_s} \\
\Delta\mathbf{H}_{2, k} &= \psi\frac{\mathbf{H}_{2, k}^{N_1}}{\| \mathbf{H}_{2, k}^{N_1} \|_2}, \;\; \mathbf{H}_{2, k}^{N_1} \sim \mathcal{CN}(0, 1) \\
\mathbf{H}_{2, k} &= C_u \cdot (\sqrt{\frac{\delta}{\;\delta + 1\;}} \cdot \textbf{a}_{\rm UPA}(\theta_2)\textbf{a}_{\rm UPA}(\eta_2)^H + \sqrt{\frac{1}{\;\delta + 1\;}} \cdot \bar{\mathbf{H}}_{2, k})
\end{align*}
$$
- $C_u$ is the LOS path loss from the RIS to the $k$-th user
- $\textbf{a}_{\rm UPA}(\cdot)$ is the UPA steering vector with angular parameters $\theta_2, \eta_2$ (element spacing $= 10$ cm)
- The UPA steering vector formulation seems odd; I think it should at least specify both the azimuth and elevation angles, e.g., $\textbf{a}_{\rm UPA}(\theta_2, \eta_2)\textbf{a}_{\rm UPA}(\theta_2, \eta_2)^H$?
- $\bar{\mathbf{H}}_{2, k}$ is also the NLOS component, i.e., each entry is i.i.d. $\mathcal{CN}(0, 1)$
- Reference:
- W. -Y. Chen, C. -Y. Wang, R. -H. Hwang, W. -T. Chen and S. -Y. Huang, "[Impact of Hardware Impairment on the Joint Reconfigurable Intelligent Surface and Robust Transceiver Design in MU-MIMO System](https://ieeexplore.ieee.org/document/10149520)," in *IEEE Transactions on Mobile Computing*.
<img src='https://hackmd.io/_uploads/HJUuXJeJ6.png' width=60% height=60%>
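Regarding the question about the UPA steering vector, one common way to parameterize it by both azimuth and elevation is sketched below. The array geometry (elements in the x-y plane, elevation measured from the z-axis), the $5 \times 5$ layout for $N_s = 25$ and the wavelength are assumptions, not taken from the reference.

```python
import numpy as np

def a_upa(azimuth, elevation, nx, ny, spacing=0.1, wavelength=0.2):
    """UPA steering vector for an nx x ny array in the x-y plane.

    Element (m, n) sits at (m*spacing, n*spacing, 0); elevation is measured from the
    z-axis, so its phase is k*spacing*(m*sin(el)*cos(az) + n*sin(el)*sin(az)).
    The entry for element (m, n) is stored at index m*ny + n.
    """
    k = 2 * np.pi / wavelength
    ax = np.exp(1j * k * spacing * np.arange(nx) * np.sin(elevation) * np.cos(azimuth))
    ay = np.exp(1j * k * spacing * np.arange(ny) * np.sin(elevation) * np.sin(azimuth))
    return np.kron(ax, ay)

a = a_upa(np.pi / 3, np.pi / 4, nx=5, ny=5)   # 5 x 5 = 25 elements, matching N_s = 25
print(a.shape)  # (25,)
```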
- $\mathbf{\Phi} \triangleq \mathrm{diag}(\phi_1, ..., \phi_{N_s}) \in \mathbb{C}^{N_s \times N_s}$ is the diagonal reflection matrix of the RIS
- $\phi_i = \beta(\varphi_i) \cdot e^{j\varphi_{i}}, \; \forall i = 1, ..., N_s$
- in the simulation, the value of $N_s$ is set to $25$ (quite large?)
- $\beta(\varphi_i) = (1 - \beta_{min}) \cdot (\frac{\sin(\varphi_i - \mu) + 1}{2})^\kappa + \beta_{min}$
- $\varphi_i = \hat{\varphi_i} + \varphi_i'$
- actual phase shift = desired phase shift + phase error
- $\hat{\varphi_i} \in \mathcal{A} = \{ \frac{2\pi n}{2^{bits}} \}_{n = 0}^{2^{bits} - 1}, \; \forall i = 1, ..., N_s$
- in the simulation, $bits$ is set to $8$, which gives $2^8 = 256$ choices
- $\varphi_i'$ follows the von Mises distribution with PDF $f(\varphi_i'; \mu, \kappa) = \frac{e^{\kappa \cos(\varphi_i' - \mu)}}{2\pi I_0(\kappa)}$
- $I_0(\kappa)$ is the modified Bessel function of the first kind of order $0$
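The RIS reflection model ($\beta(\cdot)$, codebook phase, von Mises phase error) can be sketched with the Python standard library alone, using `random.vonmisesvariate`. The note reuses the symbols $\mu, \kappa$ for both the amplitude model and the von Mises distribution; the sketch assumes they are separate parameters (`beta_mu`, `beta_kappa` vs. `vm_mu`, `vm_kappa`), and all numerical values are hypothetical.

```python
import cmath
import math
import random

def phase_codebook(bits):
    """Quantized phases 2*pi*n / 2^bits, n = 0, ..., 2^bits - 1."""
    return [2 * math.pi * n / 2 ** bits for n in range(2 ** bits)]

def amplitude(phase, beta_min, beta_mu, beta_kappa):
    """beta(phase) = (1 - beta_min) * ((sin(phase - mu) + 1) / 2)^kappa + beta_min."""
    return (1 - beta_min) * ((math.sin(phase - beta_mu) + 1) / 2) ** beta_kappa + beta_min

def reflection_coefficients(desired_idx, bits, beta_min, beta_mu, beta_kappa,
                            vm_mu, vm_kappa, rng=random):
    """phi_i = beta(varphi_i) * exp(j * varphi_i), varphi_i = codebook phase + von Mises error."""
    book = phase_codebook(bits)
    coeffs = []
    for n in desired_idx:
        error = rng.vonmisesvariate(vm_mu, vm_kappa)      # phase error varphi_i'
        varphi = (book[n] + error) % (2 * math.pi)        # wrap into [0, 2*pi)
        coeffs.append(amplitude(varphi, beta_min, beta_mu, beta_kappa) * cmath.exp(1j * varphi))
    return coeffs

rng = random.Random(0)
phi = reflection_coefficients([0] * 25, bits=8, beta_min=0.2, beta_mu=0.0, beta_kappa=1.5,
                              vm_mu=0.0, vm_kappa=10.0, rng=rng)   # hypothetical parameters
print(len(phase_codebook(8)), len(phi))  # 256 25
```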
### **Problem formulation**
- The objective is to minimize the worst case MSE (min-max MSE)
$$
\begin{align*}
\min\limits_{\mathbf{W}_k, \; \mathbf{\Phi}, \; \mathbf{F}_k \\ \forall k = 1, \ ..., \ N_k} \;\; &\max\limits_{\Delta\mathbf{H}_{2, k}, \; \Delta\mathbf{H}_{3, k} \\ \;\; \forall k = 1, \ ..., \ N_k } \;\;\;\;\; \alpha \\
\textrm {s.t.} \;\;
& \alpha \geq 0, \\
& E\left\{ tr\{(\mathbf{x}_k - \mathbf{y}_k)(\mathbf{x}_k - \mathbf{y}_k)^H \}\right\} \leq \alpha, \; \forall k = 1, \ldots, N_k, \\
& \sum\limits_{k = 1}^{N_k} tr\{ \mathbf{F}_k\mathbf{F}_k^H \} \leq P_{t}, \\
& | \phi_i | = \beta(\varphi_i) = (1 - \beta_{min}) \cdot (\frac{\sin(\varphi_i - \mu) + 1}{2})^\kappa + \beta_{min}, \; \beta_{min} \geq 0, \; \mu\geq 0, \;\; \forall i = 1, \ldots, N_s, \\
& \varphi_i = \hat{\varphi_i} + \varphi_i' \ , \; \varphi_i \in [0, 2\pi), \;\; \forall i = 1, \ldots, N_s, \\
& \hat{\varphi_i} \in \mathcal{A} = \{ \frac{2\pi n}{2^{bits}} \}_{n = 0}^{2^{bits} - 1}, \; f(\varphi_i'; \mu, \kappa) = \frac{e^{\kappa \cos(\varphi_i' - \mu)}}{2\pi I_0(\kappa)}, \;\forall i = 1, \ldots, N_s, \\
& \| \Delta\mathbf{H}_{2, k} \|_2 \leq \psi, \; \forall k = 1, \ldots, N_k, \\
& \| \Delta\mathbf{H}_{3, k} \|_2 \leq \psi, \; \forall k = 1, \ldots, N_k.
\end{align*}
$$
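A sketch of how a candidate solution could be scored against this formulation: the per-user MSE, a simple common scaling to satisfy the sum-power constraint, and a sampled approximation of the inner maximization over $\Delta\mathbf{H}_{2,k}, \Delta\mathbf{H}_{3,k}$. Sampling is an assumption made for illustration (it only gives a lower bound on the true worst case), the spectral norm is assumed for $\|\cdot\|_2$, and all sizes and values are hypothetical.

```python
import numpy as np

def crandn(rng, *shape):
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

def mse_k(Wk, Hk, Fk, sigma2):
    """MSE_k = tr{(I - W^H H F)(I - W^H H F)^H + sigma^2 W^H W}."""
    B = np.eye(Fk.shape[1]) - Wk.conj().T @ Hk @ Fk
    return float(np.real(np.trace(B @ B.conj().T + sigma2 * Wk.conj().T @ Wk)))

def scale_to_power(F_list, Pt):
    """Common scaling so that sum_k tr{F_k F_k^H} <= Pt (a feasibility fix, not an optimizer)."""
    total = sum(np.real(np.trace(F @ F.conj().T)) for F in F_list)
    s = min(1.0, np.sqrt(Pt / total))
    return [s * F for F in F_list]

def sampled_worst_mse(rng, H1, H2_hat, H3_hat, phi, W_list, F_list, sigma2, psi, n_samples=200):
    """Approximate max over {||dH_2||_2 <= psi, ||dH_3||_2 <= psi} of max_k MSE_k.

    Errors are drawn on the boundary ||dH||_2 = psi; random sampling only yields
    a lower bound on the true worst case.
    """
    def err(shape):
        G = crandn(rng, *shape)
        return psi * G / np.linalg.norm(G, ord=2)

    worst = 0.0
    for _ in range(n_samples):
        alpha = 0.0  # alpha = max_k MSE_k for this error realization
        for H2, H3, W, F in zip(H2_hat, H3_hat, W_list, F_list):
            H_tilde = (H2 + err(H2.shape)) @ np.diag(phi) @ H1 + H3 + err(H3.shape)
            alpha = max(alpha, mse_k(W, H_tilde, F, sigma2))
        worst = max(worst, alpha)
    return worst

rng = np.random.default_rng(3)
Nk, Nt, Nr, Ns, L, Pt, sigma2 = 2, 4, 2, 25, 2, 1.0, 0.1   # hypothetical values
H1 = crandn(rng, Ns, Nt)
H2_hat = [crandn(rng, Nr, Ns) for _ in range(Nk)]
H3_hat = [crandn(rng, Nr, Nt) for _ in range(Nk)]
phi = np.exp(1j * rng.uniform(0, 2 * np.pi, Ns))
F_list = scale_to_power([crandn(rng, Nt, L) for _ in range(Nk)], Pt)
W_list = [crandn(rng, Nr, L) for _ in range(Nk)]
print(sampled_worst_mse(rng, H1, H2_hat, H3_hat, phi, W_list, F_list, sigma2, psi=0.05))
```

With `psi=0.0` the sampled value reduces to the nominal $\max_k \text{MSE}_k$, which is a handy sanity check.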
### **Algorithm**
- Deep Reinforcement Learning
<img src='https://hackmd.io/_uploads/Hk89kylkT.png' width=60% height=60%>
### **MSE derivation**
- Question: is the MSE used in the Weighted Minimum Mean Squared Error (WMMSE) approach the same as the MSE in the min-max MSE formulation?
- The MSE of WMMSE
$$
\begin{align*}
\mathbf{E}_k = E\left\{ (\mathbf{x}_k - \xi_k^{-1} \mathbf{y}_k)(\mathbf{x}_k - \xi_k^{-1} \mathbf{y}_k)^H \right\}, \;\;
tr\{\mathbf{E}_k\} = E\left\{ \| \mathbf{x}_k - \xi_k^{-1} \mathbf{y}_k \|^2 \right\}
\end{align*}
$$
- The MSE of Min-Max MSE
$$
\begin{align*}
\text{MSE} &= E\left\{ tr\{(\mathbf{x}_k - \mathbf{y}_k)(\mathbf{x}_k - \mathbf{y}_k)^H \}\right\}
= tr\left\{ E\{(\mathbf{x}_k - \mathbf{y}_k)(\mathbf{x}_k - \mathbf{y}_k)^H \} \right\} \\
&= tr\left\{ (\mathbf{I}_L - \mathbf{W}_k^H \tilde{\mathbf{H}} \mathbf{F}_k)(\mathbf{I}_L - \mathbf{W}_k^H \tilde{\mathbf{H}} \mathbf{F}_k)^H + \sigma_n^2\mathbf{W}_k^H\mathbf{W}_k \right\}
\end{align*}
$$
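To make the comparison with WMMSE concrete, the sketch below computes the MSE matrix with an explicit receive scaling $\xi_k$, under the assumptions $E\{\mathbf{x}_k\mathbf{x}_k^H\} = \mathbf{I}_L$, $E\{\mathbf{n}_k\mathbf{n}_k^H\} = \sigma_n^2\mathbf{I}_{N_r}$ and independent, zero-mean $\mathbf{x}_k, \mathbf{n}_k$. Setting $\xi_k = 1$ gives the min-max MSE expression above; the matrices are random placeholders.

```python
import numpy as np

def mse_matrix(Wk, H_tilde, Fk, sigma2, xi=1.0):
    """E_k = E{(x_k - y_k/xi)(x_k - y_k/xi)^H}.

    Assumes E{x x^H} = I_L, E{n n^H} = sigma2 * I_Nr, and x, n independent and zero-mean.
    Since x - y/xi = (I - A/xi) x - W^H n / xi with A = W^H H_tilde F,
    E_k = (I - A/xi)(I - A/xi)^H + (sigma2 / |xi|^2) W^H W.
    """
    A = Wk.conj().T @ H_tilde @ Fk
    B = np.eye(A.shape[0]) - A / xi
    return B @ B.conj().T + (sigma2 / abs(xi) ** 2) * (Wk.conj().T @ Wk)

rng = np.random.default_rng(4)
H = (rng.standard_normal((2, 4)) + 1j * rng.standard_normal((2, 4))) / np.sqrt(2)
F = (rng.standard_normal((4, 2)) + 1j * rng.standard_normal((4, 2))) / np.sqrt(2)
W = (rng.standard_normal((2, 2)) + 1j * rng.standard_normal((2, 2))) / np.sqrt(2)

wmmse_mse = np.trace(mse_matrix(W, H, F, 0.1, xi=2.0)).real   # WMMSE-style, with scaling xi
minmax_mse = np.trace(mse_matrix(W, H, F, 0.1)).real          # xi = 1: min-max MSE expression
```

Under these assumptions the two MSEs differ only by the scaling: since $\xi_k^{-1}\mathbf{W}_k^H = (\mathbf{W}_k/\xi_k^*)^H$, the WMMSE MSE equals the min-max MSE evaluated with the combiner $\mathbf{W}_k/\xi_k^*$.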
- The derivation process
$$
\begin{align*}
&E\left\{ (\mathbf{x}_k - \mathbf{y}_k)(\mathbf{x}_k - \mathbf{y}_k)^H \right\} = \ E\{ (\mathbf{x}_k - (\mathbf{W}_k^H \tilde{\mathbf{H}} \mathbf{F}_k\mathbf{x}_k + \mathbf{W}_k^H\mathbf{n}_k)) (\mathbf{x}_k - (\mathbf{W}_k^H \tilde{\mathbf{H}} \mathbf{F}_k\mathbf{x}_k + \mathbf{W}_k^H\mathbf{n}_k))^H \} \\ \\
&= \ E\{ (\mathbf{x}_k - \mathbf{W}_k^H \tilde{\mathbf{H}} \mathbf{F}_k\mathbf{x}_k - \mathbf{W}_k^H\mathbf{n}_k) (\mathbf{x}_k - \mathbf{W}_k^H \tilde{\mathbf{H}} \mathbf{F}_k\mathbf{x}_k - \mathbf{W}_k^H\mathbf{n}_k)^H \} \\ \\
&= E\{ (\mathbf{x}_k - \mathbf{W}_k^H\tilde{\mathbf{H}} \mathbf{F}_k \mathbf{x}_k - \mathbf{W}_k^H\mathbf{n}_k) (\mathbf{x}_k^H - \mathbf{x}_k^H(\mathbf{W}_k^H\tilde{\mathbf{H}} \mathbf{F}_k)^H - \mathbf{n}_k^H\mathbf{W}_k) \} \;\;\; (\because (\mathbf{AB})^H = \mathbf{B}^H\mathbf{A}^H) \\ \\
&= E\{ \mathbf{x}_k\mathbf{x}_k^H - \mathbf{x}_k\mathbf{x}_k^H(\mathbf{W}_k^H\tilde{\mathbf{H}} \mathbf{F}_k)^H - \mathbf{x}_k\mathbf{n}_k^H\mathbf{W}_k \\
& \;\;\;\;\;\;\;\;\; - (\mathbf{W}_k^H\tilde{\mathbf{H}} \mathbf{F}_k)\mathbf{x}_k\mathbf{x}_k^H + \mathbf{W}_k^H\tilde{\mathbf{H}} \mathbf{F}_k\mathbf{x}_k\mathbf{x}_k^H(\mathbf{W}_k^H\tilde{\mathbf{H}} \mathbf{F}_k)^H + \mathbf{W}_k^H\tilde{\mathbf{H}} \mathbf{F}_k\mathbf{x}_k\mathbf{n}_k^H\mathbf{W}_k \\
& \;\;\;\;\;\;\;\;\; - \mathbf{W}_k^H\mathbf{n}_k\mathbf{x}_k^H + \mathbf{W}_k^H\mathbf{n}_k\mathbf{x}_k^H(\mathbf{W}_k^H\tilde{\mathbf{H}} \mathbf{F}_k)^H + \mathbf{W}_k^H\mathbf{n}_k\mathbf{n}_k^H\mathbf{W}_k \} \\ \\
& \;\;\;\;\;\; (\because E\{ \mathbf{x}_k\mathbf{x}_k^H \} = \mathbf{I}_L, \ E\{ \mathbf{x}_k\mathbf{n}_k^H \} = E\{ \mathbf{n}_k\mathbf{x}_k^H \} = \mathbf{0}, \ E\{ \mathbf{n}_k\mathbf{n}_k^H \} = \sigma_n^2\mathbf{I}_{N_r} ) \\
&= \mathbf{I}_L - (\mathbf{W}_k^H\tilde{\mathbf{H}}\mathbf{F}_k)^H - \mathbf{0} - \mathbf{W}_k^H\tilde{\mathbf{H}}\mathbf{F}_k + \mathbf{W}_k^H\tilde{\mathbf{H}}\mathbf{F}_k(\mathbf{W}_k^H\tilde{\mathbf{H}}\mathbf{F}_k)^H + \mathbf{0} - \mathbf{0} + \mathbf{0} + \sigma_n^2\mathbf{W}_k^H\mathbf{W}_k \\ \\
&= (\mathbf{I}_L - \mathbf{W}_k^H\tilde{\mathbf{H}}\mathbf{F}_k)(\mathbf{I}_L - \mathbf{W}_k^H\tilde{\mathbf{H}}\mathbf{F}_k)^H + \sigma_n^2\mathbf{W}_k^H\mathbf{W}_k
\;\;\; (\because (\mathbf{I} - \mathbf{A})(\mathbf{I} - \mathbf{A})^H = \mathbf{I} - \mathbf{A}^H - \mathbf{A} + \mathbf{A}\mathbf{A}^H, \ \mathbf{A} = \mathbf{W}_k^H\tilde{\mathbf{H}}\mathbf{F}_k)
\end{align*}
$$
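The closed form can be checked numerically by Monte Carlo: draw many independent $\mathbf{x}_k$ and $\mathbf{n}_k$ for fixed (hypothetical, randomly drawn) $\tilde{\mathbf{H}}$, $\mathbf{F}_k$, $\mathbf{W}_k$, and compare the sample average of $(\mathbf{x}_k - \mathbf{y}_k)(\mathbf{x}_k - \mathbf{y}_k)^H$ with the derived expression. The sizes and sample count are assumptions.

```python
import numpy as np

def crandn(rng, *shape):
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

rng = np.random.default_rng(2)
Nt, Nr, L, sigma2, T = 4, 2, 2, 0.1, 200_000  # hypothetical sizes and sample count
H = crandn(rng, Nr, Nt)   # stands in for H_tilde
F = crandn(rng, Nt, L)
W = crandn(rng, Nr, L)

# Closed form: (I - W^H H F)(I - W^H H F)^H + sigma^2 W^H W
A = W.conj().T @ H @ F
closed = (np.eye(L) - A) @ (np.eye(L) - A).conj().T + sigma2 * W.conj().T @ W

# Monte Carlo: columns are independent draws of x_k ~ CN(0, I_L) and n_k ~ CN(0, sigma2 I_Nr)
X = crandn(rng, L, T)
N = np.sqrt(sigma2) * crandn(rng, Nr, T)
Y = W.conj().T @ (H @ F @ X + N)
Err = X - Y
empirical = Err @ Err.conj().T / T

print(np.max(np.abs(empirical - closed)))  # small, shrinks roughly like 1/sqrt(T)
```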
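- The result is likely a diagonal matrix; this can be checked in simulation. If it is indeed diagonal, the diagonal entries correspond to each user's MSE.
- If the expectation to be evaluated is over the "phase error", rather than over the "received signal" as in the WMMSE case above, then perhaps the expectation cannot simply be dropped?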
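If the expectation is kept, one simple option is a Monte Carlo average over the phase error: the closed-form MSE already averages over $\mathbf{x}_k$ and $\mathbf{n}_k$, and the remaining expectation over $\varphi_i'$ is estimated by sampling. This is a sketch under assumptions: $\beta(\cdot)$ is fixed to $1$, the von Mises parameters, sizes and sample count are hypothetical, and the beamformers are random placeholders.

```python
import numpy as np

def crandn(rng, *shape):
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

def mse(W, H_tilde, F, sigma2):
    """tr{(I - W^H H F)(I - W^H H F)^H + sigma^2 W^H W} (expectation over x and n already taken)."""
    B = np.eye(F.shape[1]) - W.conj().T @ H_tilde @ F
    return float(np.real(np.trace(B @ B.conj().T + sigma2 * W.conj().T @ W)))

def expected_mse_over_phase_error(rng, H1, H2, H3, W, F, desired, sigma2,
                                  vm_mu=0.0, vm_kappa=10.0, n_samples=2000):
    """Monte Carlo estimate of E_{phase error}[MSE] for fixed beamformers and desired phases.

    The amplitude beta(.) is fixed to 1 here to isolate the effect of the phase error.
    """
    total = 0.0
    for _ in range(n_samples):
        err = rng.vonmises(vm_mu, vm_kappa, size=desired.shape)   # varphi_i' ~ von Mises
        Phi = np.diag(np.exp(1j * (desired + err)))
        total += mse(W, H2 @ Phi @ H1 + H3, F, sigma2)
    return total / n_samples

rng = np.random.default_rng(5)
Nt, Nr, Ns, L, sigma2 = 4, 2, 25, 2, 0.1   # hypothetical values
H1, H2, H3 = crandn(rng, Ns, Nt), crandn(rng, Nr, Ns), crandn(rng, Nr, Nt)
F, W = crandn(rng, Nt, L) / 4, crandn(rng, Nr, L)
desired = 2 * np.pi * rng.integers(0, 256, Ns) / 256   # codebook phases with bits = 8

plug_in = mse(W, H2 @ np.diag(np.exp(1j * desired)) @ H1 + H3, F, sigma2)   # expectation dropped
averaged = expected_mse_over_phase_error(rng, H1, H2, H3, W, F, desired, sigma2)
print(plug_in, averaged)
```

Comparing `plug_in` (expectation dropped, desired phases used directly) with `averaged` shows how much the phase error changes the objective for a given setting.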
- References
- K. -Y. Chen, H. -Y. Chang, R. Y. Chang and W. -H. Chung, "[Hybrid Beamforming in mmWave MIMO-OFDM Systems via Deep Unfolding](https://ieeexplore.ieee.org/document/9860467)," *2022 IEEE 95th Vehicular Technology Conference: (VTC2022-Spring)*, Helsinki, Finland, 2022, pp. 1-7.
<img src='https://hackmd.io/_uploads/ByyKUbs02.png' width=70% height=70%>
- X. Zhao, T. Lin, Y. Zhu and J. Zhang, "[Partially-Connected Hybrid Beamforming for Spectral Efficiency Maximization via a Weighted MMSE Equivalence](https://ieeexplore.ieee.org/document/9467491)," in *IEEE Transactions on Wireless Communications*, vol. 20, no. 12, pp. 8218-8232, Dec. 2021.
<img src='https://hackmd.io/_uploads/By2JUZsCh.png' width=70% height=70%>
<img src='https://hackmd.io/_uploads/HyabLWi0h.png' width=70% height=70%>
### **Possible methods**
- For solving the optimization problem
- Double DQN
- Dueling DQN
- Proximal Policy Optimization (PPO)
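For reference, the core update rule of each listed method, written as framework-agnostic pure-Python sketches on plain lists that stand in for network outputs (in practice these would act on PyTorch tensors, and the sample numbers are made up).

```python
def double_dqn_targets(rewards, next_q_online, next_q_target, dones, gamma=0.99):
    """Double DQN: y = r + gamma * Q_target(s', argmax_a Q_online(s', a)); y = r if s' is terminal."""
    targets = []
    for r, q_on, q_tg, done in zip(rewards, next_q_online, next_q_target, dones):
        if done:
            targets.append(r)
        else:
            a_star = max(range(len(q_on)), key=q_on.__getitem__)  # action selected by the online net
            targets.append(r + gamma * q_tg[a_star])              # ...but evaluated by the target net
    return targets

def dueling_q(value, advantages):
    """Dueling DQN aggregation: Q(s, a) = V(s) + A(s, a) - mean_a' A(s, a')."""
    mean_adv = sum(advantages) / len(advantages)
    return [value + a - mean_adv for a in advantages]

def ppo_clip_objective(ratio, advantage, eps=0.2):
    """PPO clipped surrogate for one sample: min(r * A, clip(r, 1 - eps, 1 + eps) * A)."""
    clipped = max(min(ratio, 1 + eps), 1 - eps)
    return min(ratio * advantage, clipped * advantage)

# Vanilla DQN would bootstrap from max(q_tg) = 0.8 here; Double DQN uses q_tg[argmax q_on] = 0.4
print(double_dqn_targets([1.0, 0.5], [[0.2, 0.9], [1.0, 0.0]], [[0.8, 0.4], [2.0, 5.0]],
                         [False, True], gamma=0.9))
print(dueling_q(1.0, [1.0, 2.0, 3.0]))   # [0.0, 1.0, 2.0]
print(ppo_clip_objective(1.5, 1.0))      # 1.2
```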
- For addressing the hybrid-action space
- Xiong, Jiechao, et al. "[Parametrized deep q-networks learning: Reinforcement learning with discrete-continuous hybrid action space](https://arxiv.org/abs/1810.06394)." *arXiv preprint arXiv:1810.06394* (2018). (**ICLR 2018**) (Cited by 132)
- DQN + DDPG
- Fu, Haotian, et al. "[Deep multi-agent reinforcement learning with discrete-continuous hybrid action spaces](https://arxiv.org/abs/1903.04959)." *arXiv preprint arXiv:1903.04959* (2019). (**IJCAI 2019**) (Cited by 50)
- Deep Multi-Agent Parameterized Q-Networks (Deep MAPQN)
- Deep Multi-Agent Hierarchical Hybrid Q-Networks (Deep MAHHQN)
- Fan, Zhou, et al. "[Hybrid actor-critic reinforcement learning in parameterized action space](https://arxiv.org/abs/1903.01344)." *arXiv preprint arXiv:1903.01344* (2019). (**IJCAI 2019**) (Cited by 64)
- Hybrid Proximal Policy Optimization (H-PPO)
- Song, H. Francis, et al. "[V-mpo: On-policy maximum a posteriori policy optimization for discrete and continuous control](https://arxiv.org/abs/1909.12238)." *arXiv preprint arXiv:1909.12238* (2019). (**ICLR 2020**) (Cited by 85)
- V-MPO algorithm (On-policy MPO)
- Neunert, Michael, et al. "[Continuous-discrete reinforcement learning for hybrid control in robotics](http://proceedings.mlr.press/v100/neunert20a/neunert20a.pdf)." *Conference on Robot Learning*. **PMLR, 2020**. (Cited by 71)
- Maximum a posteriori Policy Optimisation (MPO) algorithm
- Li, Boyan, et al. "[Hyar: Addressing discrete-continuous action reinforcement learning via hybrid action representation](https://arxiv.org/abs/2109.05490)." *arXiv preprint arXiv:2109.05490* (2021). (**ICLR 2022**) (Cited by 16)
- HyAR constructs a latent representation space and embeds the dependence between the discrete action and the continuous parameter via an embedding table and a conditional Variational Auto-Encoder (VAE)
- C. Huang, H. Zhang, L. Wang, X. Luo and Y. Song, "[Mixed Deep Reinforcement Learning Considering Discrete-continuous Hybrid Action Space for Smart Home Energy Management](https://ieeexplore.ieee.org/document/9682649)," in *Journal of Modern Power Systems and Clean Energy*, vol. 10, no. 3, pp. 743-754, May 2022. (Cited by 7)
- mixed deep reinforcement learning (MDRL) algorithm
- DQN + DDPG
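Several of the hybrid-action methods above (e.g., P-DQN, listed as DQN + DDPG) pair a discrete choice with continuous parameters. A sketch of the P-DQN action-selection step: the actor proposes parameters $x_k$ for every discrete action $k$, and the Q-network picks $k^* = \arg\max_k Q(s, k, x_k)$. The toy actor and Q-function are hypothetical stand-ins for trained networks; exploration and training are omitted.

```python
from typing import Callable, List, Sequence, Tuple

def pdqn_select(state: Sequence[float],
                actor: Callable[[Sequence[float]], List[List[float]]],
                q_net: Callable[[Sequence[float], int, List[float]], float]) -> Tuple[int, List[float]]:
    """P-DQN action selection: x_k = actor(s)[k] for every discrete action k,
    then k* = argmax_k Q(s, k, x_k); returns (k*, x_{k*})."""
    params = actor(state)
    q_values = [q_net(state, k, x_k) for k, x_k in enumerate(params)]
    k_star = max(range(len(q_values)), key=q_values.__getitem__)
    return k_star, params[k_star]

# Toy stand-ins for the two networks (hypothetical, for illustration only)
def toy_actor(state):
    return [[0.1 * s for s in state], [0.5, -0.5], [1.0]]

def toy_q(state, k, x):
    return -abs(sum(x) - k)   # prefers parameters whose sum matches the action index

print(pdqn_select([1.0, 2.0], toy_actor, toy_q))
```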
### **Future works**
- Code the simulation for the given system model (using ```python```)
- This will provide us with a behavior model for further use
- Convert the simulation program into a custom DRL environment
- The custom environment has to follow the ```gym``` interface
- ```gym``` itself is framework-agnostic and can be used with both ```TensorFlow``` and ```PyTorch```, but I am not sure whether the custom environment fits ```TensorFlow```-based agents
- We subclass the ```gym.Env``` class and override its template methods
- i.e., define our own ```action_space```, ```observation_space```, ```step()```, ```reset()```, etc.
- so that our custom environment works with modern algorithms, most of which are implemented in ```PyTorch```, or with the built-in algorithms of ```Stable-Baselines3```
- Find or design a custom DRL algorithm that fits this environment to solve the optimization problem
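A minimal environment skeleton following the gymnasium-style API (`reset(seed=None, options=None)` returning `(obs, info)`, `step(action)` returning `(obs, reward, terminated, truncated, info)`, as in `gym` >= 0.26 and in the `gymnasium` package expected by Stable-Baselines3 2.x). To keep it runnable without extra dependencies it does not import `gym`; in the real code the class would subclass `gym.Env`/`gymnasium.Env` and declare the spaces noted in the docstring. The design (action = one codebook index per RIS element, fixed random beamformers, reward = negative worst-case MSE, fixed horizon) and the class name `RISEnvSketch` are hypothetical.

```python
import numpy as np

class RISEnvSketch:
    """Hypothetical environment mirroring the gymnasium-style API.

    In the real code this would subclass gym.Env / gymnasium.Env and declare, e.g.,
      action_space = spaces.MultiDiscrete([2**bits] * Ns)   # one codebook index per RIS element
      observation_space = spaces.Box(-np.inf, np.inf, shape=(obs_dim,), dtype=np.float32)
    """

    def __init__(self, Nt=4, Nr=2, Ns=25, Nk=2, L=2, bits=8, sigma2=0.1, horizon=10):
        self.Nt, self.Nr, self.Ns, self.Nk, self.L = Nt, Nr, Ns, Nk, L
        self.bits, self.sigma2, self.horizon = bits, sigma2, horizon
        self.rng = np.random.default_rng()

    def _crandn(self, *shape):
        return (self.rng.standard_normal(shape) + 1j * self.rng.standard_normal(shape)) / np.sqrt(2)

    def _obs(self):
        # Flatten the channels into one real-valued vector (real parts, then imaginary parts)
        flat = np.concatenate([p.ravel() for p in [self.H1] + self.H2 + self.H3])
        return np.concatenate([flat.real, flat.imag]).astype(np.float32)

    def reset(self, seed=None, options=None):
        if seed is not None:
            self.rng = np.random.default_rng(seed)
        self.t = 0
        self.H1 = self._crandn(self.Ns, self.Nt)
        self.H2 = [self._crandn(self.Nr, self.Ns) for _ in range(self.Nk)]
        self.H3 = [self._crandn(self.Nr, self.Nt) for _ in range(self.Nk)]
        # Fixed random beamformers (roughly unit total power) so the action only sets RIS phases
        self.F = [self._crandn(self.Nt, self.L) / np.sqrt(self.Nk * self.Nt * self.L) for _ in range(self.Nk)]
        self.W = [self._crandn(self.Nr, self.L) for _ in range(self.Nk)]
        return self._obs(), {}

    def step(self, action):
        phases = 2 * np.pi * np.asarray(action) / 2 ** self.bits   # codebook index -> phase
        Phi = np.diag(np.exp(1j * phases))
        mses = []
        for H2, H3, W, F in zip(self.H2, self.H3, self.W, self.F):
            B = np.eye(self.L) - W.conj().T @ (H2 @ Phi @ self.H1 + H3) @ F
            mses.append(float(np.real(np.trace(B @ B.conj().T + self.sigma2 * W.conj().T @ W))))
        reward = -max(mses)            # min-max MSE -> maximize the negative worst-case MSE
        self.t += 1
        truncated = self.t >= self.horizon
        return self._obs(), reward, False, truncated, {"mse": mses}

env = RISEnvSketch()
obs, info = env.reset(seed=0)
obs, reward, terminated, truncated, info = env.step(np.zeros(env.Ns, dtype=int))
print(obs.shape, reward)
```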