
# **meeting 09/05**

**Advisor: Prof. Chih-Yu Wang \ Presenter: Shao-Heng Chen \ Date: Sep 05, 2023**

## **Arbitrary plan**
- Figure out the system model, channel model, and objective function
    - So that we can identify the algorithms that best suit our problem
- Code the simulation for the given system model (using ```python```)
    - This gives us a behavior model for further use
- Convert the simulation program into a custom DRL environment
    - The custom environment has to follow the ```gym``` interface
        - Although ```gym``` supports both ```Tensorflow``` and ```PyTorch```, I don't know whether the custom environment will work with ```Tensorflow```
    - We inherit the ```gym.Env``` class and override the template
        - Define our own ```action_space```, ```observation_space```, ```step()```, ```reset()```, etc.
        - So that our custom environment works with modern algorithms, which are mostly coded in ```PyTorch```, or with the built-in algorithms of ```Stable Baselines3```
- Find or design a custom DRL algorithm that fits this environment to solve the optimization problem

## **Possible methods**
- For solving the optimization problem
    - Double DQN
    - Dueling DQN
    - Proximal Policy Optimization (PPO)
- For addressing the hybrid-action space
    - Xiong, Jiechao, et al. "[Parametrized deep q-networks learning: Reinforcement learning with discrete-continuous hybrid action space](https://arxiv.org/abs/1810.06394)." *arXiv preprint arXiv:1810.06394* (2018). (**ICLR 2018**) (Cited by 132)
        - DQN + DDPG
    - Fu, Haotian, et al. "[Deep multi-agent reinforcement learning with discrete-continuous hybrid action spaces](https://arxiv.org/abs/1903.04959)." *arXiv preprint arXiv:1903.04959* (2019). (**IJCAI 2019**) (Cited by 50)
        - Deep Multi-Agent Parameterized Q-Networks (Deep MAPQN)
        - Deep Multi-Agent Hierarchical Hybrid Q-Networks (Deep MAHHQN)
    - Fan, Zhou, et al. "[Hybrid actor-critic reinforcement learning in parameterized action space](https://arxiv.org/abs/1903.01344)." *arXiv preprint arXiv:1903.01344* (2019). (**IJCAI 2019**) (Cited by 64)
        - Hybrid Proximal Policy Optimization (H-PPO)
    - Song, H. Francis, et al. "[V-mpo: On-policy maximum a posteriori policy optimization for discrete and continuous control](https://arxiv.org/abs/1909.12238)." *arXiv preprint arXiv:1909.12238* (2019). (**ICLR 2020**) (Cited by 85)
        - V-MPO algorithm (on-policy MPO)
    - Neunert, Michael, et al. "[Continuous-discrete reinforcement learning for hybrid control in robotics](http://proceedings.mlr.press/v100/neunert20a/neunert20a.pdf)." *Conference on Robot Learning*. **PMLR, 2020**. (Cited by 71)
        - Maximum a Posteriori Policy Optimisation (MPO) algorithm
    - Li, Boyan, et al. "[Hyar: Addressing discrete-continuous action reinforcement learning via hybrid action representation](https://arxiv.org/abs/2109.05490)." *arXiv preprint arXiv:2109.05490* (2021). (**ICLR 2022**) (Cited by 16)
        - HyAR constructs a latent space and embeds the dependence between the discrete action and the continuous parameter via an embedding table and a conditional Variational Auto-Encoder (VAE)
    - C. Huang, H. Zhang, L. Wang, X. Luo and Y. Song, "[Mixed Deep Reinforcement Learning Considering Discrete-continuous Hybrid Action Space for Smart Home Energy Management](https://ieeexplore.ieee.org/document/9682649)," in *Journal of Modern Power Systems and Clean Energy*, vol. 10, no. 3, pp. 743-754, May 2022. (Cited by 7)
        - Mixed deep reinforcement learning (MDRL) algorithm
            - DQN + DDPG

## **Paper reading**
- W.-Y. Chen, C.-Y. Wang, R.-H. Hwang, W.-T. Chen and S.-Y. Huang, "[Impact of Hardware Impairment on the Joint Reconfigurable Intelligent Surface and Robust Transceiver Design in MU-MIMO System](https://ieeexplore.ieee.org/document/10149520)," in *IEEE Transactions on Mobile Computing*.
### **System model**
- Downlink RIS-aided MU-MIMO system
    - $U$ users, each with $N_r$ antennas
    - one BS with $N_t$ antennas
    - one RIS with $N_s$ elements
- The decoded signal at the $u$-th user
$$
\begin{align*}
\hat{\mathbf{s}}_u = \mathbf{F}_u \mathbf{y}_u = \mathbf{F}_u((\mathbf{H}_u^{RIS} \mathbf{\Phi} \mathbf{H}^{RIS} + \mathbf{K}_u) \mathbf{x} + \mathbf{n}_u)
\end{align*}
$$
- $\mathbf{K}_u \in \mathbb{C}^{N_r \times N_t}, \forall u = 1, ..., U$ is the channel from the BS to the $u$-th user (direct path, NLOS **Rayleigh** fading channel)
    - $\mathbf{K}_u = \hat{\mathbf{K}}_u + \Delta\mathbf{K}_u$
        - actual direct channel = estimated direct channel (known at the BS and the RIS) + CSI error (unknown)
- $\mathbf{H}^{RIS} \in \mathbb{C}^{N_s \times N_t}$ is the BS-RIS channel (LOS **Rician** fading)
    - the channel matrix $\mathbf{H}^{RIS}$ is known at the BS and the RIS through the backhaul transmission
    - $\mathbf{H}^{RIS} = C_r \cdot (\sqrt{\frac{\delta}{\;\delta + 1\;}} \cdot \textbf{a}_{\rm ULA}(\theta_1)\textbf{a}_{\rm ULA}(\eta_1)^H + \sqrt{\frac{1}{\;\delta + 1\;}} \cdot \bar{\mathbf{H}}^{RIS})$
        - $C_r$ is the LOS path loss from the BS to the RIS
        - $\textbf{a}_{\rm ULA}(\cdot)$ is the ULA steering vector with the angular parameters $\theta_1, \eta_1$ (with a separation distance of $10$ cm)
        - $\bar{\mathbf{H}}^{RIS}$ is the NLOS component, i.e. each entry is i.i.d. $\mathcal{CN}(0, 1)$
- $\mathbf{H}_u^{RIS} \in \mathbb{C}^{N_r \times N_s}, \forall u = 1, ..., U$ is the channel from the RIS to the $u$-th user (LOS **Rician** fading)
    - $\mathbf{H}_u^{RIS} = \hat{\mathbf{H}}_u^{RIS} + \Delta\mathbf{H}_u^{RIS}, \forall u = 1, ..., U$
        - actual RIS-user channel = estimated RIS-user CSI (known at the BS and the RIS) + CSI error (unknown)
    - $\mathbf{H}_u^{RIS} = C_u \cdot (\sqrt{\frac{\delta}{\;\delta + 1\;}} \cdot \textbf{a}_{\rm UPA}(\theta_2)\textbf{a}_{\rm UPA}(\eta_2)^H + \sqrt{\frac{1}{\;\delta + 1\;}} \cdot \bar{\mathbf{H}}_u^{RIS})$
        - $C_u$ is likewise the LOS path loss from the RIS to the $u$-th user
        - $\textbf{a}_{\rm UPA}(\cdot)$ is the UPA steering vector with the angular parameters $\theta_2, \eta_2$ (with a separation distance of $10$ cm)
            - Why does the UPA steering vector take only one angle? Shouldn't it have both an azimuth and an elevation angle? Or should it be written as $\textbf{a}_{\rm UPA}(\theta_2, \eta_2)\textbf{a}_{\rm UPA}(\theta_2, \eta_2)^H$?
        - $\bar{\mathbf{H}}_u^{RIS}$ is likewise the NLOS component, i.e. each entry is i.i.d. $\mathcal{CN}(0, 1)$
- $\mathbf{x} = \sum\limits_{k = 1}^{U}\mathbf{G}_k\mathbf{s}_k \in \mathbb{C}^{N_t \times 1}$ is the transmitted signal after precoding
    - $\mathbf{x}_u = \mathbf{G}_u\mathbf{s}_u$?
    - why isn't the dimension of $\mathbf{x}$ $\mathbb{C}^{U \times N_t \times 1}$?
    - $\mathbf{G}_u \in \mathbb{C}^{N_t \times L}, \forall u = 1, ..., U$ is the beamforming matrix (**precoder**)
    - $\mathbf{s}_u \in \mathbb{C}^{L \times 1}$ is the desired signal vector
        - $L$ is the symbol length of the signal
- $\mathbf{\Phi} \triangleq \mathrm{diag}(\phi_1, ..., \phi_{N_s}) \in \mathbb{C}^{N_s \times N_s}$ is the diagonal reflection matrix of the RIS
    - $\phi_i = \beta(\varphi_i) \cdot e^{j\varphi_{i}}, \; \forall i = 1, ..., N_s$
        - in the simulation, the value of $N_s$ is set to $25$ (quite large?)
        - $\beta(\varphi_i) = (1 - \beta_{min}) \cdot (\frac{\sin(\varphi_i - \mu) + 1}{2})^\kappa + \beta_{min}$
        - $\varphi_i = \hat{\varphi_i} + \varphi_i'$
            - actual phase shift = desired phase shift + phase error
            - $\hat{\varphi_i} \in \mathcal{A} = \{ e^{(j\frac{ \; 2\pi n \;}{2^{bits \;}})} \}_{n = 0}^{ 2^{bits} - 1}, \; \forall i = 1, ..., N_s$
                - in the simulation, I noted $bits = 8$ with $16$ choices; these do not match ($2^8 = 256$, while $16$ choices corresponds to $bits = 4$), so the value needs to be rechecked against the paper
            - $\varphi_i'$ follows the von Mises distribution with PDF $f(\varphi_i' \mid \mu, \kappa) = \frac{\;e^{\kappa \cos(\varphi_i' - \mu)}\;\;\;}{\;2\pi I_0(\kappa)\;}$
                - $I_0(\kappa)$ is the modified Bessel function of the first kind of order $0$
                - is it a continuous value?
            - $n$ is the index of the phase shifter, but I thought $i$ already specified the index of the RIS element
- $\mathbf{n}_u \in \mathbb{C}^{N_r \times 1}, \forall u = 1, ..., U$ is the AWGN vector, $\mathbf{n}_u \sim \mathcal{CN}(0, \sigma^2 I_{N_r})$
- $\mathbf{F}_u \in \mathbb{C}^{L \times N_r}$ is the linear equalizer (**combiner**?)
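
To make the RIS element model above concrete, here is a minimal sketch that samples one reflection coefficient $\phi_i = \beta(\varphi_i) e^{j\varphi_i}$ using only the Python standard library. The function names are hypothetical. The notes use $\mu, \kappa$ both in $\beta(\cdot)$ and in the von Mises PDF; the sketch takes the error parameters as separate arguments (```err_mu```, ```err_kappa```) so that either reading can be tried, which is an assumption on my side. The quantized levels are taken as the angles $2\pi n / 2^{bits}$.

```python
import cmath
import math
import random


def quantized_phases(bits):
    """Return the 2**bits uniformly spaced phase levels 2*pi*n / 2**bits."""
    levels = 2 ** bits
    return [2 * math.pi * n / levels for n in range(levels)]


def amplitude(phase, beta_min, mu, kappa):
    """Phase-dependent reflection amplitude beta(phase), as written in the notes."""
    return (1 - beta_min) * ((math.sin(phase - mu) + 1) / 2) ** kappa + beta_min


def reflection_coefficient(desired_phase, beta_min, mu, kappa, err_mu, err_kappa, rng):
    """Sample phi_i = beta(varphi_i) * exp(j * varphi_i) with a von Mises phase error."""
    # Phase error varphi_i' drawn from a von Mises distribution (angle in radians).
    phase_error = rng.vonmisesvariate(err_mu, err_kappa)
    # Actual phase = desired (quantized) phase + error, wrapped into [0, 2*pi).
    actual_phase = (desired_phase + phase_error) % (2 * math.pi)
    return amplitude(actual_phase, beta_min, mu, kappa) * cmath.exp(1j * actual_phase)


if __name__ == "__main__":
    rng = random.Random(0)
    levels = quantized_phases(bits=4)  # 16 candidate phase shifts (illustrative value)
    # Illustrative parameter values only; they are not taken from the paper.
    phi = [reflection_coefficient(rng.choice(levels), 0.2, 0.0, 2.0, 0.0, 4.0, rng)
           for _ in range(25)]
    print(len(phi), min(abs(p) for p in phi), max(abs(p) for p in phi))
```

By construction $|\phi_i| = \beta(\varphi_i)$ always lies in $[\beta_{min}, 1]$, and the actual phase is continuous even though the desired phase is picked from a finite set.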
### **Problem formulation**
- The objective is to minimize the worst-case MSE (min-max MSE)
$$
\begin{align*}
\min\limits_{\substack{\mathbf{\Phi}, \; \mathbf{G}_u, \; \mathbf{F}_u \\ \forall u = 1, \ldots, U}} \;\; &\max\limits_{\Delta\mathbf{H}_u^{RIS}, \; \Delta\mathbf{K}_u} \;\;\;\;\; \alpha \\
\textrm{s.t.} \;\; & \alpha \geq 0, \\
& E\left\{ tr\{(\hat{\mathbf{s}}_u - \mathbf{s}_u)(\hat{\mathbf{s}}_u - \mathbf{s}_u)^H \}\right\} \leq \alpha, \; \forall u = 1, \ldots, U, \\
& \sum\limits_{u = 1}^{U} tr\{ \mathbf{G}_u\mathbf{G}_u^{H} \} \leq P_{t}, \\
& \| \phi_i \|_2^2 = \beta(\varphi_i) = (1 - \beta_{min}) \cdot (\frac{\sin(\varphi_i - \mu) + 1}{2})^\kappa + \beta_{min}, \; \beta_{min} \geq 0, \; \mu\geq 0, \;\; \forall i = 1, \ldots, N_s, \\
& \varphi_i = \hat{\varphi_i} + \varphi_i' \ , \; \varphi_i \in [0, 2\pi), \;\; \forall i = 1, \ldots, N_s, \\
& \hat{\varphi_i} \in \mathcal{A} = \{ e^{(j\frac{ \; 2\pi n \;}{2^{bits \;}})} \}_{n = 0}^{ 2^{bits} - 1}, \; f(\varphi_i' \mid \mu, \kappa) = \frac{\;e^{\kappa \cos(\varphi_i' - \mu)}\;\;\;}{\;2\pi I_0(\kappa)\;}, \;\forall i = 1, \ldots, N_s, \\
& \| \Delta \mathbf{K}_u \|_2 \leq \psi, \; \forall u = 1, \ldots, U, \\
& \| \Delta \mathbf{H}_u^{RIS} \|_2 \leq \psi, \; \forall u = 1, \ldots, U.
\end{align*}
$$
- Can we exchange the order of the expectation and the trace?
$$
E\left\{ tr\{(\hat{\mathbf{s}}_u - \mathbf{s}_u)(\hat{\mathbf{s}}_u - \mathbf{s}_u)^H \}\right\} = tr\left\{ E\{(\hat{\mathbf{s}}_u - \mathbf{s}_u)(\hat{\mathbf{s}}_u - \mathbf{s}_u)^H \}\right\} ?
$$
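
On the question of swapping the expectation and the trace: as far as I can tell the answer is yes, because the trace is a finite sum of diagonal entries and the expectation is linear (assuming the second moments of the error are finite). A short worked derivation, where $\mathbf{e}_u$ is shorthand I introduce for the estimation error:

$$
\begin{align*}
\mathbf{e}_u &\triangleq \hat{\mathbf{s}}_u - \mathbf{s}_u \in \mathbb{C}^{L \times 1}, \\
E\left\{ tr\{ \mathbf{e}_u \mathbf{e}_u^H \} \right\}
&= E\left\{ \sum\limits_{l = 1}^{L} [\mathbf{e}_u \mathbf{e}_u^H]_{ll} \right\}
= \sum\limits_{l = 1}^{L} E\left\{ [\mathbf{e}_u \mathbf{e}_u^H]_{ll} \right\}
= tr\left\{ E\{ \mathbf{e}_u \mathbf{e}_u^H \} \right\}, \\
tr\{ \mathbf{e}_u \mathbf{e}_u^H \} &= \mathbf{e}_u^H \mathbf{e}_u = \| \mathbf{e}_u \|_2^2 .
\end{align*}
$$

So the MSE constraint can be read either as the expected squared error norm $E\{ \| \hat{\mathbf{s}}_u - \mathbf{s}_u \|_2^2 \}$ or as the trace of the error covariance matrix.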