# **meeting 09/05**
**Advisor: Prof. Chih-Yu Wang \
Presenter: Shao-Heng Chen \
Date: Sep 05, 2023**
<!-- Chih-Yu Wang -->
<!-- Wei-Ho Chung -->
## **Arbitrary plan**
- Figure out the system model, channel model and the objective function
- So that we can find the possible algorithms that may best suit our problem
- Code the simulation for the given system model (using ```python```)
- This will provide us with a behavior model for further use
- Convert the simulation program into a custom DRL environment
- The custom environment has to follow the ```gym``` interface
- Though ```gym``` can be used with both ```TensorFlow``` and ```PyTorch```, I don't know yet whether the custom environment will work with ```TensorFlow```
- We inherit the ```gym.Env``` class and override its template methods
- define our own ```action_space```, ```observation_space```, ```step()```, ```reset()```, etc.
- so that our custom environment works with modern algorithms, which are mostly implemented in ```PyTorch```, or with the built-in algorithms of ```Stable-Baselines3```
- Find or design a custom DRL algorithm that fits this environment to solve the optimization problem
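
As a rough illustration of the "inherit ```gym.Env``` and override the template" step, here is a minimal sketch. It assumes the ```gymnasium``` fork of ```gym``` (the API used by recent Stable-Baselines3 releases). The class name `RISEnvSketch`, the action design (one discrete phase index per RIS element), the observation, and the reward are all hypothetical placeholders, not the system model from the paper.

```python
import numpy as np

try:
    import gymnasium as gym
    from gymnasium import spaces
except ImportError:  # the older `gym` package (>= 0.26) exposes the same API
    import gym
    from gym import spaces


class RISEnvSketch(gym.Env):
    """Toy environment following the gym interface; all dynamics are placeholders."""

    def __init__(self, n_elements=4, n_levels=16, max_steps=10):
        super().__init__()
        self.n_elements = n_elements
        self.n_levels = n_levels
        self.max_steps = max_steps
        # Hypothetical action: one discrete phase index per RIS element.
        self.action_space = spaces.MultiDiscrete([n_levels] * n_elements)
        # Hypothetical observation: real and imaginary parts of a channel vector.
        self.observation_space = spaces.Box(
            low=-np.inf, high=np.inf, shape=(2 * n_elements,), dtype=np.float32
        )
        self._steps = 0
        self._channel = None

    def _observe(self):
        parts = [self._channel.real, self._channel.imag]
        return np.concatenate(parts).astype(np.float32)

    def reset(self, *, seed=None, options=None):
        super().reset(seed=seed)  # seeds self.np_random
        self._steps = 0
        real = self.np_random.standard_normal(self.n_elements)
        imag = self.np_random.standard_normal(self.n_elements)
        self._channel = (real + 1j * imag) / np.sqrt(2)
        return self._observe(), {}

    def step(self, action):
        self._steps += 1
        phases = 2 * np.pi * np.asarray(action) / self.n_levels
        # Placeholder reward: power of the phase-aligned sum of the channel entries.
        reward = float(np.abs(np.sum(self._channel * np.exp(1j * phases))) ** 2)
        terminated = False
        truncated = self._steps >= self.max_steps
        return self._observe(), reward, terminated, truncated, {}
```

Once the real simulation exists, `reset()` would draw a new channel realization and `step()` would return the (negative) worst-case MSE as the reward; the skeleton itself should not need to change much.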
## **Possible methods**
- For solving the optimization problem
- Double DQN
- Dueling DQN
- Proximal Policy Optimization (PPO)
- For addressing the hybrid-action space
- Xiong, Jiechao, et al. "[Parametrized deep q-networks learning: Reinforcement learning with discrete-continuous hybrid action space](https://arxiv.org/abs/1810.06394)." *arXiv preprint arXiv:1810.06394* (2018). (**ICLR 2018**) (Cited by 132)
- DQN + DDPG
- Fu, Haotian, et al. "[Deep multi-agent reinforcement learning with discrete-continuous hybrid action spaces](https://arxiv.org/abs/1903.04959)." *arXiv preprint arXiv:1903.04959* (2019). (**IJCAI 2019**) (Cited by 50)
- Deep Multi-Agent Parameterized Q-Networks (Deep MAPQN)
- Deep Multi-Agent Hierarchical Hybrid Q-Networks (Deep MAHHQN)
- Fan, Zhou, et al. "[Hybrid actor-critic reinforcement learning in parameterized action space](https://arxiv.org/abs/1903.01344)." *arXiv preprint arXiv:1903.01344* (2019). (**IJCAI 2019**) (Cited by 64)
- Hybrid Proximal Policy Optimization (H-PPO)
- Song, H. Francis, et al. "[V-mpo: On-policy maximum a posteriori policy optimization for discrete and continuous control](https://arxiv.org/abs/1909.12238)." *arXiv preprint arXiv:1909.12238* (2019). (**ICLR 2020**) (Cited by 85)
- V-MPO algorithm (On-policy MPO)
- Neunert, Michael, et al. "[Continuous-discrete reinforcement learning for hybrid control in robotics](http://proceedings.mlr.press/v100/neunert20a/neunert20a.pdf)." *Conference on Robot Learning*. **PMLR, 2020**. (Cited by 71)
- Maximum a posteriori Policy Optimisation (MPO) algorithm
- Li, Boyan, et al. "[Hyar: Addressing discrete-continuous action reinforcement learning via hybrid action representation](https://arxiv.org/abs/2109.05490)." *arXiv preprint arXiv:2109.05490* (2021). (**ICLR 2022**) (Cited by 16)
- HyAR constructs a latent action space and embeds the dependence between the discrete action and its continuous parameter via an embedding table and a conditional Variational Auto-Encoder (VAE)
- C. Huang, H. Zhang, L. Wang, X. Luo and Y. Song, "[Mixed Deep Reinforcement Learning Considering Discrete-continuous Hybrid Action Space for Smart Home Energy Management](https://ieeexplore.ieee.org/document/9682649)," in *Journal of Modern Power Systems and Clean Energy*, vol. 10, no. 3, pp. 743-754, May 2022. (Cited by 7)
- mixed deep reinforcement learning (MDRL) algorithm
- DQN + DDPG
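
To make the "DQN + DDPG" idea for hybrid action spaces more concrete, the sketch below shows only the action-selection step in the P-DQN style: a deterministic actor proposes one continuous parameter vector per discrete action, and a Q-function then scores every discrete action given those parameters. The random linear maps `W_actor` and `W_q` merely stand in for trained networks, and all sizes are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
STATE_DIM, N_DISCRETE, PARAM_DIM = 6, 3, 2  # hypothetical sizes

# Random linear maps stand in for trained networks (illustration only).
W_actor = rng.standard_normal((N_DISCRETE * PARAM_DIM, STATE_DIM))
W_q = rng.standard_normal((N_DISCRETE, STATE_DIM + N_DISCRETE * PARAM_DIM))


def select_hybrid_action(state, epsilon=0.0):
    """Return (k, x_k): a discrete action index and its continuous parameter."""
    # Actor part (DDPG-like): one bounded parameter vector per discrete action.
    params = np.tanh(W_actor @ state).reshape(N_DISCRETE, PARAM_DIM)
    # Critic part (DQN-like): one Q-value per discrete action, given all parameters.
    q_values = W_q @ np.concatenate([state, params.ravel()])
    if rng.random() < epsilon:
        k = int(rng.integers(N_DISCRETE))  # epsilon-greedy exploration
    else:
        k = int(np.argmax(q_values))
    return k, params[k]
```

In our problem the discrete part could be the quantized RIS phase indices and the continuous part the precoder entries, but that mapping is still an open design choice.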
## **Paper reading**
- W. -Y. Chen, C. -Y. Wang, R. -H. Hwang, W. -T. Chen and S. -Y. Huang, "[Impact of Hardware Impairment on the Joint Reconfigurable Intelligent Surface and Robust Transceiver Design in MU-MIMO System](https://ieeexplore.ieee.org/document/10149520)," in *IEEE Transactions on Mobile Computing*.
### **System model**
- Downlink RIS-aided MU-MIMO system
- $U$ users, each with $N_r$ antennas
- one BS with $N_t$ antennas
- one RIS with $N_s$ elements
- The decoded signal at $u$-th user
$$
\begin{align*}
\hat{\mathbf{s}}_u = \mathbf{F}_u \mathbf{y}_u
= \mathbf{F}_u((\mathbf{H}_u^{RIS} \mathbf{\Phi} \mathbf{H}^{RIS} + \mathbf{K}_u) \mathbf{x} + \mathbf{n}_u)
\end{align*}
$$
- $\mathbf{K}_u \in \mathbb{C}^{N_r \times N_t}, \forall u = 1, ..., U$ is the channel from the BS to the $u$-th user (direct path, NLOS **Rayleigh** fading channel)
- $\mathbf{K}_u = \hat{\mathbf{K}}_u + \Delta\mathbf{K}_u$
- actual direct channel = estimated direct channel (known at the BS and the RIS) + CSI error (unknown)
- $\mathbf{H}^{RIS} \in \mathbb{C}^{N_s \times N_t}$ is the BS-RIS channel (LOS **Rician** fading)
- channel matrix $\mathbf{H}^{RIS}$ is known at the BS and the RIS by the backhaul transmission
- $\mathbf{H}^{RIS} = C_r \cdot (\sqrt{\frac{\delta}{\;\delta + 1\;}} \cdot \textbf{a}_{\rm ULA}(\theta_1)\textbf{a}_{\rm ULA}(\eta_1)^H + \sqrt{\frac{1}{\;\delta + 1\;}} \cdot \bar{\mathbf{H}}^{RIS})$
- $C_r$ is the LOS path loss from BS to RIS
- $\textbf{a}_{\rm ULA}(\cdot)$ is the ULA steering vector with the angular parameters $\theta_1, \eta_1$ (with element spacing = $10$ cm)
- $\bar{\mathbf{H}}^{RIS}$ is the NLOS component, i.e. each entry follows i.i.d. $\mathcal{CN}(0, 1)$
- $\mathbf{H}_u^{RIS} \in \mathbb{C}^{N_r \times N_s}, \forall u = 1, ..., U$ is the RIS-$u$-th user channel (LOS **Rician** fading)
- $\mathbf{H}_u^{RIS} = \hat{\mathbf{H}}_u^{RIS} + \Delta\mathbf{H}_u^{RIS}, \forall u = 1, ..., U$
- actual RIS-user channel = estimated RIS-user CSI (known at the BS and the RIS) + CSI error (unknown)
- $\mathbf{H}_u^{RIS} = C_u \cdot (\sqrt{\frac{\delta}{\;\delta + 1\;}} \cdot \textbf{a}_{\rm UPA}(\theta_2)\textbf{a}_{\rm UPA}(\eta_2)^H + \sqrt{\frac{1}{\;\delta + 1\;}} \cdot \bar{\mathbf{H}}_u^{RIS})$
- $C_u$ is also the LOS path loss from RIS to $u$-th user
- $\textbf{a}_{\rm UPA}(\cdot)$ is the UPA steering vector with the angular parameters $\theta_2, \eta_2$ (with element spacing = $10$ cm)
- Why does the UPA steering vector take only one angle? Shouldn't it have both an azimuth and an elevation angle? Or should it be written as $\textbf{a}_{\rm UPA}(\theta_2, \eta_2)\textbf{a}_{\rm UPA}(\theta_2, \eta_2)^H$?
- $\bar{\mathbf{H}}_u^{RIS}$ is also the NLOS component, i.e. each entry follows i.i.d. $\mathcal{CN}(0, 1)$
- $\mathbf{x} = \sum\limits_{k = 1}^{U}\mathbf{G}_k\mathbf{s}_k \in \mathbb{C}^{N_t \times 1}$ is the transmitted signal after precoding
- $\mathbf{x}_u = \mathbf{G}_u\mathbf{s}_u$?
- why isn't the dimension of $\mathbf{x}$ equal to $\mathbb{C}^{U \times N_t \times 1}$?
- $\mathbf{G}_u \in \mathbb{C}^{N_t \times L}, \forall u = 1, ..., U$ is the beamforming matrix (**precoder**)
- $\mathbf{s}_u \in \mathbb{C}^{L \times 1}$ is the desired signal vector
- $L$ is the symbol length of the signal
- $\mathbf{\Phi} \triangleq diag(\phi_1, ..., \phi_{N_s}) \in \mathbb{C}^{N_s \times N_s}$ is the diagonal reflection matrix of the RIS
- $\phi_i = \beta(\varphi_i) \cdot e^{j\varphi_{i}}, \; \forall i = 1, ..., N_s$
- in the simulation, the value of $N_s$ is set to $25$ (quite large?)
- $\beta(\varphi_i) = (1 - \beta_{min}) \cdot (\frac{\sin(\varphi_i - \mu) + 1}{2})^\kappa + \beta_{min}$
- $\varphi_i = \hat{\varphi_i} + \varphi_i'$
- actual phase shift = desired phase shift + phase error
- $\hat{\varphi_i} \in \mathcal{A} = \{ e^{(j\frac{ \; 2\pi n \;}{2^{bits \;}})} \}_{n = 0}^{ 2^{bits} - 1}, \; \forall i = 1, ..., N_s$
- in the simulation, the value of $bits$ is set to $8$, which gives us $16$ choices (to be rechecked: $2^{8} = 256$, while $16$ choices would correspond to $bits = 4$)
- $\varphi_i'$ follows the von Mises distribution with PDF $f(\varphi_i' \mid \mu, \kappa) = \frac{e^{\kappa \cos(\varphi_i' - \mu)}}{2\pi I_0(\kappa)}$
- $I_0(\kappa)$ is the modified Bessel function of the first kind of order $0$
- is it a continuous value?
- $n$ is the index of the phase shifter, but I thought $i$ already specified the index of the RIS element
- $\mathbf{n}_u \in \mathbb{C}^{N_r \times 1}, \forall u = 1, ..., U$ is the AWGN vector, $\mathbf{n}_u \sim \mathcal{CN}(0, \sigma^2 I_{N_r})$
- $\mathbf{F}_u \in \mathbb{C}^{L \times N_r}$ is the linear equalizer (**combiner**?)
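
A minimal NumPy sketch of how the system model above could be simulated. It is not the authors' simulator and rests on several assumptions:

- Half-wavelength element spacing, since the carrier frequency is not noted here.
- A ULA steering vector for both links; the UPA is not modelled.
- Perfect CSI ($\Delta\mathbf{K}_u = \Delta\mathbf{H}_u^{RIS} = 0$).
- Random precoders scaled to the power budget, and a pseudo-inverse combiner as a placeholder for the real design.
- Separate, illustrative parameter values for the amplitude model and the von Mises phase error.
- $bits = 4$ (16 phase levels).

```python
import numpy as np


def ula_steering(n, angle, spacing=0.5):
    """ULA steering vector; `spacing` is in wavelengths (0.5 is an assumption)."""
    return np.exp(1j * 2 * np.pi * spacing * np.arange(n) * np.sin(angle))


def crandn(rng, *shape):
    """Array with i.i.d. CN(0, 1) entries."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)


def rician(rng, n_rx, n_tx, delta, theta, eta, path_loss=1.0):
    """C * (sqrt(d/(d+1)) * a(theta) a(eta)^H + sqrt(1/(d+1)) * H_bar)."""
    los = np.outer(ula_steering(n_rx, theta), ula_steering(n_tx, eta).conj())
    nlos = crandn(rng, n_rx, n_tx)
    return path_loss * (np.sqrt(delta / (delta + 1)) * los
                        + np.sqrt(1 / (delta + 1)) * nlos)


def ris_amplitude(phase, beta_min=0.2, mu=0.0, kappa=1.5):
    """Phase-dependent amplitude beta(phi); the default values are illustrative."""
    return (1 - beta_min) * ((np.sin(phase - mu) + 1) / 2) ** kappa + beta_min


def reflection_matrix(rng, n_s, bits, err_mu=0.0, err_kappa=50.0):
    """Phi = diag(beta(phi_i) e^{j phi_i}), phi_i = quantized phase + von Mises error."""
    desired = 2 * np.pi * rng.integers(0, 2**bits, size=n_s) / 2**bits
    error = rng.vonmises(err_mu, err_kappa, size=n_s)
    phase = np.mod(desired + error, 2 * np.pi)
    return np.diag(ris_amplitude(phase) * np.exp(1j * phase))


def simulate(seed=0, U=2, N_r=2, N_t=4, N_s=25, L=1, delta=10.0, P_t=1.0, sigma2=1e-2):
    rng = np.random.default_rng(seed)
    H_ris = rician(rng, N_s, N_t, delta, theta=0.3, eta=0.5)    # BS -> RIS
    Phi = reflection_matrix(rng, N_s, bits=4)
    G = [crandn(rng, N_t, L) for _ in range(U)]                  # placeholder precoders
    scale = np.sqrt(P_t / sum(np.trace(g @ g.conj().T).real for g in G))
    G = [scale * g for g in G]                                   # sum power equals P_t
    s = [crandn(rng, L, 1) for _ in range(U)]
    x = sum(g @ s_k for g, s_k in zip(G, s))                     # x = sum_k G_k s_k
    s_hat = []
    for u in range(U):
        H_u = rician(rng, N_r, N_s, delta, theta=0.7, eta=0.2)  # RIS -> user u
        K_u = crandn(rng, N_r, N_t)                              # Rayleigh direct path
        H_eff = H_u @ Phi @ H_ris + K_u
        n_u = np.sqrt(sigma2) * crandn(rng, N_r, 1)
        y_u = H_eff @ x + n_u
        F_u = np.linalg.pinv(H_eff @ G[u])                       # placeholder combiner
        s_hat.append(F_u @ y_u)
    return s, s_hat, x, Phi
```

Note that $\mathbf{x}$ comes out as a single $N_t \times 1$ vector: the precoded signals of all users are superimposed on the same $N_t$ transmit antennas rather than stacked per user.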
### **Problem formulation**
- The objective is to minimize the worst-case MSE over all users and all admissible CSI errors (min-max MSE)
$$
\begin{align*}
\min\limits_{\substack{\mathbf{\Phi}, \; \mathbf{G}_u, \; \mathbf{F}_u \\ \forall u = 1, \ldots, U}} \;\; &\max\limits_{\Delta\mathbf{H}_u^{RIS}, \; \Delta\mathbf{K}_u} \;\;\;\;\; \alpha \\
\textrm {s.t.} \;\;
& \alpha \geq 0, \\
& E\left\{ tr\{(\hat{\mathbf{s}_u} - \mathbf{s}_u)(\hat{\mathbf{s}_u} - \mathbf{s}_u)^H \}\right\} \leq \alpha, \; \forall u = 1, \ldots, U, \\
& \sum\limits_{u = 1}^{U} tr\{ \mathbf{G}_u\mathbf{G}_u^{\mathcal {H}} \} \leq P_{t}, \\
& \| \phi_i \|_2^2 = \beta(\varphi_i) = (1 - \beta_{min}) \cdot (\frac{\sin(\varphi_i - \mu) + 1}{2})^\kappa + \beta_{min}, \; \beta_{min} \geq 0, \; \mu\geq 0, \;\; \forall i = 1, \ldots, N_s, \\
& \varphi_i = \hat{\varphi_i} + \varphi_i' \ , \; \varphi_i \in [0, 2\pi), \;\; \forall i = 1, \ldots, N_s, \\
& \hat{\varphi_i} \in \mathcal{A} = \{ e^{(j\frac{ \; 2\pi n \;}{2^{bits \;}})} \}_{n = 0}^{ 2^{bits} - 1}, \; f(\varphi_i' \mid \mu, \kappa) = \frac{\;e^{\kappa \cos(\varphi_i' - \mu)}\;\;\;}{\;2\pi I_0(\kappa)\;}, \;\forall i = 1, \ldots, N_s, \\
& \| \Delta \mathbf{K}_u \|_2 \leq \psi, \; \forall u = 1, \ldots, U, \\
& \| \Delta \mathbf{H}_u^{RIS} \|_2 \leq \psi, \; \forall u = 1, \ldots, U.
\end{align*}
$$
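
For a fixed channel realization, the MSE constraint and the power constraint can be evaluated in closed form. The sketch below assumes unit-power, mutually independent symbols ($E\{\mathbf{s}_k\mathbf{s}_k^H\} = \mathbf{I}$) and noise independent of the symbols. It does not handle the inner maximization over the CSI errors. The function names and the `H_eff` argument (the effective channel $\mathbf{H}_u^{RIS}\mathbf{\Phi}\mathbf{H}^{RIS} + \mathbf{K}_u$ of each user) are hypothetical.

```python
import numpy as np


def per_user_mse(F, H_eff, G, sigma2):
    """Closed-form MSE of every user, assuming E[s_k s_k^H] = I."""
    U = len(G)
    mse = np.zeros(U)
    for u in range(U):
        L = G[u].shape[1]
        for k in range(U):
            A = F[u] @ H_eff[u] @ G[k]
            if k == u:
                A = A - np.eye(L)  # distortion of the desired signal
            # For k != u this term is the inter-user interference.
            mse[u] += np.linalg.norm(A, "fro") ** 2
        mse[u] += sigma2 * np.linalg.norm(F[u], "fro") ** 2  # noise term
    return mse


def worst_case_objective(F, H_eff, G, sigma2, P_t):
    """Return (alpha, feasible): alpha = max_u MSE_u and the power-constraint check."""
    total_power = sum(np.trace(g @ g.conj().T).real for g in G)
    alpha = per_user_mse(F, H_eff, G, sigma2).max()
    return alpha, bool(total_power <= P_t + 1e-12)
```

The negative of `alpha` is a natural candidate for the DRL reward, with the power constraint enforced by normalizing the precoders before evaluation.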
- Can we exchange the order of the expectation and the trace?
$$
E\left\{ tr\{(\hat{\mathbf{s}_u} - \mathbf{s}_u)(\hat{\mathbf{s}_u} - \mathbf{s}_u)^H \}\right\} = tr\left\{ E\{(\hat{\mathbf{s}_u} - \mathbf{s}_u)(\hat{\mathbf{s}_u} - \mathbf{s}_u)^H \}\right\} ?
$$
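
A short argument that the exchange should be valid, assuming the second moments of the error are finite: both the trace and the expectation are linear, so they commute. Writing $\mathbf{e}_u = \hat{\mathbf{s}}_u - \mathbf{s}_u$ (a shorthand introduced here) with entries $e_{u,l}$:

$$
\begin{align*}
E\left\{ tr\{\mathbf{e}_u\mathbf{e}_u^H\} \right\}
= E\left\{ \sum\limits_{l = 1}^{L} |e_{u,l}|^2 \right\}
= \sum\limits_{l = 1}^{L} E\left\{ |e_{u,l}|^2 \right\}
= tr\left\{ E\{\mathbf{e}_u\mathbf{e}_u^H\} \right\}
\end{align*}
$$

Both sides are therefore equal to $E\{\|\mathbf{e}_u\|_2^2\}$, the MSE of the $u$-th user.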