# **meeting 08/22**
**Advisor: Prof. Chih-Yu Wang \
Presenter: Shao-Heng Chen \
Date: Aug 22, 2023**
## **Paper reading**
- Saglam, Baturay, Doga Gurgunoglu, and Suleyman S. Kozat. "[Deep Reinforcement Learning Based Joint Downlink Beamforming and RIS Configuration in RIS-aided MU-MISO Systems Under Hardware Impairments and Imperfect CSI](https://arxiv.org/abs/2211.09702)." *arXiv preprint arXiv:2211.09702* (2022).
- Accepted to the 5th Workshop on Data Driven Intelligence for Networks and Systems (DDINS) at the *2023 IEEE International Conference on Communications*.
- Lead author: Baturay Saglam
<img src='https://hackmd.io/_uploads/Hy01Mq-an.png' width=75% height=75%>
### **System model**
- Downlink narrow-band RIS-aided MU-MISO system
- $K$ single-antenna users
- one BS with $M$ antennas
- one RIS with $L$ elements
- Consider 2 different environment models
- True Environment: Phase-dependent Amplitude + Perfect CSI
- Mismatch Environment: Lossless RIS Reflection + Imperfect CSI
#### **True Environment Model**
- The received signal at user $k$ can be expressed as:
$$
\begin{align*}
z_k &= \mathbf{h}_k^\mathsf{T} \mathbf{\Phi} \mathbf{H} \mathbf{G} \mathbf{x} + w_k \\
&= \phi^\mathsf{T} \text{diag}(\mathbf{h}_k) \mathbf{H} \mathbf{G} \mathbf{x} + w_k \\
&= \phi^\mathsf{T} \mathbf{D}_k \mathbf{G} \mathbf{x} + w_k
\end{align*}
$$
- $z_k$ is the received signal (complex scalar)
- $w_k$ is the additive receiver noise at the $k$-th user, $w_k \sim \mathcal{CN}(0, \sigma_w^2)$ for all $k$
- $\mathbf{G} \in \mathbb{C}^{M \times K}$ is the transmit beamforming matrix (i.e., the precoder)
- maps the $K$ data streams for $K$ users onto $M$ transmit antennas
- $\mathbf{x} \in \mathbb{C}^{K \times 1}$ is the vector of data streams for the $K$ users
- $\mathbf{H} \in \mathbb{C}^{L \times M}$ is the BS-RIS channel
- $\mathbf{\Phi} \triangleq \text{diag}(\phi_1, ..., \phi_L) \in \mathbb{C}^{L \times L}$ is the diagonal reflection matrix at the RIS
- $\mathbf{h}_k \in \mathbb{C}^{L \times 1}$ is the RIS-user $k$ channel
- $\mathbf{D}_k \triangleq \text{diag}(\mathbf{h}_k)\mathbf{H} \in \mathbb{C}^{L \times M}$ is the cascaded BS-RIS-user channel of user $k$
- The BS-RIS-user channel gets its own symbol so that the effect of imperfect CSI can be simulated.
- Combining the two channel matrices into one cascaded channel per user simplifies the later step of adding channel estimation error
- $\phi \in \mathbb{C}^{L \times 1}$ is the column vector of the diagonal entries of $\mathbf{\Phi}$
- The RIS follows the **phase-dependent amplitude model** (same as ours)
- with entries $\phi_l = \beta(\varphi_l) \cdot e^{j\varphi_{l}}, \ \varphi_l \in [0, 2 \pi)$, where the amplitude is:
$$\beta(\varphi_l) = (1 - \beta_{min}) \cdot (\frac{\sin(\varphi_l - \mu) + 1}{2})^\kappa + \beta_{min}$$
- $\beta_{min} \in \{0.3, \ 0.6\}, \ \mu = 0, \ \kappa = 1.5$ are constants that depend on the **hardware implementation** of the RIS
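
The phase-dependent amplitude and the cascaded-channel identity above can be checked numerically. The following is only a minimal NumPy sketch: the sizes `K, M, L`, the helper `crandn`, and the random Rayleigh-like channels are illustrative assumptions, not the paper's simulation setup, and the receiver noise $w_k$ is left out so that the two expressions can be compared exactly.

```python
import numpy as np

def ris_amplitude(varphi, beta_min=0.3, mu=0.0, kappa=1.5):
    """Phase-dependent amplitude beta(varphi) of an RIS element."""
    return (1.0 - beta_min) * ((np.sin(varphi - mu) + 1.0) / 2.0) ** kappa + beta_min

rng = np.random.default_rng(0)
K, M, L = 4, 4, 16  # illustrative sizes (assumption)

def crandn(*shape):
    """Hypothetical helper: i.i.d. CN(0, 1) samples of the given shape."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

H = crandn(L, M)   # BS-RIS channel
h = crandn(K, L)   # row k holds the RIS-user channel h_k
G = crandn(M, K)   # transmit beamforming matrix
x = crandn(K)      # data streams

varphi = rng.uniform(0.0, 2.0 * np.pi, size=L)     # phase shifts
phi = ris_amplitude(varphi) * np.exp(1j * varphi)  # phi_l = beta(varphi_l) e^{j varphi_l}

k = 0
z_direct = h[k] @ np.diag(phi) @ H @ G @ x   # h_k^T Phi H G x (noise-free)
D_k = np.diag(h[k]) @ H                      # cascaded channel diag(h_k) H
z_cascaded = phi @ D_k @ G @ x               # phi^T D_k G x (noise-free)
assert np.allclose(z_direct, z_cascaded)
```

The amplitude peaks at $1$ when $\varphi_l - \mu = \pi/2$ and drops to $\beta_{min}$ when $\varphi_l - \mu = -\pi/2$, which is why a lossless model overestimates the reflected power for most phase choices.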
#### **Mismatch Environment Model**
- The received signal at user $k$ can be expressed as:
$$z_k = \mathbf{h}_k^\mathsf{T} \mathbf{\Phi H Gx} + w_k
= \hat{\phi}^\mathsf{T} \hat{\mathbf{D}}_k \mathbf{G} \mathbf{x} + w_k$$
- The RIS reflections are assumed to be **lossless**, i.e., $\hat{\phi} \triangleq [e^{j\varphi_1}, \ e^{j\varphi_2}, \ ..., \ e^{j\varphi_L}]^\mathsf{T}$
- $\hat{\mathbf{D}}_k \triangleq \mathbf{D}_k + \mathbf{E}_k, \; \forall k = 1, ..., K$
- $\mathbf{E}_k \in \mathbb{C}^{L \times M}$ is the channel estimation error matrix of the **cascaded channel** of each user, with i.i.d. entries $e_{m, l}^{(k)} \sim \mathcal{CN}(0, \sigma_e^2)$
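
As a rough illustration of the mismatch model, the sketch below builds $\hat{\mathbf{D}}_k = \mathbf{D}_k + \mathbf{E}_k$ and the lossless $\hat{\phi}$ with NumPy. The sizes, the value of `sigma_e2`, the helper `crandn`, and the randomly drawn "true" cascaded channels are placeholders chosen for the example, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
K, M, L = 4, 4, 16   # illustrative sizes (assumption)
sigma_e2 = 1e-2      # illustrative estimation-error variance (assumption)

def crandn(shape, var=1.0):
    """Hypothetical helper: i.i.d. CN(0, var) samples of the given shape."""
    real = rng.standard_normal(shape)
    imag = rng.standard_normal(shape)
    return np.sqrt(var / 2.0) * (real + 1j * imag)

# Placeholder true cascaded channels; D[k] plays the role of D_k (L x M).
D = crandn((K, L, M))

# Estimation error E_k with i.i.d. CN(0, sigma_e^2) entries, one matrix per user.
E = crandn((K, L, M), var=sigma_e2)
D_hat = D + E

# Lossless reflection assumed by the BS: unit-modulus entries e^{j varphi_l}.
varphi = rng.uniform(0.0, 2.0 * np.pi, size=L)
phi_hat = np.exp(1j * varphi)
```

Splitting the variance evenly between the real and imaginary parts is what makes each entry circularly symmetric with total variance $\sigma_e^2$.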
### **Problem formulation**
- Consider 2 different objectives
- The Golden Standard Objective
- the DRL agent maximizes the true sum downlink rate
- The Mismatch Objective
- the same optimization problem, but solved on the mismatch environment model
- The BS therefore optimizes a surrogate of the actual sum rate, which results in inferior transmit beamforming and RIS configuration designs
#### **Golden Standard**
- The Sum (Downlink) Rate, where $\mathbf{g}_k$ denotes the $k$-th column of $\mathbf{G}$
$$R_{\Sigma} \triangleq \sum_{k = 1}^{K} \log\left(1 + \frac{| \phi^\mathsf{T} \mathbf{D}_k \mathbf{g}_k |^2 }{\sum_{j \neq k} | \phi^\mathsf{T} \mathbf{D}_k \mathbf{g}_j |^2 + \sigma_w^2}\right)$$
- The Optimization Problem
- under the domain restriction of the phase shifts and the transmission power constraint, with $P_{t} \in \{ 5, 10, 15, 20, 25, 30 \} \; \text{dBm}$
$$
\begin{align*}
\max_{\phi, \mathbf{G}}& \; R_{\Sigma} \\
\text{s.t.} \;\;\;
&\varphi_l \in[0, 2\pi), \; \forall \ l = 1, ..., L, \\
&\text{tr}(\mathbf{G}\mathbf{G}^H) \leq P_{t}
\end{align*}
$$
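
To make the objective and the power constraint concrete, here is a small NumPy sketch. It assumes the usual MU-MISO SINR, in which user $k$'s desired signal goes through the $k$-th column of $\mathbf{G}$ and the remaining columns act as interference, and it assumes a base-2 logarithm (bits/s/Hz). The function names `sum_rate`, `dbm_to_mw`, and `scale_to_power` are made up for this example, and rescaling $\mathbf{G}$ is just one simple way to enforce $\text{tr}(\mathbf{G}\mathbf{G}^H) \leq P_t$, not necessarily what the paper does.

```python
import numpy as np

def sum_rate(phi, D, G, sigma_w2):
    """Sum downlink rate for reflection vector phi (L,), cascaded
    channels D (K, L, M), precoder G (M, K), and noise power sigma_w2."""
    # A[k, j] = phi^T D_k g_j: stream j as seen by user k.
    A = np.einsum("l,klm,mj->kj", phi, D, G)
    P = np.abs(A) ** 2
    signal = np.diag(P)                    # |phi^T D_k g_k|^2
    interference = P.sum(axis=1) - signal  # sum over j != k
    return float(np.sum(np.log2(1.0 + signal / (interference + sigma_w2))))

def dbm_to_mw(p_dbm):
    """Convert a power level from dBm to milliwatts."""
    return 10.0 ** (p_dbm / 10.0)

def scale_to_power(G, P_t):
    """Rescale G only if tr(G G^H) exceeds the power budget P_t."""
    power = np.trace(G @ G.conj().T).real
    if power > P_t:
        return G * np.sqrt(P_t / power)
    return G
```

For instance, `scale_to_power(G, dbm_to_mw(30))` keeps a candidate precoder within a 30 dBm budget before `sum_rate` is evaluated as the reward.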
#### **Mismatch Scenario**
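- The Sum (Downlink) Rate has the same form, with $\hat{\phi}$ and $\hat{\mathbf{D}}_k$ in place of $\phi$ and $\mathbf{D}_k$ ($\mathbf{g}_k$ denotes the $k$-th column of $\mathbf{G}$)
$$\hat{R}_{\Sigma} \triangleq \sum_{k = 1}^{K} \log\left(1 + \frac{| \hat{\phi}^\mathsf{T} \hat{\mathbf{D}}_k \mathbf{g}_k |^2 }{\sum_{j \neq k} | \hat{\phi}^\mathsf{T} \hat{\mathbf{D}}_k \mathbf{g}_j |^2 + \sigma_w^2}\right)$$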
- The Optimization Problem
- under the same constraints as stated above
$$
\begin{align*}
\max_{\hat{\phi}, \mathbf{G}}& \; \hat{R}_{\Sigma} \\
\text{s.t.} \;\;\;
&\varphi_l \in[0, 2\pi), \; \forall \ l = 1, ..., L, \\
&\text{tr}(\mathbf{G}\mathbf{G}^H) \leq P_{t}
\end{align*}
$$
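
The gap between the two objectives can be seen by scoring one and the same design $(\varphi, \mathbf{G})$ under both models. The sketch below is only an illustration under assumed values: the sizes, `sigma_w2`, `sigma_e2`, the random channels, the base-2 logarithm, and the helper names are all hypothetical, and the SINR uses the standard per-user form with the columns of $\mathbf{G}$.

```python
import numpy as np

def sum_rate(phi, D, G, sigma_w2):
    """Sum rate with A[k, j] = phi^T D_k g_j (standard MU-MISO SINR)."""
    P = np.abs(np.einsum("l,klm,mj->kj", phi, D, G)) ** 2
    signal = np.diag(P)
    interference = P.sum(axis=1) - signal
    return float(np.sum(np.log2(1.0 + signal / (interference + sigma_w2))))

def true_and_assumed_rates(varphi, G, D, D_hat, sigma_w2,
                           beta_min=0.3, mu=0.0, kappa=1.5):
    """Score one design under the true and the mismatch model."""
    phi_hat = np.exp(1j * varphi)  # lossless reflection (mismatch model)
    beta = (1.0 - beta_min) * ((np.sin(varphi - mu) + 1.0) / 2.0) ** kappa + beta_min
    phi = beta * phi_hat           # phase-dependent amplitude (true model)
    return sum_rate(phi, D, G, sigma_w2), sum_rate(phi_hat, D_hat, G, sigma_w2)

rng = np.random.default_rng(2)
K, M, L = 4, 4, 16              # illustrative sizes (assumption)
sigma_w2, sigma_e2 = 1.0, 0.1   # illustrative noise/error variances (assumption)

def crandn(shape, var=1.0):
    """Hypothetical helper: i.i.d. CN(0, var) samples."""
    return np.sqrt(var / 2.0) * (rng.standard_normal(shape)
                                 + 1j * rng.standard_normal(shape))

D = crandn((K, L, M))                    # placeholder true cascaded channels
D_hat = D + crandn((K, L, M), sigma_e2)  # imperfect CSI seen by the BS
G = crandn((M, K))
varphi = rng.uniform(0.0, 2.0 * np.pi, size=L)

rate_true, rate_assumed = true_and_assumed_rates(varphi, G, D, D_hat, sigma_w2)
print(f"true: {rate_true:.3f}, assumed by the BS: {rate_assumed:.3f}")
```

With $\beta_{min} = 1$ and $\mathbf{E}_k = \mathbf{0}$ the two values coincide, so any difference comes from the amplitude loss and the estimation error.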
### **Methodology**
- Soft Actor-Critic (SAC) Algorithm
- Action: The policy network outputs the flattened $\mathbf{G}$ and $\phi$ as the action vector, which consists of $2MK + 2L$ elements
- State: The state vector consists of the transmission and reception powers for each user, the previous action, and the cascaded channel matrices, resulting in a $(2K + 2KLM + 2MK + 2L)$-dimensional state vector
- Reward: At every time step, the reward is determined by the sum downlink rate
- Deep Directed Intrinsically Motivated Exploration (DISCOVER) Algorithm
- the authors leverage this recently proposed method for exploration in continuous action spaces
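
The action and state dimensions listed above follow from splitting every complex quantity into its real and imaginary parts. The sketch below shows one possible layout of the action vector. The ordering (real parts of $\mathbf{G}$, imaginary parts of $\mathbf{G}$, then real and imaginary parts of $\phi$) and the function names are assumptions made for illustration, since the notes do not specify the layout.

```python
import numpy as np

def unpack_action(action, M, K, L):
    """Split a real action vector of length 2MK + 2L into a complex
    M x K matrix G and a complex length-L vector phi (assumed layout)."""
    action = np.asarray(action, dtype=float)
    if action.size != 2 * M * K + 2 * L:
        raise ValueError("action must have 2*M*K + 2*L elements")
    g_re, g_im, p_re, p_im = np.split(action, [M * K, 2 * M * K, 2 * M * K + L])
    G = (g_re + 1j * g_im).reshape(M, K)
    phi = p_re + 1j * p_im
    return G, phi

def state_dim(M, K, L):
    """2K powers + 2KLM cascaded-channel entries + (2MK + 2L) previous action."""
    return 2 * K + 2 * K * L * M + 2 * M * K + 2 * L
```

The $2KLM$ term dominates the state size: for example, `state_dim(4, 4, 16)` is `8 + 512 + 32 + 32 = 584`.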
### **Results**
- The hyperparameter setting
<img src = 'https://hackmd.io/_uploads/rJ_E15-a3.png' width=70% height=70%>
- Learning curves for the tested settings

- Average of last 1000 instant rewards (sum-rate) achieved by the SAC agents

## **Future work**
- I will put more effort into studying DRL from an implementation perspective