# **meeting 08/22**

**Advisor: Prof. Chih-Yu Wang \ Presenter: Shao-Heng Chen \ Date: Aug 22, 2023**

## **Paper reading**

- Saglam, Baturay, Doga Gurgunoglu, and Suleyman S. Kozat. "[Deep Reinforcement Learning Based Joint Downlink Beamforming and RIS Configuration in RIS-aided MU-MISO Systems Under Hardware Impairments and Imperfect CSI](https://arxiv.org/abs/2211.09702)." *arXiv preprint arXiv:2211.09702* (2022).
    - accepted to the 5th Workshop on Data Driven Intelligence for Networks and Systems (DDINS) at the *2023 IEEE International Conference on Communications (ICC)*
    - lead author: Baturay Saglam

<img src='https://hackmd.io/_uploads/Hy01Mq-an.png' width=75% height=75%>

### **System model**

- Downlink narrow-band RIS-aided MU-MISO system
    - $K$ single-antenna users
    - one BS with $M$ antennas
    - one RIS with $L$ elements
- Two environment models are considered
    - True environment: phase-dependent amplitude + perfect CSI
    - Mismatch environment: lossless RIS reflection + imperfect CSI

#### **True Environment Model**

- The received signal at user $k$ can be expressed as:

$$
\begin{align*}
z_k = \mathbf{h}_k^\mathsf{T} \mathbf{\Phi} \mathbf{H} \mathbf{G} \mathbf{x} + w_k = \phi^\mathsf{T} \text{diag}(\mathbf{h}_k) \mathbf{H} \mathbf{G} \mathbf{x} + w_k = \phi^\mathsf{T} \mathbf{D}_k \mathbf{G} \mathbf{x} + w_k
\end{align*}
$$

- $z_k$ is the received signal (a complex scalar)
- $w_k$ is the additive receiver noise at the $k$-th user, $w_k \sim \mathcal{CN}(0, \sigma_w^2)$ for all $k$
- $\mathbf{G} \in \mathbb{C}^{M \times K}$ is the transmit beamforming (precoding) matrix
    - maps the $K$ data streams for the $K$ users onto the $M$ transmit antennas
- $\mathbf{x} \in \mathbb{C}^{K \times 1}$ contains the data symbols for the $K$ users
- $\mathbf{H} \in \mathbb{C}^{L \times M}$ is the BS-RIS channel
- $\mathbf{\Phi} \triangleq \text{diag}(\phi_1, \dots, \phi_L) \in \mathbb{C}^{L \times L}$ is the diagonal reflection matrix at the RIS
- $\mathbf{h}_k \in \mathbb{C}^{L \times 1}$ is the RIS-user $k$ channel
- $\mathbf{D}_k \triangleq \text{diag}(\mathbf{h}_k)\mathbf{H} \in \mathbb{C}^{L \times M}$ is the individual cascaded channel to user $k$
    - A separate symbol is introduced for the BS-RIS-user channel to simulate the effect of imperfect CSI: combining these matrices into one cascaded channel per user simplifies the later step of adding channel estimation error
- $\phi \in \mathbb{C}^{L \times 1}$ is the column vector of the diagonal entries of $\mathbf{\Phi}$
- The RIS follows the **phase-dependent amplitude model** (same as ours), with entries $\phi_l = \beta(\varphi_l) \cdot e^{j\varphi_{l}}, \ \varphi_l \in [0, 2 \pi)$, where (see the NumPy sketch below)

$$\beta(\varphi_l) = (1 - \beta_{min}) \cdot \left(\frac{\sin(\varphi_l - \mu) + 1}{2}\right)^\kappa + \beta_{min}$$

- $\beta_{min} \in \{0.3, \ 0.6\}, \ \mu = 0, \ \kappa = 1.5$ are constants that depend on the **hardware implementation** of the RIS
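To make the signal model concrete, here is a minimal NumPy sketch of the true-environment model above. The function names, argument shapes, and the default $\beta_{min}$ are illustrative choices for this note, not the authors' code.

```python
import numpy as np

def reflection_amplitude(varphi, beta_min=0.3, mu=0.0, kappa=1.5):
    """Phase-dependent amplitude beta(varphi_l) of each RIS element."""
    return (1 - beta_min) * ((np.sin(varphi - mu) + 1) / 2) ** kappa + beta_min

def received_signals(h, H, G, x, varphi, noise_std, beta_min=0.3, rng=None):
    """z_k = phi^T D_k G x + w_k for every user k (true environment).

    h: (K, L) RIS-user channels (row k is h_k), H: (L, M) BS-RIS channel,
    G: (M, K) beamformer, x: (K,) data symbols, varphi: (L,) phase shifts.
    """
    rng = np.random.default_rng() if rng is None else rng
    phi = reflection_amplitude(varphi, beta_min) * np.exp(1j * varphi)  # (L,)
    K = h.shape[0]
    # circularly symmetric complex Gaussian noise with variance noise_std**2
    w = noise_std * (rng.standard_normal(K) + 1j * rng.standard_normal(K)) / np.sqrt(2)
    # D_k = diag(h_k) H is formed on the fly for each user
    return np.array([phi @ (np.diag(h[k]) @ H) @ G @ x for k in range(K)]) + w
```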
#### **Mismatch Environment Model**

- The received signal at user $k$ can be expressed as:

$$z_k = \mathbf{h}_k^\mathsf{T} \mathbf{\Phi} \mathbf{H} \mathbf{G} \mathbf{x} + w_k = \hat{\phi}^\mathsf{T} \hat{\mathbf{D}}_k \mathbf{G} \mathbf{x} + w_k$$

- This is the signal model assumed at the BS, which deviates from the true environment
- The RIS reflections are assumed to be **lossless**, i.e., $\hat{\phi} \triangleq [e^{j\varphi_1}, \ e^{j\varphi_2}, \ \dots, \ e^{j\varphi_L}]^\mathsf{T}$
- $\hat{\mathbf{D}}_k \triangleq \mathbf{D}_k + \mathbf{E}_k, \; \forall k = 1, \dots, K$
    - $\mathbf{E}_k \in \mathbb{C}^{L \times M}$ is the channel estimation error matrix of the **cascaded channel** of each user, with i.i.d. entries $e_{m, l}^{(k)} \sim \mathcal{CN}(0, \sigma_e^2)$

### **Problem formulation**

- Two objectives are considered
    - The golden standard objective: use the DRL agent to maximize the true sum downlink rate
    - The mismatch objective: the same optimization problem, solved in the mismatch scenario
        - The BS then solves a different optimization problem than maximizing the actual sum rate, resulting in inferior transmit beamforming and RIS configuration designs

#### **Golden Standard**

- The sum (downlink) rate

$$R_{\Sigma} \triangleq \sum_{k = 1}^{K} \log\left(1 + \frac{\| \phi^\mathsf{T} \mathbf{D}_k \mathbf{G} \|^2 }{\sum_{j \neq k} \| \phi^\mathsf{T} \mathbf{D}_j \mathbf{G} \|^2 + \sigma_w^2}\right)$$

- The optimization problem, under the domain restriction on the phase shifts and the transmit power constraint $P_{t} \in \{ 5, 10, 15, 20, 25, 30 \} \; \text{dBm}$:

$$
\begin{align*}
\max_{\phi, \mathbf{G}}& \; {R}_{\Sigma} \\
\text{s.t.} \;\;\; &\varphi_l \in[0, 2\pi), \; \forall \ l = 1, \dots, L, \\
&\text{tr}(\mathbf{G}\mathbf{G}^H) \leq P_{t}
\end{align*}
$$

#### **Mismatch Scenario**

- The sum (downlink) rate is quite similar

$$\hat{R}_{\Sigma} \triangleq \sum_{k = 1}^{K} \log\left(1 + \frac{\| \hat{\phi}^\mathsf{T} \hat{\mathbf{D}}_k \mathbf{G} \|^2 }{\sum_{j \neq k} \| \hat{\phi}^\mathsf{T} \hat{\mathbf{D}}_j \mathbf{G} \|^2 + \sigma_w^2}\right)$$

- The optimization problem, under the same constraints as above:

$$
\begin{align*}
\max_{\hat{\phi}, \mathbf{G}}& \; \hat{R}_{\Sigma} \\
\text{s.t.} \;\;\; &\varphi_l \in[0, 2\pi), \; \forall \ l = 1, \dots, L, \\
&\text{tr}(\mathbf{G}\mathbf{G}^H) \leq P_{t}
\end{align*}
$$

### **Methodology**

- Soft Actor-Critic (SAC) algorithm
    - Action: the policy network outputs the flattened $\mathbf{G}$ and $\phi$ as the action vector, which consists of $2MK + 2L$ elements (real and imaginary parts of the complex entries; see the sketch after this list)
    - State: the state vector consists of the transmission and reception powers for each user, the previous action, and the cascaded channel matrices, resulting in a $(2K + 2KLM + 2MK + 2L)$-dimensional state vector
    - Reward: at every time step, the reward is the achieved sum downlink rate
- Deep Directed Intrinsically Motivated Exploration (DISCOVER) algorithm
    - the authors leverage a recently proposed method for exploration in continuous action spaces
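Below is a minimal sketch of one way the agent's flat action could be turned into a feasible design and scored with the sum-rate reward. The layout of the action vector, the power-projection step, and the phase parameterization are assumptions for illustration; the paper may use a different mapping.

```python
import numpy as np

def action_to_design(action, M, K, L, P_t):
    """Unpack a (2MK + 2L)-dim action into G and the RIS phase shifts.

    Assumed layout: the first 2MK entries are the Re/Im parts of G, the last
    2L entries parameterize the L phases; G is rescaled so tr(G G^H) <= P_t.
    """
    G = (action[:M * K] + 1j * action[M * K:2 * M * K]).reshape(M, K)
    power = np.trace(G @ G.conj().T).real
    if power > P_t:                                 # project onto the power constraint
        G *= np.sqrt(P_t / power)
    re = action[2 * M * K:2 * M * K + L]
    im = action[2 * M * K + L:]
    varphi = np.mod(np.arctan2(im, re), 2 * np.pi)  # phases in [0, 2*pi)
    return G, varphi

def sum_rate(phi, D, G, noise_var):
    """Sum downlink rate R_Sigma from the formulation above (natural log).

    phi: (L,) reflection vector, D: (K, L, M) stacked cascaded channels D_k,
    G: (M, K) beamformer, noise_var: sigma_w^2.
    """
    powers = np.array([np.linalg.norm(phi @ D_k @ G) ** 2 for D_k in D])
    # for user k: interference = sum over j != k of ||phi^T D_j G||^2
    return float(np.sum(np.log(1 + powers / (powers.sum() - powers + noise_var))))
```

Feeding `action_to_design` the golden-standard inputs ($\phi$ with phase-dependent amplitudes, true $\mathbf{D}_k$) or the mismatch inputs ($\hat{\phi}$, $\hat{\mathbf{D}}_k$) reproduces the two objectives with the same reward function.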
### **Results**

- The hyperparameter settings

<img src='https://hackmd.io/_uploads/rJ_E15-a3.png' width=70% height=70%>

- Learning curves for the tested settings

![](https://hackmd.io/_uploads/SJK9Ct-ph.png)

- Average of the last 1000 instant rewards (sum rate) achieved by the SAC agents

![](https://hackmd.io/_uploads/HkKJ1q-62.png)

## **Future works**

- I will put more effort into studying DRL, from an implementation perspective