# **meeting 12/14**
**Advisor: Prof. Wei-Ho Chung \
Presenter: Shao-Heng Chen \
Date: Dec 14, 2023**
<!-- Chih-Yu Wang -->
<!-- Wei-Ho Chung -->
## **System diagram**
<img src='https://hackmd.io/_uploads/SkqfuYHIa.png' width=70% height=70%>
- Downlink RIS-aided MU-MISO system
- $N_k$ single-antenna users ($N_r = 1$)
- one BS with $N_t$ antennas
- one RIS with $N_s$ elements
- The stacked received signal of all $N_k$ users
$$
\begin{align*}
\mathbf{y} &= (\mathbf{H}_{2} \mathbf{\Phi} \mathbf{H}_1 + \mathbf{H}_{3}) \mathbf{F} \mathbf{x} + \mathbf{n} = \tilde{\mathbf{H}}\mathbf{F} \mathbf{x} + \mathbf{n},
\end{align*}
$$
- $\mathbf{H}_1 \in \mathbb{C}^{N_s \times N_t}$ is the BS-RIS channel
- $\mathbf{H}_{2} \in \mathbb{C}^{N_k \times N_s}$ is the RIS-users channel
- $\mathbf{h}_{k, 2} \in \mathbb{C}^{1 \times N_s}, \forall k = 1, ..., N_k$ is the channel vector from RIS to the $k$-th user
$$
\begin{align*}
\mathbf{h}_{k, 2} &= \hat{\mathbf{h}}_{k, 2} + \Delta\mathbf{h}_{k, 2} \in \mathbb{C}^{1 \times N_s}, \\
\Delta\mathbf{h}_{k, 2} &= \psi\frac{\mathbf{h}_{k, 2}^{N_1}}{\| \mathbf{h}_{k, 2}^{N_1} \|_2}, \;\; \mathbf{h}_{k, 2}^{N_1} \sim \mathcal{CN}(\mathbf{0}, \mathbf{I}_{N_s}),
\end{align*}
$$
- $\mathbf{H}_{3} \in \mathbb{C}^{N_k \times N_t}$ is the BS-users channel
- $\mathbf{h}_{k, 3} \in \mathbb{C}^{1 \times N_t}, \forall k = 1, ..., N_k$ is the channel vector from BS to the $k$-th user
$$
\begin{align*}
\mathbf{h}_{k, 3} &= \hat{\mathbf{h}}_{k, 3} + \Delta\mathbf{h}_{k, 3} \in \mathbb{C}^{1 \times N_t}, \\
\Delta\mathbf{h}_{k, 3} &= \psi\frac{\mathbf{h}_{k, 3}^{N_2}}{\| \mathbf{h}_{k, 3}^{N_2} \|_2}, \;\; \mathbf{h}_{k, 3}^{N_2} \sim \mathcal{CN}(\mathbf{0}, \mathbf{I}_{N_t}).
\end{align*}
$$
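
The signal model above can be sketched numerically. The NumPy snippet below is only an illustration under stated assumptions, not the code used for the experiments: the dimensions, the error radius `psi`, the noise power, the i.i.d. Rayleigh channel draws, the random placeholder precoder, and the helper names `crandn` and `bounded_error` are all hypothetical, and the RIS is taken as diagonal with ideal unit-modulus entries.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes and powers (assumed, not taken from the experiments)
N_k, N_t, N_s = 2, 16, 16
psi = 0.1            # assumed CSI error radius
noise_power = 1e-2   # assumed noise variance


def crandn(*shape):
    """Draw i.i.d. CN(0, 1) entries of the given shape (hypothetical helper)."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)


def bounded_error(n_rows, n_cols, radius):
    """One error vector per row, each scaled to l2 norm `radius` (hypothetical helper)."""
    g = crandn(n_rows, n_cols)
    return radius * g / np.linalg.norm(g, axis=1, keepdims=True)


H1 = crandn(N_s, N_t)                        # BS-RIS channel
H2_hat = crandn(N_k, N_s)                    # estimated RIS-users channel
H3_hat = crandn(N_k, N_t)                    # estimated BS-users channel
H2 = H2_hat + bounded_error(N_k, N_s, psi)   # true channel = estimate + error
H3 = H3_hat + bounded_error(N_k, N_t, psi)

# Ideal unit-modulus RIS with random phases (assumption for this sketch)
Phi = np.diag(np.exp(1j * rng.uniform(0, 2 * np.pi, N_s)))

H_eff = H2 @ Phi @ H1 + H3                   # effective channel, N_k x N_t
F = crandn(N_t, N_k)
F /= np.linalg.norm(F, "fro")                # placeholder precoder with tr(F F^H) = 1
x = crandn(N_k)                              # transmitted symbols
n = np.sqrt(noise_power) * crandn(N_k)       # receiver noise
y = H_eff @ F @ x + n                        # stacked received signal
```

Each row of `H2 - H2_hat` and `H3 - H3_hat` has norm exactly `psi`, which matches the normalized error model above.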
## **Problem formulations**
### **Min-max MSE**
- The objective is to minimize the worst-case mean squared error (MSE) over all users
$$
\begin{align*}
\min\limits_{\mathbf{\Phi}} \;\; &\max\limits_{\Delta\mathbf{h}_{k, 2}, \; \Delta\mathbf{h}_{k, 3}, \; \varphi_i', \; \beta(\varphi_i)} \;\;\;\;\; \alpha \\
\textrm{s.t.} \;\;
& \alpha \geq 0, \\
& \mathbb{E}\left\{ (x_k - y_k)(x_k - y_k)^H \right\} \leq \alpha, \;\; \forall k = 1, \ldots, N_k, \\
& \mathrm{tr}\{ \mathbf{F}\mathbf{F}^{H} \} \leq P_{t}, \\
& \| \phi_i \|_2^2 = \beta(\varphi_i) = (1 - \beta_{\min}) \left( \frac{\sin(\varphi_i - \mu) + 1}{2} \right)^{\kappa} + \beta_{\min}, \;\; \beta_{\min} \geq 0, \; \mu \geq 0, \;\; \forall i = 1, \ldots, N_s, \\
& \varphi_i = \hat{\varphi}_i + \varphi_i', \;\; \varphi_i \in [0, 2\pi), \;\; \forall i = 1, \ldots, N_s, \\
& \hat{\varphi}_i \in \mathcal{A} = \left\{ \frac{2\pi n}{2^{\mathrm{bits}}} \right\}_{n = 0}^{2^{\mathrm{bits}} - 1}, \;\; f(\varphi_i' \mid \mu, \kappa) = \frac{e^{\kappa \cos(\varphi_i' - \mu)}}{2\pi I_0(\kappa)}, \;\; \forall i = 1, \ldots, N_s, \\
& \| \Delta\mathbf{h}_{k, 2} \|_2 \leq \psi, \;\; \forall k = 1, \ldots, N_k, \\
& \| \Delta\mathbf{h}_{k, 3} \|_2 \leq \psi, \;\; \forall k = 1, \ldots, N_k.
\end{align*}
$$
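- The variable $\alpha$ is an epigraph variable: it is an **upper bound** on the MSE of every user, so minimizing $\alpha$ minimizes the worst-case per-user MSE
- The problem mixes discrete variables (the quantized phases $\hat{\varphi}_i \in \mathcal{A}$) with continuous ones (the phase errors $\varphi_i'$ and the CSI errors $\Delta\mathbf{h}_{k, 2}$, $\Delta\mathbf{h}_{k, 3}$), so it is a mixed discrete-continuous program, which is Non-deterministic Polynomial-time hard (**NP-hard**)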
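
The constraints above can be made concrete with a small NumPy sketch. It is a hedged illustration, not the experiment code. It assumes unit-power uncorrelated symbols and independent noise, in which case $\mathbb{E}|x_k - y_k|^2 = \sum_j |[\tilde{\mathbf{H}}\mathbf{F}]_{kj} - \delta_{kj}|^2 + \sigma^2$. The notes reuse $\mu$ and $\kappa$ for both the amplitude model and the von Mises phase error; the sketch keeps separate arguments so either reading can be tried. All numeric values, the matched-filter placeholder precoder, and the helper names are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)


def crandn(*shape):
    """Draw i.i.d. CN(0, 1) entries (hypothetical helper)."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)


def quantized_phases(bits):
    """The 2**bits uniformly spaced phase levels in [0, 2*pi)."""
    return 2 * np.pi * np.arange(2 ** bits) / 2 ** bits


def amplitude(phase, beta_min, mu, kappa):
    """Phase-dependent term beta(phase) from the constraint above."""
    return (1 - beta_min) * ((np.sin(phase - mu) + 1) / 2) ** kappa + beta_min


def per_user_mse(H_eff, F, noise_power):
    """E|x_k - y_k|^2 per user, assuming unit-power uncorrelated symbols."""
    G = H_eff @ F                    # end-to-end gain, N_k x N_k
    E = G - np.eye(G.shape[0])       # deviation from the identity mapping
    return np.sum(np.abs(E) ** 2, axis=1) + noise_power


# Toy values, all assumed for illustration
N_k, N_t, N_s, bits = 2, 4, 16, 2
P_t, noise_power = 1.0, 1e-2
beta_min, mu_amp, kappa_amp = 0.2, 0.43 * np.pi, 1.6
mu_err, kappa_err = 0.0, 50.0

phase_hat = rng.choice(quantized_phases(bits), size=N_s)   # configured phases
phase_err = rng.vonmises(mu_err, kappa_err, size=N_s)      # von Mises phase error
phase = np.mod(phase_hat + phase_err, 2 * np.pi)           # realized phases
# |phi_i|^2 = beta(phase_i), following the constraint as written in the notes
phi = np.sqrt(amplitude(phase, beta_min, mu_amp, kappa_amp)) * np.exp(1j * phase)

H1, H2, H3 = crandn(N_s, N_t), crandn(N_k, N_s), crandn(N_k, N_t)
H_eff = H2 @ np.diag(phi) @ H1 + H3
# Matched-filter precoder scaled to tr(F F^H) = P_t (placeholder, not optimized)
F = np.sqrt(P_t) * H_eff.conj().T / np.linalg.norm(H_eff, "fro")

alpha = per_user_mse(H_eff, F, noise_power).max()          # worst-user MSE
```

Here `alpha` is the smallest value satisfying the MSE constraint for this single channel and phase realization; the outer maximization over the error terms is not attempted.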
### **Algorithm**
- Deep Reinforcement Learning
- J. Schulman, F. Wolski, P. Dhariwal, A. Radford and O. Klimov, "[Proximal policy optimization algorithms](https://arxiv.org/abs/1707.06347)," *arXiv preprint arXiv:1707.06347*, 2017.
- R. Kozlica, S. Wegenkittl and S. Hirländer, "[Deep Q-Learning versus Proximal Policy Optimization: Performance Comparison in a Material Sorting Task](https://ieeexplore.ieee.org/abstract/document/10228056)," *2023 IEEE 32nd International Symposium on Industrial Electronics (ISIE)*, Helsinki, Finland, 2023, pp. 1-6.
<img src='https://hackmd.io/_uploads/Hk89kylkT.png' width=60% height=60%>
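
For reference, the clipped surrogate objective from the PPO paper cited above can be written in a few lines of NumPy. This is a minimal sketch of that published formula only; the function name `ppo_clip_loss`, the default `clip_eps=0.2`, and the toy inputs are assumptions, and it says nothing about the state, action, or reward design used in these experiments.

```python
import numpy as np


def ppo_clip_loss(logp_new, logp_old, advantages, clip_eps=0.2):
    """Negative clipped surrogate objective, averaged over the batch."""
    ratio = np.exp(logp_new - logp_old)                          # pi_new / pi_old
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1 - clip_eps, 1 + clip_eps) * advantages
    # Taking the elementwise minimum removes the incentive to move the
    # policy ratio outside [1 - clip_eps, 1 + clip_eps].
    return -np.mean(np.minimum(unclipped, clipped))


# Toy batch of two samples (illustrative values)
logp_old = np.log(np.array([0.5, 0.5]))
logp_new = np.log(np.array([0.75, 0.25]))
advantages = np.array([1.0, -1.0])
loss = ppo_clip_loss(logp_new, logp_old, advantages)   # approximately -0.2
```

In the toy batch the ratios are 1.5 and 0.5. The first sample is clipped to 1.2; for the second, the unclipped term (-0.5) is larger than the clipped one (-0.8), so the minimum keeps -0.8.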
## **Training**


### **Policy network**

## **Inference**

## **Experiment results**
### ```PPO-2-16-[4, 36]```
Orange: ```PPO-2-16-4```, Blue: ```PPO-2-16-36```

<img src='https://hackmd.io/_uploads/HkpckiEI6.png' width=50% height=50%>
<img src='https://hackmd.io/_uploads/SyQ3kjEIa.png' width=50% height=50%>
<img src='https://hackmd.io/_uploads/HJNLVo4LT.png' width=50% height=50%>
<img src='https://hackmd.io/_uploads/H1xDVsNIa.png' width=50% height=50%>
### ```PPO-2-16-[9, 16, 25]```
Green: ```PPO-2-16-9```, Cyan: ```PPO-2-16-16```, Pink: ```PPO-2-16-25```

<img src='https://hackmd.io/_uploads/BJowLjNLa.png' width=50% height=50%>
<img src='https://hackmd.io/_uploads/SyQuLj486.png' width=50% height=50%>
<img src='https://hackmd.io/_uploads/ry--diVI6.png' width=50% height=50%>
<img src='https://hackmd.io/_uploads/ryuZdj48T.png' width=50% height=50%>
<img src='https://hackmd.io/_uploads/SyvE3sVL6.png' width=50% height=50%>
<img src='https://hackmd.io/_uploads/B1krhjELa.png' width=50% height=50%>
### ```PPO-[2, 4, 6, 8, 10]-16-16```
Orange: ```PPO-2-16-16```, Blue: ```PPO-4-16-16```, Red: ```PPO-6-16-16```, Cyan: ```PPO-8-16-16```, Pink: ```PPO-10-16-16```

<img src='https://hackmd.io/_uploads/ry--diVI6.png' width=50% height=50%>
<img src='https://hackmd.io/_uploads/ryuZdj48T.png' width=50% height=50%>
<img src='https://hackmd.io/_uploads/rJUD9sV8T.png' width=50% height=50%>
<img src='https://hackmd.io/_uploads/Skqv9j4IT.png' width=50% height=50%>
<img src='https://hackmd.io/_uploads/SkwSnQBIp.png' width=50% height=50%>
<img src='https://hackmd.io/_uploads/S1EIn7H8p.png' width=50% height=50%>
<img src='https://hackmd.io/_uploads/SJB7vjS8a.png' width=50% height=50%>
<img src='https://hackmd.io/_uploads/ByGEDjHIp.png' width=50% height=50%>
<img src='https://hackmd.io/_uploads/SJ0PKlILT.png' width=50% height=50%>
<img src='https://hackmd.io/_uploads/HJW_Ke8Ua.png' width=50% height=50%>
### ```PPO-10-16-36```
Cyan: ```PPO-10-16-36-2300```

<img src='https://hackmd.io/_uploads/SyDR5rPIa.png' width=50% height=50%>
<img src='https://hackmd.io/_uploads/rJ-yorDU6.png' width=50% height=50%>