# Actor-Critic Network for O-RAN Resource Allocation: xApp Design, Deployment, and Analysis

###### tags: `5G Reading`

Date : 2022-10-21

## Metadata

[paper link](https://arxiv.org/pdf/2210.04604.pdf)

Kouchaki, M., & Marojevic, V. (2022). Actor-Critic Network for O-RAN Resource Allocation: xApp Design, Deployment, and Analysis. arXiv preprint arXiv:2210.04604.

## Take away

**What is RAN**

The RAN is split into three logical units: CU, DU, and RU.

The CU is a **centralized unit** that handles the higher-layer RAN protocols, namely the radio resource control (RRC), the service data adaptation protocol (SDAP), and the packet data convergence protocol (PDCP). It interfaces with the DUs through the midhaul.

The DU is a **logical node** that handles the lower protocol layers, namely the radio link control (RLC), the medium access control (MAC), and part of the physical layer (PHY). It interfaces with the RUs through the fronthaul.

The **RU implements the lower part of the PHY**.

**AI/ML solutions**

AI/ML solutions fall into three classes: **supervised learning**, where a set of labeled data is available; **unsupervised learning**, where the task is to find a structure of similarity among unlabeled data; and **reinforcement learning**, where learning is based on trial and error while interacting with an unknown environment. The best method should be selected based on the problem that the AI/ML model needs to solve.

**xApp Development**

xApps are the applications that take on control responsibilities by leveraging machine learning (ML) algorithms and acting in near-real time. An xApp is written either directly against the **essential libraries**, such as RMR, SDL, and logging, or on top of a predefined xApp framework in Python, Go, or C++, and is then built and deployed on **Kubernetes (k8s)**.

## Summary

The study illustrates the step-by-step design, development, and testing of an **AI-based resource allocation xApp** for the near-RT RIC of the O-RAN architecture. The designed **xApp leverages RL**: the basic version uses the **A2C algorithm**, which is then further **optimized with the PPO method**.

## Note

- The O-RAN architecture shown in Fig. 1 is based on open interfaces that enable interactions between the RAN and the RAN controller.

![](https://i.imgur.com/tN1CQfv.png)

- Fig. 3 shows the designed architecture of the resource allocation xApp, including the essential libraries and the data flow logic.

> InfluxDB is an open-source time series (TS) database. It lets developers store, retrieve, and work with TS data for real-time analysis. This kind of DB is especially useful for monitoring and for operating on the logs and metrics of large networks.

![](https://i.imgur.com/ThHTiE5.png)

- RL is a learning model designed **around interactions** between the agent and the environment. It is often used in control or resource management problems because it can learn from direct interaction with the environment. Each time the RL agent applies an action to the environment, the resulting new state determines the reward of the system. RL is **based on the Markov decision process (MDP)**. RL algorithms are categorized into two main classes: **value function** methods and **policy search** methods.

- An RL model needs (a toy state/reward sketch follows below):
    - An appropriate **reward system**, whose design determines whether the RL model can efficiently reach the main goals.
    - The **states** of the environment. In this case they are the channel request, the Channel Quality Indicators (CQIs) derived from the Channel State Information (CSI), the data rate, and the UE fairness (fa); together they form the observation used to decide on the resource allocation.
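To make the state/reward design above concrete, here is a minimal Python sketch of a toy environment whose observation contains per-UE channel requests and CQIs and whose reward trades off throughput against fairness. Everything concrete in it (the `ToyRANSlicingEnv` name, the crude rate model, the 0.5/0.5 reward weighting) is an assumption for illustration only, not the paper's formulation.

```python
import numpy as np

class ToyRANSlicingEnv:
    """Hypothetical toy environment for an RL-based resource-allocation loop."""

    def __init__(self, num_ues=4, num_prbs=25, seed=0):
        self.rng = np.random.default_rng(seed)
        self.num_ues = num_ues
        self.num_prbs = num_prbs

    def _observe(self):
        # Draw a fresh channel snapshot: per-UE CQI (1..15) and channel-request flags.
        self.cqi = self.rng.integers(1, 16, self.num_ues)
        self.requests = self.rng.integers(0, 2, self.num_ues)
        return np.concatenate([self.cqi, self.requests]).astype(np.float32)

    def reset(self):
        return self._observe()

    def step(self, action):
        # action: per-UE share of the PRBs, normalized here to sum to 1.
        share = np.asarray(action, dtype=np.float64)
        share = share / max(share.sum(), 1e-8)
        prbs = share * self.num_prbs
        rate = prbs * self.cqi * self.requests              # crude data-rate proxy
        served = rate[self.requests == 1]
        fairness = (served.sum() ** 2 / (len(served) * (served ** 2).sum() + 1e-8)
                    if len(served) else 0.0)                # Jain's fairness index
        # Assumed reward: equal weighting of normalized throughput and fairness.
        reward = 0.5 * rate.sum() / (self.num_prbs * 15) + 0.5 * fairness
        return self._observe(), float(reward), False, {}
```

An RL agent would then map each observation to a PRB-share action and be trained to maximize this reward.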
- (Optional, depends on the use case) The **actor-critic model (A2C)** is a temporal difference (TD) learning algorithm. The actor network sets the policy, which represents the set of possible actions for a given state, and the critic network embodies the estimated value function that evaluates those actions.

- (Optional, depends on the use case) **Proximal Policy Optimization (PPO)** is a policy gradient algorithm that uses the actor-critic model to train a stochastic policy. The policy is updated with stochastic gradient ascent, while the value function is fitted with gradient descent (a minimal sketch follows below).
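The following is a minimal sketch of an actor-critic network and one PPO-style clipped policy update, written with PyTorch. The paper's exact network sizes, hyperparameters, and action encoding are not given in these notes, so every concrete choice (hidden size 64, clip ratio 0.2, a discrete "which UE gets the next PRB" action) is an illustrative assumption.

```python
import torch
import torch.nn as nn
from torch.distributions import Categorical

class ActorCritic(nn.Module):
    """Shared body with a policy (actor) head and a value (critic) head."""

    def __init__(self, obs_dim, n_actions, hidden=64):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(obs_dim, hidden), nn.Tanh())
        self.actor = nn.Linear(hidden, n_actions)    # policy logits
        self.critic = nn.Linear(hidden, 1)           # state-value estimate

    def forward(self, obs):
        h = self.body(obs)
        return Categorical(logits=self.actor(h)), self.critic(h).squeeze(-1)


def ppo_update(model, optimizer, obs, actions, old_log_probs, returns, advantages,
               clip_ratio=0.2, value_coef=0.5, entropy_coef=0.01):
    """One PPO update over a batch of transitions collected with the old policy."""
    dist, values = model(obs)
    log_probs = dist.log_prob(actions)
    ratio = torch.exp(log_probs - old_log_probs)          # pi_new / pi_old
    clipped = torch.clamp(ratio, 1 - clip_ratio, 1 + clip_ratio)
    policy_loss = -torch.min(ratio * advantages, clipped * advantages).mean()
    value_loss = (returns - values).pow(2).mean()         # critic regression loss
    loss = policy_loss + value_coef * value_loss - entropy_coef * dist.entropy().mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

In a full training loop, trajectories would be collected from the environment with the current policy, advantages estimated (e.g., with GAE), and `ppo_update` applied for a few epochs over minibatches. The A2C baseline corresponds to dropping the probability-ratio clipping and taking a single gradient step on the advantage-weighted log-probability objective instead.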