
Federated Deep Reinforcement Learning for Resource Allocation in O-RAN Slicing

tags: 5G Reading

Date : 2022-11-11

Metadata

paper link
Zhang, H., Zhou, H., & Erol-Kantarci, M. (2022). Federated Deep Reinforcement Learning for Resource Allocation in O-RAN Slicing. arXiv preprint arXiv:2208.01736.

Take away

The two proposed xApps, a power control xApp and a slice-based resource allocation xApp, use a federated deep reinforcement learning framework to collaborate on a shared training target.

Summary

This paper proposes a framework that couples two xApps (power control and resource allocation) toward a single training target (RAN resource allocation) using federated deep reinforcement learning. The learning framework, vertical federated reinforcement learning (VFRL), has the agents act within a common environment. The state and action spaces of the two xApp agents can differ; the goal is for them to jointly influence the environment and the rewards.

Note

  • O-RAN model for the slicing use case; the structure contains an eMBB slice and a URLLC slice.


  • Structure of the federated deep reinforcement learning framework. The two agents train toward the same goal by updating parameters to a shared global model; one agent handles power control and the other handles resource allocation.


  • The power control xApp decides the transmission power level of the BS; after that, the power is assumed to be uniformly allocated across all RBs. The radio resource allocation xApp first performs inter-slice radio resource allocation to decide how many RBs are allocated to each slice. Both agents model their decision-making as a Markov decision process (MDP).
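    The division of labor between the two agents can be sketched as follows. This is a minimal illustration, not the paper's implementation: the candidate power levels, the total RB count, and the random policies are all assumed for demonstration; the only structure taken from the text is that one agent picks a BS power level (then spread uniformly over RBs) and the other splits RBs between the two slices.

    ```python
    import random
    from dataclasses import dataclass

    # Hypothetical discrete action spaces (assumed values, not from the paper).
    POWER_LEVELS_DBM = [30, 33, 36, 39, 43]  # candidate BS transmit power levels
    TOTAL_RBS = 100                          # total resource blocks available

    @dataclass
    class PowerControlAgent:
        """Picks one BS transmit power level for the whole cell."""
        def act(self) -> int:
            # Placeholder policy: in the paper this would come from a DRL model.
            return random.choice(POWER_LEVELS_DBM)

    @dataclass
    class SlicingAgent:
        """Decides how many RBs go to the eMBB slice; the rest go to URLLC."""
        def act(self) -> int:
            return random.randint(0, TOTAL_RBS)

    power = PowerControlAgent().act()
    embb_rbs = SlicingAgent().act()
    urllc_rbs = TOTAL_RBS - embb_rbs
    # Per the paper's assumption, power is uniform across all RBs:
    per_rb_power = power / TOTAL_RBS
    ```

    Each agent's `act` stands in for one MDP step; the two decisions together define the resource configuration the environment then evaluates.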

  • There are different types of federated reinforcement learning: vertical federated reinforcement learning (VFRL) and horizontal federated reinforcement learning (HFRL). This study adopts VFRL, in which all agents act in the same environment. The state and action spaces of the agents can be heterogeneous, and their actions collaboratively change the environment and influence the reward. The relation is shown in Fig. 2.

  • The learning process:

    • First, each agent observes its own state from the environment.
    • Next, each agent's local model is updated through local training.
    • The intermediate results computed by the local models are then submitted to a global model for global training; the global model's decisions are sent back to the agents and executed to change the environment. The local models are then updated based on the global model's feedback and the reward from the environment.
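    The steps above can be sketched as a single training iteration. This is a toy illustration under assumed names and shapes (scalar "embeddings", a placeholder reward, a naive weight update); it only mirrors the VFRL data flow described in the note: local forward pass, global aggregation, decision execution, then local updates from global feedback and reward.

    ```python
    import random

    class LocalModel:
        """One agent's local model (e.g., power control or slicing)."""
        def __init__(self, dim: int):
            self.w = [random.uniform(-0.1, 0.1) for _ in range(dim)]

        def forward(self, state):
            # Intermediate result: a scalar embedding sent to the global model.
            return sum(w * s for w, s in zip(self.w, state))

        def update(self, feedback: float, reward: float, lr: float = 0.01):
            # Toy update rule: nudge weights using global feedback and reward.
            self.w = [w + lr * reward * feedback for w in self.w]

    class GlobalModel:
        """Aggregates intermediate results and produces the joint decision."""
        def decide(self, embeddings):
            return sum(embeddings) / len(embeddings)

    def train_step(env_states, agents, global_model):
        # 1) each agent observes its own state and runs its local model
        embeddings = [a.forward(s) for a, s in zip(agents, env_states)]
        # 2) intermediate results go to the global model for a joint decision
        decision = global_model.decide(embeddings)
        # 3) the decision is "executed" in a simulated environment -> reward
        reward = -abs(decision)  # placeholder reward, assumed for the sketch
        # 4) local models are updated from global feedback and the reward
        for a in agents:
            a.update(decision, reward)
        return reward

    agents = [LocalModel(4), LocalModel(4)]  # power-control and slicing agents
    r = train_step([[0.1] * 4, [0.2] * 4], agents, GlobalModel())
    ```

    Note that, unlike HFRL, the agents here never share raw states with each other; only the intermediate results reach the shared global model.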