# WG6
###### tags: `Spec`
## Cloud Architecture and Deployment Scenarios for O-RAN Virtualized RAN
### Objectives

#### Key characteristics of cloud architectures
1. Decoupling of hardware from software --> improves flexibility and choice for operators.
2. Standardization of hardware specifications across software implementations, to simplify physical deployment and maintenance.
3. Sharing of hardware.
4. Flexible instantiation and lifecycle management through orchestration automation.

In Fig. 3, multiple cell sites feed into a smaller number of Edge Clouds, and multiple Edge Clouds feed into a Regional Cloud.
### O-RAN Functions Definitions
For example, in Fig. 5, the functions inside the dashed box are supported by the O-Cloud, while the O-RU is implemented as an O-RAN physical NF.
### Decoupling of Hardware and Software

Fig. 6 shows the decoupling into three layers:
1. The Hardware layer. (In the case of a VM deployment, this maps basically to the ETSI NFVI hardware sub-layer.)
2. A middle layer that includes Cloud Stack functions as well as Acceleration Abstraction Layer functions. (In the case of a VM deployment, these map to the ETSI NFVI virtualization sub-layer + VIM(virtualized infrastructure manager).)
3. A top layer that supports the virtual RAN functions.
### The O-Cloud
#### Definition of O-Cloud platform:
1. A set of hardware and software components providing cloud computing capabilities to execute RAN NFs.
2. Hardware includes compute, networking, and storage components, and may also include acceleration technologies.
3. Cloud platform software exposes open and well-defined APIs that enable management of the entire life cycle of network functions.
4. Hardware and software are decoupled.
For example, a Cloud Platform may be an OpenStack and/or Kubernetes deployment on a set of COTS servers (including FPGA and GPU cards), interconnected by a spine/leaf networking fabric.
### Key O-Cloud Concepts

* **O-Cloud Instance**: A collection of O-Cloud Resource Pools at one or more locations, plus the software to manage Nodes and Deployments on them. An O-Cloud includes functionality to support both the Deployment plane (user plane) and Management services.
* **O-Cloud resource pool**: A collection of O-Cloud nodes with homogeneous profiles in one location, usable for either Management services or Deployment plane functions. The allocation of an NF deployment to a resource pool is determined by the SMO.
* **O-Cloud node**: A collection of CPUs, memory, storage, NICs, accelerators, BIOSes, BMCs, etc.; it can be thought of as a server. Each O-Cloud node supports one or more "roles".
* **O-Cloud node role**: The functionalities that a given node may support (compute, storage, and networking for the Deployment plane; optional acceleration functions; and appropriate management services).
* **O-Cloud deployment plane**: A logical construct representing the O-Cloud nodes across the resource pools which are used to create NF Deployments.
* **O-Cloud NF deployment**: A deployment of a cloud-native Network Function (in whole or in part), of resources shared within an NF, or of resources shared across network functions. The NF deployment configures and assembles the user-plane resources required by the cloud-native constructs used to establish the NF Deployment, and manages its life cycle from creation to deletion.
* **O2 interface**: Provides (i) the Infrastructure Management Service (IMS) --> responsible for deploying and managing cloud infrastructure, and (ii) Deployment Management Services (DMS) --> responsible for managing the lifecycle of virtualized/containerized deployments on the cloud infrastructure.
For more detail about VNF, PNF, CNF see--> " [Cloud Native Network Functions](https://ligato.io/cnf/cnf-def/)"
#### leaf and spine switch architecture for Edge O-Cloud (multi compute node)
If a resource pool contains multiple compute nodes --> it may include an O-Cloud switch fabric consisting of leaf and spine switches that interconnect the O-Cloud nodes.

* Traffic between nodes attached to different leaf switches must traverse the spine.
* Connects to the O-RUs through fronthaul transport.
* Connects to the regional O-Cloud through backhaul or midhaul transport.
* Switch Fabric may also be shared across multiple Cloud Resource Pools or across multiple O-Cloud instances (via separate network domains).
* Switch Fabric is managed by IMS
#### Infrastructure deployed at a cell site (single server)
The O-Cloud Resource Pool comprises a single server without any associated Switch Fabric.

* The compute node connects directly to the O-RU through the O-RAN fronthaul.
### Deployment Scenarios
**Objective**: Decide which logical functions are mapped to which Cloud Platforms, and which functions are to be co-located with other logical functions.
### Mapping Logical Functionality to Physical Implementations
#### Technical constraints that affect Hardware implementation
* Environment: controlled / semi-controlled / exposed environment.
* Dimensions.
* Transport technology: fronthaul, midhaul, backhaul transport over fiber or wireless.
* Acceleration hardware: some accelerators may only operate in a controlled environment.
* Standardized hardware: helps reduce operational complexity.
#### Centralizing O-DU Functionality
Benefits:
(1) Capacity can be added at the central site and assigned to cell sites as needed.
(2) Only having to maintain a single large controlled environment for many cell sites.
Two Types of O-DU function centralization
**1. simple centralization**

* Assigns each O-RU to a single O-DU resource.
* O-RU traffic can only be assigned to the specified O-DU.
* A Fronthaul Gateway (FHGW) may exist between the cell site and the centralized resource.
**2. pooled centralization**

* Traffic from an O-RU can be assigned to any of several shared O-DU resources.
* The total resources of this shared pool can be smaller than the total resources of distributed deployments --> the peak of the summed traffic is markedly lower than the sum of the individual cell-site traffic peaks.
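The multiplexing gain above can be illustrated with a small sketch. The traffic figures are entirely synthetic, chosen only to show that when cell sites peak at different hours, a shared pool sized for the peak of the aggregate is smaller than the sum of per-site peaks.

```python
# Illustrative only: synthetic per-hour traffic (arbitrary units) for three
# cell sites whose busy hours do not coincide.
site_a = [2, 3, 9, 4, 2]   # peaks in hour 2
site_b = [8, 4, 2, 3, 3]   # peaks in hour 0
site_c = [3, 2, 4, 8, 3]   # peaks in hour 3

# Distributed O-DUs: each site must be provisioned for its own peak.
sum_of_peaks = max(site_a) + max(site_b) + max(site_c)

# Pooled O-DUs: the shared pool only needs to cover the peak of the aggregate.
aggregate = [a + b + c for a, b, c in zip(site_a, site_b, site_c)]
peak_of_sum = max(aggregate)

print(f"sum of peaks = {sum_of_peaks}")   # 9 + 8 + 8 = 25
print(f"peak of sum  = {peak_of_sum}")    # 15
assert peak_of_sum < sum_of_peaks
```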
### Performance Aspects
#### User plane delay
Objective: Allocate the total latency budget across the subsystems on the path of each latency constraint.

* Link speeds are considered symmetric for all components, with the exception of the air interface (TAIR). For S-Plane services using the PTP protocol, it is a requirement that the link lengths, link speeds, and forward-reverse path routing for PTP are all symmetric.
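Budget allocation amounts to splitting a total one-way delay across the subsystems on the path and verifying the split fits. The subsystem names and microsecond figures below are illustrative assumptions, not values from the spec:

```python
# Hypothetical allocation of a one-way user-plane latency budget across the
# subsystems on the path; the figures are illustrative, not from the spec.
BUDGET_US = 1000  # example total one-way budget in microseconds

allocation_us = {
    "O-RU processing":     100,
    "fronthaul transport": 160,
    "O-DU processing":     350,
    "midhaul transport":   140,
    "O-CU processing":     200,
}

used = sum(allocation_us.values())
assert used <= BUDGET_US, f"over budget by {used - BUDGET_US} us"
print(f"allocated {used} of {BUDGET_US} us; margin = {BUDGET_US - used} us")
```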

### Hardware Acceleration and Acceleration Abstraction Layer(AAL)
* An O-Cloud node is a collection of CPUs, memory, storage, NICs, BIOS, BMCs, and (optionally) hardware accelerators.
* Hardware accelerators: FPGA, ASIC, DSP, GPU, supporting different types of acceleration functions such as LDPC, FEC, and AI in the RIC.
All types of HW acceleration on the cloud platform should preserve the decoupling of SW and HW.
#### Accelerator Deployment model

**Abstracted model**:
Uses a vhost_user and virtIO type deployment --> allows full decoupling of the NF from the HW accelerator (not suitable for real-time, latency-sensitive NFs such as the O-DU).
**Pass-Through model**:
Uses SR-IOV for better acceleration performance.
See more about vhost, vhost_user, virtio, and SR-IOV:
[virtIO](https://rtoax.blog.csdn.net/article/details/115266114)
[vhost, vhost_user](https://blog.csdn.net/Rong_Toa/article/details/114175631)
[SR-IOV](https://zdyxry.github.io/2020/03/12/SR-IOV-%E5%9F%BA%E6%9C%AC%E6%A6%82%E5%BF%B5/)
#### AAL Interface and Accelerator management and orchestration consideration
Vendors utilize the AAL interface for a given accelerator. The accelerator must provide APIs that allow NF applications to discover, configure, select, and use acceleration functions.
Example open APIs: DPDK's CryptoDev, EthDev, EventDev, and Baseband Device (BBDEV).

When delivering an NF to an operator, the supplier of that NF provides not only the NF and the appropriate accelerator driver (possibly provided by a 3rd party), but also indicates the corresponding AAL profile needed in the operator's O-Cloud.
Fig. 17 shows the accelerator management and accelerator driver in the O-Cloud. These entities are managed via O2 (via the IMS).
### Cloud Consideration
#### Network Requirement
##### Support for multiple networking interface
* The near-RT RIC, vO-DU, and vO-CU depend on support for multiple network interfaces --> the cloud platform needs to support assigning multiple networking interfaces to a single container or VM instance.
##### Support for high performance N-S data plane
* The fronthaul between the O-RU/RU and the vO-DU needs high performance and low latency.
* Multiple vO-DUs running on the same physical cloud platform --> need to share the same physical networking interface across multiple functions --> SR-IOV.
* The cloud platform needs to support assigning SR-IOV networking interfaces to a container or VM instance.
* When only one container needs the networking interface --> PCI pass-through of the network interface.
##### Support for high performance E-W data plane
* High performance E-W data plane throughput is required for different near-RT RIC, vO-CU, vO-DU scenarios
* One commonly used option for E-W traffic --> a virtual switch, which provides basic communication for instances deployed on either the same machine or different machines. It provides L2 and L3 network functions.
* Another option uses a DPDK-based virtual switch --> packets bypass the Linux kernel-space networking stack in favor of user-space networking, improving throughput and latency.
##### Support for Service function chaining(SFC)
* SFC requires the ability to create service function chains between multiple VMs or containers.
* When service requirements or flow direction change, the SFC capability makes the change easy to implement without restarting instances.
#### Assignment of Acceleration Resource
* The cloud platform needs to be able to assign a specified accelerator to a container or VM.
#### Real-time/ General Performance Feature Requirements
##### Host Linux OS
* Support for pre-emptive scheduling (Linux PREEMPT_RT) to meet the high-throughput, multi-access, low-latency, priority-based OS environment that some wireless applications require.
##### Support for Node Feature Discovery
* Automated and dynamic placement of Cloud-Native Network Functions (CNFs) / microservices and VMs is needed, based on the hardware requirements imposed on the vO-DU, vO-CU and near-RT RIC functions.
* The cloud platform supports discovering the HW capabilities of each node and advertising them via labels.
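The placement mechanism behind such label-based discovery can be sketched as follows. The label keys and values are illustrative, not the real Node Feature Discovery label names:

```python
# Minimal sketch of label-based placement: each node advertises its HW
# capabilities as labels, and a workload is only scheduled onto nodes whose
# labels satisfy its requirements. Label names here are hypothetical.
nodes = {
    "node-1": {"sriov": "true", "fpga": "true",  "hugepages-1Gi": "true"},
    "node-2": {"sriov": "true", "fpga": "false", "hugepages-1Gi": "true"},
    "node-3": {"sriov": "false"},
}

def eligible_nodes(required: dict) -> list:
    """Return nodes whose advertised labels include every required label."""
    return [name for name, labels in nodes.items()
            if all(labels.get(k) == v for k, v in required.items())]

# A vO-DU needing SR-IOV and an FPGA accelerator can only land on node-1.
print(eligible_nodes({"sriov": "true", "fpga": "true"}))  # ['node-1']
```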
##### Support for CPU Affinity and Isolation
* The vO-DU, vO-CU, and near-RT RIC are performance sensitive --> they need to consume large amounts of CPU cycles to work correctly.
* The cloud platform needs to provide a mechanism that guarantees performance determinism even when there are noisy neighbors.
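CPU pinning is the basic primitive behind such isolation. A minimal sketch using the Linux scheduler-affinity syscalls exposed by Python's standard library (`os.sched_setaffinity` is Linux-only, hence the guard):

```python
# Sketch of CPU pinning on Linux, the same mechanism a cloud platform uses to
# isolate latency-sensitive workloads from noisy neighbors. Requires Linux;
# os.sched_setaffinity is not available on all platforms.
import os

if hasattr(os, "sched_setaffinity"):
    before = os.sched_getaffinity(0)        # CPUs this process may run on
    pin_to = {min(before)}                  # pin to a single CPU
    os.sched_setaffinity(0, pin_to)
    assert os.sched_getaffinity(0) == pin_to
    os.sched_setaffinity(0, before)         # restore the original mask
    print(f"pinned to CPU {min(before)}, then restored {sorted(before)}")
else:
    print("sched_setaffinity not available on this platform")
```

In a real deployment the platform applies the equivalent pinning (plus kernel-level isolation of the reserved cores) on behalf of the container or VM, rather than the workload pinning itself.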
##### Support for dynamic hugepages allocation
* This requires the cloud platform to support dynamically allocating the necessary amount of faster memory (a.k.a. HugePages) to the container or VM as needed, and also relinquishing this memory allocation in the event of unexpected termination.
##### Support for topology manager
* While each CPU on one socket can access the memory region of the CPUs on another socket of the same board, the access time is significantly slower when crossing socket boundaries, and this will affect performance significantly.
* Non-Uniform Memory Access (NUMA) is the resulting hardware configuration with multiple memory regions.
* Ensure all containers/VMs are associated with core(s) connected to the same NUMA region, to guarantee response time.
* The cloud platform needs to manage the NUMA topology --> ensure placement of specified containers/VMs on cores within the same NUMA region, with the devices the application uses attached to the same NUMA node.

##### Support for Scale in/out
* As the number of subscribers increases, the system needs to start more container/VM instances to maintain service quality. The cloud platform can monitor the CPU load: if the load reaches a level such as 80%, it scales out; if the CPU load drops below 40%, it can scale in.
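The 80%/40% policy above can be sketched as a simple controller. The gap between the two thresholds (hysteresis) is what prevents the instance count from oscillating around a single threshold; the function name and step-by-one behavior are illustrative choices:

```python
# Sketch of the scale-out/scale-in policy described above: scale out when CPU
# load reaches 80%, scale in when it drops below 40%.
SCALE_OUT_AT = 0.80
SCALE_IN_AT = 0.40

def desired_instances(current: int, cpu_load: float, min_instances: int = 1) -> int:
    if cpu_load >= SCALE_OUT_AT:
        return current + 1                  # start one more instance
    if cpu_load < SCALE_IN_AT and current > min_instances:
        return current - 1                  # release one instance
    return current                          # within the comfort band

n = 2
for load in (0.85, 0.90, 0.55, 0.30, 0.30):
    n = desired_instances(n, load)
print(n)  # -> 2: scaled out to 4 under load, then back in as load dropped
```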
##### Support for Device Plugin
* The cloud platform will need to provide a mechanism to support those accelerators. This in turn requires the ability to discover, advertise, schedule, and manage devices such as SR-IOV NICs, GPUs, and FPGAs.
#### Storage Requirement
* O-RAN Cloudified NF needs storage for the image and for the O-RAN Cloudified NF itself.
#### Notification Subscription Framework
* Applications should retrieve notifications that are necessary for their functionality.
* Applications should have the ability to select the resources that will provide them notifications about the status of these resources, initial state and changing state.
* Allow applications to subscribe to notifications without privileged mode.
#### O-Cloud Notification Subscription Requirements

**Tracking Function:**
* Tracks the resource state and/or its relevant data.
* Can be configured with a tracking frequency per resource being tracked.
**Registration Function:**
* Allows applications and/or the SMO to query for the resources that provide notifications, subscribe to receiving/pulling notifications from the selected resource(s), unsubscribe from notification(s), and update the notification function.
**Notification Function**
* Used by the tracking function to message registered listeners about the resource state and/or its relevant data.
* Pulls from the tracking function at the request of the application and/or SMO.
* As soon as an application and/or the SMO registers, it receives a notification of the status of the resource(s) it subscribed to.
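The three functions above fit a small publish-subscribe sketch: tracking records state, registration delivers the initial state on subscribe, and notification pushes subsequent changes. The class and method names are illustrative, not from the spec:

```python
# Minimal sketch of the tracking / registration / notification functions.
from collections import defaultdict

class NotificationService:
    def __init__(self):
        self.state = {}                        # resource -> last known state
        self.listeners = defaultdict(list)     # resource -> callbacks

    def subscribe(self, resource, callback):
        """Registration function: on subscribe, send the initial state."""
        self.listeners[resource].append(callback)
        if resource in self.state:
            callback(resource, self.state[resource])

    def track(self, resource, new_state):
        """Tracking function: record state and notify on change."""
        if self.state.get(resource) != new_state:
            self.state[resource] = new_state
            for cb in self.listeners[resource]:
                cb(resource, new_state)        # notification function

svc = NotificationService()
svc.track("ptp0", "LOCKED")
events = []
svc.subscribe("ptp0", lambda r, s: events.append((r, s)))  # gets initial state
svc.track("ptp0", "FREERUN")                               # gets the change
print(events)  # [('ptp0', 'LOCKED'), ('ptp0', 'FREERUN')]
```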
### Sync Architecture
#### Cloud Platform Time Synchronization Architecture
* Relies on usage of Precision Time Protocol(PTP)IEEE 1588v2 to synchronize clocks throughout the edge cloud site
* For LLS-C3, the vO-DU may act as a Telecom Time Slave Clock (T-TSC) and select as its time source the same SyncE and PTP distribution from the fronthaul as the O-RU.
##### Edge Cloud Site Level- LLS-C3 Synchronization Topology

**Primary Reference Time Clock (PRTC)**: Traceable time source--> based on GNSS/GPS
**Compute nodes**: Compute Nodes synchronize their clocks to a Grandmaster Clock via the Fronthaul Network
**Controller nodes**: Controller Nodes synchronize their clocks to the Network Time Protocol (NTP) via the Management Network

##### LLS-C3 Sync Topology edge site requirement
**software**:
Support for PTP will be needed in all the edge-site O-Cloud nodes that support compute roles and run a slave clock.
**hardware**:
Use of High speed, low latency Network Interface Card (NIC) with support for PTP Hardware Clock (PHC) subsystem for the data interface (fronthaul) on all the compute node(s) that will run the O-vDU function.
#### Loss of Sync notification
* Once an application subscribes to PTP notifications, it receives initial data showing the PHC synchronization state, and it will receive notifications whenever the sync status changes and/or per request for notification (pull).
* If an O-vDU transitions to the FREERUN state because the synchronizing network delivers unacceptable synchronization quality, the O-vDU shall disable RF transmission on all connected O-RUs and keep it turned off until synchronization is reacquired.
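The required behavior reduces to a small state handler: on a FREERUN notification, disable RF on all connected O-RUs; on reacquiring lock, re-enable it. The class shape and state names used here are an illustrative sketch of that rule:

```python
# Sketch of the FREERUN handling described above. Types and names are
# illustrative, not from the spec.
class OvDU:
    def __init__(self, o_rus):
        self.o_rus = o_rus
        self.rf_enabled = True

    def on_sync_notification(self, sync_state: str):
        if sync_state == "FREERUN" and self.rf_enabled:
            self.rf_enabled = False            # disable RF on all O-RUs
        elif sync_state == "LOCKED" and not self.rf_enabled:
            self.rf_enabled = True             # sync reacquired: re-enable

du = OvDU(o_rus=["o-ru-1", "o-ru-2"])
du.on_sync_notification("FREERUN")
assert du.rf_enabled is False                  # RF stays off while in FREERUN
du.on_sync_notification("LOCKED")
assert du.rf_enabled is True                   # RF restored after reacquisition
```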

### Overview of Deployment Scenario

**Scenario A**

* The near-RT RIC, O-CU, and O-DU are virtualized on the same cloud platform.
* Deployed in dense urban areas with an abundance of fronthaul capacity.
* Allows BBU functionality to be pooled in a central location.
**Scenario B**
Standalone

* Regional cloud: near-RT RIC(virtualized)
* edge cloud: O-CU, O-DU(virtualized)
* For limited remote fronthaul capacity
Non-Standalone

**Scenario C**

* Regional cloud: near-RT RIC, O-CU(virtualized)
* Edge cloud: O-DU(virtualized)
* For limited remote fronthaul capacity
However, for service types with tighter O-CU delay requirements, it can be divided into C.1 and C.2.
**Scenario C.1**


* According to service type, the O-CU-UP is divided into O-CU-UP1 (regional cloud) and O-CU-UP2 (edge cloud).
**Scenario C.2**
* There are different vO-DU instances in the edge cloud


* Fig. 33 shows that different component carriers can be allocated to different operators on the same O-RU at the same time.

**Scenario D**

* Regional cloud: near-RT RIC, O-CU(virtualized)
* Edge location: O-DU(PNF)
* Same use cases and performance requirement as scenario C
**Scenario E**

* Regional cloud: near-RT RIC, O-CU(virtualized)
* Cell site: O-DU, O-RU(virtualized in cloud)
* A future scenario.
**Scenario F**

* Regional cloud: near-RT RIC, O-CU(virtualized)
* Edge cloud: O-DU(virtualized)
* Cell site: O-RU(virtualized in cloud)
### Scenarios of Initial Interest
* Scenario B has been selected as the one to address initially.
## O-RAN O2 General Aspects and Principles Specification
### O2 Interface
* Provides secured communication between the SMO and the O-Cloud.
* Provides platform resource and workload management.
* Enables the management of O-Cloud infrastructures and the deployment life-cycle management of cloudified NFs (which run on the O-Cloud).
Divided into two classes:
(1) managing the infrastructure
(2) managing deployments on that infrastructure

**O-Cloud infrastructure**
- Discovery and administration
- Scale-In, Scale-Out
- FCAPS (PM, CM, FM, Communication Surveillance)
- Platform Software Management
**Deployment**
- Creation, Deletion and assignment of O-Cloud infrastructure
- Scale-In, Scale-Out of assigned O-Cloud infrastructure resources
- FCAPS (PM, FM) for assigned O-Cloud infrastructure resources
- Software Management
### O-Cloud
* The SMO may manage several O-Clouds from different vendors as a single entity --> Federated O-Cloud Orchestration and Management (FOCOM).
* O-Cloud nodes are assigned for specific use by the O-Cloud management software according to a blueprint.

**Management plane** : Responsible for managing the O-Cloud
**Control plane**: Responsible for hosting the software which manages the resources assigned in the deployment plane to specific deployment instances
**Deployment plane**: Hosts the service-model deployments.
### O-Cloud Inventory
* Consists of the physical infrastructure (which creates the O-Cloud), the logical cloud (provided as the interface for deployments), and the inventory of deployments on the cloud.
* The SMO is responsible for financial accounting of the physical inventory and for tracking logical inventory to physical or virtual inventory.
* The O-Cloud is responsible for physical resource allocation and for tracking virtual inventory to physical inventory.
#### Physical Inventory
* Passed between the SMO and the O-Cloud over O2-M.
* O-Cloud node ID:
(1) used to correlate the inventory
(2) needs to be discoverable by the O-Cloud
(3) is immutable
* The IMS discovers O-Cloud nodes as they power on within an O-Cloud pool:
(1) the IMS allocates them to a plane (according to the blueprint)
(2) the IMS sends the O-Cloud node identifier, pool identifier, pool location identifier, and use identifier to the SMO
* The SMO receives O-Cloud node startup events and updates its inventory accordingly.
(1) O-Cloud node identifier: matched with invoice data to meet financial asset-tracking requirements
(2) Location and use identifiers: track inventory that will be monitored as part of cloud service assurance (e.g., repair-technician maintenance work on the physical cloud infrastructure)

### Logical Inventory
* An O-Cloud has one or more DMSs available within its distributed footprint.
* Each DMS endpoint provides an O2-D interface and is inventoried by the SMO as a logical cloud.
* The logical cloud endpoint is used by the SMO to select the O-Cloud to be used for a deployment.
### Deployment Inventory
* Deployment Descriptor: a completed data model matching a capability advertised by the logical cloud.
* The NFO matches the descriptor type with the capability type to select the logical cloud to enable the deployment.
* A Deployment Descriptor may result in one or more cloud resources, but returns a single correlation ID --> the Deployment ID.
### O-Cloud Monitoring Service
* To avoid service disruption --> the operator must consider telemetry information.
* Telemetry information: used to analyze the O-Cloud's state and health, and to deliver on service monitoring goals.
(1) Managed Element Telemetry
(2) Deployment Telemetry
(3) Infrastructure Telemetry

* Managed Element Telemetry (addressed by O1)
objective:
monitor the application behavior
* Deployment Telemetry (addressed by O2)
objective:
(1) monitor the number of deployment instances an O-Cloud has at that moment versus how many were expected
(2) monitor how an in-progress deployment is going: CPU, network, memory usage
* Infrastructure Telemetry (addressed by O2)
Objective:
(1) monitor the health of the O-Cloud Infrastructure components
(2) discovering whether all components in the O-Cloud Infrastructure are working properly
(3) how many deployments are running on each node
(4) resource utilization of O-Cloud infrastructure
### O-Cloud Provisioning
* Allocate O-Cloud resources and services to an O-RAN Cloudified NF.
* O-Cloud resources are deployed to match the O-RAN Cloudified NF’s fluctuating demands
* O-Cloud provisioning will need to provide several functionalities
(1) Affinity, Anti-Affinity, Quorum Diversity Rules
(2) Capacity Query
(3) Availability Query
#### Affinity, Anti-Affinity, Quorum Diversity Rules
* Affinity rule
Deployments with the same rule applied must be collocated within the same scope.
* Anti-Affinity rule
Deployments with the same rule applied cannot be collocated within the same scope.
* Quorum Diversity rule
(1) Deployments can be collocated only if fewer than 50% of the deployments exist within the same scope.
(2) Only enforced when there is a minimum of 3 deployments.
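The Quorum Diversity rule can be sketched as a placement predicate. This encoding (checking the scope's share after the candidate placement, and waiving the rule below 3 deployments) is one illustrative interpretation of the two conditions above:

```python
# Sketch of the Quorum Diversity check: a new deployment may be placed in a
# scope only if, afterwards, that scope holds fewer than 50% of all
# deployments in the group; the rule applies only once the group would have
# at least 3 deployments. Illustrative interpretation, not normative.
def may_place(per_scope: dict, scope: str) -> bool:
    total_after = sum(per_scope.values()) + 1
    in_scope_after = per_scope.get(scope, 0) + 1
    if total_after < 3:
        return True                        # rule not enforced yet
    return in_scope_after / total_after < 0.5

group = {"site-a": 1, "site-b": 1, "site-c": 0}
print(may_place(group, "site-a"))  # False: 2 of 3 would sit in one scope
print(may_place(group, "site-c"))  # True: 1 of 3 stays under 50%
```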
#### O-Cloud capacity and availability
* The capacity is the amount of resources allocated or reserved to an O-Cloud.
* Capacity may be larger than the sum of the physical resources (oversubscription).
* The availability of an O-Cloud = capacity - allocated resources.
* O-Cloud Resources are allocated to a DMS for use.
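A worked example of the definitions above. The CPU count and oversubscription ratio are illustrative assumptions, not values mandated by the spec:

```python
# Capacity may exceed the physical resources (oversubscription), and
# availability is capacity minus what is already allocated.
physical_cpus = 64
oversubscription = 1.5          # illustrative ratio, not from the spec
capacity = int(physical_cpus * oversubscription)   # 96 vCPUs

allocated = 70                  # vCPUs already handed out to DMSs
availability = capacity - allocated

print(f"capacity={capacity} allocated={allocated} availability={availability}")
assert capacity > physical_cpus         # oversubscription in effect
assert availability == 26
```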
### O-Cloud Software Management
* SW management supports security and cost management:
(1) prevents unauthorized SW from being installed
(2) maintains a catalog of authorized SW and its versions
* Both the O-Cloud infrastructure SW and the cloudified NF SW need to be managed over the O2 interface.
### O-Cloud Life Cycle Management
* Goal: Reduce the cost and complexity by orchestrating the deployment and management of the O-Cloud
(1) Deployment --> provide automated provisioning of the O-Cloud Infrastructure
(2) Registration --> register an O-Cloud and make it available for deployments


### O-Cloud Deployment Life Cycle Management
* Deploy
Deploy a cloudified NF with O-Cloud resources
* Terminate
Terminate a cloudified NF, releasing its O-Cloud resources
* Scale
Scale functional behavior and resources
* Heal
Recover from / mitigate a cloudified NF's abnormal behavior in the network
## Orchestration Use Cases and Requirements for O-RAN virtualized RAN
### Objectives
* Use cases for O-RAN orchestration of virtualized RAN
* Interface used for management and orchestration
* Understand the coordination of RAN components performed by the SMO to deploy and operate O-RAN services.
### Orchestration Use cases
* Instantiation and deployment of O-Cloud and NFs
* Updating of O-Cloud and NFs
* Scaling of O-Cloud and NFs