The work on the framework definition started in the [Sylva Framework Definition HackMD](https://hackmd.io/PICLVB2DSCOoLtRFu97VGA?view).
Sylva compliant?!
Our main goal is to define what it means to be "Sylva compliant".
Sylva Framework next steps with Gergely Csatari (Nokia):
* Adopting Anuket RA2 requirements in Sylva. Modify the RA2 requirements to be aligned with Sylva if needed.
* Consuming conformance testing [Sylva test suite](https://github.com/petorre/functest-kubernetes/tree/master/sylva)
* Tests the platform against the RA2 requirements of Anuket
* Will be included in RC2
* At the moment, RC2 tests only for [Kubernetes Software Conformance](https://www.cncf.io/training/certification/software-conformance/)
* Capturing all the relevant requirements into a list.
* Build Guidelines
* Define Integration points with external network components
* Define the boundaries & responsibility of Sylva - mainly the scope of Sylva.
* O2 IMS
* Shift to operators
> Gergely Csatari: We should set up an issue tracker for managing the tasks and the roadmap and link that to here.
> Alberto Morgante: I guess he means the GitLab issues of the repo
# Framework Principles
> **Author:** Andrew Toth
>
> Suggestion 1: Organize the requirements in 2 sections, Requirements for the platform and Requirements for CNFs running on the platform and prefix the requirements with PLAT and CNF accordingly. All requirements should be reviewed for necessity vs reference implementation before committing in the official pages.
> **Author:** Riccardo Gasparetto Stori
>
> * We should clearly separate requirements for the Platform from guidelines for Workloads (CNFs) running on it, moving any requirements that are actually rules for applications into this "Build on Sylva" section.
> * Core vs. Profiles: Are all requirements mandatory? Or can we introduce the concept of "Profiles"? For example:
> * Core Profile: The absolute minimum to be called "Sylva-compliant".
> * High-Performance Profile: Adds requirements for SR-IOV, DPDK, NUMA alignment, etc.
> * EUCS-Ready Profile: Adds requirements related to data sovereignty and EU-based operations.
> **Author:** Gergely Csatari
>
> We did not really agree on the scope of the framework, and this table assumes that both the management cluster and the workload clusters are in scope. If this is okay for everyone, it is okay for me too.
A key principle of this framework is to define the required interfaces and capabilities, not the specific tools used for the implementation.
> **Author:** Gergely Csatari
>
> We should focus on the interface towards the CNF and not the implementation. Anything that can consume what Prometheus can is okay.
> **Author:** Riccardo Gasparetto Stori
>
> Agree, to create a specification that allows for multiple compliant implementations, the requirements should define the interface or capability, not the specific tool that provides it.
>
> in this case:
>
> "The platform MUST expose infrastructure and workload metrics in the OpenMetrics format."
>
> "The platform MUST provide a mechanism for collecting container logs from stdout/stderr."
>
> This allows an implementation to use a different but compatible stack. The requirement is about the data format and availability, which is what a CNF developer actually depends on.
>
> > **Author:** Alberto Morgante
> >
> > I'd agree with you, but always keeping the full list of specifications (as you defined in the code block) of the input/output params well defined. In these kinds of situations the amount of interpretation can vary, so at the end of the day someone could claim it's covered in a specific platform when it's not. If we define the standard specifications properly, there's no doubt.
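To make the interface-first wording above concrete, here is a minimal sketch of what a CNF would actually rely on. The image, port, and `prometheus.io/*` annotations are illustrative assumptions: the annotations are a common scraping convention, not part of any Sylva requirement, and any OpenMetrics-compatible collector could consume the endpoint.

```yaml
# Minimal sketch: a CNF pod that depends only on the interface, not the tool.
apiVersion: v1
kind: Pod
metadata:
  name: example-cnf
  annotations:
    prometheus.io/scrape: "true"   # hint for whichever OpenMetrics collector the platform runs
    prometheus.io/port: "8080"
spec:
  containers:
    - name: app
      image: registry.example.com/example-cnf:1.0   # hypothetical image
      ports:
        - name: metrics
          containerPort: 8080      # serves /metrics in the OpenMetrics format
      # Logs are written to stdout/stderr only; the platform's log pipeline
      # (whatever implements it) is responsible for collection.
```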
Another topic: a demonstration on OSS next week at WG05, to get it onto the agenda.
What do we want to achieve?
How would we like to achieve it?
What does it mean to be Sylva compliant? Do we have a definition for this aspect?
Notes:
Gergely - Hold regular calls with all the relevant people in order to get more involvement and push this initiative.
Agreement between Ran & Gergely:
To define what it means to say "Sylva compliant".
# Scope
## Deployment view
Examples:
```
* CaaS Manager - automated lifecycle management (LCM) of Kubernetes clusters
* Cluster LCM
* Cluster CICD
* Sylva-Compliant Kubernetes Cluster, incl:
* Kubernetes components
* Mandatory platform addons e.g. CNI/CSI
* Security posture
* The capabilities required from the underlying infra (BM or VM): e.g. must support SRIOV, must provide block storage...
```
* Workload clusters
## Functional view
* Workload LCM
* ? Workload CI/CD ?
* ? Cluster LCM ?
* ? Cluster CI/CD ?
## What's NOT included (Out of scope)
Examples:
```
* Hardware models
* Physical infra provisioning
* CNFs and CNF LCM - the separate "Build on Sylva" section provides guidelines for CNF developers, but the behaviour of applications running on the platform is out of scope
```
# Use cases
* Core networks - control plane / signalling data/voice core
* Core networks - data plane / signalling data/voice core
* Core networks - charging
* RAN
* OSS
# Profiles and Flavours
Concepts:
* Profile: A mutually exclusive, foundational template that defines the primary purpose and core architecture of an entity (a cluster or a node). An entity MUST have exactly one Profile.
* Analogy: When buying a car, its Profile is "Sedan," "SUV," or "Truck." It defines the fundamental chassis and purpose. It cannot be both a Sedan and a Truck.
* Flavour: An optional, additive set of capabilities or characteristics that can be applied on top of a Profile. An entity can have zero, one, or many Flavours.
* Analogy: Your "Sedan" can have the "Luxury Package" Flavour, the "Sport Package" Flavour, and the "Cold Weather" Flavour. These add features but don't change its fundamental nature as a Sedan.
> Cristi Manda: a workload cluster does not have to be homogeneous; if these concepts apply to a workload cluster, I think this is wrong. We can have a workload cluster with a number of workers dedicated to "compute-intensive" jobs, some other workers doing "basic" jobs, and so on. I wouldn't make it "exclusive" for a cluster, only for a node, but this depends on what we consider a "profile". If the "profile" only states the infra and bootstrap provider, and maybe some other basic settings, then yes, this is OK; if the profile defines more intrusive settings, then it is not OK.
| Scope | Concept | Purpose | Examples |
| -------- | -------- | -------- | - |
| Cluster | Profile | Defines the entire environment's mission. Sets the Architectural Blueprint. Defines the mandatory platform components (e.g., CNI, storage philosophy) and the target use case. A cluster is built to conform to one profile |sylva-core-network, sylva-ran |
| Cluster | Flavour | Applies a global capability or policy. Applied on top of a Cluster Profile to meet security, governance, or advanced feature requirements | service-mesh-enabled, eucs-compliant |
| Node | Profile | Defines a node's primary role. Groups Nodes by Role. Can be implemented as a node pool or MachineDeployment. Ensures all nodes in a group have the same base OS, kernel, and core software configuration | worker-du, worker-upf |
| Node | Flavour | Advertises a specific node feature. Enables Precise Workload Scheduling. Implemented as Kubernetes node labels. | sriov.capable=true, ptp.enabled=true |
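As a minimal sketch, the node-scope concepts above could surface on a Node object as shown below, assuming labels are the mechanism (as the table suggests for node Flavours). The `sylva.io/node-profile` key is a hypothetical example, while the Flavour labels are taken from the table.

```yaml
# Illustrative only: one possible way node Profiles and Flavours appear as labels.
apiVersion: v1
kind: Node
metadata:
  name: worker-du-01
  labels:
    sylva.io/node-profile: worker-du   # exactly one Profile per node (hypothetical key)
    sriov.capable: "true"              # additive Flavour labels, zero or more
    ptp.enabled: "true"
```

Workloads would then target such nodes with a nodeSelector or node affinity on those labels.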
# Framework Requirements
This page is the central, living list of requirements for any Sylva-compliant stack. Please propose changes via Merge Request or comment. Sources should be given for traceability.
### Open Points
* What should be considered "core" vs "optional" requirements?
* Are there any missing requirements?
### Requirements Table Structure
The following structure should be used for defining requirements to ensure clarity, traceability, and testability.
> **Author:** Riccardo Gasparetto Stori
>
> A suggestion on how the requirement table(s) fields could look like:
| ID | Requirement | Description | Justification | Implication | Link to Conformance Test | Link to implementation(s) |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- |
| PLAT.LCM.001 | Declarative Cluster Lifecycle | The platform MUST support declarative, API-driven lifecycle management (create, update, delete) for workload clusters. | Enables automation and GitOps workflows, reducing operational overhead and ensuring consistency. | A CaaS manager entity must be run to manage Cluster LCM | SYL.TEST.LCM.001 | Link to other sylva docs? |
| SYL.PLAT.NET.001 | Multiple Network Interfaces | The platform MUST provide a mechanism for pods to attach to multiple networks, in addition to the default pod network. | Critical for telco workloads which require separation of traffic planes (e.g., control, user, management). | A Network multiplexer must be installed in workload clusters | SYL.TEST.NET.001 | Link to other sylva docs? |
> **Author:** Riccardo Gasparetto Stori
>
> so that we have:
>
> * Unique, Hierarchical ID
> * A Clear Requirement Statement (Title)
> * A detailed description using keywords like MUST, SHOULD, MAY (RFC 2119).
> * a Justification: Explain why this requirement exists
> * an Implication (if any) explaining non trivial "so what"s
> * Verification: Explicitly link each requirement to a test case ID. This could directly connect the requirements to the tests-conformance.md page.
> * Implementation: this could link a requirement to its implementation code, or doc(s). can be used for multiple implementations
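To illustrate the proposed fields, a single requirement could also be captured as structured data so it can be traced and checked automatically. The YAML schema below is a hypothetical sketch mirroring the table columns, not an agreed Sylva format; the field names are illustrative only.

```yaml
# Hypothetical sketch of one requirement record, mirroring the proposed table fields.
- id: PLAT.LCM.001
  title: Declarative Cluster Lifecycle
  statement: >-
    The platform MUST support declarative, API-driven lifecycle management
    (create, update, delete) for workload clusters.
  justification: Enables automation and GitOps workflows.
  implication: A CaaS manager entity must be run to manage cluster LCM.
  conformance_test: SYL.TEST.LCM.001
  implementations: []   # links to Sylva docs/code
```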
### Requirements List
| # | Category | Requirement Description | Source / Comments |
| :-- | :--- | :--- | :--- |
| 1 | Functional | Support a scalable, cloud-native infrastructure for telco workloads across RAN, Core, and edge. | [White Paper] |
| | | | > **Author:** Gergely Csatari <br> What does MEC mean here? If this is not particularly the ETSI MEC spec I would just generalize to "edge". |
| 2 | Distributed Deployment | Manage distributed clusters across edge/RAN/core with central control and lifecycle management. | [White Paper] |
| 3 | Platform Management | Use a centralized management cluster to deploy and manage multiple workload clusters. | [Solution Overview (v1.4)] |
| 4 | Lifecycle Automation | Adopt GitOps for declarative/automated infra/app lifecycle management. | [Solution Overview (v1.4)] <br><br> > **Author:** Andrew Toth <br> Remove fluxCD as a hard requirement and replace with GitOps as ArgoCD has also been mentioned as a viable tool in the past few months. <br><br> > **Author:** Gergely Csatari <br> I would not yet call out FluxCD here. <br><br> > **Author:** Gergely Csatari <br> I would not mention specifically Helm here. There are more advanced tooling for KRM based resource management. |
| 5 | Kubernetes Lifecycle | Use Cluster API (CAPI) for management/workload cluster lifecycle and declarative ops. | [Solution Overview (v1.4)] |
| | | | > **Author:** Andrew Toth <br> Add a Requirement stating that Sylva Units will be used to hold/control the configuration of the platform and CNFs. |
| 6 | OS & Resources | Support multiple Linux distributions and resource requirements. | [TODO: Update link] <br><br> > **Author:** Alberto Morgante <br> The same comment about FluxCD could apply here too... let's remove any commercial/provider distribution to make it agnostic. |
| 7 | Networking - CNI | Support multiple Kubernetes CNI plugins (e.g., Multus) for control/data planes. | [Anuket RA2 requirements: ra2.ntw.003, ra2.ntw.004, ra2.ntw.005, ra2.ntw.006, ra2.ntw.007, ra2.ntw.009](https://cntt.readthedocs.io/projects/ra2/en/latest/chapters/chapter04.html#networking-solutions) <br><br> > **Author:** Gergely Csatari <br> Use the exact RA2 requirement numbers, like ra2.ntw.003. The RA2 requirement is not to enable multiple CNI's but to enable multiple connections per Pod. (maybe we agree in multiple CNI's here though) I would remove the references, I think they are not relevant. Maybe Multus is :) <br><br> > **Author:** Alberto Morgante <br> IMO we should have multiple CNI's as you said with Multus enabled. This is a standard in the RAN, Core implementations. |
| 8 | Networking - SR-IOV & Low Latency | Enable SR-IOV support for high-throughput workloads (5G UPF/DU). Enable low-latency capabilities (e.g. RT Kernel). | [Anuket RA2 requirements: ra2.ntw.008, ra2.ntw.010](https://cntt.readthedocs.io/projects/ra2/en/latest/chapters/chapter04.html#networking-solutions), [ra2.ch.002, ra2.ch.003](https://cntt.readthedocs.io/projects/ra2/en/latest/chapters/chapter04.html#kubernetes-node) <br><br> > **Author:** Alberto Morgante <br> Low latency is not fully related to SRIOV. I'd split in 2 steps where: <br> • SR-IOV support high-throughput in 5G UPF/DU workloads, <br> • but also, enabling the RT kernel (as well as other features) provides low-latency workloads enablement. |
| 9 | Networking - DPDK | Support DPDK-enabled workloads for high-performance packet processing. | [Anuket RA2 requirements: ra2.ntw.010](https://cntt.readthedocs.io/projects/ra2/en/latest/chapters/chapter04.html#networking-solutions) |
| 10 | Hardware Acceleration | PCI passthrough, SR-IOV, NUMA for hardware offload. | [Anuket RA2 requirements: ra2.ntw.008](https://cntt.readthedocs.io/projects/ra2/en/latest/chapters/chapter04.html#networking-solutions), [ra2.k8s.006](https://cntt.readthedocs.io/projects/ra2/en/latest/chapters/chapter04.html#kubernetes) |
| 11 | Resource Isolation | CPU pinning, hugepages, topology manager for deterministic latency/performance isolation. | [Anuket RA2 requirements: ra2.k8s.009](https://cntt.readthedocs.io/projects/ra2/en/latest/chapters/chapter04.html#kubernetes), [ra2.ch.001](https://cntt.readthedocs.io/projects/ra2/en/latest/chapters/chapter04.html#kubernetes-node) |
| 12 | Security | Enforce RBAC, Pod Security Standards (PSS), secure bootstrap with certificate validation. | [Anuket RA2 Security Guidelines](https://cntt.readthedocs.io/projects/ra2/en/latest/chapters/chapter05.html) [TODO: Update link] |
| 13 | Observability | Provide infrastructure and workload metrics in the OpenMetrics format. Provide a mechanism for collecting container logs from stdout/stderr. (e.g., using Prometheus, Fluent Bit, Loki, Grafana) | [Solution Overview (v1.4)] <br><br> (This text is based on the consensus to specify the interface, not the tool.) |
| 14 | Compliance - ETSI | Align with ETSI NFV MANO (SOL018, SOL020) and GSMA Operator Platform requirements. | [White Paper] <br><br> > **Author:** Gergely Csatari <br> Can someone remind me what are the SOL018, SOL020 and GSMA Operator Platform requirements? According to my memories these are orthogonal to platform compatibility. |
| 15 | Data Sovereignty (EUCS) | Ensure compliance with EUCS for data locality, control, and protection in the EU. | [EUCS - EC Newsroom] |
| 16 | Operational Control (EUCS) | Ensure operation by EU-based entities with EU-resident staff. | [EUCS - EC Newsroom] |
| 17 | Access Restriction (EUCS) | Restrict access from non-EU jurisdictions per EUCS criteria. | [EUCS - EC Newsroom] |
| 18 | Encryption (EUCS) | Encrypt all data in transit and at rest, with EU-compliant key management. | [EUCS - EC Newsroom] |
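As an illustration of requirements 4 and 5 above (and PLAT.LCM.001 in the structure example), the sketch below shows a declarative Cluster API `Cluster` object. The names, namespace, and the Docker infrastructure provider are placeholders only; a Sylva deployment would reference its own providers (CAPO, CAPV, etc., as discussed later in this document).

```yaml
# Minimal sketch of declarative, API-driven cluster lifecycle via Cluster API.
apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
  name: workload-cluster-01
  namespace: workload-clusters
spec:
  clusterNetwork:
    pods:
      cidrBlocks: ["192.168.0.0/16"]
  controlPlaneRef:
    apiVersion: controlplane.cluster.x-k8s.io/v1beta1
    kind: KubeadmControlPlane
    name: workload-cluster-01-control-plane
  infrastructureRef:
    apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
    kind: DockerCluster               # placeholder provider for illustration
    name: workload-cluster-01
```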
#### RA2 Requirements for completeness
##### [Node Operating System](https://cntt.readthedocs.io/projects/ra2/en/latest/chapters/chapter04.html#node-operating-system)
| Ref | Specification | Details | Requirement Trace | Reference Implementation Trace |
|------------|-------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|------------------|--------------------------------|
| ra2.os.001 | Linux Distribution | A deb-/rpm-compatible distribution of Linux. It must be used for the control plane nodes. It can also be used for worker nodes. | tbd | tbd |
| ra2.os.002 | Linux kernel version | A version of the Linux kernel that is compatible with container runtimes and kubeadm - this has been chosen as the baseline because kubeadm is focused on installing and managing the lifecycle of Kubernetes and nothing else, hence it is easily integrated into higher-level tooling for the full lifecycle management of the infrastructure, cluster add-ons, and so on. | tbd | tbd |
| ra2.os.003 | Windows server | The Windows server can be used for worker nodes, but beware of the limitations. | tbd | tbd |
| ra2.os.004 | Disposable OS | In order to support gen.cnt.02 in :ref:`chapters/chapter02:kubernetes architecture requirements` (immutable infrastructure), the Host OS must be disposable, meaning the configuration of the Host OS (and associated infrastructure such as VM or bare metal server) must be consistent - e.g. the system software and configuration must be identical apart from IP addresses and hostnames. | tbd | tbd |
| ra2.os.005 | Automated deployment | This approach to configuration management supports lcm.gen.01 (automated deployments). | tbd | tbd |
##### [Kubernetes](https://cntt.readthedocs.io/projects/ra2/en/latest/chapters/chapter04.html#kubernetes)
| Ref | Specification | Details | Requirement Trace | Reference Implementation Trace |
|------------|--------------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|------------------------------------------------------------------------------------------------|
| ra2.k8s.001 | Kubernetes conformance | The Kubernetes distribution, product, or installer used in the implementation must be listed in the :cite:t:`k8s-distributions` and marked (X) as conformant for the Kubernetes version defined in :ref:`chapters/chapter01:required component versions`. | gen.cnt.03 in :ref:`chapters/chapter02:kubernetes architecture requirements` | :cite:t:`anuket-ri2` Chapter 4, section Installation on Bare Metal Infrastructure |
| ra2.k8s.002 | Highly available etcd | An implementation must consist of either three, five or seven nodes running the etcd service (can be colocated on the control plane nodes, or can run on separate nodes, but not on worker nodes). | gen.rsl.02 in :ref:`chapters/chapter02:kubernetes architecture requirements`, gen.avl.01 in :ref:`chapters/chapter02:kubernetes architecture requirements` | :cite:t:`anuket-ri2` Chapter 4, section Installation on Bare Metal Infrastructure |
| ra2.k8s.003 | Highly available control plane | An implementation must consist of at least one control plane node per availability zone or fault domain to ensure the high availability and resilience of the Kubernetes control plane services. | | |
| ra2.k8s.012 | Control plane services | A control plane node must run at least the following Kubernetes control plane services: kube-apiserver, kube-scheduler and kube-controller-manager. | gen.rsl.02 in :ref:`chapters/chapter02:kubernetes architecture requirements`, gen.avl.01 in :ref:`chapters/chapter02:kubernetes architecture requirements` | :cite:t:`anuket-ri2` Chapter 4, section Installation on Bare Metal Infrastructure |
| ra2.k8s.004 | Highly available worker nodes | An implementation must consist of at least one worker node per availability zone or fault domain to ensure the high availability and resilience of workloads managed by Kubernetes. | en.rsl.01 in :ref:`chapters/chapter02:kubernetes architecture requirements`, gen.avl.01 in :ref:`chapters/chapter02:kubernetes architecture requirements`, kcm.gen.02 in :ref:`chapters/chapter02:kubernetes architecture requirements` | |
| ra2.k8s.005 | Kubernetes API Version | An implementation must use a Kubernetes version as per the subcomponent versions table in :ref:`chapters/chapter01:required component versions`. In alignment with the :cite:t:`k8s-version-skew-policy`, the difference between the kubernetes release of the control plane nodes and the kubernetes release of the worker nodes must be at most **3** releases (i.e. a n-3 skew). | | |
| ra2.k8s.006 | NUMA support | When hosting workloads matching the high-performance profile, the TopologyManager and CPUManager feature gates must be enabled and configured in the kubelet. --feature-gates="…, TopologyManager=true,CPUManager=true" --topology-manager-policy=single-numa-node --cpu-manager-policy=static <br><br> **Note:** The TopologyManager feature is enabled by default in Kubernetes v1.18 and later, and the CPUManager feature is enabled by default in Kubernetes v1.10 and later. | e.cap.007 in :ref:`chapters/chapter02:cloud infrastructure software profile capabilities`, infra.com.cfg.002 in :cite:t:`refmodel`, e.cap.013 :cite:t:`refmodel` Chapter 8 | |
| ra2.k8s.007 | DevicePlugins feature gate | When hosting workloads matching the high-performance profile, the DevicePlugins feature gate must be enabled. Additionally, to utilize device health reporting, the `DeviceHealth` feature gate should be enabled. --feature-gates="…,DevicePlugins=true,DeviceHealth=true,…" <br><br> **Note:** The DevicePlugins feature is enabled by default in Kubernetes v1.10 or later. Device plugins can report device health status directly in the Pod's `allocatedResources` field. | Various, e.g. e.cap.013 in :cite:t:`refmodel` Chapter 8, section Exposed Performance Optimisation Capabilities | :cite:t:`anuket-ri2` Chapter 4, section Installation on Bare Metal Infrastructure |
| ra2.k8s.008 | System resource reservations | To avoid resource starvation issues on the nodes, the implementation of the architecture must reserve compute resources for system daemons and Kubernetes system daemons such as kubelet, container runtime, and so on. Use the following kubelet flags: --reserved-cpus=[a-z], using two of a-z to reserve 2 SMT threads. | i.cap.014 in :ref:`chapters/chapter02:cloud infrastructure software profile capabilities` | |
| ra2.k8s.009 | CPU pinning | When hosting workloads matching the high-performance profile, in order to support CPU pinning, the kubelet must be started with the --cpu-manager-policy=static option. <br><br> **Note:** Only containers in Guaranteed pods - where CPU resource requests and limits are identical - and configured with positive-integer CPU requests will take advantage of this. All other pods will run on CPUs in the remaining shared pool. | infra.com.cfg.003 in :cite:t:`refmodel` Chapter 5 | |
| ra2.k8s.010 | IPv6DualStack | To support IPv6 and IPv4, the IPv6DualStack feature gate must be enabled on various components (requires Kubernetes v1.16 or later). kube-apiserver: --feature-gates="IPv6DualStack=true". kube-controller-manager: --feature-gates="IPv6DualStack=true" --cluster-cidr=<IPv4 CIDR>,<IPv6 CIDR> --service-cluster-ip-range=<IPv4 CIDR>, <IPv6 CIDR> --node-cidr-mask-size-ipv4 ¦ --node-cidr-mask-size-ipv6 defaults to /24 for IPv4 and /64 for IPv6. kubelet: --feature-gates="IPv6DualStack=true". kube-proxy: --cluster-cidr=<IPv4 CIDR>, <IPv6 CIDR> --feature-gates="IPv6DualStack=true". <br><br> **Note:** The IPv6DualStack feature is enabled by default in Kubernetes v1.21 or later. | inf.ntw.04 in :ref:`chapters/chapter02:kubernetes architecture requirements` | |
| ra2.k8s.011 | Anuket profile labels | To clearly identify which worker nodes are compliant with the different profiles defined by Anuket, the worker nodes must be labeled according to the following pattern: an `anuket.io/profile/basic` label must be set to true on the worker node if it can fulfill the requirements of the basic profile and an `anuket.io/profile/network-intensive` label must be set to true on the worker node if it can fulfill the requirements of the high-performance profile. The requirements for both profiles can be found in :ref:`chapters/chapter02:architecture requirements`. | | |
| ra2.k8s.012 | Kubernetes APIs | Kubernetes :cite:t:`k8s-alpha-api` are recommended only for testing, therefore all Alpha APIs must be disabled, except for those required by RA2 Ch4 Specifications (currently NFD). | | |
| ra2.k8s.013 | Kubernetes APIs | Backward compatibility of all supported GA APIs of Kubernetes must be supported. | | |
| ra2.k8s.014 | Security groups | Kubernetes must support the NetworkPolicy feature. | | |
| ra2.k8s.015 | Publishing Services (ServiceTypes) | Kubernetes must support LoadBalancer Service (ServiceTypes) :cite:p:`k8s-services-publishing`. | | |
| ra2.k8s.016 | Publishing Services (ServiceTypes) | Kubernetes must support Ingress :cite:p:`k8s-service-ingress`. | | |
| ra2.k8s.017 | Publishing Services (ServiceTypes) | Kubernetes should support NodePort Service (ServiceTypes) :cite:p:`k8s-services-publishing`. | inf.ntw.17 in :ref:`chapters/chapter02:kubernetes architecture requirements` | |
| ra2.k8s.018 | Publishing Services (ServiceTypes) | Kubernetes should support ExternalName Service (ServiceTypes) :cite:p:`k8s-services-publishing`. | | |
| ra2.k8s.019 | Kubernetes APIs | Kubernetes Beta APIs must be disabled, except for existing APIs as of Kubernetes 1.24 and only when a stable GA of the same version doesn't exist, or for APIs listed in RA2 Ch6 list of Mandatory API Groups. | int.api.04 in :ref:`chapters/chapter02:kubernetes architecture requirements` | |
| ra2.k8s.020 | TLS Certificate management for workloads | Cert-manager :cite:p:`cert-manager` should be supported and integrated with a PKI certificate provider for workloads to request/renew TLS certificates. It must be configured to use strong hashing algorithms such as SHA-256 for all certificates. SHA-1 signed certificates are deprecated and will be rejected by default starting with Kubernetes 1.31. | int.api.04 in :ref:`chapters/chapter02:kubernetes architecture requirements` | kcm.gen.03 |
| ra2.k8s.021 | Resource Allocation and Management | The Kubernetes scheduler must support the allocation of virtual compute, storage, and networking resources to Pods, as defined in Pod specifications. This includes resource requests and limits for CPU, memory, and ephemeral storage, as well as the attachment of Persistent Volumes and network interfaces. The scheduler must also support resource isolation between tenants using Kubernetes Namespaces. | e.man.001, e.man.002, e.man.003, e.man.004 :cite:t:`refmodel` Chapter 9, section Cloud Infrastructure Management Capabilities | |
| ra2.k8s.022 | Workload Image Management | Kubernetes must support the management of container images, including pulling images from registries, storing images locally, and making them available for Pod execution. | e.man.005 :cite:t:`refmodel` Chapter 9, section Cloud Infrastructure Management Capabilities | |
| ra2.k8s.023 | Resource Monitoring and Notifications | Kubernetes must provide information about allocated virtualised resources per tenant, including resource usage metrics. It must also support notifications for state changes of allocated resources (e.g., Pod creation, deletion, updates) and expose performance information. The platform should provide mechanisms for collecting and notifying fault information on virtualised resources. | e.man.006, e.man.007, e.man.008, e.man.009 :cite:t:`refmodel` Chapter 9, section Cloud Infrastructure Management Capabilities | |
| ra2.k8s.024 | Sidecar Containers | An implementation must support sidecar containers. This feature is needed for deploying service mesh proxies, log shippers, and other supporting processes with CNFs. | TBC | |
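To make the kubelet-related rows above (ra2.k8s.006, ra2.k8s.008, ra2.k8s.009) more tangible, here is a minimal KubeletConfiguration sketch for a high-performance worker node. The reserved CPU values are placeholders; current kubelets expose these settings as configuration fields equivalent to the command-line flags quoted in the table.

```yaml
# Illustrative KubeletConfiguration fragment for a high-performance worker.
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
cpuManagerPolicy: static              # enables CPU pinning for Guaranteed pods (ra2.k8s.009)
topologyManagerPolicy: single-numa-node   # NUMA alignment (ra2.k8s.006)
reservedSystemCPUs: "0,1"             # reserve host CPUs for system and Kubernetes daemons (ra2.k8s.008)
```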
##### [Container runtimes](https://cntt.readthedocs.io/projects/ra2/en/latest/chapters/chapter04.html#container-runtimes)
| Ref | Specification | Details | Requirement Trace | Reference Implementation Trace |
|-------------|------------------------------------------------------------------|------------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------|----------------------------------------------------------------------------------|
| ra2.crt.001 | Conformance with the Open Container Initiative (OCI) 1.0 runtime specification | The container runtime must be implemented as per the OCI 1.0 :cite:p:`github-oci-specification` specification. | gen.ost.01 in :ref:`chapters/chapter02:kubernetes architecture requirements` | :cite:t:`anuket-ri2` Chapter 4, section Installation on Bare Metal Infrastructure |
| ra2.crt.002 | Kubernetes Container Runtime Interface (CRI) | The Kubernetes container runtime must be implemented as per the Kubernetes Container Runtime Interface (CRI) :cite:p:`k8s-blog-cri` | gen.ost.01 in :ref:`chapters/chapter02:kubernetes architecture requirements` | :cite:t:`anuket-ri2` Chapter 4, section Installation on Bare Metal Infrastructure |
##### [Networking solutions](https://cntt.readthedocs.io/projects/ra2/en/latest/chapters/chapter04.html#networking-solutions)
| Ref | Specification | Details | Requirement Trace | Reference Implementation Trace |
|-------------|-------------------------------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------|------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------|
| ra2.ntw.001 | Centralized network administration | The networking solution deployed within the implementation must be administered through the Kubernetes API using native Kubernetes API resources and objects, or Custom Resources. | inf.ntw.03 in :ref:`chapters/chapter02:kubernetes architecture requirements` | :cite:t:`anuket-ri2` Chapter 4, section Installation on Bare Metal Infrastructure |
| ra2.ntw.002 | Default Pod Network - CNI | The networking solution deployed within the implementation must use a CNI-conformant Network Plugin for the Default Pod Network, as the alternative (kubenet) does not support cross-node networking or Network Policies. | gen.ost.01 in :ref:`chapters/chapter02:kubernetes architecture requirements`, inf.ntw.08 in :ref:`chapters/chapter02:kubernetes architecture requirements` | :cite:t:`anuket-ri2` Chapter 4, section Installation on Bare Metal Infrastructure |
| ra2.ntw.003 | Multiple connection points | The networking solution deployed within the implementation must support the capability to connect at least 5 connection points to each Pod, which are additional to the default connection point managed by the default Pod network CNI plugin. | e.cap.004 in :ref:`chapters/chapter02:cloud infrastructure software profile capabilities` | :cite:t:`anuket-ri2` Chapter 4, section Installation on Bare Metal Infrastructure |
| ra2.ntw.004 | Multiple connection points presentation | The networking solution deployed within the implementation must ensure that all additional non-default connection points are requested by Pods using standard Kubernetes resource scheduling mechanisms, such as annotations, or container resource requests and limits. | inf.ntw.03 in :ref:`chapters/chapter02:kubernetes architecture requirements` | :cite:t:`anuket-ri2` Chapter 4, section Installation on Bare Metal Infrastructure |
| ra2.ntw.005 | Multiplexer/meta-plugin | The networking solution deployed within the implementation may use a multiplexer/meta-plugin. | inf.ntw.06 in :ref:`chapters/chapter02:kubernetes architecture requirements`, inf.ntw.07 in :ref:`chapters/chapter02:kubernetes architecture requirements` | :cite:t:`anuket-ri2` Chapter 4, section Installation on Bare Metal Infrastructure |
| ra2.ntw.006 | Multiplexer/meta-plugin CNI conformance | If used, the selected multiplexer/meta-plugin must integrate with the Kubernetes control plane via CNI. | gen.ost.01 in :ref:`chapters/chapter02:kubernetes architecture requirements` | :cite:t:`anuket-ri2` Chapter 4, section Installation on Bare Metal Infrastructure |
| ra2.ntw.007 | Multiplexer/meta-plugin CNI Plugins | If used, the selected multiplexer/meta-plugin must support the use of multiple CNI-conformant Network Plugins. | gen.ost.01 in :ref:`chapters/chapter02:kubernetes architecture requirements`, inf.ntw.06 :ref:`chapters/chapter02:kubernetes architecture requirements` | :cite:t:`anuket-ri2` Chapter 4, section Installation on Bare Metal Infrastructure |
| ra2.ntw.008 | SR-IOV device plugin for high performance | When hosting workloads that match the high-performance profile and require SR-IOV acceleration, a Device Plugin for SR-IOV must be used to configure the SR-IOV devices and advertise them to the kubelet. | e.cap.013 in :cite:t:`refmodel` Chapter 4, section Exposed Performance Optimisation Capabilities` | :cite:t:`anuket-ri2` Chapter 4, section Installation on Bare Metal Infrastructure |
| ra2.ntw.009 | Multiple connection points with multiplexer / meta-plugin | When a multiplexer/meta-plugin is used, the additional non-default connection points must be managed by a CNI-conformant Network Plugin. | gen.ost.01 in :ref:`chapters/chapter02:kubernetes architecture requirements` | :cite:t:`anuket-ri2` Chapter 4, section Installation on Bare Metal Infrastructure |
| ra2.ntw.010 | User plane networking | When hosting workloads that match the high-performance profile, CNI network plugins that support the use of DPDK, VPP, and/or SR-IOV must be deployed as part of the networking solution. | infra.net.acc.cfg.001 in :cite:t:`refmodel`, Chapter 5, section Virtual Networking Profiles | :cite:t:`anuket-ri2` Chapter 4, section Installation on Bare Metal Infrastructure |
| ra2.ntw.011 | NATless connectivity | When hosting workloads that require source and destination IP addresses to be preserved in the traffic headers, a NATless CNI plugin that exposes the pod IP directly to the external networks (e.g. Calico, MACVLAN or IPVLAN CNI plugins) must be used. | inf.ntw.14 in :ref:`chapters/chapter02:kubernetes architecture requirements` | |
| ra2.ntw.012 | Device Plugins | When hosting workloads matching the High Performance profile that require the use of FPGA, SR-IOV or other Acceleration Hardware, a Device Plugin for that FPGA or Acceleration Hardware must be used. | e.cap.016 and e.cap.013 in :cite:t:`refmodel`, Chapter 4, section Exposed Performance Optimisation Capabilities` | :cite:t:`anuket-ri2` Chapter 4, section Installation on Bare Metal Infrastructure |
| ra2.ntw.013 | Dual-stack CNI | The networking solution deployed within the implementation must use a CNI-conformant network plugin that is able to support dual-stack IPv4/IPv6 networking. | inf.ntw.04 in :ref:`chapters/chapter02:kubernetes architecture requirements` | |
| ra2.ntw.014 | Security groups | The networking solution deployed within the implementation must support network policies. | infra.net.cfg.004 :cite:t:`refmodel` Chapter 5, section Virtual Networking Profiles | |
| ra2.ntw.015 | IPAM plugin for multiplexer | When a multiplexer/meta-plugin is used, a CNI-conformant IPAM network plugin must be installed to allocate IP addresses for secondary network interfaces across all nodes of the cluster. | inf.ntw.10 in :ref:`chapters/chapter02:kubernetes architecture requirements` | |
| ra2.ntw.016 | Kubernetes Network Custom Resource Definition De-Facto Standard-compliant multiplexer/meta-plugin | When a multiplexer/meta-plugin is used, the multiplexer/meta-plugin must implement version 1.2 of the :cite:t:`multi-net-spec`. | gen.ost.01 in :ref:`chapters/chapter02:kubernetes architecture requirements` | :cite:t:`anuket-ri2` Chapter 4, section Installation on Bare Metal Infrastructure |
| ra2.ntw.017 | Kubernetes Load Balancer | The networking solution deployed within the implementation must include a L4 (TCP/UDP - except QUIC) Load Balancer to steer inbound traffic across the primary interfaces of multiple CNF pods. | inf.ntw.15 in :ref:`chapters/chapter02:kubernetes architecture requirements` | |
| ra2.ntw.018 | Kubernetes Load Balancer - API | The Load Balancer solution deployed per `ra2.ntw.017` must support the Service type Loadbalancer API. | inf.ntw.15 in :ref:`chapters/chapter02:kubernetes architecture requirements` | |
| ra2.ntw.019 | Kubernetes Load Balancer - API | The Load Balancer solution deployed per `ra2.ntw.017` may support the Gateway API additionally. | inf.ntw.15 in :ref:`chapters/chapter02:kubernetes architecture requirements` | |
| ra2.ntw.020 | Kubernetes Load Balancer - Advertisements | The Load Balancer solution deployed per `ra2.ntw.017` must be capable of advertising the IPs of Services to external networks. | inf.ntw.15 in :ref:`chapters/chapter02:kubernetes architecture requirements` | |
| ra2.ntw.021 | Kubernetes Load Balancer - Active/active Multipath | The Load Balancer solution deployed per `ra2.ntw.017` must support multi-path advertisements in an active/active design, allowing the same service IP to be advertised by multiple cluster nodes. | inf.ntw.15 in :ref:`chapters/chapter02:kubernetes architecture requirements` | |
| ra2.ntw.022 | Kubernetes Load Balancer - High Availability | The networking solution deployed per `ra2.ntw.017` must be capable of fast failover. Upon node or pod failure, it must redirect traffic (i.e., advertisements/routes must be updated) in less than 5 seconds. | inf.ntw.15 in :ref:`chapters/chapter02:kubernetes architecture requirements` | |
| ra2.ntw.023 | Time Sensitive Networking | Timing accuracy with PTP Hardware Clock and synchronization with SyncE. | e.cap.027 from :cite:t:`refmodel` Chapter 4, section Exposed infrastructure capabilities | |
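The rows ra2.ntw.003 to ra2.ntw.005 and ra2.ntw.016 describe additional, non-default connection points via a multiplexer/meta-plugin. The hedged sketch below shows one common realization using the de-facto multi-network CRD (as implemented by e.g. Multus); the network name, CNI configuration, and image are illustrative placeholders, not mandated values.

```yaml
# Sketch of a secondary connection point requested through the multi-network CRD.
apiVersion: k8s.cni.cncf.io/v1
kind: NetworkAttachmentDefinition
metadata:
  name: user-plane-net
spec:
  config: |
    {
      "cniVersion": "0.3.1",
      "type": "macvlan",
      "master": "eth1",
      "ipam": { "type": "static" }
    }
---
apiVersion: v1
kind: Pod
metadata:
  name: upf-example
  annotations:
    k8s.v1.cni.cncf.io/networks: user-plane-net   # requests the secondary attachment
spec:
  containers:
    - name: upf
      image: registry.example.com/upf:1.0          # hypothetical image
```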
##### [Storage components](https://cntt.readthedocs.io/projects/ra2/en/latest/chapters/chapter04.html#storage-components)
| Ref | Specification | Details | Requirement Trace | Reference Implementation Trace |
|-------------|---------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|------------------------------------------------------------------------------------------------------|--------------------------------|
| ra2.stg.001 | Ephemeral storage | An implementation must support ephemeral storage, for the unpacked container images to be stored and executed from, as a directory in the filesystem on the worker node on which the container is running. See the `Container runtimes <#container-runtimes>`__ section above for more information on how this meets the requirement for ephemeral storage for containers. | | |
| ra2.stg.002 | Kubernetes Volumes | An implementation may attach additional storage to containers using Kubernetes Volumes. | | |
| ra2.stg.003 | Kubernetes Volumes | An implementation may use Volume Plugins (see ra2.stg.005 below) to allow the use of a storage protocol (such as iSCSI and NFS) or management APIs (such as Cinder and EBS) for the attaching and mounting of storage into a Pod. | | |
| ra2.stg.004 | Persistent Volumes | An implementation may support Kubernetes Persistent Volumes (PV) to provide persistent storage for Pods. Persistent Volumes exist independent of the lifecycle of containers and/or pods. | inf.stg.01 in :ref:`chapters/chapter02:kubernetes architecture requirements` | |
| ra2.stg.005 | Storage Volume Types | An implementation must support the following Volume types: emptyDir, ConfigMap, Secret, and PersistentVolumeClaim. Other Volume plugins may be supported to allow for the use of a range of backend storage systems. | | |
| ra2.stg.006 | Container Storage Interface (CSI) | An implementation may support the Container Storage Interface (CSI). CSI drivers must be used as in-tree volume plugins are deprecated and have been removed. To support CSI, the feature gates CSIDriverRegistry and CSINodeInfo must be enabled. The implementation must use a CSI driver (full list of CSI drivers :cite:p:`k8s-csi-drivers`). An implementation may support ephemeral storage through a CSI-compatible volume plugin. In this case, the CSIInlineVolume feature gate must be enabled. An implementation may support Persistent Volumes through a CSI-compatible volume plugin. In this case, the CSIPersistentVolume feature gate must be enabled. | | |
| ra2.stg.007 | Storage Classes | An implementation should use Kubernetes Storage Classes to support automation and the separation of concerns between providers of a service and consumers of the service. | | |
| ra2.stg.008 | Storage with Replication | An implementation may support Kubernetes Persistent Volumes (PV) with replication to provide redundant storage for Pods. Replication should be configurable and occur at the storage layer, independent of the Pod lifecycle. The replication mechanism should ensure data consistency across replicas. | infra.stg.cfg.003 :cite:t:`refmodel` Chapter 5, section Virtual Storage | |
| ra2.stg.009 | Storage with Encryption | An implementation may support Kubernetes Persistent Volumes (PV) with encryption at rest. Encryption should be configurable at the storage layer. Key management for encryption should follow security best practices, and integrate with a dedicated key management system (KMS). | infra.stg.cfg.004 :cite:t:`refmodel` Chapter 5, section Virtual Storage | |
| ra2.stg.010 | Storage Class Support | An implementation must support at least one storage class through Kubernetes Persistent Volumes (PVs) and Persistent Volume Claims (PVCs). The supported storage class(es) must be documented. Support for both Block and File storage classes is optional, but recommended. | inf.stg.02 in :ref:`chapters/chapter02:kubernetes architecture requirements` | |
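As a small illustration of ra2.stg.007 and ra2.stg.010, the sketch below shows a documented StorageClass consumed through a PersistentVolumeClaim. The provisioner name is a hypothetical placeholder for whichever CSI driver an implementation ships.

```yaml
# Illustrative StorageClass plus a PVC consuming it.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: block-standard
provisioner: csi.example.com          # hypothetical CSI driver
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: block-standard
  resources:
    requests:
      storage: 10Gi
```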
##### [Infrastructure Monitoring and Telemetry](https://github.com/anuket-project/RA2/blob/main/doc/ref_arch/kubernetes/chapters/chapter04.rst#infrastructure-monitoring-and-telemetry)
| Ref | Specification | Details | Requirement Trace | Reference Implementation Trace |
|-------------|-------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------|-------------------------------|
| ra2.mon.001 | Node and Control Plane Metrics | The implementation **MUST** collect key resource utilization metrics for all Kubernetes nodes (worker and control plane). This includes, at minimum, CPU usage, memory usage, disk I/O, disk usage, and network I/O. Metrics **MUST** also be collected for core control plane components (kube-apiserver, etcd, kube-controller-manager, kube-scheduler, kubelet) covering aspects like request latency, error rates, and resource consumption where applicable. | i.pm.001, i.pm.003, i.pm.005, i.pm.006, i.pm.009, i.pm.010, i.pm.011, i.pm.013, i.pm.014, i.pm.015, i.pm.017 - i.pm.022 (Intent covered), e.man.008 | |
| ra2.mon.002 | Node and Control Plane Log Aggregation | The implementation **MUST** provide a mechanism to aggregate logs from all Kubernetes nodes and core control plane components. This includes system logs (e.g., journald/syslog), container runtime logs, kubelet logs, and logs from kube-apiserver, etcd, kube-controller-manager, and kube-scheduler. Aggregated logs **SHOULD** be centrally stored and searchable. | sec.lcm.005, sec.mon.001, e.man.009 | |
| ra2.mon.003 | Kubernetes Event Collection | The implementation **MUST** collect events generated by the Kubernetes API server. These events provide insights into cluster state changes, scheduling decisions, errors, and other operational activities. Collected events **SHOULD** be centrally stored and accessible for troubleshooting and auditing. | e.man.007 | |
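The monitoring rows above are deliberately tool-agnostic. Purely as one possible realization, assuming a prometheus-operator-style stack (which is not mandated), metrics collection for ra2.mon.001 could be wired up as sketched below; the names and labels are illustrative.

```yaml
# One possible realization: a PodMonitor selecting pods that expose an OpenMetrics endpoint.
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: node-exporter
  namespace: monitoring
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: node-exporter
  podMetricsEndpoints:
    - port: metrics     # OpenMetrics endpoint exposed by the selected pods
```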
##### [Kubernetes Application package managers](https://cntt.readthedocs.io/projects/ra2/en/latest/chapters/chapter04.html#kubernetes-application-package-managers)
| Ref | Specification | Details | Requirement Trace | Reference Implementation Trace |
|-------------|----------------------------|-------------------------------------------------------------------------------------------------------------|--------------------------------------------------------------------------------------|-------------------------------|
| ra2.pkg.001 | API-based package management | A package manager must use the Kubernetes APIs to manage application artifacts. Cluster-side components such as Tiller must not be required. | int.api.02 in :ref:`chapters/chapter02:kubernetes architecture requirements` | |
| ra2.pkg.002 | Helm version 3 | All workloads must be packaged using Helm (version 3) charts. | | |
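For ra2.pkg.002, a minimal `Chart.yaml` is enough to show what "packaged as a Helm (version 3) chart" means in practice; the chart name and versions below are placeholders.

```yaml
# Minimal Chart.yaml sketch for a hypothetical CNF chart.
apiVersion: v2          # apiVersion v2 identifies a Helm 3 chart
name: example-cnf
description: Example CNF packaged as a Helm 3 chart
type: application
version: 0.1.0
appVersion: "1.0.0"
```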
### Detailed Requirements Under Discussion (WIP)
Below are additional specific requirements and features that were under discussion in the merge request. These points need to be reviewed, refined, and potentially moved into the formal requirements table.
* Using CAPI / CAPO / CAPV / CAPD for Cluster LCM
> **Author:** Alberto Morgante
>
> Is this limited to CAPI (or the generic CAPx)? Have ClusterClass and some improvements been discarded? Sorry if this is something that was already discussed before.
* CNI Installation and configuration
* Support MultiCluster deployment
* Configuration of SR-IOV interfaces
* CPU pinning and Huge Pages (see the pod sketch after this list)
* Support Precision Time Protocol (PTP)
* Persistent storage
* Cluster Monitoring (including energy efficiency)
* Policy Management
> **Author:** Alberto Morgante
>
> What does this mean? CPU policy management?
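Several of the items above (SR-IOV interface configuration, CPU pinning, hugepages) meet in the workload spec. The hedged sketch below shows a Guaranteed-QoS pod requesting pinned CPUs, hugepages, and an SR-IOV device; the `example.com/sriov_netdevice` resource name, the network annotation, and the image are hypothetical and depend on whichever device plugin and meta-plugin the platform deploys.

```yaml
# Sketch of a Guaranteed-QoS data-plane pod combining pinned CPUs, hugepages and SR-IOV.
apiVersion: v1
kind: Pod
metadata:
  name: dataplane-example
  annotations:
    k8s.v1.cni.cncf.io/networks: user-plane-net   # hypothetical secondary network
spec:
  containers:
    - name: dataplane
      image: registry.example.com/dataplane:1.0    # hypothetical image
      resources:
        requests:
          cpu: "4"                                 # requests == limits -> Guaranteed QoS, eligible for CPU pinning
          memory: 8Gi
          hugepages-1Gi: 2Gi
          example.com/sriov_netdevice: "1"         # hypothetical SR-IOV resource name
        limits:
          cpu: "4"
          memory: 8Gi
          hugepages-1Gi: 2Gi
          example.com/sriov_netdevice: "1"
```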
# Tests, Conformance & Certification
### Overview
The goal of the Tests & Conformance section is to ensure that any deployment claiming Sylva compliance meets all framework requirements in a consistent and reproducible way.
It provides a foundation for compliance, conformance, and interoperability testing.
Tests cover:
* Functional requirements (see [Framework Requirements](#framework-requirements))
* Integration and interoperability (multi-cluster, bare metal, etc.)
* Security and policy conformance
* Compatibility with industry standards (e.g., [Anuket RA2](https://docs.anuket.io/projects/RA2/en/latest/))
### Test Categories
> **Author:** Alberto Morgante
>
> I'd split [Performance & Scaling] in 2:
>
> * Scalability tests: as you have in the description (the same things).
> * A new performance test category: to review networking performance, CPU performance, etc.
>
> Again, performance tests, networking tests are even more important than the functional tests in some cases. I'd add the full list of tests to be covered.
>
> > **Author:** Gergely Csatari
> >
> > Now we really start to sound like OPNFV🤨 Do you mean that we should run artificial workloads testing the capabilities and capacities of the platform?
> >
> > I would add testing requirements in a different MR. This MR is getting too big.
| Test Category | Description | Status | Notes/Links |
| :--- | :--- | :--- | :--- |
| **Functional** | Verify cluster LCM, networking, storage, etc. | WIP | [TODO: link to test specs] |
| **CNF Validation** | Ensure CNFs run and interoperate as expected | WIP | [TODO: link to scenarios] |
| **Security** | Validate RBAC, Pod Security, admission controls, etc. | WIP | |
| **Observability** | Confirm deployment and health of monitoring stack | WIP | |
| **Anuket RA2 Compliance** | Run official [Functest](https://github.com/petorre/functest-kubernetes/tree/master/sylva) suite | WIP | [Functest Sylva](https://github.com/petorre/functest-kubernetes/tree/master/sylva) |
| **Performance** | Review networking performance, CPU performance, etc. | WIP | |
| **Scaling** | Evaluate cluster/resource scaling, failover, etc. | WIP | |
| **EUCS Compliance** | Validate EU sovereignty and data protection requirements | WIP | |
### How to Run the Tests
> **Author:** Alberto Morgante
>
> Is this part of each implementation? Should it be generic?
> _Instructions or links for running the official conformance test suite(s) on a Sylva deployment._
* [TODO: Add step-by-step guide]
* Prerequisites & setup - **TODO**
* Example test commands - **TODO**
* Expected results/reporting - **TODO**
* How to contribute new tests - **TODO**
### Reporting & Certification
> **Author:** Alberto Morgante
>
> IMO we need this, and it is the most important part of saying something is compliant. The right definition of this part defines the success of the framework.
>
> I'd say, let's first define the "what" we have to document and share, and let's work later on the "how" to share it.
>
> IMO, as a framework, there should be a validation/certification platform where we can test the outputs to validate them (all of us using the same approach).
* [TODO: Describe how results should be documented and shared]
* [TODO: Outline any process for Sylva compliance "badging" or certification, if relevant]
# Reference Implementation
This page describes the current state of the Sylva Reference Implementation.
### What is the Reference Implementation?
The Sylva Reference Implementation is the canonical open-source stack that demonstrates how to fulfill the framework requirements in practice.
It serves as:
* A working example of all key requirements and features described in the [Framework Requirements](#framework-requirements) page
* A baseline for new contributors and integrators
* A foundation for compliance, conformance and interoperability testing
### Components
> **Author:** Gergely Csatari
>
> There is a mixing of abstraction levels happening here. There are deployment view elements, like the management or workload cluster, there are software functions like the monitoring stack or network features, and there are some software components. We should define what the "components" are and limit the list to them.
* **Management Cluster** (Kubernetes, GitOps tooling, Cluster API)
* **Workload Clusters**
* **Bare Metal Provisioning** (if applicable)
* **GitOps Repositories** (cluster & app manifests)
* **Monitoring Stack** (Prometheus, Grafana, Loki, etc.)
* **Network Features** (SR-IOV, Multus, etc.)
* **Security Hardening** (RBAC, PSS, etc.)
# Relevant Links
* [The white paper (PDF)](https://gitlab.com/sylva-projects/sylva/-/blob/main/White_Paper_Operators_Sylva.pdf)
* [The FAQ (PDF)](https://gitlab.com/sylva-projects/sylva/-/blob/main/FAQ_Sylva.pdf)
* [The technical charter (PDF)](https://gitlab.com/sylva-projects/sylva/-/blob/main/Project_Sylva_Technical_Charter.pdf)
* [Technical Steering Committee](https://gitlab.com/sylva-projects/governance/-/wikis/home)
* [Technical Documentation (v1.4)](https://sylva-projects.gitlab.io/docs/1.4/getting-started/solution-overview/)
* [Main README](https://gitlab.com/sylva-projects/sylva/-/blob/main/README.md)
* **Anuket RA2 links:**
* [RA2 Root](https://docs.anuket.io/projects/RA2/en/latest/)
* [RA2 Chapter 2](https://docs.anuket.io/projects/RA2/en/latest/chapters/chapter02.html)
* [RA2 Chapter 4](https://docs.anuket.io/projects/RA2/en/latest/chapters/chapter04.html)
* **Conformance testing for Sylva (Anuket RA2):**
* [Functest Kubernetes - Sylva](https://github.com/petorre/functest-kubernetes/tree/master/sylva)
# Meeting notes
## 2025.07.28
* Agree to have a clear requirements doc (E2E / Management + Workload / building block).
* We should agree on the use cases and the scope of the framework
* Logistics
* We use this HackMD to collect opinions and agree on things, and we use MRs in https://gitlab.com/sylva-projects/sylva-projects.gitlab.io/ to finalize the agreement
* Add the [use cases](#Use-cases) and the respective [Scope](#Scope)
### Discussion
* There will be a need to define different profiles. Core and RAN are the obvious ones
* Node flavours should be defined as well
* CI/CD should be the default option for CNF deployment
* There should be a differentiation between the platform requirements and the CNF requirements
* When do we want to bring these discussions to the TSC?
* Define the targets, scope, and use cases, and then introduce them to the TSC