# [228] Update best_cnf_dev.md to match categories in CNF Test Suite #228

## Description:

The categories in best_cnf_dev.md are no longer in sync with the test categories in the related CNF Test Suite.

### TODO:

- [x] review https://github.com/cncf/cnf-testsuite/blob/main/TEST-CATEGORIES.md
- [x] determine if those categories work for the CNF WG for now
- [x] Update best_cnf_dev.md to match https://github.com/cncf/cnf-testsuite/blob/main/TEST-CATEGORIES.md
- [x] see line #198 for suggested edits
- [x] Pull request [#239](https://github.com/cncf/cnf-wg/pull/239) created

---

### Current CNF WG Table of Contents

- Location: https://github.com/cncf/cnf-wg/blob/main/doc/best_cnf_dev.md

Best Practices for CNF Developers
---

### Table of Contents

* 1.0 Installation and Upgrade
* 2.0 Configuration and Lifecycle
* 3.0 Hardware support
* 4.0 Microservices
* 5.0 Compatibility
* 6.0 State
* 7.0 Security
* 8.0 Scaling
* 9.0 Observability and Diagnostics
* 10.0 Resilience

1.0 Installation and Upgrade
---

Best practices affecting the basic management of CNF software - the acceptance of CNFs from a development team, the installation onto the network, the creation of instances in clouds within the network, and the version management of running software.

2.0 Configuration and Lifecycle
---

Working with running CNFs: common patterns for setting and changing the behaviour of network functions.

BEST_CNF_DEV_1001: Application is configured via standard way

Description: Some Description here

Link to CBPP: CBPP-0001

BEST_CNF_DEV_1002: Application announces its own membership of the tier to its peers

Description: Some Description here

Link to CBPP: CBPP-0002

3.0 Hardware support
---

How to co-ordinate hardware to best enable CNF efficiency. Low-level access to hardware such as NICs; access to acceleration technologies; efficient management of standard resources, including manipulation of lower-level system components such as CPU cores, CPU cache and memory.

4.0 Microservices
---

The best way to use microservice-based design patterns to deliver CNF functionality.

5.0 Compatibility
---

How to ensure CNFs work on as many platforms, in as many places, as possible; how to ensure that cloud platforms enable as many CNFs as possible; how to make CNFs work with each other and common elements in the operational stack.

6.0 State
---

Management of short-lived and long-lived state, including state associated with network flows, configuration data, subscriber activity data and other data according to the varying requirements of resiliency, rate of change, access rate and persistence.

7.0 Security
---

How to ensure that components are protected against security issues, including security advisories on software components, defence against attacks, and defence in depth.

8.0 Scaling
---

Running of CNFs at a variety of different scales to manage different traffic requirements and subscriber counts. Changing the scale of CNFs.

9.0 Observability and Diagnostics
---

How to get critical data about when things are going wrong and how to determine what must be done to put them right. The detection and correction may be through the actions of an operator or via an automated system. Using logs and metrics from all components in the system to narrow down the area where a problem exists, and to drill down into that area to determine a root cause and a fix.

10.0 Resilience
---

How to ensure CNFs are resilient to failures that are inevitable in cloud environments.

---

### Current Test Suite Categories

- Location: https://github.com/cncf/cnf-testsuite/blob/main/TEST-CATEGORIES.md

# Test Suite Categories

The CNF Test Suite program validates interoperability of CNF **workloads** supplied by multiple different vendors orchestrated by Kubernetes **platforms** that are supplied by multiple different vendors.
The goal is to provide an open source test suite to enable both open and closed source CNFs to demonstrate conformance and implementation of best practices.

For more detailed CLI documentation see the [usage document](USAGE.md).

## Compatibility, Installability & Upgradability Tests

#### CNFs should work with any Certified Kubernetes product and any CNI-compatible network that meet their functionality requirements. The CNF Test Suite will check for usage of standard, in-band deployment tools such as Helm (version 3) charts. The CNF Test Suite checks to see if CNFs support horizontal scaling (across multiple machines) and vertical scaling (between sizes of machines) by using the native K8s [kubectl](https://kubernetes.io/docs/reference/kubectl/cheatsheet/#scaling-resources). The CNF Test Suite validates this:

#### On workloads:

- Performing K8s API usage testing by running [API snoop](https://github.com/cncf/apisnoop) on the cluster which:
  - Checks alpha endpoint usage
  - Checks beta endpoint usage
  - Checks generally available (GA) endpoint usage
- Test increasing/decreasing capacity
- Test small scale autoscaling with kubectl
- Test large scale autoscaling with load test tools like [CNF Testbed](https://github.com/cncf/cnf-testbed)
- Test if the CNF control layer responds to retries for failed communication (e.g. using [Pumba](https://github.com/alexei-led/pumba) or [Blockade](https://github.com/worstcase/blockade) for network chaos and [Envoy](https://github.com/envoyproxy/envoy) for retries)
- Testing if the install script uses [Helm v3](https://github.com/helm/)
- Testing if the CNF is published to a public helm chart repository.
- Testing if the Helm chart is valid (e.g. using the [helm linter](https://github.com/helm/chart-testing))
- Testing if the CNF can perform a rolling update (i.e. [kubectl rolling update](https://kubernetes.io/docs/tasks/run-application/rolling-update-replication-controller/))
- Performing CNI Plugin testing which:
  - Tests if CNI Plugin follows the [CNI specification](https://github.com/containernetworking/cni/blob/master/SPEC.md)

## Microservice Tests

#### The CNF should be developed and delivered as a microservice. The CNF Test Suite tests to determine the organizational structure and rate of change of the CNF being tested. Once these are known we can determine whether or not the CNF is a microservice. See: [Microservice-Principles](https://networking.cloud-native-principles.org/cloud-native-microservice-principles):

#### On workloads:

- Checks if the CNF has a reasonable startup time.
- Checks the image size of the CNF.
- Checks for single process on pods.

## State Tests

#### The CNF Test Suite checks if state is stored in a [custom resource definition](https://kubernetes.io/docs/concepts/extend-kubernetes/api-extension/custom-resources/) or a separate database (e.g. [etcd](https://github.com/etcd-io/etcd)) rather than requiring local storage. It also checks to see if state is resilient to node failure:

#### On workloads:

- Checks whether a volume hostpath is found.
- Checks if no local volume is configured.
- Checks if the CNF is using elastic persistent volumes.
- Checks for k8s database persistence.

## Reliability, Resilience & Availability Tests

[Cloud Native Definition](https://github.com/cncf/toc/blob/master/DEFINITION.md) requires systems to be Resilient to failures inevitable in cloud environments. CNF Resilience should be tested to ensure CNFs are designed to deal with non-carrier-grade shared cloud HW/SW platforms:

#### On workloads:

- Checks for network latency
- Performs a disk fill
- Deletes a pod to test reliability and availability.
- Performs a memory hog test for resilience.
- Performs an IO stress test.
- Tests network corruption.
- Tests network duplication.
- Drains a node on the cluster.
- Checking for a liveness entry in the helm chart and if the container is responsive to it after a reset (e.g. by checking the [helm chart entry](https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/))
- Checking for a readiness entry in the helm chart and if the container is responsive to it after a reset

## Observability & Diagnostic Tests

#### In order to maintain, debug, and have insight into a protected environment, infrastructure elements must have the property of being observable. This means these elements must externalize their internal states in some way that lends itself to metrics, tracing, and logging. The Test Suite checks this:

#### On workloads:

- Testing to see if there is traffic to [Fluentd](https://github.com/fluent/fluentd)
- Testing to see if there is traffic to [Jaeger](https://github.com/jaegertracing/jaeger)
- Testing to see if Prometheus rules for the CNF are configured correctly (e.g. using [Promtool](https://prometheus.io/docs/prometheus/latest/configuration/unit_testing_rules/))
- Testing to see if there is traffic to [Prometheus](https://github.com/prometheus/prometheus)
- Testing to see if the monitoring calls are compatible with [OpenMetrics](https://github.com/OpenObservability/OpenMetrics)
- Tests log output.

## Security Tests

#### CNF containers should be isolated from one another and the host. The CNF Test Suite uses tools like [OPA Gatekeeper](https://github.com/open-policy-agent/gatekeeper), [Falco](https://github.com/falcosecurity/falco), and [Armosec Kubescape](https://github.com/armosec/kubescape):

#### On workloads:

- Check if any containers are running in privileged mode.
- Checks root user.
- Checks for privilege escalation.
- Checks symlink file system.
- Checks application credentials.
- Checks if the container or pods can access the host network.
- Checks for service accounts and mappings.
- Checks for ingress and egress being blocked.
- Privileged container checks.
- Verifies if there are insecure and dangerous capabilities.
- Checks network policies.
- Checks for non root containers.
- Checks PID and IPC privileges.
- Checks for Linux hardening, e.g. whether SELinux is used.
- Checks that resource policies are defined.
- Checks for immutable file systems.
- Verifies and checks if any hostpath mounts are used.

---

Changes:

* Hardware support - removed
* Compatibility - merged with Installation and Upgrade
* Scaling - merged with Compatibility, Installability & Upgradability

---

# Suggested edits for CNF WG Table of Contents:

### Table of Contents

* 1.0 Compatibility, Installability & Upgradability
* 2.0 Configuration
* 3.0 Microservices
* 4.0 State
* 5.0 Security
* 6.0 Observability and Diagnostics
* 7.0 Reliability, Resilience & Availability

---

1.0 Compatibility, Installability & Upgradability
---

Best practices affecting the basic management of CNF software - the acceptance of CNFs from a development team, the installation onto the network, the creation of instances in clouds within the network, version management of running software and the compatibility of CNFs on many cloud platforms and with other CNFs.

2.0 Configuration
---

Working with running CNFs: common patterns for setting and changing the behaviour of network functions.

3.0 Microservices
---

The best way to use microservice-based design patterns to deliver CNF functionality.

4.0 State
---

Management of short-lived and long-lived state, including state associated with network flows, configuration data, subscriber activity data and other data according to the varying requirements of resiliency, rate of change, access rate and persistence.

5.0 Security
---

How to ensure that components are protected against security issues, including security advisories on software components, defence against attacks, and defence in depth.
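
Several of the Test Suite's security checks (privileged mode, privilege escalation, host network access, non-root containers) amount to inspecting fields of a pod spec. The following is a minimal sketch of that idea, assuming the pod spec has already been parsed into a Python dict. The function name `find_security_issues` and the sample spec are hypothetical, and this is not how the CNF Test Suite implements its checks. The field names (`securityContext`, `privileged`, `allowPrivilegeEscalation`, `runAsUser`, `runAsNonRoot`, `hostNetwork`) are the standard Kubernetes ones.

```python
from typing import Any


def find_security_issues(pod_spec: dict[str, Any]) -> list[str]:
    """Return human-readable findings for a parsed Kubernetes pod spec."""
    findings: list[str] = []

    # hostNetwork is a pod-level field that exposes the node's network.
    if pod_spec.get("hostNetwork", False):
        findings.append("pod uses the host network")

    pod_ctx = pod_spec.get("securityContext") or {}

    for container in pod_spec.get("containers", []):
        name = container.get("name", "<unnamed>")
        ctx = container.get("securityContext") or {}

        if ctx.get("privileged", False):
            findings.append(f"{name}: runs in privileged mode")

        # Only an explicit `true` is flagged; cluster defaults are not modelled.
        if ctx.get("allowPrivilegeEscalation", False):
            findings.append(f"{name}: allows privilege escalation")

        # Container-level settings take precedence over pod-level ones.
        run_as_user = ctx.get("runAsUser", pod_ctx.get("runAsUser"))
        run_as_non_root = ctx.get("runAsNonRoot", pod_ctx.get("runAsNonRoot", False))
        if run_as_user == 0 or (run_as_user is None and not run_as_non_root):
            findings.append(f"{name}: may run as root")

    return findings


# Hypothetical spec used only to exercise the sketch.
sample_spec = {
    "hostNetwork": True,
    "containers": [
        {"name": "good", "securityContext": {"runAsUser": 1000}},
        {"name": "bad", "securityContext": {"privileged": True}},
    ],
}
for finding in find_security_issues(sample_spec):
    print(finding)
```

The sketch reports "may run as root" conservatively: a container with neither a non-zero `runAsUser` nor `runAsNonRoot: true` is flagged, even though the image itself might declare a non-root user.
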

### CBPP-0002: Container should execute process(es) as non-root user

**Description:** Containers have a list of their own users independent of the host system, one of which is UID 0, the root user. Containers should run processes as a user other than root, which makes it easier to run the container images securely.

**Reference:** [CBPP-0002](https://github.com/cncf/cnf-wg/blob/main/cbpps/0002-no-root-in-containers.md)

6.0 Observability and Diagnostics
---

How to get critical data about when things are going wrong and how to determine what must be done to put them right. The detection and correction may be through the actions of an operator or via an automated system. Using logs and metrics from all components in the system to narrow down the area where a problem exists, and to drill down into that area to determine a root cause and a fix.

7.0 Reliability, Resilience & Availability
---

How to ensure CNFs are resilient to failures that are inevitable in cloud environments.
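
The Test Suite's resilience category includes checks for liveness and readiness entries. As an illustration only, the presence half of such a check could look like the sketch below, assuming a pod spec parsed into a Python dict. The function name `missing_probes` and the sample spec are hypothetical; `livenessProbe` and `readinessProbe` are the standard Kubernetes container fields. The sketch does not test whether a container actually responds to its probes after a reset.

```python
from typing import Any

# Standard Kubernetes container fields for the two probe types.
PROBE_FIELDS = ("livenessProbe", "readinessProbe")


def missing_probes(pod_spec: dict[str, Any]) -> dict[str, list[str]]:
    """Map each container name to the probe fields it does not define."""
    report: dict[str, list[str]] = {}
    for container in pod_spec.get("containers", []):
        missing = [field for field in PROBE_FIELDS if not container.get(field)]
        if missing:
            report[container.get("name", "<unnamed>")] = missing
    return report


# Hypothetical spec used only to exercise the sketch.
sample_spec = {
    "containers": [
        {
            "name": "api",
            "livenessProbe": {"httpGet": {"path": "/healthz", "port": 8080}},
            "readinessProbe": {"tcpSocket": {"port": 8080}},
        },
        {"name": "sidecar"},
    ],
}
print(missing_probes(sample_spec))
```

Containers that define both probes are left out of the report, so an empty result means every container declares a liveness and a readiness probe.
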