# CHAOS Integration with ANT
This document summarizes Mobility’s requirements for Chaos Tooling and
outlines the Prototype for enabling Chaos Testing using ANT.
## Table of Contents
1. [Overview of ANT](#overview-of-ant)
2. [Mobility Requirements](#mobility-requirements)
3. [POC Work](#poc-work)
4. [Target Cluster Components](#target-cluster-components)
5. [Client Components](#client-components)
6. [Chaos TestCase Flow](#chaos-testcase-flow)
7. [Initial Use Case : Network Jitter](#initial-use-case--network-jitter)
8. [Scope of the Work](#scope-of-the-work)
9. [References](#references)
## Overview of ANT
ANT (ATT Networking Testing) is a tool used by AT&T Mobility for all functional
testing of deployed Network Functions (NFs). ANT helps with compliance and functional
testing in accordance with industry standards (IETF, 3GPP, ETSI).
NFs are increasingly deployed as containerized services in a cloud-native
way on platforms like Kubernetes (etsi-ifa029/nc reference). This makes it imperative
that NFs are tested for resiliency and scalability, which requires Chaos-engineering-based
tests.
The Mobility Team at AT&T is looking at ways to incorporate NF Chaos testing capabilities
into the current testing scope using the ANT framework.
## Mobility Requirements
The following are the high-level Mobility requirements for chaos testing:
* Ability to support Chaos for `Pod/Node/Network` in `combination/parallel/serial` execution
* Chaos tests should run along with other functional tests within ANT
* Ability to inject network latency and packet loss
* Support for node loss scenarios
## Scope of the Work
* Chaos Tooling for CNF workloads
* ANT support for chaos testing
* VNF/VM is not the focus
* All chaos tasks are delivered as self-contained Docker images
## POC Work

## Target Cluster Components
#### Argo Workflow
A workflow has a specific set of actions that it executes in a predefined order.
`Argo Workflow` is used to create the workflow.
* Argo Workflow will be pre-installed on the `target` Kubernetes cluster
* The user's selection of `tasks` and `inputs` in the ANT Portal is packaged into an
`Argo workflow` template and submitted to the `target` cluster
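
The packaging step could look roughly like the sketch below. It is not the POC's implementation: the function name, the `stages` input shape, and the image names are assumptions made for illustration. It relies on the Argo `steps` convention, where the outer list runs sequentially and each inner list runs in parallel, which maps onto the serial/parallel execution requirement.

```python
from typing import Any, Dict, List


def build_chaos_workflow(
    stages: List[List[Dict[str, Any]]],
    name_prefix: str = "ant-chaos-",
) -> Dict[str, Any]:
    """Build an Argo Workflow manifest from the user's selected tasks.

    `stages` run one after another; tasks inside one stage run in parallel.
    Each task is a dict with the hypothetical keys `name`, `image`, `args`.
    """
    templates: List[Dict[str, Any]] = []
    steps: List[List[Dict[str, str]]] = []

    for stage in stages:
        parallel_group = []
        for task in stage:
            if task["name"] == "main":
                raise ValueError("'main' is reserved for the entrypoint template")
            # One container template per chaos task image.
            templates.append({
                "name": task["name"],
                "container": {
                    "image": task["image"],
                    "args": list(task.get("args", [])),
                },
            })
            parallel_group.append(
                {"name": task["name"], "template": task["name"]}
            )
        steps.append(parallel_group)

    # The entrypoint template wires the stages together.
    templates.insert(0, {"name": "main", "steps": steps})

    return {
        "apiVersion": "argoproj.io/v1alpha1",
        "kind": "Workflow",
        "metadata": {"generateName": name_prefix},
        "spec": {"entrypoint": "main", "templates": templates},
    }


# Example: delete a pod first, then run two network tasks in parallel.
# The image names are placeholders, not real registry paths.
workflow = build_chaos_workflow([
    [{"name": "pod-delete", "image": "example.local/chaos/pod-delete:latest"}],
    [
        {"name": "net-loss", "image": "example.local/chaos/net-loss:latest",
         "args": ["--percent", "10"]},
        {"name": "net-delay", "image": "example.local/chaos/net-delay:latest",
         "args": ["--delay-ms", "100"]},
    ],
])
```

Using `generateName` instead of a fixed `name` lets the cluster assign a unique suffix, so repeated test runs do not collide.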
#### Litmus
Litmus is a toolset to do cloud-native Chaos Engineering. Litmus provides tools
to orchestrate chaos on Kubernetes to help developers and SREs find weaknesses
in their application deployments.
* Litmus will be pre-installed on the `target` Kubernetes cluster
* Argo Workflow will trigger Litmus based on the user's input from the ANT portal
Litmus broadly defines Kubernetes `chaos experiments` into two categories:
`application` or `pod-level` and `platform` or `infra-level` chaos experiments.
* pod-level experiments include `pod-delete, container-kill, pod-cpu-hog, pod-network-loss, etc.`
* infra-level includes `node-drain, disk-loss, node-cpu-hog, etc.`
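
As a rough illustration of how a workflow step might hand an experiment to Litmus, the sketch below builds a `ChaosEngine` custom resource as a plain dictionary. The field names follow the Litmus `ChaosEngine` CRD as commonly documented and should be checked against the Litmus version installed on the target cluster. The helper names, the `litmus-admin` service account default, and the example values are assumptions.

```python
from typing import Any, Dict

# Experiment names taken from the two categories listed above.
POD_LEVEL = {"pod-delete", "container-kill", "pod-cpu-hog", "pod-network-loss"}
INFRA_LEVEL = {"node-drain", "disk-loss", "node-cpu-hog"}


def experiment_category(experiment: str) -> str:
    """Return 'pod-level' or 'infra-level' for a known experiment name."""
    if experiment in POD_LEVEL:
        return "pod-level"
    if experiment in INFRA_LEVEL:
        return "infra-level"
    raise ValueError(f"unknown experiment: {experiment}")


def build_chaos_engine(
    name: str,
    namespace: str,
    app_label: str,
    experiment: str,
    env: Dict[str, Any],
    service_account: str = "litmus-admin",  # assumed account name
) -> Dict[str, Any]:
    """Build a ChaosEngine manifest that runs a single experiment."""
    return {
        "apiVersion": "litmuschaos.io/v1alpha1",
        "kind": "ChaosEngine",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "engineState": "active",
            "appinfo": {
                "appns": namespace,
                "applabel": app_label,
                "appkind": "deployment",
            },
            "chaosServiceAccount": service_account,
            "experiments": [{
                "name": experiment,
                "spec": {"components": {"env": [
                    # Environment values are passed to the experiment as strings.
                    {"name": key, "value": str(value)}
                    for key, value in env.items()
                ]}},
            }],
        },
    }


engine = build_chaos_engine(
    name="nf-network-loss",
    namespace="default",
    app_label="app=sample-nf",  # hypothetical target label
    experiment="pod-network-loss",
    env={"TOTAL_CHAOS_DURATION": 60, "NETWORK_PACKET_LOSS_PERCENTAGE": 10},
)
```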
## Types of Chaos
`Chaos Tasks` are atomic events targeting a very specific narrow action
to be executed on the target cluster.
* Pod Chaos Task - `Stop/Delete` a Pod
* Node Chaos Task - `Stop/Delete` a Node
* Network Chaos Task - `Drop Packets/Delay Packets/Throttle` interface of a Pod or a Node
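
One way to model these task types in the client code is sketched below. The class and action names are hypothetical and only mirror the three bullets above; the POC may represent tasks differently.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, FrozenSet


class ChaosTarget(Enum):
    POD = "pod"
    NODE = "node"
    NETWORK = "network"


# Actions allowed per target, taken from the task list above.
ALLOWED_ACTIONS: Dict[ChaosTarget, FrozenSet[str]] = {
    ChaosTarget.POD: frozenset({"stop", "delete"}),
    ChaosTarget.NODE: frozenset({"stop", "delete"}),
    ChaosTarget.NETWORK: frozenset(
        {"drop-packets", "delay-packets", "throttle"}
    ),
}


@dataclass(frozen=True)
class ChaosTask:
    """A single atomic chaos action against one resource."""

    target: ChaosTarget
    action: str
    resource: str  # name of the Pod, Node, or interface owner

    def __post_init__(self) -> None:
        # Reject combinations such as "throttle" on a Pod chaos task.
        if self.action not in ALLOWED_ACTIONS[self.target]:
            raise ValueError(
                f"action {self.action!r} is not valid for {self.target.value}"
            )


task = ChaosTask(ChaosTarget.NETWORK, "delay-packets", "sample-nf-pod")
```

Validating at construction time keeps invalid combinations from reaching the workflow template.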
## Client Components
### ANT-Testcase
### Internals of an ANT-Testcase
A typical ANT testcase is a combination of three files:
1. Metadata (User Interface and other input parameters definition)
1. Robot (glue)
1. Python (business logic)
```mermaid
graph TB
CLI[CLI exec]--Rest API-->Robot
User--Test Meta-Data-->UI
subgraph ANT TestCase
UI[UI]-->Robot
subgraph Core
Robot--Triggers-->Python
end
end
```
#### User Interface
ANT provides a simple programmatic way to generate custom UI inputs for the testcase.
The UI is rendered from the JSON defined in the metadata file of the TestCase.
Parameters play an important role in data-driven test cases.
All the parameters a test case requires are defined in its metadata file.
#### Robot Script
The testcase/chaos logic implemented in Python is executed through the Robot
Framework. Executing an ANT testcase triggers the Robot script within the ANT engine,
which in turn executes the "Test Cases".
#### Python Script
ANT allows you to customize the business logic within a Python file and execute
it for your testcase.
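
A minimal sketch of such a Python file is shown below, written as a Robot Framework keyword library. Robot Framework exposes the public methods of a library class as keywords, so `start_chaos_workflow` becomes the keyword `Start Chaos Workflow`. The class body, the argument names, the image path, and the dry-run default are assumptions for illustration and are not taken from the POC code.

```python
from typing import Any, Callable, Dict, Optional


class ChaosWorkflowStart:
    """Hypothetical keyword library that Robot can import by class name."""

    # Share one library instance across the whole test run.
    ROBOT_LIBRARY_SCOPE = "GLOBAL"

    def __init__(
        self,
        submit: Optional[Callable[[str, Dict[str, Any]], str]] = None,
    ) -> None:
        # `submit` sends the manifest to the cluster and returns its name.
        # Without one, the library only performs a dry run.
        self._submit = submit or self._dry_run_submit

    @staticmethod
    def _dry_run_submit(namespace: str, manifest: Dict[str, Any]) -> str:
        # Leading underscore: Robot does not expose this as a keyword.
        return manifest["metadata"]["generateName"] + "dry-run"

    def start_chaos_workflow(
        self, namespace: str, experiment: str, duration: str = "60"
    ) -> str:
        """Keyword: Start Chaos Workflow  <namespace>  <experiment>  <duration>."""
        # Robot passes arguments as strings unless told otherwise.
        seconds = int(duration)
        if seconds <= 0:
            raise ValueError("duration must be a positive number of seconds")
        manifest = {
            "apiVersion": "argoproj.io/v1alpha1",
            "kind": "Workflow",
            "metadata": {"generateName": f"ant-chaos-{experiment}-"},
            "spec": {
                "entrypoint": "main",
                "templates": [{
                    "name": "main",
                    "container": {
                        # Placeholder image path, not a real registry.
                        "image": f"example.local/chaos/{experiment}:latest",
                        "args": ["--duration", str(seconds)],
                    },
                }],
            },
        }
        return self._submit(namespace, manifest)


library = ChaosWorkflowStart()
workflow_name = library.start_chaos_workflow("default", "pod-delete", "30")
```

Injecting the `submit` callable keeps the keyword testable without a cluster; a real run would pass a function that calls the Kubernetes API.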
More Reading: [ANT Developer Guide](https://wiki.web.att.com/display/ANT/Automated+Network+Testing)
#### TestCase with Argo Workflow
```
├── Makefile
├── README.rst
├── antchaos
│   ├── ant
│   │   └── test_scripts
│   │       ├── __init__.py
│   │       ├── startchaos    <-- TestCase Name
│   │       │   ├── script_meta_data    <-- User Interface
│   │       │   │   ├── Default_Filter.json
│   │       │   │   ├── UI_Columns.json
│   │       │   │   ├── namespace.json
│   │       │   │   ├── startchaos.json
│   │       │   │   └── vars
│   │       │   │       ├── chaos_test_data.json
│   │       │   │       ├── chaos_test_settings.json
│   │       │   │       ├── k8s_cluster_details.json
│   │       │   │       ├── k8s_details.json
│   │       │   │       └── logId.json
│   │       │   └── test_cases
│   │       │       ├── antchaos.robot    <-- Robot Script
│   │       │       └── lib
│   │       │           └── ChaosWorkflowStart.py    <-- create argo workflow logic
│   │       └── stopchaos    <-- TestCase Name
│   │           ├── script_meta_data    <-- User Interface
│   │           │   ├── Default_Filter.json
│   │           │   ├── namespace.json
│   │           │   ├── stopchaos.json
│   │           │   └── vars
│   │           │       ├── k8s_cluster_details.json
│   │           │       ├── k8s_details.json
│   │           │       └── logId.json
│   │           └── test_cases
│   │               ├── antchaos.robot    <-- Robot Script
│   │               └── lib
│   │                   └── ChaosWorkflowStop.py    <-- create argo workflow logic
│   ├── common    <-- Common Python modules
│   │   └── __init__.py
│   └── tests
│       ├── functional
│       │   └── __init__.py
│       └── units
│           └── __init__.py
├── build
│   ├── startchaos.zip
│   └── stopchaos.zip
├── docs
│   ├── Makefile
│   ├── conf.py
│   ├── history.rst
│   ├── index.rst
│   ├── installation.rst
│   ├── make.bat
│   └── readme.rst
├── pylintrc
├── test-requirements.txt
├── tools
│   ├── README.rst
│   ├── __init__.py
│   ├── builder.py
│   ├── metadata.py
│   └── yapf-with-message.sh
└── tox.ini
```
## Chaos TestCase Flow
A testcase that is part of ANT will use the Kubernetes RESTful APIs for the following:
* Create & Execute Chaos
* Stop Chaos
* Retrieve Logs
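
The three operations above could be sketched with the Kubernetes Python client as follows. The functions take already-constructed API objects (`CustomObjectsApi` and `CoreV1Api`), so nothing here connects to a cluster on its own. The function names are hypothetical. The sketch also assumes that deleting the Workflow custom resource is how chaos is stopped, that Argo labels its pods with `workflows.argoproj.io/workflow=<name>`, and that the task output lives in the `main` container.

```python
from typing import Any, Dict

ARGO_GROUP = "argoproj.io"
ARGO_VERSION = "v1alpha1"
ARGO_PLURAL = "workflows"


def create_chaos(custom_api: Any, namespace: str, workflow: Dict[str, Any]) -> str:
    """Submit an Argo Workflow manifest and return the generated name."""
    created = custom_api.create_namespaced_custom_object(
        group=ARGO_GROUP,
        version=ARGO_VERSION,
        namespace=namespace,
        plural=ARGO_PLURAL,
        body=workflow,
    )
    return created["metadata"]["name"]


def stop_chaos(custom_api: Any, namespace: str, name: str) -> None:
    """Stop chaos by deleting the Workflow custom resource."""
    custom_api.delete_namespaced_custom_object(
        group=ARGO_GROUP,
        version=ARGO_VERSION,
        namespace=namespace,
        plural=ARGO_PLURAL,
        name=name,
    )


def fetch_logs(core_api: Any, namespace: str, name: str) -> Dict[str, str]:
    """Return {pod name: log text} for every pod of the given workflow."""
    pods = core_api.list_namespaced_pod(
        namespace,
        label_selector=f"workflows.argoproj.io/workflow={name}",
    )
    return {
        pod.metadata.name: core_api.read_namespaced_pod_log(
            pod.metadata.name, namespace, container="main"
        )
        for pod in pods.items
    }


# Typical wiring, left commented out because it needs cluster credentials:
# from kubernetes import client, config
# config.load_kube_config()
# custom_api = client.CustomObjectsApi()
# core_api = client.CoreV1Api()
```

Logs should be fetched before `stop_chaos` is called, because deleting the Workflow typically removes its pods as well.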
### Sequence Diagram
```mermaid
sequenceDiagram
participant ATE
participant TestCase
participant k8s
participant Argo Workflow
participant Litmus
rect rgba(0, 0, 255, .1)
note over ATE,Litmus: 1. Authenticate
ATE->>+TestCase: ANT user triggers the Chaos TestCase
TestCase->>+k8s: authenticate
end
par chaos actions
note over ATE,Litmus:2. Trigger Chaos
TestCase->>+k8s: Submit the Argo Workflow to the k8s API
k8s ->>+ Argo Workflow: CR starts the workflow
Argo Workflow ->>+ Litmus: CR Starts the Chaos
Litmus->>-k8s: Creates the Chaos pods
k8s-->>-TestCase: cont
TestCase-->>-ATE: next steps
rect rgba(0, 0, 255, .1)
note over ATE,Litmus:3. Stop Chaos
ATE ->>+ TestCase: ANT user triggers the Stop Chaos TestCase
TestCase ->>+ k8s: Trigger Stop Chaos Action
k8s ->>+Argo Workflow: Fetch logs and Delete CR
k8s->>-TestCase: chaos stopped
TestCase-->>-ATE: next steps
end
end
note over ATE,Litmus:4. Fetch Logs
ATE ->>+ TestCase: fetch test logs
TestCase ->>+ k8s : Fetch logs K8S API
k8s ->>+Argo Workflow: logs
k8s ->>- TestCase: run logs
TestCase->>-ATE: test logs
```
## Initial Use Case : Network Jitter
Network jitter, in this use case, means delaying or dropping data packets on an
interface to simulate real-world congestion. Linux has a tool,
[TC (Traffic Control)](https://man7.org/linux/man-pages/man8/tc.8.html),
to shape the network traffic.
At the container level, there are multiple tools for simulating network jitter:
* [Litmus](https://litmuschaos.io)
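
To make the TC approach concrete, the sketch below builds `tc` command lines that use the `netem` queueing discipline to add delay, delay variation, and packet loss on an interface. It only constructs the argument lists and does not run them. The function names and the `eth0` interface are assumptions, and actually applying the commands requires root privileges (or `NET_ADMIN`) in the target network namespace.

```python
from typing import List, Optional


def netem_add_command(
    interface: str,
    delay_ms: Optional[int] = None,
    jitter_ms: Optional[int] = None,
    loss_percent: Optional[float] = None,
) -> List[str]:
    """Build a `tc qdisc add ... netem` command as an argument list."""
    if delay_ms is None and loss_percent is None:
        raise ValueError("specify at least one of delay_ms or loss_percent")
    if jitter_ms is not None and delay_ms is None:
        raise ValueError("jitter_ms requires delay_ms")

    cmd = ["tc", "qdisc", "add", "dev", interface, "root", "netem"]
    if delay_ms is not None:
        cmd += ["delay", f"{delay_ms}ms"]
        if jitter_ms is not None:
            # The second value after `delay` is the random variation.
            cmd.append(f"{jitter_ms}ms")
    if loss_percent is not None:
        cmd += ["loss", f"{loss_percent}%"]
    return cmd


def netem_clear_command(interface: str) -> List[str]:
    """Build the command that removes the root qdisc again."""
    return ["tc", "qdisc", "del", "dev", interface, "root"]


command = netem_add_command("eth0", delay_ms=100, jitter_ms=20, loss_percent=5)
print(" ".join(command))
# prints: tc qdisc add dev eth0 root netem delay 100ms 20ms loss 5%
```

Returning an argument list instead of a shell string means the result can be passed to `subprocess.run` without shell quoting concerns.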
## References
* [Code Repository](https://gerrit.mtn5.cci.att.com/gitweb?p=nc-airship-tech-evaluation.git;a=tree;f=mobility_chaos/ant_testcases;hb=refs/heads/main) (Note: under active development.)
* [Kubernetes RESTful API](https://github.com/kubernetes-client/python/blob/master/kubernetes/docs/CustomObjectsApi.md#create_namespaced_custom_object)
* [ANT](https://wiki.web.att.com/display/ANT/Automated+Network+Testing)