# Infra testing tool
[ToC]
## Summary
The goal would be to have a simple config file in which we can define which peers we want to instantiate, their specificities and the tests they will run, along with a simple way to launch a test run.
We also need a way to centralize all the logs from these nodes so that we can process them; the processing itself will be done by the Berty team.
There is no need to develop a graphical interface to select the different parameters: the tool will only be used by devs, and nothing beats text files that we can easily copy, modify, version, etc.
Basically, we need to be able to:
- define the type of nodes to instantiate: peer / Berty user, relay, bootstrap, replication and rdvp
- define their connections: which peers they are connected to, a "chaos" parameter for connection instability (random disconnections, unstable bandwidth, etc.), the type of transport used (QUIC, TCP, UDP, WebSocket, etc.)
- define their uptime: cycling on something like 1 min up, 2 min down, etc., or up during the whole test, or a "chaos" random uptime
- define the groups (in the Berty protocol sense of the term) of which these peers are members
- a system to pass custom flags to the node (e.g. by default your tool runs the following command `berty daemon -flag1 -flag2`, and in addition, we can manually pass `-flag3` to a specific node through the config file)
- define their test cases: send one text message every second, two media messages of X MB every two minutes, etc.
Note that for any interaction between your test tool and the nodes (joining a Berty group, choosing a transport type, sending messages, etc.), you will work in close collaboration with the team: if you need anything, either the solution already exists and we explain how to use it, or we implement a new API call, a flag or anything else.
## Logs and metrics
For the logs, you simply have to store all the logs the binary outputs somewhere, organized by test session ID, node ID, etc. (putting everything in a folder on some storage would be enough at first), something like this:
```
Logs
├── session_1337
│   ├── node_1748.log
│   ├── node_7583.log
│   └── node_7832.log
├── session_4242
│   ├── node_3284.log
│   ├── node_4859.log
│   └── node_8493.log
(...)
```
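A minimal sketch of how the tool could compute where each node's log goes, assuming the `session_<id>/node_<id>.log` layout shown above; `log_path` and `open_node_log` are hypothetical helper names, not existing Berty tooling.

```python
from pathlib import Path


def log_path(logs_root: str, session_id: int, node_id: int) -> Path:
    """Return <logs_root>/session_<session_id>/node_<node_id>.log."""
    return Path(logs_root) / f"session_{session_id}" / f"node_{node_id}.log"


def open_node_log(logs_root: str, session_id: int, node_id: int):
    """Create the session folder if needed and open the node log in append mode."""
    path = log_path(logs_root, session_id, node_id)
    path.parent.mkdir(parents=True, exist_ok=True)
    return path.open("a", encoding="utf-8")
```

For example, `log_path("Logs", 1337, 1748)` points at `Logs/session_1337/node_1748.log`. Keeping the session as the top-level folder means one test run can be archived or deleted as a single directory.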
You should also add your "chaos logs" (bandwidth limited to X Kbps, node shut down for X seconds, etc.) either:
- by enriching the logs of each node with this information
- or in separate files
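Whichever option is chosen, chaos actions are easier to correlate with node logs if each one is a structured, timestamped record. A minimal sketch, assuming one JSON object per line (JSON Lines); the `chaos_event` helper and its field names are hypothetical:

```python
import json
import time


def chaos_event(session_id: int, node_id: int, action: str, **details) -> str:
    """Serialize one chaos action as a single JSON line (hypothetical schema)."""
    record = {
        "ts": time.time(),   # Unix timestamp of when the action was applied
        "session": session_id,
        "node": node_id,
        "source": "chaos",   # lets log processing tell chaos entries from node output
        "action": action,    # e.g. "bandwidth_limit", "shutdown"
        **details,           # action-specific parameters
    }
    return json.dumps(record, sort_keys=True)
```

For example, `chaos_event(1337, 1748, "bandwidth_limit", kbps=56)` or `chaos_event(1337, 1748, "shutdown", seconds=30)`. The same line can be appended to the node's own log (first option) or to a dedicated chaos file (second option).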
Metrics are neither essential nor a priority, but it would be a nice bonus to be able to track each node's CPU, RAM, network usage, etc.
## Config file specs
I will describe our needs below, but don't hesitate to give us your opinion if you think we can improve this proposal with something simpler, more flexible and/or more efficient.
The config file should look something like this:
```yaml=
Good_Relay:
  type: relay
  amount: 1
  connections:
    - to: internet
      transport: tcp
      bandwidth: 100Mbps
      reliability: 0,0
Bad_Relays:
  type: relay
  amount: 3
  connections:
    - to: internet
      transport: tcp
      bandwidth: 1Mbps-10Mbps
      reliability: 5,50
Bootstrap:
  type: bootstrap
  amount: 1
  connections:
    - to: internet
      transport: tcp
      bandwidth: 100Mbps
      reliability: 0,0
Rendezvous:
  type: rdvp
  amount: 1
  connections:
    - to: internet
      transport: quic
      bandwidth: 100Mbps
      reliability: 0,0
Replication_Servers:
  type: replication
  amount: 3
  connections:
    - to: internet
      transport: tcp
      bandwidth: 100Mbps
      reliability: 0,0
  groups:
    - name: group_1
    - name: group_2
  routers:
    - type: rdvp
      address: Rendezvous
    - type: bootstrap
      address: Bootstrap
LAN_Peers:
  type: peer
  amount: 20
  flag: -p2p.mdns
  connections:
    - to: lan_1
      transport: tcp
      bandwidth: 100Mbps
      reliability: 0,0
  groups:
    - name: group_1
      tests:
        - type: text
          size: 10KB
          every: 15
        - type: media
          size: 2MB
          every: 120
    - name: group_2
      tests:
        - type: text
          size: 10KB-60KB
          every: 1-30
Bridge_LAN_Internet:
  type: peer
  amount: 1
  connections:
    - to: lan_1
      transport: tcp
      bandwidth: 100Mbps
      reliability: 0,0
    - to: internet
      transport: quic
      bandwidth: 10Mbps
      reliability: 0,0
  groups:
    - name: group_1
    - name: group_2
  routers:
    - type: rdvp
      address: Rendezvous
    - type: rdvp
      address: '/ip4/51.159.21.214/udp/4040/quic/p2p/QmdT7AmhhnbuwvCpa5PH1ySK9HJVB82jr3fo1bxMxBPW6p'
Good_Cellular_Peers:
  type: peer
  amount: 20
  connections:
    - to: internet_no_inbound
      transport: tcp
      bandwidth: 1Mbps-10Mbps
      reliability: 20,200
  groups:
    - name: group_1
      tests:
        - type: text
          size: 10KB
          every: 30
  routers:
    - type: relay
      address: Good_Relay
    - type: rdvp
      address: Rendezvous
Bad_Cellular_Peers:
  type: peer
  amount: 20
  connections:
    - to: internet_no_inbound
      transport: tcp
      bandwidth: 5Kbps-200Kbps
      reliability: 20,3
  groups:
    - name: group_1
      tests:
        - type: text
          size: 10KB
          every: 30
  routers:
    - type: relay
      address: Bad_Relays
    - type: relay
      address: '/ip4/51.159.21.214/udp/4040/quic/p2p/QmdT7AmhhnbuwvCpa5PH1ySK9HJVB82jr3fo1bxMxBPW6p'
    - type: rdvp
      address: Rendezvous
    - type: bootstrap
      address: Bootstrap
```
Each block is a set / a group of peers with an arbitrary name: `Replication_Servers`, `Bad_Cellular_Peers`, etc...
### type
* Mandatory
* Can be one of:
  * `peer`: a standard Berty user
  * `replication`: a server that joins a group and provides high availability to the other members; it can't decrypt messages
  * `relay`: a server providing NAT traversal to other peers using a TURN-like protocol
  * `rdvp`: a rendezvous point, a DNS-like server used to register and retrieve peers related to a namespace (a kind of key-value store)
  * `bootstrap`: an entry point to the Berty network, used to exchange peers during initialization
### amount
* Mandatory
* Must be a positive integer defining the number of nodes to instantiate
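A minimal validation sketch for the two mandatory attributes above, assuming each set has already been loaded from YAML into a Python `dict`; `check_set` and `NODE_TYPES` are hypothetical names, and the missing-attribute message simply mirrors the one in the Workflow transcript.

```python
NODE_TYPES = {"peer", "replication", "relay", "rdvp", "bootstrap"}


def check_set(name: str, attrs: dict) -> list:
    """Return error messages for the mandatory `type` and `amount` attributes of one set."""
    errors = []
    for key in ("type", "amount"):
        if key not in attrs:
            errors.append(f"{name} set is missing '{key}' attribute")
    node_type = attrs.get("type")
    if node_type is not None and node_type not in NODE_TYPES:
        errors.append(f"{name}: unknown type {node_type!r}")
    amount = attrs.get("amount")
    # bool is a subclass of int in Python, so it is rejected explicitly
    if amount is not None and (isinstance(amount, bool) or not isinstance(amount, int) or amount < 1):
        errors.append(f"{name}: amount must be a positive integer, got {amount!r}")
    return errors
```

Collecting all errors before aborting, rather than stopping at the first one, lets the user fix a config in a single pass.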
### flag
* Optional
* An arbitrary string that will be passed as a command-line flag when launching the Berty program
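A minimal sketch of how the `flag` string could be appended to the default command, assuming the placeholder `berty daemon -flag1 -flag2` command from the Summary; `DEFAULT_CMD` and `node_command` are hypothetical, and the real default flags remain to be defined with the team.

```python
import shlex

# Placeholder default command taken from the example in the Summary;
# the actual default flags are still to be agreed on.
DEFAULT_CMD = ["berty", "daemon", "-flag1", "-flag2"]


def node_command(attrs: dict) -> list:
    """Append the set's optional `flag` string to the default daemon command."""
    extra = attrs.get("flag", "")
    # shlex.split handles several flags and keeps quoted values together
    return DEFAULT_CMD + shlex.split(extra)
```

Passing the result as an argument list (e.g. to `subprocess.Popen`) avoids going through a shell, so flag values are never re-interpreted.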
### connections
* Mandatory
* A list of connections in terms of network reachability: PeerA connected to PeerB == PeerA can ping PeerB. There is no need to open any kind of socket; libp2p will handle that part.
#### to (connections)
* Mandatory
* Can be one of:
  * `internet`: connected to and reachable from the internet
  * `internet_no_inbound`: connected to the internet but not reachable from it (no inbound connections allowed)
  * an arbitrary name, e.g. `lan_1`: all sets of peers connected to the same network name will be connected to each other
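A minimal sketch of the reachability rule above, assuming the config is a `dict` of sets. Treating `internet_no_inbound` as the same network as `internet` with inbound connections blocked is our reading of the spec, and `attachments` / `can_dial` are hypothetical names. It only models direct reachability; relayed connections are left to libp2p.

```python
from collections import defaultdict

# Assumption: `internet_no_inbound` attaches to the same network as `internet`,
# but nodes on it cannot accept inbound connections.
NO_INBOUND = "internet_no_inbound"


def attachments(config: dict) -> dict:
    """Map network name -> {set name: accepts_inbound} from the `to` fields."""
    nets = defaultdict(dict)
    for set_name, attrs in config.items():
        for conn in attrs.get("connections", []):
            to = conn["to"]
            net = "internet" if to == NO_INBOUND else to
            inbound = to != NO_INBOUND
            # a set attached twice to one network keeps inbound if any link allows it
            nets[net][set_name] = nets[net].get(set_name, False) or inbound
    return dict(nets)


def can_dial(config: dict, src: str, dst: str) -> bool:
    """True if src and dst share a network and dst accepts inbound connections on it."""
    return any(
        src in members and members.get(dst, False)
        for members in attachments(config).values()
    )
```

With the example config, `Bad_Cellular_Peers` can dial `Good_Relay`, but not the other way around, which is why those peers list relays in their `routers`.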
#### transport (connections)
* Mandatory
* Defines the protocol on which libp2p will listen for a given connection. Can be one of:
  * `tcp`
  * `udp`
  * `quic`
  * `websocket`
  * `p2p-circuit`
#### bandwidth (connections)
* Mandatory
* Can be:
  * a fixed value: `10Kbps`, `1Mbps`, etc.
  * a range within which the chaos script will make the bandwidth oscillate: `10Kbps-10Mbps`
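A minimal parsing sketch for both bandwidth forms, assuming decimal multipliers (1Kbps = 1000 bit/s) and a uniform pick within the range at each chaos step; `parse_rate`, `parse_bandwidth` and `next_bandwidth` are hypothetical names.

```python
import random
import re

# Assumption: decimal multipliers, as is usual for bit rates
RATE_UNITS = {"bps": 1, "kbps": 10 ** 3, "mbps": 10 ** 6, "gbps": 10 ** 9}
_RATE = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*([A-Za-z]+)\s*$")


def parse_rate(text: str) -> float:
    """Convert '10Kbps' or '1.5Mbps' to bits per second."""
    match = _RATE.match(text)
    unit = match.group(2).lower() if match else None
    if unit not in RATE_UNITS:
        raise ValueError(f"invalid bandwidth: {text!r}")
    return float(match.group(1)) * RATE_UNITS[unit]


def parse_bandwidth(text: str) -> tuple:
    """Return (low, high) in bits per second; a fixed value gives low == high."""
    low, sep, high = text.partition("-")
    lo_bps = parse_rate(low)
    hi_bps = parse_rate(high) if sep else lo_bps
    if hi_bps < lo_bps:
        raise ValueError(f"range bounds reversed: {text!r}")
    return lo_bps, hi_bps


def next_bandwidth(bounds: tuple, rng: random.Random) -> float:
    """Pick the next bandwidth for the chaos script, uniformly within the range."""
    return rng.uniform(*bounds)
```

A strict parser like this also catches unit typos such as `100Mpbs`. Uniform sampling is the simplest reading of "oscillate"; a bounded random walk would give smoother variations if that proves more realistic.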
#### reliability (connections)
* Mandatory
* A pair of integers defining a period of time in seconds and a probability of becoming unreachable, e.g. `120,2` == every 120 seconds, 1 chance in 2 of becoming unreachable.
* If set to `0,0`, the connection will never drop.
* If a node is already unreachable, the same rule applies, e.g. `30,4` == every 30 seconds, 1 chance in 4 of remaining unreachable, so 3 chances in 4 of becoming reachable again.
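A minimal simulation sketch of the rule above, assuming a node starts reachable and the draw happens at the end of each period; `parse_reliability` and `simulate` are hypothetical names.

```python
import random


def parse_reliability(text: str) -> tuple:
    """Split 'period,chance' (e.g. '120,2') into two ints."""
    period, chance = (int(part) for part in text.split(","))
    return period, chance


def simulate(text: str, duration: int, rng: random.Random) -> list:
    """Return (time_s, reachable) at each period boundary, starting reachable at t=0."""
    period, chance = parse_reliability(text)
    if chance == 0:
        return [(0, True)]  # '0,0': the connection never drops
    if period <= 0:
        raise ValueError(f"period must be positive: {text!r}")
    states = [(0, True)]
    for t in range(period, duration + 1, period):
        hit = rng.randrange(chance) == 0  # 1 chance in `chance`
        # Reachable node: a hit takes it down. Unreachable node: a hit keeps it down.
        # Either way, the node is unreachable exactly when the draw hits.
        states.append((t, not hit))
    return states
```

Because the same rule applies in both states, each period is an independent draw: the connection is down with probability 1/N. For example, `20,3` keeps a node unreachable about a third of the time, and `20,200` about 0.5% of the time.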
### groups
* Optional (should fail if set for relay, bootstrap or rdvp)
* A list of Berty groups of which the node is a member.
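A minimal sketch of the "should fail" rule, which the `routers` attribute described below shares; `INFRA_TYPES` and `check_membership` are hypothetical names and the set is assumed to be a parsed `dict`.

```python
# Infrastructure node types that must not declare groups or routers
INFRA_TYPES = {"relay", "bootstrap", "rdvp"}


def check_membership(name: str, attrs: dict) -> list:
    """Reject `groups` and `routers` on relay, bootstrap and rdvp sets."""
    errors = []
    if attrs.get("type") in INFRA_TYPES:
        for key in ("groups", "routers"):
            if key in attrs:
                errors.append(f"{name}: '{key}' is not allowed for type {attrs['type']}")
    return errors
```

`replication` sets are deliberately not in `INFRA_TYPES`, since in the example config `Replication_Servers` both joins groups and declares routers.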
#### name (groups)
* Mandatory
* An arbitrary name for the group to join as a member.
#### tests (groups)
* Optional
* A test suite (send messages of given size, every X seconds, etc...) to run on this group.
##### type (groups->tests)
* Mandatory
* Can be one of:
  * `text`: random text message
  * `media`: random media attachment
##### size (groups->tests)
* Mandatory
* A size unit of the message to send, can be:
  * a fixed value: `10KB`, `1MB`, etc.
  * a range within which the chaos script will make the size oscillate: `10KB-200KB`
##### every (groups->tests)
* Mandatory
* An Int defining the interval in seconds for sending a message, e.g: `1`, `15`, etc…
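A minimal sketch turning one `tests` entry into a send schedule over a test run, assuming sizes use binary multipliers (the spec does not say whether 1KB is 1000 or 1024 bytes) and the first message is sent after one full interval; `parse_size` and `schedule` are hypothetical names.

```python
import random
import re

# Assumption: binary multipliers; uppercase 'B' means bytes
SIZE_UNITS = {"B": 1, "KB": 1024, "MB": 1024 ** 2, "GB": 1024 ** 3}
_SIZE = re.compile(r"^\s*(\d+)\s*([A-Za-z]+)\s*$")


def parse_size(text: str) -> int:
    """Convert '10KB' or '2MB' to bytes; lowercase 'b' (bits) is rejected."""
    match = _SIZE.match(text)
    if not match or match.group(2) not in SIZE_UNITS:
        raise ValueError(f"invalid size: {text!r}")
    return int(match.group(1)) * SIZE_UNITS[match.group(2)]


def schedule(test: dict, duration: int, rng: random.Random) -> list:
    """List (send_time_s, type, size_bytes) for one group test over `duration` seconds."""
    low, sep, high = test["size"].partition("-")
    lo_bytes = parse_size(low)
    hi_bytes = parse_size(high) if sep else lo_bytes
    every = int(test["every"])  # the spec defines `every` as an integer number of seconds
    return [
        (t, test["type"], rng.randint(lo_bytes, hi_bytes))
        for t in range(every, duration + 1, every)
    ]
```

For instance, the `media` test of `LAN_Peers` (`2MB` every `120`) run for 600 seconds yields five sends at 120, 240, 360, 480 and 600 seconds.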
### routers
* Optional (should fail if set for relay, bootstrap or rdvp)
* A list of routers (relay, bootstrap and rdvp) to specify in the node config.
#### type (routers)
* Mandatory
* Can be one of:
  * `relay`: will be set as relay in the node config
  * `rdvp`: will be set as rdvp in the node config
  * `bootstrap`: will be set as bootstrap in the node config
#### address (routers)
* Mandatory
* Can be:
  * the name of a set of peers (e.g. in the config file example above: `Bootstrap`, `Rendezvous`)
  * a valid multiaddress (you can easily check this with the [NewMultiaddr function](https://pkg.go.dev/github.com/multiformats/go-multiaddr#NewMultiaddr))
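A minimal sketch distinguishing the two address forms, assuming anything starting with `/` is a multiaddr (full parsing is left to a multiaddr library such as go-multiaddr) and anything else must name a set in the same config. Checking that the referenced set's `type` matches the router `type` is an extra sanity check we assume, not something the spec requires; `resolve_routers` is a hypothetical name.

```python
def resolve_routers(name: str, attrs: dict, config: dict) -> list:
    """Split router addresses into set references and literal multiaddrs.

    Returns (router_type, kind, value) tuples where kind is 'set' or 'multiaddr'.
    Raises ValueError on unknown set names or mismatched router types.
    """
    resolved = []
    for router in attrs.get("routers", []):
        rtype, address = router["type"], router["address"]
        if address.startswith("/"):
            # Only a shape check; full parsing is left to a multiaddr library
            resolved.append((rtype, "multiaddr", address))
        elif address in config:
            ref_type = config[address].get("type")
            if ref_type != rtype:
                raise ValueError(f"{name}: router {address!r} is a {ref_type}, not a {rtype}")
            resolved.append((rtype, "set", address))
        else:
            raise ValueError(f"{name}: unknown router set {address!r}")
    return resolved
```

Resolving set names at load time catches dangling references (such as a misspelled set name) before any node is started.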
## Workflow
As above, this is just to give you an idea of what we have in mind; feel free to propose something more relevant.
```bash
> ssh infra_test@svc.berty.io
Welcome to Ubuntu 20.04.1 LTS (GNU/Linux 5.4.0-48-generic x86_64)
* Documentation: https://help.ubuntu.com
* Management: https://landscape.canonical.com
* Support: https://ubuntu.com/advantage
Last login: Thu Apr 15 07:35:35 2021 from 80.127.83.12
> ls logs
session_1337 session_4242
> head -10 tests/test1.yaml
Bridge_LAN_Internet:
  type: peer
  connections:
    - to: lan_1
      transport: tcp
      bandwidth: 100Mbps
      reliability: 0,0
    - to: internet
      transport: quic
      bandwidth: 10Mbps
> ./infra_testing_run tests/test1.yaml
error: you must specify a duration
usage: ./infra_testing_run <config_file> <duration>
> ./infra_testing_run tests/test1.yaml 3600
error: Bridge_LAN_Internet set is missing 'amount' attribute
> sed -i '3i\  amount: 10' tests/test1.yaml
> head -10 tests/test1.yaml
Bridge_LAN_Internet:
  type: peer
  amount: 10 # Added using sed command above
  connections:
    - to: lan_1
      transport: tcp
      bandwidth: 100Mbps
      reliability: 0,0
    - to: internet
      transport: quic
> ./infra_testing_run tests/test1.yaml 3600
session_1234 started:
- instantiating sets of nodes: done
- setting up the network: done
- joining Berty groups: done
- running test cases: done
- collecting logs: done
- shutting down everything: done
> ls logs
session_1234 session_1337 session_4242
> tree logs/session_1234
session_1234
├── config_used # Copy of tests/test1.yaml
├── node_1748.log
├── node_7583.log
└── node_7832.log
```
## Ideas v2
1. Add new features to the config file:
   - Replication servers are added by peers instead of joining groups by themselves
   - Peers can send/auto-accept contact requests
   - Tests can be launched on MultiMember groups and Contact groups
2. Add other limitations / chaos properties to connections, though that would require an OS-level tool (rather than the AWS infrastructure management tool), such as:
   - ping / connection delay
   - percentage of dropped packets