Test Plan for Codex
===================
How do we decide that our code is stable enough for a public testnet release?
### Two-client tests with Marketplace - small static network ($5$ nodes).
**Goal.**
* check that the basic purchase/store/download cycle works;
* since we are using paid storage, we can **enable block maintenance** and see that it does not break anything.
**Parameters.**
* file size $s \in \left[10, 5\,000\right]$ megabytes.
**Flow.**
1. start up a $5$-node Codex network with $5$ validator/prover nodes and $1$ Geth PoA node;
2. **test loop.** Run sequential rounds of:
**2a.** select a node $c_1$ at random, buy storage from it, and upload a file of size up to $s$. We can use a predefined file size (easier to debug, covers fewer scenarios) or randomize it (harder to debug, covers more scenarios);
**2b.** select a second node $c_2$ at random and have it download the file. Check that the download succeeds and that the contents match what was uploaded;
**2c.** use short-duration purchase contracts so that they expire and we can check that the data gets dropped, if possible.
3. fire requests with insufficient funds every once in a while and make sure they fail.
**Duration.** Either time-bound, or until storage nodes stop picking up requests because their disks are full (can happen if there's more money than storage in the network).
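As a rough sketch, one round of the test loop (steps 2a--2b) could look like the following, with the Codex/Marketplace interactions stubbed out by an in-memory dictionary. All names here are illustrative, not the actual Codex API:

```python
import hashlib
import random

def run_round(nodes, network, max_size_mb=5_000):
    """One round of the two-client loop: upload via c1, download via c2, verify."""
    c1, c2 = random.sample(nodes, 2)           # two distinct random nodes
    size = random.randint(10, max_size_mb)     # randomized size (stand-in: 1 byte per MB)
    payload = random.randbytes(size)
    cid = hashlib.sha256(payload).hexdigest()  # content id of the upload
    network[cid] = payload                     # stub for "buy storage + upload via c1"
    downloaded = network[cid]                  # stub for "download via c2"
    assert hashlib.sha256(downloaded).hexdigest() == cid, "content mismatch"
    return cid

nodes = [f"codex-{i}" for i in range(5)]
network = {}
cid = run_round(nodes, network)
```

In a real run the dictionary operations would be replaced by actual purchase, upload, and download calls against the deployed nodes.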
### Multi-client tests with Marketplace - mid-sized static network ($10$--$50$ nodes).
**Goal.**
* check that the network remains functional as we add more nodes (scales as expected);
* check that the distribution of slots remains close to uniform under static conditions.
**Parameters.**
* network size $10 \leq n \leq 50$;
* number of Geth nodes $k$ (it is unclear how this should scale with $n$);
* file size $s \in \left[10, 5\,000\right]$ megabytes.
**Flow.**
1. start $n \in [10,50]$ Codex nodes and $k$ Geth nodes;
2. run $\alpha$ phased (randomly staggered) instances of the [two-client test](#Two-client-tests-with-Marketplace---small-static-network-5-nodes) loop, where $\alpha = n/5$.
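A minimal sketch of how the staggered instances could be scheduled, assuming a hypothetical loop period and a fixed seed for reproducibility:

```python
import random

def staggered_starts(n, loop_period_s=600.0, nodes_per_loop=5, seed=42):
    """Compute alpha = n / 5 concurrent two-client loops and give each one a
    random start offset within one loop period, so rounds are phased rather
    than running in lockstep."""
    alpha = n // nodes_per_loop
    rng = random.Random(seed)
    return sorted(rng.uniform(0.0, loop_period_s) for _ in range(alpha))

offsets = staggered_starts(50)  # 10 loop instances for a 50-node network
```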
**Observe.**
* that performance remains roughly the same, on average;
* that storage nodes are getting fair slot allocation.
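One way to quantify "fair slot allocation" is a simple skew metric, sketched below (a chi-squared goodness-of-fit test would be a more rigorous alternative). This helper is an illustration, not part of any Codex tooling:

```python
from collections import Counter

def allocation_skew(slot_hosts):
    """slot_hosts lists, for each filled slot, the node hosting it. The skew
    is busiest-node count over least-busy-node count; values near 1.0 mean a
    near-uniform distribution. Nodes holding zero slots need separate handling."""
    counts = Counter(slot_hosts)
    return max(counts.values()) / min(counts.values())
```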
### Multi-client tests with Marketplace - mid-sized dynamic network ($10$--$50$ nodes).
**Goal.**
* check that the network (e.g., DHT integrity) remains functional as nodes are added and removed, in particular downloaders, as those are likely to be the most dynamic.
**Parameters.**
* network size $10 \leq n \leq 50$;
* number of Geth nodes $k$;
* file size range $S = \left[10, 5\,000\right]$ megabytes;
* storage client ratio $r_s$ and non-storage ratio $r_d$ so that:
* $r_s + r_d = 1$;
* the number of storage nodes is $n_s = \lfloor r_s \cdot n \rfloor$;
* the number of non-storage nodes is $n_d \leq \lceil r_d\cdot n \rceil$.
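The parameter arithmetic above can be sketched directly (the values for $n$ and $r_s$ below are illustrative):

```python
import math

def node_split(n, r_s):
    """Split an n-node network per the parameter list: n_s = floor(r_s * n)
    storage nodes, and at most ceil(r_d * n) non-storage nodes, r_d = 1 - r_s."""
    r_d = 1.0 - r_s
    n_s = math.floor(r_s * n)
    n_d_max = math.ceil(r_d * n)
    return n_s, n_d_max
```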
**Flow.**
1. start $n_s$ Codex storage nodes and $k$ Geth nodes;
2. **test loop 1.** At random times, upload a file of random size $s \in S$ to a storage node.
3. **test loop 2.** Run multiple instances of the following loop, where, at random (Poisson?) time intervals:
**3a.** a downloader $c_i$ joins the network and downloads a random file;
**3b.** $c_i$ sticks around for some amount of time $t_i$;
**3c.** $c_i$ leaves the network. We can use either a hard rule (the downloader leaves as soon as it is done downloading; i.e., $t_i = 0$) or a soft rule (probabilistic session length after the download). Either way, we try to enforce that the number of downloaders at any given time is $\leq n_d$ (i.e., the inbound and outbound processes must balance out so that at most $n_d$ downloaders are active);
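Assuming Poisson arrivals and the soft rule, the downloader churn schedule could be generated ahead of time. The sketch below uses exponential inter-arrival gaps and exponential session lengths; all parameter names are made up:

```python
import random

def downloader_sessions(arrival_rate, mean_session_s, horizon_s, n_d, seed=7):
    """Generate (join, leave) times for downloader sessions: Poisson arrivals
    (exponential inter-arrival gaps) and exponentially distributed session
    lengths; arrivals that would exceed n_d concurrent downloaders are
    rejected, enforcing the bound."""
    rng = random.Random(seed)
    sessions, t = [], 0.0
    while True:
        t += rng.expovariate(arrival_rate)      # next Poisson arrival
        if t >= horizon_s:
            return sessions
        active = sum(1 for j, l in sessions if j <= t < l)
        if active < n_d:                        # enforce the <= n_d bound
            sessions.append((t, t + rng.expovariate(1.0 / mean_session_s)))

sessions = downloader_sessions(0.5, 60.0, 600.0, 10)
```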
### Popular file test - mid-sized static network ($10$--$50$ nodes).
**Goal.**
* check that the network works when there are multiple downloaders for a file;
* understand what happens when a file gets popular.
**Parameters.**
* network size $10 \leq n \leq 50$;
* number of Geth nodes $k$;
* file size range $S = \left[10, 5\,000\right]$ megabytes;
* storage client ratio $r_s$ and non-storage ratio $r_d$ so that:
* $r_s + r_d = 1$;
* the number of storage nodes is $n_s = \lfloor r_s \cdot n \rfloor$;
* the number of non-storage nodes is $n_d \leq \lceil r_d\cdot n \rceil$.
**Flow.**
1. start $n_s$ storage nodes and $k$ Geth nodes;
2. **test loop 1.** At random times, upload a file $f_i$ of random size $s \in S$ to a random node.
3. **test loop 2.** run several (possibly overlapping) rounds of:
**3a.** select one of the available files $f_j$ at random;
**3b.** create a download crowd of (random?) size $s_d^{(j)}$ for $f_j$, such that $\sum_{j} s_d^{(j)} \leq n_d$;
**3c.** check that everyone gets $f_j$ (i.e., the network doesn't break down with bigger crowds).
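One possible way to draw crowd sizes while respecting the total downloader budget $n_d$, sketched here with a shrinking-budget allocation (illustrative only):

```python
import random

def crowd_sizes(n_files, n_d, seed=3):
    """Draw a download-crowd size for each selected file while keeping the
    total number of active downloaders within the n_d budget; once the
    budget is spent, remaining files get a crowd of zero."""
    rng = random.Random(seed)
    sizes, remaining = [], n_d
    for _ in range(n_files):
        s = rng.randint(1, remaining) if remaining > 0 else 0
        sizes.append(s)
        remaining -= s
    return sizes

sizes = crowd_sizes(5, 20)
```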
**Observe.**
* measure performance as a function of download crowd size and analyse scalability.