# Tannhauser - Nix Hydra for Web3
## Pitch
**Nix Hydra for Web3**
## Motivation
> Who overcome by force, hath overcome but half his foe.
> ― John Milton, *Paradise Lost*
- A statistically rigorous runner for web3 clients and protocols.
- High assurance and confidence in testing results.
- Reduce the feedback-loop time for getting actionable results.
- Make multichain management and contract upgrades reliable, robust, and standardized.
> Support [`jj`](https://github.com/martinvonz/jj) or [`JOSH`](https://github.com/josh-project/josh).
## Problem Statement
If you’re dealing with multiple networks (blockchains) with potentially different EVM/service models and multiple infrastructure providers, you have to maintain integrations between N execution/service models and M infrastructure providers.
This integration includes both generating each network's specific configuration and deploying it, a process which can vary significantly across networks.
> For the purposes of this document, let us assume key management is a solved problem.
- **Infrastructure**: configurations and APIs are heterogeneous, making them hard to connect together. For example, different tools have different configuration languages, levels of abstraction, storage and push mechanisms, etc. As a result, infrastructure is inconsistent, and establishing common workflow management is difficult.
- **Production**: upgrading contracts and change management is a brittle process, with little understanding of the interactions between changes. The issue is further complicated by the fact that protocols may implement different upgrade patterns, each with its own nuances.
> Contract upgrade patterns vary by protocol implementation. They may even differ on which chain the relevant contracts are on. This sort of knowledge should be retained by the system and not by the engineer. It makes the protocol's viability less reliant on specific team members doing specific tasks.
Additionally, we may also care about bundle size, build time, lines of code, number of dependencies, and so on.
- How do you measure those KPIs?
- How do you track them over time?
- How do you compare them across branches or even forks?
Keep track of the following, and much more:
- bundle size
- build time
- lines of code
- number of dependencies
- docker image size
- benchmarks
- code coverage
- puppeteer metrics (Performance, Accessibility, Best Practices, SEO, Progressive Web App)
- static analysis
- quality metrics
> Every number that can be measured, should (potentially) be measured and recorded.
## Native DVCS Support
Jujutsu is a [Git-compatible](https://github.com/martinvonz/jj/blob/main/docs/git-compatibility.md) [DVCS](https://en.wikipedia.org/wiki/Distributed_version_control). It combines features from Git (data model, [speed](https://github.com/martinvonz/jj/discussions/49)), Mercurial (anonymous branching, simple CLI [free from "the index"](https://github.com/martinvonz/jj/blob/main/docs/git-comparison.md#the-index), [revsets](https://github.com/martinvonz/jj/blob/main/docs/revsets.md), powerful history-rewriting), and Pijul/Darcs ([first-class conflicts](https://github.com/martinvonz/jj/blob/main/docs/conflicts.md)), with features not found in most of them ([working-copy-as-a-commit](https://github.com/martinvonz/jj/blob/main/docs/working-copy.md), [undo functionality](https://github.com/martinvonz/jj/blob/main/docs/operation-log.md), automatic rebase, [safe replication via `rsync`, Dropbox, or distributed file system](https://github.com/martinvonz/jj/blob/main/docs/technical/concurrency.md)).
> I mention this because at Manifold Finance we have spent as much as 40% of engineering time maintaining custom forks of Ethereum clients for our use cases. Enabling easier code conflict resolution will reduce overhead on technical debt incurred and more importantly reduce time spent on non-productive tasks.
### `operation log` enables concurrent operations
One benefit of the operation log (and the reason for its creation) is that it allows lock-free concurrency: you can run concurrent `jj` commands without corrupting the repo, even if you run the commands on different machines that access the repo via a distributed file system (as long as the file system guarantees that a write is only visible once previous writes are visible).
## Web3 Considerations
In the current chain-oriented management workflow model, state exists only on the chain. That is to say, your protocol is running on `version X` because that is the latest deployment you have made.
With upgradable contracts this becomes a real problem: you may not be able to correctly upgrade your protocol, given the unpredictable nature of network activity, etc.
### Conceptual Model
#### Declarative management with per-step Finite State Machine actors
##### Declarative management
Declarative management lets us ensure that, after an upgrade, the protocol ends up in exactly the state we intended. We define the *intent* of the deployment, and the correct production state for the protocol is derived from that intent.
In a workflow-oriented production management model, a large part of the state of production exists only in production. For example, your frontend runs version X because a few days ago, you started a rollout of this specific version.
In contrast, declarative production means that you write the intent of your production state—that your production is supposed to run version X—in a configuration file or a database.
The production state is now derived from this intent. Paired with continuous enforcement, this ensures that production matches what people expect.
This intent-based actuation uses an enforcement (i.e., constraint) system that treats production assets as homogeneous.
> To phrase the problem in the “pet vs. cattle” metaphor: previously, developers treated workflows like pets, keeping a running inventory of individual workflows, hand-tuning them to their relevant network deployments, and interacting with them individually.
>
> Intent-based actuation instead uses an enforcement (i.e., constraint) system that treats production assets as cattle: special cases become rare, and scaling becomes much easier.
> *Prodspec and Annealing: Intent-based actuation at Google*[^1]
These are the basics of the proposed solution: a continuous testing and constraint enforcement system.
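As a minimal sketch of this idea (the type and function names below are hypothetical, not an existing API), the intent can be captured as plain data and continuously reconciled against the observed on-chain state:

```typescript
// Hypothetical sketch: declarative intent vs. observed state, with continuous enforcement.

interface DeploymentIntent {
  chainId: number;   // network the contract lives on
  contract: string;  // logical contract name, e.g. "Vault"
  version: string;   // version the protocol *should* be running
}

interface ObservedState {
  chainId: number;
  contract: string;
  version: string;   // version actually deployed on-chain
}

// The enforcement loop: derive actions from the diff between intent and reality.
function reconcile(intent: DeploymentIntent[], observed: ObservedState[]): string[] {
  const actions: string[] = [];
  for (const want of intent) {
    const have = observed.find(
      (o) => o.chainId === want.chainId && o.contract === want.contract
    );
    if (!have) {
      actions.push(`deploy ${want.contract}@${want.version} on chain ${want.chainId}`);
    } else if (have.version !== want.version) {
      actions.push(
        `upgrade ${want.contract} on chain ${want.chainId}: ${have.version} -> ${want.version}`
      );
    }
  }
  return actions; // empty list means production matches the declared intent
}
```

The configuration file (or database row) is the source of truth; the enforcement loop only computes and applies the delta.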
##### Per Step Finite State Machine Actors
Using a `DAG` to represent a workflow has limitations, including:
1. It is hard to express different transition rules for different steps.
2. Transition rules apply uniformly to all steps.
3. Defining special rules for specific types of steps is difficult to implement.
4. End-state determination is difficult. The end state is the state in which the workflow can no longer run. This can happen if the various steps end up in a state where there is no way to move them forward in the DAG. Temporal logic further complicates this in the multichain context.
Instead, each step is a state machine with its own states, actions and transition/constraint rules.
A workflow is achieved by guarding actions on state machines with pre-conditions, which state the states _other_ machines need to be in. These other machines can be different network clients, thus providing an abstracted view of that network.
An action can be performed only when the other machines are in the states specified in its pre-conditions. With this, one machine can run only if its predecessors have run successfully, thereby achieving a workflow. However, since such pre-conditions can be specified for _any_ state transition, arbitrary steps can be triggered by transitions (e.g., if a step fails, a clean-up step can be run). Such flexibility would be more complicated to achieve with a `DAG`. A minimal sketch follows the note below.
> End state determination is always deterministic: the end state is reached when every machine has no possible actions, i.e., no state transitions are available. This simplifies workflow monitoring and management.
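A minimal sketch of a per-step machine with pre-conditions and deterministic end-state detection (all names are illustrative, not a specification):

```typescript
// Illustrative per-step state machine with pre-conditions on other machines.

type State = "pending" | "running" | "succeeded" | "failed" | "cleaned";

interface Action {
  from: State;
  to: State;
  // Pre-conditions: states that *other* machines must currently be in.
  requires?: Record<string, State>;
}

interface Machine {
  name: string;
  state: State;
  actions: Action[];
}

// An action is enabled if the machine is in `from` and every pre-condition holds.
function enabledActions(m: Machine, all: Map<string, Machine>): Action[] {
  return m.actions.filter((a) => {
    if (m.state !== a.from) return false;
    const requires: Record<string, State> = a.requires ?? {};
    for (const [other, needed] of Object.entries(requires)) {
      if (all.get(other)?.state !== needed) return false;
    }
    return true;
  });
}

// End state: no machine has any enabled action left (deterministic to check).
function isEndState(all: Map<string, Machine>): boolean {
  for (const m of all.values()) {
    if (enabledActions(m, all).length > 0) return false;
  }
  return true;
}
```

For example, a deploy machine could require `{ compile: "succeeded" }` before it starts, while a clean-up machine requires `{ deploy: "failed" }`; both are expressed as pre-conditions rather than DAG edges.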
## Must Haves
- It MUST use repeated sampling and statistics to reliably identify even tiny differences in runtime (a sketch of the statistical approach is given at the end of this section).
- MUST support a unified way to identify data handled within the system.
- MUST support a unified way to define state that the system is interacting with.
Both of these requirements can be handled by a flavor of UTI, since a UTI can encapsulate any kind of type information.
- MUST support DVCS
- MUST implement a REST API exposing:
  - a diagnostic endpoint
  - telemetry
  - user usage
  - 'bot' usage
- MUST support different backends for persistence (S3, IPFS?).
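As a sketch of the first requirement, repeated sampling plus a bootstrapped confidence interval can separate small runtime differences from noise; the iteration count and the 95% interval below are arbitrary choices, not a fixed part of the design:

```typescript
// Sketch: bootstrap confidence interval for the difference in mean runtime
// between a baseline and a candidate. Only flag a regression when the whole
// interval lies on one side of zero.

function mean(xs: number[]): number {
  return xs.reduce((a, b) => a + b, 0) / xs.length;
}

function resample(xs: number[]): number[] {
  return xs.map(() => xs[Math.floor(Math.random() * xs.length)]);
}

function bootstrapDiff(
  baseline: number[],
  candidate: number[],
  iterations = 10_000
): { lo: number; hi: number } {
  const diffs: number[] = [];
  for (let i = 0; i < iterations; i++) {
    diffs.push(mean(resample(candidate)) - mean(resample(baseline)));
  }
  diffs.sort((a, b) => a - b);
  return {
    lo: diffs[Math.floor(iterations * 0.025)], // 2.5th percentile
    hi: diffs[Math.floor(iterations * 0.975)], // 97.5th percentile
  };
}

// Usage (synthetic sample values): a regression is reported only if the
// 95% interval excludes zero.
const { lo, hi } = bootstrapDiff([101, 99, 100, 102], [104, 103, 105, 104]);
const regression = lo > 0; // candidate is measurably slower than baseline
```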
### Preliminary Feature Ideas
- Web socket supported build instrumentation
- Operation log ([from `jj`; see their docs](https://github.com/martinvonz/jj/blob/main/docs/operation-log.md#concurrent-operations))
- Time Series metrics for reporting and identifying performance regressions
- A custom 'bot' that can be added in GitHub/Gerrit etc, to perform tasks etc.
- Ability to save testing corpus output to persistence layer.
- Ability to support incremental compiling
- Ability to support scoped pipeline building
- Ability to define a 'Campaign': a long-running, isolated workflow.
- Ability to support Monorepos
- Prefixed API Key
- Presigned URLs for sharing assets or reports
- Real-time status of the entire workflow or any part of it (imagine long-running step functions or a workflow that has many child workflows)
- In-memory Postgres Database for accumulating corpus results for improving coverage guidance / tooling
- Push notifications on build failures (opt-in).
- On build failure, attempt a `git bisect` back to the last known passing build and report the findings in the build error output summary?
- Per instance metadata?
- Custom Client support
## Features
### Kernel-compiled custom testing OS
A purpose-compiled Linux distro for enhancing the testing environment?
Kernel flags enabled to enhance `strace`, `perf`, and similar tooling.
### Preinstalled
- `evmone`
- `yul-tester`
- `solc` compiled from source
  - `solc` with certain flags enabled
### @janitor-bot
Users with [commit access](#) can trigger pull request testing by writing a comment on a PR addressed to the GitHub user `@janitor-bot`.
- Different tests will run depending on the specific comment used.
The current proposed test types are:
#### General Testing Types
1. Smoke Testing
2. Validation Testing
3. Benchmarking
4. Linting
5. Source Compatibility Testing
6. Specific Preset / ENV Loading
7. Testing Performance
8. Saturation Testing
9. Mutation Testing
10. Tracking of False Positives (user defined)
11. Tracking of False Negatives (user defined)
#### Web3 Testing Types
1. Agent-based Simulation Testing
2. Historical Block Replay Testing
3. Multi-chain Testing (this is more of a liveness test)
4. EVM Equivalence Checks
5. Mainnet Forking and Fast Forwarding
6. Execution checks using multiple clients and different versions of those clients
7. Dependency Congruence checks
8. Post-Deployment Validation of Bytecode
9. Use `eth_sendBundle` to send multiple transactions if necessary for deployment
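As a sketch of how `@janitor-bot` could dispatch these test types from a PR comment (the command strings and groupings below are hypothetical, not a final syntax):

```typescript
// Hypothetical comment-to-test dispatch for @janitor-bot.

const COMMANDS: Record<string, string[]> = {
  "@janitor-bot smoke": ["Smoke Testing", "Linting"],
  "@janitor-bot bench": ["Benchmarking", "Testing Performance"],
  "@janitor-bot replay": ["Historical Block Replay Testing", "Mainnet Forking and Fast Forwarding"],
  "@janitor-bot equivalence": ["EVM Equivalence Checks", "Execution checks using multiple clients"],
};

// Returns the test suites to run for a PR comment, or an empty list if the
// comment is not addressed to the bot. (The commit-access check is assumed
// to happen elsewhere.)
function testsForComment(body: string): string[] {
  const line = body.trim().toLowerCase();
  for (const [command, suites] of Object.entries(COMMANDS)) {
    if (line.startsWith(command)) return suites;
  }
  return [];
}
```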
---
## Work In Progress Drafting
> **Warning**
> The content below is not finalized and may be irrelevant.
---
### Contract Upgrade Management using Revsets[^2]
#### Revset Operators
The following operators are supported. `x` and `y` below can be any revset, not
only symbols.
* `x & y`: Revisions that are in both `x` and `y`.
* `x | y`: Revisions that are in either `x` or `y` (or both).
* `x ~ y`: Revisions that are in `x` but not in `y`.
* `~x`: Revisions that are not in `x`.
* `x-`: Parents of `x`.
* `x+`: Children of `x`.
* `:x`: Ancestors of `x`, including the commits in `x` itself.
* `x:`: Descendants of `x`, including the commits in `x` itself.
* `x:y`: Descendants of `x` that are also ancestors of `y`. Equivalent
to `x: & :y`. This is what `git log` calls `--ancestry-path x..y`.
* `::x`, `x::`, and `x::y`: New versions of `:x`, `x:`, and `x:y` to be
  released in jj 0.9.0. The older forms are planned for removal in jj 0.15+.
* `x..y`: Ancestors of `y` that are not also ancestors of `x`. Equivalent to
`:y ~ :x`. This is what `git log` calls `x..y` (i.e. the same as we call it).
* `..x`: Ancestors of `x`, including the commits in `x` itself. Equivalent to
`:x` and provided for consistency.
* `x..`: Revisions that are not ancestors of `x`.
You can use parentheses to control evaluation order, such as `(x & y) | z` or
`x & (y | z)`.
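Because the operators compose textually, upgrade-related queries can be assembled as plain strings and passed to `jj log -r`. The bookmark names below (`deployed/mainnet`, `audit/v2`) are placeholders used purely for illustration:

```typescript
// Sketch: building revset expressions for contract upgrade management.
// The resulting strings would be passed to e.g. `jj log -r <expr>`.

// Everything merged since the last mainnet deployment (the candidate upgrade set):
// ancestors of main that are not ancestors of the deploy bookmark.
const sinceLastDeploy = "deployed/mainnet..main";

// Changes that are in the audited revision but not yet deployed.
const auditedNotDeployed = ":audit/v2 ~ :deployed/mainnet";

// The path from the deploy bookmark to the audit bookmark:
// descendants of the former that are also ancestors of the latter.
const deployToAuditPath = "deployed/mainnet:audit/v2";
```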
### Janitor Commands
> Commands are organized in `do` blocks. As in: `do` this.
#### do Verbs
- “Add” means “add if absent, do nothing if present” (if a uniquing collection).
- “Replace” means “replace if present, do nothing if absent.”
- “Set” means “add if absent, replace if present.”
- “Remove” means “remove if present, do nothing if absent.”
#### Operators
Operators for partial state updates in `do` blocks:
| **Symbol**  | **Operator Action**          |
|-------------|------------------------------|
| `:=`        | Replace                      |
| `+=` / `-=` | Update arithmetically        |
| `%=`        | Update according to function |
| `?=`        | Insert into map              |
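A sketch of how these operators could be applied to a key-value state object, assuming `?=` follows the “Add” verb semantics above (the function names are illustrative):

```typescript
// Illustrative application of the `do` block operators to a state map.

type StateMap = Record<string, number>;

// `:=` replace if present, do nothing if absent
function replaceOp(state: StateMap, key: string, value: number): void {
  if (key in state) state[key] = value;
}

// `+=` / `-=` update arithmetically
function addOp(state: StateMap, key: string, delta: number): void {
  if (key in state) state[key] += delta;
}

// `%=` update according to a function
function applyOp(state: StateMap, key: string, fn: (v: number) => number): void {
  if (key in state) state[key] = fn(state[key]);
}

// `?=` insert into the map only if absent ("Add" semantics)
function insertOp(state: StateMap, key: string, value: number): void {
  if (!(key in state)) state[key] = value;
}
```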
## Artifact, Distribution and Persistence
> **WIP**
> We use the notion of `UTI` to help define state as well as to organize build output.
## UTI
Uniform type identifiers (UTIs) provide a unified way to identify data handled within the system.
UTIs can encapsulate _any_ kind of type information, not just the obvious ones like file extensions. You can define your own type information.
Conformance information allows programs using UTIs to determine that (for instance) they know how to handle data with a UTI they’ve _never even seen before_ because it conforms to a UTI they do understand.
It’s possible to generate dynamic UTIs that will remember the information with which they were created even when passed to another system.
`base32` Encoding:
```
abcdefghkmnpqrstuvwxyz0123456789
```
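This alphabet has 32 characters and drops the visually ambiguous `i`, `j`, `l`, and `o`. A sketch of an encoder over it (the artifact-identifier use at the end is a made-up example):

```typescript
// Sketch: base32 encoding with the alphabet above. Encodes raw bytes 5 bits at a time.

const ALPHABET = "abcdefghkmnpqrstuvwxyz0123456789";

function base32Encode(bytes: Uint8Array): string {
  let out = "";
  let buffer = 0;
  let bits = 0;
  for (const byte of bytes) {
    buffer = (buffer << 8) | byte;
    bits += 8;
    while (bits >= 5) {
      out += ALPHABET[(buffer >> (bits - 5)) & 0x1f];
      bits -= 5;
    }
  }
  // Pad the final partial group with trailing zero bits.
  if (bits > 0) out += ALPHABET[(buffer << (5 - bits)) & 0x1f];
  return out;
}

// Hypothetical use: a compact, human-safe identifier for a build artifact.
const artifactId = base32Encode(new TextEncoder().encode("public.evm-bytecode"));
```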
## TODO
- Web2 vs. Web3 pipelines
- Key management / secret management (SOPS)
## Citations
UTI: [“UTIs Are Better than You Think (and Here's Why)”](https://alastairs-place.net/blog/2012/06/06/utis-are-better-than-you-think-and-heres-why/).
[^1]: USENIX. “Prodspec and Annealing,” December 15, 2021. [https://www.usenix.org/publications/loginonline/prodspec-and-annealing-intent-based-actuation-google-production](https://www.usenix.org/publications/loginonline/prodspec-and-annealing-intent-based-actuation-google-production).
[^2]: GitHub. “Jj/Docs/Revsets.Md at Main · Martinvonz/Jj.” Accessed August 12, 2023. [https://github.com/martinvonz/jj/blob/main/docs/revsets.md](https://github.com/martinvonz/jj/blob/main/docs/revsets.md).