# Swarm Testing strategies by Janos Guljas

This document describes my views and ideas on the subject of Swarm network testing. The testing approach should be much the same as planned when the Bee project was started, plus the additional complexity that incentivization requires.

## Testing methods

From the runtime aspect of testing, four methods can be recognized:

- Unit testing
- Integration testing
- Performance testing
- Manual testing

All methods except manual testing should be automated. In other words, they should be executable on demand or in a CI/CD pipeline without any manual configuration.

### Unit testing

Unit testing requires only the Go toolchain to run. Its scope is testing the individual components implemented in the bee project. No additional infrastructure should be required for all tests to pass.

The responsibility for unit tests lies completely with the implementor. Any requirement to disable or remove an existing test case must be documented in the form of an issue that needs to be solved.

### Integration testing

Integration testing tests the actual product: a Swarm network of bee nodes running on real infrastructure with all additional runtime dependencies. The most notable runtime dependency is the blockchain, with all required contracts set up, including deployments for every node.

Integration testing can be performed on a:

- Static network
- Dynamic network

In a static network, nodes do not start or stop while the test is running. In a dynamic network, nodes are started and stopped while the test is running, mostly to test connectivity, data retention and resilience.

The scope of integration testing is the interaction between bee nodes and smart contracts. The nodes are treated as "black" boxes of which only the exposed API is known.

It is essential to treat every integration testing environment as a completely isolated environment.
This prevents false positive or negative results caused by interference from outside components. An example of a poorly isolated environment is one that uses a public blockchain network and existing contracts for testing.

### Performance testing

To track improvements or degradation in network performance, additional tests must be performed. These tests produce metrics that can be compared with previous executions. Two types of performance testing can be identified:

- Standard performance testing
- Continuous performance testing

Standard tests should have a defined lifetime, limited by the tested behavior and by time. They should end with a finite set of data as their result.

Continuous tests execute the desired actions until they are stopped manually. They can potentially run for a very long time while constantly submitting or providing measurements. An example of such a test is a "smoke test" on data upload and download.

All of these tests could use the integration tooling, and beekeeper has the ability to run continuous tests.

### Manual testing

Manual testing is required mostly during development. It should result in a specification for integration tests, based on observations of which behaviors need to be tested and are reasonable to implement.

Ideally, no manual testing should be required to validate the correctness of the implementation before a release. All required integration tests should be implemented before the release.

Experimental features may need to be released. Releasing a feature that has only been manually tested should be discussed, approved and documented, because it creates an additional burden for future releases.

Manual tests are subjective. Ideally, a dedicated QA engineer or team would lead the validation effort and contribute in a crucial manner to the specifications for integration tests.

## Behavior tests

Both unit and integration tests should be behavior tests.
## Testing during development

Unit tests should be written as early as possible during development, or even before the implementation, in the manner of test-driven development.

The development process very often requires executing integration tests or having an isolated Swarm network to validate the implementation. Tools that execute integration tests can also be used to manage development networks, either locally or on infrastructure that other developers can access. This covers setting up an isolated blockchain network and contracts as well as the Swarm network.

## Existing tools

Unit testing is completely standardized thanks to the Go testing toolset.

Beekeeper was started right at the beginning of bee node development. It is a tool that interacts with the bee API and the Kubernetes API to orchestrate and automate integration tests. It sets up the whole infrastructure and runtime requirements and performs actions on nodes and infrastructure, such as starting and stopping nodes, uploading, downloading and validating data, and many others.

The tools for integration tests, especially beekeeper, were also meant to manage the infrastructure for the testnet and mainnet, but that never happened. Keeping the tools as similar as possible for both testing and public environments could prevent them from being neglected.

The GitHub CI/CD pipeline already has automation that uses beekeeper and also sets up contracts. There used to be very helpful scripts for bringing up a local Kubernetes cluster of bee nodes.

## Observability

Unit test runs leave a trace in GitHub Actions. Integration and performance tests require much more complex infrastructure to ensure that logs, metrics and tracing data are available during and after testing.
Such infrastructure must support isolated environments and must keep the logging, metrics and tracing data of different environments separate.

Manual testing should produce a document or an issue tracker report as its result.

## Current state of testing

We should evaluate which behaviors need to be covered by unit and integration tests and address those first. I believe that compromises made by skipping testing have slowed down development and the release of new features.

We also need to evaluate whether to continue developing beekeeper or to build new tools.

Automation is the major pain point for the whole development and release process. Every manual action needed on running Swarm networks, whether to update the nodes or to change their state, greatly slows down development and creates friction.

The current mode of operation is not sustainable. Automation, which is highly neglected here, has proven to be essential in the development process. Swarm is currently paying for skipped testing (both unit and integration) and neglected automation with a much slower development process. This must be addressed as soon as possible.