# Ginkgo Proposal: Finer-Grained Shared Resources and Parallelism

## The Problem

With growing adoption of Ginkgo among authors of Kubernetes-related "e2e" integration suites there has been a steady rise in issues opened around questions of finer-grained setup of shared resources and parallelism. Specifically it appears that users want to be able to express patterns such as:

- I want to set up a new cluster (very expensive) **once** for a wide variety of specs to run against.
- I have several distinct Features/Scenarios/Modalities (henceforth simply "features"). Each of these features involves some (expensive) **setup of a shared resource** (e.g. an Operator/Controller/Application/Configuration) for the **_set_ of specs** associated with the feature. I want to run this setup **once** per feature.
- I might even have a _hierarchy_ of features - where some feature or interactions require some expensive setup on top of a prior shared resource. I want to run this setup just **once**.
- Sometimes I want (some of) the specs for a feature to:
  - ...run sequentially *with respect to each other*
  - ...run in parallel *with respect to each other*
- And sometimes I want (some of) the specs for a feature to:
  - ...run exclusively (i.e. no other specs can run while these feature specs are running)
  - ...be comingled with other specs (i.e. other specs, including other feature specs, are allowed to run simultaneously)
- In addition, I may have some extensive checks I’d like to perform to make sure all is well - either with my global setup or the setup for my feature. I want to run these checks just once *before* I start running the associated specs.

The presence of this sort of shared setup breaks Ginkgo's assumption that specs are [**independent**](https://onsi.github.io/ginkgo/#mental-model-ginkgo-assumes-specs-are-independent); and Ginkgo's multi-process parallelism model means that scheduling and communication between different parallel runners must be mediated by Ginkgo itself.
As such, implementing patterns such as these is not currently well-supported by Ginkgo.

## The Current State

Ginkgo has long allowed the setup and configuration of suite-level shared resources via `SynchronizedBeforeSuite` - however it has not been possible to scope additional shared resources to just a subset of specs, nor to control how different specs run in parallel with one another.

Ginkgo V2 attempted to make progress on this problem space with the introduction of a few new concepts:

- Containers and specs can now be decorated as `Serial` - this ensures that no other specs run while these specs are running.
- Containers can be decorated as `Ordered` - this ensures that specs in the container run sequentially. By default a failed spec results in the entire container aborting, however the `ContinueOnFailure` decorator can be applied to change this behavior.
- `Ordered` containers can include `BeforeAll` setup nodes - this enables shared setup for all specs in the container.

Under the hood (a) Ginkgo schedules `Serial` specs _after_ all parallel specs have finished running and (b) Ginkgo schedules `Ordered` containers as a single unit that runs on a single Ginkgo process (and, therefore, potentially in parallel with other processes). This allows us to solve for parts of the use cases above, however it is currently _not_ possible to support specs with **shared setup** that can run in parallel. Moreover, the design and implementation of `Ordered` has come at a high cost: the Ginkgo codebase is substantially more complex now and users often reach for `Ordered` without having clarity around its many tradeoffs.

## Overview of the Proposal

We propose introducing a number of new pieces to enable the use cases described above:

- **Subsuites** allow users to describe groups of parallelizable specs that use a shared resource. The shared resource can be set up once for the subsuite and then torn down once all the subsuite specs have completed.
  Subsuites, and therefore shared resources, can be hierarchical.
- A new `Exclusive` decorator to complement `Serial`. Specs within an `Exclusive` container can run in parallel with respect to each other, but will **not** run in parallel with other specs within the same subsuite.
- A set of new decorators: `RunAfterSubsuiteSetup` and `AbortParentSubsuiteOnFailure` to allow users to add specs that validate that a suite’s (or subsuite’s) setup is valid.
- A `NeverRandomize` decorator that prevents Ginkgo from randomizing the specs within a decorated container.

These building blocks should provide significant new flexibility to handle the use cases outlined above. They are purely additive and will allow users that need the more complex behavior to opt in over time. They also lay a cleaner and more flexible foundation for the existing `Ordered` behavior, allowing us to eventually deprecate and, with time, remove `Ordered`.

## Subsuites

We propose introducing a new Ginkgo construct:

```go
var s = Subsuite("name", <Optional Decorators>, <Optional Body>)
```

Here `name` must be unique across the suite and is how Ginkgo distinguishes between subsuites. The supported decorators will be outlined later in this proposal.

`Subsuite`s must be defined during the [tree construction phase](https://onsi.github.io/ginkgo/#mental-model-how-ginkgo-traverses-the-spec-hierarchy) and so can be defined at the top-level or within the body of containers - however they cannot be defined within setup (e.g. `BeforeEach`) or subject (i.e. `It`) nodes.

### Adding Specs to Subsuites

There are two mechanisms for adding nodes to a subsuite `s`. Subsuites can be passed a function:

```go
var s = Subsuite("name", func() {
	SynchronizedBeforeSubsuite(func() { ... })
	Describe(..., func() { ... })
	It(...)
	//etc...
})
```

In addition, container nodes and subject nodes can be attached to the subsuite via:

```go
Describe(..., s, func() { ... })
It(..., s, func() { ... })
```

(i.e. a subsuite is a valid decorator).
Of course a combination of both is supported:

```go
var s = Subsuite("name", func() {
	SynchronizedBeforeSubsuite(func() { ... })
	//etc...
})

// append a Describe to the subsuite
Describe(..., s, func() { ... })
```

however the subsuite setup nodes (`*BeforeSubsuite`) must appear in the `Subsuite` closure.

We anticipate most users will use the nested closure approach as it will match the typical look-and-feel of Ginkgo code. However we support the variable-oriented approach to allow subsuites to span multiple files (or, even, packages - as some users define tests across multiple packages and import them into a single `e2e_test` package).

### The Subsuite Hierarchy

Subsuites can be nested:

```go
var s = Subsuite(..., func() {
	var s2 = Subsuite(..., func() { ... })
})
```

or

```go
var s = Subsuite(...)
var s2 = Subsuite(..., s)
```

Both build a new subsuite, `s2`, that is a child of `s`. Each subsuite can have at most one parent; so multiple-inheritance patterns or, more to the point, writing specs that use resources provisioned by two *different* subsuites, is not possible.

Moving forward we can model Ginkgo’s global suite as the **root** “sub”suite, with all other subsuites and specs attached to it. Access to the root suite is provided via `RootSuite` - which will be useful in some of the decorators discussed below.

Note that container hierarchies automatically imply membership in a subsuite:

```go
Describe("Foo", s, func() {
	It("bar", func() { ... })
})
```

here the `"bar"` `It` is understood to be in `s`.

Moreover, container hierarchies and subsuite hierarchies must align. If a container is in `s` and one of its child specs is in `s2`:

```go
Describe("Foo", s, func() {
	It("bar", s2, func() { ... })
})
```

then an error is thrown unless `s2` is a descendant of `s`.

> If an `Ordered` container is in `s` then all its children must **also be in `s`** and not some descendant of `s`.
This is due to the complexity of `Ordered` containers - it would be prohibitively expensive (and confusing) to support `Ordered` containers that span multiple subsuites.

### Setting Up Shared Resources for a Subsuite

Subsuites are primarily used to group together a set of specs that make use of some shared resource (for example, an application deployed to a cluster; or a configuration of some external system).

Users use `BeforeSubsuite` or `SynchronizedBeforeSubsuite` to set up shared resources for a subsuite:

- Both are guaranteed to run before the first spec in the subsuite runs.
- `BeforeSubsuite` runs once on every Ginkgo process (if need be; i.e. if the process is going to pick up a spec in the subsuite). Any `DeferCleanup` nodes registered in the `BeforeSubsuite` will run after the last spec in the subsuite for the process in question has completed.
- `SynchronizedBeforeSubsuite` takes two functions. We’ll call them `Primary` and `All`. Here’s how they behave:
  - `Primary` is run only once by one Ginkgo process (typically the first one to start working through the subsuite). `Primary` is used to set up any (expensive) shared resources needed by the specs in the subsuite.
  - `All` is run by all Ginkgo processes that pick up work for the subsuite (but only once per process).
  - Communication between the two functions is facilitated by Ginkgo. `Primary` can (optionally) return a result. This result is JSON-encoded and then made available to `All`. This allows information about resources configured in the first process to make its way to other processes.
  - `All` is guaranteed to run _only after_ `Primary` completes and returns. It does not run if `Primary` fails.
  - Any `DeferCleanup`s defined in `All` are guaranteed to run after the final subsuite spec is run by the process.
  - Any `DeferCleanup`s defined in `Primary` are guaranteed to run after all processes have finished running the subsuite (i.e. there are no more specs or `DeferCleanup`s to run).

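
The `Primary`/`All` handshake listed above can be modeled in plain Go. This is only an illustrative sketch, not Ginkgo's implementation: goroutines stand in for Ginkgo processes, and `syncSetup`, `sharedResource`, `enter`, `runProcesses` and the endpoint URL are hypothetical names invented for the example. `DeferCleanup` ordering is not modeled.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sync"
)

// sharedResource is a hypothetical payload that Primary hands to All.
type sharedResource struct {
	Endpoint string `json:"endpoint"`
}

// syncSetup models a single SynchronizedBeforeSubsuite node.
// (Hypothetical type: goroutines stand in for Ginkgo processes.)
type syncSetup struct {
	once    sync.Once
	payload []byte
	err     error
	primary func() (any, error)
}

// enter is called by each "process" before its first spec in the subsuite.
// The first caller runs primary; every other caller blocks until it returns.
func (s *syncSetup) enter(all func(payload []byte) error) error {
	s.once.Do(func() {
		result, err := s.primary()
		if err != nil {
			s.err = err
			return
		}
		// The result is JSON-encoded before being shared.
		s.payload, s.err = json.Marshal(result)
	})
	if s.err != nil {
		return s.err // All does not run if Primary failed
	}
	return all(s.payload)
}

// runProcesses simulates n processes entering the subsuite. It returns how
// many times Primary ran and the endpoint each process decoded in All.
func runProcesses(n int) (int, []string) {
	primaryRuns := 0
	setup := &syncSetup{primary: func() (any, error) {
		primaryRuns++ // only ever executed inside once.Do
		return sharedResource{Endpoint: "https://cluster.example:6443"}, nil
	}}

	endpoints := make([]string, n)
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			_ = setup.enter(func(payload []byte) error {
				var r sharedResource
				if err := json.Unmarshal(payload, &r); err != nil {
					return err
				}
				endpoints[i] = r.Endpoint
				return nil
			})
		}(i)
	}
	wg.Wait()
	return primaryRuns, endpoints
}

func main() {
	runs, endpoints := runProcesses(4)
	fmt.Println("primary runs:", runs)
	fmt.Println("endpoints seen by All:", endpoints)
}
```

In this model `sync.Once` provides both guarantees from the list: `Primary` runs exactly once, and every other caller waits for it to return before running `All`.
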

Only one `BeforeSubsuite` or `SynchronizedBeforeSubsuite` can be attached to a given subsuite (though this may be relaxed in a future release if needed). Moreover, these subsuite setup nodes must be called in the body function of the subsuite and they must appear at the top level of that function - they cannot be inside a container node. If `BeforeSubsuite` or `SynchronizedBeforeSubsuite` are called outside of a subsuite they are attached to the root suite and treated as synonyms for `BeforeSuite` and `SynchronizedBeforeSuite`.

There is no `(Synchronized)AfterSubsuite` - users should use `DeferCleanup` instead. We may introduce a `ReportAfterSubsuite` if we see demand for it, however it is currently out of scope.

### Randomization

Randomization honors the subsuite hierarchy and prevents mixing of specs across sibling subsuites. Specifically, when randomizing a subsuite we first generate a random order among its specs and any child subsuites, then we enter the child subsuites and randomize them following the same rules. This ensures that specs within a subsuite run close to each other and allows resources associated with the subsuite to be allocated and released in close temporal proximity. This will hold even if `--randomize-all` is used.

The rationale behind this randomization scheme is to minimize instances where a high degree of parallelism results in a large number of long-lived shared resources lasting for much of the duration of the suite. This could result in resource contention and flaky tests and force users to prematurely reduce the parallelism of their suite.

### Aborting a Subsuite

Any subject or setup node can call `AbortSubsuite()` to end execution of its immediate parent subsuite (any remaining specs in the subsuite will be skipped). It can also call `AbortSubsuite(s)` to abort a specific subsuite, `s`, in the spec’s subsuite hierarchy. Note that `AbortSubsuite(RootSuite)` is equivalent to `Abort()`.

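
The hierarchical shuffle described in the Randomization section above can be sketched in a few lines of Go. This is a minimal model under stated assumptions, not the proposed implementation: `node`, `shuffleOrder`, `contiguous`, `exampleTree` and `orderForSeed` are hypothetical names, containers are ignored, and only the relative order of specs is computed.

```go
package main

import (
	"fmt"
	"math/rand"
)

// node is either a spec (no children) or a subsuite (with children).
// Hypothetical model type used only for this illustration.
type node struct {
	name     string
	subsuite bool
	children []*node
}

// shuffleOrder shuffles a subsuite's immediate children (specs and child
// subsuites alike), then recurses into each child subsuite in place, so a
// child subsuite's specs always stay adjacent in the final order.
func shuffleOrder(n *node, rng *rand.Rand) []string {
	if !n.subsuite {
		return []string{n.name}
	}
	kids := append([]*node(nil), n.children...) // copy; leave the tree untouched
	rng.Shuffle(len(kids), func(i, j int) { kids[i], kids[j] = kids[j], kids[i] })

	var order []string
	for _, k := range kids {
		order = append(order, shuffleOrder(k, rng)...)
	}
	return order
}

// contiguous reports whether all members occupy one unbroken run in order.
func contiguous(order []string, members map[string]bool) bool {
	first, last, count := -1, -1, 0
	for i, name := range order {
		if members[name] {
			if first == -1 {
				first = i
			}
			last = i
			count++
		}
	}
	return count == 0 || last-first+1 == count
}

// exampleTree builds a root suite with three specs and two child subsuites.
func exampleTree() *node {
	spec := func(name string) *node { return &node{name: name} }
	db := &node{name: "db", subsuite: true,
		children: []*node{spec("db-1"), spec("db-2"), spec("db-3")}}
	cache := &node{name: "cache", subsuite: true,
		children: []*node{spec("cache-1"), spec("cache-2")}}
	return &node{name: "root", subsuite: true,
		children: []*node{spec("root-1"), db, spec("root-2"), cache, spec("root-3")}}
}

// orderForSeed returns the spec order produced for a given random seed.
func orderForSeed(seed int64) []string {
	return shuffleOrder(exampleTree(), rand.New(rand.NewSource(seed)))
}

func main() {
	fmt.Println(orderForSeed(42))
}
```

Whatever the seed, the `db-*` specs and the `cache-*` specs each form one unbroken run, which is the property that keeps a subsuite's shared resource short-lived.
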

### Supported Decorators

All current decorators that work with containers will apply to subsuites as well except for `Ordered`. `Ordered` has introduced substantial complexity to the Ginkgo codebase and confusion to users. Rather than compound that complexity we shall leave it isolated to its current behavior.

Subsuites will gain an additional `SubsuiteTimeout` decorator to provide a subsuite-level timeout - though this may be pushed out to a future release.

## Controlling Parallelization: `Serial` and `Exclusive`

We introduce a new `Exclusive` decorator and extend the behavior of the `Serial` decorator to provide the user with new tools for managing parallelization in Ginkgo.

### `Serial`

Specs (i.e. `It`), containers (i.e. `Describe`, `Context`) and `Subsuite`s can be decorated with `Serial`. When applied to a container or subsuite, the `Serial` decorator simply percolates down and applies to all specs in that container or subsuite. `Serial` has no additional implications on the lifecycle of the subsuite/container - only on its child `It`s.

`Serial` specs are **guaranteed** to run **in series** with respect to any other specs in the **same subsuite**. Moreover specs in a `Serial` container are guaranteed to run on the **same Ginkgo process**. This means that `Serial` specs in a subsuite `s` may run in parallel with specs in the root suite, or in some other subsuite that is not a descendant of `s`. However these specs are guaranteed to run in series with respect to other specs in `s`.

In addition we introduce a new `SerialAmong(subsuite)` decorator that can be passed a subsuite as an argument. Here `subsuite` must be in the subsuite hierarchy of the decorated spec and the spec will be guaranteed to run in series with respect to all other specs in `subsuite`. The user can pass in `RootSuite` to have the spec be serial with respect to all other specs in the suite.

### `Exclusive`

`Exclusive` is a new decorator that can be applied to containers and subsuites.
Specs in an `Exclusive` container can run in parallel with respect to each other but are guaranteed **not** to run in parallel with any other specs in the **same subsuite**.

In addition, `ExclusiveAmong(subsuite)` ensures that the specs in the decorated container do not run in parallel with any other specs in `subsuite`. Here `subsuite` must be in the subsuite hierarchy of the decorated container. The user can pass in `RootSuite` to have the container be exclusive with respect to all other specs in the suite.

Note that `Exclusive` mirrors `Serial` in that it controls the degree to which decorated specs can run in parallel with specs within the same subsuite. The sole distinction is that `Exclusive` allows for parallelism within the container whereas `Serial` enforces that all decorated specs run in series.

Note that two sibling containers marked `Exclusive` are not “merged” to run in parallel with each other. Instead they are run sequentially (though the specs within each container will be parallelized with respect to each other).

### Rationale

The rationale behind the behavior of `Exclusive` and `Serial` is straightforward: Ginkgo users sometimes need to perform operations on a shared resource that cannot be run in parallel with other operations on the same shared resource. For example, a set of specs may configure the resource in a way that invalidates the expectations of other specs that make use of the same shared resource. Under ideal conditions these two sets of specs would be given their own copy of the resource, however this can sometimes be prohibitively expensive. So long as the resource can be reset to a clean state, it can be more efficient to give each set of specs exclusive access to the resource.

Moreover, since shared resources can be hierarchical and are modeled hierarchically in Ginkgo via subsuites, it makes most sense to allow users to express `Exclusive` and `Serial` in terms of subsuites.
The default of being `Exclusive`/`Serial` with respect to the most immediate subsuite allows other specs that presumably have no dependency on said shared resource to continue running in parallel.

It remains to be seen whether the proposed approach is sufficient. It is possible that users will want to test certain behaviors that _cannot_ be reset, in which case it will be necessary to be able to dictate that a certain set of specs is both `Exclusive` and must run at the end of the subsuite. Such a solution can be implemented in a future release of Ginkgo but is currently out of scope of this workstream (though see the Preventing Randomization section below).

### Drawbacks

The presence of `Serial` and `Exclusive` certainly adds complexity when reasoning about the runtime behavior of a suite - it will be up to users to use them judiciously. In particular, `Exclusive` could require that some processes stand idle while others run - an effective bottleneck in the parallelizability of the code. This is, of course, by design - but will complicate reasoning about the performance characteristics of a suite as a function of process-count. However by scoping exclusivity to the parent subsuite we hope to minimize the spread of the bottleneck (other subsuites can continue to run in parallel).

## Decorators to Validate Setup

The common pattern for ensuring that setup for a suite or spec is ready is to add validation code to the relevant `BeforeEach` or `BeforeSuite`. However some users have reported needing support for more extensive validations. To support this we propose introducing two new decorators:

- `RunAfterSubsuiteSetup` can be applied to a container or spec. It tells Ginkgo to run the decorated spec right after the setup of its associated subsuite. If applied to a container or spec that is in the root suite then the spec runs right after any `BeforeSuite` or `SynchronizedBeforeSuite` node runs.
  The ordering among multiple `RunAfterSubsuiteSetup` specs is not guaranteed if `--randomize-all` is enabled.
- `AbortParentSubsuiteOnFailure` can be applied to a container or spec. It tells Ginkgo to abort the subsuite and skip any subsequent specs if a decorated spec fails. `AbortSubsuiteOnFailure(subsuite)` can abort the specified `subsuite` (which must be in the decorated spec’s subsuite hierarchy) if a decorated spec fails.

## Preventing Randomization

There are contexts when the order in which specs run should be preserved by Ginkgo. For example, a user may want to take a large `It` and break it down into a number of related, and ordered, specs. Currently Ginkgo supports this via the `Ordered` decorator - however as described above, this has come at the cost of significant complexity to the Ginkgo codebase. We’d like to eventually deprecate and remove `Ordered`.

To solve for this use case, and give users a migration path, we introduce a new `NeverRandomize` decorator that can be applied to a container or subsuite. The specs within the decorated container are guaranteed to never be randomized with respect to one another (the container itself, however, may be shuffled among other containers in the (sub)suite).

`NeverRandomize` only affects the order of the specs in the Ginkgo tree. If the specs run in parallel then a sequential run order cannot be guaranteed. Thus we expect most users will want to pair `NeverRandomize` with `Serial`, which will ensure the specs in the container are not randomized _and_ run in series on the same process (thereby allowing shared variables to propagate across the sequential specs).

Recall that, by default, specs in a `Serial` container are guaranteed to run in series relative to other specs in the same subsuite. However you may want to signal to Ginkgo that a particular unit of `Serial`, `NeverRandomize`d specs *can* run in parallel with other specs in the subsuite (this is the default behavior of `Ordered`).
To accomplish that the suggested pattern is:

```go
Describe("sprockets", func() { ... })
Describe("widgets", func() { ... })

Subsuite("the manufacturing workflow", func() {
	BeforeSubsuite(func() {
		// set up the workflow (will run once)
	})

	Describe("is quite complex", NeverRandomize, Serial, func() {
		It("...")
		It("...")
		...
	})
})
```

This will guarantee that the manufacturing workflow specs are always correctly ordered and run in series with respect to one another, _and_ that they can run in parallel with respect to the other specs (`sprockets` and `widgets`) in the suite. `BeforeSubsuite` is used to perform setup for the ordered set of specs **once**.

This is, admittedly, more complex than the previous `Ordered` directive, however it *does* provide the user finer-grained control over the degree of parallelizability of the non-randomized specs.

## Implementation Notes and SemVer

A series of explorations will need to be performed to determine how best to approach implementing these features. Note that these changes are largely additive and are compatible with existing Ginkgo semantics... however we anticipate that implementing these features via Ginkgo’s current scheduling model will prove unwieldy.

Today, Ginkgo’s different parallel processes run specs largely independently - they rely on a simple central server to provide a threadsafe auto-incrementing counter to all the specs and employ a polling model for some cross-purpose signals (e.g. `Abort()`) and a publishing model to aggregate individual `SpecReport`s back to the server which eventually collates the final suite `Report`. This simple parallelism model works well when we can assume that specs are independent and can be trivially parallelized with respect to each other. This is no longer the case going forward.

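
To make the counter-based model concrete, here is a deliberately simplified sketch. It is not Ginkgo's actual code: goroutines stand in for Ginkgo processes, an in-memory atomic counter stands in for the central server, and `runWithCounter` is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// runWithCounter hands out spec indices from one shared, auto-incrementing
// counter. Each worker claims the next index, "runs" that spec, and repeats
// until the counter moves past the last spec. It returns how many times each
// spec ran.
func runWithCounter(numSpecs, numProcs int) []int32 {
	var next int64 // shared counter; the only coordination between workers
	runs := make([]int32, numSpecs)

	var wg sync.WaitGroup
	for p := 0; p < numProcs; p++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for {
				idx := atomic.AddInt64(&next, 1) - 1 // claim the next spec
				if idx >= int64(numSpecs) {
					return // nothing left to claim
				}
				atomic.AddInt32(&runs[idx], 1) // stand-in for running the spec
			}
		}()
	}
	wg.Wait()
	return runs
}

func main() {
	runs := runWithCounter(100, 8)
	total := int32(0)
	for _, r := range runs {
		total += r
	}
	fmt.Println("specs executed:", total)
}
```

The counter guarantees that each spec is claimed exactly once, but it carries no information about which specs share a resource or must not overlap. Constraints of that kind are what subsuites, `Serial` and `Exclusive` would introduce.
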

We anticipate, instead, transitioning to a richer centralized orchestrator model where the server is given a picture of the spec tree and orchestrates the various Ginkgo processes in a way that correctly honors the various constraints described in the tree. This change will be transparent to the user (though it will necessitate a coordinated upgrade of the Ginkgo library and CLI - however this has always been the case for minor version updates of Ginkgo).

From a SemVer perspective all the changes described in this proposal are additive and do not change existing behavior. Once implemented, this proposal will be released as a minor release of Ginkgo.

Ginkgo will continue to support `Ordered` however our intent is to gradually phase it out - first via deprecation, and then by removing it entirely, within the `2.x` line. Users and organizations that have concerns about this should reach out to the Ginkgo maintainers.