# RFC #0007: Distributed enrolments

# Status

- [ ] predraft
- [x] draft
- [ ] abandoned
- [ ] published
- [ ] superseded

# Summary

The current Monolith-backed experimentation enrolment depends on complex inter-service cookie proxying with non-deterministic conflict resolution. This fragility produces unpredictable enrolments, which in turn result in unreliable attribution.

This proposal describes a mechanism for deterministic distributed experiment enrolment by hoisting the responsibility from the Monolith to Web Proxy. For the remainder of this document we'll refer to this mechanism as **Enrolments**.

# Motivation

Monolith-backed enrolments in today's distributed front-end world require a complex orchestration of cookies between services, designed to trick the Monolith into thinking it's still talking directly to users' web browsers. This in turn requires intermediate services, like Boom apps, to act like web browsers (see images below). This orchestration is difficult to get right in every layer of every application in every circumstance. As such it is prone to regressions.

Additionally, the current Monolith-backed enrolment mechanism lacks the information needed for certain desirable enrolment logic, such as enrolments per domain / service boundary, multi-variant enrolments, or support for enrolments on native apps.

These issues have resulted in reduced trust in experimentation results, and in teams introducing service-specific enrolment solutions.

# Discovery process

In early 2019 representatives from Falcon, Curiosity, and Product APIs held a [series of workshops][experimentation frameworks] led by Michael Mifsud and Victor Pillac, aimed at addressing the issues and shortcomings of our current Monolith-backed enrolment mechanism, e.g. [excluding certain channels from experiments][excluding certain channels from experiments]. The outcome of the workshops was a design for [experimentation enrolment at the edge][experimentation enrolment at the edge].

**Note:** The linked documents describe a spike using Cloudflare Edge Workers; however, the introduction of Web Proxy provides us with a preferred edge platform.

# Proposal

Hoist enrolments to the edge via Web Proxy. To achieve this we will introduce two new concepts.

**Firstly:** a standard `X-Experiments` HTTP header for communicating current enrolments between services. Critically, this header will only be set at Web Proxy, significantly reducing inter-service orchestration of enrolment data.

**Secondly:** an `rbVisitorId` cookie for maintaining state between the client web browser and Web Proxy. This cookie will not be exposed to downstream Redbubble services, e.g. Shop, Explore, or the Monolith.

At a high level, in the best case, this is how the two systems compare.

| Provider   | Not enrolled | Enrolled |
| :--------- | :----------- | :------- |
| Monolith   | ![](https://i.imgur.com/0GvuypY.png) | ![](https://i.imgur.com/FQJ7vj6.png) |
| Enrolments | ![](https://i.imgur.com/GY5YC1f.png) | ![](https://i.imgur.com/v3cDSqj.png) |

# Key concepts

- Web Proxy creates an `rbVisitorId` cookie containing a [ulid][] if it does not already exist
- the `rbVisitorId` cookie isn't transmitted to services downstream of Web Proxy
- enrolments are determined at Web Proxy based on the `rbVisitorId` cookie and an experiments config file
- enrolments are [deterministic](#Deterministic-enrolments) for the lifetime of the `rbVisitorId` cookie
- enrolments are communicated between services via the `X-Experiments` header
- enrolments are configured and versioned in Web Proxy

These properties have a few key advantages over the current Monolith-backed mechanism.
- enrolments can be re-computed offline given an `rbVisitorId` and an enrolment config version
- enrolments are immutable for the lifetime of the originally initiated HTTP request
- enrolments no longer require a round trip to the Monolith
- enrolments no longer happen as a side-effect of `ApplicationController`
- enrolments no longer require conflict resolution at the front-end

# Deterministic enrolments

This behaviour is a critical component of Enrolments. In order to achieve deterministic Enrolments we need 3 things.

- unique identifiers for enrolments (`enrolments configuration`)
- a uniformly distributed hashing function (`MurmurHash3`, `CityHash`, `FarmHash` families)
- a seed (`rbVisitorId`)

## Enrolments configuration

This is how we define the available enrolments and their parameters, i.e. who can be enrolled, on which experiences, at what ratio. In its simplest form this is a YAML file.

```yaml
sp-test-1:
  urls: ^/shop\?.+
  locales: [de, fr, es]
  percentage: 0.5
rec-test-1:
  urls: ^/explore\?.+
  percentage: 0.1
```

Critically, each configured enrolment needs to have a unique identifier, e.g. `sp-test-1`, `rec-test-1`.

## The seed

This is _what_ is being enrolled. As a result we want a stateful seed with high entropy. Unlike a random coin flip, a given seed value will always result in the exact same set of Enrolments for the same enrolments configuration. For Monolith-backed enrolments this is essentially a user's Cognito ID. In Enrolments this is the `rbVisitorId` identifier.

## The hash

We use a hashing function to turn the seed into a number. We can then reduce that number to a fixed range (for example by taking it modulo a bucket count) and compare the result against the configured percentage for the given enrolment. In order to achieve fair enrolments we need a hashing function whose output is evenly distributed. Additionally, since enrolments will happen at the edge on every request, it must be fast and efficient.

# Key differences

The major conceptual change is what is being enrolled.
The current Monolith-backed enrolments enrol a Cognito ID ("user"), whereas Enrolments enrols a "browsing session" (as denoted by the `rbVisitorId` cookie). This means Enrolments makes no attempt to reconcile enrolments by Cognito ID.

# FAQ and Rationale

## Will I still be able to opt into an experiment

Yes. A convenient mechanism (TBD) will be provided for opting into specific experiments.

## Will adding a new cookie exacerbate our header size issues

Initially yes, but as part of the rollout process we plan to phase out the existing enrolment cookies. This is a big win, as one of the enrolment cookies is the root cause of admins hitting the cookie length limits. These cookies will be replaced with the new `rbVisitorId` cookie, which will only ever contain a fixed-length identifier.

## What about PlanOut

[PlanOut][] is an enrolment and attribution Python framework developed at Facebook. In the PlanOut model we would create an Enrolments Python web service that wired up the library. Services would then make requests to the Enrolments service to get enrolment information and send attribution events, e.g. user created P.O. All requests to the Enrolments service would require metadata for the enrolment (i.e. url, locale, etc.) and some identifier, in our case `rbVisitorId`.

This approach is somewhat of a middle ground between the current Monolith-backed enrolments and the proposed Enrolments. This alignment of ideas between PlanOut and Enrolments is no accident, as Enrolments is heavily influenced by the [PlanOut paper][].

In the end we decided against PlanOut for a couple of reasons:

- we're not comfortable supporting a production Python service
- it would introduce more inter-service calls, and thus more possible points of failure
- it has many more features than we currently need
- the incoming request to Web Proxy already has much of the metadata we require

# Rollout plan

### Phase 1

- introduce an enrolments configuration file into Web Proxy
- introduce a Web Proxy middleware for issuing the `rbVisitorId` cookie
- introduce an enrolments middleware for Web Proxy `backend`s to set the `X-Experiments` header

### Phase 2

- implement an enrolments Boom middleware package (including a GraphQL resolver)
- update Shop to use the enrolments Boom middleware package
- implement an enrolments Rack middleware for Rack / Rails applications
- update Monolith experimentation helpers to read from the `X-Experiments` header

### Phase 3

- remove legacy enrolment behaviour from the Monolith
- introduce a Web Proxy middleware to remove legacy enrolment cookies

# Glossary

## Conflict resolution

Front-ends like Boom applications can execute multiple API requests in parallel, many of which reach the Monolith. If the originating front-end request has no enrolments, then any or all of the parallel API requests may result in an enrolment from the Monolith. However, since Monolith-backed enrolments are non-deterministic, the enrolments may differ, and the front-end application has to decide which enrolment to respect. Additionally, front-end applications need to track this decision in memory for the originating request and attach the enrolment cookie to all subsequent Monolith API requests.

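To illustrate why this conflict goes away under the approach described in "Deterministic enrolments", here is a minimal sketch rather than the proposed implementation. It assumes SHA-256 from the Python standard library as a stand-in for the `MurmurHash3` / `CityHash` / `FarmHash` families, purely to stay self-contained. The names `bucket`, `is_enrolled`, `enrolments_for`, the `BUCKETS` constant, and the key format are all hypothetical, and URL and locale matching are left out.

```python
import hashlib

# Assumption: 10,000 buckets, so percentages resolve to 0.01% steps.
BUCKETS = 10_000

# Assumption: only the identifier and percentage from the enrolments
# configuration are modelled here; `urls` and `locales` are ignored.
EXPERIMENTS = {"sp-test-1": 0.5, "rec-test-1": 0.1}


def bucket(visitor_id: str, experiment_id: str) -> int:
    """Map an (experiment, visitor) pair to a stable bucket in [0, BUCKETS)."""
    key = f"{experiment_id}:{visitor_id}".encode("utf-8")
    digest = hashlib.sha256(key).digest()  # stand-in for a faster uniform hash
    return int.from_bytes(digest[:8], "big") % BUCKETS


def is_enrolled(visitor_id: str, experiment_id: str, percentage: float) -> bool:
    """Enrol when the visitor's bucket falls below the configured ratio."""
    return bucket(visitor_id, experiment_id) < round(percentage * BUCKETS)


def enrolments_for(visitor_id: str) -> list:
    """Return the sorted experiment identifiers this visitor is enrolled in."""
    return sorted(
        experiment_id
        for experiment_id, percentage in EXPERIMENTS.items()
        if is_enrolled(visitor_id, experiment_id, percentage)
    )


# A ULID-shaped value used purely as an example seed.
visitor = "01ARZ3NDEKTSV4RRFFQ69G5FAV"

# Simulate five parallel requests that each compute enrolments independently.
parallel_results = [enrolments_for(visitor) for _ in range(5)]
assert all(result == parallel_results[0] for result in parallel_results)
```

Because the result depends only on the seed and the configuration, every request computes the same answer and there is nothing left for the front-end to reconcile. Mixing the experiment identifier into the hashed key is a design choice in this sketch so that enrolment in one experiment does not predict enrolment in another.
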
## Flow diagrams

Source (bigger images): https://hackmd.io/w4tEXF56QomiUpa5HuvkHA?view

### Best case (monolith)

| Provider   | Not enrolled | Enrolled |
| :--------- | :----------- | :------- |
| Monolith   | ![](https://i.imgur.com/IUO9BOU.png) | ![](https://i.imgur.com/y6F1bEQ.png) |
| Enrolments | ![](https://i.imgur.com/htpuheD.png) | ![](https://i.imgur.com/B9dA53c.png) |

### Best case (boom)

| Provider   | Not enrolled | Enrolled |
| :--------- | :----------- | :------- |
| Monolith   | ![](https://i.imgur.com/0GvuypY.png) | ![](https://i.imgur.com/FQJ7vj6.png) |
| Enrolments | ![](https://i.imgur.com/GY5YC1f.png) | ![](https://i.imgur.com/v3cDSqj.png) |

### Typical

| Provider   | Not enrolled | Enrolled |
| :--------- | :----------- | :------- |
| Monolith   | ![](https://i.imgur.com/ot84wYU.png) | ![](https://i.imgur.com/rbe1W7I.png) |
| Enrolments | ![](https://i.imgur.com/ihmuCfZ.png) | ![](https://i.imgur.com/zOoJbwm.png) |

### Worst case

Notably, for the Monolith this isn't too different to the Typical case.

| Provider   | Not enrolled | Enrolled |
| :--------- | :----------- | :------- |
| Monolith   | ![](https://i.imgur.com/GUbbisY.png) | ![](https://i.imgur.com/eaTAQlU.png) |
| Enrolments | ![](https://i.imgur.com/CyvallT.png) | ![](https://i.imgur.com/ELTweYQ.png) |

[distributed experiment enrolment]: https://redbubble.atlassian.net/wiki/spaces/DS/pages/843022546
[excluding certain channels from experiments]: https://redbubble.atlassian.net/wiki/spaces/DS/pages/872186041
[experimentation frameworks]: https://redbubble.atlassian.net/wiki/spaces/DS/pages/84574300
[experimentation enrolment at the edge]: https://redbubble.atlassian.net/wiki/spaces/DS/pages/843416175
[ulid]: https://github.com/ulid/spec
[PlanOut]: https://facebook.github.io/planout/
[PlanOut paper]: https://arxiv.org/pdf/1409.3174v1.pdf