# Design: Actor Model Proposal

**Author**: @jonathanj

## Description (what)

This document proposes the introduction of the actor model as an FTL sub-system.

## Motivation (why, optional)

The model offers fault tolerance and scalability improvements over traditional services while making it easier to reason about system behavior. Without the support of a system like FTL, the cold-start barrier to adopting this model can be prohibitively high. Capabilities like message passing transparency, controller-based message routing, schema stores, the console, and the scaffolds that envelop entry points provide the building blocks to support a more ergonomic actor model system.

## Introduction

### Defining an Actor

_This section briefly introduces the actor model concepts required to review this proposal._

Actors generally represent concepts from your application domain, such as **merchants**, **transactions**, or **store inventory**. Ephemeral concepts such as **advertisement campaigns** can also be expressed as actors. Actors may create and interact with other actors. For instance, imagine the following actor interaction scenario.

> A <ins>merchant</ins> kicks off an <ins>advertisement campaign</ins> targeting an item in <ins>store inventory</ins>. This ultimately leads to the creation of <ins>transactions</ins> that consume from <ins>store inventory</ins>. Item counts in <ins>store inventory</ins> are adjusted accordingly. When the item count reaches zero in the <ins>store inventory</ins>, it notifies the <ins>advertisement campaign</ins>. The <ins>advertisement campaign</ins> responds by ending.

#### Actors are Isolated

Under the actor model, each actor’s internal state is protected from external mutation. Actors interact with one another by way of message passing. An actor may update its internal state in response to processing a message; it may also decide to dispatch messages to other actors or even create new actors.

#### Actor Message Handling is Serialized

An actor instance only reacts to one message at a time. The messages dispatched to a given actor instance are stored in a message queue and processed in arrival order.

#### Consequence

These properties obviate the need for actors to defend against concurrent access to their internal state with mechanisms such as locking and database transactions. Locking is harmful to horizontal scalability; reducing reliance on it promotes scalability.

## Proposal

This document proposes introducing an actor model sub-system to FTL.

Actors provide a powerful distributed systems engineering paradigm. The model underlying this paradigm is naturally scalable and fault-tolerant, and from its [formal](https://en.wikipedia.org/wiki/Actor_model) theory emerge powerful invariants that promote predictable system behavior.

In spite of this paradigm's advantages over more traditional system models, actors tend to see very little use in the wild. The actor execution model is one significant barrier to adoption. Actor message processing (_reactions_) must be serialized. Furthermore, message passing between actors is asynchronous and one-directional. Both requirements create architectural and actor coordination ergonomics challenges that arguably account for most of this barrier. By leveraging and extending the capabilities built into FTL, actor model adoption can be made feasible and made to feel more natural.

### Transparent Message Passing in FTL

Messages in an actor system must be routed to specific actor instances, preferably transparently. Verb and FSM calls in FTL already support transparent message passing. FSM message passing is most similar to actor message passing, given that both are instanced. The effort required to fork and extend that capability should be relatively low.

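
To make "routed to specific actor instances" concrete, here is a small single-threaded Go sketch of per-instance routing. It does not describe how FTL routes verb or FSM calls; `Router`, `Message`, `InstanceKey`, `Dispatch`, and `Next` are hypothetical names, and the in-memory slices stand in for whatever queue storage a real design would choose.

```go
package main

import "fmt"

// Message is a hypothetical envelope addressed to one actor instance.
type Message struct {
	InstanceKey string
	Body        string
}

// Router keeps one FIFO queue per actor instance. The sketch is not safe for
// concurrent use and keeps everything in memory.
type Router struct {
	queues map[string][]Message
}

// NewRouter returns an empty router.
func NewRouter() *Router {
	return &Router{queues: map[string][]Message{}}
}

// Dispatch appends the message to the queue of the addressed instance,
// creating the queue on first use. The sender only needs the instance key.
func (r *Router) Dispatch(m Message) {
	r.queues[m.InstanceKey] = append(r.queues[m.InstanceKey], m)
}

// Next pops the oldest pending message for an instance, if there is one.
func (r *Router) Next(key string) (Message, bool) {
	q := r.queues[key]
	if len(q) == 0 {
		return Message{}, false
	}
	r.queues[key] = q[1:]
	return q[0], true
}

func main() {
	r := NewRouter()
	r.Dispatch(Message{InstanceKey: "store-1", Body: "consume A"})
	r.Dispatch(Message{InstanceKey: "store-2", Body: "consume B"})
	r.Dispatch(Message{InstanceKey: "store-1", Body: "consume C"})

	// Drain store-1: its messages come back in arrival order, and
	// store-2's message is untouched.
	for m, ok := r.Next("store-1"); ok; m, ok = r.Next("store-1") {
		fmt.Println(m.Body)
	}
}
```

Keeping a dedicated queue per instance is what lets a supervisor later serve one instance's messages strictly in arrival order without coordinating with other instances.
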
### Actor Control Injection Point

The framework approach taken by FTL leaves space to wrap actors with supervisors that provide actor support services such as:

* Managing instance lifecycle
* Applying fault tolerance strategies
* Persisting internal state mutation
* Signaling message completion
* Enforcing execution semantics
* Emitting telemetry

The supervisor is the control point, so this list will expand as new service needs emerge to address design challenges.

### Challenges

The one-way asynchronous communication characteristic of actor-based systems presents serious ergonomic challenges and is likely to be the primary API design challenge. One-way async communication lends itself to saga-like execution chains that, without careful design, can severely burden the development experience. The model cannot succeed in finding adoption if application business logic is drowned out by actor boilerplate code.

Protecting the execution model in a distributed environment where multiple versions of the actor may be live is likely to be the primary system design challenge. Defining crisp execution semantics **first** and **then** designing around them is critical to building a system whose distributed behaviors remain well-defined even at boundary conditions.

### Opportunities

The canonical advantages of the actor model include:

* Fault tolerance
* Scalability
* System predictability
* Clean separation between application domain concepts

#### Observability

Some observability advantages are less obvious but operationally powerful for an actor-based system. Under the actor execution model, the causal relationship between a message reaction and an actor instance's internal state change is clear. This causal relationship transitively extends to support linking causal reaction chains (e.g.
while processing message **x**, message **y** got dispatched and/or actor **z** was spawned) to reconstruct the saga leading up to a mutation or actor creation.

The causal observability characteristics of this system create opportunities to expose diagnostic views and capabilities in the console that give developers distinctly powerful insights into their system's behavior.

#### Storage Optionality

The actor system transparently persists the actor's internal state. This provides persistent storage engine optionality that leaves room to address competing application-focused goals such as:

* Promoting responsiveness by reducing message processing latency
* Protecting availability
  * Leveraging high availability storage strategies
  * Building fault tolerance into the storage interface layer
* Protecting reliability
  * Leveraging high durability data storage strategies
* Protecting the execution model assertions
* Promoting compliance with mutation auditability
  * With options for tamper-proof strategies via WORM storage
* Promoting application scalability by leveraging scalable storage systems and transparently applying scaling strategies (e.g. partitioning)

## POC Design

_This design exists to demonstrate the feasibility of this model. It will be brief, hand-wavy, and high level, and should help jump-start an in-depth design should we accept some variant of this proposal._

### Model Concepts

An <ins>actor system</ins> consists of <ins>actors</ins> representing concepts from the application’s domain. For each actor there are zero or more <ins>actor instances</ins>. Interactions with the actor instance are controlled by the <ins>supervisor</ins>.

#### Actor

Each <ins>actor</ins> is characterized by the schema of its internal state and the canonical set of message types it reacts to. The canonical set of message types is determined by the set of reaction function declarations; each signature accepts a `ReactionContext`, an internal state object, and the message type. For example:
```go
// StoreInventory is the actor's internal state object.
type StoreInventory struct {
}

// ItemConsumed is a message type the actor reacts to.
type ItemConsumed struct {
}

//ftl:reaction
func Consume(ctx context.ReactionContext, s StoreInventory, m ItemConsumed) {
}
```

_Note: the relationship between reaction function declarations and message types is one-to-one._

#### Actor Instance

An <ins>actor instance</ins> is an instantiation of an <ins>actor</ins> that is backed by its internal state object (e.g. `StoreInventory`). Each actor instance is also backed by a dedicated <ins>message queue</ins>. The messages in the queue are constrained to the message types of the corresponding reaction functions and built-in system messages.

#### Supervisor

The <ins>supervisor</ins> is an execution harness for the <ins>actor instance</ins> and is responsible for enforcing the actor execution model (_see next section_). This component is responsible for serving messages to an <ins>actor instance</ins> from its corresponding <ins>message queue</ins>. Given a message, the supervisor builds and supplies the `ReactionContext` object. This object exposes capabilities to the actor instance’s reaction function, such as:

* Publish an internal state mutation
* Lifecycle control (e.g. signal end-of-life)
* Spawn a new actor (with idempotency support)
* Dispatch messages to existing actors or to the newly spawned actors

_This flow is visualized in the following diagram_

![Actor / Supervisor Flow](https://hackmd.io/_uploads/S1VMj3lpR.png)

#### Execution Model Behaviors

The implementation of this actor flow is constrained by the behaviors ascribed to the execution model. The execution model focuses on the observability of consequences under key scenarios. As the implementation evolves, components redeploy, and containers migrate, these model behaviors must remain invariant:

* For any given actor instance 𝞪, the consequences of reacting to message <code>m~i~</code> are recorded with transaction semantics (all or nothing) before reacting to message <code>m~i+1~</code> begins.
  Those consequences are:

  * Internal state mutated to <code>s~i+1~</code>
  * Lifecycle modification applied (e.g. marked completed)
  * Intention recorded to asynchronously spawn a set of actors <code>A~i~</code>
  * Intention recorded to asynchronously dispatch a set of messages <code>M~i~</code>

  → **Corollary**: all <code>m~i~</code> reaction retry attempts will observe the same internal state <code>s~i~</code>.

  → **Corollary**: synchronous verbs invoked from within the <code>m~i~</code> reaction implementation will observe <code>s~i~</code>. This is because an observation of <code>𝞪</code> from that synchronous verb occurs before the reaction to <code>m~i~</code> completes.

  → **Corollary**: asynchronous verbs will observe <code>s~i+k~</code> where <code>k</code> ≥ <code>0</code>. This is because an observation of <code>𝞪</code> from within that asynchronous verb may execute before or after the reaction to <code>m~i~</code> completes.

* Asynchronous intentions produced in reaction to <code>m~i~</code> will execute after <code>m~i~</code> successfully completes.

  → **Corollary**: the outcomes of <code>A~i~</code> and <code>M~i~</code> (e.g. success or failure results) are not observable from within the reaction to <code>m~i~</code>.

* Reactions that result from <code>A~i~</code> and <code>M~i~</code> will observe an internal state of <code>s~i+k~</code> for actor instance <code>𝞪</code>, where <code>k</code> ≥ <code>1</code>, because <code>𝞪</code> may react to other messages before those reactions observe <code>𝞪</code>.

#### Non-Actor Model Services

Requiring all interactions with actors, specifically querying state, to follow the actor message passing model would be overly cumbersome to develop against. Therefore, actor instance state or actor instance metadata should be retrievable via gRPC.

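
The all-or-nothing recording of reaction consequences, and a read path that only ever sees committed state, can be sketched in a few lines of Go. This is an in-memory illustration under stated assumptions, not the proposed implementation: `Supervisor`, `Consequences`, `Reaction`, `React`, and `Snapshot` are hypothetical names, messages are plain integers and strings, and `Snapshot` merely stands in for whatever the gRPC query would return.

```go
package main

import (
	"errors"
	"fmt"
)

// State is the actor's internal state (s_i in the text above).
type State struct{ Count int }

// Consequences collects what a reaction wants to happen. Nothing in it takes
// effect until the supervisor commits it.
type Consequences struct {
	NewState  State
	Completed bool
	Dispatch  []string // M_i: messages to send after commit (hypothetical payloads)
}

// Reaction computes consequences from the current state and a message.
type Reaction func(s State, msg int) (Consequences, error)

// Supervisor holds the committed state and outbox for one actor instance.
type Supervisor struct {
	state     State
	completed bool
	outbox    []string
}

// React applies one message with all-or-nothing semantics. If the reaction
// fails, nothing is recorded, so a retry observes the same state.
func (sv *Supervisor) React(r Reaction, msg int) error {
	c, err := r(sv.state, msg)
	if err != nil {
		return err
	}
	sv.state = c.NewState
	sv.completed = c.Completed
	sv.outbox = append(sv.outbox, c.Dispatch...)
	return nil
}

// Snapshot returns a copy of the committed state for read-only queries.
func (sv *Supervisor) Snapshot() State { return sv.state }

// consume is an example reaction: it fails when inventory is insufficient and
// records a notification intention when the count reaches zero.
func consume(s State, n int) (Consequences, error) {
	if n > s.Count {
		return Consequences{}, errors.New("insufficient inventory")
	}
	c := Consequences{NewState: State{Count: s.Count - n}}
	if c.NewState.Count == 0 {
		c.Dispatch = append(c.Dispatch, "inventory-depleted")
	}
	return c, nil
}

func main() {
	sv := &Supervisor{state: State{Count: 3}}
	fmt.Println(sv.React(consume, 5) != nil, sv.Snapshot().Count) // true 3
	fmt.Println(sv.React(consume, 3), sv.Snapshot().Count, sv.outbox)
}
```

In a real system the three assignments in `React` would have to be a single durable transaction, and the outbox would be drained asynchronously only after that transaction commits.
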
## Rejected Alternatives (optional)

This model eschews the formal actor concept of “behaviors” for now because the concept may make actor instance behavior less predictable. It can be added later in part by relaxing the one-to-one constraint between reaction functions and message types.