# Distributed Finite State Machines in FTL **Author**: @aat <!-- Status of the document - draft, in-review, etc. - is conveyed via labels --> ## Description FTL's ability to compose functions lends itself well to defining both distributed workflows and FSM's. This document deals with an initial implementation of FSM's. :::info What is the difference between FSM's and workflows? To quote [this](https://workflowengine.io/blog/workflow-engine-vs-state-machine/) page: > In a workflow engine, transitioning to the next step occurs when a previous action is completed, whilst a state machine needs an external event that will cause branching to the next activity. In other words, state machine is event driven and workflow engine is not. ::: ## Goals - Support DFA's only ## Non-goals - We will not support workflows initially - We wil not support NFA's initially - it would be more complex to reason about, but support can be added in the future relatively simply. ## Prerequisites 1. This design requires support for [leases](https://github.com/TBD54566975/ftl/issues/1300) to prevent duplicate execution. 2. Support for [retries](/Cbvh60deTPiW6p8B9op9JQ). 3. It also benefits from [sum types](https://github.com/TBD54566975/ftl/issues/1248) to allow multiple events to transition into a single state. ## Design Given an FSM is a set of transitions in the form `(from, event, to)` we can model each state as a function, and the event as the input to the `to` function. As an example, given these verbs: ```graphql verb from(From) Unit verb to(To) Unit ``` The transition would be `(from, To, to)`. That is, when an event of type `To` is received when in state `from`, a transition would occur to state `to`. FSMs will be defined in code (see below), and represented in the schema (also below). Each "execution" of a state machine definition must be given a unique key that is used to reference all future events. The first event sent with a new unique key will create a new execution. That is, each send of an event must be in the form `(key, event)`. ### Required changes - Add FSM to schema. - A new database table to track FSM execution. - Go API in `go-runtime/ftl` for constructing an FSM. - New endpoint `VerbService.FSMSend()` for sending an event to an FSM, including retry behaviour. - Add a scheduler to `backend/controller/fsm` for controlling the FSM and performing retries. ### Schema evolution With traditional schema evolution it would be very difficult to reason about changes to an FSM over time. Once [versioning](https://github.com/TBD54566975/ftl/issues/1314) lands however, it will be much more straightforward: an FSM will continue execution using the versions of modules/verbs/types it referenced at deploy time. As with normal FTL versioning, extra constraints will need to be applied to persistent storage - databases, queues, etc. ### Go API Given the example FSM from [here](https://blog.allegro.tech/2021/03/state-machines-made-easy.html) visualised as follows: ```mermaid stateDiagram-v2 [*] --> Created: OnlinePaymentCreated [*] --> Paid: OfflineRepaymentPaid Created --> Paid: OnlineRepaymentPaid Created --> Failed: OnlineRepaymentFailed Paid --> Registered: PaymentRegistered Registered --> Completed: PaymentCompleted Paid --> Completed: PaymentCompleted Completed --> Completed: PaymentRegistered ``` The FTL representation of this in Go might look something like: ```go //ftl:sumtype type PaymentPaid interface { paymentPaidSumType() } type OnlinePaymentPaid struct {} func (OnlinePaymentPaid) paymentPaidSumType() {} type OfflinePaymentPaid struct {} func (OfflinePaymentPaid) paymentPaidSumType() {} //ftl:export func Created(ctx context.Context, in OnlinePaymentCreated) error { ... } //ftl:export func Paid(ctx context.Context, in PaymentPaid) error { ... } //ftl:export func Failed(ctx context.Context, in OlinePaymentFailed) error { ... } //ftl:export func Registered(ctx context.Context, in PaymentRegistered) error { ... } // Note: requires sum-type support to be complete //ftl:sumtype type CompletedIn interface { completedSumType() } type PaymentCompleted struct {} func (PaymentCompleted) completedSumType() {} type PaymentRegistered struct {} func (PaymentRegistered) completedSumType() {} //ftl:export func Completed(ctx context.Context, in CompletedIn) error { ... } var paymentFSM = ftl.StateMachine( "payment", // Two start states. ftl.Start(Created), ftl.Start(Paid), ftl.Transition(Created, Paid), ftl.Transition(Created, Failed), ftl.Transition(Paid, Registered), ftl.Transition(Paid, Completed), ftl.Transition(Completed, Completed), ) //ftl:ingress http POST /payment func StartOnlinePayment(ctx context.Context, in OnlinePayment) error { // Sending an event to a start state will create a new execution of the state // machine. All sends require a unique key to identify the execution, in this // case it's the invoice ID. return paymentFSM.Send(ctx, in.InvoiceID, OnlinePaymentCreated{ InvoiceID: in.InvoiceID, Balance: in.Balance, ..., }) } ``` The FTL schema extracted from this will look something like: ```graphql fsm payment { start created start paid transition created to paid transition created to failed transition paid to registered transition paid to completed transition completed to completed } ``` ### SQL Changes The execution of state machines will be modelled in a single table tracking each "execution". Each execution is associated with a unique user-provided key, and created on-demand. Each execution also references a specific deployment in order to support versioning. ```sql CREATE TYPE fsm_status AS ENUM ('running', 'completed', 'failed'); CREATE TABLE fsm_executions ( id BIGINT NOT NULL GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY, created_at TIMESTAMPTZ NOT NULL DEFAULT (NOW() AT TIME ZONE 'utc'), -- Unique user-provided key identifying the run. In the example above this would be the "invoice ID". key TEXT UNIQUE NOT NULL, deployment_id BIGINT NOT NULL REFERENCES deployment(id), name TEXT NOT NULL, status fsm_status NOT NULL DEFAULT 'running'::fsm_status, -- The current state of the state machine. current_state TEXT NOT NULL, ); CREATE INDEX idx_fsm_executions_key ON fsm_executions(key); CREATE INDEX idx_fsm_executions_state ON fsm_executions(status); ``` We'll also introduce a retries table. This will initially be added purley so that states can retry, but at some point in the future we'll extend this to be more flexible and generalised. *[FSM]: Finite State Machine *[NFA]: Non-deterministic Finite Automota - FSM supporting parallel execution *[DFA]: Deterministic Finite Automota