# Distributed Finite State Machines in FTL
**Author**: @aat
<!-- Status of the document - draft, in-review, etc. - is conveyed via labels -->
## Description
FTL's ability to compose functions lends itself well to defining both distributed workflows and FSM's.
This document deals with an initial implementation of FSM's.
:::info
What is the difference between FSM's and workflows? To quote [this](https://workflowengine.io/blog/workflow-engine-vs-state-machine/) page:
> In a workflow engine, transitioning to the next step occurs when a previous action is completed, whilst a state machine needs an external event that will cause branching to the next activity. In other words, state machine is event driven and workflow engine is not.
:::
## Goals
- Support DFA's only
## Non-goals
- We will not support workflows initially
- We wil not support NFA's initially - it would be more complex to reason about, but support can be added in the future relatively simply.
## Prerequisites
1. This design requires support for [leases](https://github.com/TBD54566975/ftl/issues/1300) to prevent duplicate execution.
2. Support for [retries](/Cbvh60deTPiW6p8B9op9JQ).
3. It also benefits from [sum types](https://github.com/TBD54566975/ftl/issues/1248) to allow multiple events to transition into a single state.
## Design
Given an FSM is a set of transitions in the form `(from, event, to)` we can model each state as a function, and the event as the input to the `to` function. As an example, given these verbs:
```graphql
verb from(From) Unit
verb to(To) Unit
```
The transition would be `(from, To, to)`. That is, when an event of type `To` is received when in state `from`, a transition would occur to state `to`.
FSMs will be defined in code (see below), and represented in the schema (also below).
Each "execution" of a state machine definition must be given a unique key that is used to reference all future events. The first event sent with a new unique key will create a new execution. That is, each send of an event must be in the form `(key, event)`.
### Required changes
- Add FSM to schema.
- A new database table to track FSM execution.
- Go API in `go-runtime/ftl` for constructing an FSM.
- New endpoint `VerbService.FSMSend()` for sending an event to an FSM, including retry behaviour.
- Add a scheduler to `backend/controller/fsm` for controlling the FSM and performing retries.
### Schema evolution
With traditional schema evolution it would be very difficult to reason about changes to an FSM over time. Once [versioning](https://github.com/TBD54566975/ftl/issues/1314) lands however, it will be much more straightforward: an FSM will continue execution using the versions of modules/verbs/types it referenced at deploy time. As with normal FTL versioning, extra constraints will need to be applied to persistent storage - databases, queues, etc.
### Go API
Given the example FSM from [here](https://blog.allegro.tech/2021/03/state-machines-made-easy.html) visualised as follows:
```mermaid
stateDiagram-v2
[*] --> Created: OnlinePaymentCreated
[*] --> Paid: OfflineRepaymentPaid
Created --> Paid: OnlineRepaymentPaid
Created --> Failed: OnlineRepaymentFailed
Paid --> Registered: PaymentRegistered
Registered --> Completed: PaymentCompleted
Paid --> Completed: PaymentCompleted
Completed --> Completed: PaymentRegistered
```
The FTL representation of this in Go might look something like:
```go
//ftl:sumtype
type PaymentPaid interface { paymentPaidSumType() }
type OnlinePaymentPaid struct {}
func (OnlinePaymentPaid) paymentPaidSumType() {}
type OfflinePaymentPaid struct {}
func (OfflinePaymentPaid) paymentPaidSumType() {}
//ftl:export
func Created(ctx context.Context, in OnlinePaymentCreated) error { ... }
//ftl:export
func Paid(ctx context.Context, in PaymentPaid) error { ... }
//ftl:export
func Failed(ctx context.Context, in OlinePaymentFailed) error { ... }
//ftl:export
func Registered(ctx context.Context, in PaymentRegistered) error { ... }
// Note: requires sum-type support to be complete
//ftl:sumtype
type CompletedIn interface { completedSumType() }
type PaymentCompleted struct {}
func (PaymentCompleted) completedSumType() {}
type PaymentRegistered struct {}
func (PaymentRegistered) completedSumType() {}
//ftl:export
func Completed(ctx context.Context, in CompletedIn) error { ... }
var paymentFSM = ftl.StateMachine(
"payment",
// Two start states.
ftl.Start(Created),
ftl.Start(Paid),
ftl.Transition(Created, Paid),
ftl.Transition(Created, Failed),
ftl.Transition(Paid, Registered),
ftl.Transition(Paid, Completed),
ftl.Transition(Completed, Completed),
)
//ftl:ingress http POST /payment
func StartOnlinePayment(ctx context.Context, in OnlinePayment) error {
// Sending an event to a start state will create a new execution of the state
// machine. All sends require a unique key to identify the execution, in this
// case it's the invoice ID.
return paymentFSM.Send(ctx, in.InvoiceID, OnlinePaymentCreated{
InvoiceID: in.InvoiceID,
Balance: in.Balance,
...,
})
}
```
The FTL schema extracted from this will look something like:
```graphql
fsm payment {
start created
start paid
transition created to paid
transition created to failed
transition paid to registered
transition paid to completed
transition completed to completed
}
```
### SQL Changes
The execution of state machines will be modelled in a single table tracking each "execution". Each execution is associated with a unique user-provided key, and created on-demand. Each execution also references a specific deployment in order to support versioning.
```sql
CREATE TYPE fsm_status AS ENUM ('running', 'completed', 'failed');
CREATE TABLE fsm_executions (
id BIGINT NOT NULL GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY,
created_at TIMESTAMPTZ NOT NULL DEFAULT (NOW() AT TIME ZONE 'utc'),
-- Unique user-provided key identifying the run. In the example above this would be the "invoice ID".
key TEXT UNIQUE NOT NULL,
deployment_id BIGINT NOT NULL REFERENCES deployment(id),
name TEXT NOT NULL,
status fsm_status NOT NULL DEFAULT 'running'::fsm_status,
-- The current state of the state machine.
current_state TEXT NOT NULL,
);
CREATE INDEX idx_fsm_executions_key ON fsm_executions(key);
CREATE INDEX idx_fsm_executions_state ON fsm_executions(status);
```
We'll also introduce a retries table. This will initially be added purley so that states can retry, but at some point in the future we'll extend this to be more flexible and generalised.
*[FSM]: Finite State Machine
*[NFA]: Non-deterministic Finite Automota - FSM supporting parallel execution
*[DFA]: Deterministic Finite Automota