Try   HackMD

Scopes and effect handlers

WARNING This is an experimental feature and syntax, you should not count on it being supported by future releases of OCaml! Proceed with caution!

This document was written because I learned something surprisingly obvious about effect handlers. It may help others since I haven't seen this blind spot discussed before.

While this begins with a quick introduction to effects, I am going to skip on a lot of details so you should get your hands dirty if you are not already familiar with the topic. It's really fun and if you want to follow along, this local environment works perfectly:

$ mkdir effects && cd effects
effects/ $ opam switch create '.' 'ocaml-variants.4.12.0+domains+effects'
effects/ $ eval $(opam env)

The effect system makes good use of advanced OCaml features, so I will try to explain them as we go. You may want to read https://github.com/ocamllabs/ocaml-effects-tutorial as a better introduction and https://github.com/ocaml-multicore/effects-examples to get hyped. You can also try eio from this local switch.

But if you are already familiar with those topics, then the trickery is near the end!

Let's begin! When playing with "pure" effects, you will see this weird pattern a lot: A match within parenthesis, where each case produces a lambda. The initial arguments of the lambda are given at the very end after the parenthesis. Each continue expects to receive the arguments of the lambda and is an opportunity to update them.

( match ... with
  | result -> (* executed last *)
      (fun _xN _yN -> result)

  | effect (E ...) k -> (* in the middle *)
      (fun x1 y1 ->
        ...
        continue k (response)  x2 y2
        ...)
)
  x0  (* initial state *)
  y0

As you can see, the flow is a bit reversed but you will get used to it very fast in practice.

Performing an effect is a lot like raising an exception: The corresponding case in the match will receive the argument of perform and a continuation k.

  • If you ignore k, then the effect behaves exactly like a standard exception.
  • But you can also call continue k with an argument, in which case the original perform will evaluate to that argument and the program will continue as if the effect never happened.

The only catch is that you can only continue the continuation once.

As a simple example, let's implement a "counter" effect. Each time we perform the Incr effect, we will receive the next natural number:

effect Incr : int

let with_counter fn =
  (match fn () with
   | result ->
      (fun _ -> result)

   | effect Incr k ->
      (fun (counter : int) -> continue k counter (counter + 1))
  )
  0

The with_counter takes a function as argument and calls it within the match ... with. This ensures that effects performed by fn will be caught and handled.

Notice how the continue takes both counter and counter + 1? The first one is the response, while the second is updating our counter state for the next time. The initial state of the counter is 0, provided on the very last line.

Once this function is defined, you can use it as follow: We first set up the with_counter handler, then each perform Incr triggers this effect handler and receive the new value of the counter.

with_counter (fun () ->
  Printf.printf "%i\n" (perform Incr) ; (* 0 *)
  Printf.printf "%i\n" (perform Incr) ; (* 1 *)
  Printf.printf "%i\n" (perform Incr) ; (* 2 *)
  Printf.printf "%i\n" (perform Incr) ; (* 3 *)
)

If we forget to wrap this code within with_counter, then we will get an Unhandled exception!

Side note: In OCaml there are two cute syntaxes when you have a function that takes a function as a last argument. The most common is the application (@@) operator:

with_counter @@ fun () ->
Printf.printf "%i\n" (perform Incr) ; (* 0 *)
Printf.printf "%i\n" (perform Incr) ; (* 1 *)

But you can also define the ( let@ ) syntax notation:

let ( let@ ) f x = f x ;;

let@ () = with_counter in
Printf.printf "counter = %i\n" (perform Incr) ; (* 0 *)
Printf.printf "counter = %i\n" (perform Incr) ; (* 1 *)

Both can drastically reduce indentation and help clarify your code!

Back to effects: If we use multiple with_counter in our code, each one will start from 0 and if we nest them, it's always the nearest one that will handle the Incr effect:

with_counter (fun () ->
  Printf.printf "counter = %i\n" (perform Incr) ;  (* 0 *)
  Printf.printf "counter = %i\n" (perform Incr) ;  (* 1 *)
  with_counter (fun () ->
    Printf.printf "nested = %i\n" (perform Incr) ; (* 0 *)
    (* we can't ask the outer counter from here *)
  ) ;
  Printf.printf "counter = %i\n" (perform Incr) ;  (* 2 *)
)

How could we reach the outer counter from the nested one? The question is a bit weird because something is missing: We have no way of telling the system that we want to perform this operation. We would need some way of "naming" which counter we want to increment such that we could write this code:

with_counter (fun x ->
  Printf.printf "x = %i\n" (x ()) ; (* 0 *)
  Printf.printf "x = %i\n" (x ()) ; (* 1 *)
  with_counter (fun y ->
    Printf.printf "y = %i\n" (y ()) ; (* 0 *)
    (* we can now ask the outer counter! *)
    Printf.printf "x = %i\n" (x ()) ; (* 2 *)
  ) ;
  Printf.printf "x = %i\n" (x ()) ; (* 3 *)
)

Let's implement this new with_counter. It's exactly the same code as before, but with a twist: We don't use the global Incr effect anymore, but rather we create a new local effect specific to each counter.

let with_counter fn =
  let open struct
    effect Local_incr : int
  end in
  (match fn (fun () -> perform Local_incr) with
   | result ->
      (fun _ -> result)

   | effect Local_incr k ->
      (fun (counter : int) -> continue k counter (counter + 1))
  )
  0

The let open struct ... end in syntax allow us to dynamically create new types, exceptions, and effects, that will only be available locally. We pass the counter as (fun () -> perform Local_incr) to the user function fn and handle its effects as before. When the user calls x (), it performs the corresponding Local_incr for which there is only one handler available so there's no confusion as to which counter should be incremented.

This might not be entirely obvious: when we create multiple counters, we define multiple Local_incr effects that have the same "name" in the code, but at runtime each one will be unique. We setup one effect handler for each of our counter, but since they match on different effects, they can't intercept messages intended to another.

If all of this makes sense, let's move on to implementing references. This isn't so different from the counter example, since both allow us to query and update state. The big difficulty is that references are not limited to storing int values.. we want them to be polymorphic.

Here is an example usage, showing that we can create references for different types of values:

let@ x = with_eref 42 in
let@ y = with_eref "hello" in
Printf.printf "x = %i, y = %S\n" (x.get ()) (y.get ()) ;  (*  42, "hello" *)
x.put 666 ;
Printf.printf "x = %i, y = %S\n" (x.get ()) (y.get ()) ;  (* 666, "hello" *)
y.put "bye" ;
Printf.printf "x = %i, y = %S\n" (x.get ()) (y.get ()) ;  (* 666, "bye" *)

Once again, we will hide the actual effects from the user: This time there are two operations get and put, so we package them in a record:

type 'a eref = (* the type of an effectful reference holding an ['a] value *)
  { get : unit -> 'a
  ; put : 'a -> unit
  }

The effect handler uses the same trick of locally defined effects as the counter and the code is very similar:

let with_eref
: type a.  a -> (a eref -> 'b) -> 'b
= fun initial_value fn ->

  (* local effects for this ['a] reference *)
  let open struct
    effect Get : a
    effect Put : a -> unit
  end in

  (* hide the details from the user *)
  let eref =
    { get = (fun () -> perform Get)
    ; put = (fun v -> perform (Put v))
    }
  in

  (match fn eref with
  | x ->
      (fun _ -> x)

  | effect Get k ->
      (fun (current_value : a) -> continue k current_value current_value)

  | effect (Put new_value) k ->
      (fun _ -> continue k () new_value)

  ) initial_value

The last lines are pretty important since they define the semantic of our references, but there is nothing new here.

The main difficulty is convincing the typechecker that references can hold different types of values. It happens on the second line where we specify the type of with_eref:

let with_eref
: type a.  a -> (a eref -> 'b) -> 'b

The type a. notation is explicitly introducing the polymorphic type variable a (when we would normally use 'a). Most importantly, this notation binds the type variable a for the rest of the code, so we are able to define local effects that depend on it:

let open struct
  effect Get : a
  effect Put : a -> unit
end in

(You might want to try removing type a and using 'a instead to see the difference!)

Side note: There's an alternative notation for binding type variables if we don't want to spell out the full type of the function with_eref, but it comes with a lot of gotchas so I don't recommend it:

let with_eref (type a) fn = ...

It's very impressive how algebraic effects are able to benefit from advanced features of OCaml that were not created for them!


While I hope this introduction was clear I must now reveal that those examples are actually a really bad use of algebraic effects.

The counters and references might have given you the wrong idea about algrebraic effects: They are not a real replacement for impure mutable side effects! In fact, they come with a very strong limitation when compared to the standard references of OCaml.

Remember the Unhandled exception when we try to perform an effect without first setting up a handler?

let escape = with_eref 42 (fun r -> r) in
escape.put 666

The effectful reference is created by with_eref but then it escapes its effect handler. When we then try to update the eref, the Put 666 effect is not handled by anyone and crashes the program.

In other words, the "pure" references are only usable from within the scope of their handler. A standard OCaml ref on the other hand will always be available until it is collected by the GC, but at this point if was already clear that it wasn't going to be used again.

The typesystem can't yet protect us from performing effects outside their handler, so you have to be careful. You might think that it's pretty easy to visually check that escape.get () only happens within the with_eref, but that's exactly why I'm writing this document. It's not!

In the mean time, the use of algebraic effects to simulate impure references kind of misses the point. Effects are a lot more useful for scripting control flow rather than value flow. Typically, this will involve changing the normal execution of our programs to create enumerators, coroutines, async/await, green thread, etc, as libraries. And it's going to be insanely cool!

This article is already too long to present a real scheduler as a more interesting use of algebraic effects. But we can cheat and create a really crappy one!

effect Spawn : (unit -> unit) -> unit

The Spawn (fun () -> ...) effect is intended to create a new thread of execution that will execute concurrently with the rest of our code:

let spawn f = perform (Spawn f) ;;

spawn (fun () -> sleep 60 ; print_endline "A") ;
spawn (fun () -> print_endline "B") ;

But since we won't do a real scheduler, we'll just evaluate the spawned thread right away before moving on:

let with_spawn fn =
  match fn () with
  | v -> v
  | effect (Spawn child) k ->
      child () ;
      continue k ()

(A real implementation would store the spawned child somewhere, to evaluate it later, with an additional Yield effect to cooperatively switch between threads)

It may not be obvious, but even this simple handler has a small bug: A spawned child can't itself perform a Spawn effect! That's easy enough to fix:

let rec with_spawn fn =
  match fn () with
  | v -> v
  | effect (Spawn child) k ->
      with_spawn child ;
      continue k ()

Now let's see how this works with the effectful references we defined before. We are not doing anything fancy, so it's easy to see that the following spawn will execute before we try printing the value of the reference:

with_spawn @@ fun () ->
  with_eref 42 @@ fun x ->
    spawn (fun () -> x.put 666) ;
    Printf.printf "x = %i\n" (x.get ()) (* 666 or 42? *)

Clearly, this should give us 666 right? Well The function (fun () -> x.put 666) will be executed by the with_spawn handler, outside of the with_eref handler! As a result, its Put 666 effect will not be handled by anyone and will crash with an Unhandled exception. The fact that the function (fun () -> x.put 666) is defined inside the with_eref doesn't matter at all since we performed a spawn effect to evaluate it in the scheduler (1).

The correct order of effect handlers should be reversed:

with_eref 42 @@ fun x ->
  with_spawn @@ fun () ->
    spawn (fun () -> x.put 666) ;
    Printf.printf "x = %i\n" (x.get ())

But that's just weird: we would normally setup the scheduler at the beginning of our main function, then setup the local effect handlers for the shorter lived effects like references.

And what would be the fix when there's a dynamic number of references? While the following code isn't great with a real scheduler because we don't wait for the threads to terminate before reading their outcome, that's not the reason why it doesn't work here:

with_spawn @@ fun () ->

  let rec parmap f acc = function
    | [] ->
        List.map (fun r -> r.get ()) acc
        
    | x::xs ->
        let@ r = with_eref None in
        spawn (fun () -> r.put (Some (f x))) ; (* Unhandled! *)
        parmap f (r :: acc) xs
  in
  parmap (fun x -> x * x) [1;2;3]

I want to state again that this is a self-injured issue, the problem doesn't exist with real references.

But it is however a control flow issue! The with_eref are not executed at the right time during the lifetime of the program And we can solve this with more effects!

type 'a cc = { cc : 'b. (('b -> 'a) -> 'a) -> 'b }

let with_scope (type a) fn =
  let open struct
    effect Lift : (('b -> a) -> a) -> 'b
  end in
  match fn { cc = fun s -> perform (Lift s) } with
  | v -> v
  | effect (Lift s) k -> s (fun x -> continue k x)

This combines a bit of everything we saw before, and even adds an explicit forall with the type cc. The type trickery is required to maximize the usefulness of this handler, as otherwise the type would be too restrictive for some programs (2).

This is dark magic from scheme sorcerers, so it's easier to understand what's going on by looking at its usage. The with_scope allow us to add dynamic effect handlers around the scheduler:

with_scope @@ fun lift ->
with_spawn @@ fun () ->

  let x =
    lift.cc @@ fun return ->
    with_eref 42 @@ fun r -> return r
  in

  spawn (fun () -> x.put 666) ; (* works! *)

  Printf.printf "x = %i\n" (x.get ()) (* 666 *)

The lift.cc will apply the with_eref effect handler at the position of the with_scope, at which point execution will continue normally. When the scheduler executes the spawned function, it does so at the position of the with_spawn so our reference effects will work correctly since the with_eref was lifted just above!

So even though it looks like we are using the reference x outside the scope of its with_eref, it actually became globally available. This lifting isn't free however: Even if the variable x can be reclaimed at some point, its referenced value and its effect handler will be held until we exit with_scope.

In conclusion, you can clearly do some insane stuff with effects!.. But it wasn't obvious to me that it would be so hard to tell when an effect handler is in scope.

(1) Here's a minimal example of escaping the scope of an event handler. The continuation of A doesn't capture the B effect handler, so its evaluation from C triggers an Unhandled exception:

effect A : unit
effect B : unit
effect C : (unit, unit) continuation -> unit eff

let () =
  match
    match
      match perform A ; perform B with
      | () -> ()
      | effect A k -> perform (C k)
    with
    | () -> ()
    | effect B k -> continue k ()
  with
  | () -> ()
  | effect (C child) k ->
      continue child () ; (* Unhandled B! *)
      continue k ()

This is not a bug with effects, but a feature! They are really one-shot delimited continuation, so keeping the B handler in scope would require duplicating the stack.

(2) If you add the creation of another reference with a different type, you will quickly discover why we need the forall in cc:

  let y =
    lift.cc @@ fun return ->
    with_eref "hello" @@ fun r -> return r
  in
  ...