---
id: lazy
title: Lazy
description: >
  Learn about OCaml's Lazy module
category: "Introduction"
level: "beginner"
---

## Introduction

OCaml's evaluation strategy is eager by default, meaning that every expression is computed as soon as it is defined. However, in many situations we may want to delay a computation until its result is actually needed. This technique is known as *lazy evaluation*. The `Lazy` module in OCaml offers a systematic way to create, manage, and force such deferred computations.

Lazy evaluation can be useful for:

- **Performance**: Deferring expensive computations until their results are needed
- **Infinite Data Structures**: Defining streams or infinite sequences where only a finite prefix is demanded
- **Caching**: Ensuring that a computation is performed only once and its result is reused thereafter

If you're interested in handling lazy sequences of data, consider exploring the related [`Seq`](/manual/5.3/api/Seq.html) module and our [lesson on `Sequences`](/docs/sequences).

## Evaluation Strategies

Programming languages use different **evaluation strategies** to determine *when* and *how* expressions are evaluated. A solid understanding of these strategies can help us reason about program behavior, optimize performance, and choose when lazy evaluation is appropriate.

There are two separate contexts in which we can consider evaluation:

1. **Expression evaluation strategy**: how expressions are evaluated in general, including top-level bindings. Expression evaluation strategies include:
   - Eager/Strict Evaluation
   - Lazy Evaluation
2. **Function argument evaluation strategy**: how arguments are evaluated when passed to functions. Function argument evaluation strategies all share a common prefix and begin with *call-by-\**:
   - Call-by-Value
   - Call-by-Name
   - Call-by-Need

While these often align, they are conceptually distinct.
For example, it is conceivable that a language may evaluate expressions eagerly but use different strategies for evaluating function arguments. As we will soon discover, OCaml's default evaluation strategies are *eager* and *call-by-value*. However, we can opt into *lazy* and *call-by-need* evaluation using the `Lazy` module, and similarly we can opt into *call-by-name* evaluation by using **thunks**.

### Summary of Evaluation Strategies

OCaml uses **eager evaluation** and **call-by-value** by default: all expressions are evaluated as soon as they are bound, and function arguments are computed before the function executes.

However, OCaml also provides native support for **lazy evaluation** through the `Lazy` module. By wrapping expressions with `lazy` and accessing them with `Lazy.force`, OCaml defers evaluation until needed and **memoizes** the result. This behavior corresponds to **call-by-need**, an optimized form of call-by-name.

OCaml does not natively support pure **call-by-name**, but it can be emulated using thunks (`unit -> 'a`).

| **Concept**          | **Supported in OCaml?** | **How to Use or Emulate**                               |
|----------------------|-------------------------|---------------------------------------------------------|
| **Eager Evaluation** | Yes (default)           | Implicit; all expressions are evaluated immediately.    |
| **Lazy Evaluation**  | Yes (via `Lazy`)        | Use `lazy` and `Lazy.force`.                            |
| **Call-by-Value**    | Yes (default)           | Implicit; function arguments evaluated before the call. |
| **Call-by-Name**     | No (not native)         | Emulate using thunks (`unit -> 'a`).                    |
| **Call-by-Need**     | Yes (via `Lazy`)        | Use `lazy` and `Lazy.force` (memoized evaluation).      |

### Eager Evaluation (also known as *strict evaluation*)

In eager evaluation, expressions are computed as soon as they are bound to a variable or passed as arguments to a function.
```ocaml
# let expensive_computation () = 1
  let x = expensive_computation ();; (* Computed immediately *)
val expensive_computation : unit -> int = <fun>
val x : int = 1
```

**Pros**:
- Simple, predictable execution model.
- Easier debugging, especially with side effects.

**Cons**:
- May perform unnecessary computations.
- Not ideal for defining infinite data structures.

*OCaml’s default expression evaluation strategy is eager evaluation.*

### Lazy Evaluation (also known as *non-strict evaluation*)

With lazy evaluation, expressions are not computed until their results are actually needed.

```ocaml
# let x = lazy (expensive_computation ()) (* Deferred computation *)
  let result = Lazy.force x;; (* Computation performed here *)
val x : int lazy_t = lazy 1
val result : int = 1
```

In OCaml, lazy values are *memoized*. Once computed, the result is cached.

**Pros**:
- Prevents unnecessary computations.
- Enables infinite data structures (e.g., streams).
- Improves performance for expensive or infrequent computations.

**Cons**:
- Harder to predict evaluation order.
- Debugging is more challenging due to deferred side effects.

### Call-by-Value

Call-by-value is a **function argument evaluation strategy** where arguments are evaluated *before* the function is called.

```ocaml
# let f x = x + 1
  let result = f (expensive_computation ());; (* Argument is computed before f *)
val f : int -> int = <fun>
val result : int = 2
```

**Pros**:
- Predictable behavior; aligns with eager evaluation.
- Arguments are only computed once, before the call.

*OCaml uses call-by-value by default for function arguments.*

**Note**: While *eager evaluation* applies to all expressions (including top-level bindings), *call-by-value* specifically refers to the evaluation of function arguments. They are closely related but conceptually distinct.

### Call-by-Name

In call-by-name, function arguments are not evaluated until they are actually used in the function body.
OCaml does not support call-by-name natively, but it can be emulated using **thunks** (functions of type `unit -> 'a`):

```ocaml
# let f x_thunk = if true then x_thunk () + 1 else 0
  let result = f (fun () -> expensive_computation ());;
val f : (unit -> int) -> int = <fun>
val result : int = 2
```

**Pros**:
- Avoids computing arguments that might not be used.
- Useful for implementing control-flow constructs like short-circuiting.

**Cons**:
- Arguments used multiple times are recomputed each time.
- Requires manual wrapping in thunks, which adds verbosity.

**Note**: While *lazy evaluation* defers any expression's computation until needed (and memoizes it), *call-by-name* specifically delays **function argument evaluation** without caching.

### Call-by-Need (Lazy Evaluation with Memoization)

Call-by-need is an optimization of call-by-name: evaluation is delayed until needed, and the result is cached to avoid recomputation.

OCaml’s `Lazy` module implements call-by-need semantics:

```ocaml
# let f x_lazy =
    let a = Lazy.force x_lazy in (* Computed once here *)
    let b = Lazy.force x_lazy in (* Cached result reused *)
    a + b
  let result = f (lazy (expensive_computation ()));;
val f : int lazy_t -> int = <fun>
val result : int = 2
```

**Pros**:
- Avoids unnecessary computations.
- Ensures expensive computations happen at most once.

## Deferring Computation with Thunks and Closures

To better understand how `Lazy` works, we will introduce **thunks** and **closures**, then compare and contrast them with `Lazy`.

**Thunks**

A *thunk* is a common strategy for deferring a computation by wrapping it in a function that takes `unit` as its only argument.

To demonstrate this strategy, let's create a computation that is eagerly evaluated, then convert it to a deferred computation using a thunk:

``` ocaml
# let random_number = Random.int 100;;
val random_number : int = 42
```

Under an eager evaluation strategy, `Random.int 100` is computed and bound to `random_number`.
In effect, once this binding has been evaluated, `random_number` behaves as if we had written:

``` ocaml
# let random_number = 42;;
val random_number : int = 42
```

This eager evaluation of `Random.int 100` is visible in Utop's response, `val random_number : int = 42`, where the value has been computed upfront and bound to the name.

If we wish to defer the evaluation of `Random.int 100`, we can wrap the computation in an anonymous function:

``` ocaml
# let deferred_random_number = fun () -> Random.int 100;;
val deferred_random_number : unit -> int = <fun>
```

Because the computation is wrapped in a function, it will be deferred until explicitly called. This is visible in the toplevel output, where the value of `deferred_random_number` is shown as `<fun>`, Utop's way of displaying functions in the REPL without revealing their implementation. The type signature `unit -> int` is typical for thunks, as they delay computation without requiring input arguments.

To use this thunk and evaluate its computation at a call-site, we can call it by providing it a `unit` argument:

``` ocaml
# deferred_random_number ();;
- : int = 42
```

Every time we call this thunk, `Random.int 100` runs again and a new random number is returned, demonstrating that the computation is performed at each call rather than once at definition time. For example:

``` ocaml
# deferred_random_number ();;
- : int = 2
```

**Comparing Thunks to `Lazy`**

In OCaml, lazy values are not simple thunks. Instead, they serve as an abstraction over thunks that automatically provides memoization. This means that once a lazy value has been forced, its result is stored and reused in subsequent accesses, preserving referential transparency.

**Closures**

Closures, like thunks, delay computation, but they can also retain state from their surrounding environment. This makes them more powerful because they can store results, enabling memoization.
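
Before looking at memoization, it may help to see a closure retaining state in the simplest possible setting. The sketch below is only an illustration: `make_counter`, `next`, and `other` are hypothetical names introduced here, not part of the standard library. Each call to `make_counter` allocates a fresh reference that the returned function captures and updates.

```ocaml
(* [make_counter] is a hypothetical helper used only for illustration. *)
let make_counter () =
  let count = ref 0 in   (* state captured by the closure below *)
  fun () ->
    incr count;          (* update the captured state on every call *)
    !count

let next = make_counter ()
let first = next ()      (* 1 *)
let second = next ()     (* 2: the closure remembered the previous call *)

(* A second counter gets its own, independent captured state. *)
let other = make_counter ()
let other_first = other ()   (* 1 *)
```

The same mechanism, a reference captured by a function of type `unit -> 'a`, is what makes hand-written memoization possible.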
Let's define a function that generates a memoized random number:

``` ocaml
# let make_memoized_random () =
    let result = ref None in
    fun () ->
      match !result with
      | Some n -> n
      | None ->
        let n = Random.int 100 in
        result := Some n;
        n;;
val make_memoized_random : unit -> unit -> int = <fun>
```

Now, we can use it to create a memoized computation:

``` ocaml
# let memoized_random = make_memoized_random ();;
val memoized_random : unit -> int = <fun>
# memoized_random ();;
- : int = 21 (* Computed once *)
```

Now, if we call the function again, we will get the same result:

``` ocaml
# memoized_random ();;
- : int = 21 (* Cached, no recomputation *)
```

### **Closures vs. `Lazy`**

1. **Deferred Computation**
   - Both closures and `Lazy` delay computation until explicitly forced.
2. **Memoization**
   - `Lazy` automatically memoizes, whereas closures require explicit storage via references.
3. **Flexibility**
   - Closures can capture variables and work in general cases beyond deferred computation.

Using closures, we can replicate the key aspects of `Lazy`, though `Lazy` is more efficient and built-in.

## Creating Lazy Values Using the `Lazy` Module

There are three primary ways to create a lazy value in OCaml:

1. The `lazy` keyword.
2. `Lazy.from_val`.
3. `Lazy.from_fun`.

Generally, it is recommended to use the `lazy` keyword unless we require specific optimizations or behavior and fully understand the implementation details of `Lazy`.

### Using the `lazy` Keyword

The `lazy` keyword is syntactic sugar for creating lazy values. Its expression is not evaluated immediately; instead, it is packaged into a lazy value of type `'a lazy_t`.

```ocaml
# let lazy_value_1 = lazy (expensive_computation ());;
val lazy_value_1 : int lazy_t = <lazy>
```

## Forcing a Lazy Evaluation

To evaluate (or *force*) a lazy value, use `Lazy.force`. When forced, one of two outcomes occurs:

1. If the lazy value has not been computed, the deferred computation runs and its result is cached.
2. If the value has already been computed, the cached result is immediately returned.

For example:

``` ocaml
# let lazy_value_2 = lazy (print_endline "Computing..."; 42);;
val lazy_value_2 : int lazy_t = <lazy>
```

The first call to force:

``` ocaml
# let result1 = Lazy.force lazy_value_2;;
Computing...
val result1 : int = 42
```

prints "Computing..." and returns `42`. Subsequent calls to `Lazy.force lazy_value_2` will simply return `42` without printing.

## Working with Lazy Values

To determine if a lazy expression has already been evaluated, use the `Lazy.is_val` function. Since `lazy_value_2` was forced above, it reports `true`:

```ocaml
# let is_evaluated = Lazy.is_val lazy_value_2;;
val is_evaluated : bool = true
```

### Mapping over Lazy Values

The `Lazy` module also offers a `Lazy.map` function, which applies a function to the result of a lazy value, returning a new lazy value.

``` ocaml
# let lazy_value = lazy (42)
  let mapped_lazy = Lazy.map (fun x -> x * 2) lazy_value
  let result = Lazy.force mapped_lazy;;
val lazy_value : int lazy_t = lazy 42
val mapped_lazy : int lazy_t = lazy 84
val result : int = 84
```

## Pattern Matching on Lazy Values

When we pattern-match using `lazy (pattern)`, OCaml forces the lazy expression to determine its structure. If the value hasn't been computed yet, it will be computed during pattern matching.

Example:

``` ocaml
# let expensive_computation () = None
  let lazy_x = lazy (expensive_computation ())
  let lazy_option_map f x =
    match x with
    | lazy (Some v) -> Some (Lazy.force f v)
    | _ -> None
  let result =
    lazy_option_map (lazy (fun x -> failwith "Should not be evaluated")) lazy_x;;
```

**Note**: When `lazy (pattern)` appears in a match, OCaml may force the lazy value in order to test a branch, even if the branch finally selected contains no lazy sub-pattern. Because lazy values are memoized, the underlying computation still runs at most once, but the point at which it is forced becomes less obvious.
To keep the evaluation point explicit, force the lazy value once before pattern matching:

``` ocaml
let lazy_option_map f x =
  match Lazy.force x with
  | Some v -> Some (Lazy.force f v)
  | None -> None
```

## Real-World Applications of `Lazy`

The `Lazy` module in OCaml is widely used in scenarios where deferred computation is advantageous:

1. **Performance Optimization**: Defer expensive computations until necessary (e.g., loading large configuration files).
2. **Infinite Data Structures**: Create infinite sequences like streams without evaluating every element upfront (consider using the `Seq` module).
3. **Memoization**: Cache results of expensive function calls to avoid redundant recomputation.
4. **Lazy Initialization**: Delay global resource creation (e.g., singletons, database connections) until first accessed.
5. **Conditional Computation in Data Pipelines**: Postpone computations until a particular stage in a pipeline requires them.
6. **Functional Reactive Programming (FRP) & UI Frameworks**: Defer expensive recomputations until triggered by user interactions.
7. **Parsing and Lexing**: Implement lazy lexers to avoid processing entire files upfront.
8. **Lazy Backtracking**: Support non-deterministic computations in search algorithms and logic programming.
9. **On-Demand Logging and Debugging**: Construct log messages only when logging is enabled to avoid unnecessary work.
10. **Converting Imperative to Functional Data Structures**: Laziness enables efficient functional equivalents of imperative data structures, such as lazy queues, deques, and search trees, by deferring costly operations until necessary, as demonstrated in Okasaki’s [Purely Functional Data Structures](https://www.cs.cmu.edu/~rwh/students/okasaki.pdf).

## Conclusion

The `Lazy` module in OCaml is a powerful facility for deferring computations until needed. By wrapping expressions in lazy values, we gain fine control over evaluation order, enable infinite data structures, and improve performance by caching results.
However, while lazy evaluation offers significant advantages in certain contexts, it also introduces complexities in debugging and memory management. For more details, see the [OCaml Standard Library Lazy documentation](/manual/latest/api/Lazy.html).
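
As a closing illustration of the infinite data structures mentioned throughout this tutorial, here is a minimal sketch of a lazy stream. The type `stream` and the functions `from` and `take` are hypothetical names chosen for this example; they are not part of the standard library, which offers `Seq` for this purpose. The sketch assumes only `lazy` and `Lazy.force`.

```ocaml
(* A hypothetical stream type: the tail of each cell is a suspended computation. *)
type 'a stream = Cons of 'a * 'a stream Lazy.t

(* [from n] describes the infinite stream n, n+1, n+2, ...
   Only the first cell is built; the rest is deferred by [lazy]. *)
let rec from n = Cons (n, lazy (from (n + 1)))

(* [take k s] returns the first [k] elements of [s] as a list,
   forcing one tail per element taken. *)
let rec take k (Cons (x, rest)) =
  if k <= 0 then [] else x :: take (k - 1) (Lazy.force rest)

(* Only a finite prefix of the infinite stream is ever computed. *)
let first_five = take 5 (from 0)   (* [0; 1; 2; 3; 4] *)
```

Because each tail is a `Lazy.t`, a cell that has been forced once is cached, so traversing the same stream again does not rebuild it.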