# Lecture 14: Modularity
:::warning
Announcements
- **Please sit in the first 2 rows of the room**.
- `A6: Types II` deadline extended to Friday (late deadline Sunday)
- Quiz 2 next week Thursday, 11/6
- Practice quiz released by tomorrow
- Progress reports (from me) Friday, mid-semester survey (from you) due Tuesday
- Next help hours:
- Emily's 3-5pm tomorrow
:::
:::success
Agenda for today
- More on streams: abstraction for infinite data using thunks
- Access the current value of streams
- Building streams for infinite data
- Modularity
- Procedural abstraction
- General modularity, goals
- `module`, `struct`, `sig`
- Richer Abstract Data Types (ADTs)
:::
See the [previous class's lecture notes](https://hackmd.io/@avh/rkHKb7pAgx) for a walkthrough of the stream data type.
# Group exercises
:::success
Problem 1: Write a function "take_under_10" that takes a stream of integers and produces a list of numbers as long as the numbers are less than 10.
:::
:::spoiler Think, then click!
```ocaml
(* Write a function "take_under_10" that takes a stream of integers
and produces a list of numbers as long as the numbers are less than 10 *)
let rec take_under_10 (Stream s) =
let (x, xs) = s () in
if x < 10 then x :: take_under_10 xs
else []
```
:::
:::success
Problem 2: Write a stream for the powers of two.
:::
:::spoiler Think, then click!
```ocaml
(* Write a stream for the powers of two *)
let powers_of_two =
let rec go start = (start,
Stream (fun () -> go (2 * start))) in
Stream (fun () -> go 1)
```
:::
Next, we'll consider the general programming languages concept of *modularity*.
---
# Modularity
## Procedural abstraction
Consider these different OCaml functions:
```ocaml
let double x = x * 2
let double x = x + x
let y = 2
let double x = x * y
let double x =
let rec help n acc =
if n = 0 then acc else help (n - 1) (acc + 1)
in
help x x
```
:::success
Can you tell which one is called here?
```ocaml
utop # double 4 ;;
- : int = 8
```
:::
:::spoiler Think, then click!
No, we can see the behavior of `double`, but not its exact implementation.
:::
In this sense, procedures or functions have hidden implementations. *Procedural abstraction* is the idea that consumers of our function usually only need to know its input-output behavior, not the exact implementation.
This idea---of distinguishing between the implementation and the interface---applies more broadly than just functions.
For large software projects (which dominate real-world applications), it's important for code to be separable into individual components. Whereas a class project may be a few thousand lines of code, industrial codebases are tens of thousands to *millions* of lines of code (e.g., the Linux operating system kernel is over 40 million of lines of code). It would be impossible for a single engineer to keep an entire project of that size in their mind as they worked.
This motivates the idea of **modularity**: separating code into individual components that focus on specific concerns, such that each component can be implemented separately.
Modularity allows teams of software engineers to collaborate on large projects, often too large for a single engineer to understand the entirety of.
A code *module* is a discrete component of code that can be developed separately.
A module interacts with the world via its declared interface, which is an abstract description of the input/output behavior of the module (a "what can the module do", not a "how does the module do it"?).
This connects to the idea of *encapsulation*, where a programmer made choose to hide implementation details from the public interface of a module.
This gives the people implementing the actual guts of the module the flexibility to change the underlying implementation, without breaking code that depends on that module.
Different programming languages provide different specific mechanisms for modularity. However, they share two core features:
1. **Namespaces**. Namespaces allow us to group functionality into distinct scopes. For example, `List.fold_left` and `Array.fold_left` both are functions with the name of `fold_left`, but the namespace (of either `List` or `Array`) tells us which specific implementation of `fold_left` we mean.
Namespaces are essential as code repositories grow: it would be impossible for different engineers across a large company (or even different engineers across different companies) to coordinate on having totally unique function names across a project. With namespaces (combined with some conventions on naming), the same function name can be used in different contexts without issues.
In Java, for example, packages and classes are parts of the namespace mechanism.
2. **Abstraction**. Abstraction is the process of removing details to focus on the need-to-know components of a system. For modularity, abstraction is important because it allows consumers to understand how to interact with a module without needing to comprehend its exact implementation.
In Java, for example, interfaces provide abstractions over class implementations.
# OCaml `struct`s and `module`s
OCaml provides modularity through its `module` and `struct` keywords. A `struct` groups together bindings. Structs are not themselves values that can be passed around: instead, they must be bound to a name with the `module` keyword.
```ocaml
module MyMathLib = struct
let rec fact x = match x with
| 0 -> 1
| x -> x * fact (x - 1)
let tau = Float.pi *. 2.0
let double x = x * 2
exception DivByZero
type sign = Neg | NonNeg
end
let facts = List.map MyMathLib.fact [1; 3; 5; 7; 9]
```
The general form of a module declaration is:
```ocaml
module Name = struct
(* bindings *)
end
```
If we run this in `utop`, we get:
```ocaml
module MyMathLib :
sig
val fact : int -> int
val tau : float
val double : int -> int
exception DivByZero
type sign = Neg | NonNeg
end
val facts : int list = [1; 6; 120; 5040; 362880]
```
This tells us `MyMathLib` is a module, with a *signature*.
## Signatures
The *signature* specifies the abstract interface of the module. This tells us the visible bindings and types.
- For values, like functions and simple variables, the signature is what `utop` would return with `val`: their names and types.
- For defined types, like variant types and exceptions, the signature is the same as how they appeared in the original `struct` definition.
## Richer Abstract Data Types (ADTs)
We can use modules to link a **type** and **its operations**.
For example: our own type for rational numbers:
```ocaml
module Rational = struct
type rational = Whole of int | Frac of int * int
exception BadFrac
let make_frac (x, y) = ...
let add (r1, r2) = ...
let to_string r = ...
end
```
Say we wanted the following external guarantees. We may provide these in the documentation, so clients/consumers of this code can rely on them:
- No 0 denominators
- Strings in reduced form (e.g., `"3/2"` not `"9/6"`)
- No infinite loops or uncaught exceptions
We could also decide to have the following internal invariants for our implementation. We keep these private, rather than revealing them in the external specification or documentation.
- All denominators > 0
- All rationals returned are reduced
Signatures help preserve invariants.
Our code maintains (and relies on) invariants.
For example, we might have the invariant:
Maintain:
– `make_frac` disallows 0 denominator, removes negative
denominator, and reduces result
– `add` assumes invariants on inputs, calls reduce if needed
Rely:
– `to_string` assumes its argument is in reduced form
We could capture this with the following first attempt at a signature.
We'll intentionally keep potential internal functions like `gcd` and `reduce` private.
We can write a `module type` directly (what we got back from `utop` above). For clarity for these notes, we'll use all caps for `module type`s (this was common in Standard ML; though OCaml libraries primarily do not do this.)
```ocaml
module type RATIONAL_OPEN = sig
type rational =
| Whole of int
| Frac of int * int
exception BadFrac
val make_frac : int * int -> rational
val add : rational * rational -> rational
val to_string : rational -> string
end
```
We can then implement a module and tell OCaml we want it to have this type with a type annotation with `:`, like we've seen before:
```
```ocaml
module Rational : RATIONAL = struct ...
```
with the form:
```ocaml
module <module name> : <module type name> = struct ...
```
The OCaml compiler will yell at us if the module does not type check as having the module type!
If we missed some required bindings, we would get an error of the form:
```ocaml
Error: Signature mismatch:
...
The value `make_frac' is required but not provided
```
:::success
Does anyone see a problem with this first attempt? In particular, with maintaining our invariants when clients use this signature?
:::
:::spoiler Think, then click!
The problem with this first attempt is that clients can still create invalid values:
```ocaml
utop # Rational.Frac (1, 0)
- : Rational.rational = Rational.Frac (1, 0)
```
:::
## Solution: hide more!
An ADT can hide the concrete type definition so clients cannot create invariant-violating values of type. For example, we could update our signature to the following:
```ocaml=
module type RATIONAL = sig
type rational
exception BadFrac
val make_frac : int * int -> rational
val add : rational * rational -> rational
val to_string : rational -> string
end
```
And place this in a `.mli` file to expose to clients.
Privately, in a separate `ml` file, we could specify that `Rational` has module type `RATIONAL`. The choice to implement `type rational` with a variant type then becomes an internal, encapsulated decision!
```ocaml
module Rational : RATIONAL = struct
type rational =
| Whole of int
| Frac of int * int
...
end
```
- Line 2 tells us `rational` exists as a type, but the representation is hidden.
- Clients can pass values of type `rational` around, but can manipulate them only through our module.
- `make_frac` is the only way to make the first `rational`
- Only the provided bindings can operate on `rational` values.
The module controls all operations with `rational`, so clients cannot violate invariants!
Abstract Data Types: two key tools
Powerful ways to use signatures for hiding:
1. Deny bindings exist.
Especially val bindings, function bindings, and constructors.
2. Make types abstract.
Clients cannot create or inspect values of the type directly.
Next class, we'll see more directly how modularity allows us to choose different implementations for the same interface (`signature`, for OCaml).