tags: programming languages, category theory, monads
# A Very Short Introduction to Monads
Monads are important in mathematics and in programming for apparently different reasons. To work out the unity behind this diversity takes effort, but carries beautiful reward.
In this note I approach monads from a programming point of view, but do use notation that emphasises that [monads in mathematics](https://en.wikipedia.org/wiki/Monad_(category_theory)#Formal_definition) and [monads in functional programming](https://en.wikipedia.org/wiki/Monad_(functional_programming)#Definition) are the same concept, even if it doesn't look like it at first glance. I have [other notes](https://hackmd.io/@alexhkurz/H1OxumxRP) that take this much further.
I spell out the examples in reasonable detail below, but to see them in a wider context look at my sections **Partial DFAs (Maybe Monad)** and **Non-Deterministic Automata (List Monad)** in my [Introduction to Automata and Haskell](https://hackmd.io/@alexhkurz/HylLKujCP).
(The programming examples are in Haskell, but can be implemented in any (Turing complete) language.[^haskell])
[^haskell]: Monads are particularly important in Haskell because this is how Haskell handles effects such as input, output, exceptions, state (assignment), probability, etc.
## Data Structures
Let us start with some generalities about data structures. A data structure is a data type together with some operations and some laws that the operations should satisfy.
**Example:** A stack has operations such as `push` and `pop` (and possibly more). To qualify as a stack, these operations also have to satisfy some equations such as
pop(push(e,s)) = s
where `e` is some data element and `s` is a stack.
Note that stacks can be implemented in many different ways, for example as FIFO linked lists or with arrays.
Monads can be similarly characterised by operations and equations. This is the approach we will take in the following. First, we will introduce our running example, namely lists. Second, we will explain the operations and equations that define the monad data type and show that lists are one instance. Third we will look at other instances and indicate why monads arise in many different guises.
## Running Example
Haskell has built-in lists with examples being `[1,2,3]` and also the empty list ``. I will use this notation in the following as an abbreviation, but provide our own explicit definition of lists:
data List a = Nil | Cons a (List a)
For example, `[1,2,3]` abbreviates `Cons 1 (Cons 2 (Cons 3 Nil))`.
**Exercise:** Define concatenation of lists in terms of `Nil` and `Cons`. Implement this definition as a recursive program in Haskell.
To define the data type of monads, we will give four operations. The first operation is important enough to define its own data type, the one of functors.[^map]
mapList :: (a -> b) -> (List a -> List b)
We skip the equations that a functor needs to satisfy for now.
## Monad Operations
Monads are functors with two additional operations. We call them `eta` and `mu` for two reasons. First, these names are neutral in the sense that they do not prejudice a particular meaning that may make more sense in one instance than in another. Second, the Greek letters $\eta$ and $\mu$ are the de facto standard for these operations in the mathmatical literature.
eta :: a -> List a
mu :: List (List a) -> List a
In mathematics, a monad is a functor together with these two operations. In the instance of lists, `eta` is often called "singleton" or and `mu` is called "flatten".
Another important operation is
lift :: (a -> List b) -> (List a -> List b)
The purpose of the first exercise is to discover how the monad operations are defined in the particular case of lists.
**Exercise:** Give recursive definitions of `mapList, eta, mu, lift` in terms of `Nil` and `Cons`. One can find the answer by writing down the only "reasonable" functions of the correct types. [Hint: For `mu` use concatenation.]
The next exercise illustrates that once we have the monad operations, we can write programs that work uniformly for all possible instances. It also shows that one does not need all of `map, eta, mu, lift` in order to define a monad. Having `map,eta,mu` or having `eta,lift` suffices.
(I write now `map` instead of `mapList` to emphasise that I don't want to restrict attention to the particular instance of lists.)
**Exercise:** Use `map` and `mu` to implement `lift`. Use `lift` (and the identity function) to implement `mu`. Use `lift` and `eta` to implement `map`. [Hint: Let yourself be guided by the types.]
mapList :: (a -> b) -> (List a -> List b)
mapList f Nil = Nil
mapList f (Cons x xs) = Cons (f x) (mapList f xs)
eta :: a -> List a
eta x = Cons x nil
mu :: List (List a) -> List a
mu Nil = Nil
mu (Cons xs xss) = mult xs (mu xss)
## Monad Equations
I will not go into the details of the equations of monads. But I want to illustrate that one can remember or find the equations by asking for those equations that hold in the case of lists.
Moreover, I want to emphasize that understanding the equations of `mu` and how `mu` implements the idea of "flattening" helps understanding monads in general.
Intuively, monads a are a data type with a notion of composition or flattening.
First, we notice that there are two ways to use the singleton map `eta` in order to turn a list into a lists of lists. In both cases, flattening the list with `mu` returns the original list:
[1,2,3] \stackrel \eta \mapsto [,,] \stackrel \mu \mapsto [1,2,3] \\
[1,2,3] \stackrel \eta \mapsto [[1,2,3]] \stackrel \mu \mapsto [1,2,3] \\
From the above we can obtain two different equations involving `eta` and `mu`.
A third equation arises from the two different ways of using `mu` to flatten a 'triple list' such as
One can first flatten the 'inner double list' and then the 'outer' one as in
$$[[[1,2,3],],[,,[6,7]]] \mapsto [[1,2,3,4],[5,6,7]] \mapsto [1,2,3,4,5,6,7] $$
or doing it in the other order as in
$$[[[1,2,3],],[,,[6,7]]] \mapsto [[1,2,3],,,,[6,7]] \mapsto [1,2,3,4,5,6,7].$$
The third monad equation tells us that both ways of reducing triple lists give the same result.
**Exercise** (optional): Write out a Haskell program that tests the equations indicated above on a variety of inputs.
## Monad Definition
A monad `m` is a data type that implements the operations [^mathNotation]
map :: (a -> b) -> (m a -> m b)
eta :: a -> m a
mu :: m (m a) -> m a
lift :: (a -> m b) -> (m a -> m b)
satisfying the equations alluded to above. We follow Haskell conventions here and write `m` for monad. Technically, `m` is a type variable in Haskell. Mathematically, `m` is a parameter of the definition. All `m` together with the operations and satisfying the equations are called a monad.
**Remark:** Remember that from one of exercises above, we know that we do not need all of these operations as basic operations. For example, `eta` and `lift` suffice.
**Remark:** While the Haskell compiler can check that the operations have the required types, it cannot check the equations.
Let us emphasize again that the definition above will work in the same way for all instances. This observation hints at one of the reason why the concept of monad is useful: We can write programs that are independent of any particular instance. To expand on this theme, we need to look at some other instances of monads.
## An Error Monad
Let us define a data type that allows us to wrap any other data type into one that adds the possibility of an error (exception) as result. You can experiment with the code following the link to [`wrap`](https://repl.it/@alexhkurz/wrap#main.hs). We define
data Wrap a = Error | Value a
mapWrap :: (a -> b) -> Wrap a -> Wrap b
eta :: a -> Wrap a
mu :: Wrap (Wrap a) -> Wrap a
lift :: (a -> Wrap b) -> (Wrap a -> Wrap b)
Btw, apart from naming conventions, this monad is the same as the Haskell's built-in `Maybe` monad.
**Remark:** What happens if in an recursive program we built up data of type `Wrap (Wrap (... (Wrap a)...)`? This is where `mu` comes in. Imagine that in a recursive function, after four recursive calls we hit an error, that is, we obtain `Value(Value(Value Error))`. Applying the equations of the definition of `mu` we can reduce this as follows:
Value(Value(Value Error)) =
Value(Value Error) =
Value Error =
This is the mechanism by which the error is propagated to the top level where it can then be shown to the user.
## Monads in Haskell, `do`-notation
So far I tried to strike a balance between the mathematical definition and the Haskell definition of monad. To understand how monads are used in Haskell, we need to learn a little bit more syntax, see the Haskellwiki article [Monad](https://wiki.haskell.org/Monad). It defines monads as follows:
You can see that `>>=` is what I called `lift`, just with the types arranged in a different order. `return` is `eta`. You also see that the equations can be written in a nice and concise way.
**Exercise:** Explain why each of the equations holds in the example of lists.
In mathematical terms, this exercise proves that lists are indeed a monad.
Finally, we need to know that what makes monads convenient to program with is the `do`-notation. It can be defined in terms of `return` and `>>=` as follows.
Note that the `<-` notation is only needed in case we want to pass on the result of one computation as an argument to the next. In case of printing, for example, we can simply write
main = do
This write-up can be seen as an preamble to my
- [Short Introduction to Monads](https://hackmd.io/@alexhkurz/H1OxumxRP)
where further references can be found. For a summary of monads in Haskell as well as further references see the article
- [Monad](https://wiki.haskell.org/Monad) in the Haskellwiki.
mult :: List a -> List a -> List a
mult Nil xs = xs
mult (Cons x xs) ys = Cons x (mult xs ys)
[^map]: In Haskell's built-in lists this is
map :: (a -> b) -> ([a] -> [b])
see [map at hoogle](https://hoogle.haskell.org/?hoogle=map&scope=set%3Astackage):
[^mathNotation]: In [mathematics](https://en.wikipedia.org/wiki/Monad_(category_theory)#Formal_definition), one typically writes $(T,\eta,\mu)$ for what I denote here by `m` and `eta` and `mu`. $T$ could remind us of "triple", or "type", or "terms". Instead of `lift f` one often writes $f^\ast$ as in this definition of the so-called [Kleisli triple](https://en.wikipedia.org/wiki/Kleisli_category#Extension_operators_and_Kleisli_triples).