(not in final form)
Part 1 of a short course on monads. See also the introductory video on What is algebraic structure?
We start with monoids as an example of algebraic structures, then study the structure of a particular monoid, the free monoid (=monoid of finite words over an alphabet). This structure will later turnout to be a monad. For now, we will be happy to study this structure via its implementation in Haskell.
In the first lecture we concentrated on the sections labelled Examples, Axioms, and Homework. We did a quick review of the remaining material at the beginning of the second lecture. For a first short route through the material of this lecture
Programs: monoids
, monoidshs
, wrap
.
Finite state machines or deterministic finite automata (DFA) are given by a function
Notation: "sigma" reminds of "symbol" and "Q" for the set of states is traditional.
At this point I recommend to have a quick look at my Very Short Introduction to Automata and Haskell. The following assumes that the reader is somewhat familiar with DFAs.
We can think of each letter
These functions compose: For any two letters
and maps a state
Notation: I apologize for switching the order of
is annoying, but, at some point, unavoidable because it stems from the clash of two traditions:
Mathematicians write composition of functions
as
Computer scientists read a word
A reasonable compromise seems the following. We introduce to the mathematical notation a new symbol "
Here
We can push this even further and write x.input_a()
in OO-languages such as Java. With this in mind, I will sometimes write
Note that if we put everything together we have
where I dropped, for better readability, one pair of parentheses on the left-hand side of the last equation.
As we will see later on, catgory theorists love to write mathematics in diagrammatic form. We can already see why this is a good idea. After all, the long discussion on notation above was only about a diagram of the form
Monoids are closely related to a "data type" the elements of which are variously known as words, strings, lists, tuples, or sequences. As this data type is of fundamental importance in mathematics and programming there are many different notations for it. Take note of those familiar to you and feel free to ignore the others.
In the examples below we take
domain | an element of the data type is called | neutral element (length |
example of length |
---|---|---|---|
English | word | ||
Automata and Formal Languages | word | ||
Maths | sequence, tuple | ||
Programming | string | "" |
"the" |
Programming | list | Nil |
Cons 't' (Cons 'h' (Cons 'e' Nil)) |
Programming | list | [] |
["t","h","e"] |
This list is not exhaustive, many further variations do exists both in math and in programming. In the following I may (possibly inadvertently) switch between these notations. Give me a shout if you are in doubt.
(essential)
Given an set
Notation: If we think of the elements of
Terminology: The operation that takes a set
The set of words
The free monoid can be represented in different, and equally useful, ways. The first two will appear again in the last lecture when we will define monads and are named for Eilenberg and Moore and Kleisli. If you want to avoid the mathematical notation for now, you can instead also look at the Haskell code in monoids
. The mathematical notation has the advantage of being independent of any particular programming language. [1]
In fact, it may be useful to give a table of translations already now, even if not all the notation will be explained right here: [2] [3]
my Lists in Math | my Lists in Haskell | built-in lists in Haskell | Generic Monads in Math | Generic Monads in Haskell |
---|---|---|---|---|
List a |
[a] |
m a |
||
mapList f |
map f |
fmap f |
||
eta x |
[x] |
return x |
||
mu |
concat |
join |
||
lift g t |
t >>= g |
t >>= g |
The first column shows widely used notation for the list monad (=free monoid monad) in mathematics, which I adapted, in the second column for my implementation of the list-monad in Haskell, see monoids
. If you want to use instead the built-in Haskell list-monad use the notation of the third column, see monoidshs
. Widely used notation for generic monads in mathematics is given in the fourth column, while the fifth column does the same for Haskell, see also here and here.
The following three Notations can be skipped on first reading. The idea is that, in order to define the list monad, there are various equivalent ways to choose the basic operations from which the others can be defined.
Exercise: Compute monoidshs
.
Programming Activity: Think about whether these functions are in your your list library. Can you find some equations that these operations must satisfy? Write some list processing algorithms that make use of these functions. [4]
The next exercise contains the essential insights needed to prove that Notation 1, 2, and 3 are equivalent.
Exercise: Define monoidshs
.
Remark: The equations for Notations 1 and 2 are generic while the ones of Notation 3 depend on a particular presentation. If that remark is not clear now, I hope it will become clear as we go on. Ask me if in doubt.
(important)
We will now use Notation 3 to formally define monoids. We will come back to Notation 1 and 2 in the exercises and when we define monads in Lecture 4.
A monoid
A (right) monoid action
such that
Remark: One can also define a left action. Or an action both from the left and the right. [5]
Activity/Exercise: How do you have to map the notation
The following are examples of monoids.
(can be skipped for now, but these topics will show up later again)
There is a lot of theory to choose from …
Equational reasoning by reduction to normal form
We have seen that monoids are defined up to the equations of identity and associativity. This means that there are a lot of terms/expressions that are equivalent but look different. For computation and reasoning that is a problem. How do we know that two terms are equivalent? Is equivalence even decidable? Is there always a "canonical" or "normal" term we can pick? This is not always possible, but if it is, it makes life much easier.
Representation of monoids.
We have seen that functions give rise to monoids. What about the converse? In fact, every monoid can be represented as a monoid of functions. This representation theorem is important because it answers the question: What is a monoid? In other words, the monoid axioms capture exactly the properties of function composition. Moreover, its categorical generalisation, the Yoneda lemma, is very powerful and the cornerstone of category theory.
Quotienting by congruence relations
When can we simplify a monoid by identifying several of its elements. We introduce congruence relations and show that they allow us to quotient monoids. This allows us to abstract, or simplify, in a structure preserving way. For example, we can use this to minimize automata, a particular way to do data compression.
In the following I will only look at the 2nd item. It is relevant conceptually, but not needed technically.
The 3rd item is interesting because algebras for a monad are closed under congruence relations. But while important, this observation is not central to what follows in this course. (I may fill it in later, though.)
Functions form monoids because identity is a function and functions have associative composition.
But some monoids, such as
In fact we can …
Proposition: Every monoid can be represented as a monoid of functions.
Proof (Sketch): Let
Remark: Using notation from
Exercise: Finish the proof above. Interpret the construction of the proof in various examples.
Remark: In group theory, the proposition is known as Cayley's Theorem (all groups are groups of permutations).
Here I assume familiarity with the notion of equivalence relation and the quotient of a set by a an equivalence relation. See What is Abstraction? for a general introduction and Equivalence Relations for a summary of the relevant technicalities.
Definition (congruence relation): Let
This rule should be obviously true if you think of
Proposition: The quotient of a monoid by a congruence relation is a monoid.
Maths Exercise: Prove the proposition.
Answer. [9]
The mathematical essence of the definition of a monad we will see in Lecture 4 can be understood as an axiomatic generalisation of the following fact.
Proposition: Every monoid
Maths Exercise: Prove the proposition.
Answer. [10]
(incomplete … can be skipped)
We first show with the help of congruence relations that every automaton can be minimised.
Then we indicate briefly that all we learned about monoids in this lecture extends to general algebras, that is, mathematical structures defined by arbitrary operations and equations.
General algebras are studied in the area of mathematics called Universal Algebra. Universal algebra is closely related to monads. In fact, much of universal algebra can be seen as the study of monads from the point of view of operations and equations. In these lectures, we will not have the time to introduce universal algebra and will just say enough to be able to point out some connections and directions a curious reader might want to follow up.
One of the many connections of universal algebra with programming is that many data types can be understood as algebras.[11] Another connection is given by an (until recently unsolved) problem in the area of constraint satisfaction.
Minimising DFAs is such an important optimisation technique for solving many practical problems (such as searching for patterns in text) that there are various different ways to construct them. I choose here one that emphasises the point of view of automata-as-monoids.
Every DFA
- The initial state is the equivalence class of the empty word.
- The transition function is given by
- The final states are the equivalence classes of …
If you have time for just one homework, do the one on monoids. We will use it later.
If you want to do some mathematics show the following.
There is more to the equivaence of Eilenberg-Moore and Kleisli, namely the preservation of certain equations. We will come back to this in Lecture 4.
Btw, the Haskell implementation below can help with the solving this exercise.
The aim of the homework is to practice translating from maths to Haskell and back. We look at two different kind of algebraic structures.
Monoids
We learn how to translate into Haskell some of the math above, in particular the Eilenberg-Moore and the Kleisli notation.
An Error Monad
We build the simplest possible monad in Haskell and investigate it from mathematical point of view.
In both cases we use the algebraic/monadic approach in order to take into account that data obeys equations:
(essential, we will build on this throughout the following lectures)
Above we defined the so-called free monoid a
for List a
for Nil
for the empty word and Cons
for the operation that prefixes a word by a letter.
data List a = Nil | Cons a (List a)
To experiment follow the link monoids
. I also made a version that uses Haskell's predefined lists in monoidshs
. [13]
For List
to be a monoid, we need a neutral element and multiplication. The neutral element will be Nil
. Multiplication (which is concatenation) can be defined by recursion:
mult :: List a -> List a -> List a
mult Nil xs = xs
mult (Cons x xs) ys = Cons x (mult xs ys)
Exercise: Do Nil
and mult
satisfy the equations of identity and associativity? Write some Haskell code that is testing these laws.
Next we go back to Notation 1 and 2 above, where we introduced functions
In Haskell we can implement these functions as follows [14]
mapList :: (a -> b) -> (List a -> List b)
mapList f Nil = Nil
mapList f (Cons x xs) = Cons (f x) (mapList f xs)
eta :: a -> List a
eta x = Cons x nil
mu :: List (List a) -> List a
mu Nil = Nil
mu (Cons xs xss) = mult xs (mu xss)
lift :: (a -> List b) -> List a -> List b
lift g xs = mu (mapList g xs)
Exercise: Execute these functions pen-and-paper on some small examples. (You also may want to find more instructive names for the operations … I kept names generic as they will appear later in different contexts.)
Exercise (Eilenberg-Moore): Write Haskell functions that test the following equations on input data of your choice. (Recall that mapList f
.)
Note: In the first two equations
Exercise (Kleisli): Write Haskell functions that test the following equations on input data of your choice.
(important but optional)
Let us define a data type that allows us to wrap any other data type into one that adds the possibility of an error as result. You can experiment with the code following the link to wrap
. We define
data Wrap a = Error | Value a
Mathematically, we can think of Wrap
as an operation that takes a set
Where do the equations come in? In the case of the monoids we knew that we had to take into account the laws of identity and associativity. On the other hand, there do not seem to be any obvious equations relating errors and values.
But if we go back to the monoids we see that the equations come into play when we iterate the type constructor
mu :: List (List a) -> List a
In fact, it is mu
that encodes the laws of identity and associativity.
Something similar happens for Wrap
. After all, error hanling through functions that call each other recursively can lead to a whole stack Wrap (Wrap ( ...)...)
and in order to extract an error or value from this stack, we need to flatten it. What are the equations for this flattening?
Exercise: Justify the following equations in terms of error handling:
We write this in Haskell as
mu :: Wrap (Wrap a) -> Wrap a
mu Error = Error
mu (Value Error) = Error
mu (Value (Value x)) = Value x
So far so good. But for an error handling library we need more functionality. We need a function
eta :: a -> Wrap a
that wraps a value, a function
mapWrap :: (a -> b) -> Wrap a -> Wrap b
that allows to change wrapped values, and a function
lift :: (a -> Wrap b) -> (Wrap a -> Wrap b)
that allows us to chain computations.[15]
Exercise: Head over to wrap
and implement these functions so that they pass the tests specified in the file.
We started with monoids for several reasons. First, they are a fundamental structure in computer science, not least because monoid actions model time-irreversible behaviour. Second, monoids are one of the simplest algebraic structures. Third, groups are special monoids and play a central role in much of mathematics. Fourth, we hinted at the relevance of these structures to algebraic data types. In the next lecture, we will explain this in more detail. We will also generalised monoids to universal algebras and give our first axiomatic answer to the question of what is algebraic structure.
For more on automata see
For more on the List monad in Haskell see
School of Haskell: The List Monad.
Monday Morning Haskell: The Reader and Writer Monad with source code.
…
This is just a quite random list of sources that came to mind while writing these notes. They are not directly relevant but before I forget about them again, I though I keep them here. Pirog's Eilenberg–Moore Monoids and Backtracking Monad Transformers and other related papers contain more interesting examples/references on programming with monands. A good starting point would be Wadler's How to Replace Failure by a List of Successes or Bird's program to solve Sudoku. More Sudoku solvers in Haskell. I originally planned to say something about syntactic monoids, my reference for this was Pin's Syntactic Semigroups. The idea of time as a monoid has many manifestations. I was reminded for example of the coalgebraic semantics of timed processes.
…
Programming with the List Monad
…
Stackoverflow on Set, List and Monad Comprehension. This explains how to extend list-comprehension to all monads.
…
-- https://hackmd.io/@alexhkurz/B1oIuXxRv
-- the free monoid over `a`
data List a = Nil | Cons a (List a)
deriving (Show, Eq)
-- multiplication / concatenation
mult :: List a -> List a -> List a
mult Nil xs = xs
mult (Cons x xs) ys = Cons x (mult xs ys)
-- fold a `List a` over a monoid `b -> (b->b->b)`
foldMon :: (a->b) -> b -> (b->b->b) -> List a -> b
foldMon f e op Nil = e
foldMon f e op (Cons x xs) = op (f x) (foldMon f e op xs)
mapList :: (a -> b) -> (List a -> List b)
mapList f Nil = Nil
mapList f (Cons x xs) = Cons (f x) (mapList f xs)
eta :: a -> List a
eta x = Cons x Nil
mu :: List (List a) -> List a
mu Nil = Nil
mu (Cons xs xss) = mult xs (mu xss)
lift :: (a -> List b) -> List a -> List b
lift f xs = mu (mapList f xs)
-- testing
display :: List a -> [a]
display Nil = []
display (Cons x xs) = x:(display xs)
{- does not work:
display :: List a -> [a]
display Nil = []
display (Cons x xs) = case x of
Nil -> []:(display xs)
Cons y ys -> (display (Cons y ys)):(display xs)
x -> x:(display xs)
-}
-- add your own tests
main = do
let l123 = Cons 1 (Cons 2 (Cons 3 Nil))
let l456 = Cons 4 (Cons 5 (Cons 6 Nil))
let l789 = Cons 7 (Cons 8 (Cons 9 Nil))
let l = l123
let element = 7
let f = (\ x -> eta (x + 1))
let g = (\ x -> eta (2*x))
let lift_f_after_g x = lift f (g x)
let l3 = Cons (Cons l123 (Cons l456 Nil)) (Cons (Cons l789 Nil) Nil)
putStr "test multiplication: [1,2,3] [4,5,6] == " ; print $ display $ (mult l123 l456)
putStr "fold + over [4,5,6]: " ; print $ foldMon (\x->x) 0 (\x->(\y->(x+y))) l456
putStr "fold * over [4,5,6]: " ; print $ foldMon (\x->x) 1 (\x->(\y->(x*y))) l456
print "test Eilenberg-Moore laws for eta, map, mu:"
print $ mu ( mapList eta l123) == l123
print $ mu (eta l123) == l123
print $ (mu (mu l3)) == mu (mapList mu l3)
print "test Kleisli laws for eta, lift:"
print $ (lift eta l123 == l123)
print $ (lift f (eta element) == f element)
print $ (lift lift_f_after_g l123) == lift f (lift g l123)
Monads can be implemented in any programming language. Here is an example of monads in ML. ↩︎
The notation
t >>= g
, or should we write as >>= bs
is usually written as
do a <- as
bs a
If you know monads in Haskell try to match them up with the Kleisli notation. ↩︎
DFAs are right actions. DFAs cannot be reverse because not all functions have inverses. On the other hand, relations can be reversed and non-deterministic automata give rise to both left and right actions. ↩︎
(eg integers with addition but not integers with multiplication) ↩︎
There is an analogy here with Church numberals. Church numerals are functions used to represent numbers. Here we did go the other way round and used numbers to define functions ↩︎
Read
We define the quotient
We have to show that this definition does not depend on the choice of "representatives"
We define
The idea of so-called abstract data types is easy to explain. An interface for a data type such as queues can be described independently from any implementation by operations
nil: Queue
enqueue: Queue x Elem -> Queue
dequeue: Queue -> Queue
head: Queue -> Elem
The operation head
may also be called front
(for fifo) or end
(for lifo), but here we want to emphasize that both lifo and fifo are based on the same types (or sorts) Queue
and Elem
and the same operations (or signature).
Nevertheless, the difference between lifo and fifo does show up in an implementation-independent, purely algebraic way, namely in the equations that hold for them.
For example, for lifo we have
dequeue(enqueue(s,e)) = s
head(enqueue(s,e)) = e
which do not hold for fifo. (Exercise: Write out more equations for lifo and fifo.) ↩︎
Associativity tells us that all terms can be normalised by moving all parentheses to the left (or right). Lists can be understood as an inductive definition of these normal forms. ↩︎
Haskell's predefined lists are more readable and but one must rely on some "secret" Haskell internals. ↩︎
Renaming a,b
and eta
and mapList f
and mu
and lift g
. ↩︎
Suppose we have two functions
f :: (Int -> Wrap Int)
g :: (Int -> Wrap Int)
and we want to compute f (g x)
. The solution is to use f (lift g x)
. ↩︎