Monoids
Automata and Lists
(not in final form)
Part 1 of a short course on monads. See also the introductory video on What is algebraic structure?
We start with monoids as an example of algebraic structures, then study the structure of a particular monoid, the free monoid (= the monoid of finite words over an alphabet). This structure will later turn out to be a monad. For now, we will be happy to study this structure via its implementation in Haskell.
In the first lecture we concentrated on the sections labelled Examples, Axioms, and Homework. We did a quick review of the remaining material at the beginning of the second lecture. For a first short route through the material of this lecture, these sections are a good starting point.
Programs: monoids, monoidshs, wrap.

Examples
Deterministic Finite Automata
Finite state machines or deterministic finite automata (DFA) are given by a function
\[InputSymbols \times States \to States\]
\[ \Sigma \times Q \to Q\]
Notation: "sigma" reminds us of "symbol" and "Q" for the set of states is traditional. \(\Sigma\) is also called the alphabet. It is common to denote elements of \(\Sigma\) by \(a,b,c,\) etc. For more background I refer to Chapter 2 of the influential textbook Introduction to Automata Theory, Languages, and Computation. For most of the following, I will ignore initial and final states, which are also part of the definition of a DFA.
At this point I recommend having a quick look at my Very Short Introduction to Automata and Haskell. The following assumes that the reader is somewhat familiar with DFAs.
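To fix intuitions, here is a minimal Haskell sketch of such a transition function. The two-state automaton and the names `Q` and `delta` are my own illustration, not taken from the linked notes.

```haskell
-- A tiny DFA over the alphabet {'a','b'} with two states.
data Q = Q0 | Q1 deriving (Eq, Show)

-- The transition function  InputSymbols × States → States, curried in Haskell style.
delta :: Char -> Q -> Q
delta 'a' _ = Q1   -- reading 'a' always moves to Q1
delta 'b' _ = Q0   -- reading 'b' always moves to Q0
delta _   q = q    -- other symbols leave the state unchanged
```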
We can think of each letter \(a\in\Sigma\) as a function
\[\overline a: Q\to Q\]
These functions compose: For any two letters \(a,b\) there is a word of two letters \(ab\) which corresponds to another function
\[\overline{ab}\ :\ Q\to Q\]
and maps a state \(q\in Q\) to
\[(\overline b\circ\overline a)(q) = \overline b (\overline a(q))\]
Notation: I apologize for switching the order of \(a\) and \(b\). In particular, the equation
\[\overline{ab}=\overline b \circ \overline a\]
is annoying, but, at some point, unavoidable because it stems from the clash of two traditions:
Mathematicians write composition of functions
\[A\stackrel f\to B\stackrel g\to C\]
as \(g\circ f\) because \(g\circ f\) is defined as
\[g\circ f\ (x) = g(f(x)).\]
Computer scientists read a word \(ab\) from left to right as "first input \(a\), then input \(b\)".
A reasonable compromise seems the following. We introduce to the mathematical notation a new symbol "\(;\)" which we define as
\[ f;g = g\circ f\]
Here \(;\) reminds us of sequential composition in programming languages "first \(f\), then \(g\)".
We can push this even further and write \(x\,. f\) instead of \(f(x)\), reminding us of the notation
x.input_a()
in OO-languages such as Java. With this in mind, I will sometimes write\[x\,.f = f(x)\]
Note that if we put everything together we have
\begin{align} x.(f;g) & = (x.f).g\\[1ex] \overline{ab} & = \overline a ;\overline b \\[1ex] q.\overline a.\overline b & = q.(\overline a ;\overline b) \end{align}
where I dropped, for better readability, one pair of parentheses on the left-hand side of the last equation.
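Haskell conveniently offers all three notations from the discussion above: `.` is the mathematicians' \(\circ\), `>>>` from Control.Arrow plays the role of "\(;\)", and `&` from Data.Function writes application as \(x\,.f\). A small sketch (the functions `f` and `g` are my own examples):

```haskell
import Data.Function ((&))   -- x & f  =  f x,  i.e. our  x.f
import Control.Arrow ((>>>)) -- f >>> g  =  g . f,  i.e. our  f;g

f, g :: Int -> Int
f x = x + 1
g x = x * 2

ex1 :: Int
ex1 = 3 & f & g     -- (3.f).g = g (f 3)

ex2 :: Int
ex2 = (f >>> g) 3   -- (f;g)(3), the same value
```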
As we will see later on, category theorists love to write mathematics in diagrammatic form. We can already see why this is a good idea. After all, the long discussion on notation above was only about a diagram of the form
\[\bullet\to \bullet\to \bullet\]
Words, Strings, Lists, …
Monoids are closely related to a "data type" the elements of which are variously known as words, strings, lists, tuples, or sequences. As this data type is of fundamental importance in mathematics and programming there are many different notations for it. Take note of those familiar to you and feel free to ignore the others.
In the examples below we take \(\Sigma\) to be the Latin alphabet. The empty word and the word "the" in various notations:
- `""` and `"the"` (string notation)
- `Nil` and `Cons 't' (Cons 'h' (Cons 'e' Nil))` (constructor notation)
- `[]` and `["t","h","e"]` (bracket notation)

This list is not exhaustive; many further variations exist both in math and in programming. In the following I may (possibly inadvertently) switch between these notations. Give me a shout if you are in doubt.
The Free Monoid
(essential)
Given a set \(\Sigma\), we denote by \(\Sigma^\ast\), or sometimes \(\sf List(\Sigma)\), the set of all words over \(\Sigma\).
Notation: If we think of the elements of \(\Sigma^\ast\) as words or strings, \(\Sigma\) is called an alphabet. But this is not the only possible interpretation of \(\Sigma\). Instead of \(\Sigma\), we also use \(X\), in particular if we think of the elements of \(X\) not as input symbols of an automaton but as variables used in terms and equations such as \(x\cdot (y\cdot z)=(x\cdot y)\cdot z\). We will hear much more about variables, terms, equations throughout this course.
Terminology: The operation that takes a set \(X\) to the set \(X^\ast\) of words over \(X\) is known as the Kleene star. \(X^\ast\) is known as the free monoid over \(X\), a terminology that will be explained below.
The set of words \(X^\ast\) carries additional structure: Words can be concatenated and there is an empty word. And there is much more structure, some of which we will explore now.
The free monoid can be represented in different, and equally useful, ways. The first two will appear again in the last lecture, when we define monads, and are named for Eilenberg and Moore and Kleisli. If you want to avoid the mathematical notation for now, you can instead also look at the Haskell code in monoids. The mathematical notation has the advantage of being independent of any particular programming language. [1]

Table Comparing Notations
In fact, it may be useful to give a table of translations already now, even if not all the notation will be explained right here: [2] [3]
| Math (list monad) | monoids | monoidshs | Math (monads) | Haskell monads |
| --- | --- | --- | --- | --- |
| \(X^\ast\) | `List a` | `[a]` | \(TX\) | `m a` |
| \(f^\ast\) | `mapList f` | `map f` | \(Tf\) | `fmap f` |
| \([x]\) | `eta x` | `[x]` | \(\eta(x)\) | `return x` |
| \(\mu\) | `mu` | `concat` | \(\mu\) | `join` |
| \(g^\sharp(t)\) | `lift g t` | `t >>= g` | \(g^\sharp(t)\) | `t >>= g` |
The first column shows widely used notation for the list monad (= free monoid monad) in mathematics, which I adapted, in the second column, for my implementation of the list monad in Haskell, see monoids. If you want to use instead the built-in Haskell list monad, use the notation of the third column, see monoidshs. Widely used notation for generic monads in mathematics is given in the fourth column, while the fifth column does the same for Haskell, see also here and here.

The following three Notations can be skipped on first reading. The idea is that, in order to define the list monad, there are various equivalent ways to choose the basic operations from which the others can be defined.
Notation 1 (Eilenberg-Moore)
Notation 2 (Kleisli)
Notation 3 (operations)
Exercise: Compute \(\eta(1)\), \(f^\ast([1,2,3])\) where \(f(x)=x+1\), and \(\mu([[1,2,3],[4,5],[6]])\). Note that \(X\) above can be \(X^\ast\) itself. So it makes sense to evaluate \(\eta([1,2,3])\), \(\eta^\ast([1,2,3])\), and \(\mu([[[1,2],[3,4]],[[5,6]]])\). To run these and more examples go to monoidshs.

Programming Activity: Think about whether these functions are in your list library. Can you find some equations that these operations must satisfy? Write some list processing algorithms that make use of these functions. [4]
The next exercise contains the essential insights needed to prove that Notation 1, 2, and 3 are equivalent.
Exercise: Define \(\mu\) in terms of \(\cdot\) and vice versa. Define \((-)^\sharp\) in terms of \((-)^\ast\) and \(\mu\). Define \(\mu\) in terms of \((-)^\sharp\). Test your definitions by implementing them in monoidshs.

Remark: The equations for Notations 1 and 2 are generic, while the ones of Notation 3 depend on a particular presentation. If that remark is not clear now, I hope it will become clear as we go on. Ask me if in doubt.
Axioms
(important)
We will now use Notation 3 to formally define monoids. We will come back to Notation 1 and 2 in the exercises and when we define monads in Lecture 4.
A monoid \((M,e, \cdot)\) is a set \(M\) together with an element \(e\in M\) and a binary operation \(\cdot:M\times M\to M\) satisfying for all \(x,y,z\in M\)
\begin{align} e\cdot x & = x & (\rm left\ identity)\\ x\cdot e & = x & (\rm right\ identity)\\ x\cdot(y\cdot z) & = (x\cdot y)\cdot z & (\rm associativity) \end{align}
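As a quick sanity check, the axioms can be tested in Haskell for a concrete monoid, here lists with `[]` and `++` (the sample data is my own, not from the lecture files):

```haskell
-- Instances of the three monoid laws for the monoid of lists ([], ++),
-- checked on sample data.
leftId, rightId, assoc :: Bool
leftId  = ([] ++ [1,2,3]) == [1,2,3 :: Int]
rightId = ([1,2,3] ++ []) == [1,2,3 :: Int]
assoc   = (([1] ++ [2]) ++ [3]) == ([1] ++ ([2] ++ [3 :: Int]))
```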
A (right) monoid action \((M,A,\bullet)\) consists of a monoid \(M\), a set \(A\) and a function
\[ \bullet: A\times M \to A\]
such that
\begin{align} a\bullet e & = a\\ a \bullet (x\cdot y) & = (a \bullet x) \bullet y \end{align}
Remark: One can also define a left action. Or an action both from the left and the right. [5]
Activity/Exercise: How do you have to map the notation \(\cdot\) and \(\bullet\) of the definition to the notation of the introductory example of DFAs in order to see that a DFA is an "action of the free monoid on the set of states"?
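One way to answer in Haskell: a DFA's one-step transition function extends to a right action of \(\Sigma^\ast\) on states by folding. A sketch, with an illustrative `delta` of my own choosing:

```haskell
-- Extending one-step transitions to an action of the free monoid:
--   q • ε = q    and    q • (w·v) = (q • w) • v
type Q = Int

delta :: Char -> Q -> Q   -- an arbitrary example transition function
delta 'a' q = q + 1
delta  _  _ = 0

act :: Q -> String -> Q   -- act q w  is  q • w
act = foldl (flip delta)
```

The two action equations hold by the defining equations of `foldl`: the empty word acts as identity, and acting by a concatenation is acting by the two parts in sequence.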
More Examples
The following are examples of monoids.
Theory
(can be skipped for now, but these topics will show up later again)
There is a lot of theory to choose from …
Equational reasoning by reduction to normal form
We have seen that monoids are defined up to the equations of identity and associativity. This means that there are a lot of terms/expressions that are equivalent but look different. For computation and reasoning that is a problem. How do we know that two terms are equivalent? Is equivalence even decidable? Is there always a "canonical" or "normal" term we can pick? This is not always possible, but if it is, it makes life much easier.
Representation of monoids.
We have seen that functions give rise to monoids. What about the converse? In fact, every monoid can be represented as a monoid of functions. This representation theorem is important because it answers the question: What is a monoid? In other words, the monoid axioms capture exactly the properties of function composition. Moreover, its categorical generalisation, the Yoneda lemma, is very powerful and the cornerstone of category theory.
Quotienting by congruence relations
When can we simplify a monoid by identifying several of its elements? We introduce congruence relations and show that they allow us to quotient monoids. This allows us to abstract, or simplify, in a structure preserving way. For example, we can use this to minimize automata, a particular way to do data compression.
In the following I will only look at the 2nd item. It is relevant conceptually, but not needed technically.
The 3rd item is interesting because algebras for a monad are closed under congruence relations. But while important, this observation is not central to what follows in this course. (I may fill it in later, though.)
Every monoid is a monoid of functions
Functions form monoids because identity is a function and functions have associative composition.
But some monoids, such as \((\mathbb N,0,+)\), do not look as if their elements were functions. Or can we think of numbers as functions?
In fact we can …
Proposition: Every monoid can be represented as a monoid of functions.
Proof (Sketch): Let \((M,e,\cdot)\) be a monoid. For all \(m\in M\) define a function
\begin{align} \hat m : M & \to M\\ x & \mapsto x\cdot m \end{align}
Remark: Using notation from \(\lambda\)-calculus we could write the definition of \(\hat m\) as [7]
\[\hat m = \lambda x. x\cdot m\]
Exercise: Finish the proof above. Interpret the construction of the proof in various examples.
Remark: In group theory, the proposition is known as Cayley's Theorem (all groups are groups of permutations).
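For \((\mathbb N,0,+)\), the construction in the proof can be written out in Haskell (a sketch of the idea, with names of my own choosing):

```haskell
-- Cayley-style representation of the monoid (N, 0, +):
-- rep m is the function  \x -> x + m,  written  \hat m  in the text.
rep :: Int -> (Int -> Int)
rep m = \x -> x + m

-- Sequential composition  f;g = g . f.
before :: (Int -> Int) -> (Int -> Int) -> (Int -> Int)
before f g = g . f

-- Key property:  rep m `before` rep n  behaves like  rep (m + n),
-- so composition of representations mirrors addition.
```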
Quotienting by a congruence relation
Here I assume familiarity with the notion of equivalence relation and the quotient of a set by an equivalence relation. See What is Abstraction? for a general introduction and Equivalence Relations for a summary of the relevant technicalities.
Definition (congruence relation): Let \((M,e,\cdot)\) be a monoid. A congruence relation \(\equiv\) on \(M\) is an equivalence relation on \(M\) such that [8]
\[\frac{x\equiv x' \quad\quad y\equiv y'}{x\cdot y \equiv x'\cdot y'}\]
This rule is obviously true if you think of \(\equiv\) as \(=\). Conversely, any relation with this property can be thought of as an equality relation on \(M\). This is formalised in the next proposition.
Proposition: The quotient of a monoid by a congruence relation is a monoid.
Maths Exercise: Prove the proposition.
Answer. [9]
The mathematical essence of the definition of a monad we will see in Lecture 4 can be understood as an axiomatic generalisation of the following fact.
Proposition: Every monoid \((M,e,\cdot)\) is a quotient of the free monoid over \(M\).
Maths Exercise: Prove the proposition.
Answer. [10]
Applications
(incomplete … can be skipped)
We first show with the help of congruence relations that every automaton can be minimised.
Then we indicate briefly that all we learned about monoids in this lecture extends to general algebras, that is, mathematical structures defined by arbitrary operations and equations.
General algebras are studied in the area of mathematics called Universal Algebra. Universal algebra is closely related to monads. In fact, much of universal algebra can be seen as the study of monads from the point of view of operations and equations. In these lectures, we will not have the time to introduce universal algebra and will just say enough to be able to point out some connections and directions a curious reader might want to follow up.
One of the many connections of universal algebra with programming is that many data types can be understood as algebras.[11] Another connection is given by an (until recently unsolved) problem in the area of constraint satisfaction.
Minimal Automata
Minimising DFAs is such an important optimisation technique for solving many practical problems (such as searching for patterns in text) that there are various different ways to construct them. I choose here one that emphasises the point of view of automata-as-monoids.
Every DFA \(A\) over \(\Sigma\) defines an equivalence relation \(\equiv\) on \(\Sigma^\ast\) as follows: \(w_1\equiv w_2\) iff \(a\bullet w_1=a\bullet w_2\) for all \(a\in A\). The quotient \(\Sigma^\ast/{\equiv}\) is called the syntactic monoid of the automaton. The syntactic monoid is also the minimal automaton accepting the same language.
- The initial state is the equivalence class of the empty word.
- The transition function is given by \([v]_\equiv\bullet w= [v\cdot w]_\equiv\).
- The final states are the equivalence classes of …
Homework
If you have time for just one homework, do the one on monoids. We will use it later.
(Maths) Equivalence of Eilenberg-Moore and Kleisli notation
If you want to do some mathematics show the following.
There is more to the equivalence of Eilenberg-Moore and Kleisli, namely the preservation of certain equations. We will come back to this in Lecture 4.
Btw, the Haskell implementation below can help with solving this exercise.
(Programming) Implementing Monads in Haskell: Introduction
The aim of the homework is to practice translating from maths to Haskell and back. We look at two different kinds of algebraic structures.
Monoids
We learn how to translate into Haskell some of the math above, in particular the Eilenberg-Moore and the Kleisli notation.
An Error Monad
We build the simplest possible monad in Haskell and investigate it from a mathematical point of view.
In both cases we use the algebraic/monadic approach in order to take into account that data obeys equations:
(Programming) The Free Monoid Monad
(essential, we will build on this throughout the following lectures)
Above we defined the so-called free monoid \(X^\ast\) over a set \(X\). We can write this as a Haskell data type, essentially just writing
- `a` for \(X\),
- `List a` for \(X^\ast\),
- `Nil` for the empty word, and
- `Cons` for the operation that prefixes a word by a letter.

To experiment follow the link monoids. I also made a version that uses Haskell's predefined lists in monoidshs. [13]

For `List` to be a monoid, we need a neutral element and multiplication. The neutral element will be `Nil`. Multiplication (which is concatenation) can be defined by recursion.

Exercise: Do `Nil` and `mult` satisfy the equations of identity and associativity? Write some Haskell code that tests these laws.

Next we go back to Notation 1 and 2 above, where we introduced functions
\begin{align} f^\ast & : X^\ast\to Y^\ast & {\rm for \ } f:X\to Y \\ [-] & : X\to X^\ast\\ \mu & : (X^\ast)^\ast \to X^\ast\\ g^\sharp & : X^\ast\to Y^\ast & {\rm for \ } g:X\to Y^\ast \\ \end{align}
In Haskell we can implement these functions as follows [14]
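The implementation presumably looks something like the following sketch, using the names from the table above (`mult`, `mapList`, `eta`, `mu`, `lift`); the actual monoids file may differ in details:

```haskell
-- The free monoid as a data type.
data List a = Nil | Cons a (List a) deriving (Eq, Show)

-- Concatenation (the monoid multiplication), by recursion on the first word.
mult :: List a -> List a -> List a
mult Nil         ys = ys
mult (Cons x xs) ys = Cons x (mult xs ys)

-- f* : applies f to every letter.
mapList :: (a -> b) -> List a -> List b
mapList _ Nil         = Nil
mapList f (Cons x xs) = Cons (f x) (mapList f xs)

-- [-] : the singleton word.
eta :: a -> List a
eta x = Cons x Nil

-- μ : flattens a word of words.
mu :: List (List a) -> List a
mu Nil           = Nil
mu (Cons xs xss) = mult xs (mu xss)

-- g# : the Kleisli extension of g.
lift :: (a -> List b) -> List a -> List b
lift g = mu . mapList g
```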
Exercise: Execute these functions pen-and-paper on some small examples. (You may also want to find more instructive names for the operations … I kept the names generic as they will appear later in different contexts.)
Exercise (Eilenberg-Moore): Write Haskell functions that test the following equations on input data of your choice. (Recall that \(f^\ast\) is `mapList f`.)
\begin{gather} \mu (\eta (x)) = x\\ \mu (\eta^\ast (x)) = x\\ \mu (\mu^\ast (x)) = \mu(\mu(x)) \end{gather}
Note: In the first two equations \(x\) is any list, but in the last equation \(x\) is a list of lists of lists.
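A possible shape for such test functions, here using Haskell's built-in lists so that \(f^\ast\) is `map f` (a sketch; only sample inputs are checked, not the general laws):

```haskell
-- The Eilenberg-Moore equations, checked on sample inputs.
eta :: a -> [a]
eta x = [x]

mu :: [[a]] -> [a]
mu = concat

law1, law2, law3 :: Bool
law1 = mu (eta xs) == xs       where xs = [1,2,3 :: Int]
law2 = mu (map eta xs) == xs   where xs = [1,2,3 :: Int]
law3 = mu (map mu xsss) == mu (mu xsss)   -- xsss is a list of lists of lists
  where xsss = [[[1,2],[3]],[[4,5 :: Int]]]
```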
Exercise (Kleisli): Write Haskell functions that test the following equations on input data of your choice.
\begin{gather} \eta^\sharp (x) = x\\ g^\sharp(\eta(x)) = g(x)\\ (\lambda x\,.g^\sharp(f(x)))^\sharp (x) = g^\sharp(f^\sharp(x))\\ \end{gather}
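Analogously, a sketch for the Kleisli equations over built-in lists, with \(g^\sharp\) written `lift g` (the example Kleisli arrows `f` and `g` are my own; sample inputs only):

```haskell
-- The Kleisli equations, checked on sample inputs.
eta :: a -> [a]
eta x = [x]

lift :: (a -> [b]) -> [a] -> [b]   -- g#
lift g = concat . map g

f, g :: Int -> [Int]               -- example Kleisli arrows
f x = [x, x + 1]
g x = [x * 2]

k1, k2, k3 :: Bool
k1 = lift eta xs == xs                                   where xs = [1,2,3 :: Int]
k2 = lift g (eta 7) == g 7
k3 = lift (\x -> lift g (f x)) xs == lift g (lift f xs)  where xs = [1,2]
```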
(Programming) An Error Monad
(important but optional)
Let us define a data type that allows us to wrap any other data type into one that adds the possibility of an error as result. You can experiment with the code following the link to wrap.

Mathematically, we can think of `Wrap` as an operation that takes a set \(X\) and maps it to \(\{\sf Error\}+X\), where \(+\) here denotes disjoint union of sets. (One often sees the notation \(1+X\) instead of \(\{\sf Error\}+X\).) This disjoint union comes, by definition, with two functions
\begin{gather} {\sf Error}: 1\to 1 + X\\ {\sf Value}: X\to 1 + X \end{gather}
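As a Haskell data type, this could be rendered as follows (constructor names `Error` and `Value` as in the math above; the linked wrap file may differ):

```haskell
-- Wrapping a type a into one with an extra error element:
-- Wrap a  plays the role of  1 + X.
data Wrap a = Error | Value a deriving (Eq, Show)
```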
Where do the equations come in? In the case of the monoids we knew that we had to take into account the laws of identity and associativity. On the other hand, there do not seem to be any obvious equations relating errors and values.
But if we go back to the monoids, we see that the equations come into play when we iterate the type constructor.

In fact, it is `mu` that encodes the laws of identity and associativity.

Something similar happens for `Wrap`. After all, error handling through functions that call each other recursively can lead to a whole stack `Wrap (Wrap ( ...)...)` and in order to extract an error or value from this stack, we need to flatten it. What are the equations for this flattening?

Exercise: Justify the following equations in terms of error handling:
\begin{gather} {\sf Value}({\sf Error}) = {\sf Error}\\ {\sf Value}({\sf Value}(x)) = {\sf Value}(x) \end{gather}
We write this in Haskell as
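One plausible rendering is the following sketch, whose two defining clauses mirror the two equations above (the name `flatten` is my own):

```haskell
data Wrap a = Error | Value a deriving (Eq, Show)

-- Flattening a doubly wrapped value:
--   Value Error      ~>  Error
--   Value (Value x)  ~>  Value x
flatten :: Wrap (Wrap a) -> Wrap a
flatten Error     = Error
flatten (Value w) = w
```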
So far so good. But for an error handling library we need more functionality. We need a function that wraps a value, a function that allows us to change wrapped values, and a function that allows us to chain computations. [15]
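A sketch of what these three functions could look like (the names `wrapValue`, `mapWrap`, `liftWrap` are placeholders of mine; the file wrap specifies the intended names and tests):

```haskell
data Wrap a = Error | Value a deriving (Eq, Show)

-- Wrap a value (the analogue of eta).
wrapValue :: a -> Wrap a
wrapValue = Value

-- Change a wrapped value (the analogue of mapList).
mapWrap :: (a -> b) -> Wrap a -> Wrap b
mapWrap _ Error     = Error
mapWrap f (Value x) = Value (f x)

-- Chain computations that may fail (the analogue of lift):
-- an error short-circuits the rest of the chain.
liftWrap :: (a -> Wrap b) -> Wrap a -> Wrap b
liftWrap _ Error     = Error
liftWrap g (Value x) = g x
```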
Exercise: Head over to wrap and implement these functions so that they pass the tests specified in the file.

Summary and Outlook
We started with monoids for several reasons. First, they are a fundamental structure in computer science, not least because monoid actions model time-irreversible behaviour. Second, monoids are one of the simplest algebraic structures. Third, groups are special monoids and play a central role in much of mathematics. Fourth, we hinted at the relevance of these structures to algebraic data types. In the next lecture, we will explain this in more detail. We will also generalise monoids to universal algebras and give our first axiomatic answer to the question of what algebraic structure is.
References
For more on automata see
For more on the List monad in Haskell see
Further Reading
School of Haskell: The List Monad.
Monday Morning Haskell: The Reader and Writer Monad with source code.
…
This is just a quite random list of sources that came to mind while writing these notes. They are not directly relevant, but before I forget about them again, I thought I would keep them here. Pirog's Eilenberg–Moore Monoids and Backtracking Monad Transformers and other related papers contain more interesting examples/references on programming with monads. A good starting point would be Wadler's How to Replace Failure by a List of Successes or Bird's program to solve Sudoku. More Sudoku solvers in Haskell. I originally planned to say something about syntactic monoids; my reference for this was Pin's Syntactic Semigroups. The idea of time as a monoid has many manifestations. I was reminded for example of the coalgebraic semantics of timed processes.
…
Programming with the List Monad
…
Stackoverflow on Set, List and Monad Comprehension. This explains how to extend list-comprehension to all monads.
…
Code
Monads can be implemented in any programming language. Here is an example of monads in ML. ↩︎
The notation \(X^\ast\) is widely used. The use of \((-)^\ast\) is specific to the example of words. In fact, other sources write \(TX\) for what I write \(X^\ast\). I may use \(M\) later for monads but I already use \(M\) for monoids, and \(T\) is another conventional letter for monads. \(T\) may remind us of "triple" (an older name for monads) or of "term" (we will see that the elements of \(TX\) can be understood as equivalence classes of terms over variables in \(X\)) or of "type" (many important type constructors are monads). ↩︎
`lift g t` is usually written as `t >>= g` (or should we write `as >>= bs`?). If you know monads in Haskell, try to match them up with the Kleisli notation. ↩︎
DFAs are right actions. DFAs cannot be reversed because not all functions have inverses. On the other hand, relations can be reversed and non-deterministic automata give rise to both left and right actions. ↩︎
(e.g. integers with addition but not integers with multiplication) ↩︎
There is an analogy here with Church numerals. Church numerals are functions used to represent numbers. Here we went the other way round and used numbers to define functions. ↩︎
Read the rule \(\frac{A}{B}\) as "\(A\) implies \(B\)", or "if \(A\) then \(B\)", or "assuming \(A\), conclude \(B\)". ↩︎
We define the quotient as the set \(M/{\equiv}\) of equivalence classes \([x]\) of elements of \(M\). The neutral element is \([e]\). Multiplication is defined as \([x]\cdot [y] = [x\cdot y]\).
We have to show that this definition does not depend on the choice of "representatives" \(x,y\). But this is exactly what the definition of congruence guarantees. ↩︎
We define a function \(h:M^\ast\to M\) inductively as \(h(\epsilon)=e\) and \(h(x\,w)=x\cdot h(w)\), and let \(v\equiv w\) iff \(h(v)=h(w)\). It remains to show that \(\equiv\) is a congruence relation. These types of definitions always give equivalence relations. So we need to show that the congruence rule holds. This involves a somewhat tedious unfolding of the relevant definitions plus one lemma that is of independent interest, \(h(v\cdot w)=h(v)\cdot h(w)\), which can be proved by induction on the structure of lists. ↩︎
The idea of so-called abstract data types is easy to explain. An interface for a data type such as queues can be described independently from any implementation by operations.

The operation `head` may also be called `front` (for fifo) or `end` (for lifo), but here we want to emphasize that both lifo and fifo are based on the same types (or sorts) `Queue` and `Elem` and the same operations (or signature).

Nevertheless, the difference between lifo and fifo does show up in an implementation-independent, purely algebraic way, namely in the equations that hold for them.
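For illustration, such an interface could be phrased in Haskell as a type class, together with a lifo instance backed by lists. The operation names around `head` are my guesses, since the original signature did not survive the export:

```haskell
-- An abstract interface for queues: types and operations, no implementation.
class Queue q where
  empty :: q e
  add   :: e -> q e -> q e
  qhead :: q e -> e          -- 'head' in the text; renamed to avoid Prelude's head

-- A lifo (stack) instance: qhead returns the most recently added element.
instance Queue [] where
  empty = []
  add   = (:)
  qhead = head
```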
For example, for lifo we have equations saying that `head` returns the most recently added element, which do not hold for fifo. (Exercise: Write out more equations for lifo and fifo.) ↩︎
Associativity tells us that all terms can be normalised by moving all parentheses to the left (or right). Lists can be understood as an inductive definition of these normal forms. ↩︎
Haskell's predefined lists are more readable, but one must rely on some "secret" Haskell internals. ↩︎
Renaming: \(X\) and \(Y\) become `a` and `b`, \(\eta\) becomes `eta`, \(f^\ast\) becomes `mapList f`, \(\mu\) becomes `mu`, and \(g^\sharp\) becomes `lift g`. ↩︎

Suppose we have two functions
and we want to compute \(f\) of \(g\) of \(x\) for some integer \(x\). Notice that the types don't match up, we cannot simply write `f . g` or `f (g x)`. The solution is to use `f (lift g x)`. ↩︎