We will go through parts of Chapter 2 of Introduction to Automata Theory, Languages, and Computation (IATLC) (available online at the time of writing) and implement it in Haskell. You can read these notes in at least two different ways.
Whether you read this to learn about automata or about Haskell, the main point of these notes is provide some background that will help us to understand mathematics as a specification and programming language. In brief, the aim is
In this section, which can be skipped, we list all programs and how they align with the Automata Theory thread and with the Haskell Programming thread.
The programs are available online and can be run in a browser. Use the "Run" button of the "Output" tab to execute. Click the "Code" tab to see the code. Fork to change the code.
If you are new to Haskell you can start with my First Steps in Haskell.
The programms and which Haskell features they illustrate: [3]
main
, print
, $
)newtype
)data
)Maybe
, do
, return
)Now let us get started.
In this section, which can be skipped, I list the parts that are most relevant if you want to ignore the Haskell Programming thread and only want to follow the Automata Theory thread (see the table of contents on the top left).
The concept of a finite state machine, or deterministic finite automaton (DFA), can be explained in a picture. For example,
is an automaton with two states
Exercise: The automaton above recognizes precisely all words that end with
In our programming language, the same information can be represented as [5]
initial = 0
final 1 = True
final 0 = False
delta 0 'a' = 0
delta 0 'b' = 1
delta 1 'a' = 0
delta 1 'b' = 1
This program has the advantage that the functions can be executed. Head over to automata0a to run it.
Exercise: Modify the program to obtain an automaton that recognizes all words ending in
Here is an example from the book.
Exercise: Draw a picture of an automaton recognizing exactly the strings of
Answer. [6]
Exercise: Implement the automaton from the previous exercise.
We conclude this section with the official definition from the textbook IATLC:
One of the most powerful features of many modern programming languages is a strong type system. To fully appreciate the advantages of expressive type systems takes some experience, but we can summarize the following important points.
(can be skipped if you want to focus only on automata)
Even though there is wide agreement that types are beneficial, they do restrict the freedom of the programmer. [9] To alleviate some of the burden that types impose, Haskell provides type inference. If you load the program above into the Haskell console you can ask Haskell for the types of variables and functions:
:load main.hs
:type initial
initial :: Integer
:type final 1
final 1 :: Bool
Lines 3 and 5 tell us the types of initial
and final 1
. So far this is what we would expect more or less in any programming language. It gets more interesting when we ask for the type of final
itself:
:type final
final :: (Eq a, Num a) => a -> Bool
We might have expected to see the final :: Integer -> Bool
, saying that final
is a function that maps an integer to a truthvalue.
But the type computed by Haskell's type inference algorithm is more complicated. Haskell computes the most general type of final
. The type variable a
can be replaced by any type satisfying the constraint (Eq a, Num a)
. This constraint specifies that there a
must be equipped with equality ==
and that a
must be a number. [10]
Exercise: (Haskell) Predict the types of delta 'a' 0
and delta 'a'
and delta
.
Now we learned what types are, we may want to annotate the functions with their types:
initial :: Int
final :: Int -> Bool
delta :: Int -> Char -> Int
While we have seen in automata0a that this is not necessary, I find the code at automata0b more readable. [11] At run-time there is no difference, though.
newtype
Recall the definition of a DFA from IATLC:
We can match up the mathematical and the programming notation more closely [12] if we mirror some of the naming conventions of the textbook with the help of user defined types.
For example, we can improve the previous implementation by introducing a new type State
.
newtype State = State Int
initial = State 0
final :: State -> Bool
final (State 1) = True
final (State 0) = False
delta :: State -> Char -> State
delta (State 0) 'a' = State 0
delta (State 0) 'b' = State 1
delta (State 1) 'a' = State 0
delta (State 1) 'b' = State 1
This has the advantage that the code becomes more readable and that the typechecker can catch more errors at compile-time. [13] Btw, a newtype
is erased at compile-time and does not affect the run-time.
data
(automata0d … not needed in the following, feel free to skip)
Similarly, we may want to introduce a user defined type for the input symbols.
data Symbol = A | B
The differences between newtype
and data
are somewhat subtle and can be ignored for now. Suffice it to say that newtype
can only have one constructor (called State
in our example) where as for input symbols we want two constructors A
and B
corresponding to the letters of the alphabet. Btw, constructors in Haskell need to be in capital letters. The code is at automata0d. [14]
(automata0c2 and automata0d2 … not needed later, can be skipped)
We will briefly investigate the difference between interpreting Haskell by running ghci
in the console and compiling Haskell with ghc
or runghci
.
We also use this opportunity to show how user defined types can be used to detect an error at compile-time as opposed to at run-time.
We first interpret the two programs automata0c and automata0d on undefined input in the console. They will both fail at runtime, as one would expect.
Exercise: (Haskell) Run [15] the two automata automata0c and automata0d on an undefined input in the console. [16]
Next, we will investigate how this changes in case we compile the two automata.
When we run a compiled Haskell program, the main
function gets executed. We therefore add a test case for an undefined input to the programs automata0c2 and automata0d2.
To compile and run we can either use the shell and enter
ghc main.hs
./main
or, in repl.it, use the green "Run" button. [17]
Exercise: (Haskell) Explain why one of automata0c2 and automata0d2 fails at run-time and the other at compile time.
Answer. [18]
We will revert to using Char
for input symbols again because Char
is the type of data entered by users who run automata on their computer.
(automata01 and Section 2.2.4 of Introduction to Automata Theory, Languages, and Computation)
The next step in the development of automata theory is to run an automaton not on a single character as input but on an entire word of characters. The function that computes how state changes given a word as input is called
in the book and described as follows. (The book writes
To summarise the text above in two equations:
In fact, as a specification for a recursive program, I prefer the mathematically equivalent but tail recursive definition
In Haskell, the latter can be written as [19]
run delta q [] = q
run delta q (a:x) = run delta (delta q a) x
Some comments:
run delta
is delta
in Line 2 must match a:x
anyway, so we do not need to give it a name such as run :: (State -> Char -> State) -> State -> [Char] -> State
run
is a function that takes delta
as its argument and returns run delta
of type State -> [Char] -> State
.run
is called a higher-order function because it takes a function as an argument.While mathematicians call
a definition by induction on the length of the input string, programmers call
run delta q [] = q
run delta q (c:cs) = run delta (delta q c) cs
a definition by recursion on the algebraic data type of lists. This difference is purely one of convention and tradition. But there is a deeper difference between mathematics and programming:
Mathematics is independent of any particular programming language. This is very important. [21] On the other hand, the program runs on my laptop. Which is also really cool.
Not surprisingly then, being able to fluently translate between maths and code is a valuable skill to develop.
(automata02 … not needed in the next section, feel free to skip)
It is often convenient to work with an automaton that does not specify a transition on all possible inputs. On the programming side, we will learn how to implement partial functions in Haskell using the Maybe
monad.
What happens if you call recognize "abc"
? What is the reason for this behaviour? Why is it bad? How can we improve it?
One reaction is that we implemented only a partial transition function not defined for all of Char
. While we could think of many ways of dealing with this problem, a good way of dealing with partial functions in Haskell is the Maybe
monad. As a data type it is predefined as
data Maybe a = Nothing | Just a
We use it to change the transition function as follows.
delta :: Char -> State -> Maybe State
delta 'a' (State 0) = Just (State 0)
...
delta _ _ = Nothing
We see in line 1 that the transition function delta
now returns a value of type Maybe State
. In line 2, upon input 'a'
, it returns Just (State 0)
. In line 4, we add a default case returning Nothing
.
So far, we just used a common mathematical trick to turn a partial function into a total function by introducing a new value (here Nothing
) for inputs on which the function was previously undefined.
The particular structure called monad comes into play when we want to compose a function a -> Maybe b
with a function b -> Maybe c
. At this point we could make a big detour via the mathematical theory of monads, but this is not necessary because the Haskell notation below is easy enough to understand (get in touch if you have any doubts).
run :: (Char -> State -> Maybe State) -> [Char] -> State -> Maybe State
run delta [] q = Just q
run delta (c:cs) q = do
q' <- (delta c q)
run delta cs q'
Remark on do-notation: As we can see, the do
-notation allows us to sequence functions in a style similar to imperative programming. But, WARNING, q'
in line 4 is not a mutable variable and <-
is not an assignment. All we are doing here is giving the name q'
to the result of delta c q
. Then we can use this name when calling another function in line 5. [22]
I'd like to say that this all, but there is one more thing we need to know. The notation name <- expression
only works if expression
is of type Maybe a
for some a
(and name
then will be of type a
). Moreover, the last expression in a do block needs to be of type Maybe a
. So can you spot why the following does not typecheck?
search_in :: [Char] -> Maybe Bool
search_in word = do
q <- run delta word initial
accept q
Exercise: (Haskell) Modify the code automata02 as indicated in line 4 above. What error message do you get upon running search_in ...
? Try to understand the error message and then read at hoogle about return
. What does m
in return :: Monad m => a -> m a
refer to in our case?
By the way, in this example, one can also avoid the use of the Maybe
monad. Indeed, delta
can be turned into a total function just by adding another state to the automaton. In the following we will use this idea.
(automata03 and Section 2.4 of Introduction to Automata Theory, Languages, and Computation)
(this section about regular expressions can be skipped on first reading)
The title of this section is justified by the equivalence of regular Expressions and DFAs. Indeed, one can search for a regular expression by running the corresponding DFA.
In terms of Haskell we will learn how to "simulate" global variables by passing them on as arguments, how to combine case
with pattern matching, and how to use the IO-monad for output.
Task: Given is a DFA representing a pattern and a text. We are looking for a string in the text matching the pattern.
Solution: We run the DFA on the text and, once a final state is found, output how many steps it has taken to reach the final state. This indicates the end of the matching string in the text. [23]
Our solution is in automata03. The new idea is to extend the type of run
by
run :: ... -> (Bool,Int) -> (Bool,Int)
The first component, called found
, tracks whether a match for the regexp has been found and the second component, called position
, remembers the position in the text of the end of the match. For example, line 6 below is responsible for, in the first component, setting found
to true upon finding the first match for the regexp. From that point on, in the second component, position
will be kept constant until it is returned in line 2. On the other hand, as long as no final state has been reached, the position will be incremented in line 7.
run :: (Char -> State -> State) -> [Char] -> State -> (Bool,Int) -> (Bool,Int)
run delta [] q (found,position) = (found,position)
run delta (c:cs) q (found,position) =
case (found, accept (delta c q)) of
(True, _) -> run delta cs (delta c q) (True, position)
(False, True) -> run delta cs (delta c q) (True,position)
(False, False) -> run delta cs (delta c q) (False,position+1)
Exercise: (Haskell) Implement the following modification of the program above. Output two strings splitting the text into two parts called left and right. The right string contains the suffix of the text that remains after one found the first occurrence of the regular expression. The left string contains the largest prefix of the text excluding the right string.
(automata04 and Section 2.3.2-4 of Introduction to Automata Theory, Languages, and Computation)
Page 57 of the book defines NFAs as follows.
The only difference to DFAs is in item 5, where it says that
Since we are dealing with finite automata (hence finite subsets), we can implement
initial = State 0
final :: State -> Bool
final (State 0) = False
final (State 1) = False
final (State 2) = True
delta :: State -> Char -> [State]
delta (State 0) 'a' = [State 0, State 1]
delta (State 0) _ = [State 0]
delta (State 1) 'a' = [State 1,State 2]
delta (State 1) _ = [State 1]
delta (State 2) _ = []
Exercise: What language does this NFA recognise?
On page 58, the extended transition function is defined as follows.
We can write this more concisely in Haskell as follows (compare with the definition of run
for DFAs above).
run :: (State -> Char -> [State]) -> State -> [Char] -> [State]
run delta q [] = return q
run delta q (c:cs) = do
next <- delta q c
run delta next cs
This code is so short and elegant because Haskell knows that lists are a monad and provides the very useful do
notation for it. We should try to understand how this works at least on a high level of abstraction. So let me pause for sec.
I will try to say something of interest to both type of readers, so you will have to navigate the following, digging in or skipping over, as appropriate.
Monads play an important role in Haskell, but have important applications in mathematics and software engineering that are independent of Haskell. From a software engineering point of view, the definition of a monad can be understood as providing an interface to data types that support certain operations on the data. One way to understand why this interface is ubiquitous is to think of a monad as container in which we can wrap elements and only manipulate them in certain restricted ways. Examples of monads are sets, lists, probabilistic choice, exceptions, IO and much more.
In our case, we use the list monad. The data type of lists implements the interface of a monad.
One can (and most often does) implement lists without considering them as instances of monads. So what do we gain from using monads in this example?
We use the list monad to implement backtracking without having to manage the necessary control flow explicitely ourselves.
Remark on do-notation: First read the remark on the do-notation in the section on "Partial DFAs" above. This is almost the same code. In fact, this similarity is one of the main reasons why monads are so popular in Haskell: They formalise a design pattern that appears in many different situations. For example, to handle errors and to handle non-determinism. Some general remarks:
<-
must be of the type of the monad, here [State]
.<-
is just State
.next
as an argument of type State
in run
.[State]
. If we wanted to return an expression e
of type State
, we used return e
.Let us run
run :: (State -> Char -> [State]) -> State -> [Char] -> [State]
run delta q [] = return q
run delta q (c:cs) = do
next <- delta q c
run delta next cs
on automata04 in state Q0
on aaa
by hand:
Q0 "aaa"
[Q0,Q1] "aa"
[[Q0,Q1],[Q1,Q2]] "a"
[[[[Q0,Q1],[Q1,Q2]],[[Q1,Q2],[]]]
What the do-notation does for us is to flatten the list at each stage, so that the actual result is [Q0,Q1,Q1,Q2,Q1,Q2]
. (Note that flattening lists implements a union of sets.)
Exercise: (Haskell) Implement the NFAs of Exercise 2.3.4 of the book in Haskell.
(automata05 and Section 2.3.5 of Introduction to Automata Theory, Languages, and Computation)
It can be much easier to specify a language using an NFA than a DFA. [24] On the other hand, a DFA is easier to implement exactly because it is determinstic and there are no choices to resolve. To get the best of both worlds, it is therefore of interest to have an algorithm that converts any given NFA to a DFA accepting the same language.
The idea is easy: The states of the DFA are sets of states of the NFA. Intuitively, we run all computations of the NFA "in parallel" and, therefore, have to keep track of the set of all the states our computation could be in.
In other words, we want to define a Haskell function
nfa2dfa :: NFA s -> DFA [s]
which transforms an NFA with state space s
to a DFA with state space [s]
, that is, lists over s
.
The definition of nfa2dfa
will follow closely the book. But first we have to define the data types NFA
and DFA
. We are already familiar with the data
type constructor, but we now use "record syntax". This allows us to refer to the components of the data by name rather by position. For details on the record syntax of Haskell, see eg Learn You a Haskell.
data DFA s = DFA {
dfa_initial :: s,
dfa_final :: s -> Bool,
dfa_delta :: s -> Char -> s }
data NFA s = NFA {
nfa_initial :: s,
nfa_final :: s -> Bool,
nfa_delta :: s -> Char -> [s] }
You will recognise the same ingredients as the mathematical definition:
In the code, s
, Char
, dfa_delta
and nfa_delta
, dfa_initial
and nfa_initial
, dfa_final
and nfa_final
.
We can now continue with the function
nfa2dfa :: NFA s -> DFA [s]
nfa2dfa nfa = DFA {
dfa_initial = <...>,
dfa_final = <...>,
dfa_delta = <...> }
How do we have to complete the definition of the DFA? What do we have to replace <...>
with in the code above?
If you have not done so yet, it is time to read Section 2.3.5 of the book. In particular, you will find there the following.
nfa
can be referred to in the code as nfa_initial nfa
.nfa
can be referred to in the code as nfa_final nfa
.We discussed this algorithm in some detail in class (see also the section above). Now is the time to run it pen-and-paper on some concrete examples.
(Put the definitions of dfa_initial, dfa_final, dfa_delta
in the report and explain your code in a paragraph. Also put your modified version of automata05.hs
in yout github repository.)
Exercise: (Haskell) The file automata05 provides a template, in which you have to modify lines 71, 72, 73. Complete the definition of nfa2dfa
as described in the three bullet points above and test it with various examples. Hint: Lines 4 and 5 require to implement functions. You can do this using the let
notation. For example line 5 will look like
dfa_delta = let f in
f [] c = ...
f (q:qs) c = ...
in f,
If you implemented the exercise correctly then both the NFA and the DFA should give the same results on all tests. [25]
It would be interesting to implement more of automata theory. Non-determinstic automata with
Thanks to Sam Balco, Peter Jipsen and Drew Moshier.
John Zorn for the music.
This applies to students who take my classes Programming Languages and Compiler Construction. ↩︎
This applies to students who take my class Algorithm Analysis. ↩︎
The first interesting program in the list is automata01. I added the others later to gradually introduce various features of Haskell. Feel free to skip. ↩︎
In case you are wondering already: Yes, automata are useful for searching for text and patterns in text. We will come back to this below. ↩︎
We use integers to encode states, so we have 0
instead of a
denotes a variable. The name delta
for the transition function is traditional, see eg Chapter 2 of Introduction to Automata Theory, Languages,and Computation. ↩︎
The answer is in the book. I just copy in here the transition table that you need in order to adapt the function delta
of our program.
Readability is crucial to lower the costs of maintenance of code. ↩︎
Run-time errors are more expensive to fix than compile-time errors. This has serious implications on software design. (Footnote: Following the link, discuss the examples from the point of view of user-defined types.) ↩︎
Types can be understood as representing invariants that must be maintained during execution. Automatic checking of these invariants improves software reliability. On the other hand, sometimes programming is easier, and programs are more efficient, if one does not enforce that invariants are maintained at all times. ↩︎
If you are curious to learn more about typevariables and constraints then I recommend . To get a summary reference on Eq
and Num
type :info Eq
and :info Num
in the Console. For more see the Haskell reference which can be found via the very useful hoogle. ↩︎
-- automata0b.hs
-- define an automaton
initial = 0
final :: Int -> Bool
final 1 = True
final 0 = False
delta :: Int -> Char -> Int
delta 0 'a' = 0
delta 0 'b' = 1
delta 1 'a' = 0
delta 1 'b' = 1
-- testing
main = do
print $ delta 0 'a'
Maybe I should have implemented the transition function delta :: State -> Char -> State
, following the book. ↩︎
Admittedly, these advantages will only be felt when the code gets bigger and the type Int
is used to encode data other than states. ↩︎
-- automata0d.hs
newtype State = State Int
deriving Show
data Symbol = A | B
-- define the automaton
initial = State 0
final :: State -> Bool
final (State 1) = True
final (State 0) = False
delta :: State -> Symbol -> State
delta (State 0) A = State 0
delta (State 0) B = State 1
delta (State 1) A = State 0
delta (State 1) B = State 1
-- testing
main = do
print $ delta (State 0) A
Enter :load main.hs
in the console. Recall that the automata are only defined for inputs a
and b
, or, respectively, A
and B
. Now enter, for example, delta (State 0) 'c'
or, respectively, delta (State 0) 'C'
. ↩︎
If you want to learn more about Haskell typechecking try to explain the three different error messages you will get upon
delta C (State 0)
delta c (State 0)
delta 'c' (State 0)
If you do this not at repl.it but on your own machine enter runghc main.hs
. ↩︎
While both programs fail because they ask delta
to process undefined input, their behaviour is markedly different.
automata0c2
first prints the expected lines of output and then fails at runtime with the error message "Non-exhaustive patterns in function delta". Only when running the program, it becomes known that the definition of delta
has no case to process input 'c'
.automata0d2
fails with the message "Data constructor not in scope" and automata0d2
does not produce any output at all. It is known at compile-time that C
is not a legal input to delta
.The ability to check at compile-time whether a function definition covers all possible cases is one of the important features of algebraic data types and one of the reasons why Haskell makes building interpreters and compilers simpler than, say, Java. ↩︎
In Haskell a word is a list of characters. The type of a word is [Char]
or also String
. The empty word can be written as both []
and ""
. A word such as "abc"
can also be written as ['a','b','c']
or 'a':"bc"
. The a:x
notation denotes a list, the first element of which is a
and the remaining list of which is x
. Idiomatic Haskell would be to write a:as
instead of a:x
. (Note that a
here is not the character "a"
but a variable. Similarly, as
is just another variable, the s
has no special significance and is just conventional.) ↩︎
If you want to study programming languages (in the plural), you need some formalism that is independent of any particular programming language. ↩︎
See also the remark on do-notation in the section on non-deterministic automata below. ↩︎
Strictly speaking this is a partial solution, since we only find the end of the matching string. It is enough for a program like unix grep
that outputs all lines that contain a matching string. But it is not enough for an editor that has to highlight all matching strings. For the editor application it should be come at little cost to do a backward search to find the beginning of the matching string. If the pattern is just a plain string there is a better solution known as Knuth-Morris-Pratt. ↩︎
See eg Exercise 3.4.5 of the book. ↩︎
If you looked under the hood of what the do
-notation does for the list monad in nfa_run
, you would discover that it is actually doing exactly the same as what nfa2dfa
is doing. Under the hood, the do
-notation also involves taking a union (concatenation) of lists. ↩︎