# Deterministic finite automata (DFAs)
## Definition of DFAs
**Definition (DFA).** A *deterministic finite automaton* $\mathcal A = (Q,\Sigma, \delta, q_0, F)$ consists of:
1. a finite set of *states* $Q$
2. a finite set of *input symbols* or *labels* $\Sigma$
3. a transition function $\delta \colon Q \times \Sigma \to Q$
4. an *initial* or *start state* $q_0 \in Q$
5. a set of *final* or *accepting states* $F \subseteq Q$
If we want to describe an automaton $\mathcal A = (Q,\Sigma, \delta, q_0, F)$ we would have to list all the data $Q,\Sigma, \delta, q_0$, and $F$. Usually, this is done via either of the following:
1. a *transition diagram*, i.e., a labeled graph as we have seen in the examples
2. a *transition table* defining the function $\delta$.
**Example.**
Consider the DFA defined by the following data:
1. $Q=\{q_0,q_1,q_2\}$
2. $\Sigma=\{0,1\}$
3. $\delta(q_0,0) = q_2$, $\delta(q_0,1) = q_0$, $\delta(q_2,0) = q_2$, $\delta(q_2,1) = q_1$, $\delta(q_1,0) = \delta(q_1,1) = q_1$
4. initial state $q_0$
5. $F=\{q_1\}$
Its transition diagram is given by:

Its transition table is given by:
| | $0$ | $1$ |
| -------- | -------- | -------- |
| $\to q_0$ | $q_2$ | $q_0$ |
| $*q_1$ | $q_1$ | $q_1$ |
| $q_2$ | $q_2$ | $q_1$ |
## Formal languages
**Definition.** Let $\Sigma$ be any set. In this context, we call $\Sigma$ be an *alphabet*. The set of *words* over the alphabet $\Sigma$ is given by $$\Sigma^* := \{\varepsilon\} \cup \{ a_1 a_2 \cdots a_n \; | \; a_i \in \Sigma ~ \text{for all $i=1,\ldots,n$}\},$$[^1] where $\varepsilon = \; \; \;$ denotes the *empty word*. A *language* $L$ over $\Sigma$ is any subset $L \subseteq \Sigma^*$, i.e., simply a set of words.
[^1] Here, we are using [set-builder](https://en.wikipedia.org/wiki/Set-builder_notation).
By our definition above, we have that for an arbitrary word $w \in \Sigma^*$ we get that $w$ is either the empty word or it is the concatenation of a character with a word, i.e., we have either $w = \varepsilon$ or $w = av$, where $a \in \Sigma$ and $v \in \Sigma^*$. This is an *inductive description* of the set of words. This allows us to e.g. define the length of a word as follows:
**Definition (length).** The *length* of a word $w \in \Sigma^*$ is defined as:
$$ |w| := \begin{cases}
0 & \text{if $w = \varepsilon$} \\
|v|+1 & \text{if $w = av$ for $a \in \Sigma$, $w \in \Sigma^*$}
\end{cases}$$
**Definition (occurrence).** Let $a \in \Sigma$. The *occurrence* of a $a$ in a word $w \in \Sigma^*$ is written as $|w|_a$.
:::warning
**Exercise 1:** Can you give an inductive definition of $|w|_a$?
**Hint:** Think about you would implement this through an algorithm via *pattern-matching* the presentation of $w$.
:::
**Example**. Let $\Sigma = \{0,1\}$. The words over this alphabet are given by the set of finite binary strings $$\Sigma^* = \{\varepsilon, 0, 1, 00, 01, 10, 11, 000, 001, 010, 100, 011, 101, 110, 111, \ldots\}.$$
Examples of languages are given by the following:
$$L_1 := \{x01y \; | \; x,y \in \Sigma^*\}$$
$$L_2 := \{ w \; | \; |w| = 2^n ~\text{for some $n \in \mathbb N$}\}$$
$$ L_3 := \{ w \; | \; |w|_0 = |w|_1 \}$$
:::warning
**Exercise 2:** Determine for the following words, if they are contained in $L_1, L_2$, or $L_3$.
| | $L_1$ | $L_2$ | $L_3$ |
| -------- | -------- | -------- | -------- |
| $w_1 = 10011$ | | | |
| $w_2 = 100$ | | | |
| $w_3 = 10100100$ | | | |
| $w_4 = 1010011100$ | | | |
| $w_5 = 11110000$ | | | |
:::
Formal languages can be classified by the "complexity" with which they can be described. This is systematized in the [Chomsky hierarchy](https://en.wikipedia.org/wiki/Chomsky_hierarchy) and includes the class of [regular languages](https://en.wikipedia.org/wiki/Regular_language), generated by regular expressions. We will make this precise later.
## The language accepted by a DFA
Let $\Sigma$ be some fixed alphabet and $\mathcal A =(Q, \Sigma, \delta, q_0, F)$ be a finite automaton. The DFA $\mathcal A$ can "process" a word by tracing through it via a sequence of transitions corresponding to the letters of the word.
:::warning
**Exercise 3:** Consider the DFA from above:

Consider the paths corresponding to the words $w_1 = 0010$, $w_2 = 1101$, and $w_3 = 1100$. For which of these words does their run end in the accepting state?
:::
**Definition.** We call $$L(\mathcal A) := \{w \in \Sigma^* \; | \; \text{The run for $w$ in $\mathcal A$ ends in some $q \in F$.} \}$$
the language *accepted* by $\mathcal A$.
## Summary and outlook
We have now made the concept of DFAs and accepted languages formally precise. This will allow us to write proper specifications from which we then can start implementing the theory.