Regular expressions

# Regular expressions Regular expressions are a ubiquitous tool in text processing. They are a slick way to describe formal languages. For example, $R=(\mathbf{0}+\mathbf{1})^* \mathbf{0}\mathbf{1}(\mathbf{0}+\mathbf{1})^*$ describes the language $L = L(R)$ that consists of words with $\mathbf{0}\mathbf{1}$ in the middle. We want to make this precise now. ## Set operations The semantics of regular expressions will rely on the following operations on sets: * **union** $\cup$: This is the usual set-theoretic union, i.e., for languages $L,M$ we have: $L \cup M := \{ x \; | \; (x \in L) ~ \text{or} ~ (x \in M) \}$ For example, for $L=\{0,1,10\}$ and $M = \{\varepsilon, 0, 1\}$ we have $L \cup M = \{\varepsilon, 0, 1, 10\}$. * **contenation** $\cdot$: For languages $L,M$ we define their concatenation as $L \cdot M := LM := \{vw \; | \; v \in L, w \in M\}$. For example, for $L= \{01,10\}$, $M = \{\varepsilon, 0,1\}$, we have: $LM = \{01\varepsilon, 010, 011, 10\varepsilon, 100, 101,\ldots\}$ $= \{ 01, 010, 011, 10, 100, 101,\ldots \}$ * **Kleene closure/Kleene star ${}^*$:** For a language $L$, we define $L^* := \bigcup_{i = 0}^\infty L^i$, with $L^0 := \{\varepsilon\}$ and $L^{i+1} := L^i \cdot L$, for $i \geq 1$. For example, for $L := \{0,1\}$, we have $L^* = \{ \varepsilon, 0,1, 00, 01, 10, 11,000, 001, \ldots \}$ Now, regular expressions are introduced *inductively* as follows: ## Regular expressions -- Base cases 1. **Syntax:** $\emptyset$ and $\varepsilon$ are regular expressions. **Semantics:** $L(\emptyset) = \emptyset$ and $L(\varepsilon) = \{\varepsilon\}$. 2. **Syntax:** If $a \in \Sigma$, then $\textbf{a}$ is a regular expression. **Semantics:** $L(\mathbf a) = \{a\}$. 4. **Syntax:** If $L, K, \ldots$ are *variables* then $L, K, \ldots$ are regular expressions. **Semantics:** Every variable $L, K, \ldots$ is interpreted as a language. ## Regular expressions -- Step cases 1. **Syntax:** If $E,F$ regular expressions, then their *sum* $E+F$ is a regular expression. **Semantics:** $L(E+F) = L(E) \cup L(F)$ 2. **Syntax:** If $E,F$ regular expressions, then their *concatenation* $EF$ is a regular expression. **Semantics:** $L(EF) = L(E)L(F)$ 3. **Syntax:** If $E$ is a regular expression, then its *Kleene closure* $E^*$ is a regular expression. **Semantics:** $L(E^*) = L(E)^*$ 4. **Syntax:** If $E$ is a regular expression, then $(E)$ is a regular expression. **Semantics:** $L((E)) = L(E)$ The order of precedence of these operations is (from high to low): 1. $*$: Kleene star 2. $\cdot$: concatenation 3. $+$: sum For example, $R = 01^*+1 = ((0(1^*)) + 1)$. # From regular expressions to DFAs/NFAs We have shown that every DFA can be captured by a regular expression. Conversely, given any regular expression, can we construct a DFA whose language is described by that expression? Yes, we can indeed. :::warning **Exercise (From regex to automata).** 1. For each of the following languages $L_i$, $i=1,2,3$, construct an NFA $\mathcal A_i$ accepting $L_i$: 1.1 $L_1 = \emptyset$ 1.2 $L_2 = \{\varepsilon\}$ 1.3 $L_3 = \{ a \}$ for any $a \in \Sigma$ 2. Let $\mathcal A$ and $\mathcal B$ be NFAs. Describe how to construct NFAs $\mathcal C_i$, $i=1,2,3$, such that: 2.1 $L(\mathcal C_1) = L(\mathcal A) \cup L(\mathcal B)$ 2.2 $L(\mathcal C_2) = L(\mathcal A) L(\mathcal B)$ 2.3 $L(\mathcal C_3) = (L(\mathcal A))^*$ 3. Using 1. and 2., argue that for any regular expression $R$ we can find an NFA $\mathcal A_R$ such that $L(\mathcal A_R) = L(R)$. Why does this give rise even to a *DFA* with that property? ::: # From DFAs/NFAs to regular expressions See [Kleene's algorithm](https://hackmd.io/@jweinberger/Bk5fTUCOZl).

Syntax	Example	Reference
# Header	Header	基本排版
- Unordered List	Unordered List
1. Ordered List	Ordered List
- [ ] Todo List	Todo List
> Blockquote	Blockquote
Bold font	Bold font
Italics font	Italics font
~~Strikethrough~~	~~Strikethrough~~
19^th^	19^th
H~2~O	H₂O
++Inserted text++	Inserted text
==Marked text==	Marked text
[link text](https:// "title")	Link
![image alt](https:// "title")	Image
`Code`	`Code`	在筆記中貼入程式碼
```javascript var i = 0; ```	`var i = 0;`
:smile:		Emoji list
{%youtube youtube_id %}	Externals
$L^aT_eX$	L^aT_eX
:::info This is a alert area. :::	This is a alert area.