---
tags: Formal Language
title: fundemental
---
# Formal language
> [TOC]
## Mathematical notation
### Sets
* A set is a collection of elements.
* $x$ is an element of the set $S$, we write $x\in S$.
* $x$ is not in S is written $x\notin S$.
* When we need, we use more explicit notation, for example, $S = \{i:i>0, i$ is even$\}=\{2,4,6,...\}$
* We read this as "$S$ is the set of all $i$m such that $i$ is greater than zero, and $i$ is even"
#### set operations
* union ($\cup$), intersection ($\cap$), and difference ($-$)
* $S_1\cup S_2=\{x:x\in S_1$ or $x\in S_2\}$
* $S_1\cap S_2=\{x:x\in S_1$ and $x\in S_2\}$
* $S_1-S_2=\{x:x\in S_1$ and $x\notin S_2\}$
* complementation
* $\overline{S}=\{x:x\in U,x\notin S\}$, where $U$ is the universal set (of all possible elements in).
* empty set is denote by $\emptyset$
* $S\cup \emptyset=S-\emptyset=S$
* $S\cap \emptyset = \emptyset$
* $\overline{\emptyset}=U$
* $\overline{\overline{S}}=S$
* DeMorgan's laws:
* $\overline{S_1 \cup S_2} = \bar{S_1}\cap \bar{S_2}$
* $\overline{S_1 \cap S_2} = \bar{S_1}\cup \bar{S_2}$
* The size of the set $S$ is denoted by $|S|$.
* A set is said to be finite if $|S|$ is finite; otherwise, it is infinite.
* powerset of $S$ is denoted by $2^S$.
::: info
ex.
If S is the set $\{a,b,c\}, the its powerset is
$2^S=\{\emptyset, \{a\},\{b\},\{c\},\{a,b\},\{a,c\},\{b,c\},\{a,b,c\}\}$.
Hence, $|S|=3$ and $|2^S|=8$.
It is obvious that if $S$ is finite, then $|2^S|=2^{|S|}$
:::
* Cartesian product of two sets, which itself is a set of ordered pairs, we write
* $S=S_1\times S_2=\{(x,y):x\in S_1,y\in S_2\}$
::: info
ex.
Let $S_1=\{2,4\}$ and $S_2=\{2,3,5,6\}$. Then
$S_1\times S_2=\{(2,2),(2,3),(2,5),(2,6),(4,2),(4,3),(4,5),(4,6)\}$
:::
* $S_1\times S_2\times \cdots \times S_n=\{(x_1,x_2,\ldots,x_n):x_i\in S_i\}$
### Languages
* A formal language is an abstraction of the general characteristic of programming languages, C, C++, Java, ...
* A language consists of
* a set of symbols
* some rules of formation (grammar or automata)
* A formal language is the set of all string permitted by the rules of formation.
* A finite nonempty set of symbols is required, called the alphabet, denoted by $\Sigma$.
* Strings are sequences of symbols from alphabet.
::: info
if $\Sigma = \{a,b\}$, then $abab$ and $aaabbb$ are strings on $\Sigma$.
we will write, for example, $w=abaaa$ to ndicate that the string named $w$ has the specific value $abaaa$
:::
* concatenation of two strings $w$ and $v$ is the string obtained by appending the symbols of $v$ to the right end of $w$.
* $w=a_1a_2a_3$ and $v=b_1b_2b_3b_4$
* the concatenation $w$ and $v$, denoted by $wv=a_1a_2a_3b_1b_2b_3$
* the reverse of a string $w$ is denoted by $w^R$.
* $w^R=a_3a_2a_1$ (obtained by writing the symbols in reverse order)
* the length of a string $w$ is denoted by $|w|$.
* $|\lambda|=0$ (empty string is denoted by $\lambda$)
* $\lambda w=w \lambda=w$ hold for all w
* any string of consecutive symbols in some $w$ is said to be a substring of $w$.
* if $w=uv$ then the substring $u$ and $v$ are said to be a prefix and a suffix of $w$, resp.
* for example, $w=abb$, then $\{\lambda, a, ab, abb\}$ is the set of all prefixes of $w$. $b$,$bb$ are some of its suffixes.
* $w^n$ stands for the string obtained by repeating $w$ $n$ times.
* $w^0=\lambda$ for all $w$
* $w^n=w^{n-1}w$
* $\Sigma^*$ denotes the set of strings obtained by concatenating zero or more symbols from $\Sigma$.
* $\Sigma^0 = \{\lambda\}$
* $\Sigma^1 = \Sigma$
* $\Sigma^n = \Sigma^{n-1} \times \Sigma$
* $\Sigma^+ = \Sigma^* - \Sigma^0$
* A language is defined as a subset if $\Sigma^*$.
* A string in a language $L$ will be called a sentence of $L$.
* Any set of strings on an alphabet $\Sigma$ can be considered a language.
:::info
Let $\Sigma =\{a,b\}$. Then
$\Sigma^* = \{\lambda, a,b, aa, ab, ba, bb, aaa, aab, ...\}$
* The set $\{a,aa,aab\}$ is a language on $\Sigma$.
* The set $L=\{a^nb^n:n\geq 0\}$ is a language on $\Sigma$
* $aabb$ and $aaaabbbb$ are in $L$
* but $abb\notin L$ (the string $abb$ is not a sentence of $L$)
* The language is infinite
* Most interesting languages are infinite.
:::
* So languages are sets, operation of two language are defined, union, intersection, difference, complement, reverse, concatenation, star-closure, positive closure.
* $\overline{L} = \Sigma^* -L$
* $L^R=\{w^R:w\in L\}$ (the reverse of language is the set of all string reversals.)
* $L_1L_2 = \{xy:x\in L_1,y_\in L_2\}$ (the concatenation of two language $L_1$ and $L_2$ is the set of all string obtained by concatenating any string in $L_1$ with any string in $L_2$)
* $L^n$ as $L$ concatenated with itself $n$ times,
* $L^0 = \{\lambda\}$
* $L^1 = L$
* $L^n = L^{n-1}L$
* star-closure of $L$
* $L^* = L^0\cup L^1 \cup L^2\cup \cdots$
* positive-closure of $L$
* $L^+ = L^1\cup L^2\cup \cdots$
::: info
If $L=\{a^nb^n:n\geq 0\}$,
then,
$L^2=\{a^nb^na^mb^m: n\geq 0, m\geq 0\}$
Note that $n$ and $m$ in the above are unrelated.
The string $aabbaaabbb\in L^2$.
$L^R=\{b^na^n: n\geq 0\}$
:::
### Automata
* An automata is an abstract model of a dgital computer.
* input file (it can read input. it will be assume that the input is a string over a given alphabet written on an input file, may be a tape)
* the input file is divided into cells, each of which can hold one symbol.
* The input mechanism can read the input fule from left to right, one symbol at a time. can also detect the end of input string.
* Automata can produce output
* It may have a temporary storage, consisting of an unlimited number of cells.
* it has a control unit, contains finite number of internal states.
* At any given time, the control unit is in some state, and input mechanism is scanning a particular symbol on the input file.
* The internal state of the control unit at the next time step is determined by transition function.
## Chapter 2 Finite Automata (FA)
Objectives:
* DFA (deterministic finite state automata)
* NFA (nondeterministic finite state automata)
* $DFA = NFA$
* minimal DFA
:::info
Definition of DFA
A deterministic finite automata M consists of 5-tuples.
* Q: a finite nonempty set of states
* $\Sigma$: a finite nonempty set of input symbols
* $\delta$: a state transition function such that $\delta(q,a)=p$
(Mean: $M$, in state $q$, read input $a$ and then enter state $p$)
* $q_0$: a start state
* $F$: a nonempty set of final or accepting states
:::
::: success
* Show that the language $L=\{w:n_a(w)<n_b(w)<n_c(w)\}$ is not context free.
:::