--- tags: Formal Language title: fundemental --- # Formal language > [TOC] ## Mathematical notation ### Sets * A set is a collection of elements. * $x$ is an element of the set $S$, we write $x\in S$. * $x$ is not in S is written $x\notin S$. * When we need, we use more explicit notation, for example, $S = \{i:i>0, i$ is even$\}=\{2,4,6,...\}$ * We read this as "$S$ is the set of all $i$m such that $i$ is greater than zero, and $i$ is even" #### set operations * union ($\cup$), intersection ($\cap$), and difference ($-$) * $S_1\cup S_2=\{x:x\in S_1$ or $x\in S_2\}$ * $S_1\cap S_2=\{x:x\in S_1$ and $x\in S_2\}$ * $S_1-S_2=\{x:x\in S_1$ and $x\notin S_2\}$ * complementation * $\overline{S}=\{x:x\in U,x\notin S\}$, where $U$ is the universal set (of all possible elements in). * empty set is denote by $\emptyset$ * $S\cup \emptyset=S-\emptyset=S$ * $S\cap \emptyset = \emptyset$ * $\overline{\emptyset}=U$ * $\overline{\overline{S}}=S$ * DeMorgan's laws: * $\overline{S_1 \cup S_2} = \bar{S_1}\cap \bar{S_2}$ * $\overline{S_1 \cap S_2} = \bar{S_1}\cup \bar{S_2}$ * The size of the set $S$ is denoted by $|S|$. * A set is said to be finite if $|S|$ is finite; otherwise, it is infinite. * powerset of $S$ is denoted by $2^S$. ::: info ex. If S is the set $\{a,b,c\}, the its powerset is $2^S=\{\emptyset, \{a\},\{b\},\{c\},\{a,b\},\{a,c\},\{b,c\},\{a,b,c\}\}$. Hence, $|S|=3$ and $|2^S|=8$. It is obvious that if $S$ is finite, then $|2^S|=2^{|S|}$ ::: * Cartesian product of two sets, which itself is a set of ordered pairs, we write * $S=S_1\times S_2=\{(x,y):x\in S_1,y\in S_2\}$ ::: info ex. Let $S_1=\{2,4\}$ and $S_2=\{2,3,5,6\}$. Then $S_1\times S_2=\{(2,2),(2,3),(2,5),(2,6),(4,2),(4,3),(4,5),(4,6)\}$ ::: * $S_1\times S_2\times \cdots \times S_n=\{(x_1,x_2,\ldots,x_n):x_i\in S_i\}$ ### Languages * A formal language is an abstraction of the general characteristic of programming languages, C, C++, Java, ... * A language consists of * a set of symbols * some rules of formation (grammar or automata) * A formal language is the set of all string permitted by the rules of formation. * A finite nonempty set of symbols is required, called the alphabet, denoted by $\Sigma$. * Strings are sequences of symbols from alphabet. ::: info if $\Sigma = \{a,b\}$, then $abab$ and $aaabbb$ are strings on $\Sigma$. we will write, for example, $w=abaaa$ to ndicate that the string named $w$ has the specific value $abaaa$ ::: * concatenation of two strings $w$ and $v$ is the string obtained by appending the symbols of $v$ to the right end of $w$. * $w=a_1a_2a_3$ and $v=b_1b_2b_3b_4$ * the concatenation $w$ and $v$, denoted by $wv=a_1a_2a_3b_1b_2b_3$ * the reverse of a string $w$ is denoted by $w^R$. * $w^R=a_3a_2a_1$ (obtained by writing the symbols in reverse order) * the length of a string $w$ is denoted by $|w|$. * $|\lambda|=0$ (empty string is denoted by $\lambda$) * $\lambda w=w \lambda=w$ hold for all w * any string of consecutive symbols in some $w$ is said to be a substring of $w$. * if $w=uv$ then the substring $u$ and $v$ are said to be a prefix and a suffix of $w$, resp. * for example, $w=abb$, then $\{\lambda, a, ab, abb\}$ is the set of all prefixes of $w$. $b$,$bb$ are some of its suffixes. * $w^n$ stands for the string obtained by repeating $w$ $n$ times. * $w^0=\lambda$ for all $w$ * $w^n=w^{n-1}w$ * $\Sigma^*$ denotes the set of strings obtained by concatenating zero or more symbols from $\Sigma$. * $\Sigma^0 = \{\lambda\}$ * $\Sigma^1 = \Sigma$ * $\Sigma^n = \Sigma^{n-1} \times \Sigma$ * $\Sigma^+ = \Sigma^* - \Sigma^0$ * A language is defined as a subset if $\Sigma^*$. * A string in a language $L$ will be called a sentence of $L$. * Any set of strings on an alphabet $\Sigma$ can be considered a language. :::info Let $\Sigma =\{a,b\}$. Then $\Sigma^* = \{\lambda, a,b, aa, ab, ba, bb, aaa, aab, ...\}$ * The set $\{a,aa,aab\}$ is a language on $\Sigma$. * The set $L=\{a^nb^n:n\geq 0\}$ is a language on $\Sigma$ * $aabb$ and $aaaabbbb$ are in $L$ * but $abb\notin L$ (the string $abb$ is not a sentence of $L$) * The language is infinite * Most interesting languages are infinite. ::: * So languages are sets, operation of two language are defined, union, intersection, difference, complement, reverse, concatenation, star-closure, positive closure. * $\overline{L} = \Sigma^* -L$ * $L^R=\{w^R:w\in L\}$ (the reverse of language is the set of all string reversals.) * $L_1L_2 = \{xy:x\in L_1,y_\in L_2\}$ (the concatenation of two language $L_1$ and $L_2$ is the set of all string obtained by concatenating any string in $L_1$ with any string in $L_2$) * $L^n$ as $L$ concatenated with itself $n$ times, * $L^0 = \{\lambda\}$ * $L^1 = L$ * $L^n = L^{n-1}L$ * star-closure of $L$ * $L^* = L^0\cup L^1 \cup L^2\cup \cdots$ * positive-closure of $L$ * $L^+ = L^1\cup L^2\cup \cdots$ ::: info If $L=\{a^nb^n:n\geq 0\}$, then, $L^2=\{a^nb^na^mb^m: n\geq 0, m\geq 0\}$ Note that $n$ and $m$ in the above are unrelated. The string $aabbaaabbb\in L^2$. $L^R=\{b^na^n: n\geq 0\}$ ::: ### Automata * An automata is an abstract model of a dgital computer. * input file (it can read input. it will be assume that the input is a string over a given alphabet written on an input file, may be a tape) * the input file is divided into cells, each of which can hold one symbol. * The input mechanism can read the input fule from left to right, one symbol at a time. can also detect the end of input string. * Automata can produce output * It may have a temporary storage, consisting of an unlimited number of cells. * it has a control unit, contains finite number of internal states. * At any given time, the control unit is in some state, and input mechanism is scanning a particular symbol on the input file. * The internal state of the control unit at the next time step is determined by transition function. ## Chapter 2 Finite Automata (FA) Objectives: * DFA (deterministic finite state automata) * NFA (nondeterministic finite state automata) * $DFA = NFA$ * minimal DFA :::info Definition of DFA A deterministic finite automata M consists of 5-tuples. * Q: a finite nonempty set of states * $\Sigma$: a finite nonempty set of input symbols * $\delta$: a state transition function such that $\delta(q,a)=p$ (Mean: $M$, in state $q$, read input $a$ and then enter state $p$) * $q_0$: a start state * $F$: a nonempty set of final or accepting states ::: ::: success * Show that the language $L=\{w:n_a(w)<n_b(w)<n_c(w)\}$ is not context free. :::