Compiler Designs, Homework Assignment 1

# Compiler Designs, Homework Assignment 1 > Unicode subscript characters section... It's my saviour. > > Edit: I sometimes use "we" as the subject because I had been writing something on Wikiversity recently and it somehow sound more formal for me, but I actually did the homework all by myself. > [name=林濬祺][time=2024 04 03][color=#81f] ## 1. Describe the languages accepted by the following NFA. (30%) > They are actually all DFAs, not just NFAs. [name=林濬祺][time=2024 04 03][color=#81f] ### a. For a sequence that ends in a double-circle node, we start with $s_0$ ($\epsilon$) and this is a valid endpoint. Then we have two branches: `a(a|b)*` for the $s_1$ branch, but it never reaches a double-circle node; and the branch that goes to $s_2$ produces `b`, If you loop there you can produce indefinitely long string of `b`s, and you can either end there, which produces `b+` or come back to end in $s_0$, which adds an `a` at the end. That makes total string of one loop a `b+a?`. You can go on If you reconsider, you will find that `[b+a]*b+a?` is just `[b+a]*b+?`. ### b. At first is the two cycles from $s_0$ to $s_0$: `01` is $s_0 \to s_1 \to s_0$, and the `10` is the path to and from $s_2$. Both the two loop can go on 0 ~ infinite times, so `(01|10)*`. The second part is the routes $s_0 \to s_3$, you can go through either of the two branches: $s_1$ and $s_2$, and they produces `00`, `11` respectively. Finally you get to $s_3$, you can choose to go through the loop and produce `0|1` for any times, or terminate. Thus totally they produce `(01|10)*(00|11)(0|1)*`. ### c. Explanations are left out as an exercise for the readers: $s_0\to s_1\to s_0$:`(b*(ab))*` $s_0\to s_1\to s_2$ `aa` $s_2\to s_2\to s_3$ `a*b` Similarly $s_3\to s_4$ `a*b` $s_4\to s_6$ same as $s_0\to s_2$ `(b*(ab))*aa` Finally the loop at $s_6$: `(a|b)*` In whole: `(b*ab)*aaa*ba*b(b*ab)*aa(a|\b)*`. ## 2. Construct a DFA accepting each of the following language (30%) ### a. ```graphviz digraph{ node[shape=circle] rankdir=LR a[shape=none, label=""] a->s₀ s₀->s₁[label=" a"] s₁->s₁[label="a, b"] s₁->s₂[label=b] s₂->s₃[label=a] s₃->s₄[label=b] s₄->s₅[label=a] s₅[peripheries=2] s₅->s₅[label="a, b"] } ``` ### b. The subscript number of s marks length of the current consecutive `1`s. ```graphviz digraph{ rankdir=LR node[shape=circle] a[shape=none, label=""] a->sₐ sₐ->s₀[label=0] sₐ->s₁[label=1] s₀->s₁[label=1] s₁->s₂[label=1] s₂->s₀[label=0] s₁->s₀[label=0] node[peripheries=2] s₂->s₃[label=1] s₃->s₃ₐ[label=0] s₃ₐ->s₃[label=1] s₃->s₃[label=1] } ``` ### c. We keep track of the `|b|%3-|a|%2` in the subscript of $s$. ```graphviz digraph{ node[shape=circle] rankdir=LR a[shape=none, label=""] s₀[peripheries=2] a->s₀ s₀->s₁[label="a,b"] s₁->s₂[label=b] s₂->s₀[label=b] s₁->s₀[label=a] s₀->s₀[label=c] s₁->s₁[label=c] s₂->s₂[label=c] } ``` ## 3. Give a regular expression for the set recognized by the finite-state machine (10%) **There is no starting arrow**. We assume $s_0$ is the starting node. We get `1+(01)*`, or in star-only form, `11*(01)*`. ## 4. Consider the expression (30%) `(01 | 10 | 00)* 11` ### a. Use Thompson's construction to construct an NFA for each reg-ex. We annotate `ab` like this because we generally think there is no need of the $q_{1a}\to q_1$ there; they can combine into one node no matter when. The proof is quite easy: It is guaranteed that no other node will point into $q_i$, only $q_{ia}$ will be a target; and $q_{ia}$ has no other next-states except $\epsilon$ to $q_i$. We can conclude that $q_{ia}$ will be minimized, grouped with $q_i$ in (c.) of this problem. ```graphviz digraph{ node[shape=circle] rankdir=LR fixedsize=true q₀->q₁ₐ[label=a] q₁ₐ->q₁[label=ϵ] q₁->q₂[label=b] } ``` At first, we expand everything outside the parentheses, which turns out to be: ```graphviz digraph{ rankdir=LR node[shape=circle] fixedsize=true a[shape=none, label=""] a->q₀ q₀->q₁[label=ϵ] q₀->q₃ₐ[label=ϵ] q₁->q₂[label="01|10|00"] q₂->q₃ₐ->q₃[label=ϵ] q₂->q₁[label=ϵ splines=true] q₃->q₄ₐ[label=1] q₄ₐ->q₄[label=ϵ] node[peripheries=2] q₄->q₅[label=1] } ``` We furtherly expand `01|10|00` (assume LtR of the combination of pipes, i.e. `(01|10)|00`) into the full answer of this problem. ```graphviz digraph{ rankdir=LR node[shape=circle fixedsize=true] a[shape=none label=""] a->q₀ q₀->q₁[xlabel=ϵ] q₀->q₃ₐ[xlabel=ϵ] q₁->{q₆ q₇}[xlabel=ϵ] q₆->{q₉ q₈}[xlabel=ϵ] q₇->q₁₀ₐ[xlabel=0] q₁₀ₐ->q₁₀[label=ϵ] q₁₀->q₁₁[xlabel=0] q₈->q₁₂ₐ[xlabel=0] q₁₂ₐ->q₁₂[label=ϵ] q₁₂->q₁₃[xlabel=1] q₉->q₁₄ₐ[xlabel=1] q₁₄ₐ->q₁₄[label=ϵ] q₁₄->q₁₅[xlabel=0] {q₁₃ q₁₅}->q₁₆[xlabel=ϵ] {q₁₁ q₁₆}->q₂[xlabel=ϵ] q₂->q₃ₐ->q₃[label=ϵ] q₂->q₁[label=ϵ] q₃->q₄ₐ[label=1] q₄ₐ->q₄[label=ϵ] node[peripheries=2] q₄->q₅[xlabel=1] } ``` ### b. Convert NFA to DFA. The transition table of the NFA would be: | q-node | ϵ | 1 | 0 | ϵ-closure set | | ------ | ---- | --- | --- | ------------------- | | 0 | 1 3a | | | 0 1 3a 3 6 7 8 9 | | 1 | 6 7 | | | 1 6 7 8 9 | | 2 | 1 3a | | | 1 2 3a 3 6 7 8 9 | | 3a | 3 | | | 3a 3 | | 3 | | 4a | | 3 | | 4a | 4 | | | 4a 4 | | 4 | | 5 | | 4 | | 5 | | | | 5 | | 6 | 8 9 | | | 6 8 9 | | 7 | | | 10a | 7 | | 8 | | | 12a | 8 | | 9 | | 14a | | 9 | | 10a | 10 | | | 10a 10 | | 10 | | | 11 | 10 | | 11 | 2 | | | 1 2 3 6 7 8 9 11 | | 12a | 12 | | | 12a 12 | | 12 | | 13 | | 12 | | 13 | 16 | | | 1 2 3 6 7 8 9 13 16 | | 14a | 14 | | | 14a 14 | | 14 | | | 15 | 14 | | 15 | 16 | | | 1 2 3 6 7 8 9 15 16 | | 16 | 2 | | | 1 2 3 6 7 8 9 16 | The transition graph for DFA is given below: (`i-` means `i` and `ia`; and `j...` means the ϵ-closure set of `j`) | d-node | q-node(s) | 1 | 0 | | ------ | ---------------------- | ----------- | ------------ | | d₀ | 0 1 3a 3 6 7 8 9 | d₁ (4- 14-) | d₂ (10- 12-) | | d₁ | 4a 4 14a 14 | dₑ (5) | d₃ (15...) | | d₂ | 10a 10 12a 12 | d₄ (13...) | d₅ (11...) | | d₃ | 1 2 3a 3 6 7 8 9 15 16 | d₁ (4- 14-) | d₂ (10- 12-) | | d₄ | 1 2 3a 3 6 7 8 9 13 16 | d₁ (4- 14-) | d₂ (10- 12-) | | d₅ | 1 2 3a 3 6 7 8 9 11 | d₁ (4- 14-) | d₂ (10- 12-) | | dₑ | 5 | | | Thus the digraph for DFA would be: ```graphviz digraph{ rankdir=LR fixedsize=true node[shape=circle] a[shape=none label=""] a->d₀ d₀->d₁[label=1] d₀->d₂[label=0] d₁->dₑ[label=1] d₁->d₃[label=0] d₂->d₄[label=1] d₂->d₅[label=0] d₃->d₁[label=1] d₃->d₂[label=0] d₄->d₁[label=1] d₄->d₂[label=0] d₅->d₁[label=1] d₅->d₂[label=0] dₑ[peripheries=2] } ``` ### c. Minimise the DFA. We loop the character set and the nodes in order. First we group them into $\{d_0,d_1,d_2,d_3,d_4,d_5|d_e\}$. Then we keep `split()`-ting until everything cannot be splitted. Each bullet below represents an iteration. 1. The character `0` doesn't split `Group 0`, but the character `1` splits: the nodes $d_0,d_2,d_3,d_4,d_5$ all points to the same group after consuming `1` while $d_1$ points to the group 1. This separates $d_1$ into `Group 2`, the separation is now $\{d_0,d_2,d_3,d_4,d_5|d_e|d_1\}$\ \ `Group 1` and `Group 2` each contains only 1 node, and cannot be further splitted. From now on, we will skip checking the groups. 1. In this iteration, still, the character `0` cannot split `Group 0`, and `1` splits: only $d_2$ points inside the group, and all others points to `Group 2` (namely, $d_1$). This splits $d_2$ out into `Group 3`. $\{d_0,d_3,d_4,d_5|d_e|d_1|d_2\}$ Since `Group 3` also has one node only, we will also skip checking it from now on. 1. This iteration does nothing. All nodes in `Group 0` act the same way, no matter consuming `1` or `0`. This gives the fact that all nodes in the group should be combined into one, leaving the total node count down to 4. The new transition table and digraph shall look like this: | `Group` | d-nodes | 1 | 0 | | ------- | ----------------- | ---- | ---- | | 0 | $d_0,d_3,d_4,d_5$ | 2 | 3 | | 1 | $d_e$ | None | None | | 2 | $d_1$ | 1 | 0 | | 3 | $d_2$ | 0 | 0 | ```graphviz digraph{ rankdir=LR fixedsize=true node[shape=circle] a[shape=none label=""] a->d₀ d₀->d₃[label=0] d₀->d₂[label=1] d₂->d₁[label=1] d₂->d₀[label=0] d₃->d₀[label="1,0"] d₁[peripheries=2] } ```