Compiler - HackMD

# Compiler ###### tags: `CS-course` , `compiler` --- ## NFA example 1. ```graphviz digraph compiler { node [shape = circle] rankdir = LR label="a(a|b)*a" s [label = "start" shape=plaintext] s->0 0->1[label="ε"] 1->2[label="a"] 2->3[label="ε"] 3->4[label="ε"] 3->5[label="ε"] 4->6[label="a"] 5->7[label="b"] 6->8[label="ε"] 7->8[label="ε"] 8->9[label="ε"] 9->10[label="a"] 2->9[label="ε"] 8->3[label="ε"] 10 [shape=doublecircle] } ``` --- ## Syntax Analysis ### Eliminating Immediate Left Recursion 1. 先消除 non-immediate left recursion * $S \to Ab | Ac \\ A \to Sc \\ \Longrightarrow A \to Abc | Acc$ 2. 再消除 immediate left recursion * $A \to A \alpha | \beta \\ \Longrightarrow \\ A \to \beta A' \\ A' \to \alpha A' | \epsilon$ ![](https://i.imgur.com/Hp5e7Ah.png) > 消除 immediate left recursion 1. 先消除一部分 immediate left recursion * $A \to A \alpha | \beta \\ \Longrightarrow \\ A \to \beta A' \\ A' \to \alpha A' | \epsilon$ 2. 消除兩者的 non-immediate left recursion * $S \to Ab | Ac \\ A \to Sc \\ \Longrightarrow A \to Abc | Acc$ 3. 再消除 immediate left recursion * $A \to A \alpha | \beta \\ \Longrightarrow \\ A \to \beta A' \\ A' \to \alpha A' | \epsilon$ ![](https://i.imgur.com/pTMIbS8.png) * All Algorithm ![](https://i.imgur.com/gRxOwls.png) --- ## Top-Down Parsing 略等於 leftmost 推導，而 bottem-up parsing 則會略等於 rightmost 。 The **FIRST** and **FOLLOW** Functions * Construction of both top-down and bottom-up parsers is aided by the two functions. * They allow us to choose which production to apply, based on the next input symbol. ### $FIRST(\alpha)$: > 從 $\alpha$ 可以推導出第一個 terminal 。(**最左**) * The set of terminals that begins the strings derived from $\alpha$ * To compute $FIRST(X)$ for all grammar symbols $X$, apply the following rules until no more terminals or $ϵ$ can be added: * If $X$ is a terminal, then $FIRST(X)$ is just $X$. * If there is a Production $X → ϵ$, then $FIRST(X)$ contains $ϵ$. * If there is a Production $X → Y_1Y_2...Y_k$, then $FIRST(X)$ contains $FIRST(Y_1Y_2...Y_k)$. :::warning **$FIRST(Y_1Y_2 … Y_k)$ is either** * (if $FIRST (Y_1)$ doesn't contain $ϵ$) $FIRST (Y_1)$ * (if $FIRST(Y_1)$ contains $ϵ$) everything in $FIRST (Y_1)$ (except for $ϵ$) and everything in $FIRST (Y_2..Y_k)$ * If $FIRST(Y_1)FIRST(Y_2)..FIRST(Y_k)$ all contain $ϵ$, then $FIRST(Y_1Y_2…Y_k)$ contains $ϵ$. ::: EX: * $S→ABCDE$，$FIRST(S)=\{a,b,c\}$ * $A→a|ϵ$，$FIRST(A)=\{a,ϵ\}$ * $B→b|ϵ$，$FIRST(B)=\{b,ϵ\}$ * $C→c$，$FIRST(C)=\{c\}$ * $D→d|ϵ$，$FIRST(D)=\{d,ϵ\}$ * $E→e|ϵ$，$FIRST(E)=\{e,ϵ\}$ :::warning **6/2 更新** 對於 s->sa 會回到 first(s) ，先保留住： ![](https://i.imgur.com/ZRzgrnv.jpg) ::: ### $FOLLOW(A)$ >A 後面可能的 terminal 。(**最右**) >$FOLLOW$ 不會有 $ϵ$ * $FOLLOW(A)$: The set of terminals a that can appear immediately to the right of A in some sentential form. * $a \in FOLLOW(A)$ if there exists a derivation $S \Rightarrow \alpha Aa\beta$ * To compute $FOLLOW(A)$ for all nonterminals A, apply the following rules until nothing can be added: 1. **If S is the start symbol, then put $ (the end of input marker) in $FOLLOW(S)$.** 2. If there is a production $A → \alpha B\beta$, then everything in $FIRST(\alpha)$ except for $ϵ$ is placed in $FOLLOW(B)$. 3. If there is a production $A → \alpha B$, then everything in $FOLLOW(A)$ is in $FOLLOW(B)$. 4. If there is a production $A → \alpha B\beta$, where $FIRST(\beta)$ contains $ϵ$, then everything in $FOLLOW(A)$ is in $FOLLOW(B)$. 看當前 node1 的父節點展開當中，node1 的右邊 node2。依序判斷 node2 的展開選項，判斷是否為 node1 的 follow 。如果 node2 有 $ϵ$ 選項，再往右找。如果找完父節點展開的所有 node 選項，但還沒有找到 follow ，則直接涵蓋 $follow(父節點)$。 EX: >先往上一層(父節點)找右邊的。 >如果右邊是$ϵ$，直接包含$follow(父節點)$。 *　$E \to TE'$，$follow(E) = \{),dollarsign\}$ *　$E' \to +TE'|ϵ$ *　$T \to FT'$，$follow(T)= \{+, follow(E), follow(E')\}= \{+, ),dollarsign\}$ *　$T' \to *FT'|ϵ$ *　$F \to (E)|ϵ$ ![](https://i.imgur.com/V4GAClQ.png) 注意最後第二個 symbol ，如果父節點的右側可能會得到 $\epsilon$ 而導致父節點後免沒東西放入 follow() ，則需涵蓋 follow(父節點) 。 --- ## Recursive-descent / LL(1) :::warning http://jsmachines.sourceforge.net/machines/ll1.html ::: * parsing **without backtracking** * Can be constructed for a class of grammars called LL(1) * The first “L” stands for **scanning the input from left to right** * The second “L” stands for producing a **leftmost derivation** * The (1) means **using one input symbol of look ahead at each step** * **Limitations for LL(1)** grammar * No left-recursive * No ambiguous grammar * No left-factor :::success **LL(1)** 從左到右掃描，最左優先推導。最多只能偷看一個符號。 ::: ![](https://i.imgur.com/JyHkxXo.png) ![](https://i.imgur.com/NmILJHB.png) ![](https://i.imgur.com/SvpMmbr.png) ### LL(1) parsing table ``` | input symobl -------------------------------------- non-terminal | variabbles | | | ``` > 給 statement ，填表。 ### Example of LL(1) parsing table E $\to$ TE' E' $\to$ +TE' | $\epsilon$ T $\to$ FT' T' $\to$ *FT' | $\epsilon$ F $\to$ (E) | **id** FIRST * FIRST(E) = FIRST(T) = FIRST(F) = {(,**id**)} * FIRST(E') = {+, $\epsilon$} * FIRST(T') = {*, $\epsilon$} FOLLOW * FOLLOW(E) = FOLLOW(E') = {),$} * FOLLOW(T) = FOLLOW(T') = {+,),$} * FOLLOW(F) = = {+,*,),$} :::warning Steps: 1. first 2. if have $\epsilon$ , use follow ::: E $\to$ TE' FIRST(TE') = {(,**id**)} | | id | + | * | ( | ) | $ | |:---:|:---------:| --- |:---:|:---------:| --- |:---:| | E | E$\to$TE' | | | E$\to$TE' | | | | E' | | | | | | | | T | | | | | | | | T' | | | | | | | | F | | | | | | | E' $\to$ +TE' FIRST(+TE') = {+} | | id | + | * | ( | ) | $ | |:---:| --------- |:-----------:|:---:|:---------:| --- |:---:| | E | E$\to$TE' | | | E$\to$TE' | | | | E' | | E'$\to$+TE' | | | | | | T | | | | | | | | T' | | | | | | | | F | | | | | | | E' $\to$ $\epsilon$ FIRST($\epsilon$) = {$\epsilon$} FOLLOW(E') = {),$} | | id | + | * | ( | ) | $ | |:---:| --------- |:-----------:|:---:|:---------:|:------------------:|:------------------:| | E | E$\to$TE' | | | E$\to$TE' | | | | E' | | E'$\to$+TE' | | | E'$\to$ $\epsilon$ | E'$\to$ $\epsilon$ | | T | | | | | | | | T' | | | | | | | | F | | | | | | | ... | | id | + | * | ( | ) | $ | |:---:|:------------:|:-----------------:|:------------:|:---------:|:------------------:|:------------------:| | E | E$\to$TE' | | | E$\to$TE' | | | | E' | | E'$\to$+TE' | | | E'$\to$ $\epsilon$ | E'$\to$ $\epsilon$ | | T | T$\to$FT' | | | T$\to$FT' | | | | T' | | T'$\to$$\epsilon$ | T'$\to$* FT' | | T'$\to$ $\epsilon$ | T'$\to$ $\epsilon$ | | F | F$\to$**id** | | | F$\to$(E) | | | An error is detected when * The terminal on top of the stack does not match the next input symbol * Nonterminal A is on top of the stack, a is the next input symbol, and M[A, a] is empty :::danger For the grammer, 如果 parsing table 中的 index 有 conflict ( which cell has more than one rule ) ，則不是 LL(1) 。 ::: --- ## LL(0) Grammar An LL(0) parser works on a grammar in which there are no decisions to make. So, in such grammars, every non terminal can have only one production. ### Part - 2 Bottom-Up Parsing Construct a parse tree for an input string * Begin at the leaves * Work up towards the root A general style of bottom-up parsing is known as shift-reduce parsing (移入-化簡語法分析) The largest class of grammars for which shift-reduce parsing can be built is the LR grammar ### LR(k) Parser LR(k) parsing * an efficient, bottom-up parsing technique * L: scanning the input from left to right * R: producing a rightmost derivation in reverse * k: the number of input symbols of lookahead Advantages * Can recognize almost all programming-language * * The most commonly used nonbacktracking (無回溯) shift-reduce parsing * Can detect a syntactic error as soon as possible * The class of grammars that can be parsed using LR methods is a proper superset (父集合) of the class of grammars that can be parsed with LL parsers * Drawback * Too much work to construct an LR parser by hand --- ## LR(0) and SLR ### CLOSURE Function ### GOTO Function * GOTO(I, X) ### LR(0) Automaton :::warning **LR(0) automation and SLR parsing table** http://jsmachines.sourceforge.net/machines/slr.html ::: * Shift-reduce 1. DFA 2. LR(0) table (parser) :::warning **NOTE** 03 - Syntax Analysis (part2) Parse String using LR(0) parser page.70 --- Use the DFA with LR(0) table 查表可能查步道之後的符號要用什麼，會查到 $r_n$ (reduce) 。遇到這情況請 reduce 成之前標號的狀態。 ![](https://i.imgur.com/Gl57WEp.jpg) > 註：每次 reduce 時，tree 都可以長一層。 ::: The parsing table consists of two parts * A parsing action function ACTION * A goto function GOTO Value of ACTION[i, a] for state i and a terminal a * Shift j: shift input a to the stack, but use state j to represent a * Reduce A->beta: Reduce beta on the top of the stack to head A * Accept: accept the input and finish parsing * Error: discover an error and take some corrective action The GOTO function * If GOTO[Ii, A] = Ij, then GOTO also maps a state i and a nonterminal A to state j ### Construct LR(0) parsing tables * Input: An augmented grammar G’ * Output: The LR(0)-parsing table for G’ * Method * Construct C={I0, I1, …, In}, the collection of sets of LR(0) items for G’ * State i is constructed from Ii. The parsing actions for state i are determined as follows * If [A-> alpha * a beta] is in Ii, and GOTO(Ii, a) = Ij, set ACTION[i,a] to “shift j”. * Here a is a terminal * If [A-> alpha *] is in Ii, set ACTION[i,a] to “reduce A->alpha” for all terminal a. * Here A may not be S’ * If [S’->S*] is in Ii, set ACTION[i,$] to “accept” ![](https://i.imgur.com/G5HC0Nh.png) ### Conflict LR(0) parse ![](https://i.imgur.com/OCMYI3T.jpg) * LR conflict, which will **not** occur? * A. **Shift/Shift** * This will not, since DFA's state cannot output two same symbol * B. Shift/Reduce * C. Reduce/Reduce ### SLR-Parsing Tables :::warning **LR(0) automation and SLR parsing table** http://jsmachines.sourceforge.net/machines/slr.html ::: * Input: An augmented grammar G’ * Output: The SLR-parsing table for G’ * Method * Construct C={I0, I1, …, In}, the collection of sets of LR(0) items for G’ * State i is constructed from Ii. The parsing actions for state i are determined as follows 1. If [A-> $\alpha$ * a $\beta$] is in $I_i$, and GOTO(Ii, a) = Ij, set ACTION[i,a] to “shift j”. * Here a is a terminal 2. If [A-> $\alpha$ *] is in $I_i$, set ACTION[i,a] to **“reduce A-> alpha”** for all a in **FOLLOW(A)**. * Here A may not be S’ 3. If [S’->S*] is in Ii, set ACTION[i,$] to “accept” * The goto transitions for state i are constructed for all nonterminals A using the rule: If GOTO(Ii, A) = Ij, then GOTO[i, A] = j * All undefined entries are made “error” * The initial state of the parser is the one constructed from the set of items containing [S’-> * S] * The parsing table generated by the algorithm is called the SLR(1) table for G * An LR parser using the SLR(1) table is said to be SLR(1) Usually we omit the “(1)” after the “SLR” * Given an LR parsing table * $s_i$ means shift and stack state i * $r_j$ means reduce by the production numbered j * acc means accept * blank means error ![](https://i.imgur.com/wUr6sFR.png) ### Example of SLR ![](https://i.imgur.com/His0zDo.png) > 註：注意 $I_0$ 以及＄得設置。＄是在 $S'\to sS\cdot$ 通常是 $I_1$ > 注意， $S \to \cdot$ ，亦即 $S \to \cdot \epsilon$ = $S \to \cdot \epsilon \epsilon$ = ... :::success **NOTE** * 因此，如果 SLR 有 conflict ， LR(0) 一定有 conflict 。 * 而，若 LR(0) 沒有 clonflict ，則 SLR 必定沒有 conflict 。 * **Eevery SLR(1) grammar is unambiguous** * However, there are **unambiguous grammars that are not SLR(1)** * Example ![](https://i.imgur.com/09KBcWu.png) ::: LR(1)/SLR(1) ，當中的 1 ，為 reduce 的時候，可以偷看一個 symbol 。 :::warning LR(1)/SLR(1) 也可以分別簡稱 LR / SLR 。 ::: ### Problem 1. not LL(1), is LR(0)? ![](https://i.imgur.com/jDM4Agh.png) $I_0SI_1$$ NO conflict, yes ### Problem 2. ### Problem 3. how to search table for "->$\epsilon$" ![](https://i.imgur.com/iMne6Na.png) --- ## LR(1) Parser > LR(0) + follow() => SLR(1) :::warning http://jsmachines.sourceforge.net/machines/lr1.html ::: * Also called **“canonical LR parser (CLR)”**, or CLR(1) parser * LR(1) parser is built based on **LR(1) items** * LR(1) items: the general form of an items becomes  [A → α • β, a] * The **lookahead** a has no effect when β ≠ є * [A → α •, a] calls for a reduction by A → α **if the next symbol is a** :::info 不嚴謹、有衝突 **LR(0) < SLR < LALR < LR()** 嚴謹 ::: ![](https://i.imgur.com/06QKbh2.png) 與 LR(0) 唯一不同的是 LR(1) 的 CLOSURE 會去看展開過的子 item 在父 item 得剩餘 symbol => FIRST(...) 例如 : 1. $S \to \cdot C C ,$ $ 2. 展開第一個 C 得下一個 item $S \to \cdot c C$ + FIRST(C$) 3. FIRST(C$) = c/d => $S \to \cdot c C, c/d$ ![](https://i.imgur.com/80SXEVh.png) * Input: An augmented grammar G’ * Output: LR(1) parsing table functions ACTION and GOTO for G’ * Method Construct C={I0, I1, …, In}, the collection of sets of LR(1) items for G’ State i is constructed from Ii. The parsing actions for state i are determined as follows 1. If [A→α•aβ, b] is in Ii, and GOTO(Ii, a) = Ij, set ACTION[i,a] to **“shift j**”. Here a is a terminal 2. If [A→α•, a] is in Ii, A ≠ S’, set ACTION[i,a] to “**reduce A→α**” 3. If [S’→S•, $] is in Ii, set ACTION[i,\$] to “**accept**” * The goto transitions for state i are constructed for all nonterminals A using the rule: If GOTO(Ii, A) = Ij, then GOTO[i, A] = j All undefined entries are made “error” The initial state of the parser is the one constructed from the set of items containing [S’→•S,\$] * The parsing table generated by the algorithm is called the LR(1) table for G * An LR parser using the LR(1) table is said to be LR(1) Usually we omit the “(1)” after the “LR” ![](https://i.imgur.com/Wi2QF4W.png) > 精確地把 reduce 的分開，如 state 4, 7 的 $r_3$ 。記得，後面得 First() 要放正在看的 symbol 後的其他 symbol : ![](https://i.imgur.com/KT2xXlE.png) --- ## LALR * LALR – lookahead LR * Often used in practice because the tables are much smaller than the LR tables * Most common syntactic constructs of programming languages can be expressed by an LALR grammar * Method: * A LALR(1) parsing table can be obtained by merging the sets of items of a LR(1) table with the same core (core: item扣除逗號與後面的部份) * Example. Consider the following sets of LR(1) items **LA(1) merge state to generate LALR will not get the NFA (重邊與 $\epsilon$) 。** ![](https://i.imgur.com/B3dQD55.png) > $I_3$ 、 $I_6$ 可以 merge ，而因已從其延伸的藍、綠框 $I_x$ 也可以分別 merge 。 Input: An augmented grammar G’ Output: LALR parsing-table functions ACTION and GOTO for G’ Method - Construct C = {I0, I1, I2, ..., In }, the sets of LR(1) items - For each core in C, find all sets with that core, and replace these sets by their union. Let result be C’ = {J0, J1, J2, ..., Jm} - State i is constructed from Ji, Actions for state i - If [A→$alpha$•a$beta$, b] is in Ii, and GOTO(Ii, a) = Ij, set ACTION[i,a] to “shift j”. Here a is a terminal - If [A→$alpha$•, a] is in Ii, A  S’, set ACTION[i,a] to “reduce A→$alpha$” - If [S’→S•, $] is in Ii, set ACTION[i,$] to “accept” - If Ji = I1 ∪ I2 ∪ ... ∪ Ik, then GOTO(I1,X) = GOTO(I2,X) = ... = GOTO(Ik,X) - Let K be the union of items having the same core as GOTO(I1, X), then GOTO(J, X) = K ### LALR(1) Conflicts - 表格只要有 1 格有 2 個以上的選擇。 - No shift-reduce conflicts will be introduced **by merging** - Because shift depends only on core, not the lookahead - However, it may produce reduce/reduce conflict :::warning reduce/reduce 是因為 merging ，所以他還是有可能有 shift/reduce conflict ::: Example: Consider the grammar S → a A d | b B d | a B e | b A e A → c B → c 整體上而言，會有下列衝突： * shift/shift * shift/reduce * reduce/reduce :::success **嚴謹程度** LR(0) < SLR < LALR < LR(1) ::: ### LALR(1) Example ![](https://i.imgur.com/vbk4vEm.png) ![](https://i.imgur.com/kxKiks5.png) > A::first(d\$) = {d} > B::first(e\$) = {e} > 選當前來源的後續字串作 first ![](https://i.imgur.com/pP5cjRP.png) ![](https://i.imgur.com/9oVGnMy.png) > NO Conflict * $I_6$ $I_9$ merge ![](https://i.imgur.com/PriZEgd.png) > **Merge** 需要圖，只有 LR(1) parsing tables 和 grammar 是沒辦法直接作 merge 。 :::warning 所以這個 example 是： * LALR 不是 * LR(1) 是 ::: :::success 如果有 $\epsilon$ ：在 $S'\to\cdot S,$\$ 在第五行生成 $S\to \cdot ,$\$ 。 LR series 就是 DFA ，沒有 $\epsilon$ 。 ::: --- http://www.cs.nthu.edu.tw/~wkhon/assignments/assign2ans.pdf https://web.njit.edu/~marvin/cs341/hw/hwsoln05.pdf https://math.stackexchange.com/questions/353031/w-such-that-it-contains-at-least-3-ones-is-my-approach-to-the-cfg-right https://github.com/fool2fish/dragon-book-exercise-answers/blob/master/ch04/4.2/4.2.md :::info [Generate Predict, First, and Follow Sets from EBNF (Extended Backus Naur Form) Grammar](http://hackingoff.com/compilers/predict-first-follow-set) ::: --- ## C2IR - Project ![](https://i.imgur.com/66HYsND.png) ### Example * [A small calculator to parse and execute mathematical expressions based on C, Flex, Bison.](https://github.com/BaseMax/SmallCalculator) * [yacc](https://github.com/Kray-G/kinx/blob/b2ef0e40fb49e9497f71eaa13fe5e456522fc37d/src/kinx.y) ### Reference * [clang to llvm ir](https://stackoverflow.com/questions/9148890/how-to-make-clang-compile-to-llvm-ir) ``` $ cloc . 10 text files. 10 unique files. 1 file ignored. github.com/AlDanial/cloc v 1.82 T=0.01 s (1266.3 files/s, 154364.4 lines/s) ------------------------------------------------------------------------------- Language files blank comment code ------------------------------------------------------------------------------- C/C++ Header 4 177 27 706 yacc 1 28 3 91 make 1 15 2 51 lex 1 6 0 43 C++ 1 11 0 23 C 1 7 0 16 Markdown 1 2 0 11 ------------------------------------------------------------------------------- SUM: 10 246 32 941 ------------------------------------------------------------------------------- ``` ``` rm -f lex.cpp parser.cpp lex.hpp parser.hpp rm -f parser.tab.c parser.tab.h parser.output rm -f lex.o parser.o main.o rm -f c2ir flex --header-file=lex.hpp -o lex.cpp lex.l bison -d -v -o parser.cpp parser.y g++ -c lex.cpp `llvm-config --cppflags` -std=c++14 g++ -c parser.cpp `llvm-config --cppflags` -std=c++14 g++ -c main.cpp `llvm-config --cppflags` -std=c++14 g++ -o c2ir lex.o parser.o main.o `llvm-config --ldflags` -lpthread -ldl -lz -lncurses -rdynamic `llvm-config --libs` #include <stdio.h> extern int puts(const char *str); extern int printf(const char *str, ...); int do_math(int a) { int b = a + 1; printf("b = %d", b); puts(" "); return b; } int main () { char *string = "Hello World!"; do_math(1); puts(string); return 0; } LEX_HEADER LEX_EXTERN LEX_INT NIdentifier: int LEX_IDENTIFIER NIdentifier: puts LEX_LPAREN LEX_CHAR LEX_ASTERISK NIdentifier: char LEX_IDENTIFIER NIdentifier: str LEX_RPAREN NVariableDeclaration LEX_SEQPOINT Extern NFunctionDeclaration NBlock LEX_EXTERN LEX_INT NIdentifier: int LEX_IDENTIFIER NIdentifier: printf LEX_LPAREN LEX_CHAR LEX_ASTERISK NIdentifier: char LEX_IDENTIFIER NIdentifier: str LEX_RPAREN NVariableDeclaration LEX_SEQPOINT Extern NFunctionDeclaration LEX_INT NIdentifier: int LEX_IDENTIFIER NIdentifier: do_math LEX_LPAREN LEX_INT NIdentifier: int LEX_IDENTIFIER NIdentifier: a LEX_RPAREN NVariableDeclaration LEX_LBRACE LEX_INT NIdentifier: int LEX_IDENTIFIER NIdentifier: b LEX_EQUAL LEX_IDENTIFIER NIdentifier: a LEX_ADD LEX_INTEGER NInteger: 1 NBinaryOperator NVariableDeclaration LEX_SEQPOINT NBlock LEX_IDENTIFIER NIdentifier: printf LEX_LPAREN LEX_LITERAL NLiteral: b = %d LEX_COMMA LEX_IDENTIFIER NIdentifier: b LEX_RPAREN NMethdCall LEX_SEQPOINT NExpressionStatment LEX_IDENTIFIER NIdentifier: puts LEX_LPAREN LEX_LITERAL NLiteral: LEX_RPAREN NMethdCall LEX_SEQPOINT NExpressionStatment LEX_RETURN LEX_IDENTIFIER NIdentifier: b LEX_SEQPOINT NReturnStatement LEX_RBRACE NFunctionDeclaration LEX_INT NIdentifier: int LEX_IDENTIFIER NIdentifier: main LEX_LPAREN LEX_RPAREN LEX_LBRACE LEX_CHAR LEX_ASTERISK NIdentifier: char LEX_IDENTIFIER NIdentifier: string LEX_EQUAL LEX_LITERAL NLiteral: Hello World! NVariableDeclaration LEX_SEQPOINT NBlock LEX_IDENTIFIER NIdentifier: do_math LEX_LPAREN LEX_INTEGER NInteger: 1 LEX_RPAREN NMethdCall LEX_SEQPOINT NExpressionStatment LEX_IDENTIFIER NIdentifier: puts LEX_LPAREN LEX_IDENTIFIER NIdentifier: string LEX_RPAREN NMethdCall LEX_SEQPOINT NExpressionStatment LEX_RETURN LEX_INTEGER NInteger: 0 LEX_SEQPOINT NReturnStatement LEX_RBRACE NFunctionDeclaration parser program block 0x55c0fc9f1e50 --------------------- Generating code... Start generate code After push Block Generating block Start of block Generating code for 20NFunctionDeclaration Generating function declaration of puts identifier type: char identifier type: int Generating code for 20NFunctionDeclaration Generating function declaration of printf identifier type: char identifier type: int Generating code for 20NFunctionDeclaration Generating function declaration of do_math identifier type: int identifier type: int start of arguments Generating variable declaration of int a identifier type: int End of arguments Generating block Start of block Generating code for 20NVariableDeclaration Generating variable declaration of int b identifier type: int NAssignment Generating assignment of b = Generating binary operator Generating identifier a Generating Integer: 1 L is IntegerTyID R is IntegerTyID Generating code for 20NExpressionStatement Generating method call of printf Function arguments size not match, calleeF=0, this->arguments=2 Start of callee arg Generating Literal: b = %d Generating identifier b End of callee arg Generating code for 20NExpressionStatement Generating method call of puts Start of callee arg Generating Literal: End of callee arg Generating code for 16NReturnStatement Generating return statement Generating identifier b End of block Function block created Generating code for 20NFunctionDeclaration Generating function declaration of main identifier type: int start of arguments End of arguments Generating block Start of block Generating code for 20NVariableDeclaration Generating variable declaration of char * string identifier type: char NAssignment Generating assignment of string = Generating Literal: Hello World! Generating code for 20NExpressionStatement Generating method call of do_math Start of callee arg Generating Integer: 1 End of callee arg Generating code for 20NExpressionStatement Generating method call of puts Start of callee arg Generating identifier string End of callee arg Generating code for 16NReturnStatement Generating return statement Generating Integer: 0 End of block Function block created End of block After code gen Code is generated. ---------------------- ; ModuleID = 'main' source_filename = "main" @.str = private constant [4 x i8] c"%d\0A\00" @string = private unnamed_addr constant [7 x i8] c"b = %d\00", align 1 @string.2 = private unnamed_addr constant [2 x i8] c" \00", align 1 @string.3 = private unnamed_addr constant [13 x i8] c"Hello World!\00", align 1 declare i32 @printf(i8*, ...) define internal void @echo(i64 %toPrint) { entry: %0 = call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([4 x i8], [4 x i8]* @.str, i32 0, i32 0), i64 %toPrint) ret void } declare i32 @puts(i8*) declare i32 @printf.1(i8*) define i32 @do_math(i32 %a) { entry: %0 = alloca i32 store i32 %a, i32* %0 %1 = alloca i32 %arrayPtr = load i32, i32* %0 %2 = load i32, i32* %0 %addtmp = add i32 %2, 1 store i32 %addtmp, i32* %1 %arrayPtr1 = load i32, i32* %1 %3 = load i32, i32* %1 %calltmp = call i32 (i8*, ...) @printf([7 x i8]* @string, i32 %3) %calltmp2 = call i32 @puts([2 x i8]* @string.2) %arrayPtr3 = load i32, i32* %1 %4 = load i32, i32* %1 ret i32 %4 } define i32 @main() { entry: %0 = alloca i8* store [13 x i8]* @string.3, i8** %0 %calltmp = call i32 @do_math(i32 1) %arrayPtr = load i8*, i8** %0 %1 = load i8*, i8** %0 %calltmp1 = call i32 @puts(i8* %1) ret i32 0 } --------------------- Object code wrote to text.o -------sample-------- clang -S -emit-llvm text.c cat text.ll ; ModuleID = 'text.c' source_filename = "text.c" target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128" target triple = "x86_64-pc-linux-gnu" @.str = private unnamed_addr constant [7 x i8] c"b = %d\00", align 1 @.str.1 = private unnamed_addr constant [2 x i8] c" \00", align 1 @.str.2 = private unnamed_addr constant [13 x i8] c"Hello World!\00", align 1 ; Function Attrs: noinline nounwind optnone uwtable define dso_local i32 @do_math(i32 %0) #0 { %2 = alloca i32, align 4 %3 = alloca i32, align 4 store i32 %0, i32* %2, align 4 %4 = load i32, i32* %2, align 4 %5 = add nsw i32 %4, 1 store i32 %5, i32* %3, align 4 %6 = load i32, i32* %3, align 4 %7 = call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([7 x i8], [7 x i8]* @.str, i64 0, i64 0), i32 %6) %8 = call i32 @puts(i8* getelementptr inbounds ([2 x i8], [2 x i8]* @.str.1, i64 0, i64 0)) %9 = load i32, i32* %3, align 4 ret i32 %9 } declare dso_local i32 @printf(i8*, ...) #1 declare dso_local i32 @puts(i8*) #1 ; Function Attrs: noinline nounwind optnone uwtable define dso_local i32 @main() #0 { %1 = alloca i32, align 4 %2 = alloca i8*, align 8 store i32 0, i32* %1, align 4 store i8* getelementptr inbounds ([13 x i8], [13 x i8]* @.str.2, i64 0, i64 0), i8** %2, align 8 %3 = call i32 @do_math(i32 1) %4 = load i8*, i8** %2, align 8 %5 = call i32 @puts(i8* %4) ret i32 0 } attributes #0 = { noinline nounwind optnone uwtable "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "frame-pointer"="all" "less-precise-fpmad"="false" "min-legal-vector-width"="0" "no-infs-fp-math"="false" "no-jump-tables"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "unsafe-fp-math"="false" "use-soft-float"="false" } attributes #1 = { "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "frame-pointer"="all" "less-precise-fpmad"="false" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "unsafe-fp-math"="false" "use-soft-float"="false" } !llvm.module.flags = !{!0} !llvm.ident = !{!1} !0 = !{i32 1, !"wchar_size", i32 4} !1 = !{!"clang version 10.0.0-4ubuntu1 "} -------sample-------- clang -o test text.o ./test b = 2 Hello World! rm -f test text.o ```