# GCC Internals: RTL Insn ## Concept RTL (Register Transfer Language) is a low level representation of programs. For now we might imagine that a program that represented in RTL is a (doubly-)linked list of `insn`: ``` [insn 1] <--> [insn 2] <--> [insn 3] ... ``` Each `insn` describes an operation of the program, like storing a value to memory, adding two values, etc. ## RTL of a Trivial Function Let’s see an example of RTL of a trivial C code: ```c $ cat foo.c void foo() { return; } ``` Then use `gcc` to compile it, with `-da` argument to dump all RTL for all passes ``` $ gcc foo.c -c -da $ ls foo.c foo.c.255r.expand foo.c.258r.jump foo.c.294r.split1 foo.c.298r.asmcons foo.c.311r.pro_and_epilogue foo.c.326r.stack foo.c.330r.mach foo.c.337r.nothrow foo.c.340r.dfinish foo.c.gkd foo.c.256r.vregs foo.c.270r.reginfo foo.c.296r.dfinit foo.c.303r.ira foo.c.314r.jump2 foo.c.327r.zero_call_used_regs foo.c.331r.barriers foo.c.338r.dwarf2 foo.o foo.c.257r.into_cfglayout foo.c.293r.outof_cfglayout foo.c.297r.mode_sw foo.c.304r.reload foo.c.325r.split4 foo.c.328r.alignments foo.c.336r.shorten foo.c.339r.final ``` You’ll see a lot of files were generated. Find the file that ends with `dfinish`: ``` $ cat foo.c.340r.dfinish ;; Function foo (foo, funcdef_no=0, decl_uid=2738, cgraph_uid=1, symbol_order=0) (note 1 0 3 NOTE_INSN_DELETED) (note 3 1 11 2 [bb 2] NOTE_INSN_BASIC_BLOCK) (insn/f 11 3 12 2 (set (mem:DI (pre_dec:DI (reg/f:DI 7 sp)) [0 S8 A8]) (reg/f:DI 6 bp)) "foo.c":2:12 56 {*pushdi2_rex64} (nil)) (insn/f 12 11 13 2 (set (reg/f:DI 6 bp) (reg/f:DI 7 sp)) "foo.c":2:12 82 {*movdi_internal} (nil)) (note 13 12 2 2 NOTE_INSN_PROLOGUE_END) (note 2 13 9 2 NOTE_INSN_FUNCTION_BEG) (insn 9 2 14 2 (const_int 0 [0]) "foo.c":3:3 1032 {nop} (nil)) (note 14 9 15 2 NOTE_INSN_EPILOGUE_BEG) (insn/f 15 14 16 2 (set (reg/f:DI 6 bp) (mem:DI (post_inc:DI (reg/f:DI 7 sp)) [0 S8 A8])) "foo.c":4:1 64 {*popdi1} (expr_list:REG_CFA_DEF_CFA (plus:DI (reg/f:DI 7 sp) (const_int 8 [0x8])) (nil))) (jump_insn 16 15 17 2 (simple_return) "foo.c":4:1 1026 {simple_return_internal} (nil) -> simple_return) (barrier 17 16 10) (note 10 17 0 NOTE_INSN_DELETED) ``` This is how the trivial `foo.c` looks like in the RTL format. Each `insn` is wrapped with parentheses. (The syntax is inspired by Lisp.) There are different types of `insn` . The type is represented in the first field of `insn` dump. In this example we have following types (listed in appearance order): - `note`: provide some informations for developers. - `insn`: instructions that do not jump and do not do function calls. For `insn` or `reg` that appends with "/f" means that this is frame-related. I.e. part of prologue or epilogue. - `jump_insn`: instructions that may jump. - `barrier`: placed in the instruction stream when control cannot flow past them. :::info 💡 The naming in GCC RTL is a little bit confusing. `insn` might refers to all kinds of RTL expression, but also refers to RTL expression for non-jump instructions … ::: :::info 💡 RTL expression is abbreviated to rtx in GCC documents and sources. ::: ## Definition of Insn Before diving into all above mysterious `insns`, let's try to understand what an `insn` is defined. The `insn` type is defined in *gcc/rtl.def*: ```c DEF_RTL_EXPR(INSN, "insn", "uuBeiie", RTX_INSN) ``` - INSN: The internal name of the rtx used in the C source. This filed would be used to define `enum rtx_code` in *gcc/rtl.h.* - “insn”: The name of the rtx in the external ASCII format. - "uuBeiie": The format of this rtx. Each letter is a filed in the rtx. The definition of the format is defined in [*gcc/rtl.cc*](https://github.com/gcc-mirror/gcc/blob/7f93910a8b5d606ad742a3594750f0c2b20d8bda/gcc/rtl.cc#L65-L93). We’ll break it down later. - RTX_INSN: The class of rtx. RTX_INSN is an rtx code for a machine `insn`. `jump_insn` also belongs to a RTX_INSN class. There are 4 different letters, `u`, `B`, `e`, `i` in `insn`’s formate field. The meanings of the letters are: - `u`: a pointer to another `insn`. - `B`: is a basic block pointer. - `e`: a pointer to an rtl expression - `i`: an integer For `insn`, there are 7 fields ("uuBeiie"), which are: 1. `u`: Pointer to the previous `insn`. 2. `u`: Pointer to the next `insn`. 3. `B`: Pointer to the basic block that this `insn` belongs to. 4. `e`: The body of the `insn`, i.e. what would this `insn` do. 5. `i`: The source location (line number) of this `insn`. 6. `i`: Code number of instruction, which maps to the backend instruction defined with `define_insn` in backend .md files. 7. `e`: Holds a list of notes on what this `insn` does to various REGs. Used to record register informations. :::info 💡 Sadly there are no clear documentation about what’s the meaning of these fields. We have to find some informations from relative functions in *gcc/rtl.cc* like ```c inline rtx_insn *PREV_INSN (const rtx_insn *insn) { rtx prev = XEXP (insn, 0); return safe_as_a <rtx_insn *> (prev); } ``` Since this function named `PREV_INSN` and it extracts the first field (`XEXP (insn, 0)`), we know that the first filed is the pointer of the previous insn. There are some blog post like https://kitoslab.blogspot.com/2012/06/gcc-rtl-insn.html. that mentions some of them. But since the code might changes, the posts might out of date. ::: ## Demystifying One Insn This section will explain how to "read" a RTL insn. This includes separating insn fields and look up the meaning of the RTL operations from the GCC internal document. Let's start from the first `insn` in the trivial funciton, re-arranged with comments: ``` (insn/f 11 ① 3 ② 12 ③ 2 ④ (set ⑤ (mem:DI (pre_dec:DI (reg/f:DI 7 sp)) [0 S8 A8]) (reg/f:DI 6 bp)) "foo.c":2:12 ⑥ 56 {*pushdi2_rex64} ⑦ (nil)) ⑧ ``` ① ID of the insn. ② Previous insn. ③ Next insn. ④ Basic block this insn belongs to. ⑤ Body of the insn. This field describes what this insn do. Here it's a `set` function. According to [GCC internal doc](https://gcc.gnu.org/onlinedocs/gccint/Side-Effects.html#index-set): > **(set lval x)** Represents the action of storing the value of x into the place represented by lval. The **lval** in this insn is `(mem:DI (pre_dec:DI (reg/f:DI 7 sp)) [0 S8 A8])`. We can also find the [description of **mem**](https://gcc.gnu.org/onlinedocs/gccint/Regs-and-Memory.html#index-mem) in the GCC internal doc: > **(mem:m addr alias)** > This RTX represents a reference to main memory at an address represented by the expression *addr*. > *m* specifies how large a unit of memory is accessed. *alias* specifies an alias set for the reference. In the above insn the *m* is **DI**, which is "double integer" (8-byte integer). (This is defined in the [Machine Mode](https://gcc.gnu.org/onlinedocs/gccint/Machine-Modes.html#index-DImode) seciton.) *addr* is another expression `pre_dec:DI (reg/f:DI 7 sp)`. [The doc says](https://gcc.gnu.org/onlinedocs/gccint/Incdec.html#index-pre_005fdec): > **(pre_dec:m x)** > Represents the side effect of decrementing x by a standard amount and represents also the value that x has after being decremented. The description is a little bit complex. This operation is usually used to decrease a stack pointer register and read from the stack. For `(pre_dec:DI x)`, it means "decrease the address in x by 8 (because DI is 8-byte) and read from the decreased address". the "pre" means decrease before read. Other similar operations are listed in the [Embedded Side Effects on Addresses](https://gcc.gnu.org/onlinedocs/gccint/Incdec.html#Embedded-Side-Effects-on-Addresses) section. The `reg/f:DI 7 sp` part is simple: reference the 7th register (that is the stack pointer's number defined in [gcc/config/i386/i386.h](https://github.com/gcc-mirror/gcc/blob/9141bfdd483e2838f5dce767f1c1657710ef2daf/gcc/config/i386/i386.h#L2043-L2044)). The final part of `mem` is `[0 S8 A8]`, which means "access to 8 bytes memory, 8 byte aligned expression" (not sure why, here is the [reference](https://gabrieleserra.ml/blog/2020-08-27-an-introduction-to-gcc-and-gccs-plugins.html)). ⑥ Source location. ⑦ The target instruction mapped to this insn. ⑧ Register info. You might notice that there are 8 fields, while the definition of `insn` only have 7. This is because the ID of an instruction is mendatory so not in the rtx format list. ## Understand the Whole Insns Since `note`s are just notes, we can remove them and see what are left: ``` ;; Function foo (foo, funcdef_no=0, decl_uid=2738, cgraph_uid=1, symbol_order=0) ① (insn/f 11 3 12 2 (set (mem:DI (pre_dec:DI (reg/f:DI 7 sp)) [0 S8 A8]) (reg/f:DI 6 bp)) "foo.c":2:12 56 {*pushdi2_rex64} (nil)) ② (insn/f 12 11 13 2 (set (reg/f:DI 6 bp) (reg/f:DI 7 sp)) "foo.c":2:12 82 {*movdi_internal} (nil)) ③ (insn 9 2 14 2 (const_int 0 [0]) "foo.c":3:3 1032 {nop} (nil)) ④ (insn/f 15 14 16 2 (set (reg/f:DI 6 bp) (mem:DI (post_inc:DI (reg/f:DI 7 sp)) [0 S8 A8])) "foo.c":4:1 64 {*popdi1} (expr_list:REG_CFA_DEF_CFA (plus:DI (reg/f:DI 7 sp) (const_int 8 [0x8])) (nil))) ⑤ (jump_insn 16 15 17 2 (simple_return) "foo.c":4:1 1026 {simple_return_internal} (nil) -> simple_return) ⑥ (barrier 17 16 10) ``` ① Decrease the stack pointer by 8 bytes, then store the value of `bp` (base poiner) register to the stack. (I.e. push `bp` to stack memory.) ② Store the content in regiser `sp` to register `bp`. ③ NOP (do nothing). ④ Read the stack memory and store to `bp`, then decrease the stack poiner by 8 bytes. (I.e. pop `bp` from stack memory.) ⑤ Return from function. ⑥ Put a barrier.