# GCC Internals: RTL Insn
## Concept
RTL (Register Transfer Language) is a low level representation of programs. For now we might imagine that a program that represented in RTL is a (doubly-)linked list of `insn`:
```
[insn 1] <--> [insn 2] <--> [insn 3] ...
```
Each `insn` describes an operation of the program, like storing a value to memory, adding two values, etc.
## RTL of a Trivial Function
Let’s see an example of RTL of a trivial C code:
```c
$ cat foo.c
void foo() {
return;
}
```
Then use `gcc` to compile it, with `-da` argument to dump all RTL for all passes
```
$ gcc foo.c -c -da
$ ls
foo.c foo.c.255r.expand foo.c.258r.jump foo.c.294r.split1 foo.c.298r.asmcons foo.c.311r.pro_and_epilogue foo.c.326r.stack foo.c.330r.mach foo.c.337r.nothrow foo.c.340r.dfinish
foo.c.gkd foo.c.256r.vregs foo.c.270r.reginfo foo.c.296r.dfinit foo.c.303r.ira foo.c.314r.jump2 foo.c.327r.zero_call_used_regs foo.c.331r.barriers foo.c.338r.dwarf2
foo.o foo.c.257r.into_cfglayout foo.c.293r.outof_cfglayout foo.c.297r.mode_sw foo.c.304r.reload foo.c.325r.split4 foo.c.328r.alignments foo.c.336r.shorten foo.c.339r.final
```
You’ll see a lot of files were generated. Find the file that ends with `dfinish`:
```
$ cat foo.c.340r.dfinish
;; Function foo (foo, funcdef_no=0, decl_uid=2738, cgraph_uid=1, symbol_order=0)
(note 1 0 3 NOTE_INSN_DELETED)
(note 3 1 11 2 [bb 2] NOTE_INSN_BASIC_BLOCK)
(insn/f 11 3 12 2 (set (mem:DI (pre_dec:DI (reg/f:DI 7 sp)) [0 S8 A8])
(reg/f:DI 6 bp)) "foo.c":2:12 56 {*pushdi2_rex64}
(nil))
(insn/f 12 11 13 2 (set (reg/f:DI 6 bp)
(reg/f:DI 7 sp)) "foo.c":2:12 82 {*movdi_internal}
(nil))
(note 13 12 2 2 NOTE_INSN_PROLOGUE_END)
(note 2 13 9 2 NOTE_INSN_FUNCTION_BEG)
(insn 9 2 14 2 (const_int 0 [0]) "foo.c":3:3 1032 {nop}
(nil))
(note 14 9 15 2 NOTE_INSN_EPILOGUE_BEG)
(insn/f 15 14 16 2 (set (reg/f:DI 6 bp)
(mem:DI (post_inc:DI (reg/f:DI 7 sp)) [0 S8 A8])) "foo.c":4:1 64 {*popdi1}
(expr_list:REG_CFA_DEF_CFA (plus:DI (reg/f:DI 7 sp)
(const_int 8 [0x8]))
(nil)))
(jump_insn 16 15 17 2 (simple_return) "foo.c":4:1 1026 {simple_return_internal}
(nil)
-> simple_return)
(barrier 17 16 10)
(note 10 17 0 NOTE_INSN_DELETED)
```
This is how the trivial `foo.c` looks like in the RTL format.
Each `insn` is wrapped with parentheses. (The syntax is inspired by Lisp.)
There are different types of `insn` . The type is represented in the first field of `insn` dump.
In this example we have following types (listed in appearance order):
- `note`: provide some informations for developers.
- `insn`: instructions that do not jump and do not do function calls. For `insn` or `reg` that appends with "/f" means that this is frame-related. I.e. part of prologue or epilogue.
- `jump_insn`: instructions that may jump.
- `barrier`: placed in the instruction stream when control cannot flow past them.
:::info
💡 The naming in GCC RTL is a little bit confusing. `insn` might refers to all kinds of RTL expression,
but also refers to RTL expression for non-jump instructions …
:::
:::info
💡 RTL expression is abbreviated to rtx in GCC documents and sources.
:::
## Definition of Insn
Before diving into all above mysterious `insns`, let's try to understand what an `insn` is defined.
The `insn` type is defined in *gcc/rtl.def*:
```c
DEF_RTL_EXPR(INSN, "insn", "uuBeiie", RTX_INSN)
```
- INSN: The internal name of the rtx used in the C source. This filed would be used to define `enum rtx_code` in *gcc/rtl.h.*
- “insn”: The name of the rtx in the external ASCII format.
- "uuBeiie": The format of this rtx. Each letter is a filed in the rtx. The definition of the format is defined in [*gcc/rtl.cc*](https://github.com/gcc-mirror/gcc/blob/7f93910a8b5d606ad742a3594750f0c2b20d8bda/gcc/rtl.cc#L65-L93). We’ll break it down later.
- RTX_INSN: The class of rtx. RTX_INSN is an rtx code for a machine `insn`. `jump_insn` also belongs to a RTX_INSN class.
There are 4 different letters, `u`, `B`, `e`, `i` in `insn`’s formate field. The meanings of the letters are:
- `u`: a pointer to another `insn`.
- `B`: is a basic block pointer.
- `e`: a pointer to an rtl expression
- `i`: an integer
For `insn`, there are 7 fields ("uuBeiie"), which are:
1. `u`: Pointer to the previous `insn`.
2. `u`: Pointer to the next `insn`.
3. `B`: Pointer to the basic block that this `insn` belongs to.
4. `e`: The body of the `insn`, i.e. what would this `insn` do.
5. `i`: The source location (line number) of this `insn`.
6. `i`: Code number of instruction, which maps to the backend instruction defined with `define_insn` in backend .md files.
7. `e`: Holds a list of notes on what this `insn` does to various REGs. Used to record register informations.
:::info
💡 Sadly there are no clear documentation about what’s the meaning of these fields. We have to find some informations from relative functions in *gcc/rtl.cc* like
```c
inline rtx_insn *PREV_INSN (const rtx_insn *insn)
{
rtx prev = XEXP (insn, 0);
return safe_as_a <rtx_insn *> (prev);
}
```
Since this function named `PREV_INSN` and it extracts the first field (`XEXP (insn, 0)`), we know that the first filed is the pointer of the previous insn.
There are some blog post like https://kitoslab.blogspot.com/2012/06/gcc-rtl-insn.html. that mentions some of them. But since the code might changes, the posts might out of date.
:::
## Demystifying One Insn
This section will explain how to "read" a RTL insn. This includes separating insn fields
and look up the meaning of the RTL operations from the GCC internal document.
Let's start from the first `insn` in the trivial funciton, re-arranged with comments:
```
(insn/f
11 ①
3 ②
12 ③
2 ④
(set ⑤
(mem:DI (pre_dec:DI (reg/f:DI 7 sp)) [0 S8 A8])
(reg/f:DI 6 bp))
"foo.c":2:12 ⑥
56 {*pushdi2_rex64} ⑦
(nil)) ⑧
```
① ID of the insn.
② Previous insn.
③ Next insn.
④ Basic block this insn belongs to.
⑤ Body of the insn.
This field describes what this insn do. Here it's a `set` function. According to [GCC internal
doc](https://gcc.gnu.org/onlinedocs/gccint/Side-Effects.html#index-set):
> **(set lval x)**
Represents the action of storing the value of x into the place represented by lval.
The **lval** in this insn is `(mem:DI (pre_dec:DI (reg/f:DI 7 sp)) [0 S8 A8])`.
We can also find the [description of **mem**](https://gcc.gnu.org/onlinedocs/gccint/Regs-and-Memory.html#index-mem) in the GCC internal doc:
> **(mem:m addr alias)**
> This RTX represents a reference to main memory at an address represented by the expression *addr*.
> *m* specifies how large a unit of memory is accessed. *alias* specifies an alias set for the reference.
In the above insn the *m* is **DI**, which is "double integer" (8-byte integer). (This is defined in the [Machine Mode](https://gcc.gnu.org/onlinedocs/gccint/Machine-Modes.html#index-DImode) seciton.)
*addr* is another expression `pre_dec:DI (reg/f:DI 7 sp)`. [The doc says](https://gcc.gnu.org/onlinedocs/gccint/Incdec.html#index-pre_005fdec):
> **(pre_dec:m x)**
> Represents the side effect of decrementing x by a standard amount and represents also the value that x has after being decremented.
The description is a little bit complex. This operation is usually used to decrease a stack pointer register and read from the stack. For `(pre_dec:DI x)`, it means "decrease the address in x by 8 (because DI is 8-byte) and read from the decreased address". the "pre" means decrease before read. Other similar operations are listed in the [Embedded Side Effects on Addresses](https://gcc.gnu.org/onlinedocs/gccint/Incdec.html#Embedded-Side-Effects-on-Addresses) section.
The `reg/f:DI 7 sp` part is simple: reference the 7th register (that is the stack pointer's number defined in [gcc/config/i386/i386.h](https://github.com/gcc-mirror/gcc/blob/9141bfdd483e2838f5dce767f1c1657710ef2daf/gcc/config/i386/i386.h#L2043-L2044)).
The final part of `mem` is `[0 S8 A8]`, which means "access to 8 bytes memory, 8 byte aligned expression" (not sure why, here is the [reference](https://gabrieleserra.ml/blog/2020-08-27-an-introduction-to-gcc-and-gccs-plugins.html)).
⑥ Source location.
⑦ The target instruction mapped to this insn.
⑧ Register info.
You might notice that there are 8 fields, while the definition of `insn` only have 7.
This is because the ID of an instruction is mendatory so not in the rtx format list.
## Understand the Whole Insns
Since `note`s are just notes, we can remove them and see what are left:
```
;; Function foo (foo, funcdef_no=0, decl_uid=2738, cgraph_uid=1, symbol_order=0)
①
(insn/f 11 3 12 2
(set (mem:DI (pre_dec:DI (reg/f:DI 7 sp)) [0 S8 A8])
(reg/f:DI 6 bp))
"foo.c":2:12
56
{*pushdi2_rex64}
(nil))
②
(insn/f 12 11 13 2
(set (reg/f:DI 6 bp)
(reg/f:DI 7 sp))
"foo.c":2:12
82 {*movdi_internal}
(nil))
③
(insn 9 2 14 2
(const_int 0 [0])
"foo.c":3:3
1032 {nop}
(nil))
④
(insn/f 15 14 16 2
(set (reg/f:DI 6 bp)
(mem:DI (post_inc:DI (reg/f:DI 7 sp)) [0 S8 A8]))
"foo.c":4:1
64 {*popdi1}
(expr_list:REG_CFA_DEF_CFA (plus:DI (reg/f:DI 7 sp)
(const_int 8 [0x8]))
(nil)))
⑤
(jump_insn 16 15 17 2
(simple_return)
"foo.c":4:1
1026 {simple_return_internal}
(nil)
-> simple_return)
⑥
(barrier 17 16 10)
```
① Decrease the stack pointer by 8 bytes, then store the value of `bp` (base poiner) register to the stack. (I.e. push `bp` to stack memory.)
② Store the content in regiser `sp` to register `bp`.
③ NOP (do nothing).
④ Read the stack memory and store to `bp`, then decrease the stack poiner by 8 bytes. (I.e. pop `bp` from stack memory.)
⑤ Return from function.
⑥ Put a barrier.