# Getting Started with LLVM core Libraries
## Ch5 The LLVM Intermediate Representation
### Introducing LLVM IR in-memory model
+ Class
`Function`: represent common IR
`Instruction`: represent common IR
+ prefix
`%` for local variables.
`@` for global variables.
+ IR
`alloca` can reserve stack frame of current function.
+ C++ class that represents IR
1. The `Module` class aggregates all the data for translation. [Full Interface](http://llvm.org/docs/doxygen/html/classllvm_1_1Module.html)
2. The `Function` class contains information about function definitions and declarations. The argument of a function can be accessed using `Fuction::arg_iterator` and `begin()/end()`. [Full Interface](http://llvm.org/docs/doxygen/html/classllvm_1_1Function.html)
3. The `BasicBlock` class encapsulates LLVM instructions. Using `getTerminator()` allows you retrieve the last instruction of this BB. Additionally, `getSinglePredecessor()` can be used to obtain the single predecessor of this BB. [Full Interface](https://llvm.org/doxygen/classllvm_1_1BasicBlock.html)
4. The `Instruction` class represents a signle instruction of LLVM IR. You can retrieve the opcode using `getOpcode()`, which return an enumeration type of `llvm::Instruction`. You can access operands using the methods `op_begin()` and `op_end()`, which are inherited from the `User
+ `Value` and `User` class allow you easily nevigate use-def and def-use chain. `User` inherits from `Value`.
+ The use-def chain indicates that the `User` use what kind of `Value`. The def-use chain keeps track of which other `Value`(include `User`) use this `Value`, is called `use list`.
+ A class that inherits from `Value` is defined to provide a result that can used by another entities. `Value` define the method `use_begin()` and `use_end()` which make you can iterate through all `User`. `Value` has a powerful method called `replaceAllUsesWith()`, it can nevigate through all `User` of this `Value` and replace them by another `Value`. [Full Interface](https://llvm.org/doxygen/classllvm_1_1Value.html)
+ A class being a subclass of `User` indicates that it uses one or more `Value` interfaces. `User` provide `op_begin()` and `op_end()` that allow you go through all `Value` which it uses. This also defined the use-def chain. `User` also provide `replaceUseWith()`.[Full Interface](https://llvm.org/doxygen/classllvm_1_1User.html)
+ LLVM has lots of unique data structures. [programmer's manual](https://llvm.org/docs/ProgrammersManual.html)
### Writing a custom LLVM IR generator
+ record some header file you need
+ #include <llvm/ADT/SmallVector.h>
+ #include <llvm/Analysis/Verifier.h>
+ #include <llvm/IR/BasicBlock.h>
+ #include <llvm/IR/CallingConv.h>
+ #include <llvm/IR/Function.h>
+ #include <llvm/IR/Instructions.h>
+ #include <llvm/IR/LLVMContext.h>
+ #include <llvm/IR/Module.h>
+ #include <llvm/Bitcode/ReaderWriter.h>
+ #include <llvm/Support/ToolOutputFile.h>
+ some functions may use
+ `getGlobalContext()`: get current LLVM context.
+ `setDataLayout()`, `setTargetTriple()`: if you enable optimization, use these two function to specify the target.
```c++
SmallVector<Type*, 2> FuncTyArgs;
FuncTyArgs.push_back(IntegerType::get(mod-getContext(),
32));
FuncTyArgs.push_back(IntegerType::get(mod-getContext(),
32));
FunctionType *FuncTy = FunctionType::get(
/*Result=*/ IntegerType::get(mod->getContext(), 32),
/*Params=*/ FuncTyArgs, /*isVarArg=*/ false);
```
+ `FuncTyArgs` has two element, 32 bits.
+ notice the `FunctionType::get()`, return a 32 bits integer, `isVarArg` means that there is no variable arguments.
```c++
Function *funcSum = Function::Create(
/*Type=*/ FuncTy,
/*Linkage=*/ GlobalValue::ExternalLinkage,
/*Name=*/ "sum", mod);
funcSum->setCallingConv(CallingConv::C);
```
+ `Function::Create()` create a function instance.
+ `setCallingConv()` can set and get calling convention of this function. [reference](https://llvm.org/doxygen/classllvm_1_1Function.html#a5f494edc0a569c7fc9ff4181243be1ed)
```cpp
BasicBlock *labelEntry = BasicBlock::Create(mod->getContext(), "entry", funcSum, 0);
```
+ Create basic block with label "entry"
```cpp
// Block entry (label_entry)
AllocaInst *ptrA = new AllocaInst(IntegerType::get(mod->getContext(), 32), "a.addr", labelEntry);
ptrA->setAlignment(4);
AllocaInst *ptrB = new AllocaInst(IntegerType::get(mod->getContext(), 32), "b.addr", labelEntry);
ptrB->setAlignment(4);
```
+ create two 32 bits stack element with 4-byte alignment.
+ the instruction will add at the end of basic block.
:::info
Alternatively, you can use a helper template class called IRBuilder<> to build IR instructions. [IR Builder link](https://llvm.org/doxygen/classllvm_1_1IRBuilder.html)
:::
```cpp
StoreInst *st0 = new StoreInst(int32_a, ptrA, false, labelEntry);
st0->setAlignment(4);
StoreInst *st1 = new StoreInst(int32_b, ptrB, false, labelEntry);
st1->setAlignment(4);
```
+ store the values returned by `AllocaInst` into the stack.
+ the third argument specifies whether this is a volatile store.
```cpp
LoadInst *ld0 = new LoadInst(ptrA, "", false, labelEntry);
ld0->setAlignment(4);
LoadInst *ld1 = new LoadInst(ptrB, "", false, labelEntry);
ld1->setAlignment(4);
BinaryOperator *addRes = BinaryOperator::Create(Instruction::Add,
ld0, ld1, "add", labelEntry);
ReturnInst::Create(mod->getContext(), addRes, labelEntry);
return mod;
}
```
+ use nonvolatile load, loading the value back to `ld0` and `ld1` as the arguments for add instruction.
+ `addRes` is the result of add instruction.
+ this function ends here, return the IR module.
### Pass API
+ We don't use `Pass` class directly, but only through subclass. The subclasses are implemented by granularity. There are three subclasses which are used in different scope of optimization.
+ Subclass of Pass:
+ [Writting and LLVM Pass](https://llvm.org/docs/WritingAnLLVMPass.html)
+ `ModulePass`: allow entire module to be analyzed. It allow deletion of functions and other changes.
+ `FunctionPass`: it's used to handling one function at a time. It's not allow to delete functions and delete globals.
+ `BasicBlockPass`: same modification forbidden in `FunctionPass` also forbid here. It also forbid to change or delete external basic blocks.
The usage of `ModulePass` and `BasicBlockPass` need to write a class and inherits from the pass which you want. And overload `runOnModule()` or `runOnBasicBlock()` functions. Using `FunctionPass` only write a class and overload `runOnFunction()`.
## Ch6 The Backend
### Overview

The diagram is overview of the necessary step to go from LLVM IR to object code or assembly.
+ The light gray block is called `superpasses` because it consist of many passes internally.
+ `Instruction Selections`: ==This phase will converts the LLVM IR to `SelectionDAG` nodes.== Each DAG represents a basic block, and a node of DAG is represents a isntruction in the basic block. The edges may encode dataflow among them. Some algorithms need to use tree-structure to analyze, so DAG is important for these algo. By the end of this phase, DAG will convert to target-machine code.
+ `Instruction Scheduling`: We need to convert DAG to three-address instructions due to the DAG does't imply ordering between instructions. The first instance (implys that it has second instance later.) of `Instruction Scheduling` is also called **Pre-register Allocation (RA)**. The DAG will be convert to the `MachineInstr` three-address representation.
+ `Register Allocation (RA)`: LLVM IR has the characteristic of infinit register before RA.
+ `Second Instruction Scheduling`: also called **Post-register Allocation Scheduling.** Since the real register is implement in instruction, it will has some hazard or delays with some register. So `Instruction Scheduling` can improve the instruction order.
+ `Code Emission`: convert instruction from `MachineInstr` to `MCInst` instances.
### Instruction selection phase
+ Instruction phase is transform LLVM IR into `SelectionDAG` nodes which carry instruction operation.
:::info
instruction selection is most expensive pass in llc. Under the SPEC CPU2006, instruction selection uses half of the time spent on llc.
:::
#### The SelectionDAG class

+ Black arrow represent the normal edge of DAG which means the dataflow of the instructions.
+ Blue dashed arrow indicates that the ordering relationship among instructions.
+ Red arrow means that these two insrtuctions must glued together, no instructions can insert between them.
A node can supply different type of value. There are three kinds of types :
1. concrete value type : The type can be i32, f32...
2. `Other` type : An abstrct token which can represent chain value. When another node consumes a `Other` type, the blue dash line will connect with two nodes.
3. `Glue` type : When a node consumes a `Glue` type, they will be glued by red line.
+ `SelectionDAG` has a element marks the entry of the basic block and supply a `Other` type node which allows chain node to start by consume it.
:::info
[SelectDAG.h, L253](https://github.com/llvm/llvm-project/blob/main/llvm/include/llvm/CodeGen/SelectionDAG.h#L253) `EntryNode`.
[SelectDAG.h, L554](https://github.com/llvm/llvm-project/blob/main/llvm/include/llvm/CodeGen/SelectionDAG.h#L554) `getEntryNode` function. I think it just create a node which use `SDValue` from `SelectionDAGNodes.h`.
[SelectionDAGNodes.h, L153](https://github.com/llvm/llvm-project/blob/main/llvm/include/llvm/CodeGen/SelectionDAGNodes.h#L153)`SDValue`, I think it is the constructor of `SDValue`.
:::
+ SelectionDAG has a reference to the last instrcutions which also encode as `Other` type, which is `GraphRoot` in the graph.
:::info
[SelectionDAG.h, L256](https://github.com/llvm/llvm-project/blob/main/llvm/include/llvm/CodeGen/SelectionDAG.h#L256) `Root` data.
[SelectionDAG.h, L560](https://github.com/llvm/llvm-project/blob/main/llvm/include/llvm/CodeGen/SelectionDAG.h#L560) `setRoot` function.
[SelectionDAG.h, L552](https://github.com/llvm/llvm-project/blob/main/llvm/include/llvm/CodeGen/SelectionDAG.h#L551) `getRoot` function, return the root tag.
:::
:::warning
The points here is the relation between DAG nodes and the meanings of different types.
:::
#### Lowering

Some special IR instructions like `RET` and `CALL` are target-dependent. That's why there have some target-dependent nodes in the first DAG (which is produced by `selectionDAGBuilder`).
#### DAG combine and legalization
+ DAG combine pass :
the pass match a set of nodes and replace with a simpler construct.
For example: `add (RegisterX, 0)` -> `RegisterX`
Combine pass will execute after legalization phase so that can reduce redundancy DAG.
:::info
[DAGCombiner.cpp](https://github.com/llvm/llvm-project/blob/main/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp)
:::
+ Legalization pass :
+ Legal type : The type which natively supported by target.
for example : the target support i32, but DAG shows the addition of i64.
+ Legalization pass guarantees that instruction selection only needs to deal with ==legal== types.
#### DAG to DAG instrcution selection
+ The ==DAG to DAG instruction selection== is used to transform target-**independent** nodes to target-**dependent** nodes.
+ The instruction selection algo work on a DAG which is a basic block.
+ `CopyToReg`, `CopyFromReg` and `Register` nodes are untouched, maintain it until register allocation phase.
+ A node has three types, first is general LLVM `ISD` nodes, next is `<Target>ISD`, last is tarrget-specific `X86ISD`.
##### Pattern matching
+ Each taget has a subclass `selectionDAGISel` named `<Target_Name>DAGToDAGISel`.
+ The `Select()` method will recieved `SDNode` and to be matched and return a `SDNode` which represent physical instruction.
+ `Select()` has two methods to match physical instructions.
1. Directly call TableGen to generate `SelectCode()` method.
2. Provide some custom matching code before `SelectCode()`.
#### Visualizing the instruction selection process
+ use some flag of llc. page 172 of pdf file.
#### Fast Instruction Selection
The goal of fast instruction selection is providing code quickly but the quality of code is not good. Which suits the philosophy of the -O0 opt.
:::info
[SelectionDAG/FastISel.cpp](https://github.com/llvm/llvm-project/blob/main/llvm/lib/CodeGen/SelectionDAG/FastISel.cpp)
:::
### Scheduler
+ The former chapter transformed the DAG to DAG which has physical instruction information. ==The next stage== is to comprise a pre-register allocation scheduler working on `SDNodes`. Each of scheduler is a subclass of `SchedulDAGNodes`.
+ Scheduler type can be assigned by llc flags.
+ `list-ilp`, `list-hybrid`, `list-burr`.
+ `fast`
+ `vliw-td`
:::warning
During the code generation, there has three scheduler. Two of them is prior Register Allocation, the other is post RA.
:::
#### Instruction itineraries
+ Itinerary is like a table which records some information about the processor. Including instruction latency and hardware pipeline information. The scheduler use these attributes to maximize throughtput or avoid hazard.
:::info
[target directory](https://github.com/llvm/llvm-project/tree/main/llvm/lib/Target) to find TableGen data (.td) in each processor directory.
For example : [AArch64/AArch64SchedA510.td](https://github.com/llvm/llvm-project/blob/main/llvm/lib/Target/AArch64/AArch64SchedA510.td)
:::
#### Hazard detection
+ The hazard recognizer use the itinerary provided by processor to compute hazard.
+ Each processor can provide their own recognizer.
:::info
[ScoreboardHazardRecognizer.cpp](https://github.com/llvm/llvm-project/blob/main/llvm/lib/CodeGen/ScoreboardHazardRecognizer.cpp)
:::
#### Scheduling Unit
+ The scheduler run before and after RA. But `SDNodes` only exist before RA. After RA, it would use `MachineInstr` class.
+ `SUnit` is used to abstract the underlying instructions which used during instruction scheduling.
:::info
[SUnit](https://github.com/llvm/llvm-project/blob/main/llvm/include/llvm/CodeGen/ScheduleDAG.h#L242)
:::
### Machine instructions
+ `InstrEmitter` pass runs after scheduling, it can transfer `SDNode` to `MachineInstr` (MI in short) format.
### Register allocation
+ The basic task of RA is to transform endless virtual register to limited physical register.
+ Some virtual register will be allocated to memory location, called **spill slot**.
+ RA also needs to deconstruct SSA form, including phi function.
+ LLVM has four RA implementations that can be selected by `llc`:
+ `pbqp`: Partitioned Boolean Quadratic Programming.
+ `greedy`
+ `basic`
+ `fast`
+ Default method is selected by the optimization level (-O0).

#### Register coalescer
+ The register coalescer is used to remove redundant copy instructions by joining intervals.
+ is implemented in `RegisterCoalescer` class. The pass work with `MachineInstr`
+ Index:
+ B: conrresponds to block
+ R: register
+ the interval of register is represented as `[32r:48r:0)`. `32r` and `48r` represent the living interval, which is defined at 32 and killed in 48.
+ If there have a COPY instructions with R1 and R2 register, and their alive time are `[0B,32r:0)` and `[32r:48r:0)`. These are the prerequisite for a coalescing.
#### Virtual register rewrite
+ `VirtRegMap` records the mapping of virtual reg and physical reg.
+ `VirtRegRewriter` uses `VirtRegMap` to replace virtual regs with physical regs.
+ If there have some remaining identity `COPY`, the allocator will assign the same reg and the rewriter can delete it.
:::info
[VirtRegMap.cpp](https://github.com/llvm/llvm-project/blob/main/llvm/lib/CodeGen/VirtRegMap.cpp)
:::
#### Target hook
+ To generate taeget-specific code, the allocator/generator will need the information about the targer.
+ The target information is in subclass of `TargetRegisterInfo` (like `X86GenRegisterInfo`).
### Prologue and epilogue
+ Function needs a prologue and a epilogue.
+ Prologue sets up the stack frame and callee-saved register during the beginning of a function.
+ epilogue cleans up the stack frame.
:::info
[llvm/lib/Target/\<Target>/\<Target>FrameLowering.cpp](https://github.com/llvm/llvm-project/blob/main/llvm/lib/Target/ARM/ARMFrameLowering.cpp)
內有 \<Target>FrameLowering::emitPrologue() and \<Target>FrameLowering::emitEpilogue()
:::
#### Frame index
+ LLVM uses virtual stack frame ==during code generation==.
+ After insert the prologue, it will allocate stack frame and provide target-specific info.
+ `eliminateFrameIndex()` in `<Target>RegisterInfo` implements the replacement by conveting virtual frame index to real stack offset.
### Machine Code framework
#### MC instructions
+ The machine code instruction(`MCInst`) replace the machine instruction(`MachineInstr`).
+ `MCInst` contains less info about the program.
:::info
[include/llvm/MC/MCInst.h](https://github.com/llvm/llvm-project/blob/main/llvm/include/llvm/MC/MCInst.h)
:::
#### Code emission

+ `AsmPrinter` is the first machine function pass that first emit function header and iterate all basic blocks. And `AsmPrinter` also dispatching a `MI` at a time to the `emitInstruction()` method.
+ `emitInstruction()` transforms the `MI` into `MCInst` through the `MVLowering` interface.
+ At this moment, there have two way to choose. The first is `MCAsmStreamer` that emits the assembly code. The second is `MCObjectStreamer` that emits binary instructions. Both are subclass of `MCStreamer`.
+ Assembler use `MCCodeEmitter::EncodeInstruction()` to transform `MCInst` into binary instructions.