Getting Started with LLVM core

# Getting Started with LLVM core Libraries ## Ch5 The LLVM Intermediate Representation ### Introducing LLVM IR in-memory model + Class `Function`: represent common IR `Instruction`: represent common IR + prefix `%` for local variables. `@` for global variables. + IR `alloca` can reserve stack frame of current function. + C++ class that represents IR 1. The `Module` class aggregates all the data for translation. [Full Interface](http://llvm.org/docs/doxygen/html/classllvm_1_1Module.html) 2. The `Function` class contains information about function definitions and declarations. The argument of a function can be accessed using `Fuction::arg_iterator` and `begin()/end()`. [Full Interface](http://llvm.org/docs/doxygen/html/classllvm_1_1Function.html) 3. The `BasicBlock` class encapsulates LLVM instructions. Using `getTerminator()` allows you retrieve the last instruction of this BB. Additionally, `getSinglePredecessor()` can be used to obtain the single predecessor of this BB. [Full Interface](https://llvm.org/doxygen/classllvm_1_1BasicBlock.html) 4. The `Instruction` class represents a signle instruction of LLVM IR. You can retrieve the opcode using `getOpcode()`, which return an enumeration type of `llvm::Instruction`. You can access operands using the methods `op_begin()` and `op_end()`, which are inherited from the `User + `Value` and `User` class allow you easily nevigate use-def and def-use chain. `User` inherits from `Value`. + The use-def chain indicates that the `User` use what kind of `Value`. The def-use chain keeps track of which other `Value`(include `User`) use this `Value`, is called `use list`. + A class that inherits from `Value` is defined to provide a result that can used by another entities. `Value` define the method `use_begin()` and `use_end()` which make you can iterate through all `User`. `Value` has a powerful method called `replaceAllUsesWith()`, it can nevigate through all `User` of this `Value` and replace them by another `Value`. [Full Interface](https://llvm.org/doxygen/classllvm_1_1Value.html) + A class being a subclass of `User` indicates that it uses one or more `Value` interfaces. `User` provide `op_begin()` and `op_end()` that allow you go through all `Value` which it uses. This also defined the use-def chain. `User` also provide `replaceUseWith()`.[Full Interface](https://llvm.org/doxygen/classllvm_1_1User.html) + LLVM has lots of unique data structures. [programmer's manual](https://llvm.org/docs/ProgrammersManual.html) ### Writing a custom LLVM IR generator + record some header file you need + #include <llvm/ADT/SmallVector.h> + #include <llvm/Analysis/Verifier.h> + #include <llvm/IR/BasicBlock.h> + #include <llvm/IR/CallingConv.h> + #include <llvm/IR/Function.h> + #include <llvm/IR/Instructions.h> + #include <llvm/IR/LLVMContext.h> + #include <llvm/IR/Module.h> + #include <llvm/Bitcode/ReaderWriter.h> + #include <llvm/Support/ToolOutputFile.h> + some functions may use + `getGlobalContext()`: get current LLVM context. + `setDataLayout()`, `setTargetTriple()`: if you enable optimization, use these two function to specify the target. ```c++ SmallVector<Type*, 2> FuncTyArgs; FuncTyArgs.push_back(IntegerType::get(mod-getContext(), 32)); FuncTyArgs.push_back(IntegerType::get(mod-getContext(), 32)); FunctionType *FuncTy = FunctionType::get( /*Result=*/ IntegerType::get(mod->getContext(), 32), /*Params=*/ FuncTyArgs, /*isVarArg=*/ false); ``` + `FuncTyArgs` has two element, 32 bits. + notice the `FunctionType::get()`, return a 32 bits integer, `isVarArg` means that there is no variable arguments. ```c++ Function *funcSum = Function::Create( /*Type=*/ FuncTy, /*Linkage=*/ GlobalValue::ExternalLinkage, /*Name=*/ "sum", mod); funcSum->setCallingConv(CallingConv::C); ``` + `Function::Create()` create a function instance. + `setCallingConv()` can set and get calling convention of this function. [reference](https://llvm.org/doxygen/classllvm_1_1Function.html#a5f494edc0a569c7fc9ff4181243be1ed) ```cpp BasicBlock *labelEntry = BasicBlock::Create(mod->getContext(), "entry", funcSum, 0); ``` + Create basic block with label "entry" ```cpp // Block entry (label_entry) AllocaInst *ptrA = new AllocaInst(IntegerType::get(mod->getContext(), 32), "a.addr", labelEntry); ptrA->setAlignment(4); AllocaInst *ptrB = new AllocaInst(IntegerType::get(mod->getContext(), 32), "b.addr", labelEntry); ptrB->setAlignment(4); ``` + create two 32 bits stack element with 4-byte alignment. + the instruction will add at the end of basic block. :::info Alternatively, you can use a helper template class called IRBuilder<> to build IR instructions. [IR Builder link](https://llvm.org/doxygen/classllvm_1_1IRBuilder.html) ::: ```cpp StoreInst *st0 = new StoreInst(int32_a, ptrA, false, labelEntry); st0->setAlignment(4); StoreInst *st1 = new StoreInst(int32_b, ptrB, false, labelEntry); st1->setAlignment(4); ``` + store the values returned by `AllocaInst` into the stack. + the third argument specifies whether this is a volatile store. ```cpp LoadInst *ld0 = new LoadInst(ptrA, "", false, labelEntry); ld0->setAlignment(4); LoadInst *ld1 = new LoadInst(ptrB, "", false, labelEntry); ld1->setAlignment(4); BinaryOperator *addRes = BinaryOperator::Create(Instruction::Add, ld0, ld1, "add", labelEntry); ReturnInst::Create(mod->getContext(), addRes, labelEntry); return mod; } ``` + use nonvolatile load, loading the value back to `ld0` and `ld1` as the arguments for add instruction. + `addRes` is the result of add instruction. + this function ends here, return the IR module. ### Pass API + We don't use `Pass` class directly, but only through subclass. The subclasses are implemented by granularity. There are three subclasses which are used in different scope of optimization. + Subclass of Pass: + [Writting and LLVM Pass](https://llvm.org/docs/WritingAnLLVMPass.html) + `ModulePass`: allow entire module to be analyzed. It allow deletion of functions and other changes. + `FunctionPass`: it's used to handling one function at a time. It's not allow to delete functions and delete globals. + `BasicBlockPass`: same modification forbidden in `FunctionPass` also forbid here. It also forbid to change or delete external basic blocks. The usage of `ModulePass` and `BasicBlockPass` need to write a class and inherits from the pass which you want. And overload `runOnModule()` or `runOnBasicBlock()` functions. Using `FunctionPass` only write a class and overload `runOnFunction()`. ## Ch6 The Backend ### Overview ![](https://hackmd.io/_uploads/ryYryCYp2.png) The diagram is overview of the necessary step to go from LLVM IR to object code or assembly. + The light gray block is called `superpasses` because it consist of many passes internally. + `Instruction Selections`: ==This phase will converts the LLVM IR to `SelectionDAG` nodes.== Each DAG represents a basic block, and a node of DAG is represents a isntruction in the basic block. The edges may encode dataflow among them. Some algorithms need to use tree-structure to analyze, so DAG is important for these algo. By the end of this phase, DAG will convert to target-machine code. + `Instruction Scheduling`: We need to convert DAG to three-address instructions due to the DAG does't imply ordering between instructions. The first instance (implys that it has second instance later.) of `Instruction Scheduling` is also called **Pre-register Allocation (RA)**. The DAG will be convert to the `MachineInstr` three-address representation. + `Register Allocation (RA)`: LLVM IR has the characteristic of infinit register before RA. + `Second Instruction Scheduling`: also called **Post-register Allocation Scheduling.** Since the real register is implement in instruction, it will has some hazard or delays with some register. So `Instruction Scheduling` can improve the instruction order. + `Code Emission`: convert instruction from `MachineInstr` to `MCInst` instances. ### Instruction selection phase + Instruction phase is transform LLVM IR into `SelectionDAG` nodes which carry instruction operation. :::info instruction selection is most expensive pass in llc. Under the SPEC CPU2006, instruction selection uses half of the time spent on llc. ::: #### The SelectionDAG class ![](https://hackmd.io/_uploads/SkdBqw3T3.png) + Black arrow represent the normal edge of DAG which means the dataflow of the instructions. + Blue dashed arrow indicates that the ordering relationship among instructions. + Red arrow means that these two insrtuctions must glued together, no instructions can insert between them. A node can supply different type of value. There are three kinds of types : 1. concrete value type : The type can be i32, f32... 2. `Other` type : An abstrct token which can represent chain value. When another node consumes a `Other` type, the blue dash line will connect with two nodes. 3. `Glue` type : When a node consumes a `Glue` type, they will be glued by red line. + `SelectionDAG` has a element marks the entry of the basic block and supply a `Other` type node which allows chain node to start by consume it. :::info [SelectDAG.h, L253](https://github.com/llvm/llvm-project/blob/main/llvm/include/llvm/CodeGen/SelectionDAG.h#L253) `EntryNode`. [SelectDAG.h, L554](https://github.com/llvm/llvm-project/blob/main/llvm/include/llvm/CodeGen/SelectionDAG.h#L554) `getEntryNode` function. I think it just create a node which use `SDValue` from `SelectionDAGNodes.h`. [SelectionDAGNodes.h, L153](https://github.com/llvm/llvm-project/blob/main/llvm/include/llvm/CodeGen/SelectionDAGNodes.h#L153)`SDValue`, I think it is the constructor of `SDValue`. ::: + SelectionDAG has a reference to the last instrcutions which also encode as `Other` type, which is `GraphRoot` in the graph. :::info [SelectionDAG.h, L256](https://github.com/llvm/llvm-project/blob/main/llvm/include/llvm/CodeGen/SelectionDAG.h#L256) `Root` data. [SelectionDAG.h, L560](https://github.com/llvm/llvm-project/blob/main/llvm/include/llvm/CodeGen/SelectionDAG.h#L560) `setRoot` function. [SelectionDAG.h, L552](https://github.com/llvm/llvm-project/blob/main/llvm/include/llvm/CodeGen/SelectionDAG.h#L551) `getRoot` function, return the root tag. ::: :::warning The points here is the relation between DAG nodes and the meanings of different types. ::: #### Lowering ![](https://hackmd.io/_uploads/Bkm2P5dC3.png) Some special IR instructions like `RET` and `CALL` are target-dependent. That's why there have some target-dependent nodes in the first DAG (which is produced by `selectionDAGBuilder`). #### DAG combine and legalization + DAG combine pass : the pass match a set of nodes and replace with a simpler construct. For example: `add (RegisterX, 0)` -> `RegisterX` Combine pass will execute after legalization phase so that can reduce redundancy DAG. :::info [DAGCombiner.cpp](https://github.com/llvm/llvm-project/blob/main/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp) ::: + Legalization pass : + Legal type : The type which natively supported by target. for example : the target support i32, but DAG shows the addition of i64. + Legalization pass guarantees that instruction selection only needs to deal with ==legal== types. #### DAG to DAG instrcution selection + The ==DAG to DAG instruction selection== is used to transform target-**independent** nodes to target-**dependent** nodes. + The instruction selection algo work on a DAG which is a basic block. + `CopyToReg`, `CopyFromReg` and `Register` nodes are untouched, maintain it until register allocation phase. + A node has three types, first is general LLVM `ISD` nodes, next is `<Target>ISD`, last is tarrget-specific `X86ISD`. ##### Pattern matching + Each taget has a subclass `selectionDAGISel` named `<Target_Name>DAGToDAGISel`. + The `Select()` method will recieved `SDNode` and to be matched and return a `SDNode` which represent physical instruction. + `Select()` has two methods to match physical instructions. 1. Directly call TableGen to generate `SelectCode()` method. 2. Provide some custom matching code before `SelectCode()`. #### Visualizing the instruction selection process + use some flag of llc. page 172 of pdf file. #### Fast Instruction Selection The goal of fast instruction selection is providing code quickly but the quality of code is not good. Which suits the philosophy of the -O0 opt. :::info [SelectionDAG/FastISel.cpp](https://github.com/llvm/llvm-project/blob/main/llvm/lib/CodeGen/SelectionDAG/FastISel.cpp) ::: ### Scheduler + The former chapter transformed the DAG to DAG which has physical instruction information. ==The next stage== is to comprise a pre-register allocation scheduler working on `SDNodes`. Each of scheduler is a subclass of `SchedulDAGNodes`. + Scheduler type can be assigned by llc flags. + `list-ilp`, `list-hybrid`, `list-burr`. + `fast` + `vliw-td` :::warning During the code generation, there has three scheduler. Two of them is prior Register Allocation, the other is post RA. ::: #### Instruction itineraries + Itinerary is like a table which records some information about the processor. Including instruction latency and hardware pipeline information. The scheduler use these attributes to maximize throughtput or avoid hazard. :::info [target directory](https://github.com/llvm/llvm-project/tree/main/llvm/lib/Target) to find TableGen data (.td) in each processor directory. For example : [AArch64/AArch64SchedA510.td](https://github.com/llvm/llvm-project/blob/main/llvm/lib/Target/AArch64/AArch64SchedA510.td) ::: #### Hazard detection + The hazard recognizer use the itinerary provided by processor to compute hazard. + Each processor can provide their own recognizer. :::info [ScoreboardHazardRecognizer.cpp](https://github.com/llvm/llvm-project/blob/main/llvm/lib/CodeGen/ScoreboardHazardRecognizer.cpp) ::: #### Scheduling Unit + The scheduler run before and after RA. But `SDNodes` only exist before RA. After RA, it would use `MachineInstr` class. + `SUnit` is used to abstract the underlying instructions which used during instruction scheduling. :::info [SUnit](https://github.com/llvm/llvm-project/blob/main/llvm/include/llvm/CodeGen/ScheduleDAG.h#L242) ::: ### Machine instructions + `InstrEmitter` pass runs after scheduling, it can transfer `SDNode` to `MachineInstr` (MI in short) format. ### Register allocation + The basic task of RA is to transform endless virtual register to limited physical register. + Some virtual register will be allocated to memory location, called **spill slot**. + RA also needs to deconstruct SSA form, including phi function. + LLVM has four RA implementations that can be selected by `llc`: + `pbqp`: Partitioned Boolean Quadratic Programming. + `greedy` + `basic` + `fast` + Default method is selected by the optimization level (-O0). ![](https://hackmd.io/_uploads/Hys0QQOk6.png) #### Register coalescer + The register coalescer is used to remove redundant copy instructions by joining intervals. + is implemented in `RegisterCoalescer` class. The pass work with `MachineInstr` + Index: + B: conrresponds to block + R: register + the interval of register is represented as `[32r:48r:0)`. `32r` and `48r` represent the living interval, which is defined at 32 and killed in 48. + If there have a COPY instructions with R1 and R2 register, and their alive time are `[0B,32r:0)` and `[32r:48r:0)`. These are the prerequisite for a coalescing. #### Virtual register rewrite + `VirtRegMap` records the mapping of virtual reg and physical reg. + `VirtRegRewriter` uses `VirtRegMap` to replace virtual regs with physical regs. + If there have some remaining identity `COPY`, the allocator will assign the same reg and the rewriter can delete it. :::info [VirtRegMap.cpp](https://github.com/llvm/llvm-project/blob/main/llvm/lib/CodeGen/VirtRegMap.cpp) ::: #### Target hook + To generate taeget-specific code, the allocator/generator will need the information about the targer. + The target information is in subclass of `TargetRegisterInfo` (like `X86GenRegisterInfo`). ### Prologue and epilogue + Function needs a prologue and a epilogue. + Prologue sets up the stack frame and callee-saved register during the beginning of a function. + epilogue cleans up the stack frame. :::info [llvm/lib/Target/\<Target>/\<Target>FrameLowering.cpp](https://github.com/llvm/llvm-project/blob/main/llvm/lib/Target/ARM/ARMFrameLowering.cpp) 內有 \<Target>FrameLowering::emitPrologue() and \<Target>FrameLowering::emitEpilogue() ::: #### Frame index + LLVM uses virtual stack frame ==during code generation==. + After insert the prologue, it will allocate stack frame and provide target-specific info. + `eliminateFrameIndex()` in `<Target>RegisterInfo` implements the replacement by conveting virtual frame index to real stack offset. ### Machine Code framework #### MC instructions + The machine code instruction(`MCInst`) replace the machine instruction(`MachineInstr`). + `MCInst` contains less info about the program. :::info [include/llvm/MC/MCInst.h](https://github.com/llvm/llvm-project/blob/main/llvm/include/llvm/MC/MCInst.h) ::: #### Code emission ![](https://hackmd.io/_uploads/S1eZNLSlp.png) + `AsmPrinter` is the first machine function pass that first emit function header and iterate all basic blocks. And `AsmPrinter` also dispatching a `MI` at a time to the `emitInstruction()` method. + `emitInstruction()` transforms the `MI` into `MCInst` through the `MVLowering` interface. + At this moment, there have two way to choose. The first is `MCAsmStreamer` that emits the assembly code. The second is `MCObjectStreamer` that emits binary instructions. Both are subclass of `MCStreamer`. + Assembler use `MCCodeEmitter::EncodeInstruction()` to transform `MCInst` into binary instructions.