# A Survey on Register Allocation

###### tags: `paper` `compiler` `back-end`

## 0. Abstract
- Register allocation is the problem of mapping program variables to either machine registers or memory addresses.
- Because registers exist in limited number, some variables might have to be sent to memory. These are called ==spilled variables==.
- Register allocation is one of the most important problems in ==compiler optimization==.
- In this survey we describe different strategies that compilers use to perform register allocation, and we study the different tradeoffs involved in each algorithm.
- The first half of the survey covers register allocation from a general perspective, **whereas the second half gives special attention to the recent breakthroughs in the field of SSA-based register allocation.**

## 1. Introduction
- Computer architectures rely on a memory hierarchy to store the data that is manipulated by programs.
- Reading from and writing to registers in a modern computer takes no more than one cycle of the CPU clock.
- Because registers are so fast and so few, one of the greatest challenges for compiler writers is to design algorithms that **keep the most used program variables in registers, while relegating the least used variables to memory if necessary. Register allocation is thus the problem of mapping variables to registers or memory. This is one of the most important compiler optimizations.**
- As an example, experiments show that an optimal allocator can produce code that is over 250% faster than the code produced by a naive algorithm that maps all the variables to memory.
- A program can be described by its control flow graph.
- The control flow graph is formed by basic blocks.
- Each basic block has a unique label, and a list of instructions.
- Instructions are the primary constituents of programs.
- Each instruction may apply an operation on some variables, which are called the used variables. An instruction may define one or more variables; these are called the defined variables.
- Branch instructions, which transfer control between basic blocks, are a special case: the only operands of these instructions are basic block labels.
- We call a program point the point between two consecutive instructions.
- We say that a variable v is ==alive== at a program point p if there is a path from p to an instruction that uses v along which v is not redefined by any instruction.
- The collection of program points where a variable is alive is called its ==live range==.
- We say that two variables ==interfere== if the intersection between their live ranges is non-empty. In this case, we also say that their live ranges overlap.
- The concept of interference is very important for register allocation, because ==two variables that do not interfere can be stored in the same register==.
- A simple algorithm to compute liveness information is given by Appel and Palsberg.

:::spoiler For instance : the live range
the live range of variable b is {p2, p3, p4, p9}, whereas the live range of variable a includes all the program points but p11, p12 and p16. Notice that a is not alive at program points p11 and p12 because this variable is redefined by the instruction a = f, and its old value is not necessary after the instruction f = a.
:::

:::spoiler For instance : interference
variables c and d interfere, because their live ranges overlap at program point p5; however, variables c and e do not interfere.
:::

# 1.1 Irregular Architectures
Modern computer architectures present irregular register banks. The two most common sources of irregularities are pre-colored registers and aliasing.
# 1.2 Pre-coloring
- Pre-coloring is a very common phenomenon that forces some variables to be assigned to particular machine registers.
- A typical example is parameter passing in function calls. Architectures such as PowerPC and StrongARM use registers to pass arguments to functions.

:::spoiler For instance : a two argument function call in PowerPC
A two argument function call in PowerPC is written in assembly in a way similar to the code strip below:
```c=
r0 = arg0 ; // the first argument must be passed in r0
r1 = arg1 ; // the second argument must be passed in r1
bl foo    ; // branch and link, i.e., calls the function foo
```
The variables arg0 and arg1 can be stored in any register, but the variables r0 and r1 cannot: ==they are already allocated to registers r0 and r1==.
:::

:::spoiler Many other common examples of pre-coloring are found in x86
In that architecture, the results of a division operation must be placed in the registers edx and eax. As a convention, we name a pre-colored variable with the name of its pre-coloring register. For instance, the variable AL in block L3 of Figure 2 is pre-colored with register AL.
:::

# 1.3 Aliasing
- Register aliasing happens when the same physical bits can be accessed under more than one register name, so that an assignment to one register may change the value of another. For instance, in x86 the 16-bit register AX is composed of the 8-bit registers AH and AL: writing to AH or AL also modifies AX.

# 1.4 Some Register Allocation Jargon
## Spilling
Because registers exist in limited number, they may not be enough to store all the variables in the source program. If that is the case, then some variables ==must be mapped to memory==. The act of storing a variable into memory is called spilling.

## Coalescing
If two variables v1 and v2 do not interfere, and they are related by a copy instruction, that is, the source program contains an instruction such as v1 = v2, then it is desirable that these variables be allocated into the same register r. In this case, we will have the copy instruction r = r, which is redundant and can safely be removed from the target program. Coalescing is the act of mapping two non-interfering variables that are related by a copy instruction to the same register.

:::spoiler For instance
the instructions f = a and a = f can be eliminated from the program if variables a and f are assigned to the same register.
:::

- ==A good register allocator should not only assign different registers to interfering variables, but also try to assign the same register to variables related by copies.==

## Live Range Splitting
- **This concept is the inverse of coalescing**. Whereas coalescing joins the live ranges of variables by removing copies from the source program, live range splitting divides the live ranges of variables **by adding copies to the program and renaming variables**. ==The splitting of live ranges tends to reduce the interferences between variables; thus, it might reduce the number of registers required by programs==.

# 2. Different Register Allocation Approaches
Register allocation is possibly the compilation problem with the greatest number of different algorithms already described in the literature. In the remainder of this section we describe several approaches to register allocation, using the program in Figure 2 as a running example.

# 2.1 Register Allocation via Graph Coloring
- **Graph coloring is the most used approach to solve register allocation**.
- The interference graph of a program is the intersection graph of the live ranges of the variables in the program.
- Given a program P, its interference graph G = (V, E) contains a vertex for each variable v of P.
- An edge (u, v) is in E if the intersection of the live ranges of variables u and v is non-empty (a small construction sketch is given below).
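To make this definition concrete, here is a minimal sketch in Python, assuming live ranges have already been computed as sets of program points; the variable names and live ranges are hypothetical and are not taken from the survey's Figure 2.

```python
from itertools import combinations

def build_interference_graph(live_ranges):
    """Map each variable to the set of variables it interferes with:
    two variables interfere when their live ranges share a program point."""
    graph = {v: set() for v in live_ranges}
    for u, v in combinations(live_ranges, 2):
        if live_ranges[u] & live_ranges[v]:  # live ranges overlap
            graph[u].add(v)
            graph[v].add(u)
    return graph

# Hypothetical live ranges over program points p1, p2, ...
live_ranges = {
    "a": {"p1", "p2", "p3"},
    "b": {"p2", "p3"},
    "c": {"p4", "p5"},
}
print(build_interference_graph(live_ranges))
# a and b interfere; c interferes with nobody, so it may share a register.
```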
- The problem of assigning registers to variables can thus be approximated by coloring the interference graph of the source program. Each color corresponds to a register, and interfering variables receive different colors, since they are adjacent in the interference graph.
- One of the first and most celebrated graph coloring based register allocators was described by Chaitin et al. [15, 16].
- This algorithm laid the foundations of practically all the graph coloring based register allocators that were introduced later.
- The core of Chaitin's algorithm is Kempe's coloring scheme.
- Basically, a node v in the interference graph G can be colored if it has fewer than K neighbors, where K is the number of colors available.
- In this case, **the node v can be safely removed from G** and placed on a stack of nodes that are ==guaranteed to be colorable==. This process, called ==simplification==, iterates until there are no more nodes to remove from G.
- Two aspects of the register allocation problem complicate this technique: spilling and coalescing.
- Spilling is the act of mapping a variable to memory because there are no more registers available to hold its value.
- Coalescing is the act of mapping two non-interfering variables that are related by a ==copy instruction== to the same register.

:::spoiler For instance
the instructions f = a and a = f can be eliminated from the program if variables a and f are assigned to the same register. Therefore, a good graph coloring based register allocator should not only assign different colors to interfering variables, **but also try to assign the same color to variables related by copies.**
:::

---
Due to spilling and coalescing, Chaitin's allocator is organized in the following phases:
1. ***Renumber*** : discover live range information in the source program.
2. ***Build*** : build the interference graph.
3. ***Coalesce*** : merge the live ranges of non-interfering variables related by copy instructions.
4. ***Spill cost*** : compute the spill cost of each variable, that is, a measure of the impact of mapping that variable to memory on the speed of the final program.
5. ***Simplify*** : remove nodes from the graph using Kempe's coloring method.
6. ***Spill code*** : insert spill instructions, i.e., loads and stores that transfer values between registers and memory.

- One of the early achievements of Chaitin et al. was to show that spill-free register allocation is an NP-complete problem. Basically, Chaitin et al. proved that any graph is the interference graph of some program.
- Chaitin's algorithm had a few deficiencies that were improved by later works. One of the problems of that allocator was its ==aggressive coalescing policy==.

:::spoiler NOTE
A) *aggressive coalescing* removes as many moves as possible, regardless of the colorability of the resulting interference graph;
B) *conservative coalescing* removes as many moves as possible while keeping the colorability of the graph;
C) *incremental conservative coalescing* removes one particular move while keeping the colorability of the graph;
D) *optimistic coalescing* coalesces all moves (when possible) aggressively, and gives up as few moves as possible (de-coalescing) so that the graph becomes colorable.
:::

- Merging live ranges of variables has the undesirable effect of increasing the degree of vertices in the interference graph, and thus it might cause spilling.
- To address this problem, Briggs et al. introduced the concept of conservative coalescing: **an extra criterion to decide when two live ranges can be merged** (a sketch of such a test follows below).
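As an illustration only, here is a minimal Python sketch of a conservativeness test in the style of Briggs et al.; the exact criterion is stated textually in the next bullets, the function and parameter names are hypothetical, and `graph` maps each node to its set of neighbors as in the earlier sketch.

```python
def briggs_safe_to_coalesce(graph, a, b, k):
    """Briggs-style conservative test for two copy-related, non-interfering
    nodes a and b: merging them is considered safe if the merged node would
    have fewer than k neighbors of significant degree, where a node has
    significant degree if it has k or more neighbors."""
    merged_neighbors = (graph[a] | graph[b]) - {a, b}
    significant = [n for n in merged_neighbors if len(graph[n]) >= k]
    return len(significant) < k
```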
- Thus, in addition to the non-interference requirement, two variables can only be coalesced if their merging will not cause further spilling in the interference graph.
- Another improvement brought up by Briggs et al. was biased coloring: **the select phase tries to assign the same color to variables that are copy related**.
- Briggs et al. point out that **the combination of conservative coalescing and biased coloring could remove most of the copy instructions in the original program *before register allocation***.
- Finally, Briggs et al. introduced the concept of **optimistic coloring: instead of immediately spilling variables that could not be simplified using Kempe's technique, Briggs et al. defer this decision to the select phase, when colors are actually assigned**.
- Many times two or more of the neighbors of a vertex v will be assigned the same color, and v will be colorable even if it has more neighbors than there are available colors.
- The modified version of Chaitin's algorithm described by Briggs et al. is still conservative: it misses some copy instructions that could be removed.
- The coalescing criterion used by Briggs et al. is described as follows: **nodes a and b can be coalesced if the node that results from their merging has fewer than K neighbors of significant degree, where a node has significant degree if it has K or more neighbors.**
- Subsequently, George and Appel showed that this criterion could be relaxed to **allow more aggressive coalescing without introducing extra spilling**.
- They proposed ==Iterated Register Coalescing==, a graph coloring based register allocator that remains today, almost 15 years after its first release, the base algorithm taught in many compiler courses and **used as a baseline in many research projects**.
- An important addition of Iterated Register Coalescing over the previous allocators was a ==Freeze phase==.
- If neither simplify nor coalesce can remove any node from the interference graph, **the freeze step marks a copy related node, so that it is no longer considered for coalescing.**

# 2.1.1 A Taxonomy of Coalescing Approaches
- Bouchez et al. summarize some of the best known approaches for performing register coalescing:
- *Aggressive Coalescing* : merge move-related vertices, regardless of the colorability of the interference graph after the merging.
- *Conservative Coalescing* : merge moves if, and only if, it does not compromise the colorability of the interference graph.
- *Optimistic Coalescing* : coalesce moves aggressively, and if this compromises the colorability of the graph, then give up as few moves as possible.
- *Incremental Conservative Coalescing* : remove one particular move instruction, while keeping the colorability of the graph.
- Bouchez et al. have shown, by means of an ingenious sequence of reductions, that all these different realizations of the register coalescing problem are NP-complete for general interference graphs.

# 2.2 Linear Scan Register Allocation
- Graph coloring is the most popular approach for register allocation, but it is not the only one, and it has been losing ground to a different, **simpler approach called ==Linear Scan==**. Variations of this technique are adopted in many modern compilers, **including LLVM and the Java HotSpot client compiler**. In this section we describe Linear Scan and some of its more important variations.
- Linear scan is a greedy algorithm introduced by Poletto and Sarkar in the late nineties.
  It simplifies register allocation by reducing it to the problem of assigning colors to an ordered sequence of intervals.
- Linear scan is not optimal: intervals can be colored optimally in polynomial time, and linear scan uses this interval coloring as an **approximation** of the general register allocation problem.
- The algorithm starts by linearizing the basic blocks of the source program, that is, arranging these blocks in a linear sequence; the exact ordering chosen is not important and will not compromise the correctness of the results.
- Given this linearization, linear scan replaces the live range of each variable with a contiguous interval, called the ==variable lifeline==, and then proceeds to color these intervals.
- Linear scan is a fast algorithm that does not rely on graph coloring.
- It is simple to implement and is linear: register allocation is performed as we compute the live ranges of the variables.
- This has contributed to its popularity, and it is the allocator of choice when either the manpower to implement a more complicated allocator is lacking, or compilation speed is required, such as in JIT environments.
- ==As a rough general measurement, the code produced by linear scan runs about 10% slower than the code produced by graph coloring, but linear scan is much faster at compile time==.
- An important extension is due to Traub et al. Traub's algorithm introduces the bin-packing model: the machine registers are viewed as bins into which variable lifetimes can be packed. Thus, **linear scan can assign two non-overlapping lifetimes to the same bin, or it can assign two lifetimes to the same bin if the live range that constitutes one of the lifetimes is completely contained in holes of the live range that constitutes the other lifetime**. This model was later used by Mössenböck and Pfeiffer in an algorithm **that performs register allocation on programs in Static Single Assignment form.**

:::spoiler Bin Packing Problem
- Given n items of different weights and bins each of capacity c, assign each item to a bin such that the total number of bins used is minimized. It may be assumed that all items have weights smaller than the bin capacity.
- This problem is NP-hard, and finding an exact minimum number of bins takes exponential time.
:::

- Wimmer and Mössenböck have introduced several optimizations to linear scan. Their algorithm handles holes in live ranges, but their most innovative extensions are *optimal split positions*, *register hints* and *spill store elimination*.
- *Optimal split position* is a technique to **minimize the effects of spill code in the final program produced after register allocation**; for instance, it allows spill loads to be moved outside loops.
- *Register hint* is a simplified coalescing approach for linear scan.
- It is similar to biased coloring in the sense that, when choosing a color for an interval i1, if i1 is connected to another interval i2 via a move instruction, then the allocator attempts to assign to i1 the color previously assigned to i2.
- *Spill store elimination* is an optimization used to remove from the target program some store instructions that can be proved statically to be useless.
- **A common example is multiple stores of a variable that is only defined once**.
- In this case, all the store instructions can be replaced with a single store after the definition point of the variable.
- The most recent addition to the family of linear scan algorithms is Sarkar and Barik's new method, called **Extended Linear Scan**.
  By inserting copy and swap instructions along the source program, this version of linear scan **guarantees to use the minimum number of registers necessary to compile the source program**. Sarkar's algorithm diverges from previous implementations because **it prefers inserting extra move instructions over inserting spill code**.

# 2.3 Register allocation via Integer Linear Programming
- The linear programming framework fits a vast number of different optimization problems, and a subclass of this model, 0-1 integer programming, has been used to solve register allocation. **The basic idea of this approach is to model the interactions between registers and variables as constraints in a system of integer linear equations**.
- For instance, given a register AX we define the following variables:
- AXr(v, p) is one if variable v reaches and leaves program point p allocated to register AX, and is zero otherwise.
- AXl(v, p) is one if variable v reaches program point p in memory, but leaves it allocated to register AX. It is zero otherwise. The letter l indicates a load.
- AXs(v, p) is one if variable v reaches program point p allocated to register AX, but leaves it in memory. It is zero otherwise. The letter s indicates a store.
- ==The constraints must guarantee that only one variable will be allocated into register AX at any time==; thus, for every program point p we add to the linear system the constraint:
```
∀p :  Σv ( AXr(v, p) + AXs(v, p) + AXl(v, p) ) ≤ 1
```
- Goodwin and Wilken gave the first formulation of register allocation as a 0-1 integer programming problem. Their constraint set could handle several facets of register allocation, such as *coalescing*, *spilling*, *rematerialization* and *aliasing*. **Goodwin's register allocator produces code of very good quality; however, as integer linear programming is NP-complete, it presents a worst-case exponential running time, and can take hours to find an optimal solution.**

# 2.4 Register allocation via Partitioned Quadratic Programming

# 2.5 Register allocation via Multi-Flow of Commodities

# 3. SSA based register allocation
- An important breakthrough in register allocation happened in 2005, when three different research groups proved independently that the interference graphs of programs in ==Static Single Assignment (SSA)== form are chordal. This result is important because **chordal graphs can be colored in polynomial time**.
- **SSA form**: the Static Single Assignment (SSA) form is an intermediate representation in which each variable is defined at most once in the program code.
- Many industrial compilers use the SSA form as an intermediate representation: GCC 4.0, Sun's HotSpot JVM, IBM's Jikes RVM and LLVM. However, these compilers perform register allocation on post-SSA programs, that is, programs in which φ-functions have been replaced with copy instructions.
- In SSA-based register allocators we observe an inversion of phases: φ-functions are replaced after registers have been assigned to variables.
- We will use colored-SSA-form to denote the variation of SSA-form in which each variable is associated with a physical location, which can be a register or a memory address.
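To illustrate why chordality matters, here is a minimal, hypothetical Python sketch (not the algorithm of any specific paper): a greedy pass that colors the vertices of an interference graph in a given order. If the graph is chordal and the order is the reverse of a perfect elimination order — which, for SSA-form programs, can be obtained by visiting variables in the order their definitions appear in a dominance-based traversal — the greedy pass uses the minimum number of colors. The example graph and all names are made up for illustration.

```python
def greedy_color(graph, order):
    """Assign each vertex the lowest color not used by its already-colored
    neighbors, visiting vertices in the given order."""
    color = {}
    for v in order:
        taken = {color[n] for n in graph[v] if n in color}
        c = 0
        while c in taken:
            c += 1
        color[v] = c  # a real allocator would spill once c reaches K registers
    return color

# Hypothetical chordal interference graph: a triangle a-b-c plus a pendant d.
graph = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}
# ["a", "b", "c", "d"] is the reverse of a perfect elimination order here.
print(greedy_color(graph, ["a", "b", "c", "d"]))
# {'a': 0, 'b': 1, 'c': 2, 'd': 0} -- three colors, which is optimal.
```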