# Bottom Up vs Top Down Inlining
In all diagrams, square functions are non-trivial and larger than the inlining threshold while circle functions are trivial and smaller than the inlining threshold. Except where noted otherwise, the assumption is that our inlining heuristic is to first calculate a numeric "size" for the callee on the basis of its current body, and to inline if and only if that size is less than some fixed constant.
Bottom up inlining is the inlining strategy that starts at the bottom of the call graph and works its way up. Specifically, when it sees a callsite to consider inlining, it considers the optimized body of the callee for inlining. Furthermore, if it creates a new callsite in the current function as a result of inlining, that callsite is not considered for inlining.
Top down inlining is the inlining strategy that begins at the top of the call graph and works its way down. When it sees a callsite to consider inlining, it considers the unoptimized body of the callee for inlining. If it introduces new callsites as a result of inlining, those are then also considered for further inlining.
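The two strategies above can be sketched as a toy simulation. Everything here is invented for illustration (function names, the size numbers, the `THRESHOLD` constant, and the dictionary representation of bodies); a real inliner works on IR, not on a dict. Still, it captures the two defining differences: which body is measured, and whether newly created callsites are reconsidered.

```python
# Toy model of bottom-up vs top-down inlining. All names, sizes, and the
# threshold are made up; a body is (own_size, list_of_callees).

THRESHOLD = 10

BODIES = {
    "main":    (5, ["wrapper"]),
    "wrapper": (1, ["big"]),      # trivial wrapper (a "circle" function)
    "big":     (100, []),         # non-trivial (a "square" function)
}

def bottom_up(bodies):
    """Process callees before callers; inline the *already optimized* body.
    Callsites brought in by inlining are not reconsidered."""
    done = {}
    def optimize(f):
        if f in done:
            return done[f]
        size, calls = bodies[f]
        new_calls = []
        for c in calls:
            c_size, c_calls = optimize(c)
            if c_size < THRESHOLD:
                size += c_size
                new_calls.extend(c_calls)   # kept, but not reconsidered
            else:
                new_calls.append(c)
        done[f] = (size, new_calls)
        return done[f]
    return {f: optimize(f) for f in bodies}

def top_down(bodies, root="main"):
    """Start at the root; inline the *unoptimized* body and reconsider
    any callsites that inlining introduces."""
    size, worklist = bodies[root][0], list(bodies[root][1])
    residual = []
    while worklist:
        c = worklist.pop()
        c_size, c_calls = bodies[c]
        if c_size < THRESHOLD:
            size += c_size
            worklist.extend(c_calls)        # reconsidered for inlining
        else:
            residual.append(c)
    return size, residual
```

On this particular graph both strategies converge on the same answer: the trivial wrapper disappears and `main` ends up calling `big` directly.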
I argue that we should do bottom-up inlining (apparently top-down inlining is also possible if you have a lot of engineering hours, but that doesn't seem to be our situation).
### Equally good on trivial wrapper functions
<img src="https://i.imgur.com/NGS6kcw.png" width=400 />
Both strategies will correctly remove the trivial wrapper in the above call graph. Bottom up inlining will do it by removing the lower edge, while top down inlining will do it by removing the upper edge.
One might be tempted to argue that there are other call graphs on which top-down inlining is superior. I originally thought that bottom-up got the following call graph wrong:
<img src="https://i.imgur.com/MfZHxQ0.png" width=400 />
Specifically, I thought there was the risk that top-down creates this call graph:
<img src="https://i.imgur.com/SfuALmh.png" width=400 />
While bottom up would create this:
<img src="https://i.imgur.com/cznKTRe.png" width=400 />
But I don't actually think that's the case. Specifically, if the bottom-up inliner were willing to inline the non-trivial function into its three callsites, then the bottom function is actually trivial, so both inliners would continue inlining and neither of the states above would be the outcome.
I also can't think of any inlining heuristic for a bottom-up inliner that would actually cause bad behavior on this kind of graph, nor do I know of any other call graphs that run into problems here (let me know if you do).
### Naive top down is exponential
<img src="https://i.imgur.com/AcC3dS7.png" width=400 />
Consider the above call graph (but imagine many more layers). The runtime of this kind of program is exponential in the size of the program. A naive top-down inliner will exhibit exponential compile times here: every callsite it ever sees is a call to a trivial function, so it will inline all of them, until it has created a single exponentially sized function.
Bottom up inliners have no such problem. As a matter of fact, they have a very nice property that fundamentally prevents such issues: The number of inlining decisions they make is exactly equal to the number of call sites in the original program. This means that there is no scenario in which the size of the binary increases by more than `num_call_sites * inlining_threshold`.
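Both claims can be checked on a toy version of the layered graph. The model is invented for illustration (each function has own size 1 and calls the next layer's function twice; `LAYERS` and `THRESHOLD` are arbitrary), but the asymptotics fall out directly:

```python
# Layered call graph: LAYERS functions, each of own size 1, each calling
# the next layer's function twice. LAYERS and THRESHOLD are invented.

LAYERS, THRESHOLD = 20, 10
NUM_CALL_SITES = 2 * (LAYERS - 1)

def naive_top_down_size():
    """Every callee body, viewed in isolation, is trivial (size 1), so a
    naive top-down inliner inlines every callsite it ever sees, flattening
    the whole graph into one full binary tree of bodies."""
    size = 0
    worklist = [0]                               # start at the root layer
    while worklist:
        layer = worklist.pop()
        size += 1
        if layer < LAYERS - 1:
            worklist += [layer + 1, layer + 1]   # both callsites inlined
    return size

def bottom_up_sizes():
    """Optimize callees first; a caller only inlines callees whose
    *optimized* size is below THRESHOLD, so growth stops naturally."""
    sizes = [0] * LAYERS
    sizes[LAYERS - 1] = 1
    for layer in range(LAYERS - 2, -1, -1):
        callee = sizes[layer + 1]
        sizes[layer] = 1 + 2 * callee if callee < THRESHOLD else 1
    return sizes

top_down_total = naive_top_down_size()   # exponential: 2**LAYERS - 1
bottom_up_total = sum(bottom_up_sizes())
original_total = LAYERS                  # every body starts at size 1
```

With these numbers, the naive top-down result is over a million units of code from a 20-function program, while bottom-up growth stays under the `num_call_sites * inlining_threshold` bound.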
The exponential growth can be fixed by properly limiting the amount of inlining. However, even in that case, top-down strategies do worse on these graphs than bottom-up ones. This is because bottom-up inlining ensures that we do as much inlining as possible at the bottom of the call graph, which contains the hottest code. The top-down inliner, by contrast, will inline the edges at the top, creating this call graph:
<img src="https://i.imgur.com/6iETPD2.png" width=400 />
This graph is much slower to execute because it has twice as many function calls as the call graph created by the bottom-up inliner:
<img src="https://i.imgur.com/Kb3vhHa.png" width=400 />
### Top down sees unoptimized bodies
When a top down inliner considers another function body for inlining, that body has not yet been optimized. One can of course first run *some* optimizations on it, but any optimizations that are enabled by inlining cannot have taken place, since inlining has by definition not yet happened. This means that top down inliners will generally do a worse job of estimating the size of the functions they are inlining: they don't know whether a function will become trivial after being optimized, or whether it is a good candidate for optimizations that increase its size.
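A toy illustration of the estimation gap, with an invented "IR" (a list of op strings) and an invented folding rule standing in for the optimizations that run after a callee's own callees are inlined:

```python
# Toy model: the same callee measured before and after optimization.
# The "IR" and the dead-code rule are made up for illustration.

THRESHOLD = 5

def optimize(body):
    """Stand-in for post-inlining optimization, e.g. a slow path that
    constant-folds away once the callee's own callees were inlined."""
    return [op for op in body if not op.startswith("dead:")]

# A helper that looks big, but whose slow path folds away.
helper = ["load", "cmp", "dead:call_slow_1", "dead:call_slow_2",
          "dead:call_slow_3", "dead:call_slow_4", "ret"]

top_down_estimate = len(helper)             # unoptimized body: size 7
bottom_up_estimate = len(optimize(helper))  # optimized body: size 3
```

With these numbers a top-down inliner declines a callsite that a bottom-up inliner correctly inlines, because only the latter sees that the helper is trivial after optimization.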
## Disadvantages of bottom up
The first disadvantage of bottom up inlining is that `#[inline(always)]` on trivial wrappers doesn't work. This seems not that huge a price to pay for generally better optimizations. I don't see a fundamental reason that we couldn't have `#[inline(top_down)]` (or `#[inline(first)]`) on trivial wrapper functions even with a bottom up inliner (but would want to think a bit more about what exactly the consequences of this interaction are).
The second disadvantage is MIR inlining specific: Bottom up inlining might fail because the callee is a trait function whose body cannot be known until a type parameter is known (so `<Parameter as Trait>::function`). Doing top-down inlining first would be advantageous in this case, because the outer function might be able to substitute in the type parameter, thereby making the callee known. Allowing some limited top-down inlining in these scenarios might be a completely sufficient alternative though.
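The trait-callsite situation can be sketched as a tiny resolution model. The notation mimics Rust's `<P as Trait>::method`, but the lookup table, the string-based substitution, and the body contents are all invented; real trait resolution in the compiler is nothing this simple:

```python
# Toy model of the trait-callsite problem. Notation mimics Rust's
# `<P as Trait>::method`, but everything here is an invented simulation.

BODIES = {
    # The concrete impl is known...
    "<u32 as Trait>::method": ["ret_0"],
    # ...but the generic function calls through a type parameter P:
    "generic_fn<P>": ["call <P as Trait>::method"],
    "caller": ["call generic_fn<u32>"],
}

def resolve(target):
    """A bottom-up inliner asks for the callee's body directly."""
    return BODIES.get(target)

# Bottom-up: inside generic_fn, P is still a type parameter, so the
# callee body is unknown and cannot be inlined.
generic_result = resolve("<P as Trait>::method")

# Inlining caller's callsite first substitutes P := u32, after which
# the trait callsite resolves to a concrete body.
substituted = "call <P as Trait>::method".replace("P", "u32")
resolved_result = resolve(substituted.removeprefix("call "))
```

The point of the sketch is just the asymmetry: the same callsite is unresolvable in isolation but resolvable once the caller's type arguments have been substituted in.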