# The New Pass Manager for llvmlite

Presented by Graham Markall, NVIDIA: [gmarkall@nvidia.com](mailto:gmarkall@nvidia.com)

## Introduction

* There are two pass managers in LLVM:
    * The Legacy Pass Manager, used by llvmlite
    * The New Pass Manager, which llvmlite needs to move to
* The Legacy Pass Manager is removed in LLVM 17 (Numba / llvmlite presently hovering around LLVM 14 / 15).
* [PR #1046: Add basic infra required to move Numba to NewPassManager](https://github.com/numba/llvmlite/pull/1046) (today's topic) adds support for the New Pass Manager to llvmlite

## Scope for today

* Discussion of the llvmlite-specific design considerations
    * Where necessary, talk about the differences between the pass managers and implementation details
* Quick examples
* Details of Numba testing
* Summary of review and proposed next steps

## Acknowledgments

* Yashwant Singh at NVIDIA:
    * Author of [PR #1046: Add basic infra required to move Numba to NewPassManager](https://github.com/numba/llvmlite/pull/1046)
* Modi Mo at Meta:
    * Author of [PR #1042: Update llvmlite to be compatible with llvm-17](https://github.com/numba/llvmlite/pull/1042)

## Apologies

* There are quite a few issues / design decisions to talk about
* They're all a bit fiddly
* and none of them are really interesting
* but they are all important for "quality of life" for the llvmlite user

## Approaches

Two possible approaches:

* Reimplement existing llvmlite pass manager APIs using the New Pass Manager
* Create New Pass Manager APIs in llvmlite
    * ... and possibly allow old and new to coexist

We take the "old and new coexisting" approach:

* Differences between new and old pass manager are possible
    * New Pass Manager is target aware (noted by Yashwant Singh)
    * Inlining threshold available in Legacy Pass Manager, not available in New Pass Manager until LLVM 16 (noted by Da Li and Yashwant)
* Potential for performance / vectorization regressions
    * cf.
discussion like [Compilation pipeline, compile time, and vectorization](https://numba.discourse.group/t/compilation-pipeline-compile-time-and-vectorization/1716)

Numba should support both old and new pass managers:

* Default to new as soon as possible (e.g. for 0.61)
* Allow switch back to old pass manager with config variable

## Implementation

* The implementation in PR #1046 implements exactly the subset of functionality needed by Numba:
    * Module and Function Pass Managers
    * Pipeline Tuning Options and Pass Builders needed to construct the pass managers
    * Exactly the passes used by Numba (other passes omitted)
* Not implemented for initial iteration:
    * Addition of all the other passes
    * CGSCC pass manager
    * Lots of other APIs (we also omit many legacy pass manager APIs anyway)

## Handling naming

Considerations:

* Can't rename existing classes and functions
    * Would break backwards compatibility
* Don't want to encode the word "New" in any New Pass Manager APIs
    * That doesn't look great, and eventually it will be the only pass manager
    * And potentially "old" as well
* Can't have naming conflicts between legacy and new pass managers
* Want to try and mirror LLVM names as much as possible
    * Makes it easier to use LLVM API docs
    * Generally less confusing / lower mental load for llvmlite users

## Naming choices

The least worst solution seems to be:

* Legacy pass manager classes and functions:
    * `ModulePassManager` and `FunctionPassManager`
    * `create_module_pass_manager`, `create_function_pass_manager`
* New Pass Manager classes and functions:
    * `PipelineTuningOptions`, `PassBuilder`, `ModulePassManager`, `FunctionPassManager`
    * `create_pipeline_tuning_options`, `create_pass_builder`, `create_new_module_pass_manager`, `create_new_function_pass_manager`

### Issues / conflicts in naming

* Legacy pass manager APIs are in the existing `llvmlite.binding.passmanagers` module
* New pass manager APIs are in the new `llvmlite.binding.newpassmanagers` module
* However, everything
from both of these is imported into `llvmlite.binding`:
    * This is why we have `create_new_module_pass_manager` vs. `create_module_pass_manager`
    * Only the legacy `ModulePassManager` and `FunctionPassManager` get imported into `llvmlite.binding`
        * That is backwards-compatible
    * Documentation advises using `create_new_*_pass_manager` if you want to create a new pass manager
    * Note that pass managers can also be constructed by the `PassBuilder` with [`getModulePassManager()`](https://llvmlite--1046.org.readthedocs.build/en/1046/user-guide/binding/optimization-passes.html#llvmlite.binding.ModulePassManager) and [`getFunctionPassManager()`](https://llvmlite--1046.org.readthedocs.build/en/1046/user-guide/binding/optimization-passes.html#llvmlite.binding.FunctionPassManager)
* When the Legacy Pass Manager is removed, the new pass manager classes could be imported into `llvmlite.binding`

### Naming on the C++ side

* On the C++ side, there are no name conflicts between new and legacy pass managers
    * C++ namespacing, different API names, etc.
* New pass manager bindings are in `newpassmanagers.cpp`
    * When the legacy ones are deleted, we might move them to `passmanagers.cpp`
* Names here are not exposed in the public API, so it's not a big issue

## Implementation: running the pass manager

```C++
API_EXPORT(void)
LLVMPY_RunNewModulePassManager(LLVMModulePassManagerRef MPMRef,
                               LLVMPassBuilderRef PBRef, LLVMModuleRef mod) {
    ModulePassManager *MPM = llvm::unwrap(MPMRef);
    PassBuilder *PB = llvm::unwrap(PBRef);
    Module *M = llvm::unwrap(mod);

    // Fresh analysis managers are constructed on every run
    LoopAnalysisManager LAM;
    FunctionAnalysisManager FAM;
    CGSCCAnalysisManager CGAM;
    ModuleAnalysisManager MAM;

    // Register the analyses, then cross-register the managers with each other
    PB->registerLoopAnalyses(LAM);
    PB->registerFunctionAnalyses(FAM);
    PB->registerCGSCCAnalyses(CGAM);
    PB->registerModuleAnalyses(MAM);
    PB->crossRegisterProxies(LAM, FAM, CGAM, MAM);

    MPM->run(*M, MAM);
}
```

What's going on here:

* Create analysis managers and cross-register them with each other
* Run the pass manager with the Module Analysis Manager
* After the function exits, the analysis managers are out of scope and deleted
    * Safe: No reference to the analysis managers is held by the pass manager

Thoughts:

* Each time we run the pass manager, we construct new analysis managers
    * Is this likely to be a performance issue?
    * My guess: Probably not in the context of Numba
* I'm inclined to keep this simple implementation for initial work
    * Can re-visit if it's a performance issue later
* **Contrasting approach**: Modi's implementation in PR #1042 caches the analysis managers on the pass managers.

## Adapting function passes to run on modules

* With the New Pass Manager, passes can be specific to modules, functions, loops, or CGSCCs.
    * cf. the legacy pass manager, where passes could run on anything
* Need an adapter to provide equivalent functionality for some of our legacy pass manager APIs, e.g.
(from [`newpassmanagers.cpp`](https://github.com/numba/llvmlite/pull/1046/files#diff-45eef2a2ab57512c9c03ab44fbebc4228722e001d790af044bd2ca511df08414)):

```C++
API_EXPORT(void)
LLVMPY_AddJumpThreadingPass_module(LLVMModulePassManagerRef MPM, int T) {
    llvm::unwrap(MPM)->addPass(
        createModuleToFunctionPassAdaptor(JumpThreadingPass(T)));
}
```

## Example usage

### Creating a pipeline with explicitly-specified passes

From [`examples/npm_passes.py`](https://github.com/numba/llvmlite/pull/1046/files#diff-58e513c36aa180090e33e5eb1d9793277a8b79c23ff702b10eb2d38ec6d9cb78) (excerpt; `llvm` and `module` are set up earlier in the file):

```python
# Set up the module pass manager used to run our optimization pipeline.
# We create it unpopulated, and then add the loop unroll and simplify CFG
# passes.
pm = llvm.create_new_module_pass_manager()
pm.add_loop_unroll_pass()
pm.add_simplify_cfg_pass()

# To run the pass manager, we need a pass builder object - we create pipeline
# tuning options with no optimization, then use that to create a pass builder.
target_machine = llvm.Target.from_default_triple().create_target_machine()
pto = llvm.create_pipeline_tuning_options(speed_level=0)
pb = llvm.create_pass_builder(target_machine, pto)

# Now we can run the pass manager on our module
pm.run(module, pb)
```

### Creating a default pipeline

From [`examples/npm_pipeline.py`](https://github.com/numba/llvmlite/pull/1046/files#diff-1b5e4027661f8219a245391a5ba113af5c15c5b64f89892e0440c324f3bf925b) (excerpt; `llvm` and `module` are set up earlier in the file):

```python
# Create a ModulePassManager for speed optimization level 3
target_machine = llvm.Target.from_default_triple().create_target_machine()
pto = llvm.create_pipeline_tuning_options(speed_level=3)
pb = llvm.create_pass_builder(target_machine, pto)
pm = pb.getModulePassManager()

# Run the optimization pipeline on the module
pm.run(module, pb)
```

## Testing with Numba

* Branch that uses the new module pass manager: [gmarkall's `npm` Numba branch](https://github.com/gmarkall/numba/tree/npm)
    * Idea was to prove the concept of Numba changes
* Test
results:

```
Ran 11970 tests in 1489.362s
FAILED (failures=19, errors=3, skipped=607, expected failures=33)
```

Or sometimes:

```
FAILED (failures=8, errors=3, skipped=607, expected failures=33)
```

* Test failures / errors are due to:
    * Quick hack messing up caching (occasionally)
    * Refop pruning not ported to the new pass manager
    * Changes to debuginfo transformation not anticipated by the test suite
        * Regexes are failing to match, likely need updating
* General conclusion:
    * This is a good base on which to start moving Numba to the New Pass Manager.

## Notes on my review

* I've been reviewing and guiding this PR so far
* I'm comfortable with the code changes and have been over them for multiple iterations:
    * C++ FFI binding, Python binding
* Test cases seem aligned with how much we test the legacy pass manager
* Numba testing did not expose any unexpected issues
* The PR is low-risk in that it only adds new, as-yet-unused APIs
* I wrote a substantial part of the documentation and commentary of the examples

Therefore:

* As far as I'm concerned this is good to merge, however:
    * Someone else should review the docs and examples
    * Some sanity checking of the design decisions around naming would be good
        * (make sure there are no footguns here)

## Next steps

* Merge PR #1046 as soon as possible:
    * It only adds new APIs so is low risk
    * Any real issues will be much easier to tease out once we're using Numba with it
* Implement proper support for the New Pass Manager in Numba and create a PR shortly after
* Port refop pruning to the New Pass Manager
    * An implementation already exists in Modi's LLVM 17 PR
* Add all passes to the New Pass Manager binding
    * Can we find a procedural way to do that?
* Support for out-of-tree passes?
* Expose [`buildModuleSimplificationPipeline`](https://llvm.org/doxygen/classllvm_1_1PassBuilder.html#ad6f258d31ffa2d2e4dfaf990ba596d0d) and extension points (suggestion from Adrian Seyboldt, tracked in [Issue #1055](https://github.com/numba/llvmlite/issues/1055))
* Add support for running with remarks, pass timing, etc.

## References

* LLVM documentation: [Using the New Pass Manager](https://llvm.org/docs/NewPassManager.html)
    * Provides a quick tutorial overview of how to use it
    * Seems aimed towards those moving code from the legacy pass manager
    * Therefore, missing a lot of explanation / details
* LLVM blog post: [The New Pass Manager](https://blog.llvm.org/posts/2021-03-26-the-new-pass-manager/)
    * A nice overview of the motivation for, and design goals of, the new pass manager.
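
## Appendix: sketch of a pass manager switch

The "Approaches" section proposes that Numba default to the new pass manager and allow a switch back to the old one with a config variable. A minimal sketch of what such a switch could look like follows. It is an illustration only, not Numba's implementation: the environment variable name `EXAMPLE_USE_LEGACY_PASS_MANAGER` and the helper functions are hypothetical, and a stand-in object replaces `llvmlite.binding` so the sketch runs without llvmlite. Only the two factory names, `create_module_pass_manager` and `create_new_module_pass_manager`, come from the naming section.

```python
import os
from types import SimpleNamespace

# Hypothetical variable name, used only for this illustration.
ENV_VAR = "EXAMPLE_USE_LEGACY_PASS_MANAGER"


def use_legacy_pass_manager(environ=os.environ):
    """Return True only if the user explicitly opted back into the legacy PM."""
    return environ.get(ENV_VAR, "0") == "1"


def create_module_pm(binding, environ=os.environ):
    """Create a module pass manager from `binding`, defaulting to the new one.

    `binding` must provide `create_module_pass_manager` (legacy) and
    `create_new_module_pass_manager` (new).
    """
    if use_legacy_pass_manager(environ):
        return binding.create_module_pass_manager()
    return binding.create_new_module_pass_manager()


# Stand-in for `llvmlite.binding` (an assumption for this sketch): each
# factory just returns a label so we can see which one was selected.
fake_binding = SimpleNamespace(
    create_module_pass_manager=lambda: "legacy",
    create_new_module_pass_manager=lambda: "new",
)

default_choice = create_module_pm(fake_binding, environ={})
legacy_choice = create_module_pm(fake_binding, environ={ENV_VAR: "1"})
print(default_choice, legacy_choice)
```

In real code the two branches would also differ in how the pass manager is run, since the new pass manager takes a pass builder (`pm.run(module, pb)`), so a switch like this would likely sit around the whole pipeline setup rather than a single factory call.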