AD Scheduling
===
###### tags: `Paper-Proposals` `Enzyme`

- Optimize for parallelism
  - Spatial locality
  - Temporal locality
- Optimize for the underlying hardware
  - Maximize outer-loop parallelism

Advantages of JAX:
- JAX HLO layer
- Schedules away the suboptimal code

> Take cues from the HLO scheduler

Random notes:
- Parallel code under reverse mode: reads turn into writes, writes turn into reads, etc. (see the first sketch at the end of these notes)
- Good schedulers, much like **polyhedral** schedulers
  - Polly
  - Polygeist -> polyhedral optimization
  - Extend Polly to work with RL optimization
- > Collect paper references to help Martin and be more focused!

1. Some scheduler API needs to be plugged in
2. Integration of a scheduler into Enzyme
3. Want to pass high-level information into the scheduler, to avoid having to recover it from the IR later
   - Integrate with Polly?
     1. In the first round, do the opposite of what you would naively want to do
   - Integrate with an existing RL scheduler for a related problem
4. Go down the MLIR route
   - Polygeist
   - Extend Enzyme itself to work on the MLIR level
   - Integrate with any of the other systems, e.g. the ETH paper, Google papers, FB papers

Martin:
- Seeks to work with LLVM IR
- Polly-level optimizations
- Polly -> Enzyme -> Polly
  - Modify the first Polly call to act counter-intuitively, in a way suited for **reverse-mode AD**
- ProGraML representation of the LLVM IR as graphs for the RL to work on top of (sketch at the end)
- Set up PolyBench to start off with
  1. CPU versions (serial)
  2. (GPU versions)
  3. (OpenMP)
- Rodinia is another benchmark suite
- Pull in the TensorFlow dataset from the CompilerGym benchmarks (sketch at the end)

> LLVM does not have a good representation of `parallel for`; only do auto-parallelization in stage 2 for now

- Use the manual optimization pass infrastructure
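Sketches:

To make the "reads turn into writes" note concrete, a minimal numpy sketch (the names `gather` and `gather_adjoint` are illustrative, not from Enzyme). The forward gather only reads `x`, so the loop is embarrassingly parallel; its reverse-mode adjoint scatter-adds into `dx` at the same indices, so formerly independent iterations now conflict and a scheduler has to cope (atomics, privatization, or reordering).

```python
import numpy as np

def gather(x, idx):
    # Forward: y[i] = x[idx[i]] -- every iteration only READS x,
    # so the loop parallelizes trivially.
    return x[idx]

def gather_adjoint(dy, idx, n):
    # Reverse: the read of x[idx[i]] turns into a WRITE (accumulation)
    # into dx[idx[i]]. With duplicate indices, iterations now conflict.
    dx = np.zeros(n)
    np.add.at(dx, idx, dy)  # unbuffered scatter-add, correct under duplicates
    return dx

x = np.array([1.0, 2.0, 3.0])
idx = np.array([0, 2, 2, 1])  # duplicate index 2 -> adjoint must accumulate
y = gather(x, idx)
dx = gather_adjoint(np.ones_like(y), idx, x.size)
print(dx)  # [1. 1. 2.]
```

This asymmetry is what the "modify the first Polly call to act counter-intuitively" item points at: a schedule tuned for the forward read pattern can be exactly wrong for the reverse-mode scatter.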
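For the ProGraML item, a sketch assuming the `programl` Python package and its `from_llvm_ir` / `to_networkx` helpers (the IR module here is a toy stand-in for whatever the Polly -> Enzyme -> Polly pipeline emits):

```python
import programl

# Toy LLVM IR module; in the real pipeline this would be the IR taken
# from around the Polly -> Enzyme -> Polly passes.
ir = """
define double @square(double %x) {
entry:
  %mul = fmul double %x, %x
  ret double %mul
}
"""

graph = programl.from_llvm_ir(ir)       # ProGraML program graph (protobuf)
nx_graph = programl.to_networkx(graph)  # networkx view, handy for inspection
print(nx_graph.number_of_nodes(), nx_graph.number_of_edges())
```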
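For pulling the TensorFlow dataset out of CompilerGym, a sketch assuming the `llvm-v0` environment and the `tensorflow-v0` dataset name (dataset and version strings may differ across CompilerGym releases):

```python
import compiler_gym

with compiler_gym.make("llvm-v0") as env:
    # The LLVM environment ships several benchmark datasets; one of them
    # is built from TensorFlow sources.
    dataset = env.datasets["benchmark://tensorflow-v0"]
    for benchmark in dataset.benchmarks():
        env.reset(benchmark=benchmark)
        print(benchmark.uri)
        break  # just touch the first benchmark
```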