AD Scheduling
===
###### tags: `Paper-Proposals` `Enzyme`
- Optimize for parallelism
- Spatial locality
- Temporal locality
- Optimize for the underlying hardware
- Maximize outer-loop parallelism
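A small sketch of what these objectives trade off, using matmul loop orders (pure Python for clarity; sizes are arbitrary):

```python
import numpy as np

N = 64
A, B = np.random.rand(N, N), np.random.rand(N, N)
C = np.zeros((N, N))

# ijk order: the innermost loop walks B down a column (stride N),
# so spatial locality is poor.
for i in range(N):
    for j in range(N):
        for k in range(N):
            C[i, j] += A[i, k] * B[k, j]

# ikj order: the innermost loop walks B and C along rows (stride 1,
# spatial locality), and row B[k, :] is reused for every i (temporal
# locality). The outer i loop carries no dependence, so it is the one
# to parallelize.
C2 = np.zeros((N, N))
for i in range(N):
    for k in range(N):
        for j in range(N):
            C2[i, j] += A[i, k] * B[k, j]

assert np.allclose(C, C2)
```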
Advantages of JAX:
- JAX HLO layer
- Schedules away suboptimal code
> Take cues from the HLO scheduler
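To see that layer directly (function and shapes are placeholders; assumes a recent JAX):

```python
import jax
import jax.numpy as jnp

def f(x):
    # Elementwise chain that XLA will fuse into one kernel.
    return jnp.sum(jnp.sin(x) * x)

x = jnp.ones((1024,))
lowered = jax.jit(f).lower(x)
print(lowered.as_text())            # HLO/StableHLO before XLA's optimizations
print(lowered.compile().as_text())  # the module after XLA has scheduled/fused it
```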
Random notes:
- Parallel code + reverse mode: a read in the primal becomes a write in the adjoint, a write becomes a read, etc. (see the sketch after this list)
- Good schedulers for this look a lot like **polyhedral** schedulers
- Polly
- Polygeist -> polyhedral optimization
- Extend Polly to work with RL-based optimization
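Sketch of the read/write flip, with a hand-written adjoint of a matrix-vector product (names are ours; this mirrors what reverse-mode AD generates):

```python
import numpy as np

def matvec(A, x, y):
    # Primal: each i iteration only READS x; writes to y are disjoint,
    # so the i loop parallelizes trivially.
    for i in range(A.shape[0]):
        for j in range(A.shape[1]):
            y[i] += A[i, j] * x[j]

def matvec_adjoint(A, y_bar, x_bar):
    # Adjoint: the read of x[j] became a WRITE to x_bar[j]. Parallelizing
    # the i loop as in the primal now races on x_bar; the scheduler must
    # interchange the loops or emit a reduction/atomics.
    for i in range(A.shape[0]):
        for j in range(A.shape[1]):
            x_bar[j] += A[i, j] * y_bar[i]

A = np.random.rand(3, 4)
y_bar = np.random.rand(3)
x_bar = np.zeros(4)
matvec_adjoint(A, y_bar, x_bar)
assert np.allclose(x_bar, A.T @ y_bar)
```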
> Collect paper references to help Martin and be more focused!
1. Some scheduler API needs to be plugged in
2. Integration of a scheduler into Enzyme
3. Want to feed high-level information into the scheduler, to avoid having to recover it from low-level IR
- Integrate with Polly?
1. In the first round, do the opposite of what you would naively want to do
- Integrate with an existing RL scheduler for a related problem (see the CompilerGym sketch after this list)
4. Go down the MLIR route
- Polygeist
- Extend Enzyme itself to work on the MLIR level
- Integrate with any of the other systems, e.g. ETH paper, Google papers, FB papers
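On the RL-scheduler option: CompilerGym's LLVM environment is an existing RL setup for a related problem (phase ordering rather than scheduling). A minimal random-agent sketch, assuming CompilerGym ~0.2 names:

```python
import compiler_gym  # pip install compiler_gym

# Same RL shape as ours: state = program observation,
# action = transformation, reward = improvement.
env = compiler_gym.make(
    "llvm-v0",
    benchmark="cbench-v1/crc32",
    observation_space="Autophase",        # fixed-size feature vector
    reward_space="IrInstructionCountOz",  # size reduction relative to -Oz
)
env.reset()
for _ in range(20):
    obs, reward, done, info = env.step(env.action_space.sample())
    if done:
        break
env.close()
```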
Martin:
- Seeks to work with LLVM IR
- Polly-level optimizations
- Polly -> Enzyme -> Polly
- Modify the first Polly call to act counter-intuitively, i.e., to schedule in a way suited for **reverse-mode AD**
- ProGraML representation of the LLVM IR as graphs for the RL to work on top of (see the sketch after this list)
- Set up PolyBench to start off with
1. CPU-versions (serial)
2. (GPU-versions)
3. (OpenMP)
- Rodinia is another benchmark suite
- Pull in the TensorFlow dataset from the CompilerGym benchmarks (see the sketch after this list)
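Sketch tying the last two points together: pull IR from the CompilerGym TensorFlow dataset and build the ProGraML graph the RL would consume (dataset name `benchmark://tensorflow-v0` and API details are assumptions from the CompilerGym/ProGraML docs):

```python
import compiler_gym  # pip install compiler_gym
import programl      # pip install programl

env = compiler_gym.make("llvm-v0", observation_space="Ir")
dataset = env.datasets["benchmark://tensorflow-v0"]
ir = env.reset(benchmark=next(dataset.benchmarks()))  # textual LLVM IR

# ProGraML graph: instruction/value nodes with control-, data-, and
# call-flow edges -- the input representation for the RL policy.
graph = programl.from_llvm_ir(ir)
nx_graph = programl.to_networkx(graph)
print(nx_graph.number_of_nodes(), "nodes,", nx_graph.number_of_edges(), "edges")
env.close()
```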
> LLVM does not have a good representation of `parallel for`; only do auto-parallelization in stage 2 for now
- Use the manual optimization pass infrastructure
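One way to do that: opt's textual pass pipeline (new pass manager) with Enzyme loaded as a plugin; the plugin path and the exact pass list below are placeholders:

```python
import subprocess

# Hand-picked pipeline instead of a monolithic -O3: differentiate with
# Enzyme, then run cleanup passes we choose, in the order we choose.
pipeline = "enzyme,mem2reg,instcombine,loop-rotate,licm"
subprocess.run(
    [
        "opt", "input.ll", "-S", "-o", "output.ll",
        "-load-pass-plugin=./LLVMEnzyme-15.so",  # placeholder path
        f"-passes={pipeline}",
    ],
    check=True,
)
```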