# Proposal: A Unified Framework for Dataflow Analysis [Dataflow-based const validation][#64470] required a more generic dataflow framework with support for arbitrary transfer functions. This new framework was implemented in [rust-lang/rust#64566][#64566]. As a result, there are presently two separate frameworks for dataflow analysis that don't share any code. The goal of this design document is a plan to merge the two frameworks into one. I have implemented [a prototype][proto] that accomplishes this, but there may be concerns that I've overlooked. I will explain whether/how the prototype addresses each concern that I have identified so far. ## Extensibility A dataflow analysis is defined by the lattice which establishes the universe of possible state vectors, the transfer function for each statement and the direction in which statements are processed (forwards or backwards). At the moment, only forward analyses are supported, and the dataflow state vector must be a bitset (equivalently, the lattice must be a powerset). A new unified framework should make it easy to remove these restrictions in the future. Currently, backwards dataflow is done [by hand][liveness-flow], and more powerful const-propagation will, I think, require a more complex lattice. There are a few optimizations which could be applied to dataflow analyses, and a unified framework should allow as many as possible. For example, we could skip caching block transfer functions for dataflow analyses on acyclic MIR, since we never have to apply a block transfer function more than one. Some more complex ones are described in [#44285](https://github.com/rust-lang/rust/issues/44285). ## Interface Because we have restricted the dataflow lattice to a powerset for now, the only difference between gen-kill problems and generic ones is that transfer functions for the former are restricted to two operations ("gen" and "kill").The prototype splits the interface for dataflow analyses into three separate traits. A base trait, [`AnalysisDomain`], defines the index type used as the lattice element as well as the size of the state vector for a given `mir::Body`. ```rust trait AnalysisDomain<'tcx>: BottomValue { type Idx; fn bits_per_block(&self, body: &Body<'tcx>) -> usize; fn initialize_start_block(&self, body: &Body<'tcx>, state: &mut BitSet<Self::Idx>); } ``` The [generic dataflow interface][`Analysis`] defines the transfer functions for a given dataflow problem. Notice that there are two effects for each statement or terminator. Upon entry to a statement (or terminator), the "before" effect is applied, and upon exit the unprefixed effect is applied. This "two-phase" effect system is needed for the dataflow analysis done by the borrow checker, and is unchanged from the original framework. ```rust pub trait Analysis<'tcx>: AnalysisDomain<'tcx> { fn apply_statement_effect( &self, state: &mut BitSet<Self::Idx>, statement: &mir::Statement<'tcx>, location: Location, ); fn apply_before_statement_effect( &self, _state: &mut BitSet<Self::Idx>, _statement: &mir::Statement<'tcx>, _location: Location, ) {} fn apply_terminator_effect( &self, state: &mut BitSet<Self::Idx>, terminator: &mir::Terminator<'tcx>, location: Location, ); fn apply_before_terminator_effect( &self, _state: &mut BitSet<Self::Idx>, _terminator: &mir::Terminator<'tcx>, _location: Location, ) {} fn apply_call_return_effect( &self, state: &mut BitSet<Self::Idx>, block: BasicBlock, func: &mir::Operand<'tcx>, args: &[mir::Operand<'tcx>], return_place: &mir::Place<'tcx>, ); } ``` Finally, the [gen-kill interface][`GenKillAnalysis`] is almost identical to the generic one. However, a `GenKillAnalysis` is not given direct access to the state vector, but can only access it through a `GenKill` interface. `GenKill` abstracts the "gen" and "kill" operations either so they can be recorded while coalescing block transfer functions, or they can be applied directly to a `BitSet`. When arbitrary lattices for generic analyses are allowed in the future, the super trait will have an additional restriction, e.g., `GenKillAnalysis<'tcx>: Analysis<'tcx, Lattice = BitSet<Self::Idx>>`. ```rust pub trait GenKillAnalysis<'tcx>: Analysis<'tcx> { fn statement_effect( &self, trans: &mut impl GenKill<Self::Idx>, statement: &mir::Statement<'tcx>, location: Location, ); fn before_statement_effect( &self, _trans: &mut impl GenKill<Self::Idx>, _statement: &mir::Statement<'tcx>, _location: Location, ) {} fn terminator_effect( &self, trans: &mut impl GenKill<Self::Idx>, terminator: &mir::Terminator<'tcx>, location: Location, ); fn before_terminator_effect( &self, _trans: &mut impl GenKill<Self::Idx>, _terminator: &mir::Terminator<'tcx>, _location: Location, ) {} fn call_return_effect( &self, trans: &mut impl GenKill<Self::Idx>, block: BasicBlock, func: &mir::Operand<'tcx>, args: &[mir::Operand<'tcx>], return_place: &mir::Place<'tcx>, ); } ``` By making `Analysis` a super-trait of `GenKillAnalysis`, it becomes easy to implement the results cursor and visitor for all analyses. We add a blanket `impl<A: GenKillAnalysis> Analysis for A {}` to avoid code duplication. This precludes any additional specialized analysis traits, but I don't know of any that would be useful. Another downside of this approach is that a `GenKillAnalysis` can be passed to the [`Engine::new_generic`] method, which will execute that analysis without caching block transfer functions. Alternatively, the two traits could be separate, and a defaulted `apply_whole_block_effect` added to `Analysis`. We would use an adapter (which also holds the cached transfer functions) to wrap a `GenKillAnalysis` and implement `Analysis`. The adapter must be passed into the results cursor, however, which prolongs the time that the cached transfer functions are resident in memory. We would have to free the cache once fixpoint is reached via another method on `Analysis`. ## Engine Currently, the API for dataflow results consumers looks something like this: ```rust let engine = Engine::new_generic(MyGenericAnalysis::new(), /* body, etc. */); // -or- // let engine = Engine::new_gen_kill(MyGenKillAnalysis::new(), ...); let results = engine.iterate_to_fixpoint() let mut cursor = ResultsCursor::new(body, results) for (_, statement_index) in body.basic_blocks()[START_BLOCK].statements.iter_enumerated() { cursor.seek_before(Location { block: START_BLOCK, statement_index }); println!("state: {:?}", cursor.get()); } ``` TODO: finish ## Inspection The prototype gen-kill dataflow framework provides a few different APIs for inspecting the results of a dataflow analysis. There are two major paradigms, a "pull"-style [`ResultsCursor`], and a (to be implemented) "push"-style `ResultsVisitor`. "Push"-style APIs are more easily optimized (it's why `Iterator` implementations specialize `try_fold`), but "pull"-style APIs can be more efficient if we don't actually need to inspect the state in every block (like when doing const validation). The current `ResultsCursor` is very fine-grained to allow for debugging of two-phase dataflow analyses: It allows you to observe either the "before" effect or the full effect of every statement. This functionality is not very useful outside of this context, since most consumers of dataflow results only need to observe the "before" effect for each statement. If the new framework causes a perf regression, we may consider renaming the current cursor to `ResultsCursorFine` and implement a separate, less-flexible one. While results cursors are great for consumers of a dataflow analysis, they make some optimizations more difficult. To operate efficiently, they require that the full dataflow state at entry to each block is stored. ## Graphviz debugging Unifying the graphviz output for both interfaces is actually not trivial. This is because we would like to print the block transfer functions for gen/kill problems (as is done in the original framework). However generic dataflow problems do not allow for transfer functions to be coalesced, so instead we can only print the diff as each statement is processed. The prototype uses another argument to `rustc_mir`, `borrowck_graphviz_format` to control how the state is rendered (sometimes it is nice to view state diffs for gen-kill problems as well). You can see an example of the diff-style output below, as well as the output from the original graphviz debugger on the same MIR. ![](https://i.imgur.com/c96zH3W.png) Original: ![](https://i.imgur.com/iGm00BL.png) ## Transition My plan is to replace the generic analysis in `librustc_mir/dataflow/generic` with the unified version, then migrate the various implementers of `BitDenotation` over one-by-one. After this, the "generic" implementation will become the canonical one. The unified dataflow framework described above is not intrinsically less performant than the current setup, but the increased complexity may result in missed optimizations. It will be important to check for performance regressions as migration occurs. Once a unified framework is merged, I plan on writing a section in the `rustc` guide. I believe that there's a lecture video where the current `BitDenotation` is discussed, but it would be helpful to have some written content for new contributors. [#64470]: https://github.com/rust-lang/rust/pull/64470 [#64566]: https://github.com/rust-lang/rust/pull/64566 [liveness-flow]: https://github.com/rust-lang/rust/blob/e413dc36a83a5aad3ab6270373000693a917e92b/src/librustc_mir/util/liveness.rs#L86 [`AnalysisDomain`]: https://github.com/ecstatic-morse/rust/blob/b70cc5695cb807bc6e97c35037278c0ca775fb05/src/librustc_mir/dataflow/generic/mod.rs#L26 [`Analysis`]: https://github.com/ecstatic-morse/rust/blob/b70cc5695cb807bc6e97c35037278c0ca775fb05/src/librustc_mir/dataflow/generic/mod.rs#L50 [`GenKillAnalysis`]: https://github.com/ecstatic-morse/rust/blob/b70cc5695cb807bc6e97c35037278c0ca775fb05/src/librustc_mir/dataflow/generic/mod.rs#L121 [proto]: https://github.com/rust-lang/rust/compare/master...ecstatic-morse:unified-dataflow-proto [`ResultsCursor`]: https://github.com/ecstatic-morse/rust/blob/b70cc5695cb807bc6e97c35037278c0ca775fb05/src/librustc_mir/dataflow/generic/cursor.rs#L23 [`Engine::new_generic`]: https://github.com/ecstatic-morse/rust/blob/b70cc5695cb807bc6e97c35037278c0ca775fb05/src/librustc_mir/dataflow/generic/engine.rs#L69