owned this note
owned this note
Published
Linked with GitHub
# cache dev guide chapter
# Caching in the new trait solver
Caching results of the trait solver is necessary for performance.
We have to make sure that it is sound and does not cache in unstable
query results. Caching is handled by the [`SearchGraph`].
[`SearchGraph`]: TODO
## The global cache
At its core, the cache is fairly straightforward. When evaluating a goal, we
check whether it's in the global cache. If so, we reuse that entry. If not, we
compute the goal and then store its result in the cache.
To handle incremental compilation the computation of a goal has to be tracked by the query system. This is done by wrapping the computation with [`Cx::with_cached_task`][with_cached_task]. This creates a new `DepNode` which depends on all queries
used inside of this computation. When accessing the global cache we then read this
`DepNode`, manually adding a dependency edge to all the queries used: [source][wdn].
Using a global cache entry must have the same effect as actually computing a
goal. To make sure that's the case, [`fn update_parent_goal`][upg] is called when we finish an evaluation, but *also* when we use a global cache entry.
[`fn update_parent_goal`][upg] lazily updates the state of the highest stack entry.
### Dealing with overflow
Hitting the recursion limit is not fatal in the new trait solver but instead simply
causes it to return ambiguity: [source][overflow]. Whether we hit the recursion limit
can therefore change the result without resulting in a compilation failure. These results are still cached.
However, we must be careful when accessing these cached results later as whether we use a global cache entry must not be observable. We do this by storing additional information in the global cache entry. For goals whose evaluation
did not reach the recursion limit, we simply store its reached depth: [source][req-depth].
These results can freely be used as long as the current `available_depth` is higher than
its `reached_depth`: [source][req-depth-ck]. We then update the reached depth of the
current goal to make sure that whether we've used the global cache entry is not
observable: [source][update-depth].
We can only use a cached result that overflowed if its stored depth *exactly matches* current available depth. This is necessary because even if a nested goal hit the recursion limit, its parent can still fail to evaluate or succeed by using a different candidate. The cache entry for each goal
therefore contains a separate result for each remaining depth: [source][rem-depth].[^1]
## Handling cycles
The trait solver handles cycles as explained [in this separate chapter][TODO]. Doing so efficiently and correctly greatly complicates caching.
The used terminology is taken directly from that chapter.
TODO: cycle, cycle head, cycle root, cycle participant, path, productive, unproductive
### Cycles and query stability
We cannot move the result of any cycle participant to the global cache until we've finished evaluating the cycle root. However, even after we've completely evaluated the cycle, we are still forced to discard the result of all participants apart from the root itself.
As explained in the [chapter about cycle handling][TODO], the result of a cycle can change depending on which goal is the root: [example][unstable-result-ex]. This means that we must not use a global cache entry if its evaluation would end up depending on a goal already on the stack. We do this by storing the nested goals of each global cache entry and not using that entry if a nested goal is currently on the stack: [source][TODO].
### The provisional cache
We use a separate local cache while inside of cycles. This cache still needs to be sound. However, unlike the global cache it is allowed to impact behavior without resulting in query instability as its usage is deterministic. While possible, globally caching cycle participants would require us to track a lot of additional information. Even more importantly, the provisional cache needs to be able to impact the behavior of the trait solver to enable a necessary optimization.
- basic idea: move cycle participants to the provisional cache, storing the highest cycle head and the path from that head to the cycle participant
- when popping the cycle head from the stack or when updating its provisional result, discard provisional entries which depend on it
- also need to check whether reevaluating a global cache entry would end up depending on a provisional cache entry
- issue: hangs with complex auto trait cycles: https://hackmd.io/Vzr-q5h8T_-0dQWeTQb4ig#perfect-derive-auto-traits---oh-my
- idea: keep provisional cache entries around even after popping the cycle heads they depend on
- this changes the paths of any cycles going through the popped cycle head. can think of it as "rotating part of the proof tree", need to make sure they all remain either inductive or coinductive
- only correct when computing the cycle head reached a fixpoint. When failing to do so, we force all provisional cache entries which depend on the cycle head to also be ambiguous. This results in sus ambiguity https://gist.github.com/lcnr/67ded915a8ce43cf6faa7e97eb59e8de#fun-general-tests
## tracking `nested_goals`
- tracking all nested goals may be too expensive (actually never tested)
- only track all nested goals after encountering the first cycle. Cycle participant are expected to definitely cycle, while the behavior of any goal which depends on a cycle changes depending of the result of the cycle (which depends on the root/provisional cache)
- overflow also changes the dependencies of goals, and given that any arbitrary goal could overflow, do we need to track all of the after all?
- no. By only using provisional cache entries which encountered overflow while the current goal on the stack is already a member of their cycle (so the caching behavior is deterministic), we can weaken caching to reduce the required tracking.
- TODO: could we instead track the required depth of provisional cache entries as well?
examples: https://gist.github.com/lcnr/291c3a7e1491ea33c1333c83cb594ca0
## Fuzzing support
The caching implementation is highly involved and supports fuzzing by
providing a generic interface via the [Cx][TODO] and [Delegate][TODO] traits.
The actual fuzzer is in a separate repository https://github.com/lcnr/search_graph_fuzz. It currently supports testing whether the global cache impacts behavior to avoid query instability, and simpler graphs which simply check the soundness of the cycle behavior to test the provisional cache.
<!-- unused-->
[`with_anon_task`]: https://github.com/rust-lang/rust/blob/59d4114b2d1aaac9a6dfe770997f2e79ccfd28ab/compiler/rustc_query_system/src/dep_graph/graph.rs#L295
[wdn]: TODO
[upg]: TODO
[overflow]: TODO
[req-depth]: TODO
[req-depth-ck]: TODO
[rem-depth]: https://github.com/rust-lang/rust/blob/59d4114b2d1aaac9a6dfe770997f2e79ccfd28ab/compiler/rustc_type_ir/src/search_graph/global_cache.rs#L25
[^1]: This is overly restrictive: if all nested goal return the overflow response with some
available depth `n`, then their result should be the same for any depths smaller than `n`. We can implement this optimization in the future.
[chapter on coinduction]: ./coinduction.md
[`provisional_result`]: https://github.com/rust-lang/rust/blob/7606c13961ddc1174b70638e934df0439b7dc515/compiler/rustc_trait_selection/src/solve/search_graph.rs#L57
[initial-prov-result]: https://github.com/rust-lang/rust/blob/7606c13961ddc1174b70638e934df0439b7dc515/compiler/rustc_trait_selection/src/solve/search_graph.rs#L366-L370
[fixpoint]: https://github.com/rust-lang/rust/blob/7606c13961ddc1174b70638e934df0439b7dc515/compiler/rustc_trait_selection/src/solve/search_graph.rs#L425-L446
[^2]: summarizing the relevant [zulip thread]
[zulip thread]: https://rust-lang.zulipchat.com/#narrow/stream/364551-t-types.2Ftrait-system-refactor/topic/global.20cache
[unstable-result-ex]: https://github.com/rust-lang/rust/blob/7606c13961ddc1174b70638e934df0439b7dc515/tests/ui/traits/next-solver/cycles/coinduction/incompleteness-unstable-result.rs#L4-L16
[cycle-participants]: https://github.com/rust-lang/rust/blob/7606c13961ddc1174b70638e934df0439b7dc515/compiler/rustc_middle/src/traits/solve/cache.rs#L72-L74
th_