cc https://hackmd.io/ZsIHG5kSSti_PVyxGVG4xQ which contains my initial thoughts. That doc is now irrelevant and does not matter here.
## incompleteness in the new solver is (practically) unavoidable
We have two core sources of incompleteness in the solver: [`bidirectional-normalizes-to`](https://github.com/rust-lang/trait-system-refactor-initiative/issues/25) (which is mostly complete), and [preferring `ParamEnv` and `AliasBound` candidates](https://github.com/rust-lang/trait-system-refactor-initiative/issues/45). We also have a lot more fundamental but subtle incompleteness, e.g. in subtyping: `for<'a> fn(&'a ()) <: ?x` infers `?x` to be a higher ranked `fn`-ptr, even though it could also be `fn(&'x ())`.
While we could slowly push towards moving all of these sources of incompleteness out of the solver, I do not think this is achievable without significantly delaying its stabilization. Going forward, I am assuming that incompleteness is unavoidable.
## stack dependence requires global caching support
Any stack dependent behavior complicates the global cache in the following two ways:
- when putting things in the global cache, we have to be sure that their result is not stack-dependent.
- when reading things from the global cache, we have to be sure that their result would not be different due to stack dependence.
As an example, consider [the approach of the old solver to eagerly detect overflow][mftr]. This approach has false negatives, e.g. a goal can be successful unless there already is a specific goal on the stack which causes this check to trigger. TODO: write example
Another issue here is checking the `recursion_depth`. Imagine we have some proof tree `A -> B -> C -> D -> E` and a remaining recursion depth of 3. Whether `C` is already cached changes the result of proving `A` unless accessing the global cache also checks for the recursion limit.
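The following is a minimal sketch of that scenario, not the actual solver: goals are modelled as indices into the linear proof tree, and `prove` and its `cached` set are hypothetical names. The cache deliberately ignores the remaining depth, so the result of proving `A` depends on whether `C` happens to be cached.

```rust
use std::collections::HashSet;

/// Proves goal `idx` in the linear proof tree `A -> B -> C -> D -> E`
/// (indices 0..=4). Returns `false` on overflow.
///
/// `cached` is a hypothetical global cache which only stores goals known to
/// hold and, deliberately, does not look at the remaining depth.
fn prove(idx: usize, remaining_depth: usize, cached: &HashSet<usize>) -> bool {
    const LEAF: usize = 4; // `E` has no nested goals
    if cached.contains(&idx) {
        return true; // cache hit: the nested goals are never evaluated
    }
    if idx == LEAF {
        return true;
    }
    if remaining_depth == 0 {
        return false; // no depth left for the nested goal
    }
    prove(idx + 1, remaining_depth - 1, cached)
}
```

With an empty cache, `prove(0, 3, ..)` overflows when `D` tries to evaluate `E`. With `C` (index 2) in the cache, the same call succeeds.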
TODO: link to test in suite
## the root goal of coinductive cycles is significant
See [this example](https://rust.godbolt.org/z/MoTrvjvY6) for a case where the current caching approach is broken. If we have a coinductive cycle `A -> B -> A`, then its results can be completely different depending on whether we prove `A` or `B`: in this example proving `A` results in an error, while proving `B` succeeds.
Slightly simplified: the existing caching implementation caches the result for all cycle participants once the root goal gets popped from the solver stack. Due to the above this results in unstable results which can break incremental, resulting in ICEs and unsoundness.
I think such unstable cycles are incredibly unlikely to hit in practice, and it's even more unlikely that such a cycle would end up breaking incremental. However, I do not want to stabilize an approach which **we know** to be fundamentally unsound. Especially given that there are valid alternatives.
## design goals of my approach
The design space is incredibly vast here and there are a lot of potential alternative solutions, so it feels useful to explicitly write down my core priorities, in decreasing order of significance:
1. the implementation has to be *fully sound*, even if the bugs cannot be hit by accident
2. the implementation has to be performant. Where possible, move costs towards less hot areas, e.g. adding info to the global cache for coinductive cycles is better than doing so for all goals.
3. it should not expose us to many unknowns or be too experimental. Maintaining the old trait solver has a significant cost. We should strive to stabilize the new solver as quickly as reasonably possible.
4. the implementation should be maintainable and mostly hidden in the happy path
5. the implementation should not rely on behavior which ends up blocking us from moving towards a 'better' approach after stabilization.
Due to 1, 2, 3, and 4 I generally avoid as much stack dependence as possible, as correctly tracking it in the global cache adds complexity and a performance cost.
## the new approach to overflow
We want to minimize stack dependence. This means we split stack dependent and stack independent overflow. Overflow can happen in 5 places:
- proving nested goals
- rerunning goals to get towards a fixpoint for coinduction
- looping over nested goals in `try_evaluate_added_goals`
- `assemble_candidates_after_normalizing_self_ty`
- `normalize_non_self_ty`
Only "proving nested goals" is stack dependent.
### stack dependence
I assume that it is unavoidable to have non-fatal stack-depth overflow in the new solver. Adding deferred projection equality to the old solver caused stack-depth overflow in the solver. The new solver uses deferred projection equality and lazy normalization. To avoid stack-depth overflow we would need to add additional checks which detect overflow before the depth limit is hit. Here's a list of checks I've considered:
- [old solver `match_fresh_trait_refs` check][mftr]: needs to track nested goals for all global cache entries, fairly unperformant
- checking for a maximum type size in goals: highly uncertain how well this would work. May break existing crates with very complex goals, especially during monomorphization. To not break these it would probably also not be eager enough with detecting actual overflow, resulting in hangs or very high depths.
- overflow on growing type size above a limit: instead of rejecting huge types by itself, only reject huge types in goals if the type size is even bigger in a directly nested goal. This avoids stack dependence. It has the same issues as tracking the type size by itself. It also doesn't help with preventing bad™ overflow during monomorphization, consider an auto trait goal for `struct Adt<T>(RefCell<Option<T>>);`. Proving `Adt<HugeType>: AutoTrait` would overflow, because `RefCell<Option<HugeType>>: AutoTrait` has a bigger type size.
- somehow checking for repeated usage of the same impl: very unsure how well this works, likely stack dependent for all goals. Would require significant experimentation.
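To make the "growing type size" check from the list above concrete, here is a rough sketch. `Ty`, `size`, `overflows`, and the way the limit is applied are all assumptions for illustration, not existing compiler APIs. It shows why the `Adt<HugeType>` example trips the check: the field type is strictly bigger than the ADT itself.

```rust
/// A minimal type representation: a constructor applied to arguments.
#[derive(Clone)]
struct Ty {
    name: &'static str,
    args: Vec<Ty>,
}

fn ty(name: &'static str, args: Vec<Ty>) -> Ty {
    Ty { name, args }
}

/// The type size: the number of constructors in the type.
fn size(ty: &Ty) -> usize {
    1 + ty.args.iter().map(size).sum::<usize>()
}

/// Hypothetical check: a directly nested goal overflows if its type is above
/// `limit` and also bigger than the type of its parent goal.
fn overflows(parent: &Ty, nested: &Ty, limit: usize) -> bool {
    let nested_size = size(nested);
    nested_size > limit && nested_size > size(parent)
}

/// Builds `Vec<Vec<...<u8>>>` with `depth` layers as a stand-in for `HugeType`.
fn huge_type(depth: usize) -> Ty {
    (0..depth).fold(ty("u8", vec![]), |inner, _| ty("Vec", vec![inner]))
}
```

For `huge = huge_type(100)`, the goal for `Adt<huge>` has size 102 while the nested goal for `RefCell<Option<huge>>` has size 103, so `overflows` returns `true` for any limit below 103, even though this auto trait goal is harmless.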
This means increasing the recursion depth is theoretically breaking, see https://github.com/rust-lang/trait-system-refactor-initiative/issues/33. I don't think this is something we should worry about however.
To correctly handle overflow due to hitting the recursion limit, I intend to track the reached depth of all goals in the global cache. If the reached depth is higher than the remaining allowed depth, we do not use the global cache entry. We can also cache goals hitting the depth limit.
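A minimal sketch of such a depth-aware lookup, assuming goals are identified by strings and that `reached_depth` counts the levels of nested goals below the cached goal. `Entry`, `GlobalCache`, and their methods are hypothetical names, not the solver's actual types.

```rust
use std::collections::HashMap;

/// Hypothetical cache entry: the result together with the depth reached
/// below the goal while computing it.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Entry {
    holds: bool,
    reached_depth: usize,
}

#[derive(Default)]
struct GlobalCache {
    entries: HashMap<String, Entry>,
}

impl GlobalCache {
    fn insert(&mut self, goal: &str, holds: bool, reached_depth: usize) {
        self.entries.insert(goal.to_string(), Entry { holds, reached_depth });
    }

    /// Only returns the entry if re-evaluating the goal would fit into
    /// `available_depth`. Otherwise the caller has to evaluate the goal
    /// itself and will hit the recursion limit as if nothing was cached.
    fn get(&self, goal: &str, available_depth: usize) -> Option<bool> {
        let entry = self.entries.get(goal)?;
        (entry.reached_depth <= available_depth).then_some(entry.holds)
    }
}
```

In the `A -> B -> C -> D -> E` example, `C` reaches a depth of 2. A lookup with an available depth of 1 therefore ignores the entry, so whether `C` is cached no longer changes the result of proving `A`.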
TODO: Issue: when first hitting the recursion limit I greatly reduce the allowed limit. This makes it non trivial to cache based on the reached depth.
### non stack dependent goals
These track overflow separately. This allows us to not track any information for them in the global cache. Non stack dependent overflow can also be reused regardless of the stack depth.
We can more freely choose the allowed number of steps for non-stack dependent goals as they are now treated separately. E.g. as we know that for coinductive cycles we tend to reach the fixpoint either in very few steps or not at all, we can reduce the limit to `log2(recursion_depth)`, which handles the hang in https://github.com/rust-lang/trait-system-refactor-initiative/issues/13.
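As a rough sketch of what such a limit could look like: `fixpoint_step_limit` and `run_to_fixpoint` are hypothetical helpers, and rounding `log2` down is an assumption. The loop reruns a goal until its result stops changing or the step limit is exhausted.

```rust
/// Hypothetical helper: the number of reruns allowed for a coinductive
/// cycle, `log2(recursion_depth)` rounded down.
fn fixpoint_step_limit(recursion_depth: usize) -> usize {
    // `ilog2` panics on 0, so clamp to 1, which yields a limit of 0.
    recursion_depth.max(1).ilog2() as usize
}

/// Reruns `step` starting from `provisional` until the result stops
/// changing. Returns `None` if no fixpoint is reached within the limit,
/// which the caller would treat as (stack independent) overflow.
fn run_to_fixpoint<T: PartialEq>(
    mut provisional: T,
    recursion_depth: usize,
    mut step: impl FnMut(&T) -> T,
) -> Option<T> {
    for _ in 0..fixpoint_step_limit(recursion_depth) {
        let next = step(&provisional);
        if next == provisional {
            return Some(next);
        }
        provisional = next;
    }
    None
}
```

With a recursion depth of 128 this allows 7 reruns instead of 128, so a cycle which never reaches a fixpoint is given up on quickly.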
## caching coinductive cycles
To handle incompleteness in coinductive cycles, we still cache the root goal, but put all nested goals into a map of the cache result. We then do not use this cache result if we access it from a goal which depended on that goal.
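One possible reading of this, sketched with hypothetical types (`CycleEntry`, `CycleCache`) and string goals: the entry for the cycle root remembers its nested goals, and a lookup is refused while any of them is on the solver stack, because evaluating with that goal as the root could give a different result.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical cache entry for the root of a coinductive cycle.
struct CycleEntry {
    holds: bool,
    /// All goals evaluated as part of the cycle below the root.
    nested_goals: HashSet<String>,
}

#[derive(Default)]
struct CycleCache {
    entries: HashMap<String, CycleEntry>,
}

impl CycleCache {
    fn insert(&mut self, root: &str, holds: bool, nested: &[&str]) {
        let nested_goals = nested.iter().map(|g| g.to_string()).collect();
        self.entries.insert(root.to_string(), CycleEntry { holds, nested_goals });
    }

    /// Refuses the entry if any goal on `stack` was a nested goal of the
    /// cached evaluation, since the result may differ with that goal as the
    /// cycle root.
    fn get(&self, goal: &str, stack: &[&str]) -> Option<bool> {
        let entry = self.entries.get(goal)?;
        let on_stack = stack.iter().any(|g| entry.nested_goals.contains(*g));
        (!on_stack).then_some(entry.holds)
    }
}
```

For the cycle `A -> B -> A` with `A` cached as an error, a later attempt to prove `B` has `B` on the stack when it reaches `A`, so the cached error for `A` is not used.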
## capping the remaining depth
TODO
```mermaid
graph TB
A --> B
B --> C
B --> F
C --> D
D --> E
```
`A` starts with depth 3
`B`: available depth 2
`C`: available depth 1
`D`: available depth 0
`E`: OVERFLOW -> all of A, B, C, and D get tagged with `encountered_overflow`
back to `B`, keeps available depth `2`, but nested goal `F` now has depth `depth(B) / 4 = 0` instead. `F` does not get `encountered_overflow` set.
* `encountered_overflow` affects how the cache entry is stored in the global cache (into successful or with_overflow)
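The walkthrough above can be replayed with a small simulation. This is only a sketch under assumptions: `Goal`, `StackEntry`, and `Solver` are hypothetical, overflow is modelled as "a goal with available depth 0 tries to evaluate a nested goal", and the division by 4 uses integer division.

```rust
use std::collections::BTreeMap;

/// A goal with its nested goals; a hypothetical stand-in for a proof tree.
struct Goal {
    name: &'static str,
    nested: Vec<Goal>,
}

fn goal(name: &'static str, nested: Vec<Goal>) -> Goal {
    Goal { name, nested }
}

struct StackEntry {
    name: &'static str,
    available_depth: usize,
    encountered_overflow: bool,
}

#[derive(Default)]
struct Solver {
    stack: Vec<StackEntry>,
    /// Final `(available_depth, encountered_overflow)` of each evaluated goal.
    log: BTreeMap<&'static str, (usize, bool)>,
}

impl Solver {
    fn evaluate(&mut self, goal: &Goal, available_depth: usize) {
        self.stack.push(StackEntry {
            name: goal.name,
            available_depth,
            encountered_overflow: false,
        });
        for nested in &goal.nested {
            let parent = self.stack.last().unwrap();
            let nested_depth = match (parent.available_depth, parent.encountered_overflow) {
                // No depth left: the nested goal cannot be evaluated.
                (0, _) => None,
                // After overflow, later nested goals get a capped depth.
                (depth, true) => Some(depth / 4),
                (depth, false) => Some(depth - 1),
            };
            match nested_depth {
                Some(depth) => self.evaluate(nested, depth),
                // Overflow: tag every goal currently on the stack.
                None => self.stack.iter_mut().for_each(|e| e.encountered_overflow = true),
            }
        }
        let entry = self.stack.pop().unwrap();
        self.log.insert(entry.name, (entry.available_depth, entry.encountered_overflow));
    }
}
```

Evaluating the tree from the diagram with a starting depth of 3 tags `A`, `B`, `C`, and `D` with `encountered_overflow`, never evaluates `E`, and evaluates `F` with depth 0 and no overflow.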
[mftr]: https://github.com/rust-lang/rust/blob/b14fd2359f47fb9a14bbfe55359db4bb3af11861/compiler/rustc_trait_selection/src/traits/select/mod.rs#L1172-L1211