cc https://hackmd.io/ZsIHG5kSSti_PVyxGVG4xQ for my initial thoughts. That doc is now outdated and not relevant here.
We have two core sources of incompleteness in the solver: `bidirectional-normalizes-to` (which is mostly complete), and preferring `ParamEnv` and `AliasBound` candidates. We also have more fundamental but subtle sources of incompleteness, e.g. in subtyping: `for<'a> fn(&'a ()) <: ?x` infers `?x` to be a higher-ranked fn-ptr, even though it could also be `fn(&'x ())`.
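A hedged sketch of how this eager choice can be observed. `Trait`, `impls_trait`, and `demo` are illustrative names, and the error comment reflects the behavior described above rather than a verified diagnostic:

```rust
// Only the non-higher-ranked fn-ptr type implements `Trait`.
trait Trait {}
impl<'x> Trait for fn(&'x ()) {}

fn impls_trait<T: Trait>(_: T) {}

fn demo(f: for<'a> fn(&'a ())) {
    // `let g = f;` relates `for<'a> fn(&'a ()) <: ?x` and eagerly
    // infers `?x` to be the higher-ranked fn-ptr type.
    let g = f;
    // With `?x = for<'a> fn(&'a ())` no impl matches, even though
    // `?x = fn(&'x ())` would have satisfied both the subtyping
    // obligation and this trait bound.
    impls_trait(g); // expected to error due to the eager choice
}
```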
While we could slowly push towards moving all of these sources of incompleteness out of the solver, I do not think this is achievable without significantly delaying its stabilization. Going forward, I am assuming that incompleteness is unavoidable.
Any stack-dependent behavior complicates the global cache in the following two ways:

As an example, consider the approach of the old solver to eagerly detect overflow. This approach has false negatives, e.g. a goal can succeed unless there already is a specific goal on the stack which causes this check to trigger. TODO: write example
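A minimal sketch of the shape of such a check, using illustrative names rather than the old solver's actual API:

```rust
/// Stand-in for a canonicalized goal; the real check compares goals
/// after replacing all inference variables with "fresh" ones.
#[derive(PartialEq)]
struct Goal(&'static str);

fn similar_modulo_fresh_vars(a: &Goal, b: &Goal) -> bool {
    a == b // simplified: equality modulo fresh inference variables
}

/// Eagerly flags `nested` as overflow if a similar goal is already
/// on the stack. Whether `nested` succeeds therefore depends on the
/// rest of the stack, which a goal-keyed global cache cannot observe.
fn hits_eager_overflow_check(stack: &[Goal], nested: &Goal) -> bool {
    stack.iter().any(|g| similar_modulo_fresh_vars(g, nested))
}
```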
Another issue here is checking the `recursion_depth`. Imagine we have some proof tree `A -> B -> C -> D -> E` and a remaining recursion depth of 3. Whether `C` is already cached changes the result of proving `A`, unless accessing the global cache also checks for the recursion limit.
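A sketch of the problem with a naive, depth-unaware lookup; all names are illustrative:

```rust
use std::collections::HashMap;

/// Naive lookup which ignores how much depth the cached computation
/// originally needed.
fn lookup_naive(cache: &HashMap<&'static str, bool>, goal: &'static str) -> Option<bool> {
    // BUG: the cached result for `C` may have been computed with more
    // remaining depth than we currently have. Using it here makes the
    // outcome of proving `A` depend on whether `C` happens to be
    // cached, instead of only on the recursion limit.
    cache.get(goal).copied()
}
```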
TODO: link to test in suite
See this example for a case where the current caching approach is broken. If we have a coinductive cycle `A -> B -> A`, then its results can be completely different depending on whether we prove `A` or `B`: in this example proving `A` results in an error, while proving `B` succeeds.
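For reference, a minimal shape of such a coinductive cycle, using auto traits (auto trait goals are coinductive). This only shows the cycle structure, not the incompleteness needed to make the two entry points disagree:

```rust
// Proving `A: Send` requires `B: Send`, which requires `A: Send`
// again, forming the coinductive cycle `A -> B -> A`.
struct A(Box<B>);
struct B(Box<A>);

fn requires_send<T: Send>() {}

fn main() {
    requires_send::<A>(); // enter the cycle at `A`
    requires_send::<B>(); // enter the same cycle at `B`
}
```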
Slightly simplified: the existing caching implementation caches the result for all cycle participants once the root goal gets popped from the solver stack. Due to the above, this produces unstable results which can break incremental compilation, resulting in ICEs and unsoundness.
I think such unstable cycles are incredibly unlikely to be hit in practice, and it's even more unlikely that such a cycle would end up breaking incremental compilation. However, I do not want to stabilize an approach which we know to be fundamentally unsound, especially given that there are valid alternatives.
The design space here is incredibly vast and there are a lot of potential alternative solutions, so it feels useful to explicitly write down my core priorities, in decreasing order of significance:
Due to priorities 1, 2, 3, and 4, I generally avoid as much stack dependence as possible, as correctly tracking it in the global cache adds complexity and a performance cost.
We want to minimize stack dependence. This means we split stack-dependent and stack-independent overflow. Overflow can happen in 5 places:

- proving nested goals
- `try_evaluate_added_goals`
- `assemble_candidates_after_normalizing_self_ty`
- `normalize_non_self_ty`

Only "proving nested goals" is stack dependent.
I assume that it is unavoidable to have non-fatal stack-depth overflow in the new solver. Adding deferred projection equality to the old solver already caused stack-depth overflow in the solver, and the new solver uses both deferred projection equality and lazy normalization. To avoid stack-depth overflow we would need additional checks which prevent overflow before the depth limit is hit. Here's a list of checks I've considered:
- `match_fresh_trait_refs` check: needs to track nested goals for all global cache entries, fairly unperformant.
- checking the type size of nested goals: given `struct Adt<T>(RefCell<Option<T>>);`, proving `Adt<HugeType>: AutoTrait` would overflow, because `RefCell<Option<HugeType>>: Trait` has a bigger type size. This means increasing the recursion depth is theoretically breaking, see https://github.com/rust-lang/trait-system-refactor-initiative/issues/33. I don't think this is something we should worry about, however.
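A sketch of such a type-size check, with an illustrative `Ty` representation that counts type-tree nodes:

```rust
/// Minimal stand-in for a type tree.
enum Ty {
    Adt { name: &'static str, args: Vec<Ty> },
    Param(&'static str),
}

/// Counts the nodes of a type, a stand-in for rustc's type size.
fn type_size(ty: &Ty) -> usize {
    match ty {
        Ty::Param(_) => 1,
        Ty::Adt { args, .. } => 1 + args.iter().map(type_size).sum::<usize>(),
    }
}

/// Flags overflow when a nested goal's self type grew. As described
/// above, this has false positives: `RefCell<Option<HugeType>>` is
/// bigger than `Adt<HugeType>`, so proving `Adt<HugeType>: AutoTrait`
/// would incorrectly overflow.
fn hits_size_limit(parent: &Ty, nested: &Ty) -> bool {
    type_size(nested) > type_size(parent)
}
```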
To correctly handle overflow due to hitting the recursion limit, I intend to track the reached depth of all goals in the global cache. If the reached depth is higher than the remaining allowed depth, we do not use the global cache entry. We can also cache goals which hit the depth limit.
TODO: Issue: when first hitting the recursion limit, I greatly reduce the allowed limit. This makes it non-trivial to cache based on the reached depth.
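A sketch of a depth-aware lookup under these assumptions; `CacheEntry` and the `u64` goal key are illustrative:

```rust
use std::collections::HashMap;

struct CacheEntry {
    result: bool,         // stand-in for the actual query result
    reached_depth: usize, // depth the goal's proof tree reached below it
}

struct GlobalCache(HashMap<u64, CacheEntry>); // keyed by a goal hash

impl GlobalCache {
    fn get(&self, goal: u64, available_depth: usize) -> Option<bool> {
        let entry = self.0.get(&goal)?;
        // Only reuse the entry if recomputing the goal right now
        // would not hit the recursion limit.
        if entry.reached_depth <= available_depth {
            Some(entry.result)
        } else {
            None
        }
    }
}
```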
The non-stack-dependent places track overflow separately. This allows us to not track any information for them in the global cache. Results involving non-stack-dependent overflow can also be reused regardless of the stack depth.
We can more freely choose the allowed number of steps for non-stack-dependent goals, as they are now treated separately. E.g. as we know that coinductive cycles tend to reach their fixpoint either in very few steps or not at all, we can reduce the limit to `log2(recursion_depth)`, which handles the hang in https://github.com/rust-lang/trait-system-refactor-initiative/issues/13.
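A sketch of such a reduced step limit; `fixpoint_step_limit` is an illustrative name:

```rust
/// A separate, much smaller step limit for non-stack-dependent
/// operations, derived from the recursion limit.
fn fixpoint_step_limit(recursion_limit: usize) -> usize {
    // Coinductive cycles tend to reach their fixpoint within very
    // few steps or not at all, so a logarithmic budget suffices.
    recursion_limit.max(2).ilog2() as usize
}

fn main() {
    // With the default recursion limit of 128, this allows 7 steps.
    assert_eq!(fixpoint_step_limit(128), 7);
}
```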
To handle incompleteness in coinductive cycles, we still cache the root goal, but additionally store all nested goals of the cycle in the cache entry. We then do not use this cached result if we access it from within a goal which depended on it, i.e. one of these nested goals.
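A sketch of this cycle-aware lookup, again with illustrative types:

```rust
use std::collections::{HashMap, HashSet};

struct CycleEntry {
    result: bool,               // stand-in for the cached result
    nested_goals: HashSet<u64>, // participants of the cycle
}

fn lookup(
    cache: &HashMap<u64, CycleEntry>,
    goal: u64,
    stack: &[u64], // goals currently being proven
) -> Option<bool> {
    let entry = cache.get(&goal)?;
    // Do not reuse the root's result while one of its cycle
    // participants is on the stack: proving the cycle from that
    // entry point may yield a different result.
    if stack.iter().any(|g| entry.nested_goals.contains(g)) {
        return None;
    }
    Some(entry.result)
}
```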
TODO
```mermaid
graph TB
    A --> B
    B --> C
    B --> F
    C --> D
    D --> E
```
- `A` starts with depth 3
- `B`: available depth 2
- `C`: available depth 1
- `D`: available depth 0
- `E`: OVERFLOW -> all of `A`, `B`, `C`, and `D` get tagged with `encountered_overflow`
- back to `B`, which keeps available depth 2, but its nested goal `F` now has depth `depth(B) / 4 = 0` instead. `F` does not get `encountered_overflow` set.
`encountered_overflow` affects how the cache entry is stored in the global cache (into `successful` or `with_overflow`).
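A sketch of that split, reusing the illustrative cache types from above:

```rust
use std::collections::HashMap;

struct CacheEntry {
    result: bool,
    reached_depth: usize,
}

struct GlobalCache {
    successful: HashMap<u64, CacheEntry>,
    with_overflow: HashMap<u64, CacheEntry>,
}

impl GlobalCache {
    fn insert(&mut self, goal: u64, entry: CacheEntry, encountered_overflow: bool) {
        if encountered_overflow {
            // The result depends on hitting the recursion limit and
            // is only valid for comparably small available depths.
            self.with_overflow.insert(goal, entry);
        } else {
            self.successful.insert(goal, entry);
        }
    }
}
```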