
non-fatal overflow in the new solver

this is a collection of notes for the final overflow doc


The new solver should avoid hangs as much as possible. It must consider constraints from overflowing branches because of https://github.com/rust-lang/trait-system-refactor-initiative/issues/70.

Overflow can happen in multiple places:

The main concern with non-fatal overflow is how we should handle exponential blowup. This can be caused in multiple different ways yet again:

  • multiple candidates for a trait or project goal
  • multiple nested goals of a trait or project candidate
  • multiple nominal obligations from well-formed goals
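
As a rough illustration of why these sources matter, the sketch below counts goal evaluations in an idealized proof tree. The function name and the uniform-branching model are assumptions made for this example and do not correspond to anything in rustc: every goal is assumed to produce the same number of nested goals (or candidates), and nothing is cached.

```rust
/// Number of goal evaluations in a proof tree where every goal has
/// `branching` nested goals, down to `depth` levels below the root.
/// Hypothetical model: uniform branching, no caching.
fn uncached_goal_evaluations(branching: u64, depth: u32) -> u64 {
    (0..=depth)
        // `branching^level` goals exist at each level of the tree.
        .map(|level| branching.saturating_pow(level))
        // Saturate instead of wrapping once the count exceeds `u64::MAX`.
        .fold(0, |total: u64, at_level| total.saturating_add(at_level))
}
```

With two nested goals per candidate, a depth of 10 already needs 2047 evaluations, and a depth of 128 saturates a u64.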

more complex overflow issues

These blowup and overflow sources can be combined for even more fun.

solver fixpoint + try_evaluate_added_goals

https://github.com/rust-lang/rust/pull/118774

trait Trait {}

struct W<T: ?Sized>(*const T);

impl<T: ?Sized> Trait for W<W<T>>
where
    W<T>: Trait,
    W<T>: Trait,
{}

fn impls_trait<T: Trait>() {}

fn main() {
    impls_trait::<W<_>>();
    //~^ ERROR overflow evaluating the requirement
}

solver cycle fixpoint at each level + multiple nested goals

#![feature(rustc_attrs)]
#![allow(internal_features)]

#[rustc_coinductive]
trait Trait {}

struct W<T: ?Sized>(T);

impl<T: ?Sized> Trait for W<W<T>>
where
    Self: Assistant,
    W<T>: Trait,
{
}

#[rustc_coinductive]
trait Assistant {}
impl<T: ?Sized> Assistant for W<T>
where
    T: Assistant,
    Self: Trait,
{
}

fn impls_trait<T: Trait + ?Sized>() {}

fn main() {
    impls_trait::<W<_>>();
}

solver cycle fixpoint at each level + multiple candidates

// This has exponential growth because of the growing impl,
// even if it does not apply.
trait Trait {}

struct W<T>(T);
struct U<T>(T);

trait NotImplemented {}

impl<T> Trait for W<T>
where
    W<W<T>>: Trait,
    W<T>: NotImplemented,
{}

impl<T: Other> Trait for T {}

trait Other {}
impl<T: Other + Trait> Other for W<T> {}
impl Other for () {}

fn impls_trait<T: Trait + ?Sized>() {}

fn main() {
    impls_trait::<W<_>>();
}

overflow in try_evaluate_added_goals and anything else

Rerunning overflowing goals after applying their constraints very easily results in hangs, because we recompute the overflowing goal on each loop iteration, increasing the size of the inferred type even more, e.g. ui/traits/new-solver/overflow/exponential-trait-goals.rs:

trait Trait {}

struct W<T>(T);

impl<T, U> Trait for W<(W<T>, W<U>)>
where
    W<T>: Trait,
    W<U>: Trait,
{
}

fn impls<T: Trait>() {}

fn main() {
    impls::<W<_>>();
}

This also results in unstable results. When we stop applying inference constraints because of overflow, the solver can make additional progress the next time the goal is computed.

assemble_candidates_after_normalizing_self_ty and try_normalize_ty blowup

For cyclic projections, normalizing the self type results in recursion_depth nested Projection(Alias, ?new_infer) goals. ?new_infer gets instantiated as Alias, which (due to the way the current impl is set up) ends up resulting in a nested AliasRelate(Alias, Alias) goal, which again normalizes the alias, resulting in recursion_depth many nested goals.

trait Overflow<U: ?Sized> {
    type Assoc;
}

impl<U: ?Sized> Overflow<U> for () {
    type Assoc = <() as Overflow<(U,)>>::Assoc;
}

fn main() {}

This results in nested alias relate goals because, when generalizing, the generalized type has no unresolved inference variables while the original one does, preventing the structural eq fast path from firing when equating at the end of CombineFields::instantiate. The following diff prevents that overflow:

--- a/compiler/rustc_trait_selection/src/solve/project_goals/mod.rs
+++ b/compiler/rustc_trait_selection/src/solve/project_goals/mod.rs
@@ -227,6 +227,7 @@ fn consider_impl_candidate(
             //
             // And then map these args to the args of the defining impl of `Assoc`, going
             // from `[u32, u64]` to `[u32, i32, u64]`.
+            let impl_args = ecx.resolve_vars_if_possible(impl_args);
             let impl_args_with_gat = goal.predicate.projection_ty.args.rebase_onto(
                 tcx,
                 goal_trait_ref.def_id,

random thoughts and summary

  • we need to apply inference constraints even if there's overflow for backcompat
  • try_evaluate_added_goals causes other overflow to very quickly result in hangs. Overflowing nested goals have to be heavily penalized or avoided.
  • changing the layout of the proof tree after stabilization is theoretically breaking, either because of new hangs or because we stop visiting paths which were visited before.
  • avoiding the recomputation of parts of the proof tree is fully backwards compatible and something we can do after stabilization
  • normalization needs the "full depth"
  • cycle handling must not allow the full depth as it otherwise hangs
  • we will probably re-add a provisional cache at some point; this may reduce the cost of the cycle fixpoint step
  • ignoring constraints from overflow is very good for perf; we may special-case the constraints from where-clauses

new idea

  • stash goals resulting in overflow in try_evaluate_added_goals and avoid evaluating them in following evaluations
  • do the same in fulfillment
  • maybe also try to prove them once more at the end
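
The idea above can be sketched roughly as follows. This is a toy model, not the solver's actual code: `Outcome`, `Remaining`, and `evaluate_with_stash` are hypothetical names, and the `evaluate` callback stands in for evaluating a single nested goal.

```rust
/// Hypothetical result of evaluating one nested goal.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Outcome {
    Proven,
    Ambiguous,
    Overflow,
}

/// Goals that could not be proven, split by the reason.
#[derive(Debug, PartialEq, Eq)]
struct Remaining<G> {
    ambiguous: Vec<G>,
    overflowed: Vec<G>,
}

/// Toy fixpoint loop: goals that overflow are stashed and skipped in
/// later iterations, then retried exactly once at the end.
fn evaluate_with_stash<G>(
    goals: Vec<G>,
    mut evaluate: impl FnMut(&G) -> Outcome,
) -> Remaining<G> {
    let mut pending = goals;
    let mut stashed = Vec::new();
    loop {
        let mut made_progress = false;
        let mut still_pending = Vec::new();
        for goal in pending {
            match evaluate(&goal) {
                Outcome::Proven => made_progress = true,
                Outcome::Ambiguous => still_pending.push(goal),
                // Do not re-evaluate this goal in the following iterations.
                Outcome::Overflow => stashed.push(goal),
            }
        }
        pending = still_pending;
        if !made_progress || pending.is_empty() {
            break;
        }
    }
    // "Maybe also try to prove them once more at the end."
    let mut overflowed = Vec::new();
    for goal in stashed {
        match evaluate(&goal) {
            Outcome::Proven => {}
            Outcome::Ambiguous => pending.push(goal),
            Outcome::Overflow => overflowed.push(goal),
        }
    }
    Remaining { ambiguous: pending, overflowed }
}
```

In this model an overflowing goal is evaluated at most twice, no matter how many iterations the other goals need to reach a fixpoint.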

cache usage of the new solver

current implementation (without dependencies) as of 2023.12.04.

crate                    overflow  global cache  cycle  compute
syn                             0        470628      0    49688
rand (slightly changed)         0        118929   8586    29173
serde                           0       4032167      0   122229
bitflags                        0         10901      0     3477
regex-syntax                    0        400924     28    51540
regex-automata                  0        826261      4   139092
typenum                     76924        345245   4854    65233

IDEA: Learning from CTFE

Trait solving has similar constraints to CTFE. We can use a similar approach to CTFE to avoid hangs in a backwards compatible way.

Have a simple counter in the trait solver, which is incremented whenever we evaluate a nested goal. If that counter hits some arbitrary limit, we emit a deny-by-default lint telling the user that the solver seems to be hanging due to their code. If that lint results in an error (i.e. has not been changed to allow/warn), we abort compilation.

If it has been allowed or changed to warn, we repeatedly emit a warning with some exponential backoff.
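
A minimal sketch of such a counter, assuming a doubling threshold as the backoff strategy. All names here (`LintLevel`, `Step`, `GoalCounter`, `on_nested_goal`) are hypothetical and only illustrate the control flow; none of this is existing rustc API.

```rust
/// Hypothetical lint level chosen by the user.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum LintLevel {
    Allow,
    Warn,
    Deny,
}

/// What the solver should do after counting one more nested goal.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Step {
    Continue,
    Warn,
    Abort,
}

struct GoalCounter {
    evaluated: u64,
    next_report: u64,
    level: LintLevel,
}

impl GoalCounter {
    fn new(limit: u64, level: LintLevel) -> Self {
        // A limit of zero would never back off, so clamp it to one.
        GoalCounter { evaluated: 0, next_report: limit.max(1), level }
    }

    /// Call once per evaluated nested goal.
    fn on_nested_goal(&mut self) -> Step {
        self.evaluated += 1;
        if self.evaluated < self.next_report {
            return Step::Continue;
        }
        // Exponential backoff: double the threshold after every report.
        self.next_report = self.next_report.saturating_mul(2);
        match self.level {
            // The lint is an error: abort compilation.
            LintLevel::Deny => Step::Abort,
            // Allowed or downgraded to warn: keep going, but warn.
            LintLevel::Allow | LintLevel::Warn => Step::Warn,
        }
    }
}
```

With a limit of 4 and the lint set to warn, this reports at goal 4, 8, 16, and so on; with the default deny level it aborts at goal 4.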

This has one significant issue: the number of evaluated nested goals is not a good approximation of the solver runtime. typenum has 95000 uncached goal evaluations in less than a second, while tests/ui/traits/new-solver/cycles/coinduction/fixpoint-exponential-growth.rs hangs with fewer than 700 uncached goal evaluations.

The time needed to evaluate a single goal can differ widely depending on the amount and size of inference constraints from nested goals. However, we can combine this counter with a size check of constraints, or include the size of constraints when incrementing the counter.

IDEA: split "overflow" and "recursion limit based overflow"

Only yeet constraints from recursion limit based overflow. This does not avoid the hang in tests/ui/traits/new-solver/cycles/coinduction/fixpoint-exponential-growth.rs.

Many crates depend on the "overflow project where bounds" behavior

https://github.com/rust-lang/trait-system-refactor-initiative/issues/70 may just be acceptable breakage. It may be very positive for perf, at least doing it for recursion limit based overflow.

https://github.com/rust-lang/rust/issues/90662 was originally caused by making this pattern an error only for global goals (if it otherwise cycles); this broke https://github.com/AzureMarker/shaku. It feels likely that always doing so is too impactful.

IDEA: checking the size of the var_values constraints

When canonicalizing a response, check the size of the var_values and discard them if they grow too large, resulting in overflow.

This can result in bistable cycle fixpoint computations, but that seems alright.
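
A sketch of what such a size check could look like, using a toy type representation. `Ty`, `Response`, `ty_size`, `nested_w`, and `make_response` are hypothetical stand-ins for this example; the real canonicalization code and its types look different.

```rust
/// Toy type representation, standing in for the solver's types.
#[derive(Debug, Clone, PartialEq, Eq)]
enum Ty {
    Infer(u32),
    Adt(&'static str, Vec<Ty>),
}

/// Hypothetical response: either the constraints or overflow.
#[derive(Debug, PartialEq, Eq)]
enum Response {
    Constrained(Vec<Ty>),
    Overflow,
}

/// Number of nodes in the type tree.
fn ty_size(ty: &Ty) -> usize {
    match ty {
        Ty::Infer(_) => 1,
        Ty::Adt(_, args) => 1 + args.iter().map(ty_size).sum::<usize>(),
    }
}

/// Builds `W<W<...<?0>...>>` with `depth` layers of `W`.
fn nested_w(depth: usize) -> Ty {
    (0..depth).fold(Ty::Infer(0), |inner, _| Ty::Adt("W", vec![inner]))
}

/// Discards the var_values and reports overflow once they grow
/// beyond `max_size` nodes in total.
fn make_response(var_values: Vec<Ty>, max_size: usize) -> Response {
    let total: usize = var_values.iter().map(ty_size).sum();
    if total > max_size {
        Response::Overflow
    } else {
        Response::Constrained(var_values)
    }
}
```

For example, with a limit of 8 nodes, `W<W<W<?0>>>` (4 nodes) is kept as a constraint, while 20 nested layers of `W` are discarded and reported as overflow.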

TODO: impl header eq constraints old solver

Write tests where we rely on these constraints both for Projection and Trait goals, with nested goals either resulting in an inductive cycle or hitting the recursion limit (should also be fine if there's just a single candidate).

Also write tests where we rely on these constraints from a nested goal, i.e. where we need the impl header eq constraints of a goal from the where-bounds.

QUESTION: Can we delay stabilizing our non-fatal overflow behavior

I personally think it is acceptable to delay the stabilization of a "ready" implementation of -Ztrait-solver=next-coherence to collect more data while working on full -Ztrait-solver=next. We should still publish a blogpost asking for testing and stating that it is ready for stabilization.

We cannot completely avoid non-fatal overflow in -Ztrait-solver=next-coherence, as typenum should continue to compile. We could emit a deny-by-default lint when people depend on our current overflow handling. This lint would hit quite a few crates, however, so it's not ideal.

There's also the question of whether and where to drop constraints from overflow. If this behavior should affect non-recursion-depth-based overflow, e.g. inductive cycle fixpoints, then we pretty much have to decide on that behavior already.

Advantages of stabilizing -Ztrait-solver=next-coherence

  • remove the coherence support from the old solver, reducing complexity
  • get more testing of the new solver, at least for the behavior relied upon in coherence
  • milestone showing that the new solver is making progress, alleviate concerns about the type system being stuck
  • fix bugs in coherence and have sensible behavior with respect to binders. Mostly negligible: https://hackmd.io/ABcskdRCRj6WuE3TeX9zEQ
  • the positive impact is overwhelmingly social. there are limited technical benefits from stabilizing it.