# All hands 2025 discussion
## -Z next-solver
lcnr updating interested parties on the status of the next solver
### Status
- https://github.com/rust-lang/rust/pull/133502
- bootstrap PR
- after bootstrap worked we're now doing crater runs
- general structure shouldn't change anymore (feels quite stable)
- normalization is settled
- opaque types is in flight, but does what it should
- mostly supports existing behaviour
- all big crates compile or were broken intentionally
- big failing crates
- bevy, minijinja
- fishy stuff with higher ranked trait bounds
- 660 regressions (overcount)
- bevy crates that didn't update cargo.lock
- generic const expressions fundamentally broken
- 100 crates depend on it
- rip the feature out before we stabilize next solver
- "blocked" on camelid (in 6 months, not yet)
- currently working on mgca which allows most of what GCE allowed
- most crates should be able to migrate
- post mono errors will still work on nightly in broken ways
- could break it without having a replacement
- bad vibes
- will work again in the future, kinda annoying churn if they can't update rustc for a few months
- could require ppl to switch to old solver after default is new solver
- if the only remaining issue is gce, then we could, but at that point just have T-types go all hands on deck for 3 weeks to do something
- if that fails, just yeet gce and :shrug:
- still some crates that are slow with next solver
- "known unknown".
- from the perf suite
- spurious regressions from crater runs, [around 1500](https://crater-reports.s3.amazonaws.com/pr-133502-13/index.html)
- need to check them individually (OOMs, and timeouts)
- https://github.com/rust-lang/trait-system-refactor-initiative/issues/168
- both bevy and minijinja
- ppl relying on relating HR alias types
- can't lint (no future incompat)
- well can, but super expensive (run hir typeck twice)
- would have to stop eagerly emitting errors from typeck
- only been two crates, and need to write very type system heavy crates
- surprisingly rare to be written in a crate
- used a lot (e.g. bevy)
- can always add explicit type annotations, since it's just a few crates, just fix them
- officially inference breakage is not a breaking change
- in practice, still :aaaaaaa:
- bevy point release on old versions and new versions
- fixing minijinja was harder (fixing without breaking users is hard)
- working in both solvers is hard
- just making it work in new one is hard
- oli: cfg(next-solver)
- lcnr: no
- boxy: build.rs version dependence
- actually an option
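A minimal sketch of what "build.rs version dependence" could look like, assuming the crate gates a workaround on the compiler's minor version. The cfg name `solver_workaround` and the version threshold are hypothetical; nothing in the discussion fixed either.

```rust
/// Extracts the minor version from `rustc --version` output,
/// e.g. "rustc 1.88.0-nightly (abc 2025-05-01)" -> Some(88).
fn rustc_minor(version_output: &str) -> Option<u32> {
    let version = version_output.split_whitespace().nth(1)?;
    let mut parts = version.split('.');
    let _major = parts.next()?;
    parts.next()?.parse().ok()
}

/// Returns the `cargo:` directives a build script would print.
/// Both the cfg name and the threshold are hypothetical.
fn cfg_directives(version_output: &str, first_minor_needing_workaround: u32) -> Vec<String> {
    // Declare the cfg so that `unexpected_cfgs` does not warn about it.
    let mut lines = vec!["cargo:rustc-check-cfg=cfg(solver_workaround)".to_string()];
    if rustc_minor(version_output).is_some_and(|minor| minor >= first_minor_needing_workaround) {
        lines.push("cargo:rustc-cfg=solver_workaround".to_string());
    }
    lines
}
```

In an actual `build.rs`, `main` would run the compiler named by the `RUSTC` environment variable with `--version` (via `std::process::Command`), pass its stdout to `cfg_directives`, and print each returned line. The crate would then pick the code path with `#[cfg(solver_workaround)]`.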
- project board for next solver
- https://github.com/orgs/rust-lang/projects/61
- issue #89 still open (alias-relate when normalizing in unnormalized env causes overflow)
- https://github.com/rust-lang/trait-system-refactor-initiative/issues/89
- currently normalize envs eagerly
- not handling aliases correctly
- next solver doesn't do that
- `<T as Trait>::Assoc: OtherTrait`
- normalize via impl, where bound shadows impl
- overflow in new solver
- hit by tachys (biggerish crate)
- fixing it is easy (just write out the where bound in code instead of making the trait solver do it)
- overflow is correct, but sad to break large crate
- just should fix the crate
- does it also affect other stuff
- is breakage necessary?
- lcnr, errs: yes necessary
- lcnr wants to understand the shadowing
- ppl generally want the shadowing to make things work
- ppl generally don't want the shadowing because it breaks things
- can only have one or the other, so wanna pick no shadowing
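For readers unfamiliar with the shape of the bound, here is a small sketch of "writing out the where bound in code". It only shows what a hand-written bound on `<T as Trait>::Assoc` looks like; it does not reproduce the overflow from issue #89, and all names below are made up for illustration.

```rust
trait Trait {
    type Assoc;
}

trait OtherTrait {
    fn describe(&self) -> &'static str;
}

impl Trait for u8 {
    type Assoc = u16;
}

impl OtherTrait for u16 {
    fn describe(&self) -> &'static str {
        "u16"
    }
}

// The bound on the projection is spelled out by hand. Inside the body
// `T::Assoc` stays generic, so `describe` is resolved via this where bound.
fn describe_assoc<T>(value: T::Assoc) -> &'static str
where
    T: Trait,
    <T as Trait>::Assoc: OtherTrait,
{
    value.describe()
}
```

At a call site such as `describe_assoc::<u8>(7u16)`, the projection normalizes through the impl to `u16`, and the bound is checked against `impl OtherTrait for u16`.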
- rustdoc funness (fmease working on it)
- MIR ICE only in one crate (werk)
- https://github.com/rust-lang/trait-system-refactor-initiative/issues/199
- if someone ("no one" looks at @lqd) who is good at minimizing crates has time to do this after all hands that would be great
- apart from getting opaques to work, those should be the big issues
- apart from everything above, perf is also an issue
- errs is talking with folk on getting rustc perf suite set up to test next solver
- at least run a few important crates with new and old solver in perf
- errs will make a list
- wg-grammar borked
- fixed gll, now main crate times out
- without a fast path, rayon hangs
- fast paths are not cool
- fast paths are brittle
- small changes in crate can cause fast path to fail again, causing everything to take forever
- https://github.com/rust-lang/trait-system-refactor-initiative/issues/109#issuecomment-2700947654
- main issue
- canonicalization in the new solver
- when moving things into the cache
- freshening just produces new vars, making the cache entries incompatible
- coinductive cycles need to be rerun
- until fixpoint
- in this minimization rerun happens for every where clause
- exponential blowup
- solutions too complex for this discussion
- want to support
- needs a different approach (can't just change caching)
- whenever we rerun, we get a separate goal, which doesn't match cache entries
- need to change the way we handle canonical cycle reruns
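A toy model of the two points above, not the solver's actual implementation: a cycle head is rerun with its previous result as the provisional result until nothing changes, and if every nested cycle head reruns its body without hitting the cache, the work multiplies per nesting level. Function names and the cost model are assumptions made for illustration.

```rust
/// Reruns `step` with the previous result as the provisional result until
/// the result stops changing. Returns the final result and the number of
/// evaluations, or `None` if no fixpoint is reached within `max_iters`.
fn rerun_until_fixpoint<R: PartialEq>(
    initial: R,
    max_iters: usize,
    mut step: impl FnMut(&R) -> R,
) -> Option<(R, usize)> {
    let mut provisional = initial;
    for evals in 1..=max_iters {
        let result = step(&provisional);
        if result == provisional {
            return Some((result, evals));
        }
        provisional = result;
    }
    None
}

/// Leaf evaluations when `depth` nested cycle heads each rerun their body
/// `reruns_per_head` times and no rerun can reuse a cache entry.
fn uncached_evals(depth: u32, reruns_per_head: u64) -> u64 {
    if depth == 0 {
        1
    } else {
        reruns_per_head * uncached_evals(depth - 1, reruns_per_head)
    }
}
```

With two reruns per head, ten nested heads already give `uncached_evals(10, 2) == 1024` leaf evaluations, which is the kind of growth the notes call exponential blowup.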
### Involving others
- perf work
- diagnostics
- few hundreds
- maybe compare-mode is not enough to have these
## Polonius (opaque types)
- new impl in-tree a lot easier than doing it out of tree and then integrating
- amanda and lqd here
- 2 goals/opaques
- explain what opaque types mean for borrowck (new solver specifically)
- figure out an interface between polonius and rustc
- why does polonius need opaque type knowledge
- polonius cares about the way we generate the region graph
- member constraints are in region graph
- lcnr PR solving everything :rainbow: :unicorn_face: :rainbow:
- https://github.com/rust-lang/rust/pull/139587
- core idea:
- opaques currently:
- map uses: key -> hidden type
- uses have to be fully defining
- for each opaque we apply member constraints
- each lifetime in hidden type needs to be equal to one of the member constraints
- has to happen for all uses
- new solver
- we now normalize everywhere
- opaque type uses without universal lifetimes are not defining
```rust
fn foo(x: &u32) -> impl Sized + '_ {
    let _ = || {
        let temp = 1;
        // normalization of the return type results
        // in `opaque<'local1> = &'local2 u32`.
        foo(&temp);
    };
    x
}
```
function `foo` has a recursive call. return type of `foo` was never used (previously)
now we normalize opaque type. mir typeck equates hidden type with unconstrained regions
key->hidden map we get `opaque<'local1> = &'local2 u32`
new: either defining or revealing.
defining: universal + member constraints
revealing: just get value out of the defining/universal map and instantiate with current lifetimes
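The defining/revealing split can be sketched with a toy key -> hidden type map. This is an illustration under simplifying assumptions, not rustc's data structures: `Ty`, `Region`, and `OpaqueMap` are made-up stand-ins, and member constraints are not modelled.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, PartialEq)]
enum Region {
    /// The i-th lifetime argument of the opaque type (universal in the definer).
    Param(usize),
    /// Any other region, e.g. a local one at the use site.
    Named(String),
}

#[derive(Debug, Clone, PartialEq)]
enum Ty {
    U32,
    Ref(Region, Box<Ty>),
}

#[derive(Default)]
struct OpaqueMap {
    hidden: HashMap<String, Ty>,
}

impl OpaqueMap {
    /// Defining use: record the hidden type, rejecting a conflicting one.
    fn define(&mut self, opaque: &str, hidden: Ty) -> Result<(), String> {
        if let Some(prev) = self.hidden.get(opaque) {
            if *prev != hidden {
                return Err(format!("conflicting hidden types for {opaque}"));
            }
        }
        self.hidden.insert(opaque.to_string(), hidden);
        Ok(())
    }

    /// Revealing use: look up the hidden type and instantiate its lifetime
    /// parameters with the regions at the use site.
    fn reveal(&self, opaque: &str, args: &[Region]) -> Option<Ty> {
        instantiate(self.hidden.get(opaque)?, args)
    }
}

fn instantiate(ty: &Ty, args: &[Region]) -> Option<Ty> {
    Some(match ty {
        Ty::U32 => Ty::U32,
        Ty::Ref(region, inner) => {
            let region = match region {
                Region::Param(i) => args.get(*i)?.clone(),
                Region::Named(_) => region.clone(),
            };
            Ty::Ref(region, Box::new(instantiate(inner, args)?))
        }
    })
}
```

For the `foo` example, the defining use would record `&'0 u32` for the opaque, and the recursive call inside the closure would reveal it as `&'local1 u32` rather than contributing its own entry.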
problem: can't equate types in regioncx part of borrowck, only in mir typeck
need to be able to apply member constraints and later be able to still equate types and do normalization :poop:
do mir typeck. Then create region graph, then drop everything, then do it again from the start with the knowledge
amanda: could do scc construction incrementally by keeping stuff around in first round
lcnr: not a perf issue because we don't look at liveness and local regions
opaque type bounds are only for creating constraints
typeck needs to give us more data so we can do type comparisons after it is done
need to be able to propagate revealing use constraints to outside the closure for defining use (look at example above)
when we borrowck a function we run mir typeck for all nested bodies until MIR typeck is done -> return
run minimal version of borrowck/regionck to do region graph construction on nested items
ignore applied revealing uses
then do defining use stuff
then resume borrowck without closures
caveats:
* still relies on mutably captured upvars
* then run borrowck on nested items revealing hidden types
TAITs?
defining uses in children do not work
lcnr worried about 1 thing
when do you handle uses of opaques in child bodies
iterate over all children and collect all uses
reveal after the fact and make sure they are all the same
other approaches have been attempted.
completely merge borrowck between children and parent.
very big change, scary and unclear
action item: oli write pretty graph
action item: lcnr dev guide chapter (boxy: yay)
### how do opaques interact with there being multiple different region graphs?
can edges of free regions differ between different locations in cfg?
do you need outlives edges between free regions in some locations in the graph but not in others?
free regions: in function signature
lqd: can't make free regions location sensitive
if we find any region that relates to other free region... oli lost track of discussion
argument regions are handleable, but free regions relating to arg regions
build different regions graphs in different locations in cfg
each location has to check for universal region errors
instead of trying to be smart, just check it for all the region graphs
works but may be slow
starting to sound like circular dependency (want to know what is in graph to know what to check)
lqd: should be fine?
do we have location sensitivity in the compiler at all?
not impled yet?
some constraint things are done by data flow
not edges themselves, outlives constraints are not location sensitive
one region graph per location (not yet)
one big graph, and locations point into it
thus member constraints not handled yet
plan: only handle in scc construction, then everything just uses that
member constraints are always universal
if there is an edge pointing at member constraint, it must automatically escape from the function
after applying member constraints it doesn't matter anymore. member region is def universal.
universal region must not loan anything. Must not be from any borrow inside the function. not from any local.
polonius-next still relies on location-insensitive scc
we know the location where we added the use of the opaque
could add member constraint to that location sensitive scc
scraping opaque type uses could add a location to them
how does this help?
when we compute which member constraint to apply that depends on the region graph that we have rn
there is no single region graph anymore
so scope that as well. region graph at location and defining use at location.
everybody (un)happy
amount of loans in each region gets larger
if you have a cfg where one block is always before another
does the later or the earlier block have more region constraints
can you even make a statement here?
what is more?
in polonius we have region constraints assigned to some location in the scc and need to propagate until fixed point
do we prop forward or backward?
we don't need to prop them
we just look at every location and the cfg
constructing region graph
do you prop from one location to another?
used to, trying not to anymore
with the constraint that uses happen at locations, that is a breaking change
if that location could have fewer region constraints than the final region graph, then member constraints can have ambiguity more often.
if we remove constraints we may remove constraints on the member constraints, then we may have more options.
should be able to construct example. Some member constraint that matches some region. region needs to be in different paths where it causes ambiguity. currently no examples possible.
theoretical back compat issues with polonius-next
could happen in type outlives, too, not just member constraints.
full expressivity: type outlives should also be location sensitive.
actionables??
member constraints and polonius seems unsolved
not blocking new solver, but does block polonius
how do you prop member constraints between locations
at return points we pretty much propped all constraints, so no change to current version
if we had member constraints early, then.... used to prop forwards. Early def uses would be worse than late defining uses, because early had fewer constraints to work with to constrain member constraints
defining uses for taits are a problem because you can define them arbitrarily. RPITs are always uses in return type eagerly normalized. TAITs in signature is fine. TAIT in args. ATPIT is fine, TAIT not so much
it is sound to just pick a random one from member constraints. Whatever we do with member constraints, everything is sound, just semver and errors and back compat
type outlives constraints only has one test (maybe 2) for verified FEQ bounds
## Rust-analyzer using next-solver
jack wrote big hacky PR:
- current state pre local changes:
- core types still use chalk-ir
- trait solving lowers to `rustc_type_ir`:
- errs question: unsupported aliases?
- just not supported
- idea: transition, required parts of chalk in-tree for transition
- not that much effort, requires more work, not a high priority
- divergence between chalk and rustc
- alias-eq and opaque type support
- current impl requires pretty much no changes on rustc side
- next step
- chalk gives answer: "i don't know enough, here are current requirements"
- equal to canonical response with Maybe + inference guidance
- jack currently only applies `Yes` solutions
- applying `Maybe` responses should be fine, needs new testing
- internal inference table to next-solver
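The difference between applying only `Yes` solutions and also applying `Maybe` responses can be sketched as below. This is a toy stand-in, not the `rustc_type_ir` or rust-analyzer API: `Certainty`, `Response`, and the slice-based inference table are hypothetical simplifications.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Certainty {
    Yes,
    Maybe,
}

struct Response {
    certainty: Certainty,
    /// (inference variable index, inferred type name)
    bindings: Vec<(usize, &'static str)>,
}

/// Writes the response's bindings into the inference table.
/// `Maybe` responses are only applied when `apply_maybe` is set.
/// Returns whether the response was applied.
fn apply_response(
    table: &mut [Option<&'static str>],
    response: &Response,
    apply_maybe: bool,
) -> bool {
    if response.certainty == Certainty::Maybe && !apply_maybe {
        return false;
    }
    for &(var, ty) in &response.bindings {
        if let Some(slot) = table.get_mut(var) {
            *slot = Some(ty);
        }
    }
    true
}
```

With `apply_maybe = false` this mirrors the current behaviour described above, where ambiguous responses leave the table untouched; flipping the flag models using the inference guidance from a `Maybe` response.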
- in r-a: accept 23 failing tests and fix as we go?
- r-a is already slightly broken, tests only detect ok -> fail, no tests for fail -> yes
- test with large projects, run analysis stats command to detect divergence between solvers
- some of them is "chalk output for malformed programs"
- still have duplicates for a bunch of stuff
- uplifting stuff should happen on a case-by-case basis, chat about it on zulip
- would be great
- current `rustc_type_ir` is quite close to rustc, can improve and generalize
- live program testing during the meeting on r-a
- we care about
- what fails, hit TODOs
- general vibes: looks good
- unresolved types 133, current r-a is 145
- gamer! NOT ACTUALLY, tested the wrong branch :<
- interesting things
- `Copy` bounds now just use tls, kinda okay
- original attempt tried to more closely match chalk to simplify r-a
- split `Interner` and `TypeIr` part
- `Copy` to `Clone` change
- lcnr q: could convert to `Clone` more easily with `.use`
- yeah-ish
## How to do meetings
- weekly zulip meetings, triage bot pre ping whether anything to talk about
- cancel if no interest or if no availability
- monthly jitsi sync meeting, first week of the month (though flexible)
- semi-unrelated: want to have a ~2h deep dive for the trait solver, maybe lcnr/errs double team the talk
## Dyn traits
- https://hackmd.io/EPO_VOkSSwSVU8SfDoO-SA
- https://github.com/rust-lang/rust/pull/140824
- all solutions suck :3
### Overlapping user-written impls with built-in dyn impl
unsoundness. impl where there is overlapping built-in impl
```rust
impl<T: Any> Any for T {}
```
overlaps with `Any for dyn Any`
new trait solver needs to (unsoundly lol) special case this case
in generic context where you check via generic impl, but during mono we may end up checking the concrete impl
previously thought that only allowing this with traits with methods would be fine. Also unsound, b/c builtin impl can have stronger region bounds than the user written impl.
{%preview https://github.com/rust-lang/rust/issues/57893#issuecomment-2860064425 %}
TLDR: the Any case can't be supported without a lot of very interesting hacks that we don't know if are possible at all
### Unsoundness fix vibe checks
## Blog posts
yes
- One post after bootstrap requirement PRs are landed -> request for testing + self-congratulation of all our hard work
- Types team update (yearly) -> summarize major work done, FCPs (maybe), have some numbers at least, roadmap updates?
### roadmap
- polonius
- 3/4 done ish (+ needs perf work)
- some unimplemented stuff
- roadmap estimate kinda good, recontextualized
## Search graph (lqd)
- still needs done, lqd will work on it :3
## todo(lcnr): fake stabilization
next trait solver breaks everything
do FCP to land new solver on nightly for 3 weeks then revert before it gets into beta
switch for 3 weeks until beta cutoff so that nightly tests old solver again so no one starts relying on new solver