All hands 2025 discussion

# All hands 2025 discussion

## -Z next-solver

lcnr updating interested parties on the status of the next solver

### Status

- https://github.com/rust-lang/rust/pull/133502
  - bootstrap PR
  - after bootstrap worked we're now doing crater runs
- general structure shouldn't change anymore (feels quite stable)
  - normalization is settled
  - opaque types is in flight, but does what it should
- mostly supports existing behaviour
  - all big crates compile or were broken intentionally
- big failing crates
  - bevy, minijinja
    - fishy stuff with higher ranked trait bounds
  - 660 regressions (overcount)
    - bevy crates that didn't update cargo.lock
- generic const expressions fundamentally broken
  - 100 crates depend on it
  - rip the feature out before we stabilize next solver
    - "blocked" on camelid (in 6 months, not yet)
    - currently working on mgca which allows most of what GCE allowed
    - most crates should be able to migrate
    - post mono errors will still work on nightly in broken ways
  - could break it without having a replacement
    - bad vibes
    - will work again in the future, kinda annoying churn if they can't update rustc for a few months
  - could require ppl to switch to old solver after default is new solver
  - if only remaining issue is gce, then we could, but just at that point just T-types all hands on deck for 3 weeks to just do something
    - if that fails, just yeet gce and :shrug:
- still some crates that are slow with next solver
  - "known unknown".
  - from the perf suite
- spurious regressions from crater runs, [around 1500](https://crater-reports.s3.amazonaws.com/pr-133502-13/index.html)
  - need to check them individually (OOMs and timeouts)
- https://github.com/rust-lang/trait-system-refactor-initiative/issues/168
  - both bevy and minijinja
  - ppl relying on relating HR alias types
  - can't lint (no future incompat)
    - well can, but super expensive (run hir typeck twice)
    - would have to stop eagerly emitting errors from typeck
  - only been two crates, and need to write very type system heavy crates
    - surprisingly rare to be written in a crate
    - used a lot (e.g. bevy)
  - can always add explicit type annotations, since it's just a few crates, just fix them
  - officially inference breakage is not a breaking change
    - in practice, still :aaaaaaa:
  - bevy point release on old versions and new versions
  - fixing minijinja was harder (fixing without breaking users is hard)
    - working in both solvers is hard
    - just making it work in new one is hard
    - oli: cfg(next-solver)
      - lcnr: no
    - boxy: build.rs version dependence
      - actually an option
- project board for next solver
  - https://github.com/orgs/rust-lang/projects/61
- issue #89 still open (alias-relate when normalizing in unnormalized env causes overflow)
  - https://github.com/rust-lang/trait-system-refactor-initiative/issues/89
  - currently normalize envs eagerly
    - not handling aliases correctly
    - next solver doesn't do that
  - `<T as Trait>::Assoc: OtherTrait`
    - normalize via impl, where bound shadows impl
    - overflow in new solver
  - hit by tachys (biggerish crate)
    - fixing it is easy (just write out the where bound in code instead of making the trait solver do it)
    - overflow is correct, but sad to break large crate
    - just should fix the crate
  - does it also affect other stuff
  - is breakage necessary?
    - lcnr, errs: yes necessary
  - lcnr wants to understand the shadowing
    - ppl generally want the shadowing to make things work
    - ppl generally don't want the shadowing because it breaks things
    - can only have one or the other, so wanna pick no shadowing
- rustdoc funness (fmease working on it)
- MIR ICE only in one crate (werk)
  - https://github.com/rust-lang/trait-system-refactor-initiative/issues/199
  - if someone ("no one" looks at @lqd) who is good at minimizing crates has time to do this after all hands that would be great
- apart from getting opaques to work, those should be the big issues
- apart from everything above, perf is also an issue
  - errs is talking with folk on getting rustc perf suite set up to test next solver
  - at least run a few important crates with new and old solver in perf
    - errs will make a list
- wg-grammar borked
  - fixed gll, now main crate times out
  - without a fast path, rayon hangs
    - fast paths are not cool
    - fast paths are brittle
      - small changes in crate can cause fast path to fail again, causing everything to take forever
  - https://github.com/rust-lang/trait-system-refactor-initiative/issues/109#issuecomment-2700947654
  - main issue
    - canonicalization in the new solver
      - when moving things into the cache
      - freshening just produces new vars, making the cache entries incompatible
    - coinductive cycles need to be rerun
      - until fixpoint
    - in this minimization rerun happens for every where clause
      - exponential blowup
  - solutions too complex for this discussion
    - want to support
    - needs a different (can't just change caching)
    - whenever we rerun, we get a separate goal, which doesn't match cache entries
    - need to change the way we handle canonical cycle reruns

### Involving others

- perf work
- diagnostics
  - few hundreds
  - maybe compare-mode is not enough to have these

## Polonius (opaque types)

- new impl in-tree a lot easier than doing it out of tree and then integrating
- amanda and lqd here
- 2 goals/opaques
  - explain what opaque types
    mean for borrowck (new solver specifically)
  - figure out an interface between polonius and rustc
- why does polonius need opaque type knowledge
  - polonius cares about the way we generate the region graph
  - member constraints are in region graph
- lcnr PR solving everything :rainbow: :unicorn_face: :rainbow:
  - https://github.com/rust-lang/rust/pull/139587
- core idea:
  - opaques currently:
    - map uses: key -> hidden type
    - uses have to be fully defining
    - for each opaque we apply member constraints
      - each lifetime in hidden type needs to be equal to one of the member constraints
      - has to happen for all uses
  - new solver
    - we now normalize everywhere
    - opaque type uses without universal lifetimes are not defining

```rust
fn foo(x: &u32) -> impl Sized + '_ {
    let _ = || {
        let temp = 1;
        // normalization of the return type results
        // in `opaque<'local1> = &'local2 u32`.
        foo(&temp);
    };
    x
}
```

function `foo` has a recursive call.
return type of `foo` was never used (previously)
now we normalize opaque type. mir typeck equates hidden type with unconstrained regions
in the key->hidden map we get `opaque<'local1> = &'local2 u32`

new: either defining or revealing.
defining: universal + member constraints
revealing: just get value out of the defining/universal map and instantiate with current lifetimes

problem: can't equate types in regioncx part of borrowck, only in mir typeck
need to be able to apply member constraints and later be able to still equate types and do normalization

:poop: do mir typeck.
Then create region graph, then drop everything, then do it again from the start with the knowledge

amanda: could do scc construction incrementally by keeping stuff around in first round
lcnr: not a perf issue because we don't look at liveness and local regions

opaque type bounds are only for creating constraints
typeck needs to give us more data so we can do type comparisons after it is done
need to be able to propagate revealing use constraints to outside the closure for defining use (look at example above)

when we borrowck a function we run mir typeck for all nested bodies
until MIR typeck is done -> return
run minimal version of borrowck/regionck to do region graph construction on nested items
ignore applied revealing uses
then do defining use stuff
then resume borrowck without closures
caveats:
* still relies on mutably captured upvars
* then run borrowck on nested items revealing hidden types

TAITs? defining uses in children do not work

lcnr worried about 1 thing
when do you handle uses of opaques in child bodies
iterate over all children and collect all uses
reveal after the fact and make sure they are all the same

other approaches have been attempted. completely merge borrowck between children and parent. very big change, scary and unclear

action item: oli write pretty graph
action item: lcnr dev guide chapter (boxy: yay)

### how do opaques interact with there being multiple different region graphs?

can edges of free regions differ between different locations in cfg?
do you need outlives edges between free regions in some locations in the graph but not in others?
free regions: in function signature
lqd: can't make free regions location sensitive
if we find any region that relates to other free region...
oli lost track of discussion

argument regions are handleable, but free regions relating to arg regions build different region graphs in different locations in cfg
each location has to check for universal region errors
instead of trying to be smart, just check it for all the region graphs
works but may be slow
starting to sound like circular dependency (want to know what is in graph to know what to check)
lqd: should be fine?

do we have location sensitivity in the compiler at all? not impled yet?
some constraint things are done by data flow
not edges themselves, outlives constraints are not location sensitive
one region graph per location (not yet)
one big graph, and locations point into it
thus member constraints not handled yet
plan: only handle in scc construction, then everything just uses that

member constraints are always universal
if there is an edge pointing at member constraint, it must automatically escape from the function
after applying member constraints it doesn't matter anymore. member region is def universal.
universal region must not loan anything. Must not be from any borrow inside the function. not from any local.

polonius-next still relies on location-insensitive scc
we know the location where we added the use of the opaque
could add member constraint to that location sensitive scc
scraping opaque type uses could add a location to them
how does this help?
when we compute which member constraint to apply that depends on the region graph that we have
rn there is no single region graph anymore
so scope that as well. region graph at location and defining use at location.
everybody (un)happy

amount of loans in each region gets larger
if you have a cfg where one block is always before another
does the later or the earlier block have more region constraints
can you even make a statement here? what is more?
in polonius we have region constraints assigned to some location in the scc and need to propagate until fixed point
do we prop forward or backward?
we don't need to prop them
we just look at every location and the cfg

constructing region graph
do you prop from one location to another? used to, trying not to anymore

with the constraint that uses happen at locations, that is a breaking change
if that location could have fewer region constraints than the final region graph, then member constraints can have ambiguity more often.
if we remove constraints we may remove constraints on the member constraints, then we may have more options.
should be able to construct example. Some member constraint that matches some region. region needs to be in different paths where it causes ambiguity.
currently no examples possible.
theoretical back compat issues with polonius-next
could happen in type outlives, too, not just member constraints.
full expressivity: type outlives should also be location sensitive.

actionables??
member constraints and polonius seems unsolved
not blocking new solver, but does block polonius

how do you prop member constraints between locations
at return points we pretty much propped all constraints, so no change to current version
if we had member constraints early, then....
used to prop forwards. Early def uses would be worse than late defining uses, because early had fewer constraints to work with to constrain member constraints

defining uses for taits are a problem because you can define them arbitrarily. RPITs are always uses in return type
eagerly normalized. TAITs in signature is fine. TAIT in args. ATPIT is fine, TAIT not so much

it is sound to just pick a random one from member constraints. Whatever we do with member constraints, everything is sound, just semver and errors and back compat

type outlives constraints only has one test (maybe 2) for verified FEQ bounds

## Rust-analyzer using next-solver

jack wrote big hacky PR:

- current state pre local changes:
  - core types still use chalk-ir
  - trait solving lowers to `rustc_type_ir`:
    - errs question: unsupported aliases?
      - just not supported
- idea: transition, required parts of chalk in-tree for transition
  - not that much effort, requires more work, not a high priority
- divergence between chalk and rustc
  - alias-eq and opaque type support
- current impl requires pretty much no changes on rustc side
- next step
  - chalk gives answer: "i don't know enough, here are current requirements"
    - equal to canonical response with Maybe + inference guidance
    - jack currently only applies `Yes` solutions
    - applying `Maybe` responses should be fine, needs new testing
  - internal inference table to next-solver
- in r-a: accept 23 failing tests and fix as we go?
  - r-a is already slightly broken, tests only detect ok -> fail, no tests for fail -> yes
  - test with large projects, run analysis stats command to detect divergence between solvers
  - some of them are "chalk output for malformed programs"
- still have duplicates for a bunch of stuff
  - uplifting stuff should happen on a case-by-case basis, chat about it on zulip
  - would be great
  - current `rustc_type_ir` is quite close to rustc, can improve and generalize
- live program testing during the meeting on r-a
  - we care about
    - what fails, hit TODOs
  - general vibes: looks good
  - unresolved types 133, current r-a is 145
    - gamer!
      NOT ACTUALLY, tested the wrong branch :<
- interesting things
  - `Copy` bounds now just use tls, kinda okay
  - original attempt tried to more closely match chalk to simplify r-a
  - split `Interner` and `TypeIr` part
  - `Copy` to `Clone` change
    - lcnr q: could convert to `Clone` more easily with `.use`
      - yeah-ish

## How to do meetings

- weekly zulip meetings, triage bot pre ping whether anything to talk about
  - cancel if no interest or if no availability
- monthly jitsi sync meeting, first week of the month (though flexible)
- semi-unrelated: want to have a ~2h deep dive for the trait solver, maybe lcnr/errs double team the talk

## Dyn traits

- https://hackmd.io/EPO_VOkSSwSVU8SfDoO-SA
- https://github.com/rust-lang/rust/pull/140824
- all solutions suck :3

### Overlapping user-written impls with built-in dyn impl unsoundness.

impl where there is overlapping built-in impl

```
impl<T: Any> Any for T {}
```

overlaps with `Any for dyn Any`

new trait solver needs to (unsoundly lol) special case this case

in generic context where you check via generic impl, but during mono we may end up checking the concrete impl

previously thought that only allowing this with traits with methods would be fine. Also unsound, b/c builtin impl can have region bounds that are stronger than the user written impl.

{%preview https://github.com/rust-lang/rust/issues/57893#issuecomment-2860064425 %}

TLDR: the Any case can't be supported without a lot of very interesting hacks that we don't know if are possible at all

### Unsoundness fix vibe checks

## Blog posts

yes

- One post after bootstrap requirement PRs are landed -> request for testing + self-congratulation of all our hard work
- Types team update (yearly) -> summarize major work done, FCPs (maybe), have some numbers at least, roadmap updates?

### roadmap

- polonius
  - 3/4 done ish (+ needs perf work)
  - some unimplemented stuff
- roadmap estimate kinda good, recontextualized

## Search graph (lqd)

- still needs to be done, lqd will work on it :3

## todo(lcnr): fake stabilization

next trait solver breaks everything
do FCP to land new solver on nightly for 3 weeks
then revert before it gets into beta
switch for 3 weeks until beta cutoff so that nightly tests old solver again
so no one starts relying on new solver
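
Side note on the "build.rs version dependence" option boxy raised in the next-solver status discussion: a minimal sketch of what a crate could do, not something agreed in the meeting. It assumes the crate keys its workaround off the rustc minor version. The threshold constant and the cfg name are hypothetical placeholders, since no release has been picked for the new solver.

```rust
// build.rs (sketch)
use std::env;
use std::process::Command;

/// Extracts the minor version from `rustc --version` output,
/// e.g. "rustc 1.88.0-nightly (abc123 2025-05-01)" gives `Some(88)`.
fn parse_rustc_minor(version_output: &str) -> Option<u32> {
    // Second whitespace-separated token is the version, e.g. "1.88.0-nightly".
    let version = version_output.split_whitespace().nth(1)?;
    let mut parts = version.split('.');
    let _major = parts.next()?;
    parts.next()?.parse().ok()
}

// HYPOTHETICAL: placeholder for "first release where the new solver is the
// default". No such version has been decided.
const FIRST_MINOR_WITH_NEW_SOLVER: u32 = 999;

fn main() {
    // Cargo sets `RUSTC` for build scripts; fall back to plain `rustc`.
    let rustc = env::var("RUSTC").unwrap_or_else(|_| "rustc".to_string());
    let minor = Command::new(rustc)
        .arg("--version")
        .output()
        .ok()
        .and_then(|out| String::from_utf8(out.stdout).ok())
        .and_then(|text| parse_rustc_minor(&text));

    // Declare the (hypothetical) cfg name so `unexpected_cfgs` stays quiet.
    println!("cargo:rustc-check-cfg=cfg(hypothetical_new_solver_workaround)");
    if let Some(minor) = minor {
        if minor >= FIRST_MINOR_WITH_NEW_SOLVER {
            println!("cargo:rustc-cfg=hypothetical_new_solver_workaround");
        }
    }
}
```

The crate would then gate the solver-specific code path with `#[cfg(hypothetical_new_solver_workaround)]`. This only distinguishes compiler versions, not which solver is actually enabled, so it would not help users passing `-Z next-solver` on an older nightly.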