Libs Meetings
Minutes
Meeting Link: https://meet.jit.si/rust-libs-meeting-crxoz2at8hiccp7b3ixf89qgxfymlbwr
Attendees: Amanieu, Mara, David, Josh Triplett, Chris Denton, TC, Urgau, Mark, Jubilee, Josh Stone, The 8472
Change: https://github.com/rust-lang/rust/pull/99969
Regression: https://github.com/rust-lang/rust/issues/127343
Amanieu: Summary: when there's only one trait impl, type inference will use that and pick that type. That breaks once we add a second impl.
Mara: E.g. x.into()
Josh: Here it was a combination of Into::into
with .collect(); normally type inference breaks if there's an arbitrary intermediate type, but there was only one possible impl of into
here, so it succeeded using the only-one-impl rule. When we added a new impl, type inference could no longer succeed.
Jubilee: The From
trait is used a lot, but has a horrible design. The reflexive impl (From<T> for T
) means that there is always an implementation, which is useful, but also means that if you convert from T to T it can be unambiguous and break when a new impl is added.
Jubilee: This kind of regression comes up more often, and is usually considered okay. But none of them where on the same scale as this one. Thousands of impacted crates.
Amanieu: Root regressions?
Jubilee: Total regressions.
Josh: I am not certain if we have seen a case with thousands of total regressions before, but I think we may have. In this case we saw that there were very few root regressions, and since time
is already fixed, that will fix the root regressions. We substantially underestimated the amount of effort to migrate to the patched time
version.
Josh: Were people using an old major version? Or were people only using an old minor/patch version that could use cargo update
?
Jubilee: Minor versions.
Jubilee: The problem is the assumption 'just running cargo update
is easy'. That's for the maintainer, the person who owns the repo, basically.
Mark: We don't use transitive cargo.lock files. It can fan out in that lots of lock files need updating, but you never need to chase people down to update their lock file.
Jubilee: Packagers do need to chase down the code owners.
Mark: Not on crates.io
Jubilee: Sure, but that's not how all rust software is distributed
Josh: E.g. people encountered breakage in NixOs
Mara: https://github.com/NixOS/nixpkgs/issues/332957
Jubilee: Package maintainers need to convince the software maintainers to update their lock file.
Jubilee: Or patch the lock files, but they don't have anything set up for that.
Mark: There are lots of things that can break if you pin dependencies.
Josh: How about earlier major versions, time 0.1 or 0.2?
Jubilee: Looks like it's all 0.3.
Mark: The breaking code wasn't present in 0.2 and before.
Mark: Looks like the breaking code was introduced in 0.3.17.
Mark: This is not unusual. When we do crater runs, we routinely see close to half these amount of regressions.
David: Yeah, the amount didn't stand out.
Jubilee: How often?
Mark: Every few releases.
Josh: I'm guessing at least one as big as this in the last year.
Josh: The amount of packaging glue in existence will go up, so more impactful over time.
David: Reverse impact as well: 5000 broken crates today is 'less' than it was a few years ago.
Jubilee: We should have a threshold (1000? 5000?) for more mitigations.
Mark: Meta question is: do we actually think something went wrong here. I think no. We evaluated the regressions, and agreed it's okay. Would we have changed our perspective if we knew what we know now. Technical solutions are orthogonal.
Josh: I agree we absolutely followed process. But fall out from this seems to have reached much further than past regressions. E.g. random people on LWN, Reddit, nix, fediverse, …
David: I think that's because cargo started recommending Cargo.lock in library crates.
Mark: I fairly routinely see people say that updating rust broke something. Whether a lint or clippy lint. Always see that, to some degree. Doesn't seem different in magnitude this time.
David: My experience too at work (meta). There were 9 issues we dealt with in the 1.80 upgrade, of which time
was not one.
Josh: Is type inference breakage is a common source?
David: 5-10% usually. Easiest breakage to fix.
Josh Stone: I most commonly see issues with deny(warnings)
and new lints.
Mara: We had process failures in getting things into the release notes. We applied the relnotes label, but nothing ended up in the release notes.
Jubilee: Nothing ended up in the release notes. There was discussion about whether to be slightly more careful with breakage, but no conclusion.
Mark: We've always had this issue with relnotes
. Often there is no comment saying 'why'.
Jubilee: The comment in this case just linked another issue. Then you have to go read that other issue and the comments.
JoshS: I don't even click through on those things. There needs to be another label or something.
Mark: Process change. New label, more context.
The 8472: Teams could maintain their own relnotes.
Mark: I think we can agree to solve this. We can figure out a process.
JoshT: Having that database be public is helpful.
Mara: Another thing to discuss between T-libs-api and T-release: crater. We didn't do a crater run, left it for the beta crater run. Perhaps that was not the right call. Should decide how to do that in the future.
Jubilee: Triaging crater results in hard. Can be very hard to figure out where individual regressions come from, due to e.g. undeterministic tests etc.
Mark: Usually we ignore test failures. Build failurse are usually clear. Not a huge issue.
Mark: Question is whether libs-api wants this input when deciding on a stabilization.
Mark: If you do a crate run first, that won't remove those same results from the beta crater run if you accept the breakage.
Josh: Computer time is much cheaper than human time. If it'd make things easier to do a crater run for each potentially regressing PR, we can do that. Has there been progress on speeding up crater by throwing more parallel resources at it (which should in theory not cost more since it can release resources sooner)?
Jubilee: There is a bias when you make the decision only after the beta crater run. More tempted to just accept.
Amanieu: https://github.com/rust-lang/rust/pull/128254#issuecomment-2277873090 Example comment to release team.
The 8472: Could we create the regression tickets in advance?
Mark: Hard to find them. Don't think worth spending time on.
Mara: Might already be covered by a relnotes-breakage
label.
Jubilee: Would be nice to automate triaging beta crater results a bit more. E.g. determine "dependency on time, type inference failure".
Mark: Happy if someone signs up.
Mara: We discussed:
JoshT: We should at some point discuss:
4. Technical solutions (lang/compiler) to prevent/mitigate this kind of breakage in the future
Mara: Do we have agreement on 1?
Jubilee: Yes, we followed process, but is the process worthy?
Amanieu: We underestimated the impact.
Mara: Are you saying we would we have not done this if we knew? Or waited? Or ?
Amanieu: Probably with some lang thing? Not sure how feasible.
JoshT: We could have waited on e.g. 80% of time
usage to update to a newer version, waiting one or two releases.
TC: Do you think 1-2 releases would be enough for that 80%? I'd expect 2-3 years (without warnings pushing them).
Jubilee: 80% of total compiles, rather than 80% of crates, is what is important.
The 8472: The download statistics already show the newest version dominating.
Amanieu: time
was updated quickly, months ago. 1-2 releases since then due to the release train.
Jubilee: 3 months not quite enough
Amanieu: I don't think time is the problem. If people don't get pushed to update, they will simply not update.
Mark: Compiler could hardcode e.g. time
and give warnings. Or cargo.
Mark: But most of the people impacted here are looking at their build logs.
Jubilee: The breakage hits with 1.80.0, which was the point at which people started running the new compiler. A new lint would have started firing then.
Mark: I don't think extra months would have changed much.
Mara: In terms of decisions: 1. do we want to roll this back? release a point release? I think no?
Josh: No.
Jubilee: No.
Unanimous decision: Not reverting.
Mara: Next question: If this situation shows up tomorrow again, do we want to do anything differently?
TC: People would be less surprised at the next (RFC 1105-allowed) breakage. E.g. the nix people will now have seen that this kind of breakage can happen. As we grow and gain new communities, we pick up new people who have (temporarily) never seen this kind of breakage before.
Amanieu: Make sure it ends up in the relnotes correctly, with instructions.
JoshT: Rely less on the beta crater run.
Mark: Release team can easily narrow down the PR that caused the breakage.
JoshT: but for libs-api it's easier to evaluate sooner, less context switching after a month or two.
Amanieu: Another procedural mistake was a comment that says 'maybe a crater run' that didn't result in anything. We should be less hesitant to do crater runs. Crater queue used to be very long.
Mara: Should we announce breaking relnotes sooner? We know months in advance. Maybe publish 'future breakage' a month in advance or so.
JoshT: Can we automatically notify the crates that regressed in crater? E.g. register something on crates.io where you want an issue filed with some bot.
JoshS: We only want that for accepted changes. Not things that we will revert.
Mark: Might have false positives, looks bad. (e.g. someone might not have noticed if one of their dependencies has already updated and they automatically get the update)
David: I was looking into the size of the impact compared to previous cases. Similar example with 3k total regressions: https://github.com/rust-lang/rust/issues/100551 including socket2
which is a dependency of tokio
and has more downloads than time
David: No angry mob, very little fallout. This is what we expected.
Mara: https://github.com/rust-lang/rust/pull/78802 had a long checklist
David: We did the exact same here, we had a single root regression. we 'checked' that box.
Mara: Right, makes sense.
JoshT: Could we encourage more crates to use dependabot/renovatebot/etc? Would that help?
Jubilee: Is there a dependabot mode to only send issues for changes relevant to this kind of breakage?
Mara: What do we think of NixOs' usage of rust lock files? Should they be doing something different?
Jubilee: They are working on having a single cargo lock file. https://github.com/NixOS/nixpkgs/issues/333702
JoshT: Love to hear how that works out.
DavidT: At Meta we use a single lock file for many thousand crates.
DavidT: About education: do we want to teach people that it's not great to rely on .into.into.map(into).into being reliable?
JoshT: Esteban is working on a lint for trivial .into() (T->T conversion).
David: I think that's a red herring. It also affects .as_ref() etc., not just Into.
JoshT: No one thing is going to fix all sources of breakage, but there may be, for instance, a dozen things we could do that might help with 5% of breakage, and it'd be worth doing some of those.
Amanieu: Lock file could store std version, and inference could ignore newer impls based on the #[stable]
version.
The 8472: Doesn't help with other crates.
Jubilee: Doesn't need to. Only std is a new problem, because people want a new compiler version and are then forced to use the new std.
Mara: What results do we want from this meeting?
Jubilee: I'd like y'all to do what JoshT suggested in an earlier meeting: do a crater run on the PR itself. If found only in beta, immediately revert and investigate.
Jubilee: The time pressure at the beta crater run is an issue. Too easy to just 'coast' and not do anything / accept.
Jubilee: I hate that I have to guess whether libs-api considered something in their decision making or not. Whether information would change your mind. Not having the crater run done first means I can't know if you factored that in or not.
Mara: Sounds like all failures are a matter of expectation management. Needs to be clear which breakage is expected. Solution is the relnotes-breaking thing and doing crater runs directly.
Unanimous agreement.
David: Jubilee said From is poorly designed. Curious to hear more about that.
Jubilee: I can come back on that one.
Jubilee: Would be nice to have turbofish-able into
; for chaining.
Jubilee to write down process change in a PR to the dev guide.
Removed nominated label.
Committing to not causing UB for foreign exceptions.
Amanieu to reply that we're happy with the change.
P-high
regression: mismatched types (panic info hook stuff)Mara: {summary}
Jubilee: Was previously triaged, was accepted. This was just a few extra regressions.
Amanieu: This causes std to no longer be a strict superset of core. same path from core and std are different now. We thought it'd be okay. But R-A has a setting to always import things from core when possible. That would break. Maybe there is a better way of dealing with this.
Mara: std's PanicInfo is renamed to PanicHookInfo. PanicInfo only continues to exist as a deprecated alias.
JoshT: We could remove the alias in an edition.
Amanieu: Fine to leave it as is for now. In the future we can decide what to do with the alias.
Amanieu: There will not be uses of it in the future, due to the deprecated alias.
Jubilee: Do we want to pull this out of beta and fix this in R-A.
Amanieu: Is it actually a problem in R-A?
Amanieu: I think it's fine to leave it as is.
Mara: So you changed your mind?
Amanieu: Yes.
JoshS: The deprecation note doesn't tell you about the issue though. The issue when using core::panic::PanicInfo
, which is not deprecated.
Mara: Release notes in here: https://github.com/rust-lang/rust/pull/115974#issue-1903329164
Consensus to leave this as is.
JoshT: Would like to file something about dropping the alias in the future.
Jubilee: file R-A issue?
Amanieu/Mara: Wouldn't affect new code, because PanicHookInfo
doesn't exist in core.
Amanieu to post summary as a comment.
Amanieu: This exposes impl details.
Mara: We could say we panic at a limit, but not specify the limit.
David: Prefer not panicking at all. Just silently allocate less if you ask for usize::MAX or something. You'll get a panic later when you actually try to push that many elements.
Mara: +1
JoshT: Usually people reserve a specific size so they can assume they won't get alloc panics when pushing later.
David: Agree for Vec. Not for channels.
Mara: +1
Jubilee: Docs say "This channel has a buffer that can hold at most cap
messages at a time."
Amanieu: Could be useful to document everything in the standard library that allocates.
Amanieu: Reject this?
Mara: And then change impl to never panic?
Amanieu: No. We don't document all alloc panics.
Amanieu: We can document "this will allocate an unspecified amount and on failure that can panic" which can be summarized as "this allocates", which brings me to my point of documenting all allocating functions.
Conclusion: Reject this PR because we don't want to promise impl details. Vaguer description might be acceptable, up to author.
Amanieu to reply.
The 8472: We previously said people should use by_ref, and we should consider this on a case by case basis. Needs stronger justification.
Amanieu: Reasonable.
Mara: +1
The 8472: Naming issue.
JoshT: Add get? or not have get_mut?
Amanieu: https://docs.rs/once_cell/latest/once_cell/sync/struct.Lazy.html has all of them.
Consensus: add all of them. get, get_mut, force, force_mut, Deref, DerefMut.
JoshT to reply.
ptr::fn_addr_eq
to compare functions pointersTC: Lang question is whether we want 1 generic param or 2.
Amanieu: prefer FuncPtr::addr/as_ptr
TC: As I recall, one reason lang wanted fn_addr_eq
was to have a place to hang what's in the doc comment there.
Josh to reply, accept with 2 generic params
Agreement we should have these. Still no consensus on the naming.
Josh to reply: let's go with unbounded
Closing as per previous decision: add this to future-rs first.
Josh to reply
Accept, but with a crater run
Reject, The8472 to reply
Reviewed, approved, PR approved, Josh r+ed the PR.
Generated by fully-automatic-rust-libs-team-triage-meeting-agenda-generator