owned this note changed 4 years ago
Published Linked with GitHub

What to do re: incr-comp and 1.53 release

Background

Reference: "Unstable fingerprints - what to do on beta (and stable)" #84970

We released 1.52.0 to stable. It accidentally included a "fingerprint check", which has extra validation of incr-comp state. When the validation fails, the compiler ICE's.

(We expected to have a lot of those ICE's mitigated by the time 1.53.0 was released, back when we thought the fingerprint check would only be introduced in 1.53.0)

So, mistake happened.

We responded by releasing 1.52.1, which disabled incremental by default on all profiles, but added a RUSTC_FORCE_INCREMENTAL environment variable that would reenable it for the profiles where it was already present.

(I believe we added other mitigations too, like improving the diagnostic output so that users would have a better chance of knowing that the ICE was due to incr-comp issues, and that a cargo clean would fix the problem for them.)

The Question Now

We have options.

The status quo is this:

  • The aforementioned RUSTC_FORCE_INCREMENTAL environment variable is only added to the stable channel (1.52.1+). It is not on beta or nightly today, so users who are on beta and nightly are getting the eventual behavior that we are striving for, in terms of when incr-comp is activated.

  • We have not landed all of the fixes for the incr-comp fingerprint ICE's into the 1.53-beta channel yet. There is a table on the description of #84970 that shows the list of known bugs, and where their fixes have landed; at least three fixes are currently only on the 1.54-nightly release channel.

Therefore, the question I want us to discuss is: What is our plan for the 1.53.0-stable release?

  • We could do nothing at all: Just let the current 1.53-beta follow the train into stable. The main issue with this is that it means there is a risk that stable users will hit one of the three bugs that are known and only have fixes on 1.54-nightly, and will get a corresponding ICE.
    • The current best predictor of such outcomes is from users of the current beta. We have had very few reports of ICE's on the current beta, if I understand the situation correctly.
    • But "very few" != "zero". In particular, Carol Nichols did report yesterday that she hit one of the ICE's while using the beta. (She is someone who switched from stable to beta because 1.52.x-stable with incr-comp forced on was constantly ICE'ing for her.)
    • There is also discussion in Zulip suggesting that x.py users are hitting problems with beta
  • We could plan to put the same RUSTC_FORCE_INCREMENTAL environment guard onto the 1.53.0 stable release. This would essentially maintain the status quo established by 1.52.1, in terms of our end user experience. This is arguably the most conservative step, in terms of ensuring developers on stable Rust do not hit ICE's. (The main cost of this is that the users of 1.53 stable would not get the benefits of incr-comp, at least not without opting into the RUSTC_FORCE_INCREMENTAL environment variable.)
    • If we do plan to put the same environment guard onto the 1.53.0 stable release, we should also discuss whether it should be additionally added to the 1.53-beta channel now. (Felix personally thinks this would not be a good idea, but we should at least point it out, since there is value in having beta match stable as much as possible prior to release.)

Note that we could make the answers to the above questions conditional on which beta backports get nominated + approved, in terms of establishing which ICE's will be fixed in time for the stable release. I don't really want to complicate things by doing that, but it is an option.

Yesterday Felix took a straw-poll: https://twitter.com/pnkfelix/status/1397571643405475840

  • We can discuss what takeaways we see in results in meeting itself

Arguments and narratives

Please add a description of how you reached your decision here. Of course, writing something here doesn't commit you to that same position later. You're allowed to be convinced by other people :P

Avoid worst case risk of people trying 1.53.0 and us needing a point release

Possible outcomes:

  • If we enable incremental on 1.53.0:
    • Release 1.53.0 and discover that people are still encountering a lot of ICEs, have to issue 1.53.1.
  • If we disable incremental on 1.53.0:
    • People upgrade to 1.53.0 and discover it is slow. They either adopt 1.51.0 or nightly (or use the flag).
  • If we enable incremental on 1.53.0 after backporting fixes to all known problems:
    • We discover that those fixes have subtle bugs and have to scramble to fix them.

In releasing 1.52.1 with incremental disabled, we demonstrated that we value correctness first above all. We also know that many customers don't need or use incremental (e.g., because their crates aren't that big). I think the risk of having to issue 1.53.1 outweighs the benefits of incremental. I think the story of "we have scrambled and produced fixes, but they didn't all make it into beta, so we're leaving incremental off for one more cycle" is perfectly defensible.

I am concerned in particular that now that we've called out these ICEs, we should really have them thoroughly fixed before we re-enable and not convinced that we know that this is the case. (That said, it's hard to get data when they're disabled)

Enabling incr-comp on 1.53.0 by default will show a nice ICE

We are now producing a regular error along side the ICE, which means that hitting these cases now has an actionable message to unblock people. Because of this, the problem is one of UX, as well as potential issues encountered in CI environments. The error message could be expanded with instructions to disable incremental compilation outright, to get the behavior currently in stable. (Note: if we push users to disable incr-comp on their own, they might forget to disable it long after we've fixed every issue.)

One consideration is how the rate of incremental ICEs compares to the 'background rate' of ICEs that users encounter when using rust. Initially, the incremental ICEs were clearly more frequent (particularly predicates_of). However, this may no longer be the case - while there are still several ICEs that can occur, I believe the most frequent cases now have patches. If this is the case, then I don't think we should be treating the remaining ICEs much differently from ICEs in general - ideally we would have none, but we acknowledge that people will sometimes hit them, and that the general solution is to upgrade/wait for the latest version of Rust.

Keep incr. comp. off by default until we have fixed all known issues (mw)

We've gone to the trouble of doing a point release just to disable incremental compilation.
If we re-enable incr. comp. in 1.53 and people run into error messages again (even if the error messages are informative and give guidance on what to do) there's the risk that this will be perceived as us not really having the situation under control ("They did a point release to fix this and it shows up again in the very next stable release?!")
That perception might not be accurate but I think we should strive to avoid it. Many people don't read news, blog posts, and error messages carefully and we should not expect them to be nuanced in their judgement. Reliability is Rust's most important selling point and we should be careful to honour that. The message "we are playing things safe if you want more speed do this" is better than "Don't worry, that compiler crash is harmless and actually has nothing to do with that miscompilation you might have heard about (even though it kinda looks the same) - go ahead and set this obscure environment variable and everything will be fine".

Why not both (pnkfelix)

I see a strong argument for porting RUSTC_FORCE_INCREMENTAL forward into 1.53 stable: to ensure our users have as ICE-free an experience as possible. We effectively said that if developers under default incremental have to manually cargo clean (or take similar actions) to recover from the fingerprint ICE's, then that is a unacceptably subpar UX, and so we made our users opt into taking on that burden via the environment variable. (Some users responded by downgrading to 1.51, despite our efforts to discourage that response, and less than 1% of respondents say they're using the environment variable this may be in part because enabling the environment variable alone on 1.52.1 does not resolve how common the ICE's are on the 1.52.x series.)

My main argument for enabling incr-comp on 1.53.0 by default is that according to our (very limited) data provided by beta users, the common case is that 1.53-beta users are not hitting the ICE. Our over-indexing on the UX led users to consider entirely different options (like downgrading), rather than taking the paths we are advising (the environment variable or switching to beta). Note also that we have other ICE's that we let go unfixed, in the absence of strong data saying that they are a real problem.

So, the "why not both" option: Do we now have time to try to implement an approach that responds to an ICE by automatically deleting the incr-comp cache and re-running compilation? That way, in the common case (according to beta user data) users will still get the benefits of incr-comp, and in terms of UX, no one sees an ICE; they "just" see a 2x blow-up in compile-times when the ICE occurs, which is hopefully rare.

(The obvious argument against this is implementation complexity, with two horns: 1. We may not have time to get such functionality right for the 1.53 release, and 2. Even if we do have the time, we almost certainly don't want this functionality in 1.54 and beyond, so it would represent wasted effort. Anyway, realities of complexity kept me from bringing this up during meeting itself, and thus given the tradeoffs outlined above, I went with the reasoning in my second paragraph above for my vote.)

Disable incremental on 1.53 and re-enable on 1.54 when known issues are resolved (wesleywiser)

My two main concerns are:

  1. We provide a stable tool and experience to our users. While compile times are a problem for many people, I think regressing compilation times is a less destabilizing issue than repeated compiler ICEs. Users already know rustc can be slow but we don't want them to come to expect it will be buggy as well.

  2. If have to ship a point release on 1.53 disabling incremental again, I think that looks very bad to the general community. We have most of the issues already resolved in 1.54 and with a push to resolve the remanining ones and a request to the community to please test it and provide feedback, I believe we can make a solid case for re-enabling incremental by default 1.54.

Requests for clarification

  • When is the 1.53.0 stable release?
  • How many known ICEs are still unfixed on nightly?
    • The table in #84970 suggests 1 out of 7
  • How many fixes have not been backported, and have those backports been considered and rejected?
    • The table in #84970 suggests 2 are backported and 3 are not
Select a repo