---
title: Orphaned Code in `rustc`
tags: steering, rustc
breaks: false
---

# Orphaned Code in `rustc`: Risks and Mitigations

## The Problem: Loss of Expertise to Attrition

The Rust compiler is a complicated beast. Some aspects have been implemented in a modular fashion; other aspects are less modular, and some are inherently cross-cutting. But regardless of how nicely the overall system decomposes into modules, the fact remains: it is massive.

At various points in time, we have found that some areas of rustc's implementation have zero experts, and other areas have only one. Neither of these situations is healthy.

Last year we investigated [RustContributor::\*](https://github.com/rust-lang/compiler-team/issues/557) as a mechanism for both onboarding newcomers and growing experts, but that effort is on hold for now. I want to talk about what other things we can do today to combat this trend.

## The Expert Map

Some time ago, the team invested effort in developing an "expert map": https://github.com/rust-lang/compiler-team/blob/master/content/experts/map.toml

I don't know whether the expert map is utilized. Given that it was last updated two years ago ([2021-08-11](https://github.com/rust-lang/compiler-team/commit/15a123788adc47a54d402fa9fbef87f4b30f88bf)), it's almost certainly inaccurate. But separate from the question of the accuracy and utility of the expert map, there is a different problem to discuss.

## Owners vs Experts

The *whole team* shares ownership of the compiler code base. We do not track code ownership at a fine grain, in the sense that we give any person with r+ rights the capability to approve changes anywhere in the compiler. We trust people to enlist input from other individuals when necessary, and we trust people to not r+ something when there is a more appropriate expert who should be reviewing the code instead.

This makes it a little tricky to infer automatically (e.g. via machine heuristics) whether a piece of code needs to grow its set of experts. A heuristic like "the number of people who r+ a file" might be wildly off in terms of the number of actual experts in a given code base. (There is also the problem that a crate or file of code need not correspond to a specific feature with its own set of experts; many such features have implementations that cut across the rustc code base.)

## Learning Methods

There are five main ways that pnkfelix knows of for learning about a body of code. (They would love to hear of others that should be added to this list.)

1. Read the source code itself.
2. Read the curated documentation associated with the code (e.g. the rustc-dev-guide).
3. Talk to community members about the code, e.g. on Zulip, and encourage them to give presentations about the code (see e.g. compiler-errors' recent presentations about the type system).
4. Explore an explicit trace of the code's execution on a concrete input (see e.g. pernos.co).
5. Read the git history and the associated PR conversations.

Regarding (1), (2), and (3) above: we can continue to invest effort in improving our source code and the curated documentation; pnkfelix does not have suggestions for things to change there. Item (4) is usually good for solving specific problems, but pnkfelix guesses it is a hard method to employ for growing an expert (e.g. it probably requires a significant amount of prior knowledge to be effective). Item (5) is the place where pnkfelix is most interested in trying to focus effort. (See below.)
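Before turning to tactics, one concrete note on the expert map discussed above: whatever its exact schema, such a map boils down to associating areas of the compiler with the people who know them. A hypothetical sketch follows; the keys, fields, and names below are illustrative assumptions, not the contents of the real `map.toml`:

```toml
# Hypothetical sketch of an expert-map entry. The real
# content/experts/map.toml in rust-lang/compiler-team may use a
# different schema entirely.
["compiler/rustc_borrowck"]
experts = ["expert-one", "expert-two"]  # placeholder usernames
last-reviewed = 2021-08-11              # stale dates like this are the problem noted above
```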
## Mitigation Tactics

pnkfelix doesn't have great answers here, but here's their best shot.

Claim (one we should double-check in the meeting): *we want to keep shared ownership*, in the sense that we want contributors with r+ rights to have the capability to approve code even in areas where they are not complete experts (and we simply *trust* those contributors to not abuse that capability).

Claim: GitHub reviews are things that happen *every day*; there is a body of knowledge being built and exchanged via PR reviews.

pnkfelix would like to leverage that last bit. So, the idea: can we classify bits of code into "things where we want to build expertise"? (I.e., somewhere, we explicitly tag certain crates or certain files as "needing expert growth".) And then, for those classes of code, have bot-generated text that **encourages** the associated PR reviews to go deep and ask more questions about the surrounding code base, in order to surface the underlying design details in a place that is world-readable (i.e. the GitHub thread)?

(pnkfelix recognizes that GitHub PR conversations are not the ideal way to store knowledge for later retrieval. At some point, someone *might* attempt to post-process those conversations into comments to be added to the code, or text to add to the rustc-dev-guide.)

## Digression: Knowns, Unknowns, Unknown Unknowns

Esteban pointed out to me yesterday that there is a broader problem than code being orphaned: as a project, we do not attempt to track our contributors' activities. People will, from time to time, voluntarily state "I'm going on vacation for a week" or "I need to step back from the project for a while, maybe a year or more", et cetera. But we do not have much of a clue about overall trajectories. People show up; some successfully contribute, each with differing levels of proficiency and engagement; and then many leave. Unless they are a member of a team, there isn't much institutional knowledge of these transitions.

Maybe the situation is a little better for us than for other teams, in part because we *do* have the compiler-contributors team, and so there is an approximate record there of the human engagement. But even in that case, the maintenance of that list is only driven by the team, usually the T-compiler leads, periodically deciding that the list needs review.

Should we be trying to measure these transitions mechanically? Should we reflect them in our own planning, in terms of predicting whether we can support the compiler codebase in the months and years to come?

## Discussion Topic Queue

### How do I add an entry?

pnkfelix: Like this!

### How should we decide if a feature or crate needs to grow expertise?

pnkfelix: I forgot to discuss this in the doc. My current thinking is: if something has a *team* that's meeting regularly, then that's an owner. wg-async covers async stuff, T-types covers the type system, and so on. So the idea is to take each crate, or maybe something finer-grained such as each module, and map it to an owning *team* (not a person). If there is no team, then it needs an entry in the expert map, and that map needs to be reviewed periodically (perhaps with stats about how code reviews are going? not sure). The point is, I think large parts of the code **are** owned by teams and do **not** need this treatment.

### We already have some ad-hoc mechanism in rustbot

oli: Our `triagebot.toml` contains ping groups for folders or files. Is anything not covered by that "unowned" (or of "unknown ownership")? Should owners get pinged on all PRs?
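For concreteness, the path-based mechanism oli mentions looks roughly like the sketch below: an entry in `triagebot.toml` makes the bot post a message and cc people whenever a PR touches matching files. The path, username, and message here are hypothetical placeholders, and an entry for code tagged as "needing expert growth" could plausibly reuse this same shape to deliver the deep-review prompt proposed under "Mitigation Tactics":

```toml
# Roughly the shape of a path-based mention entry in rust-lang/rust's
# triagebot.toml; the path and username below are hypothetical placeholders.
[mentions."compiler/rustc_some_orphaned_crate"]
message = """
This crate currently has few active experts. Reviewers: please consider
going deep on this PR, and ask questions about the surrounding design so
that the answers are archived in this world-readable thread.
"""
cc = ["@some-reviewer"]
```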
### "Mitigation tactic" vs "primary purpose" @wesleywiser's hot take is that the primary purpose of doing code reviews **is** knowledge transfer. I think the mitigation tactic above is right on the money and we should lean into that heavily. My main concern though is that casual contributors might not realize what they're getting themselves into by making small changes to a file lacking clear maintainers. In my mind, opening our policy a bit to allow changes to code that doesn't currently have expert reviewers in exchange for much more involved review process is good, but could be overwhelming if the author doesn't realize what's going to happen. @davidtwco: I'm slightly skeptical of reading pull requests being useful to contributors other than those involved in that pull request (the author and the experts) - it doesn't scale very well, but that might not be an issue. @wesleywiser: Yeah, I don't think this approach scales very well at all but I think this is more useful to bootstrap a few (like literally only 2-3) people who can become expert reviewers going forward. Once we have an expert or two, some of the other strategies Felix enumerated above become viable again (eg, if we have no experts, how do we review rustc dev guide contributions?). ### Exercise/task directed learning pnkfelix: davidtwco brought up the point that just reading the source code does not always work well for them, and that they have more success when they have some specific problem they are solving or other goal in mind. pnkfelix: This led me to wonder: Instead of, or in addition to, pushing on Github PRs as a way to build/archive knowledge -- should we also be pro-actively asking newcomers to to take small tasks in code bases that need to grow expertise? @davidtwco: some parts of the compiler are much better for this style of learning than others