Design meeting 2024-02-21: T-lang feedback to T-spec

---
title: "Design meeting 2024-02-21: T-lang feedback to T-spec"
tags: ["T-lang", "design-meeting", "minutes"]
date: 2024-02-21
discussion: https://rust-lang.zulipchat.com/#narrow/stream/410673-t-lang.2Fmeetings/topic/Design.20meeting.202024-02-21
url: https://hackmd.io/UetDs_P1Sgmj8BEiWaXB5g
---

# Questions for T-lang re Rust spec

## History

### Ferrocene

T-spec has reviewed the [Ferrocene specification](https://spec.ferrocene.dev/). We have not been convinced that adopting that document would yield the outcome we want: a community-maintained document that answers questions from Rust users.

T-spec decided that before it could decide whether (and how) to adapt the Ferrocene specification, it needed a more concrete idea of what an idealized specification would look like.

### The sample chapter exercise

T-spec went through an exercise of drafting an idealized "sample chapter", using "arrays" as the agreed upon base topic for the chapter. This yielded the following sample documents:

* [Connor's array chapter](https://github.com/rust-lang/spec/blob/8476adc4a7a9327b356f4a0b19e5d6e069125571/spec/lang/exprs/array.md)
* [Joel's array chapter](https://github.com/JoelMarcey/rust-lang-spec/blob/5f76e25254a823ac8fc88c9d62256e8af2ba18f7/specification/arrays.md)
* [Mara's array chapter](https://htmlpreview.github.io/?https://gist.githubusercontent.com/m-ou-se/529e3db782a0ce8ecd5043cc0adfa4af/raw/0f982c83e5733a72ec0eff6c285a48d1f5e839e4/draft.html)

If you look at the three sample chapters above, you'll notice that they all spend some time describing the fragments of the grammar for array types, array-generating expressions, and array indexing expressions. Each sample describes, in its own words, the static constraints on the type and expression forms, and the dynamic behavior of the expression forms.

Each sample chapter made independent decisions about source file structure, encoding of specification elements, and presentation of the rendered product.
The rendered product is the most important topic of feedback; we are not seeking T-lang's feedback on source file structure. (Nor are we seeking feedback on encoding of specification elements, apart from whether they are sufficiently expressive to meet our audience's needs.)

Here are some high-level points we extracted from discussion of the sample chapters:

* Chapters on a specific concept (like "arrays") touch a lot of different topics ranging from grammar to static and dynamic semantics, from expressions to types, etc., making it hard to write and review and determine completeness.
* We think having space for non-normative notes (as in Mara's example) helps accessibility and allows the normative parts to be concise and precise.
* We think it will be useful for each point in the specification to be individually addressable via a stable name.
  * See for example Mara's sample, with labels like "array.type.syntax" or "array.repeat.copy".
  * These are meant to be usable for all versions of the document after those points are introduced. (Even if the relevant text is replaced by a later version, our aim is to still provide index entries that point from old labels to a list of relevant new ones.)
* Some readers may find it useful to know about language changes that corresponded to bug fixes.
  * See for example Mara's array.repeat.zero with an annotation describing a shift at 1.63.0.
  * But all such information should be provided as extra annotation, analogous to how Mara's sample offsets the examples and exposition in separate blocks that are not part of the formal specification itself.
* We think edition changes should be fully documented, but version changes only on a best effort (non-normative) level.

Since then, we have had further discussions about the different choices each sample chapter made.
We questioned whether the chosen sample topic ("arrays") was a good representative of what an ideal spec would even have as a chapter topic: the chapter content jumped between parsing, static semantics, and dynamic semantics. It was not clear who the audience for a chapter written like this would be.

**Question: Looking at the three sample chapters, what do you like or dislike? Is there anything that you think we should keep in mind (to do, or not to do) going forward?**

### Top-level Chapter Structure

Many language specifications seem to centralize the grammar of the language: each section introduces a new piece of grammar (e.g. "match expressions" or "use statements") and defines the full semantics related to that piece of grammar.

After our experience writing the sample chapter on "arrays", we had [a discussion][discussion] about what high-level chapter structure an ideal specification would have.

In particular, Mara has argued for a different high-level structure, where each chapter more closely aligns with different types of expertise. (e.g. grammar+ast vs type system vs dynamic semantics vs stdlib, etc.)

One of the advantages of such a structure is that it will allow for a more obvious coupling between high-level chapters and a corresponding *Rust team* (or at least individual developers) to help with production and review of the content relevant to that chapter. E.g. T-opsem could be expected to help with a dynamic semantics chapter.

The rough idea is that there is not a single section on "arrays" (with a small part about the memory layout), but instead there is a single section on "memory layout" (with a small part about arrays).

**Question: How does that sound? Is this a good idea?**

We are still drafting what that high-level chapter structure should look like.
Here's an extracted outline of topics based on [a recent brainstorm meeting][brainstorm] (note that these are not considered complete, though we would want to know about gross omissions):

* Source code and Rust syntax tree (graph?)
  * Grammar, AST
  * Crates, modules, source files
  * Macro invocations
  * Macro expansion and conditional compilation
  * Name/Path resolution of (mod-level) items
* Static semantics
  * type checking
  * associated item resolution
  * existential (impl Trait) resolution
  * borrow checking
  * unsafe checking
  * const eval
  * type inference
* Dynamic Semantics
  * high level expression form
  * pattern matching and binding
  * dyn traits and dynamic method dispatch
  * memory layout and value representation
  * low level (MIR-like) statement form
  * memory model (borrowing; atomics)
  * ABIs and FFI linkage
* The Core library crate
  * builtin types' traits and methods
  * core::* items

[discussion]: https://rust-lang.zulipchat.com/#narrow/stream/399173-t-spec/topic/Structure.3A.20chapters/near/420245542
[brainstorm]: https://hackmd.io/gqNmzYyKRD-slKYSidwrsA

### Compilation phases

An important open question is to what extent phases of compilation (and their intermediate results) are relevant to the specification.

Can we define any phases of compilation that are inherent to the language, that are not just implementation details? For example: tokenizing, parsing, macro expansion, static analysis and name resolution, monomorphization, const eval, codegen/execution. (Relevant for e.g. defining when `N` is checked in `[u8; N]`.)

If yes, then we can use these phases for the top level chapter structure.

**Question: To what extent should *compilation phases* exist in the specification?**

### Intermediate representations

- For parsing/grammar, we will need to define **tokens**. (This seems uncontroversial.)
- For macros, it will be relevant to specify (the existence and at least some parts of) the **AST**. (E.g.
  to properly define interaction between `$tt` and `$expr` fragment specifiers.)
- For borrow checking and more, the spec might need to define some kind of "desugaring" and "lowering" to some simplified model/representation.
- For operational semantics (and const eval), the spec might need to define "some kind of MIR".

**Questions: Should the spec define "some kind of MIR" or "abstract machine"? How can we keep that minimal? What can we specify without?**

---

# Discussion

## Attendance

- People: TC, Josh, tmandry, Connor Horman, eholk, Mara, Joel Marcey, Urgau, Adrian Taylor, nikomatsakis, Monadic Cat, bstrie

## Meeting roles

- Minutes, driver: TC

## Re "compilation phases"

Josh: It regularly comes up in lang discussions that it matters whether we reject a particular construct before or after macros get to run. If we reject something as invalid at lexing time then macros don't get to use it; if we defer that until later, macros can look at it and then they can generate something valid. So at a minimum the spec needs to cover when macros run, and the implications for what macros can and can't do, and for any given error of lexing/parsing, what type of error it is. (The balance here is that we try to give macros a reasonable amount of capability but without letting macros do things that will constrain our own future ability to expand the language.)

Connor: This also applies to parse-valid vs. parse-invalid constructs, as proc-macros can examine the inside of AST fragments from decl-macros (and of course trivially a decl-macro can parse an `$item` or `$expr` fragment and discard it).

eholk: I don't know what they are, but there is probably a minimal set of compilation phases that any Rust implementation will always have. I'm guessing a lot of this will come around the macro system, since macros tend to require some kind of phasing model.

Mara: For macros it's clearer than for anything that happens afterwards.
I'm sure we need some form of AST, but I'm especially curious about what happens afterwards. Do we need "some form of mir" or "minirust" or... ?

NM: (See below also.) One advantage of organizing by the domain of expertise is that special notation could be added for each section. I think we would find value in that. That may be different than separating phases.

NM: Is the question, "are there other phases that are semantically significant?" or "is there value in introducing IR to simplify the spec?"

Mara: Those are separate questions. If we can identify logical separate phases, those should probably be the top level chapters.

NM: I would separate function and expression bodies from checking of declarations. I'm not sure if they are cleanly separable. There is some interaction. But in my mind they are different.

JT: Type inference would be another one to pull out. Or monomorphization.

NM: Monomorphization is also a good one.

JT: For didactic purposes, monomorphization is a good one to explain to people.

NM: As we've seen with const-expr questions, it has an impact on what programs are accepted.

JT: To what extent are you asking for things that are lang relevant but primarily for didactic value versus things that are lang relevant and that matter to the semantics of the language?

JT: E.g. monomorphization is relevant because it affects what programs are accepted.

Mara: Would it make sense, e.g., to have a chapter for everything before monomorphization and a chapter for everything after monomorphization?

JT: You're proposing that as the top-level spine of the spec?

Mara: Yes, I'm asking if that would be a reasonable top-level approach.

JT: I'm not sure it will be the best didactic representation to separate things by phase in all cases. I'm not sure that will be the top-level org structure that makes the most sense.

NM: It does make sense to have major chapter headings that represent all phases, but I don't think those will be the only kinds of chapters.
E.g., the grammar is not a phase per se, but it's worth highlighting. Or, e.g., a chapter might introduce MIR.

NM: But at a high level, "yes". Many things will correspond to phases.

tmandry: General +1 to what Josh and NM said. Monomorphization is a good example. Borrow checking will be difficult to specify (MIR? Polonius?).

NM: Definitely I have opinions still in progress about that. The question I have is broader, e.g. with respect to trait checking and a-mir-formality. That's worth a discussion all on its own.

Mara: What I'm proposing for the structure is that, e.g. if we ever finish a-mir-formality, that there is a clear section for it to go into. It'd be better to not spread that around.

tmandry: I have a counterpoint here that gets into the next topic...

## Overall Organization

eholk: I like the idea of organizing by things like grammar, type system, operational semantics, etc. When I've used language specs before, this was the most intuitive way for me to do it. For example, I'd be more likely to start by parsing the whole language, rather than trying to implement everything there is to know about arrays and then moving on to the next feature.

Josh: +1 for organizing by "area of expertise"; that seems likely to ease many uses of the spec, including both reviewing and consuming it.

tmandry: I'll provide a counterpoint. I think from the perspective of searching and accessing the spec, grouping by feature is best. Grouping by semantic level may be best for maintaining the spec itself.

Mara: We could ask this the other way around. Suppose someone has a question about what guarantees Rust gives about the memory representation of builtin types. Right now they would have to look around in different places.

NM: I agree with both of you. There is no one best solution. I feel like this is what indexes were made for, when there is no one best organization. What might happen is that the best division of concepts may vary somewhat by section.
NM: If you look at a-mir-formality, the rules map fairly closely to our language constructs, but there is some remapping. There is some barrier in that you have to learn this IR before you go in. There's some question of audience here.

JT: Agree that there is no one organization that will satisfy everyone and we may have to present things in different ways. I like the idea of organizing things around what people care about. But that's probably not the organizational structure that we want for the whole document. We could solve both needs by having separate sections that reference shared sections: a common substructure within many chapters, such that for some needs you read linearly, and some you read "across" chapters horizontally, reading similar subsections of many chapters.

Mara: An example I gave in the spec meeting: if you have some information about ADTs, do you put it on the structs page and the arrays page and the tuples page? Not every chapter will treat these as separate concepts. To some chapters, "ADT" might be one concept, while in others the difference is relevant. (Same for loops, etc.)

tmandry: I like what I'm hearing about making either direction accessible regardless of which factoring we choose. I would like to do something better than the index in the back of a book.

NM: The people looking to do that are probably somewhat casual. This may be something with which we'll have to experiment. Perhaps there is a way to help the sort of people who would use the reference today.

tmandry: We could, e.g., have a page on arrays that links out to the relevant sections in each chapter.

Mara: The section on booleans in the reference... https://doc.rust-lang.org/reference/types/boolean.html ...it covers everything with respect to booleans in one place. I'm not against making something like an index here. But I can't imagine that someone wants to know all of these things at once.
I'm not saying it never happens, but it's not clear to me that this combination of information is that useful.

JT: +1 for that not being the ideal organizational structure. The thing I wouldn't want to see is having insufficient cross references that would cause having to jump all around. I want some organization that is suitable for reading linearly.

tmandry: I do agree it makes sense to have a section on, e.g., operational semantics. My worry is that we end up scattering a bunch of information that is cross-dependent on each other throughout the spec. This is a problem I've seen elsewhere. I don't want everything to be so cross-referenced that you have to do a depth-first search to understand anything.

Mara: That's exactly what I'm trying to prevent.

Joel: I think part of the reason that specs are written the way they are - based on feature - is partly momentum based, insofar as that is the way language specs have generally always been written. The way Mara is proposing could potentially start a new trend in spec writing.

Joel: A reason that serious folks who would read a spec would want to see it outlined by feature would be if they were tasked to implement a specific feature - but as of now, the spec is not being targeted at implementers of features.

Joel: For the casual person who comes across the spec who wants to know all about a specific feature all at once, I wonder if that is where the spec sends off to something like the Reference, maybe in an informative way?

Joel: I think of a use-case of the spec as being, e.g., for safety-critical folks. So the way that Mara is proposing this to be structured seems the way to go for this.

JT: Full support for the spec team trying to do something that may be better than historical specifications.

NM: Final point that relates to what Joel said: I am hoping that part of stabilizing a feature will ultimately be proposed spec text.
My expectation though is that there will be some "high-level" spec text that the main person driving the feature writes, kind of the "guide level" part of the spec, but that e.g. detailed type system stuff will be handled more by T-types and friends who are familiar with those things. Re: what Joel said, that surprises me, because I thought the reference was being rolled into the spec. =)

Joel: One benefit of organizing the specification in this way is that it makes it easier to farm out pieces of it. People are more expert at specific phases rather than specific features.

tmandry: There are tradeoffs either way. We should do our best to mitigate the effects of those tradeoffs.

tmandry: Is there a plan to continue maintaining the reference once the spec is done?

TC: The Reference would be deprecated at that point, is the understanding.

## Other aspects of feedback?

Mara: I put some effort in my version to make it look nice. It'd be good to get some feedback on that.

Mara: Another aspect would be about the degree of non-normative wording to put in between.

JT: The version Mara posted looks great. Not just in terms of styling, but also in terms of combining both the commentary and the normative sections.

JT: I do want to make sure "casual users" in the sense of "I want to look up how precisely X works" should be supported, where they may not be implementers or similar, they might just be wanting to understand some detail of the language better.

Joel: I may have misspoken re: the reference. What I meant to say was that we can have a lot of the reference as informative pieces within the spec that could talk to specific features maybe. But granted, now that I write that, I am not convinced that I have thought that through enough.

Mara: Any other places people have feedback?

(The meeting ended here.)

## Phases and IRs

nikomatsakis: I think that having a "MIR" for (potentially) borrow checking and (definitely) operational semantics will be useful.
This is another advantage to Mara's proposed organization -- i.e., a major chapter might begin by defining an "IR" that is specific to its purposes, which serves to highlight the information most relevant to that particular section. I personally find that easier to think about. It's a bit of an anecdote, but I still remember the days where the Rust compiler tried to do everything from parsing to LLVM IR using exactly one IR (the AST), and it was very difficult to understand because the full complexity of the language carried down through everything. I do think we should be careful to be clear that these intermediate concepts are "spec devices" and try to keep them minimal.

Connor: Other specs (like the C++ Standard) use the term "exposition only" for such concepts.

Connor: I'm going to guess that runtime/const eval would probably want to import minirust specifically (or a modified version thereof).

## Adopt the Reference to narrow the points needing decision?

TC: I'm curious what people think would be the minimum viable set of changes to the Reference to make it into an acceptable specification.

Josh: I would propose that that's a topic that could easily take up half this meeting and end up relitigating the spec RFC. I think you should start that discussion async rather than here.

TC: Agreed. In talking with Eric Huss, we discussed writing up a long-form document about this. The relevance to this topic is how it might help narrow down the decisions we need to make.

---

TC: Here's the basis of this:

- When the lang team has needed to decide on some verbiage in the Reference in the past, we've easily spent half an hour deciding on a sentence or three. The latency to do this has often been measured in months.
- Consequently, not starting from the Reference seems to discard this existing body of work.
- Even if we decide on what a perfect specification should look like, it's not clear that we have the resources, in terms of human bandwidth, to achieve in a finite amount of time a result that is better than the Reference.
- It's not clear that opening new degrees of freedom for what might constitute a better specification will produce more value than spending that work improving the Reference.

So I do wonder whether our collective efforts might be better spent starting from the Reference, and I'm curious if that might also help us with narrowing down the points that we need to address here with respect to what this document should look like.

Mara: We wouldn't just ignore the information that's already in the reference. E.g. when we write the section on memory representation, we'll just start with the information that's already spread throughout the reference.

Mara: Also, the right grouping of information (section structure) can make it a lot more efficient to review. Reviewing "the memory representation of bool" might take half an hour, but reviewing "the memory representation of all builtin types" might not take much longer.

Mara: Take a look at this example in the reference: https://doc.rust-lang.org/reference/types/boolean.html It is a mix of wildly different kinds of information in one place (syntax, memory representation, even `if` expressions, logic operators, truth tables, ..), making it hard to review all at once, and impossible to tell if it is 'complete'.

## Style feedback

Josh: General feedback on style: the layout and formatting used in Mara's sample chapter looks incredible, and I'd love to see that level of care in the specification text, assuming it's easy to write and maintain. Also really appreciating the presentation of alternating spec text and non-normative text.

tmandry: +1. I also agree with the point of non-normative text allowing the normative text to be concise; I think that's valuable.
tmandry: I would add that we should keep in mind consumers of the text that can't rely on visual aids, i.e. who use screen readers, and attempt to make any visual aids we use that carry normative information accessible to those. On a related but separate note, we may want to consider the use case of embedding specification text in a different (non-HTML) medium.

Mara: re screen readers etc: Absolutely. (edit: I have a bunch of experience with that that I'll bring with me into this. ^^ (writing, not using/reading, to be clear.))

Josh: Can we make sure that in the concrete review process for specific proposed formatting of the spec, we get feedback from people who have firsthand experience with accessibility needs to make sure it works for them? (re edited text above, sounds like Mara may have the writing side covered. We should also consult with folks who actively *need* such accessibility themselves as well, though.)

Niko: big +1 to the non-normative text

## Historical version information

Josh: Regarding notes about changes in past versions, I think these shouldn't be prioritized or normative, but best-effort notes seem fine. Ideally these would be out-of-line (e.g. footnote) to avoid distracting from the body of the specification, and I don't think they should have special formatting in the main body of the spec (e.g. the sidebar ">1.63" note in Mara's sample chapter).

Josh: That said, differences between *editions* should absolutely be highlighted and normative.

## Grammar note

Josh: The spec gives a grammar that requires a trailing comma, then says "where the last comma may be omitted". I think this is potentially a *good* thing for the clarity of writing, to avoid obscuring the details of the grammar. Worth deciding that up front, and deciding on a standard presentation for certain information, and possibly hoisting and highlighting specific things that'll commonly be done for (e.g. precedence/associativity, or trailing commas, or ...).
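
The trailing-comma convention in the grammar note can be seen in ordinary Rust code. The following is a minimal sketch, not taken from the meeting or from any of the sample chapters; the macro name `sum_all` is hypothetical and exists only for illustration. It shows that array expressions accept an optional trailing comma, and that a `macro_rules!` matcher has to opt in to the same convention explicitly with `$(,)?`.

```rust
// Hypothetical macro, for illustration only.
// `$($e:expr),*` matches a comma-separated list of expressions;
// `$(,)?` additionally allows one optional trailing comma.
macro_rules! sum_all {
    ($($e:expr),* $(,)?) => { 0 $(+ $e)* };
}

fn main() {
    // Array expressions: the last comma may be omitted or kept.
    let with_trailing = [1, 2, 3,];
    let without_trailing = [1, 2, 3];
    assert_eq!(with_trailing, without_trailing);

    // The macro accepts both spellings because of `$(,)?`.
    assert_eq!(sum_all!(1, 2, 3), sum_all!(1, 2, 3,));
}
```

A spec that states "where the last comma may be omitted" once, up front, would avoid repeating the equivalent of `$(,)?` in every list-like production.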
