Plan for Rust language definition

--- title: "Plan for Rust language definition" date: 2025-07-28 url: https://hackmd.io/c8WGRVutQ3GDQSHh9z51fg --- # Plan for Rust language definition ## What is our task? To set out our task, let's consider some passages from a more civilized age: --- > A precise description of a programming language is a prerequisite for its implementation and for its use. The description can take many forms, each suited to a different purpose. A common form is a reference manual, which is usually a careful narrative description of the meaning of each construction in the language, often backed up with a formal presentation of the grammar (for example, in Backus-Naur form). This gives the programmer enough understanding for many of his purposes. But it is ill-suited for use by an implementer, or by someone who wants to formulate laws for equivalence of programs, or by a programmer who wants to design programs with mathematical rigour. > > This document is a formal description of both the *grammar* and the *meaning* of a language which is both designed for large projects and widely used. As such, it aims to serve the whole community of people seriously concerned with the language. At a time when it is increasingly understood that programs must withstand rigorous analysis, particularly for systems where safety is critical, a rigorous language presentation is even important for negotiators and contractors; for a robust program written in an insecure language is like a house built upon sand. > > -- [The Definition of Standard ML, Revised], Milner et al., 1997 [The Definition of Standard ML, Revised]: https://smlfamily.github.io/sml97-defn.pdf --- > The purpose of the Definition was to define exactly what Standard ML is, and what it means. The emphasis was on precision and brevity. The document is almost one big definition, at the technical level that one finds in mathematics. Indeed, whereas mathematical authors normally single out their definitions from their narrative text, we tended to single out comments from the defining text, since the latter so much dominates the former. Moreover, to keep the Definition within reasonable bounds we often omitted theorems which would justify the definitions. Not surprisingly, anybody who sets out to read the Definition from cover to cover finds it tough going. > > This kind of language definition, which tells the *what* more than the *why*, can give a distorted impression of the language as being determined beyond argument. Moreover, the use of mathematical notation can give a false air of authority; a reader may say "Well, I don't really understand what is going on here, but someone probably has a good reason for it". (A less charitable reader may say something stronger.) It is true that if a community of people wish to use the same programming language with the same meaning, then a precise definition is a necessity - and to be precise it must be in mathematical notation. But, although necessary, it is not *sufficient* to gain acceptance by the community. For this acceptance, it is also necessary to achieve a reasonably high level of communal understanding. Thus the main purpose of doing formal semantics, which is to gain greater understanding, has also a solid pragmatic justification. Moreover, formal notations are just the vehicle for approaching this understanding, and formal definitions are just the starting point. > > Early in the process of writing the Definition, we decided that to interleave more discussion and theoretical development with the primary material – which exactly describes the execution of ML programs – would lead to an unwieldy text, and that cross-reference between the primary material and the theoretical development would be much easier with two texts than with one; hence this Commentary. We hope thus to make both texts accessible to readers with different purposes: > > - Implementers, who wish to stay faithful to the Definition; > - Programmers, familiar to some extent with ML but wishing to deepen their > understanding; > - Teachers of courses on programming languages; > - Researchers into programming language design and semantics. > > -- [Commentary on Standard ML], Milner and Tofte, 1991 [Commentary on Standard ML]: https://scispace.com/pdf/commentary-on-standard-ml-19ann4exso.pdf --- Rust has a good reference manual -- the Reference. To some degree, it's also a reasonable commentary. But we lack a written *language definition* -- a document that *completely* describes the form and interpretation of Rust programs. To produce that is no small task. We should not underestimate it. It's my view that, if we succeed in producing such a document, Rust will be the most complex (in a Kolmogorov sense) language ever fully defined. People want this, though, and for good reasons. From the Project side, with Rust moving into its second decade, as we work to continue delivering "stability without stagnation", having a precise written record of the language is critical for clear-headed language design. From the academic side, a complete description of the language opens all manner of doors to useful research, such as being able to state the soundness theorem for the Rust type system. From the industrial side, a complete and maintained language definition helps to give confidence in the language and assure business continuity. Assembling this kind of language definition is a *research program*. It's not just writing. The truth is that we don't know what Rust is today. That's going to have to be *discovered*. Doing that will require the researchers to, first, have a clear idea of what success looks like. We aren't going to have the time or the talent from the Project side to teach, in sufficient detail, what a language definition should be. We'll need the right people. Those people will need to embark on a careful program of uncovering the details of the language, verifying them, testing and selecting strategies for encoding these, and producing a clear, comprehensive, and attractive artifact on which the Project can build for many years and decades to come. This artifact will need to be both a language definition and the associated commentary. There's no way to separate these two tasks, and we'll need both to serve the goals we have. With the benefits of modern tooling, perhaps we could succeed at interleaving these rather than having to separate them, as the authors of SML did. How large of a program is this? To attract the resources we'll need for this, we need to have a plan that we're sure is going to work and deliver these complete artifacts in around 18 months. No plan that relies on hiring one person can deliver that kind of assurance. Besides the obvious, it's just too easy for people to underestimate the time required, even when they know that's a risk. A plan with high confidence is going to need enough people to have some slack. It's my sense that a research team of 3 people is likely the minimum for this, and that a team of 5 would be perhaps best. ## What is this document? To pull together the funding needed for that, we need a plan. We need a *research proposal* that outlines, in detail, what the "acceptance criteria" would be for the final product and what the process would look like for getting there. It'd identify what the harder and easier parts are. It'd talk about what questions have known answers and which would take more research or require decisions from teams. It'd discuss what the non-obvious benefits are of having a full language definition, e.g. in terms of what research doors are opened. With this plan, we'll work with the Foundation to attract what is needed to put it into action. Many have expressed interest in seeing this kind of end result. We haven't yet put these companies to the test by presenting a plan that we know would work. That's what this document is for. --- ## Background *TODO* ## Section *TODO* ## Section *TODO* ## Section *TODO* ## Section *TODO* ## Section *TODO* ## Section *TODO*