Design meeting 2024-11-20: Declarative macro improvements

---
title: "Design meeting 2024-11-20: Declarative macro improvements"
tags: ["T-lang", "design-meeting", "minutes"]
date: 2024-11-20
discussion: https://rust-lang.zulipchat.com/#narrow/channel/410673-t-lang.2Fmeetings/topic/Design.20meeting.202024-11-20
url: https://hackmd.io/XuNQncPpSECt_4YQhfWiGg
---

# Goals of this meeting

- Show how these and future RFCs fit together
- Get consensus on going ahead with this plan
- Answer open question about derive macro syntax
- Review design and possibilities for macro fragment fields in detail
- Discuss project goal proposal

# Summary

There are currently several things that you can *only* do with a proc macro: declare an attribute macro that you can invoke with `#[mymacro]`, or declare a derive macro that you can invoke with `#[derive(MyTrait)]`. Proc macros are complex to build, have to be built as a separate crate that needs to be kept in sync with your main crate, add a heavy dependency chain (`syn`/`quote`/`proc-macro2`) to projects using them, add to build time, and lack some features of declarative (`macro_rules!`) macros such as `$crate`.

I'm proposing a set of RFCs to make it possible to do everything with declarative (`macro_rules!`) macros, and then to make it incrementally easier to write such macros over time. I expect to propose more such RFCs in the future, and I'm planning to propose a 2025H1 project goal.

I don't expect the ecosystem to convert over immediately, especially for proc macros that benefit heavily from using full Rust functionality. Rather, I want to do this as an incremental process, where people convert over as they have the features they need.

As these features become available, I'd like to do a series of blog posts on using them. Those posts should also provide guidance to help ensure crate maintainers don't get pressured to convert over, and that the features are in place for people to do such conversions more easily.
# RFCs

The RFCs are short, but they're summarized here to avoid needing to go through them in full in this meeting. They're available for reference in case there are questions or details answered by the RFCs.

## Attributes

[RFC 3697: Declarative `macro_rules!` attribute macros](https://github.com/rust-lang/rfcs/pull/3697) provides a way to declare attribute macros:

```rust!
macro_rules! main {
    attr() ($func:item) => { make_async_main!($func) };
    attr(threads = $threads:literal) ($func:item) => { make_async_main!($threads, $func) };
}
```

`attr` rules can be added to existing `macro_rules!` macros, which makes it possible for existing declarative macros like `smol::main` to *add* attribute support without breaking backwards compatibility.

I've looked at macros like `smol::main` and `tokio::main` in the ecosystem, and I expect that it'll be relatively easy to write such attributes as declarative attribute macros. (`smol::main` already *is* a declarative macro, so this would allow using it as `#[smol::main]`.)

## Derives

[RFC 3698: Declarative `macro_rules!` derive macros](https://github.com/rust-lang/rfcs/pull/3698) provides a way to declare `derive` macros. A simple example:

```rust!
trait Answer { fn answer(&self) -> u32; }

#[macro_derive]
macro_rules! Answer {
    (struct $n:ident $_:tt) => {
        impl Answer for $n {
            fn answer(&self) -> u32 { 42 }
        }
    };
}

#[derive(Answer)]
struct Struct;
```

This mechanism also allows declaring "helper attributes" for the derive to parse.

Open question: should this use a `derive(...)` rule inside the macro (similar to the `attr(...)` rules of RFC 3697), rather than or in addition to a `#[macro_derive]` attribute?

I think we do need the `#[macro_derive]` for things like `#[macro_derive(attributes(helper))]`. However, it may make sense to have `derive(...)` rules *anyway*, for consistency and for future possibilities like handling parameterized derives (`derive(MyTrait(params))`).
It seems less important to allow a single macro to be both a derive and non-derive macro, not least because of the different naming conventions (`MyTrait` vs `my_macro`). However, using `derive(...)` syntax would make it easier to add parameterized derives in the future (e.g. `derive(MyTrait(params))`). `derive(...)` syntax would also be more consistent with the proposed declarative attribute macros.

I'm leaning towards converting over to `derive(...)` syntax, but would like to discuss and get consensus. Also see the discussion later on about helper attributes.

## Unsafe derives and attributes

[RFC 3715: Unsafe derives and attributes](https://github.com/rust-lang/rfcs/pull/3715) is a simple RFC proposing an extension to proc macro derives and attributes, allowing them to be marked as `unsafe`. Unsafe attributes require our newly established `#[unsafe(attribute)]` syntax. Unsafe derives require `#[unsafe(derive(Trait))]` or `#[derive(unsafe(Trait))]`.

This initially adds the feature to proc macros, because:

- We don't yet have a syntax for unsafe derives, and I wanted exactly *one* novel innovation per RFC for people to consider.
- I wanted to avoid interdependencies between these RFCs that would block one on another.

If RFC 3715 is accepted, I plan to immediately add a proposal for unsafe `macro_rules!` derives as well, either in RFC 3698 or a tiny follow-on RFC.

## Macro fragment fields

This is the big one. Please note: let's resolve any issues with the above RFCs and get to consensus on merging them, before we get into the very likely syntax bikesheds on *this* one.

The above RFCs make it possible to do more with `macro_rules!` macros, but `macro_rules!` macros are still not easy to write. For instance, writing a `macro_rules!`-based attribute might require parsing function syntax, including parameters, patterns, generics, and where clauses.
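To make that concrete, this is roughly what restating function syntax looks like with today's stable `macro_rules!`. The sketch below is a hypothetical `fn_name!` macro (not part of any RFC) that only wants a function's name but has to restate part of the function grammar to get at it. It assumes plain `ident: Type` parameters and deliberately skips generics, `where` clauses, `self` parameters, qualifiers such as `async` or `unsafe`, and attributes.

```rust
// Hypothetical macro (not from any RFC): get a function's name with today's
// `macro_rules!`. To reach `$name`, it must restate a subset of the function
// grammar. This subset only accepts plain `ident: Type` parameters; generics,
// `where` clauses, `self` parameters, qualifiers such as `async`/`const`/
// `unsafe`, and attributes on the function will fail to match.
macro_rules! fn_name {
    (
        $vis:vis fn $name:ident ( $($param:ident : $pty:ty),* $(,)? )
        $(-> $ret:ty)?
        $body:block
    ) => {
        stringify!($name)
    };
}
```

Invoked as `fn_name!(fn add(a: i32, b: i32) -> i32 { a + b })`, this yields `"add"`; a function written outside the restated subset simply fails to match.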
Writing a `macro_rules!`-based derive requires writing a parser for all the kinds of data type syntax you can apply the derive to, including not only structs, but possibly unions, and possibly enums (including all three kinds of variants: bare, tuple-like, and struct-like), *as well* as generics and where clauses. Missing some of the cases, or not handling new kinds of Rust syntax from new compiler versions, means your macro can't be applied to every kind of structure people want to apply it to.

Proc macros solve this by outsourcing their parsing to `syn`, relying on `syn` to update its Rust lexer/parser for every new piece of Rust syntax, and navigating the parse tree while *mostly* not breaking on new things they don't understand. We need a way for `macro_rules!` macros to do something similar. And, conveniently, we already *have* a parser handy: `rustc`.

[RFC 3714: Macro fragment fields](https://github.com/rust-lang/rfcs/pull/3714) proposes adding many more fragment specifiers for `macro_rules!` macros to use, to better take advantage of Rust's parser, and proposes a way to access *pieces* of Rust syntax parsed by a fragment without having to recreate that fragment.

Today, if you want to write a macro that gets the name of a function, you have to recreate the Rust grammar for function syntax in your macro, just so you can match the `$name:ident` in the middle. With macro fragment fields, you can match `$f:fn`, and then access `${f.name}` in your macro, along with `${f.body}`, `${f.return_type}`, and similar.

This RFC primarily proposes the new syntax and the general pattern it enables, and only proposes a couple of new fragment specifiers, with fields, as examples: `:adt` for algebraic data types, and `:fn` for functions. Once accepted, I intend to propose many new fragments with associated fields.
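The derive side can be sketched the same way. Below, a hypothetical function-like `answer_struct!` macro approximates the RFC 3698 `Answer` example with today's stable `macro_rules!`: it takes the whole struct definition, re-emits it, and appends the impl. Even this small version needs one arm per struct shape, and it still assumes no generics, `where` clauses, attributes, enums, or unions.

```rust
// The trait from the RFC 3698 example.
trait Answer {
    fn answer(&self) -> u32;
}

// Hypothetical stand-in for `#[derive(Answer)]` using today's `macro_rules!`.
// It takes the whole struct definition, re-emits it unchanged, and adds the
// impl. Each struct shape needs its own arm; generics, `where` clauses,
// attributes, enums, and unions are not handled.
macro_rules! answer_struct {
    // Internal rule: emit the impl for a given type name.
    (@impl $name:ident) => {
        impl Answer for $name {
            fn answer(&self) -> u32 {
                42
            }
        }
    };
    // Unit struct: `struct Name;`
    ($vis:vis struct $name:ident;) => {
        $vis struct $name;
        answer_struct!(@impl $name);
    };
    // Tuple struct: `struct Name(T, U);`
    ($vis:vis struct $name:ident ( $($fvis:vis $fty:ty),* $(,)? );) => {
        $vis struct $name( $($fvis $fty),* );
        answer_struct!(@impl $name);
    };
    // Struct with named fields: `struct Name { a: T, b: U }`
    ($vis:vis struct $name:ident { $($fvis:vis $field:ident : $fty:ty),* $(,)? }) => {
        $vis struct $name { $($fvis $field : $fty),* }
        answer_struct!(@impl $name);
    };
}

answer_struct! { struct Unit; }
answer_struct! { pub struct Pair(pub u8, u8); }
answer_struct! { struct Point { x: i32, pub y: i32 } }
```

The invocations at the bottom are what RFC 3698 would let a crate write as `#[derive(Answer)]` on an ordinary struct definition.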
Some of the notable design considerations and questions in this proposal:

### Synthesizing tokens

In some cases, a fragment specifier's fields might need to synthesize tokens that don't exist in the source. For instance, `$f:fn` provides `${f.return_type}`, which should be `()` if the function returns unit, but that token won't exist in the source because we write `fn func() {}` without writing `-> ()`.

This RFC proposes that we synthesize such tokens when needed, and point spans to the closest bit of actual syntax the macro received. For instance, for the function return type, we can provide a span pointing to the closing parenthesis of the function arguments.

### Repeated fields

Consider something like function parameters or struct fields or enum variants. A macro needs to be able to do something with each one of them. Rather than forcing it to parse out the individual pieces, this RFC proposes exposing them *as if they were within a level of repetition*: if you match `$f:fn`, you can write `$(${f.param}),*` to get a comma-separated list of parameters, or `$(other_macro!(${f.param}))*` to pass each parameter to another macro. As with any fragment specifier that gets matched within a level of repetition, rustc will require using these fields within the same type of repetition (e.g. `$(...)*`).

## Future work

These are listed to give an idea of where we may want to take this in the future. They're not blockers for any of the proposed RFCs.

### Future work: optional fields

We may want to have *optional* fields that may not always exist. We could handle these by treating them as if they're within a zero-or-one `$(...)?` repetition. We might, alternatively, want to introduce a dedicated conditional syntax. (See the next section for more reasons why.)

Ultimately, this kind of thing is not as convenient as if we were writing procedural code that can write `for` and `if`, but we won't have a full comptime mechanism capable of doing that anytime soon.
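Today's `macro_rules!` already has the zero-or-one part, `$(...)?`; what it lacks is a way to fill in a default. A minimal sketch, using two hypothetical macros: `fn_return_type!` restates a small slice of the function grammar (no generics, `where` clauses, or qualifiers), and `ret_or_unit!` synthesizes `()` when no `-> Type` was written, roughly what the proposed `${f.return_type}` would do automatically.

```rust
// Hypothetical helper: yields the written return type, or synthesizes `()`
// when the function had no `-> Type` clause.
macro_rules! ret_or_unit {
    () => { () };
    ($ret:ty) => { $ret };
}

// Hypothetical macro: expands, in type position, to a function's return type.
// `$(-> $ret:ty)?` is a zero-or-one repetition, so `$ret` may be absent.
// Generics, `where` clauses, and qualifiers such as `async` are not handled.
macro_rules! fn_return_type {
    (fn $name:ident ( $($params:tt)* ) $(-> $ret:ty)? $body:block) => {
        ret_or_unit!($($ret)?)
    };
}

// Both aliases are resolved entirely at compile time.
type NoArrow = fn_return_type!(fn log_it(msg: &str) { println!("{msg}"); });
type WithArrow = fn_return_type!(fn answer() -> u32 { 42 });
```

Keeping the fallback in its own rule means the default lives in one place, but every macro that wants it has to thread `$($ret)?` through by hand.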
I'm proposing that we go with this for now, and if in the future we get a full comptime mechanism that can run Rust code *from the same crate* at macro-evaluation time, we can gleefully provide a future mechanism based on that instead.

### Future work: matching out specific things (e.g. helper attributes)

We should document a pattern for how to parse things like "only the fields which have this helper attribute attached".

We could, potentially, do this by expecting people to pass each `${t.field}` to another macro rule, match the helper attribute, and ignore things that don't match. That's easy enough and uses only things we already have, but it's not especially convenient, particularly since you have to walk through other attributes to check for the helper attribute.

We could also provide macro metavariable functions to do this, such as `${t.field.has_attr(helper)}`, but this would need a mechanism like "optional fields" above to effectively provide conditionals for the case where this doesn't match. Something like this would work:

```rust
$(
    ${t.field.has_attr(helper)}
    tokens here are only expanded if the condition matches
)?
```

However, this is a poor substitute for `if`, and really does not spark joy. We should come up with something better, like a dedicated syntax for conditionals. This would be useful for a variety of things.

### Future work: refining and handling cases

We should establish and document a pattern for how to start out by parsing `$t:adt`, get `${t.name}`, and then handle the case where `$t` is a `struct` vs the case where `$t` is an `enum`.

We could expect users to pass `$t` to another macro rule that has cases for struct vs enum. This will work easily, but can be somewhat inconvenient. Nonetheless, it may be the simplest approach to recommend.

We could have an optional field `${t.struct}` of type `:struct` that only exists if `$t` is a `struct`, and an optional field `${t.enum}` that only exists if `$t` is an `enum`.
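Both "pass it to another macro rule" approaches from the two sections above can be written with today's `macro_rules!`. As a sketch, the hypothetical `field_names!` macro below has separate arms for `struct` and `enum`, and for structs it walks the fields one at a time through internal `@fields` rules, dropping any field that carries a `#[skip]` helper attribute. It assumes named-field structs, unit-only enums, no generics, and no attributes other than a single leading `#[skip]`; the helper is matched purely by its tokens, with no namespacing.

```rust
// Hypothetical macro: list a struct's field names (omitting fields marked with
// a `#[skip]` helper attribute) or an enum's variant names, as `&[&str]`.
// Only named-field structs and unit-only enums without generics are handled,
// and `#[skip]` is matched purely by its tokens, with no namespacing.
macro_rules! field_names {
    // Internal rule: a field carrying `#[skip]` is dropped.
    (@fields [$($acc:expr),*] #[skip] $fvis:vis $field:ident : $ty:ty $(, $($rest:tt)*)?) => {
        field_names!(@fields [$($acc),*] $($($rest)*)?)
    };
    // Internal rule: any other field is kept by appending its name.
    (@fields [$($acc:expr),*] $fvis:vis $field:ident : $ty:ty $(, $($rest:tt)*)?) => {
        field_names!(@fields [$($acc,)* stringify!($field)] $($($rest)*)?)
    };
    // Internal rule: no fields left, so emit the accumulated names.
    (@fields [$($acc:expr),*]) => {
        &[$($acc),*]
    };
    // Case: struct. Hand its field list to the `@fields` rules above.
    ($vis:vis struct $name:ident { $($body:tt)* }) => {
        field_names!(@fields [] $($body)*)
    };
    // Case: enum with unit variants only.
    ($vis:vis enum $name:ident { $($variant:ident),* $(,)? }) => {
        &[$(stringify!($variant)),*]
    };
}
```

Supporting more cases (other attributes ahead of `#[skip]`, tuple structs, generics) means adding more rules to this field-by-field walk, which is where the inconvenience described above shows up.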
However, it'd be inconvenient to have to add field-like accessors such as `${t.struct}` for every condition people might want. Also see the previous section on having better conditional syntax.

# Summary of the goals

- Show how these and future RFCs fit together
- Get consensus on going ahead with this plan. Can we start an FCP on these RFCs?
- Answer open question about derive macro syntax
- Review design and possibilities for macro fragment fields in detail
- Discuss project goal proposal: any objections to proposing a Project Goal for 2025H1?

---

# Discussion

## Attendance

- People: Josh, TC, Tyler, Felix, Vincenzo Palazzo, Yosh

## Meeting roles

- Minutes, driver: TC

## Derive macro syntax

Josh: What should the syntax of derive macros be? The current RFC, as written, uses:

```rust!
#[macro_derive]
macro_rules! MyTrait {
    (rule) => { body }
}
```

`macro_derive` has an optional argument for helper attributes: `#[macro_derive(attributes(helper))]` declares the `#[helper]` attribute.

Alternative proposal:

```rust!
macro_rules! MyTrait {
    derive() (rule) => { body }
}
```

This would be very consistent with attributes. The empty parens in `derive()` reserve syntax space for supporting parameters in the future: `derive(params) (rule) => { body }` would match parameters supplied via `#[derive(MyTrait(params))]`.

We could also extend this in the future by writing `unsafe derive() (rule) => { body }`, to require `#[derive(unsafe(MyTrait))]`. (This is again analogous to `unsafe attr(...) (rule) => { body }`.)

Also, see below for an alternative approach for helper attributes.

Should we switch to this approach?

TC: We'd almost certainly want to allow such derives to take arguments, so we should come up with some syntax to enable that.

tmandry: How would we handle helper attributes?

Josh: We could cover that in the next section.

tmandry: It's weird that you could declare a macro as both a derive and non-derive.

Josh: That could be handled as a lint.
## Helper attributes and namespacing

Josh: In a recent libs-api meeting, we discussed the crater results that showed compatibility issues with introducing a `#[skip]` or `#[skip(Debug)]` helper attribute for things like `#[derive(Debug)]`. In general, there's a compatibility hazard associated with introducing new helper attributes; they're not namespaced. (I've added an unresolved question to the `macro_rules!` derive RFC about this.)

Should we add a way to declare namespaced helper attributes and parse them with hygiene? For instance, could we have `pub macro_helper_attr! skip` in the standard library, namespaced under `core::derives` or similar? Could we let macros parse that in a way that matches it in a namespaced fashion, so that if you write `#[core::derives::skip]` it matches, if you `use core::derives::skip;` and `#[skip]` it matches, but if you `use elsewhere::skip` (or no import at all) and `#[skip]` it *doesn't* match?

We already have *some* interaction between macros and name resolution, in order to have namespaced `macro_rules!` macros. Could we make something like this work? Should I propose an RFC for it?

TC: I'd like to see that RFC.

pnkfelix: Are people using this with one particular trait? I.e., is it that one attribute stretches across many traits or that an attribute applies to a particular trait?

Josh: Our finding on libs-api was that some helper attributes will want to be shared.

tmandry: What's the state today? Do we error? Or let them see the same attribute?

Josh: This led to some breakage in the crater run. One was stepping on the other.

TC: Assuming this is feasible on the compiler side, from a language side, adding hygiene in general to macros seems an unalloyed good. The holes in our hygiene story are continual pain points, on the edition side and elsewhere.

tmandry: Resolving paths in attributes is apparently hard. I had a conversation with petrochenkov about that a while ago. So it could be "interesting".
TC: We have the problem of resolving paths in attributes elsewhere; it keeps coming up. We're probably going to have to solve it. As a language matter, we may just need to treat this as a fixed cost.

Josh: I'd be happy to pursue the RFC and see where it goes. I can move helper attributes to future work.

tmandry: No objection to incremental progress.

## Syntactic defaults vs forcing attention to absent syntax

pnkfelix: The example in "synthesizing tokens" of `f.return_type` has me wondering if we should attack the "optional fields" question more eagerly.

pnkfelix: From my point of view, `f.return_type` is an "obvious" example of an optional field that may seem like it has a "right default" of synthesizing the `()`, but if we can find a nice solution for optional fields (including some kind of easy syntax for writing the appropriate fallback when the optional field is absent, i.e. `()` in the context of `f.return_type`)... my mind is thinking of stuff like how Unix shell scripts handle absent variables via `${VAR:-default}`...

Josh: I think we should treat the question of handling synthesized tokens separately from the question of whether `return_type` should use them. If we had a really clean and clear syntax for defaults, it might make sense to make `return_type` an optional field, though to me that seems like it'll make simple things like using the type a little less convenient (e.g. it isn't just a field of type `:ty`). But even if we change that, I think there will be other cases where we want to synthesize tokens.

pnkfelix: Do you happen to have examples besides `return_type` immediately in mind? (Maybe generic parameter lists are one, though one might argue the empty list is the answer there...)

Josh: Right. Combining generics with `where` clauses and presenting them as a unified list. Asking the compiler to synthesize "where" bounds. Turning `&self` into `self: &Self` to present it as an `f.param` in a unified way with other parameters.
None of these are being proposed *today*, but I think there are enough examples of cases where we'd want to do it that we should establish consensus on whether we can synthesize tokens.

(Some followup discussion of the potential motivation of macros that *want* to differentiate between `fn foo() -> () {}` and `fn foo() {}`; Josh pointed out that if need be, we could cheaply offer *both* `f.return_type` (always present, synthesizes if needed) and `f.return_type.as_written` (optional, never synthesizes).)

## `#[derive(unsafe(Trait))]`?

TC: The syntax `#[derive(unsafe(Trait))]` suggests to me that the trait may be unsafe rather than that the derive is, and these are not necessarily the same thing. By contrast, `#[unsafe(derive(Trait))]` seems OK.

Josh: It's certainly true that you *could* have an unsafe trait that's safe to derive, though it's hard to imagine a case of a *safe* trait that's *unsafe* to derive. (Perhaps `Debug` on a struct that has an `unsafe` field?) But in any case, the goal here was to be able to write `#[derive(SafelyDerivableTrait, unsafe(UnsafelyDerivableTrait))]`, and confine the `unsafe` to a narrower scope.

TC: Yes, agreed that finding such examples is an interesting question. The outermost `unsafe` seems an easier call to me, but I want to think about the other more and write out some examples.

Josh: Yes, I'd proposed accepting both in the RFC, but I'm happy to add an open question here, and we could do them in sequence.

## Nested matchers

tmandry: Perhaps we should add the ability to expand a matched fragment or fragment field and match against the expanded tokens. I imagine this is less straightforward than it sounds; e.g. why don't we support "breaking apart" most matchers today?

Josh: I very much want this as well. I am planning to propose it in future work. :)

Extremely sketchy sketch (ignoring issues of keywords not actually being reserved here):

```rust!
($t:adt) => {
    $match $t {
        ($s:struct) => { ... }
        ($e:enum) => { ... }
        _ => {}
    }
}
```

pnkfelix: I'd like to see more examples here with non-trivial structs and enums. I'd like to ensure it extends naturally to these.

Josh: I think there is a syntax that will make sense here. Having a mechanism that could be invoked to trigger reparsing seems likely to be convenient for people.

tmandry: Another example is matching against `${func.return_type}` so you can handle certain types specially.

Josh: That's a great example as well. Today, if you want to handle one particular return type specially, you either need to duplicate a lot of parsing, or to call into some helper submacro. It very much limits your control flow.

pnkfelix: The (*sometimes irrelevant*) distinction between struct-vs-tuple fields could be another argument in favor of why we might want to synthesize tokens, as above. I'll add that to the RFC.

## Const proc macros

> Ultimately, this kind of thing is not as convenient as if we were writing procedural code that can write `for` and `if`, but we won't have a full comptime mechanism capable of doing that anytime soon.

tmandry: But we do; it's called `const`. Independent of this RFC, we might explore proc macros declared const inline inside of the current crate.

Josh: Compiler team folks gave direct feedback that being able to do const evaluation earlier in the compiler (e.g. at macro evaluation time rather than at current const eval time), as well as the ability to do any form of proc macros that come from the current crate rather than a separate one, would be a very large ask and not likely to happen soon. They're very well aware that it'd be desirable to support.

## Where are we leaving macros 2.0?

TC: This body of work, when implemented, would add a lot on top of the existing stabilized MBE macro system. But of course we have that other MBE system in nightly that is used by the compiler itself. I haven't looked for a while, but presumably we have some old RFCs here.
There are a lot of outstanding questions around hygiene that have been long unaddressed by the stable MBE system and that were intended to be addressed by macros 2.0.

TC: How do we see the relationship between this body of work and that, and is it worth trying to recover any of that before we build a lot of new things on top of the stable MBE system?

Josh: I think the macros 2.0 system has been unstable with zero path to stabilization for many, many years. I don't think we should block anything on it.

TC: I'm curious what's been blocking us, and whether those things still are.

pnkfelix: Here's a relevant tracking issue: https://github.com/rust-lang/rust/issues/39412

I think it's largely been a resource question. It does seem to revolve a bit around hygiene. And though I'm concerned about saying this, I do wonder whether some of the work here may make that hygiene work more difficult.

Josh: I'd hope that the work here paints a path to being able to think more carefully about those problems.

TC: The resource question comes up here too. I wonder whether we might run into the same problem here. We wouldn't want this to become just a different kind of blocked macros 2.0.

TC: I always like incremental progress, though.

TC: Proposed summary: let's make incremental progress, but be careful about the hygiene issues, to be sure that we don't make anything worse, and try to see what we can make better.

## Where does compile-time reflection fit in?

tmandry: There have been a lot of recent asks for compile-time reflection. Fragment fields introduces some functionality that resembles it. How can we tell a story about what each thing is "for"? If we one day add compile-time reflection features, will we end up with two APIs that have overlapping purpose but subtle differences?

Josh: I have put some thought into compile-time reflection and I expect that we will be able to design something that makes use of fragment fields.
Imagine, for instance, saving the tokens associated with a (public) struct definition, and being able to obtain those tokens elsewhere and apply a macro to them.

tmandry: That sounds like an on-demand derive feature I've been thinking about. I'm not sure if it covers all the use cases of reflection, though; we should talk about it.

tmandry: I would like to see a section of one of the RFCs explaining which use cases for compile-time reflection we support and which ones we don't.

## How many fields do we want?

tmandry: Seems like we will want potentially many fields and perhaps subfields, e.g.

- `${fn.sig}`
- `${fn.body}`
- `${fn.sig.return_type}`?

How should we think about the potential surface area here and how it gets expanded?

Josh: I think we should add these very liberally, though each one will need an FCP. The RFC for macro fragment fields lists many, many future possibilities.

## Trait modifiers

tmandry: How should we handle `#[derive(const MyMacro)]`?

Josh: Added to future work section.

## Use cases

tmandry: Felix said he'd like to see use cases elaborated more in the RFC. I agree that we should demonstrate some real-world use cases that the RFCs enable. If we can't find any without including more features, that's a useful signal that we should consider those features together.

Josh: https://github.com/bearcove/merde is a full example of writing a derive using declarative macros.