---
title: "Design meeting 2025-07-23: Named macro capture groups"
tags: ["T-lang", "design-meeting", "minutes"]
date: 2025-07-23
discussion: https://rust-lang.zulipchat.com/#narrow/channel/410673-t-lang.2Fmeetings/topic/Design.20meeting.202025-07-23/
url: https://hackmd.io/Z8DkMrR6Siufajgj0Cmefw
---

- Feature Name: `macros-named-capture-groups`
- Start Date: 2024-05-28
- RFC PR: [rust-lang/rfcs#3649](https://github.com/rust-lang/rfcs/pull/3649)
- Rust Issue: [rust-lang/rust#0000](https://github.com/rust-lang/rust/issues/0000)

# Summary
[summary]: #summary

It will now be possible to give names to capture (repetition) groups in macro patterns. These names can then be referred to directly in the macro body and in macro metavariable expressions.

_Rustc usually refers to these groups as "repetitions" in diagnostics. This RFC uses "capture groups", which is more general (they don't always repeat) and more in line with regex._

# Motivation
[motivation]: #motivation

Rust has no way to refer to capture groups directly, so it uses the variables they capture to refer to them indirectly. This leads to confusing or limited behavior in a few places:

- Expansion with multiple capture groups is extremely limited. In many cases, the ordering and nesting of different groups is restricted based on what can be inferred from the contained variables, since the groups themselves are ambiguous.
- Repetition-related diagnostics are suboptimal because the compiler has limited ability to guess what a capture group _should_ refer to when captured groups and variables do not align correctly.
- Repetition mismatch diagnostics can only be emitted after the macro is instantiated, rather than when the macro is written (e.g. "meta-variable `foo` repeats 2 times, but `bar` repeats 1 time").
- As a result of the above, using repetition is somewhat fragile; small adjustments can break working patterns with little indication of what exactly is wrong.
  Reading code with multiple capture groups can also be confusing.
- Metavariable expressions as they currently exist use an unintuitive format: syntax like `${count($var, n)}` is used to refer to the `n`th ancestor group of the smallest group that captures `$var`. Referring to groups directly would be more straightforward than using a proxy.

It is expected that named capture groups will provide a way to remove ambiguity in expansion and metavariable expressions. They should also unblock diagnostics that do a better job of guiding the macro mental model.

# Guide-level explanation
[guide-level-explanation]: #guide-level-explanation

Capture groups can now take a name by providing an identifier between the `$` and the opening `(`. This group can then be referred to by name in the expansion:

```rust
macro_rules! foo {
    ( $group1( $a:ident ),+ ) => {
        $group1( println!("{}", $a); )+
    }
}
```

This would be approximately equal to the following procedural code:

```rust
let mut ret = TokenStream::new();

// Append an expansion for each time group1 is matched
for Group1Captures { a } in group1 {
    ret += quote!{ println!("{}", #a); };
}
```

Named groups can be used to create code that depends on nested repetitions:

```rust
macro_rules! make_functions {
    (
        // Create a function for each name
        names: [ $names($name:ident),+ ],
        // Optionally specify a greeting
        $greetings( greeting: $greeting:literal, )?
    ) => {
        $names(
            // Create a function with the specified name
            fn $name() {
                println!("function {} called", stringify!($name));
                // If a greeting is provided, print it in every function
                $greetings( println!("{}", $greeting) )?
            }
        )+
    }
}

fn main() {
    make_functions! {
        names: [foo, bar],
        greeting: "hello!",
    }

    foo();
    bar();
    // output:
    // function foo called
    // hello!
    // function bar called
    // hello!
}
```

This expansion is not easily possible without named capture groups because of ambiguity regarding which groups are referred to.
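For comparison, the `make_functions` output can be produced on stable Rust today, but only indirectly. The sketch below uses a common workaround that is not part of this RFC. The optional greeting is bundled into a single `[...]` token tree so that it can be repeated once per `$name`, and the hypothetical internal rules `@each` and `@one` unpack it again. The generated functions return a `String` instead of printing, so the result is easy to check.

```rust
macro_rules! make_functions {
    // Entry point: same input syntax as the named-group example.
    (names: [ $($name:ident),+ ], $(greeting: $greeting:literal,)?) => {
        make_functions!(@each [$($greeting)?] $($name)+);
    };
    // `$greet` is a single token tree captured at depth 0, so it may be used
    // inside the `$name` repetition and is copied into every iteration.
    (@each $greet:tt $($name:ident)+) => {
        $( make_functions!(@one $name $greet); )+
    };
    // Unpack the bundled (zero or one) greeting for a single function.
    (@one $name:ident [$($greeting:tt)?]) => {
        #[allow(unused_mut)] // `lines` is only mutated when a greeting exists
        fn $name() -> String {
            let mut lines = vec![format!("function {} called", stringify!($name))];
            $( lines.push($greeting.to_string()); )?
            lines.join("\n")
        }
    };
}
```

Invoking `make_functions! { names: [foo, bar], greeting: "hello!", }` defines `foo` and `bar`. Each returns `"function <name> called\nhello!"`. The extra internal rules are exactly the kind of indirection that named groups aim to make unnecessary.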
Expansion of the named-group `make_functions` example above will approximately follow this procedural model:

```rust
let mut ret = TokenStream::new();

// Append an expansion for each time `names` is matched
for NamesCaptures { name } in names {
    let mut fn_body = quote! {
        println!("function {} called", stringify!(#name));
    };

    // Append the greeting for each time `greetings` is matched
    for GreetingsCaptures { greeting } in greetings {
        fn_body += quote! { println!("{}", #greeting) };
    }

    // Construct the function item and append to returned tokens
    ret += quote! { fn #name() { #fn_body } };
}
```

Groups can also be used in the expansion without `(...)` to emit their entire capture. This works well with a new "match exactly once" grouping that takes no kleene operator (as opposed to matching zero or more times (`*`), matching once or more (`+`), or matching zero or one times (`?`)).

```rust
macro_rules! rename_fn {
    (
        $newname:ident;
        $pub(pub)? fn $oldname:ident( $args( $mut(mut)? $arg:ident: $ty:ty )* );
    ) => {
        $pub fn $newname( $args );
    }
}
```

# Reference-level explanation
[reference-level-explanation]: #reference-level-explanation

Macro captures currently include the following grammar node:

> `$` ( _MacroMatch<sup>+</sup>_ ) _MacroRepSep_<sup>?</sup> _MacroRepOp_

This will be expanded to the following:

> `$` ( IDENTIFIER_OR_KEYWORD except crate | RAW_IDENTIFIER | _ )<sup>?</sup> ( _MacroMatch<sup>+</sup>_ ) _MacroRepSep_<sup>?</sup> _MacroRepOp_

As a result, `$identifier( /* ... */ ) /* sep and kleene */` will allow naming a capture group. It can then be used in expansion:

```rust
$identifier( /* expansion within group */ ) /* sep and kleene */
```

Groups and metavariables share the same namespace. All groups and metavariables must have a unique name within the macro capture pattern.

Names will remain optional. However, if a capture group is given a name, it _must_ also be referred to by name during expansion. That is, an unnamed group in expansion will never be matched to a named group in the pattern.
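The shared-namespace rule above, together with the rule stated just below that named and unnamed groups may not be mixed, can be sketched as a simple check. This is a minimal, hypothetical model. The `Decl` and `NameError` types are illustrations, not rustc data structures. The check also looks at a single pattern in isolation, whereas the mixing rule applies to the whole macro.

```rust
use std::collections::HashSet;

/// Hypothetical, simplified view of one `$...` item in a macro pattern.
#[derive(Debug)]
pub enum Decl<'a> {
    /// `$name:fragment`
    Metavar(&'a str),
    /// `$name( ... )` when `Some(name)`, or `$( ... )` when `None`
    Group(Option<&'a str>),
}

#[derive(Debug, PartialEq)]
pub enum NameError<'a> {
    /// The same name is used by two groups, two metavariables, or one of each.
    Duplicate(&'a str),
    /// Named and unnamed groups appear together.
    MixedGroups,
}

/// Checks the shared-namespace rule and the no-mixing rule for one pattern.
pub fn check_pattern_names<'a>(decls: &[Decl<'a>]) -> Result<(), NameError<'a>> {
    let mut seen: HashSet<&'a str> = HashSet::new();
    let (mut has_named, mut has_unnamed) = (false, false);

    for decl in decls {
        let name = match decl {
            Decl::Metavar(name) => Some(*name),
            Decl::Group(Some(name)) => {
                has_named = true;
                Some(*name)
            }
            Decl::Group(None) => {
                has_unnamed = true;
                None
            }
        };
        // Groups and metavariables share one namespace, so one set suffices.
        if let Some(name) = name {
            if !seen.insert(name) {
                return Err(NameError::Duplicate(name));
            }
        }
    }

    if has_named && has_unnamed {
        return Err(NameError::MixedGroups);
    }
    Ok(())
}
```

Under this model, a pattern such as `$x( $x:ident )*` is rejected with `Duplicate("x")`, because the group and the metavariable collide in the shared namespace.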
To make expansion rules easier, _it is forbidden to mix named and unnamed groups_ within the same macro.

## Overview of changes

A summary of the implications of this language addition is provided before explaining detailed semantics.

### Nesting repetition expansions

Nesting or intermixing repetition groups is currently not possible, mostly because capture group expansions are ambiguous. Using an example from above:

```rust
macro_rules! make_functions {
    (
        // ↓ group 1
        names: [ $($name:ident),+ ],
        // ↓ group 2
        $( greeting: $greeting:literal, )?
    ) => {
        $( // <- this expansion contains both `$name` and `$greeting`. So is this
           // an expansion of capture group 1 or 2?
            fn $name() {
                println!("function {} called", stringify!($name));
                $( println!("{}", $greeting) )?
            }
        )+
    }
}
```

Adding named capture groups makes this work, since ambiguity is removed.

It is likely possible to adjust the rules for expansion so that the above would work with no additional syntax. However, this RFC posits that referring to groups by name provides an overall better user experience than changing the rules: clearer code, better diagnostics, and an easier model to follow.

### Zero-length capture groups

As a side effect of more precise repetition, groups in expansion that do not contain any metavariables will become more straightforward. For example, this simple counter is not possible as written:

```rust
macro_rules! count {
    ( $( $i:ident ),* ) => {{
        // Error: attempted to repeat an expression containing no syntax variables
        //↓ the compiler does not know which group this refers to (here there
        //  is only one choice, but that is not always the case).
        0 $( + 1 )*
    }};
}
```

Using named groups removes the ambiguity, so this should work:

```rust
// Note: this is just a simple example. Metavariable expressions will provide a
// better way to get the same result with `${count(...)}`.
macro_rules! count {
    ( $idents( $i:ident ),* ) => {{
        0 $idents( + 1 )*
    }};
}
```

Metavariable expressions provide an `${ignore($var)}` operation that enables the same behavior. With named groups, `ignore(...)` will simply not be needed.

There is also no way to act on capture groups that bind only exact tokens but no variables. An example is extracting the `mut` from a function or binding signature:

```rust
/// Sample macro that captures exact syntax and tweaks it
macro_rules! match_fn {
    //                   ↓ We need to be aware of mutability
    (fn $name:ident ($(mut)? $a:ident: u32)) => {
        //       ↓ we want to reproduce the `mut` here
        fn $name($(mut)? $a: u32) {
        //       ^^^^^^^
        // Error: attempted to repeat an expression containing no syntax variables
            println!("hello {}!", $a);
        }
    }
}

fn main() {
    match_fn!(fn foo(a: u32));
    foo(10);
}
```

Adding named capture groups to the above would allow it to work. `${ignore(...)}` does not directly help here.

### Metavariable expressions

Metavariable expressions currently use a combination of three things to select a group: location within the expansion (i.e. which capture groups contain it), the variables captured, and an index that changes the indicated group. For example, `index()` returns the number of the current expansion.

```rust
macro_rules! innermost1 {
    ( $( $a:ident: $( $b:literal ),* );+ ) => {
        [$( $( ${ignore($b)} ${index(1)}, )* )+]
    };
}
```

In order to understand what `index(1)` is referring to here, one must do the following:

- Note how many repetition groups exist in the match expression (2).
- Count how many repetition groups the `index(1)` call is nested in (2).
- Backtrack by one to figure out what exactly is getting indexed (1).

After doing the above, it can be noted that `${index(1)}` in this position will indicate the current expansion of the outer capture group (the group containing only `$a`).

Rewritten to use named groups instead:

```rust
macro_rules! innermost1 {
    ( $outer_rep( $a:ident: $inner_rep( $b:literal ),* );+ ) => {
        [$outer_rep( $inner_rep( ${index($outer_rep)}, )* )+]
    };
}
```

It is significantly easier to see what the call to `index` is referring to. As an added benefit, its meaning will not change if its position is moved in the code (e.g. moving it to be within `$outer_rep`, but not `$inner_rep`).

This RFC proposes that `count`, `index`, and `len` will accept group names in place of a variable and an index. These three expressions relate more to how entire _groups_ are expanded than to the variables they take as arguments.

Further reading:

- [`macro_metavar_expr` RFC][`macro_metavar_expr`] and [tracking issue](https://github.com/rust-lang/rust/issues/83527)
- [Proposal for possible specific behavior](https://github.com/rust-lang/rust/pull/122808#issuecomment-2124471027)

### "Exactly one" matching and full group emission

If a group is specified without a kleene operator (`*`, `+`, `?`), it will now be assumed to match exactly once. This will be most useful with the ability to emit an entire matched group.

```rust
macro_rules! check_and_pass {
    // `group1` will get matched exactly once
    ($group1(a b $v:ident c $group2($tt:tt)* )) => {
        // All tokens including exact `a` `b` will get passed to `other_macro`
        other_macro!($group1)
    }
}
```

This should make it much easier to work with optional exact matches. Currently there is no way to do anything useful with capture groups that don't capture metavariables, such as `$pub` and `$mut` in the [Guide-level explanation](#guide-level-explanation) example.

_TODO: will this preserve the coercion of tokens to fragments (e.g. tt -> ident)?_

## Detailed semantics

To illustrate detailed semantics, this section uses an example in which the token pattern approximately mirrors the named groups:

```rust
macro_rules! m {
    (
        $outer_a(oa[
            $middle(m[
                $inner(i[
                    $x:ident
                ])*
            ])*
        ])*;
        $outer_b(ob[
            $y:ident
        ])*
    ) => {
        println!("{}", stringify!(/* relevant expansion */))
    }
}

m!(
    oa[
        m[i[x0] i[x1]]
    ]
    oa[]
    oa[m[] m[]];
    ob[y0]
);
```

This can be thought of as a tree structure that loosely matches the token `Group` of the captured items.

```text
Level 0 | Level 1         | Level 2         | Level 3         | Level 4

                                           /- $inner[0,0,0] --- $x[0,0,0,0]
        |-- $outer_a[0] --- $middle[0,0] -|     `i[x0]`           `x0`
        |     `oa[...]`       `m[..]`     |
        |                                  \- $inner[0,0,1] --- $x[0,0,1,0]
        |                                       `i[x1]`           `x1`
        |
        |-- $outer_a[1]
        |     `oa[...]`
$root  -|
(entire |                 /- $middle[2,0]
 macro) |-- $outer_a[2] -|     `m[..]`
        |     `oa[...]`  |
        |                 \- $middle[2,1]
        |                      `m[..]`
        |
        |-- $outer_b[0] --- $y[0,0]
              `ob[...]`       `y0`

Summary of captures:

- `$outer_a`: captured 3 times
- `$middle`: captured 3 times
- `$inner`: captured 2 times
- `$x`: captured 2 times
- `$outer_b`: captured 1 time
- `$y`: captured 1 time
```

In the above diagram, `$name[i_n, ..., i_1, i_0]` gives the path to a capture. `i_0` is its index among the captures of `$name` within its immediate parent capture, and `i_1` is the index of that parent within its own parent. The path continues this way up to `i_n`, the index of the level 1 ancestor.

Some of this section intends to solidify rules that are currently implemented but not well described.

### Definitions

This section uses some common terms to refer to relevant ideas.

- "Pattern" or "capture pattern": the left hand side of the macro, which defines metavariables and gets pattern matched against code.
- "Captured": when a token or tokens are matched by a variable or group. This has some overlap with what rustc refers to as "repeating n times" in error messages, but this RFC seeks to make this less ambiguous.
- "Expansion": the right hand side of the macro, which uses metavariables and new tokens to update the file's AST.
- "Contents": whatever is contained within a group.
- "Level": a nesting level (or generation in the tree diagram). Adding a new group around a metavariable increases its nesting level.
- "Parent": any group that is at a higher level than the subject (the "child") and shares a direct lineage. E.g. `$inner` and `$middle` are both parents of `$x`.
- "Immediate parent": a parent exactly one level above. E.g. `$inner` is an immediate parent of `$x`, but `$middle` is not.
- "Capture parent", "capture level", "capture contents": a parent, level, or group contents in the capture, which may not exist or be the same in the expansion.
- "Expansion parent", "expansion level", "expansion contents": a parent, level, or group contents in the expansion, which may not exist or be the same in the capture.

### Expansion rules

In current macro expansion, the following rules are observed:

1. Groups must contain at least one metavariable (`$()*` fails with "repetition matches empty token tree").
2. Metavariables must be expanded with the same level of nesting in which they are captured. That is, if a metavariable is captured within two nested groups as `$($($v:ident),*),*`, it may only be expanded as `$($($v)*)*`.
3. Metavariables _or groups_ at the same level of the capture group must be captured the same number of times.

This is easier to visualize with examples. Named capture groups are used to make things clearer, even though this section discusses the current unnamed groups.

```rust
// Possible expansions for the above sample macro

// Ok: prints `x0 x1`
$outer_a( $middle( $inner( $x )* )* )*

// Ok: prints `y0`
$outer_b( $y )*

// Forbidden: "variable 'x' is still repeating at this depth"
$??( $x )*
$x

// Forbidden: "attempted to repeat an expression containing no syntax variables
// matched as repeating at this depth"
// Basically, it is unable to determine what the groups are supposed to refer to.
$outer_b( $??( $??( $y )* )* )*

// Forbidden: "`x` repeats 3 times, but `y` repeats 1 time"
// This is an example diagnostic where referring to groups by their captures
// doesn't work well; `x` is actually only captured twice, but _`$middle`_ is
// captured three times.
$??( $??( $??( $x $y )* )* )*

// Forbidden: "`x` repeats 3 times, but `y` repeats 1 time"
// Makes sense; if `$combined` must refer to only a single group then it has
// no way to pick between `$outer_a` and `$outer_b`.
$combined( $middle( $inner( $x )* )* $y )*

// ...except the above actually works with the following invocation, printing
// `x0 y0`, because `$middle` (level 2) and `$y` (also level 2) are captured
// the same number of times. This is an example of invocation-dependent
// expansion correctness that this RFC hopes to minimize.
m!( oa[ m[i[x0]] ]; ob[y0] );
```

With named repetition groups, these rules will be changed to the following:

1. Group expansions no longer need to contain any metavariables.
2. In expansion, the group will repeat as many times as its entire pattern was captured, independent of its expansion contents.
3. Captured variables or groups may only be expanded within their parent group, if any. However, the capture parent does not need to be the immediate expansion parent.
4. If a group name is given in the expansion with no `(...)`, the entire capture of that group is re-emitted, _including exact literals_.

These are detailed in the following sections.

#### Expansion within an immediate parent

If a group or metavariable has capture parents, it must be scoped within those same parents for expansion (though they need not be immediate).

```rust
// Correct: prints `x0 x1`
$outer_a( $middle( $inner( $x )* )* )*

// Correct, emits entire middle group. Result: `m[i[x0] i[x1]] m[] m[]`
$outer_a( $middle )*

// Skipping a group level
// Error: `$inner` must be contained within a `$middle` group, but it is within
// `$outer_a`
$outer_a( $inner( $x )* )*

// No grouping at all
// Error: `$x` must be within an `$inner` group, but it is not within any group
$x
```

_TODO: if `foo[x0] foo[] foo[x2]` is captured and the expansion is `$foo( $x ),+`, should `x0, x2` or `x0, , x2` (extra comma) be emitted?
Probably the first one._

A possible relaxation of this rule is to allow groups or variables to expand to all captures within that level when not nested within the immediate parent. It is TBD whether this should be part of this RFC or a future possibility.

```rust
// Expands to `x0 x1`
$x

// Expands to `x0 x1`
$outer_a( $middle( $x )* )*
```

#### Expansion within non-parent groups

If a group or metavariable is nested within a group that is not a capture parent, it should be repeated as many times as that group. It must still have its capture parents as expansion parents so as not to break other rules, but they need not be the immediate parents.

To avoid edge cases with metavariable expressions, a group is not allowed to be nested within itself.

```rust
// Correct: prints `y0 y0 y0`
// This is because the _entire_ expansion of `$outer_b` (one instance of $y)
// is repeated once for each `$outer_a` (three instances)
$outer_a( $outer_b( $y ) )

// Correct: prints `ob[ oa[y0 i[x0 y0] i[x1 y0]] oa[y0] oa[y0]]`.
// Explanation:
// - `root > outer_a > middle > inner > x` and `outer_b > y` ordering are both
//   still respected, even though they are interleaved
// - `outer_b` repeats once within root
// - `outer_a` repeats three times within root, so repeats 3x within `outer_b`
// - `inner` repeats `[2, 0, 0]` times within the `outer_a` instances. This
//   drives how often `x` and `y` get repeated within that group.
$outer_b( ob[$outer_a( oa[$y $middle( $inner( i[$x $y] ) )] )] )

// Forbidden: `x` is missing parent `outer_a`
$outer_b( $middle( $inner( $x $y ) ) )

// Forbidden: group nesting within itself
$outer_b( $outer_b( $y ) )
$outer_b( $outer_b )
```

#### Single matches

Capture groups must currently specify a kleene operator (`*`, `+`, or `?`) that determines whether the group matches zero or more times, one or more times, or at most once. This RFC will allow omitting the kleene operator to indicate that a group must be captured exactly once.
That is, `$group(foo)` is a valid pattern (with current rules, `$(foo)` is forbidden). With this "exactly once" match there is no purpose in having a separator token (e.g. the comma in `$(...),*`), so it must be omitted.

#### Entire group expansion

Since groups are named, writing a group's name now re-emits its captured contents with no further expansion. This syntax uses the group name but omits the `(/* expansion pattern */)/* kleene */`:

```rust
// Ok: prints `ob[y0]`
$outer_b
```

The entire contents of the capture group are emitted, including both exact tokens and anything that would be bound to a metavariable. Spans from the macro invocation can be kept here, which should improve the diagnosability of some macros.

The above rules regarding allowed group usage locations must still be followed.

### Metavariable Expressions

This RFC proposes some changes to metavariable expressions that leverage named groups to make them more user-friendly.

_At the time of writing, parts of macro metavariable expressions are under consideration for stabilization. Depending on what is selected, these rules may need to change slightly._

- `${index($group)}`: Return the number of times that the group has been _expanded_ so far. Must be used within the group that is given as an argument.
- `${count($metavar)}`: Return the number of times a group or metavariable was _captured_.
- `${len($metavar)}`: Because `count` becomes more flexible, `len` is no longer needed and can be removed.
- `${ignore($metavar)}`: if this RFC is accepted, then `ignore` can be removed. It is used to specify which capture group an expansion group belongs to when no metavariables are used in the expansion. With named groups, however, this is specified by the group name rather than by contained metavariables.

#### `${index($group)}`

The `index` metavariable expression is used to indicate the number of times a group has been expanded so far. It can be thought of as a form of `enumerate`.
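As a rough analogy for the `enumerate` comparison, the following sketch models the expansion `$outer_a( o $middle( m outer_idx ${index($outer_a)} ),* );*` from the examples in this subsection using ordinary Rust iterators. It assumes a hypothetical input where `middles_per_outer[i]` is the number of `$middle` captures inside the `i`th `$outer_a` capture. For the `m!` invocation in the detailed-semantics example, that input is `[1, 0, 2]`. The function name is illustrative only, and the separators from `),*` and `);*` are written out explicitly.

```rust
/// Hypothetical model of `$outer_a( o $middle( m outer_idx ${index($outer_a)} ),* );*`.
pub fn expand_outer_idx(middles_per_outer: &[usize]) -> String {
    middles_per_outer
        .iter()
        .enumerate()
        .map(|(outer_idx, &n_middle)| {
            // `outer_idx` plays the role of `${index($outer_a)}`: it stays fixed
            // for the whole body of one `$outer_a` expansion, even inside the
            // nested `$middle` group.
            let middles: Vec<String> = (0..n_middle)
                .map(|_| format!("m outer_idx {outer_idx}"))
                .collect();
            if middles.is_empty() {
                String::from("o")
            } else {
                // `),*` separates `$middle` expansions with a comma
                format!("o {}", middles.join(", "))
            }
        })
        .collect::<Vec<_>>()
        // `);*` separates `$outer_a` expansions with a semicolon
        .join("; ")
}
```

With `[1, 0, 2]` this returns `o m outer_idx 0; o; o m outer_idx 2, m outer_idx 2`. The index comes from iterating the `$outer_a` captures, not from where the expression sits among nested groups. This matches the position independence described for `${index($group)}`.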
- Arguments: one required argument, `$group`
- Allowed usage: may only be used within `$group`
- Output: the number of times the _current expansion of `$group`_ has repeated so far. That is, if `$group` is captured twice but used more than twice in the expansion, `${index($group)}` will still only ever return 0 or 1.
- Changes from current implementation:
  - Takes a group as an argument, not a depth
  - The argument is no longer optional

In the tree diagram, this can be thought of as returning the final number for the given group in the `[i_n, ..., i_0]` index list.

```rust
// Ok: prints `o0 m0 i0 i1; o1; o2 m0 m1;`
$outer_a( o ${index($outer_a)} $middle( m ${index($middle)} $inner( i ${index($inner)})* ),* );*

// Ok: prints `o m outer_idx 0; o; o m outer_idx 2 m outer_idx 2`
$outer_a( o $middle( m outer_idx ${index($outer_a)} ),* );*

// Ok: prints `ob oa0 oa1 oa2`
// The outer repetition (`outer_b`) has no influence on `index`
$outer_b( ob $outer_a( oa ${index($outer_a)} ),* )*

// Forbidden: not used within a `$middle` group
$outer_a( ${index($middle)} $middle );*
```

The location of `index` within its group does not affect its output. That is, all of the below will return 0:

```rust
// For this example, `$g1` is captured exactly once. All other groups are
// captured any number of times

// Prints `0`
$g1( ${index($g1)} )

// Increasing the nesting does nothing; still returns `0` for each `g2` capture
$g1( $g2( ${index($g1)} )* )

// Still returns `0`
$g1( $g2( $g3 ( ${index($g1)} )* )* )
```

#### `${count($name)}`

`count` is used to return the number of times a group or variable has been captured. It can be used in any location, but its exact behavior is location-dependent.

- Arguments: one required argument, `$group` or `$metavariable`
- Allowed usage: may be used anywhere within the expansion, but some arguments may be disallowed.
- Output: the number of times a group or metavariable was captured, with some scoping specifics.
- Changes from current implementation:
  - Can take a group as an argument
  - Functionality combined with `len`

Looking at a group or variable that is more deeply nested returns how many times it was captured in the current repetition. Looking at a group or variable that is less deeply nested returns the total number of times it was captured.

This can be represented as a simple tree-walking algorithm over the _capture_ tree to determine what gets counted. The position in the _expansion_ determines where to start, and then the following rules are applied:

- If `level($name)` >= `level(expression)` (more deeply nested), walk all descendants, including those of neighbors, and count each `$name`
- If `level($name)` < `level(expression)` (less deeply nested):
  - Walk the entire tree and count each `$name`
  - Reject code with an error if `$name` is not an ancestor

```rust
/* looking at descendants */

// Ok: prints `[oa 3, m 3, i 2, x 2; ob 1, y 1]`
// Demo printing totals. Expansion is at the root (level 0), so each variable
// is at a deeper level; the entire tree is walked and all instances are counted.
[
    oa ${count($outer_a)},
    m ${count($middle)},
    i ${count($inner)},
    x ${count($x)};
    ob ${count($outer_b)},
    y ${count($y)},
]

// Ok: prints `[1 0 2]`
[$outer_a( ${count($middle)} )*]

// Ok: prints `[2 0 0]`
[$outer_a( ${count($inner)} )*]

// Ok: prints `[2 0 0]`
[$outer_a( ${count($x)} )*]

// Ok: prints `o[m[x 2]] o[] o[m[x 0] m[x 0]]`
// `$middle[0,0]` "sees" two `$x` captures. Neither `$middle[2,0]` nor
// `$middle[2,1]` sees any.
$outer_a( o[$middle( m[x ${count($x)}] )*] )*

/* looking at ancestors */

// Ok: prints [3 3 3]
// `count` is used at level 2 (one deeper than `outer_a`), so it sees all
// `outer_a`s each time.
[$outer_a( ${count($outer_a)} )*]

// Ok: prints [[3] [] [3 3]]
// Similar to the above
[$outer_a( [$middle( ${count($outer_a)} )*] )*]

/* errors */

// Error: trying to count a variable that is neither a descendant nor an ancestor
[$outer_a( ${count($y)} )*]
```

TODO: we could relax the final rule and allow counting siblings of ancestors.

# Drawbacks
[drawbacks]: #drawbacks

Why should we *not* do this?

- If [`macro_metavar_expr`] stabilizes before this merges, this will add a duplicate way of using those expressions. If this RFC is accepted, stabilizing only a non-conflicting subset of metavariable expressions should be considered.

# Rationale and alternatives
[rationale-and-alternatives]: #rationale-and-alternatives

- In macro metavariable syntax, using named capture groups, we could treat `count` and `index` as *fields* rather than *functions*. For instance, we could write `${$group.index}` rather than `${index($group)}`. This would be consistent with the [macro fragment fields](https://github.com/rust-lang/rfcs/pull/3714) proposal.
- Since we no longer need both `count` and `len`, we could use either name for the remaining function. We should consider whether `count` or `len` best describes this functionality.
- Group definition syntax could be `$ident:(/* ... */)` rather than `$ident(/* ... */)`. Including the `:` is proposed to be more consistent with existing fragment specifiers.
- There is room for macros to become smarter in their expansions without adding named capture groups. As mentioned elsewhere in this RFC, adding named groups seems to be a cleaner solution with less cognitive overhead.

# Prior art
[prior-art]: #prior-art

- Regex allows naming repeating capture groups for expansion, including groups that do not capture anything else.

# Unresolved questions
[unresolved-questions]: #unresolved-questions

- Syntax: the [original proposal] was to include a colon, e.g. `$group1:(/* ... */)`.
  A label-like syntax of `$'group1 $(/* ... */)` was also proposed.
- **Possibly edition-sensitive:** the proposed syntaxes are currently rejected under the `missing_fragment_specifier` lint. That means that `#![allow(missing_fragment_specifier)]` makes rustc accept the proposed syntax as valid, which could conflict with this proposal.

# Future possibilities
[future-possibilities]: #future-possibilities

- Macros 2.0: if accepted, the same rules expressed in this RFC should also apply to Macros 2.0. Macros 2.0 may even opt to forbid unnamed capture groups.
- A `${count_in($var, $group)}` expression that allows further scoping of `count` (TODO: describe this better).

[original proposal]: https://github.com/rust-lang/rfcs/pull/3649#discussion_r1618998153
[`macro_metavar_expr`]: https://rust-lang.github.io/rfcs/3086-macro-metavar-expr.html

---

# Discussion

## Attendance

- People: Josh, Niko, Tyler, TC, Eric Holk, Trevor Gross, Tomas Sedovic, Yosh

## Meeting roles

- Driver: TC
- Minutes: Tomas Sedovic

## Background by Trevor Gross

Trevor: We were looking to stabilize the metavariable expressions as they exist. While trying to write the documentation, I couldn't wrap my head around the indexes, and I thought it would be easier to name groups. That would also fix the issues with nested groups, and document what the compiler actually does.

TC: When you go back to the RFC, you can also use what we've changed in the Reference with the ABNF-like grammar.

## Vibe check

### Josh

:ship: Ship it! :ship:

And, let's make sure we only ship the version of macro metavariable expressions that uses group names to reference groups, not the version that uses a representative variable inside the group.

:+1: for *either* the proposed version `$name(group)` or the version with a colon `$name:(group)`. Mild preference for the former. I do appreciate the concept of `:` setting the "type" of a name, and a group being a kind of user-defined type.
But I don't think that, or reserving future syntax space, is worth adding a speedbump.

Seconding Niko's point below that this should be the primary approach we teach.

### Niko

This closes a major gap in our macros, I like it, and I think it makes them easier to teach, to boot. You can start with named groups and then show how the names can be left off. I personally think I'll be using them quite a bit. So yes, I'm in favor.

I *think* syntax-wise I prefer `$name(foo)`, it seems very elegant and minimal, I suppose it might be a bit confusing.

I like `$group.index` btw =) but it seems independent.

### Tyler

I think this RFC is great, and we should do it. I'm excited about the idea and the ability to unblock `macro_metavar_exprs`, which are useful, but which I never want to stabilize with the index-based interface.

### TC

This RFC is well written and well motivated. Thanks to tgross for that. Let's ship it.

On the question of the `$name(..)` or `$name:(..)` syntax, probably the former makes a bit more sense to me. We use the `:` in an ascriptive-like manner in macro fragments to notate the kind of fragment specifier, and this isn't doing that exactly.

On the question of `len` and friends, I agree that a postfix syntax would be nice here.

Agreed as well that we treat named groups as the primary approach going forward and treat unnamed groups as a legacy behavior.

### Eric

I like it! I really appreciate how thorough the RFC is, covering a variety of examples and interactions with existing and future features.

## Semantics when referencing things outside the group

nikomatsakis: In [this section](https://hackmd.io/Z8DkMrR6Siufajgj0Cmefw?view#Nesting-repetition-expansions), it says

> Adding named capture groups makes this work, since ambiguity is removed.

referring to

```rust
macro_rules! make_functions {
    (
        // ↓ group 1
        names: [ $($name:ident),+ ],
        // ↓ group 2
        $( greeting: $greeting:literal, )?
    ) => {
        $( // <- this expansion contains both `$name` and `$greeting`.
           // So is this an expansion of capture group 1 or 2?
            fn $name() {
                println!("function {} called", stringify!($name));
                $( println!("{}", $greeting) )?
            }
        )+
    }
}
```

I have indeed hit this limitation a number of times and it is quite frustrating -- ah, I guess I see the answer to my question, but I will ask it anyway. The reference to `$greeting` here, it cannot appear without a (fresh) enclosing `$()`, right? e.g., would this be an error?

```rust
macro_rules! make_functions {
    (
        // ↓ group 1, now named
        names: [ $names($name:ident),+ ],
        // ↓ group 2
        $greetings( greeting: $greeting:literal, )?
    ) => {
        $names($name $greeting)+
    }
}
```

I guess the rules are that only metavariables from the named repetition (or one that is enclosing) can appear? So e.g. this would be ok and it would repeat "name" once per greeting?

```rust
macro_rules! make_functions {
    (
        // ↓ group 1, now named
        names: [ $names($name:ident),+ ],
        // ↓ group 2
        $greetings( greeting: $greeting:literal, )*
    ) => {
        $names($greetings($name $greeting)*)+
    }
}
```

Niko: I think what you were saying is: I can make a loop over names and have a group `$names` but I can't make a group.

Trevor: I think I omitted a name; I've updated the example. I don't think you could nest that. It seemed difficult to make the semantics work.

Niko: I think this RFC opens the door to making it work.

Trevor: If you search the RFC for "must be contained within a": whatever the hierarchy is on the capture side, the same hierarchy needs to be on the output side.

Niko: If the groups are different, I don't think it's ambiguous. It would get ambiguous if you nested a loop for greetings inside a loop of greetings.

tmandry: Let me read that back: are you saying we can make the third example here compile and be equivalent to the first example?

```rust
// Correct: prints `x0 x1`
$outer_a( $middle( $inner( $x )* )* )*

// Correct, emits entire middle group.
// Result: `m[i[x0] i[x1]] m[] m[]`
$outer_a( $middle )*

// Skipping a group level
// Error: `$inner` must be contained within a `$middle` group, but it is within
// `$outer_a`
$outer_a( $inner( $x )* )*
```

Trevor: That's a slightly different case.

Niko: Here's a simple example:

```rust
macro_rules! {
    (
        A: [$anames($aname:ident)*],
        B: [$bnames($bname:ident)*],
    ) => {
        $anames( $aname )* // generates a0, a1, a2
        $bnames( $bname )* // generates b0, b1, b2
        $anames( $bnames( ($aname $bname) )* )* // generates: (a0 b0) (a0 b1) (a0 b2) ...
        $bnames( $anames( ($aname $bname) )* )* // generates: (a0 b0) (a1 b0) (a2 b0) ...
        $anames( ($aname $bname) )* // ERROR: $bname is not "being iterated over"
        $anames(
            $anames(
                $aname // ambiguous -- which iteration is this?
            )
        )* // ERROR: can't nest group twice
    };
}
```

Trevor: "Expansion within non-parent groups" is the relevant section of the RFC. And you can do that.

Trevor: The example in the RFC (from the section "Expansion within non-parent groups") is:

```rust
macro_rules! m {
    (
        $outer_a(oa[ $middle(m[ $inner(i[ $x:ident ])* ])* ])*;
        $outer_b(ob[ $y:ident ])*
    ) => {
        println!("{}", stringify!(/* relevant expansion */))
    }
}

// Correct: prints `y0 y0 y0`
// This is because the _entire_ expansion of `$outer_b` (one instance of $y)
// is repeated once for each `$outer_a` (three instances)
$outer_a( $outer_b( $y ) )

// Correct: prints `ob[ oa[y0 i[x0 y0] i[x1 y0]] oa[y0] oa[y0]]`.
// Explanation:
// - `root > outer_a > middle > inner > x` and `outer_b > y` ordering are both
//   still respected, even though they are interleaved
// - `outer_b` repeats once within root
// - `outer_a` repeats three times within root, so repeats 3x within `outer_b`
// - `inner` repeats `[2, 0, 0]` times within the `outer_a` instances. This
//   drives how often `x` and `y` get repeated within that group.
$outer_b( ob[$outer_a( oa[$y $middle( $inner( i[$x $y] ) )] )] )

// Forbidden: `x` is missing parent `outer_a`
$outer_b( $middle( $inner( $x $y ) ) )

// Forbidden: group nesting within itself
$outer_b( $outer_b( $y ) )
$outer_b( $outer_b )
```

Trevor: `outer_a` and `outer_b` are at the same level on the left-hand side, but on the right-hand side they're nested.

Niko: It may be possible to allow nested repetitions, with the assumption that inner groups shadow/replace outer ones.

Trevor: Left for a future extension.

## "Match exactly once"

> This works well with a new "match exactly once" grouping that takes no kleene operator (as opposed to matching zero or more times (*), matching once or more (+), or matching zero or one times (?)).
>
> ```rust
> macro_rules! rename_fn {
>     (
>         $newname:ident;
>         $pub(pub)? fn $oldname:ident( $args( $mut(mut)? $arg:ident: $ty:ty )* );
>     ) => {
>         $pub fn $newname( $args );
>     }
> }
> ```

tmandry: Huh? I don't see an example of that above.

Trevor: Sorry, the `?` after `pub` shouldn't be there. I'll come up with a better example, since you could also just match a `pub` literal here.

## Grammar update for "match exactly once"

Josh: If I'm reading the proposed grammar update correctly, it doesn't include a change to mark the Kleene operator as optional.

nikomatsakis: A question on match exactly once: I have definitely wanted it and I'm glad it's being added, but what happens with (e.g.) `$foo(pub),`? I think this is treated as an "exactly once" group followed by a comma, right? But basically that is true only because there is no Kleene operator...? I'd kind of like an explicit way to say "exactly once"; that's what I'm getting at. (In general our behavior here is weird, e.g., you can't have a `+` as a separator. It's not as flexible as it should be, but I suppose that's a pre-existing problem.) This is a total nit, just trying to understand.

Trevor: Maybe this should specify that matching once ignores separators?
So `$foo(pub),` would be treated as a group `$foo(pub)` followed by a `,` literal.

Josh: I think it makes sense to say that we only parse a separator if we're parsing a Kleene op, yeah.

TC: So if you have the comma after the group and you don't have a Kleene operator after that, we'd parse the comma but treat it as a token that follows the group.

Josh: We currently have issues where we don't allow you to write something separated by `+`, because we treat the `+` as a Kleene operator rather than a `+` separator. That's something we could fix in the future.

TC: We would need, e.g., some unambiguous syntax for separators.

## Ambiguity

Trevor: Also note there is an ambiguity here that needs to be addressed. `$group (...)` could be either "expand `$group` using `(...)`" or "drop `$group` in place; `(...)` comes next". Bjorn proposed `$...group` as a way around this.

Trevor: This needs some thought.

Josh: I think there are a few different ways we can address it. One way: add a space after the group name. Or we could add some sort of disambiguator here.

TC: I'd use two dots rather than three. `$..group` has a nice parsimony with how we do struct construction.

Josh: I'm assuming that would *only* be used if needed to disambiguate?

Trevor:

```rust
macro_rules! foo {
    ($group($a:ident)) => {
        $group ()
    }
}

// Does this expand to `` (empty) or `x ()`?
foo!(x);
```

Tyler: If we add variadic generics, we may want to make the number of dots match the number of arguments.

Trevor: +1 to Tyler's comment.

Josh: Another alternative: `${$group}` or `${group}`, i.e., a macro metavariable expression. This would be consistent with other extensions to macro metavariable expressions, e.g., `${var.field}` and `${var}`.

TC: Trevor, does what Josh proposed make sense to you?

Trevor: I believe so.

Josh: `${group}` would be the proposal I'd prefer, because it would be easy to remove ambiguity by going from `$group` to `${group}`. It also works like this in shell.
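
The two readings of Trevor's `$group ()` example can be made concrete with a rough model in today's Rust. This is a sketch under stated assumptions, not how rustc represents expansion. A named group's match is modeled as a list of repetitions, each a list of token strings rather than real token trees. The helper names `expand_per_repetition` and `splice_then_literal` are hypothetical.

```rust
// Hypothetical model, not rustc internals: a named group's match is a list of
// repetitions, and each repetition is the list of token strings it captured.

/// Reading 1: `$group ( BODY )` expands BODY once per repetition of `$group`.
/// `body` stands in for the transcriber and sees that repetition's captures.
fn expand_per_repetition<F>(repetitions: &[Vec<&str>], body: F) -> Vec<String>
where
    F: Fn(&[&str]) -> Vec<String>,
{
    repetitions
        .iter()
        .flat_map(|rep| body(rep.as_slice()))
        .collect()
}

/// Reading 2: `$group` splices every token the group matched, in order, and
/// the tokens that follow it (here `(` and `)`) are emitted literally afterwards.
fn splice_then_literal(repetitions: &[Vec<&str>], following: &[&str]) -> Vec<String> {
    repetitions
        .iter()
        .flatten()
        .chain(following.iter())
        .map(|tok| tok.to_string())
        .collect()
}
```

For `foo!(x)`, the group matches once and captures `x`. Under reading 1, the body `()` references no metavariable, so nothing is emitted. Under reading 2, the result is `x ( )`. Under Josh's proposal, writing `${group}` instead of `$group` would request the second reading explicitly.
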

Tyler: Removing the inner `$` agrees with my proposal below to make the syntax `${index(group)}` and not `${index($group)}`.

TC: I would probably just propose to require the braces all the time, to start, to expand the whole group. As I look at Trevor's example, it occurs to me it could be easy to trip over this. We could always relax it later.

Josh: I don't think this would be something people would be likely to trip up on, because I'd expect it to be uncommon to expand a group with `$group(...)` if the `...` doesn't reference anything inside the group.

Josh: Variables that are inside a group are things that you can't reference except when you are in the group. That makes it unlikely that you could write one thing and have it interpreted as the other.

TC: In the example above, on the RHS we're not referring to any metavariables that are inside the group. So what would this expand to?

Josh: I don't think there's any obvious way that could expand to `x`. Does this expand to nothing or to `x ()`? The question is: is there a sensible reason to write "I want to expand this group, but I want to expand it to something that doesn't reference anything in the group"? And I don't think there's any reason to do that.

Josh: If you have a repeat on it (like a Kleene operator), then it makes sense to expand this that many times. But generating a set of tokens for the group without a Kleene operator doesn't seem to serve any semantic function.

TC: I'd have to think hard about whether there's any viable use case for it. But looking at the example above, it makes sense to me for it to return empty.

Tyler: Are you saying when there's no Kleene operator on the right-hand side of the macro...

Josh: If the group has a Kleene operator, it's unambiguous whether it's expanding or not, by looking at the Kleene operator. If the group doesn't have an operator, there's no need to reference the group and you can expand it directly.

Tyler: If you're expanding the group, you don't write the Kleene operator.

```rust
macro_rules! foo {
    ($group($a:ident),+) => {
        $group (println!("foo");)
    }
}
```

Tyler: The snippet on the right-hand side of the arrow doesn't change whether or not there's an operator on the left-hand side.

Josh: You're right that you could write that and expect it to mean something. This is an ambiguity we need to solve.

Niko:

```rust
macro_rules! foo {
    ($($a:ident),*) => {
        $($a) ,* // does what today?
    }
}
```

Josh: I think right now it just gives a parse error, though you could construct an example where it wouldn't.

Niko: In general I agree with TC that having a space be significant would be confusing. On the other hand, macros are one of the weird places where the syntax is different enough.

Niko: What harm does it do to always require the braces?

Josh: Mostly that it's annoying. When you're matching the group and not referencing what's inside of it, it'd be nice to have `$group` return the group. We could add a warning/rustfix.

Niko: I don't mind it being explicit, but it doesn't seem very obvious to me. I want some sort of splat operator or something. We're not using the braces in an analogous way.

Josh: We are: `${}` gives you an expression where you can count/index/concat this. We're talking about adding macro fragment fields (`${some_capture.field}`). If you add that, then `${group}` is a logical extension of that. And that should work for things other than groups; `${x}` should work with `$x:ident` too.

Niko: Should we add something like `for` to begin the iteration (rather than just the left parenthesis)? Could it be `$for x (`?

Josh: That would be harder because we didn't reserve keywords, so right now you can use `$for` as a variable.

TC: That is fixable over editions.

Josh: It's not obvious there's a win.

Josh: But it seems like your broader point is: what if we changed the invocation syntax from `$group (` to something else?
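
Niko's `$($a) ,*` question is about today's `macro_rules!`, so it can be checked on stable Rust without the RFC. The sketch below is a minimal example; the macro name `idents_to_strs` is illustrative and not from the discussion. Its transcriber puts a space between the closing `)` of the repetition and the `,*` separator/Kleene pair. Each captured identifier goes through `stringify!` so the result can be inspected at runtime.

```rust
// Stable Rust: the repetition is written `$( ... ) ,*`, with a space before the
// `,` separator, mirroring Niko's `$($a) ,*` transcriber.
macro_rules! idents_to_strs {
    ($($a:ident),*) => {
        // Each repetition becomes a string literal; `,` separates them in the array.
        [$(stringify!($a)) ,*]
    };
}
```

`idents_to_strs!(x, y, z)` evaluates to `["x", "y", "z"]`. The space before `,*` does not change how the repetition is parsed, which matches what Niko reports next.
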

Niko: The answer to my question earlier about the spaces seems to be that it works just fine.

```rust
macro_rules! foo {
    ($($a:ident),*) => {
        $($a) ,* // does what today?
    }
}

fn main() {
    let x = 1;
    foo!(x); // compiles ok, expands to `x;`
}

// this suggests to me that `$foo ()` "ought" to work, which suggests to me that
// `${foo}` should be the default, with `$foo` allowed only for "leaf" variables
// as a shorthand
```

TC: Given the proposed disambiguation, I want to confirm some behavior:

```rust
macro_rules! m1 {
    ($group($a:ident),+) => {
        ${group}()
    }
}
m1!(x); // -> `x()`

macro_rules! m2 {
    ($group($a:ident),+) => {
        ${group}($a)
    }
}
m2!(x); //~ ERROR: referencing $a without being inside a group

macro_rules! m3 {
    ($group($a:ident),+) => {
        ${group($a)}
    }
}
m3!(x); //~ OK?

macro_rules! m4 {
    ($group($a:ident),+) => {
        $group($a)
    }
}
m4!(x); //~ OK!
```

Niko: When I ask people what they find hard about Rust, macros are always at the top of the list. I'm interested in us doing better.

Trevor: In the initial implementation I'll use `..` or `...` as placeholder syntax; it seems like there are some deeper questions to be answered here.

Josh: That seems fine as a way of handling the disambiguation.

Niko: To reframe TC's concern, I think it is: if `${foo}` and `$foo` are equivalent in certain cases (e.g., leaf variables), you kind of want them equivalent in all cases (but here they would make subtle distinctions). I find that convincing.

TC: Yes, that's a correct reframing.

Trevor: There's something to think about with the `$concat` metavariable expression. If you want to do something like `${concat(var, $var)}`, the `$` is needed to indicate whether you're referring to an identifier from the outer scope or a bit of literal text.

Josh: That would depend on whether the literal is written as just a bare token without any modifier or whether you quote it (e.g. `prefix_` vs ``` `prefix_` ```).

Josh: But the other way we could do it is: every token is a literal except the ones that have `$` in front.

Josh: That should be an unresolved question on concat until we decide how to handle bare idents.

Tyler: Toying with the idea that it should be `${concat({ var }, $var)}`, i.e., that `{}` switches from expression space back into literal space and `${}` switches from literal into expression space.

## Mixing named and unnamed

> To make expansion rules easier, it is forbidden to mix named and unnamed groups within the same macro.

tmandry: This strikes me as probably too strict. Within a rewrite rule would be better. Maybe we can relax it to "no nesting unnamed within named" and vice versa. I think we can easily expand this later with an FCP, but if it's easy to fix now I'd be happier to do it.

Trevor: This comes from looking at the compiler implementation; I think it's much more straightforward/clear if you can assume that all groups are named or no groups are named. And I don't think it's going to be much of an issue in practice, when most macros have a small number of captures.

Trevor: The diagnostic infrastructure is a little bit more straightforward when you have the group names available. When you're referring to a given group, it's better to refer to it by name.

Trevor: Also, it doesn't seem like being able to mix them gains much. Typically you'd have one or two.

Josh: At a minimum, you shouldn't be able to use a named group without using its name. But ideally having one named group should require having them all be named; I agree that we shouldn't allow mixing them.

TC: When I think about how people edit code, and how we were framing this as a better way to do it generally, it seems reasonable to me that people would want to migrate the entire macro to named groups when touching it.

Tyler: That makes sense to me for macros that are, say, 20 lines. But I've seen monstrosities that are difficult to fix all at once.

Josh: I would expect this to apply to each macro rule individually. Naming a group in a macro rule would require you to name every group in *that rule*, but you could still have unnamed groups in *other rules*.

Trevor: That was my intent.

## Entire Group Expansion

eholk: Say I have a macro like this:

```rust
macro_rules! foo {
    ($group($i:ident)*) => { /* omitted */ }
}
```

and then this invocation:

```rust
foo!(a b c d);
```

and I want that to expand to:

```rust
{ a b c d () }
```

Can I write that RHS using entire group expansion? I'd expect to write it like this:

```rust
macro_rules! foo {
    ($group($i:ident)*) => {
        { $group () }
    }
}
```

But it seems like that would look like a group expansion.

Trevor: The "Grammar update for 'match exactly once'" section has a bit about this; it is indeed ambiguous.

eholk: 👍 I guess you could do it the old-fashioned way, where you write `$group($i)` instead. Oh, I also just noticed that basically the exact same issue is addressed in the Ambiguity section above.

(we covered this, moving on)

## Minor nit/question about metavar exprs

```rust
$g1( ${index($g1)} )
```

tmandry: I think we could drop the inner `$` here; thoughts?

```rust
$g1( ${index(g1)} )
```

tmandry: The argument for this is that each instance of `$` sort of "invokes a macro expansion", and the group name is just an argument inside an existing metavar expansion. It seems slightly nicer to me.

Tyler: Trevor brought up the issue with concat; I'll repeat what I dropped above:

> Toying with the idea that it should be `${concat({ var }, $var)}`, i.e., that `{}` switches from expression space back into literal space and `${}` switches from literal into expression space.

Josh: Completely in favor. The only issue is that we'd have to resolve the question around concat, but we can figure out how to quote it.

Josh: I'd like to propose using ``` ` ``` for that. For instance, ``` concat(`prefix_`, name, `_suffix`) ```.

Tyler: I don't hate it.
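
Josh's backtick suggestion can be made concrete with a small interpreter for the argument list. This is a sketch under stated assumptions and does not describe any existing rustc feature. Backtick-quoted arguments are literal text, and bare names are looked up as metavariables, so `name` here stands for `$name`. `ConcatArg`, `parse_args`, and `expand_concat` are hypothetical helpers. The convention is the reverse of Trevor's `${concat(var, $var)}` example, where the `$` marked the metavariable.

```rust
use std::collections::HashMap;

/// One argument of the hypothetical `concat(...)` form.
#[derive(Debug, PartialEq)]
enum ConcatArg<'a> {
    /// Backtick-quoted text such as `` `prefix_` ``: emitted verbatim.
    Literal(&'a str),
    /// A bare name such as `name`: resolved to the fragment bound to `$name`.
    Var(&'a str),
}

/// Classify each comma-separated argument under the proposed quoting rule.
/// Splitting on raw commas is a simplification; real macros work on token trees.
fn parse_args(args: &str) -> Vec<ConcatArg<'_>> {
    args.split(',')
        .map(str::trim)
        .map(|arg| match arg.strip_prefix('`').and_then(|a| a.strip_suffix('`')) {
            Some(text) => ConcatArg::Literal(text),
            None => ConcatArg::Var(arg),
        })
        .collect()
}

/// Concatenate the arguments, looking up bare names in `bindings`.
/// Returns `Err` naming the first unbound metavariable.
fn expand_concat(args: &str, bindings: &HashMap<&str, &str>) -> Result<String, String> {
    let mut out = String::new();
    for arg in parse_args(args) {
        match arg {
            ConcatArg::Literal(text) => out.push_str(text),
            ConcatArg::Var(name) => match bindings.get(name) {
                Some(value) => out.push_str(value),
                None => return Err(format!("unbound metavariable `{name}`")),
            },
        }
    }
    Ok(out)
}
```

With `name` bound to `greet`, ``` concat(`prefix_`, name, `_suffix`) ``` produces `prefix_greet_suffix`. The same interpreter could equally take Tyler's `{ var }` / `$var` convention; only `parse_args` would change.
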

## Next Steps

Trevor: The pain points of syntax and ambiguity match what I expected in the RFC. It seems there are a handful of things that can be expanded or clarified. The group expansion should be an unresolved question, and the handling of `$` versus no `$` should be an orthogonal thing.

TC: Do you have an experimental implementation?

Trevor: I don't. I opened a tracking issue some time ago.

Tyler: Given the consensus, should we propose an FCP?

Josh: +1, assuming we add the above to the unresolved questions.

TC: I'd like to have these answered and polish up the document. If what we did gets you unblocked on the experimentation, then I think we'll end up in a good place, where the implementation informs the RFC and lets us accept an RFC that describes the likely-final user-facing behavior. Then we can accept that and hopefully stabilize soon after.

Trevor: That's what I had in mind too. Going through the implementation can help disambiguate the examples.

Trevor: In my mind, this meeting and a comment on the RFC are sufficient for starting the implementation.