Reserved prefixes considerations

What is this

This document describes the considerations that went into deciding how to implement RFC #3101.

In particular, there are two main implementation strategies under consideration:

  • LEX: Make an edition-dependent lexer and give lexer errors for things like foo"bar", foo#bar, and foo#99.
    • This includes keywords used as prefixes, such as match"foo"
  • JOINTNESS: Make the lexer edition-independent but use "jointness" information in the parser to manage prefixes.
    • So if the parser sees a foo and "bar" token, and determines there is no whitespace between them, and we are in Rust 2021, it would error.
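
To make the difference concrete, here is a toy sketch in Rust. Everything in it is hypothetical and far simpler than rustc's real lexer: the Edition, Tok, and Spaced types and the lex, check_prefixes, and lex_with_edition functions are invented for illustration. It only knows identifiers, escape-free "..." strings, and #, and it ignores the existing r, b, and br prefixes.

```rust
#[derive(Clone, Copy, PartialEq, Debug)]
enum Edition { E2018, E2021 }

#[derive(Debug, PartialEq)]
enum Tok { Ident(String), Str(String), Pound }

/// A token plus "jointness": true if the next character is not whitespace.
#[derive(Debug, PartialEq)]
struct Spaced { tok: Tok, joint: bool }

/// Edition-independent toy lexer (hypothetical, not rustc's).
fn lex(src: &str) -> Result<Vec<Spaced>, String> {
    let chars: Vec<char> = src.chars().collect();
    let mut out = Vec::new();
    let mut i = 0;
    while i < chars.len() {
        let c = chars[i];
        if c.is_whitespace() {
            i += 1;
            continue;
        }
        let tok = if c.is_alphabetic() || c == '_' {
            let start = i;
            while i < chars.len() && (chars[i].is_alphanumeric() || chars[i] == '_') {
                i += 1;
            }
            Tok::Ident(chars[start..i].iter().collect())
        } else if c == '"' {
            let start = i + 1;
            i = start;
            while i < chars.len() && chars[i] != '"' {
                i += 1;
            }
            if i == chars.len() {
                return Err("unterminated string".to_string());
            }
            let body: String = chars[start..i].iter().collect();
            i += 1; // skip the closing quote
            Tok::Str(body)
        } else if c == '#' {
            i += 1;
            Tok::Pound
        } else {
            return Err(format!("unexpected character {:?}", c));
        };
        let joint = i < chars.len() && !chars[i].is_whitespace();
        out.push(Spaced { tok, joint });
    }
    Ok(out)
}

/// JOINTNESS: a parser-side check that runs on already-lexed tokens.
fn check_prefixes(toks: &[Spaced], edition: Edition) -> Result<(), String> {
    if edition != Edition::E2021 {
        return Ok(());
    }
    for pair in toks.windows(2) {
        if let (Tok::Ident(name), true) = (&pair[0].tok, pair[0].joint) {
            if matches!(pair[1].tok, Tok::Str(_) | Tok::Pound) {
                return Err(format!("unknown prefix `{}`", name));
            }
        }
    }
    Ok(())
}

/// LEX: the same rule enforced by the lexer, so no tokens are produced on error.
fn lex_with_edition(src: &str, edition: Edition) -> Result<Vec<Spaced>, String> {
    let toks = lex(src)?;
    check_prefixes(&toks, edition)?;
    Ok(toks)
}
```

In this sketch the observable difference is where the error surfaces: with JOINTNESS, lex always succeeds on foo"bar" and anything that runs before check_prefixes still sees two tokens, whereas with LEX the Rust 2021 caller never receives a token stream at all.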

On balance

Proposed result: we should make an edition-dependent lexer (LEX). The ergonomic cost that JOINTNESS imposes by requiring lexing to be independent of the prefix outweighs its benefit of letting procedural macros handle more inputs, particularly given that LEX is more forwards compatible. There doesn't seem to be a killer argument against LEX.

Considerations

The following points were raised to help in deciding between those two approaches.

JOINTNESS would require lexing to be independent of prefix

Today the lexer lexes things differently depending on the prefix:

  • "foo" is a standard string and cannot, for example, contain invalid UTF-8 or illegal escape sequences like \xXX
  • r"foo" is a raw string: it can (with #) embed ", and backslash sequences such as \n are kept verbatim rather than processed as escapes
  • b"foo" is a byte string and can only contain ASCII
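
As a small illustration of these differences in stable Rust today (the constant names PLAIN, RAW, and BYTES are just for this example):

```rust
// Standard string: escapes are processed, so `\t` becomes one tab character.
// A `\xFF` escape would be rejected here because it is above 0x7F.
const PLAIN: &str = "tab\there";

// Raw string: `#` delimiters allow embedded quotes, and `\n` stays as two
// characters (a backslash and an `n`) because escapes are not processed.
const RAW: &str = r#"say "hi"\n"#;

// Byte string: source characters must be ASCII, but `\xFF` is allowed as a byte escape.
const BYTES: &[u8] = b"abc\xFF";
```

The same quoted body is therefore tokenized under three different rule sets, chosen by the prefix before the lexer has finished the literal.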

There are many future extensions that might want their own lexing rules:

  • f"foo{bar("")}" might want to permit "" recursively inside of code blocks.
  • c"foo", for C strings, might want to support escape sequences.

Using JOINTNESS information would however force those "foo" strings to be lexed according to the traditional rules. We could use # escaping to get raw string behavior and then re-implement the various rules in the parser, but that's not terribly ergonomic (c#"foo"# etc).

Lexer currently has no knowledge of editions and some APIs somewhat depend on that

Using LEX will require the lexer to have knowledge of editions. Existing APIs that tokenize somewhat depend on the lexer being edition-independent, but they can be deprecated.

Some code uses keywords as “prefixes”

For example, match"foo". Unless the lexer knows about keywords, this code will become an error under LEX (though it could be readily accommodated under JOINTNESS). It seems ok for this to be an error though: match does indeed resemble a prefix here, and perhaps we would ultimately want to give some semantics to that.

Forwards compat hazard with JOINTNESS due to macro rules arm ordering

If you have a macro-rules definition like

macro_rules! m { ($x:expr) => { }; ($($y:tt)*) => { } }

then under JOINTNESS the behavior of m!(f#"foo") could change when f#"foo" becomes a legal expression. Fixing that requires making $x:expr parse successfully with macro-rules but error later.

Under LEX this is a non-issue, because lexing fails before macro matching is ever reached.
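
The arm-ordering effect can be seen with inputs that are already legal today. This is a sketch; the macro name which_arm is made up, and it reports which arm matched instead of expanding to nothing:

```rust
// The first arm wins whenever the whole input parses as an expression;
// otherwise matching falls through to the catch-all token-tree arm.
macro_rules! which_arm {
    ($x:expr) => { "expr" };
    ($($y:tt)*) => { "tts" };
}
```

Here which_arm!(1 + 2) selects the first arm, while which_arm!(a b) selects the second because a b is not a single expression. The hazard described above is that a token sequence currently in the second category would silently move to the first if it later became a legal expression.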

With JOINTNESS, procedural macros can prototype things we might want (but therefore can interpret sequences we haven't stabilized yet)

For example, one could write a macro to handle f"" strings by examining jointness (people do this today with span hacks). Even if we later add f strings with different semantics, your macro keeps working, because it operates before the parser comes along.

LEX can be converted to JOINTNESS later

If we give lexer errors now, we can later relax the rules and adopt a jointness-based approach if we want.

Unknowns

Jointness preservation with macro-rules

How well is jointness preserved under macro rules? For example, if I do this, what happens?

macro_rules! m { ($a:tt $b:tt $c:tt) => { $a $b $c } }

Is m!(foo#bar) still considered a series of three joint tokens? What about when they are not adjacent, or one inserts a #:

macro_rules! m { ($a:tt # $c:tt) => { $a # $c } }

  • A lex error prevents us from getting here
  • Jointness means we always produce the same set of tokens
  • But it can change the meaning of $t:expr
  • Jointness cannot be expressed in macro-rules: a macro cannot take multiple tokens and preserve their jointness
  • What happens with += today?
