# Design meeting notes 2020.02.07
## Notes on existing model
### Pre-expansion gating, how it works today
* Code sometimes contains unstable features passed to macros. We need to gate those *before* expansion.
* The parser collects the spans of gated syntax into a ["sink"](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_session/parse/struct.GatedSpans.html)
* This should be equivalent to sanitizing the **concrete** syntax tree, which is more powerful than sanitizing the **abstract** syntax tree.
* Parser currently drives conditional compilation when it comes to modules
    * Parser presently has a loop like:
        * parse module file `foo.rs`
        * identify `mod bar;` declarations in there and determine if they are cfg'd out
        * if not, then open `bar.rs` and parse it
    * We would like to change this so that expansion triggers the parser and drives this process ([#64197](https://github.com/rust-lang/rust/issues/64197))
        * parser is a function invoked by expansion:
            * Input: stream of proc-macro2 tokens
            * Output: an AST (along with some auxiliary information)
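
The expansion-driven flow described above can be sketched roughly as follows. This is a toy model and not rustc's API: `Token`, `Item`, `parse`, and `expand` are hypothetical names, the "file system" is a map from module name to tokens, and cfg handling is reduced to a set of active flag names. The only point it tries to show is that `parse` is a plain function from tokens to items, and that expansion (not the parser) decides whether `bar.rs` gets opened.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical token type standing in for the stream that expansion hands to the parser.
#[derive(Debug, Clone, PartialEq)]
enum Token {
    Cfg(String),   // `#[cfg(name)]`
    Mod,           // `mod`
    Ident(String), // any identifier
    Semi,          // `;`
}

/// Hypothetical AST items.
#[derive(Debug, PartialEq)]
enum Item {
    /// `mod name;` whose file has not been loaded yet.
    ModDecl { name: String, cfg: Option<String> },
    /// A module whose file has been parsed and expanded.
    Mod { name: String, items: Vec<Item> },
    /// Anything else; only the name is kept in this sketch.
    Other(String),
}

/// The parser as a function: tokens in, items out. It never touches the file system.
fn parse(tokens: &[Token]) -> Vec<Item> {
    let mut items = Vec::new();
    let mut pending_cfg: Option<String> = None;
    let mut iter = tokens.iter();
    while let Some(tok) = iter.next() {
        match tok {
            Token::Cfg(name) => pending_cfg = Some(name.clone()),
            Token::Mod => {
                // Error recovery is out of scope here: a malformed `mod` is dropped.
                if let Some(Token::Ident(name)) = iter.next() {
                    items.push(Item::ModDecl { name: name.clone(), cfg: pending_cfg.take() });
                }
            }
            Token::Ident(name) => {
                pending_cfg = None;
                items.push(Item::Other(name.clone()));
            }
            Token::Semi => {}
        }
    }
    items
}

/// Expansion drives module loading: it evaluates cfgs, finds the file, and calls `parse`.
/// Assumption: `files` maps a module name to the tokens of its file.
/// Self-referential modules are not detected in this sketch.
fn expand(
    items: Vec<Item>,
    files: &HashMap<String, Vec<Token>>,
    cfgs: &HashSet<String>,
) -> Vec<Item> {
    let mut out = Vec::new();
    for item in items {
        match item {
            Item::ModDecl { name, cfg } => {
                let enabled = cfg.map_or(true, |c| cfgs.contains(&c));
                if !enabled {
                    continue; // cfg'd out: the file is never opened or parsed
                }
                // A missing file is silently skipped; a real compiler would report an error.
                if let Some(tokens) = files.get(&name) {
                    let children = expand(parse(tokens), files, cfgs);
                    out.push(Item::Mod { name, items: children });
                }
            }
            other => out.push(other),
        }
    }
    out
}
```

With a root file containing `mod bar;`, a `mod win;` gated on a `windows` cfg, and only `unix` active, `expand(parse(&root), &files, &cfgs)` loads `bar` and never looks at the file for `win`.
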
### Other Considerations
* Some things that parser has to handle:
    * Recovery from errors
        * In an IDE, it's important that parsing never fails
        * Today's parser does do a fair amount of recovery but sometimes not enough, e.g. [this example](https://gist.github.com/matklad/5725e97192363c973c985e3d765d7fbf) fails to produce an AST at all
        * But we will do semantic recovery, such as recognizing `auto x = 0` (C++) and trying to convert that to the user's intent (`let x = 0`)
* Proposal is to split the "parsing function" into two steps:
    * parse-to-events: takes in tokens, produces some kind of event stream or other more "neutral" data structure ("untagged trees")
        * also, "neutral" means this step doesn't do interning itself
    * events-to-AST: something that constructs the AST from those events?
    * where "IDE-level recovery" would occur in the first step
    * semantic recovery perhaps in both
* One point of debate is the value of having parser be "typed"
    * i.e., producing a typed data structure like Rust's current AST means that changes to that AST ultimately require changes to the parser
    * and similarly it helps to prevent bugs, as one cannot accidentally parse an expression (instead of, say, a pattern) and put it in a place where it does not belong within the AST
        * this doesn't necessarily have to be done with enums, as it is today; [centril proposes another possible structure here](https://rust-lang.zulipchat.com/#narrow/stream/131828-t-compiler/topic/design.20meeting.202020.2E02.2E07/near/187650896)
    * However, having parser produce untagged trees does have some benefits:
        * better at representing "incomplete trees and trivia nodes" (trivia nodes are things like comments and whitespace, which the compiler doesn't care about but an IDE must)
        * "typed trees exclude some states, but with broken code, any state is possible" -- [matklad](https://rust-lang.zulipchat.com/#narrow/stream/131828-t-compiler/topic/design.20meeting.202020.2E02.2E07/near/187650902)
    * one question: how many types exist?
        * in rustc today, not that many
        * but rust-analyzer has [246 syntax kinds](https://github.com/rust-analyzer/rust-analyzer/blob/1996762b1f2b9cb196cc879f0ce26d28a3c450c8/crates/ra_parser/src/syntax_kind/generated.rs#L12-L245)
            * because it represents more syntactic details
    * some debate about how best to ensure correctness of parser
        * fuzz against defined grammar?
        * rely on typing?
        * both?
        * in practice, how many bugs come up that typing would fix?
* Should the parser produce "events" or a Concrete Syntax Tree?
    * events are simpler and potentially more efficient
    * CST can be readily used in rust-analyzer, but would probably require more memory in rustc (since both the CST and the AST would be in memory at some point).
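
To make the proposed two-step split a bit more concrete, here is a minimal sketch under heavy assumptions. The toy grammar (only `let IDENT = NUMBER;`), the `Kind`, `Token`, `Event`, and `Let` types, and the function names are all hypothetical and not taken from rustc or rust-analyzer. Step 1 turns tokens into a flat, untyped event stream, never fails, and keeps whitespace trivia and error nodes. Step 2 builds a typed AST straight from the events, without materializing a CST in between.

```rust
/// Hypothetical syntax kinds: node, token, and trivia kinds share one untyped enum.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Kind { Root, LetStmt, Error, LetKw, Ident, Eq, Number, Semi, Whitespace }

#[derive(Debug, Clone, PartialEq)]
struct Token { kind: Kind, text: String }

/// "Neutral" output of step 1: a flat event stream carrying raw text, no interning.
#[derive(Debug, PartialEq)]
enum Event { Start(Kind), Tok(Token), Finish }

/// Typed AST for the toy grammar's only construct: `let IDENT = NUMBER;`.
#[derive(Debug, PartialEq)]
struct Let { name: String, value: i64 }

/// Emit the current token as an event and advance.
fn bump(tokens: &[Token], pos: &mut usize, events: &mut Vec<Event>) {
    events.push(Event::Tok(tokens[*pos].clone()));
    *pos += 1;
}

/// Step 1 (parse-to-events): never fails. Unexpected tokens become `Error` nodes
/// and incomplete statements are closed with whatever was seen so far.
fn parse_to_events(tokens: &[Token]) -> Vec<Event> {
    let mut events = vec![Event::Start(Kind::Root)];
    let mut pos = 0;
    while pos < tokens.len() {
        match tokens[pos].kind {
            Kind::Whitespace => bump(tokens, &mut pos, &mut events),
            Kind::LetKw => {
                events.push(Event::Start(Kind::LetStmt));
                let shape = [Kind::LetKw, Kind::Ident, Kind::Eq, Kind::Number, Kind::Semi];
                for expected in shape.iter() {
                    // Trivia is kept in the stream, attached to the enclosing node.
                    while pos < tokens.len() && tokens[pos].kind == Kind::Whitespace {
                        bump(tokens, &mut pos, &mut events);
                    }
                    if pos < tokens.len() && tokens[pos].kind == *expected {
                        bump(tokens, &mut pos, &mut events);
                    } else {
                        break; // IDE-level recovery: stop here, keep the partial node
                    }
                }
                events.push(Event::Finish);
            }
            _ => {
                events.push(Event::Start(Kind::Error));
                bump(tokens, &mut pos, &mut events);
                events.push(Event::Finish);
            }
        }
    }
    events.push(Event::Finish);
    events
}

/// Step 2 (events-to-AST): builds typed nodes directly from events.
/// Incomplete or erroneous statements simply produce no `Let`.
fn events_to_ast(events: &[Event]) -> Vec<Let> {
    let mut out = Vec::new();
    let mut current: Option<(Option<String>, Option<i64>)> = None;
    for event in events {
        match event {
            Event::Start(Kind::LetStmt) => current = Some((None, None)),
            Event::Start(_) => {}
            Event::Tok(tok) => {
                if let Some((name, value)) = current.as_mut() {
                    match tok.kind {
                        Kind::Ident => *name = Some(tok.text.clone()),
                        Kind::Number => *value = tok.text.parse().ok(),
                        _ => {}
                    }
                }
            }
            Event::Finish => {
                if let Some((Some(name), Some(value))) = current.take() {
                    out.push(Let { name, value });
                }
            }
        }
    }
    out
}
```

In this sketch the event stream is lossless (concatenating the token texts gives back the input), so an IDE could build a CST from the same events, while a compiler could go straight to the AST as `events_to_ast` does. Interning, if wanted, would live in the second step.
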