# End-to-end queries in rustc
## Brief description of the early compilation
Compilation of a crate starts by parsing the toplevel module, then recursively parsing the declared modules. When the parser encounters a macro call, an expansion point is created, to be filled in later during macro expansion.
This unexpanded AST is traversed to collect definitions: macro definitions, item-likes (functions, consts, types…), generic parameters, lifetimes, closures… Everything that may be accessed by name or from another crate is a definition. Definitions are identified by their `DefPath`: the sequence of nested definitions from the crate root. Definitions are assigned an index to ease manipulation: `LocalDefId`.

Once macros have been gathered, the AST is expanded by replacing the expansion points with `AstFragment`s, which may contain arbitrary AST nodes. The macro invocations and definitions inside those new `AstFragment`s are then collected to participate in macro expansion and name resolution, including new macro definitions. Once a fixed point is reached, there should be no remaining expansion point. The AST lints are run, and we proceed to lowering.
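The fixed-point loop can be sketched with a heavily simplified, hypothetical AST model (the `AstNode`, `expand_macro` and `expand_to_fixed_point` names are illustrative, not rustc's):

```rust
use std::collections::VecDeque;

// Illustrative, heavily simplified AST: either a finished item or an
// unexpanded macro invocation (an expansion point).
#[derive(Debug)]
enum AstNode {
    Item(String),
    ExpansionPoint(String), // name of the macro to invoke
}

// Stand-in for macro expansion: a fragment may itself contain new
// expansion points ("outer" expands to something invoking "inner").
fn expand_macro(name: &str) -> Vec<AstNode> {
    match name {
        "outer" => vec![
            AstNode::Item("generated_fn".into()),
            AstNode::ExpansionPoint("inner".into()),
        ],
        _ => vec![AstNode::Item(format!("{name}_expanded"))],
    }
}

// Expand until a fixed point: the queue holds pending nodes; expanding
// a macro enqueues its fragment, which may contain new invocations.
fn expand_to_fixed_point(ast: Vec<AstNode>) -> Vec<String> {
    let mut queue: VecDeque<AstNode> = ast.into();
    let mut items = Vec::new();
    while let Some(node) = queue.pop_front() {
        match node {
            AstNode::Item(item) => items.push(item),
            AstNode::ExpansionPoint(name) => queue.extend(expand_macro(&name)),
        }
    }
    items
}
```

The loop terminates exactly when no expansion point remains, mirroring the fixed point described above.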
Lowering transforms the AST tree into the HIR tree. Its main purpose is to desugar surface-level constructs (for loops, try blocks, async blocks, impl Trait…) into a smaller set of HIR nodes.
As a consequence, lowering may create new definitions as it runs (in-band lifetimes, for instance). Once the HIR is built, it is indexed. The `index_hir()` query walks the HIR to organise the nodes in a manner suitable for incremental compilation. The HIR is split into owners: item-like nodes and exported macro definitions. Not all definitions are HIR owners: generic parameters, lifetimes and closures are not. An owner's enclosing node is accessible as `hir_owner(local_def_id)`, and the contained nodes are accessible as `hir_owner_nodes(local_def_id)`. This separation reduces incremental invalidation when only the body of a function is modified.

Indexing is also responsible for computing HIR parents, allowing us to walk the tree from a node to the crate root. Note: the parent of a HIR owner is not always an owner; it can be a statement.
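As a sketch, the owner split can be modelled with two per-owner lookups; all names and types here are illustrative simplifications, not rustc's actual data structures:

```rust
use std::collections::HashMap;

// Illustrative owner-split storage: `LocalDefId` identifies a HIR owner,
// and the owner node is stored separately from its enclosed nodes.
type LocalDefId = u32;

struct OwnerInfo {
    /// The owner node itself (e.g. the item with its signature).
    node: String,
    /// The nodes enclosed in the owner (e.g. the function body).
    body_nodes: Vec<String>,
}

struct HirIndex {
    owners: HashMap<LocalDefId, OwnerInfo>,
}

impl HirIndex {
    /// `hir_owner`: reads only the item; untouched by body edits, so
    /// downstream queries depending on it are not invalidated.
    fn hir_owner(&self, id: LocalDefId) -> &str {
        &self.owners[&id].node
    }
    /// `hir_owner_nodes`: reads the enclosed nodes; invalidated
    /// whenever the body changes.
    fn hir_owner_nodes(&self, id: LocalDefId) -> &[String] {
        &self.owners[&id].body_nodes
    }
}
```

Editing a function body would only rebuild that owner's `body_nodes`, leaving readers of `hir_owner` untouched.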
## Objective
The end objective is to eventually avoid re-parsing a file that has not changed since the last compilation session.
This document aims to be a simplified description of the current and target query systems. Some subtleties of the current implementation have deliberately been ignored. If such imprecisions hide a notable undescribed difficulty, please let the author know.
We chose to push passes into the query system from the bottom up. In order, this requires making HIR indexing, then lowering, name resolution, macro expansion and finally parsing incremental, as described in the following sections.
The principal difficulty is that the compiler drives evaluation by iterating over definitions and invoking queries on them. If lowering becomes a query, we will end up creating definitions while iterating over them elsewhere. During an incremental session, the dependency graph may even try to evaluate queries on definitions which are yet to be created.
## Incremental HIR indexing
Indexing the HIR collects the `HirId`s of the HIR tree, and builds two maps:

- `HirId` to the HIR node;
- `HirId` to the node parent's `HirId`.

The two maps are then saved for access by two queries:

- `hir_owner`, dedicated to accessing HIR owners;
- `hir_owner_nodes`, which allows access to the enclosed nodes.

For now, HIR indexing walks the HIR tree for the full crate in order, and builds the two maps at once.
This indexing should be changed to walk the HIR starting from a single HIR owner, and to stop when encountering an enclosed owner. The difficulty will be in computing the parent of the HIR owners.
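A minimal sketch of such a per-owner walk, under a hypothetical toy tree (the `Node` type and `index_one_owner` function are illustrative, not rustc's):

```rust
use std::collections::HashMap;

// Hypothetical simplified HIR: each node has an id, children, and a
// flag telling whether it starts a new owner (item-like node).
struct Node {
    id: u32,
    is_owner: bool,
    children: Vec<Node>,
}

/// Walk the HIR starting from one owner, recording each node's parent,
/// and stop at enclosed owners instead of descending into them.
/// Enclosed owners are returned so they can be indexed separately;
/// this is also where their owner parent would be computed.
fn index_one_owner(owner: &Node) -> (HashMap<u32, u32>, Vec<u32>) {
    let mut parents = HashMap::new();
    let mut nested_owners = Vec::new();
    let mut stack = vec![owner];
    while let Some(node) = stack.pop() {
        for child in &node.children {
            parents.insert(child.id, node.id);
            if child.is_owner {
                // Do not descend: the enclosed owner gets its own
                // indexing invocation.
                nested_owners.push(child.id);
            } else {
                stack.push(child);
            }
        }
    }
    (parents, nested_owners)
}
```

Note how the parent recorded for a nested owner can be an ordinary node (a statement, say), matching the caveat above.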
Implementation:

- #82891: make HIR parenting and definition parenting consistent;
- a `hir_owner_parent` query whose purpose is to map a HIR owner to its parent's `HirId`;
- `OwnerId` as a refinement of `LocalDefId`, to be used as the argument for `hir_*` queries.

Current state:
Objective:
Notation:

- `[]` denotes a collection (`Vec`, `HashSet`…);
- `->` denotes an associative collection (`BTreeMap`, `HashMap`…);
- the diagrams are a simplification of what `rustc` will actually do.

## Incremental lowering
Trying to make earlier passes into queries quickly hits two walls. `rustc` compilation is essentially pull-based, the principal pulling keys being a traversal of the full HIR tree and the iteration over all definitions. However, AST-to-HIR lowering is allowed to create new definitions as part of its desugaring. As a consequence, new definitions may pop out of thin air while we are iterating over all definitions.

This behaviour can be cured quite easily, by splitting the definition table per owner and iterating over the definitions using a graph traversal over queries. In `rustc`, this will be implemented using a HIR visitor traversing the whole crate.

## Graph of the query system
This system starts with a fully expanded AST `expanded_ast`, along with the collected definitions `fragment_definitions`.

The definition of `owner_nodes(id)` takes care of fetching information either as `lower_to_hir(id)` or as `lower_to_hir(id.parent).children`.
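The dispatch in `owner_nodes` can be sketched as follows, with hypothetical simplified types (real lowering produces HIR, not strings):

```rust
use std::collections::HashMap;

// Hypothetical model: each owner either gets its own lowering, or its
// nodes are among the children produced when its parent was lowered.
type LocalDefId = u32;

#[derive(Debug, Clone)]
struct Lowered {
    nodes: Vec<String>,
    /// Definitions created on the fly by lowering (e.g. in-band
    /// lifetimes), stored as children of this owner.
    children: HashMap<LocalDefId, Vec<String>>,
}

struct Compiler {
    parent: HashMap<LocalDefId, LocalDefId>,
    lowered: HashMap<LocalDefId, Lowered>,
}

impl Compiler {
    fn lower_to_hir(&self, id: LocalDefId) -> Option<&Lowered> {
        self.lowered.get(&id)
    }

    /// `owner_nodes(id)`: fetch the nodes either from the owner's own
    /// lowering, or from the children of its parent's lowering.
    fn owner_nodes(&self, id: LocalDefId) -> &[String] {
        if let Some(own) = self.lower_to_hir(id) {
            return &own.nodes;
        }
        let parent = self.parent[&id];
        &self.lower_to_hir(parent).expect("parent must be lowered").children[&id]
    }
}
```

This is how definitions created during lowering stay reachable without a global, mutable definition table.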
## Incremental late resolution
WIP
## Incremental macro expansions
A similar iteration scheme allows us to perform AST expansion as a query, as long as the parsing step is done beforehand. When trying to make AST expansion or parsing incremental, we face the limitations of calling queries by `LocalDefId`: at that stage, definitions either do not exist yet, or are an inconvenient way to split the unexpanded AST. As a consequence, we need to change representation.
We chose to split the parsed AST according to expansion points (`ExpnId`), which identify AST nodes containing macro invocations. In order to actually call queries with an `ExpnId`, we need a way to convert a `LocalDefId` into an `ExpnId` in a situation where not all the `LocalDefId`s exist yet. A `LocalDefId` is just a shorthand for a `DefPath`; walking this definition path from the crate root yields the expansion points that produce the definition.

Note: the migration from the current system will also require a few refactorings. For one, we will need to stop having two-phase initialization of `ExpnData`. For incremental expansion to be worthwhile, we will need to migrate all AST passes to use HIR or AST fragments.

## Graph of the query system
This system starts with a fully expanded AST `expanded_ast`, along with the collected definitions `fragment_definitions`.

## Incremental macro resolution
Macro resolution and expansion are currently performed in a fixed-point loop, where expanded macros can influence the resolutions in future expansions. This very subtle order is documented here and here. The subtlety comes from the conflict between scopes that rule name resolution, and the possibility for macro expansions to add names to an existing scope.
Expansions form a tree, with an expansion parent and its index inside this parent. This index is defined in declaration order. From an expansion, we are able to enumerate all its parents recursively, as well as all the expansion points that appear before in source order.
The conservative restricted shadowing rule is as follows. Consider a macro invocation \(I\) and a resolution \(A\). Let \(A'\) be another resolution; then:
If we replace error reporting by a speculative choice of either candidate, the resolution can be performed using the following algorithm:
This corresponds to the very conservative shadowing in petrochenkov's comment. As it is strictly more conservative, it will find strictly fewer candidates. As all candidates in this search are found in parent expansions of \(I\), there is never any ambiguity, and we just need to find the closest in terms of scoping.
In case of an ambiguity, no error is returned; rather, a candidate is picked deterministically. Once the AST is fully expanded, the full resolver will be computed in preparation for lowering. This full resolver will be able to report ambiguities.
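The deterministic pick can be sketched in a hypothetical model where each expansion has a parent and a scope of visible macros (the `Resolver` type and its fields are illustrative):

```rust
use std::collections::HashMap;

// Hypothetical model of conservative macro resolution: every expansion
// has a parent, and each expansion's scope may define some macros.
type ExpnId = u32;
const ROOT: ExpnId = 0;

struct Resolver {
    parent: HashMap<ExpnId, ExpnId>,
    /// Macros visible in the scope of each expansion.
    scopes: HashMap<ExpnId, Vec<&'static str>>,
}

impl Resolver {
    /// Walk the parent chain of the invocation and return the first
    /// (i.e. closest in scoping) expansion defining `name`. Because we
    /// only look at parent expansions, the choice is deterministic;
    /// ambiguities are deferred to the full post-expansion resolver.
    fn resolve_macro(&self, invocation: ExpnId, name: &str) -> Option<ExpnId> {
        let mut current = invocation;
        loop {
            if let Some(defs) = self.scopes.get(&current) {
                if defs.contains(&name) {
                    return Some(current);
                }
            }
            if current == ROOT {
                return None;
            }
            current = self.parent[&current];
        }
    }
}
```

The closest enclosing scope always wins, which is exactly the "find the closest in terms of scoping" rule above.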
## Incremental parsing
The same trick can be employed to switch from an `ExpnId` (which refers to an AST node) to a reference to a file. This allows out-of-line modules to be parsed incrementally. For macros, we can reuse `rust-analyzer`'s trick: create virtual files into which a span can point. In that case, the macro invocation is performed by `locate_tokens`, and its result is returned by `tokens`.

Eventually, item bodies could be handled lazily, by creating an artificial expansion point in `parse` and stopping the expansion there in `expand_from`.
## Design proposal
What follows is a description of the end state of the query system once all passes have been included in it. This is a maximalist proposal; restricted versions can be written by cutting the graph horizontally. Some changes can be performed gradually from the current behaviour of `rustc`. However, some changes are more invasive, and will require careful thought in themselves.

## Key stability
In order to re-run queries from one compilation session to the next, we need to save a stable representation of their keys. Definitions have a stable and terse representation using their `DefPathHash`. Likewise, `ExpnId`s form a tree, so they can be identified by their position in it. For files, we could get away with storing the canonical filesystem path for physical files, and the stable version of `(DefId, ExpnId)` for macro expansions.

## Description of the queries
We separate the queries according to their key, to emphasize where the change of representation occurs.
Splitting by file:

- `tokens`: tokenize the bytes in a given file;
- `parse`: read all the files, tokenize them, parse them, and create expansion points (`ExpnId`) at chosen nodes: macro invocations and out-of-line modules;
- `locate_tokens`: find the correct file for out-of-line modules.

Splitting by AST expansion points:

- `collect_definitions`: walk the AST fragment to create definitions, and fill the resolver (this corresponds to `rustc_resolve/def_collector.rs`);
- `resolve_macro`: find the definition of a macro from its name;
- `expand_from`: expand one AST expansion point into fragments, either by invoking a macro or by developing an out-of-line module;
- `find_expansion`: convert definitions to the nested expansions to expand, by walking the `DefPath` from the crate root.

Splitting by AST/HIR owners:

- `item_ast`: clean up the AST returned by `expand_from`;
- `resolver`: gather all the definitions accessible from an expansion point for name resolution;
- `ast_definitions`: extract the definitions inside the current owner;
- `item_hir`: lower the AST to HIR for each owner, and store newly created definitions alongside;
- `definitions`: merge `ast_definitions` and the extra definitions from lowering;
- `index_hir`: walk the HIR to record each node's parent; `hir_owner`, `hir_owner_nodes` and `hir_owner_parent` are projections from `index_hir`.
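As an illustration of `find_expansion`, here is a hypothetical sketch of walking a `DefPath` from the crate root and collecting the expansions to expand (the `Expansions` table and string-based path are simplifications):

```rust
use std::collections::HashMap;

// Illustrative sketch of `find_expansion`: walk a DefPath from the
// crate root, collecting the expansion points to expand along the way.
type ExpnId = u32;
const CRATE_EXPN_ID: ExpnId = 0;

struct Expansions {
    /// Which nested expansion declares a given name inside an expansion.
    declares: HashMap<(ExpnId, &'static str), ExpnId>,
}

fn find_expansion(exp: &Expansions, def_path: &[&'static str]) -> Option<Vec<ExpnId>> {
    let mut current = CRATE_EXPN_ID;
    let mut to_expand = vec![current];
    for &segment in def_path {
        // Each path segment requires expanding the enclosing expansion
        // before the name can be resolved to the next one.
        current = *exp.declares.get(&(current, segment))?;
        to_expand.push(current);
    }
    Some(to_expand)
}
```

Definitions that do not exist yet simply fail the lookup, which is why the walk starts from the always-existing crate root.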
## Name resolution
Name resolution happens on the pre-expansion AST (for macro resolution) and on the expanded AST (for values, types and lifetimes). Gathering all the names can be performed on the
## Bootstrapping
The query system is initialized using the toplevel crate module:

- `find_expansion(LOCAL_CRATE) = [CRATE_EXPN_ID]`;
- `locate_tokens(CRATE_EXPN_ID) = { "./lib.rs", .. }`.

The entry point is the `all_definitions` iterator, which walks all HIR owners to find nested definitions. It can be implemented as a simple DAG visit using the `definitions` query.

## Graph of the query system
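A minimal sketch of the `all_definitions` visit, assuming a hypothetical `definitions` query that returns the definitions nested directly inside one owner (all names here are illustrative):

```rust
use std::collections::HashMap;

// Illustrative sketch: `definitions` returns the definitions nested
// directly inside one owner; `all_definitions` is a simple DAG visit
// from the crate root pulling everything through that query.
type LocalDefId = u32;
const LOCAL_CRATE: LocalDefId = 0;

/// Hypothetical `definitions` query.
fn definitions(table: &HashMap<LocalDefId, Vec<LocalDefId>>, id: LocalDefId) -> Vec<LocalDefId> {
    table.get(&id).cloned().unwrap_or_default()
}

/// Depth-first visit starting at the crate root.
fn all_definitions(table: &HashMap<LocalDefId, Vec<LocalDefId>>) -> Vec<LocalDefId> {
    let mut stack = vec![LOCAL_CRATE];
    let mut all = Vec::new();
    while let Some(id) = stack.pop() {
        all.push(id);
        // Push children in reverse so they are visited in source order.
        for child in definitions(table, id).into_iter().rev() {
            stack.push(child);
        }
    }
    all
}
```

Because every definition is reached through the `definitions` query of its owner, newly created definitions are picked up without a global iteration.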