Overview of rustc

# Overview of rustc

:::info
https://rustc-dev-guide.rust-lang.org/overview.html
:::

The Rust compiler is special in two ways:

1. It does things other compilers do not (e.g. borrow checking).
2. It makes a lot of unconventional implementation choices (e.g. queries).

We will talk about both of these, and about the individual pieces in more detail.

## What the compiler does to your code

### Invocation

Compilation begins when a user invokes `rustc` (directly or via `cargo`). Command-line options determine what the compiler does, and [`rustc_driver`](https://rustc-dev-guide.rust-lang.org/rustc-driver/intro.html) parses these options into a [`rustc_interface::Config`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_interface/interface/struct.Config.html), which is passed to the rest of the compilation process.

### Lexing and parsing

#### 1. Lexing

- **A low-level lexer** in `rustc_lexer` turns raw Rust source code into small units called **tokens**.
  - The lexer supports **Unicode**.
- **A higher-level lexer** in `rustc_parse` further processes the token stream.
  - The [`Lexer`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_parse/lexer/struct.Lexer.html) struct:
    - performs additional validation
    - turns strings into **interned symbols**
- The low-level lexer does not directly depend on rustc's full diagnostic system.
  - Instead, it returns diagnostic data in a simpler form.
  - Later, `rustc_parse::lexer` converts that data into real compiler diagnostics.
- The lexer also preserves full-fidelity source information for:
  - IDE tooling
  - procedural macros (`proc-macros`)

:::info
[String interning](https://en.wikipedia.org/wiki/String_interning) is a way of storing only one immutable copy of each distinct string value.
:::

#### 2. Parsing

- The parser [translates the token stream into an Abstract Syntax Tree (AST)](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_parse/index.html).
- Rust’s parser uses a **recursive descent** approach, which is a **top-down parsing** strategy.
- Main parser entry points include:
  - [`Parser::parse_crate_mod`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_parse/parser/struct.Parser.html#method.parse_crate_mod)
  - [`Parser::parse_mod`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_parse/parser/struct.Parser.html#method.parse_mod)
- Additional entry points:
  - [`rustc_expand::module::parse_external_mod`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/module/fn.parse_external_mod.html) for external modules
  - [`Parser::parse_nonterminal`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_parse/parser/struct.Parser.html#method.parse_nonterminal) for macro parsing
- Parsing is carried out with utility methods such as:
  - `bump`
  - `check`
  - `eat`
  - `expect`
  - `look_ahead`
- Parsing code is organized by **semantic construct**.
  - Different `parse_*` methods are placed in separate files under `rustc_parse`.
- Macro expansion, AST validation, name resolution, and early linting also take place during the lexing and parsing stage.
- The parser returns [`rustc_ast::ast`](https://doc.rust-lang.org/beta/nightly-rustc/rustc_ast/index.html)::{[`Crate`](https://doc.rust-lang.org/beta/nightly-rustc/rustc_ast/ast/struct.Crate.html), [`Expr`](https://doc.rust-lang.org/beta/nightly-rustc/rustc_ast/ast/struct.Expr.html), [`Pat`](https://doc.rust-lang.org/beta/nightly-rustc/rustc_ast/ast/struct.Pat.html), …} `AST` nodes.
- The standard [`Diag`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_errors/struct.Diag.html) API is used for error handling.
- The Rust compiler parses a superset of the grammar to recover from errors, allowing it to report multiple issues without stopping immediately.

### AST lowering

- The `AST` is ***lowered*** to the [High-Level Intermediate Representation (`HIR`)](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_hir/index.html).
- `HIR` is a more compiler-friendly representation of the `AST`.
- This process, called *lowering*, involves a lot of [desugaring](https://en.wikipedia.org/wiki/Syntactic_sugar).
- The compiler uses `HIR` to perform the following tasks:
  - [*type inference*](https://rustc-dev-guide.rust-lang.org/type-inference.html): the process of automatically detecting the type of an expression
  - [*trait solving*](https://rustc-dev-guide.rust-lang.org/traits/resolution.html): the process of pairing up an impl with each reference to a `trait`
  - [*type checking*](https://rustc-dev-guide.rust-lang.org/hir-typeck/summary.html): the process of converting user-written types ([`hir::Ty`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_hir/hir/struct.Ty.html)) into the compiler's internal representation ([`Ty<'tcx>`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/ty/struct.Ty.html)). The compiler then uses this information to verify the type safety, correctness, and coherence of the types used in the program.

### MIR lowering

- `HIR` is lowered to the Mid-level Intermediate Representation (`MIR`) through the Typed High-Level Intermediate Representation (`THIR`).
- `THIR` is a more desugared form of `HIR` used for pattern and exhaustiveness checking.
- `MIR` is mainly used for [borrow checking](https://rustc-dev-guide.rust-lang.org/borrow-check.html).

#### MIR Optimization and Monomorphization

- The compiler performs [many optimizations on MIR](https://rustc-dev-guide.rust-lang.org/mir/optimizations.html).
- Optimizing `MIR` helps improve later code generation and compilation speed.
- Some optimizations are easier at the `MIR` level than at the `LLVM-IR` level.
- At this stage, the compiler also performs *monomorphization collection*, which means gathering the list of concrete types for which generic code must be generated.

### Code generation

- In the [code generation (*codegen*) stage](https://rustc-dev-guide.rust-lang.org/backend/codegen.html), compiler representations are turned into an executable binary.
- `rustc` first lowers `MIR` into `LLVM-IR`.
- Actual [monomorphization](https://en.wikipedia.org/wiki/Monomorphization) happens here, where generic code is duplicated with concrete types.
- LLVM then optimizes the LLVM-IR and emits machine-level output such as object files or WASM.
- Finally, these outputs are linked together to produce the final binary.
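
To make the monomorphization step above more concrete, here is a minimal sketch of what "duplicating generic code with concrete types" means. The names `largest`, `largest_i32`, and `largest_f64` are hypothetical and chosen only for illustration. The two hand-written functions approximate what the compiler conceptually generates for the generic one; the real instances carry mangled symbol names and never appear as source code.

```rust
// A generic function as the user writes it. The compiler cannot emit
// machine code for `T` itself, only for concrete instantiations of it.
fn largest<T: PartialOrd + Copy>(items: &[T]) -> Option<T> {
    let mut iter = items.iter().copied();
    let first = iter.next()?; // an empty slice yields `None`
    Some(iter.fold(first, |best, x| if x > best { x } else { best }))
}

// Hand-written approximation (an assumption for illustration) of the copy
// produced when `largest` is used with `T = i32`.
fn largest_i32(items: &[i32]) -> Option<i32> {
    let mut iter = items.iter().copied();
    let first = iter.next()?;
    Some(iter.fold(first, |best, x| if x > best { x } else { best }))
}

// The same body again, specialized for `T = f64`.
fn largest_f64(items: &[f64]) -> Option<f64> {
    let mut iter = items.iter().copied();
    let first = iter.next()?;
    Some(iter.fold(first, |best, x| if x > best { x } else { best }))
}
```

If a crate calls `largest(&[1, 5, 3])` and `largest(&[0.5, 2.5])`, monomorphization collection would record the two instantiations `i32` and `f64`, and codegen would then emit one copy of the function body for each of them, roughly like the two specialized functions shown here.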