# Overview of rustc
:::info
https://rustc-dev-guide.rust-lang.org/overview.html
:::
The Rust compiler is special in two ways:
1. It does things that other compilers do not (e.g. borrow checking).
2. It makes many unconventional implementation choices (e.g. queries).

The sections below cover these in turn, along with the individual pieces of the compiler.
## What the compiler does to your code
### Invocation
Compilation begins when a user invokes `rustc` (directly or via `cargo`).
Command-line options determine what the compiler does, and [`rustc_driver`](https://rustc-dev-guide.rust-lang.org/rustc-driver/intro.html) parses these options into a [`rustc_interface::Config`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_interface/interface/struct.Config.html), which is passed to the rest of the compilation process.
### Lexing and parsing
#### 1. Lexing
- **A low-level lexer** in `rustc_lexer` turns raw Rust source code into small units called **tokens**.
    - The lexer supports **Unicode**.
- **A higher-level lexer** in `rustc_parse` further processes the token stream.
    - The [`Lexer`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_parse/lexer/struct.Lexer.html) struct:
        - performs additional validation
        - turns strings into **interned symbols**
- The low-level lexer does not depend directly on rustc's diagnostic infrastructure.
    - Instead, it returns diagnostics as plain data.
    - Later, `rustc_parse::lexer` converts that data into real compiler diagnostics.
- The lexer also preserves full-fidelity source information for:
    - IDE tooling
    - procedural macros (`proc-macros`)
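
To make the idea of a low-level token stream concrete, here is a minimal toy lexer. It is only a sketch: `ToyTokenKind`, `ToyToken`, `eat_while`, and `tokenize` are hypothetical names invented for this note, not items from `rustc_lexer`, and the token categories are far coarser than the real ones. It keeps whitespace as tokens and records only a kind and a byte length, so the token lengths add up to the length of the input, which is one simple way to keep full-fidelity source information.

```rust
use std::iter::Peekable;
use std::str::Chars;

// Hypothetical token categories for this sketch (not rustc's real ones).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ToyTokenKind {
    Ident,
    Number,
    Whitespace,
    Punct,
    Unknown,
}

// A token stores only its kind and its length in bytes.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct ToyToken {
    kind: ToyTokenKind,
    len: usize,
}

// Consume characters while `pred` holds; return how many bytes were consumed.
fn eat_while(chars: &mut Peekable<Chars<'_>>, pred: impl Fn(char) -> bool) -> usize {
    let mut len = 0;
    while let Some(&c) = chars.peek() {
        if !pred(c) {
            break;
        }
        len += c.len_utf8();
        chars.next();
    }
    len
}

fn tokenize(src: &str) -> Vec<ToyToken> {
    let mut chars = src.chars().peekable();
    let mut tokens = Vec::new();
    while let Some(first) = chars.next() {
        let mut len = first.len_utf8();
        let kind = if first.is_alphabetic() || first == '_' {
            // Unicode letters are accepted, not just ASCII.
            len += eat_while(&mut chars, |c| c.is_alphanumeric() || c == '_');
            ToyTokenKind::Ident
        } else if first.is_ascii_digit() {
            len += eat_while(&mut chars, |c| c.is_ascii_digit());
            ToyTokenKind::Number
        } else if first.is_whitespace() {
            len += eat_while(&mut chars, |c| c.is_whitespace());
            ToyTokenKind::Whitespace
        } else if first.is_ascii_punctuation() {
            ToyTokenKind::Punct
        } else {
            ToyTokenKind::Unknown
        };
        tokens.push(ToyToken { kind, len });
    }
    tokens
}

fn main() {
    let src = "let x = 42;";
    let tokens = tokenize(src);
    // Nothing is dropped, so the token lengths cover the whole input.
    let total: usize = tokens.iter().map(|t| t.len).sum();
    assert_eq!(total, src.len());
    for token in &tokens {
        println!("{:?}", token);
    }
}
```

Note that the sketch treats `let` as an ordinary identifier; telling keywords apart is left to a later step.
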
:::info
[String interning](https://en.wikipedia.org/wiki/String_interning) is a way of storing only one immutable copy of each distinct string value.
:::
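
As a rough illustration of string interning, the sketch below maps each distinct string to a small integer handle and hands back the same handle when the string is seen again. `ToySymbol` and `ToyInterner` are hypothetical names for this note; the compiler's own symbol interner is more elaborate, and this version stores each string twice purely to keep the code short.

```rust
use std::collections::HashMap;

// A cheap, copyable handle standing in for an interned string.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
struct ToySymbol(u32);

#[derive(Default)]
struct ToyInterner {
    ids: HashMap<String, ToySymbol>, // text -> handle
    strings: Vec<String>,            // handle index -> text
}

impl ToyInterner {
    // Return the existing handle for `s`, or create a new one.
    fn intern(&mut self, s: &str) -> ToySymbol {
        if let Some(&sym) = self.ids.get(s) {
            return sym;
        }
        let sym = ToySymbol(self.strings.len() as u32);
        self.strings.push(s.to_owned());
        self.ids.insert(s.to_owned(), sym);
        sym
    }

    // Look up the text behind a handle.
    fn resolve(&self, sym: ToySymbol) -> &str {
        &self.strings[sym.0 as usize]
    }
}

fn main() {
    let mut interner = ToyInterner::default();
    let a = interner.intern("foo");
    let b = interner.intern("bar");
    let c = interner.intern("foo");
    // Equal strings share one handle, so comparing them is an integer compare.
    assert_eq!(a, c);
    assert_ne!(a, b);
    println!("{:?} -> {}", a, interner.resolve(a));
}
```
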
#### 2. Parsing
- The parser [translates the token stream into an Abstract Syntax Tree (AST)](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_parse/index.html).
- Rust’s parser uses a **recursive descent** approach, which is a **top-down parsing** strategy.
- Main parser entry points include:
    - [`Parser::parse_crate_mod`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_parse/parser/struct.Parser.html#method.parse_crate_mod)
    - [`Parser::parse_mod`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_parse/parser/struct.Parser.html#method.parse_mod)
- Additional entry points:
    - [`rustc_expand::module::parse_external_mod`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_expand/module/fn.parse_external_mod.html) for external modules
    - [`Parser::parse_nonterminal`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_parse/parser/struct.Parser.html#method.parse_nonterminal) for macro parsing
- Parsing is carried out with utility methods such as:
    - `bump`
    - `check`
    - `eat`
    - `expect`
    - `look_ahead`
- Parsing code is organized by **semantic construct**.
    - Different `parse_*` methods are placed in separate files under `rustc_parse`.
- Macro expansion, AST validation, name resolution, and early linting also take place during the lexing and parsing stage.
- The parser returns `AST` nodes defined in [`rustc_ast::ast`](https://doc.rust-lang.org/beta/nightly-rustc/rustc_ast/index.html)::{[`Crate`](https://doc.rust-lang.org/beta/nightly-rustc/rustc_ast/ast/struct.Crate.html), [`Expr`](https://doc.rust-lang.org/beta/nightly-rustc/rustc_ast/ast/struct.Expr.html), [`Pat`](https://doc.rust-lang.org/beta/nightly-rustc/rustc_ast/ast/struct.Pat.html), …}.
- The standard [`Diag`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_errors/struct.Diag.html) API is used for error handling.
- The parser accepts a superset of the Rust grammar so that it can recover from errors and report multiple issues instead of stopping at the first one.
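
The recursive descent approach can be sketched with a toy parser for arithmetic expressions. The helper names `bump`, `check`, `eat`, `expect`, and `look_ahead` are borrowed from the list above, but their signatures here are simplified assumptions and do not match the real `Parser` methods; `Tok`, `Expr`, `ToyParser`, `parse`, and `eval` are hypothetical. Each grammar rule becomes one `parse_*` method that calls the methods for the rules beneath it.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Tok { Num(i64), Plus, Star, LParen, RParen, Eof }

#[derive(Debug, PartialEq)]
enum Expr {
    Num(i64),
    Add(Box<Expr>, Box<Expr>),
    Mul(Box<Expr>, Box<Expr>),
}

struct ToyParser { tokens: Vec<Tok>, pos: usize }

impl ToyParser {
    fn new(mut tokens: Vec<Tok>) -> Self {
        tokens.push(Tok::Eof); // sentinel marking the end of input
        ToyParser { tokens, pos: 0 }
    }
    // Peek `n` tokens ahead without consuming anything.
    fn look_ahead(&self, n: usize) -> Tok {
        self.tokens.get(self.pos + n).copied().unwrap_or(Tok::Eof)
    }
    // Is the current token equal to `t`?
    fn check(&self, t: Tok) -> bool { self.look_ahead(0) == t }
    // Consume the current token and return it.
    fn bump(&mut self) -> Tok {
        let t = self.look_ahead(0);
        if self.pos < self.tokens.len() { self.pos += 1; }
        t
    }
    // Consume the current token only if it equals `t`.
    fn eat(&mut self, t: Tok) -> bool {
        if self.check(t) { self.bump(); true } else { false }
    }
    // Like `eat`, but a mismatch is an error.
    fn expect(&mut self, t: Tok) -> Result<(), String> {
        if self.eat(t) { Ok(()) }
        else { Err(format!("expected {:?}, found {:?}", t, self.look_ahead(0))) }
    }
    // expr := term ('+' term)*
    fn parse_expr(&mut self) -> Result<Expr, String> {
        let mut lhs = self.parse_term()?;
        while self.eat(Tok::Plus) {
            let rhs = self.parse_term()?;
            lhs = Expr::Add(Box::new(lhs), Box::new(rhs));
        }
        Ok(lhs)
    }
    // term := atom ('*' atom)*
    fn parse_term(&mut self) -> Result<Expr, String> {
        let mut lhs = self.parse_atom()?;
        while self.eat(Tok::Star) {
            let rhs = self.parse_atom()?;
            lhs = Expr::Mul(Box::new(lhs), Box::new(rhs));
        }
        Ok(lhs)
    }
    // atom := NUM | '(' expr ')'
    fn parse_atom(&mut self) -> Result<Expr, String> {
        match self.bump() {
            Tok::Num(n) => Ok(Expr::Num(n)),
            Tok::LParen => {
                let inner = self.parse_expr()?;
                self.expect(Tok::RParen)?;
                Ok(inner)
            }
            other => Err(format!("unexpected token {:?}", other)),
        }
    }
}

// Parse a whole token list and require that nothing is left over.
fn parse(tokens: Vec<Tok>) -> Result<Expr, String> {
    let mut parser = ToyParser::new(tokens);
    let expr = parser.parse_expr()?;
    parser.expect(Tok::Eof)?;
    Ok(expr)
}

// Walk the tree to show that precedence was encoded in its shape.
fn eval(expr: &Expr) -> i64 {
    match expr {
        Expr::Num(n) => *n,
        Expr::Add(a, b) => eval(a) + eval(b),
        Expr::Mul(a, b) => eval(a) * eval(b),
    }
}

fn main() {
    // 1 + 2 * 3
    let tokens = vec![Tok::Num(1), Tok::Plus, Tok::Num(2), Tok::Star, Tok::Num(3)];
    let ast = parse(tokens).expect("valid input");
    println!("{:?} evaluates to {}", ast, eval(&ast));
}
```

Unlike rustc's parser, this sketch stops at the first error and makes no attempt at recovery.
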
### AST lowering
- The `AST` is ***lowered*** to the [High-Level Intermediate Representation (`HIR`)](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_hir/index.html).
- `HIR` is a more compiler-friendly representation of the `AST`.
- Lowering involves a lot of [desugaring](https://en.wikipedia.org/wiki/Syntactic_sugar) of constructs such as loops and `async fn`.
- The compiler uses HIR to perform the following tasks:
    - [*type inference*](https://rustc-dev-guide.rust-lang.org/type-inference.html): the process of automatically detecting the type of an expression
    - [*trait solving*](https://rustc-dev-guide.rust-lang.org/traits/resolution.html): the process of pairing up an impl with each reference to a `trait`
    - [*type checking*](https://rustc-dev-guide.rust-lang.org/hir-typeck/summary.html): the process of converting user-written types ([`hir::Ty`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_hir/hir/struct.Ty.html)) into the compiler's internal representation ([`Ty<'tcx>`](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/ty/struct.Ty.html)). That information is then used to verify the type safety, correctness, and coherence of the types used in the program.
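
Desugaring is easiest to see with a `for` loop. The sketch below writes the same loop twice: once with the sugar, and once in roughly the form described in the standard library's iterator documentation, as a `loop` around `match iter.next()`. The function names are hypothetical, and the HIR the compiler actually builds differs in detail; this only shows the general idea at the source level.

```rust
// The sugared form, as a user would write it.
fn sum_with_for(values: &[i32]) -> i32 {
    let mut total = 0;
    for v in values {
        total += *v;
    }
    total
}

// Approximately what the `for` loop above stands for.
fn sum_desugared(values: &[i32]) -> i32 {
    let mut total = 0;
    match IntoIterator::into_iter(values) {
        mut iter => loop {
            match iter.next() {
                None => break,
                Some(v) => {
                    total += *v;
                }
            }
        },
    }
    total
}

fn main() {
    let data = [1, 2, 3];
    // Both forms compute the same result.
    assert_eq!(sum_with_for(&data), sum_desugared(&data));
    println!("sum = {}", sum_desugared(&data));
}
```
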
### MIR lowering
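- `HIR` is lowered to the Mid-level Intermediate Representation (`MIR`) by way of the Typed High-Level Intermediate Representation (`THIR`).
- `THIR` is an even more desugared form of `HIR`; it is used for pattern and exhaustiveness checking.
- `MIR` is mainly used for [borrow checking](https://rustc-dev-guide.rust-lang.org/borrow-check.html).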
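
From the user's side, the two checks mentioned above look like this. The sketch uses hypothetical names (`Shape`, `area`, `first_then_push`) and shows only code that is accepted; the comments describe the nearby variations that would be rejected.

```rust
enum Shape {
    Square(u32),
    Rect { w: u32, h: u32 },
}

// Exhaustiveness checking: every variant of `Shape` has an arm.
// Deleting either arm makes the match non-exhaustive (error E0004).
fn area(shape: &Shape) -> u32 {
    match shape {
        Shape::Square(side) => side * side,
        Shape::Rect { w, h } => w * h,
    }
}

// Borrow checking: the shared borrow `first` is last used before the
// mutation, so the two do not overlap. Panics if `v` is empty.
fn first_then_push(v: &mut Vec<i32>) -> i32 {
    let first = &v[0]; // shared borrow of `*v` starts here
    let copy = *first; // last use of `first`, so the borrow ends here
    v.push(copy + 1);  // mutation is allowed: no shared borrow is live
    // Using `first` after the `push` instead would be rejected (error E0502).
    copy
}

fn main() {
    println!("{}", area(&Shape::Square(3)));
    println!("{}", area(&Shape::Rect { w: 2, h: 5 }));
    let mut v = vec![10];
    let first = first_then_push(&mut v);
    println!("first = {}, v = {:?}", first, v);
}
```
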
#### MIR optimization and monomorphization
- The compiler performs [many optimizations on MIR](https://rustc-dev-guide.rust-lang.org/mir/optimizations.html).
- Because `MIR` is still generic, optimizing it improves later code generation and compilation speed.
- Some optimizations are easier at the `MIR` level than at the `LLVM-IR` level.
- At this stage, the compiler also performs *monomorphization collection*: gathering the list of concrete types for which generic code must be generated.
### Code generation
- In the [code generation (*codegen*) stage](https://rustc-dev-guide.rust-lang.org/backend/codegen.html), compiler representations are turned into an executable binary.
- `rustc` first lowers `MIR` into `LLVM-IR`.
- Actual [monomorphization](https://en.wikipedia.org/wiki/Monomorphization) happens here: the generic code is copied once for each set of concrete types it is used with.
- LLVM then optimizes the LLVM-IR and emits machine-level output such as object files or WASM.
- Finally, these outputs are linked together to produce the final binary.
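
Monomorphization can be pictured at the source level. In the sketch below, `largest` is a generic function used with `i32` and `f64`; `largest_i32` and `largest_f64` are hand-written copies showing roughly what the per-type instances amount to. All three names are hypothetical, and the compiler produces its copies during code generation rather than as source code.

```rust
// One generic definition, written once by the user.
fn largest<T: PartialOrd + Copy>(items: &[T]) -> Option<T> {
    let mut iter = items.iter().copied();
    let mut best = iter.next()?; // empty slice: no largest element
    for item in iter {
        if item > best {
            best = item;
        }
    }
    Some(best)
}

// Hand-written equivalent of the instance for `T = i32`.
fn largest_i32(items: &[i32]) -> Option<i32> {
    let mut iter = items.iter().copied();
    let mut best = iter.next()?;
    for item in iter {
        if item > best {
            best = item;
        }
    }
    Some(best)
}

// Hand-written equivalent of the instance for `T = f64`.
fn largest_f64(items: &[f64]) -> Option<f64> {
    let mut iter = items.iter().copied();
    let mut best = iter.next()?;
    for item in iter {
        if item > best {
            best = item;
        }
    }
    Some(best)
}

fn main() {
    // These two calls are what the collection step would discover:
    // `largest` is needed for `i32` and for `f64`.
    let ints = largest(&[1, 5, 3]);
    let floats = largest(&[1.5, 0.5]);
    assert_eq!(ints, largest_i32(&[1, 5, 3]));
    assert_eq!(floats, largest_f64(&[1.5, 0.5]));
    println!("{:?} {:?}", ints, floats);
}
```
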