# Modifications to rustc for the Rust REPL
## Motivation
In my efforts to create a REPL (read-evaluate-print-loop) interpreter for Rust, based in large part on the [Miri tool](https://github.com/rust-lang/miri/), I have found it necessary to make certain modifications to rustc itself. However, as far as is possible and reasonable, I have put the functionality for this REPL into a separate repository so as not to require major new features within rustc.
## Overview of the REPL
The REPL itself is naturally divisible into four principal parts, which run in a loop within a thread pool:
* **Read** -- The code input by the user on the command line is parsed as a self-contained block. This makes use of nothing beyond the existing compiler API.
* **Compile** -- Compile the user-input code into MIR.
    * The "user fn" is parsed in a self-contained manner from the code input on the command line. We then prepend local variables from previous evaluation sessions (i.e., iterations of the REPL), with custom tags for compiler usage.
    * Pre-parsed template code is processed to expand REPL-specific "macro calls" (e.g., for the "user fn" body or crate imports).
    * We create a new compiler session and instance, passing it the pre-parsed AST from the previous step.
    * We compile everything to MIR (but no further). This is done manually (rather than by `rustc_driver` and `rustc_interface`), for the sake of being able to control the process granularly.
    * Note: an in-memory backing store is used for incremental compilation to speed up compilation and avoid hitting the filesystem.
* **Evaluate** -- Interpret the MIR output from the previous step using Miri.
    * The MIR resulting from the compilation step is loaded by Miri, and the memory from the previous evaluation session is restored, using the same sort of serialisation functionality that incremental compilation uses.
    * We evaluate the MIR using Miri and a custom "machine" that performs the actual interpretation and implements the hooks.
    * The final state of the memory is serialised back to an in-memory store, along with the cached query results (incremental compilation data), and the local variables in the "user fn".
* **Print** -- The result of the evaluated expression is printed to the user in debug format.
    * The "template" given to the compiler for the compilation step can either include a step at the end to print the resulting expression, or it can "notify" the interpreter of the result by outputting a string. In either case, an intrinsic is used to retrieve/output the debug representation of the resulting value of the "user fn".
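To make the shape of the loop concrete, here is a deliberately tiny, self-contained sketch of the four stages with state carried between evaluation sessions. It does not use rustc or Miri at all: `SessionState`, `ToyMir`, and every function name below are hypothetical stand-ins, and the "language" is reduced to `let <name> = <integer>` and `<name>`.

```rust
use std::collections::HashMap;

/// State carried from one evaluation session to the next.
/// Hypothetical stand-in for the serialised Miri memory and saved locals.
#[derive(Debug, Default)]
struct SessionState {
    memory: HashMap<String, i64>,
}

/// Hypothetical stand-in for the MIR of one "user fn".
#[derive(Debug, PartialEq)]
enum ToyMir {
    /// `let <name> = <value>`
    Store { name: String, value: i64 },
    /// `<name>`
    Load { name: String },
}

/// Read: take one line of input as a self-contained block.
fn read(line: &str) -> Option<&str> {
    let line = line.trim();
    if line.is_empty() { None } else { Some(line) }
}

/// Compile: lower the input to the toy "MIR" (and no further).
fn compile(code: &str) -> Result<ToyMir, String> {
    if let Some(rest) = code.strip_prefix("let ") {
        let (name, value) = rest
            .split_once('=')
            .ok_or_else(|| format!("expected `=` in `{}`", code))?;
        let value = value
            .trim()
            .parse::<i64>()
            .map_err(|e| format!("bad integer: {}", e))?;
        Ok(ToyMir::Store { name: name.trim().to_string(), value })
    } else {
        Ok(ToyMir::Load { name: code.to_string() })
    }
}

/// Evaluate: interpret the toy "MIR" against the state left by earlier
/// sessions, leaving the updated state in place for the next one.
fn evaluate(state: &mut SessionState, mir: ToyMir) -> Result<Option<i64>, String> {
    match mir {
        ToyMir::Store { name, value } => {
            state.memory.insert(name, value);
            Ok(None)
        }
        ToyMir::Load { name } => state
            .memory
            .get(&name)
            .copied()
            .map(Some)
            .ok_or_else(|| format!("unknown local `{}`", name)),
    }
}

/// Print: render the result in debug format.
fn print_result(result: Option<i64>) -> String {
    match result {
        Some(value) => format!("{:?}", value),
        None => "()".to_string(),
    }
}

/// One iteration of the loop: read, compile, evaluate, print.
fn repl_step(state: &mut SessionState, line: &str) -> Option<String> {
    let code = read(line)?;
    let output = compile(code)
        .and_then(|mir| evaluate(state, mir))
        .map(print_result)
        .unwrap_or_else(|err| format!("error: {}", err));
    Some(output)
}

fn main() {
    let mut state = SessionState::default();
    for line in ["let x = 41", "x", "y"] {
        if let Some(output) = repl_step(&mut state, line) {
            println!("{}", output);
        }
    }
}
```

The point of the sketch is only that `state` outlives each iteration, which is the role the serialised memory and saved locals play in the real design.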
## Modifications to rustc
Through much research and experimentation over the past few months in my fork of rustc, I have ended up with the following notable additions and modifications to the rustc codebase, divided first by general feature/area, second by specific change. These are intended to be minimal whilst not significantly complicating the implementation of the REPL or negatively impacting performance.
* Added an "interpreter mode" to the compiler interface.
    * New boolean flag on `rustc::session::config::Options` structure (untracked by incr. comp.).
    * Used in one MIR transform pass to prevent dead user variables from being optimised away.
    * Used to enable a REPL-only intrinsic for casting a value to `dyn Debug` (if it actually implements the trait).
    * Used as a guard to make sure a few interpreter-only fns/methods are not invalidly called in the normal compiler mode.
    * **[Non-Essential]** Used to change how type names are pretty-printed.
* **[Non-Essential]** Added support for recognising local variables (in the "user fn") that originated in or were "moved" during a previous evaluation session of the interpreter.
    * Added an "interpreter tag" field to `syntax::ast::Local` so information about unused/moved locals is available for diagnostics. These tags are set by the REPL, which itself keeps track of locals from previous evaluation sessions.
    * Modified diagnostics in borrow-checking (specifically `report_use_of_moved_or_uninitialized`) to additionally report based on the "interpreter tag" of a local.
* Added built-in attribute for marking the "user fn" and a `TyCtxt` query for locating and caching its `DefId`.
    * **[Non-Essential]** The compiler needs to know about this because of the modified type pretty-printing in interpreter mode. The REPL obviously needs to know about this too, for various reasons concerning evaluation of the MIR.
    * **[Non-Essential]** Modified type pretty-printing to print `"eval"` instead of the usual `"crate_name::main::user_fn_closure"` (for example). (Useful for diagnostics and the "print" aspect of the REPL.)
* Slightly expanded the Miri API, for use by the REPL's particular implementation of the `Machine` trait.
    * Added `insert_alloc` method to machine, used by the REPL for restoring memory when deserialising a previous evaluation session.
    * Added hooks `before_statement`, `after_statement`, `before_stack_push` (renamed existing method), `after_stack_push`, `before_stack_pop`, `after_stack_pop` (renamed existing method).
    * Made stack pop behaviour more flexible, so as to allow the cleanup flag to be independent of whether the action is null or a "goto".
* **[Non-Essential]** Created a virtual filesystem (VFS) abstraction for use by incremental compilation. The in-memory implementation is used to speed up the REPL across multiple evaluation sessions, and may perhaps find additional uses in the future.
    * All file operations (creation, deletion, read, write, locking) currently used by the incremental compilation subsystem are supported, but they now go through the abstracted VFS.
    * Two implementations of the `Vfs` trait are provided by rustc: an on-disk one (default, works precisely as before) and an in-memory one (stores everything in a data structure).
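As a rough illustration of the VFS idea, the sketch below shows what a trait with an in-memory implementation might look like. Only the trait name `Vfs` and the list of operations come from this note. The method names, signatures, and the `MemoryVfs` type are assumptions made for the example and are not the API in my fork.

```rust
use std::collections::{HashMap, HashSet};
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical shape of the abstraction; method names are assumptions.
trait Vfs {
    /// Create the file if needed and replace its contents.
    fn write(&mut self, path: &Path, contents: &[u8]) -> io::Result<()>;
    /// Read the whole file.
    fn read(&self, path: &Path) -> io::Result<Vec<u8>>;
    /// Delete the file.
    fn remove(&mut self, path: &Path) -> io::Result<()>;
    /// Try to take a lock; returns `Ok(false)` if it is already held.
    fn try_lock(&mut self, path: &Path) -> io::Result<bool>;
    /// Release a lock (a no-op if it is not held).
    fn unlock(&mut self, path: &Path);
}

/// In-memory implementation: everything lives in ordinary collections.
#[derive(Default)]
struct MemoryVfs {
    files: HashMap<PathBuf, Vec<u8>>,
    locks: HashSet<PathBuf>,
}

fn not_found(path: &Path) -> io::Error {
    io::Error::new(
        io::ErrorKind::NotFound,
        format!("{} not found", path.display()),
    )
}

impl Vfs for MemoryVfs {
    fn write(&mut self, path: &Path, contents: &[u8]) -> io::Result<()> {
        self.files.insert(path.to_path_buf(), contents.to_vec());
        Ok(())
    }

    fn read(&self, path: &Path) -> io::Result<Vec<u8>> {
        self.files.get(path).cloned().ok_or_else(|| not_found(path))
    }

    fn remove(&mut self, path: &Path) -> io::Result<()> {
        self.files
            .remove(path)
            .map(|_| ())
            .ok_or_else(|| not_found(path))
    }

    fn try_lock(&mut self, path: &Path) -> io::Result<bool> {
        // `insert` returns false when the path was already present.
        Ok(self.locks.insert(path.to_path_buf()))
    }

    fn unlock(&mut self, path: &Path) {
        self.locks.remove(path);
    }
}

/// Callers depend only on the trait, so the backing store can be swapped.
fn save_and_reload(vfs: &mut dyn Vfs, path: &Path, data: &[u8]) -> io::Result<Vec<u8>> {
    vfs.write(path, data)?;
    vfs.read(path)
}

fn main() -> io::Result<()> {
    let mut vfs = MemoryVfs::default();
    // The path is purely illustrative; nothing touches the real filesystem.
    let reloaded = save_and_reload(&mut vfs, Path::new("session/cache.bin"), b"cached")?;
    println!("{} bytes round-tripped", reloaded.len());
    Ok(())
}
```

An on-disk implementation of the same trait would forward each method to the corresponding `std::fs` call, which is what allows the default behaviour to remain unchanged.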
The intention is to merge the above features as separate PRs, each grouping naturally related functionality.
*Note:* Additions/modifications marked **Non-Essential** are desirable for a good and proper user experience, but are not strictly required for an MVP.
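To illustrate how the interpreter hooks listed under the Miri API changes fit together, here is a simplified, standalone sketch. The six hook names are the ones given above. The `Hooks` trait, the string-based signatures, the `Tracer` type, and the exact points at which the hooks fire are assumptions made for the example, since the real hooks operate on Miri's interpreter context rather than on strings.

```rust
/// Hypothetical, simplified mirror of the hook names listed above.
/// Default bodies are no-ops, so a machine overrides only what it needs.
trait Hooks {
    fn before_statement(&mut self, _stmt: &str) {}
    fn after_statement(&mut self, _stmt: &str) {}
    fn before_stack_push(&mut self, _frame: &str) {}
    fn after_stack_push(&mut self, _frame: &str) {}
    fn before_stack_pop(&mut self, _frame: &str) {}
    fn after_stack_pop(&mut self, _frame: &str) {}
}

/// A machine that records every hook invocation in order.
#[derive(Default)]
struct Tracer {
    events: Vec<String>,
}

impl Tracer {
    fn log(&mut self, hook: &str, what: &str) {
        self.events.push(format!("{} {}", hook, what));
    }
}

impl Hooks for Tracer {
    fn before_statement(&mut self, stmt: &str) { self.log("before_statement", stmt) }
    fn after_statement(&mut self, stmt: &str) { self.log("after_statement", stmt) }
    fn before_stack_push(&mut self, frame: &str) { self.log("before_stack_push", frame) }
    fn after_stack_push(&mut self, frame: &str) { self.log("after_stack_push", frame) }
    fn before_stack_pop(&mut self, frame: &str) { self.log("before_stack_pop", frame) }
    fn after_stack_pop(&mut self, frame: &str) { self.log("after_stack_pop", frame) }
}

/// Toy driver: "runs" one frame and fires the hooks around each step.
/// The ordering shown here is an assumption about where the hooks sit.
fn run_frame<M: Hooks>(machine: &mut M, frame: &str, statements: &[&str]) {
    machine.before_stack_push(frame);
    // A real interpreter would push the stack frame here.
    machine.after_stack_push(frame);
    for &stmt in statements {
        machine.before_statement(stmt);
        // A real interpreter would execute the MIR statement here.
        machine.after_statement(stmt);
    }
    machine.before_stack_pop(frame);
    // A real interpreter would pop the frame and run any cleanup here.
    machine.after_stack_pop(frame);
}

fn main() {
    let mut tracer = Tracer::default();
    run_frame(&mut tracer, "user_fn", &["x = 1", "y = x"]);
    for event in &tracer.events {
        println!("{}", event);
    }
}
```

Having paired before/after hooks lets a machine observe the interpreter state on both sides of each step without altering the interpreter's own control flow.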
## Maintenance
As mentioned in the *Motivation* section, the idea is to minimise the surface area within the rustc codebase that is used solely for the REPL. Code which is used solely by the REPL at this point shall be clearly documented/commented. Occasional breakages may nonetheless occur, as they do with other tooling (even official tools like Clippy and Miri), but for now I shall take it upon myself to watch for such breakages in a semi-automated (e.g., cronjob) or manual fashion, and update my REPL repository accordingly, thus offloading any significant maintenance burden from the compiler team.
Of course, I am also perfectly willing to act as a maintainer for those parts of the rustc codebase that I will introduce for the purposes of the REPL. In the (unlikely) event the compiler architecture changes significantly so as to make interfacing with the REPL problematic, I can make myself available for discussions with members of the compiler team to remedy this.