
# A Note on the Agda Codebase - Type Checking & Reflection

This note aims at understanding the main codebase of [Agda](https://github.com/agda/agda), i.e. the directory `src/full/Agda/`. For now, we focus on its type checking and reflection mechanisms.

In this note, a file path `src/full/Agda/Foo/Bar.hs` will be referred to as `Agda.Foo.Bar`, adopting the notation of Haskell's module system.

## Requirements

We assume the reader has the following skills:

* Be familiar with basic concepts of compilers.
* Be familiar with Haskell and its toolchain [Cabal](https://cabal.readthedocs.io/) or [Stack](https://docs.haskellstack.org/).
* Have experience with building [Agda](https://github.com/agda/agda) from source.
* Be familiar with Agda.
* (Optional) Have a basic understanding of [elaborator reflection](https://agda.readthedocs.io/en/latest/language/reflection.html) in Agda.
    * Introduction: https://github.com/alhassy/gentle-intro-to-reflection

In addition, to navigate Agda efficiently, we recommend the following:

* An editor/IDE with support for the [Language Server Protocol](https://microsoft.github.io/language-server-protocol/), see the list of [implementations](https://microsoft.github.io/language-server-protocol/implementors/tools/).
* [Haskell Language Server](https://github.com/haskell/haskell-language-server), HLS for short.

HLS has useful features, including showing type information and Haddock documentation on hover, jumping to definitions, peeking at type annotations, type checking on the fly, etc.

For editors, we personally use Neovim with [Coc](https://github.com/neoclide/coc.nvim) and/or VS Code with the [Haskell](https://marketplace.visualstudio.com/items?itemName=haskell.haskell) extension. It is only a matter of choice, though.

## Overview

### Basic Structure

We briefly introduce some important directories of the codebase and the key concepts of Agda.
Below is a simplified directory structure of the source repository of Agda:

```
src/full/
└── Agda
    ├── ...
    ├── Compiler
    ├── Interaction
    ├── Syntax
    ├── Termination
    ├── TypeChecking
    ├── Utils
    ├── ...
```

Each directory represents a (somewhat) standalone set of functions for a specific task.

#### `Agda.Utils`

TBD.

#### `Agda.Compiler`

```
...
├── Compiler
│   ├── JS
│   ├── MAlonzo
│   └── Treeless
...
```

`Agda.Compiler` provides utilities for compiling Agda to a different language. It includes compilers to Haskell under `MAlonzo` and to JavaScript under `JS`.

#### `Agda.Interaction`

`Agda.Interaction` includes functions that deal with user interactions. They serve as interfaces between the main Agda program and the underlying components. E.g. `Imports.hs` includes functions such as `parseSource` and `typeCheckMain`.

#### `Agda.Syntax`

```
...
├── Syntax
│   ├── Abstract
│   ├── Concrete
│   ├── Internal
│   ├── Parser
│   └── Translation
...
```

There are four kinds of syntax objects representing the source code at different stages of checking an Agda file.

* Concrete syntax represents parsed source code. The parsed results (represented by `Declaration` and `Expr` defined in [`Agda.Syntax.Concrete`](https://hackage.haskell.org/package/Agda-2.6.2/docs/Agda-Syntax-Concrete.html)) contain information such as the position of each declaration in a file.
* Abstract syntax is translated from concrete syntax. The translation handles scope checking and desugaring (e.g., expressions with conflicting names are now distinguished), so abstract syntax should be ready to be type checked. Objects of abstract syntax are represented by `Expr` defined in [`Agda.Syntax.Abstract`](https://hackage.haskell.org/package/Agda-2.6.2/docs/Agda-Syntax-Abstract.html).
* Internal syntax is the result of type checking abstract syntax. It should represent working Agda programs. It is defined in [`Agda.Syntax.Internal`](https://hackage.haskell.org/package/Agda-2.6.2/docs/Agda-Syntax-Internal.html).
* Reflected syntax is the representation for elaborator reflection, serving as an interface between metaprogrammers and the elaborator. The reflected syntax in Agda is defined in the [built-in library](https://github.com/agda/agda/blob/master/src/data/lib/prim/Agda/Builtin/Reflection.agda), and the reflected syntax passed to the internal type checker is defined in [`Agda.Syntax.Reflected`](https://hackage.haskell.org/package/Agda-2.6.2/docs/Agda-Syntax-Reflected.html).

#### `Agda.Termination`

TBD.

#### `Agda.TypeChecking`

```
...
├── TypeChecking
│   ...
│   ├── Monad
│   ├── Rules
│   ...
...
```

TBD:

* The type checking monad `TCM`.
* Type checking functions for Data/Record/Function etc.

See the section [Type Checking](#Type-Checking).

### What Happens when Agda Checks a File

The entry point of Agda is the `runAgda` function in `Agda.Main`. It is the first function to be evaluated when the user runs `agda foo.agda` in a shell or checks an Agda file via agda-mode. It is called by the `main` function in `src/main/Main.hs`.

`runAgda` calls `runAgdaWithOptions`, and `checkFile` in `runAgdaWithOptions` takes a path to an Agda file and produces the results.

In `checkFile`, two important functions do most of the work of checking a file: `parseSource` parses the Agda file and produces a `Source` data structure, then `typeCheckMain` takes a `Source` and produces a `CheckResult`. Both `parseSource` and `typeCheckMain` are defined in `Agda.Interaction.Imports`.

Despite being called `typeCheckMain`, the function also performs scope checking as a part of the process. We will treat scope checking in a separate section.

## Parsing

`parseFile` in `parseSource` parses a given file as a module with `moduleParser`, in which other declarations will be parsed accordingly. The grammar is defined in `src/full/Agda/Syntax/Parser/Parser.y`, from which the parsers are generated at compile time.
For example, we can find this grammar of `UnquoteDecl` in `Parser.y`:

```
UnquoteDecl :: { Declaration }
UnquoteDecl
  : 'unquoteDecl' '=' Expr { UnquoteDecl (fuseRange $1 $3) [] $3 }
  | 'unquoteDecl' 'data' Id '=' Expr { UnquoteData (getRange($1, $2, $5)) $3 [] $5 }
  | 'unquoteDecl' 'data' Id 'constructor' SpaceIds '=' Expr { UnquoteData (getRange($1, $2, $4, $7)) $3 (List1.toList $5) $7 }
  | 'unquoteDecl' SpaceIds '=' Expr { UnquoteDecl (fuseRange $1 $4) (List1.toList $2) $4 }
  | 'unquoteDef' SpaceIds '=' Expr { UnquoteDef (fuseRange $1 $4) (List1.toList $2) $4 }
```

In the cases of `unquoteDecl data ...`, the result will be stored as a `Declaration` constructed by `UnquoteData` (defined in `Agda.Syntax.Concrete`).

## Type Checking

> TODO:
> * Explain what a type checking monad `TCM` can do.
> * How data/record/function/postulate/module definitions, etc. are checked.
>   What functions are used.
> * Metavariable management.
> * Reflection
> * Primitives

Let's take a general look at the type checking process:

* In `typeCheckMain`, the returned `CheckResult` is constructed from the result `mi` of `getInterface`.
* `getInterface` checks whether an interface file already exists (an `.agdai` file generated when checking an Agda program). If it does, `getStoredInterface` is called.
* Otherwise, an interface file is created with `createInterface`.
* In `createInterface` there are two crucial steps, scope checking and type checking. Scope checking is done with `concreteToAbstract_`. Type checking is done with `checkDeclCached`.
    * `concreteToAbstract_` is defined in `Agda.Syntax.Translation.ConcreteToAbstract` (see [Scope Checking](#Scope-Checking) for details).
    * `checkDeclCached` (defined in `Agda.TypeChecking.Rules.Decl`) is, as its name indicates, a cached version of `checkDecl`. Most definitions from now on are defined in `Agda.TypeChecking.Rules.Decl`.
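
The control flow in the list above can be summarised with a toy model. This is only a sketch of the shape of the process, not Agda's code: `Source`, `Abstract`, `Interface`, `Cache`, `scopeCheck`, `typeCheck`, `createInterfaceToy` and `getInterfaceToy` are all hypothetical stand-ins invented for illustration, and the real functions run in `TCM` and do far more work.

```haskell
-- Toy stand-ins for Agda's data types; none of these are the real definitions.
type ModuleName = String

newtype Source    = Source [String]     -- "parsed" declarations
newtype Abstract  = Abstract [String]   -- "scope-checked" declarations
newtype Interface = Interface [String]  -- what would end up in an .agdai file
  deriving (Eq, Show)

-- A hypothetical in-memory cache playing the role of stored interface files.
type Cache = [(ModuleName, Interface)]

-- Stand-in for scope checking (concrete syntax to abstract syntax).
scopeCheck :: Source -> Abstract
scopeCheck (Source ds) = Abstract (map ("scoped:" ++) ds)

-- Stand-in for type checking every declaration.
typeCheck :: Abstract -> Interface
typeCheck (Abstract ds) = Interface (map ("checked:" ++) ds)

-- Same shape as the two steps of createInterface: scope check, then type check.
createInterfaceToy :: Source -> Interface
createInterfaceToy = typeCheck . scopeCheck

-- Same shape as getInterface: reuse a stored interface if there is one,
-- otherwise create a new one and remember it.
getInterfaceToy :: ModuleName -> Source -> Cache -> (Interface, Cache)
getInterfaceToy name src cache =
  case lookup name cache of
    Just iface -> (iface, cache)
    Nothing    ->
      let iface = createInterfaceToy src
      in  (iface, (name, iface) : cache)
```

Calling `getInterfaceToy` twice for the same module name returns the cached `Interface` the second time without running `scopeCheck` or `typeCheck` again, which is the role the stored `.agdai` file plays in the real process.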

Type checking a file with `checkDecl` eventually records type information through computations in the `TCM` monad, provided no errors occur. In `checkDecl`, every constructor of an abstract `Declaration` is mapped to a `TCM` computation, i.e., a type checking process. For `UnquoteDecl`, it is `checkUnquoteDecl`.

`checkUnquoteDecl` calls `unquoteTop`, which calls `evalTCM` (defined in `Agda.TypeChecking.Unquote`), where we can find definitions of all the primitive `TC` computations in Agda, i.e., the [primitive operations](https://agda.readthedocs.io/en/latest/language/reflection.html#type-checking-computations) of elaborator reflection. **Metaprograms called with `unquoteDecl` or `unquoteDecl data ... constructor ...` are evaluated at the type checking stage**.

For example, `tcDefineFun` takes a `QName` and a list of `Clause` (defined in `Agda.Syntax.Reflected`). It corresponds to the primitive `defineFun` in the built-in elaborator reflection library of Agda.

```
defineFun : Name → List Clause → TC ⊤
```

How `Name` and `List Clause` in Agda are translated to the internal `QName` and `[Clause]` is explained in the [Elaborator Reflection](#Elaborator-Reflection) section.

### The unquote monad `UnquoteM`

Operations of elaborator reflection are computations in the `UnquoteM` monad.

### The type checking monad `TCM`

`TCM` handles all the interactions with state that we need when type checking a file. Let's take a closer look at the definition of `TCM` in `Agda.TypeChecking.Monad.Base`:

```haskell
newtype TCMT m a = TCM { unTCM :: IORef TCState -> TCEnv -> m a }

type TCM = TCMT IO
```

A `TCM` computation has access to two pieces of data: a mutable `TCState` (behind an `IORef`) and an immutable `TCEnv`. How they are used is unclear (we don't even know if it's a monad yet) until we take a look at the instances of `TCM`. There are **a lot** of instances, but the one that needs our immediate attention is `MonadTCEnv`.
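
To see why a function of type `IORef TCState -> TCEnv -> m a` can act as a monad with both mutable state and a read-only environment, here is a minimal sketch of the same pattern. It is not Agda's code: `ToyState`, `ToyEnv`, `ToyTCM` and every helper below are hypothetical names made up for illustration, and the real `TCState` and `TCEnv` are large records.

```haskell
import Data.IORef (IORef, newIORef, readIORef, modifyIORef')

-- Toy stand-ins for TCState and TCEnv (hypothetical, for illustration only).
newtype ToyState = ToyState { definedNames :: [String] }
newtype ToyEnv   = ToyEnv   { currentModule :: String }

-- Same shape as TCMT specialised to IO.
newtype ToyTCM a = ToyTCM { unToyTCM :: IORef ToyState -> ToyEnv -> IO a }

instance Functor ToyTCM where
  fmap f (ToyTCM m) = ToyTCM $ \r e -> fmap f (m r e)

instance Applicative ToyTCM where
  pure x = ToyTCM $ \_ _ -> pure x
  ToyTCM mf <*> ToyTCM mx = ToyTCM $ \r e -> mf r e <*> mx r e

instance Monad ToyTCM where
  ToyTCM m >>= k = ToyTCM $ \r e -> do
    x <- m r e
    unToyTCM (k x) r e

-- State: read and update through the shared reference.
getState :: ToyTCM ToyState
getState = ToyTCM $ \r _ -> readIORef r

modifyState :: (ToyState -> ToyState) -> ToyTCM ()
modifyState f = ToyTCM $ \r _ -> modifyIORef' r f

-- Environment: read it, or run a sub-computation with a locally changed copy.
askEnv :: ToyTCM ToyEnv
askEnv = ToyTCM $ \_ e -> pure e

localEnv :: (ToyEnv -> ToyEnv) -> ToyTCM a -> ToyTCM a
localEnv f (ToyTCM m) = ToyTCM $ \r e -> m r (f e)

-- Run a computation from an initial environment and state.
runToyTCM :: ToyEnv -> ToyState -> ToyTCM a -> IO (a, ToyState)
runToyTCM env st (ToyTCM m) = do
  ref <- newIORef st
  x   <- m ref env
  st' <- readIORef ref
  pure (x, st')

-- Record a definition, qualified by the module name found in the environment.
define :: String -> ToyTCM ()
define x = do
  env <- askEnv
  modifyState (\s -> s { definedNames = (currentModule env ++ "." ++ x) : definedNames s })
```

In this sketch, a change made with `modifyState` is visible to everything that runs afterwards, while a change made with `localEnv` is visible only inside the wrapped sub-computation. That difference is what "mutable" and "immutable" mean for the two arguments.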

While `MonadTCState` gives access to the *static* information about a file, like path, highlighting information or declared identifiers (see `Agda.TypeChecking.Monad.Base`), `MonadTCEnv` gives access to *dynamic* information that varies throughout the type checking process (it is overridden locally for a sub-computation rather than mutated in place).

## Scope Checking

Most of the definitions mentioned here are defined in `Agda.Syntax.Translation.ConcreteToAbstract`.

Scope checking is essentially **converting concrete syntax to abstract syntax**. `concreteToAbstract_`, when called in `createInterface` (in `Agda.Interaction.Imports`), first converts all parsed declarations to `NiceDeclaration`s (another form of concrete syntax), then converts the `NiceDeclaration`s to abstract `Declaration`s (defined in `Agda.Syntax.Abstract`).

How concrete `Declaration`s are converted to `NiceDeclaration`s and how `NiceDeclaration`s are scope checked are both described by their instances of `ToAbstract`. In `instance ToAbstract NiceDeclaration`, `toAbstract` maps each constructor of `NiceDeclaration` to a `ScopeM` computation which describes the scope checking process of the respective declaration.

For example, when scope checking the following Agda program,

```
unquoteDecl x = declareDef x ...
```

a `NiceDeclaration` is constructed with the constructor `NiceUnquoteDecl` before being scope checked.

The constructor `NiceUnquoteDecl` takes several fields, including a `[Name]` and an `Expr`. They correspond respectively to the list of names following `unquoteDecl` on the left-hand side (in this case, a single `x`) and the expression `e` on the right-hand side (in this case, `declareDef x ...`), which should be a metaprogram that defines all the `Name`s.

In the process of scope checking the `NiceUnquoteDecl` case of a `NiceDeclaration`, `bindName` gives a name its scope info `KindOfName`, and `toAbstract e` then scope checks the expression on the right-hand side. In this case, `e` is parsed from `declareDef x ...`.

In the end, `x` should be scope checked as a function definition so the programmer can use it as such (implied by the usage of `unquoteDecl` without `data`). Here things get a bit tricky: if we have already told the scope checker that `x` is a function definition before `toAbstract e`, scope checking of the metaprogram on the right-hand side will fail. So we first have to give the `Name`s different scope info (by binding them as `QuotableName`) for `toAbstract e` to succeed. After `toAbstract e`, we tell the scope checker that the `Name`s are actually function definitions by calling `bindName` again.

### Elaborator Reflection

"Reflected syntax" sometimes refers to the syntax objects metaprogrammers use to interact with elaborator reflection, as defined in the [built-in library](https://github.com/agda/agda/blob/master/src/data/lib/prim/Agda/Builtin/Reflection.agda). Other times it refers to their internal counterparts in `Agda.Syntax.Reflected`.

Metaprograms have to be translated to their internal counterparts before being evaluated, and so does the reflected syntax. Their internal definitions are in `Agda.Syntax.Reflected`, and the translations are defined in `Agda.TypeChecking.Unquote` along with translations of other built-in operations.

`evalTCM` defines the computations corresponding to the primitive operations in `TC`. The arguments passed to these operations are internal syntax before the operations are translated (i.e. unquoted) by the `unFun` or `tcFun` families of functions. These functions work on computations whose arguments are constrained by the `Unquote` typeclass. The instances of `Unquote` define translations from internal syntax to reflected syntax.
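
The idea of a typeclass that decodes arguments, plus helpers that lift a typed operation to one working on raw terms, can be sketched as follows. This is a toy model only: `Term`, `ToyUnquote`, `toyFun2`, `Name` and `defineToy` are hypothetical names made up for illustration, and the real `Unquote` class works on Agda's internal syntax inside `UnquoteM` rather than on this small `Term` type with `Either`.

```haskell
-- A toy "internal" term: literals and constructor applications.
data Term
  = LitNat Integer
  | LitString String
  | Con String [Term]
  deriving (Eq, Show)

-- A toy reflected name (hypothetical; not Agda's QName).
newtype Name = Name String
  deriving (Eq, Show)

-- Toy analogue of an unquoting class: decode a typed value from a raw term.
class ToyUnquote a where
  toyUnquote :: Term -> Either String a

instance ToyUnquote Integer where
  toyUnquote (LitNat n) = Right n
  toyUnquote t          = Left ("not a number: " ++ show t)

instance ToyUnquote Name where
  toyUnquote (LitString s) = Right (Name s)
  toyUnquote t             = Left ("not a name: " ++ show t)

-- Lists are encoded with the constructors "nil" and "cons" in this toy model.
instance ToyUnquote a => ToyUnquote [a] where
  toyUnquote (Con "nil" [])       = Right []
  toyUnquote (Con "cons" [x, xs]) = (:) <$> toyUnquote x <*> toyUnquote xs
  toyUnquote t                    = Left ("not a list: " ++ show t)

-- Lift a two-argument operation on decoded values to one on raw terms.
-- The class constraints pick the right decoder for each argument.
toyFun2 :: (ToyUnquote a, ToyUnquote b) => (a -> b -> r) -> Term -> Term -> Either String r
toyFun2 f x y = f <$> toyUnquote x <*> toyUnquote y

-- A hypothetical primitive that takes a name and a list of "clauses".
defineToy :: Name -> [Integer] -> String
defineToy (Name n) clauses =
  n ++ " defined with " ++ show (length clauses) ++ " clauses"
```

With these definitions, `toyFun2 defineToy` accepts two raw `Term`s and fails with a `Left` if either argument does not decode to the expected type, so each primitive can be written against typed arguments while the decoding is handled once, in the class instances.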