# Proposal: Unification

###### tags: `Juvix` `juvix-project`

Unification concerns type/conversion checking in the presence of meta-variables. Meta-variables are variables whose values are not yet known. As type checking/elaboration progresses, it generates constraints that these variables have to satisfy, and solving those constraints yields their values. Implicits (and implicit arguments) are inferred by unification.

Conversion checking (of terms and closures) determines whether the two inputs are equivalent. Similarly, unification determines whether the two inputs are equivalent in the presence of meta-variables. Naturally, unification is implemented as an extension to conversion checking.

## Conversion

Conversion (see [`Core.Normalise`](https://github.com/idris-lang/Idris2/blob/master/src/Core/Normalise.idr)) is extended to support unification.

Conversion of `Term`s normalises both sides (turns them into normal forms, `nf`) and then compares the results:

```haskell
Convert Term where
  convGen q defs env x y
      = convGen q defs env !(nf defs env x) !(nf defs env y)
```

Conversion of `Closure`s evaluates both closures so that conversion checking can be done on the results:

```haskell
Convert Closure where
  convGen q defs env x y
      = convGen q defs env !(evalClosure defs x) !(evalClosure defs y)
```

The evaluator is pretty standard.
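
To make "normalise both sides, then compare" concrete, here is a minimal sketch in Haskell on a toy untyped lambda calculus. It is not the Idris 2 implementation: the types `Tm` and `Val` and the functions `eval`, `apply`, `quote`, `nf` and `convert` are hypothetical names chosen for this illustration, and the sketch ignores eta-equality, multiplicities and the global context. It also assumes closed, normalising input terms.

```haskell
-- Toy lambda terms with de Bruijn indices (hypothetical, much simpler
-- than Idris 2's `Term`).
data Tm
  = Var Int      -- bound variable, counted from the nearest binder
  | Lam Tm
  | App Tm Tm
  deriving (Eq, Show)

-- Values: closures for lambdas, neutral terms for applications that are
-- stuck on a free variable (identified by a de Bruijn level).
data Val
  = VLam [Val] Tm
  | VNeu Int [Val]   -- head level and arguments (innermost last)

eval :: [Val] -> Tm -> Val
eval env (Var i)   = env !! i
eval env (Lam b)   = VLam env b
eval env (App f a) = apply (eval env f) (eval env a)

apply :: Val -> Val -> Val
apply (VLam env b) v = eval (v : env) b
apply (VNeu h sp)  v = VNeu h (sp ++ [v])

-- Read a value back into a normal form. `n` is the number of binders we
-- are under; it converts levels back into indices.
quote :: Int -> Val -> Tm
quote n (VLam env b) = Lam (quote (n + 1) (eval (VNeu n [] : env) b))
quote n (VNeu h sp)  = foldl App (Var (n - h - 1)) (map (quote n) sp)

-- Normal form of a closed term.
nf :: Tm -> Tm
nf = quote 0 . eval []

-- Two closed terms are convertible when their normal forms coincide.
convert :: Tm -> Tm -> Bool
convert x y = nf x == nf y
```

With `identity = Lam (Var 0)`, the call `convert (App identity identity) identity` holds because the beta-redex normalises to `identity`. Comparing syntactic normal forms is the simplest possible design; a production checker compares values directly to avoid normalising more than necessary.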
To convert binders, we generate a fresh free variable, wrap it in a closure, and check that the two scopes evaluated with that closure are convertible:

```haskell
convGen q defs env (NBind fc x b sc) (NBind _ x' b' sc')
    = do var <- genName "conv" -- generate a new variable; "conv" is a tag for easier debugging
         -- make a closure with the new variable in it
         let c = MkClosure defaultOpts [] env (Ref fc Bound var)
         bok <- convBinders q defs env b b'
         if bok
            then do bsc <- sc defs c
                    bsc' <- sc' defs c
                    convGen q defs env bsc bsc'
            else pure False
```

## Unification

`Unify` is similar to conversion except that it maintains a *unification state* (`UState`), which keeps track of the meta-variables we are inferring and the constraints we generated for them. See more below.

```haskell
unify : Unify tm =>
        {vars : _} ->
        {auto c : Ref Ctxt Defs} -> -- the context is required as in conversion
        -- the state stores information about the meta-variables and their constraints
        {auto u : Ref UST UState} ->
        UnifyInfo ->
        FC -> Env Term vars ->
        tm vars -> tm vars ->
        Core UnifyResult
unify {c} {u} = unifyD c u
```

Also, instead of only returning *True* or *False* as conversion does, unification returns a `UnifyResult`, which essentially encodes one of three answers:

- *Yes*, the terms evaluate to the same thing.
- *No*, the terms cannot be evaluated to the same thing and cannot be unified. It throws a type mismatch error.
- *Yes, but...*, the terms may be unifiable. There is no definitive answer on whether the terms unify until there is further progress.

`UnifyResult`:

```haskell
record UnifyResult where
  constructor MkUnifyResult
  constraints : List Int
  holesSolved : Bool -- did we solve any holes
  namesSolved : List Int -- which ones did we solve (as name indices)
  addLazy : AddLazy
```

As we call `Unify`, some or all of the meta-variables may be solved. These are stored in the `namesSolved` field. For the others, we may only be able to generate constraints.
In that case, we don't know whether unification succeeds until we know the constraints can be satisfied (thus the *Yes, but...* result).

### Holes and guesses

[Recall](https://gist.github.com/thealmarty/ad574da780b902461117e905b3c078aa) the global context has the following constructors:

```haskell
data Def : Type where
    Hole : (numlocs : Nat) -> -- Number of locals in scope at binding point
                              -- (mostly to help display)
           HoleFlags ->
           Def
    -- Constraints are integer references into the current map of
    -- constraints in the UnifyState (see Core.UnifyState)
    Guess : (guess : ClosedTerm) ->
            (envbind : Nat) -> -- Number of things in the environment when
                               -- we guessed the term
            (constraints : List Int) -> Def
```

Any meta-variables we encounter are added to the global context as `Hole` or `Guess`. See [Type checking in the presence of meta-variables (Ulf Norell, Catarina Coquand, 2007)](http://www.cse.chalmers.se/~ulfn/papers/meta-variables.pdf) for details.

A `Hole` is a meta-variable to be solved. When we encounter a meta-variable, we add it as a `Hole` in the global context.

A `Guess` is a meta-variable applied to its current environment. The first argument `guess` is the value that it should have assuming all the `constraints` (the third argument) are satisfied. Once the list of constraints becomes empty, i.e., `constraints = []`, the meta-variable can be promoted into an ordinary `PMDef` in the global context.

### The unification state (`UState`)

`UState` stores **holes and guesses** together with **a global list of constraints**, each referred to by an `Int` (a `Nat` would be more precise):

```haskell
record UState where
  constructor MkUState
  holes : IntMap (FC, Name) -- All metavariables with no definition yet.
                            -- 'Int' is the 'Resolved' name
  guesses : IntMap (FC, Name) -- Names which will be defined when constraints solved
                              -- (also includes auto implicit searches)
  currentHoles : IntMap (FC, Name) -- Holes introduced this elaboration session
  delayedHoles : IntMap (FC, Name)
                 -- Holes left unsolved after an elaboration,
                 -- so we need to check again at the end whether
                 -- they have been solved later. Doesn't include
                 -- user defined hole names, which don't need
                 -- to have been solved
  constraints : IntMap Constraint -- map for finding constraints by ID
  dotConstraints : List (Name, DotReason, Constraint) -- dot pattern constraints
  nextName : Int
  nextConstraint : Int
  delayedElab : List (Nat, Int, Core ClosedTerm)
                -- Elaborators which we need to try again later, because
                -- we didn't have enough type information to elaborate
                -- successfully yet.
                -- 'Nat' is the priority (lowest first)
                -- The 'Int' is the resolved name. Delays can't be nested,
                -- so we just process them in order.
  logging : Bool
```

### Generating terms for holes and guesses

`newMeta` and `newConstant` are helper functions that create meta-variables.

`newMeta` (which calls `newMetaLets`) creates a new meta-variable represented by a `Hole`. Once a new meta-variable is created and unification is applied, one may find a solution for it.
```haskell
-- Create a new metavariable with the given name and return type,
-- and return a term which is the metavariable applied to the environment
-- (and which has the given type)
-- Flag whether cycles are allowed in the result, and whether to abstract
-- over lets
newMetaLets : {vars : _} ->
              {auto c : Ref Ctxt Defs} ->
              {auto u : Ref UST UState} ->
              FC -> RigCount ->
              Env Term vars -> Name -> Term vars -> Def ->
              Bool -> Bool ->
              Core (Int, Term vars)
newMetaLets {vars} fc rig env n ty def nocyc lets
    = do let hty = if lets then abstractFullEnvType fc env ty
                           else abstractEnvType fc env ty
         let hole = record { noCycles = nocyc }
                           (newDef fc n rig [] hty Public def)
         log "unify.meta" 5 $ "Adding new meta " ++ show (n, fc, rig)
         logTerm "unify.meta" 10 ("New meta type " ++ show n) hty
         defs <- get Ctxt
         idx <- addDef n hole
         addHoleName fc n idx
         pure (idx, Meta fc n idx envArgs)
  where
    envArgs : List (Term vars)
    envArgs = let args = reverse (mkConstantAppArgs {done = []} lets fc env []) in
                  rewrite sym (appendNilRightNeutral vars) in args

newMeta : {vars : _} ->
          {auto c : Ref Ctxt Defs} ->
          {auto u : Ref UST UState} ->
          FC -> RigCount ->
          Env Term vars -> Name -> Term vars -> Def ->
          Bool ->
          Core (Int, Term vars)
newMeta fc r env n ty def cyc = newMetaLets fc r env n ty def cyc False
```

When a solution is not available, we call `newConstant` to make a `Guess` def. A `Guess` is guarded by some constraints which are generated by unification. The constraints need to be solved before more elaboration can be done.

`newConstant` takes a term which the guess is constructed from and its environment as inputs. It also takes in the list of constraints that need to be satisfied/solved.
```haskell
mkConstant : {vars : _} ->
             FC -> Env Term vars -> Term vars -> ClosedTerm
mkConstant fc [] tm = tm
mkConstant {vars = x :: _} fc (b :: env) tm
    = let ty = binderType b in
          mkConstant fc env (Bind fc x (Lam fc (multiplicity b) Explicit ty) tm)

-- Given a term and a type, add a new guarded constant to the global context
-- by applying the term to the current environment
-- Return the replacement term (the name applied to the environment)
newConstant : {vars : _} ->
              {auto u : Ref UST UState} ->
              {auto c : Ref Ctxt Defs} ->
              FC -> RigCount -> Env Term vars ->
              (tm : Term vars) -> (ty : Term vars) ->
              (constrs : List Int) ->
              Core (Term vars)
newConstant {vars} fc rig env tm ty constrs
    = do let def = mkConstant fc env tm
         let defty = abstractFullEnvType fc env ty
         cn <- genName "postpone"
         let guess = newDef fc cn rig [] defty Public
                            (Guess def (length env) constrs)
         log "unify.constant" 5 $ "Adding new constant " ++ show (cn, fc, rig)
         logTerm "unify.constant" 10 ("New constant type " ++ show cn) defty
         idx <- addDef cn guess
         addGuessName fc cn idx
         pure (Meta fc cn idx envArgs)
  where
    envArgs : List (Term vars)
    envArgs = let args = reverse (mkConstantAppArgs {done = []} True fc env []) in
                  rewrite sym (appendNilRightNeutral vars) in args
```

A `Constraint` (in [Core.UnifyState](https://github.com/idris-lang/Idris2/blob/master/src/Core/UnifyState.idr)) is either a pair of terms that need to be convertible (a single constraint) or a pair of term lists (a sequence of constraints):

```haskell
data Constraint : Type where
     -- An unsolved constraint, noting two terms which need to be convertible
     -- in a particular environment
     MkConstraint : {vars : _} ->
                    FC ->
                    (withLazy : Bool) ->
                    (blockedOn : List Name) ->
                    (env : Env Term vars) ->
                    (x : Term vars) -> (y : Term vars) ->
                    Constraint
     -- An unsolved sequence of constraints, arising from arguments in an
     -- application where solving later constraints relies on solving earlier
     -- ones
     MkSeqConstraint : {vars : _} ->
                       FC ->
                       (env : Env Term vars) ->
                       (xs : List (Term vars)) ->
                       (ys : List (Term vars)) ->
                       Constraint
     -- A resolved constraint
     Resolved : Constraint
```

`solveConstraints` (in [Core.Unify](https://github.com/idris-lang/Idris2/blob/master/src/Core/Unify.idr)) solves the generated constraints by looking at the `UState`. It is called not only at the end, but also at various points during elaboration, for efficiency.

```haskell
solveConstraints : {auto c : Ref Ctxt Defs} ->
                   {auto u : Ref UST UState} ->
                   UnifyInfo -> (smode : SolveMode) -> Core ()
```

Meta-variables are not substituted in applications to facilitate sharing.

### Unifying constructors (in [Core.Unify](https://github.com/idris-lang/Idris2/blob/master/src/Core/Unify.idr))

Constructors are injective/cancelable: for example, if `Succ x = Succ y` then we can conclude `x = y`. Therefore, we can unify and convert constructors in similar ways: check that the constructors are the same and that their argument lists correspond. If they do not, there is no way to unify them and an error is thrown.

One needs to be careful about scoping though. Because of dependent types, later arguments can depend on earlier arguments, so the order is important.

### Unifying blocked applications (in [Core.Unify](https://github.com/idris-lang/Idris2/blob/master/src/Core/Unify.idr))

`unifyApp` unifies a blocked application with a value:

```haskell
unifyApp : {auto c : Ref Ctxt Defs} ->
           {auto u : Ref UST UState} ->
           {vars : _} ->
           -- swap the order when postponing
           -- (this is to preserve second arg being expected type)
           (swaporder : Bool) ->
           UnifyInfo -> FC -> Env Term vars -> FC ->
           NHead vars -- blocked application head
           -> List (Closure vars) -- blocked arguments
           -> NF vars -- value we're unifying with
           -> Core UnifyResult
```

For example, consider unifying `plus x Z` with `Z`. The blocked application head is `plus`, the blocked arguments are `x` and `Z` (we cannot solve this at this point, so the application is blocked), and the value we're unifying with is `Z`.
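
The three possible answers (*Yes*, *No*, *Yes, but...*), constructor injectivity, and the `plus x Z` example can be put together in a small Haskell sketch. This is a toy model under stated assumptions, not the code in `Core.Unify`: the types `Tm` and `Result` and the functions `andThen` and `unifyToy` are hypothetical, there are no meta-variables or environments, a mismatch is returned as a value rather than thrown, and any non-constructor pair that is not syntactically equal is simply postponed.

```haskell
-- Toy terms (hypothetical): variables, constructor applications, and
-- blocked applications of a function that cannot reduce yet.
data Tm
  = Var String
  | Con String [Tm]       -- e.g. Con "Succ" [x]
  | Blocked String [Tm]   -- e.g. Blocked "plus" [x, Z]
  deriving (Eq, Show)

-- The three outcomes of unification.
data Result
  = Success                 -- "Yes"
  | Mismatch                -- "No"
  | Postponed [(Tm, Tm)]    -- "Yes, but...": constraints left to solve
  deriving (Eq, Show)

-- Combine the results of unifying arguments, left to right.
andThen :: Result -> Result -> Result
andThen Mismatch _ = Mismatch
andThen _ Mismatch = Mismatch
andThen Success r = r
andThen r Success = r
andThen (Postponed a) (Postponed b) = Postponed (a ++ b)

unifyToy :: Tm -> Tm -> Result
-- Injectivity: same constructor, so unify the arguments pairwise.
unifyToy (Con c as) (Con d bs)
  | c == d && length as == length bs =
      foldr andThen Success (zipWith unifyToy as bs)
  | otherwise = Mismatch
-- Anything else: succeed if already equal, otherwise postpone.
unifyToy x y
  | x == y = Success
  | otherwise = Postponed [(x, y)]
```

With `z = Con "Z" []`, unifying `Blocked "plus" [Var "x", z]` with `z` yields `Postponed` carrying that single pair, while `Con "Succ" [z]` against `z` yields `Mismatch`. Returning postponed constraints as data, rather than failing, is what lets elaboration continue and retry them later.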
There are two interesting cases when unifying blocked applications. First, the head of the application may be a meta-variable. Second, the application may already be convertible to the value, in which case unification succeeds even though the application is blocked.

#### Unifying meta-variables (in [Core.Unify](https://github.com/idris-lang/Idris2/blob/master/src/Core/Unify.idr))

See [A tutorial implementation of dynamic pattern unification](https://adam.gundry.co.uk/pub/pattern-unify/) for more details on pattern unification.

```haskell
unifyApp swap mode loc env fc (NMeta n i margs) args tm
    = unifyHole swap mode loc env fc n i margs args tm
```

We need to check that the meta-variable is applied to distinct variables. That is, the arguments `margs` and `args` above have to be distinct variables. If so, we attempt to update the definition of `n` to `n margs args = tm`. This will only succeed if all variables in `tm` occur in `margs,args`.

#### Unifying other blocked applications that are convertible

```haskell
-- If they're already convertible without metavariables, we're done,
-- otherwise postpone until later, constraints for now.
unifyApp False mode loc env fc hd args tm
    = do gam <- get Ctxt
         if !(convert gam env (NApp fc hd args) tm)
            then pure success
            else postponeS True False loc mode "Postponing constraint"
                           env (NApp fc hd args) tm
unifyApp True mode loc env fc hd args tm
    = do gam <- get Ctxt
         if !(convert gam env tm (NApp fc hd args))
            then pure success
            else postponeS True True loc mode "Postponing constraint"
                           env (NApp fc hd args) tm
```

## Unification in `checkExp` (in [TTImp.Elab.Check](https://github.com/idris-lang/Idris2/blob/master/src/TTImp/Elab/Check.idr))

`checkExp` checks whether the type we got for the given term matches the expected type. It takes a `UState`, performs unification when needed, and returns the term and its type.
When there are no constraints left to solve, unification has succeeded and we can try to solve previously unsolved constraints by calling `solveConstraints` (called by `convertWithLazy`).

```haskell
-- Check whether two terms are convertible. May solve metavariables (in Ctxt)
-- in doing so.
-- Returns a list of constraints which need to be solved for the conversion
-- to work; if this is empty, the terms are convertible.
convertWithLazy
        : {vars : _} ->
          {auto c : Ref Ctxt Defs} ->
          {auto u : Ref UST UState} ->
          {auto e : Ref EST (EState vars)} ->
          (withLazy : Bool) -> (precise : Bool) ->
          FC -> ElabInfo -> Env Term vars -> Glued vars -> Glued vars ->
          Core UnifyResult
convertWithLazy withLazy prec fc elabinfo env x y
    = let umode : UnifyInfo
                = case elabMode elabinfo of
                       InLHS _ => inLHS
                       _ => inTermP prec in
          catch
            (do let lazy = !isLazyActive && withLazy
                logGlueNF 5 ("Unifying " ++ show withLazy ++ " "
                             ++ show (elabMode elabinfo)) env x
                logGlueNF 5 "....with" env y
                vs <- if isFromTerm x && isFromTerm y
                         then do xtm <- getTerm x
                                 ytm <- getTerm y
                                 if lazy
                                    then unifyWithLazy umode fc env xtm ytm
                                    else unify umode fc env xtm ytm
                         else do xnf <- getNF x
                                 ynf <- getNF y
                                 if lazy
                                    then unifyWithLazy umode fc env xnf ynf
                                    else unify umode fc env xnf ynf
                when (holesSolved vs) $
                    solveConstraints umode Normal
                pure vs)
            (\err =>
               do defs <- get Ctxt
                  xtm <- getTerm x
                  ytm <- getTerm y
                  -- See if we can improve the error message by
                  -- resolving any more constraints
                  catch (solveConstraints umode Normal)
                        (\err => pure ())
                  -- We need to normalise the known holes before
                  -- throwing because they may no longer be known
                  -- by the time we look at the error
                  defs <- get Ctxt
                  throw !(normaliseErr (WhenUnifying fc env xtm ytm err)))

export
convert : {vars : _} ->
          {auto c : Ref Ctxt Defs} ->
          {auto u : Ref UST UState} ->
          {auto e : Ref EST (EState vars)} ->
          FC -> ElabInfo -> Env Term vars -> Glued vars -> Glued vars ->
          Core UnifyResult
convert = convertWithLazy False False

export
convertP : {vars : _} ->
           {auto c : Ref Ctxt Defs} ->
           {auto u : Ref UST UState} ->
           {auto e : Ref EST (EState vars)} ->
           (precise : Bool) ->
           FC -> ElabInfo -> Env Term vars -> Glued vars -> Glued vars ->
           Core UnifyResult
convertP = convertWithLazy False

-- Check whether the type we got for the given type matches the expected
-- type.
-- Returns the term and its type.
-- This may generate new constraints; if so, the term returned is a constant
-- guarded by the constraints which need to be solved.
export
checkExpP : {vars : _} ->
            {auto c : Ref Ctxt Defs} ->
            {auto u : Ref UST UState} ->
            {auto e : Ref EST (EState vars)} ->
            RigCount -> (precise : Bool) -> ElabInfo -> Env Term vars -> FC ->
            (term : Term vars) ->
            (got : Glued vars) -> (expected : Maybe (Glued vars)) ->
            Core (Term vars, Glued vars)
checkExpP rig prec elabinfo env fc tm got (Just exp)
    = do vs <- convertWithLazy True prec fc elabinfo env got exp
         case (constraints vs) of
              [] => case addLazy vs of
                         NoLazy => do logTerm 5 "Solved" tm
                                      pure (tm, got)
                         AddForce r => do logTerm 5 "Force" tm
                                          logGlue 5 "Got" env got
                                          logGlue 5 "Exp" env exp
                                          pure (TForce fc r tm, exp)
                         AddDelay r => do ty <- getTerm got
                                          logTerm 5 "Delay" tm
                                          pure (TDelay fc r ty tm, exp)
              cs => do logTerm 5 "Not solved" tm
                       defs <- get Ctxt
                       empty <- clearDefs defs
                       cty <- getTerm exp
                       -- we haven't solved it so we're making a guess;
                       -- tm is the guess for the solution under constraints cs
                       ctm <- newConstant fc rig env tm cty cs
                       dumpConstraints 5 False
                       case addLazy vs of
                            NoLazy => pure (ctm, got)
                            AddForce r => pure (TForce fc r tm, exp)
                            AddDelay r => do ty <- getTerm got
                                             pure (TDelay fc r ty tm, exp)
checkExpP rig prec elabinfo env fc tm got Nothing = pure (tm, got)

export
checkExp : {vars : _} ->
           {auto c : Ref Ctxt Defs} ->
           {auto u : Ref UST UState} ->
           {auto e : Ref EST (EState vars)} ->
           RigCount -> ElabInfo -> Env Term vars -> FC ->
           (term : Term vars) ->
           (got : Glued vars) -> (expected : Maybe (Glued vars)) ->
           Core (Term vars, Glued vars)
checkExp rig elabinfo = checkExpP rig (preciseInf elabinfo) elabinfo
```

## Elaborating *Implicits* with `checkTerm` (in [TTImp.Elab.Term](https://github.com/idris-lang/Idris2/blob/master/src/TTImp/Elab/Term.idr))

Where do meta-variables come from? Why do we need unification to solve them? The answer is `Implicit`, as in [TTImp.TTImp](https://github.com/idris-lang/Idris2/blob/master/src/TTImp/TTImp.idr):

```haskell
data RawImp : Type where
     ...
     -- An implicit value, solved by unification, but which will also be
     -- bound (either as a pattern variable or a type variable) if unsolved
     -- at the end of elaborator
     Implicit : FC -> (bindIfUnsolved : Bool) -> RawImp
     -- with-disambiguation
     IWithUnambigNames : FC -> List Name -> RawImp -> RawImp
```

During type checking, when we encounter an `Implicit`, we make a meta-variable with the expected type, and hope that unification will later find a solution for it.

```haskell
checkTerm rig elabinfo nest env (Implicit fc b) (Just gexpty)
    = do nm <- genName "_" -- create a fresh name for the meta-variable
         expty <- getTerm gexpty -- get the expected type
         -- generate a meta-variable for the implicit, with the expected type
         metaval <- metaVar fc rig env nm expty
         -- Add to 'bindIfUnsolved' if 'b' set
         ...
         pure (metaval, gexpty)
```

Implicit arguments in types are handled similarly. For example, in

```haskell
reverse : {a: Type} -> List a -> List a
```

`{a: Type}` is the implicit argument. When checking applications, we first look at the function's type. If there is an implicit argument, we create a meta-variable for it (as above) and continue checking the application.

Idris 2 has explicit pattern variable binding. When `checkTerm` encounters an `IBindVar`, it notes the `name` (of type `String`, the second input) and the expected type. It creates a *pattern* meta-variable for the pattern.

```haskell
data RawImp : Type where
     ...
     -- A name which should be implicitly bound
     IBindVar : FC -> String -> RawImp
```

At the end of elaboration, we pattern bind all the names we noted (and any names which depend on them). This involves sorting variables into dependency order. The swap function in Idris 2 optimizes the naming and keeps track of the order.
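
As an illustration of what "sorting variables into dependency order" means, here is a small Haskell sketch that performs a depth-first topological sort. It is not how Idris 2 implements the step: the type `Deps` and the function `dependencyOrder` are hypothetical, names are plain strings, and the sketch assumes the dependency relation is acyclic (it would loop forever on a cycle).

```haskell
-- Each entry pairs a name with the names its type mentions
-- (hypothetical representation, for illustration only).
type Deps = [(String, [String])]

-- Emit every name after all the names it depends on.
-- Assumes there are no dependency cycles.
dependencyOrder :: Deps -> [String]
dependencyOrder deps = reverse (foldl visit [] (map fst deps))
  where
    -- `done` holds the names emitted so far, most recent first.
    visit done n
      | n `elem` done = done
      | otherwise     = n : foldl visit done (lookupDeps n)
    -- Names without an entry are treated as having no dependencies.
    lookupDeps n = maybe [] id (lookup n deps)
```

For a pattern variable `xs` whose type mentions `n` and `a`, the input `[("xs", ["n", "a"]), ("n", []), ("a", [])]` gives `["n", "a", "xs"]`, so `n` and `a` are bound before `xs`.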