owned this note
owned this note
Published
Linked with GitHub
---
title: TrieMaps that match
description: TrieMaps
---
# Ideas for improvement
* More formal treatment
* Most-specific matching? Needs `lookupLessSpecific` and `lookupMoreSpecific`....
* Finding unifiers
* Doing non-linear matching through deferred equality constraints
* Polytypic derivation of matching lookup à la Hinze?
* Efficient `fromList(With)` by partitioning
* Implemented that, but it's slower. Perhaps It would be faster if we used a mutable array that we'd partition as we go. Somehow.
# In Apr 2022
A bit smaller-sized ideas from issues:
- https://github.com/simonpj/triemap-paper/issues/9: A small refactoring. We talked about it in a past call and ultimately settled on these neat functions:
```hs
viewOcc :: C Var -> Occ
viewBndr :: C Id -> C Int
data Occ
= BVar Int
| FVar Var
| PVar Int
```
- https://github.com/simonpj/triemap-paper/issues/7: matching lookup for singleton maps. Definitely a must-have, and I'm eager to figure it out. Paged out on the moment, though.
Here's what I had on my slides in January (talk at WITS, slides) , in no particular order:
1. Unifying lookup
2. Matching lookup as a generic programming library
3. Efficiency tweaks
4. Do other term indexing structures enjoy similar encodings?
5. Implementation in GHC
6. More formal treatment
7. Deferred non-linear equality constraints
Thoughts:
- (1) could perhaps be achieved in the same prototypical manner as we currently talk about matching lookup. It will also open other cans of worms, so it's perhaps more work than any of us is willing to invest. Perhaps I'll give it a try if I get overly excited.
- Factoring out a `MatchState` was a successful refactoring. I tried to replace it with `UniState` to do unification, but it turned out to be infeasible for a couple of reasons. We should state these in the paper; we've talked about them in our call from 3 May https://hackmd.io/hd8Y3_JnT8egiDtveG5RxQ
- I've dabbled with (2) in the winter. It wasn't very fruitful because our Generic programming libraries aren't well suited for deriving variations of a data type (like a de-Bruijnised Expr). TH would certainly be powerful enough to define a general library, but I don't have the energy to write one. Plus, we wouldn't be able to use it in GHC either because of the no-TH policy.
- Regarding (3), I tweaked perf extensively before our submission. I'd be surprised if there is any other low-hanging fruit. Perhaps we'll see huge wins when we get matching singleton maps working (see above)
- Other term indexing structures (4): Generally interesting idea, but Leonardo's mail from July 3 suggests that dicrimination tree are the gold standard for interactive theorem provers (Coq, Isabelle, Lean). Automated theorem provers seem to use code trees (vampire, z3). I'd have to read up on code trees, but I suspect it isn't so simple to derive a bespoke code tree for a given term data type as simply as it is the case for discriminiation trees and TrieMaps. But perhaps I'll spend half a day thinking about it.
- On the front of (5), Matt's RoughMap work is on my radar, but I'm not that familiar with it. Do you think it makes sense to extend the approach so that we can talk about it in the paper? How much work do you think that would be?
- A more formal treatment (6) would probably take the steam out of the story that is told. I also don't think that a formalisation of what our triemaps should do is worth a penny without a mechanised proof, and I don't intend to write one.
- At the moment, we eagerly discharge non-linear equality constraints, but maybe it's good for the implementation *and* our story in the paper to defer solving (7). It's also a nice step stone to (1) in that full unification requires similar book-keeping. Perhaps worth a try?
- Better solution: Thread around `MatchState`/`UniState`. Same architectural effect, I think, only that we still solve constraints eagerly. That's what we do now.
# Notes
- PatVarMap (ModAlpha Expr) as MatchState doesn't work well, because
- the Expr must be closed anyway (or at least may not capture any bound variables), so the BoundVarEnv could be emptyDBE
- In fact the BoundVarEnv *must* be emptyDBE, otherwise the DeBruijn leveling means we won't equate `\x. x` with the same sub-expression of `\b.\x.x`