# Push / Pull / Merge operations
__Problem:__ Pushing to and pulling from base is often very slow: around 8 minutes, against an expected time of under 10 seconds. We want to fix this.
* Phases of a pull:
  * __Git fetch:__ Git operation to fetch the latest commit of the remote
  * __Sync:__ Copy definitions that are present in the source namespace but not present in the destination namespace.
  * __Merge:__ Merge of the source namespace and the destination namespace (this calls `Causal.before`)
  * Anything else??
* Phases of a push:
  * __Git fetch:__ Git operation to fetch the latest commit of the remote (note: this is the same as a `pull`)
  * __Merge:__ Merge the source namespace and destination namespace (this calls `Causal.before`)
  * __Git fetch:__ Apart from a round-trip over the network to the remote git server, this should be a no-op if there are no new commits on the remote since the previous __Git fetch__.
  * __Verify fast-forward:__ Reject unless it's a fast forward (calls `Causal.before` to check).
  * __Sync:__ Copy definitions that are present in the source namespace but not present in the destination namespace.
  * __Git push:__ Git operation to add a local commit and push it to the remote repo.
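
The __Sync__ phase described above is essentially a set difference followed by a copy. A minimal sketch, assuming a codebase can be modelled as a map from definition hash to serialized definition; `Hash`, `Codebase`, `missingFrom` and `sync` are hypothetical names for illustration, not the `FileCodebase` API:

```haskell
import qualified Data.Map.Strict as Map
import Data.Map.Strict (Map)

-- Hypothetical stand-ins: a definition is identified by its hash, and a
-- codebase is modelled as a map from hash to serialized definition.
type Hash = String
type Codebase = Map Hash String

-- Definitions present in the source but missing from the destination.
missingFrom :: Codebase -> Codebase -> Codebase
missingFrom src dest = Map.difference src dest

-- Sync: copy only the missing definitions into the destination.
-- Definitions already in the destination are left untouched.
sync :: Codebase -> Codebase -> Codebase
sync src dest = Map.union dest (missingFrom src dest)
```

Under this model a second sync from the same source has nothing to copy, which is consistent with the near-zero `syncFromDirectory` time when pulling into a repo that already has the definitions.
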
__Scenarios:__
1. `.> pull git@github.com:unisonweb/base:.trunk base` on an empty repo
2. `.> pull git@github.com:unisonweb/base:.trunk base2` on the same repo as 1
3. Some stuff
__Results:__
### `.> pull <base> .base` in empty repo
__base git commit 245f053, 3/11/2020__
Timings are cpu / system, as labelled in the raw results. CFI is the CopyFilterIndex sync.
|base git commit |245f053|CFI|3c928b7|CFI|
|-------- | ----- |---|------ |---|
|**date** |**11-Mar** ||**06-May** ||
|branchFromFiles | 0.03s / 0.03s |0.03s|1.3s / 1.45s ||
|syncFromDirectory | 4.2s / 4.34s |0.15s|343s / 421s ||
| --- LoadDependentsDir | |0.02s|343s / 421s ||
| --- CopyTransitiveDeps| |0.00s|343s / 421s ||
| --- Copy Dependents Index| |0.00s|343s / 421s ||
| --- Copy Type Index | |0.03s|343s / 421s ||
| --- Copy Type Mentions| |0.09s|343s / 421s ||
|merge branch | 0.42s / 1.61s |0.00s|4.5s / 38.66s||
|propagate default patch | 0.0075s / 0.0075s || 0.78s / 1.06s||
### `.> pull <base> .base2` in same (now non-empty) repo
|base git commit | 245f053 |3c928b7 |
|-------- | ------------- |------------ |
|**date** | **11-Mar** |**06-May** |
|branchFromFiles | 0.03s / 0.03s |1.53s / 1.82s |
|syncFromDirectory | 0.00s / 0.00s |0.00s / 0.00s |
|merge branch | 0.39s / 1.03s |3.0s / 4.35s |
|propagate default patch | 0.002s / 0.002s | 0.45s / 0.45s|
It's interesting that just `branchFromFiles` is so slow. Why is that?
* Do some sharing of the sub-branches
* Is the slowness just linear slowdown because we're loading 50x more branch files? We think yes, just because lots of tests and docs were added, and new definitions with docs and tests nested underneath.
* Idea: try the other syncing method (would improve the syncFromDirectory phase)
* Idea: load the sub-branch directly, rather than loading the root (would improve branchFromFiles a bit)
* Idea: keep a namespace cache, keyed by hash, use it to preserve sharing (may help with branchFromFiles)
  * `Util.cache :: (MonadIO m, Ord k) => (k -> m v) -> m (k -> m v)`; just use this in a few places in `FileCodebase`. For the implementation, use either a `TVar (Map k v)` or the [concurrent map from `stm-containers`](https://hackage.haskell.org/package/stm-containers-1.1.0.4/docs/StmContainers-Map.html).
  * Next step may be to prepopulate this cache in the background.
* Idea: in the code that syncs a namespace, are we avoiding visiting the same namespace twice? When crawling through history, the same namespace may occur many times.
* Idea: in `Causal`, put elements in `m e` rather than `e`. It's kind of a big refactor (or maybe not?)
  * Along the same lines, load a `Causal m ()`, store that in the `Branch` as well; various operations on `Branch` could use the `Causal m ()` spine instead.
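
The namespace cache idea above could look roughly like the following. This is a minimal sketch using the `TVar (Map k v)` option, and it assumes the intended shape of `cache` is to wrap a loader function `k -> m v`; it is not the actual `Util.cache` implementation.

```haskell
import Control.Concurrent.STM (atomically, modifyTVar', newTVarIO, readTVarIO)
import Control.Monad.IO.Class (MonadIO, liftIO)
import qualified Data.Map.Strict as Map

-- Wrap an expensive loader so that each key is normally loaded only once.
-- Assumption: two concurrent callers racing on the same key may both run
-- the loader; the second insert simply overwrites the first.
cache :: (MonadIO m, Ord k) => (k -> m v) -> m (k -> m v)
cache load = do
  ref <- liftIO (newTVarIO Map.empty)
  pure $ \k -> do
    known <- liftIO (readTVarIO ref)
    case Map.lookup k known of
      Just v -> pure v -- cache hit: reuse the previously loaded value
      Nothing -> do
        v <- load k -- cache miss: run the real loader
        liftIO (atomically (modifyTVar' ref (Map.insert k v)))
        pure v
```

Keyed by namespace hash, a wrapper like this would return the same in-memory value for repeated loads of the same sub-branch, which is what preserves sharing. The cache in this sketch never evicts anything, so memory use grows with the number of distinct keys.
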
### Raw results
```
.> pull https://github.com/aryairani/base-fork .base
Finished Git fetch in 0.00608s (cpu), 0.481169s (system)
Finished FileCodebase.Common.branchFromFiles in 0.033558s (cpu), 0.034246s (system)
Finished FileCodebase.Common.getRootBranch in 0.033868s (cpu), 0.034554s (system)
Finished Git fetch (sbh) in 0.0339s (cpu), 0.034587s (system)
Finished SyncFromDirectory in 4.210363s (cpu), 4.340794s (system)
Finished Merge Branch in 0.424144s (cpu), 1.616646s (system)
Finished Propagate Default Patch in 0.0075s (cpu), 0.00751s (system)
.> pull https://github.com/aryairani/base-fork .base2
Finished Git fetch in 0.005311s (cpu), 0.496279s (system)
Finished FileCodebase.Common.branchFromFiles in 0.033443s (cpu), 0.033981s (system)
Finished FileCodebase.Common.getRootBranch in 0.03375s (cpu), 0.034285s (system)
Finished Git fetch (sbh) in 0.033784s (cpu), 0.03432s (system)
Finished SyncFromDirectory in 0.000128s (cpu), 0.000144s (system)
Finished Merge Branch in 0.39215s (cpu), 1.026571s (system)
Finished Propagate Default Patch in 0.001895s (cpu), 0.001891s (system)
```
Pulling master:
```
.> pull https://github.com/unisonweb/base .base
Timing Git fetch...
Finished Git fetch in 0.007145s (cpu), 1.50681s (system)
Timing Git fetch (sbh)...
Timing FileCodebase.Common.getRootBranch...
Timing FileCodebase.Common.branchFromFiles...
Finished FileCodebase.Common.branchFromFiles in 1.291687s (cpu), 1.45078s (system)
Finished FileCodebase.Common.getRootBranch in 1.29208s (cpu), 1.451172s (system)
Finished Git fetch (sbh) in 1.292109s (cpu), 1.4512s (system)
Timing SyncFromDirectory...s into local codebase...
Finished SyncFromDirectory in 343.271177s (cpu), 421.425816s (system)
Timing Merge Branch...
Finished Merge Branch in 4.577604s (cpu), 38.657448s (system)
Timing Propagate Default Patch...
Finished Propagate Default Patch in 0.780459s (cpu), 1.062506s (system)
.> pull https://github.com/unisonweb/base .base2
Timing Git fetch...
Finished Git fetch in 0.007562s (cpu), 1.806194s (system)
Timing Git fetch (sbh)...
Timing FileCodebase.Common.getRootBranch...
Timing FileCodebase.Common.branchFromFiles...
Finished FileCodebase.Common.branchFromFiles in 1.530779s (cpu), 1.823564s (system)
Finished FileCodebase.Common.getRootBranch in 1.531172s (cpu), 1.823964s (system)
Finished Git fetch (sbh) in 1.531204s (cpu), 1.823994s (system)
Timing SyncFromDirectory...s into local codebase...
Finished SyncFromDirectory in 0.000127s (cpu), 0.000133s (system)
Timing Merge Branch...
Finished Merge Branch in 3.000088s (cpu), 4.35114s (system)
Timing Propagate Default Patch...
Finished Propagate Default Patch in 0.448015s (cpu), 0.452972s (system)
```
### Raw Results, with CopyFilterIndex sync
Base as of March:
```
Timing FileCodebase.Common.getRootBranch...
Timing FileCodebase.Common.branchFromFiles...
Finished FileCodebase.Common.branchFromFiles in 0.000332s (cpu), 0.000337s (system)
Finished FileCodebase.Common.getRootBranch in 0.000878s (cpu), 0.000894s (system)
⚙️ Processing stanza 1 of 1.Timing Git fetch...
Finished Git fetch in 0.007285s (cpu), 2.045757s (system)
Timing Git fetch (sbh)...
Timing FileCodebase.Common.getRootBranch...
Timing FileCodebase.Common.branchFromFiles...
Finished FileCodebase.Common.branchFromFiles in 0.033141s (cpu), 0.078254s (system)
Finished FileCodebase.Common.getRootBranch in 0.033336s (cpu), 0.078448s (system)
Finished Git fetch (sbh) in 0.033357s (cpu), 0.078472s (system)
Timing SyncFromDirectory...s into local codebase...
Timing Load Dependents Dir...
Finished Load Dependents Dir in 0.023473s (cpu), 0.023509s (system)
Timing Copy Transitive Dependencies...
Finished Copy Transitive Dependencies in 0.000043s (cpu), 0.000044s (system)
Timing Copy Dependents Index...
Finished Copy Dependents Index in 0.000077s (cpu), 0.000078s (system)
Timing Copy Type Index...
Finished Copy Type Index in 0.032337s (cpu), 0.032429s (system)
Timing Copy Type Mentions Index...
Finished Copy Type Mentions Index in 0.090889s (cpu), 0.091452s (system)
Finished SyncFromDirectory in 0.147237s (cpu), 0.147935s (system)
Timing Merge Branch...
Finished Merge Branch in 0.000224s (cpu), 0.000222s (system)
Timing Git fetch...
Finished Git fetch in 0.004898s (cpu), 0.593304s (system)
Timing Git fetch (sbh)...
Timing FileCodebase.Common.getRootBranch...
Timing FileCodebase.Common.branchFromFiles...
Finished FileCodebase.Common.branchFromFiles in 0.030922s (cpu), 0.031982s (system)
Finished FileCodebase.Common.getRootBranch in 0.031356s (cpu), 0.03241s (system)
Finished Git fetch (sbh) in 0.031392s (cpu), 0.032447s (system)
Timing SyncFromDirectory...s into local codebase...
Timing Load Dependents Dir...
Finished Load Dependents Dir in 0.046921s (cpu), 0.049333s (system)
Timing Copy Transitive Dependencies...
Finished Copy Transitive Dependencies in 0.000028s (cpu), 0.000031s (system)
Timing Copy Dependents Index...
Finished Copy Dependents Index in 0.000181s (cpu), 0.000203s (system)
Timing Copy Type Index...
Finished Copy Type Index in 0.049949s (cpu), 0.050617s (system)
Timing Copy Type Mentions Index...
Finished Copy Type Mentions Index in 0.133033s (cpu), 0.138627s (system)
Finished SyncFromDirectory in 0.230755s (cpu), 0.239438s (system)
Timing Merge Branch...
Finished Merge Branch in 0.000165s (cpu), 0.000163s (system)
```
Notes:
* Git clone of base takes about 2-3 seconds
Speculation about possible issues:
* Traversing codebase structure and reserializing
* `Causal.before` was super inefficient; have a patch in \<branch TBD\>.
* `Causal.merge`'s use of `before a b` followed by `before b a` is inefficient; it would be good to pursue both tests fairly, in parallel. Maybe LCA provides exactly that?
  * Say you are computing `merge a b`. If:
    * `lca a b == a` then the merge result is `b`
    * `lca a b == b` then the merge result is `a`
  * `lca a@(merge [a1,a2]) b@(merge [a1,a2,a3])` will be `a1` or `a2`, so it's not equal to `a`, even though `before a b` is true here.
  * What if LCA returns a set of LCAs, all at the same path length? Then `before` just checks that...
  * Seems like we need an n-way merge
  * Laws for merge:
    * `before a (merge a b)`
    * `before b (merge a b)`
  * Given `merge a@(merge [a1,a2]) b@(merge [a1,a2,a3])`, could produce:
    * `merge [a1,a2,a3]` (basically, `b`)
    * `merge [a1,a2,a3,a]` (make `a` an ancestor of the new merge node)
    * `merge [a1,a2,b]` (make `b` an ancestor of the new merge node)
    * `merge [a,b]` (make `a` and `b` ancestors of a new merge node)
* Deserializing
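
The two merge laws can be checked on a toy history model. The sketch below uses a hypothetical `Node` type (an identifier plus parents), not the real `Causal`, and treats `before` as plain ancestor membership, which may differ from how `Causal.before` handles multi-parent merge nodes such as the `merge [a1,a2]` example. It only covers the simple cases (one side contains the other, or the `merge [a,b]` option) and does not settle the n-way merge question.

```haskell
import qualified Data.Set as Set
import Data.Set (Set)

-- Toy stand-in for a causal history node: an identifier plus its parents.
data Node = Node { nodeId :: String, parents :: [Node] }

-- Identifiers of a node and all of its ancestors. The `seen` set ensures
-- each node is expanded once, even when histories share structure.
ancestors :: Node -> Set String
ancestors n0 = go Set.empty [n0]
  where
    go seen [] = seen
    go seen (n : rest)
      | nodeId n `Set.member` seen = go seen rest
      | otherwise = go (Set.insert (nodeId n) seen) (parents n ++ rest)

-- `before a b`: a is in the history of b (reflexive).
before :: Node -> Node -> Bool
before a b = nodeId a `Set.member` ancestors b

-- If one side already contains the other, return the larger side;
-- otherwise create a new node with both as parents.
merge :: Node -> Node -> Node
merge a b
  | before a b = b
  | before b a = a
  | otherwise = Node (nodeId a ++ "+" ++ nodeId b) [a, b]
```

In all three branches both `before a (merge a b)` and `before b (merge a b)` hold. Note that this `merge` still runs `before` in both directions one after the other, so it illustrates the laws rather than the proposed LCA-based speedup.
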