# Qri x PL ARG Running Notes
## July 15th, 2021
* Question for why: What do you need to version?
* Fission WNFS for IPFS architects
* Collaborating on VCS
### Notes:
* going to build a tool called BARGE
* SSH into my box
* get barge
* tell barge to track a directory on my internet box
* I want it to feel like git. I want to watch as things are added, and how things are changed
* git hashes whole files, which breaks down when we have, say, TBs worth of files
* Target audience
* people who have a _bunch_ of data sitting on hard drives
* not necessarily structured data, but versioned
* commits?
* Some sort of structure that helps deal with logs
* What's important to track? What doesn't matter to track?
* do copy-on-write filesystems expose useful information here?
* for example: we're using Btrfs
---
_writeup:_
# So, you wanna Version a MerkleDAG?
Designing a decentralized VCS should start by staking out the smallest possible MVP, then adding functionality with _extreme_ caution. Each added layer of functionality takes a highlighter to the parts of the web3 stack that aren't really figured out yet, namely (non-blockchain) replication, identity, access control, and encryption.
Fewer capabilities means intersecting with fewer of these large areas of active research. With that said, you can get a _long_ way through careful design of product requirements: 99.9% of the world has managed to live life without needing interactive rebasing.
## Terms
| name | description |
| -------- | ----------- |
| Version | A merkledag with a "Previous" field that is either "" (origin) or the hash of the prior version |
| History | A set of one or more branches |
| Branch | A continuous chain of versions beginning with an origin ("forks" are branches) |
| User | A person. Users can have multiple keypairs. |
| Keypair | An authentication credential |
| Log | An aggregation of one or more histories, stored externally to a version |
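To pin down the "Version" shape from the table, here's a minimal sketch in Go. Field names (`Previous`, `Root`, `Author`) are illustrative, not Qri's actual schema:

```go
package version

// Version is a minimal sketch of the shape described in the table
// above. Field names are illustrative, not Qri's actual schema.
type Version struct {
	// Previous is "" for an origin version, otherwise the hash of
	// the prior version in this branch.
	Previous string
	// Root is the hash of the content tree this version captures.
	Root string
	// Author identifies the keypair that produced this version.
	Author string
}

// IsOrigin reports whether v starts a branch.
func (v Version) IsOrigin() bool { return v.Previous == "" }
```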
## Key Questions to Ask When Designing a Versioning System
### Will only one user edit a history?
Single-user, "previous-field-in-DAG" version histories are a great starting point. If you can end there, everything is less complicated.
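As a sense of just how uncomplicated: traversing a single-user history is a loop. A hedged sketch, reusing the `Version` type above; `fetch` is a stand-in for whatever resolves a hash to a version (an IPFS DAG get, a local blockstore lookup), not a real API:

```go
import "fmt"

// Walk visits a linear, single-user history from head back to its
// origin by following Previous links.
func Walk(head string, fetch func(hash string) (Version, error)) error {
	for hash := head; hash != ""; {
		v, err := fetch(hash)
		if err != nil {
			return err
		}
		fmt.Printf("%s (origin: %t)\n", hash, v.IsOrigin())
		hash = v.Previous
	}
	return nil
}
```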
### If multiple users are acting on a history, can a permissionless approach work?
If not, you need an identity system to build on top of.
### Branches, forks, both?
In permissionless environments branching & forking are the same thing. In Qri there are no user-facing "branches", only single-history collaboration and forks of histories into their own namespaces.
### What merge semantics do you need, if any? Is your system dependent on sub-version merges, or can you get away with whole-version selection?
In git the HEAD state is (kinda) a union of prior versions. In Qri the HEAD state is the root hash of the latest commit.
Generally, thinking about merge semantics should be a process of enumerating what constitutes a merge conflict that must be surfaced to the user, then keeping that list as small as possible by designing constraints.
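One constraint that keeps the conflict list at exactly one entry: allow only fast-forward merges. A hedged sketch of whole-version selection under that constraint, reusing `Version` and `fetch` from above; `Merge`, `isAncestor`, and `ErrConflict` are hypothetical names, not Qri's actual merge code:

```go
import "errors"

// ErrConflict is the only conflict this sketch surfaces: two heads
// where neither is an ancestor of the other.
var ErrConflict = errors.New("merge conflict: histories have diverged")

// Merge implements whole-version selection: if one head descends
// from the other, the descendant wins (a fast-forward). Everything
// else is surfaced to the user as a conflict.
func Merge(a, b string, fetch func(string) (Version, error)) (string, error) {
	aAncestorOfB, err := isAncestor(a, b, fetch)
	if err != nil {
		return "", err
	}
	if aAncestorOfB {
		return b, nil
	}
	bAncestorOfA, err := isAncestor(b, a, fetch)
	if err != nil {
		return "", err
	}
	if bAncestorOfA {
		return a, nil
	}
	return "", ErrConflict
}

// isAncestor walks back from head looking for target.
func isAncestor(target, head string, fetch func(string) (Version, error)) (bool, error) {
	for hash := head; hash != ""; {
		if hash == target {
			return true, nil
		}
		v, err := fetch(hash)
		if err != nil {
			return false, err
		}
		hash = v.Previous
	}
	return false, nil
}
```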
### Do your users *need* to manually edit history?
Avoid this at all costs. Very few use cases can justify the engineering overhead. Martin Kleppmann's work on CRDTs has matured to the point that we should avoid surfacing branches to users at all.
### Is having data for only part of a history a valid state?
If so, you'll need a protocol to replicate history logs. This property is useful when working with large amounts of data.
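The split this implies: the log (the chain of versions) replicates fully, while the data each version points at may not be present. A hedged sketch of auditing that state, reusing the pieces above; `Missing` and `has` are hypothetical, with `has` standing in for a blockstore presence check:

```go
// Missing walks a replicated log from head and reports versions
// whose underlying data isn't held locally. The chain of Versions
// (the log) is assumed to be fully replicated even when version
// data is not.
func Missing(head string, fetch func(string) (Version, error), has func(root string) bool) ([]string, error) {
	var missing []string
	for hash := head; hash != ""; {
		v, err := fetch(hash)
		if err != nil {
			return nil, err
		}
		if !has(v.Root) {
			missing = append(missing, hash)
		}
		hash = v.Previous
	}
	return missing, nil
}
```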
### Which aspects of the system, if any, need access control?
Is the aggregated history sensitive? Data? Both? This brings encryption overhead.
## July 1st, 2021
Agenda
* Estuary Feedback
* Qri Dataset Preview page demo
* How can Qri help ARG / Filecoin?
* Near-term focus on getting preview pages up on qri.cloud
* moar datasets?
* Estuary-Backed IPFS-Application Subnets
* ~~Application-level connections to an SQLite-backed IPFS node?~~
* action items
### Estuary Feedback
Overall the experience was great. I've used both Powergate (hosted) and Estuary. I've found Estuary does _only_ the things I want it to, and does them well. The small API is _really_ nice. Less to learn is better, and the focus on surfacing the benefits of Filecoin while making as many decisions as possible on my behalf is liberating. It gets much closer to the S3 experience.
Working at this scale of data (~14 GiB files) is right in the "sweet spot" where it feels meaningful & the time things take to transfer is "worth it". I expect 14 GiB file transfers to be slow. I'd be keen to see what the upper and lower bounds of this sweet spot are.
Using API calls to add via CID worked really well. Both my local go-ipfs node and Estuary handled parallel transfer of 10 x 14 GiB datasets without a hitch. Coming from the world of IPFS, I'm not surprised, but this is a place where IPFS/Estuary really does seem to outshine comparable centralized services. I think there would be a great opportunity in spinning out an "enterprise" solution for ingesting to Estuary that spins up an IPFS node onsite & facilitates transfer via a fingerprinted swarm. I can think of a number of projects that would benefit from such an architecture.
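For reference, the add-via-CID flow boils down to one authenticated POST. A hedged Go sketch; the `/content/add-ipfs` endpoint and `{name, root}` payload are recalled from Estuary's docs at the time of writing, so treat them as assumptions and verify against the current API reference:

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"os"
)

// addByCID asks Estuary to pin an existing CID instead of
// re-uploading bytes. Endpoint and payload shape are assumptions;
// check Estuary's API reference before relying on them.
func addByCID(apiKey, name, cid string) error {
	body, err := json.Marshal(map[string]string{"name": name, "root": cid})
	if err != nil {
		return err
	}
	req, err := http.NewRequest("POST", "https://api.estuary.tech/content/add-ipfs", bytes.NewReader(body))
	if err != nil {
		return err
	}
	req.Header.Set("Authorization", "Bearer "+apiKey)
	req.Header.Set("Content-Type", "application/json")
	res, err := http.DefaultClient.Do(req)
	if err != nil {
		return err
	}
	defer res.Body.Close()
	if res.StatusCode != http.StatusOK {
		return fmt.Errorf("estuary: unexpected status %s", res.Status)
	}
	return nil
}

func main() {
	// "bafy..." is a placeholder CID, not a real dataset.
	if err := addByCID(os.Getenv("ESTUARY_API_KEY"), "dataset-1", "bafy..."); err != nil {
		fmt.Fprintln(os.Stderr, err)
	}
}
```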
My biggest question now is when I'm going to wear out my welcome. What are the resource limits on my account? How much of your FIL / money am I spending?
Also, um, what happens when deals expire in a year?
#### Wishlist
Stuff that would have helped:
1. **Error on double-CID adding, require flag to proceed.**
I accidentally added the same CID twice (re-ran the wrong shell script). I expected Estuary to error if I tried to make a second deal with the same CID. It feels like this error could be bypassed with a flag documented in the returned error message.
1. **Grace period before Filecoin deal making.**
Generally, mistakes with Estuary feel _very_ permanent. It's really made me slow down & triple-check stuff, which I guess is good, but feels pretty unforgiving. That said, I do spend time reloading the list view, so I'd love a "grace period" where I can hit the "oh shit, I didn't mean that" button. Similar to Gmail's "undo send" feature.
1. **Incoming / Requested CIDs in list view**
It'd be really nice to see CIDs I've POSTed via API calls or the UI show up in list view with an "in progress" indicator.
1. **Editable pin labels**
I'm pretty sure this should be doable. Pin labels or Estuary "names" are metadata, right? If so, I'd deeply appreciate being able to edit them.
1. **Filecoin deal status / summary in list view**
I spend the most time on the "list" screen; it'd be great to get an indicator there that beckons me to the deals page.
### Qri Dataset Preview page demo
### How can Qri help ARG / Filecoin?
* Is this archiving project helping Filecoin get to where it wants to go?
* If we got these preview pages up with a "hey go get this on the dweb" link for folks, is that a good thing, or should we be waiting for faster retrieval from Filecoin?
### Estuary-Backed IPFS-Application Subnets
[Figjam](https://www.figma.com/file/0O9jWzyE4ZNXDuxGWLdsaO/Qri-x-ARG?node-id=0%3A1)
### ~~Application-level connections to an SQLite-backed IPFS node?~~
_skipped for time_
We've written the "datastore" pattern enough times on our side to realize Qri would benefit _heavily_ from just storing all application state in SQLite.
We generally run an IPFS node in-process. It'd be amazing to share an SQLite connection where an application has read-only access to IPFS-owned tables, and write access to application-level tables.
In this world we'd be able to write an application that can do:
* cross-cid block diffing & storage footprinting
* generating manifests
* generating CAR archives
* transfer indicators
* peer connection display in-app
This definitely isn't on a roadmap, but I figured I'd ask if such a concept even makes sense. Faster, richer (readonly) blockstore interaction would be really useful in application space.
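To make the shape of the ask concrete, a hedged sketch of the application side. Opening read-only via `mode=ro` is standard SQLite; the `ipfs-node.db` path and the `blocks(cid, size)` table are assumptions about what an SQLite-backed IPFS node might expose, not a real go-ipfs feature:

```go
package main

import (
	"database/sql"
	"fmt"

	_ "github.com/mattn/go-sqlite3" // SQLite driver
)

func main() {
	// Open the (hypothetical) node-owned database read-only, so the
	// application can never corrupt IPFS-owned tables.
	db, err := sql.Open("sqlite3", "file:ipfs-node.db?mode=ro")
	if err != nil {
		panic(err)
	}
	defer db.Close()

	// Example application-side query: per-block storage accounting,
	// the kind of cross-CID footprinting mentioned above. The
	// `blocks` table is an assumed schema.
	rows, err := db.Query(`SELECT cid, size FROM blocks LIMIT 10`)
	if err != nil {
		panic(err)
	}
	defer rows.Close()
	for rows.Next() {
		var cid string
		var size int64
		if err := rows.Scan(&cid, &size); err != nil {
			panic(err)
		}
		fmt.Println(cid, size)
	}
}
```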
### Action Items
* [x] b5: write a doc on the nitty-gritty details of versioning
* [ ] why: have a think on how we'd do keypair-backed authentication.
* Chat next Tuesday-ish