# Thesis
The "thin waist" of the "merkle forest" needs a stable, ubiquitous, nearly-universally-agreed-upon stream chunking algorithm, good enough that it won't require a switchover for several decades. "Good" here covers performance/resource usage in terms of:
- computation space/time during processing (RAM/CPU)
- requirements at rest (deduplication potential)
- requirements at transport (overhead for subdag identification/communication)
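To make the three criteria above concrete, here is a toy content-defined chunker built on a "gear" rolling hash (the construction FastCDC popularized). Every name and parameter below (`GEAR`, `MASK`, `MIN_CHUNK`/`MAX_CHUNK`) is an illustrative assumption of mine, not a proposal for the actual algorithm this document argues for:

```python
# A minimal sketch of content-defined chunking with a "gear" rolling hash.
# All constants here are illustrative choices, not a spec.
import random

_rng = random.Random(42)                      # fixed seed: deterministic table
GEAR = [_rng.getrandbits(64) for _ in range(256)]

MASK = (1 << 13) - 1                          # boundary every ~8 KiB on average
MIN_CHUNK, MAX_CHUNK = 2 * 1024, 64 * 1024    # clamp pathological inputs

def chunk(data: bytes) -> list:
    """Split `data` at content-derived boundaries. Because the gear hash
    only "sees" the last 64 bytes (older contributions shift out of the
    64-bit register), an insertion early in a stream disturbs only a
    bounded number of chunks -- the property deduplication relies on."""
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data, 1):
        h = ((h << 1) + GEAR[byte]) & ((1 << 64) - 1)
        size = i - start
        if (size >= MIN_CHUNK and (h & MASK) == 0) or size >= MAX_CHUNK:
            chunks.append(data[start:i])
            start, h = i, 0
    if start < len(data):
        chunks.append(data[start:])
    return chunks
```

The shape of the trade-offs shows up directly: the table and register cover the RAM/CPU criterion, the resynchronizing boundaries cover dedup at rest, and the chunk list is what any subdag-identification scheme would address at transport.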
**Provided** [such a beast exists](https://xkcd.com/2268/) (and I firmly believe it does, similar to Zooko's ["local maximum for hash functions" conjecture](https://twitter.com/zooko/status/835294257888002050)) it would apply simultaneously at the following spots **without** needing to work within the IPFS ecosystem at all, but rather parallel to it:
- A boon for FLOSS mirrors: duplication in packaging archives is rampant
- True "holy-grail" ETAG in the HTTP world
- A block-deduplication-atom for ZFS/BTRFS/whoever-will-listen
## Part 1: how we got here
### 2016 Objective(s):
- Ability to serve both
  - https://kernel.org/pub/linux/kernel/v5.x/ and
  - https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
  utilizing the same backing leaf nodes for everything
- Derive a stable stream hashing function that converges with the above but at the same time is suitable to be included in something like [`rhash`](https://www.mankier.com/1/rhash#Hash_Sums_Options)
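One common way to make a flat `rhash`-style checksum converge with a chunked DAG is to define the stream hash as the merkle root over per-chunk hashes, so the same leaves back both views. A minimal sketch follows; the pairing rule, fixed chunk size, and sha256 are my illustrative assumptions, not the actual layout of the era's toolkit:

```python
# Sketch: a stream hash derived from a merkle tree over chunk hashes.
# Chunk size, hash choice, and pairing rule are illustrative assumptions.
import hashlib

CHUNK = 8192  # fixed-size split for brevity; real boundaries would be content-defined

def leaf_hashes(data: bytes) -> list:
    """Hash each chunk independently -- these are the reusable leaf nodes."""
    return [hashlib.sha256(data[i:i + CHUNK]).digest()
            for i in range(0, len(data), CHUNK)] or [hashlib.sha256(b"").digest()]

def merkle_root(hashes: list) -> bytes:
    """Pairwise-combine digests level by level until one root remains."""
    while len(hashes) > 1:
        hashes = [hashlib.sha256(b"".join(hashes[i:i + 2])).digest()
                  for i in range(0, len(hashes), 2)]
    return hashes[0]

def stream_hash(data: bytes) -> str:
    """A single printable digest of the whole stream, derived from the DAG."""
    return merkle_root(leaf_hashes(data)).hex()
```

The point of the construction: `stream_hash` behaves like any other checksum line in a tool's output, while `leaf_hashes` are exactly the blocks a dedup-aware store would keep once.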
- {{ Insert 30 second demo of the toolkit from that era }}
### Problems identified while doing above
TLDR: If one wants to interoperate with current and *future* web gateways, one is essentially at the mercy of PL. Moreover, it is **really** difficult to advocate for any spec work ahead of time, as "the code is not written yet". This basically means you either:
- dump stuff into the DHT and hope you got it right / that PL will bend to your worldview
- wait (what I've been doing, because the above alternative is a dick move)
Specific issues: [I wrote this list back in December, some stuff may have changed since](https://hackmd.io/EX5uh93eRfuivmfNbZYgsA)
## Part 2: go-ipfs-time
In late January, real work on https://github.com/ipfs/specs/issues/227 started, followed by rapidly diverging expectations of what the end result should be.
- mini-thesis:
  - The time for "experimental interfaces" has passed: islands of stability are **absolutely necessary** at the tooling boundaries, to project a confident "yes, you can use this for your data" at every turn
  - "systemd is an architectural mistake": building ever more intricate, stateful, multi-gigabyte-eating daemons does not help adoption. In other words, this *should not* be written: https://github.com/ipfs/notes/issues/434
- Lean on simple "unix-style" streaming interfaces as much as possible, while allowing extensibility
- "Do one thing and do it well" components
- Keep an eye on runtime (RAM/CPU) performance envelopes
- {{ 5 min demo of what :dagger_knife: does as it is right now }}
## Part 3: where do we go from here
As it stands, :dagger_knife: is the perfect tool to bridge:
- the "we want it fast but don't know much about the scientific method" crowd with
- the "we know information theory inside out, but cannot write real code to save our lives" crowd, with
- the "I just want to mirror my ubuntu archive quickly" crowd.
Sticking points:
- If this were back in my pre-PL days, I would know exactly how to market/run/extend this project
- After joining PL everything became **really** "complicated"
- {{ the remaining 45 mins go towards figuring out what we do next }}