
Data-Addressed Structures and Links (DASL)

Some components of the IPFS stack, notably CIDs and IPLD, are either not being adopted because of their complexity or, when they are adopted, are being stringently subsetted. This points to there being value in aligning on a common subset so that disparate projects can reuse shared code and interoperate where meaningful, and so that we can build a wider shared understanding around the foundational bricks of self-certifying protocols.

DASL Overview

DASL is a lean, practical, interoperable subset of IPFS. It is strictly IPFS-compatible and any IPFS implementation should be able to process DASL content without modification.

The purpose of DASL is to address IPFS's curse of optionality. CIDs and IPLD support so many options that the cost of producing an implementation that can interoperate with arbitrary content is staggeringly high, and the test matrix that exercises all the features between any implementation pair is huge. This:

  • Makes implementations costly.
  • Creates interoperability issues.
  • Makes it hard, if not impossible, to ship lightweight code on the web.
  • Makes picking up IPFS challenging, as you need to understand far too much before you can deploy it productively.

DASL can be seen as the Don't Make Me Think IPFS. It's extensible for the day when that is needed, but it has limited complexity so that you can use it but remain focused on what you're building.

Proposal

We don't need a full-fledged standards process for this (thank Eris); instead we can align on the same options and then simply describe this alignment. The proposal is therefore to have a short discussion (maybe even async, though a short sync workshop may help get people on the same page) and, assuming it is successful, simply move on to doing the things that need doing.

Scope of the discussion:

  1. Agreement that this is a thing that's worth doing
  2. Agreement on CID subset parameters (mostly base32 vs base36 and SHA-256 vs BLAKE3)
  3. Agreement on the IPLD subset (DAG-CBOR with tag 42; discussion of whether DAG-JSON is a separate thing or just a debug format)
  4. Let's not discuss the name. Propose a different one if you don't like it, but offline.
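To make item 2 concrete, here is a minimal checker for one plausible outcome of that discussion: CIDv1, base32-lower multibase, the dag-cbor codec, and SHA-256. The multicodec byte values used below are the registered ones, but the chosen subset itself is exactly what this proposal leaves open, and `is_dasl_cid` is a hypothetical helper name, not an existing API.

```python
import base64

def is_dasl_cid(cid: str) -> bool:
    """Sketch: check that a CID string falls in an assumed subset of
    CIDv1 / base32-lower / dag-cbor / sha2-256."""
    if not cid.startswith("b"):          # 'b' = base32-lower multibase prefix
        return False
    body = cid[1:].upper()
    body += "=" * (-len(body) % 8)       # restore RFC 4648 padding
    try:
        raw = base64.b32decode(body)
    except Exception:
        return False
    # All varint fields here fit in one byte since every value is < 0x80.
    return (
        len(raw) == 36
        and raw[0] == 0x01               # CID version 1
        and raw[1] == 0x71               # dag-cbor multicodec
        and raw[2] == 0x12               # sha2-256 multihash code
        and raw[3] == 0x20               # digest length: 32 bytes
    )
```

A checker this small is part of the point: a subset that can be validated in a dozen lines is one that lightweight web code can actually ship.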

Assuming we agree, the next steps will be:

  1. Writing (short but correct) specs for these formats.
  2. Producing shared test suites that implementers agree to use.
  3. Making a small site with the above, pointers to libraries, and some basic introductory docs.
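To give a sense of how short those specs can be: in DAG-CBOR, a link is CBOR tag 42 wrapping the binary CID prefixed with the 0x00 identity-multibase byte. A sketch of that encoding follows; the function name is made up, but the byte values come from the CBOR and DAG-CBOR rules.

```python
def encode_dag_cbor_link(cid_bytes: bytes) -> bytes:
    """Encode an IPLD link: CBOR tag 42 wrapping the binary CID,
    prefixed with the 0x00 identity-multibase byte."""
    payload = b"\x00" + cid_bytes
    if len(payload) < 24:                      # CBOR short byte string
        header = bytes([0x40 + len(payload)])
    elif len(payload) < 256:                   # 0x58 = byte string, 1-byte length
        header = bytes([0x58, len(payload)])
    else:
        raise ValueError("CID unexpectedly large")
    return b"\xd8\x2a" + header + payload      # 0xd8 0x2a = tag(42)
```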

Details:

  • where: online; we don't need face-to-face for this
  • when: 2024Q4 (late November?)
  • who: (this is just indicative)
    • Mosh
    • Bumblefudge
    • Robin
    • Warpfork
    • Mary
    • Someone from Bluesky (?)
    • Why
    • Daniel
    • b5
    • Ilya Siamionau/MarshalX
    • Brooke
    • Hannah
    • Ben Lau
    • David Buchanan

Evidence

For a variety of reasons, there is interest in lifting the Curse of Optionality that has befallen the IPFS stack.

Transport

This proposal does not look at the transport layer as that is more involved. A future workshop could tackle that, if there is sufficient need and community interest.

That said, we already know that we can call the transport equivalent Retrieval of Arbitrary Structures and Links or RASL. And used together, they make up RASL-DASL.

You're welcome.

Comments

  • [b5] This seems like it's mainly just carving out UnixFS, which means this subset doesn't have an answer for dealing with large files. Iroh's approach is to require BLAKE3. The two example libraries don't run into that problem because ATProto data is by its nature small. If you want to handle data bigger than 5 MB and be able to verify it, an agreed-upon solution would go a long way.
  • [b5] That solution doesn't need to get into the transport layer ("how do I move this CID from a to b?"), but just deal with incremental verification ("how can I confirm the first 1MB of this file matches the root hash?")
    • [mosh] Users like Starling want a way to verify not just the data against its own hash, but that a hash made & timestamped in 2020 matches the data they have in hand now.
  • [b5] Iroh's recent focus on connections layer makes it more possible to support something like this.
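To make the incremental-verification idea concrete: a Merkle tree over fixed-size chunks lets you check any one chunk against the root hash with a logarithmic-size proof. The sketch below uses SHA-256 purely for illustration; BLAKE3's actual tree mode (and the Bao format built on it) differs in chunk size, keying, and layout, and all names here are made up.

```python
import hashlib

CHUNK = 1024  # toy chunk size for illustration

def leaf(data: bytes) -> bytes:
    # 0x00 prefix domain-separates leaves from internal nodes
    return hashlib.sha256(b"\x00" + data).digest()

def node(left: bytes, right: bytes) -> bytes:
    return hashlib.sha256(b"\x01" + left + right).digest()

def merkle_root(leaves: list) -> bytes:
    level = list(leaves)
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])          # duplicate an odd tail
        level = [node(level[i], level[i + 1])
                 for i in range(0, len(level), 2)]
    return level[0]

def proof_for(index: int, leaves: list) -> list:
    """Collect [(sibling_hash, sibling_is_left), ...] from leaf to root."""
    level, proof = list(leaves), []
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        sib = index ^ 1
        proof.append((level[sib], sib < index))
        level = [node(level[i], level[i + 1])
                 for i in range(0, len(level), 2)]
        index //= 2
    return proof

def verify_chunk(chunk: bytes, proof: list, root: bytes) -> bool:
    h = leaf(chunk)
    for sibling, sibling_is_left in proof:
        h = node(sibling, h) if sibling_is_left else node(h, sibling)
    return h == root
```

With 1 KiB chunks, a 1 GiB file has 2^20 leaves, so each chunk verifies with a proof of about 20 hashes; that is what makes "confirm the first 1 MB matches the root hash" cheap without fetching the rest.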