# CID Congress #1: DASL
5 December 2024
Event link: https://lu.ma/jo7wbgqz
These Minutes: https://hackmd.io/PjLqF7ihR_-QaHK5PB4n1g
## Participants / GitHubs / socials:
- mosh / @mishmosh
- robin berjon / @darobin
- bumblefudge / @bumblefudge
- kate sills / @katelynsills
- marcin / lidel
- bryan newbold / bnewbold
- hugo dias / hugomrdias
- Russell / sgtpooki
- Boris / bmann / https://bsky.app/profile/bmann.ca
- Laurens Hof / @laurenshof.online
- Alex / achingbrain
- alice / @aliceisjustplaying / https://bsky.app/profile/aliceweb.xyz
- Tim Caswell / @creationix
- andreas / 3S Studios (made the IPFS plugin for Unreal Engine)
- daniel / @2color
- b5 / @b5
## Reference links:
- [DASL Website](https://dasl.ing/)
- [DASL issues on Github](https://github.com/darobin/dasl.ing/issues)
- [discuss.ipfs.tech thread on Profiling CIDs](https://discuss.ipfs.tech/t/should-we-profile-cids/18507/13)
## Topics to discuss:
1. Quick overview of what we seek to achieve
- Content addressing is a good idea.
- 500,000+ permutations of CIDs (algos, chunking, encodings, etc.) means 2 implementations rarely give you the same CID.
- Deterministic data that is content addressable is also a good thing.
- IPLD also has 500,000+ permutations.
- Make it easy to implement and deploy CIDs and IPLD.
- People are already writing smaller CID/IPLD libraries for reasons.
- Yes to extensibility, no to optionality.
- Design for people who make products and not just for network engineers.
- It should work with JS and in the browser, but not exclusively.
- It should work for any size data.
- We win if web2, 2.5, 3, and 47 all like and use this.
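To make the permutation point concrete, here is a sketch (TypeScript; assumes `@noble/hashes`, with constants from the multicodec table) of how many independent knobs feed into one CID: the same bytes, and even the same digest, produce a different identifier the moment any one knob changes.

```ts
// Sketch: why "the same data" can yield many different CIDs.
import { sha256 } from '@noble/hashes/sha256';

const RAW = 0x55;      // multicodec: raw bytes
const DAG_CBOR = 0x71; // multicodec: dag-cbor
const SHA2_256 = 0x12; // multihash: sha2-256

// base32-lower (RFC 4648, no padding): the multibase DASL blesses ('b' prefix).
const ALPHABET = 'abcdefghijklmnopqrstuvwxyz234567';
function base32(bytes: Uint8Array): string {
  let bits = 0, value = 0, out = '';
  for (const byte of bytes) {
    value = (value << 8) | byte;
    bits += 8;
    while (bits >= 5) { out += ALPHABET[(value >>> (bits - 5)) & 31]; bits -= 5; }
  }
  if (bits > 0) out += ALPHABET[(value << (5 - bits)) & 31];
  return out;
}

// CIDv1 = varint(version) ++ varint(codec) ++ multihash(code, length, digest).
// Every value here fits in one byte, so the varints are trivial.
function cidv1(codec: number, digest: Uint8Array): string {
  return 'b' + base32(new Uint8Array([0x01, codec, SHA2_256, digest.length, ...digest]));
}

const digest = sha256(new TextEncoder().encode('hello'));
console.log(cidv1(RAW, digest));      // one CID for these bytes...
console.log(cidv1(DAG_CBOR, digest)); // ...another CID, same bytes, same hash.
// Multiply by hash functions, CID versions, multibases, and chunking
// strategies, and the combinatorics explode; DASL collapses them to one path.
```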
2. If we can agree on a standard, who plans to support/implement/deploy DASL?
- This is only useful if people plan to support it.
- The label is valuable to the extent that we use it to attract more people.
3. What changes do you want to see?
- Be the change and all that!
4. Large data & chunking: do we need a better plan?
- Keeping specs small and orthogonal is helpful.
- Avoiding pushing too much information from the content to the identifier is a plus.
- We can add more small specs. Small is beautiful.
5. Additional topics (move to parking lot if there isn't enough time)
- (lidel) understand if/how produced CBOR DAGs can be loaded/rendered on existing gateways ([desktop](https://docs.ipfs.tech/install/ipfs-desktop/), [kubo](https://docs.ipfs.tech/install/command-line/), [rainbow](https://github.com/ipfs/rainbow/#readme), [@helia/verified-fetch](https://www.npmjs.com/package/@helia/verified-fetch), [service worker gateway](https://github.com/ipfs/service-worker-gateway#readme)).
- (potential play here, PR to add DASL-specific unit tests and fixtures to [gateway-conformance](https://github.com/ipfs/gateway-conformance) suite to ensure they work and DASL interops with existing ecosystem). (mosh: +++ or even standalone test suite that others could use)
- Other mini specs people might want
- (lidel) How websites built with DASL CIDs work (how to find index.html if the root is a DASL block? are dir-listing-like use cases in scope or not?)
- (lidel) Maybe out there, but a mini spec for how to do chunking in CBOR+RAW, if users need to deal with data that is "too big"?
- (lidel) How errors due to the lack of chunking are handled (implementations must have size limits; if there is no spec, each will pick its own magic number, making interop difficult)
- (lidel) The DAG-CBOR code is reused. How should implementations act if the DAG-CBOR spec (https://ipld.io/specs/codecs/dag-cbor/) is, or becomes, in conflict with dCBOR42 (https://dasl.ing/#dcbor42; https://datatracker.ietf.org/doc/draft-mcnally-deterministic-cbor/ is still a draft; perhaps align on Tag 42 support in `cbor` in addition to `dag-cbor`?)
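On the Tag 42 point above, the wire format is small enough to show inline. In DAG-CBOR (and in dCBOR42, which keeps this rule) a link is CBOR tag 42 wrapping a byte string whose first byte is the 0x00 identity-multibase prefix, followed by the binary CID. A minimal hand-rolled encoder sketch in TypeScript, with byte values taken from the CBOR and DAG-CBOR specs:

```ts
// Encode a CID link as it appears inside a DAG-CBOR / dCBOR42 document.
function encodeCidLink(cidBytes: Uint8Array): Uint8Array {
  const payload = new Uint8Array(1 + cidBytes.length);
  payload[0] = 0x00; // identity multibase prefix, required by dag-cbor
  payload.set(cidBytes, 1);

  // Deterministic CBOR requires shortest-form lengths.
  const header =
    payload.length < 24
      ? [0x40 + payload.length] // byte string, length packed into the type byte
      : [0x58, payload.length]; // byte string, one-byte length (any normal CID)
  // 0xd8 0x2a = CBOR tag(42).
  return new Uint8Array([0xd8, 0x2a, ...header, ...payload]);
}
```

As specced at dasl.ing, dCBOR42 is essentially deterministic CBOR plus this one tag, which is why aligning `cbor` and `dag-cbor` libraries on Tag 42 is the main interop question.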
6. We don't own this any more than others in the community.
- If this remains independent, then we'll figure out some governance when we need it
- If this somehow becomes an official IPFS project/improvement proposal, then the Stewards will own it, but you can become one.
- Either way, if you have opinions, make a PR!
7. How do you want to organise/participate/etc. going forward? This can be super lightweight.
- future meeting zoomed in on IPLD tooling, metadata, `tooBig`, etc?
## Parking Lot (Potential Future Topics of Research and Discussion)
- Better IPLD & CAR tooling and/or docs/education
- How to contribute to docs/education on Bluesky's DASL/IPLD/DAG-CBOR usage?
- [CID Metadata](https://notes.learningproof.xyz/F9MEYzBaQ5aWf8ej-5YcVg?view) (Notes by BF)
## Minutes / notes
- Intros
- michelle: leaner spec, leaner libraries, leaner e2e conformance testing
- robin: would you implement? what changes would change your answer if "no" or "sorta kinda maybe"
- are followup/bridge specs required? if so, which? (thanks to lidel for starting this)
- bryan: works on Bluesky/atproto, ex-Internet Archive; explaining IPLD to people has been a pain; great to free atproto implementers from having to reimplement any of this
1. optimistic about taking simplified/DASLized CIDs and DAG-CBOR to IETF and/or W3C in the future
- Robin: Can be "documenting reality" rather than "designing" a new "protocol" (i.e. experimental RFC, not standards-track/normative RFC)
2. how CAR tho? we defo need CAR CLI tools, etc, it's already our canonical export and wire format
- Robin: do we need to SUBSET CAR, tho? is it kinda OK as-is?
- Bryan: no changes needed here, I just want it "speced in this family"
- one confusion: how many root CIDs to reference? how to differentiate root CIDs from block CIDs? that's been hard to work with and to get devs to work with
- Bryan: Throwing CAR into a BS SDK and/or DASL SDK would be great
- Mosh: Tooling lacking?
- Bryan: a CAR file is still a long way from a tarball, so there's a tooling gap
- Why: note, Bluesky uses CARv1; need libraries
- Boris: Corner-cases, tho: how could CAR-streaming build a cacheable/intermittent-connection-friendly "decentralized CDN" for `tooBig` files?
- BM: very specifically, "how do you do PeerTube on atproto" -> where video blobs live on a PDS and are then synced between PDSes? Cached on the local host on the web? Cached on mobile phones? Shared p2p over local wifi?
- Andreas: UnixFS dependency AND protobufs required to work with CAR files? Why are so many libraries needed? Robin: yeah, could be redone without the UnixFS and protobuf deps; those accrued because CAR was a solution to an IPFS-internal problem with all those deps already on hand
- Why: (in chat) getting rid of the protobuf unixfs stuff is highly desired
- supporting all that is `final` in multiformats alone would make our plugin immense, unreasonable for our use case; we would be fine with just one hash
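Since CAR came up repeatedly, here is the CARv1 wire layout (per the CARv1 spec) with a minimal reader sketch in TypeScript; the fixed CID-prefix shortcut only holds for DASL-blessed CIDv1s, so treat it as illustrative:

```ts
// CARv1 layout: varint(headerLen) ++ dag-cbor header { version: 1, roots: [...] }
// followed by repeated sections: varint(sectionLen) ++ binary CID ++ block bytes.
function readVarint(buf: Uint8Array, offset: number): [value: number, next: number] {
  let value = 0, shift = 0;
  for (;;) {
    const byte = buf[offset++];
    value |= (byte & 0x7f) << shift;
    if ((byte & 0x80) === 0) return [value, offset];
    shift += 7;
  }
}

function* carSections(car: Uint8Array): Generator<{ cid: Uint8Array; block: Uint8Array }> {
  let [headerLen, offset] = readVarint(car, 0);
  offset += headerLen; // skip the dag-cbor header (version + roots) in this sketch
  while (offset < car.length) {
    const [sectionLen, dataStart] = readVarint(car, offset);
    const section = car.subarray(dataStart, dataStart + sectionLen);
    // For DASL-blessed CIDv1s every varint in the CID is one byte, so the CID
    // is a 4-byte prefix plus the digest, whose length is the 4th byte.
    const cidLen = 4 + section[3];
    yield { cid: section.subarray(0, cidLen), block: section.subarray(cidLen) };
    offset = dataStart + sectionLen;
  }
}
```

The root confusion bryan mentions lives in the header's `roots` array: CARv1 permits any number of entries there, and body blocks are not ordered or otherwise marked, so tools differ in their expectations.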
3. blake3 how when?
- blake3 without a BAO/sidecar/merkle-metadata
- the BAO/sidecar stuff could get included in a standardized/universally parseable way if we also bring into the family some kind of DAG-CBOR metadata document format/skeleton
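The no-sidecar baseline for blake3 is tiny; a sketch assuming `@noble/hashes` (which ships a blake3 implementation) and the 0x1e multihash code:

```ts
// A blake3 CID in the DASL family: CIDv1 + raw codec + multihash 0x1e.
// Without BAO or a sidecar, verification is all-or-nothing over the blob.
import { blake3 } from '@noble/hashes/blake3';

const RAW = 0x55;    // multicodec: raw bytes
const BLAKE3 = 0x1e; // multihash: blake3

function blake3RawCid(data: Uint8Array): Uint8Array {
  const digest = blake3(data); // 32-byte output by default
  return new Uint8Array([0x01, RAW, BLAKE3, digest.length, ...digest]);
}
```

Incremental verification is the delta the BAO/metadata-document idea would add back; the chunk-manifest sketch at the end of these notes is one shape that could take.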
4. what is lost in dropping the DAG from the CID?
5. CID determinism?
- [bumblefudge has been taking notes on this issue and DAG config profiling](https://notes.learningproof.xyz/CGZn--pGSmSWPNyY2WAmxA)
- Boris
- Lidel: I am IPFS
- textual intros here:
- bumblefudge: I'm also IPFS Foundation, the IETF guy
- Tim Caswell (Vercel): I currently work on infra at Vercel (not related to CID at all), but used to work on IPLD and Filecoin as a contractor at PL, worked with the dat ecosystem for a while, had many pre-Bluesky talks with Paul, and was Director of Engineering at Daplie (a P2P home-server startup that failed). I'm interested in making CID more efficient/less bloated and in considering how it can support recursive hashes like Caify using off-the-shelf hashes like sha256
- usecase: https://gist.github.com/creationix/bab38c1b979edb53d050692479d3e0b7
- bryan:
- highest value "common SDK" components:
- CBOR library which implements "dCBOR42", aka dag-cbor, which is "just" a profile of CBOR (i.e., this isn't really a new RFC, just a reference to existing stuff)
- CBOR to/from generic data (eg, language-specific dict / map[string]any / etc), and separately CBOR to/from native records/structs (ser/de)
- CID parsing and generation for "blessed" format. having blake3 be compile-time-optional would be nice
- CAR file parsing (probably both all-at-once and streaming?)
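A sketch of the "CID parsing and generation for the blessed format" bullet, assuming DASL's profile (CIDv1; raw or dag-cbor; sha2-256 or blake3 with 32-byte digests) and leaving multibase handling out:

```ts
// Validate a binary CIDv1 against the DASL profile. Constants are from the
// multicodec table; base32 'b'-multibase decoding is assumed to happen first.
const BLESSED_CODECS = new Set([0x55 /* raw */, 0x71 /* dag-cbor */]);
const BLESSED_HASHES = new Set([0x12 /* sha2-256 */, 0x1e /* blake3 */]);

function parseDaslCid(bytes: Uint8Array): { codec: number; hash: number; digest: Uint8Array } {
  // In this profile every varint is a single byte, so parsing is flat.
  if (bytes[0] !== 0x01) throw new Error('not CIDv1');
  const [codec, hash, digestLen] = [bytes[1], bytes[2], bytes[3]];
  if (!BLESSED_CODECS.has(codec)) throw new Error('codec outside the DASL profile');
  if (!BLESSED_HASHES.has(hash)) throw new Error('hash outside the DASL profile');
  if (digestLen !== 32 || bytes.length !== 36) throw new Error('bad digest length');
  return { codec, hash, digest: bytes.subarray(4) };
}
```

Making blake3 compile-time-optional, as bryan suggests, is then just gating the 0x1e entry behind a feature flag.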
- mosh: whether as part of this or separately, what's hard about making that SDK today? What have BS SDK builders stumbled on?
- lidel: 2 problem spaces: data authoring/CID generation/ingress, versus heterogeneous, context-deprived future readers and parsers
- the less blackboxes and multicodecs, the better
- extension points versus end-to-end support: DAG-CBOR/JSON didn't get used for 2 years because the gateways weren't serving it properly/with content negotiation/encoding (see the fetch sketch below)
- interop is driven by test suites that are already at hand-- working with the gateway conformance tests that people use today would go a long way
- (gateway not to be confused with ipfs.io support; it's more about HTTP as a common test interface)
- CAR v1 and v2 both have HTTP streaming issues, and v2 has its index at the end, making range requests expensive
- for DASL have a separate (smaller, simpler) harness just for DASL-only systems (saves a lot of transport complexity), and then reuse fixtures/tests in legacy [gateway-conformance](https://github.com/ipfs/gateway-conformance) to ensure interop
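On the content-negotiation point (the fetch sketch referenced above): per the IPFS HTTP gateway specs, a client selects the DAG-CBOR response format with the Accept header (or `?format=dag-cbor`). The gateway host and CID below are placeholders:

```ts
// Illustrative only: fetch a dag-cbor document by CID from an HTTP gateway.
const cid = 'bafyreib...'; // placeholder DASL dag-cbor CID
const res = await fetch(`https://gateway.example/ipfs/${cid}`, {
  headers: { Accept: 'application/vnd.ipld.dag-cbor' },
});
if (!res.ok) throw new Error(`gateway returned ${res.status}`);
const bytes = new Uint8Array(await res.arrayBuffer());
// In a trustless setup the client now hashes `bytes`, rebuilds the CID, and
// compares it to `cid`; that round-trip is exactly the kind of fixture a
// DASL-only harness could share with gateway-conformance.
```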
- bryan (re: mosh's question above): I think the only confusion about this today is what to call such an SDK: is it an "atproto SDK"? a "partial IPLD" SDK? should folks really be implementing these or using off-the-shelf IPLD/IPFS framework SDKs? having a named and scoped "API surface" like DASL is very helpful for drawing abstraction barriers
- bf: :muscle:
- Boris: local-first is experiencing a Cambrian explosion, and not many people there are using CIDs of any kind; not much standardization/alignment. Recruit there?
- atcute is great, how do we build on that/extend that
- sorting by use cases helps (if you need X, this is the subset of the family you need to worry about)
- Tim: CIDs don't include everything I need per file
- lidel (in chat): iiuc ATProto went in the other direction, using CIDs for LESS: https://atproto.com/specs/blob#blob-metadata (the static prefix is skipped most of the time, and prepended only when humans need to interact with it)
- andreas: CAR files differ across CLI tools and some services (e.g. web3storage)
- Alex: How does this relate to modularity/packageability? why 2 hashes?
- Robin: sensible defaults, maybe that keeps?
- Bryan: blake3 kinda breaks the otherwise minimalist/minimal-deps nature of the rest of this DASL-family/tiny-SDK
- Andreas: uses C++ for Unreal Engine
- b5: why blake3 at all if the rest of DASL is so linked-data focused? What if DASL just had a CID size ceiling?
- movies are a whole thing anyways? JSON APIs don't need incremental verification
- tim: (in chat) my goal is exactly to be able to do something like blake3 in userspace, thanks @b5 for explaining it better than me
- clarity of purpose > global/comprehensive system
- boris: longtail versus specialized use-cases; crispness of scope
- mosh: ben lau's experiences with huge video archives are something we're trying to figure out, whether in or out of DASL
- boris: maybe he just wants shit to just work? could he just use iroh for big files and DASL for linked data? BM: my point here mostly is that Ben (like many) was looking for tools that just worked, and mostly did not want to be involved in "standards"
- mosh: but where is the "small data/big data" cutoff that is needed to make such a tidy separation of concerns?
- Tim (in chat): I just want some way to bridge from lightweight linked data to an arbitrary byte offset in a 4GiB archive file which was split in userspace into many small (128KiB-ish) blocks. I still want a single root hash for the entire doc, but want to be able to address inside it by byte offset (see the sketch at the end of these notes)
- bryan: 2 is vastly different from 1; 2 is helpful for upgrade paths/incremental migration of data?
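Finally, a sketch of the offset addressing Tim describes, with a hypothetical manifest shape (fixed 128 KiB chunks, chunk CIDs in file order). Resolving a byte offset to a block is pure arithmetic, and a single root hash over the manifest (or a blake3-style tree over the chunks) covers the whole file:

```ts
// Hypothetical manifest for a large file split in userspace into fixed-size
// chunks; not a spec, just the shape Tim's use case seems to want.
const CHUNK_SIZE = 128 * 1024;

interface Manifest {
  chunks: string[]; // chunk CIDs, in file order
  totalSize: number;
}

function locate(manifest: Manifest, offset: number): { cid: string; offsetInChunk: number } {
  if (offset < 0 || offset >= manifest.totalSize) throw new RangeError('offset out of bounds');
  const index = Math.floor(offset / CHUNK_SIZE);
  return { cid: manifest.chunks[index], offsetInChunk: offset % CHUNK_SIZE };
}
```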