# DASLing WG - Agenda and Notes

## Parking Lot for Future Topics / Dreamcasting

## 18 Dec 2025

### Participants:

- cole, mosh, robin, vmx, achingbrain, danielnorman, iameli, b5, bumblefudge
- Intros from new/lapsed friends:
  - Sebastian, Liccium.app - app for content certification
  - Zicklag, Roomy.chat - just started using DASL
  - Dholmes, Bluesky

### Agenda:

- Range requests shareout (b5)
  - This need is becoming more popular, e.g. DuckDB
  - columnar data stores (e.g. DuckDB) and parquet files --> sparse requests based on just a few headers
  - PR to add range requests to the BDASL spec: https://github.com/darobin/dasl.ing/pull/81
  - Reference implementation: https://www.iroh.computer/proto/iroh-blobs
  - Lots of small comments from Robin, but the general shape is there
  - streaming verification as specified in the BLAKE3 whitepaper is the only scheme in scope for now (no real competition, already supported by some libraries)
  - rounding to the nearest multiple of 1024 bytes is stipulated by the whitepaper, no flexibility there
  - how to verify without a map of how the chunks/ranges compose into the original hashed file?
    - b5: the requestor/client doesn't need to know the "BAO" (map) in advance -- the responder has to return the BAO before verifiable streaming can start
    - tuning doesn't affect the BAO -- any tuning can still verify on the fly/on first complete receipt
  - vmx: working on a more generalized BAO system (working on SHA, for use cases where BLAKE3 is not allowed, e.g. NIST reqs); draft work being published on BAOTree and on the iroh/blobs repo (but slow going because of moving from nongeneric to generic)
  - b5: the spec work is actually retro-spec work, since a ref impl of not just BDASL but a DASL/BDASL gateway that does range requests is already on our github
    - some clients are using range requests in the blobs protocol for DuckDB use cases
  - robin: could all of atproto be a DuckDB? this could query over it if so
  - append-only data could increment the root hash each time enough is appended to change the top-level hash; e.g., event-sourced systems work efficiently with this (zicklag: corner case -- it doesn't have to be strictly append-only, you could allow deletes or updates, it would just trigger a slower root-hash recompute)
  - b5: messing with the middle (security properties in some use cases, but also seq numbers are needed to compute/map in an append-only system); size confirmations using only the last block and its BAO path
  - cole: if anyone hasn't read the [Merkle Search Tree paper](https://inria.hal.science/hal-02303490/document), I'd definitely recommend it for this area
- DASL ideas for C2PA & MP4 (eli) [slides](https://docs.google.com/document/d/1kyv0j9rVlUrQxyMHcOrUPF-Rs4DAst7B3HMHzsG1Yo8/edit?tab=t.0)
  - novel CID mechanism/idea
  - a stream is, almost universally, a sequence of very short mp4 files; these can have enough metadata and signatures embedded in each to verify the stream (and the archive)
  - "livestreaming generalizes to VOD, but not the other way around"; thus starting with streaming
  - rate limiting would kick in if you wrote this live to today's PDS; but still, mental-model-wise, it's good to think of each chunk's data as a superset of a lexicon record
  - ![image](https://hackmd.io/_uploads/ryU4shbQZx.png)
  - what i'm proposing is a novel format for smushing the discrete mp4s into an archive (mp4 doesn't smoothly allow concat, and besides, the metadata embedded in each could get a little redundant, require compressing, etc) -- "reversible muxing"
  - ![image](https://hackmd.io/_uploads/Hy2rohbQWl.png)
  - darobin: "muxle tree" -- is that anything?
  - b5: sounds like you can't change metadata midstream, since you're repeating it for each segment... analogous to my comments above? eli: actually no, all the content/arbitrary metadata is fine to switch from segment to [e.g. 1 sec] segment.
    but the video metadata proper (specific to the stream/device/etc) you need to fix stream-long, both for security and for the demuxing to work
  - CBOR:DRISL :: C2PA-mp4:S2PA? not a quick lift; definitely need funding for the long yak-shaving odyssey required to really make a v1 that's fully valid qua C2PA...
  - political context: Adobe back-to-HW supply chain use cases are years off, but it's a start; lobbying EU govts
  - sebastian: but the EUDI CA process and C2PA are incompatible; EUDI signatures can't ever be fully conformant to both...
  - sebastian: [ISCC](https://core.iscc.codes) uses BLAKE3 btw; ISCC is external (registries of metadata, declarations, etc) rather than embedded, although you can embed ISCC if you're less worried about full conformance with C2PA
  - sebastian: could the metadata be external anyway (to simplify and reduce redundancy)?
  - eli: actually the redundancy of metadata is efficient anyway, because the C2PA mp4 stuff already uses compressible CBOR
  - eli: external is more useful for archives than for streams
  - bf: could be atproto records, non?
  - eli: stripping spotify tracks (common in twitch) breaks the hashes, unless it were a CID on the manifest
  - https://cawg.io/
- Browsers wishlist (mosh)
  - https://discuss.ipfs.tech/t/browsers-standards-work-2026-call-for-community-input/19917/4

## 29 Sept 2025

### Agenda:

- RASL changes
- go-dasl - bnewbold
  - what still needs doing to unblock cole
  - vmx: mediaType, not content-type (duh)
- update the host-hints to query string thing for RASL support
- switch to twice-monthly maintenance mode after october
- robin: playing around with a tiles impl via RASL
- bnewb: CAR idea - CAR file sorted in tree-walking order?
  - b5: ping @lidel ?
  - b5: our blobs impl exists exactly to simplify this problem by doing a wire protocol first
  - bnewb: kind of a profile of CAR; we would handle the fallback of non-conformant files anyway
  - bnewb: currently loading entire CAR files into memory to parse -- simpler to just make the merkle tree/DAG deterministic, for streamability/predictability from the index
  - unrelated: CAR file with blobs at the end? CARs don't currently include media/blob updates; might specify this soon
  - b5: recommend metadata in a block (first in the block list), neater for us
- CIDs in JSON - the $link convention versus the "/": {"bytes": <base64 bytes>} convention

### Reference links:

### Participants / githubs / socials :

- Bryan
- Bumblefudge
- b5
- mosh
- Cole
- Robin

### Minutes

-

## 9 Sept 2025

### Participants / githubs / socials :

- Mosh
- Cole
- Bumblefudge
- b5
- bnewbold
- dig
- ramfox
- Robin

### Reference links:

### Agenda:

- progress reports on implementations
- languages for FFI/binding?

### Minutes:

- mosh: north-star goal/top priority = 2 complete libraries that users love
- rust-dasl:
  - b5: performance-oriented questions motivating the spec minutiae in the [notion]()
  - picking languages to FFI/bind - long tail?
  - dig: what are the users to focus on? someone wanna give me user stories, or am i just building against APIs?
  - cole: re DRISL, i've been modeling it as a JSON/CBOR encoder/translator, thinking less about use cases and more about generic "any JSON <> any CBOR"
  - dig: but in an FFI context that approach might break down, since different languages or YAML files or other usecase-specific constraints vary widely... benchmarking against a sample (big?) dataset and use case might be more objective cross-lang
  - mosh: bryan was thinking of providing such a BS dataset
  - b5: eli (stream.place) is working on the go side, and that gives us a good use case for "huge raw data"; sebastian (mobile app surface) is good for a cross-lang/cross-context use case
  - bryan: cole's approach kind of implies using each language's idiomatic translation (marshal for go, unwrap for rust) as the basis; in C or C++ maybe it makes sense to provide types?
  - robin: what's "idiomatic" for FFIs, tho?
  - dig: the in-memory model of each language is what you have to assess for FFI design; the big variation is how _expensive_ in-memory typing is in each language, if you're trying to be performant in each; i actually would prefer having both use cases and a shortlist/stack-ranked list of languages
  - bryan: maybe it's more useful for open-source/unknown future contributors to make ONE FFI and document your design process and steps really well, so that others could do this for additional languages?
  - bryan: mackuba comes to mind (has found a lot of rough edges in Ruby handling AT DAG-CBOR with a standard CBOR parser, for lack of a dedicated one)
  - b5: rudy might be good to talk to as well, for hardening the core rust library (before tackling the FFI stuff)
  - [clinton](https://bsky.app/profile/torrho.com), who works with rsky, has also done some big-data stuff with multiple libraries
  - b5: could we get a huge dataset to benchmark, bry? bryan: nah, historical archives contain content that may have been deleted; highly recommend just using synthetic/dummy data or developing against the firehose (what i always do); random CAR files as a microbenchmark
  - b5: whatever it is, let's just ALL use any CAR file of any size
  - robin: ok to use my profile for the microbenchmark
  - b5: so, consensus to focus on the Rust side first (over FFI) and think about FFI use cases in parallel (to be ready for when dig's happy with the rust core)?
- bryan<>b5 to coordinate on user stories and target languages in parallel?
- bryan: another idea: did:plc stuff might be a narrower user story to add to the list; not just a CAR file dump of a whole repo, but rather a DID doc history check (fetch all DID docs as JSON, convert all to CBOR, check sigs, return true/false?)
  - b5: consensus on that?
  - bryan: it's pretty clean, compared to the firehose, which has every possible error or invalid data mixed in
- bryan: additional user stories? maybe a non-bluesky one?
  - mosh: geospatial might turn up something?
  - mosh: general read/write needs [ex: user with a pds-ls idea](https://bsky.app/profile/hdevalence.bsky.social/post/3lwu6oozcj22l)
- bryan: coming back to the marshal/unmarshal thing and JSON v2, i think one useful thing would be to have error handling sensitive to unknown fields; cole's implementation seems to mention in the package notes that there might be remarshalling/fit-to-struct mismatches (i.e. error on extra fields, or an env var for what to do with extra/uncertain-mappability fields)
  - cole: i think we might already have something like that in the works for MASL any purposes
  - dig: ignore is the default in serde (and rust generally) when there are extra bytes somewhere... that's what i'd do on the rust side
  - bryan: doesn't serde have a "deserialize to a map" option? dig: yeah, there's a primitive type for that, used sometimes as an alternate fallback or for testing
- bryan: sidenote: shrinking the dependency tree (by switching to go-dasl) could get a CLI into debian packaging!
  - cole: is blake3 a bridge too far? bryan: nbd, a 4th dependency won't kill you, and the package you're replacing has like 55...
- CID Congress in 8 days - can we do partial/WiP demos then?
  - b5: happy to demo
- minor topics
  - bryan: unrelated topic - unicode non-UTF-8 [corner cases](https://www.tbray.org/ongoing/When/202x/2025/08/14/RFC9839)? should crazy unicode be allowed in values (or even just keys)?
    - robin: start with an issue? i can see why bsky might like it, but should every DASL library have to check every string?
    - bryan: but is there an interop or nondeterminism reason NOT to address it? some languages already do this in string handling...
    - cole: CBOR already requires your strings to be UTF-8, so the libraries already barf on non-UTF-8 keys in most languages...
    - bryan: i'm not sure CBOR does the whole thing about preventing URL spoofs; punycode normalization?
    - cole: i feel like it's too opinionated for the cbor layer to be forcing this stuff
  - cole: [NaN](https://github.com/darobin/dasl.ing/issues/59)? robin: yeah, we'll align the draft RFC and the DRISL spec, and include all this
  - cole: i have a branch not encoding the CID lengths as we [discussed](https://github.com/darobin/dasl.ing/issues/45), can merge at the last minute
  - cole: same with the [RASL query string](https://github.com/darobin/dasl.ing/issues/37), might have time at the end
  - bryan: if bsky could contribute a basic CAR implementation, could you review it and merge it in? cole: could try
- Next meeting moved from Sept 22 (too many people travelling) to Sept 29
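The unicode corner cases bryan raises can be checked mechanically. Here is a hedged Python sketch (a hypothetical helper, not part of any DASL library) that flags the code-point categories RFC 9839 treats as problematic -- surrogates, legacy controls (C0 other than tab/LF/CR, plus DEL and the C1 range), and noncharacters:

```python
def is_noncharacter(cp: int) -> bool:
    """Unicode noncharacters: U+FDD0..U+FDEF and the last two
    code points (xxFFFE/xxFFFF) of every plane."""
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE

def problematic_code_points(s: str):
    """Return (index, codepoint) pairs for code points that RFC 9839
    flags as problematic. Illustrative only: a real DASL library might
    reject, replace, or merely warn on these."""
    bad = []
    for i, ch in enumerate(s):
        cp = ord(ch)
        if 0xD800 <= cp <= 0xDFFF:                 # surrogate
            bad.append((i, cp))
        elif cp < 0x20 and ch not in "\t\n\r":     # C0 control (tab/LF/CR allowed)
            bad.append((i, cp))
        elif 0x7F <= cp <= 0x9F:                   # DEL + C1 controls
            bad.append((i, cp))
        elif is_noncharacter(cp):                  # noncharacter
            bad.append((i, cp))
    return bad
```

Whether every DASL library should run a check like this on every string -- and whether it rejects, replaces, or warns -- is exactly the open question in the discussion above.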