# CID Congress #2.5
6-6-2025
https://hackmd.io/sFWmSe9ITBGzWoC7TrC4_w
i like big blocks
- intros
- [gordon](https://github.com/gordonbrander) - SZDT - UCAN-like receipt when throwing some data in a CAR file
- domain-locked trust model; need way to notarize/mitigate single-points of failure
- use-case - trump deleting science data (internet archive and safeguarding research projects); whatever they don't grab and notarize we'll end up downloading from sketcho.ru/warez; archives too big for memory
- SZDT - cramming core claims into header of car file
- what's gonna be simple enough for rando scientists with shit to do
- there's a branch doing this with vanilla CBOR sequences instead of car files;
- mosh: tools/libs?
- g: doing this in rust which is partly cuz npm and py are so chaotic and overwhelming (too many options); using serde-ipld-dag-cbor (https://docs.rs/crate/serde_ipld_dagcbor/latest)
- wrote a custom car library because iroh & other car libraries throw out additional data in metadata header. spec is ambiguous about it, so that's understandable [but i need it]. talked with robin about putting values in that header.
- varint implementation in ipld didn't tell you how many bytes it consumed, which you need for random access. (i'm sad about this decision but i understand it.)
- cbor takes care of blake3 for me.
- would love some interop with extended bluesky eco
- at my previous startup we ran kubo nodes to supply people
- lots of papercuts - self-certifying data over http
- overall i'm able to get stuff done, it's just some papercuts
- robin: what would be helpful for us to research or specify? Bluesky CBOR-concat issue, for example (they're not on the CBOR stream train just yet)
- gordon: warc and web-packages (yasskin), IWA (cbor-array way of packaging web apps for sandboxes); in theory CBOR can stream but few parsers and tools actually let you, forcing you to load it into memory, which is a very long sequence of lengths and bytes and lengths and bytes...
- archival workflows often benefit from the appendability of TAR files (accrue over time)
- g: random access into archives (BAOs) needed - see the [BAO spec](https://github.com/oconnor663/bao/blob/master/docs/spec.md)
- mosh: range requests
- dietrich: advising FF and flickr fdn on some huge dataset projects
- redsolver: content-addressing for a long time, last 2 years looking at firehose archives on bsky/atp
- clinton: I am in a noisy place but intro: I work on BlackSky with Rudy
- rudy: Rsky reimpl of atp (PDS + relay + labeler + app view); full archival relay that has full memory of firehose in car
- helping Shift Collective on HistoryPin project (FF-grantees); summer institute are experimenting with decent. storage (moving off of gdrive for community archiving work)
- rsky-satnav - car viewer (user-friendly enough for nontechnical bsky users understanding their downloads)
- we use car to troubleshoot and test interop across indie implementations
- note: blobstore done separately (car file exports don't include them)
- I'm new to the web archiving field, but was at the opening to the WARC School, learning how that format works
- manu: libp2p (primarily py) working on multiaddr - http gateway
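Two of the papercuts above - varint decoders that don't report how many bytes they consumed, and the index you need before range requests into an archive make sense - come down to the same few lines. A hypothetical sketch (the helper names `decode_uvarint` and `index_frames` are made up here, not from serde_ipld_dagcbor, iroh, or any library mentioned):

```python
def decode_uvarint(buf: bytes, offset: int = 0) -> tuple[int, int]:
    """Decode a multiformats-style unsigned LEB128 varint.

    Returns (value, bytes_consumed) - the second value is the detail
    the note above says upstream decoders omit, and it's what lets a
    caller keep walking the buffer for random access.
    """
    value = 0
    shift = 0
    for i, b in enumerate(buf[offset:]):
        value |= (b & 0x7F) << shift
        if b & 0x80 == 0:  # high bit clear: last byte of the varint
            return value, i + 1
        shift += 7
        if shift > 63:
            raise ValueError("varint too long")
    raise ValueError("truncated varint")


def index_frames(buf: bytes) -> list[tuple[int, int]]:
    """Walk varint-length-prefixed frames (CAR-style framing) and
    record (absolute_offset, length) of each frame body, without
    touching the bodies themselves."""
    index = []
    pos = 0
    while pos < len(buf):
        length, n = decode_uvarint(buf, pos)
        index.append((pos + n, length))
        pos += n + length
    return index
```

Each `(off, ln)` entry maps directly to an HTTP `Range: bytes=off-(off+ln-1)` header, so a client can pull exactly one block out of a large archive.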
## Opportunities
- map the full workflow needed to easily get subsets of data (index / search / range request)
- Current work in this realm:
- Ben Lau @ Hypha/Starling working on a CAR indexing tool for Singularity
- BAOs https://github.com/oconnor663/bao/blob/master/docs/spec.md
- ATProto blob data of bsky not in the export CAR - standardize/make tooling for blob k/v handling?
- robin: how would it work if gordon wanted to give his users a viewer for SZDT car files? how could robin (rsky?) adapt satnav to SZDT cars?
- gordon: magnetize started as a warc converter, initially; IA uses WARC but not much of a tooling ecosystem, can't do random access into, etc. if SZDT went mainstream, i'd love some kind of backward compatibility (e.g., ingress WARCs and then export an authenticated CAR)
- as for inlining blobs and bytestreams into CARs, i'd love that, cuz then what SZDT outputs could be "valid bsky cars" and be importable as bsky data, stored wherever cars get stored, etc
- mosh: //FROST - they just have been spitballing a lexicon for using atproto to distribute structured data (rudy: they are not moving forward with this for now) :sob:
- redsolver - big archives (.rkyv's) of atproto history
- s5 - c-a storage network (some parts compatible with ipfs some not) - blake3 blobs, similar to iroh
- demo of youtube archiver, includes as much metadata as possible (translations, comments, etc)
- all CIDs are just blake3 hashes
- Hot take: Specific formats for video archives are nice, but harder to work with long-term. Maybe a simple directory structure, and then extended metadata like media type etc. Skew towards a directory reader. Currently using message-tag (??) but want to migrate to CBOR to be more ATProto compatible.
- also working on content-addressed database - to build an indexer for bluesky
- support for unlimited block sizes - redsolver is not aware of any impls that have this
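On the blob k/v tooling question above, the minimal shape is a store keyed by the hash of the content itself, so reads are self-verifying. A toy sketch (sha256 hex digests here are a stand-in for real CID keys, purely to show the put/get shape without multiformats dependencies; `BlobStore` is a made-up name):

```python
import hashlib


class BlobStore:
    """Toy content-addressed blob store: key = sha256 hex digest of the bytes."""

    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        key = hashlib.sha256(data).hexdigest()
        self._blobs[key] = data  # idempotent: same bytes always yield same key
        return key

    def get(self, key: str) -> bytes:
        data = self._blobs[key]
        # self-verifying read: the key re-derives from the content
        if hashlib.sha256(data).hexdigest() != key:
            raise ValueError("blob corrupted")
        return data
```

A real tool would back this with disk or object storage and key on CIDs, but the contract - put returns the key, get re-verifies it - is the part worth standardizing.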