Improving CID Advertising and Routing

# Improving CID Advertising and Routing

- you are here: https://ipfs.fyi/hashberg-notes
- schedule: https://ipfs.fyi/hashberg-schedule
- session-specific [TLDRaw board](https://www.tldraw.com/f/J7FXgqo2GW9lJ-BbgiTYC)
- Facilitator(s): @BUMBLEFUDGE, mosh
- Scribe(s):
- In attendance (& quotable):
    - Bumblefudge
    - Will Scott
    - b5

## Context

## Notes (scribed)

[will]:

- all of the hash table describing who has what should fit within a single rack
- plan was to have caches all over the world, this was at the time of
- What does it mean to actually get to federation:
    - baseline, just run another instance, it'll catch up, and then we point to that instance as a
    - in this "v0" iteration the federation is at the governance layer
- There is a path where we can remove the human trust part
    - we'd have snapshots & a djstra clock of the known publishers
    - this would get us to consistent snapshots across the instances
- [riba] Can you give a common definition of federation?
- [will] Bluesky is a good metaphor: multiple full replicas run by different parties, all with their own independent failure modes
    - multiple operators, each running a full instance, and clients can go to different
    - In this world, the worst that happens is additional latency. "Bad operators" end in "well, I guess that didn't work, I have to go to another one"
- [riba] but Bluesky isn't really federated. There are aggregators, and they provide different
- [stellz] there is a global view, it's the relays
- [will] right, and there isn't really federation of the relays
- [stellz] but federating the relays doesn't really make sense
- [cameron] how is that namespace partitioned?
- WS: not partitioned, permissioned writers and many replicas
- MOSH: what does adequate bus-factor coverage look like?
- b5: will record-publishers be OBLIGATED to push all records? how do we prevent a partially-withholding node?
- any node that has something gossips it to others
- b5: what if a byzantine node holds
- masih: gradual federation? v1 is not byzantine tolerant (trusted parties run core servers); v2 requires quorum to be byzantine resistant
- Mosh: what does cid.contact point to in v1?
- masih: cid.contact points to any of the 4; v1 design was like a table held up by 4 legs
- ws: v1.0-1.3
- ws: v1.now
    - cid.contact proxies to whichever nodes are CAUGHT UP
    - legacy clients still supported on old routes --> v1
    - kubo could be updated to hardcode a fanout/fallback array of replicas
- decentralized discovery is one goal; orthogonal goal is making the federation more resilient
- daniel: why/how centralized?
- ws: no one paid for it or evangelized decentralizing it
- ws: this infra covers a LOT of different usecases and architectures; having a governance layer would support flexible
- ws: there is an open network of DHT users not paying in and not using much; we can still represent and support them without tokenizing/financializing, etc ("de minimis"?)
- ws: optimizing for tokenland/FIL is also bad for other reasons; need governance even there
- ws: metering/load analysis needed
- cameron: what's the argument for filecoin and ipfs sharing ipni?
- ws: CID k/v is CID k/v
- ws: bouncing around across boundaries kills the 20 ms round-trip goal...
- matt: cid.contact is a right not a privilege at present
- masih: long tail: 90% of traffic is storacha and pinata; why do longtail users need a slow
- matt: why do all the free users go to IPNI and not DHT? why not have paying and/or power users use the centralized/federated thing, and the public option can be elsewhere?
- endorphic: why not train an ML model on the DAGs and routes so that locality-sensitive hashing can be layered onto bare CIDs (to predict BGP route and multi-armed bandit strategy)?
- ws: paid tier/free tier seems likely given current conversation with FIL ecosys funders; plan A is a roadmap being written soon that has some kind of budgetary/treasury strategy
- complexity of funding ops team labor
- b5: what's it cost to run a full IPNI server for a year, ballpark?
- 10kish hardware upfront at market price; plus price of rack near backbone
- riba: but it doesn't work today! storacha is backlogged 5ish days, there are SPs falling down all the time
- ws: that's an engineer-time problem, not an ops-budgeting problem
- riba: from my perspective, this architecture cannot be optimized enough that a random writer can come in and say it works
- masih: disagree; topology is fixable, budget has been zero for 2 years; there is a backlog of fixable bugs and upgrades!
- masih: change in write request policy merged recently because big publishers are bursty! 2 years ago the current architecture worked fine given the write patterns from back then
- masih: it's not fundamentally broken, it just needs some dev work to buffer and load-balance for new reality of huge-write platforms
- riba: but we're working from the assumption that IPNI is worth fixing, because GLOBAL SEARCH is not a usecase anyone has!
- mosh: can i call on matt? what are the consequences for you if IPNI goes down again tomorrow or for good?
- matt: customers are PISSED and we can't make them whole; we don't run cid.contact and we can run a smaller equivalent for ourself only, and would if it goes down again; our customers think we ARE ipfs...
- mosh: why can't you fall back to DHT?
- matt: if we do, we would flood/break the DHT for everyone else
- mosh: do we need another layer?
- matt: i'd like one if it's reliable
- ws: having all the CIDs in a public good isn't an achievable goal!
- riba: matt's customers are unhappy because they UPLOAD to pinata and expect to download from ipfs.io (speaking of unsustainable public goods)
- endomorphosis: reworking the architecture is worth doing; better to rework it and make it more efficient than to just throw more resources at the current system; if traffic blows up, we're right back here in a year even with 5 more replicas
- ??: throttled freemium tier sounds fine?
- closing words: i am not against the idea of IPNI, i'm mostly thinking a global index of all CIDs is just a weird goal in the first place?
- ws: read load is still the issue (and the real cost center, because caching strategies and DDoS protection need to be fine-tuned over time); volume is still entirely manageable
- ws: it's an economic problem, putting a price on fast, reliable CID lookups on an SLA is good! i don't think we need to downscope or gate this to one use-case or one protocol or anything else; the more usecases there are, the lower that price
- héctor: in a world where this works, what role is there for the DHT?
- ws: high churn/deletes/etc (mempool-type data) better DHT'd
- héctor: that sounds like a 90 or 180 degree turn for the DHT's purpose?
- masih: I think the DHT stays as-is! successful routing has fallbacks. DHT is a purpose-built system for high-volume and low-latency use-cases, period. DHT can be public option.
- héctor: but if this is free to small user...
- masih: it absolutely should be!
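
Editor's sketch (not from the session): the notes mention cid.contact proxying to whichever nodes are caught up, kubo hardcoding a fanout/fallback array of replicas, and a bad operator costing only extra latency. A minimal Python sketch of that client-side behaviour might look like the following. Everything named here is an assumption for illustration: the `Replica` type, the `head_seq` ingest counter, the `max_lag` threshold, and the injected `lookup` callable are hypothetical and are not the IPNI or kubo API.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence


@dataclass(frozen=True)
class Replica:
    """One full index replica. Both fields are illustrative assumptions."""
    url: str        # hypothetical endpoint, not a real service
    head_seq: int   # hypothetical counter of how much the replica has ingested


def caught_up(replicas: Sequence[Replica], max_lag: int = 0) -> List[Replica]:
    """Keep replicas within max_lag of the most advanced one, in input order."""
    if not replicas:
        return []
    newest = max(r.head_seq for r in replicas)
    return [r for r in replicas if newest - r.head_seq <= max_lag]


def find_providers(
    cid: str,
    replicas: Sequence[Replica],
    lookup: Callable[[Replica, str], List[str]],
    max_lag: int = 0,
) -> Optional[List[str]]:
    """Try caught-up replicas in order; return the first answer, else None."""
    for replica in caught_up(replicas, max_lag):
        try:
            return lookup(replica, cid)
        except OSError:
            # OSError covers ConnectionError and TimeoutError: a failing
            # operator costs one wasted round trip, then we try the next one.
            continue
    return None
```

The `lookup` callable is injected so the sketch stays independent of any real HTTP client. This only models the trusted-operator "v1" behaviour described above; the quorum-based "v2" would need to compare answers across replicas rather than accept the first one.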