# Improving CID Advertising and Routing
- you are here: https://ipfs.fyi/hashberg-notes
- schedule: https://ipfs.fyi/hashberg-schedule
- session-specific [TLDRaw board](https://www.tldraw.com/f/J7FXgqo2GW9lJ-BbgiTYC)
- Facilitator(s): @BUMBLEFUDGE, mosh
- Scribe(s):
- In attendance (& quotable):
- Bumblefudge
- Will Scott
- b5
## Context
-
## Notes (scribed)
[will]:
- all of the hash table describing who has what should fit within a single rack
- plan was to have caches all over the world, this was at the time of
- What does it mean to actually get to federation:
- baseline, just run another instance, it'll catch up, and then we point to that instance as a
- in this "v0" iteration the federation is at the governance layer
- There is a path where we can remove the human trust part
- we'd have snapshots & a djstra clock of the known publishers
- this would get us to consistent snapshots across the instances
- [riba] Can you give a common definition of federation?
- [will] Bluesky is a good metaphor: multiple full replicas run by different parties, all with their own independent failure modes
- multiple operators, each running a full instance, and clients can go to different
- In this world, the worst that happens here is additional latency. "bad operators" end in "well, I guess that didn't work, I have to go to another one"
- [riba] but bluesky isn't really federated. There are aggregators, and they provide different
- [stellz] there is a global view, it's the relays
- [will] right, and there isn't really federation of the relays
- [stellz] but federating the relays doesn't really make sense
- [cameron] how is that namespace partitioned?
- WS: not partitioned, permissioned writers and many replicas
- MOSH: what does adequate bus-factor coverage look like?
- b5: will record-publishers be OBLIGATED to push all records? how do we prevent a partially-withholding node?
- any node that has something gossips it to others
- b5: what if a byzantine node holds
- masih: gradual federation? v1 is not byzantine tolerant (trusted parties run core servers); v2 requires quorum to be byzantine resistant
- Mosh: what does cid.contact point to in v1?
- masih: cid.contact points to any of the 4; v1 design was like a table held up by 4 legs;
- ws: v1.0-1.3
- ws: v1.now - cid.contact proxies to whichever nodes are CAUGHT UP
- legacy clients still supported on old routes --> v1
- kubo could be updated to hardcode a fanout/fallback array of replicas
- decentralized discovery is one goal; orthogonal goal is making the federation more resilient
- daniel: why/how centralized?
- ws: no one paid for it or evangelized decentralizing it
- ws: this infra covers a LOT of different usecases and architectures; having a governance layer would support flexible
- ws: there is an open network of DHT users not paying in and not using much; we can still represent and support them without tokenizing/financializing, etc ("de minimis"?)
- ws: optimizing for tokenland/FIL also bad for other reasons; need governance even there
- ws: metering/load analysis needed
- cameron: what's the argument for filecoin and ipfs sharing ipni?
- ws: CID k/v is CID k/v;
- ws: bouncing around across boundaries kills the 20-millisecond round-trip goal...
- matt: cid.contact is a right not a privilege at present
- masih: long tail: 90% of traffic is storacha and pinata; why do longtail users need a slow
- matt: why do all the freeusers go to IPNI and not DHT? why not have paying and/or power users use the centralized/federated thing and public option can be elsewhere?
- endorphic: why not train an ML model on the DAGs and routes so that locality-sensitive hashing can be layered onto bare CIDs (to predict BGP route and multi-armed bandit strategy)?
- ws: paid tier/free tier seems likely given current conversation with FIL ecosys funders; plan A is a roadmap being written soon that has some kind of budgetary/treasury strategy
- complexity of funding ops team labor
- b5: what's it cost to run a full IPNI server for a year, ballpark?
- 10kish hardware upfront at market price; plus price of rack near backbone
- riba: but it doesn't work today! storacha is backlogged 5ish days, there are SPs falling down all the time
- ws: that's an engineer-time problem, not an ops-budgeting problem
- riba: from my perspective, this architecture cannot be optimized enough that a random writer can come in and say it works
- masih: disagree; topology is fixable, budget has been zero for 2 years; there is a backlog of fixable bugs and upgrades!
- masih: change in write request policy merged recently because big publishers are bursty! 2 years ago the current architecture worked fine given the write patterns from back then
- masih: it's not fundamentally broken, it just needs some dev work to buffer and load-balance for new reality of huge-write platforms
- riba: but we're working from the assumption that IPNI is worth fixing, because GLOBAL SEARCH is not a usecase anyone has!
- mosh: can i call on matt? what are the consequences for you if IPNI goes down again tomorrow or for good?
- matt: customers are PISSED and we can't make them whole; we don't run cid.contact and we can run a smaller equivalent for ourselves only, and would if it goes down again; our customers think we ARE ipfs...
- mosh: why can't you fall back to DHT?
- matt: if we do, we would flood/break the DHT for everyone else
- mosh: do we need another layer?
- matt: i'd like one if it's reliable
- ws: having all the CIDs in a public good isn't an achievable goal!
- riba: matt's customers are unhappy because they UPLOAD to pinata and expect to download from ipfs.io (speaking of unsustainable public goods)
- endomorphosis: reworking architecture is worth doing; better to rework and make it more efficient than just throwing more resources at current system; if traffic blows up, right back here in a year even with 5 more replicas;
- ??: throttled freemium tier sounds fine?
- closing words: i am not against the idea of IPNI, i'm mostly thinking a global index of all CIDs is just a weird goal in the first place?
- ws: read load is still the issue (and the real cost center, because caching strategies and DDoS protection need to be fine-tuned over time); volume is still entirely manageable
- ws: it's an economic problem, putting a price on fast, reliable CID lookups on an SLA is good! i don't think we need to downscope or gate this to one use-case or one protocol or anything else; the more usecases there are, the lower that price;
- héctor: in a world where this works, what role is there for the DHT?
- ws: high churn/deletes/etc (mempool-type data) better DHT'd
- héctor: that sounds like a 90 or 180 degree turn for the DHT's purpose?
- masih: I think the DHT stays as-is! successful routing has fallbacks. DHT is a purpose-built system for high-volume and low-latency use-cases, period. DHT can be public option.
- héctor: but if this is free to small users... masih: it absolutely should be!
-
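
Post-session sketch (not something presented in the room): the "fanout/fallback array of replicas" idea and masih's "successful routing has fallbacks" could look roughly like the following on the client side. Everything here is hypothetical and for illustration only: `Replica`, `find_providers`, the `caught_up` flag, and the replica names are assumptions, not part of IPNI, cid.contact, or kubo, and the lookups are plain in-memory callables rather than real network calls.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical type for illustration: CID -> list of provider addresses, or None.
Lookup = Callable[[str], Optional[list[str]]]


@dataclass
class Replica:
    name: str
    lookup: Lookup
    caught_up: bool = True  # assumption: a replica can report whether it is in sync


def find_providers(cid: str, replicas: list[Replica], dht_lookup: Lookup):
    """Try caught-up replicas in order, then fall back to the DHT.

    Returns (source_name, providers), or (None, []) if nothing answered.
    """
    for replica in replicas:
        if not replica.caught_up:
            continue  # skip replicas that are still syncing
        try:
            providers = replica.lookup(cid)
        except OSError:
            continue  # an unreachable operator only costs latency; try the next
        if providers:
            return replica.name, providers
    # No replica answered: use the DHT as the public-option fallback.
    providers = dht_lookup(cid)
    if providers:
        return "dht", providers
    return None, []


def down(cid: str):
    # Simulates an operator that is offline.
    raise ConnectionError("replica unreachable")


# Toy index standing in for a replica's CID -> providers table.
index = {"bafy-example": ["/dns4/provider.example/tcp/4001"]}
replicas = [
    Replica("replica-a", down),
    Replica("replica-b", index.get, caught_up=False),
    Replica("replica-c", index.get),
]
source, providers = find_providers("bafy-example", replicas, lambda cid: None)
print(source, providers)
```

In this toy run the first replica is down and the second is not caught up, so the answer comes from the third; a CID that no replica knows falls through to the DHT callable. Whether a real client should fan out in parallel instead of trying replicas in order is left open here.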