# Content Routing WG
19.05.2025
- Updates from Will:
    - Some money has been allocated to an imminent upgrade
    - Since April, read load (particularly on the SF replica) was overloading the two-replica deployment, which kept falling over and delaying/blocking writes
    - Since May, a 3rd replica has been running without a fully bootstrapped historical-node/local DH store; this experimental no-DH-store config was unproven/untested and also fell over, but it gave the SF replica a chance to catch up on its write backlog and stand back up
    - As of 5 days ago, with the 3rd replica shut down, SF is back to normal, healthy operation: [link to recent metrics](https://sploit.zip/5a8394bc1ab8.png)
    - In parallel, Masih has been working on a "relay X" mechanism that will scale this better going forward
    - Storacha has turned off cascading of requests to IPNI, which is helping reduce the read overload (unproven hypothesis: a significant chunk of the overload may have been repeat requests)
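The cascading toggle Storacha flipped can be pictured as a fallback step in provider lookup. A minimal sketch, assuming hypothetical names (`find_providers`, `lookup_dht`, `lookup_ipni`, `cascade_to_ipni`) that are illustrative only, not Storacha's actual code:

```python
def find_providers(cid, lookup_dht, lookup_ipni, cascade_to_ipni=False):
    """Resolve providers for a CID, optionally cascading to IPNI.

    Hypothetical sketch: lookup_dht/lookup_ipni stand in for the real
    routing clients; cascade_to_ipni is the switch Storacha turned off.
    """
    providers = lookup_dht(cid)
    if not providers and cascade_to_ipni:
        # Only fall through to cid.contact when explicitly enabled;
        # disabling this removes the repeat read load on IPNI.
        providers = lookup_ipni(cid)
    return providers
```

With the flag off, misses simply return empty instead of generating an IPNI query, which is consistent with the observed drop in read load.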
- Opening questions
    - Joshua N: Is that money earmarked for personnel/hours, infra, or both?
    - Will: Short-term, the service has tech debt and needs a bit of an upgrade/re-tune to new load patterns (fewer Filecoin-style big batches?) and projections, i.e. people-time; also worth noting that one entity is currently a bottleneck, and spreading load/infra across some federation would make future hiccups less noticeable to end-users
    - Will: Format of these meetings? Goals?
        - FF allocated a grant covering Masih's and my time, but also some for an "expanding mirror" of advertisements for bootstrapping new instances (fun fact: one of the replicas had run out of HDD space and was no longer able to keep a complete local dataset!)
        - [Applied to ProPGF](https://fil-propgf.questbook.app/dashboard/?grantId=67b7bd4a364bc21272aea6f1&isRenderingProposalBody=true&role=builder&proposalId=6811df189ccae77961b51978&chainId=10)
        - Ideally, if the ProPGF decision lands in June, the community could mull over and propose structures that rely on that medium-term funding model (plus a no-outside-money, community-governed fallback as well?)
        - The infrastructure repo has some notes on standing up your own replica/fullnode (is that in [ipni](https://github.com/ipni)?)
        - Originally on AWS, since moved to dedicated bare-metal hardware (very similar to what a FIL provider already has lying around); at least pre-tariffs and sanctions, that was about $10K, mostly the big fast HDDs, since the indexes get huge (60TB)
        - Joshua N: We would want to see that hardware profile; we might be able to stand up a node
- Infra questions
    - Joshua N: cid.contact still uses CloudFront? Aren't other CDNs cheaper? Will: We're paying ballpark $500ish/week (of that, __ is egress)
    - Will: 350GB/day were being transferred before; Joshua: Of course, but even as a load-balancer/front-end, there might be cheaper options for holding that 10-minute cache
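A back-of-envelope check of how much of that weekly bill the stated transfer volume could explain. The per-GB rate below is an assumed ballpark CDN egress price, not a number quoted in the meeting:

```python
GB_PER_DAY = 350                  # transfer volume stated by Will
RATE_USD_PER_GB = 0.085           # assumed ballpark CloudFront egress rate

weekly_egress_gb = GB_PER_DAY * 7              # 2450 GB/week
weekly_egress_cost = weekly_egress_gb * RATE_USD_PER_GB

print(round(weekly_egress_cost))  # ≈ 208 USD/week
```

If that assumed rate is roughly right, raw egress is only part of the ~$500/week, which supports Joshua's point that comparing CDN/front-end options (not just egress pricing) is worthwhile.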
- Final thoughts?
    - Masih: After the crisis, I'm thinking about long-tail integration of the IPFS and FIL networks, abstracting away the IPNI/DHT decision with sensible defaults?
        - [Hashberg](https://lu.ma/nbv106v5) in Berlin could be a place to discuss that
    - Miroslav (Spark): We just want to be good citizens, make sure we're not freeloading or overloading, and track developments
    - Miroslav: Weekly updates would be good!
    - BF: The DH store makes for more oblivious retrieval, so we'd rather the multiple nodes were syncing on the DHStore, not on raw requested CIDs. A non-DH-store replica is cheaper (CPU-wise) and is helpful as an emergency/fallback server, but we still want a DH-enabled one stood up next. A non-DH server bootstraps/ingests faster, which is good in an emergency (it can jump in to reduce load when things overload), but that advantage matters less if ProPGF funds a rearchitecture that could backfill from boot rather than needing to bootstrap from block 0
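The "more oblivious" property BF mentions comes from double hashing: the server indexes a hash of the multihash, so it never sees which raw CID a reader asked for. A minimal conceptual sketch, with a simplified key derivation that is illustrative rather than the exact IPNI dhstore scheme:

```python
import hashlib

def dh_key(multihash: bytes) -> bytes:
    """Derive the double-hashed lookup key from a raw multihash.

    Simplified for illustration: real dhstore key derivation differs,
    but the principle (server only ever sees the second hash) holds.
    """
    return hashlib.sha256(multihash).digest()

# The server's index maps dh_key(mh) -> provider records; it never
# stores or receives the raw multihash itself.
some_multihash = b"\x12\x20" + b"\x00" * 32   # hypothetical sha2-256 multihash
index = {dh_key(some_multihash): ["provider-record"]}

def lookup(multihash: bytes):
    # A reader derives the same key locally and queries by it.
    return index.get(dh_key(multihash), [])
```

This is why syncing replicas on the DHStore, rather than on raw requested CIDs, keeps reader queries unlinkable to specific content at the serving nodes.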
- Action items
    - Will: