pickup mk2 - local-first pinning service
this doc describes two ideas!
beam app runs on your machine. Initially a CLI, but it could also be a full app.
`beam init` guides you through setting up UCAN auth; then you command it to send data from your local IPFS node to w3, packed as a CAR via the `store/add` flow.
It can be used directly via `beam <root cid>` or as a pinning service by providing a compatible http api on localhost via `beam daemon`.
beam 🛸 it to w3.
When the beam http api receives a pin request with an `origins` property that includes the local multiaddr, it exports the dag as a CAR from the local IPFS service, does the `store/add` dance with w3, and then PUTs the CAR via the signed URL as per our usual upload flow.
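A minimal sketch of that path, assuming the js `ipfs-http-client` for talking to the local node and a hypothetical `storeAdd()` helper standing in for the `store/add` invocation + signed-URL PUT:

```ts
// Sketch only: storeAdd() is a hypothetical helper wrapping the w3 store/add
// invocation and the PUT to the signed URL it returns.
import { create } from 'ipfs-http-client'
import { CID } from 'multiformats/cid'
import { storeAdd } from './w3.js' // hypothetical

export async function beam (rootCid: string): Promise<void> {
  const ipfs = create({ url: 'http://127.0.0.1:5001' }) // local kubo RPC API
  const root = CID.parse(rootCid)

  // export the DAG as a CAR stream from the local IPFS node
  const chunks: Uint8Array[] = []
  for await (const chunk of ipfs.dag.export(root)) {
    chunks.push(chunk)
  }
  const car = new Blob(chunks, { type: 'application/vnd.ipld.car' })

  // store/add dance with w3, then PUT via the signed URL
  await storeAdd(car)
}
```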
If you try to beam a DAG you don't currently have locally, beam will fetch the DAG via the local IPFS service (or its own internal IPFS / dagula magic) before sending it to w3.
In this way we encourage users to pin and provide content they care about, and we place useful bounds on the work we will do on behalf of users.
This is not how pinning services work today! Right now a simple pin request can turn into a multi-hour task for a kubo node running in a container in centralised infra. We don't want to run that. We do want to offer a compatible pinning-service api so users are free to choose other providers.
This pattern does not work for low-powered devices that genuinely need to offload the work onto another peer.
Managing containers at scale sucks, and is a full-time job that none of us want.
In a durable object we can do long-running, stateful things like go fetch a bunch of blocks in response to a pin request.
By merging the implementation of the pinning-service api (currently an old CF worker) with the logic to go fetch the thing (currently pickup), we can do a good job of managing spend and user expectations: failing early if a user exceeds a usage cap, and specifying `delegates` for them to connect to, removing much of the need for hole-punching magic etc.
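As a rough shape, here is a hedged sketch of one Durable Object per pin request that owns the peer identity and the fetch state. The `PIN_FETCHER` binding and `PinFetcher` class names are made up for illustration:

```ts
// Illustrative sketch only, not the real pickup/beam code.
export interface Env {
  PIN_FETCHER: DurableObjectNamespace
}

export class PinFetcher {
  constructor (private state: DurableObjectState, private env: Env) {}

  async fetch (request: Request): Promise<Response> {
    const url = new URL(request.url)

    if (request.method === 'POST' && url.pathname === '/pin') {
      // record what we've been asked to fetch; a peer keypair would be
      // generated and persisted here too
      const { cid } = await request.json() as { cid: string }
      await this.state.storage.put('cid', cid)
      return new Response(JSON.stringify({ status: 'queued' }), { status: 202 })
    }

    // the long-running, stateful block fetching is driven from here
    // (websocket messages / alarms), with no container to babysit
    return new Response('not found', { status: 404 })
  }
}
```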
There are 2 main types of pin request:
The ideal scenario, as far as the pinning service api spec is concerned, is that we send them a multiaddr for their ipfs node to connect to (called a `delegate`), so we don't have to search for providers. We don't currently offer `delegates` in our existing impl, and have to rely on our kubo nodes finding provider records and being able to connect out to peers that may well be NATd at home.
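For reference, a rough TypeScript shape of the spec's `PinStatus` with the `delegates` field we would start populating; field names are from memory, so verify against the Pinning Service API spec:

```ts
// Sketch of the pinning-service PinStatus shape; check the spec for the
// authoritative schema.
interface Pin {
  cid: string
  name?: string
  origins?: string[]            // multiaddrs *we* may dial to fetch the content
  meta?: Record<string, string>
}

interface PinStatus {
  requestid: string
  status: 'queued' | 'pinning' | 'pinned' | 'failed'
  created: string               // ISO 8601 timestamp
  pin: Pin
  delegates: string[]           // multiaddrs the *client* should dial, e.g. our 🛸 durable object
  info?: Record<string, string>
}
```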
👨‍💻 a peer makes a pin request as an http request to `POST https://beam.web3.storage/pin/:cid`, handled by the beam 🛸 worker. The request has an origin multiaddr for the user's local IPFS node.
🛸 beam creates a new durable object with a new peerID. The durable object stores the CID to pin and the peerID keypair.
🛸 beam creates a pin `requestId`, with pin state `queued`, and stores the request info in D1 (or our preferred long-term db).
🛸 beam responds with a PinStatus response including the `requestId` and a `delegate` multiaddr `/dns4/${peerId}.beam.web3.storage/p2p/${peerId}` for the peer to connect to (see the worker sketch after this flow).
👨‍💻📡 the peer makes a libp2p connection to the delegate multiaddr `/dns4/${peerId}.beam.web3.storage/p2p/${peerId}`.
🛸 beam worker receives the websocket connection and uses the hostname to route it to the durable object for that peerId.
🛸 beam durable object sets up the libp2p connection and sends a `Want` msg on it with the pin CID.
👨‍💻📡 the peer should send a block!
🛸 beam durable object receives blocks, verifies their hashes, decodes them, collects the links, and sends out more `Want` messages.
🛸 beam durable object stores batches of blocks in memory (128MB max to play with); at some batch size it creates a CAR, hashes it to get the CAR CID, and stores it in R2 (or w3 `store/add`).
repeat until we have fetched the entire dag
🛸 beam durable object updates the pin request state to "pinned" in the db.
👨‍💻 the peer checks the status of the pin over http via `GET https://beam.web3.storage/pin/:request-id`
🛸 beam worker looks up the `request-id` in the db and returns a `PinStatus`. The state is now `pinned`, and the user is happy.
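A hedged sketch of the 🛸 worker side of this flow, reusing the hypothetical `Env`/`PIN_FETCHER` binding and `PinStatus` shape sketched above; `newPeerId()` and the D1 write are placeholders:

```ts
// Illustrative only: routing and response shape follow the flow above,
// helper names are assumptions.
import { newPeerId } from './peer.js' // hypothetical keypair helper

export default {
  async fetch (request: Request, env: Env): Promise<Response> {
    const url = new URL(request.url)

    // 👨‍💻📡 websocket connections arrive on ${peerId}.beam.web3.storage:
    // use the hostname to route to the durable object for that peerId
    if (request.headers.get('Upgrade') === 'websocket') {
      const peerId = url.hostname.split('.')[0]
      const stub = env.PIN_FETCHER.get(env.PIN_FETCHER.idFromName(peerId))
      return stub.fetch(request)
    }

    // 👨‍💻 POST /pin/:cid creates the durable object and returns a PinStatus
    if (request.method === 'POST' && url.pathname.startsWith('/pin/')) {
      const cid = url.pathname.split('/')[2]
      const peerId = await newPeerId()
      const stub = env.PIN_FETCHER.get(env.PIN_FETCHER.idFromName(peerId))
      await stub.fetch(new Request('http://do/pin', {
        method: 'POST',
        body: JSON.stringify({ cid })
      }))

      const requestid = crypto.randomUUID()
      // ...store { requestid, cid, peerId, status: 'queued' } in D1 here...

      const pinStatus: PinStatus = {
        requestid,
        status: 'queued',
        created: new Date().toISOString(),
        pin: { cid },
        delegates: [`/dns4/${peerId}.beam.web3.storage/p2p/${peerId}`]
      }
      return new Response(JSON.stringify(pinStatus), {
        status: 202,
        headers: { 'content-type': 'application/json' }
      })
    }

    return new Response('not found', { status: 404 })
  }
}
```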
The job here is finding the blocks for the user. Finding things in IPFS requires stateful dht participants. We need a way to do dht & indexer lookups to find providers. Hosting or using an existing caskadht service or similar would provide this (and just this) instead of needing to run kubo nodes.
👨‍💻 a peer makes a pin request as an http request to `POST https://beam.web3.storage/pin/:cid`, handled by the beam 🛸 worker. The request has no origin set. We make a note of that.
🛸 beam creates a new durable object with a new peerID. The durable object stores the CID to pin and the peerID keypair.
🛸 beam creates a pin `requestId`, with pin state `queued`, and stores the request info in D1 (or our preferred long-term db).
🛸 beam responds with a PinStatus response including the `requestId` and a `delegate` multiaddr `/dns4/${peerId}.beam.web3.storage/p2p/${peerId}` for the peer to connect to… but it probably won't.
🛸 beam durable object makes a request to caskadht `GET /multihash/<multihash for cid>` to find providers (see the lookup sketch below).
🛸 we try the providers in order. They need to support wss. We can use heuristics here. waves hands
🛸 beam durable object sets up an outbound websocket connection to a provider.
🛸 beam durable object sends a `Want` msg and the flow continues as in the previous example.
we can test it out using the deployed version at https://indexstar.prod.cid.contact
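A hedged sketch of that lookup against the find endpoint described above; the multihash encoding and the response field names are assumptions to verify against the IPNI find API that caskadht/indexstar implement:

```ts
// Sketch only: exact path encoding and response shape should be checked
// against the deployed indexer.
import { CID } from 'multiformats/cid'
import { base58btc } from 'multiformats/bases/base58'

const INDEXER = 'https://indexstar.prod.cid.contact'

export async function findProviders (cidStr: string): Promise<string[]> {
  const cid = CID.parse(cidStr)
  // multibase(base58btc)-encoded multihash; the accepted encoding is an assumption
  const mh = base58btc.encode(cid.multihash.bytes)

  const res = await fetch(`${INDEXER}/multihash/${mh}`)
  if (!res.ok) return []

  const body = await res.json() as any
  const addrs: string[] = []
  for (const result of body.MultihashResults ?? []) {
    for (const p of result.ProviderResults ?? []) {
      for (const addr of p.Provider?.Addrs ?? []) addrs.push(addr)
    }
  }
  // we can only dial out from a worker over websockets, so keep wss addrs
  return addrs.filter(a => a.includes('/wss'))
}
```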
Have a max number of Durable objects that do the fetching.
Track counts of blocks/bytes fetched per peerID and refuse additional pin requests per day/hour. Pinning 1000 single-block pins is fine. Pinning 1000 massive dags from slow peers is not.
By refusing pin requests that exceed the budget we keep infra spend predictable (see the sketch below).
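A minimal sketch of that budget check, assuming a hypothetical `usage` table in D1 keyed by peer and day; the cap is a placeholder:

```ts
// Sketch only: table name, schema and cap are placeholders.
const MAX_BYTES_PER_DAY = 10 * 1024 * 1024 * 1024 // placeholder cap: 10 GiB/day

async function bytesFetchedToday (db: D1Database, peerId: string): Promise<number> {
  const row = await db
    .prepare("SELECT bytes FROM usage WHERE peer_id = ? AND day = date('now')")
    .bind(peerId)
    .first<{ bytes: number }>()
  return row?.bytes ?? 0
}

// called before we create a durable object for a new pin request
export async function withinBudget (db: D1Database, peerId: string): Promise<boolean> {
  return (await bytesFetchedToday(db, peerId)) < MAX_BYTES_PER_DAY
}
```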
Knowing where CIDs are requires a long-running process to find things from the dht reliably, but this service can be separated out so we are not obliged to keep kubo nodes running. caskadht and someguy are examples of services we could run to manage just the "who has the block" question. And we can lean on existing PL-run services for those to get started.
local app would handle that, or see https://github.com/web3-storage/w3infra/issues/115