beam 🛸

pickup mk2 - local-first pinning service

this doc describes two ideas!

  • beam app: the pinning service is you! (easy win)
  • beam as a service: running pickup in a durable object (experts only)

beam app

beam app runs on your machine. Initially a CLI, but it could also be a full app.

beam init guides you through setting up UCAN auth; then you command it to send data from your local IPFS node to w3, packed as a CAR via the store/add flow.

It can be used directly via beam <root cid> or as a pinning service by providing a compatible http api on localhost via beam daemon.
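
for example, a pinning-service client could talk to the local daemon like this (a sketch: the port and token handling are assumptions; the POST /pins shape is from the pinning service api spec):

  // sketch: add a pin via a local `beam daemon`, assuming it listens on
  // localhost:9097 (an assumed port) and implements the pinning service api
  declare const token: string // access token issued during `beam init`

  const res = await fetch('http://localhost:9097/pins', {
    method: 'POST',
    headers: { Authorization: `Bearer ${token}`, 'Content-Type': 'application/json' },
    body: JSON.stringify({
      cid: 'bafybeig...', // root CID to beam
      origins: ['/ip4/127.0.0.1/tcp/4001/p2p/12D3KooW...'] // local node multiaddr
    })
  })
  const pinStatus = await res.json() // { requestid, status, created, pin, delegates }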

beam 🛸 it to w3.

When the beam http api receives a pin request with an origins property that includes the local multiaddr, it exports the DAG as a CAR from the local IPFS service, does the store/add dance with w3, then PUTs the CAR to the signed URL as per our usual upload flow.
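
a minimal sketch of that handler, assuming kubo's /api/v0/dag/export RPC for the CAR export; storeAdd is a hypothetical stand-in for the w3 client call that does the store/add invocation:

  // hedged sketch of the daemon's pin handler. storeAdd is hypothetical; it
  // stands in for whatever w3 client call performs the store/add invocation
  // and returns a signed upload URL.
  declare function storeAdd (car: Uint8Array): Promise<{ url: string, headers: Record<string, string> }>

  async function handlePin (cid: string): Promise<void> {
    // export the DAG as a CAR from the local kubo node
    const res = await fetch(`http://127.0.0.1:5001/api/v0/dag/export?arg=${cid}`, { method: 'POST' })
    const car = new Uint8Array(await res.arrayBuffer())

    // store/add with w3, then PUT the CAR to the signed URL
    const { url, headers } = await storeAdd(car)
    const put = await fetch(url, { method: 'PUT', headers, body: car })
    if (!put.ok) throw new Error(`upload failed: ${put.status}`)
  }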

If you try to beam a DAG you don't currently have locally, beam will fetch the DAG via the local IPFS service (or its own internal IPFS / dagula magic) before sending it to w3.

In this way we encourage users to pin and provide content they care about, and we place useful bounds on the work we will do on behalf of users.

This is not how pinning services work today! Right now a simple pin request can turn into a multi-hour task for a kubo node running in a container in centralised infra. We don't want to run that. We do want to offer a compatible pinning-service api so users are free to choose other providers.

This pattern does not work for low-powered devices that genuinely need to offload the work onto another peer.

beam as a service

Managing containers at scale sucks, and is a full-time job that none of us want.

In a durable object we can do long-running stateful things like go fetch a bunch of blocks in response to a pin request.

By merging the implementation of the pinning-service api (currently an old CF worker) with the logic to go fetch the thing (currently pickup), we can do a good job of managing spend and user expectations: failing early if a user exceeds a usage cap, and specifying delegates for them to connect to, removing much of the need for hole-punching magic etc.

service limitations

The flow

There are 2 main types of pin request:

  1. I have the DAG locally and I want to back it up.
  2. Here is a CID I found, please can you search for it and make another copy available from your service.

user has the blocks locally

the ideal scenario as far as the pinning service api spec is concerned is that we send them a multiaddr for their ipfs node to connect to (called a delegate), so we don't have to search for providers.

we don't currently offer delegates in our existing impl, and have to rely on our kubo nodes finding provider records and being able to connect out to peers that may well be NATed at home.

👨‍💻 a peer makes a pin request as an http request to POST https://beam.web3.storage/pin/:cid handled by the beam 🛸 worker. The request has an origin multiaddr for the user's local IPFS node.

🛸 beam creates a new durable object with a new peerID. The durable object stores the CID to pin and the peerID keypair.

🛸 beam creates a pin requestId, with pin state queued, and stores the request info in D1 (or our preferred long-term db).

🛸 beam responds with a PinStatus response including the requestId and a delegate multiaddr /dns4/${peerId}.beam.web3.storage/p2p/${peerId} for the peer to connect to.
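
the response body might look like this (a sketch following the pinning service api's PinStatus shape; all values are placeholders):

  // sketch of the PinStatus response body per the pinning service api spec
  const peerId = '12D3KooW...' // placeholder: the durable object's fresh peerID
  const pinStatus = {
    requestid: 'c7e1...',            // placeholder requestId
    status: 'queued',                // queued | pinning | pinned | failed
    created: '2023-06-01T00:00:00Z',
    pin: { cid: 'bafybeig...' },     // the CID from the request
    delegates: [`/dns4/${peerId}.beam.web3.storage/p2p/${peerId}`]
  }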

👨‍💻📡 the peer makes a libp2p connection to the delegate multiaddr /dns4/${peerId}.beam.web3.storage/p2p/${peerId}

🛸 beam worker receives the websocket connection and uses the hostname to route it to the durable object for that peerId.
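
routing by hostname could look roughly like this (a sketch; the BEAM_DO binding name is made up):

  // sketch of the beam worker: route an inbound websocket upgrade to the
  // durable object for the peerId in the hostname (BEAM_DO is an assumed binding)
  export default {
    async fetch (request: Request, env: { BEAM_DO: DurableObjectNamespace }): Promise<Response> {
      const hostname = new URL(request.url).hostname // ${peerId}.beam.web3.storage
      const peerId = hostname.split('.')[0]
      const id = env.BEAM_DO.idFromName(peerId)      // stable id per peerId
      return env.BEAM_DO.get(id).fetch(request)      // the DO completes the upgrade
    }
  }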

🛸 beam durable object sets up the libp2p connection and sends a Want msg on it with the pin CID.

👨‍💻📡 the peer should send a block!

🛸 beam durable object receives blocks, verifies their hashes, decodes them, collects the links, and sends out more Want messages.
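
a sketch of that block handling using multiformats, which re-hashes the bytes and exposes the links (codec list trimmed for brevity):

  import * as Block from 'multiformats/block'
  import { sha256 } from 'multiformats/hashes/sha2'
  import * as dagPb from '@ipld/dag-pb'
  import * as raw from 'multiformats/codecs/raw'
  import type { CID } from 'multiformats/cid'

  const codecs: Record<number, any> = { [dagPb.code]: dagPb, [raw.code]: raw }

  // verify a received block against its CID, then collect its links so the
  // durable object can send more Want messages
  async function handleBlock (cid: CID, bytes: Uint8Array): Promise<CID[]> {
    const codec = codecs[cid.code]
    if (!codec) throw new Error(`unsupported codec: 0x${cid.code.toString(16)}`)
    // Block.create re-hashes the bytes and throws if the CID doesn't match
    const block = await Block.create({ bytes, cid, codec, hasher: sha256 })
    return [...block.links()].map(([, linkCid]) => linkCid)
  }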

🛸 beam durable object stores batches of blocks in memory (128MB max to play with) and at some batch size it creates a CAR, hashes it to get the CAR CID, and stores it in R2 (or via w3 store/add).
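
a sketch of flushing one batch, using @ipld/car to build the CAR; the BUCKET binding and key scheme are assumptions:

  import { CarWriter } from '@ipld/car'
  import { CID } from 'multiformats/cid'
  import { sha256 } from 'multiformats/hashes/sha2'

  const CAR_CODE = 0x0202 // multicodec code for CAR files

  // pack a batch of verified blocks into a CAR, derive the CAR CID from its
  // bytes, and store it in R2 (BUCKET binding and key scheme are assumptions)
  async function flushBatch (root: CID, batch: { cid: CID, bytes: Uint8Array }[], env: { BUCKET: R2Bucket }) {
    const { writer, out } = CarWriter.create([root])
    const chunks: Uint8Array[] = []
    const collected = (async () => { for await (const chunk of out) chunks.push(chunk) })()
    for (const block of batch) await writer.put(block)
    await writer.close()
    await collected

    const car = new Uint8Array(await new Blob(chunks).arrayBuffer())
    const carCid = CID.createV1(CAR_CODE, await sha256.digest(car))
    await env.BUCKET.put(`${carCid}/${carCid}.car`, car)
  }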

repeat until we have fetched the entire dag

🛸 beam durable object updates the pin request state to "pinned" in the db.

👨‍💻 the peer checks the status of the pin over http via GET https://beam.web3.storage/pin/:request-id

🛸 beam worker looks up request-id in the db and returns a PinStatus. The state is now pinned, and the user is happy.

user does not have the blocks

The job here is finding the blocks for the user. Finding things in IPFS requires stateful dht participants. We need a way to do dht & indexer lookups to find providers. Hosting or using an existing caskadht service or similar would provide this (and just this) instead of needing to run kubo nodes.

👨‍💻 a peer makes a pin request as an http request to POST https://beam.web3.storage/pin/:cid handled by the beam 🛸 worker. The request has no origin set. We make a note of that.

🛸 beam creates a new durable object with a new peerID. The durable object stores the CID to pin and the peerID keypair.

🛸 beam creates a pin requestId, with pin state queued, and stores the request info in D1 (or our preferred long-term db).

🛸 beam responds with a PinStatus response including the requestId and a delegate multiaddr /dns4/${peerId}.beam.web3.storage/p2p/${peerId} for the peer to connect to… but it probably won't

🛸 beam durable object makes a request to caskadht GET /multihash/<multihash for cid> to find providers.
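
a sketch of that lookup (the caskadht base URL is an assumption; the deployed indexstar instance in the curl example below exposes a similar /cid/<cid> route):

  // sketch: ask caskadht for providers of the pin's multihash
  type Provider = { ID: string, Addrs: string[] }

  async function findProviders (caskadhtUrl: string, multihash: string): Promise<Provider[]> {
    const res = await fetch(`${caskadhtUrl}/multihash/${multihash}`, {
      headers: { Accept: 'application/x-ndjson' }
    })
    const ndjson = await res.text()
    // one JSON record per line, each with a Provider of ID + Addrs
    return ndjson.split('\n').filter(Boolean).map(line => JSON.parse(line).Provider)
  }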

🛸 we try the providers in order. they need to support wss. We can use heuristics here. waves hands
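
one such heuristic, sketched (Provider shape as in the ndjson output below):

  // sketch of one heuristic: the durable object can only dial outbound
  // websockets, so keep providers that advertise a wss transport
  type Provider = { ID: string, Addrs: string[] }

  function dialable (providers: Provider[]): Provider[] {
    return providers
      .map(p => ({ ...p, Addrs: p.Addrs.filter(a => a.includes('/wss')) }))
      .filter(p => p.Addrs.length > 0)
  }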

🛸 beam durable object sets up outbound websocket connection to a provider.

🛸 beam durable object sends Want msg and flow continues as in the previous example.

Interesting parts

caskadht

we can test it out using the deployed version at https://indexstar.prod.cid.contact

curl -H "Accept: application/x-ndjson" https://indexstar.prod.cid.contact/cid/bafybeiha7xoedojqjz6ghxdtbf7yx2eklwo7db36772u3odrjusqck3ljm\?cascade=ipfs-dht -s
{"ContextID":"YmFndXFlZXJhcHQyZG1uamlqc2JpbXJ5dGxkcWhycGJkcW1jY2NxamQ0MzNzcXd2M29qNmt3NGR3eHp6YQ==","Metadata":"gBI=","Provider":{"ID":"QmQzqxhK82kAmKvARFZSkUVS6fo9sySaiogAnx5EnZ6ZmC","Addrs":["/dns4/elastic.dag.house/tcp/443/wss"]}}

{"ContextID":"aXBmcy1kaHQtY2FzY2FkZQ==","Metadata":"gBI=","Provider":{"ID":"QmQzqxhK82kAmKvARFZSkUVS6fo9sySaiogAnx5EnZ6ZmC","Addrs":["/dns4/elastic.dag.house/tcp/443/wss"]}}

{"ContextID":"aXBmcy1kaHQtY2FzY2FkZQ==","Metadata":"gBI=","Provider":{"ID":"12D3KooWADjHf2kyANQodg9z5sSdX4bGEMbWg7ojwu6SCyDAMtzM","Addrs":["/ip4/155.138.137.45/tcp/4001/p2p/12D3KooWLoEuCKf6DAZCk6cyeafk3sy4sLgbrVcffWzuba9iNsZ8/p2p-circuit","/ip4/155.94.208.136/tcp/4001/p2p/12D3KooWHD57iAm2d15JQbWxV8BcmqcVvc7xbjUa5UgRCSqF4jTv/p2p-circuit","/ip4/155.138.137.45/udp/4001/quic-v1/p2p/12D3KooWLoEuCKf6DAZCk6cyeafk3sy4sLgbrVcffWzuba9iNsZ8/p2p-circuit","/ip4/155.138.137.45/udp/4001/quic/p2p/12D3KooWLoEuCKf6DAZCk6cyeafk3sy4sLgbrVcffWzuba9iNsZ8/p2p-circuit","/ip4/155.94.208.136/udp/4001/quic-v1/p2p/12D3KooWHD57iAm2d15JQbWxV8BcmqcVvc7xbjUa5UgRCSqF4jTv/p2p-circuit","/ip4/155.94.208.136/udp/4001/quic/p2p/12D3KooWHD57iAm2d15JQbWxV8BcmqcVvc7xbjUa5UgRCSqF4jTv/p2p-circuit"]}}

Cap infra spend

Have a max number of Durable objects that do the fetching.

Track counts of blocks/bytes fetched per peerID and refuse additional pin requests over a per-hour/per-day budget. Pinning 1000 single-block dags is fine. Pinning 1000 massive dags from slow peers is not.

By refusing pin requests that exceed the budget we keep infra spend predictable.
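
a sketch of that budget check, assuming per-peer counters in D1 (the table and limit are made up):

  // sketch: refuse pin requests once a peer exceeds a daily byte budget.
  // the usage table and the limit are assumptions, not a real schema.
  const MAX_BYTES_PER_DAY = 10 * 1024 ** 3 // e.g. 10GiB per peerID per day

  async function withinBudget (db: D1Database, peerId: string): Promise<boolean> {
    const row = await db
      .prepare(`SELECT bytes FROM usage WHERE peer_id = ?1 AND day = date('now')`)
      .bind(peerId)
      .first<{ bytes: number }>()
    return (row?.bytes ?? 0) < MAX_BYTES_PER_DAY
  }
  // when this returns false the worker fails the pin request early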

Finding providers

Knowing where CIDs are requires a long-running process to find things from the dht reliably, but this service can be separated out so we are not obliged to keep kubo nodes running. caskadht and someguy are examples of services we could run to manage just the "who has the block" question. And we can lean on existing PL-run services for those to get started.

UCANs

local app would handle that, or see https://github.com/web3-storage/w3infra/issues/115