upload-api in CF

Create signed URLs so that users PUT CARs to a CF worker (again). The worker verifies every block in every CAR and writes the complete indexes to R2, DUDEWHERE style.

  • We now have a "hash-on-write" guarantee. Combined with the CAR CIDs, this means we can shuffle the data around inside the system, safe in the knowledge that we don't have to recheck the individual blocks.
  • We also have a complete index of all blocks in R2, so w3s.link can serve all requests from R2. We can stop using the slow and expensive w3s.link -> ipfs.io -> E-IPFS read pipeline. See w3s.link redirect.
  • We can give hoverboard 🛹 full indexes, so it has full info to serve any bitswap request from R2 only
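
A minimal sketch of the verify-then-index step described above, assuming each block arrives with the sha2-256 digest claimed by its CID and its byte offset in the CAR. The `Block` and `IndexEntry` shapes and the `verifyAndIndex` name are hypothetical, not the real CAR or DUDEWHERE formats, and Node's `crypto` module is used here only for brevity; a worker would more likely use Web Crypto.

```typescript
import { createHash } from "node:crypto";

// Hypothetical shapes for illustration; not the real CAR or DUDEWHERE formats.
interface Block {
  digestHex: string; // sha2-256 digest claimed by the block's CID, hex-encoded
  bytes: Uint8Array; // block payload
  offset: number;    // byte offset of the payload within the CAR
}

interface IndexEntry {
  offset: number;
  length: number;
}

function sha256Hex(bytes: Uint8Array): string {
  return createHash("sha256").update(bytes).digest("hex");
}

// Verify every block, then return a complete digest -> position index.
// Throws on the first mismatch, so no partial index is ever returned.
function verifyAndIndex(blocks: Block[]): Map<string, IndexEntry> {
  const index = new Map<string, IndexEntry>();
  for (const block of blocks) {
    if (sha256Hex(block.bytes) !== block.digestHex) {
      throw new Error(`hash mismatch at offset ${block.offset}`);
    }
    index.set(block.digestHex, {
      offset: block.offset,
      length: block.bytes.length,
    });
  }
  return index;
}
```

Because the index is only produced when every block passes, the existence of an index for a CAR can stand in for "all blocks were hashed on write".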

However

We moved away from PUTs via a worker so that we could remove limitations. We would be back to

  • limiting uploaded CARs to max 100MiB again.
    • we already split CARs at 100MiB
    • we should revisit streaming writes to R2 via a worker
  • incurring a worker cost for uploaded data.
    • if we do it async we still have this cost. The only saving would come from not doing the verification, which seems reputationally risky, even if failures are rare.

Triggering a worker after a direct PUT to R2

It seems very likely that CF will make it possible to trigger a worker from an R2 PUT event.

That would let us do the verification async, which would let us switch back to direct signed URLs. But we'd have to consider where to put the hash verification barrier. Writes could go to a temp bucket and be copied across once we have verified and indexed them (or the existence of indexes could be used to distinguish verified CARs, but this seems less explicit).
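
The temp-bucket option can be sketched as below, assuming three buckets (staging, verified, indexes) and a verifier that returns index bytes or `null`. `MemoryBucket`, `BuildIndex` and `promoteIfValid` are hypothetical names, and the in-memory bucket is a stand-in for illustration, not the R2 binding API. Leaving a failed CAR in staging is also an assumption; it could just as well be deleted.

```typescript
// Hypothetical in-memory bucket; a stand-in, not the R2 binding API.
class MemoryBucket {
  private objects = new Map<string, Uint8Array>();
  put(key: string, value: Uint8Array): void {
    this.objects.set(key, value);
  }
  get(key: string): Uint8Array | undefined {
    return this.objects.get(key);
  }
  delete(key: string): void {
    this.objects.delete(key);
  }
}

// Hypothetical verifier: returns index bytes, or null if any block fails.
type BuildIndex = (car: Uint8Array) => Uint8Array | null;

// Runs after a direct PUT lands in the staging bucket.
// The CAR only becomes visible in `verified` once it has been hashed and indexed.
function promoteIfValid(
  key: string,
  staging: MemoryBucket,
  verified: MemoryBucket,
  indexes: MemoryBucket,
  buildIndex: BuildIndex,
): boolean {
  const car = staging.get(key);
  if (car === undefined) return false; // nothing to promote

  const index = buildIndex(car);
  if (index === null) return false; // assumption: failed CARs stay in staging

  indexes.put(key, index); // write the index first
  verified.put(key, car);  // then copy the CAR across the barrier
  staging.delete(key);     // finally clear the temp copy
  return true;
}
```

With this ordering, readers of the verified bucket never see a CAR that lacks an index, which is what makes the barrier explicit.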

Future: Saturn?

Longer term, can we write uploaded CARs directly to Saturn and have them hash the blocks?