# w3 filecoin-pipeline v2

_Getting new uploads into Filecoin deals using ♠️_

## Motivation

Uploads made to the new api are not being stored in Filecoin.

The dagcargo implementation that we use for the old api was intended to be a short-term fix. It's on life-support maintenance only. We could update it to source CARs from the new w3up s3 bucket, but the preference is to move on to the new way...

...and dagcargo aggregation was block-based and expensive when we receive huge uploads with many blocks. We have an opportunity to simplify it by creating aggregates for deals out of the existing user-uploaded CARs.

## Implementation details

Riba is working on an api to get data into Filecoin deals and keep it there, called ♠️ aka https://spade.storage

We send it a URL to a CAR of up to 31.75GiB and its commP hash. The CAR must be larger than 15.875GiB.

> Exact range (inclusive): [ 1+127*(1<<27) : 127*(1<<28) ]

To use it, we need to decide on a strategy for aggregating user-uploaded CARs into Filecoin deal CARs.

:::info
Some options to provoke ideas

- deal per account when near 31GiB
- deal per space when near 31GiB
- deal per uploads across all accounts & spaces every 31GiB (what dagcargo v1 does today)
- deal per upload (!many uploads are way less than 31GiB)

For any strategy we choose we would prefer to keep shards of an upload together.
:::

### CAR concatenating CF Worker

:::warning
this section is outdated. Await news from Riba.
:::

We can avoid actually storing the aggregate pieces. Once we decide which user CARs will be part of an aggregate, we can use a CF worker to assemble the aggregate on the fly from those user CARs we already have in R2. It would create a predictable CAR header for the aggregate, and then concat the data sections from the user-uploaded CARs.

note: we may be able to encode the set of user CAR CIDs in the URL we use for that worker, to avoid it needing access to a datastore that defines which user CARs belong in each aggregate.

> That's a lot of CIDs... 32GiB / ~5MiB on average: ~6500

### CommP calc

:::warning
commP calc will now be done for us by :spades:! Await news from Riba.
:::

We need to send the commP hash for an aggregate to ♠️ along with the URL to fetch it from.

Prior art for calculating a commP hash from bytes: https://pkg.go.dev/github.com/filecoin-project/go-fil-commp-hashhash

We may want to share the commP that a user upload is in as early as possible once we have an aggregate ready for dealing, so they can use it in an FVM contract, or use it to verify that we added it to Filecoin deals like we said we would.

### Retrieval

The blocks in the aggregate CARs will be made available over bitswap by the Storage Providers that hold them.

HTTP retrieval will work via a URL for the SP + commP + range headers for the byte offset in the aggregate. For us to allow our users to fetch their CAR from an SP over http we must track which commP + offset each user CAR is in. We can ask ♠️ which deals, and hence which SPs, a commP is in to derive the full URL.

> there is a set of oracles for this, currently PL internal, but will be public by the time you go live, and trustless by end of Q1

### notes

if we split shards across many miners
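
### Sketch: deal size window

To make the size window from "Implementation details" concrete, here is a minimal check. Illustrative only: the constant and function names are ours; the bounds come straight from the inclusive range quoted above.

```go
package main

import "fmt"

// Deal size window from the ♠️ spec above, inclusive:
// [ 1 + 127*(1<<27) : 127*(1<<28) ]
const (
	minAggregateBytes = 1 + 127*(1<<27) // just over 15.875 GiB
	maxAggregateBytes = 127 * (1 << 28) // 31.75 GiB
)

// fitsDealWindow reports whether a candidate aggregate of n bytes
// is within the acceptable ♠️ deal size window.
func fitsDealWindow(n uint64) bool {
	return n >= minAggregateBytes && n <= maxAggregateBytes
}

func main() {
	fmt.Println(fitsDealWindow(16 << 30)) // true: 16 GiB is inside the window
	fmt.Println(fitsDealWindow(8 << 30))  // false: too small to deal on its own
}
```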
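
### Sketch: CAR concatenation

A hedged sketch of the concatenation step from "CAR concatenating CF Worker". The real thing would run as a CF Worker in JS; the byte-level operation is shown in Go for consistency with the other sketches. It assumes CARv1 framing (a varint-length-prefixed dag-cbor header, then varint-length-prefixed sections) and takes the pre-built aggregate header as an opaque byte slice, since constructing that "predictable CAR header" is a separate step.

```go
package main

import (
	"bufio"
	"encoding/binary"
	"fmt"
	"io"
)

// skipCarHeader consumes the CARv1 header of r: a uvarint length
// prefix followed by that many bytes of dag-cbor ({version, roots}).
func skipCarHeader(r *bufio.Reader) error {
	n, err := binary.ReadUvarint(r)
	if err != nil {
		return fmt.Errorf("reading header length: %w", err)
	}
	if _, err := io.CopyN(io.Discard, r, int64(n)); err != nil {
		return fmt.Errorf("discarding header: %w", err)
	}
	return nil
}

// concatCars writes aggregateHeader (a pre-built, varint-prefixed
// CARv1 header for the aggregate) followed by the data sections of
// each user CAR, with their individual headers stripped.
func concatCars(w io.Writer, aggregateHeader []byte, userCars []io.Reader) error {
	if _, err := w.Write(aggregateHeader); err != nil {
		return err
	}
	for _, car := range userCars {
		br := bufio.NewReader(car)
		if err := skipCarHeader(br); err != nil {
			return err
		}
		// Everything after the header is varint-prefixed sections;
		// they can be streamed through verbatim.
		if _, err := io.Copy(w, br); err != nil {
			return err
		}
	}
	return nil
}
```

Streaming each user CAR straight through means the assembler never buffers a whole ~31GiB aggregate in memory.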
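
### Sketch: commP calc

A sketch of commP calculation using the go-fil-commp-hashhash package linked above (note the warning: ♠️ may do this for us). Per that package's docs, `Calc` implements `hash.Hash` and `Digest()` yields the raw commP plus the padded piece size. The file name here is a stand-in.

```go
package main

import (
	"fmt"
	"io"
	"log"
	"os"

	commp "github.com/filecoin-project/go-fil-commp-hashhash"
)

// commPOf streams CAR bytes through the commP hasher and returns the
// raw commP digest plus the padded piece size.
func commPOf(car io.Reader) ([]byte, uint64, error) {
	cp := new(commp.Calc)
	if _, err := io.Copy(cp, car); err != nil {
		return nil, 0, err
	}
	return cp.Digest()
}

func main() {
	f, err := os.Open("aggregate.car") // stand-in for the aggregate bytes
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	rawCommP, paddedSize, err := commPOf(f)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("commP: %x\npadded piece size: %d bytes\n", rawCommP, paddedSize)
}
```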
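
### Sketch: ranged retrieval

A sketch of the retrieval path from "Retrieval": a plain HTTP Range request against an SP for the slice of an aggregate holding one user CAR. The URL shape (`/piece/<commP>`) is a guess, and the offset/length would come from our own tracking; which SP holds which commP would come from the ♠️ oracles.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
)

// fetchUserCar fetches one user CAR out of an aggregate held by an SP,
// using a standard HTTP Range request for its byte span.
func fetchUserCar(spBaseURL, pieceCid string, offset, length uint64, w io.Writer) error {
	url := fmt.Sprintf("%s/piece/%s", spBaseURL, pieceCid) // hypothetical URL shape
	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		return err
	}
	// Range is inclusive on both ends.
	req.Header.Set("Range", fmt.Sprintf("bytes=%d-%d", offset, offset+length-1))

	res, err := http.DefaultClient.Do(req)
	if err != nil {
		return err
	}
	defer res.Body.Close()
	if res.StatusCode != http.StatusPartialContent {
		return fmt.Errorf("expected 206 Partial Content, got %s", res.Status)
	}
	_, err = io.Copy(w, res.Body)
	return err
}
```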