# decouple w3filecoin aggregation pipeline trigger

## Background

The w3filecoin pipeline is [spec'ed](https://github.com/web3-storage/specs/blob/main/w3-filecoin.md) and divided into a set of well defined roles: Storefront, Aggregator, Dealer and Deal Tracker. Each of these roles can be implemented by one or more actors across a network. A given actor implementing the Aggregator service can receive items to aggregate from one or more Storefronts. Once an Aggregator has aggregated enough pieces together, the aggregate can be offered for storage with Filecoin SPs.

Filecoin SPs try to commit an offered aggregate by downloading all of its individual Pieces and recreating it locally. To download each piece, Filecoin SPs rely on [Roundabout](https://github.com/web3-storage/w3infra/blob/main/docs/roundabout.md). If one of the pieces that make up an Aggregate was not computed correctly, the whole Aggregate will not be committed by Filecoin SPs. Today we consider this a critical problem, though that may change in the future.

Current product requirements, together with the use of Spade and the typical SP flow, make a validation process essential. The main reasons are:

- Product has a strong top level SLA requirement: an ingested piece of content must land in an SP in less than 72 hours.
- Spade today has no SLA guarantees.
- We rely on aggregation per [FRC 0069](https://github.com/filecoin-project/FIPs/blob/master/FRCs/frc-0069.md). In short, this means that if a wrong pieceCid for a given contentCid gets into the pipeline, the whole 32GiB aggregate will be invalid. All the "good" pieces get delayed (which may put the SLA at risk) and need to go back into the aggregation queue.
- There is no spec'ed Report API from SPs for when a given Aggregate has a problem. There are some alternatives that **some** SPs use, but they are not required today. This leaves the w3filecoin pipeline completely blind to why a given Aggregate may fail, apart from an alert when it does not land on chain within the alerting thresholds. It is also not possible to query for a state or an error case.
- Per the above limitations, w3filecoin still does not have an implementation for retries.
- With a current throughput of ~3 hours per created Aggregate, a bad actor submitting small invalid Pieces every couple of hours is an easy attack vector to completely stall the pipeline.

TODO

Given the above, the Filecoin Storage pipeline should not blindly accept pieces computed by a client without validation or a reputation system. An Aggregator MAY aggregate pieces per individual Storefront, as well as decide to not accept pieces from a misbehaving Storefront. Therefore, Storefronts SHOULD validate the pieces computed by untrusted agents.

For launch, the w3up API has a static bucket write target. When a write happens to this bucket as a result of a `store/add` invocation, an event is triggered and a `PieceCID` is computed right away for the written content. Once the `PieceCID` is computed and recorded, two follow up events are triggered:

- a `filecoin/submit` invocation, where the given Piece is queued to be offered for aggregation (via a `piece/offer` invocation);
- an `assert/equals` invocation, where the `contentCID` is claimed by the service to be equal to the `PieceCID`. The equals claim enables Roundabout to find the content behind a requested `PieceCID`.
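To make the current trigger concrete, below is a minimal TypeScript sketch of the bucket-event flow described above. The event shape and the `invokeFilecoinSubmit`/`invokeAssertEquals` helpers are hypothetical stand-ins for the real infrastructure, and the PieceCID computation assumes the `Piece.fromPayload` API of `@web3-storage/data-segment`.

```ts
import { Piece } from '@web3-storage/data-segment'

// Hypothetical shape of the bucket write event: the object key carries the
// contentCID and `body` carries the written bytes.
interface BucketWriteEvent {
  contentCID: string
  body: Uint8Array
}

// Sketch of the current trigger: compute the PieceCID for the written
// content, then fire the two follow-up invocations.
export async function onBucketWrite (event: BucketWriteEvent) {
  // Compute the PieceCID (CommP) for the written bytes.
  const piece = Piece.fromPayload(event.body)

  // 1. `filecoin/submit`: queue the Piece to be offered for aggregation.
  await invokeFilecoinSubmit({ content: event.contentCID, piece: piece.link.toString() })

  // 2. `assert/equals`: claim that contentCID and PieceCID refer to the same
  //    bytes, so Roundabout can find the content behind a PieceCID.
  await invokeAssertEquals({ content: event.contentCID, equals: piece.link.toString() })
}

// Hypothetical stand-ins for the real UCAN invocations issued by the service.
declare function invokeFilecoinSubmit (nb: { content: string, piece: string }): Promise<void>
declare function invokeAssertEquals (nb: { content: string, equals: string }): Promise<void>
```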
## Decentralize trigger

As we move to a world where content claims are a first class citizen of the system, these MUST be claimed directly by the clients (nothing blocks these claims from also being issued by actors like Storefronts). Agents storing content with a Storefront (such as the w3up API) MUST compute a PieceCID for the data they upload (in fact, they already do so!) AND issue a claim that the `contentCID` equals the `pieceCID`. After uploading the content to the target bucket, the agent should invoke `filecoin/offer` in order to offer the computed Piece for Filecoin Storage (it already does so! But at the moment the invocation is a NOP, replaced by the bucket event. Thanks to CIDs, this still allows the agent to follow the whole chain of UCAN receipts until the data is stored with an SP).

There are several paths and opportunities to consider here, as well as a transition path between what we can achieve now to unblock the R2 move and where we want to be in the middle term. The most important thing in the near future is to run computation next to the data. Eventually, we should get to a point where multiple services run in different providers and are chosen based on data location. It could also be interesting to experiment with running this in Filecoin Station; we could probably have Station nodes download pieces via Roundabout for validation.

## Proposal

Note that this depends on the existence of locationClaims (either made by the client, or temporarily in bucket events...).

In Milestone 1, the goal is to drop the bucket event in favour of a trigger from the client. This is the rough plan:

1. Formalize a protocol for the computation:
   1. a capability that can be provided by a service running in CF or AWS to validate piece CIDs for a given content
2. Build an AWS service providing the given capability
3. Drop the bucket trigger, update the `filecoin/offer` handler to remove `skipFilecoinSubmitQueue` (https://github.com/web3-storage/w3up/blob/main/packages/filecoin-api/src/storefront/service.js#L24) and update the `handleFilecoinSubmitMessage` handler to invoke and wait for Piece Validation (https://github.com/web3-storage/w3up/blob/main/packages/filecoin-api/src/storefront/events.js#L37); a sketch of the updated handler is included at the end of this document

In Milestone 2, the goal is to get ready for writes directly to R2. This is the rough plan:

1. Deploy the piece validation service in CF Workers
2. Make `handleFilecoinSubmitMessage` decide which deployed service to use, based on the known locations of the data

For Milestone 3 (after the R2 move), we should consider a way for capability providers to claim services. This would enable us to decentralize away from operating all the services ourselves, and even rely on Filecoin Station nodes to run compute jobs.
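To make Milestone 1, step 3 concrete, here is a minimal sketch of what the updated `handleFilecoinSubmitMessage` could look like. The `PieceValidationService` interface and the message/context shapes are hypothetical simplifications of what lives in `packages/filecoin-api/src/storefront/events.js`, not the actual w3up API.

```ts
import type { UnknownLink } from 'multiformats'

// Simplified stand-in for the queue message handled by the Storefront.
interface FilecoinSubmitMessage {
  piece: UnknownLink   // PieceCID claimed by the client
  content: UnknownLink // contentCID of the stored data
  group: string        // grouping key used for aggregation
}

// Hypothetical interface for the piece validation capability from Milestone 1:
// a service (AWS at first, CF Workers in Milestone 2) that recomputes the
// PieceCID next to the data and compares it with the claimed one.
interface PieceValidationService {
  validate (input: { content: UnknownLink, piece: UnknownLink }): Promise<{ ok?: unknown, error?: Error }>
}

interface StorefrontContext {
  pieceValidator: PieceValidationService
  pieceOfferQueue: { add (message: FilecoinSubmitMessage): Promise<void> }
}

// Sketch of the updated handler: invoke and wait for Piece Validation before
// letting the piece continue towards aggregation.
export async function handleFilecoinSubmitMessage (
  context: StorefrontContext,
  message: FilecoinSubmitMessage
) {
  const validation = await context.pieceValidator.validate({
    content: message.content,
    piece: message.piece
  })
  if (validation.error) {
    // Reject the piece rather than poisoning a 32GiB aggregate. A Storefront
    // could also use this signal to track misbehaving agents.
    return { error: validation.error }
  }
  // Piece checks out: continue the existing flow towards `piece/offer`.
  await context.pieceOfferQueue.add(message)
  return { ok: {} }
}
```

For Milestone 2, the only change to this shape would be resolving `pieceValidator` per message from the known locations of the data, e.g. preferring the CF Workers deployment for content that lives in R2.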