# State-storage integration

## Context

**State-storage independence**

Up until now, we have considered state and storage independent of each other. We commonly refer to the FVM as a Compute-over-State (CoS) engine, and place the responsibility of computing over _any_ piece of data stored in Filecoin's Decentralized Storage Network (DSN) on Compute-over-Data (CoD) products and networks. Examples of the latter include, but are not limited to: Lilypad, Fluence, etc.

**Operating on small data**

While this separation makes conceptual sense for large data, there is a class of data inputs (let's call it "small data") where the economics of performing computation on-chain are sensible. For that class of data, pushing a compute job to an L2 network may involve just too much overhead, delay, and complexity. Such flows involve oracles, finality, callbacks, daemons, etc. It would be simpler to perform the computation on-chain, if only the data could be sourced from the DSN.

Furthermore, thanks to the recent standardisation and adoption of [PoDSI](https://docs.lighthouse.storage/lighthouse-1/filecoin-virtual-machine/podsi-a-simple-overview), the existence, addressability, and safeguarding of small data stored in Filecoin's DSN is now provable in smart contracts. This makes it possible to store data emitted from a smart contract in the DSN, and prove that fact to the contract.

**Impediments**

The main impediment that remains is accessing small data stored in Filecoin's DSN from smart contracts, as well as the inverse flow: storing small data in Filecoin deals from smart contracts. The FVM does not provide any syscalls or mechanisms to perform these interactions.

This is mainly because the chain and the DSN exist in different planes of the Filecoin network. Data is held by _concrete_ storage providers, and its access patterns and QoS depend on the deal terms negotiated between the client and the SP at storage time. Conversely, chain state transitions are executed synchronously by all validators and followers. Unfortunately, ubiquitous and synchronous loading of data from the DSN cannot be relied on _during_ the execution of a smart contract.

**Lost opportunities**

Unfortunately, this creates a functional chasm and paradox in the Filecoin network that stakeholders are surprised by. Filecoin being a storage network, developers tend to expect smart contracts to be able to seamlessly operate on stored data. Such an ability would be broadly useful to advance the web3 industry. This is not possible today, and little exploration has been done to solve it.

**Solving this**

This spec attempts to outline a solution for small data. Bridging these two planes involves asynchronous interactions. While the future roadmap of the FVM includes async programming primitives, these interactions can be achieved today using EVM/Solidity mechanics, coupled with standard interfaces.

**Examples**

Examples of "small data" stored in the Filecoin DSN that may be operated on in smart contracts include: NFT metadata, file descriptors, signed ACL definitions, L2 data (e.g. transactions, headers), etc.

## Core ideas

**Blobs**

We henceforth refer to small off-chain data in scope of this specification as "blobs".

**Content-addressing**

All blobs must be addressed by CID. The Filecoin Solidity library already has basic utility functions to handle CIDs in Solidity code. We may need to extend the library to introspect CIDs, e.g. to extract the codec, multihash, etc. (a sketch of what this could look like follows).
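As a rough illustration only, a CIDv1 decoder could look like the sketch below. It assumes a binary (non-multibase) CID held as raw `bytes`; the `CidIntrospection` library and its function names are hypothetical, not part of filecoin-solidity. A binary CIDv1 is laid out as `<version><codec><multihash>`, each prefix being an unsigned LEB128 varint, which is why a varint reader is all that's needed.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.17;

// Hypothetical CIDv1 decoder; not part of filecoin-solidity. Assumes a
// binary (non-multibase) CID: <version varint><codec varint><multihash>,
// where the multihash is <hash-code varint><digest-len varint><digest>.
library CidIntrospection {
    // Reads an unsigned LEB128 varint at `offset`; returns (value, next offset).
    function readVarint(bytes memory cid, uint256 offset)
        internal
        pure
        returns (uint64 value, uint256 next)
    {
        uint64 shift;
        while (true) {
            uint8 b = uint8(cid[offset++]);
            value |= uint64(b & 0x7f) << shift;
            if (b < 0x80) break;
            shift += 7;
        }
        return (value, offset);
    }

    // Extracts the version, codec, hash function code, and 32-byte digest.
    function decode(bytes memory cid)
        internal
        pure
        returns (uint64 version, uint64 codec, uint64 hashCode, bytes32 digest)
    {
        uint256 off;
        (version, off) = readVarint(cid, off);
        require(version == 1, "only CIDv1 supported");
        (codec, off) = readVarint(cid, off);
        (hashCode, off) = readVarint(cid, off);
        uint64 len;
        (len, off) = readVarint(cid, off);
        require(len == 32, "expected a 32-byte digest"); // sha2-256 / blake2b-256
        // Load the digest: bytes data starts 32 bytes past the length slot.
        assembly {
            digest := mload(add(add(cid, 32), off))
        }
    }
}
```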
**Block-level (DAG-unaware)**

The first version of this standard enables block-level access. It is not aware of IPLD DAGs. In IPLD terms, the interface proposed here can be construed as a Blockstore, not a DAG resolver. On the eventual path to DAG awareness, we must consider:

- Usage of selectors to traverse and filter the DAG.
- Efficient/compact CAR-like bundling of selected data.
- On-chain validation cost of DAG reconstruction and validation.
- Solidity IPLD libraries to handle the DAG (maybe this is becoming too intense now...)

**Sourcing**

Blobs may be sourced from _any_ content-addressed network or storage. For Filecoin specifically, we envision SP software like Boost supporting the standard interfaces proposed here. That would enable Filecoin SPs to earn a new revenue stream by participating in blob storage and retrieval. However, as a first step, this spec can be prototyped via a dedicated program that interacts with the Lotus and Boost RPC APIs.

**Events as transport**

This standard relies on on-chain events, and client collaboration, to move data around, hitting a sweet spot between safety and efficiency. On the load side, presenting a blob would ideally resume execution. However, the context of the transaction would carry the blob provider as a sender. This may lead to reentrancy vulnerabilities and other footguns. Therefore, the load side simply emits an event with the presented payload (after validating it), and expects the original caller to use that payload to resume execution through the `Resumer` interface below. These mechanics require dapp collaboration.

**Maximum size**

Because the proposed standard uses events, payloads that can transit through this standard are limited to slightly less than 8KiB.

**Load-side**

When a contract desires to load a blob by CID (see the sketch after this list):

1. The contract emits a standard event specifying the CID, the reward paid, the expiry of the request in epochs, and a callback ID.
   - The CID must use blake2 or sha256 hashing (TODO: unclear if blake2b, blake2f), so that it's possible to later rehash the payload via an EVM precompile.
   - Note that the data at rest must also be addressed by the same multihash in the store.
2. It then returns in a dapp-specific manner to signal that an async load step has begun.
3. SPs watching for such standard events check their local stores for the requested CID and evaluate the profit they stand to make.
4. If positive, they submit a standard message to a standard method, inlining the data and specifying the callback ID.
5. The contract matches the callback ID with some suspended state (continuation-style), calls the blake2 or sha256 precompile on the presented payload, and verifies that the resulting multihash matches the expected one.
6. If it matches, the contract transfers the reward to the caller's address, emits an event inlining the verified payload, and considers the request satisfied.
7. The original caller (who is watching events) can then re-enter the suspended contract logic by supplying the payload that was just inlined in the event.
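The following is a minimal sketch of the verification step in steps 5–6 above, assuming the requesting contract recorded the expected sha2-256 digest (extracted from the CID) and an absolute expiry epoch when it emitted the request. All contract and field names are illustrative, not normative.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.17;

// Hypothetical sketch of the load-side verification step; names are
// illustrative and not part of the proposed standard interfaces.
contract BlobLoadSketch {
    enum Result { OK, ERR, TIMEOUT, CANCELLED }

    struct Request {
        bytes32 expectedDigest; // multihash digest component of the CID
        uint256 reward;         // attoFIL
        uint64 expiry;          // absolute epoch at which the request lapses
        bool pending;
    }

    mapping(uint256 => Request) internal requests;
    mapping(address => uint256) internal credits;

    event BlobLoadRes(uint256 indexed correlationId, Result result, bytes payload);

    function deliverBlob(uint256 correlationId, bytes calldata payload) external {
        Request storage req = requests[correlationId];
        require(req.pending, "unknown or fulfilled request");
        require(block.number <= req.expiry, "request expired");

        // Rehash the presented payload via the sha256 precompile and
        // compare against the digest committed to in the request.
        require(sha256(payload) == req.expectedDigest, "digest mismatch");

        req.pending = false;
        credits[msg.sender] += req.reward; // withdrawn later via Withdrawer

        // Do not resume execution here; just surface the verified payload.
        emit BlobLoadRes(correlationId, Result.OK, payload);
    }
}
```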
**Store-side**

When a contract desires to store a blob in Filecoin's DSN:

1. The contract precomputes the blake2/sha256 multihash CID.
2. (Unfortunately) The contract also needs to calculate the piece CID of the subpiece :cry:
3. The contract emits a standard event (à la DealClient) specifying the CID, and inlining the data to be saved. It also indicates the reward, the expiry of the store request in epochs, and a callback ID.
   - Note that this flow shares constraints and goals with the Client Contract, so ideally we should unify them.
4. It then returns in a dapp-specific manner to signal that an async storage step has begun.
5. Aggregators watching for this event package the subpiece into an aggregated piece.
6. Once the piece has been stored in the network, they present a PoDSI proof to the contract.
7. The contract validates the proof and verifies that the deal is active in the network. If so, it transfers the reward to the aggregator and considers the request satisfied.
8. The original caller may then re-enter the suspended contract logic.

**Economics**

Load-side, 1KiB:

- On-chain inclusion cost (twice: once for load, once for resume): 41173712 gas = 4.117371200 nanoFIL @ min base fee.
- EVM blake2 precompile, hashing over 1KiB (to compute the multihash component of the CID of the payload): TBD gas = TBD nanoFIL @ min base fee.
- Emitting the payload as non-indexed data in the event.

Store-side, 1KiB:

- TBD.

**Libraries**

Two libraries are needed:

1. A Solidity library, implementing the interfaces below, as well as functions to manage the async flow (suspension, resumption, correlation); see the sketch after this list.
2. A JavaScript library that manages the flow client-side.
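As a rough illustration of item 1, the suspension/resumption/correlation bookkeeping could look something like the sketch below; the `Suspendable` base contract and all its names are hypothetical, not part of any existing library.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.17;

// Hypothetical sketch of the suspension/resumption/correlation helpers
// the Solidity library could provide; all names are illustrative.
abstract contract Suspendable {
    enum Status { PENDING, FULFILLED }

    struct Correlation {
        uint8 entrypoint; // which stage of the contract to re-enter
        address caller;
        address origin;
        Status status;
        uint64 timeout;   // epochs
    }

    uint256 private nextId;
    mapping(uint256 => Correlation) internal correlations;

    // Records a suspension point; the returned correlation ID is embedded
    // in the BlobLoadReq/BlobStoreReq event.
    function suspend(uint8 entrypoint, uint64 timeout) internal returns (uint256 id) {
        id = nextId++;
        correlations[id] = Correlation(entrypoint, msg.sender, tx.origin, Status.PENDING, timeout);
    }

    // Flipped by the deliverBlob/proveStored paths once verification passes.
    function fulfill(uint256 correlationId) internal {
        correlations[correlationId].status = Status.FULFILLED;
    }

    // Re-enters the suspended logic under the original context. A full
    // version would also keep the CID in the correlation record and
    // recheck the payload against it here, per the Resumer interface below.
    function resume(uint256 correlationId, bytes calldata payload) external {
        Correlation memory c = correlations[correlationId];
        require(c.status == Status.FULFILLED, "not yet fulfilled");
        require(msg.sender == c.caller && tx.origin == c.origin, "context mismatch");
        delete correlations[correlationId]; // state can go once resumed
        dispatch(c.entrypoint, payload);
    }

    // Implemented by the dapp: routes to the right continuation stage.
    function dispatch(uint8 entrypoint, bytes calldata payload) internal virtual;
}
```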
## Specification

### Contract interface

```solidity
enum Result {
    OK,
    ERR,
    TIMEOUT,
    CANCELLED
}

event BlobLoadReq(
    uint256 correlationId,
    filecoin.Cid cid,   // type from filecoin-solidity
    uint256 reward,     // attoFIL
    uint64 timeout      // epochs
);

event BlobLoadRes(
    uint256 indexed correlationId,
    Result result,
    bytes payload       // filled if result is OK
);

event BlobStoreReq(
    uint256 indexed correlationId,
    filecoin.Cid cid,
    uint256 reward,     // attoFIL
    uint64 timeout,     // epochs
    bytes payload       // data to store
);

event BlobStoreRes(
    uint256 indexed correlationId,
    Result result
);

// Not public; here for illustration purposes.
enum Status { PENDING, FULFILLED }
enum Operation { LOAD, STORE }

// Not public; here for illustration purposes.
struct Correlation {
    uint8 entrypoint;
    Operation operation;
    address caller;
    address origin;
    filecoin.Cid cid;
    filecoin.Cid pieceCid; // if STORE
    Status status;
    uint64 timeout;
}

interface Resumer {
    // resume resumes a job that had previously been suspended, feeding
    // it the supplied payload. The separation between loading/storing
    // and resumption is intended for safety and security reasons.
    //
    // resume should validate that the caller and origin match the
    // original ones (such that the contract will resume under the same
    // context), and reconfirm that the payload matches the requested
    // CID.
    //
    // Once resume is called, the correlation state can be deleted.
    function resume(uint256 correlationId, bytes calldata payload) external;
}

interface Withdrawer {
    // withdraw enables participants to withdraw rewards that have
    // been credited to them.
    function withdraw(uint256 amount) external;
}

interface BlobLoader {
    // deliverBlob should be called by storage providers or data suppliers
    // to deliver a loaded blob to the contract.
    //
    // deliverBlob should check that the request is still pending, and that
    // the payload hashes to the requested CID. If positive, it disburses
    // the reward by crediting it to the caller. The reward can later
    // be withdrawn through the Withdrawer interface.
    //
    // For security and safety reasons, deliverBlob does not resume
    // execution. It simply emits a BlobLoadRes event with status OK,
    // inlining the supplied data.
    //
    // The client would be monitoring for this event and would carry the
    // data through to resume().
    function deliverBlob(uint256 correlationId, bytes calldata payload) external;
}

interface BlobStorer {
    // proveStored should be called by storage providers or data suppliers
    // to deliver proofs of storage and claim rewards.
    //
    // proveStored emits a BlobStoreRes with status OK if all is correct,
    // and updates the correlation state appropriately.
    //
    // This method uses PoDSI, which proves the inclusion of a subpiece
    // in a larger piece (= deal).
    function proveStored(
        uint256 correlationId,
        uint64 dealId,
        uint64 minerId,
        InclusionProof calldata proof,
        InclusionVerifierData calldata verifierData
    ) external;
}
```

## Usage by contracts

- Contract logic needing to load or store data needs to be continuation-aware.
- Today, this means leaking technical complexity into the contract.
- Contracts will need to structure such operations in stages for re-entry via entrypoints.
- In the future, when the FVM provides async directives, Wasm actors will be able to use those and have the continuation managed for them by the FVM.

## SP software

## Client software

## TODO

- Trying to avoid pieceCID computation inside the contract.
- DAG awareness: IPLD selectors.
- Integration with content-addressed databases like Tableland to load and store records.