# Off-chain Merkleized data storage
#### Note: intended to be an initial version towards decentralized storage
## Background and introduction
The Mina Foundation (MF) would like to announce a Request for Proposals (RfP) for the design and implementation of an off-chain data storage solution for Mina Protocol.
Mina is the world’s lightest blockchain, powered by participants. With its elegant design, Mina is the first Layer-1 enabling easy programmability of zero knowledge smart contracts, zkApps. The unique privacy and security features and ability to connect to any website via its zkApps enable a more secure and private Web3—paving the way to the democratic future we all deserve. Mina is stewarded by the Mina Foundation, a public benefit corporation headquartered in the United States.
### Project Overview
The goal of this RfP is to enable the Mina community to build out initial off-chain storage capability and tooling such that early production zkApps can store and access off-chain data for their projects and users, with maximum simplicity and low cost. We want to enable zkApp developers to explore use cases that require off-chain storage.
Currently, the on-chain storage of a zkApp is limited to 8 fields of 32 bytes each. A common technique is to store off-chain data in a Merkle tree and to store the root of that tree on chain, as an attestation to what is stored off-chain.
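
The Merkle root technique can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: `toyHash` is a non-cryptographic stand-in for a SNARK-friendly hash, and `merkleRoot`, `EMPTY_NODE` and `matchesOnChainRoot` are hypothetical names, not part of SnarkyJS.

```typescript
// Toy 32-bit FNV-1a hash. ASSUMPTION: a stand-in for a SNARK-friendly hash;
// it is NOT collision-resistant and is used for illustration only.
function toyHash(input: string): string {
  let h = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    h ^= input.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return ("0000000" + h.toString(16)).slice(-8);
}

// Padding value used when a level has an odd number of nodes (hypothetical choice).
const EMPTY_NODE = toyHash("empty");

// Computes the root of a binary Merkle tree over the given leaves.
function merkleRoot(leaves: string[]): string {
  if (leaves.length === 0) return EMPTY_NODE;
  let level: string[] = leaves.map((leaf) => toyHash("leaf:" + leaf));
  while (level.length > 1) {
    const next: string[] = [];
    for (let i = 0; i < level.length; i += 2) {
      const right = i + 1 < level.length ? level[i + 1] : EMPTY_NODE;
      next.push(toyHash("node:" + level[i] + right));
    }
    level = next;
  }
  return level[0];
}

// Off-chain: the full data set. On chain: only the root, which fits in one field.
const offChainData = ["alice:10", "bob:25", "carol:7"];
const onChainRoot = merkleRoot(offChainData);

// Anyone holding the data can check it against the on-chain root.
function matchesOnChainRoot(data: string[], root: string): boolean {
  return merkleRoot(data) === root;
}
```

Under these assumptions, `matchesOnChainRoot(offChainData, onChainRoot)` holds, while changing any leaf yields a different root, so the on-chain value attests to the off-chain data.
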
Typical smart contract protocols must execute smart contracts across all nodes during the span of a slot or block. This puts a fundamental cap on the computation that can be performed between blocks. Models such as the EVM's gas model help account for the computational cost of this work, but that cost must be known in advance, which is achieved by compiling smart contract code down to EVM opcodes. This accounting is necessary because trust in the system is based solely on consensus. The computational cost of making an external connection to a data source, for example via HTTP or WebSockets, is variable and therefore cannot be accounted for in advance, which makes such capabilities impossible in these models.
Through the power of zk-SNARKs, zkApps on the other hand break execution into two distinct steps:
1. Off-chain proving
2. On-chain verification
During off-chain proving, network participants generate constant-size proofs of arbitrary computation, the security of which is math-based. Because this step needs to be executed, or proven, only once, off-chain by the sender of a transaction, requesting data from external sources and incorporating that data into the transaction is almost trivial.
Once the transaction sender has requested the data appropriate to their needs and has incorporated it into a zkApp transaction proof, the transaction can be sent to the network, where the computational cost of its verification is constant (a handy feature of zk-SNARKs) and it can therefore be verified by all Mina block producers.
In the long run, we expect developers on Mina to want, and we will encourage, a flexible set of options for data storage, allowing them to choose between the strength of their data's availability guarantees and cost.
## Project Technical Design Overview
### Specification
This proposed implementation puts data custody in a set of servers that allow zkApp end-users to coordinate updates to data that reflect in-zkApp changes. The data availability guarantee is contingent on some trust assumptions being placed on the set of servers.
A data-custodying zkApp commits on creation to N servers and to a security parameter k, which sets how many servers must commit to storing a piece of data before the zkApp can proceed with updating to that data. The security parameter k determines the balance between the difficulty of censoring users and the risk to data availability. Setting k to 1, for example, requires only one server to commit to storing a piece of data; this makes censorship hard but creates a risk to data availability. Setting k to N requires every server to commit to storing a piece of data; this makes censorship easy but provides the maximum guarantee for data availability.
For milestones 1 and 2, we propose setting N to 1 and k to 1. Although this places custody in a single server, we suspect it will still be useful for building any prototype zkApp and many production zkApps.
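
The role of N and k can be sketched as a simple threshold check. This is an illustrative sketch only; `StorageConfig`, `validateConfig` and `mayProceed` are hypothetical names, and a real zkApp would check server signatures rather than a list of ids.

```typescript
// Hypothetical configuration a zkApp commits to on creation.
interface StorageConfig {
  n: number; // total number of custody servers
  k: number; // how many servers must commit before an update may proceed
}

// Rejects configurations outside 1 <= k <= N.
function validateConfig(config: StorageConfig): void {
  if (!Number.isInteger(config.n) || !Number.isInteger(config.k)) {
    throw new Error("N and k must be integers");
  }
  if (config.n < 1 || config.k < 1 || config.k > config.n) {
    throw new Error("require 1 <= k <= N");
  }
}

// Returns true when at least k distinct servers have committed to storing the data.
// ASSUMPTION: commitments are modeled as server ids instead of signatures.
function mayProceed(config: StorageConfig, committedServerIds: string[]): boolean {
  validateConfig(config);
  const distinct = new Set<string>(committedServerIds);
  return distinct.size >= config.k;
}

// Milestones 1 and 2: a single server, so one commitment is enough.
const singleServer: StorageConfig = { n: 1, k: 1 };
const singleServerOk = mayProceed(singleServer, ["server-a"]);
```

With `k = 1` a single honest server is enough for the user to proceed, which is why censorship is hard; with `k = N` any one unresponsive server blocks the update.
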

### Milestone 1: Single data-storage provider flow
For this milestone, the goal is to build a single data custody server that stores pending updated Merkle trees, together with a zkApp library providing the functionality described below.
The data custody server stores Merkle trees for the zkApp, and holds a Mina-compatible private key.
**NOTE:** This design does not attempt to mitigate race conditions. That concern is orthogonal and is handled by developers when they use sequence events/reducers in their zkApps. The state numbers introduced below serve solely to ensure that state can be safely purged on the backends.
Consider a zkApp currently in state A, with users aiming to move the zkApp to states [B, C, D, …]. The core requirement is that the zkApp must not move to any state in [B, C, D, …] unless the data for that state is stored somewhere.
To accomplish this, we have the custody server assign each update a number. With the zkApp in state S_{i}, user requests to the custody server to store a new state are labeled S_{i+1}, S_{i+2}, S_{i+3}, etc. The custody server signs a new state S_{i+k} with its private key, attesting that it is indeed storing that state. (Here k ≥ 1 is simply an offset, unrelated to the security parameter k above.)
The zkApp accepts new state S_{i+k} provided that:
- S_{i+k} is signed
- S_{i+k} is at a greater number than the current S_{i} (e.g. accept S_5 from S_2, but not S_1 from S_2)
- The transition from S_i to S_{i+k} matches the state transition rules described by the zkApp.
The zkApp uses 3 fields to store this:
- The public key of the custody server
- S_{i}, a hash representing a Merkle root of data stored on the custody server
- i, the currently stored state number.
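
The three acceptance conditions and the three stored fields can be sketched as follows. This is a plain TypeScript model under stated assumptions, not zkApp circuit code: the type and function names are hypothetical, and `toySign`/`toyVerify` are insecure placeholders for a real Mina-compatible signature scheme.

```typescript
// The three on-chain fields described above (names are hypothetical).
interface ZkAppStorageState {
  custodyPublicKey: string;
  root: string;        // S_i, the Merkle root
  stateNumber: number; // i
}

// What the custody server hands back to the user.
interface SignedUpdate {
  root: string;
  stateNumber: number;
  signature: string;
}

type SignatureVerifier = (publicKey: string, message: string, signature: string) => boolean;
type TransitionRule = (oldRoot: string, newRoot: string) => boolean;

// The message covered by the server's signature (hypothetical encoding).
function updateMessage(root: string, stateNumber: number): string {
  return root + ":" + stateNumber;
}

// Returns the new state if all three conditions hold, otherwise null.
function acceptUpdate(
  current: ZkAppStorageState,
  update: SignedUpdate,
  verify: SignatureVerifier,
  rule: TransitionRule
): ZkAppStorageState | null {
  const message = updateMessage(update.root, update.stateNumber);
  const signed = verify(current.custodyPublicKey, message, update.signature);
  const newer = update.stateNumber > current.stateNumber;
  const valid = rule(current.root, update.root);
  if (!(signed && newer && valid)) return null;
  return {
    custodyPublicKey: current.custodyPublicKey,
    root: update.root,
    stateNumber: update.stateNumber,
  };
}

// ASSUMPTION: toy, insecure stand-ins for signing and verification.
const toySign = (publicKey: string, message: string): string =>
  "sig(" + publicKey + "|" + message + ")";
const toyVerify: SignatureVerifier = (publicKey, message, signature) =>
  signature === toySign(publicKey, message);
const anyTransition: TransitionRule = () => true;
```

For example, starting from state number 2, an update signed at number 5 is accepted, while one signed at number 1, or one carrying a signature that does not verify, is rejected.
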
In the following, we imagine S_i to be an entire Merkle tree, with the Merkle root stored on chain, and the entire Merkle tree stored by each data storage server. Future versions may include incremental Merkle tree updates, for bandwidth efficiency.
The custody server always issues signatures to states it has stored, with incrementing new state numbers. Because a zkApp can only accept newer state numbers, the custody server knows that once a zkApp reaches state S_{i} on chain, any previously stored states S_{j} where j < i are safe to delete. The data custody server periodically polls the zkApp account to check the current state and delete old data.
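
The numbering and purge behavior can be sketched with an in-memory store. This is a minimal sketch; `CustodyStore` and its methods are hypothetical names, trees are modeled as arrays of leaf strings, and signing and polling of the zkApp account are left out.

```typescript
// Hypothetical in-memory model of the custody server's storage.
class CustodyStore {
  private nextNumber: number;
  private readonly states: Map<number, string[]> = new Map<number, string[]>();

  constructor(firstNumber: number) {
    this.nextNumber = firstNumber;
  }

  // Stores a tree and assigns it the next, strictly increasing state number.
  store(tree: string[]): number {
    const assigned = this.nextNumber;
    this.states.set(assigned, tree.slice());
    this.nextNumber = assigned + 1;
    return assigned;
  }

  // Deletes every stored state S_j with j < onChainNumber and returns the deleted numbers.
  purgeOlderThan(onChainNumber: number): number[] {
    const deleted: number[] = [];
    this.states.forEach((_tree, stateNumber) => {
      if (stateNumber < onChainNumber) deleted.push(stateNumber);
    });
    for (const stateNumber of deleted) this.states.delete(stateNumber);
    return deleted;
  }

  has(stateNumber: number): boolean {
    return this.states.has(stateNumber);
  }
}

// The zkApp is at S_2; three users ask the server to store competing new states.
const store = new CustodyStore(3);
const first = store.store(["a"]);  // assigned 3
const second = store.store(["b"]); // assigned 4
const third = store.store(["c"]);  // assigned 5

// Polling later shows the zkApp reached S_5 on chain, so S_3 and S_4 can go.
const purged = store.purgeOlderThan(5);
```

Because the zkApp only ever moves to higher numbers, the states removed by `purgeOlderThan` can never be accepted on chain afterwards.
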
The complete flow:
1. The user (i.e. the zkApp frontend) reads the zkApp account state containing the data Merkle root, S_{i}. (Merkle trees are already implemented in SnarkyJS.)
2. The zkApp frontend optionally makes an RPC request to the custody server, requesting the data corresponding to S_i.
3. The zkApp frontend makes an RPC request to the custody server, requesting that some new data S_{new} be stored by the server. It receives back a signed Merkle root, say signed as S_{i+5}.
4. The user sends a zkApp transaction to Mina, constructed using the signed Merkle root S_{i+5}. Provided S_{i+5} follows the rules for the zkApp account listed above (and to be included in this off-chain storage library), and the app is still in state S_{i} (the zkApp developer can enforce this using a precondition asserting that the current on-chain state is S_{i}), the zkApp account moves to state S_{i+5}.
5. The custody server deletes updates S_{j} where j < i+5.
6. Now anyone can see the Merkle root S_{i+5} currently stored on the zkApp account, query the custody server, and fetch the S_{i+5} data.
### Milestone 2: Fee model and spam protection
For spam protection, payment in MINA tokens for data storage occurs ahead of time, granting the user a number of API calls that they can use to perform future updates as needed. Without such a payment, users could spam the server and fill it with updates.
To prevent this, a user must send tokens to the custody server in order to request data storage. A future version could operate with a dedicated custom token.
For example, the custody server could require a user to send 3 tokens in order to have credits for 3 data updates.
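
A prepaid-credit model along these lines could be sketched as below. The sketch assumes a fixed price per update and discards any remainder of a payment; `CreditLedger` and its methods are hypothetical names, and detecting the actual on-chain payment is out of scope.

```typescript
// Hypothetical ledger the custody server keeps to rate-limit storage requests.
class CreditLedger {
  private readonly pricePerUpdate: number;
  private readonly credits: Map<string, number> = new Map<string, number>();

  constructor(pricePerUpdate: number) {
    if (!(pricePerUpdate > 0)) throw new Error("price must be positive");
    this.pricePerUpdate = pricePerUpdate;
  }

  // Called once a payment from `user` has been observed; returns the new balance.
  // ASSUMPTION: any remainder below the price of one update is discarded.
  recordPayment(user: string, tokens: number): number {
    if (!(tokens >= 0)) throw new Error("payment must be non-negative");
    const bought = Math.floor(tokens / this.pricePerUpdate);
    const balance = this.balanceOf(user) + bought;
    this.credits.set(user, balance);
    return balance;
  }

  balanceOf(user: string): number {
    const balance = this.credits.get(user);
    return balance === undefined ? 0 : balance;
  }

  // Spends one credit for a storage request; returns false if the user has none.
  spendCredit(user: string): boolean {
    const balance = this.balanceOf(user);
    if (balance < 1) return false;
    this.credits.set(user, balance - 1);
    return true;
  }
}

// 3 tokens at a price of 1 token per update buys 3 data updates.
const ledger = new CreditLedger(1);
const creditsAfterPayment = ledger.recordPayment("user-1", 3);
```

A storage request would call `spendCredit` before the server stores and signs anything, so a user without credits cannot fill the server with updates.
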
### Milestone 3: Multi data-storage providers
Milestone 3 is where multiple servers are implemented, each with their own state update numbers. States are now written as S_i|j|k|l|…, where each i, j, k, l, … corresponds to a state number on a different server.
The zkApp attempts to commit to N servers, and provided k servers succeed, the interaction is deemed successful.
The zkApp now only updates if all of the following are true:
- S_a|b|c|d|… is signed by all data custody providers
- S_a|b|c|d|… is at a greater number than the current S_i|j|k|l|… (e.g. accept S_5|4 from S_2|3, but not S_5|2 from S_4|3)
- The transition from S_i|j|k|l… to S_a|b|c|d|… matches the state transition rules described by the zkApp.
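
The "greater number" condition for multiple servers can be sketched as a component-wise comparison. This reading is an assumption drawn from the example above (S_5|2 is rejected from S_4|3 because one component decreases); the sketch requires every component to be strictly greater, and `isNewerOnEveryServer` is a hypothetical name.

```typescript
// Each array entry is the state number on one custody server, in a fixed server order.
// ASSUMPTION: "greater" means strictly greater on every server.
function isNewerOnEveryServer(current: number[], proposed: number[]): boolean {
  if (current.length !== proposed.length) {
    throw new Error("state vectors must cover the same servers");
  }
  return proposed.every((stateNumber, index) => stateNumber > current[index]);
}

// The two examples from the text.
const acceptsExample = isNewerOnEveryServer([2, 3], [5, 4]); // S_5|4 from S_2|3
const rejectsExample = isNewerOnEveryServer([4, 3], [5, 2]); // S_5|2 from S_4|3
```

A proposal would need to pin down whether a component may stay equal when its server did not take part in an update; this sketch does not allow that.
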
### User stories
- As a zkApp developer, I want to read and write data to an off-chain service provider in a way that is provable.
- As a zkApp developer, I want to pre-pay for additional data storage for users of my application.
### User experience
- Proposers should estimate the wait time for read/write calls.
- Data should be easy and convenient to retrieve.
### Infrastructure
- Proposers can define their own infrastructure strategy, which should be accessible and easily deployable. Highly available, automatically scalable solutions (e.g. Cloudflare) are encouraged.
- The solution should ensure long-term availability of the stored data, provided assumptions are met.
### Evaluation Metrics
For your proposal to be successful, make sure it addresses the evaluation criteria below.
We will evaluate proposals based on the following criteria:
- Experience and technical expertise
- Past performance history
- Samples and/or case studies from previous projects
- Projected costs
- Responsiveness and answers to questions