# Large Preimage Proposals
"Large preimage proposals" are an optimization over the [original, computation-heavy solution](https://github.com/ethereum-optimism/optimism/pull/8870), where instead of performing the absorption and permutation on-chain for each keccak block absorbed, the user posts ***commitments*** to the computation. The submitter of the large preimage will still need to post the full pre-image on-chain through calldata so that it is available, but rather than doing the expensive `SHA3` computation on-chain, we merkleize the keccak blocks + a commitment to the internal `keccak256` state matrix by adding the input to an append-only merkle tree. If there is an incorrect state matrix claimed, a challenger can provide a merkle proof for the prestate and poststate leaves, and perform a single step of the keccak permutation on-chain to disprove the claimed data.
## Smart Contract Semantics
### Merkle Tree Structure
For each large preimage proposal, a tree of depth `16` is allocated. This supports preimages up to $2^{16} = 65,536$ keccak blocks in size, or ~`8.91` MB. The merkleization process follows the exact same semantics as the [Beacon Deposit contract](https://etherscan.io/address/0x00000000219ab540356cbb839cbe05303d7705fa#code), where we begin with an empty merkle tree with all leaves set to `0`:

*Source: [https://medium.com/@josephdelong/ethereum-2-0-deposit-merkle-tree-13ec8404ca4f](https://prod-files-secure.s3.us-west-2.amazonaws.com/788cc53a-3b6c-4a7f-af1c-46041e9de0e4/afd87444-5521-478a-89e2-43f941c96f04/Untitled.png)*
The merkle root is computed incrementally with the help of the precomputed zero hashes, which prevents having to merkleize a tree this large on-chain in its entirety, which is infeasible due to block gas limits.
### Initializing a Large Preimage Proposal
Before posting a large preimage, the submitter will need to initialize the proposal with a bit of metadata. This will involve claiming the full size of the preimage and the offset (in bytes) of the preimage part within the preimage ahead of time.
The reason that we need to claim these items ahead of time is due to the preimage size mixin when posting parts to the oracle. Preimages in the `PreimageOracle` are prepended with a big-endian 64 bit `length` field, which cannot be known ahead of absorbing the entire preimage. The part offset is also needed early in order to determine when to extract the preimage part from the streamed data.
This claimed preimage size **must** be checked for correctness, and the preimage offset **must** be checked to be within bounds whenever the proposal is finalized.
### Appending Leaves
In order to append to this merkle tree, we only need to store a single branch for the most recently inserted leaf. This means that the minimum required storage slots to keep track of is equal to the tree depth. In addition to these storage slots, we also need to keep track of some metadata about the proposal, such as the number of blocks absorbed so far, etc., along with the preimage part when it gets absorbed.
When the final block is absorbed, the proposal is marked as `finalized`, which begins the challenge period. During this time, any user can challenge the large preimage proposal. Once it has matured past the challenge period with no valid challenges, the preimage proposal can then be squeezed and the part may be added to the authorized preimages mappings for consumption by the FPVMs.
### Challenging
To challenge an invalid large preimage proposal, the challenger will need to find the ***first*** divergence within the claimed interior keccak state. Once they’ve found it, they will take one of two routes:
1. If the divergence is in the first keccak block, they will call a special function to challenge the first block. The setup state of the `keccak256` state matrix is a known constant, defined as:
$$
M = \begin{bmatrix}
0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0
\end{bmatrix}
$$
With this route, the challenger will only need to supply a merkle proof for the first leaf that was added to the tree and reveal the first leaf itself. The contract will then absorb the first claimed input into the constant starting state matrix, and if the hash of the resulting state matrix doesn’t match the commitment within the post-state leaf, the proposal will be challenged.
2. If the divergence is past the first keccak block, they will call the standard challenge function and supply two merkle proofs for contiguous leaves (which can be verified since each leaf commitment also commits to the block index, and both must be revealed.) The first proof will be for the agreed upon prestate leaf, and the second proof will be for the disputed post-state leaf. The state matrix for the prestate will also need to be revealed as the starting point, which can be verified by hashing it and comparing with the state commitment in the revealed prestate leaf.
The contract will then absorb the input from the post state leaf into the pre state matrix, and if the hash of the resulting state matrix doesn’t match the commitment within the post-state leaf, the proposal will be challenged.
### Squeezing
Once a large preimage proposal has matured past the challenge period, it may then be “squeezed” to persist the preimage part and final size into the authorized preimage part mappings in the `PreimageOracle`, which may then be consumed by the FPVMs. These mappings ********must******** never receive data that has not been authorized, hence why this operation is a poke.
This process will require revealing the second-to-last state matrix and submitting merkle proofs for the final two contiguous blocks, where the contract computes the final digest by applying the final block’s claimed input to the second-to-last state matrix. This produces the final digest of the full preimage on-chain via a single keccak permutation step, and then we can persist the preimage part picked up during the absorption process into the authorized mappings.
## Definitions
- **“Keccak Block”** - The `keccak256` hash function is based on the `SHA3` permutation, which has a rate of 1088 bits (or 136 bytes) per block. The full permutation is over a 5x5 state matrix of `uint64` types as so:
$$
M = \begin{bmatrix}
a & b & c & d & e \\
f & g & h & i & j \\
k & l & m & n & o \\
p & q & r & s & t \\
u & v & w & x & y
\end{bmatrix}
$$
- **State Matrix Hashing Function:**
```rust
// pseudo-code
fn hash_state_matrix(matrix: [u64; 25]) -> [u8; 32] {
keccak256(matrix.abi_encode())
}
```
- **Leaf Commitment Construction:**
```rust
// pseudo-code
fn leaf_hash(
keccak_block: [u8; 136],
keccak_block_index: U256,
state_matrix: [u64; 25]
) -> [u8; 32] {
let mut hash_buf = [0u8; 200];
hash_buf[0..136].copy_from_slice(keccak_block);
hash_buf[136..168].copy_from_slice(keccak_block_index.as_slice());
hash_buf[168..200].copy_from_slice(keccak256(state_matrix.abi_encode()));
keccak256(hash_buf)
}
```