Thanks @eduadiez, @aliasjesus, @kevaundray, @dankrad, @protolambda, @n1shantd, @rauljordaneth, @dmarzz for helpful discussions.
Blobs have a more rigid pricing model than transaction calldata: you can only buy data in chunks of 131072 bytes.
To be economically efficient, rollups today can:
Also, to perform atomic operations, ZK rollups need to publish data together with a validity proof to achieve instant finality. If UX requirements demand frequent submissions, rollups may want to perform a state update before being able to fill an entire blob.
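As a rough illustration of the economics (the 20 KiB batch size is an assumed example, not a measured figure):

```python
BLOB_SIZE = 131072  # bytes per blob, fixed by EIP-4844 (4096 field elements * 32 bytes)

# Hypothetical rollup batch size, for illustration only.
batch_size = 20 * 1024  # 20 KiB per batch

# Posting each batch in its own blob leaves most of the purchased space unused.
utilization = batch_size / BLOB_SIZE
print(f"single-rollup blob utilization: {utilization:.1%}")   # ~15.6%

# Sharing one blob between several publishers recovers the unused space.
batches_per_blob = BLOB_SIZE // batch_size
print(f"batches of this size fitting in one blob: {batches_per_blob}")  # 6
```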
Usage pattern example
In its most basic form, the usage of the blob multiplexer service looks like the following sequence.
Let's introduce some requirements first; we'll discuss in detail later why they are necessary:
This protocol is initially focused on sequencers as users. Today, all major live rollups require permissioned sequencers, i.e. only an authorized party is able to publish data. Thus, when delegating data publishing to a 3rd party, there must exist some link back to the authorized sequencer entity.
Where possible, a trustless protocol is preferred over a trusted solution. However, the requirements above make it quite hard to offer an economically attractive trustless protocol.
| | Trusted multiplexer | Trustless multiplexer |
|---|---|---|
| Payment settlement | off-chain, after the fact | on-chain, atomic |
| Cost per byte overhead | zero, or negligible | significant |
| Censorship resistance | Somewhat, possible with inclusion proofs | Somewhat, possible with inclusion proofs |
| Data origin authentication | Somewhat, with data signatures | Yes, required |
The design of a trustless protocol requires a key primitive: can you succinctly prove that a specific range of a blob corresponds to data submitted by a given publisher?
Without this primitive it's very difficult to build a trustless multiplexer layer, since you cannot build trustless payments.
Trivial to offer multiple settlement options since the multiplexer is a trusted party. This could include:
Live rollups today require permissioned sequencing. If data is published by a 3rd party (the blob sharing protocol), there must exist a way to authenticate the data origin. Adding more complex authentication protocols increases the cost per byte.
Multiplexer just posts data, and expects consuming users to perform a second action to include an authenticated pointer to the previously posted data:
To derive the L2 chain, simply iterate the authenticated pointers and resolve their data.
No need for a 2nd transaction or additional communication roundtrip. However, an observer can't be convinced of blob authenticity without downloading the full blob.
To derive the L2 chain, iterate the multiplexer's transactions and keep only blob transactions with a valid signature from the user (rollup) address, as sketched below.
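A minimal derivation sketch for this variant; the transaction shape and the recover_signer helper are hypothetical placeholders, not a defined spec:

```python
from typing import Callable, Iterable, List, NamedTuple

class MultiplexerTx(NamedTuple):
    calldata: bytes                     # assumed to embed the rollup's signature
    blob_versioned_hashes: List[bytes]  # per EIP-4844 blob transactions

def derive_relevant_blobs(
    txs: Iterable[MultiplexerTx],
    rollup_address: str,
    recover_signer: Callable[[bytes], str],  # hypothetical: recovers the calldata signer
) -> Iterable[MultiplexerTx]:
    """Keep only blob transactions whose calldata signature recovers to the
    rollup's authorized sequencer; everything else is discarded as unauthenticated."""
    for tx in txs:
        if not tx.blob_versioned_hashes:
            continue                      # not a blob-carrying transaction
        if recover_signer(tx.calldata) != rollup_address:
            continue                      # invalid or foreign signature: discard
        yield tx                          # resolve tx.blob_versioned_hashes downstream
```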
The multiplexer is not a trusted party, and can publish invalid data. In section TODO we explore how to authenticate the data, but it is currently required to download the full blob to conclusively assert its correctness. The only link to data is the blob versioned hash.
In both a trusted and trustless multiplexer, the publisher can't explicitly force the multiplexer into sending a blob transaction. Therefore the multiplexer can censor a publisher by ignoring its data publishing requests.
A multiplexer is positively incentivized to include data publishing requests to collect fees. But this may not be sufficient to prevent profitable censorship attacks. In the trusted model, the social reputation of the company running the multiplexer adds an extra layer of protection.
A multiplexer can also be incentivized via penalties to not exclude data publishing requests. Consider the following protocol:
The multiplexer is now disincentivized (up to its stake value) from censoring publishers. However, publishers should build their protocol such that both the multiplexer and they themselves can publish data.
WIP TODO
Requires some mechanism to execute user payments conditional on blob inclusion.
Trusted execution on Intel SGX
An EthGlobal Istanbul '23 project is attempting it with the Flashbots SUAVE architecture: https://ethglobal.com/showcase/blob-merger-k7m1f
On-chain proofs
Requires some not-yet-figured-out primitives, and will likely be too expensive to do on L1.
ZK rollups have some extra requirements to optimize prover costs:
Refer to "How does Polygon zkEVM publish / consume data" for details on their current architecture.
To optimize costs, on-chain ZK provers minimize the public inputs, typically using very few commitments (or a single one) to all data needed for the computation. Part of the verifier circuit includes checking that the commitment is correct. With EIP-4844, this single commitment must include the versioned hash as the sole link to the transaction data.
How to verify data subset on aggregated blob
The cheapest strategy is to compute the KZG commitment for the full data in the proof system's native field, and then do an equivalence proof (see @dankrad/kzg_commitments_in_proofs). Then extract your data subset of interest for the rest of the circuit execution. While this requires ingesting some data that is not relevant to the rollup, the logic to handle DA on full blobs or partial blobs is the same: implement partial blob reading, and for the full-blob case set the offset to 0 and the length to the whole blob, as sketched below.
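A minimal sketch of that shared read path, where the full-blob case is just offset 0 and the whole blob length (names are mine):

```python
def read_chunk(blob: bytes, offset: int = 0, length: int | None = None) -> bytes:
    """Read a rollup's chunk out of a (possibly shared) blob.

    For a dedicated (full) blob, call with offset=0 and length=None,
    so the same code path serves both the partial and full blob cases.
    """
    if length is None:
        length = len(blob) - offset
    if offset < 0 or offset + length > len(blob):
        raise ValueError("chunk range outside blob bounds")
    return blob[offset : offset + length]
```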
TODO rest of integration @aliasjesus, @eduadiez
Currently the OP Stack uses a very simple strategy to post data: send a blob transaction to a predetermined address. It authenticates the data origin with the transaction's signature.
This architecture must change slightly to accommodate data published from a 3rd party (untrusted) account.
TODO integration, @protolambda
https://eips.ethereum.org/EIPS/eip-4844#opcode-to-get-versioned-hashes
The versioned hash is available inside the EVM exclusively during its transaction's execution. The BLOBHASH instruction reads an index argument and returns tx.blob_versioned_hashes[index].
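For reference, the mapping from a KZG commitment to its versioned hash as defined by EIP-4844:

```python
from hashlib import sha256

VERSIONED_HASH_VERSION_KZG = b"\x01"  # version byte defined by EIP-4844

def kzg_to_versioned_hash(commitment: bytes) -> bytes:
    """versioned_hash = 0x01 || sha256(kzg_commitment)[1:], per EIP-4844."""
    assert len(commitment) == 48  # compressed BLS12-381 G1 point
    return VERSIONED_HASH_VERSION_KZG + sha256(commitment).digest()[1:]
```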
The Beacon API route getBlobSidecars allows retrieving BlobSidecars for a given block_id and a set of blob indices:
class BlobSidecar(Container):
index: BlobIndex # Index of blob in block
blob: Blob
kzg_commitment: KZGCommitment
kzg_proof: KZGProof # Allows for quick verification of kzg_commitment
...
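A minimal sketch of fetching sidecars via this route with requests (the local node URL and the comma-separated indices encoding are assumptions; check your client's Beacon API docs):

```python
import requests

def get_blob_sidecars(beacon_url: str, block_id: str, indices: list[int] | None = None):
    """Fetch blob sidecars for a block via the Beacon API getBlobSidecars route."""
    params = {"indices": ",".join(map(str, indices))} if indices else None
    resp = requests.get(f"{beacon_url}/eth/v1/beacon/blob_sidecars/{block_id}", params=params)
    resp.raise_for_status()
    return resp.json()["data"]  # list of BlobSidecar objects (index, blob, kzg_commitment, ...)

# Example: blobs 0 and 2 of the head block from a local consensus node (assumed endpoint).
# sidecars = get_blob_sidecars("http://localhost:5052", "head", [0, 2])
```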
Append an authentication suffix to get rid of the separate consume transaction and to minimize intrinsic costs. Data readers need the capacity to discard invalid data.
Append, in the blob transaction's calldata, a signature from the user to authenticate the data being submitted.
A proof to the chain that the blob includes that data, e.g. that data[128:256] belongs to address Y.
Rationale
Questions:
Read process:
call_data = [ Signed KZG range-proof over chunk ] = sign_by_rollup_sequencer(range_proof(range info, KZG proof data))
blob = [ chunk 0 ][ chunk 1 ] ...
chunk = [ data ]
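A sketch of this read process, assuming a hypothetical calldata layout; the signature and KZG verification steps are left as comments since the proof format is not specified here:

```python
from dataclasses import dataclass

# Hypothetical calldata layout, for illustration only:
#   [ offset (4 bytes) | length (4 bytes) | KZG range proof (48 bytes) | signature (65 bytes) ]

@dataclass
class SignedRangeProof:
    offset: int
    length: int
    kzg_proof: bytes
    signature: bytes

def parse_call_data(call_data: bytes) -> SignedRangeProof:
    offset = int.from_bytes(call_data[0:4], "big")
    length = int.from_bytes(call_data[4:8], "big")
    return SignedRangeProof(offset, length, call_data[8:56], call_data[56:121])

def read_rollup_chunk(call_data: bytes, blob: bytes) -> bytes:
    """Extract the rollup's chunk; the verification steps are elided."""
    p = parse_call_data(call_data)
    # 1. verify p.signature recovers the rollup's sequencer address          (elided)
    # 2. verify p.kzg_proof binds blob[p.offset : p.offset + p.length]
    #    to the blob's versioned hash                                         (elided)
    return blob[p.offset : p.offset + p.length]
```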
Overview of how some major rollups live today publish and consume data pre EIP-4844 and post EIP-4844.
Relevant to the topic, Dankrad notes on ideas to integrate EIP-4844 into rollups: https://notes.ethereum.org/@dankrad/kzg_commitments_in_proofs
The current architecture can't handle invalid sequencer submissions gracefully. Thus, the sequencer role is permissioned. The smart contract guarantees that the data hash is correct, and that all data is eventually processed by the prover with a hash chain.
In PolygonZkEVM.sequenceBatches the hash chain is computed with an accumulator consisting of the previous accumulator hash and the current transactions (ref PolygonZkEVM.sol#L572-L581):
// Calculate next accumulated input hash
currentAccInputHash = keccak256(
abi.encodePacked(
currentAccInputHash,
currentTransactionsHash,
currentBatch.globalExitRoot,
currentBatch.timestamp,
l2Coinbase
)
);
The only persisted data to link with the future verifier submission is the accumulator hash of this batch (ref [PolygonZkEVM.sol#L598-L602](https://github.com/0xPolygonHermez/zkevm-contracts/blob/aa4608049f65ffb3b9ebc3672b52a5445ea00bde/contracts/PolygonZkEVM.sol#L598-L602)):
sequencedBatches[currentBatchSequenced] = SequencedBatchData({
accInputHash: currentAccInputHash,
sequencedTimestamp: uint64(block.timestamp),
previousLastBatchSequenced: lastBatchSequenced
});
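For reference, the same accumulator step can be replayed off-chain; a minimal sketch using web3.py (function and argument names are mine):

```python
from web3 import Web3

def next_acc_input_hash(
    current_acc_input_hash: bytes,   # previous accumulator value (bytes32)
    transactions_hash: bytes,        # hash of this batch's transaction data (bytes32)
    global_exit_root: bytes,         # bytes32
    timestamp: int,                  # uint64
    l2_coinbase: str,                # address
) -> bytes:
    """Off-chain replay of the accumulator step in PolygonZkEVM.sequenceBatches
    (mirrors the abi.encodePacked + keccak256 in the snippet above)."""
    return Web3.solidity_keccak(
        ["bytes32", "bytes32", "bytes32", "uint64", "address"],
        [current_acc_input_hash, transactions_hash, global_exit_root, timestamp, l2_coinbase],
    )
```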
In PolygonZkEVM.verifyBatchesTrustedAggregator a verifier posts a new root with its proof. The actual call to verify the proof is rollupVerifier.verifyProof(proof, [inputSnark]) (ref), where inputSnark is computed from data submitted by the sequencer as (ref [PolygonZkEVM.sol#L1646-L1675](https://github.com/0xPolygonHermez/zkevm-contracts/blob/aa4608049f65ffb3b9ebc3672b52a5445ea00bde/contracts/PolygonZkEVM.sol#L1646-L1675)):
bytes32 oldAccInputHash = sequencedBatches[initNumBatch].accInputHash;
bytes32 newAccInputHash = sequencedBatches[finalNewBatch].accInputHash;
bytes memory inputSnark = abi.encodePacked(
msg.sender, // to ensure who is the coinbase
oldStateRoot, // retrieved from smart contract storage
oldAccInputHash, // retrieved from smart contract storage
initNumBatch, // submitted by the prover
newStateRoot, // submitted by the prover
newAccInputHash, // retrieved from smart contract storage
newLocalExitRoot, // submitted by the prover
finalNewBatch // submitted by the prover
);
SequencerInbox.addSequencerL2BatchFromOrigin
https://github.com/OffchainLabs/nitro-contracts/blob/695750067b2b7658556bdf61ec8cf16132d83dd0/src/bridge/SequencerInbox.sol#L195
Post EIP-4844
WIP diff https://github.com/OffchainLabs/nitro-contracts/compare/main...4844-blobasefee
Versioned hashes are read from an abstract interface, not yet defined. Ref src/bridge/SequencerInbox.sol#L316
What data is used here? It's not the RLP serialized transaction but the contract call data.
bytes32[] memory dataHashes = dataHashReader.getDataHashes();
return (
keccak256(bytes.concat(header, data, abi.encodePacked(dataHashes))),
timeBounds
);
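The same hash can be reproduced off-chain; a minimal sketch using eth_utils (for a bytes32[] array, abi.encodePacked is simply the concatenation of the 32-byte values):

```python
from typing import List
from eth_utils import keccak

def batch_data_hash(header: bytes, data: bytes, data_hashes: List[bytes]) -> bytes:
    """Mirror of keccak256(bytes.concat(header, data, abi.encodePacked(dataHashes)))."""
    return keccak(header + data + b"".join(data_hashes))
```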
PR for EIP-4844 support (no smart contract changes)
https://github.com/ethereum-optimism/optimism/pull/7349
Optimism construction pre-EIP-4844
Paper by Davide Crapis, Edward W. Felten, and Akaki Mamageishvili
https://arxiv.org/pdf/2310.01155.pdf
TODO: TL;DR. The paper addresses:
(1) When should a rollup use the data market versus the main market for sending data to L1?
(2) Is there a substantial efficiency gain in aggregating data from multiple rollups, and what happens to the data market fees?
(3) When would rollups decide to aggregate, and what is the optimal cost-sharing scheme?
A data publisher has a data chunk data_i, encoded as a sequence of field elements, and signs it.
The aggregator concatenates data chunks of different sizes and computes an overall commitment:
[ data_0 ][ data_1 ][ data_2 ]
[ data ]
The aggregator must allow a verifier to convince itself that the signed chunk data_i is included in the data committed to by the overall commitment, as sketched below.
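A minimal sketch of the aggregation step, assuming a simple concatenate-then-zero-pad layout (the padding/alignment rules and names are assumptions, not a spec):

```python
from typing import Dict, List, Tuple

FIELD_ELEMENTS_PER_BLOB = 4096
BLOB_SIZE = FIELD_ELEMENTS_PER_BLOB * 32  # 131072 bytes

def aggregate_chunks(chunks: List[Tuple[str, bytes]]) -> Tuple[bytes, Dict[str, Tuple[int, int]]]:
    """Concatenate publishers' chunks into one blob and record each chunk's
    (offset, length), which is what the per-publisher proofs must later bind
    to the overall commitment."""
    blob = b""
    ranges: Dict[str, Tuple[int, int]] = {}
    for publisher, data in chunks:
        ranges[publisher] = (len(blob), len(data))
        blob += data
    if len(blob) > BLOB_SIZE:
        raise ValueError("chunks exceed a single blob")
    blob += b"\x00" * (BLOB_SIZE - len(blob))  # zero-pad to the fixed blob size
    return blob, ranges
```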
Terminology
The interpolation polynomials are evaluated at the roots of unity; in our example the evaluation points are $z_i = \omega^i$.
The verifier is given the commitment $C = [f(\tau)]_1$ to the blob polynomial $f(X)$.
Given a subset of points $\{(z_i, y_i)\}$, the prover computes the interpolation polynomial $I(X)$ with $I(z_i) = y_i$, the zero polynomial $Z(X) = \prod_i (X - z_i)$, and the quotient $q(X) = \frac{f(X) - I(X)}{Z(X)}$.
For this polynomial to exist (can't divide by zero), $Z(X)$ must divide $f(X) - I(X)$, i.e. $f(z_i) = y_i$ for every point in the subset.
The verifier is given the proof $\pi = [q(\tau)]_1$ together with the claimed points $\{(z_i, y_i)\}$, and checks the pairing equation $e(C - [I(\tau)]_1, [1]_2) = e(\pi, [Z(\tau)]_2)$.
Refer to the "Multiproofs" section of Dankrad's notes or arnaucube's batch proof notes for more details.
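To make the divisibility condition concrete, here is a small pure-Python sketch of the quotient construction (names and the toy example are mine; it works on coefficient lists over the BLS12-381 scalar field and does not touch the trusted setup or pairings):

```python
MODULUS = 52435875175126190479447740508185965837690552500527637822603658699938581184513

def poly_add(a, b):
    n = max(len(a), len(b))
    return [((a[i] if i < len(a) else 0) + (b[i] if i < len(b) else 0)) % MODULUS
            for i in range(n)]

def poly_scale(a, k):
    return [c * k % MODULUS for c in a]

def poly_mul(a, b):
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] = (out[i + j] + ai * bj) % MODULUS
    return out

def poly_divmod(num, den):
    """Long division of coefficient lists (lowest degree first)."""
    num = list(num)
    q = [0] * max(1, len(num) - len(den) + 1)
    inv_lead = pow(den[-1], -1, MODULUS)
    for i in range(len(num) - len(den), -1, -1):
        q[i] = num[i + len(den) - 1] * inv_lead % MODULUS
        for j, dj in enumerate(den):
            num[i + j] = (num[i + j] - q[i] * dj) % MODULUS
    return q, num  # quotient, remainder

def zero_poly(zs):
    """Z(X) = prod_i (X - z_i)"""
    Z = [1]
    for z in zs:
        Z = poly_mul(Z, [(-z) % MODULUS, 1])
    return Z

def interpolate(zs, ys):
    """Lagrange interpolation: I(z_i) = y_i"""
    I = [0]
    for i, (zi, yi) in enumerate(zip(zs, ys)):
        li = [1]
        for j, zj in enumerate(zs):
            if j != i:
                li = poly_mul(li, [(-zj) % MODULUS, 1])
                li = poly_scale(li, pow((zi - zj) % MODULUS, -1, MODULUS))
        I = poly_add(I, poly_scale(li, yi))
    return I

def quotient(f, zs, ys):
    """q(X) = (f(X) - I(X)) / Z(X); exists iff f(z_i) = y_i for all i."""
    numerator = poly_add(f, poly_scale(interpolate(zs, ys), MODULUS - 1))
    q, r = poly_divmod(numerator, zero_poly(zs))
    assert all(c == 0 for c in r), "claimed evaluations do not lie on f(X)"
    return q

# Example: f(X) = 5 + 3X, claimed point (z, y) = (2, 11) -> quotient is the constant 3.
assert quotient([5, 3], [2], [11]) == [3]
```

Committing to q(X) with the trusted setup yields the multiproof group element checked by the pairing equation above.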