# Blob sharing protocol

_Thanks @eduadiez, @aliasjesus, @kevaundray, @dankrad, @protolambda, @n1shantd, @rauljordaneth, @dmarzz for helpful discussions_

## Motivation

Blobs have a more rigid pricing model than transaction calldata: you can only buy data in chunks of 131072 bytes. To be economically efficient, rollups today can:

- Wait time $t$ to buffer 131072 bytes and fill an entire blob
- Post more frequently at intervals $t / k$, sharing the blob with other participants

Also, to perform atomic operations ZK rollups need to publish data together with a validity proof to achieve instant finality. If UX requirements demand frequent submissions, rollups may want to perform a state update before being able to fill an entire blob.

<img src="https://hackmd.io/_uploads/r1CEeWpE6.png" alt="drawing" width="450"/>

_usage pattern example_

## Overview

In its most basic form, the usage of the blob multiplexer service looks like the following sequence.

```sequence
Publishers->Multiplx: I am willing to pay X_1\nto include data_1 on chain
Publishers->Multiplx: I am willing to pay X_2\nto include data_2 on chain
Multiplx->Chain: Concat data_1 + data_2\n send as blob tx
Consumers->Chain: Filter and read\nPublisher's data
```

Let's introduce some requirements first. We'll discuss in detail later why they are necessary:

1. Minimum (ideally zero) overhead incurred by the user compared to self-publishing full blobs
2. Data submissions and their location should be authenticated by the user
3. Users should be able to filter and derive their published data from L1 cheaply

This protocol is _initially_ focused on sequencers as users. Today, all major rollups live require **permissioned sequencers**, i.e. only an authorized party is able to publish data. Thus, when relaying data publishing to a 3rd party there must exist some link back to the authorized sequencer entity.

Where possible a trustless protocol is preferred over a trusted solution. However, the requirements above make it quite hard to offer an economically attractive trustless protocol.

| | Trusted multiplexer | Trustless multiplexer |
| - | - | - |
| Payment settlement | off-chain, after the fact | on-chain, atomic |
| Cost per byte overhead | zero, or negligible | significant |
| Censorship resistance | Somewhat, possible with inclusion proofs | Somewhat, possible with inclusion proofs |
| Data origin authentication | Somewhat, with data signatures | Yes, required |

The design of a trustless protocol requires a key primitive: can you succinctly prove that $data_i$ belongs to the versioned hash of a commitment of concatenated data? Sections below describe attempts at solving this issue.

<img src="https://hackmd.io/_uploads/BkYGlba4p.png" alt="drawing" width="450"/>

_._

Without this primitive it's very difficult to build a trustless multiplexer layer, since you can not build trustless payments.

# Trusted multiplexer

## Payment settlement

It is trivial to offer multiple settlement options since the multiplexer is a trusted party. This could include:

- Pre-paid credits via on-chain or off-chain transactions
- Payment channels, updated after each successful inclusion
- FIAT subscription model, i.e. Infura or Pinata

## Data origin authentication

Rollups live today require permissioned sequencing. If data is published by a 3rd party (blob sharing protocol), there must exist a way to authenticate the data origin. Adding more complex authentication protocols increases the cost per byte.
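As a rough illustration of what authenticating the data origin could look like (the helper names and the `(hash, offset, length)` payload layout are assumptions for this sketch, not a spec), a publisher can sign its chunk together with the location it expects inside the shared blob, and a consumer can check the recovered address against the authorized sequencer:

```python
# Sketch: publisher signs (hash of chunk, offset, length); consumer re-derives the payload
# from the shared blob and verifies the signer is the rollup's authorized sequencer.
from hashlib import sha256

from eth_account import Account
from eth_account.messages import encode_defunct


def sign_chunk(private_key: str, chunk: bytes, offset: int, length: int) -> bytes:
    """Publisher side: commit to the chunk and its expected location with the sequencer key."""
    payload = sha256(chunk).digest() + offset.to_bytes(4, "big") + length.to_bytes(4, "big")
    return Account.sign_message(encode_defunct(primitive=payload), private_key).signature


def authenticate_chunk(expected_sequencer: str, blob: bytes, offset: int, length: int, signature: bytes) -> bytes:
    """Consumer side: extract the chunk from the shared blob and check the data origin."""
    chunk = blob[offset : offset + length]
    payload = sha256(chunk).digest() + offset.to_bytes(4, "big") + length.to_bytes(4, "big")
    recovered = Account.recover_message(encode_defunct(primitive=payload), signature=signature)
    if recovered.lower() != expected_sequencer.lower():
        raise ValueError("chunk not signed by the authorized sequencer")
    return chunk
```

In the pointer variant below the publisher authenticates data and location itself in a second transaction, while in the single-tx variant the multiplexer forwards the publisher's signature along with the data.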
### 2nd transaction to authenticate pointers

The multiplexer just posts data, and expects consuming users to perform a second action to include an authenticated pointer to the previously posted data:

```sequence
Publisher->Multiplx: Post data X, Y
Note right of Multiplx: Buffer data
Multiplx->Chain: Send blob tx
Chain->Publisher: Verify data in blob
Publisher->Chain: Authenticate data + location
Chain->Consumer: Iterate publisher\npointers
```

To derive the L2 chain, simply iterate the authenticated pointers and resolve their data.

### Single tx with signature over data

No need for a 2nd transaction or additional communication roundtrip. However, an observer can't be convinced of blob authenticity without downloading the full blob.

```sequence
Publisher->Multiplx: Post data X + sig
Note right of Multiplx: Buffer data
Multiplx->Chain: Send blob tx
Chain->Consumer: Iterate Multiplx\ntxs with Publisher\nsignatures
```

To derive the L2 chain, iterate Multiplx txs and filter the relevant blob transactions with a valid signature from the user (rollup) address. The multiplexer is not a trusted party, and can publish invalid data. In section *TODO* we explore how to authenticate the data, but it is currently required to download the full blob to conclusively assert its correctness. The only link to the data is the blob versioned hash.

## Censorship resistance

In both a trusted and a trustless multiplexer, the publisher can't explicitly force the multiplexer into sending a blob transaction. Therefore the multiplexer can censor a publisher by ignoring its data publishing requests.

A multiplexer is positively incentivized to include data publishing requests to collect fees. But this may not be sufficient to prevent profitable censorship attacks. In the trusted model, the social reputation of the company running the multiplexer adds an extra layer of protection.

A multiplexer can also be incentivized via penalties to not exclude data publishing requests. Consider the following protocol:

- Multiplexer stakes some capital
- Publisher makes a data publishing request available (i.e. includes it in a blockchain)
- After some timeout, the publisher triggers an optimistic claim of censorship for the request made available above
- The multiplexer has some sufficiently long interval to prove on-chain that the data of the publishing request is part of a versioned hash of a past canonical block
- If the multiplexer fails to dispute the censorship claim, its stake is slashed

The multiplexer is now disincentivized (up to its stake value) from censoring publishers. However, publishers should build their protocol such that both the multiplexer and they themselves can publish data.

# Trustless multiplexer

_WIP TODO_

Requires some mechanism to execute user payments conditional on blob inclusion.

**Trusted execution on Intel SGX**

An EthGlobal Istanbul '23 project is attempting this with the Flashbots SUAVE architecture: https://ethglobal.com/showcase/blob-merger-k7m1f

**On-chain proofs**

Requires some not-yet-figured-out primitives, and will likely be too expensive to do on L1.
- TODO: Figure out a protocol to prove that a versioned hash includes the data of a commitment signed by the user, before knowledge of the full blob

# Usage examples

## ZK rollup (Polygon ZK-evm)

ZK rollups have some extra requirements to optimize prover costs:

- The sequencer must be able to form a hash chain at the smart contract level, i.e. the aggregator must not be able to submit the batches out of order, ideally checked trustlessly
- The sequencer's on-chain smart contract must be able to test data integrity

Refer to [**How does Polygon ZK-evm publish / consume data**](#How-does-Polygon-ZK-evm-publish--consume-data) for details on their current architecture.

To optimize costs, on-chain ZK provers minimize the public inputs, typically using very few commitments (or a single one) to all data needed for the computation. Part of the verifier circuit includes checking that the commitment is correct. With EIP-4844, this single commitment must include the versioned hash as the sole link to transaction data.

**How to verify a data subset on an aggregated blob**

The cheapest strategy is to compute the commitment for the full data in the proof system's native field, and then do an equivalence proof (see [@dankrad/kzg_commitments_in_proofs](https://notes.ethereum.org/@dankrad/kzg_commitments_in_proofs)). Then extract your subset data of interest for the rest of the circuit execution.

While it requires ingesting some factor $k$ more data, according to @eduadiez this is not significant. Thus, the logic to handle DA on full blobs or partial blobs is the same: implement partial blob reading, and for the full-blob case set the offset to 0 and the length to the full blob.

_TODO rest of integration @aliasjesus, @eduadiez_

## Optimistic rollup (OP stack)

Currently the OP stack uses a very simple strategy to post data: send a blob transaction to some predetermined address. It authenticates the data origin with the transaction's signature. This architecture must change slightly to accommodate data published from a 3rd party (untrusted) account.

_TODO integration, @protolambda_

# Appendix: blob retrieval

### Versioned hash from EVM

https://eips.ethereum.org/EIPS/eip-4844#opcode-to-get-versioned-hashes

The versioned hash is available inside the EVM exclusively during the execution of its transaction. The `BLOBHASH` instruction reads an `index` argument and returns `tx.blob_versioned_hashes[index]`.

### Blob data

The Beacon API route [`getBlobSidecars`](https://github.com/ethereum/beacon-APIs/blob/ad873a0f8d2c2a7c587dd667274199b2eb00f3ba/apis/beacon/blob_sidecars/blob_sidecars.yaml) allows retrieving `BlobSidecar`s for a `block_id` and a set of blob `indices`.

```python
class BlobSidecar(Container):
    index: BlobIndex  # Index of blob in block
    blob: Blob
    kzg_commitment: KZGCommitment
    kzg_proof: KZGProof  # Allows for quick verification of kzg_commitment
    ...
```
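For illustration, a minimal retrieval sketch using this route (the local beacon node URL and the `requests` dependency are assumptions; field names follow the Beacon API `BlobSidecar` response). The route also accepts an `indices` query parameter to fetch only specific blobs, as noted above.

```python
# Fetch all blob sidecars of a block from a beacon node and pick out the blobs of interest.
import requests

BEACON_API = "http://localhost:5052"   # assumption: any beacon node exposing the standard Beacon API
block_id = "head"                      # or a slot number / block root
wanted_indices = {0, 1}

resp = requests.get(f"{BEACON_API}/eth/v1/beacon/blob_sidecars/{block_id}", timeout=10)
resp.raise_for_status()

for sidecar in resp.json()["data"]:
    if int(sidecar["index"]) not in wanted_indices:
        continue
    blob = bytes.fromhex(sidecar["blob"][2:])   # 0x-prefixed hex, 131072 bytes per blob
    print(sidecar["index"], sidecar["kzg_commitment"], len(blob))
```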
# Appendix: data authentication protocols

Append an authentication suffix to the blob transaction to get rid of the second consume tx and minimize intrinsic costs. Data readers need the capacity to discard invalid data.

Append in the send-blob-tx calldata a signature of the user to authenticate the data being submitted, and prove to the chain that the blob includes that data, i.e. that data[128:256] belongs to address Y.

**Rationale**

- **Invalid offset problem**: Since the sequencer initiates a later transaction after the multiplexer posts the data, it can just verify the integrity of the data off-chain, and publish the correct offset and data length

**Questions**:

- Could the sequencer contract just reference the original transaction with a proof to the historical block? More expensive, but would bypass the payment mechanism.
- Is the consume transaction really necessary?
- Proto is thinking strongly about full nodes also following the chain: how does your derive-L2-from-L1 function look, and is it efficient in terms of data relevant to you vs data downloaded?

### _construction 1_

Read process:

- Read send_blob_tx calldata
- Verify that the range proof matches the KZG commitment of the blob, without loading the full blob
  - Required to avoid replay of the header without including the data
- Verify that the sequencer signature is correct
  - Required to not read junk data from others
- Read the blob, only extracting the data that was proven in the range-proof
- `call_data = [ Signed KZG range-proof over chunk ] = sign_by_rollup_sequencer(range_proof(range info, KZG proof data))`
- `blob = [ chunk 0 ][ chunk 1 ] ...`
- `chunk = [ data ]`

# Appendix: how rollups publish / consume data

Overview of how some major rollups live today publish and consume data, pre EIP-4844 and post EIP-4844.

Relevant to the topic, Dankrad's notes on ideas to integrate EIP-4844 into rollups: https://notes.ethereum.org/@dankrad/kzg_commitments_in_proofs

## How does Polygon ZK-evm publish / consume data

- The sequencer creates batches
- At time $t_0$ the sequencer groups a list of batches and publishes it to the sequencer smart contract (calls [PolygonZkEVM.sequenceBatches()](https://github.com/0xPolygonHermez/zkevm-contracts/blob/aa4608049f65ffb3b9ebc3672b52a5445ea00bde/contracts/PolygonZkEVM.sol#L484))
- The prover watches the on-chain tx and starts producing proofs for those batches in parallel
- At time $t_1$ (~30 min after $t_0$) the prover publishes the final proof to the verifier smart contract

The current architecture can't handle invalid sequencer submissions gracefully. Thus, the sequencer role is permissioned. The smart contract guarantees that the data hash is correct, and that all data is eventually processed by the prover via a hash chain.

In [PolygonZkEVM.sequenceBatches](https://github.com/0xPolygonHermez/zkevm-contracts/blob/aa4608049f65ffb3b9ebc3672b52a5445ea00bde/contracts/PolygonZkEVM.sol#L484C14-L484C29) the hash chain is computed with an accumulator consisting of the previous accumulator hash and the current transactions (_ref [PolygonZkEVM.sol#L572-L581](https://github.com/0xPolygonHermez/zkevm-contracts/blob/aa4608049f65ffb3b9ebc3672b52a5445ea00bde/contracts/PolygonZkEVM.sol#L572-L581)_).
```solidity
// Calculate next accumulated input hash
currentAccInputHash = keccak256(
    abi.encodePacked(
        currentAccInputHash,
        currentTransactionsHash,
        currentBatch.globalExitRoot,
        currentBatch.timestamp,
        l2Coinbase
    )
);
```

The only data persisted to link with the future verifier submission is the accumulator hash of this batch (_ref [PolygonZkEVM.sol#L598-L602](https://github.com/0xPolygonHermez/zkevm-contracts/blob/aa4608049f65ffb3b9ebc3672b52a5445ea00bde/contracts/PolygonZkEVM.sol#L598-L602)_)

```solidity
sequencedBatches[currentBatchSequenced] = SequencedBatchData({
    accInputHash: currentAccInputHash,
    sequencedTimestamp: uint64(block.timestamp),
    previousLastBatchSequenced: lastBatchSequenced
});
```

In [PolygonZkEVM.verifyBatchesTrustedAggregator](https://github.com/0xPolygonHermez/zkevm-contracts/blob/aa4608049f65ffb3b9ebc3672b52a5445ea00bde/contracts/PolygonZkEVM.sol#L709C14-L709C44) a verifier posts a new root with its proof. The actual call to verify the proof is `rollupVerifier.verifyProof(proof, [inputSnark])` ([ref](https://github.com/0xPolygonHermez/zkevm-contracts/blob/aa4608049f65ffb3b9ebc3672b52a5445ea00bde/contracts/PolygonZkEVM.sol#L817)), where `inputSnark` is computed as follows (_ref [PolygonZkEVM.sol#L1646-L1675](https://github.com/0xPolygonHermez/zkevm-contracts/blob/aa4608049f65ffb3b9ebc3672b52a5445ea00bde/contracts/PolygonZkEVM.sol#L1646-L1675)_)

```solidity
bytes32 oldAccInputHash = sequencedBatches[initNumBatch].accInputHash;
bytes32 newAccInputHash = sequencedBatches[finalNewBatch].accInputHash;

bytes memory inputSnark = abi.encodePacked(
    msg.sender,       // to ensure who is the coinbase
    oldStateRoot,     // retrieved from smart contract storage
    oldAccInputHash,  // retrieved from smart contract storage
    initNumBatch,     // submitted by the prover
    newStateRoot,     // submitted by the prover
    newAccInputHash,  // retrieved from smart contract storage
    newLocalExitRoot, // submitted by the prover
    finalNewBatch     // submitted by the prover
);
```

## How does Arbitrum publish / consume data

SequencerInbox.addSequencerL2BatchFromOrigin
https://github.com/OffchainLabs/nitro-contracts/blob/695750067b2b7658556bdf61ec8cf16132d83dd0/src/bridge/SequencerInbox.sol#L195

**Post EIP-4844**

WIP diff https://github.com/OffchainLabs/nitro-contracts/compare/main...4844-blobasefee

Versioned hashes are read from an abstract interface, not yet defined. Ref [src/bridge/SequencerInbox.sol#L316](https://github.com/OffchainLabs/nitro-contracts/blob/21757edbd4b6937b9b7ac223a532cf5cf7d268a7/src/bridge/SequencerInbox.sol#L316)

- TODO: Why is `data` used here? It's not the RLP-serialized transaction but the contract call data

```solidity
bytes32[] memory dataHashes = dataHashReader.getDataHashes();
return (
    keccak256(bytes.concat(header, data, abi.encodePacked(dataHashes))),
    timeBounds
);
```

## How does OP stack publish / consume data

PR for EIP-4844 support (no smart contract changes) https://github.com/ethereum-optimism/optimism/pull/7349

_Optimism construction pre-EIP-4844:_

- inbox address, EOA
- batcher submits tx to inbox address
- verifier traverses tx list for dest: inbox address
- checks that the signature is from the batcher

# Appendix: EIP-4844 Economics and Rollup Strategies

Paper by Davide Crapis, Edward W. Felten, and Akaki Mamageishvili https://arxiv.org/pdf/2310.01155.pdf

_TODO_, but TL;DR:

(1) When should a rollup use the data market versus the main market for sending data to L1?
(2) Is there a substantial efficiency gain in aggregating data from multiple rollups, and what happens to the data market fees?
(3) When would rollups decide to aggregate, and what is the optimal cost-sharing scheme?

# Appendix: KZG commitment proofs for data subset

A data publisher has a data chunk `data_i` encoded as $(w^j, y_j)$ where $j ∈ 0,...,k-1$. It computes a commitment $C_{i}$ which it then signs. Multiple publishers send `data_i` and the signed $C_{i}$ to the aggregator. Each chunk may have a different $k$.

The aggregator concats data chunks of different sizes and computes an overall commitment $C$, which will be posted on chain as part of the blob transaction.

```
[ data_0 ][ data_1 ][ data_2 ]
[            data            ]
```

The aggregator must allow a verifier to convince itself that the signed $C_{i}$ commitment belongs to $C$. Let's define $C_{io}$ as the commitment to a data chunk offset by some $t$. The proof has two steps:

- Prove that $C_{i}$ equals $C_{io}$ with its argument multiplicatively shifted by $w^t$
- Prove that $C_{io}$ belongs to $C$

_Terminology_

- $f(x)$ is the interpolated polynomial over the concatenated data.
- $f_i(x)$ is such that $f_i(w^j) = y_j$ where $j ∈ 0,...,k-1$
- $f_{io}(x)$ is such that $f_{io}(w^{t+j}) = y_j$ where $j ∈ 0,...,k-1$
- $C = [f(τ)]$, and $C_i = [f_i(τ)]$, and $C_{io} = [f_{io}(τ)]$

### Proof that $C_i$ equals $C_{io}$ multiplicatively shifted by $w^t$

The interpolation polynomials are evaluated at the roots of unity, in our example $w^j$ where $j ∈ 0,...,k-1$. A root of unity can be shifted $t$ positions by multiplying it by $w^t$. $C_{i}$ and $C_{io}$ are commitments to polynomials over the same set of values, only with the evaluation domain multiplicatively shifted by $w^t$.

We need to prove that $f_{i}(x) = f_{io}(w^{t}x)$. With the Schwartz–Zippel lemma we can just prove that identity at a deterministic random point $r$, i.e. $f_{i}(r) = f_{io}(w^tr)$, where $r$ is computed from the commitments $C_{i}$ and $C_{io}$.

The verifier is given $t$, $C_{i}$, $C_{io}$, and evaluation proofs for $f_{i}(r)$ and $f_{io}(w^tr)$. Verification routine:

1. Compute $r$ from $C_{i}$ and $C_{io}$
2. Verify the evaluation proofs for $f_{i}(r)$ and $f_{io}(w^tr)$
3. Check $f_{i}(r) == f_{io}(w^tr)$

### Proof that $C_{io}$ belongs to $C$

Given a subset of points $(x_j, y_j)$ where $j ∈ 0,...,k-1$, prove that $f(x_j) = y_j$ for all $j$.

$f_{io}(x)$ is the interpolation polynomial over the point subset such that $f_{io}(x_j) = y_j$ for all $j$. $z(x)$ is the zero polynomial with $z(x_j) = 0$ for all $j$. $τ$ is from the trusted setup. We construct a quotient polynomial $q(x)$:

$$
q(x) = {{ f(x) - f_{io}(x) } \over { z(x) }}
$$

For this polynomial to exist (we can't divide by zero), the numerator must vanish wherever $z(x)$ does, i.e. $f(x_j) - f_{io}(x_j) = 0$ for all $j$. The proof is

$$
π = [q(τ)]
$$

The verifier is given $π$, $C$, $C_{io}$. Verification routine:

1. Compute the zero polynomial $z(x)$ and compute $[z(τ)]$
2. Do the pairing check $e(π, [z(τ)]) == e([f(τ)] - [f_{io}(τ)], H)$, where $H$ is the generator of the second pairing group

_Refer to [Dankrad's notes](https://dankradfeist.de/ethereum/2020/06/16/kate-polynomial-commitments.html) "Multiproofs" section or [arnaucube's batch proof notes](https://arnaucube.com/blog/kzg-batch-proof.html#batch-proof) for more details._
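The shift identity of the first proof can be exercised numerically without any pairing machinery. The sketch below uses the BLS12-381 scalar field constants from EIP-4844; the toy domain size, chunk values, offset $t$, and the hash standing in for the Fiat–Shamir derivation of $r$ are illustrative assumptions. It interpolates the same chunk over its own positions and over its offset positions, then asserts $f_i(r) = f_{io}(w^t r)$.

```python
# Numeric check of f_i(x) = f_io(w^t * x) at a pseudo-random point r.
# Only the polynomial identity is checked; commitments and evaluation proofs are out of scope.
from hashlib import sha256

MODULUS = 0x73EDA753299D7D483339D80809A1D80553BD402FFFE5BFEFFFFFFFF00000001  # BLS12-381 scalar field
PRIMITIVE_ROOT = 7


def root_of_unity(order: int) -> int:
    assert (MODULUS - 1) % order == 0
    return pow(PRIMITIVE_ROOT, (MODULUS - 1) // order, MODULUS)


def lagrange_eval(xs, ys, x):
    """Evaluate the unique degree < len(xs) polynomial through (xs, ys) at x, mod MODULUS."""
    total = 0
    for j, (xj, yj) in enumerate(zip(xs, ys)):
        num, den = 1, 1
        for m, xm in enumerate(xs):
            if m != j:
                num = num * (x - xm) % MODULUS
                den = den * (xj - xm) % MODULUS
        total = (total + yj * num * pow(den, -1, MODULUS)) % MODULUS
    return total


# Toy blob domain of size n = 8; the publisher's chunk has k = 3 values placed at offset t = 2.
n, k, t = 8, 3, 2
w = root_of_unity(n)
chunk = [11, 22, 33]

xs_i = [pow(w, j, MODULUS) for j in range(k)]        # publisher's own domain: w^0 .. w^{k-1}
xs_io = [pow(w, t + j, MODULUS) for j in range(k)]   # positions the chunk occupies inside the blob

# Deterministic "random" point; the real protocol would derive r from C_i and C_io.
r = int.from_bytes(sha256(b"C_i || C_io").digest(), "big") % MODULUS

lhs = lagrange_eval(xs_i, chunk, r)                                  # f_i(r)
rhs = lagrange_eval(xs_io, chunk, pow(w, t, MODULUS) * r % MODULUS)  # f_io(w^t * r)
assert lhs == rhs
print("f_i(r) == f_io(w^t * r) holds for honestly placed data")
```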