
Blob sharing protocol

Thanks @eduadiez, @aliasjesus, @kevaundray, @dankrad, @protolambda, @n1shantd, @rauljordaneth, @dmarzz for helpful discussions

Motivation

Blobs have a more rigid pricing model than transaction calldata: you can only buy data in chunks of 131072 bytes.

To be economically efficient, rollups today can:

  • Wait some time t to buffer 131072 bytes and fill an entire blob
  • Post more frequently at intervals t/k, sharing the blob with other participants

Also, to perform atomic operations, ZK rollups need to publish data together with a validity proof to achieve instant finality. If UX requirements demand frequent submissions, rollups may want to perform a state update before being able to fill an entire blob.

[Figure: usage pattern example]

Overview

In its most basic form, the usage of the blob multiplexer service looks like the following sequence.

[Sequence diagram: Publishers → Multiplx: "I am willing to pay X_1 to include data_1 on chain", "I am willing to pay X_2 to include data_2 on chain"; Multiplx → Chain: concat data_1 + data_2 and send as blob tx; Consumers read the Chain, filter and read the Publishers' data]

Let's introduce some requirements first. We'll discuss in detail later why they are necessary:

  1. Minimum (ideally zero) overhead incurred by user compared to self-publishing full blobs
  2. Data submissions and their location should be authenticated by the user
  3. Users should be able to filter and derive their published data from L1 cheaply

This protocol is initially focused on sequencers as users. Today, all major live rollups require permissioned sequencers, i.e. only an authorized party is able to publish data. Thus, when relaying data publishing to a 3rd party there must exist some link back to the authorized sequencer entity.

Where possible, a trustless protocol is preferred over a trusted solution. However, the requirements above make it quite hard to offer an economically attractive trustless protocol.

                              Trusted multiplexer                        Trustless multiplexer
Payment settlement            off-chain, after the fact                 on-chain, atomic
Cost per byte overhead        zero, or negligible                       significant
Censorship resistance         Somewhat, possible with inclusion proofs  Somewhat, possible with inclusion proofs
Data origin authentication    Somewhat, with data signatures            Yes, required

The design of a trustless protocol requires a key primitive: can you succinctly prove that data_i belongs to the versioned hash of a commitment over the concatenated data? The sections below describe attempts at solving this issue.


Without this primitive it's very difficult to build a trustless multiplexer layer, since you can not build trustless payments.

Trusted multiplexer

Payment settlement

Trivial to offer multiple settlement options since the multiplexer is a trusted party. This could include:

  • Pre-paid credits via on-chain or off-chain transactions
  • Payment channels, updated after each successful inclusion
  • Fiat subscription model, e.g. Infura or Pinata

Data origin authentication

Rollups live today require permissioned sequencing. If data is published by a 3rd party (blob sharing protocol), there must exist a way to authenticate the data origin. Adding more complex authentication protocols increases the cost per byte.

2nd transaction to authenticate pointers

Multiplexer just posts data, and expects consuming users to perform a second action to include an authenticated pointer to the previously posted data:

[Sequence diagram: Publisher → Multiplx: post data X, Y; Multiplx buffers data and sends blob tx to Chain; Publisher verifies the data in the blob and authenticates data + location on Chain; Consumer iterates publisher pointers]

To derive the L2 chain, simply iterate the authenticated pointers and resolve their data.
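As a rough sketch (not part of the original design), derivation under this scheme could look like the following, assuming a hypothetical pointer format (versioned hash, offset, length) and a placeholder fetch_blob helper:

from dataclasses import dataclass
from typing import List

@dataclass
class BlobPointer:
    # Hypothetical pointer format posted by the rollup's authorized account
    versioned_hash: bytes  # blob published by the multiplexer
    offset: int            # start of this rollup's data inside the blob
    length: int            # number of bytes belonging to this rollup

def derive_l2_data(pointers: List[BlobPointer], fetch_blob) -> List[bytes]:
    """Resolve every authenticated pointer into the raw rollup data.

    `pointers` is assumed to be already filtered to the rollup's authorized
    sender; `fetch_blob` is a placeholder returning blob bytes for a
    versioned hash (e.g. via a beacon node, see the appendix on retrieval).
    """
    chunks = []
    for ptr in pointers:
        blob = fetch_blob(ptr.versioned_hash)
        chunks.append(blob[ptr.offset:ptr.offset + ptr.length])
    return chunks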

Single tx with signature over data

No need for a 2nd transaction or an additional communication roundtrip. However, an observer can't be convinced of blob authenticity without downloading the full blob.

[Sequence diagram: Publisher → Multiplx: post data X + sig; Multiplx buffers data and sends blob tx to Chain; Consumer iterates Multiplx txs with Publisher signatures]

To derive the L2 chain, iterate Multiplx txs and filter by relevant blob transactions with valid signature from the user (rollup) address.
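A minimal sketch of that derivation loop, assuming (this is not a specified format) that each multiplexer blob tx carries per-publisher tuples (offset, length, signature over keccak(data)) in its calldata; decode_calldata, recover_signer and fetch_blob are placeholder helpers:

from eth_utils import keccak  # keccak256

def derive_l2_data(multiplexer_blob_txs, rollup_address, decode_calldata,
                   recover_signer, fetch_blob):
    """Filter the multiplexer's blob txs down to chunks signed by the rollup."""
    chunks = []
    for tx in multiplexer_blob_txs:
        # The full blob must be downloaded to check the data itself (see below).
        blob = fetch_blob(tx.blob_versioned_hashes[0])
        for offset, length, sig in decode_calldata(tx.calldata):
            data = blob[offset:offset + length]
            if recover_signer(keccak(data), sig) == rollup_address:
                chunks.append(data)  # authenticated chunk for this rollup
    return chunks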

The multiplexer is not a trusted party, and can publish invalid data. In section TODO we explore how to authenticate the data, but it is currently required to download the full blob to conclusively assert its correctness. The only link to data is the blob versioned hash.

Censorship resistance

In both a trusted and a trustless multiplexer, the publisher can't explicitly force the multiplexer into sending a blob transaction. Therefore the multiplexer can censor a publisher by ignoring its data publishing requests.

A multiplexer is positively incentivized to include data publishing requests in order to collect fees. But this may not be sufficient to prevent profitable censorship attacks. In the trusted model, the social reputation of the company running the multiplexer adds an extra layer of protection.

A multiplexer can also be incentivized via penalties to not exclude data publishing requests. Consider the following protocol:

  • Multiplexer stakes some capital
  • Publisher makes a data publishing request available (i.e. include it in a blockchain)
  • After some timeout, publisher triggers an optimistic claim of censorship for the request made available above
  • Multiplexer has some sufficiently long interval to prove on-chain that the data of the publishing request is part of a versioned hash of a past canonical block
  • If the multiplexer fails to dispute the censorship claim, its stake is slashed

The multiplexer is now disincentivized (up to its stake value) from censoring publishers. However, publishers should build their protocol such that both the multiplexer and the publisher itself can publish data.
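A toy model of that claim game (stake amount, timeout and the inclusion check are all assumptions, not part of the protocol above):

import time

CHALLENGE_WINDOW = 7 * 24 * 3600  # assumed dispute interval, in seconds

class CensorshipGame:
    """Toy state machine for the optimistic censorship claim described above."""

    def __init__(self, stake: int, verify_inclusion):
        self.stake = stake                        # multiplexer collateral at risk
        self.verify_inclusion = verify_inclusion  # placeholder: data is part of a past versioned hash
        self.claims = {}                          # request_id -> claim timestamp

    def claim_censorship(self, request_id):
        # Publisher: the request was made available and never included.
        self.claims[request_id] = time.time()

    def dispute(self, request_id, inclusion_proof):
        # Multiplexer: prove the request's data is part of a canonical versioned hash.
        if self.verify_inclusion(request_id, inclusion_proof):
            self.claims.pop(request_id, None)

    def resolve(self, request_id) -> bool:
        # Anyone, after the window: an undisputed claim slashes the stake.
        started = self.claims.get(request_id)
        if started is not None and time.time() - started > CHALLENGE_WINDOW:
            self.stake = 0
            return True
        return False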

Trustless multiplexer

WIP TODO

Requires some mechanism to execute user payments conditional on blob inclusion.

Trusted execution on Intel SGX

An EthGlobal Istanbul '23 project is attempting this with the Flashbots SUAVE architecture: https://ethglobal.com/showcase/blob-merger-k7m1f

On-chain proofs

Requires some not-yet-figured-out primitives, and will likely be too expensive to do on L1.

  • TODO: Figure out a protocol to prove that a versioned hash includes the data behind a commitment signed by the user, before knowledge of the full blob

Usage examples

ZK rollup (Polygon ZK-evm)

ZK rollups have some extra requirements to optimize prover costs:

  • Sequencer must be able to form a hash chain at the smart-contract level, i.e. the aggregator must not be able to submit batches out of order, ideally checked trustlessly
  • The sequencer's on-chain smart contract must be able to check data integrity

Refer to How does Polygon ZK-evm publish / consume data for details on their current architecture.

To optimize costs, on-chain ZK provers minimize the public inputs, typically using very few commitments (or a single one) to all the data needed for the computation. Part of the verifier circuit includes checking that the commitment is correct. With EIP-4844, this single commitment must include the versioned hash as the sole link to transaction data.

How to verify data subset on aggregated blob

The cheapest strategy is to compute the KZG commitment for the full data in the proof system's native field, and then do an equivalence proof (see @dankrad/kzg_commitments_in_proofs). Then extract your subset data of interest for the rest of the circuit execution. While it requires ingesting some factor k more data, according to @eduadiez it's not significant.

Thus, the logic to handle DA on full blobs or partial blobs is the same: implement partial blob reading, and set the offset to 0 and the length to the full blob for the full-blob case.
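A sketch of that unified read path, assuming the chunk location is expressed as an (offset, length) pair in bytes:

BLOB_SIZE = 131072  # 4096 field elements * 32 bytes

def read_chunk(blob: bytes, offset: int = 0, length: int = BLOB_SIZE) -> bytes:
    """Extract a publisher's chunk from a (possibly shared) blob.

    The same code path covers both cases: a shared blob passes the real
    (offset, length) of the chunk, a self-published full blob passes
    offset=0 and length=BLOB_SIZE.
    """
    assert len(blob) == BLOB_SIZE, "EIP-4844 blobs have a fixed size"
    assert 0 <= offset and offset + length <= BLOB_SIZE, "chunk out of range"
    return blob[offset:offset + length]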

TODO rest of integration @aliasjesus, @eduadiez

Optimistic rollup (OP stack)

Currently the OP stack uses a very simple strategy to post data: send a blob transaction to some predetermined address. It authenticates the data origin with the transaction's signature.

This architecture must change slightly to accommodate data published from a 3rd party (untrusted) account.

TODO integration, @protolambda

Appendix: blob retrieval

Versioned hash from EVM

https://eips.ethereum.org/EIPS/eip-4844#opcode-to-get-versioned-hashes

Versioned hashes are available inside the EVM exclusively during the execution of the carrying blob transaction. The BLOBHASH instruction reads an index argument and returns tx.blob_versioned_hashes[index].

Blob data

The Beacon API route getBlobSidecars allows retrieving BlobSidecars for a given block_id and a set of blob indices:

class BlobSidecar(Container):
    index: BlobIndex  # Index of blob in block
    blob: Blob
    kzg_commitment: KZGCommitment
    kzg_proof: KZGProof  # Allows for quick verification of kzg_commitment
    ...
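For example, a consumer can fetch the sidecars over the standard Beacon API route (the node URL and indices below are placeholders):

import requests

def get_blob_sidecars(beacon_url: str, block_id: str, indices=None):
    """Fetch BlobSidecars for `block_id` (a slot, block root or 'head')."""
    params = {"indices": ",".join(map(str, indices))} if indices else {}
    resp = requests.get(
        f"{beacon_url}/eth/v1/beacon/blob_sidecars/{block_id}",
        params=params,
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json()["data"]  # index, blob, kzg_commitment, kzg_proof, ...

# e.g. sidecars = get_blob_sidecars("http://localhost:5052", "head", indices=[0, 1])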

Appendix: data authentication protocols

Append an authentication suffix to get rid of the separate consume tx, to minimize intrinsic costs. Data readers need the capacity to discard invalid data.

Append in the send blob tx calldata a signature of the user, to authenticate the data being submitted.

Prove to the chain that the blob includes that data, i.e. that data[128:256] belongs to address Y.

Rationale

  • Invalid offset problem: since the sequencer initiates a later transaction after the multiplexer posts the data, it can just verify the integrity of the data off-chain, and publish the correct offset and data length

Questions:

  • Could the sequencer contract just reference the original transaction with a proof to the historical block? More expensive, but would bypass the payment mechanism.
  • Is the consume transaction really necessary?
  • Proto is also thinking about full nodes following the chain: what does your derive-L2-from-L1 function look like, and is it efficient in terms of data relevant to you vs. data downloaded?

construction 1

Layout:

  • call_data = [ Signed KZG range-proof over chunk ] = sign_by_rollup_sequencer(range_proof(range info, KZG proof data))
  • blob = [ chunk 0 ][ chunk 1 ] ...
  • chunk = [ data ]

Read process:

  • Read send_blob_tx calldata
  • Verify that the range proof matches the KZG commitment of the blob, without loading the full blob
    • Required to avoid replay of the header without including the data
  • Verify that the sequencer signature is correct
    • Required to avoid reading junk data from others
  • Read the blob, only extracting the data that was proven in the range proof
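A sketch of that read process; decode_calldata, verify_range_proof, recover_signer and fetch_blob are placeholder helpers (a KZG range proof is not a standardized primitive today), and the range-proof fields are hypothetical:

def read_chunk_from_blob_tx(tx, sequencer_address, decode_calldata,
                            verify_range_proof, recover_signer, fetch_blob):
    """Follow the construction-1 read process for one send_blob_tx."""
    range_proof, signature = decode_calldata(tx.calldata)
    versioned_hash = tx.blob_versioned_hashes[0]  # only link to the blob

    # 1. The range proof must match the blob commitment (avoids replaying the
    #    header without including the data); no blob download needed here.
    if not verify_range_proof(versioned_hash, range_proof):
        return None
    # 2. The sequencer signature must cover the range proof (filters junk data).
    if recover_signer(range_proof.payload_hash, signature) != sequencer_address:
        return None
    # 3. Only now download the blob and extract the proven range.
    blob = fetch_blob(versioned_hash)
    return blob[range_proof.offset:range_proof.offset + range_proof.length]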


Appendix: how rollups publish / consume data

Overview of how some major rollups live today publish and consume data pre EIP-4844 and post EIP-4844.

Relevant to the topic, Dankrad's notes on ideas to integrate EIP-4844 into rollups: https://notes.ethereum.org/@dankrad/kzg_commitments_in_proofs

How does Polygon ZK-evm publish / consume data

  • Sequencer creates batches
  • At time t0 the sequencer groups a list of batches and publishes it to the sequencer smart contract (calls PolygonZkEVM.sequenceBatches())
  • The prover watches the on-chain tx and starts producing proofs for those batches in parallel
  • At time t1 (~ 30 min after t0) the prover publishes the final proof to the verifier smart contract

The current architecture can't handle invalid sequencer submissions gracefully. Thus, the sequencer role is permissioned. The smart contract guarantees that the data hash is correct, and that all data is eventually processed by the prover via a hash chain.

In PolygonZkEVM.sequenceBatches the hash chain is computed with an accumulator consisting of the previous accumulator hash, and the current transactions (ref PolygonZkEVM.sol#L572-L581).

// Calculate next accumulated input hash
currentAccInputHash = keccak256(
    abi.encodePacked(
        currentAccInputHash,
        currentTransactionsHash,
        currentBatch.globalExitRoot,
        currentBatch.timestamp,
        l2Coinbase
    )
);
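For reference, an off-chain follower can recompute the same accumulator with plain keccak over the packed encoding; a sketch using eth_utils, with field widths mirroring the Solidity types above:

from eth_utils import keccak

def next_acc_input_hash(acc_input_hash: bytes, transactions_hash: bytes,
                        global_exit_root: bytes, timestamp: int,
                        l2_coinbase: bytes) -> bytes:
    """Recompute keccak256(abi.encodePacked(...)) from sequenceBatches."""
    packed = (
        acc_input_hash                  # bytes32: previous accumulator
        + transactions_hash             # bytes32: hash of the batch transactions
        + global_exit_root              # bytes32
        + timestamp.to_bytes(8, "big")  # uint64
        + l2_coinbase                   # address, 20 bytes
    )
    return keccak(packed)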

The only data persisted to link with the future verifier submission is the accumulator hash of this batch (ref PolygonZkEVM.sol#L598-L602: https://github.com/0xPolygonHermez/zkevm-contracts/blob/aa4608049f65ffb3b9ebc3672b52a5445ea00bde/contracts/PolygonZkEVM.sol#L598-L602)

sequencedBatches[currentBatchSequenced] = SequencedBatchData({
    accInputHash: currentAccInputHash,
    sequencedTimestamp: uint64(block.timestamp),
    previousLastBatchSequenced: lastBatchSequenced
});

In PolygonZkEVM.verifyBatchesTrustedAggregator a verifier posts a new root with its proof. The actual call to verify the proof is rollupVerifier.verifyProof(proof, [inputSnark]) (ref), where inputSnark is computed with data submitted by the sequencer as follows (ref PolygonZkEVM.sol#L1646-L1675: https://github.com/0xPolygonHermez/zkevm-contracts/blob/aa4608049f65ffb3b9ebc3672b52a5445ea00bde/contracts/PolygonZkEVM.sol#L1646-L1675)

bytes32 oldAccInputHash = sequencedBatches[initNumBatch].accInputHash;
bytes32 newAccInputHash = sequencedBatches[finalNewBatch].accInputHash;
bytes memory inputSnark = abi.encodePacked(
    msg.sender,       // to ensure who is the coinbase
    oldStateRoot,     // retrieved from smart contract storage
    oldAccInputHash,  // retrieved from smart contract storage
    initNumBatch,     // submitted by the prover
    newStateRoot,     // submitted by the prover
    newAccInputHash,  // retrieved from smart contract storage
    newLocalExitRoot, // submitted by the prover
    finalNewBatch     // submitted by the prover
);

How does Arbitrum publish / consume data

SequencerInbox.addSequencerL2BatchFromOrigin
https://github.com/OffchainLabs/nitro-contracts/blob/695750067b2b7658556bdf61ec8cf16132d83dd0/src/bridge/SequencerInbox.sol#L195

Post EIP-4844

WIP diff https://github.com/OffchainLabs/nitro-contracts/compare/main4844-blobasefee

Versioned hashes are read from an abstract interface, not yet defined (ref src/bridge/SequencerInbox.sol#L316).

  • TODO: Why is data used here? It's not the RLP serialized transaction but the contract call data
bytes32[] memory dataHashes = dataHashReader.getDataHashes();
return (
    keccak256(bytes.concat(header, data, abi.encodePacked(dataHashes))),
    timeBounds
);

How does OP stack publish / consume data

PR for EIP-4844 support (no smart contract changes)

https://github.com/ethereum-optimism/optimism/pull/7349

Optimism construction pre-EIP-4844

  • inbox address, EOA
  • batcher submits tx to inbox address
  • verifier traverses tx list for dest: inbox address
  • checks that the signature is from the batcher (a sketch of this loop follows below)
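A sketch of that verifier loop (the block and tx objects, and the recover_sender helper, are placeholders):

def extract_batches(blocks, inbox_address, batcher_addresses, recover_sender):
    """Pre-EIP-4844 OP-stack-style derivation: batch data is the calldata of
    txs sent by an authorized batcher to the inbox EOA."""
    batches = []
    for block in blocks:
        for tx in block.transactions:
            if tx.to != inbox_address:
                continue  # not addressed to the inbox
            if recover_sender(tx) not in batcher_addresses:
                continue  # not signed by an authorized batcher
            batches.append(tx.data)  # frame data is the raw calldata
    return batches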

Appendix: EIP-4844 Economics and Rollup Strategies

Paper by Davide Crapis, Edward W. Felten, and Akaki Mamageishvili

https://arxiv.org/pdf/2310.01155.pdf

TODO, TL;DR. The paper asks:

(1) When should a rollup use the data market versus the main market for sending data to L1?
(2) Is there a substantial efficiency gain in aggregating data from multiple rollups, and what happens to the data market fees?
(3) When would rollups decide to aggregate, and what is the optimal cost-sharing scheme?

Appendix: KZG commitment proofs for data subset

A data publisher has a data chunk data_i encoded as points $(\omega^j, y_j)$ where $j \in \{0, \dots, k-1\}$. It computes a commitment $C_i$, which it then signs. Multiple publishers send data_i and the signed $C_i$ to the aggregator. Each chunk may have a different $k$.

The aggregator concatenates data chunks of different sizes and computes an overall commitment $C$, which will be posted on chain as part of the blob transaction.

[ data_0                ][ data_1 ][ data_2       ]
[ data                                            ]

The aggregator must allow a verifier to convince itself that the signed commitment $C_i$ belongs to $C$. Let's define $C_i^o$ as the commitment to a data chunk offset by some $t$. The proof has two steps:

  • Prove that $C_i$ equals $C_i^o$ multiplicatively shifted by $\omega^t$
  • Prove that $C_i^o$ belongs to $C$

Terminology

  • $f(x)$ is the interpolation polynomial over the concatenated data.
  • $f_i(x)$ is such that $f_i(\omega^j) = y_j$ where $j \in \{0, \dots, k-1\}$
  • $f_i^o(x)$ is such that $f_i^o(\omega^{t+j}) = y_j$ where $j \in \{0, \dots, k-1\}$
  • $C = [f(\tau)]$, $C_i = [f_i(\tau)]$, and $C_i^o = [f_i^o(\tau)]$

Proof that $C_i$ equals $C_i^o$ multiplicatively shifted by $\omega^t$

The interpolation polynomials are evaluated at the roots of unity, in our example $\omega^j$ where $j \in \{0, \dots, k-1\}$. A root of unity can be shifted $t$ positions by multiplying it by $\omega^t$.

$C_i^o$ and $C_i$ are commitments to polynomials over the same set of values, only with their evaluation domain multiplicatively shifted by $\omega^t$. We need to prove that $f_i(x) = f_i^o(\omega^t x)$. With the Schwartz–Zippel lemma it suffices to prove that identity at a deterministic random point $r$: $f_i(r) = f_i^o(\omega^t r)$, where $r$ is computed from the commitments $C_i^o$ and $C_i$.

The verifier is given $t$, $C_i^o$, $C_i$, and evaluation proofs for $f_i(r)$ and $f_i^o(\omega^t r)$. Verification routine:

  1. Compute $r$ from $C_i^o$ and $C_i$
  2. Verify the evaluation proofs for $f_i(r)$ and $f_i^o(\omega^t r)$
  3. Check $f_i(r) == f_i^o(\omega^t r)$
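To make the shift identity concrete, here is a tiny numeric check over a toy prime field (337, with 85 as an 8th root of unity) standing in for the BLS12-381 scalar field; it only illustrates the polynomial identity, not the commitments or evaluation proofs:

P = 337          # toy prime field for illustration only
OMEGA = 85       # primitive 8th root of unity mod 337
K, T = 3, 2      # chunk size k and offset t inside the blob

def lagrange_eval(xs, ys, x):
    """Evaluate the interpolation polynomial through (xs, ys) at x, mod P."""
    total = 0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        num, den = 1, 1
        for j, xj in enumerate(xs):
            if i != j:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, -1, P)) % P
    return total

ys = [10, 20, 30]                                     # chunk values y_j
xs_local = [pow(OMEGA, j, P) for j in range(K)]       # domain of f_i:   w^j
xs_offset = [pow(OMEGA, T + j, P) for j in range(K)]  # domain of f_i^o: w^(t+j)

r = 123  # stand-in for the deterministic random point
lhs = lagrange_eval(xs_local, ys, r)                          # f_i(r)
rhs = lagrange_eval(xs_offset, ys, pow(OMEGA, T, P) * r % P)  # f_i^o(w^t * r)
assert lhs == rhs, "shift identity f_i(x) == f_i^o(w^t x) failed"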

Proof that $C_i^o$ belongs to $C$

Given a subset of points $(x_j, y_j)$ where $j \in \{0, \dots, k-1\}$, prove that $f(x_j) = y_j$. $f_i^o(x)$ is the interpolation polynomial over the point subset, such that $f_i^o(x_j) = y_j$ for all $j$. $Z(x)$ is the zero polynomial: $Z(x_j) = 0$ for all $j$. $\tau$ is from the trusted setup. We construct a quotient polynomial $q(x)$:

$q(x) = \dfrac{f(x) - f_i^o(x)}{Z(x)}$

For this polynomial to exist (the division by $Z(x)$ must be exact), $f(x_j) - f_i^o(x_j) = 0$ for all $j$. The proof is

$\pi = [q(\tau)]$

The verifier is given $\pi$, $C$, $C_i^o$. Verification routine:

  1. Compute the zero polynomial $Z(x)$ and compute $[Z(\tau)]$
  2. Do the pairing check $e(\pi, [Z(\tau)]) == e([f(\tau)] - [f_i^o(\tau)], H)$
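Continuing the toy-field example, the sketch below builds q(x) explicitly, checks that the division by Z(x) is exact, and checks q(r)·Z(r) == f(r) - f_i^o(r) at a sample point, which is the relation the pairing check enforces over committed values (plain field arithmetic, no pairings or trusted setup here):

P = 337            # same toy field as the previous example
OMEGA = 85         # 8th root of unity mod 337
N, K, T = 8, 3, 2  # blob domain size, chunk size, chunk offset

def poly_mul(a, b):
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] = (out[i + j] + ai * bj) % P
    return out

def poly_eval(coeffs, x):
    acc = 0
    for c in reversed(coeffs):  # Horner's rule, coefficients in ascending order
        acc = (acc * x + c) % P
    return acc

def interpolate(xs, ys):
    """Coefficients of the interpolation polynomial through (xs, ys), mod P."""
    res = [0] * len(xs)
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        num, den = [1], 1
        for j, xj in enumerate(xs):
            if i != j:
                num = poly_mul(num, [(-xj) % P, 1])
                den = den * (xi - xj) % P
        scale = yi * pow(den, -1, P) % P
        for d, c in enumerate(num):
            res[d] = (res[d] + c * scale) % P
    return res

def poly_div(num, den):
    """Long division mod P; returns (quotient, remainder)."""
    num = num[:]
    q = [0] * (len(num) - len(den) + 1)
    inv_lead = pow(den[-1], -1, P)
    for i in range(len(q) - 1, -1, -1):
        q[i] = num[i + len(den) - 1] * inv_lead % P
        for j, d in enumerate(den):
            num[i + j] = (num[i + j] - q[i] * d) % P
    return q, num[:len(den) - 1]

blob_values = [7, 11, 10, 20, 30, 3, 5, 8]        # full blob; chunk [10, 20, 30] sits at offset T
domain = [pow(OMEGA, j, P) for j in range(N)]
xs_sub, ys_sub = domain[T:T + K], blob_values[T:T + K]

f = interpolate(domain, blob_values)              # C would be [f(tau)]
f_io = interpolate(xs_sub, ys_sub)                # C_i^o would be [f_i^o(tau)]
z = [1]
for xj in xs_sub:                                 # zero polynomial Z(x) over the subset
    z = poly_mul(z, [(-xj) % P, 1])

diff = [(a - b) % P for a, b in zip(f, f_io + [0] * (len(f) - len(f_io)))]
q, rem = poly_div(diff, z)
assert all(c == 0 for c in rem), "division not exact: the chunk is not in the blob"

r = 200  # sample point; the real check happens once, at tau, via the pairing
assert poly_eval(q, r) * poly_eval(z, r) % P == (poly_eval(f, r) - poly_eval(f_io, r)) % P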

Refer to Dankrad's notes "Multiproofs" sections or arnaucube's batch proof notes for more details.