[WIP] Arbitrum x EigenDA: DA Failover

Purpose

  • Design subsystem logic for rolling over to other DA providers in the event that EigenDA is deemed unavailable
  • Evaluate system tradeoffs to optimize for a best fit solution

Considerations

Security

  • Each DA layer has its own MaxBatchSize that it can support; exceeding it could result in failure:
    • Liveness: the DA layer could repeatedly reject the batch if the max batch size isn't reset for the fallback destination
  • Each DA layer has its own confirmation latency (t), i.e., the time for a batch to be confirmed on the DA layer. Simply, max_throughput = DA.MaxBatchSize / t. When falling back, max_throughput could decrease, creating backpressure on the existing unsafe sequencer backlog. With high enough backpressure or a growing sync head delta (i.e., # of messages in backlog - # of messages confirmed), a reorg could occur between the unsafe and confirmed rollup chain heads.
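The throughput relation above can be made concrete with a small sketch. The `DAProfile` type, the helper names, and all numbers below are illustrative assumptions, not measurements of any real DA layer; the point is only to show how a lower max_throughput on the fallback turns into backlog growth.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DAProfile:
    """Hypothetical description of one DA destination."""
    name: str
    max_batch_size: int          # bytes accepted per batch
    confirmation_latency: float  # seconds until a batch is confirmed (t)

    def max_throughput(self) -> float:
        # max_throughput = DA.MaxBatchSize / t, in bytes per second
        return self.max_batch_size / self.confirmation_latency


def backlog_growth(ingest_rate: float, da: DAProfile) -> float:
    """Bytes per second by which the unsafe backlog grows (0 if the DA keeps up)."""
    return max(0.0, ingest_rate - da.max_throughput())


def sync_head_delta(backlog_messages: int, confirmed_messages: int) -> int:
    # Messages in backlog minus messages confirmed.
    return backlog_messages - confirmed_messages


# Illustrative numbers only.
primary = DAProfile("primary", max_batch_size=4_000_000, confirmation_latency=10.0)
fallback = DAProfile("fallback", max_batch_size=100_000, confirmation_latency=20.0)
```

With a sequencer ingest rate of 50,000 bytes/s, `backlog_growth` is zero on the hypothetical primary but 45,000 bytes/s on the hypothetical fallback, which is the backpressure scenario described above.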

Node Operator

  • Functionality should be opt-in (i.e., feature-guarded)
  • Fallback destinations should be user-defined, with all native Arbitrum DA options supported (i.e., anytrust, 4844, ETHDA)
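One way to picture these two requirements is an opt-in config object that validates user-supplied destinations. This is a minimal sketch: `FailoverConfig`, its field names, and the lowercase destination identifiers are hypothetical and do not correspond to actual node flags.

```python
from dataclasses import dataclass, field

# Destination names taken from the list above; the identifiers are illustrative.
SUPPORTED_FALLBACKS = ("anytrust", "4844", "ethda")


@dataclass
class FailoverConfig:
    enabled: bool = False  # opt-in: failover is off unless explicitly enabled
    destinations: list = field(default_factory=list)  # ordered by preference

    def validate(self) -> None:
        unknown = [d for d in self.destinations if d not in SUPPORTED_FALLBACKS]
        if unknown:
            raise ValueError(f"unsupported fallback destinations: {unknown}")
        if self.enabled and not self.destinations:
            raise ValueError("failover enabled but no fallback destination set")
```

Defaulting `enabled` to `False` keeps existing deployments unchanged unless the operator explicitly turns the feature on.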

Developer

  • Solution should introduce little long-term maintenance overhead and be sufficiently resilient to future changes introduced by OCL
  • Falling back should complete in under a second

Economics

  • Different DA providers use different SequencerInbox entrypoint functions with different cost amortization logic. Hot switching from EigenDA (no dispersal amortization) to e.g. calldata would result in a domain switch for the L2 tx pricing model.
  • Different DA providers use different pricing models for settling to their DA layer (e.g., 4844 uses the blob fee model, whereas calldata uses EIP-1559 pricing). This could result in unexpected shifts in the batch poster wallet's spending rate.
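To illustrate why the spending rate can shift, the sketch below compares the two fee markets. The gas constants come from EIP-2028 (calldata byte costs) and EIP-4844 (gas per blob); the function names and the fee inputs are hypothetical. It deliberately ignores the intrinsic transaction gas, the execution gas of the SequencerInbox call, and any calldata floor pricing, so treat it as a rough comparison rather than a billing model.

```python
CALLDATA_GAS_ZERO_BYTE = 4      # EIP-2028
CALLDATA_GAS_NONZERO_BYTE = 16  # EIP-2028
GAS_PER_BLOB = 2**17            # EIP-4844: blob gas consumed per blob


def calldata_gas(batch: bytes) -> int:
    """Gas charged for the batch bytes when posted as calldata."""
    zeros = batch.count(0)
    nonzeros = len(batch) - zeros
    return zeros * CALLDATA_GAS_ZERO_BYTE + nonzeros * CALLDATA_GAS_NONZERO_BYTE


def calldata_fee_wei(batch: bytes, base_fee_per_gas: int, priority_fee_per_gas: int) -> int:
    # EIP-1559 market: every gas unit pays base fee plus priority fee.
    return calldata_gas(batch) * (base_fee_per_gas + priority_fee_per_gas)


def blob_fee_wei(num_blobs: int, blob_base_fee: int) -> int:
    # Blob market: priced per blob, independent of the execution base fee.
    return num_blobs * GAS_PER_BLOB * blob_base_fee
```

Because the blob base fee and the execution base fee move independently, the same batch can be cheap on one path and expensive on the other at any given moment.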

Procedure

Please refer to the following PR for the actual code.

  1. Try dispersing to EigenDA until a 503 Service Unavailable status code is received.
  2. If EigenDAFailover is enabled, check whether the batch size is greater than the maximum limit for the fallback destination (i.e., anytrust, ethDA):
    a. If true, return an error, causing the batchPoster to retry on its event loop at the next ticker hit
    b. If false, submit to the fallback native batch posting destination within the same event loop invocation
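The steps above can be sketched as a single function. This is not the code from the PR: `post_batch`, the two exception types, and the callback parameters are hypothetical stand-ins, and the 503 response is modeled as a `ServiceUnavailable` exception.

```python
from typing import Callable


class ServiceUnavailable(Exception):
    """Stand-in for a 503 Service Unavailable response when dispersing to EigenDA."""


class BatchTooLargeForFallback(Exception):
    """Raised so the caller retries on its next ticker hit (step 2a)."""


def post_batch(
    batch: bytes,
    disperse_to_eigenda: Callable[[bytes], None],
    post_to_fallback: Callable[[bytes], None],
    failover_enabled: bool,
    fallback_max_batch_size: int,
) -> str:
    """Returns the name of the destination that accepted the batch."""
    try:
        disperse_to_eigenda(batch)  # step 1
        return "eigenda"
    except ServiceUnavailable:
        if not failover_enabled:
            raise  # feature guard off: surface the outage unchanged

    # Step 2: failover is enabled and EigenDA reported itself unavailable.
    if len(batch) > fallback_max_batch_size:
        raise BatchTooLargeForFallback(
            f"batch of {len(batch)} bytes exceeds fallback limit {fallback_max_batch_size}"
        )
    post_to_fallback(batch)  # step 2b, same invocation
    return "fallback"
```

Only the 503 signal triggers the fallback path here; any other dispersal error propagates to the caller untouched.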

Known Edge Cases

In the event that the batch exceeds the fallback limit, i.e., len(batch) > MaxBatchSizeAnyTrust || len(batch) > MaxBatchSize4844 || len(batch) > MaxBatchSizeCalldata (matching step 2a above), the batch poster will discard the batch and set a boolean field eigenDAFailoverETHDA, which it uses during the next invocation to post the batch to ETHDA. This incurs some latency but works fine when using a single batch poster. When using multiple batch posters, however, this results in an inconsistency where a batch poster can disperse to ETHDA when EigenDA is actually back online. This inconsistency is temporary and doesn't compromise the liveness or safety of the system.
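The multi-poster inconsistency can be reproduced with a toy model. It assumes the discard happens when the batch is too large for the fallback destination, as in step 2a of the procedure; the `BatchPoster` class and its `tick` method are hypothetical simplifications, with only the flag name borrowed from the text.

```python
class BatchPoster:
    """Toy model of one batch poster's per-tick decision."""

    def __init__(self, fallback_max_batch_size: int) -> None:
        self.fallback_max_batch_size = fallback_max_batch_size
        self.eigenda_failover_ethda = False  # mirrors eigenDAFailoverETHDA

    def tick(self, batch: bytes, eigenda_available: bool) -> str:
        if self.eigenda_failover_ethda:
            # Flag set on a previous tick: post to ETHDA without
            # re-checking whether EigenDA has recovered.
            self.eigenda_failover_ethda = False
            return "ethda"
        if eigenda_available:
            return "eigenda"
        if len(batch) > self.fallback_max_batch_size:
            # Discard and remember to use ETHDA on the next invocation.
            self.eigenda_failover_ethda = True
            return "discarded"
        return "fallback"


poster_a = BatchPoster(fallback_max_batch_size=10)
poster_b = BatchPoster(fallback_max_batch_size=10)

outcomes = [
    poster_a.tick(b"x" * 11, eigenda_available=False),  # outage, oversized batch
    poster_b.tick(b"abc", eigenda_available=True),      # EigenDA has recovered
    poster_a.tick(b"abc", eigenda_available=True),      # stale flag still applies
    poster_a.tick(b"abc", eigenda_available=True),      # flag cleared
]
```

Here `outcomes` ends up as `["discarded", "eigenda", "ethda", "eigenda"]`: poster A posts one batch to ETHDA while poster B is already back on EigenDA, and the two converge again on the following tick.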