
Engine API design space

The purpose of this document is to frame the design space of the Consensus API. It starts with the minimally required set of methods and gradually increases the complexity of the design by adding asynchrony, consistency checkpoints, and some methods that might improve UX and speed up recovery after failures but are not required for the core functionality.

The end goal is to come up with a solution that does not restrict future extensibility of the protocol and, at the same time, re-uses as much as possible of the existing JSON-RPC implementation over HTTP and websockets that we already have across clients.

The security of this API is critical, thus we propose that implementations expose this API on a port independent from the standard JSON-RPC user API and put the new set of methods into a new namespace. With this in mind, we use engine_ as a working prefix for the namespace, so that the API can be generalized between L1 and L2 in the future. The name is debatable and doesn't matter that much at this stage.

Previous work:

Encoding

We re-use the encoding notation from the existing user's JSON-RPC. For example, ExecutionPayload and PowBlock objects should be encoded as a JSON object with particular fields serialized according to the JSON-RPC notation, method parameters should be passed as JSON array or object depending on the particular case.

Encoding of exact message parameters and payloads is out of the scope of this document and should be defined later when we move towards the standard of the consensus API.
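
As an illustration only, the sketch below encodes an engine_preparePayload request using the conventions of the existing Ethereum JSON-RPC: quantities as 0x-prefixed hex without leading zeros, byte data as 0x-prefixed hex. The camelCase field names and the choice to pass the parameters as a single object inside the params array are assumptions, since the exact encoding is out of the scope of this document.

```python
import json


def encode_quantity(value: int) -> str:
    """QUANTITY: 0x-prefixed hex without leading zeros (zero is "0x0")."""
    if value < 0:
        raise ValueError("quantities must be non-negative")
    return hex(value)


def encode_data(value: bytes) -> str:
    """DATA: 0x-prefixed hex, two hex digits per byte."""
    return "0x" + value.hex()


def prepare_payload_request(request_id: int, parent_hash: bytes, timestamp: int,
                            random: bytes, fee_recipient: bytes,
                            payload_id: int) -> str:
    """Build a JSON-RPC 2.0 request string for engine_preparePayload."""
    if len(parent_hash) != 32 or len(random) != 32 or len(fee_recipient) != 20:
        raise ValueError("unexpected byte length for Hash32/Bytes32/Bytes20")
    params = {
        # Field names are hypothetical; this document does not fix them.
        "parentHash": encode_data(parent_hash),
        "timestamp": encode_quantity(timestamp),
        "random": encode_data(random),
        "feeRecipient": encode_data(fee_recipient),
        "payloadId": encode_quantity(payload_id),
    }
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "engine_preparePayload",
        "params": [params],
    })
```

The same two helpers would apply to ExecutionPayload and PowBlock objects once their field sets are defined.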

Minimal set of methods

  • engine_assemblePayload. Selects transactions from the mempool and produces a block on top of the given one
    • Supplanted by the pair of engine_preparePayload and engine_getPayload messages
  • engine_preparePayload. Notifies an execution client that the consensus client will need to propose a block at some point in the future and that the payload will be requested by the corresponding engine_getPayload call close to that point in time. This call also supplies the execution client with the inputs required to produce a payload, e.g. random.
    One of the purposes of this call is to give an execution client some time to prepare for the subsequent engine_getPayload call, which must be answered immediately with the most up-to-date version of the payload available at the time of the get call.
    One of the potential implementations of this method would be initiating a payload building process that builds an execution payload on top of a given parent with transactions selected from the mempool and arguments provided in the parameter set of the call. And then would keep this payload updated with the most recent state of the mempool.
    As the first action, it is recommended for implementations to build a payload with empty transaction set as a backup in order to be able to respond immediately if the corresponding engine_getPayload call happens e.g. 10ms after the prepare one which could be the case.
    Execution client should cancel the payload building process (if the payload is being continuously updated) once SECONDS_PER_SLOT seconds have passed since the timestamp specified in the call. This suggestion protects the execution client from wasting resources in the edge case when the related engine_getPayload call never happens. If the corresponding engine_getPayload call happens after the cancellation, it should be answered with an error.
    Related engine_getPayload call will likely happen very close to the timestamp. Execution client may use this information to choose the strategy of building a payload.
    A pair of engine_preparePayload and engine_getPayload related to each other are identified by the payload_id parameter. Consensus client implementations are free to use whatever value of the identifier they find reasonable.
    • In: parent_hash: Hash32, timestamp: uint64, random: Bytes32, fee_recipient: Bytes20, payload_id: uint64
    • Out: nothing
  • engine_getPayload. Given payload_id returns the most recent version of an execution payload that is available by the time of the call or responds with an error.
    This call must be answered immediately. The exception is the case when no version of the payload is ready yet; in this case there might be a slight delay before the response is sent. Execution client should create a payload with an empty transaction set to be able to respond as soon as possible.
    If there was no prior engine_preparePayload call with the corresponding payload_id, or the process of building the payload has been cancelled due to the timeout, then the execution client must respond with an error message.
    Execution client may stop the building process with the corresponding payload_id value after serving this call.
  • engine_executePayload. Verifies the payload according to the execution environment rule set (EIP-3675) and returns the status of the verification
    • In: ExecutionPayload object
    • Out: block_hash: Hash32, status: Enum [VALID | INVALID | KNOWN]
  • engine_consensusValidated (engine_consensusCommitted, engine_payloadCommitted). Communicates that full consensus validation of an execution payload is complete along with its corresponding status
    • In: block_hash: Hash32, status: Enum [VALID | INVALID]
    • Action: the block and state should be persisted if the status is VALID, and must be discarded otherwise
    • EIP-3675: maps on POS_CONSENSUS_VALIDATED event
  • engine_forkchoiceUpdated. Propagates the change in the fork choice to the execution client
    • In: head_block_hash: Hash32, finalized_block_hash: Hash32, confirmed_block_hash: Hash32
    • Action: the head of the chain and the finalized block must be updated according to the given data. The most recent confirmed block must be updated according to the given hash; in addition, ancestors of the confirmed block must be considered confirmed
    • In addition: blocks referenced as head and finalized_block must be accepted as valid blocks with respect to the consensus rules disregarding whether the corresponding engine_consensusValidated message has been already received or not
    • In the prior versions of the JSON-RPC spec, there are two fork choice messages, setHead and finalizeBlock. There is a possible corner case when two recent finalizeBlock and setHead messages refer to two different forks, causing a temporary discrepancy in the block tree. The fork choice update should therefore be applied atomically, hence the unification of these two messages.
    • EIP-3675: maps on POS_FORKCHOICE_UPDATED event, #3784
    • Note: As stated by EIP-3675, the finalized_block_hash must be stubbed with all zeros until the first execution block is finalized. This doesn't require any additional work on consensus clients, as the Merge fork should happen about a month before the transition, and from that moment until the transition block proposal every beacon block will contain a zeroed execution_payload with block.body.execution_payload.block_hash set to zeros, which satisfies the requirement of the finalized_block_hash stubbed with zeros.
    • confirmed_block_hash is needed to serve Ethereum JSON-RPC requests with the new set of block identifiers:
      • earliest Points to the genesis block
      • finalized Points to the most recent finalized block
      • safe / confirmed Points to the most recent confirmed block, i.e. the most recent block in the canonical chain that has been attested by >= 2/3 portion of total stake eligible for voting
        Note: Usually, in a healthy network it takes less than 12 seconds for an unsafe block to turn into a safe one.
      • latest Becomes an alias to safe. The rationale behind this change is to make latest provide at least the same guarantee that it currently has in the PoW network. Currently latest points to the block that is acceptable by the network from the consensus perspective which in the PoS network becomes the case only when the block has received at least 2/3 votes.
        Note: The block that has received 2/3 votes is very likely to stay in the canonical chain and get finalized eventually. It means that safe (and latest as an alias) will give a stronger guarantee to applications in the PoS chain, as it's very unlikely that this block gets re-orged.
      • unsafe / unconfirmed The head of the canonical chain regardless of the number of received attestations.
      • pending Pending block built on top of the unsafe one.
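
The pairing of engine_preparePayload and engine_getPayload described in the list above could be sketched roughly as follows. This is a minimal illustration, not a reference implementation: the class and method names are hypothetical, the mempool is reduced to a list of opaque transactions, the clock is passed in explicitly, and SECONDS_PER_SLOT is assumed to be 12.

```python
from dataclasses import dataclass, field
from typing import Dict, List

SECONDS_PER_SLOT = 12  # assumed value, for illustration only


class UnknownPayload(Exception):
    """No build process exists for the payload_id, or it timed out."""


@dataclass
class BuildJob:
    parent_hash: bytes
    timestamp: int
    random: bytes
    fee_recipient: bytes
    # Starts empty: this is the backup payload with no transactions.
    transactions: List[bytes] = field(default_factory=list)


class PayloadBuilder:
    def __init__(self) -> None:
        self.jobs: Dict[int, BuildJob] = {}

    def prepare_payload(self, parent_hash: bytes, timestamp: int, random: bytes,
                        fee_recipient: bytes, payload_id: int) -> None:
        # An empty payload is available right away, so a get call arriving
        # a few milliseconds later can still be answered immediately.
        self.jobs[payload_id] = BuildJob(parent_hash, timestamp, random, fee_recipient)

    def refresh(self, payload_id: int, mempool_txs: List[bytes], now: int) -> None:
        """Periodic update: pick up new transactions or cancel on timeout."""
        job = self.jobs.get(payload_id)
        if job is None:
            return
        if now - job.timestamp >= SECONDS_PER_SLOT:
            del self.jobs[payload_id]  # the get call never happened
            return
        job.transactions = list(mempool_txs)

    def get_payload(self, payload_id: int, now: int) -> BuildJob:
        # Serving the call also stops the building process.
        job = self.jobs.pop(payload_id, None)
        if job is None or now - job.timestamp >= SECONDS_PER_SLOT:
            raise UnknownPayload(f"no payload for id {payload_id}")
        return job
```

A real client would run refresh from its own scheduler and select transactions by fee; both are left out here.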

Block processing flow

Sequence diagrams of processing a valid block containing valid execution payload are outlined below.

Consensus rules are validated before the payload:

[Sequence diagram, Consensus and Execution. Events in order: Block B arrives; engine_executePayload(B.payload); B.payload validation starts; engine_consensusValidated(B.payload.hash); B.payload validation finishes; engine_executePayload(B.payload) response; B is persisted; B becomes the head; engine_forkchoiceUpdated(head: B.payload.hash)]

The payload is validated before consensus rules:

[Sequence diagram, Consensus and Execution. Events in order: Block B arrives; engine_executePayload(B.payload); B.payload validation happens; engine_executePayload(B.payload) response; engine_consensusValidated(B.payload.hash); B is persisted; B becomes the head; engine_forkchoiceUpdated(head: B.payload.hash)]

Block proposal flow

Sequence diagram of a block proposal with respect to the block processing flow is outlined below:

[Sequence diagram, Consensus and Execution. Events in order: Block B arrives; B.payload is validated; B is persisted; B becomes the head; engine_forkchoiceUpdated(head: B.payload.hash); engine_preparePayload(head: B.payload.hash); Turns on payload building on top of B.payload; engine_getPayload(head: B.payload.hash); Turns off payload building; engine_getPayload(B.payload.hash) response]

The case when the head is changed during the process of building a payload:

[Sequence diagram, Consensus and Execution. Events in order: P* becomes the head; engine_forkchoiceUpdated(head: P.payload.hash); engine_preparePayload(head: P.payload.hash); Turns on payload building on top of P.payload; C* becomes the head; engine_forkchoiceUpdated(head: C.payload.hash); engine_preparePayload(head: C.payload.hash); Restarts payload building with C.payload as parent; engine_getPayload(head: C.payload.hash); Turns off payload building; engine_getPayload(C.payload.hash) response]

*P parent, C child

Transition process

For the next two messages, see the rationale and more details in #2547.

  • engine_terminalTotalDifficultyUpdated. Propagates an override of the TERMINAL_TOTAL_DIFFICULTY (EIP-3675) to the execution client

    • In: terminal_total_difficulty: uint256
    • Action: TERMINAL_TOTAL_DIFFICULTY must be set as terminal_total_difficulty and take an effect according to the rules stated in EIP-3675
    • Scope: transition
  • engine_terminalPoWBlockOverride. Propagates the hash of the terminal PoW block. This takes precedence over the TERMINAL_TOTAL_DIFFICULTY rules. Not in the specification yet

    • In: block_hash: Hash32
    • Action: specified block must be considered as the terminal PoW block and take effect according to the rules stated in EIP-3675 (TBD).
    • Scope: transition
  • engine_getPowBlock. Given the hash returns the information of the PoW block

    • Supplanted by the following change to the spec: Verify terminal PoW block after call to state_transition #2595
    • In: block_hash: Hash32
    • Out: PowBlock object or nothing
    • Action: if the block is not available in the store, the execution client should request the block from the wire, execute it and add to the store
    • Rationale: a transition block (the first PoS block in the system) is the child of a terminal PoW block by definition, and the consensus client must prove the validity of the terminal PoW block that the particular transition block references; otherwise, the transition block can't be accepted by the node. In order to make a proper check, a consensus client needs information about the terminal PoW block and its parent. If either of these blocks has been accidentally missed (e.g. due to a gossip outage), then the node risks getting stuck in the middle of the transition process with a manual restart as the only recovery option.
    • Alternative: eth_getBlockByHash, misses network lookup which might be critical for the transition
    • Scope: transition

Note: methods and parameters that are scoped as transition throughout the doc are only required by the Merge transition process and will be deprecated after the Merge. They also do not make sense for tests and testnets starting in PoS mode (i.e. in the post-Merge network state) and are planned to be stubbed in these cases.
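
The check that needs both the terminal PoW block and its parent can be sketched as below. It assumes the terminal block condition from EIP-3675 (the block reaches TERMINAL_TOTAL_DIFFICULTY while its parent does not) and treats the hash supplied by engine_terminalPoWBlockOverride as taking precedence. The PowBlock fields and the function name are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PowBlock:
    # Hypothetical minimal field set; the real PowBlock object is not
    # defined in this document.
    block_hash: bytes
    parent_hash: bytes
    total_difficulty: int


def is_terminal_pow_block(block: PowBlock, parent: PowBlock,
                          terminal_total_difficulty: int,
                          terminal_block_hash_override: Optional[bytes] = None) -> bool:
    """Return True if `block` qualifies as the terminal PoW block."""
    # An explicit override takes precedence over the TTD rule.
    if terminal_block_hash_override is not None:
        return block.block_hash == terminal_block_hash_override
    # The supplied parent must actually be the parent of the block.
    if block.parent_hash != parent.block_hash:
        return False
    reached = block.total_difficulty >= terminal_total_difficulty
    parent_below = parent.total_difficulty < terminal_total_difficulty
    return reached and parent_below
```

If either block is missing from the store, the check cannot be made at all, which is the situation engine_getPowBlock was meant to address.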

Sync

  • engine_syncCheckpointSet. Propagates the header of the payload obtained from the state at the weak subjectivity checkpoint.
    • In: ExecutionPayloadHeader
    • Out: nothing
    • Action: Switch operating mode from PoW to PoS, i.e. enable changes specified by EIP-3675, scope: transition.
    • Action: Initiate sync process.
    • Maps on: checkpoint(H) in the merge-sync.md
  • engine_syncStatus. An execution client responds with this status to any request of the consensus layer while sync is in progress. An execution client may also send this message to signal the end of the sync process.
    • Params: sync: Enum [SNAP | BLOCK | FINISHED | ERROR], block_hash: Hash32, block_number: uint64, any other meaningful information. These params may have different meaning depending on the sync mode
    • Rationale: a convenient way of informing consensus client that block/state sync is in progress
    • Maps on: responses to final(B) and proc(B) in the merge-sync.md

Extended set of features

  • engine_switchToPos. Propagates the status of the network if it's been already switched to PoS (i.e. the Merge has happened)
    • Supplanted by engine_syncCheckpointSet
    • In: nothing
    • Action: execution client switches to PoS mode, i.e. sync, block propagation etc starts operating as if the transition has already happened
    • Rationale: execution client software will have to support both PoW and PoS modes and switch between them at runtime during the transition process. Therefore, the default operating mode of the software will be PoW, even after the Merge has already happened. For instance, shortly after the Merge the sync bootstrap process is likely to follow the network TD while it should already be driven by PoS. The consensus client may know in advance that the PoS switch has happened, e.g. by looking into the state at the weak subjectivity checkpoint, and may switch the execution client into the proper operating mode. Overall, this information should help a fresh node joining the network right after the Merge to bootstrap faster
    • Alternative: see engine_consensusStatus below
    • Much better alternative: see checkpoint(H) in the merge-sync.md
    • Scope: transition
  • engine_consensusStatus. Sends information on the state of the client to the execution side.
    • Params:
      • transition state: transition_total_difficulty: uint256, terminal_pow_block_hash: Hash32, scope: transition
      • block tree state: finalized_block_hash: Hash32, confirmed_block_hash: Hash32, head_block_hash: Hash32
    • Response: engine_executionStatus
    • Rationale: See Consistency checks section.
  • engine_executionStatus. Responds with information on the state of the execution client to either engine_consensusStatus or any other call if a consistency failure has occurred.
    • Params: finalized_block_hash: Hash32, confirmed_block_hash: Hash32, head_block_hash: Hash32
    • Rationale: See Consistency checks section.

Requirements to the underlying protocol

Asynchrony

Some messages can be heavy in terms of processing due to computations and network delays, for instance, engine_assemblePayload, engine_executePayload, engine_getPowBlock, resulting in a high variability in response time. Thus, we might want to have asynchrony out of the box and websockets as an underlying communication protocol should fit our needs.
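
One common way to get this kind of asynchrony over a single connection is to match responses to requests by the JSON-RPC id. The sketch below shows only that bookkeeping, with the transport left out. The class name is hypothetical, and the parameters and results in the demo are placeholders rather than real payloads.

```python
import asyncio
import itertools
import json
from typing import Any, Dict, Tuple


class PendingRequests:
    """Tracks in-flight JSON-RPC requests and resolves them by id."""

    def __init__(self) -> None:
        self._ids = itertools.count(1)
        self._futures: Dict[int, asyncio.Future] = {}

    def new_request(self, method: str, params: list) -> Tuple[str, asyncio.Future]:
        """Return the serialized request and a future for its response."""
        request_id = next(self._ids)
        future = asyncio.get_running_loop().create_future()
        self._futures[request_id] = future
        message = json.dumps({"jsonrpc": "2.0", "id": request_id,
                              "method": method, "params": params})
        return message, future

    def on_message(self, raw: str) -> None:
        """Called by the transport for every incoming response."""
        response = json.loads(raw)
        future = self._futures.pop(response.get("id"), None)
        if future is None:
            return  # unknown or already answered id
        if "error" in response:
            future.set_exception(RuntimeError(response["error"].get("message", "error")))
        else:
            future.set_result(response.get("result"))


async def demo() -> Tuple[Any, Any]:
    pending = PendingRequests()
    msg1, slow = pending.new_request("engine_executePayload", [{}])
    msg2, fast = pending.new_request("engine_getPayload", [1])
    # The response to the second request arrives before the first one.
    pending.on_message(json.dumps({"jsonrpc": "2.0", "id": json.loads(msg2)["id"],
                                   "result": "payload"}))
    pending.on_message(json.dumps({"jsonrpc": "2.0", "id": json.loads(msg1)["id"],
                                   "result": "VALID"}))
    return await slow, await fast
```

With this bookkeeping a slow engine_executePayload does not block a later engine_getPayload on the same connection.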

Bi-directional property

An execution client may need information from its consensus counterparty, e.g. during the sync process, and thus become the initiator of a message roundtrip. In this case the underlying communication protocol is required to be bi-directional. The websockets protocol should fit well here too.

Message ordering

Currently, this document and EIP-3675 assume that the message (or PoS event as per the EIP) delivery flow from the consensus to the execution layer maintains weak ordering.

For instance, the following message sequence is currently permitted:

[Sequence diagram, Consensus and Execution. Events in order: Block B arrives; engine_consensusValidated(B.payload.hash); engine_executePayload(B.payload); engine_executePayloadResponse(B.payload.hash); engine_getPayload(parent: B.payload.hash); engine_forkchoiceUpdated(head: B.payload.hash); engine_getPayloadResponse(parent: B.payload.hash); engine_preparePayload(parent: B.payload.hash)]

Currently, it is also not specified what to do if engine_forkchoiceUpdated(head: B.payload.hash) is delivered earlier than the payload it refers to. This document proposes to fall back to the consistency check with a further recovery procedure in such a case, which means that the order of some messages matters.

This section proposes to require a stricter ordering model for the ingress message pipeline of execution clients. Namely, consensus clients are required to maintain causal ordering in the egress message pipeline and to rely on the message ordering guarantee provided by the TCP protocol (messages are delivered in the same order as they were sent within a TCP session; note that HTTP doesn't always use the same TCP session for different requests).

No additional requirement to the execution client is proposed by this section. It may not follow the order that is maintained by the ingress message pipeline while processing these messages. But, if an execution client would do this it would guarantee causal consistency in a normal operating mode. The latter means that no consistency checks or such a mechanism would be required between periods of outage. Also, depending on its architecture, an execution client might want to follow this order and might even require this ordering model to be followed by its ingress message pipeline.

In order to maintain causal ordering, consensus clients will have to adhere to the following set of rules:

  • Outgoing messages referencing the same execution block must be sent in the following sequence:
    [Sequence diagram, Consensus and Execution. Events in order: Block B arrives; engine_executePayload(B.payload); engine_consensusValidated(B.payload.hash); engine_executePayloadResponse(B.payload.hash); engine_forkchoiceUpdated(head: B.payload.hash); engine_preparePayload(parent: B.payload.hash); engine_getPayload(parent: B.payload.hash)]
    Note: Building a block on top of the head of a non-canonical chain may be allowed; in that case engine_forkchoiceUpdated(head: B.payload.hash) should be dropped from this sequence.
  • The engine_forkchoiceUpdated message referencing a payload must be sent after the payload gets fully validated, specifically:
    [Sequence diagram, Consensus and Execution. Events in order: engine_executePayload(B.payload); engine_consensusValidated(B.payload.hash); engine_executePayloadResponse(B.payload.hash); engine_forkchoiceUpdated(head: B.payload.hash)]
  • engine_executePayload and engine_consensusValidated calls must respect the parent -> child relation, specifically:
    [Sequence diagram, Consensus and Execution. Events in order: engine_executePayload(Parent.payload); engine_consensusValidated(Parent.payload.hash); engine_executePayloadResponse(Parent.payload.hash); engine_executePayload(Child.payload); engine_consensusValidated(Child.payload.hash)]
  • The engine_getPayload call must be made only if its parameter set matches the set of the most recent engine_preparePayload call, specifically:
    [Sequence diagram, Consensus and Execution. Events in order: engine_preparePayload(Set1); engine_preparePayload(Set2); engine_getPayload(Set2)]
  • Maintain sequential order for engine_forkchoiceUpdated messages. It means that engine_forkchoiceUpdated messages must be sent respecting the order of their occurrence in the system, specifically:
    [Sequence diagram, Consensus and Execution. Events in order: P becomes the head; engine_forkchoiceUpdated(head: P.payload.hash); P' becomes the head; engine_forkchoiceUpdated(head: P'.payload.hash); C' becomes the head; engine_forkchoiceUpdated(head: C'.payload.hash); C becomes the head; engine_forkchoiceUpdated(head: C.payload.hash)]

If ingress message order doesn't follow the above rule set then the execution client should notify the consensus side about consistency failure and fall back to the recovery procedure as proposed by the Consistency checks section.
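
A rough sketch of how an execution client might detect violations of the rules above on its ingress side is given below. All names are hypothetical, statuses other than VALID are not modelled, and the rule about the sequential order of engine_forkchoiceUpdated messages is omitted because the receiver cannot verify it from the messages alone.

```python
from typing import Dict, Optional, Set, Tuple


class ConsistencyFailure(Exception):
    """Raised when an ingress message violates the causal ordering rules."""


class IngressOrderChecker:
    def __init__(self, trusted_block: bytes) -> None:
        # The trusted block (e.g. the last finalized one) counts as
        # already executed and validated.
        self.executed: Set[bytes] = {trusted_block}
        self.validated: Set[bytes] = {trusted_block}
        self.parent_of: Dict[bytes, bytes] = {}
        self.last_prepare: Optional[Tuple] = None

    def execute_payload(self, block_hash: bytes, parent_hash: bytes) -> None:
        # Parent -> child relation: the parent must have been executed first.
        if parent_hash not in self.executed:
            raise ConsistencyFailure("executePayload references an unknown parent")
        self.executed.add(block_hash)
        self.parent_of[block_hash] = parent_hash

    def consensus_validated(self, block_hash: bytes) -> None:
        if block_hash not in self.executed:
            raise ConsistencyFailure("consensusValidated precedes executePayload")
        parent = self.parent_of.get(block_hash)
        if parent is not None and parent not in self.validated:
            raise ConsistencyFailure("child validated before its parent")
        self.validated.add(block_hash)

    def forkchoice_updated(self, head_block_hash: bytes) -> None:
        # The head must be fully validated before it is referenced.
        if head_block_hash not in self.validated:
            raise ConsistencyFailure("forkchoiceUpdated references an unvalidated block")

    def prepare_payload(self, params: Tuple) -> None:
        self.last_prepare = params

    def get_payload(self, params: Tuple) -> None:
        # Must match the parameter set of the most recent prepare call.
        if params != self.last_prepare:
            raise ConsistencyFailure("getPayload does not match the latest preparePayload")
```

Catching ConsistencyFailure would be the point at which the client falls back to the recovery procedure.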

Consistency checks

Consensus and execution counterparties maintain their own states which in a normal case must be consistent with each other. This state consists of but may not be limited to the following items:

  • Transition state. transition_total_difficulty and terminal_pow_block_hash, maybe anything else (TBD). Transition state consistency between counterparties is critical for the transition process of a single node and overall network.
  • Operating mode. PoW/PoS node operating mode. See engine_switchToPos for details. This item is not critical for the operation of the node but should improve UX, especially during bootstrap of a fresh node.
  • Block trees. Beacon block and execution block trees should be consistent with respect to each other, likewise the fork choice state (the head of the chain and the most recent finalized block). In case of a software crash or temporary outage (on any side of the communication stack), consistency of block trees might be broken.

The suggestion is to add a concept of consistency checks into the design of the API. It could be implemented as follows.

Consensus client sends an engine_consensusStatus message to the execution client upon start up to request a consistency check. If the execution client was down and has just started up, it should respond with the corresponding engine_executionStatus to the first message received from the consensus client in order to request the check. The other case in which an execution client initiates the check is discovering an inconsistency between block trees, i.e. receiving any message that references an unknown parent, head or finalized block. Each party should respond with the corresponding status message upon receiving such a request, and then proceed either with the recovery process or with the normal operating mode.

Suggested data for status messages:

  • engine_consensusStatus
    • transition_total_difficulty: uint256, scope: transition
    • terminal_pow_block_hash: Hash32, scope: transition
    • finalized_block_hash: Hash32
    • confirmed_block_hash: Hash32
    • head_block_hash: Hash32
  • engine_executionStatus
    • finalized_block_hash: Hash32
    • confirmed_block_hash: Hash32
    • head_block_hash: Hash32

The recovery process may look as follows:

  • Transition state. The execution client must update its transition state with the corresponding data and immediately take the corresponding actions specified in the transition process scope of EIP-3675.
  • Operating mode. The execution client switches to PoS operating mode if the received data implies it. It may or may not persist this information for further usage.
  • Block trees. The execution client provides the consensus counterparty with its version of the head of the chain. If the consensus client discovers an inconsistency, it may take one of the following steps. One option is to simply ignore it and initiate a sync on the execution client side to pull the missed data from the network. The other, more convenient option might be to find the last point of consistency between the trees and replay the missed execution payloads, then signal the end of the recovery process with the corresponding engine_forkchoiceUpdated message.
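
The block tree part of the check and the replay option could look roughly like the sketch below. It makes a simplifying assumption that is not part of the proposal: the consensus client holds its canonical chain as a list of execution block hashes ordered from oldest to newest, and the execution head is expected to be on that chain. If it is not, the function returns None to indicate that the sync option should be used instead. All names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class BlockTreeStatus:
    finalized_block_hash: bytes
    confirmed_block_hash: bytes
    head_block_hash: bytes


def mismatched_fields(consensus: BlockTreeStatus,
                      execution: BlockTreeStatus) -> List[str]:
    """Names of the block tree fields on which the two sides disagree."""
    names = ("finalized_block_hash", "confirmed_block_hash", "head_block_hash")
    return [n for n in names if getattr(consensus, n) != getattr(execution, n)]


def payloads_to_replay(canonical_chain: List[bytes],
                       execution_head: bytes) -> Optional[List[bytes]]:
    """Hashes of payloads the execution client has missed, oldest first.

    Returns None when the execution head is not on the canonical chain,
    in which case replay is not applicable in this simplified model.
    """
    try:
        last_consistent = canonical_chain.index(execution_head)
    except ValueError:
        return None
    return canonical_chain[last_consistent + 1:]
```

After replaying the returned payloads, the consensus client would send engine_forkchoiceUpdated to signal the end of the recovery.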

Consistency check flow

Sequence diagrams illustrating different cases and parts of consistency check flow are outlined below.

Consensus software outage:

[Sequence diagram, Consensus and Execution. Events in order: Outage; engine_consensusStatus; Catches up with received data; Normal functioning; engine_executePayload; ...]

Execution software outage:

[Sequence diagram, Consensus and Execution. Events in order: Outage; engine_executePayload; engine_executionStatus; engine_consensusStatus; Catches up with received data; Block tree recovery; engine_executePayload; engine_consensusValidated; ...; engine_forkchoiceUpdated; Normal functioning; engine_executePayload; ...]

Execution payload cache

In order to properly handle a pair of engine_executePayload and engine_consensusValidated messages, the execution client needs a kind of cache that keeps the execution payload in a temporary store until it must be persisted or discarded upon receiving engine_consensusValidated.

In general, executing the payload is a heavier operation than validating the consensus block but it would be great to leave an opportunity for the consensus client to process beacon block and execution payload in parallel which requires the execution client to match information on the consensus and execution validity.
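
A minimal sketch of such a cache is shown below, assuming that the execution result and the consensus status for a block may arrive in either order and that the payload is persisted only when both are VALID. The class and method names are hypothetical, and persistence is reduced to membership in a set.

```python
from enum import Enum
from typing import Dict, Optional, Set


class Status(Enum):
    VALID = "VALID"
    INVALID = "INVALID"


class PayloadCache:
    def __init__(self) -> None:
        # block_hash -> which sides have reported VALID so far
        self.pending: Dict[bytes, Set[str]] = {}
        self.persisted: Set[bytes] = set()
        self.discarded: Set[bytes] = set()

    def _record(self, block_hash: bytes, source: str, status: Status) -> Optional[bool]:
        """True once persisted, False once discarded, None while waiting."""
        if block_hash in self.discarded:
            return False
        if block_hash in self.persisted:
            return True
        if status is Status.INVALID:
            # Either side reporting INVALID discards the payload.
            self.pending.pop(block_hash, None)
            self.discarded.add(block_hash)
            return False
        sources = self.pending.setdefault(block_hash, set())
        sources.add(source)
        if sources == {"execution", "consensus"}:
            del self.pending[block_hash]
            self.persisted.add(block_hash)
            return True
        return None

    def on_execution_result(self, block_hash: bytes, status: Status) -> Optional[bool]:
        return self._record(block_hash, "execution", status)

    def on_consensus_validated(self, block_hash: bytes, status: Status) -> Optional[bool]:
        return self._record(block_hash, "consensus", status)
```

Matching on the block hash is what lets the consensus client validate the beacon block while the payload is still being executed.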

Shared execution client

One of the potential approaches to better resource utilization is sharing the execution client between multiple consensus clients, especially for large infrastructures. When designing the API, we might want to look into this direction.

A single execution client listening to multiple consensus clients might lead to undesired switches between different fork choice states received from multiple sources. In order to avoid this it might be an option for consensus clients to disable engine_forkchoiceUpdated messages. With this option a single consensus client may become a source of the fork choice state updates for the execution client to prevent possible adverse effects.

Support of concurrent block building also increases the level of implementation complexity. A client may support multiple payload building processes in parallel, but that would require access to multiple versions of the mempool if these processes build payloads on top of different parents. The protocol could support this feature, as engine_getPayload has a parameter set that could identify the building process from which the payload should be returned.