Engine API: thoughts on `latestValidHash`

The Engine API spec states:

If validation fails, the response MUST contain {status: INVALID, latestValidHash: validHash} where validHash is the block hash of the most recent valid ancestor of the invalid payload. That is, the valid ancestor of the payload with the highest blockNumber

TL; DR

Implementation

For synchronous payload validation (when the parent block and state are known):

Respond with {status: INVALID, latestValidHash: P.parentHash} to newPayload(P) if P is INVALID.
Respond with {status: INVALID, latestValidHash: validHash} to fcU(headBlockHash=P.blockHash) if blocks of the new canonical chain were executed during the re-org and one of them turned out to be invalid. In this case the validHash value should be easy to obtain.
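
The two synchronous rules above could be sketched as follows, in Python. This is only an illustration: the Payload and PayloadStatus types, the execute callback and the new_branch argument are hypothetical names, not part of the Engine API or of any client. The VALID branches are there only to make the functions complete.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class Payload:
    block_hash: str
    parent_hash: str


@dataclass(frozen=True)
class PayloadStatus:
    status: str
    latest_valid_hash: Optional[str]


def new_payload(payload: Payload, execute: Callable[[Payload], bool]) -> PayloadStatus:
    # Synchronous case: the parent block and its state are known and valid,
    # so the most recent valid ancestor of an invalid payload is its parent.
    if execute(payload):
        return PayloadStatus("VALID", payload.block_hash)
    return PayloadStatus("INVALID", payload.parent_hash)


def forkchoice_updated(
    new_branch: Sequence[Payload], execute: Callable[[Payload], bool]
) -> PayloadStatus:
    # new_branch (hypothetical) lists the not-yet-executed blocks of the new
    # canonical chain, ordered from the child of the common ancestor to the head.
    if not new_branch:
        raise ValueError("new_branch must contain at least one block")
    latest_valid = new_branch[0].parent_hash  # common ancestor, assumed valid
    for block in new_branch:
        if not execute(block):
            return PayloadStatus("INVALID", latest_valid)
        latest_valid = block.block_hash
    return PayloadStatus("VALID", latest_valid)
```

Tracking latest_valid while walking the branch is what makes validHash "easy to get": it is simply the last block that executed successfully before the failure.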

For asynchronous payload validation, when a SYNCING EL encounters an INVALID block:

Cache invalidTipHash: latestValidHash pairs, where invalidTipHash is the head of the invalid chain. Capping the cache at a few entries should be enough. There is no need to persist the cache: if the EL is restarted, in the worst case it will encounter the same invalid chain again, and this time it will be able to respond correctly.
On each newPayload(P) call, check whether the cache contains P.parentHash and respond accordingly if it does; otherwise, process P as usual.
On each forkchoiceUpdated call, check whether the cache contains headBlockHash and respond accordingly if it does; otherwise, process the fork choice update as usual.
An implementation may leave unhandled the case where the CL misses newPayload(P) and submits newPayload(P1) instead, where P <- P1. In this case the EL is unable to link P1 to its invalid ancestor and cannot respond correctly.
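
A minimal sketch of such a cache, in Python, under stated assumptions. The class name, method names and default capacity are hypothetical. Advancing the tip when a child of a cached invalid block arrives is an assumption added here so that later descendants can still be linked; the text above does not prescribe it.

```python
from collections import OrderedDict
from typing import Optional


class InvalidTipCache:
    """In-memory, bounded map of invalidTipHash -> latestValidHash."""

    def __init__(self, capacity: int = 8) -> None:
        self._capacity = capacity
        self._entries: "OrderedDict[str, str]" = OrderedDict()

    def record(self, invalid_tip_hash: str, latest_valid_hash: str) -> None:
        self._entries[invalid_tip_hash] = latest_valid_hash
        self._entries.move_to_end(invalid_tip_hash)
        # Not persisted and capped: evict the oldest entries first.
        while len(self._entries) > self._capacity:
            self._entries.popitem(last=False)

    def on_new_payload(self, block_hash: str, parent_hash: str) -> Optional[dict]:
        latest_valid = self._entries.get(parent_hash)
        if latest_valid is None:
            return None  # not linked to a known invalid chain: process as usual
        # Assumption: remember the child as a new invalid tip.
        self.record(block_hash, latest_valid)
        return {"status": "INVALID", "latestValidHash": latest_valid}

    def on_forkchoice_updated(self, head_block_hash: str) -> Optional[dict]:
        latest_valid = self._entries.get(head_block_hash)
        if latest_valid is None:
            return None  # process the fork choice update as usual
        return {"status": "INVALID", "latestValidHash": latest_valid}
```

A None result means "fall through to normal processing". A payload whose parent was never seen (the missed newPayload(P) case) also yields None, which is exactly the limitation described above.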

`INVALID_TERMINAL_BLOCK`

This response is a special case of the latestValidHash functionality. It is needed because when either a terminal or a transition block is invalid, there is no meaningful hash to send as the latestValidHash response parameter, and the CL must invalidate the subchain starting from the transition block.

The implementation of this response is similar to that of latestValidHash, except that no latestValidHash is cached alongside invalidTipHash.
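
As a rough sketch in Python, the cache then degenerates into a bounded set of invalid tip hashes. The class and method names are hypothetical, and representing the absent latestValidHash as None is an assumption about the response encoding.

```python
from collections import deque
from typing import Optional


class InvalidTerminalCache:
    """Bounded in-memory set of invalidTipHash values, with no latestValidHash."""

    def __init__(self, capacity: int = 8) -> None:
        # A deque with maxlen silently drops the oldest entry when full.
        self._tips = deque(maxlen=capacity)

    def record(self, invalid_tip_hash: str) -> None:
        if invalid_tip_hash not in self._tips:
            self._tips.append(invalid_tip_hash)

    def check(self, block_hash: str) -> Optional[dict]:
        # Call with P.parentHash on newPayload(P), or with headBlockHash on
        # forkchoiceUpdated; None means "process as usual".
        if block_hash in self._tips:
            return {"status": "INVALID_TERMINAL_BLOCK", "latestValidHash": None}
        return None
```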

Do not support during `SYNCING`

There are a couple of attack scenarios that become possible if latestValidHash is not supported by a SYNCING node. The first one seems critical enough to keep support for latestValidHash during SYNCING as a strong requirement in the spec.

Attack by re-org to a chain with missing parent state

TL; DR The attack has a low probability, which is yet to be assessed, but it may have a big impact: under some circumstances it causes a liveness failure that requires manual intervention. The circumstances are the distribution of shallow-state and deep-state EL clients in the network and the duration of the period with no finality. Another important condition for this attack is that the adversary owns a portion of the stake sufficient to cause a re-org to a malicious chain (this chain will likely need to outperform the canonical chain by 64 or 128 blocks); this portion may need to be pretty big, especially on Mainnet.

A re-org to a chain with missing parent state may happen in two cases:

A shallow-state EL client (keeps only one state version at a time – the post-state of the head of the canonical chain) re-orgs to a side branch
A deep-state EL client (keeps a number of recent state versions) re-orgs to a side branch whose common ancestor is older than the oldest block it has a state for. This may happen during a period of no finality that is longer than the number of state versions EL clients keep

Attack scenario:

Create a malicious chain B: CA <- INV_P0 <- P1 <- ... <- Pn, where CA is a common ancestor with the current canonical chain, n is such that no client has the post-state of CA, and INV_P0 is an invalid payload
~~Reveal BeaconBlock(Pn)~~
~~CL calls newPayload(Pn) and receives ACCEPTED from EL~~
~~Make the network re-org to BeaconBlock(Pn) and reveal the rest of malicious chain~~
Reveal the B chain
- CL receives ACCEPTED in response to newPayload(INV_P0), and optimistically applies BeaconBlock(INV_P0), BeaconBlock(P1) … BeaconBlock(Pn)
Make the network re-org to BeaconBlock(Pn), i.e. induce forkchoiceUpdated(Pn) on the majority of nodes
If the EL supports latestValidHash, it informs the CL, and the CL re-orgs back to the canonical chain when BeaconBlock(Pn+1) is received or when the CL repeatedly calls forkchoiceUpdated(Pn)
Otherwise, the EL silently drops the malicious chain because it is invalid and starts SYNCING again each time it receives a subsequent forkchoiceUpdated(Pn) message

Note: B can be relatively short, with Pn.blockNumber lower than the block height of canonical chain.

Attack on nodes with `SYNCING` EL

TL; DR. The attack surface is very limited: only nodes whose EL is SYNCING near the head are affected. The damage is negligible, as these nodes will be guided by fully synced nodes and will eventually get in sync with the canonical chain. The recovery will take a few slots, which is an insignificant addition to the duration of the overall sync process.

Not supporting latestValidHash while SYNCING opens an attack vector against nodes whose EL is SYNCING pretty close to the head. Suppose there is a way for a relatively small portion of stake to induce a re-org in the network through a vulnerability in LMD-GHOST or any other exploit. In this case an adversary will be able to induce a temporary liveness failure if latestValidHash isn't supported by EL clients as currently specified. A scenario of such an attack may look as follows:

Create a malicious chain B: CA <- INV_P0 <- P1, where CA is a common ancestor with the current canonical chain, INV_P0 is an invalid payload, and P1 is an ordinary payload
Reveal BeaconBlock(P1) and make the network re-org to BeaconBlock(P1)
Reveal BeaconBlock(INV_P0)

Fully synced node:

Pulls BeaconBlock(INV_P0) when BeaconBlock(P1) is received and processes both in lock-step
EL responds with INVALID to newPayload(INV_P0), the block is invalidated and no re-org happens – there is no need for latestValidHash in this case

EL is SYNCING near the head:

If INV_P0 is unavailable (it is actually invalid, and no peers except malicious ones serve it), then latestValidHash won't help to overcome the situation
If INV_P0 is sent by malicious peers, the local EL will drop INV_P0 <- P1
CL sends forkchoiceUpdated(P1) either in the next slot or on attestation to BeaconBlock(P1)
If the response to forkchoiceUpdated(P1) is INVALID + latestValidHash, then the CL discards the malicious subchain and re-orgs back to the canonical chain
If the response to forkchoiceUpdated(P1) is SYNCING because the EL hasn't cached information about the invalidity of this chain, then the CL gets stuck on the malicious branch until the canonical chain outperforms it, which should happen pretty fast

A bit of details

This requirement is easy to satisfy when a node has been fully synced and stays online by importing blocks in lock-step mode. In this case, if any payload is INVALID, it is reasonable to assume that the parent is the most recent valid ancestor, i.e. latestValidHash == parent.blockHash. If forkchoiceUpdated induces a short-range re-org and a client is optimised to handle such re-orgs synchronously, then if an INVALID block occurs in the middle of the fork the client can easily respond to this call with the most recent valid ancestor.

This requirement is trickier to satisfy when the EL was SYNCING and found an INVALID block on the chain it has been syncing with. Since all EL clients sync with the canonical chain only (they would not sync with side branches, and even if they did, we don't care about invalid blocks unless these blocks claim to belong to the canonical chain), we may assume that latestValidHash only needs to be applicable to the canonical chain.

In general the chain would look as follows: Bk <- Bi <- Hj <- ... <- HEAD, where Bi is the most recent full block pulled by the EL and Hj <- ... <- HEAD is the chain of block headers ending with the most recent HEAD block, i.e. the most recent forkchoiceUpdated that the EL received set headBlockHash = HEAD.blockHash. After receiving fcU(HEAD), the EL client processes Bi and it turns out to be INVALID. Depending on the implementation, the EL may drop the entire chain starting from Bi, but it must also send this information back to the CL during the next roundtrip. Here we have the following cases:

fcU(headBlockHash=anotherChainBlockHash) – re-org is happening, and invalidity information becomes irrelevant
newPayload(anotherChainPayload) – payload from another chain arrives, and should be processed as expected
newPayload(HEAD.child) – must be responded with {status: INVALID, latestValidHash: Bk.blockHash}; for this purpose the EL has to keep the {latestValidHash: Bk.blockHash, latestInvalidHash: HEAD.blockHash} pair in memory until it receives this method call
fcU(headBlockHash=HEAD.child.blockHash) – must be responded with {status: INVALID, latestValidHash: Bk.blockHash}, but in this case the EL doesn't immediately know that the block referenced by headBlockHash is the child of the invalid HEAD block, because the EL hasn't seen the header of HEAD.child before and has nothing to match against
fcU(headBlockHash=HEAD.blockHash) – must be responded with {status: INVALID, latestValidHash: Bk.blockHash}, and EL has requisite information if it keeps {latestValidHash: Bk.blockHash, latestInvalidHash: HEAD.blockHash} pair in memory
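
The last three cases can be illustrated with a small Python sketch that keeps the single pair in memory. All names are hypothetical; known_parents stands for whatever header index (block hash to parent hash) the EL happens to have, and it is empty for HEAD.child unless that header has already been seen.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class InvalidChainRecord:
    latest_valid_hash: str    # Bk.blockHash
    latest_invalid_hash: str  # HEAD.blockHash


def invalid_response(record: InvalidChainRecord) -> dict:
    return {"status": "INVALID", "latestValidHash": record.latest_valid_hash}


def on_new_payload(record: InvalidChainRecord, parent_hash: str) -> Optional[dict]:
    # newPayload carries parentHash, so HEAD.child is matched directly.
    if parent_hash == record.latest_invalid_hash:
        return invalid_response(record)
    return None  # payload from another chain: process as expected


def on_forkchoice_updated(
    record: InvalidChainRecord,
    head_block_hash: str,
    known_parents: Dict[str, str],
) -> Optional[dict]:
    if head_block_hash == record.latest_invalid_hash:
        return invalid_response(record)
    # fcU carries only a hash: HEAD.child can be matched only if its header
    # (and therefore its parent hash) is already known to the EL.
    if known_parents.get(head_block_hash) == record.latest_invalid_hash:
        return invalid_response(record)
    return None  # unknown or unrelated head: process as usual
```

With an empty known_parents, on_forkchoice_updated returns None for HEAD.child.blockHash, which is the gap described in the fourth case.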
