# Engine API: thoughts on `latestValidHash`
Engine API spec states:
> If validation fails, the response **MUST** contain `{status: INVALID, latestValidHash: validHash}` where `validHash` is the block hash of the most recent *valid* ancestor of the invalid payload. That is, the valid ancestor of the payload with the highest `blockNumber`
## TL; DR
### Implementation
For synchronous payload validation (when the parent block and state are known):
* Respond with `{status: INVALID, latestValidHash: P.parentHash}` to `newPayload(P)` if `P` is `INVALID`
* Respond with `{status: INVALID, latestValidHash: validHash}` to `fcU(headBlockHash=P.blockHash)` if blocks of the new canonical chain were executed during the re-org and one of them turned out to be invalid; in this case it should be easy to obtain the `validHash` value
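The two synchronous rules above can be sketched roughly as follows. This is only an illustration under stated assumptions, not client code: `Payload`, `new_payload_response`, `reorg_response` and the `is_valid` callback are hypothetical names, and the `VALID` responses assume that `latestValidHash` carries the hash of the block that was just validated.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Sequence


@dataclass(frozen=True)
class Payload:
    """Hypothetical minimal payload: only the two hashes the rules need."""
    block_hash: str
    parent_hash: str


def new_payload_response(payload: Payload,
                         is_valid: Callable[[Payload], bool]) -> Dict[str, str]:
    """newPayload(P) when the parent block and state are known."""
    if is_valid(payload):
        return {"status": "VALID", "latestValidHash": payload.block_hash}
    # The parent is known and valid, so it is the most recent valid ancestor.
    return {"status": "INVALID", "latestValidHash": payload.parent_hash}


def reorg_response(branch: Sequence[Payload],
                   is_valid: Callable[[Payload], bool]) -> Dict[str, str]:
    """fcU(headBlockHash) that executes the new branch during a re-org.

    `branch` is ordered from the child of the common ancestor to the new head.
    """
    if not branch:
        raise ValueError("branch must contain at least one payload")
    for payload in branch:
        if not is_valid(payload):
            # Everything before the first invalid block has been executed
            # successfully, so its parent is the validHash.
            return {"status": "INVALID", "latestValidHash": payload.parent_hash}
    return {"status": "VALID", "latestValidHash": branch[-1].block_hash}
```

Both functions reduce to the same observation: once every ancestor has been executed, the parent of the first invalid block is the `validHash`.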
For asynchronous payload validation, when a `SYNCING` EL encounters an `INVALID` block:
* Cache `invalidTipHash: latestValidHash`, where `invalidTipHash` is the head of the invalid chain; capping the cache at a few entries should be enough. There is no need to persist this cache: if EL is restarted, in the worst case it will face the same invalid chain once again, and this time it will be able to respond correctly.
* Check whether this cache contains `P.parentHash` on each `newPayload(P)` method call and respond accordingly if it does; otherwise, process `P` as usual
* Check whether this cache contains `headBlockHash` on each `forkchoiceUpdated` call and respond accordingly if it does; otherwise, process the fork choice update as usual
* An implementation may leave unhandled the case where CL misses `newPayload(P)` and submits `newPayload(P1)` instead, where `P <- P1`; in this case EL is unable to link `P1` to its invalid ancestor and respond correctly.
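A minimal sketch of such a cache might look like the following. All names (`InvalidTipCache`, `check_new_payload`, `check_forkchoice_updated`) and the default capacity of 8 are assumptions made for illustration. Recording the rejected payload itself as a new invalid tip is an extra design choice of this sketch, not something the notes above require.

```python
from collections import OrderedDict
from typing import Dict, Optional


class InvalidTipCache:
    """Bounded in-memory map: invalidTipHash -> latestValidHash (not persisted)."""

    def __init__(self, capacity: int = 8) -> None:  # capacity is an assumed value
        self._capacity = capacity
        self._entries: "OrderedDict[str, str]" = OrderedDict()

    def record(self, invalid_tip_hash: str, latest_valid_hash: str) -> None:
        self._entries[invalid_tip_hash] = latest_valid_hash
        self._entries.move_to_end(invalid_tip_hash)
        while len(self._entries) > self._capacity:
            self._entries.popitem(last=False)  # evict the oldest entry

    def lookup(self, block_hash: str) -> Optional[str]:
        return self._entries.get(block_hash)


def check_new_payload(cache: InvalidTipCache, block_hash: str,
                      parent_hash: str) -> Optional[Dict[str, str]]:
    """Return an INVALID response, or None to process the payload as usual."""
    latest_valid = cache.lookup(parent_hash)
    if latest_valid is None:
        return None
    # Assumption of this sketch: remember the child as well, so that its own
    # descendants can still be linked to the invalid chain.
    cache.record(block_hash, latest_valid)
    return {"status": "INVALID", "latestValidHash": latest_valid}


def check_forkchoice_updated(cache: InvalidTipCache,
                             head_block_hash: str) -> Optional[Dict[str, str]]:
    """Return an INVALID response, or None to process the update as usual."""
    latest_valid = cache.lookup(head_block_hash)
    if latest_valid is None:
        return None
    return {"status": "INVALID", "latestValidHash": latest_valid}
```

A payload whose parent was never submitted still falls through to `None`, which matches the unhandled case in the last bullet.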
#### `INVALID_TERMINAL_BLOCK`
This response is a special case of the `latestValidHash` functionality. It is needed because when either a terminal or a transition block is invalid, there is no meaningful hash to send as the `latestValidHash` response parameter, and CL must invalidate the subchain starting from the transition block.
The implementation of this response is similar to that of `latestValidHash`, except that no `latestValidHash` is cached alongside `invalidTipHash`.
### Do not support during `SYNCING`
There are a couple of attack scenarios that become possible if `latestValidHash` is not supported by a `SYNCING` node. The first one seems critical enough to keep support of `latestValidHash` during `SYNCING` as a strong requirement in the spec.
#### Attack by re-org to a chain with missing parent state
**TL; DR** The attack has a low probability, which is yet to be assessed, but it may have a big impact, causing a liveness failure that requires manual intervention under some circumstances. These circumstances are the distribution of shallow-state and deep-state EL clients in the network and the duration of a period without finality. Another important condition for this attack is that the adversary owns the portion of stake required to make a re-org to a malicious chain (this chain will likely need to outperform the canonical chain with 64 or 128 blocks) -- this portion may need to be pretty big, especially on Mainnet.
A re-org to a chain with missing parent state may happen in two cases:
* Shallow-state EL client (keeps only one state version at a time -- the post state of the head of canonical chain) re-orgs to a side branch
* Deep-state EL client (keeps a number of recent state versions) re-orgs to a side branch whose common ancestor is behind the oldest block it has a state for. This may happen during a period of no finality that is longer than the number of state versions EL clients keep
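Both cases can be read as a single condition on the re-org depth. The sketch below is a simplified, hypothetical model: it assumes a client retains the post-states of its `retained_states` most recent canonical blocks (1 for a shallow-state client), and the function name and parameters are illustrative rather than taken from any client.

```python
def parent_state_missing(head_number: int,
                         common_ancestor_number: int,
                         retained_states: int) -> bool:
    """True if the post-state of the common ancestor is no longer available.

    Assumed model: the client keeps post-states for blocks
    head_number - retained_states + 1 .. head_number of the canonical chain.
    """
    if retained_states < 1:
        raise ValueError("a client keeps at least the post-state of its head")
    if common_ancestor_number > head_number:
        raise ValueError("common ancestor cannot be ahead of the head")
    reorg_depth = head_number - common_ancestor_number
    return reorg_depth >= retained_states
```

Under this model a shallow-state client (`retained_states=1`) is affected by any re-org to a side branch, while a deep-state client is affected only once the common ancestor falls behind its retention window.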
Attack scenario:
1. Create a malicious chain `B: CA <- INV_P0 <- P1 <- ... <- Pn`, where `CA` is a common ancestor with the current canonical chain, `n` is such that no client has the post-state of `CA`, and `INV_P0` is an invalid payload
2. ~~Reveal `BeaconBlock(Pn)`~~
3. ~~CL calls `newPayload(Pn)` and receives `ACCEPTED` from EL~~
4. ~~Make the network re-org to `BeaconBlock(Pn)` and reveal the rest of malicious chain~~
5. Reveal the `B` chain
* CL receives `ACCEPTED` in response to `newPayload(INV_P0)`, and optimistically applies `BeaconBlock(INV_P0)`, `BeaconBlock(P1)` ... `BeaconBlock(Pn)`
6. Make the network re-org to `BeaconBlock(Pn)`, i.e. induce `forkchoiceUpdated(Pn)` on the majority of nodes
7. If EL supports `latestValidHash`, it informs CL and CL re-orgs back to the canonical chain when `BeaconBlock(Pn+1)` is received or CL repeatedly calls `forkchoiceUpdated(Pn)`
8. Otherwise, EL silently drops the malicious chain due to its invalidity and starts `SYNCING` again each time it receives a subsequent `forkchoiceUpdated(Pn)` message
*Note:* `B` can be relatively short, with `Pn.blockNumber` lower than the block height of canonical chain.
#### Attack on nodes with `SYNCING` EL
**TL; DR.** The attack surface is very limited: only nodes whose EL is `SYNCING` near the head are affected. The damage is negligible, as these nodes will be guided by fully synced nodes and will eventually get in sync with the canonical chain; the recovery will take a few slots, which is an insignificant addition to the overall sync time.
Not supporting `latestValidHash` while `SYNCING` opens an attack vector against nodes whose EL is `SYNCING` pretty close to the head. Suppose there is a way for a relatively small portion of stake to induce a re-org in the network through a vulnerability in LMD-GHOST or any other exploit. In this case an adversary will be able to induce a temporary liveness failure if `latestValidHash` isn't supported by EL clients as it is currently specified. A scenario of such an attack may look as follows:
1. Create a malicious chain `B: CA <- INV_P0 <- P1`, where `CA` is a common ancestor with current canonical chain, `INV_P0` is invalid payload, `P1` is just a payload
2. Reveal `BeaconBlock(P1)` and make the network re-org to `BeaconBlock(P1)`
3. Reveal `BeaconBlock(INV_P0)`
Fully synced node:
1. Pulls `BeaconBlock(INV_P0)` when `BeaconBlock(P1)` is received and processes both in lock-step
2. EL responds with `INVALID` to `newPayload(INV_P0)`, the block is invalidated and no re-org happens -- there is no need for `latestValidHash` in this case
EL is `SYNCING` near the head:
1. If `INV_P0` is unavailable (as it's actually invalid and no peers except for malicious ones serve it), then `latestValidHash` won't help to overcome the situation
1. If `INV_P0` is sent by malicious peers, the local EL will drop `INV_P0 <- P1`
1. CL sends `forkchoiceUpdated(P1)` either in the next slot or on an attestation to `BeaconBlock(P1)`
1. If `forkchoiceUpdated(P1)` is answered with `INVALID` + `latestValidHash`, then CL discards the malicious subchain and re-orgs back to the canonical chain
1. If `forkchoiceUpdated(P1)` is answered with `SYNCING` because EL hasn't cached information about the invalidity of this chain, then CL gets stuck on the malicious branch until the canonical chain outperforms it, which should happen pretty fast
## A bit more detail
This requirement is easy to satisfy when a node has been fully synced and stays online by importing blocks in lock-step mode. In this case, if any payload is `INVALID`, it is reasonable to assume that the parent is the most recent valid ancestor, i.e. `latestValidHash == parent.blockHash`. If `forkchoiceUpdated` induces a short-range re-org and a client is optimised to handle such re-orgs synchronously, then if an `INVALID` block occurs in the middle of the fork, the client may easily respond to this call with the most recent valid ancestor.
This requirement is trickier to satisfy when the EL part was `SYNCING` and found an `INVALID` block on the chain that it has been syncing with. Since all EL clients sync with the canonical chain *only* (they would not sync with side branches, and even if they do, we don't care about invalid blocks unless these blocks pretend to belong to the canonical chain), we may assume that `latestValidHash` should only be applicable to the canonical chain.
In general, the chain would look as follows: `Bk <- Bi <- Hj <- ... <- HEAD`, where `Bi` is the most recent full block pulled by EL and `Hj <- ... <- HEAD` is the chain of block headers ending with the most recent `HEAD` block, i.e. the most recent `forkchoiceUpdated` that EL has received set `headBlockHash = HEAD.blockHash`. After receiving `fcU(HEAD)`, the EL client has processed `Bi` and it turns out to be `INVALID`. Depending on the implementation, EL may drop the entire chain starting from `Bi`, but it must also send this information back to CL during the next roundtrip. Here we have the following cases:
* `fcU(headBlockHash=anotherChainBlockHash)` -- re-org is happening, and invalidity information becomes irrelevant
* `newPayload(anotherChainPayload)` -- payload from another chain arrives, and should be processed as expected
* `newPayload(HEAD.child)` -- must be responded with `{status: INVALID, latestValidHash: Bk.blockHash}`; for this purpose EL has to keep the `{latestValidHash: Bk.blockHash, latestInvalidHash: HEAD.blockHash}` pair in memory until it receives this method call
* `fcU(headBlockHash=HEAD.child.blockHash)` -- must be responded with `{status: INVALID, latestValidHash: Bk.blockHash}`, but in this case EL doesn't immediately know that the block referenced by `headBlockHash` is the child of an invalid `HEAD` block, because EL hasn't seen the header of `HEAD.child` before and cannot do a match
* `fcU(headBlockHash=HEAD.blockHash)` -- must be responded with `{status: INVALID, latestValidHash: Bk.blockHash}`, and EL has requisite information if it keeps `{latestValidHash: Bk.blockHash, latestInvalidHash: HEAD.blockHash}` pair in memory
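The cases above can be walked through with a small sketch that keeps only the `{latestValidHash, latestInvalidHash}` pair in memory. `InvalidChainRecord` and `invalid_response` are hypothetical names, and the hashes in the usage lines are placeholders. Note that the sketch deliberately returns `None` for `fcU(headBlockHash=HEAD.child.blockHash)`, which reflects the gap described in that bullet rather than closing it.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class InvalidChainRecord:
    latest_valid_hash: str    # Bk.blockHash
    latest_invalid_hash: str  # HEAD.blockHash


def invalid_response(record: InvalidChainRecord, method: str, block_hash: str,
                     parent_hash: Optional[str] = None) -> Optional[Dict[str, str]]:
    """Return the INVALID response, or None to process the call as usual."""
    response = {"status": "INVALID", "latestValidHash": record.latest_valid_hash}
    if method == "newPayload":
        # A payload carries its parent hash, so HEAD.child can be matched.
        return response if parent_hash == record.latest_invalid_hash else None
    if method == "forkchoiceUpdated":
        # Only the head hash is available; an unseen HEAD.child does not match.
        return response if block_hash == record.latest_invalid_hash else None
    raise ValueError(f"unsupported method: {method}")


record = InvalidChainRecord(latest_valid_hash="0xBk", latest_invalid_hash="0xHEAD")
child_payload = invalid_response(record, "newPayload", "0xCHILD", parent_hash="0xHEAD")
child_fcu = invalid_response(record, "forkchoiceUpdated", "0xCHILD")
```

Here `child_payload` is the `INVALID` response, while `child_fcu` is `None` because the pair alone gives EL nothing to match `HEAD.child` against.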