Unfuck Westend Number 123524

Why

Westend has been stalled since trying to build block 16164349. The cause is the runtime upgrade that was applied in the parent block. This runtime upgrade contained some storage migrations, and one of these migrations didn't properly upgrade the parachains configuration struct. Oliver fixed the bug in the following PR: https://github.com/paritytech/polkadot/pull/7340

How to fix the issue?

The runtime was upgraded in block 16164348 using an extrinsic that executed set_code, and the last finalized block is 16164346. Thus, the block containing the set_code extrinsic was not yet finalized. This should make it very easy to create a fork at 16164347 that will not contain the set_code extrinsic. As Parity controls most of the validators on Westend, the easiest solution should be to revert the unfinalized blocks on one validator, prevent this validator from connecting to any other node, and let it build some blocks. In one of these blocks we need to include a transaction sent by the account registered as sudo on chain. This should prevent the faulty set_code from being included again once we reconnect to the other nodes: if some validator tries to include it, it should fail because the nonce of the sudo account has changed, which makes the extrinsic invalid.
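The nonce argument can be illustrated with a toy model. This is only a sketch and not how the runtime is implemented: the Extrinsic and ToyFork classes, the account name "sudo" and the nonce value 7 are all made up for illustration. It only models the rule that an extrinsic can be included when its nonce equals the next expected nonce of the signer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Extrinsic:
    signer: str
    nonce: int
    call: str


class ToyFork:
    """Toy model of one fork: tracks only the next expected nonce per account."""

    def __init__(self, nonces):
        self.nonces = dict(nonces)

    def is_includable(self, xt):
        # An extrinsic fits into the next block only if its nonce matches
        # the next expected nonce of the signer.
        return xt.nonce == self.nonces.get(xt.signer, 0)

    def apply(self, xt):
        if not self.is_includable(xt):
            raise ValueError(f"invalid nonce for {xt.call}")
        # The nonce is consumed even if the call itself fails on dispatch.
        self.nonces[xt.signer] = xt.nonce + 1


# Hypothetical state at the fork point: the sudo account is at nonce 7.
new_fork = ToyFork({"sudo": 7})
faulty_set_code = Extrinsic("sudo", 7, "set_code")   # signed on the old fork
any_other_tx = Extrinsic("sudo", 7, "transfer")      # sent by us on the new fork

new_fork.apply(any_other_tx)
print(new_fork.is_includable(faulty_set_code))
```

After the unrelated sudo transaction is applied on the new fork, the old set_code extrinsic carries a stale nonce and can no longer be included there.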

The following steps should be done:

  1. Stop one validator.
  2. Run polkadot revert --chain westend -d PATH_TO_THE_BASE_PATH. -d/--base-path is only required if a custom path was passed when running the validator. We just need to ensure that we revert the blocks in the same database that we will use when running the node.
  3. Change the validator startup script to add --in-peers 0 --out-peers 0 --force-authoring. This ensures that the validator doesn't connect to any node on the sync and transaction-sync protocols. Force authoring then tells the node to produce blocks even if it is not connected to any other node.
  4. Start the validator and let it produce some blocks.
  5. Send any kind of transaction from the account that is registered as sudo on chain. This transaction needs to be sent directly to the validator and can be anything; it can also return an error when dispatching. We just need to ensure that the nonce of the account is incremented. The easiest solution would be to point polkadot-js to this validator's RPC port and then send a balance transfer or similar.
  6. Wait until the transaction from 5. is included and the validator has built at least block 16164350.
  7. Revert the changes to the startup script done as part of 3. and restart the validator. The other validators should start syncing the blocks from our special validator and start building on top of this fork.
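
The check in step 6 could be scripted roughly as follows. This is a sketch under assumptions: it assumes the validator exposes the standard Substrate JSON-RPC methods chain_getHeader, system_accountNextIndex and author_pendingExtrinsics over HTTP. The helper names, the RPC URL and the sudo address are placeholders, and the nonce before step 5 has to be looked up by hand.

```python
import json
import urllib.request


def http_rpc(url):
    """Return a function that sends JSON-RPC calls to `url` (placeholder URL)."""
    def call(method, params=()):
        body = json.dumps(
            {"jsonrpc": "2.0", "id": 1, "method": method, "params": list(params)}
        ).encode()
        req = urllib.request.Request(
            url, data=body, headers={"Content-Type": "application/json"}
        )
        with urllib.request.urlopen(req, timeout=10) as resp:
            reply = json.load(resp)
        if "error" in reply:
            raise RuntimeError(reply["error"])
        return reply["result"]
    return call


def fork_is_ready(call, sudo_address, nonce_before, min_block=16164350):
    """True once the sudo nonce moved on-chain and block `min_block` was built."""
    # chain_getHeader without parameters returns the best block header;
    # its number is hex encoded.
    best = int(call("chain_getHeader")["number"], 16)
    # system_accountNextIndex also counts transactions still in the pool,
    # so only trust it when the pool is empty.
    pool_empty = len(call("author_pendingExtrinsics")) == 0
    next_nonce = call("system_accountNextIndex", [sudo_address])
    return best >= min_block and pool_empty and next_nonce > nonce_before
```

Usage would look like fork_is_ready(http_rpc("http://127.0.0.1:9933"), SUDO_ADDRESS, nonce_before), called in a loop until it returns True. The port depends on how the validator was started.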

Evaluation

The steps above worked, but when the validator reconnected it still reverted to the broken chain. I didn't look closely into it, but the problem was probably that the second chain the validator had built had a smaller weight (because it had fewer primary blocks). So, we went with adding the problematic chain as badBlock to the chain spec of the validators. Thus, the validators ignored that chain and started building a new one. Once the new chain was finalized, we no longer needed the badBlock annotation. All other nodes in the network started to sync the new chain and the network continued. This solution only worked because Parity controlled the majority of the validators.
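
Adding the annotation to a chain spec could look roughly like this. It is a sketch, assuming the chain spec is a JSON file with a top-level badBlocks field holding a list of block hashes. The helper name is hypothetical, and the hash below is a placeholder, as the actual hash we used is not recorded here.

```python
import json


def add_bad_block(spec, block_hash):
    """Return a copy of the chain spec dict with `block_hash` in badBlocks."""
    if not (block_hash.startswith("0x") and len(block_hash) == 66):
        raise ValueError("expected a 0x-prefixed 32-byte block hash")
    updated = dict(spec)
    # The field may be missing or null in an existing spec.
    bad_blocks = list(updated.get("badBlocks") or [])
    if block_hash not in bad_blocks:
        bad_blocks.append(block_hash)
    updated["badBlocks"] = bad_blocks
    return updated


# Placeholder hash, NOT the real hash of the faulty block.
PLACEHOLDER_HASH = "0x" + "00" * 32
spec = {"name": "Westend", "id": "westend2", "badBlocks": None}
patched = add_bad_block(spec, PLACEHOLDER_HASH)
print(json.dumps(patched["badBlocks"]))
```

The patched spec would then be written to a file and passed to the validators via --chain. Once the new chain is finalized, the entry can be removed again.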
