TODO:

1. List post-mortems, incidents, and helpful links that could be used to create this document
2. Cover each incident and the lessons learned from it
3. Compile a general overview/troubleshooting guide based on analysis of all incidents

- https://forum.polkadot.network/t/robust-chain-upgrades-impossible-or-uptane-for-substrate-parachains/1267/2?u=bruno
- https://forum.polkadot.network/t/pallet-idea-safe-scheduler/1009
- https://forum.polkadot.network/t/how-to-recover-a-parachain/673/11
- ~~https://polkadot.polkassembly.io/referendum/97~~
- ~~https://polkadot.polkassembly.io/referendum/103~~
- ~~https://polkadot.polkassembly.io/tech/41~~
- ~~https://polkadot.polkassembly.io/tech/38~~
- ~~https://polkadot.polkassembly.io/referendum/56~~
- https://matrix.to/#/!PMVZsiDWXUbpjIbAel:matrix.parity.io/$is6JGXlKoXEVDaVfftxQ-r4vo1hMJFm1JeHah4TTyQg?via=matrix.parity.io&via=parity.io&via=matrix.org
- https://matrix.to/#/!CJojkUUEVATtDrzCZC:matrix.parity.io/$IdfP06oYdO5ODpOoKDOs-Da8lU_nmSs0-cRRWZTXME0?via=matrix.parity.io&via=matrix.org&via=parity.io
- Composable Finance post-mortem???
- https://github.com/paritytech/delivery-services/issues/114
- https://substrate.stackexchange.com/questions/1394/our-parachain-doesnt-produce-blocks-checklist/1395#1395
- https://jakpan.hashnode.dev/snek-stall-post-mortem
- https://www.t3rn.io/blog/testnet-halt-lessons-learned
- https://forum.parity.io/t/enjin-post-mortem/1919
- https://matrix.to/#/!prvlDJzwsdxUwRkyJd:parity.io/$FlYyCqx_gu2IaW3nEqKA7W167g5a_qLmYkpEoEzNk7E?via=parity.io
- https://medium.com/nodle-io/latest-parachain-upgrade-what-everyone-should-know-9d5563e97a70

Additional Stalls:

- https://github.com/integritee-network/parachain/issues/77

### Template

```
## Title

### Issue
- Describe issue

### Resolution
- Describe resolution

### Proposed Referendum
- Link

### Post-Mortem
- Link

### Comments
- Comments regarding the situation

### Glows
- [x] List some glows for the team

### Grows
- List some grows for the team
```

# Case Studies

We will go over case studies that have happened in the Polkadot ecosystem and how they were handled.

## Unbrick Zeitgeist Parachain

### Issue

- Chain used incorrect `paraId` in chain spec

> Due to a human error, the Zeitgeist parachain (ParaId 2092) on Polkadot has ended up with an invalid ParachainInfo::ParaId (2101), which renders block authoring impossible.

### Resolution

- Correct the chain spec, call `paras.forceSetCurrentHead(2092, genesisHead)` to use the genesis state with the correct `paraId`

> To fix the bricked chain, the chainspec used for Zeitgeist’s parachain on Polkadot must use 2092 within ParachainInfo::ParaId. To inform Polkadot about the new state root that results from the updated chainspec, paras.forceSetCurrentHead(2092, genesisHead) has to be invoked by the Root origin, whereas genesisHead is set to the resulting genesis state head that is derived from the new chainspec.
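The mismatch described above can be caught before onboarding with a small pre-flight check. The sketch below is a minimal illustration, not the Zeitgeist team's tooling: it assumes a Cumulus-style JSON chain spec in which the registered ID appears as a top-level `para_id` field and the runtime's value sits under `genesis.runtime.parachainInfo.parachainId`. Both key paths and the helper name `check_para_id` are assumptions and may differ for a given chain.

```python
import json

# Hypothetical key path; real chain specs may nest this differently.
RUNTIME_PATH = ("genesis", "runtime", "parachainInfo", "parachainId")


def check_para_id(spec: dict, expected: int) -> list[str]:
    """Return a list of human-readable mismatches (empty means consistent)."""
    problems = []

    # Top-level ID stored in the spec (assumed key name).
    top_level = spec.get("para_id")
    if top_level != expected:
        problems.append(f"para_id is {top_level}, expected {expected}")

    # Walk down to the ParachainInfo genesis value (assumed path).
    node = spec
    for key in RUNTIME_PATH:
        node = node.get(key) if isinstance(node, dict) else None
    if node != expected:
        problems.append(f"ParachainInfo paraId is {node}, expected {expected}")

    return problems


# Example mirroring the incident: the slot is 2092, the genesis config says 2101.
spec = json.loads(
    '{"para_id": 2092,'
    ' "genesis": {"runtime": {"parachainInfo": {"parachainId": 2101}}}}'
)
for problem in check_para_id(spec, expected=2092):
    print(problem)
```

Running a check like this against the exact spec file that will be used in production, rather than only the locally tested one, is the point of the exercise.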
### Proposed Referendum

- https://polkadot.polkassembly.io/referendum/103

### Post-Mortem

- https://hackmd.io/PXotv7zMTSOphZFg4f99iQ?view

### Comments

- If the chain spec or genesis state is wrong and the chain has not yet started producing blocks, `paras.forceSetCurrentHead()` can be called to set the correct genesis state

### Glows

- [x] Team acted promptly
- [x] Team tested solution locally
- [x] Team was transparent with community throughout the entire incident
- [x] Team communicated with community updates throughout the process
- [x] Team wrote a post-mortem

### Grows

- The wrong `paraId` was an oversight that slipped through even though the team had successfully tested onboarding the parachain locally

## Unbrick Bitgreen Parachain

### Issue

- A runtime upgrade left the chain with an empty parachain validator set, so the chain could no longer produce blocks

> Due to a human error, the upgraded Bitgreen parachain (paraId : 2048) on Polkadot has ended up with an empty validator set, unable to produce new blocks.

### Resolution

> To unbrick the chain and to be able to produce blocks again we would have to rollback to a state before the session rotation and then set the invulnerables to our existing validator accounts, so that in the upcoming session rotation these validators will be populated and the chain would continue to produce blocks. **We opted for the simplest and fastest solution which is to rollback to our initial genesis state**.
>
> The proposal transaction will be a `utility.batch(...)` call containing the calls `paras.forceSetCurrentCode(2048,genesisCode)` and `paras.forceSetCurrentHead(2048,genesisHead)`.

### Proposed Referendum

- https://polkadot.polkassembly.io/referendum/97

### Post-Mortem

- https://hackmd.io/@t9B1coeOQ1GlARb2EZRx3g/Hy0mnACdo

### Comments

> The shell parachain was still at genesis state, apart from trivial internal transfer tests. Therefore, no one is affected by rolling the state back to genesis. No data nor funds are lost.
- Because the chain was still effectively at genesis, rolling back was much easier than it would otherwise have been

### Glows

- [x] Team tests upgrades prior to production
- [x] Team tested resolution on a local testnet
- [x] Team created a post-mortem

### Grows

- Continue testing upgrades on a testnet prior to production

## Help Composable Finance Unbrick Their Chain

### Issue

- A stray `todo!()` was left in code that was expected to return a `Weight` value to its caller

### Resolution

- A proposal for a preimage that calls `paras.forceSetCurrentCode` with a new Wasm runtime that includes the fixed code

### Proposed Referendum

- https://polkadot.polkassembly.io/referendum/56

### Post-Mortem

- ???

### Comments

- Polkadot governance is reluctant to call `paras.forceSetCurrentCode` for a parachain, because there is no protocol or process in place to check whether the submitted runtime Wasm contains only the fix or carries other changes as well. It also raises a philosophical question for Polkadot: every time a parachain makes a mistake, should Polkadot governance intervene and fix it?

### Glows

- [x] Team debugged their own issue
- [x] Team acted promptly

### Grows

- [x] Post-mortem?
---

## Unbrick Phala Network Parachain

### Issue

- Runtime had both `sudo` and `democracy` disabled

### Resolution

- Added `sudo` back to the `BaseCallFilter`
- Motion to fast-track a `paras.forceSetCurrentCode` call to unbrick the parachain

### Proposed Referendum

- https://polkadot.polkassembly.io/tech/41

### Post-Mortem

- https://polkadot.polkassembly.io/post/1005

### Comments

- None

### Glows

- [x] Team acted promptly

### Grows

- Attentiveness

---

## Title

### Issue

- Mismatch in runtime version between the parachain Wasm on the Relay chain and the parachain Wasm on the collators

### Resolution

- Use `codeSubstitute` to replace the parachain's Wasm with the same Wasm that was on the Relay chain, produce a few blocks, then perform a runtime upgrade to the new Wasm

### Proposed Referendum

- N/A

### Post-Mortem

- https://forum.polkadot.network/t/how-to-recover-a-parachain

### Comments

- Team decided that calling `paras.forceSetCurrentCode` via governance would take too long, with no guarantee that governance would approve, so they went the `codeSubstitute` route instead

### Glows

- [x] Excellent decision making
- [x] Excellent write-up afterwards, sharing it with the community and even opening a discussion on the Polkadot Forum about the need for a better solution

### Grows

- :sunflower:
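The `codeSubstitute` route above amounts to editing the collators' chain spec so that the nodes execute a substitute Wasm in place of the one they would otherwise use. A minimal sketch of that edit follows. It assumes the chain spec exposes a top-level `codeSubstitutes` map keyed by block number (as a string) with `0x`-prefixed hex Wasm as the value; the key name, the key format, and the helper `add_code_substitute` are all assumptions to verify against the node version in use.

```python
import json


def add_code_substitute(spec: dict, block_number: int, wasm: bytes) -> dict:
    """Return a copy of `spec` with a code substitute registered.

    Assumes a top-level `codeSubstitutes` map of block number -> hex Wasm.
    """
    updated = dict(spec)  # shallow copy so the caller's spec is untouched
    substitutes = dict(updated.get("codeSubstitutes", {}))
    substitutes[str(block_number)] = "0x" + wasm.hex()
    updated["codeSubstitutes"] = substitutes
    return updated


# Placeholder bytes stand in for the Wasm blob registered on the Relay chain;
# these eight bytes are just the standard Wasm magic number and version.
relay_chain_wasm = bytes.fromhex("0061736d01000000")
spec = {"name": "example-parachain", "para_id": 2000}  # hypothetical spec

patched = add_code_substitute(spec, block_number=1_234, wasm=relay_chain_wasm)
print(json.dumps(patched["codeSubstitutes"], indent=2))
```

Every collator would need to run with the same patched spec. Once the chain is producing blocks again, a regular runtime upgrade takes over, as described in the resolution.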