# KILT 1.9.0 Ethereum Migration

Issue: The migration does not fit into one block weight-wise, as it consumes ~120% of a block's weight. A runtime upgrade should never exceed ~75% of the block weight, so that space remains for operational and mandatory tasks (~25%).

Excerpt from `try-runtime` against Spiritnet:

```
$ RUST_LOG=try-runtime=trace,parachain-staking=trace cargo run --release --features=try-runtime --bin=kilt-parachain try-runtime --chain=spiritnet-dev --no-spec-check-panic on-runtime-upgrade live --uri=wss://spiritnet.kilt.io:443
2022-11-23 11:20:31.968 INFO main pallet_did_lookup::migrations: 🔎 DidLookup pre migration: Number of connected DIDs 1296
2022-11-23 11:20:31.977 INFO main pallet_did_lookup::migrations: 🔎 DidLookup pre migration: Number of connected accounts 1296
2022-11-23 11:20:32.131 DEBUG main try-runtime::cli: proof size: 445.71 KB (456405 bytes)
2022-11-23 11:20:32.131 DEBUG main try-runtime::cli: compact proof size: 342.45 KB (350665 bytes)
2022-11-23 11:20:32.131 DEBUG main try-runtime::cli: zstd-compressed compact proof 179.43 KB (183735 bytes)
2022-11-23 11:20:32.184 INFO main try-runtime::cli: TryRuntime_on_runtime_upgrade executed without errors. Consumed weight = (587825000000 ps, 0 byte), total weight = (500000000000 ps, 5242880 byte) (117.57 %, 0.00 %).
```

## Fix December 2022

My proposal on how to test and execute the upgrade:

1. Merge the [Eth Migration PR](https://github.com/KILTprotocol/kilt-node/pull/438).
2. Execute the [upgrade-tool](https://github.com/KILTprotocol/upgrade-tool) locally. Maybe update the metadata, even though nothing should change.
   1. Spawn at least as many DIDs as there are on Spiritnet (NOTE: the Peregrine runtime is hardcoded because we don't need this step for Spiritnet; the adjustment would be trivial, e.g. changing `peregrine_runtime` to `spiritnet_runtime`).
   2. Execute the runtime upgrade.
   3. Do sanity checks.
   4. Migrate the ids.
3. Redo step 2 on Peregrine Staging and Peregrine. This should not require any code changes to the upgrade tool.
4. Adjust the `upgrade-tool` to work with the Spiritnet runtime:
   1. Replace `peregrine_runtime` with `spiritnet_runtime` in the code.
   2. Get the 10801 metadata from Spiritnet.
5. Test against a local Spiritnet runtime. Get the 10900 metadata for Spiritnet.
6. Ready to migrate: propose to Spiritnet once the release process is settled.

For step 2, please check the [README of the upgrade tool for the commands](https://github.com/KILTprotocol/upgrade-tool#how-to-use).

## Decision November 2022

Made together with Albi on Nov 22nd: We go with _Option 2_, i.e. the migration happens via extrinsics callable by any signed origin.

* Argument against an _automatic upgrade_ via `on_initialize`: We cannot fully test it (our test networks with self-owned validators differ from the live network), and we don't want to risk stalling block production, since recovery would take a long time via Polkadot governance.

### Concept

* We block extrinsics to `pallet-did-lookup` by implementing a `CallFilter` that is toggled on by the migration and toggled off by the `migrate` extrinsic once we are done. See the sketch after this section.
* Ideally, we would build a generic `Migration` pallet exposing `Filter` and `Upgrade` traits.
* For details, see below.

![](https://i.imgur.com/vb2LyD1.jpg)

![](https://i.imgur.com/PwxGsK8.jpg)

* For now, we think a quicker and dirtier solution directly in `pallet-did-lookup` would be sufficient:

![](https://i.imgur.com/gaqcGZg.jpg)
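Below is a minimal sketch of how such a toggleable call filter could look on the runtime side. It assumes a hypothetical `MigrationOngoing` storage flag inside `pallet_did_lookup` (set by `on_runtime_upgrade`, cleared by the `migrate` extrinsic) and illustrative call-variant names; it is not the final implementation.

```rust
// Sketch only. `MigrationOngoing`, the `migrate` call and the exact runtime
// `Call` enum variants are assumptions made for illustration.
use frame_support::traits::Contains;

/// Rejects did-lookup extrinsics (except `migrate`) while the migration flag is set.
pub struct BlockDidLookupWhileMigrating;

impl Contains<Call> for BlockDidLookupWhileMigrating {
	fn contains(call: &Call) -> bool {
		match call {
			// The migration extrinsic must stay callable so the filter can be
			// switched off again once all entries have been migrated.
			Call::DidLookup(pallet_did_lookup::Call::migrate { .. }) => true,
			// All other did-lookup calls are blocked while the flag is set.
			Call::DidLookup(..) => !pallet_did_lookup::MigrationOngoing::<Runtime>::get(),
			// Calls to every other pallet pass through unchanged.
			_ => true,
		}
	}
}
```

In the runtime, this filter would be combined with the existing `BaseCallFilter` of `frame_system::Config` (e.g. via `frame_support::traits::InsideBoth`) so that all other pallets remain unaffected.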
_______

## Potential solutions

1. Use the Scheduler pallet over 2-3 blocks
2. Give the Council power to execute the migration via a batched extrinsic in 2-3 blocks
3. Execute the migration in `on_initialize` of a new pallet or of `pallet-did-lookup` and block user extrinsics until it is done

### Option 1: Scheduler

#### Pros
* Simplest approach

#### Cons
* Does not seem to be used/preferred by parachain teams
* Requires knowing the number of required blocks before the upgrade --> attack vector in the time between proposing and enactment
* Minor: Blocks us from upgrading to Polkadot v0.9.32 in the same runtime upgrade (because the Scheduler migration requires that no tasks are scheduled)

### Option 2: Extrinsic

#### Pros
* Simple and quicker than Option 3

#### Cons
* Chain not blocked for normal users, i.e. reading the corrupted storage does not work until the migration is done

### Option 3: Migration pallet

#### Pros
* Simple
* Benchmarkable
* Multi-block by design

#### Cons
* Not "verifiable"
* Chain must be blocked for normal users

[Example Pallet](https://github.com/mustermeiszer/sub0-multi-block-upgrade/blob/master/src/lib.rs)

```rust
#[pallet::hooks]
impl<T: Config> Hooks<BlockNumberFor<T>> for Pallet<T> {
	fn on_runtime_upgrade() -> Weight {
		// Remember when the migration started and block the rest of this block.
		StartingBlock::<T>::put(frame_system::Pallet::<T>::block_number());
		T::BlockWeights::get().max_block
	}

	fn on_initialize(_n: T::BlockNumber) -> Weight {
		let unmigrated_dids = Pallet::<T>::unmigrated_dids();
		if !unmigrated_dids.is_empty() {
			// Migrate as many DIDs as fit into this block, report how many were
			// done, and consume the full block so no other extrinsics get in.
			let migrated: u32 = 0; // placeholder for the actual per-block migration
			Self::deposit_event(Event::MigratedDids(migrated));
			T::BlockWeights::get().max_block
		} else {
			Weight::zero()
		}
	}
}
```

## ~~Decision~~

* ~~We go with `on_initialize`~~
* ~~We could use a storage entry `LastKey` of type `Option<KeyPrefixIterator>`~~
* ~~While the migration is ongoing, we check whether we would exceed the block limit during `on_initialize`~~
  * ~~If so, we stop and set `LastKey` to `Some(get_last_key)`~~
  * ~~Else continue~~
* ~~In `verify_migration`, we set `Option<KeyPrefixIterator>`~~
  * ~~If a subsequent call points to the same key, we are "stuck" and need to migrate that key~~
  * ~~Else we continue until we reach `None` -> then we are done~~

## Helpful resources

* https://github.com/paritytech/substrate/issues/7911
* Proposal by a user: https://gist.github.com/MrShiposha/2771974e77f6358bcf29681719d71977#file-multiblock-migration-md
* Multiple variants showcased at Sub0 2021 in "Multi-block migrations" by Frederik Schulz from Centrifuge:
  * [Slides](https://drive.google.com/drive/folders/1cY_LuKTDWC8RQ1RuLYOqExSaB6Rw_7Zw)
  * [Video](https://www.youtube.com/watch?v=sIgvyRFs-N4)
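For reference, the cursor-based multi-block pattern from the struck-through decision and the resources above could look roughly like the following sketch. The storage map name `ConnectedDids`, the `LastKey` cursor, and the per-block entry limit are assumptions for illustration; a real implementation would bound the work by benchmarked weight rather than by a constant count.

```rust
// Sketch only: resume iteration from a stored cursor each block.
// `ConnectedDids` and `MAX_MIGRATIONS_PER_BLOCK` are placeholders.
#[pallet::storage]
pub type LastKey<T> = StorageValue<_, Vec<u8>, OptionQuery>;

/// Upper bound of entries handled per block; a real implementation would
/// derive this from benchmarked weights instead of a constant.
const MAX_MIGRATIONS_PER_BLOCK: u64 = 100;

impl<T: Config> Pallet<T> {
	/// Called from `on_initialize` while the migration is ongoing.
	fn migrate_chunk() -> Weight {
		// Resume where the previous block stopped, or start at the beginning.
		let iter = match LastKey::<T>::take() {
			Some(last_raw_key) => ConnectedDids::<T>::iter_from(last_raw_key),
			None => ConnectedDids::<T>::iter(),
		};

		let mut migrated = 0u64;
		for (key, _value) in iter {
			// ... translate `_value` into the new format and write it back (omitted) ...
			migrated += 1;
			if migrated >= MAX_MIGRATIONS_PER_BLOCK {
				// Budget exhausted: remember where to resume in the next block.
				LastKey::<T>::put(ConnectedDids::<T>::hashed_key_for(&key));
				break;
			}
		}
		// If the iterator was exhausted before hitting the limit, no cursor is
		// stored and the migration is complete.
		T::DbWeight::get().reads_writes(migrated + 1, migrated + 1)
	}
}
```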