❓✅ What is the `T::Fallback` election mode in Kusama and Polkadot?

- `onchain::OnChainExecution<OnChainSeqPhragmen>` if the `fast-runtime` feature is enabled, otherwise `frame_election_provider_support::NoElection`.

❓✅ What is the EPM state machine flow?

- All OK: `Off` -> `Signed` -> `Unsigned` --> `Signed` ...
- `Off` -> `Signed` -> `Unsigned` --> `Fallback` -> `Emergency`

If in `EmergencyPhase`, a solution may be set via the `Pallet::set_emergency_election_result` extrinsic, which is open to `Config::ForceOrigin`. The input solution is added to the solutions queue, similar to what miners do during the signed phase. The `Config::ForceOrigin` is set to:

- Kusama: `type ForceOrigin = EitherOf<EnsureRoot<Self::AccountId>, StakingAdmin>;`
- Polkadot: `type ForceOrigin = EitherOfDiverse<EnsureRoot<AccountId>, pallet_collective::EnsureProportionAtLeast<AccountId, CouncilCollective, 2, 3>>`

❓✅ When is the phase `Off` triggered?

- After `elect()` is called and the election was successful, `rotate_round()` is called, which sets the phase to `Phase::Off` and kills the previous snapshot.

❓✅ When does the phase transition from `Phase::Off` -> `Phase::Signed`?

- In `on_initialize`, when the phase is `Phase::Off` and `fn create_snapshot` finished successfully. I.e. to enter `Phase::Signed`, the `ElectionDataProvider` is queried for election data and the snapshot is created, so that the staking miners can proceed with the signed phase (i.e. fetch the on-chain snapshot, calculate the solution off-chain and submit it).

❓✅ When is the `Phase::Emergency` triggered?

- When a call to `EPM::ElectionProvider::elect()` fails, which may happen when:
  - No solution was queued
  - (and) `T::Fallback::instant_elect()` failed

❓✅ When does `staking` set `ForceEra`? Why did it do so during the incident when many slashes happened?
- Eras are a set of sessions during which a specific set of validators is active:

  ```
  Era: A (whole) number of sessions, which is the period that the validator set (and each validator’s active nominator set) is recalculated and where rewards are paid out.
  ```

- The staking pallet has a `Forcing` enum, kept in storage, that defines the era forcing strategy. If `Forcing::ForceNew` is set, a new election is forced (regardless of the number of blocks left until the next election). Once the election is successful, `Forcing::NotForcing` is set.
- The `EPM::on_initialize` hook calls into `fn DataProvider::next_election_prediction`, which is implemented by the staking pallet. `fn next_election_prediction` checks the current `Forcing` state. If it is `Forcing::ForceNew`, there are 0 sessions left until the next election and the next election prediction is `now + until_next_session`, i.e. in the next session.

*Note: A session is a number of blocks. An era is a number of sessions. (era > session > block)*

❓✅ When does staking set `Forcing::ForceNew`? Is it when there are a lot of slashes?

- `root` call to the `Pallet::force_new_era` extrinsic
- If `T::OffendingValidatorsThreshold` is reached when adding offenders (slashes), `fn ensure_new_era` is called to force a new era (to reset the validator set).

❓✅ When is a snapshot created? And cleared?

- Snapshot is created when:
  - `Phase::Off && remaining <= signed_deadline && remaining > unsigned_deadline`
  - `Phase::Off && remaining <= unsigned_deadline && remaining > Zero::zero()`
- Snapshot is cleared with `fn kill_snapshot` when `elect()` is called and the election was successful.
  - If the election succeeds, clear the snapshot and enter `Phase::Off`
  - If the election does not succeed, enter `Phase::Emergency`

In summary:

- If `Phase::Off`, there's no snapshot (it was cleared before entering the phase)
- At the end of `Phase::Off`, the snapshot is created.
  If the snapshot was successfully created, the phase transitions to `Phase::`

❓✅ When is `EPM::elect()` called?

- By `staking` when a new era should start (either because of `Forcing::ForceNew`, `Forcing::ForceAlways`, or `Forcing::NotForcing` and `era_length > T::SessionsPerEra`), i.e. when the staking pallet decides that a new era should start.

❓✅ What happened in the Kusama incident? (to reproduce)

Root problem: a new session starts, staking is in `Forcing::ForceNew` due to slashes, the EPM solution queue is empty and the fallback election fails -> EPM enters emergency mode.

In a nutshell, `elect()` was called too early: EPM and the staking miners didn't have enough time to come up with a new solution.

**Option 1.** `staking::SessionManager::new_session()` triggered a new election after slashes happened:

1. Slash enough validators in an era so that the list of offenders is larger than `T::OffendingValidatorsThreshold`.
   - Expected behaviour: staking sets `Forcing::ForceNew`
2. A new session starts and `staking::SessionManager::new_session()` is called.
   - Expected behaviour: since `Forcing::ForceNew` is set, `fn try_trigger_new_era` is called, which calls into `<T::ElectionProvider>::elect()`
3. If EPM is in `Phase::Off` (or in `Phase::Signed` with the solution queue still empty), the feasibility test does not complete.
   - Expected behaviour: if `T::Fallback` fails, the election fails and the EPM phase goes to `Phase::Emergency`.

Until the above happens, `EPM::on_initialize` won't create a snapshot:

1. If the current session is not finished yet, `EPM::on_initialize` calls `let next_election = T::DataProvider::next_election_prediction`, which results in `next_election = now + few_blocks` (due to the staking `Forcing` state being `Forcing::ForceNew`);
3. If the current phase is `Phase::Signed` or `Phase::Off`, no snapshot is created.
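
The transitions described in these notes can be sketched as a small standalone model. This is only an illustration, not the pallet code: the `Phase` and `Forcing` enums are simplified stand-ins, the function signatures and parameters (`snapshot_ok`, `has_queued_solution`, `fallback_ok`, `sessions_left_in_era`, `session_length`) are hypothetical, and the `Signed` -> `Unsigned` arm is assumed from the state machine flow rather than from the snapshot conditions listed in the notes.

```rust
/// Simplified stand-in for the EPM `Phase` enum (hypothetical, no payloads).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Phase {
    Off,
    Signed,
    Unsigned,
    Emergency,
}

/// Simplified stand-in for the staking `Forcing` enum (only two variants modelled).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Forcing {
    NotForcing,
    ForceNew,
}

/// Hypothetical model of `next_election_prediction`: under `ForceNew` there are
/// zero sessions left, so the election is predicted for the next session boundary.
pub fn next_election_prediction(
    now: u32,
    forcing: Forcing,
    until_next_session: u32,
    sessions_left_in_era: u32,
    session_length: u32,
) -> u32 {
    let sessions_left = match forcing {
        Forcing::ForceNew => 0,
        Forcing::NotForcing => sessions_left_in_era,
    };
    now + until_next_session + sessions_left * session_length
}

/// Hypothetical model of the phase decision in `on_initialize`.
/// `remaining` is the number of blocks until the predicted election and
/// `snapshot_ok` stands for `create_snapshot` finishing successfully.
pub fn on_initialize(
    phase: Phase,
    remaining: u32,
    signed_deadline: u32,
    unsigned_deadline: u32,
    snapshot_ok: bool,
) -> Phase {
    match phase {
        // Signed window opens: a snapshot is needed before leaving `Off`.
        Phase::Off if remaining <= signed_deadline && remaining > unsigned_deadline => {
            if snapshot_ok { Phase::Signed } else { Phase::Off }
        }
        // Signed window was skipped: still needs a snapshot to go `Unsigned`.
        Phase::Off if remaining <= unsigned_deadline && remaining > 0 => {
            if snapshot_ok { Phase::Unsigned } else { Phase::Off }
        }
        // Assumed from the flow `Signed` -> `Unsigned`; the snapshot already exists.
        Phase::Signed if remaining <= unsigned_deadline && remaining > 0 => Phase::Unsigned,
        // Otherwise the phase is left unchanged.
        other => other,
    }
}

/// Hypothetical model of the phase after `elect()`: success rotates the round
/// back to `Off`; no queued solution plus a failing fallback means `Emergency`.
pub fn elect(has_queued_solution: bool, fallback_ok: bool) -> Phase {
    if has_queued_solution || fallback_ok {
        Phase::Off
    } else {
        Phase::Emergency
    }
}
```

In this model, the incident scenario corresponds to `elect(false, false)`: `elect()` is called at the session boundary while the queue is empty and the fallback (`NoElection` outside `fast-runtime`) fails, so the result is `Phase::Emergency`.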