![](https://i.imgur.com/5JMPFeK.jpg)

# Goal: Perform network upgrades on the current testnet

We want some continuity across testnet releases: our weekly testnet deployment should not wipe state, and should instead mimic realistic network upgrade scenarios as much as possible.

## Migrations

Question: what does it mean to do a migration on a shielded chain like Penumbra?

Penumbra's state splits into two categories: the _public state_, recorded in the JMT, and the _private state_, recorded on client devices and committed to in the state commitment tree.

#### Public State

- Eventually we'd like to support snapshot state sync; how does that tie in with network upgrades?
- How do we perform migrations to the structure of the chain state?

#### Private State

- We can't do snapshots or all-at-once migrations to private data
- We can only provide ways for *clients* to eventually migrate their own data, but they might not, or might not for a long time
- Zcash is the case to study and learn lessons from
  - Pool migration has been extremely painful

### Implications for Migrations

We can only perform snapshots and migrations on the public part of the chain state; the private part of the chain state must be replayed onto the new chain. To replay the private state, we should use the `CompactBlock`, which is designed to act as the minimal data required for a client to synchronize their private state. This has a few implications:

- It means that the `CompactBlock` must be forwards-compatible for all time. We can add new proto fields, but we can never remove existing ones, because each software version must be able to process all prior `CompactBlock`s. (We can potentially _omit_ old fields from newer `CompactBlock`s, but this seems preferable to avoid.)
- It means that the entirety of a user's private state must be recoverable from the `CompactBlock`s alone, without any extra queries to the chain state, since we plan to only replay the `CompactBlock`s and not other data.
- We should plan to use the `StatePayload` oneof to support adding new kinds of state payloads.

#### Migration Tiers by Difficulty

- **Public State**: Easiest; it is possible to do snapshots and server-side data format migrations.
- **Proof Systems**: Changes to the proof system require a software upgrade on the client side, but can otherwise be a drop-in replacement, as long as they only change the proof system or low-level details of the proof statements, rather than anything about the high-level data the proof statements describe.
- **Private State**: Changes to private state must be accretive, preserving both old and new formats, because the migration can only be done by the end user. Even if old formats are deprecated going forward, client software will need to support them indefinitely to allow users to perform migrations or access their historical data.
- **Key material**: Changes to the key hierarchy that affect the address format are the most difficult. We should plan to never do them.

## Genesis Data

How do we specify the genesis data of an upgraded chain? One approach is to serialize the entire chain state into JSON. A better approach might be to change `genesis::AppState` to an enum containing either a fresh state (like we have now) or a root hash of migrated data.

## Upgrade Process

Upgrades are coordinated with on-chain governance. Because Penumbra has epoch-scoped batch processing (of validator updates, other data, etc.), we want to align an upgrade with an epoch boundary, to ensure no data is left "hanging" across the chain upgrade.

- [x] Is this actually true?
- [x] How does this work with emergency chain halts?

An `Upgrade` proposal contains:

- a freeform `hash` field for the hash of the software to upgrade to
- an `epoch_index` specifying when to perform the upgrade; the upgrade occurs at the transition from `epoch_index - 1` to `epoch_index`.

#### Tendermint

Coordinating more than 2/3 of the voting power to upgrade at the same time is hard; if this does not happen...
the chain halts.

#### Sketch of an upgrade:

Note: we use `pd_old` to refer to the old version of `pd` and `pd_new` to refer to the new version the chain is upgrading to.

1. `Upgrade` proposal passes ($e_i$, `hash`)
2. `pd_old` automatically exits after `Commit` of the last block of epoch $e_{i-1}$
   - Need a hook in `pd` to perform auto-shutdowns, which should be shared with emergency proposals.
3. Validator operators shut down Tendermint instances
4. Validator operators use migration tooling (`pd_new upgrade migrate` ?) to convert an old state to a new one
5. Validator operators construct their view of the new `genesis.json` for the upgraded chain (`pd_new upgrade genesis` ?)
6. Validator operators restart with `pd_new` aimed at migrated state and `tendermint` aimed at new `genesis.json`
7. `InitChain` implementation in `pd_new` should validate migrated state against root hash in `genesis.json`

#### Discussion

Potential difficulty: a lot of the governance proposal types are in the `penumbra_transaction` crate. (This is a sign of mis-organization generally, and cleaning it up is probably required to sensibly version our data structures.)

Starting a Tendermint chain lets you pick the starting block height, so we could pick the starting block height to be one higher than the final height of the chain we're migrating from, giving us a contiguous sequence of `CompactBlock`s.

#### GitHub issues to create:

1. Governance `Upgrade*` proto messages
2. Implement chain state snapshotting
3. Implement halting logic in pd (shared with `Emergency`)
4. Export tooling
6. Migration tooling in pd
7. Support checkpointing public chain state

   ```rust
   enum AppState {
       // Checkpoints the migrated application state
       Checkpoint(Hash),
       // Global configuration for the chain (ChainParameters, etc.)
       Genesis(State),
   }
   ```
8. Upgrade document that includes:
   * _essential information_: what?, when? epoch, machine requirements etc.
   * _independently testable sections_
   * _backup_
   * _migration instructions_
   * _expected process_: outputs, duration
   * _rollback plan_
9. Versioning of domain types / protos

##### Links

- https://docs.cosmos.network/v0.45/migrations/chain-upgrade-guide-044.html (0.45 chain upgrade guide)
- https://github.com/tendermint/tendermint/issues/5595 (Upgrade tooling rfc draft)
- https://buf.build/penumbra-zone/penumbra/docs/main:penumbra.core.transaction.v1alpha1#penumbra.core.transaction.v1alpha1.Proposal.Payload (Emergency: 2/3 stake votes to halt the chain :eyes:)
- https://medium.com/tendermint/tendermint-core-state-sync-for-developers-70a96ba3ee35 (Tendermint core: state sync)
- https://github.com/tendermint/tendermint/blob/v0.34.x/spec/abci/apps.md (ABCI docs about state sync)
- https://github.com/cosmos/cosmos-sdk/blob/main/docs/architecture/adr-041-in-place-store-migrations.md (in-place store migrations in the Cosmos SDK)
- https://github.com/cosmos/cosmos-sdk/issues/6936 (Streaming genesis file)
- https://hub.cosmos.network/main/migration/cosmoshub-4-v7-Theta-upgrade.html# (CosmosHub theta upgrade)
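
##### Sketch: validating a checkpoint in `InitChain`

To make the last step of the upgrade sketch more concrete, here is a minimal Rust sketch of how `InitChain` might choose between a fresh genesis and a migrated checkpoint, and how the new chain's starting height could be derived. Everything beyond the `AppState` enum is an assumption made for illustration: `Hash` as a 32-byte array, the placeholder `State` struct, and the `InitAction`, `InitChainError`, `init_chain`, and `initial_height` names are hypothetical and not part of `pd`.

```rust
/// Hypothetical 32-byte root hash of the migrated public state.
pub type Hash = [u8; 32];

/// Placeholder for the fresh-chain configuration (ChainParameters, etc.).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct State {
    pub chain_id: String,
}

/// Genesis application state, mirroring the enum proposed in this document.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum AppState {
    /// Checkpoints the migrated application state.
    Checkpoint(Hash),
    /// Global configuration for a fresh chain.
    Genesis(State),
}

/// What the (hypothetical) `InitChain` handler decides to do.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum InitAction {
    InitializeFresh(State),
    ResumeFrom(Hash),
}

/// Reasons to refuse to start the upgraded chain.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum InitChainError {
    /// Genesis names a checkpoint, but no migrated state was found locally.
    MissingMigratedState,
    /// The locally migrated state does not match the root hash in genesis.
    RootMismatch { expected: Hash, actual: Hash },
}

/// Compare the root of the locally migrated state (if any) with genesis.
pub fn init_chain(
    app_state: &AppState,
    local_root: Option<Hash>,
) -> Result<InitAction, InitChainError> {
    match app_state {
        AppState::Genesis(state) => Ok(InitAction::InitializeFresh(state.clone())),
        AppState::Checkpoint(expected) => match local_root {
            None => Err(InitChainError::MissingMigratedState),
            Some(actual) if actual == *expected => Ok(InitAction::ResumeFrom(actual)),
            Some(actual) => Err(InitChainError::RootMismatch {
                expected: *expected,
                actual,
            }),
        },
    }
}

/// Starting height for the new chain, one above the old chain's final height,
/// so that the `CompactBlock` sequence stays contiguous.
pub fn initial_height(last_old_height: u64) -> u64 {
    last_old_height + 1
}
```

With this shape, a validator whose migration produced a different root fails fast at startup instead of diverging later, and a `Genesis` value keeps today's fresh-chain behaviour unchanged.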