## Open collator set (long route)
This draft explores a path to develop and deploy changes to the `CandidateReceipt` structure across the stack: polkadot-sdk, polkadot node, cumulus, relay chain runtime and parachain runtime.
The goal and focus is to deploy the changes without breaking any interface that is currently in use by parachain teams. The changes should also decrease implementation and deployment time for further modifications in the `CandidateReceipt` structure.
All the network protocol version changes will be guarded by a feature that will be removed after testing. Testing builds will need to be built with the feature enabled.
## Deployment
### Milestone #1: Polkadot SDK release with new primitives
We should aim for it to contain all the new staging runtime APIs, polkadot node primitives, runtime primitives and parachain primitives in one single release.
### Milestone #2: Runtime release for testnets
The runtime is backward compatible and introduces the new APIs, primitives and new PVF execution param. We will now begin testing backward compatibility on Versi, then Westend.
### Milestone #2.5: Versi testing starts
Testing should start as soon as master produces good builds for both node and cumulus.
### Milestone #3: Polkadot node release
The release will remove the network protocol feature guard. We need to start monitoring the upgrade and encourage people to do it early. This ensures we don't have additional delays later.
The release will introduce:
- the new network protocol versions and maintain compatibility with older nodes while supporting old/new runtime
- the PVF versioning support and support for v2 validation input parameters and outputs - `validate_block_v2`
At this point we are using the new network protocols but still using old candidate receipts.
When Westend validators are upgraded to this node release, public testing can begin.
### Milestone #4: Cumulus node release
Documentation should be ready at this point as it needs to be included in the release.
Parachain teams can experiment on Westend.
### Milestone #5: Kusama Runtime release
After testnets and bug fixing we push the changes into the Fellowship Kusama runtime. We need to time this well and do it after sufficient validators have upgraded.
### Milestone #6: Polkadot runtime release
After Kusama testing and more bugfixing we push the changes into a runtime upgrade.
We should be careful about having sufficient validators upgraded.
TODO: Introduce milestones and map impl effort on each of the milestones
## List of changes and estimates:
Start: RFC for describing the reasoning for the new polkadot and parachain primitives.
The candidate receipt structures will be wrapped by an enum like this:
```rust
pub enum VersionedCandidateReceipt {
    V0(CandidateReceiptV0), // old candidate version
    V1(CandidateReceiptV1), // RFC(1)
}
```
### Polkadot Primitives v8
- Implement new staging primitives as described by RFC:
- `BackedCandidate`, `CandidateReceipt`, `CommittedCandidateReceipt`, `InherentData` and `BackingState`
- Implement custom `Encode` and `Decode` that can serialize/deserialize old candidate receipt structures into `VersionedCandidateReceipt`. One way to differentiate between structures is using a 64-bit magic number.
- We want to allow the runtime to easily switch to the new receipts and be backwards compatible. This means old validators are not affected by the switch as they are able to push old receipts into the parachain inherent data.
- Add a `BackedCandidate::core_index()` method that returns the committed core or extracted core index.
- Implement accessor methods and make fields private in new `CandidateReceipt`
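The magic-number idea above can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the receipts are reduced to toy structs, the byte layout is hand-rolled rather than SCALE, and `RECEIPT_V1_MAGIC` is a made-up value. A real implementation would also need to argue that no valid old encoding can begin with the chosen marker.

```rust
// Illustrative only: toy layouts and a made-up magic value, not the real SCALE encoding.

/// Hypothetical 64-bit marker that prefixes the new encoding.
const RECEIPT_V1_MAGIC: u64 = 0xC0DE_CAFE_F00D_0001;

/// Toy stand-in for the old receipt.
#[derive(Debug, PartialEq)]
pub struct CandidateReceiptV0 {
    pub para_id: u32,
}

/// Toy stand-in for the new receipt, which additionally carries a core index.
#[derive(Debug, PartialEq)]
pub struct CandidateReceiptV1 {
    pub para_id: u32,
    pub core_index: u32,
}

#[derive(Debug, PartialEq)]
pub enum VersionedCandidateReceipt {
    V0(CandidateReceiptV0),
    V1(CandidateReceiptV1),
}

/// Reads a little-endian u32 from exactly four bytes.
fn read_u32(bytes: &[u8]) -> Option<u32> {
    if bytes.len() != 4 {
        return None;
    }
    let mut buf = [0u8; 4];
    buf.copy_from_slice(bytes);
    Some(u32::from_le_bytes(buf))
}

impl VersionedCandidateReceipt {
    /// Old receipts keep their original bytes; new receipts get the magic prefix.
    pub fn encode(&self) -> Vec<u8> {
        match self {
            Self::V0(r) => r.para_id.to_le_bytes().to_vec(),
            Self::V1(r) => {
                let mut out = RECEIPT_V1_MAGIC.to_le_bytes().to_vec();
                out.extend_from_slice(&r.para_id.to_le_bytes());
                out.extend_from_slice(&r.core_index.to_le_bytes());
                out
            }
        }
    }

    /// Detects the version from the leading bytes, then parses the matching layout.
    pub fn decode(input: &[u8]) -> Option<Self> {
        let magic = RECEIPT_V1_MAGIC.to_le_bytes();
        if input.starts_with(&magic) {
            // New layout: 8-byte magic, 4-byte para id, 4-byte core index.
            if input.len() != 16 {
                return None;
            }
            let para_id = read_u32(&input[8..12])?;
            let core_index = read_u32(&input[12..16])?;
            Some(Self::V1(CandidateReceiptV1 { para_id, core_index }))
        } else {
            // No marker: treat the bytes as the old layout.
            let para_id = read_u32(input)?;
            Some(Self::V0(CandidateReceiptV0 { para_id }))
        }
    }
}
```

In this sketch an old validator's bytes round-trip unchanged into `V0`, which is the backward-compatibility property the list above asks for.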
### Parachain Primitives v2
- Implement new version of `ValidationResult` with the new `CoreIndex` commitment as described by RFC
- New input `ValidationParams` with `core_index` for PVF execution
*This doesn't break anyone until we decide to remove the v1 primitives. Only parachains which want to use the elastic scaling feature need to upgrade in the near future (up until we stop supporting elastic scaling without `CoreIndex` commitment)*
### PVF Versioning:
This is based on the draft by Dmitry: https://hackmd.io/fvjlm1XBRsSQCUxERn_mIw?view
#### Runtime
- Add new executor environment parameter to indicate supported PVF version range. Should be done as the first step, as it costs virtually nothing to introduce it but it costs a lot of time to get it deployed. By the time the versioned PVFs appear, it should be there already.
#### Cumulus
`register_validate_block` should allow for multiple definitions of the `validate_block` function, named `validate_block` (as a backward compatibility measure representing v1), `validate_block_v2`, `validate_block_v3`, etc. The new functions accept `ValidationParamsV2`, `ValidationParamsV3`, etc. as their input. Whether their output needs to be versioned as well is open for discussion.
#### Substrate
- Substrate executor recognizes which versions are supported by the PVF at the compilation stage and returns that info to the caller.
- Substrate executor accepts an optional version that it should execute.
#### PVF Host
- provides the Substrate executor with the version that should be executed, based on the executor parameters and the version info provided on preparation stage.
- receives the version range from the Substrate executor when preparing artifacts and stores that info in memory along with the other artifact-related info.
#### Workflow
The PVF host gets a request to execute a PVF and it doesn't have a prepared artifact. It instructs the Substrate executor to compile the artifact and gets back the compiled artifact and the supported version range. The PVF host always gets the executor params along with the execution request, so it can now determine which version to execute (the highest common version from the ranges specified in the executor params and in the PVF itself). If there is no common version available, it returns an error. Otherwise, it calls into the Substrate executor to execute the PVF, specifying an entry point corresponding to the highest common version.
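The version selection step of this workflow can be sketched as a small function. This is a minimal sketch under assumptions: the names `select_pvf_version`, `entry_point` and `VersionError` are hypothetical, and version ranges are modelled as plain inclusive `u32` ranges.

```rust
use std::ops::RangeInclusive;

/// Hypothetical error returned when the two ranges do not overlap.
#[derive(Debug, PartialEq)]
pub enum VersionError {
    NoCommonVersion,
}

/// Picks the highest version supported both by the executor params and by the PVF artifact.
pub fn select_pvf_version(
    executor: &RangeInclusive<u32>,
    artifact: &RangeInclusive<u32>,
) -> Result<u32, VersionError> {
    // The overlap of two inclusive ranges is [max(starts), min(ends)].
    let lo = (*executor.start()).max(*artifact.start());
    let hi = (*executor.end()).min(*artifact.end());
    if lo <= hi {
        Ok(hi)
    } else {
        Err(VersionError::NoCommonVersion)
    }
}

/// Maps a version to the entry point name, keeping `validate_block` for v1.
pub fn entry_point(version: u32) -> String {
    if version == 1 {
        "validate_block".to_string()
    } else {
        format!("validate_block_v{}", version)
    }
}
```

For example, executor params allowing versions 1 to 2 and a PVF exposing versions 1 to 3 would resolve to version 2 and the `validate_block_v2` entry point, while disjoint ranges yield an error.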
### Relay chain runtime
The runtime will accept both old and new receipt formats by implementing detection during `Decode`. We need to implement accessor methods for all fields of `VersionedCandidateReceipt` and not rely on doing any version checks in the runtime code.
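A minimal sketch of what version-agnostic accessors could look like follows. The field set is a toy subset, and the constructors and the `matches_assigned_core` helper are hypothetical names introduced only for this example.

```rust
// Illustrative only: the real receipts carry many more fields.

pub struct CandidateReceiptV0 {
    para_id: u32,
    relay_parent: [u8; 32],
}

pub struct CandidateReceiptV1 {
    para_id: u32,
    relay_parent: [u8; 32],
    core_index: u32,
}

pub enum VersionedCandidateReceipt {
    V0(CandidateReceiptV0),
    V1(CandidateReceiptV1),
}

impl VersionedCandidateReceipt {
    pub fn new_v0(para_id: u32, relay_parent: [u8; 32]) -> Self {
        Self::V0(CandidateReceiptV0 { para_id, relay_parent })
    }

    pub fn new_v1(para_id: u32, relay_parent: [u8; 32], core_index: u32) -> Self {
        Self::V1(CandidateReceiptV1 { para_id, relay_parent, core_index })
    }

    /// Fields present in both versions are exposed uniformly.
    pub fn para_id(&self) -> u32 {
        match self {
            Self::V0(r) => r.para_id,
            Self::V1(r) => r.para_id,
        }
    }

    pub fn relay_parent(&self) -> &[u8; 32] {
        match self {
            Self::V0(r) => &r.relay_parent,
            Self::V1(r) => &r.relay_parent,
        }
    }

    /// Fields that only exist in the new version are exposed as `Option`.
    pub fn core_index(&self) -> Option<u32> {
        match self {
            Self::V0(_) => None,
            Self::V1(r) => Some(r.core_index),
        }
    }
}

/// Example caller: no `match` on the version, only accessor calls.
/// An old receipt has no commitment, so it is accepted on the assigned core.
pub fn matches_assigned_core(receipt: &VersionedCandidateReceipt, assigned_core: u32) -> bool {
    receipt.core_index().map_or(true, |core| core == assigned_core)
}
```

Keeping the `match` inside the accessors means a future `V2` variant only touches this `impl` block, not the runtime code that consumes receipts.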
Runtime migration for pending availability candidates.
#### New Runtime APIs
Some runtime APIs used by parachain consensus on node side return types that wrap candidate receipts. We need to introduce v2 APIs that return the new ones.
Changes:
- Introduce `candidates_pending_availability_v2` that returns a Vec of `VersionedCommittedCandidateReceipt`
- Modify existing API `candidate_pending_availability` to return `None` if receipt is using V2 as there is no way for old nodes to handle the new structures.
- Introduce `para_backing_state_v2` and adjust old API
- publish APIs in testnet runtimes
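The split between the old and the new API can be sketched as follows. This is a simplified model, not the runtime API declaration: the receipt types are toy structs, the pending candidates are passed in as a slice, and the function signatures are assumptions made for illustration.

```rust
#[derive(Clone, Debug, PartialEq)]
pub struct CommittedCandidateReceiptV0 {
    pub para_id: u32,
}

#[derive(Clone, Debug, PartialEq)]
pub struct CommittedCandidateReceiptV1 {
    pub para_id: u32,
    pub core_index: u32,
}

#[derive(Clone, Debug, PartialEq)]
pub enum VersionedCommittedCandidateReceipt {
    V0(CommittedCandidateReceiptV0),
    V1(CommittedCandidateReceiptV1),
}

impl VersionedCommittedCandidateReceipt {
    pub fn para_id(&self) -> u32 {
        match self {
            Self::V0(r) => r.para_id,
            Self::V1(r) => r.para_id,
        }
    }
}

/// Old API shape: old nodes only understand the old receipt,
/// so a pending candidate in the new format is reported as `None`.
pub fn candidate_pending_availability(
    pending: &[VersionedCommittedCandidateReceipt],
    para_id: u32,
) -> Option<CommittedCandidateReceiptV0> {
    let first = pending.iter().find(|r| r.para_id() == para_id)?;
    match first {
        VersionedCommittedCandidateReceipt::V0(old) => Some(old.clone()),
        VersionedCommittedCandidateReceipt::V1(_) => None,
    }
}

/// New API shape: returns every pending candidate of the para, in either format.
pub fn candidates_pending_availability_v2(
    pending: &[VersionedCommittedCandidateReceipt],
    para_id: u32,
) -> Vec<VersionedCommittedCandidateReceipt> {
    pending
        .iter()
        .filter(|r| r.para_id() == para_id)
        .cloned()
        .collect()
}
```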
*This will break any tooling that decodes the affected primitives, for example Polkadot-introspector.*
### Parachain runtime
Technically this will introduce the 2nd version of `validate_block` PVF API.
More details about how this can be fully implemented are in https://github.com/paritytech/polkadot-sdk/issues/645.
DX: From the user perspective we want things to require no changes, or minimal changes for upgrading. In this particular case the parachain team can simply upgrade crate versions and add `2` as version argument to `register_validate_block` in their runtime.
**Changes:**
- New version of `MemoryOptimizedValidationParams`
- implement new `validate_block_v2` (see PVF Versioning) which accepts new `ValidationParams` and returns the `CoreIndex` in `ValidationResult`
- add an optional version argument to `register_validate_block` proc macro which selects the appropriate `validate_block` version to use.
### Polkadot node primitives
Introduce new primitives `SubmitCollationParamsV2`, `StatementV2`, `StatementWithPVD`, `UncheckedSignedFullStatementV2`, `SignedFullStatementWithPVDV2`. These are used in the new network protocol version, but at the same time they are used by Cumulus, so we still want to support the old ones so as not to break parachains or force them to switch.
### Polkadot node
For the node side we will use the runtime API version as a toggle for the relay chain capability to deal with the new candidate receipts. The node must be backwards compatible and the SDK changeset must not break any existing implementations.
### Subsystem interface
The subsystem message types needing an update to support the new primitives are: `CollatorProtocolMessage`, `DisputeCoordinatorMessage`, `AvailabilityRecoveryMessage`, `RuntimeApiRequest`, `ProvisionerMessage`, `CollationGenerationMessage`, `ProspectiveParachainsMessage`.
This also involves changes in all subsystems that send any of these messages as well as fixing their tests.
#### Network protocol upgrade
Only two new protocol versions are required to be released:
- Bump the collator protocol to `version 3` as required by the new version of `CandidateReceipt`, which is used in collation advertisements.
- New req/res protocol `AttestedCandidateV3` for fetching the `CoreIndex` committed candidate receipt and statements about it
Changes:
- In `polkadot-node-network-protocol` implement the new network protocols versions and fix tests
- `collator-protocol` version aware logic for distributing collations and compatibility tests
- `statement-distribution` implement `answer_request` for `AttestedCandidateV3`
- `statement-distribution` fallback mechanism in `dispatch_requests` to try previous version in case of failure
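The fallback mentioned in the last item can be sketched as below. This is a simplified, synchronous model with assumptions throughout: the transport is a caller-supplied closure, the `Protocol` and `RequestError` types are hypothetical, the previous version is assumed to be named `AttestedCandidateV2`, and falling back only when the peer does not support the protocol (rather than on every failure) is a design choice made for this example.

```rust
/// Hypothetical identifiers for the two request/response protocol versions.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Protocol {
    AttestedCandidateV2,
    AttestedCandidateV3,
}

/// Hypothetical failure modes of a single request.
#[derive(Debug, PartialEq)]
pub enum RequestError {
    /// The peer does not speak the requested protocol version.
    Unsupported,
    /// Any other failure; retrying on an older protocol would not help.
    Timeout,
}

/// Tries the newest protocol first and retries on the previous one
/// only if the peer reports that the new protocol is unsupported.
pub fn request_with_fallback<F>(mut send: F) -> Result<(Protocol, Vec<u8>), RequestError>
where
    F: FnMut(Protocol) -> Result<Vec<u8>, RequestError>,
{
    match send(Protocol::AttestedCandidateV3) {
        Ok(response) => Ok((Protocol::AttestedCandidateV3, response)),
        Err(RequestError::Unsupported) => send(Protocol::AttestedCandidateV2)
            .map(|response| (Protocol::AttestedCandidateV2, response)),
        Err(other) => Err(other),
    }
}
```

Returning the protocol alongside the response lets the caller decode the payload with the matching receipt version.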
#### Runtime API subsystem
Just need to implement the new runtime API versions and cache.
#### Collator protocol
We need to add new internal types that wrap the new receipts.
- on collator side implement `CollationV2`
- on validator side we add sanity check early for out of range core index in descriptor
#### Collation generation subsystem
- switch to `para_backing_state_v2` relay chain APIs
- `construct_and_distribute_receipt` switch to polkadot v8 and node primitives
- handle `SubmitCollationParamsV2` in `handle_submit_collation`
#### Candidate validation changes
- check validation output `core_index` commitment matches the one in the descriptor.
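The two core index checks (the early range check on the validator side of the collator protocol, and the commitment check in candidate validation) could look roughly like this. The function names, the error type and the use of plain `u32` core indices are assumptions for illustration.

```rust
/// Hypothetical error type covering both checks.
#[derive(Debug, PartialEq)]
pub enum CoreIndexError {
    /// The descriptor names a core that does not exist.
    OutOfRange { core_index: u32, n_cores: u32 },
    /// The PVF output commits to a different core than the descriptor.
    CommitmentMismatch { descriptor: u32, committed: u32 },
}

/// Early sanity check: reject descriptors pointing at a non-existent core.
pub fn check_descriptor_core_index(core_index: u32, n_cores: u32) -> Result<(), CoreIndexError> {
    if core_index < n_cores {
        Ok(())
    } else {
        Err(CoreIndexError::OutOfRange { core_index, n_cores })
    }
}

/// Validation check: the core index committed by the PVF must match the descriptor.
pub fn check_core_index_commitment(descriptor: u32, committed: u32) -> Result<(), CoreIndexError> {
    if descriptor == committed {
        Ok(())
    } else {
        Err(CoreIndexError::CommitmentMismatch { descriptor, committed })
    }
}
```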
#### Tests
- integration and unit tests covering network protocol backwards compatibility
- zombienet tests
- Versi network and runtime upgrades
### Cumulus node
- switch to using the Polkadot v8 primitives and v2 parachain primitives
- cumulus pov-recovery switch to new runtime API `candidates_pending_availability_v2`
### Documentation
The implementers guide needs to be updated to reflect all the changes. A step-by-step guide should be created for upgrading from the MVP or enabling elastic scaling. The steps should be the same whether a team is migrating from the MVP or is not using elastic scaling.