# Cross-session backing
**GOAL: Allow candidates built on relay parents in the old session to be backed on chain in the next session.**
## Context
For backing purposes, the last relay parent of a session is actually considered part of session N+1.
The session index runtime API of a relay parent is actually called `SessionIndexForChild`. This is because the session change happens at the end of the last block in the session (and runtime APIs are called on the post-state of a block).
Therefore, in a new session, we already allow backing a candidate that is built on the last relay parent of the previous session (but all subsystems and the runtime consider this relay parent as part of the next session).
If this were not the case, there would be no valid relay parent for the first block of a session, causing yet another skipped backing opportunity.
The allowed ancestry length (how old the relay parent can be) for backing is determined by the `scheduling_lookahead` value. At present, the allowed ancestor set is cleared at the session boundary, with only the last block of the previous session remaining.
## Assumptions
- The implementation of the [cross-session availability plan](https://hackmd.io/NKUWBZC7Q8ipvg1FYibRzQ) is a prerequisite of this design (at a minimum the runtime changes and the approval voting fix, as well as having the new node feature enabled). We build further on the decisions made in the previous doc.
- The core for which the candidate was built (from the v2 descriptor field) still exists and is still assigned to this paraid.
- Paraid is still registered.
- The validator set is largely unchanged between sessions (but they are still shuffled, of course).
- Validators from the previous session are still online and participate in the availability and bitfield distribution processes. To achieve full availability, at least one backer and 2/3 of the validators need to be available.
- (Initial, can be lifted later if needed) The candidates have already been backed offchain and the statements have reached the block author in the new session. This is a fine assumption: candidates built on the very last block of the old session are actually part of the new session, so candidates built on the relay parent before that have had at least 6 seconds to reach the backing threshold.
- We maintain the assumption (from the availability design) that if any changes to the `HostConfiguration` occur in the new session, cross-session backing is not possible in the new session.
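The "2/3 of the validators" figure above corresponds to the usual BFT supermajority threshold; a minimal sketch (illustrative of the arithmetic, not the exact runtime constant):

```rust
// Hedged sketch: the smallest number of validators that still constitutes a
// supermajority, tolerating up to f = (n - 1) / 3 faulty or offline validators.
fn availability_threshold(n_validators: usize) -> usize {
    let f = n_validators.saturating_sub(1) / 3;
    n_validators - f
}
```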
While for cross-session availability we could make the simplifying assumption that one block is enough to have the av-cores freed, we cannot make the same assumption for backing candidates. This is because we may have a large unincluded segment of candidates (all built on relay parents in the last session). Quite possibly, the parachain may not be using elastic scaling (or basti blocks) and would therefore still lose part of its candidates.
Moreover, availability and bitfield distribution still need to continue for more than one block (so we no longer benefit from the simplifications of one-block availability post session change).
However, we maintain the design decision of only allowing one block for availability of cores that have been occupied in the previous session.
This simplifies the logic on the first block of the session, by only having to process bitfields in the context of the previous session.
## Solution
We build on the design decision from cross-session availability: the old validators are the ones participating in all protocol operations for the candidates; most importantly availability distribution, bitfield distribution and availability recovery (but also, implicitly, approval-voting and disputes).
In addition, with that design, we modified the meaning of the session index in the candidate descriptor: it is no longer the session in which it becomes included.
With this design, we further modify the meaning: it is no longer the session in which it is backed on chain.
The remaining fundamental meaning is:
- the session index of the child of the candidate's relay parent (in whose context it is authored and executed)
- the session index in whose context the candidate goes through the offchain backing and availability protocols.
**Note: The cross-session availability design is assumed to be already implemented**
### Cumulus
No changes are needed on the collator side, because we make the assumption that candidates built in the old session have already been backed offchain. Collators should not author more blocks in the old session once they see a session change.
### Note on bitfields
The first block of a new session will only include bitfields for the candidates that have been pending availability since the last session.
The next couple of blocks, however (depending on the `parasAvailabilityPeriod` and the size of the allowed ancestry), could include bitfields for candidates whose relay parents span both the old and the new session.
In other words, we potentially double the number of bitfields to sign, distribute and process by validators that are part of both the old and the new session.
In the other cases (when the validator set differs), nothing changes:
- Validators that used to be active but no longer are will only participate in bitfield signing and distribution for the first block of the new session.
- Validators that are just now becoming active will participate for candidates authored in the new session, as before.
We consider this additional network traffic as acceptable, as it only occurs for the first several blocks of a session.
If it ever proves to be problematic, some options for optimising it would be:
- assuming a `parasAvailabilityPeriod` of 1 for blocks authored in the past session
- restricting backing candidates from the old session to the very first block of the new session (this assumes however that sufficient cores are assigned to the para or that only one block remains from the old session)
- more complicated but the most flexible: add a new bitfield distribution protocol, one which potentially bundles two separate bitfields from a single validator (with a single signature).
Similarly, the runtime will also potentially handle twice the amount of bitfields.
Just as before, if this proves unacceptable, we could define a new bitfield format (changing the inherent data type), which could encapsulate both bitfields.
For now, assume that the bitfields are ordered: first, the ones from validators in the current session. Once a signature check fails, switch to processing them according to the rules of the previous session. This is backwards compatible:
- (old node, new runtime): unupgraded nodes will only supply new session bitfields in the inherent. No bitfields will be dropped and availability for old candidates will just not progress
- (new node, old runtime): upgraded node will first supply the new session bitfields in the inherent, followed by the old session bitfields. The old runtime will process the bitfields valid in the new session and drop the ones valid in the old session.
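The ordered processing rule above can be sketched as follows (the `verify_current` checker is a hypothetical stand-in for the signature check against the current session's signing context):

```rust
// Sketch: process bitfields under the current session's signing context until
// the first signature-check failure, then hand the remainder over to be
// processed under the previous session's rules.
fn split_bitfields<'a, B>(
    bitfields: &'a [B],
    verify_current: impl Fn(&B) -> bool,
) -> (&'a [B], &'a [B]) {
    // The first index where a current-session check fails marks the
    // switch-over point to previous-session processing.
    let split = bitfields
        .iter()
        .position(|b| !verify_current(b))
        .unwrap_or(bitfields.len());
    bitfields.split_at(split)
}
```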
Both in the node and the runtime, to determine the session index of a particular bitfield, we can simply look at the bits that are set and their core indices. If they correspond only to cores occupied by previous session candidates, assume the previous session index. If they correspond to candidates built on RPs in the current session, assume the current session. If they have a mix, the bitfield is invalid. If all bits are zeroed, drop the bitfield, as it's useless anyway.
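A minimal sketch of this inference rule (names are hypothetical, not the actual runtime types):

```rust
// Infer which session a bitfield belongs to from which occupied cores its set
// bits point at.
#[derive(Debug, PartialEq)]
enum BitfieldSession {
    Previous,
    Current,
    Invalid, // mixes old- and new-session cores
    Empty,   // all bits zero: drop it, it's useless anyway
}

/// `bits[i]` is the availability bit for core `i`. `old_session_core[i]` is
/// true if core `i` is occupied by a candidate whose relay parent is in the
/// previous session.
fn infer_bitfield_session(bits: &[bool], old_session_core: &[bool]) -> BitfieldSession {
    let mut saw_old = false;
    let mut saw_new = false;
    for (i, set) in bits.iter().enumerate() {
        if !set {
            continue;
        }
        if old_session_core.get(i).copied().unwrap_or(false) {
            saw_old = true;
        } else {
            saw_new = true;
        }
    }
    match (saw_old, saw_new) {
        (true, true) => BitfieldSession::Invalid,
        (true, false) => BitfieldSession::Previous,
        (false, true) => BitfieldSession::Current,
        (false, false) => BitfieldSession::Empty,
    }
}
```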
### Runtime changes
When implementing, double check that all of the changes made for the cross-session availability properly use the session index from the relay parent of the candidate (not just the session index at the time of on-chain backing).
#### `shared`
The `AllowedRelayParents` are cleared on session change. We should remove this restriction (or store the allowed RPs from the previous session in a separate storage item).
On a session change, take into account the `scheduling_lookahead` of the new session to determine the number of allowed RPs.
We also need to ensure that the value of the `scheduling_lookahead` does not make us cross two sessions. For this, we should store the `SessionStartBlock` of the previous session as well (also needed for the `scheduler` module, see below).
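A rough sketch of how the shared pallet could track allowed relay parents across the boundary, under the constraints above (all names hypothetical):

```rust
// Keep allowed relay parents across a session boundary, but never let the
// ancestry reach back beyond the previous session's start block (so it never
// spans more than two sessions).
struct AllowedRelayParents {
    /// (block_number, session_index) of tracked relay parents, newest last.
    buffer: Vec<(u32, u32)>,
}

impl AllowedRelayParents {
    /// On a new block: push it, trim to the new session's
    /// `scheduling_lookahead`, and drop anything older than the previous
    /// session's start block.
    fn update(
        &mut self,
        number: u32,
        session: u32,
        scheduling_lookahead: usize,
        prev_session_start_block: u32,
    ) {
        self.buffer.push((number, session));
        // Keep at most `scheduling_lookahead` ancestors.
        let excess = self.buffer.len().saturating_sub(scheduling_lookahead);
        self.buffer.drain(..excess);
        // Never reach before the previous session.
        self.buffer.retain(|&(n, _)| n >= prev_session_start_block);
    }
}
```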
#### `scheduler`
The `group_assigned_to_core` function is being exposed by this module and used in multiple places in the runtime, as well as the `availability_cores` runtime API.
This function no longer makes sense without the session info context. The same core would have different assigned groups depending on the session.
At the moment, for a relay parent from the previous session, this would return `None`.
The session index needs to be taken into account. If the block number supplied is less than the current `SessionStartBlock`, check if it's part of the `AllowedRelayParents`.
If it is, verify it's part of the previous session and retrieve the validator groups and rotation info from the `session_info` pallet for the computation. Also return the session index.
In order to calculate the assigned group, we also need to persist the previous session's start block as well as the `group_rotation_frequency`, which are currently not part of the buffered session info. We need to either add them there or add a new storage item on the scheduler.
A similar problem exists with the `group_validators` function. It needs to take a new argument: the session index.
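A hedged sketch of a session-aware group assignment; the rotation formula mirrors the round-robin `GroupRotationInfo` logic, with the caller choosing the session start block and rotation frequency from either the current or the previous session (names and signature are illustrative):

```rust
// Groups rotate round-robin every `rotation_frequency` blocks since the
// session start. For a relay parent older than the current `SessionStartBlock`,
// the caller would pass the *previous* session's start block and rotation
// frequency instead.
fn group_assigned_to_core(
    core: u32,
    at_block: u32,
    session_start_block: u32,
    rotation_frequency: u32,
    n_groups: u32,
) -> Option<u32> {
    if n_groups == 0 || rotation_frequency == 0 || at_block < session_start_block {
        return None;
    }
    let rotations = (at_block - session_start_block) / rotation_frequency;
    Some((core + rotations) % n_groups)
}
```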
#### `paras_inherent`
1. Relax the v2 session index check in `sanitize_backed_candidate_v2` to also allow the previous session. Validate that:
- the relay parent of the candidate is indeed in the previous session.
- the session info of the previous session is available in the runtime (will always be true if the configured `dispute_period` is at least 1).
- there was no configuration change in this session
1. `sanitize_bitfields` needs to stop assuming that the bitfield's session index for the signing context is the current session. The check needs to be performed based on the relay parent of the candidate.
1. `filter_backed_statements_from_disabled_validators`. This function currently filters out backing votes coming from disabled validators. Validator disabling works per-session, so we face a question: do we consider these validators no longer disabled for the purposes of these couple of blocks from the last session, or do we still consider them disabled? If the latter, we'd need to persist the disabled validators of the past session as well.
1. `T::DisputesHandler::note_included` needs to use the real session index of the relay parent of the candidate (not the backing session, not the including session).
1. `current_concluded_invalid_disputes` needs to also take into account freeing cores for disputes for candidates in the old session (as they may be backed now)
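The relaxed check from step 1 could look roughly like this (a sketch, not the actual runtime code; the parameters are hypothetical inputs derived from the conditions above):

```rust
// Accept a previous-session candidate only if its relay parent really is in
// the previous session, the previous session's info is still buffered in the
// runtime, and the host configuration did not change at the session boundary.
fn candidate_session_ok(
    descriptor_session: u32,
    current_session: u32,
    relay_parent_session: u32,
    prev_session_info_available: bool,
    config_changed_this_session: bool,
) -> bool {
    if descriptor_session == current_session {
        return true;
    }
    descriptor_session + 1 == current_session
        && relay_parent_session == descriptor_session
        && prev_session_info_available
        && !config_changed_this_session
}
```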
#### `inclusion`
In `process_candidates`, `check_backing_votes` needs to take into account the session index of the relay parent of the candidate, instead of the current session index.
Moreover, the group index needs to be determined using the new `scheduler::Pallet::<T>::group_assigned_to_core` implementation (should be seamless for this code).
`update_pending_availability_and_get_freed_cores` needs to handle bitfields from both the previous and new session's validators, if there are still candidates pending availability from the old session.
### Node changes
Because of the assumption that candidates built on the old session's relay parents are already backed offchain, some subsystems involved in offchain backing do not need modifications (collator-protocol, statement-distribution).
The ones that are involved in availability and creating the inherent data need changes.
#### Prospective-parachains
On a new leaf, prospective parachains constructs a new view of the backable chain by:
1. adding the currently pending availability candidates
2. retrieving the backable chain of the previous leaf
The logic needs to be modified so that the fragment chain `Scope` constructs the allowed RP ancestor set by including relay parents from the old session, if there was no configuration change in the new session. A new runtime API which returns whether or not there was a `HostConfiguration` change in this session would be needed.
#### Backing
While the backing subsystem does not need to initiate validation or statement distribution for candidates in the old session, it does need to make sure they are preserved so that the provisioner can retrieve them when building the inherent.
When handling an active leaf update, it currently clears RPs and candidates not part of the implicit view (and the implicit view does not contain RPs from the old session). We need to add a new function on the implicit view which returns all RPs in the allowed ancestry set, even if they are part of the old session (but only if certain preconditions are met: the paraid still has a core assigned and no configuration change occurred in the new session).
#### Availability-distribution
Changes outlined in the cross-session availability doc will be sufficient.
#### Bitfield signing and distribution
Bitfield signing may need to happen in parallel for two sessions, depending on whether an old validator is still active in the new session:
- validator used to be active, no longer is: participates in signing and distribution according to the old session config, if there are still candidates pending availability in the first `scheduling_lookahead` blocks of the new session.
- validator just became active: only participates in the new session's signing and distribution (does not even query the av store for candidates that are part of the old session's RPs).
- validator used to be active and remains active: combines both bullet points above (in parallel)
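The three cases above collapse to a simple participation rule; a sketch (hypothetical helper):

```rust
// Returns (sign for the old session, sign for the new session), given the
// validator's activity across the session boundary and whether any candidates
// from the old session are still pending availability.
fn signing_sessions(
    active_before: bool,
    active_now: bool,
    old_candidates_pending: bool,
) -> (bool, bool) {
    (active_before && old_candidates_pending, active_now)
}
```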
#### Provisioner
Construct the inherent data according to the rules accepted by the runtime (first the bitfields from the new session candidates, then any bitfields for the old session candidates).
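A sketch of the expected ordering (the `is_prev_session` predicate is hypothetical, e.g. derived from which cores a bitfield's set bits refer to):

```rust
// Stable partition of the bitfields for the inherent: new-session bitfields
// first (keeping their relative order), old-session bitfields after, matching
// the ordered processing rule the runtime applies.
fn order_bitfields<B, F: Fn(&B) -> bool>(mut bitfields: Vec<B>, is_prev_session: F) -> Vec<B> {
    // `sort_by_key` is stable, and `false < true`, so current-session
    // bitfields come first.
    bitfields.sort_by_key(|b| is_prev_session(b));
    bitfields
}
```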
#### Approvals
Make sure the fixes for cross-session availability take into account, for recovery, the session index of the relay parent of the candidate (not when it is included, not when it is backed on-chain).
Approval voting and distribution already work over multiple sessions in parallel (as finality may lag over a session change and approvals happen in parallel for unfinalized relay parents). However, it was designed for multiple relay parents spanning different sessions. It was not designed for the candidates in a relay parent to span across different sessions.
Both the assignment certificates and the approval votes may contain payloads for several candidates in a block. This was a performance optimisation, to clear the bottleneck of network traffic and expensive VRF signature checking (see [#1178](https://github.com/paritytech/polkadot-sdk/pull/1178) and [#1191](https://github.com/paritytech/polkadot-sdk/pull/1191)).
This complicates things significantly, as we cannot assume seamless validator participation in both sessions for the same relay parent:
- The validator index of the message author changes between sessions, but right now it is assumed that it's unique for the candidates in a block. Moreover, an old validator may not even be active any more.
- The vote signature is created over a payload containing a session index, and therefore only valid in that context. Here again, we have two different session indices to choose from.
To work around this, there are a couple of options (both very complicated):
1. We could assume that the validators participating in approvals are the ones in the new session. However, this would introduce a duality which complicates the disputes process: the backing validators are from the old session but the approval checkers are from the new session. Another piece of complexity would be added by having two different session indices for a candidate (the backing session index and the inclusion session index). The backing index would be used for av-recovery but the inclusion session index would be used for approval checking.
2. Return to non-coalescing of votes and certificates for the blocks which have candidates from both sessions. Sign and distribute separate messages from the same validator for the candidates in different sessions. Adds complexity due to the duality of the implementation and would increase network traffic and delay finality under load.
#### Disputes
There are places in the code where we make the assumption of a session index where the candidate is backed on chain being the same as the session index of the candidate's relay parent (some examples: [here](https://github.com/paritytech/polkadot-sdk/blob/9972470602d118fb07d968460b8a6dd5d4523141/polkadot/node/core/dispute-coordinator/src/initialized.rs#L657) and [here](https://github.com/paritytech/polkadot-sdk/blob/9972470602d118fb07d968460b8a6dd5d4523141/polkadot/node/core/dispute-coordinator/src/initialized.rs#L671)).
Moreover, depending on the solution chosen for approvals, disputes handling may need adaptation. Especially in the case of having backing votes and approval votes coming from different validator sets.