# Mutable Ballots

## Steady State

### Ballot Structure

This is (more or less) the ballot structure today:

```go=
type Ballot struct {
	InnerBallot
	Signature []byte
}

type InnerBallot struct {
	AtxID             ATXID
	EligibilityProofs []VotingEligibilityProof
	Votes             Votes
	RefBallot         BallotID
	EpochData         *EpochData
}

type Votes struct {
	Base             BallotID
	Support, Against []BlockID
	Abstain          []LayerID
}

type EpochData struct {
	ActiveSet []ATXID
	Beacon    Beacon
}
```

The signature on line 3 is constructed by signing the serialized `InnerBallot` with the smesher's key.

### Validation

Validation today works like this (stop immediately if a step fails):

1. Validate signature.
2. Validate data availability.
3. Validate eligibility.

## Proposal

### Layer Hash

The layer hash is calculated as follows:

```python
layer_hashes[l] = hash(layer_hashes[l-1], layer_opinion)
```

`layer_opinion` is one of:

1. A sentinel value indicating "abstain". This value should be shorter than a block ID, possibly the byte `0x00`.
2. The concatenated IDs of valid blocks, sorted in lexical order. This list can also be zero-length, indicating a vote for the empty layer.

The layer hash should be accessible any time a ballot for the next layer is validated or generated, regardless of the layer's status (validated or not). Whenever a layer is validated, the cached layer hash for that layer and any subsequent layers should be invalidated and recalculated the next time it's needed.

### Ballot Structure

The proposal is to replace the `Votes` in the `InnerBallot` with `HistoryHash`, a 32-byte digest of the smesher's view of history, and to place the `Votes` directly in the `Ballot`, where they are not part of the signed message:

```go=
type Ballot struct {
	InnerBallot
	Votes     Votes
	Signature []byte
}

type InnerBallot struct {
	AtxID             ATXID
	EligibilityProofs []VotingEligibilityProof
	HistoryHash       Hash32
	RefBallot         BallotID
	EpochData         *EpochData
}
```

:::info
This change means that ballots are 32 bytes bigger than they otherwise need to be.
With no optimizations, that's ~168MB/year of additional mesh growth, but assuming that most honest ballots share a view, this can be almost entirely eliminated (assuming we store our own view's history of layer hashes), excluding malicious/prunable ballots.

We considered not explicitly including the `HistoryHash` and instead deducing it from the `Votes`, but this creates issues with validation: a malicious actor could force nodes to perform a lot of work before learning that a ballot is malicious.
:::

### Validation

Validation will require an additional step:

1. Validate signature.
2. Validate data availability.
3. Validate eligibility.
4. ==NEW== Validate that `HistoryHash` matches `Votes`.

Step 4 is trivial for ballots that vote like the validating node's local view (compare the ballot's `HistoryHash` and `Votes` to the local view). For ballots with differing opinions on history, the cost of this step depends on how far back the disagreement lies: one hash operation per layer into the past.

**Example:** *A ballot in layer 101 should have a `HistoryHash` representing layers up to 100. If its view diverges from the validating node's at layer 91, the validator has to recalculate the layer hashes for layers 91-100, so validation costs 10 more hashes than for a ballot it agrees with.*

## Prioritization of Ballots

### For Genesis

For genesis we don't do any prioritization. The risk is that malicious smeshers could deliberately create ballots with diffs far in the past, forcing everyone to calculate many hashes to validate each ballot. The only mitigation we want to implement at genesis is a limit on how far in the past a diff can be (e.g., 288 layers = 24 hours). Any ballot with a diff farther in the past will be considered syntactically invalid and discarded.
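The layer-hash recurrence, the step-4 check, and the diff-depth limit can be sketched together in Go. This is a minimal illustration, not the node's actual API: `layerHash`, `validateHistoryHash`, the choice of SHA-256, and the in-memory caches are all assumptions made here for clarity.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"fmt"
)

// maxDiffDepth limits how far in the past a diff may reach
// (288 layers = 24 hours, per the genesis mitigation).
const maxDiffDepth = 288

// layerHash folds one layer's opinion into the running chain:
// layer_hashes[l] = hash(layer_hashes[l-1], layer_opinion).
// SHA-256 stands in for whatever hash the protocol actually uses.
func layerHash(prev, opinion []byte) []byte {
	h := sha256.New()
	h.Write(prev)
	h.Write(opinion)
	return h.Sum(nil)
}

// validateHistoryHash is a hypothetical step-4 check. localHashes[l] is
// the cached chain hash through layer l, localOpinions[l] the local
// opinion for layer l, and diffs holds the ballot's differing opinions
// keyed by layer. It also returns how many hashes were recomputed,
// illustrating the cost model: one hash per layer from the divergence
// to the tip.
func validateHistoryHash(localHashes, localOpinions [][]byte, diffs map[int][]byte, historyHash []byte) (ok bool, hashes int, err error) {
	tip := len(localOpinions) - 1
	if len(diffs) == 0 {
		// Fast path: the ballot agrees with the local view, so the
		// check is a single comparison against the cached tip hash.
		return bytes.Equal(localHashes[tip], historyHash), 0, nil
	}
	// Find the earliest layer where the ballot diverges from us.
	divergence := tip + 1
	for l := range diffs {
		if l < divergence {
			divergence = l
		}
	}
	if tip-divergence+1 > maxDiffDepth {
		return false, 0, fmt.Errorf("diff deeper than %d layers: syntactically invalid", maxDiffDepth)
	}
	// Recompute forward from the last layer hash we agree on.
	prev := []byte{}
	if divergence > 0 {
		prev = localHashes[divergence-1]
	}
	for l := divergence; l <= tip; l++ {
		opinion := localOpinions[l]
		if d, hasDiff := diffs[l]; hasDiff {
			opinion = d
		}
		prev = layerHash(prev, opinion)
		hashes++
	}
	return bytes.Equal(prev, historyHash), hashes, nil
}

func main() {
	// Build a local view of layers 0..100.
	opinions := make([][]byte, 101)
	hashes := make([][]byte, 101)
	prev := []byte{}
	for l := range opinions {
		opinions[l] = []byte(fmt.Sprintf("blocks-of-layer-%d", l))
		prev = layerHash(prev, opinions[l])
		hashes[l] = prev
	}
	// A ballot that disagrees at layer 91, as in the example above:
	// validation should cost 10 hash operations (layers 91-100).
	diffs := map[int][]byte{91: []byte("different-blocks")}
	p := hashes[90]
	for l := 91; l <= 100; l++ {
		op := opinions[l]
		if d, has := diffs[l]; has {
			op = d
		}
		p = layerHash(p, op)
	}
	ok, n, _ := validateHistoryHash(hashes, opinions, diffs, p)
	fmt.Println(ok, n) // true 10
}
```

Note how the fast path never touches the hash chain at all, which is what makes agreeing ballots so much cheaper to validate than disagreeing ones.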
### Post-genesis

Because some ballots are more expensive to validate than others, it would be nice if we could validate them with less urgency. It just so happens that we can. As long as the Verifying Tortoise is able to validate layers, we don't actually need to consider ballots that disagree with our view of history.

Ballots that share our view of history have the same `HistoryHash` as us, so validating them is cheaper and faster. We therefore propose prioritizing those ballots in the validation queue. The node should only immediately validate (and gossip) ballots that agree with its view of history. The rest can be queued up. During idle time the node can validate and gossip the queued-up ballots.

One caveat is that we currently use valid-but-disagreeing ballots to calculate "uncounted weight". A safer way to quantify "uncounted weight", which also works without validating disagreeing ballots, is to calculate the expected weight in each layer based on published ATXs.

If the node fails to advance consensus with the Verifying Tortoise, it should finish processing the queue of ballots before commencing the Full Tortoise (which may be affected by those queued-up ballots).

:::danger
This approach means we're slowing down (if not entirely censoring) ballots we disagree with. Is this approach acceptable?
:::
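The routing described above can be sketched as a small scheduler. This is a sketch under stated assumptions: the `Prioritizer` type, its method names, and the pared-down `Ballot` are hypothetical, and the deferred path elides the actual step-4 recomputation.

```go
package main

import (
	"bytes"
	"fmt"
)

// Ballot is pared down to the two fields this scheduler cares about.
type Ballot struct {
	ID          string
	HistoryHash []byte
}

// Prioritizer routes incoming ballots: those sharing the local view are
// validated (and would be gossiped) immediately; the rest wait in a
// queue for idle time.
type Prioritizer struct {
	localHistoryHash []byte
	deferred         []Ballot
	validated        []string
}

func (p *Prioritizer) Receive(b Ballot) {
	if bytes.Equal(b.HistoryHash, p.localHistoryHash) {
		// Cheap: matching HistoryHash makes step 4 a single comparison,
		// so validate and gossip right away.
		p.validated = append(p.validated, b.ID)
		return
	}
	// Expensive to validate; defer until idle time.
	p.deferred = append(p.deferred, b)
}

// DrainDeferred processes the queue during idle time, and must also be
// called before commencing the Full Tortoise, since queued ballots may
// affect its outcome.
func (p *Prioritizer) DrainDeferred() {
	for _, b := range p.deferred {
		// A real node would run the full step-4 layer-hash
		// recomputation here before accepting the ballot.
		p.validated = append(p.validated, b.ID)
	}
	p.deferred = nil
}

func main() {
	local := []byte{0xaa}
	p := &Prioritizer{localHistoryHash: local}
	p.Receive(Ballot{ID: "agrees", HistoryHash: local})
	p.Receive(Ballot{ID: "disagrees", HistoryHash: []byte{0xbb}})
	fmt.Println(p.validated, len(p.deferred)) // [agrees] 1
	p.DrainDeferred()
	fmt.Println(p.validated) // [agrees disagrees]
}
```

The disagreeing ballot is never dropped, only delayed, which is exactly the censorship-versus-throughput trade-off the warning above asks about.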