# beefy authority set optimisations
#### (see [#12820](https://github.com/paritytech/substrate/issues/12820) for context)
The issue with the optimisation introduced in [#12857](https://github.com/paritytech/substrate/pull/12857) is that we still need a fingerprint for the leaves to ensure that the authority subsample is actually committed to. The merkle path in the normal proof was committed to, so it provided exactly this, whereas the lexical hash ordering optimisation still gives every leaf a unique merkle path, but those paths are no longer committed to.
Here are three fixes/alternative optimisations we can look into:
#### 1. embed the index in the leaf itself
@doubledup suggested embedding the bitfield index in the leaf itself. At first this seems strange from the prover's perspective, since they have these indices readily available, but the light client verifier, who only stores the root, does not. When verifying the final bitfield, the light client verifier can then also check that every leaf provided by the prover has an embedded index that is also 1 in the final bitfield. Assuming the merkle root committed to is honest (which we do), one doesn't need to verify that the embedded indices are distinct. I encourage exploring this approach; a rough sketch follows.
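A minimal Rust sketch of the verifier-side cross-check, assuming a hypothetical leaf layout that commits to the index alongside the validator address (all names and types here are illustrative, not the existing light client code):

```rust
/// Hypothetical leaf layout: the bitfield index is committed alongside the
/// validator address, i.e. it is part of the hashed leaf pre-image.
struct ValidatorLeaf {
    index: u32,        // position in the signer bitfield, embedded in the leaf
    address: [u8; 20], // validator's Ethereum-style address
}

/// Verifier-side check: every proven leaf's embedded index must be a 1 bit
/// in the final bitfield the prover committed to.
fn check_embedded_indices(leaves: &[ValidatorLeaf], final_bitfield: &[u64]) -> bool {
    leaves.iter().all(|leaf| {
        let word = (leaf.index / 64) as usize;
        let bit = leaf.index % 64;
        final_bitfield
            .get(word)
            .map_or(false, |&w| (w >> bit) & 1 == 1)
    })
}

fn main() {
    let leaves = vec![
        ValidatorLeaf { index: 0, address: [0u8; 20] },
        ValidatorLeaf { index: 3, address: [1u8; 20] },
    ];
    // Bits 0 and 3 set in the first word of the final bitfield.
    let final_bitfield = vec![0b1001u64];
    assert!(check_embedded_indices(&leaves, &final_bitfield));
}
```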
#### 2. commit to the lexical path
One could commit to the "twist" that's performed in the lexical ordering, and then verify that this is respected in the proof calculations:
1. the bitfield the prover commits to doesn't encode the positions (i.e. paths) of the leaves in the original merkle tree, but instead the lexical hash ordering path for every leaf.
2. when the verifier recomputes the root from a leaf and its copath, at step `n` in the calculation, they increment a running counter by `1 << n` if the current accumulator is lexically after the copath item.
3. after computing the root, the verifier also checks that this counter matches the claimed bitfield position (i.e. lexical hash ordering path).
This would work as a fingerprint scheme and might be more efficient (for the light client verifier) than the reverted scheme: it infers & verifies the positions from the hashing process, rather than using the positions to decide the ordering in the hashing.
It does increase complexity for the prover, since they have to create a mapping from the original leaf positions to their images in the lexical hash ordering scheme (the same procedure as step 2 above for the verifier) prior to committing to the bitfield.
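A minimal Rust sketch of the idea, using a toy hash and `u64` digests as stand-ins for keccak256 and 32-byte hashes (all names are hypothetical): the copath fold recovers the lexical position from the same comparisons that decide hashing order, and the result is checked against the position claimed in the bitfield.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

type Digest = u64; // stand-in for a 32-byte hash

/// Toy parent-node hash; a real verifier would use keccak256(left || right).
fn hash_pair(a: Digest, b: Digest) -> Digest {
    let mut h = DefaultHasher::new();
    (a, b).hash(&mut h);
    h.finish()
}

/// Lexical ordering: the smaller digest always goes on the left.
fn parent(a: Digest, b: Digest) -> Digest {
    if a <= b { hash_pair(a, b) } else { hash_pair(b, a) }
}

/// Fold the copath into a root, recovering the leaf's lexical position:
/// bit n is set iff the accumulator is lexically after the copath item at step n.
fn fold_copath(leaf: Digest, copath: &[Digest]) -> (Digest, u64) {
    let mut acc = leaf;
    let mut position = 0u64;
    for (n, &sibling) in copath.iter().enumerate() {
        if acc > sibling {
            position |= 1 << n;
        }
        acc = parent(acc, sibling);
    }
    (acc, position)
}

/// Verifier check: the recomputed root must match, and the recovered position
/// must equal the lexical position the prover claimed in the bitfield.
fn verify(root: Digest, leaf: Digest, copath: &[Digest], claimed_position: u64) -> bool {
    let (computed_root, position) = fold_copath(leaf, copath);
    computed_root == root && position == claimed_position
}

fn main() {
    // Build a tiny 4-leaf lexically ordered tree by hand.
    let (l0, l1, l2, l3) = (10u64, 20u64, 30u64, 40u64);
    let (p01, p23) = (parent(l0, l1), parent(l2, l3));
    let root = parent(p01, p23);

    // Prove l2 with copath [l3, p01]. The prover derives the lexical position
    // with the same fold (the prover-side mapping mentioned above) and commits
    // to it in the bitfield; the verifier then re-derives and checks it.
    let copath = [l3, p01];
    let (_, claimed) = fold_copath(l2, &copath);
    assert!(verify(root, l2, &copath, claimed));
}
```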
#### 3. don't merkelise at all
If the update frequency is high enough, then the amortised cost of simply storing the leaves plain might be lower than merkelising them as we currently do:
1. Going from the fee schedule in Ethereum's Berlin spec and assuming 1000 validators, there'd be a once-off 20M gas cost for setting the leaves (`G_sset`: 20k gas per 32 byte validator address) in the submitInitial call.
2. Thereafter, you'd have a cost of 2900 gas per validator rotated in/out (`G_sreset`), so up to 2.9M gas per era.
3. Following the current benchmarks of the ["old" unoptimised verifier at ~18k gas](https://github.com/doubledup/merkle-proof-optimisation/blob/main/.gas-snapshot) for a validator set size of 1000, and guesstimating the old merkle root computation itself @ 15k gas (based on the improvement already seen from the lexical sorting optimisation down to 11k gas), you'd consequently reach breakeven for just storing the validator leaves plain (worst case 2.9M gas hit per era) and verifying their inclusion in the bitfield directly at 2.9M/15k ≈ 200 updates per era (on polkadot). So with these rough figures, going for plain leaf storage would make sense if updating at least every 7 minutes.
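Spelling out that breakeven arithmetic, assuming 24-hour eras on polkadot and the figures above:

$$
\frac{2.9\,\text{M gas per era}}{15\,\text{k gas per proof}} \approx 193 \approx 200\ \text{updates per era},
\qquad
\frac{24 \times 60\ \text{min}}{200} \approx 7\ \text{min between updates}
$$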
The 7 minutes assumes you're doing a full validator set rotation every era. As a loose indication, polkadot's validator set stayed ~75% the same over the 14 eras I checked (1062-1076), so the amortised update cost per era could well be an order of magnitude lower, i.e. plain storage might still be better even at an hourly update interval.