Block size is one of the factors limiting scalability: a block must be sent to multiple peers, so its size directly affects network traffic and required bandwidth.
There is a spreadsheet estimating the average block size in Ethereum 2.0 at about 50K, or about 30K when compressed. However, the calculation assumes an average block contains 32 attestations (out of a maximum of 128), i.e. 2 aggregated attestations per committee on average (with 16 committees).
We argue that the average number of attestation aggregates per committee can be significantly higher in normal scenarios (no attack) - around 8 per committee, possibly more. The average block size should therefore be calculated based on 128 attestations per block, and one should consider raising the 128 limit to accommodate potentially higher demand.
However, attestations occupy most of an encoded block, so a growing number of attestations means the overall block size grows several times over. We propose ways to mitigate this and estimate the reduction gains to expect from block re-structuring or compression techniques. The main idea is that the encoded block size should not depend on the number of attestations so strongly (currently 400-1300 bytes per attestation, about 1.5x less expected when compressed).
The block's data structure allows each Attestation to contain only one AttestationData (https://github.com/ethereum/eth2.0-specs/blob/dev/specs/core/0_beacon-chain.md#attestation). AttestationData simultaneously describes a vote for a beacon state and for a shard state. So, if attesters in a committee have two distinct views beacon-wise and two distinct views shard-wise, this can result in up to 4 distinct AttestationData.
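For reference, here is a minimal sketch of the containers involved, with field sizes matching the estimates used below. The authoritative definitions are in the spec linked above; this sketch follows the spec revision the size estimates assume and may not match the latest one.

```python
from dataclasses import dataclass

# Minimal sketch of the spec containers referenced above. Field sizes
# (32-byte roots, 8-byte integers) match the per-attestation estimates
# used later; consult the linked spec for the authoritative definitions.

@dataclass
class Checkpoint:              # 40 bytes
    epoch: int                 # 8 bytes
    root: bytes                # 32 bytes

@dataclass
class Crosslink:               # 88 bytes - the shard-state vote
    shard: int                 # 8 bytes
    parent_root: bytes         # 32 bytes
    start_epoch: int           # 8 bytes
    end_epoch: int             # 8 bytes
    data_root: bytes           # 32 bytes

@dataclass
class AttestationData:         # 200 bytes total
    beacon_block_root: bytes   # 32 bytes - the beacon-state vote
    source: Checkpoint         # 40 bytes
    target: Checkpoint         # 40 bytes
    crosslink: Crosslink       # 88 bytes

@dataclass
class Attestation:
    aggregation_bits: bytes    # up to 512 bytes (4096 attesters)
    data: AttestationData      # 200 bytes
    custody_bits: bytes        # up to 512 bytes
    signature: bytes           # 96-byte aggregate BLS signature
```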
Distinct views among attesters can result from network delays, faults (even non-byzantine ones, e.g. a transit node failing or delaying a message), etc. Given 3 seconds to propagate a new block through the p2p graph to 300-1000 attesters (per committee), it's highly likely that some of them will receive the block too late and will issue an attestation for a previous block. It also looks likely that some will miss the last two blocks or even more.
For example, assume an attester has a 2% chance to miss the current slot's block and a 1% chance to miss the previous block. With 300 attesters, there is a ~6% chance that at least one attester misses both blocks (1-(1-0.02*0.01)^300 ~= 6%), which means we'll have three distinct views (new block, previous block, minus-2 block).
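A quick sanity check of this estimate (a sketch; the 2%/1% miss probabilities are the illustrative figures above, treated as independent):

```python
# Probability that at least one of n attesters misses both the current and
# the previous block, given independent per-block miss probabilities.
def p_three_views(n: int, p_curr: float = 0.02, p_prev: float = 0.01) -> float:
    p_both = p_curr * p_prev          # one attester misses two blocks in a row
    return 1 - (1 - p_both) ** n      # at least one such attester among n

print(f"{p_three_views(300):.1%}")    # ~5.8%, i.e. the ~6% quoted above
```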
This logic can be applied both to the beacon state and to the shard state. Combined, we should typically have 4 different AttestationData per committee and sometimes even more.
Moreover, if an attestation issued during some slot has not been included in the next block, it can be included in a later one. That can happen for the same reasons (e.g. network delays, clock drift, etc.). Such attestations necessarily have distinct AttestationData, since their slot numbers differ. And attestations from several preceding slots may end up included in the current one. That's why we expect an additional factor of 2 to account for late attestations.
So, we expect around 8 distinct AttestationData per committee, possibly more. That means a typical block can contain around 16*8 = 128 attestations. It may also make sense to increase the 128 limit, so that a block can include many distinct AttestationData if severe conditions lead to many delayed blocks and attestations.
Currently, an aggregate signature can contain at most one occurrence of each individual signature (a signature from a particular attester), because the set of aggregated signatures is encoded with a bitfield (aggregation_bits in the Attestation class).
However, if attestations are aggregated in a committee subnet in a random (unstructured) way (gossip-like protocols), there will be many partial attestation aggregations with random bitfield assignments. From some aggregation size onward (around sqrt(m), where m is the committee size), the probability that two partial aggregations have overlapping attester indices becomes quite high (> 0.5). It then becomes very difficult (and probably impractical) to obtain complete or near-complete bitfields, because we cannot retract conflicting individual attestations from a signature without knowing them.
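The sqrt(m) threshold is the usual birthday-style bound. A small sketch that computes the exact overlap probability for two random partial aggregations of size k in a committee of m attesters:

```python
import math

# Probability that two independent random subsets of size k, drawn from a
# committee of m attesters, share at least one attester index. For k around
# sqrt(m), P(no overlap) ~ exp(-k^2/m) ~ 1/e, so overlap exceeds 0.5.
def p_overlap(m: int, k: int) -> float:
    return 1 - math.comb(m - k, k) / math.comb(m, k)

m = 300
k = math.isqrt(m)                    # 17
print(k, f"{p_overlap(m, k):.2f}")   # 17 0.64 - conflicts are already likely
```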
That means one has to aggregate attestations in a structured way, e.g. organize aggregation trees/forests (as Handel does). While this seems a viable approach, we lose an alternative way, which is possibly easier to implement given the p2p graph and BFT requirements (Handel ensures BFT, but requires all nodes to have mutual pairwise connections, which requires adaptation for the p2p case).
An alternative way could be to store partially aggregated attestations of medium size (around 20-30% of the full committee size), which means several times more attestations are needed to cover the committee. But that is hardly feasible with the current Block structure.
The block size calculation above (the spreadsheet) assumes an average block contains 32 attestations, each taking 1320 bytes. In detail, AttestationData takes 200 bytes, the signature 96 bytes, and the aggregation and custody bitfields 512 bytes each (assuming the worst case of 4096 attesters). 32*1320 = 42240 bytes (out of the 47070-byte average block size).
Given 128 attestations, we get a significantly bigger average block size: 128*1320 + 4830 = 173790 bytes.
However, this assumes the maximum possible number of attesters (4096). For 300 attesters, the block size will be around 52K, and for 1000 attesters around 89K.
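These figures can be reproduced with the following sketch (the 4830-byte overhead is the non-attestation part of the average block from the spreadsheet):

```python
# Rough block size model reproducing the arithmetic above: per-attestation
# cost is AttestationData + signature + two bitfields sized to the committee,
# plus ~4830 bytes of non-attestation block data (from the spreadsheet).
DATA, SIG, OVERHEAD = 200, 96, 4830

def block_size(n_attestations: int, committee_size: int) -> int:
    bitfield = (committee_size + 7) // 8          # one bit per attester
    per_att = DATA + SIG + 2 * bitfield           # aggregation + custody bits
    return n_attestations * per_att + OVERHEAD

print(block_size(32, 4096))    # 47070  - the spreadsheet's average block
print(block_size(128, 4096))   # 173790 - 128 attestations, worst-case bitfields
print(block_size(128, 300))    # 52446  - ~52K for 300 attesters per committee
```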
Attestations consume most of the encoded block data. However, this is caused by the block structure and thus can be mitigated via appropriate compression or by adapting the block structure. The spreadsheet also mentions this: "Most AttestationData is identical between multiple objects, attestation bitfield can be compressed if portion of bits filled not exactly 1/2, custody bitfield can be compressed as it must be a subset of attestation bitfield."
First, the custody and aggregation bitfields are both part of the Attestation structure. However, a custody bitfield relates to a committee, while an aggregation bitfield specifies the attesters whose signatures are included in a particular aggregate attestation.
So, a custody bitfield is replicated 2-8 times (ignoring any form of compression). Instead, custody bitfields can be moved to the BeaconBlockBody structure, as a list of custody bitfields per committee (they can be concatenated into a single bitfield too). Each committee's custody bitfield will then appear only once in a block, instead of 2-8 times, and will take 8K bytes in the worst case (4096 attesters per committee). For 300-1000 attesters per committee, it will be 600-2000 bytes (for 16 committees). Since custody bits are hardly predictable, it doesn't seem reasonable to compress them any further.
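A sketch of this restructuring, with hypothetical container names (CompactAttestation and RestructuredBlockBody are illustrative, not spec containers):

```python
from dataclasses import dataclass
from typing import List

@dataclass
class CompactAttestation:
    aggregation_bits: bytes   # which attesters this aggregate covers
    data: bytes               # the 200-byte AttestationData (sketched earlier)
    signature: bytes          # 96-byte aggregate BLS signature
    # custody bits removed: they describe the committee, not this aggregate

@dataclass
class RestructuredBlockBody:
    # One custody bitfield per committee, stored once per block:
    # 16 * 512 bytes = 8K in the worst case, 600-2000 bytes for
    # 300-1000 attesters per committee.
    custody_bits_per_committee: List[bytes]
    attestations: List[CompactAttestation]
```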
AttestationData contains 2 (or 3) votes: one for a beacon state and one for a shard state. If we have 2-3 views beacon-wise and 2-3 views shard-wise, we can have 4-9 distinct AttestationData variants per committee in the worst case. AttestationData occupies a large portion of the encoded block - e.g. for 128 Attestations it amounts to 128 * 200 = 25.6K - even though the variants share bits corresponding to the same vote versions.
For example, suppose we have 2-4 distinct votes for the beacon state per slot (attestations can attest blocks of several previous slots), and assume 3-4 slots are involved. Overall, there are then 6-16 distinct beacon votes. With straightforward encoding, that's 128 * 112 bytes (Hash + 2 Checkpoints) = 14K. But if we maintain a list of distinct votes, it's 6-16 * 112 = 1-2K, plus 3-4 bits per attestation to reference an entry.
It's a bit more difficult to compress Crosslink votes, since there will be several versions per shard/committee: e.g. 2-3 per committee means 32-48 total, plus votes from previous slots, versus 128 Crosslinks with a straightforward encoding. However, different Crosslink votes share common bits (shard, parent_root, data_root, etc.): shard can be calculated, start_epoch and end_epoch can be stored in incremental form, and parent_root/data_root can be versioned (i.e. keep a list of versions per committee and several bits per attestation to store an index into the list). So instead of 128 * 88 = 11K we can store around 64 * (32+32) = 4K, plus around 1-2 bytes per attestation for references or epoch increments.
So, overall we can reduce this part of the block from 128 * 200 = 25K to something like 6-7K.
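A sketch of the deduplicated encoding behind these numbers (all names are illustrative): distinct votes go into per-block tables, and each attestation stores only small indices into them.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class VoteTables:
    beacon_votes: List[bytes]     # 6-16 distinct (Hash + 2 Checkpoints), 112 B each
    crosslink_votes: List[bytes]  # ~64 distinct (parent_root + data_root), 64 B each

@dataclass
class IndexedAttestation:
    beacon_vote_index: int        # 3-4 bits suffice for 6-16 variants
    crosslink_vote_index: int     # ~6 bits for up to 64 variants
    aggregation_bits: bytes
    signature: bytes              # 96 bytes

def dedup_size(n_att: int = 128, n_beacon: int = 16, n_crosslink: int = 64) -> int:
    tables = n_beacon * 112 + n_crosslink * 64   # vote tables stored once
    refs = n_att * 2                             # ~1-2 bytes of indices each
    return tables + refs

print(dedup_size())   # 6144, i.e. ~6K versus 128 * 200 = 25.6K naive
```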
We could probably squeeze some bytes from aggregation bitfield compression.
For example, for 128 attestations and 300 attesters per committee we would store 128 * 300 bits with a straightforward encoding. If we instead store 2-3 bits per attester referencing a (slot-specific) attestation version, we can save some kilobytes.
E.g. for the 16 committees of the main slot, we can store 16 * 300 * 2-3 bits = 1200-1800 bytes versus 64 * 300 bits = 2400 bytes for a straightforward encoding (assuming 4 attestations per committee). Encoding attestations from earlier slots will be less beneficial.
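The arithmetic behind this estimate, as a sketch (assuming 4 aggregate attestations per committee in the main slot):

```python
# One bit per (attester, attestation) pair versus a small per-attester index
# selecting which of the ~4 attestation versions the attester signed.
def straightforward_bytes(committees=16, attesters=300, atts_per_committee=4):
    return committees * attesters * atts_per_committee // 8

def indexed_bytes(committees=16, attesters=300, bits_per_attester=3):
    return committees * attesters * bits_per_attester // 8

print(straightforward_bytes())             # 2400 bytes
print(indexed_bytes(bits_per_attester=2))  # 1200 bytes
print(indexed_bytes(bits_per_attester=3))  # 1800 bytes
```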
It's possible that the generic compression libraries (e.g. Snappy) typically used to reduce network traffic/storage will be able to exploit the redundancies mentioned above. However, they may not achieve good compression ratios, since they use fast and relatively weak compression algorithms/levels. This should be verified experimentally, of course.
My experience tells me that eliminating redundancy in such an inexpensive way is a good starting point that facilitates a better compression ratio and/or higher compression speed (one can use less intelligent but faster compression levels/libraries).
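A sketch of such an experiment (assuming the python-snappy bindings; synthetic data only, so the numbers are merely indicative):

```python
import os
import snappy  # pip install python-snappy (assumed available)

# Compare how a fast generic compressor handles the naive layout (the same
# 200-byte AttestationData repeated across 128 attestations) versus a
# pre-deduplicated layout (8 distinct votes plus 1-byte indices).
distinct_votes = [os.urandom(200) for _ in range(8)]
naive = b"".join(distinct_votes[i % 8] for i in range(128))   # 25600 bytes
deduped = b"".join(distinct_votes) + bytes(128)               # 1728 bytes

print(len(naive), "->", len(snappy.compress(naive)))      # redundancy Snappy finds
print(len(deduped), "->", len(snappy.compress(deduped)))  # little left to compress
```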
We expect significantly more than 32 attestations in an average block; 128 is a more accurate estimate even under normal conditions (no attacks or severe network failures). Moreover, we think it would be beneficial to increase block capacity further, to accommodate potential severe conditions and to allow more flexibility in protocol implementation (e.g. overlapping partial aggregations). While this necessarily leads to significantly bigger blocks, block structure re-organization and/or clever compression techniques can limit the marginal cost of adding one more attestation to around 100 bytes (96 for the signature plus some bytes for references).