# Attestation aggregation: in-place aggregation
###### tags: `data structures` `algorithms`
**Author(s):** Victor Farazdagi (Prysmatic Labs)
*Last Updated: Jan 27, 2021*
[TOC]
## Overview
The aim of this document is to describe the optimal implementation of in-place aggregation of attestations
(that is, aggregation without any extra allocations) using the max-cover algorithm.
For the actual implementation of this design, see
[maxcover.go](https://github.com/prysmaticlabs/prysm/blob/develop/shared/aggregation/attestations/maxcover.go)
in our official repo.
## API
You are given a slice of pointers to attestations, `[]*Attestation`. The actual content of the `Attestation`
type is not important; all we need is for it to store its bitlist in the `Bitlist64` format used by our
max-cover algorithm implementation. The slice is backed by the following **underlying array**:
```mermaid
graph LR
a0 --- a1 --- a2 --- a3 --- a4 --- a5 --- a6 --- a7 --- a8 --- a9 --- a10
```
The aim is to aggregate as many of them as possible and come up with an updated array:
```mermaid
graph LR
b0[a0, a3, a5] --- b1[a4, a10] --- b2[a1] --- b3[a2] --- b4[a6] --- b5[a7] --- b6[a8] --- b7[a9] --- b8[nil] --- b9[nil] --- b10[nil]
style b0 fill:#adf
style b1 fill:#adf
style b8 fill:#fed
style b9 fill:#fed
style b10 fill:#fed
```
Here, the first two elements **of the same** underlying array (hence "in-place" in the title) are aggregate
attestations (the first one combines the signatures of attestations `a0`, `a3`, and `a5`). They are followed
by items that could not be aggregated (say, because their bits overlap with those of an aggregate), and the
last three slots are set to nil (and can thus be garbage collected), since the attestations they held have been
folded into the aggregates.
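The algorithm only needs each attestation's bitlist to be stored as 64-bit words, so that overlap tests and unions run a word at a time. As a rough sketch, the hypothetical `Bitlist64` below is a plain `[]uint64` with a few helpers; it is not the `Bitlist64` type from Prysm's bitfield library, and the pared-down `Attestation` (holding only `Bits`) is likewise an assumption for illustration.

```go
package main

import "math/bits"

// Bitlist64 is a hypothetical word-backed bitlist: bit i lives in word i/64.
type Bitlist64 []uint64

// NewBitlist64 returns an all-zero bitlist with room for n bits.
func NewBitlist64(n int) Bitlist64 {
	return make(Bitlist64, (n+63)/64)
}

// SetBit sets bit i to 1.
func (b Bitlist64) SetBit(i int) {
	b[i/64] |= 1 << (uint(i) % 64)
}

// Overlaps reports whether b and other (of equal length) share a set bit,
// checking 64 bits per iteration.
func (b Bitlist64) Overlaps(other Bitlist64) bool {
	for i := range b {
		if b[i]&other[i] != 0 {
			return true
		}
	}
	return false
}

// Or stores the union of b and other into dst without allocating.
func (b Bitlist64) Or(other, dst Bitlist64) {
	for i := range b {
		dst[i] = b[i] | other[i]
	}
}

// Count returns the number of set bits.
func (b Bitlist64) Count() int {
	n := 0
	for _, w := range b {
		n += bits.OnesCount64(w)
	}
	return n
}

// Attestation is a pared-down stand-in: only the bitlist matters to max-cover.
type Attestation struct {
	Bits Bitlist64
}
```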
## Proposed implementation
### Step 1: Prepare variables
Essentially, we want to split our underlying array into 3 parts, by the kind of data they hold:
- newly created aggregated attestations
- unaggregated attestations
- nil attestations (those elements can be garbage collected)
In order to keep track of them we will need just two variables:
```go
// assuming that all incoming attestations are in `atts` variable:
aggregated := atts[:0]
unaggregated := atts[:]
```
The `aggregated` slice is initialized empty, and by pushing into it with `aggregated = append(aggregated, aggregate)`
we update the same underlying array (we will make sure that the element overwritten by such a push is
preserved in some other slot).
The `unaggregated` slice takes care of the second and third parts of the underlying array: all items within
its range are unaggregated, and once the slice shrinks from the right, the slots past its end are nil and
can be garbage collected.
:::info
**Important Invariant:**
At any moment the partial solution can be obtained by `append(aggregated, unaggregated...)`.
Indeed, combining those slices yields all the elements of the underlying array except for the nil ones (the third part),
which do not need to be returned and will be garbage collected. And since `unaggregated` always starts right where
`aggregated` ends, this `append` only copies elements onto themselves and allocates nothing.
:::
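A small sketch, assuming string labels in place of `*Attestation` values to keep it short, shows that both slices really do share one backing array: pushing an aggregate with `append` writes straight into slot 0, which is why the item living there has to be moved first, and rebuilding the partial solution through the invariant reuses the same memory. `pushFirstAggregate` is a hypothetical helper name.

```go
package main

// pushFirstAggregate replays one push into `aggregated` on a backing array of
// labels standing in for *Attestation values.
func pushFirstAggregate() (atts, aggregated, unaggregated, solution []string) {
	atts = []string{"a0", "a1", "a2", "a3", "a4"}
	aggregated = atts[:0] // empty, but shares atts' backing array
	unaggregated = atts   // initially covers every item

	// Pretend an aggregate was built in a3's slot. Preserve a0 there first,
	// because append below writes the aggregate into slot 0 of atts.
	agg := "agg"
	atts[3] = atts[0]
	aggregated = append(aggregated, agg)
	unaggregated = unaggregated[1:]

	// Invariant: aggregated followed by unaggregated is the partial solution.
	// unaggregated starts right where aggregated ends, so nothing is reallocated.
	solution = append(aggregated, unaggregated...)
	return atts, aggregated, unaggregated, solution
}
```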
### Step 2: Find coverage
All attestations that are pointed to by `unaggregated` are candidates for aggregation:
```go
// Obtain maximum (up to whole length of `unaggregated` slice) coverage:
selectedKeys, coverage := MaxCover(unaggregated, len(unaggregated))
```
The `selectedKeys` list holds the indexes (relative to `unaggregated`) of all attestations that can be aggregated
during this round, and `coverage` holds their combined bitlist (that is, all the bits that are set to 1 when those
attestations are taken together).
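The actual `MaxCover` is in the repository linked above, and its exact signature is not reproduced here. As a self-contained stand-in, the sketch below implements the classic greedy heuristic restricted to non-overlapping candidates: it repeatedly takes the candidate contributing the most bits that do not collide with the coverage accumulated so far. Bitlists are modelled as single `uint64` masks (so at most 64 validators), and the function name `greedyMaxCover` is hypothetical; unlike the design, this sketch allocates its bookkeeping.

```go
package main

import "math/bits"

// greedyMaxCover picks up to k pairwise non-overlapping candidates, each time
// taking the one with the most set bits. It returns the selected indexes (in
// selection order) and the union of their bits.
func greedyMaxCover(candidates []uint64, k int) (selected []int, coverage uint64) {
	used := make([]bool, len(candidates))
	for len(selected) < k {
		best, bestGain := -1, 0
		for i, c := range candidates {
			if used[i] || c&coverage != 0 { // already taken, or overlaps the cover
				continue
			}
			if gain := bits.OnesCount64(c); gain > bestGain {
				best, bestGain = i, gain
			}
		}
		if best < 0 {
			break // nothing left that adds new, non-overlapping bits
		}
		used[best] = true
		selected = append(selected, best)
		coverage |= candidates[best]
	}
	return selected, coverage
}
```

With candidate masks chosen so that `a3`, `a5`, and `a10` are the largest mutually disjoint ones, it selects indexes 3, 5, and 10, matching the figure.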
Suppose we have the following three attestations marked as selected by `selectedKeys`:
```mermaid
graph LR
a0 --- a1 --- a2 --- a3 --- a4 --- a5 --- a6 --- a7 --- a8 --- a9 --- a10
style a3 fill:#6c6
style a5 fill:#6c6
style a10 fill:#6c6
```
### Step 3: Aggregate
Before we decide whether to aggregate those selected attestations, we need to check whether their combined
`coverage` contains any bits that have not already been covered by previous aggregations: if all the bits are
already covered, we can simply ignore those unaggregated attestations, as they do not provide any new
information.
So, there are 2 cases:
- The `coverage` has some new bits: in that case we need to create a new combined attestation and persist
that aggregate in our `aggregated` slice.
- The `coverage` has no new bits: individual attestations can be dropped.
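Assuming, as in the other sketches, that a bitlist fits in a single `uint64` mask, the "any new bits?" test is one AND NOT against a running union of everything aggregated so far. The names `hasNewBits`, `recordCoverage`, and `coveredSoFar` are hypothetical.

```go
package main

// hasNewBits reports whether coverage sets any bit not yet in coveredSoFar,
// i.e. whether aggregating the selected attestations adds information.
func hasNewBits(coverage, coveredSoFar uint64) bool {
	return coverage&^coveredSoFar != 0
}

// recordCoverage folds an accepted coverage into the running union.
func recordCoverage(coverage, coveredSoFar uint64) uint64 {
	return coveredSoFar | coverage
}
```

Only when `hasNewBits` returns true does case 3.1 apply; otherwise the selected attestations are simply dropped.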
#### 3.1: Create an aggregate attestation
To optimize things, we build the new aggregate in the slot where `a3` is stored, i.e. we create an aggregate
attestation (which carries the same attestation data as `a3`, `a5`, and `a10`, with its bitlist set to `coverage`
and its signature being the aggregate of all three signatures):
```mermaid
graph LR
b0[a0] --- b1[a1] --- b2[a2] --- b3[a3, a5, a10] --- b4[a4] --- b5[a5] --- b6[a6] --- b7[a7] --- b8[a8] --- b9[a9] --- b10[a10]
style b3 fill:#adf
style b5 fill:#adf
style b10 fill:#adf
```
So, we now have an aggregate stored in the `a3` slot of the underlying array, and `a5` and `a10` can be removed.
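A minimal sketch of this step, assuming a stand-in `Attestation` with a `uint64` bitlist and an integer placeholder for the signature (summing integers stands in for BLS signature aggregation, which real code would do instead). The hypothetical `aggregateInto` allocates one new attestation, reuses the already computed `coverage` as its bitlist instead of recomputing it, and stores it in the slot of the first selected key (`a3` in the example).

```go
package main

// Attestation is a simplified stand-in for illustration only.
type Attestation struct {
	Bits      uint64 // aggregation bitlist as a single-word mask
	Signature int    // placeholder: real code aggregates BLS signatures
}

// aggregateInto combines the attestations at keys (indexes into atts) into one
// new attestation, stores it at atts[keys[0]] and returns that index.
func aggregateInto(atts []*Attestation, keys []int, coverage uint64) int {
	agg := &Attestation{Bits: coverage} // the only allocation in this step
	for _, k := range keys {
		agg.Signature += atts[k].Signature // stand-in for signature aggregation
	}
	atts[keys[0]] = agg // overwrite the first selected slot; others stay put
	return keys[0]
}
```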
Next, we need to push the `a3, a5, a10` aggregate into the `aggregated` slice. However, if we simply do the following:
```go
// Assuming aggregate = Attestation{a3, a5, a10}:
aggregated = append(aggregated, aggregate)
```
we will overwrite `a0` (remember that `aggregated` was initialized with `atts[:0]`), so we must preserve `a0` before pushing the newly created aggregate into `aggregated`.
All we need to do is take the first unaggregated item (the one at index `len(aggregated)`, which precedes the
`a3, a5, a10` aggregate) and swap it with the aggregate. Unsurprisingly, this is easy to do:
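```go
// Assuming idx is the index of the aggregate attestation in the underlying array (atts):
idx0 := len(aggregated) // index of the first unaggregated item
if idx0 < idx {
	// Move the displaced item (a0) into the aggregate's old slot.
	atts[idx0], atts[idx] = atts[idx], atts[idx0]
}
// Both updates are needed even when the aggregate was already in place.
aggregated = atts[:idx0+1]      // expand to the newly added aggregate
unaggregated = unaggregated[1:] // shift the starting point of the slice
```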
The underlying array now has the following layout:
```mermaid
graph LR
b0[a3, a5, a10] --- b1[a1] --- b2[a2] --- b3[a0] --- b4[a4] --- b5[a5] --- b6[a6] --- b7[a7] --- b8[a8] --- b9[a9] --- b10[a10]
style b0 fill:#adf
style b3 fill:#f97
style b5 fill:#adf
style b10 fill:#adf
```
If we also mark the `aggregated` and `unaggregated` slices (note that "end" is inclusive):
```mermaid
graph LR
subgraph main["Underlying array"]
b0[a3, a5, a10] --- b1[a1] --- b2[a2] --- b3[a0] --- b4[a4] --- b5[a5] --- b6[a6] --- b7[a7] --- b8[a8] --- b9[a9] --- b10[a10]
style b0 fill:#adf
style b3 fill:#f97
style b5 fill:#adf
style b10 fill:#adf
end
subgraph aggregated["aggregated"]
agg1["start"] --> b0
agg2["end"] --> b0
end
subgraph unaggregated["unaggregated"]
unagg1["start"] --> b1
unagg2["end"] --> b10
end
```
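The swap and the two slice updates can be exercised on their own. The sketch below assumes string labels in place of `*Attestation` values and uses a hypothetical `pushAggregate` helper; `idx` is the aggregate's index in the underlying array. Only the swap is conditional: when the aggregate already sits at `len(aggregated)`, `aggregated` must still grow by one and `unaggregated` must still shift by one.

```go
package main

// pushAggregate moves the aggregate at atts[idx] to the first slot after
// `aggregated` and updates both slices; all three share one backing array.
// It assumes unaggregated starts at atts[len(aggregated)].
func pushAggregate(atts, aggregated, unaggregated []string, idx int) ([]string, []string) {
	idx0 := len(aggregated) // first unaggregated slot, e.g. a0 in round one
	if idx0 < idx {
		atts[idx0], atts[idx] = atts[idx], atts[idx0] // preserve the displaced item
	}
	aggregated = atts[:idx0+1]      // expand to include the new aggregate
	unaggregated = unaggregated[1:] // the aggregate is no longer a candidate
	return aggregated, unaggregated
}
```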
#### 3.2: No need to aggregate
Nothing needs to be done here: the selected attestations are simply removed during cleanup.
### Step 4: Cleanup
Depending on whether aggregation happened during the previous step, the underlying array ends up in one of two layouts, either
```mermaid
graph LR
b0[a3, a5, a10] --- b1[a1] --- b2[a2] --- b3[a0] --- b4[a4] --- b5[a5] --- b6[a6] --- b7[a7] --- b8[a8] --- b9[a9] --- b10[a10]
style b0 fill:#adf
style b3 fill:#f97
style b5 fill:#adf
style b10 fill:#adf
```
or
```mermaid
graph LR
b0[a0] --- b1[a1] --- b2[a2] --- b3[a3] --- b4[a4] --- b5[a5] --- b6[a6] --- b7[a7] --- b8[a8] --- b9[a9] --- b10[a10]
style b3 fill:#adf
style b5 fill:#adf
style b10 fill:#adf
```
We now need to traverse `selectedKeys` one more time and remove the redundant attestations. The procedure is
the same in both cases, except that when aggregation has actually occurred during the previous step, we traverse
`selectedKeys[1:]` instead, since index 3 now holds a different element (`a0`) that must be left intact.
Item removal is also done in-place. Let's illustrate it for the first case:
```mermaid
graph LR
subgraph three["remove a10"]
d0[a3, a5, a10] --- d1[a1] --- d2[a2] --- d3[a0] --- d4[a4] --- d5[a9] --- d6[a6] --- d7[a7] --- d8[a8] --- d9[a5] --- d10[a10]
style d0 fill:#adf
style d9 fill:#fed
style d10 fill:#fed
end
subgraph two["remove a5"]
c0[a3, a5, a10] --- c1[a1] --- c2[a2] --- c3[a0] --- c4[a4] --- c5[a9] --- c6[a6] --- c7[a7] --- c8[a8] --- c9[a5] --- c10[a10]
style c0 fill:#adf
style c5 fill:#f97
style c9 fill:#fed
style c10 fill:#adf
end
subgraph one["before removal"]
b0[a3, a5, a10] --- b1[a1] --- b2[a2] --- b3[a0] --- b4[a4] --- b5[a5] --- b6[a6] --- b7[a7] --- b8[a8] --- b9[a9] --- b10[a10]
style b0 fill:#adf
style b5 fill:#adf
style b10 fill:#adf
end
```
Here is what happens:
- When removing `a5`: find the rightmost unaggregated item to the right of the current item (`a9` here, since
`a10` is itself being removed) and swap the two. Once done, `unaggregated` can be shrunk to end just before
that position, as every item from there on has been consumed by the aggregate.
- When removing `a10`: nothing needs to be done, as there is no unaggregated item to the right of `a10`!
:::info
At this point our slices look like:
- `aggregated` starts at 0 and has a length of 1 (its capacity is that of the underlying array, of course).
- `unaggregated` starts at 1 and has a length of 8 (items `a1`, `a2`, `a0`, `a4`, `a9`, `a6`, `a7`, `a8`).
:::
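The removal rule can be sketched as follows, assuming a minimal stand-in `Attestation` and a per-item `selected` flag slice aligned with `unaggregated` (when aggregation happened, the flag for the aggregate's original slot is left unset, which plays the role of traversing `selectedKeys[1:]`). The hypothetical `removeSelected` swaps each selected item with the rightmost unselected item to its right and shrinks the slice past it. Setting the vacated tail slots to nil is this sketch's own addition: Go's garbage collector can only reclaim an attestation once no slot of the backing array still points to it.

```go
package main

// Attestation is a minimal stand-in; only its identity matters here.
type Attestation struct{ Name string }

// removeSelected moves every item flagged in selected to the tail of s and
// returns s shrunk to the unselected items, clearing the vacated slots.
func removeSelected(s []*Attestation, selected []bool) []*Attestation {
	end := len(s) // invariant: s[end:] holds only selected items
	for i := 0; i < end; i++ {
		if !selected[i] {
			continue
		}
		// Skip selected items that already sit at the tail.
		for end > i+1 && selected[end-1] {
			end--
		}
		if end-1 > i {
			// Swap with the rightmost unselected item, then shrink past it.
			s[i], s[end-1] = s[end-1], s[i]
			selected[i], selected[end-1] = selected[end-1], selected[i]
			end--
		} else {
			end = i // everything from i onwards is selected
		}
	}
	for j := end; j < len(s); j++ {
		s[j] = nil // let the GC reclaim consumed attestations
	}
	return s[:end]
}
```

Called on `a1, a2, a0, a4, a5, a6, a7, a8, a9, a10` with `a5` and `a10` flagged, it returns `a1, a2, a0, a4, a9, a6, a7, a8`, the same eight items shown in the figure.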
### Step 5: Repeat
If there are `unaggregated` items left (and we haven't hit the limit on the number of rounds), we repeat "Step 2: Find coverage",
passing in the updated `unaggregated` slice. Since `selectedKeys` always indexes into that updated slice,
later rounds can never touch the aggregates or the nil slots left by previous rounds.
Let's illustrate:
```mermaid
graph LR
subgraph six["repeat?"]
g0[a3, a5, a10] --- g1[a2, a9, a6] --- g2[a1] --- g3[a0] --- g4[a4] --- g5[a8] --- g6[a7] --- g7[nil] --- g8[nil] --- g9[nil] --- g10[nil]
style g0 fill:#adf
style g8 fill:#fed
style g9 fill:#fed
style g10 fill:#fed
style g1 fill:#adf
style g7 fill:#fed
end
subgraph five["cleanup"]
f0[a3, a5, a10] --- f1[a2, a9, a6] --- f2[a1] --- f3[a0] --- f4[a4] --- f5[a8] --- f6[a7] --- f7[a6] --- f8[a9] --- f9[nil] --- f10[nil]
style f0 fill:#adf
style f8 fill:#fed
style f9 fill:#fed
style f10 fill:#fed
style f1 fill:#adf
style f7 fill:#fed
end
subgraph four["update aggregated/nonaggregated slices"]
e0[a3, a5, a10] --- e1[a2, a9, a6] --- e2[a1] --- e3[a0] --- e4[a4] --- e5[a9] --- e6[a6] --- e7[a7] --- e8[a8] --- e9[nil] --- e10[nil]
style e0 fill:#adf
style e9 fill:#fed
style e10 fill:#fed
style e1 fill:#adf
style e5 fill:#adf
style e6 fill:#adf
end
subgraph three["aggregate"]
d0[a3, a5, a10] --- d1[a1] --- d2[a2, a9, a6] --- d3[a0] --- d4[a4] --- d5[a9] --- d6[a6] --- d7[a7] --- d8[a8] --- d9[nil] --- d10[nil]
style d0 fill:#adf
style d9 fill:#fed
style d10 fill:#fed
style d2 fill:#adf
style d5 fill:#adf
style d6 fill:#adf
end
subgraph two["selected attestations"]
c0[a3, a5, a10] --- c1[a1] --- c2[a2] --- c3[a0] --- c4[a4] --- c5[a9] --- c6[a6] --- c7[a7] --- c8[a8] --- c9[nil] --- c10[nil]
style c0 fill:#adf
style c9 fill:#fed
style c10 fill:#fed
style c2 fill:#6c6
style c5 fill:#6c6
style c6 fill:#6c6
end
subgraph one["before second round begins"]
b0[a3, a5, a10] --- b1[a1] --- b2[a2] --- b3[a0] --- b4[a4] --- b5[a9] --- b6[a6] --- b7[a7] --- b8[a8] --- b9[nil] --- b10[nil]
style b0 fill:#adf
style b9 fill:#fed
style b10 fill:#fed
end
```
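All the steps can be combined into one loop. The following is a compact end-to-end sketch under the same simplifying assumptions as before: single-word `uint64` bitlists, an integer placeholder signature combined by addition, a greedy stand-in for `MaxCover`, and a hypothetical cap of `len(atts)/2` rounds. Unlike the design, its bookkeeping slices (`used`, `keys`, `selected`) allocate; the point is to show how the attestation array itself is handled in place.

```go
package main

import (
	"math/bits"
	"sort"
)

// Attestation is a simplified stand-in: a one-word bitlist (at most 64
// validators) and a placeholder signature that is combined by addition.
type Attestation struct {
	Bits      uint64
	Signature int
}

// greedyMaxCover selects pairwise non-overlapping candidates, largest first,
// and returns their indexes in ascending order plus the union of their bits.
func greedyMaxCover(cands []*Attestation) ([]int, uint64) {
	var keys []int
	var coverage uint64
	used := make([]bool, len(cands))
	for {
		best, bestGain := -1, 0
		for i, c := range cands {
			gain := bits.OnesCount64(c.Bits)
			if !used[i] && c.Bits&coverage == 0 && gain > bestGain {
				best, bestGain = i, gain
			}
		}
		if best < 0 {
			sort.Ints(keys)
			return keys, coverage
		}
		used[best] = true
		keys = append(keys, best)
		coverage |= cands[best].Bits
	}
}

// removeSelected swaps each selected item with the rightmost unselected item
// to its right, clears the vacated tail slots and returns the shrunk slice.
func removeSelected(s []*Attestation, selected []bool) []*Attestation {
	end := len(s)
	for i := 0; i < end; i++ {
		if !selected[i] {
			continue
		}
		for end > i+1 && selected[end-1] {
			end--
		}
		if end-1 > i {
			s[i], s[end-1] = s[end-1], s[i]
			selected[i], selected[end-1] = selected[end-1], selected[i]
			end--
		} else {
			end = i
		}
	}
	for j := end; j < len(s); j++ {
		s[j] = nil
	}
	return s[:end]
}

// aggregateInPlace keeps aggregates at the front of atts and unaggregated
// items right after them; only the aggregates themselves are new attestations.
func aggregateInPlace(atts []*Attestation) []*Attestation {
	aggregated, unaggregated := atts[:0], atts
	var coveredSoFar uint64
	for round := 0; round < len(atts)/2 && len(unaggregated) > 1; round++ {
		keys, coverage := greedyMaxCover(unaggregated)
		if len(keys) < 2 {
			break // nothing left to combine
		}
		selected := make([]bool, len(unaggregated))
		for _, k := range keys {
			selected[k] = true
		}
		if coverage&^coveredSoFar != 0 { // step 3.1: new bits, so aggregate
			agg := &Attestation{Bits: coverage}
			for _, k := range keys {
				agg.Signature += unaggregated[k].Signature
			}
			// keys[0] is the smallest selected index, so unaggregated[0] is
			// either unselected or keys[0] itself; move it into keys[0]'s slot.
			unaggregated[keys[0]], selected[keys[0]] = unaggregated[0], false
			unaggregated[0] = agg
			aggregated = atts[:len(aggregated)+1]
			unaggregated, selected = unaggregated[1:], selected[1:]
			coveredSoFar |= coverage
		}
		unaggregated = removeSelected(unaggregated, selected) // step 4
	}
	return append(aggregated, unaggregated...)
}
```

On four attestations with bits `0001`, `0010`, `0001`, `0100`, the first round merges the first, second, and fourth into one aggregate covering `0111`, leaving the third unaggregated: the result has length 2 and still starts at `&atts[0]`.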
## Conclusion
Since the incoming `[]*Attestation` slice can only shrink (we are filtering out aggregated attestations), we
can be sure that enough memory has already been allocated, and can carry out all the filtering and covering
operations without allocating anything.
:::warning
In this whole design there is only one place where we still need to allocate new memory:
when creating an aggregated attestation, the allocation for that newly created attestation is unavoidable
(it is also done in the most optimized way possible, without recalculating what we already know, such as the
combined coverage bits).
:::