This week I was able to clear the roadblocks on the attestation aggregation branch as well as test the lazy attestation aggregation branch with flamegraph. Given that progress, I've started looking ahead to the next phase of my contributions to the project: attestation packing.
You'd be right to point out that the whole project is about attestation packing. In this context, though, I'm referring to the packing phase as Satalia describes it in: https://lighthouse-blog.sigmaprime.io/docs/satalia-02-exact-approaches.pdf. It assumes that the aggregation portion of things has already been done. That said, it may actually be less efficient to perform the aggregation during the aggregation phase; we could instead wait until we know the final set of attestations that will be included in the block before aggregating. Whether that turns out to be the case in reality, I'm not yet sure.
The major problem of the packing stage is that it is an instance of the weighted maximum coverage problem: from the set of aggregated attestations, we'd like to pick the subset that maximizes the weight (associated rewards) of the attesters covered, subject to the constraint that at most N attestations can be included in a block. The first approach proposed in the document linked above is called the MIP approach, a fitting name since it uses a Mixed Integer Programming solver to solve the entire max coverage problem over the full set of attestations available after the aggregation phase.
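For concreteness, here's the textbook integer-program formulation of weighted max coverage in my own notation (Satalia's variables may differ): x_i selects aggregate i, y_j marks attester j as covered, w_j is attester j's reward, A_i is the set of attesters in aggregate i, and N is the per-block attestation limit.

```latex
% x_i = 1 iff aggregate attestation i is included in the block
% y_j = 1 iff attester j is covered by some included aggregate
% w_j = reward associated with attester j
% A_i = set of attesters in aggregate i, N = max attestations per block
\begin{align*}
  \max\;        & \sum_{j} w_j \, y_j \\
  \text{s.t.}\; & \sum_{i} x_i \le N \\
                & y_j \le \sum_{i \,:\, j \in A_i} x_i \quad \forall j \\
                & x_i,\ y_j \in \{0, 1\}
\end{align*}
```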
Satalia didn't stop there. They also proposed an optimization they called the Decomposition approach. Here they make use of the strictly additive relationship of attestation rewards when attestations have different attestation data: the problem is split into many subproblems that are still instances of weighted max coverage, but over much smaller sets. This is expected to perform better given that the number of clique attestations per data is small on average. To stitch the subproblems back together, a dynamic programming approach to a problem similar to the knapsack problem can be employed. It selects how many attestations from each data should be included in the final selection, as sketched below.
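As a rough illustration of that stitching step, here's a small Rust sketch of the knapsack-style DP. It assumes the per-data subproblems have already been solved, so that `profiles[g][k]` holds the best reward from taking exactly `k` aggregates from data-group `g`; the names and shapes here are mine for illustration, not Lighthouse's or Satalia's actual API.

```rust
/// Hypothetical sketch of the dynamic-programming "stitching" step.
/// `profiles[g][k]` is assumed to hold the best total reward achievable
/// by taking exactly `k` aggregates from data-group `g` (with
/// `profiles[g][0] == 0.0`), as produced by solving each per-data
/// weighted max-coverage subproblem.
fn stitch(profiles: &[Vec<f64>], n: usize) -> f64 {
    // dp[b] = best reward using at most `b` attestation slots so far.
    let mut dp = vec![0.0_f64; n + 1];
    for profile in profiles {
        let mut next = dp.clone();
        for b in 0..=n {
            // Try taking k aggregates from this group within budget b.
            for (k, &reward) in profile.iter().enumerate().take(b + 1) {
                let cand = dp[b - k] + reward;
                if cand > next[b] {
                    next[b] = cand;
                }
            }
        }
        dp = next;
    }
    dp[n]
}

fn main() {
    // Two toy data-groups: one aggregate from the first is worth 5.0,
    // two are worth 6.0; the second group's single aggregate is worth 4.0.
    let profiles = vec![vec![0.0, 5.0, 6.0], vec![0.0, 4.0]];
    // With 2 slots, taking one aggregate from each group wins (5.0 + 4.0).
    println!("best reward: {}", stitch(&profiles, 2));
}
```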
Further, because the MIP solver takes time to start up and introduces a new dependency, Satalia proposed the Enumeration approach. This isn't really a new approach so much as a substitution: the exact method of using the MIP solver is swapped for another exact method, a tree search algorithm called branch and bound. This is expected to improve performance because we no longer have to wait for the MIP solver to start up for each subproblem. Additionally, it introduces the possibility of optimizing the search with better bounding functions or a better ordering of the search. So the decomposition approach plus the branch and bound algorithm is, in theory, the most performant approach we know of so far.
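To make the branch and bound idea concrete, here's a toy Rust sketch under heavy simplifications of my own: attesters as bits in a `u64` mask, every attester worth one unit of reward, and a naive "sum the best marginal gains" bounding function. Satalia's actual bound and search ordering will differ; those are exactly the knobs the Enumeration approach leaves open for tuning.

```rust
/// Toy branch and bound for max coverage: pick at most `n` aggregates
/// (attester bitmasks) maximizing the number of distinct attesters covered.
/// A simplified sketch, not Satalia's exact formulation.
fn branch_and_bound(aggs: &[u64], n: usize) -> u32 {
    let mut best = 0;
    // Explore in descending standalone-reward order so good solutions
    // are found early and the bound prunes aggressively.
    let mut order: Vec<u64> = aggs.to_vec();
    order.sort_by_key(|a| std::cmp::Reverse(a.count_ones()));
    search(&order, 0, n, 0, &mut best);
    best
}

fn search(aggs: &[u64], idx: usize, slots: usize, covered: u64, best: &mut u32) {
    let reward = covered.count_ones();
    if reward > *best {
        *best = reward;
    }
    if idx == aggs.len() || slots == 0 {
        return;
    }
    // Optimistic bound: current reward plus the `slots` largest marginal
    // gains among the remaining aggregates, ignoring overlap between them.
    // Prune this subtree if even that can't beat the incumbent.
    let mut gains: Vec<u32> = aggs[idx..]
        .iter()
        .map(|a| (a & !covered).count_ones())
        .collect();
    gains.sort_unstable_by(|a, b| b.cmp(a));
    let bound = reward + gains.iter().take(slots).sum::<u32>();
    if bound <= *best {
        return;
    }
    // Branch 1: include aggs[idx].
    search(aggs, idx + 1, slots - 1, covered | aggs[idx], best);
    // Branch 2: exclude aggs[idx].
    search(aggs, idx + 1, slots, covered, best);
}

fn main() {
    // Three overlapping aggregates over attesters 0..6; pick at most 2.
    let aggs = [0b001111, 0b110001, 0b110110];
    println!("max attesters covered: {}", branch_and_bound(&aggs, 2));
}
```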
Given the strict time requirements of the network, even the enumeration approach might not be enough for optimal attestation packing to be feasible on all machines. There are roughly 950ms available for this computation while waiting on the payload header from the MEV relay, yet the MIP approach in some cases took upwards of 6 seconds. If the enumeration approach isn't fast enough, we will explore approximate algorithms that improve on the greedy approach to weighted max coverage; Randomized LP Rounding is one candidate to test.
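For reference, the greedy baseline those approximate algorithms would need to beat looks roughly like this (same toy bitmask, unit-reward model as the sketch above); it's the classic (1 - 1/e)-approximation for max coverage.

```rust
/// Minimal sketch of the greedy baseline for max coverage: repeatedly take
/// the aggregate adding the most uncovered attesters until `n` are chosen.
fn greedy(aggs: &[u64], n: usize) -> u32 {
    let mut covered: u64 = 0;
    let mut remaining: Vec<u64> = aggs.to_vec();
    for _ in 0..n {
        // Find the aggregate with the largest marginal gain.
        let Some((i, gain)) = remaining
            .iter()
            .map(|a| (a & !covered).count_ones())
            .enumerate()
            .max_by_key(|&(_, g)| g)
        else {
            break;
        };
        if gain == 0 {
            break; // Nothing left adds new attesters.
        }
        covered |= remaining.swap_remove(i);
    }
    covered.count_ones()
}

fn main() {
    let aggs = [0b001111u64, 0b110001, 0b110110];
    println!("greedy covers {} attesters", greedy(&aggs, 2));
}
```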
I've been working on wrapping my head around the enumeration approach as well as familiarizing myself with related algorithms. I am beginning to implement this approach in a test repository.
This week I would like to test the maximum clique enumeration aggregation branch as well as the lazy attestation signature decompression branch. Unfortunately, I don't have the resources to test these branches on my own, but thankfully Michael Sproul has offered to help us test on some of the infrastructure he has access to. Additionally, I would like to implement branch and bound in the test repository and test its basic performance on a set of test instances, comparing it to the MIP approach.