Introduction

This week I continued working on the two issues discussed previously: lazy attestation signature decompression and maximal clique enumeration for attestation aggregation.

Lazy Attestation Signature Decompression

For lazy signature decompression, Michael Sproul ran some tests to see whether the updates reduced CPU usage, based on metrics from nodes on Goerli. The metrics showed no decrease. However, he also pointed out some optimizations that could still be made.

The main idea of the lazy attestation signature decompression changes is simply to keep the data received from peers in the format it arrived in over the wire until it is actually needed. By itself this would not decrease CPU usage, because the signature would still have to be decompressed at some point. However, Lighthouse maintains a list of observed aggregate attestations, which lets us check whether a newly received attestation has attesting indices that are a subset of an already observed aggregate with the same attestation data. Such attestations provide no new utility because they have, for all practical purposes, already been seen, so we skip any further processing and never have to decompress the signature. This should save some amount of CPU, but the exact gains are unclear.

The tests mentioned earlier were conducted when there was still some obvious room for improvement: skipping was only implemented for batch processing, the tree hash root of the attestation data was computed one extra, unnecessary time, and the batch processing algorithm made more allocations than needed. I have since fixed these issues, but we have not re-tested CPU usage on the Goerli nodes. I did confirm that decompressing attestation signatures does not take a significant portion of compute, as Michael Sproul had already pointed out; I wanted to learn how to run and make flamegraphs, so I used them to confirm what he found.
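
To make the idea concrete, here is a rough sketch of the subset check in Rust. All of the names below (`ObservedAggregates`, `CompressedAttestation`, and so on) are illustrative stand-ins, not Lighthouse's actual types or APIs:

```rust
// Hypothetical sketch of the subset-skip idea; names and types are
// illustrative, not the real Lighthouse implementation.
use std::collections::HashMap;

/// 32-byte tree hash root of the attestation data.
type DataRoot = [u8; 32];

/// Aggregation bits: one bool per committee member.
type AggregationBits = Vec<bool>;

/// Previously observed aggregate attestations, keyed by data root.
struct ObservedAggregates {
    seen: HashMap<DataRoot, Vec<AggregationBits>>,
}

impl ObservedAggregates {
    /// Returns true if `bits` is a subset of some observed aggregate
    /// with the same data root, i.e. it adds no new attesters.
    fn is_subset_of_observed(&self, root: &DataRoot, bits: &AggregationBits) -> bool {
        self.seen.get(root).map_or(false, |observed| {
            observed.iter().any(|o| {
                o.len() == bits.len()
                    && bits.iter().zip(o).all(|(b, seen)| !*b || *seen)
            })
        })
    }
}

/// An attestation kept in wire format: the signature stays as the
/// 96 compressed bytes it arrived as.
struct CompressedAttestation {
    data_root: DataRoot,
    aggregation_bits: AggregationBits,
    signature_bytes: [u8; 96],
}

fn process(observed: &ObservedAggregates, att: &CompressedAttestation) {
    if observed.is_subset_of_observed(&att.data_root, &att.aggregation_bits) {
        // Already covered by an observed aggregate: skip it and never
        // pay the cost of decompressing the signature.
        return;
    }
    // Only here would we decompress `att.signature_bytes` and verify.
}

fn main() {
    let observed = ObservedAggregates { seen: HashMap::new() };
    let att = CompressedAttestation {
        data_root: [0; 32],
        aggregation_bits: vec![true, false, true],
        signature_bytes: [0; 96],
    };
    // Nothing observed yet, so this attestation would be fully processed.
    process(&observed, &att);
}
```

The key point is that `signature_bytes` is only touched after the subset check fails, so redundant attestations never pay the decompression cost.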

Attestation Aggregation

On the attestation aggregation front, I have been a bit stalled. My current debugging workflow has proven woefully inadequate and tedious. Some of the tests in the Lighthouse test suite fail to complete even after a significant amount of time (20 minutes), whereas on the unstable branch all cases pass within a couple of minutes; really, I think only one or two even take more than a minute. So I'm working to get to the bottom of this problem. The issue seems to lie in the bron_kerbosch function, which shouldn't come as a surprise since it is easily the most complicated function being added. I'm now trying to write unit tests for the algorithm that fail in a similar way to the more integrated client tests.
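
For reference, here is a minimal sketch of the classic Bron–Kerbosch maximal clique enumeration algorithm (the basic variant, without pivoting). The graph representation is illustrative and not taken from the actual Lighthouse code:

```rust
// Minimal Bron–Kerbosch sketch; the representation is illustrative.
use std::collections::HashSet;

/// Adjacency list: `graph[v]` holds the neighbours of vertex `v`.
type Graph = Vec<HashSet<usize>>;

/// Reports every maximal clique in `graph` via `report`.
fn bron_kerbosch(graph: &Graph, report: &mut impl FnMut(&HashSet<usize>)) {
    let r = HashSet::new();                            // current clique
    let p: HashSet<usize> = (0..graph.len()).collect(); // candidates
    let x = HashSet::new();                            // already explored
    recurse(graph, r, p, x, report);
}

fn recurse(
    graph: &Graph,
    r: HashSet<usize>,
    mut p: HashSet<usize>,
    mut x: HashSet<usize>,
    report: &mut impl FnMut(&HashSet<usize>),
) {
    if p.is_empty() && x.is_empty() {
        // No vertex can extend R, so R is a maximal clique.
        report(&r);
        return;
    }
    for v in p.clone() {
        let mut r_next = r.clone();
        r_next.insert(v);
        // Restrict candidates and excluded sets to neighbours of v.
        let p_next = p.intersection(&graph[v]).copied().collect();
        let x_next = x.intersection(&graph[v]).copied().collect();
        recurse(graph, r_next, p_next, x_next, &mut *report);
        p.remove(&v);
        x.insert(v);
    }
}

fn main() {
    // Triangle 0-1-2 plus a pendant edge 2-3.
    let graph: Graph = vec![
        [1, 2].into_iter().collect(),
        [0, 2].into_iter().collect(),
        [0, 1, 3].into_iter().collect(),
        [2].into_iter().collect(),
    ];
    bron_kerbosch(&graph, &mut |clique| println!("{:?}", clique));
    // Prints the maximal cliques {0, 1, 2} and {2, 3} (in some order).
}
```

The recursion explores the search tree exhaustively, which is exactly why a bug here can make tests hang rather than fail outright; small fixed graphs like the one in `main` are the kind of input I'm using to build unit tests that reproduce the failures.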

Additionally, Ankur and I made slides for presenting our project at the next office hours project proposal session.