
Week 12 - 13

TL;DR

Over the past two weeks, my primary focus has been on wrapping up the first two phases of my project proposal and running those changes on Holesky. It took more time than anticipated, and iterating on my solution also surfaced areas of improvement and potential issues.

  1. Hopefully we have had the last round of back and forth on the missed-block additions to Lighthouse's validator monitoring component. As Paul Hauner pointed out, it's quite a critical component.
  2. I proposed a chart layout to the team, added to their lighthouse-metric repository, to visualize the newly implemented missed-block metrics.
  3. I ran my own version of Lighthouse with my changes on Holesky.

With that said, the next report will cover the last step of my project proposal: adding an attestation simulator so Lighthouse can evaluate the implications of attesting 32 times per epoch. That's quite important for performance, as well as for preparing transitions such as SSF and increasing the max effective balance.

Exposing Validator Missed Block Metrics

Context

  1. I fixed a few issues that would create confusion in the metrics, including the value of the total label when using aggregate metrics for a Prometheus Gauge, the pruning of the missed-blocks cache, increasing the missed-block range, and a fork scenario where the value for the last validator who missed a block was wrong (a minimal sketch of the gauge pattern follows below). You can see the PR here, along with some great explanations from the LH team.

  2. I added some logs, following the model used for the other metrics in the validator monitor documentation.

  3. I also created three charts that show the brand-new metrics whenever there's data for the monitored validators.

[Three charts showing the new missed-block metrics for monitored validators]

You can find the PR here.
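
To make the Gauge pitfall from point 1 concrete, here is a minimal sketch using the Rust prometheus crate. This is illustrative only, not Lighthouse's actual code: the metric name, label values and pruning logic are made up. The point is that a "total" exposed as just another label value on the same gauge has to be kept consistent by hand, in particular when entries are pruned from the missed-blocks cache.

```rust
use prometheus::{IntGaugeVec, Opts, Registry};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let registry = Registry::new();

    // One time series per monitored validator, keyed by a `validator` label.
    // Metric and label names are hypothetical, not the ones used in Lighthouse.
    let missed_blocks = IntGaugeVec::new(
        Opts::new(
            "validator_monitor_missed_blocks",
            "Missed blocks per monitored validator",
        ),
        &["validator"],
    )?;
    registry.register(Box::new(missed_blocks.clone()))?;

    // Two monitored validators each miss a block.
    missed_blocks.with_label_values(&["0xaaa"]).inc();
    missed_blocks.with_label_values(&["0xbbb"]).inc();

    // Pitfall: exposing the aggregate as a "total" label value on the same
    // gauge means a dashboard that sums over all label values double-counts
    // it, and the aggregate must be maintained manually.
    missed_blocks.with_label_values(&["total"]).set(2);

    // When the missed-blocks cache is pruned, the per-validator series must
    // be removed AND the "total" revisited, otherwise the values drift apart.
    missed_blocks.remove_label_values(&["0xaaa"])?;
    missed_blocks.with_label_values(&["total"]).set(1);

    Ok(())
}
```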

I also compiled my own version of Lighthouse, deployed it to Holesky and tried to beat the start of the next epoch in order to get the list of validators chosen to propose a block, so I'd have enough values to see the changes reflected in my graphs. That's not so easy on a testnet like Sepolia, which is relatively stable, and I needed to pass a subset of validators to Lighthouse's validator monitor before redeploying the BN. At the time of this writing, Holesky was not that stable and missed blocks happened at almost every epoch. In a nutshell, a perfect candidate!

The approach was the following:

  1. Create a script that queries a Lighthouse node on Holesky to get the next epoch's proposers as fast as possible, for example:
    curl -H "Accept: application/json" -H "Authorization: Basic xxxxxxx" \
      'https://xxxxxxx.ethereum.testnet.kiln.fi/eth/v1/validator/duties/proposer/3220' \
      | jq -r '.data[].pubkey' | tr '\n' ','
  2. Make the script inject that list into the Lighthouse Flux config values, so the BN reboots before the start of the new epoch with an updated list of validators ready to be monitored (a Rust sketch of steps 1 and 2 follows below).
  3. After a few missed blocks, we could see the graphs fill up.
  4. Also, the process of "trying to beat the next epoch proposers" led me to dig into, and get involved with, the various issues our web3signers ran into on the new network while scaling to 100k validation keys.

One of my colleagues wrote a very nice article about it if you are curious :)
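
For illustration, here is roughly what steps 1 and 2 could look like as one small Rust program instead of a shell pipeline. This is a sketch under assumptions, not the script I actually ran: the node URL and Basic-auth credentials are the redacted placeholders from above, the values-file path is hypothetical, and it relies on the reqwest (blocking + json features) and serde_json crates. The Flux reconciliation that actually redeploys the BN happens outside the snippet.

```rust
use std::fs;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Hypothetical target epoch; in practice this would be computed as
    // `current_epoch + 1` from the node's head slot.
    let epoch = 3220;
    let url = format!(
        "https://xxxxxxx.ethereum.testnet.kiln.fi/eth/v1/validator/duties/proposer/{epoch}"
    );

    // Step 1: fetch the next epoch's proposer duties from the beacon API.
    let body: serde_json::Value = reqwest::blocking::Client::new()
        .get(&url)
        .header("Accept", "application/json")
        .header("Authorization", "Basic xxxxxxx") // redacted placeholder
        .send()?
        .error_for_status()?
        .json()?;

    // Equivalent of `jq -r '.data[].pubkey' | tr '\n' ','`.
    let pubkeys: Vec<&str> = body["data"]
        .as_array()
        .ok_or("unexpected response shape")?
        .iter()
        .filter_map(|duty| duty["pubkey"].as_str())
        .collect();

    // Step 2: write the comma-separated list where the Flux config values
    // pick it up (hypothetical path), so the BN restarts with an updated
    // set of monitored validators.
    fs::write("values/monitored-validators.txt", pubkeys.join(","))?;
    println!("monitoring {} proposers for epoch {epoch}", pubkeys.len());

    Ok(())
}
```

The tricky part is the timing: fetching the duties, updating the config and restarting the BN all have to complete within one epoch, i.e. 32 slots of 12 seconds, about 6.4 minutes.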

Scheduler analysis

Context

  • I also came up with a graph similar to what the LH team just released with their Beacon Processor v2 dashboard.

Next steps

  • I will keep reading about the scheduler, keep an eye on LH's new dashboard, and see if any discrepancies can be spotted on mainnet.

  • Once the last missed-blocks changes are validated, I will run a new version of Lighthouse and monitor how it behaves.

  • I will prepare the 32-attestations-per-epoch simulator, the final step of my project proposal.