As the Ethereum network scales through various L2 solutions and becomes more computationally complex, monitoring the overall health of the network will be increasingly important to detect and respond to network issues and attacks rapidly.
Previously, in May there was a 7-block reorg which took a considerable amount of time to distinguish whether it was a malicious attack or a client implementation issue. After some time, it was later noted that this was due to certain validators not simply upgrading their software (caused by the proposer boost fork upgrade). In such occurrences, an Ethereum Network monitoring tool could be very useful to automatically detect and send alerts when there are issues in the network. This could catalyse the time required to address system failures and attacks. In addition, by utilising ML models it would also be possible to anticipate potential issues ahead of time and provide warning signals when there happens to be unusual activity occurring in the network.
At the current moment, there isn’t a monitoring system in place that captures and aggregates information about the overall health of the Ethereum Network from various client implementations. Hence, building a monitoring system to track network data as well as on-chain events could be beneficial. The main purpose of the monitoring tool is to be used to track network failures and attacks (such as reorgs) whilst also enabling devs and researchers to interact with the data through a UI for further research and analysis purposes.
The platform where the Ethereum infrastructure will be monitored is through the ELK (Elasticsearch Logstash Kibana) stack. The stack is optimised for building big data pipelines and streaming log data from servers. The Kibana platform also makes it well suited for the project as the data can be queried and visualised easily through publically accessible web interface.
To get the overall picture of the Ethereum network, it will be necessary to set up multiple nodes running on different consensus clients and conduct queries at the local level and global levels. The first queries that will be tracked and monitored will be local queries and in the second phase, global queries will be incorporated into the monitoring system.
A local query is a query processed between the tool and the beacon node. An example query would be detecting late blocks and attestation in real-time and also archiving the event stream data from the Beacon-API (such as block fullness, attestation and sync committee participation).
A global query is processed at a given central location and monitors performance across multiple nodes. An example query would be monitoring block latency across multiple nodes. This is to monitor and begin to estimate the network latency which can give further insight into the many small reorgs that occur.
In addition, monitoring the underlying server as well as the network traffic may be necessary to get a holistic view of how the servers are functioning when there are potential attacks (DoS attacks) and when the network gets congested (e.g when there is a high number of tx sent to the mempool). To achieve this, lightweight modules will be installed on each client server to periodically collect OS and hardware metrics.
Some of the lightweight modules to be included are:
The data pipeline is as follows:
Month 2: Data Pipeline building and setting up infrastructure
Month 3: Streamlining data into the centralised server and connecting it to Kibana visualisation dashboard.
Month 4: Building custom ML models and incorprating them to the alerting systems (watcher module) for given set metrics.
Some if the issues include:
The goal is to track and monitor as many features and metrics about the network as possible in the upcoming months. Given the timeframe, success would be to get the project off the ground and set up all the infrastructure and begin to stream network data, beacon node server data as well as some on-chain events through the data pipeline and visulise it through the interface (Kibana UI).
The project has been highly shaped by Fredrik’s proposal, hence I plan to engage with Fredrik for further guidance.
I also plan to engage with Barnabé having published insightful work regarding reorg and on-chain CL client behaviour.
Three Attacks on Proof-of-Stake Ethereum
(https://eprint.iacr.org/2021/1413.pdf)
Blockchain attack discover via Anomaly detection
(https://intranet.icar.cnr.it/wp-content/uploads/2019/12/RT-ICAR-CS-19-07.pdf)
Barnabé Monnot- Visualising the 7-block reorg on the Ethereum beacon chain
(https://barnabe.substack.com/p/pos-ethereum-reorg)
Barnabé Monnot- Data driven view of the beacon chain incident
(https://barnabe.substack.com/p/a-data-driven-view-of-the-beacon)
Intrusion detection systems on Ethereum (https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=9815256)