EPF's Summary Update

The key aim of the project was to set up a platform to monitor and analyze the performance of the live Ethereum network in real time. This was achieved by setting up multiple geographically distributed nodes that report back to a centralized server, which provides a means to visualise and interact with the data for further research.

The ELK stack was chosen for the analysis because it is optimised for processing real-time data and comes with multiple server monitoring modules, such as Metricbeat, Packetbeat and Filebeat.

The initial architecture design can be seen below:

[Image: initial architecture diagram (not available)]

What has been achieved

Elastic stack status update

I have implemented the majority of the data pipelines; the one exception is Logstash, which was replaced with Kafka later in development. With all data pipelines established, the application is now streaming real-time data from three different beacon nodes.

Some of the core components of the architecture diagram that have been integrated include:

  • Configuring and setting up Metricbeat, Packetbeat and Filebeat on each node to monitor the server log data and to capture metadata about the node's operations.
  • Configuring and setting up Elasticsearch, Kibana, Kafka and several beacon nodes on EC2 instances.
  • Building data pipelines from beacon nodes to the required destinations.
  • Streaming and indexing the real-time data in Elasticsearch to make it available for HTTP requests through the REST API.
  • Integrating Elasticsearch with the Kibana platform to provide a means to interact with and visualise the streamed data.
  • Creating custom dashboards for users that capture the state of the beacon node as well as the Ethereum network traffic in real time.
  • Backing up the streamed data to the S3 bucket iteratively.
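
As a rough illustration of how the indexed data could be queried over HTTP, the sketch below builds a search request body that averages user CPU usage per host over a recent time window. The helper name `cpu_usage_query` is hypothetical, and the field names (`host.name`, `system.cpu.user.pct`) and the `metricbeat-*` index pattern are assumed from Metricbeat's usual defaults rather than taken from this deployment.

```python
import json


def cpu_usage_query(minutes: int = 15, interval: str = "1m") -> dict:
    """Build a search body averaging user CPU per host over a recent window.

    Field names are assumptions based on Metricbeat defaults.
    """
    return {
        "size": 0,  # only the aggregations are needed, not the raw documents
        "query": {"range": {"@timestamp": {"gte": f"now-{minutes}m"}}},
        "aggs": {
            "per_host": {
                "terms": {"field": "host.name"},
                "aggs": {
                    "over_time": {
                        "date_histogram": {
                            "field": "@timestamp",
                            "fixed_interval": interval,
                        },
                        "aggs": {
                            "avg_user_cpu": {
                                "avg": {"field": "system.cpu.user.pct"}
                            }
                        },
                    }
                },
            }
        },
    }


if __name__ == "__main__":
    # Print the JSON body that would be sent to the search endpoint.
    print(json.dumps(cpu_usage_query(), indent=2))
```

A body like this would typically be sent as a POST request to the `_search` endpoint of the relevant index pattern (for example `metricbeat-*/_search`) on the Elasticsearch host.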

The following are some of the key data points being actively monitored in real time through the dashboards (please be patient, as the charts may take a while to load):

OS metrics (https://hackmd.io/@BemBaraki/H1BchYpno#System-and-service-metrics)

  • RAM usage per client
  • CPU usage per client
  • Disk space usage per client

Monitoring the real-time performance for each client over time:

  • iowait: percentage of time that the CPU is idle while waiting for input/output operations to complete
  • softirq: percentage of CPU time spent handling software interrupts in the kernel
  • nice: percentage of CPU time spent on user-level processes running at a lowered (niced) priority
  • system: percentage of CPU time used by the operating system kernel
  • user: percentage of CPU time used by user-level processes
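
For intuition, percentages like these are commonly derived from cumulative per-mode CPU counters, such as those exposed in `/proc/stat` on Linux, by comparing two snapshots. The sketch below is a simplified illustration of that idea, not Metricbeat's actual implementation; the function name is hypothetical and the `steal` and `guest` counters are left out for brevity.

```python
MODES = ("user", "nice", "system", "idle", "iowait", "irq", "softirq")


def cpu_mode_percentages(before: dict, after: dict) -> dict:
    """Share of CPU time spent in each mode between two counter snapshots."""
    deltas = {mode: after[mode] - before[mode] for mode in MODES}
    total = sum(deltas.values())
    if total <= 0:
        raise ValueError("snapshots must be taken at different times")
    return {mode: 100.0 * delta / total for mode, delta in deltas.items()}


# Two made-up snapshots of cumulative counters (in clock ticks).
before = {"user": 100, "nice": 0, "system": 50, "idle": 800,
          "iowait": 50, "irq": 0, "softirq": 0}
after = {"user": 150, "nice": 10, "system": 70, "idle": 900,
         "iowait": 60, "irq": 0, "softirq": 10}

print(cpu_mode_percentages(before, after))
# user 25.0, nice 5.0, system 10.0, idle 50.0, iowait 5.0, irq 0.0, softirq 5.0
```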

Network traffic (https://hackmd.io/@BemBaraki/H1BchYpno#Network-traffic-overview)

  • Network traffic distribution broken down by each country
  • Identifying the most popular cloud service providers being utilised by node operators in a given region
  • Total packets sent and received by each client, and the errors encountered by each client over time
  • Capturing the top hosts driving traffic in the Ethereum network and the number of unique connections made by each client over time
  • Transport protocol used by each client and its corresponding traffic flow over time (e.g. TCP, UDP, ICMP…)

Beacon node data (https://hackmd.io/@BemBaraki/rys5jvzJh)

EL

  • Calculating the proposer's block release time using the scheduled slot time and block arrival time in the p2p network
  • Monitoring the time gap between consecutive blocks in the p2p network broken down by each client
  • The total number of pending transactions found per slot in the network (calculated iteratively)
  • Monitoring block space utilisation per slot and the number of transactions being processed per slot
  • Tracking the base fee per slot
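
The timing and utilisation metrics above can be sketched as follows. This is one plausible reading of the calculations, not necessarily the exact method used in the dashboards: a block's arrival offset is taken as its observed arrival time minus the scheduled start of its slot, and block space utilisation as gas used divided by the gas limit. The function names are hypothetical; the genesis time and slot length are the mainnet beacon chain values.

```python
GENESIS_TIME = 1606824023  # mainnet beacon chain genesis, Unix seconds
SECONDS_PER_SLOT = 12


def slot_start_time(slot: int) -> int:
    """Scheduled start of a slot as a Unix timestamp."""
    return GENESIS_TIME + slot * SECONDS_PER_SLOT


def arrival_offset(slot: int, arrival_time: float) -> float:
    """Seconds between the scheduled slot start and the observed block arrival."""
    return arrival_time - slot_start_time(slot)


def arrival_gaps(arrival_times: list) -> list:
    """Time gaps between consecutive block arrivals, in seconds."""
    return [later - earlier
            for earlier, later in zip(arrival_times, arrival_times[1:])]


def block_utilisation(gas_used: int, gas_limit: int) -> float:
    """Fraction of the block's gas limit that was actually consumed."""
    return gas_used / gas_limit


print(arrival_offset(10, 1606824144.5))           # 1.5
print(arrival_gaps([0.0, 12.0, 25.5, 36.5]))      # [12.0, 13.5, 11.0]
print(block_utilisation(15_000_000, 30_000_000))  # 0.5
```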

CL

  • Tracking the attestation disagreement rate in real-time per slot
  • Tracking the distribution of attestation inclusion delay over time
  • Monitoring chain liveness
  • Tracking the sum of the attestation vote count per slot
  • Tracking the total number of depositors
  • Tracking the attesting gwei in each epoch
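
As an illustration of the inclusion delay metric, the sketch below assumes each attestation is available as a pair of (attested slot, slot of the block that included it) and computes the share of attestations at each delay. The input shape and the function name are assumptions made for this example; the dashboards may derive the distribution differently.

```python
from collections import Counter


def inclusion_delay_distribution(pairs) -> dict:
    """Map each inclusion delay (in slots) to its share of attestations.

    `pairs` is an iterable of (attestation_slot, inclusion_slot) tuples.
    """
    delays = Counter()
    for attestation_slot, inclusion_slot in pairs:
        delay = inclusion_slot - attestation_slot
        if delay < 1:
            # An attestation can only be included in a later slot.
            raise ValueError("inclusion slot must come after the attested slot")
        delays[delay] += 1
    total = sum(delays.values())
    return {delay: count / total for delay, count in sorted(delays.items())}


sample = [(100, 101), (100, 101), (100, 102), (101, 102)]
print(inclusion_delay_distribution(sample))  # {1: 0.75, 2: 0.25}
```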

A brief report looking into local network latency and reorgs

As part of my development work on the Eth monitor project, I analysed data from over 11,600 slots to investigate network latency and reorgs. Two reorg attempts were observed during this period, and each was inspected thoroughly to identify its underlying causes. Several factors, such as MEV block rewards, base fee spikes and the attestation disagreement rate, were tracked to check their influence on network partitions and reorgs.

The report provides a detailed overview of the findings and includes visualizations to help better illustrate the analysis. The link for the report is provided below:

https://hackmd.io/@BemBaraki/r1D3sTZjj

Future Project

Although progress has been made in getting the Ethereum monitor project off the ground, the project is not yet finished. Work remains to be done on:

  1. Building systems to detect anomalies through the use of machine learning
  2. Implementing logic to provide real-time alerts when anomalies or other suspicious activities are detected

At this stage, research is being undertaken to find the most appropriate machine learning algorithms and architectures for detecting anomalies in the Ethereum network, and to build systems around them. One key architecture being explored is a deep learning autoencoder, an unsupervised model that does not require any human data labelling for training.

Once the model is operational, it will be integrated with the watcher module to provide real-time alerts, for example when large network partitions or increased latency are experienced.
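
To make the alerting idea concrete, the sketch below shows one common way to turn an autoencoder's reconstruction errors into anomaly flags: calibrate a threshold on errors from normal traffic, then flag any new sample whose error exceeds it. The autoencoder and the watcher integration are not shown; the function names and the mean-plus-k-standard-deviations rule are assumptions made for illustration, not a settled design.

```python
from statistics import mean, stdev


def fit_threshold(baseline_errors, k: float = 3.0) -> float:
    """Threshold set k sample standard deviations above the baseline mean."""
    return mean(baseline_errors) + k * stdev(baseline_errors)


def flag_anomalies(errors, threshold: float) -> list:
    """Indices of samples whose reconstruction error exceeds the threshold."""
    return [i for i, error in enumerate(errors) if error > threshold]


# Made-up reconstruction errors from a period assumed to be normal.
baseline = [0.10, 0.12, 0.11, 0.09, 0.08]
threshold = fit_threshold(baseline)

# New observations: the second and fourth are far above the baseline.
print(flag_anomalies([0.11, 0.50, 0.09, 0.20], threshold))  # [1, 3]
```

The choice of `k` trades false alarms against missed events, so in practice it would need tuning against historical data.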

Self Evaluation

I found the EPF's open-ended approach to making contributions to the core protocol incredibly appealing because of the diverse array of project ideas available to work on. The early stages of the program enabled me to follow my interests and supplement my knowledge by working through various courses and materials, such as the cadCAD systems engineering course on Ethereum validator economics and eth2book. This proved very useful later on, as it enabled me to quickly build context around the challenges I faced on embarking on the eth-monitor project.

The eth-monitor project required me to solve a wide variety of technical challenges to get the project off the ground, spanning system architecture design, DevOps, data engineering and data analysis. Hence, the EPF's independent, self-directed approach was beneficial, as it allowed me to take control of my learning and tailor it to the needs of my project. This gave me the flexibility to solve challenges more creatively and to pick up complementary languages along the way, such as Elastic's Painless scripting language and Vega.

I would also like to add that the standup calls acted as a catalyst for my development process, as they were a great opportunity to communicate the technical intricacies of my work and the progress made. They were also an excellent opportunity to learn from other fellows and expand my knowledge of the vast complexities of the Ethereum network. Hence, I would like to thank all participants for the engaging contributions they shared with the cohort.

Finally, I want to express my gratitude to Mario Havel and Josh Davis for their superb efforts in running the EPF and for making themselves available at all times. Additionally, I would like to extend my appreciation to Fredrik Svantes, as well as the other mentors who shared their time and expertise during the AMAs and throughout the programme. Their invaluable insights and guidance have been not only informative but also inspiring, helping to shape my character going forward.

It's been a real pleasure being part of the cohort, and I am looking forward to the upcoming steps. Thank you so much, everyone!
