Abdul Sami J.

@abdulsamijay

Joined on May 14, 2022

  • Final EPF Development Update (Cohort 3) This is the final report of my project, On-chain analysis of staking pool attestations. It studied staking pools & how they impact the Ethereum mainnet. Project Summary My main goal was to compare the performance of validators controlled by staking pools against validators that are not part of any pool. A notable difference between the two groups would indicate that the clustering of many nodes is impacting the Ethereum mainnet. When I chose this project, I planned to use the existing tools, information & data currently available on staking pools to get a head start. After exploring this for a week on different platforms & talking with various teams building on Ethereum, it turned out that this type of data was either not easily available or was private information yet to be released. Briefly analyzing what was available also suggested there might be discrepancies & inaccuracies, so performing analysis on that data would not make much sense. Therefore I divided my main project into 2 parts. Produce a dataset by identifying & labelling the network with the different staking pools, with low false positives & higher accuracy. [Finished] Here my goal was to associate pools with their public keys & identify >70% of the network, which I was able to do; a sketch of the resulting comparison follows below.
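Once the labelled dataset exists, the comparison itself is straightforward. Below is a minimal TypeScript sketch of that comparison, assuming hypothetical record shapes (a label file mapping validator indices to pools, and per-validator attestation counts); none of these field names come from the actual project.

```typescript
// Hypothetical record shapes for illustration only: the real dataset
// produced by the labelling work may use different field names.
interface LabeledValidator {
  index: number;        // validator index on the beacon chain
  pool: string | null;  // e.g. "Lido", "Rocket Pool", or null if unlabeled
}

interface AttestationStats {
  index: number;                // validator index
  attestationsDue: number;      // epochs in which the validator had a duty
  attestationsIncluded: number; // attestations actually included on chain
}

// Mean inclusion rate per group: pool-controlled vs. non-pool validators.
function compareGroups(labels: LabeledValidator[], stats: AttestationStats[]) {
  const poolOf = new Map(labels.map(v => [v.index, v.pool] as [number, string | null]));
  const totals = {
    pooled: { due: 0, included: 0 },
    solo: { due: 0, included: 0 },
  };

  for (const s of stats) {
    const group = poolOf.get(s.index) ? totals.pooled : totals.solo;
    group.due += s.attestationsDue;
    group.included += s.attestationsIncluded;
  }

  const rate = (g: { due: number; included: number }) =>
    g.due === 0 ? 0 : g.included / g.due;

  return { pooledRate: rate(totals.pooled), soloRate: rate(totals.solo) };
}
```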
  • EPF Update 10 This past week I have been working on finalizing & refining the dataset of the pools across the overall Ethereum network. I focused on Lido Finance, Binance, Rocket Pool & a few other smaller pools. Starting with Lido & Rocket Pool: both staking pools have a complex system for identifying their validators & the addresses through which they deposited to the Eth2 deposit contract. My initial approach to identifying these pools was the Etherscan label feature, which gives a starting point, but those labels cannot be treated as authentic or reliable, since anyone can label any address on Etherscan. For both Lido & Rocket Pool, I manually extracted a few addresses by quickly scanning through the displayed addresses & by reverse-sorting the addresses in my dataset in which the 'To' address was not the deposit contract. Then I dived deeper into the Lido & Rocket Pool contracts. Lido has a list of node operators (30 at the time of writing) which deposit to the Eth2 contract. Their smart contract has a function that takes two parameters: the index of the node operator and the index of the deposit. This makes it possible to fetch each node operator's details & extract the deposited keys by incrementing the indices. I quickly created a script to extract this information & retrieved the addresses with very few false positives; a sketch of the approach follows below.
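A minimal ethers.js sketch of walking such a registry is below. The contract address is left as a placeholder and the ABI fragments are assumptions reconstructed from the description above (an operator count, a per-operator key count, and a getter taking an operator index and a key index); check Lido's actual NodeOperatorsRegistry interface before using any of this.

```typescript
import { ethers } from "ethers";

// ASSUMED ABI: reconstructed from the update's description of "a function
// with two parameters: node operator index and deposit index". Verify the
// real registry interface before relying on these signatures.
const REGISTRY_ABI = [
  "function getNodeOperatorsCount() view returns (uint256)",
  "function getSigningKeyCount(uint256 operatorId) view returns (uint256)",
  "function getSigningKey(uint256 operatorId, uint256 keyIndex) view returns (bytes key, bytes depositSignature, bool used)",
];

const REGISTRY_ADDRESS = "0x..."; // placeholder: Lido node operators registry

async function collectLidoKeys(rpcUrl: string): Promise<string[]> {
  const provider = new ethers.providers.JsonRpcProvider(rpcUrl);
  const registry = new ethers.Contract(REGISTRY_ADDRESS, REGISTRY_ABI, provider);

  const keys: string[] = [];
  const operatorCount: ethers.BigNumber = await registry.getNodeOperatorsCount();

  // Walk every operator, then every signing-key index for that operator.
  for (let op = 0; operatorCount.gt(op); op++) {
    const keyCount: ethers.BigNumber = await registry.getSigningKeyCount(op);
    for (let i = 0; keyCount.gt(i); i++) {
      const { key } = await registry.getSigningKey(op, i);
      keys.push(key); // validator public key attributed to Lido
    }
  }
  return keys;
}
```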
  • EPF Dev Update #9 This past week has been a very busy, debugging-heavy & informative one. I have finalized the indexer for retrieving data and shaped its output into a more useful form: the indexer can now retrieve data in a format that is more easily usable for analysis. I have been in touch with the authors of a repository (https://github.com/alrevuelta/eth-metrics/) where they statistically determined and associated staking pools and their addresses. However, after the conversation it turns out that the data is becoming outdated, and manually analyzing the hard-coded addresses in that repo gave me doubts about the association of addresses with a specific pool, e.g. Binance. We are currently discussing this and will see where it goes. Another important thing alrevuelta mentioned is that asking pools on Discord to confirm address associations is not a very good approach, as asking them to verify apparently doesn't work. While verifying my dataset's authenticity, I found an anomaly between my data & the validator indexes displayed on the Beaconcha.in website. My dataset matches Etherscan's, but differs from what Beaconcha.in shows. After diving into their open-sourced codebase, I discovered a section where it directly queries the Lighthouse client for data retrieval, so the anomaly doesn't seem to be an issue with the explorer, but rather with the Lighthouse client. This still needs to be verified; I will probably get in touch with the team later. I also thought of an idea for associating validators with staking pools, similar to how Uniswap maintains its token list. However, this approach relies on the optimistic assumption that the associated data is correct, and we might also need some incentives for contributors to increase authenticity. This would be very beneficial for the community as a whole; I might pursue it after the cohort. Currently, I am in the phase of classifying the staking pools and have a few approaches that I will follow. This may deviate a bit from the main project, but this ground needs to be set before the analysis for accuracy. I want to note that if we don't have enough data, we may have to conclude that our analysis is incomplete or inaccurate. However, if I see the complications growing, I will shortlist the major staking pools with few false positives and continue from there. This will not cover the performance of all pools, but rather a few pools, to get the performance analysis going. This approach will still give us valuable insights while being more practical. Here is how I plan to label the data:
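Separately, on the Uniswap-token-list-style idea mentioned above: below is a hypothetical TypeScript sketch of what a community-maintained "staking pool list" could look like. Every field name here is illustrative; nothing in this schema is a published standard or part of the project.

```typescript
// A hypothetical schema for a community-maintained "staking pool list",
// loosely modeled on how Uniswap maintains its token list.
interface PoolListEntry {
  pool: string;                 // e.g. "Lido"
  depositorAddresses: string[]; // addresses observed depositing on behalf of the pool
  validatorPubkeys: string[];   // validator public keys attributed to the pool
  source: string;               // how the association was established (contract, docs, ...)
  verified: boolean;            // optimistic flag: contributors assert correctness
}

interface StakingPoolList {
  name: string;
  version: { major: number; minor: number; patch: number };
  timestamp: string;            // ISO 8601 date of the last update
  entries: PoolListEntry[];
}

// Example instance with placeholder values.
const example: StakingPoolList = {
  name: "community-staking-pool-list",
  version: { major: 0, minor: 1, patch: 0 },
  timestamp: "2023-01-01T00:00:00Z",
  entries: [
    {
      pool: "Lido",
      depositorAddresses: ["0x..."],
      validatorPubkeys: ["0x..."],
      source: "node operator registry contract",
      verified: false,
    },
  ],
};
```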
  • EPF Dev Update #8 This past week, I have been working on indexing data and storing it in MongoDB. The data I am collecting is transaction hashes from the blockchain, together with their 'from' and 'to' fields. Collecting this data took longer than expected, as there were 286k transaction hashes to gather, but I was able to successfully gather all of the data needed. In addition to the data indexing, I also worked on developing the front-end for retrieving data. This includes a search bar for users to easily look up specific data, as well as general statistics that can be displayed on the front-end. With the indexer and front-end development now complete, I will be working on integrating the two; I anticipate this step will take one day. Once the indexer and front-end are integrated, I will compile the data, which should also take one day. Next steps After the integration of the indexer and front-end is completed, I will conduct thorough testing to ensure everything is working properly. This will include checking that data is being properly indexed and stored in MongoDB, as well as testing that the search bar works properly. Label & associate the addresses retrieved from the data with staking pools.
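For reference, a minimal sketch of storing such a batch in MongoDB with the Node.js driver is below. The database and collection names and the document shape are illustrative assumptions, not the project's actual schema; the upsert keyed on the transaction hash is just one way to keep re-runs of the indexer idempotent.

```typescript
import { MongoClient } from "mongodb";

// Illustrative document shape; the actual indexer's schema may differ.
interface DepositTx {
  txHash: string;
  from: string;
  to: string;
  blockNumber: number;
}

// Store a batch of deposit transactions, keyed by transaction hash so
// re-running the indexer does not create duplicate documents.
async function storeBatch(uri: string, batch: DepositTx[]): Promise<void> {
  const client = new MongoClient(uri);
  try {
    await client.connect();
    // "eth_deposits" / "transactions" are placeholder names.
    const deposits = client.db("eth_deposits").collection<DepositTx>("transactions");
    await deposits.createIndex({ txHash: 1 }, { unique: true });
    await deposits.bulkWrite(
      batch.map(tx => ({
        updateOne: { filter: { txHash: tx.txHash }, update: { $set: tx }, upsert: true },
      })),
      { ordered: false }
    );
  } finally {
    await client.close();
  }
}
```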
  • EPF Dev Update #6 I went through the resources shared by Mario last week and found a useful repository at https://github.com/alrevuelta/eth-metrics that has a list of staking pool addresses. In order to ensure data accuracy and reduce false positives, I decided to develop an indexer for the deposit contract on Ethereum. I have created a repository for this project, which can be found here. In addition, I explored different APIs for data retrieval and took the time to verify the events that were being recorded. I compared the first batch of data in the MongoDB database to the data on the Ethereum blockchain to make sure everything was consistent and correct. Next steps Continue working on the event indexer, refining it as needed, & finalize the data. Get in touch with staking pool teams & the eth-metrics devs to cross-verify the addresses of the staking pools. (This might take time)
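The core of such an indexer is pulling DepositEvent logs from the deposit contract in manageable block ranges. A minimal ethers.js sketch is below; the contract address is the widely published mainnet deposit contract address and the event ABI is written from the deposit contract spec as I understand it, so verify both before relying on them.

```typescript
import { ethers } from "ethers";

// Mainnet beacon deposit contract (verify before use) and its DepositEvent;
// treat both as assumptions and check against the verified contract source.
const DEPOSIT_CONTRACT = "0x00000000219ab540356cBB839Cbe05303d7705Fa";
const DEPOSIT_ABI = [
  "event DepositEvent(bytes pubkey, bytes withdrawal_credentials, bytes amount, bytes signature, bytes index)",
];

// Pull DepositEvent logs in fixed-size block ranges so a single RPC call
// never has to return hundreds of thousands of logs at once.
async function fetchDeposits(
  rpcUrl: string,
  fromBlock: number,
  toBlock: number,
  step = 10_000
): Promise<ethers.Event[]> {
  const provider = new ethers.providers.JsonRpcProvider(rpcUrl);
  const contract = new ethers.Contract(DEPOSIT_CONTRACT, DEPOSIT_ABI, provider);

  const events: ethers.Event[] = [];
  for (let start = fromBlock; start <= toBlock; start += step) {
    const end = Math.min(start + step - 1, toBlock);
    const batch = await contract.queryFilter(contract.filters.DepositEvent(), start, end);
    events.push(...batch);
  }
  return events;
}
```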
  • EPF Dev Update #7 This past week, I continued working on gathering transaction data through the indexer (Repo) from a Geth node using an RPC method (provided by Mario) and storing it in a CSV file. I have made progress in several areas, but have also encountered some limitations. There were a few logical mistakes & wrong assumptions about the APIs I was working with, for which I had to dive into the Geth codebase. Since I was working with the RPC endpoint, I took the opportunity to explore the internals of ethersJS & web3JS to see how they encapsulate RPC calls into functions. Pretty interesting. I have extracted the initial dataset for the deposit contract on Ethereum using the event indexer. I am using the dataset from 14th October, 2020 (the point the contract was deployed) up until 25th December, 2022. That is around 500k (503,580 to be exact) validators' information, but I am assuming there are some discrepancies, so I will have to double-check the following. Make sure all of the validators are captured: I plan to check this by summing all of the validator indexes, whose sum should equal the sum of the first 503,580 numbers (a sketch of this check follows below); I will optimize this later. Make sure no event is missed. Make sure events from uncle blocks are not included. This took me two days to figure out, as the captured validators were more than the number of active validators! I also encountered some network delays that caused the data retrieval to fail due to too many requests or bandwidth problems on my end, which led to missed event data (events from some blocks were only partially downloaded). After checking this, I realized the indexer had missed about 90k validators! I will optimize this later too.
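A tiny sketch of that coverage check is below. It assumes validator indices are contiguous and start at 0, so the sum of the first N indices is N * (N - 1) / 2; a mismatch hints at missed or duplicated deposit events, though it cannot pinpoint which ones.

```typescript
// Sanity check sketched from the update: if validator indices are contiguous
// and start at 0 (an assumption), the sum of the first N indices should equal
// N * (N - 1) / 2. A mismatch suggests missed or duplicated deposit events.
function checkIndexCoverage(validatorIndexes: number[]): boolean {
  const n = validatorIndexes.length;
  const expected = (n * (n - 1)) / 2; // 0 + 1 + ... + (n - 1)
  const actual = validatorIndexes.reduce((sum, idx) => sum + idx, 0);
  return actual === expected;
}

// Example: the gap at index 2 is detected because 0 + 1 + 3 !== 3.
console.log(checkIndexCoverage([0, 1, 2, 3])); // true
console.log(checkIndexCoverage([0, 1, 3]));    // false
```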
  • EPF Dev Update #5 The past week, I have been working on finalizing & validating the project proposal for the fellowship with the help of a mentor. I have also been working on the prerequisites of the project, which included gathering data related to the staking pools. I looked at a few potential data sources like beaconcha.in & Lido for getting the list of addresses of all of the nodes. It turns out, after talking with the beaconcha.in team on Discord, that this data is not available and is intended to become public in Q1/Q2 of next year. They also pointed out that the number of validators they list has some false positives. Next step Some studies have been done on staking pools; Mario Havel shared a resource, and I'll get started from there. Will get in touch with a few teams for gathering staking pool validator data. Will get in touch with the Lighthouse team to see if they have an archive node for chaind data extraction.
  • EPF Dev Update #4 The past few weeks I have been working on finalizing a project. I primarily wanted to work on performance analysis of missed attestations, for which I went through a lot of reading material on validators & the work already done on the Medalla testnet. I also did a deep dive on data-gathering tools like lcli & chaind, going through the database structure & how the raw data is converted & stored for easier retrieval afterwards. It took me a while, as I am not very proficient with either Rust or Go, but I think this is a great opportunity for learning. I think I have all the information needed & will start working on one of the projects below; maybe they can be combined. Understanding the pools' role in generating late blocks, or whether clustering of multiple mainnet nodes is causing missed attestations. (More likely.) Finding a correlation between sync committee participation and attestation participation; the visualization of this data could be useful. Either project can be built on top of chaind, where an archive node needs to be running from which it can fetch the blockchain data & store it in the database. I will validate the idea with the mentor this week and start building it.
  • EPF Dev Update #3 I have been exploring the article on the Medalla Challenge. The article is heavily based on the assumption that validators who joined the testnet were poorly incentivised, as no real value was at stake & the participants had infrastructure management cost overhead; therefore, some anomalies were witnessed in the dataset while analyzing the challenge. Combining the previous findings with the current ones, I worked on a refined document listing metrics for performance analysis. I explored the codebase of chaind, which could be used to access beacon-chain information & later be queried for more complex scenarios. I also went over the Lodestar implementation to understand more about sync committees, inclusion-delay rewards & attestations. Next step Finalize the document that includes the validator role and the metrics that can be used to analyze performance. Clarify the confusion about how sync committees in Eth 2.0 are formed.
  • Responsibilities of a Validator The validator is responsible for: checking that new blocks propagated over the network are valid; creating an attestation as a vote for block validity; signing & broadcasting an attestation every epoch (6.4 minutes); proposing blocks if the validator is chosen as a block proposer; and participating in sync committees. A sketch of querying a validator's attestation duty follows below. Why do we need validators?
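As a small illustration of the attestation duty, the sketch below asks a beacon node which slot and committee a validator should attest in for a given epoch, using the standard Beacon API endpoint POST /eth/v1/validator/duties/attester/{epoch}. The endpoint path and response shape follow the Beacon API spec as I understand it; verify them against your client's documentation.

```typescript
// Minimal sketch: fetch a validator's attester duty for one epoch from a
// beacon node's HTTP API. Assumes Node 18+ (global fetch) and that the node
// exposes the standard Beacon API; response fields are per the spec as I
// understand it and should be verified against the client's docs.
async function attesterDuty(beaconUrl: string, epoch: number, validatorIndex: number) {
  const res = await fetch(`${beaconUrl}/eth/v1/validator/duties/attester/${epoch}`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify([String(validatorIndex)]), // indices are sent as strings
  });
  if (!res.ok) throw new Error(`Beacon node returned ${res.status}`);
  const { data } = await res.json();
  // Each entry tells the validator which slot and committee to attest in.
  return data[0]; // e.g. { slot, committee_index, validator_index, ... }
}
```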
  • EPF Dev Update #2 In my last week, I have been focusing on the validator architecture of Ethereum 2.0. I collected information through the links provided in my last update. For my references, I created an initial document (it will change with revisions) to note the information related to the validator's responsibilities, components & reward calculations. I also explored the attestation data analysis done on the testnet by Pintail to better extract attestation performance metrics for the Eth mainnet. A few metrics identified by reading the article are listed below (see the sketch after this list): Participation trends Participation statistics Active/Inactive rate at X Epoch
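Below is a toy TypeScript sketch of what a per-epoch participation rate and trend computation could look like. The data shapes are illustrative assumptions, not drawn from any real API or from Pintail's analysis.

```typescript
// Toy sketch of the "participation" metrics mentioned above: the share of
// validators with an attestation duty whose attestation was included on
// chain, tracked per epoch. Data shapes are illustrative only.
interface EpochAttestations {
  epoch: number;
  activeValidators: number;     // validators with an attestation duty this epoch
  includedAttestations: number; // attestations that made it on chain
}

function participationRate(e: EpochAttestations): number {
  return e.activeValidators === 0 ? 0 : e.includedAttestations / e.activeValidators;
}

// Trend over a range of epochs, e.g. to spot participation drops.
function participationTrend(epochs: EpochAttestations[]): { epoch: number; rate: number }[] {
  return epochs.map(e => ({ epoch: e.epoch, rate: participationRate(e) }));
}
```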
  • EPF Dev Update #1 The first week's goal was to review the projects proposed by the mentors to the Ethereum fellowship cohort #3 participants and familiarize myself with the current development status. The program also encourages participants to tackle open problems from the current & previous cohorts. I am excited to join the program, and I plan to grab this opportunity to tackle more than one problem in this cohort. Progress this week After going through the open list of problem statements mentioned in the current & previous fellowship repos, I shortlisted the following projects to work on, as they seem interesting & could help the core devs. On-chain analysis of missed attestations by Potuz CL + MEV software by Alex Stokes Tooling useful for debugging CL and EL by Paritosh