Final EPF Development Update (Cohort 3)

This is the final report for my project, On-chain analysis of staking pools attestations. The project studied staking pools and how they impact the Ethereum mainnet.

Project Summary

My main goal was to compare the performance of validators controlled by staking pools with that of validators that are not part of any pool. A notable difference between the two groups would mean that the clustering of many nodes is impacting the Ethereum mainnet.

When I chose this project, I planned to use the existing tools, information and data on staking pools to get a head start. After a week of exploring different platforms and talking with various teams building on Ethereum, it seemed that this type of data was either not easily available or was private information yet to be released. A brief analysis of the information I did find left me with the unsettling feeling that it might contain discrepancies and inaccuracies, so performing analysis on it would not make much sense. I therefore divided my main project into 2 parts.

Produce a dataset by identifying and labelling the network's validators with their staking pools, with low false positives and high accuracy. [Finished] Here my goal was to associate pools with their public keys and identify at least 70% of the network, which I was able to do.
2/28/2023

EPF Update 10

This past week I have been finalizing and refining the dataset of the pools in the overall Ethereum network. I focused on Lido Finance, Binance, Rocket Pool and a few other smaller pools.

Starting with Lido and Rocket Pool: both staking pools have a complex system, which makes it hard to identify their validators and the addresses through which they deposited to the Eth2 deposit contract. My initial approach was to use the Etherscan label feature, which gives a starting point. These labels cannot be treated as authentic or reliable, though, since anyone can label any address on Etherscan. For both Lido and Rocket Pool, I manually extracted a few addresses by quickly scanning through the displayed addresses and by reverse-sorting the addresses in my dataset for which the 'To' address was not the deposit contract.

Then I dived deeper into the contracts for Lido and Rocket Pool. Lido has a list of node operators (30 at the time of writing) which deposit to the Eth2 deposit contract. Their smart contract has a function that takes two parameters:

The index of the node operator.
The index of the deposit.

This makes it possible to get the node operator details, from which the addresses can be extracted by incrementing the index. I quickly created a script to extract this information and retrieved the addresses, which have very few false positives.
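The index-incrementing extraction described above can be sketched roughly as follows. This is a minimal illustration, not the script used in the project: the update does not name the contract function or its return shape, so `read_entry` is a hypothetical callable standing in for the on-chain call, and the in-memory `FAKE_REGISTRY` exists only so the sketch runs without a node.

```python
from typing import Callable, Dict, Iterator, List, Optional

# Hypothetical reader: returns the entry stored for (operator_index, deposit_index),
# or None once deposit_index is past the operator's last entry. In practice this
# would wrap the real contract call, whose name is not given in this update.
EntryReader = Callable[[int, int], Optional[str]]


def iter_operator_entries(read_entry: EntryReader, operator_index: int) -> Iterator[str]:
    """Yield every entry of one node operator by incrementing the second index."""
    deposit_index = 0
    while True:
        entry = read_entry(operator_index, deposit_index)
        if entry is None:
            return
        yield entry
        deposit_index += 1


def collect_pool_entries(read_entry: EntryReader, operator_count: int) -> Dict[int, List[str]]:
    """Walk all node operators and gather their entries, keyed by operator index."""
    return {
        operator: list(iter_operator_entries(read_entry, operator))
        for operator in range(operator_count)
    }


# In-memory stand-in for the contract (assumption, for demonstration only).
FAKE_REGISTRY = {0: ["0xaa", "0xab"], 1: ["0xba"], 2: []}


def fake_read_entry(operator_index: int, deposit_index: int) -> Optional[str]:
    entries = FAKE_REGISTRY.get(operator_index, [])
    return entries[deposit_index] if deposit_index < len(entries) else None


if __name__ == "__main__":
    print(collect_pool_entries(fake_read_entry, operator_count=3))
```

Because every entry comes straight from the pool's own contract rather than from user-supplied labels, this kind of enumeration is what keeps the false-positive rate low.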
2/28/2023

EPF Dev Update #9

This past week has been very busy, full of debugging, and informative. I have finalized the indexer for retrieving data, and it now returns the data in a format that is easier to use for analysis.

I have been in touch with the authors of a repository (https://github.com/alrevuelta/eth-metrics/) where they statistically determined and associated staking pools and their addresses. After the conversation, however, it turns out that the data is becoming outdated, and manually analyzing the hard-coded addresses in that repo gave me doubts about the association of addresses with specific pools, Binance for example. We are currently discussing this and will see where it goes. Another important thing alrevuelta mentioned is that asking staking pools on Discord to confirm address associations is not a very good approach, as asking them to verify apparently doesn't work.

While verifying the authenticity of my dataset, I found an anomaly regarding validator indexes between my data and the data displayed on the Beaconcha.in website. My dataset matches that of Etherscan, but it differs from the Beaconcha.in data. After diving into the open-sourced codebase, I discovered a section where it directly queries the Lighthouse client for data retrieval. The anomaly doesn't seem to be an issue with the explorer, but rather with the Lighthouse client. This still needs to be verified; I will probably get in touch with the team later.

I also thought of an idea for associating validators with staking pools, similar to how Uniswap maintains its token list. However, this approach relies on the optimistic assumption that the submitted associations are correct. We might also need to offer contributors some incentives to increase authenticity. This would be very beneficial for the community as a whole. I might pursue this after the cohort.
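As a rough illustration of the token-list-style idea above, a community-maintained list could be merged and sanity-checked before anyone relies on it. The sketch below is hypothetical throughout: the `(pool_name, pubkey)` entry shape and the function names are assumptions made for illustration, not an existing format. The only hard fact it uses is that a validator public key is 48 bytes, i.e. 96 hex characters after the `0x` prefix.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

# A validator (BLS) public key is 48 bytes: "0x" followed by 96 hex characters.
PUBKEY_RE = re.compile(r"^0x[0-9a-f]{96}$")


def normalize_pubkey(pubkey: str) -> str:
    """Lower-case a submitted key and reject anything that is not 48 bytes of hex."""
    key = pubkey.strip().lower()
    if not PUBKEY_RE.match(key):
        raise ValueError(f"not a 48-byte hex public key: {pubkey!r}")
    return key


def merge_pool_list(
    entries: Iterable[Tuple[str, str]],
) -> Tuple[Dict[str, str], Dict[str, List[str]]]:
    """Merge (pool_name, pubkey) submissions.

    Returns (labels, conflicts): labels maps each key claimed by exactly one pool
    to that pool; conflicts maps keys claimed by several pools to the sorted pool
    names, so they can be reviewed instead of silently trusted.
    """
    claims: Dict[str, Set[str]] = defaultdict(set)
    for pool_name, pubkey in entries:
        claims[normalize_pubkey(pubkey)].add(pool_name)

    labels = {key: next(iter(pools)) for key, pools in claims.items() if len(pools) == 1}
    conflicts = {key: sorted(pools) for key, pools in claims.items() if len(pools) > 1}
    return labels, conflicts


if __name__ == "__main__":
    key_a = "0x" + "aa" * 48
    key_b = "0x" + "bb" * 48
    submissions = [("PoolOne", key_a), ("PoolOne", key_a.upper().replace("0X", "0x")),
                   ("PoolOne", key_b), ("PoolTwo", key_b)]
    labels, conflicts = merge_pool_list(submissions)
    print(labels)
    print(conflicts)
```

Surfacing conflicting claims separately is one way to soften the optimistic assumption: contested keys get flagged for review rather than labelled.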
Currently, I am in the phase of classifying the staking pools and have a few approaches that I will follow. This may deviate a bit from the main project, but this groundwork needs to be in place before the analysis for it to be accurate. I want to note that if we don't have enough data, we may have to conclude that our analysis is incomplete or inaccurate. However, if I see the complications growing, I will shortlist the major staking pools that can be labelled with few false positives and continue from there. This will not cover the performance of all pools, only a few of them, but it will still give us valuable insights while being more practical. Here is how I plan to label the data:
1/23/2023

EPF Dev Update #6

I went through the resources shared by Mario last week and found a useful repository at https://github.com/alrevuelta/eth-metrics that has a list of staking pool addresses. To ensure data accuracy and reduce false positives, I decided to develop an indexer for the deposit contract on Ethereum. I have created a repository for this project, which can be found here.

In addition, I explored different APIs for data retrieval and took the time to verify the events that were being recorded. I compared the data for the first batch in the MongoDB database with the data on the Ethereum blockchain to make sure everything was consistent and correct.

Next steps

Continue to work on the event indexer, refining it as needed, and finalize the data.
Get in touch with staking pool teams and eth-metrics devs to cross-verify the addresses of the staking pools. (This might take time)
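One way to check indexed deposit events for consistency is sketched below. It is not the indexer from the repository: the `Deposit` record and the function names are hypothetical. It relies only on how the deposit contract encodes its event data, where the public key is 48 bytes and the amount (in gwei) and deposit index are 8-byte little-endian values. Since deposit indexes increase by one per deposit, a gap in the indexed range points to a missed event.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Deposit:
    """Hypothetical record for one decoded deposit event."""
    pubkey: str
    amount_gwei: int
    index: int


def decode_deposit(pubkey: bytes, amount: bytes, index: bytes) -> Deposit:
    """Decode the raw byte fields of a deposit event into integers."""
    if len(pubkey) != 48:
        raise ValueError("pubkey must be 48 bytes")
    if len(amount) != 8 or len(index) != 8:
        raise ValueError("amount and index must be 8 bytes each")
    return Deposit(
        pubkey="0x" + pubkey.hex(),
        amount_gwei=int.from_bytes(amount, "little"),  # little-endian uint64, in gwei
        index=int.from_bytes(index, "little"),         # little-endian uint64
    )


def find_index_gaps(deposits: Iterable[Deposit]) -> List[int]:
    """Return deposit indexes missing between the lowest and highest indexed one."""
    seen = {deposit.index for deposit in deposits}
    if not seen:
        return []
    return [i for i in range(min(seen), max(seen) + 1) if i not in seen]


if __name__ == "__main__":
    raw_amount = (32 * 10**9).to_bytes(8, "little")  # 32 ETH expressed in gwei
    batch = [
        decode_deposit(b"\x01" * 48, raw_amount, (5).to_bytes(8, "little")),
        decode_deposit(b"\x02" * 48, raw_amount, (7).to_bytes(8, "little")),
    ]
    print(find_index_gaps(batch))
```

A gap check like this complements the manual comparison against on-chain data, because it can run over every batch the indexer writes.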
1/10/2023