EPF Update 10

This past week I have been finalizing & refining the dataset of staking pools on the Ethereum network.
I focused on Lido Finance, Binance, Rocket Pool & a few other smaller pools. Starting with Lido & Rocket Pool: both staking pools have a complex system for identifying their validators & the addresses through which they deposit to the Eth2 deposit contract.
My initial approach to identifying these pools was the Etherscan label feature, which gives a starting point. But these labels cannot be treated as authentic or reliable, as anyone can label any address on Etherscan.
For both Lido & Rocket Pool, I manually extracted a few addresses by quickly scanning through the displayed addresses & by reverse-sorting the addresses in my dataset for which the 'To' address was not the deposit contract. Then I dived deeper into the contracts for Lido & Rocket Pool. Lido has a list of node operators (30 at the time of writing) through which deposits are made to the Eth2 contract. In their smart contract, they have a function that takes two parameters:

  1. The index of the node operator.
  2. The index of the deposit.

This makes it possible to fetch a node operator's details, from which the addresses can be extracted by incrementing the index. I quickly wrote a script to extract this information & retrieved the addresses, with very few false positives.
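
A minimal sketch of that index-incrementing loop, not the actual script. The `fetch` callable is a hypothetical stand-in for the two-parameter contract call, and I assume it returns `None` once the index runs past the last entry (a real wrapper would probably have to translate a reverted call into `None`). The toy registry is there only so the sketch runs without a node connection.

```python
from typing import Any, Callable, Iterator, Optional, Tuple

# Hypothetical stand-in for the on-chain call: takes (operator_index, entry_index)
# and returns a record, or None once the index is past the last entry.
FetchFn = Callable[[int, int], Optional[Any]]


def iter_registry_entries(
    fetch: FetchFn, operator_count: int, max_entries: int = 1_000_000
) -> Iterator[Tuple[int, int, Any]]:
    """Walk every operator, incrementing the entry index until fetch returns None."""
    for operator_index in range(operator_count):
        for entry_index in range(max_entries):
            record = fetch(operator_index, entry_index)
            if record is None:
                break  # no more entries for this operator
            yield operator_index, entry_index, record


# Toy in-memory registry (made-up values) keyed by (operator_index, entry_index).
fake_registry = {
    (0, 0): "pubkey-a",
    (0, 1): "pubkey-b",
    (1, 0): "pubkey-c",
}

entries = list(
    iter_registry_entries(lambda op, i: fake_registry.get((op, i)), operator_count=2)
)
for operator_index, entry_index, record in entries:
    print(operator_index, entry_index, record)
```

Passing the call in as a function keeps the loop independent of any particular web3 client, so the same code can be pointed at a real contract wrapper later.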

Note: In my dataset I identified a pattern commonly used by major staking pools: they have a system of smart contracts that accepts ETH deposits & then batch-submits the 32 ETH deposits to the deposit contract. So the 'To' address will not always be the deposit contract. This will also help us identify smaller pools.
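
A rough sketch of how this pattern can be surfaced from a table of deposit transactions. The row layout (a dict with a `to` key) is an assumption about the dataset, and the sample addresses are made up. The only real value is the mainnet deposit contract address.

```python
from collections import Counter
from typing import Iterable, List, Mapping, Tuple

# Mainnet Eth2 deposit contract, lowercased for comparison.
DEPOSIT_CONTRACT = "0x00000000219ab540356cbb839cbe05303d7705fa"


def rank_intermediary_contracts(
    rows: Iterable[Mapping[str, str]]
) -> List[Tuple[str, int]]:
    """Count deposits whose 'to' address is NOT the deposit contract,
    then return the addresses sorted from most to least deposits."""
    counts: Counter = Counter()
    for row in rows:
        to_address = row["to"].lower()
        if to_address != DEPOSIT_CONTRACT:
            counts[to_address] += 1
    return counts.most_common()


# Made-up sample rows; real rows would come from the deposit dataset.
sample_rows = [
    {"to": "0x00000000219ab540356cBB839Cbe05303d7705Fa"},
    {"to": "0xPoolContractA"},
    {"to": "0xPoolContractA"},
    {"to": "0xPoolContractB"},
]

ranking = rank_intermediary_contracts(sample_rows)
for address, deposit_count in ranking:
    print(address, deposit_count)
```

Addresses near the top of the ranking are candidates for pool contracts and still need manual confirmation.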

For retrieving the Rocket Pool addresses, I dived deep into their documentation & the rocketpool-go-sdk repo and studied how their contracts work. It was a bit tricky to understand as Go is not my cup of tea, but feeding relevant pieces of code to ChatGPT made life a lot easier. Ultimately I was able to pull all of the minipools & the 2 deposit addresses that Rocket Pool mainly uses to deposit to the Eth2 contract. Identifying the minipools took a bit of effort & later I realized it wasn't needed :/. Anyway, I learned a lot from this.

Note: Minipools cannot be counted as part of Rocket Pool, as they are operated independently by users & not by Rocket Pool. Rocket Pool could use them to operate on users' behalf, but I believe that is highly unlikely because they have separate contracts specifically for this functionality.

To identify the Binance pools, I switched back to the reverse-sorted dataset & manually identified a few addresses that were directly funded by Binance, making it evident that these pools are operated by Binance. Binance issues a bETH token, called BinanceEth, on the Binance blockchain.
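
The funding check can be sketched roughly as below. I did this step by hand, so this is only an illustration of the idea. The transfer layout (`from` / `to` keys) and every address in the sample are placeholders, not real Binance wallets.

```python
from typing import Dict, Iterable, Mapping, Set


def funded_by_known_sources(
    transfers: Iterable[Mapping[str, str]],
    known_sources: Set[str],
    depositors: Set[str],
) -> Dict[str, Set[str]]:
    """Map each depositor address to the known source wallets that funded it directly."""
    sources = {address.lower() for address in known_sources}
    wanted = {address.lower() for address in depositors}
    hits: Dict[str, Set[str]] = {}
    for transfer in transfers:
        sender = transfer["from"].lower()
        receiver = transfer["to"].lower()
        if sender in sources and receiver in wanted:
            hits.setdefault(receiver, set()).add(sender)
    return hits


# Placeholder data: one depositor funded by a labelled exchange wallet, one not.
sample_transfers = [
    {"from": "0xExchangeHot1", "to": "0xDepositorA"},
    {"from": "0xSomeoneElse", "to": "0xDepositorB"},
    {"from": "0xExchangeHot1", "to": "0xUnrelated"},
]

matches = funded_by_known_sources(
    sample_transfers,
    known_sources={"0xExchangeHot1"},
    depositors={"0xDepositorA", "0xDepositorB"},
)
print(matches)
```

This only looks one hop back, so a depositor funded through an intermediate wallet would be missed.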

One thing I struggled with was the pools for Coinbase. I could not identify them directly & am currently talking with some teams about marking these pools. Coinbase mints cbETH (CoinbaseEth) in exchange for ETH deposited through the Coinbase app. I couldn't find any deposits or any addresses related to Coinbase or the cbETH contract on Ethereum that deposited directly to the Eth2 contract. I have also checked some other sources like beaconcha.in & Dune Analytics; their stats show Coinbase at roughly 10-13% of the network, which I am skeptical of. This is yet to be identified & won't stop us from proceeding further.

If I were to quantify & assign percentages to each pool's share of the network according to my dataset (i.e. data up to 25th December, 2022), which has exactly 503,581 validators, we get the following numbers.

  1. Lido Finance has 144,585 validators & makes up 28.71% of the network.
  2. Kraken has 38,605 validators & makes up 7.66% of the network.
  3. Binance has 30,878 validators & makes up 6.13% of the network.
  4. Staked.Us has 13,586 validators & makes up 2.70% of the network.
  5. RocketPool has 10,349 validators & makes up 2.05% of the network.
  6. Stakefish has 12,377 validators & makes up 2.449% of the network.
  7. Bitcoin Suisse has 12,376 validators & makes up 2.45% of the network.
  8. Figment has 12,109 validators & makes up 2.40% of the network.
  9. Abyss Finance has 4,284 validators & makes up 0.85% of the network.
  10. Celsius Network has 4,943 validators & makes up 0.98% of the network.
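
For reference, each share is just the validator count divided by the dataset total. A small sketch using the counts from the list above; the variable and function names are my own, and the computed values can differ from the quoted figures by about 0.01 depending on rounding.

```python
TOTAL_VALIDATORS = 503_581  # dataset size up to 25th December, 2022

# Validator counts per pool, taken from the list above.
POOL_VALIDATORS = {
    "Lido Finance": 144_585,
    "Kraken": 38_605,
    "Binance": 30_878,
    "Staked.Us": 13_586,
    "RocketPool": 10_349,
    "Stakefish": 12_377,
    "Bitcoin Suisse": 12_376,
    "Figment": 12_109,
    "Abyss Finance": 4_284,
    "Celsius Network": 4_943,
}


def share_percent(validator_count: int, total: int = TOTAL_VALIDATORS) -> float:
    """Return a pool's share of the network as a percentage."""
    return 100.0 * validator_count / total


shares = {name: share_percent(count) for name, count in POOL_VALIDATORS.items()}

# Print pools from largest to smallest share.
for name, share in sorted(shares.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:<16} {POOL_VALIDATORS[name]:>8,} {share:6.2f}%")
```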

I also plan to build a small dashboard or Python script that shows the above data in a visual format, like a pie chart, along with the number of addresses each pool is currently associated with.

Incorporating the Coinbase validators (roughly estimated at around 12%), the minipools I identified earlier (10,131, about 2%) & roughly 4% for smaller pools gives us a total of 74.37%, which can safely be marked as low false-positive staking pools on Ethereum mainnet. This means the remaining 25.63% of validators are solo stakers.
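
The arithmetic behind that total, as a quick sketch. The listed shares are the percentages quoted in the list above, and the three extra figures are the rough estimates from this paragraph, so the result is only as good as those estimates.

```python
# Percentages quoted for the ten identified pools.
listed_shares = [28.71, 7.66, 6.13, 2.70, 2.05, 2.449, 2.45, 2.40, 0.85, 0.98]

# Rough estimates that are not backed by identified addresses in my dataset.
estimated_shares = {
    "Coinbase": 12.0,
    "Rocket Pool minipools": 2.0,
    "Smaller pools": 4.0,
}

pooled_share = sum(listed_shares) + sum(estimated_shares.values())
solo_share = 100.0 - pooled_share

print(f"pools: {pooled_share:.2f}%")
print(f"solo stakers: {solo_share:.2f}%")
```

Since 18 of these percentage points come from estimates, the solo-staker figure should be read as approximate.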

Next steps

Gathering a dataset with low false positives & high accuracy was a tedious task, as the identification process involved a lot of manual analysis.
Now that I have identified these pools, I can move on to the next phase, which is to statistically analyze the performance of solo stakers vs pools & whether they cause anomalies on mainnet.