# Open-source Data Analytics of Ethereum Staking Pools
## Background
Ethereum staking starts with depositing 32 ETH to the [deposit contract](https://etherscan.io/address/0x00000000219ab540356cBB839Cbe05303d7705Fa) by calling its "deposit" function, which emits a "DepositEvent". The validator's pubkey given in the deposit then becomes valid for staking on the beacon chain.
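To illustrate the 32 ETH figure: to our understanding, the "DepositEvent" carries the deposit amount as an 8-byte little-endian integer denominated in gwei. The sketch below decodes such a field. The helper name `decode_deposit_amount` is hypothetical and is not taken from the project's code.

```python
GWEI_PER_ETH = 10**9


def decode_deposit_amount(amount_hex: str) -> int:
    """Decode the 8-byte little-endian amount field of a DepositEvent into gwei.

    Assumption: the field is passed as a hex string, with or without a 0x prefix.
    """
    if amount_hex.startswith("0x"):
        amount_hex = amount_hex[2:]
    raw = bytes.fromhex(amount_hex)
    if len(raw) != 8:
        raise ValueError("amount field must be exactly 8 bytes")
    return int.from_bytes(raw, "little")


# A full 32 ETH deposit as it would be encoded in the event data.
amount_gwei = decode_deposit_amount("0x0040597307000000")
print(amount_gwei // GWEI_PER_ETH)  # 32
```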
Because staking directly on the beacon chain is complicated and requires specialist knowledge, many staking pools offer simpler, beacon-chain-based staking services to ordinary ETH holders. These [staking pools](https://beaconcha.in/stakingServices) create many validators by depositing ETH from the same address, or from addresses that share the same "name tag". These features make it possible to group validators by staking pool for further analysis.
Several projects already analyze Ethereum staking pools, such as [rated.network](https://www.rated.network/), [beaconcha.in](https://beaconcha.in/pools), [ethereumpools.info](http://ethereumpools.info), and [pools.invis.cloud](https://pools.invis.cloud/), but they show different results. Because these projects are not open-source, the accuracy of their data is uncertain, and it is unclear which one we should rely on.
Therefore, we decided to conduct open-source data analytics on Ethereum staking pools. The source code is available on [GitHub](https://github.com/Zachary-Lingle/ethsta_staking_analysis), and the data is visualized on [ethsta.com](https://ethsta.com/).
## How it works
![](https://i.imgur.com/lUJjloC.png)
## ETL
All the raw data is obtained from [Etherscan APIs](https://etherscan.io/apis).
1. DepositEvent
    1. txid: the transaction that calls the deposit contract and emits the event
    2. eth2_validator: the validator pubkey in the calldata
2. Internal transaction (the contract caller is another contract, like Lido)
    1. txid: the transaction id that generates the internal transaction
    2. from: the address that creates the transaction
    3. value: the ETH amount of the internal transaction
3. Transaction (the contract caller is an EOA, like Coinbase)
    1. txid: the transaction id
    2. from: the address that creates the transaction
    3. value: the ETH amount of the transaction
4. Tag
    1. address: an EOA address or contract address
    2. name: the "name tag" of the address on Etherscan
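The four record types above could be modeled roughly as follows. This is a minimal sketch, not the project's actual schema: the class names are hypothetical, `from` is renamed to `sender` because `from` is a Python keyword, and merging internal and regular transactions into one `Transfer` type with an `internal` flag is our simplifying assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DepositEvent:
    txid: str            # transaction that calls the deposit contract
    eth2_validator: str  # validator pubkey from the calldata


@dataclass(frozen=True)
class Transfer:
    """An internal transaction (contract caller) or a transaction (EOA caller)."""
    txid: str
    sender: str     # the `from` field of the record
    value: float    # amount in ETH (assumed unit)
    internal: bool  # True for an internal transaction, False for an EOA call


@dataclass(frozen=True)
class Tag:
    address: str  # EOA or contract address
    name: str     # the "name tag" shown on Etherscan


# Placeholder values for illustration only.
event = DepositEvent(txid="0xaaa", eth2_validator="0xpubkey1")
transfer = Transfer(txid="0xaaa", sender="0xpool", value=32.0, internal=True)
tag = Tag(address="0xpool", name="Example Pool")

# The records link up through txid and the sender address.
print(event.txid == transfer.txid and transfer.sender == tag.address)
```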
## Grouping
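The grouping process is implemented in Python; for simplicity, we describe it here in SQL (Hive/Spark style, since it uses `COLLECT_SET`).
```sql
SELECT
    tag.name,
    -- DISTINCT so that repeated deposits to one validator are counted once
    COUNT(DISTINCT event.eth2_validator) AS validator_count,
    SUM(deposit.value) AS total_value,
    COLLECT_SET(event.eth2_validator) AS eth2_address,
    COLLECT_SET(deposit.`from`) AS eth1_address
FROM event
JOIN (
    -- A deposit is made either by a contract (internal transaction) or by an EOA (transaction)
    SELECT txid, `from`, value FROM internal_transaction
    UNION ALL
    SELECT txid, `from`, value FROM transaction
) AS deposit ON event.txid = deposit.txid
JOIN tag ON tag.address = deposit.`from`
GROUP BY tag.name
```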
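The same grouping can be sketched in plain Python. This is an illustration under stated assumptions rather than the code in our repository: the function name `group_by_pool` and the tuple layouts are hypothetical, internal and regular transactions are passed in as one combined list, and addresses without a tag fall back to "others" instead of being dropped as they would be by an inner join.

```python
from collections import defaultdict


def group_by_pool(events, transfers, tags):
    """Group deposits by the name tag of the depositing address.

    events:    iterable of (txid, eth2_validator)
    transfers: iterable of (txid, from_address, value_eth), internal or regular
    tags:      dict mapping address -> name tag
    """
    # Sum the value per transaction; keep the first sender seen for that txid.
    by_txid = {}
    for txid, sender, value in transfers:
        first_sender, total = by_txid.get(txid, (sender, 0.0))
        by_txid[txid] = (first_sender, total + value)

    pools = defaultdict(
        lambda: {"validators": set(), "eth1_addresses": set(), "total_value": 0.0}
    )
    counted_txids = set()
    for txid, validator in events:
        if txid not in by_txid:
            continue  # no matching transfer record for this event
        sender, value = by_txid[txid]
        pool = pools[tags.get(sender, "others")]
        pool["validators"].add(validator)
        pool["eth1_addresses"].add(sender)
        if txid not in counted_txids:
            # Count a transaction's value once, even if it emitted several events.
            pool["total_value"] += value
            counted_txids.add(txid)
    return dict(pools)


# Placeholder data for illustration only.
events = [("0xa", "v1"), ("0xb", "v2"), ("0xb", "v3"), ("0xc", "v4")]
transfers = [("0xa", "0xlido", 32.0), ("0xb", "0xcb", 64.0), ("0xc", "0xunknown", 32.0)]
tags = {"0xlido": "Lido", "0xcb": "Coinbase"}

for name, pool in sorted(group_by_pool(events, transfers, tags).items()):
    print(name, len(pool["validators"]), pool["total_value"])
```

Using sets for validators and addresses mirrors `COLLECT_SET`: a validator that receives several deposits is still counted once.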
## Visualization
![](https://i.imgur.com/Sv8VmDm.png)
![](https://i.imgur.com/FROtWV7.png)
From the pie chart on ethsta.com, we can see that Lido owns more than a quarter of all validators. The top 3 staking pools (Lido, Coinbase, and Kraken) together own more than half. The table also shows that these three pools are still growing fast in both validator count and deposit amount. In addition, about 30% of validators are classified as "others" because we could not obtain tags for their addresses.
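Shares like these can be derived from the per-pool validator counts. The sketch below shows one way to compute the combined share of the largest named pools. The function name `top_n_share` is hypothetical, and the counts are made up for illustration; they are not the figures shown on ethsta.com.

```python
def top_n_share(validator_counts, n, exclude=("others",)):
    """Combined share of all validators held by the n largest named pools."""
    total = sum(validator_counts.values())
    if total == 0:
        return 0.0
    named = sorted(
        (count for name, count in validator_counts.items() if name not in exclude),
        reverse=True,
    )
    return sum(named[:n]) / total


# Made-up counts, for illustration only.
counts = {"Pool A": 50, "Pool B": 30, "Pool C": 10, "Pool D": 5, "others": 5}
print(top_n_share(counts, 1))  # 0.5
print(top_n_share(counts, 3))  # 0.9
```

The "others" bucket is excluded from the ranking because it is a residual category rather than a single pool, but it still counts toward the total.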
![](https://i.imgur.com/gluJIMm.png)
The stacked area chart above shows how the number of validators owned by each staking pool has grown over time. Some traditional exchanges and mining pools were the first to start Ethereum staking, but they have been far surpassed by Coinbase and Lido since May 2021, when those two began participating on a large scale. Both pools then grew roughly linearly until Lido's recent surge, after which its curve has started to look closer to exponential.
Overall, the share of Ethereum staking has tilted more and more toward Lido and Coinbase since they began participating at scale. Today Lido, Coinbase, and Kraken alone control more than 50% of validators.
## Future Work
We will continue to analyze the validators in "others" and try to identify the entities behind them. Issues pointing out data errors are welcome. We are also interested in analyzing client diversity, which may be helpful for the upcoming Ethereum "Merge".