This past week has been very busy, full of debugging, and informative.
alrevuelta mentioned that asking staking pools on Discord for their address associations is not a good approach, since asking them to verify apparently doesn't work. I also thought of an idea for associating validators with staking pools, similar to how Uniswap maintains its token list. However, this approach relies on the optimistic assumption that the submitted data is correct. We might also need to attach incentives for contributors to improve authenticity. This would be very beneficial for the community as a whole. I might pursue this after the cohort.
Currently, I am classifying the staking pools and have a few approaches that I will follow. This may deviate a bit from the main project, but this groundwork needs to be laid before the analysis can be accurate. I want to note that if we don't have enough data, we may have to conclude that our analysis is incomplete or inaccurate. However, if I see the complications growing, I will shortlist the major staking pools, accepting a few false positives, and continue from there. That would not cover the performance of all pools, only a few of them, but it would still give us valuable insights while being more practical. Here is how I plan to label the data:
- Take each From address from the dataset and try to find a label for it through Etherscan; the label cloud (https://etherscan.io/labelcloud) is one such source. I have shortlisted the labels.
- Find From addresses in the dataset that have the same transaction hash appearing more than once. This way we can identify the contracts that are making batch requests to the deposit contract, which probably belong to some pool we don't know yet.
- Use the eth_metrics repository to extract common addresses and remove false positives as much as I can. I aim for a dataset in which the majority of From addresses are associated with a staking pool label for further analysis.
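
The batch-deposit idea above (the same transaction hash appearing more than once) and the coverage goal could be sketched roughly as follows. This is only a minimal illustration, assuming each dataset row carries a transaction hash and a From address; the row layout, the sample addresses, the label map, and the function names are all made up for the example and are not taken from the actual dataset or from eth_metrics.

```python
from collections import defaultdict

# Hypothetical sample rows: one (tx_hash, from_address) pair per deposit event.
deposits = [
    ("0xaaa", "0xPool1"),
    ("0xaaa", "0xPool1"),
    ("0xaaa", "0xPool1"),
    ("0xbbb", "0xSolo1"),
    ("0xccc", "0xPool2"),
    ("0xccc", "0xPool2"),
]

# Hypothetical labels, e.g. collected by hand from a label source.
known_labels = {"0xpool1": "ExamplePool"}


def batch_senders(rows):
    """Map each From address to the number of its transactions
    whose hash appears more than once in the dataset."""
    events_per_tx = defaultdict(int)
    for tx_hash, sender in rows:
        events_per_tx[(tx_hash, sender.lower())] += 1

    batched = defaultdict(int)
    for (_tx_hash, sender), count in events_per_tx.items():
        if count > 1:  # several deposits in a single transaction
            batched[sender] += 1
    return dict(batched)


def unlabeled_batch_senders(rows, labels):
    """Batch senders with no known label: candidates for unknown pools."""
    return sorted(addr for addr in batch_senders(rows) if addr not in labels)


def label_coverage(rows, labels):
    """Fraction of deposit rows whose From address has a known label."""
    if not rows:
        return 0.0
    labeled = sum(1 for _tx_hash, sender in rows if sender.lower() in labels)
    return labeled / len(rows)


candidates = unlabeled_batch_senders(deposits, known_labels)
coverage = label_coverage(deposits, known_labels)
```

Addresses are lowercased before comparison so that checksummed and non-checksummed forms of the same address match. The coverage number is one way to judge whether enough From addresses are labeled before moving on to the performance analysis.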