# CCFA Internship - IP Correlation between Discv4 and Discv5
TODOs for next week, coming from (Leo's) https://hackmd.io/igb1kHMyQtWjhg97USecjw
- Run Discv4 and Discv5 separately
- Correlate the IPs of nodes in Discv4 ENRs with the IPs on Armiarma
- Try to connect to the IPs of the CL with the default EL ports
# IP Intersection (Leon):
- D1 = 17k nodes from `devp2p discv4`
- D2 = `node_crawler <- D1.json` = 34.495
  - 60k nodes (with duplicates)
- D3 = D1 ∩ D2 (IPs) = 13.314 (13k unique IPs from D2 that also appear among the 17k of D1)
  - 15.790 counting IPs present multiple times
- D4 = Identified_Peers(D2) = 17.692 (`ClientType` not null)
  - 31587 (with duplicates)
- D5 = Identified_Peers(D1) = D1 ∩ D4 (IPs & ID) = 6192
- D6 =
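
The two counts reported for D3 (unique shared IPs vs. counting IPs present multiple times) can be reproduced with plain set operations. The sketch below is only an illustration: it assumes each dataset has already been reduced to a list of records with an `ip` field, which is a hypothetical shape and not the actual `devp2p` or `node_crawler` output format, and the records are toy values.

```python
from collections import Counter

def ip_overlap(d1, d2):
    """Return (unique IPs shared by d1 and d2, d2 records whose IP is in d1)."""
    d1_ips = {rec["ip"] for rec in d1}
    d2_counts = Counter(rec["ip"] for rec in d2)
    shared = d1_ips & set(d2_counts)
    # Same overlap, but every repeated IP in d2 is counted each time it appears.
    with_duplicates = sum(d2_counts[ip] for ip in shared)
    return len(shared), with_duplicates

# Toy records, not real crawl data.
d1 = [{"ip": "10.0.0.1"}, {"ip": "10.0.0.2"}, {"ip": "10.0.0.3"}]
d2 = [{"ip": "10.0.0.1"}, {"ip": "10.0.0.1"}, {"ip": "10.0.0.3"}, {"ip": "10.0.0.9"}]

unique_shared, shared_with_dups = ip_overlap(d1, d2)
print(unique_shared, shared_with_dups)
```

Keeping both numbers side by side makes it easy to see how much of the overlap comes from several nodes sharing one IP.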
# Crawlers (leon)
After running the crawler with only __Discv4__:
- 32.897 records with duplicates
- 23.683 without duplicates
- 9831 identified
  - 0 have `eth2` as `ClientType`
  - 995 have `tmp` as `ClientType`
  - 439 on mainnet (`NetworkID == 1`)
- EL nodes:
  - 8108 nodes
  - Distribution:
    - Geth: 5682
    - Erigon: 2375
    - Besu: 51
After running the crawler with only __Discv5__:
- 13.599 records with duplicates
- 7473 without duplicates
- 7039 identified
  - 5869 have `eth2` as `ClientType`
  - 314 have `tmp` as `ClientType`
  - 31 on mainnet (`NetworkID == 1`)
- EL nodes:
  - 469 nodes
  - Distribution:
    - Geth: 454
    - Erigon: 15
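
The per-run summaries above (records, unique nodes, identified nodes, EL distribution) could be derived roughly as follows. This is a sketch under assumptions: records are deduplicated by a hypothetical `id` field, `client_type` stands in for the crawler's `ClientType` column, and a node counts as an EL node when its client string contains one of the four client names. The sample records are made up.

```python
from collections import Counter

EL_CLIENTS = ("geth", "erigon", "besu", "nethermind")

def summarize(records):
    # Deduplicate by node ID (assumption), keeping the last record per ID.
    unique = {rec["id"]: rec for rec in records}
    # "Identified" means the client type is not empty.
    identified = [r for r in unique.values() if r.get("client_type")]
    dist = Counter()
    for r in identified:
        name = r["client_type"].lower()
        for client in EL_CLIENTS:
            if client in name:
                dist[client] += 1
                break
    return {
        "records": len(records),
        "unique": len(unique),
        "identified": len(identified),
        "el_nodes": sum(dist.values()),
        "distribution": dict(dist),
    }

# Toy records, not real crawl data.
records = [
    {"id": "a", "client_type": "Geth/v1.10.0/linux-amd64/go1.18"},
    {"id": "a", "client_type": "Geth/v1.10.0/linux-amd64/go1.18"},
    {"id": "b", "client_type": "erigon/v2.0.0/linux-amd64/go1.18"},
    {"id": "c", "client_type": "eth2"},
    {"id": "d", "client_type": ""},
]
summary = summarize(records)
print(summary)
```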
### Cross-check IPs with Armiarma:
Number of nodes in _Armiarma_: 11540 (not dep, `clientType` not empty)
- __Discv4__:
  - All nodes
    - Number of nodes: 23.683
    - Nodes with IPs present in both: 3.649
  - Only identified nodes:
    - Number of nodes: 9.831
    - Nodes with IPs present in both: 547
  - Only mainnet nodes:
    - Number of nodes: 439
    - Nodes with IPs present in both: 112
- __Discv5__:
  - All nodes
    - Number of nodes: 7.473
    - Nodes with IPs present in both: 3.863
  - Only identified nodes:
    - Number of nodes: 7.039
    - Nodes with IPs present in both: 3.824
  - Only mainnet nodes:
    - Number of nodes: 31
    - Nodes with IPs present in both: 15
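
The three-level breakdown above (all, identified, mainnet) amounts to filtering the crawled nodes and checking each IP against the set of Armiarma IPs. A minimal sketch, assuming hypothetical field names `ip`, `client_type`, and `network_id` and toy data:

```python
def cross_check(nodes, armiarma_ips):
    """For each subset, return (number of nodes, nodes whose IP is in Armiarma)."""
    subsets = {
        "all": nodes,
        "identified": [n for n in nodes if n["client_type"]],
        "mainnet": [n for n in nodes if n["network_id"] == 1],
    }
    return {
        name: (len(sub), sum(1 for n in sub if n["ip"] in armiarma_ips))
        for name, sub in subsets.items()
    }

# Toy data, not real crawl results.
armiarma_ips = {"10.0.0.1", "10.0.0.2"}
nodes = [
    {"ip": "10.0.0.1", "client_type": "Geth/v1", "network_id": 1},
    {"ip": "10.0.0.2", "client_type": "", "network_id": None},
    {"ip": "10.0.0.3", "client_type": "erigon/v2", "network_id": 56},
    {"ip": "10.0.0.4", "client_type": "Geth/v1", "network_id": 1},
]
result = cross_check(nodes, armiarma_ips)
print(result)
```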
__TODOs__:
-
-
__DONE__:
- Run discv4 and get the number of
  - identified nodes
  - `eth2` / `tmp` nodes
  - mainnet nodes
- Run discv5 and get the number of
  - identified nodes
  - `eth2` / `tmp` nodes
  - mainnet nodes
- Take the IPs of discv4 mainnet nodes and check whether they are present in Armiarma
- Take the IPs of discv5 mainnet nodes and check whether they are present in Armiarma
- Get the number and distribution of EL nodes (`ClientType` contains one of)
  - Besu
  - Nethermind
  - Erigon
  - Geth
# Crawlers (Ahmed):
*TODOs*:
Last week!
- Run EL crawler on the nodes that Armiarma didn't connect to and generate DB
- Build a dashboard for the nodes that we managed to connect to with Leon's `test.json`
This week!
1. How to identify CL nodes from the code?
2. A list of ways to filter by client type (which client names mean that a node is on the Eth blockchain?). Done
3. Compare the results from the `No Armiarma` run and its generated DB to figure out how many nodes it connected to from the file that we passed to it. Use a table join to figure that out. Done
4. Export the IP addresses from `discovery5.db` and compare them with the IP addresses from the ArmiarmaV2 DB to see how many Armiarma nodes are on discovery 5.
5. Run an experiment on discovery 4. Figure out which of those nodes are Execution Layer clients using the `ClientType`. Done
6. How many nodes from discovery4 are on mainnet (`NetworkID == 1`)? Make a list of the IP addresses of the ones on mainnet and compare it with the IP addresses in Armiarma. Done
7. Only include the Execution Layer clients in the new dashboard
8. Run the crawler on our node and figure out what information it relays. In progress
*Done*:
Last week!
- Ran discovery4 separately for 24hrs and generated a DB called `discovery4.db`
  Discovery4 DB highlights:
  - Total number of nodes: 32,897
  - Number of connected nodes: 15,053
  - Number of "eth2" nodes: 0
  - Number of "tmp" nodes: 1,161
  - Number of EL nodes: 13,892
- Ran discovery5 separately for 24hrs and generated a DB called `discovery5.db`
  Discovery5 DB highlights:
  - Total number of nodes: 13,599
  - Number of connected nodes: 12,831
  - Number of "eth2" nodes: 10,675
  - Number of "tmp" nodes: 622
  - Number of EL nodes: 1,534
- Ran node crawler on the nodes that Armiarma couldn't connect to for 6h:
  Observation: the crawler connected to 6,594 nodes in the first 6 hours and to 9,856 nodes in the second 6 hours. The `Nodes.json` file that I initially fed the crawler had 6,704 nodes, so the crawler is probably connecting to more nodes than it was given. However, the `JSON` file that I feed the crawler IS being updated with the information the crawler extracts from the connected nodes.
- Built a Metabase dashboard for EL crawler on the 6,106 connected nodes
  Dashboard graphs:
  - Client Name Distribution
  - Geographical Distribution
  - Go version Distribution
  - Operating System Distribution
  - Network ID Distribution
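
One way to check the observation that the crawler connects to more nodes than it was given is to compare the node IDs in the input file with the IDs of the connected nodes. The sketch below assumes the input `JSON` is a top-level object keyed by node ID and that the connected IDs have already been exported from the DB. Both are assumptions, and the values are toy data.

```python
import json

def compare_input_to_connected(input_json_text, connected_ids):
    # Assumption: the input file is a JSON object whose keys are node IDs.
    given = set(json.loads(input_json_text))
    connected = set(connected_ids)
    return {
        "given": len(given),
        "connected_from_input": len(given & connected),
        "connected_not_in_input": len(connected - given),
    }

# Toy data, not real crawl results.
input_json_text = '{"n1": {}, "n2": {}, "n3": {}}'
connected_ids = ["n1", "n3", "n4", "n5"]

report = compare_input_to_connected(input_json_text, connected_ids)
print(report)
```

A non-zero `connected_not_in_input` would support the idea that the crawler discovers nodes beyond the ones in the file.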
This week!
- How can we use the client name to filter out the nodes that are on the Eth blockchain EL and not on other sidechains?
  The four Execution Layer clients on the ETH blockchain are Besu, Nethermind, Erigon, and Geth. However, some nodes have an EL client name but are not on ETH mainnet, for example nodes with client name `Erigon` and NetworkId `56`. Filtering by `Client name` alone is therefore not sufficient to find the nodes on the ETH EL. We have to filter by `(Client name == Besu OR Nethermind OR Erigon OR Geth) AND NetworkID == 1`. This filter yields 394 nodes in total.
- How many nodes from the discovery 4 nodes are on mainnet? Same as above.
- I compared the generated DB of the `NoArmiarma` crawl with the file that I initially passed to the crawler, to see how many of the nodes in that `JSON` the crawler connected to. The `JSON` file initially had 6,794 nodes and the crawler managed to connect to 729 of them. I got this number by joining the generated DB and the CSV on the `IP` column and filtering by `ClientType != ''`.
- I joined the `Eth Nodes` table from Armiarma with the IPs that the Discovery4 EL crawler connected to. The join yields 8,820 nodes from 394 IP addresses. What information are we looking for out of those nodes?
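
The mainnet EL filter and the join on `IP` described above can be sketched with `sqlite3`. Everything about the schema here is an assumption for illustration: the table names `nodes` and `eth_nodes`, the `peer_id` column, and the toy rows are hypothetical, and only `ClientType`, `NetworkID`, and `IP` are taken from the notes. The example also shows how one IP can match several Armiarma rows, which is how a join can return more nodes than IPs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE nodes (ID TEXT, ClientType TEXT, NetworkID INTEGER, IP TEXT);
CREATE TABLE eth_nodes (peer_id TEXT, ip TEXT);
INSERT INTO nodes VALUES
  ('a', 'Geth/v1.10.0/linux-amd64', 1, '10.0.0.1'),
  ('b', 'erigon/v2.0.0/linux-amd64', 56, '10.0.0.2'),
  ('c', 'besu/v22.1.0/linux-x86_64', 1, '10.0.0.3'),
  ('d', '', NULL, '10.0.0.4');
INSERT INTO eth_nodes VALUES
  ('p1', '10.0.0.1'), ('p2', '10.0.0.1'), ('p3', '10.0.0.2'), ('p4', '10.0.0.9');
""")

# EL client name AND mainnet; SQLite's LIKE is case-insensitive for ASCII.
EL_MAINNET = """
    SELECT ID, IP FROM nodes
    WHERE NetworkID = 1
      AND (ClientType LIKE '%geth%' OR ClientType LIKE '%erigon%'
           OR ClientType LIKE '%besu%' OR ClientType LIKE '%nethermind%')
"""
el_mainnet = conn.execute(EL_MAINNET).fetchall()

# Join the (hypothetical) Armiarma table with the filtered crawler nodes on IP.
joined = conn.execute(f"""
    SELECT e.peer_id, m.IP
    FROM eth_nodes AS e
    JOIN ({EL_MAINNET}) AS m ON e.ip = m.IP
""").fetchall()

print(len(el_mainnet), len(joined), len({ip for _, ip in joined}))
```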