This is a high-level overview of the project, mainly offering the details needed for operating and understanding the website. In a future iteration of the website, I would like to incorporate the information in this post into little information bubbles you can click on to see the information where it's most relevant.
You can read my Updates to learn more about what I was doing while working on the project. You can also watch my EPF Day Presentation from Devconnect Istanbul 2023.
I wanted to make a tool for node operators to get information about their nodes. Initially, I just wanted to show if another node on the network could connect to your node, so you knew your router/firewalls were set up correctly, but then Mario Havel suggested I revive the Node Crawler project for my Ethereum Protocol Fellowship project. I worked on the project permissionlessly (when I had time) as I still have a full-time job.
The goal of the project is still for node operators to get information about their nodes, but also for researchers and core developers to get information about the various Ethereum networks. So I'm focusing on the features which will be most useful to these groups. All crawled data is kept; nothing is filtered or excluded. Even non-Ethereum network data is kept. Consensus layer and portal network clients are coming soon.
It's also very important that the project be GPL-compatible, and that it be just as easy for anyone to run as an Ethereum node. These have been the main design considerations for the technologies used.
Most concepts should be pretty easy to understand if you are running a node, but I think some should be explained in more detail.
The crawler not only connects to nodes found on the discovery network, but also accepts connections, so nodes will randomly find the crawler on the discovery network and try to connect to it as a peer. The crawler is connected to, and crawling, both the DiscV4 and DiscV5 discovery networks.
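This kind of crawling can be done with go-ethereum's discovery packages. Below is a minimal, hedged sketch of walking the DiscV4 DHT; it is not the crawler's actual code, error handling is trimmed, and the DiscV5 side is analogous via `discover.ListenV5`:

```go
package main

import (
	"fmt"
	"log"
	"net"

	"github.com/ethereum/go-ethereum/crypto"
	"github.com/ethereum/go-ethereum/p2p/discover"
	"github.com/ethereum/go-ethereum/p2p/enode"
	"github.com/ethereum/go-ethereum/params"
)

func main() {
	key, _ := crypto.GenerateKey()

	db, _ := enode.OpenDB("") // empty path = in-memory node DB
	ln := enode.NewLocalNode(db, key)

	conn, err := net.ListenUDP("udp", &net.UDPAddr{Port: 30304})
	if err != nil {
		log.Fatal(err)
	}

	// Bootstrap from the well-known mainnet bootnodes.
	var bootnodes []*enode.Node
	for _, url := range params.MainnetBootnodes {
		bootnodes = append(bootnodes, enode.MustParseV4(url))
	}

	disc, err := discover.ListenV4(conn, ln, discover.Config{
		PrivateKey: key,
		Bootnodes:  bootnodes,
	})
	if err != nil {
		log.Fatal(err)
	}
	defer disc.Close()

	// RandomNodes walks the DHT, yielding nodes as they are found.
	it := disc.RandomNodes()
	defer it.Close()
	for it.Next() {
		fmt.Println(it.Node().URLv4())
	}
}
```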
The node's details are saved to the database, and the crawler disconnects so the node's peer limit is not needlessly consumed.
Accepting connections is a critical feature of the crawler for nodes which cannot be reached because they are behind a firewall or Carrier-grade NAT. We would not have the details of these nodes if they did not connect at least once to the crawler. Since we have to wait for these nodes to connect again before we can update their details, there are quite a few nodes with stale data.
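To illustrate the accept side, here is a hedged sketch of a go-ethereum `p2p.Server` configured to accept inbound connections only. It is not the crawler's actual code; the real crawler also records the handshake details before disconnecting, which is omitted here:

```go
package main

import (
	"log"

	"github.com/ethereum/go-ethereum/crypto"
	"github.com/ethereum/go-ethereum/p2p"
)

func main() {
	key, err := crypto.GenerateKey()
	if err != nil {
		log.Fatal(err)
	}

	srv := &p2p.Server{Config: p2p.Config{
		PrivateKey: key,
		MaxPeers:   512,
		ListenAddr: ":30303",
		NoDial:     true, // never dial out; only accept inbound peers
		// No Protocols registered: inbound peers complete the handshake
		// and are then dropped, which is enough to capture their details.
	}}
	if err := srv.Start(); err != nil {
		log.Fatal(err)
	}
	defer srv.Stop()

	select {} // run until killed
}
```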
This is how the node's details were acquired, relative to the crawler. So `Dial` refers to the crawler dialing a connection, and `Accept` refers to the crawler accepting a connection.
As we can see from the graph above, most (> 60%) of the nodes cannot accept connections. If this is your node, please review your router/firewall configuration so we can pump that success percentage up.
A string containing a bunch of information about the node. From this, we can extract things like the client's name, version, operating system, architecture, and language version.
Not all nodes contain all this data, so you might see places where this is blank.
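As a rough illustration, here is how such an identifier string could be split into its parts. The exact format varies between clients, so this parser is an assumption on my part, not the crawler's actual one:

```go
package main

import (
	"fmt"
	"strings"
)

// ClientInfo holds the fields commonly found in an identifier such as
// "Geth/v1.13.4-stable-3f907d6a/linux-amd64/go1.21.3".
type ClientInfo struct {
	Name, Version, OS, Arch, Language string
}

func parseIdentifier(id string) ClientInfo {
	var info ClientInfo
	parts := strings.Split(id, "/")
	if len(parts) > 0 {
		info.Name = parts[0]
	}
	if len(parts) > 1 {
		info.Version = parts[1]
	}
	if len(parts) > 2 {
		// The third segment is usually "os-architecture".
		osArch := strings.SplitN(parts[2], "-", 2)
		info.OS = osArch[0]
		if len(osArch) == 2 {
			info.Arch = osArch[1]
		}
	}
	if len(parts) > 3 {
		info.Language = parts[3]
	}
	return info
}

func main() {
	fmt.Printf("%+v\n", parseIdentifier("Geth/v1.13.4-stable-3f907d6a/linux-amd64/go1.21.3"))
}
```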
This is a difficult thing to do: most of the nodes are not exposed to the internet, so they cannot accept connections from peers.
A crawler which is only trying to connect to nodes will have a very limited view of the network, because it will only be able to get the details of nodes which are accepting connections.
Even when nodes are properly open to connections, they very commonly have too many peers, so we cannot update their details. This situation is still counted as a successful dial attempt, even though we are unable to update the node's details. This problem is most common with Geth. Other clients seem to be less strict with the peer limit.
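A sketch of how such a result can be classified, using go-ethereum's `p2p.DiscTooManyPeers` disconnect reason. The status strings are my own illustration, not the crawler's:

```go
package main

import (
	"errors"
	"fmt"

	"github.com/ethereum/go-ethereum/p2p"
)

// dialStatus classifies the result of a dial attempt. A "too many peers"
// disconnect still proves the node is reachable, so it counts as a
// successful dial even though its details could not be refreshed.
func dialStatus(err error) string {
	switch {
	case err == nil:
		return "success: details updated"
	case errors.Is(err, p2p.DiscTooManyPeers):
		return "success: reachable, but peer limit reached"
	default:
		return "failure: " + err.Error()
	}
}

func main() {
	fmt.Println(dialStatus(p2p.DiscTooManyPeers))
}
```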
The goal of this page is to give you access to aggregated stats about the various Ethereum networks. It only shows nodes which could still be found on the discovery network in the last 24 hours. The stats are collected every 30 minutes, on the hour and 30 minutes past the hour.
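For illustration, aligning a collector to that half-hour schedule can be done like this (a sketch, not the crawler's actual scheduler):

```go
package main

import (
	"fmt"
	"time"
)

// nextHalfHour returns the next :00 or :30 boundary after now.
func nextHalfHour(now time.Time) time.Time {
	// Truncate rounds down to the previous half hour; step forward one interval.
	return now.Truncate(30 * time.Minute).Add(30 * time.Minute)
}

func main() {
	now := time.Now().UTC()
	next := nextHalfHour(now)
	fmt.Printf("sleeping %s until %s\n", time.Until(next).Round(time.Second), next)
	time.Sleep(time.Until(next))
	// collect stats here...
}
```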
A node which is not synced will be shown as `Unsynced` on this page, and with `Unknown` on other pages.
Each of these filters adds a query parameter to the URL, so you can share/bookmark a specific set of filters. You can also put in values which are not given as options. You can change the `network` parameter to `56` to see stats for the Binance Smart Chain, for example. Chainlist or Chainid have a bunch of networks you can try.
At the moment, only 3 days of stats are shown for each of the filtered networks, and 1 day for the `All` filter. I'm working on a better database for this. I would like to make the date range configurable. All the stats since the beginning of the project exist in the database, so I would like to make them available.
This is where you can filter for a specific set of nodes, or even search to find your node.
There are filters similar to the stats page's, except the network filter only filters on the network ID; the fork ID is not taken into consideration.
The inputs should be pretty simple to understand. These will let you find your node by IP address, node ID, or public key.
These inputs also search the discovery data, so you can find nodes which could not be crawled. If your node is there, the Help page has details on how to add the crawler as a peer to your node, so your node will connect to the crawler and become part of the database.
Geth lets you set an `--identity` flag. The value is added to the client's identifier, which you can also search for, if you have that set.
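For example (the node name here is just an illustration):

```
geth --identity "my-node-name"
```

The identifier will then look something like `Geth/my-node-name/v1.13.8-stable/linux-amd64/go1.21.6`, so searching for `my-node-name` will find the node.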
The `0` to `f` row is just a shortcut to add that character as the start of the node ID/public key filter.
These are the details about the node which I think are most useful to show.
Some fields of note:
Shows the last 10 accepted and dialed connections, along with whether there was an error. I hope this will be very useful to node operators trying to debug peer connection or network issues. I plan to add a GitHub-style year map of dial connection issues for each node. If more history is important to you, you can download the database snapshot and query all of it.
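For example, a downloaded snapshot can be queried from Go with `database/sql`. This assumes the snapshot is a SQLite file, and the table and column names below are illustrative placeholders, not the crawler's actual schema; inspect the snapshot first (for example with `.schema` in the sqlite3 shell):

```go
package main

import (
	"database/sql"
	"fmt"
	"log"

	_ "modernc.org/sqlite" // pure-Go SQLite driver
)

func main() {
	db, err := sql.Open("sqlite", "nodes.db")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// Hypothetical query: dial errors for one node, newest first.
	rows, err := db.Query(
		`SELECT crawled_at, error FROM dial_history
		 WHERE node_id = ? ORDER BY crawled_at DESC`,
		"node-id-here",
	)
	if err != nil {
		log.Fatal(err)
	}
	defer rows.Close()

	for rows.Next() {
		var crawledAt string
		var dialErr sql.NullString
		if err := rows.Scan(&crawledAt, &dialErr); err != nil {
			log.Fatal(err)
		}
		fmt.Println(crawledAt, dialErr.String)
	}
}
```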
Shows the history of dialed and accepted connections for the Node Crawler.
Shows some information to help you add the Node Crawler as a peer to your node, and how to find your node's ID/public key so you can find it on the website.
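If you run Geth, one way to do this (my assumption; the Help page shows the exact enode URL to use) is via the attached console's `admin.addPeer`:

```
admin.addPeer("enode://<crawler-public-key>@<crawler-ip>:<port>")
```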
Database snapshots are taken once a day at midnight UTC, and are available for anyone to download and use in their research.