# EPF Final Update: Conclusion
## The project
The idea started from me wanting to make a tool to check whether my Ethereum node was
accessible from the internet, so I would know that I had configured my router
correctly. I posted my idea to the Discord channel and Mario suggested I work
on the [Node Crawler](https://github.com/ethereum/node-crawler) and extend it
to add the functionality I was looking for. It had been almost a year since the
last commit, and the website was not working because the database would
eventually grow too big and the server would stop working.
I thought this sounded like a great project to work on. I would make something
useful to others, learn how Ethereum nodes communicate, and get the feature I
was looking for.
## Status
I have a website running my current version: [node-crawler.angaz.io](https://node-crawler.angaz.io).
It's been running there since 2023-09-02, and I've been updating it every time
I push any commits.
I think I've achieved what I wanted and then some. The original plan was to
have a page where a user could enter their node's enode, and then the crawler
would try to connect to it, but after gaining some experience with the crawler, I
realized we probably don't need this. Through the discovery process, the crawler
should find almost every node on the network within 6 hours or so, so the node
will eventually be found. I think 6 hours is enough time, but we can still add
this page if this approach turns out to be insufficient.
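For reference, the discovery walk itself looks roughly like this (a minimal
sketch using go-ethereum's discover package, not the crawler's actual code; the
port and bootnode list are just examples):
```go
package main

import (
	"fmt"
	"log"
	"net"

	"github.com/ethereum/go-ethereum/crypto"
	"github.com/ethereum/go-ethereum/p2p/discover"
	"github.com/ethereum/go-ethereum/p2p/enode"
	"github.com/ethereum/go-ethereum/params"
)

func main() {
	key, _ := crypto.GenerateKey()

	// In-memory node DB and local node record for the discovery listener.
	db, _ := enode.OpenDB("")
	ln := enode.NewLocalNode(db, key)

	conn, err := net.ListenUDP("udp", &net.UDPAddr{Port: 30303})
	if err != nil {
		log.Fatal(err)
	}

	// Bootstrap from the mainnet bootnodes and walk the DHT from there.
	var bootnodes []*enode.Node
	for _, url := range params.MainnetBootnodes {
		bootnodes = append(bootnodes, enode.MustParse(url))
	}

	disc, err := discover.ListenV4(conn, ln, discover.Config{
		PrivateKey: key,
		Bootnodes:  bootnodes,
	})
	if err != nil {
		log.Fatal(err)
	}
	defer disc.Close()

	// RandomNodes keeps yielding nodes found by random lookups; over a few
	// hours this covers most of the reachable network.
	iter := disc.RandomNodes()
	for iter.Next() {
		fmt.Println("found node:", iter.Node().URLv4())
	}
}
```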
One big revelation I had while looking at the syslogs of the server where the
node-crawler is running is that there were a LOT of messages being logged about
dropped connections. So I decided the crawler should accept these connections,
because for nodes which cannot accept incoming connections themselves, their
dialing the node-crawler is the only way we can get any information about them.
This turned out to be very important because there are quite a few nodes out
there we would not have information about otherwise. For client diversity,
these nodes still matter.
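Roughly, accepting and identifying an incoming connection looks like this (a
sketch using go-ethereum's rlpx package; the real crawler also completes the
devp2p Hello exchange afterwards to record the client name and capabilities):
```go
package main

import (
	"log"
	"net"

	"github.com/ethereum/go-ethereum/crypto"
	"github.com/ethereum/go-ethereum/p2p/rlpx"
)

func main() {
	key, _ := crypto.GenerateKey()

	ln, err := net.Listen("tcp", ":30303")
	if err != nil {
		log.Fatal(err)
	}

	for {
		conn, err := ln.Accept()
		if err != nil {
			continue
		}

		go func(conn net.Conn) {
			defer conn.Close()

			// We are the listening side, so the dialer's public key is
			// unknown until the RLPx handshake completes.
			rconn := rlpx.NewConn(conn, nil)
			pubkey, err := rconn.Handshake(key)
			if err != nil {
				return
			}

			// The handshake alone already tells us the node's identity
			// (its public key) and that it is alive, even if we could
			// never dial it ourselves. The Hello exchange that follows
			// carries the client name, version, and capabilities.
			log.Printf("accepted connection from node %x", crypto.FromECDSAPub(pubkey))
		}(conn)
	}
}
```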
To let users see whether the crawler can connect to their node, I save a
history of the crawl attempts and show the last 10 on the node's details page.
This way, the user can see when the crawler attempted to connect to their node,
what the error message was, and when the next attempt will be.
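As a sketch, that lookup is just a query over a per-node history table (the
table and column names below are illustrative, not the actual schema):
```go
package crawlerdb

import (
	"database/sql"
	"time"
)

// Attempt is one row of the (illustrative) crawl_history table.
type Attempt struct {
	CrawledAt time.Time
	Direction string // "dial" or "accept"
	Error     string // empty on success, e.g. "i/o timeout" otherwise
}

// lastAttempts returns the most recent crawl attempts for a node.
func lastAttempts(db *sql.DB, nodeID string) ([]Attempt, error) {
	rows, err := db.Query(`
		SELECT crawled_at, direction, error
		FROM crawl_history
		WHERE node_id = ?
		ORDER BY crawled_at DESC
		LIMIT 10`, nodeID)
	if err != nil {
		return nil, err
	}
	defer rows.Close()

	var attempts []Attempt
	for rows.Next() {
		var a Attempt
		if err := rows.Scan(&a.CrawledAt, &a.Direction, &a.Error); err != nil {
			return nil, err
		}
		attempts = append(attempts, a)
	}
	return attempts, rows.Err()
}
```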
One thing I noticed while looking through the database and metrics is that most
dial attempts get an i/o timeout. This means the node is not exposed to the
internet and will only find peers by dialing out; it will never have incoming
peers. This is not great when you restart your node, because
1) your node will mostly be hitting i/o timeouts itself, and
1) the nodes your node had previously found can't reconnect.

Node Updates - Shows the rate of dialed and accepted connections, with success
or error.

Update Errors - Shows the rate of errors occurring from dialed and accepted
connections, with the reason. You can see that i/o timeout is by far the most
common type of error.
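The error "reason" in that graph boils down to classifying the dial error; in
Go, a timeout can be detected along these lines (a simplified sketch, not the
crawler's exact code):
```go
package main

import (
	"errors"
	"fmt"
	"net"
	"time"
)

// classifyDialError turns a dial error into a short reason string,
// similar to the labels on the Update Errors graph.
func classifyDialError(err error) string {
	if err == nil {
		return "success"
	}

	var netErr net.Error
	if errors.As(err, &netErr) && netErr.Timeout() {
		// The node never answered: almost always a node that is not
		// reachable from the internet.
		return "i/o timeout"
	}
	return err.Error()
}

func main() {
	// 192.0.2.1 is a documentation address, so this dial will time out.
	_, err := net.DialTimeout("tcp", "192.0.2.1:30303", 5*time.Second)
	fmt.Println(classifyDialError(err))
}
```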

The [Crawl History page](https://node-crawler.angaz.io/history/) showing recent
dial attempts and accepted connections.

The [Node's details page](https://node-crawler.angaz.io/nodes/6823ae9dc83e5f11a10ad710e81ccdff609a8eac9b08a307e1b758b4f0acdb02)
showing the details of the node, a map of where the node is located based on
data from Maxmind's [GeoLite2](https://dev.maxmind.com/geoip/geolite2-free-geolocation-data)
database, and a list of the last 10 accepted connections and dial attempts.
We can see, in this case, that the node is not correctly exposed to the
internet because all the dial attempts fail with an i/o timeout. The only
reason we have any details about this node is that we accepted an incoming
connection.
## Future of the project
There are so many possibilities going forward, but I think the most important
is the data. I would like to find a way to automate making a snapshot of the
database and uploading it to IPFS or something similar, so others can look at
the data and use it in their own research. I think the openness of this data is
important to it being a trustworthy source of information.
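Automating that could be a small job along these lines (a sketch that assumes
the database is SQLite and that the `ipfs` CLI is available; nothing like this
exists in the crawler yet):
```go
package main

import (
	"database/sql"
	"fmt"
	"log"
	"os/exec"
	"time"

	_ "github.com/mattn/go-sqlite3"
)

func main() {
	db, err := sql.Open("sqlite3", "crawler.db")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// VACUUM INTO writes a consistent copy of the database to a new file.
	snapshot := fmt.Sprintf("snapshot-%s.db", time.Now().Format("2006-01-02"))
	if _, err := db.Exec(fmt.Sprintf("VACUUM INTO '%s'", snapshot)); err != nil {
		log.Fatal(err)
	}

	// Add the snapshot to IPFS; -Q prints only the final CID, which can
	// then be published somewhere for others to fetch.
	out, err := exec.Command("ipfs", "add", "-Q", snapshot).Output()
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("snapshot CID: %s", out)
}
```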
I would like to improve the stats page with historical information, meaning we
would need to save the stats already being collected for the stats page into a
time-series table, so we can make graphs from them. It's not complicated; I
just haven't done it yet.
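The table itself would be simple; something like this would already be enough
(an illustrative schema, not the actual design):
```go
package crawlerdb

import "database/sql"

// createStatsHistory creates an illustrative time-series table for the stats
// already collected for the stats page. Table and column names are placeholders.
func createStatsHistory(db *sql.DB) error {
	_, err := db.Exec(`
		CREATE TABLE IF NOT EXISTS stats_history (
			timestamp   DATETIME NOT NULL, -- when this stats snapshot was taken
			client_name TEXT     NOT NULL,
			country     TEXT     NOT NULL,
			node_count  INTEGER  NOT NULL,
			PRIMARY KEY (timestamp, client_name, country)
		)`)
	return err
}
```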
The stats page can also be improved with more stats. At the moment, we have
countries, client types, and OS/architecture, which come from the client's
name string. I think it would be nice if we could have links on each client
type, which would give you a list of the versions of that client, and the same
with each country linking to a list of its cities. This could also be done as
filters.
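Since the client's name is just a `/`-separated identification string, most of
these breakdowns come down to parsing it. A simplified sketch of what that
parsing could look like (real client strings vary a lot, so the actual parser
has to handle many more formats):
```go
package main

import (
	"fmt"
	"strings"
)

// ClientInfo holds the fields we can usually pull out of a node's
// client identification string.
type ClientInfo struct {
	Name    string
	Version string
	OS      string
	Arch    string
}

// parseClientName handles the common "Name/Version/os-arch/language" layout,
// e.g. "Geth/v1.13.4-stable/linux-amd64/go1.21.3". This is only the happy path.
func parseClientName(s string) ClientInfo {
	parts := strings.Split(s, "/")

	info := ClientInfo{Name: parts[0]}
	if len(parts) > 1 {
		info.Version = parts[1]
	}
	if len(parts) > 2 {
		if osArch := strings.SplitN(parts[2], "-", 2); len(osArch) == 2 {
			info.OS, info.Arch = osArch[0], osArch[1]
		}
	}
	return info
}

func main() {
	fmt.Printf("%+v\n", parseClientName("Geth/v1.13.4-stable/linux-amd64/go1.21.3"))
}
```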
I would like to add a filter for when the next fork is announced, so we can see
how many nodes are ready for the fork. Cancun should start getting dates
announced soon, so when that happens, we will update the Geth dependency and
add the filter. The back-end is ready; there's just nothing on the front-end yet.
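Conceptually, the readiness check is simple: each node advertises a fork ID
with a "next" field, and a node counts as ready once that field matches the
announced activation point. A sketch of the idea only (the real check would
use go-ethereum's forkid package against the updated chain config):
```go
package crawler

// readyForFork reports whether a node looks ready for an announced fork.
// nextFork is the "next" field from the node's advertised fork ID, and
// activation is the announced activation block number or timestamp.
// This is a conceptual sketch, not the validation Geth's forkid package
// actually performs.
func readyForFork(nextFork, activation uint64) bool {
	return nextFork == activation
}
```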
I started working on a page to give users information about how to get their
node to connect to the crawler if they are seeing only i/o timeouts or
"too many peers", so they can get their node's info on the site even if the
crawler can't connect. This would be great for the stats, so we have as
accurate a set of data as possible.
I'm not going to claim that I have a beautiful front-end. So it would be nice
if we could get someone to design something which looks nicer.
The current state of the code is in a [branch](https://github.com/angaz/node-crawler/tree/split_disc_crawler)
on my own fork of the repo. The code is still a bit of a mess, using random
functions from the previous implementation. There are old flags which are not
used anymore, and lots of dead code, so I would need to spend some time
cleaning things up before it's ready for a PR.
A JSON API for other tools to use would also be pretty nice. For example, an
integration with the [Rocketpool CLI](https://github.com/rocket-pool/smartnode/tree/master/rocketpool-cli)
could make it very easy for users to find their node on the site, and/or give
information to the user in the CLI.
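As a sketch, a first version of such an API could be a small read-only endpoint
next to the existing pages (the path and response fields here are made up for
illustration):
```go
package main

import (
	"encoding/json"
	"log"
	"net/http"
	"strings"
)

// nodeResponse is an illustrative response shape for a node-details endpoint.
type nodeResponse struct {
	NodeID     string `json:"node_id"`
	ClientName string `json:"client_name"`
	Country    string `json:"country"`
	LastSeen   string `json:"last_seen"`
}

func main() {
	// Hypothetical endpoint: GET /api/v1/nodes/{nodeID}
	http.HandleFunc("/api/v1/nodes/", func(w http.ResponseWriter, r *http.Request) {
		nodeID := strings.TrimPrefix(r.URL.Path, "/api/v1/nodes/")

		// In the real thing, this would be looked up in the crawler's database.
		resp := nodeResponse{
			NodeID:     nodeID,
			ClientName: "Geth/v1.13.4-stable/linux-amd64/go1.21.3",
			Country:    "Norway",
			LastSeen:   "2023-10-20T12:00:00Z",
		}

		w.Header().Set("Content-Type", "application/json")
		json.NewEncoder(w).Encode(resp)
	})

	log.Fatal(http.ListenAndServe(":8080", nil))
}
```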
In order of importance, I would say:
1) Improve the node name parser so we can parse the versions.
1) Save the stats, so we can see the history.
1) Fork readiness filter, and this would be saved to the database as well, so
we can track the number of nodes over time.
1) More info on the help page.
1) JSON API.
1) Beautiful front-end. Sure, it's not the best to look at right now, but it's
got everything you need.
## Self-evaluation
I was contributing as a permissionless participant. I still had a full-time job
to do, so my work on the project was pretty much limited to my free time.
At the start, I wasn't super motivated, but once I got going and really
started to understand the code and how Ethereum nodes communicate, I got a lot
of inspiration from all the ideas I was having, and this really motivated me to
work on it almost every day. It was a really fun project to work on, and I'm
really glad I got past that unmotivated stage. I think I've made something
truly awesome!
I hope to get more involved with Ethereum's development. At the very least, I
have to keep up with new things which can be added to the crawler. Hopefully we
will get more contributors to it, but I would like to keep it updated so it's
always relevant and useful.
Unfortunately, the weekly meetings often collided with weekly meetings at my
web2 job, but there were times when I could join because those meetings were
canceled, and otherwise I would watch the recordings.
## Feedback on the EPF
The EPF is a really cool idea. Thanks Mario for getting me to apply for it, and
for coming up with the idea to work on this project. It was really fun, and I
would even say it counteracted burnout from my day job.
Getting to hear from the core devs on the office hours meetings was really
great. I mostly watched the recordings, but I was there for one or two of them.
It's really inspiring to see the people who work on Ethereum and see they are
just normal people.
The only unfortunate thing I could say is that there were sometimes issues with
the video chat software. But I agree that we should be using FOSS software
wherever possible, and not force people to use proprietary software. For me,
it worked 99% of the time, with only one or two issues. I would say I've seen
more problems with MS Teams :P
I give a big thank you to Josh, Mario, and the Ethereum Foundation for making
this possible. It really is amazing and I hope it continues into the future.
I didn't really like the dev updates at the start, but after the second one,
I really started to like them. Putting your thoughts into words really makes
you think about them more, and I even had some ideas while writing the updates.
I think this might be something I continue to do with my other projects. Blog
posts from others have really helped me to learn, so maybe it's a good thing
for me to give back what I have learned too.