[**Devan Mitchem & Christoph Schlegel**: Dark Forest, Public Cloud: Building for speed and scale - Flashbots x Google Cloud Web3.](https://www.youtube.com/live/tBZj91oyNnA?si=Gq4KxLfTqgZbQgn6)
---
**Summary:**
This talk explores how the physical infrastructure of the internet (fiber cables, ISPs, undersea routes, and data centers) shapes the way data moves into and across the Ethereum network. It explains how latency, routing decisions, and the geographic placement of nodes and relays affect everything from transaction propagation to block building.
Using empirical data from Google Cloud and Flashbots, the presentation shows how physical location and network topology lead to real differences in speed, access, and profitability within Ethereum. The talk highlights how infrastructure centralization, especially around U.S. and European relays, creates structural advantages for builders and validators in those regions. It ends by challenging the audience to rethink what Ethereum’s network map should look like and how to build a more geographically inclusive and resilient system.
---
**Transcript:**
**Devan Mitchem:**
We're going to dive into the dark forest and public cloud and we have three parts in this session. After this talk, you’ll come out as an expert in physical networking and everything you need to know about Ethereum networking.
So the first part will cover the physical flow of bits across the Internet.
In part two, we'll talk about how those bits go from various internet locations into the Ethereum supply chain.
And then third, we have a couple of call-outs for areas for further research and some recommendations. And then for those that are into data science or open source components, we have a [data notebook](https://colab.research.google.com/drive/13k3fb_pdbjJuEoiPQKWSPLnOVGK0Lqpk?usp=sharing) that we created that you can use to look into the numbers behind this, and hack and remix some of the data that we came up with.
So let's talk about the physical layers. This is how the internet is basically built. You have the local “last-mile” on the left. These are typically aerial fiber cables, copper lines, high voltage power lines, and Internet Exchanges that are in most cities.
And then on the right, you see how these bits travel from various cities into the ocean, across continents. And if you zoom out, here's a map on the left of the submarine cable systems that span the globe. You can see these are all physical cables dropped into the ocean that follow certain paths, navigable paths.
And on the right, these are the virtual networks built on top of the physical networks. And in today's talk, we're going to talk about how latency on the right and geography on the left are actually not one and the same. And there are some important nuances you should understand as you're building apps, as you're designing mechanisms, or as you're building and solving for users.
So when it comes to geography, internet paths might not be the best physical path. One key point to understand is that ISPs are essentially very large local area networks. And you have to understand the relationships that ISPs might have between one another. Some ISPs might be smaller regional ISPs; other ISPs might be transcontinental, with deep networks in major tier-one, tier-two, and tier-three cities. And it's important to understand the business relationships between ISPs because that often determines the path. If you're interested in this topic further, you can look up autonomous systems and BGP routing.
But what's most important here is that the path your packet takes really depends on this connectivity, which might not be fully understood. Now, what does it mean for Web3? So on the right, you often see a case called “tromboning”, where packets you expect to go one direction in fact go another. And we have some data here to share in terms of what that looks like. One other thing to note, for folks that might trade on centralized venues and also trade on decentralized venues, is that location matters, and Christoph will get into some of the details.
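One way to see the actual path your packets take, and whether they trombone, is to trace the route yourself. A minimal sketch in Python, assuming the standard `traceroute` utility is installed on the machine:

```python
import subprocess

def trace_hops(host: str) -> list[str]:
    """Run the system traceroute and return one line per hop.

    Assumes the `traceroute` utility is installed (use `tracert` on Windows).
    """
    out = subprocess.run(
        ["traceroute", "-n", host],  # -n: numeric output, skip reverse DNS
        capture_output=True, text=True, check=True,
    )
    return out.stdout.splitlines()[1:]  # drop the header line

# Compare the hops against where you expect traffic to go: a "tromboning"
# path shows intermediate hops in a distant region even though source and
# destination are geographically close.
for hop in trace_hops("cloud.google.com"):
    print(hop)
```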
One example where we at Google Web3 have experience with both the Web2 side of internet routing and the Web3 side is that we actually operate a faucet. So you can go to this page and get a drip for Ethereum testnets and others.
Generally, the way this works is when you scan this and you get a drip from Google into your wallet, a couple of things happen. In short, your request goes over some radio towers that go into some backhaul metro fiber. It'll bounce around your ISP, and depending on your ISP relationships, bounce to other ISPs. Then it goes under the ocean, or it goes over the mountain ranges, and eventually lands on Google Cloud.
And then from Google Cloud, we dump that into a testnet from one of the nodes we might have running on an Ethereum testnet. It lands in the testnet mempool, goes through testnet validation, then it gets gossiped out, and then all the way back to whatever node your wallet is reading from to see that the actual drip came in. So there's a lot happening.
All the raw data is available on the top right. This is some data that we took out from this thing called the perf kit that Google developed, where we have VMs in every single Google data center constantly running pings between different locations.
And we built this map from Tokyo to understand how long it takes to get to different regions. And there's an interesting trend here: when it comes to connectivity, Tokyo to Virginia is much faster than Tokyo to Europe.
That's interesting, and it matters for your design; it's important to know when you make that decision. If you dig into the data deeper, it turns out that from Tokyo it's actually faster to get to the US West Coast and US East Coast than it is to get to Europe, to Frankfurt and Amsterdam. Now, why is that? You have to follow the tubes. It all depends, again, on those maps that I showed you before and on the routing.
So putting it all together: on the left are the tubes, which is why your connectivity matters, and then how your traffic lands in the RPC and the mempool. It's all related here.
We have further data from the talk if anyone's interested from a data science perspective. A lot of this data is in Google's BigQuery public datasets. We have a [notebook](https://colab.research.google.com/drive/13k3fb_pdbjJuEoiPQKWSPLnOVGK0Lqpk?usp=sharing) available. We also have some feedback and some ideas to share below.
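As a rough illustration of how you might pull these inter-region round-trip times out of BigQuery, here is a minimal sketch; the project, dataset, and column names are placeholders, so substitute the ones used in the linked notebook.

```python
from google.cloud import bigquery

client = bigquery.Client()

# Hypothetical table of ping results with sending_region, receiving_region,
# and average_rtt_ms columns; replace with the dataset from the notebook.
query = """
SELECT receiving_region, AVG(average_rtt_ms) AS rtt_ms
FROM `my-project.network_latency.region_pings`
WHERE sending_region = 'asia-northeast1'          -- Tokyo
  AND receiving_region IN ('us-east4', 'us-west1',
                           'europe-west3', 'europe-west4')
GROUP BY receiving_region
ORDER BY rtt_ms
"""

for row in client.query(query).result():
    print(f"Tokyo -> {row.receiving_region}: {row.rtt_ms:.1f} ms")
```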
And with that, I'll hand it over.
**Christoph Schlegel:**
So [mempooldumpster](https://mempool-dumpster.flashbots.net/index.html) is a quite comprehensive dataset of public transactions in Ethereum. You can now also query it on BigQuery instead of Dune and other services, and work with it directly inside Google products. I think it's a nice avenue for research to think about the connection between transaction data and latency data. Most of this is due to research from others at Flashbots; shout out to [Data Always](https://x.com/data_always), [Burak Öz](https://x.com/boez95), and others.
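As a sketch of what querying it might look like, assuming the BigQuery table and column names below (they are placeholders; check the Mempool Dumpster docs for the exact project and schema):

```python
from google.cloud import bigquery

client = bigquery.Client()

# Placeholder table and column names modelled on the Mempool Dumpster
# transaction files; adjust to the published schema before running.
query = """
SELECT hash, timestamp, included_block_timestamp,
       TIMESTAMP_DIFF(included_block_timestamp, timestamp, MILLISECOND)
         AS seen_to_inclusion_ms
FROM `flashbots.mempool_dumpster.transactions`
WHERE DATE(timestamp) = '2024-06-01'
  AND included_block_timestamp IS NOT NULL
"""

df = client.query(query).to_dataframe()
print(df["seen_to_inclusion_ms"].describe())
```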
And that should give you some idea of how latency matters in many different ways and how geography matters in many different ways in Ethereum.
So this is a chart that probably many of you are familiar with: the rise of private order flow. If you don't have access to a certain kind of order flow, which is sort of exclusive to block builders, then you cannot build very profitable blocks.
It's a vector of centralization that is quite visible if you compare that to people who build their blocks locally from public data. So if you have access to this kind of private order flow, you can pack your blocks with much more valuable transactions and you can generate more profit from that.
And there's a clear trend in the data, and all of that has to do, to a certain degree, with connectivity. Another touch point where latency matters is how fast your transactions are included in the blockchain. And that depends on which node sees them and which nodes distribute them. For example, bloXroute has a distribution network with a relay and proxies, where they quite quickly distribute transactions that they see over the network.
So if you distribute a transaction over bloXroute, it will get, on average and also in the long tail, much faster into the P2P network than if it was gossiped from home. So there are differences between nodes; nodes are different. We treat them as the same from the point of view of the protocol, but not all nodes are the same.
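One way to quantify that difference is to compare when each source first saw the same transaction. A rough pandas sketch, assuming a Mempool-Dumpster-style source log with `hash`, `source`, and `timestamp_ms` columns (the file name and column names are assumptions):

```python
import pandas as pd

# Hypothetical source log: one row per (transaction, source) observation.
sourcelog = pd.read_parquet("sourcelog.parquet")  # hash, source, timestamp_ms

# First time each source saw each transaction, as a hash x source matrix.
first_seen = (
    sourcelog.groupby(["hash", "source"])["timestamp_ms"].min().unstack()
)

# Negative values mean bloXroute saw the transaction earlier than the
# locally gossiped view; the distribution also shows the long tail.
delta_ms = first_seen["bloxroute"] - first_seen["local"]
print(delta_ms.describe())
```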
This is data from the [Robust Incentives Group](https://efdn.notion.site/Robust-Incentives-Group-RIG-Homepage-802339956f2745a5964d8461c5ccef02). It gives you an idea of where Ethereum validator nodes are geographically located. I don't know whether that is broadly distributed over the world. Certainly a majority of these nodes are in North America and Europe, and nodes are definitely lacking in places like Africa, South America, and India if you compare population density with node distribution. This has to do with infrastructure and latency.
If you look at the layer where blocks are built, I don't know whether you know this, but relays are basically in two places: the U.S. East Coast and Europe. These are the places where blocks are actually produced, because builders tend to co-locate with these relays.
So everybody sends transactions to these two locations. Why these two locations? This is related to the entire infrastructure: these locations are connected through the transatlantic fiber corridor, a very fast connection between Europe and North America.
And you would not have that if you ran your relay in Nigeria or in Buenos Aires; it would take longer. So there's a correlation between physical infrastructure and where Ethereum's infrastructure is located. It is dependent on history. History influences infrastructure, and that shapes Ethereum, even though we may have wanted to build something different.
Another way latency enters, which is again beautifully illustrated by [Data Always](https://x.com/data_always), is through the bidding data on the MEV-Boost relays. MEV-Boost is the market in which blocks are proposed to Ethereum; builders bid on who produces the block, and the highest-value bid wins. If you compare these curves over time, bids are updated more and more frequently.
What does it mean? It must mean that people have the infrastructure to update their bids more frequently. So they must sit close to the auctioneer, the relay. This is a vector of centralization, and I don't just claim that; it's visible in the data. Until January of this year, the market was basically a duopoly of two big builders co-located with these relays. So those are facts that are shaped, in some sense, by infrastructure, even if we maybe wanted to build something different.
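A simple way to reproduce that kind of analysis is to count bid submissions per builder per slot from the relays' bid traces. A sketch, assuming a parquet export of relay bids with `slot`, `builder_pubkey`, and `timestamp_ms` columns (the file name and schema are assumptions):

```python
import pandas as pd

bids = pd.read_parquet("relay_bids.parquet")  # slot, builder_pubkey, timestamp_ms

# Bids per builder per slot: each row is one submitted (i.e. updated) bid,
# so the count is a rough proxy for update frequency.
updates_per_slot = bids.groupby(["slot", "builder_pubkey"]).size()

# Gaps between consecutive bids from the same builder within a slot.
gaps_ms = (
    bids.sort_values("timestamp_ms")
        .groupby(["slot", "builder_pubkey"])["timestamp_ms"]
        .diff()
        .dropna()
)
print(updates_per_slot.describe())
print("median inter-bid gap (ms):", gaps_ms.median())
```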
So that is data on bidding in MEV-Boost and on the distribution of relays. Another fun thing that you can do, which our intern Sen Yang at Flashbots is currently pursuing, is to take the latency data that Google Cloud provides and run simulations of how certain kinds of games play out in Ethereum's block production infrastructure.
For example, timing games, where some people attempt to delay the propagation of blocks to build more valuable ones. You can wonder how that extra value shapes the location of nodes in the network. And they built a nice simulation tool where you can feed in the latency data, and you see the convergence and clustering of nodes, because there is a benefit to being close to certain kinds of nodes. There's a benefit to being close to the relay.
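To give a flavour of that kind of simulation (this is a toy model, not Sen Yang's tool): nodes repeatedly move to whichever region minimizes their latency to the relay plus their average latency to the other nodes, and they end up clustering near the relay. The latency numbers below are made up; in practice you would feed in the Google Cloud inter-region data.

```python
import random

regions = ["us-east", "europe-west", "asia-ne", "sa-east"]
rtt = {  # illustrative one-way latencies in ms, not real measurements
    ("us-east", "europe-west"): 40, ("us-east", "asia-ne"): 75,
    ("us-east", "sa-east"): 60, ("europe-west", "asia-ne"): 110,
    ("europe-west", "sa-east"): 100, ("asia-ne", "sa-east"): 130,
}

def lat(a: str, b: str) -> float:
    return 0 if a == b else rtt.get((a, b), rtt.get((b, a)))

relay = "us-east"
nodes = [random.choice(regions) for _ in range(50)]

for _ in range(20):  # best-response dynamics until the layout stabilizes
    for i in range(len(nodes)):
        peers = nodes[:i] + nodes[i + 1:]
        def cost(r):
            return lat(r, relay) + sum(lat(r, p) for p in peers) / len(peers)
        nodes[i] = min(regions, key=cost)

print({r: nodes.count(r) for r in regions})  # most nodes cluster near the relay
```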
That's one way you can use this dataset. There are other questions you can ask; I'll just give you a few, and it's not comprehensive. We have a new [Research Database](https://flashbots.notion.site/21f6b4a0d87680a2b08dca1eda93ff6f?v=21f6b4a0d87681ddb959000c44242e52) at Flashbots with some of the problems that we are interested in, and we are looking for people to collaborate with us on these questions. Some of the high-level questions I'm interested in, and what I find interesting to hear about in research that other people are doing, concern this entire influence of location and connectivity on what different people do.
So what does it mean, for example, for users? You cannot be a high-frequency searcher if you are located in Cannes. It will not work for you; you have to be co-located with a relay. But it also matters for other kinds of use cases. Your user experience is shaped by where you are located in the world, by your connectivity.
Your validator experience varies across different regions of the world: how easy it is to run a validator, how much profit you make from running one, uptime, attestations, and these kinds of things. We could wonder what happens if you change this system. There’s discourse around reducing slot time. And there are benefits to reducing slot times, but it affects people asymmetrically; it doesn't affect everybody in the same way.
Ethereum is not an island; it interacts with L2s, which are also usually located in Europe or North America. And we have CEXes, usually located in Europe or, more of them, in Asia, with Binance in Tokyo and so on.
You can wonder how this infrastructure interacts with Ethereum, why the network structure looks the way it does, and how latency impacts what people are doing.
You can also do more practical research. How hard, or how easy, is it to run critical Ethereum infrastructure in different places in the world? We have very little knowledge about this. And finally, the bigger question: how can we make location matter less? There are probably more questions, and you can answer them with a mix of data and good ideas. I want to encourage you to look into that.
Perhaps the question zero you should ask before any other question is actually: What do we want? So here I’ve redrawn the previous validator and relay map, and just for fun I added a relay in Buenos Aires. I removed 40% of the nodes in North America and Europe; you might not have noticed, as there are so many. You can remove 40% without changing this picture.
I added a bunch of nodes in Africa and so on. I don't know whether that's a map that we want to have, but we should ask the question: What kind of map do we want to have?
It’s shaped by infrastructure, as we have now learned. But history is not everything. There are also memes and ideas that we want to follow. So we might want to have an idea of what this map should look like. We should have a vision of what this map should look like.
And if we can build and hack it with the current infrastructure, with Google or whatever, with internet service providers, good. But if we cannot do that, then we should find a different way, because we don't want to replicate the same kind of map that we have.
If you found this topic interesting, definitely reach out. We have a [notion page](https://flashbots.notion.site/21f6b4a0d87680a2b08dca1eda93ff6f?v=21f6b4a0d87681ddb959000c44242e52) and a [data notebook](https://colab.research.google.com/drive/13k3fb_pdbjJuEoiPQKWSPLnOVGK0Lqpk?usp=sharing) where you can hack on the data.