
Quick notes on Blocknative vs Mempool Dumpster data initiatives

by Thomas Thiery, October 19th, 2023

Thanks to Barnabé, mike and Chris for feedback and comments.

Blocknative (BN) Mempool Data Program and Flashbots Mempool Dumpster (MD) are two recent initiatives aimed at open-sourcing Ethereum mempool data. In this short report, we analyze and compare both datasets to highlight their key differences and foster data initiatives within the community.

We utilize one day of mempool data (September 21, 2023), from both the BN and MD datasets, to build upon Chris' MD analysis.

Blocknative Mempool data

BN has been archiving historical mempool data since November 2019. It collects and provides data from three regions (North America, Europe, and Asia), and its schema includes 27 fields listed here, covering all transaction information except signature data. When analyzing the data, we can use the status field to determine whether a transaction was pending in the mempool, rejected, evicted, canceled, sped-up, confirmed, or failed. Note that both the confirmed and failed tags indicate that a transaction landed onchain. Figure 1A shows the count of unique transactions for each status tag. On September 21st, the BN dataset included 1,422,827 unique transactions: 72.3% of them successfully landed onchain (see Table 1, N_included = 1,029,508), and 7.3% of the transactions that landed onchain were not seen in the mempool, thus representing Exclusive Orderflow (EOF; N_EOF = 112,419). Transactions confirmed onchain but never seen in the mempool can be identified by looking for confirmed transactions with the timepending field set to 0.

The BN dataset also provides fields specifically created to give more detail about a transaction's status (see Figure 1B), such as failurereason (e.g., Reverted: UniswapV2Router: INSUFFICIENT_OUTPUT_AMOUNT), dropreason (e.g., replaced-txs, low-nonce) and rejectionreason (e.g., exceeds block gas limit).

Figure 1. A. Table of unique transaction counts per status tag on September 21st, 2023. B. Distribution of rejection and failure reasons, according to the rejectionreason and failurereason fields of the BN dataset.
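
As a rough illustration of the counting described above, the Python sketch below tallies unique transactions per status tag, treats confirmed and failed as landed onchain, and flags EOF as confirmed transactions whose timepending is 0. It assumes a simplified row shape (a dict with hash, status and timepending keys, where hash is a hypothetical name for the transaction hash field) rather than BN's actual 27-field schema, and the toy rows are not real data. The EOF share is computed relative to transactions that landed onchain.

```python
from collections import defaultdict

# Status tags that indicate a transaction landed onchain.
ONCHAIN_STATUSES = ("confirmed", "failed")


def summarize_bn(rows):
    """Summarize BN-style mempool rows for one day.

    Assumed row shape: {"hash": str, "status": str, "timepending": int (ms)}.
    Several rows may share a hash (e.g. one per region or status update),
    so every count is computed over unique hashes.
    """
    hashes_by_status = defaultdict(set)
    max_pending_confirmed = {}  # hash -> largest timepending over its confirmed rows
    for row in rows:
        tx, status = row["hash"], row["status"]
        hashes_by_status[status].add(tx)
        if status == "confirmed":
            previous = max_pending_confirmed.get(tx, 0)
            max_pending_confirmed[tx] = max(previous, row["timepending"])

    unique = set().union(*hashes_by_status.values())
    included = set().union(*(hashes_by_status.get(s, set()) for s in ONCHAIN_STATUSES))
    # Assumption: a confirmed tx with timepending == 0 in every confirmed row
    # was never seen pending in the mempool, i.e. exclusive orderflow (EOF).
    eof = {tx for tx, pending in max_pending_confirmed.items() if pending == 0}

    return {
        "per_status": {s: len(h) for s, h in hashes_by_status.items()},
        "unique": len(unique),
        "included": len(included),
        "eof": len(eof),
        "included_share": len(included) / len(unique) if unique else 0.0,
        "eof_share": len(eof) / len(included) if included else 0.0,
    }


# Toy rows, not real data.
example_rows = [
    {"hash": "0xa", "status": "pending", "timepending": 0},
    {"hash": "0xa", "status": "confirmed", "timepending": 8000},
    {"hash": "0xb", "status": "confirmed", "timepending": 0},  # never seen pending
    {"hash": "0xc", "status": "failed", "timepending": 9000},
    {"hash": "0xd", "status": "evicted", "timepending": 0},
]
summary = summarize_bn(example_rows)
print(summary["unique"], summary["included"], summary["eof"])  # 4 3 1
```

Taking the maximum timepending across a transaction's confirmed rows is a conservative choice: if any region reports the transaction as having been pending, it is not counted as EOF.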

We then evaluated the inclusion time of mempool transactions using the timepending and blockspending fields of the BN dataset (Figure 2). On September 21st, successful transactions landing onchain had a 73% chance of being included in the next block (Figure 2, right panel), with a median inclusion time of 8,636 ms (left panel). Failed transactions, on the other hand, had a 68% chance of being included in the next block, with a median inclusion time of 9,277 ms.

Figure 2. Cumulative distribution functions (CDFs) of transaction inclusion time. The left panel shows the CDFs of time pending in the mempool for confirmed and failed transactions, while the right panel shows the CDFs of the number of blocks waited before inclusion. In each panel, solid lines represent the empirical cumulative distribution, and vertical dashed lines denote the median values of the distributions.
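
The two quantities read off Figure 2 (median time pending, and the share of transactions included in the next block) can be sketched with an empirical CDF. This is a minimal sketch assuming one deduplicated row per transaction with status, timepending and blockspending keys; the next_block threshold is a hypothetical parameter, because whether blockspending counts the next block as 1 or 0 is an assumption here. Transactions with timepending equal to 0 (never seen pending) are excluded so that only mempool transactions are measured.

```python
import statistics


def ecdf(values, x):
    """Empirical CDF: the fraction of values that are <= x."""
    return sum(v <= x for v in values) / len(values)


def inclusion_stats(rows, next_block=1):
    """Median time pending and next-block inclusion share per outcome.

    Assumed row shape (one row per transaction): {"status": str,
    "timepending": int (ms), "blockspending": int}. Assumption: a
    blockspending value <= next_block means "included in the next block";
    change next_block if the field turns out to be zero-based.
    """
    stats = {}
    for status in ("confirmed", "failed"):
        # Keep only transactions actually seen pending (timepending > 0).
        group = [r for r in rows if r["status"] == status and r["timepending"] > 0]
        if not group:
            continue
        stats[status] = {
            "median_ms": statistics.median(r["timepending"] for r in group),
            "next_block_share": ecdf([r["blockspending"] for r in group], next_block),
        }
    return stats


# Toy rows, not real data.
example_rows = [
    {"status": "confirmed", "timepending": 6_000, "blockspending": 1},
    {"status": "confirmed", "timepending": 9_000, "blockspending": 1},
    {"status": "confirmed", "timepending": 20_000, "blockspending": 2},
    {"status": "confirmed", "timepending": 0, "blockspending": 0},  # EOF, skipped
    {"status": "failed", "timepending": 10_000, "blockspending": 1},
    {"status": "failed", "timepending": 30_000, "blockspending": 3},
]
stats = inclusion_stats(example_rows)
print(stats["confirmed"]["median_ms"], stats["failed"]["median_ms"])  # 9000 20000.0
```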

Flashbots Mempool Dumpster

The Mempool Dumpster (MD) initiative was launched by Flashbots on September 4th, 2023. Today, it encompasses about two months' worth of data starting on September 8th, from generic EL nodes (e.g., go-ethereum, Infura), Alchemy, bloXroute, Chainbound and Eden. The dataset's schema is composed of 18 fields, including signature data, and on September 21st it contained 1,307,926 unique transactions (see breakdown in Figure 3). Of these unique transactions, 78% were included onchain (N_included = 1,020,078), which can be determined from the included_at_block_height, included_block_timestamp_ms and inclusion_delay_ms fields (a transaction was included onchain when included_at_block_height and included_block_timestamp_ms are greater than 0; inclusion_delay_ms, by contrast, can be negative, as discussed below). Having multiple sources also makes it possible to compare metrics across sources.

Source          Unique transactions
bloXroute       1,190,387
MempoolGuru     1,175,641
apool           1,171,871
Infura          1,167,135
Chainbound      1,072,194
Eden            1,065,510
local           1,028,072

Figure 3. Breakdown of unique transaction counts across sources in the MD dataset on September 21st, 2023.
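
A minimal sketch of the inclusion filter and the per-source breakdown in Figure 3, assuming one record per unique transaction with a list-valued sources field (a hypothetical shape chosen for illustration) and using included_at_block_height > 0 as the inclusion test. The source names and block heights in the toy records are illustrative only.

```python
from collections import Counter


def summarize_md(records):
    """Summarize MD-style records (assumed: one record per unique transaction).

    Assumed keys: "hash", "sources" (names of the sources that saw the tx) and
    "included_at_block_height" (0 or None when the tx was not included).
    """
    unique = {r["hash"] for r in records}
    included = {r["hash"] for r in records if (r.get("included_at_block_height") or 0) > 0}
    per_source = Counter()
    for r in records:
        per_source.update(set(r["sources"]))  # set() avoids double-counting a source
    return {
        "unique": len(unique),
        "included": len(included),
        "included_share": len(included) / len(unique) if unique else 0.0,
        "per_source": dict(per_source.most_common()),
    }


# Toy records; the source names are illustrative only.
example_records = [
    {"hash": "0x1", "sources": ["bloxroute", "local"], "included_at_block_height": 18190000},
    {"hash": "0x2", "sources": ["bloxroute"], "included_at_block_height": 0},
    {"hash": "0x3", "sources": ["eden", "bloxroute"], "included_at_block_height": 18190001},
    {"hash": "0x4", "sources": ["local"], "included_at_block_height": None},
]
md_summary = summarize_md(example_records)
print(md_summary["included"], md_summary["per_source"])
```

Filtering on the block height rather than on inclusion_delay_ms avoids misclassifying included transactions whose delay happens to be zero or negative.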

Having multiple sources also makes it possible to identify transactions that were seen exclusively by some entities. However, the MD dataset doesn't include an exhaustive list of all transactions that landed onchain (the equivalent of confirmed transactions in the BN dataset), which makes it harder to estimate EOF accurately (here, EOF refers to transactions that landed onchain without being seen in the mempool, not to transactions exclusive to a particular source). To estimate EOF, we used the Dune API to retrieve the exhaustive list of all transactions that landed onchain, and identified those that were not seen in the mempool by any of the MD sources. We estimate that 8.7% (N_EOF = 113,521) of all transactions landing onchain were not seen in the mempool by any source.
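
The EOF estimate described above boils down to a set difference: take every transaction that landed onchain and remove any hash seen by at least one MD source. The sketch below assumes the onchain hashes have already been fetched (the note used the Dune API; no query is shown here) and, separately, counts transactions seen by a single source, which is the distinct notion of exclusivity that the text sets apart from EOF. Input shapes and toy values are assumptions for illustration.

```python
def estimate_eof(onchain_hashes, md_records):
    """EOF estimate: onchain transactions that no MD source saw in the mempool.

    onchain_hashes: all tx hashes included onchain that day (obtained
    externally, e.g. from a Dune query; here it is simply an input).
    md_records: MD-style dicts with a "hash" key (assumed shape).
    """
    onchain = set(onchain_hashes)
    seen = {r["hash"] for r in md_records}
    eof = onchain - seen
    return eof, (len(eof) / len(onchain) if onchain else 0.0)


def exclusive_to_one_source(md_records):
    """Count transactions seen by exactly one source, keyed by that source."""
    counts = {}
    for r in md_records:
        sources = set(r["sources"])
        if len(sources) == 1:
            (only,) = sources
            counts[only] = counts.get(only, 0) + 1
    return counts


# Toy inputs, not real data.
records = [
    {"hash": "0x1", "sources": ["bloxroute", "local"]},
    {"hash": "0x2", "sources": ["bloxroute"]},
    {"hash": "0x3", "sources": ["eden", "bloxroute"]},
    {"hash": "0x4", "sources": ["local"]},
]
onchain = ["0x1", "0x3", "0x9", "0xa"]
eof, eof_share = estimate_eof(onchain, records)
print(sorted(eof), eof_share)  # ['0x9', '0xa'] 0.5
print(exclusive_to_one_source(records))  # {'bloxroute': 1, 'local': 1}
```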

Blocknative 🤝 Mempool Dumpster analyses

We combined the BN, MD and Dune datasets to get additional insights into metrics we can derive from mempool data. First, we compared the sets of transactions present in BN and MD. We found that 191,812 transactions were exclusive to BN, while 76,911 transactions could only be found in MD. However, the difference was much smaller for transactions that ended up onchain: the BN dataset contained 115 transactions that landed onchain and were not present in MD, while only 35 onchain transactions were present only in MD.
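
The comparison above reduces to set operations on transaction hashes. The sketch assumes a plain set of hashes for each dataset plus an onchain set (for example, the Dune list); the toy inputs are made up. As a sanity check on the reported figures, both sides imply the same overlap: 1,422,827 - 191,812 = 1,307,926 - 76,911 = 1,231,015 shared transactions.

```python
def compare_datasets(bn_hashes, md_hashes, onchain_hashes):
    """Set comparison of two mempool datasets, overall and for onchain txs."""
    bn, md, onchain = set(bn_hashes), set(md_hashes), set(onchain_hashes)
    bn_only, md_only = bn - md, md - bn
    return {
        "bn_only": len(bn_only),
        "md_only": len(md_only),
        "shared": len(bn & md),
        "bn_only_onchain": len(bn_only & onchain),
        "md_only_onchain": len(md_only & onchain),
    }


# Toy example.
result = compare_datasets({"a", "b", "c", "d"}, {"c", "d", "e"}, {"b", "c", "e"})
print(result)

# Consistency check with the counts reported in this note: both datasets
# should imply the same number of shared transactions.
implied_shared_bn = 1_422_827 - 191_812  # BN unique minus BN-only
implied_shared_md = 1_307_926 - 76_911   # MD unique minus MD-only
print(implied_shared_bn, implied_shared_md)  # 1231015 1231015
```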

We then compared inclusion delays between the BN and MD datasets. Surprisingly, we found large differences between the two distributions, with a median inclusion time of 8,689 ms for BN and 6,837 ms for MD (see Figure 4). After further investigation, we found that these differences stem from how inclusion delays are computed in each dataset. In BN, the inclusion delay is the actual time a transaction spends pending in the mempool, from when it is first seen until it is included onchain. In MD, the inclusion delay is the difference between the block timestamp and the time the transaction was first seen in the mempool. Since the block timestamp marks the start of the slot, and a block can include transactions first seen after that point, this yields very low and even negative inclusion delays for a significant number of transactions (N = 13,678), which explains the differences displayed in Figure 4.

Figure 4. Cumulative distribution functions (CDFs) of transaction inclusion time for the Blocknative and Mempool Dumpster datasets. Solid lines represent the empirical cumulative distributions, and vertical dashed lines denote the median values of the distributions.
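
To make the difference between the two definitions concrete, the sketch below computes both on a toy timeline. The function names are hypothetical, and the BN-style version assumes inclusion is timed when it is observed; the key point is that the MD-style delay subtracts the first-seen time from the block timestamp, which marks the start of the slot, so a transaction first seen during the slot but included in that slot's block gets a negative delay.

```python
def bn_style_delay_ms(first_seen_ms, inclusion_observed_ms):
    """Time actually spent pending: first mempool sighting -> observed inclusion."""
    return inclusion_observed_ms - first_seen_ms


def md_style_delay_ms(first_seen_ms, block_timestamp_ms):
    """Block timestamp (start of the slot) minus first mempool sighting."""
    return block_timestamp_ms - first_seen_ms


# Toy timeline (ms): the slot starts at t = 1_000_000, the tx is first seen
# 2 s into the slot, and its inclusion in that slot's block is observed at 4 s.
slot_start = 1_000_000
first_seen = slot_start + 2_000
inclusion_observed = slot_start + 4_000

print(bn_style_delay_ms(first_seen, inclusion_observed))  # 2000
print(md_style_delay_ms(first_seen, slot_start))          # -2000

# Counting negative MD-style delays (the kind of count behind N = 13,678).
delays = [md_style_delay_ms(t, slot_start) for t in (998_500, 1_000_800, 1_001_500)]
print(sum(d < 0 for d in delays))  # 2
```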

Lastly, we compared the time at which transactions were first detected by the BN and MD mempools. Out of 1,001,295 transactions present in both datasets, 66.31% (N = 667,272, with a median lead of 43 ms over MD) were first detected by Blocknative (see Figure 5). For more detection time results specific to MD, check out the results obtained in this analysis.

           Blocknative first    Mempool Dumpster first
Count      667,272              334,023
Percent    66.31%               33.19%
Median     43 ms                35 ms
p90        14 ms                72 ms
p95        7 ms                 109 ms
p99        2 ms                 301 ms
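
The head-to-head detection comparison can be sketched as follows: for every hash present in both datasets, take the difference in first-seen times, assign the transaction to whichever dataset saw it first, and summarize the leads. The input shape (hash mapped to first-seen time in ms), the separate handling of ties, and the nearest-rank percentile are assumptions for illustration; they are not necessarily the conventions behind the table above.

```python
import math
import statistics


def percentile(sorted_values, q):
    """Nearest-rank percentile (0 < q <= 100) of a non-empty, sorted list."""
    rank = max(1, math.ceil(q * len(sorted_values) / 100))
    return sorted_values[rank - 1]


def compare_detection(bn_seen_ms, md_seen_ms):
    """Compare first-detection times for transactions present in both datasets.

    bn_seen_ms / md_seen_ms map tx hash -> first-seen time in ms (assumed).
    A "lead" is how many ms earlier the winning dataset saw the tx; ties
    are counted separately, so the two percentages need not sum to 100.
    """
    leads = {"bn_first": [], "md_first": []}
    ties = 0
    for tx in bn_seen_ms.keys() & md_seen_ms.keys():
        diff = md_seen_ms[tx] - bn_seen_ms[tx]  # > 0 means BN saw it first
        if diff > 0:
            leads["bn_first"].append(diff)
        elif diff < 0:
            leads["md_first"].append(-diff)
        else:
            ties += 1
    total = len(leads["bn_first"]) + len(leads["md_first"]) + ties
    summary = {"total": total, "ties": ties}
    for name, values in leads.items():
        values.sort()
        summary[name] = {
            "count": len(values),
            "percent": 100 * len(values) / total if total else 0.0,
            "median_ms": statistics.median(values) if values else None,
            "p90_ms": percentile(values, 90) if values else None,
        }
    return summary


# Toy first-seen times (ms), not real data.
bn = {"a": 1_000, "b": 2_000, "c": 3_000, "d": 4_000, "e": 5_000}
md = {"a": 1_040, "b": 2_010, "c": 2_950, "d": 4_000, "f": 7_000}
result = compare_detection(bn, md)
print(result["bn_first"])  # {'count': 2, 'percent': 50.0, 'median_ms': 25.0, 'p90_ms': 40}
```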

Conclusion

The Blocknative Mempool Data Program and the Flashbots Mempool Dumpster initiative publicly share datasets that offer valuable insights into how and when public transactions get included in Ethereum blocks. We think this will help accelerate empirical research in key areas of the Ethereum supply network, such as censorship and its impact on inclusion time, builders' behavioral profiles, and more!
