# Quick notes on Blocknative vs Mempool Dumpster data initiatives
_by [Thomas Thiery](https://twitter.com/soispoke), October 19th, 2023_
_Thanks to [Barnabé](https://twitter.com/barnabemonnot), [mike](https://twitter.com/mikeneuder) and [Chris](https://twitter.com/metachris) for feedback and comments._
Blocknative (BN) [Mempool Data Program](https://docs.blocknative.com/mempool-data-program) and Flashbots [Mempool Dumpster](https://github.com/flashbots/mempool-dumpster) (MD) are two recent initiatives aimed at open-sourcing Ethereum mempool data. In this short report, we analyze and compare both datasets to highlight their key differences and foster data initiatives within the community.
We utilize one day of mempool data (September 21, 2023), from both the BN and MD datasets, to build upon [Chris' MD analysis](https://gist.github.com/metachris/f25357750bd2fcec956ed3314e0b13b3).
## Blocknative Mempool data
BN has been archiving historical mempool data since November 2019. It collects and provides data from three regions (North America, Europe, and Asia), and its schema includes 27 fields listed [here](https://docs.blocknative.com/mempool-data-program), covering all transaction information except signature data. When analyzing the data, we can use the __`status`__ field to determine whether a transaction was pending in the mempool, rejected, evicted, canceled, sped-up, confirmed, or failed. Note that both the confirmed and failed tags indicate that a transaction landed onchain. Figure 1A shows the unique transaction count for each __`status`__ tag. On September 21st, the BN dataset included 1,422,827 unique transactions: 72.3% of transactions successfully landed onchain (see Table 1, N<sub>included</sub> = 1,029,508), and 7.3% of transactions that landed onchain were not seen in the mempool, thus representing Exclusive Orderflow (EOF, N<sub>EOF</sub> = 112,419). Note that transactions confirmed onchain but not seen in the mempool can be identified by looking at confirmed transactions with the __`timepending`__ field set to 0. The BN dataset also provides fields specifically created to give more details about a transaction's status (see Figure 1B), such as __`failurereason`__ (e.g., `Reverted: UniswapV2Router: INSUFFICIENT_OUTPUT_AMOUNT`), __`dropreason`__ (e.g., `replaced-txs`, `low-nonce`) and __`rejectionreason`__ (e.g., `exceeds block gas limit`).
<img src="https://hackmd.io/_uploads/rklCDGOep.png" width="100%">
> *__Figure 1.__ __A.__ Table of unique transactions count per __`status`__ tag on September 21st, 2023. __B.__ Distribution of rejection and failure reasons according to __`rejectionreason`__ and __`failurereason`__ fields included in BN dataset.*
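The counting logic described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: rows are represented as dictionaries, the transaction identifier is assumed to live in a `hash` field, and `summarize_bn` is a hypothetical helper name, not part of any Blocknative tooling.

```python
from collections import defaultdict

# Per the text, both tags mean the transaction landed onchain.
ONCHAIN_STATUSES = {"confirmed", "failed"}


def summarize_bn(rows):
    """Count unique transactions per status and flag likely EOF.

    `rows` is an iterable of dicts with the keys 'hash' (assumed field
    name), 'status' and 'timepending' (milliseconds).
    """
    hashes_by_status = defaultdict(set)
    onchain = set()
    eof = set()
    for row in rows:
        hashes_by_status[row["status"]].add(row["hash"])
        if row["status"] in ONCHAIN_STATUSES:
            onchain.add(row["hash"])
        # Confirmed with timepending == 0: never observed in the mempool.
        if row["status"] == "confirmed" and row["timepending"] == 0:
            eof.add(row["hash"])
    counts = {status: len(hashes) for status, hashes in hashes_by_status.items()}
    return counts, len(onchain), len(eof)
```

Using sets keyed by hash keeps the counts at the level of unique transactions, since one transaction can appear in several rows as its status changes.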
We then evaluated the inclusion time for mempool transactions using the `timepending` and `blockspending` fields in the BN dataset (Figure 2). On September 21st, successful transactions landing onchain had a 73% chance of being included in the next block (Figure 2, right panel), with a median inclusion time of 8,636 ms (left panel). Failed transactions, on the other hand, had a 68% chance of being included in the next block, with a median inclusion time of 9,277 ms.
<img src="https://hackmd.io/_uploads/H1L0NEdgT.png" width="100%">
> *__Figure 2.__ Cumulative Distribution Functions (CDFs) of transactions inclusion time. The left panel depicts the CDFs of time pending in the mempool for both confirmed and failed transactions, while the right panel illustrates the CDFs of the number of blocks awaited prior to transaction inclusion. In each panel, solid lines represent the empirical cumulative distribution, and vertical dashed lines denote the median values of the distributions.*
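For readers who want to reproduce this kind of summary, here is a minimal sketch of the two statistics and of an empirical CDF. It assumes that a `blockspending` value of 1 means "included in the next block", which is an assumption about the field's convention rather than something stated in the BN documentation. The helper names are hypothetical.

```python
from statistics import median


def ecdf(values):
    """Return the empirical CDF as a list of (value, cumulative share)."""
    xs = sorted(values)
    n = len(xs)
    return [(x, (i + 1) / n) for i, x in enumerate(xs)]


def inclusion_stats(txs, next_block_value=1):
    """Median time pending (ms) and share included by the next block.

    `txs` is a non-empty list of dicts with 'timepending' and
    'blockspending'. `next_block_value` encodes the assumed convention
    for "next block" and should be adjusted if the field is 0-based.
    """
    times = [tx["timepending"] for tx in txs]
    by_next_block = sum(1 for tx in txs if tx["blockspending"] <= next_block_value)
    return median(times), by_next_block / len(txs)
```

The function would be called once on confirmed transactions and once on failed ones to obtain the two curves compared in Figure 2.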
## Flashbots Mempool Dumpster
The [Mempool Dumpster (MD)](https://github.com/flashbots/mempool-dumpster) initiative was [launched](https://twitter.com/metachris/status/1698668155260866820) on September 4th, 2023 by Flashbots. Today, it encompasses about two months' worth of data starting on September 8th, from generic EL nodes (e.g., go-ethereum, Infura), [Alchemy](https://docs.alchemy.com/reference/alchemy-pendingtransactions), [bloXroute](https://docs.bloxroute.com/streams/newtxs-and-pendingtxs), [Chainbound](https://fiber.chainbound.io/docs/usage/getting-started/) and [Eden](https://docs.edennetwork.io/eden-rpc/speed-rpc/). The dataset's schema is composed of 18 fields, inclusive of signature data, and on September 21st, it included 1,307,926 unique transactions (see breakdown in Figure 3). Out of these unique transactions, 78% were included onchain (N<sub>included</sub> = 1,020,078), which can be determined by filtering on the __`included_at_block_height`__, __`included_block_timestamp_ms`__ and __`inclusion_delay_ms`__ fields (a transaction was included onchain when these fields are greater than 0). Having multiple sources allows us to compare various metrics across them.
| Sources | Transactions count |
|:-----------:|:------------------:|
| bloXroute | 1,190,387 |
| MempoolGuru | 1,175,641 |
| apool | 1,171,871 |
| Infura | 1,167,135 |
| Chainbound | 1,072,194 |
| Eden | 1,065,510 |
| local | 1,028,072 |
> *__Figure 3.__ Breakdown of unique transactions count across sources from the MD dataset on September 21st, 2023.*
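A breakdown like the one above, together with the inclusion filter, can be sketched as follows. The sketch assumes each row is a dictionary with a `hash` field and a `sources` field holding a list of source names. Both of these names are assumptions about the MD schema, and `md_summary` is a hypothetical helper.

```python
from collections import Counter


def md_summary(rows):
    """Unique transactions, included transactions and per-source counts.

    `rows` is an iterable of dicts with 'hash', 'sources' (assumed to be
    a list of source names) and 'included_at_block_height'.
    """
    unique = {}
    for row in rows:
        # De-duplicate by hash, keeping the first row seen.
        unique.setdefault(row["hash"], row)

    per_source = Counter()
    included = 0
    for row in unique.values():
        # A set avoids double counting a source listed twice on one row.
        per_source.update(set(row["sources"]))
        # Per the text, a value greater than 0 means the tx landed onchain.
        if row["included_at_block_height"] > 0:
            included += 1
    return len(unique), included, per_source
```

Because one transaction is usually reported by several sources, the per-source counts overlap and do not sum to the number of unique transactions.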
Having multiple sources also allows us to identify transactions that were exclusively seen by some entities. However, the MD dataset doesn't include an exhaustive list of all transactions that ended up landing onchain (the equivalent of `confirmed` transactions in the BN dataset), making it harder to estimate EOF accurately (here, EOF refers to transactions that landed onchain without being seen in the mempool, not transactions exclusive to a particular source). To estimate EOF, we used the [Dune](https://dune.com/) API to retrieve the exhaustive list of all transactions that landed onchain, and identified those that were not seen in the mempool by any of the sources from the MD dataset. We found that 8.7% (N<sub>EOF</sub> = 113,521) of all transactions landing onchain were not seen in the mempool across all sources.
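The estimate boils down to a set difference between the hashes that landed onchain and the hashes observed in the mempool. Here is a minimal sketch, assuming both inputs have already been exported as lists of hex hash strings (the retrieval step via the Dune API is not shown, and `estimate_eof` is a hypothetical name):

```python
def estimate_eof(onchain_hashes, mempool_hashes):
    """Return onchain hashes never seen in the mempool, and their share.

    Hashes are lowercased first, because different exports may not use
    the same hex casing.
    """
    onchain = {h.lower() for h in onchain_hashes}
    mempool = {h.lower() for h in mempool_hashes}
    unseen = onchain - mempool
    share = len(unseen) / len(onchain) if onchain else 0.0
    return unseen, share
```

Mempool hashes that never landed onchain are simply ignored by the difference, so the mempool input does not need to be filtered beforehand.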
## Blocknative 🤝 Mempool Dumpster analyses
We combined the BN, MD and Dune datasets to get additional insights on metrics we can derive from mempool data. First, we computed the difference in transaction counts between BN and MD. We found that 191,812 transactions were exclusive to BN, while 76,911 transactions could only be found in MD. However, the difference was much smaller for transactions that ended up onchain: the BN dataset contained 115 transactions that landed onchain and were not present in MD, while only 35 onchain transactions were present in MD but not in BN.
We then set out to compare inclusion time delays between the BN and MD datasets. Surprisingly, we found large differences between the two distributions, with a median inclusion time of 8,689 ms for BN and 6,837 ms for MD (see Figure 4). After further investigation, we found that these differences originate from how inclusion delays are computed in each dataset. In BN, the inclusion delay refers to the actual time a transaction spends pending in the mempool before getting included onchain, for transactions that are seen in the mempool first. In MD, the inclusion delay is the difference between the time the transaction was first seen in the mempool and the block timestamp: this can lead to very low, and even negative, inclusion delays for a significant number of transactions (N = 13,678), leading to the differences displayed in Figure 4.
<img src="https://hackmd.io/_uploads/ryEuehceT.png" width="90%">
> *__Figure 4.__ Cumulative Distribution Functions (CDFs) of transactions inclusion time for Blocknative and Mempool Dumpster datasets. Solid lines represent the empirical cumulative distribution, and vertical dashed lines denote the median values of the distributions.*
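To make the MD definition concrete, the sketch below computes the delay as block timestamp minus first-seen time, and counts how many transactions end up with a negative value. The `timestamp_ms` field name for the first-seen time is an assumption, and both helper names are hypothetical.

```python
def md_inclusion_delay_ms(first_seen_ms, block_timestamp_ms):
    """Inclusion delay as described for MD: block timestamp minus first seen."""
    return block_timestamp_ms - first_seen_ms


def count_negative_delays(txs):
    """Count transactions first seen after their block's timestamp.

    `txs` is an iterable of dicts with 'timestamp_ms' (assumed name for
    the first-seen time) and 'included_block_timestamp_ms'.
    """
    return sum(
        1
        for tx in txs
        if md_inclusion_delay_ms(tx["timestamp_ms"], tx["included_block_timestamp_ms"]) < 0
    )
```

A transaction first observed after the timestamp of the block that includes it gets a negative delay under this definition, which pulls the MD distribution toward lower values.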
Lastly, we compared the time at which transactions were first detected in the BN and MD mempools. Out of 1,001,295 transactions present in both datasets, 66.31% (N = 667,272, median BN<sub>detecttime</sub> - MD<sub>detecttime</sub> = 43 ms) were first detected by Blocknative (see Figure 5). For more detection time results specific to MD, check out the results obtained in this [analysis](https://gist.github.com/metachris/f25357750bd2fcec956ed3314e0b13b3).
| | Blocknative First | Mempool Dumpster First |
|:-------:|:-----------------:|:----------------------:|
| __Count__ | 667,272 | 334,023 |
| __Percent__ | 66.31% | 33.19% |
| __Median__ | 43 ms | 35 ms |
| __p90__ | 14 ms | 72 ms |
| __p95__ | 7 ms | 109 ms |
| __p99__ | 2 ms | 301 ms |
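A comparison of this kind can be sketched by joining both datasets on the transaction hash and looking at the sign of the detection time difference. The sketch assumes each dataset has been reduced to a mapping from hash to first detection time in milliseconds. The function name and the returned keys are hypothetical, and ties are counted for neither side.

```python
from statistics import median


def compare_detection(bn_detect_ms, md_detect_ms):
    """Compare first detection times for hashes present in both mappings.

    Both arguments map a transaction hash to its first detection time
    in milliseconds. Leads are reported as positive magnitudes.
    """
    common = bn_detect_ms.keys() & md_detect_ms.keys()
    bn_leads, md_leads = [], []
    for tx_hash in common:
        diff = md_detect_ms[tx_hash] - bn_detect_ms[tx_hash]
        if diff > 0:
            bn_leads.append(diff)   # BN saw the transaction first
        elif diff < 0:
            md_leads.append(-diff)  # MD saw the transaction first
    return {
        "n_common": len(common),
        "bn_first": len(bn_leads),
        "md_first": len(md_leads),
        "bn_median_lead_ms": median(bn_leads) if bn_leads else None,
        "md_median_lead_ms": median(md_leads) if md_leads else None,
    }
```

Percentiles of the lead times could be derived from the same two lists, for instance with `statistics.quantiles`.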
## Conclusion
The Blocknative Mempool Data Program and the Flashbots Mempool Dumpster initiative have publicly shared datasets that offer valuable insights into how and when public transactions get included in Ethereum blocks. We think this will help accelerate empirical research in key areas of the Ethereum supply network such as [censorship](https://censorship.pics/) and its impact on [inclusion time](https://ethresear.ch/t/estimating-inclusion-delays-for-censored-transactions/15115), [builders behavioral profiles](https://ethresear.ch/t/empirical-analysis-of-builders-behavioral-profiles-bbps/16327), and more!