
# Overview

This document addresses the following two questions:

- **What** is the relevant data that should be collected to gain insights into the operational health of the network?
- **How** can that data best be collected?

# Introduction

Analyses of transactions on the Bitcoin blockchain are widespread today because the data is easily accessible to anyone. Blockchain analyses underpin the many websites that provide statistics about the Bitcoin network, such as the network's hashrate, transaction throughput and fees, and many more.

For some investigations, however, the blockchain-based view can be too narrow. For one thing, blockchain data lacks precise transaction and block timestamps, rendering it useless for fine-grained analyses of transaction and block propagation in the network. For another, it misses transactions that did not make it into the blockchain, which provide crucial clues about the demand for block space, fee dynamics, replace-by-fee (RBF) usage, and so on. Also missing are invalid transactions and blocks sent in the network, which can serve as a basis for anomaly detection.

Fortunately, a comprehensive data set that does not suffer from these shortcomings can be extracted from a Bitcoin Core node. Unfortunately, as of now there is no standardized way to collect this data in a simple, reliable and automated fashion. The remainder of this document attempts to reach a consensus on what data should be collected and how best to collect it.

# Relevant data

This is an initial attempt to list data that could prove useful. Because historical data cannot be reproduced later, a general philosophy of overcollection (i.e., erring on the side of collecting too much rather than too little data) should apply. Feel free to extend this list.

- timestamp: The timestamp of the event
- event: The type of event (invalid transaction/block, block added to chain, transaction added to mempool, transaction removed from mempool)
- txid: The ID of the transaction or block
- raw data: The raw transaction data and, in case of an invalid block, the raw block data. Recording raw transaction data has the advantage of making it possible to fix mistakes (infrequently!) made by the witness detection heuristic.
- metadata (data that could in theory be derived from a full historical data set but in practice is too hard or expensive to derive):
  - invalid: reason for being invalid
  - removed from mempool: reason for removal
  - some of the transaction information provided by the `getrawmempool` API call

# Data collection approaches

So far, the following approaches have been identified to collect some or all of the [relevant data](#relevant-data).

## API-based data collection

This approach is based on taking snapshots of a node's mempool state at regular intervals using the API call `getrawmempool`. Comparing successive snapshots yields the lists of transactions added to and removed from the mempool. Detailed information for transactions added in the interval can be obtained using the `getrawtransaction` API call.

Pros:

- The resulting data set is self-healing in case of downtime.
  - During downtime, significant information might be missed and the time between snapshots might increase significantly, but both snapshots always represent valid mempool states.

Cons:

- The following relevant data cannot be collected with this approach:
  - exact timestamps: the accuracy of timestamps is limited by the snapshotting frequency
  - invalid transactions and blocks are missing
  - the reasons for the removal of transactions from the mempool are missing
- All transactions that were both added to and removed from the mempool between two successive snapshots will be missing.
  - This includes some RBF transactions; data on transactions that were removed because they were included in a block can be reconstructed using blockchain data.

## ZMQ-based data collection

This approach collects data from the various ZMQ notification topics provided by Bitcoin Core.

Pros:

- Event notification is immediate.
- Timestamps are exact.
- No data is missed due to inadequate sampling, unlike in the API-based approach.

Cons:

- The data set can become inconsistent in case of downtime.
  - Example: A transaction removed from the mempool during downtime will remain stuck in a reconstruction of the mempool. Removal times can be approximated for mined transactions using block timestamps, but not for transactions removed for other reasons.
  - For monitoring the operational health of the network, this should not be a problem, since a node that is down does not contribute any health information. For the potential creation of a public mempool data set, it should not be a problem either: data is collected in a decentralized way by multiple nodes, so as long as at least one node is running, the resulting data set should be complete.
- The following relevant data cannot be collected out of the box:
  - Invalid transactions and blocks, including the reason for invalidity
  - Reasons for the removal of transactions from the mempool
  - Information provided by `getrawmempool`
- In theory, it should be possible to make all data available via ZMQ by adding the necessary functionality to Bitcoin Core. An existing [patch](https://github.com/0xB10C/bitcoin-zmq-mempool-chain-events/blob/v23.0-zmce/PATCH.md) by 0xB10C already makes some of the relevant data available.

## USDT-and-eBPF-based data collection

This approach does not rely on an external tool to collect data from Bitcoin Core via some interface. Instead, it is based on adding tracepoints to the mempool and other subsystems of Bitcoin Core so that all relevant data can be logged directly via Bitcoin Core.

- Pros:
  - Can collect all relevant data
  - If the tracepoints get merged, data collection can be performed with vanilla Bitcoin Core
- Cons:
  - Probably the most work
  - The data set can become inconsistent in case of downtime
    - The same comments as for the ZMQ-based approach apply
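
To illustrate the snapshot comparison of the API-based approach, here is a minimal Python sketch. It assumes that each snapshot is the set of txids returned by a non-verbose `getrawmempool` call and that `get_raw_transaction` is a caller-supplied wrapper around the `getrawtransaction` RPC call; the helpers `diff_snapshots` and `collect_interval` are hypothetical names, and the RPC transport itself is deliberately left out.

```python
from typing import Callable, Dict, Set, Tuple


def diff_snapshots(previous: Set[str], current: Set[str]) -> Tuple[Set[str], Set[str]]:
    """Return (added, removed) txids between two successive mempool snapshots."""
    added = current - previous    # present now, absent in the previous snapshot
    removed = previous - current  # present before, gone now (mined, replaced, evicted, ...)
    return added, removed


def collect_interval(
    previous: Set[str],
    current: Set[str],
    get_raw_transaction: Callable[[str], str],
) -> Dict[str, object]:
    """Summarize one snapshot interval (hypothetical helper).

    `get_raw_transaction` stands in for a wrapper around the `getrawtransaction`
    RPC call. A transaction may leave the mempool before it is queried, in which
    case a real RPC call can fail; handling that is left to the caller.
    """
    added, removed = diff_snapshots(previous, current)
    return {
        # Raw data is only fetched for newly added transactions.
        "added": {txid: get_raw_transaction(txid) for txid in sorted(added)},
        # This approach cannot see the removal reason; only the txid is recorded.
        "removed": sorted(removed),
    }


if __name__ == "__main__":
    # Toy snapshots: "t3" arrived and "t1" left between the two snapshots.
    before = {"t1", "t2"}
    after = {"t2", "t3"}
    print(collect_interval(before, after, lambda txid: f"<raw hex of {txid}>"))
```

A transaction that enters and leaves the mempool between two snapshots never appears in either set, which is exactly the sampling gap listed among the cons of this approach.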

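To make the downtime problem of the ZMQ-based approach concrete, the following Python sketch rebuilds a mempool view from notifications on Bitcoin Core's `sequence` topic (enabled with `-zmqpubsequence`). It assumes the message layout described in Bitcoin Core's `doc/zmq.md`: three frames consisting of the topic; a body with a 32-byte hash, a one-byte label (`A`/`R` for transactions added to/removed from the mempool, `C`/`D` for blocks connected/disconnected) and, for `A`/`R`, an 8-byte mempool sequence number; and a 4-byte little-endian per-topic message counter. `parse_sequence_message` and `MempoolReconstruction` are hypothetical names, and receiving the frames (e.g. via `recv_multipart()` on a pyzmq `SUB` socket) is left out.

```python
import struct
from typing import Dict, List, Optional, Tuple


def parse_sequence_message(frames: List[bytes]) -> Tuple[str, str, int]:
    """Decode one `sequence` notification into (hash_hex, label, message_counter)."""
    topic, body, counter = frames
    if topic != b"sequence":
        raise ValueError(f"unexpected topic {topic!r}")
    hash_hex = body[:32].hex()  # txid for A/R, block hash for C/D
    label = chr(body[32])       # one of "A", "R", "C", "D"
    # body[33:41] holds the mempool sequence number for A/R; unused in this sketch.
    (message_counter,) = struct.unpack("<I", counter)
    return hash_hex, label, message_counter


class MempoolReconstruction:
    """Naive mempool view rebuilt purely from `sequence` notifications (hypothetical)."""

    def __init__(self) -> None:
        self.first_seen: Dict[str, float] = {}  # txid -> local time the "A" event arrived
        self.last_counter: Optional[int] = None
        self.gaps = 0  # number of detected discontinuities in the message counter

    def apply(self, frames: List[bytes], received_at: float) -> None:
        item_hash, label, counter = parse_sequence_message(frames)
        if self.last_counter is not None and counter != self.last_counter + 1:
            # Lost notifications, e.g. while the collector was down. Any "R"
            # among them is gone for good, so the affected txids stay "stuck".
            self.gaps += 1
        self.last_counter = counter
        if label == "A":
            self.first_seen.setdefault(item_hash, received_at)
        elif label == "R":
            self.first_seen.pop(item_hash, None)
        # "C"/"D" are ignored: removing mined transactions would require
        # fetching the connected block's transaction list.
```

If the `R` notification for a transaction is lost during downtime, `gaps` is incremented once the next message arrives, but the transaction remains in `first_seen` indefinitely. The counter only reveals *that* something was missed, not *what*; repairing the view would require an additional source, for instance a fresh `getrawmempool` snapshot.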