# P2P Networking: Telemetry and Analysis brainstorm
## What probes do we currently have? (Inventory)
- (@user) Title
- what does it measure
- link to source code
- (@probelab)
- discv4 and discv5 comprehensive network topology information ([`nebula`](https://github.com/dennis-tra/nebula))
<details>
<summary>Concrete Data</summary>
Collected every two hours for all nodes in the discv4 and discv5 DHTs. The frequency could be increased, as a single crawl takes a little under 20 minutes:
- supported protocols
- supported transports
- latency from AWS us-east-1
- agent_version (parsed into client, semver, hash, os, arch etc.)
- parsed enr attributes: attnets, attnets_num, fork_digest, next_fork_epoch, next_fork_version, seq_num, syncnets, ip, tcp_port, udp_port, quic_port, opstack_chain_id, opstack_version, les, snap, bsc, ptstack, opera
- geo information: continent, country, city, latitude, longitude, is_cloud, is_vpn, is_tor, is_proxy, is_bogon, is_relay, is_mobile, asn, type (isp, datacenter, government, university, etc.), hoster (AWS, OVH, etc.), domain
- discovered and identify-advertised Multiaddresses
- connection errors per dialed multiaddress
- which multiaddress we eventually connected on
- we do not store the raw ENR or its signature (though the functionality is there)
</details>
- Gossipsub and some libp2p internal traces ([`hermes`](https://github.com/probe-lab/hermes))
- download/upload throughput measurement from nodes in the Ethereum network (`ookla`)
- PoC for sampling node custody (soon to be published as `eth-das-guardian`)
- Prysm fork streaming events that summarize `engine_getBlobsV1` req/resp calls ([`prysm`](https://github.com/probe-lab/prysm))
- benchmarks for snappy-compression of beacon blocks ([`eth-snappy-benchmarks`](https://github.com/cortze/eth-snappy-benchmarks))
- DHT eclipse attack detector (internally called `antikythera`)
- discv4/discv5 bootstrap uptime monitoring (`boomo`)
- (@cskiraly) \<\<Placeholder for many tools built over time>>
- Discv5 crawler, giving the following data
- per node:
- ENRs, with version distribution
- latency measurement
- packet-pair-based bandwidth bottleneck estimate (see the sketch at the end of this inventory)
- an improved version of this also exists, using high-precision NIC hardware RX timestamps
- partial "Who knows who" graph
- Mempool "observatory": single geth node, modified, with more peers
- transaction diffusion in the mempool, per transaction, and of course also stats per type, per size, etc.
- cross-correlate with transaction "commit" in the block
- should also be able to cross-correleate with finalization
- Modified Nimbus for analysing block propagation and processing stages. I think only some logging was added, nothing special.
- Not really a probe, but I had a few notebooks, hosted on Colab, to analyse PandaOps data on block and attestation propagation.
- (@pop) Block/blob propagation time.
- (@pop) Block size.
- (@pop) Number of blobs.
- (@pop) Technically **we already have everything provided by the beacon API** (both through the event streams and the GET endpoints). See https://ethereum.github.io/beacon-APIs/
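
As a side note on the packet-pair technique in the crawler inventory above: packets sent back-to-back queue at the bottleneck link, so the receiver-side gap between them approximates `packet_size / capacity`. A minimal sketch of the estimate, assuming we already have RX timestamps for a back-to-back train of equal-size packets (the function name and numbers are illustrative, not from the actual crawler):

```python
from statistics import median

def bottleneck_bps(rx_timestamps_ns: list[int], packet_size_bytes: int) -> float:
    """Estimate bottleneck capacity from RX timestamps of a back-to-back packet train.

    Packets of size S queue at the bottleneck link, so the receiver-side gap
    between consecutive packets approximates S / C, where C is the capacity.
    """
    gaps_ns = [b - a for a, b in zip(rx_timestamps_ns, rx_timestamps_ns[1:])]
    gap_s = median(gaps_ns) / 1e9  # median is robust against cross-traffic inflating single gaps
    return packet_size_bytes * 8 / gap_s

# e.g. 1500-byte packets arriving ~120 µs apart -> ~100 Mbit/s
print(bottleneck_bps([0, 120_000, 240_000, 361_000], 1500))
```

This is also why the high-precision NIC hardware RX timestamps matter: at 1 Gbit/s the per-packet gap for 1500-byte packets is only ~12 µs, which is easily swamped by software timestamping jitter.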
## What kind of measurements would we like to see?
- (@raulk) Private transactions per block.
- transactions included in a block that were not previously witnessed in the mempool (see the sketch at the end of this note)
- Probelab might have something we can reuse. [Link](https://ethresear.ch/t/theoretical-blob-transaction-hit-rate-based-on-the-el-mempool/22147).
- (@raulk) Mempool expiry patterns across EL clients (their eviction heuristics differ wildly)
- which transactions were dropped from the mempool without making it into a block, ideally with the reason
- (@raulk) Continuous measurement of blob EL mempool hit rate (GetBlobs)
- (@raulk) Sender-recipient pairs responsible for private blobs (can probably trace down to a handful of L2s?)
- (@raulk) Mesh topology, churn/stability, and reachability.
- (@raulk) Connection type (TCP, QUIC), handshake type (Noise, TLS), multiplexer (Yamux, QUIC).
- (@raulk) Connection age.
- (@raulk) Bandwidth and throughput measurements (passive measurements?)
- (@raulk) All pairs shortest paths (as a matrix / heatmap), peer eccentricity, graph diameter of the mesh
- (@raulk) Low-level transport metrics: RTT, packet loss, jitter, window size, reordering rate, etc.
- (@raulk) Reachability metrics (NAT)
- (@raulk) Underlay stats: AS and traceroute to peers.
- (@raulk) Network layer stats: MTU.
- (@marcopolo) Network layer stats: lost packet counter
- (@dennis-tra): Validator deanonymization via GossipSub message propagation triangulation
- (@dennis-tra): Numbers on non-server, non-DHT peers interacting with the network (could adapt [ants](https://github.com/probe-lab/ants-watch))
- (@dennis-tra): Various DHT graph-related metrics, e.g. centrality (PageRank, betweenness) and in-/out-degree distributions (see the networkx sketch after this list).
- (@dennis-tra): I have always wanted to correlate the set of peers/IPs we discover in the DHT with data from [RIPE Atlas](https://atlas.ripe.net/). If I understand it correctly, this could give us an approximation of the RTT distribution between peers across the entire network by relying on their globally distributed probes. They may also have data on traceroutes, packet loss, jitter, window sizes, etc.
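
For the graph metrics above (centrality, degree distributions, all-pairs shortest paths, eccentricity, diameter), a minimal sketch of how they could be computed with networkx over a crawled "who knows who" edge list; the toy edge data is illustrative, not real crawl output:

```python
import networkx as nx

# hypothetical crawl output: directed (observer, observed) peer edges
edges = [("A", "B"), ("B", "C"), ("C", "A"), ("C", "D")]
g = nx.DiGraph(edges)

pagerank = nx.pagerank(g)                    # centrality per peer
betweenness = nx.betweenness_centrality(g)   # how often a peer sits on shortest paths
in_degrees = dict(g.in_degree())             # in-degree distribution input
out_degrees = dict(g.out_degree())           # out-degree distribution input

# eccentricity and diameter are defined on connected graphs; use the
# undirected projection and guard against disconnected crawl snapshots
ug = g.to_undirected()
if nx.is_connected(ug):
    ecc = nx.eccentricity(ug)                # per-peer longest shortest path
    print("diameter:", nx.diameter(ug))      # max eccentricity over all peers

# all-pairs shortest path lengths, e.g. to render as a matrix/heatmap
apsp = dict(nx.all_pairs_shortest_path_length(ug))
print(pagerank, betweenness, in_degrees, out_degrees)
```

Exact algorithms like these are fine for crawl-sized graphs; for very large snapshots, networkx also offers sampled betweenness (`k` parameter) to keep runtimes manageable.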
## What probes should we build?
- (@raulk) Forks of EL clients instrumenting the mempool (ideally contributing the traces upstream)
- (@raulk) Sophisticated network crawler to gather many of the datapoints listed above.
- (@raulk) Hermes++: a useful hydra to pervade the network and collect traces, while boosting propagation.
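
As a cheap starting point for the private-transaction measurement above (before forking an EL client), one could diff block contents against mempool-witnessed hashes using standard JSON-RPC filters. A minimal sketch with web3.py; the endpoint and the polling approach are assumptions:

```python
from web3 import Web3

w3 = Web3(Web3.HTTPProvider("http://localhost:8545"))  # assumed local EL node

seen = set()                          # tx hashes witnessed in the public mempool
pending = w3.eth.filter("pending")    # new pending-transaction hashes
blocks = w3.eth.filter("latest")      # new block hashes

while True:
    seen.update(pending.get_new_entries())
    for block_hash in blocks.get_new_entries():
        block = w3.eth.get_block(block_hash)
        # "private" here = included in the block but never seen in our mempool view
        private = [h for h in block.transactions if h not in seen]
        print(f"block {block.number}: {len(private)}/{len(block.transactions)} "
              f"txs never witnessed in the mempool")
```

Note that a polling loop like this races the pending filter against block arrival and will overcount "private" transactions; the instrumented EL-client forks above would give ground truth, and a long-running probe would also need to expire old entries from `seen`.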