# Standardizing Ethereum Client Tracing

In this doc, we discuss how we can work on standardizing client traces in Ethereum. We have a repo, https://github.com/ethereum/client-traces, to document the standardized client traces.

We can approach standardizing client traces as follows:

1. Pick two main flows in an Ethereum validator client. We can start with:

   a. The attestation production flow

   b. The block proposal flow

2. Document the parent-child span structure of each flow, along with the span attributes. These traces should capture only the most important parts of the flows: we should strike a balance between adding too many spans and attributes and adding too few. We should also keep the traces as implementation agnostic as possible.

   We need to document the parent-child span relationships in a readable way. We could use a simple box-drawing layout like the one below:

   ```
   beacon.receive_block (root)
   │
   ├─ beacon.validate_block
   │  │ Attributes: valid, validation_error
   │  │
   │  ├─ beacon.verify_signature
   │  │    Attributes: signature, valid
   │  │
   │  └─ beacon.check_proposer
   │       Attributes: proposer_index, expected_proposer
   │
   ├─ beacon.state_transition
   │  │ Attributes: pre_state_root, post_state_root
   │  │
   │  ├─ beacon.process_operations
   │  │  │
   │  │  ├─ beacon.process_attestations
   │  │  │    Attributes: attestation_count, valid_count
   │  │  │
   │  │  └─ beacon.process_deposits
   │  │       Attributes: deposit_count
   │  │
   │  └─ beacon.process_execution_payload
   │     │ Attributes: payload_hash, gas_used
   │     │
   │     └─ execution.new_payload
   │          Attributes: block_hash, block_number, status
   │
   └─ beacon.update_fork_choice
      │ Attributes: head_root, justified_epoch
      │
      ├─ forkchoice.on_block
      │    Attributes: block_root, slot
      │
      └─ forkchoice.get_head
           Attributes: head_root, head_slot
   ```

   We can derive the traces from the consensus specs, since the specs are implementation agnostic.

3. Do a POC in a client such as Lighthouse or Prysm for the traces we define.

4. Once we have derived more traces and have a POC, we can start holding observability breakout calls to discuss the trace structure with client teams.

5. Standardize the trace exporters. Ethpandaops requires clients to support Grafana Tempo and raw JSON traces, and these options can be enabled with flags. Tempo lets us visualise the traces, while raw JSON traces let us validate whether clients adhere to the spec.

# Trace Documentation

Below are some points to keep in mind while documenting the traces:

1. Define namespaces for the traces. We can have the following namespaces:

   a. beacon

   b. p2p

   c. execution

   We could go more granular if we want to specify DAS-related traces, etc. Span names can follow the structure `namespace.operation`, with both the namespace and the operation in snake case. Clients can be allowed to prefix span names with their client name so that they can add client-specific traces.

2. Introduce versioning for traces so we can keep track of changes.

3. Write scripts to validate that clients meet the trace requirements.

# Questions:

1. Existing clients like Lighthouse and Prysm already have some traces implemented. Standardizing the traces could involve breaking some of them, so we need to check with client teams whether this is okay. In my mind, it should be fine, since I assume they don't use traces on production nodes.
2. I believe traces would be used only in perfnets. What is the lifecycle of these perfnets? Are they long-running or temporary? If they are long-running, we might have to deal with breaking changes in spans.

# Notes:

1. Traces incur performance overhead. We need to avoid introducing so many traces that the overhead becomes a problem, while still having enough traces to give useful insight. I believe this will be an ongoing tuning process.
2. We should make the traces as implementation agnostic as possible.
3. Allow traces for client-specific implementations.
4. We need to define a trace sampling rate to avoid sending too many traces.
5. Avoid high-cardinality traces.
6. Avoid adding sensitive data to traces.
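To make the parent-child relationships from step 2 of the approach concrete, here is a minimal sketch of how a client could record spans with parent links and dump them as raw JSON lines, using the `beacon.validate_block` subtree of the example tree. `MiniTracer` and the JSON field names (`span_id`, `parent_id`, `attributes`) are hypothetical; this is not OpenTelemetry, and real clients would use their own tracing library and exporters.

```python
import itertools
import json
from contextlib import contextmanager


class MiniTracer:
    """Hypothetical minimal tracer: records spans with parent links (not OpenTelemetry)."""

    def __init__(self):
        self.spans = []                 # finished spans, in the order they end
        self._open = []                 # ids of currently open spans (innermost last)
        self._ids = itertools.count(1)

    @contextmanager
    def span(self, name, **attributes):
        span_id = next(self._ids)
        parent_id = self._open[-1] if self._open else None  # None marks the root span
        record = {"name": name, "span_id": span_id,
                  "parent_id": parent_id, "attributes": dict(attributes)}
        self._open.append(span_id)
        try:
            yield record                # callers may add attributes once results are known
        finally:
            self._open.pop()
            self.spans.append(record)

    def to_json_lines(self):
        # One JSON object per span: a hypothetical "raw JSON" export format.
        return "\n".join(json.dumps(s, sort_keys=True) for s in self.spans)


# Usage: the beacon.validate_block subtree (attribute values are illustrative only).
tracer = MiniTracer()
with tracer.span("beacon.receive_block"):
    with tracer.span("beacon.validate_block") as validate:
        with tracer.span("beacon.verify_signature", valid=True):
            pass
        with tracer.span("beacon.check_proposer", proposer_index=7, expected_proposer=7):
            pass
        validate["attributes"].update(valid=True, validation_error=None)

print(tracer.to_json_lines())
```

Parent links are taken from a stack of open spans, so nesting in code mirrors nesting in the documented tree. Attributes such as `valid` are filled in after the child spans finish, because their values are only known at that point.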
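The naming convention from the Trace Documentation section (`namespace.operation` in snake case, with an optional client-name prefix) is easy to check mechanically. The sketch below assumes exactly those rules; the `NAMESPACES` and `CLIENTS` sets are hypothetical placeholders, with `forkchoice` included only because the example tree uses it.

```python
import re

# Assumed rules: [client.]namespace.operation, every part in snake_case.
# NAMESPACES and CLIENTS are hypothetical placeholders, not an agreed list.
NAMESPACES = {"beacon", "p2p", "execution", "forkchoice"}
CLIENTS = {"lighthouse", "prysm"}
SNAKE = r"[a-z][a-z0-9]*(?:_[a-z0-9]+)*"
SPAN_NAME = re.compile(
    rf"^(?:(?P<client>{SNAKE})\.)?(?P<namespace>{SNAKE})\.(?P<operation>{SNAKE})$"
)


def check_span_name(name):
    """Return a list of problems with a span name (an empty list means it is valid)."""
    m = SPAN_NAME.match(name)
    if m is None:
        return [f"{name!r} is not [client.]namespace.operation in snake_case"]
    problems = []
    client = m.group("client")
    if client is not None and client not in CLIENTS:
        problems.append(f"unknown client prefix {client!r}")
    if m.group("namespace") not in NAMESPACES:
        problems.append(f"unknown namespace {m.group('namespace')!r}")
    return problems


for name in ["beacon.receive_block", "lighthouse.beacon.receive_block", "Beacon.ReceiveBlock"]:
    print(name, check_span_name(name))
```

Returning a list of problems rather than a boolean lets a validation script report every issue in a client's span names in one run.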
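For the versioning and validation-script points in Trace Documentation, a script could compare a client's raw JSON trace output against a versioned spec of expected parents and required attributes. This sketch assumes a hypothetical JSON-lines format (`name`, `span_id`, `parent_id`, `attributes`) and a hypothetical `SPEC` layout; its entries come from the `beacon.validate_block` part of the example tree.

```python
import json

# Hypothetical versioned spec: span name -> (expected parent name, required attribute keys).
# Entries follow the beacon.validate_block subtree of the example tree.
SPEC = {
    "version": "0.1.0",
    "spans": {
        "beacon.receive_block": (None, set()),
        "beacon.validate_block": ("beacon.receive_block", {"valid", "validation_error"}),
        "beacon.verify_signature": ("beacon.validate_block", {"signature", "valid"}),
        "beacon.check_proposer": ("beacon.validate_block", {"proposer_index", "expected_proposer"}),
    },
}


def validate_trace(json_lines, spec=SPEC):
    """Return a list of spec violations for one trace given as JSON lines (empty list = OK)."""
    spans = [json.loads(line) for line in json_lines.splitlines() if line.strip()]
    by_id = {s["span_id"]: s for s in spans}
    errors = []
    for s in spans:
        name = s["name"]
        if name not in spec["spans"]:
            continue  # client-specific or not-yet-standardized spans are ignored
        expected_parent, required = spec["spans"][name]
        parent = by_id.get(s["parent_id"])
        parent_name = parent["name"] if parent else None
        if parent_name != expected_parent:
            errors.append(f"{name}: parent is {parent_name}, expected {expected_parent}")
        missing = required - set(s["attributes"])
        if missing:
            errors.append(f"{name}: missing attributes {sorted(missing)}")
    emitted = {s["name"] for s in spans}
    errors.extend(f"{name}: span not emitted" for name in spec["spans"] if name not in emitted)
    return errors


# Usage: verify_signature is attached to the wrong parent and check_proposer is absent.
trace = "\n".join(json.dumps(s) for s in [
    {"name": "beacon.receive_block", "span_id": 1, "parent_id": None, "attributes": {}},
    {"name": "beacon.validate_block", "span_id": 2, "parent_id": 1,
     "attributes": {"valid": True, "validation_error": None}},
    {"name": "beacon.verify_signature", "span_id": 3, "parent_id": 1, "attributes": {"valid": True}},
])
print(f"spec v{SPEC['version']}:", validate_trace(trace))
```

Spans outside the spec are skipped rather than rejected, which leaves room for the client-prefixed, client-specific traces the doc wants to allow.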
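For the sampling note, one common approach is head-based sampling keyed on the trace ID. Every span of a trace then gets the same keep/drop decision, so traces are never partially recorded. The function below is a stdlib-only sketch of that idea, similar in spirit to OpenTelemetry's trace-ID-ratio sampler. The SHA-256 bucketing and the `ratio` parameter are assumptions, not something this doc has decided.

```python
import hashlib


def should_sample(trace_id: str, ratio: float) -> bool:
    """Keep roughly `ratio` of traces; the same trace_id always gets the same decision."""
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be in [0, 1]")
    digest = hashlib.sha256(trace_id.encode()).digest()
    bucket = int.from_bytes(digest[:8], "big")  # roughly uniform in [0, 2**64)
    return bucket < ratio * 2**64


# Usage: with ratio=0.1, about one in ten traces is kept.
kept = sum(should_sample(f"trace-{i}", 0.1) for i in range(10_000))
print(kept)
```

Because the decision depends only on the trace ID, any component that sees the same ID reaches the same decision without coordination.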