Eth2 networking call #4 2020/03/25

---
tags: eth2network
description: Notes from the Eth2 networking working group call
image: https://benjaminion.xyz/f/favicon-96x96.png
---

# Eth2 networking call #4 2020/03/25

[Quick contemporaneous notes by Ben Edgington]

[Agenda](https://github.com/ethereum/eth2.0-pm/issues/137)

## Discussion

Meeting is recorded. Contact Danny for the link.

Aggregation strategy: there are a lot of edge cases. Need to discuss where each team is at.

## Updates

**Protocol Labs (Raul)**

Working hard on GossipSub over recent weeks. The focus has been on hardening rather than on protocol-level changes: 1.0 and 1.1 are actually compatible with each other. Built an internal cross-functional team for attack modelling. Designed mitigations and hardened the Go implementation (peer scoring, grey-listing, back-offs...). Links to the spec and PR are in the chat log below.

Next steps: a dedicated team of 3 is testing the effects of the changes from 1.0. Will release after this work is complete. The internal Red Team will be auditing the spec and implementation. After this, implementations in other languages should adopt the changes. May also hire an external audit firm. Migration from 1.0 to 1.1 is not a huge scope: on the order of days of work.

**Discv5 (Felix)**

Go implementation is up as a PR on the Geth repo. New LRU session-based cache (no longer on disk). Need to speed up the unit tests.

Have identified some things for the next spec version:

- Proposal to change the FINDNODE packet to reduce the number of requests: support multiple distances in a single request.
- Issue with tag construction in every packet. Type confusion can lead to misidentifying packets. Not a big issue, but messy.
- Use compressed elliptic curve keys in the handshake to reduce packet size. However, this is a breaking change.

Felix will implement these to test them, and will create the next spec version over the next week.

**Testing (Proto)**

Looking into network-level testing. Rumor and [Pyrum](https://github.com/protolambda/pyrum) are evolving for writing test scripts. The "Stethoscope" repo is up.
Contact Proto if you want to investigate network testing. Can try [syncing](https://github.com/protolambda/eth2-py-hacks) against client testnets (v0.10.1) to check transitions.

## Networking edge cases

[Adrian Manning/Age] Naive attestation aggregation strategy. The Lighthouse team has found issues during implementation.

- Discovery needs to be fast. Spent lots of time tweaking the Discv5 implementation.
- Validator client to beacon node API. There are some changes in this API, especially around selecting aggregators.
- Dealing with timing thoroughly is a challenge, especially with many validators attached to a node.
- Peer management: to find peers for a subnet, we need to know which long-lived subnets they belong to. This is available via the ENR, but not via libp2p.
- If we are not an aggregator for a slot, do we need to subscribe to a subnet?

[Terence] API: in the old spec, only one round trip was required. In the latest spec, the aggregator needs to sign the aggregated object, so 2 round trips are required.

[Age] Another issue is that we are signing for a future slot, but need to account for the fork version. Potentially need to have only one of my attached beacon nodes (if I have multiple) send an aggregation. If we are not an aggregator, should we subscribe to the subnet?

[Danny] Our subnets have grown much bigger due to crosslinking every epoch. If we are not an aggregator, this is nice: we can just send to the "fanout". However, non-aggregators miss attestations, which can delay their fork choice. They will eventually catch up due to the persistent committee seeing attestations. Feel that this is a great change. Saves on bandwidth and keeps network size down.

How do we find out which subnets our peers belong to? Age has made a [PR for a "ping" protocol](https://github.com/ethereum/eth2.0-specs/pull/1673).

[Danny] How about sending a local identifier as a meta-sequence number, including the ENR if things change?

[Jacek] Would be nice to ring-fence the ENR so that it does not pollute the rest of the protocol.
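Danny's meta-sequence-number idea can be illustrated with a minimal Go sketch. This is not the design in PR #1673. The `Metadata`, `Node` and `PeerTracker` types, the 64-subnet bitfield, and the in-memory "ping" are all hypothetical stand-ins. The point is that the frequent message carries only a sequence number, and the larger metadata payload is requested only when that number advances.

```go
package main

import "fmt"

// subnetCount is an assumption for illustration only.
const subnetCount = 64

// Metadata is a hypothetical record a node advertises to connected peers.
// SeqNumber is bumped every time any other field changes.
type Metadata struct {
	SeqNumber uint64
	Attnets   [subnetCount]bool // long-lived attestation subnets
}

// Node holds its own metadata and answers pings and metadata requests.
type Node struct {
	meta Metadata
}

// SetSubnet updates a long-lived subnet subscription and bumps the
// sequence number only if something actually changed.
func (n *Node) SetSubnet(subnet int, on bool) {
	if n.meta.Attnets[subnet] != on {
		n.meta.Attnets[subnet] = on
		n.meta.SeqNumber++
	}
}

// Ping returns only the sequence number: a small, frequent message.
func (n *Node) Ping() uint64 { return n.meta.SeqNumber }

// GetMetadata returns the full (larger) payload. The struct, including
// the array, is copied by value, so callers cannot mutate the node's state.
func (n *Node) GetMetadata() Metadata { return n.meta }

// PeerTracker is the view one node keeps of a single remote peer.
type PeerTracker struct {
	known    Metadata
	haveMeta bool
	Fetches  int // number of full metadata requests made so far
}

// Poll pings the peer and fetches the full metadata only on first contact
// or when the peer's sequence number has advanced.
func (t *PeerTracker) Poll(peer *Node) Metadata {
	seq := peer.Ping()
	if !t.haveMeta || seq > t.known.SeqNumber {
		t.known = peer.GetMetadata()
		t.haveMeta = true
		t.Fetches++
	}
	return t.known
}

func main() {
	peer := &Node{}
	tracker := &PeerTracker{}

	tracker.Poll(peer)      // first contact: full fetch
	tracker.Poll(peer)      // unchanged: ping only
	peer.SetSubnet(7, true) // peer joins subnet 7, seq becomes 1
	m := tracker.Poll(peer) // seq advanced: full fetch again
	tracker.Poll(peer)      // unchanged: ping only

	fmt.Println(tracker.Fetches, m.SeqNumber, m.Attnets[7]) // prints: 2 1 true
}
```

With this shape, a peer that never changes its subnets costs one small ping per poll. The full payload is fetched once at first contact and again only after a change, which is where the bandwidth saving would come from.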
[Felix] For real-time, in-protocol information exchange, the ENR is not ideal. For one thing, it needs to be signed for tamper resistance. But if we are connected to a peer, we already have an authenticated session.

[Danny] Two paths: (1) put attestation subnets into the status message; (2) put a node sequence number into a ping, and send a larger payload message if it changes.

[Felix] Should allow transporting the message by different means. A future version of Discv5 will have pre-connection negotiation. This could include metadata beyond what the ENR contains.

[Danny] Do we all want a ping protocol?

[Age] We don't currently have a way to identify when peers have dropped out.

[Jacek] Shouldn't that be at the libp2p level?

[Raul] If the transport is TCP, then TCP keepalives are occurring, so we should see a disconnection notification if the peer dies.

[Felix] TCP keepalives are not reliable, and should be a last resort. Timeouts could be long. Better to do this at the application level. Devp2p uses Ping and Pong messages. These can be used to transfer other info as well, such as sequence numbers. Much stronger signal.

[Raul] Perhaps. The yamux multiplexer runs lazy pings, used only during periods of inactivity.

[Age] Seeing interesting protocol negotiation timeouts on the testnet.

[Danny] Like the application-level ping with a sequence number formulation. Do we feel this is worth exploring?

**Action: take discussion [to the PR](https://github.com/ethereum/eth2.0-specs/pull/1673)**

[Age] Discovery speedup strategies:

- Removed NATted peers from the DHT.
- Changed things so that ENRs don't need to specify an IP address.
- In FINDNODE queries, reduce the timeout before deciding a peer is uncontactable, and move on to another peer in parallel.
- Queries that finish early after finding a specified number of peers.
- Filter for matching peers.

[Felix]

- Nice trick: construct an iterator of nodes that keeps advancing as peers are pulled off it. Run concurrent lookups. Can rate-limit outgoing packets. This gives a broad scan of the network at a configurable rate.
- Use configurable distances, as mentioned above.

[Danny] Summary: (1) In the Eth2 APIs there is an updated protocol between validator clients and beacon nodes. (2) Will look at implementing pings. (3) Non-aggregators only need to publish. Teams: please talk to each other!

[Age] For beacon nodes that have validators attached, prioritise peers that are members of long-lived subnets (rather than peers that have no validators attached and are therefore not subscribed to any subnets).

[Danny] Concern that this favours "supernodes" that are attached to many subnets, particularly if we all seek peers that have the maximal number of subnets.

[Felix] Also good to avoid having too static a network of peers, so that new nodes can always join. Good to have some dynamism.

Q: How are people handling caching of peers?

[Age] Discv5 puts them in the routing table, and persists this to disk on shutdown.

[Felix] Can also keep an on-disk cache of past nodes we've found, as a fallback. Works well in Geth. Need to expire nodes eventually due to churn. Helps with rejoining the network.

[Nishant] Validator privacy. It would be pretty easy to map IP addresses to public keys.

[Danny] This is a known problem: secret single leader election may come. Decoupling validators from nodes helps, e.g. publish attestations to one node and publish blocks to a different node to avoid DoS. Still hand-wavy.

## EthCC follow-up: test plans

[Nicolas L] [Checklist](https://github.com/ethereum/eth2.0-pm/issues/131#issuecomment-594632185) plan for the incentivised network.

[Danny] Expect ~600-2000 beacon nodes in Phase 0. Scaling beyond the ~100 range is very undertested so far. Not seeing any issues yet: optimistic, but unknowns remain.

[Nicolas] Shall we add the table to the spec to guide test plans?

**Action: port table to a PR on the spec**

[Jonny] Has also thought through some [network testing scenarios](https://hackmd.io/RvAKxmNORjusHelg4eoF4g?view).
**Action: add link to the table**

[Proto] The point of the table is also to phase the requirements, with intermediate milestones.

## Spec discussion

Already covered some of this.

[Jonny] Questions around expectations of how beacon nodes will treat light clients etc. How does reputation factor into this? Will beacon nodes disconnect from peers that don't adhere to all the "must" items?

[Danny] Haven't considered this much yet. Worth doing a pass through and seeing what's viable. Light clients don't need to join subnets, so that part doesn't matter. How light clients obtain blocks may be an issue. ZK proofs or polynomial commitments could be a solution, but are not likely to be implemented. Maybe make light client sync a separate protocol, and avoid block syncing. So the answer may be that light clients don't really participate in many of these protocols.

[Jonny] Do we care about distinguishing between light clients and regular nodes?

[Danny] Yes, this will probably happen. E.g. non-light-client servers should be able to disconnect from light clients, and vice versa.

**Action: Danny to look through the "musts" for issues. Jonny to open an issue to track this.**

## Chat log highlights

From Raúl Kripalani to Everyone: 01:05 PM:

- SPEC: https://github.com/libp2p/specs/blob/master/pubsub/gossipsub/gossipsub-v1.1.md
- PR: https://github.com/libp2p/go-libp2p-pubsub/pull/263/

From Marin Petrunić to Everyone: 01:23 PM: Feel free to checkout current api proposal: https://github.com/ethereum/eth2.0-APIs/pull/15

From Nicolas Liochon to Everyone: 02:01 PM: https://github.com/ethereum/eth2.0-pm/issues/131#issuecomment-594632185

From Jonny Rhea to Everyone: 02:07 PM: https://hackmd.io/RvAKxmNORjusHelg4eoF4g?view