Discovery peer advertisement efficiency analysis

# Discovery peer advertisement efficiency analysis by TX/RX Research 1. Intro (you are here) 2. [Advertisement solutions](#Advertisement-solutions) a. [Advertisement in Kademlia](#Advertisement-in-Kademlia) b. [Advertisement in Discovery V5](#Advertisement-in-Discovery-V5) c. [Advertisement in ENR](#Advertisement-in-ENR) 3. [Requirements](#Requirements) a. [Ethereum 1.x requirements](#Ethereum-1.x-requirements) b. [Ethereum 2.0 requirements](#Ethereum-2.0-requirements) 4. [Efficiency and metrics](#Efficiency-and-metrics) 5. [Simulator](#Simulator) 6. [Simulation results](#Simulation-results) a. [Efficiency metrics](#Efficiency-metrics) b. [Proposals](#Proposals) 7. [Conclusion](#Conclusion) 8. [References](#References) *Discovery* is a family of p2p protocols, a fork of well-known *Kademlia*[[1]](#references-1) protocol, which was designed as a decentralized distributed hash table (DHT). Previous version, *Discovery V4*[[2]](#references-2) is already used in *Ethereum*[[3]](#references-3) network for node lookup and discovery but has few obvious flaws and requires some improvements. *Discovery V5*[[4]](#references-4) is introduced to move forward and get a better tool for peer knowledge propagation mechanism. Though *Kademlia*, ancestor of *Discovery* is designed as DHT, *Discovery V4* doesn't use all its features and doesn't have any resources to search. *Discovery V5*, by contrast, is designed with resources search and lookup feature, *topic advertisement*, where resources are node records, ENRs[[5]](#references-5). While closeness[[6]](#references-6) usage in regular peer routines plays a secondary role and reflects only in network structure uniformity, when the peer gets a task for advertisement lookup, closeness becomes a fundamental mechanism to find ENR ads. In *Kademlia* resources are files and hashes of keywords are used to find node IDs closest to hash, which are expected to store appropriate files. In *Discovery V5* there are 2 approaches for placement of topic advertisements: by closeness and random, first uses Kademlia closeness to choose advertising media for topic ads, second places ads on randomly chosen nodes[[7]](#references-7). In this research we'll try to pick out efficiency performance indicators such as average and maximum time spent on search, churn resilience, smallest number of nodes per topic (which leads to network topic capacity) and other indicators. Next, *Discovery V5* and alternative advertisement solution already suggested by Ethereum community (ENR attributes[[8]](#references-8)) will be compared in network with the same properties by efficiency using several indicators in the simulator. Finally we are going to review results and start discussion on what these results could mean for us. ## Advertisement solutions[^](#Discovery-peer-advertisement-efficiency-analysis) When you have thousands or even millions of nodes with different properties in a p2p network, you need a mechanism to find appropriate node(s) upon request. Picking each one by one and verifying its conformity will take a bunch of time. It's a case where advertisement properties of protocol break in. ### Advertisement in Kademlia[^](#Discovery-peer-advertisement-efficiency-analysis) Original advertisement solution proposed by author of Kademlia for usage in free distributed file-sharing networks. To store a (key, value) pair, a participant locates the *k* closest nodes to the key and sends them *STORE* RPC. In file sharing network value was usually a file and key - hash of keyword associated with it. Lifetime of data is set to 24 hours. Additionally, each node re-publishes (key, value) pairs as necessary to keep them alive. Not diving deeper in re-publish procedure, its main goal is to keep record alive while performing significantly less than *k \* (k-1)* requests on each cycle. To find a (key, value) pair, a node starts by performing a lookup to find the *k* nodes with IDs closest to the key. However, value lookups use *FIND_VALUE* rather than *FIND_NODE* RPCs, but if responder doesn’t have value in storage, it returns the result of FIND_NODE with the same parameter from the value query. For caching purposes, once a lookup succeeds, the requesting node stores (key, value) pair at the closest node it observed to the key that didn't return the value. This way the radius of any popular records is increasing by one with each lookup. To avoid over-caching an expiration time of a (key, value) pair in any node's database exponentially inversely proportional to the number of nodes between the current node and the node whose ID is the closest to the search key. Both republishing and caching or either one were omitted in several implementations to improve validity of lookup results. In networks with a lot of peers joining for a little period of time caching removal improves search time by stopping dead-end lookups. ### Advertisement in Discovery V5[^](#Discovery-peer-advertisement-efficiency-analysis) Discovery v5 protocol includes a scalable facility for registering ‘topic advertisements’. The list of ads for a particular topic is called the ‘topic queue’ because it functions like a FIFO queue of limited length. Every node limits the total number of ads to 50,000 and 100 of ads per topic. Ad duration is set to 15 minutes, after that time the advertiser node should re-register the ad. Node could only place an ad of node itself, so there are no any redistribution mechanism. There considered two options for ad placement - *(a)* random distribution and *(b)* placement within topic radius based on closeness. In *(b)* node calculates target subset, which is simply a region of node ID address space, consisting of nodes whose Kademlia address is within a certain distance to the topic hash `sha256(T)`. This distance is called the 'topic radius'. At first, it was proposed[[9]](#references-9) to estimate topic radius by keeping track of the average time to successful registration within segments of the address space surrounding the topic hash. So, the advertiser samples the time it takes to place an ad in each segment and adjusts the radius such that registration at the chosen distance takes approximately `target-ad-lifetime / 2` to complete. As this procedure fails to provide a good estimation other approaches are considered[[10]](#references-10). To find nodes for a topic, the searcher generates random node IDs inside the estimated topic radius and performs Kademlia lookups for these IDs. Radius estimation for topic search is similar to the estimation procedure for advertisement, but samples the average number of results from TOPICQUERY instead of average time to registration. All (intermediate) nodes encountered during lookup are asked for topic queue entries using the TOPICQUERY[[11]](#references-11) packet. ### Advertisement in ENR[^](#Discovery-peer-advertisement-efficiency-analysis) As an alternative approach of node classification in network, Ethereum 2.0 specification suggests[[12]](#references-12) to use ENR's[[13]](#references-13) attribute fields. On start every node should randomly pick 1 of 64 subnetworks to participate in and record subnetwork name in its node record which is used in discovery as node info card, default piece of information about a node used in peer exchange. After that, once in a very big interval (between 27 and 54 hours) node changes its current subnetwork in ENR by picking another random, leaves old and joins a new one. So, most of the time nodes are part of subnetwork not using it in supposed manner, but maintaining the skeleton of it. When any node gets a task to join subnetwork #X for real work, it could easily find several members and join it. While this approach is presented to be temporal and looks like a hack from a certain point of view, it has several properties, from which the whole system is benefited, at least when we look at Ethereum 2.0 usage. Subnetwork has a lot of members while the set of currently active nodes is just a small fraction of it, which means greater safety for the active members with keeping all participants info opened. Information about all subnetwork participants is automatically spreaded and redistributed across the whole network and doesn't require complex deduplication like *Kademlia*. Plus, every node couldn’t participate in a big number of subnetworks as ENR size is fairly small. So it’s difficult to act as a malicious actor by advertising on a big number of different topics or by one topic in one part of the network and another in the rest. However, it’s scaling properties to some bigger number of subnetworks than currently proposed (64) are questioned and addressed in simulation. Lag when switching from one subnetwork to another could be very big, let’s recall, in the eyes of other network members. There is no mechanism to efficiently find members of certain subnetwork without getting information from all others. Overall it looks like a good start with a number of positive inherited properties, but needs a lot of improvements to become usable. ## Requirements[^](#Discovery-peer-advertisement-efficiency-analysis) You cannot design an effective system without knowing its requirements. Overprovisioning increases costs and could slow down operations in some configurations. With underestimation in design the system dies out from a load and could require structural changes on expanding. High complexity leads to high number of errors in implementations and unexpected halts in processing. Design of any system should start from understanding of what's needed for and what goals are expected to achieve. Discovery V5 will be used in Ethereum 1.x and Ethereum 2.0 networks. ### Ethereum 1.x requirements[^](#Discovery-peer-advertisement-efficiency-analysis) Ethereum 1.x possible advertisement goals include but not limited to: - determining part of peers serving as LES servers[[14]](#references-14) and balancing load between them - splitting peers to 2 or 3 networks when each participate in different way in distribution of Ethereum state, updates and proofs[[15]](#references-15) - data locator for stateless approach[[16]](#references-16) We should notice that first and second goals require advertisement of only 2-3 types of nodes, plus the number of advertised nodes is comparable to the size of the whole network, say 10% or 50%. Last one requires an advertisement system with greater complexity, but it is currently in the early stage of development. It means design requirements for advertisement systems capable of handling such a task couldn’t be formed to date. ### Ethereum 2.0 requirements[^](#Discovery-peer-advertisement-efficiency-analysis) Current Ethereum 2.0 p2p specification suggests using a per-committee subnet[[17]](#references-17) with `MAX_COMMITTEES_PER_SLOT` set to 64. Subnetworks should be stable, so every time we join the network we will see exactly 64 subnets alive. To make this feasible each validator must randomly select and remain subscribed to RANDOM_SUBNETS_PER_VALIDATOR (1) attestation subnets. So when active in committee, the validator participates in 2 subnetworks, all the remaining time in 1. After some random duration between EPOCHS_PER_RANDOM_SUBNET_SUBSCRIPTION (27 hours) and 2 * EPOCHS_PER_RANDOM_SUBNET_SUBSCRIPTION] peers should switch to other random subnets following ENR change. *IDs* of subnets are fixed, however members will join and leave depending on their needs. Though we couldn't say exact number of validators at some future time, it's capped from minimal side by network start constant `MIN_GENESIS_ACTIVE_VALIDATOR_COUNT`[[18]](#references-18) and by Ethereum issuance from the top, which is not capped but there were propose to put it on 120M[[19]](#references-19) (it could be an April, 1 joke, however), plus return of investment with validators stacking 100M ETH drops already below useful rate for any kind of currency[[20]](#references-20) (but not cryptocurrency). So, we could safely define borders of each subnetwork skeleton with about 256 to 50,000 validators. Plus 256-384 validators could join to work on their duties in the subnet during current epoch. `TARGET_COMMITTEE_SIZE` is set to 128 and validators are getting know about their future work in 1 epoch (6.4 minutes with current constants `SECONDS_PER_SLOT` of 12 and `SLOTS_PER_EPOCH` equals to 32) with shuffling of committees in every epoch. So we are going to see a current validator set (128) for each subnetwork, upcoming validator set (another 128) and one epoch switch validators from the epoch before previous could be still there in the very first seconds. It leads to a maximum of *3 x committee size*, but regularly just 2x. In order to finish size/share requirements we should notice that the validator is not equal to the peer and one peer could serve several validators. Opposite should be uncommon due to increased costs and complexity. Plus current specification requires validators to find members of appropriate subnetwork in 1 epoch (6.4 minutes at the moment of writing) before fulfilling its duties[[21]](#references-21). Though a specific number of members is not present, we should use some safe number, 3 or greater. By safety we should include probability of peers to be dead or halted, discover erroneous or malicious nodes which could fail to provide and distribute required messages. Future Ethereum 2.0 phases may require more subnetworks, but other types of DHT usage are not proposed to date. ## Efficiency and metrics[^](#Discovery-peer-advertisement-efficiency-analysis) With different approaches in creating and maintaining advertisements we need to pick out a number of metrics to compare them in terms of efficiency. While security should be another important consideration for each solution, it lies outside the scope of this study. Following metrics are measured for all advertisement processes: 1. **Time and traffic spent by an advertiser**. Measurement of time for one advertiser to find appropriate media and publish ads. As an ad has lifetime in any system, it should be valuable to measure all time and traffic spent in 24 hours on creation and maintenance of advertisement. [>>>](#1-Time-and-traffic-spent-by-an-advertiser) 2. **Time and traffic spent by the media**. While in some solutions media doesn’t spend any time except on replies to advertisers, other systems include republishing features which involve media to participate in advertisement placement. In all systems media is spending traffic when accepting ad placement and returning the results of ad searches. But we are going to measure only ad placement here as ad search traffic varies by number of lookups and is covered by 6th and 7th metrics. Same 24 hours period is measured. [>>>](#2-Time-and-traffic-spent-by-the-media) 3. **Time and traffic to establish small network fraction advertisements**. Generally it’s different how information is propagated when only a small fraction is advertised compared to a significant part of the system. In this metric we are going to measure efforts of placing an advertisement, which advertises 0.1% of the network. It’s not practical to measure 100% coverage in such metrics, so 50%, 80%, 99% percentiles are measured. [>>>](#3-Time-and-traffic-to-establish-small-network-fraction-advertisements) 4. **Time and traffic to establish big network fraction advertisements**. Depending on the strategy which is used for advertisement spreading, time to place an advertisement could be changed dramatically when a big part of the population is involved. Some systems are getting benefits from it, others get linear spent time changes compared to small fraction advertisements. Same here, we measure the moment when 50%, 80%, 95% of network population got an information about new advertisement. [>>>](#4-Time-and-traffic-to-establish-big-network-fraction-advertisements) 5. **Time to stop big and small network fraction advertisements**. It's not the whole job to place an ad, at one time ad should expire and the same way information doesn't spread immediately when placing an ad, it spreads with lag when the ad becomes obsolete. So, we measure, when 50%, 80%, 95% of population are getting know that advertiser is not supporting an ad anymore. [>>>](#5-Time-to-stop-big-and-small-network-fraction-advertisements) 6. **Time and traffic to find 8 advertised peers when advertisers are a small fraction (64 of 10,000)**. In this test we start with an established advertisement of 64 peers and try to find 8 of them. Traffic and time is compared. As it’s the case of minimal supported advertisement, it could be valuable to measure also, is it possible to go under 64 ads and still be found and what to expect on low end. [>>>](#6-Time-and-traffic-to-find-8-advertised-peers-when-advertisers-are-a-small-fraction-64-of-10000) 6a.**What’s the smallest fraction of advertised peers to be found** [>>>](#6a-What’s-the-smallest-fraction-of-advertised-peers-to-be-found) 8. **Time and traffic to find 8 advertised peers when advertisers are a big part of the network**. Same here, but we have about 5% of peers being advertised and want to find 8 of them. [>>>](#7-Time-and-traffic-to-find-8-advertised-peers-when-advertisers-are-a-big-part-of-the-network) 9. **Churn resilience when looking up for advertised peers**. Measure how lost packets will affect both advertisement placement and lookup and which solution serves better in bad network conditions. [>>>](#8-Churn-resilience-when-looking-up-for-advertised-peers) ## Simulator[^](#Discovery-peer-advertisement-efficiency-analysis) In order to measure metrics concerned, a <a href="https://github.com/zilm13/discv5/tree/ad-efficiency">simulator</a> was made in Kotlin language which compiles to JVM code. While previous work[[22]](#references-22) researching node lookup in Discovery allowed a lot of simplifications in simulation, measurements of topic advertisement using Discovery V5 protocol and ENR attributes required implementation very close to production, though still number of simplifications were applied: - no real network peers talk. Peers exchange messages using `Router`[[23]](#references-23) class. - steps instead of real time message processing. Two options were considered in order to reduce real time nature of simulation subject: step measurements and scheduler control. Steps were chosen being a simpler solution. One step is equal to one network leg of the roundtrip network message life cycle, so *PING-PONG* is done in 2 steps. - no handshakes, cryptography, omitting of non-subject payload. Traffic estimates for all packets includes cryptography parts of them, but handshakes are omitted from traffic and time measurements To guarantee reproducibility of results, seeding of randomness was used and almost all tests were executed 3 times with different seeds (1, 2, 3). As peers have their duties even without participation in any advertisements schemes, traffic and steps were compared when peer is involved in advertisement and when not. ## Simulation results[^](#Discovery-peer-advertisement-efficiency-analysis) ### Efficiency metrics[^](#Discovery-peer-advertisement-efficiency-analysis) #### 1. Time and traffic spent by an advertiser In Discovery V5 efforts spent on advertisement are straightforward: calculate target topic ID, discover nodes in radius around it, place an ad in required radius, repeat for 24 hours every 15 minutes[[24]](#references-24). To simplify simulation we have no pressure on advertisement placement, all ads go live without retries, though in the real system there will be additional traffic and retries. For ENR one could declare that there will be no traffic and time spent for the advertiser, but if a peer hasn't changed ENR, the environment is ok with their information about this peer. Instead if the sequence number is changed they are obliged to do `FINDNODE(0)` and get fresh ENR for it. So this is time and traffic spent by advertisers. But it happens only once in 24 hours (to be correct, in 27~54 according to specification) and distant observers get ENR updates through intermediate peers. Our measurements show that traffic and time grows with number of peers for topic advertisement, but non-significantly, doesn’t vary much with number of advertised nodes (though we excluded simulation with 100 peers/topic limit on one peer, which should be greatly affected by it). The main issue in this measurement and in all following is the method of how we measure extra traffic. We run two executions with the same random seed, one with placement of advertisement and another without and measure delta. When we have a lot of steps after ad placement, traffic spent on it is blurring because there are a lot of other node duties which vary with node and time. We are going to solve it with the calculation of delta on a small number of steps, right after ad is published. For topic advertisement we made 3 executions with 10,000 peers and different seeds, 10 nodes were advertised. In measurement of 100 steps we got 58.1 step and 88,913 bytes per advertised node every 15 minutes which gives us following numbers for daily advertisement: 8,535,648 bytes/node 5577.6 steps/node, when step == 100 ms, 557.76 seconds/node For ENR we did the same 3 runs on 10,000 nodes. While in case with topic advertisement it's clear when task duties are over, in ENR changes the complexity of measurement is that it's not clear when it's over. At first let's look at spreading metrics, showing share of network with knowledge of ENR changes: ![](https://i.imgur.com/sCw6nfA.png) Figure 1a *Figure 1a* shows that the fraction involved in changes doesn’t affect new ENR knowledge distribution, but the size of the network has some impact on it. In a network with 10,000 peers we have a 90% distribution around the 67th step, same time in 2,000 we could see it around the 43th step, though after the 80th step they run pretty close. Let’s compare traffic approximation for the last case, 20 advertised nodes among 2,000 total peers using following formula: TE = ΔT / d - **TE** - Estimated traffic - **ΔT** - Difference between advertised nodes traffic with changed ENR and when running with the same seed without changes in ENR - **d** - Distribution of ENR at this step ![](https://i.imgur.com/L81THta.png) Figure 1b And we see that we have quite stable results on 40 - 140 steps, and a lot of noise after it, as on 140 we are somewhere around 95%. 100th step is around 93% of distribution for both 2,000 and 10,000, so, we execute the next batch of simulations with 93% distribution cutoff and calculate estimated traffic using the provided formula. And we use 0 for steps spent as advertised nodes didn’t put any effort in distribution. It will be true until we change strategy and force distribution, say, by increasing the number of PINGs right after ENR change or whatever. So, here are the traffic numbers for ENR change, average from 3 executions with different seeds: 57,095 bytes/node. And as we need to change it once in 27-54 hours[[25]](#references-25), we divide it by 1.6875 as we do it less often than once in 24 hours, 33,834 bytes/node. ![](https://i.imgur.com/JhpO4MX.png) Figure 1c #### 2. Time and traffic spent by the media It should look very similar to the advertiser's case: media efforts are pretty clear for topic advertisement and not so for ENR change, but it is not. First of all, target media will vary a bit from advertiser to advertiser due to search paths, network knowledge inconsistency. Next we have cases with more than 100 advertisers per topic (which is not a part of this research) and different client implementations in a real network. So we are not going to measure it by single media, instead efforts are summarized for all media in case and divided by number of advertisers, traffic spent by media for each advertiser. And… it’s going to be the number from the previous metric, because we do not distinguish incoming and outgoing traffic, but we had some extra traffic on media search, so we are going to measure only target media traffic, it will be a bit smaller. Next, the question becomes super-complicated for ENR as media are all the peers in the network. But how much extra traffic spend nodes that is not in place in their normal duties? Only on `FINDNODE(0)` queries. And we are going to count it. Looking at time spent for both solutions, it’s 0 for topic advertisement as the media doesn’t spend any extra time for advertisement or redistribution. But it's all `FINDNODE(0)` queries for ENR change because in normal cases observers don’t need to make extra requests for new ENR. For topic advertisement we got 2,160 Kb of media traffic per advertiser during 24 hours and 0 steps as mentioned above. For ENR we have 189 `FINDNODE(0)` requests per advertiser per advertisement cycle, which give us 112 requests per 24 hours. It's 224 steps spent by all media for each advertiser and 34 Kb of traffic. ![](https://i.imgur.com/youGaqf.png) Figure 2a #### 3. Time and traffic to establish small network fraction advertisements By small we will use 0.1%, 10 advertisers from 10,000 peers, and we are not measuring an effort during 24 hours like in previous metrics, just one advertisement act. Next, we are going to measure efforts spent by all parties on placing advertisements to cover 50%, 80%, 98% percentiles of the whole network. For topic advertisement we could count that we have 100% when the ad is placed as all other responsibilities are assigned to searchers. Assuming step is always 100ms in topic advertisement, we get 5.8 seconds spent by advertiser on average and 86.83 Kb. And we are at 100%. For 50%-95% numbers are a bit smaller but as most work is done during search of media, not significantly. For ENR we assume that each step takes 5 seconds with some parallelism, we ping 3 nodes from our Kademlia table every tick and it sounds like real life client configuration. We measure traffic increment only on peers that have changed ENR, as there are no visible spendings for other network participants and if we compare whole network traffic there will be a lot of noise in it. ![](https://i.imgur.com/SfOnUEJ.png) Figure 3a #### 4. Time and traffic to establish big network fraction advertisements Same as in previous metrics, but now a big part of the network is involved: 5%. As 500 peers in 10,000 network will be heavily hit by topic advertisement limit of 100 peers per topic, and there is no working formula for radius estimation, which will give the same figures both for advertiser and searcher[[26]](#references-26) and distribute search results equally, when queried, we are going to avoid this case by ignoring this limit, though with using radius estimation topic advertisement time and traffic will increase in comparison with our measurement. Same percentiles are measured: 50%, 80%, 95%. As topic radius is not used in simulation, we have pretty similar numbers to previous measurement for topic advertisement: 5.5 seconds spent average per advertiser and 89.11 Kb. For ENR time slightly increased but traffic expenditures fell compared to simulation with smaller fraction with advertised peers. ![](https://i.imgur.com/OOKtvYB.png) Figure 4a #### 5. Time to stop big and small network fraction advertisements It’s not the whole job to place an ad, at one time an advertiser could decide that he doesn’t want to support the ad anymore and he needs a tool to stop an ad. And information doesn’t spread immediately, it spreads with lag when ad becomes obsolete. So, we measure when 50%, 80%, 95% of the population are getting to know that the advertiser is not supporting an ad anymore. For topic advertisement it’s pretty clear that the advertiser couldn’t stop an ad, but as it has expiration time, it will stop itself in no more than 15 minutes. As we measure average, we could say it takes 7.5 minutes for topic advertisement to stop after the operator decided to quit or whatever. For ENR we have the same figures as in two previous metrics, let's combine it. ![](https://i.imgur.com/hl1Zywq.png) Figure 5a #### 6. Time and traffic to find 8 advertised peers when advertisers are a small fraction (64 of 10,000) We start in a 10,000 peers network, small fraction, 64 peers are advertised (0.64%, higher value, than in [[3. Time and traffic to establish small network fraction advertisements]](#3-Time-and-traffic-to-establish-small-network-fraction-advertisements), though you could split the network up to 156 fractions with different roles in this way). And we want to find 8 of them to fulfill our duties, 8 was chosen as it looks like a number safe enough to get into subnetwork even if part of nodes are down or malicious. We start searching after 95% of advertisement is in place. For topic advertisement we don't require to use several media sources for searchers, though minimal safety compliance requires usage of several media sources. We got 33.2 steps on average here, which translates to 3.3 seconds and 104 Kb of traffic for each searcher. For ENR we got 123.3 steps on average, which translates to 12.3 seconds, though compared to topic advertisement searchers, 1/30 pass spent 28.7 seconds on it, second worst was with 19.7 time, so rarely it's more than 2 times slower than average. And it definitely needs improvement, which is offered under [[Proposals]](#Proposals). And we got traffic expenditures corresponding to long search: 482 Kb per searcher on average. Both values are 4 times bigger than we get with topic advertisements. ![](https://i.imgur.com/rRpdgZL.png) Figure 6a ![](https://i.imgur.com/nujfDx5.png) Figure 6b #### 6a. What's the smallest fraction of advertised peers to be found As we want some tolerance fault environment we are not going to try extreme configurations like 1 peer advertised. Smallest party we are going to test - find 8 from 16 of 10,000. If it's achievable for both we are going to increase whole network size, but still search for 8 of 16. As ENR failed to find 8 peers with 50 total advertised (0.5%) in 30 seconds, 1 of 10 searcher found only 7 in half of minute, but goal of the metric is to see all searchers get success in discovery and currently we cannot go down to 0.5% of network (1/200). We postpone this test until [[B. FINDNODE by attribute]](#B-FINDNODE-by-attribute) put in place and retest it with improvement. Topic advertisement could easily guarantee better values in smaller time, so ENR approach definitely need fixes there, which is offered under [[Proposals]](#Proposals) #### 7. Time and traffic to find 8 advertised peers when advertisers are a big part of the network Very similar to [[6. Time and traffic to find 8 advertised peers when advertisers are a small fraction (64 of 10,000)]](#6-Time-and-traffic-to-find-8-advertised-peers-when-advertisers-are-a-small-fraction-64-of-10000), but we start with 5% of 10,000 peers network being advertised (500 peers) and trying to find at least 8 of them. The strange thing was to get 453.33 Kb on average per searcher in topic advertisement with almost the same time to search as in [[6. Time and traffic to find 8 advertised peers when advertisers are a small fraction (64 of 10,000)]](#6-Time-and-traffic-to-find-8-advertised-peers-when-advertisers-are-a-small-fraction-64-of-10000). The guess is as we had no limit on advertised ads per topic, we got all 500 in answer, but even with a limit of 100 it’s too much. With any limit, we don’t want to get more than 16, `K-BUCKET`, in one reply to a query and there is no interface to scan it. For simplicity we changed `TOPICQUERY` behavior to return random 16 of advertised and rerun simulation. After adding the `K-BUCKET` limit, topic advertisement sounds much better: 68.4 Kb of traffic for each searcher with 3.3 seconds spent on lookup of 8 ads on average. It will be safe to expect traffic is going to drop in [[6. Time and traffic to find 8 advertised peers when advertisers are a small fraction (64 of 10,000)]](#6-Time-and-traffic-to-find-8-advertised-peers-when-advertisers-are-a-small-fraction-64-of-10000) with this patch too, but we are not going to rerun simulation. For ENR we got 120.8 Kb of traffic per searcher and 3.3 seconds time. Time is on par with topic advertisement, but even in ideal conditions traffic spent is much higher. We are going to address this with [[B. FINDNODE by attribute]](#B-FINDNODE-by-attribute). ![](https://i.imgur.com/aoK7Slf.png) Figure 7a ![](https://i.imgur.com/eUQDHJy.png) Figure 7b #### 8. Churn resilience when looking up for advertised peers While we already had some churn in all tests: there was a 20% probability when pinging an old peer in full Kademlia is failed, we need to put more pressure on network flows, losing packets everywhere and check how both ENR and topic advertisement will handle. We are not testing some kind of DoS attack here, as it’s useless for ENR and only topic advertisement could suffer from targeted attack, plus this work is dedicated to efficiency, not safety, so it’s just a bad network everywhere, without direct targets. To not run ahead, as a starting point we chose [[4. Time and traffic to establish big network fraction advertisements]](#4-Time-and-traffic-to-establish-big-network-fraction-advertisements) and [[7. Time and traffic to find 8 advertised peers when advertisers are a big part of the network]](#7-Time-and-traffic-to-find-8-advertised-peers-when-advertisers-are-a-big-part-of-the-network) cases with 500 advertisers in 10,000 peers network as ENR and topic advertisement results are close here in both ad placement distribution and search, though ENR is still significantly behind in time of distribution when ad placement is a one time task. Though this setup is not an extreme configuration, from other tests all processes in the network look linear and the question is, what is the speed of degrading both advertisement methods in a bad network. 25% replies to any queries lost rate is tested. We are stopping at 80% both in measurement of distribution of ad and as starting point of ad search in this simulation to make it faster. ![](https://i.imgur.com/zY4OrSe.png) Figure 8a As you can see on *Figure 8a* while the topic advertisement didn’t suffer too much from churn (actually time grows from 5.3 seconds to 7.2 seconds for 80% placement), ENR time and traffic grows significantly. Surprising was to see even smaller traffic for topic advertisement but it's explained with faster media search which is the most of topic advertisement placement traffic. As packets are lost, peers think earlier that they discovered the best media for their ads, so we get less traffic. The drawback of this is wider ad placement, which shouldn't be an issue for 500/10,000 peers advertisement, but it will decrease searchability of small fraction advertisement. Let's see what's on the search side. ![](https://i.imgur.com/ywRx8b7.png) Figure 8b Not a big deal both for topic advertisement and ENR. We are safe here. ### Proposals[^](#Discovery-peer-advertisement-efficiency-analysis) #### A. Ping shower[^](#Discovery-peer-advertisement-efficiency-analysis) In [[3. Time and traffic to establish small network fraction advertisements]](#3-Time-and-traffic-to-establish-small-network-fraction-advertisements) and [[4. Time and traffic to establish big network fraction advertisements]](#4-Time-and-traffic-to-establish-big-network-fraction-advertisements) it was shown that though ENR attribute approach is very effective in terms of traffic expenditures for advertisers and whole network, establishing an advertisement in this manner takes significant time, 4 minutes if you are ok with 80% distribution of knowledge or even 10, when we talk about 95%. The idea is to decrease this time by notifying explicitly all nodes in the advertiser Kademlia table by sending them `PING` with a new sequence number. Let's try and see how it works, compared to [[3. Time and traffic to establish small network fraction advertisements]](#3-Time-and-traffic-to-establish-small-network-fraction-advertisements). ![](https://i.imgur.com/NvWU6AI.png) Figure PA1 Looks like it is not working. Wouldn’t recommend it. The reason behind is that the peers that are in our Kademlia table don't include our peer in their tables, especially top buckets. Only lower buckets could know about us, so by pinging, say, 200 peers, we are reaching only 20 peers who have our peer on file and want to update its info. If we could know which peer has our info in their Kademlia table, we could make this approach effective, but collecting such information looks a sophisticated task. #### B. FINDNODE by attribute[^](#Discovery-peer-advertisement-efficiency-analysis) In order to improve search times in [[6. Time and traffic to find 8 advertised peers when advertisers are a small fraction (64 of 10,000)]](#6-Time-and-traffic-to-find-8-advertised-peers-when-advertisers-are-a-small-fraction-64-of-10000) and reduce traffic in [[7. Time and traffic to find 8 advertised peers when advertisers are a big part of the network]](#7-Time-and-traffic-to-find-8-advertised-peers-when-advertisers-are-a-big-part-of-the-network) for ENR attribute, make possible to find even smaller attributed fractions in network in small time and decrease traffic in all ENR search cases, following Discovery V5 API is proposed: --- ### FINDNODEATT Request (0x09) message-data = [request-id, attribute-key, (attribute-value)] message-type = 0x09 attribute-key = searched ENR attribute key attribute-value = (optional) searched ENR attribute value for provided key FINDNODEATT queries for nodes with non-null ENR attribute identified by `attribute-key`. If optional parameter `attribute-value` is provided, only ENRs with `attribute-key == attribute-value` should be returned. Maximum number of returned records is `K-BUCKET`. If more than `K-BUCKET` records are found by search, it is recommended to return a random subset of `K-BUCKET` size from it. --- Optionality of `attribute-value` could be used for other features, when most nodes in a network are in one of subnetworks, omitting this field doesn’t improve search speed and traffic, so we are going to use it with `attribute-value` provided. This should limit output only to nodes that belong to a subnetwork of search and provide the answer from each peer in one query. Let’s see the results with setup similar to [[6. Time and traffic to find 8 advertised peers when advertisers are a small fraction (64 of 10,000)]](#6-Time-and-traffic-to-find-8-advertised-peers-when-advertisers-are-a-small-fraction-64-of-10000): find 8 of 64 nodes in 10,000 peers network. We also applied 16 nodes limit to topic advertisement too. We got 1.4 seconds on average search and just 3.3 Kb of traffic per searcher! It definitely looks better even than Discovery V5 topic advertisements. But let's return to [[6a. What’s the smallest fraction of advertised peers to be found]](#6a-What’s-the-smallest-fraction-of-advertised-peers-to-be-found) question: what's lowest edge ENR attribute could handle. ![](https://i.imgur.com/KaBSfS8.png) Figure PB1 First results look very promising, so let’s try the hardest search within 10,000 peers: find 8 of 16. And it looks like the bottom case for ENR with attribute search: though traffic expenditures for ENR are still smaller, peers spending 3x more time on search, and distribution of search time is pretty random. While 29 searchers were successful, 1 of 30 passes found only 6 peers with 8 required searching through his whole Kademlia table, making further search improvements sophisticated. Same time let’s not forget that 16 of 10,000 is 1/625, definitely, a very small fraction of the network. You could check visualized results on *figures PB2* and *PB3*. ![](https://i.imgur.com/KB0fR03.png) Figure PB2 ![](https://i.imgur.com/KzFUk1z.png) Figure PB3 ## Conclusion[^](#Discovery-peer-advertisement-efficiency-analysis) Ongoing development of the Ethereum network, both 1.x and 2.0 requires specialization of peers and introducing a set of different peer roles and responsibilities, see [[Requirements]](#Requirements). In order to make appropriate peers easily discoverable, several techniques were suggested, but most promising are topic advertisements in Discovery V5 and ENR attributes. By choosing several metrics to draw up a complete picture of advertisement efficiency in [[Efficiency and metrics]](#Efficiency-and-metrics) and testing selected advertisement solutions in the simulator we were able to compare it. Our analysis shows that in most cases ENR serves better than topic advertisement, especially when the advertiser wants to support an advertisement in the long run. Measurement of metric [[1. Time and traffic spent by an advertiser]](#1-Time-and-traffic-spent-by-an-advertiser) shows that topic advertisement consumes 150 times<a title="8,535,648 bytes / 57,095 bytes per advertiser" href="#">\*</a> more traffic during 24 hours. Moreover, results of [[2. Time and traffic spent by the media]](#2-Time-and-traffic-spent-by-the-media) stacked with advertiser metrics shows that overall network efforts spent on advertisement measured in steps of peer action is almost 25 times<a title="5,577.6 steps / 224 steps per advertiser" href="#">\*</a> higher in case of using topic advertisement. [[3. Time and traffic to establish small network fraction advertisements]](#3-Time-and-traffic-to-establish-small-network-fraction-advertisements) and [[4. Time and traffic to establish big network fraction advertisements]](#4-Time-and-traffic-to-establish-big-network-fraction-advertisements) shows the only weak point of ENR - time to establish advertisement. Though even in case of one advertisement without prolongation total network traffic expenditures are a bit lower<a title="67.41 Kb vs 86.83 Kb with small advertised fraction" href="#">\*</a> with ENR attribute change, time to establish advertisement is significantly higher<a title="from 1.8 minutes for 50% coverage to 10 minutes for 95% for ENR vs 5.5 seconds for topic advertisement" href="#">\*</a> with ENR than with topic advertisement. We tried to improve it with [[A. Ping shower]](#A-Ping-shower) proposal but failed to show any usefulness of changes. On this side the one good thing for ENR is that stopping an advertisement will be on par with stopping a topic advertisement (see [[5. Time to stop big and small network fraction advertisements]](#5-Time-to-stop-big-and-small-network-fraction-advertisements)). [[6. Time and traffic to find 8 advertised peers when advertisers are a small fraction (64 of 10,000)]](#6-Time-and-traffic-to-find-8-advertised-peers-when-advertisers-are-a-small-fraction-64-of-10000) and [[7. Time and traffic to find 8 advertised peers when advertisers are a big part of the network]](#7-Time-and-traffic-to-find-8-advertised-peers-when-advertisers-are-a-big-part-of-the-network) shows that ENR is on par with topic advertisement when 1/20 of network is advertised lagging behind a bit with traffic spent on search, but is almost 5 times worse both in traffic<a title="482.38 Kb vs 104.01 Kb" href="#">\*</a> and time<a title="12.33 vs 3.32 seconds with some nodes spent up to 30 seconds with ENR" href="#">\*</a> when less than 1/150 of the network is advertised. Our proposal [[B. FINDNODE by attribute]](#B-FINDNODE-by-attribute) makes ENR attributes usable for advertisement of up to 1/625 dash of network and inverts tool costs for non-extreme cases: ENR search is 2.4x times faster<a title="1.37 vs 3.32 seconds" href="#">\*</a> than topic advertisement and spends 20 times less<a title="3.34 Kb vs 66.21 Kb" href="#">\*</a> traffic when 1/150 of the network is advertised. Though search of ENR advertising could be partially solved with local peer storage in excess of the Kademlia table, it’s not a part of protocol and doesn’t guarantee maintaining the same level of data freshness as Kademlia table. As shown in [[8. Churn resilience when looking up for advertised peers]](#8-Churn-resilience-when-looking-up-for-advertised-peers) 25% churn rate significantly<a title="3.33 vs 11 minutes" href="#">\*</a> increases time required for deep ENR distribution, while placing less pressure during early stages. Such a churn rate is impossible in the whole network, but rarely could be real for its geographical clusters. For topic advertisement it increases the media radius, which could affect search times of small advertised fractions. From the search side both topic advertisement and ENR look almost unaffected by churn rate. Another outcome from the simulator was figuring out high complexity of topic advertisement protocol, especially taking into account plain simplicity of ENR. This is particularly unfortunate, given that major issues with scalability of topic advertisement are not solved and any changes will increase complexity leading to a higher chance of implementation errors. Not only should the topic radius work equally for both advertisers and searchers, it should distribute all ads equally to searchers giving equal number of hits to each advertiser. For ENR such properties come just out of the box. Beyond the scope of this work we left extra advertiser efforts in topic advertisement when queue doesn’t allow to place an ad right now or is attacked, sadly it will be the issue for only advertiser and not the other members of network, meaning required attack power will need to match only advertisers to increase advertisement placement expenditures. Same time, distribution of updated ENR is done by the whole network minimizing each node cost. Lack of redistribution in topic advertisement leads to leaving the advertiser alone in protection of his role in the network. While security features of standard Kademlia `STORE` and `FIND_VALUE` are well examined by hundreds of researchers[[27]](#references-27)[[28]](#references-28)[[29]](#references-2)[[30]](#references-30) in last 20 years and different approaches are implemented in several real p2p networks, topic advertisement feature is not based on it and it's brand new, which leads to necessity of deep security research before any usage. From the first view it looks like it needs a number of security improvements especially in protection to sybil attacks, which could be very profitable in Ethereum 2.0 network compared to free file-sharing Kademlia based networks, where attacks were not monetized. And it requires to further increase protocol complexity with something like social ratings or BFT protection of the media. On the other side, using random media for topic advertisements will level down few benefits leaving all disadvantages of topic advertisement still in place. Requirements shows that most search needs of protocol end with a stable provider: if you want to be a server for light client API you are not giving up in 15 minutes. Ethereum 2.0 appeared to require 6 minutes meetings but brilliant solution of using stable skeleton with rare changes provides better security features and as simulations show could be found by searcher and joined faster with ENR. As a candidate for being a data locator for stateless approach, topic advertisement needs serious adjustments, that is to say, creating specialized solution from the blank. Our final recommendations include: - using ENR with [[FINDNODE by attribute]](#B-FINDNODE-by-attribute) for advertisement of any stable (more than 1 hour) node duties in which participates more than 1/600 of current network - modify any specific node advertisement and lookup tasks in network to use ENR attributes like it was done with committees in Ethereum 2.0 - make a supplementary research of ENR safety features and ways to improve it - in any method in Discovery V5 that could return more than `K-BUCKET` nodes, limit reply to `K-BUCKET` nodes shuffled from search result - don't use topic advertisement for advertisement of any node duties longer than 1 hour, especially when advertisement media attack could be profitable ## References[^](#Discovery-peer-advertisement-efficiency-analysis) 1. [^](#references-1-back) [Petar Maymounkov and David Mazières "Kademlia: A Peer-to-peer Information System Based on the XOR Metric", 2002](http://pdos.csail.mit.edu/~petar/papers/maymounkov-kademlia-lncs.pdf) 2. [^](#references-2-back) [Discovery V4 protocol reference](https://github.com/ethereum/devp2p/blob/master/discv4.md) 3. [^](#references-3-back) [Wikipedia: Ethereum](https://en.wikipedia.org/wiki/Ethereum) 4. [^](#references-4-back) [Node Discovery Protocol v5](https://github.com/ethereum/devp2p/blob/226690adfaf151d9e758df6c1056260105ca1500/discv5/discv5.md) 5. [^](#references-5-back) [EIP 778: Ethereum Node Records (ENR)](https://eips.ethereum.org/EIPS/eip-778) 6. [^](#references-6-back) [Node Discovery Protocol v5 - Theory: Nodes, Records and Distances](https://github.com/ethereum/devp2p/blob/226690adfaf151d9e758df6c1056260105ca1500/discv5/discv5-theory.md#nodes-records-and-distances) 7. [^](#references-7-back) [Discovery V5 Theory: Topic Radius Estimation](https://github.com/ethereum/devp2p/blob/226690adfaf151d9e758df6c1056260105ca1500/discv5/discv5-theory.md#topic-radius-estimation) 8. [^](#references-8-back) [Ethereum 2.0 networking specification: ENR structure / Attestation subnet bitfield](https://github.com/ethereum/eth2.0-specs/blob/9d39c292e020ee9243bd33de99208d884fb92517/specs/phase0/p2p-interface.md#attestation-subnet-bitfield) 9. [^](#references-7-back) [Discovery V5 Theory: Topic Radius Estimation](https://github.com/ethereum/devp2p/blob/226690adfaf151d9e758df6c1056260105ca1500/discv5/discv5-theory.md#topic-radius-estimation) 10. [^](#references-8-back) [Discovery V5 issue \#136: topic radius estimation doesn't work](https://github.com/ethereum/devp2p/issues/136) 11. [^](#references-11-back) [Discovery V5 Wire Protocol: TOPICQUERY Request (0x08)](https://github.com/ethereum/devp2p/blob/fa37304f15c937de75c26d776986463ed97936fa/discv5/discv5-wire.md#topicquery-request-0x08) 12. [^](#references-12-back) [Ethereum 2.0 Phase 0 -- Honest Validator: Phase 0 attestation subnet stability](https://github.com/ethereum/eth2.0-specs/blob/81dc71c312a44c59780a154bb056de1fb9041c6e/specs/phase0/validator.md#phase-0-attestation-subnet-stability) 13. [^](#references-13-back) [EIP 778: Ethereum Node Records (ENR)](https://eips.ethereum.org/EIPS/eip-778) 14. [^](#references-14-back) [Light Ethereum Subprotocol (LES): Client Side Flow Control](https://github.com/ethereum/devp2p/blob/2c132a47adefbc9e127c7bc82275ba71c94e6119/caps/les.md#client-side-flow-control) 15. [^](#references-15-back) [Stateless research roadmap](https://github.com/ethereum/stateless-ethereum-specs/blob/01978428fa33253f8dc9c3e0764409a428f47a89/roadmaps.md#9-one-network-vs-two-networks-vs-three-networks) 16. [^](#references-16-back) [DHT based solution for the “Data Retrieval” problem under the stateless paradigm](https://ethresear.ch/t/dht-based-solution-for-the-data-retrieval-problem-under-the-stateless-paradigm/7070) 17. [^](#references-17-back) [Ethereum 2.0 networking specification](https://github.com/ethereum/eth2.0-specs/blob/a612df1119c06921fa4ad11129a00f44b7056070/specs/phase0/p2p-interface.md#attestation-subnets) 18. [^](#references-18-back) [Ethereum 2.0 Phase 0 -- The Beacon Chain: Misc constants](https://github.com/ethereum/eth2.0-specs/blob/c841aa102bad17c0e06d0c1bb033d44591cad619/specs/phase0/beacon-chain.md#misc) 19. [^](#references-19-back) [Meta: cap total ether supply at ~120 million](https://github.com/ethereum/EIPs/issues/960) 20. [^](#references-20-back) [Eth 2.0 Economics](https://docs.ethhub.io/ethereum-roadmap/ethereum-2.0/eth-2.0-economics/) 21. [^](#references-21-back) [Ethereum 2.0 Phase 0 -- Honest Validator: Lookahead](https://github.com/ethereum/eth2.0-specs/blob/6a40f71a31724aab57ca90ff0a00fc5a536f62d6/specs/phase0/validator.md#lookahead) 22. [^](#references-22-back) [Neighbors lookup in Discovery V5](https://hackmd.io/@zilm/BykU7RRGL) 23. [^](#references-23-back) [Router.kt source code](https://github.com/zilm13/discv5/blob/master/src/main/kotlin/org/ethereum/discv5/core/Router.kt) 24. [^](#references-24-back) [Discovery V5 Theory: Tickets](https://github.com/ethereum/devp2p/blob/226690adfaf151d9e758df6c1056260105ca1500/discv5/discv5-theory.md#tickets) 25. [^](#references-25-back) [Ethereum 2.0 Phase 0 -- Honest Validator: Phase 0 attestation subnet stability](https://github.com/ethereum/eth2.0-specs/blob/81dc71c312a44c59780a154bb056de1fb9041c6e/specs/phase0/validator.md#phase-0-attestation-subnet-stability) 26. [^](#references-26-back) [Discovery V5 issue \#136: topic radius estimation doesn't work](https://github.com/ethereum/devp2p/issues/136) 27. [^](#references-27-back) [M. Srivatsa and L. Liu. Vulnerabilities and security threats in structured overlay networks: A quantitative analysis](https://pdfs.semanticscholar.org/a7b8/f19390484186d51b3711e970a0cfc8d46f4a.pdf) 28. [^](#references-28-back) [B. Awerbuch and C. Scheideler. Towards a scalable and robust DHT](http://www.cs.jhu.edu/~baruch/RESEARCH/Research_areas/Peer-to-Peer/2006_SPAA/virtual5.pdf) 29. [^](#references-29-back) [J. Liang, R. Kumar, Y. Xi, K. Ross. Pollution in P2P file sharing systems](https://cs.baylor.edu/~donahoo/classes/5321/papers/LKX+05.pdf) 30. [^](#references-30-back) [N. Naoumov and K. Ross. Exploiting P2P systems for DDoS attacks](http://cis.poly.edu/~ross/papers/p2pddos.pdf)