# Double Hashed DHT Metrics and Analysis

This document contains metrics collected using [Testground](https://github.com/testground/testground) on four different variations of go-libp2p-kad-dht:

- "vanilla": the current DHT implementation, where a CID is used internally as the provided/looked-up key.
- double hashed: the CID that is provided/looked-up is internally hashed, and the hash of the CID is what is passed between nodes.
- prefix lookup: uses double hashing, but also optionally allows a variable-length prefix of the hash to be passed between nodes.
- provider encrypt: uses double hashing and (optional) prefix lookups, but also encrypts and signs the peer ID of the provider before it is stored in the DHT. On receipt of a provider record, the ciphertext must be decrypted to recover the peer ID. The encryption/decryption key is the lookup CID (i.e. the pre-image of the double-hashed value passed between nodes).

<!--
```
- Metrics to run benchmarking against:
  - Files are stored at the right host
    - double hash the file, get bitstring, from the trie of the node we go and find the closest peer and check whether they're in the closest 20 peers
    - Doesn't need to be a prefix query
  - Network overhead: how many more bytes are we putting on the wire while serving and requesting content
  - Overhead in the number of hops (content lookup)
  - CPU overhead: i) for double-hashing, ii) for the signature
  - Latency: Time To Provider Record (TTPR), i.e., DHT walk of the retrieval part only
```
-->

# 1. Testing Environment

### 1.1 Tested Branches

| Name | Github | Commit ID |
| ---- | ---- | ---- |
| Vanilla (with hop counts) | https://github.com/ChainSafe/go-libp2p-kad-dht/tree/noot/measure-hops | `7af377cd4175ecffab591c862f928de12b3aee21` |
| Double Hash | https://github.com/ChainSafe/go-libp2p-kad-dht/tree/noot/dh-hops | `2e7ec8334806cdf475378908d563863302f7296b` |
| Prefix Lookup | https://github.com/ChainSafe/go-libp2p-kad-dht/tree/noot/pl-hops | `2fbe0738960eaf9c87df1c0c10b50768a54956e6` |
| Provider Encrypt | https://github.com/ChainSafe/go-libp2p-kad-dht/tree/noot/pe-hops | `3ac1963ff033e89f8a6e1916716df5978e9edd14` |

> Note: the `Double Hash`, `Prefix Lookup` and `Provider Encrypt` branches build off one another; for example, the `Prefix Lookup` branch contains all the changes in the `Double Hash` branch, and the `Provider Encrypt` branch contains all the changes in the `Prefix Lookup` branch. The runs therefore test incremental changes to the DHT, exercising the behaviour of each feature in turn.

### 1.2 Environment Resource Allocation

The tests were run with Testground using Docker; the resources allocated to the Docker process were as follows:

| Resource | Value |
|----|----|
| CPUs | 6 |
| Memory | 10GB |
| Swap | 2GB |

### 1.3 Docker Image Sizes

The following table lists the Docker image size of the corresponding branch for each test:

| Branch | Size (in GB) |
|----|----|
| Vanilla | 1.54 |
| Double Hash | 1.54 |
| Prefix Lookup | 1.54 |
| Provider Encrypt | 1.55 |

> \* Refer to section 1.1 for the corresponding branches and their repositories.

# 2. Testing Parameters

### 2.1 Double Hashed DHT Tests

> Note: runs 1-10 used a 256-bit prefix length for the `Prefix Lookup` and `Provider Encrypt` branches, which is the full key lookup.
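For reference, here is a minimal sketch of how a double-hashed key and a bit-length prefix of it could be derived from a CID (so a 256-bit prefix is simply the full double-hashed key). This is illustrative only: it assumes SHA-256 as the second hash, and the helper names are hypothetical rather than the exact code in the branches above.

```go
package main

import (
	"crypto/sha256"
	"fmt"

	"github.com/ipfs/go-cid"
	multihash "github.com/multiformats/go-multihash"
)

// doubleHash hashes the CID's bytes a second time; the resulting digest is
// what would be passed between nodes instead of the CID itself.
// (Assumption: SHA-256 as the second hash.)
func doubleHash(c cid.Cid) [32]byte {
	return sha256.Sum256(c.Bytes())
}

// prefixBits keeps only the first n bits of the double-hashed key, zeroing the
// rest. Shorter prefixes reveal less about the target key but match more of
// the keyspace.
func prefixBits(key [32]byte, n int) []byte {
	out := make([]byte, len(key))
	copy(out, key[:])
	for i := range out {
		remaining := n - i*8
		switch {
		case remaining >= 8:
			// keep the whole byte
		case remaining <= 0:
			out[i] = 0
		default:
			out[i] &= byte(0xFF << (8 - remaining))
		}
	}
	return out
}

func main() {
	// Build an example CIDv0 by hashing some content.
	mh, _ := multihash.Sum([]byte("example content"), multihash.SHA2_256, -1)
	c := cid.NewCidV0(mh)

	key := doubleHash(c)
	fmt.Printf("full key:      %x\n", key)
	fmt.Printf("64-bit prefix: %x\n", prefixBits(key, 64))
}
```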
| Run 1-3 | Preliminary Benchmarks (50 nodes, 20 provider records each) |
| -- | -- |
| Vanilla | https://gist.github.com/araskachoi/d82bb6d295d7cfb448f082fc6403d4d3 |
| Double Hash | https://gist.github.com/araskachoi/194b2ac55fe22bd3ad4dfcef0d74cce3 |
| Prefix Lookup | https://gist.github.com/araskachoi/a24c242fddf17c55df29d60a50946c65 |
| Provider Encrypt | https://gist.github.com/araskachoi/af2f41ff0a9dd1d48f0e7955acff6628 |

| Run | Branch | Latency | Test Parameters |
| -- | -- | -- | -- |
| 4-6 | Vanilla | 500ms | https://gist.github.com/araskachoi/5e45b81654a639f2a9e9fb6f9549d36e |
| 4-6 | Double Hash | 500ms | https://gist.github.com/araskachoi/88741d0c29714a26cc4dabfe11569332 |
| 4-6 | Prefix Lookup | 500ms | https://gist.github.com/araskachoi/78a35459e6921dda0e72941d3be16fb5 |
| 4-6 | Provider Encrypt | 500ms | https://gist.github.com/araskachoi/367c35b9dfe3ab6e5f70db9df158d65d |

| Run | Branch | Latency | Test Parameters |
| -- | -- | -- | -- |
| 7-9 | Vanilla | 1000ms | https://gist.github.com/araskachoi/45dd7e1568a4372c1d92fa8d10a5cbda |
| 7-9 | Double Hash | 1000ms | https://gist.github.com/araskachoi/395e187071cc0d5e03e6dcb96347f881 |
| 7-9 | Prefix Lookup | 1000ms | https://gist.github.com/araskachoi/a5d7ba9bbcba8dff7c5126c0b5de8ff2 |
| 7-9 | Provider Encrypt | 1000ms | https://gist.github.com/araskachoi/a113c558b2a3c859455ce9eb4d6b8c58 |

| Run | Branch | Prefix Length | Test Parameters |
| -- | -- | -- | -- |
| run10-12 | Prefix Lookup | 128 bit | https://gist.github.com/araskachoi/0ff06ff35972f65f8013d63f0e308050 |
| run13-15 | Prefix Lookup | 64 bit | https://gist.github.com/araskachoi/7d34aca1b2330351c8cc189a20b6d16d |
| run16-18 | Prefix Lookup | 32 bit | https://gist.github.com/araskachoi/a3b719f0a81e7995a80fe73a96b02bfd |
| run19-21 | Prefix Lookup | 16 bit | https://gist.github.com/araskachoi/7fc8961256ec688016389b35c28d5b1c |
| run22-24 | Prefix Lookup | 8 bit | https://gist.github.com/araskachoi/b30b778ae6f6b7e57c6ddefb8428e4cf |
| run25-27 | Prefix Lookup | 4 bit | https://gist.github.com/araskachoi/394d9719e22eeef22a556dca2c7303dc |

> \* For these tests, we decided to run only against the `Prefix Lookup` branch, because it is the branch that exposes the variable prefix length and could therefore produce adequate data sets conveying the effects of different prefix lengths on the Kademlia DHT.

> \* Note: run14 could not complete in the given time due to insufficient resources.

# 3. Metrics Collected

<details><summary>DHT data points</summary>

- "barrierbootstrapping0"
- "barrierprovider-records1"
- "barrierprovider-records2"
- "full bootstrapping0"
- "full provider-records1"
- "full provider-records2"
- "peers-found|done"
- "peers-missing|done"
- "signal bootstrapping0"
- "signal provider-records1"
- "signal provider-records2"
- "time-to-find-first"
- "time-to-find-last"
- "time-to-find|done"
- "time-to-provide"
- "bandwidth-total-in"
- "bandwidth-total-out"
- "bandwidth-rate-in"
- "bandwidth-rate-out"

</details>

<details><summary>Metrics collected</summary>

- "EnableGC"
- "HeapAlloc"
- "LastGC"
- "NumGC"
- "MCacheSys"
- "StackSys"
- "StackInuse"
- "Sys"
- "NumThread"
- "HeapIdle"
- "HeapInuse"
- "Lookups"
- "MSpanInuse"
- "Frees"
- "NextGC"
- "NumCgoCall"
- "PauseTotalNs"
- "NumGoroutine"
- "BuckHashSys"
- "HeapObjects"
- "GCCPUFraction"
- "TotalAlloc"
- "DebugGC"
- "HeapReleased"
- "HeapSys"
- "MCacheInuse"
- "Alloc"
- "Mallocs"
- "MSpanSys"
- "pauseNs"
- "readMemStats"

</details>
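Most of the metrics in the second list are fields of Go's `runtime.MemStats` (plus goroutine/thread counts). A minimal sketch of how such values can be sampled is shown below for illustration only; it is not the exact Testground instrumentation.

```go
package main

import (
	"fmt"
	"runtime"
	"time"
)

// sampleRuntimeMetrics periodically reads the Go runtime counters that back
// metrics such as "HeapAlloc", "NumGC", "PauseTotalNs" and "NumGoroutine".
func sampleRuntimeMetrics(interval time.Duration, samples int) {
	var m runtime.MemStats
	for i := 0; i < samples; i++ {
		runtime.ReadMemStats(&m)
		fmt.Printf("ts=%d HeapAlloc=%d NumGC=%d PauseTotalNs=%d NumGoroutine=%d\n",
			time.Now().UnixNano(), m.HeapAlloc, m.NumGC, m.PauseTotalNs, runtime.NumGoroutine())
		time.Sleep(interval)
	}
}

func main() {
	// Take five samples, one second apart.
	sampleRuntimeMetrics(time.Second, 5)
}
```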
# 4. Results

For the following metrics, there were 20 provider records put in the DHT per node. Since there were 50 nodes, there were 1000 records in total. For the "Prefix Lookup" and "Provider Encrypt" runs, a prefix length of 256 bits (32 bytes) was used, which is the same as a full-key lookup.

Note that the graphs are only of "run 1"; additional graphs can be found in the [appendix](https://docs.google.com/spreadsheets/d/1ywNNrEQxqDDfTMi7SYh57_w-kh6ZCURdzpEdt64maiM/).

## 4.1 CPU Usage

The table values are %CPU used.

<!--
| Branch | Total Avg | Min Avg | Max Avg |
| -------- | -------- | -------- | -------- |
| Vanilla | 7.238724809 | 6.319705882 | 8.434285714 |
| Double hash | 7.984708669 | 6.981875 | 9.694193548 |
| Prefix Lookup | 6.284318758 | 5.242307692 | 8.111538462 |
| Provider Encrypt | 6.82339943 | 5.933461538 | 8.811923077 |

> \* note: the averages listed here are the per-run averages of each metric of all nodes. The "total avg" is the average of all node averages, the "min avg" is the minimum node average, and the "max avg" is the maximum node average found.
-->

| Branch | Total Avg | Min Avg | Max Avg |
| -------- | -------- | -------- | -------- |
| Vanilla | 8.767696862 | 7.178214286 | 10.18285714 |
| Double hash | 9.01226081 | 7.47 | 10.78758621 |
| Prefix Lookup | 9.305557917 | 7.359333333 | 12.79533333 |
| Provider Encrypt | 10.54163666 | 8.6359375 | 12.385 |

> \* note: the averages listed here are the per-run averages of each metric across all nodes. The "total avg" is the average of all node averages, the "min avg" is the minimum node average, and the "max avg" is the maximum node average found.

<!-- x-axis: index of points collected* -->
y-axis: %CPU used

> \* note: the x-axis ranges from the start of the test to when the test completes. If some plots extend further than others, those tests took longer to complete.

### Vanilla

![](https://i.imgur.com/VXbAI62.png)

> Total test time for run 1 is 230.4981s; each x-axis increment therefore corresponds to the total test time divided by the number of points collected.
> \* Test times can be found in the [appendix](https://docs.google.com/spreadsheets/d/1ywNNrEQxqDDfTMi7SYh57_w-kh6ZCURdzpEdt64maiM/edit#gid=1538420453) on the "Total Test Time" page.
### Double hashed

![](https://i.imgur.com/UnGZDOJ.png)

### Prefix lookup

![](https://i.imgur.com/wJfjBe5.png)

### Provider Encrypt

![](https://i.imgur.com/s8hEs94.png)

### Local tests

- tested using the dht-tester repo on an Intel i7-8650U (8) @ 1.900GHz CPU
- for 100 nodes, doing constant lookups
- max CPU% after 10 minutes

| DHT type | Max CPU% | Max CPU% per DHT node |
| -------- | -------- | -------- |
| Vanilla | 292 | 2.92 |
| Double Hash | 318 | 3.18 |
| Prefix Lookup (full prefix) | 325 | 3.25 |
| Provider Encrypt | 342 | 3.42 |

## 4.2 Thread Usage

x-axis: timestamp metric was logged (in nanoseconds)
y-axis: number of threads

### Vanilla

![](https://i.imgur.com/1ME26H1.png)

### Double hashed

![](https://i.imgur.com/yXOkFag.png)

### Prefix lookup

![](https://i.imgur.com/0hEY6oJ.png)

### Provider Encrypt

![](https://i.imgur.com/SC6CUJw.png)

## 4.3 time-to-provide

| branch | total avg [ns] | min avg [ns] | max avg [ns] |
| -------- | -------- | -------- | -------- |
| Vanilla | 19755489179 | 17245047620 | 21768036530 |
| Double Hash | 34404439766 | 27877333480 | 41673063047 |
| Prefix Lookup | 24668693704 | 22903751253 | 27181959505 |
| Provider Encrypt | 45428101619 | 37264810240 | 53939038954 |

x-axis: timestamp metric was logged (in nanoseconds)
y-axis: duration it took to put the record in the DHT (in nanoseconds)

### Vanilla

![](https://i.imgur.com/3dKK9pw.png)

### Double hashed

![](https://i.imgur.com/xLO8SsG.png)

### Prefix lookup

![](https://i.imgur.com/pkhLn7r.png)

### Provider Encrypt

![](https://i.imgur.com/GfTlx1T.png)

## 4.4 Hop Count

This metric is the maximum number of hops measured before the call to `FindProviders` returned.

> \* Note: since queries are done concurrently, the actual number of hops to find a record might have been lower; this metric reports the highest number of hops that was recorded, whether or not that hop returned a record.

| branch | total avg | min avg | max avg |
| -------- | -------- | -------- | -------- |
| Vanilla | 3.116 | 2.48 | 4.28 |
| Double Hash | 2.675 | 2.34 | 3.22 |
| Prefix Lookup | 2.833 | 2.5 | 3.3 |
| Provider Encrypt | 2.318 | 1.8 | 3.12 |

x-axis: timestamp metric was logged (in nanoseconds)
y-axis: maximum number of hops to find a provider record

### Vanilla

![](https://i.imgur.com/YTdn1Uv.png)

### Double hashed

![](https://i.imgur.com/nBnQ0JQ.png)

### Prefix lookup

![](https://i.imgur.com/yJxRj2n.png)

### Provider Encrypt

![](https://i.imgur.com/xNHELqX.png)

## 4.5 time-to-find-first

This metric measures how long it took to find the first provider record.

| branch | total avg [ns] | min avg [ns] | max avg [ns] |
| -------- | -------- | -------- | -------- |
| Vanilla | 4358458490 | 2176952856 | 7404290168 |
| Double Hash | 3476423993 | 2592666162 | 4368479436 |
| Prefix Lookup | 2362583430 | 1104230617 | 3060013809 |
| Provider Encrypt | 5540856379 | 3961127670 | 7158101316 |

x-axis: timestamp metric was logged (in nanoseconds)
y-axis: duration it took to find the first provider (in nanoseconds)

### Vanilla

![](https://i.imgur.com/cHnsb1U.png)

### Double hashed

![](https://i.imgur.com/nbncSOA.png)

### Prefix lookup

![](https://i.imgur.com/43L3B7V.png)

### Provider Encrypt

![](https://i.imgur.com/YKEs2QI.png)

## 4.6 time-to-find-last

This metric measures how long it took to find the last provider record.
| branch | total avg [ns] | min avg [ns] | max avg [ns] |
| -------- | -------- | -------- | -------- |
| Vanilla | 27804715510 | 14985912742 | 38747303805 |
| Double Hash | 57916260209 | 25872157950 | 81042507119 |
| Prefix Lookup (run 1) | 8480736654 | 6098042705 | 10831531002 |
| Prefix Lookup (run 2) | 61770177647 | 28407672093 | 109111871884 |
| Provider Encrypt | 97087489518 | 42982464315 | 134991841268 |

x-axis: timestamp metric was logged (in nanoseconds)
y-axis: duration it took to find the last provider (in nanoseconds)

### Vanilla

![](https://i.imgur.com/fy20iHo.png)

### Double hashed

![](https://i.imgur.com/7ju5gtT.png)

### Prefix lookup

![](https://i.imgur.com/gRsLZqR.png)

### Provider Encrypt

![](https://i.imgur.com/kvtVOgz.png)

## 4.7 time-to-find

This metric measures how long it took (in nanoseconds) to find any provider record.

| branch | total avg [ns] | min avg [ns] | max avg [ns] |
| -------- | -------- | -------- | -------- |
| Vanilla | 79390010140 | 73551090972 | 85426058177 |
| Double Hash | 122686318752 | 103234014714 | 128572602244 |
| Prefix Lookup | 104361404516 | 95075171130 | 112980436864 |
| Provider Encrypt | 211962635950 | 197702581172 | 218596740955 |

This metric logs all timestamps at which providers were found.

x-axis: timestamp metric was logged (in nanoseconds)
y-axis: duration it took to find a provider (in nanoseconds)

### Vanilla

![](https://i.imgur.com/D0UCRTE.png)

### Double hashed

![](https://i.imgur.com/n2CYddw.png)

### Prefix lookup

![](https://i.imgur.com/y3C1TEW.png)

### Provider Encrypt

![](https://i.imgur.com/MW5dcbQ.png)

## 4.8 peers-found

| branch | total avg | min avg | max avg |
| -------- | -------- | -------- | -------- |
| Vanilla | 50 | 49.96 | 50 |
| Double Hash | 49.496 | 47.8 | 49.98 |
| Prefix Lookup | 50 | 50 | 50 |
| Provider Encrypt | 48.684 | 46.32 | 49.96 |

x-axis: timestamp metric was logged (in nanoseconds)
y-axis: number of providers found

### Vanilla

![](https://i.imgur.com/fHsJ33B.png)

### Double hashed

![](https://i.imgur.com/qfhwicm.png)

### Prefix lookup

![](https://i.imgur.com/8cwW2QB.png)

### Provider Encrypt

![](https://i.imgur.com/d7h0ZRp.png)

## 4.9 peers-missing

| branch | total avg | min avg | max avg |
| -------- | -------- | -------- | -------- |
| Vanilla | 0.002 | 0 | 0.04 |
| Double Hash | 0.504 | 0.02 | 2.2 |
| Prefix Lookup | 0 | 0 | 0 |
| Provider Encrypt | 1.316 | 0.04 | 3.68 |

x-axis: timestamp metric was logged (in nanoseconds)
y-axis: number of providers *not* found (but that have a record in the DHT)

### Vanilla

![](https://i.imgur.com/8VxQioW.png)

### Double hashed

![](https://i.imgur.com/yqAv4Ul.png)

### Prefix lookup

![](https://i.imgur.com/DSu1rLT.png)

### Provider Encrypt

![](https://i.imgur.com/u2S9xIT.png)
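The bandwidth metrics in the following four subsections (4.10-4.13) correspond to the counters exposed by libp2p's `metrics.BandwidthCounter`. Below is a minimal sketch of how those totals and rates can be read, assuming the reporter has been wired into the libp2p host; this is illustrative, not the exact Testground plumbing, and import paths follow current go-libp2p releases (older releases use `go-libp2p-core`).

```go
package main

import (
	"fmt"

	"github.com/libp2p/go-libp2p/core/metrics"
)

func main() {
	// The counter is normally passed to the libp2p host at construction time
	// so that every stream updates it; here we only show how the totals and
	// rates are read once it exists.
	reporter := metrics.NewBandwidthCounter()

	stats := reporter.GetBandwidthTotals()
	fmt.Println("bandwidth-total-in  [bytes]:  ", stats.TotalIn)
	fmt.Println("bandwidth-total-out [bytes]:  ", stats.TotalOut)
	fmt.Println("bandwidth-rate-in   [bytes/s]:", stats.RateIn)
	fmt.Println("bandwidth-rate-out  [bytes/s]:", stats.RateOut)
}
```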
## 4.10 bandwidth-total-in

This metric logs the total inbound bandwidth of each node in bytes.

| branch | total avg [bytes] | min avg [bytes] | max avg [bytes] |
| -------- | -------- | -------- | -------- |
| Vanilla | 441164.0927 | 381232.5833 | 512251.36 |
| Double Hash | 338069.1842 | 262513.6622 | 381084.9737 |
| Prefix Lookup | 536800.1177 | 384560.0612 | 623936.5833 |
| Provider Encrypt | 460506.8198 | 377045.4625 | 552784.4023 |

x-axis: timestamp metric was logged (in nanoseconds)
y-axis: inbound bandwidth (in bytes)

### Vanilla

![](https://i.imgur.com/AE8SQX7.png)

### Double hashed

![](https://i.imgur.com/XxxyHGc.png)

### Prefix lookup

![](https://i.imgur.com/5wZTX3s.png)

### Provider Encrypt

![](https://i.imgur.com/V28C7pf.png)

## 4.11 bandwidth-total-out

This metric logs the total outbound bandwidth of each node in bytes.

| branch | total avg [bytes] | min avg [bytes] | max avg [bytes] |
| -------- | -------- | -------- | -------- |
| Vanilla | 460365.8432 | 148405.32 | 949377.34 |
| Double Hash | 351425.4874 | 216034.9733 | 521763.2632 |
| Prefix Lookup | 546558.6369 | 247134.9796 | 879130.6182 |
| Provider Encrypt | 479355.5875 | 291006.0562 | 732418.9302 |

x-axis: timestamp metric was logged (in nanoseconds)
y-axis: outbound bandwidth (in bytes)

### Vanilla

![](https://i.imgur.com/o3NX6wf.png)

### Double hashed

![](https://i.imgur.com/syIGwuP.png)

### Prefix lookup

![](https://i.imgur.com/whPolq9.png)

### Provider Encrypt

![](https://i.imgur.com/Rm8XDU3.png)

## 4.12 bandwidth-rate-in

This metric logs the inbound bandwidth rate of each node in bytes per second.

| branch | total avg [bytes/s] | min avg [bytes/s] | max avg [bytes/s] |
| -------- | -------- | -------- | -------- |
| Vanilla | 3155.638448 | 2079.756539 | 4253.469319 |
| Double hash | 1614.433899 | 1189.380728 | 2125.49657 |
| Prefix lookup | 3514.293084 | 2630.424616 | 4752.05726 |
| Provider Encrypt | 1799.441934 | 1063.430731 | 4151.984007 |

x-axis: timestamp metric was logged (in nanoseconds)
y-axis: inbound bandwidth rate (in bytes received per second)

### Vanilla

![](https://i.imgur.com/14p7EZI.png)

### Double hashed

![](https://i.imgur.com/QV4PiL2.png)

### Prefix lookup

![](https://i.imgur.com/6lgwQVV.png)

### Provider Encrypt

![](https://i.imgur.com/rXBacad.png)

## 4.13 bandwidth-rate-out

This metric logs the outbound bandwidth rate of each node in bytes per second.

| branch | total avg [bytes/s] | min avg [bytes/s] | max avg [bytes/s] |
| -------- | -------- | -------- | -------- |
| Vanilla | 3324.515058 | 853.7182016 | 7349.261008 |
| Double Hash | 1909.789706 | 822.9401642 | 7807.012158 |
| Prefix Lookup | 3614.805553 | 1652.397524 | 6912.410938 |
| Provider Encrypt | 2218.613802 | 831.3868787 | 12023.19057 |

x-axis: timestamp metric was logged (in nanoseconds)
y-axis: outbound bandwidth rate (in bytes sent per second)

### Vanilla

![](https://i.imgur.com/birL8Cf.png)

### Double hashed

![](https://i.imgur.com/AMqEu2u.png)

### Prefix lookup

![](https://i.imgur.com/KFq2tS8.png)

### Provider Encrypt

![](https://i.imgur.com/WvxoR4p.png)

## 5. Analysis

### 5.1 CPU Usage

For all branches measured, the average CPU usage was around 7%, with occasional spikes of up to 40-60%. The highest spike was measured in the `Double Hash` branch, which neared 60%. Overall, there were no major differences in CPU usage between the branches.

Tested locally (outside of Testground), the maximum CPU% used was significantly lower, between 2.92% and 3.42% per DHT node. However, there was a slight increase moving from `Vanilla` to `Double Hash` to `Prefix Lookup` to `Provider Encrypt`.
The `Provider Encrypt` branch had 17% higher CPU usage per DHT node than the vanilla branch after 10 minutes of continuous lookups.

### 5.2 Thread usage

The number of threads used per node reached a maximum of 25 on the vanilla branch. The `Double Hash` and `Provider Encrypt` branches were slightly lower, with a maximum thread count of 22. There were no major differences in thread count between the branches.

Across all of the tests (runs 1-6) with a record count of 20, the number of threads used did not fluctuate very much over the course of a test. The `Vanilla` nodes had a maximum of about 23-25 threads, while the `Provider Encrypt`, `Prefix Lookup`, and `Double Hash` branches had a maximum of 22-23.

### 5.3 time-to-provide

The time-to-provide metric measures how long the `dht.Provide()` call took to complete. This call puts a provider record into the node's local storage and sends a `PutProvider` message to the nodes closest to the value being stored.

For the `Vanilla` and `Prefix Lookup` branches, the maximum time to provide was around 45 seconds. For `Provider Encrypt`, it was higher at a maximum of around 80 seconds, and for double-hash it was the highest at around 120 seconds. The points appear to be fairly evenly distributed between the origin and the maximum.

Since the prefix-lookup and double-hash branches share the same logic for putting provider records, we expect the time-to-provide for those two branches and the `Vanilla` DHT to be around the same. The `Provider Encrypt` time-to-provide is expected to be higher due to the added encryption, signing and verification time, which was observed: the average time to provide on the provider-encrypt branch was ~45.4 seconds, compared to ~19.8 seconds for the vanilla branch. However, the average time-to-provide was also higher for the double-hash and prefix-lookup branches, at ~34.4 seconds and ~24.7 seconds respectively.

<!--
> I see that the 4bit `Prefix Lookup` (run11) time-to-provide is very similar to that of the vanilla run1 times. This shows that the "smallest" prefix length that was tested gives similar results as those of 256 bit lengths. The maximum is found to be ~4.5-4.9x1e10 NS.

> The maximum time to provide has increased significantly in run12 (8-bit prefix length `Prefix Lookup`) to ~1.1x1e11 NS. This is about a 55.4-59.1% increase in time to provide.

> No noticeable differences between total bandwidth in for run11 and run12. Run13 gives similar results as those of run12.

> The time to provide has decreased in run15 (32-bit prefix length `Prefix Lookup`). The maximum is ~9x1e10 NS in run15. This is a decrease from run11 to run12. It is a ~18% decrease in time to provide between run11/12 and run15. However, there is still a 45-50% increase between run1 and run15.

> The time to provide has decreased in run16 (64-bit prefix length `Prefix Lookup`) to ~8x1e10 NS. This is an 11% decrease between run15 and run16 and a ~38.75-43.75% increase between run1 and run16.
-->
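For context, the sketch below shows how the time-to-provide and time-to-find timings map onto the DHT API. It assumes the standard go-libp2p-kad-dht methods (`Provide` and `FindProvidersAsync`); DHT and host construction is omitted and import paths follow current releases, so this is illustrative rather than the exact test-plan code.

```go
package dhtmetrics

import (
	"context"
	"fmt"
	"time"

	"github.com/ipfs/go-cid"
	dht "github.com/libp2p/go-libp2p-kad-dht"
)

// measureProvideAndFind reproduces the timings reported above for a single CID:
// time-to-provide (the duration of dht.Provide, which stores the record locally
// and announces it to the closest peers) and time-to-find-first /
// time-to-find-last (delays until FindProvidersAsync yields its first and
// final provider). "expected" is the number of providers to look for (50 in
// these tests).
func measureProvideAndFind(ctx context.Context, d *dht.IpfsDHT, c cid.Cid, expected int) error {
	start := time.Now()
	if err := d.Provide(ctx, c, true); err != nil {
		return err
	}
	fmt.Println("time-to-provide:", time.Since(start))

	start = time.Now()
	found := 0
	for range d.FindProvidersAsync(ctx, c, expected) {
		found++
		if found == 1 {
			fmt.Println("time-to-find-first:", time.Since(start))
		}
	}
	fmt.Println("time-to-find-last:", time.Since(start), "providers found:", found)
	return nil
}
```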
### 5.4 Number of hops

The num-hops metric measures the maximum number of hops before a provider record was found. The maximum hop counts observed were:

| Branch | Run | Hop Count |
| ---- | ---- | ---- |
| Vanilla | 1 | 9 |
| Vanilla | 2 | 9 |
| Vanilla | 3 | 8 |
| Vanilla | 4 | 12 |
| Vanilla | 5 | 9 |
| Vanilla | 6 | 7 |
| Vanilla | 7 | 9 |
| Vanilla | 8 | 8 |
| Vanilla | 9 | 6 |
| Vanilla | 10 | 5 |

| Branch | Run | Hop Count |
| ---- | ---- | ---- |
| Double Hash | 1 | 8 |
| Double Hash | 2 | 9 |
| Double Hash | 3 | 7 |
| Double Hash | 4 | 11 |
| Double Hash | 5 | 8 |
| Double Hash | 6 | 8 |
| Double Hash | 7 | NA |
| Double Hash | 8 | NA |
| Double Hash | 9 | 6 |
| Double Hash | 10 | 7 |

| Branch | Run | Prefix Length | Hop Count |
| ---- | ---- | ---- | ---- |
| Prefix Lookup | 1 | 256 | 9 |
| Prefix Lookup | 2 | 256 | 8 |
| Prefix Lookup | 3 | 256 | 8 |
| Prefix Lookup | 4 | 256 | 7 |
| Prefix Lookup | 5 | 256 | 9 |
| Prefix Lookup | 6 | 256 | 9 |
| Prefix Lookup | 7 | 256 | 8 |
| Prefix Lookup | 8 | 256 | 9 |
| Prefix Lookup | 9 | 256 | 7 |
| Prefix Lookup | 10 | 256 | 6 |
| Prefix Lookup | 11 | 4 | 13 |
| Prefix Lookup | 12 | 8 | 9 |
| Prefix Lookup | 13 | 16 | NA |
| Prefix Lookup | 14 | 32 | 9 |
| Prefix Lookup | 15 | 64 | 8 |
| Prefix Lookup | 16 | 128 | 9 |

| Branch | Run | Hop Count |
| ---- | ---- | ---- |
| Provider Encrypt | 1 | 11 |
| Provider Encrypt | 2 | 8 |
| Provider Encrypt | 3 | 8 |
| Provider Encrypt | 4 | 10 |
| Provider Encrypt | 5 | 10 |
| Provider Encrypt | 6 | 13 |
| Provider Encrypt | 7 | NA |
| Provider Encrypt | 8 | NA |
| Provider Encrypt | 9 | 7 |
| Provider Encrypt | 10 | 8 |

<!--
The hop count incrementally goes up from run1 to run6, and the change between the lowest and largest hop counts is a relatively large increase (~45%) given this small sample size, but further tests may be needed with a larger sample size (perhaps 100-200 nodes) to observe whether the delta is proportional.
-->

Overall, the average number of hops did not vary much between the different implementations; the lowest was provider-encrypt at 2.318 average hops and the highest was vanilla at 3.116.

### 5.5 time-to-find-first

The time-to-find-first metric measures the time it took to find the first provider after calling `dht.FindProvidersAsync()`. We expect this duration to be around the same for all implementations. Surprisingly, the vanilla DHT had the second-longest average (4.36 seconds) and the highest maximum (~50 seconds, or 5e10 ns); for `Double Hash`, `Prefix Lookup`, and `Provider Encrypt` the maximums were around 15 seconds, 25 seconds and 20 seconds respectively. However, the maximum values tended to be outliers; most values were located closer to the origin, meaning the duration was usually nowhere near the maximum. This indicates the implementations did not degrade the time to find the first provider. The average time increased slightly between vanilla (4.36 seconds) and provider-encrypt (5.54 seconds).

### 5.6 time-to-find-last

The time-to-find-last metric measures the time it took to find the last provider after calling `dht.FindProvidersAsync()`. We expect this to be around the same for all implementations, except the `Provider Encrypt` branch, which may have a higher duration due to the time added for decrypting provider records. This is broadly what was observed: the maximum values were around 90 seconds, 140 seconds, and 130 seconds for `Vanilla`, `Double Hash`, and `Prefix Lookup` respectively, with `Provider Encrypt` reaching the highest maximum. Most values were close to the x-axis, meaning the duration was usually not close to the maximum.
However, the `Provider Encrypt` branch had a 3.5x increase in average time over the vanilla branch and a 1.57x increase over `Prefix Lookup`, suggesting encryption can significantly increase the time to find the last provider.

### 5.7 time-to-find

The time-to-find metric measures the time it took to find a provider after calling `dht.FindProvidersAsync()`. Like the above, we expect this to be around the same for all implementations, except the `Provider Encrypt` branch, which may have a higher duration due to the time added for decrypting provider records.

The data was very similar to time-to-find-last: the maximum values were around 110 seconds, 145 seconds, and 130 seconds for `Vanilla`, `Double Hash`, and `Prefix Lookup` respectively, with `Provider Encrypt` again the highest. The `Provider Encrypt` branch had a 2.67x increase in average time over the vanilla branch and a 2.03x increase over prefix-lookup, suggesting encryption can significantly increase the time to find providers. Comparing provider-encrypt to prefix-lookup, all average values were roughly double.

### 5.8 peers-found

The peers-found metric measures how many providers were found for each CID looked up. There were 50 providers for each CID, so the expected value should be at or close to 50 for each.

For the vanilla and `Prefix Lookup` DHTs, essentially all providers were found for all CIDs. For the `Double Hash` and `Provider Encrypt` DHTs, most of the nodes found all 50 providers, but on the `Double Hash` DHT one node found as few as 34, and on the `Provider Encrypt` branch quite a few found fewer than 50, some as few as 1. This is potentially due to the lookup timing out because of the increased time it takes for decryption on the `Provider Encrypt` branch. However, when averaged out, nodes always found close to 50 peers; the lowest value recorded was a minimum average of 46.32 for the provider-encrypt branch.

### 5.9 peers-missing

The peers-missing metric measures how many providers were *not* found for each CID looked up (out of the 50 providers). The data here is essentially the inverse of the above (i.e. each point is 50 − peers-found).

### 5.10 Bandwidth Total In

The bandwidth-total-in metric logs the total inbound bandwidth of each node in bytes. We expect this to be around the same for all branches, as no implementation significantly increases the message size or the number of messages passed.

This is what is observed: the total bandwidth reaches a maximum of around 800 kilobytes, 700 kilobytes, 1000 kilobytes, and 900 kilobytes for vanilla, double-hash, prefix-lookup, and provider-encrypt respectively. The total bandwidth also had minimums of 700 kilobytes, 550 kilobytes, 750 kilobytes, and 600 kilobytes respectively. The bandwidth variations can be explained by differing numbers of messages sent by each node.

The average values were also similar, varying from roughly 340 kilobytes to 540 kilobytes. The vanilla and provider-encrypt branches were the most similar, at an average of 440 kilobytes and 460 kilobytes respectively. Overall, there were no significant bandwidth increases between implementations.

<!--
> bandwidth total in for run1 was found to have a maximum of ~800,000 bytes and run11 has a maximum of ~400,000 bytes. This is about a 50% decrease in the total input bandwidth between run1 and run11.

> no noticeable differences between total bandwidth in for run11 and run12.

> no noticeable differences between total bandwidth in for run12 and run13.
-->

### 5.11 Bandwidth Total Out

The bandwidth-total-out metric logs the total outbound bandwidth of each node in bytes. We expect this to be around the same for all branches, as no implementation significantly increases the message size or the number of messages passed.

This is what is observed: the total bandwidth reaches a maximum of around 1.75e6 bytes, 1e6 bytes, 1.5e6 bytes, and 1.2e6 bytes for vanilla, double-hash, prefix-lookup, and provider-encrypt respectively. The total bandwidth also had minimums of 2e5, 3e5, 4e5, and 4e5 bytes respectively. The average values were also similar, varying from 3.5e5 bytes to 5.5e5 bytes. The vanilla and provider-encrypt branches were the most similar, at an average of 4.6e5 and 4.8e5 bytes respectively.

Overall, there were no significant maximum bandwidth increases between implementations. The minimum bandwidth did roughly double between the vanilla branch and the prefix-lookup and provider-encrypt branches, which is notable.

> The 4-bit `Prefix Lookup` test (run11) logs significantly less total outbound bandwidth. The maximum bandwidth out is found to be ~5.5x1e5 bytes, while the maximum bandwidth out for the vanilla test (run1) is ~1.75x1e6 bytes. This is about a 68.57% decrease in output bandwidth between run1 and run11.

> The maximum bandwidth out is found to be ~5x1e5 bytes for run12. This is a 71.4% decrease in output between run1 and run12, and a 9% decrease between run11 and run12.

> No noticeable differences in total bandwidth out between run12 and run13.

### 5.12 Bandwidth Rate In

The bandwidth-rate-in metric measures the inbound bandwidth rate (in bytes/second) of each node. We expect this to be around the same for all branches, as no implementation significantly changes the rate at which messages are sent.

For the most part, data points were clustered around the x-axis (indicating a low bandwidth rate), with occasional spikes. The maximum values were 6e4, 6.5e4, 1e5, and 2e5 bytes/second for vanilla, double-hash, prefix-lookup, and provider-encrypt respectively. Note that for provider-encrypt, only one node reached 2e5 bytes/second, and it was significantly higher than any other node. Excluding that outlier, provider-encrypt had a maximum rate of around 4e4 bytes/second, which was actually the lowest of all implementations. Excluding the outliers, there is no significant difference in maximum values between the branches.

The average values vary from 1614 bytes/s to 3514 bytes/s. The provider-encrypt branch was actually lower than the vanilla branch, at an average of 1799 bytes/s versus 3156 bytes/s. Overall, the average bandwidth rate was not degraded.

<!--
> bandwidth rate in for run1 is found to have a maximum of ~60,000 bytes and run11 has a maximum bandwidth rate of ~14,000 bytes. This is about a 76.7% decrease in the input rate between run1 and run11.

> max bandwidth rate in for run12 is found to be ~12,000 bytes. This is about an 80% decrease between run1 and run12. There is a ~14.3% decrease in bandwidth rate between run11 and run12.

> Although there is a global maximum of 80,000 bytes in run13, the maximum where the majority of the points lie is ~27,000 bytes. Using the local maximum where most points lie, there is a ~55% decrease between run11 and run13. There is a ~55.6% increase between run12 and run13.
-->

### 5.13 Bandwidth Rate Out

The bandwidth-rate-out metric measures the outbound bandwidth rate (in bytes/second) of each node.
We expect this to be around the same for all branches, as no implementation significantly changes the rate at which messages are sent.

For the most part, data points were clustered around the x-axis (indicating a low bandwidth rate), with occasional spikes. The maximum values were 1.2e5, 5.5e5, 1.75e5, and 7e5 bytes/second for vanilla, double-hash, prefix-lookup, and provider-encrypt respectively. Note that all of these values were significant outliers reached by single nodes, while the rates generally stayed much lower, near the x-axis. Excluding the outliers, there is no significant difference between the branches.

The average values vary from 1910 bytes/s to 3614 bytes/s. The provider-encrypt branch was actually lower than the vanilla branch, at an average of 2219 bytes/s versus 3325 bytes/s. Overall, the average bandwidth rate was not degraded.

<!--
> bandwidth rate out for run1 was found to have a maximum of ~120,000 bytes and run11 has a maximum of ~20,000 bytes. This is about an 83.3% decrease in the output rate between run1 and run11.

> max bandwidth rate out for run12 is found to be ~17,500 bytes. This is about an 85.4% decrease in the output rate between run1 and run12. There is a 12.5% decrease in the output rate between run11 and run12.
-->

### 5.14 Varying prefix length lookups

In this section, the prefix length (in bits) of the lookup key was varied for each run. A shorter prefix length provides more anonymity, but may increase the number of hops before a provider record is found: the queried nodes return a list of closer nodes that match the prefix, but those nodes may or may not be closer to the actual desired key (see the back-of-envelope sketch below for how many records a short prefix is expected to match).

The varying prefix length tests (`Prefix Lookup` runs 11-16) show that the maximum number of hops is greatest with the shortest tested prefix length (4 bits), i.e. the shortest prefix may require more hops to reach its destination. The number of hops goes back down to its "expected" value once the prefix length is increased to 8 bits or more, which suggests that, of the prefix lengths we tested, prefix length does not affect hop count as long as it is greater than 4 bits.

Oddly, the CPU usage for run1 (`Vanilla`) was higher than for run11 (4-bit `Prefix Lookup`): the average CPU usage was about 2 percentage points lower in run11 than in run1. CPU usage would be expected to be higher with prefix lookups, but the measurements actually show a decrease.

| Run | avg CpuUsage | avg max CpuUsage | avg min CpuUsage |
| --- | --- | --- | --- |
| run1 | 7.238724809 | 8.434285714 | 6.319705882 |
| run11 | 5.484206949 | 7.099230769 | 4.29875 |

The CPU usage across the varying prefix lengths of 4, 8, 32, 64, and 128 bits did not show any significant changes compared to run 1 (`Vanilla`).

> \* Note: for runs 12-16, there were no significant changes compared to run1. The CPU usage peaks at >70% at the start of run13 and run15, but this is most likely caused by some Docker initialization/setup (the CPU metrics were collected from an arbitrary point near the start of the test, before the DHT nodes `provide`).
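To give a rough sense of why very short prefixes can increase hop counts, the back-of-envelope sketch below estimates how many of the ~1000 provider records in this network are expected to share a prefix of a given bit length with a random target key. This is purely illustrative arithmetic, not measured data.

```go
package main

import "fmt"

func main() {
	// 50 nodes x 20 provider records each = 1000 records in the test network.
	const totalRecords = 1000.0

	// A b-bit prefix matches a random key with probability 1/2^b, so the
	// expected number of records sharing the prefix is totalRecords / 2^b.
	for _, bits := range []uint{4, 8, 16, 32} {
		expected := totalRecords / float64(uint64(1)<<bits)
		fmt.Printf("%3d-bit prefix: ~%.4f matching records expected\n", bits, expected)
	}
}
```

With a 4-bit prefix, dozens of records match, so responses point toward peers close to many unrelated keys, which is consistent with the higher hop count seen in run11; at 8 bits and above only a handful of records match.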
| Run | total avg BWTotalIn [bytes] | max avg BWTotalIn [bytes] | min avg BWTotalIn [bytes] |
| --- | --- | --- | --- |
| run11 | 292441.1653 | 328584.4615 | 235397.5833 |
| run12 | 285333.3848 | 339310.4286 | 237613.95 |
| run13 | N/A | N/A | N/A |
| run14 | 260836.7213 | 288134.4091 | 216600.1905 |
| run15 | 265841.8088 | 303561.8462 | 236569.0714 |
| run16 | 280491.7799 | 312059.4667 | 238306.2143 |

| Run | total avg BWTotalOut [bytes] | max avg BWTotalOut [bytes] | min avg BWTotalOut [bytes] |
| --- | --- | --- | --- |
| run11 | 294467.9067 | 380355.3077 | 221898.7692 |
| run12 | 288335.4362 | 377195.95 | 189855 |
| run13 | N/A | N/A | N/A |
| run14 | 264421.3574 | 369168.7143 | 179440.1 |
| run15 | 268613.2077 | 354313.7857 | 181613.5833 |
| run16 | 283984.8108 | 376995.2143 | 203912.5714 |

| Run | total avg BWRateIn [bytes/s] | max avg BWRateIn [bytes/s] | min avg BWRateIn [bytes/s] |
| --- | --- | --- | --- |
| run11 | 1224.263371 | 2038.971633 | 448.3723341 |
| run12 | 738.9709882 | 1187.754876 | 404.4961928 |
| run13 | N/A | N/A | N/A |
| run14 | 1254.429926 | 4801.399458 | 647.676048 |
| run15 | 1768.107889 | 3581.510489 | 901.6549378 |
| run16 | 1020.68957 | 1973.012653 | 417.27045 |

| Run | total avg BWRateOut [bytes/s] | max avg BWRateOut [bytes/s] | min avg BWRateOut [bytes/s] |
| --- | --- | --- | --- |
| run11 | 1248.366194 | 2983.32498 | 376.7925883 |
| run12 | 781.9028536 | 1674.985666 | 311.5351291 |
| run13 | N/A | N/A | N/A |
| run14 | 1199.18235 | 3128.780435 | 517.5199108 |
| run15 | 1776.750549 | 3685.12935 | 685.2725313 |
| run16 | 1008.442029 | 1856.110441 | 398.2711893 |

> \* Note: the averages listed here are the averages of the metric for each node. The "total avg" is the average of all node averages, the "min avg" is the minimum node average, and the "max avg" is the maximum node average found.

## 6. Conclusions

Overall, performance was not significantly degraded by the implementation of `double hashing`, `prefix lookups`, and `provider encryption`.

The main metric of note is time-to-provide, which increased by 2.29x for provider encryption compared to `vanilla`, and by 1.83x compared to `prefix lookups`. The maximum number of hops does not increase until the prefix lookup bit length is <8.

In terms of finding providers (measured by the `time-to-find` metric), provider encryption had a 2.67x increase in average time over `vanilla` and a 2.03x increase over prefix-lookup, suggesting decryption can significantly increase the time to find providers. However, the time to find the first provider (`time-to-find-first`) was not degraded.

CPU usage, bandwidth, thread usage, and the number of providers found did not show any significant changes.

## 7. Appendix

- Test-plan Implementation: https://github.com/ChainSafe/test-plans/tree/araska/updatesdkgo
- Testground Metrics Parser Implementation: https://github.com/araskachoi/testground-parser
- Test result additional graphs: https://github.com/araskachoi/testground-parser/tree/master/images
    - includes graphs for each run and each metric
- Test Results Spreadsheet: https://docs.google.com/spreadsheets/d/1ywNNrEQxqDDfTMi7SYh57_w-kh6ZCURdzpEdt64maiM/
    - includes average values for each node for each run and metric
    - also includes average/min/max values of each of these averages