# Pulp3 Concurrency impact on sync-performance
## Introduction
Syncing content into Pulp3 is a time-consuming process. Fortunately, one can specify that content-artifacts be retrieved in parallel, in order to minimize time by maximizing bandwidth utilization. This paralellism is accomplished by specifying the `download_concurrency` value on a given Remote.
Pulp3's default value for `download_concurrency` is 20. This can put an undue burden on the server the remote is retrieving content from - sometimes to the point of causing such a server to begin throttling (or outright **failing**)Pulp3's requests.
Rather than blindly changing that default, we would like to have some idea of the performance impact of various settings of `download_concurrency`, as noted in Redmine issue [7212](https://pulp.plan.io/issues/7212)
This document describes a set of tests, their environment, and some results, in search of that impact.
## Methodology
* Pick a repository of significant size and complexity
* http://mirror.fileplanet.com/centos/7/os/x86_64/
* Sync its content (using sync-immediate) into a 'clean' Pulp3 instance, at `download_concurrency` values of **20, 15, 10, 7, 5, and 3**
* For each level, repeat the sync at least three times to average out general-internet-nondeterminism
* Find the average sync-time for a given concurrency level.
## Environment
* Hardware
* Intel Core i7-6700 (4Ghzx8)
* 32Gb memory
* 1TB Western Digital 7200RPM drive
* Google Fibre network connection, ~900Mbps up/down
* local network has frequent high internet use (Twitch streaming, video, high-bandwidth gaming, etc.)
* system under test was not dedicated to the test process, and was running other loads
* Software
* Fedora 31
* tests run on the pulplift vagrant box, pulp3-source-fedora31
* pulpcore and pulp_rpm master as of 2020-07-25
## Caveats
* The higher the concurrency, the heavier the load on the Pulp3 instance
* Connection-bandwidth of the Pulp3 instance is a limiting factor no matter how high concurrency is set
* Connection-bandwidth of the source server is a limiting factor no matter how high concurrency is set
* Internet connectivity is subject to arbitrary changes
## Results
Decreasing concurrency by a factor of **four** (from twenty to five) did not quite **double** the sync-time (10:39 to 18:15).
Reducing it to three results in nearly **tripling** the sync-time (to 29:12)
Runs were executed on 2020-07-25 and 07-26:
![Pulp3 Concurrency Results](https://i.imgur.com/ZlyC1RS.png "Concurrency Test Results")
(Horizontal axis ticks are 20/15/10/7/5/3 concurrency)
As can be seen from the chart, while reducing concurrency does impact performance, there is a distinct knee below `download_concurrency=5`
## Conclusion
In the context of this specific test, on this specific weekend, with this specific hardware - it would seem that a concurrency of **seven** or **five**, while having an impact on sync-time, still results in acceptable results. Going below five starts to have a disproprtionate effect.
**It would therefore appear to be useful to reduce Pulp3's default concurrency to, but not lower than, five, based on these results.**
## Further Experiments
* Repeat the experiment using `on_demand` instead of `immediate`
* Monitor the load-average of the Pulp3 instance under various concurrency levels
* Monitor the Pulp3 instance's memory usage under different concurrency levels
* Monitor the performance of postgresql under different concurrency levels (although the load of moving the actual bits across the network is likely to swamp that metric)
* Test the same sync against different source sites (what impact does latency have? source-bandwidth limitiations?)
* Test when syncing onto an SSD instead of HDD
## Raw Data
```
concurrency,avg duration,duration,start,finish
3,,0:29:07,2020-07-25 16:48:11,2020-07-25 17:17:18
3,,0:27:32,2020-07-25 17:21:37,2020-07-25 17:49:09
3,0:29:12,0:30:56,2020-07-25 19:00:45,2020-07-25 19:31:41
5,,0:18:29,2020-07-24 20:45:03,2020-07-24 21:03:32
5,,0:17:47,2020-07-24 21:41:11,2020-07-24 21:58:59
5,0:18:15,0:18:29,2020-07-25 13:07:23,2020-07-25 13:25:52
7,,0:15:00,2020-07-25 23:15:42,2020-07-25 23:30:42
7,,0:15:36,2020-07-26 0:45:38,2020-07-26 1:01:14
7,0:15:30,0:15:53,2020-07-26 3:01:19,2020-07-26 3:17:12
10,,0:11:56,2020-07-25 14:38:27,2020-07-25 14:50:23
10,,0:15:00,2020-07-25 15:01:04,2020-07-25 15:16:04
10,0:13:33,0:13:43,2020-07-25 15:53:02,2020-07-25 16:06:45
15,,0:12:46,2020-07-25 21:43:20,2020-07-25 21:56:06
15,,0:12:44,2020-07-25 22:20:47,2020-07-25 22:33:31
15,0:12:31,0:12:04,2020-07-25 22:37:57,2020-07-25 22:50:01
20,,0:10:36,2020-07-24 18:51:45,2020-07-24 19:02:21
20,,0:10:58,2020-07-24 20:14:53,2020-07-24 20:25:50
20,0:10:39,0:10:23,2020-07-24 20:27:51,2020-07-24 20:38:14
```