# Throughput of Pulp 3.12.dev tasking system

## System Under Test

`pulpcore==ee0684af`, on a vanilla Vagrant VM (`pulp3-source-fedora33`) with:

* 6GB RAM
* 4 CPUs

## Test

The tasks are a `do_nothing` task which contains a single `pass` statement.

```diff
diff --git a/pulpcore/app/tasks/benchmark.py b/pulpcore/app/tasks/benchmark.py
new file mode 100644
index 000000000..84110d0d2
--- /dev/null
+++ b/pulpcore/app/tasks/benchmark.py
@@ -0,0 +1,2 @@
+def do_nothing():
+    pass
```

Dispatch 500 `do_nothing` tasks, each requiring a unique lock (a uuid). Perform the test against an empty database and an otherwise idle Pulp system. Vary the number of workers.

Measure:

* The dispatch throughput, in tasks dispatched per second. This is measured from when dispatching starts to when it stops.
* The task throughput, in tasks completed per second. This is measured from when dispatching starts to when the newest Task finishes.

This is accomplished with [this script](https://gist.github.com/bmbouter/3e6ca6723a54978c340afbf4aecb63e3), which produces example output like the following (a hypothetical sketch of the approach appears in the appendix at the end of this page):

```
dispatched 500 in 2.943 seconds 169.893 task/sec
completed 500 in 184.554 seconds 2.709 task/sec
```

## Vary the workers

We want to understand the relationship between worker count and tasking throughput. Since every task requires a unique lock, no task should block another, so we expect task throughput to increase as the worker count increases.

## Results

| Workers | Tasks | Dispatch Throughput (tasks/sec) | Task Throughput (tasks/sec) |
| ------- | ----- | ------------------------------- | --------------------------- |
| 1       | 500   | 153.487                         | 2.625                       |
| 2       | 500   | 169.893                         | 2.709                       |
| 4       | 500   | 162.326                         | 2.584                       |
| 8       | 500   | 167.092                         | 2.576                       |
| 16      | 500   | 158.606                         | 2.432                       |

## Conclusion

1. Tasking throughput barely increases as the worker count is increased. Even under the best of conditions, the tasking system's architecture has a maximum throughput of ~2.6 tasks / sec.
2. We can dispatch tasks quickly (~160 tasks / sec) even with a single process dispatching them serially.

## Why?

All tasks route through the resource manager, which is the architectural choke-point. Regardless of how many workers you have, every task has to be processed by the single resource manager worker.

## How could this be considered good?

During normal operation tasks take minutes to tens of minutes to run, e.g. sync tasks. For example, assume tasks take 120 seconds on average. A task-processing throughput of 2.6 tasks / second then means you would likely benefit from roughly 312 workers before running into the architectural limit. One way to think about this is that the resource manager routes a task every 0.385 seconds, so it has 120 seconds to make it "back around" to the first worker right as it finishes, and 120 / 0.385 ≈ 312 routings fit in that window. At that rate, the system could keep roughly 312 workers busy (this arithmetic is repeated as a snippet in the appendix).

###### tags: `Tasking System`
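
## Appendix: illustrative sketches

The gist linked above is the authoritative benchmark driver. As a rough illustration of the approach only, here is a minimal sketch, assuming the pre-3.15 `enqueue_with_reservation` dispatch API and polling of the `Task` model; the details here are assumptions and may not match the gist.

```python
# Hypothetical benchmark driver sketch (e.g. run from a Django shell inside
# the Pulp environment). Assumes pulpcore's pre-3.15
# `enqueue_with_reservation` API; see the linked gist for the real script.
import time
import uuid

from pulpcore.app.models import Task
from pulpcore.app.tasks.benchmark import do_nothing
from pulpcore.tasking.tasks import enqueue_with_reservation

N = 500

start = time.time()
for _ in range(N):
    # A fresh uuid per task means no two tasks contend for the same lock.
    enqueue_with_reservation(do_nothing, [str(uuid.uuid4())])
dispatched = time.time()
print(f"dispatched {N} in {dispatched - start:.3f} seconds "
      f"{N / (dispatched - start):.3f} task/sec")

# Poll until every task has left the non-final states.
while Task.objects.filter(state__in=["waiting", "running"]).exists():
    time.sleep(1)
done = time.time()
print(f"completed {N} in {done - start:.3f} seconds "
      f"{N / (done - start):.3f} task/sec")
```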
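
The worker-count estimate in "How could this be considered good?" is just Little's law (concurrency = throughput × latency). A quick sanity check, using the 120-second average task duration assumed above:

```python
# Back-of-the-envelope: how many workers can a router limited to ~2.6
# tasks/sec keep busy if the average task runs for 120 seconds?
throughput = 2.6          # tasks routed per second (measured ceiling)
avg_task_seconds = 120    # assumed average task duration
routing_interval = 1 / throughput              # ~0.385 s between routings
workers = avg_task_seconds / routing_interval  # equivalently 2.6 * 120
print(f"{routing_interval:.3f} s per routing -> ~{workers:.0f} workers")
# -> 0.385 s per routing -> ~312 workers
```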