Proposal: gather /status/ telemetry
What question will this help us answer?
The general use is to get a feel for what a "typical" Pulp3 installation looks like.
The most important question being answered is "What specific plugin-versions are in use?" This will start giving us insight into our community's upgrade-pace, and let us gauge plugin-popularity.
We'll be able to start gauging what a typical/standard/median "size" of a Pulp3 installation is, by learning:
- How many content-apps are in use?
- How many hosts do these content-apps run on?
- How many workers are in use?
- How many hosts do the workers run on?
- How much disk is available?
Finally, we'll learn how important "redis" is to the community, since /status/ reports on whether it is in use or not.
What is a specific example of the data to be gathered?
How will this metric be stored (in the database or gathered at runtime)?
- Gathered directly from the /status/ endpoint at telemetry-send-time.
Will the gathering and/or storage of this cause unacceptable burden/load on Pulp?
- No - /status/ is a very low-impact API.
Is this metric Personally-Identifiable-Data?
- YES - online_content_apps and online_workers names include machine-names, which can carry Personally-Identifiable-Data. These will need to be sanitized.
How can we sanitize this output?
- remove the "name" field (it doesn't teach us anything)
- replace "name": <actual-process-id>@<actual-host> with "name": PID@HOST
- replace "name": <actual-process-id>@<actual-host> with "name": PID@sha256(actual-host-name)
- this would let us track number-of-unique-hosts, without knowing the hostname
- record only number-of-processes/number-of-hosts
- requires a little more processing
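The last two sanitizing options above can be sketched roughly as follows. This is a minimal illustration, not pulpcore code: the helper names sanitize_name and summarize are hypothetical, and the sample entries only assume that each "name" has the <process-id>@<host> shape described above.

```python
import hashlib


def sanitize_name(name):
    """Replace '<pid>@<host>' with 'PID@<sha256 hex digest of host>'."""
    # Assumption: everything after the first "@" is the hostname.
    _pid, _, host = name.partition("@")
    digest = hashlib.sha256(host.encode("utf-8")).hexdigest()
    return f"PID@{digest}"


def summarize(entries):
    """Keep only number-of-processes and number-of-unique-hosts."""
    hosts = {entry["name"].partition("@")[2] for entry in entries}
    return {"processes": len(entries), "hosts": len(hosts)}


# Hypothetical sample shaped like an online_workers list.
workers = [
    {"name": "101@host-a.example.com"},
    {"name": "102@host-a.example.com"},
    {"name": "7@host-b.example.com"},
]

sanitized = [sanitize_name(w["name"]) for w in workers]
summary = summarize(workers)
```

Both workers on the same host map to the same sanitized name, so unique hosts stay countable. Note that an unsalted hash of a guessable hostname can still be matched against a dictionary of candidate names, which may be an argument for the counts-only option.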
What pulpcore version will this be collected with?
Discussion
- keeping unique-host-info vs counts
- can we tell plugins-per-host? is it useful? is it even possible?
- currently don't/can't do this
- would allow scaling-control
- what about artifact/content "sizes"?
- yes! needs its own proposal - volunteers?
Is this approved/not-approved?
- accept alternative proposal:
Alternative Proposal
- Lose db/redis info
- "on" isn't useful - "version" is
- should be their own telemetry
- Lose storage
- not very useful
- when connected to object-storage, not very useful
- should also be its own telemetry option
- Summary info
- change to count process and hosts
- see better questions at top, sanitation section
Proposed alternate telemetry data
Parking Lot for potential future/RFE work
- can we tell plugins-per-host? is it useful? is it even possible?
- currently don't/can't do this
- would allow scaling-control
- Determining clusters solutions
- Pulp instances that are scaled out horizontally, how could that be visualised (give away in the status?)
- Unique Pulp instances that form part of a "cluster" from a client's perspective (does that matter? probably not)
Graphs to be produced
- How many unique systems there are?
- represent as a line graph over time (count of total unique systems)
- Versions per component
- For each component, e.g. rpm, certguard, pulpcore
- use a pie chart to show the version distribution for that component
- Bar graph reports the number of users per component
- Regardless of version
- how to distinguish whether this is a single container installation?
- Average hosts
- online_content_apps hosts summarized into a single average, and graphed as a timeseries
- online_workers hosts summarized into a single average, and graphed as a timeseries
- Average processes
- same as above, only for processes
- Average processes / host
- same as above, only for average processes / host
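As a rough sketch of how the per-component version distribution and the three averages above might be computed from a batch of sanitized reports: the report layout used here ("versions", "processes", "hosts") is an assumption made for illustration, not a settled telemetry format, and "average processes / host" is interpreted as the mean of each installation's own ratio.

```python
from collections import Counter, defaultdict
from statistics import mean


def version_distribution(reports):
    """Count installations per version, grouped by component."""
    dist = defaultdict(Counter)
    for report in reports:
        for component, version in report["versions"].items():
            dist[component][version] += 1
    return dist


def averages(reports):
    """Average hosts, processes, and processes per host across reports."""
    # Assumes at least one report with a non-zero host count.
    per_host = [r["processes"] / r["hosts"] for r in reports if r["hosts"] > 0]
    return {
        "avg_hosts": mean(r["hosts"] for r in reports),
        "avg_processes": mean(r["processes"] for r in reports),
        "avg_processes_per_host": mean(per_host),
    }


# Hypothetical reports; version strings are made up for the example.
reports = [
    {"versions": {"pulpcore": "3.20", "rpm": "3.18"}, "processes": 4, "hosts": 2},
    {"versions": {"pulpcore": "3.20"}, "processes": 8, "hosts": 2},
    {"versions": {"pulpcore": "3.21", "rpm": "3.18"}, "processes": 3, "hosts": 1},
]

dist = version_distribution(reports)
stats = averages(reports)
```

Here dist["pulpcore"] would feed the pie chart for pulpcore, the sum of each component's counts gives the users-per-component bar graph, and one stats value per collection period gives a point on each timeseries.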