# Proposal: gather /status/ telemetry
## What question will this help us answer?
The general use is to get a feel for what a "typical" Pulp3 installation looks like.
The most important question being answered is "What specific plugin-versions are in use?" This will start giving us insight into our community's upgrade pace, and let us gauge plugin popularity.
We'll be able to start gauging what a typical/standard/median "size" of a Pulp3 installation is, by learning:
* How many content-apps are in use?
* How many hosts do these content-apps run on?
* How many workers are in use?
* How many hosts do the workers run on?
~~* How much disk is available~~
Finally, we'll learn how important "redis" is to the community, since /status/ reports on whether it is in use or not.
## What is a specific example of the data to be gathered?
```
$ http :/pulp/api/v3/status/
{
"database_connection": {
"connected": true
},
"online_content_apps": [
{
"last_heartbeat": "2022-01-26T18:58:36.372233Z",
"name": "49030@pulp3-source-fedora34.padre-fedora.example.com"
},
{
"last_heartbeat": "2022-01-26T18:58:36.532093Z",
"name": "49031@pulp3-source-fedora34.padre-fedora.example.com"
}
],
"online_workers": [
{
"current_task": null,
"last_heartbeat": "2022-01-26T18:58:39.618259Z",
"name": "49007@pulp3-source-fedora34.padre-fedora.example.com",
"pulp_created": "2022-01-26T15:17:47.865912Z",
"pulp_href": "/pulp/api/v3/workers/202beb44-4c54-48d9-a3f1-c671c406310e/"
},
{
"current_task": null,
"last_heartbeat": "2022-01-26T18:58:39.655667Z",
"name": "49017@pulp3-source-fedora34.padre-fedora.example.com",
"pulp_created": "2022-01-26T15:17:48.479940Z",
"pulp_href": "/pulp/api/v3/workers/4d3bfd4c-1bd2-4930-8c7b-15e800bec3e0/"
}
],
"redis_connection": {
"connected": true
},
"storage": {
"free": 31851548672,
"total": 42006183936,
"used": 7990427648
},
"versions": [
{
"component": "core",
"version": "3.18.0.dev"
},
{
"component": "file",
"version": "1.11.0.dev"
},
{
"component": "rpm",
"version": "3.18.0.dev"
},
{
"component": "container",
"version": "2.11.0.dev"
},
{
"component": "deb",
"version": "2.18.0.dev"
},
{
"component": "certguard",
"version": "1.6.0.dev"
},
{
"component": "pulp_2to3_migration",
"version": "0.16.0.dev"
}
]
}
```
## How will this metric be stored (in the database or gathered at runtime)?
* Gathered directly from the /status/ endpoint at telemetry-send-time.
## Will the gathering and/or storage of this cause unacceptable burden/load on Pulp?
* No - /status/ is a very low-impact API.
## Is this metric Personally-Identifiable-Data?
* **YES** - the `name` of each `online_content_apps` and `online_workers` entry includes the machine-name, which can carry personally-identifiable data. These will need to be sanitized.
### How can we sanitize this output?
* remove the "name" field (it doesn't teach us anything)
* replace `"name": <actual-process-id>@<actual-host>` with `"name": PID@HOST`
* replace `"name": <actual-process-id>@<actual-host>` with `"name": PID@sha256(actual-host-name)`
    * this would let us track number-of-unique-hosts, without knowing the hostname
* record only number-of-processes/number-of-hosts
    * requires a little more processing
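
The two "replace" options above could be sketched roughly as follows. This is only an illustration, not the agreed implementation: the helper names `sanitize_name` and `unique_host_count` are hypothetical, and it assumes every `name` has the `<process-id>@<host>` shape shown in the example output. Note that an unsalted hash of a guessable hostname can still be reversed by trying candidate names, so this hides the hostname from casual inspection only.

```python
import hashlib


def sanitize_name(name, hash_host=True):
    """Return a sanitized form of a '<process-id>@<host>' name.

    Hypothetical helper: with hash_host=False it yields the literal
    'PID@HOST'; with hash_host=True it yields 'PID@<sha256 of host>'.
    """
    # partition() splits on the first '@'; the process id is discarded.
    _pid, _sep, host = name.partition("@")
    if not hash_host:
        return "PID@HOST"
    digest = hashlib.sha256(host.encode("utf-8")).hexdigest()
    return f"PID@{digest}"


def unique_host_count(names):
    """Count distinct hosts using only the hashed names."""
    return len({sanitize_name(n) for n in names})
```

With the two worker names from the example output, both entries hash to the same value, so `unique_host_count` would report one host without the hostname ever leaving the installation.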
## What pulpcore version will this be collected with?
* 3.19
## Discussion
* keeping unique-host-info vs counts
* can we tell plugins-per-host? is it useful? is it even possible?
    * currently don't/can't do this
    * would allow scaling-control
* what about artifact/content "sizes"?
    * yes! needs its own proposal - volunteers?
## Is this approved/not-approved?
* accept alternative proposal:
* Aye: 6
* Nay: 0
## Alternative Proposal
* Lose db/redis info
    * "on" isn't useful - "version" is
    * should be their own telemetry
* Lose storage
    * not very useful
    * when connected to object-storage, not very useful
    * should also be its own telemetry option
* Summary info
    * change to count processes and hosts
    * see better questions at top, sanitization section
### Proposed alternate telemetry data
```
{
"online_content_apps": {
"processes": 2,
"hosts": 1
},
"online_workers": {
"processes": 2,
"hosts": 1
},
"versions": [
{
"component": "core",
"version": "3.18.0.dev"
},
{
"component": "file",
"version": "1.11.0.dev"
},
{
"component": "rpm",
"version": "3.18.0.dev"
},
{
"component": "container",
"version": "2.11.0.dev"
},
{
"component": "deb",
"version": "2.18.0.dev"
},
{
"component": "certguard",
"version": "1.6.0.dev"
},
{
"component": "pulp_2to3_migration",
"version": "0.16.0.dev"
}
]
}
```
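
As a rough sketch of how the summary above might be derived from a parsed /status/ response: the function name `build_telemetry_payload` is hypothetical, and the code assumes each `name` is `<process-id>@<host>` as in the earlier example. It drops the database, redis, and storage sections and keeps only counts and versions.

```python
def build_telemetry_payload(status):
    """Reduce a parsed /status/ response (a dict) to the summary form.

    Hypothetical helper; key names follow the example output above.
    """

    def summarize(entries):
        # The host is everything after the first '@' in the name.
        hosts = {entry["name"].partition("@")[2] for entry in entries}
        return {"processes": len(entries), "hosts": len(hosts)}

    return {
        "online_content_apps": summarize(status.get("online_content_apps", [])),
        "online_workers": summarize(status.get("online_workers", [])),
        # Copy only the two fields we intend to report.
        "versions": [
            {"component": v["component"], "version": v["version"]}
            for v in status.get("versions", [])
        ],
    }
```

Because only counts leave the installation, no hostname (hashed or otherwise) appears in the payload.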
## Parking Lot for potential future/RFE work
* can we tell plugins-per-host? is it useful? is it even possible?
    * currently don't/can't do this
    * would allow scaling-control
* Determining clusters solutions
    * Pulp instances that are scaled out horizontally, how could that be visualised (give away in the status?)
    * Unique pulp instances that form part of a "cluster" from a client's perspective (does that matter, probably not)
###### tags: `Telemetry`
## Graphs to be produced
* How many unique systems are there?
    * represent as a line graph over time (count of total unique systems)
* Versions per component
    * For each component, e.g. rpm, certguard, pulpcore
        * use a pie chart to show the version distribution for that component
* Bar graph reports the number of users per component
    * Regardless of version
    * how to distinguish whether this is a single container installation?
* Average hosts
    * online_content_app hosts summarized into a single average, and graphed as a timeseries
    * online_workers hosts summarized into a single average, and graphed as a timeseries
* Average processes
    * same as above, only for processes
* Average processes / host
    * same as above, only for average processes / host
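
The aggregations behind these graphs might look something like the sketch below. It assumes each report has the summary shape from the alternate proposal; the function names are hypothetical, and "average processes / host" is interpreted here as the mean of each installation's own ratio, which is only one possible reading.

```python
from collections import Counter, defaultdict
from statistics import mean


def version_distribution(payloads):
    """Map each component to a Counter of reported versions (pie-chart input)."""
    dist = defaultdict(Counter)
    for payload in payloads:
        for item in payload["versions"]:
            dist[item["component"]][item["version"]] += 1
    return dist


def average_counts(payloads, key):
    """Average hosts, processes, and processes-per-host for one section.

    `key` is 'online_content_apps' or 'online_workers'. Reports with zero
    hosts are skipped for the ratio to avoid dividing by zero. Raises
    statistics.StatisticsError if there is nothing to average.
    """
    sections = [p[key] for p in payloads]
    return {
        "hosts": mean(s["hosts"] for s in sections),
        "processes": mean(s["processes"] for s in sections),
        "processes_per_host": mean(
            s["processes"] / s["hosts"] for s in sections if s["hosts"]
        ),
    }
```

Running `average_counts` once per reporting interval would give the points for the timeseries graphs.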