Proposal: gather /status/ telemetry

What question will this help us answer?

The general use is to get a feel for what a "typical" Pulp3 installation looks like.

The most important question being answered is "What specific plugin-versions are in use?" This will start giving us insight into our community's upgrade pace, and let us gauge plugin popularity.

We'll be able to start gauging what a typical/standard/median "size" of a Pulp3 installation is, by learning:

  • How many content-apps are in use?
  • How many hosts do these content-apps run on?
  • How many workers are in use?
  • How many hosts do the workers run on?
  • How much disk is available?

Finally, we'll learn how important "redis" is to the community, since /status/ reports on whether it is in use or not.

What is a specific example of the data to be gathered?

$ http :/pulp/api/v3/status/
{
    "database_connection": {
        "connected": true
    },
    "online_content_apps": [
        {
            "last_heartbeat": "2022-01-26T18:58:36.372233Z",
            "name": "49030@pulp3-source-fedora34.padre-fedora.example.com"
        },
        {
            "last_heartbeat": "2022-01-26T18:58:36.532093Z",
            "name": "49031@pulp3-source-fedora34.padre-fedora.example.com"
        }
    ],
    "online_workers": [
        {
            "current_task": null,
            "last_heartbeat": "2022-01-26T18:58:39.618259Z",
            "name": "49007@pulp3-source-fedora34.padre-fedora.example.com",
            "pulp_created": "2022-01-26T15:17:47.865912Z",
            "pulp_href": "/pulp/api/v3/workers/202beb44-4c54-48d9-a3f1-c671c406310e/"
        },
        {
            "current_task": null,
            "last_heartbeat": "2022-01-26T18:58:39.655667Z",
            "name": "49017@pulp3-source-fedora34.padre-fedora.example.com",
            "pulp_created": "2022-01-26T15:17:48.479940Z",
            "pulp_href": "/pulp/api/v3/workers/4d3bfd4c-1bd2-4930-8c7b-15e800bec3e0/"
        }
    ],
    "redis_connection": {
        "connected": true
    },
    "storage": {
        "free": 31851548672,
        "total": 42006183936,
        "used": 7990427648
    },
    "versions": [
        {
            "component": "core",
            "version": "3.18.0.dev"
        },
        {
            "component": "file",
            "version": "1.11.0.dev"
        },
        {
            "component": "rpm",
            "version": "3.18.0.dev"
        },
        {
            "component": "container",
            "version": "2.11.0.dev"
        },
        {
            "component": "deb",
            "version": "2.18.0.dev"
        },
        {
            "component": "certguard",
            "version": "1.6.0.dev"
        },
        {
            "component": "pulp_2to3_migration",
            "version": "0.16.0.dev"
        }
    ]
}

How will this metric be stored (in the database or gathered at runtime)?

  • Gathered directly from the /status/ endpoint at telemetry-send-time.
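
A minimal sketch of what "gathered at telemetry-send-time" could look like, assuming the status endpoint is reachable over plain HTTP at the path shown in the example above. The function name `gather_status`, its `base_url` parameter, and the injectable `opener` argument are hypothetical and exist only for illustration; they are not part of pulpcore.

```python
import json
import urllib.request


def gather_status(base_url, opener=urllib.request.urlopen):
    """Fetch and decode the /status/ payload once, at send time.

    `base_url` (for example "http://localhost:24817") is an assumption;
    `opener` is injectable so the call can be exercised without a network.
    """
    url = base_url.rstrip("/") + "/pulp/api/v3/status/"
    with opener(url) as response:
        # Nothing is cached or stored: the payload is read fresh each time.
        return json.loads(response.read().decode("utf-8"))
```

Because nothing is persisted, the only cost to Pulp is one extra request to /status/ per telemetry report.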

Will the gathering and/or storage of this cause unacceptable burden/load on Pulp?

  • No - /status/ is a very low-impact API.

Is this metric Personally-Identifiable-Data?

  • YES - the names in online_content_apps and online_workers include machine-names, which can carry personally identifiable data. These will need to be sanitized.

How can we sanitize this output?

  • remove the "name" field (it doesn't teach us anything)
  • replace "name": <actual-process-id>@<actual-host> with "name": PID@HOST
  • replace "name": <actual-process-id>@<actual-host> with "name": PID@sha256(actual-host-name)
    • this would let us track number-of-unique-hosts, without knowing the hostname
  • record only number-of-processes/number-of-hosts
    • requires a little more processing
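
The last two options above can be sketched as follows. This is an illustration only, assuming every name has the `<process-id>@<host>` shape shown in the example output; the helper names `sanitize_name` and `count_processes_and_hosts` are hypothetical.

```python
import hashlib


def sanitize_name(name):
    """Turn '<process-id>@<host>' into 'PID@sha256(<host>)'.

    The process id is dropped; the host is replaced by its SHA-256 hex
    digest, so equal hosts still map to equal values.
    """
    _pid, _, host = name.partition("@")
    digest = hashlib.sha256(host.encode("utf-8")).hexdigest()
    return f"PID@{digest}"


def count_processes_and_hosts(entries):
    """Reduce a list of {'name': '<process-id>@<host>'} entries to counts."""
    hosts = {entry["name"].partition("@")[2] for entry in entries}
    return {"processes": len(entries), "hosts": len(hosts)}
```

Hashing keeps the number of unique hosts recoverable, but a plain hash of a guessable hostname can still be matched by anyone who guesses the name. The counts-only option avoids sending any per-host value at all.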

What pulpcore version will this be collected with?

  • 3.19

Discussion

  • keeping unique-host-info vs counts
    • can we tell plugins-per-host? is it useful? is it even possible?
    • currently don't/can't do this
      • would allow scaling-control
  • what about artifact/content "sizes"?
    • yes! needs its own proposal - volunteers?

Is this approved/not-approved?

  • accept alternative proposal:
    • Aye: 6
    • Nay: 0

Alternative Proposal

  • Lose db/redis info
    • "on" isn't useful - "version" is
    • should be their own telemetry
  • Lose storage
    • not very useful
    • when connected to object-storage, not very useful
    • should also be its own telemetry option
  • Summary info
    • change to count process and hosts
      • see better questions at top, sanitization section

Proposed alternate telemetry data

{
    "online_content_apps": {
        "processes": 2,
        "hosts": 1
    },
    "online_workers": {
        "processes": 2,
        "hosts": 1
    },
    "versions": [
        {
            "component": "core",
            "version": "3.18.0.dev"
        },
        {
            "component": "file",
            "version": "1.11.0.dev"
        },
        {
            "component": "rpm",
            "version": "3.18.0.dev"
        },
        {
            "component": "container",
            "version": "2.11.0.dev"
        },
        {
            "component": "deb",
            "version": "2.18.0.dev"
        },
        {
            "component": "certguard",
            "version": "1.6.0.dev"
        },
        {
            "component": "pulp_2to3_migration",
            "version": "0.16.0.dev"
        }
    ]
}
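
One way the alternate payload could be derived from a full /status/ response is sketched below. It assumes the response shape shown in the earlier example; `summarize_status` is a hypothetical helper name, not an existing pulpcore function.

```python
def summarize_status(status):
    """Build the reduced telemetry payload from a /status/ response dict.

    Database, redis, and storage information is dropped; process names are
    collapsed into process and host counts.
    """

    def summarize(entries):
        # Names look like '<process-id>@<host>'; only the host part is compared.
        hosts = {entry["name"].partition("@")[2] for entry in entries}
        return {"processes": len(entries), "hosts": len(hosts)}

    return {
        "online_content_apps": summarize(status.get("online_content_apps", [])),
        "online_workers": summarize(status.get("online_workers", [])),
        "versions": [
            {"component": v["component"], "version": v["version"]}
            for v in status.get("versions", [])
        ],
    }
```

No hostname or process id leaves the function, so the result needs no further sanitizing.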

Parking Lot for potential future/RFE work

  • can we tell plugins-per-host? is it useful? is it even possible?
    • currently don't/can't do this
      • would allow scaling-control
  • Determining cluster solutions
    • Pulp instances that are scaled out horizontally: how could that be visualised (is it given away in the status?)
    • Unique Pulp instances that form part of a "cluster" from a client's perspective (does that matter? probably not)
tags: Telemetry

Graphs to be produced

  • How many unique systems are there?
    • represent as a line graph over time (count of total unique systems)
  • Versions per component
    • For each component, e.g. rpm, certguard, pulpcore
      • use a pie chart to show the version distribution for that component
  • Bar graph reporting the number of users per component
    • Regardless of version
    • how to distinguish whether this is a single container installation?
  • Average hosts
    • online_content_apps hosts summarized into a single average, and graphed as a timeseries
    • online_workers hosts summarized into a single average, and graphed as a timeseries
  • Average processes
    • same as above, only for processes
  • Average processes / host
    • same as above, only for average processes / host
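
The aggregations behind these graphs could be computed roughly as below. This is a sketch, assuming each report has the shape of the proposed alternate telemetry data. The function names are hypothetical, and "average processes / host" is interpreted here as the mean of each installation's own ratio, which is only one possible reading.

```python
from collections import Counter, defaultdict
from statistics import mean


def version_distribution(reports):
    """Per component, count how many reports carry each version (pie charts)."""
    dist = defaultdict(Counter)
    for report in reports:
        for v in report["versions"]:
            dist[v["component"]][v["version"]] += 1
    return {component: dict(counts) for component, counts in dist.items()}


def installs_per_component(reports):
    """Count reports that include each component, regardless of version."""
    counts = Counter()
    for report in reports:
        counts.update({v["component"] for v in report["versions"]})
    return dict(counts)


def averages(reports, key):
    """Average hosts, processes, and processes/host for one summary key.

    `key` is "online_content_apps" or "online_workers". Reports with zero
    hosts are skipped for the ratio; empty inputs yield None.
    """
    procs = [r[key]["processes"] for r in reports]
    hosts = [r[key]["hosts"] for r in reports]
    ratios = [p / h for p, h in zip(procs, hosts) if h]
    return {
        "hosts": mean(hosts) if hosts else None,
        "processes": mean(procs) if procs else None,
        "processes_per_host": mean(ratios) if ratios else None,
    }
```

Running these over the reports received in each time bucket (for example, per day) would give one data point per bucket for the timeseries graphs.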