
Analytics Working Group

## YYYY-MM-DD
### Attendees:
### Prev AIs
### Agenda
### AIs
### Links

Ongoing Notes

  • first three mtgs will be one hour
  • going forward, 30 min on the half-hour

List of Questions for Every Metric To Be Gathered

  • What question will this help us answer?
  • What is a specific example of the data to be gathered?
  • How will this metric be stored in the database or gathered at runtime?
  • Will the gathering and/or storage of this cause unacceptable burden/load on Pulp?
  • Is this metric Personally-Identifiable-Data?
  • What pulpcore version will this be collected with?
  • Is this approved/not-approved?

Analytics-proposal Template

# Title
## What question will this help us answer?
## What is a specific example of the data to be gathered?
## How will this metric be stored in the database or gathered at runtime?
## Will the gathering and/or storage of this cause unacceptable burden/load on Pulp?
## Is this metric Personally-Identifiable-Data?
### How can we sanitize this output?
## What pulpcore version will this be collected with?
## Alternative proposal(s)
### Option 1
### Option N
## Discussion notes
## Is this approved/not-approved?
## Parking Lot for potential future/RFE work
###### tags: `Analytics`

Open Questions

  • Do we want to compute processes / host also?

2022-12-01

Attendees: bmbouter ppicka mdellweg dkliban wibbit ggainey

Agenda

2022-10-20

Attendees: bmbouter ppicka mdellweg dkliban wibbit ggainey

Agenda

  • Here’s a new set of graphs to look at accepting from @mdellweg
  • Here’s a proposal to collect, summarize, and visualize the postgresql version, which would be a new metric. This is going to be the “live coding” part that I do at Pulpcon to add it.
    • https://hackmd.io/zJ1dJe8qQtmzr0JiM1jptw
    • discussion around "how do we want to summarize"
      • e.g., is X.Y.Z really interesting?
      • We want to summarize "versions that matter"
    • side discussion: format/organization of main visualization page would be A Good Thing
  • FYI: lots of new docs here, including importing data from the production site
  • Should we be limiting summaries to only systems with at least 2 checkins?
    • "yes please" is the consensus
  • Proposal: Add a “summarization” and “visualization” sections to the “proposal template”
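The “at least 2 check-ins” consensus above amounts to a filter over check-in counts before summarizing. A minimal sketch (function and argument names are illustrative, not Pulp’s actual code):

```python
from collections import Counter

def systems_to_summarize(checkins, minimum=2):
    """Return the system IDs that have checked in at least `minimum` times.

    `checkins` is an iterable of system IDs, one entry per check-in;
    systems seen fewer than `minimum` times (e.g. throwaway CI installs)
    are excluded from summaries.
    """
    counts = Counter(checkins)
    return {system_id for system_id, n in counts.items() if n >= minimum}
```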

2022-08-25

Attendees: ppicka, ggainey, bmbouter

Agenda

2022-08-18

Attendees: ggainey, dkliban, bmbouters, ipanova, ppicka

Prev AIs

Agenda

  • progress made on finalizing POC
    • demo time!
    • proposal: have "summarizer" delete old content (rather than replace)
    • proposal: have "summarizer" only delete data older-than some window (2 weeks?)
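The two “summarizer” proposals above boil down to a retention window on raw check-ins. A hedged sketch of the idea (names and data shape are assumptions, not the actual summarizer):

```python
from datetime import datetime, timedelta, timezone

def purge_old_checkins(checkins, window_days=14):
    """Drop raw check-ins older than the retention window.

    `checkins` maps a system ID to a list of (timestamp, payload) tuples;
    only entries newer than `window_days` days are kept, per the proposal
    to delete (rather than replace) data older than some window.
    """
    cutoff = datetime.now(timezone.utc) - timedelta(days=window_days)
    return {
        system_id: [(ts, payload) for ts, payload in entries if ts >= cutoff]
        for system_id, entries in checkins.items()
    }
```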

AIs

2022-08-11

Attendees: ggainey, dkliban, ppicka, bmbouters, ipanova, wibbit

Prev AIs

Agenda

  • discussion around https://github.com/pulp/pulpcore/pull/3032
    • def a good idea, prob want this backported to 3.20
  • progress update
    • lots of progress being made, not baked yet
    • lots of interaction w/ duck@osci
      • analytics.pulpproject.org has 2 branches, main and dev
      • auto-deploys to 2 diff OSCI deployments
      • both use LetsEncrypt TLS
      • web-process pod, postgres backend
      • django-admin enabled for superuser controls
    • modification to how payloads are defined
      • consolidates client and server definitions of payload
      • using Google's "Protocol Buffer" approach (q.v.)
      • what about version mismatches?
        • ProtocolBuffer is Opinionated - follow their requirements
    • next steps
      • charting
      • summaries
        • manage.py cmd, to be called by openshift cron every 24 hrs
      • data expiry
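The payload consolidation above uses Google’s Protocol Buffers in the real code; as a language-neutral stand-in for the idea of one payload definition shared by client and server, a sketch might look like this (field names are assumptions):

```python
from dataclasses import asdict, dataclass, field

@dataclass
class Component:
    name: str      # "core" or a plugin name
    version: str

@dataclass
class AnalyticsPayload:
    """A single payload definition, so the posting client and the
    receiving server cannot drift apart (the role protobuf plays)."""
    system_id: str                       # persistent per-installation UUID
    components: list = field(default_factory=list)

    def serialize(self):
        """One serialization used by both sides."""
        return asdict(self)
```

With a real .proto file, version mismatches are handled by following protobuf’s field-numbering rules, per the “ProtocolBuffer is Opinionated” note above.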

AIs

  • bmbouters hoping for a tech demo next mtg

Links

2022-07-21

Attendees:

2022-07-14

Attendees: bmbouters, dkliban, ipanova, ppicka, ggainey

  • Current State
  • PROBLEMS
    • summarization isn't working, investigation isn't getting us past whatever the problem is
    • server-side-code pagination isn't working
    • DNS for analytics-pulpproject-org to become analytics.pulpproject.org would require all of pulpproject.org to be handed over to Cloudflare
      • reverse-proxy is possible; POC works but is suboptimal
  • OSCI asking why we're not just running this on their openshift instance/platform
    • This is a fine question!
  • PROPOSAL
  • discussion ensues
    • reliability/availability? visibility into admin/monitoring?
      • health probe/autorestart-pod should work
  • proposal: openapi work to auto-generate client/server side of this
    • makes available to other projects who might want to do this

2022-06-16

Attendees: ppicka, bmbouter, ipanova, douglas

  • currently pulpcore will post only to the dev site, and only if the user has a .dev installation
  • some users could have .dev
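The dev/production routing described above keys off the pulpcore version string. A sketch of the rule (the dev endpoint URL here is an assumption, not the real one):

```python
PROD_URL = "https://analytics.pulpproject.org/"
DEV_URL = "https://dev.analytics.pulpproject.org/"   # hypothetical dev endpoint

def target_url(pulpcore_version):
    """Post to the dev site for development builds (version ends in .dev),
    and to the production site otherwise."""
    return DEV_URL if pulpcore_version.endswith(".dev") else PROD_URL
```

The caveat in the notes stands: some non-developer users could still be running a .dev build and would post to the dev site.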

Action Items

2022-05-26

Attendees: ppicka, bmbouter, ipanova, dkliban, douglas

  • In summarizing numbers, in addition to the mean, do we want max and min also?
    • not for now
  • Is it time to sign up for the $5 / month plan?
    • yes
  • How do we make the versions graph not so complicated?
    • Keep the raw data including the z-version, but also make a graph that aggregates all Z versions into totals and show that
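The decision above, aggregating all Z versions into X.Y totals while keeping the raw data, could be sketched as:

```python
from collections import Counter

def xy_totals(versions):
    """Collapse full X.Y.Z version strings into X.Y counts for the
    simplified graph, e.g. two 3.20.z installs count under "3.20"."""
    return Counter(".".join(v.split(".")[:2]) for v in versions)
```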

Action Items

  • [bmbouter] Make a graph that aggregates all Z versions into totals and show that x.y counts
  • [bmbouter] Revise telemetry PoC to only have it post dev data
  • [bmbouter] Check in with RH about them enabling the pay-plan

2022-04-07

Attendees: ppicka, bmbouter, ipanova, dkliban, ggainey, douglas

  • quick review of the graphs with the status data
  • duplicate data submission
    • expiration_time - 30days
    • there should only be one data point from each system because the key is the systemID
  • KV - data format
    • {SystemID: {all_the_data, ...}}
  • summarization process
    • only considers the latest data points posted in the last 24 hours
  • Are users allowed to download the raw data?
    • No because we're telling users that their raw data is only ever retained for 30 days
  • Are users allowed to download the summary data?
    • The public analytics site will provide the data, we may allow for downloading of the summarized data later
  • how to disable this for dev installs
    • have a dev URL and analytics site and a production URL and analytics site
    • if pulpcore ends in .dev submit to the dev site otherwise the production site
    • similar to what home assistant does
  • First implementation not planning to handle proxy configs
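The summarization rule agreed above, one data point per system ID, taking only the latest posting within the past 24 hours, might look like this (data shape and names are illustrative):

```python
from datetime import datetime, timedelta, timezone

def points_to_summarize(checkins, window_hours=24):
    """From {system_id: [(timestamp, data), ...]}, pick each system's
    most recent data point, keeping it only if it falls inside the
    summarization window."""
    cutoff = datetime.now(timezone.utc) - timedelta(hours=window_hours)
    latest = {}
    for system_id, entries in checkins.items():
        ts, data = max(entries, key=lambda e: e[0])
        if ts >= cutoff:
            latest[system_id] = data
    return latest
```

Because the key is the system ID, duplicate submissions from one system collapse to a single point, matching the dedup note above.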

2022-03-24

Attendees: ppicka, bmbouter, ipanova, dkliban

  • Will we share the raw data, or just the summarized data?
    • We'll provide just the summaries publicly
  • See the graphs to be produced at the bottom of the https://hackmd.io/@pulp/telemetry_status document
  • Proposal: summarize daily and include only 1 data point from each systemID

2022-03-17

Attendees: ppicka, dfurlong, ggainey, bmbouters, ipanova

  • bmbouter revised POC and demo
  • Thoughts
    • how/where do we log outgoing info?
      • into logs? what level?
      • into task progress-report?
      • into sep file?
      • needs discussion
    • what's a good TTL for data sent to CloudFlare?
  • cloudflare docs : https://api.cloudflare.com/#custom-hostname-for-a-zone-custom-hostname-details
  • HomeAssistant has cloudflare-side worker-code receiving data
  • How do we build/maintain summary info?
  • What if we send as "uuid-timestamp": "data"?
  • details are important - but at a high level, what aggregate/historical data are we actually interested in keeping?
  • "What question are we answering" needs an additional "How are we going to visualize that information?"
  • keep in mind the difference between "monitoring" and "telemetry"
  • AI for all: what kinds of ways would we like to summarize/display/graph the existing data proposal ("status")

2022-02-03

Attendees: ppicka, dfurlong, dkliban, ggainey, bmbouters, ipanova

Prev AIs

Agenda

  • review /status/ writeup
    • alternative proposal approved
  • [all]: What do we want to focus on in the following 30-min mtgs?
    • example: how do we develop metrics and test them?
    • example: how do we let plugins report?
    • example: let's talk about status API

AIs

  • [ggainey] hackmd to list "things we might want telemetry proposals for", send link to list
  • [ggainey] update telemetry-proposal template to include "discussion", "alternative proposal", "RFE suggestions arising from discussion" sections

Links

2022-01-27

Attendees:

Prev AIs

  • [bmbouters] make POC race-condition-free, post data, have a read-UI
  • [all]: What do we want to focus on in the following 30-min mtgs?
    • example: how do we develop metrics and test them?
    • example: how do we let plugins report?
    • example: let's talk about status API
  • [ggainey]: write up "results of pulp /status/ API" as a formal presentation of a metric to the Telemetry Group, answering The List Of Questions

Agenda

  • ggainey to report on anything from OCP Telemetry discussions
    • response back from Nick Stielau
    • I have a link to an internal doc on how/what his group is measuring
    • Standing offer to have a 30-min telemetry/metric overview discussion, have not set a date yet
    • Pointer to https://www.productled.org/foundations/product-led-growth-metrics for general info (if anyone hasn't seen this before)

AIs

Links

2022-01-20

Attendees: bmbouters, dkliban, dfurlong, ppicka, ipanova, ggainey

  • Last 1-hr mtg
  • future mtgs 30 min at the half-hour

Prev AIs

  • [ggainey] establish contact with Carl Trieoff RE OpenShift data gathering [gchat]
    • contact made, pointers received, email dispatched
  • [bmbouter] POC against Cloudflare

Agenda

  • discussion about POC
    • discussion around implications of adding tasking-subsystem to Pulp3
    • signed up for Cloudflare k/v account (pulp-infra@ email)
      • something is "not right yet" - #soon
    • bmbouter to engage CF Discourse
  • What are all the ways we could communicate this transparency to users?
    • How do we make it Really Easy for users to know what's happening and opt out?
      • docs, release notes, discourse announcement
      • social media (tweet, etc)
      • youtube demo
      • work w/ mcorr RE social-media
      • log at start up that telemetry reporting is enabled and refer to a setting which should be changed to disable it
        • really important for the Users Who Don't Read Anything
      • log every time telemetry is sent
        • homeassist does this here
  • is periodicity configurable?
    • "keep simple things simple" - hardcoded
    • KISS - keep it simple stupid
    • how often is "often enough"?
      • what's the most-reasonable time interval, to most users?
      • once/day
        • can user control "when during the day" it happens?
        • think about network-security-rules?
      • at initial-migration-time, dispatch "soon" post-setup
        • 30 min post-migrations-run (let pulp-install settle down)
  • questions about performance (cpu/memory/etc)
    • contact operate-first group
    • performance/monitoring is separate from telemetry
      • but a still really-useful thing to be doing!
    • [dfurlong] memory-use/performance changes over time is really useful
      • being able to easily-deliver monitoring results back to pulp from users would be great
  • What is the list of questions we want to ask for each metric
    • metric-acceptance discussion needs to be "somewhere permanent"
    • should be a public checklist for answering these questions
      • example: "How we decide if something is PII and how can it be sanitized"
    • should be able to connect a specific metric to the exact commit when it entered the codebase
    • what happens if/when an API being used to collect telemetry, changes what is delivered?
      • what if PII gets added (e.g.)
      • need to have a data-audit process in place
    • an example:
      • the data reported from the list of status
        • What question will this help us answer?
          • How many workers are users running?
          • What plugins do they run?
        • What is a specific example of the data to be gathered?
          • [example TBD]
        • How will this metric be stored in the database or gathered at runtime?
          • We'll gather the data at runtime. This should not cause unnecessary load on pulp
        • Will the gathering and/or storage of this cause unacceptable burden/load on Pulp?
          • No
        • Is this metric Personally-Identifiable-Data?
          • Yes the hostnames, so it needs to be redacted
  • discussion about kinds-of-data
    • what if post fails
      • give up, send it tomorrow
    • api call-periodicity?
    • api call-sequences?
  • should be a standard way for a user to request all their data be removed from the public data store
  • can there be a standard test-sequence that investigates metric results for "known PII problems" and fails a metric if/as it finds something?
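The “standard test-sequence for known PII problems” floated above could start as simply as a regex sweep over serialized payloads, failing a metric on any hit. The patterns below are illustrative only, nowhere near an exhaustive PII audit:

```python
import re

# Illustrative patterns only; a real data-audit process would need
# a much broader, maintained list.
PII_PATTERNS = [
    re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b"),          # IPv4 address
    re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),          # email address
    re.compile(r"\bhttps?://[^\s\"']+", re.IGNORECASE),  # URL (may embed a hostname)
]

def find_pii(payload_text):
    """Return all substrings of a serialized payload that match a
    known PII pattern; an empty result means the sweep found nothing."""
    hits = []
    for pattern in PII_PATTERNS:
        hits.extend(pattern.findall(payload_text))
    return hits
```

This also speaks to the audit concern above: rerunning the sweep on every payload change catches PII that sneaks in when an upstream API adds fields.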

AIs

  • [bmbouters] make POC race-condition-free, post data, have a read-UI
  • [all]: What do we want to focus on in the following 30-min mtgs?
    • example: how do we develop metrics and test them?
    • example: how do we let plugins report?
    • example: let's talk about status API
  • [ggainey]: write up "results of pulp /status/ API" as a formal presentation of a metric to the Telemetry Group, answering The List Of Questions

Links

2022-01-14

Attendees: bmbouter, ttereshc, ipanova, dkliban, ggainey

Prev AIs

  • [ggainey] establish contact with Carl Trieoff RE OpenShift data gathering [gchat]
    • no progress to report
  • [bmbouter] talk about budget and direct costs with management
    • "it's fine, but be selective about which provider we choose"
  • [ttereshc] talk to lzap about Foreman telemetry
    • done, largely concerned with performance-monitoring
    • do we want to collect performance data? or just usage?
    • what other Red Hat telemetry services exist that we may want to integrate with/to?
    • see ttereshc's email for more detail ("Foreman Telemetry")

Agenda

  • next mtg 20-JAN, 1 hr, then switch to 30 min
  • how is a UUID generated?
    • per-pulp-system
      • ie, one UUID per-clustered-pulp
      • "one UUID per-database"
  • how/where will it be stored?
    • in db - if it doesn't exist, create one
      • create as a migration
    • if it is in the db, use it
    • would survive across restores/rebuilds
    • multi-node installs/clusters
      • same uuid, multiple nodes reporting - can we tell multi-machine architectures?
  • how are we going to periodically post?
    • single-node is 'easy'
    • clusters
      • not a separate call-home service
      • periodic pulp-task-posting
    • everyone puts data into db (somewhere), someone reports it up
    • sanitizing data? - lv for "what do we report" later
    • "how often" - performance data prob needs to be gathered more often, for example
    • "how often do we write into the db?"
      • write at service-startup?
      • what about heartbeats?
      • feature-use needs to happen more often?
        • gather use-data from existing tables
    • How do we do a daily task?
      • wsgi, distributed-lock, dispatch task, record last-update
        • wsgi heartbeat, check against last-dispatch, at correct interval start a new one
        • database-xact to force ordering?
      • even if it's poss for task to dispatch and yet fail to call home - it's ok
  • what kind-of data is our focus?
    • what versions of pulp are installed?
    • what's "a typical pulp instance"?
      • clustered vs not
      • do we gather hardware info? (memory, disk usage, cpus?)
    • what about feature-usage data?
    • configuration - ie, content of pulp/settings.py?
      • ONLY NON-SENSITIVE DATA
      • def need to think hard about how to sanitize
    • monitoring data?
      • not a primary objective
      • let's not shut the door on it for future opportunity
      • monitoring wants UNsanitized data in order to be actionable
  • what's at least one service we can POC against?
    • cloudflare, amazon, etc
    • bmbouters chooses Cloudflare - it uses Free Starter Account! It's Super-Effective!
    • specific cost ballpark - $50-100/month at initial start, poss growing as we learn how much data and storage
  • how can we provide full-choice to users to opt-out/opt-in
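The UUID scheme discussed above (“one UUID per-database”, created by a migration, reused thereafter, surviving restores) is a get-or-create against the database. A sketch with sqlite standing in for Pulp’s Postgres, and an explicit query standing in for the migration:

```python
import sqlite3
import uuid

def get_or_create_system_id(conn):
    """Return the installation's persistent system ID, creating it on
    first use. In Pulp the row would be created by a migration; a plain
    table stands in here."""
    conn.execute("CREATE TABLE IF NOT EXISTS system_id (uuid TEXT NOT NULL)")
    row = conn.execute("SELECT uuid FROM system_id").fetchone()
    if row is None:
        new_id = str(uuid.uuid4())
        conn.execute("INSERT INTO system_id (uuid) VALUES (?)", (new_id,))
        conn.commit()
        return new_id
    return row[0]
```

Because the ID lives in the shared database, every node of a clustered install reports the same UUID, matching the “one UUID per-clustered-pulp” point above.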

AIs

  • [ggainey] establish contact with Carl Trieoff RE OpenShift data gathering [gchat]
  • [bmbouter] POC against Cloudflare
    • migration that creates UUID
    • create CF account
    • have periodic wsgi that posts UUID
    • post progress to discourse

Links

2022-01-06

attendees: wibbit, ttereshc, dkliban, bmbouters, ggainey, ppicka, ipanova

  • first 2/3 mtgs, 1 hr - then shorten to 30, less often

what do we want from today?

  • set goals
  • where is the data going to go?
  • focus on base infrastructure first, then "what data collected and how"
  • process for how to change/mutate/morph the kinds-of data being collected
  • timeline possibility:
    • base infra posted by end-of-January?
      • uuid/one-piece-of-data gathered and sent "somewhere"
    • maybe not have a date attached? just work on POC?
    • maybe just post Goal, and not worry about Date
  • focus on base-infra and where data will go as POC, data-details come Later
  • example of a telemetry operation in production use : https://www.home-assistant.io/integrations/analytics
    • uses CloudFlare to store data
  • don't forget about GDPR (and friends) laws
  • what do other projects use?
    • OpenShift - need to talk to Other Folks
      • AI: establish contact with them?
    • What about Foreman?
      • lzap driving?
      • AI: talk to lzap
    • Fedora? crash reports, installation?
      • Firefox addon may do this?
      • may need some digging, does Fedora still do this?
  • talk to Red Hat around direct-cost of supporting such a service
    • AI: [bmbouters] talk to rchan
  • wibbit: where does data go
    • assuming data is sufficiently anonymized to be made public?
      • yes please
      • keeps us honest about anonymizing
      • enhances trust/transparency
    • cost of distribution/access to the data from the public
      • data-outflow vs data-ingress costs
  • wibbit: enterprise env can be draconic around security
    • infra needs to support multiple pulp-instances hitting a single internal proxy that is the single point-of-contact to the telemetry service?
      • two requirements
        • clear docs on details of how data posts
        • proxy support
      • wibbit: data needs to be staged/stageable locally prior to being submitted
        • submit-queue that can be paused/investigated
        • bmbouter: adds to better user-knowledge/transparency, good idea
        • wibbit: allows for admin-internal-consumption
        • dkliban: would help manage multi-pulp-installation
    • wibbit: Real People didn't raise any major concerns, beyond "we need to know what's being uploaded"
  • wibbit: do we need a consistent UUID over time?
    • need to be able to identify across upgrades
    • change-over-time is really important
  • bmbouter: feature should default-to-on
    • ipanova: already long talk in foreman-land on this, see discourse
    • wibbit: dflt-to-on is ok
      • assumption is admins know what they're doing
      • would lose any temporal-system info if dflt-to-off
      • caveat: dflt-on for new-install vs upgrade?
        • when-introduced, to an existing system, is qualitatively diff than new-install
    • let's discuss how to do this "very transparently and loudly"
  • where will this flag exist?
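wibbit’s proxy requirement above can be met with stdlib machinery alone; a hedged sketch (the shipped client may well differ, and the proxy URL is hypothetical):

```python
import json
import urllib.request

def build_poster(proxy_url=None):
    """Build a URL opener that routes telemetry posts through an optional
    enterprise proxy, per the single-point-of-contact requirement."""
    handlers = []
    if proxy_url:
        handlers.append(urllib.request.ProxyHandler({"https": proxy_url}))
    return urllib.request.build_opener(*handlers)

def post_payload(opener, url, payload):
    """POST a JSON payload; callers decide what to do on failure
    (elsewhere in these notes: give up and try again tomorrow)."""
    data = json.dumps(payload).encode()
    request = urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"}
    )
    return opener.open(request)
```

Keeping the serialized `data` around before posting also gives the “staged locally, inspectable before submit” behavior wibbit asked for.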

what do we want by next week?

  • AIs
    • [ggainey] establish contact with Carl Trieoff RE OpenShift data gathering [gchat]
    • [bmbouter] talk about budget and direct costs with management
    • talk to lzap about Foreman telemetry
  • Things for next week's agenda:
    • how is a UUID generated?
    • how/where will it be stored?
    • how are we going to periodically post?
    • what's at least one service we can POC against?
      • cloudflare, amazon, etc
    • first three mtgs will be one hour
    • going forward, 30 min on the half-hour

Links

tags: Telemetry