# Analytics Working Group
* discourse thread: https://discourse.pulpproject.org/t/proposal-analytics/259/2
* meeting-notes template
```
## YYYY-MM-DD
### Attendees:
### Prev AIs
### Agenda
### AIs
### Links
```
## Ongoing Notes
* first three mtgs will be one hour
* going forward, 30 min on the half-hour
## List of Questions for Every Metric To Be Gathered
* What question will this help us answer?
* What is a specific example of the data to be gathered?
* How will this metric be stored in the database or gathered at runtime?
* Will the gathering and/or storage of this cause unacceptable burden/load on Pulp?
* Is this metric Personally-Identifiable-Data?
* What pulpcore version will this be collected with?
* Is this approved/not-approved?
### Analytics-proposal Template
* **NOTE**: this is available as the "Analytics" template in hackmd.io/pulp!
* https://hackmd.io/@pulp/telemetry_template
```
# Title
## What question will this help us answer?
## What is a specific example of the data to be gathered?
## How will this metric be stored in the database or gathered at runtime?
## Will the gathering and/or storage of this cause unacceptable burden/load on Pulp?
## Is this metric Personally-Identifiable-Data?
### How can we sanitize this output?
## What pulpcore version will this be collected with?
## Alternative proposal(s)
### Option 1
### Option N
## Discussion notes
## Is this approved/not-approved?
## Parking Lot for potential future/RFE work
###### tags: `Analytics`
```
## Open Questions
* Do we want to compute processes / host also?
## 2022-12-01
### Attendees: bmbouter ppicka mdellweg dkliban wibbit ggainey
### Agenda
* Determined this was the last regularly scheduled meeting; follow-up meetings will happen as needed
* To finalize the tech-debt, we should work on these two issues:
* https://github.com/pulp/analytics.pulpproject.org/issues/65
* https://github.com/pulp/analytics.pulpproject.org/issues/69
## 2022-10-20
### Attendees: bmbouter ppicka mdellweg dkliban wibbit ggainey
### Agenda
* Here’s a new set of graphs to look at accepting from @mdellweg
* https://github.com/pulp/analytics.pulpproject.org/pull/23
* one last minor change suggested; consensus appears to be "go4it"
* Here’s a proposal to collect, summarize, and visualize the postgresql version, which would be a new metric. This is going to be the “live coding” part that I do at Pulpcon to add it.
* https://hackmd.io/zJ1dJe8qQtmzr0JiM1jptw
* discussion around "how do we want to summarize"
* e.g., is X.Y.Z really interesting?
* We want to summarize "versions that matter"
* side discussion: format/organization of main visualization page would be A Good Thing
* FYI: lots of new docs here, including importing data from the production site
* https://github.com/pulp/analytics.pulpproject.org/tree/dev#exportingimporting-the-database
* Should we be limiting summaries to only systems with at least 2 checkins?
* "yes please" is the consensus
* Proposal: Add a “summarization” and “visualization” sections to the “proposal template”
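The "at least 2 check-ins" rule agreed above could be a simple filter applied before summarization. A minimal sketch, assuming check-ins arrive as `(system_id, payload)` pairs (the function name and shape are hypothetical, not the production code):

```python
from collections import Counter

def systems_to_summarize(checkins, min_checkins=2):
    """Return system IDs with at least `min_checkins` check-ins.

    Systems seen only once are excluded from the summaries,
    per the consensus above.
    """
    counts = Counter(system_id for system_id, _ in checkins)
    return {sid for sid, n in counts.items() if n >= min_checkins}
```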
## 2022-08-25
### Attendees: ppicka, ggainey, bmbouter
### Agenda
* Interesting resources shared with the group from Mozilla's telemetry groups
* https://firefox-source-docs.mozilla.org/toolkit/components/telemetry/index.html
* Updates
* Both https://analytics.pulpproject.org/ and https://dev.analytics.pulpproject.org/ are deployed and ready to receive data
* https://dev.analytics.pulpproject.org/ is receiving data from pulpcore:main and pulpcore:3.20 branches for dev installs
* Telemetry will be on by default starting with 3.21.0, with a clearly marked setting to disable it. 3.21.0 is tentatively scheduled for Sept 8th.
* Merged with this PR: https://github.com/pulp/pulpcore/pull/3116
* bmbouter and dkliban have admin access to https://dev.analytics.pulpproject.org/
* bmbouter only has access to https://analytics.pulpproject.org/
* AI: we need a second person for https://analytics.pulpproject.org/, will assign at next week's meeting
* Summarization isn't working on https://dev.analytics.pulpproject.org/ for some reason
* Next Steps
* bmbouter to fix whatever the issue is with summarization
* bmbouter to add plugin documentation on the processes and checklists this group currently has in hackmds
* bmbouter to add documentation on how to create the local dev environment
* Future meetings
* Telemetry working group will meet next week, and maybe the week after to finalize some process things and celebrate
* After that telemetry working group will suspend for at least 6 weeks
* Working group will resume as new proposals for metrics are proposed
## 2022-08-18
### Attendees: ggainey, dkliban, bmbouters, ipanova, ppicka
### Prev AIs
### Agenda
* progress made on finalizing POC
* demo time!
* proposal: have "summarizer" delete old content (rather than replace)
* proposal: have "summarizer" only delete data older-than some window (2 weeks?)
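The "only delete data older than some window" proposal above can be sketched in plain Python (in practice this would be a Django queryset delete; the function and field names here are hypothetical):

```python
from datetime import datetime, timedelta, timezone

def expire_old_datapoints(datapoints, window_days=14, now=None):
    """Keep raw datapoints newer than the retention window (2 weeks here)
    and drop the rest, rather than replacing everything on each run."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=window_days)
    return [dp for dp in datapoints if dp["created"] >= cutoff]
```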
### AIs
* bmbouter to take up the proposals above
* add X.Y graph for each component
* next steps:
* PR to dev
* pulpcore PR (https://github.com/pulp/pulpcore/pull/3025/files)
### Links
## 2022-08-11
### Attendees: ggainey, dkliban, ppicka, bmbouters, ipanova, wibbit
### Prev AIs
### Agenda
* discussion around https://github.com/pulp/pulpcore/pull/3032
* def a good idea, prob want this backported to 3.20
* progress update
* lots of progress being made, not baked yet
* lots of interaction w/ duck@osci
* analytics.pulpproject.org has 2 branches, main and dev
* auto-deploys to 2 diff OSCI deployments
* both use LetsEncrypt TLS
* web-process pod, postgres backend
* django-admin enabled for superuser controls
* modification to how payloads are defined
* consolidates client and server definitions of payload
* using Google's "Protocol Buffer" approach (q.v.)
* https://developers.google.com/protocol-buffers
* what about version mismatches?
* ProtocolBuffer is Opinionated - follow their requirements
* next steps
* charting
* summaries
* manage.py cmd, to be called by openshift cron every 24 hrs
* data expiry
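Protocol Buffers consolidate the client and server payload definitions by generating both sides' code from one shared `.proto` schema. As a rough Python analogy only (the real project uses protobuf-generated classes, and this dataclass with its field names is purely illustrative):

```python
from dataclasses import dataclass, asdict

# One payload definition shared by client (serializer) and server
# (deserializer), so the two sides cannot drift apart.
@dataclass
class AnalyticsPayload:
    system_id: str
    core_version: str
    worker_count: int

def serialize(payload):
    return asdict(payload)

def deserialize(data):
    return AnalyticsPayload(**data)
```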
### AIs
* bmbouters hoping for a tech demo next mtg
### Links
* https://developers.google.com/protocol-buffers
* https://github.com/pulp/pulpcore/pull/3032
## 2022-07-21
### Attendees:
* demo/POC on analytics.pulpproject.org ?
## 2022-07-14
### Attendees: bmbouters, dkliban, ipanova, ppicka, ggainey
* **Current State**
* cloudflare impl is doing some data collection/summarization
* dev-installs (*any* plugin version ending in -dev) **only**
* https://dev-analytics-pulpproject-org.pulpproject.workers.dev/
* wrote to tech-list@ to see if this already exists "somewhere" at Red Hat
* console.redhat.com - but for customers
* **PROBLEMS**
* summarization isn't working, investigation isn't getting us past whatever the problem is
* server-side-code pagination isn't working
* DNS mapping analytics-pulpproject-org.pulpproject.workers.dev to analytics.pulpproject.org would require **all** of pulpproject.org be handed over to Cloudflare
* reverse-proxy is possible, POC works but is...suboptimal
* OSCI asking why we're not just running this on their openshift instance/platform
* This is a fine question!
* **PROPOSAL**
* bmbouter takes day-of-learning to translate cloudflare server-side from typescript to python, stand up on duck's OS instance
* analytics.pulpproject.org pointing to a helloworld app (thanks duck!)
* https://github.com/pulp/analytics.pulpproject.org
* discussion ensues
* reliability/availability? visibility into admin/monitoring?
* health probe/autorestart-pod should work
* proposal: openapi work to auto-generate client/server side of this
* makes available to other projects who might want to do this
## 2022-06-16
### Attendees: ppicka, bmbouter, ipanova, douglas
* currently pulpcore will post only to the dev site, and only if the user has a .dev installation
* some users could have .dev
### Action Items
* [bmbouter] Make a graph that aggregates all Z versions into totals and show that x.y counts
* [bmbouter] Put up "coming soon page"
* [bmbouter] Get analytics.pulpproject.org DNS integrated with https://analytics-pulpproject-org.pulpproject.workers.dev/
* [bmbouter] Reset the https://analytics-pulpproject-org.pulpproject.workers.dev/ environment
* [bmbouter] make additional graphs for each expected plugin version posted
* [bmbouter] go through and implement pagination in summary data
## 2022-05-26
### Attendees: ppicka, bmbouter, ipanova, dkliban, douglas
* In summarizing numbers, in addition to the mean, do we want max and min also?
* not for now
* Is it time to sign up for the $5 / month plan?
* yes
* How do we make the versions graph not so complicated?
* Keep the raw data including the z-version, but also make a graph that aggregates all Z versions into totals and show that
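The X.Y aggregation decided above amounts to re-bucketing per-X.Y.Z counts by major.minor. A minimal sketch (hypothetical helper, not the production code):

```python
from collections import Counter

def aggregate_xy(version_counts):
    """Collapse per-X.Y.Z counts into per-X.Y totals, e.g. 3.20.1
    and 3.20.2 both count toward 3.20."""
    totals = Counter()
    for version, count in version_counts.items():
        major, minor = version.split(".")[:2]
        totals[f"{major}.{minor}"] += count
    return dict(totals)

aggregate_xy({"3.20.1": 5, "3.20.2": 3, "3.21.0": 2})
# {"3.20": 8, "3.21": 2}
```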
### Action Items
* [bmbouter] Make a graph that aggregates all Z versions into totals and show that x.y counts
* [bmbouter] Revise telemetry PoC to only have it post dev data
* [bmbouter] Check in with RH about them enabling the pay-plan
## 2022-04-07
### Attendees: ppicka, bmbouter, ipanova, dkliban, ggainey, douglas
* quick review of the graphs with the status data
* https://hackmd.io/@pulp/telemetry_status#Graphs-to-be-produced
* duplicate data submission
* expiration_time: 30 days
* there should only be one data point from each system because the key is the systemID
* KV - data format
* {SystemID: {all_the_data, ...}}
* summarization process
* only considers the latest data points posted in the last 24 hours
* Are users allowed to download the raw data?
* No because we're telling users that their raw data is only ever retained for 30 days
* Are users allowed to download the summary data?
* The public analytics site will provide the data, we may allow for downloading of the summarized data later
* how to disable this for dev installs
* have a dev URL and analytics site and a production URL and analytics site
* if pulpcore ends in .dev submit to the dev site otherwise the production site
* similar to [what home assistant does](https://github.com/home-assistant/core/blob/4d72e41a3e88f696d255dc73e4f4e8ec88b1874f/homeassistant/components/analytics/analytics.py#L99)
* First implementation not planning to handle proxy configs
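The dev-vs-production routing above (mirroring the linked Home Assistant approach) could look like this sketch; the two URLs are the real sites, but the function itself is hypothetical:

```python
DEV_URL = "https://dev.analytics.pulpproject.org/"
PROD_URL = "https://analytics.pulpproject.org/"

def analytics_url(pulpcore_version: str) -> str:
    """Dev installs (version ending in .dev) post to the dev site;
    everything else posts to production."""
    return DEV_URL if pulpcore_version.endswith(".dev") else PROD_URL
```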
## 2022-03-24
### Attendees: ppicka, bmbouter, ipanova, dkliban
* Will we share the raw data, or just the summarized data?
* We'll provide just the summaries publicly
* See the graphs to be produced at the bottom of the https://hackmd.io/@pulp/telemetry_status document
* Proposal: summarize daily and include only 1 data point from each systemID
## 2022-03-17
### Attendees: ppicka, dfurlong, ggainey, bmbouters, ipanova
* bmbouter revised POC and demo
* https://github.com/pulp/pulpcore/pull/2118/files
* key/value - "systemid": "telemetry-key:value"
* Thoughts
* how/where do we log outgoing info?
* into logs? what level?
* into task progress-report?
* into sep file?
* needs discussion
* what's a good TTL for data sent to CloudFlare?
* cloudflare docs : https://api.cloudflare.com/#custom-hostname-for-a-zone-custom-hostname-details
* HomeAssistant has cloudflare-side worker-code receiving data
* How do we build/maintain summary info?
* What if we send as "uuid-timestamp": "data"?
* details are important - but at a high level, what aggregate/historical data are we actually interested in keeping?
* "What question are we answering" needs an additional "How are we going to visualize that information?"
* keep in mind the difference between "monitoring" and "telemetry"
* AI for all: what kinds of ways would we like to summarize/display/graph the existing data proposal ("status")
## 2022-02-03
### Attendees: ppicka, dfurlong, dkliban, ggainey, bmbouters, ipanova
### Prev AIs
* ggainey status writeup: https://hackmd.io/@pulp/telemetry_status
* great discussion ensues
### Agenda
* review /status/ writeup
* alternative proposal approved
* [all]: What do we want to focus on in the following 30-min mtgs?
* example: how do we develop metrics and test them?
* example: how do we let plugins report?
* example: let's talk about status API
### AIs
* ~~[ggainey] hackmd to list "things we might want telemetry proposals for", send link to list~~
* https://hackmd.io/@pulp/telemetry_suggestions
* [ALL] everyone adds one line to ^^
* [ggainey] ~~update telemetry-proposal template to include "discussion", "alternative proposal", "RFE suggestions arising from discussion" sections~~
### Links
* https://hackmd.io/@pulp/telemetry_status
* https://hackmd.io/@pulp/telemetry_suggestions
## 2022-01-27
### Attendees:
### Prev AIs
* [bmbouters] make POC race-condition-free, post data, have a read-UI
* [all]: What do we want to focus on in the following 30-min mtgs?
* example: how do we develop metrics and test them?
* example: how do we let plugins report?
* example: let's talk about status API
* [ggainey]: write up "results of pulp /status/ API" as a formal presentation of a metric to the Telemetry Group, answering The List Of Questions
* https://hackmd.io/@pulp/telemetry_status
### Agenda
* ggainey to report on anything from OCP Telemetry discussions
* response back from Nick Stielau
* I have a link to an internal doc on how/what his group is measuring
* Standing offer to have a 30-min telemetry/metric overview discussion, have not set a date yet
* Pointer to https://www.productled.org/foundations/product-led-growth-metrics for general info (if anyone hasn't seen this before)
### AIs
### Links
## 2022-01-20
### Attendees: bmbouters, dkliban, dfurlong, ppicka, ipanova, ggainey
* Last 1-hr mtg
* future mtgs 30 min at the half-hour
### Prev AIs
* [ggainey] establish contact with Carl Trieoff RE OpenShift data gathering [gchat]
* contact made, pointers received, email dispatched
* [bmbouter] POC against Cloudflare
* migration that creates UUID
* https://github.com/pulp/pulpcore/pull/2118/files
* create CF account
* done tied to pulp-infra
* have periodic wsgi that posts UUID
* post progress to discourse
### Agenda
* discussion about POC
* discussion around implications of adding tasking-subsystem to Pulp3
* signed up for Cloudflare k/v account (pulp-infra@ email)
* something is "not right yet" - #soon
* bmbouter to engage CF Discourse
* https://discord.gg/cloudflaredev
* What are all the ways we could communicate this transparency to users?
* How do we make it Really Easy for user to know what's happening and opt-out?
* docs, release notes, discourse announcement
* social media (tweet, etc)
* youtube demo
* work w/ mcorr RE social-media
* log at start up that telemetry reporting is enabled and refer to a setting which should be changed to disable it
* really important for the Users Who Don't Read Anything
* log every time telemetry is sent
* homeassist does this [here](https://github.com/home-assistant/core/blob/4d72e41a3e88f696d255dc73e4f4e8ec88b1874f/homeassistant/components/analytics/analytics.py#L97)
* is periodicity configurable?
* "keep simple things simple" - hardcoded
* KISS - keep it simple stupid
* how often is "often enough"?
* what's the most-reasonable time interval, to most users?
* once/day
* can user control "when during the day" it happens?
* think about network-security-rules?
* at initial-migration-time, dispatch "soon" post-setup
* 30 min post-migrations-run (let pulp-install settle down)
* questions about performance (cpu/memory/etc)
* contact operate-first group
* performance/monitoring is separate from telemetry
* but a still really-useful thing to be doing!
* [dfurlong] memory-use/performance changes over time is really useful
* being able to easily-deliver monitoring results *back to pulp* from users would be great
* What is the list of questions we want to ask for each metric
* metric-acceptance discussion needs to be "somewhere permanent"
* should be a public checklist for answering these questions
* example: "How we decide if something is PII and how can it be sanitized"
* should be able to connect a specific metric to the exact commit when it entered the codebase
* what happens if/when an API being used to collect telemetry, changes what is delivered?
* what if PII gets added (e.g.)
* need to have a data-audit process in place
* an example:
* the data reported from the list of status
* What question will this help us answer?
* How many workers are users running?
* What plugins do they run?
* What is a specific example of the data to be gathered?
* [example TBD]
* How will this metric be stored in the database or gathered at runtime?
* We'll gather the data at runtime. This should not cause unnecessary load on Pulp
* Will the gathering and/or storage of this cause unacceptable burden/load on Pulp?
* No
* Is this metric Personally-Identifiable-Data?
* Yes the hostnames, so it needs to be redacted
* discussion about kinds-of-data
* what if post fails
* give up, send it tomorrow
* api call-periodicity?
* api call-sequences?
* should be a standard way for a user to request all their data be removed from the public data store
* can there be a standard test-sequence that investigates metric results for "known PII problems" and fails a metric if/as it finds something?
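The "log at startup" idea above might look like this sketch. The `ANALYTICS` setting name is an assumption for illustration, not necessarily the real pulpcore setting:

```python
import logging

logger = logging.getLogger(__name__)

def telemetry_notice(settings):
    """Build and log the startup message stating whether telemetry is on
    and which setting disables it, for the Users Who Don't Read Anything.
    ANALYTICS is a hypothetical setting name."""
    if settings.get("ANALYTICS", True):
        msg = ("Telemetry reporting is ENABLED; set ANALYTICS=False in "
               "your settings to disable it.")
    else:
        msg = "Telemetry reporting is disabled."
    logger.info(msg)
    return msg
```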
### AIs
* [bmbouters] make POC race-condition-free, post data, have a read-UI
* [all]: What do we want to focus on in the following 30-min mtgs?
* example: how do we develop metrics and test them?
* example: how do we let plugins report?
* example: let's talk about status API
* [ggainey]: write up "results of pulp /status/ API" as a formal presentation of a metric to the Telemetry Group, answering The List Of Questions
### Links
## 2022-01-14
### Attendees: bmbouter, ttereshc, ipanova, dkliban, ggainey
### Prev AIs
* [ggainey] establish contact with Carl Trieoff RE OpenShift data gathering [gchat]
* no progress to report
* [bmbouter] talk about budget and direct costs with management
* "it's fine, but be selective about which provider we choose"
* [ttereshc] talk to lzap about Foreman telemetry
* done, largely concerned with performance-monitoring
* do we want to collect performance data? or just usage?
* what other Red Hat telemetry services exist that we may want to integrate with?
* see ttereshc's email for more detail ("Foreman Telemetry")
### Agenda
* next mtg 20-JAN, 1 hr, then switch to 30 min
* how is a UUID generated?
* per-pulp-system
* ie, one UUID per-clustered-pulp
* "one UUID per-database"
* how/where will it be stored?
* in db - if it doesn't exist, create one
* create as a migration
* if it is in the db, use it
* would survive across restores/rebuilds
* multi-node installs/clusters
* same uuid, multiple nodes reporting - can we tell multi-machine architectures?
* how are we going to periodically post?
* single-node is 'easy'
* clusters
* not a separate call-home service
* periodic pulp-task-posting
* everyone puts data into db (somewhere), someone reports it up
* sanitizing data? - leave for "what do we report" later
* "how often" - performance data prob needs to be gathered more often, for example
* "how often do we write into the db?"
* write at service-startup?
* what about heartbeats?
* feature-use needs to happen more often?
* gather use-data from existing tables
* How do we do a daily task?
* wsgi, distributed-lock, dispatch task, record last-update
* wsgi heartbeat, check against last-dispatch, at correct interval start a new one
* database-xact to force ordering?
* even if it's poss for task to dispatch and yet fail to call home - it's ok
* what kind-of data is our focus?
* what versions of pulp are installed?
* what's "a typical pulp instance"?
* clustered vs not
* do we gather hardware info? (memory, disk usage, cpus?)
* what about feature-*usage* data?
* configuration - ie, content of pulp/settings.py?
* ONLY NON-SENSITIVE DATA
* def need to think hard about how to sanitize
* monitoring data?
* not a primary objective
* let's not shut the door on it for future opportunity
* monitoring wants UNsanitized data in order to be actionable
* what's at least one service we can POC against?
* cloudflare, amazon, etc
* bmbouters chooses Cloudflare - it uses Free Starter Account! It's Super-Effective!
* specific cost ballpark - $50-100/month at initial start, poss growing as we learn how much data and storage
* how can we provide full-choice to users to opt-out/opt-in
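The UUID scheme agreed above (one per database, created by a migration if absent, reused if present) can be sketched with a dict standing in for the database table (function name hypothetical):

```python
import uuid

def get_or_create_system_id(db):
    """One UUID per database: create it if missing (a migration does this
    in practice), otherwise reuse it, so it survives restores/rebuilds and
    is shared by every node of a clustered install."""
    if "system_id" not in db:
        db["system_id"] = str(uuid.uuid4())
    return db["system_id"]
```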
### AIs
* [ggainey] establish contact with Carl Trieoff RE OpenShift data gathering [gchat]
* [bmbouter] POC against Cloudflare
* migration that creates UUID
* create CF account
* have periodic wsgi that posts UUID
* post progress to discourse
### Links
## 2022-01-06
### attendees: wibbit, ttereshc, dkliban, bmbouters, ggainey, ppicka, ipanova
* first 2-3 mtgs, 1 hr - then shorten to 30, less often
#### what do we want from today?
* set goals
* where is the data going to go?
* focus on base infrastructure first, then "what data collected and how"
* process for how to change/mutate/morph the kinds-of data being collected
* timeline possibility:
* base infra posted by end-of-January?
* uuid/one-piece-of-data gathered and sent "somewhere"
* maybe not have a date attached? just work on POC?
* maybe just post Goal, and not worry about Date
* focus on base-infra and where data will go as POC, data-details come Later
* example of a telemetry operation in production use : https://www.home-assistant.io/integrations/analytics
* uses CloudFlare to store data
* don't forget about GDPR (and friends) laws
* what do other projects use?
* OpenShift - need to talk to Other Folks
* AI: establish contact with them?
* What about Foreman?
* lzap driving?
* AI: talk to lzap
* Fedora? crash reports, installation?
* Firefox addon may do this?
* may need some digging, does Fedora still do this?
* talk to Red Hat around direct-cost of supporting such a service
* AI: [bmbouters] talk to rchan
* wibbit: where does data go
* assuming data is sufficiently anonymized to be made public?
* yes please
* keeps us honest about anonymizing
* enhances trust/transparency
* cost of distribution/access to the data from the public
* data-outflow vs data-ingress costs
* wibbit: enterprise env can be draconic around security
* infra needs to support multiple pulp-instances hitting a single internal proxy that is the single point-of-contact to the telemetry service?
* two requirements
* clear docs on details of how data posts
* proxy support
* wibbit: data needs to be staged/stageable locally prior to being submitted
* submit-queue that can be paused/investigated
* bmbouter: adds to better user-knowledge/transparency, good idea
* wibbit: allows for admin-internal-consumption
* dkliban: would help manage multi-pulp-installation
* wibbit: Real People didn't raise any major concerns, beyond "we need to know what's being uploaded"
* wibbit: do we need a consistent UUID over time?
* need to be able to identify across upgrades
* change-over-time is really important
* bmbouter: feature should default-to-on
* ipanova: already long talk in foreman-land on this, see discourse
* wibbit: dflt-to-on is ok
* assumption is admins know what they're doing
* would lose any temporal-system info if dflt-to-off
* caveat: dflt-on for new-install vs upgrade?
* when-introduced, to an existing system, is qualitatively diff than new-install
* let's discuss how to do this "**very** transparently and loudly"
* where will this flag exist?
#### what do want by next week?
* AIs
* [ggainey] establish contact with Carl Trieoff RE OpenShift data gathering [gchat]
* [bmbouter] talk about budget and direct costs with management
* talk to lzap about Foreman telemetry
* Things for next week's agenda:
* how is a UUID generated?
* how/where will it be stored?
* how are we going to periodically post?
* what's at least one service we can POC against?
* cloudflare, amazon, etc
* first three mtgs will be one hour
* going forward, 30 min on the half-hour
#### Links
* https://discourse.pulpproject.org/t/proposal-telemetry/259/2
* https://www.home-assistant.io/integrations/analytics#data-storage--processing
* https://www.cloudflare.com/products/workers-kv/
* https://www.home-assistant.io/integrations/analytics
* https://community.theforeman.org/t/foreman-telemetry-api-for-developers/26409
###### tags: `Telemetry`