# Telemetry Working Group
* Discourse thread: https://discourse.pulpproject.org/t/proposal-telemetry/259/2
* meeting-notes template
```
## YYYY-MM-DD
### Attendees:
### Prev AIs
### Agenda
### AIs
### Links
```
## Ongoing Notes
* first three mtgs will be one hour
* going forward, 30 min on the half-hour
## List of Questions for Every Metric To Be Gathered
* What question will this help us answer?
* What is a specific example of the data to be gathered?
* How will this metric be stored in the database or gathered at runtime?
* Will the gathering and/or storage of this cause unacceptable burden/load on Pulp?
* Is this metric Personally-Identifiable-Data?
* What pulpcore version will this be collected with?
* Is this approved/not-approved?
### Telemetry-proposal Template
* **NOTE**: this is available as the "Telemetry" template in hackmd.io/pulp!
* https://hackmd.io/@pulp/telemetry_template
```
# Title
## What question will this help us answer?
## What is a specific example of the data to be gathered?
## How will this metric be stored in the database or gathered at runtime?
## Will the gathering and/or storage of this cause unacceptable burden/load on Pulp?
## Is this metric Personally-Identifiable-Data?
### How can we sanitize this output?
## What pulpcore version will this be collected with?
## Alternative proposal(s)
### Option 1
### Option N
## Discussion notes
## Is this approved/not-approved?
## Parking Lot for potential future/RFE work
###### tags: `Telemetry`
```
## Open Questions
* Do we also want to compute processes-per-host?
* Should we configure this to post to analytics.pulpproject.org?
## 2022-06-16
### Attendees: ppicka, bmbouter, ipanova, douglas
* currently pulpcore will post only to the dev site, and only if the user has a .dev installation
* some users could have .dev
### Action Items
* [bmbouter] Make a graph that aggregates all Z versions into totals and show that x.y counts
* [bmbouter] Put up "coming soon page"
* [bmbouter] Get analytics.pulpproject.org DNS integrated with https://analytics-pulpproject-org.pulpproject.workers.dev/
* [bmbouter] Reset the https://analytics-pulpproject-org.pulpproject.workers.dev/ environment
* [bmbouter] make additional graphs for each expected plugin version posted
* [bmbouter] go through and implement pagination in summary data
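The version-aggregation action item above can be sketched as follows (a minimal illustration; the function and field names are assumptions, not from the actual implementation):

```python
from collections import Counter

def aggregate_versions(raw_versions):
    """Collapse full X.Y.Z versions into X.Y buckets for graphing.

    raw_versions: iterable of version strings, e.g. ["3.18.5", "3.18.1"].
    Returns a Counter mapping "X.Y" -> count, so the graph shows x.y
    totals while the raw data keeps the Z component.
    """
    totals = Counter()
    for version in raw_versions:
        x_y = ".".join(version.split(".")[:2])  # drop the Z component
        totals[x_y] += 1
    return totals
```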
## 2022-05-26
### Attendees: ppicka, bmbouter, ipanova, dkliban, douglas
* In summarizing numbers, in addition to the mean, do we want max and min also?
* not for now
* Is it time to sign up for the $5 / month plan?
* yes
* How do we make the versions graph not so complicated?
* Keep the raw data including the z-version, but also make a graph that aggregates all Z versions into totals and show that
### Action Items
* [bmbouter] Make a graph that aggregates all Z versions into totals and show that x.y counts
* [bmbouter] Revise telemetry PoC to only have it post dev data
* [bmbouter] Check in with RH about them enabling the pay-plan
## 2022-04-07
### Attendees: ppicka, bmbouter, ipanova, dkliban, ggainey, douglas
* quick review of the graphs with the status data
* https://hackmd.io/@pulp/telemetry_status#Graphs-to-be-produced
* duplicate data submission
* expiration_time - 30 days
* there should only be one data point from each system because the key is the systemID
* KV - data format
* {SystemID: {all_the_data, ...}}
* summarization process
* only considers the latest data points posted in the last 24 hours
* Are users allowed to download the raw data?
* No because we're telling users that their raw data is only ever retained for 30 days
* Are users allowed to download the summary data?
* The public analytics site will provide the data, we may allow for downloading of the summarized data later
* how to disable this for dev installs
* have a dev URL and analytics site and a production URL and analytics site
* if pulpcore ends in .dev submit to the dev site otherwise the production site
* similar to [what home assistant does](https://github.com/home-assistant/core/blob/4d72e41a3e88f696d255dc73e4f4e8ec88b1874f/homeassistant/components/analytics/analytics.py#L99)
* First implementation not planning to handle proxy configs
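The dev-vs-production routing described above might look like this (a sketch only; the constant and function names are illustrative, with the workers.dev URL taken from the action items elsewhere in these notes):

```python
# Route analytics posts the way Home Assistant does: .dev installs go
# to the dev site, everything else to production.
PRODUCTION_URL = "https://analytics.pulpproject.org/"
DEV_URL = "https://analytics-pulpproject-org.pulpproject.workers.dev/"

def get_analytics_url(pulpcore_version: str) -> str:
    """Pick the analytics endpoint based on the installed version string."""
    if pulpcore_version.endswith(".dev"):
        return DEV_URL
    return PRODUCTION_URL
```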
## 2022-03-24
### Attendees: ppicka, bmbouter, ipanova, dkliban
* Will we share the raw data, or just the summarized data?
* We'll provide just the summaries publicly
* See the graphs to be produced at the bottom of the https://hackmd.io/@pulp/telemetry_status document
* Proposal: summarize daily and include only 1 data point from each systemID
## 2022-03-17
### Attendees: ppicka, dfurlong, ggainey, bmbouters, ipanova
* bmbouter revised POC and demo
* https://github.com/pulp/pulpcore/pull/2118/files
* key/value - "systemid": "telemetry-key:value"
* Thoughts
* how/where do we log outgoing info?
* into logs? what level?
* into task progress-report?
* into sep file?
* needs discussion
* what's a good TTL for data sent to CloudFlare?
* cloudflare docs : https://api.cloudflare.com/#custom-hostname-for-a-zone-custom-hostname-details
* HomeAssistant has cloudflare-side worker-code receiving data
* How do we build/maintain summary info?
* What if we send as "uuid-timestamp": "data"?
* details are important - but at a high level, what aggregate/historical data are we actually interested in keeping?
* "What question are we answering" needs an additional "How are we going to visualize that information?"
* keep in mind the difference between "monitoring" and "telemetry"
* AI for all: what kinds of ways would we like to summarize/display/graph the existing data proposal ("status")
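The summarization approach discussed above (only the latest data point per systemID, posted within the last 24 hours) could be sketched as follows; the field names are assumptions, not the real schema:

```python
from datetime import datetime, timedelta, timezone

def summarize(data_points):
    """Keep only the newest point per systemID from the last 24 hours.

    data_points: iterable of dicts like
        {"system_id": "...", "posted_at": <aware datetime>, "payload": {...}}
    """
    cutoff = datetime.now(timezone.utc) - timedelta(hours=24)
    latest = {}
    for point in data_points:
        if point["posted_at"] < cutoff:
            continue  # too old; ignored by the daily summary
        current = latest.get(point["system_id"])
        if current is None or point["posted_at"] > current["posted_at"]:
            latest[point["system_id"]] = point
    return list(latest.values())
```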
## 2022-02-03
### Attendees: ppicka, dfurlong, dkliban, ggainey, bmbouters, ipanova
### Prev AIs
* ggainey status writeup : https://hackmd.io/@pulp/telemetry_status
* great discussion ensues
### Agenda
* review /status/ writeup
* alternative proposal approved
* [all]: What do we want to focus on in the following 30-min mtgs?
* example: how do we develop metrics and test them?
* example: how do we let plugins report?
* example: let's talk about status API
### AIs
* ~~[ggainey] hackmd to list "things we might want telemetry proposals for", send link to list~~
* https://hackmd.io/@pulp/telemetry_suggestions
* [ALL] everyone adds one line to ^^
* [ggainey] ~~update telemetry-proposal template to include "discussion", "alternative proposal", "RFE suggestions arising from discussion" sections~~
### Links
* https://hackmd.io/@pulp/telemetry_status
* https://hackmd.io/@pulp/telemetry_suggestions
## 2022-01-27
### Attendees:
### Prev AIs
* [bmbouters] make POC race-condition-free, post data, have a read-UI
* [all]: What do we want to focus on in the following 30-min mtgs?
* example: how do we develop metrics and test them?
* example: how do we let plugins report?
* example: let's talk about status API
* [ggainey]: write up "results of pulp /status/ API" as a formal presentation of a metric to the Telemetry Group, answering The List Of Questions
* https://hackmd.io/@pulp/telemetry_status
### Agenda
* ggainey to report on anything from OCP Telemetry discussions
* response back from Nick Stielau
* I have a link to an internal doc on how/what his group is measuring
* Standing offer to have a 30-min telemetry/metric overview discussion, have not set a date yet
* Pointer to https://www.productled.org/foundations/product-led-growth-metrics for general info (if anyone hasn't seen this before)
### AIs
### Links
## 2022-01-20
### Attendees: bmbouters, dkliban, dfurlong, ppicka, ipanova, ggainey
* Last 1-hr mtg
* future mtgs 30 min at the half-hour
### Prev AIs
* [ggainey] establish contact with Carl Trieoff RE OpenShift data gathering [gchat]
* contact made, pointers received, email dispatched
* [bmbouter] POC against Cloudflare
* migration that creates UUID
* https://github.com/pulp/pulpcore/pull/2118/files
* create CF account
* done tied to pulp-infra
* have periodic wsgi that posts UUID
* post progress to discourse
### Agenda
* discussion about POC
* discussion around implications of adding tasking-subsystem to Pulp3
* signed up for Cloudflare k/v account (pulp-infra@ email)
* something is "not right yet" - #soon
* bmbouter to engage CF Discord
* https://discord.gg/cloudflaredev
* What are all the ways we could communicate this transparency to users?
* How do we make it Really Easy for users to know what's happening and opt-out?
* docs, release notes, discourse announcement
* social media (tweet, etc)
* youtube demo
* work w/ mcorr RE social-media
* log at start up that telemetry reporting is enabled and refer to a setting which should be changed to disable it
* really important for the Users Who Don't Read Anything
* log every time telemetry is sent
* homeassist does this [here](https://github.com/home-assistant/core/blob/4d72e41a3e88f696d255dc73e4f4e8ec88b1874f/homeassistant/components/analytics/analytics.py#L97)
* is periodicity configurable?
* "keep simple things simple" - hardcoded
* KISS - keep it simple stupid
* how often is "often enough"?
* what's the most-reasonable time interval, to most users?
* once/day
* can user control "when during the day" it happens?
* think about network-security-rules?
* at initial-migration-time, dispatch "soon" post-setup
* 30 min post-migrations-run (let pulp-install settle down)
* questions about performance (cpu/memory/etc)
* contact operate-first group
* performance/monitoring is separate from telemetry
* but a still really-useful thing to be doing!
* [dfurlong] memory-use/performance changes over time is really useful
* being able to easily-deliver monitoring results *back to pulp* from users would be great
* What is the list of questions we want to ask for each metric
* metric-acceptance discussion needs to be "somewhere permanent"
* should be a public checklist for answering these questions
* example: "How we decide if something is PII and how can it be sanitized"
* should be able to connect a specific metric to the exact commit when it entered the codebase
* what happens if/when an API being used to collect telemetry, changes what is delivered?
* what if PII gets added (e.g.)
* need to have a data-audit process in place
* an example:
* the data reported from the /status/ API
* What question will this help us answer?
* How many workers are users running?
* What plugins do they run?
* What is a specific example of the data to be gathered?
* [example TBD]
* How will this metric be stored in the database or gathered at runtime?
* We'll gather the data at runtime. This should not cause unnecessary load on Pulp.
* Will the gathering and/or storage of this cause unacceptable burden/load on Pulp?
* No
* Is this metric Personally-Identifiable-Data?
* Yes the hostnames, so it needs to be redacted
* discussion about kinds-of-data
* what if post fails
* give up, send it tomorrow
* api call-periodicity?
* api call-sequences?
* should be a standard way for a user to request all their data be removed from the public data store
* can there be a standard test-sequence that investigates metric results for "known PII problems" and fails a metric if/as it finds something?
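The hostname redaction and "known PII problems" check discussed above might be sketched like this (Pulp worker names do embed the hostname after an `@`, but the function names and the check itself are hypothetical):

```python
import re

def sanitize_worker_name(name: str) -> str:
    """Redact the hostname portion of a worker name like 'process@host'."""
    return re.sub(r"@.*$", "@REDACTED", name)

def check_no_pii(payload: dict) -> None:
    """A simple 'known PII problems' gate: fail the metric if any string
    value still contains an un-redacted hostname-like token."""
    for value in payload.values():
        if isinstance(value, str) and "@" in value and "@REDACTED" not in value:
            raise ValueError(f"possible PII in payload: {value!r}")
```

A metric-acceptance test-sequence could call `check_no_pii` against sample submissions before a metric is approved.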
### AIs
* [bmbouters] make POC race-condition-free, post data, have a read-UI
* [all]: What do we want to focus on in the following 30-min mtgs?
* example: how do we develop metrics and test them?
* example: how do we let plugins report?
* example: let's talk about status API
* [ggainey]: write up "results of pulp /status/ API" as a formal presentation of a metric to the Telemetry Group, answering The List Of Questions
### Links
## 2022-01-14
### Attendees: bmbouter, ttereshc, ipanova, dkliban, ggainey
### Prev AIs
* [ggainey] establish contact with Carl Trieoff RE OpenShift data gathering [gchat]
* no progress to report
* [bmbouter] talk about budget and direct costs with management
* "it's fine, but be selective about which provider we choose"
* [ttereshc] talk to lzap about Foreman telemetry
* done, largely concerned with performance-monitoring
* do we want to collect performance data? or just usage?
* what other Red Hat telemetry services exist that we may want to integrate with/to?
* see ttereshc's email for more detail ("Foreman Telemetry")
### Agenda
* next mtg 20-JAN, 1 hr, then switch to 30 min
* how is a UUID generated?
* per-pulp-system
* ie, one UUID per-clustered-pulp
* "one UUID per-database"
* how/where will it be stored?
* in db - if it doesn't exist, create one
* create as a migration
* if it is in the db, use it
* would survive across restores/rebuilds
* multi-node installs/clusters
* same uuid, multiple nodes reporting - can we tell multi-machine architectures?
* how are we going to periodically post?
* single-node is 'easy'
* clusters
* not a separate call-home service
* periodic pulp-task-posting
* everyone puts data into db (somewhere), someone reports it up
* sanitizing data? - leave "what do we report" for later
* "how often" - performance data prob needs to be gathered more often, for example
* "how often do we write into the db?"
* write at service-startup?
* what about heartbeats?
* feature-use needs to happen more often?
* gather use-data from existing tables
* How do we do a daily task?
* wsgi, distributed-lock, dispatch task, record last-update
* wsgi heartbeat, check against last-dispatch, at correct interval start a new one
* database-xact to force ordering?
* even if it's poss for task to dispatch and yet fail to call home - it's ok
* what kind-of data is our focus?
* what versions of pulp are installed?
* what's "a typical pulp instance"?
* clustered vs not
* do we gather hardware info? (memory, disk usage, cpus?)
* what about feature-*usage* data?
* configuration - ie, content of pulp/settings.py?
* ONLY NON-SENSITIVE DATA
* def need to think hard about how to sanitize
* monitoring data?
* not a primary objective
* let's not shut the door on it for future opportunity
* monitoring wants UNsanitized data in order to be actionable
* what's at least one service we can POC against?
* cloudflare, amazon, etc
* bmbouters chooses Cloudflare - it uses Free Starter Account! It's Super-Effective!
* specific cost ballpark - $50-100/month at initial start, poss growing as we learn how much data and storage
* how can we provide full-choice to users to opt-out/opt-in
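The per-database UUID discussed above (create it once via a migration, reuse it thereafter, so it survives restores/rebuilds and is shared across a cluster) can be illustrated with a get-or-create sketch; in the real PoC this lives in a Django migration, and the dict `store` here just stands in for the database table:

```python
import uuid

def get_or_create_system_id(store: dict) -> str:
    """Return the one system UUID, creating it if it does not exist yet.

    One row per database == one UUID per (possibly clustered) Pulp.
    """
    if "system_id" not in store:
        # In practice this happens once, in a data migration.
        store["system_id"] = str(uuid.uuid4())
    return store["system_id"]
```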
### AIs
* [ggainey] establish contact with Carl Trieoff RE OpenShift data gathering [gchat]
* [bmbouter] POC against Cloudflare
* migration that creates UUID
* create CF account
* have periodic wsgi that posts UUID
* post progress to discourse
### Links
## 2022-01-06
### Attendees: wibbit, ttereshc, dkliban, bmbouters, ggainey, ppicka, ipanova
* first 2/3 mtgs, 1 hr - then shorten to 30, less often
#### what do we want from today?
* set goals
* where is the data going to go?
* focus on base infrastructure first, then "what data collected and how"
* process for how to change/mutate/morph the kinds-of data being collected
* timeline possibility:
* base infra posted by end-of-January?
* uuid/one-piece-of-data gathered and sent "somewhere"
* maybe not have a date attached? just work on POC?
* maybe just post Goal, and not worry about Date
* focus on base-infra and where data will go as POC, data-details come Later
* example of a telemetry operation in production use : https://www.home-assistant.io/integrations/analytics
* uses CloudFlare to store data
* don't forget about GDPR (and friends) laws
* what do other projects use?
* OpenShift - need to talk to Other Folks
* AI: establish contact with them?
* What about Foreman?
* lzap driving?
* AI: talk to lzap
* Fedora? crash reports, installation?
* Firefox addon may do this?
* may need some digging, does Fedora still do this?
* talk to Red Hat around direct-cost of supporting such a service
* AI: [bmbouters] talk to rchan
* wibbit: where does data go
* assuming data is sufficiently anonymized to be made public?
* yes please
* keeps us honest about anonymizing
* enhances trust/transparency
* cost of distribution/access to the data from the public
* data-outflow vs data-ingress costs
* wibbit: enterprise env can be draconic around security
* infra needs to support multiple pulp-instances hitting a single internal proxy that is the single point-of-contact to the telemetry service?
* two requirements
* clear docs on details of how data posts
* proxy support
* wibbit: data needs to be staged/stageable locally prior to being submitted
* submit-queue that can be paused/investigated
* bmbouter: adds to better user-knowledge/transparency, good idea
* wibbit: allows for admin-internal-consumption
* dkliban: would help manage multi-pulp-installation
* wibbit: Real People didn't raise any major concerns, beyond "we need to know what's being uploaded"
* wibbit: do we need a consistent UUID over time?
* need to be able to identify across upgrades
* change-over-time is really important
* bmbouter: feature should default-to-on
* ipanova: already long talk in foreman-land on this, see discourse
* wibbit: dflt-to-on is ok
* assumption is admins know what they're doing
* would lose any temporal-system info if dflt-to-off
* caveat: dflt-on for new-install vs upgrade?
* when-introduced, to an existing system, is qualitatively diff than new-install
* let's discuss how to do this "**very** transparently and loudly"
* where will this flag exist?
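The default-to-on flag with a loud, discoverable opt-out could be sketched as follows (the `ANALYTICS` setting name is an assumption for illustration, not necessarily the real setting):

```python
import logging

_logger = logging.getLogger(__name__)

# Notes only say: default to on, and be **very** transparent and loud.
ANALYTICS_ENABLED_DEFAULT = True

def announce_analytics(settings: dict) -> bool:
    """Log at startup whether telemetry is on and how to turn it off."""
    enabled = settings.get("ANALYTICS", ANALYTICS_ENABLED_DEFAULT)
    if enabled:
        _logger.info(
            "Telemetry reporting is ENABLED. Set ANALYTICS=False in your "
            "settings to disable it."
        )
    else:
        _logger.info("Telemetry reporting is disabled.")
    return enabled
```

Logging this at startup covers the "Users Who Don't Read Anything" case noted earlier in these minutes.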
#### what do we want by next week?
* AIs
* [ggainey] establish contact with Carl Trieoff RE OpenShift data gathering [gchat]
* [bmbouter] talk about budget and direct costs with management
* talk to lzap about Foreman telemetry
* Things for next week's agenda:
* how is a UUID generated?
* how/where will it be stored?
* how are we going to periodically post?
* what's at least one service we can POC against?
* cloudflare, amazon, etc
* first three mtgs will be one hour
* going forward, 30 min on the half-hour
#### Links
* https://discourse.pulpproject.org/t/proposal-telemetry/259/2
* https://www.home-assistant.io/integrations/analytics#data-storage--processing
* https://www.cloudflare.com/products/workers-kv/
* https://www.home-assistant.io/integrations/analytics
* https://community.theforeman.org/t/foreman-telemetry-api-for-developers/26409
###### tags: `Telemetry`