Hypothesis driven, not data driven
I love this diagram!
<a title="Efbrazil, CC BY-SA 4.0 <https://creativecommons.org/licenses/by-sa/4.0>, via Wikimedia Commons" href="https://commons.wikimedia.org/wiki/File:The_Scientific_Method.svg"><img width="512" alt="The Scientific Method" src="https://upload.wikimedia.org/wikipedia/commons/thumb/8/82/The_Scientific_Method.svg/512px-The_Scientific_Method.svg.png"></a>
Open source telemetry should be using this approach, rather than the 'big data' approach of 'collect everything we can, and sift through it later'. The 'collect everything' approach, while easier for implementers, leads to both privacy problems as well as sometimes inaccurate results (p-hacking for example). It's also, IMO, less cool! And in 2023, being cool is very important.
So the overall process in an open source project should really look like:
Come up with a specific question it wants to answer