# client-side sampling

## what/why?

Refinery means running extra infrastructure, and we already have docs about it. So what's the alternative?

The dividing line is buffering vs. no buffering. If you want buffering, please just use Refinery; building and operating buffering inside your own services is not worth the effort or resources. For orgs with many services: Refinery. The client-side alternatives on this page are best for teams with a small number of services, or teams that want to pre-sample specific kinds of events before they ever leave the process. Every team has different needs.

## example trace

Here's the trace we'll use throughout this page (an instrumentation sketch follows the lists below).

The trace:

* parent
  * child 1
    * grandchild 1
    * grandchild 2
  * child 2
    * grandchild 3
    * grandchild 4

The order the events are sent in:

1. grandchild 1
2. grandchild 2
3. child 1
4. grandchild 3
5. grandchild 4
6. child 2
7. parent
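A minimal sketch of instrumentation that would produce that send order, assuming beeline-go's `StartSpan`/`Send` API; the write key, dataset, and span names are placeholders:

```
package main

import (
	"context"

	beeline "github.com/honeycombio/beeline-go"
)

func main() {
	beeline.Init(beeline.Config{WriteKey: "YOUR_KEY", Dataset: "example"})
	defer beeline.Close()

	// parent span: not sent until the very end
	ctx, parent := beeline.StartSpan(context.Background(), "parent")

	ctx1, child1 := beeline.StartSpan(ctx, "child 1")
	_, gc1 := beeline.StartSpan(ctx1, "grandchild 1")
	gc1.Send() // sent 1st
	_, gc2 := beeline.StartSpan(ctx1, "grandchild 2")
	gc2.Send() // sent 2nd
	child1.Send() // sent 3rd

	ctx2, child2 := beeline.StartSpan(ctx, "child 2")
	_, gc3 := beeline.StartSpan(ctx2, "grandchild 3")
	gc3.Send() // sent 4th
	_, gc4 := beeline.StartSpan(ctx2, "grandchild 4")
	gc4.Send() // sent 5th
	child2.Send() // sent 6th

	parent.Send() // sent 7th: the root goes out last
}
```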
When events are sent eagerly, like with Honeycomb's Beelines, each span is sent the moment it finishes — which is why the children above arrive before their parents, and the root arrives last. Keep that in mind for the rest of this page: with no buffering, sampling decisions are made event-by-event, with no view of the rest of the trace.

## case 1: simple send/drop hook

The Beelines let you register a sampler hook that sees each event's fields and decides its fate; in Go it returns whether to send the event plus a sample rate:

```
func samplerHook(fields map[string]interface{}) (bool, int) {
	if drop, _ := fields["app.drop"].(bool); drop {
		return false, 0 // drop
	}
	return true, 1 // send, unsampled
}
```

### dropping leaves (add field)

Why drop leaves? Noisy integrations, low-value spans — under Honeycomb's event-based pricing, there's no reason to spend event budget on data you'll never query.

Caveat: this only works cleanly for leaves like grandchildren 1–4. Drop anything else and the surviving spans point at a parent that never arrives.
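On the marking side, a sketch assuming beeline-go's `AddField`, which prefixes custom span fields with `app.` — so the hook above sees the field as `app.drop`:

```
import (
	"context"

	beeline "github.com/honeycombio/beeline-go"
)

// markNoisySpan flags a low-value leaf span so the sampler hook drops it.
func markNoisySpan(ctx context.Context) {
	_, gc := beeline.StartSpan(ctx, "grandchild 1")
	gc.AddField("drop", true) // arrives at the hook as "app.drop"
	gc.Send()
}
```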
### dropping non-leaf individual events (parents)

Don't. The trace view ends up with orphaned children and a hole where the parent should be — bad UX all around.

### dropping entire traces (add field to trace)

Caveat: trace-level fields have to be added early, before any events are sent. In the example trace, you could add the trace field from parent, child 1, or grandchild 1; by the time grandchild 2 runs, grandchild 1 has already been sent without it.
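A sketch using beeline-go's trace-level fields (`trace.GetTraceFromContext`), which are copied onto every span sent afterward. How the field name gets prefixed when it reaches the sampler hook is an assumption here — verify it before matching on it:

```
import (
	"context"

	beeline "github.com/honeycombio/beeline-go"
	"github.com/honeycombio/beeline-go/trace"
)

// dropWholeTrace marks the trace for dropping before any span is sent.
func dropWholeTrace(ctx context.Context) {
	ctx, root := beeline.StartSpan(ctx, "parent")
	// Add the field at the trace level, early, so every event carries it.
	if tr := trace.GetTraceFromContext(ctx); tr != nil {
		tr.AddField("drop", true)
	}
	defer root.Send()
	// ... children run here; each inherits the trace-level field ...
}
```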
## case 2: varying sample rates

A blanket sample rate is too coarse: it doesn't reflect how differently different traffic actually behaves. What about when you want to return something other than a fixed value across all traffic?
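A sketch of a hook with conditional rates; the field names (`response.status_code`, `request.path`) are assumptions — use whatever your instrumentation actually emits:

```
import "math/rand"

// samplerHook keeps all errors, heavily samples health checks, and
// samples everything else at 1-in-10.
func samplerHook(fields map[string]interface{}) (bool, int) {
	if code, ok := fields["response.status_code"].(int); ok && code >= 500 {
		return true, 1 // keep every error, unsampled
	}
	if fields["request.path"] == "/healthz" {
		return rand.Intn(100) == 0, 100 // keep ~1 in 100 health checks
	}
	return rand.Intn(10) == 0, 10 // default: keep ~1 in 10
}
```

Note the random choice here is per-event: on a multi-span trace it will keep some spans and drop their siblings. The rest of this page is about fixing that.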
### the math of sample rate

* you can set custom rates from your sample hook, too
* the backend inflates counts back up: one stored event with sample rate N stands in for N original events
* different events might get different rates (if you define it that way), and mixed rates within a trace jack up the results — see the sketch after this list
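A toy illustration of that re-inflation arithmetic (not Honeycomb code, just the bookkeeping):

```
// Each kept event stands in for sampleRate original events.
func estimateCount(keptSampleRates []int) int {
	total := 0
	for _, rate := range keptSampleRates {
		total += rate
	}
	return total
}

// estimateCount([]int{10, 10, 1}) == 21: three stored events represent
// roughly 21 original events. If siblings in one trace carried different
// rates, per-trace aggregates get skewed in exactly this way.
```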
Thus: client-side sampling for whole traces. For which we use...

### deterministic sampling

Bad idea: `rand(rate) == 0` — every event rolls its own dice, so spans in the same trace get different answers.

Good idea: `hash(trace.id) <= MAXHASH / rate` — sketched below.

Deterministic means a consistent return value for a consistent input value; using the trace_id as the input means the same result for every event in a trace.
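A sketch of the deterministic check. The choice of SHA-1 and taking the first four bytes are assumptions here — match your SDK's deterministic sampler exactly if you need cross-language agreement:

```
import (
	"crypto/sha1"
	"encoding/binary"
	"math"
)

// shouldSample returns the same answer for the same trace ID every time,
// in every service: hash(trace.id) <= MAXHASH / rate.
func shouldSample(traceID string, rate uint32) bool {
	sum := sha1.Sum([]byte(traceID))
	v := binary.BigEndian.Uint32(sum[:4]) // first 4 bytes as a uint32
	return v <= math.MaxUint32/rate
}
```

In a sampler hook, you'd feed it the event's trace ID (the Beelines carry it in the `trace.trace_id` field) and return the matching rate alongside the decision.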
Caveats: trying to achieve this client-side across distributed traces is hard — every service has to hash the same way and apply the same rate, or you're back to partial traces. Just use Refinery.

If you insist, here's how to propagate sample rates (run this by the telemetry team before including it): piggyback the decision on the trace propagation headers.
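A sketch of one approach, reusing `shouldSample` from above and assuming beeline-go trace-level fields ride along in the propagated trace context; the `sample_rate` field name and the `app.` prefixing the hook sees are assumptions, not a supported contract:

```
import (
	"context"

	"github.com/honeycombio/beeline-go/trace"
)

// chooseAndPropagateRate picks a rate at the trace root and stashes it as a
// trace-level field, so downstream services can read it out of the
// propagated trace context instead of re-deciding.
func chooseAndPropagateRate(ctx context.Context, rate int) {
	if tr := trace.GetTraceFromContext(ctx); tr != nil {
		tr.AddField("sample_rate", rate)
	}
}

// Downstream, the sampler hook reuses the propagated rate. The exact key the
// hook sees depends on how your Beeline prefixes trace-level fields — verify
// before shipping.
func samplerHookDownstream(fields map[string]interface{}) (bool, int) {
	if rate, ok := fields["app.sample_rate"].(int); ok && rate > 0 {
		id, _ := fields["trace.trace_id"].(string)
		return shouldSample(id, uint32(rate)), rate
	}
	return true, 1 // no propagated rate: keep by default
}
```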
No guarantee that any of this works, and we probably can't help you debug it. Seriously: just use Refinery.

## sampling in OTel

(some parts are still under discussion)