# client-side sampling
## what/why?
Refinery gives you powerful sampling, but it's also infrastructure you have to run: one more service to deploy, scale, and monitor. We already have docs covering Refinery, so this page covers the alternative: sampling in the client, inside your own instrumented code.

The key difference is buffering. Refinery buffers a whole trace and makes one decision after it has seen every span; a client-side sampler holds nothing and decides as events happen. If you want buffering, please just use Refinery. We don't recommend buffering client-side: holding every in-flight trace in your application's memory costs RAM and latency, a single process can never see the spans emitted by other services anyway, and building it well is not worth the effort or resources.

For orgs with many services, Refinery is the right answer. The client-side alternatives below are best for teams with a low number of services, or teams that want to pre-sample specific kinds of events before they ever leave the process. Every team has different needs; weigh the trade-offs for yours.
## example trace
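Here's the code we'll use throughout this page: a minimal sketch built on the Go Beeline (beeline-go). The write key and dataset are placeholders.

```
package main

import (
	"context"

	"github.com/honeycombio/beeline-go"
)

func main() {
	beeline.Init(beeline.Config{
		WriteKey: "YOUR-API-KEY",  // placeholder
		Dataset:  "sampling-demo", // placeholder
	})
	defer beeline.Close()

	// "parent" is the root span; StartSpan creates a new trace
	// when the incoming context doesn't already carry one.
	ctx, parent := beeline.StartSpan(context.Background(), "parent")

	cctx, child1 := beeline.StartSpan(ctx, "child 1")
	_, gc1 := beeline.StartSpan(cctx, "grandchild 1")
	gc1.Send()
	_, gc2 := beeline.StartSpan(cctx, "grandchild 2")
	gc2.Send()
	child1.Send()

	cctx, child2 := beeline.StartSpan(ctx, "child 2")
	_, gc3 := beeline.StartSpan(cctx, "grandchild 3")
	gc3.Send()
	_, gc4 := beeline.StartSpan(cctx, "grandchild 4")
	gc4.Send()
	child2.Send()

	parent.Send()
}
```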
The trace:
* parent
  * child 1
    * grandchild 1
    * grandchild 2
  * child 2
    * grandchild 3
    * grandchild 4
The order in which the spans finish, and are sent:
1. grandchild 1
2. grandchild 2
3. child 1
4. grandchild 3
5. grandchild 4
6. child 2
7. parent
When events are sent eagerly, as with Honeycomb's Beelines, each span goes out the moment it finishes: leaves first, the root last, in exactly the order above. This is important to keep in mind for the rest of the page. When you're not buffering anything, sampling decisions are made event by event, at the instant each event is sent.
## case 1: simple send/drop hook
In the Go Beeline, a sampler hook receives each event's fields and returns two things: whether to send the event, and the sample rate to report with it. The simplest version keys off a single field:

```
// samplerHook runs once per event. It returns whether to send the
// event and the sample rate to report alongside it.
func samplerHook(fields map[string]interface{}) (bool, int) {
	if drop, _ := fields["app.drop"].(bool); drop {
		return false, 0 // drop it; the rate is ignored
	}
	return true, 1 // send it, unsampled (rate 1 = kept 1 of 1)
}
```
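To wire the hook up, pass it in the Beeline config. Per the Beeline docs, defining a sampler hook replaces the default deterministic sampling you'd otherwise get from the `SampleRate` option:

```
beeline.Init(beeline.Config{
	WriteKey:    "YOUR-API-KEY",  // placeholder
	Dataset:     "sampling-demo", // placeholder
	SamplerHook: samplerHook,     // runs on every event before send
})
```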
### dropping leaves (add field)
Why drop leaves? They're where the noise lives: chatty auto-instrumented integrations, redundant database spans, anything low-value. Since Honeycomb's pricing model is based on event volume, every low-value span you drop is event budget you keep for data you actually care about.

The caveat: this only works cleanly for leaf spans like grandchildren 1 through 4. Dropping anything else results in events whose parents are missing.
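For example, to mark grandchild 1 for dropping, add the field from inside that span. A sketch, assuming the hook above is registered; note that beeline-go's `AddField` stores custom fields under the `app.` namespace, which is why the hook checks `app.drop`:

```
gctx, gc1 := beeline.StartSpan(cctx, "grandchild 1")
// The field lands on this span only, stored as app.drop
// thanks to the Beeline's namespacing.
beeline.AddField(gctx, "drop", true)
gc1.Send() // the hook returns (false, 0) for this one event
```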
### dropping non-leaf individual events (parents)
You can, but don't. Dropping child 1 while keeping grandchildren 1 and 2 leaves those spans orphaned, and dropping the root leaves the whole trace headless; the trace view can't render events whose parents never arrived. It's a bad experience for whoever reads the trace later.
### dropping entire traces (add field to trace)
To drop a whole trace, add the field at the trace level so the hook sees it on every event. The caveat: trace-level fields only reach spans that haven't been sent yet, so they have to be added early. In the example trace, grandchild 1 is the first event out the door, so you could add the trace field from parent, child 1, or grandchild 1, before grandchild 1 sends.
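A sketch of marking the trace from the root span, before any child gets a chance to send. beeline-go's `AddFieldToTrace` copies the field onto every span of the trace in this process; we're assuming the same `app.` namespacing applies here, so verify that against your Beeline's docs:

```
ctx, parent := beeline.StartSpan(context.Background(), "parent")
// Every span in this trace now carries the field, so the
// sampler hook drops all seven events.
beeline.AddFieldToTrace(ctx, "drop", true)
```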
## case 2: varying sample rates
A blanket sample rate doesn't reflect actual behavior well; it's too coarse, keeping rare, interesting events (errors) and frequent, boring ones (health checks) at exactly the same rate. So what about when you want to return something other than a fixed value across all traffic?
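A sketch of a hook with conditional sample rates. The field names (`error`, `name`), the `/healthz` match, and the 1-in-100 / 1-in-10 split are illustrative assumptions; key off whatever fields your instrumentation actually sets. It uses random sampling for now, which the deterministic section below will talk you out of:

```
import "math/rand"

func samplerHook(fields map[string]interface{}) (bool, int) {
	// Errors are rare and valuable: keep every one.
	if _, ok := fields["error"]; ok {
		return true, 1
	}
	// Health checks are frequent and boring: keep 1 in 100.
	if name, _ := fields["name"].(string); name == "/healthz" {
		return rand.Intn(100) == 0, 100
	}
	// Everything else: keep 1 in 10.
	return rand.Intn(10) == 0, 10
}
```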
### the math of sample rate
* You can set custom rates from your sample hook, too; whatever rate you return is attached to the event when it's sent.
* On the backend, Honeycomb inflates results by that rate: an event stored with a sample rate of 10 counts as 10 events in COUNTs and other aggregates (see the sketch after this list). Sample 1,000 requests at a rate of 10 and you store roughly 100 events that still report as roughly 1,000.
* Different events might get different rates (if you define your hook that way). The backend compensates for each event individually, but within a single trace, independent event-by-event decisions keep some spans and drop others, which jacks up your per-trace results.
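As a toy illustration of that weighting idea (not Honeycomb's actual implementation): each stored event stands in for `rate` original events.

```
// estimatedCount weights each stored event by its sample rate.
// 100 stored events that each carry a rate of 10 report as ~1,000.
func estimatedCount(storedEventRates []int) int {
	total := 0
	for _, rate := range storedEventRates {
		total += rate // one stored event stands in for `rate` real ones
	}
	return total
}
```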
For traces, then, every event needs to reach the same decision on its own, and for that we use...
### deterministic sampling
A bad idea: `rand(rate) == 0`. Every event rolls its own dice, so at a rate of 10 each span survives with an independent 10% chance, and you almost never keep a trace intact.

A good idea: `hash(trace.id) <= MAXHASH / rate`. Deterministic means a consistent return value for a consistent input value. Since the only input is the trace ID, every event in a trace reaches the same keep-or-drop decision, with no coordination required.
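Here's that in Go. Hashing with SHA-1 and comparing the top four bytes against a threshold mirrors the approach Honeycomb's samplers take; `trace.trace_id` is the field name we'd expect the Go Beeline to hand the hook, so treat it as an assumption to verify:

```
import (
	"crypto/sha1"
	"encoding/binary"
	"math"
)

// shouldKeep reaches the same decision for every event in a trace,
// because the only input is the trace ID.
func shouldKeep(traceID string, rate uint32) bool {
	sum := sha1.Sum([]byte(traceID))
	// Treat the first 4 bytes of the hash as a uniform uint32 and
	// keep the lowest 1/rate of the hash space.
	return binary.BigEndian.Uint32(sum[:4]) <= math.MaxUint32/rate
}

func samplerHook(fields map[string]interface{}) (bool, int) {
	const rate = 10
	traceID, _ := fields["trace.trace_id"].(string)
	return shouldKeep(traceID, rate), rate
}
```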
The caveat: achieving this client-side across distributed traces is hard. Every service has to run the same hash function at the same rate and honor decisions made upstream, and nothing enforces that agreement for you. This is exactly the problem Refinery exists to solve, so, once more: just use Refinery.
If you insist, the missing piece is propagating sample rates: carry the rate a trace's head service chose alongside your trace propagation headers, so every downstream service reports the same rate instead of re-deciding. There's no guarantee this works for your setup, and we probably can't help you debug it.
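If you go down this road anyway, the shape of it looks something like this sketch. The `X-Sample-Rate` header is our own invention for illustration, not a standard; every service in the trace has to agree on it:

```
import (
	"net/http"
	"strconv"
)

// Caller side: attach the trace's chosen rate to outgoing requests,
// next to your usual trace propagation headers.
func propagateRate(req *http.Request, rate int) {
	req.Header.Set("X-Sample-Rate", strconv.Itoa(rate))
}

// Callee side: reuse the upstream decision instead of re-deciding.
func rateFromRequest(r *http.Request) int {
	rate, err := strconv.Atoi(r.Header.Get("X-Sample-Rate"))
	if err != nil || rate < 1 {
		return 1 // no usable upstream rate; treat as unsampled
	}
	return rate
}
```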
Seriously: just use Refinery.
## sampling in OTel
(Some parts of OTel's sampling story are still under discussion.)