Implement Sampling Seed Propagation #3921
Comments
@bitsandfoxes I'm not sure I get it. There's already a So why/when does |
I've updated the issue description based on the Slack conversation you had. |
It would be even better if Sentry could ingest data from OTel directly, or at least provide an exporter in https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/exporter so that we can do Dotnet -> OTEL -> Sentry. That would allow tail sampling. |
@schmitch you're aware of the Sentry.OpenTelemetry package right? Does that integration not do what you want? |
See https://develop.sentry.dev/sdk/telemetry/traces/#propagated-random-value
The goal is to propagate the random value used for sampling in the context of distributed tracing, which improves trace quality when using a custom tracesSampler.
Motivation
A trace is complete when all of its members are sampled. A "sub-trace" is complete when all of its descendants are sampled.
Ordinarily, Trace and Logging SDKs configure parent-based samplers which decide to sample based on the Context, because it leads to completeness. However, when non-root spans or logs make independent sampling decisions, instead of using the parent-based approach, incompleteness may result.
Consistent probability sampling requires that for any span in a given trace, if a Sampler with lesser sampling probability selects the span for sampling, then the span would also be selected by a Sampler configured with greater sampling probability.
This is achieved by propagating not only the sampling decision, but also the inputs used to make that decision, in the Dynamic Sampling Context so that downstream nodes can make consistent sampling decisions.
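As a rough illustration of what "propagating the inputs" could look like on a downstream node, here is a minimal Python sketch. It assumes the random value travels as a `sentry-sample_rand` entry in the baggage header, which is how I read the linked spec; the function name and the fallback behaviour are hypothetical and not taken from any SDK.

```python
import random

# Assumption: the propagated value is carried as this baggage key.
SAMPLE_RAND_KEY = "sentry-sample_rand"


def sample_rand_from_baggage(baggage, rng=random.random):
    """Return the propagated sample_rand, or generate a fresh one.

    A fresh value is only expected at the head of a trace, or when the
    incoming value is missing or not a number in [0, 1).
    """
    for item in (baggage or "").split(","):
        key, _, value = item.strip().partition("=")
        if key != SAMPLE_RAND_KEY:
            continue
        try:
            parsed = float(value)
        except ValueError:
            break  # malformed value: fall back to a fresh one
        if 0.0 <= parsed < 1.0:
            return parsed
        break  # out of range: fall back to a fresh one
    return rng()
```

Every member of the trace that reads the same incoming value then compares the same number against its own sample rate.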
Example
Imagine a trace from machines A -> B -> C.
The respective sample rates are 0.5 -> 0.1 -> 0.3
On machine A (the start of the trace), a random number in the range [0, 1) is generated and assigned to sample_rand, which is then propagated and used by all members of the trace for their sampling decisions. Let's assume that number is 0.2345.
Machine A: Sets sampled = 0.2345 < 0.5 = true
Machine B: Sets sampled = 0.2345 < 0.1 = false
Machine C: Sets sampled = 0.2345 < 0.3 = true
Propagating sample_rand and having all members use it in their sampling decisions ensures that whenever a member with a lower sample rate (like machine C in this case) samples in, members with higher sample rates (like machine A in this case) also sample in.
In this example, Machine B didn't sample in, so we don't get a complete trace. However, if machine B did sample in, Machines A and C would be guaranteed to sample in as well, so we always get a complete trace whenever the member with the lowest sample rate samples in.
The only other way to guarantee complete traces is to have all members sample at the same rate (e.g. 0.5 or 0.1), which is basically what you get when the parent decides for everyone. Seed propagation doesn't guarantee complete traces; it maximises the chance of a complete trace when the different members participating in a trace are all making their own sampling decisions (either with a custom sample rate or a custom trace sampling function).
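The A -> B -> C example can be reproduced with a few lines of Python. This is only a sketch of the comparison described in this issue (sampled = sample_rand < sample_rate); the helper names are made up for illustration and are not SDK APIs.

```python
def is_sampled(sample_rand, sample_rate):
    """A member samples in when the shared random value is below its rate."""
    return sample_rand < sample_rate


def decide(sample_rand, rates):
    """Apply the same propagated sample_rand to every member's own rate."""
    return {member: is_sampled(sample_rand, rate) for member, rate in rates.items()}


def is_complete(decisions):
    """A trace is complete when every member sampled in."""
    return all(decisions.values())


# Sample rates from the example: A -> B -> C.
RATES = {"A": 0.5, "B": 0.1, "C": 0.3}

example = decide(0.2345, RATES)   # B drops out, so the trace is incomplete
complete = decide(0.05, RATES)    # below the lowest rate, so everyone samples in
```

With a shared sample_rand, the trace is complete exactly when the value falls below the lowest rate in the trace (0.1 here). If each member drew its own random number instead, all three would sample in together only about 0.5 * 0.1 * 0.3 = 1.5% of the time.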