Implement Sampling Seed Propagation #3921

bitsandfoxes · 2025-01-28T11:13:01Z

See https://develop.sentry.dev/sdk/telemetry/traces/#propagated-random-value

The goal is to propagate the random value used for sampling as part of distributed tracing, to improve trace quality when a custom tracesSampler is used.

Motivation

A trace is complete when all of its members are sampled. A "sub-trace" is complete when all of its descendants are sampled.

Ordinarily, tracing and logging SDKs configure parent-based samplers, which make their sampling decision based on the parent Context, because that leads to completeness. However, when non-root spans or logs make independent sampling decisions instead of following the parent-based approach, the result may be incomplete.
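To make the difference concrete, here is a minimal sketch in Python (for illustration only). The function names are hypothetical and not part of any Sentry SDK. A parent-based sampler reuses the parent's decision whenever one exists and only draws a random number at the root, while an independent sampler draws a fresh number on every hop, so its decisions along a trace can disagree.

```python
import random
from typing import List, Optional


def parent_based_sample(parent_sampled: Optional[bool], sample_rate: float,
                        rng: random.Random) -> bool:
    """Hypothetical parent-based sampler: inherit the parent's decision if there is one."""
    if parent_sampled is not None:
        return parent_sampled
    # Only the root of the trace makes a fresh, rate-based decision.
    return rng.random() < sample_rate


def independent_sample(sample_rate: float, rng: random.Random) -> bool:
    """Hypothetical independent sampler: ignores the parent and draws a new random number."""
    return rng.random() < sample_rate


def sample_chain(rates: List[float], parent_based: bool, rng: random.Random) -> List[bool]:
    """Decide sampling for each hop of a linear trace (e.g. A -> B -> C)."""
    decisions: List[bool] = []
    parent: Optional[bool] = None
    for rate in rates:
        if parent_based:
            decision = parent_based_sample(parent, rate, rng)
        else:
            decision = independent_sample(rate, rng)
        decisions.append(decision)
        parent = decision
    return decisions


rng = random.Random(7)
print(sample_chain([0.5, 0.1, 0.3], parent_based=True, rng=rng))   # all entries identical
print(sample_chain([0.5, 0.1, 0.3], parent_based=False, rng=rng))  # entries may disagree
```

With the parent-based sampler the whole trace is either in or out; with independent samplers, partial (incomplete) traces are possible.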

Consistent probability sampling requires that for any span in a given trace, if a Sampler with lesser sampling probability selects the span for sampling, then the span would also be selected by a Sampler configured with greater sampling probability.
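The consistency property can be checked mechanically. The sketch below assumes each sampler decides with a plain `sample_rand < sample_rate` comparison; the helper names `sampled` and `consistent` are hypothetical. When every sampler uses the same random value, the property holds automatically, because `x < low <= high` implies `x < high`. When each sampler draws its own value, it can be violated.

```python
from itertools import combinations
from typing import Dict


def sampled(sample_rand: float, sample_rate: float) -> bool:
    """Sample in when the random value falls below the rate."""
    return sample_rand < sample_rate


def consistent(rand_per_rate: Dict[float, float]) -> bool:
    """Return True if, whenever a lower-rate sampler selects the span,
    every higher-rate sampler selects it too.

    rand_per_rate maps each sampler's rate to the random value that sampler used.
    """
    # sorted() yields rates in ascending order, so each pair is (lower, higher).
    for low, high in combinations(sorted(rand_per_rate), 2):
        if sampled(rand_per_rate[low], low) and not sampled(rand_per_rate[high], high):
            return False
    return True


# One shared random value: consistent for any set of rates.
shared = 0.2345
print(consistent({0.1: shared, 0.3: shared, 0.5: shared}))  # True

# Independent random values can break the property:
# the 0.1 sampler selects (0.05 < 0.1) but the 0.5 sampler does not (0.7 >= 0.5).
print(consistent({0.1: 0.05, 0.5: 0.7}))  # False
```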

This is achieved by propagating not only the sampling decision, but also the inputs used to make that decision, in the Dynamic Sampling Context so that downstream nodes can make consistent sampling decisions.
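As a rough sketch of what propagating the input could look like on the wire, the code below reads and writes the random value in a W3C `baggage` header. The key name `sentry-sample_rand` (following the `sentry-` prefix used for other Dynamic Sampling Context entries) and the six-decimal truncation are assumptions for illustration; the linked develop docs are authoritative. The helper names are hypothetical, and the sketch does not handle other edge cases the spec may cover, such as upstream services that send a sampling decision without a `sample_rand`.

```python
import random
from decimal import ROUND_DOWN, Decimal
from typing import Dict, Optional

# Assumption: the DSC entry travels in the baggage header under this key.
SAMPLE_RAND_KEY = "sentry-sample_rand"


def parse_baggage(header: str) -> Dict[str, str]:
    """Minimal W3C baggage parser: 'k1=v1,k2=v2;prop' -> {'k1': 'v1', 'k2': 'v2'}."""
    entries: Dict[str, str] = {}
    for member in header.split(","):
        key_value = member.split(";", 1)[0].strip()  # drop optional properties
        if "=" in key_value:
            key, value = key_value.split("=", 1)
            entries[key.strip()] = value.strip()
    return entries


def incoming_sample_rand(baggage: Optional[str], rng: random.Random) -> float:
    """Reuse a valid propagated sample_rand; otherwise draw a fresh one
    (e.g. at the head of the trace)."""
    if baggage:
        raw = parse_baggage(baggage).get(SAMPLE_RAND_KEY)
        try:
            value = float(raw) if raw is not None else None
        except ValueError:
            value = None
        if value is not None and 0.0 <= value < 1.0:
            return value
    return rng.random()  # uniform in [0, 1)


def outgoing_baggage_entry(sample_rand: float) -> str:
    """Serialize sample_rand for downstream services.

    Assumption: six decimal places, truncated rather than rounded so the
    value can never round up to 1.0.
    """
    truncated = Decimal(str(sample_rand)).quantize(Decimal("0.000001"), rounding=ROUND_DOWN)
    return f"{SAMPLE_RAND_KEY}={truncated}"
```

For example, `incoming_sample_rand("sentry-trace_id=abc,sentry-sample_rand=0.2345", rng)` returns `0.2345`, and `outgoing_baggage_entry(0.2345)` produces `sentry-sample_rand=0.234500`, so every downstream member compares the same value against its own rate.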

Example

Imagine a trace from machines A -> B -> C.

The respective sample rates are 0.5 -> 0.1 -> 0.3.

On machine A (the start of the trace), a random number in the range [0, 1) is generated and assigned to sample_rand, which is then propagated and used by all members of the trace for their sampling decisions. Let's assume that number is 0.2345.
Machine A: Sets sampled = 0.2345 < 0.5 = true
Machine B: Sets sampled = 0.2345 < 0.1 = false
Machine C: Sets sampled = 0.2345 < 0.3 = true
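The same walkthrough in a few lines of Python, assuming each machine samples in when `sample_rand < sample_rate` (variable names are illustrative):

```python
from typing import Dict

sample_rand = 0.2345  # drawn once on machine A, then propagated
sample_rates: Dict[str, float] = {"A": 0.5, "B": 0.1, "C": 0.3}

# Every machine compares the same propagated value against its own rate.
decisions = {machine: sample_rand < rate for machine, rate in sample_rates.items()}
print(decisions)  # {'A': True, 'B': False, 'C': True}

# The trace is complete only if every member sampled in, which with a shared
# sample_rand means sample_rand < min(sample_rates.values()) = 0.1.
print(all(decisions.values()))  # False
```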

Propagating the sample_rand and having all members use it in their sampling decisions ensures that whenever a member with a lower sample rate (like machine C in this case) samples in, members with higher sample rates (like machine A in this case) also sample in.

In this example, machine B didn't sample in, so we don't get a complete trace. However, if machine B had sampled in, machines A and C would have been guaranteed to sample in as well. So we always get a complete trace whenever the member with the lowest sample rate samples in.
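A small simulation illustrates the claim, assuming each machine samples in when its random value is below its rate. The helper names are hypothetical. With one shared sample_rand, the chance of a complete A -> B -> C trace is the lowest rate (0.1); if every machine draws its own value instead, it drops to the product of the rates (0.5 × 0.1 × 0.3 = 0.015).

```python
import random
from typing import List

RATES = [0.5, 0.1, 0.3]  # machines A, B, C from the example
LOWEST = RATES.index(min(RATES))  # index of machine B


def run_trace(rng: random.Random, shared: bool) -> List[bool]:
    """Sampling decisions for A -> B -> C, with one shared sample_rand or a fresh draw per machine."""
    if shared:
        sample_rand = rng.random()
        return [sample_rand < rate for rate in RATES]
    return [rng.random() < rate for rate in RATES]


def complete_fraction(shared: bool, trials: int = 100_000, seed: int = 42) -> float:
    """Fraction of simulated traces in which every member sampled in."""
    rng = random.Random(seed)
    complete = 0
    for _ in range(trials):
        decisions = run_trace(rng, shared)
        if shared and decisions[LOWEST]:
            # The lowest-rate member sampled in, so every other member must have too.
            assert all(decisions)
        complete += all(decisions)
    return complete / trials


print(complete_fraction(shared=True))   # close to min(RATES) = 0.1
print(complete_fraction(shared=False))  # close to 0.5 * 0.1 * 0.3 = 0.015
```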

The only other way to guarantee complete traces is to have all members sample at the same rate (e.g. all at 0.5 or all at 0.1), which is essentially what you get when the parent decides for everyone. Seed propagation doesn't guarantee complete traces: it maximises the chance of a complete trace when the members participating in a trace each make their own sampling decisions (either with a custom sample rate or a custom trace sampling function).

References

https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/processor/probabilisticsamplerprocessor/README.md


jamescrosswell · 2025-01-28T20:26:40Z

@bitsandfoxes I'm not sure I get it. There's already a parentSampled, right (indicating whether the trace was sampled in or out by the parent)? So if there is a parent, the sampling decision gets made by the parent, right? And if there isn't a parent, then you'd get neither parentSampled nor sample_rand, right?

So why/when does sample_rand ever get used?

bitsandfoxes · 2025-01-30T10:29:17Z

I've updated the issue description based on the Slack conversation you had.

schmitch · 2025-02-05T10:43:14Z

It would be even better if Sentry allowed ingesting data from OTel directly, or at least provided an exporter in https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/exporter, so that we could do Dotnet -> OTEL -> Sentry.

That would allow Tail Sampling.

jamescrosswell · 2025-02-06T00:15:01Z

> it would actually be even greater if sentry would allow to ingest data from otel directly

@schmitch you're aware of the Sentry.OpenTelemetry package right? Does that integration not do what you want?

github-project-automation bot added this to GDX Jan 28, 2025

stephanie-anderson assigned jamescrosswell Feb 3, 2025


jamescrosswell linked a pull request Feb 10, 2025 that will close this issue

Propagate Sampling Seed #3951 (Draft, 6 tasks)
