Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[processor/probabilistic_sampler] The default sampling mode in documentation is self-contradictory #36802

Open
namco1992 opened this issue Dec 12, 2024 · 3 comments · May be fixed by #37536
Open
Labels
needs triage New item requiring triage processor/probabilisticsampler Probabilistic Sampler processor

Comments

@namco1992
Copy link
Contributor

namco1992 commented Dec 12, 2024

Component(s)

processor/probabilisticsampler

Describe the issue you're reporting

In the probabilistic_sampler README, it mentions that the hash_seed is selected by default:

However, later in the configuration sector, it says

the default is "proportional" unless either hash_seed is configured or attribute_source is set to record.

- `mode` (string, optional): One of "proportional", "equalizing", or "hash_seed"; the default is "proportional" unless either `hash_seed` is configured or `attribute_source` is set to `record`.

According to the code below, the default mode is indeed hash_seed. The const defaultMode is also set to hash_seed, which makes the following logic always select hash_seed, regardless of whether the HashSeed value is set or not:

if mode == modeUnset {
// Reasons to choose the legacy behavior include:
// (a) having set the hash seed
// (b) logs signal w/o trace ID source
if cfg.HashSeed != 0 || (isLogs && cfg.AttributeSource != traceIDAttributeSource) {
mode = HashSeed
} else {
mode = defaultMode
}
}

According to the #31894, it mentioned:

The default sampling mode remains HashSeed. We consider a future change of default to Proportional to be desirable.

I suppose we need a follow-up to decide if we want to switch the default sampling mode and correct the document accordingly to avoid any confusion. I'd love to open a PR for it, but I'm unsure how we want to proceed.

@namco1992 namco1992 added the needs triage New item requiring triage label Dec 12, 2024
@github-actions github-actions bot added the processor/probabilisticsampler Probabilistic Sampler processor label Dec 12, 2024
Copy link
Contributor

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@jmacd
Copy link
Contributor

jmacd commented Jan 23, 2025

I think we should correct the documentation. Thanks @namco1992 for the investigation.

@namco1992
Copy link
Contributor Author

Got it, will PR it in, thanks!

namco1992 added a commit to namco1992/opentelemetry-collector-contrib that referenced this issue Jan 28, 2025
mode

Fixes open-telemetry#36802.

As per open-telemetry#36802 described, the sampling mode is always "hash_seed" if no
mode is specified, regardless of the `hash_seed` or `attribute_source`
field status. Update the README to reflect it.

Signed-off-by: Mengnan Gong <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
needs triage New item requiring triage processor/probabilisticsampler Probabilistic Sampler processor
Projects
None yet
2 participants