Improve traceability of accountable memory usage for expensive queries #130

hammerhead · 2024-11-08T10:53:32Z

Problem Statement

If a cluster is under high query load, users can run into a CircuitBreakingException. When the total circuit breaker trips (indices.breaker.total.limit), users see certain queries fail, but those are not necessarily the most expensive queries. Other (long-running) heavy queries can occupy large portions of memory while still staying below the (total) circuit breaker limit. As a result, relatively cheap queries may be the ones that start failing.

As a user, I want an easy way to diagnose which queries are currently occupying (or have recently occupied) how much of the memory that counts toward the total circuit breaker.

Besides troubleshooting circuit breaker exceptions, such a metric can also help to proactively review expensive queries before running into total circuit breaker exceptions.

Possible Solutions

There are at least two possible approaches.

Use existing sys.operations(_log) tables
There is already a used_bytes column in sys.operations(_log). Would SELECT job_id, SUM(used_bytes) FROM sys.operations be an accurate representation of peak total accountable memory usage per query?

If yes, it could be documented in Diagnostics with System Tables.

Add a new field to sys.jobs(_log)
If there is no metric exposing the peak accountable memory usage of a query, a new field could be added to sys.jobs(_log).
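
A minimal sketch of the first option, assuming the job_id and used_bytes columns of sys.operations and the id and stmt columns of sys.jobs. The query quoted above needs a GROUP BY job_id to be valid; the join to sys.jobs for the statement text, the column aliases, and the LIMIT are illustrative additions rather than part of the proposal:

```sql
-- Currently running jobs, ranked by the accounted memory
-- summed over all of their operations.
SELECT j.id AS job_id,
       j.stmt,
       SUM(o.used_bytes) AS total_used_bytes
FROM sys.operations o
JOIN sys.jobs j ON j.id = o.job_id
GROUP BY j.id, j.stmt
ORDER BY total_used_bytes DESC
LIMIT 10;  -- illustrative cut-off
```

For recently finished queries, the same shape should apply with sys.operations_log and sys.jobs_log in place of the two tables. Whether the summed value really reflects peak accountable memory per query is exactly the question raised here.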

Considered Alternatives

No response

seut · 2024-11-08T12:14:35Z

> Would SELECT job_id, SUM(used_bytes) FROM sys.operations be an accurate representation of peak total accountable memory usage per query?

Yes, that is what it is designed for. If it is not working as expected, it would be a bug.

> If yes, it could be documented in Diagnostics with System Tables.

Good point, would you be up for adding this?

hammerhead transferred this issue from crate/crate Nov 8, 2024
