Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: added support for filterBy in trace tab such as filterBy spanCount #6374

Open
wants to merge 6 commits into
base: develop
Choose a base branch
from

Conversation

eKuG
Copy link
Contributor

@eKuG eKuG commented Nov 5, 2024

Summary

Solves #3174

Related Issues / PR's

Screenshots

NA

Affected Areas and Manually Tested Areas


Important

Add support for filtering trace queries by spanCount using a new filterBy parameter in query functions.

  • Behavior:
    • Added filterBy parameter to runBuilderQuery() in helper.go and v2/helper.go to support filtering by spanCount.
    • Updated PrepareTracesQuery() in query_builder.go and traces/v3/query_builder.go to accept filterBy and incorporate it into query construction.
  • Models:
    • Added FilterBy field to QueryRangeParamsV3 in v3.go to store filter criteria.
  • Functions:
    • Modified buildTracesQuery() in traces/v3/query_builder.go to handle filterBy parameter.
  • Misc:
    • Updated function signatures for trace query preparation to include filterBy in query_builder.go and traces/v3/query_builder.go.

This description was created by Ellipsis for fd31605. It will automatically update as commits are pushed.

Copy link

github-actions bot commented Nov 5, 2024

Build Error! No Linked Issue found. Please link an issue or mention it in the body using #<issue_id>

@github-actions github-actions bot added the enhancement New feature or request label Nov 5, 2024
Copy link

github-actions bot commented Nov 5, 2024

Build Error! No Linked Issue found. Please link an issue or mention it in the body using #<issue_id>

Copy link
Contributor

@ellipsis-dev ellipsis-dev bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Looks good to me! Reviewed everything up to 42e7faa in 54 seconds

More details
  • Looked at 157 lines of code in 5 files
  • Skipped 0 files when reviewing.
  • Skipped posting 2 drafted comments based on config settings.
1. pkg/query-service/model/v3/v3.go:380
  • Draft comment:
    The FilterBy parameter is added but not used in any logic or query construction. Ensure it is utilized or remove it to avoid confusion.
  • Reason this comment was not posted:
    Comment did not seem useful.
2. pkg/query-service/model/v3/v3.go:380
  • Draft comment:
    The FilterBy field in QueryRangeParamsV3 is not used in the codebase. Consider removing it to maintain clean code.
  • Reason this comment was not posted:
    Marked as duplicate.

Workflow ID: wflow_8CoFwUGy3oX38HkO


You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

Copy link

github-actions bot commented Nov 7, 2024

Build Error! No Linked Issue found. Please link an issue or mention it in the body using #<issue_id>

Copy link
Contributor

@ellipsis-dev ellipsis-dev bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Looks good to me! Incremental review on c2e7e91 in 39 seconds

More details
  • Looked at 55 lines of code in 2 files
  • Skipped 0 files when reviewing.
  • Skipped posting 4 drafted comments based on config settings.
1. pkg/query-service/app/traces/v3/query_builder.go:368
  • Draft comment:
    Remove commented-out code to improve readability and maintainability.
  • Reason this comment was not posted:
    Confidence changes required: 50%
    The code contains commented-out lines that are not needed and should be removed for clarity and cleanliness.
2. pkg/query-service/app/traces/v3/query_builder.go:369
  • Draft comment:
    Ensure that SQL query construction is safe from SQL injection vulnerabilities by using parameterized queries or proper escaping.
  • Reason this comment was not posted:
    Comment did not seem useful.
3. pkg/query-service/app/traces/v3/query_builder.go:358
  • Draft comment:
    Consider documenting the default limit behavior or ensure it's the intended behavior when limit is set to 0.
  • Reason this comment was not posted:
    Confidence changes required: 30%
    The code uses a default limit of 100 if the limit is set to 0. This behavior should be documented or reconsidered if it's not the intended behavior.
4. pkg/query-service/app/traces/v3/query_builder.go:353
  • Draft comment:
    Avoid using the component/index.tsx file structure approach, as it makes it difficult to debug and find components using global search tools like VS Code.
  • Reason this comment was not posted:
    Comment was on unchanged code.

Workflow ID: wflow_wx127ywEe55WrcsG


You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

Copy link

github-actions bot commented Nov 7, 2024

Build Error! No Linked Issue found. Please link an issue or mention it in the body using #<issue_id>

Copy link
Contributor

@ellipsis-dev ellipsis-dev bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Looks good to me! Incremental review on d57d5ab in 41 seconds

More details
  • Looked at 53 lines of code in 1 files
  • Skipped 0 files when reviewing.
  • Skipped posting 2 drafted comments based on config settings.
1. pkg/query-service/app/traces/v3/query_builder_test.go:1196
  • Draft comment:
    The FilterBy parameter is added to the function call but is not utilized in the test logic. Ensure that the FilterBy parameter is properly integrated into the test cases to validate its functionality. This issue is also present in TestPrepareTracesQuery.
  • Reason this comment was not posted:
    Comment did not seem useful.
2. pkg/query-service/app/traces/v3/query_builder_test.go:491
  • Draft comment:
    Avoid using the component/index.tsx file structure approach, as it makes it difficult to debug and find components using global search tools like VS Code. This applies to the overall file structure.
  • Reason this comment was not posted:
    Comment was on unchanged code.

Workflow ID: wflow_BAFwOmRDcBEuoWIT


You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

@eKuG eKuG changed the title [WIP] feat: added support for filterBy in trace tab such as filterBy spanCount feat: added support for filterBy in trace tab such as filterBy spanCount Nov 7, 2024
Copy link
Contributor

@ellipsis-dev ellipsis-dev bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Looks good to me! Incremental review on fd31605 in 1 minute and 14 seconds

More details
  • Looked at 20 lines of code in 1 files
  • Skipped 0 files when reviewing.
  • Skipped posting 2 drafted comments based on config settings.
1. pkg/query-service/app/traces/v3/query_builder_test.go:1200
  • Draft comment:
    Remove the commented-out code block as it seems to be a leftover and is not needed.
  • Reason this comment was not posted:
    Confidence changes required: 50%
    The commented-out code block seems to be a leftover and should be removed for code cleanliness.
2. pkg/query-service/app/traces/v3/query_builder_test.go:1200
  • Draft comment:
    Remove the commented-out code block to keep the code clean and maintainable.
  • Reason this comment was not posted:
    Comment looked like it was already resolved.

Workflow ID: wflow_oywcCxJ1VxW4LEYZ


You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

@srikanthccv
Copy link
Member

@eKuG I see two issues with this change.

  1. ... GROUP BY traceID ...

This is not going to work at scale. We can't do a group by a very high cardinal column at query. The number of traces can go from a few tens of millions to several hundred million and billions easily. Adding such a column in a group by is not an option because there won't be enough memory. Did you test this on a few billion traces or several hundred million traces at least?

  1. FilterBy string in the request payload

I would assume the capability should support the ability to specify the span count in the request, for example, give me traces with span count > 1000 and < 2000. I don't see how that's supported.

@eKuG
Copy link
Contributor Author

eKuG commented Nov 7, 2024

Hey @srikanthccv Lets talk about the second point first.
So "filterBy" is an attribute used to filter traces by either span_count or trace_duration. It has no other purpose. (In hindsight, I think I should rename thiss column to something like "trace_filter"). Let me know what you think.
If we need to finf the traces where 1000<spanCount<2000, that will go in the where clause.

My final query was looking something like this. So the span_count > x will be present in the where clause with the timestamp.

SELECT
    traceID,
    COUNT() AS span_count,
    max(toInt64(timestamp) + durationNano) - min(toInt64(timestamp)) AS total_duration_nano
FROM signoz_traces.signoz_index_v2
WHERE (timestamp >= '1730162616000000000') AND (timestamp <= '1730268878000000000')
GROUP BY traceID
ORDER BY span_count DESC
LIMIT 10

For second point, do let me know what you think should be the best approach for the same? I tested this query on approximately 200M traces and it perfromed as expected. Let me see, if we can optimise this query somehow.

@srikanthccv
Copy link
Member

srikanthccv commented Nov 7, 2024

If we need to finf the traces where 1000<spanCount<2000, that will go in the where clause.

Please share how this is achieved in the code.

I tested this query on approximately 200M traces and it perfromed as expected

Please help me understand the performed as expected part. Have you folks discussed some pre-requisite in terms of resources needed and how you would have billions of traceIDs? From my experience, anything beyond 10 million rows in the group by questionable and beyond 25 mil is where the line should be drawn in most cases. Unless the enormous amount of memory availability, the 200M should have crashed with memory error.

Edit:

I see that query only needs to maintain count and duration for state so it might require comparatively less memory but it can OOM kill ClickHouse when the total traces id can't fit in memory

@srikanthccv
Copy link
Member

I don't think grouping by traceID at the query time is an option for us. I believe @nityanandagohain has created some summary tables as a part of traces v3, which helps specifically with count; please check with him once.

@eKuG
Copy link
Contributor Author

eKuG commented Nov 7, 2024

SELECT
    traceID,
    COUNT() AS span_count,
    max(toInt64(timestamp) + durationNano) - min(toInt64(timestamp)) AS total_duration_nano
FROM signoz_traces.signoz_index_v2
WHERE (timestamp >= '1730162616000000000') AND (timestamp <= '1730268878000000000')
GROUP BY traceID
HAVING COUNT() > 5
ORDER BY span_count DESC
LIMIT 10

@srikanthccv According to the document I have written on notion, it should have been having instead of where for Span_count.
Having said that, I understand your concern with group by clause with larger dataset. Let me start a conversation on slack with @nityanandagohain when he can add span_count MV on trace v3

@eKuG eKuG closed this Nov 7, 2024
@eKuG eKuG reopened this Nov 7, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
docs shipped enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants