samples limit is not honored by failed rows checks #1985
Comments
SAS-2664
This changes how samples are collected from UserDefinedFailedRowsExpressionQueries. To apply the samples limit given in the check configuration, or the default samples limit when none is given, the failed rows expression query needs to fire off an additional SampleQuery. The SampleQuery executes a copy of the query with a LIMIT clause appended. [sodadata#1985](sodadata#1985)
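To illustrate the idea of a separate, limited sample query, here is a minimal sketch. It is not Soda's implementation: `limited_sample_sql`, the `orders` table, and the default of 100 are hypothetical names and values chosen for the example, and it uses an in-memory SQLite database. It also wraps the user query in a subquery rather than literally appending `LIMIT`, since a user-supplied query may end in a semicolon or already carry its own `LIMIT`.

```python
import sqlite3


def limited_sample_sql(failed_rows_sql: str, samples_limit: int = 100) -> str:
    """Return a copy of the failed rows query that yields at most samples_limit rows.

    Hypothetical helper for illustration only; the default of 100 is an assumption.
    """
    # Drop surrounding whitespace and a trailing semicolon so the query can be nested.
    inner = failed_rows_sql.strip().rstrip(";")
    return f"SELECT * FROM ({inner}) AS failed_rows LIMIT {int(samples_limit)}"


# Toy data set: 500 rows that all fail the "amount must be non-negative" rule.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", [(i, -1.0) for i in range(500)])

failed_rows_sql = "SELECT * FROM orders WHERE amount < 0;"

# Only the limited copy of the query is used to collect samples.
sample_rows = conn.execute(limited_sample_sql(failed_rows_sql, 10)).fetchall()
print(len(sample_rows))
```

Wrapping the query keeps the user's SQL untouched, at the cost of relying on the warehouse to optimize the subquery; dialects that do not support `LIMIT` would need a different clause.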
I made a first attempt at a partial fix for this: #1986. This works for the However this approach doesn't work for the Can I get some thoughts from the Soda Team on:
After your initial replies, I'll have a look at adding unit tests to the PR.
When going over my first solution with a colleague, I realised it is not going to solve the issue for us, because the whole dataframe is still being collected for
Calculating the number of failed rows by retrieving the complete dataset and computing its length has scaling limitations, which we encounter pretty frequently in our environment. Working around that requires refactoring at a deeper level than my solution provides so far. When I have time, I'll try to figure out what is needed to solve the root cause of this scaling limitation.
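The scaling concern above can be sketched as follows: instead of fetching every failed row and taking the length of the result, the count can be computed in the database, so that only the limited sample is ever transferred. This is a hedged illustration on an in-memory SQLite database, not how Soda Core is structured; `count_sql`, `sample_sql`, and the `orders` table are hypothetical.

```python
import sqlite3


def count_sql(failed_rows_sql: str) -> str:
    """Hypothetical helper: count failed rows in the database instead of in memory."""
    inner = failed_rows_sql.strip().rstrip(";")
    return f"SELECT COUNT(*) FROM ({inner}) AS failed_rows"


def sample_sql(failed_rows_sql: str, samples_limit: int) -> str:
    """Hypothetical helper: fetch at most samples_limit failed rows."""
    inner = failed_rows_sql.strip().rstrip(";")
    return f"SELECT * FROM ({inner}) AS failed_rows LIMIT {int(samples_limit)}"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", [(i, -1.0) for i in range(500)])

failed_rows_sql = "SELECT * FROM orders WHERE amount < 0"

# One scalar comes back for the count, regardless of how many rows fail.
failed_count = conn.execute(count_sql(failed_rows_sql)).fetchone()[0]

# Only the limited sample is materialized on the client.
sample_rows = conn.execute(sample_sql(failed_rows_sql, 10)).fetchall()
print(failed_count, len(sample_rows))
```

The trade-off is that the failed rows query runs twice, once for the count and once for the sample, in exchange for never collecting the full result set.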
Could we add get_row_length functionality in the class DbSample(Sample) like so?
This was addressed in 3.3.5: you can now combine a user-defined metric check with a failed rows query in one check, which makes it possible to limit the failed rows sample.
We discovered an issue with sample limits not being applied when using failed row checks.
Some things we've tried with a similar configuration as above:
`fail condition` and `fail query`.
`failed_count` and `missing_count` checks. The `LIMIT` can be seen being applied when inspecting the SQL queries generated by Soda.
Confirmed not working on Soda Core 3.1.2
Mention in soda-core channel in the soda community slack
To reproduce the issue
prints: