Using AS on an existing column name has value overridden if value exists on field resulting from another column in the query if having same name from another table #42
I believe the best call here, considering we have real users beyond our own internal use, is the following:
I don't really have strong feelings here, but something feels off about how it works right now, and I'd just like to gather others' thoughts, let that stew for a while, and hopefully make a good decision when someone has time to seriously consider this. It seems like the best short-to-mid-term decision may be to keep it as maps, but CLEARLY document the shortcomings and highlight the config. But speaking on the scale of years of use, I can only imagine this creating more footguns that lead one to think 'do I have a bug?' when at its core it's a problem in library code.
If I'm understanding this issue correctly, I believe I am experiencing a similar issue. I have a query like:
where the results look like
Notice the reference to
@hexchung yes, this is definitely the same problem. The core of this will be quite a doozy to fix, as we are essentially choosing the wrong data structure to do this (maps). Here is a canned example that shows it:
Query:
SELECT
  upc.foo AS foo,
  upc.bar AS bar,
  upc.bizzle AS foo
FROM
  MY_NAMESPACE.whatever
We will get this in the Snowflake UI
Which, when we do the reduction we see here, will ultimately overwrite the original value of
The solution here will potentially be to make an application-level configuration for the library which allows us to return lists of tuples, which can have this duplication. Maps as a data structure will inherently require people to work around this in ways that are similar to what you are dealing with.
Ultimately, when doing joins or selects that will result in like column names and you only want one, specifically noting that by being explicit with all columns and what you want vs
I am not sure when we will have bandwidth to make this change, cut a new release, etc. on the PepsiCo side of things, but all pull requests are welcome. The impact on performance and the massive API change in data structure mean we definitely can't just default back to it, but making it a configurable option seems reasonable IMO.
I have also discussed this with @jeremyowensboggs. Feel free to chime in if you have any further thoughts about this per our previous Slack discussions, Jeremy.
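To make the overwrite concrete, here is a minimal sketch of the two row shapes discussed above. It is written in Python rather than Elixir purely for illustration and is not Snowflex's actual reduction; the function names `row_to_map` and `row_to_pairs` and the sample values are assumptions, and only the column aliases come from the canned query.

```python
# Hypothetical headers and one row for the canned query:
#   upc.foo AS foo, upc.bar AS bar, upc.bizzle AS foo
headers = ["foo", "bar", "foo"]
row = ["foo-value", "bar-value", "bizzle-value"]


def row_to_map(headers, row):
    """Reduce a row into a map keyed by column name.

    A map can hold each key only once, so a later column with the
    same name silently replaces the earlier value.
    """
    result = {}
    for name, value in zip(headers, row):
        result[name] = value
    return result


def row_to_pairs(headers, row):
    """Keep the row as an ordered list of (name, value) tuples.

    Duplicate column names survive, at the cost of a different
    result shape for callers.
    """
    return list(zip(headers, row))


as_map = row_to_map(headers, row)
as_pairs = row_to_pairs(headers, row)

print(as_map)    # {'foo': 'bizzle-value', 'bar': 'bar-value'}
print(as_pairs)  # [('foo', 'foo-value'), ('bar', 'bar-value'), ('foo', 'bizzle-value')]
```

The map drops `foo-value` without any warning, while the list of tuples keeps all three columns in query order, which is why it would have to be an opt-in configuration rather than a change of default.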
I am not 100% sure if this is a 'snowflex' issue or if it's isolated enough that it's just a me-problem.
An illustration:
Let's say I have a query like this
Let's assume that both STUFF and ITEM have a similar field, GTIN and PRODUCT_CODE, that are meant to be the same, but GTIN is empty in the ITEM table and we want to override that but still get everything from it, so we use our AS statement.
If we run this in Snowflake via a console, we get back a result that looks (roughly) like this:
However, when we examine the results coming out of Snowflex, we will see a setup like this:
as the value for bin_headers. Thus, when we do our reduction, the empty GTIN value from catalogue_product.* overrides the value of the one we set using our AS clause.
Now, if we simply moved the AS clause to be after catalog_products.* we could do this, but it's similarly misleading as a whole and just further obfuscates the underlying issue of receiving the truth from the query. It is treating a symptom rather than a cause. It allows us to bubble the fix we're using internally in an ETL process from our app to the library, but is debatably more deceptive when it comes to figuring out why something broke later down the line (which, these things being computers programmed by people, will for sure happen).
It seems that the solution here may be to add an option to use a not-null-or-empty value for like-keys in query results when deriving this map, but that would probably have some surprising behavior if it's enabled by default.
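As a rough sketch of what such a not-null-or-empty option could do, the Python below keeps an existing value when a later like-named column is null or empty. It is only an illustration, not a proposed Snowflex API: the function name `row_to_map_prefer_non_empty`, the NAME column, and all sample values are assumptions.

```python
def row_to_map_prefer_non_empty(headers, row):
    """Reduce a row into a map, skipping a null/empty value when the
    key is already present. Two non-empty values still collide and
    the later one wins."""
    result = {}
    for name, value in zip(headers, row):
        is_empty = value is None or value == ""
        if name in result and is_empty:
            continue  # keep whatever was stored earlier for this key
        result[name] = value
    return result


# Hypothetical row: the AS-aliased GTIN comes first, then the columns
# expanded from the wildcard, where GTIN is empty.
headers = ["GTIN", "NAME", "GTIN"]
row = ["00012345678905", "Widget", ""]
merged = row_to_map_prefer_non_empty(headers, row)
print(merged)  # {'GTIN': '00012345678905', 'NAME': 'Widget'}

# The surprising case: both values are present, so the first is lost.
collided = row_to_map_prefer_non_empty(["GTIN", "GTIN"], ["111", "222"])
print(collided)  # {'GTIN': '222'}
```

The second call shows why this is risky as a default: which value survives now depends on the data in each row rather than only on column order.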
I'm not 100% sure this is something that should be handled, but I wanted to document it here in the event someone else runs into a similar issue, or if we deem it an API worth adding. I don't think making it just do this by default is the answer, and since maps can't have like keys, the path to the best answer seems murky.