bug: Schema not found the second time a spark table is accessed #10749

christophediprima opened this issue Jan 29, 2025 · 8 comments
Labels
bug Incorrect behavior inside of ibis

Comments

@christophediprima

What happened?

I am trying to insert into two different Iceberg tables using Spark Connect. But as soon as I access a table using get_schema, table, or insert, I can't access a table a second time.

My code looks like this:

import ibis
from pyspark.sql import SparkSession

session = SparkSession.builder.getOrCreate()
con = ibis.pyspark.connect(session)

new_dataframe = ...  # Dataframe creation here

con.insert(
  "my_table",
  new_dataframe,
  database="my_catalog.my_database",
)

new_dataframe_2 = ...  # Dataframe creation here

con.insert(
  "my_other_table",
  new_dataframe_2,
  database="my_catalog.my_database",
)

The first insert works, but the second one fails. If the first insert is commented out, the second one works. The same thing happens whichever method is used: get_schema, table, or insert.
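One way to narrow this down is to record which catalog and database the session points at before and after each call. The following is a minimal diagnostic sketch, not a fix. It assumes the connection exposes current_catalog and current_database properties; the helper names snapshot and traced are hypothetical and not part of ibis.

```python
from typing import Any, Callable, Dict, Tuple


def snapshot(con: Any) -> Tuple[str, str]:
    """Return the (catalog, database) pair the connection currently reports."""
    # Assumption: the backend exposes these two properties.
    return con.current_catalog, con.current_database


def traced(con: Any, action: Callable[[], Any]) -> Dict[str, Any]:
    """Run `action` and report the session state before and after it."""
    before = snapshot(con)
    error = None
    try:
        action()
    except Exception as exc:
        # Keep going so the state after a failed call is still visible.
        error = exc
    after = snapshot(con)
    return {
        "before": before,
        "after": after,
        "changed": before != after,
        "error": error,
    }
```

Wrapping each call, for example traced(con, lambda: con.insert("my_table", new_dataframe, database="my_catalog.my_database")), would show whether the first call leaves the session in a different catalog or database than it started in.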

What version of ibis are you using?

ibis-framework[pyspark]==10.0.0.dev490

What backend(s) are you using, if any?

PySpark

Relevant log output

pyspark.errors.exceptions.connect.AnalysisException: [SCHEMA_NOT_FOUND] The schema `my_catalog`. cannot be found. Verify the spelling and correctness of the schema and catalog.
If you did not qualify the name with a catalog, verify the current_schema() output, or qualify the name with the correct catalog.
To tolerate the error on drop use DROP SCHEMA IF EXISTS.

@cpcloud
Member

cpcloud commented Feb 1, 2025

Thanks for the issue!

Can you show a fully reproducible example that I can copy-paste into a file or a Python REPL?

The "Dataframe creation here" bits seem critical to reproducing the problem, for example.

Toy data is completely fine.

@christophediprima
Author

Sure! Please let me know if this one is good enough:

import ibis
from pyspark.sql import SparkSession

session = SparkSession.builder.getOrCreate()
con = ibis.pyspark.connect(session)

# Could not use following as 'force' is not supported in 'create_table' and does not seem to work in 'create_database'
# con.create_database(
#   "my_database",
#   force=True,
# )
# con.create_table(
#   "my_table",
#   schema=ibis.schema([("id", "string")]),
#   database="my_database",
#   force=True,
# )

con.sql("CREATE DATABASE IF NOT EXISTS my_database")
con.sql("""
  CREATE TABLE IF NOT EXISTS my_database.my_table (
    id string
  )
""")

new_ids = ibis.memtable([{
  'id': 'my_id',
  }])

con.insert(
  "my_table",
  new_ids,
  database="my_database",
)

new_ids_2 = ibis.memtable([{
  'id': 'my_id_2',
  }])

con.insert(
  "my_table",
  new_ids_2,
  database="my_database",
)

@gforsyth
Member

gforsyth commented Feb 3, 2025

Hey @christophediprima!

The .sql method is for SQL queries that return results that you want to chain inline with other Ibis commands. It is not a good idea to try using it for DDL operations. If you need to do some DDL outside of Ibis' API, you should use con.raw_sql for that instead.
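As a rough sketch of that suggestion, the two DDL statements from the example above could be sent through con.raw_sql instead of con.sql. The helper name run_ddl is hypothetical, and the sketch only assumes that the connection object has a raw_sql method taking a SQL string, as described in the comment above.

```python
from typing import Any, Iterable, List

# The same DDL used in the reproduction, as plain statements.
DDL_STATEMENTS = [
    "CREATE DATABASE IF NOT EXISTS my_database",
    "CREATE TABLE IF NOT EXISTS my_database.my_table (id string)",
]


def run_ddl(con: Any, statements: Iterable[str]) -> List[str]:
    """Send each DDL statement through con.raw_sql and return what was run."""
    executed = []
    for statement in statements:
        # raw_sql executes the statement as-is; nothing is chained from the result.
        con.raw_sql(statement)
        executed.append(statement)
    return executed
```

With a real connection this would be called as run_ddl(con, DDL_STATEMENTS) before the inserts.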

@gforsyth
Member

gforsyth commented Feb 3, 2025

There does seem to be a bug here, which is that force=True on con.create_database isn't generating the expected IF NOT EXISTS clause.
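To make the expected behavior concrete, here is an illustrative sketch of how a force flag would map onto the generated statement. This is not how ibis builds its SQL; create_database_sql is a hypothetical helper used only to show the intended output.

```python
def create_database_sql(name: str, force: bool = False) -> str:
    """Build a CREATE DATABASE statement; force=True adds IF NOT EXISTS."""
    guard = "IF NOT EXISTS " if force else ""
    return f"CREATE DATABASE {guard}{name}"


print(create_database_sql("my_database"))
# CREATE DATABASE my_database
print(create_database_sql("my_database", force=True))
# CREATE DATABASE IF NOT EXISTS my_database
```

The bug described here is that the second form was not being produced when force=True was passed.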

@gforsyth
Member

gforsyth commented Feb 3, 2025

Found the issue; I'll put up a PR.

@christophediprima
Author

Thanks for your replies. I used .sql as a workaround so that I could provide a full working example that reproduces the bug I initially reported.

Did you find the solution for my initial bug report or for the force=True one?

@gforsyth
Member

gforsyth commented Feb 3, 2025

The force=True one. I can't reproduce your initial bug if the database and table have been created.

@gforsyth
Member

gforsyth commented Feb 3, 2025

(Although that's with a local Spark session, not using Iceberg.)
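Since the original report used a named Iceberg catalog, a reproduction closer to it would need a session with such a catalog configured. The sketch below only builds the configuration keys. It assumes a Hadoop-type Iceberg catalog and a hypothetical warehouse path; the helper name iceberg_catalog_conf is also hypothetical. Whether this setup actually triggers the error has not been verified here.

```python
from typing import Dict


def iceberg_catalog_conf(catalog: str, warehouse: str) -> Dict[str, str]:
    """Return Spark config entries that register a named Iceberg catalog."""
    prefix = f"spark.sql.catalog.{catalog}"
    return {
        # Catalog implementation class shipped with the Iceberg Spark runtime.
        prefix: "org.apache.iceberg.spark.SparkCatalog",
        # Assumption: a filesystem-backed (Hadoop) catalog is enough for a repro.
        f"{prefix}.type": "hadoop",
        f"{prefix}.warehouse": warehouse,
    }


conf = iceberg_catalog_conf("my_catalog", "/tmp/iceberg-warehouse")
```

Each key and value could then be passed to SparkSession.builder.config(key, value) before getOrCreate(), with the Iceberg Spark runtime jar available to the session. With Spark Connect, the catalog would instead have to be configured on the server side.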
