Quickstart starts a Datahub version without matching PyPi packages for dbt #12538

Open
james-larsen opened this issue Feb 3, 2025 · 2 comments
Labels
bug Bug report

Comments

@james-larsen

Describe the bug
When running the DataHub Docker quickstart and attempting to ingest dbt metadata, the ingestion executor cannot create its virtual environment because the pinned version has no corresponding PyPI package. The logs below show how the problem presents itself.

To Reproduce
Steps to reproduce the behavior:

  1. Follow the DataHub Quickstart
  2. Log in using the demo "datahub" account
  3. Run metadata ingestion for dbt; this generates the logs below
~~~~ Execution Summary - RUN_INGEST ~~~~
Execution finished with errors.
{'exec_id': '20240006-70db-45c4-958f-daae42ca1dcd',
 'infos': ['2025-01-31 15:40:26.524331 INFO: Starting execution for task with name=RUN_INGEST',
           "2025-01-31 15:40:30.775089 INFO: Failed to execute 'datahub ingest', exit code 1",
           '2025-01-31 15:40:30.776539 INFO: Caught exception EXECUTING task_id=20240006-70db-45c4-958f-daae42ca1dcd, name=RUN_INGEST, '
           'stacktrace=Traceback (most recent call last):\n'
           '  File "/datahub-ingestion/.venv/lib/python3.10/site-packages/acryl/executor/execution/default_executor.py", line 139, in execute_task\n'
           '    task_event_loop.run_until_complete(task_future)\n'
           '  File "/usr/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete\n'
           '    return future.result()\n'
           '  File "/datahub-ingestion/.venv/lib/python3.10/site-packages/acryl/executor/execution/sub_process_ingestion_task.py", line 402, in '
           'execute\n'
           '    raise TaskError("Failed to execute \'datahub ingest\'")\n'
           "acryl.executor.execution.task.TaskError: Failed to execute 'datahub ingest'\n"],
 'errors': []}

~~~~ Ingestion Logs ~~~~
Obtaining venv creation lock...
Acquired venv creation lock
venv doesn't exist.. minting..
Using CPython 3.10.12 interpreter at: /usr/bin/python
Creating virtual environment at: /tmp/datahub/ingest/venv-dbt-9c8a88a4b3d58d78
Resolved 3 packages in 237ms
Prepared 3 packages in 2ms
Installed 3 packages in 2.66s
 + pip==25.0
 + setuptools==75.8.0
 + wheel==0.45.1
+ uv pip install 'acryl-datahub[datahub-rest,datahub-kafka,dbt]==1.0.0rc1'
  × No solution found when resolving dependencies:
  ╰─▶ Because there is no version of acryl-datahub[dbt]==1.0.0rc1 and
      you require acryl-datahub[dbt]==1.0.0rc1, we can conclude that your
      requirements are unsatisfiable.
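
The resolver error above means the executor asked for an exact pin (`1.0.0rc1`) that the package index does not list. A minimal sketch for checking such a pin against PyPI follows. It assumes the public PyPI JSON API at `https://pypi.org/pypi/<package>/json`; the helper names (`parse_pin`, `fetch_published_versions`, `is_published`) are hypothetical and not part of DataHub, and versions are compared as plain strings without PEP 440 normalization.

```python
import json
import re
import urllib.request

# Matches exact pins such as "acryl-datahub[datahub-rest,dbt]==1.0.0rc1".
_PIN_RE = re.compile(
    r"^(?P<name>[A-Za-z0-9._-]+)(?:\[(?P<extras>[^\]]*)\])?==(?P<version>.+)$"
)


def parse_pin(requirement):
    """Split an exact pin into (package name, list of extras, version)."""
    match = _PIN_RE.match(requirement.strip())
    if match is None:
        raise ValueError(f"not an exact '==' pin: {requirement!r}")
    raw_extras = match.group("extras") or ""
    extras = [e.strip() for e in raw_extras.split(",") if e.strip()]
    return match.group("name"), extras, match.group("version")


def fetch_published_versions(package):
    """Return the set of release versions PyPI lists for a package (needs network)."""
    url = f"https://pypi.org/pypi/{package}/json"
    with urllib.request.urlopen(url, timeout=10) as response:
        return set(json.load(response)["releases"])


def is_published(requirement, published_versions):
    """True if the pinned version appears in the given set of versions."""
    _, _, version = parse_pin(requirement)
    return version in published_versions
```

For the pin in the log, `is_published("acryl-datahub[datahub-rest,datahub-kafka,dbt]==1.0.0rc1", fetch_published_versions("acryl-datahub"))` would be expected to return `False` for as long as that release is absent from the index.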

Expected behavior
Ingestion works in the quickstart version, and the quickstart uses a fully supported version. Alternatively, update the documentation to cover this issue.

Desktop (please complete the following information):

  • OS: Windows 10 using WSL2 and Rancher

Additional context
This seems very similar to an earlier bug report for Postgres, which was resolved back in September 2024.

@james-larsen james-larsen added the bug Bug report label Feb 3, 2025
@james-larsen
Author

I tried again this morning and no longer get the original error. However, I now get the error below, which still seems to be caused by the quickstart not installing the proper dependencies for dbt:

ModuleNotFoundError: No module named 'more_itertools'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/datahub-ingestion/.venv/lib/python3.10/site-packages/datahub/ingestion/run/pipeline.py", line 138, in _add_init_error_context
    yield
  File "/datahub-ingestion/.venv/lib/python3.10/site-packages/datahub/ingestion/run/pipeline.py", line 289, in __init__
    source_class = source_registry.get(self.source_type)
  File "/datahub-ingestion/.venv/lib/python3.10/site-packages/datahub/ingestion/api/registry.py", line 178, in get
    raise ConfigurationError(
datahub.configuration.common.ConfigurationError: dbt is disabled due to a missing dependency: more_itertools; try running `pip install 'acryl-datahub[dbt]'`
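
The `ConfigurationError` above reports that the dbt source was disabled because `more_itertools` could not be imported. A small sketch for checking which modules are importable in a given environment, using only the standard library, is shown here. The function name `missing_modules` is hypothetical, and the module name is taken from the error message rather than from DataHub's actual dependency metadata.

```python
import importlib.util


def missing_modules(module_names):
    """Return the top-level modules that cannot be found in this environment."""
    missing = []
    for name in module_names:
        # find_spec returns None when a top-level module is not installed.
        if importlib.util.find_spec(name) is None:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Module name taken from the error message in the traceback.
    print(missing_modules(["more_itertools"]))
```

To be meaningful, the script would need to run with the interpreter that belongs to the environment seen in the traceback (`/datahub-ingestion/.venv`), since that is where the dbt source is loaded.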

@cloonix

cloonix commented Feb 7, 2025

I have the same issue with a SAP HANA connection.

+ uv pip install 'acryl-datahub[datahub-rest,datahub-kafka,hana]==1.0.0rc1'
  × No solution found when resolving dependencies:
  ╰─▶ Because there is no version of acryl-datahub[hana]==1.0.0rc1 and
      you require acryl-datahub[hana]==1.0.0rc1, we can conclude that your
      requirements are unsatisfiable.
