Connector checkpointing #3876

Open: wants to merge 9 commits into main


@Weves (Contributor) commented Feb 2, 2025

Description

https://linear.app/danswer/issue/DAN-1400/connector-checkpointing-continue-on-failure

How Has This Been Tested?

  • New integration tests
  • Tested the Slack connector locally with induced failures and verified that it indexes the same number of docs as the old connector
  • Tested that the Google Drive connector indexes the same number of docs
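The description does not show the connector interface, so the following is only a minimal sketch of the "continue on failure" idea, assuming a hypothetical connector that returns one page of documents plus a new checkpoint per call. `Checkpoint`, `FakePaginatedConnector`, `load_from_checkpoint`, and `run_indexing` are illustrative names, not Onyx's API. It mirrors the testing approach above: induce a failure mid-run, resume from the last good checkpoint, and check that the document count matches an uninterrupted run.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Checkpoint:
    # Hypothetical checkpoint: which page to fetch next and whether any remain.
    next_page: int = 0
    has_more: bool = True


class FakePaginatedConnector:
    """Hypothetical connector that serves fixed-size pages of fake document IDs.

    If fail_on_page is set, the first fetch of that page raises once,
    simulating an induced failure.
    """

    def __init__(self, num_pages: int, page_size: int, fail_on_page: Optional[int] = None) -> None:
        self.num_pages = num_pages
        self.page_size = page_size
        self.fail_on_page = fail_on_page

    def load_from_checkpoint(self, checkpoint: Checkpoint) -> Tuple[List[str], Checkpoint]:
        page = checkpoint.next_page
        if page == self.fail_on_page:
            self.fail_on_page = None  # fail only on the first attempt
            raise RuntimeError(f"induced failure on page {page}")
        docs = [f"doc-{page}-{i}" for i in range(self.page_size)]
        next_page = page + 1
        return docs, Checkpoint(next_page=next_page, has_more=next_page < self.num_pages)


def run_indexing(connector: FakePaginatedConnector, max_failures: int = 3) -> List[str]:
    """Index every page, resuming from the last good checkpoint after a failure."""
    indexed: List[str] = []
    checkpoint = Checkpoint()
    failures = 0
    while checkpoint.has_more:
        try:
            docs, next_checkpoint = connector.load_from_checkpoint(checkpoint)
        except RuntimeError:
            failures += 1
            if failures >= max_failures:
                raise
            continue  # retry the same page from the unchanged checkpoint
        indexed.extend(docs)
        # Advance only after the page's documents are stored, so nothing is skipped.
        checkpoint = next_checkpoint
    return indexed


baseline = run_indexing(FakePaginatedConnector(num_pages=5, page_size=3))
resumed = run_indexing(FakePaginatedConnector(num_pages=5, page_size=3, fail_on_page=2))
print(len(baseline), len(resumed))  # 15 15
```

The key design choice in this sketch is that the checkpoint only moves forward after a page has been fully handled, so a retry re-fetches at most one page instead of restarting the whole run.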

Backporting (check the box to trigger backport action)

Note: Verify that the action passes; otherwise, resolve the conflicts manually and tag the patches.

  • This PR should be backported (make sure to check that the backport attempt succeeds)
  • [Optional] Override Linear Check

vercel bot commented Feb 2, 2025

The latest updates on your projects:

internal-search: ✅ Ready (updated Feb 6, 2025 9:29pm UTC)

backend/onyx/background/celery/tasks/indexing/tasks.py (outdated, resolved)
backend/onyx/server/documents/cc_pair.py (resolved)
backend/onyx/server/documents/indexing.py (outdated, resolved)
backend/onyx/indexing/indexing_pipeline.py (outdated, resolved)
if (failed_document is None and failed_entity is None) or (
    failed_document is not None and failed_entity is not None
):
    raise ValueError(
Contributor:

Should this be a Union type then?

Contributor (author):

Hmm, I see tradeoffs either way. Using isinstance to check the type (which a Union type would require) feels bad, hence this approach.
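The tradeoff in this thread can be illustrated with a small sketch. It assumes plain dataclasses; `DocumentFailure`, `EntityFailure`, `ConnectorFailure`, `ConnectorFailureUnion`, and `describe` are hypothetical names, not necessarily the models in this PR, and the error message is illustrative because the snippet above is truncated. The first class keeps two optional fields with the same exclusive-or check as the diff; the second uses a single Union-typed field, which pushes an isinstance branch onto callers.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class DocumentFailure:
    document_id: str


@dataclass
class EntityFailure:
    entity_id: str


# Approach from the diff: two optional fields, exactly one of which must be set.
@dataclass
class ConnectorFailure:
    failure_message: str
    failed_document: Optional[DocumentFailure] = None
    failed_entity: Optional[EntityFailure] = None

    def __post_init__(self) -> None:
        # Same exclusive-or check as the snippet: reject "neither" and "both".
        if (self.failed_document is None and self.failed_entity is None) or (
            self.failed_document is not None and self.failed_entity is not None
        ):
            # Illustrative message; the original is not shown.
            raise ValueError("Exactly one of failed_document or failed_entity must be set")


# Alternative raised in review: a single Union-typed field.
@dataclass
class ConnectorFailureUnion:
    failure_message: str
    failed: Union[DocumentFailure, EntityFailure]


def describe(failure: ConnectorFailureUnion) -> str:
    # Consumers must branch on the runtime type, the cost the author mentions.
    if isinstance(failure.failed, DocumentFailure):
        return f"document {failure.failed.document_id}"
    return f"entity {failure.failed.entity_id}"
```

With the two-field version, invalid combinations are rejected at construction time and callers can test `failed_document is not None` directly; with the Union version, exclusivity is structural, but every consumer needs an isinstance check.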

backend/onyx/connectors/models.py (resolved)
backend/onyx/background/indexing/run_indexing.py (outdated, resolved)
backend/onyx/background/celery/tasks/beat_schedule.py (outdated, resolved)
except Exception as e:
    logger.exception(
        "Connector run exceptioned after elapsed time: "
        f"{time.time() - start_time} seconds"
Contributor:

time.monotonic probably better here?

Contributor (author):

Switched!
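For reference, a minimal sketch of the pattern the reviewer suggested, assuming a hypothetical `run_connector` wrapper (not the PR's code). `time.monotonic()` is not affected by system clock updates, so the difference between two readings is a safe elapsed duration, whereas `time.time()` can jump if the wall clock is adjusted during a long connector run.

```python
import logging
import time
from typing import Callable

logger = logging.getLogger(__name__)


def run_connector(run: Callable[[], None]) -> float:
    """Hypothetical wrapper: run a connector and return the elapsed seconds."""
    # Only differences between monotonic readings are meaningful;
    # the absolute value has no defined reference point.
    start_time = time.monotonic()
    try:
        run()
    except Exception:
        # logger.exception logs at ERROR level and includes the traceback.
        logger.exception(
            "Connector run failed after elapsed time: "
            f"{time.monotonic() - start_time:.2f} seconds"
        )
        raise
    return time.monotonic() - start_time
```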

Weves added 6 commits February 6, 2025 10:37
more stuff for checkpointing

Basic implementation

FE stuff

More checkpointing/failure handling

rebase

rebase

initial scaffolding for IT

IT to test checkpointing

Cleanup

cleanup

Fix it

Rebase

Add todo

Fix actions IT

Test more

Pagination + fixes + cleanup

Fix IT networking

fix it