Improve and clarify LogSegmentBuilder semantics for CDF and Snapshot cases #476

Closed
OussamaSaoudi-db opened this issue Nov 12, 2024 · 1 comment
Labels
enhancement New feature or request

Comments

@OussamaSaoudi-db
Collaborator

Please describe why this is necessary.

The LogSegmentBuilder introduced in #457 makes it easy to construct LogSegments. However, LogSegment has two distinct use cases:

  1. The first is in a Snapshot. It may leverage a checkpoint hint, and it uses a checkpoint file if one is available, via LogSegmentBuilder::with_checkpoint. For a snapshot at version n and a checkpoint at version m, we keep the commits whose version i satisfies m < i <= n.
  2. The second is the upcoming TableChanges, which performs CDF scans. CDF does not leverage checkpoint files, since it requires commit files to produce the correct output. CDF must specify a start version s and may specify an end version e. Commit files with version i must satisfy s <= i <= e.
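The two version ranges above can be written as predicates. This is only a sketch: the `Version` alias and both function names are hypothetical and not part of the kernel API. It also assumes that a snapshot without a checkpoint keeps every commit up to n, and that a CDF scan without an end version has no upper bound.

```rust
/// Hypothetical alias for a table version; the real type may differ.
type Version = u64;

/// Snapshot case: keep commit `i` when m < i <= n.
/// `checkpoint` is m (if a checkpoint is used) and `snapshot` is n.
/// Assumption: with no checkpoint, only the upper bound applies.
fn keep_for_snapshot(i: Version, checkpoint: Option<Version>, snapshot: Version) -> bool {
    checkpoint.map_or(true, |m| m < i) && i <= snapshot
}

/// CDF case: keep commit `i` when s <= i <= e.
/// Assumption: a missing end version means there is no upper bound.
fn keep_for_cdf(i: Version, start: Version, end: Option<Version>) -> bool {
    start <= i && end.map_or(true, |e| i <= e)
}
```

Note the asymmetry: the snapshot range excludes the checkpoint version itself (the checkpoint file already covers it), while the CDF range includes the start version.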

To support both use cases, LogSegmentBuilder currently allows users to omit checkpoint files, specify checkpoint hints, and specify start and end versions.

However, certain combinations of these options do not work together. Here are some combinations that LogSegmentBuilder accepts, but that do not fit the semantics of LogSegment.

  • You specify a checkpoint hint at version m and a start version s where m < s. The builder will produce a LogSegment that has a checkpoint file at m, but the commit files between versions m and s are missing. This violates the contiguity of LogSegment.
  • You specify a checkpoint hint at version m and a start version s where s < m. The builder will produce a LogSegment in which the commits between s and m are missing. This may or may not be acceptable.
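To make the two problem cases concrete, here is a small sketch that classifies a (checkpoint hint, start version) pair. The `HintConflict` enum and `classify` function are hypothetical names for illustration, not existing kernel code. The conditions are taken directly from the bullets above; the m == s case is not discussed there, so the sketch simply reports no conflict for it.

```rust
/// Hypothetical alias for a table version; the real type may differ.
type Version = u64;

/// The two combinations described above (hypothetical names).
#[derive(Debug, PartialEq)]
enum HintConflict {
    /// m < s: commits between the checkpoint and the start version are missing.
    GapAfterCheckpoint { checkpoint: Version, start: Version },
    /// s < m: commits between the start version and the checkpoint are missing.
    StartBeforeCheckpoint { checkpoint: Version, start: Version },
}

/// Returns the conflict, if any, between a checkpoint hint `m` and a start version `s`.
/// Whether a conflict should be a hard error is a policy decision left to the caller.
fn classify(checkpoint: Option<Version>, start: Option<Version>) -> Option<HintConflict> {
    match (checkpoint, start) {
        (Some(m), Some(s)) if m < s => {
            Some(HintConflict::GapAfterCheckpoint { checkpoint: m, start: s })
        }
        (Some(m), Some(s)) if s < m => {
            Some(HintConflict::StartBeforeCheckpoint { checkpoint: m, start: s })
        }
        // Only one option set, neither set, or m == s: nothing to report here.
        _ => None,
    }
}
```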

Describe the functionality you are proposing.

Change the LogSegmentBuilder to support both use cases while preserving the semantics and expected behaviour of LogSegment. There are a couple of ways to do this:

  1. Use the TypeState pattern
  2. Make builder methods fallible

This list is not exhaustive. You may find there are better solutions.
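As a rough illustration of the first option, the TypeState pattern could encode the chosen use case in the builder's type, so that mixing a checkpoint hint with a start version fails to compile. Everything below is a hypothetical sketch: `SketchBuilder`, the marker types, and the method set are invented for this example and do not mirror the actual LogSegmentBuilder from #457 (only the `with_checkpoint` name is borrowed from it).

```rust
use std::marker::PhantomData;

/// Hypothetical alias for a table version; the real type may differ.
type Version = u64;

// Marker types for the builder state (hypothetical names).
struct Unset;
struct SnapshotMode;
struct CdfMode;

/// A stand-in builder whose type parameter records which use case was chosen.
struct SketchBuilder<State> {
    checkpoint_hint: Option<Version>,
    start_version: Option<Version>,
    end_version: Option<Version>,
    _state: PhantomData<State>,
}

impl<State> SketchBuilder<State> {
    /// An end version is meaningful in every state, so it lives on the generic impl.
    fn with_end_version(mut self, version: Version) -> Self {
        self.end_version = Some(version);
        self
    }
}

impl SketchBuilder<Unset> {
    fn new() -> Self {
        SketchBuilder {
            checkpoint_hint: None,
            start_version: None,
            end_version: None,
            _state: PhantomData,
        }
    }

    /// Setting a checkpoint hint moves the builder to the snapshot state.
    fn with_checkpoint(self, version: Version) -> SketchBuilder<SnapshotMode> {
        SketchBuilder {
            checkpoint_hint: Some(version),
            start_version: None,
            end_version: self.end_version,
            _state: PhantomData,
        }
    }

    /// Setting a start version moves the builder to the CDF state.
    fn with_start_version(self, version: Version) -> SketchBuilder<CdfMode> {
        SketchBuilder {
            checkpoint_hint: None,
            start_version: Some(version),
            end_version: self.end_version,
            _state: PhantomData,
        }
    }
}
```

With this shape, `SketchBuilder::new().with_checkpoint(5).with_start_version(3)` is rejected at compile time, because `with_start_version` exists only on the `Unset` state. The second option would instead keep a single builder type and have the conflicting setter return a `Result`, moving the check to run time.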

Additional context

@OussamaSaoudi-db
Collaborator Author

We are no longer considering the builder pattern to construct log segments.
