Please describe why this is necessary.
The LogSegmentBuilder introduced in #457 makes it easy to construct LogSegments. However, there are two distinct use cases for LogSegment:
The first is in a Snapshot. A snapshot may leverage a checkpoint hint, and uses a checkpoint file if one is available via LogSegmentBuilder::with_checkpoint. For a snapshot at version n and a checkpoint at version m, we keep commits with versions i such that m < i <= n.
The second use case is the upcoming TableChanges, used to perform CDF scans. CDF does not leverage checkpoint files since it requires commit files to produce the correct output. CDF must specify a start version s, and may specify an end version e. Commit files with version i must satisfy s <= i <= e.
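The two version filters above can be written as predicates. This is only a sketch for illustration: `snapshot_keeps` and `cdf_keeps` are hypothetical names, not functions from the kernel, and keeping every commit up to n when no checkpoint is available is an assumption.

```rust
/// Version of a commit or checkpoint file in the log.
type Version = u64;

/// Snapshot case: with a checkpoint at `m` and a snapshot at `n`,
/// keep commits with m < i <= n.
/// Assumption: with no checkpoint, every commit with i <= n is kept.
fn snapshot_keeps(i: Version, checkpoint: Option<Version>, n: Version) -> bool {
    checkpoint.map_or(true, |m| m < i) && i <= n
}

/// CDF case: keep commits with s <= i <= e.
/// The end version `e` is optional; when it is absent only s <= i applies.
fn cdf_keeps(i: Version, s: Version, e: Option<Version>) -> bool {
    s <= i && e.map_or(true, |e| i <= e)
}
```

Note that the lower bound is exclusive in the snapshot case (the checkpoint already covers version m) and inclusive in the CDF case, which is one reason the two sets of options are hard to combine.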
To support both use cases, LogSegmentBuilder currently allows users to omit checkpoint files, specify checkpoint hints, and specify start and end versions.
However, certain combinations of these options do not work together. Here are some cases that LogSegmentBuilder accepts, but that do not fit the semantics of LogSegment.
You specify a checkpoint hint at version m and a start version s where m < s. The builder will produce a LogSegment that has a checkpoint file at m, but the commit files between versions m and s are missing. This violates the contiguity of LogSegment.
You specify a checkpoint hint at version m and a start version s where s < m. The builder will produce a LogSegment that has commits between s and m missing. This may or may not be acceptable.
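To make the two cases concrete, here is a small sketch that reports which commit versions would be missing for a given checkpoint version m and start version s. `Problem` and `diagnose` are hypothetical names for illustration only. The sketch assumes the builder keeps only commits with i > m once a checkpoint is used, as in the snapshot rule above.

```rust
use std::cmp::Ordering;
use std::ops::Range;

/// Hypothetical description of the commit versions missing from a segment.
#[derive(Debug, PartialEq)]
enum Problem {
    /// m < s: commits after the checkpoint and before the start are absent.
    GapAfterCheckpoint(Range<u64>),
    /// s < m: commits from the start up to the checkpoint are absent.
    StartBeforeCheckpoint(Range<u64>),
}

/// Report the missing commit versions for checkpoint `m` and start `s`.
fn diagnose(m: u64, s: u64) -> Option<Problem> {
    match s.cmp(&m) {
        // Commits m+1..s are missing; nothing is missing when s == m + 1.
        Ordering::Greater if s > m + 1 => Some(Problem::GapAfterCheckpoint(m + 1..s)),
        // Commits s..=m are missing, since only i > m is kept after a checkpoint.
        Ordering::Less => Some(Problem::StartBeforeCheckpoint(s..m + 1)),
        // s == m and s == m + 1 are not discussed in this issue; left unflagged here.
        _ => None,
    }
}
```

For example, `diagnose(10, 13)` reports commits 11 and 12 as a gap after the checkpoint, while `diagnose(10, 7)` reports commits 7 through 10 as dropped before it.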
Describe the functionality you are proposing.
Change the LogSegmentBuilder to support both use cases, while preserving the semantics and expected behaviour of LogSegment. There are a couple of ways to do this:
This list is not exhaustive. You may find there are better solutions.
Additional context