feat!: add in-commit timestamps table properties #558

Merged: 8 commits, merged Feb 6, 2025
3 changes: 2 additions & 1 deletion ffi/src/lib.rs
@@ -462,7 +462,8 @@ pub unsafe extern "C" fn set_builder_option(
}

/// Consume the builder and return a `default` engine. After calling, the passed pointer is _no
- /// longer valid_.
+ /// longer valid_. Note that this _consumes_ and frees the builder, so there is no need to
+ /// drop/free it afterwards.
///
///
/// # Safety
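The consume-and-free contract in the doc comment above follows the usual Rust FFI ownership pattern: the builder is handed out as a raw pointer created with `Box::into_raw`, and the consuming call takes it back with `Box::from_raw`, so the builder is dropped when that call returns. A minimal sketch, assuming hypothetical `EngineBuilder`, `Engine`, `new_builder`, `set_option`, and `build` stand-ins (these are illustrative only; the kernel's real FFI types and signatures are not shown in this diff):

```rust
/// Hypothetical stand-in for an FFI engine builder.
pub struct EngineBuilder {
    options: Vec<(String, String)>,
}

/// Hypothetical stand-in for the engine produced by the builder.
pub struct Engine {
    pub option_count: usize,
}

/// Allocate a builder on the heap and hand ownership to the caller as a raw pointer.
pub fn new_builder() -> *mut EngineBuilder {
    Box::into_raw(Box::new(EngineBuilder { options: Vec::new() }))
}

/// Record an option on the builder without taking ownership of it.
///
/// # Safety
/// `builder` must come from `new_builder` and must not have been consumed by `build` yet.
pub unsafe fn set_option(builder: *mut EngineBuilder, key: &str, value: &str) {
    let builder = unsafe { &mut *builder };
    builder.options.push((key.to_string(), value.to_string()));
}

/// Consume the builder and return an engine.
///
/// # Safety
/// `builder` must come from `new_builder`; it is freed here, so the caller must not use
/// or free it again.
pub unsafe fn build(builder: *mut EngineBuilder) -> Engine {
    // Retake ownership: `boxed` is dropped (and its allocation freed) when this function returns.
    let boxed = unsafe { Box::from_raw(builder) };
    Engine { option_count: boxed.options.len() }
}
```

Because `build` already frees the builder, calling a separate free function on the same pointer afterwards would be a double free, which is why the doc comment tells callers not to drop it.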
22 changes: 21 additions & 1 deletion kernel/src/table_properties.rs
@@ -16,7 +16,7 @@ use std::time::Duration;

use crate::expressions::ColumnName;
use crate::table_features::ColumnMappingMode;
- use crate::Error;
+ use crate::{Error, Version};

use strum::EnumString;

@@ -137,6 +137,20 @@ pub struct TableProperties {
/// whether to enable row tracking during writes.
pub enable_row_tracking: Option<bool>,

+ /// Whether to enable [In-Commit Timestamps]. The in-commit timestamps writer feature strongly
+ /// associates a monotonically increasing timestamp with each commit by storing it in the
+ /// commit's metadata.
+ ///
+ /// [In-Commit Timestamps]: https://github.com/delta-io/delta/blob/master/PROTOCOL.md#in-commit-timestamps
+ pub enable_in_commit_timestamps: Option<bool>,
+
+ /// The version of the table at which in-commit timestamps were enabled.
+ pub in_commit_timestamp_enablement_version: Option<Version>,
+
+ /// The timestamp of the table at which in-commit timestamps were enabled. This must be the same
+ /// as the inCommitTimestamp of the commit when this feature was enabled.
+ pub in_commit_timestamp_enablement_timestamp: Option<i64>,
+
/// any unrecognized properties are passed through and ignored by the parser
pub unknown_properties: HashMap<String, String>,
}
@@ -268,6 +282,9 @@ mod tests {
("delta.tuneFileSizesForRewrites", "true"),
("delta.checkpointPolicy", "v2"),
("delta.enableRowTracking", "true"),
+ ("delta.enableInCommitTimestamps", "true"),
+ ("delta.inCommitTimestampEnablementVersion", "15"),
+ ("delta.inCommitTimestampEnablementTimestamp", "1612345678"),
];
let actual = TableProperties::from(properties.into_iter());
let expected = TableProperties {
@@ -293,6 +310,9 @@ mod tests {
tune_file_sizes_for_rewrites: Some(true),
checkpoint_policy: Some(CheckpointPolicy::V2),
enable_row_tracking: Some(true),
+ enable_in_commit_timestamps: Some(true),
+ in_commit_timestamp_enablement_version: Some(15),
+ in_commit_timestamp_enablement_timestamp: Some(1_612_345_678),
unknown_properties: HashMap::new(),
};
assert_eq!(actual, expected);
31 changes: 27 additions & 4 deletions kernel/src/table_properties/deserialize.rs
@@ -76,17 +76,33 @@ fn try_parse(props: &mut TableProperties, k: &str, v: &str) -> Option<()> {
}
"delta.checkpointPolicy" => props.checkpoint_policy = CheckpointPolicy::try_from(v).ok(),
"delta.enableRowTracking" => props.enable_row_tracking = Some(parse_bool(v)?),
+ "delta.enableInCommitTimestamps" => {
+ props.enable_in_commit_timestamps = Some(parse_bool(v)?)
+ }
+ "delta.inCommitTimestampEnablementVersion" => {
+ props.in_commit_timestamp_enablement_version = Some(parse_uint(v)?)
Collaborator:

parse_uint already returns an Option. Why not just:

Suggested change
- props.in_commit_timestamp_enablement_version = Some(parse_uint(v)?)
+ props.in_commit_timestamp_enablement_version = parse_uint(v)

Collaborator (author):

Ah, I actually had to think about this for a sec too lol. We want the function to return None in the case the value doesn't parse. If we leave it as just parse_uint(v), it will set the field to None and still return Some(()).

Perhaps we should rethink this code and rewrite it more nicely, considering both you and I have now stumbled on it :)

Collaborator:

Ah, you're right! Thx for the clarification.

Collaborator:

This also took a solid minute to grok. Does the class/method have a doc comment explaining the intended behavior?

Basically, we want the entire method to return None if a required property fails to parse (indicating failure), not merely set that property to None (it was probably None already)?

Collaborator (author):

Yep, exactly. Added a comment to clarify, but given this has tripped up three of us now, I think we should rewrite it lol.

Made a quick issue: #682

+ }
+ "delta.inCommitTimestampEnablementTimestamp" => {
+ props.in_commit_timestamp_enablement_timestamp = Some(parse_uint(v)?.try_into().ok()?)
zachschuermann marked this conversation as resolved.
+ }
_ => return None,
}
Some(())
}
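The review thread on the `delta.inCommitTimestampEnablementVersion` arm comes down to where `?` sits inside a function that returns `Option<()>`. A minimal sketch with a hypothetical one-field `Props` struct and a hypothetical `parse_all` caller. Keeping a pair that fails to parse as an unknown property is an assumption made here to show why the `None` return matters; it is not taken from this diff.

```rust
/// Hypothetical one-field stand-in for `TableProperties`.
#[derive(Debug, Default, PartialEq)]
pub struct Props {
    pub enablement_version: Option<u64>,
}

/// Same logic as the PR's `parse_uint`: `None` for non-numbers and negative numbers.
pub fn parse_uint(s: &str) -> Option<u64> {
    let n: i64 = s.parse().ok()?;
    n.try_into().ok()
}

/// The PR's form: on a bad value, `?` returns `None` from the whole function
/// before the assignment runs.
pub fn try_parse_strict(props: &mut Props, v: &str) -> Option<()> {
    props.enablement_version = Some(parse_uint(v)?);
    Some(())
}

/// The suggested form: a bad value just stores `None`, and the function still reports success.
pub fn try_parse_lenient(props: &mut Props, v: &str) -> Option<()> {
    props.enablement_version = parse_uint(v);
    Some(())
}

/// Hypothetical caller: any pair that is unrecognized or fails to parse is kept verbatim.
pub fn parse_all(pairs: &[(&str, &str)]) -> (Props, Vec<(String, String)>) {
    let mut props = Props::default();
    let mut unknown = Vec::new();
    for &(k, v) in pairs {
        let parsed = match k {
            "delta.inCommitTimestampEnablementVersion" => try_parse_strict(&mut props, v),
            _ => None,
        };
        if parsed.is_none() {
            unknown.push((k.to_string(), v.to_string()));
        }
    }
    (props, unknown)
}
```

If `parse_all` called `try_parse_lenient` instead, a value like `"-1"` would produce `Some(())`, so it would be neither stored nor kept as unknown: it would be silently dropped.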

- /// Deserialize a string representing a positive integer into an `Option<u64>`. Returns `Some` if
- /// successfully parses, and `None` otherwise.
+ /// Deserialize a string representing a positive (> 0) integer into an `Option<u64>`. Returns `Some`
+ /// if successfully parses, and `None` otherwise.
pub(crate) fn parse_positive_int(s: &str) -> Option<NonZero<u64>> {
- // parse to i64 (then check n > 0) since java doesn't even allow u64
+ // parse as non-negative and verify the result is non-zero
+ NonZero::new(parse_uint(s)?)
+ }
+
+ /// Deserialize a string representing a non-negative integer into an `Option<u64>`. Returns `Some` if
+ /// successfully parses, and `None` otherwise.
+ pub(crate) fn parse_uint(s: &str) -> Option<u64> {
+ // parse to i64 (then to u64) since java doesn't even allow u64
let n: i64 = s.parse().ok()?;
- NonZero::new(n.try_into().ok()?)
+ n.try_into().ok()
}

/// Deserialize a string representing a boolean into an `Option<bool>`. Returns `Some` if
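The refactor above splits integer parsing into two steps: `parse_uint` accepts any non-negative value that fits in an `i64` (following the PR's note that Java has no unsigned 64-bit type), and `parse_positive_int` layers the non-zero check on top. The sketch below restates both for experimentation, plus a hypothetical `parse_timestamp` helper that mirrors the `delta.inCommitTimestampEnablementTimestamp` arm (parse as non-negative, then convert back to `i64`). It assumes Rust 1.79 or later for the generic `NonZero` type.

```rust
use std::num::NonZero;

/// Non-negative integer that fits in an i64; `None` otherwise.
pub fn parse_uint(s: &str) -> Option<u64> {
    // Parsing as i64 first rejects anything above i64::MAX.
    let n: i64 = s.parse().ok()?;
    // Fails (returns None) for negative n.
    n.try_into().ok()
}

/// Strictly positive integer; reuses `parse_uint`, then rejects zero.
pub fn parse_positive_int(s: &str) -> Option<NonZero<u64>> {
    NonZero::new(parse_uint(s)?)
}

/// Hypothetical helper mirroring the enablement-timestamp arm of `try_parse`.
pub fn parse_timestamp(s: &str) -> Option<i64> {
    parse_uint(s)?.try_into().ok()
}
```

Because every value returned by `parse_uint` already fits in an `i64`, the final `try_into` in `parse_timestamp` never fails; the net effect of routing through `parse_uint` is that negative timestamps are rejected.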
@@ -205,6 +221,13 @@ mod tests {
assert_eq!(parse_positive_int("-123"), None);
}

+ #[test]
+ fn test_parse_int() {
+ assert_eq!(parse_uint("123").unwrap(), 123);
+ assert_eq!(parse_uint("0").unwrap(), 0);
+ assert_eq!(parse_uint("-123"), None);
+ }
+
#[test]
fn test_parse_interval() {
assert_eq!(