Help understanding #892
Unanswered
nikhil-zlai
asked this question in
Q&A
Replies: 0 comments
Quoting: #891
How does Chronon guarantee consistency between online and offline data, specifically for joins? Does it use a Kappa architecture (e.g., running the same streaming pipeline for offline data)? If so, what kind of streaming join is used? I'd like to understand this in depth for both the Spark Structured Streaming and Flink engines.
On fetching/loading online and offline data: my understanding is that in offline mode Chronon writes the resulting data to Hive, while online data goes to the KVStore. Is there any guarantee that if I load data as of a specific timestamp from the offline store (Hive), I'll get exactly the same result as if I had fetched from the KVStore at that same timestamp? If so, how exactly does it work?
Does Chronon allow any last-mile, request-time, user-defined stateless transformations? (In Tecton these are called on-demand features, e.g. getting the user's request time at millisecond granularity.) If so, how are they computed online and offline, and how is data consistency handled for them?
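To make the second question concrete, here is a minimal sketch of the property being asked about. It is not Chronon's API or implementation: `Event`, `offline_sum_as_of`, `ToyOnlineStore`, and `consistent_at` are hypothetical names, the feature is a plain running sum, and the "strictly before the query timestamp" cutoff is an assumption made only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    key: str
    ts_ms: int          # event time in milliseconds
    amount_cents: int   # integer amounts keep the comparison exact


def offline_sum_as_of(events, key, query_ts_ms):
    """Batch-style view: recompute the feature from the full log,
    using only events strictly before the query timestamp (assumed cutoff)."""
    return sum(
        e.amount_cents for e in events
        if e.key == key and e.ts_ms < query_ts_ms
    )


class ToyOnlineStore:
    """Stand-in for a KV store fed by a streaming job (hypothetical)."""

    def __init__(self):
        self._running = {}  # key -> running sum

    def ingest(self, event):
        self._running[event.key] = (
            self._running.get(event.key, 0) + event.amount_cents
        )

    def fetch(self, key):
        return self._running.get(key, 0)


def consistent_at(events, key, query_ts_ms):
    """Replay the events that precede query_ts_ms into the online store,
    then compare the fetched value with the offline point-in-time value."""
    store = ToyOnlineStore()
    for e in sorted(events, key=lambda e: e.ts_ms):
        if e.ts_ms < query_ts_ms:
            store.ingest(e)
    return store.fetch(key) == offline_sum_as_of(events, key, query_ts_ms)
```

In this toy setting the two paths agree by construction. The question is whether, and how, the same equality is guaranteed when the online path is a real streaming job and the offline path is a batch backfill.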