Help understanding #892

nikhil-zlai · 2024-12-29T04:42:14Z

Quoting: #891

How does Chronon guarantee consistency between online and offline data, specifically for joins? Does it use a Kappa architecture (e.g., running the same streaming pipeline for offline data)? If so, what kind of streaming join is used? I'd like to understand this in depth for both the Spark Structured Streaming and Flink engines.
For fetching/loading online/offline data: my understanding is that in offline mode Chronon writes the resulting data to Hive, while online data goes to the KVStore. Is there any guarantee that if I load data at a specific timestamp from the offline store (Hive), I'll get exactly the same result as if I had fetched from the KVStore at that same timestamp? If so, how exactly does it work?
Does Chronon allow any last-mile, request-time, user-defined stateless transformations? (In Tecton these are called on-demand features, e.g., getting the user's request time at millisecond granularity.) If so, how are these computed online and offline, and how is data consistency guaranteed for them?
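
To make the property in the second question concrete, here is a minimal sketch in plain Python. It does not use any Chronon API: the event log, the two functions, and the "sum of amounts before the query time" feature are all hypothetical, chosen only to illustrate what "the offline value at timestamp t equals what the KV store would have returned at t" means.

```python
from collections import defaultdict

# Hypothetical event log: (timestamp_ms, user_id, amount). Not Chronon data.
EVENTS = [
    (1000, "u1", 5.0),
    (2000, "u2", 3.0),
    (3000, "u1", 2.5),
    (4500, "u1", 1.0),
]

# Hypothetical lookups: (user_id, query_timestamp_ms).
QUERIES = [("u1", 3500), ("u2", 1500), ("u1", 5000)]


def offline_point_in_time(events, queries):
    """Batch style: for each query, aggregate the user's events strictly before query_ts."""
    return {
        (user, ts): sum(a for (ets, u, a) in events if u == user and ets < ts)
        for (user, ts) in queries
    }


def online_replay(events, queries):
    """Serving style: apply events to a key-value dict in time order,
    and read the current value whenever a query time is reached."""
    store = defaultdict(float)
    pending = sorted(events)
    results = {}
    i = 0
    for user, ts in sorted(queries, key=lambda q: q[1]):
        # Apply every event that arrived before this query time.
        while i < len(pending) and pending[i][0] < ts:
            _, u, amount = pending[i]
            store[u] += amount
            i += 1
        results[(user, ts)] = store[user]
    return results


offline = offline_point_in_time(EVENTS, QUERIES)
online = online_replay(EVENTS, QUERIES)

# The consistency property being asked about: both paths agree for every lookup.
assert offline == online
```

The question is whether Chronon provides this kind of agreement between Hive and the KVStore for real joins, and by what mechanism; the sketch only pins down the property, not how Chronon implements it.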