In general our CDC replication is very solid in terms of capturing all changes to the source table, but there is one situation where it's possible to completely replace the contents of a source table without a single peep from our connector.
The way you do that is to load a bunch of new data into a staging table, then drop the old table and rename the staging table to the real table name. Since Postgres logical replication doesn't directly tell us about table dropping or renaming, we won't receive any messages about this and thus won't have anything to emit ourselves. This is not ideal, to say the least.
It is possible in principle to address this, but it requires periodically polling the OIDs of all active tables, persisting that information across task restarts, and then triggering a stream restart if the OID ever changes (or if the table disappears entirely for a while). It might be possible to combine this with normal discovery, but it also might not be worth the effort to do that.
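A minimal sketch of that polling logic, assuming Python purely for illustration (it is not taken from the connector). `OidTracker`, `TABLE_OID_QUERY`, `observe`, and `max_missing_polls` are hypothetical names. The catalog query is one plausible way to read table OIDs from `pg_class`, and the code that would actually run it and restart the stream is left out.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, List

# Hypothetical catalog query a poller could run: one row per ordinary or
# partitioned table, keyed by schema and table name.
TABLE_OID_QUERY = """
SELECT n.nspname, c.relname, c.oid
FROM pg_catalog.pg_class c
JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace
WHERE c.relkind IN ('r', 'p')
"""


@dataclass
class OidTracker:
    """Tracks table OIDs across polls; meant to be persisted between restarts."""

    known: Dict[str, int] = field(default_factory=dict)          # last seen OID per table
    missing_polls: Dict[str, int] = field(default_factory=dict)  # consecutive polls with no row
    max_missing_polls: int = 3  # assumed threshold for "disappeared for a while"

    def observe(self, active_tables: List[str], current: Dict[str, int]) -> List[str]:
        """Compare one poll result against stored OIDs.

        Returns the tables whose streams should be restarted.
        """
        restart = []
        for table in active_tables:
            oid = current.get(table)
            if oid is None:
                # Table is absent from the catalog on this poll.
                count = self.missing_polls.get(table, 0) + 1
                self.missing_polls[table] = count
                if count == self.max_missing_polls:
                    # Flag once, and forget the old OID so a later
                    # reappearance is recorded fresh rather than flagged again.
                    self.known.pop(table, None)
                    restart.append(table)
                continue
            self.missing_polls.pop(table, None)
            previous = self.known.get(table)
            self.known[table] = oid
            if previous is not None and previous != oid:
                # Same name, different OID: the table was dropped and replaced.
                restart.append(table)
        return restart

    def to_json(self) -> str:
        """Serialize the tracker so it can be stored in a checkpoint."""
        return json.dumps(asdict(self))

    @classmethod
    def from_json(cls, data: str) -> "OidTracker":
        """Restore a tracker from its serialized form."""
        return cls(**json.loads(data))
```

The first time a table is observed its OID is only recorded, never flagged, so a fresh task does not restart every stream on its first poll.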
It would be nice if this solution could also address table truncation, but we shouldn't try to tackle that here for three reasons:
1. I'm not sure if there's any good way to reliably detect truncation based on catalog metadata.
2. We don't currently have collection-level truncation signals, so there wouldn't be any useful signal gained by doing this if the source table is truncated.
3. Due to (2), our other SQL CDC connectors ignore truncations currently so we might as well keep things consistent for now.