Theory
Read the WAL, not the tables
Nightly SELECT * FROM orders dumps are the dark ages. They miss deletes, miss intra-day changes, hammer the OLTP DB, and get slower in direct proportion to table size.
Log-based CDC reads the database's write-ahead log (Postgres WAL, MySQL binlog, SQL Server transaction log as exposed through its CDC change tables) and publishes every row change as an event, typically into Kafka via Debezium. The result:
- Captures inserts, updates and deletes.
- Near-zero query load on the OLTP DB (changes are read from the log, not the tables).
- Naturally event-driven: every downstream system gets the same stream.
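To make the list above concrete, here is a minimal sketch of what a downstream consumer does with such a stream. It assumes a simplified Debezium-style envelope with "op", "before" and "after" fields (op codes c, u, d, and r for snapshot reads), a primary key column named id, and events handed over as plain dicts. The Kafka consumer itself is left out, and the names apply_change and replay are hypothetical.

```python
from typing import Any, Dict, Iterable

Row = Dict[str, Any]


def apply_change(replica: Dict[int, Row], event: Dict[str, Any]) -> None:
    """Apply one Debezium-style change event to an in-memory replica keyed by id."""
    op = event["op"]
    if op in ("c", "r", "u"):
        # Create, snapshot read, and update all carry the new row state in "after".
        row = event["after"]
        replica[row["id"]] = row
    elif op == "d":
        # A delete carries only the old row state in "before".
        replica.pop(event["before"]["id"], None)
    else:
        raise ValueError(f"unknown op: {op!r}")


def replay(events: Iterable[Dict[str, Any]]) -> Dict[int, Row]:
    """Rebuild the table state by applying events in log order."""
    replica: Dict[int, Row] = {}
    for event in events:
        apply_change(replica, event)
    return replica


# Hypothetical sample stream for an orders table (assumed columns: id, status).
EVENTS = [
    {"op": "c", "before": None, "after": {"id": 1, "status": "new"}},
    {"op": "c", "before": None, "after": {"id": 2, "status": "new"}},
    {"op": "u", "before": {"id": 1, "status": "new"},
     "after": {"id": 1, "status": "paid"}},
    {"op": "d", "before": {"id": 2, "status": "new"}, "after": None},
]

if __name__ == "__main__":
    print(replay(EVENTS))
```

Because the delete arrives as an event of its own, the replica drops order 2 without ever querying the source table. Any other consumer replaying the same events in the same order ends up with the same state.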
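Query-based CDC (poll WHERE updated_at > last_seen) still has its place for sources without log access, but it cannot see hard deletes, and it can miss changes under concurrent writes: a transaction that stamps an earlier updated_at but commits after the poll has already advanced last_seen is never picked up.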
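The delete blind spot is easy to reproduce. The sketch below is an illustration only: it uses an in-memory SQLite database, a hypothetical orders table with columns id, status and updated_at, and integer values standing in for timestamps. The function names poll_changes and demo are made up for this example.

```python
import sqlite3
from typing import List, Tuple

OrderRow = Tuple[int, str, int]


def poll_changes(conn: sqlite3.Connection, last_seen: int) -> List[OrderRow]:
    """Query-based CDC: return rows whose updated_at is greater than last_seen."""
    cur = conn.execute(
        "SELECT id, status, updated_at FROM orders "
        "WHERE updated_at > ? ORDER BY updated_at",
        (last_seen,),
    )
    return cur.fetchall()


def demo() -> Tuple[List[OrderRow], List[OrderRow]]:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT, updated_at INTEGER)"
    )
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [(1, "new", 10), (2, "new", 11)],
    )

    # First poll sees both rows; remember the highest updated_at.
    first = poll_changes(conn, 0)
    last_seen = max(row[2] for row in first)

    # Between polls: order 1 is updated twice, order 2 is hard-deleted.
    conn.execute("UPDATE orders SET status = 'paid', updated_at = 12 WHERE id = 1")
    conn.execute("UPDATE orders SET status = 'shipped', updated_at = 13 WHERE id = 1")
    conn.execute("DELETE FROM orders WHERE id = 2")

    # Second poll returns only the latest state of order 1.
    second = poll_changes(conn, last_seen)
    conn.close()
    return first, second


if __name__ == "__main__":
    first, second = demo()
    print(first)
    print(second)
```

The second poll returns only order 1 in its final state. The deleted order 2 never shows up, because there is no row left for the WHERE clause to match, and the intermediate paid state is gone too.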