Unlocking Change Data Capture in ADF: New Feature Explained
Change Data Capture(CDC) In Azure Data Factory
Introduction
This blog post introduces an exciting new feature in Azure Data Factory (ADF) — Change Data Capture (CDC) — a long-awaited capability that significantly enhances real-time data integration scenarios.
While ADF has previously supported CDC through Data Flows, this newly introduced UI-based CDC feature brings a more streamlined and user-friendly experience. However, it's important to note that there are key differences between the two implementations:
-
CDC in Data Flows follows a pattern of performing a full initial load, followed by incremental updates, allowing users to establish a historical baseline before applying changes.
-
In contrast, the CDC feature in the ADF UI is designed specifically for incremental loads only, and it offers near real-time synchronization by pushing changes directly to the sink as they occur.
Additionally, with Data Flows, changes are applied based on checkpoint-based pipeline executions, whereas the new UI-based CDC enables faster and more immediate updates, making it ideal for real-time data movement scenarios.
- You need to have Azure SQL Database (recommended for poc purpose use General Purpose - Serverless: Gen5, 1 vCore).
- Cdc must be enabled at the database level and table level in azure sql database.
- A database is created "testdb".
SQL Code:-
The same needs to be done at table level as well. So we need to create our table named "source" at testdb and enable cdc for the same. The below sql code displays the code for the same.
Similarly we will create our destination table as well, please find the below code for the same.
SQL Code:-
create table destination(id int primary key,
name varchar(255));
The following picture displays the overall data representation in source and destination tables, and for ease we have name our source and destination tables as "source" & "destination" respectively for ease.
Step-5:
The overall picture after inserting two records in the source and checking the no of records in the destination, we get a fascinating observation. There seems to have 4 records in the source, which we get (2(initial)+ 2 (new)) records, but destination only had the 2 new records which are inserted with ID's 3 & 4.
Lets try different CRUD Operation to test the flexibility of this CDC feature:-
We shall update the name of ID 1 from 'Alex' to 'Hogward' and delete the ID=4. The sql code is as below:-
UPDATE source
SET name = 'Hogward'
WHERE id = 1;
delete from source where id=4;
The below picture describes the reality as update didn't showed in destination as ID=1 doesn't present in the destination table, hence i that is not present in destination , it would not upsert.
The delete operation was reflected in the sink as ID=4 was present in destination table as well.
The changes can be monitored in azure itself by navigating to the "monitor" tab and choosing cdc option, the bar chart would represent the changes made and can be viewed in isolation.
We can conclude this blog post by saying that the CDC feature as separate ADF object can be highly beneficial if we understand its functionality and how it aligns with our business needs, along with its current features and limitations.
Thankyou for reading.
If you enjoyed this blog, feel free to share, subscribe, like, or leave a comment!













Comments
Post a Comment