The Business Intelligence Blog

January 30, 2010

Change Data Capture in SQL Server 2008

Change Data Capture(CDC) tracks DML(Inserts, Updates, Deletes) and DDL(Create, Alter, Drop) activity on SQL Server tables. CDC simply reads the transaction log to capture the changes made to the tables and then writes that information into the corresponding change tables using SQL Agent jobs. To store the information about the changes made to the tables being tracked, SQL Server creates change tables that have the same structure(columns) as that of the source tables plus a few more columns that are specific to CDC. This feature is only available on enterprise, developer and evaluation edition and can work with any recovery model(Simple, Bulk-logged, Full). Also, please note that there is no UI available to configure change data capture, you will have to do it programatically.

The first step in setting up Change Data Capture is to enable the database for Change Data Capture. To do that, you will have to execute the sys.sp_cdc_enable_db stored procedure. Please note that you need to be a member of the sysadmin fixed server role to execute the stored procedure. The next step is to enable the table(that you want to track) for Change Data Capture. To do that, you will have to execute the sys.sp_cdc_enable_table stored procedure. Please note that you need to be a member of the db_owner fixed database role to execute the stored procedure.

Lets enable CDC in one of tables in the AdventureWorks Database and see how SQL Server tracks the modifications. As said before, we will first enable the AdventureWorks database as shown in the figure below.

Now lets enable CDC on the Product table by executing the statement EXEC sys.sp_cdc_enable_table ‘Production’, ‘Product’,@role_name = NULL as shown in the figure below.

As soon as CDC is enabled, SQL Server will create some system objects(as shown in the figure below) to track and store changes made to the source table. The change table(in this case cdc.Production_Product_CT) is added after CDC is enabled on the table.

Now lets do some inserts and updates to the Product table. As shown in the figure below, we did two inserts and one update to the Production.Product table in the AdventureWorks database.

This information must have been captured in the change table cdc.Production_Product_CT, lets find out.

As shown in the figure above, we have 4 rows: the first two rows indicate inserts, the third and fourth rows represent before and after image the update operation respectively. This is indicated by the __$operation column whose value is 1 for a delete, 2 for an insert, 3 is the “before image” of an update, and 4 is the “after image” of an update.

To extract change data, there are two built-in functions you can leverage:

  • cdc.fn_cdc_get_all_changes_<capture_instance> – This function returns all changes within a specified range.
  • cdc.fn_cdc_get_net_changes_<capture_instance> –  As the name indicates, this function returns only the final change for each row within a specified range.

You will have to know the relevant Log Sequence Numbers(LSNs) to extract data using these functions. However, you can use sys.fn_cdc_map_time_lsn function to map datetime ranges to LSN ranges.

For example, if your ETL system extracts change data on a daily basis, then use the following code:

DECLARE @begin_time datetime, @end_time datetime, @begin_lsn binary(10), @end_lsn binary(10);

SELECT @begin_time = CAST(CONVERT (date, GETDATE()) as varchar)+' 12:00:00.000'

SELECT @end_time = CAST(CONVERT (date, GETDATE()+1) as varchar)+' 12:00:00.000'

SELECT @begin_lsn = sys.fn_cdc_map_time_to_lsn('smallest greater than', @begin_time);

SELECT @end_lsn = sys.fn_cdc_map_time_to_lsn('largest less than or equal', @end_time);

SELECT * FROM cdc.fn_cdc_get_all_changes_Production_Product(@begin_lsn, @end_lsn, 'all')

If we execute the above code in our case, it will return three rows as shown in the figure below.

Advertisements

1 Comment »

  1. […] 10. What is Change Data Capture(CDC)? Change Data Capture(CDC) tracks DML(Inserts, Updates, Deletes) and DDL(Create, Alter, Drop) activity on SQL Server tables. To store the information about the changes made to the tables being tracked, CDC creates change tables that have the same structure(columns) as that of the source tables plus a few more columns that are specific to CDC. CDC simply reads the transaction log to capture the changes made to the tables and then writes that information into the corresponding change tables. CDC is available on enterprise, developer and evaluation edition only. For details check this post – Change Data Capture in SQL Server 2008 […]

    Pingback by SQL Server Database Administration – FAQs « The Business Intelligence Blog — February 3, 2010 @ 2:32 pm


RSS feed for comments on this post. TrackBack URI

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Create a free website or blog at WordPress.com.