CDC For Microsoft SQL Server Using Routes and Queues
For use with the CDC SQL Server Framework.zip Talend project.
Prerequisites
Recommended training
• Talend ESB Basics
• Talend ESB Administration
Technical prerequisites
• Talend Cloud Real Time Big Data Platform Studio 7.3.1
• Microsoft SQL Server with CDC (see Environment)
• Latest version of Microsoft SQL Server JDBC drivers
• Minimum system requirements for Talend Studio installation:
o OS: Windows, Linux, or Mac
o CPU: Intel i7 processor with 4 cores, or equivalent
o RAM: 8 GB
o SSD disk size: 10+ GB
• Understanding of the following topics:
o Apache ActiveMQ
o Microsoft SQL Server installation and basic configurations
o Structured Query Language (SQL)
o Microsoft SQL Server CDC features and usage
o Apache CXF components used by the Talend ESB platform
Contents
Environment
Getting Started
Understanding the essential context variables
Job overview
  Configuration steps
    Creating the support table
    Building the Job and configuring components
  Consuming messages and CDC-enabled table discovery
  Lookup for CDC configuration in the working table
  Configuration unavailable
  Configuration available and CDC tracking
Example
Reference
Environment
This solution template is designed to work with any Microsoft SQL Server version that supports CDC. The examples in this guide use a virtual machine
created on Microsoft Azure from the SQL Server 2019 on Windows Server 2019 image. This setup is not a requirement of the framework, and depending on
your Azure account, the availability of the image might differ. When working with SQL Server, make sure that the host accepts incoming TCP connections
on port 1433 (the default) and that the SQL Server Agent service is running in Microsoft Windows Services, because the CDC capture and cleanup jobs
run as SQL Server Agent jobs.
For information on the minimum installation requirements for the enterprise database, see the Microsoft SQL Server 2019: Hardware and software
requirements documentation.
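Before configuring the Route, you can confirm from the Talend Studio machine that the SQL Server host accepts TCP connections on port 1433. The following Java sketch uses only java.net.Socket; the class name PortCheck, the placeholder host name, and the 3-second timeout are illustrative assumptions. A successful connection only shows that the port is open; it does not confirm that the SQL Server Agent is running.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper: checks whether a TCP port accepts connections.
class PortCheck {

    // Returns true if a TCP connection to host:port succeeds within timeoutMillis.
    static boolean isReachable(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Refused, timed out, or the host name could not be resolved.
            return false;
        }
    }

    public static void main(String[] args) {
        // Placeholder host name (assumption): pass your SQL Server host as the first argument.
        String host = args.length > 0 ? args[0] : "sqlserver-vm.example.com";
        System.out.println("TCP 1433 reachable: " + isReachable(host, 1433, 3000));
    }
}
```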
Getting Started
1. Open Talend Studio and import the CDC SQL Server Framework.zip file. Do not select the Overwrite existing items check box.
2. Verify that the following Routes, Contexts, and Beans are in Talend Studio.
Job overview
The Route is divided into five steps, each with its own scope. Before the main flow starts, the Route performs its configuration: it loads the dependency
libraries and establishes the connections to the broker and to SQL Server. The flow then checks all the CDC-enabled tables on the SQL Server side. It
stores its progress in a working table and queries the enterprise database for any changes recorded after the last log sequence number (LSN) processed.
For every table found, a dedicated Route is created in the same context, so that the changes tracked for each table are sent to a dedicated queue in
the ActiveMQ broker. When changes are captured, a JSON message is pushed to the queue and the working table is updated for the next cycle. If no
changes are captured, the working table is still updated so that the cycling process continues correctly.
1. Configuration steps
2. Consuming messages and CDC-enabled table discovery
3. Lookup for CDC configuration in the working table
4. Configuration unavailable
5. Configuration available and CDC tracking
Configuration steps
Creating the support table
The Route relies on a working table in the SQL Server database, which it reads and updates during every message exchange to keep track of each
cycle. Connect to SQL Server using your preferred client and run the following SQL statement to create the working table used by the CDC framework.
The Route fails when it executes its queries if the working table is not defined exactly as shown below:
Where:
• cdc_consumer_id: object_id (integer) reported in the cdc.change_tables table for the corresponding schema_table
• description: short description of the captured entity
• capture_instance: name (SYSNAME) of the capture instance, in the form schema_tablename
• last_start_lsn: last LSN (log sequence number) used as the starting point to retrieve values from CDC
• last_seqval: value of the last sequence read from the CDC change table for the instance
• date_last_consumed: GETUTCDATE() value recording the date and time when the instance was last consumed
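When you inspect working-table rows in a cProcessor component, it can help to mirror the columns in a Java class. The sketch below is hypothetical: the class name ConsumerConfig and the field types are assumptions (SQL Server stores CDC LSNs and sequence values as binary(10), which maps naturally to byte[]), so adapt them to your actual table definition. The toHex helper prints a binary value in the 0x... form that SQL Server clients display.

```java
import java.time.Instant;

// Hypothetical Java view of one row of the working table (dbo.talend_cdc_consumer).
// Field types are assumptions: adapt them to your actual table definition.
class ConsumerConfig {
    final int cdcConsumerId;        // cdc_consumer_id: object_id from cdc.change_tables
    final String description;       // description: short description of the entity
    final String captureInstance;   // capture_instance, e.g. "production_brands"
    final byte[] lastStartLsn;      // last_start_lsn, assumed binary(10)
    final byte[] lastSeqval;        // last_seqval, assumed binary(10), may be null
    final Instant dateLastConsumed; // date_last_consumed, set with GETUTCDATE()

    ConsumerConfig(int cdcConsumerId, String description, String captureInstance,
                   byte[] lastStartLsn, byte[] lastSeqval, Instant dateLastConsumed) {
        this.cdcConsumerId = cdcConsumerId;
        this.description = description;
        this.captureInstance = captureInstance;
        this.lastStartLsn = lastStartLsn;
        this.lastSeqval = lastSeqval;
        this.dateLastConsumed = dateLastConsumed;
    }

    // Formats a binary value as SQL Server clients display it, e.g. 0x0000002A000001F00003.
    static String toHex(byte[] value) {
        if (value == null) {
            return "NULL";
        }
        StringBuilder sb = new StringBuilder("0x");
        for (byte b : value) {
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Illustrative values only.
        byte[] lsn = {0, 0, 0, 0x2A, 0, 0, 0x01, (byte) 0xF0, 0, 0x03};
        ConsumerConfig row = new ConsumerConfig(1, "Brands", "production_brands", lsn, null, Instant.now());
        System.out.println(row.captureInstance + " last_start_lsn=" + toHex(row.lastStartLsn));
    }
}
```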
Building the Job and configuring components
The cConfig_1 component loads all the modules needed to run the Route and to handle the data stream coming from the enterprise database. It also
loads the JDBC driver used to access the SQL Server database, so the SQL Server JDBC driver must be uploaded to the Maven repository.
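Because the Route cannot reach SQL Server if the JDBC driver fails to load, a quick classpath check can narrow down configuration problems. This sketch assumes the Microsoft JDBC driver, whose driver class is com.microsoft.sqlserver.jdbc.SQLServerDriver; the DriverCheck helper is hypothetical and independent of how cConfig_1 resolves dependencies from the Maven repository.

```java
// Hypothetical helper: verifies that a JDBC driver class can be loaded.
class DriverCheck {

    // Driver class shipped with the Microsoft JDBC Driver for SQL Server.
    static final String MSSQL_DRIVER = "com.microsoft.sqlserver.jdbc.SQLServerDriver";

    // Returns true if the named class is available on the current classpath.
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(MSSQL_DRIVER + " loadable: " + isOnClasspath(MSSQL_DRIVER));
    }
}
```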
The cBeanRegister_1 component contains the code that registers a custom Java bean in the Route to provide access to the SQL Server database. It
uses the context variables from SQL_Server_Connection_Context.
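The connection exposed by the registered bean is typically described by a JDBC URL built from the connection context. The sketch below assembles a URL in the format documented for the Microsoft JDBC driver (jdbc:sqlserver://host:port;property=value); the method parameters are hypothetical stand-ins for the SQL_Server_Connection_Context variables, whose actual names are not listed in this guide.

```java
// Hypothetical: builds a Microsoft JDBC driver URL from connection settings.
// The parameters stand in for SQL_Server_Connection_Context variables (names assumed).
class SqlServerUrl {

    static String build(String host, int port, String database, boolean trustServerCertificate) {
        StringBuilder url = new StringBuilder("jdbc:sqlserver://")
                .append(host).append(':').append(port)
                .append(";databaseName=").append(database)
                .append(";encrypt=true");
        if (trustServerCertificate) {
            // Skips certificate validation: acceptable only for test VMs with self-signed certificates.
            url.append(";trustServerCertificate=true");
        }
        return url.toString();
    }

    public static void main(String[] args) {
        System.out.println(build("sqlserver-vm.example.com", 1433, "BikeStores", true));
    }
}
```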
Consuming messages and CDC-enabled table discovery
The cTimer_1 component triggers the Route and starts the discovery phase for CDC-enabled tables. The context.route_exchange_period context variable
sets the interval at which the Route checks SQL Server for tables newly placed under CDC control.
The cSetBody_7 component sets the message body to the query that retrieves all the tables under CDC control in SQL Server.
The cMessagingEndpoint_11 component, configured with the Camel jdbc component on the Advanced settings tab, executes the query in the message
body and passes the resulting result set on in the Route exchange.
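The discovery query itself is not reproduced in this guide. As an illustration only, the sketch below uses a query against the cdc.change_tables system table, which lists one row per capture instance, and extracts the capture instance names from the result. It assumes the Camel jdbc component's default SelectList output type, in which each row arrives as a Map keyed by column name; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the discovery step (cSetBody_7 and cMessagingEndpoint_11).
class CdcDiscovery {

    // Illustrative query: cdc.change_tables holds one row per capture instance.
    static final String DISCOVERY_SQL =
            "SELECT object_id, capture_instance FROM cdc.change_tables";

    // Extracts capture instance names from a SelectList-style result (one Map per row).
    static List<String> captureInstances(List<Map<String, Object>> rows) {
        List<String> names = new ArrayList<>();
        for (Map<String, Object> row : rows) {
            Object name = row.get("capture_instance");
            if (name != null) {
                names.add(name.toString());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> rows =
                List.of(Map.<String, Object>of("object_id", 1, "capture_instance", "production_brands"));
        System.out.println(captureInstances(rows));
    }
}
```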
The cProcessor_4 component iterates over the result set of CDC-enabled tables and, for each table that does not yet have a Route, adds a new Route to
the context. Each Route tracks exactly one CDC-enabled table in the SQL Server database (a 1:1 ratio). The Route parameters are the table name
(table_name) and the polling period (period), which is defined in the context.cdc_table_consuming_period context variable. Each new Route sends its
messages to the cDirect_1 component.
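The 1:1 rule applied by cProcessor_4 boils down to a set difference between the discovered tables and the Routes already in the context. This plain-Java sketch shows that decision; the cdc- Route ID prefix and the RoutePlanner name are assumptions, not the framework's conventions.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of the cProcessor_4 decision: one Route per CDC table (1:1).
class RoutePlanner {

    // Assumed Route ID convention; the framework may name its Routes differently.
    static String routeIdFor(String tableName) {
        return "cdc-" + tableName;
    }

    // Returns the discovered tables that have no Route yet, in order and without duplicates.
    static List<String> tablesNeedingRoutes(Collection<String> discoveredTables,
                                            Set<String> existingRouteIds) {
        List<String> missing = new ArrayList<>();
        for (String table : discoveredTables) {
            if (!existingRouteIds.contains(routeIdFor(table)) && !missing.contains(table)) {
                missing.add(table);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        System.out.println(tablesNeedingRoutes(
                List.of("production_brands", "sales_orders"), Set.of("cdc-sales_orders")));
    }
}
```

Inside a Camel processor, the existing Route IDs could be collected from exchange.getContext().getRoutes() (each Route exposes getId()), and a timer-driven Route carrying table_name and period would then be added for each table returned.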
Lookup for CDC configuration in the working table
The cDirect_1 component is the shared entry point for all the CDC Routes created for the tracked tables. At each scheduled interval, every created
Route sends a message to this endpoint.
The cSetBody_1 component sets the message body to an SQL query that retrieves any configuration available in the working table. If a configuration
exists, the query returns its last_start_lsn value.
The cMessagingEndpoint_3 component runs the query defined in the previous step and places the resulting data stream in the message body of the
exchange. It relies on the Camel jdbc component configured on the Advanced settings tab.
The cProcessor_2 component sets the values that indicate whether a configuration is available in the working table. A configEvaluation value of 1
means that a configuration was found for the CDC table; 0 means that none was found and the CDC settings are being used for the first time.
The cMessageRouter_1 component evaluates the configEvaluation property. If it is 1, the exchange goes to the direct endpoint called ConfigAvailable
(cMessagingEndpoint_4); otherwise, it goes to the direct endpoint called ConfigUnavailable (cMessagingEndpoint_5).
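The decision made by cProcessor_2 and cMessageRouter_1 can be sketched in plain Java as follows. The sketch assumes the working-table query returns rows in the Camel jdbc SelectList shape (a List of Maps) and that the direct endpoints are addressed as direct:ConfigAvailable and direct:ConfigUnavailable; the ConfigLookup class is hypothetical.

```java
import java.util.List;
import java.util.Map;

// Hypothetical sketch of cProcessor_2 (configEvaluation) and cMessageRouter_1.
class ConfigLookup {

    // 1 when the working-table query returned a row for the table, 0 otherwise.
    static int configEvaluation(List<Map<String, Object>> rows) {
        return rows.isEmpty() ? 0 : 1;
    }

    // last_start_lsn of the first row, or null when no configuration exists yet.
    static Object lastStartLsn(List<Map<String, Object>> rows) {
        return rows.isEmpty() ? null : rows.get(0).get("last_start_lsn");
    }

    // Mirrors the router condition: 1 goes to ConfigAvailable, anything else to ConfigUnavailable.
    static String nextEndpoint(int configEvaluation) {
        return configEvaluation == 1 ? "direct:ConfigAvailable" : "direct:ConfigUnavailable";
    }

    public static void main(String[] args) {
        List<Map<String, Object>> noRows = List.of();
        System.out.println(nextEndpoint(configEvaluation(noRows)));
    }
}
```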
Configuration unavailable
When no configuration is available, the Route is running CDC discovery for the monitored table for the first time. In this flow, an entry is added to
the working table, and later exchanges follow the configured time period. The latest LSN value evaluated by SQL Server is stored as the starting point
for retrieving the next changes.
The cDirect_3 component is the entry point for the Routes that have no configuration in the working table.
The cSetBody_2 component defines the INSERT statement that adds the configuration row to the working table.
The cMessagingEndpoint_6 component runs the statement with the Apache Camel jdbc component, inserting the entry into the working table.
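The INSERT statement itself is not reproduced in this guide. The sketch below shows one way cSetBody_2 could build it, assuming the working table is dbo.talend_cdc_consumer (the name used in the Example section), that last_start_lsn is a binary(10) column, and that last_seqval may be NULL on the first run. It reads cdc_consumer_id from cdc.change_tables and seeds last_start_lsn with sys.fn_cdc_get_max_lsn(), the current maximum LSN in the database. The FirstRunInsert class is hypothetical.

```java
// Hypothetical sketch of the first-run INSERT that cSetBody_2 could produce.
class FirstRunInsert {

    // Doubles single quotes so a value can be embedded in a T-SQL string literal.
    static String quote(String value) {
        return "'" + value.replace("'", "''") + "'";
    }

    // Seeds the working table with the current maximum LSN of the database.
    static String build(String captureInstance, String description) {
        return "INSERT INTO dbo.talend_cdc_consumer "
                + "(cdc_consumer_id, description, capture_instance, last_start_lsn, last_seqval, date_last_consumed) "
                + "SELECT ct.object_id, " + quote(description) + ", ct.capture_instance, "
                + "sys.fn_cdc_get_max_lsn(), NULL, GETUTCDATE() "
                + "FROM cdc.change_tables ct WHERE ct.capture_instance = " + quote(captureInstance);
    }

    public static void main(String[] args) {
        System.out.println(build("production_brands", "Brands"));
    }
}
```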
Configuration available and CDC tracking
The cDirect_2 component handles exchanges whose configEvaluation value is 1 and acts as the entry point when new changes must be retrieved.
The cSetBody_3 component declares the SQL statement that checks for new changes on the monitored table, which is identified by the table_name
exchange property.
The cMessagingEndpoint_7 component runs the query and returns a data stream that does not yet contain the captured changes themselves. It is
configured with the Camel jdbc component on the Advanced settings tab.
The cProcessor_3 component sets the variables needed to continue the message exchange. In particular, the cdc_ready value indicates whether the
query found any new changes.
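This guide does not show how cdc_ready is computed. One plausible approach, shown here as an assumption, compares the last processed LSN stored in the working table with the current value of sys.fn_cdc_get_max_lsn(): if the maximum is greater, new changes may exist (the maximum is database-wide, so the table itself may still have none). Because LSNs are fixed-length big-endian binary(10) values, an unsigned byte-wise comparison gives their numeric order. The CdcReady class is hypothetical.

```java
import java.util.Arrays;

// Hypothetical sketch: derive cdc_ready by comparing two binary(10) LSNs.
class CdcReady {

    // Returns 1 if currentMaxLsn is greater than the last processed LSN, else 0.
    // LSNs are fixed-length big-endian values, so an unsigned byte-wise
    // comparison matches their numeric order (a signed one would not).
    static int cdcReady(byte[] lastProcessedLsn, byte[] currentMaxLsn) {
        if (lastProcessedLsn == null || currentMaxLsn == null) {
            return 0; // nothing to compare yet
        }
        return Arrays.compareUnsigned(currentMaxLsn, lastProcessedLsn) > 0 ? 1 : 0;
    }

    public static void main(String[] args) {
        byte[] last = {0, 0, 0, 0x2A, 0, 0, 0x01, (byte) 0xF0, 0, 0x03};
        byte[] max = {0, 0, 0, 0x2A, 0, 0, 0x01, (byte) 0xF8, 0, 0x01};
        System.out.println("cdc_ready=" + cdcReady(last, max));
    }
}
```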
The cMessageRouter_2 component routes each message to the correct step according to the cdc_ready value, that is, depending on whether new changes
were found.
When cdc_ready is 0, the flow follows the NoCDC branch: the cSetBody_4 and cMessagingEndpoint_8 components run an UPDATE query that writes the
latest system values to the corresponding entry in the working table.
When cdc_ready is 1, the flow follows the CDCReady branch: cSetBody_5 sets a query that calls the SQL Server CDC function, and
cMessagingEndpoint_9 executes it to retrieve the new changes for the monitored table.
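SQL Server exposes the captured rows through the generated table-valued function cdc.fn_cdc_get_all_changes_<capture_instance>(from_lsn, to_lsn, row_filter_option). The sketch below builds such a query for the CDCReady branch, starting just after the last processed LSN (sys.fn_cdc_increment_lsn) and ending at sys.fn_cdc_get_max_lsn(). The ChangeQuery class, its input validation, and the choice of N'all' are assumptions; the framework's actual query may differ.

```java
import java.util.regex.Pattern;

// Hypothetical builder for the CDCReady query (cSetBody_5).
class ChangeQuery {
    // Capture instance names are embedded in the function name, so restrict them.
    private static final Pattern INSTANCE = Pattern.compile("[A-Za-z0-9_]+");
    // A binary(10) literal: 0x followed by 20 hex digits.
    private static final Pattern LSN_HEX = Pattern.compile("0x[0-9A-Fa-f]{20}");

    static String allChanges(String captureInstance, String lastLsnHex) {
        if (!INSTANCE.matcher(captureInstance).matches()) {
            throw new IllegalArgumentException("Unexpected capture instance: " + captureInstance);
        }
        if (!LSN_HEX.matcher(lastLsnHex).matches()) {
            throw new IllegalArgumentException("Expected a binary(10) literal: " + lastLsnHex);
        }
        // Read from just after the last processed LSN up to the current maximum LSN.
        return "SELECT * FROM cdc.fn_cdc_get_all_changes_" + captureInstance
                + "(sys.fn_cdc_increment_lsn(" + lastLsnHex + "), sys.fn_cdc_get_max_lsn(), N'all')";
    }

    public static void main(String[] args) {
        System.out.println(allChanges("production_brands", "0x0000002A000001F00003"));
    }
}
```

Because the Example section enables the table with @supports_net_changes = 1, cdc.fn_cdc_get_net_changes_<capture_instance> is also available if you only need the final state of each row within the LSN range.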
When the CDC function returns values, the message body contains all the information about the changes. The cProcessor_1 component checks the
result set and sets it as the message body. The results are sent to the ActiveMQ broker only when the result set is not empty.
The cMessageRouter_3 component dispatches the exchange based on the goToJSM exchange property. When the result set is empty (no changes
captured), the exchange is directed to route15, which updates the working table. When the result set is not empty, the exchange is directed to the
message broker.
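The dispatch logic can be sketched as follows. goToJSM is set only when the CDC function returned at least one row, and the __$operation column of each change row identifies the action: 1 = delete, 2 = insert, 3 = update (before image), 4 = update (after image). With the N'all' row filter option, updates return only the after image. The ChangeDispatch class and the action labels are hypothetical.

```java
import java.util.List;
import java.util.Map;

// Hypothetical sketch of cProcessor_1 and cMessageRouter_3.
class ChangeDispatch {

    // goToJSM: true only when the CDC function returned at least one row.
    static boolean goToJSM(List<Map<String, Object>> changeRows) {
        return changeRows != null && !changeRows.isEmpty();
    }

    // Maps the __$operation value of a change row to a readable action label.
    static String action(int operation) {
        switch (operation) {
            case 1: return "DELETE";
            case 2: return "INSERT";
            case 3: return "UPDATE_BEFORE";
            case 4: return "UPDATE_AFTER";
            default: throw new IllegalArgumentException("Unknown __$operation: " + operation);
        }
    }

    public static void main(String[] args) {
        List<Map<String, Object>> rows = List.of(Map.<String, Object>of("__$operation", 2));
        System.out.println(goToJSM(rows) ? "to broker: " + action(2) : "to route15");
    }
}
```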
The cSetBody_6 and cMessagingEndpoint_10 components update the entry in the working table.
Finally, the Route sets the message header and sends the values to the broker. The cSetHeader_1 component populates the CamelJmsDestinationName
header, which declares the name of the destination queue in the broker, and the cJMS_1 component pushes the message to the broker.
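The last step can be sketched without any messaging library: set the CamelJmsDestinationName header, which Camel's JMS component uses to override the destination name, and serialize the change rows as JSON. The one-queue-per-capture-instance naming (cdc.<capture_instance>) and the minimal serializer are assumptions; a production Route would more likely use a JSON data format or library.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the last step: message headers and a JSON body.
class BrokerMessage {
    // Header read by Camel's JMS component to override the destination name.
    static final String DESTINATION_HEADER = "CamelJmsDestinationName";

    // Assumed naming convention: one queue per capture instance.
    static Map<String, Object> headers(String captureInstance) {
        Map<String, Object> headers = new LinkedHashMap<>();
        headers.put(DESTINATION_HEADER, "cdc." + captureInstance);
        return headers;
    }

    // Serializes rows to a JSON array; handles strings, numbers, booleans, and nulls only.
    static String toJson(List<Map<String, Object>> rows) {
        StringBuilder sb = new StringBuilder("[");
        for (int i = 0; i < rows.size(); i++) {
            if (i > 0) sb.append(',');
            sb.append('{');
            int j = 0;
            for (Map.Entry<String, Object> e : rows.get(i).entrySet()) {
                if (j++ > 0) sb.append(',');
                sb.append(quote(e.getKey())).append(':').append(value(e.getValue()));
            }
            sb.append('}');
        }
        return sb.append(']').toString();
    }

    private static String value(Object v) {
        if (v == null) return "null";
        if (v instanceof Number || v instanceof Boolean) return v.toString();
        return quote(v.toString());
    }

    // Wraps a string in double quotes and escapes characters that JSON forbids.
    private static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"': sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) sb.append(String.format("\\u%04x", (int) c));
                    else sb.append(c);
            }
        }
        return sb.append('"').toString();
    }

    public static void main(String[] args) {
        Map<String, Object> row = new LinkedHashMap<>();
        row.put("__$operation", 2);
        row.put("brand_id", 10);
        row.put("brand_name", "Cube");
        System.out.println(headers("production_brands"));
        System.out.println(toJson(List.of(row)));
    }
}
```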
Example
This section describes how to create a sample database and capture its changes with CDC in Microsoft SQL Server. The example uses the SQL Server
BikeStores sample database.
1. Download the BikeStores scripts and create the sample database as described on the SQLServerTutorial.net website.
2. Create the consumer configuration table that manages CDC by running the following script:
-- BikeStores.dbo.talend_cdc_consumer definition
-- Drop table
4. Enable CDC on the selected table by running the following SQL script:
USE BikeStores
EXEC sys.sp_cdc_enable_table
@source_schema = N'production',
@source_name = N'brands',
@role_name = NULL,
--@filegroup_name = N'BikeStores_CT',
@supports_net_changes = 1
6. When the Route is up and running, a message confirms that the Route for the table is operational.
7. Check the new configuration row in the consumer working table using the following query:
select * from dbo.talend_cdc_consumer tcc
8. Start an Apache ActiveMQ broker to receive the messages pushed when changes are captured, and point your browser to its administration page,
for example, http://localhost:8161/admin/index.jsp.
9. To simulate changes (an INSERT, an UPDATE, and a DELETE), run the following SQL script:
--INSERT
USE BikeStores
SET IDENTITY_INSERT production.brands ON
INSERT INTO production.brands(brand_id,brand_name) VALUES(10,'Cube')
SET IDENTITY_INSERT production.brands OFF
SELECT * FROM production.brands b
--UPDATE
USE BikeStores
UPDATE production.brands SET brand_name = N'Cube Bikes' WHERE brand_id = 10
SELECT * FROM production.brands b
--DELETE
USE BikeStores
DELETE FROM production.brands WHERE brand_id = 10
SELECT * FROM production.brands b
10. Go back to the broker administration page and select Queues. A queue has been created with three JSON-like messages that describe the actions
taken in the database.
Reference
For more information, see About Change Data Capture (SQL Server) in the Microsoft documentation.