CDC for Microsoft SQL Server using Routes and Queues - Talend Solution Templates

CDC for Microsoft SQL Server using Routes and Queues
Introduction
Talend offers several components for managing Change Data Capture (CDC) on a wide range of enterprise relational databases. As companies increasingly
need to propagate changes in real time (or near real time), a strategy based on an Enterprise Service Bus (ESB) can be more effective than the classical
approach based on batch Jobs. This solution template shows you how to implement a fully functional CDC framework for Microsoft SQL Server that
dynamically pushes incremental changes to a message queue, such as Apache ActiveMQ.

Use this template together with the CDC SQL Server Framework.zip Talend project.

Prerequisites
Recommended training
• Talend ESB Basics
• Talend ESB Administration

Technical prerequisites
• Talend Cloud Real Time Big Data Platform Studio 7.3.1
• Microsoft SQL Server with CDC (see Environment)
• Latest version of Microsoft SQL Server JDBC drivers
• Minimum system requirements for Talend Studio installation:
o OS: Windows, Linux, or Mac
o CPU: Intel i7 processor with 4 cores, or equivalent
o RAM: 8 GB
o SSD disk size: 10+ GB
• Understanding of the following topics:
o Apache ActiveMQ
o Microsoft SQL Server installation and basic configurations
o Structured Query Language (SQL)
o Microsoft SQL Server CDC features and usage
o Apache CXF components used by Talend ESB platform

Copyright Talend 2020 1


CDC for Microsoft SQL Server using Routes and Queues - Talend Solution Templates

Contents
Environment
Getting Started
Understanding the essential context variables
Job overview
  Configuration steps
    Creating the support table
    Building the Job and configuring components
  Consuming messages and CDC-enabled table discovery
  Lookup for CDC configuration in the working table
  Configuration unavailable
  Configuration available and CDC tracking
Example
Reference


Environment
This solution template works with any Microsoft SQL Server version that supports CDC. The reference environment is a virtual machine created on
Microsoft Azure from the SQL Server 2019 on Windows Server 2019 image. This image is not a requirement for the framework, and its availability depends
on your Azure account. Make sure that the SQL Server host accepts incoming TCP connections on port 1433 (the default) and that the SQL Server Agent
service is running in Windows Services, because the CDC capture and cleanup jobs run as SQL Server Agent jobs.
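Before running the Routes, you can confirm from the Talend Studio machine that the SQL Server host accepts TCP connections on port 1433. The following Java sketch is not part of the template. The class name is hypothetical and the host name is a placeholder.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

/** Hypothetical helper: checks whether a host accepts TCP connections on a port. */
final class PortCheck {
    private PortCheck() {}

    /** Returns true if a TCP connection to host:port succeeds within timeoutMillis. */
    static boolean acceptsTcp(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Connection refused, timed out, or host name not resolvable.
            return false;
        }
    }

    public static void main(String[] args) {
        // Placeholder host name: replace it with your VM's DNS name or IP address.
        String host = args.length > 0 ? args[0] : "sqlserver-vm.example.com";
        System.out.println("TCP 1433 reachable: " + acceptsTcp(host, 1433, 3000));
    }
}
```

If the check fails, look at the Windows firewall on the VM and the Azure network security group before looking at the Route configuration.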

For information on the minimum installation requirements for the enterprise database, see the Microsoft SQL Server 2019: Hardware and software
requirements documentation.


Getting Started
1. Open Talend Studio and import the CDC SQL Server Framework.zip file. Do not select the Overwrite existing items check box.

2. Verify that the project's Routes, Contexts, and Beans appear in the Talend Studio Repository.


Understanding the essential context variables


The context variables used by the Routes are defined in three separate context groups. Each group relates to one well-defined component: the Route,
the database, or the broker queue. The Routes use the Default context.


Job overview
The Route is divided into five steps, each with its own scope. Before the main flow starts, a set of configuration tasks runs, such as loading the
dependency libraries and establishing the connections to the broker and to SQL Server. The flow then checks all CDC-enabled tables on the SQL Server
side. It stores its state in a working table and queries the enterprise database for changes recorded after the last log sequence number (LSN) it
processed. The LSN is the internal, ever-increasing value that SQL Server assigns to each change. For every table found, a dedicated Route is created in
the same context, so the changes captured for each CDC-enabled table are published to a dedicated destination in the ActiveMQ broker. When changes are
captured, they are pushed to the queue as JSON messages, and the working table is updated to control the flow and prepare the next cycle. If no changes
are recorded, the working table is still updated so that the cycling process continues correctly.

1. Configuration steps
2. Consuming messages and CDC-enabled table discovery
3. Lookup for CDC configuration in the working table
4. Configuration unavailable
5. Configuration available and CDC tracking


Configuration steps
Creating the support table
The Route keeps its state in a working table on the SQL Server database and reads from it directly during each message exchange. Connect to SQL Server
using your preferred client and run the following SQL statement to create the working table. The CDC framework uses this table to keep track of every
cycle, and the Route's queries fail if the table is not created exactly as shown below:

CREATE TABLE dbo.talend_cdc_consumer
(cdc_consumer_id INT NOT NULL CONSTRAINT PK_cdc_consumer PRIMARY KEY CLUSTERED
, [description] VARCHAR(200) NOT NULL
, capture_instance SYSNAME NOT NULL
, last_start_lsn BINARY(10) NULL
, last_seqval BINARY(10) NULL
, date_last_consumed DATETIME NULL
, CONSTRAINT UQ_cdc_consumer UNIQUE NONCLUSTERED ([description], capture_instance));

Where:
• cdc_consumer_id: object_id integer as reported in the table cdc.change_tables for the corresponding schema_table
• description: short description for the captured entity
• capture_instance: SYSNAME of the instance in the form schema_tablename
• last_start_lsn: last LSN (log sequence number) used as a starting point to get the values from CDC
• last_seqval: last sequence value (__$seqval) consumed from the change table of the capture instance
• date_last_consumed: UTC date and time, taken from GETUTCDATE(), of the last time changes were consumed
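The last_start_lsn and last_seqval columns hold BINARY(10) values. When a Route builds SQL text that refers to a stored LSN, the value has to be written as a T-SQL binary literal. The following Java helper is a hypothetical sketch that formats the byte array returned by JDBC getBytes as such a literal. The class and method names are not from the template.

```java
/** Hypothetical helper: formats a BINARY(10) LSN read through JDBC as a T-SQL binary literal. */
final class LsnLiteral {
    private LsnLiteral() {}

    static String toSqlLiteral(byte[] lsn) {
        if (lsn == null) {
            return "NULL"; // for example, last_seqval before any change has been consumed
        }
        if (lsn.length != 10) {
            throw new IllegalArgumentException("An LSN is BINARY(10), got " + lsn.length + " bytes");
        }
        StringBuilder sb = new StringBuilder("0x");
        for (byte b : lsn) {
            // Mask to 0..255 so negative Java bytes print as two hex digits.
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }
}
```

For example, the bytes 00 00 00 2A 00 00 01 F0 00 03 become 0x0000002A000001F00003.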


Building the Job and configuring components


The cMQConnectionFactory_1 component defines the broker URI, here the context.connection_mq_broker_uri context variable, and creates the connection
to the ActiveMQ broker to which the messages are sent.


The cConfig_1 component loads all the modules needed to run the Route and manages the data stream coming from the enterprise database. It also loads
the correct JDBC driver for accessing the SQL Server database, so the SQL Server JDBC driver must be uploaded to the Maven repository.


The cBeanRegister_1 component contains the code that registers a custom Java Bean in the Route and provides access to the SQL Server database. It uses
the context variables from SQL_Server_Connection_Context.
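The actual variable names in SQL_Server_Connection_Context are not listed here, so the following sketch uses hypothetical host, port, and database parameters. It shows the URL format of the Microsoft JDBC Driver for SQL Server. It also shows how to check that the driver class, com.microsoft.sqlserver.jdbc.SQLServerDriver, is on the classpath, which the Maven upload described above is meant to ensure.

```java
/** Hypothetical sketch: builds a Microsoft JDBC driver URL from connection context values. */
final class SqlServerJdbcUrl {
    /** Driver class shipped in the Microsoft JDBC Driver for SQL Server (mssql-jdbc). */
    static final String DRIVER_CLASS = "com.microsoft.sqlserver.jdbc.SQLServerDriver";

    private SqlServerJdbcUrl() {}

    /** Format: jdbc:sqlserver://host:port;databaseName=db */
    static String build(String host, int port, String database) {
        if (host == null || host.isEmpty()) {
            throw new IllegalArgumentException("host is required");
        }
        return "jdbc:sqlserver://" + host + ":" + port + ";databaseName=" + database;
    }

    /** True if the driver class can be loaded, that is, the driver JAR is on the classpath. */
    static boolean driverAvailable() {
        try {
            Class.forName(DRIVER_CLASS);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }
}
```

For the example later in this document, build("localhost", 1433, "BikeStores") yields jdbc:sqlserver://localhost:1433;databaseName=BikeStores.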


Consuming messages and CDC-enabled table discovery


In this phase, the Route is triggered at a predefined interval to consume messages. Polling at a fixed interval lets you retrieve information from
CDC-enabled tables without overloading the infrastructure. After the discovery of CDC-enabled tables is complete, the Route checks whether a dedicated
Route for each table is present in the current context. If not, it creates one using the Java Bean defined in the CreateCDCTableRouteBuilder code.
Usually, all the dedicated Routes are created during the first run of the main Route.

To consume messages for CDC-enabled tables, the cTimer_1 component triggers the Route and starts the discovery phase. It uses the
context.route_exchange_period context variable to set the interval at which the Route looks for new CDC-enabled tables in SQL Server.
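cTimer_1 is configured in Talend Studio rather than in code. As a rough plain-Java model of its behavior, the following sketch runs a discovery task repeatedly with a fixed delay equal to route_exchange_period. It assumes the period is expressed in milliseconds. This illustrates the polling idea only and is not Camel's timer implementation.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Rough stand-in for cTimer_1: runs the discovery step every route_exchange_period ms. */
final class DiscoveryTimer {
    private DiscoveryTimer() {}

    static ScheduledExecutorService start(long routeExchangePeriodMillis, Runnable discovery) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        // Fixed delay: each run starts one full period after the previous run ends,
        // so a slow discovery query never overlaps the next one.
        scheduler.scheduleWithFixedDelay(discovery, 0, routeExchangePeriodMillis, TimeUnit.MILLISECONDS);
        return scheduler;
    }
}
```

Call shutdownNow() on the returned scheduler to stop polling. A longer period lowers the load on SQL Server at the cost of detecting new CDC-enabled tables later.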


The cSetBody_7 component sets the message body to the query that retrieves all the tables under CDC control in SQL Server.
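The exact query in cSetBody_7 is defined in the template project. A query like the following is one way to list CDC-enabled tables. It is shown as a Java string constant, the way it might be set in a message body. It is an assumption based on the cdc.change_tables catalog table referenced in the working-table description, and it must run in the CDC-enabled database.

```java
/** Hypothetical discovery query for cSetBody_7; run it in the CDC-enabled database. */
final class DiscoveryQuery {
    private DiscoveryQuery() {}

    // cdc.change_tables has one row per capture instance: object_id identifies the change
    // table, and source_object_id identifies the tracked source table.
    static final String SQL =
        "SELECT ct.object_id AS cdc_consumer_id, "
        + "ct.capture_instance, "
        + "OBJECT_SCHEMA_NAME(ct.source_object_id) AS source_schema, "
        + "OBJECT_NAME(ct.source_object_id) AS table_name "
        + "FROM cdc.change_tables AS ct "
        + "ORDER BY ct.capture_instance";

    public static void main(String[] args) {
        System.out.println(SQL);
    }
}
```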


The cMessagingEndpoint_11 component uses the Camel jdbc component configured on the Advanced settings tab. It executes the query contained in the
message body and passes the result set to the Route exchange.


The cProcessor_4 component checks each row of the result set returned for the CDC-enabled tables and adds a new Route to the context for any table that
does not have one yet. Each added Route tracks exactly one CDC-enabled table in the SQL Server database (a 1:1 mapping). The Route parameters are the
table name (table_name) and the polling period (period), defined in the context.cdc_table_consuming_period context variable. The message is then sent
to the cDirect_1 component.
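The decision that cProcessor_4 makes can be sketched in plain Java. The route ID convention used here (cdc_ plus the table name) is hypothetical. The point is that discovery is idempotent: only tables without a registered Route get a new one, so later timer runs do not create duplicates.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Sketch of cProcessor_4's decision: which discovered tables still need a dedicated Route. */
final class RouteDiscovery {
    private RouteDiscovery() {}

    /** Hypothetical naming convention giving one Route ID per table (1:1 mapping). */
    static String routeIdFor(String tableName) {
        return "cdc_" + tableName;
    }

    /** Returns, in discovery order, the tables whose Route ID is not yet in the context. */
    static List<String> tablesNeedingRoute(List<String> discoveredTables, Set<String> existingRouteIds) {
        List<String> missing = new ArrayList<>();
        for (String table : discoveredTables) {
            if (!existingRouteIds.contains(routeIdFor(table))) {
                missing.add(table);
            }
        }
        return missing;
    }
}
```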


Lookup for CDC configuration in the working table


After the discovery phase is complete, the Route checks whether the working table contains a valid configuration, that is, information about the
previous CDC extraction. There are two cases: no valid configuration exists and one must be added, or a valid configuration exists and the Route must
check for additional changes.

The cDirect_1 component is the common entry point for all the dedicated CDC Routes. Each dedicated Route sends its message to this endpoint at its
scheduled time.

The cSetBody_1 component sets the message body to an SQL query that retrieves any configuration stored in the working table. If a configuration
exists, the query returns its last_start_lsn value.
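A hypothetical version of the lookup query set by cSetBody_1 follows. The template's actual statement may differ. Because the Route builds SQL as text in the message body, the sketch doubles single quotes in the capture instance name before embedding it in a string literal.

```java
/** Hypothetical sketch of the working-table lookup that cSetBody_1 could set. */
final class ConfigLookupQuery {
    private ConfigLookupQuery() {}

    /** Escapes a value for a T-SQL N'...' literal by doubling single quotes. */
    static String quote(String value) {
        return "N'" + value.replace("'", "''") + "'";
    }

    static String forCaptureInstance(String captureInstance) {
        return "SELECT cdc_consumer_id, last_start_lsn, last_seqval "
             + "FROM dbo.talend_cdc_consumer "
             + "WHERE capture_instance = " + quote(captureInstance);
    }
}
```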


The cMessagingEndpoint_3 component runs the SQL query defined in the previous step and places the result set in the message body of the exchange. It
relies on the Camel jdbc component configured on the Advanced settings tab.


The cProcessor_2 component sets the values used to determine whether a configuration exists in the working table. A configEvaluation value of 1
means that a configuration was found for the CDC table. A value of 0 means that none was found, so the CDC settings are being used for the first time.


The cMessageRouter_1 component routes the message based on the configEvaluation exchange property. If it is 1, the message goes to the direct
endpoint called ConfigAvailable (cMessagingEndpoint_4); otherwise, it goes to the direct endpoint called ConfigUnavailable (cMessagingEndpoint_5).
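The combined logic of cProcessor_2 and cMessageRouter_1 can be modeled in plain Java. The sketch assumes that the lookup result arrives as a list of column-name maps, which is the Camel jdbc component's default SelectList output. The direct: URIs mirror the endpoint names mentioned above and are assumptions.

```java
import java.util.List;
import java.util.Map;

/** Plain-Java model of cProcessor_2 (configEvaluation) and cMessageRouter_1 (routing). */
final class ConfigRouting {
    private ConfigRouting() {}

    /** 1 when the working-table lookup returned a configuration row, 0 on the first run. */
    static int configEvaluation(List<Map<String, Object>> rows) {
        return rows != null && !rows.isEmpty() ? 1 : 0;
    }

    /** Same decision as cMessageRouter_1; URIs assumed from the endpoint names. */
    static String nextEndpoint(int configEvaluation) {
        return configEvaluation == 1 ? "direct:ConfigAvailable" : "direct:ConfigUnavailable";
    }
}
```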


Configuration unavailable
If no configuration is available, the Route is running CDC discovery on the monitored table for the first time. In this flow, the Route adds an entry to
the working table, and the exchange continues on its regular schedule. The latest LSN value reported by SQL Server is stored as the starting point for
retrieving subsequent changes.

The cDirect_3 component is the entry point for the Routes without any configuration in the working table.


The cSetBody_2 component defines the INSERT statement that adds the configuration row to the working table.
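The following Java sketch builds an INSERT statement of the kind cSetBody_2 could set. It is a hypothetical illustration, not the template's statement. It stores sys.fn_cdc_get_max_lsn() as the starting point, in line with the description of this step. The parameter values would come from the discovery query.

```java
/** Hypothetical sketch of the INSERT that cSetBody_2 could set; columns follow dbo.talend_cdc_consumer. */
final class ConfigInsert {
    private ConfigInsert() {}

    static String forTable(int cdcConsumerId, String description, String captureInstance) {
        return "INSERT INTO dbo.talend_cdc_consumer "
             + "(cdc_consumer_id, [description], capture_instance, last_start_lsn, last_seqval, date_last_consumed) "
             + "VALUES (" + cdcConsumerId + ", "
             + sqlString(description) + ", "
             + sqlString(captureInstance) + ", "
             // Start from the current maximum LSN, so only later changes are consumed.
             + "sys.fn_cdc_get_max_lsn(), NULL, GETUTCDATE())";
    }

    /** T-SQL N'...' literal with single quotes doubled. */
    private static String sqlString(String value) {
        return "N'" + value.replace("'", "''") + "'";
    }
}
```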


The cMessagingEndpoint_6 component runs the statement and inserts the entry into the working table using the Apache Camel jdbc component.


Configuration available and CDC tracking


When a configuration is available for the monitored CDC table, the Route checks whether SQL Server has recorded new changes. If it has not, the Route
stores the new latest LSN value in the working table; otherwise, it runs the extraction query and sends all changes to the message broker as JSON
messages.

The cDirect_2 component receives the exchange when the configEvaluation property is 1 and is the entry point for retrieving new changes.


The cSetBody_3 component sets the SQL statement that checks for new changes in the monitored table, identified by the table_name exchange
property.

The cMessagingEndpoint_7 component runs the query and returns a result set that indicates whether new changes exist, without the change rows
themselves. It is configured with the Camel jdbc component on the Advanced settings tab.
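One possible shape for the cSetBody_3 query returns the stored LSN together with the current maximum LSN, so that the next component can compare them. It is sketched here as a Java string builder. This is an assumption, not the template's actual statement, and the class name is hypothetical.

```java
/** Hypothetical sketch of a "new changes?" check that cSetBody_3 could set. */
final class NewChangesQuery {
    private NewChangesQuery() {}

    static String forCaptureInstance(String captureInstance) {
        String literal = "N'" + captureInstance.replace("'", "''") + "'";
        return "SELECT c.cdc_consumer_id, c.last_start_lsn, "
             + "sys.fn_cdc_get_max_lsn() AS max_lsn "
             + "FROM dbo.talend_cdc_consumer AS c "
             + "WHERE c.capture_instance = " + literal;
    }
}
```

sys.fn_cdc_get_max_lsn() is database-wide. A higher maximum LSN therefore does not guarantee that this particular table has changed, which matches the empty-result case that the Route handles later.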


The cProcessor_3 component sets several exchange properties used later in the message exchange. One of them, cdc_ready, indicates whether the query
found new changes.
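If cdc_ready is derived by comparing the stored LSN with the current maximum LSN (an assumption about the template's cProcessor_3 code), the comparison must treat the BINARY(10) values as unsigned bytes, as SQL Server does. Java's byte type is signed, so the following sketch uses Arrays.compareUnsigned, available in Java 9 and later.

```java
import java.util.Arrays;

/** Hypothetical sketch of a cdc_ready computation based on LSN comparison. */
final class CdcReady {
    private CdcReady() {}

    /** Compares two BINARY(10) LSNs byte by byte as unsigned values. */
    static int compareLsn(byte[] a, byte[] b) {
        return Arrays.compareUnsigned(a, b);
    }

    /** 1 when the maximum LSN is past the stored one (changes may exist), otherwise 0. */
    static int cdcReady(byte[] lastStartLsn, byte[] maxLsn) {
        if (maxLsn == null) {
            return 0; // Defensive: no LSN available to compare against.
        }
        if (lastStartLsn == null) {
            return 1;
        }
        return compareLsn(maxLsn, lastStartLsn) > 0 ? 1 : 0;
    }
}
```

A signed comparison would wrongly rank a byte of 0x80 below 0x7F, so the stored LSN could appear newer than the current one.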


The cMessageRouter_2 component routes the message according to the cdc_ready value, that is, according to whether new changes were found, and
delivers it to the corresponding branch.


When cdc_ready is 0, the flow follows the branch called NoCDC. In this case, an UPDATE statement writes the latest LSN value from SQL Server into the
corresponding entry of the working table. The cSetBody_4 and cMessagingEndpoint_8 components handle this step.
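A hypothetical Java sketch of the UPDATE statement for this branch follows. Its optional last_seqval argument also covers the update performed after changes are sent. The LSN and sequence values are passed as trusted SQL expressions or 0x literals produced by the Route, not as user input.

```java
/** Hypothetical sketch of the working-table UPDATE (NoCDC branch and after changes are sent). */
final class ConfigUpdate {
    private ConfigUpdate() {}

    /**
     * @param captureInstance capture instance whose row is updated, for example production_brands
     * @param newLastLsn      trusted SQL expression or 0x literal for the new last_start_lsn
     * @param lastSeqval      trusted 0x literal for last_seqval, or null to leave the column unchanged
     */
    static String forCaptureInstance(String captureInstance, String newLastLsn, String lastSeqval) {
        StringBuilder sql = new StringBuilder("UPDATE dbo.talend_cdc_consumer SET last_start_lsn = ")
            .append(newLastLsn);
        if (lastSeqval != null) {
            sql.append(", last_seqval = ").append(lastSeqval);
        }
        sql.append(", date_last_consumed = GETUTCDATE() WHERE capture_instance = N'")
           .append(captureInstance.replace("'", "''"))
           .append('\'');
        return sql.toString();
    }
}
```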


When cdc_ready is 1, the flow follows the branch called CDCReady. In this case, cSetBody_5 sets a query that calls the SQL Server CDC function, and
cMessagingEndpoint_9 executes it to retrieve the new changes for the monitored table.
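For each capture instance, SQL Server generates a query function named cdc.fn_cdc_get_all_changes_<capture_instance>. The following Java sketch builds a call to it. Starting the range at sys.fn_cdc_increment_lsn of the stored LSN avoids reading the last consumed change twice. The class name and validation patterns are hypothetical, and the template's actual query may differ.

```java
import java.util.regex.Pattern;

/** Hypothetical sketch of a CDCReady query that cSetBody_5 could set. */
final class ChangesQuery {
    private ChangesQuery() {}

    // The capture instance becomes part of a function name, so accept plain identifiers only.
    private static final Pattern IDENTIFIER = Pattern.compile("[A-Za-z_][A-Za-z0-9_]*");
    // BINARY(10) literal: 0x followed by 20 hex digits.
    private static final Pattern LSN_LITERAL = Pattern.compile("0x[0-9A-Fa-f]{20}");

    static String forCaptureInstance(String captureInstance, String lastStartLsnLiteral) {
        if (!IDENTIFIER.matcher(captureInstance).matches()) {
            throw new IllegalArgumentException("Unexpected capture instance name: " + captureInstance);
        }
        if (!LSN_LITERAL.matcher(lastStartLsnLiteral).matches()) {
            throw new IllegalArgumentException("Unexpected LSN literal: " + lastStartLsnLiteral);
        }
        // From: first LSN after the one already consumed. To: current maximum LSN.
        return "SELECT * FROM cdc.fn_cdc_get_all_changes_" + captureInstance + "("
             + "sys.fn_cdc_increment_lsn(" + lastStartLsnLiteral + "), "
             + "sys.fn_cdc_get_max_lsn(), N'all') "
             + "ORDER BY __$start_lsn, __$seqval";
    }
}
```

Build this query only when cdc_ready is 1. The function raises an error if the start LSN is after the end LSN or outside the capture instance's valid range.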


When the CDC function returns rows, the message body contains all the information about each change. The cProcessor_1 component inspects the result
set and sets the current value as the message body. Results are sent to the ActiveMQ broker only when the result set is not empty.
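The template's JSON layout is not shown here. As a hypothetical sketch, the serialization done before sending could map each change row to a flat JSON object. The __$operation codes used below are the ones documented for the SQL Server CDC query functions: 1 for delete, 2 for insert, and 3 and 4 for the before and after images of an update. BINARY values such as __$start_lsn are rendered as 0x hex strings.

```java
import java.util.Map;

/** Hypothetical sketch: serializes one CDC change row to a flat JSON object. */
final class ChangeToJson {
    private ChangeToJson() {}

    /** Meaning of the __$operation column returned by the CDC query functions. */
    static String operationName(int operation) {
        switch (operation) {
            case 1: return "delete";
            case 2: return "insert";
            case 3: return "update_before";
            case 4: return "update_after";
            default: throw new IllegalArgumentException("Unknown __$operation: " + operation);
        }
    }

    /** Column name -> value; the map's iteration order is kept in the output. */
    static String toJson(Map<String, Object> row) {
        StringBuilder sb = new StringBuilder("{");
        boolean first = true;
        for (Map.Entry<String, Object> entry : row.entrySet()) {
            if (!first) {
                sb.append(',');
            }
            first = false;
            sb.append(quote(entry.getKey())).append(':');
            Object value = entry.getValue();
            if (value == null) {
                sb.append("null");
            } else if (value instanceof Number || value instanceof Boolean) {
                sb.append(value); // assumes finite numeric values
            } else if (value instanceof byte[]) {
                sb.append(quote(hex((byte[]) value))); // for example __$start_lsn, __$seqval
            } else {
                sb.append(quote(value.toString()));
            }
        }
        return sb.append('}').toString();
    }

    private static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder("0x");
        for (byte b : bytes) {
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    /** JSON string literal with the mandatory escapes. */
    private static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"': sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.append('"').toString();
    }
}
```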


The cMessageRouter_3 component dispatches the exchange. When the result set is empty (no changes captured), the exchange is directed to route15,
and the Route updates the working table. When the result set is not empty, the exchange is directed to the message broker based on the goToJSM
exchange property.


The cSetBody_6 and cMessagingEndpoint_10 components update the entry in the working table.


Finally, the Route sets the message header and sends the values to the broker. The cSetHeader_1 component sets the CamelJmsDestinationName header,
which declares the name of the destination queue on the broker. The cJMS_1 component then pushes the message to the broker.
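Setting CamelJmsDestinationName lets a single cJMS_1 component send each table's changes to its own queue. The following plain-Java sketch shows the header map a processor could produce. The cdc.<capture_instance> queue naming convention is hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of the headers set before cJMS_1 sends the message. */
final class JmsDestination {
    private JmsDestination() {}

    /** Header read by the Camel JMS component to choose the destination dynamically. */
    static final String DESTINATION_HEADER = "CamelJmsDestinationName";

    /** Hypothetical convention: one queue per capture instance, for example cdc.production_brands. */
    static String queueFor(String captureInstance) {
        return "cdc." + captureInstance;
    }

    static Map<String, Object> headersFor(String captureInstance) {
        Map<String, Object> headers = new HashMap<>();
        headers.put(DESTINATION_HEADER, queueFor(captureInstance));
        return headers;
    }
}
```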


Example
This section describes how to create a sample database and capture its changes with CDC in Microsoft SQL Server. The example uses the SQL Server
BikeStores sample database.

1. Download the BikeStores scripts and create the sample database as described on the SQLServerTutorial.net website.
2. Create the consumer configuration table for managing the CDC using the following script:
-- BikeStores.dbo.talend_cdc_consumer definition

-- Drop table
-- DROP TABLE BikeStores.dbo.talend_cdc_consumer
-- GO

USE BikeStores
GO

CREATE TABLE dbo.talend_cdc_consumer (
cdc_consumer_id int NOT NULL,
description varchar(200) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,
capture_instance sysname COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,
last_start_lsn binary(10) NULL,
last_seqval binary(10) NULL,
date_last_consumed datetime NULL,
CONSTRAINT PK_cdc_consumer PRIMARY KEY (cdc_consumer_id),
-- This constraint already creates a unique index named UQ_cdc_consumer,
-- so a separate CREATE UNIQUE INDEX with the same name would fail.
CONSTRAINT UQ_cdc_consumer UNIQUE (description, capture_instance)
)
GO

3. Enable CDC on SQL Server using the following SQL script:


USE BikeStores
EXEC sys.sp_cdc_enable_db

4. Enable CDC control on the selected table using the following SQL script:
USE BikeStores
EXEC sys.sp_cdc_enable_table
@source_schema = N'production',
@source_name = N'brands',
@role_name = NULL,
--@filegroup_name = N'BikeStores_CT',
@supports_net_changes = 1


5. Run the Routes from Talend Studio.

6. When the Route is up and running, a message informs you that the corresponding Route for the table is operative.

7. Check the new configuration row in the consumer working table using the following query:
select * from dbo.talend_cdc_consumer tcc


8. Prepare an Apache ActiveMQ broker to receive the messages pushed when changes are captured, and point your browser to its administration page,
for example, http://localhost:8161/admin/index.jsp.

9. To simulate INSERT, UPDATE, and DELETE changes, run the following SQL script:
--INSERT
USE BikeStores
SET IDENTITY_INSERT production.brands ON
INSERT INTO production.brands(brand_id,brand_name) VALUES(10,'Cube')
SET IDENTITY_INSERT production.brands OFF
SELECT * FROM production.brands b

--UPDATE
USE BikeStores
UPDATE production.brands SET brand_name = N'Cube Bikes' WHERE brand_id = 10
SELECT * FROM production.brands b

--DELETE
USE BikeStores
DELETE FROM production.brands WHERE brand_id = 10
SELECT * FROM production.brands b


10. Point your browser to the Broker management page and select Queues. A queue is created with three JSON-like messages that describe the
action taken in the database.


Reference
For more information, see About Change Data Capture (SQL Server) in the Microsoft documentation.
