More Data,
Analytics-Ready
Accelerate Data Delivery to Data Lakes,
Data Warehouses, Streaming and Cloud Architectures
INTRODUCTION
Integrating all required data at scale can overburden your already taxed
IT team. There’s complex manual coding (often varying by platform
type) and procedures that disrupt production sources. And data
architects and database administrators (DBAs) are already struggling to
efficiently execute and track replication across the enterprise. Without
the right tools, it’s nearly impossible for them to efficiently manage the
hundreds or potentially thousands of integration tasks that your initiatives entail.
Here’s the rub: modern data requirements break traditional data integration tools. But modern data
requirements don’t break our Qlik Replicate™ solution. That’s why it’s an ideal modern data integration
solution for efficiently delivering more data, ready for agile analytics, to a diverse range of data lake,
data warehouse, streaming, and cloud architectures.
Qlik Replicate, formerly Attunity Replicate, modernizes your environment by moving data
at high speed across all major source and target platforms with a single “Click to Load”
interface that completely automates the end-to-end replication process. Our software
gives your administrators and data architects a way to easily configure, control, and monitor bulk loads
and real-time updates with enterprise-class change data capture (CDC) capabilities. This enables
instantaneous database replication of changed data to the target. And our zero-footprint CDC
eliminates any risk of production impact.
Our software accelerates both heterogeneous and homogeneous data replication, and controls data
flow across hybrid multi-platform environments. It supports all major databases, including Oracle,
Microsoft SQL Server, and IBM DB2. Beyond transactional database support, Replicate integrates with
major analytics platforms, including Micro Focus Vertica, IBM PureData System for Analytics (formerly
Netezza), Microsoft Azure Synapse Analytics, Oracle Exadata, and Teradata, as well as Hadoop
distributions from Cloudera and Azure HDInsight and streaming systems such as Apache Kafka.
Replicate leverages native utilities and APIs to deliver fast, optimized, and secure data capture and loading.
Our solution showcases deep partnerships and broad product integration with industry
leaders. It supports all major source and target systems for data replication, including
relational database, data warehouse, data lake, Hadoop, cloud, and mainframe platforms.
It also supports MongoDB as a NoSQL target and writes CDC as messages to all major streaming
platforms. See Appendix for a detailed list of supported source and target platforms.
With its multi-server, multi-task, and multi-threaded architecture, you can scale your
on-premises and cloud implementations to thousands
of distributed servers and data centers worldwide. The Qlik Replicate architecture comprises three
domains: sources (databases, etc.), replication server(s), and targets (databases, data warehouses,
data lakes, cloud, etc.). Its key architectural capabilities are described in the sections that follow.
Full-Load Replication
With full-load replication, our Qlik Replicate software takes all the tables from the source
and creates copies at the target. Then, it automatically defines metadata required by the
target and populates the tables with data from the source.
Your data is loaded into one or multiple tables to improve efficiency. Although source tables may be
subject to update activity during the full-load process, there’s no need to stop applications in the source.
Our unique CDC process automatically activates when table loading starts. However, changes are not
applied to the target until after loading is complete. And although data on the target may not be
consistent while the load is active, at completion, the target has full data consistency and integrity.
If you need to, you can interrupt the loading process. When it restarts, our software will continue
where it stopped. You can add new tables to an existing target without reloading existing tables.
Similarly, you can add or drop columns in previously populated target tables without reloading.
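To make this interrupt-and-resume behavior concrete, here is a minimal Python sketch of a
checkpointed table load. It is purely illustrative: the table schema, batch size, and in-memory
checkpoint store are assumptions, not Qlik Replicate internals.

import sqlite3

BATCH = 1000

def full_load(source, target, table, checkpoint):
    """Copy `table` in keyed batches, recording progress in `checkpoint`
    so an interrupted load can continue where it stopped (illustrative)."""
    last_id = checkpoint.get(table, 0)
    while True:
        rows = source.execute(
            f"SELECT id, payload FROM {table} WHERE id > ? ORDER BY id LIMIT ?",
            (last_id, BATCH),
        ).fetchall()
        if not rows:
            break
        target.executemany(
            f"INSERT INTO {table} (id, payload) VALUES (?, ?)", rows)
        target.commit()
        last_id = rows[-1][0]
        checkpoint[table] = last_id  # a real system would persist this durably

In a production design the checkpoint would be stored durably, so even a restart of the loader itself
picks up from the last committed batch.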
Change Data Capture (CDC)
Our CDC process copies updates as they occur in the source data or metadata and
applies them to the target endpoint in real time. With Qlik Replicate CDC, you can move
large volumes of data changes into target databases or cloud environments with
efficiency and ease—at speed.
• Log-based Capture
Our CDC reads the recovery log file of the source endpoint's database management system and
groups together entries for each transaction. Its processing techniques ensure efficiency without
adding to target data latency. If the CDC process can't apply the changes to the target in a
reasonable period (e.g., when the target isn't accessible), it buffers the changes on the replication
server for as long as needed. There's no rereading of source database logs, which could otherwise
take hours or days.
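As a rough illustration of this grouping-and-buffering behavior, consider the Python sketch below. It
is not Qlik Replicate code; the log-entry format and operation names are hypothetical.

from collections import defaultdict, deque

pending = defaultdict(list)  # open transactions: txn_id -> captured changes
buffered = deque()           # committed transactions awaiting apply

def on_log_entry(entry):
    """Group recovery-log entries per transaction (hypothetical format)."""
    txn = entry["txn_id"]
    if entry["op"] == "commit":
        buffered.append(pending.pop(txn, []))  # hand off the whole transaction
    elif entry["op"] == "rollback":
        pending.pop(txn, None)                 # uncommitted work is discarded
    else:
        pending[txn].append(entry)             # insert/update/delete record

def drain(target_is_reachable, apply_txn):
    # Committed changes wait on the replication server until the target can
    # accept them; the source log is never reread.
    while target_is_reachable() and buffered:
        apply_txn(buffered.popleft())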
• Query-based Capture
When log-based capture isn't available, our software queries the source tables using
context columns, such as TIMESTAMP, to identify and capture changes efficiently from source
enterprise data warehouse platforms.
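Query-based capture can be pictured with a short sketch like the one below, assuming a
hypothetical table whose final column is a last_updated TIMESTAMP context column.

import sqlite3

def capture_changes(conn, table, high_water_mark):
    """Fetch only rows modified since the previous poll (illustrative only)."""
    rows = conn.execute(
        f"SELECT * FROM {table} WHERE last_updated > ? ORDER BY last_updated",
        (high_water_mark,),
    ).fetchall()
    # Advance the mark to the newest timestamp seen; we assume here that
    # last_updated is the final column of each row.
    new_mark = rows[-1][-1] if rows else high_water_mark
    return rows, new_mark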
Our advanced CDC technology offers several options for delivering data to targets:
• Transactional CDC
This is for standard database targets where transactional consistency is more important than
high performance. Qlik Replicate streams changes on a transaction-by-transaction basis to
maintain transactional integrity at any point in time.
For example, our agentless endpoints for IBM DB2 for z/OS and IBM DB2 for iSeries deliver significant
optimizations to improve performance and reduce footprints when capturing changes from these
platforms. Lab tests demonstrate reductions of 85 percent in source MSU (million service units), 75
percent in replication latency, and 95 percent in loading time.
Our software is designed for maximum flexibility. The transaction log reader can be installed either on
the replication server to achieve a zero-footprint impact or on the source database server. As a result,
your users can filter source rows on either the source database or replication server.
Filtering
Whenever filtering conditions are defined on the values of one or more source
columns, our solution discards irrelevant rows and columns before they’re replicated
to the target database. This may occur, for example, when a column is not present in
the target database schema, or when a row doesn’t pass a user-defined predicate on the
source tables.
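A toy Python sketch of this behavior, with hypothetical column names and an example predicate,
might look like the following.

KEEP_COLUMNS = {"id", "region", "amount"}  # columns present in the target

def row_passes(row):
    return row["region"] == "EMEA" and row["amount"] > 0  # example predicate

def filter_for_target(rows):
    """Drop irrelevant rows and columns before anything is replicated."""
    for row in rows:
        if row_passes(row):
            yield {k: v for k, v in row.items() if k in KEEP_COLUMNS}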
Transformations
There may be circumstances where the data to be included in the replicated tables isn’t
an exact copy of the source data. When this happens, our Qlik Replicate solution
allows your users to define and automatically apply those changes to tables and
columns. Examples include:
• Changing the data type and/or the length of any target column.
Our software performs data type transformations as required, calculating the values of computed
fields and applying the changes as one transaction to the target. When no user-defined
transformation is set but replication runs between heterogeneous databases, some translation
between different database data types may still be required. In these cases, our solution
automatically handles the required transformations and computations during the load or CDC
execution.
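For intuition, a heterogeneous pair such as Oracle to SQL Server needs a type map along these
lines. The mapping below is an invented subset for illustration, not Qlik's actual conversion table.

ORACLE_TO_SQLSERVER = {              # illustrative subset only
    "NUMBER(10,0)": "INT",
    "VARCHAR2(255)": "NVARCHAR(255)",
    "DATE": "DATETIME2",
}

def target_type(source_type, override=None):
    """A user-defined transformation wins; otherwise fall back to the map."""
    if override:
        return override
    return ORACLE_TO_SQLSERVER.get(source_type, "NVARCHAR(MAX)")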
Our solution separates data and metadata into different topics, allowing for smaller
data messages and easier integration with major streaming services such as
Apache Kafka, Confluent, Microsoft Azure Event Hubs, and Amazon Kinesis.
Flexible message formats such as JSON and Avro enable easier integration of
metadata into various schema registries.
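As a sketch of what separate data and metadata topics can look like, the snippet below publishes
JSON messages with the kafka-python client. The topic names and message shapes are illustrative
assumptions, not Replicate's actual wire format.

import json
from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

# A schema change goes to the metadata topic...
producer.send("orders.metadata", {"table": "ORDERS", "op": "add-column",
                                  "column": "discount", "type": "DECIMAL(10,2)"})
# ...while row-level changes travel on the (smaller) data topic.
producer.send("orders.data", {"table": "ORDERS", "op": "update",
                              "key": {"order_id": 42}, "after": {"discount": 1.5}})
producer.flush()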
Having geographically distributed business units can require data storage off-
premises or in the cloud. From every location, each group needs timely access to
its own subset of data. Our Qlik Replicate software solves this challenge by providing an innovative,
highly secure, and resilient wide-area network (WAN) transfer engine that optimizes transfer speeds to
target databases based on available bandwidth. Our algorithms compress large tables before
they’re transferred, making the most of that bandwidth.
Since network outages and other unpredictable events can impact data flow to and from the cloud as
well as other remote data repositories, our solution offers seamless recovery from interrupted
transfers, resuming from the exact point of failure. Our solution first stages all source data in files
located in a temporary
target directory. It then moves files to the target directory and validates content between the source and
target files. After successful validation, it loads the data into the target database.
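The stage-validate-promote pattern described here can be sketched as follows; the read_source
callable, the paths, and the SHA-256 validation scheme are assumptions for illustration.

import hashlib, os, shutil

def resume_transfer(read_source, staging_path):
    """Append from the last complete byte, restarting at the point of failure."""
    offset = os.path.getsize(staging_path) if os.path.exists(staging_path) else 0
    with open(staging_path, "ab") as out:
        for chunk in read_source(start=offset):
            out.write(chunk)

def promote(staging_path, final_path, expected_sha256):
    """Validate the staged file, then move it into the target directory."""
    with open(staging_path, "rb") as f:
        digest = hashlib.sha256(f.read()).hexdigest()
    if digest != expected_sha256:
        raise IOError("staged file failed validation; retry the transfer")
    shutil.move(staging_path, final_path)  # only validated data gets loaded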
Security Capabilities
Our solution addresses security issues related to data transfer to the cloud by
establishing a three-level, secure data transfer mechanism:
3. Files are secured during transfer using advanced, NSA-approved (AES-256) encryption.
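To illustrate the kind of encryption named here, the sketch below encrypts a file with AES-256 in
GCM mode using the Python cryptography package; it is an independent example, not Qlik's
implementation.

import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

key = AESGCM.generate_key(bit_length=256)  # 256-bit AES key
aesgcm = AESGCM(key)

def encrypt_file(path):
    nonce = os.urandom(12)                 # unique nonce per message
    with open(path, "rb") as f:
        ciphertext = aesgcm.encrypt(nonce, f.read(), None)
    return nonce + ciphertext              # nonce travels with the data

def decrypt_blob(blob):
    return aesgcm.decrypt(blob[:12], blob[12:], None)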
Through our Qlik Enterprise Manager (formerly Attunity Enterprise Manager) solution, you
gain efficient, high-scale data replication for initiatives such as data lake consolidation.
Our software ensures centralized control of Qlik Replicate tasks and data flow across distributed
environments, enabling your enterprise to scale easily and monitor thousands of integration tasks in
real time through KPIs and alerts. With our Manager, you can monitor distributed Replicate servers
across multiple data centers and control data flow across distributed environments, on premises and in
the cloud, from a single console.
By organizing tasks, whether by specific application or even by physical location, you can incorporate
the enterprise business logic that your regulations mandate. Granular searching and filtering
capabilities offer actionable insight into data
loading and task status. By drilling down from the main console, you can view current task status to
identify and remediate issues, so you continue to meet performance service-level agreements (SLAs).
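Qlik Enterprise Manager exposes its monitoring data programmatically; the sketch below polls for
failing tasks against a purely hypothetical REST endpoint and payload, so consult the product's API
documentation for the real interface.

import requests

BASE = "https://qem.example.com/api/v1"  # hypothetical base URL

def failing_tasks(session):
    tasks = session.get(f"{BASE}/tasks").json()  # hypothetical endpoint
    return [t for t in tasks if t.get("state") == "ERROR"]

with requests.Session() as s:
    s.headers["Authorization"] = "Bearer <token>"  # placeholder auth
    for task in failing_tasks(s):
        print(task["server"], task["name"], task.get("latency"))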
CONCLUSION
Our mission is to change data integration by enabling your IT staff to deliver more data, ready for
analytics, to data lakes, data warehouses, streaming, and cloud architectures. Unlike the traditional,
batch-oriented, and inflexible ETL approaches of the last decade, our solutions provide the modern,
real-time architecture enterprises like yours require to harness the agility and efficiencies of new data
lake, data warehouse, and cloud offerings. If you’re seeking a single solution to improve data delivery
for your agile analytics initiatives, consider our Qlik Replicate solution.