Transform Your Data Pipelines, Transform Your Business

3 ways to get started
Data is the lifeblood of today’s enterprises—especially when it’s used in real time. It helps organizations
uncover insights that inspire innovations, deliver personalized experiences, and operate more intelligently and
efficiently. However, in many companies, data is siloed, fragmented, and stored in multiple formats across
numerous legacy and cloud-based systems. In fact, 60% of tech leaders say that difficulty integrating multiple
data sources is their biggest hurdle to accessing more real-time data.
To make data more accessible, most IT organizations centralize as much information as possible. They typically use point-to-point data pipelines to move data between operational databases and a centralized data warehouse or lake. For example, extract, transform, and load (ETL) pipelines ingest data, transform it in periodic batches, and later push it downstream to an analytical data warehouse. ETL pipelines (and reverse ETL pipelines) can also share the results of data analysis that takes place in the warehouse back to operational databases and applications.

In this guide, you’ll learn about the challenges that come with legacy data pipelines and the benefits of adopting streaming pipelines. You’ll also discover how four organizations are using streaming data pipelines from Confluent, and how these pipelines could transform your business, too.
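To make the batch pattern concrete before the next section looks at its limits, here is a minimal sketch of the kind of periodic job a traditional ETL pipeline runs. The table names and schedule are hypothetical: an external scheduler triggers the step hourly or nightly, so the warehouse is only as fresh as the last run.

    -- Hypothetical nightly ETL step, triggered by an external scheduler.
    -- Between runs, the warehouse drifts out of date.
    INSERT INTO warehouse.daily_orders (order_date, region, total_amount)
    SELECT o.order_date, c.region, SUM(o.amount)
    FROM staging.orders o
    JOIN staging.customers c ON o.customer_id = c.customer_id
    WHERE o.order_date = CURRENT_DATE
    GROUP BY o.order_date, c.region;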
Why traditional data pipelines don’t scale

Today’s enterprises often manage numerous point-to-point data pipelines, which are challenging to maintain. A growing number of IT organizations are concluding that their pipeline approaches to sharing actionable data simply don’t scale.
Unlike traditional pipelines, streaming data pipelines can be designed using declarative languages such as SQL, which specify the logic of what needs to happen while abstracting away the low-level operational details. This approach helps maintain the delicate balance between centralized, continuous data observability, security, policy management, and compliance standards on the one hand, and the need to make data easily searchable and discoverable on the other, so that developers and engineers can innovate faster.
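As a minimal sketch of what this looks like in practice, the pipeline below is written in ksqlDB-style SQL (one of the declarative options Confluent offers). The topics, columns, and join logic are hypothetical, chosen only to illustrate the pattern: the query is declared once and then runs continuously, with the runtime handling partitioning, state, and fault tolerance.

    -- Declare a stream over an existing Kafka topic of order events.
    CREATE STREAM orders (
      order_id VARCHAR KEY,
      customer_id VARCHAR,
      amount DOUBLE
    ) WITH (KAFKA_TOPIC = 'orders', VALUE_FORMAT = 'JSON');

    -- Declare a table of customer reference data, keyed by customer ID.
    CREATE TABLE customers (
      customer_id VARCHAR PRIMARY KEY,
      region VARCHAR
    ) WITH (KAFKA_TOPIC = 'customers', VALUE_FORMAT = 'JSON');

    -- The pipeline itself: a continuously running enrichment query that
    -- joins each new order against the latest customer record.
    CREATE STREAM enriched_orders AS
      SELECT o.order_id, o.amount, c.region
      FROM orders o
      JOIN customers c ON o.customer_id = c.customer_id
      EMIT CHANGES;

Once created, enriched_orders runs without any batch scheduling: every new order event is joined against the latest customer record and written downstream as it arrives.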
Getting started with streaming data pipelines

Next-gen data lifecycle with Confluent
Confluent makes it easy and cost-effective to build streaming data pipelines and to
evolve them as business and data needs change. Using Confluent, you can build and
deploy modern data flows in five simple steps:
1. Connect: Create and manage data flows with an easy-to-use visual user interface and pre-built connectors (see the sketch after this list for a declarative alternative).

2. Govern: Centrally manage, tag, audit, and apply policies for trusted, high-quality data streams.

3. Enrich: Use SQL to combine, aggregate, clean, process, and shape data in real time, increasing the safety, efficiency, and usability of your data streams to power operational, analytical, and business intelligence use cases.

4. Build: Prepare well-formatted, trustworthy data products for downstream systems and apps.

5. Share: Collaborate securely on live streams with self-service data discovery and sharing.

Streaming data pipelines can capture changes to data in real time, enrich them on the fly, and send them to downstream systems. Teams can find, browse, create, share, and reuse data, wherever and whenever it’s needed.
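Step 1 need not go through the visual interface; connectors can also be declared in SQL. Below is a hedged sketch of what that might look like in ksqlDB with a pre-built JDBC source connector, assuming a hypothetical PostgreSQL database; the host, credentials, and table are placeholders rather than a prescribed configuration.

    -- Hypothetical source connector: streams new rows from a PostgreSQL
    -- 'orders' table into a Kafka topic as they are committed.
    CREATE SOURCE CONNECTOR orders_source WITH (
      'connector.class'          = 'io.confluent.connect.jdbc.JdbcSourceConnector',
      'connection.url'           = 'jdbc:postgresql://db.example.com:5432/shop',
      'connection.user'          = 'replicator',
      'connection.password'      = '<secret>',
      'topic.prefix'             = 'shop-',
      'table.whitelist'          = 'orders',
      'mode'                     = 'incrementing',
      'incrementing.column.name' = 'order_id'
    );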
Three key use cases for streaming data pipelines
In this section, we’ll outline the real-world stories of how four organizations are using streaming pipelines
from Confluent to deliver rich front-end customer experiences and real-time back-end operations.
Use case #2: Data warehouse pipelines

A data warehouse, whether it’s on-premises or in the cloud (examples include Databricks, BigQuery, and Redshift), is only as good as the information it contains.

Unlocking real-time insights requires a streaming architecture that’s continuously ingesting, processing, and provisioning data in real time. With Confluent, you can build streaming data pipelines from hybrid and multicloud data sources to the cloud data warehouse of your choice, unlocking real-time analytics and decision-making while reducing total cost of ownership (TCO) and time to value (TTV).

AN IN-DEPTH LOOK INTO DATA WAREHOUSE PIPELINES:
Toolstation takes a modern approach to data streaming in the cloud.

Toolstation is an omnichannel retailer of building tools and materials with more than 500 branches in the UK and more than 100 in the Netherlands, Belgium, and France. During the pandemic, the company struggled to scale click-and-collect fulfillment with its legacy MySQL database, which relied on multiple polling processes and batch jobs to stay up to date. With Confluent, the retailer used stream processing and connectors to gradually switch to data streaming, removing the limitation on click-and-collect orders. The update was tested, scaled, and moved into production in just six weeks.

Picnic processes more than 300 million unique events per week to power predictive analytics in its data warehouse.

Picnic is Europe’s fastest-growing online-only supermarket and relies heavily on data-driven decisions to provide the lowest-price guarantee to its customers. Its blazing growth drove the need for real-time processing and easy scalability of event data to power predictive analytics in its data warehouse. With Confluent, Picnic was able to use fully managed sink connectors to load data seamlessly from upstream systems into Snowflake and Amazon S3 for analysis by data science teams while reducing infrastructure costs by 40%.

Use case #3: Mainframe integration

Mainframe systems have demonstrated tremendous staying power. As important as these systems are, though, it can be difficult for today’s data-driven businesses to integrate them with other systems. Obtaining data from them in real time can require the development and ongoing maintenance of costly custom connectors.

Streaming data pipelines allow mainframe data to be accessed in real time without the complexity and expense of sending ongoing queries to mainframe databases. They give enterprises the capabilities they need to power cloud applications and systems with real-time data streams from mainframe systems while lowering consumption costs.

A PRACTICAL APPLICATION OF MAINFRAME INTEGRATION:
Amway unlocks mainframe data for new use cases.

With annual sales of more than $8 billion, Amway is the world’s largest direct-selling company. Its nutrition, beauty, personal care, and home products are sold in more than 100 countries. IT and business solutions teams within Amway placed a high priority on modernizing their digital footprint and moving fast to deliver needed solutions on tight schedules. The company’s legacy IT architecture, however, was not well aligned with this priority.

To solve this problem, Amway decided to use capabilities within Confluent, including ksqlDB, to accelerate streaming app development and build connectors to easily integrate data from legacy mainframes into modern systems. Combining cloud scalability with an event-driven architecture has allowed Amway to rapidly stand up new experiences, such as mobile selling, and plug them into the data mesh being built with Confluent.
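To ground the data warehouse use case, here is a hypothetical sketch of a fully managed Snowflake sink connector of the kind Picnic’s pipeline relies on, declared in ksqlDB and reusing the hypothetical enriched_orders stream from the earlier sketch. The topic, account URL, user, and key are placeholders, not any organization’s actual configuration.

    -- Hypothetical sink connector: continuously loads the enriched stream
    -- into a Snowflake database for analysis.
    CREATE SINK CONNECTOR warehouse_sink WITH (
      'connector.class'         = 'com.snowflake.kafka.connector.SnowflakeSinkConnector',
      'topics'                  = 'enriched_orders',
      'snowflake.url.name'      = 'https://<account>.snowflakecomputing.com',
      'snowflake.user.name'     = 'pipeline_user',
      'snowflake.private.key'   = '<private-key>',
      'snowflake.database.name' = 'ANALYTICS',
      'snowflake.schema.name'   = 'PUBLIC'
    );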
It’s time for streaming data pipelines
Integrate streaming data across apps and data systems to unlock unlimited real-time use cases

Today’s leading businesses run on real-time data. Point-to-point and batch-based data pipelines are no longer suitable for any organization that hopes to lead its sector. The status quo (batch-oriented, centralized, ungoverned, and inflexible) isn’t enough.
Streaming data pipelines will help you solve your data needs today, and they will serve
your business needs in the future. With streaming data pipelines from Confluent, you
can transform your business and meet ambitious goals such as these:
• Power all your use cases with high-quality, real-time data streams
• Bring new products to market faster with self-service data access
• Boost developer productivity by simplifying data flow development
and iteration
• Enable agile pipeline development to meet changing data needs
• Accelerate innovation while maintaining trust and governance
ABOUT CONFLUENT