Choose The Right Stream Processing Engine Whitepaper
Process Data Streams at the Speed of Business and at the Scale of IT

Business opportunities that directly impact revenue or boost operational efficiency need to be addressed in near real-time. Digital transformation initiatives and advancements in mobility, IoT, and streaming technologies have led to enterprises being inundated with data. Key business requirements determine how such high volumes of high-speed data should be processed in real time to provide actionable intelligence. This leads directly to IT having to evaluate which stream processing engine is best fit for purpose for their enterprise needs. Other determining factors include return on investment, the engine's dexterity to be applied across multiple use cases, and its level of maturity for enterprise-wide adoption.

There are a variety of ways to address data stream processing challenges. The solution comes down to the fundamental way in which the engine works and how your organization implements it. This paper examines three such engines:

• Kafka Streams;
• Spark Structured Streaming; and
• Flink.

This paper also highlights some of the capabilities that are key to any data streaming use case, such as:

• Watermarks to handle late and out-of-order delivery;
• Windowing semantics to structure the streams;
• Complex event processing; and
• Capabilities that enhance operational efficiency.

Cloudera offers all of the engines listed here, because we believe that you should use the best tool for the job. Sometimes that tool is a very simple one, but more often than not, you will need the advanced capabilities for your specific use cases.

[Figure: Cloudera Manager provides one view to manage all of your resources, including stream processing engines. Here we see Flink (1), Kafka (2), and Spark (3) resources in one comprehensive view. Source: Cloudera.]
Address Challenges Through Informed Decision Making

Global digitization has resulted in a vast array of new products and services with such high levels of convenience that it fuels a continuous loop of greater expectations for immediacy. Next-day delivery and real-time payments are demands driven by consumers at the point of service that then pressure downstream services to respond faster. Processing and analyzing billions of events per second across geographies is becoming an ordinary affair.

In response, technology teams have been pivoting from large monolithic database architectures to event-driven applications and microservices design as a way to reduce the inevitable latency of inputs and outputs across networks by bringing the state of an event closer to the application itself.

Central to this effort are modern data stream processing engines like Kafka Streams, Spark Structured Streaming, and Flink. Of these three, Flink is the oldest, while Kafka Streams is the newest. The Spark Structured Streaming community is large, while Flink's is growing rapidly. Knowledge of an engine's development community can help gauge how self-sufficient and productive your team can be.

The engine that is best for you depends on your organization's use cases, team makeup, and various technology and operational factors. This paper is meant to help you in that evaluation process.

Streaming Challenges

The following are reminders of streaming challenges you've undoubtedly had or will come across.

Event time and processing time — The chances that streaming events come in without any delays and with predictable patterns are low, because you can't control the myriad input sources that exist across collections of networks that vary in type and quality. Even with the very best networks and the fastest collection mechanisms, there will always be latency between the time an event happens in the real world (event time) and the time your system processes it (processing time).

Bounded and unbounded streams — Bounded streams have a beginning and an end, so it is easy to reason about time and correctly sort events, akin to batch. Unbounded streams are harder to reason about because, without an end, you don't know if another live event is yet to come. Calculations, aggregations, or pattern detection in unbounded streams are very tricky. To handle both scenarios, it is helpful to follow a "streaming first" principle (see sidebar) and to consider capabilities like watermarks to handle late and out-of-order events.

Simple and complex events — Complex events are derived from simple events that have been aggregated, patterned, and evaluated to trigger a response or present a result, often on data that continuously moves under your feet. Decisioning on unbounded streams requires the state of events to be stored and analyzed.

Stateless and stateful — Stream processing engines excel when analytics require a reassessment of events within the context of time. That is considered stateful, while stateless represents a self-contained fire-and-forget paradigm. There are acceptable trade-offs between stateless high-throughput engines and stateful engines that need to address aggregation, enrichment, and other requirements.
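The gap between event time and processing time is exactly what watermarks manage. As a minimal sketch in plain Python (illustrative only — `WatermarkTracker`, `max_delay`, and the numbers are invented here, not any engine's API), a bounded-out-of-orderness watermark trails the largest event time seen by a fixed allowance, and anything arriving behind it is flagged as late:

```python
class WatermarkTracker:
    """Bounded-out-of-orderness watermark: assume events arrive at most
    `max_delay` time units behind the largest event time seen so far."""

    def __init__(self, max_delay):
        self.max_delay = max_delay
        self.max_event_time = None

    def watermark(self):
        # Before any event arrives, everything counts as on time.
        if self.max_event_time is None:
            return float("-inf")
        return self.max_event_time - self.max_delay

    def observe(self, event_time):
        """Return True if the event is on time, False if it is late."""
        if self.max_event_time is None or event_time > self.max_event_time:
            self.max_event_time = event_time
        return event_time >= self.watermark()

# Events arrive in processing order; their event times are out of order.
wt = WatermarkTracker(max_delay=5)
assert wt.observe(100)      # first event: on time
assert wt.observe(104)      # watermark advances to 104 - 5 = 99
assert wt.observe(101)      # out of order, but within the allowed delay
assert not wt.observe(95)   # behind the watermark: late, handle separately
```

Engines differ in what they do with the late branch — drop, side-output, or recompute a window — which is where the degree of developer control discussed later in this paper matters.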
Decision Making Process: Technology and Operational Considerations

There is often an over-reliance on streaming benchmarks when choosing a stream processing engine. These benchmarks primarily concentrate on latency, throughput, and hardware utilization, neglecting functional requirements and the level of control developers possess when implementing a solution. Additionally, benchmarks often overlook crucial operational, staffing, and other nonfunctional criteria. The remainder of the paper outlines vital technological and operational factors necessary for making informed decisions and encouraging enterprise-wide adoption of the chosen solution.

The "Streaming First" Approach

Stream processing engines have followed different paths in their approach to solving unique time reasoning challenges.

Flink is a "streaming first" distributed system. This means that it has always focused on solving the difficult unbounded stream use cases over bounded stream and batch scenarios.

It turns out that algorithms that work on unbounded streams also work on bounded streams by treating the latter as a special case of the former. As a result, Flink addresses micro-batch use cases as well.

Technology Considerations
Flink, Kafka Streams, and Spark Structured Streaming are all stateful, but with slight differences. Having taken a "batch first" approach, Spark Structured Streaming handles events as micro-batches, and it excels when high throughput is necessary but low latency is not a big requirement. The two other stateful engines differ in how they store state: Kafka Streams depends on the Kafka ecosystem, while Flink provides more storage options. Both process messages one event at a time and are considered low-latency solutions.

Time Support — All of the stream processing engines described in this paper are able to distinguish event time from processing time. The nuance is in how much control you have to address some of the trickier use cases. Flink provides a great deal of control with capabilities such as watermarking and session windows (see sidebars).

Developmental Control

A common task in every data processing use case is to import data from one or multiple systems, apply transformations, and then export results to another system. Considering the ubiquitousness of streaming data applications, unified integration with machine learning, graph databases, and complex event processing is becoming more common.

Processing Abstraction — To help your engineering team stay productively focused on business logic instead of advanced streaming concepts, it's important to evaluate the stream processing engine's abstraction capabilities.

Spark Structured Streaming has a rich set of libraries to implement machine learning use cases. If you are already developing within a Spark ecosystem with frameworks such as MLlib, choosing Spark Structured Streaming as your stream processing engine will make for a much easier adoption.

Special attention should be paid to the engine's SQL abstraction. From an analytics democratization point of view, SQL abstraction is a very important basis for comparison. While many senior developers prefer sophisticated languages like Scala for intricate analytical tasks, the expressiveness and simplicity of SQL can get the job done more easily, and it is accessible to a wider range of developer, and even business, resources. This accessibility is a critical factor in enabling domain ownership of data apps.

When it comes to a comparison on the basis of SQL, the more standard the better. Flink has the most mature and production-tested open-source SQL-on-streams implementation and is fully ANSI compliant. Kafka's ksqlDB, formerly known as KSQL, has matured significantly and is now fully open-source; however, it is not ANSI-compliant nor as feature-rich as Flink's offering. Spark Structured Streaming SQL is well adopted and has ANSI compliance dating back to 2003.

Implementation and Beyond

Application development is only as good as its implementation. Below, we cite aspects that need to be considered to move beyond the idea and development stage.

Delivery Guarantee — This is a key factor to consider as it relates to your expectations of latency, throughput, correctness, and fault tolerance of message delivery.

At-least-once delivery ensures that a message is delivered at least once, although multiple delivery attempts may result in duplicates. This approach offers high performance and minimal overhead, since little delivery-tracking state is maintained. At-least-once delivery satisfies low-latency and high-throughput requirements while guaranteeing message delivery, but may not sufficiently address data duplication concerns.

Some applications, such as financial transactions, have stricter demands and may require that messages are received and processed exactly once. This requires retries to counter transport losses, which means keeping state at the sending end and having an acknowledgement mechanism at the receiving end. Exactly-once is optimal in terms of correctness and fault tolerance, but comes at the expense of added latency.

All the engines described in this paper provide an exactly-once delivery guarantee, though Kafka Streams is limited to the Kafka ecosystem and can't control downstream systems. Flink and Spark Structured Streaming guarantee exactly-once delivery from any replayable upstream source, and in some cases to downstream platforms as well, if those platforms have transactional support.

State Management — The aforementioned trade-off between an exactly-once delivery guarantee and the inevitable latency of state storage may drive the selection process based on the state management capabilities that come with the engine.

Kafka Streams is good for things that are Kafka-centric, because it tends to rely heavily on Kafka storage for state. Like Flink, it uses a local RocksDB, but it checkpoints the state as a Kafka topic, and that limits flexibility as to how you store and access the history of that state. Within a Kafka ecosystem, a good linear access mechanism is provided, making everything nice and tame. This works great for simple use cases, but it doesn't provide the flexibility and operational capabilities that some of the other engines do.
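The at-least-once versus exactly-once trade-off described above can be made concrete with a small plain-Python sketch (hypothetical names, not any engine's internals): if the receiving side remembers which message IDs it has already applied, duplicates caused by retries become harmless, and the price is exactly the extra state and bookkeeping the text mentions:

```python
class IdempotentReceiver:
    """Receiver-side deduplication: upstream retries may deliver a message
    more than once (at-least-once), but tracking processed message IDs
    makes the *effect* exactly-once, at the cost of keeping state."""

    def __init__(self):
        self.seen_ids = set()   # in a real engine this state is checkpointed
        self.balance = 0

    def handle(self, msg_id, amount):
        if msg_id in self.seen_ids:
            return False        # duplicate from a retry: ignore it
        self.seen_ids.add(msg_id)
        self.balance += amount
        return True

rx = IdempotentReceiver()
assert rx.handle("tx-1", 100)
assert not rx.handle("tx-1", 100)   # redelivered after a lost acknowledgement
assert rx.handle("tx-2", -30)
assert rx.balance == 70
```

The stream processing engines integrate this kind of deduplication and acknowledgement logic with their checkpointed state rather than leaving it to application code, which is why their state management capabilities matter to the delivery guarantee.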
Apache Flink provides more complete state management capabilities compared to Kafka Streams, offering features such as native support for different state backends, efficient checkpointing, and built-in fault tolerance mechanisms. While both Flink and Kafka Streams can handle stateful stream processing, Flink's state management system is designed to scale seamlessly across distributed environments, allowing for automatic state redistribution and recovery during rescaling or failures. This makes Flink particularly well-suited for large-scale, complex, and state-heavy streaming applications that require high levels of fault tolerance and flexibility in state management.

Leveraging Apache Flink's advanced checkpointing capabilities offers a multitude of operational advantages, such as seamless job versioning, effortless cluster migration, streamlined application and infrastructure upgrades, and the flexibility to transition workloads between cloud and on-premises environments. As a testament to its efficacy, this approach is increasingly being embraced by the wider stream processing engine community, further solidifying Flink's position as a trailblazer in stateful stream processing.
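The checkpoint-and-replay idea can be sketched in a few lines of plain Python (a toy, not Flink's distributed snapshot algorithm): snapshot the state together with the position in a replayable source, and on failure rewind to the snapshot and reprocess, so the final state reflects every record exactly once:

```python
import copy

class CheckpointedCounter:
    """Toy illustration of checkpoint-based fault tolerance: periodically
    snapshot state plus the stream position, and on failure rewind to the
    last snapshot and replay from a replayable source."""

    def __init__(self):
        self.counts = {}
        self.position = 0          # offset into the replayable source
        self._checkpoint = ({}, 0)

    def process(self, events):
        for key in events[self.position:]:
            self.counts[key] = self.counts.get(key, 0) + 1
            self.position += 1

    def checkpoint(self):
        self._checkpoint = (copy.deepcopy(self.counts), self.position)

    def recover(self):
        self.counts = copy.deepcopy(self._checkpoint[0])
        self.position = self._checkpoint[1]

source = ["a", "b", "a", "c", "a"]
job = CheckpointedCounter()
job.process(source[:3]); job.checkpoint()   # snapshot after three events
job.process(source)                          # suppose a crash follows...
job.recover()                                # ...rewind to the snapshot
job.process(source)                          # replay: no record counted twice
assert job.counts == {"a": 3, "b": 1, "c": 1}
```

Because the source is replayable and the position is saved atomically with the state, recovery re-derives exactly the state the stream implies — the property the fault tolerance discussion below depends on.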
Fault Tolerance/Resilience — The demand to mitigate operational disruption is so strong that the concept of "resiliency" attracts regulatory oversight across industries. Streaming architecture capabilities such as checkpointing, savepoints, redistribution, and state management (see above) are crucial to the stream processing engine selection process.

Spark Structured Streaming has built-in capabilities, while Kafka Streams requires you to "build your own", using ZooKeeper to replace a failed broker, for example.

Flink's fault tolerance mechanism uses checkpoints to draw consistent snapshots to which the system can fall back in case of a failure. The aforementioned state management capabilities ensure that even in the presence of failures, the program's state will eventually reflect every record from the data stream exactly once.

Window Semantics

Flink allows you to customize a window structure so you are not limited to pure linear time. By utilizing session windows, for instance, you can define windows based on gaps between events or the number of events. This offers considerable flexibility in assigning events to different windows before processing.

Windows are logically essential for analytics, as they provide the structure upon which analysis is based.

Two other windowing examples are tumbling windows (which slice a stream into even chunks) and sliding windows (which enable your aggregations and analytics to move with time).

[Figure: tumbling windows and sliding windows]
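The window types described in the sidebar can be sketched in a few lines of plain Python (illustrative only, not an engine API): session windows split on inactivity gaps, while tumbling windows slice time into fixed, non-overlapping chunks:

```python
def session_windows(event_times, gap):
    """Group sorted event timestamps into session windows: a new window
    starts whenever the quiet period between two events exceeds `gap`."""
    windows, current = [], []
    for t in event_times:
        if current and t - current[-1] > gap:
            windows.append(current)
            current = []
        current.append(t)
    if current:
        windows.append(current)
    return windows

# Events at these times with a 5-unit inactivity gap form two sessions.
assert session_windows([1, 2, 4, 20, 22], gap=5) == [[1, 2, 4], [20, 22]]

def tumbling_windows(event_times, size):
    """Assign each timestamp to a fixed, non-overlapping window of `size`,
    keyed by the window's start time."""
    out = {}
    for t in event_times:
        start = (t // size) * size
        out.setdefault(start, []).append(t)
    return out

assert tumbling_windows([1, 2, 4, 20, 22], size=10) == {0: [1, 2, 4], 20: [20, 22]}
```

Note that the session windows are data-driven (their boundaries depend on the events themselves), which is exactly the flexibility the sidebar describes; tumbling windows depend only on the clock.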
Kafka Streams doesn't provide a scheduler or full deployment framework out-of-the-box. While it provides efficient ways of writing simple applications, you are left to your own devices on how to launch, run, orchestrate, and operate those applications.

Community Maturity and Documentation — To ensure that your developer resources are self-sufficient and productive, the maturity of the developer community and the quality of documentation are very important aspects to consider.

Kafka Streams is relatively young, with very strong community growth and extensive documentation and examples. While the Spark Structured Streaming community is large and busy with extensive documentation and examples, they still need help from more reviewers and committers, something that Cloudera is helping to drive forward.

Flink is the fastest-growing community, with strong research and production deployments. Documentation and working examples are good and will broaden considerably as the community matures. "For the 5 year period from 2017-2022, Flink has been the most active community in terms of code commits and community discussion." Also, some of the biggest brand-name companies have already invested in large deployments of Flink for their real-time stream processing needs.

Complex Event Processing (CEP)

Processing real-time events and extracting information to identify more meaningful events, like understanding behaviors, probably ranks as one of the most interesting streaming use cases.

Flink's statefulness and window handling capabilities are the foundation on which advanced CEP is crafted. What makes Flink all the more compelling is that CEP is accessible to a wider range of developer resources through standard SQL abstraction.

For example, the MATCH_RECOGNIZE SQL statement can be very helpful when you are looking for patterns built up through sequences of events that can't be distinguished by simple counting methods.

The standard SQL abstraction of Flink makes it a compelling choice for use cases that require "simple" complex event processing.
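The kind of sequence pattern MATCH_RECOGNIZE expresses can be illustrated with a toy, non-streaming Python sketch that scans for a "V" shape — one or more falling steps followed by one or more rising steps, roughly PATTERN (DOWN+ UP+). Flink evaluates such patterns incrementally over unbounded streams; this eager version over a finite list is only for intuition:

```python
def find_v_patterns(prices):
    """Scan a sequence for a 'V' shape: one or more strictly falling steps
    followed by one or more strictly rising steps. Returns a list of
    (peak, trough, recovery) index triples."""
    matches = []
    i = 0
    while i < len(prices) - 1:
        start = i
        while i < len(prices) - 1 and prices[i + 1] < prices[i]:
            i += 1                      # the DOWN+ leg
        if i == start:                  # no falling step here: move on
            i += 1
            continue
        bottom = i
        while i < len(prices) - 1 and prices[i + 1] > prices[i]:
            i += 1                      # the UP+ leg
        if i > bottom:
            matches.append((start, bottom, i))
    return matches

# One V: falls 12 -> 8 -> 7, then recovers 7 -> 11; counting alone
# (e.g. "three drops occurred") could not locate this shape.
assert find_v_patterns([10, 12, 8, 7, 11, 11]) == [(1, 3, 4)]
```

In SQL the same intent reads declaratively, which is what makes the capability accessible to analysts who would not write the scanning loop above.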
Operational Considerations for Stream Processing

As it relates to Kafka, the original security implementation was done by Cloudera (via integration with Apache Ranger for role-based authorization and auditing), which still provides leading security capabilities via scale-tested role-based and attribute-based access control models, and integrated data governance with Apache Atlas in Cloudera.

The other advantage of Cloudera for these stream processing engines is the fine-grained, integrated, single-pane-of-glass security control.

Scaling Up / Scaling Down — Another consideration is that streaming workflows tend to be multi-modal and unbalanced throughout the day, so scaling capabilities are absolutely crucial. Flink and Spark Structured Streaming are developing auto-scaling approaches to automatically maintain steady and predictable performance. They each have a solid orchestration platform underneath, which tends to give them an edge over the "build your own" type of approach that you get with Kafka Streams.
Flink Use Cases

To get a practical understanding of how Flink is used in the real world, we have described a variety of use cases below.

Operational Efficiency

How would you understand what contributed to an unexpected value in a complex calculation while data continues to stream in?
Outage Classification for Telecom Companies

For decades, telecom companies have focused on their network infrastructure and preventative maintenance, but often as a reaction to past events. Customer and regulatory demands require a dynamic approach to predict and mitigate spotty performance and outages.

Adapting network strength and mass availability is crucial and requires aggregated analysis of vast amounts of data over a wide array of networks to find anomalies, predict where failures are likely to occur, or even just record the state of the current network at any point in time.

5G has only increased the volume and variety of metrics available, so having the ability to scale and perform analytics on incoming events quickly is absolutely critical to identifying problems before customers do.

Financial Services: Mainframe Offloading

Today's consumers expect swift and seamless service experiences. Traditional, overburdened mainframes struggling with low-latency user interactions often hinder banks' efficiency, especially in the era of Open Banking directives, which require them to share customer data with third-party providers. To address this challenge, financial institutions are transitioning customer relationship functions from mainframes to agile stateful stream processing engines like Flink. This shift enables tailored product offerings based on real-time spending patterns, delivering an enhanced and personalized customer experience.

A key driver of success is data consistency. Flink's exactly-once delivery guarantee ensures that data is processed exactly once, even in the event of failures. This is critical for applications that need to track customer spending behavior and balances, as well as those that need to make real-time decisions based on the latest data. Flink's ability to combine exactly-once delivery with complex event processing (CEP) makes it a powerful tool for real-time marketing. By analyzing real-time data streams, Flink can identify patterns that indicate when a customer is likely to be interested in a particular product or offer. This allows banks to deliver the right offer at the right time, which can lead to increased customer engagement and sales. Flink's ability to perform data access, enrichment, and decisioning locally eliminates the need to access external data sources. This can significantly improve response times, as data does not need to be transferred over the network. This is essential for applications that require millisecond-level response times, such as fraud detection and risk assessment.
IoT for Manufacturing

IoT devices streamline supply chain operations within a manufacturing facility. Today, manufacturers are leveraging advanced monitoring sensors and real-time technologies to track the quality of goods, automate the visual inspection of goods, and customize manufacturing for individual partners.

Flink's advanced windowing and state capabilities make it an ideal platform for processing sensor data in manufacturing. By aggregating and comparing sensor data over time, Flink can identify patterns that indicate potential problems with machines. This allows manufacturers to take corrective action before a problem causes a costly outage. Additionally, Flink's flexibility makes it a good fit for the different use cases in manufacturing IoT. Flink can be used to process data from both streaming and batch sources, and it can be deployed as a microservice or a standalone application. This makes it easy to integrate Flink with other manufacturing systems. Overall, Flink is a powerful platform for processing sensor data in manufacturing: its advanced windowing and state capabilities, as well as its flexibility, make it a good fit for the diverse needs of manufacturing.

Fraud Monitoring for Banking

Today's financial services demand instant approval, which requires data from various sources, while transaction streams carry ever-increasing volumes of data. Fraudsters try to exploit these conditions and continue to evolve their tactics to try to stay one step ahead.

Flink's ultra-low-latency unified processing makes it possible to process transaction streams in real time while executing table look-ups for historical customer data. In order to identify fraudulent patterns, transactions must be processed in the context of customer profile and transaction history. Furthermore, with Flink's SQL API providing a useful abstraction layer, fraud monitoring logic can be expressed in the language of data, making the capabilities of Flink more accessible to less technical fraud analysts so they can adapt their monitoring techniques to stay ahead of the fraudsters.
Technical Features Table

The table below gives a technical comparison across modern stream processing engines. Refer to it when evaluating the functional and developmental aspects of your project.

● Great fit for purpose ◆ Fits with some work ▶ Fits with a lot of work
Operational Features Table

The table below gives an operational comparison across modern stream processing engines. Refer to it when evaluating the nonfunctional aspects of your project.

Documentation
• Flink ◆ — Good technical documentation; growing examples; Stack Overflow coverage
• Kafka Streams ● — Extensive documentation; extensive examples; Stack Overflow coverage
• Spark Structured Streaming ● — Extensive documentation; extensive examples; Stack Overflow coverage

Maturity/community
• Flink ● — Smaller but fastest-growing community, with strong research and production deployments
• Kafka Streams ● — Newest, strong community with strong growth
• Spark Structured Streaming ● — Spark Structured Streaming community is strong, but Streaming is a small, quiet corner

Use cases
• Flink ● — Unbounded and bounded streams; batch
• Kafka Streams ● — Microservice/event-driven, embedded in another application
• Spark Structured Streaming ● — Unified ETL, semi-real-time processing

Logging/metrics
• Flink ◆ — Usual OSS integrations, some vendor offerings
• Kafka Streams ◆ — BYO microservices
• Spark Structured Streaming ● — Good logging integration

Scaling up/down
• Flink ● — Not yet autoscaling, but all requirements available
• Kafka Streams ◆ — BYO microservice; scaling limits, e.g. shuffle sort
• Spark Structured Streaming ● — Not yet autoscaling, but all requirements available

● Great fit for purpose ◆ Fits with some work ▶ Fits with a lot of work
Customer Success

To provide insights into the business impact that can be drawn through a comprehensive data-in-motion solution, we provide these customer success examples.

1. An international communications company serving consumers and businesses in ten countries deployed the Cloudera streaming data platform to tackle a variety of critical use cases, including stream processing, log aggregation, large-scale messaging, and customer insights.

Results included improved overall customer experience through strategic use of data analysis, reduced infrastructure management costs and TCO, and real-time actions that improve business outcomes.

2. A large European bank specializing in agriculture financing and sustainability-oriented banking across global markets leveraged Cloudera's streaming data platform to run sophisticated real-time algorithms and financial models that help customers manage their financial obligations, including loan repayments.

By implementing the platform and gaining the ability to stream real-time data, the bank can now detect warning signals at extremely early stages where clients may go into default. Through their new, governed data lake, the bank's account managers are also able to access an in-depth overview of customer data, enabling them to generate liquidity overviews and advise customers on how to avoid defaulting. Through rapid data processing, better models are created that more accurately predict warning signals.

Ensure Fit for Purpose and Enterprise-Wide Adoption

Streaming and time-based reasoning applications are confronted with both simple and complex sets of challenges. Functional business requirements dictate the manner in which data must be processed, thereby guiding the evaluation and selection of the most suitable stream processing engine to meet your specific needs.

This paper described a number of capabilities that address both simple and complex scenarios, while keeping in mind acceptable trade-offs. You don't want to over-engineer a solution, but you want to know that it can grow to support an evolving business. To support that growth, there are a number of technical and operational factors that are crucial to the decision-making process.

We also suggested that you take a broad perspective that considers nonfunctional aspects such as how your team can deliver on the solution's promise, how it integrates into your organization's security framework, operational processes, and support structure, and how it can scale up and down in line with business demand.

In summary, the view expressed here will help ensure that you choose a stream processing engine that is both fit for purpose for the business challenge at hand and will also enjoy enterprise-wide adoption.
About Cloudera
Cloudera is the only true hybrid platform for data,
analytics, and AI. With 100x more data under
management than other cloud-only vendors,
Cloudera empowers global enterprises to transform
data of all types, on any public or private cloud,
into valuable, trusted insights. Our open data
lakehouse delivers scalable and secure data
management with portable cloud-native
analytics, enabling customers to bring GenAI
models to their data while maintaining privacy
and ensuring responsible, reliable AI deployments.
The world’s largest brands in financial services,
insurance, media, manufacturing, and government
rely on Cloudera to be able to use their data
to solve the impossible — today and in the future.
Cloudera, Inc. | 5470 Great America Pkwy, Santa Clara, CA 95054 USA | cloudera.com
© 2025 Cloudera, Inc. All rights reserved. Cloudera and the Cloudera logo are trademarks or registered trademarks of Cloudera Inc. in the USA and other countries.
All other trademarks are the property of their respective companies. Information is subject to change without notice. WP_011_V2 January 9, 2025