What Is Stream Processing
Stream processing is the practice of taking action on a series of data at the time the data
is created. Historically, data practitioners used “real-time processing” to talk generally
about data processed as frequently as necessary for a particular use case. But with the
advent and adoption of stream processing technologies and frameworks, coupled with
decreasing prices for RAM, the term “stream processing” is now used in this more specific
sense.
Stream processing often entails performing multiple tasks on the incoming series of data (the “data
stream”), which can be performed serially, in parallel, or both. This workflow is referred to
as a stream processing pipeline, which includes the generation of the streaming data, the
processing of the data, and the delivery of the data to a final location.
Actions that stream processing takes on data include aggregations (e.g., calculations
such as sum, mean, and standard deviation), analytics (e.g., predicting a future event
based on patterns in the data), transformations (e.g., changing a number into a date
format), enrichment (e.g., combining the data point with other data sources to create more
context and meaning), and ingestion (e.g., inserting the data into a database).
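To make these actions concrete, here is a minimal sketch in Python of a stream processing step that applies a transformation, an enrichment, and a running aggregation to each event as it arrives. The event shape, the USER_REGIONS lookup table, and the hard-coded input are hypothetical stand-ins for a real data stream.

```python
from datetime import datetime, timezone

# Hypothetical reference data used for enrichment.
USER_REGIONS = {"u1": "EU", "u2": "US"}

def process(stream):
    total = 0.0
    count = 0
    for event in stream:
        # Transformation: change a numeric timestamp into a date format.
        event["time"] = datetime.fromtimestamp(event["ts"], tz=timezone.utc).isoformat()
        # Enrichment: combine the event with another data source for context.
        event["region"] = USER_REGIONS.get(event["user"], "unknown")
        # Aggregation: maintain a running mean over everything seen so far.
        total += event["amount"]
        count += 1
        event["running_mean"] = total / count
        yield event  # delivery: hand the result to the next stage or sink

# Hypothetical input; in practice this arrives continuously from a source.
events = [{"user": "u1", "ts": 1700000000, "amount": 9.5},
          {"user": "u2", "ts": 1700000060, "amount": 12.0}]

for result in process(events):
    print(result)
```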
Stream processing allows applications to respond to new data events at the moment they
occur. In this simplified example, the stream processing engine processes the input data
stream in real time; the output data is delivered to a streaming analytics application and
added to the output stream.
Kappa Architecture
The Kappa Architecture simplifies data processing by combining batch and real-time
analytics into a single path. Data enters a central data queue such as Apache Kafka and is
converted into a format that can be fed directly into an analytics database. By removing
complexity and increasing efficiency, this unified method enables you to analyze data more
quickly and obtain deeper insights sooner. The Kappa Architecture is typically built around
Apache Kafka® and a high-speed stream processing engine.
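A rough sketch of the idea, with a Python list standing in for the Kafka log and a dict standing in for the analytics database (both hypothetical): in a Kappa design there is a single stream job, and historical reprocessing is just a replay of the same log through the same code.

```python
# Stand-in for an append-only Kafka topic: an ordered event log.
log = [("page_view", 1), ("click", 1), ("page_view", 1)]

def stream_job(events):
    """The single processing path: fold events into an analytics table."""
    table = {}  # stand-in for an analytics database
    for event_type, count in events:
        table[event_type] = table.get(event_type, 0) + count
    return table

live_view = stream_job(log)       # built incrementally as events arrive
rebuilt_view = stream_job(log)    # "batch" reprocessing = replaying the log
print(live_view == rebuilt_view)  # True: one code path serves both needs
```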
Lambda Architecture
The Lambda Architecture combines a traditional batch data pipeline with a fast streaming
pipeline for real-time data, plus a serving layer for responding to queries.
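For contrast, here is a toy sketch of the three Lambda layers, with plain Python collections standing in for the batch store, the live stream, and the serving layer; all names and numbers are illustrative assumptions.

```python
historical = [("clicks", 100), ("views", 250)]  # stand-in for a batch store
recent = [("clicks", 3), ("views", 7)]          # stand-in for the live stream

def count_by_key(data):
    view = {}
    for key, n in data:
        view[key] = view.get(key, 0) + n
    return view

# Batch layer: periodically recomputes a complete view over all history.
batch_view = count_by_key(historical)
# Speed layer: incrementally maintains a view over recent, not-yet-batched data.
speed_view = count_by_key(recent)

def serve(key):
    # Serving layer: answers queries by merging the batch and speed views.
    return batch_view.get(key, 0) + speed_view.get(key, 0)

print(serve("clicks"))  # 103 = 100 from the batch view + 3 from the stream
```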
Stream processing has become a must-have for modern applications. Enterprises have
turned to technologies that respond to data as it is created for a variety of use cases and
applications, examples of which we’ll cover below. Rather than collecting data and
processing it in groups at some predetermined interval, as batch processing does, stream
processing applications collect and process data immediately as it is generated, allowing
them to respond to new data events at the moment they occur.
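The difference is easy to see in a sketch, assuming a hypothetical handle() function and a finite in-memory source in place of a live feed:

```python
def handle(event):
    print("processed", event)

# Batch processing: collect events and act only when a group is full
# (the trigger could equally be a timer firing at a fixed interval).
def batch_loop(source, batch_size=100):
    buffer = []
    for event in source:
        buffer.append(event)
        if len(buffer) == batch_size:
            for e in buffer:
                handle(e)
            buffer.clear()

# Stream processing: act on each event the moment it arrives.
def stream_loop(source):
    for event in source:
        handle(event)
```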
How Does It Work?
Stream processing is often applied to data generated as a series of events, such as data
from IoT sensors, payment processing systems, and server and application logs. Common
paradigms include publisher/subscriber (commonly referred to as pub/sub) and
source/sink. Data and events are generated by a publisher or source and delivered to a
stream processing application, where the data may be augmented, tested against fraud
detection algorithms, or otherwise transformed before the application sends the result to a
subscriber or sink. On the technical side, common sources and sinks include Apache
Kafka®, big data repositories such as Hadoop, TCP sockets, and in-memory data grids.
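As a heavily simplified sketch of the pub/sub pattern with Kafka as both source and sink, here is what a publisher and a transforming consumer might look like using the kafka-python client. The broker address, topic names, and the toy fraud check are all assumptions for illustration.

```python
import json
from kafka import KafkaConsumer, KafkaProducer  # pip install kafka-python

BROKER = "localhost:9092"  # hypothetical broker address

producer = KafkaProducer(
    bootstrap_servers=BROKER,
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

# Publisher/source side: emit a payment event onto an input topic.
producer.send("payments", {"card": "4242", "amount": 25.0})
producer.flush()

# Processing side: consume each event, transform it, and forward the
# result to a sink topic that downstream subscribers read from.
consumer = KafkaConsumer(
    "payments",
    bootstrap_servers=BROKER,
    value_deserializer=lambda b: json.loads(b.decode("utf-8")),
)
for message in consumer:
    event = message.value
    event["suspicious"] = event["amount"] > 10_000  # toy fraud test
    producer.send("payments-scored", event)
```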
• Real-time fraud and anomaly detection. One of the world’s largest credit card
providers has been able to reduce its fraud write-downs by $800M per year, thanks
to fraud and anomaly detection powered by stream processing. Credit card
processing delays are detrimental to the experience of both the end customer and
the store attempting to process the credit card (and any other customers in line).
Historically, credit card providers performed their time-consuming fraud
detection processes in a batch manner post-transaction. With stream processing, as
soon as you swipe your card, they can run more thorough algorithms to recognize
and block fraudulent charges and trigger alerts for anomalous charges that merit
additional inspection without making their (non-fraudulent) customers wait.
• Internet of Things (IoT) edge analytics. Companies in manufacturing, oil and gas,
and transportation, as well as those architecting smart cities and smart buildings,
leverage stream processing to keep up with data from billions of “things.” One
example of IoT data analysis is detecting anomalies in manufacturing that indicate
problems that need to be fixed to improve operations and increase yields. With
real-time stream processing, a manufacturer can recognize that a production line is
turning out too many anomalies as it happens (as opposed to discovering an entire
bad batch after the day’s shift), realizing huge savings and preventing massive
waste by pausing the line for immediate repairs (see the anomaly-detection
sketch after this list).
• Real-time personalization, marketing, and advertising. With real-time stream
processing, companies can deliver personalized, contextual customer experiences.
This can include a discount for something you added to a cart on a website but
didn’t immediately purchase, a recommendation to connect with a just-registered
friend on a social media site, or an advertisement for a product similar to the one
you just viewed.
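As referenced in the IoT item above, here is a minimal sketch of real-time anomaly detection using a rolling z-score; the window size, threshold, and simulated sensor readings are illustrative assumptions, not a production fraud or defect model.

```python
from collections import deque
from statistics import mean, stdev

def detect_anomalies(stream, window=30, threshold=3.0):
    """Flag readings more than `threshold` standard deviations from the
    rolling mean of the last `window` readings."""
    recent = deque(maxlen=window)
    for value in stream:
        if len(recent) >= 2:
            mu, sigma = mean(recent), stdev(recent)
            if sigma > 0 and abs(value - mu) > threshold * sigma:
                yield value  # anomalous reading; alert the moment it appears
        recent.append(value)

# Simulated sensor stream: steady readings with one defect spike.
readings = [10.0, 10.2, 9.9, 10.1, 10.0, 9.8, 10.1, 42.0, 10.0]
print(list(detect_anomalies(readings, window=5)))  # -> [42.0]
```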