
SA Unit 1 PPT 5

The document discusses the integration of batch processing into streaming applications to enhance data analysis by combining real-time insights with historical data. It outlines the definitions and characteristics of both processing types, introduces architectures like Lambda and Kappa, and provides real-world examples of their application. Additionally, it addresses challenges such as data consistency and complexity, while listing relevant tools and technologies.

Uploaded by

pg0145
Copyright © All Rights Reserved

The Use of a Batch-Processing Component in a Streaming Application
Bridging Real-Time and Historical Data Processing
Introduction

• Streaming applications process data in real time.
• Batch processing handles large volumes of historical data.
• Combining both can provide comprehensive insights.
What is Stream Processing?
• Definition: Processing data in real time as it is generated.
• Examples:
  • Real-time fraud detection.
  • Live dashboards for monitoring.
• Characteristics: Low latency, continuous data flow.
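To make the definition above concrete, here is a minimal sketch of stream-style processing in plain Python. It is an illustration only: the function name `flag_large_transactions`, the event fields, and the `limit` threshold are assumptions, not part of the original slides. Each event is examined once, as it arrives, and an alert is emitted immediately.

```python
from typing import Iterable, Iterator


def flag_large_transactions(events: Iterable[dict], limit: float = 1000.0) -> Iterator[dict]:
    """Yield an alert for each event above the limit, as soon as it is seen."""
    for event in events:  # events are handled one at a time, in arrival order
        if event["amount"] > limit:
            yield {"account": event["account"], "amount": event["amount"]}


# A small in-memory iterator stands in for a live feed (hypothetical data).
incoming = iter([
    {"account": "a1", "amount": 40.0},
    {"account": "b2", "amount": 2500.0},
    {"account": "a1", "amount": 1200.0},
])

alerts = list(flag_large_transactions(incoming))
```

Because the function is a generator, it never needs the full data set in memory, which is the property that keeps latency low.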
What is Batch Processing?
• Definition: Processing large volumes of data at rest at scheduled intervals.
• Examples:
  • Daily sales reports.
  • Monthly customer segmentation.
• Characteristics: High throughput, delayed results.
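By contrast, a batch job reads a complete, stored data set and produces its result in one pass. The sketch below is a hypothetical daily sales report; the record layout and the name `daily_sales_report` are assumptions made for illustration.

```python
from collections import defaultdict


def daily_sales_report(records):
    """Aggregate stored sales records into one total per day."""
    totals = defaultdict(float)
    for record in records:  # the whole data set is available before the job starts
        totals[record["day"]] += record["amount"]
    return dict(sorted(totals.items()))


# Data at rest (hypothetical), e.g. loaded from a file or table by a scheduled job.
stored_sales = [
    {"day": "2024-01-01", "amount": 20.0},
    {"day": "2024-01-02", "amount": 5.0},
    {"day": "2024-01-01", "amount": 30.0},
]

report = daily_sales_report(stored_sales)
```

The result is only as fresh as the last run, which is the "delayed results" trade-off noted above.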
Why Use Batch Processing in Streaming Applications?
• Complementary Strengths:
  • Streaming: Real-time insights.
  • Batch: Comprehensive historical analysis.
• Use Cases:
  • Enriching real-time data with historical context.
  • Backfilling missing or late-arriving data.
  • Training machine learning models on historical data.
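The first use case, enriching real-time data with historical context, can be sketched as follows. This is a simplified illustration under assumed names (`build_profiles`, `enrich`) and an assumed event layout: a batch step precomputes a per-account average from history, and the streaming step looks it up for each live event.

```python
from statistics import mean


def build_profiles(history):
    """Batch step: average historical amount per account."""
    by_account = {}
    for tx in history:
        by_account.setdefault(tx["account"], []).append(tx["amount"])
    return {account: mean(amounts) for account, amounts in by_account.items()}


def enrich(event, profiles):
    """Streaming step: attach the historical average to a live event."""
    avg = profiles.get(event["account"])  # None if the account has no history
    ratio = event["amount"] / avg if avg else None
    return {**event, "historical_avg": avg, "ratio_to_avg": ratio}


history = [
    {"account": "a1", "amount": 10.0},
    {"account": "a1", "amount": 30.0},
]
profiles = build_profiles(history)  # refreshed on a schedule by the batch job
enriched = enrich({"account": "a1", "amount": 100.0}, profiles)
```

The streaming path stays fast because it only performs a dictionary lookup; the expensive aggregation happens offline.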
Lambda Architecture
• Definition: A hybrid approach combining batch and streaming layers.
• Components:
  • Batch Layer: Processes historical data.
  • Speed Layer (Streaming): Handles real-time data.
  • Serving Layer: Merges results for queries.
• Advantages: Combines the accuracy of batch results with the low latency of streaming.
Figure 3-2. The Lambda architecture
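The three Lambda components can be sketched in a few lines of plain Python. This is a toy model, not a production design: the page-view events, the `cutoff` timestamp that separates the two layers, and the function names are all assumptions for illustration.

```python
from collections import Counter


def batch_view(master_log, cutoff):
    """Batch layer: recompute counts from all events before the cutoff."""
    return Counter(e["page"] for e in master_log if e["ts"] < cutoff)


def speed_view(recent_events, cutoff):
    """Speed layer: count only events the last batch run has not covered."""
    return Counter(e["page"] for e in recent_events if e["ts"] >= cutoff)


def serve(page, batch, speed):
    """Serving layer: merge both views to answer a query."""
    return batch[page] + speed[page]  # a Counter returns 0 for missing keys


events = [
    {"ts": 1, "page": "home"},
    {"ts": 2, "page": "home"},
    {"ts": 3, "page": "cart"},
    {"ts": 5, "page": "home"},
    {"ts": 6, "page": "cart"},
]
cutoff = 4  # hypothetical timestamp at which the last batch run ended

batch = batch_view(events, cutoff)
speed = speed_view(events, cutoff)
home_views = serve("home", batch, speed)
cart_views = serve("cart", batch, speed)
```

Note that the counting logic appears twice, once per layer. Keeping those two implementations in agreement is the maintenance cost usually attributed to this architecture.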
Kappa Architecture
• Definition: A simplified alternative to Lambda Architecture.
• Key Idea: Use a single streaming layer for both real-time and historical data.
• Advantages: Simpler to maintain and manage.
Figure 3-3. The Kappa architecture
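The key idea of Kappa can be illustrated with a single processing function that is applied to a retained event log. In this hedged sketch, a Python list stands in for the log, and the names `run`, `count_v1`, and `count_v2` are hypothetical. Historical reprocessing is simply a replay of the same log through revised logic.

```python
def run(log, update, state=None, start=0):
    """Apply one streaming update function to every event from `start` onward."""
    state = {} if state is None else state
    for event in log[start:]:
        update(state, event)
    return state


def count_v1(state, event):
    """Original logic: count every page view."""
    state[event["page"]] = state.get(event["page"], 0) + 1


def count_v2(state, event):
    """Revised logic: ignore events marked as bot traffic."""
    if not event.get("bot", False):
        state[event["page"]] = state.get(event["page"], 0) + 1


log = [
    {"page": "home"},
    {"page": "home", "bot": True},
    {"page": "cart"},
]

live = run(log, count_v1)     # the view served so far
rebuilt = run(log, count_v2)  # replay from the start with the new logic
```

There is only one code path to maintain, at the cost of having to keep the log long enough to replay it.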
Real-World Examples
• Netflix: Uses batch processing for recommendation model training and streaming for real-time recommendations.
• Uber: Combines real-time ride data with historical trends for dynamic pricing.
• Financial Services: Real-time fraud detection enriched with historical transaction patterns.
Challenges
• Data Consistency: Ensuring batch and streaming results align.
• Complexity: Managing two processing systems.
• Latency: Balancing real-time and batch processing delays.
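One simple way to watch for the data consistency problem is to reconcile the two result sets periodically. The sketch below assumes both systems produce per-key totals as dictionaries; the name `find_mismatches` and the tolerance value are illustrative choices, not a standard API.

```python
import math


def find_mismatches(batch_totals, stream_totals, rel_tol=1e-9):
    """Return the keys whose batch and streaming totals disagree."""
    keys = batch_totals.keys() | stream_totals.keys()
    return sorted(
        key
        for key in keys
        # a key missing from one side is treated as a total of 0.0
        if not math.isclose(
            batch_totals.get(key, 0.0),
            stream_totals.get(key, 0.0),
            rel_tol=rel_tol,
        )
    )


batch_totals = {"2024-01-01": 50.0, "2024-01-02": 5.0}
stream_totals = {"2024-01-01": 50.0, "2024-01-02": 4.0, "2024-01-03": 1.0}

mismatches = find_mismatches(batch_totals, stream_totals)
```

A mismatch is not always a bug: the streaming side may simply have seen newer or late-arriving events, so a check like this is a starting point for investigation rather than a verdict.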
Tools and Technologies
• Streaming: Apache Kafka, Apache Flink, Apache Storm.
• Batch Processing: Apache Hadoop, Apache Spark.
• Hybrid Systems: Apache Beam, Databricks.
Batch Integration vs. Streaming Integration
Thank You
