Big Data PDF
Stream Processing
in Big Data
In the age of Big Data, traditional batch processing techniques are no
longer sufficient to handle the massive volumes and rapid velocity of
data streams. This presentation explores the concepts and
techniques of stream processing, enabling real-time insights and
decision-making.
By: Priyanka Arya
Understanding the
Concept of Stream Data
1 Continuous Data Flow
Stream data is a continuous flow of data points, arriving at high
speeds. It is not processed in batches, but rather in real-time as it
arrives.
2 Time Sensitivity
Stream data is often time-sensitive, meaning that insights must be
derived quickly to be actionable.
3 Unbounded Nature
Stream data is unbounded, meaning that it can continue to
arrive indefinitely, requiring systems to handle continuous
processing.
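A minimal sketch of what unbounded processing can look like, assuming a synthetic source (sensor_readings and its values are made up for illustration). The processor keeps only a running total and a count, so it can emit an updated result per data point without ever holding the whole stream:

```python
from itertools import count, islice
from typing import Iterable, Iterator


def sensor_readings() -> Iterator[float]:
    """Hypothetical unbounded source: yields a synthetic reading forever."""
    for i in count():
        yield 20.0 + (i % 5)  # 20, 21, 22, 23, 24, 20, ...


def running_mean(stream: Iterable[float]) -> Iterator[float]:
    """Emit the mean of all values seen so far, one result per input."""
    total = 0.0
    n = 0
    for value in stream:
        total += value
        n += 1
        yield total / n


# The source never ends, so the consumer decides how much to take.
first_five = list(islice(running_mean(sensor_readings()), 5))
print(first_five)
```

Because both functions are generators, nothing is computed until a result is requested, and memory use stays constant however long the stream runs.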
Key Characteristics of
Stream Data
High Volume
Stream data often arrives at very high volumes, requiring systems to
process large amounts of data in real-time.
High Velocity
Data points arrive at high speeds, requiring systems to process data
quickly to keep up with the flow.
Variety
Stream data can come from diverse sources, including sensor
data, social media feeds, and financial transactions.
Differences between Batch and Stream
Processing
Batch Processing
Processes data in batches, typically offline, allowing for more complex calculations.
Stream Processing
Processes data continuously in real-time, focusing on speed and low latency.
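The contrast can be sketched in a few lines, assuming a small made-up list of latency measurements. The batch function needs the complete dataset before it can answer (an exact median), while the streaming function keeps constant state and produces an updated answer for every event:

```python
import statistics
from typing import Iterable, Iterator, List


def batch_median(records: List[float]) -> float:
    """Batch style: requires the full dataset before it can answer."""
    return statistics.median(records)


def streaming_max(events: Iterable[float]) -> Iterator[float]:
    """Stream style: constant state, one updated answer per event."""
    current = float("-inf")
    for value in events:
        current = max(current, value)
        yield current


latencies_ms = [12.0, 48.0, 7.0, 95.0, 30.0]  # illustrative values only
median_after_load = batch_median(latencies_ms)
max_so_far = list(streaming_max(latencies_ms))
print(median_after_load, max_so_far)
```

An exact median is a typical example of a calculation that is easy offline but awkward on a stream, which is one reason the two styles are often combined.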
Architectural Patterns for
Stream Processing
1 Lambda Architecture
Combines batch and stream processing, enabling both
immediate insights and historical analysis.
2 Kappa Architecture
Focuses solely on stream processing, providing real-time
insights with a unified approach.
3 Micro-Batching
Processes data in small batches at high frequencies,
bridging the gap between batch and stream processing.
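Micro-batching can be sketched as a generator that cuts a stream into small lists. This version batches by count so the result is deterministic; production systems more commonly cut batches on a time interval, and the micro_batches helper here is a hypothetical name, not a framework API:

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def micro_batches(stream: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Group a stream into consecutive lists of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(stream)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return  # stream exhausted
        yield batch


events = range(7)  # stand-in for an incoming stream
batch_sums = [sum(batch) for batch in micro_batches(events, 3)]
print(batch_sums)
```

Each small batch can then be handed to ordinary batch logic, which is how this pattern trades a little latency for simpler processing code.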
Overview of Stream Processing Frameworks
Fault Tolerance
Redundancy and checkpointing mechanisms ensure data integrity and system uptime even in the event of failures.
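A toy illustration of checkpointing, assuming a single process that saves its position and running total to a local JSON file (the file name and the process_with_checkpoint helper are hypothetical). After a simulated crash, a restart resumes from the saved offset instead of reprocessing from the beginning:

```python
import json
import os
import tempfile
from typing import List, Optional


def process_with_checkpoint(
    events: List[int], path: str, fail_at: Optional[int] = None
) -> int:
    """Sum events, saving (offset, total) after each one so a restart can resume."""
    state = {"offset": 0, "total": 0}
    if os.path.exists(path):
        with open(path) as f:
            state = json.load(f)  # resume from the last checkpoint
    for i in range(state["offset"], len(events)):
        if fail_at is not None and i == fail_at:
            raise RuntimeError("simulated crash")
        state["total"] += events[i]
        state["offset"] = i + 1
        tmp_path = path + ".tmp"
        with open(tmp_path, "w") as f:
            json.dump(state, f)
        os.replace(tmp_path, path)  # atomic swap: never a half-written checkpoint
    return state["total"]


events = [5, 10, 15, 20]
with tempfile.TemporaryDirectory() as directory:
    checkpoint = os.path.join(directory, "checkpoint.json")
    try:
        process_with_checkpoint(events, checkpoint, fail_at=2)
    except RuntimeError:
        pass  # the first run dies after two events
    recovered_total = process_with_checkpoint(events, checkpoint)
print(recovered_total)
```

Real frameworks checkpoint distributed state and usually do so periodically rather than per event; the sketch only shows the recover-from-saved-position idea.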
Scalability
Stream processing systems can scale horizontally by adding more nodes to handle increasing data volumes and processing demands.
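One common way to spread work across nodes is to partition events by key, sketched below under the assumption of a fixed node count and made-up sensor keys. A stable hash (CRC32 here, chosen only because Python's built-in hash for strings varies between runs) sends every event with the same key to the same node, so per-key state stays local:

```python
import zlib
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Event = Tuple[str, int]  # (key, value); an illustrative event shape


def partition_for(key: str, num_nodes: int) -> int:
    """Map a key to a node index using a hash that is stable across runs."""
    return zlib.crc32(key.encode("utf-8")) % num_nodes


def route(events: Iterable[Event], num_nodes: int) -> Dict[int, List[Event]]:
    """Assign each event to a node based on its key."""
    assignments: Dict[int, List[Event]] = defaultdict(list)
    for key, value in events:
        assignments[partition_for(key, num_nodes)].append((key, value))
    return dict(assignments)


events = [("sensor-a", 1), ("sensor-b", 2), ("sensor-a", 3), ("sensor-c", 4)]
routed = route(events, num_nodes=3)
print(routed)
```

With plain modulo hashing, changing the node count reassigns many keys, so systems that resize often tend to use a more stable scheme such as consistent hashing.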
Best Practices for
Designing Efficient Stream
Processing Pipelines