Unit 2 BD Mining Data Streams
• Stream Processing
• Stream processing is a method of data
processing that handles data continuously, as
it is generated, rather than collecting it into
batches first. Data is processed incrementally,
in small chunks as it arrives, which makes it
possible to analyze and act on the data in
real time.
• Stream processing is particularly useful when
data is generated rapidly, for example by IoT
devices or in financial markets, and anomalies
or patterns in the data must be detected
quickly.
• Stream processing is also used for real-time
analytics, machine learning, and other
applications that need results while the data
is still arriving.
• There are several popular stream processing
frameworks, including Apache Flink, Apache
Kafka (through its Kafka Streams library),
Apache Storm, and Apache Spark Streaming.
These frameworks provide tools for building
and deploying stream processing pipelines,
and they are designed to handle large volumes
of data with low latency and high throughput.
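The difference between batch and incremental processing described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the function names `batch_mean` and `running_mean` and the sample `readings` are hypothetical, and no stream processing framework is involved.

```python
from typing import Iterable, Iterator


def batch_mean(values: list[float]) -> float:
    """Batch style: the complete data set must exist before an answer is produced."""
    return sum(values) / len(values)


def running_mean(stream: Iterable[float]) -> Iterator[float]:
    """Stream style: update the answer incrementally as each value arrives."""
    count = 0
    mean = 0.0
    for value in stream:
        count += 1
        # Incremental update: only the count and the current mean are stored,
        # so memory use does not grow with the length of the stream.
        mean += (value - mean) / count
        yield mean


# Hypothetical sample readings standing in for a live feed.
readings = [10.0, 12.0, 11.0, 15.0]

# An up-to-date mean is available after every single arrival.
means = list(running_mean(readings))

# The batch version gives one answer, and only after all the data is collected.
final = batch_mean(readings)
```

Here `means` holds one result per arriving value, and its last entry agrees with `final`. The streaming version never keeps the earlier readings, which is what makes it usable on a stream that has no end.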
Mining data streams
• Data Source: A stream's data source is the place where the data is generated or
received. This can include sensors, databases, network connections, or other sources.
• Data Sink: A stream's data sink is the place where the data is consumed or stored.
This can include databases, data lakes, visualization tools, or other destinations.
• Streaming Data Processing: This refers to the process of continuously processing
data as it arrives in a stream. This can involve filtering, aggregation, transformation,
or analysis of the data.
• Stream Processing Frameworks: These are software tools that provide an
environment for building and deploying stream processing applications. Popular
stream processing frameworks include Apache Flink, Apache Kafka (through its
Kafka Streams library), and Apache Spark Streaming.
• Real-time Data Processing: This refers to the ability to process data as soon as it is
generated or received. Real-time data processing is often used in applications that
require immediate action, such as fraud detection or monitoring of critical systems.
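The terms above (source, processing steps, sink) can be tied together in a small sketch that uses Python generators, so each event flows through the whole pipeline as soon as it is produced. Everything here is an assumption made for illustration: the function names, the sample temperatures, the `-999.0` error value, and the choice of a fixed-size (tumbling) window for the aggregation step. A real pipeline would normally be built with one of the frameworks listed above.

```python
from typing import Iterable, Iterator


def sensor_source() -> Iterator[dict]:
    """Data source: stands in for a sensor feed (hypothetical fixed readings)."""
    readings_c = [20.0, 25.0, -999.0, 30.0, 35.0, 40.0]
    for event_id, temp_c in enumerate(readings_c):
        yield {"id": event_id, "temp_c": temp_c}


def drop_invalid(events: Iterable[dict]) -> Iterator[dict]:
    """Filtering: discard readings that carry the assumed error value."""
    for event in events:
        if event["temp_c"] > -100.0:
            yield event


def to_fahrenheit(events: Iterable[dict]) -> Iterator[dict]:
    """Transformation: add a Fahrenheit field to every event."""
    for event in events:
        yield {**event, "temp_f": event["temp_c"] * 9 / 5 + 32}


def window_average(events: Iterable[dict], size: int) -> Iterator[float]:
    """Aggregation: emit the average of every `size` consecutive events."""
    window: list[float] = []
    for event in events:
        window.append(event["temp_f"])
        if len(window) == size:
            yield sum(window) / size
            window = []  # start the next window; an incomplete one is never emitted


def list_sink(results: Iterable[float], store: list[float]) -> None:
    """Data sink: stands in for a database or dashboard by appending to a list."""
    for result in results:
        store.append(result)


store: list[float] = []
pipeline = window_average(to_fahrenheit(drop_invalid(sensor_source())), size=2)
list_sink(pipeline, store)
```

After the run, `store` contains two window averages. The invalid reading is filtered out, and the final reading is left over because it does not fill a complete window. No stage waits for the full data set: each generator pulls one event at a time from the stage before it.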
Stream Data Model and Architecture