Big Data 3rd Unit
1. Definition:
o A data stream is a continuous, rapid, and unbounded sequence of data
generated in real-time. Examples include sensor data, social media updates,
stock prices, and clickstreams.
o Unlike traditional databases, data streams cannot be stored in their entirety due
to their large volume and velocity.
2. Key Characteristics of Data Streams:
o Unbounded: Data arrives continuously with no fixed size.
o Time-sensitive: Processing must occur quickly to derive actionable insights.
o Dynamic and Evolving: Data patterns can change over time.
o Imperfect Information: Due to real-time processing, the data may not always
be complete or fully accurate.
Stream Computing
1. Sampling Data in a Stream
Definition:
o Selecting a representative subset of the data stream for analysis.
Techniques:
1. Reservoir Sampling:
Maintains a fixed-size sample from the stream by replacing elements
with a decreasing probability as the stream grows.
2. Random Sampling:
Randomly selects items from the stream based on a predefined
probability.
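The two techniques above can be sketched in a few lines of Python. Reservoir sampling keeps every item seen so far in the sample with equal probability k/n. This is a minimal sketch; the function name and parameters are illustrative:

```python
import random

def reservoir_sample(stream, k):
    """Keep a uniform random sample of k items from a stream of unknown length."""
    sample = []
    for n, item in enumerate(stream, start=1):
        if n <= k:
            sample.append(item)        # fill the reservoir with the first k items
        else:
            j = random.randrange(n)    # new item replaces a sample slot with probability k/n
            if j < k:
                sample[j] = item
    return sample
```

For example, `reservoir_sample(range(10_000), 5)` returns 5 items drawn uniformly from the whole range without ever storing all 10,000.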
2. Filtering Streams
Definition:
o Extracting only the relevant data from the stream while discarding irrelevant
data.
Methods:
o Use conditional operators (e.g., price > 100) to filter data.
o Example:
From a social media stream, extract only posts containing the word
“urgent.”
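The "urgent" example above can be written as a small generator that passes matching posts through and discards the rest. A minimal sketch (names and sample posts are illustrative):

```python
def filter_stream(posts, keyword="urgent"):
    """Yield only the posts that mention the keyword, case-insensitively."""
    for post in posts:
        if keyword.lower() in post.lower():
            yield post

stream = ["URGENT: server down", "lunch plans?", "Fix needed, urgent!"]
matches = list(filter_stream(stream))  # keeps the first and third posts
```

Because it is a generator, the filter processes one post at a time and never needs the whole stream in memory.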
3. Counting Distinct Elements in a Stream
Problem:
o Determining the number of unique elements in a data stream (e.g., distinct IP
addresses).
Solution:
o Use probabilistic algorithms:
HyperLogLog:
Approximates the number of distinct elements by hashing each
item and keeping only small summary statistics of the hashes,
so memory stays fixed no matter how large the stream grows.
Bloom Filters:
Efficiently tests whether an element is part of a set.
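As a concrete illustration of the Bloom filter idea, the sketch below is a simplified teaching version (the bit-array size, number of hashes, and use of salted SHA-256 digests are assumptions, not a production design). It can say "definitely not seen" with certainty, but "seen" only probabilistically:

```python
import hashlib

class BloomFilter:
    """Probabilistic set-membership test: false positives are possible,
    false negatives are not."""
    def __init__(self, size=1024, num_hashes=3):
        self.size = size
        self.num_hashes = num_hashes
        self.bits = [False] * size

    def _positions(self, item):
        # Derive several bit positions from salted hashes of the item.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).hexdigest()
            yield int(digest, 16) % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = True

    def might_contain(self, item):
        return all(self.bits[pos] for pos in self._positions(item))
```

For a stream of IP addresses, `add` marks each address seen, and `might_contain` answers membership queries in constant space.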
4. Estimating Moments
Definition:
o Moments are statistical properties of data streams, such as mean, variance, or
skewness.
Method:
o Use incremental algorithms to compute moments without storing the entire
stream.
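One standard incremental method for the first two moments is Welford's algorithm, which maintains the running mean and (population) variance with just three numbers, never storing the stream itself. A minimal sketch:

```python
class RunningMoments:
    """Welford's algorithm: update mean and variance one item at a time."""
    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    @property
    def variance(self):
        # Population variance; divide by (n - 1) instead for the sample variance.
        return self.m2 / self.n if self.n else 0.0
```

Each `update` costs O(1) time and O(1) memory, which is exactly what stream processing requires.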
5. Counting Distinct Elements in a Window
Problem:
o Track the number of unique items within a sliding window of time or events.
Solution:
o Use sliding window techniques combined with hash-based counting.
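For windows small enough to hold in memory, the sliding-window-plus-hash-counting idea above can be implemented exactly with a deque and a hash-based counter. A minimal sketch (the class name is illustrative):

```python
from collections import Counter, deque

class WindowDistinctCounter:
    """Exact distinct count over the last `window` items."""
    def __init__(self, window):
        self.window = window
        self.items = deque()       # items currently inside the window, in order
        self.counts = Counter()    # hash-based count of each item in the window

    def add(self, item):
        self.items.append(item)
        self.counts[item] += 1
        if len(self.items) > self.window:
            old = self.items.popleft()     # expire the oldest item
            self.counts[old] -= 1
            if self.counts[old] == 0:
                del self.counts[old]

    def distinct(self):
        return len(self.counts)
```

For very large windows, approximate sketches (such as HyperLogLog variants) replace the exact counter, trading accuracy for memory.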
6. Decaying Window
Definition:
o Assigns more weight to recent data while gradually discounting older data.
Applications:
o Useful in scenarios where recent trends are more important, such as fraud
detection.
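A decaying-window count can be maintained with a single multiplication per event: each arrival first scales the old total down by (1 - c), so older events fade geometrically. A minimal sketch with an illustrative decay constant:

```python
class DecayingCounter:
    """Exponentially decaying count: recent events dominate the total."""
    def __init__(self, decay=0.01):
        self.decay = decay   # fraction of the old total forgotten per event
        self.total = 0.0

    def add(self, value=1.0):
        # Shrink the existing total, then add the new contribution.
        self.total = self.total * (1.0 - self.decay) + value
```

In a fraud-detection setting, a sudden burst of suspicious events raises `total` sharply even if the long-run history is quiet.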
Real-Time Analytics Platforms (RTAP)
1. Definition:
o RTAP refers to systems and tools designed for real-time data stream
processing and analytics.
2. Examples:
o Apache Kafka, Spark Streaming, Flink, and Google Cloud Dataflow.
Stock Market Prediction
Definition:
o Analyzing stock price streams to predict future movements and trends.
Case Studies
Problem:
o Track customer sentiment about a brand during a marketing campaign.
Solution:
Problem:
o Predict stock price movements based on live trading data.
Solution:
1. Use a real-time data feed to collect trading data.
2. Implement predictive models using tools like TensorFlow or PyTorch.
3. Execute trades automatically based on predictions.
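The three steps above hinge on a predictive model. As a hypothetical stand-in for the TensorFlow or PyTorch models mentioned, a toy moving-average crossover shows the shape of the predict-then-trade decision logic (thresholds and window lengths are illustrative, not trading advice):

```python
def trade_signal(prices, short_n=3, long_n=5):
    """Emit "buy" when the short-term average rises above the long-term
    average, "sell" when it falls below, and "hold" otherwise."""
    if len(prices) < long_n:
        return "hold"  # not enough history yet
    short_avg = sum(prices[-short_n:]) / short_n
    long_avg = sum(prices[-long_n:]) / long_n
    if short_avg > long_avg:
        return "buy"
    if short_avg < long_avg:
        return "sell"
    return "hold"
```

In a real pipeline this function would be replaced by the trained model, with the same interface: recent data in, an actionable signal out.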
The stream data model represents a framework for managing and querying continuous data
streams.
Key Concepts
The architecture enables real-time data ingestion, processing, and analysis. It consists of the
following components:
2. Components of RTAP
3. Applications of RTAP
Description: RTAP enables monitoring and analyzing IoT device data in real time.
Applications:
o Smart home automation.
o Predictive maintenance of machinery.
o Environmental monitoring (e.g., weather or pollution sensors).
Example: Detecting anomalies in factory equipment to prevent breakdowns.
Description: Analyzing social media feeds in real time to capture trends or public
sentiment.
Applications:
o Tracking viral content.
o Monitoring brand sentiment.
Example: Identifying trending hashtags for marketing campaigns.
3.9. Telecommunications
4. Benefits of RTAP
2.1. Use Case Example: Real-Time Twitter Sentiment Analysis
Context: A company wants to monitor the public perception of its brand on Twitter
in real time, especially to track product launches, marketing campaigns, or customer
satisfaction.
o Data Ingestion: Data is continuously collected from Twitter using APIs (like
Tweepy) or platforms like Kafka for real-time streaming.
o Data Processing:
The streaming data is processed using NLP techniques to identify
keywords, hashtags, and mentions of the brand.
Sentiment analysis models (trained on labeled data) classify each tweet
as positive, negative, or neutral.
o Real-Time Analytics: The results are immediately reflected on dashboards,
showing trends and sentiment scores, helping the company assess the impact
of its marketing campaigns.
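The sentiment-classification step above assumes a trained model. As a toy stand-in, a keyword-count classifier shows the input/output contract such a model would satisfy (the word lists are illustrative, not a real sentiment lexicon):

```python
POSITIVE = {"love", "great", "excellent", "happy"}
NEGATIVE = {"hate", "broken", "terrible", "worst"}

def classify(tweet):
    """Label a tweet by counting positive vs. negative keyword hits."""
    words = set(tweet.lower().split())
    score = len(words & POSITIVE) - len(words & NEGATIVE)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

A production system would swap this function for a trained model (as the notes describe), precisely because keyword matching misses the sarcasm, slang, and emoji listed under Challenges below.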
Challenges:
o Ambiguity in language: Tweets often contain sarcasm, slang, or emojis,
which complicates sentiment analysis.
o Volume: The volume of tweets can be overwhelming, necessitating scalable
infrastructure.
o Latency: Ensuring near-zero latency to provide real-time feedback.
Outcome: The company can react to customer concerns, respond to negative
sentiment, or capitalize on positive sentiment immediately, optimizing marketing
efforts.
2.2. Use Case Example: Predicting Stock Prices Using News and Historical Data
Data Stream: Real-time stock price data (via APIs), financial news (via web
scraping, RSS feeds), or social media sentiment.
Processing Layer:
o Historical stock data analysis using machine learning models (e.g., LSTM,
ARIMA).
o Sentiment analysis of financial news using NLP tools like NLTK or TextBlob.
o Real-time processing using Apache Kafka for data streaming and Apache
Flink for stream analytics.
Machine Learning Model: A combination of time-series models (like ARIMA or
LSTM) and sentiment-based models.
Visualization: Real-time dashboards with predicted stock trends, implemented with
tools like Tableau or Power BI.
Timely Decision Making: Both systems allow businesses and investors to make
decisions based on real-time data, improving responsiveness.
Market Insight: Sentiment analysis provides insights into public perception, which
can influence stock prices or brand reputation.
Predictive Power: Stock market prediction models offer the potential to anticipate
market movements, giving traders a competitive edge.