Unit-3 Notes
1. Introduction to Data Stream Mining
Data stream mining involves extracting meaningful information from continuous, fast, and
large volumes of data as they flow in real time. This area of data mining is crucial for applications
that require immediate insights, such as fraud detection, network monitoring, and real-time
recommendation systems.
A data stream is a sequence of data elements made available over time. Unlike traditional static
datasets, data streams are:
Continuous and potentially unbounded: elements keep arriving with no defined end.
High-velocity: data arrives too quickly to be stored and scanned in full.
Transient: each element is typically seen only once and cannot be revisited later.
Time-ordered: elements carry timestamps and arrive in roughly chronological order.
Due to these characteristics, stream processing requires techniques that support real-time,
memory-efficient, and incremental processing.
2. The Stream Data Model
The stream data model represents data as a series of observations over time, often arriving in
the form of tuples or objects. Each tuple has a timestamp and a set of attributes representing the
data characteristics. Stream data models are typically designed around the following processing
techniques and query types:
One-pass Algorithms: Algorithms that only pass over the data once or a limited number
of times.
Approximate Query Processing: Due to storage and time constraints, approximations
are often used instead of exact results.
Sliding Window Queries: Focus on the most recent data within a specified window of
time.
Count-Based Window Queries: Process a fixed number of the most recent data
elements.
Continuous Queries: Persistently run and update results as new data arrives (a small
sliding-window average illustrating this style of query is sketched below).
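To make these query types concrete, the following minimal Python sketch maintains a count-based
sliding window and answers a continuous query (the running average of the most recent tuples).
The class name and window size are illustrative only, not part of any particular stream-processing API.

from collections import deque

class SlidingAverage:
    """Continuous query over a count-based sliding window:
    the running average of the most recent `size` values."""

    def __init__(self, size: int):
        self.window = deque(maxlen=size)   # oldest value is evicted automatically

    def update(self, value: float) -> float:
        self.window.append(value)
        return sum(self.window) / len(self.window)

# Example: re-evaluate the query each time a new tuple arrives.
query = SlidingAverage(size=3)
for reading in [10.0, 12.0, 11.0, 15.0]:
    print(query.update(reading))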
3. Stream Data Architecture
Stream data architecture outlines the framework that supports efficient processing, querying,
and managing of data streams. An effective architecture typically includes:
Data Sources: Points of origin for streaming data, such as sensors, social media feeds, or
log files.
Data Ingestion Layer: Responsible for ingesting data into the system, often involving
message brokers like Apache Kafka or RabbitMQ (a minimal consumer loop is sketched after
the lists below).
Stream Processing Engine: The core component that processes, filters, and aggregates
the stream data in real-time. Popular stream processing engines include Apache Flink,
Apache Storm, and Spark Streaming.
Storage Layer: Temporary or long-term storage for processed data, which can be
managed by databases designed for time-series data like InfluxDB or NoSQL databases
like Cassandra.
Query and Analytics Layer: Provides real-time analytics by querying and analyzing the
data in-stream, delivering insights as soon as data arrives.
Batch vs. Stream Processing:
Batch Processing: Processes large volumes of static data, typically at periodic intervals.
Stream Processing: Continuously processes data as it arrives, suitable for applications
requiring low-latency responses.
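As a rough illustration of the ingestion and processing layers, the sketch below reads messages
from a Kafka topic using the kafka-python client and applies a simple threshold filter. The topic
name, broker address, and message schema are assumptions made for the example, not part of any
specific deployment.

import json
from kafka import KafkaConsumer  # kafka-python client (assumed installed)

# Hypothetical topic, broker, and message format for illustration only.
consumer = KafkaConsumer(
    "sensor-readings",
    bootstrap_servers="localhost:9092",
    value_deserializer=lambda raw: json.loads(raw.decode("utf-8")),
)

for message in consumer:                      # blocks, yielding records as they arrive
    reading = message.value                   # e.g. {"sensor_id": 7, "temperature": 93.5}
    if reading.get("temperature", 0) > 90:    # threshold filter in the processing layer
        print("High temperature alert:", reading)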
Data stream mining enables organizations to leverage real-time insights and adapt to changing
conditions dynamically, making it integral to fields that rely on time-sensitive decisions.
In data stream mining, stream computing involves handling and analyzing real-time data flows
to extract insights and make decisions on-the-fly. Given the unique characteristics of data
streams—large volume, continuous flow, and the need for immediate processing—efficient
techniques are crucial for processing data without storing it in its entirety. Here, we explore core
techniques, including sampling data, filtering streams, counting distinct elements, and
estimating moments.
1. Stream Computing
Stream computing processes data as it flows, in contrast to traditional batch processing. The
primary objectives are to ensure:
Low latency: results are produced almost as soon as data arrives.
Bounded memory use: only summaries, samples, or windows are kept, never the full stream.
Incremental computation: results are updated element by element rather than recomputed from scratch.
Scalability: the system keeps pace as data volume and velocity grow.
Key applications of stream computing include fraud detection, network monitoring, and real-
time recommendation systems, which all benefit from low-latency, real-time analytics.
2. Sampling Data in a Stream
Because the full stream cannot be stored, a representative subset is often kept instead. Common
approaches include the following (a minimal reservoir-sampling sketch appears after the
applications list below):
Reservoir Sampling: A method that maintains a random sample of a fixed size from a
continuously arriving data stream. It ensures every element in the stream has an equal
chance of being included in the sample.
Sliding Window Sampling: Focuses only on the most recent data within a fixed-size
time window or count window. This is useful when recent data is more relevant for
decision-making.
Applications of Sampling:
Data Monitoring: By sampling a subset of network traffic, it’s possible to identify trends
without overloading the system.
Predictive Maintenance: Sampling sensor data in industrial systems to monitor
equipment health while reducing data storage requirements.
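A minimal sketch of reservoir sampling in Python, assuming the stream is any iterable and the
sample size k fits comfortably in memory:

import random

def reservoir_sample(stream, k):
    """Keep a uniform random sample of k elements from a stream of unknown length."""
    sample = []
    for i, item in enumerate(stream):
        if i < k:
            sample.append(item)          # fill the reservoir first
        else:
            j = random.randint(0, i)     # each new item replaces a slot with probability k/(i+1)
            if j < k:
                sample[j] = item
    return sample

# Example: a uniform sample of 100 elements from a long stream.
sample = reservoir_sample(range(1_000_000), k=100)

Each element ends up in the sample with probability k divided by the number of elements seen so
far, which is exactly the equal-chance property described above.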
3. Filtering Streams
Filtering involves selecting specific data elements from the stream based on criteria, enabling
the system to process only relevant data and ignore unnecessary information. Filtering reduces
noise and focuses processing on meaningful data.
Bloom Filters: A probabilistic data structure that tests whether an element is part of
a set. Bloom filters are efficient for filtering since they require minimal memory but
allow a small probability of false positives (a minimal sketch follows the applications
below).
Predicate-Based Filtering: Based on conditions such as value ranges or specific
keywords. For example, filtering social media feeds by keywords related to a brand or
event.
Threshold Filtering: Retains only data that meets certain thresholds, often used for
anomaly detection.
Applications of Filtering:
Spam and noise reduction: dropping irrelevant posts or log lines by keyword or predicate
before further processing.
Anomaly alerting: threshold filters that pass only readings outside normal operating ranges.
Duplicate suppression: Bloom filters used to discard elements that have (probably) been seen before.
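A minimal Bloom filter sketch in Python, using SHA-1 with different prefixes to simulate
independent hash functions; the bit-array size and number of hashes are illustrative choices:

import hashlib

class BloomFilter:
    """Minimal Bloom filter: set-membership test with no false negatives
    but a small false-positive probability."""

    def __init__(self, num_bits: int = 10_000, num_hashes: int = 4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)

    def _positions(self, item):
        for i in range(self.num_hashes):
            digest = hashlib.sha1(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item) -> bool:
        return all(self.bits[pos] for pos in self._positions(item))

A lookup that returns False is guaranteed correct (no false negatives); a lookup that returns
True may occasionally be a false positive, which is the price of the small memory footprint.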
4. Counting Distinct Elements in a Stream
Counting distinct elements in a data stream, or estimating its cardinality, is challenging due to
memory constraints. Exact counting would require storing all elements, so approximation
techniques are commonly used.
Approximation Techniques:
Flajolet-Martin (FM): hashes each element and tracks the maximum number of trailing zero
bits seen; 2 raised to that maximum estimates the number of distinct elements (a small
sketch follows below).
HyperLogLog: a refinement of the same idea that combines many such estimators to give
accurate cardinality estimates using only a few kilobytes of memory.
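A simplified Flajolet-Martin-style estimator in Python; the number of hash functions and the use
of SHA-1 are illustrative assumptions, and production systems typically use HyperLogLog instead:

import hashlib

def trailing_zeros(x: int) -> int:
    """Number of trailing zero bits in x (returns 0 for x == 0)."""
    count = 0
    while x and x & 1 == 0:
        x >>= 1
        count += 1
    return count

def fm_distinct_estimate(stream, num_hashes: int = 16) -> float:
    """Flajolet-Martin style estimate of the number of distinct elements.

    Tracks, per hash function, the maximum number of trailing zeros seen;
    2**max is one estimate, and the estimates are averaged here for simplicity
    (real implementations combine them more carefully, as HyperLogLog does)."""
    max_zeros = [0] * num_hashes
    for item in stream:
        for i in range(num_hashes):
            digest = hashlib.sha1(f"{i}:{item}".encode()).digest()
            value = int.from_bytes(digest[:8], "big")
            max_zeros[i] = max(max_zeros[i], trailing_zeros(value))
    return sum(2 ** r for r in max_zeros) / num_hashes

# Example: a stream with about 500 distinct values yields an estimate on that order.
print(fm_distinct_estimate(str(x % 500) for x in range(10_000)))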
5. Estimating Moments
Estimating moments in data streams helps quantify various aspects of the distribution of
incoming data, such as its mean, variance, and higher-order moments. Moments are useful for
understanding the shape and spread of the data distribution in real time.
First Moment (Mean): The average of the stream data, used to estimate the central
tendency.
Second Moment (Variance): Measures the spread of the data, useful for detecting
anomalies or shifts in data distribution.
Higher-Order Moments: Indicate the skewness (third moment) or kurtosis (fourth
moment), which are helpful in identifying data distributions with unusual characteristics.
Estimating Techniques:
Online (single-pass) updates: running totals such as Welford's algorithm maintain the mean
and variance incrementally as each element arrives (a short sketch follows below).
Exponentially weighted moments: recent elements are weighted more heavily, combining moment
estimation with the decaying-window idea described later.
Sampling-based estimates: moments computed over a reservoir sample when the stream is too
large to touch in full.
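A minimal single-pass estimator of the first two moments using Welford's algorithm; the class
name is illustrative:

class RunningMoments:
    """Online (single-pass) estimates of mean and variance (Welford's algorithm)."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0            # sum of squared deviations from the running mean

    def update(self, x: float) -> None:
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    @property
    def variance(self) -> float:
        return self.m2 / (self.n - 1) if self.n > 1 else 0.0

# Example: feed the stream element by element.
stats = RunningMoments()
for x in [2.0, 4.0, 4.0, 5.0, 7.0]:
    stats.update(x)
print(stats.mean, stats.variance)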
These stream computing techniques provide the efficiency needed for real-time applications by
processing data incrementally and memory-efficiently, allowing for immediate and actionable
insights from fast-moving data streams.
In real-time data analytics, several sophisticated techniques allow systems to analyze high-
velocity data streams efficiently. These include counting ones in a window, decaying windows,
and Real-Time Analytics Platform (RTAP) applications, each addressing specific needs in stream
processing and analytics.
1. Counting Ones in a Window
Counting ones refers to tracking how many occurrences of a particular value (often the binary
value "1") fall within a fixed window of a data stream. This is essential in applications where
the frequency of certain events must be tracked over a limited time span or within a set of the
latest observations (a minimal sliding-window counter is sketched after the list of window
types below).
Types of Windows for Counting:
Fixed-Time Window: Counts the occurrences of "1" within a specific time frame (e.g.,
in the past 10 minutes). This is commonly used in systems that need time-bound
statistics.
Sliding Window: Continuously updates the count by removing old data points and
adding new ones, which maintains a real-time count of occurrences within a rolling
window.
Count-Based Window: Counts the number of occurrences within a specified number of
recent data points (e.g., the last 100 records), regardless of when they arrived.
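A minimal exact counter over a count-based sliding window; approximate schemes such as the DGIM
algorithm reduce memory further for very large windows, but an exact deque-based count is often
sufficient and keeps the sketch simple:

from collections import deque

class SlidingOnesCounter:
    """Exact count of 1s among the last `window_size` bits of a stream."""

    def __init__(self, window_size: int):
        self.window = deque(maxlen=window_size)
        self.ones = 0

    def add(self, bit: int) -> int:
        # The oldest bit is about to be evicted once the window is full.
        if len(self.window) == self.window.maxlen and self.window[0] == 1:
            self.ones -= 1
        self.window.append(bit)
        if bit == 1:
            self.ones += 1
        return self.ones

# Example: count of 1s among the last 4 bits as the stream arrives.
counter = SlidingOnesCounter(window_size=4)
for bit in [1, 0, 1, 1, 0, 1]:
    print(counter.add(bit))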
Applications:
Network Traffic Analysis: Counting certain packets (like error flags) in network traffic.
Social Media Monitoring: Counting mentions or specific keywords within a given time
window.
Manufacturing: Counting defect occurrences in real-time within a production batch.
2. Decaying Window
A decaying window applies a weighting mechanism to reduce the influence of older data points
in a data stream. This technique is helpful in scenarios where more recent data is more valuable
than older data, and it ensures that old information doesn’t unduly influence current decisions.
Decay Methods:
Exponential decay: whenever a new element arrives, every existing weight is multiplied by
(1 - c) for a small constant c, and the new element enters with weight 1, so older elements
fade smoothly rather than being dropped all at once (a short sketch follows below).
Comparison with sliding windows: a sliding window discards old data abruptly at the window
boundary, whereas a decaying window lets its influence taper off gradually.
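A minimal exponentially decaying counter in Python; the decay constant c is an illustrative choice:

class DecayingCounter:
    """Exponentially decaying count: each new arrival multiplies the existing
    total by (1 - c), so older observations gradually lose influence."""

    def __init__(self, c: float = 0.01):
        self.c = c
        self.value = 0.0

    def add(self, weight: float = 1.0) -> float:
        self.value = self.value * (1 - self.c) + weight
        return self.value

# Example: count recent occurrences of an event, favouring the newest arrivals.
recent = DecayingCounter(c=0.1)
for _ in range(5):
    print(recent.add(1.0))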
Applications:
Financial Market Analysis: Real-time stock analytics, where recent trends are more
critical.
Customer Behavior Analysis: Tracking user interactions, where recent actions are
weighted more heavily in predicting user behavior.
Predictive Maintenance: In industrial settings, decaying windows can emphasize recent
readings for condition monitoring, as more recent anomalies often suggest imminent
failure.
3. Real-Time Analytics Platform (RTAP) Applications
A Real-Time Analytics Platform (RTAP) combines ingestion, processing, storage, and visualization
so that insights are available moments after the underlying events occur.
Components of an RTAP:
Data Ingestion Layer: Collects data from various sources, often through message
brokers like Apache Kafka, which handle high-throughput data flow.
Stream Processing Engine: Processes incoming data in real time, performing
transformations, filtering, and aggregations using tools such as Apache Flink, Apache
Storm, or Spark Streaming.
Storage Layer: A scalable database to store processed results, often in NoSQL systems
(e.g., Cassandra, HBase) or time-series databases (e.g., InfluxDB).
Analytics and Visualization Layer: Provides dashboards, alerts, and other forms of
visualization to interpret results in real-time, typically using BI tools like Tableau,
Kibana, or Grafana.
RTAP Applications:
Fraud detection: flagging suspicious transactions within seconds of their occurrence.
Network monitoring: spotting traffic spikes, outages, or error bursts as they happen.
Real-time recommendations: updating suggestions as user behaviour changes.
Operational dashboards: live views of business or system metrics for immediate action.
These techniques and platforms support the demands of real-time analytics by focusing on
efficient data handling, prioritizing recent data, and facilitating instant decision-making across
diverse applications.
Case Studies in Real-Time Data Analytics
Real-time data analytics has been transformative in industries such as finance and social media,
where immediate insights can drive crucial decisions. Below are two case studies that illustrate
the use of real-time analytics in sentiment analysis and stock market predictions.
1. Sentiment Analysis
Objective:
To monitor and analyze social media sentiment in real-time, providing insights into public
opinion and trends.
Overview:
Real-time sentiment analysis uses machine learning and natural language processing (NLP) to
track and analyze text data from sources like Twitter, Facebook, and news websites. Companies
and brands can quickly identify shifts in public sentiment, whether positive, neutral, or negative,
and respond proactively to public perception.
System Architecture:
Data Ingestion: Data is collected from various social media channels using APIs like
Twitter’s Streaming API or web scraping methods.
Text Preprocessing: Raw text is cleaned by removing stop words, punctuation, and
performing tokenization.
Sentiment Analysis Model: NLP models, such as pre-trained BERT or LSTM networks,
classify each piece of text as positive, neutral, or negative. Alternatively, simpler lexicon-
based models are used for faster, rule-based analysis (a tiny lexicon-based scorer is sketched
after this list).
Stream Processing: Tools like Apache Kafka handle data ingestion, while Apache Flink
or Spark Streaming enables the processing pipeline.
Visualization and Reporting: Sentiment scores and trends are displayed on dashboards
using tools like Tableau, Kibana, or Grafana for real-time insights.
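As a rough illustration of the lexicon-based option, the sketch below labels each incoming
message by counting positive and negative words; the word lists are tiny placeholders, and a real
deployment would use a full lexicon or a trained model:

# Tiny placeholder lexicons for illustration only.
POSITIVE = {"great", "love", "fast", "happy", "excellent"}
NEGATIVE = {"bad", "slow", "broken", "angry", "terrible"}

def sentiment(text: str) -> str:
    """Classify one message as positive, negative, or neutral by word counts."""
    words = set(text.lower().split())
    score = len(words & POSITIVE) - len(words & NEGATIVE)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

def label_stream(messages):
    """Attach a sentiment label to each incoming message."""
    for msg in messages:
        yield msg, sentiment(msg)

# Example:
for msg, label in label_stream(["Love the new release", "Checkout is broken"]):
    print(label, "-", msg)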
Example Case:
A large retail company implemented a real-time sentiment analysis system to track customer
reactions during a holiday sales event. By monitoring real-time feedback, the company could
quickly identify issues with product availability and customer service, deploying resources to
resolve issues immediately and thereby improving overall customer satisfaction.
2. Stock Market Predictions
Objective:
To predict stock price movements using real-time market data and external indicators like
financial news and social media sentiment.
Overview:
Stock market prediction models combine historical price data with real-time information sources,
such as financial news, social media sentiment, and economic indicators, to make short-term
predictions. Accurate, high-frequency predictions can help traders and financial institutions make
fast, data-driven trading decisions.
System Architecture:
Data Sources: Real-time data from stock exchanges, financial news sources, and social
media platforms is collected.
Preprocessing and Feature Engineering: Time-series data from stock prices is
normalized and cleaned. External text data, such as financial news, is preprocessed and
categorized by sentiment.
Predictive Modeling: Machine learning models like ARIMA, LSTM networks, or
reinforcement learning algorithms are used to predict stock price movements. Some
systems integrate sentiment scores from news and social media as additional features for
prediction (a toy moving-average signal generator is sketched after this list).
Streaming and Processing: Real-time data is ingested and processed using platforms
like Apache Kafka, which sends data to a model inference pipeline.
Visualization and Execution: Predicted prices are visualized on real-time dashboards,
and trading signals are sent to automated trading systems for execution.
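Purely as an illustration of turning a price stream into trading signals (and not the actual
method used in the case below), the sketch compares short- and long-horizon moving averages over
streaming prices; the horizon lengths are arbitrary:

from collections import deque

class CrossoverSignal:
    """Toy trading signal: compare short and long moving averages of streaming prices."""

    def __init__(self, short: int = 5, long: int = 20):
        self.short = deque(maxlen=short)
        self.long = deque(maxlen=long)

    def update(self, price: float) -> str:
        self.short.append(price)
        self.long.append(price)
        short_avg = sum(self.short) / len(self.short)
        long_avg = sum(self.long) / len(self.long)
        if short_avg > long_avg:
            return "buy"
        if short_avg < long_avg:
            return "sell"
        return "hold"

# Example: emit a signal for each new price tick.
signal = CrossoverSignal(short=3, long=5)
for price in [100.0, 101.5, 102.0, 101.0, 99.5, 103.0]:
    print(price, signal.update(price))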
Example Case:
A hedge fund developed a stock prediction system that incorporated sentiment analysis from
financial news. By analyzing the sentiment surrounding specific stocks, the system generated
signals for short-term price movements, allowing the fund to adjust its trading strategies in real-
time. This approach resulted in higher returns as the system could respond to emerging news that
impacted stock prices before the broader market reacted.
Key Takeaways
These case studies illustrate the power of real-time data analytics in transforming traditional
methods into proactive, data-driven systems that support rapid, informed decision-making.