SWE2011 - BIG DATA ANALYTICS

FALL 2024-25
Dr.J.Jagannathan
Assistant Professor Sr.Grade 1
School of Computer Science Engineering and Information Systems
Vellore Institute of Technology - Vellore
07-10-2024
MODULE III

Stream Data Mining

Introduction to Streams Concepts – Stream data model and architecture –
Stream Computing – Sampling data in a stream – Filtering streams – Counting
distinct elements in a stream – Estimating moments – Counting ones in a
window – Decaying window – Real-Time Analytics Platform (RTAP) applications.

4 Vs OF BIG DATA

INFINITE DATA

• High dim. data: Locality sensitive hashing; Clustering; Dimensionality reduction
• Graph data: PageRank, SimRank; Community Detection; Spam Detection
• Infinite data: Filtering data streams; Queries on streams; Web advertising
• Machine learning: SVM; Decision Trees; Perceptron, kNN
• Apps: Recommender systems; Association Rules; Duplicate document detection

STREAM CONCEPTS

• Applications that handle large volumes of data have become increasingly common.
Instead of storing this data permanently, these applications work with data
that arrives continuously and changes rapidly.
• This type of data is called a “transient data stream.”
• Examples include:
– Financial transactions
– Network monitoring
– Sensor data from devices
– Web applications
– Manufacturing

STREAM CONCEPTS

• In the data stream model, individual data items may be relational
tuples, e.g., network measurements, call records, web page visits, sensor
readings, and so on.
• However, their continuous arrival in multiple, rapid, time-varying,
possibly unpredictable and unbounded streams appears to yield
some fundamentally new research problems.

DATA STREAM MODEL

• A data stream is a real-time, continuous, ordered sequence of items.
• It is not possible to control the order in which the items arrive, nor is it
feasible to store a stream locally in its entirety on any memory device.
• Further, a query over streams runs continuously over a
period of time and returns new results as new data arrives.
• Such queries are therefore known as long-running, continuous, standing,
or persistent queries.

CHARACTERISTICS

1. The data model and query processor must allow both order-based and time-based
operations.
2. Because a complete stream cannot be stored, approximate summary
structures must be used.
3. Streaming query plans must not use operators that require the entire input before
any results are produced. Such operators would block the query processor indefinitely.
4. Any query that requires backtracking over a data stream is infeasible, because of the
storage and performance constraints imposed by the stream.
5. Applications that monitor streams in real time must react quickly to unusual data values.
6. Scalability requirements dictate that parallel and shared execution of many continuous
queries must be possible.

ARCHITECTURE

• An input monitor may regulate the input streams, perhaps by
dropping packets.
• Data are typically stored in three partitions:
1. Temporary working storage (for window queries)
2. Summary storage
3. Static storage for meta-data (e.g., the physical location of each source)

ARCHITECTURE

• Long-running queries are registered in the query repository and placed into groups
for shared processing.
• The query processor communicates with the input monitor and may reoptimize the
query plans in response to changing input rates. Results are streamed to the user or
temporarily buffered.
• Data-stream-management system: by analogy with a database-management system,
we can view a stream processor as a kind of data-management system.

ARCHITECTURE

• Any number of streams can enter the system.
• Each stream can provide elements at its own schedule; they need not
have the same data rates or data types, and the time between elements
of one stream need not be uniform.
• The fact that the rate of arrival of stream elements is not under the
control of the system distinguishes stream processing from the
processing of data that goes on within a database-management system.

ARCHITECTURE

• Streams may be archived in a large archival store, but we assume it is not possible to
answer queries from the archival store. It could be examined only under special
circumstances using time-consuming retrieval processes.
• There is also a working store, into which summaries or parts of streams may be
placed, and which can be used for answering queries.
• The working store might be disk, or it might be main memory, depending on how fast
we need to process queries.
EXAMPLES OF DATA STREAM APPLICATIONS

• Sensor Networks – Alerts and alarms are generated in response to information received from sensors.
Example: join several streams, such as temperature and ocean-current streams at weather
stations, to issue alerts or warnings for events like cyclones or tsunamis.
• Financial Applications – Online analysis of stock prices to make hold or sell decisions requires
quickly identifying correlations and fast-changing trends.
• Log Analysis – Web usage logs, telephone call records, and ATM transactions are
examples of data streams that can be mined online. The goal is to find customer behavior patterns.
Example: identify the current buying patterns of users on a website and plan advertising campaigns and recommendations.
• Image Data – Satellites often send down to earth streams consisting of many terabytes of images per
day. Surveillance cameras produce images with lower resolution than satellites, but there can be many
of them, each producing a stream of images at intervals like one second.

KINDS OF STREAM PROCESSING TECHNIQUES

• Sampling data in a Stream
– To create a sample of a stream that is usable for a class of queries
• Filtering Data Stream
– To accept only the elements of the stream that meet a criterion
• Counting distinct elements in a Stream
– To estimate the number of different elements appearing in a stream
• Estimating moments
– Involves the distribution of frequencies of different elements in a stream
• Counting Ones in a Window
– To count the number of 1’s in a window of a binary stream

STREAM COMPUTING

• Stream Queries
– There are two ways that queries get asked about streams.
– One-time queries (a class that includes traditional DBMS queries) are evaluated once
over a point-in-time snapshot of the data.
Example – Alert when a stock crosses over a price point.
– Continuous queries, on the other hand, are evaluated continuously as data streams continue to
arrive.
Example – Aggregation queries such as maximum, average, and count: the maximum price of a stock
every hour, or the number of times a stock gains over a particular point.

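A continuous query of the "maximum price every hour" kind can be sketched as a generator that updates its answer as each element arrives. This is only an illustration: the function name `hourly_max`, the `(timestamp_in_seconds, price)` tuple layout, and the choice to emit a result whenever the hourly maximum changes are assumptions, not part of the slides.

```python
def hourly_max(ticks):
    """Continuous query sketch: emit (hour, max_price) each time the
    running maximum for an hour changes.

    `ticks` is any iterable of (timestamp_in_seconds, price) pairs
    (hypothetical layout chosen for this example).
    """
    best = {}  # hour -> highest price seen so far in that hour
    for timestamp, price in ticks:
        hour = timestamp // 3600
        if hour not in best or price > best[hour]:
            best[hour] = price
            yield hour, price  # new result produced as new data arrives


ticks = [(0, 10.0), (1800, 12.5), (3000, 11.0), (3700, 9.0), (4000, 9.5)]
for result in hourly_max(ticks):
    print(result)
```

The query never needs the whole input before producing output, which is what makes it usable on an unbounded stream. A production version would also discard hours that have closed, so that `best` does not grow without bound.
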
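ISSUES IN STREAM PROCESSING

• Streams often deliver elements very rapidly, so elements must be processed as they arrive.
• Stream-processing algorithms are therefore executed in main memory, with little or no
access to secondary storage.
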
SAMPLING DATA IN A STREAM

• The process of collecting a representative set of samples from the entire
stream.
• The sample is usually much smaller than the entire stream.
• Retains the significant characteristics and behaviors of the stream.
• Used to estimate or predict many crucial aggregates on the stream.

SAMPLING TECHNIQUES

1. Fixed Proportion Sampling

• Samples a fixed proportion of the data
• Used when you are aware of the length of the data
• Ensures a representative sample
• Useful for large volumes
• Less biased than fixed size sampling
• May lead to under- or over-representation

SAMPLING TECHNIQUES

1. Fixed Proportion Sampling

• A social media platform wants to analyze the sentiments of its users towards a topic. They
receive millions of tweets per day and use fixed proportion sampling to select a
representative sample. They randomly select 1% of the tweets received each hour,
ensuring a representative sample for statistical analysis of user sentiments towards the
topic.

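The 1% selection in the example above can be sketched as an independent coin flip per element. The function name `sample_fixed_proportion` and the optional `seed` parameter are assumptions made for this illustration.

```python
import random


def sample_fixed_proportion(stream, proportion, seed=None):
    """Keep each element independently with probability `proportion`.

    Memory use is constant because nothing is buffered; the sample grows
    in step with the stream (about proportion * length elements).
    """
    rng = random.Random(seed)  # seed only makes the sketch reproducible
    for item in stream:
        if rng.random() < proportion:
            yield item


tweets = (f"tweet-{i}" for i in range(10_000))  # stand-in for one hour of tweets
sample = list(sample_fixed_proportion(tweets, 0.01, seed=42))
print(len(sample))  # close to 100, but not exactly 100 in general
```

Because each decision is independent, the sample size is only approximately 1% of the stream; that variability is the price of not needing to know the stream length in advance.
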
SAMPLING TECHNIQUES

2. Fixed Size Sampling

• Samples a fixed number of data points.
• Does not guarantee a representative sample.
• Useful for reducing data volume.
• Can be biased if the data is not randomly distributed
• Less effective when the data size increases

SAMPLING TECHNIQUES

2. Fixed Size Sampling

Suppose we have a data stream of customer orders for an online store, with 10,000
orders coming in every hour. Using fixed size sampling, we randomly select 1,000 orders
from each hour's data stream for analysis, thus reducing the total number of data points
to process from 10,000 to 1,000 per hour.
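
One common way to draw a fixed-size uniform sample without knowing the stream length in advance is reservoir sampling (Algorithm R). The slides do not name a specific algorithm, so treat this as one possible sketch; the function name and `seed` parameter are assumptions.

```python
import random


def reservoir_sample(stream, k, seed=None):
    """Return k elements chosen uniformly at random from `stream`.

    Uses O(k) memory regardless of how long the stream is.
    """
    rng = random.Random(seed)
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            reservoir.append(item)      # fill the reservoir first
        else:
            j = rng.randint(0, i)       # inclusive on both ends
            if j < k:
                reservoir[j] = item     # replace with probability k / (i + 1)
    return reservoir


orders = range(10_000)                   # stand-in for one hour of orders
sample = reservoir_sample(orders, 1_000, seed=1)
print(len(sample))                       # 1000
```
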

SAMPLING TECHNIQUES

3. Biased Reservoir Sampling

• Used in streams to select a subset of the data in a way that is not uniformly random.
• Can lead to a biased sample that may not be representative of the full dataset.
• The selection of elements is based on a predetermined probability distribution that may be
weighted towards certain elements or groups of elements.
• The probability distribution used for biased reservoir sampling may be based on various factors,
such as the frequency of occurrence of certain types of data or the importance of certain data
points.
• Used when there are constraints on the resources available for sampling, such as limited memory
or computational power.
• It is important to carefully consider the potential biases introduced by this sampling technique and
adjust the analysis accordingly.

SAMPLING TECHNIQUES

3. Biased Reservoir Sampling

• Suppose we have a data stream of product ratings, and we want to select a sample of
ratings to estimate the average rating of a product. However, we know that some users
tend to give higher ratings than others. Using biased reservoir sampling, we can assign a
higher probability of selection to ratings from users who tend to give more accurate
ratings. This way, our sample is more likely to represent the true average rating of the
product.

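The slides do not specify how the selection probabilities are applied. One well-known weighted scheme that fits the description is the Efraimidis-Spirakis "A-Res" method, sketched below as a swapped-in technique rather than the method the slides intend. The per-rating `weight` (a hypothetical reliability score), the function name, and the `seed` parameter are all assumptions.

```python
import heapq
import random


def weighted_reservoir_sample(stream, k, seed=None):
    """A-Res sketch: keep the k items with the largest keys u ** (1 / weight),
    where u is uniform in [0, 1). Larger weights make large keys more likely.

    `stream` yields (item, weight) pairs with weight > 0.
    """
    rng = random.Random(seed)
    heap = []  # min-heap of (key, arrival_index, item); smallest key on top
    for n, (item, weight) in enumerate(stream):
        key = rng.random() ** (1.0 / weight)
        entry = (key, n, item)           # arrival index breaks ties between keys
        if len(heap) < k:
            heapq.heappush(heap, entry)
        elif key > heap[0][0]:
            heapq.heapreplace(heap, entry)
    return [item for _, _, item in heap]


# Hypothetical data: even-numbered users are treated as five times more reliable.
ratings = [((f"user{i}", 4), 5.0 if i % 2 == 0 else 1.0) for i in range(1000)]
sample = weighted_reservoir_sample(ratings, 100, seed=7)
print(len(sample))  # 100
```

Because the sample is deliberately skewed towards high-weight items, any average computed from it should be interpreted with that bias in mind.
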
SAMPLING TECHNIQUES

4. Concise Sampling
• The goal is to maintain a small reservoir of a fixed size while still achieving representative
sampling of the data stream.
• The number of samples that can be stored in memory at a given time is limited, which
can be a challenge when dealing with large data streams.
• The size of the sample may need to be adjusted based on the amount of memory
available to store the data.
• Instead of selecting samples randomly, the sampling algorithm may prioritize choosing
samples with unique or representative values of a particular attribute in the data
stream.

SAMPLING TECHNIQUES

4. Concise Sampling
• A bank wants to analyze customer spending habits from a stream of transactions.
• They use concise sampling to choose distinct customer IDs as their attribute.
• The size of the reservoir is limited to 1000 customers.
• They adjust the sample size based on available memory.
• This allows for efficient analysis while maintaining accuracy.

FILTERING STREAMS

• Bloom Filtering
– A space-efficient data structure
– Used to check whether an element belongs to a set
– A positive answer means the element is probably in the set (false positives are possible)
– A negative answer means the element is definitely not in the set (there are no
false negatives)
– Hence, the recall rate is 100%

BLOOM FILTERING

• Example
– Insert elements 10 and 7 into a Bloom filter of size 5. Consider
these two hash functions:
– h1(x) = x mod 5
– h2(x) = (2x + 6) mod 5
– Comment on the presence of elements 14 and 15.

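The exercise above can be worked through with a small sketch. The class name `BloomFilter` and its method names are illustrative choices; the filter size and the two hash functions are the ones given in the example.

```python
class BloomFilter:
    """Minimal Bloom filter over a fixed-size bit array."""

    def __init__(self, size, hash_functions):
        self.bits = [0] * size
        self.hash_functions = hash_functions

    def add(self, x):
        # Set every bit position that the hash functions map x to.
        for h in self.hash_functions:
            self.bits[h(x)] = 1

    def might_contain(self, x):
        # True only if all of x's positions are set; may be a false positive.
        return all(self.bits[h(x)] for h in self.hash_functions)


def h1(x):
    return x % 5


def h2(x):
    return (2 * x + 6) % 5


bf = BloomFilter(5, [h1, h2])
for element in (10, 7):
    bf.add(element)

print(bf.bits)               # [1, 1, 1, 0, 0]
print(bf.might_contain(14))  # False: 14 maps to bit 4, which is 0
print(bf.might_contain(15))  # True: bits 0 and 1 are set, a false positive
```

Element 10 sets bits 0 and 1, and element 7 sets bits 2 and 0. Element 14 is therefore definitely absent, while 15 is reported as probably present even though it was never inserted.
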
BLOOM FILTERING
• The underlying concept is to use main memory as a bit array.
• With 1 GB of main memory, we have room for 8 billion
bits.
• Devise a hash function ‘h’, hash each member of the set ‘S’ to a bit, and
set that bit to ‘1’. All the other bits of the array remain ‘0’.
• Suppose ‘S’ has 1 billion members; then approximately
1/8th of the bits will be ‘1’.
• The exact fraction of bits set to ‘1’ will be slightly less than 1/8th,
because two members of ‘S’ may hash to the same
bit.

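The "slightly less than 1/8th" claim can be checked with a short calculation. Assuming the hash function spreads the $m = 10^9$ members uniformly and independently over the $n = 8 \times 10^9$ bits (an idealization), the probability that a given bit ends up set is:

```latex
\Pr[\text{bit} = 1]
  = 1 - \left(1 - \frac{1}{n}\right)^{m}
  \approx 1 - e^{-m/n}
  = 1 - e^{-1/8}
  \approx 0.1175 \;<\; 0.125 = \tfrac{1}{8}
```
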
COUNTING DISTINCT PROBLEM
• The data stream consists of elements chosen
from a universal set of size N
– Maintain a count of the number of distinct elements seen so
far
• Obvious approach: maintain the set of elements seen so far
– That is, keep a hash table of all the distinct elements seen so
far
– Hashing and a variety of algorithms can be used

APPLICATIONS
• A Web site gathering statistics on how many unique
users it has seen in each given month.
– The universal set is the set of logins for that site, and a stream
element is generated each time someone logs in.
– This measure is appropriate for a site like Amazon,
where the typical user logs in with their unique login name.
• A Web site like Google, which does not require login to issue a
search query,
– may be able to identify users only by the IP address from
which they send the query.
– There are about 4 billion IP addresses; sequences of
four 8-bit bytes serve as the universal set in this case.

SOLUTION

• The obvious way to solve the problem is to keep in main memory a list of all the
elements seen so far in the stream.
• Adopt an efficient search structure such as a hash table or search tree, so one
can quickly add new elements and check whether or not the element that just
arrived on the stream was already seen.
• As long as the number of distinct elements is not too great, this structure can fit
in main memory and there is little problem obtaining an exact answer to the
question of how many distinct elements appear in the stream.
• When the number of distinct elements is too great for main memory, the count is
estimated instead. Approach: Flajolet-Martin Algorithm

FLAJOLET-MARTIN ALGORITHM

• Addresses the problem of counting the distinct elements in a stream of data with
repetitions
• Applications:
– IP addresses of packets passing through a router
– Motifs in a DNA sequence
– Unique visitors to apps/websites

FLAJOLET-MARTIN ALGORITHM

• S is the stream of elements with repetitions, and N is the set of distinct elements
• S : {x1, x2, x3, x4, ..., xn}, then N : {a1, a2, a3, a4, ..., ak} (where k <= n)
• For instance, S : {1,2,4,3,4,3,2,4,1,2}, then N : {1,2,3,4} and F : 4
• where F is the total number of distinct elements

COUNTING DISTINCT ELEMENTS IN A STREAM – NAÏVE SOLUTION

SET COUNTER = 0
SET UNIQUE SET = [ ]
WHILE COUNTER IS LESS THAN THE NUMBER OF ELEMENTS:
    CURRENT ELEMENT = ELEMENT AT INDEX COUNTER
    IF CURRENT ELEMENT NOT PRESENT IN UNIQUE SET:
        ADD CURRENT ELEMENT TO UNIQUE SET
    INCREMENT THE COUNTER
DISPLAY COUNT OF DISTINCT ELEMENTS : LENGTH (UNIQUE SET)

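The naïve solution translates directly into a few lines of Python. The function name `count_distinct_exact` is an illustrative choice; the sample stream is the one used earlier in these slides.

```python
def count_distinct_exact(stream):
    """Exact distinct count: remembers every distinct element seen,
    so memory grows with the number of distinct elements."""
    seen = set()
    for element in stream:
        seen.add(element)  # adding an element already present has no effect
    return len(seen)


S = [1, 2, 4, 3, 4, 3, 2, 4, 1, 2]
print(count_distinct_exact(S))  # 4
```
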
FLAJOLET-MARTIN ALGORITHM

• Finds the approximate number of distinct elements in a stream
• Works in a single pass
• Uses very little memory while executing
• Hence, efficient and robust
Note: This algorithm is meant to be used when both the stream of
elements and the expected distinct element count are very
large

PSEUDOCODE/ALGORITHM

1. Select a hash function h(x) that maps each element in the set to a value
of at least log2(n) bits
2. Convert each h(x) output to binary_value
3. For each binary_value, find r(binary_value): the number of trailing zeroes in
binary_value
4. Find R = max(r(binary_value))
5. Finally, the approximate count of distinct elements is 2^R

FLAJOLET-MARTIN ALGORITHM

SET COUNTER = 0
SET MAX_R = 0
WHILE COUNTER IS LESS THAN THE NUMBER OF ELEMENTS:
    VAL = BINARY OF HASH OUTPUT OF CURRENT ELEMENT
    COUNT = NO. OF TRAILING ZEROES IN VAL
    IF COUNT > MAX_R:
        MAX_R = COUNT
    INCREMENT THE COUNTER
DISPLAY APPX. COUNT OF DISTINCT ELEMENTS : 2 ** (MAX_R)

EXAMPLE

• S = 1, 3, 2, 1, 2, 3, 4, 3, 1, 2, 3, 1
• h(x) = (6x + 1) mod 5

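The example can be traced with a short sketch of the Flajolet-Martin steps. The function names are illustrative, and treating a hash value of 0 as having zero trailing zeroes is an assumed convention (the slides do not say how to handle it).

```python
def trailing_zeros(value):
    """Number of trailing 0 bits in the binary form of `value`.
    Assumed convention: a hash value of 0 counts as 0 trailing zeroes."""
    if value == 0:
        return 0
    count = 0
    while value & 1 == 0:
        value >>= 1
        count += 1
    return count


def flajolet_martin(stream, hash_function):
    """Single-pass estimate of the distinct count: 2 ** R, where R is the
    largest number of trailing zeroes seen in any hash value."""
    max_r = 0
    for element in stream:
        max_r = max(max_r, trailing_zeros(hash_function(element)))
    return 2 ** max_r


S = [1, 3, 2, 1, 2, 3, 4, 3, 1, 2, 3, 1]


def h(x):
    return (6 * x + 1) % 5


# h(1)=2 (010), h(3)=4 (100), h(2)=3 (011), h(4)=0 (000)
print(flajolet_martin(S, h))  # 4
```

Here the largest trailing-zero count is 2 (from h(3) = 100), so the estimate is 2^2 = 4, which happens to match the true number of distinct elements. With a single hash function the estimate is always a power of two, so in general it can be off by a sizeable factor.
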
COUNTING ONES IN A WINDOW

• Consider a window of length N on a binary stream.
• We focus on the situation where we cannot afford to store the entire
window.
• We want at all times to be able to answer queries of the form
“how many 1’s are there in the last k bits?” for any k ≤ N.
• A solution is provided by the Datar-Gionis-Indyk-Motwani (DGIM) algorithm.

DATAR-GIONIS-INDYK-MOTWANI ALGORITHM (DGIM)

• Commonly called the Motwani algorithm or DGIM
• Used to estimate the number of 1’s in a window of a binary stream
• Uses O(log^2 N) bits to represent a window of N bits
• The error is no more than 50%

ELEMENTS
• Timestamp
– Each bit entering the stream is assigned a timestamp based on its position
– Example: if the first bit has timestamp 1, then the second bit has timestamp 2, the third bit 3, and
so on
• Buckets
– Used to represent time intervals in a data stream
– The algorithm divides the window into buckets, each with a size that is a power of 2
– A bucket contains the bits 0 and 1

RULES FOR FORMING A BUCKET

1. Every bucket must contain at least one 1
2. The right end of every bucket must be a 1
3. The size of a bucket is the number of 1’s in it
4. Every bucket size must be a power of 2
5. Moving to the left (back in time), bucket sizes must not decrease
6. No more than two buckets can have the same size

EXAMPLE

• Consider the following stream:
• Stream = ... 10101111001001100101…
• N = 20

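A minimal sketch of DGIM applied to this example follows. It assumes the 20 bits shown are read left to right as oldest to newest and that nothing precedes them; the class name `DGIM`, its method names, and the choice to count half of the oldest overlapping bucket (returned as a float) are illustrative.

```python
class DGIM:
    """Sketch of DGIM: approximate count of 1s in the last N bits."""

    def __init__(self, window_size):
        self.window_size = window_size
        self.time = 0
        self.buckets = []  # newest first: (timestamp of most recent 1, size)

    def add(self, bit):
        self.time += 1
        # Drop the oldest bucket once its most recent 1 leaves the window.
        if self.buckets and self.buckets[-1][0] <= self.time - self.window_size:
            self.buckets.pop()
        if bit != 1:
            return
        self.buckets.insert(0, (self.time, 1))
        # While three buckets share a size, merge the two oldest of them.
        i = 0
        while i + 2 < len(self.buckets) and self.buckets[i][1] == self.buckets[i + 2][1]:
            newer_ts, size = self.buckets[i + 1]
            self.buckets[i + 1:i + 3] = [(newer_ts, 2 * size)]
            i += 1

    def count_ones(self, k):
        """Estimate the number of 1s among the last k bits (k <= N)."""
        cutoff = self.time - k
        sizes = [size for ts, size in self.buckets if ts > cutoff]
        if not sizes:
            return 0
        # All buckets fully, except only half of the oldest one.
        return sum(sizes) - sizes[-1] / 2


dgim = DGIM(20)
for ch in "10101111001001100101":
    dgim.add(int(ch))

print(dgim.buckets)         # [(20, 1), (18, 2), (14, 4), (6, 4)]
print(dgim.count_ones(20))  # 9.0 (the true count is 11)
print(dgim.count_ones(10))  # 5.0 (the true count is 5)
```

The final buckets have sizes 1, 2, 4, and 4, which satisfies the bucket rules. The estimate for the whole window is 1 + 2 + 4 + 4/2 = 9 against a true count of 11, well inside the 50% error bound.
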
ESTIMATING MOMENTS

• A generalization of the problem of counting distinct
elements in a stream.
– The problem is called computing “moments”
• Involves the distribution of frequencies of different
elements in the stream.
• We shall define moments of all orders and concentrate
on computing second moments, from which the general
algorithm for all moments is a simple extension.

DEFINITION OF MOMENTS

• Suppose a stream consists of elements chosen from a universal set.
• Assume the universal set is ordered so we can speak of the ith element for any i.
• Let $m_i$ be the number of occurrences of the ith element for any i. Then the kth-order
moment (or just kth moment) of the stream is the sum over all i of $(m_i)^k$.
• kth moment: $\sum_i (m_i)^k$

COMPUTING DIFFERENT MOMENTS

• 0th moment = the number of different elements in the
stream.
• 1st moment = the sum of the counts of all elements in the
stream (the length of the stream)
• 2nd moment = the surprise number (a measure of how uneven
the distribution is)

ALON-MATIAS-SZEGEDY METHOD
• The AMS method works for all moments
• Gives an unbiased estimate
• We will just concentrate on the 2nd moment S
• We pick and keep track of many variables X:
– For each variable X we store X.el and X.val
  • X.el corresponds to the item i
  • X.val corresponds to the count of item i (counted from the position where X was picked)
– Note this requires a count in main memory, so the number of
Xs is limited
• Our goal is to compute $S = \sum_i (m_i)^2$

FIND THE SURPRISE NUMBER

• Given Stream : 7, 9, 8, 7, 8, 9, 9, 8, 9, 7

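The surprise number for this stream can be computed exactly and compared with the AMS estimate. In the sketch below the function names are illustrative, and one variable X is placed at every position of the stream purely to show that the estimates n * (2 * X.val - 1) average out to the true second moment; in practice only a limited number of randomly chosen positions would be tracked.

```python
from collections import Counter


def moment(stream, k):
    """Exact kth moment: sum over distinct elements of (count ** k)."""
    counts = Counter(stream)
    return sum(m ** k for m in counts.values())


def ams_estimates(stream):
    """One AMS estimate per start position p: X.el = stream[p], X.val =
    occurrences of X.el from p to the end, estimate = n * (2 * X.val - 1)."""
    n = len(stream)
    seen_from_here = Counter()
    estimates = []
    for element in reversed(stream):   # scan backwards to count suffix occurrences
        seen_from_here[element] += 1
        estimates.append(n * (2 * seen_from_here[element] - 1))
    estimates.reverse()                # restore stream order
    return estimates


stream = [7, 9, 8, 7, 8, 9, 9, 8, 9, 7]
print(moment(stream, 0), moment(stream, 1), moment(stream, 2))  # 3 10 34

estimates = ams_estimates(stream)
print(sum(estimates) / len(estimates))  # 34.0
```

The counts are 7 → 3, 9 → 4, and 8 → 3, so the surprise number is 3^2 + 4^2 + 3^2 = 34.
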
REAL-TIME ANALYTICS

• Refers to finding meaningful patterns in data at the actual time of
receiving it
• A Real-Time Analytics Platform (RTAP) analyses the data, correlates it,
and predicts outcomes in real time
• Manages and processes data and helps timely decision-making
• Helps to develop dynamic analysis applications
• Leads to the evolution of business intelligence

WIDELY USED RTAPS

• Apache Spark Streaming
• Cisco Connected Streaming Analytics (CSA)
• Oracle Stream Analytics (OSA)
• SAP HANA
• SQLstream Blaze
• TIBCO StreamBase
