
Module 2 Session 7 Counting of Ones in A Window Decaying Windows

The document discusses the sliding window model for data stream algorithms, focusing on counting the number of 1's in a recent data stream using the Datar-Gionis-Indyk-Motwani (DGIM) algorithm. It outlines the rules for bucket representation in the DGIM algorithm and the maintenance of these buckets as new bits arrive. Additionally, it introduces the concept of decaying windows for applications that prioritize recent data over older data.

Uploaded by

s903019.1265

CS6CRT19 Big Data Analytics Module 2

Counting of 1’s in a Window


The sliding window model is a popular model for processing unbounded data
streams. The window is the interval of the stream over which queries are raised
and answered. Data elements arrive one at a time, and statistics are computed
over a sliding window of size N (in time units) rather than over the whole
stream. The window always covers the most recently arrived data items.
Because a sliding window focuses on recent data, its results are more
relevant in real-world applications. Network traffic analysis, for example,
needs results based on the recent past, which are more informative and useful
than results based on stale data. A useful model of stream processing is one in
which queries are answered over a window of length N, where the window holds
the N most recent elements received. Often N is so large that the window cannot
be stored, or there are so many streams that the windows for all of them cannot
be stored together.
Consider a counting problem: given a stream of 0’s and 1’s, count the number
of 1’s in the last k bits received, where k <= N and N is the window length.
The obvious solution is to store the most recent N bits; when a new bit
arrives, discard the oldest bit. This gives the exact answer. If there is not
enough memory to store N bits (say N is 1 billion), an approximate answer can
be obtained with the Datar-Gionis-Indyk-Motwani (DGIM) algorithm.
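
The exact solution described above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original notes; the class name ExactWindowCounter and its methods are made up for the example. It stores all N bits, which is exactly the memory cost that DGIM avoids.

```python
from collections import deque


class ExactWindowCounter:
    """Keeps the last n bits of a 0/1 stream and counts 1's exactly."""

    def __init__(self, n):
        # A deque with maxlen=n silently discards the oldest bit
        # whenever a new bit is appended to a full window.
        self.window = deque(maxlen=n)

    def add(self, bit):
        self.window.append(bit)

    def count_ones(self, k):
        """Exact number of 1's among the last k bits (k <= n)."""
        if k <= 0:
            return 0
        return sum(list(self.window)[-k:])


counter = ExactWindowCounter(n=8)
for bit in [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]:
    counter.add(bit)

print(counter.count_ones(4))  # last 4 bits are 0,0,1,1 -> 2
print(counter.count_ones(8))  # last 8 bits are 1,1,0,1,0,0,1,1 -> 5
```

Memory use grows linearly with N here, so this approach is only practical when the whole window fits in memory.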

Swamy Saswathikananda College, Poothotta 1



Datar-Gionis-Indyk-Motwani (DGIM) algorithm


● Each bit that arrives in the stream has a timestamp, which is the position of
the bit in the stream.
○ The first bit has timestamp 1, the second bit has timestamp 2, and so on.
● Only positions within the window of length N need to be distinguished, so
timestamps can be stored modulo N.
● Take the window size N as a multiple of 2.
● Each timestamp can then be represented in log2 N bits.
● Divide the window into buckets, each consisting of:
○ The timestamp of its right (most recent) end.
○ The number of 1’s in the bucket. This number must be a power of 2.
○ The number of 1’s is referred to as the size of the bucket.
■ Example: the bit sequence 1001011 contains four 1’s, so a bucket
covering it has size 4.
Rules that must be followed when representing a stream by buckets:
● The right end of a bucket is always a position with a 1.
● Every position with a 1 must be in some bucket, and no position is in more
than one bucket.
● No bucket can be formed without a 1.
● All sizes must be a power of 2.
● There are one or two buckets of each size, up to the largest size.
● Bucket sizes must not decrease as we move to the left (back in time).

Figure 2-6: Dividing stream into buckets as per DGIM algorithm
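
As an illustration of the bucket rules, the following Python sketch checks a proposed bucket list against a short bit sequence. It is not part of the original notes: the function check_buckets, the (end_timestamp, size) tuple layout, and the sample data are assumptions made for this example. The first bucket reuses the 1001011 example from the text.

```python
from collections import Counter


def is_power_of_two(x):
    return x >= 1 and (x & (x - 1)) == 0


def check_buckets(bits, buckets):
    """Return a list of rule violations (empty if the buckets are valid).

    bits    -- list of 0/1 values; bits[0] has timestamp 1
    buckets -- list of (end_timestamp, size) pairs, oldest bucket first
    """
    problems = []
    for end, size in buckets:
        if bits[end - 1] != 1:
            problems.append(f"bucket ending at {end}: right end is not a 1")
        if not is_power_of_two(size):
            problems.append(f"bucket ending at {end}: size {size} is not a power of 2")
    sizes = [size for _, size in buckets]
    # Oldest first, so sizes must never grow from one bucket to the next.
    if any(older < newer for older, newer in zip(sizes, sizes[1:])):
        problems.append("sizes decrease toward the left")
    if any(n > 2 for n in Counter(sizes).values()):
        problems.append("more than two buckets share a size")
    # Every bucket after the oldest must hold exactly `size` 1's between
    # the previous bucket's right end and its own right end.
    for (prev_end, _), (end, size) in zip(buckets, buckets[1:]):
        ones = sum(bits[prev_end:end])
        if ones != size:
            problems.append(f"bucket ending at {end}: holds {ones} 1's, not {size}")
    return problems


#       1001011 | 011 | 01 | 1   (timestamps 1..13)
bits = [1, 0, 0, 1, 0, 1, 1, 0, 1, 1, 0, 1, 1]
good = [(7, 4), (10, 2), (12, 1), (13, 1)]
bad = [(7, 4), (11, 2), (13, 2)]   # timestamp 11 holds a 0

print(check_buckets(bits, good))  # prints []
print(check_buckets(bits, bad))
```

The check for the oldest bucket is deliberately weaker, because its left end is not stored and part of it may already lie outside the window.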


Maintaining the DGIM Rules


When a new bit enters:
● Check the leftmost (oldest) bucket. If its timestamp has now reached the
current timestamp minus N, then this bucket no longer has any of its 1’s in the
window. Therefore, drop it from the list of buckets.
● Consider whether the new bit is 0 or 1.
○ If it is 0, then no further change to the buckets is needed.
○ If the new bit is a 1:
■ Create a new bucket with the current timestamp and size 1.
■ If there are now three buckets of size 1, combine the earliest
two buckets of size 1 into one bucket of size 2.
■ To combine any two adjacent buckets of the same size, replace
them by one bucket of twice the size.
■ The timestamp of the new bucket is the timestamp of the
rightmost (more recent) of the two buckets.
■ Combining may in turn leave three buckets of the next size. If so,
combine the earliest two of those, and repeat for larger sizes as needed.
Processing each new bit takes O(log N) time, because there are O(log N) bucket
sizes and at most one combination is needed per size.
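
The maintenance steps above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the notes' own implementation: the class name DGIM and its methods are invented for the example, timestamps are kept as plain integers rather than modulo N, and the estimate method uses the standard DGIM query rule (count every bucket in range fully, except the oldest one, which counts as half), which the text above does not spell out.

```python
class DGIM:
    """Approximate count of 1's in the last n bits of a 0/1 stream."""

    def __init__(self, n):
        self.n = n
        self.time = 0
        # Each bucket is [end_timestamp, size]; the newest bucket is at index 0.
        self.buckets = []

    def add(self, bit):
        self.time += 1
        # Drop the oldest bucket once its right end has left the window.
        if self.buckets and self.buckets[-1][0] <= self.time - self.n:
            self.buckets.pop()
        if bit == 0:
            return
        self.buckets.insert(0, [self.time, 1])
        # While three buckets share a size, merge the two oldest of them.
        # A merge can create three buckets of the next size, so keep going.
        i = 0
        while i + 2 < len(self.buckets) and (
            self.buckets[i][1] == self.buckets[i + 2][1]
        ):
            newer = self.buckets[i + 1]
            # The merged bucket keeps the more recent timestamp
            # and has twice the size.
            self.buckets[i + 1 : i + 3] = [[newer[0], newer[1] * 2]]
            i += 1

    def estimate(self, k=None):
        """Estimated number of 1's in the last k bits (k <= n)."""
        k = self.n if k is None else k
        sizes = [size for end, size in self.buckets if end > self.time - k]
        if not sizes:
            return 0
        # Only the right end of the oldest bucket is known to be in range,
        # so that bucket contributes half of its size.
        return sum(sizes[:-1]) + sizes[-1] / 2


dgim = DGIM(n=10)
for bit in [1] * 10:
    dgim.add(bit)

print(dgim.buckets)     # [[10, 1], [9, 1], [8, 2], [6, 2], [4, 4]]
print(dgim.estimate())  # 8.0 (the true count is 10)
```

Keeping the bucket list in a plain Python list makes the merge loop easy to follow; the number of buckets stays at O(log N), so the list operations remain cheap.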

Decaying Windows

Decaying windows are useful in applications that need to identify the most
common elements of a stream. The concept applies when recent elements should
carry more weight than older ones. The technique computes a smooth aggregation
of all the 1’s ever seen in the stream, with decaying weights: the further back
in the stream a 1 appears, the less weight it is given.
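
The notes do not give a formula for the decaying weights. One common choice, assumed here, is exponential decay: each time a new element arrives, the running total is multiplied by (1 - c) for a small constant c, and the new element is then added at full weight. The class name DecayingCount and the value of c below are hypothetical, chosen only to make the arithmetic easy to follow.

```python
class DecayingCount:
    """Exponentially decaying sum of a 0/1 stream (assumed decay rule)."""

    def __init__(self, c):
        self.c = c          # assumed decay constant; in practice very small
        self.total = 0.0

    def add(self, bit):
        # Age every earlier element by one step, then add the new element
        # with weight 1. An element seen i steps ago ends up with
        # weight (1 - c) ** i.
        self.total = self.total * (1 - self.c) + bit
        return self.total


decayed = DecayingCount(c=0.5)   # large c only to keep the numbers readable
for bit in [1, 0, 1]:
    decayed.add(bit)

print(decayed.total)  # 1 * 0.25 + 0 * 0.5 + 1 * 1 = 1.25
```

Unlike a sliding window, this needs only one number per stream and no record of which element is about to leave the window.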
