Module 2 Session 7 Counting of Ones in A Window Decaying Windows
The document discusses the sliding window model for data stream algorithms, focusing on counting the number of 1's in a recent data stream using the Datar-Gionis-Indyk-Motwani (DGIM) algorithm. It outlines the rules for bucket representation in the DGIM algorithm and the maintenance of these buckets as new bits arrive. Additionally, it introduces the concept of decaying windows for applications that prioritize recent data over older data.
CS6CRT19 Big Data Analytics Module 2
Counting of 1’s in a Window
The sliding window model is a popular model for processing infinite data streams. Data elements arrive one at a time, and a window is the interval of the stream over which queries are raised and answered. Statistics are computed over a sliding window of size N time units that covers the most recently arrived items, not over the whole stream. Because it focuses on recent data, a sliding window gives results that are more relevant in real-world applications. Network traffic analysis, for example, needs results based on the recent past, which are more informative and useful than results based on stale data.

A useful model of stream processing is one in which queries are answered over a window of length N, that is, the N most recent elements received. N is often so large that the window cannot be stored, or there are so many streams that the windows for all of them cannot be stored together.

Consider a counting problem: given a stream of 0’s and 1’s, count the number of 1’s in the last k bits received, where k <= N and N is the window length. The obvious solution is to store the most recent N bits and, when a new bit arrives, discard the oldest bit. This gives the exact answer. If there is not enough memory to store N bits (say N is 1 billion), an approximate answer can be obtained with the Datar-Gionis-Indyk-Motwani (DGIM) algorithm.
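The exact solution described above can be sketched in a few lines of Python. This is only an illustration for small N; the class name `ExactWindowCounter` and its methods are hypothetical names chosen here, not part of the course material.

```python
from collections import deque


class ExactWindowCounter:
    """Exact count of 1's in the last k <= n bits, storing all n bits."""

    def __init__(self, n):
        # A deque with maxlen=n drops the oldest bit automatically
        # when a new bit is appended to a full window.
        self.window = deque(maxlen=n)

    def add(self, bit):
        self.window.append(bit)

    def count_ones(self, k):
        if k <= 0:
            return 0
        # Look only at the k most recent bits (fewer if fewer have arrived).
        return sum(list(self.window)[-k:])


counter = ExactWindowCounter(5)
for bit in [1, 0, 1, 1, 0, 1, 1]:
    counter.add(bit)
print(counter.count_ones(5))  # count over the whole window
print(counter.count_ones(3))  # count over the last 3 bits
```

The memory cost is N stored bits per stream, which is exactly what becomes impractical when N is very large or there are many streams.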
Swamy Saswathikananda College, Poothotta 1
Datar-Gionis-Indyk-Motwani (DGIM) algorithm
● Each bit in the stream has a timestamp equal to its position in the stream.
  ○ The first bit has timestamp 1, the second bit has timestamp 2, and so on.
● Only positions within the window of length N need to be distinguished.
● Take the window size N as a multiple of 2.
● Represent each timestamp modulo N, so that it needs only log2 N bits.
● Divide the window into buckets, each consisting of:
  ○ The timestamp of its right (most recent) end.
  ○ The number of 1’s in the bucket. This number must be a power of 2.
  ○ The number of 1’s is referred to as the size of the bucket.
    ■ Example: the bits 1001011 contain four 1’s, so a bucket covering them has size 4.

Rules that must be followed when representing a stream by buckets
● The right end of a bucket is always a position with a 1.
● Every bucket must contain at least one 1; no bucket can be formed without a 1.
● All sizes must be a power of 2.
● There are at most two buckets of any given size.
● Bucket sizes cannot decrease as we move to the left (back in time).
Figure 2-6: Dividing stream into buckets as per DGIM algorithm
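The bucket representation, together with the maintenance steps covered in the next section, can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the notes' own implementation: the class name `DGIM` and its methods are hypothetical, timestamps are kept as plain integers instead of modulo N for readability, and the query uses the standard DGIM estimate (all buckets inside the last k bits, counting only half of the oldest one), which these notes do not spell out.

```python
class DGIM:
    """Approximate count of 1's in the last k <= n bits of a bit stream."""

    def __init__(self, n):
        self.n = n         # window length N
        self.t = 0         # timestamp of the most recent bit
        self.buckets = []  # (right-end timestamp, size), newest first

    def add(self, bit):
        self.t += 1
        # Drop the oldest bucket once its right end has left the window.
        if self.buckets and self.buckets[-1][0] <= self.t - self.n:
            self.buckets.pop()
        if bit != 1:
            return
        self.buckets.insert(0, (self.t, 1))
        # Three buckets of one size: merge the two oldest, then re-check
        # the next size up, since the merge may create a third bucket there.
        i = 0
        while (i + 2 < len(self.buckets)
               and self.buckets[i][1] == self.buckets[i + 2][1]):
            ts, size = self.buckets[i + 1]  # more recent of the two oldest
            self.buckets[i + 1:i + 3] = [(ts, 2 * size)]
            i += 1

    def estimate(self, k):
        # Assumed query rule (standard DGIM): sum the buckets whose right
        # end lies in the last k bits, but count only half of the oldest.
        cutoff = self.t - min(k, self.n)
        sizes = [size for ts, size in self.buckets if ts > cutoff]
        if not sizes:
            return 0
        return sum(sizes[:-1]) + sizes[-1] / 2


dgim = DGIM(16)
for bit in [1, 0, 1, 1, 0, 1, 1, 1, 0, 1]:
    dgim.add(bit)
print(dgim.buckets)       # bucket sizes are powers of 2, at most two per size
print(dgim.estimate(10))  # approximate number of 1's in the last 10 bits
```

Only the oldest bucket in the queried range can straddle its boundary, so the uncertainty is confined to that one bucket. Under the rules above, this keeps the estimate within 50% of the true count.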
Maintaining the DGIM Rules
When a new bit enters:
● Check the leftmost (oldest) bucket. If its timestamp has now reached the current timestamp minus N, then this bucket no longer has any of its 1’s in the window. Therefore, drop it from the list of buckets.
● Consider whether the new bit is 0 or 1.
  ○ If it is 0, then no further change to the buckets is needed.
  ○ If the new bit is a 1:
    ■ Create a new bucket with the current timestamp and size 1.
    ■ If there are now more than two buckets of size 1, combine the earliest two buckets of size 1.
    ■ To combine two adjacent buckets of the same size, replace them by one bucket of twice the size.
    ■ The timestamp of the new bucket is the timestamp of the rightmost (more recent) of the two buckets.
    ■ Combining may leave more than two buckets of the next size, so repeat the check for each larger size in turn.

Processing a new bit takes O(log N) time, because there are only O(log N) different bucket sizes to check.

Decaying Windows
Decaying windows are useful in applications that need to identify the most common elements. The concept applies when recent elements should be given more weight than older ones. The technique computes a smooth aggregation of all the 1’s ever seen in the stream, with decaying weights: the further back in the stream a 1 appears, the less weight it is given.
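The decaying-weight aggregation can be illustrated with a short Python sketch. The notes do not give a formula, so this assumes the common exponentially decaying form: with a small constant c, the bit that arrived i steps ago gets weight (1 - c)^i, and the total is updated for each new bit by multiplying the old total by (1 - c) and adding the new bit. The function name `decayed_count` and the parameter `c` are hypothetical names used only for this illustration.

```python
def decayed_count(bits, c=0.001):
    """Exponentially decayed sum of a bit stream (assumed formulation).

    Equivalent to sum(bit_i * (1 - c) ** age_i), where age_i is the
    number of bits that arrived after bit_i.
    """
    total = 0.0
    for bit in bits:
        # Age every earlier contribution by one step, then add the new bit
        # with full weight 1.
        total = total * (1 - c) + bit
    return total


# The same three 1's count for more when they are recent.
recent = [0] * 100 + [1, 1, 1]
stale = [1, 1, 1] + [0] * 100
print(decayed_count(recent, c=0.05))
print(decayed_count(stale, c=0.05))
```

Unlike a sliding window, this needs only one stored number per stream and no explicit step for dropping old bits, since their weight simply fades toward zero.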