BDA Experiment 7

Uploaded by

pabocon672

EXPERIMENT NO:07

AIM: Implementing the DGIM algorithm using any programming language / implementing a
Bloom filter using any programming language.

THEORY:

What is DGIM Algorithm?

Suppose we have a window of length N on a binary stream. We want at all times to be able to
answer queries of the form “how many 1’s are there in the last k bits?” for any k ≤ N. For this
purpose we use the DGIM algorithm.

The basic version of the algorithm uses O(log² N) bits to represent a window of N bits, and
allows us to estimate the number of 1’s in the window with an error of no more than 50%. To
begin, each bit of the stream has a timestamp, the position in which it arrives. The first bit has
timestamp 1, the second has timestamp 2, and so on.

Since we only need to distinguish positions within the window of length N, we represent
timestamps modulo N, so each can be represented by log₂ N bits. If we also store the total
number of bits ever seen in the stream (i.e., the most recent timestamp) modulo N, then we can
determine from a timestamp modulo N where in the current window the bit with that timestamp
is.
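
The modulo-N bookkeeping described above can be sketched as follows. This is a minimal illustration only: the function name `age_in_window` and the values N = 8 and 21 bits seen are assumptions chosen for the example, not part of the experiment.

```python
def age_in_window(bucket_ts_mod: int, current_ts_mod: int, window_size: int) -> int:
    """Return how many positions behind the newest bit a stored timestamp lies.

    Both timestamps are stored modulo window_size; age 0 is the newest bit.
    """
    return (current_ts_mod - bucket_ts_mod) % window_size


N = 8                          # window length (illustrative value)
total_bits_seen = 21           # most recent timestamp
current = total_bits_seen % N  # stored as 5

# Bit 19 is stored as 19 % 8 = 3 and lies 2 positions behind the newest bit.
print(age_in_window(19 % N, current, N))  # 2
# Bit 15 is stored as 15 % 8 = 7 (wrapped around) and lies 6 positions behind.
print(age_in_window(15 % N, current, N))  # 6
```

Because every bit still inside the window is less than N positions old, the modular difference recovers its position unambiguously.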

We divide the window into buckets, each consisting of:

1. The timestamp of its right (most recent) end.
2. The number of 1’s in the bucket. This number must be a power of 2, and we refer to the
number of 1’s as the size of the bucket.

To represent a bucket, we need log₂ N bits to represent the timestamp (modulo N) of its right
end. To represent the number of 1’s we only need log₂ log₂ N bits. The reason is that we know
this number i is a power of 2, say 2^j, so we can represent i by coding j in binary. Since j is at
most log₂ N, it requires log₂ log₂ N bits. Thus, O(log N) bits suffice to represent a bucket.
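
A rough sketch of this bucket representation and its bit cost is given below. The `Bucket` class and the `bits_per_bucket` helper are hypothetical names introduced for illustration, and the rounding (ceilings on the logarithms) is an assumption about how one might count whole bits.

```python
import math
from dataclasses import dataclass


@dataclass
class Bucket:
    end_ts: int    # timestamp (modulo N) of the right, most recent end
    exponent: int  # j, where the bucket size is 2**j

    @property
    def size(self) -> int:
        return 2 ** self.exponent


def bits_per_bucket(window_size: int) -> int:
    """Bits for one bucket: timestamp bits plus bits to code the exponent j."""
    ts_bits = math.ceil(math.log2(window_size))
    max_exponent = math.floor(math.log2(window_size))
    # j ranges over 0..max_exponent, so it needs about log2(log2 N) bits.
    exp_bits = max(1, math.ceil(math.log2(max_exponent + 1)))
    return ts_bits + exp_bits


b = Bucket(end_ts=5, exponent=3)
print(b.size)                 # 8
print(bits_per_bucket(1024))  # 10 timestamp bits + 4 exponent bits = 14
```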

There are six rules that must be followed when representing a stream by buckets:

1. The right end of a bucket is always a position with a 1.
2. Every position with a 1 is in some bucket.
3. No position is in more than one bucket.
4. There are one or two buckets of any given size, up to some maximum size.
5. All sizes must be a power of 2.
6. Buckets cannot decrease in size as we move to the left (back in time).
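
The last three rules in the list above, which concern only bucket sizes, can be checked mechanically. The sketch below assumes the bucket sizes are given as a list ordered from newest to oldest; the function name `satisfies_size_rules` is hypothetical, and the first three rules are not checked because they need the bit positions themselves.

```python
from collections import Counter


def satisfies_size_rules(sizes_newest_first):
    """Check the three size-related rules on a list of bucket sizes."""
    sizes = list(sizes_newest_first)
    if not sizes:
        return True
    # All sizes must be a power of 2.
    if any(s < 1 or (s & (s - 1)) != 0 for s in sizes):
        return False
    # Sizes cannot decrease as we move back in time (towards older buckets).
    if any(newer > older for newer, older in zip(sizes, sizes[1:])):
        return False
    # One or two buckets of every size from 1 up to the maximum size.
    counts = Counter(sizes)
    size = 1
    while size <= max(sizes):
        if counts[size] not in (1, 2):
            return False
        size *= 2
    return True


print(satisfies_size_rules([1, 1, 2, 4, 4, 8]))  # True
print(satisfies_size_rules([1, 1, 1, 2]))        # False: three buckets of size 1
print(satisfies_size_rules([1, 4]))              # False: no bucket of size 2
```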

What is a Bloom Filter?

A Bloom filter is a space-efficient probabilistic data structure that is used to test whether an
element is a member of a set. For example, checking the availability of a username is a set
membership problem, where the set is the list of all registered usernames. The price we pay for
efficiency is that the structure is probabilistic, which means there might be some false positive
results. A false positive means the filter might report that a given username is already taken
when it actually is not.

Working of Bloom Filter

An empty Bloom filter is a bit array of m bits, all set to zero.

We need k hash functions to calculate the hashes for a given input. When we want
to add an item to the filter, the bits at the k indices h1(x), h2(x), … hk(x) are set, where the
indices are calculated using the hash functions.
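
As a small illustration of this mechanism, the sketch below sets k bits per item in an m-bit array using only the standard library. It is not the program used in this experiment: the class name `TinyBloom`, the use of seeded SHA-256 in place of k independent hash functions, and the values m = 64 and k = 3 are all assumptions made for the example.

```python
import hashlib


class TinyBloom:
    def __init__(self, m: int, k: int):
        self.m = m           # number of bits in the array
        self.k = k           # number of hash functions
        self.bits = [0] * m  # empty filter: all bits zero

    def _indices(self, item: str):
        # Derive k indices by hashing the item together with a seed 0..k-1.
        for seed in range(self.k):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item: str) -> None:
        for idx in self._indices(item):
            self.bits[idx] = 1

    def might_contain(self, item: str) -> bool:
        # Any zero bit proves the item was never added.
        return all(self.bits[idx] for idx in self._indices(item))


bf = TinyBloom(m=64, k=3)
bf.add("alice")
print(bf.might_contain("alice"))  # True: an added item is never reported absent
print(sum(bf.bits))               # at most k = 3 bits are set after one insert
```

A lookup that finds all k bits set can still be a false positive, because those bits may have been set by other items.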

DGIM PROGRAM:

import math

filename = "test.txt"

container = {}          # bucket size -> timestamps of right ends, oldest first
windowsize = 1000
timestamp = 0
updateinterval = 1000   # no larger than the windowsize
updateindex = 0

keysnum = int(math.log(windowsize, 2)) + 1
keylist = list()

# initialize the container with one empty list per bucket size 1, 2, 4, ...
for i in range(keysnum):
    key = int(math.pow(2, i))
    keylist.append(key)
    container[key] = list()


def UpdateContainer(inputdict, klist, numkeys):
    # merge the two oldest buckets of a size whenever three of that size exist
    for key in klist:
        if len(inputdict[key]) > 2:
            inputdict[key].pop(0)
            tstamp = inputdict[key].pop(0)  # merged bucket keeps the newer timestamp
            if key != klist[-1]:
                inputdict[key * 2].append(tstamp)
        else:
            break


def OutputResult(inputdict, klist, wsize):
    cnt = 0
    firststamp = None  # timestamp of the oldest bucket in the window
    for key in klist:
        if len(inputdict[key]) > 0:
            firststamp = inputdict[key][0]
        for tstamp in inputdict[key]:
            print("size of bucket: %d, timestamp: %d" % (key, tstamp))
    for key in klist:
        for tstamp in inputdict[key]:
            if tstamp != firststamp:
                cnt += key
            else:
                cnt += 0.5 * key  # count only half of the oldest bucket
    print("Estimated number of ones in the last %d bits: %d" % (wsize, cnt))


with open(filename, 'r') as sfile:
    while True:
        char = sfile.read(1)
        if not char:  # no more input
            OutputResult(container, keylist, windowsize)
            break
        timestamp = (timestamp + 1) % windowsize
        for k in container:
            if timestamp in container[k]:  # remove record which is out of the window
                container[k].remove(timestamp)
        if char == "1":  # add it to the container
            container[1].append(timestamp)
            UpdateContainer(container, keylist, keysnum)
        updateindex = (updateindex + 1) % updateinterval
        if updateindex == 0:
            OutputResult(container, keylist, windowsize)
            print("\n")

OUTPUT:

BLOOM FILTER PROGRAM:

import math

import mmh3
from bitarray import bitarray


class BloomFilter(object):
    '''
    Class for Bloom filter, using murmur3 hash function
    '''

    def __init__(self, items_count, fp_prob):
        '''
        items_count : int
            Number of items expected to be stored in bloom filter
        fp_prob : float
            False Positive probability in decimal
        '''
        # False positive probability in decimal
        self.fp_prob = fp_prob

        # Size of bit array to use
        self.size = self.get_size(items_count, fp_prob)

        # number of hash functions to use
        self.hash_count = self.get_hash_count(self.size, items_count)

        # Bit array of given size
        self.bit_array = bitarray(self.size)

        # initialize all bits as 0
        self.bit_array.setall(0)

    def add(self, item):
        '''
        Add an item in the filter
        '''
        for i in range(self.hash_count):
            # create digest for given item.
            # i works as seed to mmh3.hash() function.
            # With different seed, digest created is different
            digest = mmh3.hash(item, i) % self.size

            # set the bit True in bit_array
            self.bit_array[digest] = True

    def check(self, item):
        '''
        Check for existence of an item in filter
        '''
        for i in range(self.hash_count):
            digest = mmh3.hash(item, i) % self.size
            if not self.bit_array[digest]:
                # if any bit is False then the item is not present
                # in the filter;
                # else there is a probability that it exists
                return False
        return True

    @classmethod
    def get_size(cls, n, p):
        '''
        Return the size of bit array (m) to be used, using the
        following formula: m = -(n * ln(p)) / (ln(2)^2)
        n : int
            number of items expected to be stored in filter
        p : float
            False Positive probability in decimal
        '''
        m = -(n * math.log(p)) / (math.log(2) ** 2)
        return int(m)

    @classmethod
    def get_hash_count(cls, m, n):
        '''
        Return the number of hash functions (k) to be used, using the
        following formula: k = (m/n) * ln(2)
        m : int
            size of bit array
        n : int
            number of items expected to be stored in filter
        '''
        k = (m / n) * math.log(2)
        return int(k)

# Driver script; the import expects the class above to be saved as bloomfilter.py
from bloomfilter import BloomFilter
from random import shuffle

n = 20    # no of items to add
p = 0.05  # false positive probability

bloomf = BloomFilter(n, p)
print("Size of bit array:%d" % bloomf.size)
print("False positive Probability:%f" % bloomf.fp_prob)
print("Number of hash functions:%d" % bloomf.hash_count)

# words to be added
word_present = ['abound','abounds','abundance','abundant','accessable',
                'bloom','blossom','bolster','bonny','bonus','bonuses',
                'coherent','cohesive','colorful','comely','comfort',
                'gems','generosity','generous','generously','genial']

# words not added
word_absent = ['bluff','cheater','hate','war','humanity',
               'racism','hurt','nuke','gloomy','facebook',
               'geeksforgeeks','twitter']

for item in word_present:
    bloomf.add(item)

shuffle(word_present)
shuffle(word_absent)

test_words = word_present[:10] + word_absent
shuffle(test_words)
for word in test_words:
    if bloomf.check(word):
        if word in word_absent:
            print("'%s' is a false positive!" % word)
        else:
            print("'%s' is probably present!" % word)
    else:
        print("'%s' is definitely not present!" % word)
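
As a side check on the sizing used in the listing above, the two formulas from `get_size` and `get_hash_count` can be evaluated by hand for n = 20 and p = 0.05. The sketch below is standalone and needs no third-party packages; the helper names `bloom_size`, `bloom_hash_count` and `expected_fp_rate` are hypothetical, and the false positive estimate uses the standard approximation (1 - e^(-kn/m))^k, which is not stated in the original write-up.

```python
import math


def bloom_size(n: int, p: float) -> int:
    # m = -(n * ln p) / (ln 2)^2, truncated to an integer as in the listing
    return int(-(n * math.log(p)) / (math.log(2) ** 2))


def bloom_hash_count(m: int, n: int) -> int:
    # k = (m / n) * ln 2, truncated to an integer as in the listing
    return int((m / n) * math.log(2))


def expected_fp_rate(m: int, k: int, n: int) -> float:
    # standard approximation of the false positive rate after n insertions
    return (1 - math.exp(-k * n / m)) ** k


m = bloom_size(20, 0.05)
k = bloom_hash_count(m, 20)
print(m, k)                        # 124 4
print(expected_fp_rate(m, k, 20))  # roughly 0.05, close to the requested p
```

With 124 bits and 4 hash functions, the estimated false positive rate stays close to the requested 5%.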

OUTPUT:

Conclusion:
We have successfully implemented the DGIM algorithm and a Bloom filter.

Name: Huzaif Shaikh

Roll No. : 53
Date: 30th Sept,2024

Marks: Signature of Supervisor
