Lecture08_BloomFilter

Cairo University

Faculty of Computers and Artificial Intelligence

Computer Science Department

Advanced Data Structures Bloom Filters Dr. Amin Allam

[For more details, refer to Wikipedia]

1 Bloom Filters
A Bloom filter is a data structure used to test whether an element is a member of a set. It is
probabilistic since it allows false positives: it may decide that an element belongs to the set while
it does not. However, it does not allow false negatives: it never decides that an element does not
belong to the set while it does. Thus it returns either “possibly ∈ set” or “definitely ∉ set”.
The motivation of this structure is to provide an alternative to hash tables which consumes much
less space than hash tables, but with the drawback of introducing false positives. It is usually
accompanied by a hash table or another data structure which is used to retrieve the
element only if it passed the fast Bloom filter test. It improves query times when most queried
elements do not belong to the set, and when the set elements are stored in secondary storage.
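An empty Bloom filter is an array of m bits, all set to zero. A hash function is a function whose
parameter is an element to be inserted, and it returns a value in the range {0 . . . m − 1}. To insert an
element in the Bloom filter, k different hash functions are applied to the inserted element to return
k positions in the range {0 . . . m − 1}. Then, all bits at all these positions are set to 1. To test
the membership of an element, the same k hash functions are applied to it: if any of the k bits is 0,
the element is definitely not in the set; otherwise it is possibly in the set. It is not
possible to remove an element from a Bloom filter, since clearing its bits could also clear bits
shared with other elements.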
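The insertion and membership test described above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the class name `BloomFilter`, the method names `add` and `might_contain`, and the choice of simulating the k hash functions by salting SHA-256 with an index are all assumptions made for this sketch and do not come from the lecture.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter sketch: an array of m bits and k hash positions per element."""

    def __init__(self, m: int, k: int) -> None:
        self.m = m
        self.k = k
        self.bits = bytearray((m + 7) // 8)  # m bits, all initially zero

    def _positions(self, item: str):
        # Assumption: the k "different hash functions" are simulated by
        # prefixing the item with the index i before hashing with SHA-256.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item: str) -> None:
        # Set the bit at each of the k positions to 1.
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        # False means "definitely not in the set"; True means "possibly in the set".
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))
```

For example, after `bf = BloomFilter(m=1024, k=3)` and `bf.add("apple")`, the call `bf.might_contain("apple")` returns `True`, because an inserted element always finds its k bits set. A query for an element that was never inserted usually returns `False`, but may return `True` as a false positive.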
To estimate the effectiveness of Bloom filters, we need to calculate the probability of a false
positive, which depends on the values of k, m, and n. Assume that a hash function returns a value in
the range {0 . . . m − 1} with equal probability of $\frac{1}{m}$ for each possible value.
Let $p = 1 - q$ be the probability that a certain bit is set to 1.
Let $q = 1 - p$ be the probability that a certain bit is not set to 1.
After one hash function call: $q = 1 - \frac{1}{m}$.
After inserting one element (that is, after k hash function calls): $q = \left(1 - \frac{1}{m}\right)^{k}$.
After inserting n elements: $q = \left(1 - \frac{1}{m}\right)^{kn}$ and $p = 1 - \left(1 - \frac{1}{m}\right)^{kn}$.
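As a rough sanity check of the expression for p, the sketch below simulates kn uniformly random hash positions and compares the fraction of bits set with the predicted value. The function names and the use of Python's `random` module in place of real hash functions are assumptions of this illustration; it only models the "equal probability" assumption stated above.

```python
import random


def simulate_fill_fraction(m: int, k: int, n: int, seed: int = 0) -> float:
    """Insert n elements with k uniformly random positions each; return the fraction of bits set."""
    rng = random.Random(seed)
    bits = [False] * m
    for _ in range(n * k):  # kn hash function calls in total
        bits[rng.randrange(m)] = True
    return sum(bits) / m


def predicted_fill_fraction(m: int, k: int, n: int) -> float:
    """p = 1 - (1 - 1/m)^(kn), the probability that a given bit is set to 1."""
    return 1.0 - (1.0 - 1.0 / m) ** (k * n)
```

With m = 1000, k = 3, and n = 200, the predicted value is about 0.45, and the simulated fraction should land close to it, with the gap shrinking as m grows.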
Now, after inserting n elements, suppose that we need to test the membership of an element which
does not belong to the set (is not equal to any of the n inserted elements). The probability fp
of declaring a false positive equals the probability that all k hash positions associated with that
element happen to have been set to 1 by hash positions of the existing n elements.
So: $\mathit{fp} = p^{k} = \left(1 - \left(1 - \frac{1}{m}\right)^{kn}\right)^{k} \approx \left(1 - e^{-\frac{kn}{m}}\right)^{k}$, using the approximation $1 - \frac{1}{m} \approx e^{-\frac{1}{m}}$.
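To see how close the exponential approximation is to the exact expression, the two forms of fp can be evaluated side by side. The function names `fp_exact` and `fp_approx` are hypothetical helpers introduced only for this sketch.

```python
import math


def fp_exact(m: int, k: int, n: int) -> float:
    """fp = (1 - (1 - 1/m)^(kn))^k, before applying the approximation."""
    return (1.0 - (1.0 - 1.0 / m) ** (k * n)) ** k


def fp_approx(m: int, k: int, n: int) -> float:
    """fp ~ (1 - e^(-kn/m))^k, using 1 - 1/m ~ e^(-1/m)."""
    return (1.0 - math.exp(-k * n / m)) ** k
```

For m = 10000, n = 1000, and k = 7, both functions give a false positive probability of roughly 0.8%, and they differ only far beyond the digits that matter in practice. The approximation is good whenever m is large.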
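Given m and n, the optimal k that minimizes fp is $k = \frac{m}{n} \ln 2$, where $\mathit{fp} = \left(\frac{1}{2}\right)^{\frac{m}{n} \ln 2}$ (*).
Given n and the target fp, the required m is $m = \frac{-n \ln \mathit{fp}}{(\ln 2)^{2}}$ (*). Thus:
Given the target fp, the optimal number of bits per element is $\frac{m}{n} = \frac{-1}{\ln 2} \log_{2} \mathit{fp}$, where $k = -\log_{2} \mathit{fp}$.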
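The sizing formulas above translate directly into a small helper. The sketch below is one possible way to apply them; the function name `bloom_parameters` and the decisions to round m up and to round k to the nearest integer (at least 1) are assumptions of this example, since the formulas themselves yield real numbers.

```python
import math
from typing import Tuple


def bloom_parameters(n: int, fp: float) -> Tuple[int, int]:
    """Return (m, k) for n elements and a target false positive probability fp."""
    # m = -n ln(fp) / (ln 2)^2, rounded up to a whole number of bits
    m = math.ceil(-n * math.log(fp) / (math.log(2) ** 2))
    # k = (m / n) ln 2, rounded to the nearest integer and kept at least 1
    k = max(1, round((m / n) * math.log(2)))
    return m, k
```

For n = 1000 and a target fp of 0.01, this gives about 9.6 bits per element and k = 7, which agrees with $k = -\log_{2} 0.01 \approx 6.64$ rounded to an integer.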


∗ Given m and n, the optimal value of k that minimizes fp is $\frac{m}{n} \ln 2$.
Proof: We need to minimize $\mathit{fp} = \left(1 - e^{-\frac{kn}{m}}\right)^{k}$ with respect to k.
Note that since m and n are given, they are treated as constants.
We will minimize the easier function ln(fp), which is equivalent to minimizing fp.
$\ln(\mathit{fp}) = \ln\left[\left(1 - e^{-\frac{kn}{m}}\right)^{k}\right] = k \ln\left(1 - e^{-\frac{kn}{m}}\right)$
Let $r = e^{-\frac{kn}{m}}$, then $k = \frac{-m}{n} \ln(r)$.
Thus $\ln(\mathit{fp}) = \frac{-m}{n} \ln(r) \ln(1 - r)$. We are now minimizing with respect to r.
Taking the derivative and equating it to zero (ignoring the $\frac{-m}{n}$ constant):
$\frac{-\ln(r)}{1 - r} + \frac{\ln(1 - r)}{r} = 0$. So $r \ln(r) = (1 - r) \ln(1 - r)$.
Clearly, $r = \frac{1}{2}$ satisfies the above equation, so $k = \frac{-m}{n} \ln\left(\frac{1}{2}\right) = \frac{m}{n} \ln\left[\left(\frac{1}{2}\right)^{-1}\right] = \frac{m}{n} \ln 2$.
By substituting the optimal $r = e^{-\frac{kn}{m}} = \frac{1}{2}$ and $k = \frac{m}{n} \ln 2$ in the fp formula:
$\mathit{fp} = \left(1 - \frac{1}{2}\right)^{\frac{m}{n} \ln 2} = \left(\frac{1}{2}\right)^{\frac{m}{n} \ln 2}$.
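The result of this proof can be checked numerically by scanning integer values of k and comparing the best one with $\frac{m}{n} \ln 2$. The helper names below are hypothetical and the scan range of 1 to 30 is an arbitrary choice for this sketch.

```python
import math


def fp_approx(m: int, k: int, n: int) -> float:
    """fp ~ (1 - e^(-kn/m))^k, the function minimized in the proof."""
    return (1.0 - math.exp(-k * n / m)) ** k


def best_integer_k(m: int, n: int, k_max: int = 30) -> int:
    """Return the integer k in 1..k_max with the smallest approximate fp."""
    return min(range(1, k_max + 1), key=lambda k: fp_approx(m, k, n))


def fp_at_optimum(m: int, n: int) -> float:
    """fp = (1/2)^((m/n) ln 2), the value at the real-valued optimal k."""
    return 0.5 ** ((m / n) * math.log(2))
```

With m = 10000 and n = 1000, the real-valued optimum is $10 \ln 2 \approx 6.93$, and the scan selects k = 7. The fp at k = 7 is slightly above `fp_at_optimum(10000, 1000)`, as expected when k is restricted to integers.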

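∗ Given n and the target fp, the required m is $\frac{-n \ln \mathit{fp}}{(\ln 2)^{2}}$.
Proof: By taking the natural logarithm of both sides of the formula $\mathit{fp} = \left(\frac{1}{2}\right)^{\frac{m}{n} \ln 2}$:
$\ln(\mathit{fp}) = \frac{m}{n} (\ln 2) \ln\left(\frac{1}{2}\right) = \frac{m}{n} (\ln 2)(-(\ln 2)) = \frac{-m}{n} (\ln 2)^{2}$. Thus $m = \frac{-n \ln \mathit{fp}}{(\ln 2)^{2}}$.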
