
Bda 8 59

This document outlines the implementation of a Bloom Filter using the MapReduce programming model, emphasizing its efficiency in testing set membership with a low chance of false positives. It details the construction of the Bloom Filter using multiple hash functions and a bit array, along with the MapReduce approach for distributed processing. The document includes code examples for the mapper, reducer, and query implementations, highlighting the utility of Bloom Filters in applications like web caches and spam filters.

Uploaded by

pjib225


Meet Laheri

B13/59

Experiment No. 8
Aim:

This document aims to implement a Bloom Filter using the MapReduce programming model. The
Bloom Filter is a probabilistic data structure that efficiently tests whether an element is a member of
a set, with a small chance of false positives but no false negatives.

Theory:

1. Bloom Filter Overview

A Bloom Filter is a space-efficient probabilistic data structure used to test whether an element is a
member of a set. It allows for fast membership testing with a trade-off: the possibility of false
positives (i.e., an element might be wrongly identified as present) but no false negatives (i.e., if an
element is present, it will always be correctly identified).

The Bloom Filter is constructed using:

- Multiple hash functions: Each element is hashed using `k` different hash functions, and the
corresponding positions in a bit array are set to 1.

- Bit array: The Bloom Filter maintains a bit array of `m` bits, initially all set to 0. For each inserted
element, the `k` hash functions set `k` positions to 1.

When checking if an element exists in the filter:

- The element is hashed using the same `k` hash functions.

- If all corresponding bits in the bit array are 1, the element is considered "possibly in the set."

- If any of the bits are 0, the element is definitely not in the set.
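The insert and lookup steps described above can be sketched as a small in-memory class. This is a minimal illustration, not the MapReduce implementation: the class name `BloomFilter`, the method names, and the default values `m=1000` and `k=3` are assumptions chosen for the example, and the `k` hash functions are simulated by appending an index to the item before hashing with MD5.

```python
import hashlib


class BloomFilter:
    """Minimal in-memory Bloom filter (illustrative sketch)."""

    def __init__(self, m=1000, k=3):
        self.m = m              # number of bits in the array
        self.k = k              # number of hash functions
        self.bits = [0] * m     # bit array, initially all 0

    def _positions(self, item):
        # Simulate k hash functions by appending the index i to the item
        # before hashing (an assumption made for this sketch).
        return [
            int(hashlib.md5(f"{item}{i}".encode()).hexdigest(), 16) % self.m
            for i in range(self.k)
        ]

    def add(self, item):
        # Set the k bits selected by the hash functions.
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item):
        # True means "possibly in the set"; False means "definitely not".
        return all(self.bits[pos] for pos in self._positions(item))


bf = BloomFilter()
for word in ["apple", "banana", "cherry"]:
    bf.add(word)

print(bf.might_contain("apple"))   # True: inserted items are always found
print(bf.might_contain("durian"))  # usually False; True would be a false positive
```

Because `add` only ever sets bits and never clears them, an inserted element can never fail the lookup, which is why false negatives cannot occur.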

2. MapReduce Approach

In the MapReduce model, the Bloom Filter can be distributed across multiple nodes for efficient
parallel construction and querying. Each node handles part of the data and computes the necessary
hash values, updating its portion of the bit array. Once the bit arrays from all mappers are combined,
the Bloom Filter is ready for query operations.

The implementation consists of:

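- Map phase: Computes the `k` hash positions for each element and emits each position as a key (in the code below, as a `position<TAB>1` pair).

- Reduce phase: Collects the emitted positions and sets the corresponding bits to form the final global bit array. This is equivalent to a bitwise OR of the bit arrays each mapper would have built on its own.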
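The two phases can be tried out locally without a cluster. The sketch below simulates them in a single Python process, assuming the MD5-based hashing scheme and the parameters used in this experiment; the function names `map_phase` and `reduce_phase` and the two input splits are hypothetical, and sorting stands in for the shuffle step.

```python
import hashlib

NUM_HASHES = 3
BIT_ARRAY_SIZE = 1000


def get_hashes(item, num_hashes, bit_array_size):
    # k positions obtained by hashing the item with the index i appended.
    return [
        int(hashlib.md5(f"{item}{i}".encode()).hexdigest(), 16) % bit_array_size
        for i in range(num_hashes)
    ]


def map_phase(records):
    """Emit one (position, 1) pair per hash position of each record."""
    for record in records:
        for pos in get_hashes(record, NUM_HASHES, BIT_ARRAY_SIZE):
            yield pos, 1


def reduce_phase(pairs):
    """Set the bit for every position that was emitted at least once."""
    bit_array = [0] * BIT_ARRAY_SIZE
    for pos, _ in pairs:
        bit_array[pos] = 1
    return bit_array


# Two hypothetical input splits, as if handled by two mappers.
split_a = ["apple", "banana"]
split_b = ["cherry", "apple"]

# The shuffle step is simulated by sorting the combined mapper output by key.
shuffled = sorted(list(map_phase(split_a)) + list(map_phase(split_b)))
global_bits = reduce_phase(shuffled)

print(sum(global_bits), "bits set out of", BIT_ARRAY_SIZE)
```

The duplicate "apple" in the second split changes nothing in the result, since setting a bit that is already 1 has no effect.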

Code

1. Hash Functions

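We use multiple hash functions to determine the bit positions for each element. For simplicity, the code below simulates `k` hash functions with a single MD5 hash from `hashlib`, appending the index `i` to the element before hashing. Python's built-in `hash` function is not used because its results for strings are randomized per process by default, so different mapper processes would disagree on the bit positions.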
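As an alternative to hashing the element `k` times, the positions can be derived from two base hash values, a technique often called double hashing. The sketch below is not the scheme used in this experiment; the function name `double_hash_positions` and the choice of SHA-256 as the base hash are assumptions made for illustration.

```python
import hashlib


def double_hash_positions(item, num_hashes, bit_array_size):
    """Derive num_hashes positions from a single SHA-256 digest."""
    digest = hashlib.sha256(item.encode()).digest()
    # Two base hash values taken from different parts of the digest.
    h1 = int.from_bytes(digest[:8], "big")
    # Keep the step in 1..bit_array_size-1 so consecutive positions differ.
    h2 = int.from_bytes(digest[8:16], "big") % (bit_array_size - 1) + 1
    # Position i is (h1 + i * h2) mod m.
    return [(h1 + i * h2) % bit_array_size for i in range(num_hashes)]


print(double_hash_positions("apple", 3, 1000))
```

This computes one digest per element regardless of `k`, which can matter when `k` is large. The mapper, reducer, and query scripts must all use the same scheme, whichever one is chosen.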

2. Mapper and Reducer Implementation

- Mapper Code (bloom_mapper.py):


```python
#!/usr/bin/env python3
import sys
import hashlib


# Simulate multiple hash functions by hashing the item with the index i appended
def get_hashes(item, num_hashes, bit_array_size):
    hashes = []
    for i in range(num_hashes):
        # Create different hash values using hashlib and mod them to fit the bit array
        hash_value = int(hashlib.md5(f'{item}{i}'.encode()).hexdigest(), 16)
        hashes.append(hash_value % bit_array_size)
    return hashes


# Parameters
NUM_HASHES = 3         # Number of hash functions
BIT_ARRAY_SIZE = 1000  # Size of bit array

# Input comes from standard input (stdin)
for line in sys.stdin:
    line = line.strip()
    if not line:
        continue  # Skip blank lines so they are not inserted as elements
    # Generate the hash positions for the item
    hash_positions = get_hashes(line, NUM_HASHES, BIT_ARRAY_SIZE)
    # Output the hash positions for the line (element)
    for pos in hash_positions:
        print(f'{pos}\t1')
```

- Reducer Code (bloom_reducer.py):

```python
import sys

# Initialize a bit array of 0s (size must match the mapper's BIT_ARRAY_SIZE)
BIT_ARRAY_SIZE = 1000
bit_array = [0] * BIT_ARRAY_SIZE

# Input comes from standard input (stdin)
for line in sys.stdin:
    line = line.strip()
    if not line:
        continue  # Skip blank lines, which cannot be split into key and value
    position, _ = line.split('\t')
    # Update the bit array at the given position
    bit_array[int(position)] = 1

# Output the final bit array as a string of 0s and 1s
print(''.join(map(str, bit_array)))
```
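If the job runs with more than one reducer, each reducer sees only a subset of the positions and prints its own partial bit string. A possible way to combine them is a bitwise OR, sketched below under the assumption that every partial output is a single string of `0` and `1` characters of the same length, as printed by the reducer above. The function name `merge_bit_strings` and the sample strings are hypothetical.

```python
def merge_bit_strings(bit_strings):
    """Bitwise-OR several equal-length strings of '0'/'1' characters."""
    if not bit_strings:
        raise ValueError("need at least one bit string")
    length = len(bit_strings[0])
    if any(len(s) != length for s in bit_strings):
        raise ValueError("all bit strings must have the same length")
    # A bit is 1 in the result if it is 1 in any of the inputs.
    return "".join(
        "1" if any(s[i] == "1" for s in bit_strings) else "0"
        for i in range(length)
    )


# Hypothetical partial outputs from two reducers (8 bits for brevity).
part_0 = "10000010"
part_1 = "00100010"
print(merge_bit_strings([part_0, part_1]))  # 10100010
```

Running the job with a single reducer avoids this extra step, at the cost of funnelling all positions through one process.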

3. Bloom Filter Query Implementation

Once the Bloom Filter is built, you can query it to check if an element might be in the set.

- Query Code (bloom_query.py):

```python
import sys
import hashlib


# Function to compute hash positions for querying (same scheme as the mapper)
def get_hashes(item, num_hashes, bit_array_size):
    hashes = []
    for i in range(num_hashes):
        hash_value = int(hashlib.md5(f'{item}{i}'.encode()).hexdigest(), 16)
        hashes.append(hash_value % bit_array_size)
    return hashes


# Parameters (must match the values used by the mapper and reducer)
NUM_HASHES = 3         # Number of hash functions
BIT_ARRAY_SIZE = 1000  # Size of bit array

# The element to check is passed as the only command-line argument
if len(sys.argv) != 2:
    sys.exit('Usage: bloom_query.py <element> (bit array is read from stdin)')

# Read the Bloom Filter bit array from input
bloom_filter = sys.stdin.read().strip()
if len(bloom_filter) != BIT_ARRAY_SIZE:
    sys.exit(f'Expected {BIT_ARRAY_SIZE} bits, got {len(bloom_filter)}')

# Input element to check
element_to_check = sys.argv[1]

# Compute hash positions for the element to check
hash_positions = get_hashes(element_to_check, NUM_HASHES, BIT_ARRAY_SIZE)

# Check if all the corresponding bits in the Bloom Filter are set to 1
is_present = all(bloom_filter[pos] == '1' for pos in hash_positions)

if is_present:
    print(f'{element_to_check} is possibly in the set.')
else:
    print(f'{element_to_check} is definitely not in the set.')
```
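How often "possibly in the set" turns out to be wrong depends on the bit array size `m`, the number of hash functions `k`, and the number of inserted elements `n`. A commonly used approximation, which assumes the hash functions behave as independent and uniform, is p ≈ (1 - e^(-kn/m))^k. The sketch below evaluates it for the parameters used in this experiment (`m = 1000`, `k = 3`); the function names and the sample values of `n` are assumptions for illustration.

```python
import math


def false_positive_rate(m, k, n):
    """Approximate false-positive probability after inserting n items."""
    return (1 - math.exp(-k * n / m)) ** k


def optimal_num_hashes(m, n):
    """Value of k that minimises the approximation above: (m / n) * ln 2."""
    return (m / n) * math.log(2)


m, k = 1000, 3
for n in (50, 100, 500):
    print(n, round(false_positive_rate(m, k, n), 4))
```

With 100 inserted elements the estimate is roughly 1.7%, and it grows quickly as more elements are added to the same 1000-bit array, so `BIT_ARRAY_SIZE` should be scaled with the expected input size.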

Conclusion:

Implementing a Bloom Filter using the MapReduce framework allows for efficient parallel processing
and distributed storage of large datasets. The Bloom Filter is a valuable tool when it is important to
perform membership queries with low memory overhead, such as in web caches, databases, and
spam filters.
