
Ranged Queries Using Bloom Filters Final

This document proposes using Bloom filters to enable efficient range queries on a set of strings. It describes: 1. Inserting prefixes of each string into the Bloom filter to allow checking if a substring is contained between two query strings. 2. The range query algorithm checks prefixes of increasing length to see if any substrings fall between the query range. 3. The space complexity is O(nk) and time is O(k) per query/insert, or it can be optimized to O(nk/logk) space with the same time.

Uploaded by

Alice Qing Wong
Copyright
© Attribution Non-Commercial (BY-NC)
Range Queries using Bloom Filters

Basim Baig, Hau Chan, Samuel McCauley, Alice Wong
Computer Science Department, Stony Brook University

Bloom Filter (Review)


A space-efficient data structure that represents a set S (a subset of U) and answers membership queries such that:
Given: x in U
Output:
return No => x is not in S
return Yes => x is in S with prob. >= (1 - ε), where ε is the false positive probability
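As a reminder of how these semantics arise, here is a minimal Bloom filter sketch. The class name, the default sizes, and the use of salted SHA-256 as the hash family are illustrative assumptions, not details from the slides.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter over strings (illustrative sketch only)."""

    def __init__(self, num_bits: int = 1 << 16, num_hashes: int = 4):
        # num_bits and num_hashes are assumed defaults, not tuned values.
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: str):
        # Derive num_hashes bit positions by salting SHA-256 with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        # False means "definitely absent"; True means "probably present".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


bf = BloomFilter()
bf.add("cbca")
print("cbca" in bf)  # True: an inserted item is never reported absent
```

A "No" answer is always correct because every bit of an inserted item is set; a "Yes" answer can be wrong when other items happen to have set the same bits.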

Goal
Maintain a set S, a subset of a universe U of strings, under the following operations:
Insert(x): S <- S U {x}
Query(x, y): Is there a string in S between x and y?
return No => if no string in S lies between x and y
return Yes => if there is a string in S between x and y, with a (small) false positive probability
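To pin down the interface, the sketch below is an exact baseline that answers the same two operations with a sorted list. It is not the Bloom filter structure from these slides; the class name is hypothetical, and treating "between" as inclusive on both ends is an assumption.

```python
import bisect


class ExactRangeSet:
    """Exact (no false positives) baseline for Insert / Query."""

    def __init__(self):
        self._keys = []  # kept in sorted order

    def insert(self, x: str) -> None:
        i = bisect.bisect_left(self._keys, x)
        if i == len(self._keys) or self._keys[i] != x:
            self._keys.insert(i, x)

    def query(self, x: str, y: str) -> bool:
        # Find the smallest stored key >= x and check that it is <= y.
        i = bisect.bisect_left(self._keys, x)
        return i < len(self._keys) and self._keys[i] <= y


s = ExactRangeSet()
s.insert("cbcb")
print(s.query("cbcaa", "cbcc"))  # True: "cbcb" lies in the range
print(s.query("a", "b"))         # False: nothing stored in the range
```

A Bloom-filter-based structure gives up this exactness on "Yes" answers in exchange for not storing the keys themselves.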

Our Result
C*nk space
O(k) time for range queries/inserts
An optimized version that reduces the space to C*nk/log(k) while retaining the same query time

Idea
Let B be the Bloom filter structure that represents S. For each insert(K), K in U, we insert every prefix K[0, pi], for i = 1, ..., |K|/p, into B. (We assume that |K| is divisible by p.)
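The prefix insertion can be sketched as follows. The function names are hypothetical, and a plain Python set stands in for the Bloom filter B so that the inserted prefixes can be inspected.

```python
def prefixes(key: str, p: int):
    """Yield key[0:p], key[0:2p], ..., key[0:len(key)]."""
    if len(key) % p != 0:
        raise ValueError("len(key) must be divisible by p")
    for i in range(1, len(key) // p + 1):
        yield key[: p * i]


def insert(B, key: str, p: int) -> None:
    # B is anything with an .add() method; here a set stands in
    # for the Bloom filter.
    for prefix in prefixes(key, p):
        B.add(prefix)


B = set()
insert(B, "cbcaab", 2)
print(sorted(B))  # ['cb', 'cbca', 'cbcaab']
```

Each insert therefore costs |K|/p Bloom filter insertions, which is where the K/p factor in the space bound comes from.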

Algorithm for Range Queries


We assume that all strings in U have maximum length K. For a query between X and Y, query(X, Y), if X and Y have different lengths, we pad the shorter one with wildcard characters.
Procedure (starting with i = 1):
1. If pi > K => return yes
2. For every substring x between X[0,pi] and Y[0,pi] (inclusive):
   1. If bloom filter query(x) returns true for more than one child => return yes
   2. If bloom filter query(x) returns true for only the leftmost => increment i and repeat
   3. If every query(x) returns no => return no
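One possible reading of this procedure is sketched below; it is a variant, not a transcription of the slides. It assumes fixed-length keys, queries already padded to that length, a known alphabet, and inclusive bounds. The function name is hypothetical, and a set of prefixes stands in for the Bloom filter so the answers are exact.

```python
from itertools import product


def range_query(B, X: str, Y: str, p: int, alphabet: str) -> bool:
    """Return True if some stored key s with X <= s <= Y may exist.

    B must contain every length-p*i prefix of every stored key.
    """
    if not X or len(X) != len(Y) or len(X) % p != 0:
        raise ValueError("X and Y must be non-empty, equal length, divisible by p")
    if X > Y:
        return False
    # All length-p chunks: the brute-force candidates at each level.
    chunks = ["".join(c) for c in product(sorted(alphabet), repeat=p)]

    def search(prefix: str, lo_tight: bool, hi_tight: bool) -> bool:
        # lo_tight / hi_tight: prefix still equals X's / Y's prefix.
        depth = len(prefix)
        if depth == len(X):
            return True  # a full-length boundary key is present
        lo, hi = X[depth:depth + p], Y[depth:depth + p]
        for chunk in chunks:
            if (lo_tight and chunk < lo) or (hi_tight and chunk > hi):
                continue  # outside the query range
            child = prefix + chunk
            if child not in B:
                continue
            child_lo = lo_tight and chunk == lo
            child_hi = hi_tight and chunk == hi
            if not child_lo and not child_hi:
                return True  # strictly inside: every extension is in range
            if search(child, child_lo, child_hi):
                return True
        return False

    return search("", True, True)


keys = ["cbcb", "aaaa"]
B = {k[:i] for k in keys for i in range(1, len(k) + 1)}  # p = 1
print(range_query(B, "cbca", "cbcc", 1, "abc"))  # True ("cbcb")
print(range_query(B, "baaa", "caaa", 1, "abc"))  # False
```

Only prefixes that lie on the X boundary or the Y boundary are followed further down, so at most two paths are explored, with a brute-force scan of the candidate chunks at each level.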

Space Analysis
The size of the structure is the number of inserted strings, say N, times the number of Bloom filter inserts required for a string of maximum length. If the maximum length of an inserted string is K, then that string takes K/p inserts. We therefore need at most O(NK/p) inserts into the Bloom filter.
Space:
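As a hedged reconstruction of the bound, the block below applies the standard Bloom filter sizing formula to the O(NK/p) inserts counted above. The symbol ε (false positive probability) and the sizing formula are assumptions brought in from the general Bloom filter literature, not taken from the slides.

```latex
% Standard sizing: n inserted items, false positive probability \varepsilon.
\[
  m \;=\; \frac{n \,\ln(1/\varepsilon)}{(\ln 2)^{2}}
    \;\approx\; 1.44\, n \log_{2}(1/\varepsilon) \ \text{bits}
\]
% Substituting n = O(NK/p) inserted prefixes:
\[
  \text{Space} \;=\; O\!\left(\frac{NK}{p}\,\log\frac{1}{\varepsilon}\right) \ \text{bits}
\]
```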

Query Analysis
Since a Bloom filter has O(1) lookup time, we need to look up at most all the brute-force elements at each level. Hence, the range query time of our structure is:

Error analysis:
You can set the error to whatever value you desire, but less error means more space. However, this does not impact the space as much as you would think: the dominant factor in the space is still the k/p factor outside the log.
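A small numeric check makes the trade-off concrete. It assumes the standard Bloom filter sizing formula (bits = n ln(1/ε) / (ln 2)^2); the function names and the sample values of N, K, p and ε are illustrative.

```python
import math


def bloom_bits(num_items: int, eps: float) -> int:
    """Bits needed for num_items at false positive rate eps
    (standard sizing formula, assumed here)."""
    return math.ceil(num_items * math.log(1 / eps) / math.log(2) ** 2)


def structure_bits(N: int, K: int, p: int, eps: float) -> int:
    # N strings, each contributing K/p prefix inserts.
    return bloom_bits(N * (K // p), eps)


base = structure_bits(1000, 16, 4, 0.01)
tighter = structure_bits(1000, 16, 4, 0.001)  # 10x smaller error
longer = structure_bits(1000, 32, 4, 0.01)    # 2x larger K/p

print(round(tighter / base, 2))  # 1.5
print(round(longer / base, 2))   # 2.0
```

Cutting the error by a factor of ten costs about 1.5x the space in this example, while doubling K/p doubles it.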

Optimization
Instead of brute-forcing down each of the paths that match the input range strings, we brute-force only selected nodes.

Modified Bloom filter

Modified Query Algorithm

Ex 2: [cbcaa-cbcc] LCA = <cbc>
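In a prefix tree, the lowest common ancestor (LCA) of two strings is the node for their longest common prefix, which is how this example arrives at <cbc>. A minimal sketch, with a hypothetical function name:

```python
def lca_label(x: str, y: str) -> str:
    """Longest common prefix of x and y, i.e. the label of their
    lowest common ancestor in a prefix tree."""
    n = 0
    for a, b in zip(x, y):
        if a != b:
            break
        n += 1
    return x[:n]


print(lca_label("cbcaa", "cbcc"))  # cbc
```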

Modified costs
Query cost:

Space cost:

How to get the labels


Keep four Bloom filters labeled left, right, middle and both. Preprocessing puts every node into the appropriate Bloom filter. A downside is that this makes the structure very static. In the worst case, you need to revert to the unmodified algorithm.

Thank you
