Alternative Data Structures in Ruby

Tyler McMullen

Friday, February 19, 2010


Why?


• Speed
• Memory
• Clarity



What’s wrong with my favorite data structure, X?



Nothing. (Maybe.)



• Bloom Filter
• BK-tree
• Splay Tree
• Trie
Bloom Filters

• Tests for existence in a set
• Probabilistic
• Minimal memory use

100 million strings in a Set

Traditional Set: minimum 10 GB

Bloom Filter (false-positive rate 0.00001): 280 MB
Bloom Filter (false-positive rate 0.001): 170 MB
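
The Bloom filter sizes above can be sanity-checked with the standard sizing formulas, m = -n ln(p) / (ln 2)^2 bits and k = (m / n) ln 2 hash functions. This is a rough sketch, not the speaker's calculation; the method names are made up for illustration, and the result is reported in MiB.

```ruby
# Optimal Bloom filter size in bits for n items at false-positive rate p:
#   m = -n * ln(p) / (ln 2)^2
def bloom_bits(n, p)
  (-n * Math.log(p) / (Math.log(2)**2)).ceil
end

# Optimal number of hash functions for m bits and n items: k = (m / n) * ln 2
def bloom_hash_count(m, n)
  ((m.to_f / n) * Math.log(2)).round
end

# Convert a bit count to mebibytes.
def mebibytes(bits)
  bits / 8.0 / (1024 * 1024)
end

[0.00001, 0.001].each do |p|
  bits = bloom_bits(100_000_000, p)
  hashes = bloom_hash_count(bits, 100_000_000)
  puts format('p=%g: %.0f MiB, %d hash functions', p, mebibytes(bits), hashes)
end
```

For 100 million items this comes out at roughly 286 MiB and 171 MiB, in line with the figures on the slide.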

[Figure: an 8-bit array, positions 0 1 2 3 4 5 6 7]

add: “to be or not to be”
add: “that is the question”

query: “whether ‘tis nobler”
NO MATCH

query: “to be or not to be”
MATCH

query: “in the mind to suffer”
FALSE MATCH
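
The add and query steps above could be sketched in Ruby roughly as follows. This is a minimal illustration under stated assumptions, not the implementation used in the talk: the class name, the plain Array of booleans, and the choice of SHA-256 with double hashing are all picked here for brevity.

```ruby
require 'digest'

# Minimal Bloom filter sketch. A real one would pack bits instead of using
# an Array of booleans; this keeps the logic easy to follow.
class BloomFilter
  def initialize(size_in_bits, hash_count)
    @size = size_in_bits
    @hash_count = hash_count
    @bits = Array.new(size_in_bits, false)
  end

  # Set every bit position the item hashes to.
  def add(item)
    positions(item).each { |i| @bits[i] = true }
    self
  end

  # false means "definitely not added"; true means "probably added".
  def include?(item)
    positions(item).all? { |i| @bits[i] }
  end

  private

  # Double hashing: derive hash_count positions from two 64-bit values
  # taken from a single SHA-256 digest of the item.
  def positions(item)
    h1, h2 = Digest::SHA256.digest(item).unpack('Q<Q<')
    (0...@hash_count).map { |k| (h1 + k * h2) % @size }
  end
end

filter = BloomFilter.new(1024, 3)
filter.add('to be or not to be').add('that is the question')
puts filter.include?('to be or not to be')
```

An added string always matches. A string that was never added usually does not, but it can collide with bits set by other items, which is the false match shown above.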


File Server

Request → File Server → exists? → Y: 200 / N: 404

Request → Bloom Filter → File Server → exists? → Y: 200 / N: 404

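
One way to read the second flow: the filter answers "definitely not there" cheaply, and only possible hits reach the file server. The sketch below is hypothetical. `FilteredFileServer` is an invented name, `filter` stands for any object whose `include?` never gives a false negative (such as a Bloom filter), and `store` stands for the authoritative and expensive existence check.

```ruby
# Hypothetical wrapper around an expensive existence check.
class FilteredFileServer
  attr_reader :disk_checks

  def initialize(filter, store)
    @filter = filter
    @store = store
    @disk_checks = 0
  end

  # Returns an HTTP-style status code for the requested path.
  def status_for(path)
    # The filter has no false negatives, so a miss is a certain 404.
    return 404 unless @filter.include?(path)

    # A filter hit may be a false match, so confirm against the real store.
    @disk_checks += 1
    @store.include?(path) ? 200 : 404
  end
end

# Plain arrays stand in for the filter and the file store in this demo.
# 'c.txt' plays the role of a false match in the filter.
server = FilteredFileServer.new(%w[a.txt c.txt], %w[a.txt])
%w[a.txt b.txt c.txt].each { |path| puts "#{path}: #{server.status_for(path)}" }
puts "expensive checks: #{server.disk_checks}"
```

Requests for missing files that the filter rejects never touch the store, which is where the savings come from.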
Bloom Filter

• Test for existence in set
• Tiny Memory Footprint
• Excellent Speed


BK-tree

• Finds items within a distance of a target
• Reduces search space
• Works inside a metric space

Triangle Inequality
| d(x, y) - d(x, z) | ≤ d(y, z)

Example: d(x, y) = 4 and d(x, z) = 1, d(y, z) unknown
| 4 - 1 | ≤ d(y, z)
3 ≤ d(y, z)
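
The bound can be tried out with a concrete metric. The sketch below assumes Levenshtein (edit) distance, which the slides do not name explicitly; the `levenshtein` helper is written here for illustration and is not taken from the talk.

```ruby
# Levenshtein distance: minimum number of single-character insertions,
# deletions, and substitutions needed to turn a into b.
def levenshtein(a, b)
  prev = (0..b.length).to_a
  a.each_char.with_index(1) do |ca, i|
    curr = [i]
    b.each_char.with_index(1) do |cb, j|
      cost = ca == cb ? 0 : 1
      curr << [prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost].min
    end
    prev = curr
  end
  prev.last
end

x = 'paste'
y = 'shave'
z = 'pasta'

# | d(x, y) - d(x, z) | is a lower bound on d(y, z).
lower_bound = (levenshtein(x, y) - levenshtein(x, z)).abs
puts "d(y, z) is at least #{lower_bound}"
puts "actual d(y, z) = #{levenshtein(y, z)}"
```

With these words d(x, y) = 4 and d(x, z) = 1, so the bound is 3 without ever comparing y and z directly.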


BK-tree
Words: taser, paste, shave, light, pastor, pasta

BK-tree
root: paste
children, keyed by distance from paste:
1 pasta
2 pastor
3 taser
4 shave
5 light

BK-tree
query: pastu
d(pastu, paste) = 1, so only the children at distances 1 and 2 are visited:
d(pastu, pasta) = 1
d(pastu, pastor) = 2
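
A BK-tree along these lines might look like the sketch below. It is an illustration under assumptions rather than the code behind the slides: the `BKTree` class, its `add` and `search` methods, and the use of Levenshtein distance as the metric are all choices made here.

```ruby
# Levenshtein distance, used as the metric (an assumption of this sketch).
def levenshtein(a, b)
  prev = (0..b.length).to_a
  a.each_char.with_index(1) do |ca, i|
    curr = [i]
    b.each_char.with_index(1) do |cb, j|
      cost = ca == cb ? 0 : 1
      curr << [prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost].min
    end
    prev = curr
  end
  prev.last
end

class BKTree
  # children maps a distance to the child subtree stored at that distance.
  Node = Struct.new(:word, :children)

  def initialize(&distance)
    @distance = distance
    @root = nil
  end

  def add(word)
    if @root.nil?
      @root = Node.new(word, {})
      return self
    end
    node = @root
    loop do
      d = @distance.call(word, node.word)
      return self if d.zero? # already stored
      child = node.children[d]
      if child
        node = child # an edge with this distance exists, so descend
      else
        node.children[d] = Node.new(word, {})
        return self
      end
    end
  end

  # All stored words within max_distance of target.
  def search(target, max_distance)
    matches = []
    stack = @root ? [@root] : []
    until stack.empty?
      node = stack.pop
      d = @distance.call(target, node.word)
      matches << node.word if d <= max_distance
      # Triangle inequality: only edges within max_distance of d can match.
      node.children.each do |edge, child|
        stack << child if (d - edge).abs <= max_distance
      end
    end
    matches
  end
end

tree = BKTree.new { |a, b| levenshtein(a, b) }
%w[paste pasta pastor taser shave light].each { |w| tree.add(w) }
p tree.search('pastu', 1).sort
```

Searching for "pastu" with a tolerance of 1 computes the distance to paste, pasta, and pastor only; the subtrees at distances 3, 4, and 5 are skipped.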


BK-tree

• Most often used for spelling correctors
• Work in any metric space
• Reduce the search space



Splay Tree



Tangent: Access Patterns

Usually assumed to be random or evenly distributed.

That is rarely the case.


Splay Tree

• Self-adjusting binary search tree
• Moves each accessed item to the root, so frequently accessed items stay near the top
• The more uneven the access pattern, the better

Splay Tree
[Figure: a binary search tree rooted at 7, with 4 and 11 on the second level and 2, 6, 9, 13 on the third. Accessing 9 rotates it above 11, then above 7, leaving 9 at the root with 7 and 11 as its children.]
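
The rotations in that example can be sketched with a textbook recursive splay. This is an assumption-laden illustration, not the speaker's code: `Node`, `rotate_left`, `rotate_right`, `splay`, and `leaf` are names chosen here, and only lookup-style splaying is shown (no insert or delete).

```ruby
Node = Struct.new(:key, :left, :right)

# Rotate the left child of x up; returns the new subtree root.
def rotate_right(x)
  y = x.left
  x.left = y.right
  y.right = x
  y
end

# Rotate the right child of x up; returns the new subtree root.
def rotate_left(x)
  y = x.right
  x.right = y.left
  y.left = x
  y
end

# Move key (or the last node on its search path) to the root.
def splay(root, key)
  return root if root.nil? || root.key == key

  if key < root.key
    return root if root.left.nil?
    if key < root.left.key # zig-zig
      root.left.left = splay(root.left.left, key)
      root = rotate_right(root)
    elsif key > root.left.key # zig-zag
      root.left.right = splay(root.left.right, key)
      root.left = rotate_left(root.left) unless root.left.right.nil?
    end
    root.left.nil? ? root : rotate_right(root)
  else
    return root if root.right.nil?
    if key > root.right.key # zig-zig
      root.right.right = splay(root.right.right, key)
      root = rotate_left(root)
    elsif key < root.right.key # zig-zag
      root.right.left = splay(root.right.left, key)
      root.right = rotate_right(root.right) unless root.right.left.nil?
    end
    root.right.nil? ? root : rotate_left(root)
  end
end

def leaf(key)
  Node.new(key, nil, nil)
end

# A tree shaped like the one in the figure.
root = Node.new(7,
                Node.new(4, Node.new(2, leaf(1), leaf(3)), Node.new(6, leaf(5), nil)),
                Node.new(11, Node.new(9, leaf(8), leaf(10)), Node.new(13, leaf(12), leaf(14))))

root = splay(root, 9)
puts "root: #{root.key}, left: #{root.left.key}, right: #{root.right.key}"
```

After the splay, 9 is the root and later lookups for 9 or its neighbors take fewer steps, which is why skewed access patterns benefit.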


Splay Tree

• Made for very uneven access patterns
• Caches, garbage collectors, etc.


Trie

• O(k) lookup, add, and removal for a key of length k, regardless of how many keys are stored
• Ordered traversals
• Prefix matching
• Excellent memory usage (depending on implementation)

Trie
add: “thin”
add: “trap”
add: “bar”
add: “burp”

[Figure: the resulting trie. The root branches to B and T; below B are A → R and U → R → P; below T are H → I → N and R → A → P.]

Trie
query: “trap”
T → R → A → P
Success!

Trie
query: “bupkis”
B → U, then no branch for P
Fail!

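
The add and query walk above could be written in Ruby along these lines. This is a minimal hash-based sketch, not the library used in the talk; `add` and `children` are named to match the calls in the Autocompleter example, and treating `children(prefix)` as "all stored words starting with prefix" is an assumption made here.

```ruby
class Trie
  # children maps a character to the next node; terminal marks a word end.
  Node = Struct.new(:children, :terminal)

  def initialize
    @root = Node.new({}, false)
  end

  # Walk or create one node per character, then mark the end of the word.
  def add(word)
    node = @root
    word.each_char { |ch| node = (node.children[ch] ||= Node.new({}, false)) }
    node.terminal = true
    self
  end

  # True only if the exact word was added (a bare prefix does not count).
  def include?(word)
    node = find(word)
    !node.nil? && node.terminal
  end

  # All stored words that start with prefix, in sorted order.
  def children(prefix)
    node = find(prefix)
    return [] if node.nil?

    collect(node, prefix, [])
  end

  private

  # Follow prefix from the root; nil as soon as a character has no branch.
  def find(prefix)
    node = @root
    prefix.each_char do |ch|
      node = node.children[ch]
      return nil if node.nil?
    end
    node
  end

  def collect(node, prefix, out)
    out << prefix if node.terminal
    node.children.keys.sort.each { |ch| collect(node.children[ch], prefix + ch, out) }
    out
  end
end

trie = Trie.new
%w[thin trap bar burp].each { |word| trie.add(word) }
puts trie.include?('trap')
puts trie.include?('bupkis')
p trie.children('b')
```

Lookup cost depends on the length of the key, not on how many words are stored, and walking the branches in sorted order gives the ordered traversal for free.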
Trie

Example: Autocompleter



Trie
class Autocompleter
  def initialize(words)
    @trie = Trie.new
    words.each { |word| @trie.add(word) }
  end

  # Look up completions for the given prefix.
  def query(word)
    @trie.children(word)
  end
end



Trie
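require 'json'
require 'rack'

class Autocompleter
  def initialize(words)
    @trie = Trie.new
    words.each { |word| @trie.add(word) }
  end

  # Rack entry point: responds with the completions as JSON.
  def call(env)
    request = Rack::Request.new(env)
    # The slide never defines `word`; the parameter name 'word' is an assumption.
    word = request.params['word'].to_s
    [200,
     { 'content-type' => 'application/json' },
     [@trie.children(word).to_json]] # Rack bodies must respond to #each
  end
end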



Conclusion: Data structures are cool.



Questions?
