35SearchingApplications 2x2

Algorithms, Fourth Edition
Robert Sedgewick | Kevin Wayne

3.5 Symbol Table Applications

‣ sets
‣ dictionary clients
‣ indexing clients
‣ sparse vectors

http://algs4.cs.princeton.edu

Set API

Mathematical set. A collection of distinct keys.

    public class SET<Key extends Comparable<Key>>

                  SET()                create an empty set
             void add(Key key)         add the key to the set
          boolean contains(Key key)    is the key in the set?
             void remove(Key key)      remove the key from the set
              int size()               return the number of keys in the set
    Iterator<Key> iterator()           iterator through keys in the set

Q. How to implement?

Exception filter

・Read in a list of words from one file.
・Print out all words from standard input that are { in, not in } the list.

    % more list.txt
    was it the of                        (list of exceptional words)

    % java WhiteList list.txt < tinyTale.txt
    it was the of it was the of
    it was the of it was the of
    it was the of it was the of
    it was the of it was the of
    it was the of it was the of

    % java BlackList list.txt < tinyTale.txt
    best times worst times
    age wisdom age foolishness
    epoch belief epoch incredulity
    season light season darkness
    spring hope winter despair
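One possible answer to "How to implement?" is to delegate to an existing ordered collection. The sketch below is an assumption, not the slides' implementation: the class name SetSketch is hypothetical, and it wraps java.util.TreeSet to provide the operations listed in the Set API.

```java
import java.util.Iterator;
import java.util.TreeSet;

// Hypothetical sketch: an ordered set that delegates to java.util.TreeSet.
// The name SetSketch is an assumption; the slides call the data type SET.
class SetSketch<Key extends Comparable<Key>> implements Iterable<Key> {
    private final TreeSet<Key> keys = new TreeSet<>();

    public void add(Key key)         { keys.add(key); }             // adding a duplicate has no effect
    public boolean contains(Key key) { return keys.contains(key); }
    public void remove(Key key)      { keys.remove(key); }
    public int size()                { return keys.size(); }
    public Iterator<Key> iterator()  { return keys.iterator(); }    // ascending key order

    public static void main(String[] args) {
        SetSketch<String> set = new SetSketch<>();
        for (String w : "it was the best of times it was the worst of times".split(" "))
            set.add(w);
        System.out.println(set.size());            // number of distinct words
        System.out.println(set.contains("best"));
        set.remove("best");
        for (String w : set)
            System.out.print(w + " ");
        System.out.println();
    }
}
```

Because the wrapper keeps keys ordered, iteration visits them in sorted order; a hash-based set would serve equally well when order does not matter.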
Exception filter applications

・Read in a list of words from one file.
・Print out all words from standard input that are { in, not in } the list.

    application         purpose                     key           in list
    spell checker       identify misspelled words   word          dictionary words
    browser             mark visited pages          URL           visited pages
    parental controls   block sites                 URL           bad sites
    chess               detect draw                 board         positions
    spam filter         eliminate spam              IP address    spam addresses
    credit cards        check for stolen cards      number        stolen cards

Exception filter: Java implementation

・Read in a list of words from one file.
・Print out all words from standard input that are in the list.

    public class WhiteList
    {
        public static void main(String[] args)
        {
            // create empty set of strings
            SET<String> set = new SET<String>();

            // read in whitelist
            In in = new In(args[0]);
            while (!in.isEmpty())
                set.add(in.readString());

            // print words in list
            while (!StdIn.isEmpty())
            {
                String word = StdIn.readString();
                if (set.contains(word))
                    StdOut.println(word);
            }
        }
    }

Exception filter: Java implementation

・Read in a list of words from one file.
・Print out all words from standard input that are not in the list.

    public class BlackList
    {
        public static void main(String[] args)
        {
            // create empty set of strings
            SET<String> set = new SET<String>();

            // read in blacklist
            In in = new In(args[0]);
            while (!in.isEmpty())
                set.add(in.readString());

            // print words not in list
            while (!StdIn.isEmpty())
            {
                String word = StdIn.readString();
                if (!set.contains(word))
                    StdOut.println(word);
            }
        }
    }
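The two exception filters differ only in one negation, so they can be folded into a single method. The following is a minimal sketch under stated assumptions: the class name ExceptionFilter, the method filter, and the keepListed flag are hypothetical, and java.util collections stand in for the SET, In, and StdIn classes so the example runs on its own.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: one filter covering both the whitelist and blacklist clients.
class ExceptionFilter {
    // Keep words that are in the list (keepListed == true) or not in it (keepListed == false).
    static List<String> filter(Set<String> list, List<String> words, boolean keepListed) {
        List<String> out = new ArrayList<>();
        for (String word : words)
            if (list.contains(word) == keepListed)
                out.add(word);
        return out;
    }

    public static void main(String[] args) {
        Set<String> list = new HashSet<>(Arrays.asList("was", "it", "the", "of"));
        List<String> text = Arrays.asList(
            "it was the best of times it was the worst of times".split(" "));
        System.out.println(filter(list, text, true));    // whitelist behavior
        System.out.println(filter(list, text, false));   // blacklist behavior
    }
}
```

With the four-word list from list.txt, the first call keeps "it was the of it was the of" and the second keeps "best times worst times", which corresponds to the first line of each sample run.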
Dictionary lookup

Command-line arguments.
・A comma-separated value (CSV) file.
・Key field.
・Value field.

Ex 1. DNS lookup.

    % java LookupCSV ip.csv 0 1          (domain name is key, IP is value)
    adobe.com
    192.150.18.60
    www.princeton.edu
    128.112.128.15
    ebay.edu
    Not found

    % java LookupCSV ip.csv 1 0          (IP is key, domain name is value)
    128.112.128.15
    www.princeton.edu
    999.999.999.99
    Not found

    % more ip.csv
    www.princeton.edu,128.112.128.15
    www.cs.princeton.edu,128.112.136.35
    www.math.princeton.edu,128.112.18.11
    www.cs.harvard.edu,140.247.50.127
    www.harvard.edu,128.103.60.24
    www.yale.edu,130.132.51.8
    www.econ.yale.edu,128.36.236.74
    www.cs.yale.edu,128.36.229.30
    espn.com,199.181.135.201
    yahoo.com,66.94.234.13
    msn.com,207.68.172.246
    google.com,64.233.167.99
    baidu.com,202.108.22.33
    yahoo.co.jp,202.93.91.141
    sina.com.cn,202.108.33.32
    ebay.com,66.135.192.87
    adobe.com,192.150.18.60
    163.com,220.181.29.154
    passport.net,65.54.179.226
    tom.com,61.135.158.237
    nate.com,203.226.253.11
    cnn.com,64.236.16.20
    daum.net,211.115.77.211
    blogger.com,66.102.15.100
    fastclick.com,205.180.86.4
    wikipedia.org,66.230.200.100
    rakuten.co.jp,202.72.51.22
    ...

Dictionary lookup

Ex 2. Amino acids.

    % java LookupCSV amino.csv 0 3       (codon is key, name is value)
    ACT
    Threonine
    TAG
    Stop
    CAT
    Histidine

    % more amino.csv
    TTT,Phe,F,Phenylalanine
    TTC,Phe,F,Phenylalanine
    TTA,Leu,L,Leucine
    TTG,Leu,L,Leucine
    TCT,Ser,S,Serine
    TCC,Ser,S,Serine
    TCA,Ser,S,Serine
    TCG,Ser,S,Serine
    TAT,Tyr,Y,Tyrosine
    TAC,Tyr,Y,Tyrosine
    TAA,Stop,Stop,Stop
    TAG,Stop,Stop,Stop
    TGT,Cys,C,Cysteine
    TGC,Cys,C,Cysteine
    TGA,Stop,Stop,Stop
    TGG,Trp,W,Tryptophan
    CTT,Leu,L,Leucine
    CTC,Leu,L,Leucine
    CTA,Leu,L,Leucine
    CTG,Leu,L,Leucine
    CCT,Pro,P,Proline
    CCC,Pro,P,Proline
    CCA,Pro,P,Proline
    CCG,Pro,P,Proline
    CAT,His,H,Histidine
    CAC,His,H,Histidine
    CAA,Gln,Q,Glutamine
    CAG,Gln,Q,Glutamine
    CGT,Arg,R,Arginine
    CGC,Arg,R,Arginine
    ...

Dictionary lookup

Command-line arguments.
・A comma-separated value (CSV) file.
・Key field.
・Value field.

Ex 3. Class list.

    % java LookupCSV classlist.csv 4 1   (login is key, first name is value)
    eberl
    Ethan
    nwebb
    Natalie

    % java LookupCSV classlist.csv 4 3   (login is key, section is value)
    dpan
    P01

    % more classlist.csv
    13,Berl,Ethan Michael,P01,eberl
    12,Cao,Phillips Minghua,P01,pcao
    11,Chehoud,Christel,P01,cchehoud
    10,Douglas,Malia Morioka,P01,malia
    12,Haddock,Sara Lynn,P01,shaddock
    12,Hantman,Nicole Samantha,P01,nhantman
    11,Hesterberg,Adam Classen,P01,ahesterb
    13,Hwang,Roland Lee,P01,rhwang
    13,Hyde,Gregory Thomas,P01,ghyde
    13,Kim,Hyunmoon,P01,hktwo
    12,Korac,Damjan,P01,dkorac
    11,MacDonald,Graham David,P01,gmacdona
    10,Michal,Brian Thomas,P01,bmichal
    12,Nam,Seung Hyeon,P01,seungnam
    11,Nastasescu,Maria Monica,P01,mnastase
    11,Pan,Di,P01,dpan
    12,Partridge,Brenton Alan,P01,bpartrid
    13,Rilee,Alexander,P01,arilee
    13,Roopakalu,Ajay,P01,aroopaka
    11,Sheng,Ben C,P01,bsheng
    12,Webb,Natalie Sue,P01,nwebb
    ⋮

Dictionary lookup: Java implementation

    public class LookupCSV
    {
        public static void main(String[] args)
        {
            // process input file
            In in = new In(args[0]);
            int keyField = Integer.parseInt(args[1]);
            int valField = Integer.parseInt(args[2]);

            // build symbol table
            ST<String, String> st = new ST<String, String>();
            while (!in.isEmpty())
            {
                String line = in.readLine();
                String[] tokens = line.split(",");
                String key = tokens[keyField];
                String val = tokens[valField];
                st.put(key, val);
            }

            // process lookups with standard I/O
            while (!StdIn.isEmpty())
            {
                String s = StdIn.readString();
                if (!st.contains(s)) StdOut.println("Not found");
                else                 StdOut.println(st.get(s));
            }
        }
    }
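The lookup client can also be tried without the course libraries. This is a minimal sketch under stated assumptions: the class name CsvLookup and the method build are hypothetical, java.util.HashMap stands in for ST, and the CSV lines are passed in as strings rather than read from a file. Unlike the slide code, it skips lines that have too few fields.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: CSV dictionary lookup using java.util.HashMap in place of ST.
class CsvLookup {
    // Build a key -> value table from the chosen fields of each comma-separated line.
    static Map<String, String> build(List<String> lines, int keyField, int valField) {
        Map<String, String> st = new HashMap<>();
        int needed = Math.max(keyField, valField) + 1;
        for (String line : lines) {
            String[] tokens = line.split(",");
            if (tokens.length < needed) continue;    // skip lines with too few fields
            st.put(tokens[keyField], tokens[valField]);
        }
        return st;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
            "www.princeton.edu,128.112.128.15",
            "adobe.com,192.150.18.60",
            "ebay.com,66.135.192.87");
        Map<String, String> byName = build(lines, 0, 1);   // domain name is key
        Map<String, String> byIp   = build(lines, 1, 0);   // IP is key
        System.out.println(byName.getOrDefault("adobe.com", "Not found"));
        System.out.println(byName.getOrDefault("ebay.edu", "Not found"));
        System.out.println(byIp.getOrDefault("128.112.128.15", "Not found"));
    }
}
```

Swapping the two field arguments is all it takes to invert the dictionary, which is the point of the "0 1" versus "1 0" runs on ip.csv.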
File indexing

Goal. Index a PC (or the web).


File indexing

Goal. Given a list of files, create an index so that you can efficiently find all
files containing a given query string.

    % ls *.txt
    aesop.txt magna.txt moby.txt sawyer.txt tale.txt

    % java FileIndex *.txt
    freedom
    magna.txt moby.txt tale.txt
    whale
    moby.txt
    lamb
    sawyer.txt aesop.txt

    % ls *.java
    BlackList.java Concordance.java DeDup.java FileIndex.java ST.java SET.java WhiteList.java

    % java FileIndex *.java
    import
    FileIndex.java SET.java ST.java
    Comparator
    null

Solution. Key = query string; value = set of files containing that string.

File indexing

    import java.io.File;
    public class FileIndex
    {
        public static void main(String[] args)
        {
            // symbol table
            ST<String, SET<File>> st = new ST<String, SET<File>>();

            // list of file names from command line
            for (String filename : args) {
                File file = new File(filename);
                In in = new In(file);
                // for each word in file, add file to corresponding set
                while (!in.isEmpty())
                {
                    String key = in.readString();
                    if (!st.contains(key))
                        st.put(key, new SET<File>());
                    SET<File> set = st.get(key);
                    set.add(file);
                }
            }

            // process queries
            while (!StdIn.isEmpty())
            {
                String query = StdIn.readString();
                StdOut.println(st.get(query));
            }
        }
    }
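The "key = query string, value = set of files" idea can be sketched without touching the file system. Everything named here is an assumption for illustration: the class InvertedIndexSketch, the methods build and query, and the two made-up documents a.txt and b.txt, whose contents are invented and passed in as strings. A missing word yields an empty set rather than null.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical sketch: an inverted index from words to the names of documents containing them.
class InvertedIndexSketch {
    // docs maps a document name to its full text.
    static Map<String, Set<String>> build(Map<String, String> docs) {
        Map<String, Set<String>> index = new TreeMap<>();
        for (Map.Entry<String, String> doc : docs.entrySet())
            for (String word : doc.getValue().trim().split("\\s+")) {
                if (word.isEmpty()) continue;            // empty document
                index.computeIfAbsent(word, w -> new TreeSet<>()).add(doc.getKey());
            }
        return index;
    }

    // Return the documents containing word, or an empty set if there are none.
    static Set<String> query(Map<String, Set<String>> index, String word) {
        return index.getOrDefault(word, Collections.emptySet());
    }

    public static void main(String[] args) {
        Map<String, String> docs = new LinkedHashMap<>();
        docs.put("a.txt", "the whale swims");            // made-up contents
        docs.put("b.txt", "the lamb sleeps");
        Map<String, Set<String>> index = build(docs);
        System.out.println(query(index, "the"));
        System.out.println(query(index, "whale"));
        System.out.println(query(index, "freedom"));
    }
}
```

The computeIfAbsent call plays the role of the "if (!st.contains(key)) st.put(...)" step in the FileIndex client.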
Book index

Goal. Index for an e-book.

Concordance

Goal. Preprocess a text corpus to support concordance queries:
given a word, find all occurrences with their immediate contexts.

    % java Concordance tale.txt
    cities
     tongues of the two *cities* that were blended in

    majesty
     their turnkeys and the *majesty* of the law fired
     me treason against the *majesty* of the people in
     of his most gracious *majesty* king george the third

    princeton
    no matches

Solution. Key = query string; value = set of indices containing that string.

Concordance

    public class Concordance
    {
        public static void main(String[] args)
        {
            // read text and build index
            In in = new In(args[0]);
            String[] words = in.readAllStrings();
            ST<String, SET<Integer>> st = new ST<String, SET<Integer>>();
            for (int i = 0; i < words.length; i++)
            {
                String s = words[i];
                if (!st.contains(s))
                    st.put(s, new SET<Integer>());
                SET<Integer> set = st.get(s);
                set.add(i);
            }

            // process queries and print concordances
            while (!StdIn.isEmpty())
            {
                String query = StdIn.readString();
                SET<Integer> set = st.get(query);
                if (set == null)
                {
                    StdOut.println("no matches");
                    continue;
                }
                for (int k : set)
                {
                    // print words[k-4] to words[k+4], clamped to the array bounds
                    int lo = Math.max(0, k - 4);
                    int hi = Math.min(words.length - 1, k + 4);
                    for (int j = lo; j <= hi; j++)
                        StdOut.print(words[j] + " ");
                    StdOut.println();
                }
            }
        }
    }
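The context-printing step is the part of the concordance that is easiest to get wrong at the ends of the text. Here is a minimal sketch, assuming hypothetical names (ConcordanceSketch, index, contexts, and a radius parameter) and using java.util collections instead of ST and SET; it marks the matched word with asterisks as in the sample run.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: concordance index plus context extraction.
class ConcordanceSketch {
    // Map each word to the list of positions where it occurs (in increasing order).
    static Map<String, List<Integer>> index(String[] words) {
        Map<String, List<Integer>> st = new HashMap<>();
        for (int i = 0; i < words.length; i++)
            st.computeIfAbsent(words[i], w -> new ArrayList<>()).add(i);
        return st;
    }

    // For each occurrence of query, return up to radius words on either side.
    static List<String> contexts(String[] words, Map<String, List<Integer>> st,
                                 String query, int radius) {
        List<String> out = new ArrayList<>();
        List<Integer> positions = st.getOrDefault(query, Collections.emptyList());
        for (int k : positions) {
            int lo = Math.max(0, k - radius);                  // clamp at start of text
            int hi = Math.min(words.length - 1, k + radius);   // clamp at end of text
            StringBuilder sb = new StringBuilder();
            for (int j = lo; j <= hi; j++) {
                if (j > lo) sb.append(' ');
                sb.append(j == k ? "*" + words[j] + "*" : words[j]);
            }
            out.add(sb.toString());
        }
        return out;
    }

    public static void main(String[] args) {
        String[] words = "it was the best of times it was the worst of times".split(" ");
        Map<String, List<Integer>> st = index(words);
        for (String line : contexts(words, st, "times", 2))
            System.out.println(line);
        System.out.println(contexts(words, st, "princeton", 2).isEmpty() ? "no matches" : "");
    }
}
```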
Matrix-vector multiplication (standard implementation)

              a[][]                  x[]       b[]

      0   .90    0    0    0         .05       .036
      0    0   .36  .36  .18         .04       .297
      0    0    0   .90    0    *    .36   =   .333
    .90    0    0    0    0         .37       .045
    .47    0   .47    0    0         .19       .1927

    Matrix-vector multiplication

    ...
    double[][] a = new double[N][N];
    double[] x = new double[N];
    double[] b = new double[N];
    ...
    // initialize a[][] and x[]
    ...
    // nested loops (N^2 running time)
    for (int i = 0; i < N; i++)
    {
        double sum = 0.0;
        for (int j = 0; j < N; j++)
            sum += a[i][j]*x[j];
        b[i] = sum;
    }

Sparse matrix-vector multiplication

Problem. Sparse matrix-vector multiplication.

Assumptions. Matrix dimension is 10,000; average nonzeros per row ~ 10.

    A * x = b
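The standard nested-loop implementation can be checked against the 5-by-5 example. The sketch below wraps the loops in a method; the class name DenseMatVec and the method multiply are hypothetical, and the matrix and vector values are the ones shown in the example.

```java
// Hypothetical sketch: dense matrix-vector multiplication on the example data.
class DenseMatVec {
    // Compute b = a * x with two nested loops (time proportional to rows * columns).
    static double[] multiply(double[][] a, double[] x) {
        double[] b = new double[a.length];
        for (int i = 0; i < a.length; i++) {
            double sum = 0.0;
            for (int j = 0; j < x.length; j++)
                sum += a[i][j] * x[j];
            b[i] = sum;
        }
        return b;
    }

    public static void main(String[] args) {
        double[][] a = {
            { 0,   .90, 0,   0,   0   },
            { 0,   0,   .36, .36, .18 },
            { 0,   0,   0,   .90, 0   },
            { .90, 0,   0,   0,   0   },
            { .47, 0,   .47, 0,   0   }
        };
        double[] x = { .05, .04, .36, .37, .19 };
        for (double bi : multiply(a, x))
            System.out.println(String.format("%.4f", bi));   // compare with b[] in the example
    }
}
```

For instance, the second row gives .36*.36 + .36*.37 + .18*.19 = .1296 + .1332 + .0342 = .297, which agrees with the second entry of b[].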

Vector representations

1d array (standard) representation.
・Constant time access to elements.
・Space proportional to N.

      0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19
      0 .36   0   0   0 .36   0   0   0   0   0   0   0   0 .18   0   0   0   0   0

Symbol table representation.
・Key = index, value = entry.
・Efficient iterator.
・Space proportional to number of nonzeros.

    st    key      1     5    14
          value  .36   .36   .18

Sparse vector data type

    public class SparseVector
    {
        // HashST because order not important
        private HashST<Integer, Double> v;

        // empty ST represents all 0s vector
        public SparseVector()
        {  v = new HashST<Integer, Double>();  }

        // a[i] = value
        public void put(int i, double x)
        {  v.put(i, x);  }

        // return a[i]
        public double get(int i)
        {
            if (!v.contains(i)) return 0.0;
            else return v.get(i);
        }

        // iterate through indices of nonzero entries
        public Iterable<Integer> indices()
        {  return v.keys();  }

        // dot product takes time proportional to the number of nonzeros
        // (constant time for sparse vectors)
        public double dot(double[] that)
        {
            double sum = 0.0;
            for (int i : indices())
                sum += that[i]*this.get(i);
            return sum;
        }
    }
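The sparse vector data type can be exercised on the 20-element example without the HashST class. This is a minimal sketch under assumptions: the class name SparseVectorSketch and the helper nnz are hypothetical, java.util.HashMap replaces HashST, and put removes an entry when given 0.0 so that only nonzeros are stored (the slide code does not do this).

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: sparse vector backed by java.util.HashMap (index -> nonzero entry).
class SparseVectorSketch {
    private final Map<Integer, Double> v = new HashMap<>();

    public void put(int i, double x) {
        if (x == 0.0) v.remove(i);      // assumption: never store explicit zeros
        else          v.put(i, x);
    }

    public double get(int i) { return v.getOrDefault(i, 0.0); }

    public int nnz() { return v.size(); }   // number of stored (nonzero) entries

    // Dot product with a dense vector: one multiply-add per nonzero entry.
    public double dot(double[] that) {
        double sum = 0.0;
        for (Map.Entry<Integer, Double> e : v.entrySet())
            sum += that[e.getKey()] * e.getValue();
        return sum;
    }

    public static void main(String[] args) {
        SparseVectorSketch a = new SparseVectorSketch();
        a.put(1, .36);
        a.put(5, .36);
        a.put(14, .18);                  // the three nonzeros of the 20-element example
        double[] ones = new double[20];
        java.util.Arrays.fill(ones, 1.0);
        System.out.println(a.nnz());     // stored entries, not the vector length
        System.out.println(a.get(7));    // absent index reads as zero
        System.out.println(a.dot(ones)); // sum of the nonzero entries
    }
}
```

Iterating over map entries avoids the second lookup that this.get(i) performs in the slide version; the asymptotic cost is the same.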
Matrix representations

2D array (standard) matrix representation: Each row of matrix is an array.
・Constant time access to elements.
・Space proportional to N^2.

Sparse matrix representation: Each row of matrix is a sparse vector.
・Efficient access to elements.
・Space proportional to number of nonzeros (plus N).

[Figure: sparse matrix representations. Left: array of double[] objects. Right: array of SparseVector objects, each row an independent symbol-table object mapping key (column index) to value (nonzero entry); the entry a[4][2] is marked in both.]

Sparse matrix-vector multiplication

              a[][]                  x[]       b[]

      0   .90    0    0    0         .05       .036
      0    0   .36  .36  .18         .04       .297
      0    0    0   .90    0    *    .36   =   .333
    .90    0    0    0    0         .37       .045
    .47    0   .47    0    0         .19       .1927

    Matrix-vector multiplication

    ..
    SparseVector[] a = new SparseVector[N];
    double[] x = new double[N];
    double[] b = new double[N];
    ...
    // Initialize a[] and x[]
    ...
    // linear running time for sparse matrix
    for (int i = 0; i < N; i++)
        b[i] = a[i].dot(x);
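To see the row-of-sparse-vectors representation at work, the sketch below converts a dense matrix to one map per row and multiplies it by a dense vector. The names SparseMatVec, toSparse, and multiply are hypothetical, and a list of java.util.HashMap objects stands in for the array of SparseVector objects.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: sparse matrix stored as one (column -> value) map per row.
class SparseMatVec {
    // Keep only the nonzero entries of each row.
    static List<Map<Integer, Double>> toSparse(double[][] dense) {
        List<Map<Integer, Double>> rows = new ArrayList<>();
        for (double[] denseRow : dense) {
            Map<Integer, Double> row = new HashMap<>();
            for (int j = 0; j < denseRow.length; j++)
                if (denseRow[j] != 0.0) row.put(j, denseRow[j]);
            rows.add(row);
        }
        return rows;
    }

    // b[i] = dot product of sparse row i with x; total work is proportional to
    // the number of rows plus the number of nonzeros.
    static double[] multiply(List<Map<Integer, Double>> rows, double[] x) {
        double[] b = new double[rows.size()];
        for (int i = 0; i < rows.size(); i++) {
            double sum = 0.0;
            for (Map.Entry<Integer, Double> e : rows.get(i).entrySet())
                sum += e.getValue() * x[e.getKey()];
            b[i] = sum;
        }
        return b;
    }

    public static void main(String[] args) {
        double[][] a = {
            { 0,   .90, 0,   0,   0   },
            { 0,   0,   .36, .36, .18 },
            { 0,   0,   0,   .90, 0   },
            { .90, 0,   0,   0,   0   },
            { .47, 0,   .47, 0,   0   }
        };
        double[] x = { .05, .04, .36, .37, .19 };
        for (double bi : multiply(toSparse(a), x))
            System.out.println(String.format("%.4f", bi));
    }
}
```

Under the stated assumptions (dimension 10,000 and about 10 nonzeros per row), the dense nested loops perform 10,000 * 10,000 = 10^8 multiply-adds, while the sparse version performs about 10,000 * 10 = 10^5.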
