
String Algorithm

Implementations of string algorithms, including string sorts (MSD, LSD, and 3-way quicksort) and searching with tries.

Uploaded by

Hafidz Jazuli Luthfi


String Processing

There are many practical applications of string processing: processing encoded information in information processing, processing genetic codes encoded with four characters (A, C, T, and G) in computational biology, processing data exchanged in communication systems, and describing sets of strings in the theory of formal languages, which underlies programming systems.

1. String Sorts
This section discusses how to solve the sorting problem for strings. Unlike the general-purpose sorting algorithms we already know, string sorts exploit the structure of string keys to achieve better performance. We should therefore have some insight into the strings before running any sorting algorithm: useful properties include their randomness, the lengths of their longest common prefixes, and the character encoding used. Without enough insight into the strings we are dealing with, our sorting algorithm may not perform as well as expected.

Key-Indexed Counting
Before we discuss any string sorts, we need to understand how to sort a set of string-integer pairs by their integer keys. An example of string-integer pairs is illustrated on the left. Intuitively, the idea of key-indexed counting is to sort the strings by their key. That is why we can use this method as a building block for the string sorts discussed later. Below is the implementation of key-indexed counting in Java:
// a[] holds records whose key() returns their section number, an int in [0, R), R = radix
int N = a.length;
Item[] aux = new Item[N];        // Item: the record type of a[] (hypothetical name)
int[] count = new int[R+1];

// Compute frequency counts (shifted by one position)
for (int i = 0; i < N; i++)
    count[a[i].key() + 1]++;
// Transform counts to indices
for (int r = 0; r < R; r++)
    count[r+1] += count[r];
// Distribute the records
for (int i = 0; i < N; i++)
    aux[count[a[i].key()]++] = a[i];
// Copy back
for (int i = 0; i < N; i++)
    a[i] = aux[i];
In the first step, we count the frequency of each key at index key+1 rather than at index key, so that the next step yields the starting offset of each key rather than just how many times it occurs. In the example above (sections 1 to 4, R = 5), the count array will be [0, 0, 3, 5, 6, 6]. The first entry is always zero; the second is zero here because no string has key 0.
In the second step, we transform the count array into an array of starting indices, so the count array becomes [0, 0, 3, 8, 14, 20]. Entry count[r] now tells where the strings with key r start in the sorted order: strings with key 1 start at index 0, strings with key 2 start at index 3, strings with key 3 start at index 8, and strings with key 4 start at index 14. This is similar to the offsets used in pagination. Note that the first and second entries are still zero.
In the third step, we distribute each string into the auxiliary array at the index stored in the count array for its key. Because each index is incremented after use, the count array becomes [0, 3, 8, 14, 20, 20] once every string has been placed in the auxiliary array. Strings with equal keys keep their relative order, so the sort is stable.
The last step copies the sorted strings from the auxiliary array back into the original array.
The running time of key-indexed counting is 8N + 3R + 1, counted as follows: N + R + 1 for initializing the arrays, 2N for the first step, 2R for the second step, 3N for the third step, and 2N for the fourth step. The cost is therefore linear in N + R.
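
As a sketch of the four steps, the class below runs key-indexed counting on a hypothetical roster of 20 students whose section numbers (1 to 4) occur 3, 5, 6, and 6 times, the same frequencies as the example above, and records the count array after steps 1 to 3. The `Student` class, the names, and the returned snapshots are illustrative assumptions, not part of the original code.

```java
import java.util.Arrays;

// Hypothetical record type: a name paired with a section number (the key).
final class Student {
    final String name;
    final int section;
    Student(String name, int section) { this.name = name; this.section = section; }
    int key() { return section; }
}

final class KeyIndexedCounting {
    // Stably sorts a[] by key (0 <= key < R) and returns count[] after steps 1, 2 and 3.
    static int[][] sort(Student[] a, int R) {
        int N = a.length;
        Student[] aux = new Student[N];
        int[] count = new int[R + 1];
        int[][] snapshots = new int[3][];

        for (int i = 0; i < N; i++)       // step 1: frequency counts, shifted by one
            count[a[i].key() + 1]++;
        snapshots[0] = count.clone();
        for (int r = 0; r < R; r++)       // step 2: counts -> starting indices
            count[r + 1] += count[r];
        snapshots[1] = count.clone();
        for (int i = 0; i < N; i++)       // step 3: distribute, keeping equal keys in order
            aux[count[a[i].key()]++] = a[i];
        snapshots[2] = count.clone();
        for (int i = 0; i < N; i++)       // step 4: copy back
            a[i] = aux[i];
        return snapshots;
    }

    // Hypothetical sample: keys 1..4 occur 3, 5, 6 and 6 times.
    static Student[] sample() {
        String[] names = {"Anderson", "Brown", "Davis", "Garcia", "Harris", "Jackson", "Johnson",
            "Jones", "Martin", "Martinez", "Miller", "Moore", "Robinson", "Smith", "Taylor",
            "Thomas", "Thompson", "White", "Williams", "Wilson"};
        int[] sections = {2, 3, 3, 4, 1, 3, 4, 3, 1, 2, 2, 1, 2, 4, 3, 4, 4, 2, 3, 4};
        Student[] a = new Student[names.length];
        for (int i = 0; i < a.length; i++) a[i] = new Student(names[i], sections[i]);
        return a;
    }

    public static void main(String[] args) {
        Student[] a = sample();
        int[][] snapshots = sort(a, 5);   // R = 5 so that keys 0..4 are allowed
        for (int[] s : snapshots) System.out.println(Arrays.toString(s));
        for (Student s : a) System.out.println(s.section + " " + s.name);
    }
}
```

The first three printed lines are [0, 0, 3, 5, 6, 6], [0, 0, 3, 8, 14, 20], and [0, 3, 8, 14, 20, 20], followed by the roster grouped by section with the original order preserved inside each section.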

Least-Significant-Digit (LSD) Radix Sort

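The term least significant means that we examine the characters of each string from right to left, while the term digit means that each character is treated as a digit in base R, where R is the size of the alphabet (for example, R = 256 for 8-bit extended ASCII). Because LSD examines the same character position d in every string, all strings must have the same length, so LSD is useful for sorting fixed-length strings such as car license plates, IP addresses, bank account numbers, and telephone numbers.
LSD is quite simple because it applies key-indexed counting once per character position, using the d-th character as the key. Because key-indexed counting is stable, each pass preserves the order established by earlier passes among strings that tie on the current character. Here is an implementation of LSD in Java: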
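// Sorts a[] of strings that all have the same length W
public static void sort(String[] a, int W)
{
    int N = a.length;
    int R = 256;                        // extended ASCII alphabet
    String[] aux = new String[N];

    for (int d = W-1; d >= 0; d--)
    {   // Sort by key-indexed counting on the d-th character
        int[] count = new int[R+1];
        for (int i = 0; i < N; i++)     // compute frequency counts
            count[a[i].charAt(d) + 1]++;
        for (int r = 0; r < R; r++)     // transform counts to indices
            count[r+1] += count[r];
        for (int i = 0; i < N; i++)     // distribute
            aux[count[a[i].charAt(d)]++] = a[i];
        for (int i = 0; i < N; i++)     // copy back
            a[i] = aux[i];
    }
}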
The trace of the LSD algorithm is illustrated in the image below:
Image 1: Trace of LSD

The running time of LSD is ~7WN + 3WR (array accesses), with extra space proportional to N + R, for N strings of fixed length W over an R-character alphabet.
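
Since the trace image is not reproduced here, the following sketch records the array after each LSD pass, showing that after the pass on position d the strings are sorted by their last W - d characters. It assumes 7-character license plates over extended ASCII (R = 256); the class name `LsdTrace` and the sample plates are illustrative.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class LsdTrace {
    // LSD sort of fixed-length strings; returns a copy of a[] after each pass (d = W-1 down to 0).
    static List<String[]> sort(String[] a, int W) {
        int N = a.length, R = 256;
        String[] aux = new String[N];
        List<String[]> passes = new ArrayList<>();
        for (int d = W - 1; d >= 0; d--) {
            int[] count = new int[R + 1];
            for (int i = 0; i < N; i++) count[a[i].charAt(d) + 1]++;
            for (int r = 0; r < R; r++) count[r + 1] += count[r];
            for (int i = 0; i < N; i++) aux[count[a[i].charAt(d)]++] = a[i];
            for (int i = 0; i < N; i++) a[i] = aux[i];
            passes.add(a.clone());   // snapshot after sorting on character d
        }
        return passes;
    }

    public static void main(String[] args) {
        String[] plates = {"4PGC938", "2IYE230", "3CIO720", "1ICK750",
                           "1OHV845", "4JZY524", "2RLA629", "3ATW723"};
        List<String[]> passes = sort(plates, 7);
        for (int p = 0; p < passes.size(); p++)
            System.out.println("d=" + (6 - p) + ": " + Arrays.toString(passes.get(p)));
    }
}
```

Each printed line orders the plates by a longer suffix than the line before; the last line (d=0) is fully sorted.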

Most-Significant-Digit (MSD) Radix Sorting

Most practical string-processing cases involve strings of varying length. To obtain a general-purpose string sort, we need a method that examines the characters of each string from left to right. MSD does exactly that.
The key idea of MSD is similar to quicksort, but instead of partitioning into two or three parts, MSD uses key-indexed counting on the first character to partition the array into one subarray per character value, plus one subarray for strings that have already ended. It then recursively sorts each subarray on the next character. The subarrays become smaller as the recursion goes deeper, and strings that end before the current position are already in their final place.
Below is an implementation of MSD in Java:
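public class MSD {
    private static final int R = 256;   // radix
    private static final int M = 5;     // cutoff for small subarrays
    private static String[] aux;        // auxiliary array for distribution

    private static int charAt(String s, int d) {
        // Return the character at index d, or -1 if d is past the end of the string
        if (d < s.length()) return s.charAt(d); else return -1;
    }

    public static void sort(String[] a) {
        int N = a.length;
        aux = new String[N];
        sort(a, 0, N-1, 0);
    }

    private static void sort(String[] a, int lo, int hi, int d) {
        // Sort a[lo..hi], starting at the d-th character
        if (hi <= lo + M) {
            insertion(a, lo, hi, d); return;
        }
        int[] count = new int[R+2];
        for (int i = lo; i <= hi; i++)      // compute frequency counts
            count[charAt(a[i], d) + 2]++;   // +2 because charAt() may return -1
        for (int r = 0; r < R+1; r++)       // transform counts to indices
            count[r+1] += count[r];
        for (int i = lo; i <= hi; i++)      // distribute
            aux[count[charAt(a[i], d) + 1]++] = a[i];
        for (int i = lo; i <= hi; i++)      // copy back
            a[i] = aux[i-lo];
        for (int r = 0; r < R; r++)         // recursively sort each character value
            sort(a, lo + count[r], lo + count[r+1] - 1, d+1);
    }

    // Insertion sort for small subarrays, comparing strings from the d-th character on
    private static void insertion(String[] a, int lo, int hi, int d) {
        for (int i = lo; i <= hi; i++)
            for (int j = i; j > lo && less(a[j], a[j-1], d); j--) {
                String t = a[j]; a[j] = a[j-1]; a[j-1] = t;
            }
    }

    private static boolean less(String v, String w, int d) {
        return v.substring(d).compareTo(w.substring(d)) < 0;
    }
}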
The image below illustrates MSD on a small set of strings with R=15 (LOWERCASE encoding):

Image 2: Trace of MSD

We maintain a cutoff value M to identify small subarrays, so we can improve the running time of MSD by switching to insertion sort for them. Another aspect we should be concerned about is the size of the radix: each recursive call allocates a count array of size R + 2, and MSD makes a huge number of recursive calls on small subarrays. This is acceptable for 8-bit extended ASCII, where R = 256, but for Unicode, where R = 65536, the cost of these count arrays makes MSD very slow.
The running time of MSD is between 8N + 3R and ~7wN + 3WR (array accesses), where w is the average string length.
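
To make the cost of the radix concrete, the sketch below instruments an MSD sort (the same counting and distribution steps as above, but with the radix and cutoff passed as parameters) with a hypothetical counter `allocated` that adds up the sizes of all count arrays. Sorting the same lowercase words with R = 256 and with R = 65536 produces the same set of non-trivial recursive calls, so the total count-array space grows by exactly the factor (65536 + 2) / (256 + 2), roughly 254.

```java
import java.util.Arrays;

final class MsdCountCost {
    static long allocated;                 // instrumentation: total count[] entries allocated

    private static int charAt(String s, int d) { return d < s.length() ? s.charAt(d) : -1; }

    static void sort(String[] a, int R, int cutoff) {
        allocated = 0;
        sort(a, new String[a.length], 0, a.length - 1, 0, R, cutoff);
    }

    private static void sort(String[] a, String[] aux, int lo, int hi, int d, int R, int cutoff) {
        if (hi <= lo + cutoff) { insertion(a, lo, hi, d); return; }
        int[] count = new int[R + 2];
        allocated += R + 2;
        for (int i = lo; i <= hi; i++) count[charAt(a[i], d) + 2]++;
        for (int r = 0; r < R + 1; r++) count[r + 1] += count[r];
        for (int i = lo; i <= hi; i++) aux[count[charAt(a[i], d) + 1]++] = a[i];
        for (int i = lo; i <= hi; i++) a[i] = aux[i - lo];
        for (int r = 0; r < R; r++)        // most of these calls get empty subarrays
            sort(a, aux, lo + count[r], lo + count[r + 1] - 1, d + 1, R, cutoff);
    }

    // Insertion sort on a[lo..hi], comparing strings from the d-th character on.
    private static void insertion(String[] a, int lo, int hi, int d) {
        for (int i = lo; i <= hi; i++)
            for (int j = i; j > lo && a[j].substring(d).compareTo(a[j - 1].substring(d)) < 0; j--) {
                String t = a[j]; a[j] = a[j - 1]; a[j - 1] = t;
            }
    }

    public static void main(String[] args) {
        String[] words = {"she", "sells", "seashells", "by", "the", "sea", "shore",
                          "the", "shells", "she", "sells", "are", "surely", "seashells"};
        for (int R : new int[] {256, 65536}) {
            String[] a = words.clone();
            sort(a, R, 0);                 // cutoff 0: no insertion-sort shortcut
            System.out.println("R=" + R + " allocated=" + allocated + " " + Arrays.toString(a));
        }
    }
}
```

Both runs print the same sorted words; only the allocation total differs, which is the overhead the cutoff and 3-way string quicksort are meant to avoid.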

Three-Way String Radix Quicksort

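As mentioned before, MSD can be slow for large alphabets such as Unicode, which has 65536 code values. To solve that problem, we need to remove the need for the count arrays and the auxiliary array. Three-way string quicksort does this by 3-way partitioning on the d-th character: the first subarray holds strings whose d-th character is less than the partitioning character v, the second holds strings whose d-th character equals v, and the third holds strings whose d-th character is greater than v. It then recursively sorts the first and third subarrays on the same character and the middle subarray on the next character. We can say that this algorithm is a hybrid that combines ordinary quicksort and MSD.
Below is an implementation of three-way string quicksort in Java: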
public class Quick3string {
    private static int charAt(String s, int d) {
        // Return the character at index d, or -1 if d is past the end of the string
        if (d < s.length()) return s.charAt(d); else return -1;
    }

    public static void sort(String[] a) {
        sort(a, 0, a.length - 1, 0);
    }

    private static void sort(String[] a, int lo, int hi, int d) {
        if (hi <= lo) return;
        int lt = lo, gt = hi;
        int v = charAt(a[lo], d);           // partitioning character
        int i = lo + 1;
        while (i <= gt) {
            int t = charAt(a[i], d);
            if (t < v) exch(a, lt++, i++);
            else if (t > v) exch(a, i, gt--);
            else i++;
        }
        // Now a[lo..lt-1] < v = a[lt..gt] < a[gt+1..hi] on the d-th character
        sort(a, lo, lt-1, d);
        if (v >= 0) sort(a, lt, gt, d+1);   // skip when the middle strings have all ended
        sort(a, gt+1, hi, d);
    }

    private static void exch(String[] a, int i, int j) {
        String t = a[i]; a[i] = a[j]; a[j] = t;
    }
}
The image below illustrates a trace of three-way string quicksort on a small set of strings, without a cutoff:

Image 3: Trace of 3-way string quicksort

Because three-way string quicksort is built on ordinary quicksort, it uses about ~2N ln N character compares on average to sort N random strings.
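
To illustrate a single partitioning step, the sketch below applies the 3-way partitioning loop from the code above once, on the first character (d = 0), and returns the boundaries lt and gt. The method name `partition` and the sample words are assumptions for illustration; the full sort would then recurse on the three parts as described.

```java
import java.util.Arrays;

final class Quick3Partition {
    private static int charAt(String s, int d) { return d < s.length() ? s.charAt(d) : -1; }

    private static void exch(String[] a, int i, int j) { String t = a[i]; a[i] = a[j]; a[j] = t; }

    // One 3-way partitioning step on character d of a[lo..hi]; returns {lt, gt}.
    // Afterwards: a[lo..lt-1] < v, a[lt..gt] == v, a[gt+1..hi] > v on character d.
    static int[] partition(String[] a, int lo, int hi, int d) {
        int lt = lo, gt = hi;
        int v = charAt(a[lo], d);
        int i = lo + 1;
        while (i <= gt) {
            int t = charAt(a[i], d);
            if (t < v) exch(a, lt++, i++);
            else if (t > v) exch(a, i, gt--);
            else i++;
        }
        return new int[] {lt, gt};
    }

    public static void main(String[] args) {
        String[] a = {"she", "sells", "seashells", "by", "the", "sea", "shore",
                      "the", "shells", "she", "sells", "are", "surely", "seashells"};
        int[] b = partition(a, 0, a.length - 1, 0);
        System.out.println("less:    " + Arrays.toString(Arrays.copyOfRange(a, 0, b[0])));
        System.out.println("equal:   " + Arrays.toString(Arrays.copyOfRange(a, b[0], b[1] + 1)));
        System.out.println("greater: " + Arrays.toString(Arrays.copyOfRange(a, b[1] + 1, a.length)));
    }
}
```

With v = 's', the two words starting with a smaller letter (by, are) end up on the left, the ten words starting with s in the middle, and the two copies of the on the right, so lt = 2 and gt = 11.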

2. Tries
In this section, we deal with the string search problem. Symbol tables based on binary search trees (BSTs) or red-black BSTs serve as the baseline for searching sets of keys such as integers or characters. For strings, we look for opportunities to improve on them by exploiting patterns in the keys, for example many keys sharing long common prefixes, or key lengths being uniformly distributed. A trie is aware of such patterns and offers a data structure that speeds up string search. It resembles a compression method in that a common prefix is stored only once, but it is organized as a tree in the graph-theory sense.
The term trie was introduced by E. Fredkin in 1960 and comes from the middle of the word retrieval; it is usually pronounced ‘try’ to avoid confusion with the word ‘tree’. The name is a bit of wordplay, much like the term ‘dynamic programming’, which refers to a mathematical method rather than to computer programming.
Image 4 illustrates the anatomy of a trie corresponding to a small set of words. To reproduce the trie construction illustrated in image 4, we need the following rules:
1. The root is a node with a null value.
2. Leaves are nodes that have no children, that is, all of their links are null.
3. A trie corresponds to a map of key-value pairs; in this case, the key is a string and the value is an integer identifier of that string.
4. A key may be a prefix of another key, so a key corresponds to the path from the root to the node that holds its value, and that node need not be a leaf. For example, the word ‘she’ is a prefix of the word ‘shells’; their values, 0 and 3, are stored in the nodes for the last character of each word.

Image 4: Anatomy of a trie
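
Rule 4 can be checked with a small sketch. To keep it short, this trie stores children in a `TreeMap` instead of an R-sized array (a deliberate simplification, not the R-way representation discussed next); the class `MapTrie`, its `isLeaf` helper, and the sample keys are hypothetical. It shows that the value 0 for ‘she’ sits in a node that still has children, because ‘she’ is a prefix of ‘shells’.

```java
import java.util.TreeMap;

final class MapTrie {
    private static final class Node {
        Integer val;                                      // null when no key ends here
        final TreeMap<Character, Node> next = new TreeMap<>();
    }

    private final Node root = new Node();                 // rule 1: the root has a null value

    void put(String key, int val) {
        Node x = root;
        for (char c : key.toCharArray())
            x = x.next.computeIfAbsent(c, k -> new Node());
        x.val = val;                                      // value stored at the key's last node
    }

    private Node find(String key) {
        Node x = root;
        for (int d = 0; d < key.length() && x != null; d++)
            x = x.next.get(key.charAt(d));
        return x;
    }

    Integer get(String key) {
        Node x = find(key);
        return x == null ? null : x.val;
    }

    // Rule 2: a leaf has no children.
    boolean isLeaf(String key) {
        Node x = find(key);
        return x != null && x.next.isEmpty();
    }

    public static void main(String[] args) {
        String[] keys = {"she", "sells", "sea", "shells", "by", "the", "sea", "shore"};
        MapTrie trie = new MapTrie();
        for (int i = 0; i < keys.length; i++) trie.put(keys[i], i);
        System.out.println("she -> " + trie.get("she") + ", leaf? " + trie.isLeaf("she"));
        System.out.println("shells -> " + trie.get("shells") + ", leaf? " + trie.isLeaf("shells"));
        System.out.println("shell -> " + trie.get("shell"));
    }
}
```

Here ‘she’ maps to 0 in an internal node, ‘shells’ maps to 3 in a leaf, and ‘shell’ returns null because no key ends at its node. The second ‘sea’ overwrites the first, so ‘sea’ maps to 6.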

R-Way Trie
The idea of an R-way trie is that each node has R links, one for each possible character, so any character can follow any prefix regardless of how the input strings are distributed. For example, if we index words over the ASCII alphabet and the words are distributed uniformly, a path of average length L may branch into any of the R links at each of its L nodes. Image 5 illustrates the construction of an R-way trie with R = 256 (extended ASCII) for the words sea, shells, and she.

Image 5: R-Way tries construction

Below is an implementation of an R-way trie in Java:

public class TrieST<Value>
{
    private static final int R = 256;       // extended ASCII
    private Node root;

    private static class Node
    {
        private Object val;                 // stored as Object: the static Node class cannot use Value
        private Node[] next = new Node[R];
    }

    @SuppressWarnings("unchecked")
    public Value get(String key)
    {
        Node x = get(root, key, 0);
        if (x == null) return null;
        return (Value) x.val;
    }

    private Node get(Node x, String key, int d)
    {   // Return the node associated with key in the subtrie rooted at x
        if (x == null) return null;
        if (d == key.length()) return x;
        char c = key.charAt(d);             // use the d-th character to choose the subtrie
        return get(x.next[c], key, d+1);
    }

    public void put(String key, Value val)
    {
        root = put(root, key, val, 0);
    }

    private Node put(Node x, String key, Value val, int d)
    {   // Change the value associated with key in the subtrie rooted at x
        if (x == null) x = new Node();
        if (d == key.length()) { x.val = val; return x; }
        char c = key.charAt(d);
        x.next[c] = put(x.next[c], key, val, d+1);
        return x;
    }
}
Since each node of the trie branches into R links, a search for a key that is not in the trie usually ends after very few nodes: for a trie built from N random keys, a search miss examines ~log_R N nodes on average, while a search hit takes time proportional to the length of the key. The number of links is between RN and RNw, where R is the radix, N is the number of keys, and w is the average key length. This is good for speed but not for space: in a typical system, an R-way trie is impractical for millions of Unicode strings (R = 65536).
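
To put numbers on the space cost, the sketch below builds an R-way trie with R = 256 (the same put logic as above) and counts its nodes; every node carries R links whether or not they are used. The class name and the counters are hypothetical instrumentation. For the eight sample keys she, sells, sea, shells, by, the, sea, shore, the trie has 20 nodes (the root plus one node per distinct non-empty prefix), that is 20 × 256 = 5120 links, of which only 19 are non-null.

```java
final class RWayTrieSpace {
    private static final int R = 256;

    private static final class Node {
        Object val;
        final Node[] next = new Node[R];
    }

    private Node root;
    private int nodes;                       // instrumentation: number of nodes created

    void put(String key, Object val) { root = put(root, key, val, 0); }

    private Node put(Node x, String key, Object val, int d) {
        if (x == null) { x = new Node(); nodes++; }
        if (d == key.length()) { x.val = val; return x; }
        char c = key.charAt(d);
        x.next[c] = put(x.next[c], key, val, d + 1);
        return x;
    }

    int nodeCount() { return nodes; }
    long linkCount() { return (long) nodes * R; }

    // Counts the links that are actually non-null.
    private int used(Node x) {
        if (x == null) return 0;
        int n = 0;
        for (Node child : x.next) if (child != null) n += 1 + used(child);
        return n;
    }

    int usedLinks() { return used(root); }

    public static void main(String[] args) {
        RWayTrieSpace t = new RWayTrieSpace();
        String[] keys = {"she", "sells", "sea", "shells", "by", "the", "sea", "shore"};
        for (int i = 0; i < keys.length; i++) t.put(keys[i], i);
        System.out.println(t.nodeCount() + " nodes, " + t.linkCount() + " links, "
                + t.usedLinks() + " non-null");
    }
}
```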

Ternary Search Tries (TST)

As mentioned before, an R-way trie requires a huge amount of space, which makes it unsuitable in many practical cases. To handle big data in practice, we need to recognize that typical input is not random. Non-random strings contain patterns and repetition, and most of the R links in each node stay null. Knowing this, it is possible to reduce the R-way trie to a K-way trie with K as small as possible.
As its name suggests, a Ternary Search Trie (TST) is a reduced version of the R-way trie in which each node has exactly three links. Looking at the construction of the TST illustrated in image 6, we see that a TST requires a different representation: each node stores its character explicitly and has three links, to the subtries for keys whose current character is smaller than, equal to, or larger than the node's character. The space per node therefore no longer depends on the size of the encoding. The search operation is illustrated in image 7.

Image 6: TST construction

Image 7: Search on TST

Below is an implementation of a TST in Java:

public class TST<Value>
{
    private Node root;

    private class Node
    {
        char c;                         // character stored in this node
        Node left, mid, right;          // smaller, equal (next character), larger
        Value val;                      // value of the key that ends here, if any
    }

    public Value get(String key)
    {
        if (key.length() == 0) throw new IllegalArgumentException("key must be non-empty");
        Node x = get(root, key, 0);
        if (x == null) return null;
        return x.val;
    }

    private Node get(Node x, String key, int d)
    {
        if (x == null) return null;
        char c = key.charAt(d);
        if (c < x.c) return get(x.left, key, d);
        else if (c > x.c) return get(x.right, key, d);
        else if (d < key.length() - 1) return get(x.mid, key, d+1);
        else return x;
    }

    public void put(String key, Value val)
    {
        if (key.length() == 0) throw new IllegalArgumentException("key must be non-empty");
        root = put(root, key, val, 0);
    }

    private Node put(Node x, String key, Value val, int d)
    {
        char c = key.charAt(d);
        if (x == null) { x = new Node(); x.c = c; }
        if (c < x.c) x.left = put(x.left, key, val, d);
        else if (c > x.c) x.right = put(x.right, key, val, d);
        else if (d < key.length() - 1) x.mid = put(x.mid, key, val, d+1);
        else x.val = val;
        return x;
    }
}
For a TST built from N random keys, a search miss uses about ~ln N character compares on average, and the number of links is between 3N and 3Nw, where w is the average key length.
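
For comparison with the R-way trie, the sketch below builds a TST with the same put logic as the TST code above from the keys she, sells, sea, shells, by, the, sea, shore, and counts its nodes with a hypothetical counter. The TST needs one node per distinct non-empty prefix, 19 here, so it holds 3 × 19 = 57 links, whereas an R-way trie with R = 256 over the same keys needs 20 nodes and 5120 links.

```java
final class TstSpace {
    private final class Node {
        char c;
        Node left, mid, right;
        Object val;
    }

    private Node root;
    private int nodes;                       // instrumentation: number of nodes created

    void put(String key, Object val) { root = put(root, key, val, 0); }

    private Node put(Node x, String key, Object val, int d) {
        char c = key.charAt(d);
        if (x == null) { x = new Node(); x.c = c; nodes++; }
        if (c < x.c) x.left = put(x.left, key, val, d);
        else if (c > x.c) x.right = put(x.right, key, val, d);
        else if (d < key.length() - 1) x.mid = put(x.mid, key, val, d + 1);
        else x.val = val;
        return x;
    }

    int nodeCount() { return nodes; }
    int linkCount() { return 3 * nodes; }   // left, mid and right in every node

    public static void main(String[] args) {
        TstSpace t = new TstSpace();
        String[] keys = {"she", "sells", "sea", "shells", "by", "the", "sea", "shore"};
        for (int i = 0; i < keys.length; i++) t.put(keys[i], i);
        System.out.println(t.nodeCount() + " nodes, " + t.linkCount() + " links");
    }
}
```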

3. Substring Search
Let’s make a leap. We do not consider the brute-force implementation, because its worst-case running time is ~NM for a pattern of length M and a text of length N; our attention here is on more efficient algorithms.
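
For reference, here is a sketch of the brute-force search that the following algorithms improve on; the compare counter is hypothetical instrumentation. On the worst-case input below (pattern AAAAB in a text of nine A's followed by B), every one of the N - M + 1 = 6 alignments examines all M = 5 pattern characters, for 30 compares in total.

```java
final class BruteForceSearch {
    static long compares;                    // instrumentation: character compares

    // Returns the offset of the first occurrence of pat in txt, or N if there is none.
    static int search(String pat, String txt) {
        compares = 0;
        int M = pat.length(), N = txt.length();
        for (int i = 0; i <= N - M; i++) {
            int j;
            for (j = 0; j < M; j++) {
                compares++;
                if (txt.charAt(i + j) != pat.charAt(j)) break;
            }
            if (j == M) return i;            // found
        }
        return N;                            // not found
    }

    public static void main(String[] args) {
        int offset = search("AAAAB", "AAAAAAAAAB");
        System.out.println("offset " + offset + " after " + compares + " compares");
    }
}
```

This prints offset 5 after 30 compares; after each mismatch the algorithm backs up in the text, which is exactly what KMP avoids.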

Knuth-Morris-Pratt (KMP) Substring Search

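The idea of the KMP algorithm is to eliminate the backup in the text that the brute-force algorithm performs after a mismatch, so that substring search runs in linear time, reading each of the N text characters only once. For example, suppose we need to find the pattern ABABC in the text ABABABC. When there is a mismatch at text position 4 (text character A against pattern character C), KMP already knows that the last three text characters read, ABA, match the beginning of the pattern, so it continues as if the pattern were aligned at text position 2, without rereading any text. To make this happen, KMP needs a preprocessing step over the pattern; one suitable form is a DFA (Deterministic Finite Automaton), whose construction takes time and space proportional to RM.
The graphical construction of the DFA for the pattern ABABC is illustrated in images 8 and 9.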

Image 8: Construction of DFA ABABC (1)

Image 9: Construction of DFA ABABC (2)
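
Since images 8 and 9 are not reproduced here, the sketch below builds the DFA for ABABC with the same construction as the KMP code that follows (R = 256) and prints only the rows for A, B, and C. Each column j is a state, the number of pattern characters matched so far; for example, dfa['A'][4] = 3 is the restart at text position 2 described above. The class and method names are illustrative.

```java
final class KmpDfaTable {
    // Builds dfa[c][j]: the next state after reading character c in state j (R = 256).
    static int[][] build(String pat) {
        int M = pat.length(), R = 256;
        int[][] dfa = new int[R][M];
        dfa[pat.charAt(0)][0] = 1;
        for (int X = 0, j = 1; j < M; j++) {
            for (int c = 0; c < R; c++)
                dfa[c][j] = dfa[c][X];           // mismatch: behave like restart state X
            dfa[pat.charAt(j)][j] = j + 1;       // match: advance to the next state
            X = dfa[pat.charAt(j)][X];           // update the restart state
        }
        return dfa;
    }

    // Runs the DFA over txt; returns the match offset, or N if the pattern is absent.
    static int search(int[][] dfa, int M, String txt) {
        int i, j, N = txt.length();
        for (i = 0, j = 0; i < N && j < M; i++)
            j = dfa[txt.charAt(i)][j];
        return j == M ? i - M : N;
    }

    public static void main(String[] args) {
        String pat = "ABABC";
        int[][] dfa = build(pat);
        for (char c : "ABC".toCharArray()) {
            StringBuilder row = new StringBuilder(c + ":");
            for (int j = 0; j < pat.length(); j++) row.append(' ').append(dfa[c][j]);
            System.out.println(row);
        }
        System.out.println("match at " + search(dfa, pat.length(), "ABABABC"));
    }
}
```

The program prints the rows A: 1 1 3 1 3, B: 0 2 0 4 0, and C: 0 0 0 0 5, followed by match at 2.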

An implementation of the KMP algorithm in Java looks like this:

import edu.princeton.cs.algs4.StdOut;

public class KMP
{
    private String pat;
    private int[][] dfa;

    public KMP(String pat)
    {   // Construct the DFA from the pattern (pattern and text characters must be < R)
        this.pat = pat;
        int M = pat.length();
        int R = 256;
        dfa = new int[R][M];
        dfa[pat.charAt(0)][0] = 1;
        for (int X = 0, j = 1; j < M; j++) {
            // Compute dfa[][j]
            for (int c = 0; c < R; c++)
                dfa[c][j] = dfa[c][X];      // copy mismatch cases from restart state X
            dfa[pat.charAt(j)][j] = j+1;    // set the match case
            X = dfa[pat.charAt(j)][X];      // update the restart state
        }
    }

    public int search(String txt)
    {   // Simulate the operation of the DFA on txt
        int i, j, N = txt.length(), M = pat.length();
        for (i = 0, j = 0; i < N && j < M; i++)
            j = dfa[txt.charAt(i)][j];
        if (j == M) return i - M;           // found: offset of the match
        else return N;                      // not found
    }

    /**
     * Will print:
     * % java KMP AACAA AABRAACADABRAACAADABRA
     * text:    AABRAACADABRAACAADABRA
     * pattern:             AACAA
     */
    public static void main(String[] args)
    {
        String pat = args[0];
        String txt = args[1];
        KMP kmp = new KMP(pat);
        StdOut.println("text:    " + txt);
        int offset = kmp.search(txt);
        StdOut.print("pattern: ");
        for (int i = 0; i < offset; i++)
            StdOut.print(" ");
        StdOut.println(pat);
    }
}

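In the worst case, KMP accesses at most N + M characters: M while building the DFA from the pattern and N while scanning the text (building the full DFA also takes time and space proportional to RM).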

Boyer-Moore Substring Search

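The Boyer-Moore algorithm uses heuristics to find mismatches, scanning the pattern from right to left at each alignment. Below are the steps of the Boyer-Moore mismatched character heuristic, where right[c] is the index of the rightmost occurrence of character c in the pattern (or -1 if c does not occur in it):
1. Let i = 0 be the alignment of the pattern in the text.
2. Let j = M - 1, the index of the last pattern character.
3. Compare pattern[j] with text[i+j].
4. If they do not match, set the skip s to j - right[text[i+j]] (but at least 1) and increment i by s. If i > N - M, the pattern does not occur in the text; otherwise go back to step 2.
5. Otherwise, decrement j by 1. If j < 0, the pattern occurs at position i; otherwise go back to step 3.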
The image below illustrates how the Boyer-Moore algorithm finds the pattern NEEDLE in the text FINDINAHAYSTACKNEEDLE.

Image 10: Example of the mismatched character heuristic

We can compute the skips quickly by precomputing, in time proportional to R + M, a table that tells where each character occurs in the pattern. Let's call that array right[], since it gives, for each character c, the index of the rightmost occurrence of c in the pattern. We allocate an array of size R, set every entry to -1 (for characters that do not occur in the pattern), and then, scanning the pattern from left to right, set right[pattern[j]] = j for j from 0 to M - 1, so that later occurrences overwrite earlier ones. The construction of the right[] array for the pattern NEEDLE is illustrated in image 11.

Image 11: Right array example
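
As a sketch of both the right[] table and the mismatched character heuristic, the code below builds right[] for NEEDLE and records every alignment i that the search tries on the text FINDINAHAYSTACKNEEDLE. The method names and the recorded alignments list are illustrative additions. The alignments are 0, 5, 11, and 15: at i = 0 the text character N occurs in the pattern at index 0, giving a skip of 5; at i = 5 the character S does not occur in the pattern, giving a skip of 6; at i = 11 the character N again gives a skip of 4; and at i = 15 the whole pattern matches.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class BoyerMooreTrace {
    // right[c] = index of the rightmost occurrence of c in pat, or -1 if c is absent.
    static int[] buildRight(String pat) {
        int[] right = new int[256];
        Arrays.fill(right, -1);
        for (int j = 0; j < pat.length(); j++)
            right[pat.charAt(j)] = j;
        return right;
    }

    // Mismatched character heuristic; adds each tried alignment to `alignments`.
    static int search(String pat, String txt, List<Integer> alignments) {
        int[] right = buildRight(pat);
        int M = pat.length(), N = txt.length(), skip;
        for (int i = 0; i <= N - M; i += skip) {
            alignments.add(i);
            skip = 0;
            for (int j = M - 1; j >= 0; j--) {
                if (pat.charAt(j) != txt.charAt(i + j)) {
                    skip = Math.max(1, j - right[txt.charAt(i + j)]);
                    break;
                }
            }
            if (skip == 0) return i;     // all M characters matched
        }
        return N;
    }

    public static void main(String[] args) {
        int[] right = buildRight("NEEDLE");
        for (char c : "NEDL".toCharArray()) System.out.println("right[" + c + "] = " + right[c]);
        List<Integer> alignments = new ArrayList<>();
        int offset = search("NEEDLE", "FINDINAHAYSTACKNEEDLE", alignments);
        System.out.println("alignments " + alignments + ", found at " + offset);
    }
}
```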

An implementation of the Boyer-Moore algorithm in Java looks like this:

import edu.princeton.cs.algs4.StdOut;

public class BoyerMoore
{
    private int[] right;
    private String pat;

    public BoyerMoore(String pat)
    {   // Compute the skip table
        this.pat = pat;
        int M = pat.length();
        int R = 256;
        right = new int[R];
        for (int c = 0; c < R; c++)
            right[c] = -1;                  // -1 for characters not in the pattern
        for (int j = 0; j < M; j++)
            right[pat.charAt(j)] = j;       // rightmost position of each character
    }

    public int search(String txt)
    {   // Search for the pattern in txt
        int N = txt.length();
        int M = pat.length();
        int skip;
        for (int i = 0; i <= N-M; i += skip) {
            skip = 0;
            for (int j = M-1; j >= 0; j--) {    // scan the pattern right to left
                if (pat.charAt(j) != txt.charAt(i+j)) {
                    skip = j - right[txt.charAt(i+j)];
                    if (skip < 1) skip = 1;
                    break;
                }
            }
            if (skip == 0) return i;        // found
        }
        return N;                           // not found
    }

    /**
     * Will print:
     * % java BoyerMoore AACAA AABRAACADABRAACAADABRA
     * text:    AABRAACADABRAACAADABRA
     * pattern:             AACAA
     */
    public static void main(String[] args)
    {
        String pat = args[0];
        String txt = args[1];
        BoyerMoore boyerMoore = new BoyerMoore(pat);
        StdOut.println("text:    " + txt);
        int offset = boyerMoore.search(txt);
        StdOut.print("pattern: ");
        for (int i = 0; i < offset; i++)
            StdOut.print(" ");
        StdOut.println(pat);
    }
}

An implementation of Boyer-Moore that uses only the mismatched character heuristic, like the code above, takes time proportional to NM in the worst case (for example, searching for BAAAAAAAAA in a text of all A's), although it typically examines only about N/M characters. A full implementation of Boyer-Moore provides a linear-time worst-case guarantee by adding a KMP-like table. Looking at image 10, this means choosing the skip value from a KMP-like array instead of relying only on the mismatched character.

Rabin-Karp Fingerprint Search

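The Rabin-Karp algorithm uses a hash function to encode the pattern and each M-character substring of the text: if a substring equals the pattern, their hash values are equal, and conversely, equal hash values indicate a match with high probability. We use Horner's method to compute modular hashes of M-character substrings efficiently. Treating each character t_i of the text as a base-R digit, the (unreduced) hash of the substring starting at text position i is: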
$$x_i = t_i R^{M-1} + t_{i+1} R^{M-2} + \cdots + t_{i+M-1} R^0$$

To avoid recomputing it from scratch, we can derive the hash of the next substring from the formula above: subtract the leftmost term, multiply the remainder by R (raising the order of every remaining term), and add the new last character:

$$x_{i+1} = (x_i - t_i R^{M-1})\,R + t_{i+M}$$
To avoid expensive arithmetic on huge numbers and to save space (a Java int holds values only up to 2^31 - 1), we keep only the remainder of each hash value modulo Q, applying the reduction after every arithmetic operation. This method is called modular hashing:
$$H(x_i) = x_i \bmod Q$$

where Q is a large prime number. This calculation is similar to the one used in hash tables, but here we do not store the hash values in a table; we only compare the hash value of each substring with that of the pattern. Choosing the right value of Q is not a trivial problem, but if we choose a sufficiently large random prime Q, the probability of a collision (equal hashes for different strings) is about 1/Q. Accepting a hash match without further checking is called the Monte Carlo approach: it is always fast but correct only with high probability. If we want to be defensive and ensure that every reported match is correct, we use the Las Vegas approach: after a hash match, we compare the substring with the pattern character by character, which requires backing up in the text. Rabin-Karp is illustrated graphically in the image below.

Image 12: Rabin-Karp implementation
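
The formulas above can be checked on a small decimal example (R = 10 and Q = 997, an illustrative choice far smaller than any practical Q). The sketch computes the hash of every 5-digit window of the text 3141592653589793 with the rolling update and compares it with Horner's method, then shows why the Las Vegas check matters: with Q = 997, the different string 25538 has the same hash as the pattern 26535, because 25538 = 26535 - 997. Digits are mapped to their numeric values; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class RollingHashDemo {
    static final int R = 10;                 // decimal digits
    static final long Q = 997;               // small prime, for illustration only

    static int digit(String s, int i) { return s.charAt(i) - '0'; }

    // Horner's method: hash of s[start..start+M-1] modulo Q.
    static long hash(String s, int start, int M) {
        long h = 0;
        for (int j = start; j < start + M; j++)
            h = (R * h + digit(s, j)) % Q;
        return h;
    }

    // Hashes of all M-digit windows of txt, computed with the rolling update.
    static List<Long> rollingHashes(String txt, int M) {
        List<Long> hashes = new ArrayList<>();
        if (txt.length() < M) return hashes;                 // no complete window
        long RM = 1;
        for (int i = 1; i <= M - 1; i++) RM = (R * RM) % Q;  // R^(M-1) mod Q
        long h = hash(txt, 0, M);
        hashes.add(h);
        for (int i = M; i < txt.length(); i++) {
            h = (h + Q - RM * digit(txt, i - M) % Q) % Q;    // remove the leading digit
            h = (h * R + digit(txt, i)) % Q;                 // add the trailing digit
            hashes.add(h);
        }
        return hashes;
    }

    // First window whose hash equals the pattern's; verify = true adds the Las Vegas check.
    static int search(String pat, String txt, boolean verify) {
        int M = pat.length();
        long patHash = hash(pat, 0, M);
        List<Long> hashes = rollingHashes(txt, M);
        for (int i = 0; i < hashes.size(); i++)
            if (hashes.get(i) == patHash && (!verify || txt.regionMatches(i, pat, 0, M)))
                return i;
        return txt.length();
    }

    public static void main(String[] args) {
        System.out.println(rollingHashes("3141592653589793", 5));
        System.out.println("Monte Carlo: " + search("26535", "3141592653589793", false)
                + ", Las Vegas: " + search("26535", "3141592653589793", true));
        System.out.println("Monte Carlo: " + search("26535", "25538", false)
                + ", Las Vegas: " + search("26535", "25538", true));
    }
}
```

The first seven window hashes are 508, 201, 715, 971, 442, 929, and 613, and the pattern hash is 613, so both approaches report the match at offset 6. On the text 25538, Monte Carlo wrongly reports a match at 0, while Las Vegas rejects it and returns 5 (not found).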

An implementation of the Rabin-Karp algorithm in Java looks like this:

import java.math.BigInteger;
import java.util.Random;

public class RabinKarp
{
    private String pat;     // the pattern, only needed for Las Vegas
    private long patHash;   // pattern hash value
    private int M;          // pattern length
    private long Q;         // a large prime modulus
    private int R = 256;    // radix
    private long RM;        // R^(M-1) % Q

    public RabinKarp(String pat)
    {
        this.pat = pat;
        this.M = pat.length();
        Q = longRandomPrime();
        RM = 1;
        for (int i = 1; i <= M-1; i++)      // R^(M-1) % Q, used to remove the leading digit
            RM = (R * RM) % Q;
        patHash = hash(pat, M);
    }

    // A random 31-bit prime, so that intermediate products fit in a long
    private static long longRandomPrime()
    {
        return BigInteger.probablePrime(31, new Random()).longValue();
    }

    // Monte Carlo: accept any hash match
    // public boolean check(int i)
    // {
    //     return true;
    // }

    // Las Vegas: confirm a hash match character by character
    public boolean check(String txt, int i)
    {
        for (int j = 0; j < M; j++)
            if (pat.charAt(j) != txt.charAt(i+j))
                return false;
        return true;
    }

    private long hash(String key, int M)
    {   // Horner's method, reducing modulo Q at every step
        long h = 0;
        for (int j = 0; j < M; j++)
            h = (R * h + key.charAt(j)) % Q;
        return h;
    }

    public int search(String txt)
    {
        int N = txt.length();
        if (N < M) return N;                // text shorter than the pattern: no match
        long txtHash = hash(txt, M);
        if (patHash == txtHash && check(txt, 0)) return 0;
        for (int i = M; i < N; i++) {
            // Remove the leading digit, add the trailing digit, and check for a match
            txtHash = (txtHash + Q - RM*txt.charAt(i-M) % Q) % Q;
            txtHash = (txtHash*R + txt.charAt(i)) % Q;

            int offset = i - M + 1;
            if (patHash == txtHash && check(txt, offset))
                return offset;              // match
        }
        return N;                           // no match found
    }
}

Rabin-Karp substring search is known as fingerprint search because it uses a small amount of
information to represent a pattern. Then it looks for this fingerprint (the hash value) in the text. The
algorithm is efficient because the fingerprints can be efficiently computed and compared.
Summary
Algorithm     Version                          Guarantee  Typical  Backup in input?  Correct?  Extra space
Brute force   -                                MN         1.1N     yes               yes       1
KMP           Full DFA                         2N         1.1N     no                yes       MR
KMP           Mismatch transitions only        3N         1.1N     no                yes       M
Boyer-Moore   Full algorithm                   3N         N/M      yes               yes       R
Boyer-Moore   Mismatched char heuristic only   MN         N/M      yes               yes       R
Rabin-Karp    Monte Carlo                      7N         7N       no                yes*      1
Rabin-Karp    Las Vegas                        7N*        7N       yes               yes       1

* Probabilistic guarantee, assuming a uniform hash function.
