Java Lect 15

This document discusses string comparison and distance metrics. It describes the Levenshtein distance technique for determining the minimum number of edits between two strings. This technique allows for substitutions, insertions, and deletions. It can be used for applications like spell checking. The document also provides an implementation of the Levenshtein algorithm using dynamic programming.

Uploaded by

Chander Kumar

NLP-AI

Java Lecture No. 15

31 Dec 2004

Contents

String Distance
String Comparison
Need in Spell Checker
Levenshtein Technique
Swapping


String Comparison
Accuracy measurement: compare the transcribed and intended strings and identify the errors.
Automated error tabulation is a tricky task.
Consider the following example:
transformation (intended text)
transxformaion (transcribed text)
A simple characterwise comparison gives 6 errors, but there are only 2: insertion of x and omission of t.
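The count of 6 comes from comparing the two strings position by position. A minimal Java sketch of that naive comparison follows; the class and method names are illustrative and do not come from the lecture.

```java
// Naive position-by-position comparison; names are illustrative, not from the lecture.
class CharwiseCompare {
    // Counts positions at which the two strings differ; any difference
    // in length is counted as additional errors.
    static int mismatches(String intended, String transcribed) {
        int common = Math.min(intended.length(), transcribed.length());
        int errors = Math.abs(intended.length() - transcribed.length());
        for (int i = 0; i < common; i++) {
            if (intended.charAt(i) != transcribed.charAt(i)) errors++;
        }
        return errors;
    }

    public static void main(String[] args) {
        // The inserted x shifts every later character until the omitted t
        // brings the two strings back into alignment.
        System.out.println(mismatches("transformation", "transxformaion")); // prints 6
    }
}
```

The naive count overstates the damage because one insertion misaligns everything after it, which is why an edit-based distance is needed.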


Need in Spell Checker


The difference between two strings is an important parameter for suggesting alternatives for typographical errors.
Example:
difference (game, game); //should be 0
difference (game, gme); //should be 1
difference (game, agme); //should be 2
Possible ways for correction (for last example):
1. delete a, insert a after g
2. insert g before a, delete the succeeding g
3. substitute g for a, substitute a for g
If the search in the vocabulary is unsuccessful, suggest alternatives.
Words are arranged in ascending order by string distance and then offered as suggestions (with constraints).
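One way the suggestion step could look in Java is sketched below. The vocabulary, the `maxDistance` constraint, and the names `Suggester`, `distance` and `suggest` are assumptions made for illustration; the distance itself uses the dynamic-programming edit distance that this lecture covers under the Levenshtein technique.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical suggestion helper; not the lecture's own code.
class Suggester {
    // Edit distance allowing substitution, insertion and deletion.
    static int distance(String s1, String s2) {
        int[][] d = new int[s1.length() + 1][s2.length() + 1];
        for (int i = 0; i <= s1.length(); i++) d[i][0] = i;
        for (int j = 0; j <= s2.length(); j++) d[0][j] = j;
        for (int i = 1; i <= s1.length(); i++) {
            for (int j = 1; j <= s2.length(); j++) {
                int cost = s1.charAt(i - 1) == s2.charAt(j - 1) ? 0 : 1;
                d[i][j] = Math.min(Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1),
                                   d[i - 1][j - 1] + cost);
            }
        }
        return d[s1.length()][s2.length()];
    }

    // Keeps vocabulary entries within maxDistance of the word (the assumed
    // constraint) and orders them by ascending distance.
    static List<String> suggest(String word, List<String> vocabulary, int maxDistance) {
        List<String> candidates = new ArrayList<>();
        for (String entry : vocabulary) {
            if (distance(word, entry) <= maxDistance) candidates.add(entry);
        }
        candidates.sort(Comparator.comparingInt(entry -> distance(word, entry)));
        return candidates;
    }

    public static void main(String[] args) {
        // A made-up vocabulary; "gme" is not in it, so alternatives are offered.
        List<String> vocabulary = List.of("gate", "name", "game", "same");
        System.out.println(suggest("gme", vocabulary, 2)); // "game" (distance 1) comes first
    }
}
```

Recomputing the distance inside the comparator keeps the sketch short; a real spell checker would likely cache each distance once.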

String Distance

Definition: String distance between two strings, s1 and s2, is defined as the minimum number of point mutations required to change s1 into s2, where a point mutation is one of substitution, insertion, deletion.
Widely used methods to find out string distance:
1. Hamming String Distance: For strings of equal length (substitutions only)
2. Levenshtein String Distance: Also handles strings of unequal length
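For comparison with the Levenshtein method, a Hamming distance can be sketched in a few lines of Java. The class name and the choice to throw an exception on unequal lengths are assumptions of this sketch, not part of the lecture.

```java
// Hamming distance sketch; only defined here for strings of equal length.
class Hamming {
    static int distance(String s1, String s2) {
        if (s1.length() != s2.length()) {
            // Assumed behaviour: reject inputs the method is not defined for.
            throw new IllegalArgumentException("Hamming distance needs strings of equal length");
        }
        int d = 0;
        for (int i = 0; i < s1.length(); i++) {
            if (s1.charAt(i) != s2.charAt(i)) d++; // each differing position is one substitution
        }
        return d;
    }

    public static void main(String[] args) {
        System.out.println(distance("game", "agme")); // prints 2
        System.out.println(distance("game", "gate")); // prints 1
    }
}
```

Because it counts substitutions only, this method cannot compare game with gme; that case needs insertions and deletions.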


Levenshtein Technique


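Levenshtein String Distance: Implementation
int equal (char x, char y) {
    if (x == y) return 0; // equal operator: matching characters cost nothing
    else return 1;        // differing characters cost one substitution
}
int Lev (String s1, String s2) {
    int[][] D = new int[s1.length() + 1][s2.length() + 1];
    for (int i = 0; i <= s1.length(); i++) D[i][0] = i; // Initializing first column
    for (int j = 0; j <= s2.length(); j++) D[0][j] = j; // Initializing first row
    for (int i = 1; i <= s1.length(); i++) {
        for (int j = 1; j <= s2.length(); j++) {
            // charAt is 0-based, so row i and column j refer to characters i-1 and j-1
            D[i][j] = Math.min(Math.min(D[i-1][j] + 1,  // deletion
                                        D[i][j-1] + 1), // insertion
                               equal(s1.charAt(i-1), s2.charAt(j-1)) + D[i-1][j-1]);
        }
    }
    return D[s1.length()][s2.length()]; // distance between the full strings
}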
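To check the recurrence against the examples used earlier in the lecture, a self-contained Java sketch is given below. It fills the same table D and reads the distance from the bottom-right cell; the class name `LevenshteinDemo` and the split into `table` and `distance` are choices made for this sketch.

```java
import java.util.Arrays;

// Self-contained check of the dynamic-programming recurrence; names are illustrative.
class LevenshteinDemo {
    // Builds the full (s1.length()+1) x (s2.length()+1) table of prefix distances.
    static int[][] table(String s1, String s2) {
        int[][] d = new int[s1.length() + 1][s2.length() + 1];
        for (int i = 0; i <= s1.length(); i++) d[i][0] = i; // delete i characters
        for (int j = 0; j <= s2.length(); j++) d[0][j] = j; // insert j characters
        for (int i = 1; i <= s1.length(); i++) {
            for (int j = 1; j <= s2.length(); j++) {
                int cost = s1.charAt(i - 1) == s2.charAt(j - 1) ? 0 : 1;
                d[i][j] = Math.min(Math.min(d[i - 1][j] + 1,   // deletion
                                            d[i][j - 1] + 1),  // insertion
                                   d[i - 1][j - 1] + cost);    // match or substitution
            }
        }
        return d;
    }

    static int distance(String s1, String s2) {
        return table(s1, s2)[s1.length()][s2.length()];
    }

    public static void main(String[] args) {
        System.out.println(distance("game", "game")); // prints 0
        System.out.println(distance("game", "gme"));  // prints 1
        System.out.println(distance("game", "agme")); // prints 2
        System.out.println(distance("transformation", "transxformaion")); // prints 2
        // Print the table for one pair to see how the distances build up.
        for (int[] row : table("game", "agme")) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```

The table needs time and memory proportional to the product of the two string lengths.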

Levenshtein String Distance: Applications

Spell checking
Speech recognition
DNA analysis
Plagiarism detection


Swapping
Swapping is an important technique
in most of the sorting algorithms.

int a = 242, b = 215, temp;
temp = a;   // temp = 242
a = b;      // a = 215
b = temp;   // b = 242

swap.java
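In a sorting routine the same three assignments are usually applied to two positions of an array. Java passes primitives by value, so a helper that took two int parameters could not change the caller's variables; the sketch below swaps array elements instead. The names `SwapDemo` and `swap` are illustrative.

```java
import java.util.Arrays;

// Swapping two array elements with a temporary variable; names are illustrative.
class SwapDemo {
    static void swap(int[] values, int i, int j) {
        int temp = values[i];  // remember the first element
        values[i] = values[j]; // overwrite it with the second
        values[j] = temp;      // put the remembered value in the second slot
    }

    public static void main(String[] args) {
        int[] values = {242, 215};
        swap(values, 0, 1);
        System.out.println(Arrays.toString(values)); // prints [215, 242]
    }
}
```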


Bubble Sort
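Initial elements : 4 2 5 1 9 3 8 7 6
iteration :
[1] 4 2 5 1 9 3 8 7 6   (4 > 2: swap)
    2 4 5 1 9 3 8 7 6
[2] 2 4 5 1 9 3 8 7 6   (4 < 5: no swap)
[3] 2 4 5 1 9 3 8 7 6   (5 > 1: swap)
    2 4 1 5 9 3 8 7 6
[4] 2 4 1 5 9 3 8 7 6   (5 < 9: no swap)
[5] 2 4 1 5 9 3 8 7 6   (9 > 3: swap)
    2 4 1 5 3 9 8 7 6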
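The trace above shows the first comparisons of the first pass. A minimal Java sketch of the whole sort follows; the early exit when a pass makes no swap is a common refinement added here, not something shown on the slide, and the names are illustrative.

```java
import java.util.Arrays;

// Bubble sort sketch; the early-exit flag is an addition of this sketch.
class BubbleSortDemo {
    static void bubbleSort(int[] values) {
        for (int pass = 0; pass < values.length - 1; pass++) {
            boolean swapped = false;
            // After each pass the largest remaining element has reached the end,
            // so the compared range shrinks by one.
            for (int i = 0; i < values.length - 1 - pass; i++) {
                if (values[i] > values[i + 1]) {
                    int temp = values[i]; // swap the adjacent pair
                    values[i] = values[i + 1];
                    values[i + 1] = temp;
                    swapped = true;
                }
            }
            if (!swapped) break; // no swap in a full pass: already sorted
        }
    }

    public static void main(String[] args) {
        int[] values = {4, 2, 5, 1, 9, 3, 8, 7, 6};
        bubbleSort(values);
        System.out.println(Arrays.toString(values)); // prints [1, 2, 3, 4, 5, 6, 7, 8, 9]
    }
}
```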

Assignments

Swap two integers without using an extra variable


Swap two strings without using an extra variable


References

http://www.merriampark.com/ld.htm
http://www.yorku.ca/mack/CHI01a.htm
http://www.csse.monash.edu.au/~lloyd/tildeAlgDS/Dynamic/edit


End

Thank You!
Wish You a Very Happy New Year..
Yahoo!
