Semester Final Project Report
Group Members:
Hafiz Muhammad Anas
Zain Ul Abideen
3.1 Pseudo-code:
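n = T.length
m = P.length
h = d^(m-1) mod q
p = 0
t_0 = 0
for i = 1 to m
    p = (d*p + P[i]) mod q
    t_0 = (d*t_0 + T[i]) mod q
for s = 0 to n - m
    if p = t_s
        if P[1..m] = T[s+1..s+m]
            print "pattern found at position" s
    if s < n - m
        t_(s+1) = (d*(t_s - T[s+1]*h) + T[s+m+1]) mod q
Here T is the text, P is the pattern, d is the radix, q is a prime modulus, p is the hash of the
pattern, and t_s is the hash of the text window that starts at shift s.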
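The rolling update in the last line of the pseudo-code can be checked numerically. The following is a minimal sketch in Python, assuming d = 10 and q = 13 as in the implementation in this report; the helper names window_hash and roll are introduced here for illustration only. It compares the hash obtained by the constant-time update with the hash recomputed from scratch for every window.

```python
D = 10   # radix, assumed to match d in this report
Q = 13   # prime modulus, assumed to match q in this report


def window_hash(s):
    """Hash a whole window from scratch with Horner's rule."""
    value = 0
    for ch in s:
        value = (D * value + ord(ch)) % Q
    return value


def roll(prev_hash, outgoing, incoming, h):
    """Slide the window one character to the right in O(1) time."""
    return (D * (prev_hash - ord(outgoing) * h) + ord(incoming)) % Q


text = "ABCCDDAEFG"
m = 3
h = pow(D, m - 1, Q)  # d^(m-1) mod q

# Hash of the first window, then one rolling update per shift.
rolled = [window_hash(text[:m])]
for s in range(len(text) - m):
    rolled.append(roll(rolled[-1], text[s], text[s + m], h))

# Reference values: every window hashed independently.
recomputed = [window_hash(text[s:s + m]) for s in range(len(text) - m + 1)]
print(rolled == recomputed)
```

The two lists agree, which is what allows each shift to cost O(1) instead of O(m) hashing work.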
Implementation
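# Rabin-Karp algorithm
d = 10  # radix (base) used by the rolling hash


def search(pattern, text, q):
    m = len(pattern)
    n = len(text)
    if m == 0 or m > n:
        return  # nothing to search for
    p = 0  # hash value of the pattern
    t = 0  # hash value of the current text window
    h = 1

    # h = d^(m-1) mod q, the weight of the leading character of a window
    for i in range(m - 1):
        h = (h * d) % q

    # Calculate hash value for pattern and for the first window of text
    for i in range(m):
        p = (d * p + ord(pattern[i])) % q
        t = (d * t + ord(text[i])) % q

    # Find the match
    for i in range(n - m + 1):
        # Equal hashes may be a collision, so confirm character by character
        if p == t and text[i:i + m] == pattern:
            print("Pattern is found at position: " + str(i + 1))

        # Roll the hash forward to the next window; Python's % operator
        # already returns a non-negative result for a positive q
        if i < n - m:
            t = (d * (t - ord(text[i]) * h) + ord(text[i + m])) % q


text = "ABCCDDAEFG"
pattern = "CDD"
q = 13
search(pattern, text, q)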
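Equal hash values do not guarantee equal strings, which is why the implementation above compares characters after the p == t test. The sketch below is a hypothetical variant written for illustration (the name rabin_karp_stats and the 0-based positions are assumptions, not part of the original program) that returns the verified matches together with the number of spurious hits, that is, windows whose hash equals the pattern hash although the strings differ.

```python
def rabin_karp_stats(pattern, text, q, d=10):
    """Return (match_positions, spurious_hits); positions are 0-based."""
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        return [], 0
    h = pow(d, m - 1, q)  # d^(m-1) mod q
    p = t = 0
    for i in range(m):
        p = (d * p + ord(pattern[i])) % q
        t = (d * t + ord(text[i])) % q
    matches, spurious = [], 0
    for i in range(n - m + 1):
        if p == t:
            if text[i:i + m] == pattern:
                matches.append(i)   # hash hit confirmed by comparison
            else:
                spurious += 1       # hash collision only
        if i < n - m:
            t = (d * (t - ord(text[i]) * h) + ord(text[i + m])) % q
    return matches, spurious


print(rabin_karp_stats("CDD", "ABCCDDAEFG", 13))
print(rabin_karp_stats("CDD", "ABCCDDAEFG", 1_000_003))
```

With q = 13 the window "ABC" hashes to the same value as "CDD", so one spurious hit is reported alongside the real match; with the much larger modulus 1,000,003 the same input produces no spurious hits. A larger prime q therefore reduces wasted comparisons, at the cost of larger intermediate values.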
4.1 Pseudo-code:
(a) COMPUTE-PREFIX-FUNCTION (P)
m ← length [P]
Π [1] ← 0
k ← 0
for q ← 2 to m
    do while k > 0 and P [k + 1] ≠ P [q]
        do k ← Π [k]
    if P [k + 1] = P [q]
        then k ← k + 1
    Π [q] ← k
return Π
(b) KMP-MATCHER (T, P)
n ← length [T]
m ← length [P]
Π ← COMPUTE-PREFIX-FUNCTION (P)
q ← 0
for i ← 1 to n
    do while q > 0 and P [q + 1] ≠ T [i]
        do q ← Π [q]
    if P [q + 1] = T [i]
        then q ← q + 1
    if q = m
        then print "Pattern occurs with shift" i - m
            q ← Π [q]
Implementation
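A minimal Python sketch of the two procedures in 4.1 is given here. It is an illustrative translation rather than a definitive implementation: it assumes 0-based indexing (so pi[q - 1] plays the role of Π[q] in the pseudo-code), and the names compute_prefix_function and kmp_search, as well as the choice to return a list of shifts instead of printing them, are assumptions made for this example.

```python
def compute_prefix_function(pattern):
    """pi[q] is the length of the longest proper prefix of
    pattern[:q + 1] that is also a suffix of it (0-based table)."""
    m = len(pattern)
    pi = [0] * m
    k = 0  # length of the currently matched prefix
    for q in range(1, m):
        while k > 0 and pattern[k] != pattern[q]:
            k = pi[k - 1]  # fall back to the next shorter border
        if pattern[k] == pattern[q]:
            k += 1
        pi[q] = k
    return pi


def kmp_search(text, pattern):
    """Return the 0-based shifts at which pattern occurs in text."""
    n, m = len(text), len(pattern)
    if m == 0:
        return []
    pi = compute_prefix_function(pattern)
    shifts = []
    q = 0  # number of pattern characters matched so far
    for i in range(n):
        while q > 0 and pattern[q] != text[i]:
            q = pi[q - 1]  # reuse what is already known to match
        if pattern[q] == text[i]:
            q += 1
        if q == m:
            shifts.append(i - m + 1)
            q = pi[q - 1]  # keep going to find overlapping matches
    return shifts


print(compute_prefix_function("ababaca"))
print(kmp_search("ABCCDDAEFG", "CDD"))
```

For the pattern "ababaca" the table is [0, 0, 1, 2, 3, 0, 1], and for the text and pattern used in the Rabin-Karp section the search returns [3], which is the same occurrence that program reports as position 4 in 1-based counting.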
6. Conclusion:
A key advantage of KMP is that its worst-case running time is guaranteed. Preprocessing the
pattern always takes O(m) time and searching the text always takes O(n) time, where m is the
pattern length and n is the text length, so no input can degrade its performance. Knuth, Morris,
and Pratt thus offer a linear-time solution to the string-matching problem: by never re-comparing
text characters that have already been matched against the pattern P, the algorithm attains a
matching time of O(n).
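The linear bound can also be observed by counting character comparisons. The sketch below is an illustration under stated assumptions: it uses a restructured KMP loop that makes exactly one text-versus-pattern comparison per iteration (a variant chosen to make counting easy, not the loop from 4.1), it assumes a non-empty pattern, and the function names are hypothetical. It compares KMP with a brute-force matcher on an input that is unfavourable for brute force.

```python
def prefix_table(pattern):
    """0-based prefix function of the pattern."""
    pi = [0] * len(pattern)
    k = 0
    for q in range(1, len(pattern)):
        while k > 0 and pattern[k] != pattern[q]:
            k = pi[k - 1]
        if pattern[k] == pattern[q]:
            k += 1
        pi[q] = k
    return pi


def kmp_comparisons(text, pattern):
    """Count text-versus-pattern comparisons made by KMP."""
    pi = prefix_table(pattern)
    n, m = len(text), len(pattern)
    i = q = comparisons = 0
    while i < n:
        comparisons += 1
        if text[i] == pattern[q]:
            i += 1
            q += 1
            if q == m:
                q = pi[q - 1]  # full match found, continue searching
        elif q > 0:
            q = pi[q - 1]      # shift the pattern, keep i in place
        else:
            i += 1
    return comparisons


def naive_comparisons(text, pattern):
    """Count comparisons made by the brute-force matcher."""
    n, m = len(text), len(pattern)
    comparisons = 0
    for s in range(n - m + 1):
        for j in range(m):
            comparisons += 1
            if text[s + j] != pattern[j]:
                break
    return comparisons


text = "a" * 1000
pattern = "a" * 9 + "b"
print(kmp_comparisons(text, pattern), "<=", 2 * len(text))
print(naive_comparisons(text, pattern))
```

In this variant every iteration either advances i or decreases q, so the number of comparisons cannot exceed 2n, whereas the brute-force count grows with the product of the text and pattern lengths on inputs like this one.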
7. Future Work:
For the most part, accurate plagiarism detection is currently limited to text content. Three
sectors that could benefit from a similar plagiarism-checking product are art, music, and video. All
of these industries currently have a number of outstanding legal cases and disputes over
plagiarized content. With the right technology, plagiarism in all of these fields could be
minimized through detection and checking prior to distribution.
8. References:
1. https://www.cs.auckland.ac.nz/courses/compsci369s1c/lectures/GG-notes/CS369-StringAlgs.pdf
2. https://www.researchgate.net/publication/335319583_Plagiarism_Detection_Software_an_Overview
3. https://www.javatpoint.com/daa-knuth-morris-pratt-algorithm
4. https://www.cs.ubc.ca/labs/algorithms/Courses/CPSC445 08/Handouts/kmp.pdf
5. https://www.researchgate.net/publication/311205690_Overview_of_Different_Plagiarism_Detection_Tools
6. https://web.stanford.edu/class/cs97si/10-string-algorithms.pdf
7. https://www.codespeedy.com/knuth-morris-pratt-kmp-algorithm-in-c/