DAA Mini Project
Nashik
Mini-Project Report
1.0 Rationale:
In this mini project, we have implemented two popular algorithms for string matching: the Naive String
Matching Algorithm and the Rabin-Karp Algorithm. String matching is a fundamental problem in computer
science, and it has a wide range of applications, such as in search engines, DNA sequence analysis, and text
editing software. The Naive String Matching Algorithm is a simple brute-force method, while the Rabin-Karp
Algorithm uses hashing to make pattern matching more efficient.
The Rabin-Karp Algorithm improves on the naive approach by comparing hash values first. Instead of comparing
the pattern with every substring directly, it computes a hash for the pattern and for each substring of the text,
and only compares the actual strings if the hash values match.
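The hash-then-verify idea described above can be sketched in a few lines. This is an illustration only, not the project's code: the helper names `poly_hash` and `hash_filtered_matches`, the base `d = 256`, and the modulus `q = 101` are assumptions chosen for the example.

```python
def poly_hash(s, d=256, q=101):
    """Polynomial hash of s; d and q are assumed illustration values."""
    h = 0
    for ch in s:
        h = (d * h + ord(ch)) % q
    return h


def hash_filtered_matches(text, pattern):
    """Return match indices, comparing strings only when the hashes agree."""
    m = len(pattern)
    target = poly_hash(pattern)
    matches = []
    for i in range(len(text) - m + 1):
        window = text[i:i + m]
        # Cheap integer comparison first; full comparison only on a hash hit
        if poly_hash(window) == target and window == pattern:
            matches.append(i)
    return matches


print(hash_filtered_matches("abracadabra", "abra"))  # [0, 7]
```

This sketch recomputes every window's hash from scratch, so by itself it saves no time. The speed-up in Rabin-Karp comes from updating the hash incrementally (a rolling hash) as the window slides.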
5.0 Actual Methodology followed:
1. Start: Identify the problem of string matching and decide on two algorithms: Naive and Rabin-
Karp.
2. Algorithm Implementation:
a. Implement the Naive String Matching Algorithm, iterating through the text and
comparing substrings.
b. Implement the Rabin-Karp Algorithm, using a rolling hash function for efficient pattern
matching.
3. Testing: Test both algorithms on various text inputs to observe their performance with different
4. Time Complexity Analysis: Measure the time taken by both algorithms and analyze their time
complexity.
5. Edge Case Handling: Ensure the algorithms work correctly when the pattern is not found, or
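The testing, timing, and edge-case steps listed above could be exercised with a small harness like the following. This is a minimal sketch, not the project's actual test code: the function names `naive_search`, `rabin_karp_search`, and `time_search`, the values `d = 256` and `q = 101`, and the sample inputs are all assumptions made for illustration. Both searches return a list of match indices so their results can be compared directly, and an empty or over-long pattern is assumed to have no matches.

```python
import time


def naive_search(text, pattern):
    """Return every index where pattern occurs in text (brute force)."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []  # assumption: empty or over-long patterns never match
    return [i for i in range(n - m + 1) if text[i:i + m] == pattern]


def rabin_karp_search(text, pattern, d=256, q=101):
    """Return every index where pattern occurs, using a rolling hash.

    d (base) and q (prime modulus) are assumed illustration values.
    """
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    high = pow(d, m - 1, q)  # weight of the window's leading character
    p = t = 0
    for i in range(m):
        p = (d * p + ord(pattern[i])) % q
        t = (d * t + ord(text[i])) % q
    matches = []
    for i in range(n - m + 1):
        # Verify the characters only when the hash values agree
        if p == t and text[i:i + m] == pattern:
            matches.append(i)
        if i < n - m:
            # Roll the hash: drop text[i], append text[i + m]
            t = (d * (t - ord(text[i]) * high) + ord(text[i + m])) % q
    return matches


def time_search(search, text, pattern, repeats=5):
    """Best wall-clock time in seconds over a few repeated runs."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        search(text, pattern)
        best = min(best, time.perf_counter() - start)
    return best


# Hypothetical inputs: a normal hit, a repetitive text, a miss, an over-long pattern
cases = [
    ("abracadabra", "abra"),
    ("a" * 5000 + "b", "a" * 50 + "b"),
    ("hello world", "xyz"),
    ("short", "a much longer pattern"),
]
for text, pattern in cases:
    expected = naive_search(text, pattern)
    assert rabin_karp_search(text, pattern) == expected
    t_naive = time_search(naive_search, text, pattern)
    t_rk = time_search(rabin_karp_search, text, pattern)
    print(f"n={len(text)} m={len(pattern)} matches={expected} "
          f"naive={t_naive:.6f}s rabin_karp={t_rk:.6f}s")
```

Measured times depend on the machine and the interpreter. In CPython the slice comparison inside `naive_search` runs in optimized C, so on small inputs the naive version may well be the faster of the two despite its worse asymptotic bound.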
6.0 Actual Code of Program:
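def naive_search(text, pattern):
    n, m = len(text), len(pattern)
    # Loop over every position where the pattern can fit in the text
    for i in range(n - m + 1):
        # Check for a match between the text and the pattern
        if text[i:i + m] == pattern:
            print(f"Pattern found at index {i}")

def rabin_karp_search(text, pattern, d=256, q=101):  # d, q: assumed base and prime modulus
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return
    h = pow(d, m - 1, q)  # weight of the window's leading character
    p = t = 0
    # Compute the hash value of the pattern and the first window of text
    for i in range(m):
        p = (d * p + ord(pattern[i])) % q
        t = (d * t + ord(text[i])) % q
    for i in range(n - m + 1):
        if p == t:
            # If the hash values match, check the characters one by one
            if text[i:i + m] == pattern:
                print(f"Pattern found at index {i}")
        if i < n - m:
            # Roll the hash: drop text[i], append text[i + m]
            t = (d * (t - ord(text[i]) * h) + ord(text[i + m])) % q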
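The rolling update that carries the window hash `t` from position i to position i + 1 can be written compactly. This is the standard Rabin-Karp formulation rather than something stated in the report; `h` is an assumed name for the weight of the window's leading character, and text[i] stands for the character code at index i.

```latex
h = d^{\,m-1} \bmod q, \qquad
t_{i+1} = \bigl( d \,( t_i - \mathrm{text}[i] \cdot h ) + \mathrm{text}[i+m] \bigr) \bmod q
```

Subtracting text[i] · h removes the outgoing character's contribution, multiplying by d shifts the remaining characters one place, and adding text[i+m] brings in the incoming character, so each update costs O(1) regardless of m.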
Explanation of Differences
1. Naive Algorithm:
Time Complexity: O(n * m), where n is the length of the text and m is the length of the pattern.
It checks each possible substring of the text by comparing it character by character with the
pattern.
This is simple but can be inefficient for larger texts and patterns, because the running time
grows with the product of the two lengths.
2. Rabin-Karp Algorithm:
Time Complexity: O(n + m) on average and O(n * m) in the worst case (when many hash collisions
force full string comparisons).
It uses a rolling hash to update each window's hash in constant time and compares hash values
first. Only if the hashes match does it compare the actual strings.
This can be much faster than the naive method when collisions are rare, because most windows
are rejected by a single integer comparison.
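The two complexity claims above can be checked by counting work instead of timing it. The sketch below is illustrative only: `naive_comparisons` and `rabin_karp_stats` are hypothetical helper names, and the base `d = 256` and the moduli passed as `q` are assumed values. The first function counts character comparisons made by the naive method. The second counts spurious hits, meaning windows whose hash equals the pattern's hash although the strings differ.

```python
def naive_comparisons(text, pattern):
    """Return (match indices, number of character comparisons)."""
    n, m = len(text), len(pattern)
    comparisons = 0
    matches = []
    for i in range(n - m + 1):
        j = 0
        while j < m:
            comparisons += 1
            if text[i + j] != pattern[j]:
                break
            j += 1
        if j == m:
            matches.append(i)
    return matches, comparisons


def rabin_karp_stats(text, pattern, d=256, q=101):
    """Return (match indices, number of spurious hash hits)."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return [], 0
    high = pow(d, m - 1, q)  # weight of the window's leading character
    p = t = 0
    for i in range(m):
        p = (d * p + ord(pattern[i])) % q
        t = (d * t + ord(text[i])) % q
    matches, spurious = [], 0
    for i in range(n - m + 1):
        if p == t:
            if text[i:i + m] == pattern:
                matches.append(i)
            else:
                spurious += 1  # hashes agree, strings differ
        if i < n - m:
            t = (d * (t - ord(text[i]) * high) + ord(text[i + m])) % q
    return matches, spurious


# Naive worst case: every window matches until its last character
print(naive_comparisons("a" * 20, "a" * 4 + "b"))  # ([], 80), i.e. (n - m + 1) * m

# q = 1 is a deliberately degenerate modulus: every hash is 0, so all windows collide
print(rabin_karp_stats("abcdefghij", "xyz", q=1))              # ([], 8)
# A modulus larger than 256**3 cannot collide on 3-character windows
print(rabin_karp_stats("abcdefghij", "xyz", q=1_000_000_007))  # ([], 0)
```

With the degenerate modulus every window triggers a full string comparison, which is the O(n * m) worst case. With a large prime modulus the hash check alone rejects every non-matching window in this example.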
Applications of String Matching
1. Search engines: To find occurrences of specific words or phrases within a large body of text.
2. Text editors: To implement the "Find and Replace" functionality efficiently.
3. Plagiarism detection systems: To compare large sets of text for similarities.
4. DNA sequence analysis: To match specific patterns of nucleotides in biological data.