
Harshit Sachan

Anagram Substring Search using Python


Given a string s and a pattern pat, write a function that accepts both strings and reports
every index in s at which an anagram of pat begins.

Approach 1:

The brute force approach is to check for an anagram of the given pattern at every possible
index i, treating i as the position in the string where the anagram would start.
At each index i, you compare the character frequencies of the substring of s that starts at i
and has the same length as pat with the character frequencies of pat.
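
The frequency comparison described above can be sketched directly with collections.Counter from the standard library. This is only an illustration, not the article's own code: the function name find_anagrams_by_count is hypothetical, and it returns the matching indexes as a list instead of printing them.

```python
from collections import Counter

def find_anagrams_by_count(pat, txt):
    """Return every index i where txt[i:i+len(pat)] is an anagram of pat.

    Hypothetical helper for illustration; brute force, one Counter per window.
    """
    m = len(pat)
    pat_count = Counter(pat)  # character frequencies of the pattern
    matches = []
    # range() is empty when pat is longer than txt, so no guard is needed
    for i in range(len(txt) - m + 1):
        # build the frequency table of the window starting at i and compare
        if Counter(txt[i:i + m]) == pat_count:
            matches.append(i)
    return matches

print(find_anagrams_by_count("ABCD", "BACDGABCDA"))  # [0, 5, 6]
```

Building a new Counter for every window costs O(m) per index, so this sketch is still a brute force solution.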

Algorithm:
● Consider the input txt[] = "BACDGABCDA" and pat[] = "ABCD".
● Occurrences of pat[] and its permutations are found at indexes 0, 5 and 6.
● The matching permutations are BACD, ABCD and BCDA.
● Sort pat[] and each of these permutations found in txt[].
● After sorting, pat[] becomes ABCD, and the permutations found in txt[] become
ABCD, ABCD and ABCD.
● So the sorted version of pat[] and the sorted version of any of its permutations
are identical, and a window of txt[] matches exactly when its sorted form equals the sorted pat[].
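
As a quick check of the sorting idea, the short sketch below compares the sorted form of a few windows with the sorted pattern. The helper name same_letters is hypothetical, and the window ACDG is added here only as a non-matching example.

```python
def same_letters(a, b):
    # Hypothetical helper: True when a and b contain the same characters
    # the same number of times, i.e. when they are anagrams of each other.
    return sorted(a) == sorted(b)

pat = "ABCD"
for window in ["BACD", "ABCD", "BCDA", "ACDG"]:
    # the first three windows sort to the same list as pat, the last does not
    print(window, sorted(window), same_letters(window, pat))
```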

Code:

def search(pat, txt):
    # lengths of the strings txt and pat
    n = len(txt)
    m = len(pat)

    # sortedpat stores the characters of pat in sorted order
    sortedpat = sorted(pat)

    for i in range(n - m + 1):
        # temp stores the sorted characters of the substring
        # of txt that starts at i and has the same length as pat
        temp = sorted(txt[i:i + m])

        # checking whether sorted versions are equal or not
        if sortedpat == temp:
            print("Found at Index", i)

# driver code
txt = "BACDGABCDA"
pat = "ABCD"
search(pat, txt)

Output:
Found at Index 0
Found at Index 5
Found at Index 6

Time Complexity:
O(mlogm) + O( (n-m+1)(m + mlogm + m) ), which simplifies to O( (n-m+1)*mlogm )

Sorting pat once takes O(mlogm). The for loop runs n-m+1 times. In each iteration we build
temp, which takes O(m) time, sort it, which takes O(mlogm) time, and compare the sorted pat
with the sorted substring, which takes O(m). So the loop costs O( (n-m+1)*(m+mlogm+m) ),
and the mlogm term dominates.

Space Complexity:
O(m) where m is the size of pattern

Approach 2:

The only difference between the first method and this one is that the matching done
for each window is optimized from O(mlog(m)) (sorting the window) to O(1). In this
approach, you need to use two count arrays:
1. The first count array stores the frequencies of the characters in the pattern.
2. The second count array stores the frequencies of the characters in the current window of
text.

The important thing to note is that the time complexity of comparing the two count arrays is
O(1), as the number of elements in them is fixed (independent of pattern and text sizes).
The steps of this algorithm are as follows.
1. Store the character frequencies of the pattern in the first count array countP[]. Also
store the character frequencies of the first window of text in the array countTW[].
2. Now run a loop from i = M to N-1. Do the following in the loop.
a. If the two count arrays are identical, we found an occurrence at index i-M.
b. Increment the count of the current character of text, txt[i], in countTW[].
c. Decrement the count of the first character of the previous window, txt[i-M], in countTW[].
3. The last window is not checked by the above loop, so explicitly check it.

Following is the implementation of the above algorithm:

# Python program to search for all
# anagrams of a pattern in a text

# Size of the count arrays; assumes every character code is below 256
MAX = 256

# This function returns True
# if contents of arr1[] and arr2[]
# are same, otherwise False.
def compare(arr1, arr2):
    for i in range(MAX):
        if arr1[i] != arr2[i]:
            return False
    return True

# This function searches for all
# permutations of pat[] in txt[]
def search(pat, txt):
    M = len(pat)
    N = len(txt)

    # No window of length M fits in a shorter text
    if M > N:
        return

    # countP[]: Store count of
    # all characters of pattern
    # countTW[]: Store count of
    # current window of text
    countP = [0] * MAX
    countTW = [0] * MAX

    for i in range(M):
        countP[ord(pat[i])] += 1
        countTW[ord(txt[i])] += 1

    # Traverse through remaining
    # characters of text
    for i in range(M, N):
        # Compare counts of current
        # window of text with
        # counts of pattern[]
        if compare(countP, countTW):
            print("Found at Index", (i - M))

        # Add current character to current window
        countTW[ord(txt[i])] += 1

        # Remove the first character of previous window
        countTW[ord(txt[i - M])] -= 1

    # Check for the last window in text
    if compare(countP, countTW):
        print("Found at Index", N - M)

# Driver program to test above function
txt = "BACDGABCDA"
pat = "ABCD"
search(pat, txt)

Output:
Found at Index 0
Found at Index 5
Found at Index 6
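
The problem statement asks for the occurrences to be returned, so a variant of the sliding window that collects the indexes in a list may be useful. The sketch below is an assumption-laden alternative, not the article's implementation: the name find_anagrams_sliding is hypothetical, and it swaps the fixed 256-entry arrays for collections.Counter so that characters with codes above 255 also work.

```python
from collections import Counter

def find_anagrams_sliding(pat, txt):
    """Return the start indexes of all anagrams of pat in txt.

    Hypothetical sliding-window variant that uses Counter instead of
    fixed-size count arrays.
    """
    m, n = len(pat), len(txt)
    if m == 0 or m > n:
        return []  # no valid window exists

    pat_count = Counter(pat)
    window = Counter(txt[:m])  # counts of the first window
    matches = [0] if window == pat_count else []

    for i in range(m, n):
        # add the character entering the window
        window[txt[i]] += 1
        # remove the character leaving the window
        left = txt[i - m]
        window[left] -= 1
        if window[left] == 0:
            # drop zero entries so equality does not depend on leftovers
            del window[left]
        # the window now covers txt[i-m+1 : i+1]
        if window == pat_count:
            matches.append(i - m + 1)
    return matches

print(find_anagrams_sliding("ABCD", "BACDGABCDA"))  # [0, 5, 6]
```

Comparing two Counter objects takes time proportional to the number of distinct characters they hold, so this variant trades the fixed 256-step comparison for one bounded by the pattern's distinct characters.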

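Time Complexity:
O(n) where n is the size of txt, because each window comparison takes a constant MAX = 256 steps

Space Complexity:
O(1) extra space, since the two count arrays have a fixed size of MAX = 256 regardless of the sizes of txt and pat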

Conclusion:
In this article we learned about two different approaches to solving the Anagram Substring
Search problem using Python. We discussed the algorithms and the time and space complexity
of each algorithm, as well as the pros and cons of both approaches.
