
GenMEx TOOL

(GENE MICROSATELLITE EXTRACTOR): IDENTIFICATION OF TANDEM REPEATS

K.V.S.R.P. Varma (1), Prof. Allam Apparao (2), E. Vamsidhar (3), P. Sankarrao (4), S. Ravikanth (5)
(1, 4, 5) Assistant Professor, Department of CSE, GITAM University, Visakhapatnam, India
(2) Vice Chancellor, JNTUK, Kakinada, India
(3) Lecturer, Department of IT, GITAM University, Visakhapatnam, India
(1) Corresponding author: [email protected]

Abstract--

The Human Genome Project has opened the way to solving biological problems in a much more sophisticated manner. Biological data is huge and growing at an ever faster rate, so a computational (in silico) approach is needed to analyze it. Pattern matching has emerged as a powerful tool for locating nucleotide or amino acid sequence patterns in genomic sequence databases. Although several pattern matching algorithms are available in the literature, their efficiency depends on how quickly and exactly they identify the pattern in the sequence. In this article a novel approach is proposed for finding tandem repeat patterns in a given sequence by combining a preprocessing method (PBFMCSP) with the pattern searching method TSW. PBFMCSP preprocesses the sequence string using the concepts of an inverted matrix and frequently occurring patterns. The frequently occurring patterns are then searched for in the input sequence string using the Two Sliding Windows method (TSW), in which the string is scanned from both ends at the same time. The search stops when the two windows converge.

Keywords: Tandem repeats, TSW, PBFMCSP.

INTRODUCTION
Computer science has been extending into many other fields, making work easier, simpler and faster. It deals with many types of research problems, and string matching is one of them [1,2,3]. String matching has a wide range of applications in areas such as search engines, speech recognition, data compression, information retrieval, computational biology, virus detection, network intrusion detection, and DNA, RNA and protein sequence searching. An important problem in the analysis of biological sequences is searching for similar sequences in the primary structure of related proteins or genes. Several methods have been proposed to solve this problem.

Pattern matching focuses on finding the occurrences of a particular pattern P of length m in a text T of length n. Both the pattern and the text are built over a finite alphabet Σ of size σ. Generally, pattern matching algorithms use a single window whose size equals the pattern length [4]. The Two Sliding Windows algorithm (TSW) [5] concentrates on both the pattern and the text and uses two windows, each the same size as the pattern. The first window is aligned with the left end of the text, while the second window is aligned with the right end of the text. Both windows slide at the same time (in parallel) over the text in the searching phase to locate the pattern. The windows slide towards each other until the first occurrence of the pattern from either side is found or they reach the middle of the text. If required, all occurrences of the pattern in the text can be found.

The goal of data mining, or knowledge discovery, is to use existing data to find new facts and to uncover previously unknown relationships efficiently, with minimum use of space and time [10]. Frequent itemset mining plays an essential role in many data mining tasks and applications, such as mining association rules, correlations, sequential patterns, classification and clustering. Frequent itemset construction has been a major research area over the years, and several algorithms have been proposed in the literature to address the problem of mining association rules.

Microsatellites, also known as simple sequence repeats (SSR), short tandem repeats (STR), or variable number tandem repeats (VNTR), are tandem repeats of nucleotide motifs of size 1-6 bp [6] found in every genome known so far. A microsatellite consists of a specific sequence of DNA bases or nucleotides which contains mono, di, tri, or tetra tandem repeats. For example, AAAAAAAAAAA would be referred to as (A)11, GTGTGTGTGTGT would be referred to as (GT)6, and ACTCACTCACTCACTC would be referred to as (ACTC)4. Their importance in genomes is well known. Microsatellites are associated with various disease genes, have been used as molecular markers in linkage analysis and DNA fingerprinting studies, and also seem to play an important role in genome evolution. Microsatellite instability has also been implicated in the induction of cancer [7]. Owing to their high mutability, microsatellites are thought to be one of the sources of genetic diversity [8]. In recent times, efforts have also been made to study the possible functional roles of microsatellites in giving rise to a certain amount of plasticity and in the evolution of genomes [9].
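The compact repeat notation used in the introduction, such as (A)11 or (GT)6, can be illustrated with a few lines of Python. This is only a sketch: the function name `compact_repeat` is hypothetical and not part of the GenMEx tool, and it assumes the input is a non-empty string made up of a whole number of copies of a single motif.

```python
def compact_repeat(seq: str) -> str:
    """Return the (motif)n notation for a sequence built from one repeated motif.

    Assumes seq is non-empty and is an exact whole number of motif copies;
    the shortest such motif is reported.
    """
    n = len(seq)
    for k in range(1, n + 1):
        # A motif of length k works only if k divides n and repeating it rebuilds seq.
        if n % k == 0 and seq[:k] * (n // k) == seq:
            return f"({seq[:k]}){n // k}"
    return f"({seq})1"  # only reached for the empty string


print(compact_repeat("A" * 11))             # (A)11
print(compact_repeat("GT" * 6))             # (GT)6
print(compact_repeat("ACTCACTCACTCACTC"))   # (ACTC)4
```

A sequence that is not an exact tandem repeat simply comes back as a single copy of itself, e.g. `(ATGC)1`.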
METHODOLOGY: PREPROCESSING PHASE
In the preprocessing phase, the input DNA sequence, which is a string, is first represented in the form of an inverted matrix. The frequencies of the various possible patterns are then found using the inverted matrix. For example, if the characters found in a string are A, T, G and C, then the patterns that can be formed from them are A, T, G, C, AA, AT, AG, AC, GG, GA, GT, GC, AAA, ATC, AGC and so on. After the frequencies are obtained, the frequent patterns are extracted by pruning the infrequent patterns using the Apriori algorithm. The preprocessing phase therefore includes three main steps:
1. Inverted matrix generation.
2. Finding the frequencies of the various patterns.
3. Pruning the infrequent patterns using the Apriori algorithm.

978-1-4244-5967-4/10/$26.00 2010 IEEE

Algorithm to generate inverted matrix:

INPUT: DNA sequence str
OUTPUT: two matrices, one[ ][ ] and two[ ][ ], one for each value in the pair
PROCESS:
/* find the distinct characters in the string and assign them indexes */
n = 0
FOR i = 0 WHILE str[i] != '\0'
    found = 0
    FOR j = 0 TO n-1
        IF str[i] == char[j]
            found = 1
        END IF
    END FOR
    IF found != 1
        char[n] = str[i]
        n++
    END IF
END FOR
/* enter values into the matrices (rows and columns are numbered from 0 here) */
FOR i = 0 TO n-1
    count[i] = 0
END FOR
FOR i = 0 TO length of str - 1
    id = index in char of the character at position i in str
    IF i == length of str - 1
        one[id][count[id]] = '$'
        two[id][count[id]] = '$'
        count[id]++
    ELSE
        a = count[id]
        next = index in char of the character at position i+1 in str
        one[id][a] = next
        count[id]++
        two[id][a] = count[next]
    END IF
END FOR

Step 1: Inverted matrix generation

The inverted matrix is a numerical representation of a string. The rows of the matrix represent the distinct characters present in the string and are indexed in the order in which they first appear in the string. For example, if the string is AGCTTAGGCT, then the first row belongs to character A, the second row to G, the third row to C and the fourth row to T, as they appear in that order in the string. A new column is added for each new occurrence of a character. The elements of the matrix give the position of the next character. Each element is a pair. The first value in the pair is the index of the character that occurs next in the sequence. The second value is the column in which that next character is represented as a pair.

Example: Consider the following sequence.

AGTCATAGGCCTATGAT

The rows are A (index 1), G (index 2), T (index 3) and C (index 4), so we create a matrix with these as rows. The matrix initially contains only one column, to enter the occurrence of the first character in the sequence. Initially the matrix therefore looks as follows.

              1
A - index 1
G - index 2
T - index 3
C - index 4

Now the character A has to be represented in the matrix. The index of the character next to A (i.e., G) is 2. The character G will be represented next in the first column (1). So the element for A is the pair 2,1.

              1
A - index 1   2,1
G - index 2
T - index 3
C - index 4

Similarly, the next character G is represented as 3,1 (3 is the index of the next character, T, and 1 is the column in which T will be represented next). Character T is represented as 4,1 (4 is the index of the next character, C, and 1 is the column in which C will be represented next). The character C is represented as 1,2 because the index of the next character, A, is 1 and that A will be represented in the second column. Finally, the matrix obtained is

        1     2     3     4     5
A - 1   2,1   3,2   2,2   3,4   3,5
G - 2   3,1   2,3   4,2   1,5
T - 3   4,1   1,3   1,4   2,4   $,$
C - 4   1,2   4,3   3,3

The last character in the string is represented as $,$, which marks the termination of the string.
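The construction described in Step 1 can be sketched in Python as follows. This is an illustrative reading of the text, not the authors' implementation: the function name `build_inverted_matrix` and the list-of-rows layout are assumptions, and row indexes and columns are 1-based to match the worked example.

```python
from typing import Dict, List, Tuple, Union

# A cell is (index of next character, column of next character), or ("$", "$") at the end.
Cell = Tuple[Union[int, str], Union[int, str]]


def build_inverted_matrix(seq: str) -> Tuple[Dict[str, int], List[List[Cell]]]:
    """Build the inverted matrix of seq as a list of rows (one row per distinct character)."""
    # Assign 1-based row indexes in order of first appearance.
    index: Dict[str, int] = {}
    for ch in seq:
        if ch not in index:
            index[ch] = len(index) + 1

    # column[i] is the 1-based occurrence number of seq[i] among equal characters.
    seen = {ch: 0 for ch in index}
    column: List[int] = []
    for ch in seq:
        seen[ch] += 1
        column.append(seen[ch])

    rows: List[List[Cell]] = [[] for _ in index]
    for i, ch in enumerate(seq):
        if i == len(seq) - 1:
            cell: Cell = ("$", "$")  # terminator for the last character
        else:
            cell = (index[seq[i + 1]], column[i + 1])
        rows[index[ch] - 1].append(cell)
    return index, rows


index, rows = build_inverted_matrix("AGTCATAGGCCTATGAT")
for ch, row in zip(index, rows):
    print(ch, row)
```

For the example sequence, the printed rows reproduce the final matrix of the worked example row by row.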

Step 2 - Finding frequencies of various patterns.
Two types of patterns are considered:
1. Single-character patterns: A, G, T, C.
2. Patterns of length 2 or more.

Single-character patterns: For single-character patterns the frequency is given by the column count of that particular character. From our previous example, the frequency of A is 5, G is 4, T is 5 and C is 3.

Patterns of length 2 or more: Here we traverse the inverted matrix to find the frequencies. For example, to find the frequency of the pattern ATG we start by finding AT. It is found 3 times in row 1 (row A), since the index 3 is present 3 times in that row. For the first occurrence of index 3 in row A, the value of the pair is 3,2, so we check the 3rd row, 2nd column of the matrix. The character found there is A, since the index is 1. The pattern present is therefore ATA and not ATG, so we do not count it. For the second occurrence of T in row A, that is 3,4, we check the character at the 3rd row, 4th column. The character is G, as the index is 2. So the pattern found is ATG, which is the pattern we are checking for, and it is counted towards the frequency. For the third occurrence of T in row A, the value of the pair is 3,5. We go to the 3rd row, 5th column and find $,$. An index of $ means that we have only AT and no third character is present. Hence the pattern is AT and is not counted. The frequency of the pattern ATG is therefore 1.

The total number of patterns of length 1, 2, 3, 4, 5 and 6 that can be formed from four characters is $4 + 4^2 + 4^3 + 4^4 + 4^5 + 4^6 = 5460$. For each of those patterns we find the frequency using the method described above. The algorithm supports patterns up to length 6.
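The traversal in the ATG example can be written out as a short Python sketch. The helper names `build_rows` and `pattern_frequency` are hypothetical, and the matrix layout (a list of rows of `(next index, next column)` pairs, 1-based) is an assumption chosen to mirror the worked example rather than the authors' data structure.

```python
def build_rows(seq):
    """Return (index, rows): 1-based character indexes and the inverted-matrix rows."""
    index, seen, column = {}, {}, []
    for ch in seq:
        index.setdefault(ch, len(index) + 1)
        seen[ch] = seen.get(ch, 0) + 1
        column.append(seen[ch])  # occurrence number of this character so far
    rows = [[] for _ in index]
    for i, ch in enumerate(seq):
        last = i == len(seq) - 1
        cell = ("$", "$") if last else (index[seq[i + 1]], column[i + 1])
        rows[index[ch] - 1].append(cell)
    return index, rows


def pattern_frequency(pattern, index, rows):
    """Count occurrences of pattern by following next-character pairs in the matrix."""
    if not pattern or any(ch not in index for ch in pattern):
        return 0
    count = 0
    # Every cell in the row of the first character is one occurrence of that character.
    for cell in rows[index[pattern[0]] - 1]:
        matched = True
        for ch in pattern[1:]:
            next_index, next_col = cell
            if next_index != index[ch]:  # also rejects the "$" terminator
                matched = False
                break
            cell = rows[next_index - 1][next_col - 1]  # jump to the next character's cell
        if matched:
            count += 1
    return count


index, rows = build_rows("AGTCATAGGCCTATGAT")
print(pattern_frequency("AT", index, rows))   # 3
print(pattern_frequency("ATG", index, rows))  # 1
```

A single-character pattern never enters the inner loop, so its frequency is just the length of its row, which matches the column-count rule given for single characters.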
ALGORITHM
INPUT: Inverted matrix
OUTPUT: frequencies of the patterns found in the given biological sequence
PROCESS:
/* the input characters are A, T, G, C; their indexes (1 to 4) are assigned in the inverted matrix */
/* each pattern is encoded as the digits of its character indexes, e.g. 132 for ATG */
count = 0
FOR i = 1 to 4
    p1 = i
    access the i-th row of the inverted matrix
    flag = number of elements in the i-th row
    frequency[count] = flag; count++
    FOR j = 1 to 4
        p2 = p1*10 + j
        flag = number of occurrences of index j in the i-th row
        frequency[count] = flag; count++
        FOR k = 1 to 4
            p3 = p2*10 + k
            flag = number of occurrences of index k in the (a,b)th elements reached from the matches of p2
            frequency[count] = flag; count++
            FOR l = 1 to 4
                p4 = p3*10 + l
                flag = number of occurrences of index l in the (a,b)th elements reached from the matches of p3
                frequency[count] = flag; count++
                FOR m = 1 to 4
                    p5 = p4*10 + m
                    flag = number of occurrences of index m in the (a,b)th elements reached from the matches of p4
                    frequency[count] = flag; count++
                    FOR n = 1 to 4
                        p6 = p5*10 + n
                        flag = number of occurrences of index n in the (a,b)th elements reached from the matches of p5
                        frequency[count] = flag; count++
                    END FOR
                END FOR
            END FOR
        END FOR
    END FOR
END FOR
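The six nested loops enumerate every pattern of length 1 to 6 over the four characters. As a cross-check, the same enumeration can be sketched in Python with `itertools.product`. Note that this sketch swaps in a direct scan of the sequence for the counting step instead of the inverted-matrix traversal; the function names are hypothetical.

```python
from itertools import product


def count_overlapping(seq, pattern):
    """Count (possibly overlapping) occurrences of pattern in seq by direct scanning."""
    return sum(
        1 for i in range(len(seq) - len(pattern) + 1) if seq.startswith(pattern, i)
    )


def all_pattern_frequencies(seq, alphabet="ATGC", max_len=6):
    """Map every pattern of length 1..max_len over alphabet to its frequency in seq."""
    freqs = {}
    for length in range(1, max_len + 1):
        for letters in product(alphabet, repeat=length):
            pattern = "".join(letters)
            freqs[pattern] = count_overlapping(seq, pattern)
    return freqs


freqs = all_pattern_frequencies("AGTCATAGGCCTATGAT")
print(len(freqs))     # 5460 patterns in total
print(freqs["ATG"])   # 1
```

Most of the 5460 patterns have frequency 0 for a short input, which is why the next step prunes them before any searching is done.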

The frequencies of the various patterns are entered into an array, and the corresponding patterns are passed to the next step.

Step 3: Pruning infrequent patterns using the Apriori algorithm
We take as input the minimum repeat values for patterns of length 1, 2, 3, 4, 5, 6 and so on. For example, if the minimum repeat value for di is given as 4, then all patterns of length 2 that are repeated 4 or more times, such as ATATATAT and GCGCGCGCGC, are found. If the minimum repeat value for tri is given as 5, all patterns like AGTAGTAGTAGTAGT are found. From these input values we compute the minimum threshold (support) values. If the values are given as

Mono = 5, Di = 3, Tri = 2, Tetra = 2, Penta = 3, Hexa = 2

then, with

$$\text{total} = 4 + 4^2 + 4^3 + 4^4 + 4^5 + 4^6,$$

the minimum support values for each of them are

Mono = 5*100/total
Di = 3*100/total
Tri = 2*100/total
Tetra = 2*100/total
Penta = 3*100/total
Hexa = 2*100/total

For each pattern we have already found the frequency. We find the support of each pattern using the formula

$$\text{Support} = \frac{\text{frequency of pattern} \times 100}{\text{total}}$$

Example: Let the input sequence be ATGCATATATATATAT. The frequency of AT is 7, so the support of AT is 7*100/total. The minimum threshold for di patterns is 3*100/total. The support is greater than the minimum threshold, so the pattern is considered a frequent pattern. Now consider the pattern GC. Its frequency is 1, so its support is 1*100/total. This is less than the minimum support value of di patterns, i.e., 3*100/total, so this pattern is pruned. In this way the infrequent patterns are pruned.
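For concreteness, the numbers in this example can be worked out, taking total as the number of possible patterns of length 1 to 6 over four characters (values rounded to four decimal places):

```latex
\begin{align*}
\text{total} &= 4 + 4^2 + 4^3 + 4^4 + 4^5 + 4^6 = 5460 \\
\text{threshold}_{\text{di}} &= \frac{3 \times 100}{5460} \approx 0.0549 \\
\text{Support}(\mathrm{AT}) &= \frac{7 \times 100}{5460} \approx 0.1282 > 0.0549 \quad \text{(kept)} \\
\text{Support}(\mathrm{GC}) &= \frac{1 \times 100}{5460} \approx 0.0183 < 0.0549 \quad \text{(pruned)}
\end{align*}
```

Because both sides are scaled by the same factor 100/total, the comparison is equivalent to comparing the raw frequency with the minimum repeat value (7 > 3 and 1 < 3).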

Algorithm to prune infrequent patterns

INPUT: minimum repeat numbers for di, tri, tetra, penta and hexa; the frequencies of all patterns, cp[ ]; and an array holding the lengths of the patterns, l[ ]
OUTPUT: frequent patterns
PROCESS:
total = 4 + 4^2 + 4^3 + 4^4 + 4^5 + 4^6
di_threshold = di*100/total
tri_threshold = tri*100/total
tetra_threshold = tetra*100/total
penta_threshold = penta*100/total
hexa_threshold = hexa*100/total
FOR i = 1; i < length of cp; i++
    support = cp[i]*100/total
    IF l[i] == 2
        IF support > di_threshold add pattern to frequent patterns
    ELSE IF l[i] == 3
        IF support > tri_threshold add pattern to frequent patterns
    ELSE IF l[i] == 4
        IF support > tetra_threshold add pattern to frequent patterns
    ELSE IF l[i] == 5
        IF support > penta_threshold add pattern to frequent patterns
    ELSE IF l[i] == 6
        IF support > hexa_threshold add pattern to frequent patterns
    END IF
END FOR
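The pruning step above can be sketched in Python as follows. The function name `prune_infrequent` and the dictionary-based inputs are assumptions made for illustration; the strict `>` comparison follows the pseudocode, and pattern lengths without a configured minimum repeat value are skipped.

```python
# Number of possible patterns of length 1..6 over a four-letter alphabet.
TOTAL = sum(4 ** k for k in range(1, 7))  # 5460


def prune_infrequent(frequencies, min_repeats):
    """Keep the patterns whose support exceeds the threshold for their length.

    frequencies: dict mapping pattern -> frequency in the sequence
    min_repeats: dict mapping pattern length -> minimum repeat value
    """
    frequent = []
    for pattern, freq in frequencies.items():
        length = len(pattern)
        if length not in min_repeats:
            continue  # no threshold configured for this length
        support = freq * 100 / TOTAL
        threshold = min_repeats[length] * 100 / TOTAL
        if support > threshold:
            frequent.append(pattern)
    return frequent


min_repeats = {2: 3, 3: 2, 4: 2, 5: 3, 6: 2}  # di, tri, tetra, penta, hexa
print(prune_infrequent({"AT": 7, "GC": 1}, min_repeats))  # ['AT']
```

With the frequencies from the worked example (AT = 7, GC = 1) and a di value of 3, only AT survives the pruning.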

SEARCHING THE FREQUENT PATTERNS IN THE INPUT SEQUENCE USING THE TWO SLIDING WINDOWS APPROACH:
The frequent patterns found in the preprocessing phase are searched for in the input biological sequence using the two sliding windows approach. The patterns to be searched are generated dynamically in the previous phase.

Placing the pattern in the sliding window: Initially we take the length of the pattern. If it is a mono pattern, it is repeated 5 times and placed in the window. E.g., if the pattern is A, the pattern placed in the sliding window is AAAAA. It is repeated 5 times because the minimum repeat value for mono patterns was given as 5 in the previous phase. Similarly, if the pattern is AGC, it is repeated 2 times (since the minimum repeat value for tri is 2), and AGCAGC is placed in the sliding window and searched for in the sequence.

INTRODUCTION TO TSW (TWO SLIDING WINDOWS):
The algorithm concentrates on both the pattern and the text. It uses two windows, each the same size as the pattern. The first window is aligned with the left end of the text, while the second window is aligned with the right end of the text. Both windows slide at the same time (in parallel) over the text in the searching phase to locate the pattern. The windows slide towards each other until they converge. We use the Berry-Ravindran bad character shift rule, whose large shifts of the sliding windows make the search fast.

TSW ALGORITHM
The Two Sliding Windows algorithm (TSW) scans the text from both sides simultaneously. It uses two sliding windows; the size of each window is m, the same size as the pattern. The two windows search the text in parallel. The text is divided into two parts, the left and the right, each of size n/2. The left part is scanned from left to right using the left window, and the right part is scanned from right to left using the right window. Both windows slide in parallel, which makes the TSW algorithm suitable for parallel processor structures. The TSW algorithm stops when one of the two windows finds the pattern or when the two windows converge. If necessary, the algorithm can easily be modified to find all occurrences of the pattern. Also, if the pattern lies exactly in the middle of the text, TSW can still find it.

The TSW algorithm uses the idea of the BR bad character shift function to get better shift values during the searching phase. The BR algorithm provides a maximum shift value in most cases without losing any characters. The main differences between the TSW algorithm and the BR algorithm are:
1. TSW uses two sliding windows rather than one sliding window to scan all text characters, as in the BR algorithm.
2. TSW uses two arrays, each a one-dimensional array of size (m-1). The arrays store the calculated shift values for the two sliding windows. The shift values are calculated only for the pattern characters, whereas the original BR algorithm uses a two-dimensional array to store the shift values for the whole alphabet. Using one-dimensional arrays reduces the search processing time and at the same time reduces the memory needed to store the shift values.

PRE-PROCESSING PHASE:
The pre-processing phase generates two arrays, nextl and nextr, each a one-dimensional array. The values of the nextl array are calculated according to the Berry-Ravindran bad character algorithm (BR). nextl contains the shift values needed to search the text from the left side. To calculate the shift values, the algorithm considers two consecutive text characters a and b which are aligned immediately after the sliding window. Initially, the indexes of the two consecutive characters in the text string from the left are (m+1) and (m+2) for a and b respectively. The values of the nextr array, on the other hand, are calculated according to our proposed shift function. nextr contains the shift values needed to search the text from the right side; initially, the indexes of the two consecutive characters in the text string from the right are (n-m-1) and (n-m) for a and b respectively.
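To make the two-window scan concrete, here is a minimal Python sketch under several stated assumptions. It computes a Berry-Ravindran style shift on the fly instead of storing it in the nextl and nextr arrays; it treats the right window as the mirror image of the left one (the same shift rule applied to the reversed text and reversed pattern), which is an assumption and not necessarily the shift function used for nextr; and it collects all occurrences rather than stopping at the first. All function names are hypothetical.

```python
def window_pattern(motif, min_repeats):
    """Repeat a motif by the minimum repeat value configured for its length."""
    return motif * min_repeats[len(motif)]


def br_shift(pattern, a, b):
    """Berry-Ravindran style shift from the two text characters a, b just past the window.

    a or b may be None when the window is at the end of the text.
    """
    m = len(pattern)
    shift = m + 2                      # neither character can be aligned
    if b is not None and pattern[0] == b:
        shift = m + 1                  # b lines up with the first pattern character
    for i in range(m - 1):
        if pattern[i] == a and pattern[i + 1] == b:
            shift = min(shift, m - i)  # the pair ab lines up inside the pattern
    if pattern[-1] == a:
        shift = 1                      # a lines up with the last pattern character
    return shift


def char_at(s, i):
    return s[i] if i < len(s) else None


def tsw_find_all(text, pattern):
    """Return the sorted start positions (0-based) of pattern in text."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    rtext, rpattern = text[::-1], pattern[::-1]
    hits = set()
    left = 0    # start of the left window in text
    right = 0   # start of the right window, measured in the reversed text
    # n - m - right is the start of the right window in the original text;
    # stop once the two windows have crossed.
    while left <= n - m - right:
        if text.startswith(pattern, left):
            hits.add(left)
        if rtext.startswith(rpattern, right):
            hits.add(n - m - right)
        left += br_shift(pattern, char_at(text, left + m), char_at(text, left + m + 1))
        right += br_shift(rpattern, char_at(rtext, right + m), char_at(rtext, right + m + 1))
    return sorted(hits)


min_repeats = {1: 5, 2: 3, 3: 2, 4: 2, 5: 3, 6: 2}
query = window_pattern("AT", min_repeats)          # "ATATAT"
print(tsw_find_all("ATGCATATATATATAT", query))     # [4, 6, 8, 10]
```

Because each shift only skips positions where the pattern cannot occur, the loop can end as soon as the windows cross: every start position has then been covered by one side or the other.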

Searching phase:
STEP 1: Compare the characters of the two sliding windows with the corresponding text characters from both sides. If there is a mismatch during comparison from both sides, the algorithm goes to Step 2; otherwise the comparison continues until a complete match is found, at which point the algorithm stops and displays the corresponding position of the pattern in the text string. If we are searching for all occurrences of the pattern in the text string, the algorithm continues to Step 2.
STEP 2: In this step, we use the shift values from the next arrays, depending on the two text characters placed immediately after the pattern window. The two characters are placed to the right side of the left window and to the left side of the right window. The corresponding windows are shifted to the correct positions based on the shift values: the left window is shifted to the right and the right window is shifted to the left.

Both steps are repeated until the first occurrence of the pattern is found from either side or until both windows are positioned beyond n/2. If the first occurrence of the pattern lies in the middle of the text, the TSW algorithm continues comparing pattern characters with text characters through the inner loops before it terminates the searching process through the outer loop. The outcome of TSW gives the tandem repeats (microsatellites) present in the given input sequence.

CONCLUSION
In this novel approach we presented a method that combines frequent pattern search with a fast pattern matching method (Two Sliding Windows) to reduce the time complexity of finding microsatellites in a given nucleotide sequence. The approach preprocesses the sequence to identify frequent patterns using the inverted matrix method, and at the same time it identifies the positions of the characters A, T, G and C in the sequence, which can be used as bad character shift preprocessing to shift the windows in the TSW method. Hence this method reduces the time needed to find tandem repeats in the sequence.

REFERENCES
[1] G. Navarro, M. Raffinot, Flexible Pattern Matching in Strings: Practical On-line Search Algorithms for Texts and Biological Sequences, Cambridge University Press, Cambridge, 2002.
[2] M. Crochemore, W. Rytter, Jewels of Stringology, World Scientific, Singapore, 2002.
[3] W. F. Smyth, Computing Patterns in Strings, Pearson Addison Wesley, 2003.
[4] C. Charras and T. Lecroq, Handbook of Exact String Matching Algorithms, First Edition, King's College London Publications, 2004. ISBN: 0954300645.
[5] Amjad Hudaib et al., A Fast Pattern Matching Algorithm with Two Sliding Windows (TSW), Journal of Computer Science 4 (5): 393-401, 2008.
[6] Schlotterer, C. (2000) Evolutionary dynamics of microsatellite DNA. Chromosoma, 109, 365-371.
[7] Thibodeau, S.N. et al. (1993) Microsatellite instability in cancer of the proximal colon. Science, 260, 816-819.
[8] Kashi, Y. and King, D.G. (2006) Simple sequence repeats as advantageous mutators in evolution. Trends Genet., 22, 253-259.
[9] Sreenu, V.B. et al. (2006) Microsatellite polymorphism across the M. tuberculosis and M. bovis genomes: implications on genome evolution and plasticity. BMC Genomics, 7, 78-88.
[10] Jiawei Han and Micheline Kamber, Data Mining: Concepts and Techniques, 2nd Edition, Morgan Kaufmann Publishers, 2006.
