
ISSN:2229-6093

Wojciech Salabun et al, Int.J.Computer Technology & Applications,Vol 5 (4),1384-1391

Applications of Hidden Markov Model: state-of-the-art


Marcin PIETRZYKOWSKI and Wojciech SAŁABUN
Department of Artificial Intelligence Methods and Applied Mathematics, Faculty of Computer
Science and Information Technology, West Pomeranian University of Technology, Szczecin,
ul. Żołnierska 49, 71-210 Szczecin, Poland
E-mail: [email protected], [email protected]

Abstract

This paper presents a state-of-the-art literature review that classifies and interprets the ongoing and emerging issues associated with the Hidden Markov Model (HMM) over the last decade. The HMM is a commonly used method in many scientific areas. It is a temporal probabilistic model in which the state of the process is described by a single discrete random variable. The theory of HMMs was developed in the late 1960s. Today it is especially known for its applications in temporal pattern recognition, such as speech, handwriting, and bioinformatics. After a brief description of the study methodology, this paper compares the most important HMM publications by field of interest, most cited authors, authors' nationalities, and scientific journals. The comparison is based on papers indexed in the Institute for Scientific Information (ISI) Web of Knowledge and ScienceDirect databases.

Keywords: Markov Chains, Hidden Markov Model, application areas, literature review.

1. Introduction
The Hidden Markov Model (HMM) is a statistical model named after the Russian mathematician Andrey Markov. HMMs form a large and useful class of stochastic processes. They are characterized by the Markov property, which means that the future state of the process depends only on the present state, not on the sequence of events that preceded it. The HMM was originally introduced by Baum and Petrie [4]. The first and foremost engineer-friendly work was an application to automatic speech recognition [56]. Markov Models are very rich in mathematical structure and, when applied properly, work very well in practice for several applications. Hidden Markov Models are especially known for their application in temporal pattern recognition such as speech, handwriting, gesture recognition, part-of-speech tagging, musical score following, partial discharges and bioinformatics. This paper provides a state-of-the-art literature survey of Hidden Markov Model applications and methodologies. A reference repository has been established based on a classification scheme, which includes 73 papers published in 42 scholarly journals from 2003 to 2012.

The rest of the paper is organized as follows. Section 2 presents a basic description of the HMM method and its basic concepts, without detailed mathematical definitions; only the fundamental formulas needed to introduce the HMM are shown. Section 3 presents the methodology used for paper selection, together with basic bibliographic parameters and statistics. Section 4 shows a set of the most important selected publications with respect to primary application areas and bibliographic parameters. Section 5 contains some concluding remarks.

2. Markov Model description
Consider a system which consists of a set of N distinct states $S_1, S_2, \ldots, S_N$. At each discrete time moment t the system is in exactly one state. We denote the state at time t as $q_t$. In the general case the current state $q_t$ depends on the previous state $q_{t-1}$ and on the whole history of all previous states. In the case of a Markov chain the history is truncated to just the predecessor state. Moreover, we consider a system in which the transition probabilities between states are constant in time:

$$a_{ij} = P(q_t = S_j \mid q_{t-1} = S_i), \quad 1 \le i, j \le N \tag{1}$$

The transition probability matrix is defined as $A = \{a_{ij}\}$, where $a_{ij} \ge 0$ and $\sum_{j=1}^{N} a_{ij} = 1$. This kind of stochastic process can be called an observable Markov Model, but it additionally needs the probability distribution for the moment $t = 1$. The initial distribution is denoted as:

$$\pi = \{\pi_i\}, \quad 1 \le i \le N \tag{2}$$

where $\sum_{i=1}^{N} \pi_i = 1$. The Markov Model is defined by a pair:

$$\lambda = (A, \pi) \tag{3}$$
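
As a minimal sketch of equations (1)-(3), the Python snippet below stores A and π as plain lists, checks the stochastic constraints, and evaluates two quantities that follow directly from the definitions: the probability of a given state sequence and the state distribution after T - 1 transitions. The function names, the 0-based state indices (state $S_1$ is index 0), and the two-state example values are illustrative assumptions, not part of the paper.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def check_model(A: Matrix, pi: Sequence[float], tol: float = 1e-9) -> None:
    """Raise ValueError if (A, pi) violates the constraints below eqs. (1)-(2)."""
    n = len(pi)
    if len(A) != n or any(len(row) != n for row in A):
        raise ValueError("A must be an N x N matrix")
    if any(a < 0.0 for row in A for a in row) or any(p < 0.0 for p in pi):
        raise ValueError("probabilities must be non-negative")
    if any(abs(sum(row) - 1.0) > tol for row in A):
        raise ValueError("each row of A must sum to 1")
    if abs(sum(pi) - 1.0) > tol:
        raise ValueError("pi must sum to 1")


def sequence_probability(A: Matrix, pi: Sequence[float],
                         states: Sequence[int]) -> float:
    """P(q_1, ..., q_T | lambda) = pi[q_1] * product of A[q_{t-1}][q_t]."""
    prob = pi[states[0]]
    for prev, cur in zip(states, states[1:]):
        prob *= A[prev][cur]
    return prob


def state_distribution(A: Matrix, pi: Sequence[float], T: int) -> List[float]:
    """Distribution over states at time T: pi propagated through T - 1 steps."""
    n = len(pi)
    dist = list(pi)
    for _ in range(T - 1):
        dist = [sum(dist[i] * A[i][j] for i in range(n)) for j in range(n)]
    return dist


# Hypothetical two-state model; the numbers are chosen for illustration only.
A = [[0.5, 0.5],
     [0.25, 0.75]]
pi = [1.0, 0.0]

check_model(A, pi)
print(sequence_probability(A, pi, [0, 0, 1, 1]))  # 0.1875
print(state_distribution(A, pi, 3))               # [0.375, 0.625]
```

The sequence probability is simply the chain-rule product 1.0 * 0.5 * 0.5 * 0.75, and the distribution at T = 3 is obtained by applying the transition matrix twice to π.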

IJCTA | July-August 2014 1384


Available [email protected]
Given the specified Markov Model $\lambda$, three interesting questions can be asked (and answered):

1. What is the probability of a sequence of observations O, e.g. $O = \{S_1, S_3, S_4, S_4, S_1, S_9\}$?

2. What is the probability distribution over all states at t = T (after T - 1 moments have passed)?

3. What is the probability of staying in a fixed state $S_i$ for exactly d successive moments, given that the system is currently in that state and given the model $\lambda$ (where the observation sequence is defined as $O = \{\underset{1}{S_i}, \underset{2}{S_i}, \ldots, \underset{d}{S_i}, \underset{d+1}{S_j \ne S_i}\}$)?

This paper describes the method only briefly. Answers to the above questions and solutions to the basic problems (presented later in this section) are not described here but can be found in the appropriate literature.

In a Markov Model, the states of the model correspond to observable events. In a Hidden Markov Model the states are hidden and not observable; we can only see the sequence of observations. The set of observation symbols is finite and contains M distinct elements. The observation symbols correspond to the physical output of the system being modeled. We denote the set of symbols as $V = \{v_1, v_2, \ldots, v_M\}$. An HMM is a doubly embedded stochastic process. The first process determines the transitions from one state to another and is identical to the process described above. The second stochastic process produces the sequence of observations. The observation symbol probability distribution is defined by the matrix $B = \{b_j(k)\}$, where:

$$b_j(k) = P(v_k \text{ at } t \mid q_t = S_j), \quad 1 \le j \le N, \ 1 \le k \le M \tag{4}$$

Each row of the matrix contains the distribution of the observation symbols for a single state j. The HMM is defined as a triplet:

$$\lambda = (A, B, \pi) \tag{5}$$

There are three basic problems of interest that must be solved for the model to be useful in real-world applications:

1. Given a sequence of observations $O = \{O_1, O_2, \ldots, O_T\}$ and a model $\lambda$, what is the probability of the sequence given the model, $P(O \mid \lambda)$?

2. Given a sequence of observations $O = \{O_1, O_2, \ldots, O_T\}$ and a model $\lambda$, how do we choose the best state sequence $Q = \{q_1, q_2, \ldots, q_T\}$ that corresponds to the observation sequence O?

3. Given a sequence of observations $O = \{O_1, O_2, \ldots, O_T\}$ and knowing M and N, how do we tune the model $\lambda$, i.e. how do we choose the best content for the triplet $\lambda = (A, B, \pi)$ in order to maximize $P(O \mid \lambda)$?

The above problems can be solved with the following methods, respectively:

1. Forward-Backward Algorithm

2. Viterbi Algorithm

3. Baum-Welch reestimation procedure

The HMM described above can be called a non-parametric discrete HMM. Instead of the probabilities defined by matrix B we can use almost any parametric probability distribution, e.g. binomial, Gaussian, Poisson, etc. For example, the observation emission probability for a Poisson Discrete Hidden Markov Model is denoted as:

$$b_j(n) = \frac{e^{-\theta_j}\,\theta_j^{\,n}}{n!}, \quad n \in \mathbb{N} \tag{6}$$

where $b_j(n)$ is a Poisson model for state j with parameter $\theta_j$. For more general models, e.g. multinomials, the parameter $\theta_j$ could be a vector.

When the observations are real-valued, the model is called a Continuous Hidden Markov Model. The discrete observation probability $b_j(k)$ is replaced by a continuous probability density function. For example, for a Gaussian Hidden Markov Model:

$$b_j(x) = \mathcal{N}(x, \mu_j, \Sigma_j) \tag{7}$$

Mixture probabilities can also be used in both discrete and continuous Hidden Markov Models. For example, a Gaussian Mixture HMM has the following form:

$$b_j(x) = \sum_{m=1}^{M} c_{jm}\, \mathcal{N}(x, \mu_{jm}, \Sigma_{jm}) \tag{8}$$

In equation (8), x is the vector being modeled and $c_{jm}$ is the mixture coefficient for the m-th mixture in state j.

3. Study of the art
The literature review was undertaken to identify papers in the highest-ranking journals that provide the most valuable information to researchers and practitioners studying issues concerning the HMM. Many significant papers on the HMM were published in the last ten years (2003-2012). With this scope in mind, we conducted an extensive search for HMM in the title, abstract and keywords of scientific papers. We particularly targeted the ISI Web of Knowledge library and Elsevier databases. In this period, 11,081 papers were indexed in ISI Web of Knowledge and 11,764 papers were indexed in ScienceDirect. Table 1 gives the frequency distribution by publication year. Since 2006, more than 1,000 papers have been indexed in each database every year.

More than a quarter (27.81%) of the total number of papers were published by U.S. researchers. This is slightly less than the combined share of Chinese, English, French and German scientists. A little more than three-quarters (78.31%) of all papers were written by the ten most productive nationalities. Table 2 shows detailed data on the most productive nationalities that participated in HMM publications.

In the 10 most popular journals, 2,164 scholarly papers were published. This is almost one-fifth (19.53%) of all published papers. Table 3 shows the number of scholarly papers by source. According to Table 3, Lecture Notes in Computer Science is the most popular source; it published 636 papers (5.74%) of the total discussed HMM papers. The second most productive source is the International Conference on Acoustics, Speech and Signal Processing, which published 389 papers (3.51%) on HMM.

The largest number of papers on HMM was written by W. Pieczyński. He is currently a Professor at Telecom SudParis (formerly Telecom INT). The results of his research greatly improve classification using HMMs for unsupervised data [10, 18, 52]. Professor G. Rigoll is also a leading scientist on HMM (45 papers). He is the head of the Institute for Human-Machine Communication, Technical University Munich. His paper on handwritten address recognition using HMM is the most frequently cited paper of his research [9]. Table 4 shows the number of scholarly papers by author.

The most often cited article was cited 4013 times. The article describes improvements of the currently most popular method for prediction of classically secreted proteins, SignalP. It consists of predictors based on neural networks and HMMs [5]. The second is a paper on the SWISS-MODEL workspace, a web-based environment for protein structure homology modeling. It was cited 2021 times [2]. Finally, the third is an article on Pfam, published in a special issue of Nucleic Acids Research. Pfam is a comprehensive collection of protein domains and families, represented as multiple sequence alignments and as profile HMMs [20, 21].

Table 1: The distribution of papers by year of publication

Year     ISI Web of Knowledge   ScienceDirect
2003     746                    635
2004     866                    687
2005     946                    839
2006     1,158                  1,064
2007     1,293                  1,113
2008     1,284                  1,167
2009     1,448                  1,400
2010     1,114                  1,441
2011     1,139                  1,549
2012     1,087                  1,869
Total:   11,081                 11,764

Table 2: The distribution of papers by authors' nationality

No.      Country        No. of articles   Percent of all
1        USA            3,082             27.81
2        China          1,438             12.98
3        France         703               6.34
4        England        684               6.17
5        Germany        594               5.36
6        Canada         575               5.19
7        Japan          575               5.19
8        Australia      354               3.19
9        Italy          338               3.05
10       South Korea    335               3.02
Total:                  8,678             78.31

4. Application areas
The last 10 years have seen a large number of major scientific papers, from the construction of an extensive database of genomic information to better denoising of signals. Below is a short list of the most important scientific achievements of the last decade for the common application areas. We selected the 73 most important papers with respect to citation count. The lower bound on the citation count was set to 300, because we wanted to select only the most important scientific articles. HMM is most widely used and important in Genetics and Heredity [1, 7, 8, 16, 25, 28, 36, 37, 39, 43, 48, 49, 51, 60, 63, 68, 69, 70, 71, 74, 75] and Biochemistry and Molecular Biology [2, 3, 5, 6, 13, 15, 17, 27, 30, 31, 33, 35, 44, 50, 59, 64, 66].

For example, the PANTHER (Protein ANalysis THrough Evolutionary Relationships) database was proposed for high-throughput analysis of protein sequences. One of its key features is the use of statistical models (Hidden Markov Models). A separate HMM is built for each protein group. The advantage of using HMMs is that new sequences can be automatically classified as they become available. The HMMs have been used to classify gene products across the entire human genome [48, 49, 73, 74].

HMMs are very frequently used for prediction. For instance, a peptide predictor was presented in the paper "A combined transmembrane topology and signal peptide prediction method". This predictor is based on an HMM and models the different sequence regions of a signal peptide and the different regions of a transmembrane protein in a series of interconnected states [33].

HMM also has a very significant impact in Electricity, Electronics, Computer Science and Artificial Intelligence [9, 10, 11, 18, 24, 32, 42, 46, 52, 54, 55, 61, 62, 72] and Mathematical and Computational Biology [26, 40, 41, 57, 65]. For example, in [10] the authors dealt with the statistical restoration of hidden discrete signals, extending the classical methodology based on HMMs. The aim was to take into account the hidden signal and the complex relationships between the noises, which can come from different parametric models, be non-independent, and be of a class-varying nature. In the paper [65] the authors generalized the alignment of protein sequences with a profile Hidden Markov Model to the case of pairwise alignment of profile HMMs. They presented a method for detecting distant homologous relationships between proteins based on this approach. HMMs were also used for the representation and recognition of human gait [32]. In one method, the gait information in the frame to exemplar distance (FED) vector sequences was captured in an HMM. In the second method, referred to as the direct approach, the authors worked with the feature vector directly (as opposed to computing the FED) and trained an HMM. The HMM parameters (specifically the observation probability B) were estimated based on the distance between the exemplars and the image features. In this way, learning high-dimensional probability density functions was avoided. The statistical nature of the HMM lends overall robustness to representation and recognition [9, 72]. HMM, MEME/MAST (Multiple Em for Motif Elicitation / Motif Alignment and Search Tool) and hybrid models that combine two or more models were developed in [57]. As a result, a high prediction accuracy was obtained. Another interesting application of HMM is the Automatic Linguistic Indexing of Pictures (ALIP) system. The paper [42] introduces a statistical modeling approach to automatic linguistic indexing of pictures, an important but highly challenging problem for researchers in computer vision and content-based image retrieval. The authors implemented and tested their ALIP system using HMMs. The paper [55], in turn, describes a method for removing noise from digital images, based on a statistical model of the coefficients of an over-complete multiscale oriented basis. Neighborhoods of coefficients at adjacent positions and scales were modeled as the product of two independent random variables: a Gaussian vector and a hidden positive scalar multiplier.

Table 3: The distribution of papers by journals

No.      Name of source                                                        No. of articles
1        Lecture Notes in Computer Science                                     636
2        International Conference on Acoustics Speech and Signal Processing   389
3        Bioinformatics                                                        243
4        Lecture Notes in Artificial Intelligence                              200
5        IEEE Transactions on Audio Speech and Language Processing             188
6        BMC Bioinformatics                                                    165
7        Nucleic Acids Research                                                132
8        Speech Communication                                                  77
9        IEEE Transactions on Pattern Analysis and Machine Intelligence        76
10       BMC Genomics                                                          58
Total:                                                                         2,164

Table 4: The distribution of papers by author

No.      Name of author   No. of articles
1        Pieczyński W.    49
2        Rigoll G.        45
3        Bunke H.         42
4        Liu Y.           42
5        Tokuda K.        41
6        Kobayashi T.     36
7        Carin L.         34
8        Nakamura Y.      32
9        Lee CH.          31
10       Schuller B.      31
Total:                    383
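
The percentage shares quoted in Section 3 can be re-derived from the reported counts. The sketch below assumes that the ISI Web of Knowledge total of 11,081 papers is the denominator; the paper does not say so explicitly, but this choice reproduces the quoted percentages. The variable and function names are illustrative.

```python
# Counts as quoted in Section 3; the denominator is an assumption (see above).
TOTAL_ISI = 11_081

counts = {
    "USA": 3_082,
    "Ten most productive countries": 8_678,
    "Lecture Notes in Computer Science": 636,
    "ICASSP": 389,
    "Ten most popular sources": 2_164,
}


def share(count: int, total: int = TOTAL_ISI) -> float:
    """Percentage of the total, rounded to two decimals as in the tables."""
    return round(100.0 * count / total, 2)


shares = {name: share(count) for name, count in counts.items()}
for name, pct in shares.items():
    print(f"{name}: {pct:.2f}%")
```

With these inputs the script reports 27.81%, 78.31%, 5.74%, 3.51% and 19.53%, matching the figures given in the text.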

Other research areas include Pharmacology and Pharmacy [22, 23], Mechanics [29], Microbiology [34, 38], Biophysics [47, 73], Cell Biology [12], Neurosciences and Neurology [19], and Multidisciplinary Sciences [45, 53]. For instance, the authors of [47] developed an analysis scheme that casts single-molecule time-binned FRET (Fluorescence Resonance Energy Transfer) trajectories as HMMs.

5. Conclusions
This paper presents a state-of-the-art literature review that classifies and interprets the ongoing and emerging issues in applications of the HMM. Overall, the review shows that HMMs have been successfully applied to a wide range of application areas and industrial sectors with varying terms and subjects. The insights identified in this review will help channel research efforts and fulfill researchers' needs for easy reference to HMM publications.

6. References
[1] Abecasis, G. R., Wigginton, J. E.: Handling Marker-Marker linkage disequilibrium: Pedigree analysis with clustered Markers, AMERICAN JOURNAL OF HUMAN GENETICS, 77(5), pp. 754-767, Nov. 2005.
[2] Arnold, K., Bordoli, L., Kopp, J., et al.: The SWISS-MODEL workspace: a web-based environment for protein structure homology modelling, BIOINFORMATICS, 22(2), pp. 195-201, Jan. 2006.
[3] Babu, M. M., Luscombe, N. M., Aravind, L., et al.: Structure and evolution of transcriptional regulatory networks, CURRENT OPINION IN STRUCTURAL BIOLOGY, 14(3), pp. 283-291, Jun. 2004.
[4] Baum, L., Petrie, T.: Statistical inference for probabilistic functions of finite state Markov chains, ANNALS OF MATHEMATICAL STATISTICS, 37, pp. 1554-1563, 1966.
[5] Bendtsen, J. D., Nielsen, H., von Heijne, G., et al.: Improved prediction of signal peptides: SignalP 3.0, JOURNAL OF MOLECULAR BIOLOGY, 340(4), pp. 783-795, Jul. 2004.
[6] Bennett-Lovsey, R. M., Herbert, A. D., Sternberg, M. J., et al.: Exploring the extremes of sequence/structure space with ensemble fold recognition in the program Phyre, PROTEINS-STRUCTURE FUNCTION AND BIOINFORMATICS, 70(3), pp. 611-625, Feb. 2008.
[7] Berriman, M., Haas, B. J., LoVerde, P. T., et al.: The genome of the blood fluke Schistosoma mansoni, NATURE, 460(7253), pp. 352-U65, Jul. 16 2009.
[8] Birney, E., Clamp, M., Durbin, R.: GeneWise and genomewise, GENOME RESEARCH, 14(5), pp. 988-995, May 2004.
[9] Brakensiek, A., Rigoll, G.: Handwritten address recognition using hidden Markov models, LECTURE NOTES IN COMPUTER SCIENCE, 2956, pp. 103-122, 2004.
[10] Brunel, N., Pieczynski, W.: Unsupervised signal restoration using hidden Markov chains with copulas, SIGNAL PROCESSING, 85(12), pp. 2304-2315, May 2005.
[11] Cappe, O., Godsill, S. J., Moulines, E.: An overview of existing methods and recent advances in sequential Monte Carlo, PROCEEDINGS OF THE IEEE, 95(5), pp. 899-924, May 2007.
[12] Carter, C., Pan, S. Q., Jan, Z. H., et al.: The vegetative vacuole proteome of Arabidopsis thaliana reveals predicted and unexpected proteins, PLANT CELL, 16(12), pp. 3285-3303, Dec. 2004.
[13] Chandonia, J. M., Hon, G., Walker, N. S., et al.: The ASTRAL Compendium in 2004, NUCLEIC ACIDS RESEARCH, 32(SI), pp. D189-D192, Jan. 2004.
[14] Cohen, I., Sebe, N., Garg, A., et al.: Facial expression recognition from video sequences: temporal and static modeling, COMPUTER VISION AND IMAGE UNDERSTANDING, 91(1-2), pp. 160-187, Jul.-Aug. 2003.
[15] Colella, S., Yau, C., Taylor, J. M., et al.: QuantiSNP: an Objective Bayes Hidden-Markov Model to detect and accurately map copy number variation using SNP genotyping data, NUCLEIC ACIDS RESEARCH, 35(6), pp. 2013-2025, Mar. 2007.
[16] Corander, J., Waldmann, P., Sillanpaa, M. J.: Bayesian analysis of genetic differentiation between populations, GENETICS, 163(1), pp. 367-374, Jan. 2003.
[17] D'Andrea, L. D., Regan, L.: TPR proteins: the versatile helix, TRENDS IN BIOCHEMICAL SCIENCES, 28(12), pp. 655-662, Dec. 2003.
[18] Derrode, S., Pieczynski, W.: Signal and image segmentation using Pairwise Markov chains, IEEE TRANSACTIONS ON SIGNAL PROCESSING, 52(9), pp. 2477-2489, Sep. 2004.
[19] Dombeck, D. A., Khabbaz, A. N., Collman, F., et al.: Imaging large-scale neural activity with cellular resolution in awake, mobile mice, NEURON, 56(1), pp. 43-57, Oct. 2007.
[20] Finn, R. D., Mistry, J., Tate, J., et al.: The Pfam protein families database, NUCLEIC ACIDS RESEARCH, 38(S1), pp. D211-D222, Jan. 2010.
[21] Finn, R. D., Tate, J., Mistry, J., et al.: The Pfam protein families database, NUCLEIC ACIDS RESEARCH, 36(SI), pp. D281-D288, Jan. 2008.
[22] Fredriksson, R., Lagerstrom, M. C., Lundin, L. G., et al.: The G-protein-coupled receptors in the human genome form five main families. Phylogenetic analysis, paralogon groups, and fingerprints, MOLECULAR PHARMACOLOGY, 63(6), pp. 1256-1272, Jun. 2003.
[23] Fredriksson, R., Schioth, H. B.: The repertoire of G-protein-coupled receptors in fully sequenced genomes, MOLECULAR PHARMACOLOGY, 67(5), pp. 1414-1425, May 2005.
[24] Fridlyand, J., Snijders, A. M., Pinkel, D., et al.: Hidden Markov models approach to the analysis of array CGH data, JOURNAL OF MULTIVARIATE ANALYSIS, 90(1), pp. 132-153, Jul. 2004.
[25] Gerstein, M. B., Bruce, C., Rozowsky, J. S., et al.: What is a gene, post-ENCODE? History and updated definition, GENOME RESEARCH, 17(6), pp. 669-681, Jun. 2007.
[26] Haft, D. H., Selengut, J., Mongodin, E. F., et al.: A guild of 45 CRISPR-associated (Cas) protein families and multiple CRISPR/Cas subtypes exist in prokaryotic genomes, PLOS COMPUTATIONAL BIOLOGY, 1(6), pp. 474-483, Nov. 2005.
[27] Haft, D. H., Selengut, J. D., White, O.: The TIGRFAMs database of protein families, NUCLEIC ACIDS RESEARCH, 31(1), pp. 371-373, Jan. 2003.
[28] Hoggart, C. J., Parra, E. J., Shriver, M. D., et al.: Control of confounding of genetic associations in stratified populations, AMERICAN JOURNAL OF HUMAN GENETICS, 72(6), pp. 1492-1504, Jun. 2003.
[29] Jardine, A. K., Lin, D., Banjevic, D.: A review on machinery diagnostics and prognostics implementing condition-based maintenance, MECHANICAL SYSTEMS AND SIGNAL PROCESSING, 20(7), pp. 1483-1510, Oct. 2006.
[30] Juncker, A. S., Willenbrock, H., Von Heijne, G., et al.: Prediction of lipoprotein signal peptides in Gram-negative bacteria, PROTEIN SCIENCE, 12(8), pp. 1652-1662, Aug. 2003.
[31] Kaell, L., Krogh, A., Sonnhammer, E. L.: Advantages of combined transmembrane topology and signal peptide prediction - the Phobius web server, NUCLEIC ACIDS RESEARCH, 35(S), pp. W429-W432, Jul. 2007.
[32] Kale, A., Sundaresan, A., Rajagopalan, A. N., et al.: Identification of humans using gait, IEEE TRANSACTIONS ON IMAGE PROCESSING, 13(9), pp. 1163-1173, Sep. 2004.
[33] Kall, L., Krogh, A., Sonnhammer, E. L.: A combined transmembrane topology and signal peptide prediction method, JOURNAL OF MOLECULAR BIOLOGY, 338(5), pp. 1027-1036, May 2004.
[34] Kazmierczak, M. J., Mithoe, S. C., Boor, K. J., et al.: Listeria monocytogenes sigma(B) regulates stress response and virulence functions, JOURNAL OF BACTERIOLOGY, 185(19), pp. 5722-5734, Oct. 2003.
[35] Kim, D. E., Chivian, D., Baker, D.: Protein structure prediction and analysis using the Robetta server, NUCLEIC ACIDS RESEARCH, 32(S2), pp. W526-W531, Jul. 2004.
[36] Kim, H., Melen, K., Osterberg, M., et al.: A global topology map of the Saccharomyces cerevisiae membrane proteome, PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 103(30), pp. 11142-11147, Jul. 2006.
[37] Korn, J. M., Kuruvilla, F. G., McCarroll, S. A., et al.: Integrated genotype calling and association analysis of SNPs, common copy number polymorphisms and rare CNVs, NATURE GENETICS, 40(10), pp. 1253-1260, Oct. 2008.
[38] la Cour, T., Kiemer, L., Molgaard, A., et al.: Analysis and prediction of leucine-rich nuclear export signals, PROTEIN ENGINEERING DESIGN AND SELECTION, 17(6), pp. 527-536, Jun. 2004.
[39] Lagesen, K., Hallin, P., Rodland, E. A., et al.: RNAmmer: consistent and rapid annotation of ribosomal RNA genes, NUCLEIC ACIDS RESEARCH, 35(9), pp. 3100-3108, May 2007.
[40] Lai, W. R., Johnson, M. D., Kucherlapati, R., et al.: Comparative analysis of algorithms for identifying amplifications and deletions in array CGH data, BIOINFORMATICS, 21(19), pp. 3763-3770, Oct. 2005.
[41] Leslie, C. S., Eskin, E., Cohen, A., et al.: Mismatch string kernels for discriminative protein classification, BIOINFORMATICS, 20(4), pp. 467-476, Mar. 2004.
[42] Li, J., Wang, J. Z.: Automatic linguistic indexing of pictures by a statistical modeling approach, IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 25(9), pp. 1075-1088, Sep. 2003.
[43] Li, N., Stephens, M.: Modeling linkage disequilibrium and identifying recombination hotspots using single-nucleotide polymorphism data, GENETICS, 165(4), pp. 2213-2233, Dec. 2003.
[44] Liu, G. Y., Loraine, A. E., Shigeta, R., et al.: NetAffx: Affymetrix probesets and annotations, NUCLEIC ACIDS RESEARCH, 31(1), pp. 82-86, Jan. 2003.
[45] Loytynoja, A., Goldman, N.: An algorithm for progressive multiple alignment of sequences with insertions, PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 102(30), pp. 10557-10562, Jul. 2005.
[46] Markou, M., Singh, S.: Novelty detection: a review - part 1: statistical approaches, SIGNAL PROCESSING, 83(12), pp. 2481-2497, Dec. 2003.
[47] McKinney, S. A., Joo, C., Ha, T.: Analysis of single-molecule FRET trajectories using hidden Markov modeling, BIOPHYSICAL JOURNAL, 91(5), pp. 1941-1951, Sep. 2006.
[48] Mi, H., Guo, N., Kejariwal, A., et al.: PANTHER version 6: protein sequence and function evolution data with expanded representation of biological pathways, NUCLEIC ACIDS RESEARCH, 35(SI), pp. D247-D252, Jan. 2007.
[49] Mi, H. Y., Lazareva-Ulitsky, B., Loo, R., et al.: The PANTHER database of protein families, subfamilies, functions and pathways, NUCLEIC ACIDS RESEARCH, 33(SI), pp. D284-D288, Jan. 2005.
[50] Nielsen, M., Lundegaard, C., Worning, P., et al.: Reliable prediction of T-cell epitopes using neural networks with novel sequence representations, PROTEIN SCIENCE, 12(5), pp. 1007-1017, May 2003.
[51] Patterson, T. A., Thomas, L., Wilcox, C., et al.: State-space models of individual animal movement, TRENDS IN ECOLOGY AND EVOLUTION, 23(2), pp. 87-94, Feb. 2008.
[52] Pieczyński, W.: Pairwise Markov chains, IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 25(5), pp. 634-639, May 2003.
[53] Pinto, D., Pagnamenta, A. T., Klei, L., et al.: Functional impact of global rare copy number variation in autism spectrum disorders, NATURE, 466(7304), pp. 368-372, Jul. 15 2010.
[54] Po, D. D., Do, M. N.: Directional multiscale modeling of images using the contourlet transform, IEEE TRANSACTIONS ON IMAGE PROCESSING, 15(6), pp. 1610-1620, Jun. 2006.
[55] Portilla, J., Strela, V., Wainwright, M. J., et al.: Image denoising using scale mixtures of Gaussians in the wavelet domain, IEEE TRANSACTIONS ON IMAGE PROCESSING, 12(11), pp. 1338-1351, Nov. 2003.
[56] Rabiner, L. R.: A tutorial on hidden Markov models and selected applications in speech recognition, PROCEEDINGS OF THE IEEE, 77(2), pp. 257-286, 1989.
[57] Rashid, M., Saha, S., Raghava, G. P.: Support Vector Machine-based method for predicting subcellular localization of mycobacterial proteins using evolutionary information and motifs, BMC BIOINFORMATICS, 8(337), Sep. 2007.
[58] Riley, T., Sontag, E., Chen, P., et al.: Transcriptional control of human p53-regulated genes, NATURE REVIEWS MOLECULAR CELL BIOLOGY, 9(5), pp. 402-412, May 2008.
[59] Sadreyev, R., Grishin, N.: COMPASS: A tool for comparison of multiple protein alignments with assessment of statistical significance, JOURNAL OF MOLECULAR BIOLOGY, 326(1), pp. 317-336, Feb. 2003.
[60] Scheet, P., Stephens, M.: A fast and flexible statistical model for large-scale population genotype data: Applications to inferring missing genotypes and haplotypic phase, AMERICAN JOURNAL OF HUMAN GENETICS, 78(4), pp. 629-644, Apr. 2006.
[61] Sheikh, H. R., Bovik, A. C.: Image information and visual quality, IEEE TRANSACTIONS ON IMAGE PROCESSING, 15(2), pp. 430-444, Feb. 2006.
[62] Sheikh, H. R., Bovik, A. C., de Veciana, G.: An information fidelity criterion for image quality assessment using natural scene statistics, IEEE TRANSACTIONS ON IMAGE PROCESSING, 14(12), pp. 2117-2128, Dec. 2005.
[63] Siepel, A., Bejerano, G., Pedersen, J. S., et al.: Evolutionarily conserved elements in vertebrate, insect, worm, and yeast genomes, GENOME RESEARCH, 15(8), pp. 1034-1050, Aug. 2005.
[64] Siepel, A., Haussler, D.: Phylogenetic estimation of context-dependent substitution rates by maximum likelihood, MOLECULAR BIOLOGY AND EVOLUTION, 21(3), pp. 468-488, Mar. 2004.
[65] Soding, J.: Protein homology detection by HMM-HMM comparison, BIOINFORMATICS, 21(7), pp. 951-960, Apr. 2005.
[66] Soding, J., Biegert, A., Lupas, A. N.: The HHpred interactive server for protein homology detection and structure prediction, NUCLEIC ACIDS RESEARCH, 33(2), pp. W244-W248, Jul. 2005.
[67] Thomas, P. D., Campbell, M. J., Kejariwal, A., et al.: PANTHER: A library of protein families and subfamilies indexed by function, GENOME RESEARCH, 13(9), pp. 2129-2141, Sep. 2003.
[68] Thomas, P. D., Kejariwal, A., Campbell, M. J., et al.: PANTHER: a browsable database of gene products organized by biological function, using curated protein family and subfamily classification, NUCLEIC ACIDS RESEARCH, 31(1), pp. 334-341, Jan. 2003.
[69] Tunnacliffe, A., Wise, M. J.: The continuing conundrum of the LEA proteins, NATURWISSENSCHAFTEN, 94(10), pp. 791-812, Oct. 2007.
[70] Vassilatis, D. K., Hohmann, J. G., Zeng, H., et al.: The G protein-coupled receptor repertoires of human and mouse, PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 100(8), pp. 4903-4908, Apr. 2003.
[71] Wang, K., Li, M., Hadley, D., et al.: PennCNV: An integrated hidden Markov model designed for high-resolution copy number variation detection in whole-genome SNP genotyping data, GENOME RESEARCH, 17(11), pp. 1665-1674, Nov. 2007.
[72] Wang, L. A., Hu, W. M., Tan, T. N.: Recent developments in human motion analysis, PATTERN RECOGNITION, 36(3), pp. 585-601, Mar. 2003.
[73] Whisstock, J. C., Lesk, A. M.: Prediction of protein function from protein sequence and structure, QUARTERLY REVIEWS OF BIOPHYSICS, 36(3), pp. 307-340, Aug. 2003.
[74] Xu, R., Wunsch, D.: Survey of clustering algorithms, IEEE TRANSACTIONS ON NEURAL NETWORKS, 16(3), pp. 645-678, May 2005.
[75] Zhang, Z. M., Henzel, W. J.: Signal peptide prediction based on analysis of experimentally verified cleavage sites, PROTEIN SCIENCE, 13(10), pp. 2819-2824, Oct. 2004.
