Pattern Recognition
Pattern recognition is the automated recognition of patterns and regularities in data. Pattern recognition is closely related to artificial
intelligence and machine learning,[1] together with applications such as data mining and knowledge discovery in databases (KDD),
and is often used interchangeably with these terms. However, these are distinguished: machine learning is one approach to pattern
recognition, while other approaches include hand-crafted (not learned) rules or heuristics; and pattern recognition is one approach to
artificial intelligence, while other approaches include symbolic artificial intelligence.[2] A modern definition of pattern recognition is:
The field of pattern recognition is concerned with the automatic discovery of regularities in data through the use of
computer algorithms and with the use of these regularities to take actions such as classifying the data into different
categories.[3]
This article focuses on machine learning approaches to pattern recognition. Pattern recognition systems are in many cases trained
from labeled "training" data (supervised learning), but when no labeled data are available other algorithms can be used to discover
previously unknown patterns (unsupervised learning). Machine learning is the common term for supervised learning methods and
originates from artificial intelligence, whereas KDD and data mining have a larger focus on unsupervised methods and stronger
connection to business use. Pattern recognition has its origins in engineering, and the term is popular in the context of computer
vision: a leading computer vision conference is named Conference on Computer Vision and Pattern Recognition. In pattern
recognition, there may be a higher interest to formalize, explain and visualize the pattern, while machine learning traditionally
focuses on maximizing the recognition rates. Yet, all of these domains have evolved substantially from their roots in artificial
intelligence, engineering and statistics, and they've become increasingly similar by integrating developments and ideas from each
other.
In machine learning, pattern recognition is the assignment of a label to a given input value. In statistics, discriminant analysis was
introduced for this same purpose in 1936. An example of pattern recognition is classification, which attempts to assign each input
value to one of a given set of classes (for example, determine whether a given email is "spam" or "non-spam"). However, pattern
recognition is a more general problem that encompasses other types of output as well. Other examples are regression, which assigns a
real-valued output to each input; sequence labeling, which assigns a class to each member of a sequence of values (for example, part
of speech tagging, which assigns a part of speech to each word in an input sentence); and parsing, which assigns a parse tree to an
input sentence, describing the syntactic structure of the sentence.
Pattern recognition algorithms generally aim to provide a reasonable answer for all possible inputs and to perform "most likely"
matching of the inputs, taking into account their statistical variation. This is opposed to pattern matching algorithms, which look for
exact matches in the input with pre-existing patterns. A common example of a pattern-matching algorithm is regular expression
matching, which looks for patterns of a given sort in textual data and is included in the search capabilities of many text editors and
word processors. In contrast to pattern recognition, pattern matching is not generally a type of machine learning, although pattern-
matching algorithms (especially with fairly general, carefully tailored patterns) can sometimes succeed in providing similar-quality
output of the sort provided by pattern-recognition algorithms.
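The contrast between exact pattern matching and "most likely" pattern recognition can be sketched in a few lines of Python. This is only an illustration: the regular expression, the word weights, and the function names below are hypothetical, and the weights stand in for values that a real system would learn from data.

```python
import re

# Pattern matching: an exact, hand-written rule (hypothetical pattern).
SPAM_PATTERN = re.compile(r"free money", re.IGNORECASE)

def matches_pattern(text: str) -> bool:
    """Return True only if the literal phrase occurs in the text."""
    return SPAM_PATTERN.search(text) is not None

# Pattern recognition (toy version): combine evidence from many words.
# These weights are invented for illustration; a real system would learn them.
WORD_WEIGHTS = {"free": 1.5, "money": 1.2, "winner": 2.0,
                "meeting": -1.5, "report": -1.0}

def spam_score(text: str) -> float:
    """Sum the weights of all known words; unknown words contribute 0."""
    words = re.findall(r"[a-z]+", text.lower())
    return sum(WORD_WEIGHTS.get(word, 0.0) for word in words)

def recognize(text: str) -> str:
    """Give an answer for every input, based on the overall evidence."""
    return "spam" if spam_score(text) > 0 else "non-spam"
```

With these assumed weights, the message "Money for free, winner!" is not caught by the exact pattern because the words appear in a different order, but the score-based recognizer still labels it as spam.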
Contents
Overview
Probabilistic classifiers
Number of important feature variables
Problem statement (supervised version)
Frequentist or Bayesian approach to pattern recognition
Uses
Algorithms
Classification algorithms (supervised algorithms predicting categorical labels)
Clustering algorithms (unsupervised algorithms predicting categorical labels)
Ensemble learning algorithms (supervised meta-algorithms for combining multiple learning algorithms together)
General algorithms for predicting arbitrarily-structured (sets of) labels
Multilinear subspace learning algorithms (predicting labels of multidimensional data using tensor
representations)
Real-valued sequence labeling algorithms (predicting sequences of real-valued labels)
Regression algorithms (predicting real-valued labels)
Sequence labeling algorithms (predicting sequences of categorical labels)
See also
References
Further reading
External links
Overview
Pattern recognition is generally categorized according to the type of learning procedure used to generate the output value. Supervised
learning assumes that a set of training data (the training set) has been provided, consisting of a set of instances that have been
properly labeled by hand with the correct output. A learning procedure then generates a model that attempts to meet two sometimes
conflicting objectives: Perform as well as possible on the training data, and generalize as well as possible to new data (usually, this
means being as simple as possible, for some technical definition of "simple", in accordance with Occam's Razor, discussed below).
Unsupervised learning, on the other hand, assumes training data that has not been hand-labeled, and attempts to find inherent patterns
in the data that can then be used to determine the correct output value for new data instances.[4] A combination of the two that has
recently been explored is semi-supervised learning, which uses a combination of labeled and unlabeled data (typically a small set of
labeled data combined with a large amount of unlabeled data). Note that in cases of unsupervised learning, there may be no training
data at all to speak of; in other words, the data to be labeled is the training data.
Note that sometimes different terms are used to describe the corresponding supervised and unsupervised learning procedures for the
same type of output. For example, the unsupervised equivalent of classification is normally known as clustering, based on the
common perception of the task as involving no training data to speak of, and of grouping the input data into clusters based on some
inherent similarity measure (e.g. the distance between instances, considered as vectors in a multi-dimensional vector space), rather
than assigning each input instance into one of a set of pre-defined classes. Note also that in some fields, the terminology is different:
For example, in community ecology, the term "classification" is used to refer to what is commonly known as "clustering".
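As an illustration of grouping instances by a distance-based similarity measure, the following is a minimal sketch of k-means clustering (Lloyd's algorithm), one common clustering method. The deterministic initialization from the first k points and the sample data are assumptions made to keep the example reproducible; practical implementations usually use randomized initialization.

```python
import math

def kmeans(points, k, iterations=20):
    """Cluster points with Lloyd's algorithm.

    Assumption for reproducibility: the first k points serve as the
    initial centroids.
    """
    centroids = [tuple(p) for p in points[:k]]
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid
        # (Euclidean distance in the feature space).
        assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(
                    sum(coords) / len(members) for coords in zip(*members)
                )
    return centroids, assignment

# Hypothetical two-dimensional feature vectors forming two loose groups.
points = [(1.0, 1.0), (8.0, 8.0), (1.2, 0.8),
          (7.8, 8.2), (0.9, 1.1), (8.1, 7.9)]
centroids, assignment = kmeans(points, k=2)
```

No labels are supplied: the cluster indices in `assignment` come only from the distances between the instances themselves.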
The piece of input data for which an output value is generated is formally termed an instance. The instance is formally described by a
vector of features, which together constitute a description of all known characteristics of the instance. (These feature vectors can be
seen as defining points in an appropriate multidimensional space, and methods for manipulating vectors in vector spaces can be
correspondingly applied to them, such as computing the dot product or the angle between two vectors.) Typically, features are either
categorical (also known as nominal, i.e., consisting of one of a set of unordered items, such as a gender of "male" or "female", or a
blood type of "A", "B", "AB" or "O"), ordinal (consisting of one of a set of ordered items, e.g., "large", "medium" or "small"),
integer-valued (e.g., a count of the number of occurrences of a particular word in an email) or real-valued (e.g., a measurement of
blood pressure). Often, categorical and ordinal data are grouped together; likewise for integer-valued and real-valued data.
Furthermore, many algorithms work only in terms of categorical data and require that real-valued or integer-valued data be
discretized into groups (e.g., less than 5, between 5 and 10, or greater than 10).
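The vector operations and the discretization mentioned above can be sketched as follows. The helper names are illustrative, and the bin boundaries simply follow the example groups in the text (the treatment of the boundary values 5 and 10 is an assumption).

```python
import math

def dot(u, v):
    """Dot product of two feature vectors of equal length."""
    return sum(a * b for a, b in zip(u, v))

def angle(u, v):
    """Angle in radians between two non-zero feature vectors."""
    cosine = dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))
    # Clamp to [-1, 1] to guard against floating-point rounding.
    return math.acos(max(-1.0, min(1.0, cosine)))

def discretize(value):
    """Map a real-valued or integer-valued feature to a categorical bin."""
    if value < 5:
        return "less than 5"
    if value <= 10:
        return "between 5 and 10"
    return "greater than 10"
```

For instance, `angle([1, 0], [0, 1])` is a right angle, while two feature vectors that point in the same direction have an angle close to zero regardless of their lengths.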
Probabilistic classifiers
Many common pattern recognition algorithms are probabilistic in nature, in that they use statistical inference to find the best label for
a given instance. Unlike other algorithms, which simply output a "best" label, often probabilistic algorithms also output a probability
of the instance being described by the given label. In addition, many probabilistic algorithms output a list of the N-best labels with
associated probabilities, for some value of N, instead of simply a single best label. When the number of possible labels is fairly small
(e.g., in the case of classification), N may be set so that the probability of all possible labels is output. Probabilistic algorithms have
many advantages over non-probabilistic algorithms:
They output a confidence value associated with their choice. (Note that some other algorithms may also output
confidence values, but in general, only for probabilistic algorithms is this value mathematically grounded in
probability theory. Non-probabilistic confidence values can in general not be given any specific meaning, and can only
be used to compare against other confidence values output by the same algorithm.)
Correspondingly, they can abstain when the confidence of choosing any particular output is too low.
Because of the probabilities output, probabilistic pattern-recognition algorithms can be more effectively incorporated
into larger machine-learning tasks, in a way that partially or completely avoids the problem of
error propagation.
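The behaviours listed above (a probability per label, an N-best list, and abstention at low confidence) can be sketched as follows. The use of a softmax to turn raw scores into probabilities, the function names, and the 0.6 threshold are assumptions chosen for illustration, not a prescribed method.

```python
import math

def softmax(scores):
    """Convert raw per-label scores into probabilities that sum to 1."""
    highest = max(scores.values())
    # Subtracting the maximum keeps the exponentials numerically stable.
    exps = {label: math.exp(s - highest) for label, s in scores.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}

def n_best(probabilities, n):
    """Return the n most probable (label, probability) pairs."""
    ranked = sorted(probabilities.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:n]

def decide(probabilities, threshold=0.6):
    """Return the best label, or None (abstain) if its probability is too low."""
    label, probability = n_best(probabilities, 1)[0]
    return label if probability >= threshold else None
```

With scores of 2.0 for "spam" and 0.0 for "non-spam", the classifier commits to "spam"; with nearly equal scores it abstains, leaving the decision to a later stage instead of passing on a likely error.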
Number of important feature variables
Techniques to transform the raw feature vectors (feature extraction) are sometimes used prior to application of the pattern-matching
algorithm. For example, feature extraction algorithms attempt to reduce a large-dimensionality feature vector into a smaller-
dimensionality vector that is easier to work with and encodes less redundancy, using mathematical techniques such as principal
components analysis (PCA). The distinction between feature selection and feature extraction is that the resulting features after
feature extraction has taken place are of a different sort than the original features and may not easily be interpretable, while the
features left after feature selection are simply a subset of the original features.
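The difference between feature selection and feature extraction can be illustrated with a small sketch. It computes only the first principal component, using power iteration on the sample covariance matrix; this is a simplified stand-in for a full PCA, and the function names and sample data are hypothetical.

```python
import math

def center(rows):
    """Subtract the per-feature mean from every instance."""
    means = [sum(col) / len(rows) for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]

def covariance(centered):
    """Sample covariance matrix of already-centered data."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]

def first_principal_component(rows, iterations=200):
    """Approximate the leading eigenvector of the covariance matrix."""
    cov = covariance(center(rows))
    d = len(cov)
    v = [1.0 / math.sqrt(d)] * d  # arbitrary unit-length starting vector
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

def extract(rows):
    """Feature extraction: one new feature mixing all original features."""
    component = first_principal_component(rows)
    return [sum(x * c for x, c in zip(row, component)) for row in center(rows)]

def select(rows, keep):
    """Feature selection: keep a subset of the original features unchanged."""
    return [[row[i] for i in keep] for row in rows]

# Hypothetical data: feature 1 is roughly twice feature 0; feature 2 barely varies.
rows = [[1.0, 2.0, 0.5], [2.0, 4.1, 0.4], [3.0, 5.9, 0.6], [4.0, 8.0, 0.5]]
component = first_principal_component(rows)
```

The extracted feature is a weighted mixture of all three original features and has no direct name, whereas `select(rows, [0, 2])` returns columns that keep their original meaning.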
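Problem statement (supervised version)
For a probabilistic pattern recognizer, the problem is instead to estimate the probability of each possible output label given a
particular input instance, i.e., to estimate a function of the form
$$p(\text{label} \mid x, \theta) = f(x; \theta)$$
where the feature vector input is $x$, and the function $f$ is typically parameterized by some parameters $\theta$.[8] In a discriminative
approach to the problem, $f$ is estimated directly. In a generative approach, however, the inverse probability $p(x \mid \text{label}, \theta)$ is instead
estimated and combined with the prior probability $p(\text{label} \mid \theta)$ using Bayes' rule, as follows:
$$p(\text{label} \mid x, \theta) = \frac{p(x \mid \text{label}, \theta)\, p(\text{label} \mid \theta)}{\sum_{L \in \text{all labels}} p(x \mid L, \theta)\, p(L \mid \theta)}$$
When the labels are continuously distributed (e.g., in regression analysis), the denominator involves integration rather than
summation:
$$p(\text{label} \mid x, \theta) = \frac{p(x \mid \text{label}, \theta)\, p(\text{label} \mid \theta)}{\int_{L \in \text{all labels}} p(x \mid L, \theta)\, p(L \mid \theta)\, dL}$$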
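For discrete labels, the generative approach (class-conditional likelihood times prior, normalized by a sum over all labels) can be sketched numerically. The one-dimensional Gaussian likelihoods, the priors, and all parameter values below are hypothetical and play the role of the model parameters.

```python
import math

def gaussian_pdf(x, mean, std):
    """Density of a normal distribution, used as p(x | label)."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2 * math.pi))

# Hypothetical parameters: class priors p(label) and, per class,
# the (mean, standard deviation) of a single real-valued feature x.
PRIORS = {"spam": 0.3, "non-spam": 0.7}
LIKELIHOODS = {"spam": (8.0, 2.0), "non-spam": (2.0, 2.0)}

def posterior(x):
    """Apply Bayes' rule: p(label | x) = p(x | label) p(label) / sum over labels."""
    joint = {label: gaussian_pdf(x, *LIKELIHOODS[label]) * PRIORS[label]
             for label in PRIORS}
    evidence = sum(joint.values())  # the denominator of Bayes' rule
    return {label: value / evidence for label, value in joint.items()}
```

At `x = 5.0` both classes explain the observation equally well, so the posterior falls back to the priors; at `x = 8.0` the likelihood outweighs the lower prior for "spam".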
The value of $\theta$ is typically learned using maximum a posteriori (MAP) estimation. This finds the best value that simultaneously
meets two conflicting objectives: to perform as well as possible on the training data (smallest error rate) and to find the simplest
possible model. Essentially, this combines maximum likelihood estimation with a regularization procedure that favors simpler models
over more complex models. In a Bayesian context, the regularization procedure can be viewed as placing a prior probability $p(\theta)$ on
different values of $\theta$. Mathematically:
$$\theta^{*} = \arg\max_{\theta}\, p(\theta \mid D)$$
where $\theta^{*}$ is the value used for $\theta$ in the subsequent evaluation procedure, $D = \{(x_1, y_1), \dots, (x_n, y_n)\}$ is the training data, and $p(\theta \mid D)$, the posterior probability of $\theta$, is given up to a normalizing constant by
$$p(\theta \mid D) \propto \left[ \prod_{i=1}^{n} p(y_i \mid x_i, \theta) \right] p(\theta)$$
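A minimal sketch of MAP estimation follows, under strong simplifying assumptions: the labels are binary, there are no input features (so each training label is a Bernoulli draw with parameter theta), and the prior is a Beta(a, b) distribution with a, b > 1. The grid search mirrors the arg max directly, and the closed-form expression is the known mode of the resulting Beta posterior.

```python
import math

def log_posterior(theta, labels, a, b):
    """Unnormalized log posterior: log-likelihood plus log Beta(a, b) prior."""
    log_likelihood = sum(
        math.log(theta) if y == 1 else math.log(1.0 - theta) for y in labels
    )
    log_prior = (a - 1) * math.log(theta) + (b - 1) * math.log(1.0 - theta)
    return log_likelihood + log_prior

def map_estimate_grid(labels, a, b, steps=10000):
    """Search a grid over (0, 1) for the theta maximizing the posterior."""
    grid = [i / steps for i in range(1, steps)]
    return max(grid, key=lambda theta: log_posterior(theta, labels, a, b))

def map_estimate_closed_form(labels, a, b):
    """Mode of the Beta(a + k, b + n - k) posterior (valid for a, b > 1)."""
    return (sum(labels) + a - 1) / (len(labels) + a + b - 2)

def maximum_likelihood(labels):
    """Unregularized estimate, for comparison."""
    return sum(labels) / len(labels)

labels = [1, 1, 1, 0]  # hypothetical training labels
```

With a Beta(2, 2) prior, the MAP estimate for these four labels is 2/3, pulled from the maximum likelihood estimate of 0.75 toward the prior's preferred value of 0.5. This is the regularizing effect described above.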
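In the Bayesian approach to this problem, instead of choosing a single parameter vector $\theta^{*}$, the probability of a given label for a new
instance $x$ is computed by integrating over all possible values of $\theta$, weighted according to the posterior probability $p(\theta \mid D)$ given the training data $D$:
$$p(\text{label} \mid x) = \int p(\text{label} \mid x, \theta)\, p(\theta \mid D)\, d\theta$$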
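The integration over parameter values can be sketched for the same kind of simplified setting: binary labels without input features, a Bernoulli likelihood, and a Beta(a, b) prior (all assumptions made for illustration). The numerical version approximates the integral with a midpoint rule; for this model the result is also known in closed form, which provides a check.

```python
def unnormalized_posterior(theta, labels, a, b):
    """Likelihood times Beta(a, b) prior, without the normalizing constant."""
    k, n = sum(labels), len(labels)
    return theta ** (k + a - 1) * (1.0 - theta) ** (n - k + b - 1)

def predictive_numeric(labels, a, b, steps=10000):
    """Approximate p(label = 1 | D) by integrating theta against the posterior."""
    width = 1.0 / steps
    thetas = [(i + 0.5) * width for i in range(steps)]  # midpoint rule
    weights = [unnormalized_posterior(t, labels, a, b) for t in thetas]
    normalizer = sum(weights)
    # p(label = 1 | theta) is theta itself in this simplified model.
    return sum(t * w for t, w in zip(thetas, weights)) / normalizer

def predictive_closed_form(labels, a, b):
    """Known result for the Beta-Bernoulli model: the posterior mean of theta."""
    return (sum(labels) + a) / (len(labels) + a + b)

labels = [1, 1, 1, 0]  # hypothetical training labels
```

For these labels and a Beta(2, 2) prior the predictive probability is 5/8, which differs from the MAP value of 2/3 because it averages over all plausible parameter values instead of committing to the single most probable one.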
Frequentist or Bayesian approach to pattern recognition
Bayesian statistics has its origin in Greek philosophy, where a distinction was already made between 'a priori' and 'a posteriori'
knowledge. Later, Kant defined his distinction between what is known a priori (before observation) and the empirical knowledge
gained from observations. In a Bayesian pattern classifier, the class probabilities $p(\text{label} \mid \theta)$ can be chosen by the user, and are
then a priori. Moreover, experience quantified as a priori parameter values can be weighted with empirical observations, using e.g.
the Beta (conjugate prior) and Dirichlet distributions. The Bayesian approach facilitates a seamless intermixing between expert
knowledge in the form of subjective probabilities, and objective observations.
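One way to picture this intermixing is a Dirichlet prior over class probabilities, where expert belief is expressed as pseudo-counts that are simply added to the observed label counts. The sketch below computes the posterior mean; the pseudo-counts and observations are invented for illustration and carry no empirical meaning.

```python
from collections import Counter

def dirichlet_posterior_mean(prior_pseudocounts, observed_labels):
    """Posterior mean of the class probabilities under a Dirichlet prior.

    Each class probability is (pseudo-count + observed count) divided by
    (sum of pseudo-counts + number of observations).
    """
    counts = Counter(observed_labels)
    unknown = set(counts) - set(prior_pseudocounts)
    if unknown:
        raise ValueError(f"labels missing from the prior: {sorted(unknown)}")
    total = sum(prior_pseudocounts.values()) + len(observed_labels)
    return {label: (alpha + counts[label]) / total
            for label, alpha in prior_pseudocounts.items()}

# Hypothetical expert belief (worth 10 pseudo-observations) and observed data.
prior = {"A": 4.0, "B": 1.0, "AB": 0.5, "O": 4.5}
observed = ["O", "O", "A", "B", "O", "A"]
estimate = dirichlet_posterior_mean(prior, observed)
```

The size of the pseudo-counts controls how strongly the expert's belief is weighted: with few observations the estimate stays near the prior, and with many observations the data dominate.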
Uses
Within medical science, pattern recognition is the basis for computer-aided diagnosis
(CAD) systems. CAD describes a procedure that supports the doctor's interpretations
and findings. Other typical applications of pattern recognition techniques are automatic
speech recognition, classification of text into several categories (e.g., spam/non-spam
email messages), the automatic recognition of handwritten postal codes on postal
envelopes, automatic recognition of images of human faces, or handwriting image
extraction from medical forms.[9] The last two examples form the subtopic image
analysis of pattern recognition that deals with digital images as input to pattern
recognition systems.[10][11]
[Figure: The face was automatically detected by special software.]
Artificial neural networks (neural net classifiers) and deep learning have many real-world applications in image processing, a few
examples:
identification and authentication: e.g., license plate recognition,[12] fingerprint analysis, face detection/verification,[13]
and voice-based authentication;[14]
medical diagnosis: e.g., screening for cervical cancer (Papnet),[15] breast tumors or heart sounds;
defence: various navigation and guidance systems, target recognition systems, shape recognition technology etc.
For a discussion of the aforementioned applications of neural networks in image processing, see e.g.[16]
In psychology, pattern recognition (making sense of and identifying objects) is closely related to perception, which explains how the
sensory inputs humans receive are made meaningful. Pattern recognition can be thought of in two different ways: the first being
template matching and the second being feature detection. A template is a pattern used to produce items of the same proportions. The
template-matching hypothesis suggests that incoming stimuli are compared with templates in the long term memory. If there is a
match, the stimulus is identified. Feature detection models, such as the Pandemonium system for classifying letters (Selfridge, 1959),
suggest that the stimuli are broken down into their component parts for identification. For example, a capital E has three horizontal
lines and one vertical line.[17]
Algorithms
Algorithms for pattern recognition depend on the type of label output, on whether learning is supervised or unsupervised, and on
whether the algorithm is statistical or non-statistical in nature. Statistical algorithms can further be categorized as generative or
discriminative.
Real-valued sequence labeling algorithms (predicting sequences of real-valued labels)
Kalman filters
Particle filters
See also
Adaptive resonance theory
Black box
Cache language model
Compound term processing
Computer-aided diagnosis
Data mining
Deep Learning
List of numerical analysis software
List of numerical libraries
Machine learning
Multilinear subspace learning
Neocognitron
Perception
Perceptual learning
Predictive analytics
Prior knowledge for pattern recognition
Sequence mining
Template matching
Contextual image classification
List of datasets for machine learning research
References
This article is based on material taken from the Free On-line Dictionary of Computing prior to 1 November 2008 and incorporated under the "relicensing"
terms of the GFDL, version 1.3 or later.
Further reading
Fukunaga, Keinosuke (1990). Introduction to Statistical Pattern Recognition (2nd ed.). Boston: Academic Press.
ISBN 0-12-269851-7.
Hornegger, Joachim; Paulus, Dietrich W. R. (1999). Applied Pattern Recognition: A Practical Introduction to Image
and Speech Processing in C++ (2nd ed.). San Francisco: Morgan Kaufmann Publishers. ISBN 3-528-15558-2.
Schuermann, Juergen (1996). Pattern Classification: A Unified View of Statistical and Neural Approaches. New York:
Wiley. ISBN 0-471-13534-8.
Godfried T. Toussaint, ed. (1988). Computational Morphology. Amsterdam: North-Holland Publishing Company.
Kulikowski, Casimir A.; Weiss, Sholom M. (1991). Computer Systems That Learn: Classification and Prediction
Methods from Statistics, Neural Nets, Machine Learning, and Expert Systems. Machine Learning. San Francisco:
Morgan Kaufmann Publishers. ISBN 1-55860-065-5.
Jain, Anil K.; Duin, Robert P. W.; Mao, Jianchang (2000). "Statistical pattern recognition: a review". IEEE Transactions
on Pattern Analysis and Machine Intelligence. 22 (1): 4–37. doi:10.1109/34.824819.
An introductory tutorial to classifiers (introducing the basic terms, with numeric example)
External links
The International Association for Pattern Recognition
List of Pattern Recognition web sites
Journal of Pattern Recognition Research
Pattern Recognition Info
Pattern Recognition (Journal of the Pattern Recognition Society)
International Journal of Pattern Recognition and Artificial Intelligence
International Journal of Applied Pattern Recognition
Open Pattern Recognition Project, intended to be an open source platform for sharing algorithms of pattern
recognition
Improved Fast Pattern Matching