Pattern Recognition

1.0. INTRODUCTION

It is generally easy for a person to differentiate the sound of a human voice from that of a violin, a handwritten numeral "3" from an "8", and the aroma of a rose from that of an onion. However, it is difficult for a programmable computer to solve these kinds of perceptual problems. These problems are difficult because each pattern usually contains a large amount of information, and the recognition problems typically have an inconspicuous, high-dimensional structure.

Inside this Chapter: Introduction; What is Pattern Recognition?; Basics of Pattern Recognition; An Example; Design Principles of Pattern Recognition System; Pattern Recognition Systems Approaches; Learning and Adaptation.

1.1. WHAT IS PATTERN RECOGNITION?

Pattern recognition is the science of making inferences from perceptual data, using tools from statistics, probability, computational geometry, machine learning, signal processing, and algorithm design. Thus, it is of central importance to artificial intelligence and computer vision, and it has far-reaching applications in engineering, science, medicine, and business. In particular, advances made during the last half century now allow computers to interact more effectively with humans and the natural world (e.g., speech recognition software). Given the many practical problems, such as automated speech recognition, fingerprint identification, optical character recognition, and DNA sequence identification, it is clear that reliable, accurate pattern recognition by machine would be immensely useful, and it is natural that we should seek to design and build machines that perform it. Moreover, in solving the many problems required to build such systems, we gain a deeper understanding of and appreciation for pattern recognition. For some problems, such as speech and visual recognition, our design efforts may in fact be influenced by knowledge of how these are solved in nature, both in the algorithms we employ and in the design of special-purpose hardware [2].

1.2. BASICS OF PATTERN RECOGNITION

A feature can be defined as any distinctive aspect, quality, or characteristic; features may be symbolic (e.g., color) or numeric (e.g., height). The combination of d features is represented as a d-dimensional column vector called a feature vector. The d-dimensional space defined by the feature vector is called the feature space, and objects are represented as points in feature space; such a representation is called a scatter plot [3].

A pattern is defined as a composite of features that are characteristic of an individual. In classification, a pattern is a pair of variables {x, w}, where x is a collection of observations or features (the feature vector) and w is the concept behind the observation (the label). The quality of a feature vector is related to its ability to discriminate examples from different classes (Fig. 1.1).
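As a minimal sketch of this representation (all measurement values invented for illustration), each object is reduced to a point in a d-dimensional feature space, and closeness of points can be measured with a Euclidean distance:

```python
import math

# A minimal sketch (all values illustrative): each object is reduced to a
# d-dimensional feature vector; a collection of such vectors is a scatter of
# points in feature space.
def make_feature_vector(lightness, width):
    """Combine d = 2 measured features into one feature vector."""
    return (lightness, width)

samples = [
    make_feature_vector(4.1, 10.2),
    make_feature_vector(6.0, 14.5),
    make_feature_vector(5.2, 12.8),
]

d = len(samples[0])  # dimensionality of the feature space

def distance(a, b):
    """Euclidean distance between two points in feature space."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
```

Good features place same-class points close together under such a distance, which is exactly the property discussed next.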
Examples from the same class should have similar feature values, while examples from different classes should have different feature values.

[Fig. 1.1. Characteristics of features: a. the distinction between good and poor features; b. feature properties (linear separability, non-linear separability, multi-modal, highly correlated).]

A pattern recognition system assigns objects to classes by observing patterns of distinguishing characteristics and comparing them to a model member of each class. Pattern recognition thus involves the extraction of patterns from data, their analysis, and, finally, the identification of the category (class) to which each pattern belongs. A typical pattern recognition system contains a sensor, a preprocessing mechanism (segmentation), a feature extraction mechanism (manual or automated), a classification or description algorithm, and a set of examples (training set) already classified or described, together with post-processing (Fig. 1.3).

[Fig. 1.2. Classifier and decision boundaries.]

[Fig. 1.3. A pattern recognition system: the real world is measured by a sensor, preprocessed, and passed to feature extraction; classification produces a class assignment, clustering a cluster assignment, and regression predicted variables, with a feedback/adaptation loop.]

1.3. AN EXAMPLE

To illustrate the complexity of some of the types of problems involved, let us consider the following example. Suppose that a fish-packing plant wants to automate the process of sorting incoming fish on a conveyor belt according to species. As a pilot project, it is decided to try to separate sea bass from salmon using optical sensing (Fig. 1.4).

[Fig. 1.4. The objects to be classified: a. sea bass; b. salmon.]

We set up a camera (see Fig.
1.5), take some sample images, and begin to note some physical differences between the two types of fish (length, lightness, width, number and shape of fins, position of the mouth, and so on); these suggest features to explore for use in our classifier. We also notice noise or variations in the images: variations in lighting, variations in the position of the fish on the conveyor, even static due to the electronics of the camera itself.

[Fig. 1.5. The fish-packing system: a camera mounted above the conveyor belt.]

Given that there truly are differences between the population of sea bass and that of salmon, we view them as having different models, different descriptions, which are typically mathematical in form. The goal and approach in pattern classification is to hypothesize the class of these models, process the sensed data to eliminate noise, and for any sensed pattern choose the model that corresponds best. In our prototype system, first the camera captures an image of the fish (Fig. 1.5). Next, the camera's signals are preprocessed to simplify subsequent operations without losing relevant information (Fig. 1.3). In particular, we might use a segmentation operation in which the images of different fish are somehow isolated from one another and from the background. The information from a single fish is then sent to a feature extractor, whose purpose is to reduce the data by measuring certain features or properties. These features are then passed to a classifier that evaluates the evidence presented and makes a final decision as to the species.

[Fig. 1.6. Histograms for the length feature for the two categories. The value marked l* will lead to the smallest number of errors.]

Suppose somebody at the fish plant tells us that a sea bass is generally longer than a salmon. This gives us a tentative model for the fish: sea bass have some typical length, and this is greater than that for salmon. Then length becomes an obvious feature, and we might attempt to classify the fish merely by seeing whether or not the length l of a fish exceeds some critical value l*.
To choose l*, we could obtain some design or training samples of the different types of fish, make length measurements, and inspect the results. Suppose that we do this and obtain the histograms shown in Fig. 1.6. These histograms bear out the statement that sea bass are somewhat longer than salmon on average, but it is clear that this single criterion is quite poor; no matter how we choose l*, we cannot reliably separate sea bass from salmon by length alone.

Thus, we try another feature, namely the average lightness of the fish scales. Now we are very careful to eliminate variations in illumination, because they can obscure the models and corrupt our new classifier. The resulting histograms, shown in Fig. 1.7, are much more satisfactory: the classes are much better separated.

So far we have assumed that the consequences of our actions are equally costly: deciding the fish was a sea bass when in fact it was a salmon was just as undesirable as the converse. Such symmetry in the cost is often, but not invariably, the case. For instance, as a fish-packing company we may know that our customers easily accept occasional pieces of tasty salmon in their cans labeled "sea bass," but they object vigorously if a piece of sea bass appears in their cans labeled "salmon." If we want to stay in business, we should adjust our decisions to avoid antagonizing our customers, even if it means that more salmon makes its way into the cans of sea bass. In this case, then, we should move our decision boundary to smaller values of lightness, thereby reducing the number of sea bass that are classified as salmon (Fig. 1.7). The more our customers object to getting sea bass with their salmon (i.e., the more costly this type of error), the lower we should set the decision threshold x* in Fig. 1.7. Such considerations suggest that there is an overall single cost associated with our decision, and our true task is to make a decision rule (i.e., set a decision boundary) so as to minimize such a cost.
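The cost-sensitive threshold choice just described can be sketched in a few lines. The lightness values and the two cost constants below are invented, and the rule "call it sea bass if lightness exceeds x*" is an assumption of this sketch, not data from the text:

```python
# Hedged sketch: pick a decision threshold x* that minimizes total cost
# rather than raw error count, mirroring the asymmetric-cost discussion.
# All lightness values and costs are illustrative.
salmon = [3.1, 3.8, 4.2, 4.9, 5.6]
bass   = [5.2, 6.0, 6.4, 7.1, 7.8]   # assumed: sea bass tend to be lighter (higher values)

COST_BASS_AS_SALMON = 5.0   # customers object strongly to this error
COST_SALMON_AS_BASS = 1.0   # this error is easily tolerated

def total_cost(x_star):
    # Rule (assumed): classify as sea bass when lightness > x_star.
    salmon_as_bass = sum(1 for v in salmon if v > x_star)
    bass_as_salmon = sum(1 for v in bass if v <= x_star)
    return (COST_SALMON_AS_BASS * salmon_as_bass
            + COST_BASS_AS_SALMON * bass_as_salmon)

candidates = [3.0 + 0.05 * k for k in range(101)]   # grid over 3.0 .. 8.0
best = min(candidates, key=total_cost)
```

Because mislabeling a bass as salmon costs five times as much here, the minimizing threshold sits below the class overlap, exactly the "move the boundary to smaller lightness" behavior the text describes.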
This is the central task of decision theory, of which pattern classification is perhaps the most important subfield.

[Fig. 1.7. Histograms for the lightness feature for the two categories. The value marked x* will lead to the smallest number of errors.]

Even if we know the costs associated with our decisions and choose the optimal decision threshold x*, we may be dissatisfied with the resulting performance. Our first impulse might be to seek yet a different feature on which to separate the fish. Let us assume, however, that no other single visual feature yields better performance than that based on lightness. To improve recognition, then, we must resort to the use of more than one feature at a time.

In our search for other features, we might try to capitalize on the observation that sea bass are typically wider than salmon. Now we have two features for classifying fish: the lightness x1 and the width x2. We realize that the feature extractor has thus reduced the image of each fish to a point or feature vector x in a two-dimensional feature space, where

    x = (x1, x2)^T    (1.1)

Our problem now is to partition the feature space into two regions, where for all points in one region we will call the fish a sea bass, and for all points in the other we will call it a salmon. Suppose that we measure the feature vectors for our samples and obtain the scattering of points shown in Fig. 1.8. This plot suggests the following rule for separating the fish: classify the fish as a sea bass if its feature vector falls above the decision boundary shown, and as a salmon otherwise.

This rule appears to do a good job of separating our samples and suggests that perhaps incorporating yet more features would be desirable. Besides the lightness and width of the fish, we might include some shape parameter, such as the vertex angle of the dorsal fin, or the placement of the eyes, and so on. How do we know beforehand which of these features will work best?
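The straight-line decision rule just described can be sketched as follows; the boundary coefficients are invented for illustration, not fitted to any data:

```python
# Sketch of the two-feature rule: each fish is reduced to x = (x1, x2)
# (lightness, width), and a straight line w1*x1 + w2*x2 + b = 0 partitions
# the feature space. Coefficients below are illustrative, not fitted.
W1, W2, B = 1.0, 0.8, -16.0

def classify(x1, x2):
    """'sea bass' if the point lies on the positive side of the line, else 'salmon'."""
    return "sea bass" if W1 * x1 + W2 * x2 + B > 0 else "salmon"
```

A dark, thin fish such as (5.0, 10.0) lands on the salmon side, while a light, wide one such as (9.0, 12.0) lands on the sea-bass side.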
Some features might be redundant. For instance, if the eye color of all fish correlated perfectly with width, then classification performance need not be improved if we also include eye color as a feature. Suppose that other features are too expensive to measure, or provide little improvement over the approach described above, and that we are forced to make our decision based on the two features.

If our models were extremely complicated, our classifier would have a decision boundary more complex than the simple straight line. In that case, all the training patterns would be separated perfectly, as shown in Fig. 1.9. With such a solution, though, our satisfaction would be premature, because the central aim of designing a classifier is to suggest actions when presented with new patterns, that is, fish not yet seen. This is the issue of generalization. It is unlikely that the complex decision boundary in Fig. 1.9 would provide good generalization; it seems to be tuned to the particular training samples rather than to some underlying characteristics or true model of all the sea bass and salmon that will have to be separated.

Naturally, one approach would be to get more training samples in order to obtain a better estimate of the true underlying characteristics, for instance the probability distributions of the categories. In some pattern recognition problems, however, the amount of such data we can obtain easily is often quite limited. Even with a vast amount of training data in a continuous feature space, though, if we followed the approach in Fig. 1.9 our classifier would give a horrendously complicated decision boundary, one that would be unlikely to do well on novel patterns.

[Fig. 1.8. The two features of lightness and width for sea bass and salmon.]

Rather, then, we might seek to simplify the recognizer, motivated by a belief that the underlying models
will not require a decision boundary that complex. We might then be satisfied with the slightly poorer performance on the training samples if it means that our classifier will have better performance on new patterns.

[Fig. 1.9. The two features of lightness and width for sea bass and salmon, separated by an overly complex decision boundary tuned to the training samples.]

This makes it quite clear that our decisions are fundamentally task- or cost-specific, and that creating a single general-purpose artificial pattern recognition device, that is, one capable of acting accurately on a wide variety of tasks, is a profoundly difficult challenge. This should give us added appreciation of the ability of humans to switch rapidly and fluidly between pattern recognition tasks.

It was necessary in our fish example to choose our features carefully, and hence achieve a representation (as in Fig. 1.10) that enabled reasonably successful pattern classification. A central aspect in virtually every pattern recognition problem is that of achieving such a "good" representation, one in which the structural relationships among the components are simply and naturally revealed, and one in which the true (unknown) model of the patterns can be expressed. In some cases, patterns should be represented as vectors of real-valued numbers; in others, as ordered lists of attributes; in yet others, as descriptions of parts and their relations; and so forth. We seek a representation in which the patterns that lead to the same action are somehow close to one another, yet far from those that demand a different action. The extent to which we create or learn a proper representation, and how we quantify near and far apart, will determine the success of our pattern classifier. A number of additional characteristics are desirable for the representation. We might wish to favor a small number of features, which might lead to simpler decision regions and a classifier that is easier to train.
We might also wish to have features that are robust, that is, relatively insensitive to noise or other errors. In practical applications, we may need the classifier to act quickly, or to use few electronic components, little memory, or few processing steps.

[Fig. 1.10. The decision boundary shown might represent the optimal tradeoff between performance on the training set and simplicity of the classifier, thereby giving the highest accuracy on new patterns.]

1.4. DESIGN PRINCIPLES OF PATTERN RECOGNITION SYSTEM

There are two fundamental approaches for implementing a pattern recognition system: statistical and structural. Each approach employs different techniques to implement the description and classification tasks. Hybrid approaches, sometimes referred to as a unified approach to pattern recognition, combine both statistical and structural techniques within a single pattern recognition system.

Statistical pattern recognition draws from established concepts in statistical decision theory to discriminate among data from different groups based upon quantitative features of the data. There is a wide variety of statistical techniques that can be used within the description task for feature extraction, ranging from simple descriptive statistics to complex transformations. Examples of statistical feature extraction techniques include mean and standard deviation computations, frequency count summarizations, Karhunen-Loève transformations, Fourier transformations, wavelet transformations, and Hough transformations. The quantitative features extracted from each object for statistical pattern recognition are organized into a fixed-length feature vector, where the meaning associated with each feature is determined by its position within the vector (i.e., the first feature describes a particular characteristic of the data, the second feature describes another characteristic, and so on).
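The fixed-length description above can be sketched with the standard library's statistics module; the particular summaries chosen here (mean, standard deviation, range) are an illustrative selection, not a prescribed set:

```python
import statistics

# Sketch of statistical feature extraction: raw measurements are summarized
# into a FIXED-length vector whose positions carry the meaning
# (position 0 = mean, position 1 = population std. deviation, position 2 = range).
def extract_features(signal):
    return (statistics.mean(signal),
            statistics.pstdev(signal),
            max(signal) - min(signal))

v = extract_features([2.0, 4.0, 6.0, 8.0])
```

Every object yields a vector of exactly three numbers, so any position-based (statistical) classifier can consume it directly.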
The collection of feature vectors generated by the description task is passed to the classification task. Statistical techniques used as classifiers within the classification task include those based on similarity (e.g., template matching, k-nearest neighbor), probability (e.g., Bayes rule), boundaries (e.g., decision trees, neural networks), and clustering (e.g., k-means, hierarchical clustering).

The quantitative nature of statistical pattern recognition makes it difficult to discriminate (observe a difference) among groups based on the morphological (i.e., shape-based or structural) subpatterns and their interrelationships embedded within the data. This limitation provided the impetus for the development of a structural approach to pattern recognition that is supported by psychological evidence pertaining to the functioning of human perception and cognition. Object recognition in humans has been demonstrated to involve mental representations of explicit, structure-oriented characteristics of objects, and human classification decisions have been shown to be made on the basis of the degree of similarity between the extracted features and those of a prototype developed for each group. For instance, the recognition-by-components theory explains the process of pattern recognition in humans: (1) the object is segmented into separate regions according to edges defined by differences in surface characteristics (e.g., luminance, texture, and color); (2) each segmented region is approximated by a simple geometric shape; and (3) the object is identified based upon the similarity in composition between the geometric representation of the object and the central tendency of each group. This theorized functioning of human perception and cognition serves as the foundation for the structural approach to pattern recognition.

Structural pattern recognition, sometimes referred to as syntactic pattern recognition due to its origins in formal language theory, relies on syntactic grammars to discriminate among data from different groups based upon the morphological interrelationships of the subpatterns (or building blocks) embedded within the data.
The semantics associated with each feature are determined by the coding scheme (i.e., the selection of morphologies) used to identify primitives in the data. Feature vectors generated by structural pattern recognition contain a variable number of features (one for each primitive extracted from the data) in order to accommodate the presence of superfluous structures which have no impact on classification. Since the interrelationships among the extracted primitives must also be encoded, the feature vector must either include additional features describing the relationships among primitives or take an alternate form, such as a relational graph, that can be parsed by a syntactic grammar.

The emphasis on relationships within data makes a structural approach to pattern recognition most sensible for data which contain an inherent, identifiable organization, such as image data (which is organized by location within a visual rendering) and time-series data (which is organized by time); data composed of independent samples of quantitative measurements lack such an ordering and require a statistical approach. Methodologies used to extract structural features from image data, such as morphological image processing techniques, result in primitives such as edges, curves, and regions; feature extraction techniques for time-series data include piecewise-linear regression and curve fitting, which are used to generate primitives that encode sequential, time-ordered relationships.

[Fig. 1.11. Statistical and structural approaches to pattern recognition applied to a common identification problem: discriminating between a square and a triangle. The statistical approach extracts quantitative features (square: 2 horizontal, 2 vertical, 0 diagonal segments; triangle: 1 horizontal, 0 vertical, 2 diagonal segments) which are passed to a decision-theoretic classifier; the structural approach extracts morphological primitives (line segments) and their interrelationships, encoding them in relational graphs classified by parsing with syntactic grammars.]
The classification task arrives at an identification using parsing: the extracted features are identified as being representative of a particular group if they can be successfully parsed by a syntactic grammar. When discriminating among more than two groups, a syntactic grammar is necessary for each group, and the classifier must be extended with an adjudication scheme so as to resolve multiple successful parsings.

Figure 1.11 demonstrates how both approaches can be applied to the same data: the goal is to differentiate between the square and the triangle. The statistical approach extracts quantitative features, such as the numbers of horizontal, vertical, and diagonal segments, which are then passed to a decision-theoretic classifier. The structural approach extracts morphological features and their interrelationships within each figure: using a straight line segment as the elemental morphology, a relational graph is generated and classified by determining the syntactic grammar that can successfully parse it. In this example, both the statistical and the structural approach would be able to accurately distinguish between the two geometries. In more complex data, however, discriminability is directly influenced by the particular approach employed, because the features extracted represent different characteristics of the data. A summary of the differences between statistical and structural approaches to pattern recognition is shown in Table 1.1.
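The statistical side of the square-versus-triangle example in Fig. 1.11 can be sketched by counting segment types into a fixed-length feature tuple and applying a trivial decision rule; the segment encodings and the rule below are illustrative assumptions:

```python
# Illustrative encodings of the two figures as lists of segment primitives.
square   = ["horizontal", "horizontal", "vertical", "vertical"]
triangle = ["horizontal", "diagonal", "diagonal"]

def segment_counts(figure):
    """Fixed-length feature tuple: (horizontal, vertical, diagonal) counts."""
    return tuple(figure.count(kind) for kind in ("horizontal", "vertical", "diagonal"))

def classify(figure):
    # Assumed decision rule: a square has no diagonal segments, a triangle does.
    return "triangle" if segment_counts(figure)[2] > 0 else "square"
```

The structural approach would instead record how the segments connect to one another, which this quantitative summary deliberately ignores.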
The essential dissimilarities are two-fold: (1) the description generated by the statistical approach is quantitative, while the structural approach produces a description composed of subpatterns or building blocks; and (2) the statistical approach discriminates based upon numeric differences among features from different groups, while grammars are used by the structural approach to define a language encompassing the acceptable configurations of primitives for each group.

Hybrid systems can combine the two approaches as a way to compensate for the drawbacks of each approach, while conserving the advantages of each. As a single-level system, structural features can be used with either a statistical or a structural classifier. Statistical features cannot be used with a structural classifier because they lack relational information; however, statistical information can be associated with structural primitives and used to resolve ambiguities during classification (e.g., when parsing with attributed grammars) or embedded directly in the classifier itself (e.g., when parsing with stochastic grammars). Hybrid systems can also combine the two approaches into a multilevel system using a parallel or a hierarchical arrangement.

Table 1.1. A summary of the differences between statistical and structural approaches to pattern recognition. Due to their divergent theoretical foundations, the two approaches focus on different data characteristics and employ distinctive techniques to implement both the description and classification tasks.

                 Statistical                        Structural
Foundation       Statistical decision theory        Human perception and cognition
Description      Quantitative features              Morphological primitives
                 Fixed number of features           Variable number of primitives
                 Ignores feature relationships      Captures primitive relationships
                 Semantics from feature position    Semantics from primitive encoding
Classification   Statistical classifiers            Parsing with syntactic grammars
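As one concrete instance of the similarity-based statistical classifiers mentioned above (k-nearest neighbor), on invented two-feature points:

```python
import math

# Sketch of a k-nearest-neighbor classifier: a new feature vector takes the
# majority label of its k closest training vectors. Points are illustrative.
train = [((1.0, 1.0), "A"), ((1.2, 0.8), "A"), ((0.9, 1.1), "A"),
         ((4.0, 4.2), "B"), ((4.1, 3.9), "B"), ((3.8, 4.0), "B")]

def knn(x, k=3):
    nearest = sorted(train, key=lambda pair: math.dist(pair[0], x))[:k]
    votes = [label for _, label in nearest]
    return max(set(votes), key=votes.count)   # majority label among the k
```

Note that the rule relies only on numeric distances between fixed-length vectors, the defining trait of the statistical column in Table 1.1.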
1.5. PATTERN RECOGNITION SYSTEMS APPROACHES

Many pattern recognition systems can be partitioned into the five phases shown in Fig. 1.12. Phase 1, sensing, converts images, sounds, or other physical inputs into signal data. Phase 2, segmentation, isolates sensed objects from the background or from other objects. Phase 3, feature extraction, measures object properties that are useful for classification. Phase 4, classification, uses these features to assign the sensed object to a category. The final phase 5, post-processing, can take account of other considerations, such as the effects of context and the costs of errors, in order to decide on the appropriate action.

[Fig. 1.12. The five phases of a pattern recognition system: input, sensing, segmentation, feature extraction, classification, post-processing, decision.]

In describing our hypothetical fish classification system, we distinguished between the three different operations of preprocessing, feature extraction, and classification (see Fig. 1.3). To understand the problem of designing such a system, we must understand the problems that each of these components must solve.

1.5.1. SENSING

The input to a pattern recognition system is often some kind of transducer, such as a camera or a microphone array. The difficulty of the problem may well depend on the characteristics and limitations of the transducer: its bandwidth, resolution, sensitivity, distortion, signal-to-noise ratio, latency, and so on.

1.5.2. SEGMENTATION AND GROUPING

In our fish example, we assumed that each fish was isolated, separate from the others on the conveyor belt, and could easily be distinguished from the conveyor belt itself. In practice, the fish would often be overlapping, and our system would have to determine where one fish ends and the next begins; the individual patterns have to be segmented.
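A toy version of the segmentation phase, under the strong simplifying assumption that background readings in a one-dimensional sensor signal sit below a known threshold (signal values are invented):

```python
# Hedged sketch of segmentation: isolate individual objects from background
# in a 1-D sensor signal. Background is assumed to be below a fixed threshold.
signal = [0.1, 0.2, 3.0, 3.4, 3.1, 0.1, 0.0, 4.0, 4.2, 0.2]
BACKGROUND = 1.0   # assumed background level

def segment(signal, threshold=BACKGROUND):
    """Return runs of consecutive above-threshold readings (one run per object)."""
    objects, current = [], []
    for s in signal:
        if s > threshold:
            current.append(s)
        elif current:
            objects.append(current)
            current = []
    if current:
        objects.append(current)
    return objects

objects = segment(signal)   # two object-like runs in the illustrative signal
```

The hard cases the text raises (overlapping fish, unknown background) are precisely where this fixed-threshold assumption fails, which is why segmentation is so deep a problem.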
If we had already recognized the fish, it would be easier to segment their images. But how can we segment the images before they have been categorized, or categorize them before they have been segmented? It seems we need a way to know when we have switched from one model to another, or to know when we just have background or no category. How can this be done? Segmentation is one of the deepest problems in pattern recognition. Closely related to the problem of segmentation is the problem of recognizing or grouping together the various parts of a composite object.

1.5.3. FEATURE EXTRACTION

The conceptual boundary between feature extraction and classification proper is somewhat arbitrary: an ideal feature extractor would yield a representation that makes the job of the classifier trivial; conversely, an omnipotent classifier would not need the help of a sophisticated feature extractor. The distinction is forced upon us for practical rather than theoretical reasons.

The traditional goal of the feature extractor is to characterize an object to be recognized by measurements whose values are very similar for objects in the same category and very different for objects in different categories. This leads to the goal of seeking distinguishing features that are invariant to irrelevant transformations of the input. In our fish example, the absolute location of a fish on the conveyor belt is irrelevant to the category, and thus our representation should be insensitive to the absolute location of the fish. Ideally, in this case we want the features to be invariant to translation, whether horizontal or vertical. Because rotation is also irrelevant for classification, we would also like the features to be invariant to rotation. Finally, the size of the fish may not be important: a young, small salmon is still a salmon. Thus, we may also want the features to be invariant to scale.
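Translation and scale invariance, as just discussed, can be sketched by centering a set of points and normalizing by an RMS radius; the scalar shape summary chosen below is an arbitrary illustration, and rotation invariance would require additional machinery:

```python
import math

# Hedged sketch: centering removes translation, and dividing by the RMS radius
# removes overall scale. The final scalar (farthest normalized radius) is just
# one illustrative summary of the normalized shape.
def invariant_feature(points):
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    centered = [(x - cx, y - cy) for x, y in points]          # translation removed
    scale = math.sqrt(sum(x * x + y * y for x, y in centered) / n)
    return max(math.hypot(x / scale, y / scale) for x, y in centered)

shape   = [(0.0, 0.0), (2.0, 0.0), (2.0, 1.0), (0.0, 1.0)]
shifted = [(x + 5.0, y - 3.0) for x, y in shape]   # translated copy
doubled = [(2.0 * x, 2.0 * y) for x, y in shape]   # uniformly rescaled copy
```

Sliding the shape along the conveyor or photographing a larger specimen leaves the feature unchanged, which is exactly the invariance property we asked of it.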
In general, features describing properties such as shape, color, and many kinds of texture are invariant to translation, rotation, and scale. The problem of finding rotation-invariant features from an overhead image of a fish on a conveyor belt is simplified by the fact that the fish is likely to be lying flat, and the axis of rotation is always parallel to the camera's line of sight. A more general invariance would be for rotations about an arbitrary line in three dimensions. The image of even such a simple object as a coffee cup undergoes radical variation as the cup is rotated to an arbitrary angle: the handle may become occluded, that is, hidden by another part; the bottom of the inside volume may come into view; the circular lip may appear oval, or as a straight line, or may even be obscured; and so forth. Furthermore, if the distance between the cup and the camera can change, the image is subject to projective distortion. How might we ensure that the features are invariant to such complex transformations? Or, on the other hand, should we define different subcategories for the image of a cup and achieve the rotation invariance at a higher level of processing?

As with segmentation, the task of feature extraction is much more problem- and domain-dependent than is classification proper, and it thus requires knowledge of the domain. A good feature extractor for sorting fish would probably be of little use for identifying fingerprints or classifying photomicrographs of blood cells. However, some of the principles of pattern classification can be used in the design of the feature extractor.

1.5.4. CLASSIFICATION

The task of the classifier component proper of a full system is to use the feature vector provided by the feature extractor to assign the object to a category.
Because perfect classification performance is often impossible, a more general task is to determine the probability of each of the possible categories. The abstraction provided by the feature-vector representation of the input data enables the development of a largely domain-independent theory of classification. The degree of difficulty of the classification problem depends on the variability of the feature values for objects in the same category relative to the difference between feature values for objects in different categories. The variability of feature values for objects in the same category may be due to the complexity of the problem or may be due to noise.

One problem that arises in practice is that it may not always be possible to determine the values of all of the features for a particular input. How should the classifier make the best decision using only the features present? The naive method of merely assuming that the value of a missing feature is zero, or the average of the values for the patterns already seen, is provably nonoptimal. Likewise, how should we train a classifier, or use one, when some features are missing?

1.5.5. POST PROCESSING

A classifier rarely exists in a vacuum. Instead, it is generally used to recommend actions (put this fish in this bucket, put that fish in that bucket), each action having an associated cost. The post-processor uses the output of the classifier to decide on the recommended action.

Conceptually, the simplest measure of classifier performance is the classification error rate: the percentage of new patterns that are assigned to the wrong category. Thus, it is common to seek minimum-error-rate classification. However, it may be much better to recommend actions that minimize the total expected cost, which is called the risk. Such considerations also raise the question of whether our classification problem is simply too hard overall.

Design Principles for Pattern Recognition Systems

There is a sequence of activities for designing a pattern recognition system. The activities are given below:
1. Data collection
2. Feature choice
3. Model choice
4. Training
5. Evaluation

[Fig. 1.13. The design cycle for a pattern recognition system: data collection, feature choice, model choice, training the classifier, and evaluation, with feedback among the steps and eventual termination.]
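The design cycle of Fig. 1.13 can be caricatured as a loop over feature and model choices; every component below (the candidate feature sets, the two model names, and the scoring rule) is an invented stand-in, not a real training procedure:

```python
# Hedged sketch of the design cycle: iterate feature choice, model choice,
# training, and evaluation, keeping the best configuration found.
def train(features, model):
    # Stand-in for a real training step: just record the configuration.
    return {"features": features, "model": model}

def evaluate(classifier):
    # Toy score: pretend richer feature sets and the "quadratic" model do better.
    return (0.4 + 0.2 * len(classifier["features"])
            + (0.1 if classifier["model"] == "quadratic" else 0.0))

feature_sets = [["length"], ["length", "lightness"], ["length", "lightness", "width"]]
models = ["linear", "quadratic"]

best_score, best = 0.0, None
for features in feature_sets:                 # feature choice
    for model in models:                      # model choice
        classifier = train(features, model)   # training
        score = evaluate(classifier)          # evaluation
        if score > best_score:
            best_score, best = score, classifier
```

In a real system the evaluation step would use held-out test data, and a poor score would send the designer back to earlier stages, which is the feedback the figure depicts.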
Data collection, for both training and testing, can account for a large part of the cost of developing the system. Prior knowledge of the problem affects both the choice of distinguishing features and the choice of models for different categories. The training process uses some or all of the data to determine the system parameters. Evaluation may reveal problems that call for repetition of various steps in the process in order to obtain satisfactory results. We might also imagine that we could do better if we used multiple classifiers, each classifier operating on different aspects of the input.

1.6. LEARNING AND ADAPTATION

We take our ability to listen for granted. For instance, we are capable of listening to one person speak among several at a party: we subconsciously filter out the extraneous conversations and sound. This filtering ability is beyond the capabilities of today's speech recognition systems. Speech recognition is not speech understanding; understanding the meaning of words is a higher intellectual function. The fact that a computer can respond to a vocal command does not mean it understands the command spoken. Voice recognition systems will one day have the ability to distinguish linguistic nuances and the meaning of words, to "Do what I mean, not what I say!"

The roles of adaptation, learning and optimization are becoming increasingly essential and intertwined. The capability of a system to adapt, either through modification of its physiological structure or via some revalidation process of internal mechanisms that directly dictate the response or behavior, is crucial in many real-world applications. Optimization lies at the heart of most machine learning approaches, while learning and optimization are two primary means to effect adaptation in various forms.
They usually involve computational processes incorporated within the system that trigger parametric updating and knowledge or model enhancement, giving rise to progressive improvement. This book series serves as a channel to consolidate work related to topics linked to adaptation, learning and optimization in systems and structures. Topics covered under this series include:
* complex adaptive systems including evolutionary computation, memetic computing, swarm intelligence, neural networks, fuzzy systems, tabu search, simulated annealing, etc.
* machine learning, data mining and mathematical programming
* hybridization of techniques that span across artificial intelligence and computational intelligence for a synergistic alliance of strategies for problem-solving
* aspects of adaptation in robotics
* agent-based computing

1.6.1. What is Learning?

* "Learning denotes changes in a system that ... enable a system to do the same task more efficiently the next time." - Herbert Simon
* "Learning is constructing or modifying representations of what is being experienced." - Ryszard Michalski
* "Learning is making useful changes in our minds." - Marvin Minsky

1.6.2. Learning Comes in Several General Forms

1. Supervised learning: In supervised learning, a teacher provides a category label or cost for each pattern in a training set, and the learner seeks to reduce the sum of the costs for these patterns. How can we be sure that a particular learning algorithm can learn the solution to a given problem, and that it will be stable to parameter variations? How can we determine if it will converge in finite time?

2. Unsupervised learning: In unsupervised learning, or clustering, there is no explicit teacher, and the system forms clusters or "natural groupings" of the input patterns. "Natural" is defined explicitly or implicitly in the clustering system itself; given a particular set of patterns or cost function, different clustering algorithms lead to different clusters.
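The clustering idea in (2) can be sketched with a minimal k-means loop. The two-blob data set below is synthetic, and k-means is only one of the many clustering algorithms the text alludes to; different algorithms (or even different initializations) can produce different groupings:

```python
import numpy as np

def kmeans(X, k, iters=20, seed=0):
    """Minimal k-means sketch: group patterns with no teacher."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]  # init from the data
    for _ in range(iters):
        # assign each pattern to its nearest center
        labels = np.argmin(((X[:, None] - centers) ** 2).sum(-1), axis=1)
        # move each center to the mean of its assigned patterns
        centers = np.array([X[labels == j].mean(0) for j in range(k)])
    return labels, centers

# two well-separated synthetic blobs of 20 points each
rng = np.random.default_rng(3)
X = np.vstack([rng.normal(0.0, 0.1, (20, 2)),
               rng.normal(5.0, 0.1, (20, 2))])
labels, centers = kmeans(X, 2)
print(len(set(labels[:20].tolist())),
      len(set(labels[20:].tolist())))  # each blob gets one consistent label
```

With well-separated groups the "natural" clusters recovered by the algorithm coincide with the blobs, even though no category labels were ever provided.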
3. Reinforcement learning: The most typical way to train a classifier is to present an input, compute its tentative category label, and use the known target category label to improve the classifier. For instance, in optical character recognition, the input might be an image of a character, the actual output of the classifier the category label 'R', and the desired output a 'B'. In reinforcement learning no desired category signal is given; instead, the only teaching feedback is that the tentative category is right or wrong.

SHORT QUESTION ANSWER

Q.1. What is Pattern Recognition?
Ans. It is generally easy for a person to differentiate the sound of a human voice from that of a violin; a handwritten numeral "3" from an "8"; and the aroma of a rose from that of an onion. However, it is difficult for a programmable computer to solve these kinds of perceptual problems. These problems are difficult because each pattern usually contains a large amount of information, and the recognition problems typically have an inconspicuous, high-dimensional structure.

Pattern recognition is the science of making inferences from perceptual data, using tools from statistics, probability, computational geometry, machine learning, signal processing, and algorithm design. Thus, it is of central importance to artificial intelligence and computer vision, and has far-reaching applications in engineering, science, medicine, and business. In particular, advances made during the last half century now allow computers to interact more effectively with humans and the natural world (e.g., speech recognition software). However, the most important problems in pattern recognition are yet to be solved.

Q.2. Briefly Explain the Design Principles of a Pattern Recognition System.
Ans. There are two fundamental approaches for implementing a pattern recognition system: statistical and structural.
Each approach employs different techniques to implement the description and classification tasks. Hybrid approaches, sometimes referred to as a unified approach to pattern recognition, combine both statistical and structural techniques within a pattern recognition system.

Statistical pattern recognition draws from established concepts in statistical decision theory to discriminate among data from different groups based upon quantitative features of the data. There are a wide variety of statistical techniques that can be used within the description task for feature extraction, ranging from simple descriptive statistics to complex transformations.

Q.3. Explain the differences between statistical and structural approaches to pattern recognition.
Ans. Owing to their divergent theoretical foundations, the two approaches focus on different data characteristics and employ distinctive techniques to implement the description and classification tasks:

                 Statistical                        Structural
Foundation       Statistical decision theory        Formal language theory
Description      Quantitative features;             Morphological primitives;
                 fixed number of features           variable number of primitives;
                                                    primitives capture relationships
Classification   Statistical classifiers;           Parsing with syntactic grammars;
                 semantics from feature position    semantics from primitive encoding

Q.4. What is Sensing?
Ans. Sensing: The input to a pattern recognition system is often some kind of a transducer, such as a camera or a microphone array. The difficulty of the problem may well depend on the characteristics and limitations of the transducer: its bandwidth, resolution, sensitivity, distortion, signal-to-noise ratio, latency, etc.

Q.5. Briefly Explain Segmentation.
Ans. In our fish example, we assumed that each fish was isolated, separate from the others on the conveyor belt, and could easily be distinguished from the conveyor belt itself. In practice, the fish would often be overlapping, and our system would have to determine where one fish ends and the next begins: the individual patterns have to be segmented.
If we had already recognized the fish, then it would be easier to segment their images. But how can we segment the images before they have been categorized, or categorize them before they have been segmented? It seems we need a way to know when we have switched from one model to another, or to know when we just have background, or "no category." How can this be done? Segmentation is one of the deepest problems in pattern recognition. Closely related to the problem of segmentation is the problem of recognizing or grouping together the various parts of a composite object.

Q.6. What is Feature Extraction?
Ans. The conceptual boundary between feature extraction and classification proper is somewhat arbitrary: an ideal feature extractor would yield a representation that makes the job of the classifier trivial; conversely, an omnipotent classifier would not need the help of a sophisticated feature extractor. The distinction is forced upon us for practical, rather than theoretical, reasons. The traditional goal of the feature extractor is to characterize an object to be recognized by measurements whose values are very similar for objects in the same category, and very different for objects in different categories. This leads to the idea of seeking distinguishing features that are invariant to irrelevant transformations of the input. In our fish example, the absolute location of a fish on the conveyor belt is irrelevant to the category, and thus our representation should be insensitive to the absolute location of the fish. Ideally, in this case we want the features to be invariant to translation, whether horizontal or vertical. Because rotation is also irrelevant for classification, we would also like the features to be invariant to rotation. Finally, the size of the fish may not be important: a young, small salmon is still a salmon. Thus, we may also want the features to be invariant to scale.
In general, features that describe properties such as shape, color, and many kinds of texture are invariant to translation, rotation, and scale.

Q.7. What is Pattern Classification?
Ans. The task of the classifier component proper of a full system is to use the feature vector provided by the feature extractor to assign the object to a category. Because perfect classification performance is often impossible, a more general task is to determine the probability for each of the possible categories. The abstraction provided by the feature-vector representation of the input data enables the development of a largely domain-independent theory of classification. The degree of difficulty of the classification problem depends on the variability in the feature values for objects in the same category relative to the difference between feature values for objects in different categories.

Q.8. Explain Learning and Adaptation.
Ans. We take our ability to listen for granted. For instance, we are capable of listening to one person speak among several at a party: we subconsciously filter out the extraneous conversations and sound. This filtering ability is beyond the capabilities of today's speech recognition systems. Speech recognition is not speech understanding; understanding the meaning of words is a higher intellectual function. The fact that a computer can respond to a vocal command does not mean it understands the command spoken. Voice recognition systems will one day have the ability to distinguish linguistic nuances and the meaning of words, to "Do what I mean, not what I say!"

The roles of adaptation, learning and optimization are becoming increasingly essential and intertwined. The capability of a system to adapt, either through modification of its physiological structure or via some revalidation process of internal mechanisms that directly dictate the response or behavior, is crucial in many real-world applications. Optimization lies at the heart of most machine learning approaches, while learning and optimization are two primary means to effect adaptation in various forms.
They usually involve computational processes incorporated within the system that trigger parametric updating and knowledge or model enhancement, giving rise to progressive improvement. This book series serves as a channel to consolidate work related to topics linked to adaptation, learning and optimization in systems and structures. Topics covered under this series include:
* complex adaptive systems including evolutionary computation, memetic computing, swarm intelligence, neural networks, fuzzy systems, tabu search, simulated annealing, etc.
* machine learning, data mining and mathematical programming
* hybridization of techniques that span across artificial intelligence and computational intelligence for a synergistic alliance of strategies for problem-solving
* aspects of adaptation in robotics
* agent-based computing
* autonomic/pervasive computing
* dynamic optimization/learning in noisy and uncertain environments
* systemic alliance of stochastic and conventional search techniques
* all aspects of adaptation in man-machine systems

Q.9. What is Learning?
Ans.
* "Learning denotes changes in a system that ... enable a system to do the same task more efficiently the next time." - Herbert Simon
* "Learning is constructing or modifying representations of what is being experienced." - Ryszard Michalski
* "Learning is making useful changes in our minds." - Marvin Minsky

Q.10. Why do Machine Learning?
Ans.
* Understand and improve the efficiency of human learning. For example, use it to improve methods for teaching and tutoring people, as is done in CAI (computer-aided instruction).
* Discover new things or structure that is unknown to humans. Example: data mining.
* Fill in skeletal or incomplete specifications about a domain. Large, complex AI systems cannot be completely derived by hand and require dynamic updating to incorporate new information. Learning new characteristics expands the domain of expertise and lessens the "brittleness" of the system.

Q.11.
What are the Components of a Learning System?
Ans.

    Sensors --> Critic --> Learning Element <-----> Performance Element --> Effectors
                                  |                          ^
                                  v                          |
                           Problem Generator ----------------+

* Learning Element: makes changes to the system based on how it is doing.
* Performance Element: the agent itself that acts in the world.
* Critic: tells the Learning Element how it is doing (e.g., success or failure) by comparing with a fixed standard of performance.
* Problem Generator: suggests "problems" or actions that will generate new examples or experiences that will aid in training the system further.

We will concentrate on the Learning Element.

Q.12. What are several possible criteria for evaluating the performance of a learning algorithm?
Ans. Several possible criteria for evaluating a learning algorithm:
* Predictive accuracy of the classifier
* Speed of the learner
* Speed of the classifier
* Space requirements
The most common criterion is predictive accuracy.

Q.13. What are the Major Paradigms of Machine Learning?
Ans.
* Rote learning: a one-to-one mapping from inputs to a stored representation; association-based storage and retrieval. "Learning by memorization."
* Induction: use specific examples to reach general conclusions.
* Clustering
* Analogy: determine the correspondence between two different representations.
* Discovery: unsupervised; a specific goal is not given.
* Genetic algorithms
* Reinforcement: only feedback (a positive or negative reward) is given at the end of a sequence of steps. Requires assigning reward to steps by solving the credit assignment problem: which steps should receive credit or blame for a final result?

SUMMARY

1. Pattern recognition is the science of making inferences from perceptual data, using tools from statistics, probability, computational geometry, machine learning, signal processing and algorithm design.
2. A feature can be defined as any distinctive aspect, quality or characteristic, which may be symbolic (i.e., color) or numeric (i.e., height).
The combination of d features is represented as a d-dimensional column vector called a feature vector.
3. A pattern is defined as a composite of features that is characteristic of an individual. In classification, a pattern is a pair of variables {x, w}, where x is a collection of observations or features (the feature vector) and w is the concept behind the observation (the label).
4. There are two fundamental approaches for implementing a pattern recognition system: statistical and structural. Each approach employs different techniques to implement the description and classification tasks.
5. The quantitative nature of statistical pattern recognition makes it difficult to discriminate (observe a difference) among groups based on the morphological (i.e., shape-based or structural) subpatterns and their interrelationships embedded within the data.
6. Structural pattern recognition, sometimes referred to as syntactic pattern recognition due to its origins in formal language theory, relies on syntactic grammars to discriminate among data from different groups based upon the morphological interrelationships (or interconnections) embedded within the data.
7. The input to a pattern recognition system is often some kind of a transducer, such as a camera or a microphone array.
8. The traditional goal of the feature extractor is to characterize an object to be recognized by measurements whose values are very similar for objects in the same category, and very different for objects in different categories.
9. A more general invariance would be for rotations about an arbitrary line in three dimensions.

EXERCISE

Suppose you want to set up a subjective probability model to predict the number of inches that it will rain tomorrow, based on the weather forecast for tomorrow, which says that "the probability of rain tomorrow is 30 percent." Assuming that you believe the forecast, fill in your subjective probabilities in the following table:

    Amount in inches        Probability
    r = 0.0
