The document provides an overview of pattern recognition, covering its definition, applications, types, learning approaches, and system architecture. It also discusses Bayesian decision theory, parameter estimation methods, hidden Markov models, dimension reduction techniques, non-parametric density estimation, linear discriminant function classifiers, non-metric methods for classification, and unsupervised learning with clustering algorithms. Key concepts include classifiers, discriminant functions, maximum likelihood estimation, Gaussian mixture models, and various clustering techniques.

Uploaded by

Sohel Datta
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
8 views

Unit Pattern

The document provides an overview of pattern recognition, covering its definition, applications, types, learning approaches, and system architecture. It also discusses Bayesian decision theory, parameter estimation methods, hidden Markov models, dimension reduction techniques, non-parametric density estimation, linear discriminant function classifiers, non-metric methods for classification, and unsupervised learning with clustering algorithms. Key concepts include classifiers, discriminant functions, maximum likelihood estimation, Gaussian mixture models, and various clustering techniques.

Uploaded by

Sohel Datta
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 6

Unit 1: Basics of Pattern Recognition

• Definition: Pattern Recognition is the process of classifying input data into objects or
categories based on key features.

• Applications: Handwriting recognition, fingerprint identification, speech recognition, facial detection, medical diagnosis.

• Types of Pattern Recognition:

o Statistical: Based on statistical information (e.g., Bayesian classifier).

o Syntactic: Based on grammar rules and structure (e.g., image pattern from
pixels).

o Neural: Uses models inspired by the human brain (e.g., neural networks).

• Learning Approaches:

o Supervised Learning: Training data with known labels is used to build models.

o Unsupervised Learning: Only input data is available (no labels); goal is to find
structure (e.g., clustering).

o Reinforcement Learning: Learns through feedback and rewards.

• System Architecture:

1. Sensing: Device captures data.

2. Preprocessing: Normalize, denoise, enhance features.

3. Feature Extraction: Extract meaningful and discriminative attributes (e.g., edge, color, texture).

4. Classification: Assign input to the most probable class.

Unit 2: Bayesian Decision Theory

Classifiers, Discriminant Functions, Decision Surfaces

• Classifier: Algorithm that assigns an input to one of the predefined classes.

• Discriminant Function: Maps a feature vector to a value; the class with the highest
value is chosen.

o $g_i(x) > g_j(x)$ implies class $i$ is preferred.

• Decision Surface: A boundary in feature space that separates different classes (e.g.,
line for 2D data).
Normal Density and Discriminant Functions

• Normal Distribution: Continuous probability distribution defined by mean $\mu$ and variance $\sigma^2$.

• PDF (1D):

$p(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(x - \mu)^2}{2\sigma^2}}$

• In classification, we compute the likelihood of a feature vector belonging to a class using the Gaussian model.
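A minimal sketch of Gaussian likelihood computation for classification; the class means, variances, and test point below are illustrative assumptions, not values from the text:

    import numpy as np

    def gaussian_pdf(x, mu, var):
        # 1-D normal density: p(x) = 1/sqrt(2*pi*var) * exp(-(x - mu)^2 / (2*var))
        return np.exp(-(x - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

    # Assumed class-conditional parameters (mean, variance) -- illustrative only
    params = {"class_1": (0.0, 1.0), "class_2": (3.0, 2.0)}

    x = 1.2  # new observation
    likelihoods = {c: gaussian_pdf(x, mu, var) for c, (mu, var) in params.items()}
    print(max(likelihoods, key=likelihoods.get))  # class with the highest likelihood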

Discrete Features

• Used when input features are discrete (e.g., binary or categorical).

• Probability mass function is used.

• Bayes theorem:

$P(\omega_i \mid x) = \frac{P(x \mid \omega_i)\, P(\omega_i)}{P(x)}$

• Maximum a Posteriori (MAP) classifier selects the class with the highest posterior probability.
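A minimal sketch of a MAP decision for a single binary feature; the priors and class-conditional probabilities are illustrative assumptions:

    # Assumed priors P(w_i) and class-conditional probabilities P(x=1 | w_i) -- illustrative
    priors = {"w1": 0.6, "w2": 0.4}
    p_x1_given = {"w1": 0.2, "w2": 0.7}

    x = 1  # observed binary feature
    posteriors = {}
    for w in priors:
        likelihood = p_x1_given[w] if x == 1 else 1 - p_x1_given[w]
        posteriors[w] = likelihood * priors[w]        # proportional to P(w | x)

    evidence = sum(posteriors.values())               # P(x), normalizes the posteriors
    posteriors = {w: p / evidence for w, p in posteriors.items()}
    print(max(posteriors, key=posteriors.get))        # MAP class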

Unit 3: Parameter Estimation Methods

Maximum Likelihood Estimation (MLE)

• Estimate parameters such that the likelihood of observed data is maximized.

• For Gaussian:

o Mean: $\mu = \frac{1}{n} \sum_i x_i$

o Variance: $\sigma^2 = \frac{1}{n} \sum_i (x_i - \mu)^2$
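A minimal NumPy sketch of these MLE formulas (the sample values are illustrative):

    import numpy as np

    x = np.array([2.1, 1.9, 2.4, 2.0, 2.6])   # illustrative sample
    mu_hat = x.mean()                          # mu = (1/n) * sum(x_i)
    var_hat = ((x - mu_hat) ** 2).mean()       # sigma^2 = (1/n) * sum((x_i - mu)^2), the biased MLE
    print(mu_hat, var_hat)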

Gaussian Mixture Models (GMM)

• Models data as a mixture of multiple Gaussians.

• Each component represents a cluster or subpopulation.

• Useful in real-world problems like speaker identification.

Expectation-Maximization (EM)
• Algorithm for parameter estimation in models like GMM.

1. E-step: Estimate hidden variables (posterior probabilities).

2. M-step: Update parameters to maximize expected likelihood.
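A minimal sketch of EM for a two-component 1-D GMM; the synthetic data, initial guesses, and iteration count are illustrative assumptions:

    import numpy as np

    x = np.concatenate([np.random.normal(0, 1, 100), np.random.normal(5, 1, 100)])
    pi = np.array([0.5, 0.5])          # mixing weights
    mu = np.array([-1.0, 1.0])         # component means (rough initial guesses)
    var = np.array([1.0, 1.0])         # component variances

    for _ in range(50):
        # E-step: responsibilities r[n, k] = P(component k | x_n)
        dens = np.exp(-(x[:, None] - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)
        r = pi * dens
        r /= r.sum(axis=1, keepdims=True)

        # M-step: re-estimate parameters from the responsibilities
        nk = r.sum(axis=0)
        pi = nk / len(x)
        mu = (r * x[:, None]).sum(axis=0) / nk
        var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / nk

    print(pi, mu, var)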

Bayesian Estimation

• Incorporates prior knowledge into parameter estimation.

• Uses Bayes’ theorem to update belief about parameters after seeing data.

• More robust when data is sparse or uncertain.

Unit 4: Hidden Markov Models (HMMs)

Discrete HMMs

• Markov process where state is hidden but generates observable symbols.

• Defined by:

o A: state transition probabilities

o B: emission probabilities (output given state)

o π: initial state distribution

• Applications: Speech recognition, bioinformatics.
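A minimal sketch of the forward algorithm for scoring an observation sequence under a discrete HMM; the A, B, π values and the observation sequence are illustrative assumptions:

    import numpy as np

    A = np.array([[0.7, 0.3],    # state transition probabilities A[i, j] = P(s_t = j | s_{t-1} = i)
                  [0.4, 0.6]])
    B = np.array([[0.9, 0.1],    # emission probabilities B[i, k] = P(o_t = k | s_t = i)
                  [0.2, 0.8]])
    pi = np.array([0.5, 0.5])    # initial state distribution

    obs = [0, 1, 1, 0]           # observed symbol sequence (illustrative)

    alpha = pi * B[:, obs[0]]    # forward variable at t = 0
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]
    print(alpha.sum())           # P(observation sequence | model)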

Continuous HMMs

• Observations are continuous, not discrete.

• Emission probability modeled using Gaussian or GMM.

Unit 5: Dimension Reduction Methods

Fisher’s Linear Discriminant

• Projects data to a lower dimension to maximize class separability.

• Maximizes the ratio of between-class scatter to within-class scatter.
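Formally, for two classes the projection direction $w$ maximizes the Fisher criterion $J(w) = \frac{w^T S_B w}{w^T S_W w}$, where $S_B$ is the between-class scatter matrix and $S_W$ the within-class scatter matrix; the maximizer is $w \propto S_W^{-1}(\mu_1 - \mu_2)$.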

Principal Component Analysis (PCA)

• Projects data to new axes (principal components) to capture maximum variance.

• Steps:
1. Mean normalization

2. Compute covariance matrix

3. Compute eigenvectors

4. Select top k components
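A minimal NumPy sketch of these PCA steps (the data matrix and the number of components k are illustrative assumptions; samples are rows):

    import numpy as np

    X = np.random.rand(100, 5)                 # illustrative data: 100 samples, 5 features
    k = 2                                      # number of components to keep

    Xc = X - X.mean(axis=0)                    # 1. mean normalization
    cov = np.cov(Xc, rowvar=False)             # 2. covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)     # 3. eigenvectors (eigh returns ascending eigenvalues)
    top_k = eigvecs[:, np.argsort(eigvals)[::-1][:k]]   # 4. select top-k components
    X_reduced = Xc @ top_k                     # project onto the principal components
    print(X_reduced.shape)                     # (100, 2)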

Parzen Window

• Non-parametric way to estimate the PDF of a random variable.

• Uses kernels (e.g., Gaussian) placed on each data point.

• Good for visualizing data distribution without assuming a specific shape.
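A minimal sketch of a Parzen-window density estimate with a Gaussian kernel; the training sample and bandwidth h are illustrative assumptions:

    import numpy as np

    samples = np.random.normal(0, 1, 200)      # illustrative training data
    h = 0.3                                    # window width (bandwidth)

    def parzen_density(x, samples, h):
        # Average of Gaussian kernels centred on each training sample
        kernels = np.exp(-(x - samples) ** 2 / (2 * h ** 2)) / np.sqrt(2 * np.pi * h ** 2)
        return kernels.mean()

    print(parzen_density(0.5, samples, h))     # estimated p(0.5)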

K-Nearest Neighbours (KNN)

• For a new point, find the k closest training samples and vote for the class.

• Simple and effective; sensitive to the choice of k and distance metric.
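A minimal sketch of a k-NN classifier with Euclidean distance; the training set, query point, and k are illustrative assumptions:

    import numpy as np
    from collections import Counter

    X_train = np.array([[1.0, 2.0], [1.5, 1.8], [5.0, 8.0], [6.0, 9.0]])  # illustrative data
    y_train = np.array([0, 0, 1, 1])
    x_new = np.array([1.2, 1.9])
    k = 3

    dists = np.linalg.norm(X_train - x_new, axis=1)               # Euclidean distances
    nearest = np.argsort(dists)[:k]                               # indices of the k closest samples
    predicted = Counter(y_train[nearest]).most_common(1)[0][0]    # majority vote
    print(predicted)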

Unit 6: Non-Parametric Techniques for Density Estimation

• No assumption about the form of the distribution.

• Techniques:

o Histogram: Divide data range into bins.

o KNN Density: Density at a point is inversely proportional to the volume needed to enclose the $k$ nearest samples ($p(x) \approx k/(nV)$).

o Parzen Window: Smooth version using kernel functions.

• Useful when underlying distribution is unknown or multimodal.

Unit 7: Linear Discriminant Function Based Classifier

Perceptron

• Linear binary classifier: $y = w^T x + b$; the predicted class is given by the sign of $y$.

• Learning rule: Adjust weights based on classification error.

• Converges if data is linearly separable.
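A minimal sketch of the perceptron learning rule on a tiny linearly separable set; the data, learning rate, and pass limit are illustrative assumptions:

    import numpy as np

    X = np.array([[2.0, 1.0], [1.0, 3.0], [-1.0, -2.0], [-2.0, -1.0]])  # illustrative data
    y = np.array([1, 1, -1, -1])                                        # labels in {-1, +1}
    w, b, lr = np.zeros(2), 0.0, 0.1

    for _ in range(100):                      # repeat passes (capped) until no updates are needed
        for xi, yi in zip(X, y):
            if yi * (w @ xi + b) <= 0:        # misclassified: adjust weights toward the sample
                w += lr * yi * xi
                b += lr * yi

    print(w, b)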

Support Vector Machine (SVM)

• Finds the optimal hyperplane that maximizes the margin between classes.

• Can use the kernel trick to handle non-linear data (e.g., polynomial, RBF).

• Solves a convex optimization problem.
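A minimal usage sketch with scikit-learn's SVC and an RBF kernel (assumes scikit-learn is installed; the toy data is illustrative):

    import numpy as np
    from sklearn.svm import SVC

    X = np.array([[0, 0], [1, 1], [2, 2], [8, 8], [9, 9], [10, 10]])   # illustrative data
    y = np.array([0, 0, 0, 1, 1, 1])

    clf = SVC(kernel="rbf", C=1.0)     # kernel trick: RBF handles non-linear boundaries
    clf.fit(X, y)
    print(clf.predict([[1.5, 1.5], [9.5, 9.5]]))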

Unit 8: Non-Metric Methods for Pattern Classification

Non-Numeric (Nominal) Data

• Feature values are labels (e.g., male/female).

• Require non-metric classifiers (not based on distances).

Decision Trees

• Recursive structure of nodes and branches.

• Internal nodes → feature tests.

• Leaves → class labels.

• Split criteria: Information gain, Gini index, Chi-square.

• Advantages:

o Easy to understand

o Handles both categorical and numerical data

• Disadvantage: Prone to overfitting (can be mitigated using pruning).
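A minimal sketch of entropy and information gain for one candidate split; the label lists are illustrative assumptions:

    import numpy as np
    from collections import Counter

    def entropy(labels):
        counts = np.array(list(Counter(labels).values()), dtype=float)
        p = counts / counts.sum()
        return -(p * np.log2(p)).sum()

    parent = ["yes", "yes", "yes", "no", "no"]          # labels before the split (illustrative)
    left, right = ["yes", "yes", "yes"], ["no", "no"]   # labels in the two child nodes

    n = len(parent)
    gain = entropy(parent) - (len(left) / n) * entropy(left) - (len(right) / n) * entropy(right)
    print(gain)   # information gain of this split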

Unit 9: Unsupervised Learning and Clustering

Criterion Functions

• Measure quality of clustering:

o Intra-cluster distance: Should be small.

o Inter-cluster distance: Should be large.

o Silhouette score, SSE, etc.

Clustering Algorithms:

• K-means (a minimal code sketch follows this list):

1. Choose k centers randomly

2. Assign points to nearest center

3. Update centers

4. Repeat until convergence

• Hierarchical Clustering:

o Agglomerative: Merge closest clusters (bottom-up)

o Divisive: Split clusters until only single-point clusters remain (top-down)

• Other Methods:

o DBSCAN: Density-based clustering, handles noise

o Mean-Shift: Moves points to high-density regions
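A minimal sketch of the K-means loop referenced above; the synthetic data, k, and iteration cap are illustrative assumptions (and it assumes no cluster becomes empty):

    import numpy as np

    X = np.vstack([np.random.normal(0, 1, (50, 2)), np.random.normal(5, 1, (50, 2))])
    k = 2
    centers = X[np.random.choice(len(X), k, replace=False)]   # 1. choose k centers randomly

    for _ in range(100):                                       # 4. repeat until convergence
        dists = np.linalg.norm(X[:, None, :] - centers, axis=2)
        labels = dists.argmin(axis=1)                          # 2. assign points to nearest center
        new_centers = np.array([X[labels == j].mean(axis=0) for j in range(k)])  # 3. update centers
        if np.allclose(new_centers, centers):
            break
        centers = new_centers

    print(centers)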
