AI Unit V and II
UNIT V
Pattern Recognition: Introduction and design principles, Statistical
pattern recognition, Parameter estimation methods - Principal
component analysis and Linear discriminant analysis,
Classification techniques - Nearest neighbor rule and Bayes
classifier, K-means clustering, Support vector machine.
UNIT-V
5.1 Pattern Recognition:
Definition: Pattern recognition is the process of identifying, categorizing, and interpreting
patterns within data. It involves both the extraction of features from raw data and the
classification or clustering of data based on those features.
Applications: Pattern recognition has numerous applications across various domains, including:
Image and speech recognition
Handwriting recognition
Biometric identification
Medical diagnosis
Fraud detection
Autonomous driving
Natural language processing, and more.
5.1.1 Key Components:
The main components of a pattern recognition system include:
• Data Acquisition: Gathering raw data from sensors, databases, or other sources.
• Feature Extraction: Identifying relevant features or characteristics from the raw data
that are informative for pattern recognition.
• Feature Selection: Selecting the most discriminative features while reducing
dimensionality.
• Classification or Clustering: Categorizing or grouping data into classes or clusters
based on extracted features.
• Model Training and Evaluation: Training a model on labeled data (supervised learning)
or learning patterns directly from data (unsupervised learning) and evaluating the model’s
performance.
5.2 Design Principles of Pattern Recognition Systems:
1. Feature Representation: Choose appropriate feature representations that capture relevant
information for the task at hand. Features should be discriminative, invariant to irrelevant
variations, and compact.
2. Model Selection: Select the most suitable model or algorithm for the given problem. This
could include classifiers such as Support Vector Machines (SVMs), Decision Trees, Neural
Networks, or clustering algorithms like K-Means or Gaussian Mixture Models (GMMs).
3. Data Preprocessing: Preprocess the data to remove noise, handle missing values, normalize
or standardize features, and enhance the quality of input data.
4. Dimensionality Reduction: Reduce the dimensionality of feature space to improve
computational efficiency and alleviate the curse of dimensionality. Techniques such as
Principal Component Analysis (PCA) or t-Distributed Stochastic Neighbor Embedding (t-
SNE) can be employed.
5. Model Training and Evaluation: Train models using appropriate learning algorithms and
evaluate their performance using suitable metrics such as accuracy, precision, recall, F1-
score, or area under the receiver operating characteristic curve (AUC-ROC).
6. Cross-Validation: Use techniques like k-fold cross-validation to assess the
generalization performance of the model and ensure robustness against overfitting.
7. Interpretability and Explainability: Design models that are interpretable and
explainable, especially in domains where decisions have high stakes or legal
implications.
8. Iterative Improvement: Continuously refine and improve the pattern recognition
system based on feedback and new data. This may involve retraining models,
updating feature representations, or incorporating domain-specific knowledge.
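To make principles 3 to 6 above concrete, the following is a minimal sketch of a pattern recognition pipeline. It assumes Python with scikit-learn and the built-in Iris dataset purely for illustration; none of these choices are prescribed by the syllabus.

# Minimal sketch: preprocessing, dimensionality reduction, model selection and
# k-fold cross-validation combined in one scikit-learn pipeline.
from sklearn.datasets import load_iris
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)            # small labeled dataset for illustration

pipeline = Pipeline([
    ("scale", StandardScaler()),             # data preprocessing: standardize features
    ("reduce", PCA(n_components=2)),         # dimensionality reduction
    ("classify", SVC(kernel="rbf")),         # model selection: an SVM classifier
])

# 5-fold cross-validation estimates generalization performance and guards against overfitting
scores = cross_val_score(pipeline, X, y, cv=5, scoring="accuracy")
print("Fold accuracies:", scores)
print("Mean accuracy:", scores.mean())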
5.3 How does Pattern Recognition Work?
Historically, the two major approaches to pattern recognition are
1. Statistical Pattern Recognition (or decision-theoretic) and
2. Syntactic Pattern Recognition (or structural).
The third major approach is based on the technology of artificial neural networks (ANN), namely
• Neural Pattern Recognition.
No single technology is always the optimal solution for a given pattern recognition
problem. All three or hybrid methods are often considered to solve a given pattern
recognition problem.
5.3.1 Statistical Pattern Recognition
• Statistical Pattern Recognition is also referred to as StatPR. Among the traditional
approaches to pattern recognition, the statistical approach has been most intensively
studied and used in practice long before neural network methods became popular.
In statistical pattern recognition, the pattern is grouped according to its features, and the
number of features determines the dimensionality of the space in which the pattern is viewed
as a point. These features are chosen so that patterns belonging to different categories
occupy regions of the feature space that do not overlap.
The method works by using the chosen attributes to form clusters. The
machine learns and adapts as expected, and then uses the patterns for further processing
and training. The goal of StatPR is to choose the features that allow pattern vectors to
belong to different categories in a d-dimensional feature space.
5.3.2 Syntactic Pattern Recognition
• Syntactic Pattern Recognition, also known as SyntPR, is used for recognition problems
involving complex patterns that can be addressed by adopting a hierarchical perspective.
• Accordingly, the syntactic pattern approach relies on primitive subpatterns (such as
letters of the alphabet). The pattern is described in terms of the way the primitives
interact with each other; an example of this interaction is how they are assembled into
words and sentences. The training samples determine how the grammatical rules are
developed and how the sentences will later be "read".
• In addition to classification, structural pattern recognition also provides a description of
how the given pattern is constructed from the primitive subpatterns. Hence, the approach
has been used in examples where the patterns have a distinct structure that can be
captured in terms of a rule set, such as EKG waveforms or textured images.
The syntactic approach may lead to a combinatorial explosion of possibilities to be
examined, requiring large training sets and very large computational effort.
5.4 Linear discriminant analysis (LDA):
• Linear discriminant analysis (LDA) is an approach used in supervised machine
learning to solve multi-class classification problems. LDA separates multiple classes
with multiple features through data dimensionality reduction. This technique is important
in data science as it helps optimize machine learning models.
• Linear discriminant analysis, also known as normal discriminant analysis (NDA) or
discriminant function analysis (DFA), follows a generative model framework. This
means LDA algorithms model the data distribution for each class and use Bayes'
theorem to classify new data points.
• Bayes' theorem calculates conditional probabilities, that is, the probability of an event
given that some other event has occurred. LDA algorithms make predictions by using Bayes'
theorem to calculate the probability that an input data point belongs to a particular class.
A practical application of LDA
• Suppose that a bank is deciding whether to approve or reject loan applications.
• The bank uses two features to make this decision: the applicant's credit score and annual
income.
• Here, the applicants are plotted on a 2-dimensional (2D) plane, with credit score on one
axis and income on the other. If we tried to classify approvals using just one feature
alone, the two classes might overlap. By applying LDA, we can draw a straight line that
completely separates these two classes of data points.
• LDA achieves this by using the X–Y axis to create a new axis, separating the different
classes with a straight line and projecting data onto the new axis.
To create this new axis and reduce dimensionality, LDA follows these criteria:
Maximize the distance between the means of two classes.
Minimize the variance within individual classes.
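As an illustration only, the following minimal sketch applies LDA to the loan example using Python and scikit-learn; the credit-score and income values are made-up numbers, not data from the notes.

# Sketch: LDA on two made-up features (credit score, annual income in thousands).
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X = np.array([[750, 90], [720, 80], [690, 75], [710, 85],   # approved applicants
              [580, 35], [600, 40], [560, 30], [610, 45]])  # rejected applicants
y = np.array([1, 1, 1, 1, 0, 0, 0, 0])                      # 1 = approve, 0 = reject

lda = LinearDiscriminantAnalysis(n_components=1)
X_1d = lda.fit_transform(X, y)        # project the 2D points onto the new LDA axis

print("Projected values:", X_1d.ravel())
print("Prediction for a new applicant:", lda.predict([[700, 70]]))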
5.4.1 Applications of linear discriminant analysis
• Credit risk assessment in finance
To mitigate risk, financial institutions must identify and minimize credit default. LDA can
help identify applicants who might be likely to default on loans from those who are
creditworthy by sifting through financial factors and behavior data.
• Disease diagnosis in healthcare
Fast and accurate disease diagnosis is crucial for effective treatment. Hospitals and
healthcare providers must interpret an immense amount of medical data. LDA helps
simplify complex data sets and improve diagnostic accuracy by identifying patterns and
relationships in patient data.
• Customer segmentation in e-commerce
For effective marketing, e-commerce businesses must be able to categorize diverse
customer bases. LDA is pivotal in segmenting customers, enabling e-commerce companies
to tailor their marketing strategies for different customer groups. The outcome is more
personalized shopping experiences, increasing customer loyalty and sales.
• Campaign optimization in marketing
You can maximize your advertising budget by targeting the right audience with personalized
content, but identifying those respective audience segments can be difficult. LDA can
simplify this process by classifying customer attributes and behaviors, enhancing the
customization of advertising campaigns. This approach can lead to a higher return on
investment (ROI) and a better customer experience.
5.4.2 Advantages and disadvantages of using linear discriminant analysis
Advantages
• Simplicity and efficiency of computation: LDA is a simple yet powerful algorithm. It is relatively easy to
understand and implement, making it accessible to those new to machine learning. Its efficient computation also
ensures quick results.
• Manages high-dimensional data: LDA is effective where the number of features is larger than the number of
training samples. Therefore, LDA is valuable in applications like text analysis, image recognition, and genomics,
where data is often high-dimensional.
• Handles multicollinearity: LDA can address multicollinearity, which is the presence of high correlations
between different features. It transforms the data into a lower-dimensional space while maintaining information
integrity.
Disadvantages
• Shared mean distributions: LDA encounters challenges when class distributions share means. LDA struggles to
create a new axis that linearly separates both classes. As a result, LDA might not effectively discriminate between
classes with overlapping statistical properties.
• For example, imagine a scenario in which two species of flowers have highly similar petal length and width. LDA
may find it difficult to separate these species based on these features alone. Alternative techniques, such as
nonlinear discriminant analysis methods, are preferred here.
• Not suitable for unlabeled data: LDA is applied as a supervised learning algorithm, that is, it classifies or
separates labeled data. In contrast, principal component analysis (PCA), another dimensionality reduction
technique, ignores class labels and preserves variance.
5.5 Principal Component Analysis (PCA):
• Both Linear Discriminant Analysis (LDA) and Principal Component Analysis (PCA) are
linear transformation techniques that are commonly used for dimensionality reduction.
PCA can be described as an "unsupervised" algorithm, since it "ignores" class labels
and its goal is to find the directions (the so-called principal components) that maximize
the variance in a dataset.
• Principal component analysis (PCA) reduces the number of dimensions in large datasets
to principal components that retain most of the original information. It does this by
transforming potentially correlated variables into a smaller set of variables, called
principal components.
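The following minimal sketch shows the usual PCA steps (center the data, compute the covariance matrix, eigen-decompose, project) in Python with NumPy; the two correlated features are generated artificially for illustration.

# Sketch: PCA from scratch with NumPy on a small made-up 2-feature dataset.
import numpy as np

rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = 0.8 * x1 + 0.2 * rng.normal(size=200)   # second feature correlated with the first
X = np.column_stack([x1, x2])

Xc = X - X.mean(axis=0)                      # 1. center the data
cov = np.cov(Xc, rowvar=False)               # 2. covariance matrix of the features
eigvals, eigvecs = np.linalg.eigh(cov)       # 3. eigen-decomposition (ascending order)
order = np.argsort(eigvals)[::-1]            # 4. sort components by variance explained
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

X_1d = Xc @ eigvecs[:, :1]                   # 5. project onto the top principal component
print("Variance explained by PC1:", eigvals[0] / eigvals.sum())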
5.5.1 Applications of principal component analysis:
• Image compression
PCA reduces image dimensionality while retaining essential information. It helps
create compact representations of images, making them easier to store and
transmit.
• Data visualization
PCA helps to visualize high-dimensional data by projecting it into a lower-
dimensional space, such as a 2D or 3D plot. This simplifies data interpretation
and exploration.
• Noise filtering
PCA can remove noise or redundant information from data by focusing on the
principal components that capture the underlying patterns.
5.5.2 Advantages and disadvantages of principal component analysis
Advantages
• By reducing the data to two dimensions, you can easily visualize it.
• PCA removes multicollinearity. Multicollinearity arises when two features are correlated.
PCA produces a set of new orthogonal axes to represent the data, which, as the name
suggests, are uncorrelated.
• PCA removes noise. By reducing the number of dimensions in the data, PCA can help
remove noisy and irrelevant features.
• PCA reduces model parameters: PCA can help reduce the number of parameters in
machine learning models.
• PCA reduces model training time. By reducing the number of dimensions, PCA
simplifies the calculations involved in a model, leading to faster training times.
Disadvantages
• The run-time of PCA is cubic in the number of dimensions of the data, which can be
computationally expensive for large datasets.
• PCA transforms the original input variables into new principal components (or
dimensions). These new dimensions are linear combinations of the original variables and are
difficult to interpret.
• While PCA simplifies the data and removes noise, it always leads to some loss of
information when we reduce dimensions.
• PCA is a linear dimensionality reduction technique, but not all real-world datasets may
be linear.
• PCA gets affected by outliers. This can distort the principal components and affect the
accuracy of the results.
5.6 Classification techniques.
• Classification in machine learning is a predictive modeling process by which machine
learning models use classification algorithms to predict the correct label for input data.
• A classification model is a type of machine learning model that sorts data points into
predefined groups called classes. Classifiers learn class characteristics from input data
and then assign classes to new, unseen data according to those learned characteristics.
The main classification techniques covered in this unit are:
I. Nearest neighbor rule
II. Bayes classifier
III. K-means clustering
IV. Support vector machine
5.6.3 K-means clustering
• K-Means Clustering is an unsupervised learning algorithm that is used to solve the
clustering problems in machine learning or data science.
• Here K defines the number of pre-defined clusters that need to be created in the process;
if K=2, there will be two clusters, for K=3 there will be three clusters, and so on.
The k-means clustering algorithm mainly performs two tasks:
• Determines the best value for the K center points or centroids by an iterative process.
• Assigns each data point to its closest k-center. The data points that are nearest to a
particular k-center form a cluster.
The working of the K-means algorithm can be summarized in the following steps:
Step-1: Select the number K to decide the number of clusters.
Step-2: Select K random points as the initial centroids.
Step-3: Assign each data point to its closest centroid, which will form the predefined K
clusters.
Step-4: Compute the center of gravity (mean) of the points in each cluster and place a new
centroid there.
Step-5: Repeat the third step, i.e., reassign each data point to its new closest centroid.
Step-6: If any reassignment occurs, go back to Step-4; otherwise the clusters are final.
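The loop below is a minimal NumPy sketch of these steps; the two-blob dataset is artificial and the code is illustrative rather than production-ready (for example, it does not handle empty clusters).

# Sketch of the K-means loop with NumPy.
import numpy as np

def kmeans(X, k, n_iters=100, seed=0):
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]   # Steps 1-2: K random points
    for _ in range(n_iters):
        # Step 3: assign every point to its closest centroid
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 4: recompute each centroid as the mean (center of gravity) of its cluster
        new_centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        if np.allclose(new_centroids, centroids):   # Step 6: stop when centroids settle
            break
        centroids = new_centroids                   # Step 5: repeat with the new centroids
    return centroids, labels

# Two artificial, well-separated blobs of points
X = np.vstack([np.random.default_rng(1).normal(size=(50, 2)),
               np.random.default_rng(2).normal(size=(50, 2)) + 5])
centroids, labels = kmeans(X, k=2)
print("Centroids:\n", centroids)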
5.6.4 Support vector machine
• A support vector machine (SVM) is a supervised machine learning algorithm that
classifies data by finding an optimal line or hyperplane that maximizes the distance
between the classes in an N-dimensional space.
• SVMs were developed in the 1990s by Vladimir N. Vapnik and his colleagues, who
published this work in a paper titled "Support Vector Method for Function
Approximation, Regression Estimation, and Signal Processing".
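A minimal sketch of a linear SVM using scikit-learn is shown below; the dataset is generated artificially and the library choice is an assumption for illustration.

# Sketch: a linear SVM on artificially generated, separable data.
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

X, y = make_blobs(n_samples=100, centers=2, random_state=0)  # two classes of points

clf = SVC(kernel="linear", C=1.0)    # fit the maximum-margin separating hyperplane
clf.fit(X, y)

print("Support vectors per class:", clf.n_support_)
print("Prediction for a new point:", clf.predict([[0.0, 2.0]]))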
UNIT II
Introduction, Problem solving by searching, Searching for
solutions, Uninformed searching techniques, Informed searching
techniques, Local search algorithms, Adversarial search
methods, Search techniques used in games, Alpha-Beta
pruning.
1. Introduction to Searching Techniques
Definition: Searching techniques are methods used to navigate through a problem
space to find solutions efficiently.
Applications: Used in AI, game development, optimization, path finding (e.g.,
GPS), and problem-solving.
Key Concepts:
Problem Space: A representation of all possible states that can be reached from
the initial state.
Goal State: The desired state to be achieved.
Operators: Rules that move from one state to another.
2. Problem Solving by Searching
Problem Representation:
Initial State: Starting point of the problem.
Goal State: The solution to the problem.
State Space: Set of all possible states.
Actions: Available moves/actions to transition between states.
Types of Problems:
Single-State: The agent knows its current state.
Multi-State: The agent has incomplete information.
Adversarial: Involves competition, e.g., games.
Steps in Problem Solving:
Define the problem clearly.
Represent it as a search problem.
Choose an appropriate search algorithm.
3. Searching for Solutions
Involves exploring the problem space systematically to identify the goal state.
Criteria for an Effective Search:
Completeness: Does the algorithm guarantee finding a solution if one exists?
Optimality: Does it guarantee finding the best solution?
Time Complexity: How much time does the algorithm take?
Space Complexity: How much memory does the algorithm require?
4. Uninformed Searching Techniques
Do not use additional information about the problem.
Examples:
Breadth-First Search (BFS):
Explores all nodes at the current depth before moving deeper (a sketch follows this list).
Completeness: Guaranteed.
Optimality: Yes (when all step costs are equal).
Depth-First Search (DFS):
Explores as far as possible along a branch before backtracking.
Completeness: Not guaranteed (may get stuck in infinite loops).
Optimality: No.
Uniform Cost Search (UCS):
Expands the least-cost node first.
Completeness: Guaranteed.
Optimality: Yes.
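A minimal Python sketch of breadth-first search on a small hand-made graph (the graph itself is an illustrative assumption) is given below; DFS and UCS differ mainly in replacing the FIFO queue with a stack or with a priority queue ordered by path cost.

# Sketch: breadth-first search on a small hand-made graph.
from collections import deque

graph = {                       # adjacency list; the graph is an illustrative assumption
    "A": ["B", "C"],
    "B": ["D", "E"],
    "C": ["F"],
    "D": [], "E": ["F"], "F": [],
}

def bfs(start, goal):
    frontier = deque([[start]])          # queue of paths (FIFO: explores level by level)
    visited = {start}
    while frontier:
        path = frontier.popleft()
        node = path[-1]
        if node == goal:
            return path                  # first path found is the shallowest one
        for nbr in graph[node]:
            if nbr not in visited:
                visited.add(nbr)
                frontier.append(path + [nbr])
    return None                          # no solution exists

print(bfs("A", "F"))   # ['A', 'C', 'F']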
5. Informed Searching Techniques
Use heuristics to guide the search.
Examples:
Greedy Best-First Search:
Chooses the node that appears to be closest to the goal.
Completeness: Not guaranteed.
Optimality: No.
A*:
Combines the cost to reach a node and the estimated cost to the goal.
Completeness: Guaranteed.
Optimality: Yes, if the heuristic is admissible (never overestimates).
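A minimal Python sketch of A* on a small weighted graph follows; the graph and the heuristic values are made up (and chosen to be admissible) purely for illustration.

# Sketch: A* search on a weighted graph with a hand-made (assumed) heuristic h.
import heapq

graph = {"S": [("A", 1), ("B", 4)], "A": [("B", 2), ("G", 6)], "B": [("G", 3)], "G": []}
h = {"S": 5, "A": 4, "B": 2, "G": 0}      # admissible estimates of remaining cost

def a_star(start, goal):
    frontier = [(h[start], 0, start, [start])]   # entries are (f = g + h, g, node, path)
    best_g = {start: 0}
    while frontier:
        f, g, node, path = heapq.heappop(frontier)
        if node == goal:
            return path, g
        for nbr, cost in graph[node]:
            new_g = g + cost
            if new_g < best_g.get(nbr, float("inf")):
                best_g[nbr] = new_g
                heapq.heappush(frontier, (new_g + h[nbr], new_g, nbr, path + [nbr]))
    return None, float("inf")

print(a_star("S", "G"))   # (['S', 'A', 'B', 'G'], 6)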
6. Local Search Algorithms
Operate using a single current state and move to neighbors.
Used in optimization problems.
Examples:
Hill Climbing:
Chooses the neighbor with the highest value.
Problem: May get stuck in local maxima (see the sketch after this list).
Simulated Annealing:
Allows exploration of worse states to escape local maxima.
Genetic Algorithms:
Based on evolution, using mutation and crossover operators.
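A minimal Python sketch of hill climbing on a one-dimensional function follows; the function is an illustrative assumption with a single peak, so the algorithm reaches the global maximum here, but on functions with several peaks it can stop at a local one.

# Sketch: simple hill climbing on a one-dimensional function.
def hill_climb(f, x, step=0.1, max_iters=1000):
    for _ in range(max_iters):
        neighbors = [x - step, x + step]
        best = max(neighbors, key=f)
        if f(best) <= f(x):      # no neighbor is better: a (possibly local) maximum
            return x
        x = best
    return x

f = lambda x: -(x - 3) ** 2 + 9   # single peak at x = 3
print(hill_climb(f, x=0.0))        # approaches 3.0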
7. Adversarial Search Methods
Used in competitive environments like games.
Key Concept: Search with opponents trying to maximize their gain while
minimizing yours.
Examples:
Minimax Algorithm:
Minimizes the possible loss for a worst-case scenario.
Completeness: Guaranteed (if the tree is finite).
Optimality: Yes.
Expectimax:
Handles probabilistic outcomes instead of deterministic ones.
8. Search Techniques Used in Games
Includes strategies to make decisions under competitive or cooperative settings.
Strategies:
Depth-Limited Search.
Iterative Deepening.
Heuristic Evaluation.
9. Alpha-Beta Pruning
An optimization of the Minimax algorithm.
Purpose: Reduce the number of nodes evaluated in the game tree.
Mechanism:
Alpha: The best (highest) value found so far along the path for the maximizer.
Beta: The best (lowest) value found so far along the path for the minimizer.
Working:
Prune branches that cannot affect the final decision.
Improves efficiency significantly.
Advantages:
Reduces computation time.
Makes deeper search possible in a fixed time.
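A minimal Python sketch of minimax with alpha-beta pruning on a tiny hand-made game tree is given below; the tree and its leaf values are illustrative assumptions.

# Sketch: minimax with alpha-beta pruning on a hand-made game tree.
# Leaves are payoff values; internal nodes are dicts of child nodes.
tree = {"L": {"LL": 3, "LR": 5}, "R": {"RL": 2, "RR": 9}}   # illustrative tree

def alphabeta(node, maximizing, alpha=float("-inf"), beta=float("inf")):
    if not isinstance(node, dict):          # leaf: return its value
        return node
    if maximizing:
        value = float("-inf")
        for child in node.values():
            value = max(value, alphabeta(child, False, alpha, beta))
            alpha = max(alpha, value)        # best option so far for the maximizer
            if alpha >= beta:                # minimizer would never allow this branch
                break                        # prune the remaining children
        return value
    else:
        value = float("inf")
        for child in node.values():
            value = min(value, alphabeta(child, True, alpha, beta))
            beta = min(beta, value)          # best option so far for the minimizer
            if alpha >= beta:
                break
        return value

print(alphabeta(tree, maximizing=True))      # 3: MAX picks L, where MIN's best reply is 3

In this tree the leaf with value 9 is never evaluated: after the minimizer sees the leaf with value 2, beta falls below alpha, so the rest of that branch is pruned.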