
Artificial Intelligence

RCA-403
Syllabus
UNIT-I INTRODUCTION: - Introduction to Artificial Intelligence, Foundations and
History of Artificial Intelligence, Applications of Artificial Intelligence, Intelligent
Agents, Structure of Intelligent Agents, Computer Vision, Natural Language
Processing.
UNIT-II INTRODUCTION TO SEARCH: - Searching for solutions, uninformed
search strategies, informed search strategies, local search algorithms and optimization
problems, Adversarial Search, Search for Games, Alpha-Beta pruning.
UNIT-III KNOWLEDGE REPRESENTATION & REASONING: - Propositional
logic, Theory of first order logic, Inference in First order logic, Forward &
Backward chaining, Resolution, Probabilistic reasoning, Utility theory, Hidden
Markov Models (HMM), Bayesian Networks.
Syllabus
UNIT-IV MACHINE LEARNING: - Supervised and unsupervised learning,
Decision trees, Statistical learning models, learning with complete data -
Naive Bayes models, Learning with hidden data – EM algorithm,
Reinforcement learning.

UNIT-V PATTERN RECOGNITION: - Introduction, Design principles of
pattern recognition system, Statistical Pattern Recognition, Parameter estimation
methods - Principal Component Analysis (PCA) and Linear Discriminant Analysis
(LDA), Classification Techniques - Nearest Neighbor (NN) Rule, Bayes Classifier,
Support Vector Machine (SVM), K-means clustering.
UNIT-IV
Machine Learning

What is learning?
• Learning is the process of gathering information and knowledge from past
experience and data analysis, and applying this information to enhance system
performance.
• Learning represents changes in a system that enable it to do the same
task more efficiently the next time.
Machine learning, a branch of artificial intelligence, concerns the construction and
study of systems that can learn from data. For example, a machine learning system
could be trained on email messages to learn to distinguish between spam and non-
spam messages. After learning, it can then be used to classify new email messages
into spam and non-spam folders.
Machine Learning

Types of Machine Learning


Machine Learning: Supervised ML

Supervised Machine learning?


Supervised learning is like learning with a supervisor or guide: the training dataset
acts as the supervisor that trains the machine. In supervised learning you
have input variables (X) and an output variable (Y), and you use an algorithm to learn
the mapping function from the input to the output:
Y = f(X)
The basic idea of supervised learning is that the training data provides "examples" and
"outcomes", where each example specifies its outcome. The goal is to build a model
which can predict the outcome for new instances. If the outcome is categorical, the
model is a "classification" model, whereas if the outcome is numeric the model is a
"regression" model.
Machine Learning

Based on the outcome/response (dependent) variable, supervised learning problems
can be further divided into two kinds:
• Regression: When the outcome or response variable is a continuous variable
(numeric), the problem is called a regression problem. Algorithms include:
linear regression, support vector regression (SVR), ensemble methods,
decision trees, neural networks.
• Classification: When the outcome or response variable is a discrete variable
(labels), the problem is called a classification problem. Algorithms include: support
vector machine (SVM), discriminant analysis, Naive Bayes, K-Nearest
Neighbors (KNN).
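The split above can be made concrete with a small sketch (not part of the original slides). It uses scikit-learn and made-up toy data: the same inputs X are paired once with discrete labels (a classification problem) and once with numeric outcomes (a regression problem).

# Minimal sketch (assumes scikit-learn is installed): one classifier and one
# regressor trained on the same toy inputs to contrast the two problem types.
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]])   # input variable X
y_class = np.array([0, 0, 0, 1, 1, 1])                      # discrete labels  -> classification
y_reg = np.array([1.9, 4.1, 6.2, 7.8, 10.1, 12.2])          # continuous outcome -> regression

clf = LogisticRegression().fit(X, y_class)   # learns Y = f(X) with categorical Y
reg = LinearRegression().fit(X, y_reg)       # learns Y = f(X) with numeric Y

print(clf.predict([[3.5]]))   # predicted class label for a new instance
print(reg.predict([[3.5]]))   # predicted numeric value for a new instance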
What is Machine Learning? A Definition.
• Machine learning is an application of artificial intelligence (AI) that provides systems
the ability to automatically learn and improve from experience without being explicitly
programmed. Machine learning focuses on the development of computer
programs that can access data and use it to learn for themselves.

• The process of learning begins with observations or data, such as examples, direct
experience, or instruction, in order to look for patterns in data and make better
decisions in the future based on the examples that we provide. The primary aim is
to allow computers to learn automatically, without human intervention or
assistance, and adjust actions accordingly.

Machine Learning Methods

• Machine learning algorithms are often categorized as supervised or unsupervised.

Supervised machine learning
algorithms
• Supervised machine learning algorithms can apply what has been
learned in the past to new data using labeled examples to predict future
events. Starting from the analysis of a known training dataset, the learning
algorithm produces an inferred function to make predictions about the
output values. The system is able to provide targets for any new input after
sufficient training. The learning algorithm can also compare its output with
the correct, intended output and find errors in order to modify the model
accordingly.
Machine Learning: Unsupervised ML

In unsupervised learning, the training data provides "examples" but no specific
"outcome". The machine tries to find "interesting" patterns in the data, which are then
labeled appropriately. Unsupervised learning is where you only have input data (X) and no
corresponding output variables. The goal of unsupervised learning is to model the
underlying structure or distribution in the data in order to learn more about the data.
These are called unsupervised learning problems because, unlike supervised learning above,
there are no correct answers and there is no supervisor.
Unsupervised learning algorithms: clustering: K-means, K-medoids,
hierarchical clustering, Gaussian mixtures, neural networks, hidden Markov models.
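As a quick illustration of the idea, here is a minimal K-means sketch (assuming scikit-learn and a made-up six-point dataset, neither of which comes from the slides): the algorithm receives only X and discovers two groups on its own.

# Minimal sketch (assumes scikit-learn): K-means finds groups in unlabeled data.
import numpy as np
from sklearn.cluster import KMeans

# Input data X only -- there is no output variable and no supervisor.
X = np.array([[1, 2], [1, 4], [1, 0],
              [10, 2], [10, 4], [10, 0]])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(kmeans.labels_)            # cluster assignment discovered for each example
print(kmeans.cluster_centers_)   # the underlying structure the algorithm modeled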
Unsupervised machine learning
algorithms
• Unsupervised machine learning algorithms are used when the information used to
train is neither classified nor labeled. Unsupervised learning studies how systems can
infer a function to describe a hidden structure from unlabeled data. The system doesn't
figure out the right output, but it explores the data and can draw inferences from datasets
to describe hidden structures in unlabeled data.

Machine Learning: Supervised vs. Unsupervised ML
Machine Learning: Reinforcement ML

Reinforcement Learning is a subset of machine learning algorithms that learn by
exploring their environment. An RL agent learns by interacting with its environment.
The agent receives rewards for performing correctly and penalties for performing
incorrectly. Reinforcement Learning is a type of machine learning which allows
machines and software agents to automatically determine the ideal behavior within a
specific context, in order to maximize their performance. Reinforcement algorithms
are not given explicit goals; instead, they are forced to learn these optimal goals by
trial and error.
Applications of RL include robot navigation,
game playing (Backgammon), intelligent
tutoring systems, etc.
Understanding Reinforcement
Learning
• How does one learn cycling? How does a baby learn to walk? How
do we become better at doing something with more practice? Let us
explore learning to cycle to illustrate the idea behind RL.
• Did somebody tell you how to cycle, or give you steps to follow? Or
did you learn it by spending hours watching videos of people
cycling? All these will surely give you an idea about cycling, but will they
be enough to actually get you cycling? The answer is no. You learn
to cycle only by cycling (action), through trial and error (practice),
going through positive experiences (positive rewards) and
negative experiences (negative rewards or punishments), before
getting your balance and control right (maximum reward or best
outcome). This analogy of how our brain learns cycling applies to
reinforcement learning: through trials, errors, and rewards, it finds
the best course of action.
Components of Reinforcement
Learning
• Agent: Agent is the part of RL which takes actions, receives
rewards for actions and gets a new environment state as a
result of the action taken. In the cycling analogy, the agent is a
human brain that decides what action to take and gets
rewarded (falling is negative and riding is positive).
• Environment: The environment represents the outside world
(only relevant part of the world which the agent needs to know
about to take actions) that interacts with agents. In the cycling
analogy, the environment is the cycling track and the objects as
seen by the rider.
Components of Reinforcement
Learning
• State: State is the condition or position in which the agent
currently resides. In the cycling analogy, it would be the
speed of the cycle, the tilt of the handle, the tilt of the cycle, etc.
• Action: What the agent does while interacting with the environment
is referred to as an action. In the cycling analogy, it would be to pedal
harder (if the decision is to increase speed), apply the brakes (if the
decision is to reduce speed), tilt the handle, tilt the body, etc.
• Rewards: A reward is an indicator to the agent of how good or bad
the action taken was. In the cycling analogy, it can be +1 for not
falling, -10 for hitting obstacles and -100 for falling; the rewards for
these outcomes (+1, -10, -100) are defined while building the RL agent.
Since the agent wants to maximize rewards, it avoids hitting obstacles
and always tries to avoid falling.
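To make the agent/environment/state/action/reward loop concrete, below is a small tabular Q-learning sketch on a hypothetical five-state corridor (the environment, reward values, and hyperparameters are invented for illustration and are not from the slides). The agent starts in the middle and earns +1 only when it reaches the rightmost state.

# Minimal tabular Q-learning sketch on a made-up 5-state corridor environment.
import random

N_STATES = 5                      # states 0 .. 4; state 4 is the goal
ACTIONS = [0, 1]                  # 0 = move left, 1 = move right
alpha, gamma = 0.1, 0.9           # learning rate and discount factor
Q = [[0.0, 0.0] for _ in range(N_STATES)]   # the agent's value table

def step(state, action):
    """Environment: given a state and an action, return (next_state, reward, done).
    Reaching the rightmost state earns +1; every other move earns 0."""
    next_state = max(0, state - 1) if action == 0 else min(N_STATES - 1, state + 1)
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    return next_state, reward, next_state == N_STATES - 1

random.seed(0)
for episode in range(500):
    state = 2                                  # the agent starts in the middle
    done = False
    while not done:
        action = random.choice(ACTIONS)        # explore by trial and error
        next_state, reward, done = step(state, action)
        # Q-learning update: nudge the estimate toward reward + discounted best future value
        Q[state][action] += alpha * (reward + gamma * max(Q[next_state]) - Q[state][action])
        state = next_state

# After training, the greedy policy (pick the action with the larger Q value)
# moves right in every non-terminal state, which maximizes the reward.
print([max(ACTIONS, key=lambda a: Q[s][a]) for s in range(N_STATES - 1)])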
Why use Reinforcement Learning?

Here are prime reasons for using Reinforcement Learning:


• It helps you to find which situation needs an action
• Helps you to discover which action yields the highest reward over
the longer period.
• Reinforcement Learning also provides the learning agent with a
reward function.
• It also allows it to figure out the best method for obtaining large
rewards.
When Not to Use Reinforcement Learning?

You can't apply a reinforcement learning model to every situation. Here
are some conditions when you should not use a reinforcement learning
model:
• When you have enough data to solve the problem with a supervised
learning method.
• Keep in mind that Reinforcement Learning is computing-heavy and
time-consuming, in particular when the action space is
large.
Applications of Reinforcement Learning

Here are applications of Reinforcement Learning:


• Robotics for industrial automation.
• Business strategy planning
• Machine learning and data processing
• It helps you to create training systems that provide custom
instruction and materials according to the requirement of students.
• Aircraft control and robot motion control
Challenges of Reinforcement Learning

Here are the major challenges you will face while doing Reinforcement
Learning:
• Feature/reward design can be very involved.
• Parameters may affect the speed of learning.
• Realistic environments can have partial observability.
• Too much reinforcement may lead to an overload of states, which
can diminish the results.
• Realistic environments can be non-stationary.
Reinforcement Learning vs. Supervised Learning

Parameter                 Reinforcement Learning                           Supervised Learning
Decision style            Decisions are taken sequentially.                A decision is made on the input given at the beginning.
Works on                  Interacting with the environment.                Examples or given sample data.
Dependency on decision    Learning decisions are dependent, so labels      Decisions are independent of each other, so labels are
                          should be given to all dependent decisions.      given for every decision.
Best suited               Supports and works better in AI, where           Mostly operated with an interactive software system
                          human interaction is prevalent.                  or applications.
Example                   Chess game                                       Object recognition
Learning using Decision Trees
Decision tree learning is one of the most successful techniques for supervised
classification learning. A decision tree takes as input an object or situation described by a set of
attributes and returns a decision. The input attributes, as well as the output, can be discrete or continuous.
Learning a discrete-valued function is called classification and
learning a continuous-valued function is called regression. Decision trees can be applied
to both regression and classification.
Example: What to do this Weekend?
This tree consists of the following components:
• Questions/conditions are the nodes.
• Yes/No options represent the edges.
• End actions are the leaves of the tree.
Introduction
Decision Trees are a type of supervised machine learning (that is, you explain what the input
is and what the corresponding output is in the training data) where the data is continuously split according
to a certain parameter. The tree can be explained by two entities, namely decision nodes and leaves. The
leaves are the decisions or the final outcomes, and the decision nodes are where the data is split.
1. Classification trees (Yes/No types)

• What we've seen above is an example of a classification tree, where the outcome is a categorical variable such as 'fit' or
'unfit'. Here the decision variable is categorical.

2. Regression trees (Continuous data types)

• Here the decision or the outcome variable is continuous, e.g. a number like 123.
Working
Now that we know what a Decision Tree is, we'll see how it works internally. There are many algorithms that
construct Decision Trees, but one of the best known is the ID3 algorithm. ID3 stands for Iterative
Dichotomiser 3. Before discussing the ID3 algorithm, we'll go through a few definitions.
Entropy
Entropy, also called Shannon entropy and denoted by H(S) for a finite set S, is the measure of the amount of
uncertainty or randomness in the data: H(S) = -Σ p(c) log2 p(c), summed over the classes c present in S.
• Intuitively, it tells us about the predictability of a certain event. For example, consider a coin toss whose
probability of heads is 0.5 and probability of tails is 0.5. Here the entropy is the highest possible, since
there's no way of determining what the outcome might be. Alternatively, consider a coin which has heads
on both sides; the outcome of such a toss can be predicted perfectly, since we know beforehand that
it'll always be heads. In other words, this event has no randomness, hence its entropy is zero.

• Information Gain: Information gain, also called Kullback-Leibler divergence and denoted by IG(S, A) for a
set S, is the effective change in entropy after deciding on a particular attribute A. It measures the relative
change in entropy with respect to the independent variables:

IG(S, A) = H(S) - Σ p(t) H(t), summed over the subsets t that attribute A splits S into,

where IG(S, A) is the information gain from applying feature A, H(S) is the entropy of the entire set, and
the second term calculates the entropy after applying the feature A.
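A small sketch of these two definitions in code (illustrative only; the helper names entropy and information_gain are our own, not part of the slides):

# Entropy and information gain for a list of class labels.
from collections import Counter
from math import log2

def entropy(labels):
    """H(S) = -sum(p(c) * log2 p(c)) over the classes c present in S."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())

def information_gain(labels, attribute_values):
    """IG(S, A) = H(S) minus the weighted entropy of the subsets that A splits S into."""
    total = len(labels)
    subsets = {}
    for value, label in zip(attribute_values, labels):
        subsets.setdefault(value, []).append(label)
    remainder = sum(len(subset) / total * entropy(subset) for subset in subsets.values())
    return entropy(labels) - remainder

print(entropy(["H", "T"]))   # 1.0: a fair coin toss has maximum uncertainty
print(entropy(["H", "H"]))   # 0 (may display as -0.0): a two-headed coin has no randomness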
• Let's understand this with the help of an example. Consider a piece of data collected over the course of 14 days, where
the features are Outlook, Temperature, Humidity and Wind, and the outcome variable is whether golf was played on the day.
Now, our job is to build a predictive model which takes in the above four parameters and predicts whether golf will be played
on the day. We'll build a decision tree to do that using the ID3 algorithm.

Day   Outlook    Temperature  Humidity  Wind    Play Golf
D1    Sunny      Hot          High      Weak    No
D2    Sunny      Hot          High      Strong  No
D3    Overcast   Hot          High      Weak    Yes
D4    Rain       Mild         High      Weak    Yes
D5    Rain       Cool         Normal    Weak    Yes
D6    Rain       Cool         Normal    Strong  No
D7    Overcast   Cool         Normal    Strong  Yes
D8    Sunny      Mild         High      Weak    No
D9    Sunny      Cool         Normal    Weak    Yes
D10   Rain       Mild         Normal    Weak    Yes
D11   Sunny      Mild         Normal    Strong  Yes
D12   Overcast   Mild         High      Strong  Yes
D13   Overcast   Hot          Normal    Weak    Yes
D14   Rain       Mild         High      Strong  No

The ID3 algorithm will perform the following tasks recursively:
1. Create a root node for the tree.
2. If all examples are positive, return leaf node 'positive'.
3. Else, if all examples are negative, return leaf node 'negative'.
4. Calculate the entropy of the current state, H(S).
5. For each attribute, calculate the entropy with respect to the attribute 'x', denoted by H(S, x).
6. Select the attribute which has the maximum value of IG(S, x).
7. Remove the attribute that offers the highest IG from the set of attributes.
8. Repeat until we run out of attributes, or the decision tree has all leaf nodes.
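As a check on step 6, the sketch below reuses the entropy() and information_gain() helpers from the earlier sketch to score each attribute of the golf table; Outlook comes out with the largest gain, which is why ID3 places it at the root.

# Worked check of the root split on the golf table above.
play = ["No","No","Yes","Yes","Yes","No","Yes","No","Yes","Yes","Yes","Yes","Yes","No"]
features = {
    "Outlook":     ["Sunny","Sunny","Overcast","Rain","Rain","Rain","Overcast",
                    "Sunny","Sunny","Rain","Sunny","Overcast","Overcast","Rain"],
    "Temperature": ["Hot","Hot","Hot","Mild","Cool","Cool","Cool",
                    "Mild","Cool","Mild","Mild","Mild","Hot","Mild"],
    "Humidity":    ["High","High","High","High","Normal","Normal","Normal",
                    "High","Normal","Normal","Normal","High","Normal","High"],
    "Wind":        ["Weak","Strong","Weak","Weak","Weak","Strong","Strong",
                    "Weak","Weak","Weak","Strong","Strong","Weak","Strong"],
}

print(round(entropy(play), 3))          # H(S) for the whole table, about 0.94
for name, values in features.items():
    print(name, round(information_gain(play, values), 3))
# Outlook has the largest information gain (about 0.247), so ID3 makes it the root node.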
Statistical learning models
• Statistical learning theory is a framework for machine learning drawing from the
fields of statistics and functional analysis. Statistical learning theory deals with the
problem of finding a predictive function based on data.
• Statistical learning focuses on calculating the probability of each hypothesis and
making predictions accordingly.
• Statistical learning theory has led to successful applications in fields such as
computer vision, speech recognition, bioinformatics etc.
• Maximum likelihood estimation (MLE) is a method of estimating the parameters
of a statistical model so that the observed data is most probable. MLE attempts to find
the parameter values that maximize the likelihood function, given the
observations. The resulting estimate is called a maximum likelihood estimate,
which is also abbreviated as MLE (see the sketch after this list).
• Machine learning is all about results; it is like working in a company
where your worth is characterized solely by your performance.
• Statistical modeling, on the other hand, is more about finding relationships
between variables and the significance of those relationships, while
at the same time also catering for prediction.
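As a small illustration of MLE (not from the slides): for a sequence of coin flips, the likelihood is maximized when the head probability equals the observed fraction of heads, which a simple grid search over candidate values confirms. The counts and the grid are made up for illustration.

# Minimal MLE sketch: choose the parameter under which the observed data is most probable.
from math import log

heads, tails = 7, 3

def log_likelihood(p):
    """Log-likelihood of observing 7 heads and 3 tails if P(heads) = p."""
    return heads * log(p) + tails * log(1 - p)

candidates = [i / 100 for i in range(1, 100)]   # crude grid search over p
mle = max(candidates, key=log_likelihood)
print(mle)   # 0.7 -- the observed fraction of heads maximizes the likelihood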
Naïve Bayes’ Model
The Naïve Bayes algorithm learns the probability of an object
with certain features belonging to a particular group/class. It is named after
the statistician and philosopher Thomas Bayes and the theorem called "Bayes'
Theorem", which is the basis of the Naïve Bayes model. More formally, Bayes'
Theorem is stated as the following equation:

P(A|B) = P(B|A) P(A) / P(B)
Where,
• P(A|B): probability (conditional probability) of occurrence of event A given the
event B is true.
• P(A) and P(B): probabilities of occurrence of event A and B respectively.
• P(B|A): Probability of occurrence of event B given the event A is true.
• For example, a fruit may be considered to be an apple if it is
red, round, and about 3 inches in diameter. Even if these
features depend on each other or upon the existence of the
other features, all of these properties independently contribute
to the probability that this fruit is an apple and that is why it is
known as ‘Naive’.
Naïve Bayes' Model

Applied to a classification problem, Bayes' Theorem becomes:

P(c|x) = P(x|c) P(c) / P(x)

where,
• P(c|x) is the posterior probability of class (c, target) given predictor (x, attributes).
• P(c) is the prior probability of the class.
• P(x|c) is the likelihood, which is the probability of the predictor given the class.
• P(x) is the prior probability of the predictor.
How does the Naive Bayes algorithm work?

• Let's understand it using an example. Suppose we have a training data set
of weather observations and a corresponding target variable 'Play' (suggesting the
possibility of playing). Now, we need to classify whether players will
play or not based on the weather conditions. Let's follow the steps below to
do this.
• Step 1: Convert the data set into a frequency table
• Step 2: Create Likelihood table by finding the probabilities like
Overcast probability = 0.29 and probability of playing is 0.64.
• Step 3: Now, use Naive Bayesian equation to calculate the posterior
probability for each class. The class with the highest posterior
probability is the outcome of prediction.
Problem: Players will play if the weather is sunny. Is this statement correct?
• We can solve it using the method of posterior probability discussed above.
• P(Yes | Sunny) = P(Sunny | Yes) * P(Yes) / P(Sunny)
• Here we have P(Sunny | Yes) = 3/9 = 0.33, P(Sunny) = 5/14 = 0.36, and
P(Yes) = 9/14 = 0.64.
• Now, P(Yes | Sunny) = 0.33 * 0.64 / 0.36 = 0.60, which is the higher
probability, so the answer is Yes.
• Naive Bayes uses a similar method to predict the probability of different
classes based on various attributes. This algorithm is mostly used in text
classification and with problems having multiple classes.
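The same calculation, reproduced in a few lines of code (the counts are the ones given on the slide):

# Posterior probability P(Yes | Sunny) computed from the slide's counts.
p_sunny_given_yes = 3 / 9      # P(Sunny | Yes)
p_yes = 9 / 14                 # P(Yes)
p_sunny = 5 / 14               # P(Sunny)

p_yes_given_sunny = p_sunny_given_yes * p_yes / p_sunny
print(round(p_yes_given_sunny, 2))   # 0.6 -- "play if the weather is sunny" is the more likely class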
What are the Pros and Cons of Naive Bayes?

Pros:
• It is easy and fast to predict the class of a test data set. It also
performs well in multi-class prediction.
• When the assumption of independence holds, a Naive Bayes
classifier performs better compared to other models like logistic
regression, and you need less training data.
• It performs well in the case of categorical input variables compared
to numerical variable(s). For numerical variables, a normal
distribution is assumed (bell curve, which is a strong
assumption).
Cons:
•If a categorical variable has a category (in the test data set) which was not observed in
the training data set, then the model will assign it a 0 (zero) probability and will be unable to
make a prediction. This is often known as "zero frequency". To solve this, we can use
a smoothing technique. One of the simplest smoothing techniques is called Laplace
estimation (a small sketch follows this list).
•On the other side, naive Bayes is also known as a bad estimator, so the probability
outputs from predict_proba are not to be taken too seriously.
•Another limitation of Naive Bayes is the assumption of independent predictors. In real
life, it is almost impossible to get a set of predictors which are completely
independent.
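A small sketch of Laplace (add-one) smoothing for the zero-frequency problem mentioned above; the counts and the number of categories used here are hypothetical:

# Laplace smoothing: a category never seen with a class still gets a small, non-zero probability.
def smoothed_prob(count_in_class, class_total, n_categories, alpha=1):
    return (count_in_class + alpha) / (class_total + alpha * n_categories)

# Suppose (hypothetically) "Foggy" never occurred with class Yes in 9 training rows
# and the Outlook attribute has 4 possible categories.
print(smoothed_prob(0, 9, 4))   # 1/13 instead of 0, so the product of likelihoods no longer collapses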
Applications of Naive Bayes Algorithms

• Real-time prediction: Naive Bayes is an eager learning classifier and it is fast. Thus, it can be used for making
predictions in real time.
• Multi-class prediction: This algorithm is also well known for its multi-class prediction capability. Here we can predict the
probability of multiple classes of the target variable.
• Text classification / spam filtering / sentiment analysis: Naive Bayes classifiers, mostly used in text classification (due to
better results in multi-class problems and the independence rule), have a higher success rate compared to other algorithms. As a
result, Naive Bayes is widely used in spam filtering (identifying spam e-mail) and sentiment analysis (in social media analysis, to identify
positive and negative customer sentiment).
• Recommendation systems: A Naive Bayes classifier and collaborative filtering together build a recommendation system
that uses machine learning and data mining techniques to filter unseen information and predict whether a user would like a
given resource or not.
Learning with hidden data - EM algorithm
• In real-world applications of machine learning, it is very common that there
are many relevant features available for learning but only a small subset of them
is observable.
• For variables which are sometimes observable and sometimes not, we
can use the instances when the variable is observed for the purpose of learning
and then predict its value in the instances when it is not observable.
• On the other hand, Expectation-Maximization algorithm can be used for the latent
variables (variables that are not directly observable and are actually inferred from
the values of the other observed variables) too in order to predict their values with
the condition that the general form of probability distribution governing those latent
variables is known to us.
• This algorithm is actually at the base of many unsupervised clustering algorithms
in the field of machine learning.
• It was explained, proposed and given its name in a paper published in 1977 by
Arthur Dempster, Nan Laird, and Donald Rubin. It is used to find the local
maximum likelihood parameters of a statistical model in the cases where latent
variables are involved and the data is missing or incomplete.
Algorithm:
• Given a set of incomplete data, consider a set of starting parameters.
• Expectation step (E – step): Using the observed available data of the dataset,
estimate (guess) the values of the missing data.
• Maximization step (M – step): Complete data generated after the expectation (E)
step is used in order to update the parameters.
• Repeat step 2 and step 3 until convergence.
• The essence of Expectation-Maximization algorithm is to use the available
observed data of the dataset to estimate the missing data and then using that data to
update the values of the parameters. Let us understand the EM algorithm in detail.
• Initially, a set of initial values of the parameters is considered. A set of incomplete
observed data is given to the system, with the assumption that the observed data
comes from a specific model.
• The next step is the "Expectation" step or E-step. In this step, we use the
observed data to estimate or guess the values of the missing or incomplete
data. It is basically used to update the variables.
• The next step is the "Maximization" step or M-step. In this step, we use the
complete data generated in the preceding "Expectation" step to update
the values of the parameters. It is basically used to update the hypothesis.
• Finally, in the fourth step, it is checked whether the values are converging or not; if
yes, then stop; otherwise repeat step 2 and step 3, i.e. the "Expectation" step and
"Maximization" step, until convergence occurs.
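A simplified sketch of EM fitting a mixture of two 1-D Gaussians (illustrative only and not from the slides; the component assignments are the hidden data, and the variance and mixing weight are held fixed to keep the sketch short):

# Simplified EM for a two-component 1-D Gaussian mixture.
import math, random

random.seed(0)
# Hidden data: which of the two Gaussians produced each point is never observed.
data = [random.gauss(0, 1) for _ in range(100)] + [random.gauss(5, 1) for _ in range(100)]

mu = [-1.0, 1.0]          # starting parameters (means of the two components)
sigma, weight = 1.0, 0.5  # variance and mixing weight kept fixed for brevity

def normal_pdf(x, m, s):
    return math.exp(-((x - m) ** 2) / (2 * s * s)) / (s * math.sqrt(2 * math.pi))

for iteration in range(50):
    # E-step: estimate the missing data, i.e. the responsibility of component 0 for each point.
    resp0 = [weight * normal_pdf(x, mu[0], sigma) /
             (weight * normal_pdf(x, mu[0], sigma) + (1 - weight) * normal_pdf(x, mu[1], sigma))
             for x in data]
    # M-step: update the parameters using the "completed" data from the E-step.
    mu[0] = sum(r * x for r, x in zip(resp0, data)) / sum(resp0)
    mu[1] = sum((1 - r) * x for r, x in zip(resp0, data)) / sum(1 - r for r in resp0)

print([round(m, 2) for m in mu])   # the means converge near the true values 0 and 5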
Usage of EM algorithm
• It can be used to fill the missing data in a sample.
• It can be used as the basis of unsupervised learning of clusters.
• It can be used for the purpose of estimating the parameters of Hidden Markov
Model (HMM).
• It can be used for discovering the values of latent variables.
Advantages of EM algorithm
• It is always guaranteed that likelihood will increase with each iteration.
• The E-step and M-step are often pretty easy for many problems in terms of
implementation.
• Solutions to the M-steps often exist in the closed form.
Disadvantages of EM algorithm
• It has slow convergence.
• It converges to a local optimum only.
• It requires both the forward and backward probabilities (numerical optimization
requires only the forward probability).
Learning with hidden data - EM algorithm
How does it work?
From the given data, EM learns a theory which specifies how each example should be
classified and how to predict the feature values of each class. It starts from a random
classification of the data and repeats the following two steps until a stable result is formed:
1. E-step: classify the data using the current theory, i.e., the E-step generates the expected
classification for each example.
2. M-step: generate the best theory using the current classification of the data, i.e., the M-
step generates the most likely theory given the classified data.

Applications of EM algorithm are Artificial Vision, NLP, Clustering etc.


End of UNIT-IV
