
Artificial Intelligence

RCA-403
Syllabus
UNIT-I INTRODUCTION: - Introduction to Artificial Intelligence, Foundations and
History of Artificial Intelligence, Applications of Artificial Intelligence, Intelligent
Agents, Structure of Intelligent Agents. Computer Vision, Natural Language
Processing.
UNIT-II INTRODUCTION TO SEARCH: - Searching for solutions, uninformed
search strategies, informed search strategies, local search algorithms and optimization
problems, Adversarial Search, Search for Games, Alpha-Beta pruning.
UNIT-III KNOWLEDGE REPRESENTATION & REASONING: - Propositional
logic, Theory of first order logic, Inference in First order logic, Forward &
Backward chaining, Resolution, Probabilistic reasoning, Utility theory, Hidden
Markov Models (HMM), Bayesian Networks.
Syllabus
UNIT-IV MACHINE LEARNING: - Supervised and unsupervised learning,
Decision trees, Statistical learning models, learning with complete data -
Naive Bayes models, Learning with hidden data – EM algorithm,
Reinforcement learning.

UNIT-V PATTERN RECOGNITION: - Introduction, Design principles of


pattern recognition system, Statistical Pattern recognition, Parameter estimation
methods - Principal Component Analysis (PCA) and Linear Discriminant Analysis
(LDA), Classification Techniques – Nearest Neighbor (NN) Rule, Bayes Classifier,
Support Vector Machine (SVM), K-means clustering.
UNIT-IV
Machine Learning

What is learning?
• Learning is the process of gathering information and knowledge from past
experience and data analysis, and applying this information to enhance system
performance.
• Learning represents changes in a system that enable it to perform the same
task more efficiently the next time.
Machine learning, a branch of artificial intelligence, concerns the construction and
study of systems that can learn from data. For example, a machine learning system
could be trained on email messages to learn to distinguish between spam and non-
spam messages. After learning, it can then be used to classify new email messages
into spam and non-spam folders.
Machine Learning

Types of Machine Learning


Machine Learning: Supervised ML

Supervised Machine learning?


Supervised learning is like learning with a supervisor or guide: the training dataset
acts as the supervisor that is used to train the machine. Supervised learning is where
you have input variables (X) and an output variable (Y), and you use an algorithm to
learn the mapping function from the input to the output:
Y = f(X)
The basic idea of supervised learning is that the training data provides "examples"
together with "outcomes", where each example is paired with its outcome. The goal is
to build a model which can predict the outcome for new instances. If the objective is
categorical, the model is a "classification" model, whereas if the objective is numeric,
the model is a "regression" model.
Machine Learning: Supervised ML
Based on the outcome/response or dependent variable, supervised learning
problems can be further divided into two different kinds:
Classification refers to taking an input value and mapping it to a discrete value. In
classification problems, the output typically consists of classes or categories. This
could be things like predicting which objects are present in an image (a cat/a dog) or
whether it is going to rain today or not. Typical algorithms: Decision Tree, Support
Vector Machine (SVM), discriminant analysis, Naive Bayes, K-Nearest Neighbors
(KNN).
Regression deals with continuous data (value functions). In regression, the predicted
output values are real numbers. It covers problems such as predicting the price of a
house or the trend in a stock price at a given time. Typical algorithms: Linear
regression, Support Vector Regression (SVR), decision trees, ensemble methods,
neural networks.
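To make the classification/regression distinction concrete, here is a minimal sketch assuming scikit-learn is available; the feature values, targets and model choices below are invented purely for illustration.

```python
# A minimal sketch contrasting classification (discrete output) with
# regression (continuous output) on invented supervised data.
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LinearRegression

# Classification: discrete labels (e.g., "rain" / "no rain")
X_cls = [[30, 80], [25, 60], [18, 90], [22, 70]]   # [temperature, humidity]
y_cls = ["no rain", "no rain", "rain", "no rain"]
clf = DecisionTreeClassifier().fit(X_cls, y_cls)
print(clf.predict([[20, 85]]))                     # -> a class label

# Regression: continuous target (e.g., house price)
X_reg = [[50], [80], [120], [200]]                 # area in square metres
y_reg = [100000, 160000, 230000, 390000]           # price
reg = LinearRegression().fit(X_reg, y_reg)
print(reg.predict([[100]]))                        # -> a real number
```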
Machine Learning: Unsupervised ML

In unsupervised learning, the training data provides "examples" but no specific
"outcome". The machine tries to find "interesting" patterns in the data, which can
then be labelled appropriately. Unsupervised learning is where you only have input
data (X) and no corresponding output variables. The goal of unsupervised learning is
to model the underlying structure or distribution of the data in order to learn more
about the data. It is called unsupervised learning because, unlike supervised learning
above, there are no correct answers and there is no supervisor.
Unsupervised learning algorithms: Clustering: K-means, K-medoids, hierarchical,
Gaussian mixture, neural networks, hidden Markov model.
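A minimal clustering sketch, assuming scikit-learn is available; the two-dimensional points are invented so that two natural groups exist and no labels are supplied.

```python
# Only inputs X are given, no labels; K-means finds two groups on its own.
from sklearn.cluster import KMeans

X = [[1.0, 1.1], [0.9, 1.0], [1.2, 0.8],   # one natural group
     [8.0, 8.2], [7.9, 8.1], [8.3, 7.8]]   # another natural group
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(km.labels_)           # cluster index assigned to each point
print(km.cluster_centers_)  # learned cluster centres
```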
Machine Learning: Unsupervised ML
Unsupervised Learning models can perform more complex tasks than Supervised
Learning models
Clustering is the type of Unsupervised Learning where we find hidden patterns in
the data based on their similarities or differences. These patterns can relate to the
shape, size, or color and are used to group data items or create clusters. There are
several types of clustering algorithms, such as exclusive, overlapping, hierarchical,
and probabilistic.
Association is the kind of Unsupervised Learning where we can find the
relationship of one data item to another data item. We can then use those
dependencies and map them in a way that benefits us—e.g., understanding
consumers' habits regarding our products can help us develop better cross-selling
strategies. The association rule is used to find the probability of co-occurrence of
items in a collection. These techniques are often utilized in customer behavior
analysis in e-commerce websites and OTT platforms.
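To make the co-occurrence idea concrete, here is a small hand-rolled sketch with invented transactions, computing the usual support and confidence measures for a hypothetical rule bread -> milk.

```python
# support(A -> B) = P(A and B); confidence(A -> B) = P(B | A).
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
n = len(transactions)
both = sum(1 for t in transactions if {"bread", "milk"} <= t)
bread = sum(1 for t in transactions if "bread" in t)
print("support(bread -> milk)    =", both / n)      # 2/4 = 0.5
print("confidence(bread -> milk) =", both / bread)  # 2/3 ≈ 0.67
```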
Supervised Learning | Unsupervised Learning
Input data is labelled | Input data is unlabelled
Has a feedback mechanism | Has no feedback mechanism
Data is classified based on the training dataset | Assigns properties to the given data in order to group/classify it
Divided into regression and classification | Divided into clustering and association
Used for prediction | Used for analysis
Algorithms include Decision Tree, Logistic Regression, Naïve Bayes, Support Vector Machine (SVM) | Algorithms include K-means clustering, hierarchical clustering, Apriori algorithm
A known number of classes | An unknown number of classes
Naïve Bayes’ Model
The Naïve Bayes algorithm learns the probability of an object with certain features
belonging to a particular group/class. It is named after the statistician and philosopher
Thomas Bayes, whose "Bayes' Theorem" is the basis of the Naïve Bayes model. More
formally, Bayes' Theorem is stated as the following equation:
P(A|B) = [P(B|A) * P(A)] / P(B)
Where,
• P(A|B): probability (conditional probability) of occurrence of event A given the
event B is true.
• P(A) and P(B): probabilities of occurrence of event A and B respectively.
• P(B|A): Probability of occurrence of event B given the event A is true.
Naïve Bayes' Model

The Naïve Bayes classifier belongs to the family of probabilistic classifiers based on
Bayes' theorem. It is called 'Naïve' because it makes a rigid independence assumption
between the input variables; therefore, it is more proper to call it Simple Bayes or
Independence Bayes.
Naïve Bayes' Model
Problem 1: Players will play if the weather is sunny. Is this statement correct?
Consider the new instance x' = {Outlook=Sunny, Temperature=Cool, Humidity=High,
Wind=Strong}.
Naïve Bayes’ Model

We also calculate P(ClassPlay=Yes) and P(ClassPlay=No).

P(ClassPlay=Yes|x’) = [P(Sunny|ClassPlay=Yes) × P(Cool|ClassPlay=Yes) ×


P(High|ClassPlay=Yes) × P(Strong|ClassPlay=Yes)] × P(ClassPlay=Yes)
= 2/9 × 3/9 × 3/9 × 3/9 × 9/14 = 0.0053
In the same manner, P(ClassPlay=No|x’) = 0.0205
Since P(ClassPlay=Yes|x') is less than P(ClassPlay=No|x'), we classify the new instance x'
as "No".
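A quick numeric check of this calculation. The conditional probabilities for the "No" class are not listed on the slide; the values used below are the standard play-tennis counts, stated here as an assumption.

```python
# Numeric check of Problem 1 (unnormalised posteriors).
# The "No"-class factors (3/5, 1/5, 4/5, 3/5, 5/14) are assumed from the
# standard play-tennis counts; they are not shown on the slide itself.
p_yes = (2/9) * (3/9) * (3/9) * (3/9) * (9/14)   # Sunny, Cool, High, Strong | Yes
p_no  = (3/5) * (1/5) * (4/5) * (3/5) * (5/14)   # Sunny, Cool, High, Strong | No
print(p_yes, p_no)   # ≈ 0.0053 and ≈ 0.0206, so x' is classified as "No"
```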
Sl. No. Color Legs Height Smelly Species
1 White 3 Short Yes M
2 Green 2 Tall No M
3 Green 3 Short Yes M
4 White 3 Short Yes M
5 Green 2 Short No H
6 White 2 Tall No H
7 White 2 Tall No H
8 White 2 Short Yes H
Problem 2: Using the above data, we have to identify the species of an entity with
the following attributes:
X={Color=Green, Legs=2, Height=Tall, Smelly=No}
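One way to work Problem 2 is to compute the unnormalised posteriors directly from the eight rows above. The sketch below does exactly that with raw frequency counts (no smoothing); the row encoding is a direct transcription of the table, and the variable names are illustrative.

```python
# Minimal Naive Bayes "by hand" for Problem 2, using the 8-row table above.
rows = [
    ("White", 3, "Short", "Yes", "M"),
    ("Green", 2, "Tall",  "No",  "M"),
    ("Green", 3, "Short", "Yes", "M"),
    ("White", 3, "Short", "Yes", "M"),
    ("Green", 2, "Short", "No",  "H"),
    ("White", 2, "Tall",  "No",  "H"),
    ("White", 2, "Tall",  "No",  "H"),
    ("White", 2, "Short", "Yes", "H"),
]
x = ("Green", 2, "Tall", "No")   # Color, Legs, Height, Smelly

def score(species):
    members = [r for r in rows if r[4] == species]
    prior = len(members) / len(rows)
    likelihood = 1.0
    for i, value in enumerate(x):
        likelihood *= sum(1 for r in members if r[i] == value) / len(members)
    return prior * likelihood

print(score("M"), score("H"))   # ≈ 0.0039 vs ≈ 0.0469 -> classified as species H
```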
Advantages:
1. Simplicity and Speed: Naive Bayes is a simple algorithm that is easy to
understand and implement. It is computationally efficient and scales well with the
size of the dataset.
2. Efficient with High-Dimensional Data: Naive Bayes performs well even when
the number of features is high, making it suitable for high-dimensional datasets.
3. Good with Categorical Data: It works well with categorical data, and it can
handle both discrete and continuous features.
4. Robust to Irrelevant Features: It is relatively robust to irrelevant features because
of the assumption of independence. Features that are conditionally independent of
the class do not impact each other's contribution to the classification.
5. Requires Less Training Data: Naive Bayes can perform well with small amounts
of training data, making it suitable for situations where the dataset is limited.
Disadvantages:
1. Assumption of Independence: Naive Bayes assumes that all predictors (features)
are independent of one another, which rarely happens in real life.
2. Sensitivity to Data Quality: Naive Bayes can be sensitive to the quality of the
training data. If the data contains misleading information or if the assumptions are
violated, it can lead to inaccurate results.
3. Zero Probability Problem: If a categorical variable in the test data has a category
that was not observed in the training data, the Naive Bayes algorithm assigns a
probability of zero and can't make predictions. This is known as the "zero probability
problem."
4. Difficulty Handling Continuous Variables: Naive Bayes assumes that numerical
features follow a Gaussian distribution, which may not always be the case. In such
situations, the algorithm may not perform as well with continuous variables.
5. Lack of Model Interpretability: While Naive Bayes is easy to understand, it may
lack the interpretability of more complex models. It provides probabilities of class
membership, but the reasoning behind these probabilities may not be as clear as in
some other models.
Decision Trees
Decision tree learning is one of the most successful techniques for supervised
classification learning. A decision tree takes as input an object or situation described
by a set of attributes and returns a decision; both the input attributes and the output
can be discrete or continuous. Learning a discrete-valued function is called
classification and learning a continuous-valued function is called regression, so
decision trees can be applied to both classification and regression.
Example: What to do this Weekend?
This tree consists of the following components:
• Questions/conditions are Nodes.
• Yes/No options represent Edges.
• End actions are the Leaves of the tree.
Learning using Decision Trees

Decision Trees are a type of


Supervised Machine Learning. A
decision tree is a tree where each -
• Node - a feature(attribute)
• Branch - a decision(rule)
• Leaf - an outcome(categorical
or continuous)
Common algorithms for building decision trees include
ID3, CART, and Random Forest.
ID3 Algorithm: The ID3 (Iterative Dichotomiser 3) algorithm was proposed by J. R.
Quinlan. The algorithm uses Entropy and Information Gain to build the tree. It is a
classification algorithm that follows a greedy approach, at each step selecting the
attribute that yields maximum Information Gain (IG), i.e. minimum Entropy (H).
Entropy is a measure of the amount of uncertainty in the dataset S. Information Gain
IG(A) tells us how much uncertainty in S was reduced after splitting set S on attribute A.
The steps in ID3 algorithm are as follows:
1. Calculate entropy for dataset.
2. For each attribute/feature.
2.1. Calculate entropy for all its categorical values.
2.2. Calculate information gain for the feature.
3. Find the feature with maximum information gain.
4. Repeat it until we get the desired tree.
How to calculate Entropy and Information Gain:

Calculate Entropy: For each attribute, the algorithm calculates the entropy of the
dataset based on the values of that attribute. Entropy is computed using the formula:
H(S) = - Σ p(i) * log2(p(i)), where p(i) is the proportion of examples in S belonging to class i.

Calculate the Information Gain for each attribute. Information Gain is a measure
of the effectiveness of an attribute in classifying the data. It is computed as the
difference between the entropy of the original dataset and the weighted average of
the entropies of the subsets created by splitting the data based on that attribute:
IG(S, A) = H(S) - Σ_v (|Sv| / |S|) * H(Sv), where the sum runs over the values v of attribute A.
Select Best Attribute: The algorithm selects the attribute with the highest
Information Gain as the attribute to split the dataset at that node. This process is
repeated recursively for each subset until a stopping condition is met, such as
reaching a certain depth or having subsets that are pure (i.e., all instances belong to
the same class).

By using entropy and information gain, the ID3 algorithm intelligently chooses how
to split the data at each node of the decision tree, resulting in a tree structure that
effectively classifies instances based on the given attributes.
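The hand calculations that follow can be reproduced with a short generic sketch of these two quantities; the function and variable names below are illustrative, not part of the original ID3 description.

```python
# Generic entropy and information-gain computations (pure Python, base-2 logs).
from collections import Counter
from math import log2

def entropy(labels):
    """H(S) = - sum_i p_i * log2(p_i) over the class proportions p_i."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, attribute_index):
    """IG(S, A) = H(S) - sum_v (|S_v| / |S|) * H(S_v)."""
    n = len(labels)
    subsets = {}
    for row, label in zip(rows, labels):
        subsets.setdefault(row[attribute_index], []).append(label)
    remainder = sum(len(s) / n * entropy(s) for s in subsets.values())
    return entropy(labels) - remainder
```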
Consider the weather dataset as given below:
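The calculations below match Quinlan's classic play-tennis data; a reconstruction of the dataset, consistent with the row numbers and class counts used in the rest of this walk-through, is:

Day  Outlook   Temperature  Humidity  Wind    Play
1    Sunny     Hot          High      Weak    No
2    Sunny     Hot          High      Strong  No
3    Overcast  Hot          High      Weak    Yes
4    Rain      Mild         High      Weak    Yes
5    Rain      Cool         Normal    Weak    Yes
6    Rain      Cool         Normal    Strong  No
7    Overcast  Cool         Normal    Strong  Yes
8    Sunny     Mild         High      Weak    No
9    Sunny     Cool         Normal    Weak    Yes
10   Rain      Mild         Normal    Weak    Yes
11   Sunny     Mild         Normal    Strong  Yes
12   Overcast  Mild         High      Strong  Yes
13   Overcast  Hot          Normal    Weak    Yes
14   Rain      Mild         High      Strong  No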
Complete entropy of dataset is:
H(S) = - p(yes) * log2(p(yes)) - p(no) * log2(p(no))
= - (9/14) * log2(9/14) - (5/14) * log2(5/14)
= - (-0.41) - (-0.53)
= 0.94
For each attribute of the dataset:
Categorical values of outlook attribute - sunny, overcast and rain
H(Outlook=sunny) = -(2/5)*log(2/5)-(3/5)*log(3/5) =0.971
H(Outlook=rain) = -(3/5)*log(3/5)-(2/5)*log(2/5) =0.971
H(Outlook=overcast) = -(4/4)*log(4/4)-0 = 0
Average Entropy Information for Outlook -
I(Outlook) = p(sunny) * H(Outlook=sunny) + p(rain) * H(Outlook=rain) +
p(overcast) * H(Outlook=overcast)
= (5/14)*0.971 + (5/14)*0.971 + (4/14)*0 = 0.693
Information Gain = H(S) - I(Outlook) = 0.94 - 0.693 = 0.247
Categorical values of temperature attribute - hot, mild, cool
H(Temperature=hot) = -(2/4)*log(2/4)-(2/4)*log(2/4) = 1
H(Temperature=cool) = -(3/4)*log(3/4)-(1/4)*log(1/4) = 0.811
H(Temperature=mild) = -(4/6)*log(4/6)-(2/6)*log(2/6) = 0.9179

Average Entropy Information for Temperature -


I(Temperature) = p(hot)*H(Temperature=hot) +
p(mild)*H(Temperature=mild) + p(cool)*H(Temperature=cool)
= (4/14)*1 + (6/14)*0.9179 + (4/14)*0.811 = 0.9108

Information Gain = H(S) - I(Temperature)


= 0.94 - 0.9108
= 0.0292
Categorical values of humidity attribute - high, normal
H(Humidity=high) = -(3/7)*log(3/7)-(4/7)*log(4/7) = 0.985
H(Humidity=normal) = -(6/7)*log(6/7)-(1/7)*log(1/7) = 0.592

Average Entropy Information for Humidity -

I(Humidity) = p(high)*H(Humidity=high) + p(normal)*H(Humidity=normal)
= (7/14)*0.985 + (7/14)*0.592
= 0.788

Information Gain = H(S) - I(Humidity)
= 0.94 - 0.788
= 0.152
Categorical values of windy attribute - weak, strong
H(Wind=weak) = -(6/8)*log(6/8)-(2/8)*log(2/8) = 0.811
H(Wind=strong) = -(3/6)*log(3/6)-(3/6)*log(3/6) = 1

Average Entropy Information for Wind -


I(Wind) = p(weak)*H(Wind=weak) + p(strong)*H(Wind=strong)
= (8/14)*0.811 + (6/14)*1
= 0.892

Information Gain = H(S) - I(Wind)


= 0.94 - 0.892
= 0.048
Here, the attribute with maximum information gain is Outlook. So, the decision
tree built so far –

Here, when Outlook = overcast, the subset is a pure class (Yes). Now, we have to
repeat the same procedure for the rows with Outlook = Sunny and then for the rows
with Outlook = Rain. First, find the best attribute for splitting the data with
Outlook = Sunny {Dataset rows = [1, 2, 8, 9, 11]}.
Complete entropy of Sunny is -
H(S) = - p(yes) * log2(p(yes)) - p(no) * log2(p(no))
= - (2/5) * log2(2/5) - (3/5) * log2(3/5) = 0.971
Categorical values of temperature attribute in terms of sunny - hot, mild, cool
H(Sunny, Temperature=hot) = -0-(2/2)*log(2/2) = 0
H(Sunny, Temperature=cool) = -(1)*log(1)- 0 = 0
H(Sunny, Temperature=mild) = -(1/2)*log(1/2)-(1/2)*log(1/2) = 1
Average Entropy Information for Temperature -
I(Sunny, Temperature) = p(Sunny, hot)*H(Sunny, Temperature=hot) +
p(Sunny, mild)*H(Sunny, Temperature=mild) + p(Sunny, cool)*H(Sunny,
Temperature=cool)
= (2/5)*0 + (1/5)*0 + (2/5)*1= 0.4
Information Gain = H(Sunny) - I(Sunny, Temperature)
= 0.971 - 0.4 = 0.571
Categorical values of humidity attribute in terms of sunny - high, normal
H(Sunny, Humidity=high) = - 0 - (3/3)*log(3/3) = 0
H(Sunny, Humidity=normal) = -(2/2)*log(2/2)-0 = 0

Average Entropy Information for Humidity -


I(Sunny, Humidity) = p(Sunny, high)*H(Sunny, Humidity=high) + p(Sunny,
normal)*H(Sunny, Humidity=normal)
= (3/5)*0 + (2/5)*0
=0
Information Gain = H(Sunny) - I(Sunny, Humidity)
= 0.971 - 0
= 0.971
Categorical values of windy attribute in terms of sunny - weak, strong
H(Sunny, Wind=weak) = -(1/3)*log(1/3)-(2/3)*log(2/3) = 0.918
H(Sunny, Wind=strong) = -(1/2)*log(1/2)-(1/2)*log(1/2) = 1

Average Entropy Information for Wind -


I(Sunny, Wind) = p(Sunny, weak)*H(Sunny, Wind=weak) + p(Sunny,
strong)*H(Sunny, Wind=strong)
= (3/5)*0.918 + (2/5)*1
= 0.9508

Information Gain = H(Sunny) - I(Sunny, Wind)


= 0.971 - 0.9508
= 0.0202
Here, the attribute with maximum information gain is Humidity. So, the decision
tree built so far -
Here, when Outlook = Sunny and Humidity = High, it is a pure class of category
"no". And When Outlook = Sunny and Humidity = Normal, it is again a pure class
of category "yes". Therefore, we don't need to do further calculations.
Now, finding the best attribute for splitting the data with Outlook=Rain values{
Dataset rows = [4, 5, 6, 10, 14]}.

Complete entropy of Rain is -


H(S) = - p(yes) * log2(p(yes)) - p(no) * log2(p(no))
= - (3/5) * log(3/5) - (2/5) * log(2/5)
= 0.971
Categorical values of temperature attribute in terms of rain - mild, cool
H(Rain, Temperature=cool) = -(1/2)*log(1/2)- (1/2)*log(1/2) = 1
H(Rain, Temperature=mild) = -(2/3)*log(2/3)-(1/3)*log(1/3) = 0.918

Average Entropy Information for Temperature -


I(Rain, Temperature) = p(Rain, mild)*H(Rain, Temperature=mild) + p(Rain,
cool)*H(Rain, Temperature=cool)
= (2/5)*1 + (3/5)*0.918
= 0.9508

Information Gain = H(Rain) - I(Rain, Temperature)


= 0.971 - 0.9508
= 0.0202
Categorical values of windy attribute in terms of rain - weak, strong
H(Rain, Wind=weak) = -(3/3)*log(3/3)-0 = 0
H(Rain, Wind=strong) = 0-(2/2)*log(2/2) = 0

Average Entropy Information for Wind -


I(Rain, Wind) = p(Rain, weak)*H(Rain, Wind=weak) + p(Rain, strong)*H(Rain,
Wind=strong)
= (3/5)*0 + (2/5)*0
=0

Information Gain = H(Rain) - I(Rain, Wind)


= 0.971 - 0
= 0.971
Here, the attribute with maximum information gain is Wind. So, the decision tree built
so far -

Here, when Outlook = Rain and Wind = Strong, it is a pure class of category "no". And
When Outlook = Rain and Wind = Weak, it is again a pure class of category "yes".
And this is our final desired tree for the given dataset.
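In plain text, the final tree described above looks like this (a sketch of the figure):

Outlook?
 ├─ Sunny    -> Humidity?
 │               ├─ High   -> No
 │               └─ Normal -> Yes
 ├─ Overcast -> Yes
 └─ Rain     -> Wind?
                 ├─ Strong -> No
                 └─ Weak   -> Yes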
Advantages of Decision Tree Algorithm

1. Compared to other algorithms, decision trees require less effort for data
preparation during pre-processing.
2. A decision tree does not require normalization of data.
3. A decision tree does not require scaling of data as well.
4. Missing values in the data also do NOT affect the process of building a
decision tree to any considerable extent.
5. A Decision tree model is very intuitive and easy to explain to technical teams
as well as stakeholders.
Disadvantage of Decision Tree Algorithm

1. A small change in the data can cause a large change in the structure of the
decision tree, causing instability.
2. For a decision tree, the calculations can sometimes become far more complex
compared to other algorithms.
3. Decision trees often take more time to train the model.
4. Decision tree training is relatively expensive, as the complexity and the time
taken are higher.
5. The basic Decision Tree algorithm is less well suited to regression and to
predicting continuous values.
Statistical learning models
Role of Statistics in Machine Learning:
• Constructing machine learning models. Statistics provides the methodologies and
principles for creating models in machine learning. For instance, the linear regression
model leverages the statistical method of least squares to estimate the coefficients (see
the sketch after this list).
• Interpreting results. Statistical concepts allow us to interpret the results generated by
machine learning models. Measures such as p-value, confidence intervals, R-squared,
and others provide us with a statistical perspective on the machine learning model’s
performance.
• Validating models. Statistical techniques are essential for validating and refining the
machine learning models. For instance, techniques like hypothesis testing, cross-
validation, and bootstrapping help us quantify the performance of models and avoid
problems like overfitting.
• Underpinning advanced techniques. Even some of the more complex machine
learning algorithms, such as Neural Networks, have statistical principles at their core.
The optimization techniques, like gradient descent, used to train these models are
based on statistical theory.
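As a small illustration of the least-squares point above, here is a sketch that fits a straight line to invented data with numpy; the data values and variable names are made up for the example.

```python
# Fitting y = a*x + b by least squares with numpy's lstsq solver.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])          # roughly y = 2x
A = np.column_stack([x, np.ones_like(x)])        # design matrix [x, 1]
(a, b), *_ = np.linalg.lstsq(A, y, rcond=None)   # minimises ||A·[a, b] - y||^2
print(a, b)                                      # slope ≈ 2, intercept ≈ 0
```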
S. No. | Machine Learning | Statistical Learning
1 | Subfield of Artificial Intelligence | Subfield of Mathematics
2 | Uses algorithms | Uses equations
3 | Requires minimum human effort; is automated | Requires a lot of human effort
4 | Can learn from large datasets | Deals with smaller datasets
5 | Has strong predictive abilities | Gives a best estimate – provides some insight into one thing, but is of little or no help in prediction
6 | Makes predictions | Makes inferences
7 | Learns from data and discovers patterns | Learns from samples, populations and hypotheses
Statistical learning models
• Statistical learning theory is a framework for machine learning drawing from the
fields of statistics and functional analysis. Statistical learning theory deals with the
problem of finding a predictive function based on data.
• Statistical learning focuses on calculating the probabilities of each hypothesis and
making predictions accordingly.
• Statistical learning theory has led to successful applications in fields such as
computer vision, speech recognition, bioinformatics etc.
• Maximum likelihood estimation (MLE) is a method of estimating the parameters
of a statistical model so the observed data is most probable. MLE attempts to find
the parameter values that maximize the likelihood function, given the
observations. The resulting estimate is called a maximum likelihood estimate,
which is also abbreviated as MLE.
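A tiny numeric illustration of the MLE idea with invented coin-flip data: the grid search below simply confirms that, for a Bernoulli model, the likelihood is maximised at the sample proportion of heads.

```python
# MLE for a Bernoulli(p) model via a simple grid search over p.
import numpy as np

flips = np.array([1, 0, 1, 1, 0, 1, 1, 1, 0, 1])   # 1 = heads (invented data)
p_grid = np.linspace(0.01, 0.99, 99)
heads, n = flips.sum(), len(flips)
log_likelihood = heads * np.log(p_grid) + (n - heads) * np.log(1 - p_grid)
p_hat = p_grid[np.argmax(log_likelihood)]
print(p_hat, flips.mean())   # both ≈ 0.7: the MLE equals the sample proportion
```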
Machine Learning: Reinforcement ML

Reinforcement Learning is a subset of machine learning algorithms that learns by


exploring its environment. An RL agent learns by interacting with its environment:
the agent receives rewards for performing correctly and penalties for performing
incorrectly. Reinforcement Learning is a type of Machine Learning which allows
machines and software agents to automatically determine the ideal behaviour within a
specific context, in order to maximize performance. Reinforcement algorithms
are not given explicit goals; instead, they are forced to learn these optimal goals by
trial and error. Algorithms of RL: Q-learning, DQN.
Applications of RL are Robot Navigation,
Game Theory (Backgammon), Intelligent
tutoring system etc.
Machine Learning: Reinforcement ML
Components of Reinforcement Learning
1. Agent: Agent is the part of RL which takes actions, receives rewards for actions
and gets a new environment state as a result of the action taken. In the cycling
analogy, the agent is a human brain that decides what action to take and gets
rewarded (falling is negative and riding is positive).
2. Environment: The environment represents the outside world (only relevant part
of the world which the agent needs to know about to take actions) that interacts
with agents. In the cycling analogy, the environment is the cycling track and the
objects as seen by the rider.
3. State: State is the condition or position in which the agent is currently exhibiting
or residing. In the cycling analogy, it will be the speed of cycle, tilting of the
handle, tilting of the cycle, etc.
Machine Learning: Reinforcement ML

Components of Reinforcement Learning


4. Action: What the agent does while interacting with the environment is referred
to as action. In the cycling analogy, it will be to peddle harder (if the decision is
to increase speed), apply brakes (if the decision is to reduce speed), tilt handle,
tilt body, etc.
5. Rewards: The reward is an indicator to the agent of how good or bad the action
taken was. In the cycling analogy, it can be +1 for not falling, -10 for hitting
obstacles and -100 for falling; the rewards for these outcomes (+1, -10, -100) are
defined while building the RL agent. Since the agent wants to maximize rewards,
it avoids hitting obstacles and always tries to avoid falling (see the sketch after
this list).
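Putting these components together, the sketch below shows the tabular Q-learning update such an agent could use; the state names, actions and reward value are invented to match the cycling analogy.

```python
# Tabular Q-learning: Q(s, a) is nudged toward reward + gamma * max_a' Q(s', a').
from collections import defaultdict
import random

alpha, gamma, epsilon = 0.1, 0.9, 0.2          # learning rate, discount, exploration
actions = ["pedal_harder", "brake", "tilt_left", "tilt_right"]
Q = defaultdict(float)                          # Q[(state, action)] -> estimated value

def choose_action(state):
    if random.random() < epsilon:               # explore occasionally
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])   # otherwise act greedily

def update(state, action, reward, next_state):
    best_next = max(Q[(next_state, a)] for a in actions)
    Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])

# one illustrative transition: riding steadily, pedalling harder, not falling (+1)
update(state="steady", action="pedal_harder", reward=+1, next_state="steady")
print(Q[("steady", "pedal_harder")])            # 0.1 after the first update
```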
Machine Learning: Reinforcement ML

When Not to Use Reinforcement Learning?


We cannot apply a reinforcement learning model in every situation. Here are some
conditions under which you should not use a reinforcement learning model:
• When you have enough data to solve the problem with a supervised learning
method.
• Remember that Reinforcement Learning is computing-heavy and time-consuming,
in particular when the action space is large.
Machine Learning: Reinforcement vs. Supervised ML
Machine Learning: Reinforcement ML

Challenges of Reinforcement Learning


Major challenges you will face while doing Reinforcement Learning:
• Feature/reward design, which can be very involved.
• Parameters may affect the speed of learning.
• Realistic environments can have partial observability.
• Too much Reinforcement may lead to an overload of states which can
diminish the results.
• Realistic environments can be non-stationary.
Learning with hidden data - EM algorithm
• Initially, a set of initial values of the parameters are considered. A set of
incomplete observed data is given to the system with the assumption that the
observed data comes from a specific model.
• The next step is “Expectation” – step or E-step. In this step, we use the observed
data in order to estimate or guess the values of the missing or incomplete data. It is
basically used to update the variables.
• The next step is known as “Maximization” – step or M-step. In this step, we use
the complete data generated in the preceding “Expectation” – step in order to
update the values of the parameters. It is basically used to update the hypothesis.
• In the fourth step, it is checked whether the values are converging or not. If yes, then
stop; otherwise repeat step 2 and step 3, i.e. the "Expectation" step and the
"Maximization" step, until convergence occurs.
Learning with hidden data - EM algorithm
Algorithm:
1. Given a set of incomplete data, consider
a set of starting parameters
2. Expectation step (E-step): Using the
observed available data of the dataset,
estimate (guess) the values of the
missing data.
3. Maximization step (M-step): Complete
data generated after the expectation E
step is used in order to update the
parameters
4. Repeat step 2 and step 3 until
convergence (a numeric sketch of this loop is given below)
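A compact numeric sketch of these four steps for a one-dimensional mixture of two Gaussians; the data are synthetic, equal and known variances are assumed for brevity, and the parameter names are illustrative.

```python
# EM for a 1-D two-component Gaussian mixture (means and weights only).
import numpy as np

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0, 1, 200), rng.normal(5, 1, 200)])  # hidden labels unknown

mu = np.array([1.0, 4.0])      # initial guesses for the two means
pi = np.array([0.5, 0.5])      # initial mixing weights
sigma = 1.0                    # fixed, known standard deviation (simplification)

for _ in range(50):
    # E-step: responsibility of each component for each point
    dens = np.exp(-0.5 * ((x[:, None] - mu) / sigma) ** 2) * pi
    resp = dens / dens.sum(axis=1, keepdims=True)
    # M-step: re-estimate parameters from the "completed" data
    nk = resp.sum(axis=0)
    mu = (resp * x[:, None]).sum(axis=0) / nk
    pi = nk / len(x)

print(mu, pi)   # means close to 0 and 5, weights close to 0.5 each
```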
Learning with hidden data - EM algorithm
Advantages:
• Used to fill the missing data in a sample
• Used as the basis of unsupervised learning of clusters
• Used for discovering the values of latent variables
• Guaranteed the likelihood will increase with each iteration
• Easy to implement E-step and M-step
Disadvantages
• Slow convergence
• Convergence to local optima only
• Requires both forward and backward probabilities
End of UNIT-IV
