AI Unit-4
RCA-403
Syllabus
UNIT-I INTRODUCTION: - Introduction to Artificial Intelligence, Foundations and
History of Artificial Intelligence, Applications of Artificial Intelligence, Intelligent
Agents, Structure of Intelligent Agents. Computer vision, Natural Language
Processing.
UNIT-II INTRODUCTION TO SEARCH: - Searching for solutions, uninformed
search strategies, informed search strategies, Local search algorithms and optimization
problems, Adversarial Search, Search for Games, Alpha-Beta pruning.
UNIT-III KNOWLEDGE REPRESENTATION & REASONING: - Propositional
logic, Theory of first order logic, Inference in First order logic, Forward &
Backward chaining, Resolution, Probabilistic reasoning, Utility theory, Hidden
Markov Models (HMM), Bayesian Networks.
UNIT-IV MACHINE LEARNING: - Supervised and unsupervised learning,
Decision trees, Statistical learning models, learning with complete data -
Naive Bayes models, Learning with hidden data – EM algorithm,
Reinforcement learning.
What is learning?
• Learning is the process of gathering information and knowledge from past
experience and data analysis, and applying this information to enhance the system's
performance.
• Learning represents changes in a system that enable it to perform the same task
more efficiently the next time.
Machine learning, a branch of artificial intelligence, concerns the construction and
study of systems that can learn from data. For example, a machine learning system
could be trained on email messages to learn to distinguish between spam and non-
spam messages. After learning, it can then be used to classify new email messages
into spam and non-spam folders.
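As a concrete sketch of the spam example just described (scikit-learn is an assumed
dependency, and the toy messages and labels are invented for illustration):

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Tiny labelled training set: 1 = spam, 0 = non-spam
messages = [
    "win a free prize now",
    "limited offer click here",
    "meeting moved to monday",
    "please review the attached report",
]
labels = [1, 1, 0, 0]

# Bag-of-words features, then a Naive Bayes text classifier
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(messages)
model = MultinomialNB()
model.fit(X, labels)

# Classify a new, unseen message
new = vectorizer.transform(["click here to win a prize"])
print(model.predict(new))  # -> [1], i.e. spam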
Machine Learning
Calculate Entropy: For each attribute, the algorithm calculates the entropy of the
dataset based on the values of that attribute. Entropy is computed using the formula:
H(S) = - Σ p(i) * log2(p(i))
where p(i) is the proportion of instances in the dataset belonging to class i.
Calculate Information Gain: For each attribute, the algorithm calculates the
Information Gain, a measure of the effectiveness of the attribute in classifying the
data. It is computed as the difference between the entropy of the original dataset and
the weighted average of the entropies of the subsets created by splitting the data
based on that attribute:
Gain(S, A) = H(S) - Σv (|Sv| / |S|) * H(Sv)
where Sv is the subset of instances for which attribute A has value v.
Select Best Attribute: The algorithm selects the attribute with the highest
Information Gain as the attribute to split the dataset at that node. This process is
repeated recursively for each subset until a stopping condition is met, such as
reaching a certain depth or having subsets that are pure (i.e., all instances belong to
the same class).
By using entropy and information gain, the ID3 algorithm intelligently chooses how
to split the data at each node of the decision tree, resulting in a tree structure that
effectively classifies instances based on the given attributes.
Consider the weather (PlayTennis) dataset given below:
Day | Outlook | Temperature | Humidity | Wind | Play
D1 | Sunny | Hot | High | Weak | No
D2 | Sunny | Hot | High | Strong | No
D3 | Overcast | Hot | High | Weak | Yes
D4 | Rain | Mild | High | Weak | Yes
D5 | Rain | Cool | Normal | Weak | Yes
D6 | Rain | Cool | Normal | Strong | No
D7 | Overcast | Cool | Normal | Strong | Yes
D8 | Sunny | Mild | High | Weak | No
D9 | Sunny | Cool | Normal | Weak | Yes
D10 | Rain | Mild | Normal | Weak | Yes
D11 | Sunny | Mild | Normal | Strong | Yes
D12 | Overcast | Mild | High | Strong | Yes
D13 | Overcast | Hot | Normal | Weak | Yes
D14 | Rain | Mild | High | Strong | No
The complete entropy of the dataset is:
H(S) = - p(yes) * log2(p(yes)) - p(no) * log2(p(no))
= - (9/14) * log2(9/14) - (5/14) * log2(5/14)
= 0.41 + 0.53
= 0.94
For each attribute of the dataset:
Categorical values of the Outlook attribute - Sunny, Overcast and Rain
H(Outlook=Sunny) = -(2/5)*log2(2/5) - (3/5)*log2(3/5) = 0.971
H(Outlook=Rain) = -(3/5)*log2(3/5) - (2/5)*log2(2/5) = 0.971
H(Outlook=Overcast) = -(4/4)*log2(4/4) - 0 = 0
Average Entropy Information for Outlook -
I(Outlook) = p(sunny) * H(Outlook=sunny) + p(rain) * H(Outlook=rain) +
p(overcast) * H(Outlook=overcast)
= (5/14)*0.971 + (5/14)*0.971 + (4/14)*0 = 0.693
Information Gain = H(S) - I(Outlook) = 0.94 - 0.693 = 0.247
Categorical values of the Temperature attribute - Hot, Mild, Cool
H(Temperature=Hot) = -(2/4)*log2(2/4) - (2/4)*log2(2/4) = 1
H(Temperature=Cool) = -(3/4)*log2(3/4) - (1/4)*log2(1/4) = 0.811
H(Temperature=Mild) = -(4/6)*log2(4/6) - (2/6)*log2(2/6) = 0.918
Average Entropy Information for Temperature -
I(Temperature) = (4/14)*1 + (4/14)*0.811 + (6/14)*0.918 = 0.911
Information Gain = H(S) - I(Temperature) = 0.94 - 0.911 = 0.029
Computing the remaining attributes the same way gives Information Gain = 0.152 for
Humidity and 0.048 for Wind. Outlook has the highest Information Gain (0.247), so it
is selected as the root node, and the algorithm recurses on each of its branches: the
Overcast branch is pure (all "yes"), while the Sunny subset is split further on
Humidity and the Rain subset on Wind.
Here, when Outlook = Rain and Wind = Strong, it is a pure class of category "no", and
when Outlook = Rain and Wind = Weak, it is again a pure class of category "yes".
Likewise, under Outlook = Sunny, Humidity = High gives "no" and Humidity = Normal
gives "yes". This is our final decision tree for the given dataset:
Outlook
├── Sunny → Humidity (High → No, Normal → Yes)
├── Overcast → Yes
└── Rain → Wind (Strong → No, Weak → Yes)
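The hand calculations above can be reproduced with a short Python sketch. This is a
minimal implementation of the entropy and Information Gain formulas (the helper names
entropy and info_gain are our own), run on the weather dataset from the table:

import math
from collections import Counter

# PlayTennis dataset from the table above: (Outlook, Temperature, Humidity, Wind, Play)
rows = [
    ("Sunny", "Hot", "High", "Weak", "No"),
    ("Sunny", "Hot", "High", "Strong", "No"),
    ("Overcast", "Hot", "High", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Strong", "No"),
    ("Overcast", "Cool", "Normal", "Strong", "Yes"),
    ("Sunny", "Mild", "High", "Weak", "No"),
    ("Sunny", "Cool", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "Normal", "Weak", "Yes"),
    ("Sunny", "Mild", "Normal", "Strong", "Yes"),
    ("Overcast", "Mild", "High", "Strong", "Yes"),
    ("Overcast", "Hot", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Strong", "No"),
]
attrs = ["Outlook", "Temperature", "Humidity", "Wind"]
data = [dict(zip(attrs + ["Play"], r)) for r in rows]

def entropy(examples):
    """H(S) = -sum_i p(i) * log2(p(i)) over the class labels."""
    counts = Counter(ex["Play"] for ex in examples)
    n = len(examples)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def info_gain(examples, attribute):
    """Gain(S, A) = H(S) - sum_v (|Sv|/|S|) * H(Sv)."""
    n = len(examples)
    remainder = 0.0
    for v in {ex[attribute] for ex in examples}:
        subset = [ex for ex in examples if ex[attribute] == v]
        remainder += (len(subset) / n) * entropy(subset)
    return entropy(examples) - remainder

print(round(entropy(data), 3))              # 0.94
for a in attrs:
    print(a, round(info_gain(data, a), 3))  # Outlook is highest: 0.247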
Advantages of Decision Tree Algorithm
1. Compared to other algorithms, a decision tree requires less effort for data
preparation during pre-processing.
2. A decision tree does not require normalization of data.
3. A decision tree does not require scaling of data either (see the sketch after this
list).
4. Missing values in the data also do NOT affect the process of building a decision
tree to any considerable extent.
5. A decision tree model is very intuitive and easy to explain to technical teams as
well as to stakeholders.
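Points 2 and 3 can be seen directly in practice. Below is a minimal scikit-learn
sketch (scikit-learn is an assumed dependency) that fits a tree on the built-in Iris
dataset, passing raw, unscaled feature values straight to the classifier:

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Load raw features; no normalization or scaling step is applied
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Each split compares a single feature against a threshold, so the
# absolute scale of the features does not matter
tree = DecisionTreeClassifier(criterion="entropy", max_depth=3, random_state=0)
tree.fit(X_train, y_train)
print(tree.score(X_test, y_test))  # accuracy on the held-out test split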
Disadvantages of Decision Tree Algorithm
1. A small change in the data can cause a large change in the structure of the
decision tree, making it unstable.
2. The calculations in a decision tree can become far more complex than in other
algorithms.
3. Training a decision tree often takes more time.
4. Decision tree training is relatively expensive because of its higher complexity
and longer training time.
5. The basic decision tree algorithm is inadequate for regression, i.e., for
predicting continuous values.
Statistical Learning Models
Role of Statistics in Machine Learning:
• Constructing machine learning models. Statistics provides the methodologies and
principles for creating models in machine learning. For instance, the linear regression
model leverages the statistical method of least squares to estimate its coefficients
(see the sketch after this list).
• Interpreting results. Statistical concepts allow us to interpret the results generated by
machine learning models. Measures such as p-values, confidence intervals, R-squared,
and others provide a statistical perspective on a machine learning model's
performance.
• Validating models. Statistical techniques are essential for validating and refining the
machine learning models. For instance, techniques like hypothesis testing, cross-
validation, and bootstrapping help us quantify the performance of models and avoid
problems like overfitting.
• Underpinning advanced techniques. Even some of the more complex machine
learning algorithms, such as Neural Networks, have statistical principles at their core.
The optimization techniques, like gradient descent, used to train these models are
based on statistical theory.
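To make the least-squares and R-squared points above concrete, here is a minimal
NumPy sketch on invented data:

import numpy as np

# Invented toy data: y ≈ 2x + 1 plus Gaussian noise
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 2 * x + 1 + rng.normal(scale=0.5, size=x.size)

# Least squares chooses the coefficients that minimize the sum of squared residuals
X = np.column_stack([x, np.ones_like(x)])  # design matrix [x, 1]
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
slope, intercept = coef
print(slope, intercept)                    # close to 2 and 1

# R-squared: the proportion of variance in y explained by the fit
y_hat = X @ coef
r2 = 1 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)
print(r2)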
S. No. | Machine Learning | Statistical Learning
1 | Subfield of Artificial Intelligence | Subfield of Mathematics
2 | Uses algorithms | Uses equations
3 | Requires minimal human effort; largely automated | Requires a lot of human effort
4 | Can learn from large datasets | Deals with smaller datasets
5 | Has strong predictive abilities | Gives a best estimate: yields some insight into one thing, but is of little or no help in prediction
6 | Makes predictions | Makes inferences
7 | Learns from data and discovers patterns | Learns from samples, populations and hypotheses
Statistical Learning Models
• Statistical learning theory is a framework for machine learning drawing from the
fields of statistics and functional analysis. Statistical learning theory deals with the
problem of finding a predictive function based on data.
• Statistical learning focuses on calculating the probabilities of each hypothesis and
making predictions accordingly.
• Statistical learning theory has led to successful applications in fields such as
computer vision, speech recognition, bioinformatics etc.
• Maximum likelihood estimation (MLE) is a method of estimating the parameters
of a statistical model so that the observed data is most probable. MLE attempts to
find the parameter values that maximize the likelihood function, given the
observations. The resulting estimate is called a maximum likelihood estimate,
also abbreviated as MLE.
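As a small numerical sketch of MLE, assume a Bernoulli (coin-flip) model with
invented data; scanning the log-likelihood over candidate parameter values recovers
the closed-form Bernoulli MLE, the sample mean:

import numpy as np

# Invented data: 100 flips of a coin with (unknown to the estimator) bias 0.7
rng = np.random.default_rng(42)
flips = rng.binomial(1, 0.7, size=100)   # 1 = heads, 0 = tails

# Bernoulli log-likelihood of parameter theta given observations x:
# log L(theta) = sum_i [ x_i*log(theta) + (1 - x_i)*log(1 - theta) ]
def log_likelihood(theta, x):
    return np.sum(x * np.log(theta) + (1 - x) * np.log(1 - theta))

# Scan candidate parameters and keep the one that maximizes the likelihood
grid = np.linspace(0.01, 0.99, 99)
mle = max(grid, key=lambda t: log_likelihood(t, flips))

print(mle)           # numerically found maximum likelihood estimate
print(flips.mean())  # closed-form Bernoulli MLE: the sample mean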
Machine Learning: Reinforcement Learning