AI Notes Module - 4
LEARNING
WHAT IS LEARNING
“Learning denotes changes in a system that enable the system to do the same task more efficiently
the next time.”
Machine Learning
Definition
A computer program is said to learn from experience E with respect to some class of tasks T and
performance measure P, if its performance at tasks in T, as measured by P, improves with
experience E.
Performance Element:
The performance element is the agent that acts in the world .It percepts and decides on external
actions.
Learning Element:
It responsible for making improvements, takes knowledge about performance element and some
feedback ,determines how to modify performance element.
Critic:
It tells the learning element how agent is doing by comparing with the fixed standard of
performance.
Problem Generator:
This component suggests problems or actions that will generate new examples or experience that
helps the system to train further.
Learning Paradigm:
• Rote learning
• Induction
• Clustering
• Analogy
• Discovery
• Genetic algorithms
• Reinforcement
ROTE LEARNING
The rote learning technique avoids understanding the inner complexities and instead focuses on
memorizing the material so that it can be recalled by the learner exactly the way it was read or heard.
Learning by memorization: avoids understanding the inner complexities of the subject that
is being learned.
Learning by repetition: saying the same thing repeatedly and trying to remember how to
say it; it does not help one understand, it helps one remember, as when we learn a poem, a song, etc.
Learning from examples can be:
· Supervised
· Unsupervised
Supervised learning: (The machine has access to a teacher who corrects it.)
Supervised learning is the machine learning task of inferring a function from labeled training data.
The training data consist of a set of training examples. In supervised learning, each example is a
pair consisting of an input object (typically a vector) and a desired output value (also called the
supervisory signal). Example: face recognition.
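As a minimal sketch of the supervised setting (assuming scikit-learn is available; the height/weight features and labels below are invented for illustration):

# Supervised learning: infer a function from labeled (input, output) pairs.
from sklearn.neighbors import KNeighborsClassifier

# Toy labeled training data: feature vectors and their desired outputs.
X_train = [[170, 65], [180, 80], [160, 50], [175, 75]]  # e.g. height, weight
y_train = ["adult", "adult", "child", "adult"]          # supervisory signal

clf = KNeighborsClassifier(n_neighbors=3)
clf.fit(X_train, y_train)          # learn from experience E (the examples)
print(clf.predict([[165, 55]]))    # predict the label of an unseen input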
Unsupervised Learning: (No access to a teacher. Instead, the machine must search for
“order” and “structure” in the environment.)
Since no desired output is provided in this case, categorization is done so that the
algorithm correctly differentiates between, say, the face of a horse, a cat, or a human
(clustering of data).
Clustering:
In clustering or unsupervised learning, the target features are not given in the training
examples. The aim is to construct a natural classification that can be used to cluster the data. The
general idea behind clustering is to partition the examples into clusters or classes. Each class
predicts feature values for the examples in the class. Each clustering has a prediction error on the
predictions. The best clustering is the one that minimizes the error.
Example: An intelligent tutoring system may want to cluster students' learning behavior so that
strategies that work for one member of a class may work for other members.
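A minimal clustering sketch (again assuming scikit-learn; the two behaviour features are hypothetical). No target labels are given; k-means partitions the examples into k clusters by minimizing the squared distance of each point to its cluster centre:

# Unsupervised learning: find structure in unlabeled data.
from sklearn.cluster import KMeans

# Unlabeled data, e.g. two features of each student's learning behaviour.
X = [[1, 2], [1, 4], [1, 0], [10, 2], [10, 4], [10, 0]]

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(km.labels_)            # the cluster assigned to each example
print(km.cluster_centers_)   # the feature values each cluster predicts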
Reinforcement Learning:
Imagine a robot that can act in a world, receiving rewards and punishments and determining
from these what it should do. This is the problem of reinforcement
learning.Most Reinforcement Learning research is conducted with in the mathematical
framework of Markov Decision Process.
LEARNING BY TAKING ADVICE
The idea of advice taking in AI based learning was proposed as early as 1958 (McCarthy).
However very few attempts were made in creating such systems until the late 1970s. Expert
systems providing a major impetus in this area.
The Q-learning update rule used in reinforcement learning is:
Q(s_t, a_t) ← Q(s_t, a_t) + α_t(s_t, a_t) [r_t + γ max_a Q(s_{t+1}, a) - Q(s_t, a_t)]
• where r_t is the real reward at time t, α_t(s, a) is the learning rate such that 0 ≤ α_t(s, a) ≤ 1, and γ is
the discount factor such that 0 ≤ γ < 1.
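A small tabular sketch of this update in Python (the two-action setting and the single illustrative transition below are invented; a real agent would loop over many steps of an actual environment):

from collections import defaultdict

alpha, gamma = 0.1, 0.9          # learning rate and discount factor
actions = ["left", "right"]
Q = defaultdict(float)           # Q[(state, action)], initially all zero

def update(s, a, r, s_next):
    # Q(s,a) <- Q(s,a) + alpha * [r + gamma * max_a' Q(s',a') - Q(s,a)]
    best_next = max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])

# One illustrative update: in state 0, action "right" earned reward 1.0
# and led to state 1.
update(0, "right", 1.0, 1)
print(Q[(0, "right")])           # 0.1 after a single update from zero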
INDUCTIVE LEARNING
• In the following fig-a, points (x, y) are given in the plane such that y = ƒ(x), and the task is to find
a function h(x) that fits the points well.
• In fig-b, a piecewise-linear 'h' function is given, while fig-c shows a more complicated 'h'
function.
• Both functions agree with the example points, but differ in the values of 'y' they assign to
other x inputs.
• As shown in fig-d, we have a function that apparently ignores one of the example points, but
fits the others with a simple function. The true ƒ is unknown, so there are many choices for h, but
without further knowledge, we have no way to prefer (b), (c), or (d).
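A brief sketch of this hypothesis-choice problem (assuming NumPy; the sample points are invented): several functions h can agree with the training points yet disagree everywhere else.

import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([0.0, 1.1, 1.9, 3.2, 3.9])    # samples of an unknown f

h_simple  = np.polyfit(x, y, deg=1)  # a simple straight-line hypothesis
h_complex = np.polyfit(x, y, deg=4)  # a degree-4 curve through every point

# Both fit the training data, but predict different values elsewhere:
x_new = 5.0
print(np.polyval(h_simple, x_new), np.polyval(h_complex, x_new))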
WINSTON LEARNING PROGRAM
A Blocks World Learning Example -- Winston (1975)
1. Select one known instance of the concept. Call this the concept definition.
2. Examine definitions of other known instances of the concept. Generalise the definition to
include them.
3. Examine descriptions of near misses. Restrict the definition to exclude these.
Both steps 2 and 3 rely on comparison and both similarities and differences need to be identified.
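A toy sketch of this generalise/restrict loop, where each instance is reduced to a flat set of attribute facts (a large simplification of Winston's structural blocks-world descriptions; the arch facts below are invented):

def learn(positives, near_misses):
    definition = set(positives[0])          # step 1: one known instance
    for example in positives[1:]:           # step 2: generalise,
        definition &= set(example)          #   keeping only the shared facts
    required = set()
    for miss in near_misses:                # step 3: restrict on near misses;
        required |= definition - set(miss)  #   facts the miss lacks become MUSTs
    return definition, required

arches = [{"has_posts", "has_lintel", "posts_apart"},
          {"has_posts", "has_lintel", "posts_apart", "lintel_is_wedge"}]
near_miss = [{"has_posts", "posts_apart"}]   # no lintel: not an arch
print(learn(arches, near_miss))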
DECISION TREES
o Decision Tree is a Supervised learning technique that can be used for both
Classification and Regression problems, but mostly it is preferred for solving
Classification problems. It is a tree-structured classifier, where internal nodes represent
the features of a dataset, branches represent the decision rules, and each leaf node
represents the outcome.
o In a Decision tree, there are two types of nodes: the Decision Node and the Leaf
Node. Decision nodes are used to make any decision and have multiple branches,
whereas Leaf nodes are the outputs of those decisions and do not contain any further
branches.
o The decisions or the test are performed on the basis of features of the given dataset.
o It is a graphical representation for getting all the possible solutions to a
problem/decision based on given conditions.
o It is called a decision tree because, similar to a tree, it starts with the root node, which
expands on further branches and constructs a tree-like structure.
o In order to build a tree, we use the CART algorithm, which stands for Classification
and Regression Tree algorithm.
o A decision tree simply asks a question and, based on the answer (Yes/No), further splits
the tree into subtrees.
o The below diagram explains the general structure of a decision tree:
Why do we use Decision Trees?
There are various algorithms in Machine learning, so choosing the best algorithm for the given
dataset and problem is the main point to remember while creating a machine learning model.
Below are the two reasons for using the Decision tree:
o Decision Trees usually mimic human thinking ability while making a decision, so it is
easy to understand.
o The logic behind the decision tree can be easily understood because it shows a tree-like
structure.
Decision Tree Terminologies:
• Root Node: The root node is where the decision tree starts. It represents the entire
dataset, which further gets divided into two or more homogeneous sets.
• Leaf Node: Leaf nodes are the final output nodes; the tree cannot be segregated
further after reaching a leaf node.
• Splitting: Splitting is the process of dividing the decision node/root node into sub-nodes
according to the given conditions.
• Branch/Sub Tree: A subtree formed by splitting the tree.
• Pruning: Pruning is the process of removing the unwanted branches from the tree.
• Parent/Child node: The root node of the tree is called the parent node, and other nodes
are called the child nodes.
In a decision tree, to predict the class of a given record, the algorithm starts from the root
node of the tree. It compares the value of the root attribute with the corresponding attribute of the
(real dataset) record and, based on the comparison, follows a branch and jumps to the next node.
For the next node, the algorithm again compares the attribute value with those of the sub-nodes and
moves further. It continues this process until it reaches a leaf node of the tree. The complete
process can be better understood using the below algorithm:
• Step-1: Begin the tree with the root node, say S, which contains the complete
dataset.
• Step-2: Find the best attribute in the dataset using an Attribute Selection Measure
(ASM).
• Step-3: Divide S into subsets that contain the possible values of the best attribute.
• Step-4: Generate the decision tree node that contains the best attribute.
• Step-5: Recursively make new decision trees using the subsets of the dataset created
in Step-3. Continue this process until a stage is reached where you cannot classify
the nodes any further; call these final nodes leaf nodes.
Example:
Suppose there is a candidate who has a job offer and wants to decide whether he should accept
the offer or not. To solve this problem, the decision tree starts with the root node (the Salary
attribute, chosen by ASM). The root node splits further into the next decision node (distance from the
office) and one leaf node based on the corresponding labels. The next decision node further
splits into one decision node (cab facility) and one leaf node. Finally, the decision node splits into
two leaf nodes (Accepted offer and Declined offer). Consider the below diagram:
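A runnable sketch of this kind of classifier (assuming scikit-learn; the salary/distance/cab features and the handful of rows are invented to mirror the example above):

from sklearn.tree import DecisionTreeClassifier, export_text

# Toy job-offer data: [salary_in_lakhs, distance_km, cab_facility(0/1)]
X = [[12, 5, 1], [4, 5, 1], [12, 30, 0], [12, 30, 1], [5, 10, 0]]
y = ["Accept", "Decline", "Decline", "Accept", "Decline"]

tree = DecisionTreeClassifier(criterion="gini", random_state=0).fit(X, y)
print(export_text(tree, feature_names=["salary", "distance", "cab"]))
print(tree.predict([[12, 8, 0]]))   # classify a new offer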
While implementing a decision tree, the main issue is how to select the best attribute for
the root node and for the sub-nodes. To solve such problems there is a technique called the
Attribute Selection Measure (ASM). With this measure, we can easily select the best
attribute for the nodes of the tree. There are two popular techniques for ASM, which are:
o Information Gain
o Gini Index
1. Information Gain:
o Information gain is the measurement of the change in entropy after the segmentation of a
dataset based on an attribute; the attribute with the highest information gain is chosen for the split.
o It can be calculated using the below formula:
Information Gain = Entropy(S) - [(Weighted Avg) × Entropy(each feature)]
Entropy(S) = -P(yes) log2 P(yes) - P(no) log2 P(no)
Where,
S = total number of samples, P(yes) = probability of yes, and P(no) = probability of no.
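A small sketch computing entropy and information gain for a binary split, following the formulas above (pure Python; the 9-yes/5-no label counts are invented):

from math import log2

def entropy(labels):
    n = len(labels)
    return -sum(labels.count(v) / n * log2(labels.count(v) / n)
                for v in set(labels))

def information_gain(parent, children):
    n = len(parent)
    weighted = sum(len(c) / n * entropy(c) for c in children)
    return entropy(parent) - weighted

labels = ["yes"] * 9 + ["no"] * 5
split = [["yes"] * 6 + ["no"] * 2, ["yes"] * 3 + ["no"] * 3]  # two branches
print(entropy(labels))                  # about 0.940
print(information_gain(labels, split))  # entropy reduction from the split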
2. Gini Index:
o Gini index is a measure of impurity or purity used while creating a decision tree in the
CART(Classification and Regression Tree) algorithm.
o An attribute with a low Gini index should be preferred over one with a high Gini
index.
o It only creates binary splits, and the CART algorithm uses the Gini index to create binary
splits.
o Gini index can be calculated using the below formula:
Gini Index = 1 - Σ_j (P_j)^2
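A one-function sketch of this formula (pure Python; the label lists are invented):

def gini(labels):
    n = len(labels)
    return 1.0 - sum((labels.count(v) / n) ** 2 for v in set(labels))

print(gini(["yes"] * 9 + ["no"] * 5))   # about 0.459: an impure node
print(gini(["yes"] * 5))                # 0.0: a pure node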
Pruning is the process of deleting unnecessary nodes from a tree in order to obtain the optimal
decision tree.
A too-large tree increases the risk of overfitting, while a small tree may not capture all the
important features of the dataset. A technique that decreases the size of the learned tree
without reducing accuracy is therefore known as pruning. There are mainly two types of
tree pruning techniques used:
o Cost Complexity Pruning
o Reduced Error Pruning
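A minimal sketch of cost-complexity pruning with scikit-learn (the ccp_alpha value of 0.02 is an arbitrary illustrative choice): the ccp_alpha parameter penalizes tree size, trading a little training accuracy for a smaller, less overfit tree.

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
full = DecisionTreeClassifier(random_state=0).fit(X, y)
pruned = DecisionTreeClassifier(random_state=0, ccp_alpha=0.02).fit(X, y)
print(full.tree_.node_count, pruned.tree_.node_count)  # pruned tree is smaller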