
AI Notes Module - 4

The document discusses the concept of learning in machine learning, defining it as a change in a system that improves task performance through experience. It outlines key components of a learning system, including performance elements, learning elements, critics, and problem generators, and describes various learning paradigms such as supervised, unsupervised, and reinforcement learning. Additionally, it covers decision trees as a supervised learning technique, detailing their structure, advantages, and disadvantages.

Uploaded by

Vijendar Amgothu

UNIT – IV

LEARNING
WHAT IS LEARNING
“Learning denotes changes in a system that enable the system to do the same task more efficiently the next time.”

Machine Learning
Definition

A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.

Components of Learning System

Performance Element:

The performance element is the agent that acts in the world. It perceives the environment and decides on external actions.
Learning Element:

It is responsible for making improvements: it takes knowledge about the performance element together with feedback from the critic, and determines how to modify the performance element.

Critic:

It tells the learning element how well the agent is doing by comparing its behaviour against a fixed standard of performance.

Problem Generator:

This component suggests problems or actions that will generate new examples or experiences, helping the system to train further.

Let us see the role of each component with an example.


Example: Automated Taxi on city roads

Performance Element: consists of knowledge and procedures for driving actions.


Eg: turning, accelerating and braking are the performance elements on roads.

Learning Element: It formulates goals.


Eg: learn rules for braking and accelerating, learn the geography of the city.

Critic: Observes world and passes information to learning element.


Eg: after a quick right turn across three lanes of traffic, observe the reactions of other drivers.

Problem Generator: suggests new experiences to try. Eg: try the south city road.
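The interplay of the four components can be sketched as a small loop; all function names, actions, and outcome scores below are illustrative inventions, not part of the original notes:

```python
# A minimal sketch of the learning-system loop. The critic scores each
# action against a fixed standard, the learning element updates the
# performance element's preferences, and the problem generator proposes
# new experiences. All values here are made up for illustration.
def critic(action, standard=0.0):
    """Compare an action's observed outcome with a fixed standard."""
    observed_outcomes = {"smooth_turn": 1.0, "hard_brake": -1.0}
    return observed_outcomes.get(action, 0.0) - standard

def learning_element(preferences, action, feedback, rate=0.5):
    """Modify the performance element using the critic's feedback."""
    preferences[action] = preferences.get(action, 0.0) + rate * feedback
    return preferences

def performance_element(preferences, actions):
    """Act in the world: pick the currently best-rated action."""
    return max(actions, key=lambda a: preferences.get(a, 0.0))

def problem_generator():
    """Suggest actions that generate new experience."""
    return ["smooth_turn", "hard_brake"]

prefs = {}
for _ in range(3):                        # three driving trials
    candidates = problem_generator()
    act = performance_element(prefs, candidates)
    prefs = learning_element(prefs, act, critic(act))
print(prefs)
```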

Learning Paradigms:

• Rote learning
• Induction
• Clustering
• Analogy
• Discovery
• Genetic algorithms
• Reinforcement

ROTE LEARNING
The rote learning technique avoids understanding the inner complexities of the subject and instead focuses on memorizing the material so that it can be recalled by the learner exactly the way it was read or heard.

Learning by memorization: avoids understanding the inner complexities of the subject that is being learned.

Learning by repetition: saying the same thing over and over and trying to remember how to say it; it does not help to understand, it helps to remember, the way we learn a poem, a song, etc.

There are two types of inductive learning,

· Supervised
· Unsupervised

Supervised learning: (The machine has access to a teacher who corrects it.)

Supervised learning is the machine learning task of inferring a function from labeled training data. The training data consist of a set of training examples. In supervised learning, each example is a pair consisting of an input object (typically a vector) and a desired output value (also called the supervisory signal). Example: face recognition.
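A minimal sketch of inferring a function from labeled pairs, using one-nearest-neighbour classification; the 2-D input vectors and labels are invented for illustration (a real face recognizer would work on image features):

```python
# Supervised learning from labelled (input, output) pairs: classify a
# new input by the label of its nearest training example.
# The vectors and labels below are toy data, not real face features.
training_data = [
    ((1.0, 1.0), "cat"),
    ((1.2, 0.9), "cat"),
    ((5.0, 5.0), "human"),
    ((5.1, 4.8), "human"),
]

def predict(x):
    """Return the label of the training example closest to x."""
    def sq_dist(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))
    _, label = min(training_data, key=lambda pair: sq_dist(pair[0], x))
    return label

print(predict((1.1, 1.0)))    # near the "cat" examples
```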

Unsupervised Learning: (No access to a teacher. Instead, the machine must search for “order” and “structure” in the environment.)

Since no desired output is provided in this case, categorization is done so that the algorithm correctly differentiates between, say, the face of a horse, a cat, or a human (clustering of data).

Clustering:

In clustering or unsupervised learning, the target features are not given in the training
examples. The aim is to construct a natural classification that can be used to cluster the data. The
general idea behind clustering is to partition the examples into clusters or classes. Each class
predicts feature values for the examples in the class. Each clustering has a prediction error on the
predictions. The best clustering is the one that minimizes the error.
Example: An intelligent tutoring system may want to cluster students' learning behavior so that
strategies that work for one member of a class may work for other members.
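The partition-and-minimize idea can be sketched with a tiny 1-D k-means; the student scores, number of clusters, and starting centres are illustrative assumptions:

```python
# Unsupervised clustering sketch: 1-D k-means. Each class "predicts"
# its mean value; iterating reassignment and re-averaging reduces the
# squared prediction error of the clustering.
def kmeans_1d(points, centers, iterations=10):
    for _ in range(iterations):
        # assignment: each point joins the cluster of its nearest centre
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # update: each centre becomes its cluster's mean prediction
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers

scores = [1, 2, 3, 20, 21, 22]            # two apparent groups of scores
centers = kmeans_1d(scores, centers=[0.0, 10.0])
print(centers)
```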

Reinforcement Learning:

Imagine a robot that can act in a world, receiving rewards and punishments and determining from these what it should do. This is the problem of reinforcement learning. Most reinforcement learning research is conducted within the mathematical framework of Markov Decision Processes.
LEARNING BY TAKING ADVICE

The idea of advice taking in AI-based learning was proposed as early as 1958 (McCarthy). However, very few attempts were made at creating such systems until the late 1970s, when expert systems provided a major impetus in this area.

• This is the easiest and simplest way of learning.

• In this type of learning, a programmer writes a program that gives the computer instructions to perform a task. Once it is learned (i.e. programmed), the system will be able to do new things.
• There can be several sources for taking advice, such as humans (experts), the internet, etc.
• However, this type of learning requires more inference than rote learning.
• As the stored knowledge in the knowledge base gets transformed into an operational form, the reliability of the knowledge source must always be taken into consideration.

LEARNING IN PROBLEM SOLVING

• Humans have a tendency to learn by solving various real-world problems.

• Whatever its form or representation, this kind of problem solving is based on reinforcement learning.
• An action is repeated if it results in a desirable outcome, and avoided if it results in an undesirable one.
• As the outcomes have to be evaluated, this type of learning also involves the definition of a utility function, which measures how much a particular outcome is worth.
• There are several research issues, which include the identification of the learning rate, time and algorithm complexity, convergence, representation (frame and qualification problems), handling of uncertainty (ramification problem), adaptivity, "unlearning", etc.
• In reinforcement learning, the system (and thus the developer) knows the desirable outcomes but does not know which actions result in them.
• In such a problem or domain, the effects of performing the actions are usually compounded with side-effects. Thus, it becomes impossible to specify in advance the actions to be performed for the given parameters.
• Q-Learning is the most widely used reinforcement learning algorithm.
• The main part of the algorithm is a simple value-iteration update. For each state st from the state set S and each action at from the action set A, it is possible to calculate an update to the expected discounted reward value with the following expression:

Q(st, at) ← Q(st, at) + αt(st, at) [rt + γ maxa Q(st+1, a) − Q(st, at)]
• where rt is the real reward at time t, αt(s, a) is the learning rate such that 0 ≤ αt(s, a) ≤ 1, and γ is the discount factor such that 0 ≤ γ < 1.
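The update rule can be exercised on a toy problem. The sketch below runs it on a hypothetical four-state corridor with a goal at the right end; the environment, the constant learning rate, and the discount factor are illustrative choices, not from the notes:

```python
# Q-learning sketch: corridor of 4 states, actions 0 (left) / 1 (right),
# reward 1.0 for entering the goal state 3. A constant learning rate
# stands in for the per-state-action rate alpha_t(s, a) in the formula.
import random

N_STATES, ACTIONS = 4, [0, 1]
alpha, gamma = 0.5, 0.9
Q = [[0.0, 0.0] for _ in range(N_STATES)]

def step(s, a):
    """Deterministic toy environment (an illustrative assumption)."""
    s2 = min(s + 1, N_STATES - 1) if a == 1 else max(s - 1, 0)
    return s2, (1.0 if s2 == N_STATES - 1 else 0.0)

random.seed(0)
for _ in range(200):                      # episodes of random exploration
    s = 0
    while s != N_STATES - 1:
        a = random.choice(ACTIONS)
        s2, r = step(s, a)
        # Q(s,a) <- Q(s,a) + alpha * [r + gamma * max_a' Q(s',a') - Q(s,a)]
        Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
        s = s2

greedy = [max(ACTIONS, key=lambda a: Q[s][a]) for s in range(N_STATES)]
print(greedy)                             # states 0-2 should prefer "right"
```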

LEARNING BY EXAMPLE (INDUCTION LEARNING)

• Induction learning is carried out on the basis of supervised learning.

• In this learning process, the system induces a general rule from a set of observed instances.
• However, class definitions can be constructed with the help of a classification method.
For Example:
Consider that 'ƒ' is the target function and an example is a pair (x, ƒ(x)), where 'x' is the input and ƒ(x) is the output of the function applied to 'x'.
Given problem: find a hypothesis h such that h ≈ ƒ

• So, in the following fig-a, points (x, y) are given in the plane such that y = ƒ(x), and the task is to find a function h(x) that fits the points well.
• In fig-b, a piecewise-linear 'h' function is given, while fig-c shows a more complicated 'h' function.
• Both functions agree with the example points, but differ in the values of 'y' they assign to other x inputs.

• As shown in fig-d, we have a function that apparently ignores one of the example points but fits the others with a simple function. The true 'ƒ' is unknown, so there are many choices for h, but without further knowledge, we have no way to prefer (b), (c), or (d).
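As a concrete instance of choosing h ≈ ƒ, the sketch below fits a straight-line hypothesis h(x) = a·x + b to example pairs by closed-form least squares; the points are illustrative, generated from a hypothetical ƒ(x) = 2x + 1:

```python
# Inductive learning sketch: induce a linear hypothesis h from observed
# (x, f(x)) example pairs using the closed-form least-squares solution.
points = [(0, 1.0), (1, 3.0), (2, 5.0), (3, 7.0)]   # toy data from f(x)=2x+1

n = len(points)
sx = sum(x for x, _ in points)
sy = sum(y for _, y in points)
sxx = sum(x * x for x, _ in points)
sxy = sum(x * y for x, y in points)
a = (n * sxy - sx * sy) / (n * sxx - sx * sx)   # slope
b = (sy - a * sx) / n                           # intercept

def h(x):
    """The induced hypothesis: a general rule covering the examples."""
    return a * x + b

print(a, b)
```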
WINSTON LEARNING PROGRAM
A Blocks World Learning Example -- Winston (1975)

• The goal is to construct representation of the definitions of concepts in this domain.


• Concepts such as house (a brick, i.e. a rectangular block, with a wedge, i.e. a triangular block, suitably placed on top of it), tent (two wedges touching side by side), and arch (two non-touching bricks supporting a third wedge or brick) were learned.
• The idea of near-miss objects, similar to actual instances, was introduced.
• Input was a line drawing of a blocks world structure.
• The input was processed (see the VISION sections later) to produce a semantic net representation of the structural description of the object (Fig. 27).

Fig: House object and semantic net

• Links in the network include left-of, right-of, does-not-marry, supported-by, has-part, and isa.
• The marry relation is important: two objects with a common touching edge are said to marry. Marrying is assumed unless does-not-marry is stated.

There are three basic steps to the problem of concept formulation:

1. Select one known instance of the concept. Call this the concept definition.
2. Examine definitions of other known instances of the concept. Generalise the definition to include them.
3. Examine descriptions of near misses. Restrict the definition to exclude these.

Both steps 2 and 3 rely on comparison and both similarities and differences need to be identified.
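The three steps can be sketched with plain feature sets standing in for Winston's semantic-net descriptions; the instances, the near miss, and the MUST- marking convention are illustrative simplifications:

```python
# Concept formulation sketch: start from one known instance, generalise
# over another, then use a near miss to mark a required feature.
def generalise(definition, other_instance):
    """Step 2: keep only features shared with another known instance."""
    return definition & other_instance

def restrict(definition, near_miss):
    """Step 3: features the near miss lacks become required (MUST-)."""
    missing = definition - near_miss
    return (definition - missing) | {"MUST-" + f for f in missing}

arch1 = {"two-bricks", "does-not-marry", "supported-by", "top-is-wedge"}
arch2 = {"two-bricks", "does-not-marry", "supported-by", "top-is-brick"}
near_miss = {"two-bricks", "supported-by"}   # the bricks touch: not an arch

concept = arch1                              # Step 1: initial definition
concept = generalise(concept, arch2)         # Step 2
concept = restrict(concept, near_miss)       # Step 3
print(concept)
```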

DECISION TREES

o Decision Tree is a Supervised learning technique that can be used for both
classification and Regression problems, but mostly it is preferred for solving
Classification problems. It is a tree-structured classifier, where internal nodes represent
the features of a dataset, branches represent the decision rules and each leaf node
represents the outcome.
o In a decision tree, there are two types of nodes, which are the Decision Node and the Leaf Node. Decision nodes are used to make decisions and have multiple branches, whereas leaf nodes are the outputs of those decisions and do not contain any further branches.
o The decisions or the test are performed on the basis of features of the given dataset.
o It is a graphical representation for getting all the possible solutions to a
problem/decision based on given conditions.
o It is called a decision tree because, similar to a tree, it starts with the root node, which
expands on further branches and constructs a tree-like structure.
o In order to build a tree, we use the CART algorithm, which stands for Classification
and Regression Tree algorithm.
o A decision tree simply asks a question and, based on the answer (Yes/No), further splits the tree into subtrees.
o The diagram below explains the general structure of a decision tree:
Why do we use Decision Trees?

There are various algorithms in Machine learning, so choosing the best algorithm for the given
dataset and problem is the main point to remember while creating a machine learning model.
Below are the two reasons for using the Decision tree:

o Decision Trees usually mimic human thinking ability while making a decision, so it is
easy to understand.
o The logic behind the decision tree can be easily understood because it shows a tree-like
structure.

Decision Tree Terminologies

• Root Node: Root node is from where the decision tree starts. It represents the entire
dataset, which further gets divided into two or more homogeneous sets.
• Leaf Node: Leaf nodes are the final output node, and the tree cannot be segregated
further after getting a leaf node.
• Splitting: Splitting is the process of dividing the decision node/root node into sub-nodes
according to the given conditions.
• Branch/Sub Tree: A tree formed by splitting the tree.
• Pruning: Pruning is the process of removing the unwanted branches from the tree.
• Parent/Child node: The root node of the tree is called the parent node, and other nodes
are called the child nodes.

How does the Decision Tree algorithm Work?

In a decision tree, to predict the class of a given record, the algorithm starts from the root node of the tree. It compares the value of the root attribute with the corresponding attribute of the record (from the real dataset) and, based on the comparison, follows the branch and jumps to the next node.

At the next node, the algorithm again compares the attribute value with those of the sub-nodes and moves further. It continues this process until it reaches a leaf node of the tree. The complete process can be better understood using the algorithm below:

• Step-1: Begin the tree with the root node, say S, which contains the complete dataset.
• Step-2: Find the best attribute in the dataset using an Attribute Selection Measure (ASM).
• Step-3: Divide S into subsets that contain the possible values of the best attribute.
• Step-4: Generate the decision tree node that contains the best attribute.
• Step-5: Recursively make new decision trees using the subsets of the dataset created in Step-3. Continue this process until a stage is reached where the nodes cannot be classified further; these final nodes are called leaf nodes.
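Steps 1 to 5 can be sketched as a recursive builder on a made-up dataset, using information gain (described under Attribute Selection Measures below) to pick the best attribute:

```python
# Decision-tree building sketch following Steps 1-5. The dataset
# (columns: salary, distance, label) is invented for illustration.
import math

def entropy(rows):
    """Impurity of a set of rows, from the class-label proportions."""
    counts = {}
    for *_, label in rows:
        counts[label] = counts.get(label, 0) + 1
    total = len(rows)
    return -sum(c / total * math.log2(c / total) for c in counts.values())

def build(rows, attrs):
    labels = {r[-1] for r in rows}
    if len(labels) == 1 or not attrs:        # Step 5 stop: leaf node
        return labels.pop()
    def gain(a):                             # Step 2: attribute selection
        subsets = {}
        for r in rows:
            subsets.setdefault(r[a], []).append(r)
        return entropy(rows) - sum(len(s) / len(rows) * entropy(s)
                                   for s in subsets.values())
    best = max(attrs, key=gain)
    node = {}                                # Step 4: decision node
    for value in {r[best] for r in rows}:    # Step 3: split into subsets
        subset = [r for r in rows if r[best] == value]
        node[(best, value)] = build(subset, [x for x in attrs if x != best])
    return node

data = [("high", "near", "accept"), ("high", "far", "decline"),
        ("low", "near", "decline"), ("low", "far", "decline")]
tree = build(data, attrs=[0, 1])             # Step 1: root holds all data
print(tree)
```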

Example:

Suppose there is a candidate who has a job offer and wants to decide whether he should accept the offer or not. To solve this problem, the decision tree starts with the root node (the Salary attribute, selected by ASM). The root node splits further into the next decision node (distance from the office) and one leaf node based on the corresponding labels. The next decision node further splits into one decision node (cab facility) and one leaf node. Finally, the decision node splits into two leaf nodes (Accepted offer and Declined offer). Consider the below diagram:

Attribute Selection Measures

While implementing a decision tree, the main issue that arises is how to select the best attribute for the root node and for the sub-nodes. To solve such problems there is a technique called the Attribute Selection Measure, or ASM. With this measure, we can easily select the best attribute for the nodes of the tree. There are two popular techniques for ASM, which are:

o Information Gain
o Gini Index
1. Information Gain:

o Information gain is the measurement of the change in entropy after the segmentation of a dataset based on an attribute.
o It calculates how much information a feature provides us about a class.
o According to the value of information gain, we split the node and build the decision tree.
o A decision tree algorithm always tries to maximize the value of information gain, and the node/attribute with the highest information gain is split first. It can be calculated using the formula below:

Information Gain = Entropy(S) - [(Weighted Avg) * Entropy(each feature)]

Entropy: Entropy is a metric to measure the impurity in a given attribute. It specifies randomness in the data. Entropy can be calculated as:

Entropy(S) = -P(yes) log2 P(yes) - P(no) log2 P(no)

Where,

o S = total number of samples
o P(yes) = probability of yes
o P(no) = probability of no
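Plugging numbers into these formulas: the sketch below evaluates the entropy of a 14-sample set (9 yes, 5 no) and the information gain of a hypothetical attribute that splits it into subsets of 8 (6 yes, 2 no) and 6 (3 yes, 3 no); the counts are a common illustrative example, not from the notes:

```python
# Entropy(S) = -P(yes) log2 P(yes) - P(no) log2 P(no)
import math

def entropy(p_yes, p_no):
    result = 0.0
    for p in (p_yes, p_no):
        if p > 0:                 # 0 * log2(0) is taken as 0
            result -= p * math.log2(p)
    return result

e_total = entropy(9/14, 5/14)     # whole set: 9 yes, 5 no

# Information Gain = Entropy(S) - weighted average of subset entropies
gain = e_total - (8/14 * entropy(6/8, 2/8) + 6/14 * entropy(3/6, 3/6))
print(round(e_total, 3), round(gain, 3))
```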

2. Gini Index:

o The Gini index is a measure of impurity or purity used while creating a decision tree in the CART (Classification and Regression Tree) algorithm.
o An attribute with a low Gini index should be preferred to one with a high Gini index.
o It creates only binary splits; the CART algorithm uses the Gini index to create these binary splits.
o The Gini index can be calculated using the formula below:

Gini Index = 1 - ∑j Pj²
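A quick check of the formula on a two-class node (the probabilities are illustrative):

```python
# Gini Index = 1 - sum_j (P_j)^2, over the class probabilities P_j.
def gini(*probs):
    return 1 - sum(p * p for p in probs)

print(gini(0.5, 0.5))   # maximally impure two-class node: 0.5
print(gini(1.0, 0.0))   # pure node: 0.0
```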


Pruning: Getting an Optimal Decision tree

Pruning is the process of deleting unnecessary nodes from a tree in order to obtain the optimal decision tree.

A too-large tree increases the risk of overfitting, while a small tree may not capture all the important features of the dataset. Pruning is therefore the technique of decreasing the size of the learning tree without reducing accuracy. There are two main types of tree pruning used:

o Cost Complexity Pruning


o Reduced Error Pruning.

Advantages of the Decision Tree


• It is simple to understand, as it follows the same process that a human follows while making a decision in real life.
• It can be very useful for solving decision-related problems.
• It helps to think about all the possible outcomes of a problem.
• It requires less data cleaning compared to other algorithms.

Disadvantages of the Decision Tree


• The decision tree may contain many layers, which makes it complex.
• It may have an overfitting issue, which can be resolved using the Random Forest algorithm.
• For more class labels, the computational complexity of the decision tree may increase.
