AI Module V Part 2

The document covers key concepts in artificial intelligence learning, focusing on uncertain knowledge and various forms of learning such as supervised, unsupervised, and reinforcement learning. It details the process of learning from observations, inductive learning, and decision tree learning, including algorithms and performance assessment. Additionally, it addresses challenges like noise and overfitting in learning algorithms, emphasizing the importance of effective representation and feedback in the learning process.


CSEN2031: ARTIFICIAL INTELLIGENCE Module V LECTURE NOTES

Learning

Syllabus
Uncertain Knowledge: Uncertainty: Acting under uncertainty, basic probability notation, the axioms of
probability, inference using full joint distributions, independence, Bayes' rule and its use, the Wumpus world
revisited. Learning: Learning from Observations: Forms of learning, Inductive learning, learning decision
trees, ensemble learning. Why Learning Works: Computational learning theory.

1. Learning from observations


Learning is a process that improves an AI agent's knowledge by making observations about its environment.
The agent makes use of its percepts not only for acting, but also for improving its ability to act
in the future.
1.1 Forms of Learning:
A learning agent contains a performance element, which decides what actions to take, and a learning element,
which modifies the performance element so that it makes better decisions.
The design of a learning element is affected by three major issues:
i. Which components of the performance element are to be learned?
ii. What feedback is available to learn these components?
iii. What representation is used for the components?
Components of these agents include the following:
1. A direct mapping from conditions on the current state to actions.
2. A means to infer relevant properties of the world from the percept sequence.
3. Information about the way the world evolves and about the results of possible actions the agent can
take.
4. Utility information indicating the desirability of world states.
5. Action-value information indicating the desirability of actions.
6. Goals that describe classes of states whose achievement maximizes the agent's utility.
Feedback:
 Each of the components can be learned from appropriate feedback.
 The type of feedback available for learning determines the nature of the learning problem that the
agent faces.
 The field of machine learning usually distinguishes three cases: supervised, unsupervised, and
reinforcement learning.
Unsupervised Learning:
It involves learning patterns in the input when no specific output values are supplied. For example, a taxi
agent might gradually develop a concept of "good traffic days" and "bad traffic days" without ever being
given labeled examples of each. A purely unsupervised learning agent cannot learn what to do, because it
has no information as to what constitutes a correct action or a desirable state.
Supervised Learning:
It involves learning a function from examples of its inputs and outputs. Cases (1), (2), and (3) above are all
instances of supervised learning problems.
In (1), the agent learns a condition-action rule for braking: this is a function from states to a Boolean output (to
brake or not to brake).
In (2), the agent learns a function from images to a Boolean output (whether the image contains a bus).


In (3), the theory of braking is a function from states and braking actions to, say, stopping distance in feet.
Notice that in cases (1) and (2), a teacher provided the correct output value of the examples; in the third, the
output value was available directly from the agent's percepts. For fully observable environments, it will
always be the case that an agent can observe the effects of its actions and hence can use supervised learning
methods to learn to predict them. For partially observable environments, the problem is more difficult,
because the immediate effects might be invisible.

Reinforcement learning:
Rather than being told what to do by a teacher, a reinforcement learning agent must learn from
reinforcement: occasional rewards or punishments that indicate whether its behavior is good or bad.
 The representation of learned information plays a role in determining how the learning algorithm
must work. Any of the components of an agent can be represented using any of the representation
schemes like linear weighted polynomials, propositional and first-order logical sentences,
probabilistic descriptions etc.,
 Availability of prior knowledge also plays major role in design of learning system.
1.2 Inductive Learning:
Consider an example pair (x, f (x)), where x is the input and f(x) is the output of the function applied to x.
The task of pure inductive inference is “Given a collection of examples of f, return a function h that
approximates f”. The function h is called a hypothesis.
 Inductive learning involves finding a consistent hypothesis that agrees with the examples.
 Figure 1.2.1 shows a familiar example: fitting a function of a single variable to some data points. The
examples are (x, f(x)) pairs, where both x and f(x) are real numbers. We choose the hypothesis
space H, the set of hypotheses we will consider, to be the set of polynomials of degree at most k.
 Figure (a) shows some data with an exact fit by a straight line (a polynomial of degree 1). The line is
called a consistent hypothesis because it agrees with all the data. Figure (b) shows a high-degree
polynomial that is also consistent with the same data. This illustrates the first issue in inductive
learning: how do we choose from among multiple consistent hypotheses?
 Ockham’s razor suggests the simplest hypothesis consistent with the data.
 Figure (c) shows a second data set. There is no consistent straight line for this data set; in fact, it
requires a degree-6 polynomial (with 7 parameters) for an exact fit. There are just 7 data points, so the
polynomial has as many parameters as there are data points: thus, it does not seem to be finding any
pattern in the data and we do not expect it to generalize well. It might be better to fit a simple straight
line that is not exactly consistent but might make reasonable predictions.
 Figure (d) shows that the data in (c) can be fit exactly by a simple function of the form ax + b + c sin
x. This example shows the importance of the choice of hypothesis space. A hypothesis space
consisting of polynomials of finite degree cannot represent sinusoidal functions accurately, so a
learner using that hypothesis space will not be able to learn from sinusoidal data.
 A learning problem is realizable if the hypothesis space contains the true function; otherwise, it is
unrealizable.
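The fitting experiment of Figure 1.2.1 can be reproduced in a few lines. This is an illustrative sketch, not a reproduction of the figure: the seven data points below are invented, and NumPy's polyfit stands in for hypothesis selection. With 7 points, a degree-6 polynomial (7 parameters) is exactly consistent, while the straight line trades a small training error for simplicity, as Ockham's razor recommends.

```python
import numpy as np

# Seven hypothetical (x, f(x)) pairs, roughly linear with some noise.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([0.1, 1.9, 1.2, 3.8, 3.1, 5.9, 5.2])

# Hypothesis space H_k: polynomials of degree at most k.
line = np.polyfit(x, y, deg=1)    # simple hypothesis: 2 parameters
exact = np.polyfit(x, y, deg=6)   # 7 parameters for 7 points: exact fit

# The degree-6 polynomial is consistent (essentially zero training error)...
residual_exact = np.max(np.abs(np.polyval(exact, x) - y))
# ...while the line is only approximately consistent.
residual_line = np.max(np.abs(np.polyval(line, x) - y))
print(residual_exact, residual_line)
```

Both hypotheses "explain" the training data, which is exactly the choice problem the text raises: consistency alone cannot decide between them.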


Figure 1.2.1 (a) Example (x, f (x)) pairs and a consistent, linear hypothesis. (b) A consistent,
degree-7 polynomial hypothesis for the same data set. (c) A different data set that admits an exact
degree-6 polynomial fit or an approximate linear fit. (d) A simple, exact sinusoidal fit to the same
data set.
1.3 Learning Decision Trees:
A decision tree takes as input an object or situation described by a set of attributes and returns a "decision":
the predicted output value for the input.
 The input attributes and the output values can be discrete or continuous.
 Learning a discrete-valued function is called classification, whereas learning a continuous-valued function is
called regression.
 A decision tree reaches its decision by performing a sequence of tests.
 Each internal node in the tree corresponds to a test of the value of one of the properties.
 The branches from the node are labelled with the possible values of the test.
 Each leaf node in the tree specifies the value to be returned if that leaf is reached.
Consider the problem of deciding whether to wait for a table at a restaurant, where each example is described by a list of input attributes (such as Patrons, Hungry, and WaitEstimate).

Decision tree:
Here, Attributes are processed by the tree starting at the root and following the appropriate branch until a leaf
is reached. For instance, an example with Patrons = Full and Wait Estimate = 0-10 will be classified as
positive (i.e., yes, we will wait for a table).
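The sequence of tests from root to leaf can be pictured as nested conditionals. The sketch below uses the attribute names mentioned in the text, but the exact shape of the tree and its branch values are assumed for illustration:

```python
# A fragment of the restaurant decision tree as nested conditionals
# (illustrative; only the branches discussed in the text are shown).
def will_wait(example):
    if example["Patrons"] == "None":
        return False               # empty restaurant: leave
    if example["Patrons"] == "Some":
        return True                # a few patrons: wait
    # Patrons == "Full": consult the wait estimate next.
    if example["WaitEstimate"] == "0-10":
        return True                # short wait: stay
    if example["WaitEstimate"] == ">60":
        return False               # very long wait: leave
    return example["Hungry"]       # deeper subtree abbreviated

print(will_wait({"Patrons": "Full", "WaitEstimate": "0-10"}))  # True
```

The final call matches the example in the text: Patrons = Full and WaitEstimate = 0-10 is classified as positive.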


A decision tree for deciding whether to wait for a table.

Expressiveness of decision trees:


Any particular decision tree hypothesis for the WillWait goal predicate can be seen as an assertion of the
form

WillWait(s) ⇔ (P1(s) ∨ P2(s) ∨ ... ∨ Pn(s)),

where each condition Pi(s) is a conjunction of tests corresponding to a path from the root of the tree to a leaf
with a positive outcome.
Decision trees can express any function of the input attributes. For Boolean functions, each row of the
function's truth table corresponds to a path to a leaf in the tree.

If the function is the parity function, which returns 1 if and only if an even number of inputs are 1, then an
exponentially large decision tree will be needed. It is also difficult to use a decision tree to represent a
majority function, which returns 1 if more than half of its inputs are 1.

The truth table has 2^n rows, because each input case is described by n attributes. We can consider the
"answer" column of the table as a 2^n-bit number that defines the function.
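These counts are easy to check directly. Since each Boolean function corresponds to one choice of the answer column, a standard consequence (not stated explicitly above) is that there are 2^(2^n) distinct Boolean functions of n attributes:

```python
# n attributes -> 2**n truth-table rows -> 2**(2**n) possible answer columns,
# i.e. distinct Boolean functions of n attributes.
for n in range(1, 5):
    rows = 2 ** n            # one row per combination of attribute values
    functions = 2 ** rows    # one function per 2**n-bit answer column
    print(n, rows, functions)
```

Already at n = 4 there are 65,536 candidate functions, which is why enumerating the hypothesis space is hopeless and a greedy induction algorithm is used instead.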
Inducing decision trees from examples:
An example for a Boolean decision tree consists of a vector of input attributes, X, and a single Boolean
output value y. A set of examples (X1, y1), ..., (X12, y12) is shown in the following figure.


Examples for the restaurant domain.

 The positive examples are the ones in which the goal WillWait is true (X1, X3, ...); the negative
examples are the ones in which it is false (X2, X5, ...).

 The complete set of examples is called the “training set”.

 One trivial solution is to construct a decision tree that has one path to a leaf for each example, where the
path tests each attribute in turn and follows the value for the example, and the leaf has the classification of
the example. When given the same example again, the decision tree will come up with the right
classification; however, such a tree simply memorizes the observations and extracts no pattern, so it cannot
generalize to unseen examples. Applying Ockham's razor, we should instead look for a small tree that is
consistent with the examples.
Algorithm:

The Decision tree Learning Algorithm


Decision tree induced from 12-example training set

Splitting the examples by testing on attributes. (a) Splitting on Type brings us no nearer to distinguishing
between positive and negative examples. (b) Splitting on Patrons does a good job of separating positive and
negative examples. After splitting on Patrons, Hungry is a fairly good second test.

There are four cases to consider for these recursive subproblems:


1. If there are some positive and some negative examples, then choose the best attribute to split them. Figure
(b) shows Hungry being used to split the remaining examples.

2. If all the remaining examples are positive (or all negative), then we are done: we can answer Yes or No.
Figure (b) shows examples of this in the None and Some cases.

3. If there are no examples left, it means that no such example has been observed, and we return a default
value calculated from the majority classification at the node's parent.


4. If there are no attributes left, but both positive and negative examples, we have a problem. It means that
these examples have exactly the same description, but different classifications. This happens when some of
the data are incorrect; we say there is noise in the data. It also happens either when the attributes do not give
enough information to describe the situation fully, or when the domain is truly nondeterministic. One simple
way out of the problem is to use a majority vote.
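The four cases above translate directly into a recursive procedure. The following is a minimal sketch, not the textbook's exact pseudocode: examples are represented as dictionaries with a "class" key, and the attribute-selection heuristic is passed in as a function so any measure can be plugged in.

```python
from collections import Counter

def dtl(examples, attributes, choose, default=None):
    """Decision-tree learning skeleton mirroring the four cases in the text."""
    if not examples:                       # case 3: no examples left ->
        return default                     #   parent's majority classification
    classes = Counter(e["class"] for e in examples)
    majority = classes.most_common(1)[0][0]
    if len(classes) == 1:                  # case 2: uniformly classified
        return majority
    if not attributes:                     # case 4: same description, mixed
        return majority                    #   classes (noise) -> majority vote
    best = choose(attributes, examples)    # case 1: split on the best attribute
    rest = [a for a in attributes if a != best]
    return (best, {v: dtl([e for e in examples if e[best] == v],
                          rest, choose, majority)
                   for v in {e[best] for e in examples}})

# Tiny illustrative training set; "choose" just picks the first attribute.
examples = [
    {"Patrons": "Some", "class": True},
    {"Patrons": "Full", "class": False},
    {"Patrons": "None", "class": False},
]
tree = dtl(examples, ["Patrons"], lambda attrs, exs: attrs[0])
print(tree)
```

A learned tree is a (attribute, branches) pair whose branch values map to subtrees or leaf classifications.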
Choosing Attributes:
 One suitable measure is the expected amount of information provided by the attribute.
 An attribute test can be thought of as giving the answer to a question; the amount of information contained
in the answer depends on one's prior knowledge.
 If the possible answers vi have probabilities P(vi), then the information content I of the actual answer
is given by

I(P(v1), ..., P(vn)) = Σi −P(vi) log2 P(vi)

 Suppose the training set contains p positive examples and n negative examples. Then an estimate of
the information contained in a correct answer is

I(p/(p+n), n/(p+n)) = −(p/(p+n)) log2(p/(p+n)) − (n/(p+n)) log2(n/(p+n))

 After testing attribute A, which splits the training set into subsets E1, ..., Ev according to A's v values
(with pi positive and ni negative examples in each Ei), we will still need

Remainder(A) = Σi=1..v ((pi + ni)/(p + n)) × I(pi/(pi + ni), ni/(pi + ni))

bits of information to classify an example.

 The information gain from the attribute test is the difference between the original information
requirement and the new requirement:

Gain(A) = I(p/(p+n), n/(p+n)) − Remainder(A)

The heuristic is to choose the attribute with the largest gain.
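These definitions are a few lines of code. The sketch below uses the standard restaurant-domain counts (6 positive and 6 negative examples; the per-value counts for Patrons and Type come from the split figure earlier in the notes):

```python
import math

def I(*probs):
    """Information content I(P(v1),...,P(vn)) = sum of -P(vi) * log2 P(vi)."""
    return sum(-p * math.log2(p) for p in probs if p > 0)

def gain(p, n, splits):
    """Gain(A) = I(p/(p+n), n/(p+n)) - Remainder(A).
    `splits` lists the (pi, ni) counts for each value of attribute A."""
    remainder = sum((pi + ni) / (p + n) * I(pi / (pi + ni), ni / (pi + ni))
                    for pi, ni in splits)
    return I(p / (p + n), n / (p + n)) - remainder

# Patrons splits the 12 examples into None (0+, 2-), Some (4+, 0-),
# Full (2+, 4-): a large gain, so it is chosen as the root test.
print(round(gain(6, 6, [(0, 2), (4, 0), (2, 4)]), 3))  # ≈ 0.541
# Type leaves every subset evenly mixed, so it provides no information.
print(round(gain(6, 6, [(1, 1), (1, 1), (2, 2), (2, 2)]), 3))
```

This confirms numerically why splitting on Patrons is preferred over Type in the figure.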

Assessing the performance of the learning algorithm:


 A learning algorithm is good if it produces hypotheses that do a good job of predicting the
classifications of unseen examples.
 If we train on all our available examples, then we will have to go out and get some more to test on, so the
standard methodology is:
1. Collect a large set of examples.
2. Divide it into two disjoint sets: the training set and the test set.
3. Apply the learning algorithm to the training set, generating a hypothesis h.
4. Measure the percentage of examples in the test set that are correctly classified by h.
5. Repeat steps 2 to 4 for different sizes of training sets and different randomly selected training sets of
each size.
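Steps 1 to 4 can be sketched as a single holdout-evaluation function. The toy data set and the trivial majority-class "learner" below are invented for illustration; any learn/predict pair could be substituted:

```python
import random

def holdout_accuracy(examples, learn, predict, train_fraction=0.8, seed=0):
    """Shuffle, split into disjoint training and test sets, learn a
    hypothesis h on the training set, and measure test-set accuracy."""
    rng = random.Random(seed)
    shuffled = examples[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    train, test = shuffled[:cut], shuffled[cut:]
    h = learn(train)
    correct = sum(predict(h, x) == y for x, y in test)
    return correct / len(test)

# Toy domain: the label is True whenever x > 5.
data = [(x, x > 5) for x in range(100)]

def learn_majority(train):
    # "Learning" here just records the majority class of the training set.
    return sum(y for _, y in train) * 2 >= len(train)

def predict_const(h, x):
    return h  # always predict the majority class

acc = holdout_accuracy(data, learn_majority, predict_const)
print(acc)
```

Repeating the call for increasing training-set sizes and averaging over random splits yields the learning curve shown in the figure below.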


A learning curve for the decision tree algorithm on 100 randomly generated examples in the
restaurant domain. The graph summarizes 20 trials.

Noise and overfitting:


Whenever there is a large set of possible hypotheses, one has to be careful not to use the resulting freedom to
find meaningless "regularity" in the data. This problem is called overfitting. A very general phenomenon,
overfitting occurs even when the target function is not at all random; it afflicts every kind of learning
algorithm, not just decision trees.
Decision tree pruning is one technique for dealing with this problem. Pruning works by preventing recursive
splitting on attributes that are not clearly relevant, even when the data at that node in the tree are not
uniformly classified.

Applicability of decision trees:


To extend decision tree induction to a wider variety of problems, a number of issues must be addressed:
- Missing data
- Multivalued attributes
- Continuous and integer-valued input attributes
- Continuous-valued output attributes
