CS601PC: MACHINE LEARNING
Unit 1-2
Dr. D. V. Ramana, Data Strategist - Consultant and Academic Advisor
Prerequisites
Data Structures
Knowledge of statistical methods
Course Objective
This course explains machine learning techniques such as decision tree learning and Bayesian learning.
Course Outcomes
Ability to apply machine learning techniques to address real-time problems in different areas.
Understand neural networks and their usage in machine learning applications.
References
Text Book:
Machine Learning, Tom M. Mitchell, McGraw-Hill
Reference Book:
Machine Learning: An Algorithmic Perspective, Stephen Marsland, Taylor & Francis
Agenda – Unit I
Introduction - Well-posed learning problems, Inductive bias
Decision Tree Learning - Introduction
Introduction
Machine learning is a subfield of artificial intelligence, which is broadly defined as the capability of a machine to
imitate intelligent human behavior
A computer program is said to learn from experience E with respect to some class of tasks T and performance measure
P, if its performance at tasks in T, as measured by P, improves with experience E.
Introduction - Well-posed learning problems
A computer program is said to learn from experience E with respect to some class of tasks T and performance
measure P, if its performance at tasks in T, as measured by P, improves with experience E.
EXAMPLE:
A computer program that learns to play checkers might improve its performance as measured by its ability to
win at the class of tasks involving playing checkers games, through experience obtained by playing games
against itself.
Introduction - Well-posed learning problems: Learning to recognize spoken words
All of the most successful speech recognition systems employ machine learning in some form
Example
SPHINX system (e.g., Lee 1989) learns speaker-specific strategies for recognizing
the primitive sounds (phonemes) and words from the observed speech signal
Neural network learning methods (e.g., Waibel et al. 1989) and methods for learning hidden
Markov models (e.g., Lee 1989) are effective for
Automatically customizing to individual speakers, vocabularies, microphone characteristics, background noise, etc.
Introduction - Well-posed learning problems: Learning to drive an autonomous vehicle
Machine learning methods have been used to train computer-controlled vehicles to steer correctly when
driving on a variety of road types
Example
ALVINN system (Pomerleau 1989) has used its learned strategies to drive unassisted
at 70 miles per hour for 90 miles on public highways among other cars
Introduction - Well-posed learning problems: Learning to classify new astronomical structures
Machine learning methods have been applied to a variety of large databases to learn general regularities
implicit in the data
Example
Decision tree learning algorithms have been used by NASA to learn how to classify
celestial objects from the second Palomar Observatory Sky Survey (Fayyad et al. 1995)
This system is now used to automatically classify all objects in the Sky Survey, which consists of three
terabytes of image data
Introduction - Well-posed learning problems: Learning to play world-class backgammon
The most successful computer programs for playing games such as backgammon are based on machine learning algorithms
Example
The TD-GAMMON program learned its strategy by playing over one million practice games against itself
Similar techniques have applications in many practical problems where very large search spaces must be
examined efficiently.
Three features
Class of tasks
Measure of performance to be improved, and
Source of experience
A checkers learning problem:
Task T: playing checkers
Performance measure P: percent of games won against opponents
Training experience E : playing practice games against itself
A handwriting recognition learning problem:
Task T: recognizing and classifying handwritten words within images
Performance measure P: percent of words correctly classified
Training experience E: a database of handwritten words with given classifications
Any algorithm that achieves this is called a machine learning algorithm
Three features
Definition of Machine Learning (Mitchell 1997)
A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.
Example: a spam filter
Task T: classifying mails as 'spam' or 'not spam'
Performance measure P: total percent of mails correctly classified as 'spam' (or 'not spam') by the program
Training experience E: a set of mails with given labels ('spam' / 'not spam')
Three features
Simple Learning Process
Machine learning algorithms
Machine learning draws on ideas from a diverse set of disciplines, including
Artificial intelligence
Computational complexity
Information theory
Philosophy
Disciplines and Examples of their influence on Machine Learning
Traditional programming: Data + Program → Computer → Output
Machine learning: Data + Output → Computer → Program
Disciplines and Examples of their influence on Machine Learning
Artificial intelligence
Any method that tries to replicate the results of some aspect of human cognition
Disciplines and Examples of their influence on Machine Learning
Machine Learning
Any method that tries to replicate the results of some aspect of human cognition
Machine learning is the set of algorithms that actually get better with experience; artificial intelligence might or might not get better with experience.
Disciplines and Examples of their influence on Machine Learning
Machine Learning
A computer program is said to learn from experience E with respect to some class of tasks T and performance
measure P, if its performance at tasks in T, as measured by P, improves with experience E
Main goal of Machine learning is to devise learning algorithms that do the learning automatically without human
intervention or assistance
Develop Computational models of human learning process and perform computer simulations
To build computer systems that can adapt and learn from their experience
Can Figure out how to perform important tasks by generalizing from examples
Provides business insight and intelligence. Decision makers are provided with greater insights into
their organizations
Discover the relationships between the variables of a system (input, output and hidden) from direct samples of the system
Disciplines and Examples of their influence on Machine Learning
Machine Learning
Seeds = Algorithms
Nutrients= Data
Gardener = You
Plants = Programs
https://archive.org/details/academictorrents_0db676a6aaff8c33f9749d5f9c0fa22bf336bc76/01+Introduction+%26+Inductive+learning/6.++Machine+Learning+In+Practice.mp4
Disciplines and Examples of their influence on Machine Learning
Bayesian methods
Bayes' theorem as the basis for calculating probabilities of hypotheses; the naive Bayes classifier; algorithms for estimating values of unobserved variables.
Disciplines and Examples of their influence on Machine Learning
Computational complexity theory
Theoretical bounds on the inherent complexity of different learning tasks, measured in terms of the computational effort, number of training examples, number of mistakes, etc., required in order to learn.
Disciplines and Examples of their influence on Machine Learning
Control theory
Procedures that learn to control processes in order to optimize predefined objectives, and that learn to predict the next state of the process being controlled.
Disciplines and Examples of their influence on Machine Learning
Information theory
Measures of entropy and information content; minimum description length approaches to learning; optimal codes and their relationship to optimal training sequences for encoding a hypothesis.
Disciplines and Examples of their influence on Machine Learning
Philosophy
Occam's razor, suggesting that the simplest hypothesis is the best; the analysis of the justification for generalizing beyond observed data.
Disciplines and Examples of their influence on Machine Learning
Psychology and Neurobiology
The power law of practice, which states that over a very broad range of learning problems,
people's response time improves with practice according to a power law
Disciplines and Examples of their influence on Machine Learning
Statistics
Characterization of errors (e.g., bias and variance) that occur when estimating the accuracy of a
hypothesis based on a limited sample of data. Confidence intervals, statistical tests.
Perspectives and issues in machine learning
It involves searching a very large space of possible hypotheses to determine one that best fits the
observed data and any prior knowledge held by the learner
Example
Consider the space of hypotheses that could in principle be output by the above checkers learner.
This hypothesis space consists of all evaluation functions that can be represented by some choice of values for the weights w0 through w6.
Perspectives and issues in machine learning
The learner's task is thus to search through this vast space to locate the hypothesis that is most
consistent with the available training examples
LMS algorithm for fitting weights achieves this goal by iteratively tuning the weights, adding a
correction to each weight each time the hypothesized evaluation function predicts a value that
differs from the training value
LMS algorithm works well when the hypothesis representation considered by the learner defines a
continuously parameterized space of potential hypotheses
Designing a Machine Learning System
For a well-posed ML problem, the following steps are used to design a learning mechanism and measure its performance.
Designing a Machine Learning System
Step 1 - Choose the type of training experience / data set
Designing a Machine Learning System
Step 2 - Choose the Target Function (Target Function V)
What type of knowledge will be learned, and how will it be used by the performance program?
Example: for checkers, the target function V : Board → R assigns a numerical value V(b) to each board state b.
Designing a Machine Learning System
Step 3 - Choose Target Function Representation
The ideal target function V is usually not known; machine learning algorithms learn an approximation of V, often written V̂ or V'.
Designing a Machine Learning System
Step 4 - Target Function Approximation
Choose Learning Algorithm
Estimating Values
Adjusting weights
Designing a Machine Learning System
Step 5 –Final Design
Issues in Machine Learning
What algorithms exist for learning general target functions from specific training examples?
In what settings will particular algorithms converge to the desired function, given sufficient training data?
Which algorithms perform best for which types of problems and representations?
What general bounds can be found to relate the confidence in learned hypotheses to the amount
of training experience and the character of the learner's hypothesis space?
When and how can prior knowledge held by the learner guide the process of generalizing from examples?
What is the best strategy for choosing a useful next training experience, and how does the choice of this strategy
alter the complexity of the learning problem?
What is the best way to reduce the learning task to one or more function approximation problems?
Machine learning algorithms
Machine learning algorithms have proven to be of great practical value in a variety of application domains.
Data mining problems where large databases may contain valuable implicit regularities that can be
discovered automatically (e.g., to analyze outcomes of medical treatments from patient databases or to
learn general rules for credit worthiness from financial databases)
Poorly understood domains where humans might not have the knowledge needed to develop effective algorithms (e.g., human face recognition from images); and
Domains where the program must dynamically adapt to changing conditions (e.g., controlling manufacturing
processes under changing supply stocks or adapting to the changing reading interests of individuals)
Machine learning algorithms
A well-defined learning problem requires a well-specified task, performance metric, and source of training
experience.
Learning
Learning process includes gaining of new symbolic knowledge and development of cognitive skills through
instruction and practice
It is also the discovery of new facts and theories through observation and experiment
Inductive learning is based on formulating a generalized concept after observing examples of the concept
Example
If a kid is asked to write an answer to 2*8=x, they can either use the rote learning method to
memorize the answer or use inductive learning (i.e. thinking how 2*1=2, 2*2=4, and so on) to
formulate a concept to calculate the results
In this way, the kid will be able to solve similar types of questions using the same concept
Learning
“The activity or process of gaining knowledge or skill by studying, practicing, being taught, or experiencing
something.”
Rote learning (memorization): memorizing things without knowing the concept/logic behind them.
Inductive learning (experience): formulating a generalized concept on the basis of past experience.
Learning
Concept learning
“The problem of searching through a predefined space of potential hypotheses for the hypothesis that best fits the training examples.” - Tom Mitchell
Example
Humans identify different vehicles among all the vehicles based on specific sets of features
defined over a large set of features
This special set of features differentiates the subset of cars in a set of vehicles
Machines can learn from concepts to identify whether an object belongs to a specific category by processing past/training
data to find a hypothesis that best fits the training examples
Much of human learning involves acquiring general concepts from past experiences.
Concept learning
Acquiring the definition of a general category from given sample positive and negative training examples of the
category.
Concept learning can be seen as a problem of searching through a predefined space of potential hypotheses for the hypothesis that best fits the training examples
Goal of the concept learning search is to find the hypothesis that best fits the training examples
Concept learning can be viewed as searching through a large space of hypotheses implicitly defined by the hypothesis representation
Concept learning as search
Concept learning can be viewed as the task of searching through a large space of hypotheses implicitly defined by the hypothesis representation
The goal of this search is to find the hypothesis that best fits the training examples
Hypothesis Representation
Hypothesis
If some instance x satisfies all the constraints of hypothesis h, then h classifies x as a positive example (h(x) = 1)
Hypothesis Representation
A hypothesis:
EnjoySport concept learning task requires learning the sets of days for which EnjoySport=yes, describing this set by a
conjunction of constraints over the instance attributes
Given:
Instances X: the set of all possible days, each described by the attributes Sky, AirTemp, Humidity, Wind, Water and Forecast
Hypotheses H: each hypothesis is a conjunction of constraints on the instance attributes
Target concept c: EnjoySport : X → {0, 1}
Training examples D: positive and negative examples of the target function
Determine:
A hypothesis h in H such that h(x) = c(x) for all x in D
Formal Definition for Concept Learning
Inferring a boolean-valued function from training examples of its input and output
A Concept Learning Task – EnjoySport Training Examples
The task is to learn to predict the value of EnjoySport for an arbitrary day, based on the values of its attributes.
Each hypothesis will be a vector of six constraints, specifying the values of the six attributes
– (Sky, AirTemp, Humidity, Wind, Water, and Forecast)
For each attribute, a hypothesis constraint may be:
'?' - indicating any value is acceptable for the attribute
a single specific value (e.g., Warm) - indicating exactly that value is required
Φ - indicating no value is acceptable for the attribute
Task T: Learn to predict the value of EnjoySport for an arbitrary day, based on the values of the attributes of the day
Performance measure P: total percent of days for which EnjoySport is correctly predicted
Training experience E: A set of days with given labels (EnjoySport: Yes/No)
Let us take a very simple hypothesis representation which consists of a conjunction of constraints in the instance
attributes
From example i of the training set we get a hypothesis h_i(x) = <x1, x2, x3, x4, x5, x6>, where x1, x2, x3, x4, x5 and x6 are the constraints on Sky, AirTemp, Humidity, Wind, Water and Forecast.
h1(x) = <Sunny, Warm, Normal, Strong, Warm, Same> (from the first, positive, example)
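To make this concrete, the following minimal Python sketch (our own illustration, not from the slides) represents a hypothesis as a tuple of six constraints, with "?" for any value and None standing in for Φ, and checks whether an instance satisfies it:

```python
def satisfies(hypothesis, instance):
    """Return True iff the instance meets every constraint, i.e. h(x) = 1."""
    for constraint, value in zip(hypothesis, instance):
        if constraint is None:                      # Φ: no value is acceptable
            return False
        if constraint != "?" and constraint != value:
            return False
    return True

h1 = ("Sunny", "Warm", "Normal", "Strong", "Warm", "Same")
x = ("Sunny", "Warm", "Normal", "Strong", "Warm", "Same")
print(satisfies(h1, x))  # True -> h1 classifies x as a positive example
```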
We want to find the most suitable hypothesis which can represent the concept
Example
Ramesh enjoys his favorite sport only on cold days with high humidity (this seems independent of the values of the other attributes in the training examples), which corresponds to the hypothesis <?, Cold, High, ?, ?, ?>.
The most general hypothesis is <?, ?, ?, ?, ?, ?>, where every day is a positive example, and the most specific hypothesis is <Φ, Φ, Φ, Φ, Φ, Φ>, where no day is a positive example.
General-to-Specific Ordering of Hypotheses
Many algorithms for concept learning organize the search through the hypothesis space by relying on a very useful structure that exists for any concept learning problem: a general-to-specific ordering of hypotheses.
By taking advantage of this naturally occurring structure over the hypothesis space, we can design learning
algorithms that exhaustively search even infinite hypothesis spaces without explicitly enumerating every
hypothesis.
General-to-Specific Ordering of Hypotheses
h1 = (Sunny, ?, ?, Strong, ?, ?)
h2 = (Sunny, ?, ?, ?, ?, ?)
Now consider the sets of instances that are classified positive by h1 and by h2.
Because h2 imposes fewer constraints on the instance, it classifies more instances as positive
General-to-Specific Ordering of Hypotheses
First, for any instance x in X and hypothesis h in H, we say that x satisfies h if and only if h(x) = 1.
We now define the more_general_than_or_equal_to relation in terms of the sets of instances that satisfy the two hypotheses.
Definition: given hypotheses hj and hk, hj is more_general_than_or_equal_to hk if and only if any instance that satisfies hk also satisfies hj.
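A minimal sketch of this relation for conjunctive hypotheses, assuming the tuple representation shown earlier (the function name is our own):

```python
def more_general_or_equal(hj, hk):
    """True iff every instance that satisfies hk also satisfies hj."""
    for cj, ck in zip(hj, hk):
        if cj == "?":                 # hj places no constraint here
            continue
        if ck is None or cj == ck:    # hk is at least as strict here
            continue
        return False
    return True

h1 = ("Sunny", "?", "?", "Strong", "?", "?")
h2 = ("Sunny", "?", "?", "?", "?", "?")
print(more_general_or_equal(h2, h1))  # True: h2 imposes fewer constraints
print(more_general_or_equal(h1, h2))  # False
```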
find-S - Finding a maximally specific hypothesis
The find-S algorithm is a basic concept learning algorithm in machine learning.
It finds the most specific hypothesis that fits all the positive examples; note that the algorithm considers only the positive training examples.
find-S starts with the most specific hypothesis and generalizes it each time it fails to classify an observed positive training example.
find-S thus moves from the most specific hypothesis toward the most general hypothesis.
Using the find-S algorithm gives a single maximally specific hypothesis for the given set of training examples.
Important representation used by find-S:
'?' - any value is acceptable for the attribute
a single specific value - exactly that value is required
Φ - no value is acceptable
Steps involved in find-S:
1. Start with the most specific hypothesis: h = <Φ, Φ, Φ, Φ, Φ, Φ>.
2. Take the next example; if it is negative, no changes are made to the hypothesis.
3. If the example is positive and the current hypothesis is too specific, update the hypothesis to a more general condition.
4. Keep repeating the above steps until all the training examples have been processed.
5. After processing all the training examples we have the final hypothesis, which can be used to classify the new examples.
How does the find-S algorithm work? (flowchart summary)
Initialize h. For each positive example, check each attribute: if the attribute value is equal to the hypothesis value, keep it; if not, replace the value in the hypothesis with '?'.
find-S - Finding a maximally specific hypothesis: worked example
Example  Citations  Size    Library  Price       Editions  Buy
1        Some       Small   No       Affordable  Many      No
2        Many       Big     No       Expensive   One       Yes
3        Some       Big     Always   Expensive   Few       No
4        Many       Medium  No       Expensive   Many      Yes
5        Many       Small   No       Affordable  Many      Yes
Apply the find-S algorithm by hand on the given training set. Consider the examples in the specified order and write down your hypothesis each time after observing an example.
Solution:
h0 = (Φ, Φ, Φ, Φ, Φ)
X1 = (some, small, no, affordable, many) - No → h1 = (Φ, Φ, Φ, Φ, Φ)
X2 = (many, big, no, expensive, one) - Yes → h2 = (many, big, no, expensive, one)
Number of hypotheses in the hypothesis space: whenever a hypothesis contains the null value (Φ) for any attribute it can never classify an instance as positive, so all such hypotheses are semantically equivalent; for that reason we count semantically distinct hypotheses.
To count semantically distinct hypotheses, we take for each attribute the actual number of possible values plus one more possibility for the question mark, and finally add one for the single all-Φ hypothesis.
The five attributes have 2, 3, 2, 2 and 3 possible values respectively, so instead of 2*3*2*2*3 we compute (3*4*3*3*4) + 1 = 433 (the +1 counts all hypotheses containing Φ as one).
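The arithmetic can be checked with a few lines of Python (illustrative only; the value counts are read directly off the table above):

```python
values = [2, 3, 2, 2, 3]   # distinct values of Citations, Size, Library, Price, Editions

syntactic, semantic = 1, 1
for v in values:
    syntactic *= v + 2     # each attribute: one of its values, '?', or Φ
    semantic *= v + 1      # ignore Φ: every Φ-containing hypothesis is the same empty concept
semantic += 1              # add back the single all-negative (Φ) hypothesis

print(syntactic)  # 4*5*4*4*5 = 1600 syntactically distinct hypotheses
print(semantic)   # (3*4*3*3*4) + 1 = 433 semantically distinct hypotheses
```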
X3 = (some, big, always, expensive, few) - No → h3 = (many, big, no, expensive, one)
X4 = (many, medium, no, expensive, many) - Yes → h4 = (many, ?, no, expensive, ?)
X5 = (many, small, no, affordable, many) - Yes → h5 = (many, ?, no, ?, ?)
The final (maximally specific) hypothesis for this data set is h5 = (many, ?, no, ?, ?).
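For reference, a self-contained Python sketch of find-S on this data set (a hedged illustration; attribute order Citations, Size, Library, Price, Editions, with None standing in for Φ):

```python
data = [
    (("some", "small", "no", "affordable", "many"), "no"),
    (("many", "big", "no", "expensive", "one"), "yes"),
    (("some", "big", "always", "expensive", "few"), "no"),
    (("many", "medium", "no", "expensive", "many"), "yes"),
    (("many", "small", "no", "affordable", "many"), "yes"),
]

def find_s(examples):
    n = len(examples[0][0])
    h = [None] * n                    # most specific hypothesis (all Φ)
    for instance, label in examples:
        if label != "yes":            # find-S ignores negative examples
            continue
        for i, value in enumerate(instance):
            if h[i] is None:          # first positive example: copy its values
                h[i] = value
            elif h[i] != value:       # generalize a failing constraint to '?'
                h[i] = "?"
    return h

print(find_s(data))  # ['many', '?', 'no', '?', '?'] -- matches h5 above
```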
find-S - Finding a maximally specific hypothesis
Implement the find-S algorithm on the following table and generate the final hypothesis.
Advantages of Find-S Algorithm
Find-S algorithm only considers the positive examples and eliminates negative examples
Limitations of Find-S Algorithm
Inconsistent training sets can actually mislead the Find-S algorithm, since it ignores the
negative examples.
Find-S algorithm does not provide a backtracking technique to determine the best
possible changes that could be done to improve the resulting hypothesis.
Version spaces and the candidate elimination algorithm
Version space:
A version space is the set of all hypotheses that are consistent with the training examples.
The version space, denoted VS_H,D with respect to hypothesis space H and training examples D, is the subset of hypotheses from H consistent with the training examples in D.
The version space is a hierarchical representation of knowledge that enables you to keep track of all the useful information supplied by a sequence of learning examples without remembering any of the examples.
The version space method is a concept learning process accomplished by managing multiple models within a version space.
Version spaces and the candidate elimination algorithm
Version space:
One limitation of the FIND-S algorithm is that it outputs just one hypothesis consistent with the training data, and there might be many.
Version space learning is a logical approach to machine learning, specifically binary classification.
Version space learning algorithms search a predefined space of hypotheses, viewed as a set of logical sentences.
The version space therefore contains not just one hypothesis but the set of all plausible hypotheses based on the training data set.
Version spaces and the candidate elimination algorithm
Characteristics of Version Space
A plausible description is one that is applicable to all known positive examples and to no known negative example.
Version spaces and the candidate elimination algorithm
A hypothesis h is consistent with a set of training examples D iff h(x) = c(x) for each example in D.
Example: h1 = (?, ?, No, ?, Many) - consistent
Version spaces and the candidate elimination algorithm
Remarks on version spaces and candidate elimination
A hypothesis h is consistent with a set of training examples D iff h(x) = c(x) for each example in D.
Find S Algorithm Vs Candidate Elimination algorithm
FIND-S outputs a single hypothesis from H that is consistent with the training examples; this is just one of many hypotheses from H that might fit the training data equally well.
Candidate-Elimination algorithm is to output a description of the set of all hypotheses consistent with
the training examples.
Inductive bias
The Candidate Elimination Algorithm will converge toward the true target concept provided it is given
accurate training examples and provided its initial hypothesis space contains the target concept
How does the size of this hypothesis space influence the ability of the
algorithm to generalize to unobserved instances?
How does the size of the hypothesis space influence the number of
training examples that must be observed?
Inductive bias
In EnjoySport example, we restricted the hypothesis space to include only conjunctions of
attribute values.
Because of this restriction, the hypothesis space is unable to represent even simple disjunctive target concepts such as "Sky = Sunny or Sky = Cloudy".
Sky Air Temp Humidity Wind Water Forecast Enjoy Sport
Sunny Warm Normal Strong Cool Change Yes
Cloudy Warm Normal Strong Cool Change Yes
Rainy Warm Normal Strong Cool Change No
From the first two examples, S2: <?, Warm, Normal, Strong, Cool, Change>.
This is inconsistent with the third example, and there are no conjunctive hypotheses consistent with these three examples.
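This inconsistency can be checked mechanically (an illustrative sketch mirroring the argument above, not code from the slides):

```python
x1 = ("Sunny", "Warm", "Normal", "Strong", "Cool", "Change")   # Yes
x2 = ("Cloudy", "Warm", "Normal", "Strong", "Cool", "Change")  # Yes
x3 = ("Rainy", "Warm", "Normal", "Strong", "Cool", "Change")   # No

# Minimal generalization covering both positives: differing attributes become '?'
s2 = tuple(a if a == b else "?" for a, b in zip(x1, x2))
print(s2)  # ('?', 'Warm', 'Normal', 'Strong', 'Cool', 'Change')

# Any conjunctive hypothesis covering x1 and x2 is at least as general as s2,
# and s2 already covers the negative example x3 -- hence the inconsistency.
print(all(c == "?" or c == v for c, v in zip(s2, x3)))  # True
```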
Problem: we have biased the learner to consider only conjunctive hypotheses. We require a more expressive hypothesis space.
The obvious solution to the problem of assuring that the target concept is in the hypothesis space
H is to provide a hypothesis space capable of representing every teachable concept.
ML – Candidate Elimination Algorithm
The Candidate Elimination algorithm computes the version space containing all (and only those) hypotheses from H that are consistent with an observed sequence of training examples.
The algorithm incrementally builds the version space given a hypothesis space H and a set E of examples, by updating the general and specific boundaries for each new example.
You can consider this an extended form of the Find-S algorithm, since it considers both positive and negative examples: positive examples are used, as in Find-S, to generalize from the specific boundary, while negative examples specialize from the general boundary.
ML – Candidate Elimination Algorithm
Terms Used:
Concept learning: the learning task of the machine (learning from training data).
General hypothesis G = {'?', '?', ...}: the number of '?' entries depends on the number of attributes.
Specific hypothesis S = {'Φ', 'Φ', ...}: the number of Φ entries depends on the number of attributes.
Version space: the intermediate region between the general hypothesis and the specific hypothesis. It contains not just one hypothesis but the set of all possible hypotheses based on the training data set.
ML – Candidate Elimination Algorithm
Algorithm:
Initialize G to the set of maximally general hypotheses in H and S to the set of maximally specific hypotheses in H.
For each training example d:
if d is a positive example, remove from G any hypothesis inconsistent with d, and minimally generalize the members of S until they are consistent with d;
if d is a negative example, remove from S any hypothesis inconsistent with d, and minimally specialize the members of G until they are consistent with d.
ML – Candidate Elimination Algorithm
Consider the dataset given below (the EnjoySport training examples):
Sky    AirTemp  Humidity  Wind    Water  Forecast  EnjoySport
sunny  warm     normal    strong  warm   same      yes
sunny  warm     high      strong  warm   same      yes
rainy  cold     high      strong  warm   change    no
sunny  warm     high      strong  cool   change    yes
Algorithmic steps:
Initially : G = [[?, ?, ?, ?, ?, ?], [?, ?, ?, ?, ?, ?], [?, ?, ?, ?, ?, ?],
[?, ?, ?, ?, ?, ?], [?, ?, ?, ?, ?, ?], [?, ?, ?, ?, ?, ?]]
G1 = G
S1 = ['sunny','warm','normal','strong','warm ','same']
DVR
For instance 2 : <'sunny','warm','high','strong','warm ','same'> and positive output.
G2 = G
S2 = ['sunny','warm',?,'strong','warm ','same']
For instance 3 : <'rainy','cold','high','strong','warm','change'> and negative output.
G3 = [['sunny', ?, ?, ?, ?, ?], [?, 'warm', ?, ?, ?, ?], [?, ?, ?, ?, ?, 'same']]
S3 = S2
For instance 4 : <'sunny','warm','high','strong','cool','change'> and positive output.
G4 = [['sunny', ?, ?, ?, ?, ?], [?, 'warm', ?, ?, ?, ?]]
S4 = ['sunny','warm',?,'strong', ?, ?]
Output :
G = [['sunny', ?, ?, ?, ?, ?], [?, 'warm', ?, ?, ?, ?]]
S = ['sunny','warm',?,'strong', ?, ?] .
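The full trace can be reproduced with the following hedged sketch (our own simplified implementation for conjunctive hypotheses; it keeps only the S and G boundary sets and assumes the first example is positive):

```python
data = [
    (("sunny", "warm", "normal", "strong", "warm", "same"), True),
    (("sunny", "warm", "high", "strong", "warm", "same"), True),
    (("rainy", "cold", "high", "strong", "warm", "change"), False),
    (("sunny", "warm", "high", "strong", "cool", "change"), True),
]
domains = [sorted({x[i] for x, _ in data}) for i in range(6)]

def covers(h, x):
    return all(c == "?" or c == v for c, v in zip(h, x))

S = [data[0][0]]          # initialize S to the first positive example
G = [("?",) * 6]          # initialize G to the maximally general hypothesis
for x, positive in data:
    if positive:
        G = [g for g in G if covers(g, x)]
        S = [tuple(si if si == xi else "?" for si, xi in zip(s, x)) for s in S]
    else:
        S = [s for s in S if not covers(s, x)]
        new_G = []
        for g in G:
            if not covers(g, x):
                new_G.append(g)
                continue
            for i, domain in enumerate(domains):   # minimally specialize g
                if g[i] != "?":
                    continue
                for v in domain:
                    g2 = (*g[:i], v, *g[i + 1:])
                    # keep only specializations that reject x and remain
                    # more general than some member of S
                    if v != x[i] and any(covers(g2, s) for s in S):
                        new_G.append(g2)
        G = new_G

print("S:", S)  # [('sunny', 'warm', '?', 'strong', '?', '?')]
print("G:", G)  # [('sunny', '?', '?', '?', '?', '?'), ('?', 'warm', '?', '?', '?', '?')]
```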
Remarks on the Candidate Elimination and version space algorithm:
The algorithm will converge toward the true target concept provided it is given accurate training examples and provided its initial hypothesis space contains the target concept.
Inductive Bias - Fundamental Questions for Inductive Inference
Decision Tree Learning -Introduction
A decision tree is a tree where each node represents a feature (attribute), each link (branch) represents a decision (rule) and each leaf represents an outcome.
A decision tree is a simple representation for classifying examples
Decision tree learning is a method commonly used in data mining
The goal is to create a model that predicts the value of a target variable based on several input
variables
Decision Tree Learning -Introduction
Decision tree learning is a method for approximating discrete- valued target functions, in which the
learned function is represented by a decision tree.
Learned trees can also be re- represented as sets of if-then rules to improve human readability
Decision trees classify instances by sorting them down the tree from the root to some leaf node, which provides the classification of the instance.
Each node in the tree specifies a test of some attribute of the instance and each branch descending
from the node corresponds to one of the possible values for this attribute.
An instance is classified by starting at the root node of the tree, testing the attribute specified by this node, then moving down the tree branch corresponding to the value of the attribute in the given example; this process is repeated for the subtree rooted at the new node.
Decision Tree Learning -Introduction
Decision tree learning is a method for approximating discrete-valued target functions, in which the learned function is represented by a decision tree.
A decision tree is a tree where each node represents a feature (attribute), each link (branch) represents a decision (rule) and each leaf represents an outcome (a categorical or continuous value).
A decision tree or classification tree is a tree in which each internal node is labeled with an input feature, and the arcs coming from a node labeled with a feature are labeled with each of the possible values of the feature.
1. Each leaf node has a class label, determined by majority vote of training examples reaching that leaf.
2. Each internal node is a question on features; it branches out according to the answers.
Decision Tree Learning -Decision tree representation
Decision tree consists of three types of nodes:
Decision nodes – typically represented by squares
Chance nodes – typically represented by circles
End nodes – typically represented by triangles
Decision Tree Learning -Decision tree representation
A decision tree classifies instances:
Node: an attribute which describes an instance
Branch: the possible values of the attribute
Leaf: the class to which the instance belongs
Decision Tree Learning -Decision tree representation
Important Terminology related to Decision Trees
Root Node
Root Node represents the entire population or sample and this further gets divided into two or more homogeneous sets.
Splitting
Splitting is the process of dividing a node into two or more sub-nodes.
Decision Node
When a sub-node splits into further sub-nodes, then it is called the decision node.
Decision Tree Learning -Decision tree representation
Important Terminology related to Decision Trees
Pruning
When we remove sub-nodes of a decision node, this process is called pruning. You can say the opposite
process of splitting
Branch / Sub-Tree
A subsection of the entire tree is called a branch or sub-tree.
Parent and Child Node
A node which is divided into sub-nodes is called the parent node of those sub-nodes, whereas the sub-nodes are the children of the parent node.
Decision Tree Learning -Appropriate problems for decision tree learning
Decision tree learning is generally best suited to problems with the following characteristics:
Instances are described by a fixed set of attributes (e.g., Temperature) and their values (e.g., Hot)
The easiest situation for decision tree learning is when each attribute takes on a small number of disjoint possible
values (e.g., Hot, Mild, Cold)
Extensions to the basic algorithm allow handling real-valued attributes as well (e.g., representing Temperature numerically).
Decision tree learning methods are robust to errors, both errors in classifications of the training examples and
errors in the attribute values that describe these examples
Decision Tree Learning -Decision tree representation
Decision tree algorithms fall under the category of supervised learning; they can be used to solve both regression and classification problems.
The decision tree uses these attributes or features and asks the right questions at the right step or node so as to classify whether the loan can be provided to the person or not.
Decision Tree Learning -Decision tree representation
Nodes that do not have any children are called leaf nodes (e.g., Get Loan / Don't Get Loan; there can also be more than two). Leaf nodes hold the output labels.
Decision Tree Learning -Appropriate problems for decision tree learning
Decision tree learning is generally best suited to problems with the following characteristics:
Decision tree methods can be used even when some training examples have unknown values (e.g., if the Humidity
of the day is known for only some of the training examples).
Decision Tree Learning - BASIC DECISION TREE LEARNING ALGORITHM
There are two basic algorithms:
CART (Classification and Regression Trees), which uses the GINI index
ID3, which uses the entropy function and information gain
THE BASIC DECISION TREE LEARNING ALGORITHM
• Most algorithms that have been developed for learning decision trees are variations on a core algorithm that employs a top-down, greedy search through the space of possible decision trees. This approach is exemplified by the ID3 algorithm and its successor C4.5.
Decision Tree Learning - ID3 ALGORITHM
ID3 algorithm, stands for Iterative Dichotomiser 3, is a classification algorithm that follows a
greedy approach of building a decision tree by selecting a best attribute that yields maximum
Information Gain (IG) or minimum Entropy (H)
ID3 only needs to test enough attributes until all data is classified
What is the ID3 algorithm?
• ID3 stands for Iterative Dichotomiser 3
• ID3 is a precursor to the C4.5 Algorithm.
• The ID3 algorithm was invented by Ross Quinlan in 1975
• Used to generate a decision tree from a given data set by employing a top-down,
greedy search, to test each attribute at every node of the tree.
• The resulting tree is used to classify future samples.
ID3 algorithm
ID3(Examples, Target_attribute, Attributes)
Examples are the training examples. Target_attribute is the attribute whose value is to be predicted
by the tree. Attributes is a list of other attributes that may be tested by the learned decision tree.
Returns a decision tree that correctly classifies the given Examples.
Decision Tree Learning - ID3 ALGORITHM
ID3 (Examples, Target_Attribute, Attributes)
Create a root node for the tree
If all examples are positive, Return the single-node tree Root, with label = +.
If all examples are negative, Return the single-node tree Root, with label = -.
If number of predicting attributes is empty, then Return the single node tree Root,
with label = most common value of the target attribute in the examples.
Otherwise Begin
A ← The Attribute that best classifies examples.
Decision Tree attribute for Root = A.
For each possible value, vi, of A,
Add a new tree branch below Root, corresponding to the test A = vi.
Let Examples(vi) be the subset of examples that have the value vi for A
If Examples(vi) is empty
Then below this new branch add a leaf node with label = most common target value in the examples
Else below this new branch add the subtree ID3 (Examples(vi), Target_Attribute, Attributes – {A})
End
Return Root
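To complement the pseudocode, here is a hedged Python sketch of ID3 (our own illustration; it assumes examples are dicts of categorical attribute values, branches only on values present in the data, and all names are our own):

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def information_gain(examples, target, attribute):
    """Entropy reduction obtained by splitting the examples on one attribute."""
    base = entropy([row[target] for row in examples])
    remainder = 0.0
    for value in {row[attribute] for row in examples}:
        subset = [row for row in examples if row[attribute] == value]
        remainder += len(subset) / len(examples) * entropy([row[target] for row in subset])
    return base - remainder

def id3(examples, target, attributes):
    labels = [row[target] for row in examples]
    if len(set(labels)) == 1:              # all examples share one label
        return labels[0]
    if not attributes:                     # no attributes left: majority label
        return Counter(labels).most_common(1)[0][0]
    best = max(attributes, key=lambda a: information_gain(examples, target, a))
    tree = {best: {}}
    for value in {row[best] for row in examples}:
        subset = [row for row in examples if row[best] == value]
        tree[best][value] = id3(subset, target, [a for a in attributes if a != best])
    return tree

rows = [
    {"Sky": "Sunny", "Wind": "Strong", "EnjoySport": "Yes"},
    {"Sky": "Rainy", "Wind": "Strong", "EnjoySport": "No"},
]
print(id3(rows, "EnjoySport", ["Sky", "Wind"]))
# {'Sky': {'Sunny': 'Yes', 'Rainy': 'No'}} (branch order may vary)
```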
Which Attribute Is the Best Classifier?
• The central choice in the ID3 algorithm is selecting which attribute to test at each node in the tree.
• A statistical property called information gain measures how well a given attribute separates the training examples according to their target classification.
• ID3 uses the information gain measure to select among the candidate attributes at each step while growing the tree.
Decision Tree Learning - BASIC DECISION TREE LEARNING ALGORITHM
ID3 uses the entropy function and information gain
Decision Tree Learning - Entropy
Entropy
Entropy is the measurement of disorder or impurity in the information processed in machine learning.
Entropy determines how a decision tree chooses to split data: the higher the entropy, the harder it is to draw any conclusions from that information.
Example: flipping a coin. When we flip a fair coin there can be two equally likely outcomes, so the entropy is 1 bit.
Entropy is frequently used in one of the most common machine learning techniques: decision trees.
Entropy is a measure of uncertainty, purity and information content. For a set whose instances fall into c classes,
Entropy(S) = - sum over i of p_i * log2(p_i)
where p_i is the proportion of the i-th class in that set.
Decision Tree Learning -Entropy
Entropy
Entropy is an information-theory metric that measures the impurity or uncertainty in a group of observations.
Entropy determines how a decision tree chooses to split data.
Entropy is the degree of uncertainty, impurity or disorder of a random variable; equivalently, it measures the lack of purity.
Entropy characterizes the impurity of an arbitrary collection of examples.
If all elements belong to a single class the set is termed "pure"; otherwise the distribution has "impurity".
Decision Tree Learning -Entropy
Example: calculate the entropy of our data set, which has 9 positive and 5 negative instances:
Entropy(S) = -(9/14) log2(9/14) - (5/14) log2(5/14) = 0.94   ---(1)
This means the data set is 94% impure (non-homogeneous).
If the data set is completely homogeneous then the impurity is 0, therefore the entropy is 0   ---(3)
If the data set can be equally divided into two classes, e.g. (7+, 7-) or (14+, 14-), then it is completely non-homogeneous, the impurity is 100% and the entropy is 1   ---(2)
If we plot the entropy against the class proportions, the curve clearly shows that the entropy is lowest when the data set is homogeneous and highest when the data set is completely non-homogeneous.
Decision Tree Learning -Entropy
Entropy
Let us assume, without loss of generality, that the resulting decision tree classifies instances into two categories; we will call them P (positive) and N (negative).
Given a set S containing these positive and negative targets, the entropy of S relative to this boolean classification is Entropy(S) = -p+ log2(p+) - p- log2(p-).
Example: if S is (0.5+, 0.5-) then Entropy(S) is 1; if S is (0.67+, 0.33-) then Entropy(S) is 0.92; if S is (1+, 0-) then Entropy(S) is 0.
Note that the more uniform the probability distribution, the greater its entropy.
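These values can be verified with a few lines of Python (an illustrative check; the helper entropy() is our own):

```python
import math

def entropy(p_pos):
    """Entropy of a boolean-labeled set with positive proportion p_pos."""
    probs = [p for p in (p_pos, 1 - p_pos) if p > 0]
    return sum(p * math.log2(1 / p) for p in probs)

print(round(entropy(0.5), 2))     # 1.0  -> (0.5+, 0.5-)
print(round(entropy(2 / 3), 2))   # 0.92 -> (0.67+, 0.33-)
print(round(entropy(1.0), 2))     # 0.0  -> (1+, 0-)
print(round(entropy(9 / 14), 2))  # 0.94 -> the (9+, 5-) set from the earlier slide
```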
Decision Tree Learning -Entropy
Entropy
Entropy is a measure of impurity of a node. By Impurity, We mean to measure the heterogeneity at a particular
node.
Example:
Assume that we have 50 red balls and 50 blue balls in a Set.
In this case the proportions of the balls of both colors are equal, hence the entropy would be 1, which means the set is impure.
But if the set has 98 red balls and 2 blue balls instead of the 50-50 proportion, the entropy would be much lower (the same logic applies to a set of 98 blue balls and 2 red balls: which category dominates does not matter, what matters is that one category dominates well over the other).
This is because the set is now mostly pure, as it mostly contains balls belonging to one category.
Decision Tree Learning -Information Gain
Information Gain
The concept of entropy plays an important role in measuring information gain.
Information gain is used for determining the best features/attributes that render maximum information about a class.
Information gain follows the concept of entropy while aiming to decrease the level of entropy, beginning from the root node down to the leaf nodes.
Information gain computes the difference between the entropy before and after a split, and so quantifies the impurity in class elements:
Information Gain = Entropy before splitting - Entropy after splitting
Information gain (IG) measures how much "information" a feature gives us about the class, tells us how important a given attribute of the feature vectors is, and is used to decide the ordering of attributes in the nodes of a decision tree.
Decision Tree Learning -Information Gain
Information Gain
Information gain is a measure of how much information a feature provides about a class; it is the amount of information gained about a random variable or signal from observing another random variable.
Information gain helps to determine the order of attributes in the nodes of a decision tree. The main node is referred to as the parent node, whereas sub-nodes are known as child nodes.
We can use information gain to determine how good the splitting of nodes in a decision tree is:
Information Gain = E_parent - E_children
where E_parent is the entropy of the parent node and E_children is the weighted average entropy of the child nodes.
Decision Tree Learning -Information Gain
Information Gain
Information gain is the reduction in entropy or surprise by transforming a dataset and is often
used in training decision trees
Information gain is calculated by comparing the entropy of the dataset before and after a
transformation
We can use information gain to determine how good the splitting of nodes in a decision tree is
Information gain computes the difference between entropy before split and average entropy
after split of the dataset based on given attribute values
Information gain is based on the decrease in entropy after a dataset is split on an attribute.
Constructing a decision tree is all about finding the attribute that returns the highest information gain (i.e., the most homogeneous branches)
Decision Tree Learning -Information Gain
Information Gain
The information gained in the decision tree can be defined as the amount of information improved in the nodes
before splitting them for making further decisions
Example:
In these three nodes we have data of two classes: node 3 has data for only one class, node 2 has less data for the second class than for the first, and node 1 is balanced.
From this we can say that in node 3 no further decision is needed, because all the instances point to the same class, whereas in node 1 there is a 50% chance of either class.
Decision Tree Learning -Information Gain
Information Gain
We can say that node 1 requires more information than the other nodes to describe a decision; information gain quantifies exactly this reduction in required information when a split is made.
Decision Tree Learning -Information Gain
Information Gain = entropy(parent) - [weighted average entropy(children)]
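As a worked check of this formula, a short hedged sketch (the helper names and the example split, an Outlook-style three-way split of the (9+, 5-) set from the PlayTennis data, are our own assumptions, not from the slides):

```python
import math

def entropy(counts):
    """Entropy from a list of per-class counts, e.g. [9, 5]."""
    total = sum(counts)
    return sum((c / total) * math.log2(total / c) for c in counts if c)

def information_gain(parent, children):
    """entropy(parent) minus the weighted average entropy of the children."""
    total = sum(parent)
    weighted = sum(sum(child) / total * entropy(child) for child in children)
    return entropy(parent) - weighted

# (9+, 5-) parent split three ways into (2+, 3-), (4+, 0-), (3+, 2-):
print(round(information_gain([9, 5], [[2, 3], [4, 0], [3, 2]]), 3))  # ~0.247
```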
Decision Tree Learning - Inductive bias in decision tree learning
The inductive bias (also known as learning bias) of a learning algorithm is the set of assumptions that the learner uses to predict outputs for inputs that it has not encountered.
In machine learning, one aims to construct algorithms that are able to learn to predict a certain target output.
Example: assuming that the solution to the problem of road safety can be expressed as a conjunction of a set of eight concepts.
In the case of decision trees, the depth of the tree is the inductive bias: if the depth of the tree is too low, there is too much generalisation in the model.
INDUCTIVE BIAS IN DECISION TREE LEARNING
Inductive bias is the set of assumptions that, together with the training data,
deductively justify the classifications assigned by the learner to future instances
INDUCTIVE BIAS IN DECISION TREE LEARNING
Approximate inductive bias of ID3: shorter trees are preferred over larger trees.
• Consider an algorithm that begins with the empty tree and searches breadth first through progressively more complex trees,
• first considering all trees of depth 1, then all trees of depth 2, and so on.
• Once it finds a decision tree consistent with the training data, it returns the smallest consistent tree at that search depth (e.g., the tree with the fewest nodes).
• Let us call this breadth-first search algorithm BFS-ID3.
• BFS-ID3 finds a shortest decision tree and thus exhibits the bias "shorter trees are preferred over longer trees".
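Scikit-learn's DecisionTreeClassifier grows trees greedily rather than by exhaustive breadth-first search, so the sketch below is only a loose, illustrative analogue of BFS-ID3, not Mitchell's algorithm itself: it tries shallow trees first and stops at the first one consistent with the training data.

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Try depth 1 first, then depth 2, and so on, as BFS-ID3 would.
for depth in range(1, 20):
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0).fit(X, y)
    if tree.score(X, y) == 1.0:  # consistent with the training data
        print("first consistent tree found at depth", depth)
        break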
A closer approximation to the inductive bias of ID3: shorter trees are preferred over longer trees, and trees that place high-information-gain attributes close to the root are preferred over those that do not.
Restriction Biases and Preference Biases
Difference between the types of inductive bias exhibited by ID3 and by the CANDIDATE-ELIMINATION algorithm:
ID3
• ID3 searches a complete hypothesis space, one capable of expressing any finite discrete-valued function.
• It searches incompletely through this space, from simple to complex hypotheses, until its termination condition is met.
• Its inductive bias is solely a consequence of the ordering of hypotheses by its search strategy; its hypothesis space introduces no additional bias.
CANDIDATE-ELIMINATION Algorithm
• The version space CANDIDATE-ELIMINATION algorithm searches an incomplete hypothesis space, one that can express only some of the possible concepts.
• It searches this space completely, finding every hypothesis consistent with the training data.
• Its inductive bias is solely a consequence of the expressive power of its hypothesis representation; its search strategy introduces no additional bias.
• The inductive bias of ID3 is a preference for certain hypotheses over others (e.g., a preference for shorter hypotheses over longer ones), with no hard restriction on the hypotheses that can eventually be enumerated. This form of bias is called a preference bias or a search bias.
• In contrast, the bias of the CANDIDATE-ELIMINATION algorithm is a categorical restriction on the set of hypotheses considered. This form of bias is called a restriction bias or a language bias.
Which type of inductive bias is preferred in order to generalize beyond the training
data, a preference bias or restriction bias?
• A preference bias is more desirable than a restriction bias, because it allows the
learner to work within a complete hypothesis space that is assured to contain the
unknown target function.
• In contrast, a restriction bias that strictly limits the set of potential hypotheses is
generally less desirable, because it introduces the possibility of excluding the
unknown target function altogether.
Occam's razor
Occam's razor is the problem-solving principle that the simplest solution tends to be the right one: when presented with competing hypotheses to solve a problem, one should select the solution with the fewest assumptions.
In learning terms: "Prefer the simplest hypothesis that fits the data."
Why Prefer Short Hypotheses?
Argument in favour:
There are fewer short hypotheses than long ones, so:
• a short hypothesis that fits the training data is unlikely to do so by coincidence;
• a long hypothesis that fits the training data may well do so by coincidence.
There are many complex hypotheses that fit the current training data yet fail to generalize correctly to subsequent data.
Argument opposed:
• There are few small trees, and the a priori chance of finding one consistent with an arbitrary set of data is therefore small. The difficulty is that there are very many small sets of hypotheses one could define, and by the same argument a learner could justify a preference for any of them.
• The size of a hypothesis is determined by the representation used internally by the learner. Two learners using different internal representations can therefore arrive at different hypotheses from the same training examples, each justifying its contradictory conclusion by Occam's razor. On this basis we might be tempted to reject Occam's razor altogether.
Decision Tree Learning - Issues in decision tree learning
Practical issues in learning decision trees include:
• determining how deeply to grow the decision tree,
• handling continuous attributes,
• choosing an appropriate attribute selection measure,
• handling training data with missing attribute values,
• handling attributes with differing costs, and
• improving computational efficiency.
Issues and extensions to the basic ID3 algorithm that address them:
Avoiding Overfitting the Data
When we design a machine learning model, the model is said to be good if it generalizes properly to any new input data from the problem domain.
This is what lets us make predictions on future data that the model has never seen.
Underfitting
A machine learning algorithm is said to underfit when it cannot capture the underlying trend of the data.
Its occurrence simply means that the model or algorithm does not fit the data well enough.
Underfitting usually happens when we have too little data to build an accurate model, or when we try to fit a linear model to non-linear data.
In such cases the rules of the model are too simple and inflexible for the data, and the model will probably make many wrong predictions.
Underfitting can be addressed by using more data and by increasing model capacity, for example by adding relevant features.
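A minimal sketch of underfitting, using a synthetic dataset of our own choosing: a straight-line model cannot capture a quadratic trend, so it scores poorly even on the data it was trained on.

import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = np.linspace(-3, 3, 100).reshape(-1, 1)
y = X.ravel() ** 2 + rng.normal(scale=0.5, size=100)  # quadratic trend plus noise

linear = LinearRegression().fit(X, y)
print(linear.score(X, y))  # R^2 near 0 on the training data itself: the model underfits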
Overfitting
A machine learning algorithm is said to be overfitted when it fits the training data too closely, learning even from the noise and inaccurate entries in the data set.
The model then fails to categorize new data correctly, because it has absorbed too much detail and noise.
Common causes of overfitting are non-parametric and non-linear methods, because these types of machine learning algorithms have more freedom in building the model from the dataset and can therefore build unrealistic models.
Solutions to avoid overfitting include using a linear algorithm if the data are linear, or constraining parameters such as the maximal depth when using decision trees.
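A minimal sketch of the decision-tree remedy, on a synthetic, deliberately noisy dataset (the dataset and parameter values here are our own illustrative choices): an unconstrained tree memorizes the training set, while a depth-limited tree usually generalizes better.

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# 20% label noise, so a fully grown tree has noise to memorize.
X, y = make_classification(n_samples=500, n_features=20, flip_y=0.2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for depth in (None, 3):  # unconstrained tree vs. depth-limited tree
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0).fit(X_train, y_train)
    print("max_depth =", depth,
          "train:", round(tree.score(X_train, y_train), 2),
          "test:", round(tree.score(X_test, y_test), 2))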
Assignment
Please watch the following videos and prepare notes on each of them:
• What is concept learning in machine learning? https://www.youtube.com/watch?v=a75S7EVav-M
• FIND S Algorithm | Finding A Maximally Specific Hypothesis https://www.youtube.com/watch?v=SD6MQLC2DdQ&list=PL4gu8xQu0_5JBO1FKRO5p20wc8DprlOgn
• Find S Algorithm Solved Numerical Example to find Maximally Specific Hypothesis https://www.youtube.com/watch?v=d-7qkRtimX4&list=PL4gu8xQu0_5JBO1FKRO5p20wc8DprlOgn&index=5
• Machine Learning | Find-S Algorithm https://www.youtube.com/watch?v=ZcyI621kgak
• Candidate Elimination Algorithm Concept https://www.youtube.com/watch?v=cW03t3aZkmE
• Candidate Elimination Algorithm | Solved Example - 1 https://www.youtube.com/watch?v=O2wYwFOMQ24&t=299s
• Candidate Elimination Algorithm With Example |ML| https://www.youtube.com/watch?v=orONxBtXp0o
• Candidate Elimination Algorithm Solved Numerical Example to find Specific and Generic Hypothesis https://www.youtube.com/watch?v=Hr96fzShANk&t=1s
I will check every student's notes on the above videos. This is your assignment.
DR DV RAMANA,
DATA STRATEGIST - CONSULTANT AND CHIEF ACADEMIC ADVISOR
MAIL ADDRESS: [email protected]
TO CONTACT: +91 9959423084