CS601PC - MACHINE LEARNING Unit - 1-2

This document outlines the agenda for a Machine Learning course. It discusses machine learning as a subfield of artificial intelligence, and defines machine learning as a computer program learning from experience to improve its performance on tasks. The document provides examples of machine learning problems, including learning to recognize speech, drive autonomous vehicles, classify astronomical structures, and play backgammon. It also discusses concepts like decision trees, neural networks, and hidden Markov models as they relate to machine learning.

B.Tech.

III YEAR II Semester

CS601PC: MACHINE LEARNING

DR DV RAMANA
DATA STRATEGIST – CONSULTANT AND ACADEMIC ADVISOR
Prerequisites

Data Structures
Knowledge of statistical methods

Course Objective

This course explains machine learning techniques such as decision tree learning, Bayesian learning, etc.

To understand computational learning theory.

To study pattern comparison techniques.

Course Outcomes

Understand the concepts of computational intelligence such as machine learning.

Gain the skill to apply machine learning techniques to address real-time problems in different areas.

Understand neural networks and their usage in machine learning applications.

References

TEXT BOOK:
Machine Learning – Tom M. Mitchell, McGraw-Hill

REFERENCE BOOK:
Machine Learning: An Algorithmic Perspective – Stephen Marsland, Taylor & Francis

Agenda – Unit- I
Introduction -
Well-posed learning problems,

Designing a learning system


Perspectives and issues in machine learning
Concept learning and the general to specific ordering
introduction
a concept learning task
concept learning as search
Find-S: finding a maximally specific hypothesis
Version spaces and the candidate elimination algorithm
Remarks on version spaces and candidate elimination

Inductive bias
Agenda – Unit- I
Decision Tree Learning
Introduction

Decision tree representation


Appropriate problems for decision tree learning
The basic decision tree learning algorithm
Inductive bias in decision tree learning
Issues in decision tree learning

Introduction
Machine learning is a subfield of artificial intelligence, which is broadly defined as the capability of a machine to
imitate intelligent human behavior

A computer program is said to learn from experience E with respect to some class of tasks T and performance measure
P, if its performance at tasks in T, as measured by P, improves with experience E.

Model: INPUT → MACHINE LEARNING → OUTPUT

Introduction - Well-posed learning problems
A computer program is said to learn from experience E with respect to some class of tasks T and performance
measure P, if its performance at tasks in T, as measured by P, improves with experience E.

EXAMPLE:

A computer program that learns to play checkers might improve its performance as measured by its ability to
win at the class of tasks involving playing checkers games, through experience obtained by playing games
against itself.

In general, to have a well-defined learning problem, we must identify three features: the class of tasks, the measure of performance to be improved, and the source of experience. Examples of learning problems include:

Learning to recognize spoken words


Learning to drive an autonomous vehicle
Learning to classify new astronomical structures
Learning to play world-class backgammon

Introduction - Well-posed learning problems: Learning to recognize spoken words

All of the most successful speech recognition systems employ machine learning in some form
Example

SPHINX system (e.g., Lee 1989) learns speaker-specific strategies for recognizing
the primitive sounds (phonemes) and words from the observed speech signal

Neural network learning methods (e.g., Waibel et al. 1989) and methods for learning hidden
Markov models (e.g., Lee 1989) are effective for
Automatically customizing to: individual speakers, vocabularies, microphone characteristics, background noise, etc.

Similar techniques have potential applications in many signal-interpretation problems

Introduction - Well-posed learning problems: Learning to drive an autonomous vehicle

Machine learning methods have been used to train computer-controlled vehicles to steer correctly when
driving on a variety of road types
Example

ALVINN system (Pomerleau 1989) has used its learned strategies to drive unassisted
at 70 miles per hour for 90 miles on public highways among other cars

Similar techniques have possible applications in many sensor-based control problems

Introduction - Well-posed learning problems: Learning to classify new astronomical structures

Machine learning methods have been applied to a variety of large databases to learn general regularities
implicit in the data
Example

Decision tree learning algorithms have been used by NASA to learn how to classify
celestial objects from the second Palomar Observatory Sky Survey (Fayyad et al. 1995)

This system is now used to automatically classify all objects in the Sky Survey, which consists of three
terabytes of image data

Introduction - Well-posed learning problems: Learning to play world-class backgammon

The most successful computer programs for playing games such as backgammon are based on machine learning algorithms.
Example

The world's top computer program for backgammon, TD-GAMMON (Tesauro 1992, 1995)

Learned its strategy by playing over one million practice games against itself

It now plays at a level competitive with the human world champion

Similar techniques have applications in many practical problems where very large search spaces must be
examined efficiently.

Three features
Class of tasks
Measure of performance to be improved, and
Source of experience
A checkers learning problem:
Task T: playing checkers
Performance measure P: percent of games won against opponents
Training experience E : playing practice games against itself
A handwriting recognition learning problem

Task T: Recognizing and classifying handwritten words within images


Performance measure P: Percent of words correctly classified
Training experience E: Database of handwritten words with given classifications

As experience E increases, the performance should get better and better. Any algorithm that achieves this is called a machine learning algorithm.
Definition of Machine Learning (Mitchell 1997)

A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.

Assume we want to classify an incoming mail as a Spam or Not.

This problem can be described in terms of three elements as below :

Spam Mail detection learning problem

Task T: To recognize and classify emails into ‘spam’ or ‘not spam’.

Performance measure P: Total percent of mails correctly classified as 'spam' (or 'not spam') by the program

Training experience E: A set of mails with given labels (‘spam’ / ‘not spam’)

Simple Learning Process

For any learning system, we must know the three elements:

T (Task), P (Performance Measure), and E (Training Experience).

At a high level, the process of a learning system looks as below.

Machine learning algorithms
Machine learning draws on ideas from a diverse set of disciplines, including

Artificial intelligence

Probability and statistics,

Computational complexity

Information theory

Psychology and neurobiology,

Control theory, and

Philosophy

Disciplines and Examples of their influence on Machine Learning

Traditional Programming: Data + Program → Computer → Output

Machine Learning: Data + Output → Computer → Program
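A minimal sketch (a hypothetical spam example, with names invented for illustration) contrasting the two diagrams above: in traditional programming a human writes the program, while in machine learning the "program" is derived from data plus known outputs.

# Traditional programming: Data + (hand-written) Program -> Output
def spam_rule(text):
    return "win money" in text.lower()  # rule written by a human

# Machine learning: Data + Output -> (learned) Program
def learn_spam_words(examples):
    """examples: list of (text, is_spam). Returns a learned classifier."""
    spam_words = set()
    for text, is_spam in examples:
        if is_spam:
            spam_words.update(text.lower().split())
    for text, is_spam in examples:
        if not is_spam:
            spam_words -= set(text.lower().split())  # keep words seen only in spam
    return lambda text: bool(spam_words & set(text.lower().split()))

learned_rule = learn_spam_words([("win money now", True), ("meeting at noon", False)])
print(learned_rule("free money"))  # True: "money" appeared only in spam examples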

Disciplines and Examples of their influence on Machine Learning
Artificial intelligence

Learning symbolic representations of concepts


Machine learning as a search problem
Learning as an approach to improving problem solving
Using prior knowledge together with training data to guide learning

Any method that tries to replicate the results of some aspect of human cognition

Disciplines and Examples of their influence on Machine Learning
Machine Learning

Programs that perform better as experience grows

Machine learning as a search problem
Learning as an approach to improving problem solving
Using prior knowledge together with training data to guide learning

Any method that tries to replicate the results of some aspect of human cognition

Machine learning is the set of algorithms which actually get better with experience; artificial intelligence might or might not actually get better with experience

Disciplines and Examples of their influence on Machine Learning
Machine Learning

A computer program is said to learn from experience E with respect to some class of tasks T and performance
measure P, if its performance at tasks in T, as measured by P, improves with experience E

Main goal of Machine learning is to devise learning algorithms that do the learning automatically without human
intervention or assistance

The machine learning paradigm can be viewed as "Programming by example".

Develop Computational models of human learning process and perform computer simulations

To build computer systems that can adapt and learn from their experience

Can Figure out how to perform important tasks by generalizing from examples

Provides business insight and intelligence. Decision makers are provided with greater insights into
their organizations
Discover the relationship between the variables of a system (input, output, and hidden) from direct samples of the system
Disciplines and Examples of their influence on Machine Learning
Machine Learning

Machine Learning: Data + Output → Computer → Program

Seeds = Algorithms

Nutrients = Data

Gardener = You

Plants = Programs

https://archive.org/details/academictorrents_0db676a6aaff8c33f9749d5f9c0fa22bf336bc76/01+Introduction+%26+Inductive+learning/6.++Machine+Learning+In+Practice.mp4
Disciplines and Examples of their influence on Machine Learning

The Master Algorithm – Pedro Domingos

Deep Learning with Python – François Chollet

Disciplines and Examples of their influence on Machine Learning
Machine Learning


Artificial Neural Network (ANN): A machine learning algorithm

Deep Learning: A type of ANN

Disciplines and Examples of their influence on Machine Learning
Bayesian methods

Bayes' theorem as the basis for calculating probabilities of hypotheses


The naive Bayes classifier
Algorithms for estimating values of unobserved variables

Disciplines and Examples of their influence on Machine Learning
Computational complexity theory

Theoretical bounds on the inherent complexity of different learning tasks, measured in terms of the

Computational effort

Number of training examples

Number of mistakes, etc. required in order to learn

Disciplines and Examples of their influence on Machine Learning
Control theory

Procedures that learn to control processes in order to optimize predefined objectives, and that learn to predict the next state of the process they are controlling

Disciplines and Examples of their influence on Machine Learning
Information theory

Measures of entropy and information content

Minimum description length approaches to learning


Optimal codes and their relationship to optimal training sequences for encoding a hypothesis
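For reference, the entropy measure named above (which reappears later in decision tree learning) for a source with outcome probabilities p1, ..., pn is:

Entropy = −(p1 log2 p1 + p2 log2 p2 + ... + pn log2 pn) = −Σ pi log2 pi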

Disciplines and Examples of their influence on Machine Learning
Philosophy

Occam's razor, suggesting that the simplest hypothesis is the best

Analysis of the justification for generalizing beyond observed data

Disciplines and Examples of their influence on Machine Learning
Psychology and Neurobiology

The power law of practice, which states that over a very broad range of learning problems,
people's response time improves with practice according to a power law

Neurobiological studies motivating artificial neural network models of learning

Disciplines and Examples of their influence on Machine Learning
Statistics

Characterization of errors (e.g., bias and variance) that occur when estimating the accuracy of a
hypothesis based on a limited sample of data. Confidence intervals, statistical tests.

Perspectives and issues in machine learning
It involves searching a very large space of possible hypotheses to determine one that best fits the
observed data and any prior knowledge held by the learner

Example

Consider the space of hypotheses that could in principle be output by the above checkers learner.

This hypothesis space consists of all evaluation functions that can be represented by some choice of values for the weights w0 through w6.

Perspectives and issues in machine learning
The learner's task is thus to search through this vast space to locate the hypothesis that is most
consistent with the available training examples

LMS algorithm for fitting weights achieves this goal by iteratively tuning the weights, adding a
correction to each weight each time the hypothesized evaluation function predicts a value that
differs from the training value

LMS algorithm works well when the hypothesis representation considered by the learner defines a
continuously parameterized space of potential hypotheses
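Concretely, in Mitchell's checkers example the LMS rule adjusts each weight in proportion to the prediction error, where η is a small learning rate (e.g., 0.1), Vtrain(b) is the training value for board state b, and xi is the i-th board feature:

wi ← wi + η ( Vtrain(b) − V'(b) ) xi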

Designing a Machine Learning System
For a well-posed ML problem, based on the design issues and approaches, different steps are used to measure the performance of the problem.

To complete the design of the learning problem, we must choose:

The exact type of knowledge to be learned

A representation for this target knowledge

A learning mechanism

Designing a Machine Learning System
Step 1 - Choose type of Training Experience/data SET

The choice of training set is made when we design for an application.

It impacts the success or failure of the system.

Training experience can be:

Direct training – with a teacher (best)

Indirect training – no teacher; credit assignment is a disadvantage

Random training – expert teacher

Designing a Machine Learning System
Step 2 - Choose Target Function -Target Function V

What type of knowledge will be learned, and how this will be used by the performance program.

Example:

Choosing the best next move in checkers, identifying people, classifying facial expressions into emotion categories

Designing a Machine Learning System
Step 3 - Choose Target Function Representation

The ideal target function V is usually not known; machine learning algorithms learn an approximation of V, denoted V' (or V̂).

V' should be as close an approximation of V as possible.

It should require a (reasonably) small amount of training data to be learned.

The approximated target function V' can be defined as a collection of rules, a quadratic polynomial, or an ANN.

If b is an arbitrary board state in B, then V'(b) is the approximated value of b.
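In Mitchell's checkers example, one simple representation for V' is a linear combination of six board features x1 through x6 (e.g., the numbers of black and red pieces and kings), with the learned weights w0 through w6 referred to elsewhere in this unit:

V'(b) = w0 + w1x1 + w2x2 + w3x3 + w4x4 + w5x5 + w6x6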

Designing a Machine Learning System
Step 4-Target Function Approximation
Choose Learning Algorithm
Estimating Values
Adjusting weights

Designing a Machine Learning System
Step 5 –Final Design

Issues in Machine Learning
What algorithms exist for learning general target functions from specific training examples?

In what settings will particular algorithms converge to the desired function, given sufficient training data?

Which algorithms perform best for which types of problems and representations?

How much training data is sufficient?

What general bounds can be found to relate the confidence in learned hypotheses to the amount
of training experience and the character of the learner's hypothesis space?

When and how can prior knowledge held by the learner guide the process of generalizing from examples?

Can prior knowledge be helpful even when it is only approximately correct?

What is the best strategy for choosing a useful next training experience, and how does the choice of this strategy
alter the complexity of the learning problem?

What is the best way to reduce the learning task to one or more function approximation problems?

Machine learning algorithms

Machine learning algorithms have proven to be of great practical value in a variety of application domains.

They are especially useful in

Data mining problems where large databases may contain valuable implicit regularities that can be
discovered automatically (e.g., to analyze outcomes of medical treatments from patient databases or to
learn general rules for credit worthiness from financial databases)

Poorly understood domains where humans might not have the knowledge needed to develop effective
algorithms (e.g.,human face recognition from images); and

Domains where the program must dynamically adapt to changing conditions (e.g., controlling manufacturing
processes under changing supply stocks or adapting to the changing reading interests of individuals)

Machine learning algorithms

A well-defined learning problem requires a well-specified task, performance metric, and source of training
experience.

Designing a machine learning approach involves a number of design choices, including

Choosing the type of training experience

The target function to be learned

A representation for this target function, and

An algorithm for learning the target function from training examples.

Learning

Learning is a phenomenon and a process that has manifestations in various aspects.

The learning process includes gaining new symbolic knowledge and developing cognitive skills through instruction and practice.

It is also the discovery of new facts and theories through observation and experiment.

Inductive learning is based on formulating a generalized concept after observing examples of the concept

Example

If a kid is asked to write an answer to 2*8=x, they can either use the rote learning method to
memorize the answer or use inductive learning (i.e. thinking how 2*1=2, 2*2=4, and so on) to
formulate a concept to calculate the results

In this way, the kid will be able to solve similar types of questions using the same concept

Learning

“The activity or process of gaining knowledge or skill by studying, practicing, being taught, or experiencing
something.”

Rote learning (memorization): Memorizing things without knowing the concept/logic behind them

Passive learning (instructions): Learning from a teacher/expert

Analogy (experience): Learning new things from our past experience

Inductive learning (experience): On the basis of past experience, formulating a generalized concept

Deductive learning: Deriving new facts from past facts

Learning
Concept learning

“The problem of searching through a predefined space of potential hypotheses for the hypothesis that best fits
the training examples.” — Tom Mitchell

Example
Humans identify different vehicles among all the vehicles based on specific sets of features
defined over a large set of features

This special set of features differentiates the subset of cars in a set of vehicles

This set of features that differentiate cars can be called a concept

Machines can learn from concepts to identify whether an object belongs to a specific category by processing past/training
data to find a hypothesis that best fits the training examples

Much of human learning involves acquiring general concepts from past experiences.

Learning
Concept learning
Acquiring the definition of a general category from given sample positive and negative training examples of the
category.

Concept learning can be seen as the problem of searching through a predefined space of potential hypotheses for the hypothesis that best fits the training examples.

Goal of the concept learning search is to find the hypothesis that best fits the training examples

Concept learning can be viewed as searching through a large space of hypothesis implicitly defined by the
hypothesis representation

Concept learning as search
Concept learning can be viewed as the task of searching through a large space of hypothesis implicitly defined by the
hypothesis representation

The goal of this search is to find the hypothesis that best fits the training examples

Hypothesis Representation
Hypothesis

1. Indicate by "?" that any value is acceptable for this attribute

2. Specify a single required value for the attribute

3. Indicate by Φ that no value is acceptable

If some instance x satisfies all the constraints of hypothesis h, then h classifies x as a positive example (h(x) = 1).
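A minimal Python sketch of this classification rule, assuming a hypothesis and an instance are both tuples of attribute values ('?' = any value is acceptable, 'Φ' = no value is acceptable):

def h_classifies(h, x):
    """Return 1 if instance x satisfies every constraint of hypothesis h, else 0."""
    # 'Φ' never equals an attribute value, so it correctly rejects every instance.
    return int(all(c == '?' or c == v for c, v in zip(h, x)))

h = ('Sunny', '?', '?', 'Strong', '?', 'Same')
x = ('Sunny', 'Warm', 'Normal', 'Strong', 'Warm', 'Same')
print(h_classifies(h, x))  # 1: x satisfies all constraints of h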

Hypothesis Representation
A hypothesis:

Sky AirTemp Humidity Wind Water Forecast


< Sunny, ?, ?, Strong , ? , Same >

The most specific hypothesis – that no day is a positive example


< Φ, Φ, Φ, Φ, Φ, Φ >

EnjoySport concept learning task requires learning the sets of days for which EnjoySport=yes, describing this set by a
conjunction of constraints over the instance attributes

Hypothesis Representation
Given – Instances X : set of all possible days, each described by the attributes

Sky – (values: Sunny, Cloudy, Rainy)

AirTemp – (values: Warm, Cold)

Humidity – (values: Normal, High)

Wind – (values: Strong, Weak)

Water – (values: Warm, Cold)

Forecast – (values: Same, Change)

Target Concept (Function) c : EnjoySport : X → {0, 1}

Hypotheses H : Each hypothesis is described by a conjunction of constraints on the attributes

Training Examples D : positive and negative examples of the target function


Determine:

A hypothesis h in H such that h(x) = c(x) for all x in D
Formal Definition for Concept Learning

Inferring a boolean-valued function from training examples of its input and output

A Concept Learning Task – Enjoy Sport Training Examples

Example Sky AirTemp Humidity Wind Water Forecast EnjoySport
1 Sunny Warm Normal Strong Warm Same YES
2 Sunny Warm High Strong Warm Same YES
3 Rainy Cold High Strong Warm Same NO
4 Sunny Warm High Strong Warm Same YES


A set of example days, and each is described by six attributes.

The task is to learn to predict the value of EnjoySport for an arbitrary day, based on the values of its attributes.

Each hypothesis consists of a conjunction of constraints on the instance attributes

Each hypothesis will be a vector of six constraints, specifying the values of the six attributes
– (Sky, AirTemp, Humidity, Wind, Water, and Forecast)

Each attribute will be:

? - indicating any value is acceptable for the attribute (don’t care)

Single value – specifying a single required value (ex. Warm) (specific)

Φ - indicating no value is acceptable for the attribute (no value)

Let’s Design the problem formally with TPE(Task, Performance, Experience):

Problem: Learning the days on which Ramesh enjoys the sport.

Task T: Learn to predict the value of EnjoySport for an arbitrary day, based on the values of the attributes of the day

Performance measure P: Total percent of days (EnjoySport) correctly predicted.

Training experience E: A set of days with given labels (EnjoySport: Yes/No)
Let us take a very simple hypothesis representation which consists of a conjunction of constraints in the instance
attributes

We get a hypothesis h_i with the help of example i from our training set, as below: h_i(x) := <x1, x2, x3, x4, x5, x6>

where x1, x2, x3, x4, x5 and x6 are the values of Sky, AirTemp, Humidity, Wind, Water and Forecast.

Hence h1 will look like (the first row of the table above):

h1(x=1): <Sunny, Warm, Normal, Strong, Warm, Same> (Note: x=1 represents a positive example)

We want to find the most suitable hypothesis which can represent the concept

Example

Ramesh enjoys his favorite sport only on cold days with high humidity (This seems independent of the values of the
other attributes present in the training examples).

h(x=1) = <?, Cold, High, ?, ?, ?>

Here ? indicates that any value of the attribute is acceptable.


Note: The most general hypothesis will be <?, ?, ?, ?, ?, ?>, where every day is a positive example, and the most specific hypothesis will be <Φ, Φ, Φ, Φ, Φ, Φ>, where no day is a positive example.

General-to-Specific Ordering of Hypotheses
Many algorithms for concept learning organize the search through the hypothesis space by relying on a very
useful structure that exists for any concept learning problem:

A general-to-specific ordering of hypotheses

By taking advantage of this naturally occurring structure over the hypothesis space, we can design learning
algorithms that exhaustively search even infinite hypothesis spaces without explicitly enumerating every
hypothesis.

General-to-Specific Ordering of Hypotheses

To illustrate the general-to-specific ordering

Consider the two hypotheses

h1 = (Sunny, ?, ?, Strong, ?, ?)

h2 = (Sunny, ?, ?, ?, ?, ?)


Now consider the sets of instances that are classified positive by h1 and by h2.

Because h2 imposes fewer constraints on the instance, it classifies more instances as positive

In fact, any instance classified positive by h1 will also be classified positive by h2.

Therefore, we say that h2 is more general than h1.


This intuitive "more general than" relationship between hypotheses can be defined more precisely as follows

General-to-Specific Ordering of Hypotheses

To illustrate the general-to-specific ordering

First, for any instance x in X and hypothesis h in H, we say that x satisfies h if and only if h(x) = 1

We now define the more_general_than_or_equal_to relation in terms of the sets of instances that satisfy the
two hypotheses:

Given hypotheses hj and hk, hj is more_general_than_or_equal_to hk if and only if any instance that satisfies hk also satisfies hj.

Definition:

Let hj and hk be boolean-valued functions defined over X.

Then hj is more_general_than_or_equal_to hk (written hj ≥g hk) if and only if (∀x ∈ X) [(hk(x) = 1) → (hj(x) = 1)]
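A minimal Python sketch of this ≥g test for conjunctive hypotheses (an illustration under the representation above, not part of the original notes):

def more_general_or_equal(hj, hk):
    """True iff every instance that satisfies hk also satisfies hj (hj ≥g hk)."""
    # For each attribute: hj must be '?', or match hk's constraint;
    # a 'Φ' in hk is satisfied by no instance, so anything in hj is acceptable there.
    return all(cj == '?' or cj == ck or ck == 'Φ' for cj, ck in zip(hj, hk))

h1 = ('Sunny', '?', '?', 'Strong', '?', '?')
h2 = ('Sunny', '?', '?', '?', '?', '?')
print(more_general_or_equal(h2, h1))  # True: h2 imposes fewer constraints
print(more_general_or_equal(h1, h2))  # False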

find-S - Finding a maximally specific hypothesis

The Find-S algorithm is a basic concept learning algorithm in machine learning; it is used to find the maximally specific hypothesis.

Find-S finds the most specific hypothesis that fits all the positive examples; note that the algorithm considers only the positive training examples.

Find-S starts with the most specific hypothesis and generalizes it each time it fails to classify an observed positive training example, moving from the most specific hypothesis toward the most general hypothesis.

Using the Find-S algorithm gives a single maximally specific hypothesis for the given set of training examples.
find-S - Finding a maximally specific hypothesis
Important Representation : find-S - Finding a maximally specific hypothesis

? indicates that any value is acceptable for the attribute.

Specify a single required value ( e.g., Cold ) for the attribute.

Φ indicates that no value is acceptable.

The most general hypothesis is represented by: {?, ?, ?, ?, ?, ?}

The most specific hypothesis is represented by: {Φ , Φ, Φ, Φ, Φ, Φ}

find-S - Finding a maximally specific hypothesis
Steps Involved in find-S - Finding a maximally specific hypothesis

1. Start with the most specific hypothesis.


h = {ϕ, ϕ, ϕ, ϕ, ϕ, ϕ}

2. Take the next example and if it is negative, then no changes occur to the hypothesis.

3. If the example is positive and we find that our initial hypothesis is too specific then we
update our current hypothesis to a general condition.

4. Keep repeating the above steps till all the training examples are complete.

5. After we have completed all the training examples, we will have the final hypothesis, which we can use to classify the new examples.
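The steps above can be sketched in Python as follows (a teaching sketch: examples are (attribute_tuple, label) pairs with label True for positive):

def find_s(examples):
    """Return the maximally specific hypothesis consistent with the positive examples."""
    n = len(examples[0][0])
    h = ['Φ'] * n                      # step 1: most specific hypothesis
    for x, positive in examples:
        if not positive:               # step 2: negative examples cause no change
            continue
        for i, value in enumerate(x):  # step 3: minimally generalize h
            if h[i] == 'Φ':
                h[i] = value
            elif h[i] != value:
                h[i] = '?'
    return tuple(h)                    # steps 4-5: final hypothesis after all examples

# EnjoySport training examples from the earlier slides:
enjoy_sport = [
    (('Sunny', 'Warm', 'Normal', 'Strong', 'Warm', 'Same'), True),
    (('Sunny', 'Warm', 'High',   'Strong', 'Warm', 'Same'), True),
    (('Rainy', 'Cold', 'High',   'Strong', 'Warm', 'Same'), False),
    (('Sunny', 'Warm', 'High',   'Strong', 'Warm', 'Same'), True),
]
print(find_s(enjoy_sport))  # ('Sunny', 'Warm', '?', 'Strong', 'Warm', 'Same')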

How Does the Find-S Algorithm Work?

Initialize h. Identify a positive example and check its attributes: if an attribute value is equal to the hypothesis value, keep it (Yes branch); if not (No branch), replace the value in the hypothesis with "?". Repeat for each positive example.

find-S - Finding a maximally specific hypothesis
Steps Involved in find-S - Finding a maximally specific hypothesis
Example Citations Size Library Price Editions Buy
1 some small no Affordable many no
2 Many big no Expensive one yes
3 Some big always Expensive few no
4 Many medium no Expensive many yes
5 Many small no Affordable many yes

1.How many concepts are possible for this instance space?

2. How many hypotheses can be expressed by the hypothesis language?

3.Apply the find-S algorithm by hand on the given training set. Consider the
examples in the specified order and write down your hypothesis each time after
observing an example


1.How many concepts are possible for this instance space?

Solution:

Citations has 2 values (some, many), Size has 3 (small, big, medium), Library has 2 (no, always), Price has 2 (affordable, expensive), and Editions has 3 (many, few, one); the target value Buy has 2 (no, yes).

Number of possible instances: 2*3*2*2*3 = 72


2. How many hypotheses can be expressed by the hypothesis language?


Solution:

Adding Φ and ? as possibilities: Citations has 4 values (some, many, Φ, ?), Size has 5 (small, big, medium, Φ, ?), Library has 4 (no, always, Φ, ?), Price has 4 (affordable, expensive, Φ, ?), and Editions has 5 (many, few, one, Φ, ?).

Syntactically distinct hypotheses: 4*5*4*4*5 = 1600

Whenever a hypothesis contains Φ for any attribute, it never classifies an instance as positive, so all such hypotheses are semantically equivalent. To count semantically distinct hypotheses, we take the actual number of values for each attribute plus one more possibility for '?', and then add 1 for the single all-Φ hypothesis:

(3*4*3*3*4) + 1 = 433

3.Apply the find-S algorithm by hand on the given training set. Consider the examples in the specified order and
write down your hypothesis each time after observing an example

Solution:

Step 1: h1 = (Φ, Φ, Φ, Φ, Φ). X1 = (some, small, no, affordable, many) is negative, so h1 is unchanged.

Step 2: X2 = (many, big, no, expensive, one) is positive: h2 = (many, big, no, expensive, one)

X3 = (some, big, always, expensive, few) is negative: ignore. h3 = (many, big, no, expensive, one)

X4 = (many, medium, no, expensive, many) is positive: h4 = (many, ?, no, expensive, ?)


X5 = (many, small, no, affordable, many) is positive: h5 = (many, ?, no, ?, ?)

Step 3: The final hypothesis, the maximally specific hypothesis for this given data set, is:

h5 = (many, ?, no, ?, ?)

find-S - Finding a maximally specific hypothesis
Apply the Find-S algorithm to the following table and generate the final hypothesis.

Outlook Temperature Humidity Wind Play Tennis


Overcast Hot High Weak Yes
Rain Mild High Weak Yes
Rain Cool Normal Strong No
Overcast Cool Normal Weak Yes
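Applying the find_s sketch from the earlier slide to this table (Play Tennis = Yes is the positive label), the hypothesis evolves as shown in the comments:

play_tennis = [
    (('Overcast', 'Hot',  'High',   'Weak'),   True),
    (('Rain',     'Mild', 'High',   'Weak'),   True),
    (('Rain',     'Cool', 'Normal', 'Strong'), False),  # negative: ignored
    (('Overcast', 'Cool', 'Normal', 'Weak'),   True),
]
print(find_s(play_tennis))  # ('?', '?', '?', 'Weak')
# Trace: h1 = (Overcast, Hot, High, Weak); h2 = (?, ?, High, Weak);
# the negative example is skipped; h4 = (?, ?, ?, Weak) is the final hypothesis.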

Advantages of Find-S Algorithm

The Find-S algorithm considers only the positive examples and eliminates negative examples.

The Find-S algorithm is used to find the maximally specific hypothesis.

Limitations of Find-S Algorithm

There is no way to determine if the hypothesis is consistent throughout the data.

Inconsistent training sets can actually mislead the Find-S algorithm, since it ignores the
negative examples.

Find-S algorithm does not provide a backtracking technique to determine the best
possible changes that could be done to improve the resulting hypothesis.

Version spaces and the candidate elimination algorithm
Version space:

A set of all hypothesis that are consistent with the training examples

The version space, denoted VS_H,D, with respect to hypothesis space H and training examples D, is the subset of hypotheses from H consistent with the training examples in D.

The version space is a hierarchical representation of knowledge that enables you to keep track of all the
useful information supplied by a sequence of learning examples without remembering any of the examples

The version space method is a concept learning process accomplished by managing multiple models within a version space

Version spaces and the candidate elimination algorithm
Version space:

One limitation of the FIND-S algorithm is that it outputs just one hypothesis consistent with the
training data – there might be many

To overcome this, introduce notion of version space and algorithms to compute it

A version space is a hierarchical representation of knowledge that enables you to keep track of all
the useful information supplied by a sequence of learning examples without remembering any of the
examples
Version space learning is a logical approach to machine learning, specifically binary classification

Version space learning algorithms search a predefined space of hypotheses, viewed as a set of logical sentences.

Formally, the hypothesis space is a disjunction.

The version space is intermediate between the general hypothesis boundary and the specific hypothesis boundary.

The version space does not contain just one hypothesis but the set of all possible hypotheses consistent with the training data set.

The version space method is still a trial and error method.

Version spaces and the candidate elimination algorithm
Characteristics of Version Space

1.Tentative heuristics are represented using version spaces

2. A version space represents all the alternative plausible descriptions of a heuristic

3.A plausible description is one that is applicable to all known positive examples and no known negative example

4. A version space description consists of two complementary trees:

i. One that contains nodes connected to overly general models and

ii. One that contains nodes connected to overly specific models

5. Node values/attributes are discrete

Version spaces and the candidate elimination algorithm

Consistent Hypothesis and Version Space

A hypothesis h is consistent with a set of training examples D iff h(x) = c(x) for each example ⟨x, c(x)⟩ in D:

Consistent(h, D) ≡ (∀⟨x, c(x)⟩ ∈ D) h(x) = c(x)

Example Citations Size InLibrary Price Editions Buy


1 Some Small No Affordable One No
2 Many Big No Expensive Many Yes

h1=(?,?,No,?,Many) - Consistent

h2=(?,?,No,?,?) - Not Consistent
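A small Python sketch of this consistency test on the table above (hypotheses and instances as tuples, labels as booleans for c(x)):

def satisfies(h, x):
    return all(c == '?' or c == v for c, v in zip(h, x))

def consistent(h, D):
    """True iff h(x) = c(x) for every training example <x, c(x)> in D."""
    return all(satisfies(h, x) == label for x, label in D)

D = [
    (('Some', 'Small', 'No', 'Affordable', 'One'),  False),
    (('Many', 'Big',   'No', 'Expensive',  'Many'), True),
]
print(consistent(('?', '?', 'No', '?', 'Many'), D))  # True
print(consistent(('?', '?', 'No', '?', '?'),    D))  # False: wrongly covers example 1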

Version spaces and the candidate elimination algorithm
Remarks on version spaces and candidate elimination


Find S Algorithm Vs Candidate Elimination algorithm
FIND-S outputs a hypothesis from H that is consistent with the training examples; this is just one of many hypotheses from H that might fit the training data equally well.

The Candidate-Elimination algorithm instead outputs a description of the set of all hypotheses consistent with the training examples.

Inductive bias
The Candidate Elimination Algorithm will converge toward the true target concept provided it is given
accurate training examples and provided its initial hypothesis space contains the target concept

What if the target concept is not contained in the hypothesis space?

Can we avoid this difficulty by using a hypothesis space that includes


every possible hypothesis?

How does the size of this hypothesis space influence the ability of the
algorithm to generalize to unobserved instances?

How does the size of the hypothesis space influence the number of
training examples that must be observed?

Inductive bias
In EnjoySport example, we restricted the hypothesis space to include only conjunctions of
attribute values.
Because of this restriction, the hypothesis space is unable to represent even simple disjunctive target concepts such as "Sky = Sunny or Sky = Cloudy."
Sky Air Temp Humidity Wind Water Forecast Enjoy Sport
Sunny Warm Normal Strong Cool Change Yes
Cloudy Warm Normal Strong Cool Change Yes
Rainy Warm Normal Strong Cool Change No

From the first two examples: S2 : <?, Warm, Normal, Strong, Cool, Change>

This is inconsistent with third examples, and there are no hypotheses consistent with these
three examples

Problem: We have biased the learner to consider only conjunctive hypothesis. We require a more
expressive hypothesis space

The obvious solution to the problem of assuring that the target concept is in the hypothesis space
H is to provide a hypothesis space capable of representing every teachable concept.

ML – Candidate Elimination Algorithm
The Candidate-Elimination algorithm computes the version space containing all (and only those) hypotheses from H that are consistent with an observed sequence of training examples.

The candidate elimination algorithm incrementally builds the version space given a hypothesis space H and a set E of examples.

The examples are added one by one; each example possibly shrinks the version space by removing the hypotheses that are inconsistent with the example.

The algorithm does this by updating the general and specific boundaries for each new example.

You can consider this an extended form of the Find-S algorithm.

It considers both positive and negative examples: positive examples are used, as in Find-S, to generalize from the specific boundary, while negative examples are used to specialize the general boundary.
ML – Candidate Elimination Algorithm
Terms Used:
Concept learning:

Concept learning is basically the learning task of the machine (learning from training data).

General Hypothesis:

Not specifying the features to learn; the most general hypothesis.

G = {'?', '?', '?', '?', …}: the number of '?' equals the number of attributes.

Specific Hypothesis:

Specifying the features to learn (specific features).

S = {'pi', 'pi', 'pi', …}: the number of pi depends on the number of attributes.

Version Space:

It is intermediate between the general hypothesis and the specific hypothesis. It contains not just one hypothesis but a set of all possible hypotheses based on the training data set.
ML – Candidate Elimination Algorithm
Algorithm:

Step 1: Load the data set.

Step 2: Initialize the general hypothesis G and the specific hypothesis S.

Step 3: For each training example:

Step 4: If the example is positive:
    if attribute_value == hypothesis_value:
        do nothing
    else:
        replace the attribute value in S with '?' (basically generalizing it)

Step 5: If the example is negative:
    make the general hypotheses more specific.
ML – Candidate Elimination Algorithm
Consider the dataset given below:

Algorithmic steps:
Initially: G = [[?, ?, ?, ?, ?, ?], [?, ?, ?, ?, ?, ?], [?, ?, ?, ?, ?, ?], [?, ?, ?, ?, ?, ?], [?, ?, ?, ?, ?, ?], [?, ?, ?, ?, ?, ?]]

S = [Null, Null, Null, Null, Null, Null]

For instance 1: <'sunny', 'warm', 'normal', 'strong', 'warm', 'same'> and positive output.

G1 = G

S1 = ['sunny', 'warm', 'normal', 'strong', 'warm', 'same']

For instance 2: <'sunny', 'warm', 'high', 'strong', 'warm', 'same'> and positive output.

G2 = G

S2 = ['sunny', 'warm', ?, 'strong', 'warm', 'same']

For instance 3: <'rainy', 'cold', 'high', 'strong', 'warm', 'change'> and negative output.

G3 = [['sunny', ?, ?, ?, ?, ?], [?, 'warm', ?, ?, ?, ?], [?, ?, ?, ?, ?, ?], [?, ?, ?, ?, ?, ?], [?, ?, ?, ?, ?, ?], [?, ?, ?, ?, ?, 'same']]

S3 = S2

For instance 4: <'sunny', 'warm', 'high', 'strong', 'cool', 'change'> and positive output.

G4 = [['sunny', ?, ?, ?, ?, ?], [?, 'warm', ?, ?, ?, ?], [?, ?, ?, ?, ?, ?], [?, ?, ?, ?, ?, ?], [?, ?, ?, ?, ?, ?]]

S4 = ['sunny', 'warm', ?, 'strong', ?, ?]

At last, by synchronizing G4 and S4, the algorithm produces the output:

Output:

G = [['sunny', ?, ?, ?, ?, ?], [?, 'warm', ?, ?, ?, ?]]

S = ['sunny', 'warm', ?, 'strong', ?, ?]
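Running the candidate_elimination sketch from the earlier slide on this trace reproduces the output above (the attribute domains are an assumption read off the data set):

domains = [('sunny', 'rainy', 'cloudy'), ('warm', 'cold'), ('normal', 'high'),
           ('strong', 'weak'), ('warm', 'cool'), ('same', 'change')]
examples = [
    (('sunny', 'warm', 'normal', 'strong', 'warm', 'same'),   True),
    (('sunny', 'warm', 'high',   'strong', 'warm', 'same'),   True),
    (('rainy', 'cold', 'high',   'strong', 'warm', 'change'), False),
    (('sunny', 'warm', 'high',   'strong', 'cool', 'change'), True),
]
S, G = candidate_elimination(examples, domains)
print(S)  # ('sunny', 'warm', '?', 'strong', '?', '?')
print(G)  # [('sunny', '?', '?', '?', '?', '?'), ('?', 'warm', '?', '?', '?', '?')]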

Remarks on Candidate Elimination and Version space algorithm:

1. Will the CANDIDATE-ELIMINATION Algorithm Converge to the Correct Hypothesis?

2.What training example should the learner request next?

Inductive bias – Fundamental questions for inductive inference

Decision Tree Learning -Introduction
A decision tree is a tree where each node represents a feature (attribute), each link (branch) represents a decision (rule), and each leaf represents an outcome.

A decision tree is a simple representation for classifying examples.

Decision tree learning is a method commonly used in data mining.

The goal is to create a model that predicts the value of a target variable based on several input variables.

A decision tree is constructed by looking for regularities in data.

Data → Decision Tree → Decision Rules: allows us to make predictions on unseen data

Decision Tree Learning -Introduction
Decision tree learning is a method for approximating discrete-valued target functions, in which the learned function is represented by a decision tree.

Learned trees can also be re-represented as sets of if-then rules to improve human readability.

Decision trees classify instances by sorting them down the tree from the root to some leaf, which provides the classification of the instance.

Each node in the tree specifies a test of some attribute of the instance, and each branch descending from the node corresponds to one of the possible values for this attribute.

An instance is classified by starting at the root node of the tree, testing the attribute specified by this node, then moving down the tree branch corresponding to the value of the attribute in the given example; this process is then repeated for the subtree rooted at the new node.
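A minimal Python sketch of this process, using a nested tree of (attribute, branches) pairs (a hypothetical PlayTennis-style fragment, not taken from the text):

tree = ('Outlook', {
    'Sunny':    ('Humidity', {'High': 'No', 'Normal': 'Yes'}),
    'Overcast': 'Yes',
    'Rain':     ('Wind', {'Strong': 'No', 'Weak': 'Yes'}),
})

def classify(node, instance):
    """Start at the root, test the node's attribute, follow the matching branch,
    and repeat until a leaf (a class label) is reached."""
    while isinstance(node, tuple):       # internal node: (attribute, branches)
        attribute, branches = node
        node = branches[instance[attribute]]
    return node                          # leaf: the classification

print(classify(tree, {'Outlook': 'Rain', 'Wind': 'Weak'}))  # Yes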

Decision Tree Learning -Introduction
A decision tree is a tree where each node represents a feature (attribute), each link (branch) represents a decision (rule), and each leaf represents an outcome (a categorical or continuous value).

A decision tree or classification tree is a tree in which each internal node is labeled with an input feature. The arcs coming from a node labeled with a feature are labeled with each of the possible values of the feature.

A decision tree has two kinds of nodes:

1. Each leaf node has a class label, determined by majority vote of the training examples reaching that leaf.

2. Each internal node is a question on features. It branches out according to the answers.

Decision Tree Learning -Decision tree representation
Decision tree consists of three types of nodes:
Decision nodes – typically represented by squares
Chance nodes – typically represented by circles
End nodes – typically represented by triangles

Decision Tree Learning -Decision tree representation
A decision tree classifies instances:

Node: an attribute which describes an instance

Branch: the possible values of the attribute

Leaf: the class to which the instance belongs

Decision Tree Learning -Decision tree representation
Important Terminology related to Decision Trees

Root Node

Root Node represents the entire population or sample and this further gets divided into two or more homogeneous sets.

Splitting

Splitting is a process of dividing a node into two or more sub-nodes.

Decision Node

When a sub-node splits into further sub-nodes, then it is called the decision node.

Leaf / Terminal Node

Nodes that do not split are called leaf or terminal nodes.

Decision Tree Learning -Decision tree representation
Important Terminology related to Decision Trees

Pruning

When we remove sub-nodes of a decision node, the process is called pruning; it is the opposite of splitting.

Branch / Sub-Tree

A subsection of the entire tree is called branch or sub-tree.

Parent and Child Node

When a sub-node splits into further sub-nodes, then it is called the decision node.

Leaf / Terminal Node

A node, which is divided into sub-nodes is called a parent node of sub-nodes whereas sub-nodes are the
child of a parent node

DVR
MACHINE LEARNING

MACHINE LEARNING
Decision Tree Learning -Appropriate problems for decision tree learning
Decision tree learning is generally best suited to problems with the following characteristics:

Instances are represented by attribute-value pairs

Instances are described by a fixed set of attributes (e.g., Temperature) and their values (e.g., Hot)

The easiest situation for decision tree learning is when each attribute takes on a small number of disjoint possible
values (e.g., Hot, Mild, Cold)

Extensions to the basic algorithm allow handling real-valued attributes as well (e.g., representing Temperature numerically).

The training data may contain errors

Decision tree learning methods are robust to errors, both errors in classifications of the training examples and
errors in the attribute values that describe these examples

DVR
MACHINE LEARNING

MACHINE LEARNING
Decision Tree Learning -Decision tree representation
Decision tree algorithms fall under the category of supervised learning. They can be used to solve both regression and classification problems.

The figure represents a simple decision tree that is used for a classification task of whether a customer gets a loan or not.

The input features are the salary of the person, the number of children and the age of the person.

The decision tree uses these attributes or features and asks the right questions at the right step or node so as to classify whether the loan can be provided to the person or not.

DVR
MACHINE LEARNING

MACHINE LEARNING
Decision Tree Learning -Decision tree representation
Decision tree algorithms fall under the category of supervised learning. They can be used to solve both regression and classification problems.

The figure represents a simple decision tree that is used for a classification task of whether a customer gets a loan or not. The input features are the salary of the person, the number of children and the age of the person. The decision tree uses these attributes or features and asks the right questions at the right step or node so as to classify whether the loan can be provided to the person or not.

Node
The blue colored rectangles shown above are what we call the nodes of the tree.

Root Node or Root
The topmost node is called the root node; "age over 30?" is the root node here.

Leaf Node
Nodes that do not have any children are called leaf nodes (Get Loan, Don't get Loan). Leaf nodes hold the output labels.

The height of the above decision tree is 3. Each node here has 2 children, but the number of children for a node can also be more than 2.

DVR
MACHINE LEARNING

MACHINE LEARNING
Decision Tree Learning -Appropriate problems for decision tree learning
Decision tree learning is generally best suited to problems with the following characteristics:

The training data may contain missing attribute values

Decision tree methods can be used even when some training examples have unknown values (e.g., if the Humidity
of the day is known for only some of the training examples).

DVR
MACHINE LEARNING

MACHINE LEARNING
Decision Tree Learning - BASIC DECISION TREE LEARNING ALGORITHM
There are two basic algorithms (both impurity measures are sketched in code below):
CART (Classification and Regression Trees)
Gini index

ID3
Entropy function
Information Gain
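Both split criteria are one-liners in code. A minimal sketch, assuming class labels are given as a plain Python list; neither function is taken from the textbook:

import math

def gini(labels):
    # Gini index used by CART: 1 - sum over classes of p_i^2
    n = len(labels)
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def entropy(labels):
    # Entropy used by ID3: -sum over classes of p_i * log2(p_i)
    n = len(labels)
    return -sum((labels.count(c) / n) * math.log2(labels.count(c) / n)
                for c in set(labels))

balanced = ["+"] * 7 + ["-"] * 7
print(gini(balanced), entropy(balanced))  # 0.5 and 1.0: maximally impure set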

DVR
THE BASIC DECISION TREE LEARNING ALGORITHM

MACHINE LEARNING
• Most algorithms that have been developed for learning decision trees are
variations on a core algorithm that employs a top-down, greedy search through the
space of possible decision trees. This approach is exemplified by the ID3
algorithm and its successor C4.5

DVR
109
MACHINE LEARNING

MACHINE LEARNING
Decision Tree Learning - ID3 ALGORITHM
The ID3 algorithm, which stands for Iterative Dichotomiser 3, is a classification algorithm that follows a greedy approach to building a decision tree, selecting the best attribute as the one that yields maximum Information Gain (IG) or minimum Entropy (H).

Major benefits of ID3 are:

Understandable prediction rules are created from the training data

Builds a short tree in relatively small time

ID3 only needs to test enough attributes until all data is classified

The "3" in ID3 simply marks it as the third of Quinlan's Iterative Dichotomiser algorithms

DVR
What is the ID3 algorithm?

MACHINE LEARNING
• ID3 stands for Iterative Dichotomiser 3
• ID3 is a precursor to the C4.5 Algorithm.
• The ID3 algorithm was invented by Ross Quinlan in 1975
• Used to generate a decision tree from a given data set by employing a top-down,
greedy search, to test each attribute at every node of the tree.
• The resulting tree is used to classify future samples.

DVR
111
ID3 algorithm

MACHINE LEARNING
ID3(Examples, Target_attribute, Attributes)

Examples are the training examples. Target_attribute is the attribute whose value is to be predicted
by the tree. Attributes is a list of other attributes that may be tested by the learned decision tree.
Returns a decision tree that correctly classifies the given Examples.

Create a Root node for the tree


If all Examples are positive, Return the single-node tree Root, with label = +
If all Examples are negative, Return the single-node tree Root, with label = -
If Attributes is empty, Return the single-node tree Root, with label = most common value of
Target_attribute in Examples

DVR
112
ID3 algorithm

MACHINE LEARNING
• Otherwise Begin
•   A ← the attribute from Attributes that best* classifies Examples
•   The decision attribute for Root ← A
•   For each possible value, vi, of A,
•     Add a new tree branch below Root, corresponding to the test A = vi
•     Let Examples vi be the subset of Examples that have value vi for A
•     If Examples vi is empty
•       Then below this new branch add a leaf node with label = most common value of Target_attribute in Examples
•       Else below this new branch add the subtree ID3(Examples vi, Target_attribute, Attributes – {A})
• End
• Return Root

* The best attribute is the one with highest information gain
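A compact Python rendering of this pseudocode may make the recursion easier to follow. It is only a sketch under simplifying assumptions (each example is a dict mapping attribute names to values; branches are created only for attribute values actually observed in the data, so the empty Examples vi case of the pseudocode never arises here), not a definitive implementation:

import math
from collections import Counter

def entropy(examples, target):
    # Entropy of the target-attribute labels in a set of examples
    counts = Counter(ex[target] for ex in examples)
    total = len(examples)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def information_gain(examples, attribute, target):
    # Entropy before the split minus the weighted entropy after it
    total = len(examples)
    after = 0.0
    for value in {ex[attribute] for ex in examples}:
        subset = [ex for ex in examples if ex[attribute] == value]
        after += (len(subset) / total) * entropy(subset, target)
    return entropy(examples, target) - after

def id3(examples, target, attributes):
    labels = [ex[target] for ex in examples]
    # All examples share one label: return a leaf with that label
    if labels.count(labels[0]) == len(labels):
        return labels[0]
    # No attributes left to test: return a leaf with the majority label
    if not attributes:
        return Counter(labels).most_common(1)[0][0]
    # The best* attribute is the one with the highest information gain
    best = max(attributes, key=lambda a: information_gain(examples, a, target))
    tree = {best: {}}
    # Branch only on values observed in the data, so subsets are never empty
    for value in {ex[best] for ex in examples}:
        subset = [ex for ex in examples if ex[best] == value]
        tree[best][value] = id3(subset, target,
                                [a for a in attributes if a != best])
    return tree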

DVR
113
MACHINE LEARNING

MACHINE LEARNING
Decision Tree Learning - ID3 ALGORITHM
ID3 (Examples, Target_Attribute, Attributes)
Create a root node for the tree
If all examples are positive, Return the single-node tree Root, with label = +.
If all examples are negative, Return the single-node tree Root, with label = -.
If number of predicting attributes is empty, then Return the single node tree Root,
with label = most common value of the target attribute in the examples.
Otherwise Begin
A ← The Attribute that best classifies examples.
Decision Tree attribute for Root = A.
For each possible value, vi, of A,
Add a new tree branch below Root, corresponding to the test A = vi.
Let Examples(vi) be the subset of examples that have the value vi for A
If Examples(vi) is empty
Then below this new branch add a leaf node with label = most common target value in the examples
Else below this new branch add the subtree ID3 (Examples(vi), Target_Attribute, Attributes – {A})
End
Return Root

DVR
Which Attribute Is the Best Classifier?

MACHINE LEARNING
• The central choice in the ID3 algorithm is selecting which attribute to test at each node in the tree.
• A statistical property called information gain measures how well a given attribute separates the training examples according to their target classification.
• ID3 uses the information gain measure to select among the candidate attributes at each step while growing the tree.

DVR
115
MACHINE LEARNING

MACHINE LEARNING
Decision Tree Learning - BASIC DECISION TREE LEARNING ALGORITHM
ID3
Entropy function
Information Gain

DVR
MACHINE LEARNING

MACHINE LEARNING
Decision Tree Learning -Decision tree representation
Entropy

Entropy is the measurement of disorder or impurities in the information processed in machine learning
Entropy determines how a decision tree chooses to split data

Entropy is a measure of the randomness in the information being processed

The higher the entropy, the harder it is to draw any conclusions from that information

Example

Flipping a coin: when we flip a coin, there can be two outcomes, and when both are equally likely the uncertainty, and hence the entropy, is at its maximum.

Entropy is frequently used in one of the most common machine learning techniques, decision trees.

Entropy is a measure of uncertainty, purity and information content:

$$ E(S) = - \sum_{i=1}^{C} p_i \log_2 p_i $$

where C is the number of classes and $ p_i $ is the proportion of the ith class in that set.

DVR
MACHINE LEARNING

MACHINE LEARNING
Decision Tree Learning -Entropy
Entropy
Entropy is an information theory metric that measures the impurity or uncertainty in a group of observations
Entropy determines how a decision tree chooses to split data.
The image below gives a better description of the purity of a set.

Entropy is the degree of uncertainty, impurity or disorder of a random variable, or a measure of purity
Entropy characterizes the impurity of an arbitrary class of examples
Entropy is the measurement of impurities or randomness in the data points

If all elements belong to a single class, then it is termed as “Pure”, and if not then the distribution is named as
“Impurity”
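The image referred to above did not survive extraction, but the usual purity curve is easy to reproduce. A small sketch, assuming matplotlib is installed, plotting the entropy of a two-class set against the proportion p of positive examples:

import math
import matplotlib.pyplot as plt

# Binary entropy H(p) = -p*log2(p) - (1-p)*log2(1-p), skipping p = 0 and 1
p = [i / 100 for i in range(1, 100)]
H = [-q * math.log2(q) - (1 - q) * math.log2(1 - q) for q in p]

plt.plot(p, H)
plt.xlabel("proportion of positive examples, p")
plt.ylabel("Entropy(S)")
plt.title("Entropy is 0 for a pure set and peaks at 1 when p = 0.5")
plt.show()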

DVR
MACHINE LEARNING

MACHINE LEARNING
Decision Tree Learning -Entropy
Entropy

Entropy basically tells us how impure a collection of data is; impure here means non-homogeneous. Entropy is thus a measurement of non-homogeneity.

Example: calculating the entropy of our data (worked through on the next slide)

DVR
MACHINE LEARNING

MACHINE LEARNING
Decision Tree Learning -Entropy
Entropy

Example: calculating the entropy of our data. The dataset has 9 positive instances and 5 negative instances, therefore:

$$ Entropy(S) = -(9/14)\log_2(9/14) - (5/14)\log_2(5/14) = 0.94 $$ ---(1)

which means the data set is 94% impure, or 94% non-homogeneous.

What would the entropy be for (7+, 7-) or (14+, 14-)? For any set that divides equally into the two classes:

$$ Entropy(S) = -(1/2)\log_2(1/2) - (1/2)\log_2(1/2) = 1 $$ ---(2)

and for a completely homogeneous set, e.g. (14+, 0-):

$$ Entropy(S) = -(14/14)\log_2(14/14) - 0 = 0 $$ ---(3)

Observing equations 1, 2 and 3 closely: if the data set is completely homogeneous then the impurity is 0, therefore the entropy is 0 (equation 3); if the data set can be equally divided into two classes, then it is completely non-homogeneous, the impurity is 100%, and the entropy is 1 (equation 2).

[Figure: plot of entropy against the class proportion. It clearly shows that the entropy is lowest when the data set is homogeneous and highest when the data set is completely non-homogeneous.]

DVR
MACHINE LEARNING

MACHINE LEARNING
Decision Tree Learning -Entropy
Entropy

Entropy --- measuring homogeneity of a learning set (Tom M. Mitchell,1997,p55)

Let's assume, without loss of generality, that the resulting decision tree classifies instances into two categories, which we'll call P (positive) and N (negative).

Given a set S, containing these positive and negative target, the entropy of S related to this Boolean classification is:

Entropy(S)= - P(positive)log2P(positive) - P(negative)log2P(negative)

P(positive): proportion of positive examples in S

P(negative): proportion of negative examples in S

Example

If S is (0.5+, 0.5-) then Entropy(S) is 1; if S is (0.67+, 0.33-) then Entropy(S) is 0.92; and if S is (1+, 0-) then Entropy(S) is 0

Note that the more uniform the probability distribution, the greater its information content
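These values are quick to verify. A minimal sketch of the check in Python, using the convention that 0·log2(0) is taken as 0:

import math

def entropy(p_pos):
    # 0 * log2(0) is taken as 0; max() just normalises -0.0 to 0.0
    return max(0.0, sum(-p * math.log2(p)
                        for p in (p_pos, 1 - p_pos) if p > 0))

print(round(entropy(0.5), 2))    # 1.0  for (0.5+, 0.5-)
print(round(entropy(2 / 3), 2))  # 0.92 for (0.67+, 0.33-)
print(round(entropy(1.0), 2))    # 0.0  for (1+, 0-)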

DVR
MACHINE LEARNING

MACHINE LEARNING
Decision Tree Learning -Entropy
Entropy
Entropy is a measure of impurity of a node. By Impurity, We mean to measure the heterogeneity at a particular
node.

Example:
Assume that we have 50 red balls and 50 blue balls in a set.

In this case, the proportions of the balls of both colors are equal. Hence, the entropy would be 1, which means that the set is maximally impure.

But if the set has 98 red balls and 2 blue balls instead of the 50-50 proportion (the same logic applies to a set of 98 blue balls and 2 red balls; which category dominates does not matter, only that one category dominates well over the other),

then the entropy would be low (somewhere closer to 0).

This is because the set is now mostly pure, as it mostly contains balls belonging to one category.

Because of this, the heterogeneity is reduced.
DVR
MACHINE LEARNING


MACHINE LEARNING
Decision Tree Learning -Information Gain
Information Gain
The concept of entropy plays an important role in measuring the information gain

Information gain is based on the information theory

Information gain is used for determining the best features/attributes that render maximum information about a
class

Information gain follows the concept of entropy while aiming at decreasing the level of entropy, beginning from the
root node to the leaf nodes

Information gain computes the difference between entropy before and after a split, and so specifies the impurity in class elements:

Information Gain = Entropy before splitting - Entropy after splitting

Information gain (IG) measures how much "information" a feature gives us about the class

Information gain (IG) tells us how important a given attribute of the feature vectors is

Information gain (IG) is used to decide the ordering of attributes in the nodes of a decision tree

DVR
MACHINE LEARNING

MACHINE LEARNING
Decision Tree Learning -Information Gain
Information Gain
Information gain is a measure of how much information a feature provides about a class.

The information gain is the amount of information gained about a random variable or signal
from observing another random variable

Information gain helps to determine the order of attributes in the nodes of a decision tree

Main node is referred to as the parent node, whereas sub-nodes are known as child nodes

We can use information gain to determine how good the splitting of nodes in a decision tree

The calculation of information gain should help us understand this concept better.

$$ Gain = E_{parent} - E_{children} $$

Gain represents information gain

$ E_{parent} $ is the entropy of the parent node and $ E_{children} $ is the average entropy of the child nodes.

DVR
MACHINE LEARNING

MACHINE LEARNING
Decision Tree Learning -Information Gain
Information Gain

Information gain is the reduction in entropy or surprise by transforming a dataset and is often
used in training decision trees

Information gain is calculated by comparing the entropy of the dataset before and after a
transformation

We can use information gain to determine how good the splitting of nodes in a decision tree

Information gain is a decrease in entropy

Information gain computes the difference between entropy before split and average entropy
after split of the dataset based on given attribute values

ID3 (Iterative Dichotomiser) decision tree algorithm uses information gain

Information gain is based on the decrease in entropy after a dataset is split on an attribute. Constructing a decision tree is all about finding the attribute that returns the highest information gain (i.e., the most homogeneous branches)

DVR
MACHINE LEARNING

MACHINE LEARNING
Decision Tree Learning -Information Gain
Information Gain
The information gained in the decision tree can be defined as the amount of information improved in the nodes
before splitting them for making further decisions

Example:

As we can see, in these three nodes we have data of two classes. In node 3 we have data for only one class; in node 2 we have less data for the second class than for the first; and node 1 is balanced.

From this, we can say that in node 3 we don't need to make any decision, because all the instances point the decision toward the first class, whereas in node 1 there is a 50% chance of deciding in the direction of either class.

DVR
MACHINE LEARNING

MACHINE LEARNING
Decision Tree Learning -Information Gain
Information Gain
We can say that node 1 requires more information than the other nodes to describe a decision.

Example:

As we can see, in these three nodes we have data of two classes. In node 3 we have data for only one class; in node 2 we have less data for the second class than for the first; and node 1 is balanced.

From this, we can say that in node 3 we don't need to make any decision, because all the instances point the decision toward the first class, whereas in node 1 there is a 50% chance of deciding in the direction of either class.

From the above, we can say that the information gain from splitting node 1 is higher.

DVR
MACHINE LEARNING

MACHINE LEARNING
Decision Tree Learning -Information Gain
Information Gain

Information Gain = entropy(parent) – [average entropy(children)]
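A worked numerical sketch of this formula, reusing the (9+, 5-) set from the entropy example; the child subsets (6+, 2-) and (3+, 3-) are hypothetical numbers chosen purely for illustration:

import math

def entropy(pos, neg):
    # Two-class entropy from raw counts; 0 * log2(0) is taken as 0
    total = pos + neg
    return sum(-c / total * math.log2(c / total) for c in (pos, neg) if c > 0)

# Parent node: 9 positive and 5 negative examples
parent = entropy(9, 5)  # about 0.94

# Hypothetical split into child subsets (6+, 2-) and (3+, 3-),
# weighted by the fraction of the 14 examples reaching each child
children = [(6, 2), (3, 3)]
avg_children = sum((p + n) / 14 * entropy(p, n) for p, n in children)

print(round(parent - avg_children, 3))  # gain of this split, about 0.048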

DVR
MACHINE LEARNING

MACHINE LEARNING
Inductive bias in decision tree learning
Decision Tree Learning -Inductive bias in decision tree learning
The inductive bias (also known as learning bias) of a learning algorithm is the set of assumptions that
the learner uses to predict outputs of given inputs that it has not encountered

In machine learning, one aims to construct algorithms that are able to learn to predict a certain
target output

Example
Assuming that the solution to the problem of road safety can be expressed as a conjunction of a set of eight concepts is one such assumption: the learner will then only consider hypotheses of that form.

In the case of decision trees, the depth of the tree is part of the inductive bias. If the depth of the tree is too low, then there is too much generalisation in the model.

DVR
INDUCTIVE BIAS IN DECISION TREE LEARNING

MACHINE LEARNING
Inductive bias is the set of assumptions that, together with the training data,
deductively justify the classifications assigned by the learner to future instances

Given a collection of training examples, there are typically


many decision trees consistent with these examples. Which of these
decision trees does ID3 choose?

ID3 search strategy


(a) selects in favour of shorter trees over longer ones, and
(b) selects trees that place the attributes with highest information gain closest to the root.

DVR
131
INDUCTIVE BIAS IN DECISION TREE LEARNING

MACHINE LEARNING
Approximate inductive bias of ID3: Shorter trees are preferred over larger trees

•Consider an algorithm that begins with the empty tree and searches breadth first
through progressively more complex trees.
•First considering all trees of depth 1, then all trees of depth 2, etc.
•Once it finds a decision tree consistent with the
training data, it returns the smallest consistent tree at that
search depth (e.g., the tree with the fewest nodes).
•Let us call this breadth-first search algorithm BFS-ID3.
•BFS-ID3 finds a shortest decision tree and thus exhibits the bias "shorter trees are preferred over longer trees".

DVR
132
INDUCTIVE BIAS IN DECISION TREE LEARNING

MACHINE LEARNING
A closer approximation to the inductive bias of ID3: Shorter trees are preferred over
longer trees. Trees that place high information gain attributes close to the root are
preferred over those that do not.

• ID3 can be viewed as an efficient approximation to BFS-ID3, using a greedy


heuristic search to attempt to find the shortest tree without conducting the entire
breadth-first search through the hypothesis space.
• Because ID3 uses the information gain heuristic and a hill climbing strategy, it
exhibits a more complex bias than BFS-ID3.
• In particular, it does not always find the shortest consistent tree, and it is biased to
favour trees that place attributes with high information gain closest to the root.

DVR
133
Restriction Biases and Preference Biases

MACHINE LEARNING
Difference between the types of inductive bias exhibited by ID3 and by the CANDIDATE-
ELIMINATION Algorithm.
ID3
•ID3 searches a complete hypothesis space
•It searches incompletely through this space, from simple to complex hypotheses, until its termination condition is met
•Its inductive bias is solely a consequence of the ordering of hypotheses by its search strategy. Its hypothesis space introduces no additional bias
CANDIDATE-ELIMINATION Algorithm
•The version space CANDIDATE-ELIMINATION Algorithm searches an incomplete hypothesis space
•It searches this space completely, finding every hypothesis consistent with the training data
•Its inductive bias is solely a consequence of the expressive power of its hypothesis representation. Its search strategy introduces no additional bias

DVR
134
Restriction Biases and Preference Biases

MACHINE LEARNING
• The inductive bias of ID3 is a preference for certain hypotheses over others (e.g.,
preference for shorter hypotheses over larger hypotheses), with no hard restriction
on the hypotheses that can be eventually enumerated. This form of bias is called a
preference bias or a search bias.

• The bias of the CANDIDATE ELIMINATION algorithm is in the form of a


categorical restriction on the set of hypotheses considered. This form of bias is
typically called a restriction bias or a language bias.

DVR
135
INDUCTIVE BIAS IN DECISION TREE LEARNING

MACHINE LEARNING
Which type of inductive bias is preferred in order to generalize beyond the training
data, a preference bias or restriction bias?

• A preference bias is more desirable than a restriction bias, because it allows the
learner to work within a complete hypothesis space that is assured to contain the
unknown target function.
• In contrast, a restriction bias that strictly limits the set of potential hypotheses is
generally less desirable, because it introduces the possibility of excluding the
unknown target function altogether.

DVR
136
Occam's razor

MACHINE LEARNING
Occam's razor: is the problem-solving principle that the simplest solution tends to be
the right one. When presented with competing hypotheses to solve a problem, one
should select the solution with the fewest assumptions.

Occam's razor: “Prefer the simplest hypothesis that fits the data”.

DVR
137
Why Prefer Short Hypotheses ?

MACHINE LEARNING
Argument in favour:
There are fewer short hypotheses than long ones:
•A short hypothesis that fits the training data is unlikely to do so by coincidence
•A long hypothesis that fits the training data may well do so by coincidence
There are many complex hypotheses that fit the current training data but fail to generalize correctly to subsequent data.

DVR
138
Why Prefer Short Hypotheses ?

MACHINE LEARNING
Argument opposed:
•There are few small trees, and our a priori chance of finding one consistent with an arbitrary set of data is therefore small. The difficulty here is that there are very many small sets of hypotheses that one can define, and there is nothing special about the set of short trees in particular.
•The size of a hypothesis is determined by the representation used internally by the learner. Occam's razor will therefore produce two different hypotheses from the same training examples when it is applied by two learners that use different internal representations, both justifying their contradictory conclusions by Occam's razor. On this basis we might be tempted to reject Occam's razor altogether.

DVR
139
MACHINE LEARNING

MACHINE LEARNING
Decision Tree Learning - Issues in decision tree learning
Practical issues in learning decision trees include

Determining how deeply to grow the decision tree

Handling continuous attributes

Choosing an appropriate attribute selection measure

Handling training data with missing attribute values

Handling attributes with differing costs, and

Improving computational efficiency

DVR
MACHINE LEARNING

MACHINE LEARNING
Decision Tree Learning - Issues in decision tree learning
Issues and extensions to the basic ID3 algorithm that address them
Avoiding Overfitting the Data
When we are designing a machine learning model, the model is said to be a good machine learning model if it generalizes to any new input data from the problem domain in a proper way.

This helps us to make predictions on future data that the model has never seen.

Underfitting
A machine learning algorithm is said to underfit when it cannot capture the underlying trend of the data.

Underfitting destroys the accuracy of our machine learning model.

Its occurrence simply means that our model or algorithm does not fit the data well enough.

Underfitting usually happens when we have too little data to build an accurate model, and also when we try to build a linear model with non-linear data.

In underfitting cases the rules of the machine learning model are too simple and flexible to be applied to such minimal data, and therefore the model will probably make a lot of wrong predictions.

Underfitting can be avoided by using more data and also by reducing the number of features through feature selection.

DVR
MACHINE LEARNING

MACHINE LEARNING
Decision Tree Learning - Issues in decision tree learning
Issues and extensions to the basic ID3 algorithm that address them
Avoiding Overfitting the Data
Overfitting
A machine learning algorithm is said to be overfitted when we train it with a lot of data.

When a model gets trained with so much data, it starts learning from the noise and inaccurate data entries in our data set.

The model then does not categorize the data correctly, because of too many details and noise.

The causes of overfitting are the non-parametric and non-linear methods, because these types of machine learning algorithms have more freedom in building the model based on the dataset, and therefore they can build quite unrealistic models.

A solution to avoid overfitting is using a linear algorithm if we have linear data, or using parameters like the maximal depth if we are using decision trees (see the sketch below).
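As an illustration of the maximal-depth remedy, here is a minimal sketch assuming scikit-learn is installed; the iris dataset and the depth limit of 3 are arbitrary choices for demonstration, not values from the text:

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

deep = DecisionTreeClassifier(random_state=0)                  # grown until pure
shallow = DecisionTreeClassifier(max_depth=3, random_state=0)  # depth-limited

for name, model in [("unpruned", deep), ("max_depth=3", shallow)]:
    model.fit(X_train, y_train)
    # Compare accuracy on seen vs. unseen data to spot overfitting
    print(name, model.score(X_train, y_train), model.score(X_test, y_test))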

DVR
MACHINE LEARNING

MACHINE LEARNING
Assignment
Please complete the following videos and prepare notes on them:

What is concept learning in machine learning?
https://www.youtube.com/watch?v=a75S7EVav-M

FIND S Algorithm | Finding A Maximally Specific Hypothesis
https://www.youtube.com/watch?v=SD6MQLC2DdQ&list=PL4gu8xQu0_5JBO1FKRO5p20wc8DprlOgn

Find S Algorithm Solved Numerical Example to find Maximally Specific Hypothesis
https://www.youtube.com/watch?v=d-7qkRtimX4&list=PL4gu8xQu0_5JBO1FKRO5p20wc8DprlOgn&index=5

Machine Learning | Find-S Algorithm
https://www.youtube.com/watch?v=ZcyI621kgak

Candidate Elimination Algorithm Concept
https://www.youtube.com/watch?v=cW03t3aZkmE

Candidate Elimination Algorithm | Solved Example - 1
https://www.youtube.com/watch?v=O2wYwFOMQ24&t=299s

Candidate Elimination Algorithm With Example |ML|
https://www.youtube.com/watch?v=orONxBtXp0o

Candidate Elimination Algorithm Solved Numerical Example to find Specific and Generic Hypothesis
https://www.youtube.com/watch?v=Hr96fzShANk&t=1s

I will check each and every one of your notes on the above videos. This is your assignment.

DVR
DVR MACHINE LEARNING
MACHINE LEARNING
DR DV RAMANA,
DATA STRATEGIST –CONSULTANT
AND
CHIEF ACADEMIC ADVISOR
MAIL ADDRESS: [email protected]
TO CONTACT: +91 9959423084

DVR
