
DS605: Fundamentals of Machine Learning

Lecture 08

Choosing a Hypothesis Space


[Inductive Bias, Bias-Variance Trade-off, Model Complexity and Expressiveness Trade-off]

Arpit Rana
6th August 2024
Supervised Learning Process

[Diagram: the Learner (𝚪: S → h) searches the Hypothesis Space 𝓗 and, given training data S, outputs a Final Hypothesis or Model h; a test instance is then fed to h to produce a prediction.]
Supervised Learning: Example

Problem: whether to wait for a table at a restaurant.

● Alternate: whether there is a suitable alternative restaurant nearby.
● Bar: whether the restaurant has a comfortable bar area to wait in.
● Fri/Sat: true on Fridays and Saturdays.
● Hungry: whether we are hungry right now.
● Patrons: how many people are in the restaurant (values are None, Some, and Full).
● Price: the restaurant’s price range ($, $$, $$$).
● Raining: whether it is raining outside.
● Reservation: whether we made a reservation.
● Type: the kind of restaurant (French, Italian, Thai, or Burger).
● WaitEstimate: host’s wait estimate: 0–10, 10–30, 30–60, or >60 minutes.
Supervised Learning: Example

Problem: whether to wait for a table at a restaurant.

[Diagram: the training data S consists of instances drawn from the instance space 𝑿, labelled by an unknown target function 𝑓.]

Size of the instance space |𝑿| = 2 × 2 × 2 × 2 × 3 × 3 × 2 × 2 × 4 × 4 = 9216
Size of the hypothesis space |𝓗| of Boolean functions over 𝑿 = 2^9216 (a quick check of this count is sketched below).
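The counting above can be verified with a few lines of Python; this is a sketch for illustration only, with the attribute/value counts taken from the list on the previous slide:

```python
# Instance-space size = product of each attribute's number of values;
# every Boolean labelling of those instances is one distinct hypothesis.
attribute_values = {
    "Alternate": 2, "Bar": 2, "Fri/Sat": 2, "Hungry": 2,
    "Patrons": 3, "Price": 3, "Raining": 2, "Reservation": 2,
    "Type": 4, "WaitEstimate": 4,
}

instance_space_size = 1
for n_values in attribute_values.values():
    instance_space_size *= n_values

print(instance_space_size)                  # 9216 distinct restaurant situations
num_hypotheses = 2 ** instance_space_size   # |H| = all Boolean functions over X
print(len(str(num_hypotheses)))             # about 2775 decimal digits
```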
Hypothesis Space vs. Hypothesis

What do we mean by a Hypothesis Space (a.k.a. Model Class) and a hypothesis?

There are three different levels of specificity for using the term Hypothesis or Model:

● a broad hypothesis space (like “polynomials”),

● a hypothesis space with hyperparameters filled in (like “degree-2 polynomials”), and

● a specific hypothesis with all parameters filled in (like 5x² + 3x − 2).


Hypothesis Space vs. Hypothesis

What do we mean by a Hypothesis Space (a.k.a. Model Class) and a hypothesis?

There are three different levels of specificity for using the term Hypothesis or Model:

Example: hypothesis space = polynomials; hyperparameter: degree = 1; parameters: a = 2, b = 3 (i.e., h(x) = ax + b = 2x + 3).
Hypothesis Space vs. Hypothesis

How do we choose a good Hypothesis Space or Model Class?

Example: hypothesis space = polynomials; hyperparameter: degree = 1; parameters: a = 2, b = 3.

Choosing the hypothesis space and its hyperparameters is called Hypothesis Space / Representation / Model Class Selection (popularly known as Model Selection); fitting the parameters within the chosen space is called Optimization or Training. A small sketch of these three levels follows below.
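A small sketch of the three levels, assuming NumPy is available; the toy data and the use of np.polyfit are illustrative choices, not part of the lecture:

```python
import numpy as np

# Level 1 - hypothesis space / model class: polynomials (what np.polyfit searches over).
# Level 2 - hyperparameter filled in: degree = 1 (chosen during model selection).
degree = 1

# Toy data generated by y = 2x + 3, matching the slide's a = 2, b = 3.
X = np.array([0.0, 1.0, 2.0, 3.0])
y = 2.0 * X + 3.0

# Level 3 - a specific hypothesis: optimization / training fills in the parameters.
a, b = np.polyfit(X, y, deg=degree)
print(a, b)   # ≈ 2.0, 3.0  ->  h(x) = 2x + 3
```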
Choosing the Hypothesis Space
Hypothesis Space Selection is Subjective

Most probable hypothesis given the data S:

    h* = argmax_{h ∈ 𝓗} P(h | S) = argmax_{h ∈ 𝓗} P(S | h) P(h)

● We can say that the prior probability P(h) is high for a smooth degree-1 or -2 polynomial
and lower for a degree-12 polynomial with large, sharp spikes.
Hypothesis Space Selection is Subjective

The observed dataset S alone does not allow us to make conclusions about unseen instances.
We need to make some assumptions!

● These assumptions induce the bias (a.k.a. inductive or learning bias) of a learning
algorithm.

● Two ways to induce bias:

○ Restriction: limit the hypothesis space (e.g., to degree-2 polynomials).
○ Preference: impose an ordering on the hypothesis space (e.g., prefer simpler hypotheses over complex ones). A minimal sketch of both follows below.
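A rough illustration of the two kinds of bias, assuming scikit-learn and NumPy are available; the toy dataset, degrees, and ridge penalty are arbitrary choices for this sketch, not the lecture’s own example:

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(30, 1))
y = 0.5 * X[:, 0] ** 2 + rng.normal(scale=0.3, size=30)   # noisy quadratic target (made up)

# Restriction bias: limit the hypothesis space itself to degree-2 polynomials.
restricted = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
restricted.fit(X, y)

# Preference bias: allow a much richer space (degree-12 polynomials), but prefer
# smoother hypotheses by penalising large coefficients (ridge regularisation).
preferred = make_pipeline(PolynomialFeatures(degree=12), Ridge(alpha=1.0))
preferred.fit(X, y)
```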
Hypothesis Space Selection is not only subjective but also empirical.

● Part of hypothesis space selection is qualitative and subjective:

  We might select polynomials rather than decision trees based on something that we know about the problem,

  and

● part is quantitative and empirical:

  Within the class of polynomials, we might select degree = 2 because that value performs best on the validation data set (as sketched below).
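The empirical part can be sketched as a simple validation-set search over the degree hyperparameter; this assumes scikit-learn, and the data and candidate degrees are made up for illustration:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(1)
X = rng.uniform(-3, 3, size=(60, 1))
y = 0.5 * X[:, 0] ** 2 + rng.normal(scale=0.3, size=60)

X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.3, random_state=1)

val_errors = {}
for degree in range(1, 7):
    model = make_pipeline(PolynomialFeatures(degree=degree), LinearRegression())
    model.fit(X_train, y_train)
    val_errors[degree] = mean_squared_error(y_val, model.predict(X_val))

best_degree = min(val_errors, key=val_errors.get)   # the quantitative, empirical choice
print(best_degree, val_errors[best_degree])
```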
Experimental Evaluation of Learning Algorithms

The overall objective of the Learning Algorithm is to find a hypothesis that

● is consistent (i.e., fits the training data), but more importantly,

● generalizes well for previously unseen data.

Experimental Evaluation defines ways to measure the generalizability of a Learning Algorithm.

[Diagram: Hypothesis Space 𝓗 → Learner (𝚪: S → h) → Final Hypothesis or Model h.]
Experimental Evaluation of Learning Algorithms

Sample Error

The sample error of hypothesis h with respect to the target function f and data sample S is the fraction of instances in S that h misclassifies:

    error_S(h) = (1 / |S|) Σ_{x ∈ S} 𝟙[ f(x) ≠ h(x) ]

True Error

The true error of hypothesis h with respect to the target function f and the distribution D is the probability that h will misclassify an instance drawn at random according to D:

    error_D(h) = Pr_{x ~ D} [ f(x) ≠ h(x) ]

It is impossible to assess the true error, so we try to estimate it using the sample error (a tiny illustrative computation follows below).
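A minimal plain-Python sketch of the sample-error definition above; the target f, hypothesis h, and sample S here are hypothetical:

```python
def sample_error(h, f, S):
    """Fraction of instances x in S where the hypothesis h disagrees with the target f."""
    return sum(1 for x in S if h(x) != f(x)) / len(S)

# Hypothetical example: the target labels x >= 0, the hypothesis thresholds at 0.5 instead.
f = lambda x: x >= 0.0
h = lambda x: x >= 0.5
S = [-1.0, -0.2, 0.1, 0.3, 0.7, 1.5]
print(sample_error(h, f, S))   # 2/6 ≈ 0.333: h misclassifies 0.1 and 0.3
```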
Generalization Error

Generalization error (a.k.a. out-of-sample error) is a measure of how accurately an algorithm is able to predict outcome values for previously unseen data. It decomposes into three components:

● Variance: due to the model’s sensitivity to small variations in the training data. It leads to overfitting!

● Bias: due to wrong assumptions, i.e., restrictions imposed by the representation function (the hypothesis space, such as linear or quadratic) and the search algorithm (e.g., grid search or beam search). It leads to underfitting!

● Irreducible Error: due to the noisiness of the data itself. The only way to handle it is to clean up the data properly and to detect and remove outliers.
Choosing a Hypothesis Space - I

One way to analyze hypothesis spaces is by

● the bias they impose (regardless of the training data set), and

● the variance they produce (from one training set to another).


Bias

The tendency of a predictive hypothesis to deviate from the expected value when averaged
over different training sets.

● Bias often results from restrictions imposed by the hypothesis space.

● We say that a hypothesis is underfitting when it fails to find a pattern in the data.
Variance

The amount of change in the hypothesis due to fluctuation in the training data.

● We say a function is overfitting the data when it pays too much attention to the particular data set it is trained on.

● It causes the hypothesis to perform poorly on unseen data.
Bias–Variance Trade-off

● High Variance-High Bias: the model is inconsistent and also inaccurate on average.

● Low Variance-High Bias: models are consistent but inaccurate on average.

● High Variance-Low Bias: models are somewhat accurate on average but inconsistent.

● Low Variance-Low Bias: the model is consistent and accurate on average.

Analogy: throwing darts at a board (a small simulation sketch follows below).
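The dartboard picture can be simulated directly: train the same model class on many freshly drawn training sets and look at how the predictions at one fixed point scatter (variance) and how far their average lands from the true value (bias). This sketch assumes NumPy; the target function, degrees, and sample sizes are arbitrary illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(42)
true_f = np.sin          # underlying target function, chosen only for illustration
x0 = 1.0                 # fixed query point at which we inspect predictions

def predictions_at_x0(degree, n_datasets=200, n_points=30, noise=0.3):
    """Fit one polynomial per freshly drawn training set; return all predictions at x0."""
    preds = []
    for _ in range(n_datasets):
        X = rng.uniform(-3, 3, n_points)
        y = true_f(X) + rng.normal(scale=noise, size=n_points)
        coeffs = np.polyfit(X, y, deg=degree)        # one learned hypothesis h
        preds.append(np.polyval(coeffs, x0))
    return np.array(preds)

for degree in (1, 3, 9):
    p = predictions_at_x0(degree)
    bias = p.mean() - true_f(x0)     # how far the darts are off-centre on average
    variance = p.var()               # how scattered the darts are
    print(f"degree={degree}  bias={bias:+.3f}  variance={variance:.3f}")
```

Low-degree fits tend to show larger bias and smaller variance; high-degree fits the reverse, matching the trade-off described above.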
Choosing a Hypothesis Space - II

Another way to analyze hypothesis spaces is by

● the expressiveness (i.e., the ability of a model to represent a wide variety of functions or patterns) of a hypothesis space, and
○ Can be measured by the size of the hypothesis space.

● the model complexity (i.e., how intricate the relationships a model can capture) of a hypothesis space.
○ Can be estimated by the number of parameters of a hypothesis.

Note-1: Sometimes the term model capacity is used to refer to model complexity and expressiveness together.
Note-2: In general, the required amount of training data depends on the model complexity, the representativeness of the training sample, and the acceptable error margin.
Choosing a Hypothesis Space - II

There is a tradeoff between the expressiveness of a hypothesis space and the computational
complexity of finding a good hypothesis within that space.

● Fitting a straight line to data is an easy computation; fitting high-degree polynomials is somewhat harder; and fitting unusual-looking functions may be undecidable.

● After learning h, computing h(x) when h is a linear function is guaranteed to be fast, while computing an arbitrarily complex function may not even be guaranteed to terminate.

For example:
● In Deep Learning, representations are not simple, but computing h(x) still takes only a bounded number of steps on appropriate hardware (a small sketch of this contrast follows below).
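A rough sketch of that contrast, assuming NumPy; the shapes and the single ReLU layer are arbitrary illustrative choices, not a claim about any particular deep model:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=16)

# Linear hypothesis: h(x) = w.x + b  ->  a single dot product.
w, b = rng.normal(size=16), 0.1
linear_pred = w @ x + b

# "Deep" hypothesis: a fixed sequence of matrix multiplies and nonlinearities,
# so h(x) still terminates in a bounded number of steps.
W1, W2 = rng.normal(size=(32, 16)), rng.normal(size=(1, 32))
deep_pred = W2 @ np.maximum(W1 @ x, 0.0)   # one hidden ReLU layer, then a readout
print(linear_pred, deep_pred)
```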
Bias-Variance vs. Model’s Complexity

The relationship between bias and variance is closely related to the machine learning concepts
of overfitting, underfitting, and model’s complexity.

● Increasing a model’s complexity typically increases its variance and reduces its bias.

● Reducing a model’s complexity increases its bias and reduces its variance.

This is why it is called a tradeoff.

[Figure: error vs. model’s complexity, with the optimal complexity at the point that balances bias and variance.]
Learning as a Search

Given a hypothesis space, data, and a bias, the problem of learning can be reduced to one of
search.

[Diagram: Hypothesis Space 𝓗 → Learner (𝚪: S → h) → Final Hypothesis or Model h; a test instance is fed to h to produce a prediction.]

Next lecture: Evaluation
8th August 2024
