Lecture 8
Arpit Rana
6th August 2024
Supervised Learning Process
[Diagram: the Learner (𝚪: S → h) searches the Hypothesis Space 𝓗 and outputs a Final Hypothesis or Model h, which maps a test instance to a prediction.]
Supervised Learning: Example
● Alternate: whether there is a suitable alternative restaurant nearby.
● Bar: whether the restaurant has a comfortable bar area to wait in.
● Fri/Sat: true on Fridays and Saturdays.
● Hungry: whether we are hungry right now.
● Patrons: how many people are in the restaurant (values are None, Some, and Full).
● Price: the restaurant’s price range ($, $$, $$$).
● Raining: whether it is raining outside.
● Reservation: whether we made a reservation.
● Type: the kind of restaurant (French, Italian, Thai, or Burger).
● WaitEstimate: the host’s wait estimate: 0–10, 10–30, 30–60, or >60 minutes.
Supervised Learning: Example
[Diagram: training data S — a set of instances from the instance space 𝑿, labelled by an unknown target function 𝑓.]

Size of the instance space:
|𝑿| = 2 × 2 × 2 × 2 × 3 × 3 × 2 × 2 × 4 × 4 = 9216

Size of the hypothesis space of Boolean functions over this instance space:
|𝓗| = 2^9216
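The arithmetic above can be checked in a few lines of Python; the domain sizes are taken directly from the restaurant attributes:

```python
# Domain sizes of the ten restaurant attributes:
# Alternate, Bar, Fri/Sat, Hungry (Boolean); Patrons, Price (3 values);
# Raining, Reservation (Boolean); Type, WaitEstimate (4 values).
domain_sizes = [2, 2, 2, 2, 3, 3, 2, 2, 4, 4]

num_instances = 1
for d in domain_sizes:
    num_instances *= d
print(num_instances)             # 9216 distinct instances

# Each Boolean labelling of all 9216 instances is a distinct hypothesis,
# so the space of Boolean functions over 𝑿 has 2**9216 members.
num_hypotheses = 2 ** num_instances
print(len(str(num_hypotheses)))  # 2775 decimal digits
```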
Hypothesis Space vs. Hypothesis

There are three different levels of specificity for using the term Hypothesis or Model:

● Hypothesis space: e.g., polynomials.
● Hyperparameter: degree = 1 fixes a sub-space, the linear polynomials h(x) = a·x + b.
● Parameters: a = 2, b = 3 pick out a single hypothesis, h(x) = 2x + 3.
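A minimal sketch of the three levels in code; the data here is synthetic, generated from the illustrative target 2x + 3 plus a little noise:

```python
import numpy as np

rng = np.random.default_rng(0)

# Level 1 - hypothesis space: polynomials (here via np.polyfit).
# Level 2 - hyperparameter: degree = 1 restricts us to lines a*x + b.
# Level 3 - parameters: a, b are chosen by fitting the training data.
x = np.linspace(0.0, 1.0, 50)
y = 2.0 * x + 3.0 + rng.normal(0.0, 0.01, size=x.shape)  # noisy 2x + 3

a, b = np.polyfit(x, y, deg=1)   # learn the two parameters
print(round(a, 2), round(b, 2))  # close to a = 2, b = 3
```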
● We can say that the prior probability P(h) is high for a smooth degree-1 or -2 polynomial
and lower for a degree-12 polynomial with large, sharp spikes.
Hypothesis Space Selection is Subjective
The observed dataset S alone does not allow us to make conclusions about unseen instances.
We need to make some assumptions!
● These assumptions induce the bias (a.k.a. inductive or learning bias) of a learning algorithm.
Sample Error
The sample error of hypothesis h with respect to the target function f and data sample S is:

error_S(h) = (1/n) ∑_{x∈S} δ(f(x) ≠ h(x))

where n = |S| and δ(c) = 1 if condition c is true, and 0 otherwise.

It is impossible to assess the true error, so we try to estimate it using the sample error.
True Error
The true error of hypothesis h with respect to the target function f and the distribution D is the probability that h will misclassify an instance drawn at random according to D:

error_D(h) = Pr_{x∼D} [ f(x) ≠ h(x) ]
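As a sketch of the relationship between the two errors, here the true error is known exactly for a toy target and hypothesis, and the sample error over a random sample approximates it (the target, the hypothesis, and the uniform distribution are all illustrative assumptions):

```python
import random

random.seed(0)

# Instance space: integers 0..9 drawn uniformly (the distribution D).
def f(x):          # target function: "x is small"
    return x < 5

def h(x):          # learned hypothesis, wrong only on x == 5
    return x < 6

# True error: h misclassifies exactly one of the ten equally likely
# instances, so error_D(h) = 1/10.
true_error = sum(f(x) != h(x) for x in range(10)) / 10

# Sample error on a random sample S of n instances drawn from D.
n = 10_000
S = [random.randrange(10) for _ in range(n)]
sample_error = sum(f(x) != h(x) for x in S) / n

print(true_error, sample_error)   # sample error close to 0.1
```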
Generalization Error

The generalization error of h is its expected error on instances not seen during training — the true error that the sample error only estimates. If the hypothesis space is too restrictive to capture the target function, the generalization error stays high: it leads to underfitting!
Choosing a Hypothesis Space - I
● the bias they impose (regardless of the training data set), and

Bias: the tendency of a predictive hypothesis to deviate from the expected value when averaged over different training sets.
Variance: the amount of change in the hypothesis due to fluctuation in the training data.

● the model complexity (i.e., how intricate the relationships a model can capture) of a hypothesis space.
○ Can be estimated by the number of parameters of a hypothesis.
Note-1: Sometimes the term model capacity is used to refer to model complexity and
expressiveness together.
Note-2: In general, the required amount of training data depends on the model complexity,
representativeness of the training sample, and the acceptable error margin.
Choosing a Hypothesis Space - II
There is a tradeoff between the expressiveness of a hypothesis space and the computational
complexity of finding a good hypothesis within that space.
● After learning h, computing h(x) when h is a linear function is guaranteed to be fast, while computing an arbitrarily complex function may not even be guaranteed to terminate.

For example:
● In Deep Learning, representations are not simple, but the h(x) computation still takes only a bounded number of steps with appropriate hardware.
Bias-Variance vs. Model’s Complexity
The relationship between bias and variance is closely related to the machine learning concepts
of overfitting, underfitting, and model’s complexity.
[Figure: bias, variance, and total error plotted against model complexity; the optimal model complexity minimizes the total error.]
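The tradeoff can be seen in a small simulation — the sin target, the noise level, and the two degrees are illustrative assumptions. Refitting a low-degree and a high-degree polynomial on many resampled training sets, the high-degree model's prediction at a query point fluctuates far more (high variance), while the low-degree model's average prediction misses the target value by more (high bias):

```python
import numpy as np

rng = np.random.default_rng(1)

def true_f(x):
    return np.sin(2 * np.pi * x)

x_train = np.linspace(0.0, 1.0, 12)
x_test = 0.25                        # query point where true_f = 1.0

def predictions(degree, trials=300):
    """Prediction at x_test from models fit on freshly resampled data."""
    preds = []
    for _ in range(trials):
        y = true_f(x_train) + rng.normal(0.0, 0.3, size=x_train.shape)
        coeffs = np.polyfit(x_train, y, deg=degree)
        preds.append(np.polyval(coeffs, x_test))
    return np.array(preds)

low, high = predictions(degree=1), predictions(degree=9)

# Variance: the complex model's prediction swings far more across
# training sets.  Bias: the simple model's average prediction is
# further from the target value true_f(0.25) = 1.
print(low.var(), high.var())
print(abs(low.mean() - 1.0), abs(high.mean() - 1.0))
```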
Learning as a Search
Given a hypothesis space, data, and a bias, the problem of learning can be reduced to one of
search.
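As a minimal sketch of this reduction, take a toy hypothesis space of threshold rules h_t(x) = (x ≥ t) and a four-example training set (both are illustrative assumptions); learning then becomes an exhaustive search for the hypothesis with the lowest sample error:

```python
# Toy training data: (x, label) pairs; the pattern is "label = 1 iff x >= 3".
data = [(1, 0), (2, 0), (3, 1), (4, 1)]

def sample_error(t):
    """Fraction of training examples misclassified by h_t(x) = (x >= t)."""
    return sum((x >= t) != bool(y) for x, y in data) / len(data)

# Learning as search: enumerate the finite space H = {h_0, ..., h_5} and
# keep the hypothesis with the lowest sample error.  The bias here is the
# choice of threshold rules as the hypothesis space.
best_t = min(range(6), key=sample_error)
print(best_t, sample_error(best_t))   # t = 3 classifies the sample perfectly
```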
[Diagram: the Learner (𝚪: S → h) searches the Hypothesis Space 𝓗 and outputs a Final Hypothesis or Model h, which maps a test instance to a prediction.]
Next lecture: Evaluation (8th August 2024)