Lect02 Problem ML
2023
Contents
1. Learning Components
2. A Simple Learning Model
   • Hypothesis Set
   • Learning Algorithm
3. Feasibility Of Learning
   • Probability to the rescue
4. Risk and Empirical Risk
   • Loss function
   • Empirical risk
   • Regularizer

Notation

symbol            meaning
a, b, c, N, ...   scalar number
w, v, x, y, ...   column vector
X, Y, ...         matrix
R                 set of real numbers
Z                 set of integer numbers
N                 set of natural numbers
R^D               set of vectors
X, Y, ...         set
A                 algorithm

operator   meaning
w^T        transpose
XY         matrix multiplication
X^{−1}     inverse
Learning Components
Credit Approval
• Suppose that a bank receives thousands of credit card applications every day, and it wants to automate the process of evaluating them.
• Applicant information:

  age                  23 years
  gender               male
  annual salary        $30,000
  years in residence   1 year
  years in job         1 year
  current debt         $15,000
  ...                  ...

• Approve credit?
Problem Statement
Formalization
• Input: x (customer application)
• Output: y (good/bad customer, i.e., {1, −1})
• Data: (x_1, y_1), (x_2, y_2), ..., (x_N, y_N) (historical records)
• Target function: f : X → Y (ideal credit approval formula)
Inductive Bias
An unbiased learner can never generalize.

Concept 1
The inductive bias of a learner is the set of assumptions the learner uses to predict outputs for inputs it has not yet encountered.
• Consider: arbitrarily wiggly functions or random truth tables.

  x_1  x_2  x_3  |  y
  0    0    0    |  0
  0    0    1    |  ?
  0    1    0    |  1
  0    1    1    |  1
  1    0    0    |  0
  1    0    1    |  ?
  1    1    0    |  1
  1    1    1    |  ?
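Without an inductive bias, nothing favors one completion of the "?" rows over another. A minimal sketch (Python; the variable names are illustrative) enumerating every truth table consistent with the observed rows:

```python
from itertools import product

# Observed rows of the truth table above: (x1, x2, x3) -> y
observed = {(0, 0, 0): 0, (0, 1, 0): 1, (0, 1, 1): 1,
            (1, 0, 0): 0, (1, 1, 0): 1}
unknown = [(0, 0, 1), (1, 0, 1), (1, 1, 1)]  # the "?" rows

# Every 0/1 assignment to the unknown rows fits the observed data
# perfectly, so the data alone cannot choose among 2**3 = 8 hypotheses.
consistent = []
for labels in product([0, 1], repeat=len(unknown)):
    h = dict(observed)
    h.update(zip(unknown, labels))
    consistent.append(h)

print(len(consistent))  # 8
```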
Inductive Bias (cont.)
Generalization is possible.
• If a machine performs well on most training data AND it is not too complex, it will probably do well on similar test data.
Components of Learning
[Figure: the components of learning — an UNKNOWN TARGET FUNCTION generates the TRAINING EXAMPLES; the LEARNING ALGORITHM searches the HYPOTHESIS SET and outputs the FINAL HYPOTHESIS.]
Learning Model
The two components are referred to as the learning model:
• The hypothesis set H is a set of functions that are potentially similar to f:

  H = {h_{θ_1}, h_{θ_2}, ...}

• The learning algorithm A uses the data D to select from H the best hypothesis g ≈ f.
What is a hypothesis set?
Concept 2
A hypothesis set is a set of potential functions, models, or solutions.

• A hypothesis set can be finite. For example:
  • {guilty, not guilty}
  • {accept, reject}
  • {happy, sad}
  • {1, 2, 3, 4, 5, 6}
What is a hypothesis set? (cont.)
• A hypothesis set can be infinite. For example, the sets of functions y = θ_0 + θ_1 x and y = θ_0 + θ_1 x + θ_2 x² + θ_3 x³.
Parameter representations
• Each element of the hypothesis set is often indexed by parameters or weights (θ or w).
• Two basic representations for parameters: factored and structured.
  1. Factored: a parameter set consists of a vector of attribute values; values can be boolean, real-valued, or one of a fixed set of symbols.
  2. Structured: a parameter set includes objects, each of which may have attributes of its own as well as relationships to other objects.
A Simple Learning Model
• Hypothesis Set
• Learning Algorithm
A Simple Hypothesis Set
We start with a simple model (the perceptron model):
• For input x = (x_1, ..., x_d) (attributes of a customer):

  Approve credit if  Σ_{i=1}^{d} w_i x_i ≥ threshold
A Simple Hypothesis Set (cont.)
• Set w_0 = −threshold:

  h(x) = sign( Σ_{i=1}^{d} w_i x_i + w_0 )

• Introduce an artificial coordinate x_0 = 1:

  h(x) = h_w(x) = sign( Σ_{i=0}^{d} w_i x_i )    (4)
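As a sanity check, a minimal sketch of the hypothesis of eq. (4) (Python with NumPy; the weights, input, and function name are illustrative choices):

```python
import numpy as np

def perceptron_hypothesis(w, x):
    """h_w(x) = sign(w . x), where x[0] = 1 is the artificial coordinate."""
    return np.sign(w @ x)

w = np.array([-1.0, 0.5, 0.5])   # w[0] = -threshold
x = np.array([1.0, 2.0, 1.0])    # x[0] = 1, then two customer attributes
print(perceptron_hypothesis(w, x))  # 1.0 -> approve
```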
2D Model Visualization
• Decision boundary: a line.
• Decision regions: the approve and deny regions.

[Figure: a 2D plot over Attribute 1 and Attribute 2, with a line separating the Approve region from the Deny region.]
A Simple Learning Algorithm
• The performance measure: the error rate.
• We use a simple learning algorithm (the perceptron learning algorithm, PLA) to find w:

  arg min_w E(h_w(x), y | D)    (6)
A Simple Learning Algorithm (cont.)
• Given the training set

  D = {(x_1, y_1), (x_2, y_2), ..., (x_N, y_N)}

1. Init w
2. Repeat until satisfied: pick a misclassified point,

     sign(w^T x_i) ≠ y_i    (7)

   and update

     w ← w + y_i x_i    (8)
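A minimal runnable sketch of the PLA loop of eqs. (7)–(8) (Python/NumPy; the toy data and the iteration cap are illustrative assumptions, not from the lecture):

```python
import numpy as np

def pla(X, y, max_iters=1000):
    """Perceptron learning algorithm.
    X: (N, d+1) matrix with x0 = 1 in the first column; y: labels in {+1, -1}.
    """
    w = np.zeros(X.shape[1])                 # 1. init w
    for _ in range(max_iters):               # 2. repeat until satisfied
        misclassified = np.sign(X @ w) != y  # eq. (7)
        if not misclassified.any():
            break                            # all points correct: done
        i = np.flatnonzero(misclassified)[0]
        w = w + y[i] * X[i]                  # eq. (8)
    return w

# Toy linearly separable data: approve iff x1 + x2 > 1
X = np.array([[1., 0., 0.], [1., 2., 1.], [1., 0.2, 0.3], [1., 1., 1.]])
y = np.array([-1., 1., -1., 1.])
w = pla(X, y)
print(np.sign(X @ w))  # matches y once PLA converges on separable data
```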
A Simple Explanation
[Figure: a simple explanation of the PLA update — the "incorrect" case, where the update moves w toward classifying x_i correctly, and the "correct" case, where no update is needed; the decision boundary is w^T x = 0.]
Is It a Learning Algorithm?
A Learning Puzzle
[Figure: a learning puzzle — two groups of training examples labeled y = −1 and y = +1, and a new example labeled y = ?]
Feasibility Of Learning
• Probability to the rescue
Feasibility Of Learning
The feasibility of learning is thus split into two questions:
1. Can we make the performance good enough?
   • Run our learning algorithm on the actual data D and see how good we can get.
2. Can we make sure that the performance inside of D is close enough to the performance outside of D?
   • Probability theory.
A Related Experiment - Bin Problem
• Consider a BIN with red and green marbles.

[Figure: a BIN of marbles and a SAMPLE of N marbles drawn from it.]

  P[picking a red marble] = µ = fraction of red marbles in the bin
  fraction of red marbles in the sample = ν
Does ν say anything about µ?
What does ν say about µ?
• In a big sample (large N), ν is probably close to µ (within ε).
• Formally,

  P[|ν − µ| > ε] ≤ 2e^{−2ε²N}  for any ε > 0    (9)

  This is called Hoeffding's Inequality.
• The bound does not depend on µ; tradeoff: N, ε, and the bound.
• We have ν ≈ µ ⟹ µ ≈ ν.
• In other words, the statement "µ = ν" is probably approximately correct (P.A.C.).
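A quick Monte Carlo sanity check of eq. (9) (a sketch; the bin fraction µ = 0.6, sample size N = 100, and tolerance ε = 0.1 are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
mu, N, eps, trials = 0.6, 100, 0.1, 100_000

# Draw `trials` samples of N marbles each; nu = sample fraction of red
nu = (rng.random((trials, N)) < mu).mean(axis=1)

empirical = np.mean(np.abs(nu - mu) > eps)
bound = 2 * np.exp(-2 * eps**2 * N)
print(f"P[|nu - mu| > {eps}] ~ {empirical:.4f} <= {bound:.4f}")  # bound holds
```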
Connection to Learning
  Bin problem                    Learning problem
  The unknown is a number µ      The unknown is a function f : X → Y
  a marble                       a point x ∈ X
  a green marble                 hypothesis got it right: h(x) = f(x)
  a red marble                   hypothesis got it wrong: h(x) ≠ f(x)
Connection to Learning (cont.)
• The error rate within the sample D, which corresponds to ν in the bin model, is called the in-sample error:

  Ein(h) = fraction of D where f and h disagree
         = (1/N) Σ_{n=1}^{N} I(h(x_n) ≠ f(x_n))

• The error rate outside of D, which corresponds to µ, is the out-of-sample error Eout(h).
• The Hoeffding inequality becomes:

  P[|Ein(h) − Eout(h)| > ε] ≤ 2e^{−2ε²N}  for any ε > 0    (10)

  In a big sample D, the performance inside of D is close enough to the performance outside of D.
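A small sketch of the in-sample error computation (Python/NumPy; the stand-in target f and hypothesis h are toy choices, not from the lecture):

```python
import numpy as np

def in_sample_error(h, f, X):
    """Ein(h) = fraction of the sample where h and f disagree."""
    return np.mean(h(X) != f(X))

# Toy 1-D example: f flips sign at 0.5, h flips at 0.4
X = np.linspace(0, 1, 20)
f = lambda x: np.sign(x - 0.5)
h = lambda x: np.sign(x - 0.4)
print(in_sample_error(h, f, X))  # fraction of disagreements: 0.1
```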
Risk and Empirical Risk
• Loss function
• Empirical risk
• Regularizer
Loss function
Concept 3
Given a hypothesis ŷ = h(x) ∈ H, a non-negative real-valued loss function ℓ(ŷ, y) measures how different the prediction ŷ of a hypothesis is from the true outcome y.
Loss Functions for Binary Classification
• Zero-one loss

  I(h(x) ≠ y)    (11)

• Log loss

  log(1 + e^{−h(x)y})    (12)
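Both losses as small Python functions (a sketch assuming labels in {+1, −1} and a real-valued score h(x); the names are illustrative):

```python
import numpy as np

def zero_one_loss(score, y):
    """Eq. (11): 1 if the predicted label disagrees with y, else 0."""
    return (np.sign(score) != y).astype(float)

def log_loss(score, y):
    """Eq. (12): log(1 + exp(-h(x) * y))."""
    return np.log1p(np.exp(-score * y))

score, y = np.array([2.0, -0.5]), np.array([1.0, 1.0])
print(zero_one_loss(score, y))  # [0. 1.]
print(log_loss(score, y))       # smooth surrogate: [0.127 0.974]
```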
Loss Functions for Regression
• Squared loss

  (h(x) − y)²    (14)

• Absolute loss

  |h(x) − y|    (15)
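And the two regression losses in the same sketch style (illustrative names and data):

```python
import numpy as np

def squared_loss(pred, y):
    """Eq. (14): (h(x) - y)^2 -- penalizes large errors heavily."""
    return (pred - y) ** 2

def absolute_loss(pred, y):
    """Eq. (15): |h(x) - y| -- more robust to outliers."""
    return np.abs(pred - y)

pred, y = np.array([3.0, 0.0]), np.array([1.0, 0.5])
print(squared_loss(pred, y))   # [4.   0.25]
print(absolute_loss(pred, y))  # [2.  0.5]
```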
Risk
Concept 4
The risk E associated with a hypothesis h(x) is defined as the expectation of the loss function:

  E(h) = E[ℓ(h(x), y)] = ∫ ℓ(h(x), y) dp(x, y)    (16)
Empirical Risk
Concept 5
The empirical risk Ê is the average of the loss function on the training set D = {(x_1, y_1), (x_2, y_2), ..., (x_N, y_N)}:

  Ê = (1/N) Σ_{i=1}^{N} ℓ(h_w(x_i), y_i)    (17)

Theorem 2
The empirical risk is an unbiased estimate of the risk.
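A simulation illustrating Theorem 2 (a sketch under an assumed toy distribution p(x, y); every name here is illustrative): averaging Ê over many independent training sets approaches the risk E.

```python
import numpy as np

rng = np.random.default_rng(1)
h = lambda x: 2.0 * x                      # fixed hypothesis
loss = lambda yhat, y: (yhat - y) ** 2     # squared loss

def empirical_risk(N):
    x = rng.random(N)
    y = 2.0 * x + rng.normal(0, 0.5, N)    # toy data distribution p(x, y)
    return np.mean(loss(h(x), y))          # eq. (17)

# Here the true risk E[(h(x) - y)^2] equals the noise variance 0.25,
# and the average of Ê over many training sets recovers it.
print(np.mean([empirical_risk(N=10) for _ in range(20_000)]))  # ~0.25
```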
Empirical Risk (cont.)
Concept 6
The empirical risk of a hypothesis h_w(x) with a loss function ℓ and a regularizer reg:

  Ê = (1/N) Σ_{i=1}^{N} ℓ(h_w(x_i), y_i) + λ·reg(w)    (18)

where the first term is the average loss and the second term is the regularizer.
The empirical risk minimization principle
Principle
The learning algorithm should choose a hypothesis h_w which minimizes the empirical risk:

  h_w = arg min_{h_w ∈ H} Ê(h_w | D)    (19)
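A minimal illustration of the ERM principle of eq. (19) over a small finite hypothesis set (a sketch; the candidate parameters and data are toy choices):

```python
import numpy as np

X = np.array([0.0, 0.5, 1.0])
y = np.array([0.1, 0.9, 2.1])

# Finite hypothesis set: h_theta(x) = theta * x for a few candidate thetas
thetas = [0.5, 1.0, 1.5, 2.0]
emp_risk = lambda t: np.mean((t * X - y) ** 2)  # eq. (17), squared loss

best = min(thetas, key=emp_risk)  # eq. (19): pick the ERM hypothesis
print(best, emp_risk(best))       # 2.0 minimizes Ê on this data
```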
Regularizers
Theorem 3
For each λ ≥ 0, there exists B ≥ 0 such that the two formulations are equivalent:

  arg min_w Σ_{i=1}^{N} ℓ(h_w(x_i), y_i) + λ·reg(w)    (20)

  arg min_w Σ_{i=1}^{N} ℓ(h_w(x_i), y_i)  subject to reg(w) ≤ B    (21)
Regularizers (cont.)
• L2-regularization

  reg(w) = w^T w = ‖w‖₂²    (22)

• L1-regularization

  reg(w) = ‖w‖₁    (23)
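A sketch combining eq. (18) with the regularizers of eqs. (22)–(23) (Python/NumPy; the linear hypothesis, data, and λ are illustrative assumptions):

```python
import numpy as np

def l2_reg(w):  # eq. (22)
    return w @ w

def l1_reg(w):  # eq. (23)
    return np.sum(np.abs(w))

def regularized_empirical_risk(w, X, y, lam, reg):
    """Eq. (18) for a linear hypothesis h_w(x) = w . x with squared loss."""
    preds = X @ w
    return np.mean((preds - y) ** 2) + lam * reg(w)

X = np.array([[1., 0.], [1., 1.], [1., 2.]])
y = np.array([0.1, 1.0, 2.1])
w = np.array([0.0, 1.0])
print(regularized_empirical_risk(w, X, y, lam=0.1, reg=l2_reg))
print(regularized_empirical_risk(w, X, y, lam=0.1, reg=l1_reg))
```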