
LEARNING PROBLEM

Bùi Tiến Lên

2023
Contents

1. Learning Components

2. A Simple Learning Model

3. Feasibility Of Learning

4. Risk and Empirical Risk


Notation

  symbol            meaning                        operator   meaning
  a, b, c, N, ...   scalar number                  wᵀ         transpose
  w, v, x, y, ...   column vector                  XY         matrix multiplication
  X, Y, ...         matrix                         X⁻¹        inverse
  R                 set of real numbers
  Z                 set of integer numbers
  N                 set of natural numbers
  R^D               set of D-dimensional vectors
  𝒳, 𝒴, ...         set
  𝒜                 algorithm
Learning Components
Credit Approval
• Suppose that a bank receives thousands of credit card applications every day, and it wants to automate the process of evaluating them.
• Applicant information

    age                 23 years
    gender              male
    annual salary       $30,000
    years in residence  1 year
    years in job        1 year
    current debt        $15,000
    ...                 ...

• Approve credit?
Problem Statement
Formalization
• Input: x (customer application)
• Output: y (good/bad customer, encoded as {+1, −1})
• Data: (x_1, y_1), (x_2, y_2), ..., (x_N, y_N) (historical records)
• Target function: f : X → Y (ideal credit approval formula)
• Best approximate function: g : X → Y (formula to be used)
Inductive Bias
Theorem 1 (No Free Lunch Theorems)
An unbiased learner can never generalize.

Concept 1
An inductive bias of a learner is the set of assumptions the learner uses to predict outputs for inputs it has not yet encountered.

• Consider: arbitrarily wiggly functions or random truth tables. With three boolean inputs, the five observed rows do not determine the three unseen rows (marked ?):

    x1  x2  x3 | y
    0   0   0  | 0
    0   0   1  | ?
    0   1   0  | 1
    0   1   1  | 1
    1   0   0  | 0
    1   0   1  | ?
    1   1   0  | 1
    1   1   1  | ?
Inductive Bias (cont.)
Inductive Learning Hypothesis
Generalization is possible.

• If a machine performs well on most training data AND it is not too complex, it will probably do well on similar test data.
Components of Learning
[Diagram: the unknown target function generates the training examples; the learning algorithm searches the hypothesis set and outputs the final hypothesis.]
Learning Model
The two components are referred to as the learning model:
• The hypothesis set H is a set of functions that are potentially similar to f:

    H = {h_θ1, h_θ2, ...}

• The learning algorithm 𝒜 is a search algorithm which, given the data D, finds the best g ∈ H such that g ≈ f.
What is a hypothesis set?

Concept 2
A hypothesis set is a set of potential functions, models, or solutions.

• A hypothesis set can be finite. For example:
  • {guilty, not guilty}
  • {accept, reject}
  • {happy, sad}
  • {1, 2, 3, 4, 5, 6}
What is a hypothesis set? (cont.)

• A hypothesis set can be infinite. For example, the sets of functions y = θ_0 + θ_1 x and y = θ_0 + θ_1 x + θ_2 x² + θ_3 x³ (see the code sketch below)
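To make the idea concrete, a parametric hypothesis set can be sketched in code as a function factory: each parameter vector θ selects one hypothesis from the (infinite) family. A minimal Python sketch; the function names are illustrative, not from the lecture:

```python
import numpy as np

def make_poly_hypothesis(theta):
    """Return one hypothesis h_theta from the polynomial family
    y = theta_0 + theta_1*x + ... + theta_d*x^d."""
    theta = np.asarray(theta, dtype=float)

    def h(x):
        # Evaluate the polynomial at x (scalar or array).
        powers = np.array([x**i for i in range(len(theta))])
        return theta @ powers

    return h

# Two members of the hypothesis sets from the slide (illustrative parameters):
h_linear = make_poly_hypothesis([0.5, 2.0])             # y = θ0 + θ1 x
h_cubic  = make_poly_hypothesis([0.5, 2.0, -1.0, 0.3])  # y = θ0 + θ1 x + θ2 x² + θ3 x³
print(h_linear(1.0), h_cubic(1.0))
```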
Parameter representations
• Each element of the hypothesis set is often indexed by parameters or weights (θ or w)
• Two basic representations for parameters: factored and structured
  1. Factored: a parameter set consists of a vector of attribute values; values can be boolean, real-valued, or one of a fixed set of symbols.
  2. Structured: a parameter set includes objects, each of which may have attributes of its own as well as relationships to other objects.
A Simple Learning Model
• Hypothesis Set
• Learning Algorithm
A Simple Hypothesis Set
We start with a simple model (the perceptron model):
• For input x = (x_1, ..., x_d) (attributes of a customer)

    Approve credit if Σ_{i=1}^{d} w_i x_i ≥ threshold
    Deny credit    if Σ_{i=1}^{d} w_i x_i < threshold          (1)

• This linear formula h ∈ H can be written as

    h(x) = h_{w,threshold}(x) = sign( (Σ_{i=1}^{d} w_i x_i) − threshold )          (2)
A Simple Hypothesis Set (cont.)
• Set w_0 = −threshold:

    h(x) = h_w(x) = sign( (Σ_{i=1}^{d} w_i x_i) + w_0 )          (3)

• Introduce an artificial coordinate x_0 = 1:

    h(x) = h_w(x) = sign( Σ_{i=0}^{d} w_i x_i )          (4)

• In vector form, the perceptron implements (see the sketch below)

    h(x) = h_w(x) = sign(wᵀx)          (5)
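A minimal Python sketch of the vector form (5), with the artificial coordinate x_0 = 1 added explicitly; treating sign(0) as +1 is an assumption of this sketch:

```python
import numpy as np

def perceptron_hypothesis(w, x):
    """h(x) = sign(w^T x), with the artificial coordinate x0 = 1 prepended."""
    x = np.concatenate(([1.0], np.asarray(x, dtype=float)))  # x -> (1, x1, ..., xd)
    return 1 if w @ x >= 0 else -1  # convention: sign(0) treated as +1

# Illustrative weights for d = 2 attributes: w = (w0, w1, w2), w0 = -threshold
w = np.array([-1.0, 0.5, 0.5])
print(perceptron_hypothesis(w, [3.0, 1.0]))   # +1: approve credit
print(perceptron_hypothesis(w, [0.5, 0.5]))   # -1: deny credit
```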
2D Model Visualization
• Decision boundary: a line
• Decision regions: the approve and deny regions (see the plotting sketch below)

[Figure: a 2D plane (Attribute 1 vs. Attribute 2) split by a line into an Approve region and a Deny region.]
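The boundary and regions can be drawn with a few lines of matplotlib; the weights below are illustrative, not from the lecture:

```python
import numpy as np
import matplotlib.pyplot as plt

# Illustrative weights: the boundary is the line w0 + w1*x1 + w2*x2 = 0.
w0, w1, w2 = -2.0, 0.5, 0.5

x1 = np.linspace(0.0, 4.0, 100)
x2_boundary = -(w0 + w1 * x1) / w2   # solve for x2 on the boundary

plt.plot(x1, x2_boundary, "k-", label="decision boundary")
plt.fill_between(x1, x2_boundary, 4.0, alpha=0.2, label="Approve (h(x) = +1)")
plt.fill_between(x1, 0.0, x2_boundary, alpha=0.2, label="Deny (h(x) = -1)")
plt.xlabel("Attribute 1")
plt.ylabel("Attribute 2")
plt.legend()
plt.show()
```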
A Simple Learning Algorithm
• The performance measure: the error rate
• We use a simple learning algorithm (the perceptron learning algorithm, PLA) to find w:

    arg min_w E(h_w(x), y | D)          (6)
A Simple Learning Algorithm (cont.)
• Given the training set

    D = {(x_1, y_1), (x_2, y_2), ..., (x_N, y_N)}

1. Initialize w
2. Repeat until satisfied:
   • At iteration t = 1, 2, 3, ..., pick a misclassified point (x_i, y_i):

       sign(wᵀx_i) ≠ y_i          (7)

   • and update the weight vector:

       w ← w + y_i x_i          (8)

The full procedure is sketched in code below.
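A compact sketch of PLA implementing the update (7)–(8); the stopping rule (stop when no point is misclassified, or after max_iters) and the toy data are assumptions for illustration:

```python
import numpy as np

def pla(X, y, max_iters=1000):
    """Perceptron learning algorithm.
    X: (N, d) inputs; y: (N,) labels in {+1, -1}. Returns weights w."""
    N, d = X.shape
    Xb = np.hstack([np.ones((N, 1)), X])       # prepend x0 = 1
    w = np.zeros(d + 1)                        # 1. initialize w
    for _ in range(max_iters):                 # 2. repeat until satisfied
        preds = np.sign(Xb @ w)
        preds[preds == 0] = -1                 # treat sign(0) as -1 (a convention)
        misclassified = np.where(preds != y)[0]
        if len(misclassified) == 0:            # no misclassified points: done
            return w
        i = misclassified[0]                   # pick a misclassified point (x_i, y_i)
        w = w + y[i] * Xb[i]                   # update: w <- w + y_i * x_i
    return w                                   # may not converge if data not separable

# Tiny linearly separable example (illustrative):
X = np.array([[2.0, 2.0], [1.0, 3.0], [-1.0, -1.0], [-2.0, 0.5]])
y = np.array([1, 1, -1, -1])
w = pla(X, y)
print(w, np.sign(np.hstack([np.ones((4, 1)), X]) @ w))
```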
A Simple Explanation
[Figure: geometric intuition for the update (8): adding y_i x_i moves the boundary wᵀx = 0 so that a previously misclassified point ("incorrect") ends up on the correct side ("correct").]
Is It a Learning Algorithm?

[Figure: not recoverable from the extracted text.]
A Learning Puzzle
[Figure: a learning puzzle — training examples labeled y = −1 and y = +1 are shown, and the label y = ? of a new example must be predicted.]
Feasibility Of Learning
• Probability to the rescue
Feasibility Of Learning
The feasibility of learning is thus split into two questions:
1. Can we make the performance good enough?
   • Run our learning algorithm on the actual data D and see how good we can get.
2. Can we make sure that the performance inside of D is close enough to the performance outside of D?
   • Probability theory.
A Related Experiment - Bin Problem
• Consider a BIN with red and green marbles

    P[picking a red marble] = µ = fraction of red marbles
    P[picking a green marble] = 1 − µ

• The value of µ is unknown to us
• We pick N marbles independently (the SAMPLE)
• The fraction of red marbles in the SAMPLE is ν

[Figure: a bin of red and green marbles, and a sample of N marbles drawn from it.]
Does ν say anything about µ?
• No! (certain answer): the sample can be mostly green while the bin is mostly red.
• Yes! (uncertain answer): the sample frequency ν is likely close to the bin frequency µ.
What does ν say about µ?
• In a big sample (large N), ν is probably close to µ (within ε)
• Formally,

    P[|ν − µ| > ε] ≤ 2e^{−2ε²N}   for any ε > 0          (9)

  This is called Hoeffding's Inequality (checked numerically in the sketch below).
• The bound does not depend on µ; there is a tradeoff between N, ε, and the bound.
• We have ν ≈ µ =⇒ µ ≈ ν
• In other words, the statement "µ = ν" is probably approximately correct (P.A.C.)
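As a quick numerical sanity check of (9), one can simulate the bin experiment: draw many samples of size N from a bin with a fixed (but nominally unknown) µ and compare the observed frequency of large deviations with the Hoeffding bound. All parameter values below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

mu, N, eps, trials = 0.6, 100, 0.1, 100_000    # illustrative values
nu = rng.binomial(N, mu, size=trials) / N      # sample frequencies ν
deviation_freq = np.mean(np.abs(nu - mu) > eps)
hoeffding_bound = 2 * np.exp(-2 * eps**2 * N)

print(f"P[|ν − µ| > ε] ≈ {deviation_freq:.4f}")
print(f"Hoeffding bound  = {hoeffding_bound:.4f}")   # the bound should dominate
```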
Connection to Learning
    Bin problem                     Learning problem
    the unknown is a number µ       the unknown is a function f : X → Y
    a marble                        a point x ∈ X
    a green marble                  hypothesis got it right: h(x) = f(x)
    a red marble                    hypothesis got it wrong: h(x) ≠ f(x)
Connection to Learning (cont.)
• The error rate within the sample D, which corresponds to ν in the bin model, will be called the in-sample error (computed in the sketch below):

    E_in(h) = fraction of D where f and h disagree
            = (1/N) Σ_{n=1}^{N} I(h(x_n) ≠ f(x_n))

  where I(...) = 1 if the statement is true, and I(...) = 0 if the statement is false.
• In the same way, we define the out-of-sample error over the domain X, which corresponds to µ in the bin model:

    E_out(h) = P[h(x) ≠ f(x)],  x ∈ X
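A sketch of computing E_in for a given hypothesis on a labeled sample; the helper name and data are illustrative:

```python
import numpy as np

def in_sample_error(h, X, y):
    """E_in(h) = (1/N) * sum over the sample of I(h(x_n) != y_n)."""
    predictions = np.array([h(x) for x in X])
    return np.mean(predictions != y)

# Example with a fixed linear hypothesis h(x) = sign(w^T x), x0 = 1 prepended:
w = np.array([-1.0, 0.5, 0.5])
h = lambda x: 1 if w @ np.concatenate(([1.0], x)) >= 0 else -1
X = np.array([[2.0, 2.0], [0.0, 0.0], [1.0, 1.0]])
y = np.array([1, -1, -1])
print(in_sample_error(h, X, y))   # fraction of disagreements, here 1/3
```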
Connection to Learning (cont.)
• The Hoeffding inequality becomes:

    P[|E_in(h) − E_out(h)| > ε] ≤ 2e^{−2ε²N}   for any ε > 0          (10)

  In a big sample D, the performance inside of D is close enough to the performance outside of D.
Risk and Empirical Risk
• Loss function
• Empirical risk
• Regularizer
Loss function
Concept 3
Given a hypothesis ŷ = h(x) ∈ H, a loss function ℓ(ŷ, y) is a non-negative real-valued function which measures how different the prediction ŷ of the hypothesis is from the true outcome y.
Loss Functions for Binary Classification
• Zero-one loss

    I(h(x) ≠ y)          (11)

• Log loss (logistic regression)

    log(1 + e^{−h(x)y})          (12)

• Exponential loss (AdaBoost)

    e^{−h(x)y}          (13)

All three are sketched in code below.
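The losses (11)–(13) in Python, written for a real-valued score h(x) and a label y ∈ {+1, −1}; treating the sign of the score as the prediction in the zero-one loss is an assumption of this sketch:

```python
import numpy as np

def zero_one_loss(score, y):
    # I(h(x) != y); uses the sign of the score as the prediction
    return float(np.sign(score) != y)

def log_loss(score, y):
    # log(1 + e^{-h(x) y}), the logistic regression loss
    return np.log1p(np.exp(-score * y))

def exp_loss(score, y):
    # e^{-h(x) y}, the AdaBoost loss
    return np.exp(-score * y)

for loss in (zero_one_loss, log_loss, exp_loss):
    print(loss.__name__, loss(0.8, +1), loss(0.8, -1))
```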
Loss Functions for Regression
• Squared loss

    (h(x) − y)²          (14)

• Absolute loss

    |h(x) − y|          (15)
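The regression losses (14)–(15) in the same style:

```python
def squared_loss(pred, y):
    # (h(x) - y)^2: penalizes large residuals quadratically
    return (pred - y) ** 2

def absolute_loss(pred, y):
    # |h(x) - y|: linear penalty, more robust to outliers
    return abs(pred - y)

print(squared_loss(2.5, 2.0), absolute_loss(2.5, 2.0))   # 0.25 0.5
```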
Risk
Concept 4
The risk E associated with a hypothesis h(x) is defined as the expectation of the loss function:

    E(h) = E[ℓ(h(x), y)] = ∫ ℓ(h(x), y) dp(x, y)          (16)
Empirical Risk
Concept 5
The empirical risk Ê is the average of the loss function on the training set D = {(x_1, y_1), (x_2, y_2), ..., (x_N, y_N)} (see the sketch below):

    Ê = (1/N) Σ_{i=1}^{N} ℓ(h_w(x_i), y_i)          (17)

Theorem 2
The empirical risk is an unbiased estimate of the risk.
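A sketch of the empirical risk (17) as an average of any of the loss functions above over the training set; the hypothesis and data are illustrative:

```python
import numpy as np

def empirical_risk(h, loss, X, y):
    """Ê = (1/N) * sum_i loss(h(x_i), y_i) over the training set D."""
    return np.mean([loss(h(x), yi) for x, yi in zip(X, y)])

# Example: squared loss with a simple linear hypothesis y = θ0 + θ1 x
h = lambda x: 0.5 + 2.0 * x
squared_loss = lambda pred, y: (pred - y) ** 2
X = np.array([0.0, 1.0, 2.0])
y = np.array([0.4, 2.6, 4.3])
print(empirical_risk(h, squared_loss, X, y))
```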
Empirical Risk (cont.)
Concept 6
The empirical risk of a hypothesis h_w(x) with a loss function ℓ and a regularizer reg is

    Ê = (1/N) Σ_{i=1}^{N} ℓ(h_w(x_i), y_i) + λ·reg(w)          (18)

where the first term is the (average) loss and the second term is the regularizer.
The empirical risk minimization principle
Principle
The learning algorithm should choose a hypothesis h_w which minimizes the empirical risk:

    h_w = arg min_{h_w ∈ H} Ê(h_w | D)          (19)
Regularizers
Theorem 3
For each λ ≥ 0, there exists B ≥ 0 such that the two formulations are equivalent:

    arg min_w Σ_{i=1}^{N} ℓ(h_w(x_i), y_i) + λ·reg(w)          (20)

    arg min_w Σ_{i=1}^{N} ℓ(h_w(x_i), y_i)   subject to reg(w) ≤ B          (21)
Regularizers (cont.)
• L2-regularization

    reg(w) = wᵀw = ‖w‖₂²          (22)

• L1-regularization

    reg(w) = ‖w‖₁          (23)

Both regularizers are sketched in code below.
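A sketch of the regularizers (22)–(23), plugged into the regularized empirical risk (18) with squared loss and a linear hypothesis h_w(x) = wᵀx; λ and the data are illustrative:

```python
import numpy as np

def l2_reg(w):
    # reg(w) = w^T w = ||w||_2^2
    return float(w @ w)

def l1_reg(w):
    # reg(w) = ||w||_1
    return float(np.sum(np.abs(w)))

def regularized_empirical_risk(w, X, y, lam, reg):
    """Equation (18) with squared loss and a linear hypothesis h_w(x) = w^T x."""
    residuals = X @ w - y
    return float(np.mean(residuals ** 2)) + lam * reg(w)

w = np.array([0.5, 2.0])
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])   # first column: x0 = 1
y = np.array([0.4, 2.6, 4.3])
print(regularized_empirical_risk(w, X, y, lam=0.1, reg=l2_reg))
print(regularized_empirical_risk(w, X, y, lam=0.1, reg=l1_reg))
```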