CMPSCI 383 (Fall 2011), Lecture 19
Nov 15, 2011
Today's topics
• Learning from Examples: brief review
• Univariate Linear Regression
• Batch gradient descent
• Stochastic gradient descent
• Multivariate Linear Regression
• Regularization
• Linear Classifiers
• Perceptron learning rule
• Logistic Regression
Learning from Examples (supervised learning)
[A sequence of six figure slides illustrating supervised learning; images not recovered from the source.]
Important issues
• Generalization
• Overfitting
• Cross-validation (a k-fold sketch follows this list)
  • Holdout cross-validation
  • K-fold cross-validation
  • Leave-one-out cross-validation
• Model selection
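As a brief illustration of the cross-validation bullet, here is a minimal k-fold sketch in Python (not from the slides; `train_and_score` is a hypothetical callback that fits a hypothesis on one split and returns its score on the other):

```python
# Minimal k-fold cross-validation sketch. Assumes a hypothetical
# train_and_score(training, held_out) callback supplied by the caller.
def cross_validate(train_and_score, data, k):
    scores = []
    for i in range(k):
        held_out = data[i::k]                                  # fold i: every k-th example
        training = [d for j, d in enumerate(data) if j % k != i]
        scores.append(train_and_score(training, held_out))
    return sum(scores) / k                                     # average validation score
```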
Recall Notation
[Slide equations not recovered; the slide reviews notation for the hypothesis h.]
Loss Functions
Suppose the true prediction for input x is f(x) = y, but the hypothesis gives h(x) = ŷ.
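As a minimal sketch (not on the slides), the standard loss functions compare y with ŷ:

```python
# Standard loss functions comparing the true value y with the prediction y_hat.
def l1_loss(y, y_hat):
    return abs(y - y_hat)          # absolute-value loss

def l2_loss(y, y_hat):
    return (y - y_hat) ** 2        # squared-error loss

def l01_loss(y, y_hat):
    return 0 if y == y_hat else 1  # 0/1 loss
```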
Univariate Linear Regression
Hypothesis: $h_w(x) = w_1 x + w_0$, with weight vector $w = (w_0, w_1)$.
Univariate Linear Regression contd.
$\mathrm{Loss}(h_w) = \sum_{j=1}^{N} L_2(y_j, h_w(x_j)) = \sum_{j=1}^{N} \big(y_j - h_w(x_j)\big)^2 = \sum_{j=1}^{N} \big(y_j - (w_1 x_j + w_0)\big)^2$
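A quick sketch of this loss in Python (function and variable names are mine):

```python
# Squared-error loss of h_w(x) = w1*x + w0 over a training set of (x_j, y_j).
def univariate_loss(w0, w1, xs, ys):
    return sum((y - (w1 * x + w0)) ** 2 for x, y in zip(xs, ys))
```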
Weight Space
Finding w*
The loss is minimized where its partial derivatives are zero:
$\frac{\partial}{\partial w_0} \sum_{j=1}^{N} \big(y_j - (w_1 x_j + w_0)\big)^2 = 0 \quad\text{and}\quad \frac{\partial}{\partial w_1} \sum_{j=1}^{N} \big(y_j - (w_1 x_j + w_0)\big)^2 = 0$
These equations have the unique closed-form solution
$w_1 = \frac{N \sum_j x_j y_j - (\sum_j x_j)(\sum_j y_j)}{N \sum_j x_j^2 - (\sum_j x_j)^2}, \qquad w_0 = \frac{\sum_j y_j - w_1 \sum_j x_j}{N}$
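A direct translation of that closed form (a sketch; the function name is mine):

```python
# Closed-form least-squares fit for univariate linear regression.
def fit_univariate(xs, ys):
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    w1 = (n * sxy - sx * sy) / (n * sxx - sx ** 2)
    w0 = (sy - w1 * sx) / n
    return w0, w1
```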
Gradient Descent
$w_i \leftarrow w_i - \alpha \frac{\partial}{\partial w_i} \mathrm{Loss}(\mathbf{w})$
where α is the step size, or learning rate.
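For the univariate loss above, working out the partial derivatives (and folding the constant factor of 2 into α) gives one batch step; a sketch, with hypothetical names:

```python
# One batch gradient-descent step for h_w(x) = w1*x + w0.
def gd_step(w0, w1, xs, ys, alpha):
    errs = [y - (w1 * x + w0) for x, y in zip(xs, ys)]   # residuals y_j - h_w(x_j)
    w0 += alpha * sum(errs)                               # move against dLoss/dw0
    w1 += alpha * sum(e * x for e, x in zip(errs, xs))    # move against dLoss/dw1
    return w0, w1
```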
Gradient Descent contd.
In the multivariate case the hypothesis is
$h_{sw}(\mathbf{x}_j) = \mathbf{w} \cdot \mathbf{x}_j = \mathbf{w}^{T} \mathbf{x}_j = \sum_i w_i x_{j,i}$
and the batch gradient descent update becomes:
$w_i \leftarrow w_i + \alpha \sum_j \big(y_j - h_w(\mathbf{x}_j)\big)\, x_{j,i}$
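A sketch of that update (names are mine); the stochastic variant applies the same update one example j at a time instead of summing over the whole training set:

```python
# One batch update for multivariate linear regression.
# Each x_j is a feature vector with x_j[0] = 1 serving as the intercept term.
def h_w(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))   # w . x

def batch_update(w, X, ys, alpha):
    errs = [y - h_w(w, x) for x, y in zip(X, ys)]
    return [wi + alpha * sum(e * x[i] for e, x in zip(errs, X))
            for i, wi in enumerate(w)]
```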
The Multivariate case contd.
Regularization

L1 vs. L2 Regularization
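The slides' figures aren't reproduced here, but as a hedged sketch of the standard formulation: regularized learning minimizes Cost(h) = EmpLoss(h) + λ·Complexity(h), where for linear functions Complexity(h_w) = Σ_i |w_i|^q; q = 1 gives the L1 penalty (which tends toward sparse weights) and q = 2 the L2 penalty.

```python
# L1 and L2 complexity penalties added to the empirical squared-error loss.
def l1_penalty(w):
    return sum(abs(wi) for wi in w)

def l2_penalty(w):
    return sum(wi ** 2 for wi in w)

def regularized_cost(w, X, ys, lam, penalty):
    emp_loss = sum((y - sum(wi * xi for wi, xi in zip(w, x))) ** 2
                   for x, y in zip(X, ys))
    return emp_loss + lam * penalty(w)    # EmpLoss + lambda * Complexity
```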
Linear Classification: hard thresholds
Linear Classification: hard thresholds contd.
• Decision boundary: in the linear case, a linear separator (a hyperplane).
• Linearly separable: data is linearly separable if the classes can be separated by a linear separator.
• Classification hypothesis:
$h_w(\mathbf{x}) = \mathrm{Threshold}(\mathbf{w} \cdot \mathbf{x})$, where $\mathrm{Threshold}(z) = 1$ if $z \geq 0$ and $0$ otherwise.
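A direct sketch of this hypothesis (function names are mine):

```python
# Hard-threshold linear classifier: h_w(x) = Threshold(w . x).
def threshold(z):
    return 1 if z >= 0 else 0

def h_threshold(w, x):
    return threshold(sum(wi * xi for wi, xi in zip(w, x)))
```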
Perceptron Learning Rule"
• If the output is correct, i.e., y =h w (x), then the weights don't change
• If y = 1 but hw (x) = 0, then w i is increased when x i is positive and decreased when x i is negative.
€ • If y = 0 but hw (x) = 1, then w i is decreased when x i is positive and increased when x i is negative.
24
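Continuing the threshold sketch above, one perceptron update:

```python
# One perceptron update: w_i <- w_i + alpha * (y - h_w(x)) * x_i.
# Reuses h_threshold from the previous sketch.
def perceptron_update(w, x, y, alpha):
    err = y - h_threshold(w, x)   # 0 if correct, +1 or -1 if misclassified
    return [wi + alpha * err * xi for wi, xi in zip(w, x)]
```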
Perceptron Performance
Linear Classification with Logistic Regression
An important function!
Logistic Regression
$h_w(\mathbf{x}) = \mathrm{Logistic}(\mathbf{w} \cdot \mathbf{x}) = \dfrac{1}{1 + e^{-\mathbf{w} \cdot \mathbf{x}}}$
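A sketch of the logistic hypothesis together with its gradient-descent update for squared-error loss, $w_i \leftarrow w_i + \alpha \big(y - h_w(\mathbf{x})\big)\, h_w(\mathbf{x})\big(1 - h_w(\mathbf{x})\big)\, x_i$ (the standard rule; names are mine):

```python
import math

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))     # 1 / (1 + e^{-z})

def h_logistic(w, x):
    return logistic(sum(wi * xi for wi, xi in zip(w, x)))   # Logistic(w . x)

def logistic_update(w, x, y, alpha):
    p = h_logistic(w, x)
    # The factor p*(1-p) is the derivative of the logistic function (chain rule).
    return [wi + alpha * (y - p) * p * (1 - p) * xi for wi, xi in zip(w, x)]
```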
Logistic Regression Performance
[Figure for the linearly separable case; image not reproduced.]
Summary
• Learning from Examples: brief review
• Loss functions
• Generalization
• Overfitting
• Cross-validation
• Regularization
• Univariate Linear Regression
• Batch gradient descent
• Stochastic gradient descent
• Multivariate Linear Regression
• Regularization
• Linear Classifiers
• Perceptron learning rule
• Logistic Regression
Next Class