Statistical Learning - Classification (Stat441)
Matthias Schonlau, Ph.D.
Overview
What is statistical learning?
Overfitting/ Train-test split
Example Dutch income vs age
Bias-variance tradeoff
Some concepts
Prediction vs. Interpretation
Regression vs. Classification
Supervised vs. unsupervised
Reading: Chapter 2, James et al.
Linear regression
Y = f(X) + ε, where f(X) is a linear function.
The parameters β are unknown, but the functional form (linearity) is known.
The functional form is known except for possible variable selection, quadratic terms, etc.
[Figure: income vs. age of the household member]
When y spans several orders of magnitude, it often makes sense to take log(y).
Caution: log(0) is undefined, i.e. add a small amount (here, 1 euro) before taking logs.
[Figure: log income (+1 euro) vs. age of the household member, for a random subset of about n=60 observations]
Linear regression with 95% confidence interval: because the relationship is not linear, the fit is poor.
[Figure: linear regression of log income on age, with fitted values and 95% CI]
A very flexible learner predicts every observation perfectly.
[Figure: highly flexible fit of log income vs. age; predictions pass through every observation]
By splitting the data into a training set and a test set, overfitting can be avoided.
[Figures: predictions of log income vs. age for Train=100% and Train=50%]
Overfitting
Overfitting means the model fits the random
noise of a sample rather than the
generalizable relationship.
Overfitting tends to occur when the model has
too many parameters relative to the number
of observations.
Learning algorithms are designed to be
flexible and tend to have a lot of parameters.
Overfitting
How does one defend against overfitting?
Separate the data into training and test data.
Fit the model on the training data.
Evaluate the model fit on the test data.
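The three-step defense above can be sketched in a few lines; this is a minimal illustration assuming scikit-learn is available, with synthetic data whose variable names (age, logincome) only mirror the Dutch income example:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Hypothetical data standing in for the Dutch income example
rng = np.random.default_rng(0)
age = rng.uniform(20, 80, size=200).reshape(-1, 1)
logincome = np.log1p(1000 + 50 * age.ravel() + rng.normal(0, 200, 200))

# 1) Separate the data into training and test data (50/50 as on the slides)
X_train, X_test, y_train, y_test = train_test_split(
    age, logincome, test_size=0.5, random_state=0)

# 2) Fit the model on the training data
model = LinearRegression().fit(X_train, y_train)

# 3) Evaluate the model fit on the test data
mse_test = mean_squared_error(y_test, model.predict(X_test))
print(f"test MSE: {mse_test:.4f}")
```

Because the model never sees the test observations during fitting, a low test MSE cannot come from memorizing noise.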
Evaluation of fit
For continuous outcomes, the fit is often
evaluated with the mean squared error:
MSE = (1/n) Σ_{i=1}^{n} (y_i − f̂(x_i))^2
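The formula above translates directly into code; a minimal NumPy sketch (the function and array names are illustrative):

```python
import numpy as np

def mse(y, f_hat):
    """Mean squared error: (1/n) * sum over i of (y_i - f_hat(x_i))^2."""
    y = np.asarray(y, dtype=float)
    f_hat = np.asarray(f_hat, dtype=float)
    return np.mean((y - f_hat) ** 2)

# One residual of 1 among three observations -> MSE = 1/3
print(mse([1.0, 2.0, 3.0], [1.0, 2.0, 4.0]))
```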
Evaluation of fit
All machine learning algorithms have at least
one tuning parameter
A tuning parameter governs how flexible the
fit is.
One can plot the MSE as a function of the
flexibility parameter.
Both for the training data and the test data
In boosting, a flexibility parameter is the number of iterations.
Train=100%: fit as many iterations until the best fit is achieved.
When there are duplicates in x with differing y's, a perfect fit (MSE=0) is not possible.
Train=50%: fit as many iterations as it takes to minimize the MSE (or an equivalent criterion).
Here, this is done automatically; the output contains information about the best number of iterations: bestiter = 152.
[Figures: boosting predictions of log income vs. age for Train=100% and Train=50%]
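Choosing the best number of boosting iterations on held-out data can be sketched as follows; this assumes scikit-learn's GradientBoostingRegressor (not necessarily the implementation used for the slides) and uses synthetic data:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic nonlinear data (settings are illustrative)
rng = np.random.default_rng(1)
X = rng.uniform(20, 80, size=(400, 1))
y = np.sin(X.ravel() / 10) + rng.normal(0, 0.3, 400)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=1)

gbm = GradientBoostingRegressor(n_estimators=500, learning_rate=0.05,
                                random_state=1).fit(X_train, y_train)

# staged_predict yields test predictions after each boosting iteration;
# the best number of iterations minimizes the test MSE
test_mse = [mean_squared_error(y_test, pred)
            for pred in gbm.staged_predict(X_test)]
best_iter = int(np.argmin(test_mse)) + 1
print("best number of iterations:", best_iter)
```

Fitting all 500 iterations but predicting with the first `best_iter` of them is a form of early stopping: it trades a little training fit for lower test error.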
Evaluation of fit
For the Dutch income example, the predictions on the training data were perfect, so MSE_train = 0.
Bias-Variance tradeoff
There is a reason why the U-shape in the test
MSE occurs.
The expected MSE can be decomposed as:
E[(y_0 − f̂(x_0))^2] = Var(f̂(x_0)) + [Bias(f̂(x_0))]^2 + Var(ε)
Bias-Variance tradeoff
Variance refers to the variation we would
get by using a different training data set
Bias refers to the error between the learning
model and the true function.
The equation shows that to minimize the expected test error, we need to keep both bias and variance low simultaneously.
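The decomposition can be checked by simulation at a single point x0: draw many training sets, fit a model to each, and estimate the variance and squared bias of f̂(x0). This sketch uses a deliberately rigid (biased) linear fit to a sine function; all names and settings are illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)
f = lambda x: np.sin(x)      # "true" function (known only in a simulation)
x0, sigma = 1.0, 0.5         # evaluation point and noise standard deviation

preds = []
for _ in range(2000):
    # A fresh training set of n=30 each round
    x = rng.uniform(0, 3, 30)
    y = f(x) + rng.normal(0, sigma, 30)
    coef = np.polyfit(x, y, deg=1)     # rigid linear fit -> nonzero bias
    preds.append(np.polyval(coef, x0))
preds = np.array(preds)

variance = preds.var()                   # Var(f_hat(x0)) across training sets
bias_sq = (preds.mean() - f(x0)) ** 2    # [Bias(f_hat(x0))]^2
expected_mse = variance + bias_sq + sigma**2   # plus irreducible Var(eps)
print(variance, bias_sq, expected_mse)
```

Replacing `deg=1` with a higher degree lowers the bias term but raises the variance term, which is the tradeoff in miniature.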
Bias-variance tradeoff
All curves refer to the test data: squared bias (blue curve), variance (orange curve), Var(ε) (dashed line), and test error (red curve).
[Figure: test error decomposition as a function of flexibility]
Bias-variance tradeoff
When the training data are large, a flexible
learning algorithm may be able to eliminate
much of the bias.
In real life the true function is unknown, and it is not possible to compute this bias/variance tradeoff explicitly.
But it is useful to keep this tradeoff in mind.
Some concepts
We will now talk about some other concepts:
Prediction vs. Interpretation
Regression vs. Classification
Supervised vs. unsupervised
Statistical learning:
Excels at prediction
Making learning algorithms interpretable is challenging.
Course outline
Models for Supervised learning
(emphasizing classification)
Logistic regression /
Multinomial regression
Discriminant analysis
k nearest neighbours
Naïve Bayes
Trees
Random forests
Boosting
Support vector machines
Neural networks
Multi-label learning
Case studies