
Statistical learning

A brief introduction

Gertraud Malsiner-Walli

Readings: ISLR Chapters 1 & 2

Outline

Framework

Modeling approaches: supervised and unsupervised learning

Statistical learning - general approach

Assessing model accuracy

Overfitting

Framework

AI, ML, DL, . . .

Data analytics process cycle

Data analytics process cycle

(1) Business application is the process driver.
(2) Classic data sources are internal and market data.
(3) Newer data sources: real-time data, social media data, IoT (Internet of Things) data.
(4) Statistical analysis involves data description and data exploration, as well as modeling and evaluation.
(5) Proper tools: R, Python, Excel/VBA, cloud services, etc.
(6) Storytelling!
(7) Revise periodically.

Modeling approaches: supervised and
unsupervised learning

Modeling approaches

▶ The statistical analysis step (4) of the data analytics process cycle relies on statistical and machine learning techniques to extract information from data.
▶ Statistical learning problems can be assigned to one of two broad types:
  ▶ Supervised learning
  ▶ Unsupervised learning

Supervised learning

▶ Supervised learning is used to model an observed output (target, dependent variable, etc.) using a set of inputs (features, independent variables, attributes, etc.), where we postulate that there is a relationship between the input and the output.
  ▶ Regression: (typically) used to predict numerical outcomes.
  ▶ Classification: used to predict categorical outcomes, most often binary yes/no variables.
▶ Notation:
  ▶ Y . . . output variable
  ▶ X1, X2, . . . . . . input variables

Examples of regression and classification

▶ Regression:
  ▶ Y . . . sales
  ▶ X1, X2, X3 . . . advertising budgets for TV, radio, and newspaper
▶ Classification:
  ▶ Y . . . clicking on an ad (no, yes)
  ▶ X1, X2 . . . age, time stamp
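To make the two problem types concrete, here is a minimal R sketch. The data frames advertising and ads and their column names are assumptions for illustration (they mirror the examples above):

    # Regression: model a numerical outcome with a linear model.
    # Assumes a hypothetical data frame `advertising` with columns
    # Sales, TV, Radio, Newspaper.
    reg_fit <- lm(Sales ~ TV + Radio + Newspaper, data = advertising)

    # Classification: model a binary outcome with logistic regression.
    # Assumes a hypothetical data frame `ads` with columns
    # Clicked.on.Ad (0/1) and Age.
    clf_fit <- glm(Clicked.on.Ad ~ Age, data = ads, family = binomial)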

Supervised learning: Visualisation

[Figure: two scatter plots. Left: Sales vs. TV (regression); right: Clicked.on.Ad vs. Age (classification).]


Unsupervised learning

▶ Unsupervised learning:
  ▶ There is no target variable Y and no feedback based on the prediction results.
  ▶ It is used to describe real-world events and to discover the latent relationships responsible for them (relationships between variables, relationships between observations).
  ▶ Examples: clustering, dimensionality reduction (see the sketch below).
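A minimal clustering sketch using k-means on R's built-in iris data; choosing 3 clusters is an assumption (it matches the three known species), and the species labels themselves are never used:

    # Cluster the flowers on two measurements, without using the labels.
    X <- iris[, c("Petal.Length", "Sepal.Length")]
    set.seed(1)
    km <- kmeans(X, centers = 3, nstart = 25)  # assume 3 subtypes
    table(km$cluster)                          # sizes of the found groups
    plot(X, col = km$cluster, pch = 19)        # visualize the clusters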

Unsupervised learning: Clustering of Iris data
Measurements of Iris blossoms: How many subtypes of Iris flowers
are there?
[Figure: scatter plots of the iris data. Top row (groups are easier to identify): Sepal.Length vs. Petal.Length. Bottom row (groups are more difficult to identify): Sepal.Length vs. Sepal.Width.]


Supervised & Unsupervised learning

▶ They often go hand in hand.
▶ Example:
  ▶ Unsupervised learning can help an organization to understand its customers by finding customer segments;
  ▶ supervised learning is then used to generate forecasts of the outcomes of interest (e.g., to predict the preferences of the customers in the identified segments).
▶ Quiz: supervised or unsupervised learning?

Statistical learning - general approach

More on supervised learning
▶ In supervised learning we are interested in a relationship of the form:

  Y = f(X) + ϵ, where

  ▶ f is a fixed but unknown function which depends on X1, . . . , Xp, and
  ▶ ϵ is a random error term ("noise") which is independent of X and has mean zero.
▶ We can think of f as the systematic information that the X's provide about Y.
▶ Note: there is always some error/noise ϵ present.
▶ Supervised learning refers to a set of approaches for estimating f.
▶ Q: What does f look like in a linear regression model?

Example: Y = f(X) + ϵ

[Figure: two panels plotting Sales against the TV advertising budget.]


Why estimate f?

▶ Prediction purposes:
  ▶ A good estimate of f can serve as a good basis for predictions.
▶ Inference purposes:
  ▶ We can use f to understand which inputs X1, . . . , Xp are important in explaining Y.
  ▶ We can use f to understand how the different inputs X1, . . . , Xp affect Y.
▶ Different methods serve these purposes differently
  ⇒ the degree of flexibility is crucial.

Trade-off between model flexibility / interpretability
▶ Typically, more rigid models may be a good choice for inference, since they allow us to understand the relationship between inputs and output quite easily.
  ▶ E.g., a linear regression model is very strict about the form of the relationship (linear):
    ⇒ it may not yield predictions as accurate as some other approaches (such as deep learning),
    ⇒ but it allows for relatively simple and interpretable inference.
▶ In contrast, flexible approaches may deliver good predictions.
  ▶ E.g., deep learning methods are very flexible:
    ⇒ they can make accurate predictions,
    ⇒ but they can lead to such complicated estimates of f that it is difficult to understand how any individual predictor is associated with the response.

How do we estimate f?
▶ We assume that we have observed a set of n different data points (xi, yi).
▶ These observations are used to estimate f:

  Y ≈ f̂(X)

▶ Two main approaches:
  ▶ Parametric: we make an assumption about the functional form of f (e.g., linear) and only estimate the parameters of that functional form (e.g., β0 and β1).
  ▶ Non-parametric: we do not make explicit assumptions about the functional form of f:
    ⇒ we try to find an f̂ that gets as close to the data points as possible without being too rough or wiggly (example: k-nearest neighbors, see the sketch below).
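A minimal sketch of both approaches on simulated data where the true f(x) = sin(x) is known. The non-parametric fit uses knn.reg() from the FNN package, and k = 10 is an arbitrary choice:

    library(FNN)  # for knn.reg(); install.packages("FNN") if needed

    set.seed(1)
    x <- runif(100, 0, 10)
    y <- sin(x) + rnorm(100, sd = 0.3)      # Y = f(X) + noise

    # Parametric: assume a linear form and estimate beta0, beta1.
    fit_lm <- lm(y ~ x)

    # Non-parametric: k-nearest neighbors, no assumed functional form.
    grid <- seq(0, 10, by = 0.1)
    fit_knn <- knn.reg(train = matrix(x), test = matrix(grid), y = y, k = 10)

    plot(x, y)
    abline(fit_lm, col = "blue")            # rigid linear fit
    lines(grid, fit_knn$pred, col = "red")  # flexible kNN fit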

Assessing model accuracy

Assessing model fit and prediction accuracy
▶ In order to evaluate the performance of a statistical learning method on a given data set, we need some way to measure how well its predictions actually match the observed data.
▶ In regression problems, the most commonly used measure is the mean squared error (MSE):

  MSE = (1/n) Σᵢ₌₁ⁿ (yᵢ − f̂(xᵢ))²

▶ In classification problems we look at the confusion matrix of true vs. predicted responses (see Unit 2).
▶ Idea: choose the learning method f̂ which minimizes the MSE.
▶ However, there is a risk: the method may learn the seen data too well, but then fail to predict the unseen data
  ⇒ the problem of overfitting!
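In R, the training MSE of a fitted model is a one-liner. A minimal sketch on simulated data:

    set.seed(1)
    x <- runif(100, 0, 10)
    y <- sin(x) + rnorm(100, sd = 0.3)

    fit <- lm(y ~ x)
    mean((y - fitted(fit))^2)   # training MSE: average squared residual
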
Overfitting

Problem of overfitting I

▶ Model:

  Y = f(X) + ϵ

▶ Based on data (Y, X), we want to learn f in order to make good predictions Ŷ:

  Ŷ = f̂(X)

▶ f̂ is a good estimate of f if (Y − Ŷ) is small.
▶ BUT: it is difficult to disentangle f(X) from the error ϵ, as we only observe Y!
▶ If we choose a very flexible f̂, it starts learning the errors ϵ which are specific to this data set, rather than an overall pattern.
▶ Overfitting gives bad predictions for new data!

’Unseen’ data?

▶ Why do we care about the predictions for new, unseen data?
▶ Suppose that we are interested in developing an algorithm to predict a stock's price based on previous stock returns:
  ▶ We can train the method using stock returns from the past 6 months.
  ▶ But we don't really care how well our method predicts, e.g., last week's stock price.
  ▶ We instead care about how well it will predict tomorrow's price or next month's price.
▶ Problem: how can we get 'new' data to evaluate the prediction performance of a method?

Training and test MSE

Remedy:
▶ We split the data set into two parts: the training data and the test data.
▶ We distinguish between the MSE in the training data and the MSE in the test data.
▶ We estimate f on the training data (training MSE) and use the estimated f̂ to make predictions for the test set (test MSE), as sketched below.
▶ We want to choose the method that gives the lowest test MSE (as opposed to the lowest training MSE).
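A minimal train/test split sketch in R on simulated data; the 70/30 split and the degree-5 polynomial are arbitrary choices for illustration:

    set.seed(1)
    x <- runif(200, 0, 10)
    y <- sin(x) + rnorm(200, sd = 0.3)
    dat <- data.frame(x, y)

    train_idx <- sample(nrow(dat), size = 0.7 * nrow(dat))  # 70% training
    train <- dat[train_idx, ]
    test  <- dat[-train_idx, ]

    fit <- lm(y ~ poly(x, 5), data = train)

    train_mse <- mean((train$y - predict(fit, train))^2)
    test_mse  <- mean((test$y - predict(fit, test))^2)
    c(train = train_mse, test = test_mse)  # the test MSE is what counts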

Training and test MSE
▶ We did an experiment: we split a data set into two parts, the training data and the test data.
▶ We estimated f on the training data (for 3 different degrees of flexibility) and used the estimated f̂ to make predictions for the test set.
▶ Plots: left: data points and fitted curves on the training data; right: MSE of the fitted curves (test MSE in red, training MSE in gray).
The bias-variance trade-off
▶ The U-shaped curve observed in the test MSE turns out to be the result of two competing properties of a learning method:

  Expected test MSE = Var(f̂) + [Bias(f̂)]² + Var(ϵ)

▶ Bias(f̂): the error that is introduced by approximating a real-life problem by a simpler f.
▶ Var(f̂): the amount by which f̂ would change if we estimated it on a different data set.
▶ Simpler models may contain bias, but have lower variance.
▶ Flexible models usually have low bias but high variance: changing the data slightly might cause the results to change drastically (see the simulation sketch below).
▶ No free lunch: no one method dominates all others over all possible data sets.
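To see the trade-off numerically, a small simulation sketch: a rigid and a very flexible model (a degree-15 polynomial, as a stand-in) are refit on many fresh data sets, and we compare their predictions at a single point x0 = 5. All choices here are illustrative:

    set.seed(1)
    x0 <- 5
    preds <- replicate(500, {
      x <- runif(100, 0, 10)
      y <- sin(x) + rnorm(100, sd = 0.3)        # true f(x) = sin(x)
      nd <- data.frame(x = x0)
      c(rigid    = unname(predict(lm(y ~ x), nd)),
        flexible = unname(predict(lm(y ~ poly(x, 15)), nd)))
    })
    apply(preds, 1, var)       # variance: far higher for the flexible fit
    rowMeans(preds) - sin(x0)  # bias at x0: larger for the rigid fit
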
How to compute the test MSE?
▶ How can we measure the prediction accuracy of a method on unseen data?
▶ Solution: split the data into two or more parts.
▶ A variety of approaches:
  ▶ Train-test split: split the data randomly into a training and a test sample and compute the MSE on the test sample (can lead to high bias if we have limited data, as information not represented in the training data is missed).
  ▶ K-fold cross-validation: split the entire data randomly into K folds; fit the model using K − 1 folds and evaluate it on the remaining fold; repeat this process until every fold has served as the test set (see the sketch below).
  ▶ Validation set approach: split the data into 3 parts: training, validation, and test set. Use the validation set to tune f̂. Use the test set only for testing.
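A minimal K-fold cross-validation sketch in R (K = 5), written out by hand on simulated data; in practice, packages such as caret or rsample automate this:

    set.seed(1)
    x <- runif(200, 0, 10)
    y <- sin(x) + rnorm(200, sd = 0.3)
    dat <- data.frame(x, y)

    K <- 5
    fold <- sample(rep(1:K, length.out = nrow(dat)))   # random fold labels

    cv_mse <- sapply(1:K, function(k) {
      train <- dat[fold != k, ]                        # fit on K - 1 folds
      test  <- dat[fold == k, ]                        # hold out fold k
      fit <- lm(y ~ poly(x, 5), data = train)
      mean((test$y - predict(fit, test))^2)
    })
    mean(cv_mse)   # cross-validated estimate of the test MSE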
