
ZG 512
Supervised Learning: Errors and other concepts
Dr Arindam Roy
BITS Pilani, Pilani Campus
The Supervised Learning Problem

Starting point:

• Outcome measurement Y (also called dependent variable, response, target).
• Vector of p predictor measurements X (also called inputs, regressors, covariates, features, independent variables).
• In the regression problem, Y is quantitative (e.g. price, blood pressure).
• In the classification problem, Y takes values in a finite, unordered set (survived/died, digit 0-9, cancer class of tissue sample).
• We have training data (x1, y1), . . . , (xN, yN). These are observations (examples, instances) of these measurements.

BITS Pilani, Pilani Campus


Objectives

On the basis of the training data we would like to:

• Accurately predict unseen test cases.

• Understand which inputs affect the outcome, and how.

• Assess the quality of our predictions and inferences.



Philosophy

• It is important to understand the ideas behind the various techniques, in order to know how and when to use them.
• One has to understand the simpler methods first, in order to grasp the more sophisticated ones.
• It is important to accurately assess the performance of a method, to know how well or how badly it is working [simpler methods often perform as well as fancier ones!]
• This is an exciting research area, having important applications in science, industry and finance.
• Statistical learning/Machine Learning is a fundamental ingredient in the training of a modern data scientist.



Unsupervised Learning

• No outcome variable, just a set of predictors (features) measured on a set of samples.
• The objective is more fuzzy: find groups of samples that behave similarly, find features that behave similarly, find linear combinations of features with the most variation.
• It is difficult to know how well you are doing.
• Different from supervised learning, but can be useful as a pre-processing step for supervised learning.



Statistical Learning Vs. Machine Learning

• Machine learning arose as a subfield of Artificial Intelligence.
• Statistical learning arose as a subfield of Statistics.
• There is much overlap; both fields focus on supervised and unsupervised problems:
– Machine learning has a greater emphasis on large scale applications and prediction accuracy.
– Statistical learning emphasizes models and their interpretability, and precision and uncertainty.
• But the distinction has become more and more blurred, and there is a great deal of "cross-fertilization".
• Machine learning has the upper hand in Marketing!



What is Machine Learning?

Shown are Sales vs TV, Radio and Newspaper, with a blue linear-regression line fit separately to each.
Can we predict Sales using these three?
Perhaps we can do better using a model: Sales ≈ f(TV, Radio, Newspaper)



Prediction

• In the simplest terms, the prediction task is: how can we design ƒˆ(X) so that its output comes very close to Y for the given values of X?
Y = f(X) + ε
• Our estimator function is ƒˆ and our predicted value is Ŷ = ƒˆ(X).
• f(X) vs ƒˆ(X), and Y vs Ŷ: the true function vs. the estimated function, and the actual vs. the predicted value.
• Now that we have the actual value Y and the predicted value Ŷ, the difference between Y and Ŷ is the prediction error.



Prediction

Prediction error is influenced by two factors:

• The difference between f(X) and ƒˆ(X), which we term the Reducible Error.
• ε, which we call the Irreducible Error.

• Given infinite time, we know that we can figure out a good enough estimator and bring the Reducible Error close to 0.
• So, reducible errors are the errors we have control over; we can reduce them by choosing better models.
• It can be shown mathematically that the average squared prediction error decomposes into these two parts, Reducible plus Irreducible Error:
E[(Y − ƒˆ(X))²] = [f(X) − ƒˆ(X)]² + Var(ε)
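This decomposition is easy to see in a small NumPy sketch (the sine truth and noise level are arbitrary choices for illustration): even predicting with the true f leaves the irreducible error Var(ε), while a cruder estimator adds reducible error on top.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data from Y = f(X) + eps, with Var(eps) = 0.5**2 = 0.25
f = np.sin                                   # the (normally unknown) true f
x = rng.uniform(0, 6, 100_000)
y = f(x) + rng.normal(0, 0.5, x.size)

# A deliberately crude estimator: predict the overall mean everywhere
y_hat_crude = np.full_like(x, y.mean())

mse_crude = np.mean((y - y_hat_crude) ** 2)  # reducible + irreducible
mse_true = np.mean((y - f(x)) ** 2)          # irreducible only, about 0.25

print(f"MSE of crude estimator: {mse_crude:.3f}")
print(f"MSE using the true f:   {mse_true:.3f}")
```

Even with the true f in hand, the MSE cannot drop below Var(ε); that gap is exactly the reducible error of the crude model.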



Can we reduce the Irreducible error?

• In machine learning, the irreducible error, often called noise, is the component of
the error that cannot be reduced no matter how good our model is. This error
comes from the inherent randomness and variability in the data that we cannot
model or predict.

• Irreducible error can arise from several sources, including:


– Intrinsic variability: Natural randomness in the data or phenomena being modeled.
– Measurement noise: Errors in data collection or measurement inaccuracies.
– Incomplete data: Important variables or features that are not included in the dataset.
– Latent variables: Hidden variables that affect the outcomes but are not directly observed or measured.



Can we reduce the Irreducible error?

• Since irreducible error is due to these inherent aspects of the data and the system
being modeled, it cannot be reduced through improvements in the model or
learning algorithm. However, understanding the domain and managing irreducible
error is crucial:
– Improve data quality: Reducing measurement noise and improving data collection processes can help minimize the
contribution of measurement errors.
– Data augmentation: Adding more relevant data can help ensure that the variability captured is more representative of
the true distribution.
– Feature engineering: Identifying and including relevant features might reduce the perceived irreducible error by
accounting for more variability in the data.

• Ultimately, the focus in machine learning is to minimize reducible errors: those errors that can be mitigated through better models, algorithms, and data preprocessing techniques.



How to estimate f

• Typically we have few if any data points with X = 4 exactly.
• So we cannot compute E(Y | X = x)!
• Relax the definition and let
ƒˆ(x) = Ave(Y | X ∈ N(x)),
where N(x) is some neighborhood of x.
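A minimal sketch of this local-averaging idea in NumPy (the sine truth, noise level, and neighborhood radius are arbitrary choices, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(1)

# Training data (x1, y1), ..., (xN, yN) from Y = f(X) + eps
x_tr = rng.uniform(0, 6, 500)
y_tr = np.sin(x_tr) + rng.normal(0, 0.3, x_tr.size)

def f_hat(x0, radius=0.25):
    """Ave(Y | X in N(x0)): mean of y over a neighborhood of x0."""
    in_nbhd = np.abs(x_tr - x0) <= radius
    return y_tr[in_nbhd].mean()

# In one dimension the local average tracks the true f reasonably well
print(f_hat(4.0), np.sin(4.0))
```

With 500 points on a one-dimensional interval, every neighborhood contains plenty of observations; the next slide shows why this breaks down as the dimension p grows.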
[Figure: simulated scatter plot of y versus x (x from 1 to 6, y from −2 to 3) illustrating estimation by local averaging]
BITS Pilani, Deemed to be University under Section 3 of UGC Act, 1956
The curse of dimensionality

• Nearest neighbor averaging can be pretty good for small p, i.e. p ≤ 4 and large-ish N.
• Nearest neighbor methods can be lousy when p is large. Reason: the curse of dimensionality. Nearest neighbors tend to be far away in high dimensions.
• We need to get a reasonable fraction of the N values of yi to average, to bring the variance down, e.g. 10%.
• A 10% neighborhood in high dimensions need no longer be local, so we lose the spirit of estimating by local averaging.
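The geometry behind this can be checked directly. In a unit hypercube with uniformly distributed data, an axis-aligned neighborhood that captures 10% of the points must span 0.1^(1/p) of each axis, which approaches the whole range as p grows (a sketch, not from the slides):

```python
# Side length of an axis-aligned neighborhood covering 10% of a unit
# hypercube's volume: side**p = 0.10, so side = 0.10 ** (1 / p).
for p in (1, 2, 3, 5, 10, 100):
    side = 0.10 ** (1 / p)
    print(f"p = {p:>3}: a 10% neighborhood spans {side:.2f} of each axis")
```

Already at p = 10 the "neighborhood" covers about 80% of each axis, so it is no longer local in any meaningful sense.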

The curse of dimensionality

[Figure: left panel, 10% neighborhoods around points in two dimensions (x1, x2); right panel, radius of a 10% neighborhood versus fraction of volume for p = 1, 2, 3, 5, 10]


Parametric and structured models
The linear model is an important example of a parametric model:

ƒL(X) = β0 + β1X1 + β2X2 + . . . + βpXp

• A linear model is specified in terms of p + 1 parameters β0, β1, . . . , βp.
• We estimate the parameters by fitting the model to training data.
• Although it is almost never correct, a linear model often serves as a good and interpretable approximation to the unknown true function f(X).



Parametric and structured models
[Figure: scatter of y versus x (x from 1 to 6) with a linear and a quadratic fit overlaid]

A linear model ƒˆL(X) = βˆ0 + βˆ1X gives a reasonable fit here. A quadratic model ƒˆQ(X) = βˆ0 + βˆ1X + βˆ2X² fits slightly better.
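The linear-versus-quadratic comparison can be run numerically: on data with mild curvature, the quadratic fit should achieve a somewhat lower training MSE than the straight line (a sketch; the data-generating function is made up):

```python
import numpy as np

rng = np.random.default_rng(3)

# Data with mild curvature
x = rng.uniform(1, 6, 120)
y = 0.5 * (x - 3.5) ** 2 + rng.normal(0, 0.4, x.size)

mses = {}
for deg in (1, 2):                      # degree 1: f_L, degree 2: f_Q
    coefs = np.polyfit(x, y, deg)
    mses[deg] = np.mean((y - np.polyval(coefs, x)) ** 2)

print(f"linear    training MSE: {mses[1]:.3f}")
print(f"quadratic training MSE: {mses[2]:.3f}")
```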
Simulated example. Red points are simulated values of income from the model

income = f(education, seniority) + ε

f is the blue surface.
Linear regression model fit to the simulated data:

ƒˆL(education, seniority) = βˆ0 + βˆ1 × education + βˆ2 × seniority
More flexible regression model ƒˆS(education, seniority) fit to the simulated data. Here we use a technique called a thin-plate spline to fit a flexible surface. We can control the roughness of the fit.
Even more flexible spline regression model ƒˆS(education, seniority) fit to the simulated data. Here the fitted model makes no errors on the training data! This is known as overfitting.
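Overfitting is easy to reproduce in one dimension: give a polynomial as many parameters as there are training points and it will pass through every point, driving the training error to essentially zero (a sketch with arbitrary simulated data, standing in for the spline surface on the slide):

```python
import numpy as np

rng = np.random.default_rng(4)

# Ten noisy training points
x = np.linspace(-1, 1, 10)
y = np.sin(np.pi * x) + rng.normal(0, 0.2, x.size)

# A degree-9 polynomial has 10 parameters: it interpolates all 10 points
coefs = np.polyfit(x, y, 9)
train_mse = np.mean((y - np.polyval(coefs, x)) ** 2)

print(f"training MSE: {train_mse:.2e}")      # essentially zero
```

The fit "makes no errors on the training data", yet it has memorized the noise; its predictions between and beyond the training points are poor.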
Some Trade-offs

• Prediction accuracy versus interpretability: linear models are easy to interpret; thin-plate splines are not.
• Good fit versus over-fit or under-fit: how do we know when the fit is just right?
• Parsimony versus black-box: we often prefer a simpler model involving fewer variables over a black-box predictor involving them all.

[Figure: interpretability versus flexibility of methods. Ordered from high interpretability/low flexibility to low interpretability/high flexibility: Subset Selection and Lasso; Least Squares; Generalized Additive Models and Trees; Bagging and Boosting; Support Vector Machines]
Assessing Model Accuracy

Suppose we fit a model ƒˆ(x) to some training data Tr = {(xi, yi)}, and we wish to see how well it performs.

• We could compute the average squared prediction error over Tr:
MSE_Tr = Ave over i in Tr of [yi − ƒˆ(xi)]²  (mean squared error for the training set)
This may be biased toward more overfit models.
• Instead we should, if possible, compute it using fresh test data Te:
MSE_Te = Ave over i in Te of [yi − ƒˆ(xi)]²  (mean squared error for the test set)
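A sketch of this comparison (simulated data, with arbitrary polynomial degrees standing in for "flexibility"): the most flexible fit wins on the training set but loses on fresh test data.

```python
import numpy as np

rng = np.random.default_rng(5)

def simulate(n):
    x = rng.uniform(-1, 1, n)
    return x, np.sin(np.pi * x) + rng.normal(0, 0.3, n)

x_tr, y_tr = simulate(30)       # training set Tr
x_te, y_te = simulate(1000)     # fresh test set Te

results = {}
for deg in (1, 3, 15):          # increasing flexibility
    coefs = np.polyfit(x_tr, y_tr, deg)
    mse_tr = np.mean((y_tr - np.polyval(coefs, x_tr)) ** 2)
    mse_te = np.mean((y_te - np.polyval(coefs, x_te)) ** 2)
    results[deg] = (mse_tr, mse_te)
    print(f"degree {deg:>2}: MSE_Tr = {mse_tr:.3f}, MSE_Te = {mse_te:.3f}")
```

MSE_Tr keeps falling as the degree rises, exactly the bias toward overfit models mentioned above, while MSE_Te picks the moderate fit.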



[Figure: left panel, black truth curve with simulated data and three fits of different flexibility (orange, blue and green); right panel, red curve is MSE_Te and grey curve is MSE_Tr as functions of flexibility, with squares marking the three fits]
[Figure: same setup as above. Here the truth is smoother, so the smoother fit and linear model do really well.]
[Figure: same setup as above. Here the truth is wiggly and the noise is low, so the more flexible fits do the best.]
Understanding Bias – Variance Tradeoff

What is bias?
• Bias is the difference between the average prediction of our model and the correct value we are trying to predict. A model with high bias pays very little attention to the training data and oversimplifies; it leads to high error on both training and test data.

What is variance?
• Variance is the variability of the model's prediction for a given data point; it tells us how spread out our predictions are. A model with high variance pays a lot of attention to the training data and does not generalize to data it hasn't seen before. As a result, such models perform very well on training data but have high error rates on test data.



Understanding Bias – Variance Tradeoff

• Suppose we are trying to predict Y, given X. We assume there is a relationship between the two such that Y = f(X) + ε.
• We are estimating f(X) with ƒˆ(x), using linear regression or any other modelling technique.
• So the expected squared error at a point x is
Err(x) = E[(Y − ƒˆ(x))²]
• Err(x) can be further decomposed as
Err(x) = (E[ƒˆ(x)] − f(x))² + E[(ƒˆ(x) − E[ƒˆ(x)])²] + σε²
= Bias² + Variance + Irreducible Error
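This decomposition can be verified by simulation: refit ƒˆ on many independent training sets, record its prediction at a fixed x0, and compare Bias² + Variance + σε² against the directly measured Err(x0). The sketch below uses arbitrary choices (a sine truth and a deliberately biased straight-line model):

```python
import numpy as np

rng = np.random.default_rng(6)

f = np.sin          # true f
sigma = 0.3         # sd of eps
x0 = 2.0            # point at which we evaluate Err(x0)

# Fit a straight line to many independent training sets of size 50
# and record each fit's prediction at x0.
preds = np.empty(2000)
for i in range(preds.size):
    x = rng.uniform(0, 6, 50)
    y = f(x) + rng.normal(0, sigma, x.size)
    preds[i] = np.polyval(np.polyfit(x, y, 1), x0)

bias_sq = (preds.mean() - f(x0)) ** 2
variance = preds.var()

# Direct Monte Carlo estimate of Err(x0) = E[(Y - f_hat(x0))^2]
err = np.mean((f(x0) + rng.normal(0, sigma, preds.size) - preds) ** 2)

print(f"Bias^2 + Var + sigma^2 = {bias_sq + variance + sigma**2:.3f}")
print(f"Direct Err(x0)         = {err:.3f}")
```

For this simple model the bias term dominates: a straight line cannot track a sine, however many training sets we average over.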



Understanding Bias – Variance Tradeoff

• In supervised learning, underfitting happens when a model is unable to capture the underlying pattern of the data. These models usually have high bias and low variance. It happens when we have too little data to build an accurate model, or when we try to fit a linear model to nonlinear data. Such models are too simple to capture complex patterns in the data, e.g. linear and logistic regression on strongly nonlinear problems.
• In supervised learning, overfitting happens when our model captures the noise along with the underlying pattern in the data. It happens when we train our model extensively on a noisy dataset. These models have low bias and high variance. They are typically very complex, like deep decision trees, which are prone to overfitting.



Understanding Bias – Variance Tradeoff

Why tradeoff?

• If our model is too simple and has very few parameters, it may have high bias and low variance. On the other hand, if our model has a large number of parameters, it is going to have high variance and low bias. So we need to find the right balance, without overfitting or underfitting the data.
• This tradeoff in complexity is why there is a tradeoff between bias and variance: an algorithm can't be more complex and less complex at the same time.
• Typically, as the flexibility of ƒˆ increases, its variance increases and its bias decreases. So choosing the flexibility based on average test error amounts to a bias-variance trade-off.



Understanding Bias – Variance Tradeoff

• To build a good model, we need to find a balance between bias and variance that minimizes the total error. An optimally balanced model would neither overfit nor underfit.



Bias-variance trade-off for the three
examples
[Figure: MSE, squared bias and variance as functions of flexibility for the three examples above]

