Week 2: Statistical Learning

Statistical learning involves using a set of predictor variables (X) to estimate an unknown function (f) that predicts a response or target variable (Y). The goal is to estimate f in order to accurately predict Y for new observations, and to understand which predictor variables are most important for explaining Y. A general statistical learning model takes the form Y = f(X) + ε, where ε is random error. Estimating f allows making predictions as Ŷ = f̂(X), treating f̂ as a "black box" whose predictions aim to be accurate without requiring knowledge of f's exact form.


Slides originally by Dr. Richard Burns, modified by Dr. Stephanie Schwartz

STATISTICAL LEARNING
CSCI 406: Data Mining
What is Statistical Learning?

[Figure: three scatter plots of Sales vs. TV, Radio, and Newspaper advertising budgets, each with a blue linear-regression line fit separately.]

Shown are Sales vs. TV, Radio, and Newspaper, with a blue linear-regression line fit separately to each. Can we predict Sales using these three? Perhaps we can do better using a model Sales ≈ f(TV, Radio, Newspaper).
Statistical Learning – General Form
¨ In general, assume we have:
¤ Observations of a quantitative (numerical) response Y
¤ Observations of p different predictors {X1, X2, …, Xp}
¤ A relationship between Y and X
¨ We can write this in the very general form:
Y = f (X) + ε
Statistical Learning – General Form
Y = f (X) + ε
¨ Y is the target or response (in the previous example: Sales)
¨ f is an unknown function of X = {X1, X2, …, Xp}
¨ f may involve more than one input variable (in the previous example: TV, Radio, Newspaper)
¨ ε is a random error term:
¤ independent of X
¤ with mean equal to zero
¨ f represents the systematic information that X provides about Y
¨ Statistical learning refers to a set of approaches for estimating f
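To make the general form concrete, here is a minimal Python sketch (my illustration, not from the slides) that generates data according to Y = f(X) + ε; the particular f and noise level are made up purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical "true" f, unknown in practice; chosen here only for illustration.
def f(x):
    return 2.0 + 3.0 * x

n = 100
X = rng.uniform(0, 10, size=n)        # predictor
eps = rng.normal(0.0, 1.0, size=n)    # random error: independent of X, mean zero
Y = f(X) + eps                        # the general model Y = f(X) + epsilon
```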
Why estimate f?
¨ Two usual objectives:
1. Prediction:
¤ With a good f we can make predictions of Y at new points X = x
2. Inference / Descriptive:
¤ We can understand which components of X = (X1, X2, …, Xp) are important in explaining Y, and which are irrelevant. E.g., Seniority and Years of Education have a big impact on Income, but Marital Status typically does not.
Estimating f - Prediction
¨ In many situations, a set of X inputs is readily available, but Y is not easily obtained.
Y = f(X) + ε
¨ Since the error term averages to zero, we can predict Y using
Ŷ = f̂(X)
¤ f̂ represents our estimate for f
¤ Ŷ represents the resulting prediction for Y
Estimating f - Prediction
Ŷ = f̂(X)
¨ f̂ is often treated as a black box:
¤ We are not typically concerned with the exact form of f̂
n linear, quadratic, etc.
¤ We only care that our predictions are "near accurate"
[Figure: scatter plot of simulated data, y vs. x; at each value of x there are many observed y values.]

Is there an ideal f(X)? In particular, what is a good value for f(X) at any selected value of X, say X = 4? There can be many Y values at X = 4. A good value is

f(4) = E(Y | X = 4)

E(Y | X = 4) means the expected value (average) of Y given X = 4.
This ideal f(x) = E(Y | X = x) is called the regression function.
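As a rough illustration (a sketch with a made-up data-generating function, not from the slides), the regression function can be approximated from simulated data by averaging the observed Y values whose X falls near the point of interest:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.uniform(1, 7, size=2000)
Y = np.sin(X) + X / 2 + rng.normal(0, 0.3, size=2000)  # many Y values at each x

# Approximate the regression function f(x) = E(Y | X = x) by averaging
# the Y values of observations whose X falls within a small window around x.
def cond_mean(x, X, Y, h=0.25):
    near = np.abs(X - x) < h
    return Y[near].mean()

print(cond_mean(4.0, X, Y))  # estimate of E(Y | X = 4)
```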
Estimating f – Types of Error

¨ The accuracy of Ŷ as a prediction for Y depends on:
1. Reducible error
2. Irreducible error
¨ f̂ will not be a perfect estimate for f
¤ This error is reducible, because we can shrink it by using more appropriate data mining techniques
Estimating f – Irreducible Error
¨ The accuracy of Ŷ as a prediction for Y also depends on:
¤ Irreducible error
Y = f(X) + ε
¨ ε = Y − f(X)
¤ Even if we knew f(X), we would still make errors in prediction, since at each X = x there is a distribution of possible Y values
¤ Thus, the variability associated with ε also affects prediction accuracy
¨ We cannot reduce the error introduced by ε, no matter how well we estimate f
Estimating f – Irreducible Error
¨ Why is the irreducible error larger than zero?
¨ The quantity ε may contain unmeasured variables that are useful in predicting Y
¤ If we don't measure them, f can't use them for its prediction
¨ The quantity ε may also contain unmeasurable variation

Estimating f – Minimizing Reducible Error
¨ The focus in this course is on techniques for estimating f with the aim of minimizing the reducible error.
¨ The irreducible error will always provide an upper bound on the accuracy of our predictions.
¤ In practice, this upper bound due to the irreducible error is almost always unknown.
Estimating f - Inference
¨ Rather than predicting Y based on observations of X,
¨ the goal is to understand the way that Y is affected as X = {X1, X2, …, Xp} changes
¨ i.e., to understand the relationship between X and Y
¨ f̂ is not treated as a "black box" anymore; we need to know its exact form
Estimating f - Inference
¨ We may be interested in answering questions such as:
¤ "Which predictors are associated with the response?"
n It is often the case that only a small fraction of the available predictors is associated with Y
n The task is to identify the few important predictors
Estimating f - Inference
¨ We may be interested in answering questions such as:
¤ "What is the relationship between the response and each predictor?"
n Some predictors may have a positive relationship with Y: increasing the predictor is associated with increasing values of Y
n Others may have a negative relationship
Estimating f - Inference
¨ We may be interested in answering questions such as:
¤ "Can f̂ be summarized using a linear equation, or is the relationship more complicated?"
n Historically, most methods for estimating f have taken a linear form
n But often the true relationship is more complicated
n A linear model may not accurately represent the relationship between the input and output variables
How do we estimate f?
¨ Most statistical learning methods are classified as:
1. Parametric
2. Non-parametric
Parametric Methods
¨ Assume that the functional form, or shape, of f is linear in X:

f(X) = β0 + β1X1 + β2X2 + … + βpXp

¨ This is a linear model for p predictors X = {X1, X2, …, Xp}
¨ Model fitting involves estimating the parameters β0, β1, …, βp
¨ We only need to estimate p + 1 coefficients,
¤ rather than an entirely arbitrary p-dimensional function f(X)
¨ Parametric: reduces the problem of estimating f down to estimating a set of parameters
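A minimal sketch of the parametric approach (assuming NumPy; the true coefficients below are hypothetical): fitting the linear model reduces to estimating just p + 1 numbers by ordinary least squares:

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 200, 3
X = rng.normal(size=(n, p))
beta_true = np.array([1.0, -2.0, 0.5])           # hypothetical true coefficients
Y = 4.0 + X @ beta_true + rng.normal(0, 1, n)    # linear f plus noise

# Parametric fit: estimate only p + 1 numbers (intercept + p slopes).
A = np.column_stack([np.ones(n), X])             # design matrix with intercept
beta_hat, *_ = np.linalg.lstsq(A, Y, rcond=None)
print(beta_hat)                                  # approx [4.0, 1.0, -2.0, 0.5]
```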
Non-parametric Methods
¨ Do not make explicit assumptions about the functional form of f (such as that it is linear)

Parametric Methods:
¨ Assume a form for the model (perhaps linear)
¨ It is possible that the functional estimate is very different from the true f
¤ If so, the model won't fit the data well
¨ Only need to estimate a set of parameters

Non-parametric Methods:
¨ Have the potential to accurately fit a wider range of possible shapes for f
¨ Need many, many more observations
¨ Complex models can lead to overfitting
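For contrast, a non-parametric sketch (assuming scikit-learn's KNeighborsRegressor, with simulated data): no functional form is assumed, and predictions are local averages of the k nearest training observations:

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(3)
X = rng.uniform(0, 10, size=(500, 1))
Y = np.sin(X[:, 0]) + rng.normal(0, 0.2, size=500)   # nonlinear truth

# Non-parametric fit: no assumed shape for f; each prediction is the
# average response of the 15 nearest training points.
knn = KNeighborsRegressor(n_neighbors=15).fit(X, Y)
print(knn.predict([[2.0], [5.0]]))
```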
Trade-Off Between Model Flexibility and
Model Interpretability
¨ Some statistical models (e.g., linear models) are less flexible and more restrictive.
¨ Q: Why would we ever choose to use a more restrictive method instead of a very flexible approach?
¨ A: When inference is the goal, the restrictive models are much more interpretable.
¤ In a linear model, it is easy to understand the relationship between Y and X1, X2, …
¨ For prediction, we might only be interested in accuracy and not the interpretability of the model
Trade-Off Between Model Flexibility and
Model Interpretability
[Figure: methods plotted by model flexibility (x-axis, low to high) vs. model interpretability (y-axis, low to high): Linear Models (low flexibility, high interpretability), Decision Trees (intermediate), Support Vector Machines (high flexibility, low interpretability).]
Trade-Off Between Model Flexibility and
Model Interpretability
¨ Even for prediction, where we might only care about accuracy, more accurate predictions are sometimes made by the less flexible methods
¤ Reason: overfitting in more complex models
Classification vs. Regression
¨ Given a dataset: instances with a set X of predictors/attributes, and a single target attribute Y
¨ Classification:
¤ The class label Y is a discrete (usually categorical/nominal or binary) attribute
¨ Regression:
¤ The target Y is continuous
¤ Numeric prediction
Supervised Learning Approach to
Classification or Regression Problems
¨ Given a collection of records (training set)
¤ Each record contains predictor attributes as well as target
attribute
¨ Learn a model (function f) that predicts the class value
(category or numeric value) based on the predictor
attributes
¨ Goal: “previously unseen” instances should be assigned a
class as accurately as possible
¤ A test set is used to evaluate the model’s accuracy.
Training Set vs. Test Set
¨ Overall dataset can be divided into:
1. Training set – used to build model
2. Test set – evaluates model
[Figure: supervised learning workflow — a learning algorithm induces a model from the training set; the model is then applied (deduction) to the test set.]

Training Set:
Tid  Attrib1  Attrib2  Attrib3  Class
1    Yes      Large    125K     No
2    No       Medium   100K     No
3    No       Small    70K      No
4    Yes      Medium   120K     No
5    No       Large    95K      Yes
6    No       Medium   60K      No
7    Yes      Large    220K     No
8    No       Small    85K      Yes
9    No       Medium   75K      No
10   No       Small    90K      Yes

Test Set:
Tid  Attrib1  Attrib2  Attrib3  Class
11   No       Small    55K      ?
12   Yes      Medium   80K      ?
13   Yes      Large    110K     ?
14   No       Small    95K      ?
15   No       Large    67K      ?
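As one possible rendering of this workflow (a sketch; the slides don't prescribe any particular library), the two tables above can be encoded with pandas, and a decision tree induced and applied with scikit-learn:

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# Training set from the slide (Attrib3 given in thousands).
train = pd.DataFrame({
    "Attrib1": ["Yes","No","No","Yes","No","No","Yes","No","No","No"],
    "Attrib2": ["Large","Medium","Small","Medium","Large","Medium","Large","Small","Medium","Small"],
    "Attrib3": [125, 100, 70, 120, 95, 60, 220, 85, 75, 90],
    "Class":   ["No","No","No","No","Yes","No","No","Yes","No","Yes"],
})
test = pd.DataFrame({
    "Attrib1": ["No","Yes","Yes","No","No"],
    "Attrib2": ["Small","Medium","Large","Small","Large"],
    "Attrib3": [55, 80, 110, 95, 67],
})

# One-hot encode the categorical predictors so the tree can use them.
X_train = pd.get_dummies(train.drop(columns="Class"))
X_test = pd.get_dummies(test).reindex(columns=X_train.columns, fill_value=0)

model = DecisionTreeClassifier().fit(X_train, train["Class"])  # induction
print(model.predict(X_test))                                   # deduction
```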
Model Evaluation on Test Set
(Classification) – Error Rate
¨ Error Rate: the proportion of mistakes made by applying our model f̂ to the testing observations:

Error rate = (1/n) Σ I(yi ≠ ŷi), summing over i = 1, …, n

Observations in the test set: {(x1, y1), …, (xn, yn)}
ŷi is the predicted class for the ith record
I(yi ≠ ŷi) is an indicator variable: it equals 1 if yi ≠ ŷi and 0 if yi = ŷi
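In code, the error rate is just the average of the indicator; a one-line sketch with made-up labels:

```python
import numpy as np

y_true = np.array(["Yes", "No", "No", "Yes", "No"])   # made-up test labels
y_pred = np.array(["Yes", "No", "Yes", "Yes", "No"])  # made-up predictions

# Mean of the indicator I(y_i != yhat_i) over the test set.
error_rate = np.mean(y_true != y_pred)
print(error_rate)  # 0.2
```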
Model Evaluation on Test Set
(Classification) – Confusion Matrix
¨ Confusion Matrix: tabulation of counts of test records
correctly and incorrectly predicted by model
                        Predicted Class
                        Class = 1   Class = 0
Actual    Class = 1        f11         f10
Class     Class = 0        f01         f00

(Confusion matrix for a 2-class problem.)


Model Evaluation on Test Set
(Classification) – Confusion Matrix
                        Predicted Class
                        Class = 1   Class = 0
Actual    Class = 1        f11         f10
Class     Class = 0        f01         f00

Accuracy = (number of correct predictions) / (total number of predictions)
         = (f11 + f00) / (f11 + f10 + f01 + f00)

Error rate = (number of wrong predictions) / (total number of predictions)
           = (f10 + f01) / (f11 + f10 + f01 + f00)

Most classification tasks seek models that attain the highest accuracy when applied to the test set.
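A small sketch (with made-up labels) computing the confusion-matrix cells in the slide's f notation and the two rates derived from them:

```python
import numpy as np

y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])   # made-up test labels
y_pred = np.array([1, 0, 0, 1, 0, 1, 1, 0])   # made-up predictions

# Confusion-matrix cells for a 2-class problem, matching the slide's notation.
f11 = np.sum((y_true == 1) & (y_pred == 1))
f10 = np.sum((y_true == 1) & (y_pred == 0))
f01 = np.sum((y_true == 0) & (y_pred == 1))
f00 = np.sum((y_true == 0) & (y_pred == 0))

accuracy = (f11 + f00) / (f11 + f10 + f01 + f00)
error_rate = (f10 + f01) / (f11 + f10 + f01 + f00)
print(accuracy, error_rate)  # 0.75 0.25
```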
Model Evaluation on Test Set
(Regression) – Mean Squared Error
¨ Mean Squared Error: measures the "quality of fit"
¤ It will be small if the predicted responses are very close to the true responses

MSE = (1/n) Σ (yi − f̂(xi))², summing over i = 1, …, n

Observations in the test set: {(x1, y1), …, (xn, yn)}
f̂(xi) is the predicted value for the ith record
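The corresponding computation for regression, again with made-up numbers:

```python
import numpy as np

y_true = np.array([3.1, 2.4, 5.0, 4.2])   # made-up test responses
y_hat = np.array([2.9, 2.8, 4.6, 4.5])    # made-up predictions f_hat(x_i)

# MSE: average squared gap between predicted and true responses.
mse = np.mean((y_true - y_hat) ** 2)
print(mse)
```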
A Problem
¨ We already know that there is no one “best” data mining method or
statistical learning method.
¤ Depends on the characteristics of the data
¨ We’ve introduced evaluation:
¤ We can quantify error (classification error, mean squared error) in hopes
of comparing accuracy of different models
¨ We have datasets partitioned:
¤ Training set – model learns on this data
¤ Test set – model evaluated on this data
How well the model works on new data is what we really care about!
A Problem
¨ Error rates on training set vs. testing set might be
drastically different.
¨ There is no guarantee that the method with the smallest
training error rate will have the smallest testing error
rate.
¨ Why?
¤ Statistical methods specifically estimate coefficients so as to
minimize the training set error
Overfitting
¨ Overfitting: occurs when statistical model
“memorizes” the training set data
¤ very low error rate on training data
¤ higher error rate on test data

¨ Model does not generalize to the overall problem


¨ This is bad! We wish to avoid overfitting.
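A quick simulation sketch of overfitting (my illustration; the true function and polynomial degrees are arbitrary): a high-degree polynomial nearly memorizes a small training set but does worse on fresh test data:

```python
import numpy as np

rng = np.random.default_rng(5)
f = lambda x: np.sin(2 * x)                      # hypothetical true function
x_tr = rng.uniform(0, 3, 15)
y_tr = f(x_tr) + rng.normal(0, 0.3, 15)          # small training set
x_te = rng.uniform(0, 3, 200)
y_te = f(x_te) + rng.normal(0, 0.3, 200)         # fresh test data

for degree in (1, 9):
    coeffs = np.polyfit(x_tr, y_tr, degree)      # fit on the training set only
    mse_tr = np.mean((y_tr - np.polyval(coeffs, x_tr)) ** 2)
    mse_te = np.mean((y_te - np.polyval(coeffs, x_te)) ** 2)
    print(degree, mse_tr, mse_te)                # degree 9: tiny train MSE, larger test MSE
```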
Learning Method Bias
¨ Bias: the error introduced by modeling a real-life problem
(usually extremely complicated) by a much simpler problem
¤ Example: linear regression assumes a linear relationship between
the target variable Y and the predictor variables X
¤ It’s unlikely that the relationship is exactly linear, so some bias will
be present in the model.
¨ The more flexible (complex) a method is, the less bias it will
generally have.
Learning Method Variance
¨ Variance: how much the learned model would change if the training set were different
¤ Does changing a few observations in the training set dramatically affect the model?
n Ideally, the answer is no.
¨ Generally, the more flexible (complex) a method is, the more variance it has.
Bias-Variance Trade-Off
¨ Math proof! (beyond the scope of this course)
¨ The expected test set error can be decomposed into the sum of the model's variance, its squared bias, and the variance of its error terms:

E[(y0 − f̂(x0))²] = Var(f̂(x0)) + [Bias(f̂(x0))]² + Var(ε)

¨ As a statistical method gets more complex, the bias will decrease and the variance will increase.
¨ The expected error on the test set may go up or down.
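The decomposition can be checked empirically. Below is a simulation sketch (assumptions: a made-up true f and polynomial models of varying degree) that estimates the variance and squared bias of f̂(x0) over many independent training sets:

```python
import numpy as np

rng = np.random.default_rng(4)
f = lambda x: np.sin(2 * x)          # hypothetical true f
x0, sigma = 1.5, 0.3                 # test point and noise level

def fit_poly(degree):
    """Train a polynomial model on a fresh training set, predict at x0."""
    x = rng.uniform(0, 3, 30)
    y = f(x) + rng.normal(0, sigma, 30)
    return np.polyval(np.polyfit(x, y, degree), x0)

for degree in (1, 3, 9):
    preds = np.array([fit_poly(degree) for _ in range(500)])
    bias2 = (preds.mean() - f(x0)) ** 2
    var = preds.var()
    # Expected test MSE at x0 = variance + bias^2 + irreducible sigma^2.
    print(degree, bias2 + var + sigma**2)
```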
[Figure: Bias–Variance Trade-Off — error rate vs. complexity of model. The error rate on the training set decreases steadily as model complexity grows, while the error rate on the validation set is U-shaped; its minimum marks the optimal level of model complexity, with underfitting to the left and overfitting to the right.]
Example: we wish to build a model that separates the dark-colored points from the light-colored points. The data point observations were created by Y = f(X) + ε.

Figure 5.3: Low-complexity separator with high error rate. The black line is a simple, linear model f̂; it currently makes some classification errors.
• Low variance
• Bias present

Figure 5.4: High-complexity separator with low error rate. A more complex model (a curvy line instead of a linear one) achieves zero classification error for these data points.
• No linear-model bias
• Higher variance?

Figure 5.5: More data has been added, and both models (the linear line and the curvy line) are re-trained in order to minimize the error rate.
Variance:
• The linear model doesn't change much
• The curvy line changes significantly
With more data, the low-complexity separator need not change much, while the high-complexity separator changes substantially.

Which model is better?
¨ Now that we know the definitions of “training set”
and “testing set”,
¤ A more complete view of the Data Mining process…
Data Mining Process
1. Engage in efficient data storage and data preprocessing
2. Select appropriate response variables
¤ Decide on the number of variables that should be investigated
3. Screen data for outliers
¤ Address issues of missing values
4. Partition datasets into training and testing sets
¤ Sample large datasets that cannot easily be analyzed as a
whole
Data Mining Process (cont.)
5. Visualize data
¤ Box plots, histograms, etc.
6. Summarize data
¤ Mean, median, sd, etc.
7. Apply appropriate data mining methods (e.g., decision trees)
8. Evaluate model on test set
9. Analyze, interpret results
¤ Act on findings
References
¨ Introduction to Data Mining, 1st edition, Tan et al.
¨ Data Mining and Business Analytics in R, 1st edition,
Ledolter
¨ An Introduction to Statistical Learning, 1st edition,
James et al.
¨ Discovering Knowledge in Data, 2nd edition, Larose
et al.
