NASA Regression Lecture
Geoff Vining
Virginia Tech
1. Overview of Modeling
2. Review of Simple Linear Regression
3. Multiple Linear Regression
4. Residual Analysis
5. Transformations
6. Influence Diagnostics
7. Collinearity
8. Model Selection
9. Logistic Regression
Chapter 1: Overview of Regression
Scientific Method
A deterministic model is
y = β0 + βt t + βT T + βA A
where
I the β’s are constants,
I t is the deposition time,
I T is the deposition temp, and
I A is the Argon flow rate.
Regression Models
A better model is
yi = β0 + βt ti + βT Ti + βA Ai + ϵi .
yi = β0 + β1 xi + ϵi
yi = β0 + β1 xi + β11 xi² + ϵi
Note: we can let x1i = xi , x2i = xi² , and let β2 = β11 .
yi = β0 + β1 x1i + β2 x2i + ϵi
A linear model means linear in the parameters (the β’s).
For example, yi = β0 + β1 (1/xi ) + ϵi is still a linear model.
Three basic methods of collecting data are a retrospective study based on historical data, an observational study, and a designed experiment.
Reboil
Temp. Pres. Flow Rate
120 2 100
150 2 100
120 3 100
150 3 100
120 2 150
150 2 150
120 3 150
150 3 150
2³ Factorial Experiment
Chapter 2: Review of Simple Linear Regression
y = mx + b
I m is the slope
I b is the y -intercept
Later, we will use Greek letters to define the model.
The following data are the vapor pressures of water (mm Hg) from its freezing point to its boiling point (temperatures reported in kelvins).
Scatter Plots
Temp (K)    vp (mm Hg)
273 4.6
283 9.2
293 17.5
303 31.8
313 55.3
323 92.5
333 149.4
343 233.7
353 355.1
363 525.8
373 760.0
Scatter Plots (vapor pressure versus temperature)
yi = β0 + β1 xi + ϵi ,
where
I yi is the response, in this case, the vapor pressure at the
ith temperature,
I xi is the predictor or regressor, in this case, the ith
temperature,
I β0 is the y -intercept,
I β1 is the slope (in our case, we expect β1 to be
positive), and
I ϵi is a random error.
The Formal Simple Linear Regression Model
E(yi ) = β0 + β1 xi ,
which is a straight line.
Note:
I ŷi is an estimate or prediction of yi .
I β̂0 is an estimate of the y -intercept.
I β̂1 is an estimate of the slope.
One possible line through our scatter plot is the following.
Least Squares Estimation of the Model
Least Squares Estimation of the Model
ei = yi − ŷi .
For a good estimated line, all of the residuals should be
“small”.
One possible overall measure is the sum of the residuals,
Σᵢ₌₁ⁿ ei = Σᵢ₌₁ⁿ (yi − ŷi ),
but positive and negative residuals cancel each other.
Least Squares Estimation of the Model
A better measure?
SSres = Σᵢ₌₁ⁿ ei² = Σᵢ₌₁ⁿ (yi − ŷi )².
β̂0 = ȳ − β̂1 x̄
β̂1 = SSxy / SSxx
where
SSxy = Σᵢ₌₁ⁿ (yi − ȳ)(xi − x̄)
and
SSxx = Σᵢ₌₁ⁿ (xi − x̄)²
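As a minimal numpy sketch (an illustration added here, not part of the original notes), these formulas can be applied directly to the vapor pressure data introduced earlier:

import numpy as np

# Vapor pressure data from the table above (Temp in K, vp in mm Hg)
temp = np.array([273, 283, 293, 303, 313, 323, 333, 343, 353, 363, 373], dtype=float)
vp = np.array([4.6, 9.2, 17.5, 31.8, 55.3, 92.5, 149.4, 233.7, 355.1, 525.8, 760.0])

# Least squares estimates of the slope and intercept
SSxy = np.sum((vp - vp.mean()) * (temp - temp.mean()))
SSxx = np.sum((temp - temp.mean()) ** 2)
b1 = SSxy / SSxx                    # slope estimate
b0 = vp.mean() - b1 * temp.mean()   # intercept estimate

# Fitted values, residual sum of squares, and R-squared
y_hat = b0 + b1 * temp
SSres = np.sum((vp - y_hat) ** 2)
SStotal = np.sum((vp - vp.mean()) ** 2)
print(b0, b1, 1 - SSres / SStotal)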
Statistical Properties of the Estimators
E[β̂0 ] = β0 .
var[β̂0 ] = σ² [1/n + x̄²/SSxx ]
E[β̂1 ] = β1
var[β̂1 ] = σ²/SSxx
cov[β̂0 , β̂1 ] = −σ² x̄/SSxx .
Understanding the Variance of β̂1
Suppose you control the xs.
How should you choose them to minimize var[β̂1 ]?
Partitioning the Total Variability
SSreg = Σᵢ₌₁ⁿ (ŷi − ȳ)².
R² = SSreg /SStotal = 1 − SSres /SStotal
It can be shown that 0 ≤ R² ≤ 1.
H0 : β1 = 0
Ha : β1 ̸= 0
MSreg = SSreg / dfreg .
The test statistic is
F = MSreg / MSres .
The degrees of freedom for the test statistic are 1 for the
numerator and n − 2 for the denominator (for simple linear
regression).
The Overall F -Test
It can be shown that
E [MSres ] = σ²
One way to view the F statistic is as a signal-to-noise ratio.
Under the alternative hypothesis, the F statistic follows a noncentral F distribution, F′(dfreg , dfres , λ).
dfreg = number of parameters − 1 = 2 − 1 = 1.
The Overall F -Test
dfres = n − 2 = 11 − 2 = 9.
We obtain the mean squares by dividing the appropriate sum
of squares by the corresponding degrees of freedom.
The possible alternative hypotheses are Ha : β1 < 0, Ha : β1 > 0, or Ha : β1 ̸= 0.
The form of the test statistic is
t = β̂1 / σ̂β̂1
Apart from rounding errors, the square of the value for the t
statistic is the F statistic from the global test.
Test for β1
Under the alternative, the t statistic follows a noncentral t distribution, t′(dfres , δ),
where
δ = β1 √SSxx / σ
Note: δ controls the power of the test.
If you control the xs, thus, SSxx , how should you pick them
to maximize the power?
Confidence and Prediction Bands
We can construct confidence intervals around the fitted line by noting that the estimated variance of the predicted mean response at x is
MSres [ 1/n + (x − x̄)²/SSxx ].
For predicting a new observation, we add MSres to this variance.
Analysis of Variance
Source DF SS MS F P
Regression 1 491662 491662 35.57 0.000
Residual Error 9 124403 13823
Total 10 616065
Using Software
New Obs    Fit    SE Fit    95% CI    95% PI
1 -131.1 66.3 (-281.1, 18.9) (-436.5, 174.3)
2 -64.2 57.2 (-193.6, 65.1) (-360.0, 231.5)
3 2.6 48.9 (-107.9, 113.1) (-285.4, 290.6)
4 69.5 41.9 ( -25.4, 164.3) (-212.9, 351.8)
5 136.3 37.2 ( 52.2, 220.4) (-142.6, 415.3)
6 203.2 35.4 ( 123.0, 283.4) ( -74.6, 481.0)
7 270.0 37.2 ( 185.9, 354.1) ( -8.9, 549.0)
8 336.9 41.9 ( 242.0, 431.8) ( 54.5, 619.3)
9 403.7 48.9 ( 293.2, 514.3) ( 115.7, 691.8)
10 470.6 57.2 ( 341.3, 599.9) ( 174.9, 766.3)
11 537.4 66.3 ( 387.4, 687.5) ( 232.1, 842.8)
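The software output above can be reproduced (up to rounding) with statsmodels; this is an added sketch, and the function and column names below are statsmodels' own, not part of the original notes.

import numpy as np
import statsmodels.api as sm

temp = np.array([273, 283, 293, 303, 313, 323, 333, 343, 353, 363, 373], dtype=float)
vp = np.array([4.6, 9.2, 17.5, 31.8, 55.3, 92.5, 149.4, 233.7, 355.1, 525.8, 760.0])

X = sm.add_constant(temp)               # design matrix with an intercept column
fit = sm.OLS(vp, X).fit()
print(fit.summary())                    # coefficients, t tests, overall F, R-squared

# 95% confidence intervals (mean response) and prediction intervals (new response)
pred = fit.get_prediction(X)
print(pred.summary_frame(alpha=0.05))   # mean, mean_ci_lower/upper, obs_ci_lower/upper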
Using Software
Some Considerations in the Use of Regression
x1 x2 x3 y
53.4 4.5 1.10 730
60.4 9.9 1.08 725
60.8 8.1 1.41 710
61.9 6.8 1.03 710
61.8 6.8 0.99 700
61.9 6.6 0.90 715
61.1 6.4 0.91 710
59.0 7.6 1.36 740
59.3 7.0 1.31 730
56.6 7.6 1.07 730
Introduction to Multiple Linear Regression
yi = β0 + Σⱼ₌₁ᵏ βj xij + ϵi
where
I yi is the ith response,
I xij is the ith value for the jth regressor,
I k is the number of regressors,
I β0 is the y -intercept,
I βj is the coefficient associated with the jth regressor, and
I ϵi is a random error with mean 0 and constant variance
σ2.
Model and Ordinary Least Squares Revisited
ŷi = β̂0 + Σⱼ₌₁ᵏ β̂j xij
where
I ŷi is the predicted response,
I β̂0 is the estimated y -intercept, and
I β̂j is the estimated coefficient for the jth regressor.
Model and Ordinary Least Squares Revisited
SSres = Σᵢ₌₁ⁿ (yi − ŷi )².
y = Xβ + ϵ
where
y = (y1 , y2 , . . . , yn )′ ,
β = (β0 , β1 , . . . , βk )′ ,
ϵ = (ϵ1 , ϵ2 , . . . , ϵn )′ , and
X is the n × (k + 1) matrix whose i th row is (1, xi1 , xi2 , . . . , xik ).
Matrix Notation
(X ′ X )β̂ = X ′ y .
β̂ = (X ′ X )−1 X ′ y .
Matrix Notation
The variance of β̂ is
σ 2 (X ′ X )−1 .
The sum of squares of the residuals is
y ′ [I − X (X ′ X )−1 X ′ ]y .
The vector of predicted values is
ŷ = X (X ′ X )−1 X ′ y .
Sometimes we call X (X ′ X )−1 X ′ the hat matrix.
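A small numpy sketch (an added illustration, not from the notes) of the matrix formulas above; the simulated data are purely for demonstration.

import numpy as np

def ols_matrix(X, y):
    """Ordinary least squares via the normal equations (X'X) beta-hat = X'y."""
    XtX_inv = np.linalg.inv(X.T @ X)
    beta_hat = XtX_inv @ X.T @ y             # (X'X)^{-1} X'y
    H = X @ XtX_inv @ X.T                    # hat matrix X(X'X)^{-1}X'
    y_hat = H @ y                            # vector of predicted values
    SSres = y @ (np.eye(len(y)) - H) @ y     # y'[I - X(X'X)^{-1}X']y
    return beta_hat, H, y_hat, SSres

# Demonstration with simulated data (two regressors plus an intercept)
rng = np.random.default_rng(1)
n = 20
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(scale=0.3, size=n)
beta_hat, H, y_hat, SSres = ols_matrix(X, y)
print(beta_hat, SSres)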
Matrix Notation
R² = SSreg / SStotal = 1 − SSres / SStotal ,
where
SSreg = Σᵢ₌₁ⁿ (ŷi − ȳ)².
R²adj = 1 − MSres / MStotal
Overall F Test
Second, the overall F test, which tests the hypotheses
H0 : β1 = β2 = · · · = βk = 0
Ha : at least one of the β’s ̸= 0.
MSreg = SSreg / dfreg ,
where dfreg is the number of regressors, and
MSres = SSres / dfres ,
where dfres is the number of observations (n) minus the
number of parameters estimated (k + 1).
Overall F Test
F = MSreg / MSres ,
and has k numerator degrees of freedom and n − k − 1
denominator degrees of freedom.
E (MSres ) = σ 2
E (MSreg ) = σ² + (1/dfreg ) β′X′[X (X′X )−1 X′ − 1(1′ 1)−1 1′ ]X β
where 1 is an n × 1 vector of 1's.
The t Test for an Individual Coefficient
H0 : βj = 0
Ha : βj ̸= 0
t = β̂j / σ̂β̂j .
The t Test for an Individual Coefficient
The estimated variance of the estimated mean response at a setting x0 is
MSres · x′0 (X′X )−1 x0 .
The estimated variance for predicting a new response at that setting is
MSres · [1 + x′0 (X′X )−1 x0 ].
y = Xβ + ϵ = X 1 β1 + X 2 β2 + ϵ
Consider a hypothesis test of the form
H0 : β 2 = 0
Ha : β 2 ̸= 0
Define SS(β 2 |β 1 ) by
SS(β2 |β1 ) = y′ [X (X′X )−1 X′ − X1 (X′1 X1 )−1 X′1 ] y
Extra Sum of Squares Principle
X = [1 X r ]
SSreg = y′ [X (X′X )−1 X′ − 1(1′ 1)−1 1′ ] y
This approach also includes the t tests on the individual
parameters.
Extra Sum of Squares Principle
E [SS(β2 |β1 )] = p2 σ² + β′2 X′2 [I − X1 (X′1 X1 )−1 X′1 ] X2 β2
where p2 is the number of parameters in β 2 .
F = [SS(β2 |β1 )/p2 ] / MSres
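The sketch below (added, not from the notes) computes SS(β2|β1) and the corresponding partial F statistic for a generic partition X = [X1 X2]; X1 is assumed to contain the intercept column.

import numpy as np
from scipy import stats

def partial_f_test(X1, X2, y):
    """Extra-sum-of-squares F test of H0: beta_2 = 0 in y = X1 b1 + X2 b2 + e."""
    X = np.column_stack([X1, X2])
    n, p = X.shape
    p2 = X2.shape[1]
    H = X @ np.linalg.inv(X.T @ X) @ X.T          # full-model hat matrix
    H1 = X1 @ np.linalg.inv(X1.T @ X1) @ X1.T     # reduced-model hat matrix
    SS_extra = y @ (H - H1) @ y                   # SS(beta_2 | beta_1)
    MSres = (y @ (np.eye(n) - H) @ y) / (n - p)   # full-model residual mean square
    F = (SS_extra / p2) / MSres
    return F, stats.f.sf(F, p2, n - p)            # statistic and p-value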
Extra Sum of Squares Principle
λ = (1/σ²) β′2 X′2 [I − X1 (X′1 X1 )−1 X′1 ] X2 β2
= (1/σ²) β′2 [X′2 X2 − X′2 X1 (X′1 X1 )−1 X′1 X2 ] β2
If the columns of X2 are orthogonal to the columns of X1 (so X′1 X2 = 0), this reduces to
λ = (1/σ²) β′2 X′2 X2 β2 .
Thus, λ is maximized!
Note: suppose instead that X2 = X1 A for some matrix A. Then
λ = (1/σ²) β′2 X′2 [I − X1 (X′1 X1 )−1 X′1 ] X2 β2
= (1/σ²) β′2 A′X′1 [I − X1 (X′1 X1 )−1 X′1 ] X1 A β2
= (1/σ²) β′2 [A′X′1 X1 A − A′X′1 X1 (X′1 X1 )−1 X′1 X1 A ] β2
= (1/σ²) β′2 [A′X′1 X1 A − A′X′1 X1 A ] β2 = 0
Impact of Collinearity on Testing
X 2 ≈ X 1A
In this situation, the regressors that form X 2 are almost
perfectly related to at least some of the regressors in X 1 .
A real example: Georgia Power Data
Using Software
We illustrate the basic multiple linear regression analysis with
the Coking Heat data.
Analysis of Variance
Source DF SS MS F P
Regression 3 9081.6 3027.2 13.48 0.000
Residual Error 21 4714.4 224.5
Total 24 13796.0
Chapter 4: Residual Analysis
Introduction to Residuals
Underlying Assumptions for OLS
Recall, ei = yi − ŷi .
E[e ] = 0.
var[e ] = σ 2 [I − X (X ′ X )−1 X ′ ].
Information about the Random Errors
The variance of the i th residual is σ²(1 − hii ), where hii is the i th diagonal element of the hat matrix.
Leverage points are distant from the other data in terms of the
regressors.
In many analyses, the outliers are the most informative data points!
Surfactant Data
Useful Plots
Transformations
Common Transformations
Primary purposes:
I correct problems with the constant variance assumption
I correct problems with the normality assumption.
Common Transformations
y (λ) = (y^λ − 1) / (λ (y*)^(λ−1)) ,  λ ̸= 0
y (λ) = y* ln(y ) ,  λ = 0
where y* is the geometric mean of the observations.
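A short Python sketch (added here as an illustration) of this scaled power family, taking y* to be the geometric mean of the responses:

import numpy as np

def boxcox_scaled(y, lam):
    """(y^lam - 1) / (lam * gm^(lam - 1)) for lam != 0, and gm * ln(y) for lam = 0,
    where gm is the geometric mean y* of the observations."""
    y = np.asarray(y, dtype=float)
    gm = np.exp(np.mean(np.log(y)))     # geometric mean y*
    if lam == 0:
        return gm * np.log(y)
    return (y ** lam - 1.0) / (lam * gm ** (lam - 1.0))

# Example: the log member of the family (lambda = 0) applied to the vapor pressure data
vp = np.array([4.6, 9.2, 17.5, 31.8, 55.3, 92.5, 149.4, 233.7, 355.1, 525.8, 760.0])
print(boxcox_scaled(vp, 0.0))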
Analysis of Variance
Source DF SS MS F P
Regression 1 28.238 28.238 916.71 0.000
Residual Error 9 0.277 0.031
Total 10 28.515
Influence Diagnostics
“Too Much in Love with Your Model”
Big Problem!
Major caution:
I Natural tendency is to say an observation is an outlier if the absolute value of its standardized residual is > 2 (α = .05) or > 3 (α = .0027).
I Such a cut-off ignores the multiple comparison problem!
I If we have n observations, we actually are performing (n choose 2) = n(n − 1)/2 comparisons.
I To preserve an overall α of .05 (which is large!), we need to use 0.05/(n choose 2) for each observation.
Leverage
H = X (X ′X )−1X ′
Leverage
trace [H ] = trace [X (X′X )−1 X′ ] = trace [(X′X )(X′X )−1 ] = trace [Ip ] = p
Cook’s D
The basic idea: How much does a specific data point impact the vector of predicted values?
Mathematically,
Di = (ŷ(i) − ŷ )′ (ŷ(i) − ŷ ) / (p MSres )
where ŷ(i) is the vector of predicted values computed with the i th observation deleted.
DFFITS
The basic idea: how much does the prediction of the i th response
change when we drop the i th data point?
Computational formula:
DFFITSi = (ŷi − ŷ(i) ) / √(MSres,(i) hii )
The generally recommended cut-off value is
|DFFITSi | > 2 √(p/n)
DFBETAS
Computationally:
DFBETASi,j = (β̂j − β̂j(i) ) / √(MSres,(i) Cjj )
where Cjj is the j th diagonal element of (X ′ X )−1 .
Note:
I COVRATIOi > 1 indicates that the i th data value improves
precision
I COVRATIOi < 1 indicates that the i th data value hurts
precision
Suggested cut-off values are
I COVRATIOi > 1 + 3p/n
I COVRATIOi < 1 − 3p/n
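As an added sketch (not part of the notes), statsmodels computes all of these diagnostics from a fitted OLS model; the attribute names below belong to statsmodels' OLSInfluence class.

import numpy as np
import statsmodels.api as sm

def influence_report(X, y):
    """Leverage, Cook's D, DFFITS, DFBETAS, and COVRATIO with the usual cut-offs."""
    fit = sm.OLS(y, sm.add_constant(X)).fit()
    infl = fit.get_influence()
    n = int(fit.nobs)
    p = int(fit.df_model) + 1                    # number of parameters
    leverage = infl.hat_matrix_diag              # h_ii
    cooks_d = infl.cooks_distance[0]             # Cook's D_i
    dffits = infl.dffits[0]                      # DFFITS_i
    dfbetas = infl.dfbetas                       # DFBETAS_{i,j}
    covratio = infl.cov_ratio                    # COVRATIO_i
    flags = {
        "leverage": leverage > 2 * p / n,
        "dffits": np.abs(dffits) > 2 * np.sqrt(p / n),
        "covratio": (covratio > 1 + 3 * p / n) | (covratio < 1 - 3 * p / n),
    }
    return leverage, cooks_d, dffits, dfbetas, covratio, flags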
Jet Turbine Example: Table B.13 in the Appendix
Analysis of Variance
Source DF SS MS F P
Regression 6 9908846 1651474 2350.13 0.000
Residual Error 33 23190 703
Total 39 9932036
Residual plots: normal plot, residuals versus fits, residuals versus order, and residuals versus x1–x6.
Influence Analysis
We note that p = 7:
I Cut-off for leverage: 2p/n = .35
I Cut-off for DFFITS: 2√(p/n) = .8367
Flagged observations:
I No. 11 hii = 0.4987, DFFITS = 1.6546
I No. 20 hii = 0.7409, Cook’s D = 3.0072, DFFITS = −5.1257
I No. 28 DFFITS = 0.8925
Chapter 9: Collinearity
Overview of Collinearity
What is Multicollinearity?
Most people define the condition number as the ratio of the largest
eigenvalue of X ′ X to the smallest.
Note:
I a VIF of 2 implies that Rj2 = 0.5, which implies a pairwise
r ≈ 0.7
I a VIF of 5 implies that Rj2 = 0.8, which implies a pairwise
r ≈ 0.9
I a VIF of 10 implies that Rj2 = 0.9, which implies a pairwise
r ≈ 0.95
Key point: VIFs > 5 represent serious relationships among the
regressors!
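A brief sketch (added here) of the VIF computation with statsmodels, using VIFj = 1/(1 − Rj²); the helper name is illustrative.

import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

def vif_table(X):
    """VIF for each regressor column of X; an intercept column is added internally."""
    Xc = sm.add_constant(np.asarray(X, dtype=float))
    # column 0 is the intercept, so report VIFs for columns 1..k
    return {j: variance_inflation_factor(Xc, j) for j in range(1, Xc.shape[1])}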
Jet Turbine Data
Analysis of Variance
Source DF SS MS F P
Regression 6 9908846 1651474 2350.13 0.000
Residual Error 33 23190 703
Total 39 9932036
Designed Experiments
Analysis of Variance
Source DF SS MS F P
Regression 4 233.218 58.304 21.77 0.000
Residual Error 11 29.467 2.679
Total 15 262.684
Correcting Multicollinearity
Model Selection
Basic Issues
Suppose we fit the model
y = X1 β1 + ϵ
when the true model is
y = X 1 β1 + X 2 β2 + ϵ .
We note that
β̂ = β̂ 1 = (X ′1 X 1 )−1 X ′1 y .
Impact of Underspecifying the Model
Consequences:
(1) E(β̂)
(2) var(β̂)
var(β̂1 ) = σ 2 (X ′1 X 1 )−1
(3) SSres
E(SSres ) = (n − p1 )σ² + β′2 X′2 [I − X1 (X′1 X1 )−1 X′1 ] X2 β2
Impact of Underspecifying the Model
E(MSres ) = σ² + [1/(n − p1 )] β′2 X′2 [I − X1 (X′1 X1 )−1 X′1 ] X2 β2
Consequences:
I Our estimate of σ² is biased upward.
I The denominator of all of our tests is larger than it should be.
I We have reduced power!
Impact of Underspecifying the Model
(5) E [ŷi ]
Recall that the bias of an estimator θ̂ is E(θ̂) − θ, where θ is the parameter being estimated.
Impact of Underspecifying the Model
Σᵢ₌₁ⁿ bias²(ŷi ) = β′2 X′2 [I − X1 (X′1 X1 )−1 X′1 ] X2 β2
Note: This squared bias is very similar to the bias for MSres !
Impact of Overspecifying the Model
Consider
Σᵢ₌₁ⁿ var(ŷi ).
We observe that
var(ŷ ) = σ² X1 (X′1 X1 )−1 X′1 .
Thus, the sum of the predicted variances is the trace of σ² X1 (X′1 X1 )−1 X′1 , which is p1 σ².
Recall,
R² = SSreg / SStotal = 1 − SSres / SStotal .
We define R²adj by
R²adj = 1 − MSres / MStotal .
Consider
Σᵢ₌₁ⁿ MSE(ŷi ) = Σᵢ₌₁ⁿ var(ŷi ) + Σᵢ₌₁ⁿ bias²(ŷi ).
Σᵢ₌₁ⁿ MSE(ŷi ) = pσ² + β′2 X′2 [I − X1 (X′1 X1 )−1 X′1 ] X2 β2 .
We note that
E(MSres ) = σ² + [1/(n − p1 )] β′2 X′2 [I − X1 (X′1 X1 )−1 X′1 ] X2 β2 .
As a result, an unbiased estimate of the sum of the squared biases is
(n − p1 ) [MSres − σ²].
Mallow’s Cp
Cp = (1/σ²) Σᵢ₌₁ⁿ MSE(ŷi )
Cp = p + (n − p) [MSres − MSres,full ] / MSres,full
where MSres,full is the mean squared residual from the full model.
Basically, we use the MSres from the full model as our estimate of
σ2.
Mallow’s Cp
Cp = 2p + SSres /σ² − n.
PRESS
The PRESS statistic is the prediction error sum of squares.
It is defined to be
PRESS = Σᵢ₌₁ⁿ [ei / (1 − hii )]².
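The following sketch (added, not from the notes) computes Mallows' Cp and PRESS for a candidate model, using the full-model MSres as the estimate of σ²; both design matrices are assumed to include the intercept column.

import numpy as np

def hat_and_residuals(X, y):
    H = X @ np.linalg.inv(X.T @ X) @ X.T
    return np.diag(H), y - H @ y

def cp_and_press(X_candidate, X_full, y):
    """Mallows' Cp (sigma^2 estimated by the full-model MSres) and PRESS."""
    n = len(y)
    _, e_full = hat_and_residuals(X_full, y)
    sigma2_hat = np.sum(e_full ** 2) / (n - X_full.shape[1])   # MSres of the full model
    h, e = hat_and_residuals(X_candidate, y)
    p = X_candidate.shape[1]
    Cp = np.sum(e ** 2) / sigma2_hat - n + 2 * p
    PRESS = np.sum((e / (1.0 - h)) ** 2)
    return Cp, PRESS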
In general,
I R²adj likes bigger models.
I PRESS likes smaller models.
I Cp falls in between.
Other Popular Measures
AIC = −2ln(L) + 2p
I BIC: Bayesian Information Criterion.
AIC = 2p + n ln(2π) + n ln(σ²) + SSres /σ²
= 2p + SSres /σ² − n + n + n ln(2π) + n ln(σ²)
= 2p + SSres /σ² − n + C
= Cp + C ,
where C = n + n ln(2π) + n ln(σ²) does not depend on which regressors are in the model.
Consequence: The best models in terms of AIC are the same best
models for Mallow’s Cp .
It estimates each one-regressor model and picks the best according to an entrance requirement based on the model's F statistic.
Note: the technique estimates k models at this first step.
The method then estimates each two-regressor model that contains the term already included.
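A simple forward-selection sketch (added illustration): at each step it adds the regressor with the largest partial F statistic, subject to an F-to-enter threshold. The threshold value and function name are illustrative, not from the notes.

import numpy as np

def forward_selection(X, y, f_to_enter=4.0):
    """Greedy forward selection driven by each candidate's partial F statistic."""
    n, k = X.shape
    selected, remaining = [], list(range(k))
    SSres_old = np.sum((y - y.mean()) ** 2)          # intercept-only model
    while remaining:
        best_j, best_F, best_SS = None, 0.0, None
        for j in remaining:
            Xc = np.column_stack([np.ones(n), X[:, selected + [j]]])
            H = Xc @ np.linalg.inv(Xc.T @ Xc) @ Xc.T
            SSres_new = y @ (np.eye(n) - H) @ y
            df_res = n - Xc.shape[1]
            F = (SSres_old - SSres_new) / (SSres_new / df_res)   # partial F for adding j
            if F > best_F:
                best_j, best_F, best_SS = j, F, SSres_new
        if best_j is None or best_F < f_to_enter:
            break
        selected.append(best_j)
        remaining.remove(best_j)
        SSres_old = best_SS
    return selected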
Vars   R-Sq   R-Sq(adj)   Mallows Cp   S    prim secd fuel pres exh amb
(an X marks the regressors included in each model)
1 99.0 99.0 101.8 50.486 X
1 99.0 99.0 104.7 51.010 X
1 95.3 95.1 633.0 111.22 X
2 99.6 99.5 26.7 33.941 X X
2 99.3 99.3 62.9 42.899 X X
2 99.3 99.3 65.2 43.400 X X
3 99.7 99.7 7.6 27.791 X X X
3 99.7 99.7 7.9 27.911 X X X
3 99.6 99.6 24.6 33.235 X X X
Example: Jet Turbine Data
Vars   R-Sq   R-Sq(adj)   Mallows Cp   S    prim secd fuel pres exh amb
4 99.7 99.7 5.6 26.725 X X X X
4 99.7 99.7 6.9 27.205 X X X X
4 99.7 99.7 8.8 27.923 X X X X
5 99.8 99.7 5.6 26.362 X X X X X
5 99.8 99.7 7.1 26.916 X X X X X
5 99.7 99.7 8.8 27.585 X X X X X
6 99.8 99.7 7.0 26.509 X X X X X X
Example: Jet Turbine Data
Models with Cp ≤ 7:
Vars   R-Sq   R-Sq(adj)   Mallows Cp   S    prim secd fuel pres exh amb
4 99.7 99.7 5.6 26.725 X X X X
4 99.7 99.7 6.9 27.205 X X X X
5 99.8 99.7 5.6 26.362 X X X X X
6 99.8 99.7 7.0 26.509 X X X X X X
Example: Jet Turbine Data
Analysis of Variance
Source DF SS MS F P
Regression 4 9907039 2476760 3467.86 0.000
Residual Error 35 24997 714
Total 39 9932036
Residual plots: normal plot, residuals versus fits, residuals versus order, and residuals versus x1, x3, x5, x6.
Problem Children
We note that p = 5
I Cut-off for leverage: .25
I Cut-off for DFFITS: .7071
Problem Children:
I 10: hii = .3065
I 11: hii = .2516
I 20: DFFITS = -1.0518
I 28: DFFITS = .7620
Analysis of Model B
Source DF SS MS F P
Regression 5 9908408 1981682 2851.63 0.000
Residual Error 34 23628 695
Total 39 9932036
Residual plots: normal plot, residuals versus fits, residuals versus order, and residuals versus x1, x3, x4, x5, x6.
Problem Children
We note that p = 6
I Cut-off for leverage: .30
I Cut-off for DFFITS: .7746
Problem Children:
I 10: hii = .3065
I 11: hii = .4903, DFFITS = 1.5109
I 20: DFFITS = -1.1401
I 21: DFFITS = -0.9404
I 28: DFFITS = .8128
Analysis of Model C
Analysis of Variance
Source DF SS MS F P
Regression 4 9906133 2476533 3346.22 0.000
Residual Error 35 25903 740
Total 39 9932036
Residual plots: normal plot, residuals versus fits, residuals versus order, and residuals versus x1, x4, x5, x6.
Problem Children
We note that p = 5
I Cut-off for leverage: .25
I Cut-off for DFFITS: .7071
Problem Children
Problem Children:
I 6: hii = 0.2525
I 9: hii = 0.2648
I 10: hii = 0.2870
I 11: hii = 0.3892
I 20: DFFITS = -0.9746
I 21: DFFITS = -0.8429
I DFFITS = -0.7659
“Conclusions”
Forward: Model B
Backward: Model A
Stepwise: Model A
Final Comments on Model Selection
Let
pi = yi / mi ,
where yi is the number of successes observed in the mi trials at the i th setting.
Let πi be the expected value of pi .
var[pi ] = πi (1 − πi ) / mi .
Rather than modeling pi directly, we use the logistic function to
transform the pi ’s.
What the Logistic Function Does
Estimation of the Logistic Regression Model
For convenience, define
ηi = ln[πi / (1 − πi )]
and
η̂i = ln[π̂i / (1 − π̂i )]
At least initially, π̂i = pi .
To a first-order approximation:
var(η̂i ) = 1 / [mi πi (1 − πi )].
Estimation of the Logistic Regression Model
V̂ = diag{1 / [mi π̂i (1 − π̂i )]},
or
L = Σᵢ₌₁ⁿ yi ln(πi ) + Σᵢ₌₁ⁿ mi ln(1 − πi ) − Σᵢ₌₁ⁿ yi ln(1 − πi ).
Estimation of the Logistic Regression Model
Recall
πi = exp(x′i β) / [1 + exp(x′i β)].
The maximum likelihood estimate of β solves
X′(y − µ̂) = 0,
where
y = (y1 , y2 , . . . , yn )′ , and
µ̂ = (m1 π̂1 , m2 π̂2 , . . . , mn π̂n )′ .
Estimation of the Logistic Regression Model
or
X ′ V −1 (η − η̂) = 0,
where V is the diagonal matrix formed from the variances of the
η̂i ’s.
Using this procedure, if the model assumptions are correct, one can
show that asymptotically
E (β̂n ) = β  and  var(β̂n ) = (X′V −1 X )−1 .
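A compact numpy sketch (added here, not part of the notes) that solves the score equations X′(y − µ̂) = 0 by Fisher scoring (iteratively reweighted least squares); the data are the pneumoconiosis counts from the SAS example later in these notes.

import numpy as np

def logistic_irls(X, y, m, tol=1e-8, max_iter=50):
    """Binomial logistic regression: y successes out of m trials per row of X."""
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        pi = 1.0 / (1.0 + np.exp(-(X @ beta)))   # inverse logit
        mu = m * pi                              # fitted counts
        W = m * pi * (1.0 - pi)                  # binomial variances (weights)
        step = np.linalg.solve(X.T @ (W[:, None] * X), X.T @ (y - mu))
        beta = beta + step                       # Fisher scoring update
        if np.max(np.abs(step)) < tol:
            break
    cov = np.linalg.inv(X.T @ (W[:, None] * X))  # asymptotic var(beta-hat)
    return beta, cov

years = np.array([5.8, 15.0, 21.5, 27.5, 33.5, 39.5, 46.0, 51.5])
cases = np.array([0, 1, 3, 8, 9, 8, 10, 5], dtype=float)
n = np.array([98, 54, 43, 48, 51, 38, 28, 11], dtype=float)
X = np.column_stack([np.ones_like(years), years])
beta_hat, cov = logistic_irls(X, cases, n)
print(beta_hat, np.sqrt(np.diag(cov)))           # estimates and standard errors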
Interpretation of the Parameters
The log odds at x is
ln[π̂(x) / (1 − π̂(x))] = β̂0 + β̂1 x.
The log odds at x + 1 is
ln[π̂(x + 1) / (1 − π̂(x + 1))] = β̂0 + β̂1 (x + 1).
Interpretation of the Parameters
Ôr = Oddsx+1 / Oddsx = exp(β̂1 ).
The odds ratio is the estimated multiplicative change in the odds of a success for a one-unit increase in x.
Analog to the Global F Test
For logistic regression, we use log-likelihood theory to construct
the test statistic.
Let Lred be the likelihood function for the reduced model evaluated
at the MLE for β 1 .
G² = 2 ln(Lfull / Lred )
Analog to the Global F Test
The test statistic, G 2 , asymptotically follows a χ2 distribution with
k degrees of freedom under the null hypothesis.
H0 : β1 = β2 = . . . = βk = 0
Ha : at least one βj ̸= 0
E(ηi ) = β0 .
Test for the Individual Coefficients
(X ′ V −1 X )−1 = −G .
Let cjj be the j th diagonal element of (X ′ V −1 X )−1 .
Test for the Individual Coefficients
The test statistic for
H0 : βj = 0
is
z = β̂j / √cjj .
Asymptotically, this statistic follows a standard normal distribution
under the null hypothesis.
Response Information
Odds Ratio    95% CI Lower    95% CI Upper
0.98          0.97            0.99
Log-Likelihood = -10.182
Goodness-of-Fit Tests
Method Chi-Square DF P
Pearson 19.5867 18 0.357
Deviance 17.5911 18 0.483
Hosmer-Lemeshow 7.0039 8 0.536
The Pneumoconiosis Data
options ls=70;
data coal;
input years cases n;
cards;
5.8 0 98
15.0 1 54
21.5 3 43
27.5 8 48
33.5 9 51
39.5 8 38
46.0 10 28
51.5 5 11
;
proc logistic descending;
model cases/n = years;
output out=coal2 resdev=r p=p;
run;
The Pneumoconiosis Data
Parameter    DF    Estimate    Standard Error    Wald Chi-Square    Pr > ChiSq
1. Overview of Modeling
2. Review of Simple Linear Regression
3. Multiple Linear Regression
4. Residual Analysis
5. Transformations
6. Influence Diagnostics
7. Collinearity
8. Model Selection
9. Logistic Regression
Take-Home Messages