Simple Linear Regression (Solutions To Exercises)

This document contains solutions to exercises involving simple linear regression. The first exercise involves a plastic film folding machine and uses a scatter plot to identify parameter estimates and interpret results. The second exercise uses data on the lifetime and temperature of an electronic device to calculate a confidence interval for the slope. The third exercise examines data on chemical yield and temperature to test for a significant relationship and to calculate a confidence interval for the expected yield. Each exercise demonstrates concepts like parameter estimation, hypothesis testing, and interval estimation in the context of simple linear regression.

Uploaded by

blu runner1

Chapter 5

Simple Linear regression (solutions to exercises)

Contents

5 Simple Linear regression (solutions to exercises)

5.1 Plastic film folding machine
5.2 Linear regression life time model
5.3 Yield of chemical process
5.4 Plastic material
5.5 Water pollution
5.6 Membrane pressure drop
5.7 Membrane pressure drop (matrix form)
5.8 Independence and correlation

5.1 Plastic film folding machine

Exercise 5.1 Plastic film folding machine

On a machine that folds plastic film, the temperature may be varied in the range 130-185 °C. To obtain, if possible, a model for the influence of temperature on the fold thickness, n = 12 related pairs of temperature and fold thickness were measured, as illustrated in the following figure:
[Figure: scatter plot of Thickness (axis from 90 to 130) against Temperature (axis from 130 to 180).]

a) Determine, by looking at the figure, which of the following sets of estimates for the parameters in the usual regression model is correct:

1) $\hat\beta_0 = 0$, $\hat\beta_1 = -0.9$, $\hat\sigma = 36$
2) $\hat\beta_0 = 0$, $\hat\beta_1 = 0.9$, $\hat\sigma = 3.6$
3) $\hat\beta_0 = 252$, $\hat\beta_1 = -0.9$, $\hat\sigma = 3.6$
4) $\hat\beta_0 = -252$, $\hat\beta_1 = -0.9$, $\hat\sigma = 36$
5) $\hat\beta_0 = 252$, $\hat\beta_1 = -0.9$, $\hat\sigma = 36$

Solution

First of all, the only possible intercept ($\hat\beta_0$) among the ones given in the answers is 252, and the slope estimate of −0.9 in the options with this intercept looks reasonable. We just need to decide whether the estimated standard deviation of the error, $s_e = \hat\sigma$, is 3.6 or 36. From the figure it is clear that the points do not have an average vertical distance to the line of around 36, so 3.6 must be the correct number and hence the correct answer is:

3) $\hat\beta_0 = 252$, $\hat\beta_1 = -0.9$, $\hat\sigma = 3.6$

b) What is the only possible correct answer:

1) The proportion of explained variation is 50% and the correlation is 0.98
2) The proportion of explained variation is 0% and the correlation is −0.98
3) The proportion of explained variation is 96% and the correlation is −1
4) The proportion of explained variation is 96% and the correlation is 0.98
5) The proportion of explained variation is 96% and the correlation is −0.98

Solution

The proportion of variation explained must be pretty high, so 0% can be ruled out. Answers 1 and 4 are also ruled out, since the correlation is clearly negative. This narrows the possibilities down to Answers 3 and 5. Since the correlation is not exactly −1 (in which case the observations would lie exactly on the line), the correct answer is:

5) The proportion of explained variation is 96% and the correlation is −0.98


5.2 Linear regression life time model

Exercise 5.2 Linear regression life time model

A company manufactures an electronic device to be used in a very wide temperature range. The company knows that increased temperature shortens the life time of the device, and a study is therefore performed in which the life time is determined as a function of temperature. The following data is found:

Temperature in Celsius (t)   10   20   30   40   50   60   70   80   90
Life time in hours (y)      420  365  285  220  176  117   69   34    5

a) Calculate the 95% confidence interval for the slope in the usual linear regression model, which expresses the life time as a linear function of the temperature.

Solution

One could either do all the regression computations by hand to find $\hat\beta_1 = -5.3133$ and then use the formula for the confidence interval for $\beta_1$ in Method 5.15,

$$\hat\beta_1 \pm t_{1-\alpha/2} \cdot \hat\sigma_{\beta_1} = \hat\beta_1 \pm t_{1-\alpha/2} \cdot \hat\sigma \sqrt{\frac{1}{\sum_{i=1}^{n} (x_i - \bar{x})^2}},$$

or just run lm in R to find:

D <- data.frame(t=c(10,20,30,40,50,60,70,80,90),
y=c(420,365,285,220,176,117,69,34,5))
fit <- lm(y ~ t, data=D)
summary(fit)

Call:
lm(formula = y ~ t, data = D)

Residuals:
Min 1Q Median 3Q Max
-21.02 -12.62 -9.16 17.71 29.64

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 453.556 14.394 31.5 8.4e-09 ***
t -5.313 0.256 -20.8 1.5e-07 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 19.8 on 7 degrees of freedom

Multiple R-squared: 0.984, Adjusted R-squared: 0.982
F-statistic: 432 on 1 and 7 DF, p-value: 0.000000151

and use the fact that what is known as the "standard error for the slope" can be read off directly from the R output as

$$\hat\sigma_{\beta_1} = \hat\sigma \sqrt{\frac{1}{\sum_{i=1}^{n} (x_i - \bar{x})^2}} = 0.2558,$$

and $t_{0.975}(7) = 2.365$, in R:

qt(.975,7)

[1] 2.365

to get −5.31 ± 2.365 · 0.2558, or in R:

-5.31+c(-1,1)*qt(.975,7)*0.2558

[1] -5.915 -4.705
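
As a cross-check, the same interval can be obtained directly from the fitted model. The sketch below assumes only base R (`confint()` from the stats package). It works with the unrounded slope estimate, so the limits may differ in the third decimal from the hand calculation, which starts from the rounded value −5.31.

```r
# Data from the exercise: temperature (t) and life time (y)
D <- data.frame(t = c(10, 20, 30, 40, 50, 60, 70, 80, 90),
                y = c(420, 365, 285, 220, 176, 117, 69, 34, 5))
fit <- lm(y ~ t, data = D)

# 95% confidence intervals for the intercept and the slope
ci <- confint(fit, level = 0.95)

# Row "t" holds the lower and upper limit for the slope
ci["t", ]
```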

b) Can a relation between temperature and life time be documented at the 5% level?

Solution

Since the confidence interval does not include 0, it can be documented that there is a relationship between life time and temperature. Also, the p-value is $1.5 \cdot 10^{-7} < 0.05 = \alpha$, which gives strong evidence against the null hypothesis.

5.3 Yield of chemical process

Exercise 5.3 Yield of chemical process

The yield y of a chemical process is a random variable whose value is considered to be a linear function of the temperature x. The following data of corresponding values of x and y is found:

Temperature in °C (x)    0   25   50   75  100
Yield in grams (y)      14   38   54   76   95

The average and standard deviation of temperature and yield are

$$\bar{x} = 50, \quad s_x = 39.52847, \quad \bar{y} = 55.4, \quad s_y = 31.66702.$$

In the exercise the usual linear regression model is used:

$$Y_i = \beta_0 + \beta_1 x_i + \varepsilon_i, \quad \varepsilon_i \sim N(0, \sigma_\varepsilon^2), \quad i = 1, \ldots, 5.$$

a) Can a significant relationship between yield and temperature be documented at the usual significance level α = 0.05?

Solution

It could most easily be solved by running the regression in R as:

D <- data.frame(x=c(0,25,50,75,100),
y=c(14,38,54,76,95))
fit <- lm(y ~ x, data=D)
summary(fit)

Call:
lm(formula = y ~ x, data = D)

Residuals:
1 2 3 4 5
-1.4 2.6 -1.4 0.6 -0.4

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 15.4000 1.4967 10.3 0.002 **
x 0.8000 0.0244 32.7 0.000063 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.93 on 3 degrees of freedom

Multiple R-squared: 0.997, Adjusted R-squared: 0.996
F-statistic: 1.07e+03 on 1 and 3 DF, p-value: 0.0000627

Alternatively one could use hand calculations and use the formula in Theorem 5.12
for the t-test of the null hypothesis: H0 : β 1 = 0.

The relevant test statistic and p-value can be read off in the R output as 32.7 and
0.000063. So the answer is:

Yes, as the relevant test statistic and p-value are resp. 32.7 and 0.00006 < 0.05 = α.

b) Give the 95% confidence interval of the expected yield at a temperature of $x_{new} = 80$ °C.

Solution

We use the formula in Equation (5-59) for the confidence limits of the line (the expected value of $Y_i$ for a value $x_{new}$):

$$\hat\beta_0 + \hat\beta_1 x_{new} \pm t_{1-\alpha/2} \, \hat\sigma \sqrt{\frac{1}{n} + \frac{(x_{new} - \bar{x})^2}{S_{xx}}},$$

and we have to compute $\hat\beta_0$, $\hat\beta_1$ and $s_e = \hat\sigma$ either by hand or in R as above:

$$\hat\beta_0 = 15.4, \quad \hat\beta_1 = 0.8, \quad \hat\sigma = 1.932.$$

So the confidence interval becomes

$$(15.4 + 0.8 \cdot 80) \pm 3.182 \cdot 1.932 \sqrt{\frac{1}{5} + \frac{(80 - 50)^2}{6250}},$$

since

$$s_x^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2 = \frac{1}{n-1} S_{xx} \quad \Leftrightarrow \quad S_{xx} = (n-1) s_x^2 = 4 \cdot 39.528^2 = 6250.$$

Thus the answer is

$$79.40 \pm 3.61 = [75.79, 83.01].$$
In R this could be found by:

predict(fit, newdata=data.frame(x=80), interval="confidence",
        level=0.95)

fit lwr upr

1 79.4 75.79 83.01
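
To make the link between the formula and the `predict()` output explicit, the hand calculation can be sketched step by step in R. This is only an illustration; the helper names (`Sxx`, `sigma_hat`, `center`, `half_width`) are chosen here and do not come from the original text.

```r
# Data from the exercise
x <- c(0, 25, 50, 75, 100)
y <- c(14, 38, 54, 76, 95)
fit <- lm(y ~ x)

n <- length(x)
xnew <- 80
Sxx <- sum((x - mean(x))^2)            # equals (n - 1) * sd(x)^2
sigma_hat <- summary(fit)$sigma        # residual standard error
center <- sum(coef(fit) * c(1, xnew))  # beta0_hat + beta1_hat * xnew

# Half-width of the 95% confidence interval for the line at xnew
half_width <- qt(0.975, df = n - 2) * sigma_hat *
  sqrt(1 / n + (xnew - mean(x))^2 / Sxx)

c(lower = center - half_width, upper = center + half_width)
```

The result should agree with the `lwr` and `upr` columns returned by `predict()` with `interval="confidence"`.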

c) What is the upper quartile of the residuals?

Solution

The five residuals are: -1.4, 2.6, -1.4, 0.6 and -0.4.

We use the basic definition of a quantile (Definition 1.7), and the upper quartile is $q_{0.75}$ (see Definition 1.8). We set n = 5 and p = 0.75, so

$$np = 3.75.$$

Since np is not an integer, we round up, so the upper quartile is the 4th observation in the ordered sequence:

−1.4, −1.4, −0.4, 0.6, 2.6.

This is also found in the summary() output above under

Residuals:
1 2 3 4 5
-1.4 2.6 -1.4 0.6 -0.4
So the answer is: 0.6.
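
The quartile can also be checked numerically. The sketch below assumes that `type = 2` in R's `quantile()` is the variant matching the definition used here (take the next order statistic when np is not an integer, average two neighbours when it is); that correspondence is an assumption of this example, not something stated in the exercise.

```r
# Refit the model and extract the residuals
x <- c(0, 25, 50, 75, 100)
y <- c(14, 38, 54, 76, 95)
res <- resid(lm(y ~ x))

# Ordered residuals, rounded for display
sort(round(res, 1))

# Upper quartile with the assumed matching quantile definition
q75 <- quantile(res, probs = 0.75, type = 2)
q75
```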

5.4 Plastic material

Exercise 5.4 Plastic material

In the manufacturing of a plastic material, it is believed that the cooling time has
an influence on the impact strength. Therefore a study is carried out in which
plastic material impact strength is determined for 4 different cooling times. The
results of this experiment are shown in the following table:

Cooling time in seconds (x)      15    25    35    40
Impact strength in kJ/m² (y)   42.1  36.0  31.8  28.7

The following statistics may be used:

x̄ = 28.75, ȳ = 34.65, Sxx = 368.75.

a) What is the 95% confidence interval for the slope of the regression model,
expressing the impact strength as a linear function of the cooling time?

Solution

The easiest way to get the confidence interval is to use the standard error for the slope ($\hat\sigma_{\beta_1}$, also denoted $SE_{\beta_1}$) given in the R output:

x <- c(15,25,35,40)
y <- c(42.1,36.0,31.8,28.7)
summary(lm(y ~ x))

Call:
lm(formula = y ~ x)

Residuals:
1 2 3 4
0.2814 -0.6051 0.4085 -0.0847

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 49.639 0.878 56.5 0.00031 ***
x -0.521 0.029 -18.0 0.00308 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.556 on 2 degrees of freedom

Multiple R-squared: 0.994, Adjusted R-squared: 0.991
F-statistic: 324 on 1 and 2 DF, p-value: 0.00308

The standard error for the slope is $\hat\sigma_{\beta_1} = 0.029$ (also known as the standard deviation of the sampling distribution of $\hat\beta_1$). The relevant t-quantile, with ν = 2 degrees of freedom, is found as (either of):

c(qt(0.025, df=2), qt(0.975, df=2))

[1] -4.303 4.303

So $|t_{0.025}| = t_{0.975} = 4.303$, which using Method 5.15 gives

$$-0.521 \pm 4.303 \cdot 0.029,$$

that is,

$$-0.521 \pm 0.125,$$

so we can say with high confidence that the true parameter value lies in the interval

$$-0.646 \le \beta_1 \le -0.396.$$

b) Can you conclude that there is a relation between the impact strength and
the cooling time at significance level α = 5%?

Solution

The relevant p-value can be read off directly from the summary output: 0.00308, and
we can conclude: Yes, as the relevant p-value is 0.00308, which is smaller than 0.05.

c) For a similar plastic material the tabulated value for the linear relation between temperature and impact strength (i.e. the slope) is −0.30. If the following hypothesis is tested (at level α = 0.05)

$$H_0: \beta_1 = -0.30$$
$$H_1: \beta_1 \ne -0.30$$

with the usual t-test statistic for such a test, what is the range (for t) within which the hypothesis is accepted?

Solution

The so-called critical values for the t-statistic with ν = 2 degrees of freedom are $\pm t_{0.975} = \pm 4.303$ (in R: qt(0.975,2) gives the positive one and qt(0.025,2) the negative one). So the answer becomes:
[−4.303, 4.303].
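
The exercise only asks for the acceptance range, but as an illustration one can also compute the observed test statistic and see where it falls. The sketch below goes one step beyond the question, and the names `t_obs`, `t_crit` and `inside` are introduced here for the example. With the rounded values from the output, the statistic is about (−0.521 + 0.30)/0.029 ≈ −7.6.

```r
# Data from the exercise
x <- c(15, 25, 35, 40)
y <- c(42.1, 36.0, 31.8, 28.7)
fit <- lm(y ~ x)

# Slope estimate and its standard error from the coefficient table
est <- coef(summary(fit))["x", "Estimate"]
se  <- coef(summary(fit))["x", "Std. Error"]

# t-statistic for H0: beta1 = -0.30 and the two-sided 5% critical value
t_obs  <- (est - (-0.30)) / se
t_crit <- qt(0.975, df = df.residual(fit))
c(t_obs = t_obs, t_crit = t_crit)

# TRUE if the statistic lies inside the acceptance range [-t_crit, t_crit]
inside <- abs(t_obs) <= t_crit
inside
```

Under these data the statistic lies outside the range, which is consistent with −0.30 not being contained in the confidence interval for the slope found in a).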
5.5 Water pollution

Exercise 5.5 Water pollution

In a study of pollution in a water stream, the concentration of pollution is measured at 5 different locations. The locations are at different distances from the pollution source. In the table below, these distances and the average pollution concentrations are given:

Distance to the pollution source (in km)     2     4     6     8    10
Average concentration                     11.5  10.2  10.3  9.68  9.32

a) What are the parameter estimates for the three unknown parameters in
the usual linear regression model: 1) The intercept (β 0 ), 2) the slope (β 1 )
and 3) error standard deviation (σ)?

Solution

The question is solved by considering the following R-output:

D <- data.frame(concentration=c(11.5, 10.2, 10.3, 9.68, 9.32),

distance=c(2, 4, 6, 8, 10))
fit <- lm(concentration ~ distance, data=D)
summary(fit)

Call:
lm(formula = concentration ~ distance, data = D)

Residuals:
1 2 3 4 5
0.324 -0.488 0.100 -0.032 0.096

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 11.664 0.365 31.96 0.000067 ***
distance -0.244 0.055 -4.43 0.021 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.348 on 3 degrees of freedom

Multiple R-squared: 0.868, Adjusted R-squared: 0.823
F-statistic: 19.7 on 1 and 3 DF, p-value: 0.0213

Given knowledge of the structure of the R output, the three values can be read off directly from the output.
So the correct answer is: $\hat\beta_0 = 11.7$, $\hat\beta_1 = -0.244$ and $s_e = \hat\sigma = 0.348$.
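
Instead of reading the values off the printed summary, they can also be extracted programmatically. A minimal sketch, using the base R accessors `coef()` and `summary(fit)$sigma`; the variable names `beta` and `sigma_hat` are chosen here for illustration.

```r
# Data from the exercise
D <- data.frame(concentration = c(11.5, 10.2, 10.3, 9.68, 9.32),
                distance = c(2, 4, 6, 8, 10))
fit <- lm(concentration ~ distance, data = D)

beta <- coef(fit)                # intercept and slope estimates
sigma_hat <- summary(fit)$sigma  # estimated error standard deviation

round(c(beta, sigma = sigma_hat), 3)
```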

b) How large a part of the variation in concentration can be explained by the distance?

Solution

The proportion of the variation in the model output (Y) explained by the input variable (x) is given by the squared correlation, which can be read off directly from the output as "Multiple R-squared". So the correct answer is: $R^2 = 86.8\%$. (This is actually an estimate of the proportion of variation in concentration that can be explained by distance, since it is what we found with the particular data at hand. If the sample was taken again, this value would vary. We should really calculate a confidence interval for $R^2$ to understand how accurate this estimate is!)

c) What is a 95% confidence interval for the expected pollution concentration 7 km from the pollution source?

Solution

The wanted number is estimated by the point on the line (using $x_{new} = 7$):

$$-0.244 \cdot 7 + 11.664 = 9.96,$$

and the confidence interval is given by

$$9.96 \pm t_{0.975}(3) \cdot \hat\sigma \sqrt{\frac{1}{5} + \frac{(7 - 6)^2}{S_{xx}}},$$

where $S_{xx} = 4^2 + 2^2 + 0^2 + 2^2 + 4^2 = 40$ and $t_{0.975}(3) = 3.182$ (in R: qt(0.975,3)), so we have that

$$3.182 \cdot 0.348 \sqrt{\frac{1}{5} + \frac{1}{40}} = 0.525.$$

Alternatively, $S_{xx}$ can be found from the sample standard deviation $s_x$:

sd(D$distance)

[1] 3.162

and thus

$$S_{xx} = (n - 1) \cdot s_x^2 = 4 \cdot 3.162^2 = 40.$$

The interval could also have been found by

predict(fit, newdata=data.frame(distance=7), interval="confidence",
        level=0.95)

fit lwr upr

1 9.956 9.431 10.48

So the correct answer is:

$$9.96 \pm 0.525 = [9.43, 10.5].$$

5.6 Membrane pressure drop

Exercise 5.6 Membrane pressure drop

When purifying drinking water you can use a so-called membrane filtration.
In an experiment one wishes to examine the relationship between the pressure
drop across a membrane and the flux (flow per area) through the membrane.
We observe the following 10 related values of pressure (x) and flux (y):

Observation      1     2     3     4     5     6     7     8     9    10
Pressure (x)  1.02  2.08  2.89  4.01  5.32  5.83  7.26  7.96  9.11  9.99
Flux (y)      1.15  0.85  1.56  1.72  4.32  5.07  5.00  5.31  6.17  7.04

Copy this into R to avoid typing in the data:

D <- data.frame(
pressure=c(1.02,2.08,2.89,4.01,5.32,5.83,7.26,7.96,9.11,9.99),
flux=c(1.15,0.85,1.56,1.72,4.32,5.07,5.00,5.31,6.17,7.04)
)

a) What is the estimated empirical correlation between pressure and flux? Also give an interpretation of the correlation.

Solution

The questions are most easily solved by using lm in R:

D <- data.frame(
pressure=c(1.02,2.08,2.89,4.01,5.32,5.83,7.26,7.96,9.11,9.99),
flux=c(1.15,0.85,1.56,1.72,4.32,5.07,5.00,5.31,6.17,7.04)
)
fit <- lm(flux ~ pressure, data=D)
summary(fit)

Call:
lm(formula = flux ~ pressure, data = D)

Residuals:
Min 1Q Median 3Q Max
-0.989 -0.318 -0.140 0.454 1.046

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.1886 0.4417 -0.43 0.68
pressure 0.7225 0.0706 10.23 0.0000072 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.645 on 8 degrees of freedom

Multiple R-squared: 0.929, Adjusted R-squared: 0.92
F-statistic: 105 on 1 and 8 DF, p-value: 0.00000718

The coefficient of determination (see Theorem 5.25) can be read off the R output as 0.929. The sign of the correlation is the same as the sign of the slope, which can be read off to be positive ($\hat\beta_1 = 0.7225$), so the correlation is

$$\hat\rho = r = \sqrt{0.929} = 0.964.$$

So the empirical correlation is 0.964, and thus flux is found to increase with increasing pressure.
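
The relation between the correlation, the slope sign and the coefficient of determination can be verified numerically. The sketch below compares `cor()` with the value reconstructed from the fit; the name `r_from_fit` is introduced here for illustration.

```r
# Data from the exercise
D <- data.frame(
  pressure = c(1.02, 2.08, 2.89, 4.01, 5.32, 5.83, 7.26, 7.96, 9.11, 9.99),
  flux = c(1.15, 0.85, 1.56, 1.72, 4.32, 5.07, 5.00, 5.31, 6.17, 7.04)
)
fit <- lm(flux ~ pressure, data = D)

# Empirical correlation computed directly
r <- cor(D$pressure, D$flux)

# Correlation reconstructed as sign(slope) * sqrt(R-squared)
r_from_fit <- sign(coef(fit)[["pressure"]]) * sqrt(summary(fit)$r.squared)

c(r = r, r_from_fit = r_from_fit)
```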

b) What is a 90% confidence interval for the slope $\beta_1$ in the usual regression model?

Solution

We use the formula for the confidence interval for the slope ($\beta_1$, see Method 5.15). The correct t-quantile to use is $t_{1-0.05}(8) = t_{0.95}(8) = 1.860$ (in R: qt(0.95,8)), and the other values can be read off the summary output.
So the confidence interval is: $0.7225 \pm 1.860 \cdot 0.0706$.
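
To get the numerical limits, the interval can be evaluated with `confint()` at level 0.90. This is a minimal sketch in base R. Evaluating the expression by hand gives roughly 0.7225 ± 0.131, that is, approximately [0.591, 0.854].

```r
# Data from the exercise
D <- data.frame(
  pressure = c(1.02, 2.08, 2.89, 4.01, 5.32, 5.83, 7.26, 7.96, 9.11, 9.99),
  flux = c(1.15, 0.85, 1.56, 1.72, 4.32, 5.07, 5.00, 5.31, 6.17, 7.04)
)
fit <- lm(flux ~ pressure, data = D)

# 90% confidence interval for the slope
ci90 <- confint(fit, parm = "pressure", level = 0.90)
ci90
```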

c) How large a part of the flux variation ($\sum_{i=1}^{10} (y_i - \bar{y})^2$) is not explained by pressure differences?

Solution

The squared correlation, $r^2 = 0.929$, expresses the proportion of variation explained by the model, which means that 1 − 0.929 = 0.071 (that is, 7.1%) is the proportion of variation not explained by the model.

d) Can you at significance level α = 0.05 reject the hypothesis that the line
passes through (0, 0)?

Solution

The hypothesis is the same as

$$H_0: \beta_0 = 0,$$

and the result of this test is provided in the "(Intercept)" row of the summary output. So: No, since the relevant p-value is 0.68, which is larger than α.

e) A confidence interval for the line at three different pressure levels, $x_{new}^A = 3.5$, $x_{new}^B = 5.0$ and $x_{new}^C = 9.5$, will look as follows:

$$\hat\beta_0 + \hat\beta_1 \cdot x_{new}^U \pm C_U,$$

where U is either A, B or C. Write the constants $C_U$ in increasing order.

Solution

The formula for the confidence limits of $\beta_0 + \beta_1 x_{new}$ includes the following term:

$$\frac{(x_{new} - \bar{x})^2}{S_{xx}},$$

and this is the only term in $C_U$ that differs between the three values of U. Since $\bar{x} = 5.547$, it is clear that

$$(5.0 - 5.547)^2 < (3.5 - 5.547)^2 < (9.5 - 5.547)^2,$$

and hence

$$(x_{new}^B - 5.547)^2 < (x_{new}^A - 5.547)^2 < (x_{new}^C - 5.547)^2.$$

So $C_B < C_A < C_C$.
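
The ordering can be confirmed by computing the three half-widths. The exercise does not state a confidence level, so the 95% level used below is an assumption; the ordering itself does not depend on it. The vector name `C_U` mirrors the notation of the exercise.

```r
# Data from the exercise
D <- data.frame(
  pressure = c(1.02, 2.08, 2.89, 4.01, 5.32, 5.83, 7.26, 7.96, 9.11, 9.99),
  flux = c(1.15, 0.85, 1.56, 1.72, 4.32, 5.07, 5.00, 5.31, 6.17, 7.04)
)
fit <- lm(flux ~ pressure, data = D)

# The three pressure levels A, B and C
xnew <- c(A = 3.5, B = 5.0, C = 9.5)

# Standard error of the fitted line at each level
pred <- predict(fit, newdata = data.frame(pressure = xnew), se.fit = TRUE)

# Half-widths C_U of the confidence intervals (95% level assumed)
C_U <- qt(0.975, df = pred$df) * pred$se.fit
names(C_U) <- names(xnew)

sort(C_U)
```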

5.7 Membrane pressure drop (matrix form)

Exercise 5.7 Membrane pressure drop (matrix form)

This exercise uses the data presented in Exercise 5.6 above.

a) Find parameter values, standard errors, t-test statistics, and p-values for the standard hypothesis tests.

Copy this into R to avoid typing in the data:

D <- data.frame(
pressure=c(1.02,2.08,2.89,4.01,5.32,5.83,7.26,7.96,9.11,9.99),
flux=c(1.15,0.85,1.56,1.72,4.32,5.07,5.00,5.31,6.17,7.04)
)

Solution

D <- data.frame(
pressure=c(1.02,2.08,2.89,4.01,5.32,5.83,7.26,7.96,9.11,9.99),
flux=c(1.15,0.85,1.56,1.72,4.32,5.07,5.00,5.31,6.17,7.04)
)
fit <- lm(flux ~ pressure, data=D)
summary(fit)

Call:
lm(formula = flux ~ pressure, data = D)

Residuals:
Min 1Q Median 3Q Max
-0.989 -0.318 -0.140 0.454 1.046

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.1886 0.4417 -0.43 0.68
pressure 0.7225 0.0706 10.23 0.0000072 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.645 on 8 degrees of freedom

Multiple R-squared: 0.929, Adjusted R-squared: 0.92
F-statistic: 105 on 1 and 8 DF, p-value: 0.00000718

The parameter estimates are given in the first column, the standard errors in the second column, the t-test statistics in the third column and the p-values of the standard hypothesis tests in the last column.

b) Reproduce the above numbers by matrix-vector calculations. You will need some matrix notation in R:
– Matrix multiplication ($XY$): X%*%Y
– Matrix transpose ($X^T$): t(X)
– Matrix inverse ($X^{-1}$): solve(X)
– Make a matrix from vectors ($X = [x_1 \; x_2]$, with $x_1$ and $x_2$ as columns): cbind(x1,x2)
See also Example 5.24.

Solution

X <- cbind(1, D$pressure)

y <- D$flux
n <- length(y)
beta <- solve(t(X) %*%X ) %*% t(X) %*% y
beta

[,1]
[1,] -0.1886
[2,] 0.7225

e <- y - X %*% beta

s <- sqrt(sum(e^2)/(n-2))
Vbeta <- s^2 * solve(t(X) %*%X )
se.beta <- sqrt(diag(Vbeta))
t.obs <- beta / se.beta
p.value <- 2 * (1 - pt(abs(t.obs), df = n-2))

## Collect the results in a table
analysis.table <- cbind(beta, se.beta, t.obs, p.value)
analysis.table

             se.beta
[1,] -0.1886 0.44171 -0.4269 0.680696710
[2,]  0.7225 0.07064 10.2269 0.000007177

## Put some names on our table (beta0 = intercept, beta1 = slope)
colnames(analysis.table) <- c("Estimates","Std.Error","t.obs","p.value")
rownames(analysis.table) <- c("beta0","beta1")
analysis.table

      Estimates Std.Error   t.obs     p.value
beta0   -0.1886   0.44171 -0.4269 0.680696710
beta1    0.7225   0.07064 10.2269 0.000007177

## Done!!
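
As a sanity check, the matrix results can be compared entry by entry with the coefficient table that `summary(lm(...))` produces. The sketch below repeats the matrix calculation in a self-contained form; the names `XtX_inv`, `tab` and `max_diff` are introduced here for illustration.

```r
# Data from Exercise 5.6
D <- data.frame(
  pressure = c(1.02, 2.08, 2.89, 4.01, 5.32, 5.83, 7.26, 7.96, 9.11, 9.99),
  flux = c(1.15, 0.85, 1.56, 1.72, 4.32, 5.07, 5.00, 5.31, 6.17, 7.04)
)
fit <- lm(flux ~ pressure, data = D)

# Matrix-vector calculations
X <- cbind(1, D$pressure)
y <- D$flux
n <- length(y)
XtX_inv <- solve(t(X) %*% X)
beta <- XtX_inv %*% t(X) %*% y               # parameter estimates
e <- y - X %*% beta                          # residuals
s <- sqrt(sum(e^2) / (n - 2))                # residual standard error
se.beta <- sqrt(diag(s^2 * XtX_inv))         # standard errors
t.obs <- as.vector(beta) / se.beta           # t statistics
p.value <- 2 * pt(abs(t.obs), df = n - 2, lower.tail = FALSE)

# Largest absolute difference to the lm() coefficient table
tab <- cbind(as.vector(beta), se.beta, t.obs, p.value)
max_diff <- max(abs(tab - coef(summary(fit))))
max_diff
```

A value of `max_diff` at the level of floating-point rounding indicates that the two computations agree.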

5.8 Independence and correlation

Exercise 5.8 Independence and correlation

Consider the layout of the independent variable in Example 5.11.

a) Show that $S_{xx} = \frac{n(n+1)}{12(n-1)}$.

Hint: you can use the following relations

$$\sum_{i=1}^{n} i = \frac{n(n+1)}{2}, \qquad \sum_{i=1}^{n} i^2 = \frac{n(n+1)(2n+1)}{6}.$$

Solution

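$\bar{x}$ becomes

$$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} \frac{i-1}{n-1} = \frac{1}{n(n-1)} \sum_{i=1}^{n} (i-1) = \frac{1}{n(n-1)} \left( \frac{n(n+1)}{2} - n \right) = \frac{1}{2},$$

and $S_{xx}$ becomes

$$
\begin{aligned}
S_{xx} &= \sum_{i=1}^{n} \left( \frac{i-1}{n-1} - \frac{1}{2} \right)^2 \\
&= -\frac{n}{4} + \frac{1}{(n-1)^2} \sum_{i=1}^{n} (i^2 + 1 - 2i) \\
&= -\frac{n}{4} + \frac{1}{(n-1)^2} \cdot \frac{n(n+1)(2n+1) - 6n^2}{6} \\
&= \frac{n}{(n-1)^2} \cdot \frac{4n^2 + 6n + 2 - 12n - 3(n-1)^2}{12} \\
&= \frac{n}{(n-1)^2} \cdot \frac{n^2 - 1}{12} = \frac{n(n+1)}{12(n-1)}.
\end{aligned}
$$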
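
The closed form can be checked numerically against a direct computation. The sketch below uses the layout $x_i = (i-1)/(n-1)$, as it appears in the expression for $\bar{x}$ in the solution; the function names `Sxx_direct` and `Sxx_formula` are introduced here for illustration.

```r
# Sxx computed directly from the layout x_i = (i - 1) / (n - 1)
Sxx_direct <- function(n) {
  x <- (0:(n - 1)) / (n - 1)
  sum((x - mean(x))^2)
}

# Sxx from the closed-form expression
Sxx_formula <- function(n) n * (n + 1) / (12 * (n - 1))

# Compare the two for a range of sample sizes
n_values <- 2:50
max_err <- max(abs(sapply(n_values, Sxx_direct) - sapply(n_values, Sxx_formula)))
max_err
```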

b) Show that the asymptotic correlation between $\hat\beta_0$ and $\hat\beta_1$ is

$$\lim_{n \to \infty} \rho_n(\hat\beta_0, \hat\beta_1) = -\frac{\sqrt{3}}{2}.$$

Solution

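The correlation between $\hat\beta_0$ and $\hat\beta_1$ is

$$
\begin{aligned}
\rho_n(\hat\beta_0, \hat\beta_1) &= \frac{\mathrm{Cov}(\hat\beta_0, \hat\beta_1)}{\sqrt{\mathrm{V}(\hat\beta_0)\,\mathrm{V}(\hat\beta_1)}} \\
&= -\frac{\sigma^2 \bar{x}/S_{xx}}{\sqrt{\sigma^4 \left( \frac{1}{n} + \frac{\bar{x}^2}{S_{xx}} \right) \frac{1}{S_{xx}}}} \\
&= -\frac{\bar{x}/S_{xx}}{\frac{1}{S_{xx}} \sqrt{\frac{S_{xx}}{n} + \bar{x}^2}} \\
&= -\frac{\bar{x}}{\sqrt{\frac{S_{xx}}{n} + \bar{x}^2}}.
\end{aligned}
$$

Notice that the correlation is not a function of the variance ($\sigma^2$), but only a function of the independent variables. Now insert the values of $\bar{x}$ and $S_{xx}$:

$$
\begin{aligned}
\rho_n(\hat\beta_0, \hat\beta_1) &= -\frac{1}{2 \sqrt{\frac{n+1}{12(n-1)} + \frac{1}{4}}} = -\frac{1}{2 \sqrt{\frac{n+1+3(n-1)}{12(n-1)}}} \\
&= -\frac{1}{2 \sqrt{\frac{2n-1}{6(n-1)}}} = -\frac{\sqrt{6(n-1)}}{2 \sqrt{2n-1}} \\
&= -\frac{1}{2} \sqrt{\frac{6(n-1)}{2(n-1/2)}} = -\frac{\sqrt{3}}{2} \sqrt{\frac{n-1}{n-1/2}},
\end{aligned}
$$

which converges to $-\frac{\sqrt{3}}{2}$ for $n \to \infty$.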
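
The result can be illustrated numerically. Since the covariance matrix of the estimates is $\sigma^2 (X^T X)^{-1}$, the correlation can be read from $(X^T X)^{-1}$ alone. The sketch below compares that with the closed form for a few sample sizes; the function names `rho_matrix` and `rho_formula` are introduced here for illustration.

```r
# Correlation between the estimates, read from (X^T X)^{-1}
rho_matrix <- function(n) {
  x <- (0:(n - 1)) / (n - 1)
  X <- cbind(1, x)
  V <- solve(t(X) %*% X)  # proportional to the covariance matrix of the estimates
  cov2cor(V)[1, 2]
}

# Closed-form expression derived in the solution
rho_formula <- function(n) -sqrt(3) / 2 * sqrt((n - 1) / (n - 1 / 2))

n_values <- c(2, 5, 10, 100, 1000)
rho_tab <- cbind(n = n_values,
                 matrix = sapply(n_values, rho_matrix),
                 formula = rho_formula(n_values))
rho_tab

# The limit value for comparison
-sqrt(3) / 2
```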

Consider a layout of the independent variable where n = 2k, with $x_i = 0$ for $i \le k$ and $x_i = 1$ for $k < i \le n$.

c) Find $S_{xx}$ for the new layout of x.

Solution

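$$\bar{x} = \frac{1}{2},$$

and

$$S_{xx}^{new} = \sum_{i=1}^{k} \left( 0 - \frac{1}{2} \right)^2 + \sum_{i=k+1}^{2k} \left( 1 - \frac{1}{2} \right)^2 = \frac{k}{4} + \frac{k}{4} = \frac{k}{2} = \frac{n}{4}.$$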

d) Compare Sxx for the two layouts of x.

Solution

$$\frac{S_{xx}}{S_{xx}^{new}} = \frac{n(n+1)}{12(n-1)} \cdot \frac{4}{n} = \frac{n+1}{3(n-1)} < 1 \quad \text{for } n > 2,$$

which implies that $S_{xx}^{new} > S_{xx}$ for all n > 2.

e) What is the consequence for the parameter variance in the two layouts?

Solution

The larger $S_{xx}$ for the new layout implies that the parameter variance is smaller for the new layout (given that the data come from the same model).

f) Discuss pros and cons of the two layouts.

Solution

The smaller parameter variance for the new layout would suggest that we should use this layout. However, we would not be able to check that the data are in fact generated by a linear model. Consider e.g. data generated by the model

$$y_i = \beta_0 + \beta_1 x_i^2 + \varepsilon_i, \quad \varepsilon_i \sim N(0, \sigma^2).$$

If we only observe at $x_i = 0$ or $x_i = 1$, we will not be able to detect that the relationship is in fact non-linear.
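
This trade-off can be illustrated with a small simulation. The sketch below is only an illustration under assumed values: the sample size (n = 20), the parameters (β0 = 1, β1 = 2, σ = 0.1) and the seed are arbitrary choices made here, not taken from the exercise. With the two-point layout, $x^2 = x$ for every observation, so a quadratic term cannot be separated from the linear one and `lm()` reports its coefficient as `NA`.

```r
set.seed(1)  # arbitrary seed, for reproducibility only
k <- 10
n <- 2 * k

# The two layouts: two-point design and equally spaced design on [0, 1]
x_two  <- rep(c(0, 1), each = k)
x_even <- (0:(n - 1)) / (n - 1)

# Sxx for both layouts: n/4 versus n(n+1)/(12(n-1))
Sxx_two  <- sum((x_two - mean(x_two))^2)
Sxx_even <- sum((x_even - mean(x_even))^2)
c(Sxx_two = Sxx_two, Sxx_even = Sxx_even)

# Data generated from the quadratic model (assumed parameter values)
y_two  <- 1 + 2 * x_two^2 + rnorm(n, sd = 0.1)
y_even <- 1 + 2 * x_even^2 + rnorm(n, sd = 0.1)

# Try to fit a model with both a linear and a quadratic term
fit_two  <- lm(y_two ~ x_two + I(x_two^2))
fit_even <- lm(y_even ~ x_even + I(x_even^2))

coef(fit_two)   # quadratic coefficient is NA: not estimable in this layout
coef(fit_even)  # all three coefficients are estimable
```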
