STA302 Week07 Full
1/48
Important
2/48
Lecture before Midterm
3/48
Week 07- Learning objectives & Outcomes
• Variable transformations.
• More on logarithmic transformation.
• Box-Cox transformation.
• Interpretation of slope after transformation.
• Chapter 4: Simultaneous Inferences
4/48
Variable Transformations
5/48
Transformations
• Why?
• Satisfy model assumptions.
• Improve predictive ability.
• Make it easier to interpret parameters.
• How?
• First fit a linear regression model to the original variables. Diagnostics
may indicate:
• Nonlinearity only: transformation on X.
• Nonlinearity, nonconstant variance, or non-normality: transformation
on Y (a transformation on X might also help).
• Box-Cox transformation.
• Fit a linear regression model after transforming one or both of the
original variables.
• The goal is to make the regression model appropriate for the transformed data.
6/48
Transformations (cont.)
7/48
Transformations on X
[Figure: prototype scatter-plot patterns suggesting transformations on X]
8/48
Transformations on Y
[Figure: prototype patterns suggesting transformations on Y. When the variance of Y is large, the variance of the transformed Y′ is small.]
9/48
Example: Transformation on X
# Xp = sqrt(X) is the transformed predictor (defined with the data below)
fit0 = lm(Y ~ X)
fit1 = lm(Y ~ Xp)
par(mfrow = c(2, 2))
plot(Y ~ X, type = "p", col = "red", main = "Before transformation of X")
plot(Y ~ Xp, type = "p", col = "red", xlab = expression(paste(sqrt(X))),
     main = "After transformation of X")
plot(fit1, 1, main = "After transformation of X")
plot(fit1, 2, main = "After transformation of X")
10/48
Example: Transformation on X (cont.)
• Diagnostics for Ŷ = −10.33 + 84.35 √X: no evidence of lack of fit or
strongly unequal error variances.
[Figure: Y vs X and Y vs √X scatter plots, with Residuals vs Fitted and Normal Q-Q plots for fit1]
11/48
Example: Transformation on X (cont.)
X = c(0.5, 0.5, 1, 1, 1.5, 1.5, 2, 2, 2.5, 2.5)
Y = c(42.5, 50.6, 68.5, 80.7, 89.0, 99.6, 105.3, 111.8, 112.3, 125.7)
Xp = sqrt(X)
13/48
Example: Transformation on Y
Annual US GNP data analysis
• US GNP data (1947-2007)
• Y: annual (adjusted) US GNP (Gross National Product) (in $Billions).
• X: years
[Figure: scatter plot of annual GNP ($Billions) versus year]
14/48
Annual US GNP data analysis (cont.)
• Fitted model M0: GNPt = −315741.23 + 162.43 Yeart + εt
• MSE = 606.4, R² = 0.9583
[Figure: fitted line over the scatter plot, residuals vs year, and Normal Q-Q plot for M0; the residuals show nonlinearity, nonconstant variance, and departures from normality]
15/48
Annual US GNP data analysis (cont.)
[Figure: log(GNP) vs t with fitted line, residuals (M1$res), and Normal Q-Q plot; the residual plot looks good and the Q-Q plot is improved]
16/48
Annual US GNP data analysis (cont.)
• Transform: √GNP, with ti = year − 1947
• Fitted model M2: √GNPi = 36.70997 + 1.1288 ti + εi
• MSE = 1.786, R² = 0.9923
[Figure: √GNP vs t with fitted line, residuals (M2$res), and Normal Q-Q plot; diagnostics look ok]
17/48
Annual US GNP data analysis (cont.)
Which transformation is best?

Y          X                MSE      R²
GNP        year             606.4    0.9583
log(GNP)   t = year − 1947  0.04279  0.9948
√GNP       t = year − 1947  1.786    0.9923

• Note the three MSEs are on different response scales, so they are not directly comparable.
• The log model also has a natural interpretation (constant growth rate): if
GNPi = GNP0 · e^(β1 ti) · e^(εi), then taking logarithms gives
log(GNPi) = log(GNP0) + β1 ti + εi
18/48
Annual US GNP data analysis (cont.)
Based on the logarithmic transformation model,
• Find confidence intervals for β0 and β1.
• Find a confidence interval for E(GNP) when time = 50, i.e. at year
1997.
• Find a prediction interval for GNP when time = 63, i.e. at year 2010.
• Prediction interval on the log scale:
log(Yh): [L, U] = (predicted log(Yh)) ± t(1 − α/2; n − 2) s{pred}
so that P(L < log(Yh) < U) = 0.95.
• Since exp(·) is monotone increasing, exponentiating the endpoints preserves coverage:
P(e^L < Yh < e^U) = 0.95
and [e^L, e^U] is a 95% prediction interval for Yh on the original scale.
19/48
More on logarithmic transformation
20/48
Why Might Logarithms Work?
y = a · e^(bX) ⇒ log(y) = log(a) + bX
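A minimal simulated sketch (not the course data; a and b are made-up values) showing that fitting lm() on the log scale recovers the parameters of an exponential relationship:

```r
# Simulated illustration: y = a * exp(b*X) with multiplicative noise,
# so log(y) = log(a) + b*X + error is a simple linear model.
set.seed(302)
a <- 2; b <- 0.5
X <- seq(0, 5, by = 0.1)
y <- a * exp(b * X) * exp(rnorm(length(X), sd = 0.05))

fit <- lm(log(y) ~ X)
coef(fit)             # intercept estimates log(a) = log(2), slope estimates b
exp(coef(fit)[[1]])   # back-transform the intercept to estimate a
```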
21/48
Example: logarithmic transformation
Data:
• A memory retention experiment in which 13 subjects were asked to
memorize a list of disconnected items. The subjects were then asked
to recall the items at various times up to a week later.
• The proportion of items (y = prop) correctly recalled at various times
(x = time, in minutes) since the list was memorized
22/48
Example: logarithmic transformation (cont.)
• Scatter plot
[Figure: Prop vs Time and Prop vs log(Time); the small Time values are bunched on the original scale and spread out after the log transformation]
23/48
Example: logarithmic transformation (cont.)
[Figure: Residuals vs Fitted and Normal Q-Q plots for Y ~ X (top row) and Y ~ log(X) (bottom row)]
24/48
A summary on variable transformation
Model use
25/48
Box-Cox transformation
26/48
Box-Cox transformation (cont.)
Y′ = Y^λ
λ = 2:    Y′ = Y²
λ = 0.5:  Y′ = √Y
λ = 0:    Y′ = loge(Y) (by definition)
λ = −0.5: Y′ = 1/√Y
λ = −1:   Y′ = 1/Y
27/48
Box-Cox transformation (cont.)
Yi^λ = β0 + β1 Xi + εi
28/48
Box-Cox transformation (cont.)
Wi = K1 (Yi^λ − 1),  λ ≠ 0
Wi = K2 loge(Yi),    λ = 0
where
K1 = 1 / (λ K2^(λ−1)),  K2 = (∏ i=1..n Yi)^(1/n)  (the geometric mean of the Yi)
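The Wi above can be used directly in a grid search: for each λ, regress W on X and record SSE; the estimate λ̂ minimizes SSE. A minimal sketch on simulated data (not the course example); MASS::boxcox(fit) performs the analogous search via the profile likelihood:

```r
# Box-Cox grid search using the standardized variable W_i (simulated data)
set.seed(1)
X <- runif(40, 1, 10)
Y <- (2 + 0.5 * X + rnorm(40, sd = 0.2))^2   # true lambda is about 0.5

K2 <- exp(mean(log(Y)))                       # geometric mean of the Y_i
sse <- function(lambda) {
  W <- if (abs(lambda) < 1e-8) K2 * log(Y)
       else (Y^lambda - 1) / (lambda * K2^(lambda - 1))
  sum(resid(lm(W ~ X))^2)
}
grid <- seq(-1, 2, by = 0.1)
lambda_hat <- grid[which.min(sapply(grid, sse))]
lambda_hat                                    # should land near 0.5
```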
29/48
Box-Cox transformation (cont.)
30/48
Example: Box-Cox transformation
GPA and ACT score
• R code to generate the plot and data are available on portal
[Figure: SSE plotted against λ for λ between 0 and 6]
31/48
Interpretation after Transformations
32/48
Interpretation of slope (β1)
33/48
Interpretation of slope (β1): Level-log model
E(Y | X) = β0 + β1 log(X)
• Interpretation: multiplying X by 2 adds β1 log(2) to
the mean of Y.
• For example
• Y = pH
• X = time after slaughter (hrs.)
• Estimated model: Ŷ = 6.98 − 0.73 log(X)
• Interpretation of b1: it is estimated that for each doubling of time
after slaughter (between 0 and 8 hours), the mean pH decreases by about 0.5
≈ 0.73 × log(2).
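A quick numeric check of the doubling interpretation (coefficients taken from the estimated model above):

```r
# Level-log model: each doubling of X changes the mean of Y by b1*log(2)
b0 <- 6.98; b1 <- -0.73
mean_pH <- function(x) b0 + b1 * log(x)
mean_pH(4) - mean_pH(2)   # change per doubling of time
b1 * log(2)               # same quantity: about -0.506
```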
34/48
Interpretation of slope (β1): Log-level model
37/48
Joint Estimation of β0 and β1
[L0, U0] = b0 ± t(1 − α/2; n − 2) s(b0),  s²(b0) = MSE (1/n + X̄²/SXX)
[L1, U1] = b1 ± t(1 − α/2; n − 2) s(b1),  s²(b1) = MSE / SXX
• What is the confidence coefficient of their joint intervals?
• P(L0 ≤ β0 ≤ U0, L1 ≤ β1 ≤ U1) = ?
• Let 1 − α = 0.95. The two intervals do not jointly provide 95% confidence
for β0 and β1: even if the inferences were independent, the joint coverage
would only be (0.95)² ≈ 0.9025.
38/48
Joint Estimation of β0 and β1 (cont.)
• Let A0 denote the event that the first confidence interval does not
cover β0. Then P(A0) = α.
• Let A1 denote the event that the second confidence interval does not
cover β1. Then P(A1) = α.
• A0ᶜ ∩ A1ᶜ is the event that both confidence intervals cover their
parameters, β0 and β1.
P(A0ᶜ ∩ A1ᶜ) = ?
39/48
Bonferroni inequality: P(A0ᶜ ∩ A1ᶜ) ≥ 1 − 2α
Proof (Venn diagram / inclusion-exclusion):
P(A0ᶜ ∩ A1ᶜ) = 1 − P(A0 ∪ A1)
= 1 − [P(A0) + P(A1) − P(A0 ∩ A1)]   (A0 ∩ A1 is counted twice in P(A0) + P(A1))
= 1 − α − α + P(A0 ∩ A1)
≥ 1 − 2α.   QED
40/48
Joint Estimation of β0 and β1 (cont.)
• With α = 5% per interval, P(A0 ∪ A1) ≤ 5% × 2 = 10%, so the joint
coverage is only guaranteed to be ≥ 90%.
• To ensure P(A0ᶜ ∩ A1ᶜ) ≥ 1 − α, run each interval at level 1 − α/2,
i.e. find a (1 − α/2) C.I. for β0 and β1 respectively:
bi ± B s{bi},  B = t(1 − α/4; n − 2)
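A small simulation (made-up regression model, not course data) checking that Bonferroni-adjusted intervals, obtained from confint() at level 1 − α/2, achieve at least 95% joint coverage:

```r
# Joint coverage of Bonferroni intervals for (beta0, beta1)
set.seed(42)
beta0 <- 1; beta1 <- 2; n <- 30; alpha <- 0.05
x <- seq(1, 10, length.out = n)
covered <- replicate(2000, {
  y <- beta0 + beta1 * x + rnorm(n)
  ci <- confint(lm(y ~ x), level = 1 - alpha / 2)  # alpha/2 per interval
  ci[1, 1] <= beta0 && beta0 <= ci[1, 2] &&
    ci[2, 1] <= beta1 && beta1 <= ci[2, 2]
})
mean(covered)   # joint coverage; should be at least 1 - alpha = 0.95
```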
41/48
Example: Joint Estimation of β0 and β1
toluca=read.table(
"/Users/Wei/TA/Teaching/0-STA302-2016F/Week07-Oct24/toluca.txt",
col.names = c("lotsize", "workhrs"))
# plot(toluca$lotsize,toluca$workhrs)
modt = lm(lotsize~workhrs,data=toluca)
confint(modt)
## 2.5 % 97.5 %
## (Intercept) -17.1880966 13.4715943
## workhrs 0.1838466 0.2763702
# Bonferroni adjustment: level = 1 - 0.05/2 = 0.975 to ensure
# P(L0 <= beta0 <= U0, L1 <= beta1 <= U1) >= 0.95
confint(modt, level = 0.975)
##                    1.25 %    98.75 %
## (Intercept) -19.6277718 15.9112696
## workhrs       0.1764842  0.2837326
42/48
Joint Estimation of β0 and β1 (cont.)
• The Bonferroni inequality only guarantees P(A0ᶜ ∩ A1ᶜ) ≥ 1 − 2α; it is
conservative, and when α is large the lower bound 1 − 2α becomes too
small to be useful.
43/48
Simultaneous Estimation of mean response
• Working-Hotelling procedure
• Bonferroni procedure
44/48
Simultaneous Estimation of mean response
Working-Hotelling procedure:
• Based on the confidence band for the regression line (Chap. 2.6).
• The confidence band contains the entire regression line, so it contains
the mean responses at all X levels.
• The simultaneous confidence limits for g mean responses E{Yh}:
Ŷh ± W s{Ŷh},  W² = 2 F(1 − α; 2, n − 2)
where
Ŷh = b0 + b1 Xh,  s²{Ŷh} = MSE [1/n + (Xh − X̄)²/SXX]
Bonferroni procedure:
• The Bonferroni confidence limits for E{Yh} at g levels Xh with 1 − α
family confidence coefficient:
Ŷh ± B s{Ŷh},  B = t(1 − α/(2g); n − 2)
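A sketch comparing the two procedures on simulated data (not the Toluca example); in practice one uses whichever multiplier, W or B, is smaller:

```r
# Working-Hotelling vs Bonferroni simultaneous CIs for g mean responses
set.seed(7)
n <- 25; alpha <- 0.05
x <- runif(n, 20, 120)
y <- 62 + 3.6 * x + rnorm(n, sd = 50)
fit <- lm(y ~ x)

xh <- c(30, 65, 100); g <- length(xh)
pr <- predict(fit, newdata = data.frame(x = xh), se.fit = TRUE)

W <- sqrt(2 * qf(1 - alpha, 2, n - 2))   # Working-Hotelling multiplier
B <- qt(1 - alpha / (2 * g), n - 2)      # Bonferroni multiplier
wh  <- cbind(pr$fit - W * pr$se.fit, pr$fit + W * pr$se.fit)
bon <- cbind(pr$fit - B * pr$se.fit, pr$fit + B * pr$se.fit)
c(W = W, B = B)                          # here B is slightly smaller than W
```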
45/48
Example: Simultaneous Estimation of mean response
• Toluca data example
46/48
Simultaneous Prediction Intervals for New observation
Ŷh ± S s{pred},  S² = g F(1 − α; g, n − 2)
• s²{pred} = MSE [1 + 1/n + (Xh − X̄)²/SXX]
• Reference: http://rstudio-pubs-static.s3.amazonaws.com/5218_61195adcdb7441f7b08af3dba795354f.html
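A sketch of the simultaneous prediction intervals above, on simulated data (not course data); s{pred} is rebuilt from predict()'s se.fit and residual.scale components:

```r
# Simultaneous prediction intervals for g new observations,
# with multiplier S^2 = g * F(1 - alpha; g, n - 2) from the slide
set.seed(11)
n <- 25; alpha <- 0.05
x <- runif(n, 20, 120)
y <- 62 + 3.6 * x + rnorm(n, sd = 50)
fit <- lm(y ~ x)

xh <- c(40, 90); g <- length(xh)
S <- sqrt(g * qf(1 - alpha, g, n - 2))
pr <- predict(fit, newdata = data.frame(x = xh), se.fit = TRUE)
s_pred <- sqrt(pr$se.fit^2 + pr$residual.scale^2)   # MSE*(1 + 1/n + (xh - xbar)^2/Sxx)
cbind(lwr = pr$fit - S * s_pred, upr = pr$fit + S * s_pred)
```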
47/48
Practice problems and upcoming topics
48/48