
Regression

The document discusses simple linear regression. Simple linear regression models the relationship between a single independent variable (X) and dependent variable (Y). The model takes the form Y = β0 + β1X + ε, where β0 is the intercept, β1 is the slope coefficient, and ε is the error term. The document provides examples of simple linear regression, discusses model estimation using least squares, and lists assumptions of the simple linear regression model such as the linear relationship between X and Y and that the errors are normally distributed.


2603282

Stat BIO
(Section 2)

Instructor:
Sawitree Boonpatcharanon

Week 14: 16 November, 2023

1/ 53
Flow

2/ 53
Regression
Objective To study the relationship between two types of variables (X and Y )
and/or use X to predict Y .

3/ 53
Regression
Example 1 American Express Company has long believed that its cardholders
tend to travel more extensively than others - both on business and for pleasure.
As part of a comprehensive research effort undertaken by a New York market
research firm on behalf of American Express, a study was conducted to
determine the relationship between travel and charges on the American Express
card. The research firm selected a random sample of 25 cardholders from the
American Express computer file and recorded their total charges over a
specified period. For the selected cardholders, information was also obtained,
through a mailed questionnaire, on the total number of miles traveled by each
cardholder during the same period.

4/ 53
Regression
Example 1: Data Layout

5/ 53
Regression
Example 2 Alka-Seltzer recently embarked on an in-store promotional
campaign, with displays of its antacid featured prominently in supermarkets.
The company also ran its usual radio and television commercials. Over a period
of 10 weeks, the company kept track of its expenditure on radio and television
advertising, variable X1 , as well as its spending on in-store displays, variable X2 .
The resulting sales for each week in the area studied were recorded as the
dependent variable Y. The company analyst conducting the study hypothesized
a linear regression model of the form linking sales volume with the two
independent variables, advertising and in-store promotions. The analyst wanted
to use the available data, considered a random sample of 10 weekly
observations, to estimate the parameters of the regression relationship.

6/ 53
Regression
Example 2: Data Layout

7/ 53
Regression: Model
1 Simple Linear Regression (Example 1)

One X → Simple Linear Regression (SLR)

Y = β0 + β1X + ε

2 Multiple Linear Regression (Example 2)

More than one X → Multiple Linear Regression (MLR)

Y = β0 + β1X1 + β2X2 + ··· + βkXk + ε
8/ 53
Regression: Model
Note
1 The relationship between X and Y is linear.

2 We call X (or X1, X2, . . . , Xk) the independent variables or predictor variables. We have these data.

3 Y is the dependent variable, the variable that we want to predict. We have these data.

4 ε is the error term or the residual term. We do not have these data.

5 The data type of all X and Y is at least interval scale. (X can be a categorical variable, but that requires more steps than the material in this course.)

6 β0, β1, β2, . . . , βk are the coefficients. We will estimate these values to get the relationship or prediction model.

9/ 53
Regression: Relationship between one X and Y

source: Complete Business Statistics 7th Edition, Aczel - Sounderpandian


10/ 53
Simple Linear Regression: Model
(Handwritten note, translated: find β0 and β1 by making the error as small as possible; use ordinary least squares.)

Y = β0 + β1X + ε

β0 is the intercept of the model.

β1 is the slope of the model.

E(Y | X) = β0 + β1X ⇒ Average of Y given X

11/ 53

Simple Linear Regression: Estimation

Y = β0 + β1X + ε

Parameters: β0, β1

ŷ = b0 + b1x

Estimators: β̂0 = b0, β̂1 = b1

Also get ε̂ = e from Y − ŷ

Important !! Choose the best-fitting line for the data: the best model has the smallest error.

12/ 53
Simple Linear Regression: Estimation

13/ 53
Simple Linear Regression: Estimation
Estimation method: the “Least Squares method”
The objective is to minimize the Sum of Squared Errors (SSE):

SSE = Σ_{i=1}^n e_i² = Σ_{i=1}^n (y_i − ŷ_i)²

14/ 53
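The least-squares estimation on this slide can be sketched in a few lines of Python. This is a minimal illustration on made-up numbers, not the slides' dataset, and not the course's software (the slides use SPSS and Excel):

```python
# Sketch of least-squares estimation for simple linear regression,
# using the closed-form solution that minimizes SSE.
# The data below are made-up illustration values, not the slides' dataset.

def fit_slr(x, y):
    """Return (b0, b1) minimizing SSE = sum((y_i - b0 - b1*x_i)^2)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    ss_x = sum((xi - x_bar) ** 2 for xi in x)
    b1 = ss_xy / ss_x          # slope = SSxy / SSx
    b0 = y_bar - b1 * x_bar    # intercept = y-bar - b1 * x-bar
    return b0, b1

def sse(x, y, b0, b1):
    """Sum of squared errors for the fitted line."""
    return sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))

x = [1, 2, 3, 4]
y = [3, 5, 7, 9]          # exactly y = 1 + 2x, so SSE should be 0
b0, b1 = fit_slr(x, y)
print(b0, b1, sse(x, y, b0, b1))
```

On data lying exactly on a line, the fit recovers that line and SSE is 0; on real data SSE is the smallest value any straight line can achieve.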
Simple Linear Regression: Estimation

SPSS Results

15/ 53
Simple Linear Regression: Example
The simple linear regression model for this data set is

Y = β0 + β1X + ε.

We then get

ŷ = b0 + b1x
ŷ = 274.85 + 1.255x.

Next, we need to confirm that there is a relationship between x and y (the slope must be nonzero for X to stay in the model). ⇒ It means we need to do a hypothesis test that β1 ≠ 0.

Hypothesis setting: H0: β1 = 0, H1: β1 ≠ 0

Test statistic: tcal = (b1 − 0) / s(b1)

ŷ = 274.85 + 1.255x

16/ 53
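The slope test statistic above can be computed by hand once SSx, SSy, and SSxy are known; a minimal Python sketch on a tiny made-up dataset (not the American Express data):

```python
import math

# Sketch of the slope t-test t_cal = (b1 - 0) / s(b1),
# using SSE = SSy - b1*SSxy and s(b1) = sqrt(MSE) / sqrt(SSx).
# The three data points are made up for illustration.

def slope_t_test(x, y):
    """Return (b1, t_cal) for H0: beta1 = 0 vs H1: beta1 != 0."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    ss_x = sum((xi - x_bar) ** 2 for xi in x)
    ss_y = sum((yi - y_bar) ** 2 for yi in y)
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = ss_xy / ss_x
    sse = ss_y - b1 * ss_xy                    # SSE = SSy - b1*SSxy
    mse = sse / (n - 2)                        # df = n - 2 for SLR
    s_b1 = math.sqrt(mse) / math.sqrt(ss_x)    # s(b1) = s / sqrt(SSx)
    return b1, b1 / s_b1

b1, t_cal = slope_t_test([1, 2, 3], [1, 2, 4])
print(b1, t_cal)   # compare t_cal with t_{alpha/2, n-2} from a t table
```

The decision rule is the usual t-test one: reject H0 when |t_cal| exceeds the tabled t_{α/2, n−2}.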
Simple Linear Regression: Assumptions

Y = β0 + β1X + ε

Model assumptions

1 The relationship between X and Y is a straight-line relationship.

2 The errors ε are normally distributed with mean 0 and a constant variance σ². The errors are uncorrelated (not related) with one another in successive observations. In symbols:

ε ~ N(0, σ²)

17/ 53
Simple Linear Regression: Assumptions
• The relationship between X and Y is a straight-line relationship.

18/ 53
Simple Linear Regression: Assumptions
• ε ~ N(0, σ²): normality assumption

• Kolmogorov–Smirnov (K-S) or Shapiro–Wilk (S-W) test

19/ 53
Simple Linear Regression: Assumptions
• The errors ε are normally distributed with mean 0 and a constant variance σ². The errors are uncorrelated (not related) with one another in successive observations.

20/ 53
Simple Linear Regression: Assumptions
• The errors ε are normally distributed with mean 0 and a constant variance σ². The errors are uncorrelated (not related) with one another in successive observations.

21/ 53
Simple Linear Regression: Example
Example 1: American Express Company . . . , a study was conducted to
determine the relationship between travel and charges on the American Express
card. . . .

22/ 53
Simple Linear Regression: Example
First, see whether X and Y pass the linearity assumption. However, the error term, ε, cannot be checked right now. We need to wait until we get ŷ, so that we can calculate the residuals e.

23/ 53
Simple Linear Regression: Example Con’t
Check assumption

24/ 53
Simple Linear Regression: Interpretation
The simple linear regression model for this data set is

ŷ = b0 + b1x
ŷ = 274.85 + 1.26x

In practice, the two objectives of a regression model are

1 to study the relationship

b1: when x increases by 1 unit, y increases by b1 units on average.

2 to make a prediction, e.g. when x = 2,000, ŷ = 2,785.52, e = .....

25/ 53
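Prediction with the fitted line is a direct plug-in; a sketch using the slide's rounded coefficients (the slide's value 2,785.52 comes from the unrounded estimates, so the rounded coefficients give a slightly different number):

```python
# Prediction from the fitted line yhat = b0 + b1 * x, using the slide's
# ROUNDED coefficients; the slide's 2,785.52 used unrounded estimates.
b0, b1 = 274.85, 1.255

def predict(x):
    """Point prediction of y at a given x (within the estimation range)."""
    return b0 + b1 * x

print(predict(2000))   # 2784.85 with these rounded coefficients
```

Per the note that follows, this is only trustworthy for x inside the range of the data used to fit the line.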
Simple Linear Regression: Prediction
Note

1 We use x to predict y but not vice versa.

2 You should be aware that using a regression for extrapolating outside the
estimation range is risky, as the estimated relationship may not be
appropriate outside this range.

26/ 53
Simple Linear Regression: Model Evaluation
How good is the model?
The coefficient of determination R² is a measure of the strength of the regression relationship, a measure of how well the regression line fits the data.

R² = SSR/SST = 1 − SSE/SST ; 0 ≤ R² ≤ 1

The coefficient of determination can be interpreted as the proportion of the variation in Y that is explained by the regression relationship of Y with X.

27/ 53
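For simple linear regression, R² can also be computed directly from the sums of squares as (SSxy)²/(SSx·SSy), which equals 1 − SSE/SST. A sketch on made-up data (same toy numbers as the earlier sketches):

```python
# Sketch of R^2 for a simple linear fit, via R^2 = SSxy^2 / (SSx * SSy),
# which is algebraically equal to 1 - SSE/SST. Made-up data.

def r_squared(x, y):
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    ss_x = sum((xi - x_bar) ** 2 for xi in x)
    ss_y = sum((yi - y_bar) ** 2 for yi in y)
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return ss_xy ** 2 / (ss_x * ss_y)

print(r_squared([1, 2, 3], [1, 2, 4]))   # proportion of variation explained
```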
Simple Linear Regression: Model Evaluation

28/ 53
Simple Linear Regression: Example Con’t

R² = (SSxy)² / (SSx · SSy) = 0.97

Meaning: 97% of the variation in the dollars spent on the American Express card (Y) can be explained by the relationship between the amount spent (Y) and the travel miles (X).

29/ 53
Multiple Linear Regression
Example 2 Alka-Seltzer recently embarked on an in-store promotional
campaign, with displays of its antacid featured prominently in supermarkets.
The company also ran its usual radio and television commercials. Over a period
of 10 weeks, the company kept track of its expenditure on radio and television
advertising, variable X1 , as well as its spending on in-store displays, variable X2 .
The resulting sales for each week in the area studied were recorded as the
dependent variable Y. The company analyst conducting the study hypothesized
a linear regression model of the form linking sales volume with the two
independent variables, advertising and in-store promotions. The analyst wanted
to use the available data, considered a random sample of 10 weekly
observations, to estimate the parameters of the regression relationship.

30/ 53
Multiple Linear Regression
Example 2: Data Layout

31/ 53
Multiple Linear Regression
1 Simple Linear Regression

Y = β0 + β1X + ε

2 Multiple Linear Regression

Y = β0 + β1X1 + β2X2 + ··· + βkXk + ε

32/ 53
Multiple Linear Regression: Assumptions
Model assumptions

1 The relationship between X and Y is a straight-line relationship.

2 The errors ε are normally distributed with mean 0 and a constant variance σ². The errors are uncorrelated (not related) with one another in successive observations. In symbols:

ε ~ N(0, σ²)

3 All independent variables are uncorrelated. No multicollinearity problem!

33/ 53
Multiple Linear Regression: Estimation
Example 2:

We will get b0 = 47.165, b1 = 1.599, b2 = 1.149.

34/ 53
Multiple Linear Regression: Estimation
Example 2 Con’t:
How to get b0 = 47.165, b1 = 1.599, b2 = 1.149?
Excel: Data ! Data Analysis ! Regression

ŷ = 47.165 + 1.599x1 + 1.149x2

35/ 53
Multiple Linear Regression: Hypothesis Testing

Y = β0 + β1X1 + β2X2 + ··· + βkXk + ε

A hypothesis test for the existence of a linear relationship between any of the Xi and Y is

H0: β1 = β2 = β3 = ··· = βk = 0
H1: Not all βi are zero.

Looks similar to the . . . . . . . . . . . . . . . . . . . . test

36/ 53
Multiple Linear Regression: Hypothesis Testing

Variation    Sum of Squares   df            Mean Square                 F Ratio
Regression   SSR              k             MSR = SSR / k               Fcal = MSR / MSE
Error        SSE              n − (k + 1)   MSE = SSE / (n − (k + 1))
Total        SST              n − 1

Excel: Data ! Data Analysis ! Regression


Example 2 Con’t:

37/ 53
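The F ratio in the ANOVA table above can be computed from the sums of squares and the degrees of freedom; a sketch with made-up numbers (not Example 2's output):

```python
# Sketch of the ANOVA F ratio from the table:
# MSR = SSR/k, MSE = SSE/(n-(k+1)), F_cal = MSR/MSE.
# The sums of squares below are made up for illustration.

def f_ratio(ssr, sse, n, k):
    msr = ssr / k              # mean square for regression
    mse = sse / (n - (k + 1))  # mean square for error
    return msr / mse

print(f_ratio(ssr=80.0, sse=20.0, n=10, k=2))
```

The result is compared against the tabled F_{α, k, n−(k+1)} to decide whether any βi differs from zero.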
Multiple Linear Regression: Hypothesis Testing
Rejection region: Fcal > Ftable = F_{α, k, n−(k+1)}

38/ 53
Multiple Linear Regression: Hypothesis Testing
Next question: Which βi is/are significant?
Test the individual regression slope parameters:

H0: βi = 0
Ha: βi ≠ 0

Test statistic:

tcal = (bi − 0) / s(bi) ; df = n − (k + 1)

where s(bi) = s / √SSxi and s = √MSE is the standard error of bi.

This test is under the assumption that the regression errors are normally distributed.

39/ 53
Multiple Linear Regression: Hypothesis Testing
Example 2 Con’t:

ŷ = 47.165 + 1.599x1 + 1.149x2

SPSS

40/ 53
Multiple Linear Regression: Assumptions
Example 2 Con’t:

41/ 53
Multiple Linear Regression: Assumptions
Multicollinearity
Ideally, the Xi variables in a regression equation are uncorrelated with one another; each variable contains a unique piece of information about Y, information that is not contained in any of the other Xi.
Excel: Data ! Data Analysis ! Correlation

42/ 53
Multiple Linear Regression: Assumptions
The effects of multicollinearity
1 The variances (and standard errors) of the regression coefficient estimators are inflated.

2 The magnitudes of the regression coefficient estimates may be different from what we expect.

3 The signs of the regression coefficient estimates may be the opposite of what we expect.

4 Adding or removing variables produces large changes in the coefficient estimates or their signs.

5 Removing a data point causes large changes in the coefficient estimates or their signs.

6 In some cases, the F ratio is significant, but none of the t ratios is.

43/ 53
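A first screen for multicollinearity is the pairwise correlation between predictors, which is what the Excel Correlation tool on the previous slide reports. A pure-Python sketch on made-up data, where values near ±1 signal trouble:

```python
import math

# Sketch of a multicollinearity screen: the Pearson correlation between
# two predictors. Values near +/-1 signal a multicollinearity problem.
# The predictor values below are made up for illustration.

def pearson_r(u, v):
    n = len(u)
    u_bar, v_bar = sum(u) / n, sum(v) / n
    ss_uv = sum((a - u_bar) * (b - v_bar) for a, b in zip(u, v))
    ss_u = sum((a - u_bar) ** 2 for a in u)
    ss_v = sum((b - v_bar) ** 2 for b in v)
    return ss_uv / math.sqrt(ss_u * ss_v)

x1 = [1, 2, 3, 4, 5]
x2 = [2, 4, 6, 8, 10]     # exactly 2 * x1: perfectly collinear
print(pearson_r(x1, x2))  # 1.0 -> severe multicollinearity
```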
Multiple Linear Regression: Coefficient of Determination
The multiple coefficient of determination R² is

R² = SSR/SST = 1 − SSE/SST ; 0 ≤ R² ≤ 1

The adjusted multiple coefficient of determination is

R²adj = 1 − [SSE / (n − (k + 1))] / [SST / (n − 1)] ; 0 ≤ R²adj ≤ 1

R²adj is always less than R².

44/ 53
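The two formulas above translate directly into code; a sketch with made-up sums of squares showing that the adjusted value is smaller:

```python
# Sketch of R^2 and adjusted R^2 from the slide's formulas,
# with made-up sums of squares (not Example 2's output).

def r2_and_adjusted(sse, sst, n, k):
    r2 = 1 - sse / sst
    r2_adj = 1 - (sse / (n - (k + 1))) / (sst / (n - 1))
    return r2, r2_adj

r2, r2_adj = r2_and_adjusted(sse=20.0, sst=100.0, n=10, k=2)
print(r2, r2_adj)   # adjusted R^2 is smaller than R^2
```

The adjustment penalizes adding predictors: with the same SSE, a larger k shrinks R²adj.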
Appendix

45/ 53
Simple Linear Regression: Estimation

ŷ_i = b0 + b1 x_i

Minimize SSE through the normal equations, which are

Σ_{i=1}^n y_i = n b0 + b1 Σ_{i=1}^n x_i                    (1)

Σ_{i=1}^n x_i y_i = b0 Σ_{i=1}^n x_i + b1 Σ_{i=1}^n x_i²   (2)

Hence, from (1) and (2), the intercept is

b0 = ȳ − b1 x̄

and the slope is

b1 = [Σ_{i=1}^n x_i y_i − (Σ_{i=1}^n x_i)(Σ_{i=1}^n y_i)/n] / [Σ_{i=1}^n x_i² − (Σ_{i=1}^n x_i)²/n] = SSxy / SSx
46/ 53
Simple Linear Regression: Estimation

SSx = Σ_{i=1}^n (x_i − x̄)² = Σ_{i=1}^n x_i² − (Σ_{i=1}^n x_i)²/n

SSy = Σ_{i=1}^n (y_i − ȳ)² = Σ_{i=1}^n y_i² − (Σ_{i=1}^n y_i)²/n

SSxy = Σ_{i=1}^n (x_i − x̄)(y_i − ȳ) = Σ_{i=1}^n x_i y_i − (Σ_{i=1}^n x_i)(Σ_{i=1}^n y_i)/n

Note: SS comes from Sum of Squares. Therefore,

b1 = SSxy / SSx

47/ 53
Simple Linear Regression: Hypothesis testing
A hypothesis test for the existence of a linear relationship between X and Y is

H0: β1 = 0
H1: β1 ≠ 0

H0, Ha can be written as one-sided too.

Test statistic
Given the assumption of normality of the regression errors, the test statistic has the t distribution with n − 2 degrees of freedom.

tcal = (b1 − 0) / s(b1)

For the critical region, we use the same rules as when we do the t-test.

48/ 53
Simple Linear Regression: Hypothesis testing
From tcal = (b1 − 0) / s(b1),

s(b1) = s / √SSx

where s² is an unbiased estimator of σ²; then

s = √MSE

and

MSE = SSE / (n − 2)

where

SSE = Σ_{i=1}^n e_i² = Σ_{i=1}^n (y_i − ŷ_i)²
    = SSy − (SSxy)² / SSx
    = SSy − b1 SSxy.

49/ 53
Simple Linear Regression: Hypothesis testing
If you reject H0, β1 ≠ 0. Your regression model will be

ŷ = b0 + b1x.

If you fail to reject (FTR) H0, β1 = 0. Your regression model will be

ŷ = b0.

Moreover, we can also find the CI of β0 and β1.

A (1 − α)100% confidence interval for β0 is

b0 ± t_{α/2, n−2} s(b0).

A (1 − α)100% confidence interval for β1 is

b1 ± t_{α/2, n−2} s(b1).

50/ 53
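The confidence interval for β1 combines the earlier pieces: b1, MSE, and s(b1). A sketch on the same tiny made-up dataset; the critical value 12.706 is the tabled t_{0.025, n−2} for this toy case (n = 3, so df = 1), and in practice it would come from a t table or software:

```python
import math

# Sketch of the (1 - alpha)100% CI for beta1: b1 +/- t_{alpha/2, n-2} * s(b1).
# t_crit must come from a t table or software; 12.706 is t_{0.025, 1},
# matching the made-up 3-point dataset below (df = n - 2 = 1).

def slope_ci(x, y, t_crit):
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    ss_x = sum((xi - x_bar) ** 2 for xi in x)
    ss_y = sum((yi - y_bar) ** 2 for yi in y)
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = ss_xy / ss_x
    mse = (ss_y - b1 * ss_xy) / (n - 2)   # MSE = SSE / (n - 2)
    s_b1 = math.sqrt(mse / ss_x)          # s(b1) = s / sqrt(SSx)
    return b1 - t_crit * s_b1, b1 + t_crit * s_b1

lo, hi = slope_ci([1, 2, 3], [1, 2, 4], t_crit=12.706)
print(lo, hi)   # interval is centered on b1
```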
Simple Linear Regression: Model Evaluation
where

SST = SSE + SSR

Σ_{i=1}^n (y_i − ȳ)² = Σ_{i=1}^n (y_i − ŷ_i)² + Σ_{i=1}^n (ŷ_i − ȳ)²

SSy = (SSy − b1 SSxy) + b1 SSxy

y_i − ȳ = (y_i − ŷ_i) + (ŷ_i − ȳ)
Total deviation = Unexplained deviation (error) + Explained deviation (regression)

Therefore,

R² = (SSxy)² / (SSx SSy) ; 0 ≤ R² ≤ 1

51/ 53
Multiple Linear Regression: Estimation
Example of two independent variables
The objective is to minimize SSE through the normal equations, which are

Σ_{i=1}^n y_i = n b0 + b1 Σ_{i=1}^n x_{1i} + b2 Σ_{i=1}^n x_{2i}

Σ_{i=1}^n x_{1i} y_i = b0 Σ_{i=1}^n x_{1i} + b1 Σ_{i=1}^n x_{1i}² + b2 Σ_{i=1}^n x_{1i} x_{2i}

Σ_{i=1}^n x_{2i} y_i = b0 Σ_{i=1}^n x_{2i} + b1 Σ_{i=1}^n x_{1i} x_{2i} + b2 Σ_{i=1}^n x_{2i}²

To find b0, b1, b2, we calculate all the summations above, substitute them into the equations, and solve the three equations.

52/ 53
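The three normal equations above can be set up and solved programmatically. A pure-Python sketch on made-up data generated exactly from y = 1 + 2·x1 + 3·x2, so the solver should recover those coefficients:

```python
# Sketch: build the 3x3 normal-equations system for two predictors
# and solve it by Gauss-Jordan elimination. Made-up data.

def solve3(a, rhs):
    """Solve a 3x3 linear system by Gauss-Jordan elimination with pivoting."""
    m = [row[:] + [r] for row, r in zip(a, rhs)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(3):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [v - f * w for v, w in zip(m[r], m[col])]
    return [m[i][3] / m[i][i] for i in range(3)]

def fit_mlr2(x1, x2, y):
    """Least-squares (b0, b1, b2) from the normal equations on the slide."""
    n, s = len(y), sum
    a = [[n,     s(x1),                            s(x2)],
         [s(x1), s(v * v for v in x1),             s(u * v for u, v in zip(x1, x2))],
         [s(x2), s(u * v for u, v in zip(x1, x2)), s(v * v for v in x2)]]
    rhs = [s(y),
           s(u * v for u, v in zip(x1, y)),
           s(u * v for u, v in zip(x2, y))]
    return solve3(a, rhs)

x1 = [0, 1, 0, 1]
x2 = [0, 0, 1, 1]
y  = [1, 3, 4, 6]        # exactly 1 + 2*x1 + 3*x2
print(fit_mlr2(x1, x2, y))
```

In practice one would use Excel's Regression tool or SPSS, as the slides do; this sketch just makes the "substitute the summations and solve" step concrete.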
Ending Ticket

https://forms.gle/dPEQqPTNNi16c34z6

After you submit the form, you will get a receipt ticket. Please
check your email address and keep it as your evidence.

53/ 53
