Regression
Stat BIO
(Section 2)
Instructor:
Sawitree Boonpatcharanon
2603282
Flow
Regression
Objective: To study the relationship between two types of variables (X and Y)
and/or to use X to predict Y.
Regression
Example 1: American Express Company has long believed that its cardholders
tend to travel more extensively than others, both on business and for pleasure.
As part of a comprehensive research effort undertaken by a New York market
research firm on behalf of American Express, a study was conducted to
determine the relationship between travel and charges on the American Express
card. The research firm selected a random sample of 25 cardholders from the
American Express computer file and recorded their total charges over a
specified period. For the selected cardholders, information was also obtained,
through a mailed questionnaire, on the total number of miles traveled by each
cardholder during the same period.
Regression
Example 1: Data Layout
Regression
Example 2: Alka-Seltzer recently embarked on an in-store promotional
campaign, with displays of its antacid featured prominently in supermarkets.
The company also ran its usual radio and television commercials. Over a period
of 10 weeks, the company kept track of its expenditure on radio and television
advertising, variable $X_1$, as well as its spending on in-store displays, variable $X_2$.
The resulting sales for each week in the area studied were recorded as the
dependent variable $Y$. The company analyst conducting the study hypothesized
a linear regression model of the form $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \epsilon$,
linking sales volume with the two independent variables, advertising and in-store
promotions. The analyst wanted to use the available data, considered a random
sample of 10 weekly observations, to estimate the parameters of the regression
relationship.
Regression
Example 2: Data Layout
Regression: Model
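1. Simple Linear Regression (Example 1)
$$Y = \beta_0 + \beta_1 X + \epsilon$$
2. Multiple Linear Regression (Example 2)
$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_k X_k + \epsilon$$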
Regression: Model
Note
1. The relationship between X and Y is linear.
4. $\epsilon$ is the error term (also called the residual term). It is not observed, so we have no data for it.
Regression: Relationship between one X and Y
Use ordinary least squares (OLS).
$$Y = \beta_0 + \beta_1 X + \epsilon$$
Simple Linear Regression: Estimation
$$Y = \beta_0 + \beta_1 X + \epsilon$$
Parameters: $\beta_0$, $\beta_1$
$$\hat{y} = b_0 + b_1 x$$
Estimators: $\hat{\beta}_0 = b_0$, $\hat{\beta}_1 = b_1$
Simple Linear Regression: Estimation
Estimation method: the least squares method.
The objective is to minimize the sum of squared errors (SSE):
$$SSE = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$
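As a rough illustration of the least squares idea, the sketch below computes SSE for a candidate line and checks that the least squares line gives a smaller SSE than two nearby lines. The data values and the helper names `sse` and `least_squares_line` are made up for illustration; they are not the American Express data from Example 1.

```python
def sse(x, y, b0, b1):
    """Sum of squared errors for the line y_hat = b0 + b1 * x."""
    return sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))


def least_squares_line(x, y):
    """Return (b0, b1) that minimize SSE in a simple linear regression."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    ss_x = sum((xi - x_bar) ** 2 for xi in x)
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = ss_xy / ss_x
    b0 = y_bar - b1 * x_bar
    return b0, b1


# Hypothetical data (not from the lecture examples).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1 = least_squares_line(x, y)
best = sse(x, y, b0, b1)

# Moving the intercept or the slope away from the fit should raise SSE.
shifted = sse(x, y, b0 + 0.5, b1)
tilted = sse(x, y, b0, b1 + 0.1)

print(f"b0={b0:.3f}, b1={b1:.3f}, SSE={best:.4f}")
print(f"SSE (shifted intercept)={shifted:.4f}, SSE (tilted slope)={tilted:.4f}")
```

The comparison lines are arbitrary; any other choice of intercept and slope should also give an SSE at least as large as the fitted one.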
Simple Linear Regression: Estimation
SPSS Results
Simple Linear Regression: Example
The simple linear regression model for this data set is
$$Y = \beta_0 + \beta_1 X + \epsilon.$$
We then get
$$\hat{y} = b_0 + b_1 x$$
$$\hat{y} = 274.85 + 1.255x.$$
Next, we need to verify that there is a relationship between x and y, that is, that the slope needs to stay in the model. This means we test the hypothesis that $\beta_1 \neq 0$.
Hypothesis setting: $H_0: \beta_1 = 0$, $H_1: \beta_1 \neq 0$
Test statistic:
$$t_{cal} = \frac{b_1 - \beta_{10}}{s(b_1)}$$
where $\beta_{10}$ is the value of $\beta_1$ under $H_0$ (here 0).
Simple Linear Regression: Assumptions
$$Y = \beta_0 + \beta_1 X + \epsilon$$
Model assumptions
2. The errors $\epsilon$ are normally distributed with mean 0 and a constant variance $\sigma^2$. The errors are uncorrelated (not related) with one another in successive observations. In symbols:
$$\epsilon \sim N(0, \sigma^2)$$
Simple Linear Regression: Assumptions
• The relationship between X and Y is a straight-line
relationship.
Simple Linear Regression: Assumptions
• $\epsilon \sim N(0, \sigma^2)$: normality assumption
Simple Linear Regression: Assumptions
• The errors $\epsilon$ are normally distributed with mean 0 and a constant variance $\sigma^2$. The errors are uncorrelated (not related) with one another in successive observations.
Simple Linear Regression: Example
Example 1: American Express Company . . . , a study was conducted to
determine the relationship between travel and charges on the American Express
card. . . .
Simple Linear Regression: Example
First, check whether X and Y satisfy the linearity assumption. The error
term $\epsilon$ cannot be checked yet: we must first obtain $\hat{y}$, so that we
can calculate the residuals $e = y - \hat{y}$.
Simple Linear Regression: Example (cont'd)
Check assumptions
Simple Linear Regression: Interpretation
The fitted simple linear regression equation for this data set is
$$\hat{y} = b_0 + b_1 x$$
$$\hat{y} = 274.85 + 1.26x$$
1. To study the relationship
Simple Linear Regression: Prediction
Note
2. You should be aware that using a regression to extrapolate outside the
estimation range is risky, as the estimated relationship may not be
appropriate outside this range.
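One way to keep this warning in view when predicting is to check each new x against the range of x values used for estimation. The sketch below uses the fitted line $\hat{y} = 274.85 + 1.255x$ from the example; the estimation range `X_MIN` to `X_MAX` and the function name `predict` are assumptions made for illustration, since the actual range depends on the sample.

```python
import warnings

# Fitted line from the example: y_hat = 274.85 + 1.255 * x
B0 = 274.85
B1 = 1.255


def predict(x, x_min, x_max):
    """Predict y from x, warning when x lies outside [x_min, x_max]."""
    if not (x_min <= x <= x_max):
        warnings.warn(
            f"x={x} is outside the estimation range [{x_min}, {x_max}]; "
            "the prediction is an extrapolation.",
            stacklevel=2,
        )
    return B0 + B1 * x


# Hypothetical estimation range (assumed, not taken from the data).
X_MIN, X_MAX = 1000.0, 6000.0

inside = predict(4000.0, X_MIN, X_MAX)  # within range, no warning

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    outside = predict(9000.0, X_MIN, X_MAX)  # outside range, warns

print(inside, outside, len(caught))
```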
Simple Linear Regression: Model Evaluation
How good is the model?
The coefficient of determination $R^2$ is a measure of the strength of the
regression relationship, a measure of how well the regression line fits the data.
$$R^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST}; \quad 0 \le R^2 \le 1$$
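The two expressions for $R^2$ can be checked numerically. The sketch below computes SST, SSE and SSR from observed and fitted values and evaluates both forms. The data are made up, and the fitted values use the least squares line for that made-up data ($b_0 = 0.05$, $b_1 = 1.99$); the two forms agree only when the fitted values come from a least squares fit with an intercept.

```python
def r_squared_both_ways(y, y_hat):
    """Return (SSR/SST, 1 - SSE/SST) for observed y and fitted y_hat."""
    y_bar = sum(y) / len(y)
    sst = sum((yi - y_bar) ** 2 for yi in y)               # total variation
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))  # unexplained
    ssr = sum((fi - y_bar) ** 2 for fi in y_hat)           # explained
    return ssr / sst, 1.0 - sse / sst


# Hypothetical data (not from the lecture examples).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

# Least squares line for this made-up data set.
b0, b1 = 0.05, 1.99
y_hat = [b0 + b1 * xi for xi in x]

r2_from_ssr, r2_from_sse = r_squared_both_ways(y, y_hat)
print(f"SSR/SST = {r2_from_ssr:.4f}, 1 - SSE/SST = {r2_from_sse:.4f}")
```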
Simple Linear Regression: Example (cont'd)
$$R^2 = \frac{SS_{xy}^2}{SS_x SS_y} = 0.97$$
Multiple Linear Regression
Example 2: Alka-Seltzer recently embarked on an in-store promotional
campaign, with displays of its antacid featured prominently in supermarkets.
The company also ran its usual radio and television commercials. Over a period
of 10 weeks, the company kept track of its expenditure on radio and television
advertising, variable $X_1$, as well as its spending on in-store displays, variable $X_2$.
The resulting sales for each week in the area studied were recorded as the
dependent variable $Y$. The company analyst conducting the study hypothesized
a linear regression model of the form $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \epsilon$,
linking sales volume with the two independent variables, advertising and in-store
promotions. The analyst wanted to use the available data, considered a random
sample of 10 weekly observations, to estimate the parameters of the regression
relationship.
Multiple Linear Regression
Example 2: Data Layout
Multiple Linear Regression
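1. Simple Linear Regression
$$Y = \beta_0 + \beta_1 X + \epsilon$$
2. Multiple Linear Regression
$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_k X_k + \epsilon$$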
Multiple Linear Regression: Assumptions
Model assumptions
2. The errors $\epsilon$ are normally distributed with mean 0 and a constant variance $\sigma^2$. The errors are uncorrelated (not related) with one another in successive observations. In symbols:
$$\epsilon \sim N(0, \sigma^2)$$
Multiple Linear Regression: Estimation
Example 2:
Multiple Linear Regression: Estimation
Example 2 (cont'd):
How do we get $b_0 = 47.165$, $b_1 = 1.599$, $b_2 = 1.149$?
Excel: Data → Data Analysis → Regression
Multiple Linear Regression: Hypothesis Testing
$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_k X_k + \epsilon$$
A hypothesis test for the existence of a linear relationship between any of the
$X_i$ and $Y$ is
$H_0: \beta_1 = \beta_2 = \beta_3 = \cdots = \beta_k = 0$
$H_1$: Not all $\beta_i$ are zero.
Multiple Linear Regression: Hypothesis Testing
Source       Sum of squares   df            Mean square                 F ratio
Regression   SSR              k             MSR = SSR / k               F_cal = MSR / MSE
Error        SSE              n - (k + 1)   MSE = SSE / (n - (k + 1))
Total        SST              n - 1
Multiple Linear Regression: Hypothesis Testing
Rejection region: $F_{cal} > F_{table} = F_{\alpha,\, k,\, n-(k+1)}$
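The F test can be sketched in a few lines: MSR is SSR divided by k, MSE is SSE divided by n - (k + 1), and F_cal is their ratio, which is then compared with the table value. In the sketch below the sums of squares are invented for illustration (they are not the Alka-Seltzer results), and the critical value 4.74 is the usual table entry for $\alpha = 0.05$ with 2 and 7 degrees of freedom.

```python
def f_statistic(ssr, sse, n, k):
    """F_cal = MSR / MSE for a regression with k predictors and n observations."""
    msr = ssr / k               # regression mean square, df = k
    mse = sse / (n - (k + 1))   # error mean square, df = n - (k + 1)
    return msr / mse


def reject_h0(f_cal, f_table):
    """Reject H0 (all slopes are zero) when F_cal exceeds the table value."""
    return f_cal > f_table


# Hypothetical sums of squares for n = 10 observations and k = 2 predictors.
ssr, sse, n, k = 600.0, 70.0, 10, 2

f_cal = f_statistic(ssr, sse, n, k)
f_table = 4.74  # F(0.05; 2, 7), read from an F table

decision = reject_h0(f_cal, f_table)
print(f"F_cal = {f_cal:.2f}, reject H0: {decision}")
```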
Multiple Linear Regression: Hypothesis Testing
Next question: Which $\beta_i$ is/are significant?
Test each individual regression slope parameter:
$H_0: \beta_i = 0$
$H_1: \beta_i \neq 0$
Test statistic
$$t_{cal} = \frac{b_i - 0}{s(b_i)}; \quad df = n - (k + 1)$$
where $s(b_i) = \frac{s}{\sqrt{SS_{x_i}}}$ is the standard error of $b_i$ and $s = \sqrt{MSE}$.
This test is under the assumption that the regression errors are normally
distributed.
Multiple Linear Regression: Hypothesis Testing
Example 2 (cont'd):
SPSS
Multiple Linear Regression: Assumptions
Example 2 (cont'd):
Multiple Linear Regression: Assumptions
Multicollinearity
Ideally, the $X_i$ variables in a regression equation are uncorrelated with one
another; each variable contains a unique piece of information about $Y$,
information that is not contained in any of the other $X_i$.
Excel: Data → Data Analysis → Correlation
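Outside Excel, the same check can be done by computing the correlation between pairs of independent variables. The sketch below is a minimal version of that idea using the Pearson correlation coefficient; the function name `pearson_r` and all data values are made up for illustration. A correlation near 1 or -1 between two predictors signals multicollinearity.

```python
from math import sqrt


def pearson_r(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(a)
    a_bar = sum(a) / n
    b_bar = sum(b) / n
    ss_a = sum((ai - a_bar) ** 2 for ai in a)
    ss_b = sum((bi - b_bar) ** 2 for bi in b)
    ss_ab = sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b))
    return ss_ab / sqrt(ss_a * ss_b)


# Hypothetical predictors (not the Alka-Seltzer data).
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [2.0, 4.0, 6.0, 8.0, 10.0]  # exactly 2 * x1: perfectly collinear
x3 = [5.0, 1.0, 4.0, 2.0, 3.0]   # only weakly related to x1

r12 = pearson_r(x1, x2)
r13 = pearson_r(x1, x3)
print(f"r(x1, x2) = {r12:.3f}, r(x1, x3) = {r13:.3f}")
```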
Multiple Linear Regression: Assumptions
The effects of multicollinearity
1. The variances (and standard errors) of regression coefficient estimators
are inflated.
6. In some cases, the F ratio is significant, but none of the t ratios is.
Multiple Linear Regression: Coefficient of Determination
The multiple coefficient of determination $R^2$ is
$$R^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST}; \quad 0 \le R^2 \le 1$$
$R^2_{adj}$ is never larger than $R^2$, and is strictly smaller whenever $R^2 < 1$.
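The formula for the adjusted coefficient is not shown here. The sketch below assumes the usual definition, in which SSE and SST are each divided by their degrees of freedom, $R^2_{adj} = 1 - \frac{SSE / (n - (k+1))}{SST / (n - 1)}$, and compares it with $R^2$. The sums of squares are invented for illustration.

```python
def r_squared(sse, sst):
    """Multiple coefficient of determination, R^2 = 1 - SSE / SST."""
    return 1.0 - sse / sst


def adjusted_r_squared(sse, sst, n, k):
    """Adjusted R^2 under the usual definition (assumed, see lead-in)."""
    return 1.0 - (sse / (n - (k + 1))) / (sst / (n - 1))


# Hypothetical values: n = 10 observations, k = 2 predictors.
sse, sst, n, k = 70.0, 670.0, 10, 2

r2 = r_squared(sse, sst)
r2_adj = adjusted_r_squared(sse, sst, n, k)
print(f"R^2 = {r2:.4f}, adjusted R^2 = {r2_adj:.4f}")
```

The adjustment penalizes extra predictors, which is why the adjusted value falls below $R^2$ in this example.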
Appendix
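Simple Linear Regression: Estimation
$$\hat{y}_i = b_0 + b_1 x_i$$
Minimize SSE through the normal equations, which are
$$\sum_{i=1}^{n} y_i = n b_0 + b_1 \sum_{i=1}^{n} x_i \quad (1)$$
$$\sum_{i=1}^{n} x_i y_i = b_0 \sum_{i=1}^{n} x_i + b_1 \sum_{i=1}^{n} x_i^2 \quad (2)$$
Define
$$SS_x = \sum_{i=1}^{n} (x_i - \bar{x})^2 = \sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}$$
$$SS_y = \sum_{i=1}^{n} (y_i - \bar{y})^2 = \sum_{i=1}^{n} y_i^2 - \frac{\left(\sum_{i=1}^{n} y_i\right)^2}{n}$$
$$SS_{xy} = \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) = \sum_{i=1}^{n} x_i y_i - \frac{\left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}{n}$$
Solving (1) and (2) gives
$$b_1 = \frac{SS_{xy}}{SS_x}, \quad b_0 = \bar{y} - b_1 \bar{x}.$$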
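The computational forms above can be checked against the normal equations directly. The sketch below computes $SS_x$, $SS_y$ and $SS_{xy}$ from raw sums, takes $b_1 = SS_{xy} / SS_x$ and $b_0 = \bar{y} - b_1 \bar{x}$ (which follows from dividing equation (1) by n), and then verifies that both normal equations hold. The data and function names are made up for illustration.

```python
def sums_of_squares(x, y):
    """Return (SS_x, SS_y, SS_xy) using the computational (raw-sum) forms."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    ss_x = sum(xi * xi for xi in x) - sum_x ** 2 / n
    ss_y = sum(yi * yi for yi in y) - sum_y ** 2 / n
    ss_xy = sum(xi * yi for xi, yi in zip(x, y)) - sum_x * sum_y / n
    return ss_x, ss_y, ss_xy


def normal_equation_gaps(x, y, b0, b1):
    """Left side minus right side of normal equations (1) and (2)."""
    n = len(x)
    gap1 = sum(y) - (n * b0 + b1 * sum(x))
    gap2 = sum(xi * yi for xi, yi in zip(x, y)) - (
        b0 * sum(x) + b1 * sum(xi * xi for xi in x)
    )
    return gap1, gap2


# Hypothetical data (not from the lecture examples).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

ss_x, ss_y, ss_xy = sums_of_squares(x, y)
b1 = ss_xy / ss_x
b0 = sum(y) / len(y) - b1 * sum(x) / len(x)

# Both gaps should be zero up to floating-point rounding.
gap1, gap2 = normal_equation_gaps(x, y, b0, b1)
print(f"SS_x={ss_x:.3f}, SS_y={ss_y:.3f}, SS_xy={ss_xy:.3f}")
print(f"b0={b0:.3f}, b1={b1:.3f}, gaps=({gap1:.2e}, {gap2:.2e})")
```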
Simple Linear Regression: Hypothesis testing
A hypothesis test for the existence of a linear relationship between X and Y is
$H_0: \beta_1 = 0$
$H_1: \beta_1 \neq 0$
Test statistic
Given the assumption of normality of the regression errors, the test statistic
possesses the t distribution with $n - 2$ degrees of freedom.
$$t_{cal} = \frac{b_1 - \beta_{10}}{s(b_1)}$$
where $\beta_{10}$ is the value of $\beta_1$ under $H_0$ (here 0).
For the critical region, we use the same rules as for the t-test.
Simple Linear Regression: Hypothesis testing
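From
$$t_{cal} = \frac{b_1 - \beta_{10}}{s(b_1)},$$
$$s(b_1) = \frac{s}{\sqrt{SS_x}}, \quad s = \sqrt{MSE},$$
and
$$MSE = \frac{SSE}{n - 2},$$
where
$$SSE = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = SS_y - \frac{(SS_{xy})^2}{SS_x} = SS_y - b_1 SS_{xy}.$$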
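The chain of formulas above can be followed step by step in code. The sketch below computes SSE as $SS_y - b_1 SS_{xy}$, then MSE, $s = \sqrt{MSE}$, $s(b_1) = s / \sqrt{SS_x}$ and finally $t_{cal}$. The data, the function name `slope_t_test` and the parameter name `beta_10` are made up for illustration; the critical value 3.182 is the usual t table entry for $\alpha/2 = 0.025$ with $n - 2 = 3$ degrees of freedom.

```python
from math import sqrt


def slope_t_test(x, y, beta_10=0.0):
    """t statistic for H0: beta_1 = beta_10 in a simple linear regression."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    ss_x = sum((xi - x_bar) ** 2 for xi in x)
    ss_y = sum((yi - y_bar) ** 2 for yi in y)
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

    b1 = ss_xy / ss_x
    sse = ss_y - b1 * ss_xy   # SSE = SS_y - b1 * SS_xy
    mse = sse / (n - 2)       # df = n - 2
    s = sqrt(mse)
    s_b1 = s / sqrt(ss_x)     # standard error of b1
    t_cal = (b1 - beta_10) / s_b1
    return {"b1": b1, "sse": sse, "mse": mse, "s_b1": s_b1, "t_cal": t_cal}


# Hypothetical data (not from the lecture examples).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

result = slope_t_test(x, y)
t_table = 3.182  # t(0.025; 3), read from a t table

# Two-sided test: reject H0 when |t_cal| exceeds the table value.
reject = abs(result["t_cal"]) > t_table
print(f"t_cal = {result['t_cal']:.2f}, reject H0: {reject}")
```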
Simple Linear Regression: Hypothesis testing
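If you reject $H_0$, then $\beta_1 \neq 0$, and your regression model will be
$$\hat{y} = b_0 + b_1 x.$$
If you fail to reject (FTR) $H_0$, you proceed as if $\beta_1 = 0$, and your regression model will be
$$\hat{y} = b_0.$$
The $(1 - \alpha)100\%$ confidence intervals for $\beta_0$ and $\beta_1$ are
$$b_0 \pm t_{\alpha/2,\, n-2}\, s(b_0),$$
$$b_1 \pm t_{\alpha/2,\, n-2}\, s(b_1).$$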
Simple Linear Regression: Model Evaluation
where
$$y_i - \bar{y} = (y_i - \hat{y}_i) + (\hat{y}_i - \bar{y})$$
Total deviation = Unexplained deviation (error) + Explained deviation (regression)
Therefore,
$$R^2 = \frac{SS_{xy}^2}{SS_x SS_y}; \quad 0 \le R^2 \le 1$$
Multiple Linear Regression: Estimation
Example of two independent variables
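The objective is to minimize SSE through the normal equations, which are
$$\sum_{i=1}^{n} y_i = n b_0 + b_1 \sum_{i=1}^{n} x_{1i} + b_2 \sum_{i=1}^{n} x_{2i}$$
$$\sum_{i=1}^{n} x_{1i} y_i = b_0 \sum_{i=1}^{n} x_{1i} + b_1 \sum_{i=1}^{n} x_{1i}^2 + b_2 \sum_{i=1}^{n} x_{1i} x_{2i}$$
$$\sum_{i=1}^{n} x_{2i} y_i = b_0 \sum_{i=1}^{n} x_{2i} + b_1 \sum_{i=1}^{n} x_{1i} x_{2i} + b_2 \sum_{i=1}^{n} x_{2i}^2$$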
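These three equations are linear in $b_0$, $b_1$ and $b_2$, so they can be solved as a 3 by 3 linear system. The sketch below builds the system from raw sums and solves it with a small Gaussian elimination routine. The data are made up so that $y = 1 + 2x_1 + 3x_2$ exactly, which means the solution should recover those coefficients; the function names are illustrative, and in practice a statistics package would do this step.

```python
def solve_linear_system(a, rhs):
    """Solve a square linear system by Gaussian elimination with pivoting."""
    n = len(rhs)
    m = [row[:] + [r] for row, r in zip(a, rhs)]  # augmented matrix copy
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    solution = [0.0] * n
    for r in range(n - 1, -1, -1):
        known = sum(m[r][c] * solution[c] for c in range(r + 1, n))
        solution[r] = (m[r][n] - known) / m[r][r]
    return solution


def fit_two_predictors(x1, x2, y):
    """Return [b0, b1, b2] from the normal equations for two predictors."""
    n = len(y)
    s1, s2 = sum(x1), sum(x2)
    s11 = sum(v * v for v in x1)
    s22 = sum(v * v for v in x2)
    s12 = sum(u * v for u, v in zip(x1, x2))
    a = [[n, s1, s2], [s1, s11, s12], [s2, s12, s22]]
    rhs = [
        sum(y),
        sum(u * v for u, v in zip(x1, y)),
        sum(u * v for u, v in zip(x2, y)),
    ]
    return solve_linear_system(a, rhs)


# Hypothetical data generated from y = 1 + 2*x1 + 3*x2 with no error.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [2.0, 1.0, 4.0, 3.0, 6.0]
y = [1.0 + 2.0 * u + 3.0 * v for u, v in zip(x1, x2)]

b0, b1, b2 = fit_two_predictors(x1, x2, y)
print(f"b0={b0:.3f}, b1={b1:.3f}, b2={b2:.3f}")
```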
Ending Ticket
https://forms.gle/dPEQqPTNNi16c34z6
After you submit the form, you will get a receipt ticket. Please
check your email address and keep it as your evidence.