Solved Application On Multiple Linear Regression Model

A professor recorded Statistics test scores, time spent in clubs, and math exam grades for 20 students to analyze the relationships between the variables. A multiple linear regression model found that: 1) The estimated test score decreases by 2.87 points for each additional day in clubs and increases by 4.25 points for each additional point on the math exam, holding the other variable unchanged. 2) The regression model as a whole is valid (F test), but only the math grade is statistically significant at the 5% level; the coefficient of days in clubs is not (p = 0.062). 3) Approximately 78% of the variability in test scores is explained by time in clubs and math grades.

Uploaded by

CristiCristi
Econometrie Econometrics (Academia de Studii Economice din București)

Downloaded by Ionescu Cristian ([email protected])
The multiple linear regression model – solved application

1. A professor of Statistics wishes to find out if there is a relationship between the test scores of
his students (points), the time spent in clubs (number of days) and the math exam grade (points).
He recorded the values of the three variables for 20 randomly selected students and, after
processing the data, obtained the following output (some values are left blank and must be determined):
Model Summary

Model | R | R Square | Adjusted R Square | Std. Error of the Estimate
1 | R = … | R² = … | … | Se = √MSE = 6,58

a. Predictors: (Constant), Math-grade, Days spent in the club.

ANOVA (a)

Model 1 | Sum of Squares | df | Mean Square | F | Sig.
Regression | SSR = 2729,25 | k = … | MSR = … | Fcalc = … | ,000002 (b)
Residual | SSE = … | n-k-1 = … | MSE = … | |
Total | SST = … | n-1 = … | | |

a. Dependent Variable: Statistics-test-score
b. Predictors: (Constant), Math-grade, Days spent in the club

Coefficients (a)

Model 1 | B (unstandardized) | Std. Error | Beta (standardized) | t | Sig. | 95,0% CI for B: Lower Bound | 95,0% CI for B: Upper Bound
(Constant) | b0 = … | sb0 = 10,736 | | tcalc(β0) = 3,49 | ,003 | L(β0) = 14,81 | U(β0) = 60,11
Days spent in the club | b1 = -2,870 | sb1 = 1,435 | -,333 | tcalc(β1) = … | ,062 | L(β1) = -5,90 | U(β1) = …
Math-grade | b2 = … | sb2 = 1,159 | ,611 | tcalc(β2) = … | ,002 | L(β2) = … | U(β2) = 6,70

a. Dependent Variable: Statistics-test-score

a. Identify the linear regression model in the sample. Interpret the estimates b0 and b1.
b. Decide whether this equation is appropriate for explaining / describing the variance of
statistical test-scores depending on the time spent in the club and on the mathematics
exam grades. (Fcrit=3,59; tcrit=2,11)
- test the model validity
- test the statistical significance of the last two model-parameters.
c. Identify and interpret the confidence intervals of the last two model parameters.
d. What percent of the total variability in the Statistics-test-scores is explained by the
regression model?
e. Measure the intensity of the relationship between the variables and test the statistical
significance of the indicator used.

f. The following correlation-matrix is given:

 | Statistics-test-score | Days in clubs | Math-grade
Statistics-test-score | … | |
Days in clubs | -0,787 | … |
Math-grade | 0,859 | -0,742 | …

g. What would be the test-score at Statistics, for a student who spends 6 days in the club and
whose Math-exam grade is 6?

SOLUTION

a) The variables analyzed are:


X1 – independent variable: Number of days spent in the club
X2 – independent variable: Math-exam grade
Y – dependent variable: Statistics test score
The two independent variables are listed in the “Coefficients” table, in the rows below “(Constant)” (in
SPSS) or “Intercept” (in Excel).
n = 20 (sample size)
k = 2 (number of independent variables)

The population 2-factor linear regression model is: $y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \varepsilon_i$.

$\beta_0, \beta_1, \beta_2$ are the model parameters (the first one is the “intercept”; the second and the third are the
“slope” parameters, also called partial regression coefficients).
The sample 2-factor linear regression model is $y_i = b_0 + b_1 x_{1i} + b_2 x_{2i} + e_i$.

$b_0, b_1, b_2$ are the estimates of the model parameters $\beta_0, \beta_1, \beta_2$.


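The linear regression equation is:
$\hat{y}_i = b_0 + b_1 x_{1i} + b_2 x_{2i}$
We have to identify the values of the three estimates.
In the 3rd table (Coefficients) we are given $b_1 = -2{,}870$ and have to find $b_0$ and $b_2$.
From the t statistic of the intercept:
$t_{comp} = \dfrac{b_0}{s_{b_0}}$; $3{,}49 = \dfrac{b_0}{10{,}736} \Rightarrow b_0 = 10{,}736 \cdot 3{,}49 \approx 37{,}46$
(the same value is the midpoint of the confidence interval for $\beta_0$: $(14{,}81 + 60{,}11)/2 = 37{,}46$).
From the upper bound of the confidence interval for $\beta_2$:
$U(\beta_2) = b_2 + t_{crit} \cdot s_{b_2} \Rightarrow 6{,}7 = b_2 + 2{,}11 \cdot 1{,}159 \Rightarrow b_2 \approx 4{,}25$

The linear regression equation is: $\hat{y}_i = 37{,}46 - 2{,}87 \cdot x_{1i} + 4{,}25 \cdot x_{2i}$, $i = 1, \dots, 20$,
and $\hat{y}_i$ represents the estimated values of Y, calculated based on the regression model.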
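The two missing estimates can be recovered with a few lines of arithmetic. The sketch below only restates the relations used above (t = b0 / s_b0 and U(β2) = b2 + tcrit · s_b2); the variable names are illustrative, and decimal commas are written as decimal points.

```python
# Values read from the Coefficients table (decimal commas written as points).
t_b0, se_b0 = 3.49, 10.736      # t statistic and standard error of the intercept
upper_b2, se_b2 = 6.70, 1.159   # upper CI bound and standard error of b2
t_crit = 2.11                   # critical t value given in the exercise

# t = b0 / se(b0)  =>  b0 = t * se(b0)
b0 = t_b0 * se_b0

# U(beta2) = b2 + t_crit * se(b2)  =>  b2 = U(beta2) - t_crit * se(b2)
b2 = upper_b2 - t_crit * se_b2

# Cross-check: a symmetric confidence interval is centred on the estimate.
b0_from_ci = (14.81 + 60.11) / 2

print(f"b0 = {b0:.2f} (CI midpoint {b0_from_ci:.2f}), b2 = {b2:.2f}")
```

Because the t statistic in the table is itself rounded, the product may differ from the interval midpoint in the second decimal.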
Interpretations of the partial regression coefficients:

- b1 shows that the Statistics test score decreases, on average, by an estimated 2,87 points for each
additional day spent in clubs (assuming that the Math exam grade remains unchanged).
Also, b1 < 0, showing that there is an inverse (negative) relationship between Y and X1.
- b2 shows that the Statistics test score increases, on average, by an estimated 4,25 points for each
additional point in the Math exam grade (assuming that the time spent in clubs remains unchanged).
Also, b2 > 0, showing that there is a direct (positive) relationship between Y and X2.
 | Coefficients
Intercept | b0 = 37,46
Days spent in the club (X1) | b1 = -2,87
Math exam grade (X2) | b2 = 4,25
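The "other variable remains unchanged" reading of b1 and b2 can be checked numerically. This is a minimal sketch using the fitted equation from the solution; the function name `fitted_score` and the baseline student (3 days, grade 7) are hypothetical and chosen only for illustration.

```python
# Fitted equation from the solution: y_hat = 37.46 - 2.87*x1 + 4.25*x2
B0, B1, B2 = 37.46, -2.87, 4.25

def fitted_score(days_in_club, math_grade):
    """Estimated Statistics test score for given predictor values."""
    return B0 + B1 * days_in_club + B2 * math_grade

# Hypothetical baseline student (values chosen only for illustration).
base = fitted_score(3, 7)

# One more day in the club, same math grade: the change equals b1.
effect_of_one_day = fitted_score(4, 7) - base

# One more point in math, same days in the club: the change equals b2.
effect_of_one_point = fitted_score(3, 8) - base

print(round(effect_of_one_day, 2), round(effect_of_one_point, 2))
```

Whatever baseline is chosen, the two differences stay the same, which is what makes b1 and b2 partial regression coefficients.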

b) Testing the model validity:


The hypotheses are:

H0: MSR = MSE, equivalently $\beta_1 = \beta_2 = 0$ (the regression model is not valid)

H1: MSR > MSE, i.e. at least one slope parameter differs from 0 (the regression model is valid)
In order to decide which hypothesis should be accepted, we apply the F test, using the ANOVA table:
ANOVA (a)

Model 1 | Sum of Squares | df | Mean Square | F | Sig.
Regression | SSR = 2729,25 | k = 2 | MSR = … | Fcalc = … | ,000002 (b)
Residual | SSE = … | n-k-1 = 17 | MSE = … | |
Total | SST = … | n-1 = 19 | | |

a. Dependent Variable: Statistics-test-score
b. Predictors: (Constant), Math-grade, Days spent in the club

The calculations are as follows:

From the 1st table we have $S_e = \sqrt{MSE} = 6{,}58 \Rightarrow MSE = (6{,}58)^2 = 43{,}3$ (the Error Mean Square).

But $MSE = \dfrac{SSE}{n-k-1} \Rightarrow SSE = MSE \cdot (n-k-1) = 43{,}3 \cdot 17 = 736{,}1$ (the Error Sum of Squares).

$MSR = \dfrac{SSR}{k} = \dfrac{2729{,}25}{2} = 1364{,}62$ (the Regression Mean Square)

$SST = SSR + SSE = 2729{,}25 + 736{,}1 = 3465{,}35$

$F_{comp} = \dfrac{MSR}{MSE} = \dfrac{1364{,}62}{43{,}3} = 31{,}51$ (the computed F test value)
Decision rule: $F_{comp} = 31{,}51 > F_{crit} = 3{,}59$, so Fcomp falls within the Rejection Region (Rr); we reject
H0 and accept H1, so the regression model is valid (Fcrit is the critical F test value).
Also, since Significance F = 0,000002 < 0,05, we reject H0 and accept H1, so the regression model is valid.
The maximum probability for which we can sustain that the model is valid is:
100% - Significance F% = 100% - 0,0002% = 99,9998% > 95%.
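The ANOVA quantities can be reproduced step by step. The sketch below follows the same chain of formulas (MSE from Se, then SSE, MSR, SST and F); variable names are illustrative. It keeps unrounded intermediate values, so the last printed digit may differ slightly from the hand calculation, which rounds MSE to 43,3 first.

```python
# Inputs given in the exercise (decimal commas written as points).
se = 6.58        # standard error of the estimate, Se = sqrt(MSE)
ssr = 2729.25    # regression sum of squares
n, k = 20, 2     # sample size and number of independent variables
f_crit = 3.59    # critical F value given in the exercise

mse = se ** 2                # error mean square
df_resid = n - k - 1         # residual degrees of freedom
sse = mse * df_resid         # error sum of squares
msr = ssr / k                # regression mean square
sst = ssr + sse              # total sum of squares
f_comp = msr / mse           # computed F statistic

# The model is declared valid when F falls in the rejection region.
model_is_valid = f_comp > f_crit

print(f"MSE={mse:.2f} SSE={sse:.2f} MSR={msr:.2f} SST={sst:.2f} F={f_comp:.2f}")
print("model valid:", model_is_valid)
```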

The ANOVA table is:

ANOVA (a)

Model 1 | Sum of Squares | df | Mean Square | F | Sig.
Regression | SSR = 2729,25 | k = 2 | MSR = 1364,62 | Fcalc = 31,51 | ,000002
Residual | SSE = 736,1 | n-k-1 = 17 | MSE = 43,3 | |
Total | SST = 3465,35 | n-1 = 19 | | |

Testing the statistical significance of the model parameters. The last two parameters are $\beta_1$ and $\beta_2$.
Testing $\beta_1$:

H0: $\beta_1 = 0$ ($\beta_1$ is not statistically significant)

H1: $\beta_1 \neq 0$ ($\beta_1$ is statistically significant)

$t_{comp} = \dfrac{b_1}{s_{b_1}} = \dfrac{-2{,}87}{1{,}435} = -2$ ($t_{comp}$ can be found in the “t Stat” column of the 3rd Excel table, or in the
“t” column of the 3rd SPSS table).

Since $-t_{crit} < t_{comp} < t_{crit}$ (that is, $-2{,}11 < -2 < 2{,}11$), we accept H0: $\beta_1$ is not statistically significant
($t_{crit} = 2{,}11$ is given in the application text).
In addition, as p-value($\beta_1$) = 0,062 > 0,05, we accept H0 and conclude that $\beta_1$ is not statistically significant.

The maximum probability for which we can sustain that $\beta_1$ is statistically significant is:
100 - p-value($\beta_1$)% = 100 - 6,2 = 93,8% < 95%.
Testing $\beta_2$:

H0: $\beta_2 = 0$ ($\beta_2$ is not statistically significant)

H1: $\beta_2 \neq 0$ ($\beta_2$ is statistically significant)

$t_{comp} = \dfrac{b_2}{s_{b_2}} = \dfrac{4{,}25}{1{,}159} \approx 3{,}66$ ($t_{comp}$ can be found in the “t Stat” column of the 3rd Excel table, or in the
“t” column of the 3rd SPSS table).

As $t_{comp} > t_{crit}$ ($3{,}66 > 2{,}11$), we reject H0 and accept H1: $\beta_2$ is statistically significant
($t_{crit} = 2{,}11$ is given in the application text).

In addition, as p-value($\beta_2$) = 0,002 < 0,05, we accept H1 and conclude that $\beta_2$ is statistically significant.

The maximum probability for which we can sustain that $\beta_2$ is statistically significant is:
100 - p-value($\beta_2$)% = 100 - 0,2 = 99,8% > 95%.
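Both t tests follow the same pattern, so they can be sketched with one small helper. The function name `t_test` is hypothetical; it only computes b / s_b and compares its absolute value with the critical value given in the exercise (a two-sided test).

```python
def t_test(estimate, std_error, t_crit):
    """Return the computed t statistic and whether |t| exceeds t_crit."""
    t_comp = estimate / std_error
    is_significant = abs(t_comp) > t_crit
    return t_comp, is_significant

T_CRIT = 2.11  # critical value given in the exercise

# Slope of "days spent in the club" (b1) and of "math grade" (b2).
t1, sig1 = t_test(-2.87, 1.435, T_CRIT)
t2, sig2 = t_test(4.25, 1.159, T_CRIT)

print(f"beta1: t = {t1:.2f}, significant = {sig1}")
print(f"beta2: t = {t2:.2f}, significant = {sig2}")
```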
c) The confidence interval for $\beta_1$ is:

$\underbrace{b_1 - t_{crit} \cdot s_{b_1}}_{L(\beta_1)} \le \beta_1 \le \underbrace{b_1 + t_{crit} \cdot s_{b_1}}_{U(\beta_1)}$,

where $L(\beta_1)$ is the lower limit and $U(\beta_1)$ the upper limit of the confidence interval for $\beta_1$, and
$s_{b_1}$ is the standard error of the estimate b1.

In our case, $b_1 = -2{,}87$, $s_{b_1} = 1{,}435$, $\alpha = 0{,}05$, $t_{crit} = 2{,}11$, so the limits of the confidence interval are:
$L(\beta_1) = b_1 - t_{crit} \cdot s_{b_1} = -2{,}87 - 2{,}11 \cdot 1{,}435 = -5{,}9$
$U(\beta_1) = b_1 + t_{crit} \cdot s_{b_1} = -2{,}87 + 2{,}11 \cdot 1{,}435 = 0{,}16$
The interval [-5,9; 0,16] covers the true value of the slope parameter $\beta_1$ with 0,95 (95%) probability.
Since the two limits of the confidence interval for the slope parameter $\beta_1$ have different signs, the
interval includes 0, which means that $\beta_1$ can be equal to 0 ($\beta_1 = 0$); thus $\beta_1$ is not statistically
significant.
Since $\beta_1$ is not statistically significant, there is no need to interpret the values of the two limits.
The confidence interval for $\beta_2$ is:
$\underbrace{b_2 - t_{crit} \cdot s_{b_2}}_{L(\beta_2)} \le \beta_2 \le \underbrace{b_2 + t_{crit} \cdot s_{b_2}}_{U(\beta_2)}$,

where $L(\beta_2)$ is the lower limit and $U(\beta_2)$ the upper limit of the confidence interval for $\beta_2$, and
$s_{b_2}$ is the standard error of the estimate b2.

In our case, $b_2 = 4{,}25$, $s_{b_2} = 1{,}159$, $\alpha = 0{,}05$, $t_{crit} = 2{,}11$, so the two limits are:
$L(\beta_2) = b_2 - t_{crit} \cdot s_{b_2} = 4{,}25 - 2{,}11 \cdot 1{,}159 = 1{,}8$
$U(\beta_2) = b_2 + t_{crit} \cdot s_{b_2} = 4{,}25 + 2{,}11 \cdot 1{,}159 = 6{,}7$
The interval [1,8; 6,7] covers the true value of the slope parameter $\beta_2$ with 0,95 (95%) probability.
Since the two limits of the confidence interval for $\beta_2$ have the same sign (they are both positive), the
interval doesn’t include 0, therefore $\beta_2$ can’t be equal to 0 ($\beta_2 \neq 0$) and $\beta_2$ is statistically significant.
In order to interpret the values of the two limits, we might say that:
The estimated average increase in the Statistics test score ranges between 1,8 and 6,7 points as a result of a 1 point
increase in the Math exam grade (assuming that the number of days spent in clubs doesn’t change).
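The interval limits and the "does the interval contain 0" check can be sketched as follows. The helper names `confidence_interval` and `contains_zero` are hypothetical; the formula is the one used above, b ± tcrit · s_b, with the critical value taken from the exercise.

```python
def confidence_interval(estimate, std_error, t_crit):
    """Return (lower, upper) limits: estimate -/+ t_crit * std_error."""
    margin = t_crit * std_error
    return estimate - margin, estimate + margin

def contains_zero(interval):
    """True when 0 lies between the two limits (limits of different signs)."""
    lower, upper = interval
    return lower <= 0 <= upper

T_CRIT = 2.11  # critical value given in the exercise

ci_beta1 = confidence_interval(-2.87, 1.435, T_CRIT)
ci_beta2 = confidence_interval(4.25, 1.159, T_CRIT)

for name, ci in (("beta1", ci_beta1), ("beta2", ci_beta2)):
    verdict = "not significant" if contains_zero(ci) else "significant"
    print(f"{name}: [{ci[0]:.2f}; {ci[1]:.2f}] -> {verdict}")
```

An interval that contains 0 leads to the same conclusion as the t test: the parameter is not statistically significant at the 5% level.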

Table 3 is:

Coefficients (a)

Model 1 | B (unstandardized) | Std. Error | Beta (standardized) | t | Sig. | 95,0% CI for B: Lower Bound | 95,0% CI for B: Upper Bound
(Constant) | b0 = 37,46 | sb0 = 10,736 | | tcalc(β0) = 3,49 | ,003 | L(β0) = 14,81 | U(β0) = 60,11
Time spent in club | b1 = -2,870 | sb1 = 1,435 | -,333 | tcalc(β1) = -2 | ,062 | L(β1) = -5,90 | U(β1) = 0,16
Math grade | b2 = 4,25 | sb2 = 1,159 | ,611 | tcalc(β2) = 3,66 | ,002 | L(β2) = 1,8 | U(β2) = 6,70

a. Dependent Variable: Statistics test score

d) The answer to this question is provided by the coefficient of determination ($R^2$):

The coefficient of determination $R^2 = \dfrac{SSR}{SST} = \dfrac{2729{,}25}{3465{,}35} \approx 0{,}78$ (0,7876 before truncation to two decimals) reveals that 78% ($R^2$%) of the total
variability in the Statistics test score is explained by the independent variables (the time spent in clubs
and the math exam grade), that is, by the regression model. The indicator can be found in the
1st Excel or SPSS table, under the name “R Square”.
The rest up to 100% (meaning 22%) reveals that 22% of the total variability in the Statistics test score is
explained by random factors (factors other than the time spent in clubs and the math grade), that is, it is not
explained by the regression model.
The Adjusted R Square (Table 1) is found as follows:
$\bar{R}^2 = 1 - \dfrac{MSE}{MST} = 1 - \dfrac{43{,}3}{3465{,}35 / 19} = 0{,}76$, where $MST = \dfrac{SST}{n-1}$.
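Both indicators can be recomputed from the ANOVA quantities. A minimal sketch, using SSE = 736,1 as obtained in part b) and illustrative variable names; the last lines check that the MSE/MST form of the adjusted R Square agrees with the equivalent form 1 − (1 − R²)(n − 1)/(n − k − 1).

```python
# Sums of squares from the ANOVA table (decimal commas written as points).
ssr, sse = 2729.25, 736.1
n, k = 20, 2

sst = ssr + sse
r2 = ssr / sst                       # coefficient of determination

mse = sse / (n - k - 1)              # error mean square
mst = sst / (n - 1)                  # total mean square
adj_r2 = 1 - mse / mst               # adjusted R Square

# Equivalent textbook form of the adjusted R Square.
adj_r2_alt = 1 - (1 - r2) * (n - 1) / (n - k - 1)

print(f"R2 = {r2:.4f}, adjusted R2 = {adj_r2:.4f}")
```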
e) In order to measure the strength of the relationship between the three variables, we can apply the
Multiple correlation ratio:
$R = \sqrt{R^2} = \sqrt{0{,}78} \approx 0{,}88$
$R \in [0; 1]$; it can be found in the first Excel or SPSS table (“Multiple R” or “R”).
The correlation between the variables is strong, because the value of R is close to 1.

Testing the statistical significance of the Multiple Correlation Ratio:

The hypotheses are formulated as follows:

H0: R = 0 (R is not statistically significant)


H1: R > 0 (R is statistically significant)

where the R in the hypotheses denotes the population correlation ratio, while the R computed above is the sample correlation ratio.

The F test is applied (see the ANOVA table): as $F_{comp} = 31{,}51 > F_{crit} = 3{,}59$, we reject H0 and accept H1, therefore the indicator is statistically
significant.
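The F statistic used for R is the same one computed in the ANOVA table, which is why no new calculation is needed. A short derivation, using only quantities already defined in this application (SSR, SSE, SST, n, k) and the identity SST = SSR + SSE, shows how it can be written in terms of the coefficient of determination:

```latex
F_{comp} = \frac{MSR}{MSE}
         = \frac{SSR / k}{SSE / (n-k-1)}
         = \frac{(SSR/SST) / k}{(SSE/SST) / (n-k-1)}
         = \frac{R^2 / k}{(1 - R^2) / (n-k-1)}
```

With the unrounded value $R^2 = 2729{,}25 / 3465{,}35 \approx 0{,}7876$, $k = 2$ and $n-k-1 = 17$, this gives approximately 31,5, in line with the value of Fcomp in the ANOVA table.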
Now, Table 1 is:

Model Summary

Model | R | R Square | Adjusted R Square | Std. Error of the Estimate
1 | R = 0,88 | R² = 0,78 | 0,76 | Se = √MSE = 6,58

a. Predictors: (Constant), Math-grade, Days spent in the club

f) The correlation matrix is:

 | Statistics-test-score (Y) | Days in clubs (X1) | Math-grade (X2)
Statistics-test-score (Y) | 1 | |
Days in clubs (X1) | -0,787 | 1 |
Math-grade (X2) | 0,859 | -0,742 | 1

On the main diagonal, the correlation coefficient is equal to 1, meaning that each variable is perfectly
correlated with itself.
rYX1 = -0,787 reveals a negative and relatively strong relationship between the Statistics test score and the
time spent in clubs.
rYX2 = +0,859 reveals a positive and strong relationship between the Statistics test score and the Math
exam grade.
rX1X2 = -0,742 reveals a negative and relatively strong relationship between the Math exam grade and the
time spent in clubs.
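Since a correlation matrix is symmetric with 1 on the main diagonal, the three given coefficients determine the whole table. The sketch below fills in the full matrix and reports the direction of each relationship; the helper names `full_matrix` and `direction` are hypothetical, and no strength thresholds are assumed.

```python
labels = ["Y", "X1", "X2"]  # test score, days in clubs, math grade

# Lower-triangle coefficients as given in the exercise.
lower_triangle = {("X1", "Y"): -0.787, ("X2", "Y"): 0.859, ("X2", "X1"): -0.742}

def full_matrix(names, lower):
    """Build the symmetric correlation matrix with 1.0 on the diagonal."""
    matrix = {r: {c: (1.0 if r == c else None) for c in names} for r in names}
    for (row, col), value in lower.items():
        matrix[row][col] = value
        matrix[col][row] = value  # symmetry: r(A, B) = r(B, A)
    return matrix

def direction(r):
    """Describe the sign of a correlation coefficient."""
    if r > 0:
        return "direct (positive)"
    if r < 0:
        return "inverse (negative)"
    return "none"

corr = full_matrix(labels, lower_triangle)
for row in labels:
    print(row, [corr[row][col] for col in labels])
print("Y vs X1:", direction(corr["Y"]["X1"]))
print("Y vs X2:", direction(corr["Y"]["X2"]))
```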

g) In the linear regression equation $\hat{y}_i = b_0 + b_1 \cdot x_{1i} + b_2 \cdot x_{2i}$, that is,

$\hat{y}_i = 37{,}46 - 2{,}87 \cdot x_{1i} + 4{,}25 \cdot x_{2i}$, we replace $x_{1i}$ with 6 and $x_{2i}$ with 6, obtaining the estimated Y:
$\hat{y}_i = 37{,}46 - 2{,}87 \cdot 6 + 4{,}25 \cdot 6 = 45{,}74$ points at the Statistics test.
We mention, however, that the use of this regression equation to estimate a value of the dependent
variable is not appropriate, because not all model parameters are statistically significant.
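The point estimate can be reproduced directly from the fitted equation. A minimal sketch; the function name `predict_score` is hypothetical and the coefficients are the rounded estimates found in part a).

```python
def predict_score(days_in_club, math_grade, b0=37.46, b1=-2.87, b2=4.25):
    """Point estimate of the Statistics test score from the fitted equation."""
    return b0 + b1 * days_in_club + b2 * math_grade

# Student from part g): 6 days in the club and a Math exam grade of 6.
predicted = predict_score(6, 6)
print(f"Estimated Statistics test score: {predicted:.2f} points")
```

The caveat stated above still applies: the figure is only a point estimate, obtained from a model in which the coefficient of days in clubs is not statistically significant.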
