
Solved Application On Multiple Linear Regression Model

The professor recorded Statistics test scores, days spent in clubs, and Math exam grades for 20 students to analyze the relationships between the variables. A multiple linear regression model found that: 1) the estimated test score decreases by 2.87 points for each additional day spent in clubs and increases by 4.25 points for each additional point on the Math exam, holding the other predictor constant; 2) the regression model as a whole is valid, but only the Math-grade coefficient is statistically significant at the 5% level (the days-in-clubs coefficient is not, p = 0.062); 3) approximately 78% of the variability in test scores is explained by time in clubs and Math grades.

Uploaded by CristiCristi


Econometrics (Academia de Studii Economice din București)


Downloaded by Ionescu Cristian ([Link]@[Link])

The multiple linear regression model – solved application

1. A professor of Statistics wishes to find out whether there is a relationship between his students' Statistics test scores (points), the time they spend in clubs (number of days) and their Math exam grade (points). He recorded the values of the three variables for 20 randomly selected students and, after processing the data, obtained the following output (the dotted cells are to be completed):
Model Summary
Model | R | R Square | Adjusted R Square | Std. Error of the Estimate
1 | R = ……. | R² = …….. | …………. | Se = √MSE = 6,58
a. Predictors: (Constant), Math-grade, Days spent in the club.

ANOVA(a)
Model 1 | Sum of Squares | df | Mean Square | F | Sig.
Regression | SSR = 2729,25 | k = … | MSR = …….. | Fcalc = …. | ,000002
Residual | SSE = … | n-k-1 = … | MSE = …..… | |
Total | SST = … | n-1 = … | | |
a. Dependent Variable: Statistics-test-score
b. Predictors: (Constant), Math-grade, Days spent in the club

Coefficients(a)
Model 1 | B (unstandardized) | Std. Error | Beta (standardized) | t | Sig. | 95,0% CI for B: Lower Bound | Upper Bound
(Constant) | b0 = …….… | sb0 = 10,736 | | tcalc(β0) = 3,49 | ,003 | L(β0) = 14,81 | U(β0) = 60,11
Days spent in the club | b1 = -2,870 | sb1 = 1,435 | -,333 | tcalc(β1) = …. | ,062 | L(β1) = -5,90 | U(β1) = …
Math-grade | b2 = ………. | sb2 = 1,159 | ,611 | tcalc(β2) = …. | ,002 | L(β2) = …. | U(β2) = 6,70
a. Dependent Variable: Statistics-test-score

a. Identify the linear regression model in the sample. Interpret the estimates b0 and b1.
b. Decide whether this equation is appropriate for explaining the variation in Statistics test scores as a function of the time spent in the club and the Math exam grade (Fcrit = 3,59; tcrit = 2,11):
- test the model validity;
- test the statistical significance of the last two model parameters.
c. Identify and interpret the confidence intervals of the last two model parameters.
d. What percentage of the total variability in the Statistics test scores is explained by the regression model?
e. Measure the strength of the relationship between the variables and test the statistical significance of the indicator used.


f. The following correlation matrix is given:

 | Statistics-test-score | Days in clubs | Math-grade
Statistics-test-score | ………….. | |
Days in clubs | -0,787 | ……………. |
Math-grade | 0,859 | -0,742 | ……………..

g. What would be the test-score at Statistics, for a student who spends 6 days in the club and
whose Math-exam grade is 6?

SOLUTION

a) The variables analyzed are:

X1 – independent variable: number of days spent in the club
X2 – independent variable: Math exam grade
Y – dependent variable: Statistics test score
The two independent variables are listed in the “Coefficients” table, in the rows below “(Constant)” (in SPSS) or “Intercept” (in Excel).
n = 20 (sample size)
k = 2 (number of independent variables)

The population 2-factor linear regression model is: $y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \varepsilon_i$.

$\beta_0, \beta_1, \beta_2$ are the model parameters (the first one is the “intercept”; the second and the third are the “slope” parameters, also called partial regression coefficients).
The sample 2-factor linear regression model is $y_i = b_0 + b_1 x_{1i} + b_2 x_{2i} + e_i$.

$b_0, b_1, b_2$ are the estimates of the model parameters $\beta_0, \beta_1, \beta_2$.

The linear regression equation is:
$\hat{y}_i = b_0 + b_1 x_{1i} + b_2 x_{2i}$
We have to identify the values of the three estimates.
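In the 3rd table (Coefficients):
We know b1 = -2,870 and have to find b0 and b2.
$t_{comp}(\beta_0) = \frac{b_0}{s_{b_0}} \;\Rightarrow\; 3{,}49 = \frac{b_0}{10{,}736} \;\Rightarrow\; b_0 = 10{,}736 \cdot 3{,}49 \approx 37{,}47$
Since tcomp = 3,49 is itself rounded, we take b0 as the midpoint of its confidence interval: (14,81 + 60,11) / 2 = 37,46.

$U(\beta_2) = b_2 + t_{crit} \cdot s_{b_2} \;\Rightarrow\; 6{,}7 = b_2 + 2{,}11 \cdot 1{,}159 \;\Rightarrow\; b_2 \approx 4{,}25$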

The linear regression equation is: $\hat{y}_i = 37{,}46 - 2{,}87\, x_{1i} + 4{,}25\, x_{2i}, \quad i = 1, \dots, 20$,
and $\hat{y}_i$ represents the estimated values of Y, calculated based on the regression model.
Interpretations of the partial regression coefficients:

- b1 reveals that the estimated average decrease in the Statistics test score is 2,87 points for a 1-day increase in the time spent in clubs (assuming that the Math exam grade remains unchanged). Also, b1 < 0, showing an inverse (negative) relationship between Y and X1.
- b2 reveals that the estimated average increase in the Statistics test score is 4,25 points for a 1-point increase in the Math exam grade (assuming that the time spent in clubs remains unchanged). Also, b2 > 0, showing a direct (positive) relationship between Y and X2.
Coefficients
Intercept b0 = 37,46
Days spent in the club (X1) b1 = - 2,87
Math exam grade (X2) b2 = 4,25

b) Testing the model validity:

The hypotheses are:

H0: MSR = MSE, i.e. β1 = β2 = 0 (the regression model is not valid)

H1: MSR > MSE, i.e. at least one βj ≠ 0 (the regression model is valid)
In order to decide which hypothesis should be accepted, we apply the F test, using the ANOVA table:
ANOVA(a)
Model 1 | Sum of Squares | df | Mean Square | F | Sig.
Regression | SSR = 2729,25 | k = 2 | MSR = …….. | Fcalc = …. | ,000002
Residual | SSE = … | n-k-1 = 17 | MSE = …..… | |
Total | SST = … | n-1 = 19 | | |
a. Dependent Variable: Statistics-test-score
b. Predictors: (Constant), Math-grade, Days spent in the club

The calculations are as follows:

From the 1st table we have: Se = √MSE = 6,58 → MSE = (6,58)² ≈ 43,3 (the Error Mean Square)

Since $MSE = \frac{SSE}{n-k-1}$, it follows that $SSE = MSE \cdot (n-k-1) = 43{,}3 \cdot 17 = 736{,}1$ (the Error Sum of Squares)

$MSR = \frac{SSR}{k} = \frac{2729{,}25}{2} \approx 1364{,}62$ (the Regression Mean Square)
SST = SSR + SSE = 2729,25 + 736,1 = 3465,35
$F_{comp} = \frac{MSR}{MSE} = \frac{1364{,}62}{43{,}3} \approx 31{,}51$ (the computed F test value)
Decision rule: $F_{comp} = 31{,}51 > F_{crit} = 3{,}59$, so Fcomp falls within the Rejection Region (Rr); we reject H0 and accept H1, so the regression model is valid (Fcrit is the critical F test value).
Also, since Sig. F = 0,000002 < 0,05, we reject H0 and accept H1: the regression model is valid.
The maximum confidence level at which we can claim that the model is valid is (1 - Sig. F)·100% = 99,9998% > 95%.
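A minimal Python sketch of the same ANOVA arithmetic, assuming (as the text does) that MSE is rounded to 43,3 before use. Python prints decimal points rather than commas, and with ordinary rounding F comes out as about 31,52, whereas the text truncates it to 31,51.

```python
# Complete the ANOVA table from SSR, the standard error of the estimate
# and the sample dimensions given in the exercise.
n, k = 20, 2          # sample size, number of predictors
ssr = 2729.25         # regression sum of squares
se = 6.58             # standard error of the estimate, Se = sqrt(MSE)
f_crit = 3.59         # critical F value given in the exercise

mse = round(se ** 2, 1)          # 43.2964, rounded to 43.3 as in the text
df_resid = n - k - 1             # residual degrees of freedom
sse = mse * df_resid             # SSE = MSE * (n - k - 1)
msr = ssr / k                    # MSR = SSR / k
sst = ssr + sse                  # SST = SSR + SSE
f_calc = msr / mse               # F = MSR / MSE

print(f"MSE={mse}, SSE={sse:.1f}, MSR={msr:.3f}, SST={sst:.2f}, F={f_calc:.2f}")
print("model valid (reject H0)" if f_calc > f_crit else "model not valid (accept H0)")
```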

The ANOVA table is:


ANOVA(a)
Model 1 | Sum of Squares | df | Mean Square | F | Sig.
Regression | SSR = 2729,25 | k = 2 | MSR = 1364,62 | Fcalc = 31,51 | ,000002
Residual | SSE = 736,1 | n-k-1 = 17 | MSE = 43,3 | |
Total | SST = 3465,35 | n-1 = 19 | | |

Testing the statistical significance of the model parameters. The last two parameters are β1 and β2.
Testing β1:

H0: β1 = 0 (β1 is not statistically significant)

H1: β1 ≠ 0 (β1 is statistically significant)
$t_{comp} = \frac{b_1}{s_{b_1}} = \frac{-2{,}87}{1{,}435} = -2$ (tcomp can be found in the „t Stat” column of the 3rd Excel table, or in the “t” column of the 3rd SPSS table).

Since $-t_{crit} < t_{comp} < t_{crit}$ (-2,11 < -2 < 2,11), we accept H0: β1 is not statistically significant (tcrit = 2,11 is given in the problem statement).
In addition, as p-value(β1) = 0,062 > 0,05, we accept H0 and conclude that β1 is not statistically significant.

The maximum confidence level at which we could claim that β1 is statistically significant is:
(1 - p-value(β1))·100% = 100 - 6,2 = 93,8% < 95%.
Testing β2:

H0: β2 = 0 (β2 is not statistically significant)

H1: β2 ≠ 0 (β2 is statistically significant)
$t_{comp} = \frac{b_2}{s_{b_2}} = \frac{4{,}25}{1{,}159} \approx 3{,}66$ (tcomp can be found in the „t Stat” column of the 3rd Excel table, or in the “t” column of the 3rd SPSS table).

As tcomp > tcrit (3,66 > 2,11), we reject H0 and accept H1: β2 is statistically significant (tcrit = 2,11 is given in the problem statement).

In addition, as p-value(β2) = 0,002 < 0,05, we accept H1 and conclude that β2 is statistically significant.

The maximum confidence level at which we can claim that β2 is statistically significant is:
(1 - p-value(β2))·100% = 100 - 0,2 = 99,8% > 95%.
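Both t tests follow the same rule, |tcomp| > tcrit, so they can be sketched as one loop. This is an illustrative Python snippet, not SPSS's own computation: the p-values are simply copied from the Sig. column rather than computed, and the dictionary keys are labels we chose. With ordinary rounding, tcomp(β2) is about 3,67 (the text truncates it to 3,66).

```python
# t tests for the two slope parameters: H0: beta_j = 0 vs H1: beta_j != 0.
t_crit = 2.11  # two-sided critical value at alpha = 0.05, df = 17 (given)

# (estimate, standard error, p-value copied from the Sig. column)
slopes = {
    "beta1 (days in club)": (-2.87, 1.435, 0.062),
    "beta2 (math grade)": (4.25, 1.159, 0.002),
}

results = {}
for name, (b, s_b, p_value) in slopes.items():
    t_calc = b / s_b
    significant = abs(t_calc) > t_crit   # |t| falls in the rejection region
    results[name] = (t_calc, significant)
    confidence = (1 - p_value) * 100     # "maximum probability" used in the text
    print(f"{name}: t = {t_calc:.2f}, significant = {significant}, "
          f"max confidence = {confidence:.1f}%")
```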
c) The confidence interval for β1 is:


$b_1 - t_{crit} \cdot s_{b_1} \le \beta_1 \le b_1 + t_{crit} \cdot s_{b_1}$

where the left-hand side is the lower limit L(β1) and the right-hand side is the upper limit U(β1) of the confidence interval for β1, and $s_{b_1}$ is the standard error of the estimate b1.

In our case, b1 = -2,87, sb1 = 1,435, α = 0,05, tcrit = 2,11, so the limits of the confidence interval are:
$L(\beta_1) = b_1 - t_{crit} \cdot s_{b_1} = -2{,}87 - 2{,}11 \cdot 1{,}435 \approx -5{,}9$
$U(\beta_1) = b_1 + t_{crit} \cdot s_{b_1} = -2{,}87 + 2{,}11 \cdot 1{,}435 \approx 0{,}16$
The interval [-5,9; 0,16] covers the true value of the slope parameter β1 with a 0,95 (95%) confidence level.
Since the two limits of the confidence interval for β1 have different signs, the interval includes 0, which means that β1 can be equal to 0 (β1 = 0); thus β1 is not statistically significant.
Since β1 is not statistically significant, there is no need to interpret the values of the two limits.
The confidence interval for β2 is:
$b_2 - t_{crit} \cdot s_{b_2} \le \beta_2 \le b_2 + t_{crit} \cdot s_{b_2}$

where the left-hand side is the lower limit L(β2) and the right-hand side is the upper limit U(β2) of the confidence interval for β2, and $s_{b_2}$ is the standard error of the estimate b2.

In our case, b2 = 4,25, sb2 = 1,159, α = 0,05, tcrit = 2,11, so the two limits are:
$L(\beta_2) = b_2 - t_{crit} \cdot s_{b_2} = 4{,}25 - 2{,}11 \cdot 1{,}159 \approx 1{,}8$
$U(\beta_2) = b_2 + t_{crit} \cdot s_{b_2} = 4{,}25 + 2{,}11 \cdot 1{,}159 \approx 6{,}7$
The interval [1,8; 6,7] covers the true value of the slope parameter β2 with a 0,95 (95%) confidence level.
Since the two limits of the confidence interval for β2 have the same sign (both are positive), the interval doesn't include 0; therefore β2 can't be equal to 0 (β2 ≠ 0) and β2 is statistically significant.
To interpret the two limits, we might say that the estimated average increase in the Statistics test score ranges between 1,8 and 6,7 points for a 1-point increase in the Math exam grade (assuming that the number of days spent in clubs doesn't change).
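The two intervals above use the same formula, so a small helper is enough to reproduce them. The function name `conf_int` is hypothetical, and tcrit = 2,11 is taken from the problem statement rather than computed from the t distribution.

```python
# 95% confidence intervals b_j -/+ t_crit * s_bj for the two slopes.
t_crit = 2.11  # given in the exercise (alpha = 0.05, df = 17)

def conf_int(b, s_b, t_crit):
    """Return the (lower, upper) limits of the confidence interval for beta."""
    margin = t_crit * s_b
    return b - margin, b + margin

ci_beta1 = conf_int(-2.87, 1.435, t_crit)
ci_beta2 = conf_int(4.25, 1.159, t_crit)

for name, (lo, hi) in [("beta1", ci_beta1), ("beta2", ci_beta2)]:
    contains_zero = lo <= 0 <= hi   # zero inside => not significant at 5%
    print(f"{name}: [{lo:.2f}; {hi:.2f}], contains 0: {contains_zero}")
```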

Table 3 is:
Coefficients(a)
Model 1 | B (unstandardized) | Std. Error | Beta (standardized) | t | Sig. | 95,0% CI for B: Lower Bound | Upper Bound
(Constant) | b0 = 37,46 | sb0 = 10,736 | | tcalc(β0) = 3,49 | ,003 | L(β0) = 14,81 | U(β0) = 60,11
Time spent in club | b1 = -2,870 | sb1 = 1,435 | -,333 | tcalc(β1) = -2 | ,062 | L(β1) = -5,90 | U(β1) = 0,16
Math grade | b2 = 4,25 | sb2 = 1,159 | ,611 | tcalc(β2) = 3,66 | ,002 | L(β2) = 1,8 | U(β2) = 6,70
a. Dependent Variable: Statistics test score


d) The answer to this question is provided by the coefficient of determination (R²):

$R^2 = \frac{SSR}{SST} = \frac{2729{,}25}{3465{,}35} \approx 0{,}78$
The coefficient of determination reveals that 78% (R²·100%) of the total variability in the Statistics test score is explained by the independent variables (the time spent in clubs and the Math exam grade), i.e. by the regression model. The indicator can be found in the 1st Excel or SPSS table, under the name „R Square”.
The remaining 22% of the total variability in the Statistics test score is due to random factors (factors other than the time spent in clubs and the Math grade), i.e. it is not explained by the regression model.
The Adjusted R Square (Table 1) is found as follows:
$\bar{R}^2 = 1 - \frac{MSE}{MST} = 1 - \frac{43{,}3}{3465{,}35 / 19} \approx 0{,}76$
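The R² and adjusted R² calculations can be sketched in Python as follows, reusing SSR, SST and MSE from the completed ANOVA table. The second formula for the adjusted R² is the usual textbook equivalent, included only as a cross-check; with ordinary rounding R² is about 0,7876 (reported as 0,78 in the text).

```python
# Coefficient of determination and adjusted R^2 from the ANOVA quantities.
n, k = 20, 2
ssr, sst = 2729.25, 3465.35   # regression and total sums of squares
mse = 43.3                    # error mean square

r2 = ssr / sst                       # R^2 = SSR / SST
mst = sst / (n - 1)                  # total mean square, MST = SST / (n - 1)
r2_adj = 1 - mse / mst               # adjusted R^2 = 1 - MSE / MST

# Equivalent textbook form: 1 - (1 - R^2) * (n - 1) / (n - k - 1)
r2_adj_alt = 1 - (1 - r2) * (n - 1) / (n - k - 1)

print(f"R^2 = {r2:.4f}, adjusted R^2 = {r2_adj:.4f} (alt. form {r2_adj_alt:.4f})")
```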
e) To measure the strength of the relationship between the three variables, we can use the multiple correlation ratio:
$R = \sqrt{R^2} \approx 0{,}88$
R ∈ [0; 1]; it can be found in the first Excel or SPSS table („Multiple R” or “R”).
The correlation between the variables is strong, because the value of R is close to 1.

Testing the statistical significance of the multiple correlation ratio:

The hypotheses are formulated as follows:

H0: $\mathcal{R} = 0$ (R is not statistically significant)

H1: $\mathcal{R} > 0$ (R is statistically significant)

where $\mathcal{R}$ is the population correlation ratio, while R is the sample correlation ratio.

The F test from the ANOVA table is applied: as Fcomp = 31,51 > Fcrit = 3,59, we reject H0 and accept H1; therefore the indicator is statistically significant.
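As a sketch (Python, names ours), R can be obtained from R², and the F statistic of the ANOVA table can be rewritten using R² alone, which shows why the same F test serves both for the model and for the multiple correlation ratio. The identity F = (R²/k) / ((1 - R²)/(n - k - 1)) is the standard one for least-squares regression with an intercept.

```python
import math

# Multiple correlation ratio and its F test, written in terms of R^2.
n, k = 20, 2
r2 = 2729.25 / 3465.35   # R^2 = SSR / SST
f_crit = 3.59            # critical F value given in the exercise

r_multiple = math.sqrt(r2)

# F = (R^2 / k) / ((1 - R^2) / (n - k - 1)); algebraically equal to MSR / MSE
f_from_r2 = (r2 / k) / ((1 - r2) / (n - k - 1))

print(f"R = {r_multiple:.4f}, F = {f_from_r2:.2f}, significant: {f_from_r2 > f_crit}")
```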
Now, Table 1 is:

Model Summary
Model | R | R Square | Adjusted R Square | Std. Error of the Estimate
1 | R = 0,88 | R² = 0,78 | Adjusted R² = 0,76 | Se = √MSE = 6,58
a. Predictors: (Constant), Math-grade, Days spent in the club

f) The correlation matrix is:


On the main diagonal, the correlation coefficient is equal to 1, meaning that each variable is perfectly
self-correlated.
 | Statistics-test-score (Y) | Days in clubs (X1) | Math-grade (X2)
Statistics-test-score (Y) | 1 | |
Days in clubs (X1) | -0,787 | 1 |
Math-grade (X2) | 0,859 | -0,742 | 1
rYX1 = -0,787 reveals a negative and relatively strong relationship between the Statistics test score and the
time spent in clubs.
rYX2 = +0,859 reveals a positive and strong relationship between the Statistics test score and the Math
exam grade.
rX1 X2 = -0,742 reveals a negative and relatively strong relationship between the Math exam grade and the
time spent in clubs.
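For two predictors, the standardized coefficients (the Beta column of Table 3) and R² can be recovered from the correlation matrix with the standard textbook formulas used below. This Python sketch is offered only as a consistency check; because the correlations are rounded to three decimals, it agrees with the SPSS output to about the third decimal.

```python
# Two-predictor identities linking the correlation matrix to the
# standardized coefficients (Beta column) and to R^2.
r_y1, r_y2, r_12 = -0.787, 0.859, -0.742   # rYX1, rYX2, rX1X2

denom = 1 - r_12 ** 2
beta1_std = (r_y1 - r_y2 * r_12) / denom   # standardized coefficient of X1
beta2_std = (r_y2 - r_y1 * r_12) / denom   # standardized coefficient of X2
r2 = (r_y1 ** 2 + r_y2 ** 2 - 2 * r_y1 * r_y2 * r_12) / denom

print(f"Beta1 = {beta1_std:.3f}, Beta2 = {beta2_std:.3f}, R^2 = {r2:.4f}")
```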

g) In the linear regression equation $\hat{y}_i = b_0 + b_1 x_{1i} + b_2 x_{2i}$, that is,

$\hat{y}_i = 37{,}46 - 2{,}87\, x_{1i} + 4{,}25\, x_{2i}$, we replace x1i with 6 and x2i with 6, obtaining the estimated Y:
$\hat{y} = 37{,}46 - 2{,}87 \cdot 6 + 4{,}25 \cdot 6 = 45{,}74$ points on the Statistics test.
We note, however, that using this regression equation to estimate a value of the dependent variable is not entirely appropriate, because not all model parameters are statistically significant.
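The prediction in g) is a single evaluation of the fitted equation; a minimal Python sketch (the `predict` helper is hypothetical) is shown below. The caveat above still applies: β1 is not statistically significant.

```python
# Fitted sample regression equation from part a).
b0, b1, b2 = 37.46, -2.87, 4.25

def predict(days_in_club, math_grade):
    """Estimated Statistics test score from the fitted equation."""
    return b0 + b1 * days_in_club + b2 * math_grade

y_hat = predict(6, 6)   # 6 days in the club, Math grade 6
print(f"{y_hat:.2f}")
```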
