
L.F. – ST22 – Advanced Statistical Methods

Practical Session 1 – Multiple Linear Regression Using Excel


Objectives

Using Excel, you will conduct multiple linear regression analysis.


You must follow the different steps of linear regression analysis, as discussed in class, and interpret all the outputs obtained.
You can rely on the checklist below, which specifies all the elements that must be included.

1. Choice of variables included in the model

Multiple regression is used to model the desire to have cosmetic surgery (y) as a function of the following explanatory variables (regressors):
- x1: self-esteem,
- x2: body satisfaction,
- x3: impression of reality TV, and
- x4: gender.

Therefore, the model equation is the following:

$Y_i = \beta_0 + \beta_1 x_{1,i} + \beta_2 x_{2,i} + \beta_3 x_{3,i} + \beta_4 x_{4,i} + \varepsilon_i$, where $\varepsilon_i \sim N(0, \sigma)$

2. Equation for model (based on sample  estimates) & interpretation of coefficients

Using Excel, we obtain:


$\hat{y}_i = 14.01 - 0.05\,x_{1,i} - 0.32\,x_{2,i} + 0.49\,x_{3,i} - 2.19\,x_{4,i}$

SUMMARY OUTPUT

Regression Statistics
  Multiple R          0.7054
  R Square            0.4976
  Adjusted R Square   0.4854
  Standard Error      2.2509
  Observations        170

ANOVA
              df    SS          MS         F         Significance F
  Regression    4    827.8333   206.9583   40.8492   0.0000
  Residual    165    835.9549     5.0664
  Total       169   1663.7882

              Coefficients   Standard Error   t Stat    P-value   Lower 95%   Upper 95%
  Intercept        14.0107           0.7753   18.0702    0.0000     12.4798     15.5415
  SELFESTM         -0.0479           0.0367   -1.3066    0.1932     -0.1204      0.0245
  BODYSAT          -0.3223           0.1435   -2.2465    0.0260     -0.6056     -0.0390
  IMPREAL           0.4931           0.1274    3.8707    0.0002      0.2416      0.7446
  GENDER           -2.1865           0.6766   -3.2314    0.0015     -3.5225     -0.8505
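
For readers who want to check the Excel output with code, the same fit can be reproduced with Python's statsmodels. This is a minimal sketch, assuming the BDYIMG data have been saved with columns named DESIRE, SELFESTM, BODYSAT, IMPREAL and GENDER; the file name used here is only illustrative.

```python
# Minimal sketch: reproduce the Excel regression output with statsmodels.
# Assumes the BDYIMG data are available as "BDYIMG.xlsx" (file name is an
# assumption) with columns DESIRE, SELFESTM, BODYSAT, IMPREAL, GENDER.
import pandas as pd
import statsmodels.api as sm

df = pd.read_excel("BDYIMG.xlsx")

X = sm.add_constant(df[["SELFESTM", "BODYSAT", "IMPREAL", "GENDER"]])
y = df["DESIRE"]

model = sm.OLS(y, X).fit()
print(model.summary())              # coefficients, t stats, p-values, R², F test
print(model.conf_int(alpha=0.05))   # 95% confidence intervals for the betas
```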

Interpretation of the unknown parameters:

- The desire to have cosmetic surgery decreases on average by 0.05 units for each one-unit increase in self-esteem, keeping the other regressors constant.
- The desire to have cosmetic surgery decreases on average by 0.32 units for each one-unit increase in body satisfaction, keeping the other regressors constant.
- The desire to have cosmetic surgery increases on average by 0.49 units for each one-unit increase in impression of reality TV, keeping the other regressors constant.

- Since “gender” is a qualitative variable, the interpretation of its coefficient is out of the scope of this course.

Interpretation of the intercept

- When all regressors are equal to zero, the desire to have cosmetic surgery is expected to be equal to 14.01.

3. Global model validity (Fisher test) & model quality (R² and adjusted R²)

Considering that the p-value for the Fisher F test is close to zero, we conclude that the model is valid for the overall population, since there is at least one significant regressor for predicting y.

Considering the coefficient of determination (0.4976) and the adjusted coefficient of determination (0.4854), we conclude that approximately half of the total variability is explained by the regression model (which is rather low).
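
The F statistic, its p-value, R² and adjusted R² can be recomputed directly from the ANOVA sums of squares above; the Python sketch below (scipy is assumed, since the practical itself is Excel-based) shows the arithmetic.

```python
# Sketch: recompute F, its p-value, R² and adjusted R² from the ANOVA table.
from scipy import stats

n, k = 170, 4
ss_reg, ss_res, ss_tot = 827.8333, 835.9549, 1663.7882

ms_reg = ss_reg / k                      # 206.96
ms_res = ss_res / (n - k - 1)            # 5.07
F = ms_reg / ms_res                      # about 40.85
p_value = stats.f.sf(F, k, n - k - 1)    # "Significance F", essentially 0

r2 = ss_reg / ss_tot                             # about 0.4976
r2_adj = 1 - (1 - r2) * (n - 1) / (n - k - 1)    # about 0.4854
print(F, p_value, r2, r2_adj)
```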

(other example)

4. Marginal contributions of explanatory variables (Student tests) & confidence intervals for the β's

The p-values of the Student t tests are provided in the second column of the following table, and the confidence intervals are provided in the last two columns.

              P-value   Lower 95%   Upper 95%
  SELFESTM     0.1932     -0.1204      0.0245
  BODYSAT      0.0260     -0.6056     -0.0390
  IMPREAL      0.0002      0.2416      0.7446
  GENDER       0.0015     -3.5225     -0.8505

The p-values for the variables BODYSAT, IMPREAL and GENDER are lower than 0.05. Thus, we can conclude that these variables have a significant marginal contribution at the 95% confidence level. (This can also be seen from the confidence intervals, since zero is not included in them.)

On the other hand, the p-value for the variable SELFESTM is higher than 0.05, denoting that this variable is not significant at the 95% confidence level. (This can also be seen from its confidence interval, since zero lies inside it.)
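
As a cross-check, the t statistics, two-sided p-values and 95% confidence intervals can be recomputed from the coefficients and standard errors reported above; a Python sketch (scipy assumed) follows.

```python
# Sketch: marginal Student tests and 95% CIs from the coefficient table above.
from scipy import stats

df_resid = 170 - 4 - 1                    # n - k - 1 = 165
t_crit = stats.t.ppf(0.975, df_resid)     # about 1.974

coefs = {"SELFESTM": (-0.0479, 0.0367),
         "BODYSAT":  (-0.3223, 0.1435),
         "IMPREAL":  (0.4931, 0.1274),
         "GENDER":   (-2.1865, 0.6766)}

for name, (b, se) in coefs.items():
    t = b / se
    p = 2 * stats.t.sf(abs(t), df_resid)
    lo, hi = b - t_crit * se, b + t_crit * se
    print(f"{name}: t = {t:.4f}, p = {p:.4f}, 95% CI = [{lo:.4f}; {hi:.4f}]")
```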

Correlation coefficients interpretation


The correlation coefficients in Table 1 provide information about the linear relationship between pairs of variables. A correlation coefficient of zero indicates that there is no linear relationship between the two variables. The closer the correlation coefficient is to -1 or +1, the stronger the relationship between the two variables; a coefficient of +1 indicates a perfect positive relationship and -1 a perfect negative relationship.
The correlation of 0.686 between Age and Experience, for example, indicates that the higher the age of a participant, the greater their work experience is on average. Note, however, that this relationship is not perfect: a higher age does not mean proportionally greater experience in every case, only that experience is greater on average for the higher age group.

T-test interpretation
The t statistic in the various regressions indicates the statistical significance of the relationship between the two variables.
For example, the t statistic for Age in the Salary regression is 2.128, whose absolute value is greater than the critical t-value of 1.969. This means that the relationship between Age and Salary is statistically significant; the relationship observed in the sample data is not simply due to chance. Furthermore, the regression coefficient of 802.9337 for Age means that, as Age increases by one year, the average salary is higher by $802.9337.

Note that in regression (iii), the t statistic for Sex is -0.079. In this case, its absolute value, 0.079, is less than the critical value of 1.969. Hence, we infer that the relationship between Sex and Salary is statistically non-significant; we retain the hypothesis that the correlation between Sex and Salary is really zero in the population of interest, and that we observed something slightly different from zero in our sample simply due to chance (sampling fluctuations). It follows that the regression coefficient of -192.7 for Sex is not significant, and we retain the hypothesis that it is really zero in the population.
The results of the other regressions can be interpreted similarly.

5. New case estimation. Forecasted values.

Participant 171 Participant 172


GENDER = female (0 value) GENDER = male (1 value)
SELFESTM = 20 SELFESTM = 38
BODYSAT = 4 BODYSAT = 2
IMPREAL = 3 IMPREAL = 2

We substitute the previous values for the regressors in the following equation:

$\hat{y}_i = 14.01 - 0.05\,x_{1,i} - 0.32\,x_{2,i} + 0.49\,x_{3,i} - 2.19\,x_{4,i}$

Obtaining:

              beta      participant 171   participant 172
  Intercept   14.0107          1                 1
  SELFESTM    -0.0479         20                38
  BODYSAT     -0.3223          4                 2
  IMPREAL      0.4931          3                 2
  GENDER      -2.1865          0                 1

$\hat{y}_{171} = 14.0107 - 0.0479 \cdot 20 - 0.3223 \cdot 4 + 0.4931 \cdot 3 - 2.1865 \cdot 0 \approx 13.24$
$\hat{y}_{172} = 14.0107 - 0.0479 \cdot 38 - 0.3223 \cdot 2 + 0.4931 \cdot 2 - 2.1865 \cdot 1 \approx 10.35$
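
The same substitution can be scripted; the short Python sketch below recomputes the two predictions from the estimated coefficients reported above.

```python
# Sketch: predicted desire for the two new participants, from the fitted betas.
betas = {"const": 14.0107, "SELFESTM": -0.0479, "BODYSAT": -0.3223,
         "IMPREAL": 0.4931, "GENDER": -2.1865}

participants = {
    171: {"SELFESTM": 20, "BODYSAT": 4, "IMPREAL": 3, "GENDER": 0},  # female
    172: {"SELFESTM": 38, "BODYSAT": 2, "IMPREAL": 2, "GENDER": 1},  # male
}

for pid, x in participants.items():
    y_hat = betas["const"] + sum(betas[v] * x[v] for v in x)
    print(pid, round(y_hat, 2))   # about 13.24 and 10.35
```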

6. A-posteriori diagnostics: Residual analysis (including plots) & Multicollinearity analysis

Concerning the multicollinearity analysis, we compute the correlation coefficients between the variables:

              DESIRE     SELFESTM   BODYSAT    IMPREAL    GENDER
  DESIRE       1
  SELFESTM    -0.48546    1
  BODYSAT     -0.64359    0.757216   1
  IMPREAL      0.131726   0.16651    0.14344    1
  GENDER      -0.63678    0.511079   0.828248   0.065294   1

We observe some high correlations between regressors, notably between SELFESTM and BODYSAT (0.76) and between BODYSAT and GENDER (0.83).
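
Beyond the pairwise correlations, variance inflation factors (VIFs) are a common complementary multicollinearity check, not part of the Excel output shown here; the Python sketch below reuses the df DataFrame assumed earlier.

```python
# Sketch: correlation matrix and variance inflation factors (VIF).
# Assumes df is the DataFrame with the BDYIMG data loaded earlier.
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

regressors = ["SELFESTM", "BODYSAT", "IMPREAL", "GENDER"]
print(df[["DESIRE"] + regressors].corr())

X = sm.add_constant(df[regressors])
for i, name in enumerate(X.columns):
    if name != "const":
        print(name, variance_inflation_factor(X.values, i))
```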

Concerning the residual analysis, the residuals' histogram is provided, suggesting approximate normality.

The standardized residuals are also plotted; they appear homoscedastic (constant in variance), and no outliers are detected at the 99% confidence level, since all standardized residuals have an absolute value lower than 3.

[Figure: plot of the standard residuals]
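
A rough equivalent of Excel's standard residuals can be obtained in Python via internally studentized residuals (the definitions differ slightly, so this is only an approximate check); the sketch reuses the fitted model object assumed earlier.

```python
# Sketch: internally studentized residuals as a stand-in for Excel's
# "standard residuals"; values with |r| > 3 would be flagged as outliers.
# Assumes `model` is the fitted statsmodels OLS result from earlier.
import numpy as np

influence = model.get_influence()
std_resid = influence.resid_studentized_internal
print("max |standardized residual|:", np.abs(std_resid).max())
print("indices with |r| > 3:", np.where(np.abs(std_resid) > 3)[0])
```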

7. Alternative (improved) model

Since we observed that the variable SELFESTM is not significant at the 95% confidence level, we remove it from the model and re-estimate the model.

SUMMARY OUTPUT

Regression Statistics
  Multiple R          0.7017
  R Square            0.4924
  Adjusted R Square   0.4832
  Standard Error      2.2557
  Observations        170

ANOVA
              df    SS          MS         F         Significance F
  Regression    3    819.1836   273.0612   53.6679   0.0000
  Residual    166    844.6047     5.0880
  Total       169   1663.7882

              Coefficients   Standard Error   t Stat    P-value   Lower 95%   Upper 95%
  Intercept        13.3228           0.5704   23.3549    0.0000     12.1965     14.4491
  BODYSAT          -0.4508           0.1047   -4.3065    0.0000     -0.6575     -0.2441
  IMPREAL           0.4827           0.1274    3.7885    0.0002      0.2311      0.7343
  GENDER           -1.9113           0.6444   -2.9661    0.0035     -3.1836     -0.6391

In the alternative model, all the regressors are significant.


(Neither the R Square nor the adjusted R Square has changed appreciably.)

The final equation is the following:

$\hat{y}_i = 13.32 - 0.45\,\text{bodysat}_i + 0.48\,\text{impreal}_i - 1.91\,\text{gender}_i$
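
The reduced model can be re-estimated in the same way as before; a short Python sketch, reusing the df DataFrame assumed earlier, is given below.

```python
# Sketch: re-estimate the model without SELFESTM (assumes df from earlier).
import statsmodels.api as sm

X_red = sm.add_constant(df[["BODYSAT", "IMPREAL", "GENDER"]])
reduced = sm.OLS(df["DESIRE"], X_red).fit()
print(reduced.params)                            # about 13.32, -0.45, 0.48, -1.91
print(reduced.rsquared, reduced.rsquared_adj)    # about 0.4924 and 0.4832
```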

Context

Reality TV and Cosmetic Surgery


(Chapter 12: #12.17 p.725; #12.27 p.732; #12.128 p.804; #12.116 p.782)

How much influence does the media, especially reality television programs, have on one’s decision to undergo cosmetic surgery?
This was the question of interest to psychologists who published an article in Body Image: An International Journal of Research
(March 2010).
In the study, 170 college students answered questions about their impressions of reality TV shows featuring cosmetic surgery.

The five variables analyzed in the study were measured as follows:


DESIRE—scale ranging from 5 to 25, where the higher the value, the greater the interest in having cosmetic surgery;
GENDER—1 if male, 0 if female;
SELFESTM—scale ranging from 4 to 40, where the higher the value, the greater the level of self-esteem;

BODYSAT—scale ranging from 1 to 9, where the higher the value, the greater the satisfaction with one’s own body;
IMPREAL—scale ranging from 1 to 7, where the higher the value, the more one believes reality television shows featuring
cosmetic surgery are realistic.

The data for the study (simulated based on statistics reported in the journal article) are saved in the Excel file BDYIMG (available
on BlackBoard).

Additional Participants:

Participant 171 Participant 172


GENDER = female GENDER = male
SELFESTM = 20 SELFESTM = 38
BODYSAT = 4 BODYSAT = 2
IMPREAL = 3 IMPREAL = 2

Exercise: Testing a model (without all tables)


A staff restaurant conducted a survey collecting data from a random sample of 32 clients. They were asked,
among other things: how many times they ate at the restaurant during the last month (variable:
FREQUENCY); how much they spent on a meal (variable: SPENDING); and how old they were (variable:
AGE).

The restaurant manager would like to construct a model that would explain the spending amount in
terms of the frequency and the age for all clients.

You will find below:


Appendix 1: scatterplot of SPENDING w.r.t. FREQUENCY
Appendix 2: scatterplot of SPENDING w.r.t. AGE
Appendix 3: linear regression results obtained using a statistical package modeling SPENDING in
terms of FREQUENCY and AGE.

You are asked to proceed with the different tests required to validate the linear regression model for all
clients among which the sample was taken.

Appendix 1 (1st scatterplot)


Response variable: SPENDING
Explanatory variable: FREQUENCY

Appendix 2 (2nd scatterplot)


Response variable: SPENDING
Explanatory variable: AGE

Appendix 3 (multiple linear regression model)


Response variable: SPENDING
Explanatory variables: FREQUENCY and AGE

Variable #1 (SPENDING)
Mean 4.17188
Corrected Standard Deviation 1.53249
Variable #2 (FREQUENCY)
Mean 10.59375
Corrected Standard Deviation 6.76738
Variable #3 (AGE)
Mean 35.75
Corrected Standard Deviation 11.6453
Count n 32
R-square 0.58999

Coefficient Standard Error


Intercept 6.13803
FREQUENCY -0.17208 0.02782
AGE -0.00401 0.01617

ANALYSIS OF VARIANCE
Sum of Squares
Regression ?
Residual 29.8504
Total 72.8047 ?

SOLUTIONS:
ANOVA TABLE
              SS        df                MS           F            p-value
  Regression  42.9543   2                 21.47715     20.8652933   < 0.01
  Residual    29.8504   29 = 32 - (2+1)   1.02932414
  Total       72.8047   31 = 32 - 1

SS_Regression = 72.8047 − 29.8504 = 42.9543

Overall test (F test)

H0: β1 = β2 = 0
H1: at least one of the βj (j = 1, 2) is not 0

From the F table (df1 = 2; df2 = 29), the critical value is 3.33 (at the 5% level).

Since F = 20.87 > 3.33, there is enough statistical evidence to reject the null hypothesis (no model): the model is globally significant.
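
The F statistic and its critical value can also be recomputed as follows; this Python/scipy sketch mirrors the hand computation (the practical itself uses the printed F table).

```python
# Sketch: overall F test for the SPENDING model (n = 32, k = 2 regressors).
from scipy import stats

n, k = 32, 2
ss_res, ss_tot = 29.8504, 72.8047
ss_reg = ss_tot - ss_res                     # 42.9543

F = (ss_reg / k) / (ss_res / (n - k - 1))    # about 20.87
f_crit = stats.f.ppf(0.95, k, n - k - 1)     # about 3.33
print(F, f_crit, F > f_crit)                 # F > critical value: reject H0
```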

              Coefficient   Standard Error   t             p-value
  Intercept    6.13803
  FREQUENCY   -0.17208      0.02782          -6.18547807   < 0.05
  AGE         -0.00401      0.01617          -0.24799011   > 0.05

Frequency:

$95\%\ \text{CI for } \beta_1 = \hat{\beta}_1 \pm t_{29,\alpha/2} \times s_{\hat{\beta}_1} = [-0.17208 \pm 2.045 \times 0.02782]$

We will run a Student test for the coefficient β1 of the explanatory variable FREQUENCY as follows:
H0: β1 = 0 in the presence of AGE
H1: β1 ≠ 0 in the presence of AGE

We calculate the test statistic $t = \frac{\hat{\beta}_1 - 0}{s_{\hat{\beta}_1}} = \frac{-0.17208}{0.02782} = -6.18547807$

The critical values associated with a Student distribution for df = 29 and a type I error risk α = 0.05 are ±t29;0.025 = ±2.045 (two-tailed test).
|t| >> critical value, so we can reject H0.

Age:

$95\%\ \text{CI for } \beta_2 = \hat{\beta}_2 \pm t_{29,\alpha/2} \times s_{\hat{\beta}_2} = [-0.00401 \pm 2.045 \times 0.01617]$

We will run a Student test for the coefficient β2 of the explanatory variable AGE as follows:
H0: β2 = 0 in the presence of FREQUENCY
H1: β2 ≠ 0 in the presence of FREQUENCY

We calculate the test statistic $t = \frac{\hat{\beta}_2 - 0}{s_{\hat{\beta}_2}} = \frac{-0.00401}{0.01617} = -0.24799011$

The critical values associated with a Student distribution for df = 29 and a type I error risk α = 0.05 are ±t29;0.025 = ±2.045 (two-tailed test).
|t| < critical value, so we cannot reject H0.
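
Both marginal tests and their 95% confidence intervals can be recomputed from the reported coefficients and standard errors; a Python sketch with scipy follows.

```python
# Sketch: Student tests and 95% CIs for FREQUENCY and AGE (df = 29).
from scipy import stats

df_resid = 29
t_crit = stats.t.ppf(0.975, df_resid)        # about 2.045

for name, b, se in [("FREQUENCY", -0.17208, 0.02782),
                    ("AGE",       -0.00401, 0.01617)]:
    t = b / se
    p = 2 * stats.t.sf(abs(t), df_resid)
    lo, hi = b - t_crit * se, b + t_crit * se
    print(f"{name}: t = {t:.4f}, p = {p:.4f}, 95% CI = [{lo:.4f}; {hi:.4f}]")
```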

Exercise 2: Choosing a model and using it


The HR director of an industrial group would like to construct a model explaining the monthly salary
of all employees.
Using data collected from a random sample of 36 employees, he tests two explanatory variables that
he deems relevant: the number of years of graduate studies (X1) and the number of years of service
(X2).
You can find below results from three regression models he tested using Excel.

1) For each of the three suggested models:


a. Calculate the coefficient of determination r²
b. Conduct the Student tests for the explanatory variables.
2) Can you help the HR director choose the most suitable model?
a. Using the information provided, which model would you suggest to use? Justify your choice.
b. Estimate the parameters of the chosen model.
3) Pierre Durand, an employee of this group, is 38 years old, with 10 years of service and 4 years of
graduate studies. His monthly salary is 2050 Euros and he thinks he is underpaid.
Calculate a 95% confidence interval for the mean salary of an employee with Pierre Durand’s profile.
If you were the HR director of that firm, what would you tell Pierre Durand about his salary?

Variable Mean Sample standard deviation*


Y 1850 795.88
X1 3.5 2.40
X2 4.17 2.15
* Reminder: this is the unbiased point estimate (computed with a 1/(n−1) coefficient)

Regression of Y w.r.t X1
ANALYSIS OF VARIANCE
Sum of Squares
Regression ?
Residual 694 925
Total 22 170 000

              Coefficients   Standard error
  Constant     706
  Variable X1  326.9          10.08

Regression of Y w.r.t. X2
ANALYSIS OF VARIANCE
Sum of Squares
Regression ?
Residual 22 156 025
Total ? ?

              Coefficients   Standard error
  Constant     1 811.2
  Variable X2  9.32           63.62

Regression of Y w.r.t. X1, X2


ANALYSIS OF VARIANCE
Sum of Squares
Regression ?
Residual 681 981.4
Total ?

              Coefficients   Standard error
  Constant     742
  Variable X1  327.27         10.15
  Variable X2  -8.98          11.35

SOLUTIONS:

1) For each of the three suggested models:


a. Calculate the coefficient of determination r²
b. Conduct the Student tests for the explanatory variables.

a. r²=SSRegression/SSTotal=(SSTotal-SSResidual)/SSTotal

Regression of Y w.r.t X1
ANALYSIS OF VARIANCE
Sum of Squares
Regression    21 475 075
Residual         694 925
Total         22 170 000

r² = (22 170 000 − 694 925)/22 170 000 = 21 475 075/22 170 000 = 0.96865471

Regression of Y w.r.t. X2
ANALYSIS OF VARIANCE
Sum of Squares
Regression        13 975
Residual      22 156 025
Total         22 170 000

r² = (22 170 000 − 22 156 025)/22 170 000 = 13 975/22 170 000 = 0.00063036

Regression of Y w.r.t. X1, X2


ANALYSIS OF VARIANCE
              Sum of Squares
Regression    21 488 018.6
Residual         681 981.4
Total         22 170 000

r² = (22 170 000 − 681 981.4)/22 170 000 = 21 488 018.6/22 170 000 = 0.96923855
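
The three r² values follow directly from the sums of squares; a short Python sketch of the arithmetic is given below.

```python
# Sketch: r² for the three HR models from the sums of squares above.
ss_total = 22_170_000
for label, ss_res in [("Y ~ X1", 694_925),
                      ("Y ~ X2", 22_156_025),
                      ("Y ~ X1 + X2", 681_981.4)]:
    r2 = (ss_total - ss_res) / ss_total
    print(label, round(r2, 5))    # 0.96865, 0.00063, 0.96924
```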

b.

Regression of Y w.r.t X1
              Coefficients   Standard error
  Constant     706
  Variable X1  326.9          10.08

We will run a Student test for the coefficient β1 of the explanatory variable X1 as follows:
H0: β1 = 0
H1: β1 ≠ 0

We calculate the test statistic $t = \frac{\hat{\beta}_1 - 0}{s_{\hat{\beta}_1}} = \frac{326.9}{10.08} = 32.4305556$

The critical values associated with a Student distribution for df = n − 2 = 36 − 2 = 34 and a type I error risk α = 0.05, t34;0.025, are not included in the tables. We know t30;0.025 = 2.042 and t40;0.025 = 2.021 (two-tailed test).
|t| >> critical value, so we can reject H0.

Regression of Y w.r.t X2
              Coefficients   Standard error
  Constant     1 811.2
  Variable X2  9.32           63.62

We will run a Student test for the coefficient β2 of the explanatory variable X2 as follows:
H0: β2 = 0
H1: β2 ≠ 0

We calculate the test statistic $t = \frac{\hat{\beta}_2 - 0}{s_{\hat{\beta}_2}} = \frac{9.32}{63.62} = 0.14649481$

The critical values associated with a Student distribution for df = n − 2 = 36 − 2 = 34 and a type I error risk α = 0.05, t34;0.025, are not included in the tables. We know t30;0.025 = 2.042 and t40;0.025 = 2.021 (two-tailed test).
|t| << critical value, so we do not reject H0.

Regression of Y w.r.t. X1, X2


              Coefficients   Standard error
  Constant     742
  Variable X1  327.27         10.15
  Variable X2  -8.98          11.35

X1:
We will run a Student test for the coefficient β1 of the explanatory variable X1 as follows:
H0: β1 = 0 in the presence of X2
H1: β1 ≠ 0 in the presence of X2

We calculate the test statistic $t = \frac{\hat{\beta}_1 - 0}{s_{\hat{\beta}_1}} = \frac{327.27}{10.15} = 32.2433498$

The critical values associated with a Student distribution for df = n − (k+1) = n − 3 = 36 − 3 = 33 and a type I error risk α = 0.05, t33;0.025, are not included in the tables. We know t30;0.025 = 2.042 and t40;0.025 = 2.021 (two-tailed test).

|t| >> critical value, so we can reject H0.

X2:
We will run a Student test for the coefficient β2 of the explanatory variable X2 as follows:
H0: β2 = 0 in the presence of X1
H1: β2 ≠ 0 in the presence of X1

We calculate the test statistic $t = \frac{\hat{\beta}_2 - 0}{s_{\hat{\beta}_2}} = \frac{-8.98}{11.35} = -0.79118943$

The critical values associated with a Student distribution for df = n − (k+1) = n − 3 = 36 − 3 = 33 and a type I error risk α = 0.05, t33;0.025, are not included in the tables. We know t30;0.025 = 2.042 and t40;0.025 = 2.021 (two-tailed test).
|t| < critical value, so we cannot reject H0.

2.a Choose the most suitable model

From 1.b, we choose the regression of Y w.r.t X1: X1 is the only significant explanatory variable, and adding X2 barely improves r².

2.b Parameters ($\hat{\beta}_0$, $\hat{\beta}_1$, $s^2$)

              Coefficients   Standard error
  Constant     706
  Variable X1  326.9          10.08

$\hat{\beta}_0 = 706$ ; $\hat{\beta}_1 = 326.9$

ANALYSIS OF VARIANCE
              Sum of Squares
  Regression  21 475 075
  Residual       694 925
  Total       22 170 000

s² = SS_Residual/(n − 2) = 694 925/34 = 20 438.9706

3) Pierre Durand, an employee of this group, is 38 years old, with X2=10 years of service and X1=4
years of graduate studies. His monthly salary is Y=2050 Euros and he thinks he is underpaid.
Calculate a 95% confidence interval for the mean salary of an employee with Pierre Durand’s profile.

We use the model chosen in 2.a


95% CI for E(y) when x = 4:

$\hat{y} \pm t_{n-2,\alpha/2} \times s_\varepsilon \sqrt{\frac{1}{n} + \frac{(x_p - \bar{x})^2}{\sum (x - \bar{x})^2}} = \left[2013.6 \pm 2.042 \times 142.964928 \times \sqrt{\frac{1}{36} + \frac{(4 - 3.5)^2}{35 \times 2.40^2}}\right]$
$= [2013.6 \pm 2.042 \times 142.964928 \times 0.17034629] = [2013.6 \pm 49.7299378] = [1963.87006\,;\,2063.32994]$

where $s_\varepsilon = \sqrt{s^2} = 142.964928$, $\hat{y} = 706 + 326.9 \times 4 = 2013.6$, $t_{34;0.025} \approx 2.042$ (using the Student table with df = 30), and $\sum (x - \bar{x})^2 = (n-1) \times s_x^2 = 35 \times 2.40^2$.
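
The same confidence interval can be recomputed step by step; the Python sketch below uses the exact t quantile for df = 34 (about 2.032) rather than the tabulated 2.042 for df = 30, so the bounds differ very slightly from those above.

```python
# Sketch: 95% CI for the mean salary of an employee with X1 = 4, using Y ~ X1.
from math import sqrt
from scipy import stats

n = 36
b0, b1 = 706.0, 326.9
s_eps = sqrt(694_925 / (n - 2))              # residual std. error, about 142.96
x_bar, s_x = 3.5, 2.40
sxx = (n - 1) * s_x ** 2                     # 35 * 2.40²

x_p = 4
y_hat = b0 + b1 * x_p                        # 2013.6
t_crit = stats.t.ppf(0.975, n - 2)           # about 2.032 (the text uses 2.042)
half_width = t_crit * s_eps * sqrt(1 / n + (x_p - x_bar) ** 2 / sxx)
print(y_hat - half_width, y_hat + half_width)   # roughly [1964; 2063]
```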
If you were the HR director of that firm, what would you tell Pierre Durand about his salary?

His salary of 2050 € belongs to the confidence interval.


He is not underpaid (in fact, his salary is higher than the predicted mean salary of 2013.6 €!).

Durbin Watson:

Visually: an examination of this plot does not show any obvious pattern in the residuals.

The test statistic always ranges from 0 to 4, where:
- d = 2 indicates no autocorrelation
- d < 2 indicates positive serial correlation
- d > 2 indicates negative serial correlation

In general, if d is less than 1.5 or greater than 2.5, there is potentially a serious autocorrelation problem. Otherwise, if d is between 1.5 and 2.5, autocorrelation is likely not a cause for concern.

H0 (null hypothesis): There is no correlation among the residuals.


HA (alternative hypothesis): The residuals are autocorrelated.

For α = 0.05, n = 13 observations, and k = 2 independent variables in the regression model, the Durbin-Watson table shows the following upper and lower critical values:
- Lower critical value: 0.86
- Upper critical value: 1.56

Since our test statistic of 1.3475 does not lie outside of this range, we do not have sufficient evidence to reject the null hypothesis of the Durbin-Watson test. In other words, there is no significant correlation among the residuals.

If you reject the null hypothesis and conclude that autocorrelation is present in the residuals, you have a few different options to correct this problem if you deem it to be serious enough:
- For positive serial correlation, consider adding lags of the dependent and/or independent variable to the model.
- For negative serial correlation, check to make sure that none of your variables are over-differenced.
- For seasonal correlation, consider adding seasonal dummy variables to the model.
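
If residuals are available in code (e.g. from a statsmodels fit such as the model object assumed earlier), the Durbin-Watson statistic itself can be computed as follows; note that the d = 1.3475 quoted above comes from a separate example with n = 13.

```python
# Sketch: Durbin-Watson statistic from a fitted model's residuals.
# `model` is assumed to be a statsmodels OLS result (e.g. the one fitted above);
# the d = 1.3475 quoted in the text comes from a separate n = 13 example.
from statsmodels.stats.stattools import durbin_watson

d = durbin_watson(model.resid)
print(d)    # compare with the lower/upper critical values from the DW table
```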

Confidence Interval for Regression Coefficients

$b_j \pm t_{n-(k+1),\,\alpha/2} \times s_{b_j}$, with $b_j$ being the estimated regression coefficient concerned and $s_{b_j}$ its standard error.

This interval can be used to study the impact of an independent variable on the dependent variable.
Forecasting

Use the standard deviation of the regression (the global standard error).
