Assignment Chapter 4 MULTI LINEAR REGRESSION
ASSIGNMENT 1
TITLE :
PREPARED TO :
PREPARED BY :
NO NAME NO MATRIC
A partial regression coefficient measures the change in the mean value of the dependent
variable for a one-unit change in one explanatory variable, while holding the other
explanatory variables in the model constant. In other words, it isolates the effect of
that variable after accounting for the other variables that may also be related to both.
The coefficient of determination, r², measures the goodness of fit of the fitted sample
regression line (SRL); that is, it gives the proportion or percentage of the total variation
in the dependent variable Y explained by the single explanatory variable X. This concept of
r² can be extended to regression models containing any number of explanatory variables.
Thus, in the three-variable case we would like to know the proportion of the total variation
in Y explained by X2 and X3 jointly. The quantity that gives this information is known as
the multiple coefficient of determination and is denoted by the symbol R².
A joint hypothesis is a null hypothesis that imposes more than one restriction at the same
time, so it is written with multiple equal signs (for example, H0: B2 = B3 = 0). A typical
example is testing whether a group of explanatory variables should be included in a
particular model.
e. Adjusted R²
Adjusted R-squared is a modified version of R² that has been adjusted for the
number of predictors in the model. It increases when a new term improves the
model by more than would be expected by chance, and it decreases when a new
predictor improves the model by less than would be expected by chance.
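The adjustment can be illustrated with a short Python sketch. It assumes the common textbook form, adjusted R² = 1 - (1 - R²)(n - 1)/(n - k), where k counts all estimated parameters including the intercept; the function name `adjusted_r2` is only illustrative. The R² values used are the ones quoted in the F-test calculation of question 4(g).

```python
def adjusted_r2(r2: float, n: int, k: int) -> float:
    """Adjusted R-squared for a model with k estimated parameters
    (intercept included) fitted to n observations.

    Assumes the form 1 - (1 - R^2) * (n - 1) / (n - k).
    """
    if n <= k:
        raise ValueError("need more observations than parameters")
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k)


# R-squared values quoted in question 4(g); n = 64 countries.
one_regressor = adjusted_r2(0.6696, n=64, k=2)     # CM on FLR only
three_regressors = adjusted_r2(0.7474, n=64, k=4)  # CM on FLR, PGNP, TFR

print(round(one_regressor, 4), round(three_regressors, 4))
```

Under these assumptions the adjusted value stays higher for the three-regressor model (about 0.735 against about 0.664), so the extra predictors more than pay for the lost degrees of freedom.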
2. Explain step by step the procedure involved in testing the statistical significance of a
single or multiple regression coefficient.
Step:
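I. State the null and alternative hypotheses, for example H0: B2 = 0 against
H1: B2 ≠ 0 (and likewise for B3).
II. Choose the level of significance.
III. Compute the t-value of the coefficient under the null hypothesis.
IV. Compare the computed t-value with the critical value (t-critical).
V. If the computed t-value lies beyond t-critical (in absolute value for a two-tail
test, or in the direction of H1 for a one-tail test), reject H0.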
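The decision rule in steps IV and V can be sketched in Python as follows. This is only an illustration: the function name `reject_null` and the `tail` labels are hypothetical, and the critical value must still be looked up in a t-table for the chosen significance level and degrees of freedom. The t-values in the example calls are the ones reported in question 4.

```python
def reject_null(t_stat: float, t_crit: float, tail: str = "two") -> bool:
    """Return True when H0 is rejected.

    t_crit is the positive critical value taken from a t-table.
    tail is "two" (H1: B != 0), "left" (H1: B < 0) or "right" (H1: B > 0).
    """
    if t_crit <= 0:
        raise ValueError("t_crit must be positive")
    if tail == "two":
        return abs(t_stat) > t_crit
    if tail == "left":
        return t_stat < -t_crit
    if tail == "right":
        return t_stat > t_crit
    raise ValueError("tail must be 'two', 'left' or 'right'")


# t-values reported in question 4, with the critical value of 2 used there.
print(reject_null(-11.2, 2.0, tail="left"))        # FLR in model (b)
print(reject_null(3.070883214, 2.0, tail="two"))   # TFR in model (d)
```

Writing the one-tail cases separately makes the sign of the statistic matter: a large negative t-value rejects H0 only when H1 points in the negative direction.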
3. Explain step by step the procedure involved in Testing the statistical significance of all
partial slope coefficients or F-test.
Step :
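I. State the hypotheses for the partial slope coefficients: H0 says that all of them
are zero, H1 says that at least one of them is not zero.
II. Use the ANOVA technique to compute the F statistic.
III. If the computed F under the null hypothesis is greater than the critical F value,
reject H0; otherwise, do not reject it.
IV. Make sure that the numerator and denominator degrees of freedom (df) are
counted properly.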
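As a rough illustration of these steps, the overall F statistic can also be computed from R² alone. The sketch below assumes the standard identity F = (R²/(k - 1)) / ((1 - R²)/(n - k)), with k counting all estimated parameters including the intercept; the function name `overall_f` is hypothetical. It uses the rounded R² of 0.7474 quoted in question 4(g) and the critical value of 2.76 quoted in question 4(d).

```python
def overall_f(r2: float, n: int, k: int) -> float:
    """F statistic for H0: all partial slope coefficients are zero.

    k counts every estimated parameter, intercept included, so the
    numerator has k - 1 df and the denominator has n - k df.
    """
    numerator = r2 / (k - 1)
    denominator = (1.0 - r2) / (n - k)
    return numerator / denominator


# Model (d): CM on FLR, PGNP and TFR, n = 64, k = 4 (rounded R-squared).
f_stat = overall_f(0.7474, n=64, k=4)
f_critical = 2.76  # 5% critical value quoted in question 4(d)

print(round(f_stat, 2), f_stat > f_critical)
```

With the rounded R² this gives an F of roughly 59.2, slightly different from the 59.1677 reported in question 4(d), presumably because that figure comes from unrounded regression output.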
4. Table 8.6 provided in the excel gives data on child mortality (CM), female literacy rate (FLR),
per capita GNP (PGNP), and total fertility rate (TFR) for a group of 64 countries.
https://1drv.ms/x/c/642ccf8447744e51/EWQgG54G-GpKknFc3lC-9BUBMC47NlInvkFxAcaahm1R1Q?e=eZOLQo
a. A Priori, what is the expected relationship between CM and each of the other variables?
Firstly, CM and FLR: negative relationship (as the female literacy rate increases,
child mortality is expected to decrease).
Secondly, CM and PGNP: negative relationship (as per capita GNP increases, child
mortality is expected to decrease).
Thirdly, CM and TFR: positive relationship (as the total fertility rate increases, child
mortality is expected to increase).
b. Regress CM on FLR and obtain the usual regression results. Perform and report a confidence
interval test, a significance t-test (one-tail test) and a p-value test. Interpret the relationship
between CM and FLR at the 5% significance level.
The 95% confidence interval for the true effect of FLR on CM runs from -2.82 to -1.96.
Since the interval does not include zero, the negative relationship between FLR and CM is
statistically significant. This means that as the female literacy rate increases, child
mortality decreases.
Significance t-test: the computed t-value is -11.2, and its absolute value (11.2) exceeds the
critical t-value of 2, so we reject the null hypothesis. This large negative t-value further
confirms the negative relationship between FLR and CM.
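As a quick consistency check, the reported interval and the reported t-value can be tied together in Python. The sketch assumes the interval was built as estimate ± t-critical × standard error with a critical value of 2; under that assumption the midpoint is the slope estimate and the half-width gives the standard error. The function name `summarize_interval` is hypothetical, and the figures it returns are back-calculated, not taken from the regression output.

```python
def summarize_interval(lower: float, upper: float, t_crit: float):
    """Recover (estimate, standard error, t statistic) from a symmetric
    confidence interval of the form estimate +/- t_crit * std_err."""
    estimate = (lower + upper) / 2.0
    std_err = (upper - lower) / (2.0 * t_crit)
    t_stat = estimate / std_err  # test of H0: coefficient = 0
    return estimate, std_err, t_stat


# Interval for the FLR slope in model (b); t_crit = 2 is an assumption.
estimate, std_err, t_stat = summarize_interval(-2.82, -1.96, t_crit=2.0)
excludes_zero = not (-2.82 <= 0.0 <= -1.96)

print(round(estimate, 2), round(std_err, 3), round(t_stat, 1), excludes_zero)
```

The implied t-value of about -11.1 is close to the reported -11.2; the small gap is what rounding the interval endpoints would produce.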
c. Regress CM on FLR and PGNP and obtain the usual results. Perform and report a
confidence interval test, a significance t-test (one-tail test), a p-value test and an F-test.
Interpret the relationship between CM, FLR and PGNP at the 5% significance level.
The 95% confidence interval for the true effect of FLR on CM runs from -2.66 to -1.81.
Since the interval does not include zero, the negative relationship between FLR and CM is
statistically significant.
For PGNP, we use a two-tailed test at alpha = 0.05. The two-tailed critical t-value is
2.000. The test statistic is 2.818702828, which is greater than 2.000, so we reject the null
hypothesis and conclude that the PGNP parameter is significant in the model; that is, we
have enough evidence to conclude that PGNP is linearly related to CM.
The p-value is lower than our alpha of 0.05, so we again reject the null hypothesis. The F
statistic tests the significance of the entire regression, and the critical F value is 3.15.
This regression is statistically significant because the F statistic is greater than the
critical value: 73.83253623 > 3.15.
So there is overwhelming evidence that CM decreases as FLR and PGNP increase.
d. Regress CM on FLR, PGNP, and TFR and obtain the usual results. Perform and report a
confidence interval test, a significance t-test (two-tail test), a p-value test and an F-test.
Interpret the relationship between CM, FLR, PGNP and TFR at the 5% significance level.
For TFR, we use beta 4 in the hypothesis and a two-tailed test at alpha = 0.05. The
two-tailed critical t-value is 2.000. The test statistic is 3.070883214, which is greater than
2.000, so we reject the null hypothesis and conclude that the TFR parameter is significant
in the model; that is, we have enough evidence to conclude that TFR is linearly related to
CM.
The p-value is lower than our alpha of 0.05, so we again reject the null hypothesis. The F
statistic tests the significance of the entire regression, and the critical F value is 2.76.
This regression is statistically significant because the F statistic is greater than the
critical value: 59.16766956 > 2.76.
The overall model is significant, and there is a significant relationship between the
dependent variable and at least one of the independent variables.
e. Given the various regression results, which model would you choose and why?
We chose the regression model in (d), for two reasons. First, its variables are all relevant:
FLR, PGNP and TFR each show a significant relationship with CM. Second, it has the
highest R-squared of the three models in (b), (c) and (d), which means that a larger
proportion of the variance in the dependent variable is explained by the independent
variables.
f. If the regression model in (d) is the correct model, but you estimate (b) or (c), what are the
consequences?
If we estimate model (b) or (c) instead, the first consequence is misleading confidence
intervals and t-tests. The 95% confidence interval we calculated is meant to show the range
of plausible values for the true effect of FLR on CM, but with relevant variables left out it
may give a false impression of precision about the relationship.
The second consequence is an inaccurate estimate of the relationship between FLR and CM.
If the omitted variables are correlated with FLR, the FLR coefficient in (b) can misstate
the size, or even the direction, of FLR's impact on CM.
g. Suppose you have regressed CM on FLR as in (b). How would you decide if it is worth
adding the variables PGNP and TFR to the model? Which test would you use? Show the
necessary calculations and test it at 1 percent significance level.
An F-test can be used to determine if it makes sense to include PGNP and TFR in model
(b) with only FLR. This test computes the difference between the explained variance of a
model with FLR (b) and a model with all three variables (FLR, PGNP, and TFR). We may
assess whether more variables significantly improve the model's explanatory power using
the F-test.
F-test statistic:
\[
F = \frac{(SSR_r - SSR_u)/q}{SSR_u/(n - k - 1)}
\]
or, in terms of R²,
\[
F = \frac{(R^2_{ur} - R^2_r)/m}{(1 - R^2_{ur})/(n - k)} \sim F_{m,\, n-k}
\]
\[
F = \frac{(0.7474 - 0.6696)/2}{(1 - 0.7474)/(64 - 4)} \approx 9.236718157 \approx 9.2367
\]
Since the F-statistic (9.2367) is greater than the critical value (4.98), we reject the null
hypothesis that the coefficients of PGNP and TFR are both zero. We can therefore conclude
that PGNP and TFR are jointly statistically significant: at the 1% significance level,
adding them significantly improves the explanatory power of the model.
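The calculation in (g) can be reproduced with a short Python sketch of the R² form of the F-test. The function name `nested_f` and its parameter names are hypothetical; m is the number of added variables and k the number of parameters in the unrestricted model, as in the formula above. The inputs are the rounded R² values shown in the calculation, so the result comes out near 9.24 rather than exactly 9.2367, presumably because the reported figure used unrounded values.

```python
def nested_f(r2_unrestricted: float, r2_restricted: float,
             n: int, k_unrestricted: int, m: int) -> float:
    """F statistic for H0: the m added coefficients are all zero.

    k_unrestricted counts every parameter in the larger model
    (intercept included), giving n - k_unrestricted denominator df.
    """
    numerator = (r2_unrestricted - r2_restricted) / m
    denominator = (1.0 - r2_unrestricted) / (n - k_unrestricted)
    return numerator / denominator


# Model (b) against model (d): PGNP and TFR added, so m = 2.
f_stat = nested_f(0.7474, 0.6696, n=64, k_unrestricted=4, m=2)
f_critical = 4.98  # 1% critical value used in the answer above

print(round(f_stat, 2), f_stat > f_critical)
```

Either way the statistic is well above 4.98, so the decision to reject H0 does not depend on the rounding.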