Assignment

The document describes estimating a population regression model to analyze the relationship between college GPA (colgpa) and various factors. Model 1 uses OLS to regress colgpa on high school GPA (hsperc), log of study hours per week (logtothrs), and a constant. The coefficients are interpreted, with hsperc negatively related to colgpa and logtothrs positively related. Model 2 adds race dummy variables for black and white students, finding black students have lower colgpa on average than other races. Heteroskedasticity and functional form tests find no issues with Model 1. Race dummies are jointly significant in affecting colgpa. Measurement error in study hours is random and would

Uploaded by

Minza Jahangir

In order to analyse the data you estimate the following population regression model:

colgpa = β0 + β1 hsperc + β2 log(tothrs) + U    (1)

1. (10 points) Use OLS to estimate equation (1). Report the results in equation or tabular
form. Interpret the estimated coefficients of β1 and β2.
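. regress colgpa hsperc logtothrs

      Source |       SS       df       MS              Number of obs =    4137
-------------+------------------------------           F(  2,  4134) =  497.23
       Model |  347.912544     2  173.956272           Prob > F      =  0.0000
    Residual |  1446.28313  4134  .349850781           R-squared     =  0.1939
-------------+------------------------------           Adj R-squared =  0.1935
       Total |  1794.19567  4136  .433799728           Root MSE      =  .59148

------------------------------------------------------------------------------
      colgpa |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      hsperc |  -.0166453   .0005577   -29.85   0.000    -.0177387    -.015552
   logtothrs |   .0859363   .0118415     7.26   0.000     .0627205     .109152
       _cons |   2.655312   .0469579    56.55   0.000      2.56325    2.747375
------------------------------------------------------------------------------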

Hsperc: The estimate b1 is -0.0166453, which is statistically significant at the 5%
significance level (t = -29.85, p-value 0.000). It shows that hsperc is negatively related to
the student's average grade in college: holding log(tothrs) fixed, a one-point increase in
hsperc is associated with a college GPA that is about 0.017 grade points lower, so a ten-point
increase lowers the predicted colgpa by about 0.17 points.

Logtothrs: The estimate b2 is 0.0859363, which is also statistically significant at the 5%
significance level (t = 7.26, p-value 0.000). It shows that the number of total hours of study
per week is positively related to the student's average grade in college. Because tothrs
enters in logs, b2/100 is the approximate effect of a 1% change: holding hsperc fixed, a 1%
increase in weekly study hours is associated with a college GPA that is about 0.00086 points
higher, and a 10% increase with a college GPA that is roughly 0.008 points higher.
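
As a check on these magnitudes, the coefficients in the Model (1) output can be turned into predicted changes in colgpa. The sketch below only redoes arithmetic on the reported estimates; the helper name effect_of_pct_change is an assumption introduced for illustration.

```python
import math

# Coefficients copied from the Model (1) regression output.
b_hsperc = -0.0166453
b_logtothrs = 0.0859363


def effect_of_pct_change(b_log, pct):
    """Predicted change in colgpa when tothrs rises by pct percent.

    The regressor is log(tothrs), so the exact change is b * ln(1 + pct/100).
    """
    return b_log * math.log(1 + pct / 100)


# A ten-point increase in hsperc (level regressor: change = 10 * b).
change_hsperc_10 = 10 * b_hsperc

# A 1% and a 10% increase in weekly study hours (log regressor).
change_tothrs_1pct = effect_of_pct_change(b_logtothrs, 1)
change_tothrs_10pct = effect_of_pct_change(b_logtothrs, 10)

print(f"10-point rise in hsperc: {change_hsperc_10:+.4f} GPA points")
print(f"1% rise in tothrs:       {change_tothrs_1pct:+.5f} GPA points")
print(f"10% rise in tothrs:      {change_tothrs_10pct:+.5f} GPA points")
```

The changes are in grade points, not percent, because colgpa itself is not in logs.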

(10 points) We modify Model (1) to include race dummies:

colgpa = β0 + β1 hsperc + β2 log(tothrs) + β3 black + β4 white + U    (2)


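. regress colgpa hsperc logtothrs black white

      Source |       SS       df       MS              Number of obs =    4137
-------------+------------------------------           F(  4,  4132) =  295.95
       Model |  399.553097     4  99.8882743           Prob > F      =  0.0000
    Residual |  1394.64258  4132  .337522405           R-squared     =  0.2227
-------------+------------------------------           Adj R-squared =  0.2219
       Total |  1794.19567  4136  .433799728           Root MSE      =  .58097

------------------------------------------------------------------------------
      colgpa |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      hsperc |  -.0169703   .0005485   -30.94   0.000    -.0180456   -.0158949
   logtothrs |   .0845107   .0116319     7.27   0.000     .0617059    .1073155
       black |  -.4937566   .0758722    -6.51   0.000    -.6425069   -.3450062
       white |  -.0047028   .0660507    -0.07   0.943    -.1341977    .1247921
       _cons |   2.698516   .0794471    33.97   0.000     2.542757    2.854275
------------------------------------------------------------------------------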

Why has the associated race dummy variable, allother, been excluded? Using the
regression results of Model (2), interpret the coefficients of the race dummy variables.

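Allother has been excluded from Model (2) to avoid the dummy variable trap. The race dummies
black, white and allother sum to one for every student, so including all three together with
the constant would create perfect multicollinearity. Allother therefore serves as the base
group against which black and white are compared.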

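The estimates of β3 and β4 are -0.4937566 and -0.0047028 respectively. Holding hsperc and
log(tothrs) fixed, the average college GPA of black students is about 0.49 grade points lower
than that of students in the base group (allother), and this difference is statistically
significant (t = -6.51, p-value 0.000). The average college GPA of white students is about
0.005 grade points lower than that of the base group, but this difference is not statistically
significant (t = -0.07, p-value 0.943), so there is no evidence that white students and
students of other races differ in average college GPA.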

(14 points) An econometrician argues that Model (2) is not the correct population regression
model. Instead, the correct population regression model should take into account that high
school performance has varying predictive power for college performance depending on the
race of the student. The econometrician predicts that students from disadvantaged races may
attend high schools of lower quality, leading to a lower correlation between high school
performance and college performance. Formulate the appropriate regression equation to take
into account the heterogeneous relationship between hsperc and colgpa that varies based on
the black dummy variable. Estimate this regression using OLS and interpret the estimated
coefficient(s) of the new parameter(s).

colgpa = β0 + β1 hsperc + β2 black + U


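. regress colgpa hsperc black

      Source |       SS       df       MS              Number of obs =    4137
-------------+------------------------------           F(  2,  4134) =  558.63
       Model |   381.73639     2  190.868195           Prob > F      =  0.0000
    Residual |  1412.45928  4134  .341668913           R-squared     =  0.2128
-------------+------------------------------           Adj R-squared =  0.2124
       Total |  1794.19567  4136  .433799728           Root MSE      =  .58452

------------------------------------------------------------------------------
      colgpa |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      hsperc |  -.0173544   .0005492   -31.60   0.000    -.0184311   -.0162778
       black |  -.4920052   .0397861   -12.37   0.000    -.5700074    -.414003
       _cons |   3.013769   .0141857   212.45   0.000     2.985957     3.04158
------------------------------------------------------------------------------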

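The equation above implies that, holding hsperc fixed, the average college GPA of black
students is about 0.49 grade points lower than that of students from other racial groups
(coefficient -0.4920052, t = -12.37). This specification only shifts the intercept for black
students. To let the relationship between hsperc and colgpa itself differ by race, as the
question asks, the regression would also need the interaction term black × hsperc, whose
coefficient measures the difference in the hsperc slope between black students and other
students. That regression is not estimated here.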
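
One way to write the specification the question describes is to add the product of black and hsperc to the regressors. This is a sketch, not an estimated model: the symbol δ for the interaction coefficient is an assumption introduced here, and log(tothrs) and white could be kept as additional controls.

```latex
\begin{align*}
\mathit{colgpa} &= \beta_0 + \beta_1\,\mathit{hsperc} + \beta_2\,\mathit{black}
                  + \delta\,(\mathit{black} \times \mathit{hsperc}) + U \\[4pt]
\frac{\partial\,\mathit{colgpa}}{\partial\,\mathit{hsperc}} &=
\begin{cases}
\beta_1          & \text{if } \mathit{black} = 0 \\
\beta_1 + \delta & \text{if } \mathit{black} = 1
\end{cases}
\end{align*}
```

With β1 negative, the econometrician's prediction of a weaker link between high school and college performance for black students would correspond to δ > 0, and a t-test of δ = 0 checks whether the two slopes differ.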

2. (26 points) Perform statistical tests to determine whether Model (1) suffers from
heteroskedasticity and functional form misspecification. What do you conclude from the
results of these tests (use 5% significance level)?

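. regress colgpa hsperc logtothrs

      Source |       SS       df       MS              Number of obs =    4137
-------------+------------------------------           F(  2,  4134) =  497.23
       Model |  347.912544     2  173.956272           Prob > F      =  0.0000
    Residual |  1446.28313  4134  .349850781           R-squared     =  0.1939
-------------+------------------------------           Adj R-squared =  0.1935
       Total |  1794.19567  4136  .433799728           Root MSE      =  .59148

------------------------------------------------------------------------------
      colgpa |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      hsperc |  -.0166453   .0005577   -29.85   0.000    -.0177387    -.015552
   logtothrs |   .0859363   .0118415     7.26   0.000     .0627205     .109152
       _cons |   2.655312   .0469579    56.55   0.000      2.56325    2.747375
------------------------------------------------------------------------------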

. hettest

Breusch-Pagan / Cook-Weisberg test for heteroskedasticity
         Ho: Constant variance
         Variables: fitted values of colgpa

         chi2(1)      =     3.73
         Prob > chi2  =   0.0534

The output above shows a p-value of 0.0534, which is greater than the 0.05 significance
level, so the chi-square statistic is statistically insignificant and we fail to reject the
null hypothesis of constant variance. At the 5% level there is no evidence that Model (1)
suffers from heteroskedasticity, although the result is borderline and the null would be
rejected at the 10% level.
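
The reported p-value can be reproduced from the test statistic alone, which makes the decision rule explicit. This is a minimal sketch using only the chi2(1) value printed by hettest; the variable names are assumptions introduced for illustration.

```python
import math

chi2_stat = 3.73  # Breusch-Pagan statistic from the hettest output
alpha = 0.05      # significance level required by the question

# With 1 degree of freedom the statistic is a squared standard normal,
# so P(chi2 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
p_value = math.erfc(math.sqrt(chi2_stat / 2))

# Reject the null of constant variance only if the p-value is below alpha.
reject_homoskedasticity = p_value < alpha

print(f"p-value = {p_value:.4f}")
print(f"reject constant variance at 5%: {reject_homoskedasticity}")
```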

3. (10 points) Perform statistical tests to determine whether race dummies jointly statistically
significantly affect the college performance at the 5% significance level.

. test logtothrs black white

( 1) logtothrs = 0
( 2) black = 0
( 3) white = 0

F( 3, 4132) = 69.20
Prob > F = 0.0000

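The test above restricts the coefficients on logtothrs, black and white jointly to zero and
gives F(3, 4132) = 69.20 with a p-value of 0.0000. A test of the race dummies alone should
restrict only black and white. Using the residual sums of squares reported for Models (1)
and (2), the F-statistic for black = white = 0 is
[(1446.28313 - 1394.64258)/2] / (1394.64258/4132) ≈ 76.5, which is far above the 5% critical
value of about 3.00 for F(2, 4132). Thus, we reject the null hypothesis and conclude that the
race dummies jointly have a statistically significant effect on college performance at the
5% significance level.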
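
The joint test for the two race dummies alone can be reproduced from the residual sums of squares printed in the Model (1) and Model (2) outputs, since Model (1) is Model (2) with black and white dropped. The sketch below is a hand calculation on those reported numbers, not Stata's own routine; the closed-form p-value applies only because the test has two numerator degrees of freedom.

```python
# Residual sums of squares copied from the two regression outputs.
ssr_restricted = 1446.28313    # Model (1): black and white excluded
ssr_unrestricted = 1394.64258  # Model (2): black and white included

q = 2                   # number of restrictions (black = 0, white = 0)
df_unrestricted = 4132  # residual degrees of freedom in Model (2)

# F = [(SSR_r - SSR_ur) / q] / [SSR_ur / df_ur]
f_stat = ((ssr_restricted - ssr_unrestricted) / q) / (
    ssr_unrestricted / df_unrestricted
)

# For an F(2, d) distribution the upper tail has a closed form:
# P(F > f) = (1 + 2 f / d) ** (-d / 2)
p_value = (1 + 2 * f_stat / df_unrestricted) ** (-df_unrestricted / 2)

print(f"F({q}, {df_unrestricted}) = {f_stat:.2f}")
print(f"p-value = {p_value:.3g}")
print(f"reject at 5%: {p_value < 0.05}")
```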

5. (5 points) The number of total hours of study per week (tothrs) variable may have been
measured with some errors, as some students may not recall the correct number of hours
spent studying. However, while the errors are quite frequent, they are completely at
random. Explain in detail the impact of these errors on the estimate of β2.

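As mentioned above, the errors are completely random, which means that the reported number
of study hours per week is sometimes above and sometimes below the true number, and the
errors are unrelated to the true hours and to the other determinants of colgpa. Treated as
classical measurement error in the regressor, this makes the observed regressor correlated
with the composite error term, so the OLS estimate b2 is biased toward zero (attenuation
bias): the estimated effect of study hours on colgpa understates the true effect in absolute
value. The larger the variance of the errors relative to the variance of true study hours,
the stronger the attenuation. The estimator is also inconsistent, so increasing the sample
size does not remove the bias.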
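
A small simulation illustrates how random measurement error in a regressor pulls the OLS slope toward zero. Everything in this sketch is hypothetical: the data are simulated, the true slope of 1.0 and the error variances are chosen for illustration, and a single regressor is used instead of Model (1).

```python
import random


def ols_slope(x, y):
    """Slope from a simple OLS regression of y on x (with intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


rng = random.Random(0)  # fixed seed so the run is repeatable
n = 20000
true_slope = 1.0        # hypothetical true effect of the regressor

# True regressor (variance 1) and outcome with an unrelated disturbance.
x_true = [rng.gauss(0, 1) for _ in range(n)]
y = [2.5 + true_slope * x + rng.gauss(0, 0.5) for x in x_true]

# Observed regressor = truth + purely random reporting error (variance 1).
x_observed = [x + rng.gauss(0, 1) for x in x_true]

slope_with_truth = ols_slope(x_true, y)
slope_with_error = ols_slope(x_observed, y)

# Classical errors-in-variables: plim = slope * Var(x) / (Var(x) + Var(e)).
expected_attenuated = true_slope * 1 / (1 + 1)

print(f"slope using true regressor:     {slope_with_truth:.3f}")
print(f"slope using mismeasured values: {slope_with_error:.3f}")
print(f"theoretical attenuated slope:   {expected_attenuated:.3f}")
```

Raising n tightens both estimates around their limits but leaves the second one near half the true slope, which is the sense in which a larger sample does not fix the problem.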

6. (5 points) Some students did not want to report their college performance (colgpa) when
answering the survey questions. Will this cause any problems when estimating the effect of
study hours on the college performance (colgpa) in Model (1)?
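Whether it causes a problem depends on why the values are missing. If the students who did
not report colgpa are a random subset of the sample, dropping them only reduces the sample
size and the statistical power of the study, and the OLS estimate of the effect of study
hours remains unbiased. If non-reporting is related to colgpa itself, for example if students
with low grades are less willing to report them, the sample is selected on the dependent
variable and the OLS estimates in Model (1) are biased and inconsistent.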

7. (10 points) Consider the dummy variable colgpa_A, which is 1 if the college GPA is higher
than 3.5, and 0 otherwise. Estimate the following model using OLS and the appropriate
standard errors. Report the results in equation or tabular form. Interpret the estimated
coefficients of α1 and α2.

colgpa_A = α0 + α1 hsperc + α2 log(tothrs) + U    (3)

. logit colgpa_a hsperc logtothrs

Iteration 0:   log likelihood = -1439.6492
Iteration 1:   log likelihood =  -1240.829
Iteration 2:   log likelihood = -1167.1211
Iteration 3:   log likelihood = -1162.4375
Iteration 4:   log likelihood = -1162.4279
Iteration 5:   log likelihood = -1162.4279

Logistic regression                               Number of obs   =       4137
                                                  LR chi2(2)      =     554.44
                                                  Prob > chi2     =     0.0000
Log likelihood = -1162.4279                       Pseudo R2       =     0.1926

------------------------------------------------------------------------------
    colgpa_a |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      hsperc |  -.1427191   .0087127   -16.38   0.000    -.1597956   -.1256426
   logtothrs |  -.2918732   .0683725    -4.27   0.000    -.4258809   -.1578655
       _cons |   .5777074   .2690866     2.15   0.032     .0503073    1.105107
------------------------------------------------------------------------------

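We have used logistic regression because the outcome variable is dichotomous. Note that the
question asks for OLS, that is, a linear probability model with heteroskedasticity-robust
standard errors; the logit coefficients reported here are changes in log odds, not changes
in probability.

For every one-point increase in hsperc, the log odds of the college GPA being higher than 3.5
(versus 3.5 or lower) decrease by 0.1427191, holding log(tothrs) fixed.

For a one-unit increase in log(tothrs), the log odds of the college GPA being higher than 3.5
(versus 3.5 or lower) decrease by 0.2918732, holding hsperc fixed. Both coefficients are
statistically significant at the 5% level.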
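
Log-odds coefficients are easier to read once exponentiated into odds ratios. The sketch below applies that transformation to the two logit estimates reported in the output; the doubling-of-hours calculation is an added illustration and relies on the regressor being log(tothrs).

```python
import math

# Logit coefficients copied from the regression output.
b_hsperc = -0.1427191
b_logtothrs = -0.2918732

# Odds ratio for a one-unit increase in each regressor: exp(coefficient).
or_hsperc = math.exp(b_hsperc)
or_logtothrs = math.exp(b_logtothrs)

# Doubling tothrs raises log(tothrs) by ln(2), so the odds are
# multiplied by exp(b * ln 2) = 2 ** b.
or_doubling_tothrs = math.exp(b_logtothrs * math.log(2))

print(f"odds ratio, +1 point of hsperc:   {or_hsperc:.3f}")
print(f"odds ratio, +1 unit log(tothrs):  {or_logtothrs:.3f}")
print(f"odds ratio, doubling tothrs:      {or_doubling_tothrs:.3f}")
```

An odds ratio below one means the odds of a GPA above 3.5 fall as the regressor rises, which matches the negative signs in the table.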

8. (20 points) What could be the possible reasons for the sign of the coefficient α2 not being as
predicted? Suppose you had a longitudinal data set with the same variables as used here.
Would the panel data provide any advantages over the cross-sectional data set for the
estimation of the effect of study hours (tothrs) on GPA (colgpa)?

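One possible reason is that the data were collected at a single point in time, so the
coefficient α2 may pick up differences between students rather than the effect of studying
more. For example, students who find their courses difficult may study longer hours and
still earn lower grades, and unobserved ability or motivation is omitted from the model.
Panel data would be better than cross-sectional data for estimating the effect of study
hours (tothrs) on GPA (colgpa). Observing the same students over several periods provides
more data and, with fixed effects or first differencing, removes time-invariant unobserved
characteristics of each student. It also shows how the effect of study hours (tothrs) on GPA
(colgpa) changes over time.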
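
The advantage of panel data can be written out in a short derivation. This is a sketch under assumed notation: the subscripts i (student) and t (period) and the student-specific effect a_i are not in the original, and time-invariant variables such as hsperc are treated as absorbed into a_i.

```latex
\begin{align*}
\mathit{colgpa}_{it} &= \beta_0 + \beta_2 \log(\mathit{tothrs}_{it}) + a_i + u_{it} \\
\mathit{colgpa}_{i,t-1} &= \beta_0 + \beta_2 \log(\mathit{tothrs}_{i,t-1}) + a_i + u_{i,t-1} \\[4pt]
\Delta \mathit{colgpa}_{it} &= \beta_2\, \Delta \log(\mathit{tothrs}_{it}) + \Delta u_{it}
\end{align*}
```

Subtracting the second line from the first removes a_i, so unobserved traits that do not change over time can no longer bias the estimate of β2. Time-varying omitted factors would still be a concern.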
