Class 3 Computer Exercise
Please download and save the Do-file for this assignment, “Class 3 Computer Exercise do
file.do,” from the course website and perform all the steps and required statistical operations as
directed below.
1. Save the downloaded Do-file, “Class 3 Computer Exercise do file.do,” in a folder on your
computer with all your PADM-GP 2902 computer work.
2. Open Stata.
3. Click on the “New Do-file editor” icon (pad with pencil) on the main Stata screen (see footnote 1). A
new screen will pop up or appear as a bar at the bottom of your screen. Maximize this
Do-file editor window.
4. In the new window (the Do-file editor window), go to File > Open > then find and
double-click your Do-file.
5. Further instructions for this exercise will now appear in the Do-file.
6. To save your work in the Do-file, save it as you would any other file, but be sure to give it a
new name, use the .do suffix, and put it with your 2902 computer work.
7. Important: You should clean and edit your Do-file so that it contains only correct
commands.
8. Read the Stata Crib Sheet_Brief (on the course website). More experienced Stata users can
skim the Stata Crib Sheet_Long for more advanced programming.
9. Now you are ready to do this exercise! Follow the written instructions below along
with the instructions in the “Class 3 Computer Exercise do file.do.”
Please submit:
A PDF of the Stata log (smcl) file documenting that you’ve completed each of the operations
directed below and showing the results. See the translate command in the downloaded Do-file
to create a PDF from the log file. (NB: Once all your commands are working properly, be sure
to run your corrected Do-file in its entirety so that your log file does not contain errors.)
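As a rough illustration of the logging workflow described above, the sketch below opens a log, runs a placeholder command, closes the log, and converts it to PDF. The file name class3_exercise is hypothetical, and the downloaded Do-file may name its log differently; follow the Do-file where the two disagree. Translating SMCL to PDF is assumed to be available in your Stata installation.

```stata
* Minimal sketch of the log-to-PDF workflow.
* "class3_exercise" is a hypothetical file name; use your own.
capture log close                          // close any log that is already open
log using "class3_exercise.smcl", replace  // start a fresh SMCL log

* ... the commands for the exercise would go here ...
display "Exercise commands run here"

log close                                  // stop logging before translating
* Convert the SMCL log to a PDF; replace overwrites an existing PDF.
translate "class3_exercise.smcl" "class3_exercise.pdf", replace
```

Running the whole Do-file from top to bottom, rather than line by line, is what keeps earlier failed attempts out of the log.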
Footnote 1: If you go to File > Do in the main screen, Stata does not open the Do-file; it runs it.
PADM 2902 Regression and Introduction to Econometrics
Part I: This part of the exercise serves as an introduction to the dataset and to Stata. You will
only be doing the Stata work. The questions on interpretations are only meant for in-class
discussion. You do not need to include written answers about interpreting the results as part of
your homework this week, but be prepared to discuss the answers in class.
1. Obtain descriptive statistics for all the variables for 1998, 2002, 2006, 2010, 2014, and 2016
separately, and then obtain the correlations between all the variables for all six years pooled.
The descriptive statistics command for 1998 and the correlation command are given in the
Do-file. The remaining descriptive statistics commands are partially given, and you must
complete them, using the first command as a model.
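The general pattern of restricting summarize to one year and then pooling all years for correlate might look like the sketch below. It runs on simulated stand-in data, not the course dataset. The names year and exp_pp (total spending per pupil) are assumptions made for illustration; totreg, pfl, and pblack are the names used in this handout. Use the variable names in the Do-file for the actual assignment.

```stata
* Simulated stand-in data so the sketch runs on its own.
* year and exp_pp are hypothetical variable names.
clear
set seed 2902
set obs 600
generate year   = 1998 + 4*mod(_n, 6)
replace  year   = 2016 if year == 2018     // survey years end at 2016
generate pfl    = 100*runiform()           // percent eligible for free lunch
generate pblack = 100*runiform()           // percent black
generate totreg = round(200 + 1000*runiform())
generate exp_pp = 12000 + 30*pfl + rnormal(0, 1500)

* Descriptive statistics for a single year: restrict with an if qualifier.
summarize exp_pp totreg pfl pblack if year == 1998
summarize exp_pp totreg pfl pblack if year == 2002

* Correlations across all years pooled: no if qualifier.
correlate exp_pp totreg pfl pblack
```

Note that correlate drops any observation with a missing value on any listed variable, so the number of observations it reports can be smaller than the Obs counts shown by summarize.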
Questions 2 through 5 are interpretation questions for in-class discussion only. Again, do not
include written answers about interpreting the results as part of your homework this week, but
be prepared to discuss the answers in class.
2. Are there any missing values for any variables? If so, which and how do you know?
3. What is the range of total expenditures per pupil? Of percent of students eligible for free
lunch? Of total enrollment? Are you surprised by the variation in any of these variables?
Why or why not?
4. Which variables are most highly correlated with total spending per pupil?
5. Which variables do you think are most likely to determine the variation of total spending per
student and why?
6. Now, obtain detailed descriptive statistics for totreg and pfl using the “detail” option (it’s an
option with the summarize command) in Stata. For help with this or any function in Stata,
use the help command. In this case, type help summarize. Try help, even though the
command is written out for you in the Do-file. You will need to be able to use help later!
7. Let’s define a small school as one with total enrollment equal to or less than the first quartile
(25th percentile) of totreg. A large school is one with total enrollment equal to or greater than
the third quartile (75th percentile) of totreg. And let’s define the least poor and poorest student
bodies using the first and third quartiles of pfl, respectively.
Obtain descriptive statistics on total per-pupil spending for small and large schools, and then
for the poorest and least poor student bodies.
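One way to carry out items 6 and 7 is to let summarize with the detail option store the quartiles and then reuse them in if qualifiers. The sketch below uses simulated stand-in data; exp_pp is a hypothetical name for total per-pupil spending, and the scalar names are also invented for illustration. The Do-file may instead have you type the quartile values directly.

```stata
* Simulated stand-in data; exp_pp is a hypothetical variable name.
clear
set seed 2902
set obs 400
generate totreg = round(200 + 1000*runiform())
generate pfl    = 100*runiform()
generate exp_pp = 12000 + 30*pfl - 2*totreg + rnormal(0, 1500)

* detail reports percentiles and stores them in r(p25), r(p75), etc.
summarize totreg, detail
scalar totreg_q1 = r(p25)
scalar totreg_q3 = r(p75)

* Small schools: enrollment at or below the first quartile.
summarize exp_pp if totreg <= totreg_q1
* Large schools: at or above the third quartile.
* Stata treats missing as larger than any number, so exclude it explicitly.
summarize exp_pp if totreg >= totreg_q3 & !missing(totreg)

* Same approach for pfl: least poor (<= Q1) and poorest (>= Q3).
summarize pfl, detail
scalar pfl_q1 = r(p25)
scalar pfl_q3 = r(p75)
summarize exp_pp if pfl <= pfl_q1
summarize exp_pp if pfl >= pfl_q3 & !missing(pfl)
```

Storing the quartiles in scalars avoids retyping rounded values, which could shift a few schools in or out of a group.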
Question 8 is interpretation. Do not include a written answer with your homework this week,
but be prepared to answer it in class.
8. Comment on the differences in total per-pupil spending between these groups. How can you
explain them?
9. Finally, use the twoway command to create two scatterplots (scattergrams), both with total
per-pupil spending on the y-axis. Plot pfl on the x-axis for the first scatterplot and totreg for
the second.
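A scatterplot call with twoway lists the y-axis variable first and the x-axis variable second. The sketch below shows that ordering on simulated stand-in data; exp_pp is a hypothetical name for total per-pupil spending, and the graph names and axis titles are optional additions for illustration.

```stata
* Simulated stand-in data; exp_pp is a hypothetical variable name.
clear
set seed 2902
set obs 400
generate totreg = round(200 + 1000*runiform())
generate pfl    = 100*runiform()
generate exp_pp = 12000 + 30*pfl - 2*totreg + rnormal(0, 1500)

* twoway scatter yvar xvar: spending on the y-axis, pfl on the x-axis.
twoway scatter exp_pp pfl, ///
    ytitle("Total spending per pupil") ///
    xtitle("Percent eligible for free lunch") ///
    name(g_pfl, replace)

* Second scatterplot: same y variable, totreg on the x-axis.
twoway scatter exp_pp totreg, ///
    ytitle("Total spending per pupil") ///
    xtitle("Total enrollment") ///
    name(g_totreg, replace)
```

Naming each graph keeps the first one in memory when the second is drawn, so both can be viewed side by side.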
Question 10 is interpretation. Do not include a written answer with your homework this week,
but be prepared to answer it in class.
10. Do the visual results comport with the descriptive statistics you obtained? Why or why not?
Part II: This part of the exercise involves using Stata to estimate regressions. Again, your
homework is the Stata work and output, not the interpretations.
1. Estimate an OLS regression of total expenditures per pupil on the percent of the school’s
students who are black.
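In Stata, regress takes the dependent variable first and the regressors after it. The sketch below runs a bivariate regression on simulated stand-in data and then reads back the stored results that the discussion questions refer to (sample size, R-squared, adjusted R-squared) and rebuilds the 95% confidence interval by hand as coefficient plus or minus the critical t value times the standard error. The name exp_pp and the scalar names are hypothetical, and the simulated numbers say nothing about the real data.

```stata
* Simulated stand-in data; exp_pp is a hypothetical variable name.
clear
set seed 2902
set obs 400
generate pblack = 100*runiform()
generate exp_pp = 12000 + 20*pblack + rnormal(0, 1500)

* Dependent variable first, then the regressor.
regress exp_pp pblack

* Stored results from the regression just run.
display "Sample size        = " e(N)
display "R-squared          = " e(r2)
display "Adjusted R-squared = " e(r2_a)

* 95% CI by hand: b +/- t(0.025, residual df) * se.
scalar tcrit = invttail(e(df_r), 0.025)
scalar ci_lo = _b[pblack] - tcrit*_se[pblack]
scalar ci_hi = _b[pblack] + tcrit*_se[pblack]
display "95% CI for pblack: [" ci_lo ", " ci_hi "]"
```

The hand-built interval should match the one printed in the last two columns of the regress output table, which is a useful check on your reading of that table.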
Questions 2 through 7 are interpretation questions for in-class discussion only. Again, do not
include written answers about interpreting the results as part of your homework this week, but
be prepared to discuss the answers in class.
2. Interpret the coefficient on percent black. Be specific in interpreting direction and magnitude.
3. Is the coefficient on percent black statistically significant? How do you know? Use numbers
as well as words.
4. What is the R²? Is it statistically significant? How do you know? Use numbers as well as
words.
5. What is the adjusted R²? (Give a number and an explanation of what it means.)
6. What is the sample size? How did you obtain this number?
7. Form a 95% confidence interval around the coefficient on percent black students.
8. Now estimate an OLS regression of total expenditures per pupil on the percent of the
school’s students who are black and percent of the school’s students who are eligible for free
lunches.
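To compare the coefficient on percent black across the two regressions, it can help to store each set of estimates and print them side by side. The sketch below does this on simulated stand-in data in which pblack and pfl are correlated by construction; that relationship is an assumption of the simulation, not a claim about the course dataset. The names exp_pp, m_simple, and m_multiple are hypothetical.

```stata
* Simulated stand-in data; exp_pp is a hypothetical variable name.
clear
set seed 2902
set obs 400
generate pfl    = 100*runiform()
* pblack is built to be correlated with pfl (simulation assumption).
generate pblack = min(100, max(0, 0.6*pfl + rnormal(20, 15)))
generate exp_pp = 12000 + 40*pfl + rnormal(0, 1500)

* First regression: percent black only.
regress exp_pp pblack
estimates store m_simple
scalar b_simple = _b[pblack]

* Second regression: add percent eligible for free lunch.
regress exp_pp pblack pfl
estimates store m_multiple
scalar b_multiple = _b[pblack]

* Side-by-side coefficients, standard errors, N, R-squared, adjusted R-squared.
estimates table m_simple m_multiple, b se stats(N r2 r2_a)
```

The table makes it easy to see how the coefficient and the fit statistics move when the second regressor enters, which is the comparison the discussion questions ask about.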
Questions 9 and 10 are interpretation. Do not include written answers with your homework
this week, but be prepared to answer them in class.
9. Does the coefficient on percent black differ from the one in your previous regression? If so,
why? Which limitation(s) of OLS does this illustrate? That is, which assumption(s) was/were
violated by the previous regression?
10. Has the R² changed? Is the change in the expected direction? What was the expected
direction, and why?
11. We’ve learned that one of the Classical Assumptions of the linear model is constant variance
in the error term. In Stata, we can use the predict command with the residuals option to
obtain our model’s residuals and plot them. This command is provided for you in the Do-file.
Use the twoway command to plot the residuals on the y-axis and pfl on the x-axis.
Use the twoway command to plot the residuals on the y-axis and pblack on the x-axis.
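The residual plots in item 11 could be produced along the lines of the sketch below. It assumes the residuals come from the two-regressor model and uses simulated stand-in data; exp_pp and resid are hypothetical variable names, and the reference line at zero is an optional addition. The predict command in the Do-file takes precedence if it differs.

```stata
* Simulated stand-in data; exp_pp and resid are hypothetical names.
clear
set seed 2902
set obs 400
generate pfl    = 100*runiform()
generate pblack = 100*runiform()
generate exp_pp = 12000 + 40*pfl + 10*pblack + rnormal(0, 1500)

regress exp_pp pblack pfl

* predict must follow the regression whose residuals you want.
predict double resid, residuals

* Residuals on the y-axis, each regressor in turn on the x-axis.
twoway scatter resid pfl,    yline(0) name(g_res_pfl, replace)
twoway scatter resid pblack, yline(0) name(g_res_pblack, replace)
```

With constant variance, the vertical spread of the points around the zero line looks roughly the same across the whole x-axis; a fan or funnel shape suggests otherwise.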
Question 12 is interpretation. Do not include a written answer with your homework this week,
but be prepared to answer it in class.
12. Does it appear the constant variance assumption holds for pfl? for pblack? What, in plain
English, can you conclude?