Tutorial 10 - Questions
Tutorial: 10
Topics: Regression Analysis
1. Explain what is meant by (1) using ordinary least squares (OLS) for coefficient estimation,
(2) the statement that the OLS estimates are statistics, and (3) the “everything else being equal”
condition when interpreting a coefficient.
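As an illustration of point (1), the following is a minimal Python sketch of OLS with a single regressor. The five data points are made up for illustration and do not come from any course data set, and the function names ols_simple and ssr are hypothetical. The sketch computes the intercept and slope that minimize the sum of squared residuals, then checks that moving the slope away from the OLS value makes that sum larger.

```python
def ols_simple(x, y):
    """Return (intercept, slope) that minimize the sum of squared residuals."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sample co-variation of x and y, and variation of x, around their means
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def ssr(x, y, intercept, slope):
    """Sum of squared residuals for a candidate line."""
    return sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))


# Made-up sample (an assumption for illustration only)
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1 = ols_simple(x, y)
print("intercept:", round(b0, 4), "slope:", round(b1, 4))

# Any other slope gives a larger sum of squared residuals
print(ssr(x, y, b0, b1) < ssr(x, y, b0, b1 + 0.1))
```

Running ols_simple on a different sample from the same population would generally return different numbers, which is the sense in which the estimates in point (2) are statistics.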
2. We analyze how debt payments may be influenced by income (measured in $1,000) and
the unemployment index. The sample regression equation is
\[ \widehat{Debt} = 20 + 10\,Income + 0.6\,Unemployment \]
(1) Interpret the regression coefficients.
(2) Use the coefficients to predict debt payment if income is $80,000 and the
unemployment index is 7.5.
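The arithmetic in part (2) can be checked with a short Python sketch. The function name predict_debt is hypothetical. Because income is measured in $1,000s, an income of $80,000 enters the equation as 80.

```python
def predict_debt(income_thousands, unemployment_index):
    """Plug values into the sample regression equation from question 2."""
    return 20 + 10 * income_thousands + 0.6 * unemployment_index


# Income of $80,000 is 80 in units of $1,000
debt_hat = predict_debt(80, 7.5)
print(round(debt_hat, 2))
```

The result, 20 + 800 + 4.5 = 824.5, is in whatever units the debt payment variable is recorded in; the question does not state them.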
3. A sociologist believes that the crime rate in an area is significantly influenced by the area’s
poverty rate and median income. Specifically, she hypothesizes that crime will increase
with poverty and decrease with income. She collects data on the crime rate (crimes per
100,000 residents), the poverty rate (in %), and the median income (in $1,000s) from 41
cities. A portion of the regression results is shown in the following table.
             Coefficient   Standard Error   t Statistic   p-value
Intercept        -301.45           549.51         -0.55    0.5864
Poverty            53.23            14.18          3.75    0.0006
Income              4.93             8.15          0.60    0.5526
(1) Interpret the slope coefficient for Poverty.
(2) Predict the crime rate in an area with a poverty rate of 20% and a median income of
$50,000.
(3) In terms of statistical significance, which independent variable is significantly
associated with crime rates?
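A minimal Python sketch for working with the table above follows. It recomputes each t statistic as coefficient divided by standard error, evaluates the prediction in part (2), and compares the reported p-values with a significance level. The 5% level (ALPHA) is an assumption, since the question does not name one, and the variable names are hypothetical.

```python
# (coefficient, standard error, p-value) copied from the regression table
rows = {
    "Intercept": (-301.45, 549.51, 0.5864),
    "Poverty": (53.23, 14.18, 0.0006),
    "Income": (4.93, 8.15, 0.5526),
}

ALPHA = 0.05  # assumed significance level; not given in the question

# t statistic = coefficient / standard error
t_stats = {name: coef / se for name, (coef, se, _) in rows.items()}

# Prediction for poverty = 20 (%) and median income = 50 (in $1,000s)
crime_hat = rows["Intercept"][0] + rows["Poverty"][0] * 20 + rows["Income"][0] * 50

# Slope coefficients whose p-value falls below the assumed level
significant = [name for name in ("Poverty", "Income") if rows[name][2] < ALPHA]

for name, t in t_stats.items():
    print(name, round(t, 2))
print("predicted crime rate:", round(crime_hat, 2))
print("significant at the assumed level:", significant)
```

The recomputed t statistics should match the table to two decimals, which is a quick check that the table was read correctly.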
We did not cover the test procedure for such inference problems. Any introductory
statistics textbook gives a separate set of formulas for them (which, I believe, you will
not enjoy). With regression, you can do all of that very easily.
Consider scores.csv in the course folder. The file has the test scores of randomly selected
Nanyang Business School
students (score) and a two-level categorical variable “female.” In this example, we think of
male and female students as forming their respective populations, which may have
different mean test scores. The regression equation of interest is:
5. In a study, researchers intend to test the degree to which a CEO’s salary depends on the
company’s sales and on the company’s main industry sector. The data are in
“ceosal1” in the package “wooldridge”. In the data, salary is the annual salary in
thousands of dollars; indus, finance, consprod, and utility are binary variables indicating
the sector of the company (industrial/manufacturing, financial, consumer products, and
utilities); and sales is company sales in millions of dollars.
(1) Estimate the OLS regression. Use sales and the sector variables as the independent
variables. For the sector variables, use indus as the base group.
(2) State the null and alternative hypotheses that are implicitly tested for the coefficient
of sales in the salary regression. What do the regression results say about these
hypotheses?
(3) Estimate a new model to compare the difference in average salary between the utility
and finance industries, holding sales fixed (use finance as the base group). What does
the regression say about the difference?
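The role of the base group can be sketched in Python. The coefficient values below are hypothetical placeholders, not estimates from ceosal1, and the function name rebase is also hypothetical. The sketch assumes every firm belongs to exactly one of the four sectors and that the other regressors are unchanged; under those assumptions, each sector coefficient measures the difference from the base group, so switching the base group shifts all sector coefficients by the same constant.

```python
def rebase(sector_coefs, new_base):
    """Re-express sector coefficients relative to a different base group.

    sector_coefs maps each sector to its coefficient relative to the
    current base group, whose own entry is 0.
    """
    shift = sector_coefs[new_base]
    return {sector: coef - shift for sector, coef in sector_coefs.items()}


# Hypothetical coefficients with indus as the base group (placeholders only)
coefs_indus_base = {
    "indus": 0.0,
    "finance": 0.20,
    "consprod": 0.30,
    "utility": -0.25,
}

coefs_finance_base = rebase(coefs_indus_base, "finance")

# With finance as the base, the utility entry is the utility-minus-finance gap
print(round(coefs_finance_base["utility"], 2))
```

This reproduces only the point estimate of the utility versus finance difference. Its standard error and p-value do not follow from the individual coefficients alone, which is one reason to re-estimate the model with finance as the base group.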
6. Use the data set “bwght” in the package “wooldridge”. This data set contains data on births
to women in the United States. Two variables of interest are the dependent variable, infant
birth weight in ounces (bwght), and an explanatory variable, average number of cigarettes
the mother smoked per day during pregnancy (cigs).
(1) Estimate the model:
\[ bwght = \beta_0 + \beta_1\, cigs + \epsilon \]
(2) What is the predicted birth weight when cigs=0? What about when cigs=20 (one pack
per day)?
(3) As a thought experiment: For the predicted birth weight to be 125 ounces, what would
cigs have to be? Does the answer make sense? What makes it counterintuitive?
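Parts (2) and (3) use the fitted line in two directions: plugging in cigs to get a predicted weight, and solving the same equation for cigs given a target weight. A minimal Python sketch follows. The values of b0 and b1 are placeholders chosen for illustration, not estimates from the bwght data, and should be replaced with the estimates from part (1). The function names are hypothetical.

```python
def predict_bwght(cigs, b0, b1):
    """Predicted birth weight (ounces) from the fitted line b0 + b1 * cigs."""
    return b0 + b1 * cigs


def cigs_for_target(target_bwght, b0, b1):
    """Solve target = b0 + b1 * cigs for cigs (requires b1 != 0)."""
    return (target_bwght - b0) / b1


# Placeholder coefficients (assumptions); substitute your own OLS estimates
b0, b1 = 120.0, -0.5

print(predict_bwght(0, b0, b1))      # prediction for a non-smoker
print(predict_bwght(20, b0, b1))     # prediction at one pack per day
print(cigs_for_target(125, b0, b1))  # cigs needed for a 125-ounce prediction
```

Whenever the estimated slope is negative and the target weight lies above the estimated intercept, cigs_for_target returns a negative number, which is worth keeping in mind when judging whether the answer to part (3) makes sense.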
student’s percentile in the high school graduating class (for example, hsperc=5 means
the top 5% of the class), and sat is the combined math and verbal scores on the student
achievement test.
(2) What is the predicted college GPA when hsperc = 20 and sat = 1050?
(3) Suppose that two high school graduates, A and B, graduated in the same percentile
from high school, but student A’s SAT score was 140 points higher. What is the
predicted difference in college GPA for these two students?