Tutorial 10 - Questions

The document discusses regression analysis concepts and how to apply regression models to analyze relationships between variables using different datasets. Questions ask to estimate regression models, interpret coefficients, and make predictions based on the results. Factors like time, industry sectors, and diet are used as explanatory variables in various contexts.

Uploaded by

Gowshika Sekar

Nanyang Business School

AB1202 – STATISTICS AND ANALYSIS

Tutorial : 10
Topics : Regression Analysis

1. Explain what is meant by (1) using ordinary least squares (OLS) for coefficient estimation,
(2) the statement that the OLS estimates are statistics, and (3) the “everything else being
equal” condition when interpreting a coefficient.

2. We analyze how debt payments may be influenced by income (measured in $1,000) and
the unemployment index. The sample regression equation is
\[ \widehat{\text{Debt}} = 20 + 10\,\text{Income} + 0.6\,\text{Unemployment} \]
(1) Interpret the regression coefficients.
(2) Use the coefficients to predict debt payment if income is $80,000 and the
unemployment index is 7.5.
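
As a minimal sketch of the arithmetic behind part (2), the Python snippet below plugs values into the sample regression equation. The function name `predict_debt` is hypothetical, and the snippet assumes income is entered in units of $1,000 (so $80,000 becomes 80), as the question states.

```python
def predict_debt(income_thousands: float, unemployment: float) -> float:
    """Evaluate the fitted equation: Debt-hat = 20 + 10*Income + 0.6*Unemployment."""
    return 20 + 10 * income_thousands + 0.6 * unemployment


# An income of $80,000 is entered as 80 because income is measured in $1,000.
prediction = predict_debt(income_thousands=80, unemployment=7.5)
print(prediction)
```

Entering 80,000 instead of 80 would inflate the income term by a factor of 1,000, which is why the unit of each regressor matters when predicting.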

3. A sociologist believes that the crime rate in an area is significantly influenced by the area’s
poverty rate and median income. Specifically, she hypothesizes that crime will increase
with poverty and decrease with income. She collects data on the crime rate (crimes per
100,000 residents), the poverty rate (in %), and the median income (in $1,000s) from 41
cities. A portion of the regression results is shown in the following table.
            Coefficient   Standard Error   t Statistic   p-value
Intercept       -301.45           549.51         -0.55    0.5864
Poverty           53.23            14.18          3.75    0.0006
Income             4.93             8.15          0.60    0.5526
(1) Interpret the slope coefficient for Poverty.
(2) Predict the crime rate in an area with a poverty rate of 20% and a median income of
$50,000.
(3) In terms of statistical significance, which independent variable is significantly
associated with crime rates?
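
Parts (2) and (3) can be sketched in Python as follows. The coefficients and p-values are copied from the table above. The names `predict_crime` and `ALPHA` are hypothetical, and the 5% significance level is an assumption, since the question does not specify one.

```python
# Coefficients and p-values copied from the regression table in the question.
COEF = {"intercept": -301.45, "poverty": 53.23, "income": 4.93}
P_VALUES = {"poverty": 0.0006, "income": 0.5526}

ALPHA = 0.05  # assumed significance level; not given in the question


def predict_crime(poverty_pct: float, income_thousands: float) -> float:
    """Predicted crimes per 100,000 residents from the fitted equation."""
    return (COEF["intercept"]
            + COEF["poverty"] * poverty_pct
            + COEF["income"] * income_thousands)


# Poverty rate of 20% is entered as 20; median income of $50,000 as 50.
crime_hat = predict_crime(poverty_pct=20, income_thousands=50)

# A slope is statistically significant when its p-value falls below ALPHA.
significant = [name for name, p in P_VALUES.items() if p < ALPHA]

print(round(crime_hat, 2), significant)
```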

4. In Week 9, we limited our discussion to the inference of a single population mean 𝜇.


In many applications, we want to compare the means of two different
populations; for example, we may want to test H0: \(\mu_1 = \mu_2\), that the means of the two
populations are equal (or equivalently, H0: \(\mu_1 - \mu_2 = 0\)).

We did not cover the test procedure for such inference problems. Any introductory
statistics textbook provides a separate set of formulas for them (which you may not
enjoy). With regression, the same comparison is straightforward.

Consider scores.csv in the course folder. The file has the test scores of randomly selected
students (score) and a two-level categorical variable “female.” In this example, we think of
male and female students as forming their respective populations, which may have
different mean test scores. The regression equation of interest is:

score = b0 + b1 female + error

(1) What is the interpretation of the coefficients b0 and b1?


(2) What are the estimated coefficient of “female”, its t-statistic, and its p-value?
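
The link between this regression and a comparison of two means can be illustrated with a small Python sketch. The data below are invented for illustration and are not the contents of scores.csv. The helper `simple_ols` is hypothetical, and the sketch assumes female is coded 1 for female students and 0 for male students.

```python
# Hypothetical scores, invented for illustration; NOT the contents of scores.csv.
scores = [70.0, 80.0, 75.0, 85.0, 90.0, 95.0]
female = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]  # assumed coding: 1 = female, 0 = male


def simple_ols(x, y):
    """Return (b0, b1) that minimise the sum of squared residuals of y on x."""
    x_bar = sum(x) / len(x)
    y_bar = sum(y) / len(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


b0, b1 = simple_ols(female, scores)

# Group means computed directly, for comparison with the OLS estimates.
male_scores = [s for s, f in zip(scores, female) if f == 0.0]
female_scores = [s for s, f in zip(scores, female) if f == 1.0]
male_mean = sum(male_scores) / len(male_scores)
female_mean = sum(female_scores) / len(female_scores)

print(b0, male_mean)                # intercept vs. mean of the base group
print(b1, female_mean - male_mean)  # slope vs. difference in group means
```

With a single 0/1 regressor, the OLS intercept equals the sample mean of the base group and the slope equals the difference between the two group means, so a test of b1 = 0 is a test of equal means.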

5. In a study, researchers intend to test the degree to which a CEO’s salary depends on the
company’s sales and on the main industry sector of the company. The data are in
“ceosal1” in the package “wooldridge”. In the data, salary is the annual salary in
thousands of dollars; indus, finance, consprod, and utility are binary variables indicating
the sector of the company (industrial/manufacturing, financial, consumer products, and
utilities, respectively); and sales is company sales in millions of dollars.

(1) Estimate the OLS regression. Use sales and all sector variables as the independent
variables. For the sector variables, use indus as the base group.
(2) State the null and alternative hypotheses that are implicitly tested for the coefficient
of sales on CEO’s salary. What do the regression results say about these
hypotheses?
(3) Estimate a new model to compare the difference in average salary between the utility
and finance industries, holding sales fixed (use finance as the base group). What does
the regression say about the difference?

6. Use the data set “bwght” in the package “wooldridge”. This data set contains data on births
to women in the United States. Two variables of interest are the dependent variable, infant
birth weight in ounces (bwght), and an explanatory variable, average number of cigarettes
the mother smoked per day during pregnancy (cigs).
(1) Estimate the model:
\[ \text{bwght} = \beta_0 + \beta_1\,\text{cigs} + \epsilon \]
(2) What is the predicted birth weight when cigs=0? What about when cigs=20 (one pack
per day)?
(3) As a thought experiment: For the predicted birth weight to be 125 ounces, what would
cigs have to be? Does the answer make sense? What makes it counterintuitive?

7. Use the data set “gpa2” in the package “wooldridge”.


(1) Estimate the coefficients of the regression model:
\[ \text{colgpa} = \beta_0 + \beta_1\,\text{hsperc} + \beta_2\,\text{sat} + \epsilon, \]
where colgpa is a student’s college GPA measured on a four-point scale, hsperc is the
student’s percentile in the high school graduating class (for example, hsperc=5 means
the top 5% of the class), and sat is the combined math and verbal scores on the student
achievement test.
(2) What is the predicted college GPA when hsperc =20 and sat =1050?
(3) Suppose that two high school graduates, A and B, graduated in the same percentile
from high school, but student A’s SAT score was 140 points higher. What is the
predicted difference in college GPA for these two students?

8. Use the data set “wage2” in the package “wooldridge”.


(1) Estimate the model:
\[ \text{educ} = \beta_0 + \beta_1\,\text{sibs} + \beta_2\,\text{meduc} + \beta_3\,\text{feduc} + \epsilon \]
where educ is an individual’s years of schooling, sibs is the number of siblings, meduc
is the mother’s years of schooling, and feduc is the father’s years of schooling.
(2) Holding meduc and feduc fixed, by how much does sibs have to increase to reduce
predicted years of education by one year?
(3) Discuss the interpretation of the coefficient on meduc.
(4) Suppose that Man A has no siblings, and his mother and father each have 12 years of
education. Man B has no siblings, and his mother and father each have 16 years of
education. What is the predicted difference in years of education between B and A?

9. The ChickWeight dataset in R contains the weight records of 50 chicks
(ChickWeight$weight; in grams). Each chick (identified by ChickWeight$Chick) has been fed
one of the four available chicken feeds (ChickWeight$Diet), and its weight has been
measured 12 times at different ages (ChickWeight$Time; in days since birth).
(1) Verify that both Diet and Chick are factor variables in R while Time and weight are not.
(2) Estimate the following regression model:
\[ \text{weight} = \beta_0 + \beta_1\,\text{Time} + \beta_2\,\text{Diet} \]
NOTE: the above is a short-hand representation as Diet has multiple levels and thus
should have multiple coefficients
(3) How would you interpret the coefficients of Time and Diet in the regression?
(4) Factorize the Time variable and rerun the regression model. How would using a
factorized Time variable change the interpretation of the Time variable(s)?
