Exam 2 Practice Problems
Professor Handy
1. Consider tossing two four-sided dice. As is often the case, you may want to write out the
sample space of the experiment before you begin. Let 𝑋 be the sum of the two dice and 𝐷 be
the absolute value of the difference of the two dice, so that 𝐷((3, 1)) = 𝐷((1, 3)) = 2. Create a table
showing the joint pmf of 𝑋 and 𝐷. You do not have to submit this table, but it will be helpful
for answering the following questions.
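One way to check a hand-built table for Problem 1 is to enumerate the sample space by computer. The sketch below assumes fair dice, so each of the 16 ordered outcomes has probability 1/16; the variable names are illustrative only.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Enumerate the 16 equally likely ordered outcomes of two four-sided dice
# (fair dice are an assumption of this sketch).
outcomes = list(product(range(1, 5), repeat=2))

# Count how many outcomes map to each (sum, absolute difference) pair.
counts = Counter((a + b, abs(a - b)) for a, b in outcomes)

# Joint pmf: each outcome carries probability 1/16.
joint_pmf = {pair: Fraction(n, len(outcomes)) for pair, n in counts.items()}

for (x, d), prob in sorted(joint_pmf.items()):
    print(f"P(X={x}, D={d}) = {prob}")
```

Exact fractions are used so the entries can be compared directly with a table written by hand.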
2. You are given the following joint pmf for discrete random variables 𝑋 and 𝑌.
𝑋 0 1 2
(a) 1
(b) 0.70
(c) 0.60
(d) 0.30
(a) 1.1
(b) 1
(c) 0.9
(d) 0.5
3. You are given the following joint pmf for discrete random variables 𝑋 and 𝑌.
𝑌
𝑋 0 1
1 0.20 0.15
2 0.10 0.30
3 𝑐 0.15
3.1. What is the value of 𝑐?
4. A random point (𝑋, 𝑌) is distributed uniformly on the rectangle with vertices (2, 1), (2, −1),
(−2, 1), and (−2, −1). That is, the joint pdf of 𝑋 and 𝑌 is 𝑓(𝑥, 𝑦) = 1/8 for −2 ≤ 𝑥 ≤ 2 and
−1 ≤ 𝑦 ≤ 1. Answer the following questions, using geometric reasoning rather than integration.
4.4. What is 𝑃(𝑋² + 𝑌² < 1)? (Hint: The region described by this inequality is a very common
shape.)
(a) 2
(b) 1
(c) 1/2
(d) 1/√8
5. (Challenging) Kirti is taking a 10-question multiple choice quiz. On each question, she has an
80 percent chance of answering correctly, and the questions are independent of one another.
Let 𝑌 be the number of questions she answers correctly. Answer the questions below without
using formulas for conditional distributions.
5.5. Consider the number of questions out of 10 on the quiz that Kirti will get correct, conditional
on answering the first question correctly. What is the distribution of this random variable?
6. (Challenging) You have seen how the geometric distribution can be used to answer a question
such as: What is the average number of rolls of a six-sided die needed to get a 6? In this problem,
we will use geometric distributions to answer a related, but more challenging, question. Let 𝑋
be the number of rolls of a six-sided die needed to get each number at least once. What is the
average number of rolls needed, 𝐸(𝑋)?
In this problem, a success is rolling any number that you have not yet rolled. Rolling each
number at least once requires six different successes. Let 𝑥𝑖 be the number of rolls necessary to
get the 𝑖th success after you have had 𝑖 − 1 successes. Note that the subscripts denote successes,
not numbers on the die. Then the number of rolls needed to get each number at least once is
𝑋 = 𝑥1 + 𝑥2 + ⋯ + 𝑥6.
6.1. What is the probability of success on your first roll of the die? Remember that success
means rolling any number you have not yet rolled.
6.2. The random variable 𝑥1 is the number of rolls needed to get your first success. What is
𝐸(𝑥1)?
6.3. After you have your first success, what is the probability of success (rolling a number you
have not yet rolled) on the next roll?
6.4. The random variable 𝑥2 is the number of rolls needed to get your second success after
getting your first success. What is 𝐸(𝑥2)?
6.6. By the linearity of expected value, the expectation of a sum of random variables is the sum of
the expectations of the random variables. We can put that property to use here: The average
number of rolls needed to get each number at least once is 𝐸(𝑋) = 𝐸(𝑥1) + 𝐸(𝑥2) + ⋯ + 𝐸(𝑥6).
What is 𝐸(𝑋)?
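The reasoning in Problem 6 can be sketched in a few lines as a check on a hand calculation. The sketch assumes a fair six-sided die and uses the fact that a geometric random variable with success probability p has mean 1/p; the names are illustrative only.

```python
from fractions import Fraction

SIDES = 6  # fair six-sided die (assumption of this sketch)

# After i - 1 distinct faces have appeared, the chance that the next roll
# shows a new face is (SIDES - (i - 1)) / SIDES.
success_probs = [Fraction(SIDES - (i - 1), SIDES) for i in range(1, SIDES + 1)]

# A geometric random variable with success probability p has mean 1 / p.
expected_waits = [1 / p for p in success_probs]

# Linearity of expectation: E(X) is the sum of the six expected waits.
expected_total = sum(expected_waits)

print(expected_total, float(expected_total))
```

Each entry of `expected_waits` corresponds to one of the expectations asked for in the sub-questions, so the list can be compared term by term with your own work.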
7. Let 𝑋 and 𝑌 be discrete random variables. You know that 𝐸(𝑋) = 4, 𝐸(𝑌) = 9, 𝑓𝑌(2) = 0.5, and
𝑋 and 𝑌 are independent. What is the value of 𝐸(𝑋 | 𝑌 = 2)?
(a) 2
(b) 4
(c) 9
(d) 18
8. Suppose 𝑋 and 𝑌 are discrete random variables with 𝜎𝑋 = 0.2, 𝜎𝑌 = 3.5, and 𝜎𝑋,𝑌 = 0.5. Are 𝑋
and 𝑌 independent?
9. Suppose that 𝑊 and 𝑍 are independent random variables. Which of the following can we
conclude?
(a) 𝜇𝑊 = 𝜇𝑍
(b) Var(𝑊) = Var(𝑍)
(c) 𝜌𝑊,𝑍 = 0
(d) 𝑃(𝑊 = 𝑤) = 𝑃(𝑍 = 𝑧) when 𝑤 = 𝑧
10. In addition to estimating the official (U-3) U.S. unemployment rate that you heard about in class,
the BLS also estimates a number of alternative measures of “labor underutilization.” One of
these, the U-6 rate, uses a broader definition of the labor force, and measures the proportion
of this broader labor force that is either unemployed or is working part-time but would like to
work more hours. This is sometimes called the “underemployment rate.” Suppose the true
underemployment rate is 20 percent (levels this high were seen early in the Covid pandemic).
You plan to survey four members of the labor force at random. Let 𝑋𝑖 be the underemployment
status (1 if underemployed, 0 otherwise) of the 𝑖th member of your survey, and let 𝑝̂ be the
proportion of your sample that is underemployed.
10.4. What is 𝑃(𝑝̂ = 1/4)?
10.5. What is 𝑃(𝑝̂ = 1/2)?
10.6. What is 𝑃(𝑝̂ = 3/4)?
10.7. What is 𝑃(𝑝̂ = 1)? After calculating this, ensure that your pmf values for 𝑝̂ sum to 1.
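The check requested in 10.7 can be sketched as follows. The sketch assumes the four survey responses are independent, so the number of underemployed respondents is binomial with n = 4 and p = 0.20; the dictionary `pmf` is an illustrative name.

```python
from math import comb

n = 4      # sample size from the problem
p = 0.20   # true underemployment rate assumed in the problem

# The sample proportion equals k/n exactly when k of the n respondents are
# underemployed, which has a binomial probability.
pmf = {k / n: comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)}

for value, prob in pmf.items():
    print(f"P(sample proportion = {value:.2f}) = {prob:.4f}")

# The pmf values should sum to 1 (up to floating-point rounding).
print("total:", sum(pmf.values()))
```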
11. (Challenging) Consider the experiment of the birth of dizygotic twins (often called fraternal
twins), and let 𝑋 be the number of girls among the two twins. Unlike monozygotic (identical)
twins, dizygotic twins do not have to be the same sex. Assume boys and girls are equally likely
in any given birth, and the sexes of a pair of dizygotic twins are independent. You plan to
sample two pairs of twins and compute 𝑋̄.
11.2. What is 𝑃(𝑋̄ > 0)? How does it compare to 𝑃(𝑋 > 0)?
12. A recent report indicates that 62 percent of college graduates in the U.S. have some student loan
debt. Suppose this proportion also applies to students at UNC. You plan to randomly sample
five students on campus about whether they have any student loan debt.
12.1. What is the probability that exactly two of the five students you survey have some student
loan debt?
12.2. What is the expected number of students in your survey who have some student loan debt?
12.3. Suppose there was a flaw in your survey design: you were less likely to interview second-
semester seniors, who generally spend less time on campus. Will the sample proportion
still be an unbiased estimator?
13. In the United States, the overall undergraduate dropout rate is 40 percent. If you conduct a
random sample of 1,000 people who ever enrolled as an undergraduate, what is the probability
that the estimated dropout rate in your sample will be between 38 percent and 42 percent?
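For Problem 13, a minimal sketch of the normal approximation to the sampling distribution of the sample proportion is shown below. It assumes the sample is large enough for the approximation to apply and uses no continuity correction.

```python
from math import sqrt
from statistics import NormalDist

p = 0.40   # population dropout rate from the problem
n = 1000   # sample size from the problem

# Standard deviation of the sample proportion.
sd_p_hat = sqrt(p * (1 - p) / n)

# Normal approximation to the sampling distribution (an assumption here).
sampling_dist = NormalDist(mu=p, sigma=sd_p_hat)

# Probability that the sample proportion falls between 0.38 and 0.42.
prob = sampling_dist.cdf(0.42) - sampling_dist.cdf(0.38)
print(round(prob, 3))
```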
14. Let 𝑋1, …, 𝑋70 be the values to be recorded in a random sample of 70 people from a population
that is normally distributed with mean 𝜇 = 30 and standard deviation 𝜎 = 8.
(a) 0.018
(b) 0.037
(c) 0.401
(d) 0.803
(a) 0.11
(b) 0.34
(c) 0.96
(d) 8
(a) 0.018
(b) 0.037
(c) 0.401
(d) 0.803
15. After medical school, physicians complete a residency of three or more years to train in a specific
branch of medicine. Residencies often demand long work shifts. Suppose that among all
residents, the number of hours worked per week has a mean of 60 and a standard deviation of
16 (these values are consistent with Barger et al., BMJ Medicine, 2023), and assume that hours
worked per week is normally distributed.
15.1. What share of residents works more than 80 hours per week?
15.2. In a random sample of 100 residents, what is the probability that the average number of
hours worked per week will be more than 65?
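The two parts of Problem 15 differ only in which distribution is used: the population distribution for a single resident, and the sampling distribution of the mean for a sample of 100. A sketch under the normality assumption stated in the problem:

```python
from math import sqrt
from statistics import NormalDist

mu, sigma = 60, 16  # population mean and standard deviation from the problem

# 15.1: one resident's weekly hours follow the population distribution.
population = NormalDist(mu, sigma)
share_over_80 = 1 - population.cdf(80)

# 15.2: the mean of n = 100 residents has standard deviation sigma / sqrt(n).
n = 100
sample_mean_dist = NormalDist(mu, sigma / sqrt(n))
prob_mean_over_65 = 1 - sample_mean_dist.cdf(65)

print(round(share_over_80, 4), round(prob_mean_over_65, 5))
```

The second probability is much smaller because averaging over 100 residents shrinks the standard deviation from 16 to 1.6.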
16. You sample 100 people from a population with a mean of 𝜇 = 500 and a standard deviation of
𝜎 = 60. Consider the sampling distribution of the sample mean, 𝑋̄.
(a) 495
(b) 500
(c) 505
(d) To answer this question, we would need to know the shape of the population
distribution.
(a) 600
(b) 60
(c) 6
(d) 0.6
16.3. What is the probability that the sample mean is between 490 and 495?
(a) 0.036
(b) 0.156
(c) 0.312
(d) To answer this question, we would need a calculator to find probabilities for a 𝑡
distribution.
17. You have a random sample of 2,500 people from a large population. When can you use the
standard normal distribution to compute probabilities associated with the sample mean, treated
as an estimator?
18. Three economists surveyed MBA graduates from the Booth School at the University of Chicago
in order to learn about the role of long work hours in explaining the persistence of a male-female
earnings gap in high-skill jobs (Bertrand, Goldin, and Katz, “Dynamics of the Gender Gap for
Young Professionals in the Financial and Corporate Sectors,” American Economic Journal: Applied
Economics, 2010). Among 629 women in the survey, the average work week was 55.75 hours.
19. Incomes in a particular town are strongly right-skewed with a mean of $36,000 and a standard
deviation of $17,000. A random sample of 125 households is taken. Describe the shape, mean,
and standard deviation of the sample mean.
20. You conduct a random sample of 𝑛 = 2 individuals from a population with mean 𝜇 and
variance 𝜎². You know that the sample mean, 𝑋̄ = (1/2)(𝑋1 + 𝑋2), would be the typical estimator
for the population mean. However, you instead decide to estimate the population mean using
a different weighted average of your sample values, 𝑋̃ = (2/3)𝑋1 + (1/3)𝑋2.
20.3. If two unbiased estimators have different variances, then the estimator with the smaller
variance is said to be more efficient. Compared to 𝑋̄, is 𝑋̃ more efficient, less efficient, or
equally efficient?
(a) 𝑋̃ is more efficient
(b) 𝑋̃ is less efficient
(c) 𝑋̃ is equally efficient
21. In the U.S., the poverty rate is the share of individuals living in households with incomes below
the federal poverty line (which varies with the size of the household). You plan to conduct
a random sample of 1,000 people and compute the estimated poverty rate as the share of the
sample living in poverty, 𝑝̂.
21.1. Will the standard deviation of 𝑝̂ be larger than, smaller than, or the same as the standard
deviation of poverty status in the population?
21.2. Suppose the true poverty rate is 12 percent. What is the probability that in a random sample
of 1,000 individuals, the estimated poverty rate is between 11 percent and 13 percent?
21.3. Suppose for the remainder of the question that you do not know the true poverty rate and
that in your survey of 1,000 individuals, the estimated poverty rate is 12.5 percent. What is
the standard error of 𝑝̂?
21.4. What is the lower bound of the 99% confidence interval for the U.S. poverty rate? Express
your answer as a proportion, not a percentage.
21.5. What is the upper bound of the 99% confidence interval for the U.S. poverty rate? Express
your answer as a proportion, not a percentage.
22. Suppose 100 students are randomly surveyed about how much they study. The sample mean is
23.6 hours per week, and the sample standard deviation is 7 hours per week. Construct a 90%
confidence interval for the population mean study time per week.
23. Many political polling organizations report sample proportions and a margin of error from a
95% confidence interval for the proportion. Suppose an organization is planning to conduct a
poll of voters to see what share of them will vote for the Democratic candidate in the presidential
election. The organization expects half of the voters to support the Democratic candidate and
they want to be able to report a margin of error of 3 percentage points. How many voters do
they need to survey? (Assume that you can use the standard normal distribution.)
(a) 267
(b) 752
(c) 1,067
(d) 2,134
24. (This question is from March 2022.) France has presidential elections coming up this April.
According to data from Politico, current French President Emmanuel Macron is leading polling
with 24% support. His top opponent, Marine Le Pen, is supported by 17% of respondents.
24.1. Suppose the data above comes from a poll of 200 likely voters. Form a 95% confidence
interval for the proportion of voters in France who support Macron.
24.2. Five candidates in the election are each polling with over 10% support. In close elections,
it can be difficult to use polling data to gauge the true order of support for the candidates.
Suppose the polling organization would like the 95% confidence interval for Macron’s
support to have a margin of error of 2 percentage points. How large a sample would be
necessary to achieve this? Assume for this exercise that the estimated support for Macron
would stay the same as the sample size changes.
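Both parts of Problem 24 rest on the same margin-of-error formula. The sketch below assumes the standard normal critical value is appropriate and, as 24.2 instructs, holds the estimated support fixed at 0.24 when solving for the sample size; rounding the sample size up is a choice made here.

```python
from math import ceil, sqrt
from statistics import NormalDist

p_hat = 0.24  # estimated support for Macron from the problem
n = 200       # poll size in 24.1

# Critical value for a 95% interval from the standard normal distribution.
z = NormalDist().inv_cdf(0.975)

# 24.1: standard error, margin of error, and the interval itself.
se = sqrt(p_hat * (1 - p_hat) / n)
margin = z * se
ci = (p_hat - margin, p_hat + margin)

# 24.2: solve z * sqrt(p_hat * (1 - p_hat) / n) = target margin for n,
# rounding up so the margin is no larger than the target.
target_margin = 0.02
n_needed = ceil((z / target_margin) ** 2 * p_hat * (1 - p_hat))

print(ci, n_needed)
```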
25. You plan to ask a random sample of students on campus about whether they have at least
conversational proficiency in a second language. Let 𝑝 be the true proportion of students who
are proficient, and let 𝑝̂ be the sample proportion who are proficient.
25.1. If 𝑝 = 0.2 and you sample 200 students, what is sd(𝑝̂)?
25.2. If 𝑝 = 0.5 and you sample 200 students, what is sd(𝑝̂)?
25.3. If 𝑝 = 0.8 and you sample 200 students, what is sd(𝑝̂)?
25.4. If you wanted to estimate a confidence interval with a margin of error of 4 percentage
points, would the necessary sample size be largest if 𝑝 is small, moderate, or large?
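The pattern behind Problem 25 is easier to see with the three standard deviations side by side. A minimal sketch, where `sd_p_hat` is an illustrative helper name rather than anything from the course materials:

```python
from math import sqrt


def sd_p_hat(p, n):
    """Standard deviation of the sample proportion for a random sample of size n."""
    return sqrt(p * (1 - p) / n)


n = 200
for p in (0.2, 0.5, 0.8):
    print(f"p = {p}: sd = {sd_p_hat(p, n):.4f}")
```

Because the formula depends on 𝑝 only through 𝑝(1 − 𝑝), values of 𝑝 that are equally far from 0.5 give the same standard deviation.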
26. You are hired by the state government to study whether a tax on liquor has decreased average
liquor consumption. You have data for 400 individuals on the change in each person’s liquor
consumption, in ounces, computed as liquor consumption in the year after the tax minus liquor
consumption in the year prior to the tax. You find a sample mean change in liquor consumption
of 𝑥̄ = −32.8 with a sample standard deviation of 𝑠 = 466.4.
(a) 1.166
(b) 21.60
(c) 23.32
(d) 466.4
26.2. You want to test the null hypothesis that the liquor tax had no effect on liquor consumption.
What is the test statistic?
(a) −1.51
(b) −1.41
(c) −0.08
(d) 0
26.3. What is the margin of error of a 95% confidence interval for the mean change in liquor
consumption?
(a) 38.36
(b) 38.45
(c) 45.71
(d) 45.85
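The standard error and test statistic in Problem 26 follow directly from the summary statistics. This sketch assumes the null hypothesis is a mean change of zero, as in 26.2; the margin of error is left out because it needs a critical value from a 𝑡 table.

```python
from math import sqrt

n = 400        # number of individuals
x_bar = -32.8  # sample mean change in consumption (ounces)
s = 466.4      # sample standard deviation
mu_0 = 0       # hypothesized mean change under the null (no effect)

# Standard error of the sample mean.
se = s / sqrt(n)

# Test statistic: distance of the sample mean from mu_0 in standard errors.
t_stat = (x_bar - mu_0) / se

print(round(se, 2), round(t_stat, 2))
```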
27. In class, you did problems related to inference about the mean height of Arapaho men in the
late 19th century, using statistics from a published research paper. Your null hypothesis was
that the mean height among Arapaho men, 𝜇, was equal to the mean height among Australian
men at the time, 172 cm. Among the 57 men sampled, the average height was 𝑥̄ = 174.3 cm.
For this problem, assume that the distribution of height among Arapaho men is approximately
normal with a population standard deviation of 𝜎 = 7 cm. Under these assumptions, the test
statistic is normally distributed (not 𝑡) under the null hypothesis. Throughout this problem,
use a significance level of 𝛼 = 0.05.
27.1. What is the smallest sample mean among the Arapaho men, 𝑥̄, for which you would not
reject the null hypothesis?
27.2. What is the largest sample mean among the Arapaho men, 𝑥̄, for which you would not
reject the null hypothesis?
27.3. Describe the distribution of the sample mean, 𝑋̄, under the null hypothesis.
27.4. If the mean height among Arapaho men was 𝜇 = 173 cm, what is the probability you will
make a type II error?
27.5. If the mean height among Arapaho men was 𝜇 = 176 cm, what is the probability you will
make a type II error?
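The steps in Problem 27 can be sketched as follows: find the range of sample means for which the null is not rejected, then compute the probability of landing in that range when the true mean differs from 172. The sketch relies on the problem's assumptions (normal population, known 𝜎 = 7, two-sided test); `type_ii_error` is an illustrative helper name.

```python
from math import sqrt
from statistics import NormalDist

mu_0, sigma, n, alpha = 172, 7, 57, 0.05

# Standard deviation of the sample mean and the two-sided critical value.
se = sigma / sqrt(n)
z_crit = NormalDist().inv_cdf(1 - alpha / 2)

# Sample means between these bounds do not lead to rejection (27.1, 27.2).
lower = mu_0 - z_crit * se
upper = mu_0 + z_crit * se


def type_ii_error(true_mu):
    """Probability of failing to reject the null when the true mean is true_mu."""
    dist = NormalDist(true_mu, se)
    return dist.cdf(upper) - dist.cdf(lower)


for true_mu in (173, 176):
    print(true_mu, round(type_ii_error(true_mu), 4))
```

The type II error probability falls as the true mean moves farther from the hypothesized value, which is the contrast 27.4 and 27.5 are meant to show.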
28. Below are summary statistics of weekly earnings among college graduates who were “full-year”
workers (working at least 50 of the previous 52 weeks) in North Carolina in the 2019 American
Community Survey (ACS).
Variable | Obs Weight Mean Std. dev. Min Max
-------------+-----------------------------------------------------------------
earnweek | 15,454 1497193 1520.695 1532.912 0 9220
28.1. Test whether mean weekly earnings among this group are equal to $1500. Use a significance
level of 𝛼 = 0.01, and use the critical value method to conduct the test.
28.2. In this case, could you compute the 𝑝-value of the test without using a computer or
calculator? If so, what is it?
29. You are conducting a hypothesis test for a population proportion in a sample of 1,500 people.
You are using a significance level of 𝛼 = 0.10, and you find a test statistic of 𝑧 = 2.10.
(a) 0.018
(b) 0.036
(c) 0.050
(d) 0.100
(a) Yes
(b) No
(c) The test results are right on the border between rejecting and not rejecting the
null hypothesis.
(d) We cannot reach a conclusion from the given information.
30. The poverty gap for a person in a poor household is the ratio by which their household income
falls short of the poverty line. (You do not need the following definition, but if 𝑋 is household
income and 𝐿 is the poverty line, then the poverty gap is 𝐺 = (𝐿 − 𝑋)/𝐿.) The poverty gap may be
used to measure the severity of poverty among the poor population. You take a random sample
of people in poverty in the U.S. and compute the poverty gap for each person in the sample.
Among 160 people in your sample, the average poverty gap is 0.35 with a standard deviation of
0.20. You find that among other OECD countries (a group of mostly high-income countries), the
median of the average poverty gaps is 0.30, and you are interested in testing whether the average
poverty gap in the U.S. is consistent with this value, using a significance level of 𝛼 = 0.10.
30.1. State the null hypothesis and alternative hypothesis of your test.
30.4. Should you reject the null hypothesis? Explain why or why not.
31. Suppose that in the 19th century, the population mean height of Arapaho men was 174 cm.
Researchers found records from a sample of 60 Arapaho men and want to test the null hypothesis
that the mean height of Arapaho men is equal to the population mean height of Australian men
at the time, which is known to be 172 cm. Among the 60 Arapaho men, the sample mean is
172.8 cm and the sample standard deviation is 7.7 cm, so the test statistic is 𝑡 = 0.805. Will the
result of this hypothesis test be an error, and if so, what type of error will be made?
32. If a wage subsidy is created for low-income workers, some workers may respond to this incentive
by working more hours, while others may respond to their now-higher incomes by working
fewer hours and consuming more leisure. (In labor supply theory, the effect of a wage increase
on labor supply is theoretically ambiguous, because the substitution effect and the income effect
are working in opposite directions.) Suppose that before the wage subsidy is introduced, high
school dropouts were working an average of 30 hours per week, and let 𝜇 be the population
mean hours per week when the subsidy is available. After the wage subsidy is created, a sample
of 120 high school dropouts finds an average of 32 hours worked per week, with a standard
deviation of 10.2 hours. Test whether the subsidy has changed the average hours worked among
high school dropouts, using a significance level of 𝛼 = 0.01.
32.5. Should you reject the null hypothesis? Explain why or why not.
32.6. Use R, a calculator, or an online tool to find the 𝑝-value for this hypothesis test.
33. You use the 2017 wave of the National Longitudinal Survey of Youth – 1997 Cohort (NLSY97)
to study educational attainment by demographic group. You find that among 544 Hispanic
women in the sample, the share who have completed college is 0.313.
33.1. Form a 90 percent confidence interval for the proportion of Hispanic women who have
completed college.
33.2. Test the hypothesis that 35 percent of Hispanic women have completed college. Use a
significance level of 𝛼 = 0.10, and use the critical value method to conduct the test.
34. In Assignment 6, you used ACS data from North Carolina to study educational attainment,
income, and poverty in specific metro areas. If you also wanted to study the proportion of
people who worked full time, defined as working 35 or more hours per week, you would first
need to create an indicator variable for full-time work, using the variable uhrswork that records
usual hours worked per week. What would be the correct command to create this indicator of
working full time?
35. Consider the problem from class about a pharmaceutical company that is testing a new drug
intended to lower total cholesterol. The company conducts a trial to see if the new drug is
more effective than the best-performing existing drug, which causes an average change in total
cholesterol of −15 mg/dL during the first year it is taken. The company finds a 90% confidence
interval for the effect of the new drug of (−20.3, −15.7). How should this finding be interpreted?
(a) There is only a 10 percent chance that the true effect of the new drug is −15 mg/dL.
(b) There is a 90 percent chance that the true effect of the new drug is between −20.3
mg/dL and −15.7 mg/dL.
(c) There is a 90 percent chance that the true effect of the new drug is more negative (that
is, more beneficial) than −15 mg/dL.
(d) None of these would be an appropriate interpretation.
36. Suppose that the median member of the Federal Open Market Committee (a group of Federal
Reserve officials who vote on monetary policy) believes the natural rate of unemployment is
4 percent. Using data from last month’s CPS, you find that a 95% confidence interval for the
U.S. unemployment rate is (0.0361, 0.0392). How can you interpret this finding?
(a) There is only a 5 percent chance that the true unemployment rate is 4 percent.
(b) There is a 95 percent chance that the true unemployment rate is between 3.61 percent
and 3.92 percent.
(c) There is a 95 percent chance that the unemployment rate is less than 4 percent.
(d) None of these would be an appropriate interpretation.
37. These questions are about how the level of significance of a hypothesis test relates to the
probabilities of making type I and type II errors. Suppose you plan to take a random sample of
𝑛 = 675 observations and test the null hypothesis 𝐻0: 𝜇 = 𝜇0.
37.1. If you decrease the level of significance, 𝛼 (for example, from 5% to 1%), what happens to
the probability of making a type I error when the null hypothesis is true?
37.2. If you decrease the level of significance, 𝛼 (for example, from 5% to 1%), what happens to
the probability of making a type II error when the null hypothesis is false?
38. These questions are about how the sample size relates to the probabilities of making type I
and type II errors in a hypothesis test. Suppose you plan to take a large random sample of 𝑛
observations and test the null hypothesis 𝐻0: 𝜇 = 𝜇0.
38.1. If you increase the sample size, 𝑛, what happens to the probability of making a type I error
when the null hypothesis is true?
38.2. If you increase the sample size, 𝑛, what happens to the probability of making a type II error
when the null hypothesis is false?
39. A classmate asks you to use a 𝑡 table to find the probability to the left of a particular 𝑡 value.
What should your response be?
(a) You need to calculate the test statistic first.
(b) You need to know the sample standard deviation before you can find the value you
need.
(c) The 𝑡 table contains critical values, not probabilities, so the request isn’t possible.
(d) You need to know the sample size before you can find the row you need to use.
40. Suppose you conduct a test of 𝐻0: 𝜇 = 𝜇0, and you fail to reject this null hypothesis, using a
significance level of 𝛼 = 0.05. Which of the following conclusions could you make about the
95% confidence interval?