Practice Problems For Quiz 1 With Solutions
a) True
b) False
Solution: Nominal categorical variables (e.g., hair color) express no order or ranking.
2. Four different beverages sold at a fast-food restaurant include (i) soft drinks, (ii) tea, (iii)
coffee, and (iv) bottled water. Suppose you have a dataset that includes the drink choice of
each customer. The variable that captures each customer’s drink choice would be classified as
a) Continuous numerical
b) Discrete numerical
c) Nominal categorical
d) Ordinal categorical
Solution: Because customers’ drink choices represent categories with no order or ranking,
this variable would be classified as a nominal categorical variable.
3. Suppose the following information is collected from John Doe on his application for a home
mortgage loan:
Solution: All four variables in this question are numerical variables. Both monthly debt
payments and annual family income are discrete variables because they increase in increments
of one cent. Similarly, the number of jobs in the past 10 years is a discrete variable that
increases in increments of one. By contrast, age (more generally, time) is a continuous variable.
4. Which of the following is a discrete numerical variable?
a) The number of doctors who wash their hands between patient visits
b) The amount of liquid consumed by the average American each day
c) The weight of a newborn baby at a local hospital
d) The time it takes a person to react to a stimulus
Solution: Liquid volume, weight, and time are all continuous numerical variables. By contrast,
the number of doctors is a discrete variable that increases in increments of one.
Answer questions 5–9 based on the following information. The following table contains nutritional
data about a sample of seven breakfast cereals:

Cereal                                     Calories
Kellogg’s All Bran                         80
Kellogg’s Corn Flakes                      100
Wheaties                                   100
Nature’s Path Organic Multigrain Flakes    110
Kellogg’s Rice Krispies                    130
Post Shredded Wheat Vanilla Almond         190
Kellogg’s Mini Wheats                      200
a) 100
b) 110
c) 130
d) 150
6. What is the median number of calories in these breakfast cereals?
a) 100
b) 110
c) 130
d) 150
Solution: We have an odd number of data points (seven). Hence, the median is the middle-
ranked value, which is the fourth observation (110). If you want to use R, you can simply type
and run median(calories).
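The R command above presumes that a vector named calories already exists. A minimal sketch of that setup, assuming the values are entered by hand in the order shown in the table (the helper names n, sorted, and middle are illustrative only):

```r
# Calories for the seven cereals, in the order listed in the table
calories <- c(80, 100, 100, 110, 130, 190, 200)

n <- length(calories)      # 7 observations, an odd number
sorted <- sort(calories)   # rank the values from smallest to largest
middle <- (n + 1) / 2      # position of the middle-ranked value (4)

sorted[middle]             # middle-ranked value: 110
median(calories)           # built-in equivalent: 110
```

The (n + 1) / 2 shortcut only applies when the number of observations is odd; median() handles both cases.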
7. What is the range of the number of calories in these breakfast cereals?
a) 80
b) 90
c) 120
d) 200
Solution: The range is the difference between the largest and smallest values. That is, 200 −
80 = 120. If you want to use R, you can type and run max(calories)-min(calories).
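One detail worth knowing when trying this in R: the built-in range() function returns the smallest and largest values rather than their difference. A short sketch, assuming the calories vector is entered by hand from the table:

```r
# Calories for the seven cereals, in the order listed in the table
calories <- c(80, 100, 100, 110, 130, 190, 200)

range(calories)                 # returns the two end points: 80 200
max(calories) - min(calories)   # the range as a single number: 120
diff(range(calories))           # same result using range(): 120
```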
8. What is the standard deviation of the number of calories in these breakfast cereals? (Select
the closest answer.)
a) 43.42
b) 46.90
c) 80
d) 2,200
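The calculation behind this question can be sketched in R as follows, assuming the calories vector is entered by hand from the table. The step-by-step version uses the sample formula (dividing by n − 1), which is also what R's sd() does; the variable names are illustrative only.

```r
# Calories for the seven cereals, in the order listed in the table
calories <- c(80, 100, 100, 110, 130, 190, 200)

n <- length(calories)
deviations <- calories - mean(calories)       # mean is 130

sample_var <- sum(deviations^2) / (n - 1)     # 13200 / 6 = 2200
sample_sd  <- sqrt(sample_var)                # about 46.90
pop_sd     <- sqrt(sum(deviations^2) / n)     # about 43.42 (divides by n instead)

sample_sd
sd(calories)                                  # built-in sample standard deviation
```

Under these assumptions, the sample standard deviation is approximately 46.90. The value 2,200 is the sample variance, and 43.42 is what results from dividing by n rather than n − 1.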
9. What is the z score of the calories for Kellogg’s Mini Wheats? (Select the closest answer.)
a) −1.07
b) −0.43
c) 0
d) 1.49
Solution: Kellogg’s Mini Wheats is the 7th observation in our dataset. Using the formula for
the z score, we find that the z score for the 7th observation is
\[ z_7 = \frac{x_7 - \bar{x}}{s} = \frac{200 - 130}{46.90} = 1.49 \]
If you want to use R, you can type and run (calories[7]-mean(calories))/sd(calories),
where typing calories[7] tells R that you want to work with the 7th observation.
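Because R works on whole vectors, the same formula can produce the z score of every cereal at once. A minimal sketch, assuming the calories vector is entered by hand from the table (the name z is illustrative):

```r
# Calories for the seven cereals, in the order listed in the table
calories <- c(80, 100, 100, 110, 130, 190, 200)

# Subtract the mean and divide by the sample standard deviation, element by element
z <- (calories - mean(calories)) / sd(calories)

round(z[7], 2)   # z score for Kellogg's Mini Wheats: 1.49
round(z, 2)      # z scores for all seven cereals
```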
10. Based on a sample of seven breakfast cereals, the mean and the standard deviation of the
number of calories are 130 and 46.9, respectively. By contrast, the mean and the standard
deviation of the amount of sugar in grams are 5.86 and 3.39, respectively. A marketing researcher
decides to use the coefficient of variation to measure the relative variability of different
variables. What would the marketing researcher conclude about the relative variability of the
number of calories and the amount of sugar in these cereals?
a) Relative to the mean, the amount of sugar is more variable than the number
of calories
b) Relative to the mean, the number of calories is more variable than the amount of sugar
c) Relative to the mean, the number of calories and the amount of sugar have the same
variability
d) It is impossible to compare the variability of the number of calories with that of the
amount of sugar
Solution: We need to calculate the coefficient of variation for each variable. The coefficient
of variation of the number of calories is s/x̄ = 46.9/130 = 0.36. The coefficient of variation of
the amount of sugar is s/x̄ = 3.39/5.86 = 0.58. Thus, we conclude that relative to the mean,
the amount of sugar is more variable than the number of calories.
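The comparison can be reproduced in R directly from the summary statistics given in the question. The variable names below are illustrative only.

```r
# Coefficient of variation = standard deviation / mean
cv_calories <- 46.9 / 130
cv_sugar    <- 3.39 / 5.86

round(c(calories = cv_calories, sugar = cv_sugar), 2)   # 0.36 and 0.58
cv_sugar > cv_calories                                  # TRUE: sugar is relatively more variable
```

Dividing by the mean removes the units, which is what makes calories and grams of sugar comparable.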
Answer questions 11 and 12 using the following information. The following table contains some
summary statistics about hotel room prices in 28 hotels located in downtown Nashville
a) $10
b) $90
c) $310
d) We do not have enough information to calculate the interquartile range.
Solution: The interquartile range is the difference between the upper and lower quartiles. Thus,
we have $270 − $180 = $90.
12. How many hotels have prices ranging between $180 and $270? (Select the closest answer.)
a) 7
b) 14
c) 21
d) 28
Solution: By definition, each quartile includes 25% of the observations. As such, the range
between the lower quartile and upper quartile includes 50% of the observations (25% of those
observations are between the lower quartile and the median, and the remaining 25% are
between the median and the upper quartile). Because there are 28 observations, we have
0.5 × 28 = 14.
13. The manager of a local diner has calculated her average daily sales to be $4,500 with a
standard deviation of $750. Assuming that the daily sales possess a bell-shaped distribution,
in what range can the manager expect her daily sales to be 95% of the time?
a) [$3,750, $5,250]
b) [$3,000, $6,000]
c) [$2,250, $6,750]
d) [$0, $9,000]
Solution: The empirical rule suggests that approximately 95% of observations fall within
two standard deviations of the mean. Thus, we have $4,500 − 2 × $750 = $3,000 as the lower
endpoint and $4,500 + 2 × $750 = $6,000 as the upper endpoint.
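The interval can be computed in one line in R. The sketch below also checks the "approximately 95%" figure against an exact normal distribution, which is an added assumption here since the question only says the distribution is bell-shaped. Variable names are illustrative.

```r
mean_sales <- 4500
sd_sales   <- 750

# Mean plus and minus two standard deviations
interval_95 <- mean_sales + c(-2, 2) * sd_sales
interval_95                        # 3000 6000

# Share of a normal distribution within two standard deviations of the mean
coverage <- pnorm(2) - pnorm(-2)
coverage                           # about 0.954
```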
14. An operations manager has the following information about four products:

Product     Mean Price   Standard Deviation of Price   Mean Demand   Standard Deviation of Demand
Product A   $16          $5                            100           30
Product B   $14          $3                            120           40
Product C   $20          $6                            90            35
Product D   $18          $3                            100           30

In addition, the operations manager has the following information about the covariance of price and
demand for these four products:

Product     Covariance
Product A   -120
Product B   -75
Product C   -140
Product D   -30
Which product has the strongest correlation between price and demand?
a) Product A
b) Product B
c) Product C
d) Product D
Solution: First, we need to calculate the correlation for each product. Recall that the
correlation formula is
\[ \mathrm{corr}(x, y) = \frac{\mathrm{cov}(x, y)}{s_x s_y}, \]
where cov(x, y) is the covariance of two variables x and y, $s_x$ is the standard deviation of the
first variable (x), and $s_y$ is the standard deviation of the second variable (y). Accordingly, we
have
\[ \text{Product A: } \frac{-120}{5 \times 30} = -0.80, \]
\[ \text{Product B: } \frac{-75}{3 \times 40} = -0.63, \]
\[ \text{Product C: } \frac{-140}{6 \times 35} = -0.67, \]
\[ \text{Product D: } \frac{-30}{3 \times 30} = -0.33. \]
The correlation takes values between −1 and 1, where 0 indicates no correlation and −1
and 1 indicate perfect negative and perfect positive correlation, respectively. The strength
of a correlation is therefore judged by its absolute value. Because product A has the largest
absolute correlation (0.80), we conclude that product A has the strongest correlation between
price and demand.
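The four correlations can be computed together in R. This is a sketch that assumes the two tables are typed into a data frame by hand; the data frame and column names are illustrative only.

```r
# Standard deviations and covariances copied from the two tables
products <- data.frame(
  product    = c("A", "B", "C", "D"),
  sd_price   = c(5, 3, 6, 3),
  sd_demand  = c(30, 40, 35, 30),
  covariance = c(-120, -75, -140, -30),
  stringsAsFactors = FALSE
)

# corr(x, y) = cov(x, y) / (s_x * s_y), applied row by row
products$correlation <- products$covariance /
  (products$sd_price * products$sd_demand)
products

# Strength is judged by absolute value, so pick the largest |correlation|
strongest <- products$product[which.max(abs(products$correlation))]
strongest   # "A"
```

Taking abs() before which.max() matters here: all four correlations are negative, so the largest raw value would point to the weakest relationship rather than the strongest.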
15. Contingency tables appear frequently in legal cases, such as those that allege that a company
has discriminated against a protected class. The following table gives the number of employees
of different ages who were laid off when a company anticipated a decline in business
Based on these data, which of the following statements would be most accurate?
a) There is no association between age and job retention because the company laid off the
same number of employees from each age group
b) There is a negative association between age and job retention (i.e., older
employees were more likely to be laid off when the company anticipated a
decline in business)
c) There is a positive association between age and job retention (i.e., older employees were
less likely to be laid off when the company anticipated a decline in business)
d) The company should lay off more people
Solution: We first calculate the proportion of employees who retained their jobs in each age
group so that the age groups can be compared on an apples-to-apples basis. The proportion
of employees that retained their jobs is 787/805 = 97.76% in the first age group (< 40),
632/650 = 97.23% in the second age group (40-49), 374/392 = 95.41% in the third age group
(50-59), and 107/125 = 85.60% in the fourth age group (60 or older). Because the proportion
of employees that retained their jobs decreases as the age group increases, we conclude that
there is a negative association between age and job retention. Note that having an association
does not necessarily mean that there is a causal relationship (i.e., these numbers are unlikely
to be sufficient to conclude that the company has discriminated against a protected class).
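The proportions in this solution can be reproduced in R from the counts it quotes. The sketch below assumes those counts are typed in by hand; the vector names and age-group labels are illustrative only.

```r
age_group <- c("<40", "40-49", "50-59", "60+")
retained  <- c(787, 632, 374, 107)   # employees who kept their jobs
total     <- c(805, 650, 392, 125)   # all employees in each age group

laid_off <- total - retained         # 18 in every age group
retention_rate <- retained / total

names(retention_rate) <- age_group
round(100 * retention_rate, 2)       # 97.76, 97.23, 95.41, 85.60

# The retention rate falls with every step up in age group
all(diff(retention_rate) < 0)        # TRUE
```

The counts of layoffs are identical across groups, which is why comparing raw counts is misleading and proportions are needed.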
16. Which of the following relationships is most likely to be driven by a spurious correlation?
a) There is a negative relationship between a car’s age and sales price (i.e., a car’s sales
price decreases as its age increases)
b) There is a positive relationship between a house’s size (sq. ft.) and sales price (i.e., a
house’s sales price increases as its size increases)
c) There is a positive relationship between the number of storks nesting in a
geographic area and the number of human babies born in that area (i.e., the
number of human babies born in an area increases as the number of storks
nesting in the area increases)
d) There is a positive relationship between years of education and salary (i.e., a person’s
salary increases as his/her education level increases)
Solution: It is natural to expect that a car’s overall condition will deteriorate over time,
leading to a decline in its price. A larger house is typically more expensive partly because
it has higher construction costs. Having more years of education typically leads to better
employment opportunities, leading to higher salaries.
If you believe that storks bring babies, I am sorry to disappoint you. There is indeed a
positive correlation between stork nests and the number of babies born. One explanation for this
relationship is population density. Densely populated areas naturally have more births.
Furthermore, they provide more nesting opportunities (i.e., buildings) for storks. Consequently,
there is a spurious correlation between the number of storks nesting in an area and the number
of babies born in that area.
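The population-density explanation can be illustrated with a small simulation in R. Everything in this sketch is made up for illustration: the number of areas, the coefficients, the noise levels, and the seed are arbitrary assumptions, not real stork or birth data.

```r
set.seed(1)      # arbitrary seed so the simulated numbers are reproducible
n_areas <- 200   # hypothetical number of geographic areas

# Hypothetical confounder: population density drives both variables
density <- runif(n_areas, min = 1, max = 10)
storks  <- 5 * density + rnorm(n_areas, sd = 2)    # more buildings, more nests
babies  <- 20 * density + rnorm(n_areas, sd = 8)   # more people, more births

# Storks and babies never affect each other, yet they are strongly correlated
raw_cor <- cor(storks, babies)
raw_cor

# Remove the part of each variable explained by density, then correlate the leftovers
storks_resid <- resid(lm(storks ~ density))
babies_resid <- resid(lm(babies ~ density))
adjusted_cor <- cor(storks_resid, babies_resid)
adjusted_cor
```

In this simulated setting the raw correlation is strongly positive, while the correlation that remains after accounting for density is close to zero, which is the signature of a spurious correlation driven by a third variable.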