Exercises
Fall 2001
B6014: Managerial Statistics
1. Descriptive Statistics
2. Probability and Expected Value
3. Covariance and Correlation
4. Normal Distribution
5. Sampling
6. Confidence Intervals
7. Hypothesis Testing
8. Regression Analysis
Exercises
Descriptive Statistics
[Histogram: vertical axis: # occurrences; horizontal axis: values.]
[Histogram: vertical axis: frequency (0 to 0.25); horizontal axis: price (0 to 100).]
Figure 3: Histogram of bond prices at default, 1974-1995. (Source: Moody's Investor Services.)
8. Which of the following best describes the data in Figure 3? (Base your answer on the
appearance of the histogram. You do not need to do any calculations. Select just one
statement below and complete the one you select.)
(a) The mean is greater than the median because
(b) The median is greater than the mean because
(c) The mean and median are roughly equal because
9. One proposal that has received little attention from Major League Baseball is to pay
pitchers according to the following rule: each pitcher receives a base salary of $4.25 million,
minus $0.25 million times his earned run average (ERA). (A lower ERA is associated with
better performance.) If this rule were adopted, what would be the correlation between a
pitcher's earnings and ERA? (Assume that the ERA cannot exceed 17, so this rule never
results in negative earnings. You may also assume a standard deviation of 1.2 for ERA.)
10. Using the data in Figure 4, answer both (a) and (b) below, providing a numerical value
for each.
(a) The mean of the data in the histogram is
(b) The median of the data in the histogram is
11. Cluster had exams in Finance and Marketing last week. All 60 students in the cluster
took both exams. The results were as follows:
Finance: mean = 25, standard deviation = 2
Marketing: mean = 75, standard deviation = 12
Correlation between score in Finance and same student's score in Marketing = 0.84
Mary, a student in Cluster , scored a 30 in Finance and a 90 in Marketing. We are
interested in comparing her performance on the two exams relative to the rest of the
class. In particular, we would like to make a statement about which of her scores ranked
higher compared to the other scores on the same exam. Select one of the choices below
and complete the statement you select.
[Histogram: vertical axis: # observations (0 to 20); horizontal axis: value.]
12. Seven students from the 1998 MBA class took jobs in brain surgery after graduation. Five
of the students reported their starting salaries: $55,000, $90,250, $90,250, $95,500, and
$105,000. Choose one of the following:
(a) Based on the information given, the largest possible value of the median starting
salary for all seven students is
(b) Based on the information given, it is not possible to put an upper limit on the median
starting salary for all seven students.
\[ \frac{1}{n}\sum_{i=1}^{n} X_i^2 - \left(\frac{1}{n}\sum_{i=1}^{n} X_i\right)^2 \]

Table 1:
          X_i      X_i^2    (X_i - \bar{X})^2
          0        0        8.218
          0        0        8.218
          0        0        8.218
          0        0        8.218
          1        1        3.484
          2        4        0.751
          2        4        0.751
          3        9        0.018
          4        16       1.284
          4        16       1.284
          4        16       1.284
          5        25       4.551
          5        25       4.551
          6        36       9.818
          7        49       17.084
Mean      2.867    13.4     5.182

\( \sqrt{5.182} = 2.28 \).
the mean; these values are in the third column of Table 1. The average of these squared
differences gives the variance
\[ \frac{1}{n}\sum_{i=1}^{n} (X_i - \bar{X})^2 = 5.182. \]
The standard deviation is \( \sqrt{5.182} = 2.28 \).
Method 4. List the distinct values observed without repetition, as in the second column
of Table 2. For each value, calculate what proportion of the observations had that value,
as in the first column of the table. Calculate the weighted average of the values, using
the proportions as weights; the result is 2.867. Now calculate the weighted average of the
squared values to get 13.4. The difference \( 13.4 - (2.867)^2 = 5.182 \) is the variance. The
square root \( \sqrt{5.182} = 2.28 \) is the standard deviation.
Table 2:
                 Proportion   Value    Squared Value
                 4/15         0        0
                 1/15         1        1
                 2/15         2        4
                 1/15         3        9
                 3/15         4        16
                 2/15         5        25
                 1/15         6        36
                 1/15         7        49
Weighted Avg.                 2.867    13.4
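The two routes to the variance described above (averaging squared deviations, and Method 4's weighted averages over distinct values) can be checked against each other with a short sketch. The data list is read off Table 1; the variable names are illustrative only.

```python
from collections import Counter
from math import sqrt

# The 15 observations listed in Table 1.
data = [0, 0, 0, 0, 1, 2, 2, 3, 4, 4, 4, 5, 5, 6, 7]
n = len(data)

# Direct route: average of squared deviations from the mean.
mean = sum(data) / n
var_direct = sum((x - mean) ** 2 for x in data) / n

# Method 4: proportions of each distinct value used as weights.
proportions = {value: count / n for value, count in Counter(data).items()}
w_mean = sum(p * v for v, p in proportions.items())
w_mean_sq = sum(p * v ** 2 for v, p in proportions.items())
var_weighted = w_mean_sq - w_mean ** 2
std_dev = sqrt(var_weighted)

print(round(w_mean, 3), round(w_mean_sq, 1), round(var_weighted, 3), round(std_dev, 2))
```

Both routes divide by n (not n - 1), matching the formulas used in this solution.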
95,000, 106,000, 106,000, 118,000
106,000, 106,000, 118,000, 150,000.
6. The answers are T, T, F, T. (i) should be clear because the range given is exactly ±2
standard deviations. (ii) First we find the m that goes with 80%:
\[ 0.80 = 1 - \frac{1}{m^2} \quad\Rightarrow\quad m = 2.236. \]
Thus, Chebyshev tells us that at least 80% are between 34.35 and 65.65. Clearly, at least
as many observations must lie between 34 and 66. (iii) To guarantee 88.9% we need a
range of ±3 standard deviations, which would be from 29 to 71. The range given only
goes as low as 31. (iv) First find how many standard deviations below the mean 30 is:
\[ (50 - 30)/7 = 2.86. \]
Chebyshev guarantees that at least
\[ 1 - \frac{1}{(2.86)^2} = 87.8\% \]
of the observations are within 2.86 standard deviations of the mean. But then at most
12.2% can be below 30.
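As a rough check of the Chebyshev arithmetic in this solution, the sketch below evaluates the bound 1 - 1/m^2 in both directions. The mean of 50 and standard deviation of 7 are inferred from the calculation (50 - 30)/7 above, and the function names are hypothetical.

```python
from math import sqrt

def chebyshev_fraction(m):
    """Minimum fraction of observations within m standard deviations (m > 1)."""
    return 1 - 1 / m ** 2

def chebyshev_multiplier(fraction):
    """Number of standard deviations m for which the bound equals `fraction`."""
    return sqrt(1 / (1 - fraction))

mean, sd = 50, 7  # assumed from the worked solution

# Part (ii): the m that goes with 80%, and the matching interval.
m80 = chebyshev_multiplier(0.80)
interval80 = (mean - m80 * sd, mean + m80 * sd)

# Part (iv): bound for the distance between 30 and the mean.
bound_iv = chebyshev_fraction((mean - 30) / sd)

print(round(m80, 3), [round(x, 2) for x in interval80], round(bound_iv, 4))
```

The last figure uses the unrounded multiplier 20/7, so it can differ in the final digit from the text, which rounds to 2.86 first.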
7. Exactly 25% are greater than 15, so 15 can be at most 2 standard deviations above the
mean of 10; i.e., the standard deviation cannot be less than 2.5.
8. (a) because the distribution is skewed right.
9. The proposed rule makes Salary = −0.25 × ERA + 4.25, a linear transformation with
negative slope. The correlation is therefore −1.
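A small numerical illustration of this point: for any set of ERA values, salaries generated by the rule have correlation −1 with ERA. The ERA values below are made up for the example, and `correlation` is a hand-written helper, not a library call.

```python
from math import sqrt

def correlation(xs, ys):
    """Pearson correlation computed from population moments."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    sx = sqrt(sum((x - mx) ** 2 for x in xs) / n)
    sy = sqrt(sum((y - my) ** 2 for y in ys) / n)
    return cov / (sx * sy)

eras = [2.10, 3.45, 4.00, 4.80, 6.25]          # hypothetical ERAs
salaries = [4.25 - 0.25 * era for era in eras]  # the proposed pay rule, in $ millions

r = correlation(eras, salaries)
print(round(r, 6))
```

The standard deviation of ERA given in the exercise does not affect the result; only the sign of the slope matters.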
10. (a) The mean is \( [(20 \times 0) + (2 \times 9)]/22 = 18/22 = 0.82 \). (b) The median is 0. (List the
observations from smallest to largest. Since we have an even number of observations, take
the two closest to the middle, the 11th and the 12th, and average them. Both are 0,
so the median is 0.)
11. The answer is (i) because her score in Finance is (30 − 25)/2 = 2.5 standard deviations above the
mean whereas her score in Marketing is only (90 − 75)/12 = 1.25 standard deviations above the mean.
12. Even if the two unknown salaries were very large, the median could not be larger than the
fourth smallest value, $95,500. For example, if the two unknown salaries were $500,000,
the values become
55,000, 90,250, 90,250, 95,500, 105,000, 500,000, 500,000,
and the median is the fourth value, 95,500.
Exercises
Probability and Expected Value
1. An inverse floater is a type of security whose payments move in the opposite direction of
short-term interest rates. The security is ordinarily structured so that no matter how high
interest rates rise, the payment cannot be negative, and no matter how low interest rates
drop, the payment cannot exceed some specified cap, e.g., 7%. Consider the following
specific case: if the prevailing short-term rate is X, the inverse floater pays
\[ Y = 100 \max(0.07 - X, 0) \]
on $100 of face value. (The notation \( \max(0.07 - X, 0) \) means use whichever is larger,
\( 0.07 - X \) or 0.) Suppose the distribution of the short-term rate X at the next payment
date is given by the following table:
x           0.04   0.06   0.07   0.08   0.09
P(X = x)    0.30   0.20   0.20   0.15   0.15
3. The Gourmet Cafe serves the exotic Bernoulli Salmon at lunch and dinner. The number of
customers ordering the salmon at lunch and dinner are given by the following distributions:
Lunch demand     0      1      2
probability      0.3    0.5    0.2

Dinner demand    0      1      2
probability      0.2    0.4    0.4
Assume the lunch and dinner demands are independent of each other so the joint distribution of the lunch and dinner demands is given by the following table:
                      Dinner
                 0      1      2
          0    0.06   0.12   0.12   0.3
Lunch     1    0.10   0.20   0.20   0.5
          2    0.04   0.08   0.08   0.2
               0.2    0.4    0.4
(Each entry of the table is just the product of the marginal probabilities at the end of the
corresponding row and the bottom of the corresponding column.)
The chef orders the fish in advance at a cost of $3.50 per serving. Any fish left over at
the end of the day is discarded.
(a) What is the expected total demand for the fish in a day?
(b) Suppose the chef orders three servings. What is the breakeven selling price (i.e.,
the price at which the expected revenue from sales of the fish equals the cost of the fish
ordered)? Assume that a customer who would have ordered the fish but finds it sold out
simply leaves rather than order something else. (Hint: Expected revenue = price times
expected number of units sold.)
4. The Uris & Warren Ratings Agency rates bonds on a simplified scale with just three
categories: A, B, and C. The Professors' Pension Fund has all its money invested in two
bonds, X and Y, both of which are currently rated B. Over the course of the next year
the ratings of the bonds may change; the end-of-year value (in millions) of each bond
depends on its end-of-year rating as in the following table:
Rating     A      B      C
X Value    100    75     50
Y Value    100    75     50
The joint distribution of the end-of-year ratings of the two bonds is given by the following
table:
                    Y
             A      B      C
       A   0.20   0.05   0
X      B   0      0.40   0.10
       C   0      0.05   0.20
(a) Find the probability that bond X will be rated C at the end of one year.
(b) Find the expected value of the year-end value of bond X.
(c) The Pension Fund buys an insurance contract that will pay 20 million if either bond
is downgraded to C. If both bonds are downgraded the contract still pays 20; if neither
is downgraded the contract pays nothing. Find the expected value of the payoff of the
insurance contract.
x           0.04   0.06   0.07   0.08   0.09
P(X = x)    0.30   0.20   0.20   0.15   0.15
Y           3      1      0      0      0
probability               payoff
0.30(1 - 0.30)            4
(1 - 0.30)0.30            4
(0.30)(0.30)              7
(1 - 0.30)(1 - 0.30)      0
Thus,
\[ E[X] = (2 \times 0.3 \times 0.7 \times 4) + (0.30 \times 0.30 \times 7) = 2.31 \]
3. (a) From the joint distribution of lunch and dinner demands we find that the distribution
of total demand is as follows:
Demand    0      1      2      3      4
Prob      0.06   0.22   0.36   0.28   0.08

Demand    0      1      2      3      4
Sold      0      1      2      3      3
Prob      0.06   0.22   0.36   0.28   0.08
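The tables above can be reproduced from the lunch and dinner distributions given in the exercise. The following is a sketch, not the original solution: it builds the total-demand distribution under independence, then applies the hint (expected revenue = price times expected units sold) with three servings ordered at $3.50 each. Variable names are illustrative.

```python
from itertools import product

lunch = {0: 0.3, 1: 0.5, 2: 0.2}
dinner = {0: 0.2, 1: 0.4, 2: 0.4}

# Distribution of total demand, using independence of lunch and dinner.
total = {}
for (l, p_l), (d, p_d) in product(lunch.items(), dinner.items()):
    total[l + d] = total.get(l + d, 0.0) + p_l * p_d

expected_demand = sum(k * p for k, p in total.items())

# With three servings ordered, units sold are capped at three.
servings, unit_cost = 3, 3.50
expected_sold = sum(min(k, servings) * p for k, p in total.items())
breakeven_price = servings * unit_cost / expected_sold

print({k: round(p, 2) for k, p in sorted(total.items())})
print(round(expected_demand, 2), round(expected_sold, 2), round(breakeven_price, 2))
```

The expected demand, expected units sold, and breakeven price printed here are computed values; the corresponding figures from the original solution text are not shown in this section.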
4. (a) The question asks for the marginal probability that X will be rated C. By summing
the third row of the table, we find that this is 0.25.
(b) To find E[X] we need to find the possible values and their probabilities:
Scenario    A      B      C
Value       100    75     50
Prob.       0.25   0.50   0.25
The probabilities are obtained by summing across the rows of the table; this is the marginal distribution of X. We now find
\[ E[X] = 100 \times 0.25 + 75 \times 0.50 + 50 \times 0.25 = 75. \]
(c) The insurance contract pays if either bond is downgraded; thus, the scenarios in which
the contract pays are the ones in boxes (shown here in brackets) in the following table:
                    Y
             A      B      C
       A   0.20   0.05   [0]
X      B   0      0.40   [0.10]
       C   [0]    [0.05] [0.20]
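The three parts of this solution can be sketched in code. The joint table below assumes rows index bond X and columns index bond Y, which is the orientation under which the row sums match the marginal distribution of X quoted in part (b). The expected insurance payoff is a computed value, since the end of part (c) is not shown in the text.

```python
values = {"A": 100, "B": 75, "C": 50}

# joint[x_rating][y_rating]; orientation assumed as described above.
joint = {
    "A": {"A": 0.20, "B": 0.05, "C": 0.00},
    "B": {"A": 0.00, "B": 0.40, "C": 0.10},
    "C": {"A": 0.00, "B": 0.05, "C": 0.20},
}

# (a), (b): marginal distribution of X and its expected year-end value.
marginal_x = {x: sum(row.values()) for x, row in joint.items()}
expected_x = sum(values[x] * p for x, p in marginal_x.items())

# (c): the contract pays 20 whenever at least one bond ends the year rated C.
p_pay = sum(p for x, row in joint.items() for y, p in row.items()
            if x == "C" or y == "C")
expected_payoff = 20 * p_pay

print({x: round(p, 2) for x, p in marginal_x.items()},
      round(expected_x, 2), round(p_pay, 2), round(expected_payoff, 2))
```

The payoff probability is the same whichever bond is placed on the rows, because the set of cells with at least one C rating is symmetric.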
Exercises
Covariance and Correlation
1. You manage a retail operation from which you sell to both walk-in and telephone customers. For a particular product, your goal is to set the inventory so that 99% of customers
looking or calling for the product find it in stock. Consider the following two scenarios:
(i) Days with a lot of walk-in customers are also days with a lot of telephone orders. (ii)
Days with a lot of walk-in customers tend to be days with fewer telephone orders and
vice-versa. In which scenario would you expect to have to hold more total inventory to
meet your service objective? Explain your answer by making reference to the concepts of
standard deviation and correlation.
2. You invest $3 thousand in one stock and your spouse invests $2 thousand in another.
Over the next year, each dollar invested in your pick will increase by X dollars and each
dollar invested in your spouse's will increase by Y dollars; X and Y are random variables
with the following properties:
X has a mean of 0.09 and a standard deviation of 0.20.
Y has a mean of 0.12 and a standard deviation of 0.27.
The correlation between X and Y is 0.6.
Your individual earnings are 3X thousand, your spouse's individual earnings are 2Y
thousand and your family earnings are the sum of the two.
(a) What is the expected value of your family earnings?
(b) What is the standard deviation of your family portfolio earnings?
3. Let X, Y , and Z be random variables. Consider the following statements:
(i) if Cov[X, Y] > Cov[Y, Z] then \( \rho_{XY} > \rho_{YZ} \).
(ii) if \( \rho_{XY} > \rho_{YZ} \) then \( \sigma_X > \sigma_Y \).
(iii) if \( \rho_{XY} > 0 \) then Cov[X, Y] > 0.
Now pick one of the following:
(a) all of the above are true
(b) (i) and (ii) are true
(c) (i) and (iii) are true
(d) (ii) and (iii) are true
1. The variance of total demand is
\[ \sigma_{\text{walkin}}^2 + \sigma_{\text{phone}}^2 + 2\rho\,\sigma_{\text{walkin}}\sigma_{\text{phone}}, \]
where \( \sigma_{\text{walkin}} \), \( \sigma_{\text{phone}} \) are the standard deviations for the individual types of demands and
\( \rho \) is their correlation. If the demand streams are positively correlated, the variability of
total demand will be greater, resulting in a higher inventory requirement. If the demands
are negatively correlated, total variability and required inventory will be lower.
2. (a) \( E[3X + 2Y] = 3E[X] + 2E[Y] = 3(.09) + 2(.12) = .51 \).
(b) \( Var[3X + 2Y] = 9\sigma_X^2 + 4\sigma_Y^2 + 2(3)(2)\rho\,\sigma_X\sigma_Y = 1.04 \). The standard deviation is the
square root, 1.02.
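The mean and variance of the combined earnings 3X + 2Y can be verified with a few lines of arithmetic. The inputs are the ones stated in the exercise; the variable names are illustrative.

```python
from math import sqrt

mu_x, sd_x = 0.09, 0.20   # your stock
mu_y, sd_y = 0.12, 0.27   # your spouse's stock
rho = 0.6                 # correlation between X and Y
a, b = 3, 2               # thousands of dollars invested in each

expected = a * mu_x + b * mu_y
variance = a ** 2 * sd_x ** 2 + b ** 2 * sd_y ** 2 + 2 * a * b * rho * sd_x * sd_y
std_dev = sqrt(variance)

print(round(expected, 2), round(variance, 2), round(std_dev, 2))
```

Both results are in thousands of dollars, like the earnings themselves.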
\( \sqrt{58.4} = 7.64 \).
5. (a) The standard deviation of the price change for one gallon is 0.032 so the standard
deviation for 1,000,000 gallons is 32,000.
(b) After buying 24 contracts, the company in effect has a position of \( 1{,}000{,}000X + 24 \times 42{,}000Y \),
using the notation in the hint. The resulting standard deviation is
\[ \sqrt{(1{,}000{,}000)^2(.032)^2 + (24 \times 42{,}000)^2(.040)^2 + 2(1{,}000{,}000)(24)(42{,}000)(.032)(.040)(-0.8)} = 24{,}193. \]
This is lower than the unhedged standard deviation.
(c) A simple way to find the minimum is to repeat the calculation in (b) with 24 replaced
by several different values and then graph the results; see Figure 5. From this it is clear
that the answer is about 15.
For a more systematic approach, let b denote the number of contracts and notice that we
want to find the value of b that minimizes
\[ (1{,}000{,}000)^2\sigma_X^2 + (b \times 42{,}000)^2\sigma_Y^2 + 2(1{,}000{,}000)\,b\,(42{,}000)\,\rho\,\sigma_X\sigma_Y. \]
[Figure 5: plot against the number of contracts (horizontal axis, 0 to 35).]
\[ b = -\rho\,\frac{\sigma_X}{\sigma_Y}\cdot\frac{1{,}000{,}000}{42{,}000}, \]
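The trial-and-error approach in part (c) can be sketched as follows. The correlation of −0.8 is an assumption: the exercise statement is not shown here, and a negative value is what reproduces the 24,193 figure in part (b). The function name and constants are illustrative.

```python
from math import sqrt

GALLONS = 1_000_000        # unhedged exposure, in gallons
CONTRACT_SIZE = 42_000     # gallons per contract
SD_X, SD_Y = 0.032, 0.040  # standard deviations of the two price changes
RHO = -0.8                 # assumed correlation (see note above)

def position_sd(contracts):
    """Standard deviation of the combined position for a given number of contracts."""
    a = GALLONS * SD_X
    b = contracts * CONTRACT_SIZE * SD_Y
    return sqrt(a ** 2 + b ** 2 + 2 * a * b * RHO)

sd_unhedged = position_sd(0)
sd_24 = position_sd(24)

# Repeat the calculation for 0..35 contracts and keep the smallest result.
best = min(range(0, 36), key=position_sd)

# Value of b at which the derivative of the variance is zero.
closed_form = -RHO * SD_X * GALLONS / (SD_Y * CONTRACT_SIZE)

print(round(sd_unhedged), round(sd_24), best, round(closed_form, 2))
```

The search over whole numbers of contracts and the closed-form value agree to the nearest contract.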
Exercises
Normal Distribution
1. Daily demand for widgets is normally distributed with a mean of 100 and a standard
deviation of 15.
(a) What is the probability that the demand in a day will exceed 125?
(b) What is the probability that demand will be less than 75? Less than 70?
(c) How many widgets should be stocked to ensure that with 95% probability all demands
will be met?
2. The plant manager of a manufacturing facility is concerned about drug use among plant
workers and plans to implement random drug testing. One of the tests to be applied
measures the level of factor-X in blood samples. Among recent users of cocaine, the level
of factor-X is normally distributed with a mean of 10.0 and a standard deviation of 1.3.
Among non-users, the level is normally distributed with a mean of 6.75 and a standard
deviation of 1.5.
The employer plans to send a warning letter to all employees with a factor-X level of x
or greater, with x to be determined.
(a) Find the value of x that will ensure that 90% of recent cocaine users will be sent a
warning letter.
(b) If your answer to part (a) is adopted, what proportion of non-users will also be sent
warning letters?
3. A bank finds that the one-day increase in the dollar value of its foreign exchange portfolio
is normally distributed with a mean of $1.5 million and a standard deviation of $9.7
million. (A negative increase is a loss.)
(a) Find the value x such that the probability that the portfolio will lose more than x
dollars in one day is 5%.
(b) For the x you found in part (a), what is the probability that the portfolio will increase
in value by more than x dollars in one day?
4. Mogul Magazine has recently completed an analysis of its customer base. It has determined that 75% of the issues sold each month are subscriptions and the other 25% are
sold at newsstands. It has also determined that the ages of its subscribers are normally
distributed with a mean of 44.5 and a standard deviation of 7.42 years, whereas the ages
of its newsstand customers are normally distributed with a mean of 36.1 and a standard
deviation of 8.20 years.
(a) Mogul would like to make the following statement to its advertisers: "80% of our
subscribers are between the ages of ____ and ____." Your job is to fill in the blanks,
choosing a range that is symmetric around the mean. (In other words, the mean age of
subscribers should be the midpoint of the range.)
(b) What proportion of Mogul's newsstand customers have ages in the range you gave in
(a)?
(c) What proportion of all of Mogul's customers have ages in the range you gave in (a)?
\[ P\left(Z < \frac{54 - 36.1}{8.2}\right) - P\left(Z < \frac{35 - 36.1}{8.2}\right) = P(Z < 2.18) - P(Z < -0.13) = 0.985 - 0.447 = 0.538. \]
(c) Of the 75% in the subscriber group, 80% are in the range. Of the remaining 25%,
53.8% are in the range. Thus, the overall fraction is (0.75)(0.80) + (0.25)(.538) = 0.734.
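The Mogul Magazine calculation can be reproduced with Python's standard-library normal distribution. This is a sketch: it assumes the range used in the solution is 35 to 54, which is what the z-values above imply, and it uses exact normal probabilities rather than table lookups, so the last digit may differ slightly.

```python
from statistics import NormalDist

subscribers = NormalDist(mu=44.5, sigma=7.42)
newsstand = NormalDist(mu=36.1, sigma=8.20)

# (a) Symmetric range holding 80% of subscribers: 10% in each tail.
z = NormalDist().inv_cdf(0.90)
low = subscribers.mean - z * subscribers.stdev
high = subscribers.mean + z * subscribers.stdev

# (b) Share of newsstand customers inside the rounded range 35 to 54.
p_news = newsstand.cdf(54) - newsstand.cdf(35)

# (c) Overall share: 75% subscribers, 25% newsstand buyers.
p_all = 0.75 * 0.80 + 0.25 * p_news

print(round(low, 1), round(high, 1), round(p_news, 3), round(p_all, 3))
```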
Exercises
Sampling
1. Let \( X_1, \ldots, X_{10} \) be a random sample from a population with mean 50 and standard
deviation 4. Let \( \bar{X} \) be the sample mean. Find the expected value and standard deviation
of \( \bar{X} \).
2. Each visit to Uris.com has a 2% chance of turning into a purchase. Let \( \hat{p} \) denote the
proportion of visits that turn into purchases from a random sample of 100 visits. Find
the expected value and standard deviation of \( \hat{p} \).
3. (a) The starting salary among 1999 graduates of the Evian Business School has a standard
deviation of $17,000. If you randomly survey 40 students and average their starting
salaries, what is the probability that the average among these students will be greater
than the average among all students?
(b) What is the probability that the average in the sample will exceed the average among
all students by more than $5,000?
(c) What is the probability that the average in the sample will differ from the average
among all students by more than $5,000?
4. A trader's annual bonus is normally distributed with a mean of $650,000 and standard
deviation $125,000. Her bonuses are independent from year to year. Find the probability
that her average bonus over a five-year period will be less than $500,000.
5. Fifty-seven percent of students at Calabria Business School support making Introduction
to Philanthropy a required course. The school plans to survey 100 students to gauge
opinion on this issue. What is the probability that fewer than half of those surveyed will
say they support requiring the course?
6. A wireless communication company is considering switching from a per-minute charge to
a flat monthly fee for unlimited service. It anticipates that this will result in greater usage
and would like to estimate what the average number of minutes per month per customer
will be under the new plan. To do this, it offers 1000 customers the flat-fee plan for one
month and tracks their usage. From past data, the company knows that the standard
deviation of monthly minutes is about 120 minutes; the company expects that this figure
will not change much under the new plan. Find the probability that the average among
the 1000 test customers will differ from the true mean by more than 5 minutes.
7. A company believes that roughly 35% of consumers rate its brand first in quality. It
would like to estimate the proportion more accurately and retains a market research firm
to survey n consumers. Using the initial estimate of 35% as a guide, determine how large
n must be to ensure that there is only a 5% chance that the estimate from the survey will
differ from the true value by more than 3 percentage points.
Solutions: Sampling
\[ \sqrt{p(1-p)/n} = \sqrt{.02(1-.02)/100} = 0.014. \]
3. Let \( \bar{X} \) and \( \mu \) denote the average for the sample and the average among all students. Then \( \bar{X} \)
is (approximately) normally distributed with a mean of \( \mu \) and a variance of \( \sigma^2/n \), where
\( \sigma \) is given as $17,000 and n = 40.
(a) \( P(\bar{X} > \mu) = 1/2 \) because the mean of a normal distribution is also its median.
\[ P(\hat{p} < 0.50) = P\left(\frac{\hat{p} - p}{\sqrt{p(1-p)/n}} < \frac{0.50 - p}{\sqrt{p(1-p)/n}}\right) = P\left(Z < \frac{-.07}{\sqrt{.57(1-.57)/100}}\right) = P(Z < -1.41) = 0.0793. \]
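6. Let \( \bar{X} \) denote the sample mean among the 1000 test customers and let \( \mu \) denote the true
mean under the new plan. Then \( \bar{X} \sim N(\mu, \sigma^2/n) \), with \( \sigma \) given as 120 and n = 1000. We
need to find
\[ P(\bar{X} < \mu - 5 \text{ or } \bar{X} > \mu + 5). \]
By symmetry of the normal, this is the same as
\[ 2P(\bar{X} > \mu + 5). \]
Standardizing, this becomes
\[ 2P\left(\frac{\bar{X} - \mu}{\sigma/\sqrt{n}} > \frac{5}{\sigma/\sqrt{n}}\right) = 2P\left(Z > \frac{5}{120/\sqrt{1000}}\right) \]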
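The two standardizations above (the survey of 100 students and the 1000 test customers) can be evaluated numerically as a check. This sketch uses the standard library's normal distribution; exact probabilities are used, so results can differ in the last digit from values read off a z-table after rounding z.

```python
from math import sqrt
from statistics import NormalDist

std_normal = NormalDist()

# Survey of 100 students when the true support is 57%.
p, n = 0.57, 100
se_p = sqrt(p * (1 - p) / n)
z5 = (0.50 - p) / se_p
prob_below_half = std_normal.cdf(z5)

# 1000 test customers, sigma = 120 minutes, error of more than 5 minutes.
sigma, n6, tolerance = 120, 1000, 5
z6 = tolerance / (sigma / sqrt(n6))
prob_outside = 2 * (1 - std_normal.cdf(z6))

print(round(z5, 2), round(prob_below_half, 4), round(z6, 2), round(prob_outside, 3))
```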
7. Let \( \hat{p} \) denote the estimate obtained by the market research firm and let p denote the true
proportion. Even though we don't know \( \hat{p} \), p, or n, we do know that the error in the
\[ 0.03 = 1.96\sqrt{\frac{p(1-p)}{n}}. \]
Solving for n, we get
\[ n = p(1-p)\,(1.96/0.03)^2. \]
Of course, we still don't know p, but here's where we use the rough prior estimate of 35%
as a guide. Substituting this value, we find
\[ n \ge (0.35(1 - 0.35))\,(1.96/0.03)^2 = 971. \]
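The sample-size formula in this solution is easy to evaluate directly. The sketch below uses the 35% prior estimate, the 3-point margin, and the 1.96 multiplier from the text; variable names are illustrative.

```python
from math import ceil

p_guess = 0.35   # rough prior estimate of the proportion
margin = 0.03    # allowed error: 3 percentage points
z = 1.96         # multiplier for a 5% two-sided chance

n_exact = p_guess * (1 - p_guess) * (z / margin) ** 2
n_rounded = round(n_exact)
n_rounded_up = ceil(n_exact)

print(round(n_exact, 2), n_rounded, n_rounded_up)
```

The unrounded value is slightly above 971, so a strict reading of "at least" would round up to the next whole respondent; the text reports the nearest whole number.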
Exercises
Confidence Intervals
1. Let \( X_1, \ldots, X_{10} \) be a random sample from a normal population with mean \( \mu \) and standard
deviation 4. Let \( \bar{X} \) be the sample mean and suppose \( \bar{X} = 48 \). Find a 95% confidence
interval for \( \mu \).
2. Each visit to Uris.com has a probability p of resulting in a purchase. Out of a random
sample of 500 visits, 15 result in purchases. Use this to find a 95% confidence interval for
p.
3. In a random sample of 40 students from the 1999 class at Evian Business School, the average starting salary is $110 thousand and the sample standard deviation is $16 thousand.
Find a 95% confidence interval for the population mean.
4. A wireless communication company is considering switching from a per-minute charge to
a flat monthly fee for unlimited service. It anticipates that this will result in greater usage
and would like to estimate the change in the average number of minutes per month per
customer resulting from the new plan. To do this, it offers 850 customers the flat-fee plan
for one month and tracks their usage. It also tracks a control group of 700 customers under
the old plan. Among the 850 test customers, it finds an average usage of 227 minutes
and a sample standard deviation of 135 minutes; in the control group, the corresponding
values are 163 and 85. Find a 95% confidence interval for the increase in mean usage.
5. For a t-based confidence interval, what multiplier would you use with a sample size of
15 and a confidence level of 95%? 90%? What confidence level would you get with a
multiplier of 2.624?
6. Six months after elections in the Democratic Republic of Urisia, a newspaper reports that
two-thirds of French companies with major government contracts have enjoyed increases
in the value of their contracts since the election and accuses the government of pro-French
policies. Table 3 shows the contract values (in millions of dollars) for the 12
French companies with a major presence in Urisia, before and after the election. Using
the t distribution, calculate a 95% confidence interval to assess the difference in average
contract size before and after the election. What does this confidence interval suggest?
7. (a) A recent study compared HMOs that provide financial incentives for physicians to
reduce referrals with HMOs that do not. The study found that in the first group 12 out
of 52 physicians surveyed said they would refer a patient with chest pain upon waking
Table 3:
            Before    After    Difference
            68.3      71.5      3.2
            37.0      32.5     -4.6
            121.4     122.1     0.7
            113.1     119.8     6.6
            117.3     111.2    -6.1
            47.7      44.0     -3.6
            85.7      94.6      8.8
            99.4      103.6     4.2
            108.8     129.4     20.6
            92.0      74.7     -17.2
            63.3      75.1      11.8
            33.9      39.2      5.3
Average     82.3      84.8      2.5
Std Dev     31.6      33.8      9.7
\[ \hat{p} \pm 1.96\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = .03 \pm 1.96\sqrt{\frac{(.03)(1-.03)}{500}} = .03 \pm .015. \]
\[ (\bar{X} - \bar{Y}) \pm 1.96\sqrt{\frac{s_X^2}{n_X} + \frac{s_Y^2}{n_Y}} = 64 \pm 11.04. \]
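The two large-sample intervals above (the purchase proportion from 500 visits and the difference in mean usage between the two customer groups) can be recomputed from the numbers in the exercises. This is only an arithmetic check with illustrative variable names.

```python
from math import sqrt

Z95 = 1.96  # normal multiplier for 95% confidence

# Proportion of visits resulting in a purchase: 15 out of 500.
p_hat = 15 / 500
half_width_p = Z95 * sqrt(p_hat * (1 - p_hat) / 500)

# Difference in mean usage: 850 test customers vs. 700 control customers.
diff = 227 - 163
half_width_diff = Z95 * sqrt(135 ** 2 / 850 + 85 ** 2 / 700)

print(round(p_hat, 2), round(half_width_p, 3), diff, round(half_width_diff, 2))
```

The center of the second interval is the difference of the two sample means given in the exercise, 227 − 163.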
5. Look up \( t_{n-1,\alpha/2} \) with n = 15 and \( \alpha = 5\% \); i.e., \( t_{14,.025} \) which is 2.145. For 90% it's
1.761. A multiplier of 2.624 leaves 1% probability in each tail and thus results in a 98%
confidence level.
6. This is a matched-pairs setting because we are comparing the same companies before and
after the election. The average of the differences is 2.5 and their standard deviation is
9.7. The sample size is 12, so we use the t-multiplier for 11 degrees of freedom and 95%
confidence (right-tail probability 0.025) which is 2.201. Putting the pieces together we
get a confidence interval of
\[ 2.5 \pm 2.201\,\frac{9.7}{\sqrt{12}} = 2.5 \pm 6.2. \]
Comment: This confidence interval is very wide. The uncertainty (as measured by the
halfwidth of 6.2) is very large compared with the average increase of 2.5 million. Indeed,
we cannot even be confident that the mean changed after the election because the interval
crosses zero. (Note that for this interpretation of the confidence interval we need to view
the 12 companies as a random sample from a larger hypothetical population.)
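The matched-pairs interval can be recomputed from the twelve differences listed in the table of contract values. This sketch takes the t-multiplier 2.201 from the text rather than computing it; with the tabulated standard deviation of about 9.7, the half-width comes out at roughly 6.2.

```python
from math import sqrt
from statistics import mean, stdev

# After-minus-before differences for the 12 companies, as tabulated.
differences = [3.2, -4.6, 0.7, 6.6, -6.1, -3.6, 8.8, 4.2, 20.6, -17.2, 11.8, 5.3]
T_11 = 2.201  # t-multiplier for 11 degrees of freedom, 95% confidence (from the text)

n = len(differences)
d_bar = mean(differences)
s_d = stdev(differences)  # sample standard deviation (divides by n - 1)
half_width = T_11 * s_d / sqrt(n)
interval = (d_bar - half_width, d_bar + half_width)

print(round(d_bar, 1), round(s_d, 1), round(half_width, 1),
      tuple(round(x, 1) for x in interval))
```

Because the interval contains zero, the data alone do not establish that the mean contract size changed.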
7. (a) No incentive: \( \hat{p}_X = 18/45 = 0.40 \), \( n_X = 45 \). With incentive: \( \hat{p}_Y = 12/52 = 0.23 \),
\( n_Y = 52 \). Confidence interval:
\[ \hat{p}_X - \hat{p}_Y \pm 1.96\sqrt{\frac{\hat{p}_X(1-\hat{p}_X)}{n_X} + \frac{\hat{p}_Y(1-\hat{p}_Y)}{n_Y}} = 0.17 \pm 0.18. \]
(b) Now we have \( \hat{p}_X = 0.40 \), \( n_X = 450 \), \( \hat{p}_Y = 0.23 \), \( n_Y = 520 \). The same formula gives a
confidence interval of 0.17 ± 0.058. We can make two comments comparing (a) and (b).
First, the confidence interval in (b) is narrower by a factor of \( \sqrt{10} \) because both sample
sizes in (b) are 10 times larger. Second, even though the difference in the two proportions
is the same in both cases (0.17), this difference is more convincing in (b) because of the
narrower interval. This is again a consequence of the larger sample size.
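The effect of sample size on the width of the interval for a difference in proportions can be seen by running the same formula on both data sets. `diff_ci` is a hypothetical helper written for this sketch.

```python
from math import sqrt

def diff_ci(x1, n1, x2, n2, z=1.96):
    """Point estimate and half-width of a CI for the difference of two proportions."""
    p1, p2 = x1 / n1, x2 / n2
    half_width = z * sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return p1 - p2, half_width

diff_a, half_a = diff_ci(18, 45, 12, 52)      # part (a)
diff_b, half_b = diff_ci(180, 450, 120, 520)  # part (b): ten times the sample sizes

ratio = half_a / half_b
print(round(diff_a, 2), round(half_a, 2), round(half_b, 3), round(ratio, 3))
```

The ratio of the two half-widths is the square root of 10, since the standard error shrinks with the square root of the sample size.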
Exercises
Hypothesis Testing
1. Let \( X_1, \ldots, X_{10} \) be a random sample from a normal population with mean \( \mu \) and standard
deviation 4. Let \( \bar{X} \) be the sample mean and suppose \( \bar{X} = 48 \). Let \( \mu \) denote the unknown
population mean. Test \( H_0: \mu = 45 \) vs. \( H_1: \mu \ne 45 \) at the 5% level. Test \( H_0: \mu \le 45 \) vs.
\( H_1: \mu > 45 \) at the 5% level.
2. Each visit to Uris.com has a probability p of resulting in a purchase. Out of a random
sample of 500 visits, 15 result in purchases. Is this statistically significant evidence that
\( p \ne 2\% \)?
3. In a random sample of 40 students from the 1999 class at Evian Business School, the average starting salary is $110 thousand and the sample standard deviation is $16 thousand.
How significant is the evidence that the population mean is greater than $100 thousand?
Give a p-value.
4. A wireless communication company is considering switching from a per-minute charge to
a flat monthly fee for unlimited service. It anticipates that this will result in greater usage
and would like to estimate the change in the average number of minutes per month per
customer resulting from the new plan. To do this, it offers 850 customers the flat-fee plan
for one month and tracks their usage. It also tracks a control group of 700 customers under
the old plan. Among the 850 test customers, it finds an average usage of 227 minutes
and a sample standard deviation of 135 minutes; in the control group, the corresponding
values are 163 and 85. Is the increase in mean usage statistically significant?
5. Six months after elections in the Democratic Republic of Urisia, a newspaper reports that
two-thirds of French companies with major government contracts have enjoyed increases
in the value of their contracts since the election and accuses the government of pro-French
policies. Table 4 shows the contract values (in millions of dollars) for the 12
French companies with a major presence in Urisia, before and after the election. Using
the t distribution, test whether the difference in average contract size before and after is
statistically significant at the 5% level.
6. (a) A recent study compared HMOs that provide financial incentives for physicians to
reduce referrals with HMOs that do not. The study found that in the first group 12 out
of 52 physicians surveyed said they would refer a patient with chest pain upon waking
to a cardiologist; in the second group, 18 out of 45 physicians surveyed said they would
make the referral. Is the difference statistically significant?
(b) Repeat the test now assuming the proportions are 120 out of 520 and 180 out of 450.
Comment on the difference between this and part (a).
Table 4:
            Before    After    Difference
            68.3      71.5      3.2
            37.0      32.5     -4.6
            121.4     122.1     0.7
            113.1     119.8     6.6
            117.3     111.2    -6.1
            47.7      44.0     -3.6
            85.7      94.6      8.8
            99.4      103.6     4.2
            108.8     129.4     20.6
            92.0      74.7     -17.2
            63.3      75.1      11.8
            33.9      39.2      5.3
Average     82.3      84.8      2.5
Std Dev     31.6      33.8      9.7
\[ Z = \frac{\bar{X} - 45}{4/\sqrt{10}} = 2.37. \]
The rejection criterion at 5% is Z > 1.96 or Z < −1.96, so we reject the null hypothesis.
In the second test, the rejection criterion is Z > 1.645 so we again reject the null.
2. Set up the test as \( H_0: p = .02 \) vs. \( H_1: p \ne .02 \). Test statistic:
\[ Z = \frac{\hat{p} - .02}{\sqrt{\frac{.02(1-.02)}{500}}} = 1.60, \]
with \( \hat{p} = .03 \). Note that we use .02 (the value under the null) in calculating the standard
error. At 5% significance level, we would reject if Z > 1.96 or Z < −1.96; clearly, the
evidence is therefore not significant at 5%. To measure how significant it is we find a p-value.
This is the area outside the range (−1.60, 1.60) under the standard normal, which
is 11%.
3. For \( H_0: \mu \le 100 \) vs. \( H_1: \mu > 100 \) the test statistic is
\[ Z = \frac{\bar{X} - 100}{s/\sqrt{n}} = \frac{110 - 100}{16/\sqrt{40}} = 3.95. \]
The p-value is the area to the right of 3.95 under the standard normal curve which is zero
to four decimal places. The evidence is therefore extremely significant.
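The three Z statistics computed so far in these solutions, and the associated p-values, can be checked with the standard library. This is a sketch with illustrative names; probabilities are exact normal areas rather than table values.

```python
from math import sqrt
from statistics import NormalDist

phi = NormalDist().cdf  # standard normal cumulative distribution function

# Sample mean of 48 from n = 10, sigma = 4, null value 45.
z1 = (48 - 45) / (4 / sqrt(10))

# 15 purchases in 500 visits, null value p = 0.02 (standard error uses the null value).
p_hat = 15 / 500
z2 = (p_hat - 0.02) / sqrt(0.02 * (1 - 0.02) / 500)
p_value2 = 2 * (1 - phi(abs(z2)))   # two-sided

# Mean salary of 110 from n = 40, s = 16, null value 100.
z3 = (110 - 100) / (16 / sqrt(40))
p_value3 = 1 - phi(z3)              # one-sided (right tail)

print(round(z1, 2), round(z2, 2), round(p_value2, 2), round(z3, 2), round(p_value3, 4))
```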
\[ Z = \frac{227 - 163}{\sqrt{\frac{135^2}{850} + \frac{85^2}{700}}} = 11.35. \]
This is a very large Z value; the null can be rejected at almost any significance level; the
evidence is highly significant.
5. This is a matched-pairs setting because we are comparing the same companies before and
after the election. We need to test \( H_0: \mu_X - \mu_Y = 0 \) vs. \( H_1: \mu_X - \mu_Y \ne 0 \), with Y =
Before and X = After. The average of the differences is 2.5 and their standard deviation
is 9.7. The test statistic is \( t = 2.5/(9.7/\sqrt{12}) = 0.89 \). To reject at 5% significance we
would need \( t > t_{11,.025} \) or \( t < -t_{11,.025} \). But \( t_{11,.025} = 2.201 \) so we cannot reject the null.
The test statistic 0.89 tells us that 2.5 is not statistically different from 0 because it is
only 0.89 standard errors away from 0.
6. We need to test \( H_0: p_X - p_Y = 0 \) vs. \( H_1: p_X - p_Y \ne 0 \) with X = no incentive and
Y = with incentive. (a) We are given \( \hat{p}_X = 18/45 = 0.40 \), \( n_X = 45 \). With incentive:
\( \hat{p}_Y = 12/52 = 0.23 \), \( n_Y = 52 \). Under the null hypothesis, the two proportions are equal
so we first need to calculate the pooled estimate
\[ \hat{p}_0 = \frac{18 + 12}{45 + 52} = 0.31. \]
The standard error is
\[ \sqrt{\frac{\hat{p}_0(1-\hat{p}_0)}{n_X} + \frac{\hat{p}_0(1-\hat{p}_0)}{n_Y}} = 0.094. \]
The test statistic is
\[ Z = \frac{\hat{p}_X - \hat{p}_Y}{0.094} = \frac{0.40 - 0.23}{0.094} = 1.80. \]
Because this fails to exceed 1.96, it is not significant at 5% (though it would be significant
at 10%).
(b) With the larger sample sizes \( \hat{p}_0 \) is unchanged but the standard error becomes
\[ \sqrt{\frac{\hat{p}_0(1-\hat{p}_0)}{450} + \frac{\hat{p}_0(1-\hat{p}_0)}{520}} = 0.030. \]
The test statistic becomes Z = 0.17/0.030 = 5.71 which now looks extremely significant.
Thus, the difference of 17% looks significant in (b) but not in (a) because (b) is based on
a much larger sample size.
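The pooled two-proportion test can be sketched as a small function and applied to both parts. `pooled_z` is a hypothetical helper; because it carries unrounded intermediate values, its output for part (b) is close to, but not digit-for-digit the same as, the figure obtained from the rounded 0.17 and 0.030.

```python
from math import sqrt

def pooled_z(x1, n1, x2, n2):
    """Z statistic for H0: p1 = p2, using the pooled proportion in the standard error."""
    p0 = (x1 + x2) / (n1 + n2)
    std_err = sqrt(p0 * (1 - p0) / n1 + p0 * (1 - p0) / n2)
    return (x1 / n1 - x2 / n2) / std_err

z_a = pooled_z(18, 45, 12, 52)      # part (a)
z_b = pooled_z(180, 450, 120, 520)  # part (b): ten times the sample sizes

print(round(z_a, 2), round(z_b, 2))
```

Multiplying both sample sizes by 10 multiplies the Z statistic by the square root of 10, which is what moves the result from not significant to highly significant.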
Exercises
Regression Analysis
The raw data for all of these exercises can be downloaded in the file Exercises Regression.xls.
It is not necessary to download this file to solve these exercises but you may find it helpful.
The data for Exercises 1-4 is from Managerial Statistics, by S.C. Albright, W. Winston, and
C. Zappe, published by Duxbury Press.
1. A drugstore chain has collected data on 50 of its stores. For each store it has values of
the following two variables:
PROMOTE: Store promotional expenditure as a percentage of expenditure
by leading competitor in the store's area
SALES: Store sales as a percentage of leading competitor's
Figure 6 shows part of the output from a regression of SALES against PROMOTE.
ANOVA
              df    SS
Regression    1     2172.88
Residual      48    2624.74
Total         49

              Coefficients   Std Err
Intercept     25.126         11.883
Promote       0.762          0.121
ANOVA
              df    SS        MS        F
Regression    1     57.0874   57.0874   622.1441
Residual      43    3.9456    0.0918
Total         44    61.0330

              Coefficients   Std Err   t Stat   P-value
Intercept     0.219          0.09      2.46     0.018
Time          0.087          0.00      24.94    0.000
              Coefficients   Std Err   t Stat
Intercept     45.505         1.284     35.453
Female        -8.296         1.564     -5.302
Regression Statistics
Multiple R           0.702
R Square             0.492
Adjusted R Square    0.485
Standard Error       8.079
Observations         208

              Coefficients   Std Err   t Stat
Intercept     35.492         1.341     26.466
YrsPrior      0.131          0.181     0.726
Female        -8.080         1.198     -6.744
YrsExper      0.988          0.081     12.208
Albright, Winston, and Zappe state that this is real data taken from an actual lawsuit.
(a) Interpret the coefficient for Female in Figure 8. Interpret the p-value that goes with
it.
(b) Interpret the coefficient for Female in Figure 9. Interpret the p-value that goes with
it.
(c) The coefficient for Female has larger magnitude in Figure 8 than Figure 9. Based on
information in the regression outputs, can you suggest an explanation for this? What
other information in the outputs might temper your conclusion?
(d) You have been hired to help defend the bank. What argument could you offer to
defend the bank against the evidence in the figures? What additional regressions might
support your argument?
4. Figure 10 shows the output from a regression of the following variables
QUANTITY: number of cars sold in US, in thousands
PRICE: index of inflation-adjusted car prices
INCOME: inflation-adjusted measure of disposable income, in thousands
INTEREST: prime rate of interest
We have one value of each of these variables for each year running from 1970 to 1987.
QUANTITY is the dependent variable.
Regression Statistics
Multiple R           0.843
R Square             0.711
Adjusted R Square    0.649
Standard Error
Observations            18

ANOVA
              df         SS   MS       F   P-value
Regression     3   15980400        11.48     0.000
Residual      14    6494658
Total         17   22475058

             Coefficients   Std Err   t Stat   P-value   Lower 95%   Upper 95%
Intercept        1332.062   2854.15     0.47     0.648     -4789.5      7453.6
Price             -62.400     18.54    -3.36     0.005      -102.2       -22.6
Income              8.290      2.54     3.27     0.006         2.8        13.7
Interest         -116.554     57.62                         -240.1         7.0

results are based on 250 days from July 1, 1999 to June 30, 2000. For each currency, the
data consists of the daily percentage changes in the number of US dollars per unit of the
foreign currency. We may think of these as the percentage changes in the price (in US
dollars) of the foreign currency.
Regression Statistics
Multiple R           0.425
R Square             0.181
Adjusted R Square    0.171
Standard Error       0.006
Observations           250

ANOVA
              df       SS       MS         F   P-value
Regression     3   0.0018   0.0006   18.1213    0.0000
Residual     246   0.0081   0.0000
Total        249   0.0099

             Coefficients   Std Err   t Stat   P-value   Lower 95%   Upper 95%
Intercept         -0.0004    0.0004    -0.98     0.329     -0.0011      0.0004
BritPound          0.2235    0.0749     2.98     0.003      0.0759      0.3710
JapanYen           0.0298    0.0473     0.63     0.530     -0.0634      0.1229
CanDollar          0.7293    0.1126     6.48     0.000      0.5076      0.9511
1. (a)
ANOVA
              df        SS        MS       F   P-value
Regression     1   2172.88   2172.88   39.74     0.000
Residual      48   2624.74     54.68
Total         49   4797.62

             Coefficients   Std Err   t Stat   P-value   Lower 95%   Upper 95%
Intercept          25.126    11.883    2.115     0.040       1.235      49.018
Promote             0.762     0.121    6.304     0.000       0.519       1.005
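(b) \(R^2 = \mathrm{SSR}/(\mathrm{SSR}+\mathrm{SSE}) = 2172.88/4797.62 \approx 0.453\) and \(s_e = \sqrt{\mathrm{MSE}} = \sqrt{\mathrm{SSE}/(n-k-1)} = \sqrt{2624.74/48} \approx 7.39\).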
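The arithmetic in part (b) can be checked with a short Python sketch. The variable names are illustrative only; the numbers are the ANOVA entries from the regression of SALES on PROMOTE, with n = 50 stores and k = 1 explanatory variable.

```python
import math

# ANOVA values read from the regression output for SALES on PROMOTE.
ssr = 2172.88   # regression sum of squares
sse = 2624.74   # residual sum of squares
n = 50          # number of stores
k = 1           # number of explanatory variables

r_squared = ssr / (ssr + sse)   # share of total variation explained
mse = sse / (n - k - 1)         # mean squared error
s_e = math.sqrt(mse)            # standard error of the regression

print(f"R^2 = {r_squared:.4f}")
print(f"MSE = {mse:.2f}")
print(f"s_e = {s_e:.2f}")
```

The MSE computed this way should match the 54.68 shown in the Residual row of the completed table.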
2. (a) The coefficient for Time measures the annual rate of increase in the minimum wage.
According to the regression output, this rate is estimated at $0.0867 per year.
(b) \(\hat\beta \pm t_{43,.025}\, s_{\hat\beta}\); i.e., \(0.0867 \pm (2.02)(0.0035)\); i.e., (0.0797, 0.0937).
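As a rough check of the interval in 2(b), the sketch below recomputes it in Python. The critical value 2.017 is an assumption: it is the usual table value of the t distribution with 43 degrees of freedom at the 2.5% upper tail, typed in by hand rather than computed.

```python
# Values from the regression output and the solution text.
coef = 0.0867      # estimated coefficient on Time (dollars per year)
std_err = 0.0035   # standard error of that coefficient
t_crit = 2.017     # assumed table value of t_{43, .025}

margin = t_crit * std_err
lower, upper = coef - margin, coef + margin
print(f"95% CI: ({lower:.4f}, {upper:.4f})")
```

The result agrees with the interval stated in the solution up to rounding in the last digit, since the inputs here are themselves rounded.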
(c) The \(R^2\) is
\[ R^2 = \frac{\mathrm{SSR}}{\mathrm{SSR} + \mathrm{SSE}} = \frac{57.0874}{57.0874 + 3.9456} = 93.54\%. \]
This is a high \(R^2\), indicating a good linear fit. This means that the minimum wage grew
roughly linearly (i.e., at a constant rate) during 1950-1994.
3. (a) According to Figure 8, the average salary among female employees is $8,296 less
than among male employees. The very small p-value indicates that this difference is
statistically significant.
(b) According to Figure 9, the average salary among female employees is $8,080 less than
among male employees, once differences in prior years of experience and years with the
bank have been accounted for. The very small p-value indicates that the difference between
men and women remains statistically significant even once these other differences have
been accounted for.
(c) The coefficients for both YrsPrior and YrsExper are positive, indicating that employees
with more years of experience tend to have higher salaries. The fact that the difference
between average salaries for men and women is smaller when the experience variables are
included suggests that women may have lower values of YrsPrior and YrsExper, on average,
so that part of the difference observed in Figure 8 is due to differences in experience,
rather than differences in sex.
Two considerations temper this argument. First, the confidence intervals for the Female
coefficients are very wide compared with the difference across the two regressions; so, the
difference may not be meaningful. Second, the p-value for YrsPrior suggests that this
variable does not have a statistically significant effect on salary.
(d) To defend the bank, you could argue that these regressions do not take account of
other differences between the men and women employed at the bank. For example, women
may have been hired with fewer years of education and/or hired into lower-level positions.
To support this argument, you would want to run a regression that includes variables for
education and job grade to determine whether the salary difference remains significant
even after these other factors are considered. (Of course, this would not address the
question of whether the initial hiring was discriminatory; rather, it would check whether
men and women in comparable positions earn equal pay.)
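4. (a) Recall that
\[ s_e = \sqrt{\mathrm{MSE}} = \sqrt{\frac{\mathrm{SSE}}{n-k-1}}, \]
so
\[ s_e = \sqrt{\frac{6494658.12}{14}} = 681. \]
Thus, \(s_e\) is 681,000 cars, and this is roughly the standard error in any forecast we try
to make from the model.
(b) The t-statistic for the Interest coefficient is
\[ t = \frac{\hat\beta - 0}{s_{\hat\beta}} = \frac{-116.554 - 0}{57.62} = -2.02. \]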
For the p-value, we use TDIST(2.02,14,2), applied to the magnitude of the t-statistic,
because we have 14 degrees of freedom and we want a 2-sided p-value. The answer is 0.063.
Notice that this just fails to be significant at the 5% level; similarly (and equivalently)
the 95% confidence interval given in the regression output just straddles 0.
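For readers without a spreadsheet at hand, the two-sided p-value can be approximated in plain Python. This is a sketch only, not Excel's algorithm: it integrates the Student t density numerically with Simpson's rule, and the helper names `t_pdf` and `two_sided_p` are hypothetical.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def two_sided_p(t, df, steps=2000):
    """Two-sided p-value: 1 minus the area between -|t| and |t|.

    The area from 0 to |t| is computed with Simpson's rule
    (steps must be even) and doubled by symmetry.
    """
    t = abs(t)
    h = t / steps
    area = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * t_pdf(i * h, df)
    half = area * h / 3
    return 1 - 2 * half

t_stat = -116.554 / 57.62      # coefficient on Interest over its standard error
p = two_sided_p(t_stat, 14)    # 14 residual degrees of freedom
print(f"t = {t_stat:.2f}, two-sided p = {p:.3f}")
```

The value obtained this way should fall slightly above 0.05, consistent with the coefficient just failing to be significant at the 5% level.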
(c) The estimated regression equation is
\[ \text{QUANTITY} = 1332.062 - 62.400\,\text{PRICE} + 8.290\,\text{INCOME} - 116.554\,\text{INTEREST} \]
so our forecast is
\[ \text{QUANTITY} = 1332.062 - 62.400 \times 285 + 8.290 \times 3110 - 116.554 \times 9.50 \]
which is 8223, or 8,223,000 cars.
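The forecast in (c) is a plug-in calculation, sketched below in Python. The dictionary names are illustrative; the coefficients come from the regression output and the input values (285, 3110, 9.50) are the ones used in the solution.

```python
# Estimated coefficients from the regression output.
coefficients = {
    "Intercept": 1332.062,
    "PRICE": -62.400,
    "INCOME": 8.290,
    "INTEREST": -116.554,
}

# Values of the explanatory variables used in the forecast.
inputs = {"PRICE": 285, "INCOME": 3110, "INTEREST": 9.50}

# Intercept plus each coefficient times its variable's value.
forecast = coefficients["Intercept"] + sum(
    coefficients[name] * value for name, value in inputs.items()
)
print(f"Forecast QUANTITY: {forecast:.0f} thousand cars")
```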
5. (a) Interpret the coefficient on British Pound as follows: for each percentage point change
in the pound, the Australian dollar changes by 0.2235 percentage points, holding the other
two currencies fixed. The interpretation for the others is analogous.
(b) The coefficients are all positive, which suggests that all four currencies tend to move
in the same direction. To hedge a position in Australian dollars, we should therefore take
an opposite position in the others. So, we should sell short.
The interpretation in (a) suggests that we should sell 22.35 million US dollars worth of
British pounds, 2.98 million US dollars worth of yen, and 72.93 million US dollars worth of
Canadian dollars. If we do this and if, for example, the dollar/pound rate increases by 1%,
then we lose 0.2235 million US dollars on the short pound position. The regression predicts
that when the US dollar/pound rate goes up by 1%, the US dollar/Australian dollar rate
goes up by 0.2235%. So, we earn 0.2235 million on the Australian dollar position, which
offsets the loss on the pound. Of course, in practice these would not offset each
other exactly because the regression equation does not hold exactly.
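The hedge arithmetic in (b) can be laid out in a few lines of Python. This sketch assumes a long position of $100 million in Australian dollars, which is the size implied by the short amounts quoted above; the variable names are illustrative.

```python
position = 100.0  # assumed long Australian dollar position, millions of US dollars

# Regression coefficients on the three hedging currencies.
betas = {"BritPound": 0.2235, "JapanYen": 0.0298, "CanDollar": 0.7293}

# Amount of each currency to sell short, in millions of US dollars.
shorts = {name: beta * position for name, beta in betas.items()}

# Scenario: the dollar/pound rate rises by 1%, other currencies unchanged.
pound_move = 1.0  # percent
loss_on_short = shorts["BritPound"] * pound_move / 100
predicted_gain = position * betas["BritPound"] * pound_move / 100
net = predicted_gain - loss_on_short

for name, amount in shorts.items():
    print(f"Short {name}: {amount:.2f} million")
print(f"Loss on pound short: {loss_on_short:.4f} million")
print(f"Predicted gain on Australian dollars: {predicted_gain:.4f} million")
print(f"Net predicted change: {net:.4f} million")
```

The net is zero only for the predicted move; the actual Australian dollar change differs from the prediction by the regression residual.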
(c) The residuals of this regression are the hedging errors. Recall that \(s_e\) estimates the
standard error of the residuals. So, \(s_e\) estimates the residual risk. More precisely, the
percentage change of the hedged position has a standard deviation of 0.006. On a position
of $100 million, 0.006% means $6,000. This is the estimated standard deviation of the
one-day change in the value of the position.
Recall that \(R^2\) gives the ratio of the explained variance to the total variance, and \(1 - R^2\)
gives the ratio of unexplained variance to the total variance. In the current setting, the
unexplained variance is how much variance remains after we hedge, and the total variance
is the variance without the hedge; similarly,
\[ \frac{\mathrm{Var}[\text{hedged position}]}{\mathrm{Var}[\text{original position}]} = \frac{\text{unexplained variance}}{\text{total variance}} = 1 - R^2. \]
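To put numbers on this ratio, the sketch below uses the R Square and Standard Error reported in the regression statistics for the currency regression. The $100 million position size is the one used in part (c); the variable names are illustrative.

```python
import math

r_squared = 0.181          # R Square from the regression statistics
s_e = 0.006                # Standard Error, in percent per day
position = 100_000_000     # position size in US dollars, as in part (c)

variance_ratio = 1 - r_squared          # Var[hedged] / Var[original]
sd_ratio = math.sqrt(variance_ratio)    # same ratio in standard-deviation terms
daily_sd = position * s_e / 100         # dollar standard deviation of the hedged position

print(f"Variance remaining after hedge: {variance_ratio:.3f}")
print(f"Standard deviation remaining:   {sd_ratio:.3f}")
print(f"One-day standard deviation:     ${daily_sd:,.0f}")
```

With an R Square this low, most of the variance remains after hedging, so the three-currency hedge removes only a modest share of the risk.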