Assignment Statistics
Group : GIM - 06
Exercise 1. For statements (a)-(h), state whether descriptive or inferential statistics has
been used.
(a) By 2040 at least 3.5 billion people will run short of water (World Future Society).
(b) In a sample of 100 on-the-job fatalities, 90% of the victims were men.
(c) In a survey of 1000 adults, 34% said that they posted notes on social media websites
(Source: AARP Survey).
(d) In a poll of 3036 adults, 32% said that they got a flu shot at a retail clinic (Source:
Harris Interactive Poll).
(e) Allergy therapy makes bees go away (Source: Prevention).
(f) Drinking decaffeinated coffee can raise cholesterol levels by 7% (Source: American Heart
Association).
(g) The average stay in a hospital for 2000 patients who had circulatory system problems
was 4.7 days.
(h) Experts say that mortgage rates may soon hit bottom (Source: USA TODAY).
Answer :
a Inferential
b Descriptive
c Descriptive
d Descriptive
e Inferential
f Inferential
g Descriptive
h Inferential
By : Sun Bunra 2
Institute of Technology of Cambodia Statistics ( 2022-2023 )
Answer :
a Ratio-level
b Ordinal-level
c Interval-level
d Ratio-level
e Ratio-level
f Ratio-level
g Ordinal-level
h Ratio-level
i Ratio-level
Answer :
a Discrete
b Continuous
c Discrete
d Continuous
e Continuous
f Discrete
g Continuous
h Continuous
Exercise 5. How People Get Their News The Brunswick Research Organization surveyed 50
randomly selected individuals and asked them the primary way they received the daily news.
Their choices were via newspaper (N), television (T), radio (R), or Internet (I). Construct a
categorical frequency distribution for the data and interpret the results.
N N T T T I R R I T
I N R R I N N I T N
I R T T T T N R R I
R R I N T R T I I T
T I N T T I R N R T
Solution :
There are four primary ways to receive the daily news: N, T, R, and I. These categories
will be used as the classes for the distribution.
The procedure for constructing a frequency distribution for categorical data is given below.
• Make a table as shown.
• Tally the data and place the results in column B.
• Count the tallies and place the results in column C.
• Find the percentage of values for each class in column D.
The categorical frequency distribution is obtained as given below:
Class    Frequency    Percent
N        10           20%
T        16           32%
R        12           24%
I        12           24%
Total    50           100%
From the above frequency distribution table, 32% of the people got their daily news via
television, which is the highest percentage among the four primary ways of receiving the
daily news.
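The tally-and-percentage procedure can be sketched in a few lines of Python. This is only an illustration of the counting steps; the variable names are chosen here and are not part of the exercise.

```python
from collections import Counter

# Survey responses from Exercise 5, typed row by row.
responses = (
    "N N T T T I R R I T "
    "I N R R I N N I T N "
    "I R T T T T N R R I "
    "R R I N T R T I I T "
    "T I N T T I R N R T"
).split()

counts = Counter(responses)   # tally each class
n = len(responses)            # total number of respondents

print(f"{'Class':<6}{'Frequency':>10}{'Percent':>10}")
for label in ("N", "T", "R", "I"):
    freq = counts[label]
    print(f"{label:<6}{freq:>10}{100 * freq / n:>9.0f}%")
```

Television (T) has the largest count, which is what the interpretation of the table relies on.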
Exercise 6. College Completions The percentage (rounded to the nearest whole percent) of
persons from each state completing 4 years or more of college is listed below. Percentage of
persons completing 4 years of college
23 25 24 34 22 24 27 37 33 24
26 23 38 24 24 17 28 23 30 25
30 22 33 24 28 36 24 19 25 31
34 31 27 24 29 28 21 25 26 15
26 22 27 21 25 28 24 21 25 26
(a) Organize the data into a grouped frequency distribution with 5 classes.
(b) Find the relative frequency.
(c) Construct a histogram, frequency polygon, and ogive.
Solution : We have MIN = 15, MAX = 38, K = 6, I = 4
a. Organize the data into a grouped frequency distribution with 6 classes.
lower limit upper limit lower boundary upper boundary Midpoint frequency
15 18 14.5 18.5 16.5 2
19 22 18.5 22.5 20.5 7
23 26 22.5 26.5 24.5 22
27 30 26.5 30.5 28.5 10
31 34 30.5 34.5 32.5 6
35 38 34.5 38.5 36.5 3
50
(b) Find the relative frequency.
Class boundaries    Cumulative frequency        Less than    Cumulative frequency
14.5 − 18.5         2                           14.5         0
18.5 − 22.5         9                           18.5         2
22.5 − 26.5         31                          22.5         9
26.5 − 30.5         41                          26.5         31
30.5 − 34.5         47                          30.5         41
34.5 − 38.5         50                          34.5         47
                                                38.5         50
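Since part (b) asks for relative frequencies, a short sketch of how they follow from the class frequencies may help. It assumes the six classes of width 4 used in the solution (15-18 up to 35-38); the variable names are illustrative.

```python
# College completion percentages from Exercise 6.
data = [
    23, 25, 24, 34, 22, 24, 27, 37, 33, 24,
    26, 23, 38, 24, 24, 17, 28, 23, 30, 25,
    30, 22, 33, 24, 28, 36, 24, 19, 25, 31,
    34, 31, 27, 24, 29, 28, 21, 25, 26, 15,
    26, 22, 27, 21, 25, 28, 24, 21, 25, 26,
]

lower_limits = [15, 19, 23, 27, 31, 35]  # class lower limits from the solution
width = 4                                # class width I = 4

n = len(data)
# Count the values falling between each lower and upper class limit (inclusive).
freqs = [sum(lo <= x <= lo + width - 1 for x in data) for lo in lower_limits]
rel_freqs = [f / n for f in freqs]

cumulative = 0
for lo, f, rf in zip(lower_limits, freqs, rel_freqs):
    cumulative += f
    print(f"{lo}-{lo + width - 1}: f = {f:2d}, relative = {rf:.2f}, cumulative = {cumulative}")
```

The relative frequency of a class is its frequency divided by n = 50, so the relative frequencies sum to 1.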
(c) Construct a histogram, frequency polygon, and ogive.
Exercise 7. Ages of the Vice Presidents at the Time of Their Death The ages of the Vice
Presidents of the United States at the time of their death are listed below.
90 83 80 73 70 51 68 79 70 71
72 74 67 54 81 66 62 63 68 57
66 96 78 55 60 66 57 71 60 85
76 98 77 88 78 81 64 66 77 93 70
lower limit upper limit lower boundary upper boundary Midpoint frequency
51 58 50.5 58.5 54.5 5
59 66 58.5 66.5 62.5 9
67 74 66.5 74.5 70.5 11
75 82 74.5 82.5 78.5 9
83 90 82.5 90.5 86.5 4
91 98 90.5 98.5 94.5 3
41
(b) Find the relative frequency.
Exercise 8. Activities While Driving A survey of 1200 drivers showed the percentage of
respondents who did the following while driving. Construct a vertical bar graph and a
horizontal bar graph for the data.
Solution :
Construct a vertical bar graph and a horizontal bar graph for the data.
Exercise 9. Calories of Nuts The data show the number of calories per ounce in selected
types of nuts. Construct vertical and horizontal bar graphs for the data.
Types Calories
Peanuts 160
Almonds 170
Macadamia 200
Pecans 190
Cashews 160
Solution :
Construct vertical and horizontal bar graphs for the data.
Exercise 10. Space Launches The data show the number of U.S. space launches for the
10-year periods from 1960 to 2009. Construct a time series graph for the data and analyze
the graph.
Year 60 − 69 70 − 79 80 − 89 90 − 99 100 − 109
Launches 614 247 199 300 206
Solution :
We have,
Year Launches
60 − 69 614
70 − 79 247
80 − 89 199
90 − 99 300
100 − 109 206
Construct a time series graph for the data and analyze the graph.
The graph shows that the number of U.S. space launches decreased from the 1960s (614)
through the 1980s (199), increased in the 1990s (300), and then decreased again in the
2000s (206).
Exercise 11. High School Dropout Rate The data show the high school dropout rate for
students for the years 2003 to 2009 . Construct a time series graph and analyze the graph.
Solution :
We have,
Year Percent
2003 9.9
2004 10.3
2005 9.4
2006 9.3
2007 8.7
2008 8
2009 8.1
Construct a time series graph and analyze the graph.
The graph shows that the high school dropout rate increased from 9.9% in 2003 to 10.3% in
2004, decreased steadily from 2004 to 8% in 2008, and then increased slightly to 8.1% in 2009.
Exercise 12. Spending of College Freshmen The average amounts spent by college freshmen
for school items are shown. Construct a pie graph for the data.
Electronics/computers $728
Dorm items $344
Clothing $141
Shoes $72
Solution :
The total amount is $1285. To construct the pie graph, find the percentage of the total
for each category.
• Note: for example, for Electronics/computers the percentage is (728 × 100)/1285 ≈ 56.7%.
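The percentage for every slice follows the same formula. A minimal sketch, with names chosen here for illustration, that also converts each share to a central angle in degrees (share × 360) for drawing the pie graph:

```python
# Average amounts spent by college freshmen (Exercise 12), in dollars.
spending = {
    "Electronics/computers": 728,
    "Dorm items": 344,
    "Clothing": 141,
    "Shoes": 72,
}

total = sum(spending.values())

# Percentage of the total and central angle of the pie slice for each item.
percent = {item: 100 * amount / total for item, amount in spending.items()}
degrees = {item: 360 * amount / total for item, amount in spending.items()}

for item in spending:
    print(f"{item:<22} {percent[item]:5.1f}%  {degrees[item]:6.1f} degrees")
```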
Exercise 13. Career Changes A survey asked if people would like to spend the rest of their
careers with their present employers. The results are shown. Construct a pie graph for the
data and analyze the results.
Solution :
We have
Answer Number of people
Yes 660
No 260
Undecided 80
Construct a pie graph for the data and analyze the results.
Of the 1000 people surveyed, 66% answered yes, 26% answered no, and 8% were undecided.
About two-thirds of the respondents would like to spend the rest of their careers with their
present employers.
Exercise 14. Peyton Manning’s Colts Career Peyton Manning played for the Indianapolis
Colts for 14 years. (He did not play in 2011.) The data show the number of touch-downs he
scored for the years 1998-2010. Construct a dotplot for the data and comment on the graph.
26 33 27 49 31 27 33
26 26 29 28 31 33
Solution :
Construct a dotplot for the data and comment on the graph.
The graph shows that the most frequent values are 26 and 33 touchdowns, each occurring
3 times. The values 28, 29, and 49 each occur only once, and 49 lies far from the rest of the
data.
Exercise 15. Songs on CDs The data show the number of songs on each of 40CDs from the
author’s collection. Construct a dotplot for the data and comment on the graph.
10 14 18 11
11 15 16 10
10 17 10 15
22 9 14 12
18 12 12 15
21 22 20 15
10 19 20 21
17 9 13 15
11 12 12 9
14 20 12 10
Solution :
Construct a dotplot for the data and comment on the graph.
The graph shows that the most common numbers of songs are 10 and 12 (6 CDs each), followed
by 15 (5 CDs). The least common are 13, 16, and 19 songs (1 CD each). The data range from
9 to 22 songs.
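The counts behind the dotplot can be checked with a small text-based sketch. It is not the plotting method used in the assignment, only a quick way to see the stacks of dots.

```python
from collections import Counter

# Number of songs on each of the 40 CDs (Exercise 15), typed row by row.
songs = [
    10, 14, 18, 11,
    11, 15, 16, 10,
    10, 17, 10, 15,
    22, 9, 14, 12,
    18, 12, 12, 15,
    21, 22, 20, 15,
    10, 19, 20, 21,
    17, 9, 13, 15,
    11, 12, 12, 9,
    14, 20, 12, 10,
]

counts = Counter(songs)

# One row per value, one dot per CD with that many songs.
for value in range(min(songs), max(songs) + 1):
    print(f"{value:>2} | {'*' * counts[value]}")
```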
Exercise 16. The traffic situation in X-City is getting worse, and it is high time a solution
was offered. The company hired to work on the project took a survey of the estimated
amount of vehicles that move on the road daily and for various intervals. The result of this
survey is illustrated in the table below.
Time Cars Buses Bikes
1 − 2pm 37 45 42
2 − 3pm 44 34 26
3 − 4pm 23 39 27
4 − 5pm 29 41 48
Construct a multiple line graph to visualize the data. Hence, determine the vehicle with the
highest frequency and that with the lowest frequency.
Solution :
Determine the vehicle with the highest frequency and that with the lowest frequency.
Comparing the hourly counts, the highest single count is for Bikes (48, at 4-5 pm) and the
lowest is for Cars (23, at 3-4 pm). Over the whole afternoon, Buses have the highest total
(159) and Cars the lowest (133); Bikes total 143.
Exercise 17. Draw a multiple bar graph for the following data, which represent agricultural
production for the period from 2010-2013.
Solution :
Draw a multiple bar graph for the following data, which represent agricultural production
for the period from 2010-2013.
Exercise 18. The heights (in cm ) of a sample of the students in a class are shown:
50 52 70 72 65 52 60
75 51 64 65 55 67 70
Find the mean, mode, median, inter quartile range, midrange, variance, and standard devi-
ation for the data.
Solution :
Find the mean, mode, median, inter quartile range, midrange, variance, and standard devi-
ation for the data.
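Sorted data: 50, 51, 52, 52, 55, 60, 64, 65, 65, 67, 70, 70, 72, 75
• Mean: \( \bar{x} = \frac{\sum x}{n} \), with \( n = 14 \) and \( \sum x = 868 \) (sum of all data) → \( \bar{x} = \frac{868}{14} = 62 \)
• Mode = 52, 65, 70 (each appears twice, the highest frequency)
• Median: \( MD = \frac{X_7 + X_8}{2} = \frac{64 + 65}{2} = 64.5 \)
• Interquartile range:
• for \( p = 25 \Rightarrow C = \frac{n \times p}{100} = \frac{14 \times 25}{100} = 3.5 \Rightarrow Q_1 = \frac{X_4 + X_5}{2} = \frac{52 + 55}{2} = 53.5 \)
• for \( p = 75 \Rightarrow C = \frac{n \times p}{100} = \frac{14 \times 75}{100} = 10.5 \Rightarrow Q_3 = \frac{X_{11} + X_{12}}{2} = \frac{70 + 70}{2} = 70 \)
\( \Rightarrow IQR = Q_3 - Q_1 = 70 - 53.5 = 16.5 \)
• Midrange: \( MR = \frac{\text{lowest value} + \text{highest value}}{2} = \frac{50 + 75}{2} = 62.5 \)
• Variance: \( s^2 = \frac{\sum (X - \bar{X})^2}{n - 1} = \frac{962}{13} = 74 \) (checked with Excel)
• Standard deviation: \( s = \sqrt{s^2} = \sqrt{74} = 8.60233 \)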
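As a cross-check of the hand computation, the same measures can be obtained with Python's standard `statistics` module. This is a sketch, not the Excel check used in the assignment; quartiles are left out because software packages use several different quartile conventions that need not match the one applied above.

```python
import statistics as st

# Heights (in cm) from Exercise 18.
heights = [50, 52, 70, 72, 65, 52, 60, 75, 51, 64, 65, 55, 67, 70]

mean = st.mean(heights)
median = st.median(heights)
modes = sorted(st.multimode(heights))            # all values tied for the highest frequency
midrange = (min(heights) + max(heights)) / 2
variance = st.variance(heights)                  # sample variance, divisor n - 1
std_dev = st.stdev(heights)                      # sample standard deviation

print(mean, median, modes, midrange, variance, round(std_dev, 5))
```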
Exercise 19. Households of Four Television Networks A survey showed the number of
viewers and number of households of four television networks. Find the average number of
viewers, using the weighted mean.
Solution :
Find the average number of viewers, using the weighted mean.
Averages of number viewers:
\( \bar{X} = \frac{\sum wx}{\sum w} = \frac{(1.4 \times 1.6) + (0.8 \times 0.8) + (0.3 \times 0.4) + (1.6 \times 1.8)}{1.4 + 0.8 + 0.3 + 1.6} = \frac{5.88}{4.1} \approx 1.43 \)
Exercise 20. Magazines in Bookstores A survey of bookstores showed that the average
number of magazines carried is 56 , with a standard deviation of 12 . The same survey
showed that the average length of time each store had been in business was 6 years, with
a standard deviation of 2.5 years. Which is more variable, the number of magazines or the
number of years?
Solution :
The average number of magazines carried is x̄ = 56, with standard deviation s = 12.
The average length of time in business is x̄ = 6 years, with standard deviation s = 2.5 years.
The coefficient of variation is \( CVar = \frac{s}{\bar{x}} \times 100\% \):
             x̄     s      CVar
Magazines    56    12     21.43%
Time         6     2.5    41.67%
Therefore, the time in business is more variable, because its coefficient of variation (41.67%)
is higher than that of the number of magazines (21.43%).
Exercise 21. Average Earnings of Workers The average earnings of year-round full-time
workers 25-34 years old with a bachelor's degree or higher were $58,500 in 2003. If the
standard deviation is $11,200, what can you say about the percentage of these workers who
earn
(a) Between $47,300 and $69,700?
(b) More than $80,900?
(c) How likely is it that someone earns more than $100,000?
Solution :
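(a) Between $47,300 and $69,700?
We have \( \bar{X} = 58500 \) and \( s = 11200 \) (in dollars). Then
\( \mu - ks = 47300 \) (1)
\( \mu + ks = 69700 \) (2)
(1): \( k = \frac{\mu - 47300}{s} = \frac{58500 - 47300}{11200} = 1 \)
Thus, Chebyshev's theorem is not applicable: it requires \( k > 1 \), and for \( k = 1 \) the bound
\( 1 - \frac{1}{k^2} = 0 \) gives no information.
(b) More than $80,900?
• Step 1
We need to find \( P(X > 80900) \). Using the z-score (this step assumes that earnings are
approximately normally distributed),
\( z_0 = \frac{x_0 - \mu}{\sigma} = \frac{80900 - 58500}{11200} = 2 \)
\( \Rightarrow P(Z > 2) = 1 - P(Z < 2) = 1 - 0.9772 \)
\( \Rightarrow P(X > 80900) = 0.0228 \)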
• Step 2
Exercise 22. Costs to Train Employees For a certain type of job, it costs a company an
average of $231 to train an employee to perform the task. The standard deviation is $5.
Find the minimum percentage of data values that will fall in the range of $219 to $243. Use
Chebyshev’s theorem.
Solution :
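• Step 1
\( \mu - k\sigma = 219 \Leftrightarrow 231 - 5k = 219 \)
\( \mu + k\sigma = 243 \Leftrightarrow 231 + 5k = 243 \)
Both equations give \( k = \frac{12}{5} = 2.4 \).
By Chebyshev's theorem: \( P(|X - \mu| < k\sigma) \geq 1 - \frac{1}{k^2} = 1 - \frac{1}{(2.4)^2} \approx 0.83 = 83\% \)
• Step 2
We have mean = $231, standard deviation = $5, and the formula \( 1 - \frac{1}{k^2} \).
First, we need to find out how many standard deviations $219 and $243 are from the mean
of $231. Subtracting shows which, if either, of the two bounds is closer to the mean:
231 − 219 = 12 and 243 − 231 = 12, so both bounds lie 12/5 = 2.4 standard deviations from
the mean.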
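The two steps can be condensed into a small helper. This is a sketch under the assumption that, when the two bounds are not equally far from the mean, the closer bound determines k (which gives a conservative minimum); the function name is chosen here for illustration.

```python
def chebyshev_lower_bound(mean, std_dev, low, high):
    """Return (k, minimum fraction of values in [low, high]) by Chebyshev's theorem."""
    # Use the bound closer to the mean so the stated minimum still holds.
    k = min(mean - low, high - mean) / std_dev
    if k <= 1:
        raise ValueError("Chebyshev's theorem gives no information unless k > 1")
    return k, 1 - 1 / k**2


k, bound = chebyshev_lower_bound(mean=231, std_dev=5, low=219, high=243)
print(f"k = {k:.1f}, at least {bound:.1%} of the values")
```

For k = 1 (as in Exercise 21(a)) the helper raises an error, which mirrors the remark that the theorem is not applicable there.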
Exercise 23. Exam Completion Time The mean time it takes a group of students to com-
plete a statistics final exam is 44 minutes, and the standard deviation is 9 minutes. Within
what limits would you expect approximately 95% of the students to complete the exam?
Assume the variable is approximately normally distributed.
Solution :
From the given information, we have: µ = 44, σ = 9
For approximately 95% of a normal distribution we take k = 2 (empirical rule).
µ − kσ = 44 − 2(9) = 26
µ + kσ = 44 + 2(9) = 62
Therefore, approximately 95% of the students are expected to complete the exam in between
26 and 62 minutes.
Exercise 24. Exam Grades Which of these exam grades has a better relative position?
(a) A grade of 82 on a test with x̄ = 85 and s = 6.
(b) A grade of 56 on a test with x̄ = 60 and s = 5.
Solution :
(a) A grade of 82 on a test with x̄ = 85 and s = 6.
⇒ the z-score for a grade of 82 is \( z_1 = \frac{82 - 85}{6} = -0.5 \)
(b) A grade of 56 on a test with x̄ = 60 and s = 5.
⇒ the z-score for a grade of 56 is \( z_2 = \frac{56 - 60}{5} = -0.8 \)
In conclusion, \( z_1 > z_2 \).
Hence, a grade of 82 has a better relative position than a grade of 56.
Exercise 25. Check each data set for outliers.
(a) 506, 511, 517, 514, 400, 521
(b) 3, 7, 9, 6, 8, 10, 14, 16, 20, 12
Solution :
(a) 506, 511, 517, 514, 400, 521
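Sorted data: 400, 506, 511, 514, 517, 521
• \( p = 25 \Rightarrow C = \frac{n \times p}{100} = \frac{6 \times 25}{100} = 1.5 \Rightarrow Q_1 = \frac{X_2 + X_3}{2} = \frac{506 + 511}{2} = 508.5 \)
• \( p = 75 \Rightarrow C = \frac{n \times p}{100} = \frac{6 \times 75}{100} = 4.5 \Rightarrow Q_3 = \frac{X_5 + X_6}{2} = \frac{517 + 521}{2} = 519 \)
• Find the interquartile range: IQR = Q3 − Q1 = 519 − 508.5 = 10.5
• Multiply the IQR by 1.5: IQR × 1.5 = 10.5 × 1.5 = 15.75
• Subtract the value obtained in the previous step from Q1, and add it to Q3:
508.5 − 15.75 = 492.75 and 519 + 15.75 = 534.75.
Check the data set for any data value that is smaller than Q1 − 1.5(IQR) or larger than
Q3 + 1.5(IQR): since 400 < 492.75, the value 400 is an outlier.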
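The fence check can be written as a short function. To stay with the quartile convention used in this solution, the sketch takes Q1 and Q3 as inputs rather than computing them; the function name is illustrative.

```python
def find_outliers(data, q1, q3):
    """Return (lower fence, upper fence, outliers) using the 1.5 * IQR rule."""
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    outliers = [x for x in data if x < low_fence or x > high_fence]
    return low_fence, high_fence, outliers


# Data set (a) with the quartiles found in the solution.
low, high, outliers = find_outliers([506, 511, 517, 514, 400, 521], q1=508.5, q3=519)
print(low, high, outliers)
```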
Exercise 27. The following sample data are the midterm examination test scores for 30
students:
55 60 91 85 60 70 89 99 59 67
72 82 60 68 57 74 64 70 68 91
89 90 83 40 79 85 71 80 76 81
a. Find the mean, mode, median, variance, standard deviation, Q1 , and Q3 of the data.
b. Construct a frequency table with 5 classes.
c. Using the grouped data formula, find the mean, mode, median, variance, standard devi-
ation, Q1 , and Q3 for the table in part (b) and compare it to the results in part (a).
d. Construct a histogram and comment on the shape of the distribution.
e. Find the percentile values of 55,60 , and 74 .
Solution :
a. Find the mean, mode, median, variance, standard deviation, Q1 , and Q3 of the data.
We have: MAX = 99, MIN = 40, I = 12
• Mean: \( \bar{X} = \frac{\sum x}{n} \), with \( \sum x = 2215 \) and \( n = 30 \) ⇒ \( \bar{X} = \frac{2215}{30} = 73.83 \)
• Mode = 60 (it appears 3 times, more often than any other data value)
• Median: \( MD = \frac{X_{15} + X_{16}}{2} = \frac{72 + 74}{2} = 73 \)
• \( Q_1 \): \( C = \frac{n \times p}{100} = \frac{30 \times 25}{100} = 7.5 \approx 8 \), so \( Q_1 = \frac{X_8 + X_9}{2} = \frac{64 + 67}{2} = 65.5 \)
• \( Q_3 \): \( C = \frac{n \times p}{100} = \frac{30 \times 75}{100} = 22.5 \approx 23 \), so \( Q_3 = \frac{X_{23} + X_{24}}{2} = \frac{85 + 85}{2} = 85 \)
• Variance: \( s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} = \frac{5294.17}{30 - 1} = 182.56 \)
• Standard deviation: \( s = \sqrt{s^2} = \sqrt{182.56} = 13.51 \)
Column1
Mean 73.83
Standard Error 2.46
Median 73
Mode 60
Standard Deviation 13.51
Variance 182.55
lower limit upper limit lower boundary upper boundary Midpoint frequency
40 51 39.5 51.5 45.5 1
52 63 51.5 63.5 57.5 6
64 75 63.5 75.5 69.5 9
76 87 75.5 87.5 81.5 8
88 99 87.5 99.5 93.5 6
30
Class limits    Frequency (f)    Midpoint (Xm)    f·Xm     f·Xm²
40 − 51         1                45.5             45.5     2070.25
52 − 63         6                57.5             345      19837.5
64 − 75         9                69.5             625.5    43472.25
76 − 87         8                81.5             652      53138
88 − 99         6                93.5             561      52453.5
Total           30                                2229     170971.5
• Mean: \( \bar{X} = \frac{\sum f \cdot X_m}{n} = \frac{2229}{30} = 74.3 \)
• Mode = modal class = the class with the largest frequency.
• The modal class is 64 − 75 (class limits) or 63.5 − 75.5 (class boundaries).
• Median: \( MD = L_m + \frac{w}{f}(0.5n - cf) \)
• \( L_m = 63.5, \; w = 12, \; f = 9, \; n = 30, \; cf = 7 \)
• \( MD = 63.5 + \frac{12}{9}(0.5 \times 30 - 7) = 74.16 \)
• Variance: \( s^2 = \frac{n \left( \sum f \cdot X_m^2 \right) - \left( \sum f \cdot X_m \right)^2}{n(n - 1)} = \frac{30(170971.5) - (2229)^2}{30(30 - 1)} = 184.71 \)
• Standard deviation: \( s = \sqrt{s^2} = \sqrt{184.71} = 13.59 \)
• For \( p = 25 \): \( C = \frac{n \times p}{100} = \frac{30 \times 25}{100} = 7.5 \approx 8 \)
This gives Q1 = 65.5 and Q3 = 85, so the quartiles obtained from the grouped data and from
part (a) are similar.
Therefore, from the grouped data we get: Mean X̄ = 74.3, Mode = 63.5 (lower boundary of the
modal class), MD = 74.16, s² = 184.71, s = 13.59, Q1 = 65.5, Q3 = 85
d. Construct a histogram and comment on the shape of the distribution.
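Comment: The histogram has a single peak in the class 63.5 − 75.5 (frequency 9), with a
longer tail toward the low scores, so the distribution is slightly left-skewed. Two classes
(51.5 − 63.5 and 87.5 − 99.5) occur with the same frequency of 6.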
e. Find the percentile values of 55,60 , and 74 .
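• For x = 60, 4 scores lie below 60, so the percentile = \( \frac{4 + 0.5}{30} \times 100 \) = 15th percentile.
Hence, a student whose score was 60 did better than 15% of the class.
• For x = 74, 15 scores lie below 74, so the percentile = \( \frac{15 + 0.5}{30} \times 100 \) ≈ 51.6th percentile.
Hence, a student whose score was 74 did better than 51.6% of the class.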
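The percentile formula used in part (e), (number of values below x + 0.5) / n × 100, can be sketched as a function and applied to all three requested scores. The function name is chosen here for illustration.

```python
# Midterm scores from Exercise 27.
scores = [
    55, 60, 91, 85, 60, 70, 89, 99, 59, 67,
    72, 82, 60, 68, 57, 74, 64, 70, 68, 91,
    89, 90, 83, 40, 79, 85, 71, 80, 76, 81,
]


def percentile_rank(data, x):
    """Percentile of x: (number of values below x + 0.5) / n * 100."""
    below = sum(value < x for value in data)
    return (below + 0.5) / len(data) * 100


for x in (55, 60, 74):
    print(f"score {x}: percentile {percentile_rank(scores, x):.1f}")
```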
6.3 2.9 4.5 1.1 1.8 4.0 1.2 3.1 2.0 4.0
7.0 2.8 4.3 5.3 2.9 8.3 4.4 2.8 3.1 5.6
4.5 4.5 5.7 0.5 6.2 3.7 0.9 2.4 3.0 3.5
(a) Find the mean, mode, median, variance, standard deviation, Q1 , Q3 , and 90 th percentile.
(b) Construct a frequency table with 5 classes.
(c) Using the grouped data formula, find the mean, mode, median, variance, standard devia-
tion, Q1 , Q3 and 90 th percentile for the frequency table constructed in part (b) and compare
it to the results in part (a).
(d) Construct a histogram, and comment on the shape of the data.
Solution :
(a) Find the mean, mode, median, variance, standard deviation, Q1 , Q3 , and 90 th percentile.
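• Mean: \( \bar{X} = \frac{\sum x}{n} = 3.74 \)
• Mode = 4.5 (it appears 3 times, more often than any other data value)
• Median = \( \frac{X_{15} + X_{16}}{2} = \frac{3.5 + 3.7}{2} = 3.6 \)
• Variance: \( s^2 = \frac{\sum (X - \bar{X})^2}{n - 1} \)
With sample mean \( \bar{X} = 3.74 \) and \( n = 30 \), we get \( s^2 = \frac{101.55}{29} = 3.50 \)
• Standard deviation: \( s = \sqrt{s^2} = 1.87 \)
• Find Q1 and Q3
• For \( p = 25 \): \( C = \frac{n \times p}{100} = \frac{30 \times 25}{100} = 7.5 \Rightarrow Q_1 = \frac{X_8 + X_9}{2} = \frac{2.8 + 2.8}{2} = 2.8 \)
• For \( p = 75 \): \( C = \frac{n \times p}{100} = \frac{30 \times 75}{100} = 22.5 \Rightarrow Q_3 = \frac{X_{23} + X_{24}}{2} = \frac{4.5 + 5.3}{2} = 4.9 \)
Therefore, Q1 = 2.8, Q3 = 4.9
• Find the value corresponding to the 90th percentile.
• For \( p = 90 \): \( C = \frac{n \times p}{100} = \frac{30 \times 90}{100} = 27 \)
Hence, the data value corresponding to the 90th percentile is \( \frac{X_{27} + X_{28}}{2} = \frac{6.2 + 6.3}{2} = 6.25 \)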
lower limit upper limit lower boundary upper boundary Midpoint frequency
0.5 2.05 0 2.55 1.275 6
2.06 3.61 1.56 4.11 2.835 9
3.62 5.17 3.12 5.67 4.395 8
5.18 6.73 4.68 7.23 5.955 5
6.74 8.29 6.24 8.79 7.515 1
29
Exercise 29. In recent years, due to low interest rates, many homeowners refinanced their
home mortgages. Linda Lahey is a mortgage officer at Down River Federal Savings and
Loan. Below is the amount refinanced for 20 loans she processed last week. The data are
reported in thousands of dollars and arranged from smallest to largest.
59.2 59.5 61.6 65.5 66.6 72.9 74.8 77.3 79.2 83.7
85.6 85.8 86.6 87.0 87.1 90.2 93.3 98.6 100.2 100.7
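For p = 26: \( C = \frac{n \times p}{100} = \frac{20 \times 26}{100} = 5.2 \)
For p = 83: \( C = \frac{n \times p}{100} = \frac{20 \times 83}{100} = 16.6 \)
Thus, the value corresponding to the 83rd percentile is \( L_{83} = x_{17} = 93.3 \)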
c. Draw a box plot of the data and comment on the shape of the distribution.
Exercise 30. Hours Worked The data shown here represent the number of hours that 12
part-time employees at a toy store worked during the weeks before and after Christmas.
Construct two boxplots and compare the distributions.
Before 38 16 18 24 12 30 35 32 31 30 24 35
After 26 15 12 18 24 32 14 18 16 18 22 12
Solution :
Construct two boxplots and compare the distributions.
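A boxplot is drawn from the five-number summary (minimum, Q1, median, Q3, maximum). The sketch below computes it for both weeks, assuming the convention that Q1 and Q3 are the medians of the lower and upper halves of the sorted data; other quartile conventions may give slightly different values.

```python
import statistics as st


def five_number_summary(data):
    """Return (min, Q1, median, Q3, max), with quartiles as medians of the two halves."""
    ordered = sorted(data)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    # For an odd sample size, the middle value is left out of both halves.
    upper = ordered[half:] if n % 2 == 0 else ordered[half + 1:]
    return (ordered[0], st.median(lower), st.median(ordered), st.median(upper), ordered[-1])


before = [38, 16, 18, 24, 12, 30, 35, 32, 31, 30, 24, 35]
after = [26, 15, 12, 18, 24, 32, 14, 18, 16, 18, 22, 12]

print("Before:", five_number_summary(before))
print("After: ", five_number_summary(after))
```

Under this convention the median falls from 30 hours before Christmas to 18 hours after, and the whole box shifts downward, so the employees generally worked more hours in the week before Christmas.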
Exercise 31. Many times in statistics it is necessary to see if a set of data values is approx-
imately normally distributed. There are special techniques that can be used. One technique
is to draw a histogram for the data and see if it is approximately bell-shaped. (Note: It does
not have to be exactly symmetric to be bell-shaped.) The numbers of branches of the 50 top
libraries are shown.
67 84 80 77 97 59 62 37 33 42
36 54 18 12 19 33 49 24 25 22
24 29 9 21 21 24 31 17 15 21
13 19 19 22 22 30 41 22 18 20
26 33 14 14 16 22 26 10 16 24
\( 2^k \geq 50 \)
\( 2^6 = 64 \geq 50 \Rightarrow k = 6 \)
We have MAX = 97, MIN = 9, K = 6, \( i = \frac{\max - \min}{k} = \frac{97 - 9}{6} \approx 14.67 \), rounded up to 15
lower limit upper limit lower boundary upper boundary Midpoint frequency
9 23 8.5 23.5 16 24
24 38 23.5 38.5 31 15
39 53 38.5 53.5 46 3
54 68 53.5 68.5 61 4
69 83 68.5 83.5 76 2
84 98 83.5 98.5 91 2
50
Classes    Boundaries     Midpoint Xm    Frequency f    f·Xm    f·Xm²
9 − 23     8.5 − 23.5     16             24             384     6144
24 − 38    23.5 − 38.5    31             15             465     14415
39 − 53    38.5 − 53.5    46             3              138     6348
54 − 68    53.5 − 68.5    61             4              244     14884
69 − 83    68.5 − 83.5    76             2              152     11552
84 − 98    83.5 − 98.5    91             2              182     16562
Sum                                      50             1565    69905
2. Construct a histogram for the data.
6. What percent of the data values fall within 1 standard deviation of the mean?
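With x̄ = 31.3 and s ≈ 20.66 from the grouped table, the interval x̄ ± s is 10.64 to 51.96,
and 40 of the 50 data values fall inside it:
\( P(10.64 < X < 51.96) = \frac{\#\text{ data values in the interval}}{\#\text{ all data}} \times 100 = \frac{40}{50} \times 100 = 80\% \)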
7. What percent of the data values fall within 2 standard deviations of the mean?
Of the 50 data values, 46 lie between −10.02 and 72.62. Therefore, the percent of data
values that fall within 2 standard deviations of the mean is
\( P(-10.02 < X < 72.62) = \frac{\#\text{ data values in the interval}}{\#\text{ all data}} \times 100 = \frac{46}{50} \times 100 = 92\% \)
8. What percent of the data values fall within 3 standard deviations of the mean?
Of the 50 data values, 49 lie between −30.68 and 93.28. Therefore, the percent of data
values that fall within 3 standard deviations of the mean is
\( P(-30.68 < X < 93.28) = \frac{\#\text{ data values in the interval}}{\#\text{ all data}} \times 100 = \frac{49}{50} \times 100 = 98\% \)
9. Does your answer help support the conclusion you reached in question 4? Explain.
The answers to questions 6, 7, and 8 (80%, 92%, and 98%) do not match the approximately
68%, 95%, and 99.7% expected for a normal distribution. This means that the distribution is
not approximately normal.
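The counts in questions 6 to 8 can be reproduced with a short sketch. It assumes, as the interval endpoints above suggest, that the mean and standard deviation come from the grouped-data table (class midpoints and frequencies) and that the raw data values are then counted against mean ± k·s.

```python
from math import sqrt

# Numbers of branches of the 50 top libraries (Exercise 31).
branches = [
    67, 84, 80, 77, 97, 59, 62, 37, 33, 42,
    36, 54, 18, 12, 19, 33, 49, 24, 25, 22,
    24, 29, 9, 21, 21, 24, 31, 17, 15, 21,
    13, 19, 19, 22, 22, 30, 41, 22, 18, 20,
    26, 33, 14, 14, 16, 22, 26, 10, 16, 24,
]

# Grouped-data table: class midpoints and frequencies.
midpoints = [16, 31, 46, 61, 76, 91]
freqs = [24, 15, 3, 4, 2, 2]

n = sum(freqs)
sum_fx = sum(f * x for f, x in zip(freqs, midpoints))
sum_fx2 = sum(f * x * x for f, x in zip(freqs, midpoints))

mean = sum_fx / n
s = sqrt((n * sum_fx2 - sum_fx**2) / (n * (n - 1)))  # grouped sample standard deviation

# Percent of raw data values strictly inside mean - k*s and mean + k*s.
pct_within = {}
for k in (1, 2, 3):
    lo, hi = mean - k * s, mean + k * s
    inside = sum(lo < x < hi for x in branches)
    pct_within[k] = 100 * inside / n
    print(f"within {k} s: ({lo:.2f}, {hi:.2f}) -> {pct_within[k]:.0f}%")
```

Comparing these percentages with the 68%, 95%, and 99.7% of the empirical rule is one informal check of normality.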
(a) Calculate a point estimate of the mean pull-off force of all connectors in the population.
State which estimator you used and why.
(b) Calculate a point estimate of the pull-off force value that separates the weakest 50% of
the connectors in the population from the strongest 50%.
(c) Calculate point estimates of the population variance and the population standard devi-
ation.
(d) Calculate the standard error of the point estimate found in part (a). Interpret the stan-
dard error.
(e) Calculate a point estimate of the proportion of all connectors in the population whose
pull-off force is less than 73 pounds
Solution :
(a) Calculate a point estimate of the mean pull-off.
\( \hat{\theta} = \bar{X} = \frac{1}{26} \sum_{i=1}^{26} x_i = 75.61538 \)
(e) Calculate a point estimate of the proportion of all connectors in the population whose
pull-off force is less than 73 pounds.
\( \hat{p} = \frac{x}{n} \), where x is the number of connectors whose force is less than 73 pounds.
Then, \( \hat{p} = \frac{1}{26} = 0.03846 \)
Therefore, \( \hat{p} = 0.03846 \)
Exercise 2. (a) A random sample of 10 houses in a particular area, each of which is heated
with natural gas, is selected and the amount of gas (therms) used during the month of
January is determined for each house. The resulting observations are
Let µ denote the average gas usage during January by all houses in this area. Compute a
point estimate of µ
(b) Suppose there are 10,000 houses in this area that use natural gas for heating. Let t
denote the total amount of gas used by all of these houses during January. Estimate t using
the data of part (a). What estimator did you use in computing your estimate?
(c) Use the data in part (a) to estimate p, the proportion of all houses that used at least 100
therms.
(d) Give a point estimate of the population median usage (the middle value in the population
of all houses) based on the sample of part (a). What estimator did you use?
Solution :
(a) Compute a point estimate of µ
\( \hat{\mu} = \bar{x} = \frac{\sum_{i=1}^{10} x_i}{10} = 120.7 \)
Therefore, µ̂ = 120.7
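(b) Estimate t using the data of part (a).
Note that t is the total amount of gas used by all 10,000 houses during January.
We use the estimator \( \hat{t} = N\bar{X} \) with N = 10,000, that is, the point estimate of the average
gas usage from the 10 sampled houses multiplied by the number of houses in the area:
\( \hat{t} = 10000 \times 120.7 = 1207000 \) therms.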
(c) Use the data in part (a) to estimate p.
\( \hat{p} = \frac{x}{n} = \frac{8}{10} = 0.8 \)
Therefore, p̂ = 0.8
(d) Give a point estimate of the population median usage.
Using the sample median as the estimator: \( \hat{\tilde{\mu}} = MD = \frac{x_5 + x_6}{2} = 120 \)
Therefore, \( \hat{\tilde{\mu}} = 120 \)
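and compute \( E(S^2) \).
Solution :
Show that \( S^2 = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2 \) is an unbiased estimator of \( \sigma^2 \).
We have
\[ S^2 = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2 = \frac{1}{n-1} \left( \sum_{i=1}^{n} X_i^2 - n\bar{X}^2 \right) \]
\[ E(S^2) = \frac{1}{n-1} E\left[ \sum_{i=1}^{n} X_i^2 - n\bar{X}^2 \right] \]
\[ E(S^2) = \frac{1}{n-1} \sum_{i=1}^{n} E(X_i^2) - \frac{1}{n(n-1)} E\left[ \left( \sum_{i=1}^{n} X_i \right)^2 \right] \]
We have \( E(X_i^2) = \sigma^2 + \mu^2 \) and \( E\left[ \left( \sum_{i=1}^{n} X_i \right)^2 \right] = n\sigma^2 + \left[ E\left( \sum_{i=1}^{n} X_i \right) \right]^2 \),
so we obtain \( E\left[ \left( \sum_{i=1}^{n} X_i \right)^2 \right] = n\sigma^2 + (n\mu)^2 \).
Thus, \( E(S^2) = \frac{1}{n-1} \left( n\sigma^2 + n\mu^2 - \sigma^2 - n\mu^2 \right) = \sigma^2 \)
Therefore, \( S^2 \) is an unbiased estimator of \( \sigma^2 \).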
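The result can be illustrated exactly on a tiny example. The sketch below uses a hypothetical population {1, 2, 3} with equal probabilities (not data from the assignment), enumerates every possible sample of size 2 drawn with replacement, and averages the sample variances using exact fractions.

```python
from fractions import Fraction
from itertools import product

# Hypothetical population: each value is equally likely on every draw.
population = [1, 2, 3]
n = 2  # sample size

mu = Fraction(sum(population), len(population))
sigma2 = sum((x - mu) ** 2 for x in population) / len(population)  # population variance


def sample_variance(sample):
    """Sample variance S^2 with divisor n - 1, computed exactly."""
    m = Fraction(sum(sample), len(sample))
    return sum((x - m) ** 2 for x in sample) / (len(sample) - 1)


# All equally likely samples of size n (independent draws with replacement).
samples = list(product(population, repeat=n))
expected_s2 = sum(sample_variance(s) for s in samples) / len(samples)

print("sigma^2 =", sigma2, " E(S^2) =", expected_s2)
```

Because every sample is equally likely, the plain average of S² over all samples is E(S²), and it coincides with σ².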
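Define \( \hat{X} = \sum_{i=1}^{n} a_i X_i \).
Since \( X_1, \ldots, X_n \sim N(\mu, \sigma^2) \), unbiasedness requires \( E(\hat{X}) = \mu \sum_{i=1}^{n} a_i = \mu \), so \( \sum_{i=1}^{n} a_i = 1 \).
By the Cauchy-Schwarz inequality:
\( (a_1 b_1 + \ldots + a_n b_n)^2 \leq (a_1^2 + \ldots + a_n^2)(b_1^2 + \ldots + b_n^2) \) for \( a_i, b_i \) positive, \( i = 1, \ldots, n \).
Take \( b_1 = \ldots = b_n = 1 \); then \( 1 = (a_1 + \ldots + a_n)^2 \leq n(a_1^2 + \ldots + a_n^2) \), that is, \( a_1^2 + \ldots + a_n^2 \geq \frac{1}{n} \).
Since the \( X_i \) are independent, \( V(\hat{X}) = \sigma^2 \sum_{i=1}^{n} a_i^2 \geq \frac{\sigma^2}{n} = V(\bar{X}) \).
Therefore, \( V(\bar{X}) \leq V(\hat{X}) \)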
Solution :
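Show that \( \bar{X} \) and Y are both unbiased estimators of µ.
We have \( E(\bar{X}) = \frac{1}{n} \sum_{i=1}^{n} E(X_i) = \mu \) and
\[ E(Y) = E\left( \frac{X_1 + 2X_2 + \ldots + nX_n}{1 + 2 + 3 + \ldots + n} \right) = \frac{1}{\frac{n(n+1)}{2}} \sum_{i=1}^{n} i \, E(X_i) = \frac{\frac{n(n+1)}{2}}{\frac{n(n+1)}{2}} \mu = \mu \]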
Exercise 7. Using a long rod that has length µ, you are going to lay out a square plot in
which the length of each side is µ. Thus the area of the plot will be \( \mu^2 \). However, you do not
know the value of µ, so you decide to make n independent measurements \( X_1, X_2, \ldots, X_n \) of
the length. Assume that each \( X_i \) has mean µ (unbiased measurements) and variance \( \sigma^2 \).
(a) Show that \( \bar{X}^2 \) is not an unbiased estimator for \( \mu^2 \). [Hint: For any rv Y, \( E(Y^2) = V(Y) + [E(Y)]^2 \). Apply this with \( Y = \bar{X} \).]
(b) For what value of k is the estimator \( \bar{X}^2 - kS^2 \) unbiased for \( \mu^2 \)? [Hint: Compute
\( E(\bar{X}^2 - kS^2) \).]
Solution :
(a) Show that \( \bar{X}^2 \) is not an unbiased estimator for \( \mu^2 \).
We will show that \( E(\bar{X}^2) \neq \mu^2 \).
\( V(\bar{X}) = E(\bar{X}^2) - [E(\bar{X})]^2 \)
Thus,
\( E(\bar{X}^2) = V(\bar{X}) + [E(\bar{X})]^2 \)
Then,
\( E(\bar{X}^2) = \frac{\sigma^2}{n} + \mu^2 \neq \mu^2 \)
Therefore, \( \bar{X}^2 \) is not an unbiased estimator for \( \mu^2 \).
\( B(\hat{\theta}) = E(\hat{\theta}) - \theta = \frac{n}{n+1}\theta - \theta = -\frac{\theta}{n+1} \)
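Since the likelihood function is \( L(x; \lambda) = \frac{e^{-n\lambda} \lambda^{\sum_{i=1}^{n} x_i}}{x_1! \, x_2! \cdots x_n!} \),
\[ \ln L(x; \lambda) = -n\lambda + \ln \frac{\lambda^{\sum_{i=1}^{n} x_i}}{x_1! \, x_2! \cdots x_n!} \]
Then,
\[ \frac{\partial}{\partial \lambda} \ln L(x; \lambda) = -n + \sum_{i=1}^{n} x_i \times \lambda^{\sum_{i=1}^{n} x_i - 1} \times \frac{1}{\lambda^{\sum_{i=1}^{n} x_i}} = -n + \frac{\sum_{i=1}^{n} x_i}{\lambda} \]
• Setting \( \frac{\partial}{\partial \lambda} \ln L(x; \lambda) = 0 \iff -n + \frac{\sum_{i=1}^{n} x_i}{\lambda} = 0 \iff \lambda = \frac{1}{n} \sum_{i=1}^{n} x_i \)
Therefore, \( \hat{\lambda}_{MLE} = \bar{X} \)
(b) Find maximum likelihood estimate of λ.
We have \( \hat{\lambda} = \frac{1}{n} \sum_{i=1}^{40} x_i \), with n = 40.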
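Then,
\[ \ln L(x; \theta) = -n \ln(\theta) + \frac{1 - \theta}{\theta} \sum_{i=1}^{n} \ln(x_i) \]
\[ \frac{\partial}{\partial \theta} \ln L(x; \theta) = -\frac{n}{\theta} - \frac{1}{\theta^2} \sum_{i=1}^{n} \ln(x_i) \]
\[ \frac{\partial}{\partial \theta} \ln L(x; \theta) = 0 \iff -\frac{n}{\theta} - \frac{1}{\theta^2} \sum_{i=1}^{n} \ln(x_i) = 0 \]
Then, \( \theta = -\frac{1}{n} \sum_{i=1}^{n} \ln(x_i) \)
Therefore, \( \hat{\theta}_{MLE} = -\frac{1}{n} \sum_{i=1}^{n} \ln(x_i) \)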
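We have \( E(\ln X) = \int_0^1 \ln x \, f(x; \theta) \, dx = \frac{1}{\theta} \int_0^1 x^{\frac{1-\theta}{\theta}} \ln x \, dx \)
By changing variable, let \( u = \ln x \), so \( du = \frac{1}{x} dx \).
Then \( E(u) = \frac{1}{\theta} \int_{-\infty}^{0} u \, e^{u/\theta} \, du = -\theta \) (integrating by parts)
Therefore, \( E\left( -\frac{1}{n} \sum_{i=1}^{n} \ln(X_i) \right) = -\frac{1}{n} \sum_{i=1}^{n} E(\ln X_i) = \theta \)
Therefore, \( E(\hat{\theta}) = \theta \)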
Exercise 11. Let \( X_1, X_2, \ldots, X_n \) be a random sample of size n from the exponential
distribution whose pdf is \( f(x; \theta) = (1/\theta) e^{-x/\theta} \), \( 0 < x < \infty \), \( 0 < \theta < \infty \).
(a) Show that \( \bar{X} \) is an unbiased estimator of θ.
(b) Show that the variance of \( \bar{X} \) is \( \theta^2/n \).
(c) What is a good estimate of θ if a random sample of size 5 yielded the sample values
3.5, 8.1, 0.9, 4.4, and 0.5 ?
Solution :
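(a) Show that \( \bar{X} \) is an unbiased estimator of θ.
We have \( E(\bar{X}) = \frac{1}{n} \sum_{i=1}^{n} E(X_i) \), and since \( X_1, \ldots, X_n \sim Exp(\theta) \),
\( E(\bar{X}) = E(X) \), where \( E(X) = \theta \).
Therefore, \( \bar{X} \) is an unbiased estimator of θ.
(b) Show that the variance of \( \bar{X} \) is \( \frac{\theta^2}{n} \).
We have \( V(\bar{X}) = \frac{1}{n^2} \sum_{i=1}^{n} V(X_i) \).
Since \( X_1, \ldots, X_n \sim Exp(\theta) \), \( V(X) = \theta^2 \).
Then, \( V(\bar{X}) = \frac{1}{n} V(X) = \frac{\theta^2}{n} \)
Therefore, the variance of \( \bar{X} \) is \( \frac{\theta^2}{n} \).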
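(c) What is a good estimate of θ if a random sample of size 5 yielded the sample values
3.5, 8.1, 0.9, 4.4, and 0.5?
By the Cramér-Rao inequality, a good (minimum-variance unbiased) estimator must satisfy
\( V(\hat{\theta}) = \frac{1}{nI(\theta)} \), where \( I(\theta) = -E\left[ \frac{\partial^2}{\partial \theta^2} \ln f(x; \theta) \right] \).
We have
\[ \ln f(x; \theta) = \ln\left( \frac{1}{\theta} e^{-x/\theta} \right) = -\ln\theta - \frac{x}{\theta} \]
Then
\[ \frac{\partial}{\partial \theta} \ln f(x; \theta) = -\frac{1}{\theta} + \frac{x}{\theta^2} \]
Then
\[ \frac{\partial^2}{\partial \theta^2} \ln f(x; \theta) = \frac{1}{\theta^2} - \frac{2x}{\theta^3} \]
So,
\[ I(\theta) = -E\left[ \frac{1}{\theta^2} - \frac{2X}{\theta^3} \right] = -\frac{1}{\theta^2} + \frac{2\theta}{\theta^3} = \frac{1}{\theta^2} \]
Then, \( V(\hat{\theta}) = \frac{\theta^2}{n} \), which is the variance of \( \bar{X} \) found in part (b).
In conclusion, a good estimate of θ is \( \bar{X} \); using the given data we get
\( \bar{x} = \frac{3.5 + 8.1 + 0.9 + 4.4 + 0.5}{5} = 3.48 \)
Therefore, a good estimate is 3.48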
Exercise 12. A diagnostic test for a certain disease is applied to n individuals known to not
have the disease. Let X = the number among the n test results that are positive (indicating
presence of the disease, so X is the number of false positives) and p = the probability that
a disease-free individual’s test result is positive (i.e., p is the true proportion of test results
from disease-free individuals that are positive). Assume that only X is available rather than
the actual sequence of test results.
(a) Derive the maximum likelihood estimator of p. If n = 20 and x = 3, what is the estimate?
(b) Is the estimator of part (a) unbiased?
(c) If n = 20 and x = 3, what is the MLE of the probability (1 − p)5 that none of the next
five tests done on disease free individuals are positive?
Solution :
Let X = the number among the n test results that are positive.
p = the probability that a disease-free individual’s test result is positive.
(a) Derive the maximum likelihood estimator of p. If n = 20 and x = 3, what is the estimate?
We have $X \sim \mathrm{Bin}(n, p)$
So, $P(X = x) = C(n,x)\,p^x(1-p)^{n-x}$
and the likelihood function is $L(x;p) = C(n,x)\,p^x(1-p)^{n-x}$
Then,
$\ln L(x;p) = \ln\left(C(n,x)\,p^x(1-p)^{n-x}\right)$
$= \ln C(n,x) + x\ln p + (n-x)\ln(1-p)$
So, $\frac{\partial}{\partial p}\ln L(x;p) = \frac{x}{p} - \frac{n-x}{1-p}$
• Setting $\frac{\partial}{\partial p}\ln L(x;p) = 0 \iff \frac{x}{p} - \frac{n-x}{1-p} = 0$ gives $\hat{p} = \frac{x}{n}$
Therefore, $\hat{p} = \frac{X}{n}$
Using the given data we get $\hat{p} = \frac{3}{20} = 0.15$
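(b) Is the estimator of part (a) unbiased?
We have $\hat{p} = \frac{X}{n}$, then $E(\hat{p}) = E\left(\frac{X}{n}\right) = \frac{1}{n}E(X) = \frac{np}{n} = p$
Therefore, it is unbiased.
(c) What is the MLE of the probability $(1-p)^5$?
For n = 20 and x = 3, $\hat{p} = 0.15$. By the invariance principle, the MLE of $(1-p)^5$ is $(1-\hat{p})^5 = (1-0.15)^5 = 0.85^5 \approx 0.4437$
Therefore, $(1-\hat{p})^5 \approx 0.4437$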
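As a quick numerical check of parts (a) and (c), the following minimal sketch computes the binomial MLE and applies the invariance principle. The variable names are hypothetical.

```python
# Data from Exercise 12: n tests on disease-free individuals, x false positives.
n, x = 20, 3

# MLE of p for a binomial count is the sample proportion.
p_hat = x / n

# Invariance principle: the MLE of g(p) = (1 - p)^5 is g(p_hat).
none_of_next_five = (1 - p_hat) ** 5

print(f"p_hat = {p_hat:.2f}")
print(f"MLE of (1 - p)^5 = {none_of_next_five:.4f}")
```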
Exercise 13. The shear strength of each of ten test spot welds is determined, yielding the
following data (psi):
392 376 401 367 389 362 409 415 358 375
(a) Assuming that shear strength is normally distributed, estimate the true average shear
strength and standard deviation of shear strength using the method of maximum likelihood.
(b) Again assuming a normal distribution, estimate the strength value below which 95% of
all welds will have their strengths. [Hint: What is the 95 th percentile in terms of µ and σ
? Now use the invariance principle.]
Solution :
(a) Assuming that shear strength is normally distributed, estimate the true average shear
strength and standard deviation of shear strength using the method of maximum likelihood.
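Since $X \sim N(\mu, \sigma^2)$,
by the previous exercise we get $\hat\mu = \bar{X} = \frac{1}{10}\sum_{i=1}^{10} x_i = 384.4$
and for $X_1, \dots, X_n \sim N(\mu, \sigma^2)$ we have
$L(x;\sigma) = (2\pi\sigma^2)^{-n/2}\, e^{-\frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i-\mu)^2}$
$\ln L(x;\sigma) = -\frac{n}{2}\ln(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i-\mu)^2$
$\frac{\partial}{\partial\sigma}\ln L(x;\sigma) = -\frac{n}{\sigma} + \frac{1}{\sigma^3}\sum_{i=1}^{n}(x_i-\mu)^2$
• Setting $\frac{\partial}{\partial\sigma}\ln L(x;\sigma) = 0 \iff -\frac{n}{\sigma} + \frac{1}{\sigma^3}\sum_{i=1}^{n}(x_i-\mu)^2 = 0$
Thus, $\hat\sigma = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(x_i-\hat\mu)^2}$
Here $\sum_{i=1}^{10}(x_i - 384.4)^2 = 3556.4$, so $\hat\sigma^2 = 355.64$.
Therefore, we have $\hat\mu = \bar{X} = 384.4$ and $\hat\sigma = \sqrt{355.64} \approx 18.86$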
(b) Again assuming a normal distribution, estimate the strength value below which 95% of
all welds will have their strengths. [Hint: What is the 95 th percentile in terms of µ and σ
? Now use the invariance principle.]
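We look for c such that $P(X \le c) = 0.95$
Since $P\left(\frac{X-\mu}{\sigma} \le \frac{c-\mu}{\sigma}\right) = 0.95$
So, $\Phi\left(\frac{c-\mu}{\sigma}\right) = 0.95$, i.e. $\frac{c-\mu}{\sigma} = z_{0.05} \approx 1.645$
Then, $\hat{c} = \hat\mu + 1.645\,\hat\sigma$ (by the invariance principle)
Therefore, the estimate of the strength is $\hat{c} = 384.4 + 1.645 \times 18.86 \approx 415.4$ psi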
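The normal-theory MLEs and the estimated 95th percentile can be reproduced with the sketch below. It relies only on the Python standard library; the variable names are illustrative, and the MLE of σ divides by n rather than n − 1.

```python
from math import sqrt
from statistics import NormalDist, fmean

# Shear strengths (psi) from Exercise 13.
strengths = [392, 376, 401, 367, 389, 362, 409, 415, 358, 375]
n = len(strengths)

# MLEs under normality: sample mean and the divide-by-n standard deviation.
mu_hat = fmean(strengths)
sigma_hat = sqrt(sum((x - mu_hat) ** 2 for x in strengths) / n)

# Invariance principle: plug the MLEs into the 95th percentile mu + z * sigma.
z_95 = NormalDist().inv_cdf(0.95)
c_hat = mu_hat + z_95 * sigma_hat

print(f"mu_hat = {mu_hat:.1f}, sigma_hat = {sigma_hat:.2f}, c_hat = {c_hat:.1f}")
```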
Exercise 14. At time t = 0, 20 identical components are tested. The lifetime distribution
of each is exponential with parameter λ. The experimenter then leaves the test facility
unmonitored. On his return 24 hours later, the experimenter immediately terminates the test
after noticing that y = 15 of the 20 components are still in operation (so 5 have failed). Derive
the MLE of λ. [Hint: Let Y = the number that survive 24 hours. Then Y ∼ Bin(n, p). What
is the mle of p ? Now notice that p = P (Xi ≥ 24), where Xi is exponentially distributed.
This relates λ to p, so the former can be estimated once the latter has been.]
Solution :
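Let $T_i$ be the lifetime of the i-th component, $T_i \sim \mathrm{Exp}(\lambda)$, where λ denotes the mean lifetime (the same parametrization as $f(x;\theta) = (1/\theta)e^{-x/\theta}$ used earlier).
Derive the MLE of λ
Let Y = the number that survive 24 hours. $Y \sim \mathrm{Bin}(n, p)$
and we have $p = P(T_i \ge 24) = e^{-24/\lambda}$. Since $Y \sim \mathrm{Bin}(n, p)$,
$p(y) = C(n,y)\,p^y(1-p)^{n-y}$
where
$\ln L(y,p) = \ln\left(C(n,y)\,p^y(1-p)^{n-y}\right)$
$= \ln C(n,y) + y\ln p + (n-y)\ln(1-p)$
• Setting
$\frac{\partial}{\partial p}\ln L(y,p) = \frac{y}{p} - \frac{n-y}{1-p} = 0$
gives $\hat{p} = \frac{y}{n} = \frac{15}{20} = 0.75$. By the invariance principle, $\hat{p} = e^{-24/\hat\lambda}$, so $\hat\lambda = -\frac{24}{\ln 0.75}$
Therefore, $\hat\lambda = -\frac{24}{\ln 0.75} \approx 83.4$ hours.
(If λ is instead read as the failure rate, so that $p = e^{-24\lambda}$, the same argument gives $\hat\lambda = -\frac{\ln 0.75}{24} \approx 0.0120$ per hour.)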
Exercise 15. Let X1 , X2 , . . . , Xn be a random sample from Bin(1, p) (i.e., n Bernoulli trials).
Thus,
$Y = \sum_{i=1}^{n} X_i \sim \mathrm{Bin}(n, p)$
(a) Show that X̄ = Y/n is an unbiased estimator of p.
$E(\bar{X}) = E\left(\frac{Y}{n}\right) = \frac{1}{n}\sum_{i=1}^{n}E(X_i) = np \times \frac{1}{n} = p$
Note that $E(X_i) = p$ because each $X_i$ is Bernoulli distributed.
(b) Show that $\mathrm{Var}(\bar{X}) = \frac{p(1-p)}{n}$
$V(\bar{X}) = \frac{1}{n^2}\sum_{i=1}^{n}V(X_i) = \frac{1}{n^2} \times np(1-p) = \frac{p(1-p)}{n}$
Therefore, $\mathrm{Var}(\bar{X}) = \frac{p(1-p)}{n}$
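(c) Show that $E[\bar{X}(1-\bar{X})/n] = (n-1)\,p(1-p)/n^2$
We have $E[\bar{X}(1-\bar{X})/n] = \frac{1}{n}E(\bar{X}) - \frac{1}{n}E(\bar{X}^2)$
By the previous questions, $E(\bar{X}) = p$ and $E(\bar{X}^2) = \mathrm{Var}(\bar{X}) + E^2(\bar{X}) = \frac{p(1-p)}{n} + p^2 = \frac{p(1-p) + np^2}{n}$
Then, $E[\bar{X}(1-\bar{X})/n] = \frac{p}{n} - \frac{p(1-p)}{n^2} - \frac{p^2}{n} = \frac{np - p + p^2 - np^2}{n^2} = \frac{(n-1)p(1-p)}{n^2}$
Therefore, $E[\bar{X}(1-\bar{X})/n] = (n-1)\,p(1-p)/n^2$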
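(d) Find the value c
By (c), $E[\bar{X}(1-\bar{X})] = \frac{(n-1)p(1-p)}{n}$, so $E[c\,\bar{X}(1-\bar{X})] = \frac{p(1-p)}{n} = \mathrm{Var}(\bar{X})$ exactly when $c = \frac{1}{n-1}$.
Therefore, $c = \frac{1}{n-1}$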
Exercise 16. Assume that the number of defects in a car has a Poisson distribution with
parameter λ. To estimate λ we obtain the random sample X1 , X2 , . . . , Xn .
(a) Find the Fisher information in a single observation using two methods.
(b) Find the Cramer-Rao lower bound for the variance of an unbiased estimator of λ.
(c) Find the MLE of λ and show that the MLE is an efficient estimator.
Solution :
(a) Find the Fisher information in a single observation using two methods.
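• First method.
$I(\lambda) = -E\left[\frac{\partial^2}{\partial\lambda^2}\ln f(X;\lambda)\right]$, where $f(x;\lambda) = \frac{e^{-\lambda}\lambda^x}{x!}$
Then $\ln f(x;\lambda) = -\lambda + x\ln\lambda - \ln x! \implies \frac{\partial^2}{\partial\lambda^2}\ln f(x;\lambda) = -\frac{x}{\lambda^2}$
So, $I(\lambda) = \frac{1}{\lambda^2}E(X) = \frac{1}{\lambda}$
Therefore, $I(\lambda) = \frac{1}{\lambda}$
• Second method
$I(\lambda) = V\left(\frac{\partial}{\partial\lambda}\ln f(X;\lambda)\right) = V\left(-1 + \frac{X}{\lambda}\right) = \frac{V(X)}{\lambda^2} = \frac{1}{\lambda}$
Therefore, $I(\lambda) = \frac{1}{\lambda}$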
(b) Find the Cramer-Rao lower bound for the variance of an unbiased estimator of λ.
The lower bound is $\frac{1}{nI(\lambda)} = \frac{\lambda}{n}$
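(c) Find the MLE of λ and show that the MLE is an efficient estimator.
The log-likelihood is $\ln L(\lambda) = -n\lambda + \ln\lambda\sum_{i=1}^{n}x_i - \sum_{i=1}^{n}\ln x_i!$
Setting $\frac{\partial}{\partial\lambda}\ln L(\lambda) = -n + \frac{1}{\lambda}\sum_{i=1}^{n}x_i = 0$, we get $\hat\lambda = \bar{X}$ (the method of moments gives the same estimator, since $E(X) = \lambda$).
We have $E(\bar{X}) = \lambda$, so λ̂ is unbiased,
and $V(\bar{X}) = \frac{1}{n^2}\sum_{i=1}^{n}V(X_i) = \frac{\lambda}{n}$, which is equal to the lower bound of the Cramer-Rao inequality.
Thus, it is an efficient estimator.
Therefore, the efficient estimator is $\hat\lambda = \bar{X}$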
Exercise 17. Suppose the waiting time for a bus is uniformly distributed on [0, θ] and the
results x1 , . . . , xn of a random sample from this distribution have been observed.
(a) Find the MLE θ̂ of θ.
(b) Letting $\bar\theta = \frac{n+1}{n}\hat\theta$, show that θ̄ is unbiased and find its variance.
(c) Find the Cramer-Rao lower bound for the variance of an unbiased estimator of θ.
Solution :
(a) Find the MLE θ̂ of θ.
We have $X_1, \dots, X_n \sim U[0, \theta]$ and the likelihood function $L(x;\theta) = \frac{1}{\theta^n}$ for $0 \le x_i \le \theta$, $i = 1, \dots, n$ (and 0 otherwise).
Since L is decreasing in θ, it is maximized by the smallest admissible θ, so we choose $\hat\theta = \max(x_i)$
$P(|\bar{X} - \mu| \ge \epsilon) \le \frac{\sigma^2}{n\epsilon^2}$
so $P(|\bar{X} - \mu| \ge \epsilon) \to 0$ as $n \to \infty$.
Therefore, X̄ is a consistent estimator of µ
I3-TD3
(Confidence Intervals)
Exercise 1. (a) Suppose we construct a 99% confidence interval. What are we 99% confident
about?
(b) Which of the confidence intervals is wider, 90% or 99% ?
(c) In computing a confidence interval, when do you use the t-distribution and when do you
use z, with normal approximation?
(d) How does the sample size affect the width of a confidence interval?
Solution :
a) A confidence interval for a parameter is an interval of numbers within which we expect
the true value of the population parameter to be contained. When we construct a 99%
confidence interval, we are 99% confident that the true value of the parameter is in our
confidence interval.
b) The larger the confidence, the wider the interval. 99% is wider than 90%.
c) If the population standard deviation σ is not known, then using the t-distribution is correct. If
the population standard deviation σ is known, then using the normal distribution is correct.
d) Increasing the sample size decreases the width of confidence intervals, because it decreases
the standard error.
since $Z \sim N(0, 1)$
so, we have $\Phi(2.75) - \Phi(-2.81) = k$
then k = 0.99454.
(b) Use this statement to find CI.
We have
$P\left(-2.81 \le Z = \frac{\bar{X}-\mu}{\sigma/\sqrt{n}} \le 2.75\right) = k$
$P\left(-2.81\frac{\sigma}{\sqrt{n}} \le \bar{x} - \mu \le 2.75\frac{\sigma}{\sqrt{n}}\right) = 0.99454$
$P\left(\bar{x} - 2.75\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{x} + 2.81\frac{\sigma}{\sqrt{n}}\right) = 0.99454$
Then the CI of µ is
$I(\mu) = \left[\bar{x} - 2.75\frac{\sigma}{\sqrt{n}},\ \bar{x} + 2.81\frac{\sigma}{\sqrt{n}}\right]$
Therefore, $I(\mu) = \left[\bar{x} - 2.75\frac{\sigma}{\sqrt{n}},\ \bar{x} + 2.81\frac{\sigma}{\sqrt{n}}\right]$
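(c) The confidence level of this interval for µ is k ≈ 99.45%, i.e. about 99%
(d) Symmetric confidence interval for µ
By the symmetry principle
$P\left(-z_{\alpha/2} \le Z \le z_{\alpha/2}\right) = k$
then, $P\left(\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right) = k$
Therefore, the symmetric interval is $\left[\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}},\ \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right]$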
Exercise 3. Let $X_1, \dots, X_n$ be a random sample from an $N(\mu, \sigma^2)$ distribution, where the value of σ² is unknown.
(a) Construct a 100(1 − α)% confidence interval for µ when the value of σ² is known.
(b) Construct a 100(1 − α)% confidence interval for µ when the value of σ² is unknown.
Solution :
Let $X_1, \dots, X_n$ be a random sample from $N(\mu, \sigma^2)$.
(a) Construct a (1 − α)100% CI for µ when σ² is known
We have $X \sim N(\mu, \sigma^2)$. Let $Z = \frac{\bar{X}-\mu}{\sigma/\sqrt{n}}$, then $Z \sim N(0, 1)$
By the symmetry principle,
$P\left(-z_{\alpha/2} \le Z \le z_{\alpha/2}\right) = 1 - \alpha$
$P\left(\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right) = 1 - \alpha$
Therefore, the CI of µ is $I(\mu) = \left[\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}},\ \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right]$
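(b) Construct a (1 − α)100% CI for µ when σ² is unknown
Replacing σ by the sample standard deviation S,
$T = \frac{\bar{X}-\mu}{S/\sqrt{n}} \sim t(n-1)$
$P\left(-t_{\alpha/2,\,n-1} \le T \le t_{\alpha/2,\,n-1}\right) = 1 - \alpha$
$P\left(\bar{x} - t_{\alpha/2,\,n-1}\frac{s}{\sqrt{n}} \le \mu \le \bar{x} + t_{\alpha/2,\,n-1}\frac{s}{\sqrt{n}}\right) = 1 - \alpha$
then, a (1 − α)100% CI for µ is $I(\mu) = \left[\bar{x} - t_{\alpha/2,\,n-1}\frac{s}{\sqrt{n}},\ \bar{x} + t_{\alpha/2,\,n-1}\frac{s}{\sqrt{n}}\right]$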
Exercise 4. A random sample of size 50 from a particular brand of 16-ounce tea packets
produced a mean weight of 15.65 ounces. Assume that the weights of these brands of
tea packets are normally distributed with standard deviation of 0.59 ounce. Find a 95%
confidence interval for the true mean µ.
Solution : Find a 95% confidence interval for the true mean µ
⇒ (1 − α) = 0.95 ⇒ α = 0.05
Since α = 0.05 ⇒ $z_{\alpha/2} = \Phi^{-1}\left(1 - \frac{\alpha}{2}\right) = \Phi^{-1}\left(1 - \frac{0.05}{2}\right) = 1.96$
n = 50, x̄ = 15.65, σ = 0.59
Thus a 95% confidence interval for the true mean µ is
$\bar{x} \pm z_{\alpha/2}\cdot\frac{\sigma}{\sqrt{n}} = 15.65 \pm 1.96\cdot\frac{0.59}{\sqrt{50}}$
$= [15.65 \pm 0.164]$
$= [15.49, 15.81]$
Therefore, a 95% confidence interval for the true mean µ is [15.49, 15.81]
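The interval can be recomputed with a small helper. This is only a sketch: the function name `z_interval` is hypothetical, and the critical value comes from `statistics.NormalDist` rather than a printed table.

```python
from math import sqrt
from statistics import NormalDist


def z_interval(xbar, sigma, n, confidence):
    """Two-sided z-interval for a mean when sigma is known."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    half_width = z * sigma / sqrt(n)
    return xbar - half_width, xbar + half_width


# Tea-packet data from Exercise 4.
low, high = z_interval(xbar=15.65, sigma=0.59, n=50, confidence=0.95)
print(f"95% CI for mu: [{low:.2f}, {high:.2f}]")
```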
Exercise 5. A researcher wishes to estimate within $25 the average cost of postage a com-
munity college spends in one year. If she wishes to be 90% confident, how large of a sample
will be necessary if the population standard deviation is $80.
Solution : Find the sample size
We have (1 − α) = 90% ⇔ (1 − α) = 0.9 ⇒ α = 0.1
⇒ $z_{\alpha/2} = \Phi^{-1}\left(1 - \frac{0.1}{2}\right) = \Phi^{-1}(0.95) = 1.65$
And σ = 80, E = 25. Then
$n = \left(\frac{z_{\alpha/2}\,\sigma}{E}\right)^2 = \left(\frac{1.65 \times 80}{25}\right)^2 = 27.87 \approx 28$
Therefore, the sample size is n = 28
Exercise 6. A university dean wishes to estimate the average number of hours that
freshmen study each week. The standard deviation from a previous study is 2.6 hours. How
large a sample must be selected if he wants to be 99% confident of finding whether the true
mean differs from the sample mean by 0.5 hour?
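Solution : Find the sample size
We have (1 − α) = 0.99 ⇒ α = 0.01 ⇒ $z_{\alpha/2} = \Phi^{-1}\left(1 - \frac{0.01}{2}\right) = 2.58$
since σ = 2.6 and E = 0.5
⇒ $n = \left(\frac{z_{\alpha/2}\,\sigma}{E}\right)^2 = \left(\frac{2.58 \times 2.6}{0.5}\right)^2 = 179.99 \approx 180$
Therefore, the sample size is n = 180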
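The sample-size formula used in Exercises 5 and 6 can be wrapped in a small function. The sketch below uses the rounded table values of z quoted in the solutions; the function name is an assumption made for illustration.

```python
from math import ceil


def sample_size_for_mean(z, sigma, margin):
    """Smallest n with z * sigma / sqrt(n) <= margin (always round up)."""
    return ceil((z * sigma / margin) ** 2)


# Exercise 5: 90% confidence, sigma = 80, margin of error 25.
n_postage = sample_size_for_mean(z=1.65, sigma=80, margin=25)

# Exercise 6: 99% confidence, sigma = 2.6, margin of error 0.5.
n_study_hours = sample_size_for_mean(z=2.58, sigma=2.6, margin=0.5)

print(n_postage, n_study_hours)
```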
Exercise 7. In a large university, the following are the ages of 20 randomly chosen employees:
24 31 28 43 28 56 48 39 52 32
38 49 51 49 62 33 41 58 63 56
Assuming that the data come from a normal population, construct a 95% confidence interval
for the population mean µ of the ages of the employees of this university. Interpret your
answer.
Solution :
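Construct a 95% confidence interval with n = 20, x̄ = 44.05. By the symmetry principle
we have $I(\mu) = \left(\bar{x} - t_{\alpha/2,\,n-1}\frac{s}{\sqrt{n}},\ \bar{x} + t_{\alpha/2,\,n-1}\frac{s}{\sqrt{n}}\right)$
We have $S = \sqrt{\frac{1}{n-1}\sum_{i=1}^{20}(x_i - \bar{x})^2} \approx 12$
Since
$t_{\alpha/2,\,n-1} = t_{0.025,\,19} = 2.093$
⇒ $I(\mu) = \left(44.05 - 2.093 \times \frac{12}{\sqrt{20}},\ 44.05 + 2.093 \times \frac{12}{\sqrt{20}}\right) = (38.43, 49.67)$
Then, I(µ) = (38.43, 49.67)
Interpretation: we are 95% confident that the mean age of all employees of this university lies between about 38.4 and 49.7 years.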
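The t-interval can be recomputed directly from the raw ages. In this sketch the critical value 2.093 is taken from a t-table (the standard library has no t quantile function), and s is not rounded to 12, so the endpoints differ slightly from the hand calculation.

```python
from math import sqrt
from statistics import fmean, stdev

# Ages of the 20 sampled employees (Exercise 7).
ages = [24, 31, 28, 43, 28, 56, 48, 39, 52, 32,
        38, 49, 51, 49, 62, 33, 41, 58, 63, 56]

T_CRIT = 2.093  # t_{0.025, 19}, read from a t-table (assumed, not computed)

xbar = fmean(ages)
s = stdev(ages)  # sample standard deviation (divides by n - 1)
half_width = T_CRIT * s / sqrt(len(ages))
low, high = xbar - half_width, xbar + half_width

print(f"xbar = {xbar:.2f}, s = {s:.2f}, 95% CI = ({low:.2f}, {high:.2f})")
```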
Exercise 9. A random sample from a normal population yields the following 25 values:
$S^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$
$= \frac{1}{24}\left[(90 - 97.24)^2 + (87 - 97.24)^2 + \cdots + (89 - 97.24)^2\right]$
$= \frac{564.53}{24}$
$= 23.52$
⇒ S = 4.85
(b) Give approximate 99% confidence interval for the population mean.
The CI of the mean is $\bar{x} \pm t_{\alpha/2,\,n-1}\frac{S}{\sqrt{n}}$
Since x̄ = 97.24
α = 0.01 ⇒ $t_{\alpha/2,\,n-1} = t_{0.005,\,24} = 2.797$
s = 4.85
n = 25
the interval is $97.24 \pm 2.797 \times \frac{4.85}{\sqrt{25}} = 97.24 \pm 2.71 = (94.53, 99.95)$
(a) Can we say that the data are approximately normally distributed?
(b) Find a 95% confidence interval for population mean rate µ for the new cells to close a
razor cut made in the skin of anesthetized newts.
(c) Find a 99% confidence interval for µ.
(d) Is the 95% CI wider or narrower than the 99% CI? Briefly explain why.
Solution :
Exercise 11. Let X1 , . . . , Xn be a random sample from a normal distribution N µ, σ 2 ,
where the values of µ and σ 2 are unknown.
(a) Construct a 100(1 − α)% confidence interval for σ 2 , choosing an appropriate pivot. In-
terpret its meaning.
(b) Suppose a random sample from a normal distribution gives the following summary statis-
tics: n = 21, X̄ = 44.3, and s = 3.96. Using part (a), find a 90% confidence interval for σ 2 .
Interpret its meaning.
Solution :
Let $X_1, \dots, X_n \sim N(\mu, \sigma^2)$, where σ² is unknown
(a) We have $\frac{(n-1)S^2}{\sigma^2} \sim \chi^2_{n-1}$
so the appropriate pivot is $X^2 = \frac{(n-1)S^2}{\sigma^2}$, where σ² is unknown.
We have $P\left(\chi^2_{1-\alpha/2,\,n-1} \le X^2 \le \chi^2_{\alpha/2,\,n-1}\right) = 1 - \alpha$
then, $P\left(\frac{(n-1)s^2}{\chi^2_{\alpha/2,\,n-1}} \le \sigma^2 \le \frac{(n-1)s^2}{\chi^2_{1-\alpha/2,\,n-1}}\right) = 1 - \alpha$
Therefore, $I(\sigma^2) = \left(\frac{(n-1)s^2}{\chi^2_{\alpha/2,\,n-1}},\ \frac{(n-1)s^2}{\chi^2_{1-\alpha/2,\,n-1}}\right)$
Exercise 12. A random sample of 20 automobiles has a pollution by-product release stan-
dard deviation of 2.3 ounces when 1 gallon of gasoline is used. Find the 90% confidence
interval of the population standard deviation. Assume the variable is normally distributed.
Solution :
Given
n = 20
s = 2.3
c = 1 − α = 90% = 0.90
Determine the critical values using table G in the row df = n − 1 = 20 − 1 = 19 and in the
columns of α/2 and 1 − α/2:
$\chi^2_{1-0.05} = \chi^2_{0.95} = 10.117$
$\chi^2_{0.05} = 30.144$
The boundaries of the confidence interval are then:
$\sqrt{\frac{n-1}{\chi^2_{\alpha/2}}}\cdot s = \sqrt{\frac{20-1}{30.144}}\cdot 2.3 \approx 1.8260$
$\sqrt{\frac{n-1}{\chi^2_{1-\alpha/2}}}\cdot s = \sqrt{\frac{20-1}{10.117}}\cdot 2.3 \approx 3.1520$
Therefore
1.8260 < σ < 3.1520
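The two bounds can be checked numerically. In this sketch the chi-square critical values are the table values quoted in the solution (the standard library has no chi-square quantile function), and the variable names are illustrative.

```python
from math import sqrt

# Exercise 12: n = 20 automobiles, sample standard deviation s = 2.3 ounces.
n, s = 20, 2.3

# Table values for df = 19 at the 90% confidence level (taken from the text).
CHI2_UPPER = 30.144  # chi^2_{0.05, 19}
CHI2_LOWER = 10.117  # chi^2_{0.95, 19}

sigma_low = s * sqrt((n - 1) / CHI2_UPPER)
sigma_high = s * sqrt((n - 1) / CHI2_LOWER)

print(f"{sigma_low:.4f} < sigma < {sigma_high:.4f}")
```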
Exercise 13. A random sample from a normal population yields the following 25 values:
then, we obtained
$I(\sigma^2) = (78.19, 360.36)$
Therefore, $I(\sigma^2) = (78.19, 360.36)$
(c) We are 99% confident that the population variance lies in $I(\sigma^2)$.
We can assume that an unbiased estimate for population variance of any random sample
with finite variance is s2 .
Exercise 14. In a random sample of 50 college seniors, 18 indicated that they were planning
to pursue a graduate degree. Find a 98% confidence interval for the true proportion of all
college seniors planning to pursue a graduate degree, and interpret the result, and state any
assumptions you have made.
Solution :
Find a 98% CI for the true proportion
We have n = 50 and $\hat{p} = \frac{18}{50} = 0.36$
Since $n\hat{p} = 18 \ge 5$ and $n(1-\hat{p}) = 32 \ge 5$, p̂ is approximately normally distributed.
Exercise 15. It is believed that slightly over 40% of Cambodians own pets. How large a
sample is necessary to estimate the true proportion within 0.02 with 90% confidence?
Solution :
Since $z_{\alpha/2} = 1.645$, E = 0.02, p̂ = 0.4, and q̂ = 1 − 0.4 = 0.6,
$n = \hat{p}\hat{q}\left(\frac{z_{\alpha/2}}{E}\right)^2$
$= (0.4)(0.6)\left(\frac{1.645}{0.02}\right)^2$
$= 1623.615$
which, when rounded up, is 1624. So, the researcher must interview 1624 people.
Exercise 16. In a random sample of 500 items from a large lot of manufactured items, there
were 40 defectives. (a) Find a 90% confidence interval for the true proportion of defectives
in the lot. (b) Is the assumption of normal approximation valid? (c) Suppose we suspect
that another lot has the same proportion of defectives as in the first lot. What should be
the sample size if we want to estimate the true proportion within 0.01 with 90% confidence?
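Solution :
(a) Find a 90% CI for the true proportion of defectives in the lot.
We have n = 500 and $\hat{p} = \frac{40}{500} = 0.08$, so $n\hat{p} = 40 \ge 5$ and $n(1-\hat{p}) = 460 \ge 5$; hence p̂ is approximately normally distributed.
Then, a 90% CI is given by: $I(p) = \left(\hat{p} - z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}},\ \hat{p} + z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}\right)$
With $z_{\alpha/2} = 1.645$, the half-width is $1.645\sqrt{\frac{0.08 \times 0.92}{500}} \approx 0.02$.
Therefore, I(p) = (0.06, 0.10)
(b) By the previous question, the assumption of normal approximation is valid.
(c) Find the sample size.
We have $n = \left(\frac{z_{\alpha/2}}{E}\right)^2\hat{p}(1-\hat{p})$,
where E = 0.01
$n = \left(\frac{1.645}{0.01}\right)^2 \times \frac{2 \times 23}{25^2} = 1991.6$, which rounds up to 1992
Therefore, n = 1992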
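Parts (a) and (c) can be verified with the sketch below. It uses the table value z = 1.645 from the solution, and the variable names are illustrative only.

```python
from math import ceil, sqrt

Z_90 = 1.645  # z_{0.05}, table value used in the solution

# Exercise 16: 40 defectives among 500 sampled items.
n, defectives = 500, 40
p_hat = defectives / n

# (a) Large-sample (Wald) interval for the proportion.
half_width = Z_90 * sqrt(p_hat * (1 - p_hat) / n)
ci = (p_hat - half_width, p_hat + half_width)

# (c) Sample size needed for a margin of error of 0.01, using p_hat as the guess.
margin = 0.01
n_needed = ceil((Z_90 / margin) ** 2 * p_hat * (1 - p_hat))

print(f"90% CI = ({ci[0]:.3f}, {ci[1]:.3f}), n needed = {n_needed}")
```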
Exercise 17. A study found that 73% of randomly selected prekindergarten children ages
3 to 5 whose mothers had a bachelor’s degree or higher were enrolled in center-based early
childhood care and education programs. How large a sample is needed to estimate the true
proportion within 3 percentage points with 95% confidence? How large a sample is needed
if you had no prior knowledge of the proportion?
Solution :
Given:
p̂ = 73% = 0.73
c = 95%
E = 3% = 0.03
Formula sample size
p̂ known: $n = \frac{z_{\alpha/2}^2\,\hat{p}\hat{q}}{E^2} = \frac{z_{\alpha/2}^2\,\hat{p}(1-\hat{p})}{E^2}$
p̂ unknown: $n = \frac{z_{\alpha/2}^2 \times 0.25}{E^2}$
For confidence level 1 − α = 0.95
• Determine $z_{\alpha/2} = z_{0.025}$ using table E (look up 0.025 in the table, the z-score is then the
found z-score with opposite sign)
$z_{\alpha/2} = 1.96$
If p̂ is known, then the sample size is (round up!)
$n = \frac{z_{\alpha/2}^2\,\hat{p}(1-\hat{p})}{E^2} = \frac{1.96^2 \times 0.73(1-0.73)}{0.03^2} \approx 842$
If p̂ is unknown, then the sample size is (round up!)
$n = \frac{z_{\alpha/2}^2\,(0.25)}{E^2} = \frac{1.96^2 \times (0.25)}{0.03^2} \approx 1068$
Solution :
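(a) Find the MLE µ̂ for µ.
We have the log-normal distribution $f(x; \mu, \sigma^2) = \frac{1}{\sqrt{2\pi}\,\sigma x}\,e^{-\frac{1}{2\sigma^2}(\ln x - \mu)^2}$
• Likelihood function
$L(x; \mu, \sigma^2) = \prod_{i=1}^{n}\frac{1}{(2\pi\sigma^2)^{1/2}}\,x_i^{-1}\,e^{-\frac{1}{2\sigma^2}(\ln x_i - \mu)^2}$
$= (2\pi\sigma^2)^{-n/2}\cdot e^{-\frac{1}{2\sigma^2}\sum_{i=1}^{n}(\ln x_i - \mu)^2} \times \prod_{i=1}^{n}x_i^{-1}$
• Log-likelihood
$\ell = \ln L(x; \mu, \sigma^2) = -\frac{n}{2}\ln(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(\ln x_i - \mu)^2 - \sum_{i=1}^{n}\ln x_i$
$= -\sum_{i=1}^{n}\ln x_i - \frac{n}{2}\ln(2\pi) - n\ln(\sigma) - \sum_{i=1}^{n}\frac{(\ln x_i - \mu)^2}{2\sigma^2}$
$= -\sum_{i=1}^{n}\ln x_i - \frac{n}{2}\ln(2\pi) - n\ln(\sigma) - \frac{\sum_{i=1}^{n}(\ln x_i)^2}{2\sigma^2} + \frac{\mu\sum_{i=1}^{n}\ln x_i}{\sigma^2} - \frac{n\mu^2}{2\sigma^2}$
• $\frac{d\ell}{d\mu} = \frac{\sum_{i=1}^{n}\ln x_i}{\sigma^2} - \frac{2n\mu}{2\sigma^2}$
• $\frac{d^2\ell}{d\mu^2} = -\frac{n}{\sigma^2} < 0$, σ > 0
$\frac{d\ell}{d\mu} = 0 \iff \frac{\sum_{i=1}^{n}\ln x_i}{\sigma^2} = \frac{n\mu}{\sigma^2} \implies \mu = \frac{1}{n}\sum_{i=1}^{n}\ln x_i$
Therefore, the MLE µ̂ of µ is $\hat\mu = \frac{1}{n}\sum_{i=1}^{n}\ln x_i$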
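b. Show that the MLE is the MVUE of µ
We have $E(\hat\mu) = E\left(\frac{1}{n}\sum_{i=1}^{n}\ln X_i\right) = \frac{1}{n}\sum_{i=1}^{n}E(\ln X_i) = E(\ln X)$
where $E(\ln X) = \int_{0}^{+\infty}\ln x\,\frac{1}{\sigma x\sqrt{2\pi}}\,e^{-\frac{1}{2}\left(\frac{\ln x - \mu}{\sigma}\right)^2}dx$
Let $u = \frac{\ln x - \mu}{\sigma} \implies du = \frac{1}{\sigma x}dx$, with u ranging over (−∞, +∞). Then
$E(\hat\mu) = \int_{-\infty}^{+\infty}\frac{u\sigma + \mu}{\sqrt{2\pi}}\,e^{-\frac{1}{2}u^2}du = \frac{\sigma}{\sqrt{2\pi}}\int_{-\infty}^{+\infty}u\,e^{-\frac{1}{2}u^2}du + \mu\int_{-\infty}^{+\infty}\frac{1}{\sqrt{2\pi}}\,e^{-\frac{1}{2}u^2}du$
The first integral is zero, since $\int_{-\infty}^{+\infty}u\,e^{-\frac{1}{2}u^2}du = \left[-e^{-\frac{1}{2}u^2}\right]_{-\infty}^{+\infty} = 0$, and the second is the integral of the standard normal density, which equals 1.
So $E(\hat\mu) = \mu$, i.e. µ̂ is unbiased.
And $V(\hat\mu) = V\left(\frac{1}{n}\sum_{i=1}^{n}\ln X_i\right) = \frac{1}{n^2}\left[V(\ln X_1) + \dots + V(\ln X_n)\right]$
$= \frac{1}{n}V(\ln X) = \frac{1}{n}\left[E\left((\ln X)^2\right) - E^2(\ln X)\right]$
with $E\left((\ln X)^2\right) = \int_{0}^{+\infty}(\ln x)^2\,\frac{1}{\sqrt{2\pi}\,\sigma x}\,e^{-\frac{1}{2}\left(\frac{\ln x - \mu}{\sigma}\right)^2}dx = \mu^2 + \sigma^2$
⟹ $V(\hat\mu) = \frac{1}{n}\left(\mu^2 + \sigma^2 - \mu^2\right) = \frac{\sigma^2}{n}$
With $\ell = \ln L(x; \mu, \sigma^2)$ the log-likelihood, we have $E\left(\frac{d^2\ell}{d\mu^2}\right) = E\left(-\frac{n}{\sigma^2}\right) = -\frac{n}{\sigma^2}$
⟹ $V(\hat\mu) = -\frac{1}{E\left(\frac{d^2\ell}{d\mu^2}\right)} = \frac{\sigma^2}{n}$ ⇒ µ̂ attains the Cramer-Rao lower bound, so it is an efficient estimator of µ
Since every efficient estimator of µ is the MVUE of µ, µ̂ is the MVUE of µ
Therefore, the MLE is the MVUE of µ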
c. Construct a 95% confidence interval for µ
By formula: $CI(\mu) = \left[\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}},\ \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right]$
We have 1 − α = 0.95 ⇒ α = 0.05 ⇒ α/2 = 0.025
⇒ $z_{\alpha/2} = \Phi^{-1}\left(1 - \frac{\alpha}{2}\right) = \Phi^{-1}(1 - 0.025) = \Phi^{-1}(0.975) = 1.96$
For σ = 1, X is log-normal with parameters (µ, σ²) ⇒ $\ln X \sim N(\mu, \sigma^2)$, so here x̄ denotes the mean of the log-values:
⟹ $\bar{x} = \frac{1}{n}\sum_{i=1}^{n}\ln x_i = \frac{1}{25}\sum_{i=1}^{25}\ln x_i = \frac{54.95}{25} = 2.198$
⇒ $CI(\mu) = \left[\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}},\ \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right] = [2.198 - 0.392,\ 2.198 + 0.392] = [1.806, 2.59]$
Therefore, CI(µ) = [1.806, 2.59]
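where
$\ln(f(x, \theta)) = \ln(2x) - 2\ln(\theta)$
and
$\frac{\partial^2}{\partial\theta^2}\ln(f(x_i, \theta)) = \frac{2}{\theta^2}$, so $I(\theta) = -\frac{2}{\theta^2}$
Therefore $\frac{1}{nI(\theta)} = -\frac{\theta^2}{2n}$, which is different from the variance of $T_n$. (The negative value signals that the regularity conditions behind the Cramer-Rao bound fail here.)
Conclusion, $T_n$ is not efficient.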
Solution :
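(a) Find the MLE θ̂n for θ.
We have
$L(x, \theta) = \prod_{i=1}^{n}f(x_i; \theta)$
$= \prod_{i=1}^{n}\frac{x_o^{1/\theta}}{\theta\,x_i^{1+\frac{1}{\theta}}}$ for $x_i \ge x_o$
$= \frac{x_o^{n/\theta}}{\theta^n}\cdot\frac{1}{\prod_{i=1}^{n}x_i^{1+\frac{1}{\theta}}}$
Then,
$\ln(L(x; \theta)) = \frac{n}{\theta}\ln x_o - n\ln(\theta) - \left(1 + \frac{1}{\theta}\right)\sum_{i=1}^{n}\ln x_i$
$\frac{\partial}{\partial\theta}\ln(L(x; \theta)) = -\frac{n}{\theta^2}\ln x_o - \frac{n}{\theta} + \frac{1}{\theta^2}\sum_{i=1}^{n}\ln x_i$
Setting $\frac{\partial}{\partial\theta}\ln(L(x; \theta)) = 0$ gives
$-\frac{n}{\theta^2}\ln x_o - \frac{n}{\theta} + \frac{1}{\theta^2}\sum_{i=1}^{n}\ln x_i = 0$
Thus,
$\theta = \frac{1}{n}\sum_{i=1}^{n}\ln x_i - \ln x_o$
Therefore, $\hat\theta_n = \frac{1}{n}\sum_{i=1}^{n}\ln X_i - \ln x_o$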
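Is θ̂n efficient?
$E(\ln X) = \int_{x_o}^{+\infty}\ln x\,f(x; \theta)\,dx = \frac{x_o^{1/\theta}}{\theta}\int_{x_o}^{+\infty}\frac{\ln x}{x^{1+\frac{1}{\theta}}}dx$
Let u = ln x
Then,
$E(\ln X) = \frac{x_o^{1/\theta}}{\theta}\int_{\ln x_o}^{+\infty}u\,e^{-u/\theta}du$
$= \frac{x_o^{1/\theta}}{\theta}\left(\theta\,x_o^{-1/\theta}\ln x_o + \theta^2 x_o^{-1/\theta}\right)$
$= \ln x_o + \theta$
Then $E(\hat\theta_n) = \theta$. Therefore, it is an unbiased estimator.
$V(\hat\theta_n) = V\left(\frac{1}{n}\sum_{i=1}^{n}\ln X_i - \ln x_o\right)$
$= \frac{1}{n^2}\sum_{i=1}^{n}V(\ln X_i)$
$= \frac{1}{n}V(\ln X)$
Where,
$V(\ln X) = E(\ln^2 X) - E^2(\ln X)$
and,
$E(\ln^2 X) = \frac{x_o^{1/\theta}}{\theta}\int_{x_o}^{+\infty}\frac{\ln^2 x}{x^{1+\frac{1}{\theta}}}dx$
$= \frac{x_o^{1/\theta}}{\theta}\int_{\ln x_o}^{+\infty}u^2 e^{-u/\theta}du$
by the change of variable u = ln x. Integrating by parts, we have
$E(\ln^2 X) = \ln^2 x_o + 2\theta\ln x_o + 2\theta^2$
So
$V(\ln X) = \ln^2 x_o + 2\theta\ln x_o + 2\theta^2 - (\ln x_o + \theta)^2$
$= \theta^2$
Therefore, $V(\hat\theta_n) = \frac{\theta^2}{n}$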
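Fisher information,
$I(\theta) = -E\left[\frac{\partial^2}{\partial\theta^2}\ln(f(X; \theta))\right]$
$= -E\left[\frac{\partial^2}{\partial\theta^2}\ln\left(\frac{x_o^{1/\theta}}{\theta\,X^{1+\frac{1}{\theta}}}\right)\right]$
where,
$\frac{\partial}{\partial\theta}\left(\ln x_o^{1/\theta} - \ln\theta - \ln x^{1+\frac{1}{\theta}}\right) = -\frac{\ln x_o}{\theta^2} - \frac{1}{\theta} + \frac{1}{\theta^2}\ln x$
so,
$\frac{\partial^2}{\partial\theta^2}(\ln f(x; \theta)) = \frac{2\ln x_o}{\theta^3} + \frac{1}{\theta^2} - \frac{2\ln x}{\theta^3}$
Then, using $E(\ln X) = \ln x_o + \theta$,
$E\left[\frac{2\ln x_o}{\theta^3} + \frac{1}{\theta^2} - \frac{2\ln X}{\theta^3}\right] = \frac{2\ln x_o}{\theta^3} + \frac{1}{\theta^2} - \frac{2}{\theta^3}E(\ln X) = -\frac{1}{\theta^2}$
Therefore, $I(\theta) = \frac{1}{\theta^2}$
Conclusion: since $V(\hat\theta_n) = \frac{\theta^2}{n} = \frac{1}{nI(\theta)}$, θ̂n is efficient.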
(b) Find a 95% CI for θ, when $\prod_{i=1}^{15}x_i = 256514$, $x_o = 1900$
Solution :
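(a) Find the MLE of θ
• $L(\theta) = \prod_{i=1}^{n}f(x_i; \theta) = \prod_{i=1}^{n}\frac{1}{2\theta\sqrt{x_i}}\,e^{-\frac{\sqrt{x_i}}{\theta}} = (2\theta)^{-n}\,\frac{e^{-\frac{1}{\theta}\sum_{i=1}^{n}\sqrt{x_i}}}{\prod_{i=1}^{n}\sqrt{x_i}}$
⇒ $\ln L(\theta) = -n\ln(2\theta) - \frac{1}{\theta}\sum_{i=1}^{n}\sqrt{x_i} - \ln\left(\prod_{i=1}^{n}\sqrt{x_i}\right)$
• $\frac{\partial}{\partial\theta}\ln L(\theta) = -\frac{n}{\theta} + \frac{1}{\theta^2}\sum_{i=1}^{n}\sqrt{x_i}$
• $\frac{\partial}{\partial\theta}\ln L(\theta) = 0 \Rightarrow \frac{n}{\theta} = \frac{1}{\theta^2}\sum_{i=1}^{n}\sqrt{x_i} \Rightarrow \theta = \frac{1}{n}\sum_{i=1}^{n}\sqrt{x_i} = \bar{y}$.
Therefore, the MLE of θ is $\hat\theta_n = \frac{1}{n}\sum_{i=1}^{n}\sqrt{X_i} = \bar{Y}$, where $Y_i = \sqrt{X_i}$.
We have $E(\hat\theta_n) = E\left(\frac{1}{n}\sum_{i=1}^{n}Y_i\right) = \frac{1}{n}\sum_{i=1}^{n}E(Y_i) = \frac{n\theta}{n} = \theta$ (1)
$V(\hat\theta_n) = V\left(\frac{1}{n}\sum_{i=1}^{n}Y_i\right) = \frac{1}{n^2}\sum_{i=1}^{n}V(Y_i) = \frac{n\theta^2}{n^2} = \frac{\theta^2}{n}$
Since X(Ω) = (0, ∞) does not depend on θ, then
$I(\theta) = V\left(\frac{\partial}{\partial\theta}\ln f(X; \theta)\right) = V\left(\frac{\partial}{\partial\theta}\left[-\ln(2\theta) - \ln\sqrt{X} - \frac{\sqrt{X}}{\theta}\right]\right)$
$= V\left(-\frac{1}{\theta} + \frac{\sqrt{X}}{\theta^2}\right) = \frac{1}{\theta^4}V(Y) = \frac{\theta^2}{\theta^4} = \frac{1}{\theta^2}$
⇒ $\frac{1}{nI(\theta)} = \frac{\theta^2}{n} = V(\hat\theta_n)$ (2)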
where $0 < p \le \frac{1}{2}$ is a parameter. The hypothesis $H_o : p = \frac{1}{2}$ is to be tested against $H_a : p < \frac{1}{2}$.
If $H_o$ is rejected when $\sum_{i=1}^{20}X_i \le 6$, then what is the probability of type I error?
Therefore, α = 0.058
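The value of α can be reproduced by summing binomial probabilities. This sketch assumes, as the rejection rule suggests, that the $X_i$ are Bernoulli(p) trials so that their sum is Bin(20, p); the start of the exercise is not shown here, so that reading is an assumption, and the helper name `binom_cdf` is hypothetical.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(Y <= k) for Y ~ Bin(n, p), by direct summation."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


# Type I error: reject H0 (sum <= 6) when H0: p = 1/2 is true.
alpha = binom_cdf(k=6, n=20, p=0.5)
print(f"alpha = {alpha:.4f}")
```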
RR = {(x) : x ≥ 4}
Exercise 3. A random sample of size 4 is taken from a normal distribution with unknown
mean µ and variance σ 2 > 0. To test Ho : µ = 0 against Ha : µ < 0 the following test is
used: "Reject Ho if and only if X1 + X2 + X3 + X4 < −20." Find the value of σ so that the
significance level of this test will be close to 0.14.
Solution :
Find the value of σ so that the significance level of this test will be close to 0.14.
We have $X_1, \dots, X_4 \overset{\text{i.i.d.}}{\sim} N(\mu, \sigma^2)$
The hypotheses are $H_0 : \mu = 0$ vs $H_a : \mu < 0$
Then,
α = P(type I error) = P(reject $H_0$ | $H_0$ is true) = 0.14
$= P(X_1 + X_2 + X_3 + X_4 < -20 \mid \mu = 0) = 0.14$
$= P(\bar{X} < -5 \mid \mu = 0) = 0.14$
$= P(\bar{X} < -5) = 0.14$ where $\bar{X} \sim N\left(0, \frac{\sigma^2}{4}\right)$
Standardizing, $\Phi\left(\frac{-5}{\sigma/2}\right) = \Phi\left(-\frac{10}{\sigma}\right) = 0.14 \implies -\frac{10}{\sigma} = -1.08 \implies \sigma \approx 9.26$
Therefore, σ ≈ 9.26
Exercise 4. Let X1 , X2 , . . . , X25 be a random sample of size 25 drawn from a normal distri-
bution with unknown mean µ and variance σ 2 = 100. It is desired to test the null hypothesis
$H_o : \mu = 4$ against the alternative $H_a : \mu = 6$. What is the power at µ = 6 of the test with
rejection rule: reject µ = 4 if $\sum_{i=1}^{25}X_i \ge 125$?
Solution :
What is the power at µ = 6 of the test with rejection rule: reject µ = 4 if $\sum_{i=1}^{25}X_i \ge 125$?
We have $X_1, \dots, X_n \overset{\text{i.i.d.}}{\sim} N(\mu, \sigma^2)$
The hypotheses are $H_0 : \mu = 4$ vs $H_a : \mu = 6$
$RR = \left\{(x_1, \dots, x_{25}) : \sum_{i=1}^{25}x_i \ge 125\right\}$
We have,
π(6) = P(Reject $H_0$ | µ = 6), the power at µ = 6
$= P\left(\sum_{i=1}^{25}X_i \ge 125 \mid \mu = 6\right)$
$= P(\bar{X} \ge 5)$ where $\bar{X} \sim N\left(6, \frac{100}{25}\right)$
$= 1 - \Phi\left(\frac{5 - 6}{2}\right)$
$= 1 - 0.30854 = 0.6915$
Therefore, π(6) = 0.6915
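The power calculation can be checked with `statistics.NormalDist`, as in the following sketch. The variable names are illustrative; the rejection rule on the sum is converted to a rule on the sample mean.

```python
from math import sqrt
from statistics import NormalDist

# Exercise 4: n = 25, sigma^2 = 100, reject H0 when the sum is at least 125.
n, sigma, mu_alt = 25, 10.0, 6.0
mean_threshold = 125 / n  # same rule expressed for the sample mean

# Under mu = 6 the sample mean is N(6, sigma^2 / n).
xbar_under_alt = NormalDist(mu=mu_alt, sigma=sigma / sqrt(n))
power = 1 - xbar_under_alt.cdf(mean_threshold)

print(f"power at mu = 6: {power:.4f}")
```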
Exercise 5. An urn contains 7 balls, θ of which are red. A random sample of size 2 is drawn
without replacement to test $H_o : \theta \le 1$ against $H_a : \theta > 1$. If the null hypothesis is rejected
if one or more red balls are drawn, find the power of the test when θ = 2.
Solution : Find the power of the test when θ = 2
Let X = the number of red balls in the sample. We have RR = {x : x ≥ 1}
Then,
π(2) = P(Reject $H_0$ | θ = 2)
= P(X ≥ 1 | θ = 2)
= P(one or more red balls are drawn | the urn contains two red balls)
= 1 − P(no red ball is drawn | the urn contains two red balls)
$= 1 - \frac{5}{7}\cdot\frac{4}{6} = 1 - \frac{20}{42} = \frac{11}{21}$ (drawn without replacement)
Therefore, $\pi(2) = \frac{11}{21}$
Exercise 6. Let $X_1, X_2, \dots, X_n$ be a random sample from $N(0, \sigma^2)$.
(a) Show that $C = \left\{(x_1, x_2, \dots, x_n) : \sum_{i=1}^{n}x_i^2 \ge c\right\}$ is a best rejection region for testing
$H_0 : \sigma^2 = 4$ against $H_a : \sigma^2 = 16$.
(b) If n = 15, find the value of c so that α = 0.05. [Hint: Recall that $\sum_{i=1}^{n}X_i^2/\sigma^2$ is $\chi^2(n)$.]
(c) If n = 15 and c is the value found in part (b), find the approximate value of $\beta = P\left(\sum_{i=1}^{n}X_i^2 < c \mid \sigma^2 = 16\right)$.
Solution :
(a) Show that $RR = \left\{(x_1, \dots, x_n) : \sum_{i=1}^{n}x_i^2 \ge c\right\}$ is a best rejection region for testing $H_0 : \sigma^2 = 4$ against $H_a : \sigma^2 = 16$
We have $X_1, \dots, X_n \overset{\text{i.i.d.}}{\sim} N(0, \sigma^2)$
so, its pdf is given by:
$f(x; \sigma^2) = \frac{1}{\sqrt{2\pi}\,\sigma}\,e^{-\frac{x^2}{2\sigma^2}}$, ∀x ∈ R
The hypotheses are $H_0 : \sigma^2 = 4$ vs $H_a : \sigma^2 = 16$
By applying the Neyman-Pearson lemma, the best RR is
$RR = \left\{(x_1, \dots, x_n) \mid \frac{L(4)}{L(16)} \le k\right\}$ and P(Reject $H_0$ | $H_0$) = α
We have $L(\sigma^2) = \prod_{i=1}^{n}f(x_i, \sigma^2) = (2\pi\sigma^2)^{-n/2}\,e^{-\frac{1}{2\sigma^2}\sum_{i=1}^{n}x_i^2}$, so
$L(4) = (2\pi \cdot 4)^{-n/2}\,e^{-\frac{1}{8}\sum_{i=1}^{n}x_i^2}$
$L(16) = (2\pi \cdot 16)^{-n/2}\,e^{-\frac{1}{32}\sum_{i=1}^{n}x_i^2}$
Then,
$\frac{L(4)}{L(16)} = 2^n\,e^{\left(\frac{1}{32} - \frac{1}{8}\right)\sum_{i=1}^{n}x_i^2} = 2^n\,e^{-\frac{3}{32}\sum_{i=1}^{n}x_i^2} \le k$
$n\ln 2 - \frac{3}{32}\sum_{i=1}^{n}x_i^2 \le \ln k$
$\sum_{i=1}^{n}x_i^2 \ge \frac{32}{3}(n\ln 2 - \ln k)$
Let $\frac{32}{3}(n\ln 2 - \ln k) = c$
Therefore, $RR = \left\{(x_1, \dots, x_n) \mid \sum_{i=1}^{n}x_i^2 \ge c\right\}$
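(b) If n = 15, find the value of c so that α = 0.05.
We have
$\sum_{i=1}^{n}\frac{X_i^2}{\sigma^2} \sim \chi^2(n)$
Since,
P(Reject $H_0$ | $H_0$) = α
$P\left(\sum_{i=1}^{n}X_i^2 \ge c \mid \sigma^2 = 4\right) = \alpha$
$P\left(\frac{\sum_{i=1}^{n}X_i^2}{\sigma^2} \ge \frac{c}{4} \mid \sigma^2 = 4\right) = \alpha = 0.05$
$\frac{c}{4} = \chi^2_{15,\,0.05} \approx 25 \implies c = 100$
Therefore, c = 100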
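(c) If n = 15 and c is the value found in part (b), find the approximate value of
$\beta = P\left(\sum_{i=1}^{n}X_i^2 < c \mid \sigma^2 = 16\right)$
Dividing by σ² = 16, with c = 100,
$\beta = P\left(\sum_{i=1}^{n}\frac{X_i^2}{16} < \frac{100}{16}\right)$
$= P\left(\chi^2(15) < 6.25\right)$
$\approx 0.025$ (since $\chi^2_{15,\,0.975} = 6.262$)
Therefore, β ≈ 0.025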
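Parts (b) and (c) can be checked without tables by evaluating the chi-square CDF directly. The sketch below implements the CDF through the standard power series of the regularized lower incomplete gamma function; the helper `chi2_cdf` is a hypothetical name, not a library call.

```python
from math import exp, lgamma, log


def chi2_cdf(x, df):
    """P(chi^2(df) <= x) via the series for the regularized incomplete gamma."""
    if x <= 0:
        return 0.0
    a, z = df / 2, x / 2
    # First series term: z^a * e^(-z) / Gamma(a + 1).
    term = exp(a * log(z) - z - lgamma(a + 1))
    total, k = term, 1
    while term > 1e-15 * total:
        term *= z / (a + k)  # ratio of consecutive series terms
        total += term
        k += 1
    return total


n, c = 15, 100.0
alpha = 1 - chi2_cdf(c / 4, n)   # size of the test under sigma^2 = 4
beta = chi2_cdf(c / 16, n)       # type II error under sigma^2 = 16
print(f"alpha = {alpha:.4f}, beta = {beta:.4f}")
```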
Exercise 7. Let X have a Pareto distribution with parameter θ > 0; that is, the pdf of X
is
$f(x; \theta) = \frac{1}{\theta}\,x^{-\frac{1}{\theta}-1}$ for x > 1, and 0 otherwise.
Find the mgf of $Y_n$
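We have, with $Y_n = \frac{2}{\theta}\sum_{i=1}^{n}\ln X_i$,
$M_{Y_n}(t) = E\left(e^{tY_n}\right)$
$= E\left(e^{\frac{2t}{\theta}\sum_{i=1}^{n}\ln X_i}\right)$
$= E\left(e^{\frac{2t}{\theta}\ln X_1}\right) \times E\left(e^{\frac{2t}{\theta}\ln X_2}\right) \times \dots \times E\left(e^{\frac{2t}{\theta}\ln X_n}\right)$
$= \left[E\left(e^{\frac{2t}{\theta}\ln X}\right)\right]^n$
$= \left[M_{\ln X}\left(\frac{2t}{\theta}\right)\right]^n$
We have, for t < 1/θ,
$M_{\ln X}(t) = E\left(e^{t\ln X}\right) = E\left(X^t\right) = \int_{1}^{\infty}x^t f(x; \theta)\,dx$
$= \int_{1}^{\infty}\frac{x^t}{\theta}\,x^{-\frac{1}{\theta}-1}dx$
$= \int_{1}^{\infty}\frac{1}{\theta}\,x^{-\frac{1}{\theta}-1+t}dx$
$= \frac{1}{1 - t\theta}$
Therefore, $M_{Y_n}(t) = \left(\frac{1}{1 - \frac{2t}{\theta}\theta}\right)^n = (1 - 2t)^{-n}$ for t < 1/2,
which is the mgf of a chi-square distribution with 2n degrees of freedom. Therefore, $Y_n \sim \chi^2(2n)$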
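(b) Using the Neyman-Pearson lemma, show that the best critical region for testing $H_0 : \theta = \theta_0$
against $H_a : \theta = \theta_a$, $\theta_a > \theta_0 > 0$, at level of test α, is
$RR = \left\{(x_1, \dots, x_n) : \sum_{i=1}^{n}\ln x_i \ge c\right\}$,
By the lemma, the best critical region has the form
$RR = \left\{(x_1, \dots, x_n) : \frac{L(\theta_0)}{L(\theta_a)} \le k\right\}$
We have, $L(\theta) = \frac{1}{\theta^n}\prod_{i=1}^{n}x_i^{-1-\frac{1}{\theta}}$
Then
$\frac{L(\theta_0)}{L(\theta_a)} = \left(\frac{\theta_a}{\theta_0}\right)^n\prod_{i=1}^{n}x_i^{\frac{1}{\theta_a}-\frac{1}{\theta_0}} \le k$
$\prod_{i=1}^{n}x_i^{\frac{1}{\theta_a}-\frac{1}{\theta_0}} \le k\left(\frac{\theta_0}{\theta_a}\right)^n$
$\ln\prod_{i=1}^{n}x_i^{\frac{1}{\theta_a}-\frac{1}{\theta_0}} \le \ln\left[k\left(\frac{\theta_0}{\theta_a}\right)^n\right]$
$\left(\frac{1}{\theta_a} - \frac{1}{\theta_0}\right)\sum_{i=1}^{n}\ln x_i \le \ln\left[k\left(\frac{\theta_0}{\theta_a}\right)^n\right]$
Since $\theta_a > \theta_0$, the factor $\frac{1}{\theta_a} - \frac{1}{\theta_0} = \frac{\theta_0 - \theta_a}{\theta_0\theta_a}$ is negative, so dividing by it reverses the inequality:
$\sum_{i=1}^{n}\ln x_i \ge \frac{\theta_0\theta_a}{\theta_0 - \theta_a}\ln\left[k\left(\frac{\theta_0}{\theta_a}\right)^n\right]$
Let $c = \frac{\theta_0\theta_a}{\theta_0 - \theta_a}\ln\left[k\left(\frac{\theta_0}{\theta_a}\right)^n\right]$
Therefore, $RR = \left\{(x_1, \dots, x_n) : \sum_{i=1}^{n}\ln x_i \ge c\right\}$ is the best critical region.
⇒ P(Reject $H_0$ | $H_0$) = α
$P\left(\sum_{i=1}^{n}\ln X_i \ge c \mid \theta_0\right) = \alpha$
$P\left(\frac{2}{\theta_0}\sum_{i=1}^{n}\ln X_i \ge \frac{2c}{\theta_0}\right) = P\left(Y_n \ge \frac{2c}{\theta_0}\right) = \alpha$
Since, $Y_n \sim \chi^2(2n)$
Then, $\frac{2c}{\theta_0} = \chi^2_{2n,\,\alpha} \implies c = \frac{\theta_0}{2}\chi^2_{2n,\,\alpha}$. Therefore, $c = \frac{\theta_0}{2}\chi^2_{2n,\,\alpha}$
(c) Is the above critical region RR uniformly most powerful for testing $H_0 : \theta = \theta_0$ against
$H_a : \theta > \theta_0$ at level of test α? Justify your answer.
Remark : A test defined by a critical region C of size α is a uniformly most powerful test if
it is a most powerful test against each simple alternative in $H_a$.
Neither the form of the region nor the critical value $c = \frac{\theta_0}{2}\chi^2_{2n,\,\alpha}$ depends on $\theta_a$, so the same region is most powerful against every $\theta_a > \theta_0$.
Therefore, RR is uniformly most powerful.
(d) If n = 12, α = 0.10, $H_0 : \theta = 3$ and $H_a : \theta = 5$. Determine the critical region RR.
We have $RR = \left\{(x_1, \dots, x_n) : \sum_{i=1}^{n}\ln x_i \ge c = \frac{\theta_0}{2}\chi^2_{2n,\,\alpha}\right\}$
Then, $RR = \left\{(x_1, \dots, x_n) : \sum_{i=1}^{n}\ln x_i \ge c = \frac{3}{2}\chi^2_{24,\,0.1}\right\}$, with $\chi^2_{24,\,0.1} = 33.196$
Therefore, $RR = \left\{(x_1, \dots, x_n) : \sum_{i=1}^{n}\ln x_i \ge 49.794\right\}$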
RR = {z : z ≤ −2.575 or z ≥ 2.575}
Since the test statistic value does not lie in RR, $H_0$ is not rejected.
Therefore, $H_0$ is not rejected.
(b) If a level 0.01 test is used, what is β(94), the probability of a type II error when µ = 94?
We have
$\beta(\mu') = \Phi\left(z_{\alpha/2} + \frac{\mu_0 - \mu'}{\sigma/\sqrt{n}}\right) - \Phi\left(-z_{\alpha/2} + \frac{\mu_0 - \mu'}{\sigma/\sqrt{n}}\right)$
$= \Phi\left(2.575 + \frac{95 - 94}{1.2/4}\right) - \Phi\left(-2.575 + \frac{95 - 94}{1.2/4}\right)$
$= \Phi(5.91) - \Phi(0.76) \approx 1 - 0.776 = 0.224$
Therefore, β(94) ≈ 0.224
(c) What value of n is necessary to ensure that β(94) = 0.1 when α = 0.01?
We have the approximate solution
$n = \left[\frac{\sigma\left(z_{\alpha/2} + z_\beta\right)}{\mu_0 - \mu'}\right]^2$
Then, $n = \left[\frac{1.2(2.575 + 1.285)}{95 - 94}\right]^2 \approx 21.46$ where $z_\beta = 1.285$. So, rounding up, n = 22
Therefore, the value of n necessary to ensure that β(94) = 0.1 when α = 0.01 is 22.
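The type II error probability and the required sample size can be recomputed as follows. This is a sketch: the values $\mu_0 = 95$, σ = 1.2 and n = 16 are read off the worked numbers above, 2.575 is the table value used in the solution, and the variable names are illustrative.

```python
from math import ceil, sqrt
from statistics import NormalDist

Z = NormalDist()  # standard normal

mu0, mu_alt, sigma, n = 95.0, 94.0, 1.2, 16
z_half_alpha = 2.575  # z_{0.005}, as used in the solution

# (b) Type II error of the two-tailed z-test when the true mean is mu_alt.
shift = (mu0 - mu_alt) / (sigma / sqrt(n))
beta = Z.cdf(z_half_alpha + shift) - Z.cdf(-z_half_alpha + shift)

# (c) Approximate sample size giving beta = 0.10 at mu_alt (two-tailed test).
z_beta = Z.inv_cdf(0.90)
n_needed = ceil((sigma * (z_half_alpha + z_beta) / (mu0 - mu_alt)) ** 2)

print(f"beta(94) = {beta:.4f}, n needed = {n_needed}")
```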
Exercise 9. The desired percentage of SiO2 in a certain type of aluminous cement is 5.5.
To test whether the true average percentage is 5.5 for a particular production facility, 16
independently obtained samples are analyzed. Suppose that the percentage of SiO2 in a
sample is normally distributed with σ = 0.3 and that x̄ = 5.25.
(a) Does this indicate conclusively that the true average percentage differs from 5.5 ?
(b) If the true average percentage is µ = 5.6 and a level α = 0.01 test based on n = 16 is
used, what is the probability of detecting this departure from H0 ?
(c) What value of n is required to satisfy α = 0.01 and β(5.6) = 0.01 ?
Solution :
(a) Does this indicate conclusively that the true average percentage differs from 5.5?
Test statistic value :
$z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} = \frac{5.25 - 5.5}{0.3/4} = -3.33$
Using a two-tailed test, the rejection region is given by:
$RR = \left\{z : |z| \ge z_{\alpha/2}\right\} = \left\{z : z \ge z_{\alpha/2} \text{ or } z \le -z_{\alpha/2}\right\}$
Take α = 0.01, so $z_{\alpha/2} = \Phi^{-1}\left(1 - \frac{0.01}{2}\right) = 2.575$
Then
RR = {z : z ≥ 2.575 or z ≤ −2.575}
Since z = −3.33 ≤ −2.575, the test statistic value lies in the rejection region, so $H_0$ is rejected at level 0.01.
Therefore, the true average percentage is different from 5.5.
(b) If the true average percentage is µ = 5.6 and a level α = 0.01 test based on n = 16 is
used, what is the probability of detecting this departure from H0 ?
Therefore, n = 217
Exercise 10. The article ”Uncertainty Estimation in Railway Track Life- Cycle Cost” (J.
of Rail and Rapid Transit, 2009) presented the following data on time to repair (min) a rail
break in the high rail on a curved track of a certain railway line.
159 120 480 149 270 547 340 43 228 202 240 218
A normal probability plot of the data shows a reasonably linear pattern, so it is plausible
that the population distribution of repair time is at least approximately normal. The sample
mean and standard deviation are 249.7 and 145.1, respectively.
(a) Is there compelling evidence for concluding that true average repair time exceeds 200 min
? Carry out a test of hypotheses using a significance level of 0.05.
(b) Using σ = 150, what is the type II error probability of the test used in (a) when true
average repair time is actually 300 min ? That is, what is β(300) ?
Solution :
(a) Is there compelling evidence for concluding that true average repair time exceeds 200
min? Carry out a test of hypotheses using a significance level of 0.05.
Test statistic value :
$$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$$
Then, $t = \frac{249.7 - 200}{145.1/\sqrt{12}} = 1.1865$
For an upper-tailed test at level 0.05, the rejection region is given by :
$$RR = \{t : t \ge t_{\alpha, n-1}\}$$
Exercise 11. Given the accompanying sample data on expense ratio (%) for large-cap
growth mutual funds:
0.52 1.06 1.26 2.17 1.55 0.99 1.10 1.07 1.81 2.05
0.91 0.79 1.39 0.62 1.52 1.02 1.10 1.78 1.01 1.15
$$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$$
We have $\bar{x} = \frac{1}{20}\sum_{i=1}^{20} x_i = 1.2435$ and $s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{20} (x_i - \bar{x})^2} = 0.4484$
Then, $t = \frac{1.2435 - 1}{0.4484/\sqrt{20}} = 2.4285$
For an upper-tailed test at level 0.01, the rejection region is given by :
$$RR = \{t : t \ge t_{\alpha, n-1}\}$$
this fact.
• Type II error: the true expense ratio exceeds 1%, but based on the data we fail to reject
µ = 1%.
The source from which the data was obtained reported that µ = 1.33 > 1.
So, we actually committed a Type II error in reaching our conclusion.
(c) Supposing that σ = 0.5, determine and interpret the power of the test in (a) for the
actual value of µ stated in (b).
We have, with $z_\alpha = z_{0.01} = 2.33$ and n = 20,
$$\pi(\mu') = 1 - P(\text{Type II error}) = 1 - \Phi\left(z_\alpha + \frac{\mu_0 - \mu'}{\sigma/\sqrt{n}}\right)$$
$$= 1 - \Phi\left(2.33 + \frac{1 - 1.33}{0.5/\sqrt{20}}\right) = 1 - \Phi(-0.62) = 0.7324$$
Therefore, π(1.33) = 0.7324
Interpretation : If the true mean expense ratio is µ′ = 1.33%, the test statistic falls in the
rejection region (so H0 is correctly rejected) with probability 0.7324.
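The power calculation in (c) can be reproduced with a short Python sketch. It assumes an upper-tailed z test at level α = 0.01 (the level used in part (a)) with n = 20; the function name is illustrative. Because the quantile is computed rather than taken from a table, the result agrees with the table-based value only up to rounding.

```python
from math import sqrt
from statistics import NormalDist


def power_upper_tailed(mu0, mu_alt, sigma, n, alpha):
    """Power of the upper-tailed z test of H0: mu = mu0 when the true mean is mu_alt."""
    N = NormalDist()
    z_alpha = N.inv_cdf(1 - alpha)
    beta = N.cdf(z_alpha + (mu0 - mu_alt) / (sigma / sqrt(n)))  # P(Type II error)
    return 1 - beta


# Exercise 11(c): mu0 = 1, true mean 1.33, sigma = 0.5, n = 20, alpha = 0.01
power = power_upper_tailed(1.0, 1.33, 0.5, 20, 0.01)
print(round(power, 3))
```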
RR = {z : z ≤ −zα }
RR = {z : z ≤ −2.06}
Since the test statistic value is not in the rejection region, there is not enough
evidence to reject H0.
$$\sum_{i=1}^{78} x_i = 22.8, \qquad \sum_{i=1}^{78} (x_i - \bar{x})^2 = 2.05.$$
(a) Test the null hypothesis that µ = 0.45 against the alternative hypothesis that µ < 0.45
using α = 0.01. Also find the p-value.
(b) Test the null hypothesis that µ = 0.45 against the alternative hypothesis that µ ̸= 0.45
using α = 0.01. Also find the p-value.
(c) What assumptions did you make for solving (a) and (b)?
Solution :
(a) Test the null hypothesis that µ = 0.45 against the alternative hypothesis that µ < 0.45
using α = 0.01. Also find the p-value.
We have, Null hypothesis H0 : µ = 0.45 and Alternative hypothesis Ha : µ < 0.45
Test statistic value :
$$z = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$$
We have, $\bar{x} = \frac{22.8}{78} = 0.2923$, $s = \sqrt{\frac{2.05}{77}} = 0.1631$
Then
$$z = \frac{0.2923 - 0.45}{0.1631/\sqrt{78}} = -8.54$$
For a lower-tailed test at level 0.01, the rejection region is given by :
$$RR = \{z : z \le -z_\alpha\}$$
Exercise 14. The number of carbohydrates found in a random sample of fast-food entrees
is listed. Is there sufficient evidence to conclude that the variance differs from 100? Use the
0.05 level of significance.
53 46 39 39 30
47 38 73 43 41
Solution :
Given Information
The number of carbohydrates found in a random sample of fast-food entrees is listed below
S = {53, 46, 39, 39, 30, 47, 38, 73, 43, 41}
The Population Variance :
σ 2 = 100
The significance Level :
α = 0.05
We need to test whether the population variance differs from 100.
The Null and Alternative hypothesis are :
$$H_0 : \sigma^2 = 100; \qquad H_1 : \sigma^2 \ne 100$$
$$\bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i = 44.9$$
$$s^2 = \frac{\sum (x_i - \bar{x})^2}{N - 1} = 135.433$$
S. No. Data
1 53
2 46
3 39
4 39
5 30
6 47
7 38
8 73
9 43
10 41
MEAN 44.9
VARIANCE 135.433
Compute the test value :
$$\chi^2 = \frac{(n-1)s^2}{\sigma^2} = \frac{(10 - 1)(135.433)}{100} = 12.189$$
From Table G, the critical χ² values for d.f. = 9 with α/2 = 0.025 in each tail (two-tailed
test) are 2.700 and 19.023.
The test value 12.189 lies between the critical values, and hence we do not reject the Null
hypothesis. Therefore, we do not have enough evidence to support the claim that the variance
differs from 100.
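The sample variance and the χ² test value can be recomputed directly from the data. The sketch below uses Python's `statistics` module; the two critical values are hard-coded as chi-square table values for 9 degrees of freedom (0.975 and 0.025 upper-tail areas), which is an assumption of the example rather than something the code derives.

```python
from statistics import mean, variance

carbs = [53, 46, 39, 39, 30, 47, 38, 73, 43, 41]
sigma0_sq = 100                       # hypothesized population variance

n = len(carbs)
xbar = mean(carbs)                    # sample mean
s_sq = variance(carbs)                # sample variance (divides by n - 1)
chi_sq = (n - 1) * s_sq / sigma0_sq   # chi-square test value

# Table values for df = 9: chi2_{0.975} and chi2_{0.025} (assumed, not computed here)
lower_crit, upper_crit = 2.700, 19.023
reject = chi_sq < lower_crit or chi_sq > upper_crit
print(xbar, round(s_sq, 3), round(chi_sq, 3), reject)
```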
Exercise 15. The manager of a large company claims that the standard deviation of the
time (in minutes) that it takes a telephone call to be transferred to the correct office in her
company is 1.2 minutes or less. A random sample of 15 calls is selected, and the calls are
timed. The standard deviation of the sample is 1.8 minutes. At α = 0.01, test the claim
that the standard deviation is less than or equal to 1.2 minutes. Use the P -value method.
Solution :
From the given information, n = 15; s = 1.8 and σ = 1.2
Null hypothesis, H0 : σ ≤ 1.2
Alternative hypothesis, H1 : σ > 1.2
Level of significance, α = 0.01
Test statistic is,
$$\chi^2 = \frac{(n-1)s^2}{\sigma^2} = \frac{(15 - 1)(1.8)^2}{(1.2)^2} = 31.5$$
The degrees of freedom is,
df = n − 1 = 15 − 1 = 14
It is observed that the p-value is less than the given significance level, so we reject the null
hypothesis and conclude that there is enough evidence to reject the claim that the
standard deviation is less than or equal to 1.2 minutes.
Exercise 16. A machine fills 12 -ounce bottles with soda. For the machine to function
properly, the standard deviation of the sample must be less than or equal to 0.03 ounce. A
random sample of 8 bottles is selected, and the number of ounces of soda in each bottle is
given. At a α = 0.05, can we reject the claim that the machine is functioning properly? Use
the P -value method.
12.03 12.10 12.02 11.98
12.00 12.05 11.97 11.99
Solution :
Significance level α = 0.05, n=8
Denote appropriate null and alternative hypothesis from the claim given in the exercise :
H0 : σ = 0.03 (claim)
H1 : σ > 0.03
From the data given in the exercise we need to calculate sample mean and sample standard
deviation with the formula :
$$\bar{X} = \frac{\sum X}{n} = 12.018$$
$$s = \sqrt{\frac{\sum (X_i - \bar{X})^2}{n - 1}} = 0.043$$
Exercise 17. A coin is tossed 9 times and 3 heads appear. Can you conclude that the coin
is not balanced? Use α = 0.10. [Hint: Use the binomial table and find 2P (X ≤ 3) with
p = 0.5 and n = 9.]
Solution :
Given :
x=3
n=9
If the coin is fair, then we have 1 chance out of 2 to toss heads:
$$p = \frac{1}{2} = 0.5$$
Determine the hypotheses :
H0 : p = 0.5
Ha : p ̸= 0.5
The P-value is the probability of obtaining the value of the test statistic, or a value more
extreme, if the null hypothesis is true. Formula binomial probability :
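$$P(X = k) = \binom{n}{k} \cdot p^k \cdot (1 - p)^{n-k}$$
Evaluate at k = 0, 1, 2, 3 :
$$P(X = 0) = \binom{9}{0} \cdot 0.5^0 \cdot (1 - 0.5)^{9-0} \approx 0.0020$$
$$P(X = 1) = \binom{9}{1} \cdot 0.5^1 \cdot (1 - 0.5)^{9-1} \approx 0.0176$$
$$P(X = 2) = \binom{9}{2} \cdot 0.5^2 \cdot (1 - 0.5)^{9-2} \approx 0.0703$$
$$P(X = 3) = \binom{9}{3} \cdot 0.5^3 \cdot (1 - 0.5)^{9-3} \approx 0.1641$$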
Add the probabilities :
P (X ≤ 3) = P (X = 0) + P (X = 1) + P (X = 2) + P (X = 3)
= 0.0020 + 0.0176 + 0.0703 + 0.1641
= 0.2539
P = 2 × 0.2539 = 0.5078
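If the P-value is smaller than the significance level, then the null hypothesis is rejected.
Here P = 0.5078 > α = 0.10, so we fail to reject H0 : there is not sufficient evidence to
conclude that the coin is not balanced.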
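The binomial tail probability suggested by the hint, 2P(X ≤ 3), can be computed exactly instead of read from the binomial table. This is a small sketch with `math.comb`; the helper name `binom_cdf` is illustrative.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


# Exercise 17: n = 9 tosses, 3 heads, H0: p = 0.5, two-sided test at alpha = 0.10
p_value = 2 * binom_cdf(3, 9, 0.5)
reject = p_value < 0.10
print(round(p_value, 4), reject)
```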
Exercise 18. In the past, 20% of all airline passengers flew first class. In a sample of 15
passengers, 5 flew first class. At α = 0.10, can you conclude that the proportions have
changed?
Solution :
Can you conclude that the proportion has changed?
Let p be the proportion of passengers who fly first class.
Test H0 : p = 0.2 versus Ha : p ≠ 0.2
We have n = 15, x = 5 and α = 0.1
• $\hat{p} = \frac{x}{n} = \frac{5}{15} = 0.333$
• Test statistic value $z = \frac{\hat{p} - p_0}{\sqrt{p_0 (1 - p_0)/n}} = \frac{0.333 - 0.20}{\sqrt{0.2 \times 0.8/15}} = 1.29$
The critical region is $C = \{z : |z| \ge z_{0.05} = 1.645\}$. Since z = 1.29 ∉ C,
we do not reject H0 at α = 0.1 based on the given sample.
Therefore, we cannot conclude that the proportion of passengers who fly first class
has changed.
Exercise 19. A survey by Men’s Health magazine stated that 14% of men said they used
exercise to reduce stress. Use α = 0.10. A random sample of 100 men was selected, and 10
said that they used exercise to relieve stress. Use the P -value method to test the claim.
Solution :
Use the P-value method to test the claim
We have: α = 0.1, $p_0 = 0.14$, $\hat{p} = \frac{x}{n} = \frac{10}{100} = 0.1$
The test hypotheses : H0 : p = p0 = 0.14, Ha : p ≠ p0
The test statistic value z
$$z = \frac{\hat{p} - p_0}{\sqrt{p_0 (1 - p_0)/n}} = \frac{0.1 - 0.14}{\sqrt{(0.14)(1 - 0.14)/100}} = -1.15$$
The P-value = 2[1 − Φ(|z|)] = 2[1 − Φ(1.15)]
= 2[1 − 0.8749]
= 0.2502
Since the P-value is > 0.1, H0 is not rejected.
There is not enough evidence to reject the claim that 14% of men used exercise to reduce
stress.
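A minimal Python sketch of the one-proportion z test with a two-sided P-value follows. It recomputes the statistic from x, n and p0 using only the standard library; the function name `one_proportion_z_test` is illustrative.

```python
from math import sqrt
from statistics import NormalDist


def one_proportion_z_test(x, n, p0):
    """Large-sample z test of H0: p = p0 against a two-sided alternative."""
    p_hat = x / n
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return p_hat, z, p_value


# Exercise 19: 10 of 100 men, claimed proportion 0.14, alpha = 0.10
p_hat, z, p_value = one_proportion_z_test(10, 100, 0.14)
print(round(z, 2), round(p_value, 3), p_value < 0.10)
```

The same function applies to Exercise 18 by calling `one_proportion_z_test(5, 15, 0.2)`, although with n = 15 the normal approximation is rough.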
Exercise 20. A common characterization of obese individuals is that their body mass index
is at least 30 [BMI = weight/(height)², where height is in meters and weight is in
kilograms]. The article “The Impact of Obesity on Illness Absence and Productivity in an
Industrial Population of Petrochemical Workers” (Annals of Epidemiology, 2008: 8-14)
reported that in a sample of female workers, 262 had BMIs of less than 25, 159 had BMIs
that were at least 25 but less than 30, and 120 had BMIs exceeding 30. Is there compelling
evidence for concluding that more than 20% of the individuals in the sampled population
are obese?
(a) State and test appropriate hypotheses using the rejection region approach with a signif-
icance level of 0.05.
(b) Explain in the context of this scenario what constitutes type I and II errors.
(c) What is the probability of not concluding that more than 20% of the population is obese
when the actual percentage of obese individuals is 25% ?
Solution :
(a) State and test appropriate hypotheses using the rejection region approach with a signif-
icance level of 0.05
Note : 262 had BMIs of less than 25, 159 had BMIs that were at least 25 but less than 30,
and 120 had BMIs exceeding 30.
So, we have n = 262 + 159 + 120 = 541, x = 120
Then, $\hat{p} = \frac{120}{541} = 0.2218$
Null hypothesis: H0 : p = 0.2
Alternative hypothesis: Ha : p > 0.2
The test statistic value: $z = \frac{\hat{p} - p_0}{\sqrt{p_0 (1 - p_0)/n}} = \frac{0.2218 - 0.2}{\sqrt{0.2(1 - 0.2)/541}} = 1.27$
For an upper-tailed test at level 0.05, the rejection region is defined by :
$$RR = \{z : z \ge z_\alpha\}$$
Exercise 21. A manufacturer of nickel-hydrogen batteries randomly selects 100 nickel plates
for test cells, cycles them a specified number of times, and determines that 14 of the plates
have blistered.
(a) Does this provide compelling evidence for concluding that more than 10% of all plates
blister under such circumstances? State and test the appropriate hypotheses using a sig-
nificance level of 0.05. In reaching your conclusion, what type of error might you have
committed?
(b) If it is really the case that 15% of all plates blister under these circumstances and a
sample size of 100 is used, how likely is it that the null hypothesis of part (a) will not be
rejected by the level 0.05 test? Answer this question for a sample size of 200.
(c) How many plates would have to be tested to have β(0.15) = 0.10 for the test of part (a)?
Solution :
(a) Null hypothesis: H0 : p = 0.1
Alternative hypothesis: Ha : p > 0.1
We have n = 100, x = 14. Then, $\hat{p} = \frac{14}{100} = 0.14$
The test statistic value :
$$z = \frac{\hat{p} - p_0}{\sqrt{p_0 (1 - p_0)/n}} = \frac{0.14 - 0.1}{\sqrt{0.1(1 - 0.1)/100}} = 1.333$$
For an upper-tailed test at level 0.05, the rejection region is defined by :
$$RR = \{z : z \ge z_\alpha\}$$
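We have
$$\beta(p') = \Phi\left(\frac{p_0 - p' + z_\alpha \sqrt{p_0 (1 - p_0)/n}}{\sqrt{p' (1 - p')/n}}\right)$$
$$\Rightarrow \beta(0.15) = \Phi\left(\frac{0.1 - 0.15 + 1.645 \sqrt{0.1(1 - 0.1)/n}}{\sqrt{0.15(1 - 0.15)/n}}\right)$$
For n = 100, then
$$\beta(0.15) = \Phi\left(\frac{0.1 - 0.15 + 1.645 \sqrt{0.1(1 - 0.1)/100}}{\sqrt{0.15(1 - 0.15)/100}}\right) = 0.493$$
For n = 200, then
$$\beta(0.15) = \Phi\left(\frac{0.1 - 0.15 + 1.645 \sqrt{0.1(1 - 0.1)/200}}{\sqrt{0.15(1 - 0.15)/200}}\right) = 0.275$$
Thus,
β(0.15) = 0.493 for n = 100
β(0.15) = 0.275 for n = 200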
(c) Find n when β(0.15) = 0.10 for the test of part (a)
We have
$$n = \left(\frac{z_\alpha \sqrt{p_0 (1 - p_0)} + z_\beta \sqrt{p' (1 - p')}}{p' - p_0}\right)^2$$
For β = 0.10, then $z_\beta = 1.282$
So,
$$n = \left(\frac{1.645 \sqrt{0.1(1 - 0.1)} + 1.282 \sqrt{0.15(1 - 0.15)}}{0.15 - 0.1}\right)^2 = 361.9625 \approx 362$$
Therefore, n = 362
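The type II error probabilities of part (b) and the sample size of part (c) can be verified with the following Python sketch of the two formulas used above. The function names are illustrative, and the normal quantiles are computed with `statistics.NormalDist`, so the last digits may differ slightly from table-based values.

```python
from math import ceil, sqrt
from statistics import NormalDist

N = NormalDist()


def beta_upper_tailed(p0, p_alt, n, alpha):
    """Type II error probability of the upper-tailed proportion test at p_alt."""
    z_alpha = N.inv_cdf(1 - alpha)
    numerator = p0 - p_alt + z_alpha * sqrt(p0 * (1 - p0) / n)
    return N.cdf(numerator / sqrt(p_alt * (1 - p_alt) / n))


def sample_size_upper_tailed(p0, p_alt, alpha, beta):
    """Smallest n (rounded up) giving level alpha and type II error beta at p_alt."""
    z_alpha = N.inv_cdf(1 - alpha)
    z_beta = N.inv_cdf(1 - beta)
    root = (z_alpha * sqrt(p0 * (1 - p0)) + z_beta * sqrt(p_alt * (1 - p_alt))) / (p_alt - p0)
    return ceil(root**2)


# Exercise 21(b) and (c): p0 = 0.10, true p = 0.15, alpha = 0.05
beta_100 = beta_upper_tailed(0.10, 0.15, 100, 0.05)
beta_200 = beta_upper_tailed(0.10, 0.15, 200, 0.05)
n_required = sample_size_upper_tailed(0.10, 0.15, 0.05, 0.10)
print(round(beta_100, 3), round(beta_200, 3), n_required)
```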
Exercise 22. Let X have a Pareto distribution with parameter θ > 0; that is, the pdf of X
is
$$f(x; \theta) = \begin{cases} \frac{1}{\theta} x^{-\frac{1}{\theta} - 1}, & x > 1 \\ 0, & \text{otherwise.} \end{cases}$$
Let X1 , X2 , . . . , Xn be a random sample from this distribution.
(a) Let $Y_n = \frac{2}{\theta} \sum_{i=1}^{n} \ln X_i$. Show that Yn has a chi-squared distribution with 2n degrees of
freedom (that is, $Y_n \sim \chi^2(2n)$). (Recall that if $V \sim \chi^2(\nu)$, then the moment generating function
(mgf) of V is $G_V(t) = (1 - 2t)^{-\nu/2}$, $t < \frac{1}{2}$.)
(b) Using Neyman-Pearson lemma, show that the best critical region for testing H0 : θ = θ0
against Ha : θ = θa , θa > θ0 > 0, at level of test α, is
$$RR = \left\{ (x_1, \ldots, x_n) : \sum_{i=1}^{n} \ln x_i \ge c \right\},$$
Solution :
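(a) Let $Y_n = \frac{2}{\theta} \sum_{i=1}^{n} \ln X_i$. Show that Yn has a chi-squared distribution with 2n degrees of
freedom (that is, $Y_n \sim \chi^2(2n)$).
We have
$$f(x; \theta) = \begin{cases} \frac{1}{\theta} x^{-\frac{1}{\theta} - 1}, & x > 1 \\ 0, & \text{otherwise.} \end{cases}$$
Find mgf of Yn
We have, since X1 , . . . , Xn are independent and identically distributed,
$$M_{Y_n}(t) = E\left(e^{tY_n}\right) = E\left(e^{\frac{2t}{\theta} \sum_{i=1}^{n} \ln X_i}\right)$$
$$= E\left(e^{\frac{2t}{\theta} \ln X_1}\right) \times E\left(e^{\frac{2t}{\theta} \ln X_2}\right) \times \ldots \times E\left(e^{\frac{2t}{\theta} \ln X_n}\right)$$
$$= \left[E\left(e^{\frac{2t}{\theta} \ln X}\right)\right]^n = \left[M_{\ln X}\left(\frac{2t}{\theta}\right)\right]^n$$
And, for $t < \frac{1}{\theta}$,
$$M_{\ln X}(t) = E\left(e^{t \ln X}\right) = E\left(X^t\right) = \int_1^\infty x^t f(x; \theta)\,dx$$
$$= \int_1^\infty \frac{x^t}{\theta} x^{-\frac{1}{\theta} - 1}\,dx = \int_1^\infty \frac{1}{\theta} x^{-\frac{1}{\theta} - 1 + t}\,dx = \frac{1}{1 - t\theta}$$
Substituting $\frac{2t}{\theta}$ for t,
$$M_{Y_n}(t) = \left(\frac{1}{1 - \frac{2t}{\theta} \cdot \theta}\right)^n = (1 - 2t)^{-n} = (1 - 2t)^{-2n/2}, \quad t < \frac{1}{2},$$
which is the mgf of a chi-squared distribution with 2n degrees of freedom.
Therefore, $Y_n \sim \chi^2(2n)$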
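The distributional result can be sanity-checked by simulation. The sketch below is an illustration rather than part of the proof: it draws Pareto samples by inverting the cdf $F(x) = 1 - x^{-1/\theta}$ (so $X = U^{-\theta}$ with U uniform), forms $Y_n$, and compares the empirical mean and variance with 2n and 4n, the mean and variance of a χ²(2n) variable. The values θ = 3, n = 12 and the seed are arbitrary choices.

```python
import random
from math import log
from statistics import mean, variance


def simulate_yn(theta, n, reps, seed=0):
    """Simulate Y_n = (2/theta) * sum(ln X_i) for Pareto samples with pdf (1/theta) x^(-1/theta - 1)."""
    rng = random.Random(seed)
    values = []
    for _ in range(reps):
        total = 0.0
        for _ in range(n):
            u = 1.0 - rng.random()          # uniform on (0, 1], avoids u == 0
            x = u ** (-theta)               # inverse-cdf draw from the Pareto distribution
            total += log(x)
        values.append(2.0 * total / theta)
    return values


ys = simulate_yn(theta=3.0, n=12, reps=20000)
# A chi-squared(2n) variable has mean 2n = 24 and variance 4n = 48
print(round(mean(ys), 2), round(variance(ys), 2))
```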
(b) Using Neyman-Pearson lemma, show that the best critical region for testing H0 : θ = θ0
against Ha : θ = θa , θa > θ0 > 0, at level of test α, is
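$$RR = \left\{ (x_1, \ldots, x_n) : \sum_{i=1}^{n} \ln x_i \ge c \right\},$$
We have, $L(\theta) = \frac{1}{\theta^n} \prod_{i=1}^{n} x_i^{-1 - \frac{1}{\theta}}$
Then
$$\frac{L(\theta_0)}{L(\theta_a)} = \left(\frac{\theta_a}{\theta_0}\right)^n \prod_{i=1}^{n} x_i^{\frac{1}{\theta_a} - \frac{1}{\theta_0}} \le k$$
$$\prod_{i=1}^{n} x_i^{\frac{1}{\theta_a} - \frac{1}{\theta_0}} \le k \left(\frac{\theta_0}{\theta_a}\right)^n$$
$$\ln \prod_{i=1}^{n} x_i^{\frac{1}{\theta_a} - \frac{1}{\theta_0}} \le \ln\left[k \left(\frac{\theta_0}{\theta_a}\right)^n\right]$$
$$\left(\frac{1}{\theta_a} - \frac{1}{\theta_0}\right) \sum_{i=1}^{n} \ln x_i \le \ln\left[k \left(\frac{\theta_0}{\theta_a}\right)^n\right]$$
Since θa > θ0 , the coefficient $\frac{1}{\theta_a} - \frac{1}{\theta_0} = \frac{\theta_0 - \theta_a}{\theta_0 \theta_a}$ is negative, so dividing by it reverses the inequality:
$$\sum_{i=1}^{n} \ln x_i \ge \frac{\theta_0 \theta_a}{\theta_0 - \theta_a} \ln\left[k \left(\frac{\theta_0}{\theta_a}\right)^n\right]$$
Let $c = \frac{\theta_0 \theta_a}{\theta_0 - \theta_a} \ln\left[k \left(\frac{\theta_0}{\theta_a}\right)^n\right]$
Therefore, $RR = \left\{ (x_1, \ldots, x_n) : \sum_{i=1}^{n} \ln x_i \ge c \right\}$ is the best critical region.
The constant c is determined by the level of the test:
$$P(\text{Reject } H_0 \mid H_0) = \alpha$$
$$P\left(\sum_{i=1}^{n} \ln X_i \ge c \,\Big|\, \theta_0\right) = \alpha$$
$$P\left(\frac{2}{\theta_0} \sum_{i=1}^{n} \ln X_i \ge \frac{2c}{\theta_0}\right) = P\left(Y_n \ge \frac{2c}{\theta_0}\right) = \alpha$$
Since, $Y_n \sim \chi^2(2n)$
Then, $\frac{2c}{\theta_0} = \chi^2_{2n,\alpha} \Longrightarrow c = \frac{\theta_0}{2} \chi^2_{2n,\alpha}$. Therefore, $c = \frac{\theta_0}{2} \chi^2_{2n,\alpha}$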
(c) Is the above critical region RR uniformly most powerful for testing H0 : θ = θ0 against
Ha : θ > θ0 at level of test α ? Justify your answer.
Remark : A test defined by a critical region C of size α is a Uniformly most powerful test if
it is a most powerful test against each simple alternative in Ha .
Since the test statistic $\sum \ln x_i$ and the constant $c = \frac{\theta_0}{2} \chi^2_{2n,\alpha}$ do not depend on θa , the same
critical region is most powerful against every simple alternative θa > θ0 .
Therefore, RR is uniformly most powerful.
(d) If n = 12, α = 0.10, H0 : θ = 3 and Ha : θ = 5. Determine the critical region RR.
We have $RR = \left\{ (x_1, \ldots, x_n) : \sum_{i=1}^{n} \ln x_i \ge c = \frac{\theta_0}{2} \chi^2_{2n,\alpha} \right\}$
Then, $RR = \left\{ (x_1, \ldots, x_n) : \sum_{i=1}^{n} \ln x_i \ge c = \frac{3}{2} \chi^2_{24,0.1} \right\}$
Therefore, $RR = \left\{ (x_1, \ldots, x_n) : \sum_{i=1}^{n} \ln x_i \ge 49.794 \right\}$
(e) Find the best critical region for testing H0 : θ = 1 versus Ha : θ = θa , where θa > 1 when
α = 0.01 and n = 15.
(f) Is the test in (e) a UMP test for testing H0 : θ = 1 vs Ha : θ > 1 ? Justify your answer.
Solution :
(a) Show that $Y = \sqrt{X} \sim \text{Exp}(\theta)$
$$Y = \sqrt{X} \Leftrightarrow X = Y^2$$
$$\Rightarrow J = \frac{dx}{dy} = 2y$$
Therefore, Y ∼ Exp(θ).
(b) Find the MLE of θ
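• $L(\theta) = \prod_{i=1}^{n} f(x_i; \theta) = \prod_{i=1}^{n} \frac{1}{2\theta\sqrt{x_i}} e^{-\frac{\sqrt{x_i}}{\theta}} = (2\theta)^{-n} \frac{e^{-\frac{1}{\theta} \sum_{i=1}^{n} \sqrt{x_i}}}{\prod_{i=1}^{n} \sqrt{x_i}}$
$$\Rightarrow \ln L(\theta) = -n \ln(2\theta) - \frac{1}{\theta} \sum_{i=1}^{n} \sqrt{x_i} - \ln\left(\prod_{i=1}^{n} \sqrt{x_i}\right)$$
• $\frac{\partial}{\partial\theta} \ln L(\theta) = -\frac{n}{\theta} + \frac{1}{\theta^2} \sum_{i=1}^{n} \sqrt{x_i}$
• $\frac{\partial}{\partial\theta} \ln L(\theta) = 0 \Rightarrow \frac{n}{\theta} = \frac{1}{\theta^2} \sum_{i=1}^{n} \sqrt{x_i} \Rightarrow \theta = \frac{1}{n} \sum_{i=1}^{n} \sqrt{x_i} = \bar{y}.$
Therefore, the MLE of θ is $\hat{\theta}_n = \frac{1}{n} \sum_{i=1}^{n} \sqrt{X_i} = \bar{Y}$.
We have
$$E\left(\hat{\theta}_n\right) = E\left(\frac{1}{n} \sum_{i=1}^{n} Y_i\right) = \frac{1}{n} \sum_{i=1}^{n} E(Y_i) = \frac{n\theta}{n} = \theta \quad (1)$$
$$V\left(\hat{\theta}_n\right) = V\left(\frac{1}{n} \sum_{i=1}^{n} Y_i\right) = \frac{1}{n^2} \sum_{i=1}^{n} V(Y_i) = \frac{n\theta^2}{n^2} = \frac{\theta^2}{n}$$
Since X(Ω) = (0, ∞), then
$$I(\theta) = V\left(\frac{\partial}{\partial\theta} \ln f(x; \theta)\right) = V\left(\frac{\partial}{\partial\theta}\left[-\ln(2\theta) - \ln\sqrt{x} - \frac{\sqrt{x}}{\theta}\right]\right)$$
$$= V\left(-\frac{1}{\theta} + \frac{\sqrt{x}}{\theta^2}\right) = \frac{1}{\theta^4} V(Y) = \frac{\theta^2}{\theta^4} = \frac{1}{\theta^2}$$
$$\Rightarrow \frac{1}{nI(\theta)} = \frac{\theta^2}{n} = V\left(\hat{\theta}_n\right) \quad (2)$$
From (1) and (2), we have θ̂n is an efficient estimator of θ.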
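(c) Let $U = \frac{2}{\theta} \sum_{i=1}^{n} \sqrt{X_i} = \frac{2}{\theta} \sum_{i=1}^{n} Y_i$. Then,
$$M_U(t) = E\left(e^{tU}\right) = E\left(e^{\frac{2t}{\theta} \sum_{i=1}^{n} Y_i}\right) = \left[E\left(e^{\frac{2t}{\theta} Y}\right)\right]^n = \left[M_Y\left(\frac{2t}{\theta}\right)\right]^n$$
Since Y ∼ Exp(θ), then
$$M_Y(t) = (1 - \theta t)^{-1}, \quad t < \frac{1}{\theta}$$
$$\Rightarrow M_Y\left(\frac{2t}{\theta}\right) = \left(1 - \theta \times \frac{2t}{\theta}\right)^{-1} = (1 - 2t)^{-1}, \quad t < \frac{1}{2}$$
Therefore, $M_U(t) = (1 - 2t)^{-n} = (1 - 2t)^{-\frac{2n}{2}}$, $t < \frac{1}{2}$. Hence, $U \sim \chi^2(2n)$
(d) Find 90% CI for θ when $\sum_{i=1}^{15} \sqrt{x_i} = 47$.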
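(e) H0 : θ = 1 vs Ha : θ = θa , θa > 1
By Neyman-Pearson lemma, we have, for k > 0,
$$\frac{L(\theta_0)}{L(\theta_a)} \le k \Leftrightarrow \frac{L(1)}{L(\theta_a)} \le k$$
$$\Leftrightarrow \frac{\left(\frac{1}{2}\right)^n e^{-\sum_{i=1}^{n} \sqrt{x_i}}}{\left(\frac{1}{2\theta_a}\right)^n e^{-\frac{1}{\theta_a} \sum_{i=1}^{n} \sqrt{x_i}}} \le k$$
$$\Leftrightarrow \theta_a^n \, e^{\left(\frac{1}{\theta_a} - 1\right) \sum_{i=1}^{n} \sqrt{x_i}} \le k$$
$$\Leftrightarrow \frac{1 - \theta_a}{\theta_a} \sum_{i=1}^{n} \sqrt{x_i} \le \ln\frac{k}{\theta_a^n}$$
$$\Leftrightarrow \sum_{i=1}^{n} \sqrt{x_i} \ge \frac{\theta_a}{1 - \theta_a} \ln\frac{k}{\theta_a^n} = c$$
Thus, $RR = \left\{ (x_1, \cdots, x_n) : \sum_{i=1}^{n} \sqrt{x_i} \ge c \right\}$ where the constant c is defined by
$$\alpha = P\left(U \ge \frac{2c}{\theta} \,\Big|\, \theta = 1\right)$$
$$0.01 = P\left(U \ge \frac{2c}{\theta} \,\Big|\, \theta = 1\right), \quad U \sim \chi^2(2n)$$
$$= P(U \ge 2c), \quad n = 15$$
So, $2c = \chi^2_{0.01,30} = 50.892 \Rightarrow c = 25.446$.
Hence, $RR = \left\{ (x_1, \cdots, x_{15}) : \sum_{i=1}^{15} \sqrt{x_i} \ge 25.446 \right\}$.
(f) Since the test of H0 : θ = 1 vs Ha : θ = θa is defined by the test statistic $\sum_{i=1}^{n} \sqrt{X_i}$, and RR does not
depend on θa for each θa > 1, it is a UMP test for testing H0 : θ = 1 vs Ha : θ > 1.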
$$y = -\ln x \Leftrightarrow x = e^{-y}$$
$$\Rightarrow J = \frac{dx}{dy} = -e^{-y}$$
$$\Rightarrow |J| = e^{-y}$$
Therefore, Y ∼ Exp(θ).
(b) Find the MLE of θ
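$$L(\theta) = \prod_{i=1}^{n} f(x_i; \theta) = \prod_{i=1}^{n} \frac{1}{\theta} x_i^{\frac{1}{\theta} - 1} = \theta^{-n} \left(\prod_{i=1}^{n} x_i\right)^{\frac{1}{\theta} - 1}$$
$$\ln L(\theta) = -n \ln\theta + \left(\frac{1}{\theta} - 1\right) \sum_{i=1}^{n} \ln x_i$$
$$\frac{\partial}{\partial\theta} \ln L(\theta) = -\frac{n}{\theta} - \frac{1}{\theta^2} \sum_{i=1}^{n} \ln x_i$$
Setting the derivative equal to zero,
$$\Rightarrow \theta = -\frac{1}{n} \sum_{i=1}^{n} \ln x_i = \bar{y}$$
So, the MLE of θ is $\hat{\theta}_n = -\frac{1}{n} \sum_{i=1}^{n} \ln X_i = \bar{Y}$.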
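Since $\ln f(x; \theta) = -\ln\theta + \left(\frac{1}{\theta} - 1\right) \ln x$ then, with y = − ln x,
$$\frac{\partial}{\partial\theta} \ln f(x; \theta) = -\frac{1}{\theta} - \frac{1}{\theta^2} \ln x = -\frac{1}{\theta} + \frac{1}{\theta^2} y$$
Then
$$I(\theta) = \frac{1}{\theta^4} V(Y) = \frac{\theta^2}{\theta^4} = \frac{1}{\theta^2}$$
$$\Rightarrow \frac{1}{nI(\theta)} = \frac{\theta^2}{n} = V\left(\hat{\theta}_n\right)$$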
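$$U = \frac{2n\hat{\theta}_n}{\theta} = -\frac{2}{\theta} \sum_{i=1}^{n} \ln X_i$$
$$M_U(t) = E\left(e^{tU}\right) = E\left(e^{-\frac{2t}{\theta} \sum_{i=1}^{n} \ln X_i}\right) = \left[E\left(e^{-\frac{2t}{\theta} \ln X}\right)\right]^n = \left[E\left(X^{-\frac{2t}{\theta}}\right)\right]^n$$
Since
$$E\left(X^{-\frac{2t}{\theta}}\right) = \int_0^1 x^{-\frac{2t}{\theta}} \cdot \frac{1}{\theta} x^{\frac{1}{\theta} - 1}\,dx = \frac{1}{\theta} \int_0^1 x^{\frac{1 - 2t}{\theta} - 1}\,dx$$
$$= \frac{1}{\theta} \left[\frac{x^{\frac{1 - 2t}{\theta}}}{\frac{1 - 2t}{\theta}}\right]_0^1 = (1 - 2t)^{-1}, \quad t < \frac{1}{2}$$
So, $M_U(t) = (1 - 2t)^{-n} = (1 - 2t)^{-\frac{2n}{2}}$, $t < \frac{1}{2}$.
Hence, $U \sim \chi^2(2n)$.
$$= P\left(\chi^2_{1 - \frac{\alpha}{2}, 2n} \le \frac{2n\hat{\theta}_n}{\theta} \le \chi^2_{\frac{\alpha}{2}, 2n}\right)$$
$$= P\left(\frac{2n\hat{\theta}_n}{\chi^2_{\frac{\alpha}{2}, 2n}} \le \theta \le \frac{2n\hat{\theta}_n}{\chi^2_{1 - \frac{\alpha}{2}, 2n}}\right)$$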
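(e) H0 : θ = 1 vs Ha : θ = θa , θa > 1
$$\frac{L(\theta_0)}{L(\theta_a)} \le k \Leftrightarrow \frac{L(1)}{L(\theta_a)} \le k$$
$$\Leftrightarrow \frac{1}{\theta_a^{-n} \left(\prod_{i=1}^{n} x_i\right)^{\frac{1}{\theta_a} - 1}} \le k$$
$$\Leftrightarrow \left(\prod_{i=1}^{n} x_i\right)^{1 - \frac{1}{\theta_a}} \le k\theta_a^{-n}$$
$$\Leftrightarrow \left(1 - \frac{1}{\theta_a}\right) \sum_{i=1}^{n} \ln x_i \le \ln\left(k\theta_a^{-n}\right)$$
$$\Leftrightarrow \sum_{i=1}^{n} \ln x_i \le \frac{\theta_a}{\theta_a - 1} \ln\left(k\theta_a^{-n}\right) = c$$
Therefore, $RR = \left\{ (x_1, \cdots, x_n) : \sum_{i=1}^{n} \ln x_i \le c \right\}$ where the constant c is defined by
$$\alpha = P\left(\sum_{i=1}^{n} \ln X_i \le c \,\Big|\, \theta = 1\right)$$
$$= P\left(U \ge -\frac{2c}{\theta} \,\Big|\, \theta = 1\right), \quad U \sim \chi^2(2n)$$
$$= P(U \ge -2c)$$
(f) Since the test of H0 : θ = 1 vs Ha : θ = θa is defined by the test statistic $\sum_{i=1}^{n} \ln X_i$, and RR does not
depend on θa for each θa > 1, it is a UMP test for testing H0 : θ = 1 vs Ha : θ > 1.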
Exercise 25. Suppose that X, the fraction of a container that is filled, has pdf
$f(x; \theta) = \theta x^{\theta - 1}$ for 0 < x < 1 (where θ > 0) and zero otherwise, and let X1 , . . . , Xn be a random
sample from this distribution.
(a) Show that the most powerful test for H0 : θ = 1 versus Ha : θ = 2 rejects the null
hypothesis if $\sum \ln(x_i) \ge c$.
(b) Is the test of (a) UMP for testing H0 : θ = 1 versus Ha : θ > 1 ? Explain your reasoning.
(c) If n = 50, what is the (approximate) value of c for which the test has significance level
0.05 ?
Solution :
The probability density function is $f(x; \theta) = \theta x^{\theta - 1}$
We have n random variables describing the fraction of container that is filled.
$$\frac{f(x_1, \ldots, x_n; \theta = 2)}{f(x_1, \ldots, x_n; \theta = 1)} \ge k$$
$$\Rightarrow \frac{2^n \left(\prod_{i=1}^{n} x_i\right)^{2-1}}{1^n \left(\prod_{i=1}^{n} x_i\right)^{1-1}} \ge k$$
$$\Rightarrow \prod_{i=1}^{n} x_i \ge \frac{k}{2^n}$$
The rejection region for the most powerful test is of the form $\sum_{i=1}^{n} \ln(x_i) \ge \ln\frac{k}{2^n}$.
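(b) Is the test of (a) UMP for testing H0 : θ = 1 versus Ha : θ > 1 ? Explain your reasoning.
Yes, the test in part (a) is Uniformly Most Powerful for testing H0 : θ = 1 versus Ha : θ > 1.
Because for any θa > 1, the likelihood ratio $\theta_a^n \left(\prod x_i\right)^{\theta_a - 1}$ is increasing in $\sum_{i=1}^{n} \ln(x_i)$, so the
most powerful level α test rejects if $\sum_{i=1}^{n} \ln(x_i) \ge c$, where c is fixed by the level α under H0
and does not depend on θa .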
(c) If n = 50, what is the (approximate) value of c for which the test has significance level
0.05 ?
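Under H0 (θ = 1) the joint density is
$$f(x_1, \ldots, x_n; \theta = 1) = \begin{cases} 1 & \text{if } 0 < x_i < 1 \text{ for all } i \\ 0 & \text{otherwise} \end{cases}$$
so each Xi is uniform on (0, 1) and − ln(Xi) ∼ Exp(1). Hence E[ln(Xi)] = −1 and
V[ln(Xi)] = 1, and by the central limit theorem $\sum_{i=1}^{50} \ln(X_i)$ is approximately normal with
mean −50 and standard deviation √50.
Thus,
$$\Rightarrow P\left(\sum_{i=1}^{n} \ln(X_i) \ge c\right) = 0.05$$
$$\Rightarrow P\left(Z \ge \frac{c + 50}{\sqrt{50}}\right) = 0.05$$
$$\Rightarrow 1 - \Phi\left(\frac{c + 50}{\sqrt{50}}\right) = 0.05$$
$$\Rightarrow \Phi\left(\frac{c + 50}{\sqrt{50}}\right) = 0.95$$
$$\Rightarrow \frac{c + 50}{\sqrt{50}} = 1.645 \Rightarrow c = -38.37 \quad \text{(From normal distribution tables)}$$
Therefore, c ≈ −38.37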
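A hedged numerical check of part (c): under H0 each Xi is uniform on (0, 1), so ln(Xi) has mean −1 and variance 1, and the sum of n = 50 such terms is approximately normal with mean −50 and standard deviation √50. The Python sketch below computes the corresponding cutoff and then estimates the actual level of the test by simulation; the seed and the number of replications are arbitrary choices.

```python
import random
from math import log, sqrt
from statistics import NormalDist

n, alpha = 50, 0.05

# Normal-approximation cutoff: mean -n and standard deviation sqrt(n) under H0
z_alpha = NormalDist().inv_cdf(1 - alpha)
c = -n + z_alpha * sqrt(n)

# Monte Carlo estimate of P(sum ln X_i >= c) when X_i ~ Uniform(0, 1)
rng = random.Random(1)
reps = 20000
hits = 0
for _ in range(reps):
    stat = sum(log(1.0 - rng.random()) for _ in range(n))  # 1 - U lies in (0, 1]
    if stat >= c:
        hits += 1
level = hits / reps
print(round(c, 2), round(level, 3))
```

The simulated level is only approximately 0.05 because the sum of logarithms is skewed, so the normal approximation is not exact for n = 50.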
The P-value is the probability of obtaining a value more extreme or equal to the standardized
test statistic z. Determine the probability using table III.
If the P-value is smaller than the significance level α, then the null hypothesis is rejected.
There is not sufficient evidence to support the claim that the population means are not
equal.
b. If a 95%(1 − α) confidence interval for the mean does not contain 0 , then we reject the
null hypothesis of equal means.
If a 95%(1 − α) confidence interval for the mean contains 0 , then we fail to reject the null
hypothesis of equal means.
For confidence level 1 − α = 0.95, determine zα/2 = z0.025 using table III (look up 0.025 in
the table, the z-score is then the found z-score with opposite sign):
zα/2 = 1.96
The confidence interval contains 0 and thus the means appear to be equal.
c. The power is the probability of rejecting the null hypothesis when the alternative hypoth-
esis is true.
z = ±1.96
The corresponding sample mean difference is the population mean difference (of the hypoth-
esis) increased by the product of the z-score and the standard deviation :
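$$(\bar{x}_1 - \bar{x}_2) = (\mu_1 - \mu_2) - z_{\alpha/2} \cdot \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} = 0 - 1.96 \sqrt{\frac{10^2}{10} + \frac{5^2}{15}} \approx -6.6947$$
$$(\bar{x}_1 - \bar{x}_2) = (\mu_1 - \mu_2) + z_{\alpha/2} \cdot \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} = 0 + 1.96 \sqrt{\frac{10^2}{10} + \frac{5^2}{15}} \approx 6.6947$$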
The z-value is the sample mean difference decreased by the population mean difference
(alternative mean difference!), divided by the standard deviation :
Determine the probability of rejecting the null hypothesis using table III.
(d) Assuming equal sample sizes, what sample size should be used to obtain β = 0.05 if the
true difference in means is 3? Assume that α = 0.05.
Formula sample size :
$$n = \frac{\left(z_{\alpha/2} + z_\beta\right)^2 \left(\sigma_1^2 + \sigma_2^2\right)}{(\Delta - \Delta_0)^2}$$
• ∆0 is the null hypothesis µ1 − µ2 = ∆0 .
• ∆ is the alternative hypothesis µ1 − µ2 = ∆.
Determine zα/2 = z0.025 using table III (look up 0.025 in the table, the z-score is then the
found z-score with opposite sign) :
zα/2 = 1.96
Determine zβ = z0.05 using table III (look up 0.05 in the table, the z-score is then the found
z-score with opposite sign) :
zβ = 1.645
Fill in the known values into the formula and evaluate :
Since Ha : µ1 − µ2 > 0
⇒ P-value = 1 − Φ(z) = 1 − Φ(0.93)
= 1 − 0.8238 = 0.1762
We get P-value = 0.1762 ⇒ P-value > α = 0.05
We do not reject the null hypothesis H0 .
(b). Explain how the test could be conducted with CI.
$$CI(\mu_1 - \mu_2) = \bar{x} - \bar{y} \pm z_{\alpha/2} \sqrt{\frac{\sigma_1^2}{m} + \frac{\sigma_2^2}{n}}$$
$$= 24.5 - 21.3 \pm 1.96 \sqrt{\frac{10^2}{10} + \frac{5^2}{15}}$$
$$= [-3.49, 9.89]$$
The claim is either the null hypothesis or the alternative hypothesis. The null hypothesis
and the alternative hypothesis state the opposite of each other. The null hypothesis needs
to contain an equality.
H 0 : µ1 = µ 2
Ha : µ1 ̸= µ2
Determine the value of the test statistic :
$$z = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}} = \frac{16.018 - 16.005}{\sqrt{\frac{0.020^2}{10} + \frac{0.025^2}{10}}} \approx 1.28$$
The P-value is the probability of obtaining a value more extreme or equal to the standardized
test statistic z, assuming that the null hypothesis is true. Determine the probability using
the normal probability table.
If the P-value is smaller than the significance level α, then the null hypothesis is rejected.
∆ = 0.04
∆0 = 0
Determine zα/2 = z0.025 using the normal probability table in the appendix (look up
0.025 in the table, the z-score is then the found z-score with opposite sign) :
zα/2 = 1.96
The P-value is the probability of obtaining a value more extreme or equal to the standardized
test statistic z. Determine the probability using table III.
P = P (Z < −7.25) ≈ 0
If the P-value is smaller than the significance level α, then the null hypothesis is rejected.
There is sufficient evidence to support the claim that the second population mean is larger
than the first population mean.
b. If a 95%(1 − α) confidence interval for the mean does not contain 0 , then we reject the
null hypothesis of equal means. If a 95%(1 − α) confidence interval for the mean contains 0
, then we fail to reject the null hypothesis of equal means. For confidence level 1 − α = 0.95,
determine zα/2 = z0.025 using table III (look up 0.025 in the table, the z-score is then the found
z-score with opposite sign) :
zα/2 = 1.96
The endpoints of the confidence interval for µ1 − µ2 are :
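$$(\bar{x}_1 - \bar{x}_2) - z_{\alpha/2} \cdot \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} = (89.6 - 92.5) - 1.96 \cdot \sqrt{\frac{1.5}{15} + \frac{1.2}{20}} \approx -3.684$$
$$(\bar{x}_1 - \bar{x}_2) + z_{\alpha/2} \cdot \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} = (89.6 - 92.5) + 1.96 \cdot \sqrt{\frac{1.5}{15} + \frac{1.2}{20}} \approx -2.116$$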
The confidence interval does not contain 0 and thus the means appear to be unequal.
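The two endpoints can be reproduced with a short Python sketch of the confidence interval for µ1 − µ2 when both population variances are known. The function name is illustrative, and the variances 1.5 and 1.2 are taken from the computation above.

```python
from math import sqrt
from statistics import NormalDist


def ci_diff_means_known_var(xbar1, xbar2, var1, var2, n1, n2, conf=0.95):
    """Confidence interval for mu1 - mu2 with known population variances."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)     # z_{alpha/2}
    half_width = z * sqrt(var1 / n1 + var2 / n2)
    diff = xbar1 - xbar2
    return diff - half_width, diff + half_width


lower, upper = ci_diff_means_known_var(89.6, 92.5, 1.5, 1.2, 15, 20)
contains_zero = lower <= 0 <= upper
print(round(lower, 3), round(upper, 3), contains_zero)
```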
(c) What sample size would be required in each population if we wanted to be 95% confident
that the error in estimating the difference in mean road octane number is less than 1 ?
Formula sample size :
$$n = \frac{(z_\alpha + z_\beta)^2 \left(\sigma_1^2 + \sigma_2^2\right)}{(\Delta - \Delta_0)^2}$$
• ∆0 is the null hypothesis µ1 − µ2 = ∆0 .
• ∆ is the alternative hypothesis µ1 − µ2 = ∆.
Determine zα = z0.05 using table III (look up 0.05 in the table, the z-score is then the found
z-score with opposite sign) :
zα = 1.645
Determine zβ = z0.05 using table III (look up 0.05 in the table, the z-score is then the found
z-score with opposite sign):
zβ = 1.645
Fill in the known values into the formula and evaluate (round up to the nearest integer!) :
Exercise 5. The diameter of steel rods manufactured on two different extrusion machines
is being investigated. Two random samples of sizes n1 = 15 and n2 = 17 are selected, and
the sample means and sample variances are x̄1 = 8.73, s21 = 0.35, x̄2 = 8.68, and s22 = 0.40,
respectively. Assume that σ12 = σ22 and that the data are drawn from a normal distribution.
(a) Is there evidence to support the claim that the two machines produce rods with different
mean diameters? Use α = 0.05 in arriving at this conclusion. Find the P -value.
(b) Construct a 95% confidence interval for the difference in mean rod diameter. Interpret
this interval.
Solution :
a. Determine the hypotheses
H 0 : µ1 = µ 2
Ha : µ1 ̸= µ2
Determine the pooled standard deviation
$$s_p = \sqrt{\frac{(n_1 - 1) s_1^2 + (n_2 - 1) s_2^2}{n_1 + n_2 - 2}} = \sqrt{\frac{(15 - 1)0.35 + (17 - 1)0.40}{15 + 17 - 2}} \approx 0.6137$$
If the P-value is less than or equal to the significance level, then the null hypothesis is rejected
There is not sufficient evidence to support the claim that the population means are different.
b. Given
c = 95% = 0.95 ⇒ α = 1 − c = 1 − 0.95 = 0.05
Determine tα/2 with df = 30 using table V :
t0.025 = 2.042
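A minimal Python sketch of the pooled two-sample t procedure for this exercise follows. The function name `pooled_t` is illustrative, and the critical value 2.042 is the table value quoted above rather than something the code computes.

```python
from math import sqrt


def pooled_t(xbar1, xbar2, var1, var2, n1, n2):
    """Pooled standard deviation, standard error, t statistic and degrees of freedom."""
    df = n1 + n2 - 2
    sp = sqrt(((n1 - 1) * var1 + (n2 - 1) * var2) / df)
    se = sp * sqrt(1 / n1 + 1 / n2)
    t = (xbar1 - xbar2) / se
    return sp, se, t, df


# Exercise 5: sample means 8.73 and 8.68, sample variances 0.35 and 0.40
sp, se, t, df = pooled_t(8.73, 8.68, 0.35, 0.40, 15, 17)

t_crit = 2.042                                   # t_{0.025, 30} from the table
diff = 8.73 - 8.68
ci = (diff - t_crit * se, diff + t_crit * se)    # 95% CI for mu1 - mu2
print(round(sp, 4), round(t, 2), df, tuple(round(v, 3) for v in ci))
```

An interval that contains 0 is consistent with failing to reject H0 : µ1 = µ2 at the 0.05 level.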
Exercise 6. An article in Fire Technology investigated two different foam expanding agents
that can be used in the nozzles of fire-fighting spray equipment. A random sample of five
observations with an aqueous film-forming foam (AFFF) had a sample mean of 4.7 and a
standard deviation of 0.6. A random sample of five observations with alcohol-type concen-
trates (ATC) had a sample mean of 6.9 and a standard deviation 0.8.
(a) Can you draw any conclusions about differences in mean foam expansion? Assume that
both populations are well represented by normal distributions with the same standard devi-
ations.
(b) Find a 95% confidence interval on the difference in mean foam expansion of these two
agents.
Solution :
Given :
n1 = Sample size = 5
n2 = Sample size = 5
x̄1 = Sample mean = 4.7
x̄2 = Sample mean = 6.9
s1 = Sample standard deviation = 0.6
s2 = Sample standard deviation = 0.8
α = Significance level = 5% = 0.05
Determine tα/2 using the Student’s T distribution table, which is given in the column with
α/2 = 0.025 and in the row with df = n1 + n2 − 2 = 5 + 5 − 2 = 8 :
t0.025 = 2.306
We are 95% confident that the mean foam expansion of AFFF is between 1.1687 and 3.2313
lower than the mean foam expansion of ATC.
Since the confidence interval does not contain 0 and only contains negative values, ATC
appears to have the greater mean foam expansion, and thus we can draw a conclusion about
which agent produces the greater mean foam expansion.
Exercise 7. The deflection temperature under load for two different types of plastic pipe is
being investigated. Two random samples of 15 pipe specimens are tested, and the deflection
temperatures observed are as follows (in ◦ F) :
Type 1: 206, 188, 205, 187, 194, 193, 207, 185, 189, 213, 192, 210, 194, 178, 205
Type 2: 177, 197, 206, 201, 180, 176, 185, 200, 197, 192, 198, 188, 189, 203, 192
(a) Construct box plots and normal probability plots for the two samples. Do these plots
provide support of the assumptions of normality and equal variances? Write a practical
interpretation for these plots.
(b) Do the data support the claim that the deflection temperature under load for type 1 pipe
exceeds that of type 2? In reaching your conclusions, use α = 0.05. Calculate a P -value.
(c) If the mean deflection temperature for type 1 pipe exceeds that of type 2 by as much as
5◦ F, it is important to detect this difference with probability at least 0.90. Is the choice of
n1 = n2 = 15 adequate? Use α = 0.05.
Solution :
Given:
n1 = Sample size = 15
n2 = Sample size = 15
α = Significance level = 0.05
The mean is the sum of all values divided by the number of values:
$$\bar{x}_1 = 196.4, \qquad \bar{x}_2 = 192.0667$$
The variance is the sum of squared deviations from the mean divided by n − 1. The standard
deviation is the square root of the variance:
$$s_1 = \sqrt{\frac{(206 - 196.4)^2 + \ldots + (205 - 196.4)^2}{15 - 1}} \approx 10.4799$$
$$s_2 = \sqrt{\frac{(177 - 192.0667)^2 + \ldots + (192 - 192.0667)^2}{15 - 1}} \approx 9.4375$$
• BOXPLOT
The whiskers of the boxplot are at the minimum and maximum value. The box starts at the
lower quartile, ends at the upper quartile and has a vertical line at the median.
The lower quartile is at 25% of the sorted data list, the median at 50% and the upper quartile
at 75%.
The assumption of normality appears to be valid, because the patterns in the normal prob-
ability plots are roughly linear and contained no strong curvature.
The assumption of equal variances appears to be valid, because the boxplots have roughly
the same width.
(b) Given claim: The mean of type 2 is higher than the mean of type 1. The claim is either
the null hypothesis or the alternative hypothesis. The null hypothesis and the alternative
hypothesis state the opposite of each other. The null hypothesis needs to contain an equality.
H 0 : µ1 = µ2
H 1 : µ1 < µ 2
Determine the corresponding P-value from the Student’s T distribution table in the appendix
with df = n1 + n2 − 2 = 15 + 15 − 2 = 28
P > 0.40
If the P-value is less than or equal to the significance level, then the null hypothesis is rejected
There is not sufficient evidence to support the claim that the deflection temperature under
load for type 2 pipe exceeds that of type 1 .
Given claim : The mean of type 2 is higher than the mean of type 1. The claim is either
the null hypothesis or the alternative hypothesis. The null hypothesis and the alternative
hypothesis state the opposite of each other. The null hypothesis needs to contain an equality.
H 0 : µ1 = µ2
H 1 : µ1 < µ 2
Determine the pooled standard deviation :
$$s_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}} = \sqrt{\frac{(15 - 1)\,10.4799^2 + (15 - 1)\,9.4375^2}{15 + 15 - 2}} \approx 9.9723$$
Determine the corresponding P-value from the Student’s T distribution table in the appendix
with df = n1 + n2 − 2 = 15 + 15 − 2 = 28
P > 0.40
(c) If the mean deflection temperature for type 1 pipe exceeds that of type 2 by as much as
5◦ F, it is important to detect this difference with probability at least 0.90. Is the choice of
n1 = n2 = 15 adequate? Use α = 0.05.
∆=5
P OW ER = 0.90
β is the complement of the power, thus 1 decreased by the power.
β = 1 − P OW ER = 1 − 0.90 = 0.10
zα/2 = 1.96
Determine zβ = z0.10 using the normal probability table in the appendix (look up 0.10 in the
table, the z-score is then the found z-score with opposite sign):
zβ = 1.28
Fill in the known values into the formula and evaluate (round up!):
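$$n_1 = n_2 = \frac{(z_{\alpha/2} + z_\beta)^2 (s_1^2 + s_2^2)}{\Delta^2} = \frac{(1.96 + 1.28)^2 (10.4799^2 + 9.4375^2)}{(5 - 0)^2} \approx 83.52 \Rightarrow 84$$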
Since this required sample size is much larger than the used sample sizes of 15 , the sample
size of n1 = n2 = 15 is then not adequate.
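The sample-size step above can be checked numerically. The following is a minimal sketch, assuming the same normal-approximation formula and the table values z = 1.96 and z = 1.28 used in the solution; the function name `per_group_sample_size` is hypothetical.

```python
import math

def per_group_sample_size(z_alpha, z_beta, s1, s2, delta):
    """Return (exact, rounded-up) per-group n for detecting a mean difference delta.

    Uses n = (z_alpha + z_beta)^2 (s1^2 + s2^2) / delta^2, as in the solution.
    """
    n_exact = (z_alpha + z_beta) ** 2 * (s1 ** 2 + s2 ** 2) / delta ** 2
    return n_exact, math.ceil(n_exact)

# Values taken from part (c): z_{alpha/2} = 1.96, z_beta = 1.28, Delta = 5.
n_exact, n_required = per_group_sample_size(1.96, 1.28, 10.4799, 9.4375, 5.0)
print(f"exact n = {n_exact:.2f}, required n per group = {n_required}")
print("n = 15 adequate?", 15 >= n_required)
```

Rounding up rather than to the nearest integer guarantees that the power requirement is met.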
Exercise 8. Two companies manufacture a rubber material intended for use in an automo-
tive application. The part will be subjected to abrasive wear in the field application, so we
decide to compare the material produced by each company in a test. Twenty-five samples
of material from each company are tested in an abrasion test, and the amount of wear after
1000 cycles is observed. For company 1, the sample mean and standard deviation of wear
are x̄1 = 20 milligrams/1000 cycles and s1 = 2 milligrams/1000 cycles, while for company 2
we obtain x̄2 = 15 milligrams/1000 cycles and s2 = 8 milligrams/1000 cycles.
(a) Do the data support the claim that the two companies produce material with different
mean wear? Use α = 0.05, and assume each population is normally distributed but that
their variances are not equal. What is the P -value for this test?
(b) Do the data support a claim that the material from company 1 has higher mean wear
than the material from company 2 ? Use the same assumptions as in part (a).
(c) Construct confidence intervals that will address the questions in parts (a) and (b) above.
Solution :
Given x̄1 = 20, s1 = 2, n1 = 25, x̄2 = 15, s2 = 8, n2 = 25
(a) Given claim: Different
The claim is either the null hypothesis or the alternative hypothesis. The null hypothesis
and the alternative hypothesis state the opposite of each other. The null hypothesis needs
to contain an equality.
H 0 : µ1 = µ 2
Ha : µ1 ̸= µ2
Determine the test statistic
$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} = \frac{20 - 15}{\sqrt{\frac{2^2}{25} + \frac{8^2}{25}}} \approx 3.0317$$
The P-value is the probability of obtaining the value of the test statistic, or a value more ex-
treme. The P-value is the number (or interval) in the column title of Student’s T distribution
in the appendix containing the t-value in the row df = 27
0.005 = 2 × 0.0025 < P < 2 × 0.005 = 0.01
If the P-value is less than or equal to the significance level, then the null hypothesis is rejected:
P < 0.05 ⇒ Reject H0
There is sufficient evidence to support the claim that the two companies produce material
with different mean wear.
(b) Given claim: Higher for company 1
The claim is either the null hypothesis or the alternative hypothesis. The null hypothesis
and the alternative hypothesis state the opposite of each other. The null hypothesis needs
to contain an equality.
H 0 : µ1 = µ 2
H a : µ1 > µ 2
Determine the test statistic
$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} = \frac{20 - 15}{\sqrt{\frac{2^2}{25} + \frac{8^2}{25}}} \approx 3.0317$$
The P-value is the probability of obtaining the value of the test statistic, or a value more ex-
treme. The P-value is the number (or interval) in the column title of Student’s T distribution
in the appendix containing the t-value in the row df = 27
0.0025 < P < 0.005
If the P-value is less than or equal to the significance level, then the null hypothesis is rejected:
P < 0.05 ⇒ Reject H0
There is sufficient evidence to support the claim that the material from company 1 has higher
mean wear than the material from company 2.
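The test statistic and the degrees of freedom used in parts (a) and (b) can be reproduced from the summary statistics alone. This is a small sketch, assuming the unpooled (Welch) t-test described above; `welch_t` is a hypothetical helper name, and the P-value itself is still read from the t table.

```python
import math

def welch_t(mean1, s1, n1, mean2, s2, n2):
    """Unpooled t statistic and Welch-Satterthwaite degrees of freedom."""
    v1 = s1 ** 2 / n1  # estimated variance of the first sample mean
    v2 = s2 ** 2 / n2  # estimated variance of the second sample mean
    t_stat = (mean1 - mean2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t_stat, df

# Summary statistics for company 1 and company 2.
t_stat, df = welch_t(20, 2, 25, 15, 8, 25)
print(f"t = {t_stat:.4f}, unrounded df = {df:.2f}")
```

The unrounded degrees of freedom come out just below 27, which is the table row used in the solution.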
Exercise 9. The thickness of a plastic film (in mils) on a substrate material is thought to
be influenced by the temperature at which the coating is applied. A completely randomized
experiment is carried out. Eleven substrates are coated at 125◦ F, resulting in a sample mean
coating thickness of x̄1 = 103.5 and a sample standard deviation of s1 = 10.2. Another
13 substrates are coated at 150◦ F, for which x̄2 = 99 and s2 = 20.1 are observed. It
was originally suspected that raising the process temperature would reduce mean coating
thickness.
(a) Do the data support this claim? Use α = 0.01 and assume that the two population
standard deviations are not equal. Calculate an approximate P -value for this test.
(b) How could you have answered the question posed regarding the effect of temperature on
coating thickness by using a confidence interval? Explain your answer.
Solution :
Given n1 = 11, n2 = 13, x̄1 = 103.5, x̄2 = 99, s1 = 10.2, s2 = 20.1, α = 0.01
a. The variances are assumed to be non-equal, thus we need to use the unpooled t-test
H 0 : µ1 = µ2
H 1 : µ1 > µ 2
Determine the test statistic :
$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} = \frac{103.5 - 99}{\sqrt{\frac{10.2^2}{11} + \frac{20.1^2}{13}}} \approx 0.7067$$
Determine the degrees of freedom (rounded down to the nearest integer) :
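$$\text{df} = \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{(s_1^2/n_1)^2}{n_1 - 1} + \frac{(s_2^2/n_2)^2}{n_2 - 1}} = \frac{\left(\frac{10.2^2}{11} + \frac{20.1^2}{13}\right)^2}{\frac{(10.2^2/11)^2}{11 - 1} + \frac{(20.1^2/13)^2}{13 - 1}} \approx 18$$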
The P-value is the probability of obtaining the value of the test statistic, or a value more
extreme. The P-value is the number (or interval) in the column title of Table V containing
the t-value in the row df = 18 :
0.10 < P < 0.25
If the P-value is less than or equal to the significance level, then the null hypothesis is rejected
P > 0.01 ⇒ Fail to reject H0
There is not sufficient evidence to support the claim that the process temperature would
reduce the mean coating thickness.
b. If we construct a confidence interval and the confidence interval contains 0 , then we will
fail to reject the null hypothesis H0 and there will not be sufficient evidence to support the
claim of different population means. If the confidence interval does not contain 0 , then we
will reject the null hypothesis H0 and there is sufficient evidence to support the claim of
different population means.
α = 0.01 ⇒ c = 1 − α = 1 − 0.01 = 0.99 = 99%
Determine the degrees of freedom (rounded down to the nearest integer)
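$$\text{df} = \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{(s_1^2/n_1)^2}{n_1 - 1} + \frac{(s_2^2/n_2)^2}{n_2 - 1}} = \frac{\left(\frac{10.2^2}{11} + \frac{20.1^2}{13}\right)^2}{\frac{(10.2^2/11)^2}{11 - 1} + \frac{(20.1^2/13)^2}{13 - 1}} \approx 18$$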
Determine the t-value by looking in the row starting with degrees of freedom df = 18 and in
the column with α = (1 − c)/2 = 0.005 in the Student’s t distribution table in the appendix
t0.005 = 2.878
The endpoints of the confidence interval for µ1 − µ2 are
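$$(\bar{x}_1 - \bar{x}_2) - t_{\alpha/2} \cdot \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} = (103.5 - 99) - 2.878 \cdot \sqrt{\frac{10.2^2}{11} + \frac{20.1^2}{13}} \approx -13.8235$$
$$(\bar{x}_1 - \bar{x}_2) + t_{\alpha/2} \cdot \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} = (103.5 - 99) + 2.878 \cdot \sqrt{\frac{10.2^2}{11} + \frac{20.1^2}{13}} \approx 22.8235$$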
Since the confidence interval (−13.8235, 22.8235) contains 0 , there is not sufficient evidence
to support the claim that the process temperature would reduce the mean coating thickness.
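The interval endpoints can be verified with a few lines of code. This is a sketch under the assumption that the critical value 2.878 is taken from the t table exactly as in the solution; `welch_interval` is a hypothetical name.

```python
import math

def welch_interval(mean1, s1, n1, mean2, s2, n2, t_crit):
    """Two-sided confidence interval for mu1 - mu2 with unequal variances."""
    se = math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)  # standard error of the difference
    diff = mean1 - mean2
    return diff - t_crit * se, diff + t_crit * se

# Coating thickness at 125 F (sample 1) and 150 F (sample 2); t_crit from the table.
lower, upper = welch_interval(103.5, 10.2, 11, 99, 20.1, 13, t_crit=2.878)
contains_zero = lower <= 0 <= upper
print(f"({lower:.4f}, {upper:.4f}), contains 0: {contains_zero}")
```

Because 0 lies inside the interval, the code reaches the same decision as the hypothesis test.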
Exercise 10. Fifteen adult males between the ages of 35 and 50 participated in a study to
evaluate the effect of diet and exercise on blood cholesterol levels. The total cholesterol was
measured in each subject initially and then three months after participating in an aerobic
exercise program and switching to a low-fat diet. The data are shown in the accompanying
table
Blood Cholesterol Level
Subject Before After
1 265 229
2 240 231
3 258 227
4 295 240
5 251 238
6 245 241
7 287 234
8 314 256
9 260 247
10 279 239
11 283 246
12 240 218
13 238 219
14 225 226
15 247 233
(a) Do the data support the claim that low-fat diet and aerobic exercise are of value in
producing a mean reduction in blood cholesterol levels? Use α = 0.05. Find the P -value.
(b) Calculate a one-sided confidence limit that can be used to answer the question in part
(a).
Solution :
(a) Claim: Decrease
The claim is either the null hypothesis or the alternative hypothesis. The null hypothesis
and the alternative hypothesis state the opposite of each other. The null hypothesis needs
to contain an equality.
H 0 : µd = 0
H a : µd > 0
Determine the sample mean of the differences. The mean is the sum of all values divided by
the number of values.
$$\bar{d} = \frac{36 + 9 + 31 + \cdots + 19 - 1 + 14}{15} \approx 26.8667$$
Determine the sample standard deviation of the differences
$$s_d = \sqrt{\frac{(36 - 26.8667)^2 + \cdots + (14 - 26.8667)^2}{15 - 1}} \approx 19.0371$$
Determine the value of the test statistic
$$t = \frac{\bar{d}}{s_d / \sqrt{n}} = \frac{26.8667}{19.0371 / \sqrt{15}} \approx 5.4659$$
The P-value is the probability of obtaining the value of the test statistic, or a value more
extreme, assuming that the null hypothesis is true. The P-value is the number (or interval)
in the column title of the Student’s t distribution table in the appendix containing the t-value in
the row df = n − 1 = 15 − 1 = 14
P < 0.0005
If the P-value is less than or equal to the significance level, then the null hypothesis is rejected
P < 0.05 ⇒ Reject H0
There is sufficient evidence to support the claim that low-fat diet and aerobic exercise are
of value in producing a mean reduction in blood cholesterol levels.
(b) Determine the t-value using the Student’s T distribution table in the appendix with
• df = n − 1 = 15 − 1 = 14
• α = (1 − c)/2 = 0.025
t∗ = 2.145
The margin of error is then
$$E = t^* \cdot \frac{s_d}{\sqrt{n}} = 2.145 \cdot \frac{19.0371}{\sqrt{15}} \approx 10.5434$$
The endpoints of the confidence interval for µd are
$$\bar{d} - E = 26.8667 - 10.5434 = 16.3233$$
$$\bar{d} + E = 26.8667 + 10.5434 = 37.4101$$
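The paired-sample calculations above can be reproduced directly from the table. This is a minimal sketch using only the standard library; the critical value 2.145 is copied from the solution rather than computed, and the variable names are illustrative.

```python
import math
import statistics

before = [265, 240, 258, 295, 251, 245, 287, 314, 260, 279, 283, 240, 238, 225, 247]
after = [229, 231, 227, 240, 238, 241, 234, 256, 247, 239, 246, 218, 219, 226, 233]

# Differences are taken as before - after, so a reduction is positive.
diffs = [b - a for b, a in zip(before, after)]
n = len(diffs)
d_bar = statistics.mean(diffs)
s_d = statistics.stdev(diffs)  # sample standard deviation (divides by n - 1)
t_stat = d_bar / (s_d / math.sqrt(n))

t_crit = 2.145  # table value used in part (b), df = 14
margin = t_crit * s_d / math.sqrt(n)
print(f"d_bar = {d_bar:.4f}, s_d = {s_d:.4f}, t = {t_stat:.4f}")
print(f"interval: ({d_bar - margin:.4f}, {d_bar + margin:.4f})")
```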
Exercise 11. Two different analytical tests can be used to determine the impurity level in
steel alloys. Eight specimens are tested using both procedures, and the results are shown in
the following tabulation.
Specimen Test 1 Test 2
1 1.2 1.4
2 1.3 1.7
3 1.5 1.5
4 1.4 1.3
5 1.7 2.0
6 1.8 2.1
7 1.4 1.7
8 1.3 1.6
(a) Is there sufficient evidence to conclude that tests differ in the mean impurity level, using
α = 0.01 ?
(b) Is there evidence to support the claim that Test 1 generates a mean difference 0.1 units
lower than Test 2? Use α = 0.05.
(c) If the mean from Test 1 is 0.1 less than the mean from Test 2, it is important to detect
this with probability at least 0.90. Was the use of eight alloys an adequate sample size? If
not, how many alloys should have been used?
Solution :
Given:
n=8
α = 0.01
If the P-value is less than the significance level, reject the null hypothesis.
There is not sufficient evidence to support the claim that tests differ in the mean impurity
level.
b.Given
α = 0.05
Determine the hypotheses
H0 : µd = −0.1
H1 : µd < −0.1
Determine the value of the test statistic
$$t = \frac{\bar{d} - \mu_d}{s_d / \sqrt{n}} = \frac{-0.2125 - (-0.1)}{0.1727 / \sqrt{8}} \approx -1.842$$
The P-value is the probability of obtaining the value of the test statistic, or a value more
extreme. The P-value is the number (or interval) in the column title of Table V containing
the t-value in the row df = n − 1 = 8 − 1 = 7 :
0.05 < P < 0.10
If the P-value is less than the significance level, reject the null hypothesis:
P > 0.05 ⇒ Fail to reject H0
There is not sufficient evidence to support the claim that test 1 generates a mean difference
0.1 units lower than test 2.
c. Given
∆ = −0.1
P OW ER = 0.90
α = 0.05
β is the complement of the power, thus 1 decreased by the power:
β = 1 − P OW ER = 1 − 0.90 = 0.10
The sample size of 8 is not adequate, because the minimal sample size required to detect a
difference of 0.1 is 25 .
f0.25,5,10 = 1.59
b. The critical value f0.10,24,9 is given in the second table of table VI in the row with
• df d = 9 and in the column with df n = 24
f0.10,24,9 = 2.28
c. The critical value f0.05,8,15 is given in the third table of table VI in the row with
• df d = 15 and in the column with df n = 8
f0.05,8,15 = 2.64
Property of the F-distribution:
$$f_{1-\alpha,u,v} = \frac{1}{f_{\alpha,v,u}}$$
d. The critical value f0.25,10,5 is given in the first table of table VI in the row with
• df d = 5 and in the column with df n = 10
f0.25,10,5 = 1.89
e. The critical value f0.10,9,24 is given in the second table of table VI in the row with
• df d = 24 and in the column with df n = 9
f0.10,9,24 = 1.91
f. The critical value f0.05,15,8 is given in the third table of table VI in the row with
• df d = 8 and in the column with df n = 15
f0.05,15,8 = 3.22
Exercise 13. Consider the hypothesis test H0 : σ12 = σ22 against Ha : σ12 < σ22 . Suppose that
the sample sizes are n1 = 5 and n2 = 10, and that s21 = 23.2 and s22 = 28.8. Use α = 0.05.
Test the hypothesis and explain how the test could be conducted with a confidence interval
on σ1 /σ2 .
Solution :
Given
H0 : σ12 = σ22
H1 : σ12 < σ22
n1 = 5 and n2 = 10
$s_1^2 = 23.2$, $s_2^2 = 28.8$ and α = 0.05
$$F = \frac{s_1^2}{s_2^2} = \frac{23.2}{28.8} \approx 0.806$$
The critical value is given in the third table of table VI in the row with df n = n1 − 1 = 4
and in the column with df d = n2 − 1 = 10 − 1 = 9
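$$f_{0.95,4,9} = \frac{1}{f_{0.05,9,4}} = \frac{1}{6.00} \approx 0.1667$$
The rejection region contains all values smaller than 0.1667. Since the value of the test statistic is larger than 0.1667, we fail to reject H0.
There is not sufficient evidence to support the claim that the first variance is smaller than the
second variance.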
Exercise 14. Consider the hypothesis test H0 : σ12 = σ22 against Ha : σ12 ̸= σ22 . Suppose
that the sample sizes are n1 = 15 and n2 = 15, and the sample variances are s21 = 2.3 and
s22 = 1.9. Use α = 0.05.
(a) Test the hypothesis and explain how the test could be conducted with a confidence
interval on σ1 /σ2 .
(b) What is the power of the test in part (a) if σ1 is twice as large as σ2 ?
(c) Assuming equal sample sizes, what sample size should be used to obtain β = 0.05 if the
σ2 is half of σ1 ?
Solution :
Exercise 15. A study was performed to determine whether men and women differ in their
repeatability in assembling components on printed circuit boards. Random samples of 25
men and 21 women were selected, and each subject assembled the units. The two sample
standard deviations of assembly time were smen = 0.98 minutes and swomen = 1.02 minutes.
(a) Is there evidence to support the claim that men and women differ in repeatability for
this assembly task? Use α = 0.02 and state any necessary assumptions about the underlying
distribution of the data.
(b) Find a 98% confidence interval on the ratio of the two variances. Provide an interpretation
of the interval.
Solution :
a. Determine the hypotheses :
H0 : σ12 = σ22
H1 : σ12 ̸= σ22
Compute the value of the test statistic :
$$F = \frac{s_1^2}{s_2^2} = \frac{0.98^2}{1.02^2} \approx 0.923$$
The critical value is given in the third table of table VI in the row with
• df d = n2 − 1 = 21 − 1 = 20
• df n = n1 − 1 = 25 − 1 = 24
⇒ f0.01,24,20 = 2.86
$$f_{0.99,24,20} = \frac{1}{f_{0.01,20,24}} = \frac{1}{2.74} \approx 0.3650$$
The rejection region contains all values smaller than 0.3650 and all values larger than 2.86.
If the value of the test statistic is in the rejection region, then reject the null hypothesis:
0.3650 < 0.923 < 2.86 ⇒ Fail to reject H0
There is not sufficient evidence to support the claim that the two population variances differ.
b. The boundaries of the confidence interval are then
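$$f_{0.01,24,20} \cdot \frac{s_1^2}{s_2^2} = 2.86 \cdot \frac{0.98^2}{1.02^2} \approx 2.6401$$
$$f_{0.99,24,20} \cdot \frac{s_1^2}{s_2^2} = 0.3650 \cdot \frac{0.98^2}{1.02^2} \approx 0.3369$$
We are 98% confident that the ratio of the population variances is between 0.3369 and 2.6401. Since this interval contains 1, equal variances remain plausible.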
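The test and the interval in this exercise use the same two table values, which a short sketch makes explicit. It mirrors the construction used in the solution (critical values 2.86 and 1/2.74 read from the F table, not computed); `variance_ratio_test` is a hypothetical helper name.

```python
def variance_ratio_test(s1, s2, f_lower, f_upper):
    """Two-sided F test for equal variances plus the matching interval bounds."""
    f_stat = s1 ** 2 / s2 ** 2
    reject = f_stat < f_lower or f_stat > f_upper
    interval = (f_lower * f_stat, f_upper * f_stat)
    return f_stat, reject, interval

# s_men = 0.98, s_women = 1.02; table values f_{0.01,24,20} and 1 / f_{0.01,20,24}.
f_stat, reject, (ci_low, ci_high) = variance_ratio_test(
    0.98, 1.02, f_lower=1 / 2.74, f_upper=2.86
)
print(f"F = {f_stat:.3f}, reject H0: {reject}")
print(f"98% interval for the variance ratio: ({ci_low:.4f}, {ci_high:.4f})")
```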
Exercise 16. To measure air pollution in a home, let X and Y equal the amount of sus-
pended particulate matter (in mg/m3 ) measured during a 24-hour period in a home in which
there is no smoker and a home in which there is a smoker, respectively. We shall test the
null hypothesis $H_0: \sigma_1^2 / \sigma_2^2 = 1$ versus $H_a: \sigma_1^2 / \sigma_2^2 > 1$.
(a) If a random sample of size m = 9 yielded x̄ = 93 and sx = 12.9 while a random sample
of size n = 11 yielded ȳ = 132 and sy = 7.1, define a critical region and give your conclusion
if α = 0.05.
(b) Now test H0 : µ1 = µ2 against Ha : µ1 < µ2 if α = 0.05.
Solution :
(a) Determine the value of the test statistic :
$$F = \frac{s_X^2}{s_Y^2} = \frac{12.9^2}{7.1^2} \approx 3.301$$
The critical value for α = 0.05 is given in the first table of table VII in the row with
• df d = n2 − 1 = 11 − 1 = 10
• df n = n1 − 1 = 9 − 1 = 8
⇒ F = 3.07
The rejection region contains all values larger than 3.07.
If the value of the test statistic is in the rejection region, then reject the null hypothesis:
3.301 > 3.07 ⇒ Reject H0
(b) Given :
H 0 : µX = µY
H 1 : µX < µ Y
Since in part (a) we concluded that the population variances were not equal, we need to use
Welch’s t test.
Determine the degrees of freedom (rounded down to the nearest integer):
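$$\text{df} = \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{(s_1^2/n_1)^2}{n_1 - 1} + \frac{(s_2^2/n_2)^2}{n_2 - 1}} = \frac{\left(\frac{12.9^2}{9} + \frac{7.1^2}{11}\right)^2}{\frac{(12.9^2/9)^2}{9 - 1} + \frac{(7.1^2/11)^2}{11 - 1}} \approx 11$$
For a left-tailed test with α = 0.05 and df = 11, the critical value is
t = −1.796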
The rejection region then contains all values smaller than −1.796. Determine the test statistic
$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} = \frac{93 - 132}{\sqrt{\frac{12.9^2}{9} + \frac{7.1^2}{11}}} \approx -8.119$$
If the value of the test statistic is within the rejection region, then the null hypothesis is
rejected :
−8.119 < −1.796 ⇒ Reject H0
Exercise 17. Two different types of injection-molding machines are used to form plastic
parts. A part is considered defective if it has excessive shrinkage or is discolored. Two
random samples, each of size 300 , are selected, and 15 defective parts are found in the
sample from machine 1 while 8 defective parts are found in the sample from machine 2 .
(a) Is it reasonable to conclude that both machines produce the same fraction of defective
parts, using α = 0.05 ? Find the P -value for this test.
(b) Construct a 95% confidence interval on the difference in the two fractions defective.
(c) Suppose that p1 = 0.05 and p2 = 0.01. With the sample sizes given here, what is the
power of the test for this two-sided alternate?
(d) Suppose that p1 = 0.05 and p2 = 0.01. Determine the sample size needed to detect this
difference with a probability of at least 0.9.
Solution :
The P-value is the probability of obtaining the value of the test statistic, or a value more
extreme. Determine the Pvalue using the normal probability table in the appendix :
If the P-value is smaller than the significance level, then reject the null hypothesis :
There is not sufficient evidence to support the claim that there is a difference in these
proportions.
b.Given
c = 95% = 0.95
For confidence level 1 − α = 0.95, determine zα/2 = z0.025 using the normal probability
table in the appendix (look up 0.025 in the table, the z-score is then the found z-score with
opposite sign) :
zα/2 = 1.96
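The boundaries of the confidence interval for p1 − p2 are then :
$$(\hat{p}_1 - \hat{p}_2) - z_{\alpha/2} \cdot \sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}} = (0.05 - 0.0267) - 1.96 \sqrt{\frac{0.05(1 - 0.05)}{300} + \frac{0.0267(1 - 0.0267)}{300}} \approx -0.0074$$
$$(\hat{p}_1 - \hat{p}_2) + z_{\alpha/2} \cdot \sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}} = (0.05 - 0.0267) + 1.96 \sqrt{\frac{0.05(1 - 0.05)}{300} + \frac{0.0267(1 - 0.0267)}{300}} \approx 0.0540$$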
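The two boundaries can be reproduced from the defect counts. The following is a small sketch of the large-sample interval for a difference in proportions, assuming z = 1.96 from the normal table; `diff_proportion_ci` is a hypothetical name.

```python
import math

def diff_proportion_ci(x1, n1, x2, n2, z):
    """Large-sample confidence interval for p1 - p2."""
    p1 = x1 / n1
    p2 = x2 / n2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff - z * se, diff + z * se

# 15 of 300 defective parts on machine 1, 8 of 300 on machine 2.
lower, upper = diff_proportion_ci(15, 300, 8, 300, z=1.96)
print(f"95% interval: ({lower:.4f}, {upper:.4f})")
```

The interval contains 0, which agrees with the conclusion that a difference in the two fractions defective has not been shown.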
c. Given
p1 = 0.05
p2 = 0.01
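Formula for β (the power is 1 − β) :
$$\beta = \Phi\left(\frac{z_{\alpha/2}\sqrt{\bar{p}\,\bar{q}\,(1/n_1 + 1/n_2)} - (p_1 - p_2)}{\sqrt{\frac{p_1(1 - p_1)}{n_1} + \frac{p_2(1 - p_2)}{n_2}}}\right) - \Phi\left(\frac{-z_{\alpha/2}\sqrt{\bar{p}\,\bar{q}\,(1/n_1 + 1/n_2)} - (p_1 - p_2)}{\sqrt{\frac{p_1(1 - p_1)}{n_1} + \frac{p_2(1 - p_2)}{n_2}}}\right)$$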
Determine zα/2 = z0.025 using the normal probability table in the appendix (look up 0.025
in the table, the z-score is then the found z-score with opposite sign) :
zα/2 = 1.96
With p̄ = (p1 + p2)/2 = 0.03 and q̄ = 1 − p̄ = 0.97 (equal sample sizes) :
β = Φ(−0.92) − Φ(−4.87) ≈ 0.1788 − 0 = 0.1788
P OW ER = 1 − β = 1 − 0.1788 = 0.8212
d. Given :
P OW ER = 0.9
β is the complement of the power :
β = 1 − P OW ER = 1 − 0.9 = 0.1
Determine zα/2 = z0.025 using the normal probability table in the appendix (look up 0.025
in the table, the z-score is then the found z-score with opposite sign) :
zα/2 = 1.96
Determine zβ = z0.10 using the normal probability table in the appendix (look up 0.10 in the
table, the z-score is then the found z-score with opposite sign) :
zβ = 1.28
Exercise 18. Recent incidents of food contamination have caused great concern among
consumers. The article ”How Safe Is That Chicken?” (Consumer Reports, Jan. 2010: 19-23)
reported that 35 of 80 randomly selected Perdue brand broilers tested positively for either
campylobacter or salmonella (or both), the leading bacterial causes of food-borne disease,
whereas 66 of 80 Tyson brand broilers tested positive.
(a) Does it appear that the true proportion of non-contaminated Perdue broilers differs from
that for the Tyson brand? Carry out a test of hypotheses using a significance level 0.01 by
obtaining a P -value.
(b) If the true proportions of non-contaminated chickens for the Perdue and Tyson brands
are 0.50 and 0.25, respectively, how likely is it that the null hypothesis of equal proportions
will be rejected when a 0.01 significance level is used and the sample sizes are both 80 ?
Solution :
(a) Claim: p1 ̸= p2
The claim is either the null hypothesis or the alternative hypothesis. The null hypothesis
states an equality. Since the null hypothesis is not the claim, the alternative hypothesis is
the claim.
H0 : p 1 = p 2
Ha : p1 ̸= p2
The sample proportion is the number of successes divided by the sample size :
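$$\hat{p}_1 = \frac{x_1}{n_1} = \frac{35}{80} \approx 0.4375$$
$$\hat{p}_2 = \frac{x_2}{n_2} = \frac{66}{80} \approx 0.825$$
$$\hat{p}_p = \frac{x_1 + x_2}{n_1 + n_2} = \frac{35 + 66}{80 + 80} \approx 0.63125$$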
The critical values are the values corresponding to a probability of 0.005/0.995 in table A.3
z = ±2.575
The rejection region then contains all values below −2.575 and all values above 2.575.
Determine the value of the test statistic :
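$$z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}_p(1 - \hat{p}_p)}\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} = \frac{0.4375 - 0.825}{\sqrt{0.63125(1 - 0.63125)}\sqrt{\frac{1}{80} + \frac{1}{80}}} \approx -5.08$$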
If the value of the test statistic is within the rejection region, then the null hypothesis is
rejected :
−5.08 < −2.575 ⇒ Reject H0
There is sufficient evidence to support the claim that the true proportion of non-contaminated
Perdue broilers differs from that for the Tyson brand.
(b) Given
p1 = 0.50
p2 = 0.25
The critical values are the values corresponding to a probability of 0.005/0.995 in table A.3
z = ±2.575
Determine the difference in proportions that correspond with these z-values (assuming null
hypothesis p1 = p2 is true):
$$\hat{p}_1 - \hat{p}_2 = (p_1 - p_2) \pm z\sqrt{\hat{p}_p(1 - \hat{p}_p)}\sqrt{\frac{1}{n_1} + \frac{1}{n_2}} = 0 \pm 2.575\sqrt{0.63125(1 - 0.63125)}\sqrt{\frac{1}{80} + \frac{1}{80}} \approx \pm 0.196$$
Determine the z-score corresponding with these difference in proportions, assuming that the
alternative hypothesis is true (Note: We use p1 and p2 instead of p̂p , because p̂p is unknown
since we do not know the sample proportions) :
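$$z = \frac{\hat{p}_1 - \hat{p}_2 - (p_1 - p_2)}{\sqrt{\frac{p_1(1 - p_1)}{n_1} + \frac{p_2(1 - p_2)}{n_2}}} = \frac{0.1964 - (0.50 - 0.25)}{\sqrt{\frac{0.50(1 - 0.50)}{80} + \frac{0.25(1 - 0.25)}{80}}} \approx -0.72$$
$$z = \frac{\hat{p}_1 - \hat{p}_2 - (p_1 - p_2)}{\sqrt{\frac{p_1(1 - p_1)}{n_1} + \frac{p_2(1 - p_2)}{n_2}}} = \frac{-0.1964 - (0.50 - 0.25)}{\sqrt{\frac{0.50(1 - 0.50)}{80} + \frac{0.25(1 - 0.25)}{80}}} \approx -6.04$$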
Determine the probability of rejecting the null hypothesis using table A.3:
P (z < −6.04 or z > −0.72) = P (z < −6.04) + P (z > −0.72)
= P (z < −6.04) + 1 − P (z < −0.72)
≈ 0 + 1 − 0.2358 = 0.7642 = 76.42%
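The power calculation in part (b) can be sketched in code. This follows the same steps as the solution (rejection cutoffs under H0, then standardizing under the alternative) and assumes the pooled value 0.63125 and z = 2.575 used above; the standard normal CDF is computed with `math.erf` instead of a table, and `two_proportion_power` is a hypothetical name.

```python
import math

def phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def two_proportion_power(p1, p2, n1, n2, z_crit, p_pooled):
    """Probability of rejecting H0: p1 = p2 when the true values are p1 and p2."""
    # Cutoff for |p1_hat - p2_hat| under the null hypothesis.
    cutoff = z_crit * math.sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
    # Standard error of the difference under the alternative.
    se_alt = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z_upper = (cutoff - (p1 - p2)) / se_alt
    z_lower = (-cutoff - (p1 - p2)) / se_alt
    return phi(z_lower) + 1 - phi(z_upper)

power = two_proportion_power(0.50, 0.25, 80, 80, z_crit=2.575, p_pooled=0.63125)
print(f"power = {power:.4f}")
```

The result differs slightly from 76.42% in the third decimal place because the table lookup above rounds the z-scores to two decimals.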
TD6 - (Anova)
1. Three classes in elementary statistics are taught by three different persons : a regular
faculty member, a graduate teaching assistant, and an adjunct from outside the uni-
versity. At the end of the semester, each student is given a standardized test. Five
students are randomly picked from each of these classes, and their scores are as shown
in Table
Faculty Teaching assistant Adjunct
93 88 86
61 90 56
87 76 73
75 82 90
92 58 47
(a) Construct an ANOVA table and interpret your results.
(b) Test at the 0.05 level whether there is a difference between the mean scores for the
three persons teaching. Assume that the ANOVA assumptions are met.
Solution :
(a) Construct an ANOVA table and interpret your results.
The ANOVA table is :
Source of Variation Df Sum of Squares Mean Squares F-Statistic
Treatment
Error
Total
We have k = 3, n1 = n2 = n3 = 5, and N = n1 + n2 + n3 = 15, and then
SSTr = 339.7333, SSE = 2785.2, SST = 3124.9333
Since
$$\text{MSTr} = \frac{\text{SSTr}}{k - 1} = 169.8667, \qquad \text{MSE} = \frac{\text{SSE}}{N - k} = 232.1$$
Then
$$F = \frac{\text{MSTr}}{\text{MSE}} = 0.73187$$
Therefore, ANOVA Table
Source of Variation Df Sum of Squares Mean Squares F-Statistic
Treatment 2 339.7333 169.8667 0.73187
Error 12 2785.2 232.1
Total 14 3124.9333
(b) Test at the 0.05 level whether there is a difference between the mean scores for
the three persons teaching. Assume that the ANOVA assumptions are met.
■ For significance level 0.05
we have p-value = P (F ≥ f ) = 0.501295.
Since the p-value is much larger than the significance level (0.501295 > 0.05), we fail to
reject the null hypothesis: there is not sufficient evidence of a difference between the mean
scores for the three persons teaching.
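The entries of the ANOVA table can be recomputed from the raw scores. This is a minimal sketch of a one-way ANOVA written with the standard library only; `one_way_anova` is a hypothetical helper, and the p-value is not computed here.

```python
def one_way_anova(groups):
    """Return the sums of squares, mean squares and F statistic for one-way ANOVA."""
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]
    # Treatment (between-group) and error (within-group) sums of squares.
    ss_tr = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    ss_e = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    ms_tr = ss_tr / (len(groups) - 1)
    ms_e = ss_e / (n_total - len(groups))
    return {"SSTr": ss_tr, "SSE": ss_e, "MSTr": ms_tr, "MSE": ms_e, "F": ms_tr / ms_e}

faculty = [93, 61, 87, 75, 92]
assistant = [88, 90, 76, 82, 58]
adjunct = [86, 56, 73, 90, 47]
table = one_way_anova([faculty, assistant, adjunct])
for name, value in table.items():
    print(f"{name} = {value:.4f}")
```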
2. The following data refers to yield of tomatoes (kg/plot) for four different levels of
salinity; salinity level here refers to electrical conductivity (EC), where the chosen
levels were EC = 1.6, 3.8, 6.0, and 10.2nmhos/cm :
(a) Use the F test at level α = 0.05 to test for any differences in true average yield
due to the different salinity levels.
(b) Apply the modified Tukey’s method to identify significant differences among the
µi ’s.
Solution :
(a) Use the F test at level α = 0.05 to test for any differences in true average yield due
to the different salinity levels.
To find the value of test statistic f , prepare the following table
We have, I = 4, N = 18
$$\sum_{i=1}^{4}\sum_{j=1}^{n_i} x_{ij}^2 = 59.5^2 + \cdots + 46.1^2 = 50078.07$$
■ The quantity for comparing the difference in means of sample 2 and sample 3 is:
√
w23 = 4.11 1.49 = 6.13
■ The quantity for comparing the difference in means of sample 1 and sample 3 is :
√
w13 = 4.11 2.00025 = 5.81
■ similarly, the quantity for comparing the differences for remaining combinations
■ x̄1. − x̄2. = 2.88 < w12, so there is no significant difference between these means.
■ x̄1. − x̄3. = 7.43 > w13, so there is a significant difference between these means.
■ x̄1. − x̄4. = 12.78 > w14, so there is a significant difference between these means.
■ x̄2. − x̄3. = 4.55 < w23, so there is no significant difference between these means.
■ x̄2. − x̄4. = 9.9 > w24, so there is a significant difference between these means.
■ x̄3. − x̄4. = 5.35 < w34, so there is no significant difference between these means.
Therefore, there is no significant difference between the averages of adjacent salinity
levels (1.6 and 3.8), (3.8 and 6.0), and (6.0 and 10.2), but the remaining differences
appear to be significant.
3. The following partial ANOVA table is taken from the article ” Perception of Spatial
Incongruity ” (J. Nerv. Ment. Dis., 1961: 222) in which the abilities of three
different groups to identify a perceptual incongruity were assessed and compared. All
individuals in the experiment had been hospitalized to undergo psychiatric treatment.
There were 21 individuals in the depressive group, 32 individuals in the functional
”other ” group, and 21 individuals in the brain-damaged group. Complete the ANOVA
table and carry out the F test at level α = 0.01.
              Young    Old Sedentary    Old Active
Sample size   10       8                10
Sample mean   46.68    47.71            58.24
Sample sd     7.16     5.59             8.43
Carry out a test to decide whether true average activity differs for the three groups. If
appropriate, investigate differences amongst the means with a multiple comparisons
method.
Solution :
Carry out a test to decide whether true average activity differs for the three groups.
If appropriate, investigate differences amongst the means with a multiple comparisons
method.
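$$\text{SSTr} = \sum_{i=1}^{3} n_i \bar{x}_{i.}^2 - \frac{1}{n} x_{..}^2 = 797.0966$$
$$\text{SSE} = \sum_{i=1}^{3}\sum_{j=1}^{n_i} (x_{ij} - \bar{x}_{i.})^2 = \sum_{i=1}^{3} (n_i - 1) s_i^2 = 1319.7112$$
$$\text{MSTr} = \frac{797.0966}{2} = 398.5483$$
$$\text{MSE} = \frac{\text{SSE}}{25} = 52.7884$$
The test statistic is :
$$F = \frac{\text{MSTr}}{\text{MSE}} = \frac{398.5483}{52.7884} = 7.55$$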
Therefore, at level 0.01, the null hypothesis can be rejected, and we conclude that at
least two of the means differ.
Now use a multiple comparison method to investigate which pairs of means differ
significantly. Use α = 0.01.
We have
Qα,I,n−I = Q0.01,3,25 = 4.55
w12 = 11.09, w13 = 10.45, w23 = 11.09
The means in ascending order are :
x̄1. = 46.68 < x̄2. = 47.71 < x̄3. = 58.24
Only x̄3. − x̄1. = 11.56 exceeds its threshold (w13 = 10.45); x̄2. − x̄1. = 1.03 < w12 and
x̄3. − x̄2. = 10.53 < w23.
Therefore, there is a significant difference only between the sample means of
the first and third samples. That is, the true average activity differs between the young and
old active groups.
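Because only summary statistics are given in this problem, the sums of squares must be built from the group sizes, means and standard deviations. The sketch below shows that computation; `anova_from_summary` is a hypothetical helper name, and the critical value is still taken from the F table.

```python
def anova_from_summary(sizes, means, sds):
    """One-way ANOVA quantities computed from group sizes, means and sample sds."""
    n_total = sum(sizes)
    grand_mean = sum(n * m for n, m in zip(sizes, means)) / n_total
    ss_tr = sum(n * (m - grand_mean) ** 2 for n, m in zip(sizes, means))
    ss_e = sum((n - 1) * s ** 2 for n, s in zip(sizes, sds))
    ms_tr = ss_tr / (len(sizes) - 1)
    ms_e = ss_e / (n_total - len(sizes))
    return ss_tr, ss_e, ms_tr, ms_e, ms_tr / ms_e

# Young, old sedentary, old active.
ss_tr, ss_e, ms_tr, ms_e, f_stat = anova_from_summary(
    sizes=[10, 8, 10], means=[46.68, 47.71, 58.24], sds=[7.16, 5.59, 8.43]
)
print(f"SSTr = {ss_tr:.4f}, SSE = {ss_e:.4f}")
print(f"MSTr = {ms_tr:.4f}, MSE = {ms_e:.4f}, F = {f_stat:.2f}")
```

Writing SSTr as a sum of weighted squared deviations from the grand mean is algebraically the same as the shortcut formula used in the solution.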
5. Lipids provide much of the dietary energy in the bodies of infants and young children.
There is a growing interest in the quality of the dietary lipid supply during infancy as
a major determinant of growth, visual and neural development, and long-term health.
The article ”Essential Fat Requirements of Preterm Infants” (Amer. J. Clin. Nutrit.,
2000: 245S-250S) reported the following data on total polyunsaturated fats (%) for
infants who were randomized to four different feeding regimens: breast milk,
corn-oil-based formula, soy-oil-based formula, or soy-and-marine-oil-based formula :
(a) What assumptions must be made about the four total polyunsaturated fat
distributions before carrying out a single-factor ANOVA to decide whether there are
any differences in true average fat content?
(b) Carry out the test suggested in part (a). What can be said about the P -value?
Solution :
(a) What assumptions must be made about the four total polyunsaturated fat distri-
butions before carrying out a single-factor ANOVA to decide whether there are any
differences in true average fat content?
To carry out a single-factor ANOVA on the true averages, assume that each of the four fat
distributions is normal and that the four distributions have equal variance.
(b) Carry out the test suggested in part (a). What can be said about the P -value?
Test the null hypothesis H0 : there is no difference among the true means,
against the alternative hypothesis Ha : at least one mean is different from the others.
We have
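$$\bar{x}_{..} = \frac{1}{n}\sum_{i=1}^{I}\sum_{j=1}^{n_i} x_{ij} = 43.02$$
$$\text{SSTr} = \sum_{i=1}^{I}\sum_{j=1}^{n_i} (\bar{x}_{i.} - \bar{x}_{..})^2 = 8.3344$$
$$\text{MSTr} = \frac{\text{SSTr}}{I - 1} = \frac{8.3344}{3} = 2.778$$
$$\text{SSE} = \sum_{i=1}^{I} (n_i - 1) s_i^2 = 77.79$$
$$\text{MSE} = \frac{\text{SSE}}{n - I} = \frac{77.79}{48} = 1.62$$
The test statistic can be calculated as :
$$f = \frac{\text{MSTr}}{\text{MSE}} = 1.71$$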
The critical value for α = 0.10 with (3, 48) degrees of freedom is
F0.10,3,48 = 2.20, so f = 1.71 < 2.20 = F0.10,3,48
The P-value for f = 1.71 with (3, 48) degrees of freedom is about 0.18.
Since P -value = 0.18 > 0.10, we fail to reject the null hypothesis and conclude that there is
no significant difference among the means.
6. Although tea is the world’s most widely consumed beverage after water, little is known
about its nutritional value. Folacin is the only B vitamin present in any significant
amount in tea, and recent advances in assay methods have made accurate determina-
tion of folacin content feasible. Consider the accompanying data on folacin content for
randomly selected specimens of the four leading brands of green tea.
Brand Observations
1 7.9 6.2 6.6 8.6 8.9 10.1 9.6
2 5.7 7.5 9.8 6.1 8.4
3 6.8 7.5 5.0 7.4 5.3 6.1
4 6.4 7.1 7.9 4.5 5.0 4.0
(Data is based on ”Folacin Content of Tea,” J. Amer. Dietetic Assoc., 1983 : 627-632.)
Does this data suggest that true average folacin content is the same for all brands?
(a) Carry out a test using α = 0.05 via the P -value method.
(b) Assess the plausibility of any assumptions required for your analysis in part (a).
(c) Perform a multiple comparisons analysis to identify significant differences among
brands.
Solution :
(a) Carry out a test using α = 0.05 via the P -value method.
Test the null hypothesis H0 : there is no difference among the brand means,
against the alternative hypothesis Ha : at least one brand mean is different from the others.
From the given data we have the One-Way ANOVA Table :
Source SS df MS f p-value
Between-treatments 23.4957 3 7.8319 f = 3.74933 0.027552
Within-treatments 41.7776 20 2.0889
Total 65.2733 23
We have P -value = 0.027552 < 0.05. So, the null hypothesis is rejected.
Therefore, it can be concluded that there is a significant difference among the mean folacin
contents of the four brands. This means that at least one brand has average folacin
content different from the others.
(b) Assess the plausibility of any assumptions required for your analysis in part (a).
We assume that the folacin content of each of the four brands follows a normal distribution
and that the four distributions have equal variance.
(c) Perform a multiple comparisons analysis to identify significant differences among
brands.
The sample sizes and sample means are
Factor    ni   mean
Brand 1    7   8.271
Brand 2    5   7.500
Brand 3    6   6.350
Brand 4    6   5.817
Using the Tukey-Kramer procedure with Q0.05,4,20 = 3.958,
\[
w_{ij} = Q_{0.05,4,20}\sqrt{\frac{MSE}{2}\left(\frac{1}{n_i}+\frac{1}{n_j}\right)},
\qquad
w_{34} = 3.958\sqrt{\frac{2.0889}{2}\left(\frac{1}{6}+\frac{1}{6}\right)} = 2.34
\]
In the same way, w24 = 2.45, w23 = 2.45, w13 = 2.25, w14 = 2.25, w12 = 2.37.
The differences between the pairs and the decisions are shown in the table below :
Pairs         difference   wij    decision
x̄1. − x̄2.    0.771        2.37   no significant difference
x̄1. − x̄3.    1.921        2.25   no significant difference
x̄1. − x̄4.    2.454        2.25   significant difference
x̄2. − x̄3.    1.150        2.45   no significant difference
x̄2. − x̄4.    1.683        2.45   no significant difference
x̄3. − x̄4.    0.533        2.34   no significant difference
Therefore, we conclude that only Brand 1 and Brand 4 differ significantly in true average
folacin content.
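The pairwise comparisons can be checked mechanically. The sketch below applies the Tukey-Kramer width to every pair; the studentized range value Q = 3.958 and MSE = 2.0889 are taken from the solution above rather than computed, and the variable names are illustrative.

```python
from itertools import combinations
from math import sqrt

Q = 3.958        # Q_{0.05,4,20}, table value quoted in the solution
MSE = 2.0889     # mean square error from the ANOVA table
n = {1: 7, 2: 5, 3: 6, 4: 6}                       # sample sizes
mean = {1: 8.271, 2: 7.500, 3: 6.350, 4: 5.817}    # sample means

significant = []
for i, j in combinations(sorted(n), 2):
    # Tukey-Kramer width for unequal sample sizes.
    w = Q * sqrt(MSE / 2 * (1 / n[i] + 1 / n[j]))
    diff = abs(mean[i] - mean[j])
    if diff > w:
        significant.append((i, j))
    print(f"brands {i}-{j}: diff={diff:.3f}  w={w:.3f}  significant={diff > w}")

print("significantly different pairs:", significant)
```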
7. In single-factor ANOVA with sample sizes ni (i = 1, . . . , I), show that
\[
SSTr = \sum_{i} n_i\left(\bar{X}_{i.} - \bar{X}_{..}\right)^2 = \sum_{i} n_i\bar{X}_{i.}^2 - n\bar{X}_{..}^2 ,
\qquad \text{where } n = \sum_{i} n_i .
\]
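Solution :
In single-factor ANOVA with sample sizes ni (i = 1, 2, . . . , I), we show that
\[
SSTr = \sum_{i=1}^{I} n_i\left(\bar{X}_{i.} - \bar{X}_{..}\right)^2 = \sum_{i=1}^{I} n_i\bar{X}_{i.}^2 - n\bar{X}_{..}^2 ,
\qquad \text{where } n = \sum_{i=1}^{I} n_i .
\]
Since the grand mean is the weighted average of the sample means, \(\sum_{i=1}^{I} n_i\bar{X}_{i.} = n\bar{X}_{..}\). We have
\begin{align*}
SSTr &= \sum_{i=1}^{I} n_i\left(\bar{X}_{i.} - \bar{X}_{..}\right)^2 \\
     &= \sum_{i=1}^{I} n_i\left(\bar{X}_{i.}^2 + \bar{X}_{..}^2 - 2\bar{X}_{..}\bar{X}_{i.}\right) \\
     &= \sum_{i=1}^{I} n_i\bar{X}_{i.}^2 + \bar{X}_{..}^2\sum_{i=1}^{I} n_i - 2\bar{X}_{..}\sum_{i=1}^{I} n_i\bar{X}_{i.} \\
     &= \sum_{i=1}^{I} n_i\bar{X}_{i.}^2 + n\bar{X}_{..}^2 - 2n\bar{X}_{..}^2 \\
     &= \sum_{i=1}^{I} n_i\bar{X}_{i.}^2 - n\bar{X}_{..}^2
\end{align*}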
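As a quick numerical sanity check of this identity (not a proof), the two expressions for SSTr can be evaluated on any unbalanced data set. The sketch below reuses the folacin data of Exercise 6, which is a choice made here purely for illustration.

```python
# Unbalanced data: the folacin observations from Exercise 6.
groups = [
    [7.9, 6.2, 6.6, 8.6, 8.9, 10.1, 9.6],
    [5.7, 7.5, 9.8, 6.1, 8.4],
    [6.8, 7.5, 5.0, 7.4, 5.3, 6.1],
    [6.4, 7.1, 7.9, 4.5, 5.0, 4.0],
]

n_i = [len(g) for g in groups]
n = sum(n_i)
xbar_i = [sum(g) / len(g) for g in groups]      # sample means
xbar = sum(sum(g) for g in groups) / n          # grand mean

# Definition of SSTr and the computational form derived above.
lhs = sum(ni * (xi - xbar) ** 2 for ni, xi in zip(n_i, xbar_i))
rhs = sum(ni * xi ** 2 for ni, xi in zip(n_i, xbar_i)) - n * xbar ** 2

print(f"definition: {lhs:.6f}   computational form: {rhs:.6f}")
```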
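We show that
\[
E(MSTr) = \sigma^2 + \frac{1}{I-1}\sum_{i=1}^{I} n_i\alpha_i^2 .
\]
The derivation below takes equal sample sizes ni = n and uses the model xij = µ + αi + ϵij with Σ αi = 0, where the ϵij are independent with mean 0 and variance σ².
\begin{align*}
E(MSTr) &= \frac{1}{I-1}E\left(n\sum_{i=1}^{I}\left(\bar{x}_{i.} - \bar{x}_{..}\right)^2\right) \\
&= \frac{1}{I-1}E\left(\sum_{i=1}^{I}\left(n\bar{x}_{i.}^2 + n\bar{x}_{..}^2 - 2n\bar{x}_{..}\bar{x}_{i.}\right)\right) \\
&= \frac{1}{I-1}E\left(n\sum_{i=1}^{I}\bar{x}_{i.}^2 - 2n\bar{x}_{..}\sum_{i=1}^{I}\bar{x}_{i.} + nI\bar{x}_{..}^2\right) \\
&= \frac{1}{I-1}E\left(n\sum_{i=1}^{I}\bar{x}_{i.}^2 - nI\bar{x}_{..}^2\right) \\
&= \frac{1}{I-1}E\left(\frac{1}{n}\sum_{i=1}^{I}\Big(\sum_{j=1}^{n} x_{ij}\Big)^2 - \frac{1}{nI}\Big(\sum_{i=1}^{I}\sum_{j=1}^{n} x_{ij}\Big)^2\right) \\
&= \frac{1}{I-1}E\left(\frac{1}{n}\sum_{i=1}^{I}\Big(\sum_{j=1}^{n}(\mu + \alpha_i + \epsilon_{ij})\Big)^2 - \frac{1}{nI}\Big(\sum_{i=1}^{I}\sum_{j=1}^{n}(\mu + \alpha_i + \epsilon_{ij})\Big)^2\right) \\
&= \frac{1}{I-1}\left(nI\mu^2 + n\sum_{i=1}^{I}\alpha_i^2 + I\sigma^2 - \frac{(nI\mu)^2}{nI} - \frac{nI\sigma^2}{nI}\right) \\
&= \sigma^2 + \frac{n}{I-1}\sum_{i=1}^{I}\alpha_i^2
\end{align*}
When H0 is true, we get αi = 0 for every i.
Therefore, E(MSTr) = σ².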
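The formula for E(MSTr) can also be illustrated by simulation. The sketch below is a Monte Carlo check for the equal-sample-size case; the values of µ, the αi, σ, n, the number of repetitions and the seed are all hypothetical choices made for illustration.

```python
import random

def simulate_mstr(mu, alphas, sigma, n, reps, seed=0):
    """Average MSTr over `reps` simulated balanced one-way layouts."""
    rng = random.Random(seed)
    I = len(alphas)
    total = 0.0
    for _ in range(reps):
        # Sample mean of each treatment under x_ij = mu + alpha_i + eps_ij.
        means = [sum(rng.gauss(mu + a, sigma) for _ in range(n)) / n
                 for a in alphas]
        grand = sum(means) / I
        total += n * sum((m - grand) ** 2 for m in means) / (I - 1)
    return total / reps

alphas = [-1.0, 0.0, 1.0]    # hypothetical treatment effects, summing to zero
sigma, n = 1.0, 5            # hypothetical error s.d. and sample size
estimate = simulate_mstr(10.0, alphas, sigma, n, reps=20000)
expected = sigma ** 2 + n * sum(a * a for a in alphas) / (len(alphas) - 1)

print(f"simulated E(MSTr) = {estimate:.3f}, formula = {expected:.3f}")
```

With all αi set to 0 the same function should return a value close to σ², matching the H0 case.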
9. Consider the ANOVA model
Xij = µi + εij , i = 1, 2, . . . , I, j = 1, 2, . . . , J
where Xij ∼ N(µi , σ²). Then show that
(a) the random variable SSE/σ² ∼ χ²(I(J − 1)),
(b) the statistics SSE and SSTr are independent.
Further, if the null hypothesis H0 : µ1 = µ2 = . . . = µI = µ is true, then
(c) the random variable SSTr/σ² ∼ χ²(I − 1),
(d) the statistic MSTr/MSE ∼ F(I − 1, I(J − 1)),
(e) the random variable SST/σ² ∼ χ²(IJ − 1).
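Solution : Consider the ANOVA model
Xij = µi + ϵij , i = 1, 2, . . . , I, j = 1, 2, . . . , J
where Xij ∼ N(µi , σ²).
(a) The random variable SSE/σ² ∼ χ²(I(J − 1)).
Let Si² denote the sample variance of the i-th sample. Then
\[
\frac{SSE}{\sigma^2} = \sum_{i=1}^{I}\frac{(J-1)S_i^2}{\sigma^2}
\]
Since S1², . . . , SI² are independent and (J − 1)Si²/σ² ∼ χ²(J − 1) for all i, the sum of these I
independent chi-square variables has I(J − 1) degrees of freedom.
Therefore,
\[
\frac{SSE}{\sigma^2} \sim \chi^2(I(J-1))
\]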
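(b) The statistics SSE and SSTr are independent.
We have
\[
SSE = \sum_{i=1}^{I}(J-1)S_i^2 , \qquad SSTr = J\sum_{i=1}^{I}\left(\bar{X}_{i.} - \bar{X}_{..}\right)^2
\]
SSE depends on the data only through S1², . . . , SI², and SSTr only through X̄1. , . . . , X̄I.
(X̄.. is the average of the X̄i. ). For a normal sample, Si² and X̄i. are independent, and the
I samples are independent of one another. So, SSE and SSTr are independent.
For parts (c)-(e), assume that the null hypothesis H0 : µ1 = µ2 = . . . = µI = µ is true.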
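(c) The random variable SSTr/σ² ∼ χ²(I − 1).
We have
\[
SSTr = J\sum_{i=1}^{I}\left(\bar{X}_{i.} - \bar{X}_{..}\right)^2
\]
Let \(Y_i = \bar{X}_{i.} = \frac{1}{J}\sum_{j=1}^{J} X_{ij}\). Under H0, Xij ∼ N(µ, σ²) ⟹ Yi ∼ N(µ, σ²/J), and Ȳ = X̄.. .
So Y1 , . . . , YI form a random sample of size I from N(µ, σ²/J), and
\[
\frac{SSTr}{\sigma^2} = \sum_{i=1}^{I}\frac{\left(Y_i - \bar{Y}\right)^2}{\sigma^2/J}
\]
which is (I − 1) times the sample variance of the Yi divided by their variance σ²/J.
Therefore, SSTr/σ² ∼ χ²(I − 1).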
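(d) The statistic MSTr/MSE ∼ F(I − 1, I(J − 1)).
We have
\[
F = \frac{MSTr}{MSE} = \frac{\left(SSTr/\sigma^2\right)/(I-1)}{\left(SSE/\sigma^2\right)/\left(I(J-1)\right)}
\]
Since SSTr/σ² ∼ χ²(I − 1) and SSE/σ² ∼ χ²(I(J − 1)),
and we already know from part (b) that SSTr and SSE are independent,
F ∼ F(I − 1, I(J − 1)).
(e) The random variable SST/σ² ∼ χ²(IJ − 1).
We have SST = SSE + SSTr. Then
\[
\frac{SST}{\sigma^2} = \frac{SSE}{\sigma^2} + \frac{SSTr}{\sigma^2} \sim \chi^2(IJ-1)
\]
because the sum of independent chi-square variables with I(J − 1) and I − 1 degrees of freedom
is chi-square with I(J − 1) + (I − 1) = IJ − 1 degrees of freedom.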
10. The number of miles of useful tread wear (in 1000s) was determined for tires of each
of five different makes of subcompact car (factor A, with I = 5 ) in combination
with each of four different brands of radial tires (factor B, with J = 4 ), resulting in
IJ = 20 observations. The values SSA = 30.6, SSB = 44.1, and SSE = 59.2 were
then computed. Assume that an additive model is appropriate.
a. Test H0 : α1 = α2 = α3 = α4 = α5 = 0 (no differences in true average tire lifetime
due to makes of cars) versus Ha : at least one αi ≠ 0 using a level 0.05 test.
b. Test H0 : β1 = β2 = β3 = β4 = 0 (no differences in true average tire lifetime due to
brands of tires) versus Ha : at least one βj ≠ 0 using a level 0.05 test.
Solution :
(a) Test H0 : α1 = α2 = α3 = α4 = α5 = 0 (no differences in true average tire lifetime
due to makes of cars) versus Ha : at least one αi ≠ 0 using a level 0.05 test.
Null hypothesis H0 : α1 = α2 = α3 = α4 = α5 = 0
versus Ha : at least one αi ≠ 0
In order to test the hypothesis, the value of F needs to be calculated :
\[
F = \frac{MSA}{MSE}, \qquad \text{where } MSA = \frac{SSA}{df_A} \text{ and } MSE = \frac{SSE}{df_E}
\]
We have dfA = I − 1 = 4 and dfE = (I − 1)(J − 1) = 12. Therefore,
\[
MSA = \frac{30.6}{4} = 7.65, \qquad MSE = \frac{59.2}{12} = 4.93, \qquad f = \frac{7.65}{4.93} = 1.55
\]
The critical value is F0.05,4,12 = 3.26.
Since f < F0.05,4,12 , the null hypothesis H0 cannot be rejected.
This means that there is no significant difference in the average lifetime of tires among
the various makes of cars.
(b) Test H0 : β1 = β2 = β3 = β4 = 0 (no differences in true average tire lifetime due to
brands of tires) versus Ha : at least one βj ≠ 0 using a level 0.05 test.
Null hypothesis H0 : β1 = β2 = β3 = β4 = 0
versus Ha : at least one βj ≠ 0
In order to test the hypothesis, the value of F needs to be calculated :
\[
F = \frac{MSB}{MSE}, \qquad \text{where } MSB = \frac{SSB}{df_B} \text{ and } MSE = \frac{SSE}{df_E}
\]
We have dfB = J − 1 = 3 and dfE = (I − 1)(J − 1) = 12. Therefore,
\[
f = \frac{44.1/3}{4.93} = \frac{14.7}{4.93} = 2.98
\]
The critical value is F0.05,3,12 = 3.49.
Since f < F0.05,3,12 , the null hypothesis H0 cannot be rejected.
This means that there is no significant difference in the average lifetime of tires among
the different brands of tires.
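Both tests follow the same arithmetic, which can be written out as a short script. This is a sketch only: the critical values 3.26 and 3.49 are the table values quoted in the solution, not computed, and the variable names are illustrative.

```python
# Given quantities from the exercise.
I, J = 5, 4                       # makes of car, brands of tire
SSA, SSB, SSE = 30.6, 44.1, 59.2

# Degrees of freedom for the additive two-factor model (one observation per cell).
df_a, df_b, df_e = I - 1, J - 1, (I - 1) * (J - 1)

mse = SSE / df_e
f_a = (SSA / df_a) / mse
f_b = (SSB / df_b) / mse

# Critical values quoted in the solution (table look-ups).
F_CRIT_A = 3.26   # F_{0.05,4,12}
F_CRIT_B = 3.49   # F_{0.05,3,12}
reject_a = f_a > F_CRIT_A
reject_b = f_b > F_CRIT_B

print(f"f_A = {f_a:.2f}, reject H0A: {reject_a}")
print(f"f_B = {f_b:.2f}, reject H0B: {reject_b}")
```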
11. Four different coatings are being considered for corrosion protection of metal pipe. The
pipe will be buried in three different types of soil. To investigate whether the amount
of corrosion depends either on the coating or on the type of soil, 12 pieces of pipe
are selected. Each piece is coated with one of the four coatings and buried in one of
the three types of soil for a fixed time, after which the amount of corrosion (depth of
maximum pits, in .0001 in.) is determined. The data appears in the table.
a. Assuming the validity of the additive model, carry out the ANOVA analysis using
an ANOVA table to see whether the amount of corrosion depends on either the type
of coating used or the type of soil. Use α = 0.05.
b. Compute µ̂, α̂1 , α̂2 , α̂3 , α̂4 , β̂1 , β̂2 , and β̂3 .
Solution :
(a) Assuming the validity of the additive model, carry out the ANOVA analysis using
an ANOVA table to see whether the amount of corrosion depends on either the type
of coating used or the type of soil. Use α = 0.05.
                  Soil Type (B)
                   1      2      3
Coating (A)   1   64     49     50
              2   53     51     48
              3   47     45     50
              4   51     43     52
Adding the row and column totals and means :
                  Soil Type (B)
                   1      2      3      xi.    x̄i.
Coating (A)   1   64     49     50     163    54.3333
              2   53     51     48     152    50.6667
              3   47     45     50     142    47.3333
              4   51     43     52     146    48.6667
            x.j   215    188    200
            x̄.j   53.75  47     50
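We have
x.. = 163 + 152 + 142 + 146 = 603
and
\[
\sum_{i=1}^{I}\sum_{j=1}^{J} x_{ij}^2 = 64^2 + \dots + 52^2 = 30599
\]
The sums of squares are
\begin{align*}
SST &= \sum_{i=1}^{I}\sum_{j=1}^{J} x_{ij}^2 - \frac{1}{IJ}x_{..}^2 = 298.25 \\
SSA &= \frac{1}{J}\sum_{i=1}^{I} x_{i.}^2 - \frac{1}{IJ}x_{..}^2 = 83.5833 \\
SSB &= \frac{1}{I}\sum_{j=1}^{J} x_{.j}^2 - \frac{1}{IJ}x_{..}^2 = 91.5 \\
SSE &= SST - SSA - SSB = 123.1667
\end{align*}
The mean squares, with dfA = I − 1 = 3, dfB = J − 1 = 2 and dfE = (I − 1)(J − 1) = 6 :
\[
MSA = \frac{SSA}{df_A} = 27.8611, \qquad MSB = \frac{SSB}{df_B} = 45.75, \qquad MSE = \frac{SSE}{df_E} = 20.5278
\]
The F-statistics :
\[
f_A = \frac{MSA}{MSE} = 1.36, \qquad f_B = \frac{MSB}{MSE} = 2.23
\]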
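Thus, the complete ANOVA table is as shown below :
Source     df   SS         MS        F-ratio
Factor A    3    83.5833   27.8611   1.36
Factor B    2    91.5      45.75     2.23
Error       6   123.1667   20.5278
Total      11   298.25
Since fA = 1.36 < F0.05,3,6 = 4.76 and fB = 2.23 < F0.05,2,6 = 5.14, neither null hypothesis is
rejected. At level 0.05, the data do not show that the amount of corrosion depends on
either the type of coating or the type of soil.
(b) Compute µ̂, α̂1 , α̂2 , α̂3 , α̂4 , β̂1 , β̂2 , and β̂3 .
The estimates are µ̂ = x̄.. , α̂i = x̄i. − x̄.. and β̂j = x̄.j − x̄.. . Therefore, we have
µ̂ = 603/12 = 50.25
α̂1 = 4.0833, α̂2 = 0.4167, α̂3 = −2.9167, α̂4 = −1.5833
β̂1 = 3.5, β̂2 = −3.25, β̂3 = −0.25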
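The whole calculation for this exercise, including the parameter estimates, can be sketched in plain Python. The function name `two_way_anova` and the dictionary keys are illustrative choices; the code assumes the additive model with one observation per cell, as in the exercise.

```python
# Corrosion data: rows are coatings (A), columns are soil types (B).
corrosion = [
    [64, 49, 50],
    [53, 51, 48],
    [47, 45, 50],
    [51, 43, 52],
]

def two_way_anova(x):
    """Additive two-factor ANOVA with one observation per cell."""
    I, J = len(x), len(x[0])
    grand = sum(sum(row) for row in x) / (I * J)
    row_means = [sum(row) / J for row in x]
    col_means = [sum(row[j] for row in x) / I for j in range(J)]
    ssa = J * sum((m - grand) ** 2 for m in row_means)
    ssb = I * sum((m - grand) ** 2 for m in col_means)
    sst = sum((v - grand) ** 2 for row in x for v in row)
    sse = sst - ssa - ssb
    mse = sse / ((I - 1) * (J - 1))
    return {
        "SSA": ssa, "SSB": ssb, "SSE": sse, "SST": sst,
        "fA": (ssa / (I - 1)) / mse,
        "fB": (ssb / (J - 1)) / mse,
        "mu_hat": grand,
        "alpha_hat": [m - grand for m in row_means],
        "beta_hat": [m - grand for m in col_means],
    }

result = two_way_anova(corrosion)
for key, value in result.items():
    print(key, value)
```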
12. a. Show that a constant d can be added to (or subtracted from) each xij without
affecting any of the ANOVA sums of squares.
b. Suppose that each xij is multiplied by a nonzero constant c. How does this affect
the ANOVA sums of squares? How does this affect the values of the F statistics FA
and FB ? What effect does "coding" the data by yij = cxij + d have on the conclusions
resulting from the ANOVA procedures?
Solution :
(a) Show that a constant d can be added to (or subtracted from) each xij without
affecting any of the ANOVA sums of squares.
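Let a constant d be added to each xij , so that \(x^{d}_{ij} = x_{ij} + d\), \(x^{d}_{i.} = x_{i.} + Jd\), \(x^{d}_{.j} = x_{.j} + Id\)
and \(x^{d}_{..} = x_{..} + IJd\).
We have
\begin{align*}
SST_{new} &= \sum_{i}\sum_{j}\left(x^{d}_{ij}\right)^2 - \frac{1}{IJ}\left(x^{d}_{..}\right)^2 \\
&= \sum_{i}\sum_{j}\left(x_{ij} + d\right)^2 - \frac{1}{IJ}\left(x_{..} + IJd\right)^2 \\
&= \sum_{i}\sum_{j}\left(x_{ij}^2 + d^2 + 2dx_{ij}\right) - \frac{1}{IJ}\left(x_{..}^2 + (IJd)^2 + 2x_{..}dIJ\right) \\
&= \sum_{i}\sum_{j} x_{ij}^2 + IJd^2 + 2dx_{..} - \frac{1}{IJ}x_{..}^2 - IJd^2 - 2dx_{..} \\
&= \sum_{i}\sum_{j} x_{ij}^2 - \frac{1}{IJ}x_{..}^2 \\
&= SST_{old}
\end{align*}
The terms in d cancel in the same way in SSA and SSB, so SSE = SST − SSA − SSB is unchanged as well.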
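(b) If each xij is multiplied by a nonzero constant c, then every total is multiplied by c as well, so
\[
SST_c = \sum_{i}\sum_{j}\left(cx_{ij}\right)^2 - \frac{1}{IJ}\left(cx_{..}\right)^2 = c^2\,SST
\]
In the same way, each sum of squares is multiplied by c².
Since fA and fB are ratios of mean squares, the factor c² cancels and they are not affected.
"Coding" the data by yij = cxij + d therefore only multiplies each sum of squares by c² ; the F
statistics and the conclusions of the ANOVA procedures are unchanged.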
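The effect of coding can be confirmed numerically. The sketch below applies yij = c·xij + d to the corrosion data of Exercise 11; the constants c = 2.54 and d = −40 are hypothetical values chosen only for illustration, and the helper name `ss_and_f` is likewise an assumption.

```python
# Corrosion data from Exercise 11 (rows: coatings, columns: soil types).
data = [
    [64, 49, 50],
    [53, 51, 48],
    [47, 45, 50],
    [51, 43, 52],
]

def ss_and_f(x):
    """Return (SSA, SSB, SSE, fA, fB) for the additive two-factor model."""
    I, J = len(x), len(x[0])
    grand = sum(sum(row) for row in x) / (I * J)
    ssa = J * sum((sum(row) / J - grand) ** 2 for row in x)
    ssb = I * sum((sum(row[j] for row in x) / I - grand) ** 2 for j in range(J))
    sst = sum((v - grand) ** 2 for row in x for v in row)
    sse = sst - ssa - ssb
    mse = sse / ((I - 1) * (J - 1))
    return ssa, ssb, sse, (ssa / (I - 1)) / mse, (ssb / (J - 1)) / mse

c, d = 2.54, -40.0   # hypothetical coding constants
coded = [[c * v + d for v in row] for row in data]

original = ss_and_f(data)
recoded = ss_and_f(coded)

# Sums of squares scale by c**2; F ratios stay the same.
for name, old, new in zip(["SSA", "SSB", "SSE", "fA", "fB"], original, recoded):
    print(f"{name}: original={old:.4f}  coded={new:.4f}")
```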
13. In an experiment to see whether the amount of coverage of light-blue interior latex
paint depends either on the brand of paint or on the brand of roller used, one gallon of
each of four brands of paint was applied using each of three brands of roller, resulting
in the following data (number of square feet covered).
                  Roller Brand
                   1      2      3
Paint Brand   1   454    446    451
              2   446    444    447
              3   439    442    444
              4   444    437    443
To simplify the arithmetic, 400 is subtracted from each observation; adding or subtracting a
constant does not affect any of the ANOVA sums of squares. The coded values xij − 400,
with their row and column totals and means, are :
                  Roller Brand
                   1      2      3      xi.    x̄i.
Paint Brand   1   54     46     51     151    50.3333
              2   46     44     47     137    45.6667
              3   39     42     44     125    41.6667
              4   44     37     43     124    41.3333
            x.j   183    169    185
            x̄.j   45.75  42.25  46.25
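For the values coded by subtracting 400, we have
x.. = 151 + 137 + 125 + 124 = 537
and
\[
\sum_{i=1}^{I}\sum_{j=1}^{J} x_{ij}^2 = 54^2 + \dots + 43^2 = 24269
\]
The sums of squares are
\begin{align*}
SST &= \sum_{i=1}^{I}\sum_{j=1}^{J} x_{ij}^2 - \frac{1}{IJ}x_{..}^2 = 238.25 \\
SSA &= \frac{1}{J}\sum_{i=1}^{I} x_{i.}^2 - \frac{1}{IJ}x_{..}^2 = 159.5833 \\
SSB &= \frac{1}{I}\sum_{j=1}^{J} x_{.j}^2 - \frac{1}{IJ}x_{..}^2 = 38 \\
SSE &= SST - SSA - SSB = 40.6667
\end{align*}
The mean squares, with dfA = I − 1 = 3, dfB = J − 1 = 2 and dfE = (I − 1)(J − 1) = 6 :
\[
MSA = \frac{SSA}{df_A} = 53.1944, \qquad MSB = \frac{SSB}{df_B} = 19, \qquad MSE = \frac{SSE}{df_E} = 6.7778
\]
The F-statistics :
\[
f_A = \frac{MSA}{MSE} = 7.848, \qquad f_B = \frac{MSB}{MSE} = 2.8033
\]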
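The shortcut (totals-based) formulas used above translate directly into code. The sketch below works on the observations coded by subtracting 400, as the solution does; variable names are illustrative.

```python
# Coverage data (square feet): rows are paint brands, columns are roller brands.
paint = [
    [454, 446, 451],
    [446, 444, 447],
    [439, 442, 444],
    [444, 437, 443],
]

# Code the data by subtracting 400 to keep the squares small.
coded = [[v - 400 for v in row] for row in paint]
I, J = len(coded), len(coded[0])

row_tot = [sum(row) for row in coded]                            # x_i.
col_tot = [sum(row[j] for row in coded) for j in range(J)]       # x_.j
grand_tot = sum(row_tot)                                         # x_..
cf = grand_tot ** 2 / (I * J)                                    # correction factor

sst = sum(v * v for row in coded for v in row) - cf
ssa = sum(t * t for t in row_tot) / J - cf
ssb = sum(t * t for t in col_tot) / I - cf
sse = sst - ssa - ssb

mse = sse / ((I - 1) * (J - 1))
f_a = (ssa / (I - 1)) / mse
f_b = (ssb / (J - 1)) / mse

print(f"SST={sst:.4f} SSA={ssa:.4f} SSB={ssb:.4f} SSE={sse:.4f}")
print(f"fA={f_a:.4f} fB={f_b:.4f}")
```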
We have
\begin{align*}
SSA &= \frac{1}{JK}\sum_{i} x_{i..}^2 - \frac{x_{...}^2}{IJK} = 1387.5 \\
SSB &= \frac{1}{IK}\sum_{j} x_{.j.}^2 - \frac{x_{...}^2}{IJK} = 2888.083 \\
SSE &= \sum_{i}\sum_{j}\sum_{k} x_{ijk}^2 - \frac{1}{K}\sum_{i}\sum_{j} x_{ij.}^2 = 8216 \\
SST &= \sum_{i}\sum_{j}\sum_{k} x_{ijk}^2 - \frac{x_{...}^2}{IJK} = 20591.833 \\
SSAB &= SST - SSE - SSA - SSB = 8100.25
\end{align*}
We have the following table :
Source                df   SS          MS         f
Brand of pens (A)      3    1387.5      462.5     0.68
Writing surface (B)    2    2888.083   1444.042   2.11
Interaction (AB)       6    8100.25    1350.042   1.97
Error                 12    8216        684.667
Total                 23   20591.833
For α = 0.05, the critical values are :
■ For Factor A : Fα,I−1,IJ(K−1) = F0.05,3,12 = 3.49
■ For Factor B : Fα,J−1,IJ(K−1) = F0.05,2,12 = 3.89
■ For Interaction AB : Fα,(I−1)(J−1),IJ(K−1) = F0.05,6,12 = 3.00
Each f is smaller than its critical value, so none of them falls in the rejection region.
Therefore, neither the brand of pen, nor the writing surface, nor their interaction has a
significant effect on the writing.
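The mean squares and f ratios in the table follow from the sums of squares and degrees of freedom alone. In the sketch below, I = 4, J = 3 and K = 2 are inferred from the degrees of freedom in the table (the data themselves are not reproduced here), and the critical values are the ones quoted above.

```python
# Layout inferred from the degrees of freedom in the table (assumption).
I, J, K = 4, 3, 2

ss = {"A": 1387.5, "B": 2888.083, "AB": 8100.25, "Error": 8216.0}
df = {
    "A": I - 1,
    "B": J - 1,
    "AB": (I - 1) * (J - 1),
    "Error": I * J * (K - 1),
}

# Mean squares and F ratios against the error mean square.
ms = {source: ss[source] / df[source] for source in ss}
f = {source: ms[source] / ms["Error"] for source in ("A", "B", "AB")}

# Critical values quoted in the solution (table look-ups at alpha = 0.05).
f_crit = {"A": 3.49, "B": 3.89, "AB": 3.00}
significant = [source for source in f if f[source] > f_crit[source]]

total_ss = sum(ss.values())
for source in f:
    print(f"{source}: MS={ms[source]:.3f}  f={f[source]:.2f}  critical={f_crit[source]}")
print(f"total SS = {total_ss:.3f}, significant effects: {significant}")
```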
17. a. Show that E(X̄i.. − X̄...) = αi , so that X̄i.. − X̄... is an unbiased estimator for
αi (in the fixed effects model).
b. With γ̂ij = X̄ij. − X̄i.. − X̄.j. + X̄... , show that γ̂ij is an unbiased estimator for γij
(in the fixed effects model).
Solution :
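(a) Show that E(X̄i.. − X̄...) = αi , so that X̄i.. − X̄... is an unbiased estimator for αi
(in the fixed effects model).
The parameter of interest is θ = αi = µ̄i. − µ, with estimator θ̂ = X̄i.. − X̄... .
Since E(Xijk ) = µij by assumption, averaging over j and k gives E(X̄i.. ) = µ̄i. , and averaging
over all observations gives E(X̄... ) = µ. Hence
E(θ̂) = E(X̄i.. − X̄...) = µ̄i. − µ = αi
(b) In the same way, E(X̄ij. ) = µij and E(X̄.j. − X̄...) = βj . We have
\begin{align*}
E(\hat{\gamma}_{ij}) &= E\left(\bar{X}_{ij.} - \bar{X}_{i..} - \bar{X}_{.j.} + \bar{X}_{...}\right) \\
&= E(\bar{X}_{ij.}) - E(\bar{X}_{i..}) - E(\bar{X}_{.j.}) + E(\bar{X}_{...}) \\
&= \left[E(\bar{X}_{ij.}) - E(\bar{X}_{...})\right] - \left[E(\bar{X}_{i..}) - E(\bar{X}_{...})\right] - \left[E(\bar{X}_{.j.}) - E(\bar{X}_{...})\right] \\
&= E(\bar{X}_{ij.}) - E(\bar{X}_{...}) - E\left(\bar{X}_{i..} - \bar{X}_{...}\right) - E\left(\bar{X}_{.j.} - \bar{X}_{...}\right) \\
&= \mu_{ij} - (\mu + \alpha_i + \beta_j) \\
&= \gamma_{ij}
\end{align*}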