
Assignment Statistics

This document contains statistics exercises from a class assignment. It includes 6 exercises analyzing descriptive statistics concepts like nominal/ordinal/interval/ratio measurement, qualitative/quantitative variables, discrete/continuous variables, and constructing frequency distributions. The exercises analyze data on news consumption habits, college completion rates by US state, and ages of US Vice Presidents. The student provides well-explained answers to each multi-part exercise analyzing different statistical concepts.

Uploaded by

Hai Kim Sreng

INSTITUTE OF TECHNOLOGY OF CAMBODIA

Département Génie Industriel et Mécanique

Assignment Statistics

Group : GIM - 06

Lecturers : Mr. PHOK Ponna (Course)
            Mr. OL Say (TD)
Student’s name ID
SUN BUNRA e20201001

Academic year (2022-2023)


Institute of Technology of Cambodia Statistics ( 2022-2023 )

TD1 - Descriptive Statistics

Exercise 1. For statements (a)-(h), state whether descriptive or inferential statistics has
been used.
(a) By 2040 at least 3.5 billion people will run short of water (World Future Society).
(b) In a sample of 100 on-the-job fatalities, 90% of the victims were men.
(c) In a survey of 1000 adults, 34% said that they posted notes on social media websites
(Source: AARP Survey).
(d) In a poll of 3036 adults, 32% said that they got a flu shot at a retail clinic (Source:
Harris Interactive Poll).
(e) Allergy therapy makes bees go away (Source: Prevention).
(f) Drinking decaffeinated coffee can raise cholesterol levels by 7% (Source: American Heart
Association).
(g) The average stay in a hospital for 2000 patients who had circulatory system problems
was 4.7 days.
(h) Experts say that mortgage rates may soon hit bottom (Source: USA TODAY).
Answer :
a Inferential
b Descriptive
c Descriptive
d Descriptive
e Inferential
f Inferential
g Descriptive
h Inferential

Exercise 2. For statements (a)-(i), classify each as nominal-level, ordinal-level,
interval-level, or ratio-level measurement.
(a) Pages in the 25 best-selling mystery novels.
(b) Rankings of golfers in a tournament.
(c) Temperatures inside 10 pizza ovens.
(d) Weights of selected cell phones.
(e) Salaries of the coaches in the NFL.
(f) Times required to complete a chess game.
(g) Ratings of textbooks (poor, fair, good, excellent).
(h) Number of amps delivered by battery chargers.
(i) Ages of children in a day care center. Categories of magazines in a physician’s office
(sports, women’s, health, men’s, news).

By : Sun Bunra 2

Answer :
a Ratio-level
b Ordinal-level
c Interval-level
d Ratio-level
e Ratio-level
f Ratio-level
g Ordinal-level
h Ratio-level
i Ratio-level

Exercise 3. For statements (a)-(h), classify each variable as qualitative or quantitative.


(a) Marital status of nurses in a hospital.
(b) Time it takes to run a marathon.
(c) Weights of lobsters in a tank in a restaurant.
(d) Colors of automobiles in a shopping center parking lot.
(e) Ounces of ice cream in a large milkshake.
(f) Capacity of the NFL football stadiums.
(g) Ages of people living in a personal care home.
(h) Different vitamins taken.
Answer :
a Qualitative
b Quantitative
c Quantitative
d Qualitative
e Quantitative
f Quantitative
g Quantitative
h Qualitative

Exercise 4. For statements (a)-(h), classify each variable as discrete or continuous.


(a) Number of pizzas sold by Pizza Express each day.
(b) Relative humidity levels in operating rooms at local hospitals.
(c) Number of bananas in a bunch at several local super-markets.
(d) Lifetimes (in hours) of 15 iPod batteries.
(e) Weights of the backpacks of first-graders on a school bus.
(f) Number of students each day who make appointments with a math tutor at a local college.
(g) Blood pressures of runners in a marathon.
(h) Ages of children in a preschool.


Answer :
a Discrete
b Continuous
c Discrete
d Continuous
e Continuous
f Discrete
g Continuous
h Continuous

Exercise 5. How People Get Their News The Brunswick Research Organization surveyed 50
randomly selected individuals and asked them the primary way they received the daily news.
Their choices were via newspaper (N), television (T), radio (R), or Internet (I). Construct a
categorical frequency distribution for the data and interpret the results.

N N T T T I R R I T
I N R R I N N I T N
I R T T T T N R R I
R R I N T R T I I T
T I N T T I R N R T

Solution :
There are four primary ways of receiving the daily news: N, T, R, and I. These types
are used as the classes of the distribution.
The procedure for constructing a frequency distribution for categorical data is given below.
• Make a table with one row per class.
• Tally the data for each class.
• Count the tallies and record the result as the class frequency f.
• Find the percentage of values in each class: % = (f / n) × 100, with n = 50.
The categorical frequency distribution obtained is:

Class    Frequency    Percentage
N        10           20%
T        16           32%
R        12           24%
I        12           24%
Total    50           100%

From the frequency distribution, 32% of the people surveyed got their daily news via
television, the highest percentage among the four sources. Radio and the Internet follow
with 24% each, and newspapers are lowest at 20%.
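
The tallying procedure above can be sketched in a few lines of Python. This is only an illustrative check, not part of the original assignment; the function name `categorical_distribution` is a hypothetical helper introduced here.

```python
from collections import Counter

# Responses from Exercise 5: N = newspaper, T = television, R = radio, I = Internet.
responses = (
    "N N T T T I R R I T "
    "I N R R I N N I T N "
    "I R T T T T N R R I "
    "R R I N T R T I I T "
    "T I N T T I R N R T"
).split()


def categorical_distribution(values):
    """Return {category: (frequency, percentage)} for a list of labels."""
    counts = Counter(values)
    total = len(values)
    return {cat: (n, 100 * n / total) for cat, n in counts.items()}


table = categorical_distribution(responses)
for cat in "NTRI":
    freq, pct = table[cat]
    print(f"{cat}  {freq:2d}  {pct:.0f}%")
```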


Exercise 6. College Completions The percentage (rounded to the nearest whole percent) of
persons from each state completing 4 years or more of college is listed below. Percentage of
persons completing 4 years of college
23 25 24 34 22 24 27 37 33 24
26 23 38 24 24 17 28 23 30 25
30 22 33 24 28 36 24 19 25 31
34 31 27 24 29 28 21 25 26 15
26 22 27 21 25 28 24 21 25 26
(a) Organize the data into a grouped frequency distribution with 5 classes.
(b) Find the relative frequency.
(c) Construct a histogram, frequency polygon, and ogive.
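Solution : We have MIN = 15, MAX = 38. The solution below uses K = 6 classes, which gives a
class width I = 4 (range 38 − 15 = 23, and 23/6 ≈ 3.8, rounded up to 4).
(a) Organize the data into a grouped frequency distribution (6 classes of width 4).

Class limits    Class boundaries    Midpoint    Frequency
15 − 18         14.5 − 18.5         16.5        2
19 − 22         18.5 − 22.5         20.5        7
23 − 26         22.5 − 26.5         24.5        22
27 − 30         26.5 − 30.5         28.5        10
31 − 34         30.5 − 34.5         32.5        6
35 − 38         34.5 − 38.5         36.5        3
Total                                           50
(b) Find the relative frequency (frequency divided by n = 50). The cumulative frequency
is listed as well because it is needed for the ogive.

Class boundaries    Frequency    Relative frequency    Cumulative frequency
14.5 − 18.5         2            0.04                  2
18.5 − 22.5         7            0.14                  9
22.5 − 26.5         22           0.44                  31
26.5 − 30.5         10           0.20                  41
30.5 − 34.5         6            0.12                  47
34.5 − 38.5         3            0.06                  50
Total               50           1.00

Points for the ogive (cumulative frequency below each boundary):

Less than    14.5    18.5    22.5    26.5    30.5    34.5    38.5
Cum. freq.   0       2       9       31      41      47      50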
(c) Construct a histogram, frequency polygon, and ogive.
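
As a rough cross-check of the grouped table, the class counts, relative frequencies, and cumulative frequencies can be recomputed as sketched below. The helper `grouped_frequencies` is a hypothetical name, and the sketch assumes integer data with classes of equal integer width starting at the minimum value.

```python
from itertools import accumulate

college = [
    23, 25, 24, 34, 22, 24, 27, 37, 33, 24,
    26, 23, 38, 24, 24, 17, 28, 23, 30, 25,
    30, 22, 33, 24, 28, 36, 24, 19, 25, 31,
    34, 31, 27, 24, 29, 28, 21, 25, 26, 15,
    26, 22, 27, 21, 25, 28, 24, 21, 25, 26,
]


def grouped_frequencies(data, start, width, k):
    """Count integer data into k classes: start..start+width-1, and so on."""
    freqs = [0] * k
    for x in data:
        idx = (x - start) // width
        if not 0 <= idx < k:
            raise ValueError(f"{x} falls outside the classes")
        freqs[idx] += 1
    return freqs


freqs = grouped_frequencies(college, start=15, width=4, k=6)
relative = [f / len(college) for f in freqs]
cumulative = list(accumulate(freqs))
print(freqs, relative, cumulative)
```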


Exercise 7. Ages of the Vice Presidents at the Time of Their Death The ages of the Vice
Presidents of the United States at the time of their death are listed below.

90 83 80 73 70 51 68 79 70 71
72 74 67 54 81 66 62 63 68 57
66 96 78 55 60 66 57 71 60 85
76 98 77 88 78 81 64 66 77 93 70

(a) Use the data to construct a frequency distribution with 6 classes.


(b) Find the relative frequency.
(c) Construct a histogram, frequency polygon, and ogive.
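Solution : We have MIN = 51, MAX = 98, K = 6, I = 8 (range 98 − 51 = 47, and 47/6 ≈ 7.8,
rounded up to 8).
(a) Use the data to construct a frequency distribution with 6 classes.

Class limits    Class boundaries    Midpoint    Frequency
51 − 58         50.5 − 58.5         54.5        5
59 − 66         58.5 − 66.5         62.5        9
67 − 74         66.5 − 74.5         70.5        11
75 − 82         74.5 − 82.5         78.5        9
83 − 90         82.5 − 90.5         86.5        4
91 − 98         90.5 − 98.5         94.5        3
Total                                           41
(b) Find the relative frequency (frequency divided by n = 41). The cumulative frequency
is listed as well because it is needed for the ogive.

Class boundaries    Frequency    Relative frequency    Cumulative frequency
50.5 − 58.5         5            0.122                 5
58.5 − 66.5         9            0.220                 14
66.5 − 74.5         11           0.268                 25
74.5 − 82.5         9            0.220                 34
82.5 − 90.5         4            0.098                 38
90.5 − 98.5         3            0.073                 41
Total               41           1.000 (up to rounding)

Points for the ogive (cumulative frequency below each boundary):

Less than    50.5    58.5    66.5    74.5    82.5    90.5    98.5
Cum. freq.   0       5       14      25      34      38      41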
(c) Construct a histogram, frequency polygon, and ogive.


Exercise 8. Activities While Driving A survey of 1200 drivers showed the percentage of
respondents who did the following while driving. Construct a vertical bar graph and a
horizontal bar graph for the data.

Drink beverage 80%


Talk on cell phone 73%
Eat a meal 41%
Experience road rage 23%
Smoke 21%

Solution :
Construct a vertical bar graph and a horizontal bar graph for the data.

Drink Beverage 80%


Talk on cell phone 73%
Eat a meal 41%
Experience road rage 23%
Smoke 21%


Exercise 9. Calories of Nuts The data show the number of calories per ounce in selected
types of nuts. Construct vertical and horizontal bar graphs for the data.

Types Calories
Peanuts 160
Almonds 170
Macadamia 200
Pecans 190
Cashews 160

Solution :
Construct vertical and horizontal bar graphs for the data.


Exercise 10. Space Launches The data show the number of U.S. space launches for the
10-year periods from 1960 to 2009. Construct a time series graph for the data and analyze
the graph.
Year 60 − 69 70 − 79 80 − 89 90 − 99 100 − 109
Launches 614 247 199 300 206
Solution :
We have,
Year Launches
60 − 69 614
70 − 79 247
80 − 89 199
90 − 99 300
100 − 109 206
Construct a time series graph for the data and analyze the graph.

The graph shows that the number of U.S. space launches decreased from the 60 − 69 period
(614) through the 80 − 89 period (199), increased in 90 − 99 (300), and then decreased
again in 100 − 109, i.e. 2000 − 2009 (206).


Exercise 11. High School Dropout Rate The data show the high school dropout rate for
students for the years 2003 to 2009 . Construct a time series graph and analyze the graph.

Year 2003 2004 2005 2006 2007 2008 2009


Percent 9.9 10.3 9.4 9.3 8.7 8.0 8.1

Solution :
We have,
Year Percent
2003 9.9
2004 10.3
2005 9.4
2006 9.3
2007 8.7
2008 8
2009 8.1
Construct a time series graph and analyze the graph.

The graph shows that the high school dropout rate increased from 2003 (9.9%) to 2004
(10.3%), then decreased steadily from 2004 to 2008 (8.0%), and rose slightly again from
2008 to 2009 (8.1%).

Exercise 12. Spending of College Freshmen The average amounts spent by college freshmen
for school items are shown. Construct a pie graph for the data.

Electronics/computers $728
Dorm items $344
Clothing $141
Shoes $72

Solution :
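The total amount is $1285. To construct the pie graph, we need to find the percentage of
the total for each category.
• Note: for example, for Electronics/computers the percentage is
\( \frac{728 \times 100}{1285} \approx 56.7\% \)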
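
The percentage step can be repeated for every category, together with the corresponding pie-slice angle (share × 360°). The sketch below is an illustration only; `pie_shares` is a hypothetical helper name, not something from the assignment.

```python
spending = {
    "Electronics/computers": 728,
    "Dorm items": 344,
    "Clothing": 141,
    "Shoes": 72,
}


def pie_shares(amounts):
    """Return {category: (percent of total, slice angle in degrees)}."""
    total = sum(amounts.values())
    return {k: (100 * v / total, 360 * v / total) for k, v in amounts.items()}


shares = pie_shares(spending)
for name, (pct, deg) in shares.items():
    print(f"{name:22s} {pct:5.1f}%  {deg:6.1f} degrees")
```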

Exercise 13. Career Changes A survey asked if people would like to spend the rest of their
careers with their present employers. The results are shown. Construct a pie graph for the
data and analyze the results.

Answer Number of people


Yes 660
No 260
Undecided 80

Solution :
We have
Answer Number of people
Yes 660
No 260
Undecided 80
Construct a pie graph for the data and analyze the results.

Out of the 1000 people surveyed, 66% answered yes (660/1000), 26% answered no, and 8%
were undecided. About two-thirds of the respondents would like to spend the rest of their
careers with their present employers.


Exercise 14. Peyton Manning’s Colts Career Peyton Manning played for the Indianapolis
Colts for 14 years. (He did not play in 2011.) The data show the number of touch-downs he
scored for the years 1998-2010. Construct a dotplot for the data and comment on the graph.
26 33 27 49 31 27 33
26 26 29 28 31 33
Solution :
Construct a dotplot for the data and comment on the graph.

The graph shows that the most frequent values are 26 and 33 touchdowns, which each occur
3 times. The values 28, 29, and 49 each occur only once, and 49 lies far above the rest
of the data.
Exercise 15. Songs on CDs The data show the number of songs on each of 40 CDs from the
author’s collection. Construct a dotplot for the data and comment on the graph.
10 14 18 11
11 15 16 10
10 17 10 15
22 9 14 12
18 12 12 15
21 22 20 15
10 19 20 21
17 9 13 15
11 12 12 9
14 20 12 10


Solution :
Construct a dotplot for the data and comment on the graph.

The graph shows that the most common numbers of songs are 10 and 12 (6 CDs each),
followed by 15 songs (5 CDs). The least common values are 13, 16, and 19 songs, each found
on only one CD.
Exercise 16. The traffic situation in X-City is getting worse, and it is high time a solution
was offered. The company hired to work on the project took a survey of the estimated
amount of vehicles that move on the road daily and for various intervals. The result of this
survey is illustrated in the table below.
Time Cars Buses Bikes
1 − 2pm 37 45 42
2 − 3pm 44 34 26
3 − 4pm 23 39 27
4 − 5pm 29 41 48
Construct a multiple line graph to visualize the data. Hence, determine the vehicle with the
highest frequency and that with the lowest frequency.
Solution :
Construct a multiple line graph, then determine the vehicle with the highest frequency
and that with the lowest frequency.

The vehicle with the highest frequency is Bikes (48, at 4 − 5pm) and the vehicle with the
lowest frequency is Cars (23, at 3 − 4pm). Summed over the four intervals, the totals are
Cars 133, Buses 159, and Bikes 143.

Exercise 17. Draw a multiple bar graph for the following data, which represent
agricultural production for the period 2010-2013.

Year Food grains (tonnes) Vegetables (tonnes) Others (tonnes)


2010 100 30 10
2011 120 40 15
2012 130 45 25
2013 150 50 25

Solution :
Draw a multiple bar graph for the data, which represent agricultural production for the
period 2010-2013.

Exercise 18. The heights (in cm ) of a sample of the students in a class are shown:

50 52 70 72 65 52 60
75 51 64 65 55 67 70

Find the mean, mode, median, interquartile range, midrange, variance, and standard
deviation for the data.

Solution :
The data in ascending order: 50, 51, 52, 52, 55, 60, 64, 65, 65, 67, 70, 70, 72, 75.
• Mean: \( \bar{x} = \frac{\sum x}{n} \), with \( n = 14 \) and \( \sum x = 868 \) (sum of all data) \( \Rightarrow \bar{x} = \frac{868}{14} = 62 \)
• Mode = 52, 65, 70 (each appears twice, the highest frequency)
• Median: \( MD = \frac{X_7 + X_8}{2} = \frac{64 + 65}{2} = 64.5 \)
• Interquartile range:
  for \( p = 25 \Rightarrow C = \frac{n \times p}{100} = \frac{14 \times 25}{100} = 3.5 \Rightarrow Q_1 = \frac{X_4 + X_5}{2} = \frac{52 + 55}{2} = 53.5 \)
  for \( p = 75 \Rightarrow C = \frac{n \times p}{100} = \frac{14 \times 75}{100} = 10.5 \Rightarrow Q_3 = \frac{X_{11} + X_{12}}{2} = \frac{70 + 70}{2} = 70 \)
  \( \Rightarrow IQR = Q_3 - Q_1 = 70 - 53.5 = 16.5 \)
• Midrange: \( MR = \frac{\text{lowest value} + \text{highest value}}{2} = \frac{50 + 75}{2} = 62.5 \)
• Variance: \( s^2 = \frac{\sum (X - \bar{X})^2}{n - 1} = \frac{962}{13} = 74 \) (checked in Excel)
• Standard deviation: \( s = \sqrt{s^2} = \sqrt{74} = 8.60233 \)

Excel check (Column1):

Mean                    62
Standard Error          2.299
Median                  64.5
Mode                    52
Standard Deviation      8.60
Sample Variance         74
Range                   25
Minimum                 50
Q1                      52.75
Q3                      69.25
IQR = Q3 − Q1           16.5
Midrange                62.5
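
The measures that do not depend on a quartile convention can be double-checked with Python's standard `statistics` module. This is a minimal sketch added for verification only; quartiles are left out on purpose because different tools use different position rules.

```python
import statistics as st

heights = [50, 52, 70, 72, 65, 52, 60, 75, 51, 64, 65, 55, 67, 70]

mean = st.mean(heights)
median = st.median(heights)
modes = sorted(st.multimode(heights))        # all values tied for most frequent
midrange = (min(heights) + max(heights)) / 2
variance = st.variance(heights)              # sample variance, divisor n - 1
stdev = st.stdev(heights)                    # sample standard deviation

print(mean, median, modes, midrange, variance, round(stdev, 5))
```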

Exercise 19. Households of Four Television Networks A survey showed the number of
viewers and number of households of four television networks. Find the average number of
viewers, using the weighted mean.

Households 1.4 0.8 0.3 1.6


Viewers (in millions) 1.6 0.8 0.4 1.8

Solution :
Find the average number of viewers, using the weighted mean.
Average number of viewers (households as weights w, viewers as values x):
\[ \bar{X} = \frac{\sum wx}{\sum w} = \frac{(1.4)(1.6) + (0.8)(0.8) + (0.3)(0.4) + (1.6)(1.8)}{1.4 + 0.8 + 0.3 + 1.6} = \frac{5.88}{4.1} \approx 1.43 \]
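
A short sketch of the same weighted-mean calculation follows; `weighted_mean` is a hypothetical helper name used only for this illustration.

```python
def weighted_mean(values, weights):
    """Weighted mean: sum(w * x) / sum(w)."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)


viewers = [1.6, 0.8, 0.4, 1.8]      # values x (in millions)
households = [1.4, 0.8, 0.3, 1.6]   # weights w
print(round(weighted_mean(viewers, households), 2))
```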


Exercise 20. Magazines in Bookstores A survey of bookstores showed that the average
number of magazines carried is 56 , with a standard deviation of 12 . The same survey
showed that the average length of time each store had been in business was 6 years, with
a standard deviation of 2.5 years. Which is more variable, the number of magazines or the
number of years?
Solution :
The average number of magazines carried is \( \bar{x} = 56 \), with standard deviation \( s = 12 \).
The average length of time in business is \( \bar{x} = 6 \) years, with standard deviation \( s = 2.5 \) years.
The coefficient of variation is \( CVar = \frac{s}{\bar{x}} \times 100\% \):

             x̄      s       CVar
Magazines    56     12      21.43%
Time         6      2.5     41.67%

Therefore, the time in business is more variable than the number of magazines, because its
coefficient of variation (41.67%) is higher than that of the magazines (21.43%).
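
The comparison can be reproduced with a one-line function. The name `coefficient_of_variation` is a hypothetical helper introduced for this sketch.

```python
def coefficient_of_variation(mean, sd):
    """Coefficient of variation in percent: (s / mean) * 100."""
    return sd / mean * 100


cv_magazines = coefficient_of_variation(56, 12)
cv_time = coefficient_of_variation(6, 2.5)
print(f"magazines: {cv_magazines:.2f}%  time: {cv_time:.2f}%")
```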

Exercise 21. Average Earnings of Workers The average earnings of year-round full-time
workers 25 − 34 years old with a bachelor’s degree or higher were $58,500 in 2003. If the
standard deviation is $11,200, what can you say about the percentage of these workers who
earn
(a) Between $47,300 and $69,700?
(b) More than $80,900?
(c) How likely is it that someone earns more than $100,000?
Solution :
(a) Between $47,300 and $69,700?
We have \( \bar{X} = 58500 \), \( s = 11200 \).
\( \mu - ks = 47300 \)   (1)
\( \mu + ks = 69700 \)   (2)
From (1): \( k = \frac{\mu - 47300}{s} = \frac{58500 - 47300}{11200} = 1 \)
Chebyshev’s theorem requires \( k > 1 \), so it is not applicable here: for \( k = 1 \) the
bound \( 1 - \frac{1}{k^2} = 0 \) gives no information.
(b) More than $80,900?
• Step 1 (z-score; this step assumes the earnings are approximately normally distributed)
We need to find \( P(x > 80900) \).
\( z_0 = \frac{x_0 - \mu}{\sigma} = \frac{80900 - 58500}{11200} = 2 \)
\( \Rightarrow P(z > 2) = 1 - P(z < 2) = 1 - 0.9772 = 0.0228 \)
\( \Rightarrow P(x > 80900) = 0.0228 \)
• Step 2 (Chebyshev’s theorem)
\( 80900 = 58500 + 11200k \Rightarrow k = 2 \)
\( \Rightarrow P(|x - \mu| < 2s) \ge 1 - \frac{1}{2^2} = 0.75 \)
\( \Rightarrow P(x < \mu - 2s) + P(x > 80900) \le 1 - 0.75 \)
\( \Rightarrow P(x > 80900) \le 25\% \)
(c) How likely is it that someone earns more than $100,000?
\( k = \frac{100000 - 58500}{11200} \approx 3.7 \)
\( \Rightarrow P(\mu - ks < x < \mu + ks) \ge 1 - \frac{1}{k^2} \)
\( \Rightarrow P(|x - \mu| < 3.7s) \ge 1 - \frac{1}{(3.7)^2} \approx 0.927 \)
\( \Rightarrow P(x < \mu - 3.7s) + P(x > 100000) \le 1 - 0.927 = 0.073 \)
So at most about 7.3% of these workers earn more than $100,000; such earnings are unlikely.
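
The z-score step in part (b) can be checked with `statistics.NormalDist` from the Python standard library. Note the assumption: the exercise itself does not say the earnings are normally distributed, so these tail probabilities are only an illustration alongside the distribution-free Chebyshev bounds.

```python
from statistics import NormalDist

# Assumption for illustration only: earnings ~ Normal(58500, 11200).
earnings = NormalDist(mu=58500, sigma=11200)

p_above_80900 = 1 - earnings.cdf(80900)     # tail beyond z = 2
p_above_100000 = 1 - earnings.cdf(100000)   # tail beyond z of about 3.7

# Chebyshev upper bounds 1/k^2 for the same cut-offs (no normality needed).
k_b = (80900 - 58500) / 11200
k_c = (100000 - 58500) / 11200
bound_b = 1 / k_b**2
bound_c = 1 / k_c**2

print(round(p_above_80900, 4), bound_b)
print(p_above_100000, round(bound_c, 3))
```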

Exercise 22. Costs to Train Employees For a certain type of job, it costs a company an
average of $231 to train an employee to perform the task. The standard deviation is $5.
Find the minimum percentage of data values that will fall in the range of $219 to $243. Use
Chebyshev’s theorem.
Solution :
• Step 1
\( \mu - k\sigma = 219 \Rightarrow 231 - 5k = 219 \)
\( \mu + k\sigma = 243 \Rightarrow 231 + 5k = 243 \)
Subtracting the first equation from the second gives \( 10k = 24 \), so \( k = 2.4 \).
By Chebyshev’s theorem: \( P(|x - \mu| < k\sigma) \ge 1 - \frac{1}{k^2} = 1 - \frac{1}{(2.4)^2} \approx 0.83 = 83\% \)
• Step 2
We have mean = $231, standard deviation = $5, and the formula \( 1 - \frac{1}{k^2} \).
First, we find how many standard deviations $219 and $243 are from the mean of $231.
We subtract to find which, if either, of the two bounds, $219 and $243, is closer to the
mean:

$231 − $219 = $12
$243 − $231 = $12

Both limits are $12 from the mean. To express this difference in standard deviations, we
divide by the given standard deviation, $5:

\( k = \frac{12}{5} = 2.4 \Rightarrow 1 - \frac{1}{k^2} = 1 - \frac{1}{(2.4)^2} = 1 - 0.1736 = 0.8264 = 82.64\% \)
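
The two steps above (find k, then apply 1 − 1/k²) can be sketched as follows. `chebyshev_min_fraction` is a hypothetical helper; it uses the smaller of the two distances to the mean, which matters only when the range is not symmetric.

```python
def chebyshev_min_fraction(mean, sd, low, high):
    """Minimum fraction of values within [low, high] by Chebyshev's theorem."""
    k = min(mean - low, high - mean) / sd
    if k <= 1:
        raise ValueError("Chebyshev's theorem needs k > 1")
    return 1 - 1 / k**2


fraction = chebyshev_min_fraction(mean=231, sd=5, low=219, high=243)
print(f"{fraction:.4f}")
```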


Exercise 23. Exam Completion Time The mean time it takes a group of students to complete
a statistics final exam is 44 minutes, and the standard deviation is 9 minutes. Within
what limits would you expect approximately 95% of the students to complete the exam?
Assume the variable is approximately normally distributed.
Solution :
From the given information we have: µ = 44, σ = 9
For a normal distribution, approximately 95% of the values lie within k = 2 standard
deviations of the mean (empirical rule).

µ − kσ = 44 − 2(9) = 26
µ + kσ = 44 + 2(9) = 62
Therefore, approximately 95% of the students complete the exam in between 26 and 62
minutes.
Exercise 24. Exam Grades Which of these exam grades has a better relative position?
(a) A grade of 82 on a test with x̄ = 85 and s = 6.
(b) A grade of 56 on a test with x̄ = 60 and s = 5.
Solution :
(a) A grade of 82 on a test with x̄ = 85 and s = 6.
⇒ z-score for the grade of 82: \( z_1 = \frac{82 - 85}{6} = -0.5 \)
(b) A grade of 56 on a test with x̄ = 60 and s = 5.
⇒ z-score for the grade of 56: \( z_2 = \frac{56 - 60}{5} = -0.8 \)
In conclusion, \( z_1 > z_2 \).
Hence, the grade of 82 has a better relative position than the grade of 56.
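
The comparison of relative positions can be written as a tiny sketch; `z_score` is a hypothetical helper name.

```python
def z_score(x, mean, sd):
    """Number of standard deviations x lies above (+) or below (-) the mean."""
    return (x - mean) / sd


z1 = z_score(82, 85, 6)
z2 = z_score(56, 60, 5)
better = "82" if z1 > z2 else "56"
print(z1, z2, better)
```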
Exercise 25. Check each data set for outliers.
(a) 506, 511, 517, 514, 400, 521
(b) 3, 7, 9, 6, 8, 10, 14, 16, 20, 12
Solution :
(a) 506, 511, 517, 514, 400, 521
• Arrange the data in order: 400, 506, 511, 514, 517, 521
• \( p = 25 \Rightarrow C = \frac{n \times p}{100} = \frac{6 \times 25}{100} = 1.5 \Rightarrow Q_1 = \frac{X_2 + X_3}{2} = \frac{506 + 511}{2} = 508.5 \)
• \( p = 75 \Rightarrow C = \frac{n \times p}{100} = \frac{6 \times 75}{100} = 4.5 \Rightarrow Q_3 = \frac{X_5 + X_6}{2} = \frac{517 + 521}{2} = 519 \)
• Median: \( Q_2 = \frac{X_3 + X_4}{2} = \frac{511 + 514}{2} = 512.5 \)
• Find the interquartile range: \( IQR = Q_3 - Q_1 = 519 - 508.5 = 10.5 \)
• Multiply the IQR by 1.5: \( IQR \times 1.5 = 10.5 \times 1.5 = 15.75 \)
• Subtract this value from \( Q_1 \) and add it to \( Q_3 \):
\( Q_1 - 1.5\,IQR = 508.5 - 15.75 = 492.75 \), \( Q_3 + 1.5\,IQR = 519 + 15.75 = 534.75 \)
• Check the data set for any value smaller than \( Q_1 - 1.5(IQR) \) or larger than
\( Q_3 + 1.5(IQR) \).

Data    Outlier
400     TRUE
506     FALSE
511     FALSE
514     FALSE
517     FALSE
521     FALSE

Excel worksheet values: Q1 = 507.25, Q3 = 516.25, IQR = 9, upper boundary = 534.75, lower boundary = 492.75.
Therefore, 400 is an outlier.
(b) 3, 7, 9, 6, 8, 10, 14, 16, 20, 12
• Arrange the data in order and find \( Q_1 \) and \( Q_3 \): 3, 6, 7, 8, 9, 10, 12, 14, 16, 20
• \( p = 25 \Rightarrow C = \frac{n \times p}{100} = \frac{10 \times 25}{100} = 2.5 \Rightarrow Q_1 = \frac{X_3 + X_4}{2} = \frac{7 + 8}{2} = 7.5 \)
• \( p = 75 \Rightarrow C = \frac{n \times p}{100} = \frac{10 \times 75}{100} = 7.5 \Rightarrow Q_3 = \frac{X_8 + X_9}{2} = \frac{14 + 16}{2} = 15 \)
• Find the interquartile range: \( IQR = Q_3 - Q_1 = 15 - 7.5 = 7.5 \)
• Multiply the IQR by 1.5: \( IQR \times 1.5 = 7.5 \times 1.5 = 11.25 \)
• Subtract this value from \( Q_1 \) and add it to \( Q_3 \):
\( Q_1 - 1.5\,IQR = 7.5 - 11.25 = -3.75 \), \( Q_3 + 1.5\,IQR = 15 + 11.25 = 26.25 \)
• Check the data set for any value smaller than \( Q_1 - 1.5(IQR) \) or larger than
\( Q_3 + 1.5(IQR) \).

Data    Outlier
3       FALSE
7       FALSE
9       FALSE
6       FALSE
8       FALSE
10      FALSE
14      FALSE
16      FALSE
20      FALSE
12      FALSE

Excel worksheet values: Q1 = 7.25, Q3 = 13.5, IQR = 6.25, upper boundary = 26.25, lower boundary = −3.75.
There are no outliers because no data value lies outside (−3.75, 26.25).


Exercise 26. Check each data set for outliers.


(a) 14, 18, 27, 26, 19, 13, 5, 25
(b) 112, 157, 192, 116, 153, 129, 131
Solution :
Check each data set for outliers.
(a) 14, 18, 27, 26, 19, 13, 5, 25
• Arrange the data in order and find \( Q_1 \) and \( Q_3 \): 5, 13, 14, 18, 19, 25, 26, 27
• \( p = 25 \Rightarrow C = \frac{n \times p}{100} = \frac{8 \times 25}{100} = 2 \Rightarrow Q_1 = \frac{X_2 + X_3}{2} = \frac{13 + 14}{2} = 13.5 \)
• \( p = 75 \Rightarrow C = \frac{n \times p}{100} = \frac{8 \times 75}{100} = 6 \Rightarrow Q_3 = \frac{X_6 + X_7}{2} = \frac{25 + 26}{2} = 25.5 \)
• Find the interquartile range: \( IQR = Q_3 - Q_1 = 25.5 - 13.5 = 12 \)
• Multiply the IQR by 1.5: \( IQR \times 1.5 = 12 \times 1.5 = 18 \)
• Subtract this value from \( Q_1 \) and add it to \( Q_3 \):
\( Q_1 - 1.5\,IQR = 13.5 - 18 = -4.5 \), \( Q_3 + 1.5\,IQR = 25.5 + 18 = 43.5 \)
• Check the data set for any value smaller than \( Q_1 - 1.5(IQR) \) or larger than
\( Q_3 + 1.5(IQR) \).

Data    Outlier
14      FALSE
18      FALSE
27      FALSE
26      FALSE
19      FALSE
13      FALSE
5       FALSE
25      FALSE

Excel worksheet values: Q1 = 13.75, Q3 = 25.25, IQR = 11.5, upper boundary = 43.5, lower boundary = −4.5.
There are no outliers because no data value lies outside (−4.5, 43.5).
(b) 112, 157, 192, 116, 153, 129, 131
• Arrange the data in order and find \( Q_1 \) and \( Q_3 \): 112, 116, 129, 131, 153, 157, 192
• \( p = 25 \Rightarrow C = \frac{n \times p}{100} = \frac{7 \times 25}{100} = 1.75 \Rightarrow Q_1 = \frac{X_2 + X_3}{2} = \frac{116 + 129}{2} = 122.5 \)
• \( p = 75 \Rightarrow C = \frac{n \times p}{100} = \frac{7 \times 75}{100} = 5.25 \Rightarrow Q_3 = \frac{X_5 + X_6}{2} = \frac{153 + 157}{2} = 155 \)
• Find the interquartile range: \( IQR = Q_3 - Q_1 = 155 - 122.5 = 32.5 \)
• Multiply the IQR by 1.5: \( IQR \times 1.5 = 32.5 \times 1.5 = 48.75 \)
• Subtract this value from \( Q_1 \) and add it to \( Q_3 \):
\( Q_1 - 1.5\,IQR = 122.5 - 48.75 = 73.75 \), \( Q_3 + 1.5\,IQR = 155 + 48.75 = 203.75 \)
• Check the data set for any value smaller than \( Q_1 - 1.5(IQR) \) or larger than
\( Q_3 + 1.5(IQR) \).

Data    Outlier
112     FALSE
157     FALSE
192     FALSE
116     FALSE
153     FALSE
129     FALSE
131     FALSE

Excel worksheet values: Q1 = 122.5, Q3 = 155, IQR = 32.5, upper boundary = 203.75, lower boundary = 73.75.
There are no outliers because no data value lies outside (73.75, 203.75).
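
The fence check used in Exercises 25 and 26 can be sketched as below. Because quartile conventions differ between textbooks and tools, this sketch takes Q1 and Q3 as inputs (the hand-computed values from the solutions) rather than computing them; `iqr_fences` and `find_outliers` are hypothetical helper names.

```python
def iqr_fences(q1, q3, factor=1.5):
    """Return (lower, upper) fences: Q1 - factor*IQR and Q3 + factor*IQR."""
    iqr = q3 - q1
    return q1 - factor * iqr, q3 + factor * iqr


def find_outliers(data, q1, q3):
    """Return the values lying strictly outside the 1.5*IQR fences."""
    low, high = iqr_fences(q1, q3)
    return [x for x in data if x < low or x > high]


print(iqr_fences(508.5, 519))
print(find_outliers([506, 511, 517, 514, 400, 521], 508.5, 519))
print(find_outliers([112, 157, 192, 116, 153, 129, 131], 122.5, 155))
```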

Exercise 27. The following sample data are the midterm examination test scores for 30
students:
55 60 91 85 60 70 89 99 59 67
72 82 60 68 57 74 64 70 68 91
89 90 83 40 79 85 71 80 76 81
a. Find the mean, mode, median, variance, standard deviation, Q1 , and Q3 of the data.
b. Construct a frequency table with 5 classes.
c. Using the grouped data formula, find the mean, mode, median, variance, standard devi-
ation, Q1 , and Q3 for the table in part (b) and compare it to the results in part (a).
d. Construct a histogram and comment on the shape of the distribution.
e. Find the percentile values of 55,60 , and 74 .
Solution :
a. Find the mean, mode, median, variance, standard deviation, Q1, and Q3 of the data.
We have: MAX = 99, MIN = 40, I = 12
The data in ascending order:
40 55 57 59 60 60 60 64 67 68 68 70 70 71 72
74 76 79 80 81 82 83 85 85 89 89 90 91 91 99
• Mean: \( \bar{X} = \frac{\sum x}{n} \), \( \sum x = 2215 \), \( n = 30 \Rightarrow \bar{X} = \frac{2215}{30} = 73.83 \)
• Mode = 60 (it appears 3 times, the highest frequency of any data value)
• Median: \( MD = \frac{X_{15} + X_{16}}{2} = \frac{72 + 74}{2} = 73 \)
• \( Q_1 \): \( C = \frac{n \times p}{100} = \frac{30 \times 25}{100} = 7.5 \approx 8 \), so \( Q_1 = \frac{X_8 + X_9}{2} = \frac{64 + 67}{2} = 65.5 \)
• \( Q_3 \): \( C = \frac{n \times p}{100} = \frac{30 \times 75}{100} = 22.5 \approx 23 \), so \( Q_3 = \frac{X_{23} + X_{24}}{2} = \frac{85 + 85}{2} = 85 \)
• Variance: \( s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} = \frac{5294.17}{30 - 1} = 182.56 \)
• Standard deviation: \( s = \sqrt{s^2} = \sqrt{182.56} = 13.51 \)

Excel check (Column1):

Mean                  73.83
Standard Error        2.46
Median                73
Mode                  60
Standard Deviation    13.51
Variance              182.55

b. Construct a frequency table with 5 classes.

Recall: We have MAX = 99, MIN = 40, I = 12, K = 5

Class limits    Class boundaries    Midpoint    Frequency    Cumulative frequency
40 − 51         39.5 − 51.5         45.5        1            1
52 − 63         51.5 − 63.5         57.5        6            7
64 − 75         63.5 − 75.5         69.5        9            16
76 − 87         75.5 − 87.5         81.5        8            24
88 − 99         87.5 − 99.5         93.5        6            30
Total                                           30

Points for the ogive (cumulative frequency below each boundary):

Less than    39.5    51.5    63.5    75.5    87.5    99.5
Cum. freq.   0       1       7       16      24      30
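c. Using the grouped data formula, find the mean, mode, median, variance, standard
deviation, Q1, and Q3 for the table in part (b) and compare it to the results in part (a).

Class limits    Frequency (f)    Midpoint (Xm)    f·Xm     f·Xm²
40 − 51         1                45.5             45.5     2070.25
52 − 63         6                57.5             345      19837.5
64 − 75         9                69.5             625.5    43472.25
76 − 87         8                81.5             652      53138
88 − 99         6                93.5             561      52453.5
Total           30                                2229     170971.5

• Mean: \( \bar{X} = \frac{\sum f \cdot X_m}{n} = \frac{2229}{30} = 74.3 \)
• Mode = modal class = the class with the largest frequency.
• The modal class is 64 − 75 (class limits) or 63.5 − 75.5 (class boundaries), with midpoint 69.5.
• Median: \( MD = L_m + \frac{w}{f}(0.5n - cf) \)
• \( L_m = 63.5 \), \( w = 12 \), \( f = 9 \), \( n = 30 \), \( cf = 7 \)
• \( MD = 63.5 + \frac{12}{9}(0.5 \times 30 - 7) = 74.17 \)
• Variance: \( s^2 = \frac{n\left(\sum f \cdot X_m^2\right) - \left(\sum f \cdot X_m\right)^2}{n(n - 1)} = \frac{30(170971.5) - (2229)^2}{30(30 - 1)} = \frac{160704}{870} = 184.72 \)
• Standard deviation: \( s = \sqrt{s^2} = \sqrt{184.72} = 13.59 \)
• For \( p = 25 \): \( C = \frac{n \times p}{100} = \frac{30 \times 25}{100} = 7.5 \), which falls in the class 63.5 − 75.5 (cf = 7, f = 9):
\( Q_1 = 63.5 + \frac{12}{9}(7.5 - 7) \approx 64.17 \)
• For \( p = 75 \): \( C = \frac{30 \times 75}{100} = 22.5 \), which falls in the class 75.5 − 87.5 (cf = 16, f = 8):
\( Q_3 = 75.5 + \frac{12}{8}(22.5 - 16) = 85.25 \)
These are close to the values from part (a), \( Q_1 = 65.5 \) and \( Q_3 = 85 \).
Therefore, from the grouped data we get: Mean \( \bar{X} = 74.3 \), modal class 63.5 − 75.5,
\( MD = 74.17 \), \( s^2 = 184.72 \), \( s = 13.59 \), \( Q_1 \approx 64.17 \), \( Q_3 = 85.25 \). All are close to the
ungrouped results in part (a) (mean 73.83, median 73, \( s^2 = 182.56 \), \( s = 13.51 \)).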
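
The grouped-data formulas used in part (c) can be checked with the sketch below. The function names `grouped_mean_variance` and `grouped_position` are hypothetical helpers; the second one applies the interpolation formula L + (w/f)(position − cf) for the median (position n/2) or any other position inside a class.

```python
def grouped_mean_variance(midpoints, freqs):
    """Mean and sample variance from class midpoints and frequencies."""
    n = sum(freqs)
    s1 = sum(f * x for x, f in zip(midpoints, freqs))        # sum of f * Xm
    s2 = sum(f * x * x for x, f in zip(midpoints, freqs))    # sum of f * Xm^2
    return s1 / n, (n * s2 - s1**2) / (n * (n - 1))


def grouped_position(lower_boundary, width, freq, cum_before, position):
    """Interpolated value at a given position inside one class."""
    return lower_boundary + (width / freq) * (position - cum_before)


midpoints = [45.5, 57.5, 69.5, 81.5, 93.5]
freqs = [1, 6, 9, 8, 6]

mean, variance = grouped_mean_variance(midpoints, freqs)
median = grouped_position(63.5, 12, 9, 7, position=15)
q3 = grouped_position(75.5, 12, 8, 16, position=22.5)
print(round(mean, 1), round(variance, 2), round(median, 2), q3)
```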
d. Construct a histogram and comment on the shape of the distribution.

Comment: With class frequencies 1, 6, 9, 8, 6, the histogram has a single peak in the
class 63.5 − 75.5 and a longer tail toward the low scores (the lowest class contains only
one value), so the distribution is slightly skewed to the left. Two classes, 52 − 63 and
88 − 99, occur with the same frequency (6).
e. Find the percentile values of 55, 60, and 74.

\( \text{Percentile} = \frac{(\text{number of values below } x) + 0.5}{\text{total number of values}} \times 100 \)

• For x = 55, the percentile \( = \frac{1 + 0.5}{30} \times 100 = 5 \), i.e. the 5th percentile.
Hence, a student whose score was 55 did better than 5% of the class.
• For x = 60, the percentile \( = \frac{4 + 0.5}{30} \times 100 = 15 \), i.e. the 15th percentile.
Hence, a student whose score was 60 did better than 15% of the class.
• For x = 74, the percentile \( = \frac{15 + 0.5}{30} \times 100 \approx 51.7 \), i.e. about the 52nd percentile.
Hence, a student whose score was 74 did better than about 52% of the class.
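
The percentile formula in part (e) translates directly into code. `percentile_rank` is a hypothetical helper name used for this sketch.

```python
scores = [
    55, 60, 91, 85, 60, 70, 89, 99, 59, 67,
    72, 82, 60, 68, 57, 74, 64, 70, 68, 91,
    89, 90, 83, 40, 79, 85, 71, 80, 76, 81,
]


def percentile_rank(data, x):
    """Percentile of x: (number of values below x + 0.5) / n * 100."""
    below = sum(1 for v in data if v < x)
    return (below + 0.5) / len(data) * 100


for x in (55, 60, 74):
    print(x, round(percentile_rank(scores, x), 1))
```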

Exercise 28. For the following data:

6.3 2.9 4.5 1.1 1.8 4.0 1.2 3.1 2.0 4.0
7.0 2.8 4.3 5.3 2.9 8.3 4.4 2.8 3.1 5.6
4.5 4.5 5.7 0.5 6.2 3.7 0.9 2.4 3.0 3.5

(a) Find the mean, mode, median, variance, standard deviation, Q1 , Q3 , and 90 th percentile.
(b) Construct a frequency table with 5 classes.
(c) Using the grouped data formula, find the mean, mode, median, variance, standard devia-
tion, Q1 , Q3 and 90 th percentile for the frequency table constructed in part (b) and compare
it to the results in part (a).
(d) Construct a histogram, and comment on the shape of the data.
Solution :
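(a) Find the mean, mode, median, variance, standard deviation, Q1, Q3, and 90th percentile.
The data in ascending order:
0.5 0.9 1.1 1.2 1.8 2.0 2.4 2.8 2.8 2.9 2.9 3.0 3.1 3.1 3.5
3.7 4.0 4.0 4.3 4.4 4.5 4.5 4.5 5.3 5.6 5.7 6.2 6.3 7.0 8.3
• Mean: \( \bar{X} = \frac{\sum x}{n} = \frac{112.3}{30} = 3.74 \)
• Mode = 4.5 (it appears 3 times, the highest frequency of any data value)
• Median \( = \frac{X_{15} + X_{16}}{2} = \frac{3.5 + 3.7}{2} = 3.6 \)
• Variance: \( s^2 = \frac{\sum (X - \bar{X})^2}{n - 1} \)
With \( \bar{X} = 3.74 \) and \( n = 30 \), we get \( s^2 = \frac{101.55}{29} = 3.50 \)
• Standard deviation: \( s = \sqrt{s^2} = 1.87 \)
• Find \( Q_1 \) and \( Q_3 \)
• For \( p = 25 \): \( C = \frac{n \times p}{100} = \frac{30 \times 25}{100} = 7.5 \Rightarrow Q_1 = \frac{X_8 + X_9}{2} = \frac{2.8 + 2.8}{2} = 2.8 \)
• For \( p = 75 \): \( C = \frac{n \times p}{100} = \frac{30 \times 75}{100} = 22.5 \Rightarrow Q_3 = \frac{X_{23} + X_{24}}{2} = \frac{4.5 + 5.3}{2} = 4.9 \)
Therefore, \( Q_1 = 2.8 \), \( Q_3 = 4.9 \)
• Find the value corresponding to the 90th percentile.
• For \( p = 90 \): \( C = \frac{n \times p}{100} = \frac{30 \times 90}{100} = 27 \)
Hence, the data value corresponding to the 90th percentile is \( \frac{X_{27} + X_{28}}{2} = \frac{6.2 + 6.3}{2} = 6.25 \)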


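(b) Construct a frequency table with 5 classes.
We have MIN = 0.5, MAX = 8.3, K = 5, so the class width is I = (8.3 − 0.5)/5 = 1.56. The
upper limit of the last class is taken as 8.30 so that the maximum value 8.3 is counted;
with an upper limit of 8.29 it would fall outside the table and the frequencies would add
up to only 29.

Class limits    Class boundaries    Midpoint    Frequency    Cumulative frequency
0.50 − 2.05     0.495 − 2.055       1.275       6            6
2.06 − 3.61     2.055 − 3.615       2.835       9            15
3.62 − 5.17     3.615 − 5.175       4.395       8            23
5.18 − 6.73     5.175 − 6.735       5.955       5            28
6.74 − 8.30     6.735 − 8.305       7.52        2            30
Total                                           30

(c) Using the grouped data formula, find the mean, mode, median, variance, standard
deviation, Q1, Q3 and 90th percentile for the frequency table constructed in part (b) and
compare it to the results in part (a).

Class limits    Frequency (f)    Midpoint (Xm)    f·Xm      f·Xm²
0.50 − 2.05     6                1.275            7.650     9.754
2.06 − 3.61     9                2.835            25.515    72.335
3.62 − 5.17     8                4.395            35.160    154.528
5.18 − 6.73     5                5.955            29.775    177.310
6.74 − 8.30     2                7.52             15.040    113.101
Total           30                                113.140   527.028

• Mean: \( \bar{X} = \frac{\sum f \cdot X_m}{n} = \frac{113.14}{30} = 3.77 \)
• Mode = modal class = the class with the largest frequency.
• The modal class is 2.06 − 3.61 (class limits) or 2.055 − 3.615 (class boundaries).
• Median: \( MD = L_m + \frac{w}{f}(0.5n - cf) \)
• \( L_m = 2.055 \), \( w = 1.56 \), \( f = 9 \), \( n = 30 \), \( cf = 6 \)
• \( MD = 2.055 + \frac{1.56}{9}(0.5 \times 30 - 6) = 3.615 \approx 3.62 \)
• Variance: \( s^2 = \frac{n\left(\sum f \cdot X_m^2\right) - \left(\sum f \cdot X_m\right)^2}{n(n - 1)} = \frac{30(527.028) - (113.14)^2}{870} = \frac{3010.18}{870} = 3.46 \)
• Standard deviation: \( s = \sqrt{s^2} = \sqrt{3.46} = 1.86 \)
• For \( p = 25 \): \( C = \frac{30 \times 25}{100} = 7.5 \), in the class 2.055 − 3.615 (cf = 6, f = 9):
\( Q_1 = 2.055 + \frac{1.56}{9}(7.5 - 6) \approx 2.32 \)
• For \( p = 75 \): \( C = \frac{30 \times 75}{100} = 22.5 \), in the class 3.615 − 5.175 (cf = 15, f = 8):
\( Q_3 = 3.615 + \frac{1.56}{8}(22.5 - 15) \approx 5.08 \)
• For \( p = 90 \): \( C = \frac{30 \times 90}{100} = 27 \), in the class 5.175 − 6.735 (cf = 23, f = 5):
\( P_{90} = 5.175 + \frac{1.56}{5}(27 - 23) \approx 6.42 \)
Comparison with part (a): the grouped mean (3.77), median (3.62), variance (3.46), and
standard deviation (1.86) are close to the ungrouped values 3.74, 3.6, 3.50, and 1.87. The
grouped quartiles and 90th percentile (2.32, 5.08, 6.42) differ somewhat more from the
ungrouped values 2.8, 4.9, and 6.25, because the grouped formulas assume the values are
spread evenly within each class.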
(d) Construct a histogram, and comment on the shape of the data.


Exercise 29. In recent years, due to low interest rates, many homeowners refinanced their
home mortgages. Linda Lahey is a mortgage officer at Down River Federal Savings and
Loan. Below is the amount refinanced for 20 loans she processed last week. The data are
reported in thousands of dollars and arranged from smallest to largest.

59.2 59.5 61.6 65.5 66.6 72.9 74.8 77.3 79.2 83.7
85.6 85.8 86.6 87.0 87.1 90.2 93.3 98.6 100.2 100.7

a. Find the median, first quartile, and third quartile.


b. Find the 26 th and 83rd percentiles.
c. Draw a box plot of the data and comment on the shape of the distribution.
Solution :
a. Find the median, first quartile, and third quartile.
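• Median $= \dfrac{x_{10} + x_{11}}{2} = \dfrac{83.7 + 85.6}{2} = 84.65$
• For $p = 25$: $C = \dfrac{n \times p}{100} = \dfrac{20 \times 25}{100} = 5 \Rightarrow Q_1 = \dfrac{x_5 + x_6}{2} = \dfrac{66.6 + 72.9}{2} = 69.75$
• For $p = 75$: $C = \dfrac{n \times p}{100} = \dfrac{20 \times 75}{100} = 15 \Rightarrow Q_3 = \dfrac{x_{15} + x_{16}}{2} = \dfrac{87.1 + 90.2}{2} = 88.65$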
b. Find the 26th and 83rd percentiles.

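$$C = \frac{n \times p}{100} = \frac{20 \times 26}{100} = 5.2$$
Since C is not a whole number, we round up to 6, so the value corresponding to the 26th percentile is $L_{26} = x_6 = 72.9$

$$C = \frac{n \times p}{100} = \frac{20 \times 83}{100} = 16.6$$
Rounding up to 17, the value corresponding to the 83rd percentile is $L_{83} = x_{17} = 93.3$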
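The position rule used in parts a and b can be checked with a short script. This is only a sketch: the function name `percentile_value` is made up for illustration, and it assumes the textbook convention applied in the solution (average two neighbours when C is whole, otherwise round C up).

```python
import math

# Amounts refinanced (thousands of dollars), already sorted as in the exercise.
loans = [59.2, 59.5, 61.6, 65.5, 66.6, 72.9, 74.8, 77.3, 79.2, 83.7,
         85.6, 85.8, 86.6, 87.0, 87.1, 90.2, 93.3, 98.6, 100.2, 100.7]


def percentile_value(sorted_data, p):
    """Value at the p-th percentile using the position C = n*p/100."""
    n = len(sorted_data)
    c = n * p / 100
    if c == int(c):
        # Whole number: average the C-th and (C+1)-th values (1-based).
        c = int(c)
        return (sorted_data[c - 1] + sorted_data[c]) / 2
    # Not whole: round up and take the value at that 1-based position.
    return sorted_data[math.ceil(c) - 1]


for p in (25, 26, 50, 75, 83):
    print(p, percentile_value(loans, p))
```

Library routines such as `statistics.quantiles` interpolate differently, so their answers can differ slightly from this hand rule.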
c. Draw a box plot of the data and comment on the shape of the distribution.


Exercise 30. Hours Worked The data shown here represent the number of hours that 12
part-time employees at a toy store worked during the weeks before and after Christmas.
Construct two boxplots and compare the distributions.
Before 38 16 18 24 12 30 35 32 31 30 24 35
After 26 15 12 18 24 32 14 18 16 18 22 12
Solution :
Construct two boxplots and compare the distributions.
Before 38 16 18 24 12 30 35 32 31 30 24 35
After 26 15 12 18 24 32 14 18 16 18 22 12
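Each boxplot is drawn from a five-number summary (minimum, Q1, median, Q3, maximum). The sketch below computes these for both weeks; it assumes the convention that Q1 and Q3 are the medians of the lower and upper halves of the sorted data, and the helper name `five_number_summary` is illustrative only.

```python
from statistics import median

before = [38, 16, 18, 24, 12, 30, 35, 32, 31, 30, 24, 35]
after = [26, 15, 12, 18, 24, 32, 14, 18, 16, 18, 22, 12]


def five_number_summary(data):
    """Return (min, Q1, median, Q3, max) using the median-of-halves rule."""
    xs = sorted(data)
    half = len(xs) // 2
    lower = xs[:half]
    # For an odd sample size the middle value is left out of both halves.
    upper = xs[half:] if len(xs) % 2 == 0 else xs[half + 1:]
    return xs[0], median(lower), median(xs), median(upper), xs[-1]


print("Before:", five_number_summary(before))
print("After: ", five_number_summary(after))
```

Under this convention the summaries are (12, 21, 30, 33.5, 38) before Christmas and (12, 14.5, 18, 23, 32) after, so the hours worked before Christmas are higher on the whole and more spread out (interquartile range 12.5 against 8.5).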

Exercise 31. Many times in statistics it is necessary to see if a set of data values is approx-
imately normally distributed. There are special techniques that can be used. One technique
is to draw a histogram for the data and see if it is approximately bell-shaped. (Note: It does
not have to be exactly symmetric to be bell-shaped.) The numbers of branches of the 50 top
libraries are shown.
67 84 80 77 97 59 62 37 33 42
36 54 18 12 19 33 49 24 25 22
24 29 9 21 21 24 31 17 15 21
13 19 19 22 22 30 41 22 18 20
26 33 14 14 16 22 26 10 16 24


1. Construct a frequency distribution for the data.


2. Construct a histogram for the data.
3. Describe the shape of the histogram.
4. Based on your answer to question 3, do you feel that the distribution is approximately
normal?
5. Find the mean and standard deviation for the data.
6. What percent of the data values fall within 1 standard deviation of the mean?
7. What percent of the data values fall within 2 standard deviations of the mean?
8. What percent of the data values fall within 3 standard deviations of the mean?
9. Does your answer help support the conclusion you reached in question 4? Explain.
Solution :
1. Construct a frequency distribution for the data.
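Recall: the $2^k$ rule chooses the smallest $k$ such that $2^k \ge n$, where
• $k$ is the number of classes,
• $n$ is the number of data points.
The width of each class can then be found by using
$$i \ge \frac{\text{largest value} - \text{smallest value}}{k}$$

There are $n = 50$ data values, so we need $2^k \ge 50$. Since $2^5 = 32 < 50$ and $2^6 = 64 \ge 50$, we take $k = 6$.

We have MAX = 97, MIN = 9 and $k = 6$, so $i = \dfrac{\max - \min}{k} = \dfrac{97 - 9}{6} = 14.67$, which is rounded up to 15.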

lower limit upper limit lower boundary upper boundary Midpoint frequency
9 23 8.5 23.5 16 24
24 38 23.5 38.5 31 15
39 53 38.5 53.5 46 3
54 68 53.5 68.5 61 4
69 83 68.5 83.5 76 2
84 98 83.5 98.5 91 2
50

Cumulative frequency (using the upper class boundaries):

Less than 8.5     0
Less than 23.5    24
Less than 38.5    39
Less than 53.5    42
Less than 68.5    46
Less than 83.5    48
Less than 98.5    50


Classes    Boundaries     Midpoint Xm   Frequency f   f·Xm    f·Xm²
9 − 23     8.5 − 23.5     16            24            384     6144
24 − 38    23.5 − 38.5    31            15            465     14415
39 − 53    38.5 − 53.5    46            3             138     6348
54 − 68    53.5 − 68.5    61            4             244     14884
69 − 83    68.5 − 83.5    76            2             152     11552
84 − 98    83.5 − 98.5    91            2             182     16562
Sum                                     50            1565    69905
2. Construct a histogram for the data.

3. Describe the shape of the histogram.


According to the histogram, most of the values lie on the left and the tail extends to the right, so the distribution is positively skewed (right-skewed).
4. The distribution does not appear to be normal.
5. Find the mean and standard deviation for the data.
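• Mean: $\bar{X} = \dfrac{1}{n}\sum f \cdot X_m = \dfrac{(24)(16) + (15)(31) + (3)(46) + (4)(61) + (2)(76) + (2)(91)}{50} = \dfrac{1565}{50} = 31.3$
• Variance (grouped-data formula): $s^2 = \dfrac{n\sum f \cdot X_m^2 - \left(\sum f \cdot X_m\right)^2}{n(n-1)} = \dfrac{50(69905) - 1565^2}{50(49)} = 426.95$
• Standard deviation: $s = \sqrt{s^2} = 20.66$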

6. What percent of the data values fall within 1 standard deviation of the mean?

$$|X - \bar{X}| < s \iff \bar{X} - s < X < \bar{X} + s \iff 10.64 < X < 51.96$$

Of the 50 data values, 40 lie between 10.64 and 51.96.

Therefore, the percentage of data values that fall within 1 standard deviation of the mean is
$$\frac{\text{number of values in the interval}}{\text{number of values}} \times 100 = \frac{40}{50} \times 100 = 80\%$$


7. What percent of the data values fall within 2 standard deviations of the mean?

$$|X - \bar{X}| < 2s \iff \bar{X} - 2s < X < \bar{X} + 2s \iff -10.02 < X < 72.62$$

Of the 50 data values, 46 lie between −10.02 and 72.62. Therefore, the percentage of data values that fall within 2 standard deviations of the mean is
$$\frac{\text{number of values in the interval}}{\text{number of values}} \times 100 = \frac{46}{50} \times 100 = 92\%$$
8. What percent of the data values fall within 3 standard deviations of the mean?

$$|X - \bar{X}| < 3s \iff \bar{X} - 3s < X < \bar{X} + 3s \iff -30.68 < X < 93.28$$

Of the 50 data values, 49 lie between −30.68 and 93.28. Therefore, the percentage of data values that fall within 3 standard deviations of the mean is
$$\frac{\text{number of values in the interval}}{\text{number of values}} \times 100 = \frac{49}{50} \times 100 = 98\%$$
9. Does your answer help support the conclusion you reached in question 4? Explain.
Yes. For a normal distribution, about 68%, 95% and 99.7% of the values fall within 1, 2 and 3 standard deviations of the mean. The percentages found in questions 6, 7 and 8 (80%, 92% and 98%) differ noticeably from these, which supports the conclusion in question 4 that the distribution is not approximately normal.
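The counts in questions 6 to 8 can be reproduced with a short script. This is a sketch under one stated assumption: the mean and standard deviation are computed from the grouped table (midpoints and frequencies), while the counting is done on the 50 raw values. The function name `percent_within` is illustrative.

```python
import math

branches = [67, 84, 80, 77, 97, 59, 62, 37, 33, 42,
            36, 54, 18, 12, 19, 33, 49, 24, 25, 22,
            24, 29, 9, 21, 21, 24, 31, 17, 15, 21,
            13, 19, 19, 22, 22, 30, 41, 22, 18, 20,
            26, 33, 14, 14, 16, 22, 26, 10, 16, 24]

# Grouped frequency table from question 1.
midpoints = [16, 31, 46, 61, 76, 91]
freqs = [24, 15, 3, 4, 2, 2]

n = sum(freqs)
sum_fx = sum(f * m for f, m in zip(freqs, midpoints))
sum_fx2 = sum(f * m * m for f, m in zip(freqs, midpoints))
mean = sum_fx / n
s = math.sqrt((n * sum_fx2 - sum_fx ** 2) / (n * (n - 1)))


def percent_within(k):
    """Percentage of raw values strictly within k standard deviations of the mean."""
    lo, hi = mean - k * s, mean + k * s
    inside = sum(1 for x in branches if lo < x < hi)
    return 100 * inside / len(branches)


for k in (1, 2, 3):
    print(k, percent_within(k))
```

The three percentages can then be compared with the 68%, 95% and 99.7% expected for a normal distribution.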


TD2 - Point Estimation


Exercise 1. Data on pull-off force (pounds) for connectors used in an automobile engine
application are as follows
79.3 75.1 78.2 74.1 73.9 75.0 77.6 77.3 73.8 74.6 75.5 74.0 74.7
75.9 72.9 73.8 74.2 78.1 75.4 76.3 75.3 76.2 74.9 78.0 75.1 76.8

(a) Calculate a point estimate of the mean pull-off force of all connectors in the population.
State which estimator you used and why.
(b) Calculate a point estimate of the pull-off force value that separates the weakest 50% of
the connectors in the population from the strongest 50%.
(c) Calculate point estimates of the population variance and the population standard devi-
ation.
(d) Calculate the standard error of the point estimate found in part (a). Interpret the stan-
dard error.
(e) Calculate a point estimate of the proportion of all connectors in the population whose
pull-off force is less than 73 pounds
Solution :
(a) Calculate a point estimate of the mean pull-off.

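$$\hat{\mu} = \bar{X} = \frac{1}{26}\sum_{i=1}^{26} x_i = 75.61538$$

The estimator used is the sample mean $\bar{X}$, because it is an unbiased estimator of the population mean.
Therefore, the point estimate of the mean pull-off force is 75.61538 pounds.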


(b) Calculate a point estimate of the pull-off force value that separates the weakest 50% of
the connectors in the population from the strongest 50%

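$$\hat{\tilde{\mu}} = MD = \frac{x_{(n/2)} + x_{(n/2+1)}}{2} = \frac{x_{(13)} + x_{(14)}}{2} = \frac{75.1 + 75.3}{2} = 75.2$$
Therefore, the point estimate of the population median is 75.2 pounds.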
(c) Calculate point estimate of the population variance and the population standard devia-
tion.
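Here $s^2$ is the point estimate of the population variance and $s$ is the point estimate of the population standard deviation, where
$$s^2 = \frac{1}{n-1}\sum_{i=1}^{26}(x_i - \bar{x})^2 = \frac{68.4538}{25} = 2.7382$$
Then, $s = \sqrt{s^2} = \sqrt{2.7382} = 1.655$
Therefore, $s^2 = 2.7382$ and $s = 1.655$
(d) Calculate the standard error of the point estimate found in part (a).
$$\hat{\sigma}_{\bar{X}} = \sqrt{\hat{V}(\bar{X})} = \sqrt{\frac{s^2}{n}} = \frac{1.655}{\sqrt{26}} = 0.3245$$
Therefore, $\hat{\sigma}_{\bar{X}} = 0.3245$: the sample mean typically differs from the true mean pull-off force by about 0.32 pounds.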


(e) Calculate a point estimate of the proportion of all connectors in the population whose pull-off force is less than 73 pounds.
We use $\hat{p} = \dfrac{x}{n}$, where $x$ is the number of connectors in the sample whose force is less than 73 pounds. Only one observation (72.9) is below 73.
Then, $\hat{p} = \dfrac{1}{26} = 0.03846$
Therefore, $\hat{p} = 0.03846$
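The point estimates of this exercise can be recomputed from the raw data with Python's standard library. This is a sketch for checking the arithmetic; the variable names are chosen here for illustration, and `statistics.variance` uses the n − 1 divisor.

```python
import math
from statistics import mean, median, variance

force = [79.3, 75.1, 78.2, 74.1, 73.9, 75.0, 77.6, 77.3, 73.8, 74.6, 75.5, 74.0, 74.7,
         75.9, 72.9, 73.8, 74.2, 78.1, 75.4, 76.3, 75.3, 76.2, 74.9, 78.0, 75.1, 76.8]

n = len(force)
x_bar = mean(force)             # (a) point estimate of the mean
md = median(force)              # (b) point estimate of the median
s2 = variance(force)            # (c) sample variance, divisor n - 1
s = math.sqrt(s2)               # (c) sample standard deviation
se = s / math.sqrt(n)           # (d) estimated standard error of the mean
p_hat = sum(1 for x in force if x < 73) / n   # (e) proportion below 73 pounds

print(x_bar, md, s2, s, se, p_hat)
```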
Exercise 2. (a) A random sample of 10 houses in a particular area, each of which is heated
with natural gas, is selected and the amount of gas (therms) used during the month of
January is determined for each house. The resulting observations are

103 156 118 89 125 147 122 109 138 99

Let µ denote the average gas usage during January by all houses in this area. Compute a
point estimate of µ
(b) Suppose there are 10,000 houses in this area that use natural gas for heating. Let t
denote the total amount of gas used by all of these houses during January. Estimate t using
the data of part (a). What estimator did you use in computing your estimate?
(c) Use the data in part (a) to estimate p, the proportion of all houses that used at least 100
therms.
(d) Give a point estimate of the population median usage (the middle value in the population
of all houses) based on the sample of part (a). What estimator did you use?
Solution :
(a) Compute a point estimate of µ
$$\hat{\mu} = \bar{x} = \frac{\sum_{i=1}^{10} x_i}{10} = \frac{1206}{10} = 120.6$$
Therefore, $\hat{\mu} = 120.6$ therms
(b) Estimate t using the data of part (a).
Note that t is the total amount of gas used by all 10,000 houses during January, so $t = 10000\mu$.

$$\hat{t} = 10000 \times \hat{\mu} = 1\,206\,000 \text{ therms}.$$

The estimator used is $10000\bar{X}$, that is, the number of houses multiplied by the sample mean gas usage of the 10 sampled houses.
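(c) Use the data in part (a) to estimate p.
Eight of the ten sampled houses used at least 100 therms, so
$$\hat{p} = \frac{x}{n} = \frac{8}{10} = 0.8$$
Therefore, $\hat{p} = 0.8$
(d) Give a point estimate of the population median usage.
Using the sample median as the estimator, with the sample sorted in increasing order,
$$\hat{\tilde{\mu}} = MD = \frac{x_{(5)} + x_{(6)}}{2} = \frac{118 + 122}{2} = 120$$
Therefore, $\hat{\tilde{\mu}} = 120$ therms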
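A quick numerical check of the four estimates, written as a sketch with illustrative variable names:

```python
from statistics import mean, median

gas = [103, 156, 118, 89, 125, 147, 122, 109, 138, 99]  # therms, 10 sampled houses
houses_in_area = 10_000

mu_hat = mean(gas)                                    # (a) sample mean
t_hat = houses_in_area * mu_hat                       # (b) estimated total usage
p_hat = sum(1 for x in gas if x >= 100) / len(gas)    # (c) share using at least 100 therms
median_hat = median(gas)                              # (d) sample median

print(mu_hat, t_hat, p_hat, median_hat)
```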


Exercise 3. Let $X_1, X_2, \ldots, X_n$ be a random sample from a distribution having finite variance $\sigma^2$. Show that
$$S^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2$$
is an unbiased estimator of $\sigma^2$. Hint: Write
$$S^2 = \frac{1}{n-1}\left(\sum_{i=1}^{n} X_i^2 - n\bar{X}^2\right)$$
and compute $E(S^2)$.
Solution :
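Show that $S^2 = \dfrac{1}{n-1}\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2$ is an unbiased estimator of $\sigma^2$.
We have
$$S^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2 = \frac{1}{n-1}\left(\sum_{i=1}^{n} X_i^2 - n\bar{X}^2\right)$$
$$E(S^2) = \frac{1}{n-1}E\left[\sum_{i=1}^{n} X_i^2 - n\bar{X}^2\right]$$
Since $n\bar{X}^2 = \frac{1}{n}\left(\sum_{i=1}^{n} X_i\right)^2$,
$$E(S^2) = \frac{1}{n-1}\sum_{i=1}^{n} E\left(X_i^2\right) - \frac{1}{n(n-1)}E\left[\left(\sum_{i=1}^{n} X_i\right)^2\right]$$
We have $E\left(X_i^2\right) = \sigma^2 + \mu^2$ and
$$E\left[\left(\sum_{i=1}^{n} X_i\right)^2\right] = V\left(\sum_{i=1}^{n} X_i\right) + \left[E\left(\sum_{i=1}^{n} X_i\right)\right]^2 = n\sigma^2 + (n\mu)^2$$
Thus, $E(S^2) = \dfrac{1}{n-1}\left(n\sigma^2 + n\mu^2 - \sigma^2 - n\mu^2\right) = \sigma^2$
Therefore, $S^2$ is an unbiased estimator of $\sigma^2$.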

Exercise 4. Suppose that $X$ is the number of observed "successes" in a sample of $n$ observations where $p$ is the probability of success on each observation.
(a) Show that $\hat{p} = X/n$ is an unbiased estimator of $p$.
(b) Show that the standard error of $\hat{p}$ is $\sqrt{p(1-p)/n}$. How would you estimate the standard error?
Solution :
(a) Show that $\hat{p} = \dfrac{X}{n}$ is an unbiased estimator of $p$.
Since $X \sim \mathrm{Bin}(n, p)$, we have $E(X) = np$.
Then, $E(\hat{p}) = E\left(\dfrac{X}{n}\right) = \dfrac{np}{n} = p$

Therefore, $\hat{p}$ is an unbiased estimator of $p$.

(b) Show that the standard error of $\hat{p}$ is $\sqrt{p(1-p)/n}$.
Let $\sigma_{\hat{p}}$ be the standard error. Since $V(X) = np(1-p)$,
$$\sigma_{\hat{p}} = \sqrt{V(\hat{p})} = \sqrt{\frac{1}{n^2} \times np(1-p)} = \sqrt{p(1-p)/n}$$
Therefore, the standard error of $\hat{p}$ is $\sqrt{p(1-p)/n}$.
Since $p$ is unknown, the standard error is estimated by substituting $\hat{p}$ for $p$, which gives $\hat{\sigma}_{\hat{p}} = \sqrt{\hat{p}(1-\hat{p})/n}$.

Exercise 5. Let $X_1, X_2, \ldots, X_n$ be a random sample drawn from a distribution with mean $\mu$ and variance $\sigma^2$ and let $a_1, \ldots, a_n$ be real numbers such that $\sum_{i=1}^{n} a_i = 1$.

Define $\hat{X} = \sum_{i=1}^{n} a_i X_i$.

(a) Show that $\hat{X}$ is an unbiased estimator of $\mu$.

(b) Show that $V(\bar{X}) \le V(\hat{X})$ (hence among all estimators of $\mu$ of the form $\sum_{i=1}^{n} a_i X_i$, $\bar{X}$ is the MVUE).
Solution :
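(a) Show that $\hat{X}$ is an unbiased estimator of $\mu$.
We have $\hat{X} = \sum_{i=1}^{n} a_i X_i$.
Then, $E(\hat{X}) = E\left(\sum_{i=1}^{n} a_i X_i\right) = \sum_{i=1}^{n} a_i E(X_i)$

Since $E(X_i) = \mu$ for every $i$ and $\sum_{i=1}^{n} a_i = 1$, we get $E(\hat{X}) = \mu\sum_{i=1}^{n} a_i = \mu$

Therefore, $\hat{X}$ is an unbiased estimator of $\mu$.

(b) Show that $V(\bar{X}) \le V(\hat{X})$
Since the $X_i$ are independent, $V(\hat{X}) = \sigma^2\sum_{i=1}^{n} a_i^2$ and $V(\bar{X}) = \dfrac{1}{n^2}V\left(\sum_{i=1}^{n} X_i\right) = \sigma^2/n$

By the Cauchy-Schwarz inequality, for all real numbers $a_i, b_i$,
$$(a_1 b_1 + \ldots + a_n b_n)^2 \le \left(a_1^2 + \ldots + a_n^2\right)\left(b_1^2 + \ldots + b_n^2\right)$$
Take $b_1 = \ldots = b_n = 1$. Since $\sum a_i = 1$, this gives $1 \le n\left(a_1^2 + \ldots + a_n^2\right)$, that is, $a_1^2 + \ldots + a_n^2 \ge \dfrac{1}{n}$.
Multiplying by $\sigma^2$ gives $\sigma^2/n \le \sigma^2\sum_{i=1}^{n} a_i^2$.
Therefore, $V(\bar{X}) \le V(\hat{X})$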


Exercise 6. Let $X_1, X_2, \ldots, X_n$ be a random sample from a distribution with unknown mean $-\infty < \mu < +\infty$ and unknown variance $\sigma^2 > 0$. Show that the statistics $\bar{X}$ and
$$Y = \frac{X_1 + 2X_2 + \ldots + nX_n}{n(n+1)/2}$$
are both unbiased estimators of $\mu$. Further, show that $V(\bar{X}) < V(Y)$.

Solution :
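Show that $\bar{X}$ and $Y$ are both unbiased estimators of $\mu$.
We have $E(\bar{X}) = \dfrac{1}{n}\sum_{i=1}^{n} E(X_i) = \mu$ and
$$E(Y) = E\left(\frac{X_1 + 2X_2 + \ldots + nX_n}{n(n+1)/2}\right) = \frac{2}{n(n+1)}\sum_{i=1}^{n} i\,E(X_i) = \frac{2}{n(n+1)} \cdot \frac{n(n+1)}{2}\,\mu = \mu$$

Therefore, they are both unbiased estimators of $\mu$.

Show that $V(\bar{X}) \le V(Y)$
We have $V(\bar{X}) = \dfrac{\sigma^2}{n}$ and $V(Y) = \dfrac{4\sigma^2}{n^2(1+n)^2}\sum_{i=1}^{n} i^2$
By the Cauchy-Schwarz inequality, for all real numbers $a_i, b_i$,
$$(a_1 b_1 + \ldots + a_n b_n)^2 \le \left(a_1^2 + \ldots + a_n^2\right)\left(b_1^2 + \ldots + b_n^2\right)$$
Take $b_1 = \ldots = b_n = 1$ and $a_i = i$ for all $i \in \{1, \ldots, n\}$.
We obtain
$$\left(\frac{n(1+n)}{2}\right)^2 \le n\left(1^2 + 2^2 + \ldots + n^2\right)$$
Then,
$$\frac{1}{n} \le \frac{4\sum_{i=1}^{n} i^2}{n^2(1+n)^2}$$
Multiplying by $\sigma^2$ we get $V(\bar{X}) \le V(Y)$.
Equality in the Cauchy-Schwarz inequality requires all $a_i$ to be equal, which happens here only when $n = 1$. Therefore, $V(\bar{X}) < V(Y)$ for every $n \ge 2$.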

Exercise 7. Using a long rod that has length $\mu$, you are going to lay out a square plot in which the length of each side is $\mu$. Thus the area of the plot will be $\mu^2$. However, you do not know the value of $\mu$, so you decide to make $n$ independent measurements $X_1, X_2, \ldots, X_n$ of the length. Assume that each $X_i$ has mean $\mu$ (unbiased measurements) and variance $\sigma^2$.

(a) Show that $\bar{X}^2$ is not an unbiased estimator for $\mu^2$. [Hint: For any rv $Y$, $E(Y^2) = V(Y) + [E(Y)]^2$. Apply this with $Y = \bar{X}$.]
(b) For what value of $k$ is the estimator $\bar{X}^2 - kS^2$ unbiased for $\mu^2$? [Hint: Compute $E(\bar{X}^2 - kS^2)$.]
Solution :
(a) Show that $\bar{X}^2$ is not an unbiased estimator for $\mu^2$

We will show that $E(\bar{X}^2) \ne \mu^2$.
$$V(\bar{X}) = E(\bar{X}^2) - [E(\bar{X})]^2$$
Thus,
$$E(\bar{X}^2) = V(\bar{X}) + [E(\bar{X})]^2$$
Then,
$$E(\bar{X}^2) = \frac{\sigma^2}{n} + \mu^2 \ne \mu^2$$
Therefore, $\bar{X}^2$ is not an unbiased estimator for $\mu^2$.

(b) For what value of $k$ is the estimator $\bar{X}^2 - kS^2$ unbiased for $\mu^2$?

For the estimator to be unbiased we need
$$E(\bar{X}^2 - kS^2) = E(\bar{X}^2) - kE(S^2) = \frac{\sigma^2}{n} + \mu^2 - k\sigma^2 = \mu^2$$
Then, we obtain $k = \dfrac{1}{n}$
Therefore, $k = \dfrac{1}{n}$
Exercise 8. Let X1 , X2 , . . . , Xn be uniformly distributed on the interval [0, θ]. Recall that
the maximum likelihood estimator of θ is θ̂ = max (Xi ).
(a) Argue intuitively why θ̂ cannot be an unbiased estimator for θ.
(b) Suppose that E(θ̂) = nθ/(n + 1). Is it reasonable that θ̂ consistently underestimates θ ?
Show that the bias in the estimator approaches zero as n gets large.
(c) Propose an unbiased estimator for θ.
(d) Let $Y = \max(X_i)$. Use the fact that $Y \le y$ if and only if each $X_i \le y$ to derive the cumulative distribution function of $Y$. Then show that the probability density function of $Y$ is
$$f(y) = \begin{cases} \dfrac{n y^{n-1}}{\theta^n}, & 0 \le y \le \theta \\ 0, & \text{otherwise.} \end{cases}$$
Use this result to show that the maximum likelihood estimator for $\theta$ is biased.
(e) We have two unbiased estimators for $\theta$: the moment estimator $\hat{\theta}_1 = 2\bar{X}$ and $\hat{\theta}_2 = [(n+1)/n]\max(X_i)$, where $\max(X_i)$ is the largest observation in a random sample of size $n$. It can be shown that $V(\hat{\theta}_1) = \theta^2/(3n)$ and that $V(\hat{\theta}_2) = \theta^2/[n(n+2)]$. Show that if $n > 1$, $\hat{\theta}_2$ is a better estimator than $\hat{\theta}_1$. In what sense is it a better estimator of $\theta$?
Solution :
(a) Argue intuitively why $\hat{\theta}$ cannot be an unbiased estimator for $\theta$.
Intuitively, every observation lies in $[0, \theta]$, so $\hat{\theta} = \max(X_i)$ can never exceed $\theta$ and is almost always smaller. Its average value is therefore below $\theta$, so it cannot be an unbiased estimator.
(b) Suppose that $E(\hat{\theta}) = \dfrac{n}{n+1}\theta$
It is reasonable that $\hat{\theta}$ consistently underestimates $\theta$, because $E(\hat{\theta}) = \dfrac{n}{n+1}\theta < \theta$.
Note: a statistic is positively biased if it tends to overestimate the parameter. A statistic is negatively biased if it tends to underestimate the parameter.
$$B(\hat{\theta}) = E(\hat{\theta}) - \theta = \frac{n}{n+1}\theta - \theta = -\frac{\theta}{n+1}$$
Therefore, as $n \to \infty$ we get $B(\hat{\theta}) \to 0$

(c) Propose an unbiased estimator for $\theta$
We have $E(\hat{\theta}) = \dfrac{n}{n+1}\theta$
Then, $E\left(\dfrac{n+1}{n}\hat{\theta}\right) = \theta$
In conclusion, we can propose $T_n = \dfrac{n+1}{n}\hat{\theta}$ as an unbiased estimator of $\theta$.
(d) Let $Y = \max(X_i)$.
Find the cdf of $Y$.
We have $P(Y \le y) = P(\max(X_i) \le y)$ and, since $X_1, \ldots, X_n$ are independent, $P(Y \le y) = [P(X \le y)]^n$
Since $X$ is uniformly distributed, $P(X \le y) = \dfrac{y}{\theta}$ for $0 \le y \le \theta$
Therefore, $P(Y \le y) = \left(\dfrac{y}{\theta}\right)^n$
so its pdf is given by $f_Y(y) = \dfrac{n y^{n-1}}{\theta^n}$ for $0 \le y \le \theta$ and zero elsewhere.
Use this result to show that the maximum likelihood estimator for $\theta$ is biased.
We have $E(\hat{\theta}) = \displaystyle\int_0^{\theta} \frac{n y^n}{\theta^n}\,dy = \frac{n}{n+1}\theta \ne \theta$
Therefore, it is a biased estimator.
(e) Given two unbiased estimators for $\theta$: $\hat{\theta}_1 = 2\bar{X}$ and $\hat{\theta}_2 = \dfrac{n+1}{n}\max(X_i)$
We have $V(\hat{\theta}_1) = \dfrac{\theta^2}{3n}$ and $V(\hat{\theta}_2) = \dfrac{\theta^2}{n(n+2)}$
For $n > 1$ we have $n + 2 > 3$, hence $V(\hat{\theta}_2) < V(\hat{\theta}_1)$.

Therefore, $\hat{\theta}_2$ is a better estimator in the sense that both are unbiased but $\hat{\theta}_2$ has the smaller variance (it is more efficient).
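The bias of the maximum and the effect of the (n + 1)/n correction can be illustrated with a small Monte Carlo experiment. This is only a sketch: the values n = 5, θ = 10, the number of trials and the seed are arbitrary choices made here, not part of the exercise.

```python
import random


def simulate(n, theta, trials, seed=0):
    """Average of max(X_i) over many samples, and the same average after
    multiplying by (n + 1) / n."""
    rng = random.Random(seed)
    total_max = 0.0
    for _ in range(trials):
        total_max += max(rng.uniform(0, theta) for _ in range(n))
    mean_max = total_max / trials
    return mean_max, (n + 1) / n * mean_max


n, theta = 5, 10.0
mean_max, mean_corrected = simulate(n, theta, trials=20000)
print("average of max(X_i):      ", mean_max)        # theory: n*theta/(n+1)
print("average of corrected est.:", mean_corrected)  # theory: theta
```

With these settings the first average should be close to nθ/(n + 1) ≈ 8.33 and the second close to θ = 10, up to simulation noise.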

Exercise 9. A random sample X1 , X2 , . . . , Xn of size n is taken from a Poisson distribution


with a mean of λ, 0 < λ < ∞
(a) Show that the maximum likelihood estimator for λ is λ̂ = X̄.
(b) Let X equal the number of flaws per 100 feet of a used computer tape. Assume that X
has a Poisson distribution with a mean of λ. If 40 observations of X yielded 5 zeros, 7 ones,
12 twos, 9 threes, 5 fours, 1 five, and 1 six, find the maximum likelihood estimate of λ.
Solution :
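(a) Show that the maximum likelihood estimator for $\lambda$ is $\hat{\lambda} = \bar{X}$
We have $X_1, \ldots, X_n \sim \mathrm{Poi}(\lambda)$
Then $p(x_i) = \dfrac{e^{-\lambda}\lambda^{x_i}}{x_i!}$ for $x_i = 0, 1, 2, \ldots$
The likelihood function is
$$L(x; \lambda) = \frac{e^{-n\lambda}\lambda^{\sum_{i=1}^{n} x_i}}{x_1!\,x_2! \ldots x_n!}$$
So, $\ln L(x; \lambda) = -n\lambda + \left(\sum_{i=1}^{n} x_i\right)\ln\lambda - \ln(x_1!\,x_2! \ldots x_n!)$
Then, $\dfrac{\partial}{\partial\lambda}\ln L(x; \lambda) = -n + \dfrac{\sum_{i=1}^{n} x_i}{\lambda}$
• Setting $\dfrac{\partial}{\partial\lambda}\ln L(x; \lambda) = 0 \iff -n + \dfrac{\sum_{i=1}^{n} x_i}{\lambda} = 0 \iff \lambda = \dfrac{1}{n}\sum_{i=1}^{n} x_i$
Therefore, $\hat{\lambda}_{MLE} = \bar{X}$
(b) Find the maximum likelihood estimate of $\lambda$.
We have $\hat{\lambda} = \dfrac{1}{n}\sum_{i=1}^{40} x_i = \dfrac{0(5) + 1(7) + 2(12) + 3(9) + 4(5) + 5(1) + 6(1)}{40} = \dfrac{89}{40}$

Then, $\hat{\lambda} = 2.225$ flaws per 100 feet

Therefore, the maximum likelihood estimate of $\lambda$ is 2.225 flaws per 100 feet.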
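Because the Poisson MLE is the sample mean, the estimate in part (b) is just a weighted average of the observed counts. A small sketch (the dictionary name `counts` is an illustrative choice):

```python
# Number of flaws per 100 feet -> how many of the 40 observations showed that value.
counts = {0: 5, 1: 7, 2: 12, 3: 9, 4: 5, 5: 1, 6: 1}

n = sum(counts.values())                               # number of observations
total_flaws = sum(k * c for k, c in counts.items())    # sum of all x_i
lam_hat = total_flaws / n                              # MLE = sample mean

print(n, total_flaws, lam_hat)
```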

Exercise 10. Let $f(x; \theta) = (1/\theta)x^{(1-\theta)/\theta}$, $0 < x < 1$, $0 < \theta < \infty$.

(a) Show that the maximum likelihood estimator of $\theta$ is $\hat{\theta} = -(1/n)\sum_{i=1}^{n}\ln X_i$.

(b) Show that $E(\hat{\theta}) = \theta$ and thus that $\hat{\theta}$ is an unbiased estimator of $\theta$.


Solution :
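(a) Show that the maximum likelihood estimator of $\theta$ is $\hat{\theta} = -\dfrac{1}{n}\sum_{i=1}^{n}\ln(X_i)$
We have $f(x_i; \theta) = \dfrac{1}{\theta}x_i^{(1-\theta)/\theta}$, $0 < x_i < 1$ and $0 < \theta < \infty$
The likelihood function is
$$L(x; \theta) = \prod_{i=1}^{n} f(x_i; \theta) = \frac{1}{\theta^n}\prod_{i=1}^{n} x_i^{(1-\theta)/\theta}$$
Then,
$$\ln L(x; \theta) = -n\ln(\theta) + \frac{1-\theta}{\theta}\sum_{i=1}^{n}\ln(x_i)$$
$$\frac{\partial}{\partial\theta}\ln L(x; \theta) = -\frac{n}{\theta} - \frac{1}{\theta^2}\sum_{i=1}^{n}\ln(x_i)$$
$$\frac{\partial}{\partial\theta}\ln L(x; \theta) = 0 \iff -\frac{n}{\theta} - \frac{1}{\theta^2}\sum_{i=1}^{n}\ln(x_i) = 0$$
Then, $\theta = -\dfrac{1}{n}\sum_{i=1}^{n}\ln(x_i)$

Therefore, $\hat{\theta}_{MLE} = -\dfrac{1}{n}\sum_{i=1}^{n}\ln(X_i)$

(b) Show that $E(\hat{\theta}) = \theta$
$$E(\hat{\theta}) = E\left(-\frac{1}{n}\sum_{i=1}^{n}\ln(X_i)\right) = -\frac{1}{n}\sum_{i=1}^{n}E(\ln X_i)$$
We have $E(\ln X) = \displaystyle\int_0^1 \ln x\, f(x; \theta)\,dx = \frac{1}{\theta}\int_0^1 x^{(1-\theta)/\theta}\ln x\,dx$
Changing variable, let $u = \ln x$, so $x = e^u$ and $dx = e^u\,du$.
Then $E(\ln X) = \dfrac{1}{\theta}\displaystyle\int_{-\infty}^{0} u\,e^{u/\theta}\,du = -\theta$ (integrating by parts)
Therefore, $E\left(-\dfrac{1}{n}\sum_{i=1}^{n}\ln(X_i)\right) = -\dfrac{1}{n}\sum_{i=1}^{n}E(\ln X_i) = -\dfrac{1}{n} \cdot n(-\theta) = \theta$

Therefore, $E(\hat{\theta}) = \theta$, so $\hat{\theta}$ is an unbiased estimator of $\theta$.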

Exercise 11. Let $X_1, X_2, \ldots, X_n$ be a random sample of size $n$ from the exponential distribution whose pdf is $f(x; \theta) = (1/\theta)e^{-x/\theta}$, $0 < x < \infty$, $0 < \theta < \infty$.
(a) Show that $\bar{X}$ is an unbiased estimator of $\theta$.
(b) Show that the variance of $\bar{X}$ is $\theta^2/n$.
(c) What is a good estimate of θ if a random sample of size 5 yielded the sample values
3.5, 8.1, 0.9, 4.4, and 0.5 ?
Solution :
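(a) Show that $\bar{X}$ is an unbiased estimator of $\theta$.
We have $E(\bar{X}) = \dfrac{1}{n}\sum_{i=1}^{n}E(X_i)$ and, since $X_1, \ldots, X_n \sim \mathrm{Exp}(\theta)$, $E(X_i) = \theta$.
Then, $E(\bar{X}) = E(X) = \theta$
Therefore, $\bar{X}$ is an unbiased estimator of $\theta$.
(b) Show that the variance of $\bar{X}$ is $\dfrac{\theta^2}{n}$
By independence, $V(\bar{X}) = \dfrac{1}{n^2}\sum_{i=1}^{n}V(X_i)$
Since $X_1, \ldots, X_n \sim \mathrm{Exp}(\theta)$, $V(X_i) = \theta^2$.
Then, $V(\bar{X}) = \dfrac{1}{n}V(X) = \dfrac{\theta^2}{n}$
Therefore, the variance of $\bar{X}$ is $\dfrac{\theta^2}{n}$
(c) What is a good estimate of $\theta$ if a random sample of size 5 yielded the sample values 3.5, 8.1, 0.9, 4.4 and 0.5?
By the Cramér-Rao inequality, any unbiased estimator satisfies $V(\hat{\theta}) \ge \dfrac{1}{nI(\theta)}$, and a good (efficient) estimator attains this bound, where
$$I(\theta) = -E\left[\frac{\partial^2}{\partial\theta^2}\ln f(X; \theta)\right]$$
We have
$$\ln f(x; \theta) = \ln\left(\frac{1}{\theta}e^{-x/\theta}\right) = -\ln\theta - \frac{x}{\theta}$$
Then
$$\frac{\partial}{\partial\theta}\ln f(x; \theta) = -\frac{1}{\theta} + \frac{x}{\theta^2}$$
Then
$$\frac{\partial^2}{\partial\theta^2}\ln f(x; \theta) = \frac{1}{\theta^2} - \frac{2x}{\theta^3}$$
So,
$$I(\theta) = -E\left[\frac{1}{\theta^2} - \frac{2X}{\theta^3}\right] = -\frac{1}{\theta^2} + \frac{2\theta}{\theta^3} = \frac{1}{\theta^2}$$
Then the lower bound is $\dfrac{1}{nI(\theta)} = \dfrac{\theta^2}{n}$, which equals $V(\bar{X})$ from part (b).
In conclusion, a good estimator of $\theta$ is $\bar{X}$. Using the given data we get $\bar{x} = \dfrac{3.5 + 8.1 + 0.9 + 4.4 + 0.5}{5} = 3.48$
Therefore, a good estimate is 3.48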
Exercise 12. A diagnostic test for a certain disease is applied to n individuals known to not
have the disease. Let X = the number among the n test results that are positive (indicating
presence of the disease, so X is the number of false positives) and p = the probability that
a disease-free individual’s test result is positive (i.e., p is the true proportion of test results
from disease-free individuals that are positive). Assume that only X is available rather than
the actual sequence of test results.
(a) Derive the maximum likelihood estimator of p. If n = 20 and x = 3, what is the estimate?
(b) Is the estimator of part (a) unbiased?
(c) If $n = 20$ and $x = 3$, what is the MLE of the probability $(1-p)^5$ that none of the next five tests done on disease-free individuals are positive?
Solution :
Let X = the number among the n test results that are positive.
p = the probability that a disease-free individual’s test result is positive.
(a) Derive the maximum likelihood estimator of $p$. If $n = 20$ and $x = 3$, what is the estimate?
We have $X \sim \mathrm{Bin}(n, p)$
So, $P(X = x) = C(n, x)\,p^x(1-p)^{n-x}$
The likelihood function is $L(x; p) = C(n, x)\,p^x(1-p)^{n-x}$
Then,
$$\ln L(x; p) = \ln\left(C(n, x)\,p^x(1-p)^{n-x}\right) = \ln C(n, x) + x\ln p + (n-x)\ln(1-p)$$
So, $\dfrac{\partial}{\partial p}\ln L(x; p) = \dfrac{x}{p} - \dfrac{n-x}{1-p}$
• Setting $\dfrac{\partial}{\partial p}\ln L(x; p) = 0 \iff \dfrac{x}{p} - \dfrac{n-x}{1-p} = 0$ gives $\hat{p} = \dfrac{x}{n}$

Therefore, $\hat{p} = \dfrac{X}{n}$
Using the given data we get $\hat{p} = \dfrac{3}{20} = 0.15$
(b) Is the estimator of part (a) unbiased?
We have $\hat{p} = \dfrac{X}{n}$; then $E(\hat{p}) = E\left(\dfrac{X}{n}\right) = \dfrac{1}{n}E(X) = \dfrac{np}{n} = p$
Therefore, it is unbiased.
(c) What is the MLE of the probability $(1-p)^5$?
By the invariance principle, the MLE of $(1-p)^5$ is $(1-\hat{p})^5$. For $n = 20$ and $x = 3$, $\hat{p} = 0.15$. Thus, $(1-\hat{p})^5 = (1-0.15)^5 = 0.85^5 = 0.4437$
Therefore, $(1-\hat{p})^5 = 0.4437$

Exercise 13. The shear strength of each of ten test spot welds is determined, yielding the
following data (psi):

392 376 401 367 389 362 409 415 358 375

(a) Assuming that shear strength is normally distributed, estimate the true average shear
strength and standard deviation of shear strength using the method of maximum likelihood.
(b) Again assuming a normal distribution, estimate the strength value below which 95% of
all welds will have their strengths. [Hint: What is the 95 th percentile in terms of µ and σ
? Now use the invariance principle.]
Solution :
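(a) Assuming that shear strength is normally distributed, estimate the true average shear strength and standard deviation of shear strength using the method of maximum likelihood.

Since $X \sim N(\mu, \sigma^2)$, the maximum likelihood estimator of $\mu$ is the sample mean:
$$\hat{\mu} = \bar{X} = \frac{\sum_{i=1}^{10} x_i}{10} = 384.4$$
For $X_1, \ldots, X_n \sim N(\mu, \sigma^2)$ we have
$$L(x; \sigma) = \left(2\pi\sigma^2\right)^{-n/2}\exp\left(-\frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i - \mu)^2\right)$$
$$\ln L(x; \sigma) = -\frac{n}{2}\ln\left(2\pi\sigma^2\right) - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i - \mu)^2$$
$$\frac{\partial}{\partial\sigma}\ln L(x; \sigma) = -\frac{n}{\sigma} + \frac{1}{\sigma^3}\sum_{i=1}^{n}(x_i - \mu)^2$$
• Setting $\dfrac{\partial}{\partial\sigma}\ln L(x; \sigma) = 0 \iff -\dfrac{n}{\sigma} + \dfrac{1}{\sigma^3}\sum_{i=1}^{n}(x_i - \mu)^2 = 0$
Thus, $\hat{\sigma} = \sqrt{\dfrac{1}{n}\sum_{i=1}^{n}(x_i - \hat{\mu})^2}$
Therefore, we have $\hat{\mu} = \bar{X} = 384.4$ and $\hat{\sigma} = \sqrt{\dfrac{3556.4}{10}} = \sqrt{355.64} = 18.86$

(b) Again assuming a normal distribution, estimate the strength value below which 95% of all welds will have their strengths. [Hint: What is the 95th percentile in terms of $\mu$ and $\sigma$? Now use the invariance principle.]
We need the value $c$ such that $P(X \le c) = 0.95$
Since $P\left(\dfrac{X - \mu}{\sigma} \le \dfrac{c - \mu}{\sigma}\right) = 0.95$
So, $\Phi\left(\dfrac{c - \mu}{\sigma}\right) = 0.95$, which gives $\dfrac{c - \mu}{\sigma} = 1.645$
Then, $\hat{c} = \hat{\mu} + 1.645\hat{\sigma}$ (by the invariance principle)
Therefore, the estimate of the strength is $\hat{c} = 384.4 + 1.645(18.86) = 415.42$ psi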
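The maximum likelihood estimates and the estimated 95th percentile can be recomputed directly from the ten observations. This is a sketch using the standard library's `statistics.NormalDist` for the normal quantile; note that the MLE of σ divides by n rather than n − 1.

```python
import math
from statistics import NormalDist, mean

strength = [392, 376, 401, 367, 389, 362, 409, 415, 358, 375]  # psi

n = len(strength)
mu_hat = mean(strength)
# MLE of sigma: square root of the average squared deviation (divisor n).
sigma_hat = math.sqrt(sum((x - mu_hat) ** 2 for x in strength) / n)

# Invariance principle: plug the MLEs into the 95th percentile mu + z_0.95 * sigma.
z95 = NormalDist().inv_cdf(0.95)
c_hat = mu_hat + z95 * sigma_hat

print(mu_hat, sigma_hat, z95, c_hat)
```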

Exercise 14. At time t = 0, 20 identical components are tested. The lifetime distribution
of each is exponential with parameter λ. The experimenter then leaves the test facility
unmonitored. On his return 24 hours later, the experimenter immediately terminates the test
after noticing that y = 15 of the 20 components are still in operation (so 5 have failed). Derive
the MLE of λ. [Hint: Let Y = the number that survive 24 hours. Then Y ∼ Bin(n, p). What
is the mle of p ? Now notice that p = P (Xi ≥ 24), where Xi is exponentially distributed.
This relates λ to p, so the former can be estimated once the latter has been.]
Solution :
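Let $T_i$ be the lifetime of the $i$-th component, $T_i \sim \mathrm{Exp}(\lambda)$
Derive the MLE of $\lambda$
Let $Y$ = the number of components that survive 24 hours, so $Y \sim \mathrm{Bin}(n; p)$,
where $p = P(T_i \ge 24) = e^{-24\lambda}$.
Then,
$$p(y) = C(n, y)\,p^y(1-p)^{n-y}$$
where
$$\ln L(y, p) = \ln\left(C(n, y)\,p^y(1-p)^{n-y}\right) = \ln C(n, y) + y\ln p + (n-y)\ln(1-p)$$
• Setting $\dfrac{\partial}{\partial p}\ln L(y, p) = \dfrac{y}{p} - \dfrac{n-y}{1-p} = 0$
gives $\hat{p} = \dfrac{y}{n} = \dfrac{15}{20} = 0.75$.
By the invariance principle, $\hat{p} = e^{-24\hat{\lambda}}$, so $\hat{\lambda} = -\dfrac{\ln 0.75}{24}$
Therefore, $\hat{\lambda} = -\dfrac{\ln 0.75}{24} \approx 0.0120$ per hour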
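The two-step estimate (first p, then λ through the relation p = exp(−24λ)) takes only a few lines. This sketch assumes λ is the rate of the exponential lifetime, so that the survival probability at time t is exp(−λt).

```python
import math

n, y, t = 20, 15, 24        # components tested, survivors, hours elapsed

p_hat = y / n                       # MLE of the 24-hour survival probability
lam_hat = -math.log(p_hat) / t      # invert p = exp(-lambda * t)

print(p_hat, lam_hat)
```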
Exercise 15. Let $X_1, X_2, \ldots, X_n$ be a random sample from $\mathrm{Bin}(1, p)$ (i.e., $n$ Bernoulli trials). Thus,
$$Y = \sum_{i=1}^{n} X_i \sim \mathrm{Bin}(n, p)$$

(a) Show that $\bar{X} = Y/n$ is an unbiased estimator of $p$.
(b) Show that $\mathrm{Var}(\bar{X}) = p(1-p)/n$.
(c) Show that $E[\bar{X}(1-\bar{X})/n] = (n-1)\left[p(1-p)/n^2\right]$.
(d) Find the value of $c$ so that $c\bar{X}(1-\bar{X})$ is an unbiased estimator of $\mathrm{Var}(\bar{X}) = p(1-p)/n$
Solution :
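(a) Show that $\bar{X} = Y/n$ is an unbiased estimator of $p$.
$$E(\bar{X}) = E\left(\frac{Y}{n}\right) = \frac{1}{n}\sum_{i=1}^{n}E(X_i) = np \times \frac{1}{n} = p$$
Note that $E(X_i) = p$ because each $X_i$ is Bernoulli distributed.
(b) Show that $\mathrm{Var}(\bar{X}) = \dfrac{p(1-p)}{n}$
$$V(\bar{X}) = \frac{1}{n^2}\sum_{i=1}^{n}V(X_i) = \frac{1}{n^2} \times np(1-p) = \frac{p(1-p)}{n}$$
Therefore, $\mathrm{Var}(\bar{X}) = \dfrac{p(1-p)}{n}$
(c) Show that $E[\bar{X}(1-\bar{X})/n] = (n-1)\left[p(1-p)/n^2\right]$
We have $E[\bar{X}(1-\bar{X})/n] = \dfrac{1}{n}E(\bar{X}) - \dfrac{1}{n}E(\bar{X}^2)$
From the previous parts, $E(\bar{X}) = p$ and $E(\bar{X}^2) = \mathrm{Var}(\bar{X}) + [E(\bar{X})]^2 = \dfrac{p(1-p)}{n} + p^2$
Then,
$$E[\bar{X}(1-\bar{X})/n] = \frac{p}{n} - \frac{p^2}{n} - \frac{p(1-p)}{n^2} = \frac{p(1-p)}{n} - \frac{p(1-p)}{n^2} = \frac{(n-1)p(1-p)}{n^2}$$
Therefore, $E[\bar{X}(1-\bar{X})/n] = (n-1)\left[p(1-p)/n^2\right]$
(d) Find the value of $c$
By part (c), $E[c\bar{X}(1-\bar{X})] = c\,\dfrac{(n-1)p(1-p)}{n}$. Setting this equal to $\dfrac{p(1-p)}{n}$ gives $c = \dfrac{1}{n-1}$
Therefore, $c = \dfrac{1}{n-1}$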
Exercise 16. Assume that the number of defects in a car has a Poisson distribution with
parameter λ. To estimate λ we obtain the random sample X1 , X2 , . . . , Xn .
(a) Find the Fisher information in a single observation using two methods.
(b) Find the Cramer-Rao lower bound for the variance of an unbiased estimator of λ.
(c) Find the MLE of λ and show that the MLE is an efficient estimator.
Solution :
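(a) Find the Fisher information in a single observation using two methods.

• First method.
$$I(\lambda) = -E\left[\frac{\partial^2}{\partial\lambda^2}\ln f(X; \lambda)\right], \qquad f(x; \lambda) = \frac{e^{-\lambda}\lambda^x}{x!}$$
Then $\ln f(x; \lambda) = -\lambda + x\ln\lambda - \ln x! \implies \dfrac{\partial^2}{\partial\lambda^2}\ln f(x; \lambda) = -\dfrac{x}{\lambda^2}$
So, $I(\lambda) = \dfrac{1}{\lambda^2}E(X) = \dfrac{1}{\lambda}$
Therefore, $I(\lambda) = \dfrac{1}{\lambda}$
• Second method
$$I(\lambda) = V\left(\frac{\partial}{\partial\lambda}\ln f(X; \lambda)\right) = V\left(-1 + \frac{X}{\lambda}\right) = \frac{V(X)}{\lambda^2} = \frac{1}{\lambda}$$
Therefore, $I(\lambda) = \dfrac{1}{\lambda}$
(b) Find the Cramér-Rao lower bound for the variance of an unbiased estimator of $\lambda$.
The lower bound is $\dfrac{1}{nI(\lambda)} = \dfrac{\lambda}{n}$
(c) Find the MLE of $\lambda$ and show that the MLE is an efficient estimator.
Setting $\dfrac{\partial}{\partial\lambda}\sum_{i=1}^{n}\ln f(x_i; \lambda) = -n + \dfrac{\sum_{i=1}^{n} x_i}{\lambda} = 0$ gives $\hat{\lambda} = \bar{X}$. (The method of moments, which equates the first sample moment $\bar{X}$ with the first population moment $E(X) = \lambda$, gives the same estimator.)
We have $E(\bar{X}) = \lambda$, so $\hat{\lambda}$ is unbiased, and $V(\bar{X}) = \dfrac{1}{n^2}\sum_{i=1}^{n}V(X_i) = \dfrac{\lambda}{n}$, which is equal to the Cramér-Rao lower bound.
Thus, it is an efficient estimator.
Therefore, the efficient estimator is $\hat{\lambda} = \bar{X}$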
Exercise 17. Suppose the waiting time for a bus is uniformly distributed on [0, θ] and the
results x1 , . . . , xn of a random sample from this distribution have been observed.
(a) Find the MLE θ̂ of θ.
(b) Letting $\tilde{\theta} = \dfrac{n+1}{n}\hat{\theta}$, show that $\tilde{\theta}$ is unbiased and find its variance.
(c) Find the Cramer-Rao lower bound for the variance of an unbiased estimator of θ.
Solution :
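(a) Find the MLE $\hat{\theta}$ of $\theta$
We have $X_1, \ldots, X_n \sim U[0, \theta]$ and the likelihood function $L(x; \theta) = \dfrac{1}{\theta^n}$ for $0 \le x_i \le \theta$ (and 0 otherwise).
The likelihood decreases as $\theta$ increases, so it is maximized by the smallest $\theta$ that satisfies $x_i \le \theta$ for all $i$, that is, $\hat{\theta} = \max(x_i)$.

Therefore, the MLE is $\hat{\theta} = \max(x_i)$

(b) Letting $\tilde{\theta} = \dfrac{n+1}{n}\hat{\theta}$, show that $\tilde{\theta}$ is unbiased and find its variance.
• Find the cdf of $\hat{\theta}$.
We have $P(\hat{\theta} \le y) = P(\max(X_i) \le y)$ and, since $X_1, \ldots, X_n$ are independent, $P(\hat{\theta} \le y) = [P(X \le y)]^n$
Since $X$ is uniformly distributed, $P(X \le y) = \dfrac{y}{\theta}$ for $0 \le y \le \theta$
Therefore, $P(\hat{\theta} \le y) = \left(\dfrac{y}{\theta}\right)^n$
so its pdf is given by
$$f_{\hat{\theta}}(y) = \begin{cases} \dfrac{n y^{n-1}}{\theta^n}, & 0 < y < \theta \\ 0, & \text{otherwise.} \end{cases}$$
Since $E(\hat{\theta}) = \displaystyle\int_0^{\theta}\frac{n y^n}{\theta^n}\,dy = \frac{n}{n+1}\theta$
Then, $E(\tilde{\theta}) = \dfrac{n+1}{n}E(\hat{\theta}) = \theta$
Therefore, $\tilde{\theta} = \dfrac{n+1}{n}\hat{\theta}$ is an unbiased estimator of $\theta$.
• Find its variance.
$$V(\tilde{\theta}) = V\left(\frac{n+1}{n}\hat{\theta}\right) = \frac{(n+1)^2}{n^2}V(\hat{\theta})$$
$$V(\hat{\theta}) = E(\hat{\theta}^2) - [E(\hat{\theta})]^2$$
and $E(\hat{\theta}^2) = \displaystyle\int_0^{\theta}\frac{n y^{n+1}}{\theta^n}\,dy = \frac{n}{n+2}\theta^2$
Thus, $V(\hat{\theta}) = \left(\dfrac{n}{n+2} - \dfrac{n^2}{(n+1)^2}\right)\theta^2 = \dfrac{n\theta^2}{(n+2)(n+1)^2}$
so, $V(\tilde{\theta}) = \dfrac{(n+1)^2}{n^2} \cdot \dfrac{n\theta^2}{(n+2)(n+1)^2} = \dfrac{\theta^2}{n(n+2)}$
(c) Find the Cramér-Rao lower bound for the variance of an unbiased estimator of $\theta$.
We have $f(x; \theta) = \dfrac{1}{\theta} \implies \ln f(x; \theta) = -\ln\theta$ for $0 \le x \le \theta$, so $\dfrac{\partial}{\partial\theta}\ln f(x; \theta) = -\dfrac{1}{\theta}$
Then, $I(\theta) = E\left[\left(\dfrac{\partial}{\partial\theta}\ln f(X; \theta)\right)^2\right] = \dfrac{1}{\theta^2}$, and the formal lower bound is $\dfrac{1}{nI(\theta)} = \dfrac{\theta^2}{n}$
However, the support $[0, \theta]$ depends on $\theta$, so the regularity conditions of the Cramér-Rao inequality are not satisfied (for instance, $-E\left[\dfrac{\partial^2}{\partial\theta^2}\ln f(X; \theta)\right] = -\dfrac{1}{\theta^2}$ does not agree with $I(\theta)$). Indeed, $V(\tilde{\theta}) = \dfrac{\theta^2}{n(n+2)} < \dfrac{\theta^2}{n}$, so the bound does not apply to this distribution.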


Exercise 18. An estimator θ̂ is said to be consistent if for any ϵ > 0, P (|θ̂ − θ| ≥ ϵ) → 0


as n → ∞. That is, θ̂ is consistent if, as the sample size gets larger, it is less and less likely
that θ̂ will be further than ϵ from the true value of θ. Show that X̄ is a consistent estimator
of µ when σ 2 < ∞ by using Chebyshev’s inequality.
Hint: (Chebyshev’s inequality) Let X be a random variable with finite expected value µ and
finite non-zero variance σ 2 . Then for any real number k > 0,
$$P(|X - \mu| \ge k\sigma) \le \frac{1}{k^2}.$$
Solution :
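We will show that for any $\epsilon > 0$, $P(|\bar{X} - \mu| \ge \epsilon) \to 0$ as $n \to \infty$.
$\bar{X}$ has mean $\mu$ and variance $\sigma'^2 = \sigma^2/n$. Applying Chebyshev's inequality to $\bar{X}$ with $k = \epsilon/\sigma'$:
$$P(|\bar{X} - \mu| \ge \epsilon) \le \frac{1}{(\epsilon/\sigma')^2} = \frac{\sigma'^2}{\epsilon^2} = \frac{\sigma^2}{n\epsilon^2}$$
Since $\sigma^2 < \infty$, the right-hand side tends to 0 as $n \to \infty$, so $P(|\bar{X} - \mu| \ge \epsilon) \to 0$.
Therefore, $\bar{X}$ is a consistent estimator of $\mu$.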


I3-TD3
(Confidence Intervals)

Exercise 1. (a) Suppose we construct a 99% confidence interval. What are we 99% confident
about?
(b) Which of the confidence intervals is wider, 90% or 99% ?
(c) In computing a confidence interval, when do you use the t-distribution and when do you
use z, with normal approximation?
(d) How does the sample size affect the width of a confidence interval?
Solution :
a) A confidence interval for a parameter is an interval of numbers within which we expect
the true value of the population parameter to be contained. When we construct a 99%
confidence interval, we are 99% confident that the true value of the parameter is in our
confidence interval.
b) The larger the confidence, the wider the interval. 99% is wider than 90%.
c) If the population standard deviation σ is not known, use the t-distribution. If the population standard deviation σ is known, use z (the normal distribution).
d) Increasing the sample size decreases the width of confidence intervals, because it decreases
the standard error.

Exercise 2. Consider the probability statement
$$P\left(-2.81 < Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} < 2.75\right) = k$$
where $\bar{X}$ is the mean of a random sample of size $n$ from an $N(\mu, \sigma^2)$ distribution with known $\sigma^2$.
(b) Use this statement to find a confidence interval for µ.
(c) What is the confidence level of this confidence interval?
(d) Find a symmetric confidence interval for µ.
Solution :
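(a) Find k.
We have
$$P\left(-2.81 \le Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \le 2.75\right) = k$$
Since $Z \sim N(0; 1)$, we have $k = \Phi(2.75) - \Phi(-2.81) = 0.99702 - 0.00248$
Then $k = 0.99454$.
(b) Use this statement to find a confidence interval for $\mu$.
We have
$$P\left(-2.81 \le \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \le 2.75\right) = k$$
$$P\left(-2.81\frac{\sigma}{\sqrt{n}} \le \bar{x} - \mu \le 2.75\frac{\sigma}{\sqrt{n}}\right) = 0.99454$$
$$P\left(\bar{x} - 2.75\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{x} + 2.81\frac{\sigma}{\sqrt{n}}\right) = 0.99454$$
Therefore, the confidence interval for $\mu$ is
$$I(\mu) = \left[\bar{x} - 2.75\frac{\sigma}{\sqrt{n}},\ \bar{x} + 2.81\frac{\sigma}{\sqrt{n}}\right]$$
(c) The confidence level of this interval is $100k\% = 99.454\%$.
(d) Symmetric confidence interval for $\mu$
By symmetry,
$$P\left(-z_{\alpha/2} \le Z \le z_{\alpha/2}\right) = k$$
then, $P\left(\bar{x} - z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}} \le \mu \le \bar{x} + z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}}\right) = k$
Therefore, the symmetric interval is $\left[\bar{x} - z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}},\ \bar{x} + z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}}\right]$; with $k = 0.99454$ we have $\alpha/2 = 0.00273$ and $z_{\alpha/2} \approx 2.78$.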

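Exercise 3. Let $X_1, \ldots, X_n$ be a random sample from an $N(\mu, \sigma^2)$ distribution, where the value of $\sigma^2$
is unknown.
(a) Construct a 100(1 − α)% confidence interval for µ when the value of $\sigma^2$ is known.
(b) Construct a 100(1 − α)% confidence interval for µ when the value of $\sigma^2$ is unknown.
Solution :
Let $X_1, \ldots, X_n$ be a random sample from $N(\mu, \sigma^2)$.
(a) Construct a (1 − α)100% CI when $\sigma^2$ is known.
We have $\bar{X} \sim N(\mu, \sigma^2/n)$. Let $Z = \dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}}$; then $Z \sim N(0, 1)$.

By symmetry,
$$P\left(-z_{\alpha/2} \le Z \le z_{\alpha/2}\right) = 1 - \alpha$$
$$P\left(\bar{x} - z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{x} + z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}\right) = 1 - \alpha$$
Therefore, the CI for $\mu$ is
$$I(\mu) = \left[\bar{x} - z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}},\ \bar{x} + z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}\right]$$

(b) Construct a (1 − α)100% CI when $\sigma^2$ is unknown.
Replacing $\sigma$ by the sample standard deviation $S$ gives
$$T = \frac{\bar{X} - \mu}{S/\sqrt{n}} \sim t(n - 1)$$
$$P\left(-t_{\alpha/2,\,n-1} \le T \le t_{\alpha/2,\,n-1}\right) = 1 - \alpha$$
$$P\left(\bar{x} - t_{\alpha/2,\,n-1}\,\frac{s}{\sqrt{n}} \le \mu \le \bar{x} + t_{\alpha/2,\,n-1}\,\frac{s}{\sqrt{n}}\right) = 1 - \alpha$$
Then a (1 − α)100% CI for $\mu$ is
$$I(\mu) = \left[\bar{x} - t_{\alpha/2,\,n-1}\,\frac{s}{\sqrt{n}},\ \bar{x} + t_{\alpha/2,\,n-1}\,\frac{s}{\sqrt{n}}\right]$$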

Exercise 4. A random sample of size 50 from a particular brand of 16-ounce tea packets
produced a mean weight of 15.65 ounces. Assume that the weights of these brands of
tea packets are normally distributed with standard deviation of 0.59 ounce. Find a 95%
confidence interval for the true mean µ.
Solution : Find a 95% confidence interval for the true mean µ.

Central Limit Theorem

Let $X_1, \ldots, X_n$ be independent discrete random variables and let $Y = X_1 + \ldots + X_n$.
Suppose that we are interested in finding $P(A) = P(l \le Y \le u)$ using the CLT (Central
Limit Theorem). We can write $P(A) = P\left(l - \frac{1}{2} \le Y \le u + \frac{1}{2}\right)$. It turns out that
this expression provides a better approximation of $P(A)$ when applying the CLT.

$1 - \alpha = 0.95 \Rightarrow \alpha = 0.05$, so
$$z_{\alpha/2} = \Phi^{-1}\left(1 - \frac{\alpha}{2}\right) = \Phi^{-1}\left(1 - \frac{0.05}{2}\right) = 1.96.$$
We have $n = 50$, $\bar{x} = 15.65$, $\sigma = 0.59$.
Thus a 95% confidence interval for the true mean µ is
$$\bar{x} \pm z_{\alpha/2}\cdot\frac{\sigma}{\sqrt{n}} = 15.65 \pm 1.96\cdot\frac{0.59}{\sqrt{50}} = 15.65 \pm 0.16 = [15.49,\ 15.81]$$

Therefore, a 95% confidence interval for the true mean µ is [15.49, 15.81].
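The known-σ interval can be evaluated directly from the formula $\bar{x} \pm z_{\alpha/2}\,\sigma/\sqrt{n}$. The following is a minimal sketch; the helper name `z_interval` is hypothetical, and `statistics.NormalDist` from the standard library supplies the normal quantile.

```python
from math import sqrt
from statistics import NormalDist


def z_interval(xbar, sigma, n, conf):
    """Two-sided confidence interval for a normal mean when sigma is known."""
    alpha = 1.0 - conf
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # critical value z_{alpha/2}
    margin = z * sigma / sqrt(n)                 # half-width of the interval
    return xbar - margin, xbar + margin


# Exercise 4 inputs: n = 50, sample mean 15.65 ounces, sigma = 0.59 ounce
low, high = z_interval(xbar=15.65, sigma=0.59, n=50, conf=0.95)
print(f"95% CI: [{low:.2f}, {high:.2f}]")
```

Because σ/√n is small here, the interval is only a few tenths of an ounce wide.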

Exercise 5. A researcher wishes to estimate within $25 the average cost of postage a com-
munity college spends in one year. If she wishes to be 90% confident, how large of a sample
will be necessary if the population standard deviation is $80.
Solution : Find the sample size.
We have $1 - \alpha = 0.90 \Rightarrow \alpha = 0.1$, so
$$z_{\alpha/2} = \Phi^{-1}\left(1 - \frac{0.1}{2}\right) = \Phi^{-1}(0.95) = 1.65.$$
With $\sigma = 80$ and $E = 25$,
$$n = \left(\frac{z_{\alpha/2}\,\sigma}{E}\right)^2 = \left(\frac{1.65 \times 80}{25}\right)^2 = 27.88 \approx 28.$$
Therefore, the sample size is n = 28

Exercise 6. A university dean wishes to estimate the average number of hours that
freshmen study each week. The standard deviation from a previous study is 2.6 hours. How
large a sample must be selected if he wants to be 99% confident of finding whether the true
mean differs from the sample mean by 0.5 hour?
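Solution : Find the sample size.
We have $1 - \alpha = 0.99 \Rightarrow \alpha = 0.01 \Rightarrow z_{\alpha/2} = \Phi^{-1}\left(1 - \frac{0.01}{2}\right) = 2.58$.
Since $\sigma = 2.6$ and $E = 0.5$,
$$n = \left(\frac{z_{\alpha/2}\,\sigma}{E}\right)^2 = \left(\frac{2.58 \times 2.6}{0.5}\right)^2 = 179.99 \approx 180.$$
Therefore, the sample size is n = 180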
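The sample-size rule $n = (z_{\alpha/2}\,\sigma/E)^2$, rounded up, used in Exercises 5 and 6 can be sketched as below. The function name `sample_size_for_mean` is an illustrative assumption, and the exact normal quantile is used instead of a rounded table value, so the unrounded result differs slightly from a hand calculation.

```python
from math import ceil
from statistics import NormalDist


def sample_size_for_mean(sigma, margin, conf):
    """Smallest n such that z_{alpha/2} * sigma / sqrt(n) <= margin."""
    alpha = 1.0 - conf
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return ceil((z * sigma / margin) ** 2)  # always round up


# Exercise 5: sigma = 80, E = 25, 90% confidence
n_postage = sample_size_for_mean(sigma=80, margin=25, conf=0.90)
# Exercise 6: sigma = 2.6, E = 0.5, 99% confidence
n_study = sample_size_for_mean(sigma=2.6, margin=0.5, conf=0.99)
print(n_postage, n_study)
```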

Exercise 7. In a large university, the following are the ages of 20 randomly chosen employees:

24 31 28 43 28 56 48 39 52 32
38 49 51 49 62 33 41 58 63 56

Assuming that the data come from a normal population, construct a 95% confidence interval
for the population mean µ of the ages of the employees of this university. Interpret your
answer.
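Solution :
Construct a 95% confidence interval with $n = 20$ and $\bar{x} = 44.05$. By symmetry, we have
$$I(\mu) = \left[\bar{x} - t_{\alpha/2,\,n-1}\,\frac{s}{\sqrt{n}},\ \bar{x} + t_{\alpha/2,\,n-1}\,\frac{s}{\sqrt{n}}\right].$$
The sample standard deviation is
$$s = \sqrt{\frac{1}{n - 1}\sum_{i=1}^{20}(x_i - \bar{x})^2} = \sqrt{\frac{2764.95}{19}} \approx 12.063.$$
Since $t_{\alpha/2,\,n-1} = t_{0.025,\,19} = 2.093$,
$$I(\mu) = \left(44.05 - 2.093 \times \frac{12.063}{\sqrt{20}},\ 44.05 + 2.093 \times \frac{12.063}{\sqrt{20}}\right) = (38.40,\ 49.70).$$
Then, I(µ) = (38.40, 49.70): we are 95% confident that the mean age of the employees of this
university lies between about 38.4 and 49.7 years.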
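The summary statistics and the t-interval for the age data can be recomputed as a check. This is a sketch only: the critical value `T_CRIT` is hard-coded from a t-table (an assumption, since the standard library has no t-distribution), and `statistics.stdev` uses the n − 1 denominator.

```python
from math import sqrt
from statistics import mean, stdev

# Ages of the 20 randomly chosen employees (Exercise 7)
ages = [24, 31, 28, 43, 28, 56, 48, 39, 52, 32,
        38, 49, 51, 49, 62, 33, 41, 58, 63, 56]

T_CRIT = 2.093  # t_{0.025, 19}, taken from a t-table (hard-coded assumption)

n = len(ages)
xbar = mean(ages)
s = stdev(ages)                 # sample standard deviation (n - 1 denominator)
margin = T_CRIT * s / sqrt(n)   # half-width of the 95% interval
ci = (xbar - margin, xbar + margin)
print(f"mean = {xbar:.2f}, s = {s:.3f}, CI = ({ci[0]:.2f}, {ci[1]:.2f})")
```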


Exercise 8. A random sample of size 26 is drawn from a population having a normal
distribution. The sample mean and the sample standard deviation from the data are given,
respectively, as x̄ = −2.22 and s = 1.67. Construct a 98% confidence interval for the
population mean µ and interpret the result.
Solution :

Exercise 9. A random sample from a normal population yields the following 25 values:

90 87 121 96 106 107 89 107 83 92
117 93 98 120 97 109 78 87 99 79
104 85 91 107 89

(a) Calculate an unbiased estimate µ̂ of the population mean.
(b) Give an approximate 99% confidence interval for the population mean.
Solution :
(a) Calculate an unbiased estimate µ̂ of the population mean.
We have
$$\hat{\mu} = \frac{1}{n}\sum_{i=1}^{n} x_i = \frac{1}{25}(90 + 87 + \ldots + 89) = \frac{2431}{25} = 97.24$$
Therefore, µ̂ = 97.24
• Find S.
By the formula
$$S^2 = \frac{1}{n - 1}\sum_{i=1}^{n}(x_i - \bar{x})^2 = \frac{1}{24}\left[(90 - 97.24)^2 + (87 - 97.24)^2 + \cdots + (89 - 97.24)^2\right] = \frac{3562.56}{24} = 148.44$$
⇒ S ≈ 12.184

(b) Give an approximate 99% confidence interval for the population mean.
The CI for the mean is
$$\bar{x} \pm t_{\alpha/2,\,n-1}\,\frac{S}{\sqrt{n}}.$$
Since $\bar{x} = 97.24$, $\alpha = 0.01 \Rightarrow t_{\alpha/2,\,n-1} = t_{0.005,\,24} = 2.797$, $s = 12.184$ and $n = 25$,
a 99% confidence interval for the population mean is
$$97.24 \pm 2.797 \times \frac{12.184}{5} = [90.42,\ 104.06]$$

Exercise 10. The following data represent the rates (micrometers per hour) at which a
razor cut made in the skin of anesthetized newts is closed by new cells.
28 20 21 39 32 23 18 31 14 23
18 22 28 24 33 12 23 21 25 25

(a) Can we say that the data are approximately normally distributed?
(b) Find a 95% confidence interval for population mean rate µ for the new cells to close a
razor cut made in the skin of anesthetized newts.
(c) Find a 99% confidence interval for µ.
(d) Is the 95% CI wider or narrower than the 99% CI? Briefly explain why.
Solution :


Exercise 11. Let $X_1, \ldots, X_n$ be a random sample from a normal distribution $N(\mu, \sigma^2)$,
where the values of $\mu$ and $\sigma^2$ are unknown.
(a) Construct a 100(1 − α)% confidence interval for σ 2 , choosing an appropriate pivot. In-
terpret its meaning.
(b) Suppose a random sample from a normal distribution gives the following summary statis-
tics: n = 21, X̄ = 44.3, and s = 3.96. Using part (a), find a 90% confidence interval for σ 2 .
Interpret its meaning.
Solution :

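Let $X_1, \ldots, X_n \sim N(\mu, \sigma^2)$, where $\sigma^2$ is unknown.
(a) We have $\dfrac{(n - 1)S^2}{\sigma^2} \sim \chi^2_{n-1}$, so the appropriate pivot is $\dfrac{(n - 1)S^2}{\sigma^2}$ ($\sigma^2$ is unknown).
We have
$$P\left(\chi^2_{1-\alpha/2,\,n-1} \le \frac{(n - 1)S^2}{\sigma^2} \le \chi^2_{\alpha/2,\,n-1}\right) = 1 - \alpha,$$
then
$$P\left(\frac{(n - 1)s^2}{\chi^2_{\alpha/2,\,n-1}} \le \sigma^2 \le \frac{(n - 1)s^2}{\chi^2_{1-\alpha/2,\,n-1}}\right) = 1 - \alpha.$$
Therefore,
$$I(\sigma^2) = \left(\frac{(n - 1)s^2}{\chi^2_{\alpha/2,\,n-1}},\ \frac{(n - 1)s^2}{\chi^2_{1-\alpha/2,\,n-1}}\right)$$

We are (1 − α)100% confident that the variance lies in this interval.

(b) Find a 90% confidence interval for $\sigma^2$.
We have $n = 21$, $\bar{x} = 44.3$, $s = 3.96$ and $\alpha = 0.1$, so $(n - 1)s^2 = 20 \times 3.96^2 = 313.63$,
$\chi^2_{0.05,\,20} = 31.410$ and $\chi^2_{0.95,\,20} = 10.851$.

Therefore, $I(\sigma^2) = \left(\dfrac{313.63}{31.410},\ \dfrac{313.63}{10.851}\right) = (9.99,\ 28.90)$

We are 90% confident that $\sigma^2$ lies in $I(\sigma^2)$.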


Exercise 12. A random sample of 20 automobiles has a pollution by-product release stan-
dard deviation of 2.3 ounces when 1 gallon of gasoline is used. Find the 90% confidence
interval of the population standard deviation. Assume the variable is normally distributed.
Solution :
Given
n = 20
s = 2.3
c = 1 − α = 90% = 0.90
Determine the critical values using table G in the row df = n − 1 = 20 − 1 = 19 and in the
columns of $\alpha/2$ and $1 - \alpha/2$:
$\chi^2_{1-0.05} = \chi^2_{0.95} = 10.117$
$\chi^2_{0.05} = 30.144$
The boundaries of the confidence interval are then:
$$\sqrt{\frac{n - 1}{\chi^2_{\alpha/2}}}\cdot s = \sqrt{\frac{20 - 1}{30.144}}\cdot 2.3 \approx 1.8260$$
$$\sqrt{\frac{n - 1}{\chi^2_{1-\alpha/2}}}\cdot s = \sqrt{\frac{20 - 1}{10.117}}\cdot 2.3 \approx 3.1520$$

Therefore
$$1.8260 < \sigma < 3.1520$$

Exercise 13. A random sample from a normal population yields the following 25 values:

90 87 121 96 106 107 89 107 83 92
117 93 98 120 97 109 78 87 99 79
104 85 91 107 89

(a) Calculate an unbiased estimate of the population variance.
(b) Give an approximate 99% confidence interval for the population variance.
(c) Interpret your results and state any assumptions you made in order to solve the problem.
Solution :
(a) Calculate an unbiased estimate of $\sigma^2$.
From the previous lesson, $s^2$ is an unbiased estimate of $\sigma^2$, where
$$s^2 = \frac{1}{n - 1}\sum_{i=1}^{n}(x_i - \bar{x})^2.$$
Using the given data ($\bar{x} = 97.24$ and $\sum (x_i - \bar{x})^2 = 3562.56$), $s^2 = \dfrac{3562.56}{24} = 148.44$.
(b) Give an approximate 99% confidence interval for the population variance.
We have
$$I(\sigma^2) = \left(\frac{(n - 1)s^2}{\chi^2_{\alpha/2,\,n-1}},\ \frac{(n - 1)s^2}{\chi^2_{1-\alpha/2,\,n-1}}\right)$$

where $\alpha = 0.01$, $n = 25$ and $s^2 = 148.44$, so that $(n - 1)s^2 = 3562.56$, $\chi^2_{0.005,\,24} = 45.559$
and $\chi^2_{0.995,\,24} = 9.886$.

Therefore, $I(\sigma^2) = (78.20,\ 360.36)$

(c) We are 99% confident that the population variance lies in $I(\sigma^2)$.
The interval assumes that the observations form a random sample from a normal population;
$s^2$ itself is an unbiased estimate of the population variance for any random sample with
finite variance.
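The unbiased variance estimate and the chi-square interval for the 25 observations can be reproduced as follows. This is a sketch under stated assumptions: the two chi-square critical values for 24 degrees of freedom are hard-coded from a table, and `statistics.variance` uses the n − 1 denominator.

```python
from statistics import variance

# The 25 observations of Exercise 13 (same data as Exercise 9)
data = [90, 87, 121, 96, 106, 107, 89, 107, 83, 92,
        117, 93, 98, 120, 97, 109, 78, 87, 99, 79,
        104, 85, 91, 107, 89]

CHI2_UPPER = 45.559  # chi-square value with upper-tail area 0.005, df = 24 (table)
CHI2_LOWER = 9.886   # chi-square value with upper-tail area 0.995, df = 24 (table)

n = len(data)
s2 = variance(data)        # unbiased estimate of sigma^2 (divides by n - 1)
ss = (n - 1) * s2          # sum of squared deviations about the mean
ci = (ss / CHI2_UPPER, ss / CHI2_LOWER)
print(f"s^2 = {s2:.2f}, 99% CI for sigma^2 = ({ci[0]:.2f}, {ci[1]:.2f})")
```

Dividing the same sum of squares by n instead of n − 1 would give 142.5024, which is the biased estimate.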

Exercise 14. In a random sample of 50 college seniors, 18 indicated that they were planning
to pursue a graduate degree. Find a 98% confidence interval for the true proportion of all
college seniors planning to pursue a graduate degree, and interpret the result, and state any
assumptions you have made.
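Solution :
Find a 98% CI for the true proportion.
We have $n = 50$ and $\hat{p} = \dfrac{18}{50} = 0.36$.
Since $n\hat{p} = 18 \ge 5$ and $n(1 - \hat{p}) = 32 \ge 5$, $\hat{p}$ is approximately normally distributed.

With $\alpha = 0.02$, $z_{\alpha/2} = z_{0.01} = 2.326$, and a 98% CI is given by
$$I(p) = \left(\hat{p} - z_{\alpha/2}\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}},\ \hat{p} + z_{\alpha/2}\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}\right) = 0.36 \pm 2.326 \times 0.0679$$
Therefore, I(p) = (0.202, 0.518)
We are 98% confident that the true proportion of all college seniors planning to pursue a
graduate degree lies in I(p) = (0.202, 0.518). This assumes a random sample that is large
enough for the normal approximation to hold.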
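The large-sample interval for a proportion, $\hat{p} \pm z_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n}$, can be evaluated with a few lines. The helper name `proportion_interval` is hypothetical; the normal quantile comes from the standard library.

```python
from math import sqrt
from statistics import NormalDist


def proportion_interval(successes, n, conf):
    """Normal-approximation (Wald) confidence interval for a proportion."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(1.0 - (1.0 - conf) / 2.0)
    margin = z * sqrt(p_hat * (1.0 - p_hat) / n)
    return p_hat - margin, p_hat + margin


# Exercise 14: 18 of 50 seniors plan to pursue a graduate degree, 98% confidence
low, high = proportion_interval(successes=18, n=50, conf=0.98)
print(f"98% CI: ({low:.3f}, {high:.3f})")
```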

Exercise 15. It is believed that slightly over 40% of Cambodians own pets. How large a
sample is necessary to estimate the true proportion within 0.02 with 90% confidence?
Solution :
Since $z_{\alpha/2} = 1.645$, $E = 0.02$, $\hat{p} = 0.4$, and $\hat{q} = 1 - 0.4 = 0.6$,
$$n = \hat{p}\hat{q}\left(\frac{z_{\alpha/2}}{E}\right)^2 = (0.4)(0.6)\left(\frac{1.645}{0.02}\right)^2 = 1623.615$$

which, when rounded up, is 1624. So, the researcher must interview 1624 people.

Exercise 16. In a random sample of 500 items from a large lot of manufactured items, there
were 40 defectives. (a) Find a 90% confidence interval for the true proportion of defectives
in the lot. (b) Is the assumption of normal approximation valid? (c) Suppose we suspect
that another lot has the same proportion of defectives as in the first lot. What should be
the sample size if we want to estimate the true proportion within 0.01 with 90% confidence?


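Solution :
(a) Find a 90% CI for the true proportion of defectives in the lot.
We have $n = 500$ and $\hat{p} = \dfrac{40}{500} = 0.08$, so $n\hat{p} = 40 \ge 5$ and $n(1 - \hat{p}) = 460 \ge 5$;
$\hat{p}$ is approximately normally distributed.
Then, a 90% CI is given by
$$I(p) = \left(\hat{p} - z_{\alpha/2}\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}},\ \hat{p} + z_{\alpha/2}\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}\right) = 0.08 \pm 1.645 \times 0.0121$$
Therefore, I(p) = (0.06, 0.10)
(b) By the previous question ($n\hat{p} \ge 5$ and $n(1 - \hat{p}) \ge 5$), the assumption is valid.
(c) Find the sample size.
We have $n = \left(\dfrac{z_{\alpha/2}}{E}\right)^2 \hat{p}(1 - \hat{p})$, where $E = 0.01$:
$$n = \left(\frac{1.645}{0.01}\right)^2 \times 0.08 \times 0.92 = 1991.6 \approx 1992$$
Therefore, n = 1992
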
Exercise 17. A study found that 73% of randomly selected prekindergarten children ages
3 to 5 whose mothers had a bachelor’s degree or higher were enrolled in center-based early
childhood care and education programs. How large a sample is needed to estimate the true
proportion within 3 percentage points with 95% confidence? How large a sample is needed
if you had no prior knowledge of the proportion?
Solution :
Given:
p̂ = 73% = 0.73
c = 95%
E = 3% = 0.03
Sample size formulas:
p̂ known: $n = \dfrac{z_{\alpha/2}^2\,\hat{p}\hat{q}}{E^2} = \dfrac{z_{\alpha/2}^2\,\hat{p}(1 - \hat{p})}{E^2}$
p̂ unknown: $n = \dfrac{z_{\alpha/2}^2\,(0.25)}{E^2}$
For confidence level 1 − α = 0.95
• Determine $z_{\alpha/2} = z_{0.025}$ using table E (look up 0.025 in the table, the z-score is then the
found z-score with opposite sign)
$z_{\alpha/2} = 1.96$
If p̂ is known, then the sample size is (round up!)
$$n = \frac{z_{\alpha/2}^2\,\hat{p}(1 - \hat{p})}{E^2} = \frac{1.96^2 \times 0.73(1 - 0.73)}{0.03^2} \approx 842$$
If p̂ is unknown, then the sample size is (round up!)
$$n = \frac{z_{\alpha/2}^2\,(0.25)}{E^2} = \frac{1.96^2 \times (0.25)}{0.03^2} \approx 1068$$
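Both sample-size formulas for a proportion can be wrapped in one small function. This is a sketch: the name `sample_size_for_proportion` and the default `p_guess=0.5` (the conservative choice when no prior estimate is available, giving p(1 − p) = 0.25) are illustrative assumptions.

```python
from math import ceil
from statistics import NormalDist


def sample_size_for_proportion(margin, conf, p_guess=0.5):
    """Smallest n with z_{alpha/2} * sqrt(p(1-p)/n) <= margin; p_guess=0.5 is the worst case."""
    z = NormalDist().inv_cdf(1.0 - (1.0 - conf) / 2.0)
    return ceil(z ** 2 * p_guess * (1.0 - p_guess) / margin ** 2)


# Exercise 17: within 3 percentage points with 95% confidence
n_known = sample_size_for_proportion(margin=0.03, conf=0.95, p_guess=0.73)
n_unknown = sample_size_for_proportion(margin=0.03, conf=0.95)
# Exercise 15: within 0.02 with 90% confidence, prior estimate 0.40
n_pets = sample_size_for_proportion(margin=0.02, conf=0.90, p_guess=0.40)
print(n_known, n_unknown, n_pets)
```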

Exercise 18. Let $X_1, X_2, \ldots, X_n$ be a random sample from a log-normal distribution
with parameters $(\log m, \sigma^2)$, where $m$ and $\sigma^2$ are parameters.
(a) Find the MLE $\hat{m}$ for $m$. Is $\hat{m}$ efficient?
(b) Construct a 95% CI for $m$ when $\sigma = 1$ and $\sum_{i=1}^{25} \ln x_i = 54.95$.

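Solution :
Write $\mu = \ln m$, so that $\ln X_i \sim N(\mu, \sigma^2)$.
(a) Find the MLE $\hat{\mu}$ for $\mu$.
The log-normal density is
$$f(x; \mu, \sigma^2) = \frac{1}{\sqrt{2\pi}\,\sigma x}\, e^{-\frac{1}{2\sigma^2}(\ln x - \mu)^2}, \quad x > 0.$$
• Likelihood function:
$$L(x; \mu, \sigma^2) = \prod_{i=1}^{n} \frac{1}{(2\pi\sigma^2)^{1/2}}\, x_i^{-1}\, e^{-\frac{1}{2\sigma^2}(\ln x_i - \mu)^2} = (2\pi\sigma^2)^{-n/2} \exp\left[-\frac{1}{2\sigma^2}\sum_{i=1}^{n}(\ln x_i - \mu)^2\right] \prod_{i=1}^{n} x_i^{-1}$$
• Log-likelihood $\ell = \ln L$:
$$\ell = -\frac{n}{2}\ln(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(\ln x_i - \mu)^2 - \sum_{i=1}^{n}\ln x_i$$
$$= -\sum_{i=1}^{n}\ln x_i - \frac{n}{2}\ln(2\pi) - n\ln(\sigma) - \frac{\sum_{i=1}^{n}(\ln x_i)^2}{2\sigma^2} + \frac{\mu\sum_{i=1}^{n}\ln x_i}{\sigma^2} - \frac{n\mu^2}{2\sigma^2}$$
• First derivative:
$$\frac{d\ell}{d\mu} = \frac{\sum_{i=1}^{n}\ln x_i}{\sigma^2} - \frac{2n\mu}{2\sigma^2}$$
• Second derivative:
$$\frac{d^2\ell}{d\mu^2} = -\frac{n}{\sigma^2} < 0, \quad \sigma > 0$$
Setting the first derivative to zero,
$$\frac{d\ell}{d\mu} = 0 \iff \frac{\sum_{i=1}^{n}\ln x_i}{\sigma^2} = \frac{n\mu}{\sigma^2} \implies \mu = \frac{1}{n}\sum_{i=1}^{n}\ln x_i$$

Therefore, the MLE $\hat{\mu}$ of $\mu$ is $\hat{\mu} = \dfrac{1}{n}\sum_{i=1}^{n}\ln X_i$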
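(b) Show that the MLE is the MVUE of $\mu$.
We have
$$E(\hat{\mu}) = E\left(\frac{1}{n}\sum_{i=1}^{n}\ln X_i\right) = \frac{1}{n}\sum_{i=1}^{n}E(\ln X_i) = E(\ln X),$$
with
$$E(\ln X) = \int_{0}^{+\infty} \ln x\,\frac{1}{\sigma x\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{\ln x - \mu}{\sigma}\right)^2} dx.$$
Let $u = \dfrac{\ln x - \mu}{\sigma}$, so that $du = \dfrac{1}{\sigma x}\,dx$. Then
$$E(\hat{\mu}) = \int_{-\infty}^{+\infty} \frac{u\sigma + \mu}{\sqrt{2\pi}}\, e^{-\frac{1}{2}u^2} du = \frac{\sigma}{\sqrt{2\pi}}\int_{-\infty}^{+\infty} u\, e^{-\frac{1}{2}u^2} du + \mu\int_{-\infty}^{+\infty} \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}u^2} du = 0 + \mu = \mu,$$
where the first integral vanishes because
$$\int_{-\infty}^{+\infty} u\, e^{-\frac{1}{2}u^2} du = \left[-e^{-\frac{1}{2}u^2}\right]_{-\infty}^{+\infty} = 0.$$
Moreover,
$$V(\hat{\mu}) = V\left(\frac{\sum_{i=1}^{n}\ln X_i}{n}\right) = \frac{1}{n^2}\left[V(\ln X_1) + \ldots + V(\ln X_n)\right] = \frac{1}{n}V(\ln X) = \frac{1}{n}\left[E\left((\ln X)^2\right) - E^2(\ln X)\right]$$
$$E\left((\ln X)^2\right) = \int_{0}^{+\infty} (\ln x)^2\,\frac{1}{\sqrt{2\pi}\,\sigma x}\, e^{-\frac{1}{2}\left(\frac{\ln x - \mu}{\sigma}\right)^2} dx = \mu^2 + \sigma^2$$
$$\implies V(\hat{\mu}) = \frac{1}{n}\left(\mu^2 + \sigma^2 - \mu^2\right) = \frac{\sigma^2}{n}$$
With $\ell = \ln L$ the log-likelihood, we have
$$E\left(\frac{d^2\ell}{d\mu^2}\right) = E\left(-\frac{n}{\sigma^2}\right) = -\frac{n}{\sigma^2},$$
so
$$V(\hat{\mu}) = -\frac{1}{E\left(\frac{d^2\ell}{d\mu^2}\right)} \implies \hat{\mu} \text{ is an efficient estimator of } \mu.$$
Since every efficient estimator of $\mu$ is the MVUE of $\mu$, $\hat{\mu}$ is the MVUE of $\mu$.
Therefore, the MLE is the MVUE of $\mu$.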
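(c) Construct a 95% confidence interval for $\mu$.
By the formula
$$CI(\mu) = \left[\bar{x} - z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}},\ \bar{x} + z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}\right]$$
We have $1 - \alpha = 0.95 \Rightarrow \alpha = 0.05 \Rightarrow \dfrac{\alpha}{2} = 0.025$
$$\Rightarrow z_{\alpha/2} = \Phi^{-1}\left(1 - \frac{\alpha}{2}\right) = \Phi^{-1}(1 - 0.025) = \Phi^{-1}(0.975) = 1.96$$
For $\sigma = 1$, $X \sim \text{Log-}N(\mu, \sigma^2) \Rightarrow \ln X \sim N(\mu, \sigma^2)$, and here $\bar{x}$ denotes the mean of the logarithms:
$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n}\ln x_i = \frac{1}{25}\sum_{i=1}^{25}\ln x_i = \frac{54.95}{25} = 2.198$$
$$\Rightarrow CI(\mu) = \left[\bar{x} - z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}},\ \bar{x} + z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}\right] = [2.198 - 0.392,\ 2.198 + 0.392] = [1.806,\ 2.590]$$
Therefore, CI(µ) = [1.806, 2.590]. Since $\mu = \ln m$, the corresponding interval for $m$ is
$[e^{1.806}, e^{2.590}] \approx [6.09,\ 13.33]$.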


Exercise 19. Let $X_1, X_2, \ldots, X_n$ be a random sample from a population X with pdf

$$f(x; \theta) = \begin{cases} \dfrac{2x}{\theta^2} & \text{if } 0 \le x \le \theta \\ 0 & \text{otherwise} \end{cases}$$

where θ > 0 is an unknown parameter.
(a) Determine an estimator $T_n$ for θ, using the moment method. Is $T_n$ efficient?
(b) Find an unbiased estimator $\hat{\theta}_n$ for θ, from the MLE for θ. Which estimator between $T_n$
and $\hat{\theta}_n$ is more efficient?
(c) Find a 95% CI for θ when $\max(x_1, \ldots, x_{20}) = 5$.
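Solution :
(a) Determine $T_n$, an estimator of θ, using the moment method.
The moment method equates the population mean to the sample mean:
$$E(X) = \frac{1}{n}\sum_{i=1}^{n} X_i = \bar{X} \quad (1)$$
and we have
$$E(X) = \int_{-\infty}^{+\infty} x f(x)\,dx = \int_{0}^{\theta} x\cdot\frac{2x}{\theta^2}\,dx = \frac{2}{3}\theta \quad (2)$$
By (1) and (2), we get
$$T_n = \frac{3\bar{X}}{2}$$
• Is $T_n$ efficient?
$$E(T_n) = E\left(\frac{3\bar{X}}{2}\right) = \frac{3}{2n}\sum_{i=1}^{n}E(X_i) = \theta$$
Therefore, it is an unbiased estimator.
Note that if $T_n$ is efficient, then the variance of $T_n$ must be equal to $\dfrac{1}{nI(\theta)}$.
$$V(T_n) = V\left(\frac{3\bar{X}}{2}\right) = \frac{9}{4n^2}\sum_{i=1}^{n}V(X_i)$$
For each $i$, we have $V(X_i) = V(X)$, where
$$V(X) = E(X^2) - E^2(X).$$
We have $E(X) = \dfrac{2\theta}{3}$, so $E^2(X) = \dfrac{4\theta^2}{9}$, and by definition
$$E(X^2) = \int_{0}^{\theta} \frac{2x^3}{\theta^2}\,dx = \frac{\theta^2}{2}.$$
Thus
$$V(X) = \left(\frac{1}{2} - \frac{4}{9}\right)\theta^2 = \frac{\theta^2}{18}, \quad\text{so}\quad V(T_n) = \frac{9}{4n}\cdot\frac{\theta^2}{18} = \frac{\theta^2}{8n}.$$
We have
$$I(\theta) = -E\left[\frac{\partial^2}{\partial\theta^2}\ln f(X_i, \theta)\right]$$
where
$$\ln f(x, \theta) = \ln(2x) - 2\ln(\theta)$$
and
$$\frac{\partial^2}{\partial\theta^2}\ln f(x_i, \theta) = \frac{2}{\theta^2}, \quad\text{so}\quad I(\theta) = -\frac{2}{\theta^2}.$$
Therefore $\dfrac{1}{nI(\theta)} = -\dfrac{\theta^2}{2n}$, which is different from the variance of $T_n$. (The negative value
arises because the support $[0, \theta]$ depends on θ, so the usual regularity conditions behind
the Cramér-Rao bound do not hold.)
Conclusion: $T_n$ is not efficient.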

Exercise 20. Let $X_1, X_2, \ldots, X_n$ be a random sample from a population X with pdf

$$f(x; \theta) = \begin{cases} \dfrac{x_0^{1/\theta}}{\theta\, x^{1 + \frac{1}{\theta}}} & \text{if } x > x_0 \\ 0 & \text{if } x \le x_0 \end{cases}$$

where θ > 0 is an unknown parameter and $x_0 > 0$.
(a) Find the MLE $\hat{\theta}_n$ for θ. Is $\hat{\theta}_n$ efficient?
(b) Find a 95% CI for θ when $\prod_{i=1}^{14} x_i = 2565^{14}$ and $x_0 = 1900$.

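Solution :
(a) Find the MLE $\hat{\theta}_n$ for θ.
We have, for $x_i \ge x_0$,
$$L(x, \theta) = \prod_{i=1}^{n} f(x_i; \theta) = \prod_{i=1}^{n} \frac{x_0^{1/\theta}}{\theta\, x_i^{1 + \frac{1}{\theta}}} = \frac{x_0^{n/\theta}}{\theta^n}\left(\frac{1}{\prod_{i=1}^{n} x_i}\right)^{1 + \frac{1}{\theta}}$$
Then,
$$\ln L(x; \theta) = \frac{n}{\theta}\ln x_0 - n\ln(\theta) - \left(1 + \frac{1}{\theta}\right)\sum_{i=1}^{n}\ln x_i$$
$$\frac{\partial}{\partial\theta}\ln L(x; \theta) = -\frac{n}{\theta^2}\ln x_0 - \frac{n}{\theta} + \frac{1}{\theta^2}\sum_{i=1}^{n}\ln x_i$$
Setting $\dfrac{\partial}{\partial\theta}\ln L(x; \theta) = 0$ gives
$$-\frac{n}{\theta^2}\ln x_0 - \frac{n}{\theta} + \frac{1}{\theta^2}\sum_{i=1}^{n}\ln x_i = 0$$
Thus,
$$\theta = \frac{1}{n}\sum_{i=1}^{n}\ln x_i - \ln x_0$$

Therefore, $\hat{\theta}_n = \dfrac{1}{n}\sum_{i=1}^{n}\ln X_i - \ln x_0$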

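Is $\hat{\theta}_n$ efficient?
$$E(\ln X) = \int_{x_0}^{+\infty} \ln x\, f(x; \theta)\,dx = \frac{x_0^{1/\theta}}{\theta}\int_{x_0}^{+\infty} \frac{\ln x}{x^{1 + \frac{1}{\theta}}}\,dx$$
Let $u = \ln x$. Then,
$$E(\ln X) = \frac{x_0^{1/\theta}}{\theta}\int_{\ln x_0}^{+\infty} u\, e^{-u/\theta}\,du = \frac{x_0^{1/\theta}}{\theta}\left(\theta x_0^{-1/\theta}\ln x_0 + \theta^2 x_0^{-1/\theta}\right) = \ln x_0 + \theta$$
Then $E(\hat{\theta}_n) = \theta$. Therefore, it is an unbiased estimator.
$$V(\hat{\theta}_n) = V\left(\frac{1}{n}\sum_{i=1}^{n}\ln X_i - \ln x_0\right) = \frac{1}{n^2}\sum_{i=1}^{n}V(\ln X_i) = \frac{1}{n}V(\ln X)$$
where
$$V(\ln X) = E\left(\ln^2 X\right) - E^2(\ln X)$$
and, by the change of variable $u = \ln x$,
$$E\left(\ln^2 X\right) = \frac{x_0^{1/\theta}}{\theta}\int_{x_0}^{+\infty} \frac{\ln^2 x}{x^{1 + \frac{1}{\theta}}}\,dx = \frac{x_0^{1/\theta}}{\theta}\int_{\ln x_0}^{+\infty} u^2 e^{-u/\theta}\,du = \ln^2 x_0 + 2\theta\ln x_0 + 2\theta^2$$
So
$$V(\ln X) = \ln^2 x_0 + 2\theta\ln x_0 + 2\theta^2 - (\ln x_0 + \theta)^2 = \theta^2$$
Therefore, $V(\hat{\theta}_n) = \dfrac{\theta^2}{n}$
Fisher information:
$$I(\theta) = -E\left[\frac{\partial^2}{\partial\theta^2}\ln f(X; \theta)\right] = -E\left[\frac{\partial^2}{\partial\theta^2}\ln\left(\frac{x_0^{1/\theta}}{\theta X^{1 + \frac{1}{\theta}}}\right)\right]$$
where
$$\frac{\partial}{\partial\theta}\left[\ln x_0^{1/\theta} - \ln\theta - \ln x^{1 + \frac{1}{\theta}}\right] = -\frac{\ln x_0}{\theta^2} - \frac{1}{\theta} + \frac{1}{\theta^2}\ln x$$
so
$$\frac{\partial^2}{\partial\theta^2}\ln f(x; \theta) = \frac{2\ln x_0}{\theta^3} + \frac{1}{\theta^2} - \frac{2\ln x}{\theta^3}$$
Then
$$E\left[\frac{2\ln x_0}{\theta^3} + \frac{1}{\theta^2} - \frac{2\ln X}{\theta^3}\right] = \frac{2\ln x_0}{\theta^3} + \frac{1}{\theta^2} - \frac{2}{\theta^3}E(\ln X) = \frac{2\ln x_0}{\theta^3} + \frac{1}{\theta^2} - \frac{2(\ln x_0 + \theta)}{\theta^3} = -\frac{1}{\theta^2}$$
Therefore, $I(\theta) = \dfrac{1}{\theta^2}$
Conclusion: since $V(\hat{\theta}_n) = \dfrac{\theta^2}{n} = \dfrac{1}{nI(\theta)}$, $\hat{\theta}_n$ is efficient.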
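(b) Find a 95% CI for θ when $\prod_{i=1}^{14} x_i = 2565^{14}$ and $x_0 = 1900$.

We have to find $a$ and $b$ such that $P(a \le g(X; \theta) \le b) = 1 - \alpha$, with $\alpha = 0.05$, where $g$ is a pivot.

We have $f(x; \theta) = \dfrac{x_0^{1/\theta}}{\theta\, x^{1 + \frac{1}{\theta}}}$ if $x \ge x_0$, so
$$F_X(x) = 1 - \left(\frac{x}{x_0}\right)^{-\frac{1}{\theta}}.$$
Let $U = \ln\left(\dfrac{X}{x_0}\right)$, where $U > 0$.
Then $U \sim \text{Exp}(\theta)$, with $f_U(u) = \dfrac{1}{\theta}e^{-u/\theta}$.
Let $Y_i = \dfrac{2U_i}{\theta}$. A change of variable in the pdf gives $f_Y(y) = \dfrac{1}{2}e^{-y/2}$,
so $Y_i \sim \chi^2_2$ and hence $Y = \sum_{i=1}^{n} Y_i \sim \chi^2_{2n}$.
Therefore, by symmetry, with $\chi^2_{2n}(p)$ denoting the $p$-quantile of $\chi^2_{2n}$, we obtain
$$P\left(\chi^2_{2n}\left(\frac{\alpha}{2}\right) \le \frac{2}{\theta}\sum_{i=1}^{n} u_i \le \chi^2_{2n}\left(1 - \frac{\alpha}{2}\right)\right) = 0.95$$
$$P\left(\frac{2\ln\prod_{i=1}^{n} x_i - 2n\ln x_0}{\chi^2_{28}(0.975)} \le \theta \le \frac{2\ln\prod_{i=1}^{n} x_i - 2n\ln x_0}{\chi^2_{28}(0.025)}\right) = 0.95$$
Using the given data ($n = 14$, $\chi^2_{28}(0.025) = 15.308$, $\chi^2_{28}(0.975) = 44.461$), we obtain
P(0.189 ≤ θ ≤ 0.549) = 0.95
Therefore, $CI_{0.95}(\theta) = (0.189,\ 0.549)$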
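The pivot-based interval for θ can be evaluated numerically. This sketch rests on two labeled assumptions: the product condition is read as $\prod x_i = 2565^{14}$ with n = 14, so that $\sum \ln(x_i/x_0) = 14\ln(2565/1900)$, and the two chi-square quantiles for 28 degrees of freedom are hard-coded from a table.

```python
from math import log

n = 14        # sample size (assumed from the 28 degrees of freedom used above)
x0 = 1900.0   # known lower bound of the support

# Assumption: prod(x_i) = 2565**14, hence sum(ln(x_i / x0)) = n * ln(2565 / x0)
sum_log_ratio = n * log(2565.0 / x0)

CHI2_LOW = 15.308   # 0.025-quantile of chi-square with 2n = 28 df (table value)
CHI2_HIGH = 44.461  # 0.975-quantile of chi-square with 2n = 28 df (table value)

# Pivot: (2 / theta) * sum(ln(x_i / x0)) ~ chi-square(2n); invert for theta
ci = (2.0 * sum_log_ratio / CHI2_HIGH, 2.0 * sum_log_ratio / CHI2_LOW)
print(f"95% CI for theta: ({ci[0]:.3f}, {ci[1]:.3f})")
```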
Exercise 21. Let $X_1, X_2, \ldots, X_n$ be a random sample from a population X with pdf

$$f(x; \theta) = \begin{cases} \dfrac{1}{2\theta\sqrt{x}}\, e^{-\frac{\sqrt{x}}{\theta}} & \text{if } x > 0 \\ 0 & \text{otherwise} \end{cases}$$

where θ > 0 is an unknown parameter.
(a) Find the MLE $\hat{\theta}_n$ for θ. Is $\hat{\theta}_n$ efficient?
(b) Find a 90% CI for θ when $\sum_{i=1}^{20} \sqrt{x_i} = 47.4$.
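Solution :
(a) Find the MLE of θ.
• Likelihood:
$$L(\theta) = \prod_{i=1}^{n} f(x_i; \theta) = \prod_{i=1}^{n} \frac{1}{2\theta\sqrt{x_i}}\, e^{-\frac{\sqrt{x_i}}{\theta}} = (2\theta)^{-n}\,\frac{e^{-\frac{1}{\theta}\sum_{i=1}^{n}\sqrt{x_i}}}{\prod_{i=1}^{n}\sqrt{x_i}}$$
$$\Rightarrow \ln L(\theta) = -n\ln(2\theta) - \frac{1}{\theta}\sum_{i=1}^{n}\sqrt{x_i} - \ln\left(\prod_{i=1}^{n}\sqrt{x_i}\right)$$
• $\dfrac{\partial}{\partial\theta}\ln L(\theta) = -\dfrac{n}{\theta} + \dfrac{1}{\theta^2}\sum_{i=1}^{n}\sqrt{x_i}$
• $\dfrac{\partial}{\partial\theta}\ln L(\theta) = 0 \Rightarrow \dfrac{n}{\theta} = \dfrac{1}{\theta^2}\sum_{i=1}^{n}\sqrt{x_i} \Rightarrow \theta = \dfrac{1}{n}\sum_{i=1}^{n}\sqrt{x_i} = \bar{y}$, where $y_i = \sqrt{x_i}$.
Therefore, the MLE of θ is $\hat{\theta}_n = \dfrac{1}{n}\sum_{i=1}^{n}\sqrt{X_i} = \bar{Y}$.
Since $Y = \sqrt{X}$ has pdf $\dfrac{1}{\theta}e^{-y/\theta}$ for $y > 0$, we have $E(Y) = \theta$ and $V(Y) = \theta^2$. Hence
$$E(\hat{\theta}_n) = E\left(\frac{1}{n}\sum_{i=1}^{n} Y_i\right) = \frac{1}{n}\sum_{i=1}^{n}E(Y_i) = \frac{n\theta}{n} = \theta \quad (1)$$
$$V(\hat{\theta}_n) = V\left(\frac{1}{n}\sum_{i=1}^{n} Y_i\right) = \frac{1}{n^2}\sum_{i=1}^{n}V(Y_i) = \frac{n\theta^2}{n^2} = \frac{\theta^2}{n}$$
Since $X(\Omega) = (0, \infty)$ does not depend on θ,
$$I(\theta) = V\left(\frac{\partial}{\partial\theta}\ln f(X; \theta)\right) = V\left(\frac{\partial}{\partial\theta}\left[-\ln(2\theta) - \ln\sqrt{X} - \frac{\sqrt{X}}{\theta}\right]\right) = V\left(-\frac{1}{\theta} + \frac{\sqrt{X}}{\theta^2}\right) = \frac{1}{\theta^4}V(Y) = \frac{\theta^2}{\theta^4} = \frac{1}{\theta^2}$$
$$\Rightarrow \frac{1}{nI(\theta)} = \frac{\theta^2}{n} = V(\hat{\theta}_n) \quad (2)$$

From (1) and (2), $\hat{\theta}_n$ is an efficient estimator of θ.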


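(b) Find a 90% CI for θ when $\sum_{i=1}^{15}\sqrt{x_i} = 47.4$.

Since $\dfrac{2\sqrt{X_i}}{\theta} \sim \chi^2_2$, the pivot $U = \dfrac{2}{\theta}\sum_{i=1}^{n}\sqrt{X_i}$ follows $\chi^2_{2n} = \chi^2_{30}$ for $n = 15$. By symmetry, we get
$$0.9 = P\left(\chi^2_{0.95,\,30} \le U \le \chi^2_{0.05,\,30}\right) = P\left(18.493 \le \frac{2}{\theta}\sum_{i=1}^{15}\sqrt{x_i} \le 43.773\right) = P\left(\frac{2\sum_{i=1}^{15}\sqrt{x_i}}{43.773} \le \theta \le \frac{2\sum_{i=1}^{15}\sqrt{x_i}}{18.493}\right)$$

Therefore, a 90% CI for θ is
$$I(\theta) = \left[\frac{2 \times 47.4}{43.773},\ \frac{2 \times 47.4}{18.493}\right] = [2.166,\ 5.126]$$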


TD4 - (Hypotheses Testing)

Exercise 1. Let $X_1, X_2, \ldots, X_{20}$ be a random sample from a distribution with probability
mass function

$$f(x; p) = \begin{cases} p^x(1 - p)^{1-x}, & \text{if } x = 0, 1 \\ 0 & \text{otherwise} \end{cases}$$

where $0 < p \le \frac{1}{2}$ is a parameter. The hypothesis $H_o: p = \frac{1}{2}$ is to be tested against $H_a: p < \frac{1}{2}$.
If $H_o$ is rejected when $\sum_{i=1}^{20} X_i \le 6$, then what is the probability of type I error?

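Solution : What is the probability of type I error?

We have $X_1, \ldots, X_{20} \overset{\text{i.i.d.}}{\sim} \text{Ber}(p)$, where $0 < p \le \frac{1}{2}$,
and the hypotheses $H_0: p = \frac{1}{2}$ vs $H_a: p < \frac{1}{2}$, with rejection region
$$RR = \left\{(x_1, \ldots, x_{20}) : \sum_{i=1}^{20} x_i \le 6\right\}$$

Then, the probability of type I error is given by α, where:

α = P( type I error ) = P( reject $H_0$ | $H_0$ is true ) $= P\left(\sum_{i=1}^{20} X_i \le 6 \,\middle|\, p = \frac{1}{2}\right)$

Since $X_i \sim \text{Ber}\left(\frac{1}{2}\right) \implies \sum_{i=1}^{20} X_i \sim \text{Bin}\left(20, \frac{1}{2}\right)$,
$$\alpha = \sum_{k=0}^{6}\binom{20}{k}p^k(1 - p)^{20-k} = 0.058 \quad\text{with } p = \tfrac{1}{2}$$

Therefore, α = 0.058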
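The binomial tail probability can be summed exactly. The sketch below uses only `math.comb`; the helper name `binom_cdf` is illustrative and not from the original solution.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Bin(n, p), summed term by term."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k + 1))


# Type I error: reject H0 when the sum of 20 Bernoulli(1/2) variables is at most 6
alpha = binom_cdf(6, 20, 0.5)
print(f"alpha = {alpha:.4f}")
```

Note that the sum starts at k = 0, since zero successes also leads to rejection.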

Exercise 2. Let p represent the proportion of defectives in a manufacturing process. To
test $H_o: p \le \frac{1}{4}$ versus $H_a: p > \frac{1}{4}$, a random sample of size 5 is taken from the process.
If the number of defectives is 4 or more, the null hypothesis is rejected. What is the
probability of rejecting $H_o$ if $p = \frac{1}{5}$?
Solution : What is the probability of rejecting $H_0$ if $p = \frac{1}{5}$?
Let X be the number of defectives.
The hypotheses are $H_0: p \le \frac{1}{4}$ vs $H_a: p > \frac{1}{4}$, with

RR = {x : x ≥ 4}

Since $p = \frac{1}{5} \le \frac{1}{4}$, $H_0$ is true, so rejecting $H_0$ is a type I error, with probability α where:

α = P( type I error ) = P( reject $H_0$ | $p = \frac{1}{5}$ ) $= P(X \ge 4)$, since $X \sim \text{Bin}\left(5, \frac{1}{5}\right)$,
$$= 1 - \sum_{k=0}^{3}\binom{5}{k}p^k(1 - p)^{5-k} = 1 - 0.99328 = 0.00672 \approx 0.007$$

Therefore, the probability of rejecting $H_0$ (a type I error) is about 0.007

Exercise 3. A random sample of size 4 is taken from a normal distribution with unknown
mean µ and variance $\sigma^2 > 0$. To test $H_o: \mu = 0$ against $H_a: \mu < 0$ the following test is
used: "Reject $H_o$ if and only if $X_1 + X_2 + X_3 + X_4 < -20$." Find the value of σ so that the
significance level of this test will be close to 0.14.
Solution :
Find the value of σ so that the significance level of this test will be close to 0.14.
We have $X_1, \ldots, X_4 \overset{\text{i.i.d.}}{\sim} N(\mu, \sigma^2)$.
The hypotheses are $H_0: \mu = 0$ vs $H_a: \mu < 0$.
Then,
α = P( type I error ) = P( reject $H_0$ | $H_0$ is true ) = 0.14
$= P(X_1 + X_2 + X_3 + X_4 < -20 \mid \mu = 0) = 0.14$
$= P(\bar{X} < -5 \mid \mu = 0) = 0.14$, where $\bar{X} \sim N\left(0, \dfrac{\sigma^2}{4}\right)$.
Then, $\Phi\left(\dfrac{-5}{\sigma/2}\right) = \Phi\left(-\dfrac{10}{\sigma}\right) = 0.14 \implies -\dfrac{10}{\sigma} = -1.08 \implies \sigma = 9.26$
Therefore, σ ≈ 9.26

Exercise 4. Let $X_1, X_2, \ldots, X_{25}$ be a random sample of size 25 drawn from a normal distribution
with unknown mean µ and variance $\sigma^2 = 100$. It is desired to test the null hypothesis
$H_o: \mu = 4$ against the alternative $H_a: \mu = 6$. What is the power at µ = 6 of the test with
rejection rule: reject µ = 4 if $\sum_{i=1}^{25} X_i \ge 125$?

Solution :
We have $X_1, \ldots, X_{25} \overset{\text{i.i.d.}}{\sim} N(\mu, \sigma^2)$ with $\sigma^2 = 100$.

The hypotheses are $H_0: \mu = 4$ vs $H_a: \mu = 6$, with
$$RR = \left\{(x_1, \ldots, x_{25}) : \sum_{i=1}^{25} x_i \ge 125\right\}$$

The power at µ = 6 is
$$\pi(6) = P(\text{Reject } H_0 \mid \mu = 6) = P\left(\sum_{i=1}^{25} X_i \ge 125 \,\middle|\, \mu = 6\right) = P(\bar{X} \ge 5), \quad\text{where } \bar{X} \sim N\left(6, \frac{100}{25}\right)$$
$$= 1 - \Phi\left(\frac{5 - 6}{2}\right) = 1 - 0.30854 = 0.6915$$
Therefore, π(6) = 0.6915
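The power calculation reduces to one normal tail probability for the sample mean. A minimal sketch, assuming `statistics.NormalDist` for the distribution of $\bar{X}$ under µ = 6; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

n = 25
sigma = 10.0           # sigma^2 = 100
mu_alt = 6.0           # mean under the alternative
threshold_sum = 125.0  # reject H0 when the sum of the sample is at least 125

# Under mu = 6, the sample mean is N(6, sigma^2 / n), i.e. standard deviation 2
xbar_dist = NormalDist(mu=mu_alt, sigma=sigma / sqrt(n))

# Power = P(X-bar >= 125 / 25 | mu = 6)
power = 1.0 - xbar_dist.cdf(threshold_sum / n)
print(f"power at mu = 6: {power:.4f}")
```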
Exercise 5. An urn contains 7 balls, θ of which are red. A random sample of size 2 is drawn
without replacement to test $H_o: \theta \le 1$ against $H_a: \theta > 1$. If the null hypothesis is rejected
if one or more red balls are drawn, find the power of the test when θ = 2.
Solution : Find the power of the test when θ = 2.
Let X be the number of red balls drawn. We have RR = {x : x ≥ 1}.
Then,
π(2) = P( Reject $H_0$ | θ = 2)
= P(X ≥ 1 | θ = 2)
= P( one or more red balls are drawn | the urn contains two red balls )
= 1 − P( no red ball is drawn | the urn contains two red balls )
$$= 1 - \frac{\binom{5}{1}\binom{4}{1}}{\binom{7}{1}\binom{6}{1}} = 1 - \frac{20}{42} = \frac{11}{21} \quad (\text{drawn without replacement})$$
Therefore, $\pi(2) = \dfrac{11}{21}$

Exercise 6. Let $X_1, X_2, \cdots, X_n$ be a random sample from $N(0, \sigma^2)$.
(a) Show that $C = \left\{(x_1, x_2, \cdots, x_n) : \sum_{i=1}^{n} x_i^2 \ge c\right\}$ is a best rejection region for testing
$H_0: \sigma^2 = 4$ against $H_a: \sigma^2 = 16$.
(b) If n = 15, find the value of c so that α = 0.05. [Hint: Recall that $\sum_{i=1}^{n} X_i^2/\sigma^2$ is $\chi^2(n)$.]
(c) If n = 15 and c is the value found in part (b), find the approximate value of
$\beta = P\left(\sum_{i=1}^{n} X_i^2 < c \mid \sigma^2 = 16\right)$.

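Solution :
(a) Show that $RR = \left\{(x_1, \ldots, x_n) : \sum_{i=1}^{n} x_i^2 \ge c\right\}$ is a best rejection region for testing
$H_0: \sigma^2 = 4$ against $H_a: \sigma^2 = 16$.

We have $X_1, \ldots, X_n \overset{\text{i.i.d.}}{\sim} N(0, \sigma^2)$,
so the pdf is given by:
$$f(x; \sigma^2) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{x^2}{2\sigma^2}} \quad \forall x \in \mathbb{R}$$
The hypotheses are $H_0: \sigma^2 = 4$ vs $H_a: \sigma^2 = 16$.
By applying the Neyman-Pearson lemma, the best RR is
$$RR = \left\{(x_1, \ldots, x_n) \,\middle|\, \frac{L(4)}{L(16)} \le k\right\} \quad\text{and}\quad P(\text{Reject } H_0 \mid H_0) = \alpha$$

We have $L(\sigma^2) = \prod_{i=1}^{n} f(x_i, \sigma^2) = (2\pi\sigma^2)^{-n/2}\, e^{-\frac{1}{2\sigma^2}\sum_{i=1}^{n} x_i^2}$, so
$$L(4) = (2\pi \cdot 4)^{-n/2}\, e^{-\frac{1}{8}\sum_{i=1}^{n} x_i^2}, \qquad L(16) = (2\pi \cdot 16)^{-n/2}\, e^{-\frac{1}{32}\sum_{i=1}^{n} x_i^2}$$
Then,
$$\frac{L(4)}{L(16)} = 2^n e^{\left(\frac{1}{32} - \frac{1}{8}\right)\sum_{i=1}^{n} x_i^2} = 2^n e^{-\frac{3}{32}\sum_{i=1}^{n} x_i^2} \le k$$
$$n\ln 2 - \frac{3}{32}\sum_{i=1}^{n} x_i^2 \le \ln k$$
$$\sum_{i=1}^{n} x_i^2 \ge \frac{32}{3}(n\ln 2 - \ln k)$$
Let $c = \dfrac{32}{3}(n\ln 2 - \ln k)$.
Therefore, $RR = \left\{(x_1, \ldots, x_n) \,\middle|\, \sum_{i=1}^{n} x_i^2 \ge c\right\}$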

(b) If n = 15, find the value of c so that α = 0.05.

Since $X_1, \ldots, X_n \overset{\text{i.i.d.}}{\sim} N(0, \sigma^2)$,
$$\sum_{i=1}^{n}\frac{X_i^2}{\sigma^2} \sim \chi^2(n)$$
Since P( Reject $H_0$ | $H_0$ ) = α,
$$P\left(\sum_{i=1}^{n} X_i^2 \ge c \,\middle|\, \sigma^2 = 4\right) = \alpha$$
$$P\left(\frac{\sum_{i=1}^{n} X_i^2}{4} \ge \frac{c}{4}\right) = 0.05$$
$$\frac{c}{4} = \chi^2_{15,\,0.05} \approx 25 \implies c = 100$$

Therefore, c = 100
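(c) If n = 15 and c is the value found in part (b), find the approximate value of
$\beta = P\left(\sum_{i=1}^{n} X_i^2 < c \mid \sigma^2 = 16\right)$.
$$\beta = P\left(\sum_{i=1}^{n} X_i^2 < 100 \,\middle|\, \sigma^2 = 16\right) = P\left(\frac{\sum_{i=1}^{n} X_i^2}{16} < \frac{100}{16}\right) = P\left(\chi^2(15) < 6.25\right) \approx 0.025,$$
since $\chi^2_{15,\,0.975} = 6.262$.

Therefore, β ≈ 0.025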

Exercise 7. Let X have a Pareto distribution with parameter θ > 0; that is, the pdf of X
is
$$f(x; \theta) = \begin{cases} \dfrac{1}{\theta}\, x^{-\frac{1}{\theta} - 1} & x > 1 \\ 0 & \text{otherwise} \end{cases}$$

Let $X_1, X_2, \ldots, X_n$ be a random sample from this distribution.

(a) Let $Y_n = \dfrac{2}{\theta}\sum_{i=1}^{n}\ln X_i$. Show that $Y_n$ has a chi-squared distribution with 2n degrees of freedom
(that is, $Y_n \sim \chi^2(2n)$). (Recall that if $V \sim \chi^2(\nu)$, then the moment generating function
(mgf) of V is $G_V(t) = (1 - 2t)^{-\nu/2}$, $t < \frac{1}{2}$.)
(b) Using the Neyman-Pearson lemma, show that the best critical region for testing $H_0: \theta = \theta_0$
against $H_a: \theta = \theta_a$, $\theta_a > \theta_0 > 0$, at level of test α, is
$$C = \left\{(x_1, \ldots, x_n) : \sum_{i=1}^{n}\ln x_i \ge c\right\}$$

where c satisfies $P(Y_n \ge 2c/\theta_0) = \alpha$.

(c) Is the above critical region RR uniformly most powerful for testing $H_0: \theta = \theta_0$ against
$H_a: \theta > \theta_0$ at level of test α? Justify your answer.
(d) If n = 12, α = 0.10, $H_0: \theta = 3$ and $H_a: \theta = 5$. Determine the critical region RR.
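Solution :
(a) Let $Y_n = \dfrac{2}{\theta}\sum_{i=1}^{n}\ln X_i$. Show that $Y_n$ has a chi-squared distribution with 2n degrees of
freedom (that is, $Y_n \sim \chi^2(2n)$).
We have
$$f(x; \theta) = \begin{cases} \dfrac{1}{\theta}\, x^{-\frac{1}{\theta} - 1} & x > 1 \\ 0 & \text{otherwise} \end{cases}$$

Find the mgf of $Y_n$. We have,
$$M_{Y_n}(t) = E\left(e^{tY_n}\right) = E\left(e^{\frac{2t}{\theta}\sum_{i=1}^{n}\ln X_i}\right) = E\left(e^{\frac{2t}{\theta}\ln X_1}\right) \times E\left(e^{\frac{2t}{\theta}\ln X_2}\right) \times \ldots \times E\left(e^{\frac{2t}{\theta}\ln X_n}\right) = \left[E\left(e^{\frac{2t}{\theta}\ln X}\right)\right]^n = \left[M_{\ln X}\left(\frac{2t}{\theta}\right)\right]^n$$

We have, for $t < \frac{1}{\theta}$,
$$M_{\ln X}(t) = E\left(e^{t\ln X}\right) = E\left(X^t\right) = \int_{1}^{\infty} x^t f(x; \theta)\,dx = \int_{1}^{\infty} \frac{x^t}{\theta}\, x^{-\frac{1}{\theta} - 1}\,dx = \int_{1}^{\infty} \frac{1}{\theta}\, x^{-\frac{1}{\theta} - 1 + t}\,dx = \frac{1}{1 - t\theta}$$

Therefore, for $t < \frac{1}{2}$,
$$M_{Y_n}(t) = \left(\frac{1}{1 - \frac{2t}{\theta}\,\theta}\right)^n = (1 - 2t)^{-n} = (1 - 2t)^{-2n/2},$$
which is the mgf of $\chi^2(2n)$. Therefore, $Y_n \sim \chi^2(2n)$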
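(b) Using the Neyman-Pearson lemma, show that the best critical region for testing $H_0: \theta = \theta_0$
against $H_a: \theta = \theta_a$, $\theta_a > \theta_0 > 0$, at level of test α, is
$$RR = \left\{(x_1, \ldots, x_n) : \sum_{i=1}^{n}\ln x_i \ge c\right\},$$
where c satisfies $P(Y_n \ge 2c/\theta_0) = \alpha$.

By the Neyman-Pearson lemma, we have
$$RR = \left\{(x_1, \ldots, x_n) : \frac{L(\theta_0)}{L(\theta_a)} \le k\right\}$$
We have $L(\theta) = \dfrac{1}{\theta^n}\prod_{i=1}^{n} x_i^{-1 - \frac{1}{\theta}}$. Then
$$\frac{L(\theta_0)}{L(\theta_a)} = \left(\frac{\theta_a}{\theta_0}\right)^n \prod_{i=1}^{n} x_i^{\frac{1}{\theta_a} - \frac{1}{\theta_0}} \le k$$
$$\prod_{i=1}^{n} x_i^{\frac{1}{\theta_a} - \frac{1}{\theta_0}} \le k\left(\frac{\theta_0}{\theta_a}\right)^n$$
$$\left(\frac{1}{\theta_a} - \frac{1}{\theta_0}\right)\sum_{i=1}^{n}\ln x_i \le \ln\left[k\left(\frac{\theta_0}{\theta_a}\right)^n\right]$$
Since $\theta_a > \theta_0$, the factor $\frac{1}{\theta_a} - \frac{1}{\theta_0} = \frac{\theta_0 - \theta_a}{\theta_0\theta_a}$ is negative, so dividing by it reverses the inequality:
$$\sum_{i=1}^{n}\ln x_i \ge \frac{\theta_0\theta_a}{\theta_0 - \theta_a}\ln\left[k\left(\frac{\theta_0}{\theta_a}\right)^n\right]$$
Let $c = \dfrac{\theta_0\theta_a}{\theta_0 - \theta_a}\ln\left[k\left(\dfrac{\theta_0}{\theta_a}\right)^n\right]$.
Therefore, $RR = \left\{(x_1, \ldots, x_n) : \sum_{i=1}^{n}\ln x_i \ge c\right\}$ is the best critical region.
The constant c is fixed by the level: P( Reject $H_0$ | $H_0$ ) = α, i.e.
$$P\left(\sum_{i=1}^{n}\ln X_i \ge c \,\middle|\, \theta = \theta_0\right) = P\left(\frac{2}{\theta_0}\sum_{i=1}^{n}\ln X_i \ge \frac{2c}{\theta_0}\right) = P\left(Y_n \ge \frac{2c}{\theta_0}\right) = \alpha$$
Since $Y_n \sim \chi^2(2n)$,
$$\frac{2c}{\theta_0} = \chi^2_{2n,\,\alpha} \implies c = \frac{\theta_0}{2}\chi^2_{2n,\,\alpha}$$
Therefore, $c = \dfrac{\theta_0}{2}\chi^2_{2n,\,\alpha}$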
(c) Is the above critical region RR uniformly most powerful for testing $H_0: \theta = \theta_0$ against
$H_a: \theta > \theta_0$ at level of test α? Justify your answer.
Remark : A test defined by a critical region C of size α is a uniformly most powerful test if
it is a most powerful test against each simple alternative in $H_a$.
Since the test statistic $Y_n$ and the critical value $c = \frac{\theta_0}{2}\chi^2_{2n,\,\alpha}$ do not depend on $\theta_a$,
the same region is most powerful against every alternative $\theta_a > \theta_0$.
Therefore, RR is uniformly most powerful.
(d) If n = 12, α = 0.10, $H_0: \theta = 3$ and $H_a: \theta = 5$, determine the critical region RR.
We have
$$RR = \left\{(x_1, \ldots, x_n) : \sum_{i=1}^{n}\ln x_i \ge c\right\}, \qquad c = \frac{\theta_0}{2}\chi^2_{2n,\,\alpha}.$$
Then $c = \dfrac{3}{2}\chi^2_{24,\,0.1} = \dfrac{3}{2}(33.196) = 49.794$.
Therefore, $RR = \left\{(x_1, \ldots, x_n) : \sum_{i=1}^{n}\ln x_i \ge 49.794\right\}$


Exercise 8. The melting point of each of 16 samples of a certain brand of hydrogenated
vegetable oil was determined, resulting in x̄ = 94.32. Assume that the distribution of the
melting point is normal with σ = 1.20.
(a) Test $H_0: \mu = 95$ versus $H_a: \mu \ne 95$ using a two-tailed level 0.01 test.
(b) If a level 0.01 test is used, what is β(94), the probability of a type II error when µ = 94?
(c) What value of n is necessary to ensure that β(94) = 0.1 when α = 0.01?
Solution :
(a) Test $H_0: \mu = 95$ versus $H_a: \mu \ne 95$ using a two-tailed level 0.01 test.
Test statistic value:
$$z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} = \frac{94.32 - 95}{1.2/4} = -2.27$$
For a two-tailed test the rejection region is given by:
$$RR = \left\{|z| \ge z_{\alpha/2}\right\} = \left\{z : z \ge z_{\alpha/2} \text{ or } z \le -z_{\alpha/2}\right\}$$

For α = 0.01, $z_{\alpha/2} = \Phi^{-1}(1 - 0.005) = 2.575$. Then,

RR = {z : z ≤ −2.575 or z ≥ 2.575}

Since the test statistic value is not in RR, $H_0$ is not rejected.
Therefore, $H_0$ is not rejected.
(b) If a level 0.01 test is used, what is β(94), the probability of a type II error when µ = 94?
We have
$$\beta(\mu') = \Phi\left(z_{\alpha/2} + \frac{\mu_0 - \mu'}{\sigma/\sqrt{n}}\right) - \Phi\left(-z_{\alpha/2} + \frac{\mu_0 - \mu'}{\sigma/\sqrt{n}}\right)$$
$$= \Phi\left(2.575 + \frac{95 - 94}{1.2/4}\right) - \Phi\left(-2.575 + \frac{95 - 94}{1.2/4}\right)$$
$$= \Phi(5.9) - \Phi(0.75) = 1 - 0.7734 = 0.2266$$
Therefore, β(94) = 0.2266
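The type II error formula for a two-tailed z test can be evaluated without table rounding. This is a sketch; the function name `beta_two_sided` is an illustrative assumption, and because it uses exact normal quantiles the result differs slightly in the third decimal from a table-based hand calculation.

```python
from math import sqrt
from statistics import NormalDist


def beta_two_sided(mu0, mu_true, sigma, n, alpha):
    """Type II error probability of a two-tailed z test when the true mean is mu_true."""
    std = NormalDist()
    z = std.inv_cdf(1.0 - alpha / 2.0)              # critical value z_{alpha/2}
    shift = (mu0 - mu_true) / (sigma / sqrt(n))     # standardized distance from H0
    return std.cdf(z + shift) - std.cdf(-z + shift)


# Exercise 8(b): mu0 = 95, true mean 94, sigma = 1.2, n = 16, alpha = 0.01
beta_94 = beta_two_sided(mu0=95, mu_true=94, sigma=1.2, n=16, alpha=0.01)
print(f"beta(94) = {beta_94:.4f}, power = {1 - beta_94:.4f}")
```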
(c) What value of n is necessary to ensure that β(94) = 0.1 when α = 0.01?
We have
$$n = \left[\frac{\sigma\left(z_{\alpha/2} + z_\beta\right)}{\mu_0 - \mu'}\right]^2 \quad (\text{an approximate solution})$$
Then, $n = \left[\dfrac{1.2(2.575 + 1.285)}{95 - 94}\right]^2 = 21.46$, where $z_\beta = 1.285$. So, n = 22
Therefore, the value of n necessary to ensure that β(94) = 0.1 when α = 0.01 is 22.


Exercise 9. The desired percentage of SiO2 in a certain type of aluminous cement is 5.5.
To test whether the true average percentage is 5.5 for a particular production facility, 16
independently obtained samples are analyzed. Suppose that the percentage of SiO2 in a
sample is normally distributed with σ = 0.3 and that x̄ = 5.25.
(a) Does this indicate conclusively that the true average percentage differs from 5.5 ?
(b) If the true average percentage is µ = 5.6 and a level α = 0.01 test based on n = 16 is
used, what is the probability of detecting this departure from H0 ?
(c) What value of n is required to satisfy α = 0.01 and β(5.6) = 0.01 ?
Solution :
(a) Does this indicate conclusively that the true average percentage differs from 5.5?
We test $H_0: \mu = 5.5$ versus $H_a: \mu \ne 5.5$. Test statistic value:
$$z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} = \frac{5.25 - 5.5}{0.3/4} = -3.33$$
For a two-tailed test the rejection region is given by:
$$RR = \left\{|z| \ge z_{\alpha/2}\right\} = \left\{z : z \ge z_{\alpha/2} \text{ or } z \le -z_{\alpha/2}\right\}$$
Take α = 0.01, so $z_{\alpha/2} = \Phi^{-1}(1 - 0.005) = 2.575$.
Then
RR = {z : z ≥ 2.575 or z ≤ −2.575}
Since the test statistic value −3.33 lies in the rejection region, $H_0$ is rejected.
Therefore, the true average percentage is different from 5.5.
(b) If the true average percentage is µ = 5.6 and a level α = 0.01 test based on n = 16 is
used, what is the probability of detecting this departure from $H_0$?

P( detecting this departure from $H_0$ ) = P( reject $H_0$ | µ = 5.6 )
= 1 − P( do not reject $H_0$ | $H_0$ is false )
= 1 − β(5.6)
$$= 1 - \left[\Phi\left(z_{\alpha/2} + \frac{\mu_0 - \mu'}{\sigma/\sqrt{n}}\right) - \Phi\left(-z_{\alpha/2} + \frac{\mu_0 - \mu'}{\sigma/\sqrt{n}}\right)\right]$$
$$= 1 - \left[\Phi\left(2.575 + \frac{5.5 - 5.6}{0.3/4}\right) - \Phi\left(-2.575 + \frac{5.5 - 5.6}{0.3/4}\right)\right]$$
= 1 − (Φ(1.242) − Φ(−3.908))
= 0.1075
Therefore, P( detecting this departure from $H_0$ ) = 0.1075
(c) What value of n is required to satisfy α = 0.01 and β(5.6) = 0.01 ?
We have the approximate solution
\[
n = \left[\frac{\sigma\left(z_{\alpha/2} + z_{\beta}\right)}{\mu_0 - \mu'}\right]^2
\]
Then, with σ = 0.3 and zβ = z0.01 = 2.33,
\[
n = \left[\frac{0.3(2.575 + 2.33)}{5.5 - 5.6}\right]^2 = 216.53 \approx 217
\]
Therefore, n = 217
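The sample-size formula above is easy to evaluate in code. This is a small sketch, assuming the same two-tailed setting; the helper name `sample_size_two_sided` is hypothetical, and exact normal quantiles are used instead of the rounded table values, which still rounds up to the same n here.

```python
from math import ceil
from statistics import NormalDist

def sample_size_two_sided(mu0, mu_alt, sigma, alpha, beta):
    """Approximate n for a two-tailed z test with level alpha and type II error beta at mu_alt."""
    std_normal = NormalDist()
    z_half_alpha = std_normal.inv_cdf(1 - alpha / 2)   # z_{alpha/2}
    z_beta = std_normal.inv_cdf(1 - beta)              # z_{beta}
    n_exact = (sigma * (z_half_alpha + z_beta) / (mu0 - mu_alt)) ** 2
    return ceil(n_exact)                               # round up to the next whole observation

n_cement = sample_size_two_sided(mu0=5.5, mu_alt=5.6, sigma=0.3, alpha=0.01, beta=0.01)
print(n_cement)
```

Rounding up (rather than to the nearest integer) keeps β at or below the target.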

Exercise 10. The article "Uncertainty Estimation in Railway Track Life-Cycle Cost" (J.
of Rail and Rapid Transit, 2009) presented the following data on time to repair (min) a rail
break in the high rail on a curved track of a certain railway line.

159 120 480 149 270 547 340 43 228 202 240 218

A normal probability plot of the data shows a reasonably linear pattern, so it is plausible
that the population distribution of repair time is at least approximately normal. The sample
mean and standard deviation are 249.7 and 145.1, respectively.
(a) Is there compelling evidence for concluding that true average repair time exceeds 200 min
? Carry out a test of hypotheses using a significance level of 0.05.
(b) Using σ = 150, what is the type II error probability of the test used in (a) when true
average repair time is actually 300 min ? That is, what is β(300) ?
Solution :
(a) Is there compelling evidence for concluding that true average repair time exceeds 200
min? Carry out a test of hypotheses using a significance level of 0.05.
Null hypothesis H0 : µ = 200
Alternative hypothesis Ha : µ > 200
Test statistic value :
\[
t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{249.7 - 200}{145.1/\sqrt{12}} = 1.1865
\]
For an upper-tailed level 0.05 test the rejection region is given by :

RR = {t : t ≥ tα,n−1 }

For α = 0.05 and n = 12 we have t0.05,11 = 1.796
Then,
RR = {t : t ≥ 1.796}
Since t = 1.1865 < 1.796, the test statistic value is not in the rejection region, so we fail to reject H0.
Therefore, there is not compelling evidence that the true average repair time exceeds 200 min.
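The summary statistics and the t statistic can be recomputed from the raw repair times. The sketch below uses only the standard library; the critical value 1.796 is the table value quoted in the solution and is hard-coded as an assumption, since the standard library has no t quantile function.

```python
from math import sqrt
from statistics import mean, stdev

repair_times = [159, 120, 480, 149, 270, 547, 340, 43, 228, 202, 240, 218]
mu0 = 200
t_critical = 1.796          # t_{0.05, 11}, taken from the t table as in the text

n = len(repair_times)
x_bar = mean(repair_times)
s = stdev(repair_times)     # sample standard deviation (divisor n - 1)
t_stat = (x_bar - mu0) / (s / sqrt(n))
reject_h0 = t_stat >= t_critical

print(round(x_bar, 1), round(s, 1), round(t_stat, 3), reject_h0)
```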
(b) Using σ = 150, what is the type II error probability of the test used in (a) when true
average repair time is actually 300 min ? That is, what is β(300) ?
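We have, for an upper-tailed test,
\[
\beta(\mu') = \Phi\left(z_{\alpha} + \frac{\mu_0 - \mu'}{\sigma/\sqrt{n}}\right)
\]
\[
\beta(300) = \Phi\left(1.645 + \frac{200 - 300}{150/\sqrt{12}}\right) = \Phi(-0.664) \approx 0.253
\]
Therefore, β(300) ≈ 0.253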


Exercise 11. Given the accompanying sample data on expense ratio (%) for large-cap
growth mutual funds:
0.52 1.06 1.26 2.17 1.55 0.99 1.10 1.07 1.81 2.05
0.91 0.79 1.39 0.62 1.52 1.02 1.10 1.78 1.01 1.15

A normal probability plot shows a reasonably linear pattern.


(a) Is there compelling evidence for concluding that the population mean expense ratio
exceeds 1% ? Carry out a test of the relevant hypotheses using a significance level of 0.01.
(b) Referring back to (a), describe in context type I and II errors and say which error you
might have made in reaching your conclusion. The source from which the data was obtained
reported that µ = 1.33 for the population of all 762 such funds. So did you actually commit
an error in reaching your conclusion?
(c) Supposing that σ = 0.5, determine and interpret the power of the test in (a) for the
actual value of µ stated in(b).
Solution :
(a) Is there compelling evidence for concluding that the population mean expense ratio
exceeds 1% ? Carry out a test of the relevant hypotheses using a significance level of 0.01.
Null hypothesis H0 : µ = 1%
Alternative hypothesis Ha : µ > 1%
Test statistic value :
\[
t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}
\]
We have
\[
\bar{x} = \frac{1}{20}\sum_{i=1}^{20} x_i = 1.2435 \quad \text{and} \quad s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{20} (x_i - \bar{x})^2} = 0.4484
\]
Then,
\[
t = \frac{1.2435 - 1}{0.4484/\sqrt{20}} = 2.4285
\]
For an upper-tailed level 0.01 test the rejection region is given by :

RR = {t : t ≥ tα,n−1 }

where t0.01,19 = 2.539
Then,
RR = {t : t ≥ 2.539}
Since t = 2.4285 < 2.539, the test statistic value is not in the rejection region, so we fail to reject H0.
Therefore, there is not compelling evidence that the population mean expense ratio exceeds 1%.
(b) Referring back to (a), describe in context type I and II errors and say which error you
might have made in reaching your conclusion. The source from which the data was obtained
reported that µ = 1.33 for the population of all 762 such funds. did you actually commit an
error in reaching your conclusion?
• Type I error: the true mean expense ratio is 1% (H0 is true), but we reject H0 and conclude that it exceeds 1%.
• Type II error: the true mean expense ratio exceeds 1% (H0 is false), but we fail to reject H0 : µ = 1%.
In (a) we did not reject H0, so the only error we might have made is a type II error.
The source from which the data was obtained reported that µ = 1.33 > 1.
So, we actually committed a type II error in reaching our conclusion.
(c) Supposing that σ = 0.5, determine and interpret the power of the test in (a) for the
actual value of µ stated in (b).
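We have, for the upper-tailed test with α = 0.01 (zα = 2.33) and n = 20,
\[
\pi(\mu') = 1 - P(\text{type II error}) = 1 - \Phi\left(z_{\alpha} + \frac{\mu_0 - \mu'}{\sigma/\sqrt{n}}\right)
\]
\[
\pi(1.33) = 1 - \Phi\left(2.33 + \frac{1 - 1.33}{0.5/\sqrt{20}}\right) = 1 - \Phi(-0.62) = 0.7324
\]
Therefore, π(1.33) = 0.7324
Interpretation : If the true mean expense ratio is µ′ = 1.33%, the test rejects H0 (the test statistic
falls in the rejection region) with probability about 0.73.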
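The power calculation can be reproduced with a few lines of Python. This is a sketch under the same assumptions as the solution (known σ = 0.5, an upper-tailed level 0.01 z approximation, and the sample size n = 20 of the data set); `power_upper_tailed` is an illustrative name, not part of any library.

```python
from math import sqrt
from statistics import NormalDist

def power_upper_tailed(mu0, mu_alt, sigma, n, alpha):
    """Power of an upper-tailed z test of H0: mu = mu0 when the true mean is mu_alt."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha)
    beta = std_normal.cdf(z_alpha + (mu0 - mu_alt) / (sigma / sqrt(n)))  # type II error
    return 1 - beta

power_133 = power_upper_tailed(mu0=1.0, mu_alt=1.33, sigma=0.5, n=20, alpha=0.01)
print(round(power_133, 3))
```

With exact quantiles the result is about 0.73, in line with the table-based value.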

Exercise 12. A random sample of 50 measurements resulted in a sample mean of 62 with
a sample standard deviation 8. It is claimed that the true population mean is at least 64.
(a) Is there sufficient evidence to refute the claim at the 2% level of significance?
(b) What is the P -value?
(c) What is the smallest value of α for which the claim will be rejected?
Solution :
(a) Is there sufficient evidence to refute the claim at the 2% level of significance?
Null hypothesis : H0 : µ ≥ 64
Alternative hypothesis : Ha : µ < 64
Test statistic value :
\[
z = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{62 - 64}{8/\sqrt{50}} = -1.77
\]
For a lower-tailed level 2% test the rejection region is given by :

RR = {z : z ≤ −zα }

For α = 0.02 we have zα = 2.05. Thus,

RR = {z : z ≤ −2.05}

Since z = −1.77 > −2.05, the test statistic value is not in the rejection region, so we fail to reject H0.
Therefore, there is not enough evidence to refute the claim.

(b) What is the p-value?
The p-value is the smallest significance level α at which the null hypothesis can be rejected.
For this lower-tailed test,
p-value = P (Z ≤ −1.77 | H0 ) = Φ(−1.77) = 0.0384
(c) What is the smallest value of α for which the claim will be rejected?
Note : Reject H0 if p-value ≤ α
Therefore, the smallest value of α is the p-value, 0.0384

Exercise 13. A random sample of 78 observations produced the following sums:

\[
\sum_{i=1}^{78} x_i = 22.8, \qquad \sum_{i=1}^{78} (x_i - \bar{x})^2 = 2.05.
\]

(a) Test the null hypothesis that µ = 0.45 against the alternative hypothesis that µ < 0.45
using α = 0.01. Also find the p-value.
(b) Test the null hypothesis that µ = 0.45 against the alternative hypothesis that µ ≠ 0.45
using α = 0.01. Also find the p-value.
(c) What assumptions did you make for solving (a) and (b)?
Solution :
(a) Test the null hypothesis that µ = 0.45 against the alternative hypothesis that µ < 0.45
using α = 0.01. Also find the p-value.
We have, Null hypothesis H0 : µ = 0.45 and Alternative hypothesis Ha : µ < 0.45
Test statistic value :
\[
z = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}
\]
We have
\[
\bar{x} = \frac{22.8}{78} = 0.2923, \qquad s = \sqrt{\frac{2.05}{77}} = 0.1631
\]
Then
\[
z = \frac{0.2923 - 0.45}{0.1631/\sqrt{78}} = -8.54
\]
For a lower-tailed level 0.01 test the rejection region is given by :

RR = {z : z ≤ −zα }

For α = 0.01 we have zα = 2.33
Then
RR = {z : z ≤ −2.33}
Since z = −8.54 ≤ −2.33, the test statistic value is included in the rejection region, so H0 is rejected.
And p-value = Φ(z) = Φ(−8.54) ≈ 0
Therefore, p-value ≈ 0
(b) Test the null hypothesis that µ = 0.45 against the alternative hypothesis that µ ≠ 0.45
using α = 0.01. Also find the p-value.
Alternative hypothesis Ha : µ ≠ 0.45
The test statistic value is the same as in (a), z = −8.54,
and for a two-tailed level 0.01 test the rejection region is given by :

RR = {z : z ≥ zα/2 or z ≤ −zα/2 }

We have zα/2 = Φ−1 (1 − 0.005) = 2.575
Then
RR = {z : z ≥ 2.575 or z ≤ −2.575}
Since z = −8.54 ≤ −2.575, the test statistic value is included in the rejection region, so H0 is rejected.
Therefore, H0 is rejected. p-value = 2[1 − Φ(|z|)] = 2[1 − Φ(8.54)] ≈ 0
(c) What assumptions did you make for solving (a) and (b)?
The observations form a random sample from the population.
Since n = 78 is large, we treat this as the large-sample case: by the central limit theorem the
sample mean is approximately normal and s can be used in place of σ, so the population itself
need not be normal.

Exercise 14. The number of carbohydrates found in a random sample of fast-food entrees
is listed. Is there sufficient evidence to conclude that the variance differs from 100? Use the
0.05 level of significance.
53 46 39 39 30
47 38 73 43 41
Solution :
Given Information
The number of carbohydrates found in a random sample of fast-food entrees is listed below

S = {53, 46, 39, 39, 30, 47, 38, 73, 43, 41}
The hypothesized population variance :
σ² = 100
The significance level :
α = 0.05
We need to test whether the population variance differs from 100.
The null and alternative hypotheses are :

H0 : σ² = 100; H1 : σ² ≠ 100

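Compute the sample mean and the sample variance of the data :

\[
\bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i = 44.9
\]
\[
s^2 = \frac{\sum (x_i - \bar{x})^2}{N - 1} = 135.433, \qquad s = \sqrt{135.433} = 11.638
\]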


S. No.      Data
1           53
2           46
3           39
4           39
5           30
6           47
7           38
8           73
9           43
10          41
Mean        44.9
Variance    135.433
Compute the test value :
\[
\chi^2 = \frac{(n-1)s^2}{\sigma_0^2} = \frac{(10-1)(135.433)}{100} = 12.189
\]
From Table G, the critical χ² values for d.f. = 9 with α/2 = 0.025 in each tail (two-tailed test)
are 2.700 and 19.023.
Since 2.700 < 12.189 < 19.023, the test value does not fall in the critical region and hence we do
not reject the null hypothesis. Therefore,
we do not have enough evidence to support the claim that the variance differs from 100.
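The variance test can be reproduced directly from the data. In this sketch the two critical values are standard chi-square table entries for 9 degrees of freedom (lower and upper 0.025 tails) and are hard-coded as an assumption, because the Python standard library does not provide chi-square quantiles.

```python
from statistics import variance

carbs = [53, 46, 39, 39, 30, 47, 38, 73, 43, 41]
sigma0_sq = 100                          # hypothesized population variance
chi2_lower, chi2_upper = 2.700, 19.023   # table values for df = 9, two-tailed alpha = 0.05

n = len(carbs)
s_sq = variance(carbs)                   # sample variance (divisor n - 1)
chi2_stat = (n - 1) * s_sq / sigma0_sq
reject_h0 = chi2_stat <= chi2_lower or chi2_stat >= chi2_upper

print(round(s_sq, 3), round(chi2_stat, 3), reject_h0)
```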

Exercise 15. The manager of a large company claims that the standard deviation of the
time (in minutes) that it takes a telephone call to be transferred to the correct office in her
company is 1.2 minutes or less. A random sample of 15 calls is selected, and the calls are
timed. The standard deviation of the sample is 1.8 minutes. At α = 0.01, test the claim
that the standard deviation is less than or equal to 1.2 minutes. Use the P -value method.
Solution :
From the given information, n = 15; s = 1.8 and σ = 1.2
Null hypothesis, H0 : σ ≤ 1.2
Alternative hypothesis, H1 : σ > 1.2
Level of significance, α = 0.01
Test statistic is,
\[
\chi^2 = \frac{(n-1)s^2}{\sigma_0^2} = \frac{(15-1)(1.8)^2}{(1.2)^2} = 31.5
\]
The degrees of freedom is,
df = n − 1 = 15 − 1 = 14
The p-value (right-tailed) is,

p-value = P (χ² ≥ 31.5) with df = 14, computed as = CHIDIST (31.5, 14) (MS Excel function)
= 0.004715

Since the p-value is less than the significance level α = 0.01, we reject the null hypothesis.
There is enough evidence to reject the claim that the standard deviation is less than or equal
to 1.2 minutes.
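The Excel value can be cross-checked without a statistics package. For an even number of degrees of freedom, the chi-square upper tail has the closed form P(χ² > x) = e^(−x/2) Σ_{k=0}^{df/2−1} (x/2)^k / k!. The sketch below implements that identity; the function name `chi2_sf_even_df` is an illustrative choice and the function deliberately rejects odd df, where this closed form does not apply.

```python
from math import exp

def chi2_sf_even_df(x, df):
    """P(chi-square with even df exceeds x), using the closed-form Poisson-type sum."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs a positive even df")
    half_x = x / 2
    term, total = 1.0, 0.0
    for k in range(df // 2):
        total += term              # term equals half_x**k / k!
        term *= half_x / (k + 1)
    return exp(-half_x) * total

n, s, sigma0 = 15, 1.8, 1.2
chi2_stat = (n - 1) * s**2 / sigma0**2
p_value = chi2_sf_even_df(chi2_stat, n - 1)
print(round(chi2_stat, 2), round(p_value, 6))
```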

Exercise 16. A machine fills 12 -ounce bottles with soda. For the machine to function
properly, the standard deviation of the sample must be less than or equal to 0.03 ounce. A
random sample of 8 bottles is selected, and the number of ounces of soda in each bottle is
given. At a α = 0.05, can we reject the claim that the machine is functioning properly? Use
the P -value method.
12.03 12.10 12.02 11.98
12.00 12.05 11.97 11.99

Solution :
Significance level α = 0.05, n=8
State the null and alternative hypotheses from the claim given in the exercise :

H0 : σ ≤ 0.03 (claim)
H1 : σ > 0.03

From the data given in the exercise we calculate the sample mean and the sample standard
deviation :
\[
\bar{X} = \frac{\sum X_i}{n} = 12.018, \qquad s = \sqrt{\frac{\sum (X_i - \bar{X})^2}{n-1}} = 0.043
\]

The test value is :
\[
\chi^2 = \frac{(n-1)s^2}{\sigma_0^2} = \frac{(8-1)(0.043)^2}{(0.03)^2} = 14.381
\]

p-value: Using Table G, Appendix C, the p-value can be bounded with df = n − 1 = 8 − 1 = 7.
The value 14.381 is between 14.067 and 16.013, which correspond to right-tail probabilities 0.05 and
0.025, so
0.025 < p-value < 0.05
Decision: Since p-value < α = 0.05, we have enough evidence to reject H0.
At the 5% significance level we have enough evidence to reject the claim that the machine is
functioning properly (that the standard deviation is less than or equal to 0.03 ounce).


Exercise 17. A coin is tossed 9 times and 3 heads appear. Can you conclude that the coin
is not balanced? Use α = 0.10. [Hint: Use the binomial table and find 2P (X ≤ 3) with
p = 0.5 and n = 9.]
Solution :
Given :
x=3
n=9
If the coin is fair, then we have 1 chance out of 2 to toss heads:

1
p= = 0.5
2
Determine the hypotheses :
H0 : p = 0.5
Ha : p ̸= 0.5
The P-value is the probability of obtaining the value of the test statistic, or a value more
extreme, if the null hypothesis is true. Binomial probability formula :
\[
P(X = k) = \binom{n}{k} p^k (1-p)^{n-k}
\]

Evaluate at k = 0, 1, 2, 3 :
\[
\begin{aligned}
P(X = 0) &= \binom{9}{0} \cdot 0.5^0 \cdot (1-0.5)^{9-0} \approx 0.0020 \\
P(X = 1) &= \binom{9}{1} \cdot 0.5^1 \cdot (1-0.5)^{9-1} \approx 0.0176 \\
P(X = 2) &= \binom{9}{2} \cdot 0.5^2 \cdot (1-0.5)^{9-2} \approx 0.0703 \\
P(X = 3) &= \binom{9}{3} \cdot 0.5^3 \cdot (1-0.5)^{9-3} \approx 0.1641
\end{aligned}
\]
Add the probabilities :

P (X ≤ 3) = P (X = 0) + P (X = 1) + P (X = 2) + P (X = 3)
= 0.0020 + 0.0176 + 0.0703 + 0.1641
= 0.2539

The P-value is then twice this probability :

P = 2 × 0.2539 = 0.5078

If the P-value is smaller than the significance level, then the null hypothesis is rejected.

P > 0.10 ⇒ Fail to reject H0

There is not sufficient evidence to conclude that the coin is not balanced.
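The exact binomial calculation can be done with `math.comb` instead of a table. A minimal sketch follows; doubling the lower tail is the convention suggested by the hint in the exercise, and the cap at 1 is a safeguard added here.

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, heads, p0, alpha = 9, 3, 0.5, 0.10
lower_tail = sum(binom_pmf(k, n, p0) for k in range(heads + 1))   # P(X <= 3)
p_value = min(1.0, 2 * lower_tail)                                # two-sided p-value
reject_h0 = p_value <= alpha

print(round(lower_tail, 4), round(p_value, 4), reject_h0)
```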


Exercise 18. In the past, 20% of all airline passengers flew first class. In a sample of 15
passengers, 5 flew first class. At α = 0.10, can you conclude that the proportions have
changed?
Solution :
Can we conclude that the proportion has changed?
Let p be the proportion of passengers who fly first class.
Test H0 : p = 0.2 versus Ha : p ≠ 0.2
We have n = 15, x = 5 and α = 0.1
• p̂ = x/n = 5/15 = 0.333
• Test statistic value
\[
z = \frac{\hat{p} - p_0}{\sqrt{p_0(1 - p_0)/n}} = \frac{0.333 - 0.20}{\sqrt{0.2 \times 0.8/15}} = 1.29
\]
• zα/2 = z0.05 = Φ−1 (1 − 0.05) = 1.645

Then the critical region is

C = {z : z ≤ −zα/2 or z ≥ zα/2 } = {z : z ≤ −1.645 or z ≥ 1.645}

Since z = 1.29 ∉ C, we do not reject H0 at α = 0.1 based on the given sample.
Therefore, there is not enough evidence to conclude that the proportion of passengers who fly
first class has changed.

Exercise 19. A survey by Men’s Health magazine stated that 14% of men said they used
exercise to reduce stress. Use α = 0.10. A random sample of 100 men was selected, and 10
said that they used exercise to relieve stress. Use the P -value method to test the claim.
Solution :
Use the P-value method to test the claim
We have: α = 0.1, p0 = 0.14, p̂ = x/n = 10/100 = 0.1
The test hypotheses : H0 : p = 0.14 versus Ha : p ≠ 0.14
The test statistic value is
\[
z = \frac{\hat{p} - p_0}{\sqrt{p_0(1 - p_0)/n}} = \frac{0.1 - 0.14}{\sqrt{(0.14)(1 - 0.14)/100}} = -1.15
\]
The P-value is
P-value = 2[1 − Φ(|z|)] = 2[1 − Φ(1.15)]
= 2[1 − 0.8749]
= 0.2502
Since the P-value is > 0.1, H0 is not rejected.
There is not enough evidence to reject the claim that 14% of men used exercise to reduce
stress.


Exercise 20. A common characterization of obese individuals is that their body mass index
is at least 30 [BMI = weight/(height)², where height is in meters and weight is in
kilograms]. The article "The Impact of Obesity on Illness Absence and Productivity in an
Industrial Population of Petrochemical Workers" (Annals of Epidemiology, 2008: 8-14) reported
that in a sample of female workers, 262 had BMIs of less than 25, 159 had BMIs
that were at least 25 but less than 30, and 120 had BMIs exceeding 30. Is there compelling
evidence for concluding that more than 20% of the individuals in the sampled population
are obese?
(a) State and test appropriate hypotheses using the rejection region approach with a significance
level of 0.05.
(b) Explain in the context of this scenario what constitutes type I and II errors.
(c) What is the probability of not concluding that more than 20% of the population is obese
when the actual percentage of obese individuals is 25% ?
Solution :
(a) State and test appropriate hypotheses using the rejection region approach with a significance
level of 0.05
Note : 262 had BMIs of less than 25, 159 had BMIs that were at least 25 but less than 30, and 120
had BMIs exceeding 30.
So, we have n = 262 + 159 + 120 = 541, x = 120
Then, p̂ = 120/541 = 0.2218
Null hypothesis: H0 : p = 0.2
Alternative hypothesis: Ha : p > 0.2
The test statistic value:
\[
z = \frac{\hat{p} - p_0}{\sqrt{p_0(1 - p_0)/n}} = \frac{0.2218 - 0.2}{\sqrt{0.2(1 - 0.2)/541}} = 1.27
\]
For an upper-tailed level 0.05 test, the rejection region is defined by :

RR = {z : z ≥ zα }

For α = 0.05, zα = Φ−1 (1 − 0.05) = 1.645

Therefore, RR = {z : z ≥ 1.645}
Since z = 1.27 < 1.645, the test statistic value is not in the rejection region, so we fail to reject H0.
Therefore, there is not compelling evidence that more than 20% of the individuals in the
sampled population are obese.
(b) Explain in the context of this scenario what constitutes type I and II errors.
• Type I error : concluding that more than 20% of the population is obese when in fact the
proportion is 20% (or less).
• Type II error : failing to conclude that more than 20% of the population is obese when in fact
more than 20% are obese.
(c) What is the probability of not concluding that more than 20% of the population is obese
when the actual percentage of obese individuals is 25% ?

For an upper-tailed level 0.05 test, we have
\[
\beta(p') = \Phi\left(\frac{p_0 - p' + z_{\alpha}\sqrt{p_0(1 - p_0)/n}}{\sqrt{p'(1 - p')/n}}\right)
\]
\[
\Rightarrow \beta(0.25) = \Phi\left(\frac{0.2 - 0.25 + 1.645\sqrt{0.2(1 - 0.2)/541}}{\sqrt{0.25(1 - 0.25)/541}}\right) = 0.122
\]

Therefore, β(0.25) = 0.122
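The type II error formula for an upper-tailed large-sample proportion test translates directly into code. This is a sketch with a hypothetical helper name, using exact normal quantiles rather than table values.

```python
from math import sqrt
from statistics import NormalDist

def beta_upper_tailed_proportion(p0, p_alt, n, alpha):
    """Type II error of the upper-tailed z test for a proportion when the true value is p_alt."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha)
    numerator = p0 - p_alt + z_alpha * sqrt(p0 * (1 - p0) / n)
    return std_normal.cdf(numerator / sqrt(p_alt * (1 - p_alt) / n))

beta_25 = beta_upper_tailed_proportion(p0=0.2, p_alt=0.25, n=541, alpha=0.05)
print(round(beta_25, 3))
```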

Exercise 21. A manufacturer of nickel-hydrogen batteries randomly selects 100 nickel plates
for test cells, cycles them a specified number of times, and determines that 14 of the plates
have blistered.
(a) Does this provide compelling evidence for concluding that more than 10% of all plates
blister under such circumstances? State and test the appropriate hypotheses using a significance
level of 0.05. In reaching your conclusion, what type of error might you have
committed?
(b) If it is really the case that 15% of all plates blister under these circumstances and a sample
size of 100 is used, how likely is it that the null hypothesis of part (a) will not be rejected
by the level 0.05 test? Answer this question for a sample size of 200.
(c) How many plates would have to be tested to have β(0.15) = 0.10 for the test of part (a)?
Solution :
(a) Null hypothesis: H0 : p = 0.1
Alternative hypothesis: Ha : p > 0.1
We have n = 100, x = 14. Then, p̂ = 14/100 = 0.14
The test statistic value :
\[
z = \frac{\hat{p} - p_0}{\sqrt{p_0(1 - p_0)/n}} = \frac{0.14 - 0.1}{\sqrt{0.1(1 - 0.1)/100}} = 1.333
\]

For an upper-tailed level 0.05 test, the rejection region is defined by :

RR = {z : z ≥ zα }

For α = 0.05, zα = Φ−1 (1 − 0.05) = 1.645

Therefore, RR = {z : z ≥ 1.645}
So, the test statistic value z = 1.333 is not in the rejection region.
We have p-value = P (Z ≥ z) = P (Z ≥ 1.333) = 1 − P (Z ≤ 1.333) = 0.091
Since the p-value = 0.091 is greater than α = 0.05, we fail to reject the hypothesis H0.
Therefore, there is not compelling evidence that more than 10% of all plates blister under
such circumstances. Since we failed to reject H0, the error we might have committed is a
type II error.
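The large-sample test for a proportion used in part (a) can be sketched as a small function. The name `proportion_z_test_upper` is illustrative; it returns the z statistic and the upper-tailed p-value.

```python
from math import sqrt
from statistics import NormalDist

def proportion_z_test_upper(x, n, p0):
    """Upper-tailed large-sample z test of H0: p = p0; returns (z, p_value)."""
    p_hat = x / n
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)   # standard error uses p0, not p_hat
    p_value = 1 - NormalDist().cdf(z)
    return z, p_value

z_stat, p_value = proportion_z_test_upper(x=14, n=100, p0=0.1)
print(round(z_stat, 3), round(p_value, 3))
```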
(b) For p′ = 0.15, we have
\[
\beta(p') = \Phi\left(\frac{p_0 - p' + z_{\alpha}\sqrt{p_0(1 - p_0)/n}}{\sqrt{p'(1 - p')/n}}\right)
\]
\[
\Rightarrow \beta(0.15) = \Phi\left(\frac{0.1 - 0.15 + 1.645\sqrt{0.1(1 - 0.1)/n}}{\sqrt{0.15(1 - 0.15)/n}}\right)
\]
For n = 100, then
\[
\beta(0.15) = \Phi\left(\frac{0.1 - 0.15 + 1.645\sqrt{0.1(1 - 0.1)/100}}{\sqrt{0.15(1 - 0.15)/100}}\right) = 0.493
\]
For n = 200, then
\[
\beta(0.15) = \Phi\left(\frac{0.1 - 0.15 + 1.645\sqrt{0.1(1 - 0.1)/200}}{\sqrt{0.15(1 - 0.15)/200}}\right) = 0.275
\]
Thus,
β(0.15) = 0.493 for n = 100
β(0.15) = 0.275 for n = 200
(c) Find n when β(0.15) = 0.10 for the test of part (a)
We have
\[
n = \left[\frac{z_{\alpha}\sqrt{p_0(1 - p_0)} + z_{\beta}\sqrt{p'(1 - p')}}{p' - p_0}\right]^2
\]
For β = 0.10, then zβ = 1.282
\[
\text{So, } n = \left[\frac{1.645\sqrt{0.1(1 - 0.1)} + 1.282\sqrt{0.15(1 - 0.15)}}{0.15 - 0.1}\right]^2 = 361.9625 \approx 362
\]
Therefore, n = 362
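The sample-size formula for the proportion test can be evaluated the same way. The helper below is a sketch with an illustrative name; it rounds up so that the achieved β does not exceed the target.

```python
from math import ceil, sqrt
from statistics import NormalDist

def sample_size_upper_tailed_proportion(p0, p_alt, alpha, beta):
    """Smallest n giving type II error at most beta at p_alt for the level-alpha upper-tailed test."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha)
    z_beta = std_normal.inv_cdf(1 - beta)
    root_n = (z_alpha * sqrt(p0 * (1 - p0)) + z_beta * sqrt(p_alt * (1 - p_alt))) / (p_alt - p0)
    return ceil(root_n ** 2)

n_plates = sample_size_upper_tailed_proportion(p0=0.1, p_alt=0.15, alpha=0.05, beta=0.10)
print(n_plates)
```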

Exercise 22. Let X have a Pareto distribution with parameter θ > 0; that is, the pdf of X
is
\[
f(x; \theta) = \begin{cases} \dfrac{1}{\theta}\, x^{-\frac{1}{\theta} - 1}, & x > 1 \\ 0, & \text{otherwise.} \end{cases}
\]
Let X1 , X2 , . . . , Xn be a random sample from this distribution.
(a) Let \( Y_n = \frac{2}{\theta}\sum_{i=1}^{n} \ln X_i \). Show that Yn has a chi-squared distribution with 2n degrees of
freedom (that is, Yn ∼ χ²(2n)). (Recall that if V ∼ χ²(ν), then the moment generating function
(mgf) of V is \( G_V(t) = (1 - 2t)^{-\nu/2},\ t < \frac{1}{2} \).)
(b) Using Neyman-Pearson lemma, show that the best critical region for testing H0 : θ = θ0
against Ha : θ = θa , θa > θ0 > 0, at level of test α, is
\[
RR = \left\{ (x_1, \ldots, x_n) : \sum_{i=1}^{n} \ln x_i \ge c \right\},
\]
where c satisfies P (Yn ≥ 2c/θ0 ) = α.
(c) Is the above critical region RR uniformly most powerful for testing H0 : θ = θ0 against
Ha : θ > θ0 at level of test α ? Justify your answer.
(d) If n = 12, α = 0.10, H0 : θ = 3 and Ha : θ = 5. Determine the critical region RR.


Solution :
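(a) Let \( Y_n = \frac{2}{\theta}\sum_{i=1}^{n} \ln X_i \). Show that Yn has a chi-squared distribution with 2n degrees of
freedom (that is, Yn ∼ χ²(2n)).
We have
\[
f(x; \theta) = \begin{cases} \dfrac{1}{\theta}\, x^{-\frac{1}{\theta} - 1}, & x > 1 \\ 0, & \text{otherwise.} \end{cases}
\]
Find the mgf of Yn. Since X1 , . . . , Xn are independent and identically distributed,
\[
\begin{aligned}
M_{Y_n}(t) &= E\left(e^{tY_n}\right) = E\left(e^{\frac{2t}{\theta}\sum_{i=1}^{n} \ln X_i}\right) \\
&= E\left(e^{\frac{2t}{\theta}\ln X_1}\right) \times E\left(e^{\frac{2t}{\theta}\ln X_2}\right) \times \cdots \times E\left(e^{\frac{2t}{\theta}\ln X_n}\right) \\
&= \left[E\left(e^{\frac{2t}{\theta}\ln X}\right)\right]^n = \left[M_{\ln X}\left(\frac{2t}{\theta}\right)\right]^n
\end{aligned}
\]
And, for t < 1/θ,
\[
\begin{aligned}
M_{\ln X}(t) &= E\left(e^{t \ln X}\right) = E\left(X^t\right) = \int_1^{\infty} x^t f(x; \theta)\,dx \\
&= \int_1^{\infty} \frac{1}{\theta}\, x^{-\frac{1}{\theta} - 1 + t}\,dx = \frac{1}{1 - t\theta}
\end{aligned}
\]
Substituting 2t/θ for t gives
\[
M_{Y_n}(t) = \left[\frac{1}{1 - \theta \cdot \frac{2t}{\theta}}\right]^n = (1 - 2t)^{-n} = (1 - 2t)^{-2n/2}, \quad t < \frac{1}{2},
\]
which is the mgf of a χ²(2n) distribution.
Therefore, Yn ∼ χ²(2n)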
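(b) Using Neyman-Pearson lemma, show that the best critical region for testing H0 : θ = θ0
against Ha : θ = θa , θa > θ0 > 0, at level of test α, is
\[
RR = \left\{ (x_1, \ldots, x_n) : \sum_{i=1}^{n} \ln x_i \ge c \right\},
\]
where c satisfies P (Yn ≥ 2c/θ0 ) = α.

By Neyman-Pearson lemma, we have
\[
RR = \left\{ (x_1, \ldots, x_n) : \frac{L(\theta_0)}{L(\theta_a)} \le k \right\}
\]
We have \( L(\theta) = \frac{1}{\theta^n}\prod_{i=1}^{n} x_i^{-1 - \frac{1}{\theta}} \). Then
\[
\begin{aligned}
\frac{L(\theta_0)}{L(\theta_a)} = \left(\frac{\theta_a}{\theta_0}\right)^n \prod_{i=1}^{n} x_i^{\frac{1}{\theta_a} - \frac{1}{\theta_0}} &\le k \\
\prod_{i=1}^{n} x_i^{\frac{1}{\theta_a} - \frac{1}{\theta_0}} &\le k\left(\frac{\theta_0}{\theta_a}\right)^n \\
\left(\frac{1}{\theta_a} - \frac{1}{\theta_0}\right)\sum_{i=1}^{n} \ln x_i &\le \ln\left[k\left(\frac{\theta_0}{\theta_a}\right)^n\right] \\
\sum_{i=1}^{n} \ln x_i &\ge \frac{\theta_0\theta_a}{\theta_0 - \theta_a}\ln\left[k\left(\frac{\theta_0}{\theta_a}\right)^n\right]
\end{aligned}
\]
The inequality reverses in the last step because 1/θa − 1/θ0 < 0 when θa > θ0.
Let \( c = \frac{\theta_0\theta_a}{\theta_0 - \theta_a}\ln\left[k\left(\frac{\theta_0}{\theta_a}\right)^n\right] \).
Therefore, \( RR = \left\{ (x_1, \ldots, x_n) : \sum_{i=1}^{n} \ln x_i \ge c \right\} \) is the best critical region.
The constant c is fixed by the level of the test:
\[
\begin{aligned}
P(\text{Reject } H_0 \mid H_0) &= \alpha \\
P\left(\sum_{i=1}^{n} \ln X_i \ge c \;\Big|\; \theta_0\right) &= \alpha \\
P\left(\frac{2}{\theta_0}\sum_{i=1}^{n} \ln X_i \ge \frac{2c}{\theta_0}\right) = P\left(Y_n \ge \frac{2c}{\theta_0}\right) &= \alpha
\end{aligned}
\]
Since Yn ∼ χ²(2n),
\[
\frac{2c}{\theta_0} = \chi^2_{2n,\alpha} \implies c = \frac{\theta_0}{2}\chi^2_{2n,\alpha}
\]
Therefore, \( c = \frac{\theta_0}{2}\chi^2_{2n,\alpha} \)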
(c) Is the above critical region RR uniformly most powerful for testing H0 : θ = θ0 against
Ha : θ > θ0 at level of test α ? Justify your answer.
Remark : A test defined by a critical region C of size α is a uniformly most powerful test if
it is a most powerful test against each simple alternative in Ha .
For every θa > θ0 the best critical region has the same form, and the constant c is determined
under H0 alone, so neither the test statistic nor c depends on θa .
Thus, the same RR is most powerful against every alternative θa > θ0 .
Therefore, RR is uniformly most powerful.
(d) If n = 12, α = 0.10, H0 : θ = 3 and Ha : θ = 5. Determine the critical region RR.
We have
\[
RR = \left\{ (x_1, \ldots, x_n) : \sum_{i=1}^{n} \ln x_i \ge c = \frac{\theta_0}{2}\chi^2_{2n,\alpha} \right\}
\]
Then,
\[
RR = \left\{ (x_1, \ldots, x_n) : \sum_{i=1}^{n} \ln x_i \ge c = \frac{3}{2}\chi^2_{24,0.1} \right\}
\]
Therefore,
\[
RR = \left\{ (x_1, \ldots, x_n) : \sum_{i=1}^{n} \ln x_i \ge 49.794 \right\}
\]
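The critical constant c = (θ0/2)·χ²(2n, α) can be computed without a table, because 2n is even and the chi-square upper tail then has a closed form. The sketch below finds the quantile by bisection on that closed form; both function names are illustrative, and the bracketing interval is an assumption chosen to be comfortably wide.

```python
from math import exp

def chi2_sf_even_df(x, df):
    """P(chi-square with even df exceeds x), closed form valid only for even df."""
    half_x = x / 2
    term, total = 1.0, 0.0
    for k in range(df // 2):
        total += term              # term equals half_x**k / k!
        term *= half_x / (k + 1)
    return exp(-half_x) * total

def chi2_upper_quantile_even_df(alpha, df):
    """Value x with P(chi-square_df > x) = alpha, found by bisection."""
    lo, hi = 0.0, 10.0 * df + 100.0     # upper tail is below alpha well before hi
    for _ in range(200):
        mid = (lo + hi) / 2
        if chi2_sf_even_df(mid, df) > alpha:
            lo = mid                    # tail still too large, move right
        else:
            hi = mid
    return (lo + hi) / 2

n, theta0, alpha = 12, 3, 0.10
c = theta0 / 2 * chi2_upper_quantile_even_df(alpha, 2 * n)
print(round(c, 3))
```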


Exercise 23. Let X1 , X2 , . . . , Xn be a random sample from a population X with pdf
\[
f(x; \theta) = \begin{cases} \dfrac{1}{2\theta\sqrt{x}}\, e^{-\frac{\sqrt{x}}{\theta}} & \text{if } x > 0 \\ 0 & \text{otherwise} \end{cases}
\]
where θ > 0 is an unknown parameter.
(a) Let Y = √X. Find the cdf of Y and then deduce the pdf of Y . Show that Y ∼ Exp(θ).
(b) Find the MLE θ̂n for θ. Is θ̂n efficient?
(c) Let \( U = \frac{2n\hat{\theta}_n}{\theta} \). Find the mgf of U and deduce that U ∼ χ²(2n).
(d) Derive a 90% CI for θ when \( \sum_{i=1}^{20} \sqrt{x_i} = 47.4 \).
(e) Find the best critical region for testing H0 : θ = 1 versus Ha : θ = θa , where θa > 1 when
α = 0.01 and n = 15.
(f) Is the test in (e) a UMP test for testing H0 : θ = 1 vs Ha : θ > 1 ? Justify your answer.
Solution :
Solution :

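(a) Show that Y = √X ∼ Exp(θ)
\[
Y = \sqrt{X} \iff X = Y^2 \quad \Rightarrow \quad J = \frac{dx}{dy} = 2y
\]
Therefore, the pdf of Y is
\[
g(y) = f\left(y^2\right) \cdot |J| = \begin{cases} 2y \cdot \dfrac{1}{2\theta y}\, e^{-\frac{y}{\theta}}, & \text{if } y > 0 \\ 0, & \text{otherwise} \end{cases}
= \begin{cases} \dfrac{1}{\theta}\, e^{-\frac{y}{\theta}}, & \text{if } y > 0 \\ 0, & \text{otherwise} \end{cases}
\]
Therefore, Y ∼ Exp(θ).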
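(b) Find the MLE of θ
\[
\bullet\ L(\theta) = \prod_{i=1}^{n} f(x_i; \theta) = \prod_{i=1}^{n} \frac{1}{2\theta\sqrt{x_i}}\, e^{-\frac{\sqrt{x_i}}{\theta}} = (2\theta)^{-n}\, \frac{e^{-\frac{1}{\theta}\sum_{i=1}^{n}\sqrt{x_i}}}{\prod_{i=1}^{n}\sqrt{x_i}}
\]
\[
\Rightarrow \ln L(\theta) = -n\ln(2\theta) - \frac{1}{\theta}\sum_{i=1}^{n}\sqrt{x_i} - \ln\left(\prod_{i=1}^{n}\sqrt{x_i}\right)
\]
\[
\bullet\ \frac{\partial}{\partial\theta}\ln L(\theta) = -\frac{n}{\theta} + \frac{1}{\theta^2}\sum_{i=1}^{n}\sqrt{x_i}
\]
\[
\bullet\ \frac{\partial}{\partial\theta}\ln L(\theta) = 0 \Rightarrow \frac{n}{\theta} = \frac{1}{\theta^2}\sum_{i=1}^{n}\sqrt{x_i} \Rightarrow \theta = \frac{1}{n}\sum_{i=1}^{n}\sqrt{x_i} = \bar{y}.
\]
Therefore, the MLE of θ is \( \hat{\theta}_n = \frac{1}{n}\sum_{i=1}^{n}\sqrt{X_i} = \bar{Y} \).
We have
\[
E\left(\hat{\theta}_n\right) = E\left(\frac{1}{n}\sum_{i=1}^{n} Y_i\right) = \frac{1}{n}\sum_{i=1}^{n} E(Y_i) = \frac{n\theta}{n} = \theta \qquad (1)
\]
\[
V\left(\hat{\theta}_n\right) = V\left(\frac{1}{n}\sum_{i=1}^{n} Y_i\right) = \frac{1}{n^2}\sum_{i=1}^{n} V(Y_i) = \frac{n\theta^2}{n^2} = \frac{\theta^2}{n}
\]
Since the support X(Ω) = (0, ∞) does not depend on θ, then
\[
\begin{aligned}
I(\theta) &= V\left(\frac{\partial}{\partial\theta}\ln f(X; \theta)\right) = V\left(\frac{\partial}{\partial\theta}\left[-\ln(2\theta) - \ln\sqrt{X} - \frac{\sqrt{X}}{\theta}\right]\right) \\
&= V\left(-\frac{1}{\theta} + \frac{\sqrt{X}}{\theta^2}\right) = \frac{1}{\theta^4}V(Y) = \frac{\theta^2}{\theta^4} = \frac{1}{\theta^2}
\end{aligned}
\]
\[
\Rightarrow \frac{1}{nI(\theta)} = \frac{\theta^2}{n} = V\left(\hat{\theta}_n\right) \qquad (2)
\]
From (1) and (2), θ̂n is unbiased and attains the Cramér-Rao lower bound, so θ̂n is an efficient estimator of θ.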
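(c) Let \( U = \frac{2n\hat{\theta}_n}{\theta} = \frac{2}{\theta}\sum_{i=1}^{n}\sqrt{X_i} = \frac{2}{\theta}\sum_{i=1}^{n} Y_i \). Then,
\[
M_U(t) = E\left(e^{tU}\right) = E\left(e^{\frac{2t}{\theta}\sum_{i=1}^{n} Y_i}\right) = \left[E\left(e^{\frac{2t}{\theta}Y}\right)\right]^n = \left[M_Y\left(\frac{2t}{\theta}\right)\right]^n
\]
Since Y ∼ Exp(θ), then
\[
M_Y(t) = (1 - \theta t)^{-1}, \quad t < \frac{1}{\theta}
\]
\[
\Rightarrow M_Y\left(\frac{2t}{\theta}\right) = \left(1 - \theta \times \frac{2t}{\theta}\right)^{-1} = (1 - 2t)^{-1}, \quad t < \frac{1}{2}
\]
Therefore, \( M_U(t) = (1 - 2t)^{-n} = (1 - 2t)^{-\frac{2n}{2}},\ t < \frac{1}{2} \). Hence, U ∼ χ²(2n)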
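The distributional claim U ∼ χ²(2n) can be sanity-checked by simulation, since a χ²(2n) variable has mean 2n and variance 4n. The sketch below is a Monte Carlo check, not a proof; it relies on the result of part (a) that √X ∼ Exp(θ) (mean θ), and the values θ = 2.5, the seed and the number of replications are arbitrary choices made for illustration.

```python
import random
from statistics import mean, variance

def simulate_u(theta, n, rng):
    """One draw of U = (2/theta) * sum(sqrt(X_i)), using sqrt(X_i) ~ Exp(mean theta)."""
    sqrt_x = [rng.expovariate(1 / theta) for _ in range(n)]   # expovariate takes the rate 1/mean
    return 2 / theta * sum(sqrt_x)

rng = random.Random(0)            # fixed seed so the run is reproducible
theta, n, reps = 2.5, 15, 20000
draws = [simulate_u(theta, n, rng) for _ in range(reps)]
u_mean = mean(draws)
u_var = variance(draws)

# Compare with the chi-square(2n) moments: mean 2n = 30, variance 4n = 60
print(round(u_mean, 2), round(u_var, 2))
```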
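(d) Find a 90% CI for θ when \( \sum_{i=1}^{15}\sqrt{x_i} = 47.4 \).

With n = 15 we have U ∼ χ²(30). Taking equal tail probabilities, we get
\[
\begin{aligned}
0.9 &= P\left(\chi^2_{0.95,30} \le U \le \chi^2_{0.05,30}\right) \\
&= P\left(18.493 \le \frac{2}{\theta}\sum_{i=1}^{15}\sqrt{X_i} \le 43.773\right) \\
&= P\left(\frac{2\sum_{i=1}^{15}\sqrt{X_i}}{43.773} \le \theta \le \frac{2\sum_{i=1}^{15}\sqrt{X_i}}{18.493}\right)
\end{aligned}
\]
Therefore, a 90% CI for θ is
\[
\left[\frac{2 \times 47.4}{43.773},\ \frac{2 \times 47.4}{18.493}\right] = [2.166,\ 5.126]
\]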


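(e) H0 : θ = 1 vs Ha : θ = θa , θa > 1
By Neyman-Pearson lemma, we have, for k > 0,
\[
\begin{aligned}
\frac{L(\theta_0)}{L(\theta_a)} \le k &\iff \frac{L(1)}{L(\theta_a)} \le k \\
&\iff \frac{\left(\frac{1}{2}\right)^n e^{-\sum_{i=1}^{n}\sqrt{x_i}}}{\left(\frac{1}{2\theta_a}\right)^n e^{-\frac{1}{\theta_a}\sum_{i=1}^{n}\sqrt{x_i}}} \le k \\
&\iff \theta_a^n\, e^{\left(\frac{1}{\theta_a} - 1\right)\sum_{i=1}^{n}\sqrt{x_i}} \le k \\
&\iff \left(\frac{1 - \theta_a}{\theta_a}\right)\sum_{i=1}^{n}\sqrt{x_i} \le \ln\left(\frac{k}{\theta_a^n}\right) \\
&\iff \sum_{i=1}^{n}\sqrt{x_i} \ge \frac{\theta_a}{1 - \theta_a}\ln\left(\frac{k}{\theta_a^n}\right) = c
\end{aligned}
\]
The inequality reverses in the last step because 1 − θa < 0.
Thus, \( RR = \left\{ (x_1, \cdots, x_n) : \sum_{i=1}^{n}\sqrt{x_i} \ge c \right\} \) where the constant c is defined by
\[
\begin{aligned}
\alpha &= P\left(U \ge \frac{2c}{\theta} \;\Big|\; \theta = 1\right) \\
0.01 &= P\left(U \ge \frac{2c}{\theta} \;\Big|\; \theta = 1\right), \quad U \sim \chi^2(2n) \\
&= P(U \ge 2c), \quad n = 15
\end{aligned}
\]
So, 2c = χ²0.01,30 = 50.892 ⇒ c = 25.446.
Hence, \( RR = \left\{ (x_1, \cdots, x_n) : \sum_{i=1}^{15}\sqrt{x_i} \ge 25.446 \right\} \).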

(f) Since the test statistic \( \sum_{i=1}^{n}\sqrt{X_i} \) and the rejection region RR of the test of H0 : θ = 1 vs
Ha : θ = θa do not depend on θa , for each θa > 1, the test is a UMP test for testing
H0 : θ = 1 vs Ha : θ > 1.

Exercise 24. Let X1 , X2 , . . . , Xn be a random sample from a population X with pdf
\[
f(x; \theta) = \begin{cases} \dfrac{1}{\theta}\, x^{\frac{1}{\theta} - 1} & \text{if } 0 \le x \le 1 \\ 0 & \text{otherwise} \end{cases}
\]
where θ > 0 is an unknown parameter.
(a) Let Y = − ln X. Find the cdf of Y and then deduce the pdf of Y . Show that Y ∼ Exp(θ)
(b) Find the MLE θ̂n for θ. Is θ̂n efficient?
(c) Let \( U = \frac{2n\hat{\theta}_n}{\theta} \). Find the mgf of U and deduce that U ∼ χ²(2n).
(d) Derive a 100(1 − α)% CI for θ.
(e) Find the best critical region for testing H0 : θ = 1 versus Ha : θ = θa , where θa > 1 when
α = 0.01 and n = 15.
(f) Is the test in (e) a UMP test for testing H0 : θ = 1 vs Ha : θ > 1 ? Justify your answer.
Solution :
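(a) Show that Y = − ln X ∼ Exp(θ)
\[
y = -\ln x \iff x = e^{-y} \quad \Rightarrow \quad J = \frac{dx}{dy} = -e^{-y} \quad \Rightarrow \quad |J| = e^{-y}
\]
Therefore, the pdf of Y is
\[
g(y) = f\left(e^{-y}\right) \cdot |J| = \begin{cases} \dfrac{1}{\theta}\left(e^{-y}\right)^{\frac{1}{\theta} - 1} e^{-y}, & \text{if } y \ge 0 \\ 0, & \text{otherwise} \end{cases}
= \begin{cases} \dfrac{1}{\theta}\, e^{-\frac{y}{\theta}}, & \text{if } y \ge 0 \\ 0, & \text{otherwise} \end{cases}
\]
Therefore, Y ∼ Exp(θ).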
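(b) Find the MLE of θ
\[
\begin{aligned}
L(\theta) &= \prod_{i=1}^{n} f(x_i; \theta) = \prod_{i=1}^{n} \frac{1}{\theta}\, x_i^{\frac{1}{\theta} - 1} = \theta^{-n}\left(\prod_{i=1}^{n} x_i\right)^{\frac{1}{\theta} - 1} \\
\ln L(\theta) &= -n\ln\theta + \left(\frac{1}{\theta} - 1\right)\sum_{i=1}^{n}\ln x_i \\
\frac{\partial}{\partial\theta}\ln L(\theta) &= -\frac{n}{\theta} - \frac{1}{\theta^2}\sum_{i=1}^{n}\ln x_i = 0 \\
\Rightarrow \theta &= -\frac{1}{n}\sum_{i=1}^{n}\ln x_i = \bar{y}
\end{aligned}
\]
So, the MLE of θ is \( \hat{\theta}_n = -\frac{1}{n}\sum_{i=1}^{n}\ln X_i = \bar{Y} \).

Is θ̂n an efficient estimator?
\[
E\left(\hat{\theta}_n\right) = E\left(\frac{1}{n}\sum_{i=1}^{n} Y_i\right) = \frac{1}{n}\sum_{i=1}^{n} E(Y_i) = \frac{n\theta}{n} = \theta \qquad (1)
\]
\[
V\left(\hat{\theta}_n\right) = V\left(\frac{1}{n}\sum_{i=1}^{n} Y_i\right) = \frac{1}{n^2}\sum_{i=1}^{n} V(Y_i) = \frac{n\theta^2}{n^2} = \frac{\theta^2}{n}
\]
Since the support X(Ω) = [0, 1] does not depend on θ, then
\[
I(\theta) = V\left(\frac{\partial}{\partial\theta}\ln f(X; \theta)\right)
\]
Since \( \ln f(x; \theta) = -\ln\theta + \left(\frac{1}{\theta} - 1\right)\ln x \) and ln x = −y, then
\[
\frac{\partial}{\partial\theta}\ln f(x; \theta) = -\frac{1}{\theta} - \frac{1}{\theta^2}\ln x = -\frac{1}{\theta} + \frac{1}{\theta^2}\,y
\]
Then
\[
I(\theta) = \frac{1}{\theta^4}V(Y) = \frac{\theta^2}{\theta^4} = \frac{1}{\theta^2}
\quad \Rightarrow \quad \frac{1}{nI(\theta)} = \frac{\theta^2}{n} = V\left(\hat{\theta}_n\right) \qquad (2)
\]
From (1) and (2), θ̂n is unbiased and attains the Cramér-Rao lower bound, so θ̂n is an efficient estimator of θ.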


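(c) Find the mgf of U
\[
U = \frac{2n\hat{\theta}_n}{\theta} = -\frac{2}{\theta}\sum_{i=1}^{n}\ln X_i
\]
\[
M_U(t) = E\left(e^{tU}\right) = E\left(e^{-\frac{2t}{\theta}\sum_{i=1}^{n}\ln X_i}\right) = \left[E\left(e^{-\frac{2t}{\theta}\ln X}\right)\right]^n = \left[E\left(X^{-\frac{2t}{\theta}}\right)\right]^n
\]
Since
\[
\begin{aligned}
E\left(X^{-\frac{2t}{\theta}}\right) &= \int_0^1 x^{-\frac{2t}{\theta}} \cdot \frac{1}{\theta}\, x^{\frac{1}{\theta} - 1}\,dx = \frac{1}{\theta}\int_0^1 x^{\frac{1 - 2t}{\theta} - 1}\,dx \\
&= \frac{1}{\theta}\left[\frac{x^{\frac{1 - 2t}{\theta}}}{\frac{1 - 2t}{\theta}}\right]_0^1 = (1 - 2t)^{-1}, \quad t < \frac{1}{2}
\end{aligned}
\]
So, \( M_U(t) = (1 - 2t)^{-n} = (1 - 2t)^{-\frac{2n}{2}},\ t < \frac{1}{2} \).
Hence, U ∼ χ²(2n).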

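(d) Find 100(1 − α)% CI for θ

Since U ∼ χ²(2n), taking equal tail probabilities, we have
\[
\begin{aligned}
1 - \alpha &= P\left(\chi^2_{1 - \frac{\alpha}{2},\,2n} \le U \le \chi^2_{\frac{\alpha}{2},\,2n}\right) \\
&= P\left(\chi^2_{1 - \frac{\alpha}{2},\,2n} \le \frac{2n\hat{\theta}_n}{\theta} \le \chi^2_{\frac{\alpha}{2},\,2n}\right) \\
&= P\left(\frac{2n\hat{\theta}_n}{\chi^2_{\frac{\alpha}{2},\,2n}} \le \theta \le \frac{2n\hat{\theta}_n}{\chi^2_{1 - \frac{\alpha}{2},\,2n}}\right)
\end{aligned}
\]
Therefore, a 100(1 − α)% CI for θ is
\[
\left[\frac{2n\hat{\theta}_n}{\chi^2_{\frac{\alpha}{2},\,2n}},\ \frac{2n\hat{\theta}_n}{\chi^2_{1 - \frac{\alpha}{2},\,2n}}\right].
\]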

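(e) H0 : θ = 1 vs Ha : θ = θa , θa > 1
By Neyman-Pearson lemma, we have, for k > 0,
\[
\begin{aligned}
\frac{L(\theta_0)}{L(\theta_a)} \le k &\iff \frac{L(1)}{L(\theta_a)} \le k \\
&\iff \frac{1}{\theta_a^{-n}\left(\prod_{i=1}^{n} x_i\right)^{\frac{1}{\theta_a} - 1}} \le k \\
&\iff \left(\prod_{i=1}^{n} x_i\right)^{1 - \frac{1}{\theta_a}} \le k\theta_a^{-n} \\
&\iff \left(1 - \frac{1}{\theta_a}\right)\sum_{i=1}^{n}\ln x_i \le \ln\left(k\theta_a^{-n}\right) \\
&\iff \sum_{i=1}^{n}\ln x_i \le \frac{\theta_a}{\theta_a - 1}\ln\left(k\theta_a^{-n}\right) = c
\end{aligned}
\]
The direction of the inequality is kept in the last step because 1 − 1/θa > 0 when θa > 1.
Therefore, \( RR = \left\{ (x_1, \cdots, x_n) : \sum_{i=1}^{n}\ln x_i \le c \right\} \) where the constant c is defined by
\[
\begin{aligned}
\alpha &= P\left(\sum_{i=1}^{n}\ln X_i \le c \;\Big|\; \theta = 1\right) \\
&= P\left(U \ge -\frac{2c}{\theta} \;\Big|\; \theta = 1\right), \quad U \sim \chi^2(2n) \\
&= P(U \ge -2c)
\end{aligned}
\]
So, −2c = χ²α,2n = χ²0.01,30 = 50.892 ⇒ c = −25.446.

Therefore, \( RR = \left\{ (x_1, \cdots, x_n) : \sum_{i=1}^{15}\ln x_i \le -25.446 \right\} \).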

(f) Since the test statistic \( \sum_{i=1}^{n}\ln X_i \) and the rejection region RR of the test of H0 : θ = 1 vs
Ha : θ = θa do not depend on θa , for each θa > 1, the test is a UMP test for testing
H0 : θ = 1 vs Ha : θ > 1.

Exercise 25. Suppose that X, the fraction of a container that is filled, has pdf f (x; θ) =
θxθ−1 for 0 < x < 1 (where θ > 0 ) and zero otherwise, and let X1 , . . . , Xn be a random
sample from this distribution.
(a) Show that the most powerful test for H0 : θ = 1 versus Ha : θ = 2 rejects the null
hypothesis if \( \sum \ln(x_i) \ge c \).
(b) Is the test of (a) UMP for testing H0 : θ = 1 versus Ha : θ > 1 ? Explain your reasoning.
(c) If n = 50, what is the (approximate) value of c for which the test has significance level
0.05 ?
Solution :
The probability density function is f (x, θ) = θxθ−1
We have n random variables describing the fraction of container that is filled.


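The hypotheses to test are :

H0 : θ = 1
Ha : θ = 2

The likelihood function is,
\[
\begin{aligned}
f(x_1, \ldots, x_n; \theta) &= f(x_1; \theta) \times f(x_2; \theta) \times \cdots \times f(x_n; \theta) \\
&= \theta x_1^{\theta - 1} \times \theta x_2^{\theta - 1} \times \cdots \times \theta x_n^{\theta - 1} \\
&= \theta^n\left(\prod_{i=1}^{n} x_i\right)^{\theta - 1}
\end{aligned}
\]
By the Neyman-Pearson theorem, the most powerful test rejects H0 when
\[
\begin{aligned}
\frac{f(x_1, \ldots, x_n; \theta = 2)}{f(x_1, \ldots, x_n; \theta = 1)} &\ge k \\
\Rightarrow \frac{2^n\left(\prod_{i=1}^{n} x_i\right)^{2 - 1}}{1^n\left(\prod_{i=1}^{n} x_i\right)^{1 - 1}} &\ge k \\
\Rightarrow \prod_{i=1}^{n} x_i &\ge \frac{k}{2^n}
\end{aligned}
\]
Apply the logarithm on both sides,
\[
\begin{aligned}
\ln\left(\prod_{i=1}^{n} x_i\right) &\ge \ln\left(\frac{k}{2^n}\right) \\
\Rightarrow \sum_{i=1}^{n}\ln(x_i) &\ge \ln\left(\frac{k}{2^n}\right) \\
\Rightarrow \sum_{i=1}^{n}\ln(x_i) &\ge c
\end{aligned}
\]
The rejection region for the most powerful test is of the form \( \sum_{i=1}^{n}\ln(x_i) \ge c \), with \( c = \ln\left(\frac{k}{2^n}\right) \).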
2n
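(b) Is the test of (a) UMP for testing H0 : θ = 1 versus Ha : θ > 1 ? Explain your reasoning.
Yes, the test in part (a) is uniformly most powerful for testing H0 : θ = 1 versus Ha : θ > 1.
For any θa > 1, the likelihood ratio condition is
\[
\frac{L(\theta_a)}{L(1)} = \theta_a^n\left(\prod_{i=1}^{n} x_i\right)^{\theta_a - 1} \ge k \iff \sum_{i=1}^{n}\ln(x_i) \ge \frac{1}{\theta_a - 1}\ln\left(\frac{k}{\theta_a^n}\right),
\]
so the most powerful level α test always rejects when \( \sum_{i=1}^{n}\ln(x_i) \ge c \), and the constant c is
fixed by the level α under H0 alone, so it does not depend on θa .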
(c) If n = 50, what is the (approximate) value of c for which the test has significance level
0.05 ?
The likelihood function is,
(
1 if 0 < ln (xi ) < 1
f (x1 , . . . , xn ; θ = 1) =
0 Otherwise

By : Sun Bunra 93
Institute of Technology of Cambodia Statistics ( 2022-2023 )

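The expected value of ln(X) under H0 is,

\[
E(\ln X) = \int_0^1 \ln(x) f(x)\,dx = \int_0^1 \ln(x)\,dx = \Big[ x\ln(x) - x \Big]_0^1 = -1
\]

The expected value of (ln X)^2 is,

\[
E\left( (\ln X)^2 \right) = \int_0^1 (\ln x)^2 f(x)\,dx = \int_0^1 (\ln x)^2\,dx
= \Big[ x(\ln x)^2 - 2x\ln(x) + 2x \Big]_0^1 = 2
\]

The standard deviation of ln(X) is,

\[
SD(\ln X) = \sqrt{E\left( (\ln X)^2 \right) - \left( E(\ln X) \right)^2} = \sqrt{2 - (-1)^2} = 1
\]

By the central limit theorem, $\sum_{i=1}^{n} \ln(X_i)$ is approximately normal with mean
$n \cdot (-1) = -50$ and standard deviation $\sqrt{n} \cdot 1 = \sqrt{50}$. Thus,

\[
P\left( \sum_{i=1}^{n} \ln(X_i) \ge c \right) = 0.05
\;\Rightarrow\; P\left( Z \ge \frac{c + 50}{\sqrt{50}} \right) = 0.05
\;\Rightarrow\; 1 - \Phi\left( \frac{c + 50}{\sqrt{50}} \right) = 0.05
\;\Rightarrow\; \Phi\left( \frac{c + 50}{\sqrt{50}} \right) = 0.95
\]

\[
\Rightarrow\; \frac{c + 50}{\sqrt{50}} = 1.645
\;\Rightarrow\; c = -50 + 1.645\sqrt{50} \approx -38.37 \quad \text{(from normal distribution tables)}
\]

Therefore, c ≈ −38.37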
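
The normal approximation for the critical value can be checked numerically. The sketch below assumes that under H0 each Xi is Uniform(0, 1), so that ln(Xi) has mean −1 and standard deviation 1; the function name and its default arguments are illustrative, not part of the exercise.

```python
from math import sqrt
from statistics import NormalDist


def critical_value(n, alpha=0.05, mean_ln=-1.0, sd_ln=1.0):
    """Approximate c with P(sum of ln X_i >= c) = alpha under H0 (CLT)."""
    # Assumption: ln X has mean -1 and sd 1 when X ~ Uniform(0, 1).
    z = NormalDist().inv_cdf(1 - alpha)  # upper-alpha standard normal quantile
    return n * mean_ln + z * sd_ln * sqrt(n)


c = critical_value(50)
print(f"c = {c:.2f}")
```

The sum of n terms has mean n times the single-term mean and standard deviation √n times the single-term standard deviation, which is why both n and √n appear in the function.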


TD5 - (Inferences Based on Two Samples)


Exercise 1. Consider the hypothesis test H0 : µ1 = µ2 against Ha : µ1 ≠ µ2 with known
variances σ1 = 10 and σ2 = 5. Suppose that sample sizes n1 = 10 and n2 = 15 and that
x̄1 = 4.7 and x̄2 = 7.8. Use α = 0.05.
(a) Test the hypothesis and find the P -value.
(b) Explain how the test could be conducted with a confidence interval.
(c) What is the power of the test in part (a) for a true difference in means of 3 ?
(d) Assuming equal sample sizes, what sample size should be used to obtain β = 0.05 if the
true difference in means is 3? Assume that α = 0.05.
Solution :
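a. Determine the value of the test statistic:

\[
Z = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}
= \frac{4.7 - 7.8}{\sqrt{\frac{10^2}{10} + \frac{5^2}{15}}} \approx -0.91
\]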

The P-value is the probability of obtaining a value more extreme or equal to the standardized
test statistic z. Determine the probability using table III.

P = P (Z < −0.91 or Z > 0.91) = 2P (Z < −0.91) = 2(0.181411) = 0.362822

If the P-value is smaller than the significance level α, then the null hypothesis is rejected.

P > 0.05 ⇒ Fail to reject H0

There is not sufficient evidence to support the claim that the population means are not
equal.
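
The test statistic and two-sided P-value above can be reproduced with a few lines of Python. This is a minimal sketch using only the standard library; the function name `two_sample_z_test` is an illustrative choice, and the exact normal CDF is used in place of table III, so the P-value differs slightly from the table-based 0.3628.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_z_test(xbar1, xbar2, sigma1, sigma2, n1, n2, delta0=0.0):
    """Return (z, two-sided P-value) for H0: mu1 - mu2 = delta0, known sigmas."""
    se = sqrt(sigma1**2 / n1 + sigma2**2 / n2)  # standard error of xbar1 - xbar2
    z = (xbar1 - xbar2 - delta0) / se
    p_two_sided = 2 * NormalDist().cdf(-abs(z))  # area in both tails
    return z, p_two_sided


z, p = two_sample_z_test(4.7, 7.8, sigma1=10, sigma2=5, n1=10, n2=15)
print(f"z = {z:.2f}, P = {p:.3f}")
```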
b. If a 95% (1 − α) confidence interval for the difference in means µ1 − µ2 does not contain 0,
then we reject the null hypothesis of equal means.
If a 95% (1 − α) confidence interval for the difference in means µ1 − µ2 contains 0, then we fail
to reject the null hypothesis of equal means.
For confidence level 1 − α = 0.95, determine zα/2 = z0.025 using table III (look up 0.025 in
the table, the z-score is then the found z-score with opposite sign):

zα/2 = 1.96

The endpoints of the confidence interval for µ1 − µ2 are:


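\[
(\bar{x}_1 - \bar{x}_2) - z_{\alpha/2} \cdot \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}
= (4.7 - 7.8) - 1.96 \cdot \sqrt{\frac{10^2}{10} + \frac{5^2}{15}} \approx -9.7947
\]

\[
(\bar{x}_1 - \bar{x}_2) + z_{\alpha/2} \cdot \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}
= (4.7 - 7.8) + 1.96 \cdot \sqrt{\frac{10^2}{10} + \frac{5^2}{15}} \approx 3.5947
\]

The confidence interval (−9.7947, 3.5947) contains 0, so we fail to reject the null hypothesis:
the data are consistent with equal means.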
c. The power is the probability of rejecting the null hypothesis when the alternative hypoth-
esis is true.


Determine the z-score corresponding with a probability of α/2 = 2.5% = 0.025 in each tail

z = ±1.96

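The corresponding sample mean difference is the population mean difference (of the null
hypothesis) decreased or increased by the product of the z-score and the standard deviation :

\[
(\bar{x}_1 - \bar{x}_2) = (\mu_1 - \mu_2) - z_{\alpha/2} \cdot \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}
= 0 - 1.96 \sqrt{\frac{10^2}{10} + \frac{5^2}{15}} \approx -6.6947
\]

\[
(\bar{x}_1 - \bar{x}_2) = (\mu_1 - \mu_2) + z_{\alpha/2} \cdot \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}
= 0 + 1.96 \sqrt{\frac{10^2}{10} + \frac{5^2}{15}} \approx 6.6947
\]

The z-value is the sample mean difference decreased by the population mean difference
(alternative mean difference!), divided by the standard deviation :

\[
z = \frac{\bar{x}_1 - \bar{x}_2 - (\mu_1 - \mu_2)}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}
= \frac{-6.6947 - 3}{\sqrt{\frac{10^2}{10} + \frac{5^2}{15}}} \approx -2.84
\]

\[
z = \frac{\bar{x}_1 - \bar{x}_2 - (\mu_1 - \mu_2)}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}
= \frac{6.6947 - 3}{\sqrt{\frac{10^2}{10} + \frac{5^2}{15}}} \approx 1.08
\]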

Determine the probability of rejecting the null hypothesis using table III.

\[
\begin{aligned}
P &= P(Z < -2.84 \text{ or } Z > 1.08) = P(Z < -2.84) + P(Z > 1.08) \\
  &= P(Z < -2.84) + P(Z < -1.08) \\
  &= 0.002256 + 0.140071 = 0.142327
\end{aligned}
\]
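
The two-step power calculation can be condensed into one function. The sketch below is an illustration using only the Python standard library; `z_test_power` is a hypothetical name, and it uses exact normal quantiles rather than table values, so the result agrees with 0.1423 only to about three decimals.

```python
from math import sqrt
from statistics import NormalDist


def z_test_power(delta, sigma1, sigma2, n1, n2, alpha=0.05):
    """Power of the two-sided z-test of H0: mu1 = mu2 when mu1 - mu2 = delta."""
    nd = NormalDist()
    se = sqrt(sigma1**2 / n1 + sigma2**2 / n2)
    z_crit = nd.inv_cdf(1 - alpha / 2)
    shift = delta / se  # true difference in standard-error units
    # beta = probability that the statistic stays between the critical values
    beta = nd.cdf(z_crit - shift) - nd.cdf(-z_crit - shift)
    return 1 - beta


power = z_test_power(3, sigma1=10, sigma2=5, n1=10, n2=15)
print(f"power = {power:.4f}")
```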

(d) Assuming equal sample sizes, what sample size should be used to obtain β = 0.05 if the
true difference in means is 3? Assume that α = 0.05.
Formula sample size :

\[
n = \frac{\left( z_{\alpha/2} + z_\beta \right)^2 \left( \sigma_1^2 + \sigma_2^2 \right)}{(\Delta - \Delta_0)^2}
\]
• ∆0 is the null hypothesis µ1 − µ2 = ∆0 .
• ∆ is the alternative hypothesis µ1 − µ2 = ∆.
Determine zα/2 = z0.025 using table III (look up 0.025 in the table, the z-score is then the
found z-score with opposite sign) :
zα/2 = 1.96
Determine zβ = z0.05 using table III (look up 0.05 in the table, the z-score is then the found
z-score with opposite sign) :
zβ = 1.64
Fill in the known values into the formula and evaluate :

\[
n_1 = n_2 = \frac{(1.96 + 1.64)^2 \left( 10^2 + 5^2 \right)}{(3 - 0)^2} = 180
\]


Exercise 2. Consider the hypothesis test H0 : µ1 = µ2 against Ha : µ1 > µ2 with known
variances σ1 = 10 and σ2 = 5. Suppose that sample sizes n1 = 10 and n2 = 15 and that
x̄1 = 24.5 and x̄2 = 21.3. Use α = 0.05.
(a) Test the hypothesis and find the P -value.
(b) Explain how the test could be conducted with a confidence interval.
(c) What is the power of the test in part (a) if µ1 is 2 units greater than µ2 ?
(d) Assuming equal sample sizes, what sample size should be used to obtain β = 0.05 if µ1
is 2 units greater than µ2 ? Assume that α = 0.05.
Solution :
(a). Test the hypothesis and find the P-value
Test H0 : µ1 = µ2 vs Ha : µ1 > µ2 , α = 0.05

m = n1 = 10, x̄ = x̄1 = 24.5, σ1 = 10
n = n2 = 15, ȳ = x̄2 = 21.3, σ2 = 5

• Test statistic value

\[
z = \frac{\bar{x} - \bar{y} - \Delta_0}{\sqrt{\frac{\sigma_1^2}{m} + \frac{\sigma_2^2}{n}}}
= \frac{24.5 - 21.3 - 0}{\sqrt{10^2/10 + 5^2/15}} = \frac{3.2}{3.416} \approx 0.94
\]

Since Ha : µ1 − µ2 > 0, the P-value is that of an upper-tailed test:

\[
P\text{-value} = 1 - \Phi(z) = 1 - \Phi(0.94) = 1 - 0.8264 = 0.1736
\]

We get P-value = 0.1736 ⇒ P-value > α = 0.05
We do not reject the null hypothesis H0.
(b). Explain how the test could be conducted with CI.
\[
CI(\mu_1 - \mu_2) = \bar{x} - \bar{y} \pm z_{\alpha/2} \sqrt{\frac{\sigma_1^2}{m} + \frac{\sigma_2^2}{n}}
= 24.5 - 21.3 \pm 1.96 \sqrt{\frac{10^2}{10} + \frac{5^2}{15}}
= [-3.49,\ 9.89]
\]

Since 0 ∈ CI (µ1 − µ2), the difference between the means is not significant.
Therefore, we fail to reject the null hypothesis.


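(c). Find the power of the test in part (a) if µ1 is 2 units greater than µ2 (∆′ = µ1 − µ2 = 2)

\[
\beta(\Delta') = \Phi\left( z_\alpha - \frac{\Delta' - \Delta_0}{\sigma} \right)
= \Phi\left( 1.65 - \frac{2 - 0}{\sigma} \right),
\qquad \sigma = \sqrt{\frac{\sigma_1^2}{m} + \frac{\sigma_2^2}{n}} = 3.416
\]

Then $\beta(\Delta') = \Phi\left( 1.65 - \frac{2}{3.416} \right) = \Phi(1.06) = 0.8554$
Therefore, the power of the test = 1 − β = 0.1446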
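(d). Find the sample size

\[
\frac{\sigma_1^2}{m} + \frac{\sigma_2^2}{n} = \frac{(\Delta' - \Delta_0)^2}{(z_\alpha + z_\beta)^2}
\]

Since m = n, we have

\[
n = \frac{(z_\alpha + z_\beta)^2 \left( \sigma_1^2 + \sigma_2^2 \right)}{(\Delta' - \Delta_0)^2}
\]

Since α = β = 0.05 ⇒ $z_{0.05} = \Phi^{-1}(1 - 0.05) = \Phi^{-1}(0.95) = 1.65$

\[
n = \frac{(1.65 + 1.65)^2 \left( 10^2 + 5^2 \right)}{2^2} = 340.31
\]

Rounding up, the sample size is n = 341 for each sample.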
Exercise 3. Two machines are used for filling plastic bottles with a net volume of 16.0
ounces. The fill volume can be assumed normal, with standard deviation σ1 = 0.020 and
σ2 = 0.025 ounces. A member of the quality engineering staff suspects that both machines
fill to the same mean net volume, whether or not this volume is 16.0 ounces. A random
sample of 10 bottles is taken from the output of each machine.
Machine 1          Machine 2
16.03   16.01      16.02   16.03
16.04   15.96      15.97   16.04
16.05   15.98      15.96   16.02
16.05   16.02      16.01   16.01
16.02   15.99      15.99   16.00
(a) Do you think the engineer is correct? Use α = 0.05. What is the P-value for this test?
(b) Calculate a 95% confidence interval on the difference in means. Provide a practical
interpretation of this interval.
(c) What is the power of the test in part (a) for a true difference in means of 0.04 ?
(d) Assuming equal sample sizes, what sample size should be used to assure that β = 0.05
if the true difference in means is 0.04 ? Assume that α = 0.05.
Solution :
(a) The mean is the sum of all values divided by the number of values:

\[
\bar{x}_1 = \frac{16.03 + 16.04 + 16.05 + \ldots + 15.98 + 16.02 + 15.99}{10} = 16.015
\]

\[
\bar{x}_2 = \frac{16.02 + 15.97 + 15.96 + \ldots + 16.02 + 16.01 + 16.00}{10} = 16.005
\]

The claim is either the null hypothesis or the alternative hypothesis. The null hypothesis
and the alternative hypothesis state the opposite of each other. The null hypothesis needs
to contain an equality. Here the engineer's claim of equal means is the null hypothesis.
H0 : µ1 = µ2
Ha : µ1 ≠ µ2
Determine the value of the test statistic :

\[
z = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}
= \frac{16.015 - 16.005}{\sqrt{\frac{0.020^2}{10} + \frac{0.025^2}{10}}} \approx 0.99
\]

The P-value is the probability of obtaining a value more extreme or equal to the standardized
test statistic z, assuming that the null hypothesis is true. Determine the probability using
the normal probability table.

P = P (Z < −0.99 or Z > 0.99) = 2P (Z < −0.99) = 2(0.161087) = 0.322174

If the P-value is smaller than the significance level α, then the null hypothesis is rejected.

P > 0.05 ⇒ Fail to reject H0

There is not sufficient evidence to reject the engineer’s claim.

(b) Calculate a 95% confidence interval on the difference in means. Provide a practical
interpretation of this interval.
Given :
c = 95% = 0.95
For confidence level 1 − α = 0.95, determine zα/2 = z0.025 using the normal probability
table in the appendix (look up 0.025 in the table, the z-score is then the found z-score with
opposite sign) :
zα/2 = 1.96

The margin of error then becomes :

\[
E = z_{\alpha/2} \cdot \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}
= 1.96 \cdot \sqrt{\frac{0.020^2}{10} + \frac{0.025^2}{10}} \approx 0.0198
\]

The endpoints of the confidence interval for µ1 − µ2 are :
(x̄1 − x̄2 ) − E = (16.015 − 16.005) − 0.0198 = 0.010 − 0.0198 = −0.0098
(x̄1 − x̄2 ) + E = (16.015 − 16.005) + 0.0198 = 0.010 + 0.0198 = 0.0298
We are 95% confident that the mean net volume of machine 1 is between 0.0098 ounces lower
and 0.0298 ounces higher than the mean net volume of machine 2.

(c) What is the power of the test in part (a) for a true difference in means of 0.04 ?

∆ = 0.04
∆0 = 0
With zα/2 = z0.025 = 1.96 as above, the power is then :

\[
\begin{aligned}
POWER = 1 - \beta
&= 1 - \Phi\left( z_{\alpha/2} - \frac{\Delta - \Delta_0}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}} \right)
     + \Phi\left( -z_{\alpha/2} - \frac{\Delta - \Delta_0}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}} \right) \\
&= 1 - \Phi\left( 1.96 - \frac{0.04 - 0}{\sqrt{\frac{0.020^2}{10} + \frac{0.025^2}{10}}} \right)
     + \Phi\left( -1.96 - \frac{0.04 - 0}{\sqrt{\frac{0.020^2}{10} + \frac{0.025^2}{10}}} \right) \\
&= 1 - \Phi(-1.99) + \Phi(-5.91) \\
&= 1 - 0.023295 + 0 \\
&= 0.976705
\end{aligned}
\]
(d) Assuming equal sample sizes, what sample size should be used to assure that β = 0.05
if the true difference in means is 0.04 ? Assume that α = 0.05.
Formula sample size :

\[
n = \frac{\left( z_{\alpha/2} + z_\beta \right)^2 \left( \sigma_1^2 + \sigma_2^2 \right)}{(\Delta - \Delta_0)^2}
\]

Determine zα/2 = z0.025 using the normal probability table in the appendix (look up 0.025
in the table, the z-score is then the found z-score with opposite sign) :
zα/2 = 1.96
Determine zβ = z0.05 using the normal probability table in the appendix (look up 0.05 in the
table, the z-score is then the found z-score with opposite sign) :
zβ = 1.645
Fill in the known values into the formula and evaluate (round up!) :

\[
n_1 = n_2 = \frac{(1.96 + 1.645)^2 \left( 0.020^2 + 0.025^2 \right)}{(0.04 - 0)^2} \approx 8.33 \;\Rightarrow\; n_1 = n_2 = 9
\]
Exercise 4. Two different formulations of an oxygenated motor fuel are being tested to
study their road octane numbers. The variance of road octane number for formulation 1
is $\sigma_1^2 = 1.5$, and for formulation 2 it is $\sigma_2^2 = 1.2$. Two random samples of size n1 = 15
and n2 = 20 are tested, and the mean road octane numbers observed are x̄1 = 89.6 and
x̄2 = 92.5. Assume normality.
(a) If formulation 2 produces a higher road octane number than formulation 1, the manufac-
turer would like to detect it. Formulate and test an appropriate hypothesis, using α = 0.05.
What is the P -value?
(b) Explain how the question in part (a) could be answered with a 95% confidence interval
on the difference in mean road octane number.
(c) What sample size would be required in each population if we wanted to be 95% confident
that the error in estimating the difference in mean road octane number is less than 1 ?
Solution :
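a. The manufacturer wants to detect whether formulation 2 has the higher mean road octane
number, so the hypotheses are :
H0 : µ1 = µ2
Ha : µ1 < µ2
Determine the value of the test statistic :

\[
Z = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}
= \frac{89.6 - 92.5}{\sqrt{\frac{1.5}{15} + \frac{1.2}{20}}} \approx -7.25
\]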


The P-value is the probability of obtaining a value more extreme or equal to the standardized
test statistic z. Determine the probability using table III.

P = P (Z < −7.25) ≈ 0

If the P-value is smaller than the significance level α, then the null hypothesis is rejected.

P < 0.05 ⇒ Reject H0

There is sufficient evidence to support the claim that the second population mean is larger
than the first population mean.
b. If a 95% (1 − α) confidence interval for the difference in means µ1 − µ2 does not contain 0,
then we reject the null hypothesis of equal means. If the interval contains 0, then we fail to
reject the null hypothesis of equal means. For confidence level 1 − α = 0.95,
determine zα/2 = z0.025 using table III (look up 0.025 in the table, the z-score is then the found
z-score with opposite sign) :
zα/2 = 1.96
The endpoints of the confidence interval for µ1 − µ2 are :

\[
(\bar{x}_1 - \bar{x}_2) - z_{\alpha/2} \cdot \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}
= (89.6 - 92.5) - 1.96 \cdot \sqrt{\frac{1.5}{15} + \frac{1.2}{20}} \approx -3.684
\]

\[
(\bar{x}_1 - \bar{x}_2) + z_{\alpha/2} \cdot \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}
= (89.6 - 92.5) + 1.96 \cdot \sqrt{\frac{1.5}{15} + \frac{1.2}{20}} \approx -2.116
\]

The confidence interval does not contain 0 and thus the means appear to be unequal.
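
The interval endpoints can be computed directly from the summary statistics. The following is a small sketch in standard-library Python; the function name `z_confidence_interval` is illustrative, and it takes the known variances (not standard deviations) as inputs, matching how this exercise states them.

```python
from math import sqrt
from statistics import NormalDist


def z_confidence_interval(xbar1, xbar2, var1, var2, n1, n2, conf=0.95):
    """Two-sided CI for mu1 - mu2 when both population variances are known."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)  # z_{alpha/2}
    margin = z * sqrt(var1 / n1 + var2 / n2)
    diff = xbar1 - xbar2
    return diff - margin, diff + margin


lo, hi = z_confidence_interval(89.6, 92.5, var1=1.5, var2=1.2, n1=15, n2=20)
print(f"({lo:.3f}, {hi:.3f})")
```

Both endpoints are negative, which is what signals that µ1 lies below µ2.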
(c) What sample size would be required in each population if we wanted to be 95% confident
that the error in estimating the difference in mean road octane number is less than 1 ?
The question concerns the margin of error of the 95% confidence interval, not the power of a
test. With equal sample sizes n, the margin of error is
$E = z_{\alpha/2} \sqrt{(\sigma_1^2 + \sigma_2^2)/n}$, so the required sample size is :

\[
n = \left( \frac{z_{\alpha/2}}{E} \right)^2 \left( \sigma_1^2 + \sigma_2^2 \right)
\]

Determine zα/2 = z0.025 using table III (look up 0.025 in the table, the z-score is then the
found z-score with opposite sign) :
zα/2 = 1.96
Fill in the known values into the formula and evaluate (round up to the nearest integer!) :

\[
n_1 = n_2 = \left( \frac{1.96}{1} \right)^2 (1.5 + 1.2) \approx 10.37 \;\Rightarrow\; n_1 = n_2 = 11
\]


Exercise 5. The diameter of steel rods manufactured on two different extrusion machines
is being investigated. Two random samples of sizes n1 = 15 and n2 = 17 are selected, and
the sample means and sample variances are x̄1 = 8.73, $s_1^2 = 0.35$, x̄2 = 8.68, and $s_2^2 = 0.40$,
respectively. Assume that $\sigma_1^2 = \sigma_2^2$ and that the data are drawn from a normal distribution.
(a) Is there evidence to support the claim that the two machines produce rods with different
mean diameters? Use α = 0.05 in arriving at this conclusion. Find the P -value.
(b) Construct a 95% confidence interval for the difference in mean rod diameter. Interpret
this interval.
Solution :
a. Determine the hypotheses
H0 : µ1 = µ2
Ha : µ1 ≠ µ2
Determine the pooled standard deviation

\[
s_p = \sqrt{\frac{(n_1 - 1) s_1^2 + (n_2 - 1) s_2^2}{n_1 + n_2 - 2}}
= \sqrt{\frac{(15 - 1)0.35 + (17 - 1)0.40}{15 + 17 - 2}} \approx 0.6137
\]

Determine the test statistic

\[
t = \frac{\bar{x}_1 - \bar{x}_2}{s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}
= \frac{8.73 - 8.68}{0.6137 \sqrt{\frac{1}{15} + \frac{1}{17}}} \approx 0.230
\]

Determine the corresponding P-value from table V with df = n1 + n2 − 2 = 15 + 17 − 2 = 30

P > 2 × 0.40 = 0.80

If the P-value is less than or equal to the significance level, then the null hypothesis is rejected

P > 0.05 ⇒ Fail to reject H0

There is not sufficient evidence to support the claim that the population means are different.
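
The pooled calculation can be sketched in Python as follows. This uses only the standard library, which has no t distribution, so the decision is made by comparing |t| with the tabulated critical value t0.025,30 = 2.042 quoted in part b; the function name `pooled_t` is illustrative.

```python
from math import sqrt


def pooled_t(xbar1, xbar2, var1, var2, n1, n2):
    """Return (s_p, t, df) for the equal-variance two-sample t-test."""
    df = n1 + n2 - 2
    sp = sqrt(((n1 - 1) * var1 + (n2 - 1) * var2) / df)  # pooled std deviation
    t = (xbar1 - xbar2) / (sp * sqrt(1 / n1 + 1 / n2))
    return sp, t, df


sp, t, df = pooled_t(8.73, 8.68, var1=0.35, var2=0.40, n1=15, n2=17)
t_crit = 2.042  # t_{0.025, 30}, read from the t table as in the text
reject_h0 = abs(t) > t_crit
print(f"sp = {sp:.4f}, t = {t:.3f}, df = {df}, reject H0: {reject_h0}")
```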
b. Given
c = 95% = 0.95 ⇒ α = 1 − c = 1 − 0.95 = 0.05
Determine tα/2 with df = 30 using table V :

t0.025 = 2.042

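The endpoints of the confidence interval for µ1 − µ2 are:

\[
(\bar{x}_1 - \bar{x}_2) - t_{\alpha/2} \cdot s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}
= (8.73 - 8.68) - 2.042 \cdot 0.6137 \sqrt{\frac{1}{15} + \frac{1}{17}} \approx -0.3939
\]

\[
(\bar{x}_1 - \bar{x}_2) + t_{\alpha/2} \cdot s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}
= (8.73 - 8.68) + 2.042 \cdot 0.6137 \sqrt{\frac{1}{15} + \frac{1}{17}} \approx 0.4939
\]

We are 95% confident that the difference in mean rod diameter µ1 − µ2 lies between −0.3939
and 0.4939. Since this interval contains 0, there is no evidence that the two machines produce
rods with different mean diameters.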


Exercise 6. An article in Fire Technology investigated two different foam expanding agents
that can be used in the nozzles of fire-fighting spray equipment. A random sample of five
observations with an aqueous film-forming foam (AFFF) had a sample mean of 4.7 and a
standard deviation of 0.6. A random sample of five observations with alcohol-type concen-
trates (ATC) had a sample mean of 6.9 and a standard deviation 0.8.
(a) Can you draw any conclusions about differences in mean foam expansion? Assume that
both populations are well represented by normal distributions with the same standard devi-
ations.
(b) Find a 95% confidence interval on the difference in mean foam expansion of these two
agents.
Solution :
Given :
n1 = Sample size = 5
n2 = Sample size = 5
x̄1 = Sample mean = 4.7
x̄2 = Sample mean = 6.9
s1 = Sample standard deviation = 0.6
s2 = Sample standard deviation = 0.8
α = Significance level = 5% = 0.05
Determine tα/2 using the Student’s T distribution table, which is given in the column with
α/2 = 0.025 and in the row with df = n1 + n2 − 2 = 5 + 5 − 2 = 8 :

t0.025 = 2.306

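The pooled standard deviation is :

\[
s_p = \sqrt{\frac{(n_1 - 1) s_1^2 + (n_2 - 1) s_2^2}{n_1 + n_2 - 2}}
= \sqrt{\frac{(5 - 1)0.6^2 + (5 - 1)0.8^2}{5 + 5 - 2}} \approx 0.7071
\]

The endpoints of the confidence interval for µ1 − µ2 are :

\[
(\bar{x}_1 - \bar{x}_2) - t_{\alpha/2} \cdot s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}
= (4.7 - 6.9) - 2.306 \cdot 0.7071 \sqrt{\frac{1}{5} + \frac{1}{5}} \approx -3.2313
\]

\[
(\bar{x}_1 - \bar{x}_2) + t_{\alpha/2} \cdot s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}
= (4.7 - 6.9) + 2.306 \cdot 0.7071 \sqrt{\frac{1}{5} + \frac{1}{5}} \approx -1.1687
\]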

We are 95% confident that the mean foam expansion of AFFF is between 1.1687 and 3.2313
lower than the mean foam expansion of ATC.
Since the confidence interval does not contain 0 and only contains negative values, ATC
appears to have the greater mean foam expansion, and thus we can conclude that the two
agents differ in mean foam expansion.


Exercise 7. The deflection temperature under load for two different types of plastic pipe is
being investigated. Two random samples of 15 pipe specimens are tested, and the deflection
temperatures observed are as follows (in °F) :
Type 1: 206, 188, 205, 187, 194, 193, 207, 185, 189, 213, 192, 210, 194, 178, 205
Type 2: 177, 197, 206, 201, 180, 176, 185, 200, 197, 192, 198, 188, 189, 203, 192
(a) Construct box plots and normal probability plots for the two samples. Do these plots
provide support of the assumptions of normality and equal variances? Write a practical
interpretation for these plots.
(b) Do the data support the claim that the deflection temperature under load for type 1 pipe
exceeds that of type 2? In reaching your conclusions, use α = 0.05. Calculate a P -value.
(c) If the mean deflection temperature for type 1 pipe exceeds that of type 2 by as much as
5 °F, it is important to detect this difference with probability at least 0.90. Is the choice of
n1 = n2 = 15 adequate? Use α = 0.05.
Solution :
Given:
n1 = Sample size = 15
n2 = Sample size = 15
α = Significance level = 0.05

The mean is the sum of all values divided by the number of values:

\[
\bar{x}_1 = \frac{206 + 188 + 205 + \ldots + 194 + 178 + 205}{15} \approx 196.4
\]

\[
\bar{x}_2 = \frac{177 + 197 + 206 + \ldots + 189 + 203 + 192}{15} \approx 192.0667
\]

The variance is the sum of squared deviations from the mean divided by n − 1. The standard
deviation is the square root of the variance:

\[
s_1 = \sqrt{\frac{(206 - 196.4)^2 + \ldots + (205 - 196.4)^2}{15 - 1}} \approx 10.4799
\]

\[
s_2 = \sqrt{\frac{(177 - 192.0667)^2 + \ldots + (192 - 192.0667)^2}{15 - 1}} \approx 9.4375
\]

(a) NORMAL PROBABILITY PLOT


The data values are on the horizontal axis and the standardized normal scores are on the
vertical axis.
If the data contains n data values, then the standardized normal scores are the z-scores in
table III corresponding to an area of $\frac{j - 0.5}{n}$ (or the closest area) with $j \in \{1, 2, 3, \ldots, n\}$.
The smallest standardized score corresponds with the smallest data value, the second smallest
standardized score corresponds with the second smallest data value, and so on.
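
The pairing of sorted data with standardized normal scores can be sketched as below, using the type 1 sample. Only the Python standard library is assumed; `normal_scores` is an illustrative name, and exact normal quantiles replace the "closest area" lookup in table III. Plotting the resulting pairs (data on the horizontal axis, scores on the vertical axis) gives the normal probability plot.

```python
from statistics import NormalDist


def normal_scores(data):
    """Pair each sorted value with the z-score of left-tail area (j - 0.5)/n."""
    n = len(data)
    nd = NormalDist()
    return [
        (x, nd.inv_cdf((j - 0.5) / n))
        for j, x in enumerate(sorted(data), start=1)
    ]


type1 = [206, 188, 205, 187, 194, 193, 207, 185, 189, 213,
         192, 210, 194, 178, 205]
pairs = normal_scores(type1)
for value, score in pairs:
    print(f"{value:5d}  {score:7.3f}")
```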


• BOXPLOT
The whiskers of the boxplot are at the minimum and maximum value. The box starts at the
lower quartile, ends at the upper quartile and has a vertical line at the median.
The lower quartile is at 25% of the sorted data list, the median at 50% and the upper quartile
at 75%.

The assumption of normality appears to be valid, because the patterns in the normal
probability plots are roughly linear and contain no strong curvature.
The assumption of equal variances appears to be valid, because the boxplots have roughly
the same width.
(b) Given claim: The mean of type 1 is higher than the mean of type 2. The claim is either
the null hypothesis or the alternative hypothesis. The null hypothesis and the alternative
hypothesis state the opposite of each other. The null hypothesis needs to contain an equality.

H0 : µ1 = µ2
H1 : µ1 > µ2

Determine the pooled standard deviation:

\[
s_p = \sqrt{\frac{(n_1 - 1) s_1^2 + (n_2 - 1) s_2^2}{n_1 + n_2 - 2}}
= \sqrt{\frac{(15 - 1)10.4799^2 + (15 - 1)9.4375^2}{15 + 15 - 2}} \approx 9.9723
\]

Determine the test statistic :

\[
t = \frac{\bar{x}_1 - \bar{x}_2}{s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}
= \frac{196.4 - 192.0667}{9.9723 \sqrt{\frac{1}{15} + \frac{1}{15}}} \approx 1.190
\]

Determine the corresponding P-value (upper tail) from the Student’s t distribution table in the
appendix with df = n1 + n2 − 2 = 15 + 15 − 2 = 28. The value t = 1.190 lies between
t0.25 = 0.683 and t0.10 = 1.313, so

0.10 < P < 0.25

If the P-value is less than or equal to the significance level, then the null hypothesis is rejected

P > 0.05 ⇒ Fail to reject H0

There is not sufficient evidence to support the claim that the deflection temperature under
load for type 1 pipe exceeds that of type 2.

(c) If the mean deflection temperature for type 1 pipe exceeds that of type 2 by as much as
5 °F, it is important to detect this difference with probability at least 0.90. Is the choice of
n1 = n2 = 15 adequate? Use α = 0.05.

∆=5
P OW ER = 0.90
β is the complement of the power, thus 1 decreased by the power.

β = 1 − P OW ER = 1 − 0.90 = 0.10

Formula sample size :

\[
n = \frac{\left( z_{\alpha/2} + z_\beta \right)^2 \left( \sigma_1^2 + \sigma_2^2 \right)}{(\Delta - \Delta_0)^2}
\]

• ∆0 is the null hypothesis µ1 − µ2 = ∆0 .
• ∆ is the alternative hypothesis µ1 − µ2 = ∆.
Determine zα/2 = z0.025 using the normal probability table in the appendix (look up 0.025 in
the table, the z-score is then the found z-score with opposite sign) :

zα/2 = 1.96


Determine zβ = z0.10 using the normal probability table in the appendix (look up 0.10 in the
table, the z-score is then the found z-score with opposite sign):
zβ = 1.28
Fill in the known values into the formula, using the sample standard deviations in place of
σ1 and σ2, and evaluate (round up!):

\[
n_1 = n_2 = \frac{(1.96 + 1.28)^2 \left( 10.4799^2 + 9.4375^2 \right)}{(5 - 0)^2} \approx 84
\]

Since this required sample size is much larger than the sample sizes of 15 that were used, the
choice n1 = n2 = 15 is not adequate.
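
The sample-size formula can be evaluated with a short function. This is a sketch under the same normal approximation as the text, with the sample variances standing in for the unknown population variances; `sample_size` and its `two_sided` flag are illustrative names. The flag is included because the test in part (b) is one-sided, in which case zα would replace zα/2.

```python
from math import ceil
from statistics import NormalDist


def sample_size(delta, var1, var2, alpha=0.05, beta=0.10, two_sided=True):
    """Per-group n so that a z-test detects a difference delta with power 1 - beta."""
    nd = NormalDist()
    z_alpha = nd.inv_cdf(1 - (alpha / 2 if two_sided else alpha))
    z_beta = nd.inv_cdf(1 - beta)
    return ceil((z_alpha + z_beta) ** 2 * (var1 + var2) / delta**2)


n_two_sided = sample_size(5, 10.4799**2, 9.4375**2)
n_one_sided = sample_size(5, 10.4799**2, 9.4375**2, two_sided=False)
print(n_two_sided, n_one_sided)
```

Under either convention the required size is well above 15, so the conclusion that n1 = n2 = 15 is inadequate does not depend on that choice.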
Exercise 8. Two companies manufacture a rubber material intended for use in an automo-
tive application. The part will be subjected to abrasive wear in the field application, so we
decide to compare the material produced by each company in a test. Twenty-five samples
of material from each company are tested in an abrasion test, and the amount of wear after
1000 cycles is observed. For company 1, the sample mean and standard deviation of wear
are x̄1 = 20 milligrams/1000 cycles and s1 = 2 milligrams/1000 cycles, while for company 2
we obtain x̄2 = 15 milligrams/1000 cycles and s2 = 8 milligrams/1000 cycles.
(a) Do the data support the claim that the two companies produce material with different
mean wear? Use α = 0.05, and assume each population is normally distributed but that
their variances are not equal. What is the P -value for this test?
(b) Do the data support a claim that the material from company 1 has higher mean wear
than the material from company 2 ? Use the same assumptions as in part (a).
(c) Construct confidence intervals that will address the questions in parts (a) and (b) above.
Solution :
Given x̄1 = 20, s1 = 2, n1 = 25, x̄2 = 15, s2 = 8, n2 = 25
(a) Given claim: Different
The claim is either the null hypothesis or the alternative hypothesis. The null hypothesis
and the alternative hypothesis state the opposite of each other. The null hypothesis needs
to contain an equality.
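H0 : µ1 = µ2
Ha : µ1 ≠ µ2
Determine the test statistic

\[
t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}
= \frac{20 - 15}{\sqrt{\frac{2^2}{25} + \frac{8^2}{25}}} \approx 3.0317
\]

Determine the degrees of freedom (rounded down to the nearest integer)

\[
\nu = \frac{\left( \frac{s_1^2}{n_1} + \frac{s_2^2}{n_2} \right)^2}
{\frac{\left( s_1^2/n_1 \right)^2}{n_1 - 1} + \frac{\left( s_2^2/n_2 \right)^2}{n_2 - 1}}
= \frac{\left( \frac{2^2}{25} + \frac{8^2}{25} \right)^2}
{\frac{\left( 2^2/25 \right)^2}{25 - 1} + \frac{\left( 8^2/25 \right)^2}{25 - 1}}
= 26.9883 \;\Rightarrow\; \nu = 26
\]

The P-value is the probability of obtaining the value of the test statistic, or a value more
extreme. The P-value is the number (or interval) in the column title of Student’s t distribution
table in the appendix containing the t-value in the row df = 26. Since t = 3.0317 lies between
t0.005 = 2.779 and t0.0025 = 3.067,
0.005 = 2 × 0.0025 < P < 2 × 0.005 = 0.01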


If the P-value is less than or equal to the significance level, then the null hypothesis is rejected

P < 0.05 ⇒ Reject H0

There is sufficient evidence to support the claim that the two companies produce material
with different mean wear.
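
The unequal-variance statistic and its approximate degrees of freedom can be computed as in the sketch below. Only the standard library is used, so no P-value is produced; `welch_t` is an illustrative name for the calculation described above.

```python
from math import floor, sqrt


def welch_t(xbar1, xbar2, s1, s2, n1, n2):
    """Return (t, nu) for the two-sample t-test without a pooled variance."""
    v1, v2 = s1**2 / n1, s2**2 / n2  # estimated variances of the sample means
    t = (xbar1 - xbar2) / sqrt(v1 + v2)
    nu = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t, nu


t, nu = welch_t(20, 15, s1=2, s2=8, n1=25, n2=25)
df = floor(nu)  # round down, as in the text
print(f"t = {t:.4f}, nu = {nu:.4f}, df = {df}")
```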
(b) Given claim: Higher for company 1
The claim is either the null hypothesis or the alternative hypothesis. The null hypothesis
and the alternative hypothesis state the opposite of each other. The null hypothesis needs
to contain an equality.
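H0 : µ1 = µ2
Ha : µ1 > µ2
Determine the test statistic

\[
t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}
= \frac{20 - 15}{\sqrt{\frac{2^2}{25} + \frac{8^2}{25}}} \approx 3.0317
\]

Determine the degrees of freedom (rounded down to the nearest integer)

\[
\nu = \frac{\left( \frac{s_1^2}{n_1} + \frac{s_2^2}{n_2} \right)^2}
{\frac{\left( s_1^2/n_1 \right)^2}{n_1 - 1} + \frac{\left( s_2^2/n_2 \right)^2}{n_2 - 1}}
= \frac{\left( \frac{2^2}{25} + \frac{8^2}{25} \right)^2}
{\frac{\left( 2^2/25 \right)^2}{25 - 1} + \frac{\left( 8^2/25 \right)^2}{25 - 1}}
= 26.9883 \;\Rightarrow\; \nu = 26
\]

The P-value is the probability of obtaining the value of the test statistic, or a value more
extreme. The P-value is the number (or interval) in the column title of Student’s t distribution
table in the appendix containing the t-value in the row df = 26. Since t = 3.0317 lies between
t0.005 = 2.779 and t0.0025 = 3.067,

0.0025 < P < 0.005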

If the P-value is less than or equal to the significance level, then the null hypothesis is rejected

P < 0.05 ⇒ Reject H0

There is sufficient evidence to support the claim that the material from company 1 has higher
mean wear than the material from company 2.

Exercise 9. The thickness of a plastic film (in mils) on a substrate material is thought to
be influenced by the temperature at which the coating is applied. A completely randomized
experiment is carried out. Eleven substrates are coated at 125 °F, resulting in a sample mean
coating thickness of x̄1 = 103.5 and a sample standard deviation of s1 = 10.2. Another
13 substrates are coated at 150 °F, for which x̄2 = 99 and s2 = 20.1 are observed. It
was originally suspected that raising the process temperature would reduce mean coating
thickness.
(a) Do the data support this claim? Use α = 0.01 and assume that the two population
standard deviations are not equal. Calculate an approximate P -value for this test.
(b) How could you have answered the question posed regarding the effect of temperature on
coating thickness by using a confidence interval? Explain your answer.
Solution :


Given n1 = 11, n2 = 13, x̄1 = 103.5, x̄2 = 99, s1 = 10.2, s2 = 20.1, α = 0.01
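a. The variances are assumed to be non-equal, thus we need to use the unpooled t-test
H0 : µ1 = µ2
H1 : µ1 > µ2
Determine the test statistic :

\[
t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}
= \frac{103.5 - 99}{\sqrt{\frac{10.2^2}{11} + \frac{20.1^2}{13}}} \approx 0.7068
\]

Determine the degrees of freedom (rounded down to the nearest integer) :

\[
\nu = \frac{\left( \frac{s_1^2}{n_1} + \frac{s_2^2}{n_2} \right)^2}
{\frac{\left( s_1^2/n_1 \right)^2}{n_1 - 1} + \frac{\left( s_2^2/n_2 \right)^2}{n_2 - 1}}
= \frac{\left( \frac{10.2^2}{11} + \frac{20.1^2}{13} \right)^2}
{\frac{\left( 10.2^2/11 \right)^2}{11 - 1} + \frac{\left( 20.1^2/13 \right)^2}{13 - 1}}
\approx 18
\]

The P-value is the probability of obtaining the value of the test statistic, or a value more
extreme. The P-value is the number (or interval) in the column title of Table V containing
the t-value in the row df = 18. Since t = 0.7068 lies between t0.25 = 0.688 and t0.10 = 1.330 :
0.10 < P < 0.25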
If the P-value is less than or equal to the significance level, then the null hypothesis is rejected
P > 0.01 ⇒ Fail to reject H0
There is not sufficient evidence to support the claim that the process temperature would
reduce the mean coating thickness.
b. If we construct a confidence interval and the confidence interval contains 0 , then we will
fail to reject the null hypothesis H0 and there will not be sufficient evidence to support the
claim of different population means. If the confidence interval does not contain 0 , then we
will reject the null hypothesis H0 and there is sufficient evidence to support the claim of
different population means.
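α = 0.01 ⇒ c = 1 − α = 1 − 0.01 = 0.99 = 99%
Determine the degrees of freedom (rounded down to the nearest integer)

\[
\nu = \frac{\left( \frac{s_1^2}{n_1} + \frac{s_2^2}{n_2} \right)^2}
{\frac{\left( s_1^2/n_1 \right)^2}{n_1 - 1} + \frac{\left( s_2^2/n_2 \right)^2}{n_2 - 1}}
= \frac{\left( \frac{10.2^2}{11} + \frac{20.1^2}{13} \right)^2}
{\frac{\left( 10.2^2/11 \right)^2}{11 - 1} + \frac{\left( 20.1^2/13 \right)^2}{13 - 1}}
\approx 18
\]

Determine the t-value by looking in the row starting with degrees of freedom df = 18 and in
the column with α/2 = (1 − c)/2 = 0.005 in the Student’s t distribution table in the appendix
t0.005 = 2.878
The endpoints of the confidence interval for µ1 − µ2 are

\[
(\bar{x}_1 - \bar{x}_2) - t_{\alpha/2} \cdot \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}
= (103.5 - 99) - 2.878 \cdot \sqrt{\frac{10.2^2}{11} + \frac{20.1^2}{13}} \approx -13.8235
\]

\[
(\bar{x}_1 - \bar{x}_2) + t_{\alpha/2} \cdot \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}
= (103.5 - 99) + 2.878 \cdot \sqrt{\frac{10.2^2}{11} + \frac{20.1^2}{13}} \approx 22.8235
\]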
Since the confidence interval (−13.8235, 22.8235) contains 0 , there is not sufficient evidence
to support the claim that the process temperature would reduce the mean coating thickness.


Exercise 10. Fifteen adult males between the ages of 35 and 50 participated in a study to
evaluate the effect of diet and exercise on blood cholesterol levels. The total cholesterol was
measured in each subject initially and then three months after participating in an aerobic
exercise program and switching to a low-fat diet. The data are shown in the accompanying
table
Blood Cholesterol Level
Subject Before After
1 265 229
2 240 231
3 258 227
4 295 240
5 251 238
6 245 241
7 287 234
8 314 256
9 260 247
10 279 239
11 283 246
12 240 218
13 238 219
14 225 226
15 247 233
(a) Do the data support the claim that low-fat diet and aerobic exercise are of value in
producing a mean reduction in blood cholesterol levels? Use α = 0.05. Find the P -value.
(b) Calculate a one-sided confidence limit that can be used to answer the question in part
(a).
Solution :
(a) Claim: Decrease
The claim is either the null hypothesis or the alternative hypothesis. The null hypothesis
and the alternative hypothesis state the opposite of each other. The null hypothesis needs
to contain an equality.
H0 : µd = 0
Ha : µd > 0
Determine the sample mean of the differences (before − after). The mean is the sum of all
values divided by the number of values.

\[
\bar{d} = \frac{36 + 9 + 31 + \ldots + 19 - 1 + 14}{15} \approx 26.8667
\]

Determine the sample standard deviation of the differences

\[
s_d = \sqrt{\frac{(36 - 26.8667)^2 + \ldots + (14 - 26.8667)^2}{15 - 1}} \approx 19.0371
\]

Determine the value of the test statistic

\[
t = \frac{\bar{d}}{s_d / \sqrt{n}} = \frac{26.8667}{19.0371 / \sqrt{15}} \approx 5.4659
\]


The P-value is the probability of obtaining the value of the test statistic, or a value more
extreme, assuming that the null hypothesis is true. The P-value is the number (or interval)
in the column title of the Student’s t distribution table in the appendix containing the t-value
in the row df = n − 1 = 15 − 1 = 14
P < 0.0005
If the P-value is less than or equal to the significance level, then the null hypothesis is rejected
P < 0.05 ⇒ Reject H0
There is sufficient evidence to support the claim that low-fat diet and aerobic exercise are
of value in producing a mean reduction in blood cholesterol levels.
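
The paired calculation can be reproduced from the raw table. This sketch uses only the Python standard library (`statistics.mean` and `statistics.stdev`, the latter using the n − 1 divisor); the variable names are illustrative.

```python
from math import sqrt
from statistics import mean, stdev

before = [265, 240, 258, 295, 251, 245, 287, 314, 260, 279, 283, 240, 238, 225, 247]
after = [229, 231, 227, 240, 238, 241, 234, 256, 247, 239, 246, 218, 219, 226, 233]

# Paired design: analyse the per-subject differences (before - after).
diffs = [b - a for b, a in zip(before, after)]
d_bar = mean(diffs)
s_d = stdev(diffs)  # sample standard deviation (divides by n - 1)
t = d_bar / (s_d / sqrt(len(diffs)))
print(f"d_bar = {d_bar:.4f}, s_d = {s_d:.4f}, t = {t:.4f}")
```

Working with the differences removes the subject-to-subject variation, which is why the paired statistic is so much larger than a two-sample statistic on the same data would be.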
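(b) The alternative hypothesis is µd > 0, so the matching one-sided limit is a 95% lower
confidence bound on µd. Determine the t-value using the Student's t distribution table in the
appendix with
• df = n − 1 = 15 − 1 = 14
• α = 0.05 (the whole of α lies in one tail)
t∗ = t0.05,14 = 1.761
The margin of error is then
$$E = t^* \cdot \frac{s_d}{\sqrt{n}} = 1.761 \cdot \frac{19.0371}{\sqrt{15}} \approx 8.6559$$
The lower confidence bound for µd is
$$\bar{d} - E = 26.8667 - 8.6559 = 18.2108$$
Since the lower bound is greater than 0, we are 95% confident that µd > 0, which agrees with
the conclusion of part (a).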
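The arithmetic of the paired test in part (a) can be checked with a short script. This is only a sketch using the Python standard library; the function name `paired_t` is chosen here for illustration and is not part of the assignment.

```python
import math
import statistics

before = [265, 240, 258, 295, 251, 245, 287, 314, 260, 279, 283, 240, 238, 225, 247]
after = [229, 231, 227, 240, 238, 241, 234, 256, 247, 239, 246, 218, 219, 226, 233]

def paired_t(before, after):
    """Return (mean difference, sd of differences, t statistic) for H0: mu_d = 0."""
    diffs = [b - a for b, a in zip(before, after)]
    n = len(diffs)
    d_bar = statistics.mean(diffs)
    s_d = statistics.stdev(diffs)          # sample sd, n - 1 in the denominator
    t = d_bar / (s_d / math.sqrt(n))
    return d_bar, s_d, t

d_bar, s_d, t = paired_t(before, after)
print(round(d_bar, 4), round(s_d, 4), round(t, 4))
```

The P-value itself still has to be read from a t table (or a statistics package), because the standard library has no Student's t distribution.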
Exercise 11. Two different analytical tests can be used to determine the impurity level in
steel alloys. Eight specimens are tested using both procedures, and the results are shown in
the following tabulation.
Specimen Test 1 Test 2
1 1.2 1.4
2 1.3 1.7
3 1.5 1.5
4 1.4 1.3
5 1.7 2.0
6 1.8 2.1
7 1.4 1.7
8 1.3 1.6
(a) Is there sufficient evidence to conclude that tests differ in the mean impurity level, using
α = 0.01 ?
(b) Is there evidence to support the claim that Test 1 generates a mean difference 0.1 units
lower than Test 2? Use α = 0.05.
(c) If the mean from Test 1 is 0.1 less than the mean from Test 2, it is important to detect
this with probability at least 0.90. Was the use of eight alloys an adequate sample size? If
not, how many alloys should have been used?
Solution :
Given:
n=8
α = 0.01


Determine the difference D = Test 1 − Test 2 for each pair.

Difference D   Test 1   Test 2
-0.2           1.2      1.4
-0.4           1.3      1.7
0              1.5      1.5
0.1            1.4      1.3
-0.3           1.7      2.0
-0.3           1.8      2.1
-0.3           1.4      1.7
-0.3           1.3      1.6

a. Determine the hypotheses
H0 : µd = 0
H1 : µd ≠ 0
Determine the sample mean of the differences
$$\bar{d} = \frac{-0.2 - 0.4 + 0 + 0.1 - 0.3 - 0.3 - 0.3 - 0.3}{8} \approx -0.2125$$
Determine the sample standard deviation of the differences
$$s_d = \sqrt{\frac{(-0.2 - (-0.2125))^2 + \dots + (-0.3 - (-0.2125))^2}{8 - 1}} \approx 0.1727$$
Determine the value of the test statistic
$$t = \frac{\bar{d}}{s_d/\sqrt{n}} = \frac{-0.2125}{0.1727/\sqrt{8}} \approx -3.480$$
The P-value is the probability of obtaining the value of the test statistic, or a value more
extreme. The P-value is the number (or interval) in the column title of Table V containing
the t-value in the row df = n − 1 = 8 − 1 = 7

0.01 = 2 × 0.005 < P < 2 × 0.01 = 0.02

If the P-value is less than the significance level, reject the null hypothesis.

P > 0.01 ⇒ Fail to reject H0

There is not sufficient evidence to support the claim that tests differ in the mean impurity
level.
b. Given
α = 0.05
Determine the hypotheses
H0 : µd = −0.1
H1 : µd < −0.1
Determine the value of the test statistic
$$t = \frac{\bar{d} - \mu_d}{s_d/\sqrt{n}} = \frac{-0.2125 - (-0.1)}{0.1727/\sqrt{8}} \approx -1.842$$


The P-value is the probability of obtaining the value of the test statistic, or a value more
extreme. The P-value is the number (or interval) in the column title of Table V containing
the t-value in the row df = n − 1 = 8 − 1 = 7 :

0.05 < P < 0.10

If the P-value is less than the significance level, reject the null hypothesis.

P > 0.05 ⇒ Fail to reject H0

There is not sufficient evidence to support the claim that test 1 generates a mean difference
0.1 units lower than test 2.
c. Given
∆ = −0.1
POWER = 0.90
α = 0.05
β is the complement of the power, thus 1 decreased by the power:

β = 1 − POWER = 1 − 0.90 = 0.10

Sample size formula, with σ estimated by sd = 0.1727:
$$n = \frac{(z_\alpha + z_\beta)^2 \sigma^2}{(\Delta - \Delta_0)^2}$$
∆0 is the mean difference under the null hypothesis (µd = ∆0 = 0) and ∆ is the mean difference
under the alternative hypothesis (µd = ∆). Determine zα = z0.05 using table III (look up 0.05
in the table, the z-score is then the found z-score with opposite sign):
zα = 1.64
Determine zβ = z0.10 using table III (look up 0.10 in the table, the z-score is then the found
z-score with opposite sign):
zβ = 1.28
Fill in the known values into the formula and evaluate (round up!):
$$n = \frac{(1.64 + 1.28)^2 (0.1727^2)}{(-0.1 - 0)^2} \approx 25.43 \;\Rightarrow\; n = 26$$

The sample size of 8 is not adequate, because the minimal sample size required to detect a
difference of 0.1 is 26.
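The sample-size formula of part (c) is easy to evaluate by machine, which also makes the rounding-up step explicit. A minimal sketch, assuming the table values zα = 1.64 and zβ = 1.28 quoted in the solution; the function name `sample_size` is illustrative only.

```python
import math

def sample_size(z_alpha, z_beta, sigma, delta, delta0=0.0):
    """Evaluate (z_alpha + z_beta)^2 sigma^2 / (delta - delta0)^2 and round it up."""
    n_exact = (z_alpha + z_beta) ** 2 * sigma ** 2 / (delta - delta0) ** 2
    return n_exact, math.ceil(n_exact)

# sigma is estimated by the sample standard deviation of the differences
n_exact, n_required = sample_size(z_alpha=1.64, z_beta=1.28, sigma=0.1727, delta=-0.1)
print(round(n_exact, 2), n_required)
```

Any non-integer result is rounded up, since rounding down would give a power slightly below the required 0.90.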

Exercise 12. For an F distribution, find the following :


(a) f0.25,5,10 (b) f0.10,24,9 (c) f0.05,8,15
(d) f0.75,5,10 (e) f0.90,24,9 (f) f0.95,8,15
Solution :
a. The critical value f0.25,5,10 is given in the first table of table VI in the row with
• df d = 10 and in the column with df n = 5

f0.25,5,10 = 1.59


b. The critical value f0.10,24,9 is given in the second table of table VI in the row with
• df d = 9 and in the column with df n = 24

f0.10,24,9 = 2.28

c. The critical value f0.05,8,15 is given in the third table of table VI in the row with
• df d = 15 and in the column with df n = 8

f0.05,8,15 = 2.64

Property of the F-distribution
$$f_{1-\alpha,u,v} = \frac{1}{f_{\alpha,v,u}}$$
d. The critical value f0.25,10,5 is given in the first table of table VI in the row with
• df d = 5 and in the column with df n = 10

f0.25,10,5 = 1.89

Use the property of the F-distribution


$$f_{0.75,5,10} = \frac{1}{f_{0.25,10,5}} = \frac{1}{1.89} \approx 0.5291$$

e. The critical value f0.10,9,24 is given in the second table of table VI in the row with
• df d = 24 and in the column with df n = 9

f0.10,9,24 = 1.91

Use the property of the F-distribution:


$$f_{0.90,24,9} = \frac{1}{f_{0.10,9,24}} = \frac{1}{1.91} \approx 0.5236$$

f. The critical value f0.05,15,8 is given in the third table of table VI in the row with
• df d = 8 and in the column with df n = 15

f0.05,15,8 = 3.22

Use the property of the F-distribution


$$f_{0.95,8,15} = \frac{1}{f_{0.05,15,8}} = \frac{1}{3.22} \approx 0.3106$$
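
The reciprocal property used in parts (d) to (f) follows from the definition of the F distribution. A short derivation, where X is a symbol introduced here for a random variable with an F distribution with u and v degrees of freedom, and f denotes an upper-tail percentage point as in the table:

```latex
\begin{align*}
X \sim F_{u,v} \;&\Longrightarrow\; \frac{1}{X} \sim F_{v,u}, \\
1 - \alpha = P\left(X > f_{1-\alpha,u,v}\right)
  &= P\left(\frac{1}{X} < \frac{1}{f_{1-\alpha,u,v}}\right), \\
\text{so}\quad P\left(\frac{1}{X} > \frac{1}{f_{1-\alpha,u,v}}\right) = \alpha
  \;&\Longrightarrow\; \frac{1}{f_{1-\alpha,u,v}} = f_{\alpha,v,u}.
\end{align*}
```

The first line holds because an F variable is a ratio of two independent chi-square variables, each divided by its degrees of freedom, so inverting the ratio swaps the two degrees of freedom.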


Exercise 13. Consider the hypothesis test $H_0 : \sigma_1^2 = \sigma_2^2$ against $H_a : \sigma_1^2 < \sigma_2^2$. Suppose that
the sample sizes are n1 = 5 and n2 = 10, and that $s_1^2 = 23.2$ and $s_2^2 = 28.8$. Use α = 0.05.
Test the hypothesis and explain how the test could be conducted with a confidence interval
on σ1/σ2.
Solution :
Given
$H_0 : \sigma_1^2 = \sigma_2^2$
$H_1 : \sigma_1^2 < \sigma_2^2$
n1 = 5 and n2 = 10
$s_1^2 = 23.2$, $s_2^2 = 28.8$ and α = 0.05

Compute the value of the test statistic :
$$F = \frac{s_1^2}{s_2^2} = \frac{23.2}{28.8} \approx 0.806$$
The critical value is given in the third table of table VI with df n = n1 − 1 = 5 − 1 = 4
and df d = n2 − 1 = 10 − 1 = 9
$$f_{0.95,4,9} = \frac{1}{f_{0.05,9,4}} = \frac{1}{6.00} \approx 0.1667$$

The rejection region contains all values smaller than 0.1667.

If the value of the statistic is in the rejection region, then reject the null hypothesis

0.806 > 0.1667 ⇒ Fail to reject H0

There is not sufficient evidence to support the claim that the first variance is smaller than
the second variance.
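
The exercise also asks how the test could be run with a confidence interval. A sketch, assuming the usual model of two independent normal populations: compute a 95% upper confidence bound on the variance ratio and reject H0 in favour of σ1² < σ2² only if that bound falls below 1.

```latex
\[
\frac{\sigma_1^2}{\sigma_2^2}
  \;\le\; \frac{s_1^2}{s_2^2}\, f_{\alpha,\,n_2-1,\,n_1-1}
  \;=\; \frac{23.2}{28.8}\, f_{0.05,9,4}
  \;\approx\; 0.806 \times 6.00 \;\approx\; 4.83
\]
```

Because 4.83 > 1, the bound does not exclude σ1² = σ2², which matches the decision not to reject H0. Taking square roots gives the corresponding bound on σ1/σ2, approximately 2.20.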

Exercise 14. Consider the hypothesis test $H_0 : \sigma_1^2 = \sigma_2^2$ against $H_a : \sigma_1^2 \neq \sigma_2^2$. Suppose
that the sample sizes are n1 = 15 and n2 = 15, and the sample variances are $s_1^2 = 2.3$ and
$s_2^2 = 1.9$. Use α = 0.05.
(a) Test the hypothesis and explain how the test could be conducted with a confidence
interval on σ1/σ2.
(b) What is the power of the test in part (a) if σ1 is twice as large as σ2 ?
(c) Assuming equal sample sizes, what sample size should be used to obtain β = 0.05 if
σ2 is half of σ1 ?
Solution :

Exercise 15. A study was performed to determine whether men and women differ in their
repeatability in assembling components on printed circuit boards. Random samples of 25
men and 21 women were selected, and each subject assembled the units. The two sample
standard deviations of assembly time were smen = 0.98 minutes and swomen = 1.02 minutes.
(a) Is there evidence to support the claim that men and women differ in repeatability for
this assembly task? Use α = 0.02 and state any necessary assumptions about the underlying
distribution of the data.


(b) Find a 98% confidence interval on the ratio of the two variances. Provide an interpretation
of the interval.
Solution :
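a. Determine the hypotheses :
$H_0 : \sigma_1^2 = \sigma_2^2$
$H_1 : \sigma_1^2 \neq \sigma_2^2$
Compute the value of the test statistic :
$$F = \frac{s_1^2}{s_2^2} = \frac{0.98^2}{1.02^2} \approx 0.923$$
The critical values, with α/2 = 0.01 in each tail, are given in table VI with
• df n = n1 − 1 = 25 − 1 = 24
• df d = n2 − 1 = 21 − 1 = 20
$$f_{0.01,24,20} = 2.86, \qquad f_{0.99,24,20} = \frac{1}{f_{0.01,20,24}} = \frac{1}{2.74} \approx 0.3650$$
The rejection region contains all values smaller than 0.3650 and all values larger than 2.86.
If the value of the test statistic is in the rejection region, then reject the null hypothesis

0.3650 < 0.923 < 2.86 ⇒ Fail to reject H0

There is not sufficient evidence to support the claim that the two population variances differ.
The test assumes that the two samples are independent and come from normal populations.
b. The 98% confidence interval on $\sigma_1^2/\sigma_2^2$ uses the F distribution with n2 − 1 = 20 numerator
and n1 − 1 = 24 denominator degrees of freedom:
$$\frac{s_1^2}{s_2^2}\, f_{0.99,20,24} \le \frac{\sigma_1^2}{\sigma_2^2} \le \frac{s_1^2}{s_2^2}\, f_{0.01,20,24}, \qquad f_{0.99,20,24} = \frac{1}{f_{0.01,24,20}} = \frac{1}{2.86} \approx 0.3497$$
The boundaries of the confidence interval are then
$$\frac{0.98^2}{1.02^2} \cdot 0.3497 \approx 0.3228, \qquad \frac{0.98^2}{1.02^2} \cdot 2.74 \approx 2.5293$$
We are 98% confident that the ratio of the population variances is between 0.3228 and 2.5293.
The interval contains 1, which agrees with the decision in part (a).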
Exercise 16. To measure air pollution in a home, let X and Y equal the amount of suspended
particulate matter (in mg/m³) measured during a 24-hour period in a home in which
there is no smoker and a home in which there is a smoker, respectively. We shall test the
null hypothesis $H_0 : \sigma_1^2/\sigma_2^2 = 1$ vs $H_a : \sigma_1^2/\sigma_2^2 > 1$.
(a) If a random sample of size m = 9 yielded x̄ = 93 and sx = 12.9 while a random sample
of size n = 11 yielded ȳ = 132 and sy = 7.1, define a critical region and give your conclusion
if α = 0.05.
(b) Now test H0 : µ1 = µ2 against Ha : µ1 < µ2 if α = 0.05.
Solution :
(a) Determine the value of the test statistic :
$$F = \frac{s_X^2}{s_Y^2} = \frac{12.9^2}{7.1^2} \approx 3.301$$
The critical value is given in the first table of table VII with α = 0.05 and
• df n = n1 − 1 = 9 − 1 = 8
• df d = n2 − 1 = 11 − 1 = 10
⇒ f0.05,8,10 = 3.07
The rejection region contains all values larger than 3.07.
If the value of the test statistic is in the rejection region, then reject the null hypothesis:

3.301 > 3.07 ⇒ Reject H0

(b) Given :
H0 : µX = µY
H1 : µX < µY
Since in part (a) we concluded that the population variances were not equal, we need to use
Welch's t test.
Determine the degrees of freedom (rounded down to the nearest integer):
$$\nu = \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{(s_1^2/n_1)^2}{n_1 - 1} + \frac{(s_2^2/n_2)^2}{n_2 - 1}} = \frac{\left(\frac{12.9^2}{9} + \frac{7.1^2}{11}\right)^2}{\frac{(12.9^2/9)^2}{9 - 1} + \frac{(7.1^2/11)^2}{11 - 1}} \approx 11.87 \;\Rightarrow\; \nu = 11$$

Determine the critical value from the t table with df = 11 and α = 0.05 :

t = −1.796

The rejection region then contains all values smaller than −1.796. Determine the test statistic
$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} = \frac{93 - 132}{\sqrt{\frac{12.9^2}{9} + \frac{7.1^2}{11}}} \approx -8.119$$

If the value of the test statistic is within the rejection region, then the null hypothesis is
rejected :
−8.119 < −1.796 ⇒ Reject H0
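
The Welch statistic and its degrees of freedom in part (b) can be reproduced from the summary statistics alone. A small sketch using only the standard library; the function name `welch_t` is an illustrative choice.

```python
import math

def welch_t(mean1, s1, n1, mean2, s2, n2):
    """Welch's t statistic and the Welch-Satterthwaite degrees of freedom."""
    v1 = s1 ** 2 / n1          # estimated variance of the first sample mean
    v2 = s2 ** 2 / n2          # estimated variance of the second sample mean
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df

t, df = welch_t(93, 12.9, 9, 132, 7.1, 11)
print(round(t, 3), round(df, 2), math.floor(df))
```

Rounding the degrees of freedom down, as the solution does, is the conservative choice because it gives a slightly larger critical value.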

Exercise 17. Two different types of injection-molding machines are used to form plastic
parts. A part is considered defective if it has excessive shrinkage or is discolored. Two
random samples, each of size 300 , are selected, and 15 defective parts are found in the
sample from machine 1 while 8 defective parts are found in the sample from machine 2 .
(a) Is it reasonable to conclude that both machines produce the same fraction of defective
parts, using α = 0.05 ? Find the P -value for this test.
(b) Construct a 95% confidence interval on the difference in the two fractions defective.
(c) Suppose that p1 = 0.05 and p2 = 0.01. With the sample sizes given here, what is the
power of the test for this two-sided alternate?
(d) Suppose that p1 = 0.05 and p2 = 0.01. Determine the sample size needed to detect this
difference with a probability of at least 0.9.
Solution :


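a. Determine the hypotheses

H0 : p1 = p2
H1 : p1 ≠ p2
The sample proportion is the number of successes divided by the sample size :
$$\hat{p}_1 = \frac{x_1}{n_1} = \frac{15}{300} = 0.05$$
$$\hat{p}_2 = \frac{x_2}{n_2} = \frac{8}{300} \approx 0.0267$$
$$\hat{p}_p = \frac{x_1 + x_2}{n_1 + n_2} = \frac{15 + 8}{300 + 300} = \frac{23}{600} \approx 0.0383$$
Determine the value of the test statistic :
$$z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}_p(1 - \hat{p}_p)}\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} = \frac{0.05 - 0.0267}{\sqrt{0.0383(1 - 0.0383)}\sqrt{\frac{1}{300} + \frac{1}{300}}} \approx 1.49$$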
The P-value is the probability of obtaining the value of the test statistic, or a value more
extreme. Determine the Pvalue using the normal probability table in the appendix :

P = P (Z < −1.49 or Z > 1.49) = 2P (Z < −1.49) = 2(0.068112) = 0.136224

If the P-value is smaller than the significance level, then reject the null hypothesis :

P > 0.05 ⇒ Fail to reject H0

There is not sufficient evidence to support the claim that there is a difference in these
proportions.
b. Given
c = 95% = 0.95
For confidence level 1 − α = 0.95, determine zα/2 = z0.025 using the normal probability
table in the appendix (look up 0.025 in the table, the z-score is then the found z-score with
opposite sign) :
zα/2 = 1.96
The boundaries of the confidence interval for p1 − p2 are then :
$$(\hat{p}_1 - \hat{p}_2) - z_{\alpha/2}\sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}} = (0.05 - 0.0267) - 1.96\sqrt{\frac{0.05(1 - 0.05)}{300} + \frac{0.0267(1 - 0.0267)}{300}} \approx -0.0074$$
$$(\hat{p}_1 - \hat{p}_2) + z_{\alpha/2}\sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}} = (0.05 - 0.0267) + 1.96\sqrt{\frac{0.05(1 - 0.05)}{300} + \frac{0.0267(1 - 0.0267)}{300}} \approx 0.0540$$
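
Parts (a) and (b) can be reproduced together with the standard library's `statistics.NormalDist`. This is a sketch, not the method of the original solution, which reads the normal table; the function name `two_proportion_test` is illustrative.

```python
import math
from statistics import NormalDist

def two_proportion_test(x1, n1, x2, n2, conf=0.95):
    """Pooled z test of H0: p1 = p2 and an unpooled confidence interval for p1 - p2."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se_pooled = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se_pooled
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))      # two-sided P-value
    z_crit = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return z, p_value, (p1 - p2 - z_crit * se, p1 - p2 + z_crit * se)

z, p_value, (lower, upper) = two_proportion_test(15, 300, 8, 300)
print(round(z, 2), round(p_value, 3), round(lower, 4), round(upper, 4))
```

The test uses the pooled proportion because it is computed under H0, while the interval uses the separate sample proportions. Small differences from the table-based figures in the last digit come from rounding z to two decimals before the table lookup.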


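c. Given
p1 = 0.05
p2 = 0.01
Formula for β, the probability of a type II error, with $\bar{p} = (n_1 p_1 + n_2 p_2)/(n_1 + n_2) = 0.03$ and $\bar{q} = 1 - \bar{p}$ :
$$\beta = \Phi\left(\frac{z_{\alpha/2}\sqrt{\bar{p}\bar{q}\,(1/n_1 + 1/n_2)} - (p_1 - p_2)}{\sqrt{\frac{p_1(1 - p_1)}{n_1} + \frac{p_2(1 - p_2)}{n_2}}}\right) - \Phi\left(\frac{-z_{\alpha/2}\sqrt{\bar{p}\bar{q}\,(1/n_1 + 1/n_2)} - (p_1 - p_2)}{\sqrt{\frac{p_1(1 - p_1)}{n_1} + \frac{p_2(1 - p_2)}{n_2}}}\right)$$

Determine zα/2 = z0.025 using the normal probability table in the appendix (look up 0.025
in the table, the z-score is then the found z-score with opposite sign) :

zα/2 = 1.96

Use the formula :
$$\beta = \Phi\left(\frac{1.96\sqrt{0.03(1 - 0.03)(1/300 + 1/300)} - (0.05 - 0.01)}{\sqrt{\frac{0.05(1 - 0.05)}{300} + \frac{0.01(1 - 0.01)}{300}}}\right) - \Phi\left(\frac{-1.96\sqrt{0.03(1 - 0.03)(1/300 + 1/300)} - (0.05 - 0.01)}{\sqrt{\frac{0.05(1 - 0.05)}{300} + \frac{0.01(1 - 0.01)}{300}}}\right)$$
$$= \Phi\left(\frac{0.0273 - 0.04}{0.01383}\right) - \Phi\left(\frac{-0.0273 - 0.04}{0.01383}\right) = \Phi(-0.92) - \Phi(-4.87)$$

Evaluate using the normal probability table in the appendix :

β = Φ(−0.92) − Φ(−4.87) ≈ 0.178786 − 0 = 0.178786

The power is the complement of β :

POWER = 1 − β = 1 − 0.178786 = 0.821214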

d. Given :
POWER = 0.9
β is the complement of the power :

β = 1 − POWER = 1 − 0.9 = 0.1

Sample size formula :
$$n = \frac{\left[z_{\alpha/2}\sqrt{(p_1 + p_2)(1 - p_1 + 1 - p_2)/2} + z_\beta\sqrt{p_1(1 - p_1) + p_2(1 - p_2)}\right]^2}{(p_1 - p_2)^2}$$

Determine zα/2 = z0.025 using the normal probability table in the appendix (look up 0.025
in the table, the z-score is then the found z-score with opposite sign) :

zα/2 = 1.96

Determine zβ = z0.10 using the normal probability table in the appendix (look up 0.10 in the
table, the z-score is then the found z-score with opposite sign) :

zβ = 1.28

Evaluate the formula (round up!)
$$n = \frac{\left[1.96\sqrt{(0.05 + 0.01)(1 - 0.05 + 1 - 0.01)/2} + 1.28\sqrt{0.05(1 - 0.05) + 0.01(1 - 0.01)}\right]^2}{(0.05 - 0.01)^2} \approx 379.8 \;\Rightarrow\; n = 380$$
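
The power and sample-size formulas of parts (c) and (d) involve several nested square roots, so a direct evaluation is a useful check. The sketch below assumes the table values 1.96 and 1.28 used in the solution; the function names are chosen here for illustration.

```python
import math
from statistics import NormalDist

Z_HALF_ALPHA = 1.96   # z_{0.025}, table value used in the text
Z_BETA = 1.28         # z_{0.10}, table value used in the text

def power_two_sided(p1, p2, n1, n2, z=Z_HALF_ALPHA):
    """Approximate power of the two-sided pooled z test of H0: p1 = p2."""
    p_bar = (n1 * p1 + n2 * p2) / (n1 + n2)
    se0 = math.sqrt(p_bar * (1 - p_bar) * (1 / n1 + 1 / n2))   # SE under H0
    se1 = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)   # SE under the alternative
    phi = NormalDist().cdf
    beta = phi((z * se0 - (p1 - p2)) / se1) - phi((-z * se0 - (p1 - p2)) / se1)
    return 1 - beta

def sample_size(p1, p2, z=Z_HALF_ALPHA, z_beta=Z_BETA):
    """Common sample size n1 = n2 from the formula in part (d), rounded up."""
    q1, q2 = 1 - p1, 1 - p2
    numerator = (z * math.sqrt((p1 + p2) * (q1 + q2) / 2)
                 + z_beta * math.sqrt(p1 * q1 + p2 * q2)) ** 2
    return math.ceil(numerator / (p1 - p2) ** 2)

power = power_two_sided(0.05, 0.01, 300, 300)
n_needed = sample_size(0.05, 0.01)
print(round(power, 3), n_needed)
```

Using exact normal quantiles instead of the two-decimal table values can move the sample size by one unit, which is why the table values are kept as explicit constants here.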

Exercise 18. Recent incidents of food contamination have caused great concern among
consumers. The article ”How Safe Is That Chicken?” (Consumer Reports, Jan. 2010: 19-23)
reported that 35 of 80 randomly selected Perdue brand broilers tested positively for either
campylobacter or salmonella (or both), the leading bacterial causes of food-borne disease,
whereas 66 of 80 Tyson brand broilers tested positive.
(a) Does it appear that the true proportion of non-contaminated Perdue broilers differs from
that for the Tyson brand? Carry out a test of hypotheses using a significance level 0.01 by
obtaining a P -value.
(b) If the true proportions of non-contaminated chickens for the Perdue and Tyson brands
are 0.50 and 0.25, respectively, how likely is it that the null hypothesis of equal proportions
will be rejected when a 0.01 significance level is used and the sample sizes are both 80 ?
Solution :
(a) Claim: p1 ̸= p2
The claim is either the null hypothesis or the alternative hypothesis. The null hypothesis
states an equality. Since the null hypothesis is not the claim, the alternative hypothesis is
the claim.
H0 : p 1 = p 2
Ha : p1 ̸= p2
The sample proportion is the number of successes divided by the sample size :
$$\hat{p}_1 = \frac{x_1}{n_1} = \frac{35}{80} = 0.4375$$
$$\hat{p}_2 = \frac{x_2}{n_2} = \frac{66}{80} = 0.825$$
$$\hat{p}_p = \frac{x_1 + x_2}{n_1 + n_2} = \frac{35 + 66}{80 + 80} = 0.63125$$
The critical values are the values corresponding to a probability of 0.005/0.995 in table A.3

z = ±2.575

The rejection region then contains all values below −2.575 and all values above 2.575.
Determine the value of the test statistic :
$$z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}_p(1 - \hat{p}_p)}\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} = \frac{0.4375 - 0.825}{\sqrt{0.63125(1 - 0.63125)}\sqrt{\frac{1}{80} + \frac{1}{80}}} \approx -5.08$$


If the value of the test statistic is within the rejection region, then the null hypothesis is
rejected :
−5.08 < −2.575 ⇒ Reject H0
There is sufficient evidence to support the claim that the true proportion of non-contaminated
Perdue broilers differs from that for the Tyson brand.
(b) Given
p1 = 0.50
p2 = 0.25
The critical values are the values corresponding to a probability of 0.005/0.995 in table A.3
z = ±2.575
Determine the differences in proportions that correspond with these z-values (assuming the null
hypothesis p1 = p2 is true):
$$\hat{p}_1 - \hat{p}_2 = (p_1 - p_2) \pm z\sqrt{\hat{p}_p(1 - \hat{p}_p)}\sqrt{\frac{1}{n_1} + \frac{1}{n_2}} = 0 \pm 2.575\sqrt{0.63125(1 - 0.63125)}\sqrt{\frac{1}{80} + \frac{1}{80}} \approx \pm 0.1964$$
Determine the z-scores corresponding with these differences in proportions, assuming that the
alternative hypothesis is true (Note: We use p1 and p2 instead of p̂p , because p̂p is unknown
since we do not know the sample proportions) :
$$z = \frac{\hat{p}_1 - \hat{p}_2 - (p_1 - p_2)}{\sqrt{\frac{p_1(1 - p_1)}{n_1} + \frac{p_2(1 - p_2)}{n_2}}} = \frac{0.1964 - (0.50 - 0.25)}{\sqrt{\frac{0.50(1 - 0.50)}{80} + \frac{0.25(1 - 0.25)}{80}}} \approx -0.72$$
$$z = \frac{\hat{p}_1 - \hat{p}_2 - (p_1 - p_2)}{\sqrt{\frac{p_1(1 - p_1)}{n_1} + \frac{p_2(1 - p_2)}{n_2}}} = \frac{-0.1964 - (0.50 - 0.25)}{\sqrt{\frac{0.50(1 - 0.50)}{80} + \frac{0.25(1 - 0.25)}{80}}} \approx -6.04$$

Determine the probability of rejecting the null hypothesis using table A.3:
P (z < −6.04 or z > −0.72) = P (z < −6.04) + P (z > −0.72)
= P (z < −6.04) + 1 − P (z < −0.72)
≈ 0 + 1 − 0.2358 = 0.7642 = 76.42%

Thus we have a 76.42% chance of rejecting the null hypothesis.



TD6 - (Anova)
1. Three classes in elementary statistics are taught by three different persons : a regular
faculty member, a graduate teaching assistant, and an adjunct from outside the university.
At the end of the semester, each student is given a standardized test. Five
students are randomly picked from each of these classes, and their scores are as shown
in the table
Faculty Teaching assistant Adjunct
93 88 86
61 90 56
87 76 73
75 82 90
92 58 47
(a) Construct an ANOVA table and interpret your results.
(b) Test at the 0.05 level whether there is a difference between the mean scores for the
three persons teaching. Assume that the ANOVA assumptions are met.
Solution :
(a) Construct an ANOVA table and interpret your results.
We have k = 3, n1 = 5, n2 = 5, n3 = 5, and N = n1 + n2 + n3 = 15 and then
SSTr = 339.7333, SSE = 2785.2, SST = 3124.9333
Since
$$MSTr = \frac{SSTr}{k - 1} = 169.8667, \qquad MSE = \frac{SSE}{N - k} = 232.1$$
Then
$$F = \frac{MSTr}{MSE} = 0.73187$$
Therefore, the ANOVA table is :
Source of Variation   Df   Sum of Squares   Mean Squares   F-Statistic
Treatment             2    339.7333         169.8667       0.73187
Error                 12   2785.2           232.1
Total                 14   3124.9333
The F statistic is less than 1, so the variation between the three class means is smaller than
the variation within the classes.
(b) Test at the 0.05 level whether there is a difference between the mean scores for
the three persons teaching. Assume that the ANOVA assumptions are met.
■ For significance level 0.05
we have p-value = P (F ≥ f ) = 0.501295.
Since the p-value is much larger than the significance level (0.501295 >> 0.05), we fail to
reject H0 : there is not sufficient evidence of a difference between the mean scores for the
three persons teaching.
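
The sums of squares quoted in part (a) can be recomputed from the raw scores. The sketch below uses only the standard library; the function names are illustrative, and the P-value helper relies on the closed form of the F upper tail that is valid only when the numerator has exactly 2 degrees of freedom, as it does here.

```python
def one_way_anova(groups):
    """Return (SSTr, SSE, MSTr, MSE, F) for a list of samples."""
    all_values = [x for g in groups for x in g]
    n_total, k = len(all_values), len(groups)
    grand_mean = sum(all_values) / n_total
    means = [sum(g) / len(g) for g in groups]
    sstr = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    sse = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    mstr, mse = sstr / (k - 1), sse / (n_total - k)
    return sstr, sse, mstr, mse, mstr / mse

def f_upper_tail_df1_2(f, df2):
    """P(F >= f) for an F distribution with 2 and df2 degrees of freedom."""
    return (1 + 2 * f / df2) ** (-df2 / 2)

scores = [
    [93, 61, 87, 75, 92],   # faculty
    [88, 90, 76, 82, 58],   # teaching assistant
    [86, 56, 73, 90, 47],   # adjunct
]
sstr, sse, mstr, mse, f = one_way_anova(scores)
p_value = f_upper_tail_df1_2(f, df2=12)
print(round(sstr, 4), round(sse, 1), round(f, 5), round(p_value, 6))
```

For designs with more than three groups the numerator degrees of freedom exceed 2, and the P-value would have to come from an F table or a statistics package instead.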


2. The following data refers to yield of tomatoes (kg/plot) for four different levels of
salinity; salinity level here refers to electrical conductivity (EC), where the chosen
levels were EC = 1.6, 3.8, 6.0, and 10.2 nmhos/cm :

1.6 :  59.5 53.3 56.8 63.1 58.7
3.8 :  55.2 59.1 52.8 54.5
6.0 :  51.7 48.8 53.9 49.0
10.2 : 44.6 48.5 41.0 47.3 46.1

(a) Use the F test at level α = 0.05 to test for any differences in true average yield
due to the different salinity levels.
(b) Apply the modified Tukey’s method to identify significant differences among the
µi ’s.
Solution :
(a) Use the F test at level α = 0.05 to test for any differences in true average yield due
to the different salinity levels.
To find the value of the test statistic f , prepare the following table

EC     Observations               ni   xi.     x̄i.
1.6    59.5 53.3 56.8 63.1 58.7   5    291.4   58.28
3.8    55.2 59.1 52.8 54.5        4    221.6   55.40
6.0    51.7 48.8 53.9 49.0        4    203.4   50.85
10.2   44.6 48.5 41.0 47.3 46.1   5    227.5   45.50
                              N = 18   x.. = 943.9

We have, I = 4, N = 18
$$\sum_{i=1}^{4}\sum_{j=1}^{n_i} x_{ij}^2 = 59.5^2 + \dots + 46.1^2 = 50078.07$$

Then the total sum of squares is
$$SST = \sum_{i=1}^{4}\sum_{j=1}^{n_i} x_{ij}^2 - \frac{x_{..}^2}{N} = 50078.07 - 49497.07 = 581.00$$

Treatment sum of squares
$$SSTr = \sum_{i=1}^{4}\frac{1}{n_i}x_{i.}^2 - \frac{x_{..}^2}{N} = 49953.57 - 49497.07 = 456.50$$

Error sum of squares

SSE = SST − SSTr = 581.00 − 456.50 = 124.50
Treatment mean square
$$MSTr = \frac{SSTr}{4 - 1} = 152.17$$
Error mean square
$$MSE = \frac{SSE}{N - 4} = \frac{124.50}{14} = 8.89$$
Therefore, the f statistic is
$$f = \frac{MSTr}{MSE} \approx 17.11$$
Since f = 17.11 > F0.05,3,14 = 3.34, reject the null hypothesis at a 5% significance level.
There is a significant difference in mean tomato yield among the four levels of salinity.
(b) Apply the modified Tukey's method to identify significant differences among the
µi 's.
Because the sample sizes are unequal, the modified (Tukey-Kramer) procedure is used to find
the means that differ significantly. The critical value of the Studentized range distribution is
Qα,I,N −I = Q0.05,4,14 = 4.11.
Therefore, with MSE = 8.89,
$$w_{ij} = Q_{0.05,4,14}\sqrt{\frac{MSE}{2}\left(\frac{1}{n_i} + \frac{1}{n_j}\right)}$$
■ The quantity for comparing the difference in means of sample 1 and sample 2 is:

$w_{12} = 4.11\sqrt{2.00025} = 5.81$

■ The quantity for comparing the difference in means of sample 2 and sample 3 is:

$w_{23} = 4.11\sqrt{2.2225} = 6.13$

■ The quantity for comparing the difference in means of sample 1 and sample 3 is :

$w_{13} = 4.11\sqrt{2.00025} = 5.81$

■ Similarly, the quantities for comparing the differences for the remaining combinations are

w14 = 5.48, w24 = 5.81, w34 = 5.81

■ The means in ascending order are :

x̄4. = 45.50, x̄3. = 50.85, x̄2. = 55.40, x̄1. = 58.28

■ x̄1. − x̄2. = 2.88 < w12 , so there is no significant difference between these means.
■ x̄1. − x̄3. = 7.43 > w13 , so there is a significant difference between these means.
■ x̄1. − x̄4. = 12.78 > w14 , so there is a significant difference between these means.
■ x̄2. − x̄3. = 4.55 < w23 , so there is no significant difference between these means.
■ x̄2. − x̄4. = 9.90 > w24 , so there is a significant difference between these means.
■ x̄3. − x̄4. = 5.35 < w34 , so there is no significant difference between these means.
Therefore, there is no significant difference between the average yields at levels (1.6 and
3.8), (3.8 and 6.0), and (6.0 and 10.2), but the remaining pairs of levels differ
significantly.
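
The six pairwise comparisons can be generated in a loop instead of one by one. A sketch under the assumption that the tabulated value Q0.05,4,14 = 4.11 and MSE = 8.89 are taken from the solution above; `tukey_kramer` is an illustrative name.

```python
import math
from itertools import combinations

Q_CRIT = 4.11   # Q_{0.05,4,14}, table value quoted in the text
MSE = 8.89      # error mean square from part (a)

sizes = {1.6: 5, 3.8: 4, 6.0: 4, 10.2: 5}
means = {1.6: 58.28, 3.8: 55.40, 6.0: 50.85, 10.2: 45.50}

def tukey_kramer(means, sizes, mse, q_crit):
    """Yield (level_i, level_j, |difference|, w_ij, significant?) for every pair."""
    for i, j in combinations(means, 2):
        w = q_crit * math.sqrt(mse / 2 * (1 / sizes[i] + 1 / sizes[j]))
        diff = abs(means[i] - means[j])
        yield i, j, diff, w, diff > w

results = list(tukey_kramer(means, sizes, MSE, Q_CRIT))
for i, j, diff, w, significant in results:
    print(i, j, round(diff, 2), round(w, 2), significant)
```

Each w depends on the pair only through the two sample sizes, which is why only three distinct values (5.48, 5.81 and 6.13) appear.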


3. The following partial ANOVA table is taken from the article "Perception of Spatial
Incongruity" (J. Nerv. Ment. Dis., 1961: 222) in which the abilities of three
different groups to identify a perceptual incongruity were assessed and compared. All
individuals in the experiment had been hospitalized to undergo psychiatric treatment.
There were 21 individuals in the depressive group, 32 individuals in the functional
"other" group, and 21 individuals in the brain-damaged group. Complete the ANOVA
table and carry out the F test at level α = 0.01.

Source   df   Sum of Squares   Mean Square   f
Groups                         76.09
Error
Total         1123.14
Solution :
Complete the ANOVA table and carry out the F test at level α = 0.01.
Test the null hypothesis H0 : there is no significant difference between the means of the
groups, against the alternative hypothesis Ha : at least one mean is different from the others.
■ There were 21 individuals in the depressive group, denoted by n1 = 21
■ There were 32 individuals in the functional "other" group, denoted by n2 = 32
■ There were 21 individuals in the brain-damaged group, denoted by n3 = 21
■ Then, I = 3, n = n1 + n2 + n3 = 21 + 32 + 21 = 74
We have,
SSTr = (3 − 1) MSTr = 2 × 76.09 = 152.18
Then
SSE = SST − SSTr = 1123.14 − 152.18 = 970.96
Since the error degrees of freedom are n − I = 74 − 3 = 71,
$$MSE = \frac{SSE}{n - I} = \frac{970.96}{71} \approx 13.6755$$
The test statistic for the data can be calculated as :
$$f = \frac{MSTr}{MSE} = \frac{76.09}{13.6755} \approx 5.56$$
Source   df   Sum of Squares   Mean Square   f
Groups   2    152.18           76.09         5.56
Error    71   970.96           13.6755
Total    73   1123.14
The critical value of the test statistic at α = 0.01 with (2, 71) degrees of freedom is
F0.01,2,71 = 4.92.
The P-value of the test statistic is P (F ≥ 5.56) ≈ 0.006.
Since f = 5.56 > 4.92 (equivalently, P-value < 0.01), we reject the null hypothesis and
conclude that at least one group mean differs significantly from the others.
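
Completing a partial ANOVA table is a chain of small identities (degrees of freedom, SSTr from MSTr, SSE by subtraction), so it can be written as one short function. In this sketch the function names are illustrative, and the P-value uses the closed form of the F upper tail that holds only for 2 numerator degrees of freedom.

```python
def complete_anova(mstr, sst, group_sizes):
    """Fill in a one-way ANOVA table from MSTr, SST and the group sizes."""
    k, n = len(group_sizes), sum(group_sizes)
    df_tr, df_err = k - 1, n - k
    sstr = df_tr * mstr
    sse = sst - sstr
    mse = sse / df_err
    return {"df": (df_tr, df_err, n - 1), "SSTr": sstr, "SSE": sse,
            "MSE": mse, "f": mstr / mse}

def f_upper_tail_df1_2(f, df2):
    """P(F >= f) for an F distribution with 2 and df2 degrees of freedom."""
    return (1 + 2 * f / df2) ** (-df2 / 2)

table = complete_anova(mstr=76.09, sst=1123.14, group_sizes=[21, 32, 21])
p_value = f_upper_tail_df1_2(table["f"], table["df"][1])
print(table["df"], round(table["SSE"], 2), round(table["MSE"], 4),
      round(table["f"], 2), round(p_value, 4))
```

The error mean square divides SSE by the error degrees of freedom n − I = 71, not by the total degrees of freedom 73.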

4. The accompanying summary data on skeletal-muscle CS activity (nmol/min/mg) appeared
in the article "Impact of Lifelong Sedentary Behavior on Mitochondrial Function
of Mice Skeletal Muscle" (J. of Gerontology, 2009: 927-939) :

              Young   Old Sedentary   Old Active
Sample size   10      8               10
Sample mean   46.68   47.71           58.24
Sample sd     7.16    5.59            8.43

Carry out a test to decide whether true average activity differs for the three groups. If
appropriate, investigate differences amongst the means with a multiple comparisons
method.
Solution :
Carry out a test to decide whether true average activity differs for the three groups.
If appropriate, investigate differences amongst the means with a multiple comparisons
method.

                            Young   Old Sedentary   Old Active
Sample Size                 10      8               10
Sample Mean                 46.68   47.71           58.24
Sample Standard Deviation   7.16    5.59            8.43

Test the null hypothesis H0 : there is no significant difference between the means, against
the alternative hypothesis Ha : at least one mean is different from the others.
■ Construct an ANOVA Table.
We have x.. = 10 × 46.68 + 8 × 47.71 + 10 × 58.24 = 1430.88 and n = 28
$$SSTr = \sum_{i=1}^{3} n_i\bar{x}_{i.}^2 - \frac{1}{n}x_{..}^2 = 797.0966$$
$$SSE = \sum_{i=1}^{3}\sum_{j=1}^{n_i}(x_{ij} - \bar{x}_{i.})^2 = \sum_{i=1}^{3}(n_i - 1)s_i^2 = 1319.7112$$
$$MSTr = \frac{797.0966}{2} = 398.5483$$
$$MSE = \frac{SSE}{25} = 52.7884$$
The test statistic is :
$$F = \frac{MSTr}{MSE} = \frac{398.5483}{52.7884} = 7.55$$

From the table of the F-distribution with I − 1 = 2 and n − I = 25 degrees of freedom we
obtain F0.01,2,25 = 5.57 and F0.001,2,25 = 9.22. Since 5.57 < 7.55 < 9.22 it follows that

0.001 < P -value < 0.01

Therefore, at level 0.01, the null hypothesis is rejected and we conclude that at
least two of the means differ.
Now use a multiple comparison method to investigate which pairs of the means differ
significantly. Use α = 0.01.
We have
Qα,I,n−I = Q0.01,3,25 = 4.55
w12 = 11.09, w13 = 10.45, w23 = 11.09
The sample means in ascending order are :

x̄1. = 46.68, x̄2. = 47.71, x̄3. = 58.24

The results are shown in the table below :

i − j   x̄i.     x̄j.     |x̄i. − x̄j.|   wij     |x̄i. − x̄j.| ≥ wij
1 − 2   46.68   47.71   1.03          11.09   No
1 − 3   46.68   58.24   11.56         10.45   Yes
2 − 3   47.71   58.24   10.53         11.09   No

Therefore, there is a significant difference only between the sample means of
the first and third samples. That is, the true average activity differs between the young and
old active groups.
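
When only sample sizes, means and standard deviations are available, the ANOVA quantities follow directly from the two formulas used above. A minimal sketch; the function name `anova_from_summary` is an illustrative choice.

```python
def anova_from_summary(sizes, means, sds):
    """One-way ANOVA quantities from group sizes, means and standard deviations."""
    n, k = sum(sizes), len(sizes)
    grand_mean = sum(ni * m for ni, m in zip(sizes, means)) / n
    sstr = sum(ni * (m - grand_mean) ** 2 for ni, m in zip(sizes, means))
    sse = sum((ni - 1) * s ** 2 for ni, s in zip(sizes, sds))
    mstr, mse = sstr / (k - 1), sse / (n - k)
    return sstr, sse, mstr, mse, mstr / mse

# Young, old sedentary, old active
sstr, sse, mstr, mse, f = anova_from_summary(
    sizes=[10, 8, 10], means=[46.68, 47.71, 58.24], sds=[7.16, 5.59, 8.43])
print(round(sstr, 4), round(sse, 4), round(mse, 4), round(f, 2))
```

The same function applies to any exercise that reports only summary statistics, since SSE needs nothing beyond the group sizes and standard deviations.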

5. Lipids provide much of the dietary energy in the bodies of infants and young children.
There is a growing interest in the quality of the dietary lipid supply during infancy as
a major determinant of growth, visual and neural development, and long-term health.
The article "Essential Fat Requirements of Preterm Infants" (Amer. J. Clin. Nutrit.,
2000: 245S-250S) reported the following data on total polyunsaturated fats (%) for
infants who were randomized to four different feeding regimens: breast milk,
corn-oil-based formula (CO), soy-oil-based formula (SO), or soy-and-marine-oil-based
formula (SMO) :

Regimen       Sample Size   Sample Mean   Sample SD
Breast milk   8             43.0          1.5
CO            13            42.4          1.3
SO            17            43.1          1.2
SMO           14            43.5          1.2
(a) What assumptions must be made about the four total polyunsaturated fat
distributions before carrying out a single-factor ANOVA to decide whether there are
any differences in true average fat content?
(b) Carry out the test suggested in part (a). What can be said about the P -value?


Solution :
(a) What assumptions must be made about the four total polyunsaturated fat distributions
before carrying out a single-factor ANOVA to decide whether there are any
differences in true average fat content?
To carry out a single-factor ANOVA for differences in the true averages, assume that each
of the four fat distributions is normal and that the four distributions have equal variance.
(b) Carry out the test suggested in part (a). What can be said about the P -value?
Test the null hypothesis H0 : there is no significant difference between the means,
against the alternative hypothesis Ha : at least one mean is different from the others.
We have
$$\bar{x}_{..} = \frac{1}{n}\sum_{i=1}^{I}\sum_{j=1}^{n_i} x_{ij} = 43.02$$
$$SSTr = \sum_{i=1}^{I}\sum_{j=1}^{n_i}(\bar{x}_{i.} - \bar{x}_{..})^2 = \sum_{i=1}^{I} n_i(\bar{x}_{i.} - \bar{x}_{..})^2 = 8.3344$$
$$MSTr = \frac{SSTr}{I - 1} = \frac{8.3344}{3} = 2.778$$
$$SSE = \sum_{i=1}^{I}(n_i - 1)s_i^2 = 77.79$$
$$MSE = \frac{SSE}{n - I} = \frac{77.79}{48} = 1.62$$
The test statistic can be calculated as :
$$f = \frac{MSTr}{MSE} = 1.71$$
The critical value of the test statistic for α = 0.10 with (3, 48) degrees of freedom is
F0.10,3,48 = 2.20, so f = 1.71 < 2.20 = F0.10,3,48
The P-value of the test statistic with (3, 48) degrees of freedom is about 0.18.
Since P -value = 0.18 > 0.10, we fail to reject the null hypothesis and conclude that there is
no significant difference among the means.
6. Although tea is the world's most widely consumed beverage after water, little is known
about its nutritional value. Folacin is the only B vitamin present in any significant
amount in tea, and recent advances in assay methods have made accurate determination
of folacin content feasible. Consider the accompanying data on folacin content for
randomly selected specimens of the four leading brands of green tea.
Brand   Observations
1       7.9 6.2 6.6 8.6 8.9 10.1 9.6
2       5.7 7.5 9.8 6.1 8.4
3       6.8 7.5 5.0 7.4 5.3 6.1
4       6.4 7.1 7.9 4.5 5.0 4.0
(Data is based on "Folacin Content of Tea," J. Amer. Dietetic Assoc., 1983 : 627-632.)
Does this data suggest that true average folacin content is the same for all brands?


(a) Carry out a test using α = 0.05 via the P -value method.
(b) Assess the plausibility of any assumptions required for your analysis in part (a).
(c) Perform a multiple comparisons analysis to identify significant differences among
brands.
Solution :
(a) Carry out a test using α = 0.05 via the P -value method.
Test the null hypothesis H0 : there is no significant difference between the brand means,
against the alternative hypothesis Ha : at least one brand mean is different from the others.
From the given data we have the One-Way ANOVA Table :
Source               SS        df   MS       f         p-value
Between-treatments   23.4957   3    7.8319   3.74933   0.027552
Within-treatments    41.7776   20   2.0889
Total                65.2733   23
We have P -value = 0.027552 < 0.05. So, the null hypothesis is rejected.
Therefore, it can be concluded that there is a significant difference among the mean folacin
contents of the four brands. This means that at least one brand has average folacin
content different from the others.
(b) Assess the plausibility of any assumptions required for your analysis in part (a).
We assume that the folacin content of each of the four brands follows a normal distribution
and that the four distributions have equal variance.
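One informal way to assess the equal-variance assumption is a common rule of thumb, which is an assumption added here and not part of the original solution: the assumption is treated as plausible when the largest sample standard deviation is no more than about twice the smallest. A sketch of that check:

```python
import statistics

folacin = {
    1: [7.9, 6.2, 6.6, 8.6, 8.9, 10.1, 9.6],
    2: [5.7, 7.5, 9.8, 6.1, 8.4],
    3: [6.8, 7.5, 5.0, 7.4, 5.3, 6.1],
    4: [6.4, 7.1, 7.9, 4.5, 5.0, 4.0],
}

# Sample standard deviation of each brand (n - 1 in the denominator)
sds = {brand: statistics.stdev(values) for brand, values in folacin.items()}
ratio = max(sds.values()) / min(sds.values())

for brand, s in sds.items():
    print(f"Brand {brand}: s = {s:.3f}")
print(f"largest s / smallest s = {ratio:.2f}")
```

Normality would be judged separately, for example from a normal probability plot of the residuals, which this sketch does not attempt.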
(c) Perform a multiple comparisons analysis to identify significant differences among
brands.
The sample sizes and sample means are :
Factor    ni   mean
Brand 1   7    8.271
Brand 2   5    7.500
Brand 3   6    6.350
Brand 4   6    5.817
Because the sample sizes are unequal, the critical difference for brands i and j is
\[ w_{ij} = Q_{0.05,4,20}\sqrt{\frac{MSE}{2}\left(\frac{1}{n_i}+\frac{1}{n_j}\right)}, \qquad Q_{0.05,4,20} = 3.958 \]
For example,
\[ w_{34} = 3.958\sqrt{\frac{2.0889}{2}\left(\frac{1}{6}+\frac{1}{6}\right)} = 2.33 \]
and similarly w24 = 2.44, w23 = 2.44, w13 = 2.24, w14 = 2.24, w12 = 2.366
The differences between the pairs and the decisions are shown in the table below :
Pairs         difference   wij     decision
x̄1. − x̄2.   0.771        2.366   not significant
x̄1. − x̄3.   1.921        2.24    not significant
x̄1. − x̄4.   2.454        2.24    significant
x̄2. − x̄3.   1.15         2.44    not significant
x̄2. − x̄4.   1.683        2.44    not significant
x̄3. − x̄4.   0.533        2.33    not significant


Therefore, the only significant difference is between the true average folacin contents of Brand 1 and Brand 4.
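The pairwise comparisons can be sketched in a few lines of Python. The sample sizes, means, MSE, and the critical value Q = 3.958 are copied from the solution above rather than recomputed; the function name `tukey_kramer` is only an illustrative choice. The computed w values may differ from the hand-rounded ones in the last digit.

```python
from itertools import combinations
from math import sqrt

# Summary statistics taken from the solution above.
n = {1: 7, 2: 5, 3: 6, 4: 6}
mean = {1: 8.271, 2: 7.500, 3: 6.350, 4: 5.817}
mse = 2.0889
q_crit = 3.958  # Q_{0.05,4,20}, read from a studentized range table

def tukey_kramer(n, mean, mse, q_crit):
    """Return {(i, j): (|difference|, w_ij, significant)} for all pairs."""
    out = {}
    for i, j in combinations(sorted(n), 2):
        # Critical difference for unequal sample sizes.
        w = q_crit * sqrt(mse / 2 * (1 / n[i] + 1 / n[j]))
        diff = abs(mean[i] - mean[j])
        out[(i, j)] = (diff, w, diff > w)
    return out

results = tukey_kramer(n, mean, mse, q_crit)
for (i, j), (diff, w, sig) in results.items():
    print(f"brands {i}-{j}: diff = {diff:.3f}, w = {w:.3f}, significant = {sig}")
```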

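7. In single-factor ANOVA with sample sizes $n_i$ ($i = 1, \ldots, I$), show that
\[ \mathrm{SSTr} = \sum_{i} n_i\left(\bar{X}_{i\cdot} - \bar{X}_{\cdot\cdot}\right)^2 = \sum_{i} n_i \bar{X}_{i\cdot}^2 - n\bar{X}_{\cdot\cdot}^2, \qquad \text{where } n = \sum_{i} n_i . \]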

Solution :
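In single-factor ANOVA with sample sizes $n_i$ ($i = 1, 2, \ldots, I$), show that
\[ \mathrm{SSTr} = \sum_{i=1}^{I} n_i\left(\bar{X}_{i\cdot} - \bar{X}_{\cdot\cdot}\right)^2 = \sum_{i=1}^{I} n_i \bar{X}_{i\cdot}^2 - n\bar{X}_{\cdot\cdot}^2, \qquad \text{where } n = \sum_{i=1}^{I} n_i . \]

We have
\[
\begin{aligned}
\mathrm{SSTr} &= \sum_{i=1}^{I} n_i\left(\bar{X}_{i\cdot} - \bar{X}_{\cdot\cdot}\right)^2 \\
&= \sum_{i=1}^{I} n_i\left(\bar{X}_{i\cdot}^2 + \bar{X}_{\cdot\cdot}^2 - 2\bar{X}_{\cdot\cdot}\bar{X}_{i\cdot}\right) \\
&= \sum_{i=1}^{I} n_i\bar{X}_{i\cdot}^2 + \bar{X}_{\cdot\cdot}^2\sum_{i=1}^{I} n_i - 2\bar{X}_{\cdot\cdot}\sum_{i=1}^{I} n_i\bar{X}_{i\cdot} \\
&= \sum_{i=1}^{I} n_i\bar{X}_{i\cdot}^2 + n\bar{X}_{\cdot\cdot}^2 - 2n\bar{X}_{\cdot\cdot}^2 \\
&= \sum_{i=1}^{I} n_i\bar{X}_{i\cdot}^2 - n\bar{X}_{\cdot\cdot}^2
\end{aligned}
\]
where the fourth line uses $\sum_i n_i = n$ and $\sum_i n_i\bar{X}_{i\cdot} = n\bar{X}_{\cdot\cdot}$.

The given relation is proved.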
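As a numerical sanity check of the identity, the two sides can be evaluated on any data set with unequal sample sizes. The sketch below reuses the folacin data from Exercise 6 purely as a convenient example; the variable names `lhs` and `rhs` are illustrative.

```python
# Numerical check of SSTr = sum n_i (xbar_i - xbar)^2 = sum n_i xbar_i^2 - n xbar^2.
samples = [
    [7.9, 6.2, 6.6, 8.6, 8.9, 10.1, 9.6],
    [5.7, 7.5, 9.8, 6.1, 8.4],
    [6.8, 7.5, 5.0, 7.4, 5.3, 6.1],
    [6.4, 7.1, 7.9, 4.5, 5.0, 4.0],
]

sizes = [len(s) for s in samples]
means = [sum(s) / len(s) for s in samples]
n = sum(sizes)
grand_mean = sum(sum(s) for s in samples) / n

# Definition of SSTr (left-hand side).
lhs = sum(ni * (m - grand_mean) ** 2 for ni, m in zip(sizes, means))
# Computational form (right-hand side).
rhs = sum(ni * m ** 2 for ni, m in zip(sizes, means)) - n * grand_mean ** 2

print(lhs, rhs)
```

Both expressions agree up to floating-point rounding.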

8. When sample sizes are equal ($n_i = J$), the parameters $\alpha_1, \alpha_2, \ldots, \alpha_I$ of the alternative parameterization are restricted by $\sum \alpha_i = 0$. For unequal sample sizes, the most natural restriction is $\sum n_i\alpha_i = 0$. Use this to show that
\[ E(\mathrm{MSTr}) = \sigma^2 + \frac{1}{I-1}\sum n_i\alpha_i^2 \]
What is E(MSTr) when H0 is true?


Solution :
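Show that $E(\mathrm{MSTr}) = \sigma^2 + \frac{1}{I-1}\sum n_i\alpha_i^2$.
Under the alternative parameterization, $X_{ij} = \mu + \alpha_i + \epsilon_{ij}$, where the $\epsilon_{ij}$ are independent $N(0, \sigma^2)$ and $\sum n_i\alpha_i = 0$.
We have
\[ \mathrm{MSTr} = \frac{\mathrm{SSTr}}{I-1} \quad\Longrightarrow\quad E(\mathrm{MSTr}) = \frac{1}{I-1}E(\mathrm{SSTr}) \]
From Exercise 7, with $n = \sum n_i$,
\[ \mathrm{SSTr} = \sum_{i=1}^{I} n_i\bar{X}_{i\cdot}^2 - n\bar{X}_{\cdot\cdot}^2 \]
Since $\bar{X}_{i\cdot}$ has mean $\mu + \alpha_i$ and variance $\sigma^2/n_i$,
\[ E\left(\bar{X}_{i\cdot}^2\right) = \frac{\sigma^2}{n_i} + (\mu + \alpha_i)^2 \]
Since $\bar{X}_{\cdot\cdot}$ has mean $\mu + \frac{1}{n}\sum n_i\alpha_i = \mu$ and variance $\sigma^2/n$,
\[ E\left(\bar{X}_{\cdot\cdot}^2\right) = \frac{\sigma^2}{n} + \mu^2 \]
Then
\[
\begin{aligned}
E(\mathrm{SSTr}) &= \sum_{i=1}^{I} n_i\left[\frac{\sigma^2}{n_i} + (\mu + \alpha_i)^2\right] - n\left[\frac{\sigma^2}{n} + \mu^2\right] \\
&= I\sigma^2 + n\mu^2 + 2\mu\sum_{i=1}^{I} n_i\alpha_i + \sum_{i=1}^{I} n_i\alpha_i^2 - \sigma^2 - n\mu^2 \\
&= (I-1)\sigma^2 + \sum_{i=1}^{I} n_i\alpha_i^2
\end{aligned}
\]
Therefore,
\[ E(\mathrm{MSTr}) = \sigma^2 + \frac{1}{I-1}\sum_{i=1}^{I} n_i\alpha_i^2 \]
When H0 is true, we get $\alpha_i = 0$ for all i.
Therefore, $E(\mathrm{MSTr}) = \sigma^2$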
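The formula for E(MSTr) can be checked informally by simulation. The sketch below is an assumption-laden illustration: the sample sizes, the effects α (chosen so that Σ nᵢαᵢ = 0), µ, σ, the seed, and the function name `simulate_mstr` are all hypothetical values picked for the demonstration, not part of the exercise.

```python
import random

def simulate_mstr(mu, alphas, sizes, sigma, reps, seed=0):
    """Average MSTr over `reps` simulated data sets X_ij = mu + alpha_i + eps_ij."""
    rng = random.Random(seed)
    I, n = len(sizes), sum(sizes)
    total = 0.0
    for _ in range(reps):
        # Sample mean of each of the I treatments.
        means = [
            sum(rng.gauss(mu + a, sigma) for _ in range(ni)) / ni
            for a, ni in zip(alphas, sizes)
        ]
        grand = sum(ni * m for ni, m in zip(sizes, means)) / n
        sstr = sum(ni * (m - grand) ** 2 for ni, m in zip(sizes, means))
        total += sstr / (I - 1)
    return total / reps

sizes = [4, 6, 10]
alphas = [1.5, 1.0, -1.2]   # 4(1.5) + 6(1.0) + 10(-1.2) = 0
sigma = 1.0

# Theoretical value: sigma^2 + (1/(I-1)) * sum n_i alpha_i^2.
theory = sigma ** 2 + sum(ni * a * a for ni, a in zip(sizes, alphas)) / (len(sizes) - 1)
estimate = simulate_mstr(mu=10.0, alphas=alphas, sizes=sizes, sigma=sigma, reps=20000)
print(f"theory = {theory:.3f}, simulated average = {estimate:.3f}")
```

The simulated average should be close to the theoretical value, with the gap shrinking as `reps` grows.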
9. Consider the ANOVA model
\[ X_{ij} = \mu_i + \varepsilon_{ij}, \qquad i = 1, 2, \ldots, I, \quad j = 1, 2, \ldots, J \]
where $X_{ij} \sim N(\mu_i, \sigma^2)$. Then show that
(a) the random variable $\dfrac{SSE}{\sigma^2} \sim \chi^2(I(J-1))$
(b) the statistics SSE and SSTr are independent.
Further, if the null hypothesis H0 : µ1 = µ2 = . . . = µI = µ is true, then
(c) the random variable $\dfrac{SSTr}{\sigma^2} \sim \chi^2(I-1)$
(d) the statistic $\dfrac{MSTr}{MSE} \sim F(I-1, I(J-1))$
(e) the random variable $\dfrac{SST}{\sigma^2} \sim \chi^2(IJ-1)$.
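Solution : Consider the ANOVA model
\[ X_{ij} = \mu_i + \epsilon_{ij}, \qquad i = 1, 2, \ldots, I, \quad j = 1, 2, \ldots, J \]
where $X_{ij} \sim N(\mu_i, \sigma^2)$.
(a) the random variable $\dfrac{SSE}{\sigma^2} \sim \chi^2(I(J-1))$
Let $S_i^2 = \frac{1}{J-1}\sum_{j=1}^{J}(X_{ij} - \bar{X}_{i\cdot})^2$ be the sample variance of the i-th sample. Then
\[ \frac{SSE}{\sigma^2} = \sum_{i=1}^{I}\frac{(J-1)S_i^2}{\sigma^2} \]
Since $S_1^2, \ldots, S_I^2$ are independent and $\dfrac{(J-1)S_i^2}{\sigma^2} \sim \chi^2(J-1)$ for all i, the sum of these I independent chi-squared variables satisfies
\[ \frac{SSE}{\sigma^2} \sim \chi^2(I(J-1)) \]
(b) the statistics SSE and SSTr are independent.
We have
\[ SSE = \sum_{i=1}^{I}(J-1)S_i^2, \qquad SSTr = J\sum_{i=1}^{I}\left(\bar{X}_{i\cdot} - \bar{X}_{\cdot\cdot}\right)^2, \qquad \bar{X}_{\cdot\cdot} = \frac{1}{I}\sum_{i=1}^{I}\bar{X}_{i\cdot} \]
So SSE depends on the data only through the $S_i^2$, and SSTr only through the $\bar{X}_{i\cdot}$. For a normal sample, $S_i^2$ and $\bar{X}_{i\cdot}$ are independent, and different samples are independent of each other. So, SSE and SSTr are independent.
(c) the random variable $\dfrac{SSTr}{\sigma^2} \sim \chi^2(I-1)$ when H0 is true.
Let $Y_i = \bar{X}_{i\cdot} = \frac{1}{J}\sum_{j=1}^{J}X_{ij}$. Under H0, $X_{ij} \sim N(\mu, \sigma^2)$, so $Y_1, \ldots, Y_I$ are independent $N(\mu, \sigma^2/J)$ with $\bar{Y} = \bar{X}_{\cdot\cdot}$.
So,
\[ \frac{SSTr}{\sigma^2} = \sum_{i=1}^{I}\frac{\left(Y_i - \bar{Y}\right)^2}{\sigma^2/J} = \frac{(I-1)S_Y^2}{\sigma^2/J} \]
where $S_Y^2$ is the sample variance of $Y_1, \ldots, Y_I$.
Therefore, $\dfrac{SSTr}{\sigma^2} \sim \chi^2(I-1)$
(d) the statistic $\dfrac{MSTr}{MSE} \sim F(I-1, I(J-1))$
We have
\[ F = \frac{MSTr}{MSE} = \frac{\dfrac{SSTr/\sigma^2}{I-1}}{\dfrac{SSE/\sigma^2}{I(J-1)}} \]
Since $\dfrac{SSTr}{\sigma^2} \sim \chi^2(I-1)$ and $\dfrac{SSE}{\sigma^2} \sim \chi^2(I(J-1))$, and SSTr and SSE are independent by part (b), F is a ratio of independent chi-squared variables, each divided by its degrees of freedom.
Therefore, $F \sim F(I-1, I(J-1))$
(e) the random variable $\dfrac{SST}{\sigma^2} \sim \chi^2(IJ-1)$
We have SST = SSE + SSTr
Then
\[ \frac{SST}{\sigma^2} = \frac{SSE}{\sigma^2} + \frac{SSTr}{\sigma^2} \]
is a sum of independent chi-squared variables with I(J − 1) and I − 1 degrees of freedom, and I(J − 1) + (I − 1) = IJ − 1.
Therefore, $\dfrac{SST}{\sigma^2} \sim \chi^2(IJ-1)$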


10. The number of miles of useful tread wear (in 1000s) was determined for tires of each
of five different makes of subcompact car (factor A, with I = 5 ) in combination
with each of four different brands of radial tires (factor B, with J = 4 ), resulting in
IJ = 20 observations. The values SSA = 30.6, SSB = 44.1, and SSE = 59.2 were
then computed. Assume that an additive model is appropriate.
a. Test H0 : α1 = α2 = α3 = α4 = α5 = 0 (no differences in true average tire lifetime
due to makes of cars) versus Ha : at least one αi ̸= 0 using a level 0.05 test.
b. H0 : β1 = β2 = β3 = β4 = 0 (no differences in true average tire lifetime due to
brands of tires) versus Ha : at least one βj ̸= 0 using a level 0.05 test.
Solution :
(a) Test H0 : α1 = α2 = α3 = α4 = α5 = 0 (no differences in true average tire lifetime
due to makes of cars) versus Ha : at least one αi ̸= 0 using a level 0.05 test.
Null hypothesis H0 : α1 = α2 = α3 = α4 = α5 = 0
versus Ha : at least one αi ̸= 0
In order to test the hypothesis, the value of the F statistic needs to be calculated :
\[ F = \frac{MSA}{MSE}, \qquad \text{where } MSA = \frac{SSA}{df_A} \text{ and } MSE = \frac{SSE}{df_E} \]
We have dfA = I − 1 = 4 and dfE = (I − 1)(J − 1) = 12. Therefore,
\[ MSA = \frac{30.6}{4} = 7.65 \quad \text{and} \quad MSE = \frac{59.2}{12} = 4.93 \]
thus,
\[ f = \frac{7.65}{4.93} = 1.55 \]
The critical value is F0.05,4,12 = 3.26
Since f < F0.05,4,12 the null hypothesis H0 cannot be rejected.
This means that there is no significant difference in the average lifetime of tires among
the various makes of cars.
(b) H0 : β1 = β2 = β3 = β4 = 0 (no differences in true average tire lifetime due to
brands of tires) versus Ha : at least one βj ̸= 0 using a level 0.05 test.
Null hypothesis H0 : β1 = β2 = β3 = β4 = 0
versus Ha : at least one βj ̸= 0
In order to test the hypothesis, the value of the F statistic needs to be calculated :
\[ F = \frac{MSB}{MSE}, \qquad \text{where } MSB = \frac{SSB}{df_B} \text{ and } MSE = \frac{SSE}{df_E} \]
with dfB = J − 1 = 3 and dfE = 12. Therefore,
\[ f = \frac{44.1/3}{4.93} = \frac{14.7}{4.93} = 2.98 \]
The critical value is F0.05,3,12 = 3.49
Since f < F0.05,3,12 , the null hypothesis H0 cannot be rejected.
This means that there is no significant difference in the average lifetime of tires among
the different brands of tires.
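The two F ratios follow mechanically from the given sums of squares, which the sketch below illustrates. The function name `additive_f_tests` is hypothetical, and the critical values 3.26 and 3.49 are hard-coded from the F table values quoted above, not computed.

```python
def additive_f_tests(ssa, ssb, sse, I, J):
    """F ratios for the additive two-factor model with one observation per cell."""
    df_a, df_b, df_e = I - 1, J - 1, (I - 1) * (J - 1)
    msa, msb, mse = ssa / df_a, ssb / df_b, sse / df_e
    return {"MSA": msa, "MSB": msb, "MSE": mse, "fA": msa / mse, "fB": msb / mse}

res = additive_f_tests(ssa=30.6, ssb=44.1, sse=59.2, I=5, J=4)

# Critical values taken from an F table (as quoted in the solution).
crit_a, crit_b = 3.26, 3.49   # F_{0.05,4,12}, F_{0.05,3,12}
reject_a = res["fA"] > crit_a
reject_b = res["fB"] > crit_b
print(f"fA = {res['fA']:.2f}, reject H0A: {reject_a}")
print(f"fB = {res['fB']:.2f}, reject H0B: {reject_b}")
```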

11. Four different coatings are being considered for corrosion protection of metal pipe. The
pipe will be buried in three different types of soil. To investigate whether the amount
of corrosion depends either on the coating or on the type of soil, 12 pieces of pipe
are selected. Each piece is coated with one of the four coatings and buried in one of
the three types of soil for a fixed time, after which the amount of corrosion (depth of
maximum pits, in .0001 in.) is determined. The data appears in the table.

                      Soil Type (B)
                      1     2     3
Coating (A)   1       64    49    50
              2       53    51    48
              3       47    45    50
              4       51    43    52

a. Assuming the validity of the additive model, carry out the ANOVA analysis using
an ANOVA table to see whether the amount of corrosion depends on either the type
of coating used or the type of soil. Use α = 0.05.
b. Compute µ̂, α̂1 , α̂2 , α̂3 , α̂4 , β̂1 , β̂2 , and β̂3 .
Solution :
(a) Assuming the validity of the additive model, carry out the ANOVA analysis using
an ANOVA table to see whether the amount of corrosion depends on either the type
of coating used or the type of soil. Use α = 0.05.

Adding row and column totals and means to the data table gives :

                      Soil Type (B)
                      1       2     3     xi.    x̄i.
Coating (A)   1       64      49    50    163    54.3333
              2       53      51    48    152    50.6667
              3       47      45    50    142    47.3333
              4       51      43    52    146    48.6667
              x.j     215     188   200
              x̄.j    53.75   47    50


we have,
x.. = 163 + 152 + 142 + 146 = 603
and
\[ \sum_{i=1}^{I}\sum_{j=1}^{J} x_{ij}^2 = 64^2 + \ldots + 52^2 = 30599 \]

The value of the correction factor is :
\[ CF = \frac{x_{\cdot\cdot}^2}{IJ} = \frac{603^2}{12} = 30300.75 \]
The total sum of squares is :
\[ SST = \sum_{i=1}^{I}\sum_{j=1}^{J} x_{ij}^2 - \frac{1}{IJ}x_{\cdot\cdot}^2 = 298.25 \]
The sum of squares for factor A is :
\[ SSA = \frac{1}{J}\sum_{i=1}^{I} x_{i\cdot}^2 - \frac{1}{IJ}x_{\cdot\cdot}^2 = 83.5833 \]
The sum of squares for factor B is :
\[ SSB = \frac{1}{I}\sum_{j=1}^{J} x_{\cdot j}^2 - \frac{1}{IJ}x_{\cdot\cdot}^2 = 91.5 \]
The error sum of squares is :
\[ SSE = SST - (SSA + SSB) = 123.1667 \]
The mean squares, with dfA = I − 1 = 3, dfB = J − 1 = 2, and dfE = (I − 1)(J − 1) = 6 :
\[ MSA = \frac{SSA}{df_A} = 27.8611, \qquad MSB = \frac{SSB}{df_B} = 45.75, \qquad MSE = \frac{SSE}{df_E} = 20.5278 \]
The F-Statistics :
\[ f_A = \frac{MSA}{MSE} = 1.36, \qquad f_B = \frac{MSB}{MSE} = 2.23 \]
Thus, the complete ANOVA table is as shown below :

Source     df   SS         MS        F-ratio
Factor A   3    83.5833    27.8611   1.36
Factor B   2    91.5       45.75     2.23
Error      6    123.1667   20.5278
Total      11   298.25

■ For Factor A : F0.05,3,6 = 4.757
Since fA < F0.05,3,6 , the null hypothesis fails to be rejected.
Therefore, it can be concluded that there is insufficient evidence to support that there
is a difference in the average corrosion due to coating.
■ For Factor B : F0.05,2,6 = 5.1433
Since fB < F0.05,2,6 , the null hypothesis fails to be rejected.
Therefore, it can be concluded that there is insufficient evidence to support that there
is a difference in the average corrosion due to soil type.
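The computational formulas used above translate directly into code. The following is a minimal sketch under the additive model with one observation per cell; the function name `two_way_additive` and the dictionary keys are illustrative choices, not part of the exercise.

```python
def two_way_additive(x):
    """Two-factor ANOVA sums of squares with one observation per cell.

    x is a list of I rows (factor A levels), each with J values (factor B levels).
    """
    I, J = len(x), len(x[0])
    cf = sum(map(sum, x)) ** 2 / (I * J)             # correction factor x..^2 / (IJ)
    sst = sum(v * v for row in x for v in row) - cf
    ssa = sum(sum(row) ** 2 for row in x) / J - cf   # from row totals
    ssb = sum(sum(col) ** 2 for col in zip(*x)) / I - cf  # from column totals
    sse = sst - ssa - ssb
    mse = sse / ((I - 1) * (J - 1))
    return {"SST": sst, "SSA": ssa, "SSB": ssb, "SSE": sse,
            "fA": ssa / (I - 1) / mse, "fB": ssb / (J - 1) / mse}

corrosion = [[64, 49, 50], [53, 51, 48], [47, 45, 50], [51, 43, 52]]
table = two_way_additive(corrosion)
for name, value in table.items():
    print(f"{name} = {value:.4f}")
```

The printed values can be compared with the ANOVA table above.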
(b) Compute µ̂, α̂1 , α̂2 , α̂3 , α̂4 , β̂1 , β̂2 , and β̂3 .
The unbiased estimators are found using the following equations :
\[ \hat{\mu} = \bar{X}_{\cdot\cdot}, \qquad \hat{\alpha}_i = \bar{X}_{i\cdot} - \bar{X}_{\cdot\cdot}, \qquad \hat{\beta}_j = \bar{X}_{\cdot j} - \bar{X}_{\cdot\cdot} \]
Therefore, we have
µ̂ = 603/12 = 50.25
α̂1 = 4.0833, α̂2 = 0.4167, α̂3 = −2.9167, α̂4 = −1.5833
β̂1 = 3.5, β̂2 = −3.25, β̂3 = −0.25
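These estimates are just row and column means minus the grand mean, as the short sketch below shows. The variable names are illustrative only.

```python
corrosion = [[64, 49, 50], [53, 51, 48], [47, 45, 50], [51, 43, 52]]
I, J = len(corrosion), len(corrosion[0])

# Grand mean estimates mu.
mu_hat = sum(map(sum, corrosion)) / (I * J)
# Row mean minus grand mean estimates each coating effect alpha_i.
alpha_hat = [sum(row) / J - mu_hat for row in corrosion]
# Column mean minus grand mean estimates each soil effect beta_j.
beta_hat = [sum(col) / I - mu_hat for col in zip(*corrosion)]

print("mu_hat    =", mu_hat)
print("alpha_hat =", [round(a, 4) for a in alpha_hat])
print("beta_hat  =", [round(b, 4) for b in beta_hat])
```

By construction the estimated effects within each factor sum to zero, matching the model restrictions.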

12. a. Show that a constant d can be added to (or subtracted from) each xij without
affecting any of the ANOVA sums of squares.
b. Suppose that each xij is multiplied by a nonzero constant c. How does this affect
the ANOVA sums of squares? How does this affect the values of the F statistics FA
and FB ? What effect does ”coding” the data by yij = cxij + d have on the conclusions
resulting from the ANOVA procedures?
Solution :
(a) Show that a constant d can be added to (or subtracted from) each xij without
affecting any of the ANOVA sums of squares.
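Let a constant d be added to each $x_{ij}$, so that $x^{(d)}_{ij} = x_{ij} + d$. The row totals, column totals, and grand total become
\[ x^{(d)}_{i\cdot} = x_{i\cdot} + Jd, \qquad x^{(d)}_{\cdot j} = x_{\cdot j} + Id, \qquad x^{(d)}_{\cdot\cdot} = x_{\cdot\cdot} + IJd \]
we have
\[
\begin{aligned}
SST_{new} &= \sum_i\sum_j \left(x^{(d)}_{ij}\right)^2 - \frac{1}{IJ}\left(x^{(d)}_{\cdot\cdot}\right)^2 \\
&= \sum_i\sum_j (x_{ij} + d)^2 - \frac{1}{IJ}(x_{\cdot\cdot} + IJd)^2 \\
&= \sum_i\sum_j \left(x_{ij}^2 + d^2 + 2dx_{ij}\right) - \frac{1}{IJ}\left(x_{\cdot\cdot}^2 + (IJd)^2 + 2x_{\cdot\cdot}dIJ\right) \\
&= \sum_i\sum_j x_{ij}^2 + IJd^2 + 2dx_{\cdot\cdot} - \frac{1}{IJ}x_{\cdot\cdot}^2 - IJd^2 - 2dx_{\cdot\cdot} \\
&= \sum_i\sum_j x_{ij}^2 - \frac{1}{IJ}x_{\cdot\cdot}^2 \\
&= SST_{old}
\end{aligned}
\]
The same cancellation occurs for SSA and SSB, for example
\[ SSA_{new} = \frac{1}{J}\sum_i (x_{i\cdot} + Jd)^2 - \frac{1}{IJ}(x_{\cdot\cdot} + IJd)^2 = \frac{1}{J}\sum_i x_{i\cdot}^2 - \frac{1}{IJ}x_{\cdot\cdot}^2 = SSA_{old} \]
and SSE = SST − SSA − SSB is then unchanged as well. Subtracting a constant corresponds to a negative d.

Therefore, adding a constant does not affect any of the ANOVA sums of squares.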


(b) Suppose that each xij is multiplied by a nonzero constant c. How does this affect
the ANOVA sums of squares? How does this affect the values of the F statistics FA
and FB ? What effect does ”coding” the data by yij = cxij + d have on the conclusions
resulting from the ANOVA procedures?
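Let $x^{(c)}_{ij} = cx_{ij}$ where c ̸= 0
Then $x^{(c)}_{i\cdot} = cx_{i\cdot}$, $x^{(c)}_{\cdot j} = cx_{\cdot j}$, and $x^{(c)}_{\cdot\cdot} = cx_{\cdot\cdot}$, so
\[
\begin{aligned}
SST_c &= \sum_i\sum_j \left(x^{(c)}_{ij}\right)^2 - \frac{1}{IJ}\left(x^{(c)}_{\cdot\cdot}\right)^2 \\
&= c^2\left(\sum_i\sum_j x_{ij}^2 - \frac{1}{IJ}x_{\cdot\cdot}^2\right) \\
&= c^2\,SST
\end{aligned}
\]
In the same way SSA, SSB, and SSE are each multiplied by c².
Therefore, each sum of squares is multiplied by c².
Since fA = MSA/MSE and fB = MSB/MSE are ratios of mean squares, the factor c² cancels and the F statistics are not affected.
Combining parts (a) and (b), ”coding” the data by yij = cxij + d multiplies every sum of squares by c² but leaves the F statistics, and hence the conclusions of the ANOVA procedures, unchanged.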
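The effect of coding can be checked numerically. The sketch below applies y = cx + d to the paint coverage data of Exercise 13; the values c = 0.5 and d = −200, as well as the helper name `two_way_additive`, are arbitrary illustrative choices.

```python
def two_way_additive(x):
    """Sums of squares and F ratios for a two-factor layout, one value per cell."""
    I, J = len(x), len(x[0])
    cf = sum(map(sum, x)) ** 2 / (I * J)
    sst = sum(v * v for row in x for v in row) - cf
    ssa = sum(sum(row) ** 2 for row in x) / J - cf
    ssb = sum(sum(col) ** 2 for col in zip(*x)) / I - cf
    sse = sst - ssa - ssb
    mse = sse / ((I - 1) * (J - 1))
    return {"SST": sst, "SSA": ssa, "SSB": ssb, "SSE": sse,
            "fA": ssa / (I - 1) / mse, "fB": ssb / (J - 1) / mse}

paint = [[454, 446, 451], [446, 444, 447], [439, 442, 444], [444, 437, 443]]
c, d = 0.5, -200.0                       # arbitrary coding constants
coded = [[c * v + d for v in row] for row in paint]

raw, cod = two_way_additive(paint), two_way_additive(coded)
for key in ("SST", "SSA", "SSB", "SSE"):
    # Each coded sum of squares should equal c^2 times the raw one.
    print(key, raw[key], cod[key], c ** 2 * raw[key])
print("fA:", raw["fA"], cod["fA"])
print("fB:", raw["fB"], cod["fB"])
```

Setting c = 1 and d = −400 reproduces the shortcut suggested in the hint of Exercise 13.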

13. In an experiment to see whether the amount of coverage of light-blue interior latex
paint depends either on the brand of paint or on the brand of roller used, one gallon of
each of four brands of paint was applied using each of three brands of roller, resulting
in the following data (number of square feet covered).

                  Roller Brand
                  1      2      3
Paint Brand   1   454    446    451
              2   446    444    447
              3   439    442    444
              4   444    437    443

a. Construct the ANOVA table. [Hint : The computations can be expedited by
subtracting 400 (or any other convenient number) from each observation. This does
not affect the final results.]
b. State and test hypotheses appropriate for deciding whether paint brand has any
effect on coverage. Use α = 0.05.
c. Repeat part (b) for brand of roller.
d. Use Tukey’s method to identify significant differences among brands. Is there one
brand that seems clearly preferable to the others?
e. Check the normality and constant variance assumptions graphically.
Solution :
(a) Construct the ANOVA table. [Hint: The computations can be expedited by
subtracting 400 (or any other convenient number) from each observation. This does
not affect the final results.]

Subtracting 400 from each observation, as suggested in the hint, and adding row and column totals and means gives :

                  Roller Brand
                  1       2       3       xi.    x̄i.
Paint Brand   1   54      46      51      151    50.3333
              2   46      44      47      137    45.6667
              3   39      42      44      125    41.6667
              4   44      37      43      124    41.3333
              x.j   183     169     185
              x̄.j  45.75   42.25   46.25

we have,
x.. = 151 + 137 + 125 + 124 = 537
and
\[ \sum_{i=1}^{I}\sum_{j=1}^{J} x_{ij}^2 = 54^2 + \ldots + 43^2 = 24269 \]

The value of the correction factor is :
\[ CF = \frac{x_{\cdot\cdot}^2}{IJ} = \frac{537^2}{12} = 24030.75 \]
The total sum of squares is :
\[ SST = \sum_{i=1}^{I}\sum_{j=1}^{J} x_{ij}^2 - \frac{1}{IJ}x_{\cdot\cdot}^2 = 238.25 \]
The sum of squares for factor A is :
\[ SSA = \frac{1}{J}\sum_{i=1}^{I} x_{i\cdot}^2 - \frac{1}{IJ}x_{\cdot\cdot}^2 = 159.5833 \]
The sum of squares for factor B is :
\[ SSB = \frac{1}{I}\sum_{j=1}^{J} x_{\cdot j}^2 - \frac{1}{IJ}x_{\cdot\cdot}^2 = 38 \]
The error sum of squares is :
\[ SSE = SST - (SSA + SSB) = 40.6667 \]
The mean squares, with dfA = 3, dfB = 2, and dfE = 6 :
\[ MSA = \frac{SSA}{df_A} = 53.1944, \qquad MSB = \frac{SSB}{df_B} = 19, \qquad MSE = \frac{SSE}{df_E} = 6.7778 \]
The F-Statistics :
\[ f_A = \frac{MSA}{MSE} = 7.848, \qquad f_B = \frac{MSB}{MSE} = 2.8033 \]
Thus, the complete ANOVA table is as shown below :

Source                   df   SS         MS        F-ratio
Factor A (paint brand)   3    159.5833   53.1944   7.848
Factor B (roller brand)  2    38         19        2.8033
Error                    6    40.6667    6.7778
Total                    11   238.25
(b) State and test hypotheses appropriate for deciding whether paint brand has any
effect on coverage. Use α = 0.05.
Null hypothesis H0A : α1 = α2 = α3 = α4 = 0
versus HaA : at least one αi ̸= 0
For α = 0.05 we have F0.05,3,6 = 4.757
Since fA = 7.848 > F0.05,3,6 , we reject the null hypothesis and conclude that there is
sufficient evidence that paint brand has an effect on coverage.
(c) Repeat part (b) for brand of roller.
Null hypothesis H0B : β1 = β2 = β3 = 0 versus HaB : at least one βj ̸= 0
For α = 0.05 we have F0.05,2,6 = 5.1433
Since fB = 2.8033 < F0.05,2,6 , we fail to reject the null hypothesis and conclude that there is
insufficient evidence that roller brand has an effect on coverage.
(d) Use Tukey’s method to identify significant differences among brands. Is there one
brand that seems clearly preferable to the others?
Only factor A (paint brand) is significant, so the paint brand means are compared. The critical value of the studentized range statistic is
Qα,I,(I−1)(J−1) = Q0.05,4,6 = 4.9
so that
\[ w = Q_{0.05,4,6}\sqrt{\frac{MSE}{J}} = 4.9\sqrt{\frac{6.7778}{3}} = 7.37 \]
Arrange the means in increasing order and compare each pairwise difference with w :
x̄4. = 41.3333, x̄3. = 41.6667, x̄2. = 45.6667, x̄1. = 50.3333
The differences between the pairs and the decisions are shown in the table below :
Pairs         difference   decision
x̄1. − x̄2.   4.67         not significant
x̄1. − x̄3.   8.67         significant
x̄1. − x̄4.   9            significant
x̄2. − x̄3.   4            not significant
x̄2. − x̄4.   4.33         not significant
x̄3. − x̄4.   0.33         not significant
From the table above, the means x̄4. , x̄3. , x̄2. do not differ significantly from one another, and neither do x̄2. and x̄1. ; these groups would be connected by line segments in the underlining display.
Brand 1 has the largest average coverage and differs significantly from brands 3 and 4, but not from brand 2. So no single brand is clearly preferable to all the others.
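With equal sample sizes, a single critical difference w applies to every pair, which makes the comparisons easy to script. In the sketch below the means, MSE, and Q0.05,4,6 = 4.9 are copied from the solution rather than recomputed, and the variable names are illustrative.

```python
from itertools import combinations
from math import sqrt

# Paint brand means (coded data), MSE, and number of observations per mean.
means = {1: 50.3333, 2: 45.6667, 3: 41.6667, 4: 41.3333}
mse, J = 6.7778, 3
q_crit = 4.90   # Q_{0.05,4,6}, read from a studentized range table

# Tukey's critical difference for equal sample sizes.
w = q_crit * sqrt(mse / J)

significant = [
    (i, j) for i, j in combinations(sorted(means), 2)
    if abs(means[i] - means[j]) > w
]
print(f"w = {w:.2f}")
print("significantly different pairs:", significant)
```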

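14. Use the fact that $E(X_{ij}) = \mu + \alpha_i + \beta_j$ with $\sum_i \alpha_i = \sum_j \beta_j = 0$ to show that $E(\bar{X}_{i\cdot} - \bar{X}_{\cdot\cdot}) = \alpha_i$, so that $\hat{\alpha}_i = \bar{X}_{i\cdot} - \bar{X}_{\cdot\cdot}$ is an unbiased estimator for $\alpha_i$.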
Solution :
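Use the fact that $E(X_{ij}) = \mu + \alpha_i + \beta_j$ with $\sum_i \alpha_i = \sum_j \beta_j = 0$ to show that $E(\bar{X}_{i\cdot} - \bar{X}_{\cdot\cdot}) = \alpha_i$, so that $\hat{\alpha}_i = \bar{X}_{i\cdot} - \bar{X}_{\cdot\cdot}$ is an unbiased estimator for $\alpha_i$.
We have
\[
\begin{aligned}
E\left(\bar{X}_{i\cdot} - \bar{X}_{\cdot\cdot}\right) &= E\left(\bar{X}_{i\cdot}\right) - E\left(\bar{X}_{\cdot\cdot}\right) \\
&= \frac{1}{J}\sum_{j}(\mu + \alpha_i + \beta_j) - \frac{1}{IJ}\sum_{i}\sum_{j}(\mu + \alpha_i + \beta_j) \\
&= (\mu + \alpha_i) - \mu \\
&= \alpha_i
\end{aligned}
\]
where the third line uses $\sum_j \beta_j = 0$ in the first sum and $\sum_i \alpha_i = \sum_j \beta_j = 0$ in the second.
Therefore, $\hat{\alpha}_i = \bar{X}_{i\cdot} - \bar{X}_{\cdot\cdot}$ is an unbiased estimator for $\alpha_i$.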
15. The accompanying data table gives observations on total acidity of coal samples of
three different types, with determinations made using three different concentrations of
ethanolic NaOH (”Chemistry of Brown Coals,” Australian J. Applied Science, 1958:
375 − 379)
                       Type of Coal
NaOH Conc.   Morwell       Yallourn      Maddingley
.404 N       8.27, 8.17    8.66, 8.61    8.14, 7.96
.626 N       8.03, 8.21    8.42, 8.58    8.02, 7.89
.786 N       8.60, 8.20    8.61, 8.76    8.13, 8.07
a. Assuming both effects to be fixed, construct an ANOVA table, test for the presence
of interaction, and then test for the presence of main effects for each factor (all using
level .01).
b. Use Tukey’s procedure to identify significant differences among the types of coal.
Solution :
(a) Assuming both effects to be fixed, construct an ANOVA table, test for the presence
of interaction, and then test for the presence of main effects for each factor (all using
level .01).
With Factor A = NaOH concentration (I = 3), Factor B = type of coal (J = 3), and K = 2 observations per cell, the data give the ANOVA table :
Source of Variation   df   SS       MS        f
Factor A              2    0.1243   0.0621    3.65
Factor B              2    1.0024   0.5012    29.48
Interaction           4    0.0146   0.00365   0.214
Error                 9    0.153    0.017
Total                 17   1.2943


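■ For the presence of interaction : F0.01,4,9 = 6.42
Since fAB = 0.214 < F0.01,4,9 , there is no evidence of interaction between NaOH concentration and type of coal, so the main effects can be tested.
■ For Factor A : F0.01,2,9 = 8.02
Since fA < F0.01,2,9 , total acidity is not significantly affected by the concentration of NaOH.
■ For Factor B : F0.01,2,9 = 8.02
Since fB > F0.01,2,9 , total acidity depends significantly on the type of coal.
So, varying the concentration of NaOH does not have a significant impact on total acidity, while the type of coal does appear to affect total acidity.
(b) Use Tukey’s procedure to identify significant differences among the types of coal.
For Factor B, with Q0.01,3,9 = 5.43,
\[ w = Q_{0.01,3,9}\sqrt{\frac{MSE}{IK}} = 5.43\sqrt{\frac{0.017}{6}} = 0.29 \]
Arranging the means in increasing order and comparing each pairwise difference with w :
x̄.3. = 8.03 (Maddingley), x̄.1. = 8.25 (Morwell), x̄.2. = 8.61 (Yallourn)
Therefore, there is no significant difference between coals 1 and 3, but coals 1 and 3 both differ significantly from coal 2.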
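The ANOVA table for a two-factor layout with K observations per cell can be reproduced from the raw data. The sketch below is one possible implementation using the usual computational formulas; the function name `two_way_replicated` and the dictionary keys are illustrative. Because it uses unrounded mean squares, the F ratios may differ from the hand-computed ones in the second decimal.

```python
def two_way_replicated(x):
    """Two-factor ANOVA with K observations per cell; x[i][j] is a list of K values."""
    I, J, K = len(x), len(x[0]), len(x[0][0])
    values = [v for row in x for cell in row for v in cell]
    cf = sum(values) ** 2 / (I * J * K)               # correction factor
    sst = sum(v * v for v in values) - cf
    ssa = sum(sum(sum(cell) for cell in row) ** 2 for row in x) / (J * K) - cf
    ssb = sum(sum(sum(x[i][j]) for i in range(I)) ** 2 for j in range(J)) / (I * K) - cf
    ss_cells = sum(sum(cell) ** 2 for row in x for cell in row) / K - cf
    sse = sst - ss_cells                              # within-cell variation
    ssab = ss_cells - ssa - ssb                       # interaction
    mse = sse / (I * J * (K - 1))
    return {
        "SSA": ssa, "SSB": ssb, "SSAB": ssab, "SSE": sse, "SST": sst,
        "fA": ssa / (I - 1) / mse,
        "fB": ssb / (J - 1) / mse,
        "fAB": ssab / ((I - 1) * (J - 1)) / mse,
    }

# Rows: NaOH concentration (.404 N, .626 N, .786 N); columns: Morwell, Yallourn, Maddingley.
acidity = [
    [[8.27, 8.17], [8.66, 8.61], [8.14, 7.96]],
    [[8.03, 8.21], [8.42, 8.58], [8.02, 7.89]],
    [[8.60, 8.20], [8.61, 8.76], [8.13, 8.07]],
]
anova = two_way_replicated(acidity)
for name, value in anova.items():
    print(f"{name} = {value:.4f}")
```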
16. A study was carried out to compare the writing lifetimes of four premium brands of
pens. It was thought that the writing surface might affect lifetime, so three different
surfaces were randomly selected. A writing machine was used to ensure that condi-
tions were otherwise homogeneous (e.g., constant pressure and a fixed angle). The
accompanying table shows the two lifetimes (min) obtained for each brand-surface
combination.
                         Writing Surface
Brand of Pen   1          2          3          xi..
1              709, 659   713, 726   660, 645   4112
2              668, 685   722, 740   692, 720   4227
3              659, 685   666, 684   678, 750   4122
4              698, 650   704, 666   686, 733   4137
x.j.           5413       5621       5564       16,598
Carry out an appropriate ANOVA, and state your conclusions.
Solution :
Carry out an appropriate ANOVA, and state your conclusions.
Let Factor A = brand of pen (I = 4) and Factor B = writing surface (J = 3), with K = 2 lifetimes per brand-surface combination.
The hypotheses :
⇒ H0A : α1 = α2 = α3 = α4 = 0 vs HaA : at least one αi ̸= 0
⇒ H0B : β1 = β2 = β3 = 0 vs HaB : at least one βj ̸= 0
⇒ H0AB : γij = 0 ∀i, j vs HaAB : at least one γij ̸= 0


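we have
\[
\begin{aligned}
SSA &= \frac{1}{JK}\sum_i x_{i\cdot\cdot}^2 - \frac{x_{\cdot\cdot\cdot}^2}{IJK} = 1387.5 \\
SSB &= \frac{1}{IK}\sum_j x_{\cdot j\cdot}^2 - \frac{x_{\cdot\cdot\cdot}^2}{IJK} = 2888.083 \\
SSE &= \sum_i\sum_j\sum_k x_{ijk}^2 - \frac{1}{K}\sum_i\sum_j x_{ij\cdot}^2 = 8216 \\
SST &= \sum_i\sum_j\sum_k x_{ijk}^2 - \frac{x_{\cdot\cdot\cdot}^2}{IJK} = 20591.833 \\
SSAB &= SST - SSE - SSA - SSB = 8100.25
\end{aligned}
\]
we have the following table :

Source               df   SS          MS         f
Brand of pen (A)     3    1387.5      462.5      0.68
Writing surface (B)  2    2888.083    1444.042   2.11
Interaction (AB)     6    8100.25     1350.042   1.97
Error                12   8216        684.667
Total                23   20591.833

■ For α = 0.05
■ For Factor A : Fα,I−1,IJ(K−1) = F0.05,3,12 = 3.49
■ For Factor B : Fα,J−1,IJ(K−1) = F0.05,2,12 = 3.89
■ For Factor AB : Fα,(I−1)(J−1),IJ(K−1) = F0.05,6,12 = 3.00
Each computed f is smaller than its critical value, so none falls in the rejection region.
Therefore, neither the brand of pen nor the writing surface has a significant effect on
writing lifetime, and there is no significant interaction between them.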

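17. a. Show that $E(\bar{X}_{i\cdot\cdot} - \bar{X}_{\cdot\cdot\cdot}) = \alpha_i$, so that $\bar{X}_{i\cdot\cdot} - \bar{X}_{\cdot\cdot\cdot}$ is an unbiased estimator for $\alpha_i$ (in the fixed effects model).
b. With $\hat{\gamma}_{ij} = \bar{X}_{ij\cdot} - \bar{X}_{i\cdot\cdot} - \bar{X}_{\cdot j\cdot} + \bar{X}_{\cdot\cdot\cdot}$, show that $\hat{\gamma}_{ij}$ is an unbiased estimator for $\gamma_{ij}$ (in the fixed effects model).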
Solution :

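(a) Show that $E(\bar{X}_{i\cdot\cdot} - \bar{X}_{\cdot\cdot\cdot}) = \alpha_i$, so that $\bar{X}_{i\cdot\cdot} - \bar{X}_{\cdot\cdot\cdot}$ is an unbiased estimator for $\alpha_i$ (in the fixed effects model).
In the fixed effects model, $X_{ijk} = \mu + \alpha_i + \beta_j + \gamma_{ij} + \epsilon_{ijk}$ with $E(\epsilon_{ijk}) = 0$, $\sum_i \alpha_i = \sum_j \beta_j = 0$, and $\sum_i \gamma_{ij} = \sum_j \gamma_{ij} = 0$, so that $E(X_{ijk}) = \mu_{ij} = \mu + \alpha_i + \beta_j + \gamma_{ij}$.
Averaging the expectations over j and k, and then over i as well, gives
\[ E\left(\bar{X}_{i\cdot\cdot}\right) = \mu + \alpha_i + \frac{1}{J}\sum_j \beta_j + \frac{1}{J}\sum_j \gamma_{ij} = \mu + \alpha_i, \qquad E\left(\bar{X}_{\cdot\cdot\cdot}\right) = \mu \]
Hence
\[ E\left(\bar{X}_{i\cdot\cdot} - \bar{X}_{\cdot\cdot\cdot}\right) = (\mu + \alpha_i) - \mu = \alpha_i \]
Therefore, $\bar{X}_{i\cdot\cdot} - \bar{X}_{\cdot\cdot\cdot}$ is an unbiased estimator of $\alpha_i$.

(b) With $\hat{\gamma}_{ij} = \bar{X}_{ij\cdot} - \bar{X}_{i\cdot\cdot} - \bar{X}_{\cdot j\cdot} + \bar{X}_{\cdot\cdot\cdot}$, show that $\hat{\gamma}_{ij}$ is an unbiased estimator for $\gamma_{ij}$ (in the fixed effects model).
The parameter of interest is $\gamma_{ij} = \mu_{ij} - (\mu + \alpha_i + \beta_j)$.
In the same way as in part (a),
\[ E\left(\bar{X}_{ij\cdot}\right) = \mu_{ij}, \qquad E\left(\bar{X}_{i\cdot\cdot}\right) = \mu + \alpha_i, \qquad E\left(\bar{X}_{\cdot j\cdot}\right) = \mu + \beta_j, \qquad E\left(\bar{X}_{\cdot\cdot\cdot}\right) = \mu \]
We have
\[
\begin{aligned}
E\left(\hat{\gamma}_{ij}\right) &= E\left(\bar{X}_{ij\cdot} - \bar{X}_{i\cdot\cdot} - \bar{X}_{\cdot j\cdot} + \bar{X}_{\cdot\cdot\cdot}\right) \\
&= E\left(\bar{X}_{ij\cdot}\right) - E\left(\bar{X}_{i\cdot\cdot}\right) - E\left(\bar{X}_{\cdot j\cdot}\right) + E\left(\bar{X}_{\cdot\cdot\cdot}\right) \\
&= \mu_{ij} - (\mu + \alpha_i) - (\mu + \beta_j) + \mu \\
&= \mu_{ij} - (\mu + \alpha_i + \beta_j) \\
&= \gamma_{ij}
\end{aligned}
\]
Therefore, $\hat{\gamma}_{ij} = \bar{X}_{ij\cdot} - \bar{X}_{i\cdot\cdot} - \bar{X}_{\cdot j\cdot} + \bar{X}_{\cdot\cdot\cdot}$ is an unbiased estimator of $\gamma_{ij}$.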



INSTITUTE OF TECHNOLOGY OF CAMBODIA
INDUSTRIAL AND MECHANICAL ENGINEERING

Academic Year ( 2022-2023 )
