Inferential Statistics
Modular Course 5
Summary or Descriptive Statistics:
Numerical and graphical summaries
of data.
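Empirical Rule
For a normal distribution with mean μ and standard deviation σ:
about 68% of the data lie between μ - 1σ and μ + 1σ (z between -1 and 1);
about 95% of the data lie between μ - 2σ and μ + 2σ (z between -2 and 2);
about 99.7% of the data lie between μ - 3σ and μ + 3σ (z between -3 and 3).
Most important for Inferential Stats on our syllabus: 95% of the data lie between μ - 2σ and μ + 2σ.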
Solution
95% of the IQ scores are within ±2 standard deviations of the mean.
100 + 2(15) = 100 + 30 = 130
100 - 2(15) = 100 - 30 = 70
So 95% of the IQ scores are between 70 and 130.
Example 2
The number of sandwiches sold by a shop from 12 noon to 2 pm each day is normally distributed.
The distribution has a mean of 42.6 sandwiches and a standard deviation of 8.2.
Use the Empirical Rule to identify the range of values around the mean that includes 68%
of the sale numbers.
Solution
68% of the sales are within ±1 standard deviation of the mean.
42.6 + 1(8.2) = 42.6 + 8.2 = 50.8
42.6 - 1(8.2) = 42.6 - 8.2 = 34.4
Solution: 68% of the sales are between 34.4 and 50.8 sandwiches.
Your Turn
Question
The table below shows the prices charged per room of 40 B&B houses in Galway.
Race - Week B&B prices per room (€)
56 75 60 70 80 70 50 90 80 75
75 50 75 50 70 60 65 60 50 70
84 70 70 60 60 70 70 70 40 60
70 80 60 65 55 50 70 80 50 55
(i) Calculate, correct to one decimal place, the mean and standard deviation of the data.
(ii) Show that the empirical rule holds true for 1 standard deviation around the mean.
(iii) Show that the empirical rule holds true for 2 standard deviations around the mean.
Solution
(i) Using a calculator: Mean = 65.5, SD = 11.2
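The calculator values and the empirical-rule checks in parts (ii) and (iii) can be reproduced with a short sketch. It assumes the standard deviation is the population form (dividing by n), which is the form that matches the 11.2 quoted above; the names `prices` and `share_within` are illustrative only.

```python
from statistics import mean, pstdev

# Race-week B&B prices per room (EUR), copied from the table in the question.
prices = [
    56, 75, 60, 70, 80, 70, 50, 90, 80, 75,
    75, 50, 75, 50, 70, 60, 65, 60, 50, 70,
    84, 70, 70, 60, 60, 70, 70, 70, 40, 60,
    70, 80, 60, 65, 55, 50, 70, 80, 50, 55,
]

mu = mean(prices)
sigma = pstdev(prices)  # assumption: population SD (divide by n)


def share_within(k):
    """Fraction of prices lying within k standard deviations of the mean."""
    low, high = mu - k * sigma, mu + k * sigma
    inside = sum(1 for p in prices if low <= p <= high)
    return inside / len(prices)


within_1 = share_within(1)
within_2 = share_within(2)
print(round(mu, 1), round(sigma, 1), within_1, within_2)
```

Under these assumptions 27 of the 40 prices (67.5%) fall within one standard deviation and 38 of the 40 (95%) within two, which is close to the 68% and 95% of the empirical rule.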
Inferential Statistics:
Sampling
For Leaving Cert we deal with two types of sampling:
To ensure this
1. The sample must be random;
Population Proportions and Margin of Error
A sample of 25 students in a school were asked if they spent over €5 on
mobile phone calls over the last week. 10 students had spent over €5.
The proportion of the sample of 25 who spent over €5 was 10/25 = 0.4 = 40%.
Can we say that 40% of the students in the school (population) spent over
€5?
The answer is no, (unless the sample size was the same as the population
size), we can’t say for certain.
This means that we are 95% certain that the population proportion is
within ±2 standard deviations of the sample proportion. ± 2 standard
deviations is our margin of error and the percentage margin of error
that this represents depends on the sample size.
95% is the confidence level we are working with, but other confidence
levels also exist (e.g. 90% and 99%), for which a different margin of error
applies depending on sample size.
[Figure: different 95% confidence intervals around the population proportion]
• If we double the sample size (1000 to 2000) we do not halve the margin of error.
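The point about doubling the sample size can be checked numerically. The sketch below assumes the approximate 95% margin of error for a proportion is 1/√n; that formula is not written out in the text above, but it is consistent with the intervals quoted in the examples that follow. The function name `margin_of_error` is illustrative.

```python
from math import sqrt


def margin_of_error(n):
    """Approximate 95% margin of error for a proportion (assumed 1/sqrt(n))."""
    return 1 / sqrt(n)


moe_1000 = margin_of_error(1000)
moe_2000 = margin_of_error(2000)
ratio = moe_2000 / moe_1000  # equals 1/sqrt(2), not 1/2

print(f"n = 1000: {moe_1000:.2%}")
print(f"n = 2000: {moe_2000:.2%}")
print(f"ratio:    {ratio:.3f}")
```

Under this assumption the margin of error shrinks by a factor of about 0.71 when the sample size doubles; the sample size would have to be quadrupled to halve it.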
10% is between 7.5% and 16.5% (inside the margin of error) so it seems not to be unusual.
Recognising the Concept of a Hypothesis Test
Testing claims about a population.
Null Hypothesis: The null hypothesis, denoted by H0, is a claim or
statement about a population. We assume this statement is true
until proven otherwise. (The null hypothesis means that nothing is
wrong with the claim or statement.)
• If the jury reject the null hypothesis (H0), this means that they find the
defendant guilty.
• If the jury fail to reject the null hypothesis (H0), this means that they
find the defendant not guilty.
Often we need to make a decision about a population based on a
sample.
[Figure: 95% confidence interval around the population proportion; a claimed percentage (H0) that is inside the interval is not rejected, a claimed percentage (H0) that is outside the interval is rejected]
Go Fast
Airlines
Evidence:
Sample Proportion =
Margin of Error =
Conclusion: the claimed 70% is outside the range 63.24% to 69.56% of our confidence interval, so we reject H0.
There is sufficient evidence to reject the claim that the percentage of passengers who are
happy with the service is 70% at the 5% level of significance.
ii. Null hypothesis: 60% of viewers watch the Late Late Show.
Alternative hypothesis: the percentage of viewers who watch the Late Late
Show is not 60%.
Sample proportion = 0.45 = 45%
Confidence interval: 40% to 50%. The claimed 60% is outside this interval: reject.
iii. According to the survey, there is sufficient evidence to reject the null
hypothesis. Reason: 60% is outside the confidence interval.
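The confidence-interval test used in these examples can be sketched as a small function. Two things here are assumptions rather than facts from the slides: the margin of error is taken as 1/√n, and the sample size of 400 is a hypothetical value chosen because it reproduces the 40% to 50% interval. The name `check_claim` is illustrative.

```python
from math import sqrt


def check_claim(claimed, sample_proportion, n):
    """Return (low, high, reject) for a claim tested against a 95% interval.

    Assumes the margin of error is 1/sqrt(n).
    """
    moe = 1 / sqrt(n)
    low = sample_proportion - moe
    high = sample_proportion + moe
    reject = not (low <= claimed <= high)
    return low, high, reject


# Late Late Show example: claim 60%, sample proportion 45%,
# n = 400 is a hypothetical sample size consistent with the 40%-50% interval.
low, high, reject = check_claim(0.60, 0.45, 400)
print(f"{low:.0%} to {high:.0%}, reject H0: {reject}")
```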
Empirical Rule
[Figure: normal curve with the 68%, 95% and 99.7% regions marked from μ - 3σ to μ + 3σ; example with μ = 278 and σ = 12, where x = 242, 254, 266, 278, 290, 302, 314 correspond to z = –3, –2, –1, 0, 1, 2, 3]
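Standard Normal Distribution
If μ = 0 and σ = 1 we would plot $\frac{1}{\sqrt{2\pi}} e^{-\frac{1}{2}z^{2}}$
This graph gives the Standard Normal Graph with a standardised scale.
Total area under the curve:
$P(-\infty < z < \infty) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-\frac{1}{2}z^{2}}\,dz = 1$
[Figure: standard normal curve; x-axis μ - 3σ, μ - 2σ, μ - σ, μ, μ + σ, μ + 2σ, μ + 3σ aligned with z-scores -3, -2, -1, 0, 1, 2, 3]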
Tables: Pg. 36 and Pg. 37
For a given z, the table gives
$P(Z \le z) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{z} e^{-\frac{1}{2}t^{2}}\,dt$
[Figure: standard normal curve with z = 1.31 marked]
P(Z ≤ 1.31) can be read from the tables directly.
P(Z ≤ 1.31) = 0.9049 = 90.49%
Example 2
Using the tables find P(Z ≥ 1.32).
[Figure: standard normal curve with z = 1.32 marked]
The table only gives the area to the left of z, but the fact that the total area under the curve
equals 1 allows us to use P(Z ≥ z) = 1 - P(Z ≤ z).
P(Z ≥ 1.32) = 1 - P(Z ≤ 1.32)
P(Z ≥ 1.32) = 1 - 0.9066 = 0.0934 = 9.34%
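The table look-ups in the two examples above can be cross-checked in code. This is a minimal sketch that uses the standard identity Φ(z) = ½(1 + erf(z/√2)) in place of the printed tables; the function name `phi` is illustrative.

```python
from math import erf, sqrt


def phi(z):
    """Area under the standard normal curve to the left of z, i.e. P(Z <= z)."""
    return 0.5 * (1 + erf(z / sqrt(2)))


p_left = phi(1.31)        # read directly, like the table value
p_right = 1 - phi(1.32)   # right-hand tail via the complement rule

print(round(p_left, 4), round(p_right, 4))
```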
Example 3
Using the tables find P(Z ≤ -0.74).
[Figure: standard normal curve with z = –0.74 marked, and a curve showing –z and z placed symmetrically about 0]
Example 4
Using the tables find P(-1.32 ≤ z ≤ 1.29).
[Figure: standard normal curves with z = –1.32 and z = 1.29 marked, showing the area to the left of 1.29 and the area to the left of 1.32]
P(-1.32 ≤ z ≤ 1.29) = P(z ≤ 1.29) - [1 - P(z ≤ 1.32)]
= 0.9015 - [1 - 0.9066]
= 0.8081 = 80.81%
Your Turn
Question 1
The amounts due on a mobile phone bill in Ireland are normally distributed with a mean of €53 and a
standard deviation of €15. If a monthly phone bill is chosen at random, find the probability that the
amount due is between €47 and €74.
Solution
$z_1 = \frac{x - \mu}{\sigma} = \frac{47 - 53}{15} = -0.4$
$z_2 = \frac{x - \mu}{\sigma} = \frac{74 - 53}{15} = 1.4$
[Figure: normal curve; x = 8, 23, 38, 47, 53, 68, 74, 83, 98 aligned with z = –3, –2, –1, –0.4, 0, 1, 1.4, 2, 3]
P(-0.4 < Z < 1.4) = P(Z ≤ 1.4) - [1 - P(Z ≤ 0.4)]
P(-0.4 < Z < 1.4) = 0.9192 - [1 - 0.6554]
P(-0.4 < Z < 1.4) = 0.5746
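As a cross-check on the table work, the same probability can be computed with `statistics.NormalDist` from the Python standard library. This is only a sketch; the variable name `bills` is illustrative, and the result differs slightly from the table answer because the tables are rounded to four decimal places.

```python
from statistics import NormalDist

# Phone bills: mean 53 euro, standard deviation 15 euro (from the question).
bills = NormalDist(mu=53, sigma=15)

# P(47 < X < 74) is the area to the left of 74 minus the area to the left of 47.
p_between = bills.cdf(74) - bills.cdf(47)

# The same calculation in z units.
z1 = (47 - 53) / 15
z2 = (74 - 53) / 15

print(z1, z2, round(p_between, 3))
```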
Question 2
The mean percentage achieved by a student in a statistics exam is 60%.
The standard deviation of the exam marks is 10%.
(i) What is the probability that a randomly selected student scores above 80%?
(ii) What is the probability that a randomly selected student scores below 45%?
(iii) What is the probability that a randomly selected student scores between 50% and 75%?
(iv) Suppose you are sitting this exam and you are offered a prize for getting a mark which is
greater than that of 90% of all the other students sitting the exam.
What percentage would you need to get in the exam to win the prize?
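Solution
(i) $z = \frac{x - \mu}{\sigma} = \frac{80 - 60}{10} = 2$
P(Z > 2) = 1 - P(Z < 2)
P(Z > 2) = 1 - 0.9772 = 0.0228 = 2.28%
[Figure: normal curve; marks 30, 40, 50, 60, 70, 80, 90 aligned with z = –3, –2, –1, 0, 1, 2, 3]
(ii) $z = \frac{x - \mu}{\sigma} = \frac{45 - 60}{10} = -1.5$
P(Z < -1.5) = P(Z > 1.5) = 1 - P(Z < 1.5)
P(Z < -1.5) = 1 - 0.9332 = 0.0668 = 6.68%
[Figure: normal curve; marks 30, 40, 45, 50, 60, 70, 80, 90 aligned with z = –3, –2, –1.5, –1, 0, 1, 2, 3]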
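Question 2: Solution
(iii) $z_1 = \frac{x - \mu}{\sigma} = \frac{50 - 60}{10} = -1$
$z_2 = \frac{x - \mu}{\sigma} = \frac{75 - 60}{10} = 1.5$
P(-1 < Z < 1.5) = P(Z ≤ 1.5) - [1 - P(Z ≤ 1)]
P(-1 < Z < 1.5) = 0.9332 - [1 - 0.8413] = 0.7745 = 77.45%
[Figure: normal curve; marks 30, 40, 50, 60, 70, 75, 80, 90 aligned with z = –3, –2, –1, 0, 1, 1.5, 2, 3]
(iv) From the tables, an area of 90% (0.9) corresponds to z = 1.28.
$z = \frac{x - \mu}{\sigma}$
$1.28 = \frac{x - 60}{10} \Rightarrow x = 72.8$ marks
[Figure: normal curve; marks 30, 40, 50, 60, 70, 72.8, 80, 90 aligned with z = –3, –2, –1, 0, 1, 1.28, 2, 3]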
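All four parts of this question can be checked with `statistics.NormalDist`, including the reverse look-up in part (iv), which corresponds to `inv_cdf`. This is a sketch only, and it assumes the marks are normally distributed as the solution does; the variable names are illustrative, and small differences from the table answers come from rounding in the tables.

```python
from statistics import NormalDist

# Exam marks: mean 60, standard deviation 10 (assumed normally distributed).
marks = NormalDist(mu=60, sigma=10)

p_above_80 = 1 - marks.cdf(80)               # part (i)
p_below_45 = marks.cdf(45)                   # part (ii)
p_50_to_75 = marks.cdf(75) - marks.cdf(50)   # part (iii)
cutoff_90 = marks.inv_cdf(0.90)              # part (iv): mark with 90% of the area below it

print(round(p_above_80, 4), round(p_below_45, 4),
      round(p_50_to_75, 4), round(cutoff_90, 1))
```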
For Higher Level Leaving Cert use z scores
Tables: Pg. 36 and Pg. 37
[Figure: standard normal curve with z = -1.96, 0 and +1.96 marked]
[Figure: 95% confidence interval around the population proportion]
Confidence Limits =
[Figure: 95% confidence interval from 55.36% to 63.04%]
Your Turn
Question 1:
The Sunday Independent reports that the government's approval rating is at 65%. The
paper states that the poll is based on a random sample of 972 voters and that the margin
of error is 3%
Show that the pollsters used a 95% level of confidence.
Solution
Confidence Limits=
0.03 =
0.03 =
=1.96
Sample Proportion =
Confidence Limits =
95% confidence
interval
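The working for this solution did not survive in the text above, so the following is a sketch of one way to check the result rather than the slide's own method. It assumes the margin of error has the standard form z·√(p̂(1 - p̂)/n) and solves for z; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

# Values from the question: approval 65%, n = 972, margin of error 3%.
p_hat = 0.65
n = 972
margin = 0.03

# Assumed form: margin = z * sqrt(p_hat * (1 - p_hat) / n), solved for z.
standard_error = sqrt(p_hat * (1 - p_hat) / n)
z = margin / standard_error

# Central area of the standard normal curve between -z and z.
confidence = 2 * NormalDist().cdf(z) - 1

print(round(z, 2), round(confidence, 3))
```

A z value of about 1.96 corresponds to a central area of about 95%, which is what the question asks to be shown.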
Sample Means
The data below are the heights, in cm, of a population of 100 fifteen-year-old students.
165 161 170 182 176 185 180 155 154 166
165 152 174 167 165 171 172 150 181 165
166 161 174 158 166 168 164 150 155 170
168 144 164 154 177 173 178 158 165 175
180 174 152 167 148 175 153 162 180 175
157 172 155 140 147 160 152 166 168 158
153 165 160 143 166 167 167 163 158 160
150 157 172 167 184 172 165 159 158 177
179 174 156 178 165 179 174 148 175 166
157 159 163 165 162 153 145 170 176 180
$\sum (x - \mu)^{2}$
It does not matter what the original distribution looks like: for large samples the distribution of the sample means
will be approximately normal. Use Java Applets.
A single sample of 5 data points. A single sample of 10 data points.
The black arrows are the data points. The mean of the sample is the red dot
Naturally, if we choose a sample size of 100 (the original population size) the mean of the
sample will be the same as the mean of the population.
[Figure: the population and six samples drawn from it, labelled Sample 1 to Sample 6]
Summary
[Table: columns Population, Large Sample, Sample Means; rows Mean, Standard Deviation (Standard Error)]
In practice, from the table above, we can say that for n ≥ 30:
1. The sample means are normally distributed.
2. The mean of the sample means is the same as the population mean: $\mu_{\bar{x}} = \mu$
3. The standard deviation of the sample means is equal to $\frac{\sigma}{\sqrt{n}}$;
this is called the standard error: $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$
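The three statements above can be explored by simulation using the population of 100 heights given earlier. The sketch below makes several assumptions that are not in the slides: samples are drawn with replacement (so that σ/√n applies without a finite-population correction), the sample size is 30, and 2000 samples are taken with a fixed seed. All names are illustrative.

```python
import random
from statistics import mean, pstdev

# Heights (cm) of the population of 100 students, copied from the table.
heights = [
    165, 161, 170, 182, 176, 185, 180, 155, 154, 166,
    165, 152, 174, 167, 165, 171, 172, 150, 181, 165,
    166, 161, 174, 158, 166, 168, 164, 150, 155, 170,
    168, 144, 164, 154, 177, 173, 178, 158, 165, 175,
    180, 174, 152, 167, 148, 175, 153, 162, 180, 175,
    157, 172, 155, 140, 147, 160, 152, 166, 168, 158,
    153, 165, 160, 143, 166, 167, 167, 163, 158, 160,
    150, 157, 172, 167, 184, 172, 165, 159, 158, 177,
    179, 174, 156, 178, 165, 179, 174, 148, 175, 166,
    157, 159, 163, 165, 162, 153, 145, 170, 176, 180,
]

rng = random.Random(1)   # fixed seed so the run is repeatable
n = 30                   # assumed sample size
num_samples = 2000       # assumed number of repeated samples

# Draw samples with replacement and record each sample mean.
sample_means = [mean(rng.choices(heights, k=n)) for _ in range(num_samples)]

mu = mean(heights)
sigma = pstdev(heights)
standard_error = sigma / n ** 0.5

mean_of_means = mean(sample_means)
sd_of_means = pstdev(sample_means)

print(f"population mean {mu:.2f}, mean of sample means {mean_of_means:.2f}")
print(f"sigma/sqrt(n)   {standard_error:.2f}, SD of sample means {sd_of_means:.2f}")
```

The two printed pairs should be close to each other, illustrating statements 2 and 3; a histogram of `sample_means` would illustrate statement 1.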
KEY IDEA CLICK LINK BELOW
http://onlinestatbook.com/stat_sim/sampling_dist/index.html
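In the Standard Normal Distribution we want the value z1 such that 95%
of the population lies in the interval -z1 ≤ z ≤ z1.
[Figure: standard normal curve with area 0.95 between -z1 and z1 and area 0.025 in each tail]
P(z ≤ z1) = 0.95 + 0.025 = 0.975
⇒ z1 = 1.96 and -z1 = -1.96
Therefore in a Normal Distribution 95% of the population lies within 1.96
standard deviations of the mean.
In the same way, 95% of the sample means lie within 1.96 standard errors of μ (the population mean):
$\therefore\ \mu_{\bar{x}} - 1.96\,\sigma_{\bar{x}} < \mu < \mu_{\bar{x}} + 1.96\,\sigma_{\bar{x}}$
As $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$, the confidence limits are $\mu_{\bar{x}} \pm 1.96\frac{\sigma}{\sqrt{n}}$,
where for a large sample $\mu_{\bar{x}}$ is estimated by the sample mean $\bar{x}$.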
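The value 1.96 can be recovered without the tables by asking for the z value that has an area of 0.975 to its left. This is a small sketch using `statistics.NormalDist`; the variable names are illustrative.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1

# Area to the left of z1 must be 0.95 + 0.025 = 0.975.
z1 = standard_normal.inv_cdf(0.975)

# Check: the area between -z1 and z1 should be 0.95.
central_area = standard_normal.cdf(z1) - standard_normal.cdf(-z1)

print(round(z1, 2), round(central_area, 2))
```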
Example 1
A random sample of 250 cars was taken; the mean age of the cars was
4.5 years and the standard deviation was 2.2 years.
(i) Find the 95% confidence interval for the mean age of all cars.
(ii) What size sample is required to estimate the mean age, with 95% confidence,
to within ±0.3 years?
Solution
(i) The confidence limits are $\bar{x} \pm 1.96\frac{\sigma}{\sqrt{n}}$
$= 4.5 \pm 1.96\frac{2.2}{\sqrt{250}}$
4.5 - 1.96(0.139) < μ < 4.5 + 1.96(0.139)
4.23 < μ < 4.77
This means that we can say with 95% confidence
that the mean age of all cars in the population is
between 4.23 years and 4.77 years.
(ii) $1.96\frac{\sigma}{\sqrt{n}} = 0.3$
$1.96\frac{2.2}{\sqrt{n}} = 0.3$
$\Rightarrow \sqrt{n} = \frac{(1.96)(2.2)}{0.3} = 14.373$
$\Rightarrow n = (14.373)^{2} = 207$ cars
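Both parts of this example follow from the same formula, so they can be sketched as two small helper functions. The function names are illustrative, and rounding the required sample size up to the next whole car is an assumption about how the final answer of 207 is reached.

```python
from math import ceil, sqrt


def confidence_interval_95(x_bar, sd, n):
    """95% confidence limits x_bar +/- 1.96 * sd / sqrt(n)."""
    half_width = 1.96 * sd / sqrt(n)
    return x_bar - half_width, x_bar + half_width


def sample_size_for_margin(sd, margin):
    """Smallest whole n with 1.96 * sd / sqrt(n) <= margin (rounded up)."""
    return ceil((1.96 * sd / margin) ** 2)


low, high = confidence_interval_95(4.5, 2.2, 250)   # part (i)
n_needed = sample_size_for_margin(2.2, 0.3)         # part (ii)

print(round(low, 2), round(high, 2), n_needed)
```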
Example 2
A random sample of 144 male students in a large university was taken and their heights measured.
The mean height was 175 cm. The standard deviation of all the male students in the university
was 9 cm.
(i) Give a 95% confidence interval for the mean height of all the male students.
(ii) Show that the confidence interval would become narrower if the sample size was 225 instead of 144.
Solution
(i) n = 144, x̄ (mean of the sample) = 175, σ (standard deviation of the population) = 9,
μ (population mean) is unknown.
We calculate the standard error of the mean using $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$
$\sigma_{\bar{x}} = \frac{9}{\sqrt{144}} = 0.75$
As the sample size is large, the best estimate of μ is x̄, which is 175 cm.
Now we have to give a range of values in which the true population mean (μ) lies.
This will be with a 95% level of certainty.
$\bar{x} - 1.96\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{x} + 1.96\frac{\sigma}{\sqrt{n}}$
175 - 1.96(0.75) ≤ μ ≤ 175 + 1.96(0.75)
173.53 ≤ μ ≤ 176.47
The true population mean lies within the range 173.53 cm to 176.47 cm with 95% certainty.
(ii) If a sample of 225 were taken the standard error would be
$\sigma_{\bar{x}} = \frac{9}{\sqrt{225}} = 0.6$
$\bar{x} - 1.96\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{x} + 1.96\frac{\sigma}{\sqrt{n}}$
175 - 1.96(0.6) ≤ μ ≤ 175 + 1.96(0.6)
173.82 ≤ μ ≤ 176.18
The true population mean lies within the range 173.82 cm to 176.18 cm with 95% certainty.
The confidence interval has become narrower (a width of 2.36 cm instead of 2.94 cm).
Solution:
The original claim is that the success rate is no different from 50%.
H0: p = 0.5
H1: p ≠ 0.5
$\hat{p} = \frac{57}{104} = 0.548$
$z = \frac{\hat{p} - p}{\sqrt{\frac{p(1 - p)}{n}}} = \frac{0.548 - 0.50}{\sqrt{\frac{(0.5)(0.5)}{104}}} = 0.98$
At the 5% level of significance the critical values are ±1.96.
As 0.98 is between -1.96 and 1.96 we fail to reject the null hypothesis.
There is not sufficient evidence to warrant rejection of the claim that women who guess the sex of
their babies have a success rate equal to 50%.
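The test statistic in this solution can be reproduced with a short function. This is a sketch of the calculation shown above; the function name `proportion_z` is illustrative.

```python
from math import sqrt


def proportion_z(successes, n, p0):
    """z statistic for a sample proportion against a claimed proportion p0."""
    p_hat = successes / n
    return (p_hat - p0) / sqrt(p0 * (1 - p0) / n)


z = proportion_z(57, 104, 0.5)

# Two-tailed test at the 5% level: reject H0 only if |z| > 1.96.
reject = abs(z) > 1.96

print(round(z, 2), reject)
```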
Your Turn
A survey was carried out to find the weekly rental costs of holiday apartments in a certain country.
A random sample of 400 apartments was taken. The mean of the sample was €320 and the
standard deviation was €50.
Form a 95% confidence interval for the mean weekly rental costs of holiday apartments in that country.
Solution
The confidence limits are $\bar{x} \pm 1.96\frac{\sigma}{\sqrt{n}}$
$= 320 \pm 1.96\frac{50}{\sqrt{400}}$
320 - 1.96(2.5) < μ < 320 + 1.96(2.5)
315.1 < μ < 324.9
Between €315.10 and €324.90
Night 3
Hypothesis Testing
Often we need to make a decision about a population based on a
sample.
2. During a 5 minute period a new machine produces fewer faulty parts than
an old machine.
Assuming that the new machine is no better than the old one is called a
NULL HYPOTHESIS (H0)
Assuming that the new machine is better than the old one is called an
ALTERNATIVE HYPOTHESIS (H1)
[Sketch: standard normal curve with critical values z = -1.96 and z = 1.96; reject H0 in the 2.5% tail on either side, fail to reject H0 in between]
z = -1.96 or z = 1.96
Testing the Null Hypothesis using z-values
The statistical method used to determine whether H0 is true or not is called
HYPOTHESIS TESTING.
Statisticians speak of “not accepting or accepting H0 at a certain level”. This
level is called the LEVEL OF SIGNIFICANCE. ( 5% level of significance is on the
syllabus).
If the value of z lies outside the range - 1∙96 < z < 1∙96 (critical region)
we reject H0 .
[Sketch: standard normal curve with critical values z = -1.96 and z = 1.96; reject H0 in the 2.5% tail on either side, fail to reject H0 in between]
Suppose we take a large sample of size n from a population with a mean of μ and a standard deviation of σ.
We have to calculate the mean of the sample, x̄. (μ_x̄ = x̄ when we are dealing with large samples.)
We can also calculate the standard error σ_x̄ by using $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$.
We want to test the hypothesis that the sample comes from a population with a
particular value of μ, called μ0.
Example 1
Step 2. Convert the observed results into z units. (Calculate the test statistic.)
$Z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} = \frac{497 - 500}{10/\sqrt{81}} = -2.7$
Step 3. Write down the critical values. (A sketch also helps.)
Critical values: z = -1.96 and z = 1.96 (2.5% in each tail). Reject H0 if the test statistic lies outside this range; otherwise fail to reject H0.
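Steps 2 and 3 can be sketched as a reusable function that computes the test statistic and compares it with the critical values ±1.96. The names `z_statistic` and `reject_h0` are illustrative, and the two-tailed 5% test is the one described in the text.

```python
from math import sqrt


def z_statistic(x_bar, mu0, sigma, n):
    """Test statistic Z = (x_bar - mu0) / (sigma / sqrt(n))."""
    return (x_bar - mu0) / (sigma / sqrt(n))


def reject_h0(z, critical=1.96):
    """Two-tailed test at the 5% level: reject when z is outside +/- critical."""
    return abs(z) > critical


# Values from Example 1: sample mean 497, claimed mean 500, sigma 10, n 81.
z = z_statistic(497, 500, 10, 81)

print(round(z, 2), reject_h0(z))
```

Here the statistic of -2.7 lies below -1.96, so this sketch reports that H0 would be rejected at the 5% level.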
Example 2
Step 2. Convert the observed results into z units. (Calculate the test statistic.)
$Z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} = \frac{10{,}000 - 11{,}000}{552/\sqrt{36}} = -10.87$
Step 3. Write down the critical values. (A sketch also helps.)
Critical values: z = -1.96 and z = 1.96 (2.5% in each tail). Reject H0 if the test statistic lies outside this range; otherwise fail to reject H0.
Question 1: Solution
Step 2. Convert the observed results into z units. (Calculate the test statistic.)
$z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} = \frac{0.6 - 0.75}{0.2/\sqrt{36}} = -4.5$
Note: we are approximating σ with s as we don't know σ.
Step 3. Write down the critical values. (A sketch also helps.)
Critical values: z = -1.96 and z = 1.96 (2.5% in each tail). Reject H0 if the test statistic lies outside this range; otherwise fail to reject H0.
Example 2
Step 2. Convert the observed results into z units. (Calculate the test statistic.)
$Z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} = \frac{50 - 51.5}{8.5/\sqrt{49}} = -1.24$
Step 3. Write down the critical values. (A sketch also helps.)
Critical values: z = -1.96 and z = 1.96 (2.5% in each tail). Reject H0 if the test statistic lies outside this range; otherwise fail to reject H0.
Example 3
Step 2. Convert the observed results into z units. (Calculate the test statistic.)
$Z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} = \frac{3.28 - 3.42}{0.9/\sqrt{500}} = -3.48$
Step 3. Write down the critical values. (A sketch also helps.)
Critical values: z = -1.96 and z = 1.96 (2.5% in each tail). Reject H0 if the test statistic lies outside this range; otherwise fail to reject H0.
If p ≤ 0.05: there is sufficient evidence to reject the null hypothesis H0 at the 5% level (if p is low, H0 must go).
If p > 0.05: there is not sufficient evidence to reject the null hypothesis H0, so we fail to reject it.
Example 1
Medical consultants for large companies are concerned about the effects of stress
on company executives. The mean systolic blood pressure for males aged
35 to 44 years of age is, according to national health statistics, 128 with
a standard deviation of 15. A sample of 72 male executives in this age
group was selected from companies. Their mean blood pressure was 130.
(i) Construct a 95% confidence interval for the mean systolic blood pressure
for the executives. Interpret this interval.
(ii) Carry out a hypothesis test using a significance level of 5% to see if there
is evidence to suggest that the mean systolic blood pressure for executives
is different to the national average. Clearly state the null and alternative
hypothesis and your conclusion. Give a p-value for this hypothesis test
and interpret this p-value.
(i) n = 72, σ = 15, x̄ = 130
95% confidence interval: $\bar{x} \pm 1.96\left(\frac{\sigma}{\sqrt{n}}\right)$
95% confidence interval: $130 \pm 1.96\left(\frac{15}{\sqrt{72}}\right)$
130 ± 3.46
[126.54, 133.46]
This means that the mean systolic blood pressure (μ) for all male executives aged 35 to 44
in large companies lies in the range 126.54 to 133.46, with 95% certainty.
This range includes the national average of 128.
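The working for part (ii) of this example is not reproduced in the text above, so the following is a sketch of how the test statistic and a two-tailed p-value could be computed from the values given in the question. It is an illustration under those assumptions, not the slide's own solution, and the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

# Values from the question: H0 is that the mean is the national average of 128.
x_bar, mu0, sigma, n = 130, 128, 15, 72

z = (x_bar - mu0) / (sigma / sqrt(n))

# Two-tailed p-value: area in both tails beyond |z|.
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

reject = p_value <= 0.05

print(round(z, 2), round(p_value, 2), reject)
```

With these values the statistic lies between -1.96 and 1.96 and the p-value is well above 0.05, which agrees with the confidence interval in part (i) containing 128.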
Example 2
Step 2. Convert the observed results into z units. (Calculate the test statistic.)
$Z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} = \frac{3.6 - 4}{1/\sqrt{40}} = -2.53$
Step 3. Write down the critical values. (A sketch also helps.)
Critical values: z = -1.96 and z = 1.96 (2.5% in each tail). Reject H0 if the test statistic lies outside this range; otherwise fail to reject H0.
“The p-value is very small – there is only a 1.2% chance that the
deviation from the 4 kg stated is due to sampling variability. This is
very strong evidence for rejecting the company’s claim.”
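A two-tailed p-value for this example can be computed from the test statistic as a rough check. This sketch assumes the p-value is the area in both tails beyond |z|; it gives a value a little above 1%, in line with the roughly 1.2% quoted, with the exact figure depending on rounding and on the tables used.

```python
from math import sqrt
from statistics import NormalDist

# Values from the test statistic: sample mean 3.6, claimed mean 4, sigma 1, n 40.
z = (3.6 - 4) / (1 / sqrt(40))

# Two-tailed p-value: twice the area to the left of -|z|.
p_value = 2 * NormalDist().cdf(-abs(z))

print(round(z, 2), f"{p_value:.1%}")
```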
Your Turn
Question 1
The mean hourly wage in an EU country is €10. A sample of 35 individuals in the capital city
of the country has a mean hourly wage of €10.83 with a standard deviation of €3.35 per hour.
(i) Construct a 95% confidence interval for the mean hourly wage in the capital city.
Interpret this interval.
(ii) Is there evidence to suggest that hourly wages for workers in the capital city are
different from the national hourly wage?
Test the hypothesis using a 5% level of significance.
Clearly state the null and alternative hypotheses and your conclusion.
Give a p-value for this hypothesis test and interpret this p-value.
Question 1: Solution
(i) n = 35, s = 3.35, x̄ = 10.83
95% confidence interval: $\bar{x} \pm 1.96\left(\frac{s}{\sqrt{n}}\right)$
95% confidence interval: $10.83 \pm 1.96\left(\frac{3.35}{\sqrt{35}}\right)$
10.83 ± 1.11
[9.72, 11.94]
This means that the mean hourly wage (μ) for workers in the capital city lies in the range €9.72 to €11.94
with 95% certainty.
This range includes the mean hourly rate for the country (€10).
Question 1: Solution
Step 2. Convert the observed results into z units. (Calculate the test statistic.)
$Z = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{10.83 - 10}{3.35/\sqrt{35}} = 1.466$
Step 3. Write down the critical values. (A sketch also helps.)
Critical values: z = -1.96 and z = 1.96 (2.5% in each tail). Reject H0 if the test statistic lies outside this range; otherwise fail to reject H0.
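The confidence interval, test statistic and p-value for this question can be computed together. This is a sketch under the same assumptions as the worked solution (the sample standard deviation stands in for σ, and the test is two-tailed at the 5% level); the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

# Values from the question: sample mean 10.83, national mean 10, s = 3.35, n = 35.
x_bar, mu0, s, n = 10.83, 10, 3.35, 35

standard_error = s / sqrt(n)

# Part (i): 95% confidence interval.
low = x_bar - 1.96 * standard_error
high = x_bar + 1.96 * standard_error

# Part (ii): test statistic and two-tailed p-value.
z = (x_bar - mu0) / standard_error
p_value = 2 * (1 - NormalDist().cdf(abs(z)))
reject = abs(z) > 1.96

print(round(low, 2), round(high, 2), round(z, 3), round(p_value, 2), reject)
```

Since 1.466 lies between -1.96 and 1.96 and the p-value exceeds 0.05, this sketch fails to reject H0, which matches the interval in part (i) containing €10.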
Question 2: Solution
Step 2. Convert the observed results into z units. (Calculate the test statistic.)
$Z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} = \frac{0.721 - 0.725}{0.01/\sqrt{50}} = -2.83$
Step 3. Write down the critical values. (A sketch also helps.)
Critical values: z = -1.96 and z = 1.96 (2.5% in each tail). Reject H0 if the test statistic lies outside this range; otherwise fail to reject H0.