Lecture Slides 3a - Statistical Testing


Lecture 3

Statistical testing

- Statistical testing
- One-sample t-test

Adapted from dr. ir. P. Heijnen (TU Delft)

Statistical testing

• Normal distribution

• Standard normal distribution

• Confidence intervals

• The one-sample t-test

The normal distribution
Normal distribution
• X is a continuous random variable and has a normal
distribution with average μ and standard deviation σ

μ average
σ standard deviation
σ² variance
Standard percentages of
the normal distribution

[Figure: normal curve with the horizontal axis marked at μ − 3σ, μ − 2σ, μ − σ, μ, μ + σ, μ + 2σ, μ + 3σ]
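The standard percentages can be checked numerically. The following is a minimal sketch using Python's standard-library `statistics.NormalDist`; the helper name `share_within` is our own and does not come from the lecture.

```python
from statistics import NormalDist


def share_within(k: float) -> float:
    """Probability that a normal variable lies within k standard deviations of its mean."""
    z = NormalDist()  # standard normal: mean 0, standard deviation 1
    return z.cdf(k) - z.cdf(-k)


for k in (1, 2, 3):
    print(f"within {k} standard deviation(s) of the mean: {share_within(k):.1%}")
```

Because every normal distribution can be standardized, these shares hold for any μ and σ.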

Normal distribution
Example shopping center

• On average, visitors of a shopping center live at a
distance of 4.6 km from the center (σ = 3.06)

[Figure: normal curve with μ = 4.6 and σ = 3.06]
The probability of an interval of X
Example shopping center

• What is the probability that a visitor of the shopping
center lives within a distance of 2.5 km of the center?

[Figure: normal curve with μ = 4.6 and σ = 3.06, with the point x = 2.5 marked]
Standard normal distribution
Standard normal distribution

Transformation of X => Z: z = (x − μ) / σ

• What is the mean of z?

• What is the standard deviation of z?

All probabilities
known in tables

Standard normal distribution
Example shopping center

What is the z-value of a distance of 2.5 km (μ = 4.6
and σ = 3.06)?

z = (x − μ) / σ = (2.5 − 4.6) / 3.06 = -0.686

Look up in table
P(Z ≤ -0.686) ≈ 0.245

P(X ≤ 2.5) ≈ 0.245
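The table lookup can be reproduced in code. This is a small sketch using the standard-library `statistics.NormalDist`; the variable names are our own.

```python
from statistics import NormalDist

mu, sigma = 4.6, 3.06   # mean and standard deviation of the distance (km), from the slides
x = 2.5                 # distance of interest (km)

z = (x - mu) / sigma                      # standardize: z = (x - mu) / sigma
p_table = NormalDist().cdf(z)             # P(Z <= z) for the standard normal
p_direct = NormalDist(mu, sigma).cdf(x)   # the same probability without standardizing

print(f"z = {z:.3f}")
print(f"P(X <= {x}) = {p_direct:.3f}")
```

The computed probability can differ slightly in the third decimal from a printed table, because a table lookup rounds z to two decimals first.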

Probability P(Z ≤ z)
Example Shopping Center
• Which % of visitors of the shopping center live within
a distance of 5 km from the center?

Look up in table

Approximately 55%
Probability P(Z > z)
Example Shopping Center

• Which % of visitors of the shopping center live more

than 5 km from the center?
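Both 5 km questions follow from one cumulative probability, since P(X > x) = 1 − P(X ≤ x). A minimal sketch with the standard-library `statistics.NormalDist` (variable names are our own):

```python
from statistics import NormalDist

distance = NormalDist(mu=4.6, sigma=3.06)   # distance in km, values from the slides

p_within = distance.cdf(5.0)    # P(X <= 5)
p_beyond = 1.0 - p_within       # P(X > 5) is the complement

print(f"within 5 km:    {p_within:.1%}")
print(f"more than 5 km: {p_beyond:.1%}")
```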
Confidence intervals
Confidence intervals

• In a sample, we find an average of x̄

• Can we say that x̄ represents the average of the
population?

• Yes, as a best guess, but how confident are we?

• The smaller the sample, the less confident we are


• Average
  • Sample: x̄
  • Population: μ
  • So, the average of the sample is an estimate of
    the average of the population

• Standard deviation
  • Sample: s
  • Population: σ
Confidence interval - the problem
• Find an interval [x1, x2] such that the population average lies
within [x1, x2] with 95% confidence (95% is often used)

• For the shopping example

• What is the 95% confidence interval for the
estimated average distance to the center (x̄ = 4.6)?
If we drew many
samples, we would get a
distribution of sample means

[Figure: distribution of sample means centered at x̄ = 4.6 km, with the interval bounds x1 and x2]
Confidence interval

So, we are looking for the values x1 and x2

For the standard normal distribution
we know the critical values z1 and z2

[Figure: standard normal curve; 95% of the area lies between z1 = -1.96 and z2 = 1.96]
Confidence interval
To translate this to x values
we must know the variance of
the distribution of sample means
Standard deviation of sample means

Standard deviation of sample averages?

σ_x̄ = σ / √n

σ = standard deviation in the population
n = sample size

So, the standard deviation of sample averages is smaller
than σ by a factor √n
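That sample averages have standard deviation σ / √n can be illustrated by simulation. The sketch below draws repeated samples from a normal population. The sample size n = 80 is taken from the later shopping-center slides, while the number of repetitions and the seed are arbitrary choices of ours.

```python
import math
import random
import statistics

random.seed(1)                  # fixed seed so the run is repeatable
mu, sigma, n = 4.6, 3.06, 80    # population values from the slides; n as used later
n_samples = 2000                # number of repeated samples (arbitrary choice)

# Draw many samples of size n and keep the mean of each one.
sample_means = [
    statistics.fmean(random.gauss(mu, sigma) for _ in range(n))
    for _ in range(n_samples)
]

observed = statistics.stdev(sample_means)   # spread of the simulated sample means
predicted = sigma / math.sqrt(n)            # theoretical value sigma / sqrt(n)

print(f"observed standard deviation of sample means: {observed:.3f}")
print(f"predicted sigma / sqrt(n):                   {predicted:.3f}")
```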
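So, we have the formula

z = (x̄ − μ) / (σ / √n)

Confidence interval

... but we don't know the population
standard deviation σ, we only know the sample
standard deviation s

Let's for the moment assume that the
population variance equals the sample
variance, so we assume σ = s

[Figure: standard normal curve; 95% of the area lies between z1 = -1.96 and z2 = 1.96]

Then we get

z = (x̄ − μ) / (s / √n)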
Confidence interval – population variance known
Example shopping center

Using the formula for z, we have

x1 = x̄ − 1.96 · s / √n
x2 = x̄ + 1.96 · s / √n

[Figure: standard normal curve with z1 = -1.96 and z2 = 1.96]

Solving for x1 and x2

[Figure: distribution of sample means around x̄ = 4.6 km, with x1 = 3.92 and x2 = 5.28]
Confidence interval

Conclusion: the 95% confidence interval for average

distance to the center is: [3.92, 5.28]
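The z-based interval can be recomputed as follows. This is a sketch under two assumptions: the sample size n = 80 is inferred from "df = 80 - 1" on a later slide, and the sample standard deviation s = 3.06 stands in for σ, as the slides propose.

```python
import math
from statistics import NormalDist

mean, s, n = 4.6, 3.06, 80             # n = 80 is an assumption (from df = 80 - 1)
z_crit = NormalDist().inv_cdf(0.975)   # critical value, about 1.96
std_error = s / math.sqrt(n)           # standard deviation of sample averages

x1 = mean - z_crit * std_error
x2 = mean + z_crit * std_error
print(f"95% confidence interval: [{x1:.2f}, {x2:.2f}]")
```

With these inputs the bounds agree with the slide's interval to within roughly 0.01; the small gap presumably comes from rounding in the original calculation.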
BUT ... the population variance is unknown

• We do not know the population variance, but we do
have an estimate, namely the variance we find in the
sample

• Because we estimate the population variance, there is
more uncertainty

• As a consequence, we cannot use the standard normal
distribution

• Instead, we must use a slightly different distribution,
known as the Student t-distribution
Student t-distribution

• Bell-shaped curve, around 0

• Larger variance than the
standard normal distribution

• Takes into account the larger
uncertainty since σ is also
estimated by the sample standard deviation s

• Probability density function has parameter (N - 1),
the number of degrees of freedom

Confidence interval – population variance unknown
Example shopping center
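df = 80 - 1 = 79

The formula for t is the
same as for z

[Figure: t-distribution centered at t = 0; the critical values -1.99 and 1.99
correspond to 95% for the t-distribution with df = 80 - 1]

Solving for x1 and x2 gives a larger interval

• Conclusion: the 95% confidence interval for average
distance to the center is: [3.91, 5.29]

[Figure: interval from about 3.9 to 5.3 around x̄ = 4.6]

The sample size is large enough, so
approximately the same interval is found as with the standard normal distribution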
Summary of steps
Calculate a 95% confidence interval

• Use the Student t-distribution since the population
variance is unknown
1. Calculate the degrees of freedom as df = N – 1
2. Given df, determine the critical value of t for a 95%
confidence interval – this is t0.975
3. Calculate the standard deviation of sample averages: s / √N
4. Given the sample average x̄, calculate the interval [x1, x2] as:
x1 = x̄ − t0.975 · s / √N
x2 = x̄ + t0.975 · s / √N
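The four steps can be sketched as one small function. The function name `t_confidence_interval` is hypothetical, and the critical value is passed in as a number read from a t-table (here 1.99 for df = 79, as on the shopping-center slide) rather than computed.

```python
import math


def t_confidence_interval(mean: float, s: float, n: int, t_crit: float) -> tuple[float, float]:
    """Return [x1, x2] = mean -/+ t_crit * s / sqrt(n)."""
    std_error = s / math.sqrt(n)   # step 3: standard deviation of sample averages
    margin = t_crit * std_error    # half-width of the interval
    return mean - margin, mean + margin   # step 4


n = 80
df = n - 1       # step 1: degrees of freedom
t_crit = 1.99    # step 2: t0.975 for df = 79, value taken from the slide
x1, x2 = t_confidence_interval(mean=4.6, s=3.06, n=n, t_crit=t_crit)
print(f"df = {df}, 95% confidence interval: [{x1:.2f}, {x2:.2f}]")
```

With these inputs the result agrees with the interval on the slide to within about 0.01; a difference of that size can come from rounding of the inputs.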
Calculating a confidence interval in SPSS

Travel distance in Netherlands (Verplaatsingsafstand in Nederland)

                                                  Statistic   Std. Error
Mean                                                 123.90         .538
95% Confidence Interval for Mean   Lower Bound       122.84
                                   Upper Bound
5% Trimmed Mean                                       81.49
Median                                                35.00
Variance                                          64130.129
Std. Deviation                                      253.239
Minimum                                                   1
Maximum                                                6950
Range                                                  6949
Interquartile Range                                     110
Skewness                                              5.033         .005
Kurtosis                                             40.666         .010

Lower and upper bound of 95% confidence interval

for variable Travel distance in Netherlands
One-sample t-test:

Student t-test for averages

The concept of hypothesis testing
Example: body length

• Someone says the average length of an adult person in

the Netherlands is 1.70 m

• In a sample (n = 100) we find an average length of

1.75 m and a standard deviation of 0.15 m

• Do we believe the person?

• We use a statistical test to make a decision

• Assume the person is right; then what is the probability
that we find an average in our sample that differs as
strongly as 1.75 m does from this claimed average?

Quite extreme, so it
seems unlikely

[Figure: distribution of sample averages centered at μ0 = 1.70]
Assuming the claim is right
• We want to know the probability that we find an
average in our sample that differs as strongly as 1.75 m
does from this claimed average

• So, on both sides

[Figure: distribution of sample averages centered at μ0 = 1.70, with x̄ = 1.65 and x̄ = 1.75 marked]

• If the probability is smaller than 5% we decide not to
believe the person (this percentage is often chosen)
Translation to formal terms
• The average length is 1.70 m
• Null hypothesis (H0)

• The average length is not equal to 1.70 m

• Alternative hypothesis (H1)

• The maximum probability of making a wrong decision
that we still accept is 5%
• Alpha (α = 5%)

• The probability that we find an average length of 1.75 m,
or one deviating at least as strongly from 1.70 m, in the sample
while the null hypothesis is true
• p-value
The way the test can be performed -
confidence intervals

• Calculate a 95% confidence interval [μ1, μ2] around

the test value
• If the sample average falls outside the interval then
reject the null hypothesis

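Because we estimate the population standard deviation by the
sample standard deviation, we should use a
Student t-distribution

[Figure: distribution around μ0 = 1.70 with bounds μ1 and μ2; 2.5% of the area lies in each tail]

df = N – 1 = 99

[Figure: t-distribution with df = 99; the critical values -1.98 and 1.98
leave 2.5% in each tail and correspond to 95%]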

• We have found a mean of 1.75 m

What is the t-value of this mean?
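• the t-value is the standardized value just as the z-value,
but then for the t-distribution

• calculated as:

t = (x̄ − μ0) / (s / √N)

with x̄ the sample average and μ0 the test value

• For the sample we find

t = (1.75 − 1.70) / (0.15 / √100) = 0.05 / 0.015 = 3.33

The denominator s / √N = 0.015 is very small, which is
typical for large N

[Figure: t-distribution with df = N – 1 = 99; the critical values -1.98 and 1.98
leave 2.5% in each tail and correspond to 95%]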

• Because 3.33 > 1.98, we reject the null hypothesis and
accept the alternative hypothesis that the average is
different from 1.70 m
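The test on the body-length example can be written out in a few lines. This is a minimal sketch; the function name `one_sample_t` is our own, and the critical value 1.98 is the table value quoted on the slides for df = 99.

```python
import math


def one_sample_t(sample_mean: float, test_value: float, s: float, n: int) -> float:
    """t = (sample average - test value) / (s / sqrt(n))."""
    return (sample_mean - test_value) / (s / math.sqrt(n))


t_value = one_sample_t(sample_mean=1.75, test_value=1.70, s=0.15, n=100)
t_crit = 1.98                        # two-tailed critical value for df = 99, alpha = 5%
reject_h0 = abs(t_value) > t_crit    # two-tailed decision rule

print(f"t = {t_value:.2f}, reject H0: {reject_h0}")
```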
Choosing the alternative hypothesis
• In the example we tested whether the average is
different from 1.70 m

• We could also test whether the average length is

larger than 1.70 m (instead of just different)

• This makes sense if we are interested in that particular


• Then the alternative hypothesis is:

• The average length is larger than 1.70 m

• Does this make a difference for the test?

• Because we now test whether it is larger we look at
one side

[Figure: t-distribution with df = N – 1 = 99; the area to the left of the critical value 1.66
corresponds to 95%, leaving 5% in the right tail]

• Because 3.33 > 1.66, we reject the null hypothesis

and accept the alternative hypothesis that the actual
average is larger than 1.70 m
Summary (α = 5%) – we have three options

H1: μ < μ0 (5% in the left tail)
Called one-tailed test

H1: μ > μ0 (5% in the right tail)
Called one-tailed test

H1: μ ≠ μ0 (2.5% in each tail)
Called two-tailed test
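The three decision rules can be put side by side in code. This sketch hard-codes the critical values quoted on the slides for df = 99 (1.98 two-tailed, 1.66 one-tailed); the function name and the labels "different", "larger" and "smaller" are our own.

```python
def reject_h0(t_value: float, alternative: str) -> bool:
    """Decision at alpha = 5% for df = 99, using critical values from the slides."""
    t_two_tailed = 1.98   # 2.5% in each tail
    t_one_tailed = 1.66   # 5% in a single tail
    if alternative == "different":   # H1: mu != mu0
        return abs(t_value) > t_two_tailed
    if alternative == "larger":      # H1: mu > mu0
        return t_value > t_one_tailed
    if alternative == "smaller":     # H1: mu < mu0
        return t_value < -t_one_tailed
    raise ValueError(f"unknown alternative: {alternative!r}")


for alt in ("different", "larger", "smaller"):
    print(f"t = 3.33, H1 = {alt}: reject H0 = {reject_h0(3.33, alt)}")
```

A t-value between 1.66 and 1.98 shows why the choice matters: `reject_h0(1.8, "larger")` is true while `reject_h0(1.8, "different")` is false.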

Another way the test can be
performed - p-value
• The p-value is the probability of finding this t-value, or a more
extreme one, while the null hypothesis is true

• What is the probability that we find 1.75 m or
anything as strongly deviating from 1.70 m?

[Figure: distribution of sample averages centered at μ0 = 1.70, with x̄ = 1.65 and x̄ = 1.75 marked]

The t-value is t = 3.33
The degrees of freedom is
df = 99

The corresponding p-value

is p = 0.0012

On the internet, p-value calculators are available, for
example https://www.graphpad.com/quickcalcs/pvalue1.cfm

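In this case, we did a two-tailed test. Does it make a
difference when instead we would do a one-tailed test
(when the alternative hypothesis says the average is larger instead
of just different)?
Yes, that makes a difference

Two-tailed test: both tails count, each with area p1
The p-value is
p = p1 + p1 = 0.0012
[Figure: distribution centered at μ0 = 1.70; tails beyond x̄ = 1.65 and x̄ = 1.75, each with area p1]

One-tailed test: only the upper tail counts
The p-value is
p = p1 = 0.0006
[Figure: distribution centered at μ0 = 1.70; tail beyond x̄ = 1.75 with area p1]

So, in a one-tailed test the p-value is half as large!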
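Instead of an online calculator, the p-values can be approximated with the standard library alone. The sketch below integrates the Student t density numerically with Simpson's rule; the function names are our own, and a statistics package would normally be used for this. The one-tailed value applies to the alternative "larger", which matches the positive t-value here.

```python
import math


def t_pdf(x: float, df: int) -> float:
    """Density of the Student t-distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def upper_tail(t_value: float, df: int, steps: int = 2000) -> float:
    """P(T > |t_value|) via Simpson's rule on [0, |t_value|]; steps must be even."""
    b = abs(t_value)
    h = b / steps
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    # Half of the probability mass lies above 0; subtract the part between 0 and |t|.
    return 0.5 - total * h / 3


t_value = (1.75 - 1.70) / (0.15 / math.sqrt(100))   # 3.33, as on the slides
p_one_tailed = upper_tail(t_value, df=99)
p_two_tailed = 2 * p_one_tailed                     # both tails are equally large

print(f"one-tailed p = {p_one_tailed:.4f}")
print(f"two-tailed p = {p_two_tailed:.4f}")
```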

Student t-test
Another example: satisfaction measurement

• In a survey, we ask a sample of N = 50 visitors to indicate their
satisfaction on a 5-point scale (1 = very dissatisfied, 5 = very
satisfied)

• Null hypothesis: visitors are neutral (i.e., neither satisfied nor
dissatisfied): μ0 = 3.0

• Alternative hypothesis: visitors are not neutral: μ ≠ 3.0

• We find a mean score of x̄ = 3.45 with standard deviation s =
1.51, so our estimate of μ is 3.45

• Can we conclude that on average visitors are not neutral?

Student t-test SPSS
Example satisfaction measurement
One-Sample Test

Test Value = 3.0 (H0)

                                                               95% Confidence Interval
                                                    Mean           of the Difference
                     t       df   Sig. (2-tailed)   Difference     Lower      Upper
Satisfaction score   2.107   49   .0403             0.45           0.025      0.875

• df: degrees of freedom, N - 1
• Sig. (2-tailed): p-value, the probability of this t-value (or a more extreme one) if H0 is true
• 95% confidence interval of the difference: if 0 lies in the interval, then H0 is not rejected
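The t-value in the SPSS output can be checked by hand from the summary numbers on the previous slide. In this sketch the p-value is copied from the output rather than computed, and the variable names are our own.

```python
import math

n, mean, s = 50, 3.45, 1.51   # sample size, mean score and standard deviation
test_value = 3.0              # mu0 under the null hypothesis
alpha = 0.05

df = n - 1
t_value = (mean - test_value) / (s / math.sqrt(n))

p_value = 0.0403              # Sig. (2-tailed), copied from the SPSS output
reject_h0 = p_value < alpha   # reject H0 when the p-value is below alpha

print(f"df = {df}, t = {t_value:.3f}, reject H0: {reject_h0}")
```

Since p = .0403 is below α = 5%, the null hypothesis is rejected, which agrees with 0 lying outside the reported confidence interval of the difference.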
Critical t-values

[Table: critical t-values for alpha = 5%, including the values for large samples]
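Critical t-values for α = 5% can be approximated with the standard library by integrating the t density and searching for the quantile with bisection. This is a rough sketch with function names of our own; the degrees of freedom in the loop are illustrative choices (79 and 99 are the values used in this lecture).

```python
import math


def t_pdf(x: float, df: int) -> float:
    """Density of the Student t-distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def t_cdf(x: float, df: int, steps: int = 1000) -> float:
    """P(T <= x) for x >= 0, via Simpson's rule on [0, x]; steps must be even."""
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3


def t_critical(p: float, df: int) -> float:
    """Value t with P(T <= t) = p, for p > 0.5, found by bisection on [0, 50]."""
    lo, hi = 0.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


print("df     two-tailed (t0.975)   one-tailed (t0.95)")
for df in (10, 30, 79, 99, 1000):
    print(f"{df:<6d} {t_critical(0.975, df):<21.3f} {t_critical(0.95, df):.3f}")
```

For large df the two-tailed value approaches the standard normal value 1.96, which is why the t-based and z-based intervals nearly coincide for large samples.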
Summary of important concepts

• Student t-distribution: population variance is estimated
→ more uncertainty compared to z-distribution

• Confidence interval for an estimate of population

average based on a sample

• One-sample t-test:
• Is the average different from a known average?

• One-tailed or two-tailed testing depends on the

formulation of the alternative hypothesis
