Lecture Notes Week 1

228371

Statistical Modelling for Engineers & Technologists


LECTURE SET 1: Foundations of Inference
Lecture 1 (Monday 11am) Revision, Sampling, CLT, Inference ← We are here
Lecture 2 (Weds 8am) Confidence intervals
Lecture 3 (Thurs 12pm) Hypothesis testing
Lab (2 hours on Friday) “Confidence intervals & Hypothesis testing”

Lab1 (Friday 8am – 10am) OR Lab2 (Friday 12pm – 2pm)


228371 Lecture set 1: Foundations of Inference
Lecture 1: Revision, Sampling, CLT, Inference

Probability basics
Sets X = {1, 2, 6, 12, 15} 2∈X 3∉X

Random experiments Multiple outcomes (>1), which outcome you get is unknown

Discrete vs Continuous sample spaces Discrete: specific set of values


Continuous: any value within the range

Probability in practice → relative frequency 20 %


228371 Lecture set 1: Foundations of Inference
Lecture 1: Revision, Sampling, CLT, Inference

Probability basics
Conditional probability P(being green)

P(being green | it is a camel) < P(being green | it is a chameleon)


228371 Lecture set 1: Foundations of Inference
Lecture 1: Revision, Sampling, CLT, Inference
Probability basics
Conditional probability

[Venn diagram: engineering students and female students within all students]

Independence?
228371 Lecture set 1: Foundations of Inference
Lecture 1: Revision, Sampling, CLT, Inference

Probability basics
Law of total probability

[Venn diagram: engineering students and female students within all students]


P(A) = P(A ∩ B) + P(A ∩ Bᶜ)

P(female) = P(female ∩ engineering) + P(female ∩ NOT engineering)


228371 Lecture set 1: Foundations of Inference
Lecture 1: Revision, Sampling, CLT, Inference

Probability basics
Law of total probability

[Venn diagram: all students partitioned into Engineering, Maths, Science, Art, Music]

P(A) = Σₙ P(A | Bₙ) P(Bₙ)

P(female) = P(female | engineering)·P(engineering) + P(female | maths)·P(maths) + P(female | music)·P(music) + …

Here P(engineering) is the ratio of engineering students to all students.
228371 Lecture set 1: Foundations of Inference
Lecture 1: Revision, Sampling, CLT, Inference

Probability basics
Bayes’ theorem: P(A|B) → P(B|A), e.g., P(female | engineer) → P(engineer | female)

P(A|B) = P(A ∩ B) / P(B)    P(female | engineer) = P(female AND engineer) / P(engineer)

P(B|A) = P(B ∩ A) / P(A)    P(engineer | female) = P(female AND engineer) / P(female)

[Venn diagram: engineering students and female students within all students]
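The two conditional probabilities above can be connected numerically. A minimal sketch in Python, using hypothetical proportions (the 30 %, 50 % and 25 % figures are made up for illustration, not from the lecture):

```python
from fractions import Fraction

# Hypothetical proportions: suppose 30% of all students are engineers,
# half of all students are female, and 25% of engineering students are female.
p_engineer = Fraction(3, 10)              # P(engineer)
p_female = Fraction(1, 2)                 # P(female)
p_female_given_engineer = Fraction(1, 4)  # P(female | engineer)

# Bayes: P(engineer | female) = P(female | engineer) * P(engineer) / P(female)
p_engineer_given_female = p_female_given_engineer * p_engineer / p_female
# -> 3/20 = 0.15
```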
228371 Lecture set 1: Foundations of Inference
Lecture 1: Revision, Sampling, CLT, Inference

Probability basics
Random Variables

Sampling

How long are the leaves of this tree?

Data are the results of the realisation of a random variable


228371 Lecture set 1: Foundations of Inference
Lecture 1: Revision, Sampling, CLT, Inference

Random Sampling
Each unit in the population is equally likely to end up being picked.

Random Sample: problem – not enough information on people with pink hair
Stratified Sample: ensures the pink-hair subgroup is adequately represented in the sample

Key to understanding how one random sample will differ from the next, and for extrapolating from sample to population
228371 Lecture set 1: Foundations of Inference
Lecture 1: Revision, Sampling, CLT, Inference

Probability basics
Distributions: statistical models of our data
Which one to use depends on the type of data (discrete / continuous), and the shape of the data

DISCRETE: Binomial How many successes? how many failures?


Geometric How many failures until you win?

CONTINUOUS: Normal Symmetrical. Very common.


Exponential Useful for time. How long will this bulb stay on until it breaks?

We use distributions to calculate probabilities.

See notes from last year’s course 228271 for a refresher + equations ☺
[Figure: cumulative distribution function F(x) vs x]
228371 Lecture set 1: Foundations of Inference
Lecture 1: Revision, Sampling, CLT, Inference

Random vs Haphazard Sampling


Q -> Will rats with different treatments take different times to run through a maze?

Haphazard sampling -> choose which rats to get the treatment by “reaching in and grabbing a rat without looking”
Might “feel” random, but maybe the less energetic rats are easier to catch? Or those with shorter legs?

Random sampling -> label your rats with numbers, pick numbers at random, those rats get the treatment.

Moral: formal assignment of random numbers to the experimental units is the way to go.
228371 Lecture set 1: Foundations of Inference
Lecture 1: Revision, Sampling, CLT, Inference

Central Limit Theorem (CLT)


If ten of us go and collect 20 leaves from the tree,
we will have ten samples, and ten means for leaf length.

CLT says that the distribution of these means:


1. is a normal distribution,
2. centred on the true mean (≈ the mean of the means),
3. with width set by the population spread and the sample size (larger sample, smaller width).

CLT is useful because it gives you an idea of how to get from sample estimates, to population estimates
Foundation of most inferences on this course

You must have a RANDOM SAMPLE
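The CLT claims above can be checked by simulation. A quick sketch (not from the slides) drawing many random samples from a Uniform(0, 1) population, whose true mean is 0.5:

```python
import random
from statistics import mean, stdev

# CLT sketch: take 1000 samples of size 20 from Uniform(0, 1)
# and look at the distribution of the 1000 sample means.
random.seed(1)
means = [mean(random.random() for _ in range(20)) for _ in range(1000)]

centre = mean(means)   # ~0.5, the true population mean
spread = stdev(means)  # ~ sigma/sqrt(n) = (1/sqrt(12))/sqrt(20) ~ 0.065
```

Increasing the sample size from 20 shrinks `spread`, exactly as point 3 says.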


228371 Lecture set 1: Foundations of Inference
Lecture 1: Revision, Sampling, CLT, Inference

Statistical Inference
Learning about a population or process from samples

1. Get data
2. Fit distribution
3. Make probabilistic statements

Inference – to come to a conclusion using evidence and reasoning, under conditions of uncertainty

Statistical inference – to come to a conclusion about a population, based on sample(s) and distributions, using probabilities

To make these statements we often use confidence intervals…


228371 Lecture set 1: Foundations of Inference
Lecture 1: Revision, Sampling, CLT, Inference

Confidence Intervals – an interval built from the sample so that, over many repeated samples, it captures the true population value a certain proportion of the time.
Want to make statements like: “With 95 % confidence, the population proportion lies between x and y”

If we did the experiment many, many times and built our confidence intervals, 95 % of those intervals would contain the true population value.

The more confident you want to be, the wider the interval will be (95 % ≈ ± 2 standard errors).

The larger the sample, the smaller the interval will be (CLT).

BUT we usually only do the experiment once!


228371 Lecture set 1: Foundations of Inference
Lecture 1: Revision, Sampling, CLT, Inference

Confidence Intervals – an interval built from the sample so that, over many repeated samples, it captures the true population value a certain proportion of the time.

Some confidence intervals DO NOT include the true mean (the other 5%)

Next lecture: how to calculate confidence intervals
228371
Statistical Modelling for Engineers & Technologists
LECTURE SET 1: Foundations of Inference

Lecture 1 (Monday 11am) Revision, Sampling, CLT, Inference


Lecture 2 (Weds 8am) Confidence intervals We are here
Lecture 3 (Thurs 12pm) Hypothesis testing
Lab (2 hours on Friday) “Confidence intervals & Hypothesis testing”

Lab1 (Friday 8am – 10am) OR Lab2 (Friday 12pm – 2pm)


228371 Lecture set 1: Foundations of Inference
Lecture 2: Confidence Intervals

Confidence Intervals – an interval built from the sample so that, over many repeated samples, it captures the true population value a certain proportion of the time.

Sample estimate ± Multiplier x Standard error


[Figure: sampling distribution F(x) with the true value, the sample estimate, and a 90% confidence interval]
228371 Lecture set 1: Foundations of Inference
Lecture 2: Confidence Intervals

Confidence Intervals – an interval built from the sample so that, over many repeated samples, it captures the true population value a certain proportion of the time.

Sample estimate ± Multiplier x Standard error

p̂ ± z* √( p̂(1 − p̂) / n )
228371 Lecture set 1: Foundations of Inference
Lecture 2: Confidence Intervals

Case Study: Nicotine Patches vs Zyban
Study: New England Journal of Medicine 3/4/99

p̂ ± z* √( p̂(1 − p̂) / n )
What is effective to help stop smoking?

893 participants randomly allocated to four groups:


Placebo, nicotine patch only, Zyban only, Zyban plus nicotine patch
(participants blinded to which)
Everyone got a patch (nicotine or placebo), Everyone got a pill (Zyban or placebo)
Treatments used for nine weeks

Zyban plus nicotine patch: 245 participants, 95 not smoking at end of 9 weeks.

Success: p̂ = 95/245 = 0.388
Standard error: √( p̂(1 − p̂) / n ) = √( 0.388(1 − 0.388) / 245 ) = 0.031
95 % confidence level → 𝑧 ∗ = 1.96

Confidence Interval is 0.388 ± 1.96 x 0.031 = (0.328, 0.449)

in R → prop.test(95,245)
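The same interval can be sketched in Python using only the standard library (note: R’s `prop.test` actually uses a score interval with continuity correction, so its output differs slightly from this simple Wald interval):

```python
from math import sqrt
from statistics import NormalDist

# One-proportion 95% CI for the Zyban + patch group
successes, n = 95, 245
p_hat = successes / n                  # ~0.388
se = sqrt(p_hat * (1 - p_hat) / n)     # ~0.031
z_star = NormalDist().inv_cdf(0.975)   # ~1.96 for 95% confidence

lower, upper = p_hat - z_star * se, p_hat + z_star * se
# -> approximately (0.327, 0.449)
```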
228371 Lecture set 1: Foundations of Inference
Lecture 2: Confidence Intervals

Case Study: Nicotine Patches vs Zyban
Study: New England Journal of Medicine 3/4/99

p̂ ± z* √( p̂(1 − p̂) / n )
What is effective to help stop smoking?

893 participants randomly allocated to four groups:


Placebo, nicotine patch only, Zyban only, Zyban plus nicotine patch
(participants blinded to which)
Everyone got a patch (nicotine or placebo), Everyone got a pill (Zyban or placebo)
Treatments used for nine weeks

Zyban plus nicotine patch: 245 participants, 95 not smoking at end of 9 weeks.
in R → prop.test(95, 245)
228371 Lecture set 1: Foundations of Inference
Lecture 2: Confidence Intervals

Case Study: Nicotine Patches vs Zyban


Study: New England Journal of Medicine 3/4/99
What is effective to help stop smoking?
228371 Lecture set 1: Foundations of Inference
Lecture 2: Confidence Intervals

Two notes on sample size



p̂ ± z* √( p̂(1 − p̂) / n )

1. The width of confidence intervals depends on sample size (through the standard error)

2. Intervals are based on the normal distribution (CLT); for this assumption to be reasonable we need a large enough sample size: np̂ ≥ 5 and n(1 − p̂) ≥ 5

[Figure: sampling distribution of p̂ on (0, 1); symmetric when p̂ ≈ 0.5]
228371 Lecture set 1: Foundations of Inference
Lecture 2: Confidence Intervals

Two notes on sample size



p̂ ± z* √( p̂(1 − p̂) / n )

1. The width of confidence intervals depends on sample size (through the standard error)

2. Intervals are based on the normal distribution (CLT); for this assumption to be reasonable we need a large enough sample size: np̂ ≥ 5 and n(1 − p̂) ≥ 5

DISASTER!
[Figure: with p̂ ≈ 0.05 the normal approximation spills below 0]
228371 Lecture set 1: Foundations of Inference
Lecture 2: Confidence Intervals

The difference between two proportions (Independent Samples)


[Is the difference between our two proportions significant?]


p̂₁ − p̂₂ ± z* √( p̂₁(1 − p̂₁)/n₁ + p̂₂(1 − p̂₂)/n₂ )

where z* is the value of the standard normal variable with area between −z* and z* equal to the desired confidence level.

Condition 1: Sample proportions are available based on independent, randomly selected samples from the two
populations

Condition 2: All the quantities n₁p̂₁, n₁(1 − p̂₁), n₂p̂₂, n₂(1 − p̂₂) ≥ 5


228371 Lecture set 1: Foundations of Inference
Lecture 2: Confidence Intervals

Case Study: Nicotine Patches vs Zyban


Study: New England Journal of Medicine 3/4/99

p̂₁ − p̂₂ ± z* √( p̂₁(1 − p̂₁)/n₁ + p̂₂(1 − p̂₂)/n₂ )

0.348 − 0.213 ± 1.96 √( 0.348(1 − 0.348)/244 + 0.213(1 − 0.213)/244 )

n₁ = n₂ = 244
0.135 ± 0.080 → 0.055 to 0.215
Zyban: 85/244 → p̂₁ = 0.348
Patch: 52/244 → p̂₂ = 0.213

The interval does NOT include 0, so it supports a difference between the success rates of the two methods.

in R → prop.test(c(85,52), c(244,244))
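The arithmetic on this slide can be mirrored in Python (a Wald-interval sketch; R’s `prop.test` applies a continuity correction, so its interval differs slightly):

```python
from math import sqrt
from statistics import NormalDist

# 95% CI for the difference in two independent proportions
x1, n1 = 85, 244   # Zyban
x2, n2 = 52, 244   # nicotine patch
p1, p2 = x1 / n1, x2 / n2
se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
z_star = NormalDist().inv_cdf(0.975)   # ~1.96

diff = p1 - p2                          # ~0.135
lower, upper = diff - z_star * se, diff + z_star * se
# Interval excludes 0 -> evidence of a difference in success rates
```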
228371 Lecture set 1: Foundations of Inference
Lecture 2: Confidence Intervals

Case Study: Nicotine Patches vs Zyban


Study: New England Journal of Medicine 3/4/99

p̂₁ − p̂₂ ± z* √( p̂₁(1 − p̂₁)/n₁ + p̂₂(1 − p̂₂)/n₂ )

0.348 − 0.213 ± 1.96 √( 0.348(1 − 0.348)/244 + 0.213(1 − 0.213)/244 )

in R → prop.test(c(85,52), c(244,244))
0.135 ± 0.080 → 0.055 to 0.215

Interval does NOT include 0, so it supports a difference


between the success rates of the two methods.
228371 Lecture set 1: Foundations of Inference
Lecture 2: Confidence Intervals

Confidence Intervals for Means Sample estimate ± Multiplier x Standard error

We know how to calculate mean and standard deviation (hopefully!). But both of these are SAMPLE properties.

Sample mean (x̄) and sample standard deviation (s) → population mean (μ) and population standard deviation (σ)

Because we don’t know σ, we can’t use the normal distribution.


We need a way to get sample size information into our confidence intervals.

CI for proportions → Normal distribution, z* multiplier


CI for means → t-distribution, t* multiplier

t-distribution is wider and flatter than a normal distribution, but approaches normal distribution as n → ∞
228371 Lecture set 1: Foundations of Inference
Lecture 2: Confidence Intervals

Student’s t-distribution
t = (x̄ − μ) / s.e.(x̄) = (x̄ − μ) / (s/√n) = √n (x̄ − μ) / s

𝑥ҧ : sample mean
s: sample standard deviation
n: sample size

μ: population mean
σ: population standard deviation
228371 Lecture set 1: Foundations of Inference
Lecture 2: Confidence Intervals

Confidence Intervals for Means


Sample estimate ± Multiplier x Standard error


x̄ ± t* s.e.(x̄) → x̄ ± t* s/√n

where t* is the value in a t-distribution with df = n − 1 degrees of freedom, such that the area between −t* and t* equals the desired confidence level.

Condition: the population of measurements is bell-shaped and a random sample of any size is measured, OR
Condition: the population of measurements is not bell-shaped but a large random sample is taken, n ≥ 30

Confidence level (90 %, 95 %, 98 %, 99 %) → multiplier t*
t* DEPENDS ON DEGREES OF FREEDOM (df = n − 1)
in R → qt(confidence quantile, df = ##)
e.g., for 95 %, n = 100 → qt(0.975, df = 99)
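For large df the t* multiplier approaches the normal z* multiplier, so the familiar z values are a quick sanity check on `qt()` output. A stdlib-Python sketch:

```python
from statistics import NormalDist

# z* multipliers for common confidence levels; for 95% this is ~1.96,
# while qt(0.975, df = 99) in R gives ~1.98 (slightly wider).
z = {cl: NormalDist().inv_cdf(0.5 + cl / 200) for cl in (90, 95, 98, 99)}
# z[90] ~ 1.645, z[95] ~ 1.960, z[98] ~ 2.326, z[99] ~ 2.576
```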
228371 Lecture set 1: Foundations of Inference
Lecture 2: Confidence Intervals

x̄ ± t* s/√n

Case Study: Bats and insects
How far can bats detect insects from?

100 bat-to-insect detection distances measured


Sample mean, 𝑥ҧ = 532 cm
Sample standard deviation, s = 18.1 cm
Sample size, n = 100

What is the 95% Confidence Interval for the unknown mean, μ?

in R to get t* → qt(0.975, df=99) ~ 1.98

x̄ ± t* s/√n = 532 ± 1.98 × 18.1/√100 = 532 ± 3.58

Confidence interval: (528.42, 535.58)
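The bats interval in Python (a sketch; t* = qt(0.975, df = 99) ≈ 1.98 is taken from the slide, since the t quantile function is not in the Python standard library):

```python
from math import sqrt

# One-mean 95% CI for bat detection distances
x_bar, s, n = 532, 18.1, 100
t_star = 1.98                            # from R: qt(0.975, df = 99)
margin = t_star * s / sqrt(n)            # ~3.58
ci = (x_bar - margin, x_bar + margin)    # (528.42, 535.58)
```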


228371 Lecture set 1: Foundations of Inference
Lecture 2: Confidence Intervals

Confidence Intervals for Paired data

Paired data are where members of the population are linked:


e.g., pairs of observations on same unit (before and after a treatment)
e.g., matched pairs of alike subjects (twins/brothers)

Difference in responses between them can be attributed to the treatment/change.

Interval is for the mean difference, based on differences for each pair.
228371 Lecture set 1: Foundations of Inference
Lecture 2: Confidence Intervals

Case Study: Aspirin and blood-clotting


Does aspirin influence blood-clotting ability?

12 adult males
Measured time for blood to clot BEFORE taking 2 aspirin
Measured time for blood to clot AFTER taking 2 aspirin (3 hours later)

Sample mean difference (before -> after), 𝑥ҧ = 0.1084


Sample standard deviation of differences, s = 0.5071
Sample size, n = 12

What is the 95% Confidence Interval for the unknown mean, μ?


in R to get t* → qt(0.975, df=11) ~ 2.201

x̄ ± t* s/√n = 0.1084 ± 2.201 × 0.5071/√12 = 0.1084 ± 0.322

Confidence interval: (−0.214, 0.431)
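The paired-difference interval, sketched in Python (t* = qt(0.975, df = 11) ≈ 2.201 is taken from the slide):

```python
from math import sqrt

# Paired data: 95% CI for the mean clotting-time difference
d_bar, s_d, n = 0.1084, 0.5071, 12
t_star = 2.201                           # from R: qt(0.975, df = 11)
margin = t_star * s_d / sqrt(n)          # ~0.322
ci = (d_bar - margin, d_bar + margin)    # (-0.214, 0.431)
# The interval contains 0, so there is no clear evidence at the 95%
# level that aspirin changes clotting time.
```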


228371 Lecture set 1: Foundations of Inference
Lecture 2: Confidence Intervals

Confidence Intervals for difference in two means

Depends on whether equal variance, or unequal variance.

in R → t.test(A,B, var.equal = TRUE) in R → t.test(A,B) # default is false


228371 Lecture set 1: Foundations of Inference
Lecture 2: Confidence Intervals

Confidence Intervals for difference in two means


EQUAL VARIANCE: Pooled Confidence intervals

x̄₁ − x̄₂ ± t* sₚ √( 1/n₁ + 1/n₂ )

Where t* is found using a t-distribution with df = n₁ + n₂ − 2, and sₚ is the pooled standard deviation:

sₚ = √( ( (n₁ − 1)s₁² + (n₂ − 1)s₂² ) / (n₁ + n₂ − 2) )
228371 Lecture set 1: Foundations of Inference
Lecture 2: Confidence Intervals

Confidence Intervals for difference in two means


UNEQUAL VARIANCE

x̄₁ − x̄₂ ± t* √( s₁²/n₁ + s₂²/n₂ )

Where t* is the value in a t-distribution with area between −t* and +t* equal to the desired confidence level. But the df now depends on the observed variances – tricky, and approximate (Welch’s approximation):

df ≈ ( s₁²/n₁ + s₂²/n₂ )² / ( (1/(n₁ − 1))(s₁²/n₁)² + (1/(n₂ − 1))(s₂²/n₂)² )
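Welch’s df formula is easy to evaluate directly. A Python sketch using the reading-class summary statistics that appear later in these notes (s₁ = 11.01, n₁ = 21; s₂ = 17.15, n₂ = 23):

```python
# Welch's approximation for the degrees of freedom
s1, n1 = 11.01, 21
s2, n2 = 17.15, 23
v1, v2 = s1**2 / n1, s2**2 / n2
df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
# df ~ 37.9, compared with n1 + n2 - 2 = 42 under the pooled approach
```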
228371 Lecture set 1: Foundations of Inference
Lecture 2: Confidence Intervals

Case Study: Teaching children to read


Do these new reading activities help improve reading abilities?

21 students in class 1 → take part in the new activities


23 students in class 2 → do NOT take part

At the end of 8 weeks, the students are given a Degree of Reading Power (DRP) test.

Class 1 (activity): 24 61 59 46 43 44 52 43 58 67 62 57 71 49 54 43 53 57 49 56 33
Class 2 (no activity): 42 33 46 37 43 41 10 42 55 19 17 55 26 54 60 28 62 20 53 48 37 85 42

What is the 95% Confidence Interval for the difference in the means?

Try two ways:


[1] Assume equal variance
[2] Unequal variance
228371 Lecture set 1: Foundations of Inference
Lecture 2: Confidence Intervals

Case Study: Teaching children to read


Do these new reading activities help improve reading abilities?

Class 1 (activity): 24 61 59 46 43 44 52 43 58 67 62 57 71 49 54 43 53 57 49 56 33
Class 2 (no activity): 42 33 46 37 43 41 10 42 55 19 17 55 26 54 60 28 62 20 53 48 37 85 42

[1] Assume equal variance


Class 1 average: x̄₁ = 51.48, Class 2 average: x̄₂ = 41.52    Centre: 51.48 − 41.52 = 9.96
Class 1 sd: s₁ = 11.01, Class 2 sd: s₂ = 17.15
df = n₁ + n₂ − 2 = 21 + 23 − 2 = 42    in R to get t* → qt(0.975, df=42) = 2.018

sₚ = √( ( (21 − 1)11.01² + (23 − 1)17.15² ) / (21 + 23 − 2) ) = 14.55

x̄₁ − x̄₂ ± t* sₚ √( 1/n₁ + 1/n₂ ) → 51.48 − 41.52 ± 2.018 × 14.55 √( 1/21 + 1/23 ) = (1.10, 18.82)

We are 95 % confident that the difference in mean scores is between 1.10 and 18.82: positive, but it could be small or large.
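The pooled interval can be re-derived from the raw DRP scores. A Python sketch (t* = qt(0.975, df = 42) ≈ 2.018 is taken from the slide):

```python
from math import sqrt
from statistics import mean, stdev

# Pooled-variance 95% CI for the difference in mean DRP scores
class1 = [24, 61, 59, 46, 43, 44, 52, 43, 58, 67, 62, 57, 71,
          49, 54, 43, 53, 57, 49, 56, 33]
class2 = [42, 33, 46, 37, 43, 41, 10, 42, 55, 19, 17, 55, 26,
          54, 60, 28, 62, 20, 53, 48, 37, 85, 42]
n1, n2 = len(class1), len(class2)
sp = sqrt(((n1 - 1) * stdev(class1)**2 + (n2 - 1) * stdev(class2)**2)
          / (n1 + n2 - 2))                   # pooled sd ~ 14.55
t_star = 2.018                               # from R: qt(0.975, df = 42)
diff = mean(class1) - mean(class2)           # ~9.96
margin = t_star * sp * sqrt(1 / n1 + 1 / n2) # ~8.86
# CI ~ (1.10, 18.82)
```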
228371 Lecture set 1: Foundations of Inference
Lecture 2: Confidence Intervals

Case Study: Teaching children to read


Do these new reading activities help improve reading abilities?
Class 1 (activity): 24 61 59 46 43 44 52 43 58 67 62 57 71 49 54 43 53 57 49 56 33
Class 2 (no activity): 42 33 46 37 43 41 10 42 55 19 17 55 26 54 60 28 62 20 53 48 37 85 42

[1] Assume equal variance


228371 Lecture set 1: Foundations of Inference
Lecture 2: Confidence Intervals

Case Study: Teaching children to read


Do these new reading activities help improve reading abilities?
Class 1 (activity): 24 61 59 46 43 44 52 43 58 67 62 57 71 49 54 43 53 57 49 56 33
Class 2 (no activity): 42 33 46 37 43 41 10 42 55 19 17 55 26 54 60 28 62 20 53 48 37 85 42

[2] Unequal variance

Rule of thumb:
If the standard deviations differ by less than a factor of 2, you can use either approach.
If they differ by MORE, use the unequal-variance approach.


228371
Statistical Modelling for Engineers & Technologists
LECTURE SET 1: Foundations of Inference

Lecture 1 (Monday 11am) Revision, Sampling, CLT, Inference


Lecture 2 (Weds 8am) Confidence intervals
Lecture 3 (Thurs 12pm) Hypothesis testing We are here
Lab (2 hours on Friday) “Confidence intervals & Hypothesis testing”

Lab1 (Friday 8am – 10am) OR Lab2 (Friday 12pm – 2pm)


228371 Lecture set 1: Foundations of Inference
Lecture 3: Hypothesis Testing

Hypothesis
“A hypothesis is an idea which is suggested as a possible explanation for a particular situation or
condition, but which has not been proven to be correct”

For us: Need the hypothesis to be translated into a statement about a parameter

Hypothesis test
To establish whether the hypothesis can be rejected based on evidence (data) presented

Errors
Can be wrong in two different ways - Type 1 and Type 2 errors.

PLAN:
(1) Hypothesise,
(2) Test to reject hypothesis,
(3) Think about how wrong we might be
228371 Lecture set 1: Foundations of Inference
Lecture 3: Hypothesis Testing PLAN:
(1) Hypothesise,
(2) Test to reject hypothesis,
(3) Think about how wrong we might be
Case Study: Member of a Jury
A person is accused of a crime. Under NZ law, an accused is presumed innocent until proven guilty.

(1) Two scenarios: H0: The defendant is NOT guilty, or Ha: The defendant is guilty

(2) You look at the evidence and decide whether to reject H0.

(3) Worry.

H0 rejected, Ha accepted:
Has an innocent person been falsely convicted?
Type 1 error: false positive

H0 not rejected:
Has the criminal been erroneously freed?
Type 2 error: false negative
228371 Lecture set 1: Foundations of Inference
Lecture 3: Hypothesis Testing

Null vs Alternative Hypotheses

Null hypothesis: H0 is the hypothesis that is set up to see if we can reject it or not.
Innocent until proven guilty → H0: The defendant is NOT guilty

Alternative hypothesis: Ha is the hypothesis which we accept if we reject the null.


Reject null hypothesis → Ha: the defendant is guilty

Which way round you set up your null and alternative hypotheses is based on the intent of the test:
If it is of interest to show a hypothesis is TRUE, make it the alternative hypothesis
If it is of interest to show a hypothesis is FALSE, make it the null hypothesis

Decision table
Decision H0 false H0 true
Accept H0 Type II error Correct decision
Reject H0 Correct decision Type 1 error
228371 Lecture set 1: Foundations of Inference
Lecture 3: Hypothesis Testing

Case Study: Do you have a specific disease?


You are tested for a disease.
Most tests are not 100% accurate.
The doctor must decide between two hypotheses:

(1) H0: You do NOT have the disease, or Ha: You do have the disease

Scenario A: you test negative


(2) Doctor accepts null hypothesis, declares you healthy
(3) Worry: you are POSITIVE but your test was NEGATIVE → false negative result Type 2 error

Scenario B: you test positive


(2) Doctor rejects null hypothesis, declares you sick
(3) Worry: you are NEGATIVE but your test was POSITIVE → false positive result Type I error

Which scenario is worse? Depends on the disease, and the consequence of wrong diagnosis.
228371 Lecture set 1: Foundations of Inference
Lecture 3: Hypothesis Testing

Errors & Significance Level

Type 1 error (false positive) probability: α Type 2 error (false negative) probability: β

We want α and β as small as possible; HOWEVER, there is a trade-off: for a fixed sample size, if you decrease α, you increase β.

Usually focus on type 1 errors and select α that is tolerable for the specific situation. Also call this α,
the significance level of a test – the probability of making a type 1 error.

A test with α = 0.01 is said to have “a significance level 0.01”

So, we decide how significant we want the results from our test to be, run the test, and get something
called a “p-value”. We compare the p-value to our significance level, and if p is LESS than α, we say the
results are statistically significant at the α level.

“If the p-value is LOW, the null must go”


228371 Lecture set 1: Foundations of Inference
Lecture 3: Hypothesis Testing

Test statistic is….


- How we decide between H0 and Ha
- A quantity computed from sample data to provide evidence about plausibility of H0
- if we are interested in sample proportions 𝑝Ƹ that are normally distributed → z statistic

P-value is the sum of the “as, or more, extreme” areas
[Figure: standard normal curve with tail areas beyond −z and +z shaded]

P-value is…
- the tail probability
- the probability of getting a sample as, or more, unlikely given the null hypothesis, than the sample
you actually got
- just the total amount of ‘answer’ space that represents samples as, or more, unlikely than the
result of the sample you observed.
228371 Lecture set 1: Foundations of Inference
Lecture 3: Hypothesis Testing

How does p-value help?


- To reject a hypothesis, we need to know how unlikely the observed data are
- How unlikely would it be to get values EVEN MORE extreme than what we have observed?

- For example, a p-value = 0.04 means only 4% of all possible samples are more extreme. This sample is
unlikely to occur if H0 is true, and so provides evidence against H0, and H0 can be rejected if, say,
our significance level is 0.05.

- A p-value of 0.3, means 30% of all possible samples are more extreme. This is not enough evidence
to reject H0 unless our significance level is really high (in which case our chance of type 1 errors –
false positives, will be really high)

- Commonly use p-value < 0.05 → reject H0

[Figure: one-tailed vs two-tailed rejection regions]
228371 Lecture set 1: Foundations of Inference
Lecture 3: Hypothesis Testing

Recap
Null & Alternative Hypotheses H0: claim about populations initially assumed to be true
Ha: claim that is contradictory to the null hypothesis

Type I and Type II errors Type I: false positive, reject the null but it’s true
Type II: false negative, do not reject the null but it’s false

Test statistic calculated from the sample used to provide evidence about the plausibility of H0

P-value Tail probability for which the statistic under H0 is more extreme than the observed
value of the statistic.
P-value is low, the null must go.

Significance α, the predetermined probability of making a type I error.


Reject H0 if p-value of the test is LOWER than α
228371 Lecture set 1: Foundations of Inference
Lecture 3: Hypothesis Testing PLAN:
(1) Hypothesise,
(2) Test to reject hypothesis,
(3) Think about how wrong we might be
Two-Sample z test for a Difference in Two Proportions
[Assumption: both sample sizes are large]

(1) H0: p1 = p2 Ha: p1 ≠ p2

(2) If H0 is true → use the pooled estimate of proportion: p̂ = (sum of all successes)/(total of the two samples) = (x₁ + x₂)/(n₁ + n₂)

Test statistic: z = (p̂₁ − p̂₂) / √( p̂(1 − p̂)(1/n₁ + 1/n₂) )
228371 Lecture set 1: Foundations of Inference
Lecture 3: Hypothesis Testing

Case Study: Friction-driven material degradation


You can make a product out of natural rubber or a synthetic substitute.
Are they both as resistant to friction-driven material degradation?
(1) H0: p1 = p2  Ha: p1 ≠ p2

Sample      Sample size   Number degraded
Rubber      40            27
Synthetic   15            8

Check np̂, n(1 − p̂) ≥ 5 → OK for CLT to apply

p1 = 27/40 = 0.675 p2 = 8/15 = 0.533

p̂ = (x₁ + x₂)/(n₁ + n₂) = (27 + 8)/(40 + 15) = 0.636

z = (p̂₁ − p̂₂) / √( p̂(1 − p̂)(1/n₁ + 1/n₂) ) = (0.675 − 0.533) / √( 0.636(1 − 0.636)(1/40 + 1/15) ) = 0.9727

In R → 2*pnorm(0.9727, lower = FALSE)

→ 0.3307024 (p-value)
[Figure: standard normal curve with both tails beyond ±z shaded]
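The whole z test can be sketched in Python with the standard library:

```python
from math import sqrt
from statistics import NormalDist

# Two-sample z test for a difference in proportions (rubber vs synthetic)
x1, n1 = 27, 40   # rubber: degraded, sample size
x2, n2 = 8, 15    # synthetic
p1, p2 = x1 / n1, x2 / n2
p_pool = (x1 + x2) / (n1 + n2)                      # ~0.636
z = (p1 - p2) / sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
p_value = 2 * (1 - NormalDist().cdf(z))             # two-tailed, ~0.33
# p-value > 0.05, so we cannot reject H0: p1 = p2
```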
228371 Lecture set 1: Foundations of Inference
Lecture 3: Hypothesis Testing

Case Study: Friction-driven material degradation


You can make a product out of natural rubber or a synthetic substitute.
Are they both as resistant to friction-driven material degradation?
(1) H0: p1 = p2  Ha: p1 ≠ p2

Sample      Sample size   Number degraded
Rubber      40            27
Synthetic   15            8

Check np̂, n(1 − p̂) ≥ 5 → OK for CLT to apply

ALTERNATIVELY In R →
228371 Lecture set 1: Foundations of Inference
Lecture 3: Hypothesis Testing

Case Study: Friction-driven material degradation

ALTERNATIVELY In R →

X² = Σᵢ (Oᵢ − Eᵢ)² / Eᵢ

Where Oᵢ is the observed value and Eᵢ is the expected value under independence
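A sketch of this statistic for the same 2×2 table. For a 2×2 table without continuity correction, X² equals z² from the two-proportion z test (0.9727² ≈ 0.946); note that R’s `prop.test`/`chisq.test` apply a continuity correction by default, so their reported X-squared is smaller:

```python
# Chi-squared statistic for the rubber vs synthetic 2x2 table
observed = [[27, 13],   # rubber: degraded, not degraded
            [8, 7]]     # synthetic: degraded, not degraded
row = [sum(r) for r in observed]
col = [sum(c) for c in zip(*observed)]
total = sum(row)

# Expected counts under independence: E_ij = row_i * col_j / total
chi2 = sum((observed[i][j] - row[i] * col[j] / total) ** 2
           / (row[i] * col[j] / total)
           for i in range(2) for j in range(2))
# chi2 ~ 0.946 = z^2
```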
228371 Lecture set 1: Foundations of Inference
Lecture 3: Hypothesis Testing

Hypothesis Testing
HOW TO:

(1) State your null hypothesis and alternative: H0: ### Ha: ###

(2) Find the appropriate test statistic: z, t, … (note: see last pages in Engineering Stats book)

(3) Calculate test statistic under H0 i.e., when H0 parameter value is true

(4) Determine if one or two tailed test Any directionality in Ha ?

(5) Set significance level: α = 0.05, or α = 0.01

(6) Find p-value of the test: p

(7) Accept or reject H0 using p vs α p < α → reject H0 in favour of Ha


228371 Lecture set 1: Foundations of Inference
Lecture 3: Hypothesis Testing

Hypothesis Testing about One Mean or Paired Data


HOW TO: (1) State your null hypothesis and alternative: H0: ### Ha: ###

A. H0 : μ = μ0 versus Ha : μ ≠ μ0 (two-sided)
   H0: data matches the standard situation; Ha: data doesn’t match the standard situation

B. H0 : μ = μ0 versus Ha : μ < μ0 (one-sided), or H0 : μ ≥ μ0 versus Ha : μ < μ0
   Ha: data is LESS than the standard situation

C. H0 : μ = μ0 versus Ha : μ > μ0 (one-sided), or H0 : μ ≤ μ0 versus Ha : μ > μ0
   Ha: data is MORE than the standard situation

Note: a p-value is computed assuming H0 is true, and μ0 is the value for that computation
228371 Lecture set 1: Foundations of Inference
Lecture 3: Hypothesis Testing

Hypothesis Testing about One Mean or Paired Data


HOW TO: (2) Find the appropriate test statistic z, t, …
(3) Calculate test statistic under H0

What is your data? Is it a random sample? → Can do hypothesis test

Are data approximately normal or have a large enough sample (n ≥ 30)? → CLT

t-statistic: t = (sample mean − null value)/(standard error) = (x̄ − μ0)/(s/√n)

The t-statistic has (approximately) a t-distribution with df = n − 1.
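A one-sample t statistic in Python, using a small hypothetical sample and a hypothetical null value μ0 = 5.0 (neither comes from the lecture):

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical measurements and null value, for illustration only
data = [5.2, 4.9, 5.6, 5.1, 4.8, 5.4, 5.0, 5.3]
mu0 = 5.0
n = len(data)

# t = (sample mean - null value) / standard error
t = (mean(data) - mu0) / (stdev(data) / sqrt(n))
# Compare |t| with qt(0.975, df = n - 1) in R to accept/reject H0
```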


228371 Lecture set 1: Foundations of Inference
Lecture 3: Hypothesis Testing

Hypothesis Testing about One Mean or Paired Data


HOW TO: (4) Determine if one or two tailed test READ THE QUESTION

(5) Set significance level: α = 0.05, or α = 0.01


(6) Find p-value of the test: p
(7) Accept or reject H0 using p vs α p < α → reject H0 in favour of Ha
228371 Lecture set 1: Foundations of Inference
Lecture 3: Hypothesis Testing

Hypothesis Testing: Example 1 – Are the data different to standard, μ0 = 1.5 ?

H0 : μ = μ0 versus Ha : μ ≠ μ0 (two-sided)

μ0 = 1.5

CANNOT REJECT NULL HYPOTHESIS


228371 Lecture set 1: Foundations of Inference
Lecture 3: Hypothesis Testing

Hypothesis Testing: Example 2 – Are the data greater than standard, μ0 = 25 ?

H0 : μ = μ0 versus Ha : μ > μ0 (one-sided)

μ0 = 25

CANNOT REJECT NULL HYPOTHESIS


228371 Lecture set 1: Foundations of Inference
Lecture 3: Hypothesis Testing

Hypothesis Testing: Example 3 – Are the two sample means different?


[Not assuming equal variances]

H0 : μx – μy = 0 versus
Ha : μx – μy ≠ 0 ***

REJECT NULL HYPOTHESIS at


0.05 significance level

*** same as writing H0 : μ1 = μ2 versus Ha : μ1 ≠ μ2


228371 Lecture set 1: Foundations of Inference
Lecture 3: Hypothesis Testing

Hypothesis Testing: Example 4 – Tensile Strength of Liners


[A case where we are not given the raw data]
[Not assuming equal variances]

Q: Does fusion increase average tensile strength?

H0 : μ1 = μ2 versus Ha : μ1 < μ2

Immediate issue: small sample size

Are data approximately normal or have a large enough sample (n ≥ 30)? → CLT

NEED RAW DATA….


Then can do: quantile-quantile plots → qqnorm()
Or: Shapiro-Wilk test → shapiro.test()
228371 Lecture set 1: Foundations of Inference
Lecture 3: Hypothesis Testing

Hypothesis Testing: Example 4 – Tensile Strength of Liners


Q: Does fusion increase average tensile strength?

Looking for p-value > 0.05


228371 Lecture set 1: Foundations of Inference
Lecture 3: Hypothesis Testing

Hypothesis Testing: Example 4 – Tensile Strength of Liners


Q: Does fusion increase average tensile strength?

Not awesome, but let’s carry on…


228371 Lecture set 1: Foundations of Inference
Lecture 3: Hypothesis Testing

Hypothesis Testing: Example 4 – Tensile Strength of Liners


Q: Does fusion increase average tensile strength?

H0 : μ1 = μ2 versus Ha : μ1 < μ2

REJECT NULL HYPOTHESIS


at 0.05 significance level
228371 Lecture set 1: Foundations of Inference
Lecture 3: Hypothesis Testing

Hypothesis Testing: Example 4 – Tensile Strength of Liners


Q: Does fusion increase average tensile strength?
[Assuming equal variances]
H0 : μ1 = μ2 versus Ha : μ1 < μ2

CANNOT REJECT NULL


HYPOTHESIS at 0.05
significance level
228371 Lecture set 1: Foundations of Inference
Lecture 3: Hypothesis Testing

Paired Data Example: Shoe Sole Rubber – Which is more durable?


[Assumption: Distribution of differences is normal]

Two types of rubber are being considered for use in shoe soles.
8 people wear shoes with one type of rubber (A) on one foot, and the other type on the other foot (B).
After 2 months the amount of wear is measured on each shoe.

Immediate issue: small sample size


228371 Lecture set 1: Foundations of Inference
Lecture 3: Hypothesis Testing

Paired Data Example: Shoe Sole Rubber – Which is more durable?


[Assumption: Distribution of differences is normal]

Immediate issue: small sample size


228371 Lecture set 1: Foundations of Inference
Lecture 3: Hypothesis Testing

Paired Data Example: Shoe Sole Rubber – Which is more durable?


[Assumption: Distribution of differences is normal]

PAIRED DATA – so run the Shapiro-Wilk test on the distribution of differences

Looking for p-value > 0.05


228371 Lecture set 1: Foundations of Inference
Lecture 3: Hypothesis Testing

Paired Data Example: Shoe Sole Rubber – Which is more durable?


[Assumption: Distribution of differences is normal]

CAN REJECT NULL


HYPOTHESIS at 0.05
significance level

Note: Can also just run the test with raw data and tell it that the data are paired:

t.test(dat$A, dat$B, paired = TRUE)


228371 Lecture set 1: Foundations of Inference
Lecture 3: Hypothesis Testing

Paired Data Example: Shoe Sole Rubber – Which is more durable?

If we don’t identify the data as paired:


R will run a 2-sample t-test rather than a paired t-test.

THIS IS BAD: Standard error ↑ t-statistic ↓ df ↑ p-value ↑


Less likely to detect any departure from H0

CANNOT REJECT NULL HYPOTHESIS


at 0.05 significance level
228371
Statistical Modelling for Engineers & Technologists
LECTURE SET 1: Foundations of Inference

Lecture 1 (Monday 11am) Revision, Sampling, CLT, Inference


Lecture 2 (Weds 8am) Confidence intervals
Lecture 3 (Thurs 12pm) Hypothesis testing
Lab (2 hours on Friday) “Confidence intervals & Hypothesis testing” ← We are here

Lab1 (Friday 8am – 10am) OR Lab2 (Friday 12pm – 2pm)
