
BIOE70037 - Computational and Statistical Methods for Research 2023-2024

Lecture 2: Data analysis

Chiu Fan Lee | [email protected]

Standard Error of the Mean
▪ The standard error is the standard deviation of sample means.
▪ It is a measure of how representative a sample mean is likely to be of the population mean.
▪ A large standard error (relative to the sample mean) means that there is a lot of variability between the means of different samples, and so the sample we have might not be representative of the population mean.
▪ A small standard error indicates that most sample means are similar to the population mean, and so our sample is likely to be an accurate reflection of the population.
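As a minimal sketch (Python with NumPy; the sample of 64 values is purely hypothetical), the standard error of the mean is the sample standard deviation divided by √n:

    import numpy as np

    rng = np.random.default_rng(0)
    sample = rng.normal(loc=78, scale=18, size=64)   # hypothetical sample of 64 body weights (kg)

    s = sample.std(ddof=1)                 # sample standard deviation
    sem = s / np.sqrt(sample.size)         # standard error of the mean = s / sqrt(n)
    print(f"mean = {sample.mean():.2f} kg, standard error = {sem:.2f} kg")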
Frequency Distribution
▪ Histograms are a good way of visualizing data.
▪ With enough data points, histograms may indicate the potential distribution, multimodality, and skewness.
▪ The number of bins is important:
➢ Too many bins might give a very noisy histogram.
➢ Too few bins can mask important features (see the plotting sketch after the figure below).
[Figure: histograms of two example data sets, rn1 and rn2, plotted with different numbers of bins (data value on the x-axis, frequency on the y-axis), illustrating the effect of bin choice]
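A minimal plotting sketch (Python with NumPy and Matplotlib; the data and the name rn1 are hypothetical stand-ins for the slide's example) showing how the number of bins changes what the histogram reveals:

    import numpy as np
    import matplotlib.pyplot as plt

    rng = np.random.default_rng(1)
    rn1 = rng.normal(loc=5, scale=2, size=10_000)    # hypothetical data set

    fig, axes = plt.subplots(1, 3, figsize=(12, 3))
    for ax, bins in zip(axes, (5, 30, 300)):         # too few, reasonable, too many
        ax.hist(rn1, bins=bins)
        ax.set_title(f"{bins} bins")
        ax.set_xlabel("rn1")
        ax.set_ylabel("Frequency")
    plt.tight_layout()
    plt.show()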
Frequency Distribution
▪ Normal distribution
▪ Skewed distribution
▪ Modality
Universality of the normal distribution! It is the basis of most statistical testing methods.
[Figure: normal curve with the central 95% and 99% regions marked]
Standard Deviation (σ)
[Figure: normal curve showing the 95% and 99% regions in units of σ]
Z-scores
▪ Used to convert any normal distribution, N(µ, σ), to the standard normal distribution, N(0, 1), where
➢ Mean (µ) = 0
➢ Standard deviation (σ) = 1

z = (X − µ) / σ

▪ Allows the use of probability tables (p-values)
▪ Important z-score: ±1.96 (cuts off the outer 2.5% in each tail), corresponding to a 95% CI
▪ https://www.mathsisfun.com/data/standard-normal-distribution-table.html
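As a sketch of the table lookup (Python with SciPy), the printed standard normal table can be replaced by calls to scipy.stats.norm:

    from scipy import stats

    stats.norm.cdf(1.96)    # ≈ 0.975: area to the left of z = 1.96
    stats.norm.sf(1.96)     # ≈ 0.025: upper-tail area beyond z = 1.96
    stats.norm.cdf(1.96) - stats.norm.cdf(-1.96)   # ≈ 0.95: central 95% between ±1.96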
Z-scores
▪ Example:
Every year, 50,000 runners compete in the Victoria Park Fun Run. They run 10 kilometres. The average finishing
time is 55 minutes, with a standard deviation of 10 minutes. Fred and Wilma completed the race in 61 and 51
minutes, respectively. Barney and Betty had finishing times with z-scores of -0.3 and 0.7, respectively.

List the runners in order, starting with the fastest runner and ending with the slowest runner.
Z-scores
▪ Example (continued):
This problem can be solved by converting Fred and Wilma's raw scores into z-scores. To do this, we use the z-score equation:

z = (X − X̄) / s

where z is the z-score, X is the runner's raw score, X̄ is the mean finishing time, and s is the standard deviation of finishing times.
Fred's z-score = (61 − 55) / 10 = 0.6
Wilma's z-score = (51 − 55) / 10 = −0.4
Based on z-scores, we can order the runners from fastest to slowest as follows: Wilma (z = −0.4), Barney (z = −0.3), Fred (z = 0.6), and Betty (z = 0.7).
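A small sketch (plain Python, values taken from the example) that converts the raw times to z-scores, combines them with the z-scores given in the problem, and sorts the runners:

    mu, sigma = 55, 10                      # mean and standard deviation of finishing times (minutes)

    runners = {
        "Fred":   (61 - mu) / sigma,        # raw times converted to z-scores
        "Wilma":  (51 - mu) / sigma,
        "Barney": -0.3,                     # z-scores given directly in the problem
        "Betty":   0.7,
    }

    # A lower z-score means a shorter (faster) finishing time
    for name, z in sorted(runners.items(), key=lambda item: item[1]):
        print(f"{name}: z = {z:+.2f}")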
Hypothesis Testing
▪ Null hypothesis (H0)
▪ Experimental hypothesis or alternative hypothesis (H1)
▪ The null hypothesis states the opposite of the experimental hypothesis
▪ Collect data and seek evidence against H0 as a way of bolstering H1
(deduction)
P-values
▪ The probability of obtaining a test statistic equal to, or more extreme than, the observed result when H0 is true
▪ The p-value is used in the context of null hypothesis testing to quantify the statistical significance of the evidence
▪ The P-value answers the question: what is the probability of a test statistic at least as extreme as the one observed when H0 is true?
▪ Smaller P-values provide stronger evidence against H0
P-values
▪ Example: In the 1970s, 20–29 year old men in the U.K. had a mean body weight of 78kg.
Standard deviation was 18 kg. Test whether mean body weight in the population now differs.
▪ Null hypothesis H0: μ = 78 (“no difference”)
▪ The alternative hypothesis can be either
➢ H1: μ > 78 (one-sided test)
➢ H1 : μ ≠ 78 (two-sided test)
P-values: One-sided test
▪ The critical value is either positive or negative, but not both.
▪ In this case, you would have statistical significance (p < .05) if z ≥ 1.645.
P-values: Two-sided test
▪ The critical value is the number that separates the "blue zone" from the middle (±1.96 in this example).
▪ To be statistically significant, the z-score needs to be in the "blue zone".
P-values
▪ Example: In the 1970s, 20–29 year old men in the U.K. had a mean body weight of 78kg.
Standard deviation for the population was 18 kg.
▪ A sample was taken from 64 people, finding a mean weight of 80kg
z-score = (80-78) / (18/√64) = 0.89
Probability tables can then be used to ascertain the P-value: 0.19
Is this strong evidence for or against the null hypothesis, H0?
P-values
▪ Example: In the 1970s, 20–29 year old men in the U.K. had a mean body weight of 78kg.
Standard deviation for the population was 18 kg.
▪ Another sample was taken from 64 people, finding a mean weight of 83kg
z-score = (83-78) / (18/√64) = 2.22
Probability tables can then be used to ascertain the P-value: 0.01
Is this strong evidence for or against the null hypothesis, H0?
P-values
▪ Example: In the 1970s, 20–29 year old men in the U.K. had a mean body weight of 78kg. Standard deviation for the population was 18 kg.
▪ What if we consider the two-sided test? If one-sided P = 0.01, then two-sided P = 2 × 0.01 = 0.02.
Is this strong evidence for or against the null hypothesis, H0?
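A minimal sketch (Python with NumPy and SciPy) reproducing the z-scores and P-values for the two samples in the example, including the doubling for the two-sided test:

    import numpy as np
    from scipy import stats

    mu0, sigma, n = 78, 18, 64                    # H0 mean, population SD, sample size

    for xbar in (80, 83):                         # the two sample means from the example
        z = (xbar - mu0) / (sigma / np.sqrt(n))   # z = (xbar - mu0) / (sigma / sqrt(n))
        p_one = stats.norm.sf(z)                  # P(Z >= z) under H0, one-sided
        p_two = 2 * p_one                         # two-sided P-value
        print(f"xbar = {xbar} kg: z = {z:.2f}, one-sided P = {p_one:.2f}, two-sided P = {p_two:.2f}")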
P-value or α-level??

α-level
▪ Set BEFORE we collect data, run statistics
▪ Defines how much of an error we are willing to make to say we made a difference
▪ If we're wrong, it's an α error or Type 1 error

P-value
▪ Calculated AFTER we gather the data
▪ The calculated probability of a mistake by saying it works
▪ AKA: level of significance
▪ Describes the percent of the population/area under the curve (in the tail) that is beyond our statistic
P-value or α-level??
▪ Let α ≡ probability of erroneously rejecting H0
▪ Set the α threshold (e.g., let α = .10, .05, or whatever)
▪ Reject H0 when P ≤ α
▪ Retain H0 when P > α
▪ Example: Set α = .10. Find P = 0.27 → retain H0
▪ Example: Set α = .01. Find P = .001 → reject H0
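A tiny sketch (plain Python) of this decision rule, using the two examples above:

    def decide(p_value, alpha):
        """Reject H0 when P <= alpha; retain H0 otherwise."""
        return "reject H0" if p_value <= alpha else "retain H0"

    print(decide(0.27, alpha=0.10))    # retain H0
    print(decide(0.001, alpha=0.01))   # reject H0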
β-level
▪ Let β ≡ probability of erroneously retaining H0
▪ Two types of decision errors:
➢ Type I error (α) = erroneous rejection of a true H0
➢ Type II error (β) = erroneous retention of a false H0
Power
▪ β ≡ probability of a Type II error
➢ β = Pr(retain H0 | H0 false) (the "|" is read as "given")
▪ 1 − β = "Power" ≡ probability of avoiding a Type II error
➢ 1 − β = Pr(reject H0 | H0 false)
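As an illustrative sketch (Python with NumPy and SciPy; the true mean of 83 kg, the one-sided test, and α = 0.05 are assumptions for the sake of the example), power (1 − β) can be estimated by simulating repeated tests under a specific alternative:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(2)
    mu0, mu_true, sigma, n = 78, 83, 18, 64    # H0 mean, assumed true mean, population SD, sample size
    alpha, n_sims = 0.05, 10_000

    rejections = 0
    for _ in range(n_sims):
        sample = rng.normal(mu_true, sigma, n)
        z = (sample.mean() - mu0) / (sigma / np.sqrt(n))
        if stats.norm.sf(z) <= alpha:          # one-sided test: reject H0 when P <= alpha
            rejections += 1

    power = rejections / n_sims                # estimate of 1 - beta under this alternative
    print(f"estimated power ≈ {power:.2f}")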
