Statistics
26.03.2025
Population vs sample
Representativeness of the Group
Estimators
Measurement error
Optimal sample size
A sample size of about 30 is a common rule of thumb: it is usually easy
to collect and often large enough for reliable estimates.
However, the rule of 30 does not always apply, because the required size
depends on the specific characteristics of the population being studied.
Normal distribution
The Central Limit Theorem
The Central Limit Theorem states: regardless of the shape of the
population distribution (provided its variance is finite), if random and
independent measurements are taken from it, the distribution of sample
means approaches a normal distribution, and the more observations each
sample contains, the closer it gets to normality.
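The statement above can be checked with a small simulation. This is only a sketch under assumptions that are not in the slides: the population is exponential (strongly right-skewed), the sample sizes 2, 30 and 200 are arbitrary, and skewness (0 for a normal distribution) is used as a rough measure of distance from normality.

```python
import random
import statistics

def sample_means(n, repeats, rng):
    """Draw `repeats` samples of size n from an exponential population
    and return the mean of each sample."""
    return [
        statistics.fmean(rng.expovariate(1.0) for _ in range(n))
        for _ in range(repeats)
    ]

def skewness(values):
    """Population skewness: about 0 for a symmetric, normal-like shape."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return statistics.fmean(((v - m) / s) ** 3 for v in values)

rng = random.Random(42)  # fixed seed so the run is repeatable
skew_by_n = {n: skewness(sample_means(n, 5000, rng)) for n in (2, 30, 200)}

for n, skew in skew_by_n.items():
    print(f"n = {n:>3}: skewness of sample means = {skew:.2f}")
```

The skewness of the sample means should shrink toward 0 as n grows, even though the population itself stays skewed.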
Null Hypothesis Significance Testing (NHST)
Null and Alternative Hypotheses
H0: µ1 = µ2
HA: µ1 ≠ µ2
The Greek symbols used here (µ1, µ2) indicate that we are referring to
population means, not sample means.
In the NHST approach, the null hypothesis takes precedence over the
alternative hypothesis because it is the one that is actually tested.
Accepting (more precisely: having no grounds to reject) or
rejecting the null hypothesis is not evidence of the non-existence
or existence of a particular effect or relationship!
Type I and Type II Errors
Type I Error (α): Rejecting H0 when it is true (False Positive).
Type II Error (β): Failing to reject H0 when HA is true (False Negative).
Significance Level (α) and Confidence Level
Significance Level (α): Probability of a Type I error, commonly
set at 5% (sometimes 1% or 10%).
Confidence Level: 1 − α, for example 95% when α = 5%.
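To illustrate what α means in practice, the sketch below repeatedly tests a null hypothesis that is actually true and counts how often it is rejected anyway. The test (a two-sided one-sample z-test with known σ), the sample size of 25 and the number of trials are assumptions chosen for illustration; they do not come from the slides.

```python
import random
from statistics import NormalDist, fmean

ALPHA = 0.05
# Two-sided critical value of the standard normal distribution.
Z_CRIT = NormalDist().inv_cdf(1 - ALPHA / 2)

def rejects_h0(sample, mu0, sigma):
    """Two-sided z-test with known sigma: True if H0 (mean == mu0) is rejected."""
    z = (fmean(sample) - mu0) / (sigma / len(sample) ** 0.5)
    return abs(z) > Z_CRIT

rng = random.Random(0)
trials = 4000
# Every sample is drawn with true mean 0, so H0 (mu0 = 0) holds in all trials
# and each rejection is a Type I error.
rejections = sum(
    rejects_h0([rng.gauss(0.0, 1.0) for _ in range(25)], mu0=0.0, sigma=1.0)
    for _ in range(trials)
)
false_positive_rate = rejections / trials
print(f"Share of false positives: {false_positive_rate:.3f}")
```

The observed share of false positives should be close to the chosen α.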
Confidence Intervals
Confidence intervals (CI) show how precise the estimate obtained in the
study is; their width is determined by the standard error (SE). For large
samples, a 95% CI for a mean is approximately x̄ ± 1.96 · SE.
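A minimal sketch of how a confidence interval for a mean follows from the standard error. It assumes the normal approximation (for small samples a t-distribution would give a wider interval) and reuses the eight values from the variance example later in the deck; the function name `mean_confidence_interval` is hypothetical.

```python
from statistics import NormalDist, fmean, stdev

def mean_confidence_interval(sample, confidence=0.95):
    """Normal-approximation confidence interval for the mean of `sample`."""
    n = len(sample)
    mean = fmean(sample)
    se = stdev(sample) / n ** 0.5  # standard error of the mean
    # z is about 1.96 for a 95% confidence level
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return mean - z * se, mean + z * se

data = [2, 4, 4, 4, 5, 5, 7, 9]
low, high = mean_confidence_interval(data)
print(f"95% CI for the mean: ({low:.2f}, {high:.2f})")
```

A larger sample or a smaller spread reduces the standard error and therefore narrows the interval.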
Power of the Test
it is the probability of avoiding a Type II error
the greater the test’s power, the better its ability to reject the null
hypothesis (if it is indeed false!)
the sample size should be chosen to achieve a power of at least 80%
power = 1 − β
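The relation between power and sample size can be sketched for one simple case. The assumptions here are not from the slides: a two-sided one-sample z-test with known σ, and a hypothetical true effect of 0.5 standard deviations. The helper names are made up for illustration.

```python
from statistics import NormalDist

def z_test_power(effect, sigma, n, alpha=0.05):
    """Power (1 - beta) of a two-sided one-sample z-test when the true
    mean differs from the H0 value by `effect`."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    shift = effect * n ** 0.5 / sigma  # mean of the test statistic under HA
    # Probability that the statistic lands in the upper or lower rejection region.
    return (1 - std_normal.cdf(z_crit - shift)) + std_normal.cdf(-z_crit - shift)

def smallest_n_for_power(effect, sigma, target=0.80, alpha=0.05):
    """Smallest sample size whose power reaches `target`."""
    n = 2
    while z_test_power(effect, sigma, n, alpha) < target:
        n += 1
    return n

for n in (10, 20, 40, 80):
    print(f"n = {n:>2}: power = {z_test_power(0.5, 1.0, n):.2f}")
print("n needed for 80% power:", smallest_n_for_power(0.5, 1.0))
```

With no true effect (`effect = 0`), the function returns α, since every rejection is then a Type I error.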
Test Statistic
calculated from sample data; used to decide if H0 should be rejected
when running a statistical test, we calculate the probability (p-value) of
obtaining a test statistic at least as extreme as the one observed in our
sample, assuming H0 is true
the smaller the p-value, the stronger the indication that your result is
significant and that the null hypothesis might be false
p-Value and Critical Value
Example: If p = 0.03 and α = 0.05, then p < α, so we reject H0.
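The p-value rule and the critical-value rule lead to the same decision. The sketch below assumes a two-sided z-test and a hypothetical observed statistic z = 2.17, chosen because it gives p of about 0.03 as in the example above.

```python
from statistics import NormalDist

def two_sided_p_value(z):
    """Probability of a statistic at least as extreme as z under H0."""
    return 2 * (1 - NormalDist().cdf(abs(z)))

def critical_value(alpha):
    """Two-sided critical value of the standard normal distribution."""
    return NormalDist().inv_cdf(1 - alpha / 2)

alpha = 0.05
z = 2.17  # hypothetical observed test statistic

p = two_sided_p_value(z)
reject_by_p = p < alpha                             # decision via the p-value
reject_by_critical = abs(z) > critical_value(alpha)  # decision via the critical value

print(f"p = {p:.3f}, critical value = {critical_value(alpha):.2f}")
print("Reject H0:", reject_by_p, reject_by_critical)
```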
Common mistakes
Statistical Hypothesis Testing Process
Mean (Average)
Definition: The mean, or average, is the sum of all data values divided
by the number of values.
Formula:
\[ \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i \]
Example:
For the data: 5, 7, 8, 10, 10
Mean = (5 + 7 + 8 + 10 + 10) / 5 = 40 / 5 = 8
Use: Useful for understanding the central tendency of a dataset.
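The formula translates directly into code. A minimal sketch using the example data above, with `statistics.fmean` from the Python standard library as a cross-check:

```python
from statistics import fmean

data = [5, 7, 8, 10, 10]

# Sum of all values divided by the number of values.
mean_by_formula = sum(data) / len(data)
mean_by_library = fmean(data)

print(mean_by_formula, mean_by_library)
```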
Median
Example:
Data: 3, 5, 7, 8, 10
Median = 7
Data: 3, 5, 7, 8
Median = (5 + 7) / 2 = 6
Use: Useful when data contains outliers.
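A short sketch of both examples above, plus a comparison with the mean when an outlier is present. The list `with_outlier` is hypothetical data added for illustration.

```python
from statistics import mean, median

odd_count = [3, 5, 7, 8, 10]
even_count = [3, 5, 7, 8]
with_outlier = [3, 5, 7, 8, 1000]  # hypothetical: last value replaced by an outlier

median_odd = median(odd_count)     # middle value
median_even = median(even_count)   # average of the two middle values

# The outlier pulls the mean far from the bulk of the data; the median stays put.
mean_outlier = mean(with_outlier)
median_outlier = median(with_outlier)

print(median_odd, median_even)
print(mean_outlier, median_outlier)
```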
Quantiles
Example:
The first quartile (Q1) is the 25th percentile, the median (Q2) is the
50th percentile, and the third quartile (Q3) is the 75th percentile.
Use: Helps describe the distribution and spread of data.
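Quartiles can be computed with `statistics.quantiles` from the Python standard library. This sketch reuses the eight values from the variance example in this deck. Note that several conventions for interpolating quartiles exist, so other tools may report slightly different Q1 and Q3 for the same data.

```python
from statistics import median, quantiles

data = [2, 4, 4, 4, 5, 5, 7, 9]

# n=4 splits the data into four equal-probability groups,
# which yields three cut points: Q1, Q2 and Q3.
q1, q2, q3 = quantiles(data, n=4)

print(f"Q1 = {q1}, Q2 = {q2}, Q3 = {q3}")
print("Q2 equals the median:", q2 == median(data))
```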
Mode (Dominant Value)
Example:
Data: 3, 4, 4, 5, 6, 6, 6, 7
Mode = 6
Use: Useful for categorical data or to identify common values.
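A brief sketch using the example data above. The categorical list `colours` and the tied list are hypothetical, added to show that the mode also works for non-numeric data and that a dataset can have more than one mode.

```python
from statistics import mode, multimode

data = [3, 4, 4, 5, 6, 6, 6, 7]
colours = ["red", "blue", "red", "green"]  # hypothetical categorical data
tied = [1, 1, 2, 2, 3]                     # hypothetical data with two modes

most_common_value = mode(data)
most_common_colour = mode(colours)
all_modes = multimode(tied)  # returns every value that shares the top count

print(most_common_value, most_common_colour, all_modes)
```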
Variance
Definition: Variance measures how far data values are spread out from
the mean.
Formula:
\[ \sigma^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2 \]
Example:
If data = 2, 4, 4, 4, 5, 5, 7, 9 and mean = 5
Then variance = average of squared differences from 5
= (9 + 1 + 1 + 1 + 0 + 0 + 4 + 16) / 8 = 32 / 8 = 4
Use: Important in probability and statistical modeling.
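The example can be worked through step by step. The sketch follows the formula above, which divides by n (population variance). For comparison it also shows `statistics.variance`, which divides by n − 1 (sample variance); that variant is not covered in the slides and is included only to flag the difference.

```python
from statistics import pvariance, variance

data = [2, 4, 4, 4, 5, 5, 7, 9]

mean = sum(data) / len(data)
squared_diffs = [(x - mean) ** 2 for x in data]

# Average of the squared differences from the mean (divides by n).
variance_by_formula = sum(squared_diffs) / len(data)

population_variance = pvariance(data)  # same definition, divides by n
sample_variance = variance(data)       # divides by n - 1, so it is larger

print(variance_by_formula, population_variance, sample_variance)
```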
Standard Deviation
Formula:
\[ \sigma = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2} \]
Use: Easier to interpret than variance because it has the same unit as
the data. Useful for comparing variability.
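A minimal sketch of the formula, reusing the data from the variance example. The second dataset `tight` is hypothetical: it has the same mean but values closer together, to show how standard deviation compares variability.

```python
import math
from statistics import pstdev

def std_dev(values):
    """Square root of the average squared difference from the mean."""
    mean = sum(values) / len(values)
    return math.sqrt(sum((x - mean) ** 2 for x in values) / len(values))

spread = [2, 4, 4, 4, 5, 5, 7, 9]  # data from the variance example
tight = [4, 5, 5, 5, 5, 5, 5, 6]   # hypothetical data with the same mean of 5

sigma_spread = std_dev(spread)
sigma_tight = std_dev(tight)

print(sigma_spread, sigma_tight)
print("Matches statistics.pstdev:", math.isclose(sigma_spread, pstdev(spread)))
```

Both datasets have a mean of 5, but the first varies four times as much, and the result is expressed in the same unit as the data.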