0214 Lecture Notes
Written by
Dr. Fadal Aldhufairi
Edited by
Prof. Ibrahim Almanjahie
Department of Mathematics
King Khalid University
Reference:
Wayne W. Daniel, Chad L. Cross: Biostatistics: A Foundation for Analysis in the Health
Sciences.
Note: This handout does not replace the main textbook, which remains the primary reference. It serves
as a supplementary resource to aid understanding.
Learning Objectives
After studying this chapter, students will:
Chapter 1: Introduction to Biostatistics 0214STAT: Fundamentals of Biostatistics 1 / 315
Introduction to Biostatistics
What is Statistics?
Sources of Data
Concept of Biostatistics
Examples:
Assessing the effectiveness of a new drug.
Analyzing the correlation between smoking and lung cancer.
Role of a Biostatistician
Examples
Uses of Biostatistics
Why Study Biostatistics?
• Numerical data is essential in healthcare for making evidence-based decisions and improving patient outcomes. Biostatistics is widely applied in:
• Clinical trials (e.g., determining the effectiveness of new drugs or treatments).
• Epidemiology (e.g., tracking disease outbreaks and trends).
• Patient care (e.g., using data to predict outcomes and personalize treatments).
• Public health (e.g., assessing vaccination programs and health policies).
• Medical research (e.g., analyzing genetic data to understand diseases).
• Statistical methods help healthcare professionals make critical decisions that directly impact lives.
Biostatistics is the foundation of evidence-based medicine, enabling medical professionals to provide the best possible care.
Who Uses Statistics and Biostatistics?
Types of Statistics – Descriptive and Inferential
2) Inferential Statistics
Inferential statistics is the branch of statistics that uses data from a sample
to draw conclusions or make decisions about a larger population. It also
involves assessing the reliability of these conclusions.
Variables: Quantitative and Qualitative
Remarks:
Statistical terminology can have different meanings in everyday language versus statistical contexts.
Understanding the basic vocabulary is crucial for effective communication and analysis.
Introduction to Measurement
Nominal Scale
Example
Medical diagnoses: 1 = Flu, 2 = Cold, 3 = COVID-19
Ordinal Scale
Example
Improvement: 1 = Unimproved, 2 = Improved, 3 = Much Improved
Interval Scale
Example
Temperature: 0°C is not the absence of heat.
Ratio Scale
Summary of Measurement Scales
Self-Review Questions
Sampling and Statistical Inference
Definition: Simple Random Sample
Example 1.4.1: Simple Random Sample
Example
Gold et al. studied the effectiveness of smoking cessation treatments.
A simple random sample of 10 subjects is drawn from a population of
189 subjects. Their ages are shown in Table 1.4.1.
Random Number Selection:
Use a random number table to select a sample.
Start at a random point and use three-digit numbers to select subjects.
Example: Start at row 21, column 28 in the random number table and select valid numbers (1-189).

Subject No.   Age
1             48
2             35
3             46
4             44
5             43
...           ...
189           66
Table: Ages of 189 subjects who participated in a study on smoking cessation.
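The selection in Example 1.4.1 can also be sketched in software. The following is a minimal illustration, assuming Python's pseudo-random generator stands in for the random number table; the function name `simple_random_sample` and the seed value are hypothetical choices made only for this sketch.

```python
import random

def simple_random_sample(population_size, sample_size, seed=None):
    """Draw subject numbers 1..population_size without replacement."""
    rng = random.Random(seed)  # the seed only makes the draw reproducible
    subjects = range(1, population_size + 1)
    return sorted(rng.sample(subjects, sample_size))

# 10 subjects out of the 189 listed in Table 1.4.1
sample_ids = simple_random_sample(189, 10, seed=2024)
print(sample_ids)
```

Each subject number has the same chance of being drawn, and no subject can be selected twice, which mirrors skipping invalid or repeated numbers in the table.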
Systematic Sampling: Example 1.4.2
Example
If the starting point is subject 4 and the interval k = 18, subjects 4, 22, 40, … are selected.
Systematic Sampling:
The first subject is selected randomly (subject 4).
The sample interval k = 18, which means every 18th subject after the starting point is selected.

Subject No.   Age
4             44
22            66
40            47
58            56
76            52
...           ...
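The selection rule of Example 1.4.2 can be sketched as follows. This is an illustration only: the function name `systematic_sample` is hypothetical, and the starting point is passed in rather than drawn at random.

```python
def systematic_sample(population_size, start, interval):
    """Return subject numbers start, start + k, start + 2k, ... up to N."""
    return list(range(start, population_size + 1, interval))

# Starting subject 4, interval k = 18, population of 189 subjects
selected = systematic_sample(189, start=4, interval=18)
print(selected)
```

In practice the starting point would itself be chosen at random from the first k subjects.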
Remarks on Sampling Methods
The Scientific Method
Key Elements of the Scientific Method
1) Observation
Phenomena are observed to generate questions for further exploration.
Example: Does regular exercise reduce body weight?
2) Formulating a Hypothesis
A hypothesis is a testable statement about the observed phenomena.
Example: "Exercise reduces body weight" (Research Hypothesis)
Statistical Hypothesis: "The average weight loss in exercisers is greater than in non-exercisers."
3) Designing an Experiment
Random assignment to experimental and control groups ensures a valid test.
Example: 100 participants assigned to either an exercise group or a control group.
Experimental Design Example
Key Elements of the Scientific Method
Self-Review
Conclusion and Remarks
Chapter 2: Descriptive Statistics
Learning Objectives:
After studying this chapter, students will:
Considerations in Grouping Data
k = 1 + 3.322 log10(n), where k is the number of class intervals and n is the number of values in the data set.
Remark: The number of intervals is six, fewer than the nine suggested by Sturges’s rule.
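As a quick check of the rule, the sketch below evaluates Sturges's formula. It assumes the remark refers to the n = 189 ages used elsewhere in these notes; the function name `sturges_intervals` is hypothetical.

```python
import math

def sturges_intervals(n):
    """Sturges's rule: suggested number of class intervals for n values."""
    return 1 + 3.322 * math.log10(n)

k = sturges_intervals(189)           # assumed sample size (the 189 ages)
print(round(k, 2), math.ceil(k))     # raw value and the next whole number
```

The rule is only a guide; fewer or more intervals may be chosen when that gives a clearer summary.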
Midpoint of Class Intervals
Note: The midpoint is useful for estimating the center of the class
interval when summarizing data.
Relative Frequency
Self-Review Questions
Boundary (or True) Class Limits and Histogram
Construction
Boundary Class Limits Table
Histogram Construction
Histogram Example
Frequency Polygon
Frequency Polygon vs Histogram
The total area under the frequency polygon is equal to the area under
the histogram. Figure 2 demonstrates this relationship by showing the
frequency polygon superimposed on the histogram.
Stem-and-Leaf Display
Stem-and-leaf displays are most effective with small data sets. They
are not generally suitable for use in publications intended for the
general public, but they help researchers and decision makers
understand the nature of their data. In contrast, histograms are more
appropriate for external communication.
Constructing a Stem-and-Leaf Display
Example: Heights of 15 Individuals
Next, we find the range of the data, which is the difference between
the maximum and minimum values:
Step 3: Define the Classes
Step 4: Calculate the Frequencies
Constructing the Frequency Table
Class Frequency
150 - 159 3
160 - 169 2
170 - 179 3
180 - 189 3
190 - 199 4
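The quantities usually derived from such a table (midpoints, boundary class limits, relative and cumulative frequencies) can be sketched as below. The class limits and frequencies are taken from the table above; the half-unit adjustment for the boundaries is an assumption that the heights were recorded to the nearest whole unit.

```python
# (lower limit, upper limit, frequency) for each class in the table above
classes = [(150, 159, 3), (160, 169, 2), (170, 179, 3),
           (180, 189, 3), (190, 199, 4)]

n = sum(freq for _, _, freq in classes)   # total number of observations

rows = []
cumulative = 0
for lower, upper, freq in classes:
    cumulative += freq
    rows.append({
        "class": f"{lower}-{upper}",
        "midpoint": (lower + upper) / 2,            # centre of the class
        "boundaries": (lower - 0.5, upper + 0.5),   # assumed half-unit gap
        "frequency": freq,
        "relative": freq / n,                       # proportion of all values
        "cumulative": cumulative,                   # running total
    })

for row in rows:
    print(row)
```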
Boundary Class Limits
Height Distribution and Boundary Class Limits Example
Cumulative Frequency Example
Use of Boundary Class Limits in Graphs
Histogram:
Boundary class limits ensure that bars representing adjacent
intervals touch each other.
The base of each bar corresponds to the range between the lower
and upper boundary class limits.
The height of the bar reflects the frequency of the class.
Frequency Polygon
Importance of Boundary Class Limits
Descriptive Statistics Overview
What Are Measures of Central Tendency?
Arithmetic Mean
Example 2.4.1: Population Mean
Sol:
\[ \mu = \frac{48 + 35 + 46 + \cdots + 73 + 66}{189} = 55.032 \]
Sample Mean
Example 2.4.2: Sample Mean
Sol:
\[ \bar{x} = \frac{43 + 66 + 61 + 64 + 65 + 38 + 59 + 57 + 57 + 50}{10} = 56 \]
Properties of the Arithmetic Mean
Example: Effect of Extreme Values
Mean:
\[ \frac{75 + 75 + 80 + 80 + 280}{5} = 118 \]
Median
Definition: The median divides a dataset into two equal parts such
that half the values are less than or equal to it and half are greater
than or equal to it.
Odd number of observations: Median is the middle value.
Even number of observations: Median is the mean of the two
middle values.
Example: For 10 ages arranged in ascending order: 38, 43, 50, 57, 57, 59, 61, 64, 65, 66
\[ \text{Median} = \frac{57 + 59}{2} = 58 \]
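The mean and median computed above can be reproduced with Python's standard `statistics` module. This is a small sketch using the ten sample ages and the five values from the extreme-value example; the variable names are chosen here for illustration.

```python
import statistics

ages = [43, 66, 61, 64, 65, 38, 59, 57, 57, 50]   # sample of 10 ages
mean_age = statistics.mean(ages)
median_age = statistics.median(ages)               # mean of the two middle values

with_outlier = [75, 75, 80, 80, 280]               # data with one extreme value
mean_outlier = statistics.mean(with_outlier)
median_outlier = statistics.median(with_outlier)

print(mean_age, median_age)
print(mean_outlier, median_outlier)
```

The second data set shows why the median is often preferred when extreme values are present: the single large value pulls the mean well above most of the observations, while the median stays among them.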
Properties of the Median
Mode
Skewness
Formula:
\[ \text{Skewness} = \frac{\sqrt{n}\,\sum_{i=1}^{n} (x_i - \bar{x})^3}{(n-1)\sqrt{n-1}\; s^3} \]
Examples:
Positively Skewed: Mean = 10.5, Median = 9.
Negatively Skewed: Mean = 8, Median = 9.
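The skewness formula can be turned into a short function. The sketch below follows the formula above term by term; the function name `skewness` and the test data sets are illustrative choices, not part of the original notes.

```python
import math

def skewness(values):
    """Skewness = sqrt(n) * sum((x - mean)^3) / ((n - 1) * sqrt(n - 1) * s^3)."""
    n = len(values)
    mean = sum(values) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))  # sample SD
    third_moment_sum = sum((x - mean) ** 3 for x in values)
    return math.sqrt(n) * third_moment_sum / ((n - 1) * math.sqrt(n - 1) * s ** 3)

symmetric = [1, 2, 3, 4, 5]
right_skewed = [75, 75, 80, 80, 280]                 # long right tail
ages = [43, 66, 61, 64, 65, 38, 59, 57, 57, 50]      # mean 56 < median 58

print(skewness(symmetric), skewness(right_skewed), skewness(ages))
```

A value near zero suggests symmetry, a positive value a longer right tail, and a negative value a longer left tail.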
Symmetry and Normal Distribution
Symmetric Distribution: Left and right halves of the graph are
mirror images.
Visualizing Dispersion
Populations with the same mean can have different levels of
variability.
Examples:
Population A: Less variable, values closer together.
Population B: More variable, values more spread out.
If we denote the range by R, the largest value by $x_L$, and the
smallest value by $x_S$, we compute the range as follows:
Figure 2.5.1: Two frequency distributions with equal means but different
amounts of dispersion.
The Range
Formula:
\[ R = x_L - x_S \]
Where:
$x_L$: Largest value.
$x_S$: Smallest value.
Example:
R = 82 − 30 = 52
Advantages and Disadvantages of the Range
Disadvantages:
Considers only two values.
Limited usefulness for describing the entire dataset.
Alternate Expression:
Variance
Variance:
\[ s^2 = \frac{(43 - 56)^2 + (66 - 56)^2 + \cdots + (50 - 56)^2}{9} = \frac{810}{9} = 90 \]
Standard Deviation
Formula:
\[ s = \sqrt{s^2} = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}} \]
Calculation:
\[ s = \sqrt{90} \approx 9.49 \]
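The sample variance and standard deviation of the ten ages can be checked with the standard library. This sketch assumes the same ten ages used in the sample mean example; `statistics.variance` and `statistics.stdev` both divide by n − 1.

```python
import statistics

ages = [43, 66, 61, 64, 65, 38, 59, 57, 57, 50]

# Sum of squared deviations from the sample mean (the numerator of s^2)
mean_age = statistics.mean(ages)
sum_sq_dev = sum((x - mean_age) ** 2 for x in ages)

s2 = statistics.variance(ages)   # sample variance, divisor n - 1
s = statistics.stdev(ages)       # sample standard deviation

print(sum_sq_dev, s2, round(s, 2))
```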
Coefficient of Variation (C.V.)
Formula:
\[ \text{C.V.} = \frac{s}{\bar{x}} \times 100 \]
Example - Coefficient of Variation
Sample 1:
Mean weight: 145 pounds, Standard deviation: 10 pounds
\[ \text{C.V.} = \frac{10}{145} \times 100 = 6.9\% \]
Sample 2:
Mean weight: 80 pounds, Standard deviation: 10 pounds
\[ \text{C.V.} = \frac{10}{80} \times 100 = 12.5\% \]
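The two coefficients of variation can be computed with a one-line function. This is a minimal sketch; the function name `coefficient_of_variation` is hypothetical.

```python
def coefficient_of_variation(sd, mean):
    """Standard deviation expressed as a percentage of the mean."""
    return sd / mean * 100

cv_sample1 = coefficient_of_variation(10, 145)
cv_sample2 = coefficient_of_variation(10, 80)
print(round(cv_sample1, 1), round(cv_sample2, 1))
```

Although both samples have the same standard deviation, the second has the larger relative variability because its mean is smaller.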
Importance of C.V.
Practical Applications:
Compare variability in serum cholesterol levels (mg/dL)
and body weight (pounds).
Evaluate consistency in different experimental results.
Chapter 3: The Normal Distribution
Visualization of a Continuous Distribution
Properties of Continuous Distributions
Key Properties:
Total area under the curve = 1.
Relative frequency between a and b: Area under the curve
between a and b (Figure 4.5.3).
Finding Areas:
Use integral calculus to compute areas.
Density function f(x) is integrated over the interval [a, b]:
\[ P(a \le X \le b) = \int_a^b f(x)\,dx \]
Definition of Probability Distribution
Definition:
Introduction to The Normal Distribution
The Normal Density Function
\[ f(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}, \quad -\infty < x < \infty \]
Here:
π = 3.14159 . . . and e = 2.71828 . . . are mathematical constants.
µ: Mean (measure of central tendency).
σ: Standard deviation (measure of dispersion).
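The density formula can be evaluated directly. The sketch below writes it out by hand and compares it with `statistics.NormalDist` from the Python standard library; the function name `normal_pdf` and the example values of µ and σ are illustrative.

```python
import math
from statistics import NormalDist

def normal_pdf(x, mu, sigma):
    """Normal density: exp(-(x - mu)^2 / (2 sigma^2)) / (sqrt(2 pi) sigma)."""
    exponent = -((x - mu) ** 2) / (2 * sigma ** 2)
    return math.exp(exponent) / (math.sqrt(2 * math.pi) * sigma)

peak_standard = normal_pdf(0, mu=0, sigma=1)        # height at the mean
by_hand = normal_pdf(1.5, mu=1, sigma=2)
by_library = NormalDist(mu=1, sigma=2).pdf(1.5)     # same density from the library

print(peak_standard, by_hand, by_library)
```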
Characteristics of the Normal Distribution
Visualizing the Normal Distribution
The bell-shaped curve is symmetric about µ (mean).
Variations in µ and σ (standard deviation):
Changing µ: Shifts the curve along the x-axis.
Changing σ: Affects the flatness or peakedness of the curve.
Example graph:
Empirical Rule (68–95–99.7 Rule)
The areas under the normal curve:
µ ± 1σ: approximately 68% of the total area.
µ ± 2σ: approximately 95% of the total area.
µ ± 3σ: approximately 99.7% of the total area.
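These percentages can be verified numerically. The sketch below uses the cumulative distribution function of the standard normal distribution from Python's `statistics.NormalDist`; the helper name `area_within` is hypothetical.

```python
from statistics import NormalDist

standard_normal = NormalDist()   # mean 0, standard deviation 1

def area_within(k):
    """Area under the normal curve within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

areas = {k: area_within(k) for k in (1, 2, 3)}
for k, area in areas.items():
    print(f"within {k} SD: {area:.4f}")
```

Because any normal variable can be standardized, the same areas hold for every choice of µ and σ.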
Illustration:
Remarks
Graph of the Normal Distribution
Figure: Three normal distributions with different means but the same amount
of variability.
Effect of Parameters on the Normal Distribution
Figure: Three normal distributions with different standard deviations but the
same mean.
The Standard Normal Distribution
Graph of the Standard Normal Distribution
Finding Probabilities with the Standard Normal Distribution
Using the Standard Normal Table
To find the area under the curve between −∞ and a value z0, we use
the standard normal table. For example, the shaded area in the graph
below represents the area between −∞ and z0.
The z-score
Example 4.6.1
Given the standard normal distribution, find the area under the curve,
above the z-axis, between z = −∞ and z = 2.
P (z < 2) = 0.9772
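The table lookup in Example 4.6.1 corresponds to evaluating the standard normal cumulative distribution function at z = 2. A minimal sketch with the standard library follows; the variable names are illustrative.

```python
from statistics import NormalDist

area_below_2 = NormalDist().cdf(2)   # P(z < 2), area from minus infinity to 2
area_above_2 = 1 - area_below_2      # complementary area to the right of 2

print(round(area_below_2, 4), round(area_above_2, 4))
```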
Example 4.6.2
Example 4.6.3
Example 4.6.4
Statistical Inference
Estimation Process:
Calculate a statistic from a sample to approximate a population
parameter.
Example: Hospital administrator estimating the mean age of
admitted patients.
Example: Physician estimating the proportion of patients with
drug side effects.
Types of Estimates
Estimator:
Rule/formula used to compute an estimate. Example: $\bar{x} = \frac{\sum x_i}{n}$ as an estimator of µ.
Sampled vs. Target Populations
Random vs. Nonrandom Samples
Remarks
Introduction to Confidence Interval
Sampling Distributions and Estimation
Example 6.2.1
Sol:
\[ \sigma_{\bar{x}} = \sqrt{\frac{\sigma^2}{n}} = \sqrt{\frac{45}{10}} = 2.1213, \]
\[ \text{CI} = \bar{x} \pm 2\sigma_{\bar{x}} = 22 \pm 2(2.1213) = (17.76, 26.24). \]
General Formula
Interval Estimate:
Estimator ± (Reliability Coefficient) × (Standard Error)
For known variance:
Reliability Coefficients:
90% CI: z = 1.645
95% CI: z = 1.96
99% CI: z = 2.58
Example 6.2.2
Sol:
\[ \sigma_{\bar{x}} = \sqrt{\frac{144}{15}} = 3.0984, \]
\[ \text{CI} = \bar{x} \pm 2.58\sigma_{\bar{x}} = 84.3 \pm 2.58(3.0984) = (76.3, 92.3). \]
Example 6.2.3: Sampling from Nonnormal Populations
Sol:
\[ \sigma_{\bar{x}} = \frac{8}{\sqrt{35}} = 1.3522, \]
\[ \text{CI} = \bar{x} \pm 1.645\sigma_{\bar{x}} = 17.2 \pm 1.645(1.3522) = (15.0, 19.4). \]
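The interval estimates in Examples 6.2.1 to 6.2.3 all follow the pattern estimator ± (reliability coefficient) × (standard error). The sketch below wraps that pattern in a function; the name `z_confidence_interval` is hypothetical, and the reliability coefficient is passed in as given in each example.

```python
import math

def z_confidence_interval(mean, sigma, n, z):
    """Interval mean +/- z * sigma / sqrt(n) for a known population SD."""
    standard_error = sigma / math.sqrt(n)
    margin = z * standard_error
    return mean - margin, mean + margin

# Example 6.2.1: variance 45, n = 10, coefficient 2
ci_621 = z_confidence_interval(22, math.sqrt(45), 10, 2)
# Example 6.2.2: variance 144, n = 15, 99% coefficient 2.58
ci_622 = z_confidence_interval(84.3, math.sqrt(144), 15, 2.58)
# Example 6.2.3: standard deviation 8, n = 35, 90% coefficient 1.645
ci_623 = z_confidence_interval(17.2, 8, 35, 1.645)

for lower, upper in (ci_621, ci_622, ci_623):
    print(f"({lower:.2f}, {upper:.2f})")
```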
Key Concepts
Introduction to the t Distribution
Formula for the t Distribution
\[ t = \frac{\bar{x} - \mu}{s/\sqrt{n}} \]
Properties of the t Distribution
Range: −∞ to +∞.
Shape: Compared to the normal distribution:
Less peaked center.
Thicker tails.
Approaches the normal distribution as n − 1 → ∞.
Confidence Intervals Using t
\[ \bar{x} \pm t_{(1-\alpha/2)} \cdot \frac{s}{\sqrt{n}} \]
Reliability coefficient $t_{(1-\alpha/2)}$ is obtained from the t table.
Requirements:
Sample drawn from a normal distribution.
Small deviations from normality are acceptable.
Example 6.3.1: Confidence Interval
Study: Maffulli et al. (A-1) examined early weightbearing and
mobilization after Achilles tendon repair. Among 19 subjects, the
mean isometric strength was 250.8 N with a standard deviation of
130.9 N, used to estimate the population mean. Given:
Mean strength of 19 subjects: x̄ = 250.8.
Standard deviation: s = 130.9.
Degrees of freedom: n − 1 = 18.
Desired confidence level: 95%.
Sol:
Standard error: $s/\sqrt{n} = 130.9/\sqrt{19} = 30.03$.
t-value: $t_{0.975,18} = 2.1009$.
Confidence interval: 250.8 ± 2.1009 · 30.03 = [187.7, 313.9].
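The calculation in Example 6.3.1 can be sketched as below. Python's standard library has no t quantile function, so this sketch assumes the reliability coefficient 2.1009 is read from the t table, as in the example; the function name `t_confidence_interval` is hypothetical.

```python
import math

def t_confidence_interval(mean, s, n, t_crit):
    """Interval mean +/- t_crit * s / sqrt(n); t_crit comes from the t table."""
    standard_error = s / math.sqrt(n)
    margin = t_crit * standard_error
    return standard_error, mean - margin, mean + margin

# 19 subjects, sample mean 250.8 N, sample SD 130.9 N, t(0.975, 18) = 2.1009
se, lower, upper = t_confidence_interval(250.8, 130.9, 19, 2.1009)
print(round(se, 2), round(lower, 1), round(upper, 1))
```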
Deciding Between z and t
Use z:
Large sample size (n > 30).
Known population variance.
Use t:
Small sample size.
Unknown population variance.
Sample from a normal or approximately normal distribution.
\[ \chi^2 = \frac{(n-1)s^2}{\sigma^2} \]
where n is the sample size, s² is the sample variance, and σ² is the
population variance.
The chi-square distribution depends on the degrees of freedom
(df = n − 1).
Confidence Interval for Variance (σ²)
\[ \chi^2_{\alpha/2} < \frac{(n-1)s^2}{\sigma^2} < \chi^2_{1-\alpha/2} \]
Rearranging to solve for σ²:
\[ \frac{(n-1)s^2}{\chi^2_{1-\alpha/2}} < \sigma^2 < \frac{(n-1)s^2}{\chi^2_{\alpha/2}} \]
This gives us the 100(1 − α)% confidence interval for the population
variance (σ²).
Confidence Interval for Standard Deviation (σ)
Taking the square root of each term in the confidence interval for σ²,
we obtain the confidence interval for the population standard
deviation (σ):
\[ \sqrt{\frac{(n-1)s^2}{\chi^2_{1-\alpha/2}}} < \sigma < \sqrt{\frac{(n-1)s^2}{\chi^2_{\alpha/2}}} \]
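The two intervals can be computed together once the chi-square values are read from a table. In the sketch below the sample variance of 40.0 is a made-up illustrative number, not taken from any example in these notes; n = 7 and the table values 14.449 and 1.237 (6 degrees of freedom, 95% confidence) are likewise assumptions of this sketch, and the function name is hypothetical.

```python
import math

def variance_confidence_interval(s2, n, chi2_upper, chi2_lower):
    """CI for sigma^2: ((n-1)s^2 / chi2_upper, (n-1)s^2 / chi2_lower).

    chi2_upper is the 1 - alpha/2 table value, chi2_lower the alpha/2 value.
    """
    numerator = (n - 1) * s2
    return numerator / chi2_upper, numerator / chi2_lower

# Hypothetical inputs: s^2 = 40.0 from n = 7 observations (df = 6)
var_lower, var_upper = variance_confidence_interval(
    s2=40.0, n=7, chi2_upper=14.449, chi2_lower=1.237
)
# Square roots of the endpoints give the interval for sigma
sd_lower, sd_upper = math.sqrt(var_lower), math.sqrt(var_upper)

print(round(var_lower, 2), round(var_upper, 2))
print(round(sd_lower, 2), round(sd_upper, 2))
```

Note that the larger chi-square value produces the lower endpoint, and that the interval is not symmetric about s².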
Example 6.9.1: Gluten-Free Diet Study
Study: In a study where seven subjects with type 1 diabetes were
placed on a gluten-free diet, the IAA levels measured were as follows:
What are the values of $\chi^2_{1-\alpha/2}$ and $\chi^2_{\alpha/2}$ used to calculate the
95% confidence interval for the population variance?
What is the 95% confidence interval for the population variance σ²?
What is the 95% confidence interval for the population standard
deviation σ?
Precautions and Considerations
Chapter 4: Hypothesis Testing
Learning Objectives:
After studying this chapter, the student will
Introduction to Statistical Hypothesis Testing
Definition of Hypothesis
Examples:
The average length of hospital stay is 5 days.
A drug is effective in 90% of cases.
Hypothesis testing determines compatibility of statements with
data.
Key Concepts
Key Concepts in Hypothesis Testing:
Null Hypothesis (H0 ): A statement that there is no effect or no
difference.
Alternative Hypothesis (HA ): A statement that contradicts the
null hypothesis.
Types of Hypotheses:
Research Hypothesis: The conjecture or supposition that
motivates the research.
Statistical Hypothesis: A statement expressed in a way that
allows evaluation through statistical techniques.
Types of Hypotheses
Examples:
Case 1: Testing for equality:
Null Hypothesis (H0): µ = 50 (e.g., The average length of hospital stay is 50 days).
Alternative Hypothesis (HA): µ ≠ 50 (e.g., The average length of hospital stay differs from 50 days).
Case 2: Testing for an increase:
Null Hypothesis (H0): µ ≤ 50 (e.g., The drug is effective in no more than 50%).
Alternative Hypothesis (HA): µ > 50 (e.g., The drug is effective in more than 50%).
Purpose of Hypothesis Testing
Common Hypothesis Tests
Common Tests:
Z-test: Used when population variance is known and the
sample size is large.
T-test: Used when population variance is unknown and the
sample size is small.
Chi-square test: Used for testing categorical data.
Rules for Hypothesis Tests
General Formula for Test Statistic
\[ z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}, \]
where µ0 is a hypothesized value of a population mean.
This test statistic relates to the familiar formula:
\[ z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} \tag{7.1.2} \]
Test Statistic
Steps in Hypothesis Testing
Conclusion and p-values
Decision Rule:
Reject H0 if p-value < α. This indicates strong evidence against
H0 .
Do not reject H0 if p-value ≥ α. This does not prove H0 , but
suggests insufficient evidence to reject it.
Precaution
Remarks:
Hypothesis testing does not prove a hypothesis; it evaluates
whether data supports or does not support it.
If H0 is not rejected:
We say it "may be true," but we do not claim it is proven.
Accepting H0 implies that the data do not provide strong
evidence against it.
Introduction to Hypothesis Testing: Single Population Mean
Test Statistic:
\[ z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} \]
Example 7.2.1: Testing the Mean Age
Decision Rule: The critical values are:
z-value for α/2 = 0.025:
z = ±1.96
Reject H0 if:
z ≤ −1.96 or z ≥ 1.96
Rejection and Non-Rejection Regions:
Substitute Values:
\[ z = \frac{27 - 30}{4.47/\sqrt{10}} = \frac{-3}{1.4142} \approx -2.12 \]
Decision:
z = −2.12 falls in the rejection region (z ≤ −1.96).
Reject H0 .
Conclusion:
There is sufficient evidence to conclude that the mean age of the
population is different from 30 at the α = 0.05 significance level.
Understanding the p-value
\[ p = 2 \times P(Z \ge |z_{\text{observed}}|). \]
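The two-sided z test of Example 7.2.1 and its p-value can be sketched as follows, using the values from that example (sample mean 27, hypothesized mean 30, σ = 4.47, n = 10). The function name `two_sided_z_test` is hypothetical, and the normal tail area is taken from `statistics.NormalDist` rather than from a printed table, so the p-value may differ slightly from a table-based answer.

```python
import math
from statistics import NormalDist

def two_sided_z_test(xbar, mu0, sigma, n):
    """Return the z statistic and two-sided p-value for H0: mu = mu0."""
    z = (xbar - mu0) / (sigma / math.sqrt(n))
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))   # 2 * P(Z >= |z|)
    return z, p_value

z_obs, p_value = two_sided_z_test(xbar=27, mu0=30, sigma=4.47, n=10)
reject_h0 = abs(z_obs) >= 1.96                      # critical value for alpha = 0.05

print(round(z_obs, 2), round(p_value, 4), reject_h0)
```

The critical-value rule and the p-value rule lead to the same decision: rejecting H0 when |z| ≥ 1.96 is equivalent to rejecting when p < 0.05.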
How to Compute the p-value
Steps to compute the p-value:
1 Identify the test statistic: In this example, the test statistic is
z = −2.12.
2 Determine the type of test: This is a two-tailed test. The
Introduction to Sampling from a Normally Distributed
Population: Population Variance Unknown
\[ t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} \]
Under the null hypothesis H0 : µ = µ0 , this statistic follows a
Student’s t-distribution with n − 1 degrees of freedom.
Example 7.2.3: MCL and ACL Tear Study
Research Context:
Study by Nakamura et al. on MRI timing for 17 patients with
MCL and ACL tears.
Variable: Days between injury and initial MRI.
Data Summary: Sample mean x̄ = 13.2941, sample standard
deviation s = 8.88654, sample size n = 17.
Subject  Days    Subject  Days    Subject  Days
1        14      6        0       11       28
2        9       7        10      12       24
3        18      8        4       13       24
4        26      9        8       14       2
5        12      10       21      15       3
16       14      17       9
Table: Number of Days Until MRI for Subjects with medial collateral
ligament (MCL) and anterior cruciate ligament (ACL) tears.
H0: µ = 15
HA: µ ≠ 15
Test Statistic:
\[ t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} \]
Calculation and Statistical Decision
Degrees of Freedom:
df = n − 1 = 16
Decision Rule:
α = 0.05, two-tailed test.
Critical t values: ±2.1199.
Do not reject H0 since t = −0.791 falls in the non-rejection
region.
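The summary statistics and the t statistic of Example 7.2.3 can be recomputed from the 17 observations in the table. In this sketch the critical value 2.1199 is taken from the t table, as in the example, because the Python standard library has no t quantile function; the variable names are illustrative.

```python
import math
import statistics

# Days between injury and initial MRI for subjects 1 to 17
days = [14, 9, 18, 26, 12, 0, 10, 4, 8, 21, 28, 24, 24, 2, 3, 14, 9]

n = len(days)
xbar = statistics.mean(days)
s = statistics.stdev(days)                 # sample SD, divisor n - 1
mu0 = 15                                   # hypothesized mean under H0

t_stat = (xbar - mu0) / (s / math.sqrt(n))
t_crit = 2.1199                            # t(0.975, df = 16) from the table
reject_h0 = abs(t_stat) >= t_crit

print(round(xbar, 4), round(s, 5), round(t_stat, 3), reject_h0)
```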
Rejection and Non-Rejection Regions
Sampling from Non-Normal Populations
\[ z = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} \]
Approximates the standard normal distribution if H0 is true.
Use s as an estimate for σ when population standard deviation is
unknown.
EXAMPLE 7.2.4: Systolic Blood Pressure in
African-American Men
Sol. Assumptions:
The data represent a simple random sample of African-American
men reporting similar symptoms.
Systolic blood pressure is not assumed to be normally
distributed; the Central Limit Theorem applies due to the large
sample size.
Hypotheses:
H0 : µ ≤ 140 (null hypothesis)
HA : µ > 140 (alternative hypothesis)
Test Statistic:
\[ z = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{146 - 140}{27/\sqrt{157}} = 2.78 \]
Distribution: By the central limit theorem, the test statistic z
approximately follows a standard normal distribution under H0.
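Decision Rule: Using a significance level α = 0.05, the critical value for this one-sided test is:
z(α) = 1.645
Reject H0 if the computed z ≥ 1.645. Since 2.78 > 1.645, H0 is rejected.
p-value: p = P(Z ≥ 2.78) = 1 − 0.9973 = 0.0027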
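The z statistic, critical value, and p-value of Example 7.2.4 can be checked numerically. This is a sketch that relies on `statistics.NormalDist` from the standard library; the variable names are illustrative and the summary statistics are the ones appearing in the test-statistic formula.

```python
import math
from statistics import NormalDist

# Summary statistics as they appear in the test-statistic formula.
x_bar, mu0, s, n = 146, 140, 27, 157
alpha = 0.05

z = (x_bar - mu0) / (s / math.sqrt(n))

# Right-tailed test: critical value and p-value from the standard normal.
z_crit = NormalDist().inv_cdf(1 - alpha)
p_value = 1 - NormalDist().cdf(z)

print(f"z = {z:.2f}, critical value = {z_crit:.3f}, p = {p_value:.4f}")
```

The p-value computed from the unrounded z differs only slightly from the table-based 0.0027.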
Remarks
Introduction to Hypothesis Testing: The Difference
Between Two Population Means
1. H0 : µ1 − µ2 = 0, HA : µ1 − µ2 ≠ 0
2. H0 : µ1 − µ2 ≥ 0, HA : µ1 − µ2 < 0
3. H0 : µ1 − µ2 ≤ 0, HA : µ1 − µ2 > 0
Contexts for Testing the Difference
Testing When Population Variances are Known
Example 7.3.1: Problem
Example 7.3.1: Solution
Steps in Hypothesis Testing:
(1) Hypotheses:
H0 : µ1 = µ2, HA : µ1 ≠ µ2
(2) Test Statistic:
$$z = \frac{(4.5 - 3.4) - 0}{\sqrt{\frac{1}{12} + \frac{1.5}{15}}}$$
(3) Calculation:
$$z = \frac{1.1}{0.4282} = 2.57$$
(4) Decision Rule: At α = 0.05, reject H0 if |z| > 1.96.
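As a quick check of the arithmetic in Example 7.3.1, the sketch below recomputes the standard error and z. The means, variances, and sample sizes are read off the formula in step (2); the two-sided p-value is an addition for illustration and is not taken from the slide.

```python
import math
from statistics import NormalDist

# Values read off the test-statistic formula in Example 7.3.1.
mean1, mean2 = 4.5, 3.4
var1, var2 = 1.0, 1.5        # known population variances
n1, n2 = 12, 15

se = math.sqrt(var1 / n1 + var2 / n2)
z = ((mean1 - mean2) - 0) / se

# Two-sided p-value and the decision at alpha = 0.05.
p_value = 2 * (1 - NormalDist().cdf(abs(z)))
reject_h0 = abs(z) > 1.96

print(f"se = {se:.4f}, z = {z:.2f}, p = {p_value:.4f}, reject H0: {reject_h0}")
```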
Example 7.3.1: p-value Calculation
Remarks
Sampling from Normally Distributed Populations
Population Variances Equal
Test Statistic for Testing H0 : µ1 = µ2
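$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)_0}{\sqrt{\frac{s_p^2}{n_1} + \frac{s_p^2}{n_2}}}, \qquad s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$$
where:
x̄1 and x̄2 are the sample means.
s_p² is the pooled sample variance.
Under the null hypothesis, this test statistic follows a Student's t-distribution with n1 + n2 − 2 degrees of freedom.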
Example 7.3.2: Problem
Control 131 115 124 131 122 117 88 114 150 169
SCI 60 150 130 180 163 130 121 119 130 148
Table: Pressures (mm Hg) Under the Pelvis during Static Conditions
Example 7.3.2: Solution
Steps in Hypothesis Testing:
(1) Data: The problem statement provides the data.
(2) Assumptions: Two independent simple random samples,
approximately normally distributed data, and equal population
variances.
(3) Hypotheses:
Decision Rule:
The critical value for t at α = 0.05 and df = n1 + n2 − 2 = 18
is −1.7341.
Reject H0 if tcomputed < −1.7341.
Since the computed value of t = −0.57 is greater than −1.7341, we
fail to reject H0 .
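The computed t = −0.57 can be traced back to the pressure data. The sketch below pools the two sample variances and forms the statistic; taking the difference as control minus SCI is an assumption inferred from the negative t and the left-tailed decision rule, and the critical value is copied from the text.

```python
import math
import statistics

# Pressures (mm Hg) from the table in Example 7.3.2.
control = [131, 115, 124, 131, 122, 117, 88, 114, 150, 169]
sci = [60, 150, 130, 180, 163, 130, 121, 119, 130, 148]

n1, n2 = len(control), len(sci)
var1 = statistics.variance(control)   # sample variances (n - 1 denominator)
var2 = statistics.variance(sci)

# Pooled variance and standard error of the difference in means.
sp2 = ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)
se = math.sqrt(sp2 / n1 + sp2 / n2)

# ASSUMPTION: difference taken as control minus SCI.
t = (statistics.mean(control) - statistics.mean(sci)) / se
df = n1 + n2 - 2

t_crit = -1.7341                      # left-tail critical value from the text
reject_h0 = t < t_crit

print(f"sp2 = {sp2:.2f}, t = {t:.2f}, df = {df}, reject H0: {reject_h0}")
```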
Population Variances Unequal
When two independent simple random samples have been drawn from
normally distributed populations with unknown and unequal
variances, the test statistic for testing H0 : µ1 = µ2 is given by:
$$t' = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)_0}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$$
Determining t′:
One-sided test:
Compute t′ using Equation 7.3.4.
Use t1 = t_{1−α} for n1 − 1 degrees of freedom.
Use t2 = t_{1−α} for n2 − 1 degrees of freedom.
Two-sided test:
Reject H0 if:
t′ ≥ t_{1−α/2} or t′ ≤ −t_{1−α/2}.
Example 7.3.3: Aortic Stiffness Index
Example 7.3.3: Hypotheses and Test Statistic
Sol.
Hypotheses:
H0 : µ1 = µ2, HA : µ1 ≠ µ2
Example 7.3.3: Decision Rule
$$t' = \frac{(19.16 - 9.53) - 0}{\sqrt{\frac{(5.29)^2}{15} + \frac{(2.69)^2}{30}}}$$
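The statistic above can be evaluated numerically. The sketch below also forms a weighted critical value from the two one-sample t values, with weights s²/n, which is the approximation usually paired with this statistic; that formula, the level α = 0.05, and the tabulated t values are assumptions not shown in this part of the notes.

```python
import math

# Summary statistics read off the formula in Example 7.3.3.
mean1, s1, n1 = 19.16, 5.29, 15
mean2, s2, n2 = 9.53, 2.69, 30

w1 = s1 ** 2 / n1
w2 = s2 ** 2 / n2
t_prime = ((mean1 - mean2) - 0) / math.sqrt(w1 + w2)

# ASSUMPTION: two-sided test at alpha = 0.05, standard t-table values.
t1 = 2.1448    # t_{0.975} with n1 - 1 = 14 degrees of freedom
t2 = 2.0452    # t_{0.975} with n2 - 1 = 29 degrees of freedom

# ASSUMPTION: weighted critical value, weights w1 = s1^2/n1 and w2 = s2^2/n2.
t_crit = (w1 * t1 + w2 * t2) / (w1 + w2)
reject_h0 = abs(t_prime) >= t_crit

print(f"t' = {t_prime:.2f}, critical value = {t_crit:.3f}, reject H0: {reject_h0}")
```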
Sampling from Populations That Are Not Normally Distributed
Test Statistic for Large Samples
Using the Test Statistic
Example 7.3.4: IgG Levels in Thrombosis Study
Goal: We will test if the mean IgG level for thrombosis subjects
is higher than that of non-thrombosis subjects.
Sol.
H0 : µT − µNT ≤ 0
HA : µT − µNT > 0
where:
µT is the mean IgG level for thrombosis subjects,
µNT is the mean IgG level for non-thrombosis subjects.
Step 2: Test Statistic: Since the samples are large, we apply the
central limit theorem, and the test statistic is:
$$z = \frac{59.01 - 46.61}{\sqrt{\frac{44.89^2}{53} + \frac{34.85^2}{54}}} = 1.59$$
Step 5: p-value
To compute the p-value, we need to find the probability of obtaining a test statistic at least as extreme as the one computed from the sample, assuming the null hypothesis H0 is true.
The null hypothesis: H0 : µT ≤ µNT
The alternative hypothesis: HA : µT > µNT
Test statistic computed: z = 1.59
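For a right-tailed alternative, "at least as extreme" means the area to the right of the computed z under the standard normal curve. A minimal sketch, assuming the summary statistics shown in the test-statistic formula of this example (variable names are illustrative):

```python
import math
from statistics import NormalDist

# Summary statistics from the test-statistic formula in Example 7.3.4.
mean_t, s_t, n_t = 59.01, 44.89, 53       # thrombosis subjects
mean_nt, s_nt, n_nt = 46.61, 34.85, 54    # non-thrombosis subjects

se = math.sqrt(s_t ** 2 / n_t + s_nt ** 2 / n_nt)
z = (mean_t - mean_nt) / se

# Right-tail area: P(Z >= z) under H0.
p_value = 1 - NormalDist().cdf(z)

print(f"z = {z:.2f}, one-sided p-value = {p_value:.4f}")
```

The resulting p-value lies slightly above 0.05, so at α = 0.05 this one-sided test would not reject H0.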
Introduction to Paired Comparisons
Reasons for Pairing
Example: Sunscreen Study
Hypothesis Testing for Paired Comparisons
Test Statistic:
$$t = \frac{\bar{d} - \mu_{d_0}}{s_{\bar{d}}} \qquad (1)$$
where:
d̄ = sample mean difference.
µd0 = hypothesized mean difference.
$s_{\bar{d}} = s_d/\sqrt{n}$.
Solution: Hypothesis Testing
Goal: Reject the null hypothesis if we can conclude that the population mean change in GBEF (µd) is greater than zero.
Step 1: Data. The data consist of the GBEF for 12 individuals, before and after fundoplication.
Hypothesis Testing for Fundoplication
H0 : µd ≥ 0
HA : µd < 0
H0 : µd = 0
HA : µd ≠ 0
Assumptions and Hypotheses
Step 2: Hypotheses
Null Hypothesis (H0 ): The population mean difference µd = 0
Alternative Hypothesis (HA ): The population mean difference
µd > 0
Test Statistic and Decision Rule
$$t = \frac{\bar{d} - \mu_{d_0}}{s_d/\sqrt{n}}$$
Where:
- d̄ is the sample mean of the differences.
- sd is the sample standard deviation of the differences.
- n is the number of differences.
Calculation of the Test Statistic
Step 5: Calculation of Test Statistic
From the data, the sample mean of the differences is:
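$$\bar{d} = \frac{216.9}{12} = 18.075$$
The sample variance s_d² is calculated as:
$$s_d^2 = \frac{\sum (d_i - \bar{d})^2}{n - 1} = \frac{n \sum d_i^2 - \left(\sum d_i\right)^2}{n(n - 1)} = \frac{12 \times 15669.49 - (216.9)^2}{12 \times 11} = 1068.0930$$
The test statistic is then:
$$t = \frac{18.075}{\sqrt{1068.0930/12}} \approx 1.9159$$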
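The paired-comparison calculation needs only three numbers: n, the sum of the differences, and the sum of their squares. The sketch below repeats the computation with the shortcut variance formula; the sums are the ones quoted in the calculation, and the variable names are illustrative.

```python
import math

n = 12
sum_d = 216.9          # sum of the differences
sum_d2 = 15669.49      # sum of the squared differences

d_bar = sum_d / n

# Shortcut formula for the sample variance of the differences.
var_d = (n * sum_d2 - sum_d ** 2) / (n * (n - 1))

se = math.sqrt(var_d / n)    # standard error of the mean difference
t = (d_bar - 0) / se         # hypothesized mean difference is 0

print(f"d_bar = {d_bar:.3f}, var_d = {var_d:.4f}, se = {se:.4f}, t = {t:.4f}")
```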
p-value Calculation
Confidence Interval for µd
18.075 ± 20.765
(−2.690, 38.840)
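The interval can be traced back to the quantities computed for the test statistic. As a sketch, assuming a 95% confidence level and the tabulated value t = 2.2010 for 11 degrees of freedom (neither is stated explicitly here):

```latex
\bar{d} \pm t_{1-\alpha/2}\, s_{\bar{d}}
  = 18.075 \pm 2.2010 \sqrt{\frac{1068.0930}{12}}
  = 18.075 \pm 2.2010\,(9.4344)
  = 18.075 \pm 20.765
```

Because the interval contains zero, a two-sided test at the same level would not reject µd = 0.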
The Use of z
$$z = \frac{\bar{d} - \mu_d}{\sigma_d/\sqrt{n}}$$
Assumption and Alternatives
If neither z-test nor t-test is appropriate for use with available data,
one may consider using nonparametric methods.
Disadvantages of Paired Comparisons
The use of the paired comparisons test is not without its problems. If
different subjects are used and randomly assigned to two treatments,
considerable time and expense may be involved in trying to match
individuals on relevant variables.
Chapter 5: Simple Linear Regression and Correlation
Learning Objectives:
216
Chapter 5: Simple Linear Regression and Correlation 0214STAT: Fundamentals of Biostatistics 216 / 315
Introduction to Simple Linear Regression and Correlation
What is Regression?
What is Correlation?
The Simple Linear Regression Model
y = b0 + b1 x + ϵ
y: Response variable.
x: Explanatory (independent) variable.
b0 , b1 : Regression coefficients.
ϵ: Error term representing deviations from the mean.
LINE Assumptions
Assumptions of Linear Regression: Linearity
Linearity:
A straight line is used to model the relationship between X
and Y .
A non-linear pattern indicates the linear regression model is
not suitable.
Assumptions of Linear Regression: Independence
Independence:
No autocorrelation:
Error terms (residuals) should not be correlated with each
other.
Errors for one observation should not depend on errors for
another.
Implication:
Violations can lead to unreliable standard errors and, as a result, invalid hypothesis tests and confidence intervals.
Example of Violation:
Time-series data often exhibit autocorrelation.
Detection:
Use the Durbin-Watson test to check for autocorrelation.
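The Durbin-Watson statistic itself is simple to compute from the ordered residuals: it is the sum of squared successive differences divided by the sum of squared residuals. The sketch below uses two hypothetical residual series (not data from these notes) to show how the statistic behaves; the function name is illustrative.

```python
def durbin_watson(residuals):
    """Sum of squared successive differences over the sum of squared residuals."""
    num = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    den = sum(e ** 2 for e in residuals)
    return num / den

# Hypothetical residuals, for illustration only.
trending = [1.0, 0.8, 0.6, 0.2, -0.3, -0.7, -0.9, -0.7]       # neighbours alike
alternating = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0]    # sign flips each step

d_trending = durbin_watson(trending)
d_alternating = durbin_watson(alternating)

print(f"trending: d = {d_trending:.3f}, alternating: d = {d_alternating:.3f}")
```

The statistic ranges from 0 to 4. Values near 2 are consistent with no first-order autocorrelation, values well below 2 point to positive autocorrelation, and values well above 2 point to negative autocorrelation. A formal decision requires the tabulated Durbin-Watson bounds.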
Assumptions of Linear Regression: Normality
Normality:
The error term (ϵ) must follow a normal distribution.
Assumptions of Linear Regression: Equal Variances
Testing Homoscedasticity:
Plot residuals (errors) against the fitted values (ŷ) or the independent variable (X) and check that the spread is roughly constant.
Implications of Heteroscedasticity:
Leads to inefficient estimates of regression coefficients.
May affect hypothesis tests and confidence intervals.
Graphical Representation of the Regression Model
Prediction and Estimation
Key Note: The sample data provide known values of both X and
Y . When using the regression equation, only X values are known.
Example: Simple Linear Regression Analysis
Example 9.3.1: Predicting Deep Abdominal AT from Waist
Circumference
Study:
Deep abdominal adipose tissue (AT) is linked to cardiovascular
disease risks.
Computed tomography (CT) accurately measures deep
abdominal AT but is expensive and involves radiation.
CT is not widely available for routine use by most physicians.
Study Objective:
Researchers aimed to predict deep abdominal AT using simpler
body measurements, like waist circumference.
The study involved healthy men aged 18–42 with no metabolic
diseases requiring treatment.
Data:
Measurements included deep abdominal AT (via CT) and waist
circumference (Table 9.3.1).
Table 9.3.1: Waist Circumference and Deep Abdominal AT
Subject   Waist Circumference (cm)   Deep Abdominal AT (cm²)
1 74.75 25.72
2 72.60 25.89
3 81.80 42.60
4 83.95 42.80
5 74.65 29.84
6 71.85 21.68
7 80.90 29.08
8 83.40 32.98
9 63.50 11.44
10 73.20 32.22
Table 9.3.1: Waist circumference (X) and deep abdominal AT (Y) for a sample of 10 of the 109 subjects (page 418).
The independent variable X is plotted on the horizontal axis.
The dependent variable Y is plotted on the vertical axis.
The pattern of points on the scatter diagram suggests the nature and strength of the relationship.
Figure: Scatter diagram of data shown in Table
Interpreting the Scatter Diagram
The Least-Squares Line
y = b0 + b1 x
where:
b0 is the y-intercept.
b1 is the slope of the line.
Calculating the Least-Squares Line
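$$b_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}, \qquad b_0 = \bar{y} - b_1 \bar{x}$$
where:
xi and yi are the data points.
x̄ and ȳ are the sample means of X and Y, respectively.
These quantities are typically computed using software such as MINITAB.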
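For a small data set the least-squares coefficients can also be computed directly. The sketch below applies the standard formulas (slope = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)², intercept = ȳ − slope·x̄) to the 10 subjects listed in Table 9.3.1. Because only 10 of the 109 subjects are used, the resulting coefficients are not expected to match the full-sample equation reported in these notes.

```python
# Waist circumference (cm) and deep abdominal AT for the 10 subjects in Table 9.3.1.
waist = [74.75, 72.60, 81.80, 83.95, 74.65, 71.85, 80.90, 83.40, 63.50, 73.20]
at = [25.72, 25.89, 42.60, 42.80, 29.84, 21.68, 29.08, 32.98, 11.44, 32.22]

n = len(waist)
x_bar = sum(waist) / n
y_bar = sum(at) / n

# Sums of cross-products and squares about the means.
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(waist, at))
sxx = sum((x - x_bar) ** 2 for x in waist)

b1 = sxy / sxx             # slope
b0 = y_bar - b1 * x_bar    # intercept

# Residuals of the fitted line; for a least-squares fit they sum to zero.
residuals = [y - (b0 + b1 * x) for x, y in zip(waist, at)]

print(f"b0 = {b0:.2f}, b1 = {b1:.3f}, sum of residuals = {sum(residuals):.2e}")
```

Two properties are worth noting: the fitted line always passes through the point (x̄, ȳ), and the residuals sum to zero up to rounding error.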
Example: Calculating the Regression Line
The Least-Squares Regression Line
From Figure 9.3.2, the linear equation for the least-squares line
describing the relationship between waist circumference (X) and
deep abdominal AT (Y ) is: ŷ = −216 + 3.46x
Since b̂0 (intercept) is negative, the line crosses the Y-axis below the origin.
Since b̂1 (slope) is positive, the line extends from the lower left-hand corner to the upper right-hand corner of the graph.
For each unit increase in X, Y increases by 3.46 units.
The symbol ŷ represents a predicted value of Y , rather than an
observed value.
For X = 110:
These coordinates, (70, 26.2) and (110, 164), can be used to plot the least-squares line. The graph in the figure illustrates the original data and the least-squares line.
Simple Linear Regression
The Least-Squares Criterion
The least-squares line is considered the "best fit" line for describing the relationship between the two variables. But what makes it the best?
The least-squares line minimizes the sum of squared vertical
deviations between the observed data points (yi ) and the line.
The sum of squared deviations is smaller for the least-squares
line than for any other line that can be drawn through the points.
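The criterion can be written out explicitly. The derivation below is a standard sketch; the symbol Q for the sum of squared deviations is introduced here for convenience and does not appear in the original notes.

```latex
Q(b_0, b_1) = \sum_{i=1}^{n} \left( y_i - b_0 - b_1 x_i \right)^2

\frac{\partial Q}{\partial b_0} = -2 \sum_{i=1}^{n} \left( y_i - b_0 - b_1 x_i \right) = 0,
\qquad
\frac{\partial Q}{\partial b_1} = -2 \sum_{i=1}^{n} x_i \left( y_i - b_0 - b_1 x_i \right) = 0

b_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2},
\qquad
b_0 = \bar{y} - b_1 \bar{x}
```

Setting both partial derivatives to zero and solving the two equations gives the slope and intercept that minimize Q.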
The Correlation Coefficient
The formula allows computation of r without first computing the regression coefficient b. The correlation coefficient can also be expressed as:
$$r = \frac{S_{xy}}{S_x S_y}, \qquad S_{xy} = \frac{1}{n - 1} \sum (x_i - \bar{x})(y_i - \bar{y})$$
where x̄ and ȳ are the means of X and Y, respectively, and Sx and Sy are their sample standard deviations.
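The covariance form of r is easy to evaluate by hand or in code. The sketch below applies it to the 10 subjects of Table 9.3.1, which is an illustrative choice of data, and also computes the least-squares slope to show that the two share the same sign.

```python
import math

# Waist circumference (cm) and deep abdominal AT for the 10 subjects in Table 9.3.1.
waist = [74.75, 72.60, 81.80, 83.95, 74.65, 71.85, 80.90, 83.40, 63.50, 73.20]
at = [25.72, 25.89, 42.60, 42.80, 29.84, 21.68, 29.08, 32.98, 11.44, 32.22]

n = len(waist)
x_bar = sum(waist) / n
y_bar = sum(at) / n

# Sample covariance and sample standard deviations (n - 1 denominators).
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(waist, at)) / (n - 1)
s_x = math.sqrt(sum((x - x_bar) ** 2 for x in waist) / (n - 1))
s_y = math.sqrt(sum((y - y_bar) ** 2 for y in at) / (n - 1))

r = s_xy / (s_x * s_y)

# The least-squares slope equals r * s_y / s_x, so it has the same sign as r.
slope = s_xy / s_x ** 2

print(f"r = {r:.3f}, slope = {slope:.3f}")
```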
Correlation Assumptions
Graphical Representation of the Bivariate Normal
Distribution
The bivariate normal distribution is represented graphically in Figure
9.6.1.
Sample Correlation Coefficient
The sample correlation coefficient r will always have the same sign as
the sample slope b.
If r = 1, perfect direct linear correlation.
If r = −1, perfect inverse linear correlation.
If r = 0, no linear correlation.
Figure 9.4.6: Scatter diagrams showing (a) direct linear relationship,
(b) inverse linear relationship, and (c) no linear relationship between
X and Y .
Hypothesis Test for Correlation
Test Statistic for Correlation
Decision Rule:
Using a significance level α = 0.05, we find the critical values
for t:
±1.9754 (for 153 degrees of freedom)
If the calculated t-value is outside this range, we reject H0 .
Output of Example 9.7.1
procedure.
Test Statistic Calculation
Conclusion:
Based on the hypothesis test, we conclude that there is a significant
linear correlation between height and SEP levels in the population.
p-value < 0.005
t = 19.787 > 2.6085
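The formula for the test statistic is not reproduced in this part of the notes. Assuming the standard statistic for H0: ρ = 0, namely t = r·√((n − 2)/(1 − r²)) with n − 2 degrees of freedom, the reported t and the 153 degrees of freedom can be inverted to recover the sample correlation they imply. The function names are illustrative.

```python
import math

def t_from_r(r, n):
    """t statistic for testing zero correlation (ASSUMED standard form)."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))

def r_from_t(t, df):
    """Inverse of the relation above: r = t / sqrt(t^2 + df)."""
    return t / math.sqrt(t ** 2 + df)

t_reported = 19.787    # from the conclusion of the example
df = 153               # degrees of freedom quoted in the decision rule
n = df + 2

r_implied = r_from_t(t_reported, df)
t_check = t_from_r(r_implied, n)     # round trip back to the reported t

print(f"implied r = {r_implied:.3f}, t recomputed = {t_check:.3f}")
```

The implied correlation is strongly positive, which agrees with the conclusion of a significant linear correlation.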
Hypotheses about the slope b1: the test statistic
The Test Statistic: For testing hypotheses about b1, the test statistic when σ²_{y|x} is known is given by:
$$z = \frac{\hat{b}_1 - b_{1(0)}}{\sigma_{\hat{b}_1}} \qquad (9.4.8)$$
Hypotheses about b1: the test statistic
As a rule, σ²_{y|x} is unknown. In this case, the test statistic becomes:
$$t = \frac{\hat{b}_1 - b_{1(0)}}{s_{\hat{b}_1}} \qquad (9.4.9)$$
Example 9.4.2
3. Hypotheses:
H0 : b1 = 0 vs. HA : b1 ≠ 0 at α = 0.05
$$t = \frac{\hat{b}_1 - 0}{s_{\hat{b}_1}} = \frac{3.4589}{0.2347} = 14.74$$
Learning Objectives:
258
Chapter 6: The Chi-square Distribution and Frequencies 0214STAT: Fundamentals of Biostatistics 258 / 315
Introduction to The Mathematical Properties of the
Chi-Square Distribution
The Properties of the Chi-Square Distribution
$$z = \frac{y - \mu}{\sigma}$$
Mathematical Formula for Chi-Square Distribution
The probability density function (pdf) of the chi-square distribution
with k degrees of freedom is given by:
$$f(u) = \frac{1}{\Gamma(k/2)\, 2^{k/2}}\, u^{k/2 - 1} e^{-u/2}, \qquad u > 0$$
where:
Γ(k/2) is the Gamma function, a generalization of the factorial function, defined for x > 0 as:
$$\Gamma(x) = \int_0^{\infty} t^{x-1} e^{-t}\, dt$$
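The density can be evaluated directly with `math.gamma` from the standard library. The sketch below implements the formula and checks numerically that the area under the curve is close to 1; the function names, the integration range, and the step count are illustrative choices.

```python
import math

def chi2_pdf(u, k):
    """Chi-square density with k degrees of freedom, evaluated at u."""
    if u <= 0:
        return 0.0
    return u ** (k / 2 - 1) * math.exp(-u / 2) / (math.gamma(k / 2) * 2 ** (k / 2))

def integrate_pdf(k, upper=60.0, steps=60000):
    """Midpoint-rule approximation of the area under the density on (0, upper)."""
    h = upper / steps
    return sum(chi2_pdf((i + 0.5) * h, k) for i in range(steps)) * h

# For k = 2 the density reduces to 0.5 * exp(-u / 2).
value_k2 = chi2_pdf(1.0, 2)

area_k2 = integrate_pdf(2)
area_k5 = integrate_pdf(5)

print(f"f(1; k=2) = {value_k2:.5f}, area k=2: {area_k2:.5f}, area k=5: {area_k5:.5f}")
```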
Observed Versus Expected Frequencies
Chi-Square Test Statistic
$$\chi^2 = \sum \frac{(O_i - E_i)^2}{E_i}$$
Where:
Oi is the observed frequency for the i-th category.
Ei is the expected frequency for the i-th category.
Perfect Match Between Observed and Expected Frequencies
Disagreement Between Observed and Expected Frequencies
Reject H0
Chi-Square Test Statistic
The quantity:
$$\frac{(O_i - E_i)^2}{E_i}$$
will be small if the observed and expected frequencies are close, and large if they differ substantially.
Chi-Square Test Statistic
Handling Small Expected Frequencies:
Solutions:
Combine adjacent categories to achieve the minimum expected
frequency (suggested to be at least 5).
Reduce the degrees of freedom by the number of categories lost through combining.
Goodness-of-Fit Test
Procedure for Goodness-of-Fit Test
Example 12.3.1: The Normal Distribution
Study: Cranor and Christensen (A-1) conducted a study to assess
short-term clinical, economic, and humanistic outcomes of
pharmaceutical care services for patients with diabetes. The
cholesterol levels of 47 subjects are summarized in the table below:
Cholesterol Level (mg/dl) Number of Subjects
100.0–124.9 1
125.0–149.9 3
150.0–174.9 8
175.0–199.9 18
200.0–224.9 6
225.0–249.9 4
250.0–274.9 4
275.0–299.9 3
$$\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}$$
where:
Oi = Observed frequency in class i
Ei = Expected frequency in class i
Decision Rule
Calculation of Test Statistic
The sample mean and standard deviation are used to estimate the
parameters of the hypothesized normal distribution:
x̄ = 198.67, s = 41.31
The next step is to determine the expected frequencies for each class
interval based on these estimates.
Expected Relative Frequencies
$$z_0 = \frac{x_0 - \mu}{\sigma}$$
where x0 is the value in the class interval, µ is the mean, and σ is the
standard deviation.
Expected Relative Frequency
The area to the left of z = −1.78 is 0.0375, and the area to the
left of z = −1.18 is 0.1190.
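The two areas give the expected relative frequency of one class, and multiplying by the sample size gives its expected frequency. The sketch below assumes the areas belong to the class 125.0–149.9, with boundaries 125 and 150; this is inferred from z = (125 − 198.67)/41.31 ≈ −1.78 and z = (150 − 198.67)/41.31 ≈ −1.18 rather than stated on the slide.

```python
from statistics import NormalDist

n = 47
x_bar, s = 198.67, 41.31     # estimates of the mean and standard deviation

# Table-based calculation using the two areas quoted in the text.
area_table = 0.1190 - 0.0375
expected_table = n * area_table

# Same calculation without rounding z to two decimals.
# ASSUMPTION: class boundaries 125 and 150.
fitted = NormalDist(mu=x_bar, sigma=s)
area_exact = fitted.cdf(150) - fitted.cdf(125)
expected_exact = n * area_exact

print(f"table: area = {area_table:.4f}, expected = {expected_table:.2f}")
print(f"exact: area = {area_exact:.4f}, expected = {expected_exact:.2f}")
```

Both versions round to an expected frequency of about 3.8 or 3.9 for this class.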
Observed and Expected Frequencies
Class Interval       Observed Frequency (Oi)   Expected Frequency (Ei)   (Oi − Ei)²/Ei
< 100                0                         0.4                       -
100.0–124.9          1                         1.4                       0.356
125.0–149.9          3                         3.8                       0.168
150.0–174.9          8                         7.8                       0.005
175.0–199.9          18                        10.7                      4.980
200.0–224.9          6                         10.7                      2.064
225.0–249.9          4                         7.2                       1.422
250.0–274.9          4                         3.5                       0.071
275.0–299.9          3                         1.2                       1.500
300.0 and greater    0                         0.3                       -
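The first, second, and last entries in the last column, for example, are computed as:
First entry (classes "< 100" and "100.0–124.9" combined, so O = 0 + 1 = 1 and E = 0.4 + 1.4 = 1.8): (1 − 1.8)²/1.8 = 0.356
Second entry: (3 − 3.8)²/3.8 = 0.168
Last entry (classes "275.0–299.9" and "300.0 and greater" combined, so O = 3 + 0 = 3 and E = 1.2 + 0.3 = 1.5): (3 − 1.5)²/1.5 = 1.5
The other values of (Oi − Ei)²/Ei are computed in a similar manner.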
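Summing the last column of the table gives the test statistic. The sketch below does this after merging the two smallest classes at each end, so the first expected frequency is 0.4 + 1.4 = 1.8 and the last is 1.2 + 0.3 = 1.5, as in the worked entries; the variable names are illustrative.

```python
# Observed and expected frequencies after combining the end classes.
observed = [1, 3, 8, 18, 6, 4, 4, 3]
expected = [1.8, 3.8, 7.8, 10.7, 10.7, 7.2, 3.5, 1.5]

# One term (O - E)^2 / E per class, then the sum.
terms = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
chi2 = sum(terms)

for o, e, term in zip(observed, expected, terms):
    print(f"O = {o:2d}, E = {e:5.1f}, (O - E)^2 / E = {term:.3f}")
print(f"chi-square = {chi2:.3f}")
```

Note that the observed and expected totals agree (both 47), which is the condition ΣEi = ΣOi.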
Degrees of Freedom Calculation
Remark: Condition ΣEi = ΣOi
The condition ΣEi = ΣOi ensures consistency between:
ΣOi: The total observed frequencies from the data.
ΣEi: The total expected frequencies under H0.
This is required to:
Ensure the total sample size is the same for observed and
expected distributions.
Maintain fairness in comparing observed (Oi ) and expected (Ei )
frequencies.
This condition is one of the restrictions subtracted from the total
degrees of freedom when performing the test.
Example from the preceding table:
Suppose there are 47 observations across 8 intervals.
ΣOi = 47, so ΣEi must also be adjusted to 47.
Chi-Square Test for Goodness-of-Fit
Example 12.3.5: Chi-Square Goodness-of-Fit Test
2. Total parts in the ratio: The sum of parts is:
1 + 2 + 1 = 4.
Degrees of freedom = k − 1 = 3 − 1 = 2.
In this case: χ2 = 13.71 > 5.991. Since 13.71 is greater than the
critical value, we reject H0 . This suggests that the observed data do
not align with the expected distribution under the 1:2:1 ratio.
Step 7: P-Value The p-value is the probability of observing a test
statistic at least as extreme as χ2 = 13.71 under the assumption that
H0 is true. From the chi-square distribution table for 2 degrees of
freedom:
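With 2 degrees of freedom the chi-square density reduces to ½·e^(−u/2), so the upper-tail probability has the closed form P(χ² > x) = e^(−x/2) and no table is needed. A minimal sketch (the function name is illustrative):

```python
import math

def chi2_sf_df2(x):
    """Upper-tail probability P(chi-square > x) for 2 degrees of freedom."""
    return math.exp(-x / 2)

chi2_stat = 13.71
p_value = chi2_sf_df2(chi2_stat)

# Sanity check: the critical value 5.991 should leave about 0.05 in the upper tail.
alpha_at_critical = chi2_sf_df2(5.991)

print(f"p-value = {p_value:.5f}, tail area beyond 5.991 = {alpha_at_critical:.4f}")
```

The p-value is far below 0.005, in line with the decision to reject H0.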
Introduction to Tests of Independence
Contingency Table
Testing the Null Hypothesis
Calculating Expected Frequencies
To calculate expected frequencies under the assumption of
independence, use the formula:
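$$E_{ij} = \frac{n_{i.} \cdot n_{.j}}{n}$$
where n_{i.} is the total of row i, n_{.j} is the total of column j, and n is the overall sample size.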
This formula calculates the expected frequency Eij for each cell,
assuming the null hypothesis of independence.
Repeat this calculation for all cells in the table to obtain the expected frequencies.
Observed vs Expected Frequencies
The observed and expected frequencies are compared.
If the discrepancy between observed and expected frequencies is
sufficiently large, the null hypothesis is rejected.
The chi-square statistic χ2 is computed using:
$$\chi^2 = \sum \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$$
where Oij are the observed frequencies and Eij are the expected frequencies.
EXAMPLE 12.4.1: Folic Acid Use and Race
Chi-Square Test Statistic
Formula for the Test Statistic: $$\chi^2 = \sum \frac{(O_i - E_i)^2}{E_i}$$
Degrees of Freedom: (r − 1)(c − 1) = (3 − 1)(2 − 1) = 2.
Decision Rule: Reject H0 if χ² ≥ χ²_{1−0.05, 2} = 5.991.
Race Yes (Observed, Expected) No (Observed, Expected) Total
White 260 (247.86) 299 (311.14) 559
Black 15 (24.83) 41 (31.17) 56
Other 7 (9.31) 14 (11.69) 21
Total 282 354 636
Table: Observed and Expected Frequencies.
Remark: The table displays the observed and expected frequencies for the race and folic
acid usage categories. These are used to calculate the chi-square statistic.
Expected Frequencies
Test Statistic (Chi-Square) Calculation: The test statistic χ2 is
calculated as:
$$\chi^2 = \frac{(260 - 247.86)^2}{247.86} + \cdots + \frac{(14 - 11.69)^2}{11.69} = 0.59461 + 0.47368 + \cdots + 0.45647 = 9.08960$$
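The whole calculation for the folic acid table can be reproduced from the observed counts alone. The sketch below computes each expected frequency as (row total × column total) / n and then sums the chi-square terms; the variable names are illustrative, and the critical value 5.991 is the one quoted in the decision rule.

```python
# Observed counts: rows are White, Black, Other; columns are Yes, No.
observed = [
    [260, 299],
    [15, 41],
    [7, 14],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected frequency for each cell under independence.
expected = [[r * c / n for c in col_totals] for r in row_totals]

chi2 = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
df = (len(observed) - 1) * (len(observed[0]) - 1)
reject_h0 = chi2 >= 5.991

print(f"n = {n}, df = {df}, chi-square = {chi2:.4f}, reject H0: {reject_h0}")
```

The result differs from 9.08960 only in the third decimal place, because the hand calculation uses expected frequencies rounded to two decimals.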
Key Concepts:
A test of homogeneity examines if multiple samples come from
populations that are homogeneous with respect to a certain
classification.
It differs from a test of independence, where the goal is to
examine if two criteria of classification are independent.
Both tests use the chi-square statistic but differ in sampling
procedures and hypotheses.
Calculating Expected Frequencies
Expected Frequencies:
Expected frequencies for each cell are computed as:
Solution and Conclusion
Step-by-Step Solution:
1. Calculate expected frequencies for each cell.
2. Compute the χ² statistic using observed and expected frequencies.
3. Compare χ² to the critical value (e.g. χ²_{1−α, df}).
Conclusion:
If χ2 is less than the critical value, fail to reject H0 : Populations
are homogeneous.
If χ2 is greater than the critical value, reject H0 : Populations are
not homogeneous.
Example 12.5.1: Narcolepsy and Migraine
                       Yes    No     Total
Narcoleptic Subjects   21     75     96
Healthy Controls       19     77     96
Total                  40     152    192
Hypotheses and Test Statistic
Sol: Hypotheses:
Null Hypothesis (H0 ): The populations are homogeneous with
respect to migraine frequency.
Alternative Hypothesis (HA ): The populations are not
homogeneous with respect to migraine frequency.
Test Statistic:
$$\chi^2 = \sum \frac{(O_i - E_i)^2}{E_i}$$
Degrees of Freedom:
(r − 1)(c − 1)
where r = number of rows, c = number of columns.
Solution and Conclusion
Step-by-Step Solution:
1. Calculate expected frequencies for each cell.
$$E_{11} = \frac{96 \times 40}{192} = 20, \qquad E_{12} = \frac{96 \times 152}{192} = 76$$
$$E_{21} = \frac{96 \times 40}{192} = 20, \qquad E_{22} = \frac{96 \times 152}{192} = 76$$
2. Compute the χ² statistic using observed and expected frequencies:
Steps in Hypothesis Testing
df = (r − 1)(c − 1) = (2 − 1)(2 − 1) = 1.
Statistical Decision: Since 0.126 < 3.841, we fail to reject the
null hypothesis.
Conclusion:
p = 0.722.
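The statistic 0.126 and the p-value 0.722 follow from the 2 × 2 table of Example 12.5.1. The sketch below recomputes both. With 1 degree of freedom, a chi-square variable is the square of a standard normal variable, so the upper-tail probability can be written with the complementary error function as erfc(√(x/2)); the variable names are illustrative.

```python
import math

# Observed counts: rows are narcoleptic subjects and healthy controls;
# columns are migraine yes and no.
observed = [[21, 75], [19, 77]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

expected = [[r * c / n for c in col_totals] for r in row_totals]

chi2 = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

# Upper-tail probability for 1 degree of freedom.
p_value = math.erfc(math.sqrt(chi2 / 2))

print(f"expected = {expected}, chi-square = {chi2:.3f}, p = {p_value:.3f}")
```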