1 Introduction To Biostatistics
Biostatistics
Mengistu Y. (BSC, MPH-HI, PhD fellow, Assi. Prof. PH)
2022
Learning Objectives
General Objective
♦ To provide the statistical methods and numerical descriptions that are useful for generating
information about certain situations and presenting it in such a way that valid interpretations
are possible
Specific Objectives
♦ design, organize, present and summarize data
♦ understand the process involved in data collection and processing
♦ distinguish between categorical and numeric data
♦ understand probabilities and their applications
♦ interpret summary statistics, graphical displays and contingency tables commonly presented in
the health literature
♦ carry out exploratory data analysis
♦ understand the process involved in estimations and hypothesis testing
♦ interpret the functions of confidence intervals and p-values
♦ give an interpretation or reach a conclusion about a population on the basis of information
contained in a sample drawn from that population.
Course content
♦ Introduction to the course
♦ Data and Scales of measurement
♦ Methods of data organization and presentation
♦ Frequency distribution
♦ Measures of central tendency and dispersion
♦ Basic principles of probability
♦ Rules of probability and applications (additive,
multiplicative, Bayes')
References: (available in the Library)
collecting data
organizing
Summarizing data
Presenting data
analysing and drawing conclusion (inferences) from data.
• Statistical data:
• In the sense of statistical data, the term refers to numerical descriptions of things.
• These descriptions may take the form of counts or measurements.
NB Even though statistical data always denote figures (numerical
descriptions), it must be remembered that not all numerical descriptions
are statistical data
• Statistical methods:
• The term refers to a body of methods that are used for collecting, organising,
summarizing, analysing and interpreting numerical data for
understanding a phenomenon or making wise decisions.
04/13/2024 6
Definitions…
• Biostatistics is the application of different statistical methods for
biological, medical and public health data
• A population is any specific collection of objects of interest.
• A sample is any subset or sub-collection of the population
• A census is the special case in which the sample consists of the whole population.
Definitions ...
• A measurement is a number or attribute computed for each member
of a population or of a sample.
• A parameter is a characteristic of the population as a whole.
• A statistic is a characteristic of the sample data.
• Descriptive statistics is a study of data: involves organizing,
displaying, and describing properties of the data
• Inferential statistics is drawing conclusions about a population of
interest based on information contained in the sample taken from the
population.
Definitions …
• The distinction between a population together with its parameters
and a sample together with its statistics is a fundamental concept in
inferential statistics.
[Diagram: a sample is drawn from the population; statistics computed from the sample are used to make inferences about the population parameters.]
Definition …
• In research, the dependent variable is the variable that you believe might be influenced or modified by some
treatment or exposure.
• The dependent variable is also called the outcome variable. Which variable is dependent depends on the context
of the study.
• Example: A study examined the relationship of dietary fat consumption and the development
of ischemic stroke. Here the development of ischemic stroke is the dependent (outcome) variable.
Characteristics…
iv) They must have been collected in a systematic manner for a
predetermined purpose. Numerical data can be called statistics only if
they have been compiled in a properly planned manner and for a
purpose about which the enumerator had a definite idea.
v) They must be placed in relation to each other. That is, they must be
comparable. Numerical facts may be placed in relation to each other
either in point of time, space or condition.
Source of data
• Routine data collection
• Routine health unit and community data
• Activity data about patients seen and programmes run, routine
services and epidemiological surveillance;
• Semi-permanent data about the population served, the facility
itself and staff that run it
• Vital registration
• Non-routine data collection
• Surveys
• Population census (headcounts proportion/facility catchment’s area)
• Quantitative or qualitative rapid assessment methods
Techniques of data collection
Techniques of collecting data cont'd
Observation: is a technique that involves systematically
selecting, watching and recording behavior and
characteristics of living things, objects or phenomena.
• Observation of human behavior is a much-used data
collection technique. It can be undertaken in different
ways;
• Participant observation: The observer takes part in the
situation he or she observes.
• Non-participant observation: The observer watches the
situation, openly or concealed, but does not participate
Data collection techniques cont'd
• Observations can give additional, more accurate information
on behavior of people than interviews or questionnaires
Data collection techniques cont'd
• Interview (face-to-face): is a data-collection technique that
involves oral questioning of respondents, either individually or as
a group.
Data collection techniques cont'd
Types of questions
Such questions are useful for obtaining in-depth information
on:
• sensitive issues.
Types of questions
• Example;
2. 'What do you think are the reasons some adolescents in this area start
using drugs?'
3. 'What would you do if you noticed that your daughter (schoolgirl) had
a relationship with someone?'
Types of questions
• Advantage of open-ended questions
• Allow you to probe more deeply into issues of interest being
raised.
• Information provided in the respondents' own words might be
useful
• Risks of completely open-ended questions
• A big risk is incomplete recording of all relevant issues covered in
the discussion.
• Analysis is time-consuming and requires experience; otherwise
important data may be lost.
Types of questions
2. Closed questions: have a list of possible options or answers
from which the respondents must choose
Types of questions
Types of questions
2. 'Did you eat any of the following foods yesterday?' (Circle yes if at least one
item in each set of items is eaten.)
Types of questions
• Advantages of closed questions
• It saves time
• Comparing responses of different groups, or of the same group over time,
becomes easier.
• Risks of closed questions:
• In the case of illiterate respondents, bias will be introduced
Steps in designing questionnaire
1. Content: Take your objectives and variables as a starting point
Steps in designing questionnaire
3. Sequencing the questions: Design your interview
schedule or questionnaire to be 'informant friendly'
Steps in designing questionnaire
explaining the purpose of the study
requesting the informant's consent to be interviewed
assuring confidentiality of the data obtained.
6. Pre-test:
For qualitative study
Focus group discussions: It allows a group of 8 - 12 informants to freely discuss a
certain subject with the guidance of a facilitator or reporter
In-depth interview
Key informant interview
Rationale of studying statistics
• Why do we need to use statistics?
• The reason is the presence of variability.
Limitations of statistics
1. It deals with aggregates of facts and attaches no importance to
individual items; it is suited only when group characteristics are to
be studied.
2. Statistical data are only approximately and not mathematically
correct.
Data and types of data
• Qualitative (or categorical) data consist of values that can be
separated into different categories that are distinguished by some
nonnumeric characteristic.
• Cannot be measured in quantitative form but can only be identified by name
or categories
• Quantitative data consist of values representing counts or
measurements. Expressed numerically and they can be of two types
(discrete or continuous).
Types of Quantitative Data
• Continuous data can take on any value in a given interval. Continuous
data values result from some continuous scale that covers a range of
values without gaps, interruptions, or jumps.
• Discrete data can take on only particular distinct values and not other
values in between. The number of possible values is either finite
or countable.
Scale of measurement
• Nominal
• Ordinal
• Interval
• Ratio
• Nominal and ordinal are qualitative (categorical) levels of
measurement.
• Interval and ratio are quantitative levels of measurement.
Types of Variables
• Variable types can be distinguished based on their scale. Typically,
different statistical methods are appropriate for variables of different
scales
Scale      Characteristic question                 Examples
Nominal    Is A different than B?                  Marital status, eye color, gender, religious affiliation, race
Ordinal    Is A bigger than B?                     Stage of disease, severity of pain, level of satisfaction
Interval   By how many units do A and B differ?    Temperature
Ratio      How many times bigger than B is A?      Distance, length, time until death, weight
Operations that make sense for variables of different scales
Scale      Counting   Ranking   Addition/subtraction   Multiplication/division
Nominal    Yes        No        No                     No
Ordinal    Yes        Yes       No                     No
Interval   Yes        Yes       Yes                    No
Ratio      Yes        Yes       Yes                    Yes
TYPES OF QUALITATIVE MEASUREMENTS
• Nominal level of measurement—classifies data into names, labels or
categories in which no order or ranking can be imposed.
Examples:
Sex (M, F)
Exam result (P, F)
Blood group (A, B, O or AB)
Color of eyes (blue, green, brown, black)
• Ordinal level of measurement—classifies data into categories that can be
ordered or ranked, but precise differences between the ranks do not exist.
Generally it does not make sense to do calculations with data at the
ordinal level.
Examples:
Response to treatment (poor, fair, good)
Severity of disease (mild, moderate, severe)
Income status (low, middle, high)
TYPES OF QUANTITATIVE MEASUREMENTS
• Interval level of measurement: ranks data, and precise differences
between units of measure exist, but there is no meaningful zero. If a
zero exists, it is an arbitrary point. Example: IQ scores. It makes sense
to talk about someone having an IQ 20 points higher than another
person, but an IQ of zero has no meaning.
Copyright © 2009 Pearson Education, Inc.
Data organization and presentation
• Statistics is used to organize and interpret research
observations and findings.
• Before interpretation & communication of the
findings, the raw data must be organized and
presented in a clear and understandable way.
The following techniques are used to organize and summarize a set of
data in a concise way:
• Organization of data
• Summarization of data
• Presentation of data
Cont...
• Numbers that have not been summarized and
organized are called raw data
Cumulative frequency
• Two other distributions are particularly useful for describing
ordinal data: the cumulative frequency and the cumulative relative frequency.
• They are not meaningful for nominal data.
E.g. You would never say that 70% are below blue color.
• The cumulative frequency is the number of
observations in the category plus observations in all
categories smaller than it.
• Cumulative relative frequency is the proportion of
observations in the category plus observations in all
categories smaller than it, and is obtained by dividing
the cumulative frequency by the total number of
observations.
Table 2. Distribution of birth weight of newborns
between 1976-1996 at TAH.
Frequency distribution for numerical data
• Beyond an ordered array, further useful summarization may be
achieved by grouping the data.
• To group a set of observations we select a set of
contiguous, non-overlapping intervals such that each
value in the set of observations can be placed in one,
and only one, of the intervals.
• These intervals are usually referred to as class
intervals.
• One of the first considerations when data are to
be grouped is how many intervals to include
• The question is how best we can organize such
data, especially when we have a huge data set
that is not manageable by eye.
Table 3. Frequencies of serum cholesterol levels for 1067 US males of
ages 25-34 (1976-1980).
---------------------------------------------------------------------------
Cholesterol level   Freq    Relative freq   Cum freq   Cum. rel. freq
(mg/100 ml)                 (%)                        (%)
---------------------------------------------------------------------------
80-119                13        1.2             13          1.2
120-159              150       14.1            163         15.3
160-199              442       41.4            605         56.7
200-239              299       28.0            904         84.7
240-279              115       10.8           1019         95.5
280-319               34        3.2           1053         98.7
320-359                9        0.8           1062         99.5
360-399                5        0.5           1067        100
---------------------------------------------------------------------------
Total               1067      100
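The relative and cumulative columns of Table 3 can be reproduced from the frequency column alone. The following is a minimal sketch in Python; the variable names are illustrative and not part of the original notes.

```python
from itertools import accumulate

# Class labels and frequencies copied from Table 3.
classes = ["80-119", "120-159", "160-199", "200-239",
           "240-279", "280-319", "320-359", "360-399"]
freq = [13, 150, 442, 299, 115, 34, 9, 5]

n = sum(freq)                                   # total number of observations
rel_freq = [100 * f / n for f in freq]          # relative frequency in percent
cum_freq = list(accumulate(freq))               # running total of the frequencies
cum_rel_freq = [100 * c / n for c in cum_freq]  # cumulative relative frequency in percent

for row in zip(classes, freq, rel_freq, cum_freq, cum_rel_freq):
    print("{:<8} {:>5} {:>6.1f} {:>5} {:>6.1f}".format(*row))
```

Each cumulative entry is the class frequency plus the frequencies of all lower classes, which is why the last cumulative frequency equals the total.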
For both discrete and continuous data the values are
grouped into non-overlapping intervals, usually of
equal width.
Example of raw data of age….
Example of categorized data of age
How to calculate class interval?
To determine the number of class intervals and the
corresponding width, we use:
Sturges' rule:
$$K = 1 + 3.322 \log_{10} n, \qquad W = \frac{L - S}{K}$$
where
K = number of class intervals
n = number of observations
W = width of the class interval
L = the largest value
S = the smallest value
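As a rough illustration of Sturges' rule, the sketch below computes K for n = 80 observations (the size of the leisure-time data set used in this section). The largest and smallest values are hypothetical, chosen only to show the width formula, and the function names are not from the original notes.

```python
import math

def sturges_classes(n):
    """Number of class intervals K suggested by Sturges' rule."""
    return 1 + 3.322 * math.log10(n)

def class_width(largest, smallest, k):
    """Class width W = (L - S) / K."""
    return (largest - smallest) / k

k = sturges_classes(80)   # 80 observations
k_rounded = round(k)      # K is used as a whole number of classes

# Hypothetical extremes, chosen only to illustrate the width formula.
largest, smallest = 38, 10
w = class_width(largest, smallest, k_rounded)
print(round(k, 2), k_rounded, w)
```

In practice the computed width is usually rounded to a convenient value, so the rule is a guide rather than a strict requirement.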
Example
• Construct a grouped frequency distribution of the
following data on the amount of time (in hours) that
80 college students devoted to leisure activities during
a typical school week:
Example:
The amount of time (in hours) that 80 college students devoted to leisure activities during a typical school week
Mid-point and True-limits
Mid-point (class mark): The value of the interval
which lies midway between the lower and the upper
limits of a class.
True limits(class boundaries): Are those limits that
make an interval of a continuous variable continuous
in both directions
Example
Time True limit Mid-point Frequency
(Hours)
• Class interval (width): The length of the class, given by the difference
between its class boundaries. For the 1st class, the interval is 5.
• Note: As the sample size increases and the class interval is reduced, the sample
distribution resembles the population distribution more closely
• Class intervals should be continuous, non-overlapping,
mutually exclusive and exhaustive
Age of patients (years) in a diabetic clinic in Addis
Ababa, 2010
[Table columns: Age group (years); Class limit; Class boundary; Class mid-point; Tally; Frequency (fi); Cumulative frequency (< method, > method); Relative frequency (%); Relative cumulative frequency (< method, > method); with a Total row. The table body is not recoverable.]
METHOD OF DATA PRESENTATION
Data table
Guidelines for constructing tables
• Keep them simple
• Limit the number of variables
• All tables should be self-explanatory
• Include clear title telling what, where and
when
• Clearly label the rows and columns
Cntd…
• State clearly the unit of measurement used
• Explain codes and abbreviations in the foot-
note
• Show totals
• If data is not original, indicate the source in
foot-note
Graphical presentation of data
Importance of graphs
• Diagrams have greater attraction than mere figures.
• They give delight to the eye, add a spark of interest and as such
catch the attention
• They help in deriving the required information in less time and
without any mental strain.
• They have greater memorizing value than mere figures.
• They facilitate comparison
Bar charts
Figure 1. Bar charts showing frequency distribution of
the variable ‘BWT’.
[Two bar charts: frequency and relative frequency (%) by BWT category (Very low, Low, Normal, Big).]
Bar charts for comparison
• Multiple bar chart: In order to compare the
distribution of a variable for two or more groups, bars
are often drawn alongside each other for the groups being
compared in a single bar chart.
• Sub-divided bar chart: If there are different
quantities forming the sub-divisions of the totals,
simple bars may be sub-divided in the ratio of the
various sub-divisions to exhibit the relationship of the
parts to the whole.
Fig 2. Bar chart indicating categories of birth weight of
9975 newborns grouped by antenatal follow-up of the
mothers
[Two bar charts: frequency and percent by BWT category (Low, Normal, Big), with bars grouped by antenatal care (Yes, No). Percent labels shown: Low 9 and 7.9; Normal 88.9 and 89; Big 2.1 and 3.1.]
Example: Plasmodium species distribution for confirmed malaria cases,
Zeway, 2003
Pie chart
Pie Chart: Displays the frequency distribution for
nominal or ordinal data.
• In a pie chart the various categories into which the
observations fall are represented as sectors of a
circle
[Two pie charts of birth weight categories: one labelled with frequencies (43, 268, 793, 8870) and one with percentages (0.4, 2.7, 8, 88.9).]
Histogram
• A histogram is a frequency distribution with continuous
class intervals that has been turned into a graph.
Histograms cont…
• The number of observations in each class
is called the frequency. Hence histograms
are also called frequency distributions
Histograms cont…
• Except for the two boundaries, class intervals are
usually chosen to be of equal width. If this is not the
case, the histogram could give a misleading
impression of the shape of the data
Example
Distribution of the age of women at the time of
marriage
Age group No. of women
15-19 11
20-24 36
25-29 28
30-34 13
35-39 7
40-44 3
45-49 2
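Before this table is drawn as a histogram, the class limits are converted to true limits (class boundaries) and mid-points. The sketch below shows one way to do this, assuming ages are recorded in whole years so that adjacent classes are one unit apart; the variable names are illustrative.

```python
# Class limits and frequencies from the age-at-marriage table.
limits = [(15, 19), (20, 24), (25, 29), (30, 34), (35, 39), (40, 44), (45, 49)]
freq = [11, 36, 28, 13, 7, 3, 2]

# Half the gap between one class's upper limit and the next class's lower limit.
half_gap = (limits[1][0] - limits[0][1]) / 2

true_limits = [(lo - half_gap, hi + half_gap) for lo, hi in limits]
midpoints = [(lo + hi) / 2 for lo, hi in limits]
widths = [upper - lower for lower, upper in true_limits]

for (lower, upper), m, f in zip(true_limits, midpoints, freq):
    print(f"{lower}-{upper}  mid-point {m}  frequency {f}")
```

The true limits close the gaps between classes (19 to 20, 24 to 25, and so on), which is what lets the histogram bars touch.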
Age of women at the time of marriage
[Histogram: number of women (vertical axis, 0 to 40) by age group (horizontal axis), with class boundaries 14.5-19.5, 19.5-24.5, 24.5-29.5, 29.5-34.5, 34.5-39.5, 39.5-44.5, 44.5-49.5.]
Fig 5. A histogram displaying frequency distribution of birth weight of newborns at
Tikur Anbessa Hospital
[Histogram: frequency (vertical axis, 0 to 2000) by birth weight (horizontal axis). Std. Dev = 502.34, Mean = 3126, N = 9975.]
Frequency polygons
• Instead of drawing bars for each class interval,
sometimes a single point is drawn at the mid-point of
each class interval and consecutive points are joined by
straight lines.
Fig. 6. Frequency polygon of birth weight of 9975 newborns at Tikur Anbessa Hospital for males
and females
[Frequency polygon: percent (vertical axis, 0 to 50) by birth weight (horizontal axis, 500 to 5000), with separate lines for males and females.]
Box and Whisker Plot
It is another way to display information when
the objective is to illustrate certain locations
(skewness) in the distribution
Box plot cont...
A box is drawn with the top of the box at the third
quartile (75%) and the bottom at the first quartile
(25%).
The location of the mid-point (50%) of the
distribution is indicated with a horizontal line in the
box.
Finally, straight lines, or whiskers, are drawn from
the centre of the top of the box to the largest
observation and from the centre of the bottom of the
box to the smallest observation.
Box cont....
The box plot is then completed:
Draw a vertical bar from the upper quartile to
the largest non-outlying value in the sample
Draw a vertical bar from the lower quartile to
the smallest non-outlying value in the sample
Any values that are outside the IQR but are not
outliers are marked by the whiskers on the plot
(IQR = P75 – P25)
Box plots are useful for comparing two or
more groups of observations
Drawing a box-and-whisker plot
Raw data
35, 29, 44, 72, 34, 64, 41, 50, 54, 104, 39, 58
Order the data
29 34 35 39 41 44 50 54 58 64 72 104
Median = (44 + 50)/2 = 47 = Q2
Q1 = (35 + 39)/2 = 37 (median of the lower half)
Q3 = (58 + 64)/2 = 61 (median of the upper half)
Min = 29, Max = 104
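The five-number summary above can be checked with a short script. This sketch takes the quartiles as the medians of the lower and upper halves of the ordered data, which reproduces Q1 = 37 and Q3 = 61; other quartile conventions can give slightly different values. The 1.5 × IQR fences at the end are a common convention for flagging outliers and are an assumption here, not something stated in these notes.

```python
from statistics import median

raw = [35, 29, 44, 72, 34, 64, 41, 50, 54, 104, 39, 58]
data = sorted(raw)
n = len(data)

# Quartiles as medians of the lower and upper halves of the ordered data.
# For odd n the middle value is left out of both halves (an assumption;
# this example has even n, so the choice does not matter here).
lower_half = data[: n // 2]
upper_half = data[(n + 1) // 2:]

q1, q2, q3 = median(lower_half), median(data), median(upper_half)
iqr = q3 - q1
five_number = (data[0], q1, q2, q3, data[-1])

# Assumed convention: values beyond 1.5 x IQR from the box are outliers.
low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [x for x in data if x < low_fence or x > high_fence]

print(five_number, iqr, outliers)
```

Under the assumed fence convention, 104 would be drawn as a separate outlier point and the upper whisker would stop at 72.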
Box plot Example
Min = 29, Q1 = 37, Q2 = 47, Q3 = 61, Max = 104
[Box plot drawn on a horizontal scale from 0 to 110.]
Scatter plot
Most studies in medicine involve measuring more than one
characteristic, and graphs displaying the relationship between
two characteristics are common in literature.
Scatter plot ….
For two quantitative variables we use bivariate plots (also called
scatter plots or scatter diagrams).
It is used to see whether a relationship exists between the two
measures.
A scatter diagram is constructed by drawing
X- and Y-axes.
Each point or dot represents a pair of
values measured for a single study subject.
Scatter plot
• Scatter plot helps us to understand the association between two
variables using:
1. The trend
2. The shape and
3. The strength
Measure of association
• Identifying very strong and very weak association is easy by observing
the graph, but how can we classify everything in between?
Scatter plot
• The linear correlation coefficient (R) measures the strength of the linear association
between 2 variables.
• R values always range from -1 to 1
• R close to 1 shows a strong positive linear association
• R close to -1 shows a strong negative linear association
• R close to 0 shows a weak or no linear association
• Note: the interpretation of values in between is somewhat subjective
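To make the meaning of R concrete, the sketch below computes the usual (Pearson) sample correlation coefficient from paired data. The function name and the data are hypothetical, invented only for illustration; the two response lists are exactly linear in x, so R comes out at +1 and -1.

```python
import math

def pearson_r(x, y):
    """Sample linear correlation coefficient between paired lists x and y."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data, invented for illustration only.
x = [1, 2, 3, 4, 5]
rising = [2, 4, 6, 8, 10]    # increases exactly linearly with x
falling = [10, 8, 6, 4, 2]   # decreases exactly linearly with x

r_pos = pearson_r(x, rising)
r_neg = pearson_r(x, falling)
print(r_pos, r_neg)
```

Real data rarely fall exactly on a line, so observed values of R usually lie strictly between -1 and 1.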
Scatter Plots and Types of Correlation
[Scatter plot: x = hours of training, y = number of accidents.]
[Scatter plot: x = Math SAT score, y = GPA.]
[Scatter plot: x = height, y = IQ. No linear correlation.]
Scatter Diagram…
1. Direction of relationship: positive or negative
2. Form of relationship: linear or curvilinear
3. Degree of relationship: strong or weak
Line graph
Useful for assessing the trend of a particular situation
over time, e.g. monitoring the trend of epidemics.
The time, in weeks, months or years, is marked along
the horizontal axis
Values of the quantity being studied are marked on the
vertical axis.
Values for each category are connected by a continuous
line.
Sometimes two or more lines are drawn on the same
graph using the same scale so that the plotted lines
are comparable.
No. of microscopically confirmed malaria cases by species
and month at Zeway malaria control unit, 2003
[Line graph: number of cases (vertical axis) by month, Jan to Dec (horizontal axis).]
Line graph cont..
Line graph can be also used to depict the
relationship between two continuous
variables like that of scatter diagram.
The following graph shows the level of
zidovudine (AZT) in the blood of
HIV/AIDS patients at several times after
administration of the drug, for patients with normal
fat absorption and patients with fat malabsorption.
Line graph cont…..
Response to administration of zidovudine in two groups of AIDS
patients in hospital X, 1999
[Line graph: blood zidovudine concentration (vertical axis, 0 to 8) against time since administration in minutes (horizontal axis, 10 to 360).]
Choosing graphs
Type of data/purpose   Appropriate graphs
Metric/Numerical       - Histogram (one continuous variable)
                       - Frequency polygon (one or more continuous variables)
                       - Cumulative frequency polygon (ogive curve)
                       - Box and whisker (one continuous and one categorical variable)
                       - Stem and leaf (one continuous variable)
                       - Scatter (two continuous variables)
Summary Measures
Describing Data Numerically
Mode Variance
MEASURES OF CENTRAL TENDENCY
The Arithmetic Mean or simple Mean
• The mean is the average of the numbers: add up
all the numbers, then divide by
how many numbers there are
• It is written in statistical terms as: $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$
• Example 1: What is the Mean of these numbers? 6, 11, 7
• Add the numbers: 6 + 11 + 7 = 24
• Divide by how many numbers (there are 3 numbers): 24 / 3 = 8
• The Mean is 8
Why Does This Work?
• It is because 6, 11 and 7 added together is the same as 3 lots of 8: 6 + 11 + 7 = 8 + 8 + 8 = 24
• It is like you are "flattening out" the numbers.
Example 2
Birth weights (gm) of all live-born infants born at a private
hospital in a city, during a 1-week period.
What is the arithmetic mean for the sample birth weights?
Weighted Mean
• When averaging quantities, it is often necessary to
account for the fact that not all of them are equally
important in the phenomenon being described.
• Example: In a given drug shop, four different drugs were sold for unit
prices of 60, 85, 95 and 50 birr, and the numbers of units sold
were 10, 10, 5 and 20 respectively. What is the average price of the
four drugs in this drug shop?
• Solution: for this example we have to use the weighted mean, using the
number of units sold as the respective weight for each drug's price:
Weighted mean = (60×10 + 85×10 + 95×5 + 50×20) / (10 + 10 + 5 + 20) = 2925 / 45 = 65 birr
• If we don't consider the weights, the average price will be (60 + 85 + 95 + 50) / 4 = 72.5 birr
Weighted Mean
• We can also calculate a weighted mean using some weighting
factor:
$$\bar{x} = \frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i}$$
e.g. What is the average income of all
people in cities A, B, and C?
City   Avg. income   Population
A      $23,000       100,000
B      $20,000       50,000
C      $25,000       150,000
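Both weighted-mean examples can be worked with one small function. This is a minimal sketch; the function and variable names are illustrative and not part of the original notes.

```python
def weighted_mean(values, weights):
    """Sum of weight * value divided by the sum of the weights."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)

# Drug shop example: unit prices (birr) weighted by units sold.
prices = [60, 85, 95, 50]
units_sold = [10, 10, 5, 20]
drug_average = weighted_mean(prices, units_sold)
simple_average = sum(prices) / len(prices)      # ignores the weights

# City example: average incomes weighted by population.
incomes = [23000, 20000, 25000]
populations = [100000, 50000, 150000]
city_average = weighted_mean(incomes, populations)

print(drug_average, simple_average, city_average)
```

The weighted and simple averages differ because the cheapest drug accounts for the largest share of the units sold.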
Geometric Mean
• The Geometric Mean is a special type of average where we multiply
the numbers together and then take a square root (for two numbers),
cube root (for three numbers) etc.
Example: What is the Geometric Mean of 2 and 18?
• First we multiply them: 2 × 18 = 36
• Then (as there are two numbers) take the square root: √36 = 6
Example: What is the Geometric Mean of 10, 51.2 and 8?
• First we multiply them: 10 × 51.2 × 8 = 4096
• Then (as there are three numbers) take the cube root: $\sqrt[3]{4096} = 16$
• For n numbers: multiply them all together and then take the nth root
(written $\sqrt[n]{\;}$)
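The two geometric mean examples can be reproduced as follows. This is a minimal sketch assuming all values are positive; the function name is illustrative, and `math.prod` requires Python 3.8 or later.

```python
import math

def geometric_mean(values):
    """n-th root of the product of n positive numbers."""
    return math.prod(values) ** (1 / len(values))

gm_two = geometric_mean([2, 18])          # square root of 36
gm_three = geometric_mean([10, 51.2, 8])  # cube root of 4096

print(gm_two, gm_three)
```

Because of floating-point rounding, the printed cube root may differ from 16 in the last decimal place.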
Estimating the Mean from Grouped Data
Someone timed 21 people in the race, to the
nearest second:
Seconds Frequency
51 - 55 2
56 - 60 7
61 - 65 8
66 - 70 4
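• Using the class mid-points $m_i$ (53, 58, 63, 68) and the frequencies $f_i$, the estimated mean is:
$$\bar{x} = \frac{\sum_{i=1}^{k} m_i f_i}{\sum_{i=1}^{k} f_i} = \frac{1288}{21} = 61.333\ldots$$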
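The grouped-mean estimate can be checked with a few lines of code. In this sketch each class is represented by its mid-point, which is the assumption behind the estimate; the variable names are illustrative.

```python
# Class limits (seconds) and frequencies from the race table.
limits = [(51, 55), (56, 60), (61, 65), (66, 70)]
freq = [2, 7, 8, 4]

midpoints = [(lo + hi) / 2 for lo, hi in limits]         # 53, 58, 63, 68
weighted_sum = sum(m * f for m, f in zip(midpoints, freq))
n = sum(freq)
estimated_mean = weighted_sum / n

print(weighted_sum, n, estimated_mean)
```

The result is only an estimate, because the individual times within each class are not known.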
Correct mean
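• If a wrong figure has been used when calculating the mean, the correct
mean can be obtained without repeating the whole process using:
$$\bar{x}_{\text{correct}} = \bar{x}_{\text{wrong}} + \frac{x_{\text{correct}} - x_{\text{wrong}}}{n}$$
where n is the number of observations.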
Disadvantages
• It may be greatly affected by extreme items and its
usefulness as a “Summary of the whole” may be
considerably reduced.
• When the distribution has open-ended classes, its
computation would be based on an assumption, and therefore may
not be valid.
Median
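• Suppose there are n observations in a sample. If
these observations are ordered from smallest to
largest, then the median is defined as follows:
• The sample median is the $\left(\frac{n+1}{2}\right)$th largest observation if n is odd, and the average of the $\left(\frac{n}{2}\right)$th and $\left(\frac{n}{2}+1\right)$th largest observations if n is even.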
Example 2
2.1. Compute the sample median for the birth weight data in example 1.
2.2. Consider the following data, which consist of white blood counts taken on
admission of all patients entering a small hospital on a given day. Compute the
median white blood count (×10³).
7, 35, 5, 9, 8, 3, 10, 12, 8
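For exercise 2.2, the median can be found by ordering the counts and taking the middle value. The sketch below does this by hand and with Python's `statistics.median`; the variable names are illustrative.

```python
from statistics import median

# White blood counts (x 10^3) from exercise 2.2.
wbc = [7, 35, 5, 9, 8, 3, 10, 12, 8]

ordered = sorted(wbc)
n = len(ordered)

# n is odd here, so the median is the ((n + 1) / 2)-th ordered value,
# which is index n // 2 when counting from zero.
middle_value = ordered[n // 2]
library_median = median(wbc)

print(ordered, middle_value, library_median)
```

The extreme count of 35 does not pull the median upward, which illustrates why the median is preferred when outliers are present.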
Estimating the Median from Grouped Data
• Let's look at our data again:
Cntd…
• We call it "61 - 65", but it really includes values from 60.5 up to (but
not including) 65.5.
• Why? The values are in whole seconds, so a real time of 60.5 is
measured as 61. Likewise 65.4 is measured as 65.
• At 60.5 we already have 9 runners, and by the next boundary at 65.5
we have 17 runners. By drawing a straight line in between we can pick
out where the median frequency of n/2 runners is:
Cntd..
• Estimated Median = $L + \frac{(n/2) - B}{G} \times w$, where:
• L is the lower class boundary of the group containing the median
• n is the total number of values
• B is the cumulative frequency of the groups before the median group
• G is the frequency of the median group
• w is the group width
• For our example: L = 60.5, n = 21, B = 2 + 7 = 9, G = 8, w = 5
• Estimated Median = $60.5 + \frac{(21/2) - 9}{8} \times 5 = 61.4375$
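The estimated-median formula can be applied mechanically once the median group has been located. The sketch below finds that group from the cumulative frequencies; it assumes times recorded in whole seconds, so each lower class boundary is the lower limit minus 0.5. The variable names are illustrative.

```python
# Class limits (seconds) and frequencies from the race table.
limits = [(51, 55), (56, 60), (61, 65), (66, 70)]
freq = [2, 7, 8, 4]
n = sum(freq)

cum_before = 0  # cumulative frequency of the groups before the current one
for (lo, hi), f in zip(limits, freq):
    if cum_before + f >= n / 2:      # this group contains the n/2-th value
        L = lo - 0.5                 # lower class boundary (whole-unit data assumed)
        w = hi - lo + 1              # group width
        B, G = cum_before, f
        break
    cum_before += f

estimated_median = L + ((n / 2) - B) / G * w
print(L, B, G, w, estimated_median)
```

This is linear interpolation within the median group, which treats the observations as evenly spread across that group.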
i) Characteristics of Median
• It is an average of position/location .
• It is affected more by the number of items than by extreme values.
ii) Advantages
• It is easily calculated and is not much disturbed by extreme
values
• It is more typical of the series
• The median may be located even when the data are incomplete,
e.g, when the class intervals are irregular and the final classes
have open ends.
iii) Disadvantages
• it is determined mainly by the middle points in a
sample and is less sensitive to the actual numerical
values of the remaining data points.
• It is not so generally familiar as the arithmetic mean
Mode
• It is the value of the observation that occurs with the greatest
frequency.
• A particular disadvantage is that, with a small number of
observations, there may be no mode.
• In addition, sometimes, there may be more than one mode such
as when dealing with a bimodal (two-peak) distribution.
• Find the modal values for the following data
a) 22, 66, 69, 70, 73. (No modal value)
b) 1.8, 3.0, 3.3, 2.8, 2.9, 3.6, 3.0, 1.9, 3.2, 3.5 (modal value = 3.0 kg)
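The two data sets above can be checked with a small helper that returns every value tied for the highest frequency. It treats a data set in which every value occurs only once as having no mode, matching the convention used in these notes; the function name is illustrative.

```python
from collections import Counter

def modes(values):
    """Return the most frequent value(s); an empty list means no mode."""
    counts = Counter(values)
    top = max(counts.values())
    if top == 1:            # every value occurs once, so there is no mode
        return []
    return [v for v, c in counts.items() if c == top]

a = [22, 66, 69, 70, 73]
b = [1.8, 3.0, 3.3, 2.8, 2.9, 3.6, 3.0, 1.9, 3.2, 3.5]

modes_a = modes(a)
modes_b = modes(b)
print(modes_a, modes_b)
```

A bimodal data set would return two values from this function.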
Estimating the Mode from Grouped Data
• We can easily find the modal group (the group with the highest
frequency), which is 61 - 65
• We can say "the modal group is 61 - 65"
Estimated Mode = $L + \frac{f_m - f_{m-1}}{(f_m - f_{m-1}) + (f_m - f_{m+1})} \times w$
Cntd…
• where:
• L is the lower class boundary of the modal group
• $f_{m-1}$ is the frequency of the group before the modal group
• $f_m$ is the frequency of the modal group
• $f_{m+1}$ is the frequency of the group after the modal group
• w is the group width
• In this example: L = 60.5, $f_{m-1}$ = 7, $f_m$ = 8, $f_{m+1}$ = 4, w = 5
• Estimated Mode = $60.5 + \frac{8 - 7}{(8 - 7) + (8 - 4)} \times 5 = 60.5 + (1/5) \times 5 = 61.5$
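The estimated-mode calculation can be sketched in code as follows. As with the grouped median, this assumes whole-second data, so the lower class boundary is the lower limit minus 0.5. Treating a missing neighbouring group as having frequency 0 is an additional assumption that is not needed for this example. The variable names are illustrative.

```python
# Class limits (seconds) and frequencies from the race table.
limits = [(51, 55), (56, 60), (61, 65), (66, 70)]
freq = [2, 7, 8, 4]

i = max(range(len(freq)), key=lambda j: freq[j])    # index of the modal group
fm = freq[i]
f_prev = freq[i - 1] if i > 0 else 0                # group before (assumed 0 if none)
f_next = freq[i + 1] if i < len(freq) - 1 else 0    # group after (assumed 0 if none)

L = limits[i][0] - 0.5                  # lower class boundary of the modal group
w = limits[i][1] - limits[i][0] + 1     # group width

estimated_mode = L + (fm - f_prev) / ((fm - f_prev) + (fm - f_next)) * w
print(limits[i], estimated_mode)
```

The estimate sits closer to the neighbouring group with the higher frequency, here the 56 - 60 group.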
Mode
Characteristics
• It is an average of position
• It is not affected by extreme values
• It is the most typical value of the distribution
Advantages
• Since it is the most typical value it is the most descriptive
average
• Since the mode is usually an “actual value”, it indicates the
precise value of an important part of the series.
Disadvantages:-
• Unless the number of items is fairly large and the distribution
reveals a distinct central tendency, the mode has no
significance
• It is not capable of mathematical treatment
• In a small number of items the mode may not exist.
Skewness:
• If extremely low or extremely high observations are present in a
distribution, then the mean tends to shift towards those scores. Based
on the type of skewness, distributions can be:
• Negatively skewed distribution: occurs when majority of scores are at
the right end of the curve and a few small scores are scattered at the
left end.
• Positively skewed distribution: Occurs when the majority of scores are
at the left end of the curve and a few extreme large scores are scattered
at the right end.
• Symmetrical distribution: It is neither positively nor negatively skewed.
A curve is symmetrical if one half of the curve is the mirror image of the
other half.
Skewness…
• Data can be "skewed", meaning it tends to have a long tail on one
side or the other:
• Negative Skew?
• Why is it called negative skew? Because the long "tail" is on the
negative side of the peak.
• The mean is also on the left of the peak.
Skewness…
The Normal Distribution has No Skew
A Normal Distribution is not skewed.
It is perfectly symmetrical.
And the Mean is exactly at the peak.
Skewness…
Positive Skew
And positive skew is when the long tail is on the
positive side of the peak, and some people say it
is "skewed to the right".
The mean is on the right of the peak value.
Skewness…
Measures of Dispersion
• Which of the distributions of scores has the larger dispersion?
[Figure: two distributions of scores plotted over the values 1 to 10.]
Measures of Dispersion
• The two data sets given above have a mean of 50, but obviously set 1 is more
“spread out” than set 2. How do we express this numerically?
• Some of the commonly used measures of dispersion (variation) are: Range,
inter quartile range, quartiles, percentiles, variance, standard deviation and
coefficient of variation.
04/13/2024
Range and Interquartile Range
• Range
• Simplest and the crudest measure of variation
• Difference between the largest and the smallest observations: Range =
Xlargest – Xsmallest
• Ignores the way in which data are distributed
• It wastes information for it takes no account of the entire data.
• Sensitive to outliers
• Interquartile Range
• Eliminate some high- and low-valued observations and calculate the range
from the remaining values
• Interquartile range = 3rd quartile – 1st quartile = Q3 – Q1
Quartiles and Percentiles
• Percentiles: If data is ordered and divided into 100 parts, then cut
points are called Percentiles
Quartiles
• The 25th percentile is often referred to as the first quartile and
denoted Q1.
• The 50th percentile (the median) is referred to as the second or middle
quartile and written Q2, and
• the 75th percentile is referred to as the third quartile, Q3.
When we wish to find the quartiles for a set of data, the following
formulas are used
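As a rough illustration of the range, the quartiles and the interquartile range, the sketch below uses Python's standard library. The data values are hypothetical, and the "inclusive" interpolation method is an assumption: other quartile conventions can give slightly different cut points.

```python
import statistics

# Hypothetical, already sorted data (assumption for illustration only)
data = [3, 3, 4, 4, 5, 5, 6, 6, 7, 7]

# Range = largest observation - smallest observation
data_range = max(data) - min(data)

# Q1, Q2 (median) and Q3; method="inclusive" is one of several conventions
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")

# Interquartile range = Q3 - Q1
iqr = q3 - q1

print(f"Range = {data_range}, Q1 = {q1}, Q2 = {q2}, Q3 = {q3}, IQR = {iqr}")
```

The range uses only the two extreme values, while the interquartile range ignores the lowest and highest quarters of the data, which is why it is less sensitive to outliers.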
Using the Five-Number Summary to Explore
the Shape
• Box-and-Whisker Plot: A Graphical display of data using 5-number
summary:
• The Box and central line are centered between the endpoints if data
are symmetric around the median
[Figure: three box-and-whisker plots, each marked with Q1, Q2 and Q3]
Standard Deviation and Variance
• show the scatter of the individual measurements around the mean of
all the measurements in a given distribution.
• The variance represents squared units and, therefore, is not an
appropriate measure of dispersion when we wish to express this
concept in terms of the original units.
• To obtain a measure of dispersion in original units, we merely take the
square root of the variance. The result is called the standard
deviation.
• Variance is the average of the squared differences from the mean
• Standard deviation is the square root of the variance
Variance and Standard Deviation
Population: $\sigma^2 = \dfrac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$
Sample: $s^2 = \dfrac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$
$SD = \sqrt{\text{variance}}$
To calculate standard deviation
1. Calculate the mean $\bar{x}$
2. Calculate the residual for each x: $x - \bar{x}$
Example - Find the Standard Deviation of Ungrouped Data
Family No. 1 2 3 4 5 6 7 8 9 10
Size (xi) 3 3 4 4 5 5 6 6 7 7
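Here, $\bar{x} = \dfrac{\sum x_i}{n} = \dfrac{50}{10} = 5$
$x_i$: 3 3 4 4 5 5 6 6 7 7 (sum = 50)
$x_i - \bar{x}$: -2 -2 -1 -1 0 0 1 1 2 2 (sum = 0)
$(x_i - \bar{x})^2$: 4 4 1 1 0 0 1 1 4 4 (sum = 20)
$s^2 = \dfrac{\sum (x_i - \bar{x})^2}{n - 1} = \dfrac{20}{9} \approx 2.22$, $s = \sqrt{2.22} \approx 1.49$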
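The family-size calculation can be reproduced step by step with a short sketch. The variable names are illustrative; the divisor n − 1 follows the sample formula used in the example.

```python
import statistics

sizes = [3, 3, 4, 4, 5, 5, 6, 6, 7, 7]   # family sizes from the example

mean = sum(sizes) / len(sizes)                 # step 1: the mean
residuals = [x - mean for x in sizes]          # step 2: x - mean for each x
sum_sq = sum(r ** 2 for r in residuals)        # sum of squared residuals
s2 = sum_sq / (len(sizes) - 1)                 # sample variance (divide by n - 1)
s = s2 ** 0.5                                  # sample standard deviation

print(f"mean = {mean}, sum of squares = {sum_sq}, s^2 = {s2:.2f}, s = {s:.2f}")

# Cross-check against the standard library's sample standard deviation
assert abs(s - statistics.stdev(sizes)) < 1e-12
```

Carrying the unrounded variance 20/9 into the square root gives s ≈ 1.49; rounding the variance to 2.2 first gives 1.48.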
Example
• The lengths of five newborn babies are: 600 mm, 470 mm, 170 mm, 430 mm
and 300 mm.
• Find the mean, the variance, and the standard deviation.
• Your first step is to find the mean:
• Answer:
• Mean = (600 + 470 + 170 + 430 + 300) / 5 = 1970 / 5 = 394
• so the mean (average) length is 394 mm.
• Let's plot this on the chart:
To calculate the Variance, take each difference,
square it, and then average the result:
Standard Deviation
σ = √21,704 = 147.32... ≈ 147 (to the nearest mm)
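The variance step for the newborn lengths can be checked with a short sketch. It treats the five lengths as the whole population (dividing by N = 5), which is the assumption that reproduces the value 21,704 under the square root; the variable names are illustrative.

```python
import statistics

lengths_mm = [600, 470, 170, 430, 300]

mean = sum(lengths_mm) / len(lengths_mm)              # 394.0
squared_diffs = [(x - mean) ** 2 for x in lengths_mm]  # each difference, squared
variance = sum(squared_diffs) / len(lengths_mm)        # average of the squares
sd = variance ** 0.5

print(f"mean = {mean}, variance = {variance}, SD = {sd:.2f}")

# Cross-check with the standard library's population variance
assert abs(variance - statistics.pvariance(lengths_mm)) < 1e-9
```

Dividing by n − 1 instead (the sample formula) would give a larger variance, so it matters which of the two formulas is intended.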
• And the good thing about the Standard Deviation is that it is useful.
Now we can show which lengths are within one Standard Deviation
(147mm) of the Mean:
• So, using the Standard Deviation we have a "standard" way of
knowing what is normal, and what is extra long or extra short.
Why square the differences?
• If we just add up the differences from the mean ... the negatives
cancel the positives:
• (4 + 4 − 4 − 4) / 4 = 0
Probability
• Probability is the language of chance.
• The deliberate use of chance is the central idea of statistical designs
for producing data.
• Probability is so important for data – leaders of the distribution as
maps for a journey
• Probabilities are used in everyday communication
• Probability theory was developed out of attempting to solve
problems related to games of chance such as tossing a coin, rolling a
die etc.
i.e. trying to quantify personal beliefs regarding degrees of
uncertainty.
Questions on Simple Probabilities
1. What is the probability that a card drawn at random
from a deck of cards will be an ace?
2. A book contains 32 pages numbered 1, 2, ..., 32. If a
student randomly opens the book, what is the
probability that the page number contains the digit 1?
3. A mother is in the delivery room, and the health worker
has informed her that she will deliver at 9:30 pm. She is
eager to give birth to a daughter. What is the probability
that she will get what she wants?
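For questions 1 and 2, a minimal sketch of the classical approach (favourable outcomes divided by equally likely outcomes). It assumes a standard 52-card deck with 4 aces and that each of the 32 page numbers is equally likely to be seen; neither assumption is stated in the questions.

```python
from fractions import Fraction

# Question 1: assumes a standard deck of 52 cards containing 4 aces
p_ace = Fraction(4, 52)

# Question 2: count the page numbers from 1 to 32 that contain the digit 1
pages_with_1 = [page for page in range(1, 33) if "1" in str(page)]
p_digit_1 = Fraction(len(pages_with_1), 32)

print(p_ace)       # 1/13
print(p_digit_1)   # 13/32
```

Question 3 has the same structure: if a son and a daughter are taken to be equally likely, there is one favourable outcome out of two.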
Basic terms
• Experiment: any activity from which a result can
be obtained.
• Example: 1. flipping a coin
2. rolling a die
3. drawing 30 individuals from the population
• Sample space: the set of all possible outcomes of the
experiment
Example: 1. coin toss {H, T}
2. rolling a die {1, 2, 3, 4, 5, 6}
• Event: a collection of outcomes
• The Sample Space is all
possible outcomes.
• A Sample Point is just
one possible outcome.
• And an Event can be
one or more of the
possible outcomes.
Properties of probability
1. The probability of any event lies between 0 and 1
2. In general, the probability that at least one of two events
occurs is given by
P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
3. If two events are mutually exclusive then
P(A ∪ B) = P(A) + P(B)
4. If two events are independent then
P(A ∩ B) = P(A) · P(B)
Unions and Intersections
Unions of Two Events
• “If A and B are events, then the union of A and B, denoted by
A ∪ B, represents the event composed of all basic outcomes in A
or B.”
Intersections of Two Events
• “If A and B are events, then the intersection of A and B,
denoted by A ∩ B, represents the event composed of all basic
outcomes in A and B.”
[Figure: Venn diagram of events A and B]
Addition rules
• Rule 1: If 2 events, B & C, are mutually exclusive (i.e., no overlap) then
the probability that one or both occur is P(B or C) = P(B ∪ C) = P(B) +
P(C)
• Rule 2: If two events are mutually exclusive and together cover the whole
sample space (complementary events), then their probabilities sum to one.
The converse does not hold: two events whose probabilities sum to one need
not be mutually exclusive.
• Rule 3: For any 2 events, A & B, not mutually exclusive, the probability
that one or both occur is P(A or B) = P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
• Example 1: One die is rolled. Sample space = S = {1, 2, 3, 4, 5, 6}
Let A = the event an odd number turns up, A = {1, 3, 5}
Let B = the event a 1, 2 or 3 turns up, B = {1, 2, 3}
Let C = the event a 2 turns up, C = {2}
I) Find Pr (A), Pr (B) and Pr (C)
• Pr (A) = Pr (1) + Pr (3) + Pr (5) = 1/6 + 1/6 + 1/6 = 3/6 = 1/2
• Pr (B) = Pr (1) + Pr (2) + Pr (3) = 1/6 + 1/6 + 1/6 = 3/6 = 1/2
• Pr (C) = Pr (2) = 1/6
II) Are A and B; A and C; B and C mutually exclusive?
• A and B are not mutually exclusive, because they have the
elements 1 and 3 in common
• Similarly, B and C are not mutually exclusive: they have the
element 2 in common
• A and C are mutually exclusive: they have no element in
common
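The die example maps directly onto set operations. The sketch below assumes a fair die (all six outcomes equally likely); the helper names `pr` and `mutually_exclusive` are illustrative.

```python
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}   # sample space
A = {1, 3, 5}            # an odd number turns up
B = {1, 2, 3}            # a 1, 2 or 3 turns up
C = {2}                  # a 2 turns up


def pr(event):
    """Probability of an event when all outcomes in S are equally likely."""
    return Fraction(len(event & S), len(S))


def mutually_exclusive(e1, e2):
    """Two events are mutually exclusive if they share no outcome."""
    return len(e1 & e2) == 0


print(pr(A), pr(B), pr(C))                   # 1/2 1/2 1/6
print(mutually_exclusive(A, B))              # False (1 and 3 in common)
print(mutually_exclusive(A, C))              # True

# General addition rule: P(A or B) = P(A) + P(B) - P(A and B)
p_a_or_b = pr(A) + pr(B) - pr(A & B)
print(p_a_or_b)                              # 2/3
```

Note that P(A) + P(B) = 1 here even though A and B overlap, so probabilities summing to one does not by itself show that two events are mutually exclusive.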
The Addition . . .
If two events A and B are not mutually exclusive, then, P (A
U B) = P (A) + P (B) – P (A and B)
Example
1. There are 80 nurses and 40 physicians in a hospital. Of
these, 70 nurses and 15 physicians are females. If a staff
person is selected at random, find the probability that the
subject is a nurse or male.
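One way to work this example is to split the staff by profession and sex and then apply the addition rule. The male counts (10 nurses, 25 physicians) are derived from the figures given; the variable names are illustrative.

```python
from fractions import Fraction

nurses, physicians = 80, 40
female_nurses, female_physicians = 70, 15

total = nurses + physicians                          # 120 staff
male_nurses = nurses - female_nurses                 # 10
male_physicians = physicians - female_physicians     # 25
males = male_nurses + male_physicians                # 35

p_nurse = Fraction(nurses, total)
p_male = Fraction(males, total)
p_nurse_and_male = Fraction(male_nurses, total)

# P(nurse or male) = P(nurse) + P(male) - P(nurse and male)
p_nurse_or_male = p_nurse + p_male - p_nurse_and_male
print(p_nurse_or_male)   # 7/8
```

Subtracting P(nurse and male) prevents the 10 male nurses from being counted twice.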
Conditional probabilities and the multiplicative law
• Let’s assume two questions on a test: the
first question is true/false and the second is
a multiple-choice question with five possible
answers (a, b, c, d, e)
• True or False: The heart is an organ which pumps blood in our
body.
• MCQ: Which of the following human organs is used for
breathing?
a. Brain b. Liver c. Lung d. Kidney e. Heart
• If the answers are random guesses the 10
possible outcomes are equally likely so
• A tree diagram is a picture of the possible outcomes
of a procedure
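The branches of such a tree can be listed programmatically. This sketch enumerates the 10 outcomes of the two-question test and, assuming pure random guessing, finds the probability of getting both answers right (True for the first question, c for the second).

```python
from fractions import Fraction
from itertools import product

tf_answers = ["True", "False"]
mcq_answers = ["a", "b", "c", "d", "e"]

# Each (true/false answer, multiple-choice answer) pair is one branch of the tree
outcomes = list(product(tf_answers, mcq_answers))
print(len(outcomes))   # 10

correct = ("True", "c")
p_both_correct = Fraction(outcomes.count(correct), len(outcomes))
print(p_both_correct)  # 1/10
```

The result equals 1/2 × 1/5, the product of the probabilities of guessing each question correctly.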
Multiplicative Rule
• Two events are independent of each other when the
occurrence of one event in no way affects the
probability of the other event occurring.
• P(A|B) = P(A), and P(B|A) = P(B); and so, P(A and B) =
P(A) P(B)
• Eg. 1) A classic example is n tosses of a coin and the
chances that on each toss it lands heads. These are
independent events. The chance of heads on any one
toss is independent of the number of previous heads. No
matter how many heads have already been observed,
the chance of heads on the next toss is ½.
• Sometimes the chance a particular event happens depends on
the outcome of some other event. This applies obviously with
many events that are spread out in time
• Eg. The chance a patient with some disease survives the next
year depends on his having survived to the present time. Such
probabilities are called conditional.
1-19 times 32 7 39
20-99 times 18 20 38
more than 100 times 25 9 34
--------------------------------------------------------------------------------------------
Total 75 36 111
---------------------------------------------------------------------------------------------
Questions
1. What is the probability that a randomly picked
person is male?
2. What is the probability that a randomly picked
person has used cocaine more than 100 times?
3. Given that the selected person is male, what is the
probability that he has used cocaine more than
100 times?
4. Given that the person has used cocaine less than
100 times, what is the probability of being female?
5. What is the probability that a randomly picked
person is male and has used cocaine more than 100
times?
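A sketch of how these questions can be answered from the table. The column headings are not shown with the table, so the code assumes the first count column is males and the second is females; swap the labels if the original table is ordered differently.

```python
from fractions import Fraction

# Counts from the table; the male/female column order is an assumption
counts = {
    "1-19":  {"male": 32, "female": 7},
    "20-99": {"male": 18, "female": 20},
    "100+":  {"male": 25, "female": 9},
}

total = sum(sum(row.values()) for row in counts.values())        # 111
males = sum(row["male"] for row in counts.values())              # 75
heavy = sum(counts["100+"].values())                             # 34
under_100 = total - heavy                                        # 77
females_under_100 = counts["1-19"]["female"] + counts["20-99"]["female"]  # 27

p_male = Fraction(males, total)                                     # question 1
p_heavy = Fraction(heavy, total)                                    # question 2
p_heavy_given_male = Fraction(counts["100+"]["male"], males)        # question 3
p_female_given_under_100 = Fraction(females_under_100, under_100)   # question 4
p_male_and_heavy = Fraction(counts["100+"]["male"], total)          # question 5

# Multiplicative law: P(A and B) = P(A) * P(B | A)
assert p_male_and_heavy == p_male * p_heavy_given_male

for name, value in [("Q1", p_male), ("Q2", p_heavy), ("Q3", p_heavy_given_male),
                    ("Q4", p_female_given_under_100), ("Q5", p_male_and_heavy)]:
    print(name, value, round(float(value), 3))
```

For the conditional questions (3 and 4) the denominator is the size of the conditioning group, not the full 111.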
Summary for the Multiplicative Rule
Probability as a Numerical Measure of the Likelihood of
Occurrence
[Figure: probability scale running from 0 through .5 to 1]
Permutations
The number of possible permutations is the number of
different orders in which particular events occur. The
number of possible permutations is
$N_p = \dfrac{n!}{(n - r)!}$
where r is the number of events in the series, n is the
number of possible events, and n! denotes the factorial
of n, i.e. the product of all the positive integers from 1 to n.
Combinations
When the order in which the events occurred is of no
interest, we are dealing with combinations. The number
of possible combinations is
$N_c = \dbinom{n}{r} = \dfrac{n!}{r!\,(n - r)!}$
where r is the number of events in the series, n is the
number of possible events, and n! denotes the factorial
of n, i.e. the product of all the positive integers from 1 to n.
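Both counting formulas can be checked numerically. The values n = 5 and r = 3 are hypothetical, chosen only for illustration; `math.perm` and `math.comb` require Python 3.8 or later.

```python
import math

n, r = 5, 3   # hypothetical: choose 3 events out of 5 possible events

# Permutations: order matters, n! / (n - r)!
n_perm = math.factorial(n) // math.factorial(n - r)

# Combinations: order does not matter, n! / (r! (n - r)!)
n_comb = math.factorial(n) // (math.factorial(r) * math.factorial(n - r))

print(n_perm, n_comb)   # 60 10

# The standard library provides both counts directly
assert n_perm == math.perm(n, r)
assert n_comb == math.comb(n, r)
```

Each combination of 3 events can be ordered in 3! = 6 ways, which is why there are six times as many permutations as combinations here.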
Bayes' Theorem
• The term probability distribution or just distribution refers to the way data are
distributed, in order to draw conclusions about a set of data.
• With numeric variables, the aim is to determine whether or not normality may
be assumed.
I. Probability distribution of categorical variables
• The probability distribution of a categorical variable tells us with what
probability the variable will take on the different possible values.
• That is it specifies all possible outcomes of the categorical variable along with
the probability that each will occur.
E.g. Consider the value on the face showing up from tossing a die. The probability
distribution of this variable is
Value on Face 1 2 3 4 5 6
Probability 1/6 1/6 1/6 1/6 1/6 1/6
• Notice that the total probability is 1.
Bernoulli Distribution
• A random experiment with only two possible outcomes, having probabilities p
and q where p + q = 1, is called a Bernoulli trial
• The outcome of the experiment is either success (i.e., 1) or
failure (i.e., 0).
• Pr(X=1) = p, Pr(X=0) = 1-p, or
Binomial distribution
• In general the binomial distribution involves three assumptions
• There is a fixed number n of trials, each of which results in one of two mutually exclusive
outcomes.
• The outcomes of the n trials are independent.
• The probability of “success” is constant for each trial
• Pr (X=success) = Pr (X=1) = p
• Pr (X=failure) = Pr (X=0) = 1-p
$P(k) = \dbinom{n}{k} p^k (1 - p)^{n - k}$
The binomial distribution
A process that has only two possible outcomes is called
a binomial process. In statistics, the two outcomes are
frequently denoted as success and failure. A binomial
variable is the sum of independent and identically
distributed Bernoulli trials. The binomial distribution
gives the probability of exactly k successes in n trials
$P(k) = \dbinom{n}{k} p^k (1 - p)^{n - k}$
Binomial distribution….
• In addition to the probabilities of individual outcomes, we can also compute the
numerical summary measures associated with a probability distribution.
• The mean (the average number of successes in repeated samples of size n) and
the variance of a binomial distribution are
$\mu = np$
$V = \sigma^2 = npq$
• Example 1: In a sample of 1000 people from the US population there are 290 smokers,
so p = 0.29. The mean and standard deviation of the number of smokers in samples
of this size are:
• Mean = n × p = 1000 × 0.29 = 290
• S.d = √(1000 × 0.29 × 0.71) = 14.3
Binomial distribution….
Example 2: Suppose that in a certain population 52% of all recorded births are
males. If we randomly select 10 birth records, what is the probability that
exactly
• 5 will be males? Given n = 10, x = 5, p = 0.52
• $\Pr(X = x) = \dfrac{n!}{x!\,(n - x)!}\, p^x (1 - p)^{n - x}$
So $\Pr(X = 5) = \dfrac{10!}{5!\,(10 - 5)!} \times 0.52^5 \times (1 - 0.52)^{10 - 5} = 0.24$
• 3 or more will be females? (Here X counts females, so p = 0.48.)
• Pr(X≥3) = 1 − Pr(X<3) = 1 − [Pr(X=0) + Pr(X=1) + Pr(X=2)]
= 1 − [0.001 + 0.013 + 0.055] = 1 − 0.069 = 0.931
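The two parts of Example 2 can be checked with a small binomial function. This is a minimal sketch; the function name `binomial_pmf` is illustrative, and `math.comb` requires Python 3.8 or later.

```python
from math import comb


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials,
    each with success probability p."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


# Exactly 5 males among 10 births, with P(male) = 0.52
p_5_males = binomial_pmf(5, 10, 0.52)

# 3 or more females among 10 births, with P(female) = 0.48,
# computed through the complement P(X < 3)
p_3_or_more_females = 1 - sum(binomial_pmf(k, 10, 0.48) for k in range(3))

print(f"{p_5_males:.3f}")             # 0.244
print(f"{p_3_or_more_females:.3f}")   # 0.930
```

Without intermediate rounding the second answer is about 0.930; rounding each term to three decimals first gives 0.931.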
Random variable and Probability
distributions
• A continuous random variable has infinitely many values, and those values
can be associated with measurements on a continuous scale in such a way
that there are no gaps or interruptions.
Eg. Voltage of electricity
Every probability distribution must satisfy each of the following two requirements
Random Variable
• A Random Variable is a set of possible values from a
random experiment
• Example: Tossing a coin: we could get Heads or Tails.
• Let's give them the values Heads=0 and Tails=1 and
we have a Random Variable "X":
Random variable X, its possible values and the corresponding random events:
X = 0 for H
X = 1 for T
• So:
• We have an experiment (like tossing a coin)
• We give values to each event
• The set of values is a Random Variable
• Eg. Toss a coin 3 times. Let x be the number of heads obtained. Find the
probability distribution of x . f (x) = Pr (X = xi) , i = 0, 1, 2, 3.
• Pr (x = 0) = 1/8 …………………………….. TTT
• Pr (x = 1) = 3/8 ……………………………. HTT THT TTH
• Pr (x = 2) = 3/8 ……………………………..HHT THH HTH
• Pr (x = 3) = 1/8 ……………………………. HHH
• Probability distribution of X:
X = xi: 0 1 2 3
Pr (X = xi): 1/8 3/8 3/8 1/8
• The required conditions are also satisfied: i) f(x) ≥ 0 ii) Σ f(xi) = 1
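The distribution of the number of heads can also be obtained by listing all eight equally likely sequences. This sketch assumes a fair coin; the variable names are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All sequences of three tosses: HHH, HHT, ..., TTT
tosses = list(product("HT", repeat=3))

# Count how many sequences give 0, 1, 2 or 3 heads
head_counts = Counter(seq.count("H") for seq in tosses)

# Probability distribution of X = number of heads
dist = {x: Fraction(c, len(tosses)) for x, c in sorted(head_counts.items())}

for x, prob in dist.items():
    print(x, prob)

# Both conditions for a probability distribution hold
assert all(prob >= 0 for prob in dist.values())
assert sum(dist.values()) == 1
```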
The birth of a son or a daughter
are mutually exclusive events
because the two events will not
happen at the same time.
Example : Sex Ratio in a Family of 3
• Assume that the probability of a boy = 1/2 and the probability of a girl = 1/2.
[Figure: tree diagram for child #1, child #2 and child #3]
The expected value of a discrete random variable
The expected value, denoted by E(X) or μ, represents the “average” value of the random variable. It is
obtained by multiplying each possible value by its respective probability and summing over all the values
that have positive probability.
$E(X) = \mu = \sum_{i=1}^{k} x_i \, P(X = x_i)$
Where the xi’s are the values the random variable assumes with positive probability
Example: Consider the random variable representing the number of episodes of diarrhea in the first 2
years of life. Suppose this random variable has a probability mass function as below
r: 0 1 2 3 4 5 6
P(X = r): .129 .264 .271 .185 .095 .039 .017
What is the expected number of episodes of diarrhoea in the first 2 years of life?
E(X) = 0(.129) +1(.264) +2(.271) +3(.185) +4(.095) +5(.039) +6(.017) = 2.038
Thus, on the average a child would be expected to have 2 episodes of diarrhoea in the first 2 years of life
The variance of a discrete random variable
The variance represents the spread of all values that have positive probability relative to the expected
value. In particular, the variance is obtained by multiplying the squared distance of each possible value
from the expected value by its respective probability and summing overall the values that have positive
probability.
$V(X) = \sigma^2 = \sum_{i=1}^{k} (x_i - \mu)^2 \, P(X = x_i) = \sum_{i=1}^{k} x_i^2 \, P(X = x_i) - \mu^2$
Where the xi’s are the values for which the random variable takes on positive probability. The SD of a
random variable X, denoted by SD(X) or σ, is defined as the square root of its variance.
Example: Compute the variance and SD for the random variable representing number of episodes
of diarrhea in the first 2 years of life.
E(X) = μ = 2.04
$\sum_{i=1}^{n} x_i^2 \, P(X = x_i)$ = 0²(.129) + 1²(.264) + 2²(.271) + 3²(.185) + 4²(.095) + 5²(.039) + 6²(.017) = 6.12
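To carry the diarrhoea example through to the variance and SD, a short sketch using the shortcut form of the variance (sum of x² P(X = x) minus μ²). It uses the unrounded mean 2.038, so the results can differ slightly from a hand calculation with μ = 2.04.

```python
from math import sqrt

values = [0, 1, 2, 3, 4, 5, 6]
probs = [0.129, 0.264, 0.271, 0.185, 0.095, 0.039, 0.017]

expected = sum(x * p for x, p in zip(values, probs))            # E(X)
second_moment = sum(x ** 2 * p for x, p in zip(values, probs))  # sum of x^2 P(X = x)
variance = second_moment - expected ** 2                        # V(X)
sd = sqrt(variance)                                             # SD(X)

print(f"E(X) = {expected:.3f}, V(X) = {variance:.3f}, SD(X) = {sd:.3f}")
```

Under these figures the variance is roughly 1.97 and the SD roughly 1.40 episodes.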
Binomial distribution, generally
$P(X) = \dbinom{n}{X} p^X (1 - p)^{n - X}$
where X = # successes out of n trials, p = probability of success, and
1 − p = probability of failure
Exercise
1. Each child born to a particular set of
parents has a probability of 0.25 of having
blood type O. If these parents have 5
children.
What is the probability that
a. Exactly two of them have blood type O
b. At most 2 have blood type O
c. At least 4 have blood type O
d.2 do not have blood type O.
Exercise….
2. Suppose past experience in a certain malarious area
indicates that the probability that a person with a high
fever will be positive for malaria is 0.7. Consider 3
randomly selected patients (with high fever) in that same
area.
a) What is the probability that no patient will be positive
for malaria?
b) What is the probability that exactly one patient will be
positive for malaria?
c) What is the probability that exactly two of the patients
will be positive for malaria?
d) What is the probability that all patients will be positive
for malaria?
The Poisson distribution
When the probability of “success” is very small, e.g., the
probability of a mutation, then $p^k$ and $(1 - p)^{n - k}$ become too
small to calculate exactly by the binomial distribution. In
such cases, the Poisson distribution becomes useful. Let λ
be the expected number of successes in a process
consisting of n trials, i.e., λ = np. The probability of
observing k successes is
$P(k) = \dfrac{\lambda^k e^{-\lambda}}{k!}$
The mean and variance of a Poisson distributed variable are
given by μ = λ and V = λ, respectively.
Plots of Poisson Distribution
The Poisson distribution…
• Example 3. Suppose x is a random variable representing
the number of individuals involved in a road accident
each year (In US 2.4 are involved per 10,000 population
each year)
• I.e. λ = 2.4 per 10000
• $\Pr(X = 0) = \dfrac{e^{-2.4}\, 2.4^{0}}{0!} = 0.091$
• $\Pr(X = 1) = \dfrac{e^{-2.4}\, 2.4^{1}}{1!} = 0.218$
• $\Pr(X = 2) = \dfrac{e^{-2.4}\, 2.4^{2}}{2!} = 0.261$
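The Poisson probabilities in this example can be recomputed with a few lines. A minimal sketch; the function name `poisson_pmf` is illustrative.

```python
from math import exp, factorial


def poisson_pmf(k, lam):
    """Probability of observing exactly k events when the expected number is lam."""
    return lam ** k * exp(-lam) / factorial(k)


lam = 2.4   # expected number involved per 10,000 population per year
for k in range(3):
    print(f"Pr(X = {k}) = {poisson_pmf(k, lam):.3f}")
```

Each probability is the previous one multiplied by λ/k, which is a convenient check when working by hand.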
II. Probability distribution of Numeric variables
Characteristics of a distribution
• Features commonly used to describe a distribution are
location, dispersion, modality and skewness.
• Location tells us something about the average
value of the variable.
• Dispersion tells us something about how
spread out the values of the variable are.
• Modality refers to the number of peaks in the
distribution.
• Skewness refers to whether or not the
distribution is symmetric
• A distribution is said to be symmetric if it is
symmetrically distributed about its mode.
2.Probability distribution of continuous variables
• Under different circumstances, the outcome of a random
variable may not be limited to categories or counts.
• E.g. Suppose, X represents the continuous
variable ‘Height’; rarely is an individual
exactly equal to 170cm tall
• X can assume an infinite number of
intermediate values 170.1, 170.2, 170.3 etc.
• Because a continuous random variable X can take on an
uncountably infinite number of values, the probability
associated with any one particular value is equal
to zero.
Continuous Random Variables
• A smooth curve describes the probability distribution of a
continuous random variable.
Properties of Continuous Probability Distributions
• The area under the curve is equal to 1.
• P(a ≤ x ≤ b) = area under the curve between a and b.
$f(x) = \dfrac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x - \mu}{\sigma}\right)^2}$ for $-\infty < x < \infty$
e ≈ 2.7183, π ≈ 3.1416
μ and σ are the population mean and standard deviation.
How the Normal curve shifts when parameters change
[Figure: normal curves with the same location (μ) but different σ (S.D): σ = 1, σ = 2, σ = 3]
[Figure: normal curves with the same σ but different location (mean)]
Biostatistics course by Girma Taye (PhD), AAU
The Standard Normal Distribution
• To find P(a < x < b), we need to find the area under the
appropriate normal curve.
• To simplify the tabulation of these areas, we
standardize each value of x by expressing it as a z-
score, the number of standard deviations σ it lies from
the mean μ.
$z = \dfrac{x - \mu}{\sigma}$
The Standard Normal (z) Distribution
Using normal table
The four-digit probability in a particular row and column
of Table 1 gives the area under the z curve to the left
of that particular value of z.
[Figure: area for z = 1.36]
Example
Use Table 1 to calculate these probabilities:
P(z ≤ 1.36) = .9131
P(z > 1.36) = 1 − .9131 = .0869
P(−1.20 ≤ z ≤ 1.36) = .9131 − .1151 = .7980
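The table look-ups in this example can be cross-checked with the standard library's `statistics.NormalDist` (Python 3.8 or later), whose `cdf` method returns the area to the left of a value. A minimal sketch:

```python
from statistics import NormalDist

z = NormalDist()   # standard normal: mean 0, standard deviation 1

p_left = z.cdf(1.36)                     # P(z <= 1.36)
p_right = 1 - p_left                     # P(z > 1.36)
p_between = z.cdf(1.36) - z.cdf(-1.20)   # P(-1.20 <= z <= 1.36)

print(f"{p_left:.4f} {p_right:.4f} {p_between:.4f}")
```

An area between two z values is always the difference of two left-tail areas.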
Example
P(.80 ≤ x ≤ .85) = P(−2 ≤ z ≤ −1.5), where z = (x − 1)/.1
= .0668 − .0228 = .0440
Example
What is the weight of a package
such that only 1% of all packages
exceed this weight?
P(x > ?) = .01
$P\left(z > \dfrac{? - 1}{.1}\right) = .01$
From Table 1, $\dfrac{? - 1}{.1} = 2.33$
? = 2.33(.1) + 1 = 1.233
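Both package-weight examples can be sketched with `statistics.NormalDist`. The mean 1 and standard deviation 0.1 are read off the standardization (? − 1)/.1 used in the example and are otherwise an assumption; the variable names are illustrative.

```python
from statistics import NormalDist

# Package weights, assumed normal with mean 1 and standard deviation 0.1
weights = NormalDist(mu=1.0, sigma=0.1)

# P(0.80 <= x <= 0.85): difference of two left-tail areas
p_80_85 = weights.cdf(0.85) - weights.cdf(0.80)

# Weight exceeded by only 1% of packages = 99th percentile
cutoff = weights.inv_cdf(0.99)

print(f"P(0.80 <= x <= 0.85) = {p_80_85:.4f}")
print(f"99th percentile = {cutoff:.3f}")
```

The percentile comes out slightly below 1.233 because the table value 2.33 is a rounded z-score.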
Approximating the Binomial
Make sure to include the entire rectangle for the
values of x in the interval of interest. This is called
the continuity correction.
Standardize the values of x using
$z = \dfrac{x - np}{\sqrt{npq}}$
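To illustrate the continuity correction, the sketch below compares an exact binomial probability with its normal approximation. The values n = 30, p = 0.4 and the event X ≤ 8 are hypothetical, chosen only for illustration.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 30, 0.4   # hypothetical number of trials and success probability
q = 1 - p

# Exact binomial probability P(X <= 8)
exact = sum(comb(n, k) * p ** k * q ** (n - k) for k in range(9))

# Normal approximation: include the whole rectangle for x = 8,
# so standardize 8.5 rather than 8 (continuity correction)
z = (8.5 - n * p) / sqrt(n * p * q)
approx = NormalDist().cdf(z)

print(f"exact = {exact:.4f}, z = {z:.3f}, approximation = {approx:.4f}")
```

With np = 12 and nq = 18 both reasonably large, the two values agree closely; the approximation becomes less reliable when np or nq is small.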
[Figure: normal curve with the horizontal axis marked at μ−3σ, μ−2σ, μ−σ, μ, μ+σ, μ+2σ, μ+3σ]
Exercises
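Table 1: Normal distribution (entries are areas under the standard normal curve between 0 and z; add 0.5 to obtain the area to the left of a positive z)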
0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09
0.0 0.0000 0.0040 0.0080 0.0120 0.0160 0.0199 0.0239 0.0279 0.0319 0.0359
0.1 0.0398 0.0438 0.0478 0.0517 0.0557 0.0596 0.0636 0.0675 0.0714 0.0753
0.2 0.0793 0.0832 0.0871 0.0910 0.0948 0.0987 0.1026 0.1064 0.1103 0.1141
0.3 0.1179 0.1217 0.1255 0.1293 0.1331 0.1368 0.1406 0.1443 0.1480 0.1517
0.4 0.1554 0.1591 0.1628 0.1664 0.1700 0.1736 0.1772 0.1808 0.1844 0.1879
0.5 0.1915 0.1950 0.1985 0.2019 0.2054 0.2088 0.2123 0.2157 0.2190 0.2224
0.6 0.2257 0.2291 0.2324 0.2357 0.2389 0.2422 0.2454 0.2486 0.2517 0.2549
0.7 0.2580 0.2611 0.2642 0.2673 0.2704 0.2734 0.2764 0.2794 0.2823 0.2852
0.8 0.2881 0.2910 0.2939 0.2967 0.2995 0.3023 0.3051 0.3078 0.3106 0.3133
0.9 0.3159 0.3186 0.3212 0.3238 0.3264 0.3289 0.3315 0.3340 0.3365 0.3389
1.0 0.3413 0.3438 0.3461 0.3485 0.3508 0.3531 0.3554 0.3577 0.3599 0.3621
1.1 0.3643 0.3665 0.3686 0.3708 0.3729 0.3749 0.3770 0.3790 0.3810 0.3830
1.2 0.3849 0.3869 0.3888 0.3907 0.3925 0.3944 0.3962 0.3980 0.3997 0.4015
1.3 0.4032 0.4049 0.4066 0.4082 0.4099 0.4115 0.4131 0.4147 0.4162 0.4177
1.4 0.4192 0.4207 0.4222 0.4236 0.4251 0.4265 0.4279 0.4292 0.4306 0.4319
1.5 0.4332 0.4345 0.4357 0.4370 0.4382 0.4394 0.4406 0.4418 0.4429 0.4441
1.6 0.4452 0.4463 0.4474 0.4484 0.4495 0.4505 0.4515 0.4525 0.4535 0.4545
1.7 0.4554 0.4564 0.4573 0.4582 0.4591 0.4599 0.4608 0.4616 0.4625 0.4633
1.8 0.4641 0.4649 0.4656 0.4664 0.4671 0.4678 0.4686 0.4693 0.4699 0.4706
1.9 0.4713 0.4719 0.4726 0.4732 0.4738 0.4744 0.4750 0.4756 0.4761 0.4767
2.0 0.4772 0.4778 0.4783 0.4788 0.4793 0.4798 0.4803 0.4808 0.4812 0.4817
2.1 0.4821 0.4826 0.4830 0.4834 0.4838 0.4842 0.4846 0.4850 0.4854 0.4857
2.2 0.4861 0.4864 0.4868 0.4871 0.4875 0.4878 0.4881 0.4884 0.4887 0.4890
2.3 0.4893 0.4896 0.4898 0.4901 0.4904 0.4906 0.4909 0.4911 0.4913 0.4916
Table 2: Student’s t-distribution
Thank you!