Lecture-3&4 - Measure of Central Tendency
Cont’d …
o When two or more groups are measured, a measure of central tendency
provides a basis for comparing them.
The most common measures of central tendency include:
1. Mean (Arithmetic, Weighted, Geometric, and
Harmonic)
2. Median
3. Mode
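Population mean:
$$\mu = \frac{X_1 + X_2 + \cdots + X_N}{N} = \frac{\sum_{i=1}^{N} X_i}{N}$$
Sample mean:
$$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{\sum_{i=1}^{n} x_i}{n}$$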
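A minimal Python sketch of the sample-mean formula. The function name `arithmetic_mean` is hypothetical; the data are the ten ordered observations used in the median example later in this lecture.

```python
def arithmetic_mean(values):
    """Sample mean: (x_1 + x_2 + ... + x_n) / n."""
    if not values:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(values) / len(values)

# Ordered sample from the median example of this lecture
sample = [23, 28, 28, 31, 32, 34, 37, 42, 50, 61]
print(arithmetic_mean(sample))  # 36.6
```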
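Mean of data with frequencies (weighted mean):
$$\bar{X} = \frac{\sum_{i=1}^{k} f_i X_i}{\sum_{i=1}^{k} f_i}$$
where $f_i$ is the frequency (or weight) of the value $X_i$ and $k$ is the number of distinct values.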
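The frequency-weighted formula can be sketched in Python as follows. The function name `weighted_mean` is hypothetical; the example reuses the values 3, 4, 5, 6 with frequencies 2, 3, 1, 2 that appear in the geometric mean slide of this lecture.

```python
def weighted_mean(values, freqs):
    """X_bar = sum(f_i * X_i) / sum(f_i) for values X_i with frequencies f_i."""
    if len(values) != len(freqs):
        raise ValueError("values and freqs must have the same length")
    total_f = sum(freqs)
    if total_f <= 0:
        raise ValueError("the sum of the frequencies must be positive")
    return sum(f * x for x, f in zip(values, freqs)) / total_f

print(weighted_mean([3, 4, 5, 6], [2, 3, 1, 2]))  # 4.375
```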
Example
Grouped Data
Special properties of A.M
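Combined mean of two groups:
$$\bar{X}_c = \frac{\bar{X}_1 n_1 + \bar{X}_2 n_2}{n_1 + n_2} = \frac{\sum_{i=1}^{2} \bar{X}_i n_i}{\sum_{i=1}^{2} n_i} = \frac{40(350) + 60(380)}{40 + 60} = 368 \text{ Birr}$$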
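The combined-mean property generalizes to any number of groups. A short sketch (the function name `combined_mean` is hypothetical) reproducing the 368 Birr result:

```python
def combined_mean(means, sizes):
    """Combined mean of k groups: sum(n_i * mean_i) / sum(n_i)."""
    if len(means) != len(sizes) or sum(sizes) == 0:
        raise ValueError("need matching, non-empty groups")
    return sum(n * m for m, n in zip(means, sizes)) / sum(sizes)

print(combined_mean([350, 380], [40, 60]))  # 368.0 (Birr)
```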
Merits and demerits of arithmetic mean
o The mean can be used as a summary measure for quantitative
data, but it is not appropriate for either nominal or ordinal data.
o For a given set of data, there is only one arithmetic mean
(uniqueness).
o Easy to calculate and understand (simple).
o Greatly affected by the extreme values.
Merits:
It is based on all observations
It is suitable for further statistical analysis
It is easy to calculate and simple to understand
Demerits:
It is affected by extreme observations
It cannot be used in the case of open-ended classes
It cannot be used when dealing with qualitative
characteristics, such as intelligence, honesty, and beauty
By habtamu.A. 12/13/24
3. Geometric mean
Values 3 4 5 6
Freq. 2 3 1 2
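The geometric mean formula did not survive extraction. Assuming the standard definition for a frequency distribution, $GM = (x_1^{f_1} x_2^{f_2} \cdots x_k^{f_k})^{1/n}$ with $n = \sum f_i$, the table above can be handled with the sketch below. It works through logarithms so the product cannot overflow; the function name is hypothetical.

```python
import math

def geometric_mean(values, freqs):
    """GM = (x_1^f_1 * ... * x_k^f_k) ** (1/n), n = sum(f_i).

    Computed as exp(sum(f_i * ln x_i) / n); all values must be positive.
    """
    if any(x <= 0 for x in values):
        raise ValueError("the geometric mean requires positive values")
    n = sum(freqs)
    log_gm = sum(f * math.log(x) for x, f in zip(values, freqs)) / n
    return math.exp(log_gm)

gm = geometric_mean([3, 4, 5, 6], [2, 3, 1, 2])
print(round(gm, 2))  # 4.24, i.e. 103680 ** (1/8)
```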
4. Harmonic Mean
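The harmonic mean slide's content is also missing. Assuming the standard definition for a frequency distribution, $HM = n / \sum (f_i / x_i)$, here is a sketch applied, for illustration, to the same table (values 3, 4, 5, 6 with frequencies 2, 3, 1, 2). The function name is hypothetical.

```python
def harmonic_mean(values, freqs):
    """HM = n / sum(f_i / x_i), n = sum(f_i); values must be non-zero."""
    if any(x == 0 for x in values):
        raise ValueError("the harmonic mean is undefined when a value is zero")
    n = sum(freqs)
    return n / sum(f / x for x, f in zip(values, freqs))

hm = harmonic_mean([3, 4, 5, 6], [2, 3, 1, 2])
print(round(hm, 2))  # 4.1, i.e. 8 / 1.95
```

For positive data the ordering A.M. ≥ G.M. ≥ H.M. always holds; for this table 4.375 ≥ 4.24 ≥ 4.10.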
Activity: Discuss the advantages and disadvantages of the A.M., G.M.
and H.M.
Exercise: The numbers of diarrhea episodes for 25 children are
summarized in the following table.
Diarrhea episodes    No. of children
1                    3
2                    3
3                    f3
5                    2
6                    10
8                    f6
If the arithmetic mean is 4.8, what are the values of f3 and f6?
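The exercise reduces to two linear equations: the frequencies must total 25, giving f3 + f6 = 7, and Σ f·x must equal 4.8 × 25 = 120, giving 3·f3 + 8·f6 = 41. The brute-force search below is a hypothetical check of that reasoning, not a prescribed method.

```python
episodes = [1, 2, 3, 5, 6, 8]
n_total = 25
target_sum = round(4.8 * n_total)  # sum(f * x) must equal mean * n = 120

solutions = []
for f3 in range(n_total + 1):
    for f6 in range(n_total + 1):
        freqs = [3, 3, f3, 2, 10, f6]
        if sum(freqs) != n_total:
            continue  # total frequency must be 25
        # compare integer sums to avoid floating-point error
        if sum(f * x for f, x in zip(freqs, episodes)) == target_sum:
            solutions.append((f3, f6))

print(solutions)  # [(3, 4)]
```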
The median, as its name indicates, is the middlemost value of the
ordered data; it divides the data into two equal parts.
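$$\tilde{X} = \begin{cases} X_{\left(\frac{n+1}{2}\right)}, & \text{if } n \text{ is odd} \\ \frac{1}{2}\left[X_{\left(\frac{n}{2}\right)} + X_{\left(\frac{n}{2}+1\right)}\right], & \text{if } n \text{ is even} \end{cases}$$
where $X_{(i)}$ denotes the $i$-th ordered observation. That is,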
When n = 11, then the median is the 6th observation.
When n = 12, then the median is the 6.5th observation, which is an
observation halfway between the 6th and 7th ordered observation.
Example: For the same random sample, the ordered observations are:
23, 28, 28, 31, 32, 34, 37, 42, 50, 61.
Since n = 10, the median is the 5.5th observation, i.e., (32 + 34)/2 = 33.
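The odd/even rule for the median of ungrouped data can be sketched as follows; `median` here is a hypothetical helper, and the indices are 0-based, so `x[mid]` is the ((n+1)/2)-th value when n is odd.

```python
def median(values):
    """Middle ordered value (n odd) or mean of the two middle values (n even)."""
    if not values:
        raise ValueError("the median of an empty data set is undefined")
    x = sorted(values)
    n = len(x)
    mid = n // 2
    if n % 2 == 1:
        return x[mid]                 # the ((n+1)/2)-th ordered value
    return (x[mid - 1] + x[mid]) / 2  # (n/2)-th and (n/2 + 1)-th values

print(median([23, 28, 28, 31, 32, 34, 37, 42, 50, 61]))  # 33.0
```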
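Median of Grouped Data
$$\tilde{X} = L_{me} + \frac{w}{f_m}\left(\frac{n}{2} - cf_{bm}\right)$$
Lme = Lower boundary of median class
fm = frequency of median class
cfbm = cumulative frequency of the class before the median class
w = class width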
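A minimal sketch of the grouped-median formula. The function name and the frequency table (classes 0-4, 5-9, 10-14, 15-19, 20-24 with frequencies 6, 12, 7, 5, 0) are hypothetical illustrations, not the data of the lecture's example. The median class is the first class whose cumulative frequency reaches n/2.

```python
def grouped_median(lower_bounds, freqs, width):
    """L_me + (w / f_m) * (n/2 - cf_bm) for grouped data.

    lower_bounds: lower class boundaries in increasing order
    freqs: class frequencies; width: common class width w
    """
    n = sum(freqs)
    if n <= 0:
        raise ValueError("the total frequency must be positive")
    half = n / 2
    cum = 0  # cumulative frequency of the classes before the current one
    for L, f in zip(lower_bounds, freqs):
        if cum + f >= half:  # median class found
            return L + width * (half - cum) / f
        cum += f

# Hypothetical table; boundaries are -0.5, 4.5, 9.5, 14.5, 19.5
print(grouped_median([-0.5, 4.5, 9.5, 14.5, 19.5], [6, 12, 7, 5, 0], 5))  # 8.25
```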
Example: Find Median
Solution: To determine the median class, we take the class that contains the
$\left(\frac{n}{2}\right)^{th} = \left(\frac{5763}{2}\right)^{th} = 2881.5^{th}$ item.
THE MODE ( X̂ )
The mode or modal value is the value with the highest frequency in
the data set. The mode of a set of data or distribution can be:
No mode: All values appear the same number of times
Unimodal: If the distribution has only one mode
Bimodal: If the distribution has two modes
Multi-modal: If the distribution has more than two modes
Mode of Grouped Data
$$\hat{X} = L_{mo} + \frac{\Delta_1}{\Delta_1 + \Delta_2}\, w$$
Lmo = Lower boundary of modal class
Δ1 = difference of frequency between modal class and class before it
Δ2 = difference of frequency between modal class and class after it
w = class width
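i.e. $\Delta_1 = f_{mo} - f_1$ and $\Delta_2 = f_{mo} - f_2$, where $f_{mo}$, $f_1$ and $f_2$ are the frequencies of the modal class, the class before it and the class after it.
Example:
Class    Frequency
0-4      6
5-9      12
10-14    7
15-19    5
20-24    0
Total    n = 30
The modal class is 5-9, so $L_{mo} = 4.5$, $w = 5$, $\Delta_1 = 12 - 6 = 6$ and $\Delta_2 = 12 - 7 = 5$:
$$\hat{X} = 4.5 + \frac{6}{6 + 5}(5) = 4.5 + 2.73 = 7.23$$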
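A sketch of the grouped-mode formula using the example's values ($L_{mo} = 4.5$, $f_{mo} = 12$, $f_1 = 6$, $f_2 = 7$, $w = 5$). Note that the fraction $\Delta_1/(\Delta_1 + \Delta_2) = 6/11 \approx 0.55$ must still be multiplied by $w = 5$, giving about 2.73 and a mode of about 7.23. The function name is hypothetical.

```python
def grouped_mode(L_mo, f_mo, f_before, f_after, width):
    """L_mo + d1 / (d1 + d2) * w, with d1 = f_mo - f_before, d2 = f_mo - f_after."""
    d1 = f_mo - f_before
    d2 = f_mo - f_after
    if d1 + d2 == 0:
        raise ValueError("d1 + d2 is zero; the formula is undefined")
    return L_mo + d1 / (d1 + d2) * width

print(round(grouped_mode(4.5, 12, 6, 7, 5), 2))  # 7.23
```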
Measures of position (quantiles)
Quantiles are measures of position that divide an ordered dataset into
equal parts, each containing the same proportion of the data.
They help describe the distribution of a dataset by identifying the
values at specific points that divide the data into portions.
The most commonly used quantiles are quartiles, deciles and
percentiles.
Example: The following data show the ages of 30 sampled patients
at JUSH: 6, 9, 11, 14, 16, 17, 18, 21, 22, 22, 22, 22, 23, 25, 25,
26, 27, 28, 28, 32, 33, 34, 34, 36, 39, 39, 41, 45, 46, 49
Find the lower, middle and upper quartiles for the above data.
Solution: n = 30, so $Q_1 = \left(\frac{1}{4}(n+1)\right)^{th} = \left(\frac{1}{4}(30+1)\right)^{th} = 7.75^{th}$ observation.
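A sketch of the (n + 1)·p position rule for quartiles. It assumes that a fractional position such as 7.75 is resolved by linear interpolation between the neighbouring ordered values (e.g. x₇ + 0.75(x₈ − x₇)); that convention, and the function name, are assumptions rather than something the surviving slide states.

```python
def quartile_by_position(data, p):
    """Value at 1-based position p*(n+1) of the ordered data, with linear
    interpolation between neighbouring observations (assumed convention)."""
    x = sorted(data)
    n = len(x)
    pos = p * (n + 1)
    if pos <= 1:
        return x[0]
    if pos >= n:
        return x[-1]
    k = int(pos)        # integer part: the k-th ordered value
    frac = pos - k      # fractional part used for interpolation
    return x[k - 1] + frac * (x[k] - x[k - 1])

ages = [6, 9, 11, 14, 16, 17, 18, 21, 22, 22, 22, 22, 23, 25, 25,
        26, 27, 28, 28, 32, 33, 34, 34, 36, 39, 39, 41, 45, 46, 49]
q1, q2, q3 = (quartile_by_position(ages, p) for p in (0.25, 0.50, 0.75))
print(q1, q2, q3)  # 20.25 25.5 34.5
```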
Measures of Variation
Introduction
o Measures of central tendency locate the center of the distribution.
However, they do not tell how individual observations are
scattered on either side of the center. The spread of observations
around the center is known as dispersion or variability.
o In other words, the degree to which numerical data tends to
spread about an average value is called dispersion or variation of
the data.
o Measures of dispersion are statistical measures that quantify
the extent to which data are dispersed or spread out.
Significance of measures of dispersion
o To determine the reliability of an average: If the variation is
small, the average closely represents the individual values and is
highly representative. On the other hand, if the dispersion or
variation is large, the average will be quite unreliable.
o To compare the variability of two or more groups: It is also
useful to determine the uniformity or consistency of two or more
groups. A high degree of variation would mean less consistency or
less uniformity as compared to the data having less variation.
o For facilitating the use of other statistical measures: Measures
of dispersion serve as the basis of many other statistical measures
such as correlation, regression, and hypothesis testing.
Types of measures of dispersion
1. Absolute measures of variation: An absolute measure is
expressed in the same statistical unit as the original data,
such as kilograms, tonnes, etc. These measures are
suitable for comparing the variability of two distributions whose
variables are expressed in the same units and have similar average
size.
2. Relative measures of variation: When two sets of data are
expressed in different units of measurement, their absolute
measures of variation are not comparable. In such cases,
measures of relative variation should be used.
Absolute                     Relative
1. Range                     Relative range
2. Inter-quartile range      Coefficient of quartile deviation
3. Variance                  Coefficient of variation
4. Standard deviation        Standard score
Range
o It is the difference between the largest and smallest
observations from the data.
o Example: Consider the data on the weight (in Kg) of 10
newborn children at Jimma Hospital within a month: 2.51,
3.01, 3.25, 2.02,1.98, 2.33, 2.33, 2.98, 2.88, 2.43
Solution: The range for the dataset can be computed by first
arranging all observations in ascending order: 1.98, 2.02,
2.33, 2.33, 2.43, 2.51, 2.88, 2.98, 3.01, 3.25
The range is then 3.25 - 1.98 = 1.27 kg.
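The same computation in Python: sort the observations and subtract the smallest from the largest.

```python
weights = [2.51, 3.01, 3.25, 2.02, 1.98, 2.33, 2.33, 2.98, 2.88, 2.43]  # kg

ordered = sorted(weights)              # ascending order
data_range = ordered[-1] - ordered[0]  # largest minus smallest observation
print(ordered)
print(round(data_range, 2))  # 1.27
```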
Some important properties of variance and standard
deviation
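The list of properties did not survive extraction. The sketch below checks two standard properties on the newborn-weight data: adding a constant leaves the variance unchanged, and multiplying every value by c multiplies the variance by c² and the standard deviation by |c|. It assumes the sample variance with divisor n − 1, as computed by Python's `statistics.variance`.

```python
import statistics

weights = [2.51, 3.01, 3.25, 2.02, 1.98, 2.33, 2.33, 2.98, 2.88, 2.43]  # kg

s2 = statistics.variance(weights)  # sample variance, divisor n - 1
s = statistics.stdev(weights)      # sample standard deviation

shifted = [x + 0.5 for x in weights]  # add a constant a = 0.5
scaled = [1000 * x for x in weights]  # multiply by c = 1000 (kg -> g)

# Adding a constant does not change the variance
assert abs(statistics.variance(shifted) - s2) < 1e-9
# Multiplying by c scales the variance by c**2 and the SD by |c|
assert abs(statistics.variance(scaled) - 1000**2 * s2) < 1e-3
assert abs(statistics.stdev(scaled) - 1000 * s) < 1e-6

print(round(s2, 4), round(s, 4))  # 0.1899 0.4358
```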
Coefficient of variation (CV)
o When two data sets have different units of measurement, or their
means differ substantially in size, the CV should be used as a
measure of dispersion. It is used to assess the relative variability
of data.
o The coefficient of variation is defined as the ratio of the standard
deviation to the mean, $CV = \frac{s}{\bar{x}} \times 100\%$; it measures
variation relative to the mean and is usually expressed as a percentage (%).
o Data with a lower CV show less variability (more consistency),
meaning the data are more tightly clustered around the mean.
o Data with a higher CV show more variability relative to the
mean, meaning the data are more spread out.
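A minimal sketch of the CV, applied to the newborn-weight data from the range example. It assumes the sample standard deviation (divisor n − 1); the function name is hypothetical.

```python
import statistics

def coefficient_of_variation(data):
    """CV = (standard deviation / mean) * 100, in percent."""
    mean = statistics.mean(data)
    if mean == 0:
        raise ValueError("the CV is undefined when the mean is zero")
    return statistics.stdev(data) / mean * 100

weights = [2.51, 3.01, 3.25, 2.02, 1.98, 2.33, 2.33, 2.98, 2.88, 2.43]  # kg
cv = coefficient_of_variation(weights)
print(round(cv, 1))  # 16.9 (%)
```

Because the CV is unit-free, the same value would be obtained if the weights were recorded in grams.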
Example: Last semester, the students of the nursing and
anesthesia departments took the Stat273 course. At the end of the
semester, the following information was recorded.
Solution: Since the means of the two sets of data are very different, we
use the coefficient of variation to compare their variability.
The coefficients of variation are calculated as
Graphical representation
The diagram illustrates the shape of three different curves mentioned above
Measure of kurtosis
Chapter Five
Elementary Probability
Introduction
o Every scientific experiment investigating patterns in natural
phenomena may result in "events" that may or may not happen.
o Most events in real life are uncertain in their
occurrence.
For example, an increase in the price of gold under given
economic conditions in a country, or a drug curing cancer patients
within a given period of time.
In such situations, knowing the chance (probability) of
occurrence of an event of interest is vital, and calculating the
probability that an event will happen is essential.
The word probability has two basic meanings:
A quantitative measure of uncertainty, concerned with
decision-making under risk and uncertainty, or with the
occurrence of uncertain events in the future.
Definitions of some probability terms
o Experiment: any process which generates a well-defined
outcome.
o Random experiment: an experiment that can be repeated
any number of times under the same conditions, but whose
outcomes are uncertain and not unique.
o A random experiment has three important features:
multiplicity of outcomes,
uncertainty regarding the outcomes, and
repeatability of the experiment in an identical manner.
Example: Toss a coin many times.
There are two possible outcomes (H or T) → multiplicity
In each toss there is no certainty of getting H or T → uncertainty
The coin can be tossed many times in the same fashion → repeatability
o Outcome: The result of a single trial of a random experiment.
Example:
Experiment             Outcomes
Tossing a fair coin    head, tail
Rolling a fair die     1, 2, 3, 4, 5, 6
o Sample space: The set of all possible outcomes of a random
experiment is called the sample space and is usually denoted by S
(or Ω).
o Event: A subset of the sample space is called an event and is
usually denoted by an upper-case letter.
o Equally Likely Events: Events which have the same chance of
occurring.
o Elementary event: An event consisting of a single outcome.
o Impossible event: An event that can’t occur.
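The definitions above map naturally onto Python sets. This sketch uses the die-rolling experiment; the probability helper assumes the classical rule P(E) = |E| / |S| for equally likely outcomes, which is an assumption here because the slides on approaches to probability did not survive extraction.

```python
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}                # sample space for rolling a fair die
A = {x for x in S if x % 2 == 0}      # event "an even number": {2, 4, 6}
B = {6}                               # elementary event: a single outcome
impossible = {x for x in S if x > 6}  # impossible event: the empty set

assert A <= S and B <= S              # events are subsets of S

def classical_probability(event, sample_space):
    """P(E) = |E| / |S|; valid only when all outcomes are equally likely."""
    return Fraction(len(event & sample_space), len(sample_space))

print(classical_probability(A, S), classical_probability(B, S),
      classical_probability(impossible, S))  # 1/2 1/6 0
```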
Fundamental counting principle
1. Permutation
Examples: Suppose we have the letters A, B, C, and D
a. How many permutations are there taking all four?
b. How many permutations are there if two letters are used at a
time?
1. How many different permutations can be made from the letters in
the word “CORRECTION”?
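These permutation questions can be checked with Python's `math` module (`math.perm` requires Python 3.8 or later). For words with repeated letters, the count is n! divided by the factorial of each letter's multiplicity; `distinct_arrangements` is a hypothetical helper name.

```python
import math
from collections import Counter

# a. All four letters A, B, C, D: 4! arrangements
print(math.factorial(4))   # 24
# b. Two letters at a time: 4P2 = 4! / (4 - 2)!
print(math.perm(4, 2))     # 12

def distinct_arrangements(word):
    """n! / (n1! * n2! * ... * nk!) for a word with repeated letters."""
    denom = 1
    for count in Counter(word).values():
        denom *= math.factorial(count)
    return math.factorial(len(word)) // denom

# CORRECTION: 10 letters with C, O and R each appearing twice
print(distinct_arrangements("CORRECTION"))  # 453600
```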
2. Combination
Activity
1. A question paper contains section A with 5 questions and
section B with 7 questions. A student is required to attempt 8
questions in all, selecting at least 3 from each section. In how
many ways can a student select the questions?
2. A class contains 12 boys and 10 girls. From the class, 10
students are to be chosen for a competition under the condition
that at least 4 boys and at least 4 girls must be represented. In
how many ways can the selection be made?
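Both activities split into cases by how many items come from each group, then add the products of combinations. A sketch using `math.comb` (Python 3.8 or later):

```python
from math import comb

# Activity 1: 8 questions from section A (5 questions) and B (7 questions),
# at least 3 from each section: a = number taken from section A
ways_questions = sum(comb(5, a) * comb(7, 8 - a)
                     for a in range(3, 6) if 8 - a >= 3)
print(ways_questions)  # 420

# Activity 2: 10 students from 12 boys and 10 girls, at least 4 of each:
# b = number of boys, so girls = 10 - b must also be at least 4
ways_students = sum(comb(12, b) * comb(10, 10 - b) for b in range(4, 7))
print(ways_students)   # 497574
```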
Approaches to measuring probability
Conditional Probability
Activity
Chapter Six
There are two types of random variables
Probability density function of random variable
Expectation of random variable
Variance of random variable
What is the expected proportion of surveys returned in any given
quarter?
Solution: by definition
Common discrete probability distributions
2. Poisson Distribution
The Poisson distribution describes the number of events that occur in an
interval of time when the events occur at a constant average rate. It
provides a model for the relative frequency of the number of "rare events"
that occur in a unit of time, area, volume, etc.
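The Poisson probability formula itself is on a slide that did not survive extraction. Assuming the standard form $P(X = k) = e^{-\lambda}\lambda^k / k!$, where λ is the mean number of events per unit, a minimal sketch of the probability mass function and its cumulative sum (function names hypothetical):

```python
import math

def poisson_pmf(k, lam):
    """P(X = k) = e^(-lam) * lam**k / k!  for k = 0, 1, 2, ..."""
    if k < 0:
        return 0.0
    return math.exp(-lam) * lam**k / math.factorial(k)

def poisson_cdf(k, lam):
    """P(X <= k), summing the pmf from 0 to k."""
    return sum(poisson_pmf(i, lam) for i in range(k + 1))

print(round(poisson_pmf(0, 2), 4))  # 0.1353, i.e. e^(-2)
```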
Exercise: On average, five smokers pass a certain street
corner every ten minutes. What is the probability that during a
given 10 minutes the number of smokers passing will be
a. 6 or fewer?
b. 7 or more?
c. exactly 8?
1. The average number of traffic accidents per week in a small
city is 3.
a. What is the probability that there will not be any
accidents in the next week?
b. What is the probability that there will be an accident
within the next 2 weeks?
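A sketch of both exercises, assuming the standard Poisson formula. Two assumptions are made for the accident question: the rate scales with the interval (3 per week gives λ = 6 for two weeks), and "an accident" is read as "at least one accident".

```python
import math

def poisson_pmf(k, lam):
    """P(X = k) = e^(-lam) * lam**k / k!"""
    return math.exp(-lam) * lam**k / math.factorial(k)

def poisson_cdf(k, lam):
    """P(X <= k)."""
    return sum(poisson_pmf(i, lam) for i in range(k + 1))

# Smokers: lambda = 5 per 10 minutes
p_6_or_fewer = poisson_cdf(6, 5)
p_7_or_more = 1 - p_6_or_fewer  # complement of "6 or fewer"
p_exactly_8 = poisson_pmf(8, 5)

# Accidents: lambda = 3 per week, so lambda = 6 for two weeks (assumed)
p_none_next_week = poisson_pmf(0, 3)
p_at_least_one_2wk = 1 - poisson_pmf(0, 6)

for label, p in [("6 or fewer", p_6_or_fewer), ("7 or more", p_7_or_more),
                 ("exactly 8", p_exactly_8), ("no accident in 1 week", p_none_next_week),
                 ("at least one accident in 2 weeks", p_at_least_one_2wk)]:
    print(f"{label}: {p:.4f}")
```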
Common continuous probability distributions
Properties of the normal distribution curve
The areas under the curve that lie within one standard deviation,
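For a normal random variable, the area within k standard deviations of the mean is $P(|Z| < k) = \operatorname{erf}(k/\sqrt{2})$, where Z is the standard normal variable. A short sketch using `math.erf` (the function name is hypothetical) computes these areas for k = 1, 2, 3:

```python
import math

def prob_within_k_sd(k):
    """P(mu - k*sigma < X < mu + k*sigma) for a normal X, = erf(k / sqrt(2))."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(k, round(prob_within_k_sd(k), 4))
# 1 0.6827
# 2 0.9545
# 3 0.9973
```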
Greater accuracy
Simple Random Sampling
o It is a method of selecting items from a population such that
every possible sample of a specific size has an equal chance of
being selected. The sample can be drawn in two
possible ways:
o The sampling units are chosen without replacement, in the
sense that units, once chosen, are not placed back in the
population.
o The sampling units are chosen with replacement, in the sense
that the chosen units are placed back in the population.
o Simple random sampling involves randomly selecting
respondents from a sampling frame, but with large sampling
frames, usually a table of random numbers or a computerized
random number generator is used.
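Python's `random` module can play the role of the computerized random number generator. The sketch below draws from a hypothetical sampling frame of 100 unit IDs; `random.sample` draws without replacement and `random.choices` draws with replacement. The seed is arbitrary and only makes the draw reproducible.

```python
import random

frame = list(range(1, 101))  # hypothetical sampling frame of 100 unit IDs
rng = random.Random(2024)    # seeded generator for a reproducible draw

without_repl = rng.sample(frame, 10)  # each unit can be chosen at most once
with_repl = rng.choices(frame, k=10)  # units are "placed back"; repeats possible

print(sorted(without_repl))
print(sorted(with_repl))
```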
Stratified Random Sampling
o When the population is heterogeneous with respect to the study
variable, it would not be desirable to use simple random sampling.
In such cases, stratified random sampling would be appropriate.
o The population is first divided into homogeneous groups called
strata, and a simple random sample is then taken from each stratum.
o Elements within the same stratum should be more or less homogeneous,
while elements in different strata should differ.
o The number of units to be selected from each stratum can be
determined by one of the following allocation methods.
o Proportional allocation: the same sampling fraction is used for
each stratum. Some of the criteria for dividing a population into
strata are: sex (male, female); age (under 18, 18 to 28, and 29 to
39).
Example: To find the average height of the students in a school
from class 1 to class 12: height varies a lot, as students in
class 1 are around 6 years old while students in class 10 are
around 16 years old. So one can divide all the students into
different subpopulations or strata such as
Students of class 1, 2, and 3: Stratum 1
Students of class 4, 5, and 6: Stratum 2
Students of class 7, 8, and 9: Stratum 3
Students of class 10, 11, and 12: Stratum 4
o Now draw the samples by SRS from each of the strata 1, 2, 3,
and 4. All the drawn samples combined will constitute the final
stratified sample for further analysis.
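A sketch of stratified sampling with proportional allocation, $n_h = n \cdot N_h / N$, followed by SRS within each stratum. The stratum sizes (120, 100, 90, 90 students) and student IDs are hypothetical. Integer division is used, so when the sizes do not divide evenly the stratum samples may sum to slightly less than n.

```python
import random

# Hypothetical strata, grouped by class as in the example
strata = {
    "classes 1-3": [f"S1-{i}" for i in range(120)],
    "classes 4-6": [f"S2-{i}" for i in range(100)],
    "classes 7-9": [f"S3-{i}" for i in range(90)],
    "classes 10-12": [f"S4-{i}" for i in range(90)],
}
n = 40                                         # total sample size
N = sum(len(units) for units in strata.values())
rng = random.Random(1)

sample = {}
for name, units in strata.items():
    n_h = n * len(units) // N                  # proportional allocation
    sample[name] = rng.sample(units, n_h)      # SRS without replacement

for name, units in sample.items():
    print(name, len(units))
```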
Cluster Sampling
o If you have a population dispersed over a wide geographic
region, it may not be feasible to conduct simple random
sampling of the entire population. In such a case, it may be
reasonable to divide the population into clusters (usually along
geographic boundaries).
o A simple random sample of groups or clusters of elements is
chosen and all the sampling units in the selected clusters will
be surveyed.
o Clusters are formed in a way that elements within a cluster are
heterogeneous, i.e. observations in each cluster should be more
or less dissimilar.
o Cluster sampling is useful when it is difficult or costly to
generate a simple random sample.
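In cluster sampling the random draw is made over clusters, not individual units, and every unit in a selected cluster is surveyed. A sketch with a hypothetical population of 8 villages of different sizes:

```python
import random

# Hypothetical population: 8 villages (clusters) of household IDs
clusters = {f"village_{c}": [f"v{c}-h{h}" for h in range(5 + c)]
            for c in range(8)}
rng = random.Random(7)

chosen = rng.sample(sorted(clusters), 3)  # SRS of 3 clusters
# every sampling unit in the chosen clusters is surveyed
surveyed = [unit for c in chosen for unit in clusters[c]]

print(chosen)
print(len(surveyed))
```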
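Means of all 25 samples of size 2 drawn with replacement from 6, 8, 10, 12, 14
(rows: first value, columns: second value):
       6    8    10   12   14
6      6    7    8    9    10
8      7    8    9    10   11
10     8    9    10   11   12
12     9    10   11   12   13
14     10   11   12   13   14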
o Step 3: Summarize the means obtained in Step 2 as a
frequency distribution.
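The sketch below enumerates all 25 ordered samples of size 2 drawn with replacement from the population 6, 8, 10, 12, 14, computes their means, and tabulates them as a frequency distribution (Step 3). Whether this matches the lost Steps 1 and 2 exactly is an assumption. It also shows two standard facts: the mean of the sample means equals the population mean (10), and their variance equals σ²/n = 8/2 = 4.

```python
from collections import Counter
from itertools import product
from statistics import mean, pvariance

population = [6, 8, 10, 12, 14]
n = 2

# All samples of size 2 drawn with replacement, and their means
sample_means = [mean(s) for s in product(population, repeat=n)]

# Step 3: frequency distribution of the sample means
freq = Counter(sample_means)
for m in sorted(freq):
    print(m, freq[m])

print(mean(sample_means), mean(population))
print(pvariance(sample_means), pvariance(population) / n)
```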
o Exercise: A rowing team consists of four rowers who weigh