Chapter 3&4 5
A single value that describes the characteristics of the entire mass of data is called a measure of
central tendency, or an average.
We say a measure of central tendency is best if it possesses most of the following properties. It should:
- be simple to understand and easy to calculate/interpret,
- exist and be unique,
- be rigidly defined by mathematical formula,
- be based on all observations,
- not be seriously affected by extreme observations,
- be capable of further statistical analysis and/or algebraic manipulation.
3.2 Types of Measures of Central Tendency
Several types of averages or measures of central tendency can be defined; the most common are
- the mean
- the mode
- the median
3.2.1. The Mean
Arithmetic mean is defined as the sum of the measurements of the items divided by the total
number of items.
Arithmetic Mean for Ungrouped Frequency Distribution
When the data are arranged or given in the form of an ungrouped frequency distribution, the
formula for the mean is
\[ \bar{X} = \frac{f_1 x_1 + f_2 x_2 + \cdots + f_k x_k}{f_1 + f_2 + \cdots + f_k} = \frac{\sum_{i=1}^{k} f_i x_i}{\sum_{i=1}^{k} f_i} \]
Note that $\sum_{i=1}^{k} f_i = n$.
Example 1: You measure the body lengths (in inches) of 10 full-term infants at birth and record
the following:
17.5, 19.5, 17.5, 19, 20, 21, 18, 19.5, 18, 10.75
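As a quick check of Example 1, the arithmetic mean of the ten recorded lengths can be computed directly. The following is a minimal Python sketch; the variable names are illustrative and are not part of the original notes.

```python
# Body lengths (in inches) of the 10 infants from Example 1.
lengths = [17.5, 19.5, 17.5, 19, 20, 21, 18, 19.5, 18, 10.75]

# Arithmetic mean = sum of the measurements / number of items.
mean_length = sum(lengths) / len(lengths)

print(mean_length)  # 18.075
```

Note that the single small value 10.75 pulls the mean below most of the other observations, which illustrates why a good average should not be seriously affected by extreme observations.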
Example 2: Monthly incomes of fourth year regular students are given in the following frequency
distribution.
Monthly income (birr) 54.5 64.5 74.5 84.5 94.5 104.5 114.5
Number of students 6 9 15 25 13 7 5
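The mean of this ungrouped frequency distribution can be sketched in Python as follows. The list names are illustrative assumptions; the values are taken from the table of Example 2.

```python
# Monthly incomes (birr) and the number of students at each income.
incomes = [54.5, 64.5, 74.5, 84.5, 94.5, 104.5, 114.5]
students = [6, 9, 15, 25, 13, 7, 5]

# n is the sum of the frequencies.
n = sum(students)

# Weighted sum: each income multiplied by its frequency.
weighted_sum = sum(f * x for f, x in zip(students, incomes))

mean_income = weighted_sum / n
print(n, weighted_sum, mean_income)  # 80 6670.0 83.375
```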
If the data are given in the form of a continuous frequency distribution, the sample mean can be
computed as
\[ \bar{x} = \frac{\sum_{i=1}^{k} f_i m_i}{\sum_{i=1}^{k} f_i} = \frac{f_1 m_1 + f_2 m_2 + \cdots + f_k m_k}{f_1 + f_2 + \cdots + f_k} \]
where $m_i$ is the class mark of the $i$-th class, $i = 1, 2, \ldots, k$.
Note that $\sum_{i=1}^{k} f_i = n$ = the total number of observations.
Example: The following table gives the daily wages of laborers. Calculate the average daily wages
paid to a laborer.
Number of laborers 3 4 5 6 6 4 3
3.2.2 The Median
The median of a set of items (numbers) arranged in order of magnitude (i.e. in an array form) is the
middle value or the arithmetic mean of the two middle values. We shall denote the median of
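$x_1, x_2, \ldots, x_n$ by $\tilde{x}$. For ungrouped data arranged in order of magnitude, the median is obtained by
\[ \tilde{x} = \begin{cases} x_{(n+1)/2} & \text{if the number of items, } n, \text{ is odd} \\ \frac{1}{2}\left( x_{n/2} + x_{(n+2)/2} \right) & \text{if the number of items, } n, \text{ is even} \end{cases} \]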
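The two cases of the median rule can be sketched in Python. The function name `median` and the reuse of the infant body lengths from Example 1 of Section 3.2.1 are illustrative choices, not part of the original notes. Python lists are indexed from 0, so the item in position (n+1)/2 of the ordered array sits at index n // 2 when n is odd.

```python
def median(values):
    """Return the median of a list of numbers."""
    ordered = sorted(values)      # arrange in order of magnitude
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # n odd: the single middle value.
        return ordered[mid]
    # n even: arithmetic mean of the two middle values.
    return (ordered[mid - 1] + ordered[mid]) / 2


lengths = [17.5, 19.5, 17.5, 19, 20, 21, 18, 19.5, 18, 10.75]
print(median(lengths))  # 18.5
```

Here n = 10 is even, so the median is the mean of the 5th and 6th ordered values, 18 and 19.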
\[ \hat{x} = L_{mod} + \left( \frac{\Delta_1}{\Delta_1 + \Delta_2} \right) W \]
where
$L_{mod}$ = the lower class boundary of the modal class,
$\Delta_1$ = the difference between the frequency of the modal class and the frequency of the class
immediately preceding the modal class,
$\Delta_2$ = the difference between the frequency of the modal class and the frequency of the class
immediately following the modal class,
$W$ = the class width.
The modal class is the class with the highest frequency in the distribution.
Example 1: The marks obtained by ten students in a semester exam in statistics are: 70, 65, 68, 70,
75, 73, 80, 70, 83 and 86. Find the mode of the students’ marks.
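For ungrouped data such as these marks, the mode is the value that occurs most often. A minimal Python sketch, with illustrative variable names, counts each mark and keeps every value that reaches the highest count (so that data with more than one mode are also handled):

```python
from collections import Counter

marks = [70, 65, 68, 70, 75, 73, 80, 70, 83, 86]

# Count how many times each mark occurs.
counts = Counter(marks)
highest = max(counts.values())

# Keep every mark whose count equals the highest count.
modes = [value for value, count in counts.items() if count == highest]

print(modes, highest)  # [70] 3
```

The mark 70 occurs three times and every other mark occurs once, so the mode is 70.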
Example 2: Find the mode for the frequency distribution of the birth weight (in kilogram) of 30
children given below.
Chapter Four
Measures of Dispersion (Variation) and Shape
4.1 Objectives of Measuring Variation
Variation (dispersion) is the scatter or spread of the observations (values) in a distribution.
The average or central value is of little use unless the degree of variation, which occurs about it,
is given. If the scatter about the measure of central tendency is very large, the average is not a
typical value. Therefore it is necessary to develop a quantitative measure of the dispersion (or
variation) of the values about the average. Measures of variation are statistical measures, which
provide ways of measuring the extent to which the data are dispersed or spread out.
Measures of variation are needed for the following basic objectives:
- to judge the reliability of a measure of central tendency,
- to compare two or more sets of data with regard to their variability,
- to control variability itself, as in quality control, body temperature, etc.,
- to make further statistical analysis or to facilitate the use of other statistical measures.
Example 1: Find the range and relative range for the monthly salary of ten workers in a certain
paint factory given below.
462 480 534 624 498 552 606 588 516 570
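Solution:
\[ x_{max} = 624 \text{ birr}, \qquad x_{min} = 462 \text{ birr} \]
\[ R = x_{max} - x_{min} = 624 \text{ birr} - 462 \text{ birr} = 162 \text{ birr} \]
\[ RR = \frac{x_{max} - x_{min}}{x_{max} + x_{min}} = \frac{624 \text{ birr} - 462 \text{ birr}}{624 \text{ birr} + 462 \text{ birr}} = \frac{162 \text{ birr}}{1086 \text{ birr}} = 0.149 \]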
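The same computation can be sketched in Python, taking the range as the largest value minus the smallest and the relative range as the range divided by the sum of the largest and smallest values, as in the solution above. The variable names are illustrative.

```python
# Monthly salaries (birr) of the ten workers.
salaries = [462, 480, 534, 624, 498, 552, 606, 588, 516, 570]

x_max = max(salaries)
x_min = min(salaries)

value_range = x_max - x_min                      # R, in birr
relative_range = value_range / (x_max + x_min)   # RR, unit free

print(value_range, round(relative_range, 3))  # 162 0.149
```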
Example 2: Find the range and relative range for the following frequency distribution,
which shows the distribution of the maximum loads supported by a certain number of cables.
Maximum load Number
(in kilo-Newton) of cables
93 – 97 2
98 – 102 5
103 – 107 12
108 – 112 17
113 – 117 14
118 – 122 6
123 – 127 3
128 – 132 1
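Solution: Using the class marks of the first and last classes,
\[ M_{first} = 95 \text{ kN}, \qquad M_{last} = 130 \text{ kN} \]
\[ R = M_{last} - M_{first} = 130 \text{ kN} - 95 \text{ kN} = 35 \text{ kN} \]
\[ RR = \frac{M_{last} - M_{first}}{M_{last} + M_{first}} = \frac{130 \text{ kN} - 95 \text{ kN}}{130 \text{ kN} + 95 \text{ kN}} = \frac{35 \text{ kN}}{225 \text{ kN}} = 0.156 \]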
4.2.2 The Variance, the Standard Deviation and Coefficient of Variation
The Variance
Variance is the arithmetic mean of the squared deviations of the observations from their arithmetic
mean.
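Population Variance ($\sigma^2$)
For ungrouped data
\[ \sigma^2 = \frac{\sum (x_i - \mu)^2}{N} = \cdots = \frac{1}{N}\left( \sum x_i^2 - \frac{\left(\sum x_i\right)^2}{N} \right) \]
where $\mu$ is the population arithmetic mean
and $N$ is the total number of observations in the population.
For grouped data
\[ \sigma^2 = \frac{\sum f_i (m_i - \mu)^2}{N} = \cdots = \frac{1}{N}\left( \sum f_i m_i^2 - \frac{\left(\sum f_i m_i\right)^2}{N} \right) \]
where $\mu$ is the population arithmetic
mean, $m_i$ is the class mark of the $i$-th class, $f_i$ is the frequency of the $i$-th class and $N = \sum f_i$.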
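Sample Variance ($S^2$)
For ungrouped data
\[ S^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} = \cdots = \frac{1}{n - 1}\left( \sum x_i^2 - \frac{\left(\sum x_i\right)^2}{n} \right) \]
where $\bar{x}$ is the sample arithmetic mean
and $n$ is the total number of observations in the sample.
For grouped data
\[ S^2 = \frac{\sum f_i (m_i - \bar{x})^2}{n - 1} = \cdots = \frac{1}{n - 1}\left( \sum f_i m_i^2 - \frac{\left(\sum f_i m_i\right)^2}{n} \right) \]
where $\bar{x}$ is the sample arithmetic
mean, $m_i$ is the class mark of the $i$-th class, $f_i$ is the frequency of the $i$-th class and $n = \sum f_i$.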
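\[ \sum f_i (x_i - \bar{x})^2 = 37.8225 + 39.69 + 0.225 + 24.3675 + 68.445 = 170.55 \]
\[ \bar{x} = \frac{\sum f_i x_i}{n} = \frac{183}{20} = 9.15, \quad \text{where } n = \sum_{i=1}^{5} f_i = 20 \]
\[ \text{and } S^2 = \frac{\sum f_i (x_i - \bar{x})^2}{n - 1} = \frac{170.55}{19} = 8.976 \]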
Coefficient of Variation
The standard deviation is an absolute measure of dispersion. The corresponding relative measure
is known as the coefficient of variation (CV).
The coefficient of variation is used when we want to compare the variability of two
or more different series. It is the ratio of the standard deviation to the
arithmetic mean, usually expressed in percent.
\[ CV = \frac{S}{\bar{x}} \times 100 \]
where $S$ is the standard deviation of the observations and $\bar{x}$ is their arithmetic mean.
A distribution having a smaller coefficient of variation is said to be less variable, or more consistent,
more uniform or more homogeneous.
Example: Last semester, the students of Biology and Chemistry Departments took Stat 273 course.
At the end of the semester, the following information was recorded.
Compare the relative dispersions of the two departments’ scores using the appropriate way.
Solution:
Biology Department:    $CV = \frac{S}{\bar{x}} \times 100 = \frac{23}{79} \times 100 = 29.11\%$
Chemistry Department:  $CV = \frac{S}{\bar{x}} \times 100 = \frac{11}{64} \times 100 = 17.19\%$
Interpretation: Since the CV of Biology Department students is greater than that of Chemistry
Department students, we can say that there is more dispersion relative to the mean in the
distribution of Biology students’ scores compared with that of Chemistry students.
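The comparison can be reproduced with a short Python sketch. The means (79 and 64) and standard deviations (23 and 11) are read off the solution above; the function name `coefficient_of_variation` is a hypothetical helper introduced only for illustration.

```python
def coefficient_of_variation(std_dev, mean):
    """Return the coefficient of variation in percent."""
    return std_dev / mean * 100


cv_biology = coefficient_of_variation(23, 79)
cv_chemistry = coefficient_of_variation(11, 64)

print(round(cv_biology, 2), round(cv_chemistry, 2))  # 29.11 17.19

# The series with the larger CV is the more variable one relative to its mean.
more_variable = "Biology" if cv_biology > cv_chemistry else "Chemistry"
print(more_variable)  # Biology
```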
Example: The following table illustrates the frequency distribution of masses of 100 male
students in Gander University.
Mass (kg) 60-62 63-65 66-68 69-71 72-74
No. of students 5 18 42 27 8
Find: a) the variance b) the standard deviation c) the coefficient of variation
Solution:
Mass (kg)                 60-62   63-65   66-68    69-71    72-74   Total
No. of students ($f_i$)   5       18      42       27       8       100
Class mark ($m_i$)        61      64      67       70       73
$f_i m_i$                 305     1152    2814     1890     584     6745
$f_i m_i^2$               18605   73728   188538   132300   42632   455803
$|m_i - \bar{x}|$         6.45    3.45    0.45     2.55     5.55
$f_i |m_i - \bar{x}|$     32.25   62.1    18.9     68.85    44.4    226.5
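\[ \sum_{i=1}^{5} f_i m_i = 6745, \qquad \sum_{i=1}^{5} f_i m_i^2 = 455803, \qquad n = \sum_{i=1}^{5} f_i = 100 \]
\[ \text{and } \bar{x} = \frac{\sum_{i=1}^{5} f_i m_i}{n} = \frac{6745}{100} = 67.45 \]
a) \[ S^2 = \frac{1}{n - 1}\left( \sum_{i=1}^{5} f_i m_i^2 - \frac{\left(\sum_{i=1}^{5} f_i m_i\right)^2}{n} \right) = \frac{1}{99}\left( 455803 - \frac{(6745)^2}{100} \right) = 8.61 \]
b) \[ S = \sqrt{S^2} = \sqrt{8.61} = 2.93 \]
c) \[ CV = \frac{S}{\bar{x}} \times 100 = \frac{2.93}{67.45} \times 100 = 4.344 \]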
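The grouped-data computation above can be checked with a minimal Python sketch that uses the class marks and frequencies from the table. The variable names are illustrative. Because the code carries the unrounded standard deviation into the last step, it gives a CV of about 4.35 rather than the 4.344 obtained above from the rounded value S = 2.93.

```python
import math

class_marks = [61, 64, 67, 70, 73]    # m_i
frequencies = [5, 18, 42, 27, 8]      # f_i

n = sum(frequencies)
sum_fm = sum(f * m for f, m in zip(frequencies, class_marks))
sum_fm2 = sum(f * m ** 2 for f, m in zip(frequencies, class_marks))

mean = sum_fm / n

# Sample variance using the computational (shortcut) formula.
variance = (sum_fm2 - sum_fm ** 2 / n) / (n - 1)
std_dev = math.sqrt(variance)
cv = std_dev / mean * 100

print(n, sum_fm, sum_fm2)                    # 100 6745 455803
print(round(variance, 2), round(std_dev, 2)) # 8.61 2.93
```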
Chapter 5
5 Elementary Probabilities
5.1 Definition of basic terms of probability
Random experiment: - a process of measurement or observation which can be repeated at any
time and whose outcome cannot be predicted with certainty, e.g. tossing a coin.
Outcome: - a particular result of an experiment (the result of a single trial of an experiment).
Sample space: - the set of all possible outcomes of a random experiment. Each possible
outcome is called a sample point.
Event: - a subset of a sample space (one or more outcomes of an experiment).
Example 1: If we toss a coin, the sample space (S) of this experiment is
S = {head, tail}, where head and tail are the two faces of the coin. If we are interested in the outcome
that a head turns up, then the event is E = {head}.
Example 2: Find the sample space of tossing a coin twice.
S = {HH, HT, TH, TT}
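For small experiments the sample space can be enumerated mechanically. The sketch below, with illustrative names, lists the outcomes of tossing a coin twice using `itertools.product` from the Python standard library.

```python
from itertools import product

faces = ["H", "T"]

# Every ordered pair of faces is one sample point of the two-toss experiment.
sample_space = ["".join(outcome) for outcome in product(faces, repeat=2)]

print(sample_space)       # ['HH', 'HT', 'TH', 'TT']
print(len(sample_space))  # 4
```

Changing `repeat=2` to `repeat=3` would list the 8 outcomes of three tosses in the same way.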
Elementary or simple event: - an event having only one sample point.
Mutually exclusive events: - two events E1 and E2 are said to be mutually exclusive if there is
no sample point which is common to E1 and E2,
i.e. E1 ∩ E2 = ∅.
Independent events: - two events E1 and E2 are said to be independent if the occurrence of E1 has
no bearing on the occurrence of E2. That is, knowing that E1 has occurred gives no information
about the occurrence of E2.
Collectively exhaustive events: - two events are said to be collectively exhaustive if at least one of
them must occur. Hence together they include every possible outcome.
Equally likely outcomes: - outcomes are equally likely if each outcome in the sample space has the
same chance of occurring.
Example: In throwing a fair die, all possible outcomes are equally likely. That means all the elements
of the sample space have the same chance of occurring.
5.2 Counting techniques:
In order to determine the number of outcomes, one can use several rules of counting.
1. Multiplication rule: - in a sequence of n events in which the first event has k1 possibilities, …,
the nth event has kn possibilities, the total number of possibilities of the sequence is k1*k2*…*kn.
Example: - The personnel department of a large corporation wishes to issue each employee an ID
card with two letters followed by a two-digit number. How many possible ID cards can be
issued?
Solution
K1 K2 K3 K4
26 26 10 10
Thus the total number of ID cards that could be issued is:
26*26*10*10 = 67600 (with repetition)
26*25*10*9 = 58500 (without repetition)
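The multiplication rule amounts to taking the product of the number of possibilities at each position. The sketch below uses `math.prod` from the Python standard library (Python 3.8 or later); the variable names are illustrative. The first count allows letters and digits to repeat, and the second counts only cards in which no letter and no digit is repeated.

```python
import math

# Possibilities for each of the four positions: letter, letter, digit, digit.
with_repetition = math.prod([26, 26, 10, 10])

# If repeats are not allowed, the second letter has 25 choices
# and the second digit has 9 choices.
without_repetition = math.prod([26, 25, 10, 9])

print(with_repetition)     # 67600
print(without_repetition)  # 58500
```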
2. Permutation: an arrangement of n objects in a specific order. In this case order is crucial.
a) The number of permutations of n distinct objects taken all together is n!,
i.e. nPn = n! / (n-n)! = n! / 0! = n!
b) The number of arrangements of n distinct objects in a specific order using r objects at a time is given by
nPr = n!/(n-r)! = n(n-1)(n-2)…(n-r+1)
c) The number of permutations of n objects in which k1 are alike, k2 are alike, …, km are alike is
n! / (k1! k2! … km!)
Example: A photographer wants to arrange 3 persons in a row for a photograph. How many
different photographs are possible?
Solution:
Assume the 3 persons are Aster (A), Lemma (L) and Yared (Y), so n = 3.
Since n! = 3! = 3*2*1 = 6, there are 6 possible arrangements: ALY, AYL, LAY, LYA, YLA and
YAL.
Example 2: Fifteen athletes, including Haile, entered a race.
a) In how many different ways could prizes for the first, the second and the third place be
awarded?
b) How many of the triplets counted above have Haile in the first position?
Solution:
a) 15 objects taken 3 at a time: 15P3 = 15! / (15-3)! = 15*14*13 = 2730
b) With Haile fixed in first place, the other two places are filled from the remaining 14 athletes:
14P2 = 14! / (14-2)! = 14*13 = 182
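Both answers can be checked in Python, once with the formula via `math.perm` (available from Python 3.8) and once by brute-force enumeration with `itertools.permutations`. Labelling the athletes 0 to 14 with Haile as athlete 0 is an assumption made only for this sketch.

```python
import math
from itertools import permutations

# Formula: nPr = n! / (n - r)!
podiums = math.perm(15, 3)
haile_first = math.perm(14, 2)

# Brute force: list every ordered (first, second, third) triple.
athletes = range(15)  # athlete 0 stands for Haile (illustrative labelling)
all_triples = list(permutations(athletes, 3))
triples_with_haile_first = [t for t in all_triples if t[0] == 0]

print(podiums, len(all_triples))                    # 2730 2730
print(haile_first, len(triples_with_haile_first))   # 182 182
```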
3. Combination: - a counting technique in which the order of the objects is immaterial: a selection
of r objects from a collection of n objects, where r <= n, without regard to order.
The number of combinations of n objects taken r at a time is given by
nCr = n! / ((n-r)! r!)
Example: In a club containing 7 members, a committee of 3 people is to be formed. In how many
ways can the committee be formed?
Solution: 7C3 = 7! / ((7-3)! 3!) = 35
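The committee count can be verified in Python with `math.comb` (Python 3.8 or later) and, as a cross-check, by listing the committees with `itertools.combinations`. The member labels A to G are hypothetical.

```python
import math
from itertools import combinations

# Formula: nCr = n! / ((n - r)! r!)
n_committees = math.comb(7, 3)

# Cross-check by enumeration; order inside a committee does not matter.
members = ["A", "B", "C", "D", "E", "F", "G"]  # hypothetical member labels
committees = list(combinations(members, 3))

print(n_committees, len(committees))  # 35 35
```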
5.3 Definition of probability
Probability: - the chance (likelihood) of occurrence of an event. It is expressed by a numerical
value between 0 and 1 inclusive. Probability is a building block of inferential statistics.
Deterministic model        Stochastic (probabilistic) model
-> certain                 -> uncertain
-> mathematical            -> non-mathematical (econometric model)
Generally, probability can be divided into two types:
i) Subjective probability: - the probability that an event occurs in a certain experiment, based on
an individual’s belief or attitude.
ii) Objective probability: - the probability of an event in a certain experiment, based on
experimental evidence.
5.4 Basic approaches to probability
Classical approach: - uses the sample space to determine the numerical probability that an event
will happen. If an experiment has n equally likely outcomes, and event E consists of k of these n
outcomes, then the probability of the event E, denoted by P(E), is defined as
P(E) = n(E) / n(S) = k/n
Deficiencies of the classical approach
The classical approach cannot be applied:
- if the total number of outcomes is infinite, or if it is not possible to enumerate all elements of
the sample space;
- if the outcomes are not equally likely.
Example: In the experiment of tossing a coin and a die together, find the probability of the event
E consisting of a head and an even number.
Solution: S = {H1, H2, H3, H4, H5, H6, T1, T2, T3, T4, T5, T6}, then
E = {H2, H4, H6}; thus P(E) = n(E)/n(S) = 3/12 = 1/4
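The classical approach can be sketched in Python by enumerating the sample space and counting the outcomes that belong to the event. The variable names are illustrative, and `Fraction` is used so that the probability is shown as an exact fraction.

```python
from fractions import Fraction
from itertools import product

# Sample space: every (coin face, die face) pair.
sample_space = list(product("HT", range(1, 7)))

# Event E: a head together with an even number.
event = [(coin, die) for coin, die in sample_space
         if coin == "H" and die % 2 == 0]

# Classical probability: n(E) / n(S).
p_event = Fraction(len(event), len(sample_space))

print(len(sample_space), len(event), p_event)  # 12 3 1/4
```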
Let S be the sample space of an experiment. P is called a probability function if it satisfies the
following conditions:
- 0 ≤ P(A) ≤ 1 for each event A, where P(A) is called the probability of A;
- P(S) = 1;
- if A and B are mutually exclusive events, then P(A ∪ B) = P(A) + P(B).
Similarly, for mutually exclusive events A1, A2, …, An,
\[ P\left( \bigcup_{i=1}^{n} A_i \right) = P(A_1) + P(A_2) + \cdots + P(A_n) = \sum_{i=1}^{n} P(A_i) \]
Rule 2: Let A and B be events of a sample space S; then
P(A’ ∩ B) = P(B) - P(A ∩ B)
Proof: B = S ∩ B = (A ∪ A’) ∩ B = (A ∩ B) ∪ (A’ ∩ B), and the events A ∩ B and A’ ∩ B are mutually exclusive.
Case 1: if A ∩ B ≠ ∅, then P(B) = P(A ∩ B) + P(A’ ∩ B)
=> P(A’ ∩ B) = P(B) - P(A ∩ B)
Case 2: if A ∩ B = ∅, then P(B) = P(A ∩ B) + P(A’ ∩ B); since P(A ∩ B) = P(∅) = 0,
=> P(B) = P(A’ ∩ B)
Rule 3: Suppose A and B are two events of a sample space; then
P(A ∪ B) = P(A) + P(B) - P(A ∩ B)
Example: A fair die is thrown twice. Calculate the probability that the sum of the spots on the faces
of the die that turn up is divisible by 2 or 3.
Solution:
S = {(1,1),(1,2),(1,3),(1,4),(1,5),(1,6),(2,1),(2,2),(2,3),(2,4),(2,5),(2,6),(3,1),
(3,2),(3,3),(3,4),(3,5),(3,6),(4,1),(4,2),(4,3),(4,4),(4,5),(4,6),(5,1),(5,2),(5,3),(5,4),(5,5),(5,6),
(6,1),(6,2),(6,3),(6,4),(6,5),(6,6)}
This sample space has 6*6 = 36 elements. Let E1 be the event that the sum of the spots on the die
is divisible by 2 and E2 be the event that the sum of the spots on the die is divisible by 3;
then
P(E1 or E2) = P(E1 ∪ E2)
= P(E1) + P(E2) - P(E1 ∩ E2)
= 18/36 + 12/36 - 6/36 = 24/36 = 2/3
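The addition rule can be checked by enumeration in Python. In this sketch E1 is the event that the sum is divisible by 2 and E2 the event that the sum is divisible by 3, matching the problem statement; the helper `prob` is a hypothetical function introduced only for illustration. A sum is divisible by both 2 and 3 exactly when it is divisible by 6.

```python
from fractions import Fraction
from itertools import product

# All 36 ordered outcomes of throwing a die twice.
sample_space = list(product(range(1, 7), repeat=2))


def prob(event):
    """Classical probability of the outcomes for which event(outcome) is true."""
    favourable = sum(1 for outcome in sample_space if event(outcome))
    return Fraction(favourable, len(sample_space))


p_e1 = prob(lambda o: sum(o) % 2 == 0)    # sum divisible by 2
p_e2 = prob(lambda o: sum(o) % 3 == 0)    # sum divisible by 3
p_both = prob(lambda o: sum(o) % 6 == 0)  # divisible by 2 and by 3

# Addition rule versus a direct count of the union.
p_union = p_e1 + p_e2 - p_both
p_direct = prob(lambda o: sum(o) % 2 == 0 or sum(o) % 3 == 0)

print(p_e1, p_e2, p_both)  # 1/2 1/3 1/6
print(p_union, p_direct)   # 2/3 2/3
```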
5.6 Conditional probability and independence
5.6.1 Conditional probability: the conditional probability of an event A given an event B is
defined as the probability that event A occurs given that event B has already occurred:
P(A/B) = P(A ∩ B)/P(B), where P(B) > 0
Remark: (i) P(A ∩ B) and P(B) are computed with respect to the original sample space.
(ii) P(S/B) = P(S ∩ B)/P(B) = P(B)/P(B) = 1
(iii) P(B/S) = P(B), because P(B/S) = P(B ∩ S)/P(S) = P(B)/1 = P(B)
(iv) If A and B are independent events, then P(A/B) = P(A) and P(B/A) = P(B). Two events are
independent if the occurrence of B does not affect the occurrence of A. Since P(A/B) = P(A ∩ B)/P(B),
P(A ∩ B) = P(A/B) * P(B), but P(A/B) = P(A).
Hence P(A ∩ B) = P(A) * P(B)
Example: Suppose that an office has 100 calculating machines. Some of them use electric power
(E) while others are manual (M), and some machines are new (N) while others are used
(U). The table below gives the number of machines in each category. A person enters the office,
picks a machine at random and discovers that it is new. What is the probability that it is
electric?
        E    M    Total
N       40   30   70
U       20   10   30
Total   60   40   100
Solution: P(E/N) = P(E ∩ N)/P(N) = 0.4/0.7 = 40/70 = 4/7
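The conditional probability can be computed from the table with a short Python sketch. The dictionary layout and variable names are illustrative choices; the counts come from the table, and `Fraction` keeps the result exact.

```python
from fractions import Fraction

# Number of machines by (condition, power source), taken from the table.
counts = {
    ("N", "E"): 40, ("N", "M"): 30,
    ("U", "E"): 20, ("U", "M"): 10,
}
total = sum(counts.values())

# P(N): the machine is new.
p_new = Fraction(counts[("N", "E")] + counts[("N", "M")], total)

# P(E and N): the machine is electric and new.
p_electric_and_new = Fraction(counts[("N", "E")], total)

# Conditional probability P(E/N) = P(E and N) / P(N).
p_electric_given_new = p_electric_and_new / p_new

print(total, p_new, p_electric_and_new)  # 100 7/10 2/5
print(p_electric_given_new)              # 4/7
```

The result 4/7 is about 0.571, and it lies between 0 and 1 as every probability must.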
5.6.2 Independence: two events E1 and E2 are said to be independent if the occurrence of E1
has no bearing on the occurrence of E2. That is, knowing that E1 has occurred gives no
information about the occurrence of E2. Formally, two events A and B are said to be independent
if P(A ∩ B) = P(A) P(B).
Suppose A and B are independent events with 0 < P(A) < 1 and 0 < P(B) < 1. Show that the
following statements are true.
i. A^c and B^c are independent
ii. A and B^c are independent
iii. A^c and B are independent
iv. P(B|A) = P(B)
v. P(B|A^c) = P(B)
Example: Consider the experiment of drawing a card from a well-shuffled deck of cards.
Let A: a spade is drawn
B: an honor (10, J, Q, K, A) is drawn
Are the two events independent?
Solution: P(A) = 13/52 = 1/4, P(B) = 20/52 = 5/13 and P(A ∩ B) = 5/52.
Using the independence theorem, P(A ∩ B) = P(A) P(B) = 13/52 * 20/52 = 5/52
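The independence check can be reproduced by building a standard 52-card deck in Python and comparing P(A ∩ B) with P(A) P(B). The deck representation and the helper `prob` are assumptions made for this sketch only.

```python
from fractions import Fraction
from itertools import product

suits = ["spade", "heart", "diamond", "club"]
ranks = ["2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K", "A"]
deck = list(product(ranks, suits))   # 52 (rank, suit) pairs

honors = {"10", "J", "Q", "K", "A"}


def prob(event):
    """Classical probability of the cards for which event(card) is true."""
    return Fraction(sum(1 for card in deck if event(card)), len(deck))


p_a = prob(lambda card: card[1] == "spade")                        # A: a spade
p_b = prob(lambda card: card[0] in honors)                         # B: an honor
p_ab = prob(lambda card: card[1] == "spade" and card[0] in honors)

# A and B are independent exactly when P(A and B) = P(A) * P(B).
independent = p_ab == p_a * p_b

print(p_a, p_b, p_ab)  # 1/4 5/13 5/52
print(independent)     # True
```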