Introduction to Statistics
Department of Statistics
College of Natural and Computational Science.
MSc. in Bio-Statistics
By:
Addisu T.(M.Sc)
E-mail:- [email protected]
Phone:- +251 18 17 36 76
Nationality:- Ethiopian
November 24, 2023
1 Chapter One
Introduction
2 Chapter Two
Organization and Presentation of data
3 Chapter Three
Measures of central tendency
4 Chapter Four
Measures of dispersion (Variation)
Chapter One
Definition and classifications of statistics
We cannot say that stage IV is twice as bad as stage II, or that the
difference between stages I and II is equivalent to that between
stages III and IV.
In contrast, 3 children are three times as many as 1 child, and a
difference of one means the same throughout the range of values.
The difference between any two possible data values can be very
small. Common examples include height, weight, temperature etc.
i. Primary Data
Data measured or collected first hand by the investigator or the
user, directly from the source.
Chapter Two
Organization and Presentation of data
An ungrouped frequency distribution is a table of all the potential
raw score values that could occur in the data, along with the number
of times each actually occurred.
It is often constructed for a small set of data on a discrete variable.
Constructing ungrouped frequency distribution:
First find the smallest and largest raw score in the collected data.
Arrange the data in order of magnitude and count the frequency.
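A minimal sketch of the two steps above in Python. The raw scores are hypothetical (for instance, the number of children in 12 families) and are not taken from these notes.

```python
from collections import Counter

# Hypothetical raw scores on a discrete variable (assumed for illustration).
scores = [2, 0, 3, 1, 2, 2, 4, 1, 0, 2, 3, 1]

# Step 1: find the smallest and largest raw score.
smallest, largest = min(scores), max(scores)

# Step 2: arrange the values in order of magnitude and count each one.
counts = Counter(scores)
table = [(value, counts.get(value, 0)) for value in range(smallest, largest + 1)]

print("Value  Frequency")
for value, freq in table:
    print(f"{value:>5}  {freq:>9}")
```

Every potential value between the smallest and largest score is listed, so a value that never occurred would appear with frequency 0.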
When the range of the data is large, the data must be grouped
into classes that are more than one unit in width.
Class width: the difference between the upper and lower class
boundaries of any class.
Class mark (midpoint): the average of the lower and upper class
limits, or equivalently the average of the lower and upper class
boundaries.
Cumulative frequency: the number of observations less than (or
more than) or equal to a specific value.
Relative frequency: the class frequency divided by the total
frequency.
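The definitions above can be sketched in a few lines of Python. The class limits and frequencies below are assumed for illustration only; adjacent classes share a limit, so the limits also serve as the class boundaries.

```python
# Illustrative classes and frequencies (assumed values).
lower = [0, 20, 40, 60, 80]
upper = [20, 40, 60, 80, 100]
freq = [4, 5, 2, 5, 4]

n = sum(freq)                                            # total frequency
class_width = [hi - lo for lo, hi in zip(lower, upper)]
class_marks = [(lo + hi) / 2 for lo, hi in zip(lower, upper)]
relative = [f / n for f in freq]                         # relative frequencies

# Less-than cumulative frequency: running total from the lowest class.
less_than_cf = []
running = 0
for f in freq:
    running += f
    less_than_cf.append(running)

# More-than cumulative frequency: observations in this class and all above it.
more_than_cf = [n - (cf - f) for cf, f in zip(less_than_cf, freq)]

print(class_marks, less_than_cf, more_than_cf)
```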
Steps for constructing Grouped frequency Distribution
Introduction to Statistics, set by Addisu T., 5/1/2023
viii. Find the boundaries by subtracting U/2 units from the lower
limits and adding U/2 units to the upper limits, where U is the unit
of measurement.
The boundaries are also half-way between the upper limit of one
class and the lower limit of the next class.
Find the frequencies.
Example: Frame a grouped frequency distribution for
the observation of 30 families.
30,20,14,52,24,33,56,45,24,37,21,43, 25,11,33,19,26,31,42,34,37,41,15,47
38,26,44,28,12,
Solution: U = 1. The largest data value is 56 and the smallest data
value is 11, so the range is R = 56 - 11 = 45.
$K = 1 + 3.32 \log(30) = 5.9$, which rounds up to 6.
$W = \frac{R}{K} = \frac{45}{6} = 7.5$, which rounds up to 8.
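The arithmetic of this solution can be checked with a short Python sketch. It assumes the logarithm is base 10 and that both K and W are rounded up, as the solution states; the variable names are illustrative.

```python
import math

n = 30                       # number of observations stated in the example
largest, smallest = 56, 11   # extreme values quoted in the solution

data_range = largest - smallest
k = math.ceil(1 + 3.32 * math.log10(n))   # number of classes, rounded up
w = math.ceil(data_range / k)             # class width, rounded up

print(data_range, k, w)
```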
In a component bar chart, the bars represent the total value of a
variable, with each total broken into its component parts; different
colors or designs are used for identification.
Example: Draw a component bar chart to represent the sales by
product from 1957 to 1959.
$\text{Degrees} = \frac{f}{n} \times 360^\circ$
Solution:
Graphical presentation of data
The histogram, the frequency polygon, and the cumulative frequency
graph (ogive) are the most commonly applied graphical
representations for continuous data.
Procedure for constructing statistical graphs:
Draw and label the X and Y axes.
Choose a suitable scale for the frequencies or cumulative
frequencies and label it on the Y axis.
Represent the class boundaries for the histogram or ogive, and the
midpoints for the frequency polygon, on the X axis.
Plot the points.
Draw the bars or lines to connect the points.
That is, class boundaries are plotted along the horizontal axis and
the corresponding cumulative frequencies are plotted along the
vertical axis. The points are joined by a freehand curve.
Less than Ogive
Class boundary      Cumulative frequency
More than 99.5      50
More than 104.5     48
More than 109.5     40
More than 114.5     22
More than 119.5     9
More than 124.5     2
More than 129.5     1
More than 134.5     0
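As a hedged illustration, the less-than cumulative frequencies and the individual class frequencies can be recovered from the more-than table above. The sketch assumes that every observation exceeds 99.5 (so n = 50) and that no observation falls exactly on a class boundary.

```python
boundaries = [99.5, 104.5, 109.5, 114.5, 119.5, 124.5, 129.5, 134.5]
more_than_cf = [50, 48, 40, 22, 9, 2, 1, 0]

n = more_than_cf[0]   # all observations lie above the first boundary

# Observations below a boundary = total minus those above it.
less_than_cf = [n - cf for cf in more_than_cf]

# The frequency of each class is the drop between consecutive boundaries.
class_freq = [a - b for a, b in zip(more_than_cf, more_than_cf[1:])]

for b, cf in zip(boundaries, less_than_cf):
    print(f"Less than {b}: {cf}")
```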
Chapter Three
Measures of central tendency
a. Arithmetic mean
$\bar{X} = \frac{X_1 + X_2 + \dots + X_n}{n} = \frac{\sum X}{n}$
In the case of grouped distribution, mean is calculated assuming
that each observation in a class interval is equal to the midpoint of
that class interval.
Class work
$\bar{X} = \frac{\sum fX}{n} = ?$, where $n = \sum f = 20$
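A small sketch of the grouped-mean calculation, in which each observation is replaced by the midpoint of its class. The class marks and frequencies are assumed for illustration and are not necessarily the class-work data.

```python
# Assumed class marks (midpoints) and frequencies, chosen so that n = 20.
class_marks = [10, 30, 50, 70, 90]
freq = [4, 5, 2, 5, 4]

n = sum(freq)
grouped_mean = sum(f * x for f, x in zip(freq, class_marks)) / n

print(n, grouped_mean)
```

By hand, the sum of f times X is 40 + 150 + 100 + 350 + 360 = 1000, so the mean is 1000 / 20 = 50.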
Properties of mean
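$\bar{X}_{correct} = \frac{n\bar{X}_{wr} + X_c - X_{wr}}{n}$
where $\bar{X}_{wr}$ is the mean computed with the wrong value $X_{wr}$,
and $X_c$ is the correct value.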
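A hypothetical illustration of the corrected-mean property: suppose the mean of 10 observations was reported as 50, but one value was recorded as 45 instead of 54. These numbers are invented for the sketch.

```python
def corrected_mean(n, wrong_mean, wrong_value, correct_value):
    """Adjust a mean after one wrongly recorded value is replaced."""
    return (n * wrong_mean + correct_value - wrong_value) / n

# Hypothetical figures: 45 was recorded where 54 was intended.
result = corrected_mean(n=10, wrong_mean=50, wrong_value=45, correct_value=54)
print(result)
```

By hand: (10 x 50 + 54 - 45) / 10 = 509 / 10 = 50.9.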
Combined mean
If $\bar{X}_1$ is the mean of $n_1$ observations,
$\bar{X}_2$ is the mean of $n_2$ observations,
...
$\bar{X}_k$ is the mean of $n_k$ observations,
then the mean of all the observations in all groups, often called the
combined mean, is given by
$\bar{X}_c = \frac{\bar{X}_1 n_1 + \bar{X}_2 n_2 + \dots + \bar{X}_k n_k}{n_1 + n_2 + \dots + n_k} = \frac{\sum_{i=1}^{k} \bar{X}_i n_i}{\sum_{i=1}^{k} n_i}$
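The combined mean is a weighted average of the group means, with the group sizes as weights. A minimal sketch, using two hypothetical groups (means 60 and 70, sizes 20 and 30):

```python
def combined_mean(means, sizes):
    """Weighted average of group means, weighted by group size."""
    total = sum(m * n for m, n in zip(means, sizes))
    return total / sum(sizes)

# Hypothetical groups, not taken from these notes.
result = combined_mean(means=[60, 70], sizes=[20, 30])
print(result)
```

By hand: (60 x 20 + 70 x 30) / (20 + 30) = 3300 / 50 = 66.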
Class work
Marks     Frequency (f)   LCF
0-20      4
20-40     5
40-60     2
60-80     5
80-100    4
Total     20
i. Uniqueness: There is one and only one median for a given set of
observations.
ii. Simplicity: It is easy to calculate and easily understood.
iii. It is not drastically affected by extreme values, as the mean is.
2, 8, 6, 4, 5 → 2, 4, 5, 6, 8; Mn = 5
2, 80, 6, 4, 5 → 2, 4, 5, 6, 80; Mn = 5
2, 0.8, 6, 4, 5 → 0.8, 2, 4, 5, 6; Mn = 4
0.2, 8, 6, 4, 5 → 0.2, 4, 5, 6, 8; Mn = 5
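The four data sets above can be checked with Python's standard `statistics` module. The sketch also prints the mean of each set, to show how much more the mean moves when one value becomes extreme.

```python
from statistics import mean, median

# The four data sets from the examples above, in their original order.
samples = [
    [2, 8, 6, 4, 5],
    [2, 80, 6, 4, 5],
    [2, 0.8, 6, 4, 5],
    [0.2, 8, 6, 4, 5],
]

medians = [median(s) for s in samples]
means = [mean(s) for s in samples]

for s, md, mn in zip(samples, medians, means):
    print(f"{s}: median = {md}, mean = {mn:.2f}")
```

Replacing 8 with 80 leaves the median at 5 while the mean rises from 5 to 19.4.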
Demerits of median
It is not capable of further algebraic treatment.
It is not a good representative of the data if the number of items
(data) is small.
The arrangement of items in order of magnitude is sometimes a very
tedious process if the number of items is very large.
Mode for Ungrouped Data
$\text{mode} = \hat{X} = LCB_m + \left(\frac{\Delta_1}{\Delta_1 + \Delta_2}\right) w$
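The formula above uses a class boundary and a class width, so it applies to grouped data. The symbols are not defined on this slide; the sketch below assumes the usual convention that LCBm is the lower class boundary of the modal class, Δ1 is the modal frequency minus the frequency of the preceding class, and Δ2 is the modal frequency minus the frequency of the following class. The frequencies are the class frequencies implied by the more-than cumulative table in Chapter Two (differences of consecutive cumulative frequencies).

```python
def grouped_mode(lower_boundaries, freqs, width):
    """Mode of a grouped distribution (assumes a single modal class)."""
    m = max(range(len(freqs)), key=freqs.__getitem__)   # index of modal class
    f_prev = freqs[m - 1] if m > 0 else 0
    f_next = freqs[m + 1] if m < len(freqs) - 1 else 0
    d1 = freqs[m] - f_prev    # assumed meaning of delta 1
    d2 = freqs[m] - f_next    # assumed meaning of delta 2
    return lower_boundaries[m] + d1 / (d1 + d2) * width

lower_boundaries = [99.5, 104.5, 109.5, 114.5, 119.5, 124.5, 129.5]
freqs = [2, 8, 18, 13, 7, 1, 1]

mode = grouped_mode(lower_boundaries, freqs, width=5)
print(round(mode, 2))
```

Here the modal class is 109.5 to 114.5, with Δ1 = 18 - 8 = 10 and Δ2 = 18 - 13 = 5, so the mode is 109.5 + (10/15) x 5, about 112.83.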
Merits of mode
Demerits of mode
Mode may not exist in the series and if it exists it may not be a
unique value.
It does not fulfill most of the requirements of a good measure of
central tendency
It may be unrepresentative in many cases.
Review
Quartiles are values which divide the data set into four equal parts,
denoted by Q1, Q2 and Q3.
The first quartile Q1 is also called the lower quartile and the third
quartile Q3 is the upper quartile.
The second quartile Q2 is the median.
$Q_j = L_{Q_j} + \left(\frac{j \cdot \frac{n}{4} - F_{Q_j}}{f_{Q_j}}\right) w, \quad j = 1, 2, 3$
where
$Q_j$ = the $j$th quartile, which is to be worked out
$L_{Q_j}$ = lower class boundary of the $j$th quartile class
$F_{Q_j}$ = sum of the frequencies of all classes lower than the $j$th quartile class
$f_{Q_j}$ = frequency of the $j$th quartile class
$w$ = class width
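A sketch of the grouped quartile formula. It assumes the quartile class is the first class whose cumulative frequency is greater than or equal to j·n/4, and it uses the marks table from the median class work (classes 0-20 to 80-100 with frequencies 4, 5, 2, 5, 4) as illustrative input. The function name and the `parts` parameter are illustrative choices.

```python
def grouped_quantile(lower_boundaries, freqs, width, j, parts=4):
    """j-th quantile of a grouped distribution split into `parts` parts."""
    n = sum(freqs)
    target = j * n / parts
    cumulative = 0                      # frequencies of all lower classes
    for lcb, f in zip(lower_boundaries, freqs):
        if cumulative + f >= target:    # first class reaching the target
            return lcb + (target - cumulative) / f * width
        cumulative += f
    raise ValueError("j must satisfy 0 < j < parts")

lower_boundaries = [0, 20, 40, 60, 80]
freqs = [4, 5, 2, 5, 4]

q1, q2, q3 = (grouped_quantile(lower_boundaries, freqs, 20, j) for j in (1, 2, 3))
print(q1, q2, q3)
```

By hand: Q1 = 20 + ((5 - 4)/5) x 20 = 24, Q2 = 40 + ((10 - 9)/2) x 20 = 50, and Q3 = 60 + ((15 - 11)/5) x 20 = 76.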
$D_j = L_{D_j} + \left(\frac{j \cdot \frac{n}{10} - F_{D_j}}{f_{D_j}}\right) w, \quad j = 1, 2, \dots, 9$
Define the symbols in a similar way as we did in the case of quartiles.
The $j$th decile class is the class with the smallest cumulative
frequency greater than or equal to $j \cdot \frac{n}{10}$.
It can be located by counting the frequencies beginning from the
lowest class.
III. Percentiles:
are values which divide the data into one hundred equal parts,
denoted by P1, P2, ..., P99. The fiftieth percentile P50 is the median.
Percentiles for Ungrouped data:
$P_j = L_{P_j} + \left(\frac{j \cdot \frac{n}{100} - F_{P_j}}{f_{P_j}}\right) w, \quad j = 1, 2, \dots, 99$
Define the symbols in a similar way as we did in the case of quartiles.
The $j$th percentile class is the class with the smallest cumulative
frequency greater than or equal to $j \cdot \frac{n}{100}$.
It can be located by counting the frequencies beginning from the
lowest class.
CH-4
Chapter Four
Measures of dispersion (Variation)
Absolute and Relative Measures of Dispersion
i. Absolute measures of dispersion are expressed in the same unit
of measurement in which the original data are given.
These values may be used to compare the variation in two
distributions provided that the variables are in the same units and
of the same average size.
In case the two sets of data are expressed in different units,
however, such as quintals of sugar versus tons of sugarcane or if
the average sizes are very different such as manager's salary versus
worker's salary, the absolute measures of dispersion are not
comparable.
In such cases measures of relative dispersion should be used.
ii. A measure of relative dispersion is the ratio of a measure of
absolute dispersion to an appropriate measure of central tendency.
It is also called the coefficient of dispersion, because the word
coefficient represents a pure number (that is, one independent of any
unit of measurement).
Note: the value of a relative dispersion is a unitless quantity.
Types of Measures of Dispersion
$RR = \frac{X_{max} - X_{min}}{X_{max} + X_{min}} = \frac{R}{X_{max} + X_{min}}$ ...for ungrouped data
$RR = \frac{CM_{last} - CM_{first}}{CM_{last} + CM_{first}} = \frac{R}{CM_{last} + CM_{first}}$ ...for grouped data
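As an illustration of the ungrouped formula, the sketch below computes the relative range of the family data from the Chapter Two example. Only 29 of the 30 stated observations appear in the listing, so the list here holds those 29 values; the largest (56) and smallest (11) agree with the solution given there.

```python
def relative_range(values):
    """Relative range: (max - min) / (max + min) for ungrouped data."""
    x_max, x_min = max(values), min(values)
    return (x_max - x_min) / (x_max + x_min)

# The 29 values that appear in the example listing.
data = [30, 20, 14, 52, 24, 33, 56, 45, 24, 37, 21, 43, 25, 11, 33,
        19, 26, 31, 42, 34, 37, 41, 15, 47, 38, 26, 44, 28, 12]

rr = relative_range(data)
print(round(rr, 4))
```

By hand: RR = (56 - 11) / (56 + 11) = 45 / 67, about 0.67, a pure number with no unit.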
1. The Variance
is the arithmetic mean of the square of the deviation of
observations from their arithmetic mean.
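Population Variance for ungrouped data
$\sigma^2 = \frac{\sum (X_i - \mu)^2}{N} = \frac{1}{N}\left(\sum X_i^2 - \frac{(\sum X_i)^2}{N}\right)$
Population Variance for grouped data
$\sigma^2 = \frac{\sum f_i (X_i - \mu)^2}{N} = \frac{1}{N}\left(\sum f_i X_i^2 - \frac{(\sum f_i X_i)^2}{N}\right)$
Sample Variance for grouped data
$S^2 = \frac{\sum f_i (X_i - \bar{X})^2}{n - 1} = \frac{1}{n - 1}\left(\sum f_i X_i^2 - \frac{(\sum f_i X_i)^2}{n}\right)$
where $\bar{X}$ is the sample arithmetic mean, $X_i$ is the class mark of
the $i$th class, $f_i$ is the frequency of the $i$th class and $\sum f_i = n$.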
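The sketch below checks numerically that the definitional and shortcut forms of the variance agree. It treats ungrouped data as grouped data in which every frequency is 1, and uses the numbers 2, 3, 7 (the set that appears in the moments example) as input. The function name is illustrative.

```python
def variances(x, f=None):
    """Return (population variance, shortcut form, sample variance)."""
    f = f or [1] * len(x)               # ungrouped data: every frequency is 1
    n = sum(f)
    total = sum(fi * xi for fi, xi in zip(f, x))
    mean = total / n
    ss = sum(fi * (xi - mean) ** 2 for fi, xi in zip(f, x))
    ss_short = sum(fi * xi ** 2 for fi, xi in zip(f, x)) - total ** 2 / n
    return ss / n, ss_short / n, ss / (n - 1)

pop_var, pop_var_short, sample_var = variances([2, 3, 7])
print(pop_var, pop_var_short, sample_var)
```

By hand: the mean is 4, the squared deviations sum to 4 + 1 + 9 = 14, so the population variance is 14/3 and the sample variance is 14/2 = 7.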
2. The Standard Deviation
Standard deviation is the positive square root of the variance.
= 51
Σf 60
The Standard Scores (Z-Scores)
A standard score is a measure that describes the relative position
of a single score in the entire distribution of scores in terms of the
mean and standard deviation.
It also gives us the number of standard deviations a particular
observation lies above or below the mean.
Population standard score:
$Z = \frac{X - \mu}{\sigma}$
Sample standard score:
$Z = \frac{X - \bar{X}}{S}$
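A minimal sketch of the sample standard score, using Python's `statistics` module. The scores 2, 3, 7 are used purely as illustrative data; `stdev` computes the sample standard deviation with divisor n - 1.

```python
from statistics import mean, stdev

scores = [2, 3, 7]      # illustrative sample

x_bar = mean(scores)    # sample mean
s = stdev(scores)       # sample standard deviation (divisor n - 1)

# Number of standard deviations each score lies above or below the mean.
z_scores = [(x - x_bar) / s for x in scores]

for x, z in zip(scores, z_scores):
    print(f"X = {x}: Z = {z:+.3f}")
```

A positive Z means the observation lies above the mean and a negative Z means it lies below; the Z-scores of a sample always sum to zero.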
1. Moments
1. The $r$th moment is defined as:
$\overline{X^r} = \frac{\sum X_i^r}{n}$, for ungrouped data, $r = 1, 2, \dots$
$\overline{X^r} = \frac{\sum f_i X_i^r}{n}$, for grouped data, $r = 1, 2, \dots$
If $r = 1$, it is the simple arithmetic mean; this is called the first
moment.
2. The $r$th moment about the mean (the $r$th central moment)
Denoted by $M_r$ and defined as:
$M_r = \frac{\sum (X_i - \bar{X})^r}{n}$, for ungrouped data, $r = 1, 2, \dots$
$M_r = \frac{\sum f_i (X_i - \bar{X})^r}{n}$, for grouped data, $r = 1, 2, \dots$
If $r = 2$, it is the population variance; this is called the second
central moment. If we assume $n - 1 \approx n$, it is also the sample variance.
$M_r = \frac{\sum f_i (X_i - A)^r}{n}$, for grouped data, $r = 1, 2, \dots$
Remarks:
1. $M_0 = 1$
2. $M_1 = 0$
3. $M_2 \approx \sigma^2$
Example :
1. Find the first two moments for the following set of numbers 2, 3, 7
2. Find the first three central moments of the numbers in problem 1
3. Find the first three moments about the number 3 of the numbers in
problem 1
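One way to check answers to these three problems is a single helper that computes the r-th moment about any number. The helper name and its `about` parameter are illustrative: `about=0` gives the ordinary moments and `about=mean` gives the central moments.

```python
def moment(values, r, about=0.0):
    """r-th moment of ungrouped data about the number `about`."""
    return sum((x - about) ** r for x in values) / len(values)

data = [2, 3, 7]
mean = moment(data, 1)

raw_moments = [moment(data, r) for r in (1, 2)]                   # problem 1
central_moments = [moment(data, r, about=mean) for r in (1, 2, 3)]  # problem 2
moments_about_3 = [moment(data, r, about=3) for r in (1, 2, 3)]     # problem 3

print(raw_moments, central_moments, moments_about_3)
```

By hand: the first two moments are 4 and 62/3; the central moments are 0, 14/3 and 6; and the moments about 3 are 1, 17/3 and 21.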
Remarks:
In a positively skewed distribution, smaller observations are more
frequent than larger observations, i.e. the majority of the
observations have a value below the average, and the distribution has
a long tail in the positive direction.
∴ $\delta_3 = \frac{3(\text{mean} - \text{median})}{S} = \frac{3(10 - 8.5)}{2} = \frac{4.5}{2} = 2.25$, so the distribution is positively skewed.
Kurtosis