Lecture Note Introduction To Stat Seni
1.1 Introduction
The origin of modern statistics can be traced back to the 17th and 18th centuries, when
mathematicians were mainly interested in the development of the theory of probability as
applied to the theory of chance. A commoner named John Graunt, a native of London,
began reviewing a weekly church publication issued by the local parish clerk that
listed the number of births, christenings, and deaths in each parish. These so-called Bills
of Mortality also listed the causes of death. Graunt, who was a shopkeeper, organized these
data in the forms we now call descriptive statistics and published them as Natural and
Political Observations Made upon the Bills of Mortality.
Different authors define statistics in different ways. For instance, “Statistics is the branch of
the scientific method which deals with the data obtained by counting or measuring the
properties of populations of natural phenomena” (Kendall and Stuart [1963]), and “Statistics
is concerned with the inferential process, in particular with the planning and analysis of
experiments or surveys, and with the efficient summarizing of sets of data” (Kruskal
[1968]).
New and ever growing diverse fields of human activities are using statistics; however, it
seems that this field itself remains obscure to the public. Professor Bradley Efron
expressed this fact nicely:
During the 20th Century statistical thinking and methodology has become
the scientific framework for literally dozens of fields including education,
agriculture, economics, biology and medicine, and with increasing
influence recently on the hard sciences such as astronomy, geology, and
physics. In other words, we have grown from a small obscure field into a
big obscure field.
The Statistics component of the course aims to provide the basic statistical tools for
management and business decisions, comprising descriptive statistics, probability
distributions, estimation, hypothesis testing, correlation and regression.
1.2 Definitions of Statistics
The word “statistics” is derived from the Latin word ‘status’, meaning state. In political
leadership, the interest was in the numerical description of political units such as
provinces, states, cities and towns, where the main concern was the collection of
information on revenue, population, manpower for military service, area of land
under cultivation, and births and deaths.
1.5.1 Descriptive Statistics
This part of statistics deals only with describing the characteristics of the data collected,
without going beyond the sample; that is, without attempting to infer (conclude) anything
beyond the data themselves. It comprises the first four stages of statistical investigation, namely
collection, organization, presentation, and analysis of data.
1.5.2 Inferential Statistics
This type of statistics is concerned with drawing statistically valid conclusions about the
characteristics of the population based on information obtained from a sample. It is the
part of statistics that is generalizing from the sample to population using probabilities,
performing hypothesis testing, determining relationships between variables, and making
predictions.
Stages in Statistical Investigation
According to the definition of statistics, we have the following five stages of a statistical
investigation.
1. Collection of data: This is the first stage of a statistical investigation. The data should be
collected with a specific and well-defined purpose so that the conclusions drawn are
not misleading. There are two methods of data collection, primary and secondary:
the primary method refers to obtaining original, first-hand data,
and the secondary method involves obtaining data from other sources.
Population (parameters): μ, σ, π, ρ
Sample (statistics): X̄, S, p, r
In any statistical investigation the first step is to collect a set of related observations from
which conclusions may be drawn. Data are a set of related information (facts) from
different values.
Based on the information desired variable can be classified as qualitative and
quantitative.
Qualitative variable: variables in which the characteristic or variable being studied is
non-numeric. A qualitative variable is a variable that can be described only in words.
Example: gender, color, religion, ethnic group etc.
Quantitative variable: variables that can be expressed numerically or are variables that
are numeric in nature. Quantitative variables can be further classified as discrete or
continuous.
Discrete variables: A variable that assumes a finite or countable number of possible
values is called a discrete variable. You cannot have 2.63 people in the room. Discrete
variables are usually obtained by counting, e.g., the number of children in a family or
the number of cars at a traffic light.
Continuous variables: A variable that can theoretically assume an infinite number of
possible values is called a continuous variable. Continuous variables are obtained by
measuring and can assume any value between two given values. Length,
weight, and time are all examples of continuous variables. Since continuous variables are
real numbers, we usually round them. This implies a boundary depending on the number
of decimal places. For example, 64 is really any value in 63.5 ≤ x < 64.5. Likewise, if there
are two decimal places, then 64.03 is really any value in 64.025 ≤ x < 64.035. Boundaries
always have one more decimal place than the data.
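The rounding boundaries described above can be sketched in Python. This is only a minimal illustration: it assumes the recorded value is passed as a string so that its number of decimal places is known, and the helper name `rounding_boundaries` is hypothetical.

```python
from decimal import Decimal

def rounding_boundaries(recorded: str):
    """Return the (lower, upper) boundaries implied by a rounded reading."""
    value = Decimal(recorded)
    # Number of decimal places in the recorded value (0 for whole numbers).
    places = max(-value.as_tuple().exponent, 0)
    # Half of the last recorded unit, e.g. 0.5 for "64" and 0.005 for "64.03".
    half_unit = Decimal(5).scaleb(-(places + 1))
    return value - half_unit, value + half_unit

print(rounding_boundaries("64"))
print(rounding_boundaries("64.03"))
```

Using `Decimal` rather than floats keeps the boundaries exact, so they really do carry one more decimal place than the data.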
Exercises
Classify each of the following as qualitative and quantitative and if it is quantitative
classify as discrete or continuous.
a. Colors of automobiles in a dealer’s show room.
A. Classifications by Sources
According to source, data are classified as primary and secondary.
Primary data refer to data collected either by or under the direct supervision and
instruction of the researcher while Secondary data refer to data which are not
originated by the researcher himself/herself, but obtained from other sources such as
newspapers, journals, official records, etc.
Broadly, there are two methods of data collection, which are primary and secondary.
The primary method consists of obtaining data or information by any of the following
ways.
Direct personal interview
Indirect personal interview
Information from correspondents
Mailed questionnaires
Questionnaires to be filled by enumerators
The secondary method is a method by which we obtain data from the records of
institutions that collect and publish statistics as part of their routine duties.
2.2 Methods of data presentation
Data presentation is a statistical procedure of arranging and putting data in the form of
tables, graphs, charts and diagrams. After data have been collected and organized, the
next step is to present them in some suitable form. The need for proper presentation arises
because statistical data in their raw form are difficult to comprehend.
In this chapter, we will be introduced to the following concepts.
One of the simplest and most revealing devices for summarizing data and presenting
them in meaningful fashion is the statistical table. A table is a symmetrical arrangement
Simple or One- way Table: in this type of table only one characteristic is shown.
Complex Tables: such tables represent two or more characteristics with the same table.
General Purpose Tables: also known as the reference tables or repository tables provide
information for general reference. On the other hand, special purpose tables, also known
as summary or analytical tables, provide information for particular discussion.
An ideal table should consist of: table number, title of the table, caption or column
heading, body of the table, stubs or row designation, footnotes and source of data.
Table number: for easy reference and identification a table should be numbered. This
number, if possible, should be written in the center at the top of the table.
Title: title of the table is a description of the contents in the table. A complete title has to
answer the questions what precisely are the data in the table? Where and when the data
occurred?
Caption: in a table stands for brief and self explanatory headings of vertical columns and
it explains what the column represents.
Stubs: stands for brief and self-explanatory headings of horizontal rows. Stubs do
perform the same function for the horizontal in the table as the column headings do for
vertical columns.
Body: the body of a table contains the numerical information of frequency of
observations in the different cells.
Footnotes: are given at the foot of a table to explain any fact or information included in
the table, which needs some explanation.
Source of data: one should also mention the source of information, from which the data
are taken. This may include author, volume, page and year of publication.
Classification of table has the following chief advantages
Units of measurement (U or d)
the distance between two possible consecutive measures or the gap between two
successive classes. It is usually taken as 1, 0.1, 0.01, 0.001, -----.
Class Boundaries
Separate one class in a grouped frequency distribution from another. The boundaries
have one more decimal place than the raw data and therefore do not appear in the data.
There is no gap between the upper boundary of one class and the lower boundary of the
next class. The lower class boundary is found by subtracting U/2 from the corresponding
lower class limit and the upper class boundary is found by adding U/2 to the
corresponding upper class limit.
2. The classes must be mutually exclusive. This means that no data value can fall into
two different classes.
3. The classes must be all inclusive or exhaustive. This means that all data values
must be included.
2. Select the number of classes desired, usually between 5 and 20 or use Sturges
3. Find the class width by dividing the range by the number of classes: $w = \frac{R}{k}$. It is
also the difference between the upper and lower class boundaries of the class, that
is, w = UCB – LCB.
4. Pick a suitable starting point less than or equal to the minimum value. The
starting point is called the lower limit of the first class. Continue to add the class
width to this lower limit to get the rest of the lower limits.
5. To find the upper limit of the first class, subtract U from the lower limit of the
second class. Then continue to add the class width to this upper limit to find the
rest of the upper limits.
6. Find the boundaries by subtracting U/2 units from the lower limits and adding U/2
units to the upper limits.
10. If necessary, find the relative frequencies and/or relative cumulative frequencies
1. The following data represent the marks of 20 students. Construct an ungrouped
frequency distribution.
80 76 90 85 80
70 60 62 70 85
65 60 63 74 75
76 70 70 80 85
Solution
Step 1: Arrange the data in the order of magnitude and make a table as shown below.
Step 2: Tally the data.
Step 3: Compute the frequency.
Mark       60  62  63  65  70    74  75  76  80   85   90
Tally      //  /   /   /   ////  /   /   //  ///  ///  /
Frequency  2   1   1   1   4     1   1   2   3    3    1
N. B.: Each individual value is presented separately, that is why it is named ungrouped
frequency distribution.
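The three steps of this solution (order, tally, count) can be reproduced with a short Python sketch. It uses the 20 marks from the example; the variable names are illustrative only.

```python
from collections import Counter

marks = [80, 76, 90, 85, 80,
         70, 60, 62, 70, 85,
         65, 60, 63, 74, 75,
         76, 70, 70, 80, 85]

# Count how often each distinct mark occurs, then list the marks in order of magnitude.
frequency = dict(sorted(Counter(marks).items()))

for mark, count in frequency.items():
    # One "/" per observation plays the role of the tally column.
    print(f"{mark:>4}  {'/' * count:<5} {count}")
```

Because every distinct value gets its own row, the result is an ungrouped frequency distribution.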
11 29 6 33 14 31 22 27 19 20
18 17 22 38 23 21 26 34 39 27
Solutions
Step 1: Find the highest and the lowest values, H = 39 and L = 6, and find the range: R = H - L = 39 - 6 = 33.
Step 3: Find the class width: w = R/k = 33/6 = 5.5 ≈ 6 (rounding up).
Step 4: Select the starting point let it be the minimum observation and add the class width
(=6). Therefore, you have: 6, 12, 18, 24, 30, 36 are the lower class limits.
Step 5: Find the upper class limit; i.e. the first upper class=12-U=12-1=11 and add the
class width (=6). Therefore, you have: 11, 17, 23, 29, 35, 41 are the upper class
limits.
So combining step 5 and step 6, one can construct the following classes.
Then continue adding w on both boundaries to obtain the rest boundaries. By doing so,
one can obtain the following classes.
Class boundary: 5.5 – 11.5 11.5 – 17.5 17.5 – 23.5 23.5 – 29.5 29.5 – 35.5 35.5 – 41.5
Step 7& 8: tally the data and write the numeric values for the tallies in the frequency
column.
Step 9 & 10: Find cumulative, relative or/and relative cumulative frequencies.
Class limit | Class boundary | Class mark | Tally | Freq. | Cf (less than type) | Cf (more than type) | rf | rcf (less than type)
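The construction steps of this example can be sketched in Python as follows. The sketch assumes k = 6 classes and a unit of measurement U = 1, as implied by Steps 3 and 5 of the solution; the variable names are illustrative and not part of the notes.

```python
from itertools import accumulate

data = [11, 29, 6, 33, 14, 31, 22, 27, 19, 20,
        18, 17, 22, 38, 23, 21, 26, 34, 39, 27]
k, U = 6, 1                       # number of classes and unit of measurement (assumed from the example)

R = max(data) - min(data)         # range
w = -(-R // k)                    # class width, rounded up (ceiling division)
lower = [min(data) + i * w for i in range(k)]   # lower class limits
upper = [lo + w - U for lo in lower]            # upper class limits

rows = []
for lo, hi in zip(lower, upper):
    rows.append({
        "limits": (lo, hi),
        "boundaries": (lo - U / 2, hi + U / 2),
        "mark": (lo + hi) / 2,
        "freq": sum(lo <= x <= hi for x in data),
    })

n = len(data)
freqs = [r["freq"] for r in rows]
less_than_cf = list(accumulate(freqs))
more_than_cf = [n - c + f for c, f in zip(less_than_cf, freqs)]

for r, lcf, mcf in zip(rows, less_than_cf, more_than_cf):
    print(r["limits"], r["boundaries"], r["mark"], r["freq"], lcf, mcf, r["freq"] / n, lcf / n)
```

Each printed row corresponds to one line of the frequency table: class limits, class boundaries, class mark, frequency, the two cumulative frequencies, and the relative and relative cumulative frequencies.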
[Figure: histogram with Frequency on the vertical axis and Class Boundaries (4.5-14.5, 14.5-24.5, 24.5-34.5, 34.5-44.5, 44.5-54.5, 54.5-64.5, 64.5-74.5) on the horizontal axis]
2.5.3 Cumulative Frequency Curve (Ogive)
An ogive is a frequency polygon (line plot) of the cumulative or relative cumulative
frequencies. The horizontal axis is marked with the class boundaries and the vertical axis
with the cumulative or relative cumulative frequencies. A given cumulative frequency
distribution can have both a ‘more than’ and a ‘less than’ ogive, and it is useful for finding
the values of quantiles (quartiles, deciles and percentiles).
Example
Construct both the less than and the more than Ogive by considering the data set given
below.
CB  19.5-29.5  29.5-39.5  39.5-49.5  49.5-59.5  59.5-69.5  69.5-79.5  79.5-89.5
f   4          6          8          12         9          7          4
Solution
Classes less than  29.5  39.5  49.5  59.5  69.5  79.5  89.5
Cum. Freq.         4     10    18    30    39    46    50
Classes more than  19.5  29.5  39.5  49.5  59.5  69.5  79.5
Cum. Freq.         50    46    40    32    20    11    4
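The two cumulative frequency rows of this solution can be computed with a short Python sketch. It uses the class boundaries and frequencies of the example; the variable names are illustrative.

```python
from itertools import accumulate

boundaries = [19.5, 29.5, 39.5, 49.5, 59.5, 69.5, 79.5, 89.5]
freqs = [4, 6, 8, 12, 9, 7, 4]

# "Less than" cumulative frequencies are paired with the upper class boundaries.
less_than = list(accumulate(freqs))
# "More than" cumulative frequencies are paired with the lower class boundaries.
more_than = list(accumulate(freqs[::-1]))[::-1]

less_than_points = list(zip(boundaries[1:], less_than))
more_than_points = list(zip(boundaries[:-1], more_than))
print(less_than_points)
print(more_than_points)
```

Plotting `less_than_points` and `more_than_points` as line plots would give the two ogives.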
2.6.2 Pie-Chart
A pie chart is a circle divided into sectors whose areas are proportional to the corresponding
components. It is used for representing the breakdown of an aggregate into its components.
The proportion of each category can be expressed either as a percentage or as an angle.
That is, the central angle of a category = (amount of the category / total amount) × 360°,
and the proportion of a category = (frequency of the category / total frequency) × 100%.
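As a small illustration of the two formulas, the Python sketch below converts category amounts into central angles and percentages. The category names and amounts are hypothetical and chosen only for illustration.

```python
amounts = {"A": 30, "B": 40, "C": 30}   # hypothetical category amounts
total = sum(amounts.values())

# Central angle (degrees) and percentage share of each category.
angles = {name: amount * 360 / total for name, amount in amounts.items()}
percents = {name: amount * 100 / total for name, amount in amounts.items()}

for name in amounts:
    print(f"{name}: {angles[name]:.1f} degrees, {percents[name]:.1f}%")
```

The angles always add up to 360° and the percentages to 100%, which is a quick check on the arithmetic.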
Example
[Figure: pie chart with three sectors of 30%, 40% and 30%]
Review Exercises
1. a. Draw a histogram for the following information.
Height (feet): (Number of pupils) Relative frequency:
0-2 0 0
2-4 1 1
4-5 4 8
5-6 8 16
6-8 2 2
b. Find the cumulative frequency
Height (cm) Frequency:
3.1 Introduction
Tabulation and diagrammatic and graphic presentation techniques do not give the complete
picture of a phenomenon. The simplest case for highlighting some of the key features
and terminology of statistics is that of making statements about a single population based
on a sample. There is a need to calculate single representative values that convey a clear
picture to the user of the information; these are known as averages. Averages
are the most widely used tools for condensing a mass of data and representing it by a
single number. To have clear information about how close together or far apart the observations are,
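The sum of the values $x_i$, where $i$ goes from 1 up to $n$, is symbolically given by: $\sum_{i=1}^{n} x_i = x_1 + x_2 + \dots + x_n$
Properties of Summation Notation
1) $\sum_{i=1}^{n} i = 1 + 2 + \dots + n = \frac{n(n+1)}{2}$
2) $\sum_{i=1}^{n} (x_i \pm y_i) = \sum_{i=1}^{n} x_i \pm \sum_{i=1}^{n} y_i$
3) $\sum_{i=1}^{n} c\,x_i = c \sum_{i=1}^{n} x_i$
4) $\sum_{i=1}^{n} c = nc$
5) $\sum_{i=1}^{n} (x_i \pm a) = \sum_{i=1}^{n} x_i \pm na$
6) $\sum_{i=1}^{n} (x_i \pm a)^2 = \sum_{i=1}^{n} x_i^2 \pm 2a \sum_{i=1}^{n} x_i + na^2$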
The mean may often be similar with the median or mode for symmetrical set of values, or
distribution; however, for skewed distributions, the mean is not necessarily the same as
the middle value (median), or the most likely occurrence value (mode). For example,
mean income is skewed upwards by a small number of people with very large incomes,
so that the majority has an income lower than the mean. By contrast, the median income
is the level at which half the population is below and half is above. The mode income is
the most likely occurrence income, and favors the larger number of people with lower
incomes. The median and mode are often more intuitive measures of such data.
Nevertheless, many skewed distributions are best described by their mean such as the
exponential and Poisson distributions.
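For example, the arithmetic mean of the six values 34, 27, 45, 55, 22 and 34 is:
$$\bar{X} = \frac{34 + 27 + 45 + 55 + 22 + 34}{6} = \frac{217}{6} \approx 36.17.$$
If X is a variable having values X1, X2,…, Xm occurring with frequencies f1, f2,…, fm
respectively, then its arithmetic mean is given by:
$$\bar{X} = \frac{X_1 f_1 + X_2 f_2 + \dots + X_m f_m}{f_1 + f_2 + \dots + f_m} = \frac{\sum_{i=1}^{m} X_i f_i}{\sum_{i=1}^{m} f_i}.$$
$$\bar{X} = \frac{\sum_{i=1}^{m} X_i f_i}{\sum_{i=1}^{m} f_i} = \frac{3*2 + 5*1 + \dots + 7*2 + 6*1}{2 + \dots + 1} = \frac{40}{10} = 4.$$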
$$GM = \begin{cases} \sqrt[n]{X_1 \cdot X_2 \cdots X_n}, & \text{for ungrouped data sets} \\ \sqrt[n]{X_1^{f_1} \cdot X_2^{f_2} \cdots X_m^{f_m}}, & \text{for data sets of } X_i \text{ having frequencies } f_i, \text{ with } n = \sum_{i=1}^{m} f_i \end{cases}$$
N.B.: The geometric mean is an average that is useful for data sets that contain no zeros
and no odd number of negative values, for example rates of growth.
Example: the geometric mean of six values: 34, 27, 45, 55, 22 and 34 is:
When the observed values X1, X2, …, Xk have the corresponding frequencies f1, f2, …, fk
respectively, then the harmonic mean is given by: $HM = \frac{n}{\sum_{i=1}^{k} \frac{f_i}{X_i}}$, where $n = \sum_{i=1}^{k} f_i$.
Example: the harmonic mean of the six values: 34, 27, 45, 55, 22, and 34 is
The relation among the three means: the arithmetic mean (AM), geometric mean (GM) and
harmonic mean (HM) satisfy AM ≥ GM ≥ HM. This statement can be elaborated by
considering x1 and x2 as positive observations; then
$$AM = \frac{x_1 + x_2}{2}, \quad GM = \sqrt{x_1 x_2} \quad \text{and} \quad HM = \frac{2 x_1 x_2}{x_1 + x_2},$$
with AM = GM = HM if x1 = x2. After some mathematical manipulation we have
$$\frac{x_1 + x_2}{2} \ge \sqrt{x_1 x_2} \ge \frac{2 x_1 x_2}{x_1 + x_2}.$$
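The three means and the inequality AM ≥ GM ≥ HM can be checked numerically. The sketch below uses the six values from the examples above and Python's standard `statistics` module (`geometric_mean` assumes Python 3.8 or later); it is an illustration, not part of the original notes.

```python
import statistics

values = [34, 27, 45, 55, 22, 34]

am = statistics.mean(values)             # arithmetic mean
gm = statistics.geometric_mean(values)   # geometric mean (n-th root of the product)
hm = statistics.harmonic_mean(values)    # harmonic mean (n over the sum of reciprocals)

print(f"AM = {am:.4f}, GM = {gm:.4f}, HM = {hm:.4f}")
print("AM >= GM >= HM:", am >= gm >= hm)
```

Equality of the three means would only occur if all the values were identical, which is not the case here.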
$$\bar{X}_w = \frac{X_1 W_1 + X_2 W_2 + \dots + X_n W_n}{W_1 + W_2 + \dots + W_n} = \frac{\sum_{i=1}^{n} X_i W_i}{\sum_{i=1}^{n} W_i}.$$
Example
In a given drug shop four different drugs were sold for unit price of 0.60, 0.85, 0.95 and
0.50 birr and the total numbers of drugs sold were 10, 10, 5 and 20 respectively. What is
the average price of the four drugs in this drug shop?
Solution: for this example we have to use weighted mean using number of drugs sold as
the respective weights for each drug's price.
Therefore, the average price will be:
Xw = (10*0.60 + 10*0.85 + 5*0.95 + 20*0.50)/(10+10+5+20) = 29.25/45 = 0.65 birr. If
we ignore the weights, the average price would be 0.725 birr, which is wrong because it
ignores how many units were sold at each price.
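The drug shop example can be reproduced with a few lines of Python. The sketch compares the weighted mean (weights are the quantities sold) with the simple unweighted mean of the four prices; the variable names are illustrative.

```python
prices = [0.60, 0.85, 0.95, 0.50]      # unit prices in birr
quantities = [10, 10, 5, 20]           # number of drugs sold at each price (the weights)

weighted_mean = sum(p * q for p, q in zip(prices, quantities)) / sum(quantities)
unweighted_mean = sum(prices) / len(prices)

print(f"weighted mean   = {weighted_mean:.3f} birr")
print(f"unweighted mean = {unweighted_mean:.3f} birr")
```

The weighted mean is lower because the cheapest drug accounts for the largest share of the units sold.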
3. Divide the total sum by the number of observations. This is: $\bar{x} = \frac{\sum_{j=1}^{g} x_j f_j}{n}$
Where,
n = number of observations in the sample
g = number of classes in the frequency distribution
xj = midpoint of the jth class
fj = number of observations in the jth class
Example
Using the age of employees example, the frequency distribution is:
Class Interval Class mark Frequency
CI (xj) fj xjfj
15 - 19 17 2 34
20 - 24 22 10 220
3.2.3.1 Quartiles
Quartiles are three points which divide a given ordered data set into four equal parts. The
first, second and third points are known as the 1st, 2nd and 3rd quartiles and are denoted by Q1, Q2
and Q3 respectively.
For ungrouped data, the kth quartile $Q_k$ is the value of the item which is at the
$\left(\frac{k(n+1)}{4}\right)^{th}$ position, where k = 1, 2, 3 and n is the total number of observations.
For grouped data, the computation of the three quartiles can be done as follows:
Calculate kn/4 and search for the minimum cumulative frequency which is greater than or
equal to kn/4, k = 1, 2, 3. The class corresponding to this cumulative frequency is the kth
quartile class. This is the class where $Q_k$ lies. Thus,
$$Q_k = L + \frac{c}{f}\left(\frac{kn}{4} - CF\right), \quad k = 1, 2, 3.$$
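The grouped-data rule can be sketched as a small Python function. The sketch assumes the usual reading of the symbols, which the notes do not spell out here: L is the lower class boundary of the quartile class, c its class width, f its frequency, and CF the cumulative frequency of the classes before it. The function name `grouped_quantile` and the `parts` parameter are hypothetical; the data are the class boundaries and frequencies from the ogive example.

```python
def grouped_quantile(boundaries, freqs, k, parts=4):
    """k-th quantile of a grouped distribution (parts=4 for quartiles)."""
    n = sum(freqs)
    target = k * n / parts
    cf_before = 0                          # cumulative frequency before the current class
    for i, f in enumerate(freqs):
        if cf_before + f >= target:        # first class whose cumulative frequency reaches kn/parts
            lower = boundaries[i]
            width = boundaries[i + 1] - boundaries[i]
            return lower + width * (target - cf_before) / f
        cf_before += f
    raise ValueError("k must satisfy 0 < k <= parts")

boundaries = [19.5, 29.5, 39.5, 49.5, 59.5, 69.5, 79.5, 89.5]
freqs = [4, 6, 8, 12, 9, 7, 4]
for k in (1, 2, 3):
    print(f"Q{k} =", round(grouped_quantile(boundaries, freqs, k), 3))
```

Passing `parts=10` or `parts=100` applies the same interpolation to deciles and percentiles.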
3.2.3.2 Deciles
Deciles are nine points which divide a given ordered data set into ten equal parts. Each part
contains an equal number of elements. The first, second, … and ninth points are known as the 1st,
2nd, 3rd, … and 9th deciles and are denoted by D1, D2, … and D9 respectively.
For ungrouped data, the kth decile $D_k$ is the value of the item which is at the
$\left(\frac{k(n+1)}{10}\right)^{th}$ position, where k = 1, 2, 3, …, 9 and n is the total number of observations.
For grouped data, the computation of the nine deciles can be done as follows:
the (kn/4)th and (kn/4 + 1)th, (kn/10)th and (kn/10 + 1)th, (kn/100)th and (kn/100 + 1)th
values respectively.
3) Q2 = D5 = P50 = median of the distribution, P25 = Q1, P75 = Q3, and $D_i = P_{10i}$, i = 1, 2, 3, …, 9.
4) Intuitively, the pth percentile is the value Vp such that p percent of the sample points
are less than or equal to Vp. For example, the median, being the 50th percentile,
indicates that half of the observations are above and half of the observations are below the
50th percentile.
5) Quantiles have the advantage of being less sensitive to outliers and of not being
much affected by the sample size (n).
3.2.4 Mode
The mode is the value of the observation that occurs with the greatest frequency. A
particular disadvantage is that, with a small number of observations, there may be no
mode. In addition, sometimes, there may be more than one mode such as when dealing
with a bimodal (two-peaks) distribution. It is even less amenable (responsive) to
mathematical treatment than the median.
Find the modal values for the following data: (a) 22, 66, 69, 70, 73 (no modal value). (b)
1.8, 3.0, 3.3, 2.8, 2.9, 3.6, 3.0, 1.9, 3.2, 3.5 (modal value = 3.0 kg). (c) 10, 10, 9, 9, 8, 12, 15,
5 (modal values = 9 and 10). Hence, it is possible for a frequency distribution to have
more than one mode. Distributions with one mode are called unimodal, those with two
modes are called bimodal, and those with more than two modes are called multimodal.
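The three cases above can be checked with a short Python sketch. It assumes the convention used in the examples, namely that a data set in which no value repeats has no mode; the function name `modes` is hypothetical.

```python
from collections import Counter

def modes(data):
    """Return the list of modal values, or an empty list if no value repeats."""
    counts = Counter(data)
    top = max(counts.values())
    if top == 1:                       # every value occurs once: no mode
        return []
    return sorted(value for value, count in counts.items() if count == top)

print(modes([22, 66, 69, 70, 73]))
print(modes([1.8, 3.0, 3.3, 2.8, 2.9, 3.6, 3.0, 1.9, 3.2, 3.5]))
print(modes([10, 10, 9, 9, 8, 12, 15, 5]))
```

The length of the returned list tells whether the data are unimodal, bimodal or multimodal.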
3.2.4.1 Mode of Grouped Data
To find the Modal value for grouped (continuous) frequency distribution, first find the
modal class which is the class that contains the mode which is the class with the highest
frequency. Then to compute the modal value for grouped data, we use the formula:
$$s^2 = \frac{\sum d_i^2}{n-1} - \frac{\left(\sum d_i\right)^2}{n(n-1)}$$ for ungrouped data, where $d_i = x_i - A$ and A is an
assumed mean.
$$s^2 = \frac{\sum f_i d_i^2}{n-1} - \frac{\left(\sum f_i d_i\right)^2}{n(n-1)}$$ for grouped data, where $d_i = X_{mi} - A$.
N.B.: $\sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}$, in which the variance is calculated for the population distribution.
Example
1. Compute the mean, variance and standard deviation for data sets A and B.
A: 10 60 50 30 40 20
B: 40 30 45 35 40 20
After you computed the mean and standard deviation, what did you observe? Comment
on the result.
Solution
MeanA  MeanB  StDevA   StDevB   VarianceA  VarianceB
35     35     18.7083  8.94427  350        80
Both A and B have equal means, but the observations in A are more scattered than those in B.
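The summary values in this solution can be verified with Python's standard `statistics` module, whose `variance` and `stdev` functions use the sample (n - 1) divisor. The `summary` dictionary is only an illustrative container.

```python
import statistics

A = [10, 60, 50, 30, 40, 20]
B = [40, 30, 45, 35, 40, 20]

summary = {}
for name, data in (("A", A), ("B", B)):
    summary[name] = {
        "mean": statistics.mean(data),
        "variance": statistics.variance(data),   # sample variance, divisor n - 1
        "stdev": statistics.stdev(data),         # sample standard deviation
    }
    print(name, summary[name])
```

Both sets share the same mean, so the difference in standard deviation is what captures how much more scattered A is.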
2. Find the sample variance and standard deviation for the following distributions.
a) Xi: 3 4 6 8
fi : 2 3 4 2
b) Class: 0 –10 10 –20 20 –30 30 – 40 40 – 50
Frequency 7 6 15 12 10
Solutions: a
xi - x̄      (xi - x̄)²   fi(xi - x̄)²
-2.27273    5.165289    10.33
-1.27273    1.619835    4.86
0.727273    0.528926    2.12
2.727273    7.438017    14.88
Mean = 5.27; sum of (xi - x̄)² = 14.7520; Variance = 3.22
class mark   frequency   xifi   xi - x̄   (xi - x̄)²   fi(xi - x̄)²
sum          50          1370
Variance = 169.63 and SD = 13.02
3. The mean and standard deviation of a set of 100 sample observations were worked out
as 40 and 5 respectively by a computer. Later, it was detected that the value 50 was
recorded in place of 40 for one observation. Find the correct variance and standard
deviation.
Solution
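Since the reported mean is 40, $\sum x_i = 100 \times 40 = 4000$, and since the reported variance is
$5^2 = 25$ (with divisor n - 1 = 99), $\sum x_i^2 = 99 \times 25 + 100 \times 40^2 = 162475$.
Replacing the wrongly recorded value 50 by the correct value 40 gives
corrected $\sum x_i = 4000 - 50 + 40 = 3990$, so the corrected mean is 39.9, and
corrected $\sum x_i^2 = 162475 - 50^2 + 40^2 = 161575$.
Hence $s^2 = \frac{161575 - 100 \times 39.9^2}{99} = \frac{2374}{99} \approx 23.98$ and $s \approx 4.90$.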
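One way to check a correction of this kind is to recover the sums from the reported mean and standard deviation, swap the wrong value for the right one, and recompute. The Python sketch below does this for the example; it assumes the reported standard deviation used the sample divisor n - 1, and the variable names are illustrative.

```python
n, mean, sd = 100, 40, 5        # reported (incorrect) summary
wrong, right = 50, 40           # value recorded by mistake and the correct value

sum_x = n * mean                              # reported sum of the observations
sum_x2 = (n - 1) * sd**2 + n * mean**2        # from s^2 = (sum x^2 - n*mean^2) / (n - 1)

sum_x_corr = sum_x - wrong + right            # corrected sum
sum_x2_corr = sum_x2 - wrong**2 + right**2    # corrected sum of squares

mean_corr = sum_x_corr / n
var_corr = (sum_x2_corr - n * mean_corr**2) / (n - 1)
sd_corr = var_corr ** 0.5

print(f"corrected mean = {mean_corr}, variance = {var_corr:.2f}, sd = {sd_corr:.2f}")
```

Note that the mean changes as well, so the corrected variance has to be taken about the corrected mean rather than about 40.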
$\sum (x - 7) = 7$, $\sum (x - 7)^2 = 535$ and $n = 15$. (Answer: $s^2 = 37.98$ and $s = 6.16$)
2
( xi x ) 2 (x A) 2
n 1
i
Where, $d_j = \mu_j - \mu$ and $\mu = \frac{N_1 \mu_1 + N_2 \mu_2 + \dots + N_k \mu_k}{N_1 + N_2 + \dots + N_k}$
Where, n1 n2 ...... n k
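In particular, when k = 2:
$$s_{12}^2 = \frac{(n_1 - 1)(s_1^2 + d_1^2) + (n_2 - 1)(s_2^2 + d_2^2)}{n_1 + n_2 - 2}$$
Where, $d_1 = \bar{X}_1 - \bar{X}$, $d_2 = \bar{X}_2 - \bar{X}$ and $\bar{X} = \frac{n_1 \bar{X}_1 + n_2 \bar{X}_2}{n_1 + n_2}$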
Example
The mean weight of 150 students is 60 kg. The mean weight of boys is 70 kg with a
standard deviation of 10 kg. For the girls, the mean weight is 55 kg and the standard
deviation is 15 kg. Find the number of boys and the combined standard deviation.
(Hint: 60*150 = nb*70 + (150 - nb)*55 and $s_{12}^2 = \frac{(49)(100 + 100) + (99)(225 + 25)}{150 - 2}$)
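The hint can be followed step by step in Python. The sketch first solves the combined-mean equation for the number of boys and then applies the k = 2 combined variance formula; the variable names are illustrative.

```python
n_total, mean_total = 150, 60
mean_b, sd_b = 70, 10            # boys
mean_g, sd_g = 55, 15            # girls

# Solve 60*150 = n_b*70 + (150 - n_b)*55 for n_b.
n_b = (n_total * mean_total - n_total * mean_g) / (mean_b - mean_g)
n_g = n_total - n_b

# Deviations of the group means from the combined mean.
d_b, d_g = mean_b - mean_total, mean_g - mean_total

var_comb = ((n_b - 1) * (sd_b**2 + d_b**2) + (n_g - 1) * (sd_g**2 + d_g**2)) / (n_b + n_g - 2)
sd_comb = var_comb ** 0.5

print(f"boys = {n_b:.0f}, girls = {n_g:.0f}, combined sd = {sd_comb:.2f} kg")
```

The squared deviations of the group means enter the formula because the two groups are centred at different weights.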
Example
Consider the following hypothetical distribution on height and weight of individuals.
Height (cm): 152 166 174 181 175 172 190 180 178
Weight (kg): 45 52 54 66 62 59 84 90 70
Using coefficient of quartile deviation check whether the height or weight data is more
variable.
Exercise
a. A sample of 5 items was taken from the output of a factory. The length and weight of
them are given below.
Length (in inches): 5 6 7 9 12
Weight (in ounces): 13 15 18 19 20
Which of the two characteristics is more variable? Why? (Find the variance of length
and weight)
b. The average IQ of students in one calculus class is 110, with a standard deviation of 5.
The average IQ of students in another class is 106, with a standard deviation of 4. Which
class is more uniform in terms of IQ?
4.2 The Standard scores (Z – score)
Scores are generally meaningless by themselves unless they are compared to the
distribution of scores from some reference group. The numerical value of the Z-score
reflects the relative standing of a value. The Z-score is the
number of standard deviations that a given value X is below or above the mean of the
data, and is defined as $Z = \frac{X_i - \bar{X}}{S}$ (for sample data sets) and $Z = \frac{X_i - \mu}{\sigma}$ (for
population data sets).
Properties of Z-Score
For population data: $\mu'_k = \frac{\sum (X_i - A)^k}{N}$, k = 0, 1, 2, 3, … and
$\mu'_k = \frac{\sum f_i (X_i - A)^k}{N}$, for grouped data.
For sample data: $m'_k = \frac{\sum (X_i - A)^k}{n}$, k = 0, 1, 2, … and $m'_k = \frac{\sum f_i (X_i - A)^k}{n}$,
for grouped data.
The kth central moment (centered about the arithmetic mean) is defined as:
For population data: $\mu_k = \frac{\sum (X_i - \mu)^k}{N}$, k = 0, 1, 2, 3, … and
$\mu_k = \frac{\sum f_i (X_i - \mu)^k}{N}$, for grouped data.
For sample data: $m_k = \frac{\sum (X_i - \bar{x})^k}{n}$, k = 0, 1, 2, … and
$m_k = \frac{\sum f_i (X_i - \bar{x})^k}{n}$, for grouped data, where Xi becomes the class midpoint in
the case of continuous grouped data.
Remark
The first central moment is zero.
The second central moment is the same as variance.
For a symmetric distribution, all odd central moments are zero.
Example
Calculate the first three raw moments about 0 and the first three moments centered about
the arithmetic mean for the following sample data.
$\mu_2 = \mu'_2 - (\mu'_1)^2$
$\mu_3 = \mu'_3 - 3\mu'_2 \mu'_1 + 2(\mu'_1)^3$
$\mu_4 = \mu'_4 - 4\mu'_3 \mu'_1 + 6\mu'_2 (\mu'_1)^2 - 3(\mu'_1)^4$
moment is calculated.
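The relations between raw moments about 0 and central moments can be checked numerically. The Python sketch below computes both kinds of moments directly and compares them with the standard conversion identities; the sample values are hypothetical and the function names are illustrative.

```python
def raw_moment(data, k, about=0.0):
    """k-th moment of the data about the point `about` (divisor n)."""
    return sum((x - about) ** k for x in data) / len(data)

def central_moment(data, k):
    """k-th moment centered about the arithmetic mean."""
    mean = sum(data) / len(data)
    return raw_moment(data, k, about=mean)

sample = [2, 3, 7, 8, 10]        # hypothetical sample data

r1, r2, r3, r4 = (raw_moment(sample, k) for k in (1, 2, 3, 4))

# Central moments expressed through the raw moments about 0.
c2 = r2 - r1**2
c3 = r3 - 3 * r2 * r1 + 2 * r1**3
c4 = r4 - 4 * r3 * r1 + 6 * r2 * r1**2 - 3 * r1**4

print(c2, central_moment(sample, 2))
print(c3, central_moment(sample, 3))
print(c4, central_moment(sample, 4))
```

Each printed pair should agree up to floating-point rounding, and the first central moment is zero by construction.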
Exercise
The first two moments of a distribution about 1 are 2 and 25 respectively. Find the mean
and standard deviation of the distribution.
Skewness
Skewness is a measure of distortion of a distribution having a single mode. The
frequency distribution of a set of observations is called symmetrical about the mean if the
number of frequencies above the mean is the same as those below the mean.
Alternatively, a distribution is said to be symmetrical if observations are arranged in a
symmetrical order around mean, median and mode. Such a distribution has no skewness.
When a distribution is not symmetrical, it is called skewed. Whenever mean is greater
than the median and the mode, then the distribution is positively skewed, but if the mean
is less than the median and the mode, then the distribution is negatively skewed.
$$SK = \frac{3(\text{Mean} - \text{Median})}{\text{Standard deviation}}$$
Properties of Skewness
If SK = 0, then the distribution is symmetrical.
If SK > 0, then the distribution is positively skewed.
If SK < 0, then the distribution is negatively skewed.
There is no theoretical limit to this measure, however, in practice the value
given by this formula falls between -3 and 3.
Example
The following facts were gathered before and after an industrial dispute. Compare the
position before and after the dispute with respect to skewness.
                    Before dispute   After dispute
Number of workers:  515              509
Mean wages:         $49.50           $52.75
Median wages:       $52.80           $50.00
Exercise
The arithmetic mean, the median and coefficient of variations for a distribution are 30,
33 and 40% respectively. Find the coefficient of skewness.
2) The Bowley’s Coefficient of Skewness (S b): it is based on the relative positions of
the median and the two quartiles.
$$S_b = \frac{(Q_3 - \text{median}) - (\text{median} - Q_1)}{(Q_3 - \text{median}) + (\text{median} - Q_1)} = \frac{Q_3 + Q_1 - 2\,\text{median}}{Q_3 - Q_1}$$
N.B.: i. For a symmetric distribution twice the median is the same as the sum of
the two quartiles and hence Sb is zero; If Sb <0, then the distribution is
negatively or left skewed; and If Sb >0, then the distribution is positively
or right skewed.
ii. Note that the Bowley’s measure of skewness is recommended when the
mode is ill defined and/or the distribution has open end classes as well as
unequal class intervals.
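Bowley's coefficient is simple to compute once the quartiles are known. The Python sketch below implements the formula; the function name `bowley_skewness` is hypothetical, and the illustrative values (Q1 = 42, Q3 = 68, median equal to twice the quartile difference) are taken from a later exercise in these notes whose stated answer is Sk = 0.23.

```python
def bowley_skewness(q1, median, q3):
    """Bowley's coefficient of skewness based on the quartiles and the median."""
    return (q3 + q1 - 2 * median) / (q3 - q1)

q1, q3 = 42, 68
median = 2 * (q3 - q1)           # median = double the difference of the quartiles

sb = bowley_skewness(q1, median, q3)
print(f"Sb = {sb:.2f}")

# A distribution whose median is midway between the quartiles has Sb = 0.
print(bowley_skewness(10, 20, 30))
```

The sign of the result is read exactly as in the N.B. above: negative for left skew, positive for right skew.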
3) Measure of Skewness Based on Moments (Sm)
$S_m = \frac{\mu_3}{(\mu_2)^{3/2}}$ or $S_m = \frac{\mu_3}{(\sqrt{\mu_2})^3}$ … for population data.
$S_m = \frac{m_3}{(m_2)^{3/2}}$ or $S_m = \frac{m_3}{(\sqrt{m_2})^3}$ … for sample data.
The interpretation of Sm is the same as the above two.
Exercise
Find the coefficient of skewness based on the following summary statistics that are
obtained from a certain distribution.
a. Q1 = 8, Q3 = 20 and Q2 = 11
$K = \frac{m_4}{(m_2)^2}$ … for sample data.
Interpretation of the value of K
1. If K =3, then the distribution is mesokurtic.
2. If K > 3, then the distribution is leptokurtic.
3. If K < 3, then the distribution is platykurtic.
If we want our reference point to be zero, we can change the above coefficient to:
$\varphi = \frac{\mu_4}{(\mu_2)^2} - 3$ … for population data.
$\varphi = \frac{m_4}{(m_2)^2} - 3$ … for sample data
Accordingly, If φ =0, then the distribution is said to be mesokurtic.
If φ > 0, then the distribution is said to be leptokurtic.
If φ < 0, then the distribution is said to be platykurtic.
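The moment-based measures of skewness and kurtosis can be computed directly from the central moments. The Python sketch below does this for sample data (divisor n); the sample values are hypothetical and the function names are illustrative.

```python
def central_moment(data, k):
    """k-th sample moment about the arithmetic mean (divisor n)."""
    mean = sum(data) / len(data)
    return sum((x - mean) ** k for x in data) / len(data)

def moment_skewness(data):
    return central_moment(data, 3) / central_moment(data, 2) ** 1.5

def kurtosis(data):
    return central_moment(data, 4) / central_moment(data, 2) ** 2

def classify(K, tol=1e-9):
    """Label a kurtosis coefficient K relative to the reference value 3."""
    if abs(K - 3) <= tol:
        return "mesokurtic"
    return "leptokurtic" if K > 3 else "platykurtic"

sample = [1, 2, 3, 4, 5]          # hypothetical, symmetric sample
K = kurtosis(sample)
print("Sm =", moment_skewness(sample))
print("K =", K, "phi =", K - 3, "->", classify(K))
```

For a symmetric sample such as this one the third central moment, and therefore Sm, is zero.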
C. Find out the lower and upper quartiles, the coefficient of quartile deviation (CQD) and
the coefficient of skewness from the following information. Sum of the two quartiles
= 110, Difference of the two quartiles = 26 and Median = Double the difference of the
upper and lower quartiles. (Answer: CQD = 0.236, Q1 = 42, Q3 = 68 and Sk = 0.23)
Review Exercises
1. Calculate geometric mean for 9 and 16
2. Suppose that the arithmetic mean and geometric mean of two observations are 25 and
20, respectively. Then find the harmonic mean.
3. If for a certain distribution the coefficient of variation = 20% and the mean = 10,
calculate its standard deviation and variance.
4. Find the value of the first quartile of: 2, 4, 6, 8, 10, 12, and 14.
5. Find quantiles for: 5, 7, 7, 8, 10, 11, 12, 15, and 17.
6. Suppose that you obtained 16, 17, 18, 19, and 20 on your last five quizzes. If you get
20, 20 and 20 on the next three quizzes, which of the following would change?
I. Q1 II. Q2 III. Q3 IV. Minimum V. maximum
7. Calculate quartile deviation and co-efficient of quartile deviation if Q1=20 and Q3 =40
8. Consider the monthly salaries of 5 individuals: 2000, 2500, 2500, 2800, 4000
calculate the mean salary,
find variance and standard deviation for the salary.
13. The arithmetic mean and standard deviation of a series of 20 items were computed as
20 and 5 respectively. While calculating these, an item 13 was misread as 30. Find the
correct mean and standard deviation.
5.1 Introduction
The subject of probability can be traced back to the 17th century, when it arose out of the
study of gambling games. As we will see, the range of applications extends beyond games
into business decisions, insurance, law, medical tests, investments, weather forecasting
and the social sciences. The telephone network, call centers, and airline companies with
their randomly fluctuating loads could not have been economically designed without
probability theory. As a general concept, probability can be defined as the
chance of an event occurring.
“Probability is basically common sense reduced to calculation; it makes us appreciate
with exactitude what reasonable minds feel by a sort of instinct.” So said Laplace. In the
modern scientific and technological world, it is even more important to understand
probabilistic argument. This module introduces the tools needed for probability and goes
on to use them in simple situations related to repeated experiments with applications to
quality control.
Probability axioms and simple properties
Example 3: Consider the word "STATISTICS". Find the permutations of the word.
Solution: The frequency of each letter is S = 3, T = 3, A = 1, I = 2, C = 1; there are 10 letters
in total and 10! = 10*9*8*7*6*5*4*3*2*1. Therefore, Permutations = $\frac{10!}{3!\,3!\,1!\,2!\,1!} = 50400$.
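The count for "STATISTICS" can be checked with a short Python sketch that divides n! by the factorial of each letter's frequency. The function name `distinct_permutations` is hypothetical, and `math.prod` assumes Python 3.8 or later.

```python
from collections import Counter
from math import factorial, prod

def distinct_permutations(word):
    """Number of distinct arrangements of the letters of `word`."""
    counts = Counter(word)                       # frequency of each letter
    return factorial(len(word)) // prod(factorial(c) for c in counts.values())

print(Counter("STATISTICS"))
print(distinct_permutations("STATISTICS"))
```

When all letters are different, every factorial in the denominator is 1 and the formula reduces to n!.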
5.2.3 Combinations
Suppose that we have a collection of objects and that we wish to make r selections from
this list of objects where the order does not matter. An unordered selection such as this is
referred to as a combination.
Note: The difference between a permutation and a combination is not whether there is
repetition. Neither allows repetition, and if there is repetition, you
cannot use the formulas for permutations or combinations. The only difference in the
definition of a permutation and a combination is whether order is important.
A combination of n objects, arranged in groups of size r, without repetition and without regard to order, is:
nCr = n! / (r! (n − r)!).
Example 1: Find all two-letter combinations of the letters "ABC"
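The counting rules in this section can be checked with a short Python sketch. The helper names multiset_permutations and letter_combinations are illustrative choices, not part of these notes; the sketch only assumes the standard formulas n! / (n1! n2! ... nk!) for arrangements with repeated letters and n! / (r! (n − r)!) for combinations.

```python
from collections import Counter
from itertools import combinations
from math import comb, factorial


def multiset_permutations(word: str) -> int:
    """Count distinct arrangements of the letters: n! / (n1! n2! ... nk!)."""
    total = factorial(len(word))
    for count in Counter(word).values():
        total //= factorial(count)  # divide out arrangements of identical letters
    return total


def letter_combinations(letters: str, r: int) -> list[str]:
    """List every unordered selection of r letters (order does not matter)."""
    return ["".join(group) for group in combinations(letters, r)]


print(multiset_permutations("STATISTICS"))  # 50400
print(letter_combinations("ABC", 2))        # ['AB', 'AC', 'BC']
print(comb(3, 2))                           # 3, the value of n! / (r! (n - r)!)
```

Listing the groups explicitly shows why "AB" and "BA" count once as a combination but twice as a permutation.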
Mutually Exclusive Events: Two events are mutually exclusive if they cannot occur at
the same time. That is, mutually exclusive events are disjoint. Thus, if two events are disjoint,
the probability of both occurring at the same time is 0, i.e. P(A ∩ B) = 0. And if two
What is the probability that a randomly selected individual is a male smoker? This
is just a joint probability. The number of "Male and Smoke" divided by the total =
19/100 = 0.19
the probability that some other event F has occurred, that is, that P(E | F) = P(E). One
would expect that in this case, the equation P(F | E) = P(F) would also be true. If these
equations are true, we might say that F is independent of E.
Definition: Two events E and F are independent if both E and F have positive
Example 5.14: Suppose that we roll a pair of fair dice, so each of the 36 possible
outcomes is equally likely. Let A denote the event that the first die lands on
3, let B be the event that the sum of the dice is 8, and let C be the event that the sum of the dice is 7.
Solution:
A. Since A ∩ B is the event that the first die lands on 3 and the second on 5, we see
that
P(A ∩ B) = P({(3,5)}) = 1/36,
P(A) = P({(3,1), (3,2), (3,3), (3,4), (3,5), (3,6)}) = 6/36, and P(B) = 5/36.
Therefore, since 1/36 ≠ (6/36)(5/36), we see that P(A ∩ B) ≠ P(A) P(B), and so
events A and B are not independent.
P(A ∩ C) = P({(3,4)}) = 1/36,
while P(A) = 1/6 and P(C) = P({(1,6), (2,5), (3,4), (4,3), (5,2), (6,1)}) = 6/36. Therefore,
P(A ∩ C) = P(A) P(C), and so events A and C are independent.
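The independence check in this example can be reproduced by enumerating all 36 outcomes. The sketch below is illustrative only: the helper names prob and independent are hypothetical, and event B is taken to be "the sum of the dice is 8", an assumption inferred from the solution's values P(B) = 5/36 and A ∩ B = {(3,5)}.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes (first die, second die).
outcomes = list(product(range(1, 7), repeat=2))


def prob(event) -> Fraction:
    """Probability of an event as (favourable outcomes) / (all outcomes)."""
    favourable = sum(1 for outcome in outcomes if event(outcome))
    return Fraction(favourable, len(outcomes))


def independent(e, f) -> bool:
    """Two events are independent when P(E and F) == P(E) * P(F)."""
    return prob(lambda o: e(o) and f(o)) == prob(e) * prob(f)


A = lambda o: o[0] == 3     # first die lands on 3
B = lambda o: sum(o) == 8   # assumed definition: sum of the dice is 8
C = lambda o: sum(o) == 7   # sum of the dice is 7

print(prob(A), prob(B), prob(C))  # 1/6 5/36 1/6
print(independent(A, B))          # False
print(independent(A, C))          # True
```

Exact fractions are used instead of floating-point numbers so that the equality test P(E ∩ F) = P(E) P(F) is not affected by rounding.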
Definition
A random variable is a variable that assumes numerical values associated with the
random outcomes of an experiment, where one (and only one) numerical value is
assigned to each sample point. A random variable can be either discrete or continuous.
Example
Consider an experiment of counting the number of customers who use the drive-up
window of a bank each day. The random variable can be: “the number of customers”
and the possible values of this random variable range from 0 to the maximum number of
customers the window could possibly serve in a day.
Random variables that can assume a countable number of values are called discrete.
Example
1. The number of sales made by a salesperson in a given week: x = 0, 1, 2, ...
Pr[a < X < b] = ∫_{a}^{b} f(x) dx. The function f(x) is called a probability density function.
The second important numerical characteristics of a random variable are its variance and
standard deviation, which are defined as follows:
Let x be a discrete random variable with probability distribution p(x). Then the variance
of x is
σ² = E[(x − μ)²] = Σ_{all x} (x − μ)² p(x) = Σ_{all x} x² p(x) − μ²,
where μ = E(x) is the mean of x.
The standard deviation of x is the positive square root of the variance of x, i.e. σ = √(σ²).
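As a small numerical illustration of the variance and standard deviation of a discrete random variable, the sketch below uses a made-up probability distribution (the values in pmf are hypothetical, not taken from these notes) and computes the variance both from the definition and from the shortcut form Σ x² p(x) − μ².

```python
from math import isclose, sqrt

# Hypothetical distribution p(x) for x = number of sales in a week.
pmf = {0: 0.1, 1: 0.3, 2: 0.4, 3: 0.2}
assert isclose(sum(pmf.values()), 1.0)  # probabilities must sum to 1

# Mean: mu = sum of x * p(x) over all x.
mu = sum(x * p for x, p in pmf.items())

# Variance from the definition: sum of (x - mu)^2 * p(x).
var_definition = sum((x - mu) ** 2 * p for x, p in pmf.items())

# Variance from the shortcut: sum of x^2 * p(x), minus mu^2.
var_shortcut = sum(x**2 * p for x, p in pmf.items()) - mu**2

# Standard deviation: positive square root of the variance.
sigma = sqrt(var_definition)

print(round(mu, 4), round(var_definition, 4), round(sigma, 4))
```

For this distribution the mean is 1.7, the variance is 0.81 and the standard deviation is 0.9, up to floating-point rounding; both variance formulas agree.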
Let X be a continuous random variable with density function f(x). Then the mean or the
expected value of X is given by: E(X) = ∫_{−∞}^{+∞} x f(x) dx.
Let X be a continuous random variable with density function f(x) and let g(x) be a function of
x. Then the mean or the expected value of g(X) is given by: E[g(X)] = ∫_{−∞}^{+∞} g(x) f(x) dx.
Let X be a continuous random variable with the expected value E(X) = μ. Then the
variance of X is σ² = E[(X − μ)²] = ∫_{−∞}^{+∞} (x − μ)² f(x) dx = ∫_{−∞}^{+∞} x² f(x) dx − μ².
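The integrals for a continuous random variable can be approximated numerically. The sketch below is only an illustration: the density f(x) = 2x on [0, 1] is a hypothetical example chosen because its exact answers are easy to verify (mean 2/3, variance 1/18), and integrate is a simple midpoint-rule helper written for this purpose, not a library function.

```python
def integrate(func, lower: float, upper: float, steps: int = 10_000) -> float:
    """Midpoint-rule approximation of the integral of func over [lower, upper]."""
    width = (upper - lower) / steps
    return width * sum(func(lower + (i + 0.5) * width) for i in range(steps))


def density(x: float) -> float:
    """Hypothetical density: f(x) = 2x on [0, 1], zero elsewhere."""
    return 2.0 * x if 0.0 <= x <= 1.0 else 0.0


# The density is zero outside [0, 1], so the infinite limits reduce to 0 and 1.
mean = integrate(lambda x: x * density(x), 0.0, 1.0)                # E(X)
second_moment = integrate(lambda x: x**2 * density(x), 0.0, 1.0)    # E(X^2)
variance = second_moment - mean**2                                   # E(X^2) - mu^2
prob_half_to_one = integrate(density, 0.5, 1.0)                      # Pr[0.5 < X < 1]

print(mean, variance, prob_half_to_one)
```

The printed values should be close to 0.6667, 0.0556 and 0.75 respectively.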
p = [30! / (16! 12! 2!)] (0.55)^16 (0.40)^12 (0.05)^2
f(x) = [1 / (σ√(2π))] e^{−(x − μ)² / (2σ²)}, −∞ < x < ∞; σ > 0 and −∞ < μ < ∞
Properties of the Theoretical Normal Distribution
1. The curve is bell-shaped.
2. The mean, median and mode are equal and located at the center of the
distribution.
The chi-square distribution contains only one parameter, called the degrees of freedom,
which is equal to the number of Z values in the sum of squares.
Characteristics of the χ2 Distribution
- χ2 values cannot be negative since they are sums of squares.
- The χ2 distribution is non-symmetric.
- The mean of the χ2 distribution is its degrees of freedom (n), and the variance is
2n.
- For large values of n (usually greater than 30), the χ2 distribution may be
approximated by the normal.
- The degrees of freedom when working with a single population variance are n-1, where n here denotes the sample size.
A common use of the χ2 distribution is to describe the distribution of the sample
variance. Let Y1, Y2, . . . , Yn be a random sample from a normally distributed
population with mean = μ and variance = σ2. Then the quantity (n − 1)S2/σ2 is a
random variable whose distribution is described by a χ2 distribution with (n − 1)
degrees of freedom.
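The statement about (n − 1)S2/σ2 can be explored by simulation. The sketch below is a rough check under assumed values (sample size 10, μ = 50, σ = 4, 5000 repetitions, all hypothetical): it draws repeated normal samples and compares the mean and variance of the simulated quantity with the chi-square values n − 1 and 2(n − 1).

```python
import random
from statistics import mean, variance


def scaled_sample_variance(n: int, mu: float, sigma: float, rng: random.Random) -> float:
    """Draw one normal sample of size n and return (n - 1) * S^2 / sigma^2."""
    sample = [rng.gauss(mu, sigma) for _ in range(n)]
    return (n - 1) * variance(sample) / sigma**2  # variance() uses the n - 1 divisor


rng = random.Random(0)  # fixed seed so the run is repeatable
n = 10                  # hypothetical sample size, giving n - 1 = 9 degrees of freedom
values = [scaled_sample_variance(n, 50.0, 4.0, rng) for _ in range(5000)]

print(mean(values))      # should be close to n - 1 = 9
print(variance(values))  # should be close to 2(n - 1) = 18
print(min(values) >= 0)  # chi-square values cannot be negative
```

Because this is a simulation, the printed mean and variance only approximate 9 and 18; more repetitions bring them closer.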
C. Student’s t-Distribution
This distribution is quite similar to the normal in that it is symmetric and bell shaped.
However, the t distribution has “fatter” tails than the normal. That is, it has more
probability in the extreme or tail areas than does the normal distribution, a characteristic
quite apparent for small values of the degrees of freedom, but barely noticeable if the
degrees of freedom exceed 30 or so.
It is symmetric about its mean
2. A bead is drawn from a bag containing 6 red beads, 4 green beads, 2 yellow beads and
3 blue beads. What is the probability that a bead drawn at random
(i) is either a red bead or blue bead
(ii) is neither a yellow nor a red bead
It is incumbent on the researcher to clearly define the target population. There are no
strict rules to follow, and the researcher must rely on logic and judgment. The population
is defined in keeping with the objectives of the study.
Usually, the population is too large for the researcher to attempt to survey all of its
members. A small but carefully chosen sample can be used to represent the population.
The sample reflects the characteristics of the population from which it is drawn.
Random sampling is the purest form of probability sampling. Each member of the
population has an equal and known chance of being selected. That is, simple random
sampling makes the selection of every possible combination of the desired number of units
equally likely. There are two ways to undertake the sample selection:
sampling with replacement and sampling without replacement.
Sampling without replacement (swr) means that once a unit has been selected, it can’t
be selected again. In other words, this means that no unit can appear more than once in
units, then there are NCn = N! / (n! (N − n)!) ways of selecting n units. Hence, simple random sampling is
equivalent to the selection of one of the NCn possible samples, with an equal probability 1 / NCn
assigned to each sample.
In simple random sampling without replacement, the probability of a specified unit of the
population being selected at any given draw is equal to the probability of its being
selected at the first draw, that is, 1/N. However, for a sample of size n, the probability of
including a specified unit is n/N.
Sampling with replacement (sw): This process allows a unit to be selected on more
than one draw. There are N^n ways of selecting n units out of the total N units with
replacement. In this case, the order of selection is considered. All selections are
independent since the selected unit is returned to the population before making the next
selection. Thus, the probability is 1/N for any specific element on each of the n draws.
Note that: Simple random sampling with and without replacement are practically identical
if the sample size is a very small fraction of the population size. Generally,
sampling without replacement yields more precise results and is
operationally more convenient.
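The two forms of simple random sampling can be sketched with Python's standard random module. The population of N = 10 numbered units and the sample size n = 3 are hypothetical values chosen for illustration.

```python
import random
from fractions import Fraction
from math import comb

population = list(range(1, 11))  # hypothetical population: units numbered 1..N, N = 10
N = len(population)
n = 3                            # hypothetical sample size
rng = random.Random(42)          # fixed seed so the run is repeatable

# Without replacement: a selected unit cannot be selected again.
sample_without = rng.sample(population, n)

# With replacement: each draw is independent, so a unit may repeat.
sample_with = rng.choices(population, k=n)

print(sample_without, sample_with)
print(comb(N, n))      # 120 possible unordered samples without replacement
print(N**n)            # 1000 possible ordered selections with replacement
print(Fraction(n, N))  # 3/10, chance that a given unit is in the sample (without replacement)
```

random.sample never repeats a unit, while random.choices draws each unit with probability 1/N on every draw, which matches the two descriptions above.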
In a sample survey, when sample units are selected, there could be a bias in the selection
procedure, which may come from the use of a non-random method. That is, the selection
There are different methods to select a random sample. The important part of each
random selection method is that the selection of each unit is based purely on chance.
This eliminates selection bias, which may prevent the sample from being representative
of the population. Representative means that the sample gives an accurate (valid)
picture of the total population. If the population has N units, then a random method of
selection is one which gives each of the N units an equal chance of being selected. The
procedures of random selection methods are described here.
Lottery Method: This is a very common method of taking a random sample. Under this
method, we label each member of the population with an identifiable disc, ticket or piece
of paper. Discs or tickets must be of identical size, color and shape. They are placed in a
container and well mixed before each draw, and the designated labels are then selected
without looking, with or without replacement. Drawing continues until a
sample of the required size is selected. This ensures that the selection of items depends
entirely on chance.
Table of Random Numbers: The members of the population are numbered from 1 to N
and n numbers are selected from one of the random tables in any convenient and
systematic way. A table of random numbers consists of digits from 0 to 9, which are
equally represented with no pattern or order. The procedure of selection is outlined as
follows:
 Identify the population units ( N ) and give them serial numbers from 1 to N . This
total number determines how many of the random digits we need when
selecting each element
Simple random sampling is very important as a basis for development of the theory of
sampling. It serves as a central reference for all other sampling designs. Under simple
random sampling, any particular sample of n elements from the population of N elements
can be chosen and, in addition, is as likely to be chosen as any other sample. In this sense,
it is conceptually the simplest possible method, and hence it is one against which all other
methods can be compared. However, despite such importance, simple random sampling
has the following limitations:
 It can be expensive and often not feasible in practice since it requires that all
elements be identified and labeled prior to the sampling. This prior identification
is often not possible, and hence a simple random sample of elements can’t be drawn
Systematic sampling is often used instead of random sampling. It is also called the Nth name
selection technique. After the required sample size has been calculated, every Nth record
is selected from a list of population members. As long as the list does not contain any
hidden order, this sampling method is as good as the random sampling method. Its only
advantage over the random sampling technique is simplicity. Systematic sampling is
frequently used to select a specified number of records from a computer file.
Judgment sampling is a common nonprobability method. The researcher selects the
sample based on judgment. This is usually an extension of convenience sampling. For
example, a researcher may decide to draw the entire sample from one "representative"
city, even though the population includes all cities. When using this method, the
researcher must be confident that the chosen sample is truly representative of the entire
population.
Quota sampling is the nonprobability equivalent of stratified sampling. Like stratified
sampling, the researcher first identifies the strata and their proportions as they are
represented in the population. Then convenience or judgment sampling is used to select
the required number of subjects from each stratum. This differs from stratified sampling,
where the strata are filled by random sampling.
Snowball sampling is a special nonprobability method used when the desired sample
characteristic is rare. It may be extremely difficult or cost prohibitive to locate
respondents in these situations. Snowball sampling relies on referrals from initial subjects
to generate additional subjects. While this technique can dramatically lower search costs,
it comes at the expense of introducing bias because the technique itself reduces the
likelihood that the sample will represent a good cross section from the population.