ssc201 Lecture Note-1
1.0 NATURE OF STATISTICS
Several scholars of Statistics have attempted to describe what the
subject is. It should be noted that Statistics is a wide area of applied
mathematics, with its own theorems, symbols and notations. Some of the
definitions given by various scholars include:
Statistics is the aggregate of facts affected to a marked extent by a
multiplicity of causes, numerically expressed, enumerated or estimated
according to a reasonable standard of accuracy, collected in a systematic
manner for a pre-determined purpose and placed in relation to each other.
Statistics is the science which deals with the collection, analysis and
interpretation of numerical data.
Statistics is concerned with methods for treating numerical data that have
been collected from observations taken in the form of measurements or
counts, so that meaningful conclusions are drawn from such data.
There could be many more definitions by scholars. We notice three
important facts common to these definitions:
Statistics is concerned with the techniques by which information is
collected, organised and interpreted
Most information analysed is quantitative data collected through the process
of sampling
The essence of statistical interpretation of processed data is
decision-making under conditions of uncertainty
Given the above background and for the purpose of this course, we
can define Statistics as follows: Statistics refers to the collection,
presentation, analysis and utilization of numerical data to make inferences
and reach decisions in the face of uncertainty (in Economics, business and
other social sciences, and in the biological, agricultural and physical sciences).
The data collected are usually presented as a tabulation of
corresponding values of the variables and, as we shall see in due course,
such a table can be represented by a formula or a graph. From a collection
of data we make a table; from the table we may draw a graph, and we may
proceed to derive an algebraic relation between the variables in the form
of a formula.
Thus Statistics is concerned with the planning of a programme of
data collection by the process of sampling, the presentation of the
collected data in a graph, tabulation or other form, the analysis of the
data and the drawing of conclusions, which may be valid or otherwise, for
appropriate decision-making. Although the above definition is all-embracing
with regard to the areas of statistical application, we shall lay emphasis
on applications in Economics, other social sciences and business.
events are not easily predictable.
1.2 Types of Statistics
Statistics is sub-divided into two: Descriptive and Inferential
Statistics.
Descriptive Statistics: is concerned with summarising and describing a
body of data. How does descriptive statistics summarise data? Data
summarisation is done by finding one or more pieces of information
that characterise the whole body of data. Among the quantitative summary
values are averages and measures of dispersion. For instance, suppose we
have data on the incomes of 1000 Nigerian families. The body of data can be
summarised by finding the average family income and the spread of the
family incomes above or below that average. Again, how does descriptive
statistics describe a body of data? This is done by representing the data
in forms such as a table, chart or graph of the number of families in each
income class.
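The idea of summarising a body of data by an average and a measure of spread can be sketched in a few lines of Python. The ten income figures below are hypothetical and chosen only for illustration; the notes give no actual income data.

```python
import statistics

# Hypothetical monthly incomes (in thousands of naira) for ten families.
# These figures are invented purely for illustration.
incomes = [45, 52, 38, 61, 47, 55, 49, 43, 58, 52]

average_income = statistics.mean(incomes)   # measure of central tendency
spread = statistics.pstdev(incomes)         # measure of dispersion (standard deviation)

print(f"average = {average_income}, spread = {spread:.2f}")
```

Two numbers (the average and the spread) now stand in for the whole list, which is the sense in which descriptive statistics summarises data.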
Inferential Statistics: is the process of reaching generalisations about the
whole (called the population) by examining a portion (called a sample).
Inferential statistics includes those techniques by which decisions about a
statistical population can be made without observing or measuring all
elements in the population. Typically, inferential statistics makes use of a
random sample as the basis for statistical inference. It should be noted that
inferential statistics has two aspects: (a) Estimation (b) Hypothesis testing.
Furthermore, inferential statistics involves inductive reasoning. For such
inferences to be valid, two conditions are required:
The sample must be representative. That is, the sample must fully reflect
the characteristics and properties of the population from which it is drawn.
The probability of error must be specified, since the possibility of error
always exists in statistical inference. An estimate or test of a population
property or characteristic should be given together with the probability of
its being wrong. Probability theory is therefore an essential element of
statistical inference.
Consider again the sample of 1000 Nigerian families above. Definitely,
there are more than 1000 families in Nigeria. If these 1000 families are
representative of all Nigerian families, we can estimate and test hypotheses
about the average family income in Nigeria as a whole. However, since
these conclusions are subject to error, we would also have to indicate the
probability of error.
1.3 Common Terms in Statistics
Observation: In Statistics an observation refers to the thing being
observed. Observations can be made on any characteristic of an object, such
as height, weight, plot etc. The numerically recorded observations, which
are referred to as data, are the raw materials with which statisticians work.
Population: In Statistics, the population is the entire set of individuals,
objects or items, living or non-living, that are to be observed in a given
problem situation. Consider a single toss of a coin. There are two
outcomes, Head (H) and Tail (T). Hence, the population consists
of {H, T}. In throwing a Ludo die the population is {1, 2, 3, 4, 5, 6}. It
should be noted that a population could be finite or infinite. The number of
students registered for SSC 201 is a finite population because the counting
process can end. On the other hand, the population of stars in the sky is
treated as infinite.
Variable: is a feature possessed by the members of a population e.g. age,
weight, height etc. The variable may take on different values, which may be
integers or any kind of real numbers. A variable can be discrete or
continuous. A discrete variable takes on countable values, each of which
can be identified exactly e.g. the number of oranges, size of family, size of
shoe. The values of a discrete variable could be whole numbers such as
0, 1, 5, 2, 7 or other exactly identifiable values such as 5½ and 2½.
A continuous variable is one which can assume any value within a given
interval. It can take any real number, so its exact value cannot be stated.
Hence a continuous variable can only be measured, and we usually
approximate or estimate its value e.g. distance, height etc.
Parameter: is a descriptive characteristic of a population which helps to
summarise information about the population with regards to the variable
under study e.g. the mean and standard deviation of the population.
Statistic: is a descriptive characteristic of a sample e.g. the sample mean.
Statistical inference draws conclusions about parameters from their
corresponding statistics.
Data: are the set of recorded observations made on a sample. Data are
therefore facts, unevaluated symbols or messages. They are usually
values of an attribute. They may be in the form of numeric values (i.e.
quantitative data) or non-numerical perceptions or observations (i.e.
qualitative data) made by man or machine. Data can also be discrete or
continuous. Discrete data can assume only fixed values, each of which can
be identified exactly. Continuous data have practically no single exact
value and can be identified only within a given range.
Information: refers to evaluated, validated or processed data. The pieces
of information we have about an entity are referred to as attributes, each
of which has a value; data that have been processed become useful
information. For instance, a class average score computed from examination
grades provides useful information. To obtain it, the examination scores
must undergo the calculation of the class average. Information is very
important for decision-making, and gathered data will not be useful until
they are processed.
Sample: is a subset of the population observed for the purpose of making
scientific inferences, such as generalisations or conclusions, about the
population. Recall that in order for statistical inference to be valid, it
must be based on a sample that fully reflects the characteristics and
properties of the population from which it is drawn. A representative
sample is ensured by random sampling, whereby each element of the
population has an equal chance of being included in the sample.
Sampling Frame: contains the basic details of all members of the
population from which samples are to be drawn. Statisticians believe that
without a complete sampling frame, a truly random sample can not be
selected. Examples of sampling frames include the voters register, the
telephone directory and so on.
1.4 Reasons why a Sample is preferred to Population in most
statistical enquiries and analysis
Analysis based on a representative sample can be almost as precise as that
based on the entire population
Use of a sample is time-saving and cost-minimizing in terms of human and
material cost
Use of the population to obtain some of its parameters may not be feasible
i.e. not practicable, especially with an infinite population (i.e. a
population whose number is too large to be known) or when the observation
process is destructive e.g. in testing the efficacy of a new vaccine or new
raw materials in production.
2.2.1 Steps in using systematic sampling method
Let the population be of size N and let the sample size be n, with the
sampling interval k defined as k = N/n. For instance, if N = 15000 and n =
100, then k = 15000/100 = 150. In case N is not exactly divisible by n, k is
taken as the nearest integer e.g. if N = 20 and n = 3, then 20/3 = 6.667 and
we take k = 7. Then select randomly the first member of the sample from the
first k elements in the sampling frame. In other words, if x is the position
of the 1st element in the sample, then we select x such that 1 ≤ x ≤ k. Other
members of the sample are then selected by choosing every kth element
thereafter. The procedure can be summarized in the following steps.
Step1: assign randomly to every member of the population a number from 1 to
N. If the population size is 15000 then we have x1, x2, x3, ..., x15000,
where x is a member of the population
Step2: determine the number k defined as k = N/n
Step3: select randomly the 1st element of the sample from the first k
elements e.g. if N = 20, n = 4, then k = 20/4 = 5, so the 1st element of the
sample is selected from the first 5 elements
Step4: select every kth element thereafter. In this example, if the 2nd
element is the one randomly selected, the sample will consist of the
elements S2, S7, S12, S17. If 'a' refers to the subscript of the 1st randomly
selected item, then the elements contained in the sample form an arithmetic
progression: $S_a, S_{a+k}, S_{a+2k}, \dots, S_{a+(n-1)k}$.
Note: (i) A systematic sample requires a specification of the sampling frame
(ii) It may not be practicable if the population is infinite (iii) The
sample may be biased in the presence of hidden periodicity
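The four steps can be sketched in Python as follows. This is a minimal illustration rather than a prescribed implementation: the function name `systematic_sample` and the `seed` parameter are assumptions introduced here, and the frame is simply the numbers 1 to 20.

```python
import random

def systematic_sample(population, n, seed=None):
    """Return a systematic sample of up to n items from an ordered frame."""
    N = len(population)
    k = round(N / n)                  # Step 2: sampling interval, nearest integer
    rng = random.Random(seed)
    a = rng.randint(1, k)             # Step 3: random start with 1 <= a <= k
    # Step 4: positions a, a+k, a+2k, ... (1-based) that fall inside the frame
    positions = range(a, N + 1, k)
    return [population[p - 1] for p in positions][:n]

frame = list(range(1, 21))            # Step 1: N = 20 units numbered 1..20
sample = systematic_sample(frame, n=4, seed=1)
print(sample)
```

With N = 20 and n = 4 the interval is k = 5, so whatever the random start, consecutive sample members are exactly 5 positions apart.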
2.3 Stratified Sampling
This is a sampling procedure which involves dividing the population
into a number of non-overlapping sub-populations (strata). We then take a
sample from each stratum by any suitable random method. Stratified
sampling is appropriate when the sample is to be drawn from a
heterogeneous population e.g. a human population with varying economic or
social groups, or a population of automobiles of varying brands. The use of
the stratified sampling technique poses some relevant questions that
require satisfactory answers. These include: (i) What should be the
basis of stratification? (ii) How many strata should be formed? (iii) What
sample size should be allocated to the different strata? (iv) How should
the sample within each stratum be taken?
Usually, the answers to questions (i) and (ii) depend on the good
judgement of the researcher. With regard to question (iii), it is very often
necessary to ensure that the sample sizes are chosen in a way that makes
them proportional to the sizes of the respective strata. This is called
proportional allocation.
In general, if we divide a population of size N into k strata of sizes
N1, N2, N3, ..., Nk and take samples of sizes n1, n2, n3, ..., nk, we say
that the allocation is proportional if n1/N1 = n2/N2 = n3/N3 = ... = nk/Nk,
or nearly so. We can show that ni = (Ni/N) × n for i = 1, 2, 3, ..., k, where
n = n1 + n2 + n3 + ... + nk is the total sample size and
N = N1 + N2 + N3 + ... + Nk.
A stratified sample of size n = 60 is to be taken from a population of
size N = 4000 which consists of 3 strata of sizes N1 = 2000, N2 = 1200 and
N3 = 800. If the allocation is to be proportional, how large a sample must
be taken from each stratum?
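One way to work the exercise is to apply ni = (Ni/N) × n to each stratum in turn. The short Python sketch below does this; the function name `proportional_allocation` is an assumption made for illustration.

```python
def proportional_allocation(strata_sizes, n):
    """Allocate a total sample of size n in proportion to the stratum sizes."""
    N = sum(strata_sizes)                       # N = N1 + N2 + ... + Nk
    return [n * N_i / N for N_i in strata_sizes]

allocation = proportional_allocation([2000, 1200, 800], n=60)
print(allocation)   # [30.0, 18.0, 12.0]
```

So n1 = 30, n2 = 18 and n3 = 12, which add up to the required total of 60. In each stratum ni/Ni = 60/4000 = 0.015.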
2.4 Quota Sampling
In stratified sampling the cost of taking random samples from the
individual strata is often so high that the interviewers are simply
given quotas to fill from the different strata, with very few prescriptions
on how they are to be filled. This kind of sampling is called quota
sampling. Basically, it involves two steps: (i) Classifying the population
into distinct groups (ii) Allocating a quota to each group
For instance, in determining students' attitudes towards an increment in
school fees in a particular institution, an interviewer may be told to select
10 students from college A, 5 students from college B, and 10 students
from college C etc, with the actual selection of those to be interviewed
being left to the discretion of the researcher or interviewer.
A major disadvantage of this method of sampling is the absence of
firm restrictions on the interviewers' choice. Interviewers naturally tend
to select individuals who are most readily available in order to reduce cost
and time. Major advantages of Quota Sampling include the following: (i) It
saves time and it is quick (ii) It is convenient (iii) It is relatively cheap
2.5 Cluster Sampling
A cluster sample is one in which the elements in the target population
are selected in groups rather than individually. As an initial step in
sampling, the groups to be included are selected by simple random sampling
as follows. Each group in the target population is assigned a serial number.
Then, the sample groups are selected by reference to a table of
random numbers.
Cluster sampling is often used when the elements in the population
are not easily identified individually but are grouped i.e. clustered
together, and are more easily identified as members of a cluster. Cluster
sampling is most useful in the absence of a complete sampling frame, unlike
the other methods of sampling described earlier. For instance, if we wish
to study the hourly wage rate of workers in a large metropolitan area, it
would be difficult if not impossible to obtain a listing of all individual
wage earners. However, we could randomly sample the firms in which people
work, each of which represents a cluster of employees that may be included
in the cluster sample.
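The wage-earner example can be sketched in Python. The firm names and employee labels below are hypothetical, and Python's random number generator stands in for the table of random numbers mentioned above.

```python
import random

# Hypothetical clusters: each firm maps to the list of its employees.
firms = {
    "Firm A": ["a1", "a2", "a3"],
    "Firm B": ["b1", "b2"],
    "Firm C": ["c1", "c2", "c3", "c4"],
    "Firm D": ["d1", "d2"],
}

rng = random.Random(0)
# Simple random sample of clusters (firms), not of individual workers.
chosen_firms = rng.sample(sorted(firms), k=2)
# Every employee of a chosen firm enters the cluster sample.
cluster_sample = [worker for firm in chosen_firms for worker in firms[firm]]
print(chosen_firms, cluster_sample)
```

Note that only a list of firms is needed to draw the sample, not a list of every wage earner.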
2.6 Multi-Stage Sampling
Multi-stage sampling involves a sampling procedure of more than
one stage. It first consists of breaking down the population into a set of
distinct groups, from which a number of groups are selected. Each group
selected is broken down into units from which a sample is taken. If we stop
at this stage we have two-stage sampling. Further stages may be added
by breaking each unit into still smaller units, so that we could have 3, 4
or 5 stage sampling.
3.4.2 Bar Chart: A bar chart or graph consists of a set of equally spaced
rectangles drawn in the cartesian plane with equal width but with height
proportional to the frequency of the variable or attribute with which we are
concerned. This set of rectangles can be drawn vertically or horizontally.
A bar chart could be simple, multiple or component in nature. A simple bar
chart comprises a number of equally spaced rectangles. A multiple bar
chart is usually used in the comparison of two or more attributes. For
instance, if two attributes are being compared, we have pairs of rectangles
standing together. A component bar chart comprises bars, each of which is
subdivided into components.
Illustration: (1) present the data of marital status in the above example in
simple bar chart.
(2) the sex distribution of the members of staff in a television station is
given below
Department          Male   Female   Total
Admin. (I)           25      15       40
Programm. (II)       65      30       95
Commercial (III)     45      40       85
News (IV)            35      15       50
Sports (V)           30      10       40
Represent the above data in Multiple and Component bar charts
Multiple bar chart
This is a table in which the possible values of a variable are grouped into
classes and the number of observed values that belong to each class is
recorded. Data organised in a frequency distribution are called grouped
data. In contrast, for ungrouped data every individual observed value of the
random variable is listed separately. The collection of values may be for
either a sample or a population.
Suppose we have a set of raw data which has been collected:
67 73 71 74 61 68 70 66 73 70 68 67 72 69 71 69 76
70 72 71 77 69 71 74 66 68 70 72 72 70 71 70 64 65
70 69 72 75 66 67 70 72 67 70 71 68 66 73 69 67
The scores listed above are not easy to interpret owing to the absence of
any organisation of the data. The table below is the frequency distribution
of the scores above.
Score (%) No. of Students
60-62 1
63-65 2
66-68 13
69-71 20
72-74 11
75-77 3
The advantage of frequency distribution is that such a table makes it
easier to interpret the reported value.
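The tallying that produces the table above can be reproduced with a short Python sketch. The variable names are assumptions; the scores and class limits are those given in the notes.

```python
scores = [
    67, 73, 71, 74, 61, 68, 70, 66, 73, 70, 68, 67, 72, 69, 71, 69, 76,
    70, 72, 71, 77, 69, 71, 74, 66, 68, 70, 72, 72, 70, 71, 70, 64, 65,
    70, 69, 72, 75, 66, 67, 70, 72, 67, 70, 71, 68, 66, 73, 69, 67,
]
classes = [(60, 62), (63, 65), (66, 68), (69, 71), (72, 74), (75, 77)]

# Count how many scores fall within each pair of class limits (inclusive).
frequencies = [sum(lo <= s <= hi for s in scores) for lo, hi in classes]

for (lo, hi), f in zip(classes, frequencies):
    print(f"{lo}-{hi}: {f}")
```

The counts come out as 1, 2, 13, 20, 11 and 3, matching the table, and they add up to the 50 raw scores.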
4.2 Features of a Good Frequency Distribution Table
Class Limit: for each class in a frequency distribution, the lower and upper
class limits identify the values included in the class e.g. the class limits
for the 1st class of scores reported in the table above are 60-62% inclusive.
Class Frequency: this indicates the number of scores in each class of
the frequency distribution e.g. the frequency for the class 75-77 is 3.
Class Interval and Width: this is the difference between the upper and
lower boundaries of each class. In other words, the class interval
indicates the range of values included in each class. Thus, the class
interval is determined by subtracting the lower class boundary (BL) from the
upper class boundary (BU) for each class i.e. c = BU – BL
Class Boundary: the class boundary is defined as the point half-way
between the upper limit of one class and the lower limit of the next class. It
is calculated by applying the following rules
Lower Class Boundary = Lower Class limit – ½D
Upper Class Boundary = Upper Class limit + ½D
where D is the common difference between the upper class limit of any
class and the lower class limit of the next class.
Class Mid-Point: also known as the class mark, this is the middle value of
each class and is located half-way between the upper limit or boundary
and the lower limit or boundary of each class as reported in the table
below: M = BL + 0.5c
Class boundaries are used for calculation and in drawing the graph of the
distribution
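As a check on these rules, the following Python sketch computes the boundaries, width and mid-point for the first class of scores (60-62, followed by 63-65). The function name `class_summary` is an assumption, and the width is taken as the upper boundary minus the lower boundary.

```python
def class_summary(lower_limit, upper_limit, next_lower_limit):
    """Boundaries, width and mid-point of one class."""
    D = next_lower_limit - upper_limit      # gap between adjacent class limits
    B_L = lower_limit - D / 2               # lower class boundary
    B_U = upper_limit + D / 2               # upper class boundary
    c = B_U - B_L                           # class width
    M = B_L + 0.5 * c                       # class mid-point
    return B_L, B_U, c, M

result = class_summary(60, 62, 63)
print(result)   # (59.5, 62.5, 3.0, 61.0)
```

So the class 60-62 has boundaries 59.5 and 62.5, width 3 and mid-point 61.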
4.3 Relative Frequency Distribution: is one in which the number of
observations in each class is converted into a relative frequency by
dividing it by the total number of observations in the entire distribution.
Each relative frequency is thus a proportion.
Score (%)   Absolute Frequency (f)   R.F
60-62        1                       0.02
63-65        2                       0.04
66-68       13                       0.26
69-71       20                       0.40
72-74       11                       0.22
75-77        3                       0.06
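The relative frequencies in the table can be recomputed by dividing each class frequency by the total of 50, as in this small Python sketch (variable names are assumptions).

```python
frequencies = {"60-62": 1, "63-65": 2, "66-68": 13,
               "69-71": 20, "72-74": 11, "75-77": 3}

total = sum(frequencies.values())                       # 50 observations
relative = {cls: f / total for cls, f in frequencies.items()}

for cls, rf in relative.items():
    print(f"{cls}: {rf:.2f}")
```

Because each value is a proportion of the whole, the relative frequencies add up to 1.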
$\sum_{i=1}^{5} X_i^2 = X_1^2 + X_2^2 + X_3^2 + X_4^2 + X_5^2$
$\sum_{i=1}^{n} i = 1 + 2 + 3 + \dots + (n-1) + n$
$\sum_{i=1}^{4} i^2 = 1^2 + 2^2 + 3^2 + 4^2$
Note that “i” does not have to begin at 1 unless otherwise specified. “i”
always increases in steps of 1. Using the notation we also have:
$\sum_{i=1}^{4} a = a + a + a + a = 4a$
9 2 18
Σfi = 30, Σfixi = 161
$\bar{x} = \sum f_i x_i / \sum f_i = 161/30 = 5.3667$
4.6.1 Mean of a grouped frequency distribution
For a continuous frequency distribution or a grouped discrete
distribution, we clearly can not use the previous method because each class
does not have a single distinct value of x but a range of values. What we do
here is simply take the mid-point of each class to represent the x value for
the class and proceed in the usual way.
Illustration:
Class        Mid-point (x)   Frequency (f)   fx
5.00-5.49    5.25            12              63
5.50-5.99    5.75            32              184
6.00-6.49    6.25            11              68.75
6.50-6.99    6.75             8              54
7.00-7.49    7.25             2              14.5
Σfi = 65, Σfixi = 384.25
$\bar{x} = \sum f_i x_i / \sum f_i = 384.25/65 = 5.91$
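The grouped mean in this illustration can be verified with a few lines of Python, using the class mid-points as the x values (variable names are assumptions).

```python
midpoints   = [5.25, 5.75, 6.25, 6.75, 7.25]   # class mid-points (x)
frequencies = [12, 32, 11, 8, 2]               # class frequencies (f)

sum_f  = sum(frequencies)
sum_fx = sum(f * x for f, x in zip(frequencies, midpoints))
mean = sum_fx / sum_f

print(sum_f, sum_fx, round(mean, 2))   # 65 384.25 5.91
```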
4.7 The Median
The Median of a set of numbers x1, x2, ... , xn is defined as the middle
value of the set when arranged in size order. If the set has even number of
items, the median is taken as the mean of the middle two.
The median is a measure of central tendency i.e. the observation that
occupies the middle position in an array of values. Determination of the
median requires that the data be arranged in either ascending or descending
order. For N ordered observations, when N is odd the median is
Me = ((N+1)/2)th item. When N is even, Me is determined by finding the mean
of the two middle values, i.e. the (N/2)th and (N/2 + 1)th items.
Illustration:
Suppose we have the following data: 44, 40, 79, 42, 51, 59, 71, 44, 60, 65,
45. Arranged in ascending order these are 40, 42, 44, 44, 45, 51, 59, 60,
65, 71, 79. The median value (Me) = 51
Suppose we have the following set of data: 40, 42, 44, 44, 45, 51, 59, 60.
Me = (44 + 45)/2 = 44.5
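Both cases (an odd and an even number of observations) can be checked with a small Python function. The function name `median` is an assumption; it sorts the data first and then picks the middle item or the mean of the two middle items.

```python
def median(values):
    """Median of raw data: middle item, or mean of the two middle items."""
    ordered = sorted(values)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        return ordered[middle]                            # the ((n+1)/2)th item
    return (ordered[middle - 1] + ordered[middle]) / 2    # mean of the two middle items

odd_case = median([44, 40, 79, 42, 51, 59, 71, 44, 60, 65, 45])
even_case = median([40, 42, 44, 44, 45, 51, 59, 60])
print(odd_case, even_case)   # 51 44.5
```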
For a discrete frequency distribution taking the values x1, x2, x3, ..., xn
with corresponding frequencies f1, f2, ..., fn, the median is the
((Σf+1)/2)th item. If Σf is large, it is usually found convenient to
include a column of cumulative frequency when calculating the median for a
discrete frequency distribution.
Find the median of the frequency distribution
X 0 1 2 3 4 5 6
f 5 5 10 20 30 20 10
Solution
x f Cf
0 5 5
1 5 10
2 10 20
3 20 40
4 30 70
5 20 90
6 10 100
Σf = 100
Median = ((N+1)/2)th item = ((100+1)/2)th item = 50.5th item = 4. The median
is 4 because the 50.5th item falls at x = 4
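The cumulative-frequency method can be sketched in Python as below. The lookup simply takes the first x whose cumulative frequency reaches the median position, which is a simplification that works here because the 50th and 51st items both fall at the same x.

```python
from itertools import accumulate

x = [0, 1, 2, 3, 4, 5, 6]
f = [5, 5, 10, 20, 30, 20, 10]

cf = list(accumulate(f))            # cumulative frequency column
position = (cf[-1] + 1) / 2         # the ((N+1)/2)th item, here 50.5

# First x whose cumulative frequency reaches the median position.
median_x = next(xi for xi, c in zip(x, cf) if c >= position)
print(cf, position, median_x)
```

The cumulative frequency passes 50.5 in the row x = 4 (it rises from 40 to 70 there), so the median is 4.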
When given a continuous or grouped discrete distribution, you can
only estimate the value of the median. The median class can be identified
as the class that contains the ((N+1)/2)th item. The median value is then
obtained by using the formula: $M_e = L + \left[\frac{N/2 - F}{f_m}\right]c$,
where: L = lower class boundary of the median class, N = number of
observations in the data set i.e. total frequency, F = sum of the
frequencies up to but not including the median class, fm = frequency of the
median class, c = size of the interval of the median class
Illustration: consider the following frequency distribution of scores of
students in an examination. Obtain the median score
Score             60-62  63-65  66-68  69-71  72-74  75-77
No of students      1      2     13     20     11      3
Solution
X       F    Class boundary   Cum. Freq.
60-62    1   59.5-62.5         1
63-65    2   62.5-65.5         3
66-68   13   65.5-68.5        16
69-71   20   68.5-71.5        36
72-74   11   71.5-74.5        47
75-77    3   74.5-77.5        50
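The remaining working for this illustration is not shown in the notes, so the following Python sketch applies the grouped-median formula to the table as a hedged completion. The median class is 69-71, since the cumulative frequency first exceeds (N+1)/2 = 25.5 there, giving L = 68.5, F = 16, fm = 20 and c = 3. The function name `grouped_median` is an assumption.

```python
def grouped_median(L, N, F, f_m, c):
    """Median estimate for grouped data: L + ((N/2 - F) / f_m) * c."""
    return L + (N / 2 - F) / f_m * c

# Median class 69-71: lower boundary 68.5, 16 scores below it,
# 20 scores in it, class width 3; N = 50 scores in total.
me = grouped_median(L=68.5, N=50, F=16, f_m=20, c=3)
print(round(me, 2))   # 69.85
```

On these figures the estimated median score is about 69.85.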
Recall that median = 42.5th item. Hence, from the graph Me = 37
(approximately). Usually, the median estimate is a better estimate provided
we draw a smooth curve through the plotted point.
4.8 Quantiles
The median has been defined as the ‘middle’ value of a set of
numbers arranged in size order. When applied to a frequency distribution,
you can think of the median as splitting the area under a frequency curve
into 2 equal portions as in the figure below.
Fig. A:
Note: (i) The broken vertical lines represent deciles not shown (ii) The 5th
decile, D5, again coincides with the median
Since the deciles split a set or distribution into 10 equal portions,
then for a size-ordered distribution D1 = 1/10 (n + 1)th item; D2 = 2/10 (n + 1)th
item; D3 = 3/10 (n + 1)th item; ... Di = i/10 (n + 1)th item for i = 1, 2, 3, ..., 9.
Percentiles: the 99 values that split a distribution into 100 equal portions
are known as percentiles and are represented by P1, P2, P3, ..., P99. Again,
P50 is the median. It could be cumbersome to fully illustrate percentiles on
the frequency curve but the following graphical illustration might do.
Fig. D:
The dotted vertical lines represent the percentiles not shown. Since
the percentiles split a set or distribution into 100 equal parts, P1 = 1/100 (n +
1)th item; P2 = 2/100 (n + 1)th item; P3 = 3/100 (n + 1)th item ... Pi = i/100 (n + 1)th
item, where i = 1, 2, 3, ..., 99.
Notice that P50 = 50/100 (n + 1)th item = ½ (n + 1)th item. Hence P50 =
median. Collectively, all quantities that are defined as splitting a
distribution into a number of equal portions (including the median,
quartiles, deciles and percentiles) are called Quantiles. In general, if a
particular quantile splits a distribution into ‘s’ equal parts, the rth
quantile of the set = (r/s)(n + 1)th item of the size-ordered distribution.
Illustration: the set of observations below gives the grades on a subject for
a class of 40 students. Find: (i) the 1st quartile (ii) D3 and (iii) the sixtieth percentile
7 5 6 2 8 7 6 7 3 9 10 4 5 5 4 6 7 4 8 2
3 5 6 7 9 8 2 4 7 9 4 6 7 8 3 6 7 9 10 5
x f fx Cf
2 3 6 3
3 3 9 6
4 5 20 11
5 5 25 16
6 6 36 22
7 8 56 30
8 4 32 34
9 4 36 38
10 2 20 40
Total Σf = 40 Σfx = 240
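The three quantiles asked for can be located with the general rule "rth quantile = (r/s)(n + 1)th item". The Python sketch below is one possible reading of that rule: the function name `quantile` is an assumption, and so is the linear interpolation used for fractional positions, which the notes do not specify.

```python
import math

grades = [7, 5, 6, 2, 8, 7, 6, 7, 3, 9, 10, 4, 5, 5, 4, 6, 7, 4, 8, 2,
          3, 5, 6, 7, 9, 8, 2, 4, 7, 9, 4, 6, 7, 8, 3, 6, 7, 9, 10, 5]

def quantile(data, r, s):
    """r-th quantile when the ordered data are split into s equal parts."""
    ordered = sorted(data)
    position = r / s * (len(ordered) + 1)       # 1-based, may be fractional
    lower = math.floor(position)
    if lower < 1:
        return ordered[0]
    if lower >= len(ordered):
        return ordered[-1]
    fraction = position - lower
    # Assumed rule: interpolate linearly between the neighbouring items.
    return ordered[lower - 1] + fraction * (ordered[lower] - ordered[lower - 1])

q1  = quantile(grades, 1, 4)      # 10.25th item
d3  = quantile(grades, 3, 10)     # 12.3th item
p60 = quantile(grades, 60, 100)   # 24.6th item
print(q1, d3, p60)
```

Reading the same positions off the cumulative frequency column gives the same answers: the 10.25th item falls at x = 4, the 12.3th item at x = 5 and the 24.6th item at x = 7.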
Solution
x f cf Class boundary
70-72 5 5 69.5-72.5
73-75 18 23 72.5-75.5
76-78 42 65 75.5-78.5
79-81 27 92 78.5-81.5
82-84 8 100 81.5-84.5
4.9 The Mode
The mode of a set of values is defined as the value which occurs
with the greatest frequency. For example, for the following set of values 2,
3, 3, 1, 3, 2, 4, 5, 8, 3, 2, 4, 4, 3, the mode is 3 since it occurs most often.
For a continuous grouped frequency distribution,
$\text{mode} = L + \left[\frac{d_1}{d_1 + d_2}\right]c$,
where L = lower class boundary of the modal class, d1 = frequency of the
modal class minus frequency of the immediately preceding class, d2 =
frequency of the modal class minus frequency of the immediately following
class, c = width of class interval.
Note that the quantity d1/(d1 + d2) is always between 0 and 1, ensuring that
the mode lies in the predefined modal class.
Illustration: Estimate the mode of the following distribution
X   9.3-9.7  9.8-10.2  10.3-10.7  10.8-11.2  11.3-11.7  11.8-12.2  12.3-12.7  12.8-13.2
f   2        5         12         18         14         6          4          1
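The worked answer to this illustration is not shown in the notes; the Python sketch below applies the grouped-mode formula as a hedged completion. The modal class is 10.8-11.2 (frequency 18), whose lower boundary is 10.75 and whose width is 0.5. The function name `grouped_mode` is an assumption.

```python
def grouped_mode(L, f_prev, f_modal, f_next, c):
    """Mode estimate for grouped data: L + (d1 / (d1 + d2)) * c."""
    d1 = f_modal - f_prev     # modal frequency minus preceding class frequency
    d2 = f_modal - f_next     # modal frequency minus following class frequency
    return L + d1 / (d1 + d2) * c

# Modal class 10.8-11.2: neighbours have frequencies 12 and 14.
mode = grouped_mode(L=10.75, f_prev=12, f_modal=18, f_next=14, c=0.5)
print(round(mode, 2))   # 11.05
```

Here d1 = 6 and d2 = 4, so the estimate is 10.75 + 0.6 × 0.5 = 11.05, which lies inside the modal class as expected.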
Sk = 0
A distribution is positively skewed if the right tail is longer. The mean is
then greater than the median
Sk = +ve
A distribution is negatively skewed if the left tail is longer. Then, mode >
median > mean
Sk = -ve
The difference in the shape of the curves can be made more apparent when
the 3 curves are superimposed on the same graph as shown below.
5.0 Measure of Dispersion
Dispersion refers to the variability or spread of the data. The measures
of dispersion include the range, the average (mean) deviation, the standard
deviation and the variance.
5.1 Range
The range is the simplest of all measures of dispersion and can be
calculated very easily and quickly. The range of a set of numbers is the
difference between the largest and smallest numbers in the set. For a
grouped frequency distribution, an estimate of the range is the
difference between the lower class boundary of the 1st class and the
upper class boundary of the last class.
5.1.1 Illustration: the range of the set 2, 3, 8, 9, 7, 5, 3, 8, 9, 2, 4 is 9 – 2
= 7, since 9 is the largest and 2 is the smallest number.
The range is particularly useful in cases where we need to calculate the
dispersion of a very small set of numbers and other techniques would be far
too time-consuming. However, the simplicity of this measure, in particular
the fact that it uses only the two extreme values (ignoring all others),
precludes its use in any extensive analysis.
5.2. Mean Deviation
The mean deviation from the mean of a set of numbers x1, x2, x3, ..., xn
with arithmetic mean $\bar{x}$ is defined as:
$\text{Mean deviation} = \frac{1}{n}\sum_{i=1}^{n} |x_i - \bar{x}|$, where i = 1, 2, 3, ..., n and
$|x_i - \bar{x}|$, the positive difference between $x_i$ and $\bar{x}$, is called the modulus of $x_i - \bar{x}$
5.2.1 Illustration: Consider the set of data 2, 3, 5, 3, 4, 1
x     x – x̄    |x – x̄|
2     -1       1
3      0       0
5      2       2
3      0       0
4      1       1
1     -2       2
Σx = 18        Σ|x – x̄| = 6
Mean (x̄) = Σx/n = 18/6 = 3; Mean deviation = 1/6 (6) = 1
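The same calculation can be written as a short Python sketch (variable names are assumptions): find the mean, take the absolute deviation of each value from it, and average those deviations.

```python
data = [2, 3, 5, 3, 4, 1]

mean = sum(data) / len(data)                                     # 18 / 6
mean_deviation = sum(abs(x - mean) for x in data) / len(data)    # 6 / 6

print(mean, mean_deviation)   # 3.0 1.0
```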
Solution
$\bar{x} = \sum fx / \sum f = 69.8$, $S = \sqrt{475.4/50} = 3.08$
5.5 Variance
It is the square of the standard deviation i.e. V = S². For example, if
S = 3.08, then V = 3.08² = 9.49.
The following procedure is used in finding the standard deviation of
grouped data:
Calculate the mean
Calculate the deviation of each observation (class mid-point) from the mean
Square the deviations
Multiply each squared deviation by the frequency of its class
Apply the formula for standard deviation
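The procedure can be sketched in Python. The figures quoted above (mean 69.8 and S = √(475.4/50) = 3.08) are consistent with the examination-score distribution used earlier in these notes, so the sketch assumes that data set, with class mid-points 61, 64, ..., 76. Variable names are assumptions.

```python
import math

midpoints   = [61, 64, 67, 70, 73, 76]      # mid-points of classes 60-62 ... 75-77
frequencies = [1, 2, 13, 20, 11, 3]

n = sum(frequencies)
mean = sum(f * x for f, x in zip(frequencies, midpoints)) / n          # step 1
# Steps 2-4: deviation from the mean, squared, weighted by class frequency.
sum_sq = sum(f * (x - mean) ** 2 for f, x in zip(frequencies, midpoints))
s = math.sqrt(sum_sq / n)                                              # step 5
variance = s ** 2

print(round(mean, 2), round(sum_sq, 1), round(s, 2))   # 69.82 475.4 3.08
```

Squaring the unrounded standard deviation gives a variance of about 9.51. The value 9.49 quoted above comes from squaring the rounded figure 3.08.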
by definition, it is a subset of every set.
Equivalent Set: this occurs when the elements of one set are the same as
those of another set without regard to order. Given E = {3, 6, 7} and
S = {6, 3, 7}, then E and S are equivalent.
The order of a set: the order of a set A written as n(A) is defined as the
number of elements in set A.
Set Notation and Venn Diagram
Intersection of sets: the set of all elements which belong to both A and B
is the intersection of A and B i.e. A ∩ B = {x : x ∈ A, x ∈ B}. A ∩ B is
illustrated in the diagram below
Disjoint sets: two sets A and B are said to be disjoint if they have no
element in common. That is to say that A ∩ B = Ø
Associative Law - A ∪ (B ∪ C) = (A ∪ B) ∪ C; A ∩ (B ∩ C) = (A ∩ B) ∩ C
Distributive Law - A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C)
Law of Complement - A ∩ A′ = Ø; A ∪ A′ = U; (A′)′ = A
Idempotent Law - A ∪ A = A; A ∩ A = A
De Morgan Law - (A ∪ B)′ = A′ ∩ B′; (A ∩ B)′ = A′ ∪ B′
6.2 Probability
Probability is the study of random or non-deterministic events.
The origin of probability goes back to the interest of European
mathematicians in games of chance in the latter part of the 17th
century. Since then the concept of probability has become the foundation
for the development of the techniques of statistical inference that are
used in many fields of basic and applied research, including economic
analysis and managerial decision-making.
empty set: A ∩ B = Ø. Simply put, mutually exclusive events can not occur
simultaneously.
This approach permits the determination of probability values before
any sample results are observed. For this reason the approach is called
the a priori approach i.e. before the experiment. The classical or a priori
approach to probability can only be applied to games of chance (such as
tossing a balanced coin, rolling a fair die or picking a card from a
well-shuffled pack) where we can determine, without experimentation, the
probability that an event will occur. In real-world problems of
economics and business we often can not assign probability values a priori
(without experimentation) and the classical approach cannot be used. The
classical approach to defining probability requires equally likely
possibilities. However, there are many situations in real life in which the
possibilities that arise cannot be regarded as equally likely.
A′ or Ac, the complement of A, is the event that occurs if A does not occur
6.3.1 Illustration: Consider the toss of a die. The sample space consists of
six possible numbers,
S = {1, 2, 3, 4, 5, 6}. Let A be the event that an even number occurs, B that
an odd number occurs and C that a prime number occurs. Therefore, A = {2, 4,
6}, B = {1, 3, 5} and C = {2, 3, 5}.
Find (i) A ∪ C (ii) B ∩ C and (iii) Cc
A ∪ C = {2, 3, 4, 5, 6}; B ∩ C = {3, 5}; Cc = {1, 4, 6}
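These set operations can be checked with Python's built-in set type. The sketch takes C = {2, 3, 5}, the prime numbers on a die (1 is not a prime), and forms the complement relative to the sample space S.

```python
S = {1, 2, 3, 4, 5, 6}      # sample space for one toss of a die
A = {2, 4, 6}               # an even number occurs
B = {1, 3, 5}               # an odd number occurs
C = {2, 3, 5}               # a prime number occurs

union_AC = A | C            # A or C occurs
intersection_BC = B & C     # both B and C occur
complement_C = S - C        # C does not occur

print(union_AC, intersection_BC, complement_C)
```

The results are {2, 3, 4, 5, 6}, {3, 5} and {1, 4, 6} respectively.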
6.4 Probability of a single event: If an event A can occur in nA ways out of
a total of N possible and equally likely outcomes, the probability that
event A will occur is given by P(A) = nA/N. The probability of event A so
described can be visualized using a Venn diagram. In the following figure,
event A is shown inside a rectangle whose total area represents all possible
outcomes.
does not preclude the occurrence of the other event or vice versa. Then,
P(A ∪ B) = P(A) + P(B) – P(A ∩ B), i.e. P(A or B) = P(A) + P(B) – P(A ∩ B)
The probability of A ∩ B is deducted to avoid double counting.
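As a check on the addition rule, the Python sketch below uses a fair die with two events that are not mutually exclusive: A (an even number) and C (a prime number). The helper `P` is an assumption introduced here; it applies the classical definition P(event) = number of favourable outcomes / N.

```python
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}          # equally likely outcomes of a fair die
A = {2, 4, 6}                   # an even number occurs
C = {2, 3, 5}                   # a prime number occurs

def P(event):
    """Classical probability: favourable outcomes over total outcomes."""
    return Fraction(len(event), len(S))

lhs = P(A | C)                      # P(A or C) counted directly
rhs = P(A) + P(C) - P(A & C)        # addition rule
print(lhs, rhs)                     # 5/6 5/6
```

The outcome 2 belongs to both events, so P(A) + P(C) alone would count it twice. Subtracting P(A ∩ C) = 1/6 removes the double count.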
6.7 Rule of multiplication for independent events
Two events A and B are said to be independent if the occurrence of
A is not connected in any way to the occurrence of B. Then the joint
probability of A and B is P(A ∩ B) = P(A) · P(B)
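The multiplication rule can be illustrated with a short sketch that enumerates two tosses of a fair die. The events (A = the first toss is even, B = the second toss is a six) are hypothetical examples chosen so that one toss cannot influence the other.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely (first toss, second toss) pairs.
outcomes = list(product(range(1, 7), repeat=2))

def prob(condition):
    """Share of the equally likely outcomes that satisfy the condition."""
    favourable = [o for o in outcomes if condition(o)]
    return Fraction(len(favourable), len(outcomes))

p_a = prob(lambda o: o[0] % 2 == 0)                   # A: first toss is even
p_b = prob(lambda o: o[1] == 6)                       # B: second toss is a six
p_ab = prob(lambda o: o[0] % 2 == 0 and o[1] == 6)    # A and B together
print(p_ab, p_a * p_b)                                # 1/12 1/12
```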
6.7.1 Illustration:
Let A and B be two events. Find an expression and exhibit the Venn
diagram for the event that (i) A but not B occurs (ii) either A or B but not
both occurs.
Since A but not B occurs, shade the area of A outside B. This is the area that
represents the intersection of A and B′. The expression for this event is
A ∩ B′ and the Venn diagram is drawn below.
"Either A or B but not both" implies that exactly one of the two events
occurs
7.0 RANDOM VARIABLES AND PROBABILITY DISTRIBUTION
A random variable is a variable whose values are associated with
some probability or chance of being observed. For instance, in one roll of a
fair die, we have six mutually exclusive outcomes defined by the sample
space S = {1, 2, 3, 4, 5, 6}, and each sample point, i.e. each of the
outcomes, has a probability of occurrence of 1/6: P(1) = 1/6, P(2)
= 1/6, P(3) = 1/6, P(4) = 1/6, P(5) = 1/6, P(6) = 1/6; x = 1, 2, 3, 4, 5, 6. The
outcome x of a roll of a die is called a random variable.
A continuous variable is one that can assume any value within a
given interval. A continuous variable can be measured with any degree of
accuracy simply by using smaller units of measurement, e.g. time, income,
distance, waves and temperature. A recorded value of a continuous
variable is never exact: to say that a production time is 10 hours
means anywhere between 9.5 and 10.5 hours, i.e. 10 hours rounded to the
nearest hour. For the same reason we do not assign a probability to any
single value that a continuous variable might take. Instead we construct
ranges of values for the variable and assign a probability to each of
these ranges. For example, suppose x is a continuous random variable
with values from 0 to 5; we could have probabilities assigned as follows.
Range of x     0-1   1-2   2-3   3-4   4-5
Probability    0.2   0.2   0.4   0.1   0.1
of the form:
X    X1   X2   ...   Xn   Total
F    F1   F2   ...   Fn   ΣFi
Probability distribution on the other hand is of the form:
X    X1   X2   ...   Xn   Total
P    P1   P2   ...   Pn   ΣPi
In other words, let us associate a random variable x with the
score shown after a single throw of an unbiased die. If we
throw the die 36 times we might obtain the following tabulated result
as an empirical distribution.
Table I
X 1 2 3 4 5 6 Total
f 4 6 8 7 5 6 36
Table II
X 1 2 3 4 5 6 Total
P 1/6 1/6 1/6 1/6 1/6 1/6 1
the sum of the frequencies to have the following table
Table III
X 1 2 3 4 5 6 Total
Proportion 4/36 6/36 8/36 7/36 5/36 6/36 1
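The step from Table I to Table III, and the comparison with Table II, can be sketched in a few lines of Python. The frequencies are those of Table I; the variable names are illustrative only. Note that `Fraction` prints each proportion in lowest terms, so 4/36 appears as 1/9.

```python
from fractions import Fraction

scores = [1, 2, 3, 4, 5, 6]
frequencies = [4, 6, 8, 7, 5, 6]   # Table I: results of 36 throws
total = sum(frequencies)

# Table III: each frequency divided by the sum of the frequencies.
proportions = [Fraction(f, total) for f in frequencies]

# Table II: theoretical probability of each score for an unbiased die.
theoretical = [Fraction(1, 6)] * len(scores)

for x, prop, p in zip(scores, proportions, theoretical):
    print(x, prop, round(float(prop), 3), p)
```

The proportions sum to 1, as the probabilities do, but they differ from 1/6 score by score because they come from only 36 throws.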
When p < ½ the distribution is skewed to the right.
When p > ½ the distribution is correspondingly skewed to the left.
In general, for small n (n ≤ 10), the closer p is to 0 or 1, the more
skewed (right or left) the binomial distribution is. However, as n becomes
larger the skewness diminishes and the distribution becomes more
symmetric.
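These statements about skewness can be checked numerically. The sketch below measures skewness by the standardised third central moment, which is a common measure but is an assumption here, since the notes do not define one. The values n = 10, n = 100 and p = 0.2, 0.5, 0.8 are chosen only for illustration.

```python
from math import comb

def binomial_pmf(n, p):
    """Return [P(X=0), ..., P(X=n)] for a binomial variable."""
    q = 1 - p
    return [comb(n, x) * p**x * q**(n - x) for x in range(n + 1)]

def skewness(pmf):
    """Standardised third central moment of a distribution on 0..n."""
    mean = sum(x * px for x, px in enumerate(pmf))
    var = sum((x - mean) ** 2 * px for x, px in enumerate(pmf))
    third = sum((x - mean) ** 3 * px for x, px in enumerate(pmf))
    return third / var**1.5

skew_low_p = skewness(binomial_pmf(10, 0.2))     # positive: skewed right
skew_half = skewness(binomial_pmf(10, 0.5))      # about zero: symmetric
skew_high_p = skewness(binomial_pmf(10, 0.8))    # negative: skewed left
skew_large_n = skewness(binomial_pmf(100, 0.2))  # closer to zero than n = 10
print(skew_low_p, skew_half, skew_high_p, skew_large_n)
```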
P(X=x) = nCx p^x q^(n-x); P(X=3) = 5C3 (1/2)^3 (1/2)^(5-3) = 5/16
P(at least 3 heads) = P(X≥3) = P(x=3) + P(x=4) + P(x=5)
= 5C3 (1/2)^3 (1/2)^2 + 5C4 (1/2)^4 (1/2)^1 + 5C5 (1/2)^5 (1/2)^0
= 5/16 + 5/32 + 1/32
= ½
P(at most 3 heads) = P(x≤3) = P(x=0) + P(x=1) + P(x=2) + P(x=3)
= 5C0 (1/2)^0 (1/2)^5 + 5C1 (1/2)^1 (1/2)^4 + 5C2 (1/2)^2 (1/2)^3 + 5C3 (1/2)^3 (1/2)^2
= 1/32 + 5/32 + 10/32 + 10/32 = 13/16 or 0.813
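The binomial arithmetic above can be reproduced exactly with Python's `fractions` module. This is a sketch that assumes the setting implied by the working (n = 5 tosses of a fair coin, p = 1/2); the function name `binomial_pmf` is illustrative.

```python
from fractions import Fraction
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) = nCx * p^x * q^(n-x), with q = 1 - p."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

n, p = 5, Fraction(1, 2)
p_exactly_3 = binomial_pmf(3, n, p)
p_at_least_3 = sum(binomial_pmf(x, n, p) for x in range(3, n + 1))
p_at_most_3 = sum(binomial_pmf(x, n, p) for x in range(0, 4))
print(p_exactly_3, p_at_least_3, p_at_most_3)   # 5/16 1/2 13/16
```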
7.3.2 The Poisson Distribution
The Poisson distribution is another discrete distribution. It is used to
determine the probability of a designated number of successes per unit of
time, when the successes are independent and the average
number of successes per unit of time remains constant. The conditions
that apply to the binomial distribution also apply to the Poisson
distribution. Then,
P(x) = λ^x e^(-λ)/x! for x = 0, 1, 2, 3, ..., where x = designated number of
successes and P(x) = probability of x successes. The Greek letter
λ (lambda) is the average number of successes per unit of time, and e is the
base of natural logarithms, approximately 2.71828.
It can be shown that the mean = λ and the variance = λ. As
with the binomial distribution, the exact shape of the Poisson distribution
depends on the value of the parameter λ. Note that λ can take any positive
value.
Consider λ = 1, then P(x) = 1^x e^(-1)/x! = (1/e)(1/x!) = 1/(e x!)
P(x) = 0.3679/x!
Therefore, P(x=0) = 0.3679/0! = 0.3679; P(x=1) = 0.3679/1! = 0.3679
P(x=2) = 0.3679/2! = 0.1840; P(x=3) = 0.3679/3! = 0.0613;
P(x=4) = 0.3679/4! = 0.0153
P(x=5) = 0.3679/5! = 0.0031
As the value of x increases, the value of each subsequent probability drops
quite rapidly.
λ = 3, P(x) = λ^x e^(-λ)/x! = 3^x e^(-3)/x!. Now e^(-3) = 0.0498, so P(x) = 3^x (0.0498)/x!
hence, when x = 0, P(x) = 3^0 (0.0498)/0! = 0.0498
when x = 1, P(x) = 3^1 (0.0498)/1! = 0.1494
when x = 2, P(x) = 3^2 (0.0498)/2! = 0.2241
Given the value of λ (the mean and variance of the Poisson distribution) we
can find e^(-λ) from the table. Then we substitute the value of e^(-λ) in the
equation P(x) = λ^x e^(-λ)/x! to find P(x). As with the binomial, the exact
shape of the distribution depends on the value of the parameter λ.
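Instead of reading e^(-λ) from a table, the same probabilities can be computed directly. The following is a minimal sketch using Python's standard `math` module; the function name `poisson_pmf` is illustrative.

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    """P(x) = lam^x * e^(-lam) / x! for x = 0, 1, 2, ..."""
    return lam**x * exp(-lam) / factorial(x)

# Tabulate P(0), ..., P(5) for the two values of lambda used above.
for lam in (1, 3):
    row = [round(poisson_pmf(x, lam), 4) for x in range(6)]
    print("lambda =", lam, row)
```

Because the code uses the unrounded value of e^(-λ), a result may differ from the hand calculation in the fourth decimal place.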
7.3.2.1 Illustration: A bank receives on average 5 bad cheques
per day. What is the probability that on a particular day the bank will
receive (a) no bad cheques (b) 3 bad cheques (c) fewer than 3 bad cheques
(d) at least 2 bad cheques?
7.3.2.2 Solution
λ = 5, P(x) = 5^x e^(-5)/x!, where x = number of bad cheques
P(x = 0) = 5^0 e^(-5)/0! = 0.00674
P(x = 3) = 5^3 e^(-5)/3! = 0.1404
P(x < 3) = P(x=0) + P(x=1) + P(x=2)
= 0.00674 + 5^1 e^(-5)/1! + 5^2 e^(-5)/2! = 0.00674 + 0.0337 + 0.08425 = 0.12469
P(x ≥ 2) = P(x=2) + P(x=3) + P(x=4) + ... = 1 - P(x<2) = 1 - [P(x=0) + P(x=1)]
= 1 - (0.00674 + 0.0337) = 0.95956
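The four answers to the bad-cheque illustration can be verified with a short sketch. It assumes only the Poisson formula and λ = 5 from the problem statement; the variable names are illustrative.

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    """P(x) = lam^x * e^(-lam) / x!"""
    return lam**x * exp(-lam) / factorial(x)

lam = 5  # average number of bad cheques per day

p_none = poisson_pmf(0, lam)                                 # (a)
p_three = poisson_pmf(3, lam)                                # (b)
p_less_than_3 = sum(poisson_pmf(x, lam) for x in range(3))   # (c)
# (d) uses the complement, since "at least 2" has no upper limit.
p_at_least_2 = 1 - (poisson_pmf(0, lam) + poisson_pmf(1, lam))

print(p_none, p_three, p_less_than_3, p_at_least_2)
```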
7.3.3 Normal Distribution
A continuous variable x having a probability density function f(x)
of the form
f(x) = [1/(σ√(2π))] e^(-(x - µ)²/(2σ²))
for -∞ < x < ∞ and -∞ < µ < ∞, where σ > 0, is said to have a
normal distribution, where f(x) is the height of the normal curve, e = the
constant 2.7183, π = the constant 3.1416, µ = the mean of the distribution
and σ = the standard deviation of the distribution.
The normal distribution is the most commonly used of all the
probability distributions in statistical analysis. This is because many
distributions found in nature and industry are approximately normal. Some
examples are the IQ, weight and height of a large number of people and the
variation in dimension of a large number of parts produced by a machine.
A convenient shorthand notation for a normally distributed random
variable is X ~ N(µ, σ²), read as "X is distributed normally with
parameters µ and σ²". Suppose X ~ N(8, 25); then the mean = 8, the variance
σ² = 25 and the standard deviation σ = √25 = 5.
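As a sketch, the height of the normal curve can be evaluated for the example X ~ N(8, 25). The code uses the standard normal density f(x) = [1/(σ√(2π))] e^(-(x - µ)²/(2σ²)); the function name `normal_pdf` is illustrative.

```python
from math import exp, pi, sqrt

def normal_pdf(x, mu, sigma):
    """Height f(x) of the normal curve with mean mu and std deviation sigma."""
    return exp(-((x - mu) ** 2) / (2 * sigma**2)) / (sigma * sqrt(2 * pi))

mu, variance = 8, 25
sigma = sqrt(variance)   # N(mu, sigma^2) quotes the variance, so sigma = 5

height_at_mean = normal_pdf(mu, mu, sigma)   # the maximum of the curve
left = normal_pdf(mu - 5, mu, sigma)         # one sigma below the mean
right = normal_pdf(mu + 5, mu, sigma)        # one sigma above the mean
print(sigma, height_at_mean, left, right)
```

The heights one standard deviation either side of the mean are equal, and both are lower than the height at the mean.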
7.3.3.1 Features of a Normal Distribution
The curve is bell-shaped and symmetrical about a vertical axis through the
mean µ.
The mode, which is the point on the horizontal axis where the curve
reaches its maximum, occurs at X = µ.
The normal curve approaches the horizontal axis asymptotically as it
proceeds in either direction away from the mean.
The total area under the curve and above the horizontal axis is equal to 1.
This feature is illustrated graphically below.
Figure A
Depending on the values of the parameters µ and σ², the curve will
change in appearance as follows.
Figure B below shows the effect of altering µ with σ fixed and
Figure C shows the effect of altering σ with µ fixed.
7.3.3.3 Example:
Find Pr(Z < 1.64)
The area is shaded. Therefore, Pr(Z < 1.64) = 0.9495 from the table.
Find Pr(Z < 0.1)
However, we may be asked to find probabilities of other forms,
such as Pr(Z ≥ x), Pr(Z < a) and others. We do this by
appropriately transforming the given problem into the form Pr(Z < x), x ≥ 0.
The following relationships, which use the symmetry of the normal curve,
enable us to calculate all types of probability connected with the
standard normal distribution.
7.3.3.4 Illustration 1
Find the probability of the form Pr(Z > a); a ≥ 0
The probability of the shaded area plus the unshaded area is 1, so
Pr(Z > a) = 1 - Pr(Z < a). For example, Pr(Z > 0.5) = 1 - Pr(Z < 0.5) =
1 - 0.6915 = 0.3085
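The table lookup in this illustration can be reproduced without a table. The sketch below assumes the standard identity Pr(Z < z) = ½[1 + erf(z/√2)] for the standard normal variable, which is not stated in the notes; the function names are illustrative.

```python
from math import erf, sqrt

def std_normal_cdf(z):
    """Pr(Z < z) for a standard normal variable Z."""
    return 0.5 * (1 + erf(z / sqrt(2)))

def upper_tail(a):
    """Pr(Z > a) = 1 - Pr(Z < a)."""
    return 1 - std_normal_cdf(a)

p = upper_tail(0.5)
print(round(std_normal_cdf(0.5), 4), round(p, 4))   # 0.6915 0.3085
```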
7.3.3.5 Illustration 2
Find the probability of the form Pr(Z < -a)
i.e. Pr(Z > -a) ≡ Pr(Z < a) (Rule 2)
7.3.3.6 Illustration 3:
Find the probability of the form Pr(Z < -a)
The required area is Area B. Pr(b < Z < c) = Pr(Z < c) - Pr(Z < b) (Rule 4)
In the above figure, we have assumed that both b and c are positive, but
the same argument applies if either or both of b and c are negative.
7.3.3.8 Examples:
Find Pr(0.95 < Z < 1.36)
Using rule 4, Pr(0.95 < Z < 1.36) = Pr(Z < 1.36) - Pr(Z < 0.95)
= 0.9131 - 0.8289 = 0.0842
Find Pr(-1.50 < Z < 2.50)
Pr(-1.50 < Z < 2.50) = Pr(Z ≤ 2.5) - Pr(Z ≤ -1.5)
= Pr(Z ≤ 2.5) - [1 - Pr(Z ≤ 1.50)]
= 0.9938 - (1 - 0.9332) = 0.927
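Both interval probabilities can be checked with a short sketch of rule 4. It assumes the standard identity Pr(Z < z) = ½[1 + erf(z/√2)], which the notes do not state; the function names are illustrative.

```python
from math import erf, sqrt

def std_normal_cdf(z):
    """Pr(Z < z) for a standard normal variable Z."""
    return 0.5 * (1 + erf(z / sqrt(2)))

def between(b, c):
    """Rule 4: Pr(b < Z < c) = Pr(Z < c) - Pr(Z < b)."""
    return std_normal_cdf(c) - std_normal_cdf(b)

p1 = between(0.95, 1.36)
p2 = between(-1.50, 2.50)
print(p1, p2)
```

The function accepts negative limits directly, so the symmetry step is not needed in code. The computed values can differ from the table-based answers in the fourth decimal place, because the table entries are rounded before they are subtracted.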
Pr(95 < X < 105) = Pr(X < 105) - Pr(X < 95)
For X = 95: Z = (95 - 100)/4 = -5/4 = -1.25
For X = 105: Z = (105 - 100)/4 = 5/4 = 1.25
Hence Pr(95 < X < 105) = Pr(-1.25 < Z < 1.25)
[Figure: Z-scale marked at -1.25, 0 and 1.25]
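The standardisation step can be sketched as follows. The mean of 100 and standard deviation of 4 are inferred from the arithmetic above, since the problem statement itself is not shown here, and the identity Pr(Z < z) = ½[1 + erf(z/√2)] is an assumption not stated in the notes.

```python
from math import erf, sqrt

def std_normal_cdf(z):
    """Pr(Z < z) for a standard normal variable Z."""
    return 0.5 * (1 + erf(z / sqrt(2)))

mu, sigma = 100, 4            # inferred from the worked arithmetic
z_low = (95 - mu) / sigma     # -1.25
z_high = (105 - mu) / sigma   #  1.25

# Pr(95 < X < 105) = Pr(z_low < Z < z_high)
p = std_normal_cdf(z_high) - std_normal_cdf(z_low)
print(z_low, z_high, p)
```

Under these assumptions the probability is roughly 0.79.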
definitions.
Statistical Hypothesis: this is an assumption about the value of a
parameter of the population under consideration.
Null Hypothesis: a statistical hypothesis which can be tested. Often it
is the hypothesis that we are suspicious about and wish to
disprove. The symbol used for the null hypothesis is H0, and we write H0:
µ = µ0, where µ = mean and µ0 = assumed value of the mean.
Alternative Hypothesis: the hypothesis that is automatically accepted if
the null hypothesis is rejected. It is represented by H1. For instance, the
alternative hypothesis could be H1: µ ≠ µ0; µ > µ0; or µ < µ0. The
alternative hypothesis adopted depends on the nature of the problem.
There are two types of alternative hypothesis.
Two-Tailed Test: a two-tailed alternative hypothesis considers any
change in the parameter. The change can be an increase or a decrease. Hence
a two-tailed hypothesis is stated in the form H1: µ ≠ µ0.
One-Tailed Test: a one-tailed alternative hypothesis considers strictly
either an increase or a decrease in the value of the tested parameter, e.g.
H1: µ > µ0 (right-tailed) or H1: µ < µ0 (left-tailed).
Critical Region: the set of possible values of a test statistic
(e.g. Z) that lead us to reject the null
hypothesis. The critical region is also called the rejection region.
Acceptance Region: the set of possible values of a test
statistic that lead us to accept the null hypothesis.
Level of Significance: this is the probability that the test statistic falls
in the critical (rejection) region when H0 is true. It is represented by α.
The most commonly used levels of significance are 5% and 1%. We shall use
the 5% level of significance. At the 0.05 level of significance, the critical
values defining the acceptance and rejection regions are illustrated below
for both two-tailed and one-tailed tests, using the Z-distribution.
Two-tailed test
Right-tailed test
Left-tailed test
Decision Rule: a decision rule for a statistical test specifies the values
of the test statistic that lead to either acceptance or rejection
of the stated null hypothesis. Decision rules are illustrated below for
two-tailed and one-tailed tests at the 5% level of significance.
Decision Rule for a two-tailed test: accept H0 if |Z*| < 1.96. Otherwise
reject H0.
Z* = (x̄ - µ0)/(σ/√n), where x̄ = sample mean, µ0 = value of the mean under H0,
σ = population standard deviation (estimated by the sample standard
deviation when it is unknown) and n = sample size
Decision Rule for a one-tailed test: (a) for a right-tailed test, accept H0 if
Z* < 1.64. Otherwise, reject H0 (b) for a left-tailed test, accept H0 if Z* >
-1.64. Otherwise, reject H0
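The three decision rules at the 5% level can be written as one small function. This is a sketch rather than a standard routine: the function name `decide` and its arguments are hypothetical, the critical values 1.96 and 1.64 are those quoted in the notes, and the left-tailed rule is taken as Z* > -1.64, the mirror image of the right-tailed rule.

```python
def decide(z_star, tail="two", z_two=1.96, z_one=1.64):
    """Return the decision on H0 for a computed test statistic z_star.

    tail is "two", "right" or "left"; the defaults are the 5% critical values.
    """
    if tail == "two":
        accept = abs(z_star) < z_two
    elif tail == "right":
        accept = z_star < z_one
    elif tail == "left":
        accept = z_star > -z_one
    else:
        raise ValueError("tail must be 'two', 'right' or 'left'")
    return "accept H0" if accept else "reject H0"

print(decide(0.71))             # accept H0
print(decide(2.10, "right"))    # reject H0
print(decide(-2.10, "left"))    # reject H0
```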
8.1 Illustration: Suppose we have the following hypotheses to be tested,
given a sample mean x̄ = 7.5:
H0: µ = 8.5; H1: µ ≠ 8.5.
Given a sample of size 50 from a population with standard deviation 10, we
proceed as follows at α = 0.05.
8.1.1 Step 1: Calculate the Z value corresponding to the sample mean
Z* = (x̄ - µ0)/(σ/√n) = (7.5 - 8.5)/(10/√50) = -0.71; |Z*| = 0.71
8.1.2 Step 2:
Determine the acceptance and rejection regions from the Z table. At α = 0.05,
the value of Z that leaves α/2 = 0.025 of the area in each tail is Z0.025 = ±1.96
8.1.3 Step 3:
Set up the decision rule. Accept H0 if |Z*| < 1.96. Otherwise, reject H0
8.1.4 Step 4:
Make the decision by comparing the value of Z* with the tabulated value of Z.
Since |Z*| = 0.71 < 1.96, we accept H0.
8.1.5 Step 5:
Draw the conclusion. Since H0 has been accepted, the sample
mean of 7.5 is not significantly different from the hypothesised value µ =
8.5
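The five steps can be condensed into a short sketch. It takes σ = 10 because Step 1 divides by 10/√50; that reading of the problem data is an assumption, and the function name `z_statistic` is illustrative.

```python
from math import sqrt

def z_statistic(sample_mean, mu0, sigma, n):
    """Z* = (sample mean - hypothesised mean) / (sigma / sqrt(n))."""
    return (sample_mean - mu0) / (sigma / sqrt(n))

# Step 1: compute the test statistic.
z_star = z_statistic(sample_mean=7.5, mu0=8.5, sigma=10, n=50)

# Steps 2 and 3: two-tailed critical value at alpha = 0.05.
critical = 1.96

# Steps 4 and 5: compare and decide.
decision = "accept H0" if abs(z_star) < critical else "reject H0"
print(round(z_star, 2), decision)   # -0.71 accept H0
```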
8.2 Errors in statistical tests
In all types of statistical test, it is possible to make an error. There are
two important errors that we need to be aware of:
Pr(type II error) = Pr(accepting H0 when H1 is true) = β
Compute the sample mean from a random sample and standardize on the
scale
8.4 Example:
8.4.1 Decision rule: Reject H0 if |Z*| > 1.96. Accept H0 if |Z*| < 1.96
8.4.2 Conclusion: Since |Z*| = 1.67 < 1.96, we accept H0 and conclude
that the producer should accept that the breaking strength of the cable is
not significantly different from 5000.