
ssc201 Lecture Note-1

Statistics can be defined in several ways but generally refers to the collection, organization, analysis, and interpretation of numerical data. It is used to make inferences about populations based on samples. There are two main types of statistics: descriptive statistics, which summarizes and describes data through things like averages and graphs; and inferential statistics, which allows generalizing about larger populations based on samples. Inferential statistics requires samples to be representative of populations and probabilities of error to be specified. Statistics is widely applied in fields like economics, business, social sciences, medicine, agriculture, and physical sciences to aid decision-making under uncertainty.

Uploaded by

Adebayo Omojuwa
Copyright
© © All Rights Reserved

1.0 NATURE OF STATISTICS
Several scholars have attempted to describe what the subject
‘Statistics’ is. It should be noted that Statistics is a wide area of
applied mathematics, with its own theorems, symbols and notation. Some of
the definitions given by various scholars include:
• Statistics is the aggregate of facts affected to a marked extent by a
multiplicity of causes, numerically expressed, enumerated or estimated
according to a reasonable standard of accuracy, collected in a systematic
manner for a pre-determined purpose and placed in relation to each other
• Statistics is the science which deals with the collection, analysis and
interpretation of numerical data
• Statistics is concerned with methods for treating numerical data that have
been collected through observations taken in the form of measurements or
counts, so that meaningful conclusions can be drawn from such data
Scholars could offer many more definitions. Three important facts are
common to these definitions:
• Statistics is concerned with the techniques by which information is
collected, organised and interpreted
• Most of the information analysed is quantitative data collected through
the process of sampling
• The essence of the statistical interpretation of processed data is
decision-making under conditions of uncertainty
Given the above background and for the purpose of this course, we
can define Statistics as follows: Statistics refers to the collection,
presentation, analysis and utilization of numerical data to make inferences
and reach decisions in the face of uncertainty (in Economics, business and
other social sciences, and in the biological, agricultural and physical
sciences).
The data collected are usually in the form of a tabulation of
corresponding variables and, as we shall see in due course, such a table can
be represented by a formula or a graph. From a collection of data we make a
table; from the table we may draw a graph, and we may proceed to derive an
algebraic relation between the variables in the form of a formula.
Thus Statistics is concerned with planning a programme of data
collection by the process of sampling, presenting the collected data in
graphs, tables or other forms, analysing the data and drawing conclusions,
which may be valid or otherwise, for appropriate decision-making. Although
the above definition is all-embracing with regard to the areas of
statistical application, we shall lay emphasis on applications in
Economics, other social sciences and business.

1.1 Why study Statistics?


It has been said above that Statistics is a body of procedures and
techniques used to collect, present and analyse data, with applications in
practically every profession today.
Statistical methods are widely used in Economics, for instance, in
assessing the impact of economic policy changes (e.g. taxation and wage
policy changes), in testing the efficiency of alternative production
techniques, in conducting econometric research and in the comparative study
of different economies. For instance, an economist may need to assess,
through field research, the impact of a proposed change in sales taxes in a
given community in terms of the buying patterns of different categories of
consumers. Such an assessment will include interviews with samples of
consumers in each major geographical area and in each income category. As
another example, consider an economic study which seeks to identify an
appropriate mechanism for determining a wage level and to trace the
economic consequences of such a wage. Such a study will invariably involve
data gathering, presentation and analysis using statistical methods.
In the business world, decision-making in differing situations is
aided by specifically determined conclusions. Such conclusions evolve out
of statistical inference. For instance, consider an auditor verifying the
financial records of a firm. Investors need such financial information to
aid their investment decisions. Likewise, the managers of the firm need such
information for operational guidance. It may not be possible for the auditor
to verify the accuracy of every account receivable; the auditor therefore
selects a sample of accounts based on the techniques of Statistics and,
from the sample results, either accepts the accuracy of the stated amount
of accounts receivable or continues with further sampling.
As another example, let us consider the work of the quality control
manager of a firm. To ascertain whether or not the required quality
standards are met, it is practically impossible to examine every
product (sometimes thousands or millions of these products are
produced). To accomplish the quality control objective, the production or
quality control manager may have to test samples of the product and, as a
result of such tests, decide whether or not to make changes in the
production process. The samples to be tested need to be selected by the use
of statistical methods.
Consider again the introduction of a new product to the market by a
firm’s marketing manager. As a first step, the marketing manager could
give samples of the product to consumers for their use and evaluation. As
another step before full-scale marketing, she could carry out test
marketing in a limited geographical area and then analyse the results using
statistical methods to determine the likely level of demand.
As a further illustration of the use of Statistics in business, we
consider the introduction of a new training programme by a firm’s personnel
manager. Before implementing the programme throughout the company, the
personnel manager will have to present the programme to a representative
group, e.g. of sales personnel, and evaluate the results in comparison with
other training and development methods. The process of choosing the
representative group, as well as that of evaluating the results, involves
the use of the knowledge of Statistics.
The knowledge of Statistics also finds application in other social
sciences. The sociologist finds the knowledge of Statistics useful in
analysing the results of a broad rehabilitation programme. Likewise, the
industrial psychologist could use the knowledge of Statistics to examine
workers’ responses to the plant environment. The political scientist could
also use statistical methods of analysis to forecast voting patterns.
In summary, statistical techniques provide a useful means of informed
and unbiased decision-making in the face of uncertainty in Economics,
other social sciences and other fields such as Medicine, Agriculture,
Biological Sciences and the Physical Sciences. Indeed, the knowledge of
Statistics enables us to get a picture of a problem where
precise measurements or observations are difficult to make or where
events are not easily predictable.
1.2 Types of Statistics
Statistics is sub-divided into two: Descriptive and Inferential
Statistics.
Descriptive Statistics: is concerned with summarising and describing a
body of data. How does descriptive statistics summarise data? Data
summarisation is done by finding one or more pieces of information
that characterise the whole data set. Among the quantitative summary values
are averages and measures of dispersion. For instance, suppose we have
data on the incomes of 1000 Nigerian families; the body of data can be
summarised by finding the average family income and by finding the
spread of these family incomes above or below the average. Again, how
does descriptive statistics describe a body of data? This is done by
representing the body of data in tabular or graphic form, such as a table,
chart or graph of the number of families in each income class.
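As a minimal sketch of the two kinds of summary value mentioned above, the
snippet below computes an average and a measure of dispersion with Python's
standard `statistics` module. The ten income figures are hypothetical,
invented purely for illustration; they are not data from this course.

```python
import statistics

# Hypothetical monthly incomes (in thousands of naira) for ten families;
# invented for illustration only.
incomes = [45, 52, 38, 61, 47, 55, 49, 70, 42, 51]

average = statistics.mean(incomes)       # a measure of location
spread = statistics.pstdev(incomes)      # population standard deviation: a measure of dispersion

# How far each income lies above (+) or below (-) the average.
deviations = [x - average for x in incomes]

print(f"average income = {average}")
print(f"standard deviation = {spread:.2f}")
print(f"largest deviation above the average = {max(deviations)}")
```

With real survey data the same two calls would be applied to the full list
of recorded incomes.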
Inferential Statistics: is the process of reaching generalisations about the
whole (called the population) by examining a portion (called a sample).
Inferential statistics includes those techniques by which decisions about a
statistical population can be made without observing or measuring all
elements in the population. Typically, inferential statistics makes use of a
random sample as the basis for statistical inference. It should be noted
that inferential statistics has two aspects: (a) Estimation (b) Hypothesis
testing. Furthermore, inferential statistics involves inductive reasoning.
For such inferences to be valid, two conditions are required:
• The sample must be representative. That is to say, the sample must
fully reflect the characteristics and properties of the population from which
it is drawn
• The probability of error must be specified, since the possibility of error
always exists in statistical inference. An estimate or test of a population
property or characteristic should be given together with the probability of
being wrong. This is why probability theory is an essential element in
statistical inference.
Consider again the sample of 1000 Nigerian families above. Definitely,
there are more than 1000 families in Nigeria. If these 1000 families are
representative of all Nigerian families, we can estimate and test hypotheses
about the average family income in Nigeria as a whole. However, since
these conclusions are subject to error, we would also have to indicate the
probability of error.
1.3 Common Terms in Statistics
• Observation: In Statistics, an observation refers to the thing being
observed. An observation could be made on any characteristic of an object,
such as height, weight etc. The numerically recorded observations, which are
referred to as data, are the raw material with which statisticians work.
• Population: In Statistics, the population is the entire collection of
individuals, objects or items, living or non-living, that are to be observed
in a given problem situation. Consider a single toss of a coin. There are two
outcomes, Head (H) and Tail (T). Hence, the population consists
of (H, T). In throwing a Ludo die the population is (1,2,3,4,5,6). It should
be noted that a population could be finite or infinite. The number of
students registered for SSC 201 is a finite population because the counting
process can end. On the other hand, the population of stars in the sky is
regarded as infinite.
• Variable: is a feature possessed by the members of a population e.g. age,
weight, height etc. The variable may take on different values, which may be
integers or any kind of real numbers. A variable can be discrete or
continuous. A discrete variable takes on countable values, each of which
can be identified exactly e.g. the number of oranges, size of family, size of
shoe. The values of a discrete variable could be whole numbers such as 0, 1,
5, 2, 7, or other exactly identifiable values such as the mixed numbers
5½ and 2½. A continuous variable is one which can assume any value
within a given interval. It takes real values which cannot be identified
exactly. Hence a continuous variable can only be measured, and we
usually approximate or estimate its value e.g. distance, height etc.
• Parameter: is a descriptive characteristic of a population which helps to
summarise information about the population with regard to the variable
under study e.g. the mean and standard deviation of the population.
• Statistic: is a descriptive characteristic of a sample e.g. the sample
mean. In statistical inference we make inferences about parameters from
their corresponding statistics.
• Data: is the set of recorded observations made on a sample. Data are
therefore facts, unevaluated symbols or messages. They are usually
values of an attribute. They may be in the form of numeric values (i.e.
quantitative data) or non-numerical perceptions or observations (i.e.
qualitative data) made by man or machine. Data can also be discrete or
continuous. Discrete data can assume only fixed values that can be
identified exactly. Continuous data practically have no single exact value
and can be identified only within a given range.
• Information: refers to evaluated, validated or processed data. The
items of information we have about an entity are referred to as attributes,
each of which has a value; data that have been processed become useful
information. For instance, a class average score computed from examination
scores provides useful information; to obtain it, the examination scores
must undergo the calculation of the class average.
Information is very important for decision-making, and gathered data
will not be useful until they are processed.
• Sample: is a subset of the population observed for the purpose of making
scientific inferences, such as generalisations or conclusions, about the
population. Recall that in order for statistical inference to be valid, it
must be based on a representative sample, i.e. one that fully reflects the
characteristics and properties of the population from which it is drawn. A
representative sample is usually obtained by random sampling, whereby each
element of the population has an equal chance of being included in the
sample.
• Sampling Frame: contains the basic details of all members of the
population from which samples are to be drawn. Statisticians believe that
without a complete sampling frame, a truly random sample cannot be
selected. Examples of sampling frames include the voters’ register, the
telephone directory and so on.
1.4 Reasons why a Sample is preferred to Population in most
statistical enquiries and analysis
• Analysis based on a representative sample can be almost as precise as that
based on the entire population
• Use of a sample is time-saving and cost-minimizing in terms of human and
material cost
• Use of the population to obtain some of its parameters may not be feasible,
i.e. not practicable, especially with an infinite population (i.e. a
population whose number is too large to be known) or when the observation
process is destructive, e.g. in testing the efficacy of a new vaccine or in
testing new raw materials in production.

2.0 SAMPLING DESIGN


Sampling design is a definite plan, completely determined before any
data are collected, for obtaining a sample from a given population.
Some of the most important types of sampling design are Simple Random
Sampling, Systematic Sampling, Stratified Sampling, Multi-stage Sampling,
Quota Sampling and Cluster Sampling.
2.1 Simple Random Sampling
This is a sampling procedure in which every member of the
population has an equal chance of being selected as a member of the
sample. It is mostly adopted in a homogeneous population (i.e. one whose
members are of the same kind). The methods of selection include casting of
lots, tossing of a coin or rolling a die. However, these methods are not
perfectly objective, and the more widely used method is the random number
table. This table contains randomly generated numbers arranged in rows and
columns, as found in standard statistical tables.
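The selection step can be sketched in Python as follows. Note that this
sketch swaps the printed random number table for the pseudo-random generator
in Python's standard `random` module; the function name
`simple_random_sample` and the frame of 200 numbered students are
assumptions made for illustration.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n distinct members so that every member is equally likely to be chosen."""
    rng = random.Random(seed)          # seed is optional; fixing it makes the draw repeatable
    return rng.sample(population, n)   # sampling without replacement

# Hypothetical sampling frame: students numbered 1 to 200.
frame = list(range(1, 201))
sample = simple_random_sample(frame, 10, seed=1)
print(sample)
```

Sampling is done without replacement, so no member can appear twice in the
sample.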
2.2 Systematic Sampling
This is similar to simple random sampling but not the same. It is used
when the population is homogeneous and the sampling frame is complete. An
element of randomness is introduced by selecting the first member or unit
by a random method. The other members are then selected systematically.
The danger in systematic sampling lies in the possible presence of hidden
periodicity. For instance, suppose it is known that there are usually
misprints in the daily production of a newspaper and this is to be
investigated. If we sample by selecting every 5th copy made by a particular
machine, our result will be biased if the misprints recur at a regular
interval that coincides with the sampling interval (e.g. on every 5th
print): the sample would then contain either only misprinted copies or none
at all.

2.2.1 Steps in using systematic sampling method
Let the population be of size ‘N’ and let the sample size be ‘n’, and
define the sampling interval ‘k’ such that k = N/n. For instance, if N =
15000 and n = 100, then k = 15000/100 = 150. In case N is not exactly
divisible by n, k is taken as the nearest integer e.g. if we have k = 20/3 =
6.667, then 7 is taken as the number k. Then select randomly the first
member of the sample from the first k elements in the sampling frame. In
other words, if x is the position of the 1st element in the sample then we
select x such that 1 ≤ x ≤ k. Other members of the sample are then selected
by choosing every kth element thereafter. The procedure can be summarized in
the following steps.
Step 1: assign to every member of the population a number from 1 to N.
If the population is 15000 then we have x_1, x_2, x_3, ..., x_15000, where
each x is a member of the population
Step 2: determine the number k defined as k = N/n
Step 3: select randomly the 1st element of the sample from the first k
elements e.g. if N = 20, n = 4, then k = 20/4 = 5, and the 1st element of the
sample is selected from the first 5 elements
Step 4: select every kth element thereafter. According to this
procedure, if the 2nd element is selected first, the sample will consist of
the elements S_2, S_7, S_12, S_17. If ‘a’ refers to the subscript of the 1st
randomly selected item, then the subscripts of the elements contained in the
sample form an arithmetic progression: S_a, S_{a+k}, S_{a+2k}, ...,
S_{a+(n-1)k}.
Note: (i) Systematic sampling requires a specification of the sampling frame
(ii) It may not be practicable if the population is infinite (iii) The
sample may be biased in the presence of hidden periodicity
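The four steps above can be sketched in Python as follows. The function name
`systematic_sample` and the labels S1 to S20 are assumptions made for
illustration, and the random start is drawn with Python's pseudo-random
generator rather than a random number table.

```python
import random

def systematic_sample(frame, n, seed=None):
    """Select every k-th element of the frame after a random start in the first k."""
    N = len(frame)
    k = round(N / n)                 # sampling interval, rounded to an integer if N/n is not whole
    rng = random.Random(seed)
    a = rng.randint(1, k)            # random start with 1 <= a <= k (1-based position)
    positions = range(a, N + 1, k)   # a, a+k, a+2k, ... while still inside the frame
    return [frame[p - 1] for p in positions]

# The example from Step 3: N = 20 and n = 4, so k = 5.
frame = [f"S{i}" for i in range(1, 21)]
print(systematic_sample(frame, 4, seed=0))
```

When N is not exactly divisible by n, the number of elements actually
selected can differ slightly from n, depending on the random start.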
2.3 Stratified Sampling
This is a sampling procedure which involves dividing the population
into a number of non-overlapping sub-populations (strata). We then take a
sample from each stratum by any suitable random method. The stratified
sampling procedure is appropriate when our sample is to be
drawn from a heterogeneous population e.g. a human population with
varying economic or social groups, or a population of automobiles of varying
brands. The use of the stratified sampling technique poses some relevant
questions that require satisfactory answers. These include: (i) What should
be the basis of stratification? (ii) How many strata should be formed? (iii)
What sample size should be allocated to the different strata? (iv) How
should the sample within each stratum be taken?
Usually, the answers to questions (i) and (ii) depend on the good
judgement of the researcher. With regard to question (iii), very often it is
necessary to ensure that the sample sizes are
chosen in a way that makes them proportional to the sizes of the
respective strata. This is called proportional allocation.
In general, if we divide a population of size N into k strata of sizes
N_1, N_2, N_3, ..., N_k and take samples of sizes n_1, n_2, n_3, ..., n_k
from them, we say that the allocation is proportional if n_1/N_1 = n_2/N_2 =
n_3/N_3 = ... = n_k/N_k, exactly or nearly so. We can show that
n_i = (N_i/N) × n for i = 1,2,3,...,k, where n = n_1+n_2+n_3+...+n_k is the
total sample size and N = N_1+N_2+N_3+...+N_k.
Example: a stratified sample of size n = 60 is to be taken from a population
of size N = 4000 which consists of 3 strata of sizes N_1 = 2000, N_2 = 1200
and N_3 = 800. If the allocation is to be proportional, how large a sample
must be taken from each stratum?
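The exercise above can be worked through with the proportional allocation
formula n_i = (N_i/N) × n. The short Python sketch below does the
arithmetic; the function name `proportional_allocation` is an assumption
made for illustration.

```python
def proportional_allocation(stratum_sizes, n):
    """Return the sample size for each stratum using n_i = (N_i / N) * n."""
    N = sum(stratum_sizes)
    return [N_i * n / N for N_i in stratum_sizes]

# The exercise above: N_1 = 2000, N_2 = 1200, N_3 = 800 and n = 60.
allocation = proportional_allocation([2000, 1200, 800], 60)
print(allocation)   # [30.0, 18.0, 12.0]
```

So 30, 18 and 12 units are taken from the three strata respectively, and
30 + 18 + 12 = 60 as required. In other problems the formula may give
fractional values, which then have to be rounded to whole numbers.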
2.4 Quota Sampling
In stratified sampling, the cost of taking random samples from the
individual strata is often so high that the interviewers are simply
given quotas to fill from the different strata, with very few restrictions
on how they are to be filled. This kind of sampling is called quota
sampling. Basically, it involves two steps: (i) Classifying the population
into distinct groups (ii) Allocating a quota to each group
For instance, in determining students’ attitude towards an increment in
school fees in a particular institution, an interviewer may be told to select
10 students from college A, 5 students from college B, 10 students
from college C etc., with the actual selection of those to be interviewed
being left to the discretion of the researcher or interviewer.
A major disadvantage of this method of sampling is the absence of
firm restrictions on the interviewers’ choice. Interviewers naturally tend
to select individuals who are most readily available in order to reduce cost
and time, which can bias the sample.
Major advantages of quota sampling include the following: (i) It saves
time and it is quick (ii) It is convenient (iii) It is relatively cheap
2.5 Cluster Sampling
A cluster sample is one in which the elements of the target population
are selected in groups rather than individually. As an initial step in
sampling, the groups to be included are selected by simple random sampling
as follows. Each group in the target population is assigned a serial number.
Then, the sample groups are selected by reference to a table of
random numbers.
Cluster sampling is often used when the elements in the population
are not easily identified individually but are grouped, i.e. clustered
together, and are more easily identified as members of a cluster. Cluster
sampling can be most useful in the absence of a complete sampling frame,
unlike the other methods of sampling described earlier. For instance, if we
wish to study the hourly wage rates of workers in a large metropolitan area,
it would be difficult, if not impossible, to obtain a listing of all
individual wage earners. However, we could randomly sample the firms in
which people work; each firm represents a cluster of employees that may be
included in the cluster sample.
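A minimal Python sketch of the firm example is given below. The firms, their
serial numbers and the wage figures are hypothetical, and the standard
`random` module stands in for the table of random numbers.

```python
import random

def cluster_sample(clusters, m, seed=None):
    """Randomly pick m whole clusters and return every element inside them."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), m)   # clusters are identified by serial number
    return {c: clusters[c] for c in chosen}

# Hypothetical firms (serial numbers 1-5) with the hourly wages of their workers.
firms = {
    1: [12.0, 13.5, 11.0],
    2: [15.0, 14.5],
    3: [10.0, 10.5, 11.5, 12.5],
    4: [20.0, 18.0],
    5: [9.5, 9.0, 10.0],
}
sampled = cluster_sample(firms, 2, seed=7)
print(sampled)
```

Only a list of firms is needed to draw the sample; a list of individual wage
earners is never required.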
2.6 Multi-Stage Sampling
Multi-stage sampling involves a sampling procedure of more than
one stage. It first consists of breaking down the population into a set of
distinct groups. From these a number of groups are selected. Each group
selected is broken down into units from which a sample is taken. If we stop
at this stage we have two-stage sampling. Further stages may be added
by breaking each unit into still smaller units, so that we could have 3-, 4-
or 5-stage sampling.
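A two-stage version of this procedure might be sketched in Python as shown
below. The group labels, unit labels and the function name
`two_stage_sample` are hypothetical, and simple random sampling is assumed
at both stages.

```python
import random

def two_stage_sample(groups, n_groups, n_units, seed=None):
    """Stage 1: pick groups at random. Stage 2: pick units at random within each chosen group."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(groups), n_groups)
    return {g: rng.sample(groups[g], min(n_units, len(groups[g]))) for g in chosen}

# Hypothetical population broken down into four groups of units.
groups = {
    "A": ["A1", "A2", "A3", "A4"],
    "B": ["B1", "B2", "B3"],
    "C": ["C1", "C2", "C3", "C4", "C5"],
    "D": ["D1", "D2"],
}
result = two_stage_sample(groups, n_groups=2, n_units=2, seed=5)
print(result)
```

A third stage would repeat the same idea inside each selected unit.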

3.0 TABULATION AND PRESENTATION OF DATA


Raw data may not be meaningful if they are not organised.
Thus, to make them more meaningful it is necessary to
summarise them in tables, charts and diagrams. From these summaries we can
draw conclusions and make classifications that provide useful
information by bringing out the main features of the sample data.
3.1 Table
There are three categories of tables that help us in the presentation
of data: (i) Reference/Source/Repository/General purpose table (ii)
Working table (iii) Summary table
3.1.1 Reference Table: is a general purpose table which serves as a
reference for data users. It is a table from which further analysis can be
made. Such reference tables usually contain secondary data and can be
sourced from the Central Bank of Nigeria, the National Bureau of Statistics
and other public agencies.
Example: External Public Debt (End of Period) 1981-1988 (N billion)
Selected   Total        Total     Total       Total
period     commitment   drawing   repayment   outstanding
1981        8.0           5.2      0.7          2.3
1982       14.7          10.2      1.4          8.8
1983       17.8          12.8      2.3         10.5
1984       21.0          17.7      3.2         14.5
1985       29.3          23.2      5.9         17.3
1986       57.0          48.9      7.7         41.2
1987       14.2         132.5     23.7        100.8
1988       18.3         164.6     30.6        133.9
Source: CBN Economic and Financial Review Vol. 27, No. 1 March 1989
3.1.2 Working Table: is one in which initial calculations are made before
the final table is arrived at. Consider an experiment in which a coin is
tossed 3 times; our working table may look like the following.
Outcome   No. of heads
HHH       3
HHT       2
HTH       2
HTT       1
THH       2
THT       1
TTH       1
TTT       0
If we use the letter X to represent the number of heads, then our final
table, which presents the data in a better way, is as follows.
X   Frequency
0   1
1   3
2   3
3   1
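The step from the working table to the final table can be reproduced with a
few lines of Python. This is only an illustrative sketch using the standard
library; the variable names are assumptions.

```python
from collections import Counter
from itertools import product

# Working table: every outcome of three tosses with its number of heads.
working = [("".join(o), o.count("H")) for o in product("HT", repeat=3)]
for outcome, heads in working:
    print(outcome, heads)

# Final table: frequency of each value of X (the number of heads).
final = Counter(heads for _, heads in working)
for x in sorted(final):
    print(x, final[x])
```

The eight outcomes are listed in the same order as in the working table
above, and the frequencies 1, 3, 3, 1 add up to the 8 possible outcomes.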

3.1.3 Summary Table: is usually a derived table in the sense that it is
either extracted from another table or modified to suit a particular purpose.
3.2 Characteristics of a Good Table
• A table has a general title on top
• A table has column titles showing the order of the tabulation along the
columns. There could be row titles to show the order of classification along
the rows.
• A table should have a short note at the bottom to indicate the source of
the information contained in the table
• An indication of the units in which the data in the table are measured
should be given
• A good table should be arranged to give clear communication of
information in a neat and concise form
3.3 Ratio and Percentages
Another form of data presentation is to express data in the form of
ratios and percentages. Ratios are fractions which express variation in the
data irrespective of the actual or absolute size of the data. For instance,
if the total population of a village A in 1930 was 30000 and the number of
deaths was 600, then the ratio of deaths to the population is 600:30000 =
1:50 or 1/50.
Ratios can take several forms, including: (i) One that gives a comparative
fact on a component in relation to the whole (ii) One that relates one
observation to another. In most cases the related observations are of the
same variable at different time intervals and may be at different places
3.4 Pictorial Representation of Data
Although tabular representation of data makes them clearer by
enhancing condensation and comparison, pictorial representation of
data has the added advantage of enhancing the conveying of information
through visual comprehension or appreciation.
The most popular forms of pictorial representation of data include pie
charts, bar diagrams (bar charts and histograms) and graphs (frequency
polygons and ogives).
3.4.1 Pie chart: A pie chart is simply a circle divided into sectors. The
circle represents the total of the data being presented and each sector is
drawn proportional to its relative size. The main advantage of the pie chart
is that it is easy to understand. However, it is mostly restricted to very
simple comparisons when there are only a few groups.
The disadvantage of the pie chart is the loss of clear visual effect when
there are more than, say, four groups or sectors.
Illustration: An investigation of the marital status of some adults in an
area gives the following information
Marital Status   No. of Adults
Single           35
Married          130
Widowed          25
Divorced         10
Draw a pie chart using the above information.
Total no. of adults = 200
Single = 35/200 × 360° = 63°
Married = 130/200 × 360° = 234°
Widowed = 25/200 × 360° = 45°
Divorced = 10/200 × 360° = 18°
[Figure: pie chart of the marital status of adults in area B]
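The sector angles in this illustration can be checked with a short Python
sketch. Only the arithmetic is shown; the variable names are assumptions.

```python
counts = {"Single": 35, "Married": 130, "Widowed": 25, "Divorced": 10}
total = sum(counts.values())

# Each sector's angle is its share of the total multiplied by 360 degrees.
angles = {status: count * 360 / total for status, count in counts.items()}
for status, angle in angles.items():
    print(f"{status}: {angle:g} degrees")
```

The four angles add up to 360 degrees, which is a useful check before
drawing the chart by hand or with a plotting library.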

3.4.2 Bar Chart: A bar chart or graph consists of a set of equally spaced
rectangles drawn in the Cartesian plane with equal widths but with heights
proportional to the frequencies of the variable or attribute with which we
are concerned. This set of rectangles can be drawn vertically or
horizontally. A bar chart can be simple, multiple or component in nature. A
simple bar chart comprises a number of equally spaced rectangles. A multiple
bar chart is usually used in the comparison of two or more attributes. For
instance, if two attributes are being compared, we have pairs of rectangles
standing together. A component bar chart comprises bars with each bar
subdivided into components.
Illustration: (1) Present the data on marital status in the above example in
a simple bar chart.

[Figure: simple bar chart]
(2) The sex distribution of the members of staff in a television station is
given below
Department         Male   Female   Total
Admin. (I)         25     15       40
Programm. (II)     65     30       95
Commercial (III)   45     40       85
News (IV)          35     15       50
Sports (V)         30     10       40
Represent the above data in multiple and component bar charts.
[Figure: multiple bar chart]
[Figure: component bar chart]

3.4.3 Histogram: The histogram and the bar chart look alike in
presentation. However, while the bars of the bar chart are usually not
joined, those of the histogram are always joined. Furthermore, while the
bar chart attaches importance only to the height of each bar, the histogram
attaches importance to both height and width. Hence, in drawing a histogram
we let the area of each bar represent the frequency of the class. Data on
marital status above can be translated into the following.

3.4.4 Frequency Polygon: is obtained by joining the mid-points of the tops
of the rectangles of the histogram.
3.4.5 Cumulative Frequency Curve (Ogive): to obtain a cumulative
frequency curve, we plot the cumulative frequencies against the upper
class boundaries of the class intervals. The shape of a cumulative
frequency curve is usually but not always like that of an elongated S.

4.0 DESCRIPTIVE STATISTICS


When a relatively large number of observations or measurements
have been made, it is often useful to organise the data in such a manner
that the principal features of the data become readily apparent.
Descriptive statistics provides various techniques as well as quantitative
summary values for handling such a case. The methods to be examined
are frequency distributions, measures of location and dispersion, the normal
distribution and sampling distributions.
4.1 Frequency Distribution

This is a table in which the possible values of a variable are grouped into
classes and the number of observed values that belong to each class is
recorded. Data organised in a frequency distribution are called grouped
data. In contrast, for ungrouped data every individual observed value of the
random variable is listed. The collection of values may be for either a
sample or a population.
Suppose we have a set of raw data which has been collected:
67 73 71 74 61 68 70 66 73 70 68 67 72 69 71 69 76
70 72 71 77 69 71 74 66 68 70 72 72 70 71 70 64 65
70 69 72 75 66 67 70 72 67 70 71 68 66 73 69 67
The scores listed above are not easy to interpret owing to the absence of
any organisation of the data. The table below is the frequency distribution
of the scores above.
Score (%) No. of Students
60-62 1
63-65 2
66-68 13
69-71 20
72-74 11
75-77 3
The advantage of a frequency distribution is that such a table makes the
reported values easier to interpret.
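As an illustrative sketch, the frequency distribution above can be rebuilt
from the 50 raw scores in Python. The class limits are those used in the
table; the variable names are assumptions.

```python
scores = [
    67, 73, 71, 74, 61, 68, 70, 66, 73, 70, 68, 67, 72, 69, 71, 69, 76,
    70, 72, 71, 77, 69, 71, 74, 66, 68, 70, 72, 72, 70, 71, 70, 64, 65,
    70, 69, 72, 75, 66, 67, 70, 72, 67, 70, 71, 68, 66, 73, 69, 67,
]

# Class limits used in the table: 60-62, 63-65, ..., 75-77.
classes = [(lower, lower + 2) for lower in range(60, 78, 3)]

# Count how many scores fall within each pair of class limits (inclusive).
frequency = {
    (lo, hi): sum(lo <= s <= hi for s in scores) for lo, hi in classes
}
for (lo, hi), f in frequency.items():
    print(f"{lo}-{hi}: {f}")
```

The counts agree with the table (1, 2, 13, 20, 11 and 3) and add up to the
50 scores recorded.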
4.2 Features of a Good Frequency Distribution Table
 Class Limit: for each class in a frequency distribution, the lower and upper
class limits identify the values included in the class, e.g. the class limits for
the 1st class of scores reported in the table above are 60-62% inclusive.
 Class Frequency: this indicates the number of scores in each class of
the frequency distribution, e.g. the frequency for the class 75-77 is 3.
 Class Interval and Width: the class interval indicates the range of values
included in each class, and its width is the difference between the upper and
lower boundaries of the class. Thus, the class width is determined by
subtracting the lower class boundary (BL) from the upper class boundary (BU)
for each class, i.e. c = BU – BL
 Class Boundary: the class boundary is defined as the point half-way
between the upper limit of one class and the lower limit of the next class. It
is calculated by applying the following rules
 Lower Class Boundary = Lower class limit – 1/2D
 Upper Class Boundary = Upper Class limit + 1/2D
where D is the common difference between the upper class limit of any
class and the lower class limit of the next class.
 Class Mid-Point: also known as the class mark, this is the middle value of
each class. It is located half-way between the lower and upper limits (or
boundaries) of the class, as reported in the table below: M = BL + 0.5c

Table: Class boundaries and mid-points for the scores of 50 students

Class limit (%)   Class boundary (%)   Mid-point (%)
60-62             59.5-62.5            61
63-65             62.5-65.5            64
66-68             65.5-68.5            67
69-71             68.5-71.5            70
72-74             71.5-74.5            73
75-77             74.5-77.5            76

Class boundaries are used in calculations and in drawing the graph of the
distribution.
4.3 Relative Frequency Distribution: this is one in which the number of
observations in each class is converted into a relative frequency by dividing it
by the total number of observations in the entire distribution. Each relative
frequency is thus a proportion.
Score (%)   Absolute Frequency (f)   R.F
60-62       1                        0.02
63-65       2                        0.04
66-68       13                       0.26
69-71       20                       0.40
72-74       11                       0.22
75-77       3                        0.06

4.4 Cumulative Frequency Distribution: this shows, for each class, the
total number of observations in all classes up to and including that class.
Thus, a cumulative frequency identifies the cumulative number of
observations below the upper boundary of each class in the distribution.
The cumulative frequency for a class is determined by adding the
observed frequency for that class to the cumulative frequency for the
preceding class. The table below illustrates the calculation of cumulative
frequencies for the frequency distribution table above.
Score (%)   Upper class boundary   f    Cf
60-62       62.5                   1    1
63-65       65.5                   2    1+2 = 3
66-68       68.5                   13   3+13 = 16
69-71       71.5                   20   16+20 = 36
72-74       74.5                   11   36+11 = 47
75-77       77.5                   3    47+3 = 50
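The class boundaries, mid-points, relative frequencies and cumulative frequencies tabulated in sections 4.2 to 4.4 can be derived from the class limits and frequencies alone. The sketch below is one way to do this in Python; the variable names are illustrative, and `gap` plays the role of D in the class boundary rules.

```python
from itertools import accumulate

class_limits = [(60, 62), (63, 65), (66, 68), (69, 71), (72, 74), (75, 77)]
frequencies = [1, 2, 13, 20, 11, 3]

total = sum(frequencies)

# D: gap between the upper limit of one class and the lower limit of the next.
gap = class_limits[1][0] - class_limits[0][1]

# Lower boundary = lower limit - D/2, upper boundary = upper limit + D/2.
boundaries = [(lo - gap / 2, hi + gap / 2) for lo, hi in class_limits]

# Mid-point lies half-way between the two boundaries of a class.
mid_points = [(bl + bu) / 2 for bl, bu in boundaries]

# Relative frequency = class frequency / total number of observations.
relative = [f / total for f in frequencies]

# Cumulative frequency = running total of the class frequencies.
cumulative = list(accumulate(frequencies))

for (lo, hi), (bl, bu), m, f, rf, cf in zip(
    class_limits, boundaries, mid_points, frequencies, relative, cumulative
):
    print(f"{lo}-{hi}  {bl}-{bu}  {m}  {f}  {rf:.2f}  {cf}")
```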

4.4 The Summation Notation

The convenient shorthand for the sum x1 + x2 + ... + xn is $\sum_{i=1}^{n} x_i$,
which is read "add up all the xi's, beginning at x1 (i.e. i = 1) and
stopping at xn (i.e. i = n)". The symbol Σ is the Greek capital S, pronounced
sigma. It is known as the summation operator or sigma operator. The
suffix "i" is often called a dummy variable.
Illustration: $\sum_{i=1}^{4} X_i = X_1 + X_2 + X_3 + X_4$.
It will be seen that this operator is useful in many contexts and it will be
used freely throughout this course.
Examples:
 $\sum_{i=1}^{3} y_i = y_1 + y_2 + y_3$

 $\sum_{i=1}^{5} X_i^2 = X_1^2 + X_2^2 + X_3^2 + X_4^2 + X_5^2$

 $\sum_{i=1}^{4} X_i^3 = X_1^3 + X_2^3 + X_3^3 + X_4^3$

 $\sum_{i=1}^{n} i = 1 + 2 + 3 + \dots + (n-1) + n$

 $\sum_{i=1}^{4} i^2 = 1^2 + 2^2 + 3^2 + 4^2$

Note that "i" does not have to begin at 1; it begins at whatever value is
specified. "i" always increases in steps of 1. Using the notation we also have:
 $\sum_{i=1}^{4} a = a + a + a + a = 4a$

 $\sum_{i=1}^{5} (4 + X_i) = (4 + X_1) + (4 + X_2) + (4 + X_3) + (4 + X_4) + (4 + X_5)$

 $\sum_{i=1}^{4} 2X_i = 2X_1 + 2X_2 + 2X_3 + 2X_4$

4.5 Some important properties of sigma operator


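 $\sum_{i=1}^{n} a = na$, where 'a' is a constant. Illustration: $\sum_{i=1}^{4} 2 = 2 + 2 + 2 + 2 = 4(2) = 8$

 $\sum_{i=1}^{n} aX_i = a\sum_{i=1}^{n} X_i$, where 'a' is a constant

Illustration: $\sum_{i=1}^{4} 3X_i = 3X_1 + 3X_2 + 3X_3 + 3X_4 = 3(X_1 + X_2 + X_3 + X_4) = 3\sum_{i=1}^{4} X_i$

 $\sum_{i=1}^{n} (X_i \pm Y_i) = \sum_{i=1}^{n} X_i \pm \sum_{i=1}^{n} Y_i$

Illustration: $\sum_{i=1}^{3} (X_i + Y_i) = (X_1 + Y_1) + (X_2 + Y_2) + (X_3 + Y_3)$

$= (X_1 + X_2 + X_3) + (Y_1 + Y_2 + Y_3)$

$= \sum_{i=1}^{3} X_i + \sum_{i=1}^{3} Y_i$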
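The three properties can be checked numerically. The short Python sketch below uses made-up integer values for X, Y and the constant a (they are purely illustrative) and evaluates both sides of each identity.

```python
x = [2, 5, 1, 4]  # hypothetical X1..X4
y = [3, 0, 7, 6]  # hypothetical Y1..Y4
a = 3             # hypothetical constant
n = len(x)

# Property 1: summing a constant n times gives n * a.
sum_of_constant = sum(a for _ in range(n))

# Property 2: a constant factor can be taken outside the sum.
sum_scaled = sum(a * xi for xi in x)

# Property 3: the sum of (Xi + Yi) splits into two separate sums.
sum_pairs = sum(xi + yi for xi, yi in zip(x, y))

print(sum_of_constant, n * a)
print(sum_scaled, a * sum(x))
print(sum_pairs, sum(x) + sum(y))
```

Each printed line should show the same number twice.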

4.6 The Arithmetic Mean


The mean of a set of numbers x1, x2, x3, ..., xn is denoted by $\bar{x}$ and
defined as
$\bar{x} = \frac{1}{n}(x_1 + x_2 + \dots + x_n)$, i.e. using the summation notation, $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$
Example:
 Find the Arithmetic Mean of the set of data -3, -1, 0, 2, 3, 4
$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i = \frac{1}{6}[-3 + (-1) + 0 + 2 + 3 + 4] = \frac{1}{6}(5) = 0.833$


 Find the mean of the following set of numbers by grouping them into a
frequency distribution: 4, 3, 6, 7, 5, 5, 3, 4, 9, 6, 5, 5, 6, 8, 3, 6, 6, 3, 5, 4,
7, 6, 4, 1, 9, 7, 8, 6, 4, 6
xi    fi    fixi
1     1     1
3     4     12
4     5     20
5     5     25
6     8     48
7     3     21
8     2     16
9     2     18
Σfi = 30, Σfixi = 161
$\bar{x} = \frac{\sum f_i x_i}{\sum f_i} = \frac{161}{30} = 5.3667$
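The same calculation can be sketched in Python. `collections.Counter` builds the frequency column, after which the mean is Σfx divided by Σf; the variable names are illustrative.

```python
from collections import Counter

data = [4, 3, 6, 7, 5, 5, 3, 4, 9, 6, 5, 5, 6, 8, 3, 6, 6, 3, 5, 4,
        7, 6, 4, 1, 9, 7, 8, 6, 4, 6]

freq = Counter(data)                          # value -> frequency f
sum_f = sum(freq.values())                    # Σf
sum_fx = sum(x * f for x, f in freq.items())  # Σfx
mean = sum_fx / sum_f

for x in sorted(freq):
    print(x, freq[x], x * freq[x])
print(sum_f, sum_fx, round(mean, 4))
```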
4.6.1 Mean of a grouped frequency distribution
For a continuous frequency distribution or a grouped discrete
distribution, we clearly cannot use the previous method because each class
does not have a distinct value but a range of values of x. What we do here is
simply take the mid-point of each class to represent the x value for the class
and proceed in the usual way.

Illustration:
Class        Mid-point (x)   Frequency (f)   fx
5.00-5.49    5.25            12              63
5.50-5.99    5.75            32              184
6.00-6.49    6.25            11              68.75
6.50-6.99    6.75            8               54
7.00-7.49    7.25            2               14.5
Σf = 65, Σfx = 384.25
$\bar{x} = \frac{\sum fx}{\sum f} = \frac{384.25}{65} = 5.91$
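For grouped data the only change is that the class mid-points stand in for the individual values. A minimal sketch, using the mid-points and frequencies of the illustration above:

```python
mid_points = [5.25, 5.75, 6.25, 6.75, 7.25]  # x, one per class
frequencies = [12, 32, 11, 8, 2]             # f, one per class

sum_f = sum(frequencies)
sum_fx = sum(f * x for f, x in zip(frequencies, mid_points))
grouped_mean = sum_fx / sum_f

print(sum_f, sum_fx, round(grouped_mean, 2))
```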
4.7 The Median
The median of a set of numbers x1, x2, ..., xn is defined as the middle
value of the set when arranged in order of size. If the set has an even number
of items, the median is taken as the mean of the middle two.
The median is a measure of central tendency, i.e. the value of the
observation that occupies the middle position in an array of values.
Determination of the median requires that the data be arranged either in
ascending or descending order. For N ordered observations, when N is
odd the median is Me = ((N+1)/2)th item. When N is even, Me is
the mean of the two middle values, i.e. the (N/2)th and (N/2 + 1)th items.
Illustration:
 Suppose we have the following data: 44, 40, 79, 42, 51, 59, 71, 44, 60, 65,
45. Arranged in ascending order: 40, 42, 44, 44, 45, 51, 59, 60, 65, 71, 79.
Since N = 11, the median is the (11+1)/2 = 6th item, so Me = 51
 Suppose we have the following set of data: 40, 42, 44, 44, 45, 51, 59, 60.
Since N = 8, Me = (44 + 45)/2 = 44.5
For a discrete frequency distribution taking the values x1, x2, x3, ..., xn
with corresponding frequencies f1, f2, ..., fn, the median is the ((Σf+1)/2)th
item. When Σf is large, it is usually found convenient to include a column
of cumulative frequencies when calculating the median of a discrete
frequency distribution.
 Find the median of the frequency distribution
x    0    1    2    3    4    5    6
f    5    5    10   20   30   20   10
Solution
x    f     Cf
0    5     5
1    5     10
2    10    20
3    20    40
4    30    70
5    20    90
6    10    100
Σf = 100
Median = ((N+1)/2)th item = ((100+1)/2)th item = 50.5th item = 4. The median
is 4 because the 50.5th item falls at x = 4 (the 41st to 70th items all
equal 4).
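The table lookup above can be sketched in Python as follows. The helper `item_at` is a hypothetical name; it returns the value of the first class whose cumulative frequency reaches the required position, which mirrors reading the Cf column. As a cross-check, the frequency table is also expanded back into 100 individual values and passed to the standard library's `statistics.median`.

```python
from itertools import accumulate
from statistics import median

x = [0, 1, 2, 3, 4, 5, 6]
f = [5, 5, 10, 20, 30, 20, 10]


def item_at(values, frequencies, position):
    """Value of the first class whose cumulative frequency reaches position."""
    for value, cf in zip(values, accumulate(frequencies)):
        if position <= cf:
            return value
    raise ValueError("position exceeds the total frequency")


n = sum(f)
median_position = (n + 1) / 2  # 50.5th item
median_from_table = item_at(x, f, median_position)

# Cross-check: rebuild the ordered raw data and take its median directly.
expanded = [value for value, count in zip(x, f) for _ in range(count)]
median_from_data = median(expanded)

print(median_position, median_from_table, median_from_data)
```

Note that the lookup is a simplification: if the required position fell exactly between two different values, the mean of the two would be needed, as described for an even number of items.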
When given a continuous or grouped discrete distribution, you can
only estimate the value of the median. The median class is the class that
contains the ((N+1)/2)th item. The median value is then obtained
by using the formula $M_e = L + \left[\frac{N/2 - F}{f_m}\right]c$, where: L = lower class
boundary of the median class, N = number of observations in the data set
i.e. the total frequency, F = sum of the frequencies up to but not including
the median class, fm = frequency of the median class, c = size of the
interval of the median class
Illustration: consider the following frequency distribution of the scores of
students in an examination. Obtain the median score
Score             60-62   63-65   66-68   69-71   72-74   75-77
No. of students   1       2       13      20      11      3

Solution
Score    f     Class boundary   Cum. Freq.
60-62    1     59.5-62.5        1
63-65    2     62.5-65.5        3
66-68    13    65.5-68.5        16
69-71    20    68.5-71.5        36
72-74    11    71.5-74.5        47
75-77    3     74.5-77.5        50

First, locate the median class: position of Me = ((N+1)/2)th item =
((50+1)/2)th item = 25.5th item. Hence, the median lies between the 25th and
26th scores. The class that contains this item is the 69-71 class
$M_e = L + \left[\frac{N/2 - F}{f_m}\right]c$; L = 68.5, N = 50, F = 16, fm = 20 and c = 3
Me = 68.5 + [(25 – 16)/20] × 3 = 68.5 + 1.35 = 69.85
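The interpolation steps can be sketched in Python as below. The function name `grouped_median` is illustrative; it locates the median class with the ((N+1)/2)th item and then applies Me = L + [(N/2 – F)/fm]c, exactly as in the worked solution.

```python
def grouped_median(lower_boundaries, frequencies, width):
    """Interpolated median of a grouped frequency distribution."""
    n = sum(frequencies)
    position = (n + 1) / 2  # used only to locate the median class
    cumulative = 0          # F: total frequency below the current class
    for lower, f_m in zip(lower_boundaries, frequencies):
        if cumulative + f_m >= position:
            return lower + (n / 2 - cumulative) / f_m * width
        cumulative += f_m
    raise ValueError("empty distribution")


lower_boundaries = [59.5, 62.5, 65.5, 68.5, 71.5, 74.5]
frequencies = [1, 2, 13, 20, 11, 3]

me = grouped_median(lower_boundaries, frequencies, width=3)
print(round(me, 2))
```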
The technique discussed above for estimating the median value is
called the method of interpolation. The second method is graphical.
The graphical method is more efficient than the interpolation
method. It involves drawing a smooth cumulative frequency
curve for the given data. The cumulative frequency is plotted on the vertical
axis against the upper class boundaries on the horizontal axis.
Illustration: Obtain the median by the graphical method
Using the graphical method, we need to: (a) plot the cumulative frequency
against the upper class boundary (b) find the point on the horizontal axis
that corresponds to the ((n+1)/2)th item (n = Σf) on the cumulative frequency
axis. The table for plotting the graph is given below
c.f 0 2 16 14 77 83 84
Upper class boundary 9.95 19.95 29.95 39.95 49.95 59.95 69.95

Recall that the median is the 42.5th item. Hence, from the graph, Me = 37
(approximately). Usually, the estimate of the median is better when
a smooth curve is drawn through the plotted points.
4.8 Quantiles
The median has been defined as the 'middle' value of a set of
numbers arranged in order of size. When applied to a frequency distribution,
you can think of the median as splitting the area under a frequency curve
into 2 equal portions, as in the figure below.
Fig. A:

Similarly, we can split a frequency curve into as many equal portions
as we wish. Special names are given to those values that split a curve into
4, 10 and 100 equal parts. These are defined below.
 Quartiles: the 3 values that split a distribution into 4 equal portions are
known as Quartiles. In order of magnitude Quartiles are usually
represented by Q1, Q2 and Q3 and are called 1st, 2nd, and 3rd Quartiles
respectively. These are illustrated graphically as follows.
Fig. B:

Note: By definition, the second quartile Q2 is just the
median, since it divides the area under the curve into two equal portions as
in Fig. A above.
Since the quartiles split a distribution or set into four equal
portions, then for a size-ordered distribution, Q1 = ¼ (n + 1)th item; Q2 = 2/4
(n + 1)th item = ½ (n + 1)th item; Q3 = ¾ (n + 1)th item. Note that Q2 and
the median have the same formula.
 Deciles: the nine values that split a distribution into ten equal portions are
known as deciles. In order of magnitude, deciles are represented by D 1,
D2, ..., D9. They are called 1st, 2nd, ..., 9th deciles respectively. Deciles are
represented graphically as follows.
Fig. C:

Note: (i) The broken vertical lines represent deciles not shown (ii) The 5th
decile, D5, again coincides with the median
Since the deciles split a set or distribution into 10 equal portions,
then for a size-ordered distribution D1 = 1/10 (n + 1)th item; D2 = 2/10 (n + 1)th
item; D3 = 3/10 (n + 1)th item; ... Di = i/10 (n + 1)th item for i = 1, 2, 3, ..., 9.
 Percentiles: the 99 values that split a distribution into 100 equal portions
are known as percentiles and are represented by P 1, P2, P3,..., P99. Again,
P50 is the median. It could be cumbersome to fully illustrate percentiles on
the frequency curve but the following graphical illustration might do.
Fig. D:

The dotted vertical lines represent the percentiles not shown. Since
the percentiles split a set or distribution into 100 equal parts, P1 = 1/100 (n +
1)th item; P2 = 2/100 (n + 1)th item; P3 = 3/100 (n + 1)th item ... Pi = i/100 (n + 1)th
item, where i = 1, 2, 3, ..., 99.
Notice that P50 = 50/100 (n + 1)th item = ½ (n + 1)th item. Hence P50 =
median. Collectively, all quantities that are defined as splitting a
distribution into a number of equal portions (including the median, quartiles,
deciles and percentiles) are called quantiles. In general, if a particular
quantile splits a distribution into 's' equal parts, the rth quantile of the set =
r/s (n + 1)th item of the size-ordered distribution.

Illustration: the set of observations below gives the grades in a subject for
a class of 40 students. Find: (i) the 1st quartile (ii) D3 and (iii) the sixtieth
percentile
7 5 6 2 8 7 6 7 3 9 10 4 5 5 4 6 7 4 8 2
3 5 6 7 9 8 2 4 7 9 4 6 7 8 3 6 7 9 10 5
x     f    fx    Cf
2     3    6     3
3     3    9     6
4     5    20    11
5     5    25    16
6     6    36    22
7     8    56    30
8     4    32    34
9     4    36    38
10    2    20    40
Total: Σf = 40, Σfx = 240

 Q1 = ¼ (n + 1)th item = ¼ (40 + 1)th item = 10.25th item = 4 (from the
frequency distribution table)
 D3 = 3/10 (n + 1)th item = 3/10 (40 + 1)th item = 12.3th item = 5 (from the
frequency distribution table)
 P60 = 60/100 (n + 1)th item = 60/100 (40 + 1)th item = 24.6th item = 7 (from
the frequency distribution table)
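The general rule "rth quantile = r/s (n + 1)th item" lends itself to a single helper. The sketch below is illustrative only: `quantile_item` is a hypothetical name, and it reads the value from the cumulative frequency column in the same way as the worked answers, taking the first class whose Cf reaches the required position.

```python
from itertools import accumulate

x = [2, 3, 4, 5, 6, 7, 8, 9, 10]
f = [3, 3, 5, 5, 6, 8, 4, 4, 2]


def quantile_item(values, frequencies, r, s):
    """Return (position, value) of the r-th quantile when the data are split into s parts."""
    n = sum(frequencies)
    position = r * (n + 1) / s
    for value, cf in zip(values, accumulate(frequencies)):
        if position <= cf:
            return position, value
    raise ValueError("position exceeds the total frequency")


q1 = quantile_item(x, f, 1, 4)      # first quartile
d3 = quantile_item(x, f, 3, 10)     # third decile
p60 = quantile_item(x, f, 60, 100)  # sixtieth percentile
print(q1, d3, p60)
```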
4.8.1 Calculating Quantiles from grouped data
 Quartiles: the formula for locating the quartile class is the same for both
ungrouped and grouped data, namely Qi = i/4 (n + 1)th item, i = 1, 2, 3.
However, when we have a grouped frequency distribution we need to obtain
the specific values of the quartiles within a particular class. The method of
interpolation required to do so is just an extension of the formula for
finding the median of grouped data. Thus, for a grouped frequency
distribution, estimates of Q1, Q2 and Q3 are obtained from the formulae:
$Q_1 = L_1 + \left[\frac{\frac{1}{4}(n+1) - F}{F_1}\right]c_1$; $Q_2 = L_2 + \left[\frac{\frac{1}{2}(n+1) - F}{F_2}\right]c_2$; $Q_3 = L_3 + \left[\frac{\frac{3}{4}(n+1) - F}{F_3}\right]c_3$;
where Li (i = 1, 2, 3) = lower class boundary of the quartile class, Fi (i = 1, 2,
3) = frequency of the quartile class, F = cumulative frequency up to the
class preceding the quartile class, ci = width of the quartile class
 Deciles: Recall that to locate the decile class, Di = i/10 (n + 1)th item, i =
1, 2, 3, ..., 9. However, to calculate the actual value of a decile when the data
are grouped, we use $D_i = L_i + \left[\frac{\frac{i}{10}(n+1) - F}{F_i}\right]c_i$, i = 1, 2, 3, ..., 9
 Percentiles: Recall that to locate the percentile class we use Pi = i/100 (n
+ 1)th item, i = 1, 2, 3, ..., 99. However, to calculate the specific value of a
percentile when the data are grouped we use $P_i = L_i + \left[\frac{\frac{i}{100}(n+1) - F}{F_i}\right]c_i$, i =
1, 2, 3, ..., 99
Illustration: Calculate Q1, Q3 and P27 from the following grouped data
x    70-72   73-75   76-78   79-81   82-84
f    5       18      42      27      8

Solution
x        f     Cf     Class boundary
70-72    5     5      69.5-72.5
73-75    18    23     72.5-75.5
76-78    42    65     75.5-78.5
79-81    27    92     78.5-81.5
82-84    8     100    81.5-84.5

 Q1 position = ¼ (n+1)th item = ¼ (100+1)th item = ¼ (101) = 25.25th item.
Hence the Q1 class is 76-78. Therefore, Q1 = L1 + [(¼(n+1) – F)/F1]c1 = 75.5 +
[(25.25 – 23)/42] × 3 = 75.66
 Q3 position = ¾ (n+1)th item = ¾ (100+1)th item = 75.75th item. Hence the
Q3 class is 79-81. Therefore, Q3 = L3 + [(¾(n+1) – F)/F3]c3 = 78.5 + [(75.75 –
65)/27] × 3 = 79.69
 P27 position = 27/100 (n+1)th item = 27/100 (100+1)th item = 27.27th item.
Hence, the P27 class is 76-78. Therefore, P27 = L27 + [(27/100 (n+1) – F)/F27]
c27 = 75.5 + [(27.27 – 23)/42] × 3 = 75.81
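Because the quartile, decile and percentile formulae differ only in the fraction r/s, one function can cover all three. The Python sketch below is an illustration under that reading; `grouped_quantile` is a hypothetical name, and it follows the (n + 1) convention used in the worked solution.

```python
def grouped_quantile(r, s, lower_boundaries, frequencies, width):
    """Interpolated r-th quantile of grouped data split into s equal parts."""
    n = sum(frequencies)
    position = r * (n + 1) / s  # r/s (n + 1)th item
    cumulative = 0              # F: cumulative frequency before the class
    for lower, f_class in zip(lower_boundaries, frequencies):
        if cumulative + f_class >= position:
            return lower + (position - cumulative) / f_class * width
        cumulative += f_class
    raise ValueError("position exceeds the total frequency")


lower_boundaries = [69.5, 72.5, 75.5, 78.5, 81.5]
frequencies = [5, 18, 42, 27, 8]

q1 = grouped_quantile(1, 4, lower_boundaries, frequencies, width=3)
q3 = grouped_quantile(3, 4, lower_boundaries, frequencies, width=3)
p27 = grouped_quantile(27, 100, lower_boundaries, frequencies, width=3)
print(f"{q1:.4f} {q3:.4f} {p27:.4f}")
```

Once Q1 and Q3 are available, a quantity such as the semi-interquartile range ½ (Q3 – Q1) follows directly.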
The methods used above for finding quantiles from grouped data are
interpolation methods, which are essentially extensions of the method for
calculating the median. However, we can also use the cumulative frequency
curve to determine them, in the same way as we did when finding the median.
It is usually more convenient to use the percentage cumulative frequency
instead of the cumulative frequency. In other words, we plot the %
cumulative frequency on the vertical axis and the upper
class boundaries on the horizontal axis.
Illustration:

4.9 The Mode
The mode of a set of values is defined as the value which occurs
with the greatest frequency. For example, for the following set of values 2,
3, 3, 1, 3, 2, 4, 5, 8, 3, 2, 4, 4, 3, the mode is 3 since it occurs most often.
For a continuous grouped frequency distribution, $\text{mode} = L + \left[\frac{d_1}{d_1 + d_2}\right]c$,
where L = lower class boundary of the modal class, d1 = frequency of the
modal class minus frequency of the immediately preceding class, d2 =
frequency of the modal class minus frequency of the immediately following
class, c = width of the class interval.
Note that the quantity d1/(d1 + d2) is always between 0 and 1, ensuring that
the mode lies in the modal class.
Illustration: Estimate the mode of the following distribution
x    9.3-9.7   9.8-10.2   10.3-10.7   10.8-11.2   11.3-11.7   11.8-12.2   12.3-12.7   12.8-13.2
f    2         5          12          18          14          6           4           1

Modal class = 10.8-11.2 with L = 10.75, d1 = 18 – 12 = 6, d2 = 18 – 14 = 4, c =
11.25 – 10.75 = 0.5; Mode = L + [d1/(d1 + d2)]c = 10.75 + [6/(6 + 4)] × 0.5 = 11.05
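A small Python sketch of the same estimate follows. The function name `grouped_mode` is illustrative, and treating a missing neighbouring class (when the modal class is first or last) as having frequency 0 is an assumption of this sketch that the notes do not address.

```python
def grouped_mode(lower_boundaries, frequencies, width):
    """Estimate the mode as L + [d1 / (d1 + d2)] * c for the modal class."""
    k = max(range(len(frequencies)), key=lambda i: frequencies[i])
    f_modal = frequencies[k]
    # Assumption: a neighbour that does not exist counts as frequency 0.
    f_prev = frequencies[k - 1] if k > 0 else 0
    f_next = frequencies[k + 1] if k < len(frequencies) - 1 else 0
    d1 = f_modal - f_prev
    d2 = f_modal - f_next
    return lower_boundaries[k] + d1 / (d1 + d2) * width


lower_boundaries = [9.25, 9.75, 10.25, 10.75, 11.25, 11.75, 12.25, 12.75]
frequencies = [2, 5, 12, 18, 14, 6, 4, 1]

mode = grouped_mode(lower_boundaries, frequencies, width=0.5)
print(round(mode, 2))
```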
The mode of grouped data can also be obtained graphically. The two
lines (labelled AB in the figure below) are drawn on the highest rectangle of
the histogram. The mode is the X-value, i.e. the horizontal-axis value, of the
point of intersection of the two lines.

4.10 Relationship among mode, mean and median


For uni-modal frequency curves which are moderately skewed, the
following empirical relation holds approximately:
mean – mode = 3(mean – median). The shape of a frequency
distribution refers to its symmetry or width.
A distribution has zero skewness if it is symmetrical about its
mean. For a symmetrical uni-modal
distribution, the mean, median and mode are equal i.e. mean =
median = mode. This is graphically illustrated:

 Sk = 0

A distribution is positively skewed if the right tail is longer. Then, mean >
median > mode
 Sk = +ve

A distribution is negatively skewed if the left tail is longer. Then, mode >
median > mean
 Sk = -ve



4.10.1 Skewness
Skewness can be measured by Pearson's coefficient of skewness.
For a population, $S_k = \frac{3(\mu - \text{median})}{\sigma}$
For a sample, $S_k = \frac{3(\bar{X} - \text{median})}{s}$
Skewness can also be measured by the 3rd moment about the mean (the
numerator of the equations below) divided by the cube of the standard
deviation, as given below
For a population, $S_k = \frac{\sum f(X - \mu)^3 / N}{\sigma^3}$
For a sample, $S_k = \frac{\sum f(X - \bar{X})^3 / n}{s^3}$
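To make the two measures concrete, the sketch below applies them to the grouped examination scores used earlier in these notes, taking the interpolated median of 69.85 from section 4.7 as given. The standard deviation here divides by the total frequency, which is an assumption made to match the convention of the worked examples in this course.

```python
from math import sqrt

mid_points = [61, 64, 67, 70, 73, 76]  # class mid-points of the score table
frequencies = [1, 2, 13, 20, 11, 3]
median = 69.85                          # interpolated median from section 4.7

n = sum(frequencies)
mean = sum(f * x for f, x in zip(frequencies, mid_points)) / n
variance = sum(f * (x - mean) ** 2 for f, x in zip(frequencies, mid_points)) / n
std_dev = sqrt(variance)

# Pearson's coefficient of skewness: 3 (mean - median) / standard deviation.
pearson_sk = 3 * (mean - median) / std_dev

# Moment coefficient: third moment about the mean / cube of the standard deviation.
third_moment = sum(f * (x - mean) ** 3 for f, x in zip(frequencies, mid_points)) / n
moment_sk = third_moment / std_dev ** 3

print(round(pearson_sk, 3), round(moment_sk, 3))
```

Both coefficients come out slightly negative for these scores, which is consistent with the mean lying just below the median.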
4.10.2 Kurtosis
A frequency curve can be:
 Platykurtic: i.e. flat, with the observed values distributed
relatively evenly across the classes

 Mesokurtic: i.e. neither flat nor peaked in terms of the general
appearance of the frequency curve

 Leptokurtic: i.e. peaked, with a large number of observed values
concentrated within a narrow
range of the possible values of the variable being measured

The difference in the shapes of the curves can be made more apparent when
the 3 curves are superimposed on the same graph as shown below.

5.0 Measure of Dispersion
Dispersion refers to the variability or spread of the data. The measures
of dispersion include the range, mean (average) deviation, standard
deviation and variance.
5.1 Range
Range is the simplest of all measures of dispersion and can be
calculated very easily and quickly. The range of a set of numbers is the
difference between the largest and smallest numbers in the set. For a
grouped frequency distribution, an estimate of the range is the
difference between the lower class boundary of the 1st class and the
upper class boundary of the last class.
5.1.1 Illustration: the range of the set 2, 3, 8, 9, 7, 5, 3, 8, 9, 2, 4 is 9 – 2
= 7, since 9 is the largest and 2 is the smallest number.
Range is particularly useful in cases where we need to calculate the
dispersion of a very small set of numbers, where other techniques would be
far too time-consuming. However, the very simplicity of this measure, in
particular the fact that it uses only the two extreme values (ignoring all
others), precludes its use in any extensive analysis.
5.2. Mean Deviation
The mean deviation from the mean of a set of numbers x1, x2, x3, ..., xn
with arithmetic mean $\bar{x}$ is defined as:
Mean deviation = $\frac{1}{n}\sum_{i=1}^{n} |x_i - \bar{x}|$, where i = 1, 2, 3, ..., n and $|x_i - \bar{x}|$,
the positive difference between xi and x̄, is called the modulus of xi – x̄
5.2.1 Illustration: Consider the set of data 2, 3, 5, 3, 4, 1
x     x – x̄    |x – x̄|
2     -1       1
3     0        0
5     2        2
3     0        0
4     1        1
1     -2       2
Σx = 18        Σ|x – x̄| = 6
Mean (x̄) = Σx/n = 18/6 = 3; Mean deviation = 1/6 (6) = 1

More generally, the mean deviation can be defined as follows.

Definition: the mean deviation from some constant c of a set of numbers x1,
x2, x3, ..., xn is defined as M.D. = 1/n Σ |xi – c|, where c can be any numeric
value. If c = x̄, we obtain the mean deviation from the mean. In general, as a
measure of dispersion, the mean deviation is best measured from a
measure of location i.e. c is the mean, median or mode.
For a frequency distribution x1, x2, x3, ..., xn with corresponding
frequencies f1, f2, f3, ..., fn, the mean deviation is given by
$M.D. = \frac{1}{\sum f_i}\sum f_i |x_i - \bar{x}|$. The same formula is used for grouped data, where
x1, x2, x3, ..., xn represent the class mid-points.
5.2.2 Illustration: Find the M.D. for the frequency distribution
Class    f     x     fx     x – x̄    |x – x̄|   f|x – x̄|
10-20    2     15    30     -18.3    18.3      36.6
20-30    12    25    300    -8.3     8.3       99.6
30-40    24    35    840    1.7      1.7       40.8
40-50    8     45    360    11.7     11.7      93.6
Σf = 46, Σfx = 1530, Σf|x – x̄| = 270.6

x̄ = Σfx/Σf = 1530/46 = 33.3; M.D. = (1/Σf) Σ f|xi – x̄| = 1/46 × 270.6 = 5.88
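The calculation can be sketched in Python as below, weighting each absolute deviation by its class frequency. Because the code keeps the mean unrounded (1530/46 ≈ 33.26) rather than using 33.3, its result is about 5.90 and so differs slightly from the hand calculation.

```python
mid_points = [15, 25, 35, 45]  # class mid-points x
frequencies = [2, 12, 24, 8]   # class frequencies f

n = sum(frequencies)
mean = sum(f * x for f, x in zip(frequencies, mid_points)) / n

# M.D. = (1 / Σf) * Σ f |x - mean|
mean_deviation = sum(f * abs(x - mean) for f, x in zip(frequencies, mid_points)) / n

print(n, round(mean, 2), round(mean_deviation, 2))
```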

5.3 Measures Associated with Quantile


 Semi-interquartile range
 10-90 percentile range
The measure of dispersion based on quartiles is the semi-interquartile
range, or quartile deviation, and it is calculated as Qs = ½ (Q3 – Q1).
The measure of dispersion based on percentiles is the 10-90 percentile
range and it is calculated as P90 – P10.
5.4 Standard Deviation
The standard deviation of a set of numbers x1, x2, x3, …, xn with mean $\bar{x}$ is
denoted by S and defined as
$S = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2}$
In other words, S is the square root of the mean of the squared deviations
from the mean; hence, the standard deviation is sometimes
referred to as the root mean square deviation.
For a frequency distribution,
$S = \sqrt{\frac{\sum f_i (x_i - \bar{x})^2}{\sum f_i}}$
For grouped data, the standard deviation is computed using the
formula above, where fi = frequency of a class in the frequency distribution,
xi is the class mid-point, $\bar{x}$ is the mean and ∑fi = total number
of observations.
5.4.1 Illustration: Obtain the standard deviation of the data in the
following table
Class    60-62   63-65   66-68   69-71   72-74   75-77
f        1       2       13      20      11      3

Solution

Class    f     x     fx      x – x̄    |x – x̄|   |x – x̄|²   f|x – x̄|²
60-62    1     61    61      -8.8     8.8       77.44      77.44
63-65    2     64    128     -5.8     5.8       33.64      67.28
66-68    13    67    871     -2.8     2.8       7.84       101.92
69-71    20    70    1400    0.2      0.2       0.04       0.8
72-74    11    73    803     3.2      3.2       10.24      112.64
75-77    3     76    228     6.2      6.2       38.44      115.32
Σf = 50, Σfx = 3491, Σf|x – x̄|² = 475.4

x̄ = Σfx/Σf = 3491/50 = 69.8; S = √(475.4/50) = 3.08
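The steps of this solution can be sketched in Python as follows. The code divides by the total frequency Σf, as the worked example does, and keeps the mean unrounded (69.82), so the sum of squared deviations comes to 475.38 rather than 475.4; the standard deviation still rounds to 3.08.

```python
from math import sqrt

mid_points = [61, 64, 67, 70, 73, 76]  # class mid-points x
frequencies = [1, 2, 13, 20, 11, 3]    # class frequencies f

n = sum(frequencies)
mean = sum(f * x for f, x in zip(frequencies, mid_points)) / n

# Σ f (x - mean)^2, then divide by Σf for the variance.
sum_sq = sum(f * (x - mean) ** 2 for f, x in zip(frequencies, mid_points))
variance = sum_sq / n
std_dev = sqrt(variance)

print(round(mean, 2), round(sum_sq, 2), round(variance, 2), round(std_dev, 2))
```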

5.5 Variance
It is the square of the standard deviation i.e. V = S². For example, if
S = 3.08, then V = 3.08² = 9.49.
The following procedure is used in finding the standard deviation of
grouped data
 Calculate the mean
 Calculate the deviation of each class mid-point from the mean
 Square the deviations
 Multiply each squared deviation by the frequency of its class
 Apply the formula for standard deviation

6.0 ELEMENTARY PROBABILITY THEORY


6.1 Set Theory
A set is a well-defined collection of objects sharing the same
characteristic e.g. the set of students offering SSC 201, the set of university
lecturers, the set A = {x: x is even},
the set K = {x: x² + 5x – 6 = 0}
6.1.1 Some Definitions
 Universal set: the universal set, denoted by U, is a set that contains all
the sets under consideration. The universal set is not a fixed set; it
depends on the particular sets being discussed.
 Subset: a set A is a subset of a set B if all the elements of A are contained
in B, e.g. if G = {1, 3, 5, 7} and S = {1, 3}, then S is a subset of G.
 Empty Set: the empty set, denoted by Ø, is a set that has no element and,
by definition, it is a subset of every set.
 Equal Sets: two sets are equal when they contain exactly the same
elements, without regard to order. Given E = {3, 6, 7} and S = {6, 3, 7},
then E and S are equal.
 The order of a set: the order of a set A, written as n(A), is defined as the
number of elements in set A.
Set Notation and Venn Diagram
 Intersection of sets: the set of all elements which belong to both A and B is
the intersection of A and B i.e. A ∩ B = {x: x ∈ A and x ∈ B}. A ∩ B is
illustrated in the diagram below

 Disjoint sets: two sets A and B are said to be disjoint if they have no
element in common. That is to say that A ∩ B = Ø

 Union of sets: the set of all elements belonging to either A or B or both is
called the union of A and B i.e. A ∪ B = {x: x ∈ A or x ∈ B}. For
example, if A = {1, 2, 3, 4} and B = {1, 2, 3, 7}, then A ∪ B = {1, 2, 3, 4, 7}
(Note: no elements are repeated)

 Complement of a set: if B ⊂ A, then A – B is called the complement of B
relative to A and is denoted by B′.
Laws of Sets
 Commutative Law - A ∪ B = B ∪ A; A ∩ B = B ∩ A
 Associative Law - A ∪ (B ∪ C) = (A ∪ B) ∪ C; A ∩ (B ∩ C) = (A ∩ B) ∩ C
 Distributive Law - A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C); A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C)
 Law of Complement - A ∩ A′ = Ø; A ∪ A′ = U; (A′)′ = A
 Idempotent Law - A ∪ A = A; A ∩ A = A
 De Morgan's Law - (A ∪ B)′ = A′ ∩ B′; (A ∩ B)′ = A′ ∪ B′
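These laws can be checked on small examples with Python's built-in `set` type, where `|` is union, `&` is intersection and `-` is set difference. The universal set U and the sets A, B and C below are made up for illustration; a check on one example is of course not a proof.

```python
U = set(range(1, 11))  # hypothetical universal set {1, ..., 10}
A = {1, 2, 3, 4}
B = {1, 2, 3, 7}
C = {3, 4, 5}


def complement(s):
    """Complement relative to the universal set U."""
    return U - s


laws = {
    "commutative": A | B == B | A and A & B == B & A,
    "associative": A | (B | C) == (A | B) | C and A & (B & C) == (A & B) & C,
    "distributive": A | (B & C) == (A | B) & (A | C),
    "complement": A & complement(A) == set() and A | complement(A) == U,
    "idempotent": A | A == A and A & A == A,
    "de_morgan": complement(A | B) == complement(A) & complement(B)
    and complement(A & B) == complement(A) | complement(B),
}

for name, holds in laws.items():
    print(name, holds)
```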
6.2 Probability
Probability is the study of random or non-deterministic events.
The early development of probability owes much to the interest of
European mathematicians in games of chance in the latter part of the 17th
century. Since then the concept of probability has become the foundation
for the development of techniques of statistical inference that are used in
many fields of basic and applied research, including economic analysis
and managerial decision-making.

6.2.1 Definition of Probability


Basically, three different approaches or schools of thought have been
developed over time towards defining probability, and these approaches
determine how probability values are obtained and interpreted. The 3
approaches are the classical approach, the relative frequency approach
and the subjective approach.
6.2.1.1 Classical approach

The definition of probability according to the classical approach states that
"if an event E can happen in 'n' different ways out of a total of 'N' possible,
equally likely and mutually exclusive ways, then the probability of event E
occurring is given by P(E) = n/N", where P(E) denotes the probability that
event E will occur. Note that by mutually exclusive we mean that the
occurrence of any one outcome precludes the occurrence of any other
outcome in the same trial or observation. In other words, two
events A and B are mutually exclusive if they are disjoint i.e. A ∩ B is an
empty set, A ∩ B = Ø. Simply put, mutually exclusive events cannot occur
simultaneously.
This approach permits the determination of probability values before
any sample results are observed. For this reason the approach is
known as the a priori approach i.e. before the experiment. The classical or
a priori approach to probability can only be applied to games of chance
(such as tossing a balanced coin, rolling a fair die or picking a card from a
well-shuffled lot) where we can determine, without experimentation, the
probability that an event will occur. In real-world problems of
economics and business we often cannot assign probability values a priori
(without experimentation), and the classical approach cannot be used. The
classical approach to defining probability requires equally likely
possibilities. However, there are many situations in real life in which the
possibilities that arise cannot be regarded as equally likely.

6.2.1.2 The Relative Frequency Approach

By the relative frequency approach, the probability value is determined on
the basis of the proportion of times that the favourable outcome occurs in a
set of sample observations or trials. No a priori assumption about the
outcomes is made in this approach, i.e. observations must be made in order
to determine the probability value. Because it is based on the collection and
analysis of data, it is often called the empirical approach. The approach is
useful in situations where sample data relating to the event of interest can
be collected. Thus, by the relative frequency approach, the probability that
event A will occur is
$P(A) = \frac{n_A}{n}$, where nA is the number of observations in which A occurred and
n is the total number of observations.
Because the relative frequency approach is based on sample data and
not on a priori knowledge, the probability obtained is an estimate of the exact
probability. However, as the number of observations is increased, the
observed relative frequency of an event tends to become stable, e.g. if
1000 tosses of a coin result in 520 heads, then the relative frequency of
heads is 520/1000 = 0.52. If another 1000 tosses result in 485 heads, then
the relative frequency in the total of 2000 tosses is (520 + 485)/2000 =
0.5025. According to the relative frequency definition, by continuing in this
manner, we should ultimately get closer and closer to a number we can call
the probability of a head in a single toss of a coin. In other words, we get
closer to the classical probability value which is 0.5.
The figure below portrays the observed relative frequency of heads
in each set of 100 tosses of a fair coin as the number of tosses increases.
The relative frequency of heads approaches the classical probability value,
eventually stabilizing at about 0.5.
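The stabilising behaviour described above can be explored with a small simulation. The sketch below uses Python's pseudo-random generator as a stand-in for a fair coin; the function name, the seed and the checkpoints are arbitrary illustrative choices, and the exact figures printed depend on the seed.

```python
import random


def running_relative_frequency(n_tosses, checkpoints, seed=0):
    """Simulate coin tosses and record the relative frequency of heads."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    heads = 0
    results = {}
    for toss in range(1, n_tosses + 1):
        if rng.random() < 0.5:  # treat this outcome as a head
            heads += 1
        if toss in checkpoints:
            results[toss] = heads / toss
    return results


# Pooled relative frequency from the worked figures in the text.
pooled = (520 + 485) / 2000

freqs = running_relative_frequency(100_000, {100, 1_000, 10_000, 100_000})
for tosses in sorted(freqs):
    print(tosses, round(freqs[tosses], 4))
print(pooled)
```

With more tosses the recorded relative frequency should settle closer to 0.5, which is the classical probability of a head.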

A difficulty with the relative frequency approach is that it gives different
probability values for different numbers of trials or experiments, although
the value stabilizes or approaches a limit as the number of trials increases.
The danger is that, because it is time-consuming, people may end up using
the relative frequency approach without a sufficient number of trials or
experiments.
In terms of the interpretation of probability values, both the classical
and relative frequency approaches use objective probability values i.e. the
probability value is interpreted as indicating the long-run relative rate of
occurrence of the event.

6.2.1.3 Subjective Approach


This approach is applicable when the process generating the
event cannot be known on an a priori basis and results cannot be sampled.
By this approach, the probability of an event is the degree of belief held by
an individual in respect of the event, based on all available evidence. For
instance, one may consider the state of the labour economy, the level of
profit in the industry, comparative inter-industry wages and a number of
other factors in estimating the probability of a labour dispute. Because the
probability value so arrived at is a personal judgement, this approach is
also called "the personalistic approach". The disadvantage of the subjective
or personalistic approach to probability is that different people faced with
the same situation may come up with completely different probabilities. It
is prone to bias.
6.3 Some Definitions
 Experiment: this can be defined as the procedure adopted in order to gain
information about some process. An experiment usually involves an action
and an observation
 Event: an event of an experiment can be thought of as some particular
situation that can arise during the experiment. An event is a subset of the
sample space
 Sample Point: a particular outcome out of all the possible outcomes of an
experiment is called a sample point
 Sample Space: the set S of all possible outcomes of some given
experiment is called the sample space. The sample space is the certain or
sure event. For instance, in a roll of a fair die the sample space is given as S
= {1, 2, 3, 4, 5, 6}
 Elementary Event: an event consisting of a single sample point is called an
elementary event, e.g. in the sample space above, A = {3} is an elementary
event
 Impossible event: this is an event that cannot occur. It is denoted by the
empty set Ø, e.g. getting a 7 in the toss of a
fair die.
We can combine events to form new events using the various set operations:
 A ∪ B is the event that occurs if A occurs or B occurs or both
 A ∩ B is the event that occurs if A occurs and B occurs
 A′ or Ac, the complement of A, is the event that occurs if A does not occur
6.3.1 Illustration: Consider the toss of a die. The sample space consists of
the six possible numbers
S = {1, 2, 3, 4, 5, 6}. Let A be the event that an even number occurs, B that
an odd number occurs and C that a prime number occurs. Therefore, A = {2, 4,
6}, B = {1, 3, 5} and C = {2, 3, 5}.
Find (i) A ∪ C (ii) B ∩ C and (iii) Cc
A ∪ C = {2, 3, 4, 5, 6}; B ∩ C = {3, 5}; Cc = {1, 4, 6}
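Events on a finite sample space map directly onto Python sets, so this illustration can be reproduced in a few lines. The sketch below also attaches a classical probability to each event by counting outcomes; the helper `prob` is an illustrative name and assumes equally likely outcomes.

```python
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}            # sample space for one toss of a die
A = {x for x in S if x % 2 == 0}  # even number occurs
B = {x for x in S if x % 2 == 1}  # odd number occurs
C = {2, 3, 5}                     # prime number occurs

a_or_c = A | C   # A ∪ C
b_and_c = B & C  # B ∩ C
not_c = S - C    # complement of C


def prob(event):
    """Classical probability: favourable outcomes / all outcomes."""
    return Fraction(len(event), len(S))


print(sorted(a_or_c), prob(a_or_c))
print(sorted(b_and_c), prob(b_and_c))
print(sorted(not_c), prob(not_c))
```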
6.4 Probability of a single event: If an event A can occur in nA ways out of
a total of N possible and equally likely outcomes, the probability that event A
will occur is given by P(A) = nA/N. The probability of event A so described
can be visualized using a Venn diagram. In the following figure, the region A
represents event A and the total area of the rectangle represents all possible
outcomes.

Note that P(A) ranges between 0 and 1 i.e. 0 ≤ P(A) ≤ 1. If P(A) = 0,
event A cannot occur; if P(A) = 1, event A will occur with
certainty. If P(A′) represents the probability of non-occurrence of
event A, then P(A) + P(A′) = 1
6.4.1 Illustration: Head and Tail are the two equally likely outcomes in
tossing a balanced coin. Thus, the probability of a head is P(H) = ½, since
nH = 1 and N = 2. Similarly P(T) = ½, so P(H) + P(T) = ½ + ½ = 1
6.5 Rule of addition for mutually exclusive events
Two events A and B are mutually exclusive if the occurrence of A
makes the occurrence of B impossible and vice versa. Then,
P(A ∪ B) = P(A) + P(B), i.e. P(A or B) = P(A) + P(B)
6.6 Rule of addition for non-mutually exclusive events
Two events A and B are not mutually exclusive if the occurrence of A
does not preclude the occurrence of B and vice versa. Then,
P(A ∪ B) = P(A) + P(B) – P(A ∩ B), i.e. P(A or B) = P(A) + P(B) – P(A and B)
The probability of A ∩ B is deducted to avoid double counting.
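Both addition rules can be verified by counting outcomes for one toss of a die. The sketch below uses exact fractions; the function name prob and the event D (a number greater than 3) are assumptions chosen for illustration.

```python
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}

def prob(event):
    """P(event) = n_A / N for equally likely outcomes in S."""
    return Fraction(len(event & S), len(S))

A = {2, 4, 6}   # even number
B = {1, 3, 5}   # odd number: mutually exclusive with A
D = {4, 5, 6}   # assumed event "greater than 3": overlaps A

# Mutually exclusive events: P(A ∪ B) = P(A) + P(B)
lhs_exclusive = prob(A | B)
rhs_exclusive = prob(A) + prob(B)

# Non-mutually exclusive events: P(A ∪ D) = P(A) + P(D) - P(A ∩ D)
lhs_general = prob(A | D)
rhs_general = prob(A) + prob(D) - prob(A & D)

print(lhs_exclusive, rhs_exclusive, lhs_general, rhs_general)
```

Without subtracting P(A ∩ D), the outcomes 4 and 6 would be counted twice.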
6.7 Rule of multiplication for independent events
Two events A and B are said to be independent if the occurrence of
A does not affect the probability of occurrence of B. Then the joint
probability of A and B is P(A ∩ B) = P(A) · P(B)
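As a check of the multiplication rule, consider two tosses of a die, where the result of the first toss does not affect the second. The events chosen here (first toss even, second toss a six) and the name prob are assumptions made for the sketch.

```python
from fractions import Fraction
from itertools import product

# Sample space for two tosses of a die: 36 equally likely ordered pairs
S2 = set(product(range(1, 7), repeat=2))

def prob(event):
    """P(event) by counting equally likely outcomes in S2."""
    return Fraction(len(event), len(S2))

A = {(i, j) for (i, j) in S2 if i % 2 == 0}   # first toss is even
B = {(i, j) for (i, j) in S2 if j == 6}       # second toss is a six

joint = prob(A & B)                       # P(A ∩ B) by direct counting
product_of_marginals = prob(A) * prob(B)  # P(A) · P(B)

print(joint, product_of_marginals)
```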
6.7.1 Illustration:
• Let A and B be two events. Find an expression and exhibit the Venn
diagram for the event that (i) A but not B occurs (ii) either A or B but not
both occurs.
• (i) Since A but not B occurs, shade the area of A outside B. This is the area
that represents the intersection of A and B′. The expression for this event is
A ∩ B′.
[Figure: Venn diagram with the part of A outside B shaded]
If A and B are mutually exclusive, then A ∩ B′ = A and the whole of A is
shaded.
[Figure: Venn diagram of mutually exclusive events A and B with A shaded]
• (ii) Either A or B but not both occurs means that exactly one of the two
events occurs. The expression for this event is (A ∩ B′) ∪ (A′ ∩ B).
7.0 RANDOM VARIABLES AND PROBABILITY DISTRIBUTION
A random variable is a variable whose values are associated with
some probability or chance of being observed. For instance, in one roll of a
fair die, we have six mutually exclusive outcomes defined by the sample
space S = {1, 2, 3, 4, 5, 6}, with each sample point, i.e. each of the
outcomes, associated with a probability of occurrence of 1/6: P(1) = 1/6, P(2)
= 1/6, P(3) = 1/6, P(4) = 1/6, P(5) = 1/6, P(6) = 1/6; x = 1, 2, 3, 4, 5, 6. The
outcome of a roll of a die is called a random variable.
A continuous variable is one that can assume any value within a
given interval. A continuous variable can be measured with any degree of
accuracy simply by using smaller units of measurement, e.g. time, income,
distance, waves and temperature. A measured value of a continuous variable
is never exact: to say that a production time is 10 hours means anywhere
between 9.5 and 10.5 hours, i.e. 10 hours rounded to the nearest hour. For
the same reason we do not assign a probability to any single value that a
continuous variable might take. Instead we construct ranges of values for
the variable and assign a probability to each of these ranges. For example,
suppose x is a continuous random variable with values from 0 to 5; we could
have probabilities assigned as follows.
Range of x     0-1   1-2   2-3   3-4   4-5
Probability    0.2   0.2   0.4   0.1   0.1

That is, P(0 ≤ x ≤ 1) = 0.2; P(1 ≤ x ≤ 2) = 0.2; P(2 ≤ x ≤ 3) = 0.4 etc.

Let x be a continuous variable that can assume values only in the
ranges x1 up to x2, x2 up to x3, …, xn up to xn+1 with respective probabilities
P1, P2, …, Pn. Then P(x1 ≤ x ≤ x2) = P1, P(x2 ≤ x ≤ x3) = P2 etc., and the
sum ∑Pi = 1 for i = 1, 2, …, n
7.1 Probability Distribution
A probability distribution is the set of all possible values of a random
variable together with their associated probabilities. A probability
distribution is analogous to a frequency distribution: the theoretical
probabilities indicate the relative frequencies we expect to observe. Recall
that a frequency distribution is of the form:
X    X1   X2   …   Xn   Total
F    F1   F2   …   Fn   ΣFi
A probability distribution, on the other hand, is of the form:
X    X1   X2   …   Xn   Total
P    P1   P2   …   Pn   ΣPi = 1
To illustrate, let the random variable x be the score shown after a
single throw of an unbiased die. If we throw the die 36 times we might
obtain the following empirical distribution.
Table I
X 1 2 3 4 5 6 Total
f 4 6 8 7 5 6 36

Now examine the theoretical situation for the same experiment, with
S = {1, 2, 3, 4, 5, 6}.
P(1) = P(2) = P(3) = P(4) = P(5) = P(6) = 1/6, i.e. P(X = x) = 1/6 for
x = 1, 2, 3, 4, 5, 6. Hence, the table of x values and associated
probabilities is given below.

Table II
X    1     2     3     4     5     6     Total
P    1/6   1/6   1/6   1/6   1/6   1/6   1

7.2 Transforming a Frequency Distribution to a Probability Distribution and vice versa
Notice that the frequency distribution in Table I can be turned into a
relative (proportional) frequency distribution by dividing each frequency by
the sum of the frequencies, giving the following table.
Table III
X            1      2      3      4      5      6      Total
Proportion   4/36   6/36   8/36   7/36   5/36   6/36   1

The proportions add up to 1, as in the probability distribution in Table
II. This enables us to compare Table III with Table II. In doing so, we regard
the proportions as probability estimates. Similarly, the original probability
distribution in Table II can be transformed into a frequency distribution by
imagining that the experiment is performed 36 times. We need only
multiply each probability by 36 to obtain what are called "expected
frequencies".
Table IV
X                    1   2   3   4   5   6   Total
Expected frequency   6   6   6   6   6   6   36
Note that the total expected frequency is 36, which is the same as the sum
of the frequencies in Table I.
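The two transformations described above can be sketched in a few lines. The observed frequencies are those of Table I; the variable names are assumptions introduced for the example.

```python
from fractions import Fraction

scores = [1, 2, 3, 4, 5, 6]
observed = [4, 6, 8, 7, 5, 6]    # frequencies from Table I
n_throws = sum(observed)         # 36 throws in total

# Frequency -> relative frequency (probability estimates)
proportions = [Fraction(f, n_throws) for f in observed]

# Theoretical probability -> expected frequency for the same number of throws
theoretical = [Fraction(1, 6)] * 6
expected = [p * n_throws for p in theoretical]

for x, prop, exp_freq in zip(scores, proportions, expected):
    print(x, prop, exp_freq)
```

Fraction prints each proportion in lowest terms, so 4/36 appears as 1/9; the value is unchanged.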
We conclude that frequency and probability distributions differ
essentially in how they arise from a statistical experiment.
Theoretical considerations give rise to a probability distribution; actually
performing the experiment results in a frequency distribution.
7.3 Some Special Probability Distributions
In statistics there are special types of distribution that occur often
in practical situations. The most important of these are the binomial,
Poisson and normal distributions. While the binomial and Poisson
distributions are discrete, the normal distribution is continuous.
7.3.1 Binomial Distribution
A binomial distribution is a discrete probability distribution used to
find the probability of a given number of outcomes classified as ‘success’
(as opposed to ‘failure’) in n trials of the same experiment. The probability
of success is usually denoted by p while that of failure is denoted by q.
Obviously, p + q = 1. The binomial distribution arises when:
• There are only two possible and mutually exclusive outcomes; one is
labelled ‘success’ and the other ‘failure’
• The n trials of the same experiment are independent
• The probability of success is constant for each trial
The probability of getting x successes in n independent trials is given
by:
$f(x) = P(X = x) = {}^{n}C_{x}\, p^{x} q^{n-x}$ for x = 0, 1, 2, …, n
Since p + q = 1, q = 1 – p, and therefore
$P(X = x) = {}^{n}C_{x}\, p^{x} (1-p)^{n-x}$, where n is a positive integer and p
ranges between 0 and 1, i.e. 0 ≤ p ≤ 1. n and p are the parameters.
7.3.1.1 The mean and variance of a binomial distribution
Mean = µ = np; Variance = σ² = npq; standard deviation = σ = √(npq)
= √(np(1 – p))
The exact form of the binomial distribution depends on the values of
the two parameters n and p. In particular, the value of p determines the
skewness of the binomial distribution.
If p = ½ the distribution is symmetric whatever the value of n. For example,
when n = 4 and p = ½ we have $f(x) = {}^{4}C_{x} (1/2)^{x} (1/2)^{4-x}$. For
x = 0, 1, 2, 3, 4 we have:
$P(X = 0) = {}^{4}C_{0} (1/2)^{0} (1/2)^{4} = 1/16$; $P(X = 1) = {}^{4}C_{1} (1/2)^{1} (1/2)^{3} = 4/16$
$P(X = 2) = {}^{4}C_{2} (1/2)^{2} (1/2)^{2} = 6/16$; $P(X = 3) = {}^{4}C_{3} (1/2)^{3} (1/2)^{1} = 4/16$
$P(X = 4) = {}^{4}C_{4} (1/2)^{4} (1/2)^{0} = 1/16$
We can now write the binomial probability distribution function in the form
of a probability table as follows:
X      0      1      2      3      4
P(x)   1/16   4/16   6/16   4/16   1/16
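The probability table for n = 4 and p = ½ can be reproduced directly from the binomial formula. In this sketch the function name binomial_pmf is an assumption, and exact fractions are used so the results can be compared with the table; the mean and variance are computed from the table and agree with np and npq.

```python
from fractions import Fraction
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) = nCx * p^x * (1 - p)^(n - x)."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

n, p = 4, Fraction(1, 2)
table = {x: binomial_pmf(x, n, p) for x in range(n + 1)}

# Mean and variance computed from the table itself
mean = sum(x * px for x, px in table.items())
variance = sum((x - mean) ** 2 * px for x, px in table.items())

print(table, mean, variance)
```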

And the histogram is of the form:
[Figure: histogram of the binomial distribution with n = 4 and p = ½]
• When p < ½ the distribution is skewed to the right.
• When p > ½ the distribution is correspondingly skewed to the left.
In general, for small n (n ≤ 10), the closer p is to 0 or 1, the more
skewed (right or left) the binomial distribution is. However, as n becomes
larger the skewness diminishes and the distribution becomes more
symmetric.

7.3.1.2 Shorthand Notation for Binomial Distribution
If a random variable X is binomially distributed, i.e.
$P(X = x) = f(x) = {}^{n}C_{x}\, p^{x} q^{n-x}$ for x = 0, 1, 2, …, n, then this is
written in convenient shorthand notation as X ~ b(n, p), which is read as
“X has a binomial distribution with parameters n and p”.
For example, 5 fair coins are tossed. Find the probability of obtaining
(a) 3 heads (b) at least 3 heads (c) at most 3 heads
We can regard the tossing of a single coin as an experiment and,
since the toss of one coin is independent of any other, we can regard the
tossing of 5 coins as conducting 5 independent trials. Let H = success and
T = failure. Then P(H) = p = ½ and, since p + q = 1, q = ½. So we
have a binomial distribution with n = 5 and p = ½. Defining X as the
number of heads (i.e. successes), X ~ b(5, ½), and we know that
$P(X = x) = {}^{n}C_{x}\, p^{x} q^{n-x}$.
• (a) $P(X = 3) = {}^{5}C_{3}(1/2)^{3}(1/2)^{2} = 10/32 = 5/16$
• (b) P(at least 3 heads) = P(X ≥ 3) = P(X = 3) + P(X = 4) + P(X = 5)
$= {}^{5}C_{3}(1/2)^{3}(1/2)^{2} + {}^{5}C_{4}(1/2)^{4}(1/2)^{1} + {}^{5}C_{5}(1/2)^{5}(1/2)^{0}$
= 5/16 + 5/32 + 1/32 = 16/32 = 1/2
• (c) P(at most 3 heads) = P(X ≤ 3) = P(X = 0) + P(X = 1) + P(X = 2) + P(X = 3)
$= {}^{5}C_{0}(1/2)^{0}(1/2)^{5} + {}^{5}C_{1}(1/2)^{1}(1/2)^{4} + {}^{5}C_{2}(1/2)^{2}(1/2)^{3} + {}^{5}C_{3}(1/2)^{3}(1/2)^{2}$
= 1/32 + 5/32 + 10/32 + 10/32 = 26/32 = 13/16 or 0.813
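The three answers for the five-coin example can be checked by summing binomial probabilities. This is a sketch with assumed names (binomial_pmf and the three result variables); it uses exact fractions to avoid rounding.

```python
from fractions import Fraction
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) = nCx * p^x * (1 - p)^(n - x)."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

n, p = 5, Fraction(1, 2)   # X ~ b(5, 1/2)

p_exactly_3 = binomial_pmf(3, n, p)
p_at_least_3 = sum(binomial_pmf(x, n, p) for x in range(3, n + 1))
p_at_most_3 = sum(binomial_pmf(x, n, p) for x in range(0, 4))

print(p_exactly_3, p_at_least_3, p_at_most_3)
```

"At least" and "at most" differ only in the range of x that is summed.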
7.3.2 The Poisson Distribution
The Poisson distribution is another discrete distribution. It is used to
determine the probability of a designated number of successes per unit of
time, when the successes occur independently and the average
number of successes per unit of time remains constant. The conditions
that apply to the binomial distribution also apply to the Poisson
distribution. Then,
$P(x) = \lambda^{x} e^{-\lambda} / x!$ for x = 0, 1, 2, 3, …
where x = designated number of successes and P(x) = probability of x
successes. The Greek letter λ (lambda) is the average number of successes
per unit of time, and e is the base of the natural logarithm, approximately
2.71828. It can be shown that the mean = λ and the variance = λ. As
with the binomial distribution, the exact shape of the Poisson distribution
depends on the value of the parameter λ. Note that λ can take any positive
value.
Consider λ = 1. Then $P(x) = 1^{x} e^{-1}/x! = 1/(e \cdot x!)$, i.e.
P(x) = 0.3679/x!
Therefore, P(x=0) = 0.3679/0! = 0.3679; P(x=1) = 0.3679/1! = 0.3679
P(x=2) = 0.3679/2! = 0.1839; P(x=3) = 0.3679/3! = 0.0613
P(x=4) = 0.3679/4! = 0.0153; P(x=5) = 0.3679/5! = 0.0031
As the value of x increases, the subsequent probabilities drop
quite rapidly.
Now consider λ = 3. Then $P(x) = 3^{x} e^{-3}/x!$ and $e^{-3} = 0.0498$, so
$P(x) = 3^{x}(0.0498)/x!$. Hence,
when x = 0, $P(x) = 3^{0}(0.0498)/0! = 0.0498$
when x = 1, $P(x) = 3^{1}(0.0498)/1! = 0.1494$
when x = 2, $P(x) = 3^{2}(0.0498)/2! = 0.2241$
Given the value of λ (the mean and variance of the Poisson distribution) we
can find $e^{-\lambda}$ from tables. We then substitute the value of $e^{-\lambda}$
into the equation $P(x) = \lambda^{x} e^{-\lambda}/x!$ to find P(x).
7.3.2.1 Illustration: A bank receives on average 5 bad cheques
per day. What is the probability that on a particular day the bank will
receive (a) no bad cheques (b) 3 bad cheques (c) fewer than 3 bad cheques
(d) at least 2 bad cheques?
7.3.2.2 Solution
λ = 5, $P(x) = 5^{x} e^{-5}/x!$, where x = number of bad cheques
• (a) $P(x = 0) = 5^{0} e^{-5}/0! = 0.00674$
• (b) $P(x = 3) = 5^{3} e^{-5}/3! = 0.1404$
• (c) P(x < 3) = P(x=0) + P(x=1) + P(x=2)
$= 0.00674 + 5^{1} e^{-5}/1! + 5^{2} e^{-5}/2!$ = 0.00674 + 0.0337 + 0.08425 = 0.12469
• (d) P(x ≥ 2) = P(x=2) + P(x=3) + … = 1 – P(x < 2) = 1 – [P(x=0) + P(x=1)]
= 1 – (0.00674 + 0.0337) = 0.95956
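The four probabilities in this solution can be recomputed from the Poisson formula. This is a minimal sketch; the function name poisson_pmf and the result variable names are assumptions.

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    """P(x) = lam^x * e^(-lam) / x!"""
    return lam ** x * exp(-lam) / factorial(x)

lam = 5   # average number of bad cheques per day

p_none = poisson_pmf(0, lam)                                     # (a)
p_three = poisson_pmf(3, lam)                                    # (b)
p_fewer_than_3 = sum(poisson_pmf(x, lam) for x in range(3))      # (c)
p_at_least_2 = 1 - (poisson_pmf(0, lam) + poisson_pmf(1, lam))   # (d)

print(round(p_none, 5), round(p_three, 4),
      round(p_fewer_than_3, 4), round(p_at_least_2, 4))
```

The computed values may differ from the hand calculation in the last decimal place, because the hand calculation rounds e⁻⁵ to 0.00674 before multiplying. Part (d) uses the complement because "at least 2" has no upper limit on x.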

7.3.2.3 Poisson Approximation to Binomial
A binomial distribution with parameters n and p can be approximated by a
Poisson distribution with parameter λ = np when: (i) n is large, i.e. n ≥ 30
(ii) p is small, i.e. p < 0.1 (iii) np < 4. Thus,
$P(X = x) \approx e^{-np}(np)^{x}/x!$; x = 0, 1, 2, …, n
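To see how close the approximation is, the sketch below compares the two distributions for one set of parameters satisfying the three conditions. The values n = 100 and p = 0.02 are assumptions chosen for illustration, as are the function names.

```python
from math import comb, exp, factorial

def binomial_pmf(x, n, p):
    """Exact binomial probability nCx * p^x * (1 - p)^(n - x)."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

def poisson_pmf(x, lam):
    """Poisson probability lam^x * e^(-lam) / x!"""
    return lam ** x * exp(-lam) / factorial(x)

n, p = 100, 0.02   # assumed values: n >= 30, p < 0.1, np = 2 < 4
lam = n * p

for x in range(6):
    print(x, round(binomial_pmf(x, n, p), 4), round(poisson_pmf(x, lam), 4))

# Largest difference between the two distributions over x = 0, ..., 10
max_gap = max(abs(binomial_pmf(x, n, p) - poisson_pmf(x, lam))
              for x in range(11))
```

For these assumed values the two columns agree to about two decimal places.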
7.3.3 Normal Distribution
A continuous variable x having a probability density function of the
form
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^{2}}{2\sigma^{2}}}$$
for – ∞ < x < ∞, where – ∞ < µ < ∞ and σ > 0, is said to have a
normal distribution. Here f(x) is the height of the normal curve, e = the
constant 2.7183, π = the constant 3.1416, µ = the mean of the distribution
and σ = the standard deviation of the distribution.
The normal distribution is the most commonly used of all the
probability distributions in statistical analysis. This is because many
distributions found in nature and industry are approximately normal. Some
examples are the IQ, weight and height of a large number of people and the
variation in dimensions of a large number of parts produced by a machine.
A convenient shorthand notation for a random variable distributed
normally is X ~ N(µ, σ2) and reads that X is distributed normally with
parameters µ and σ2. Suppose, X ~ N(8, 25), the mean = 8, variance σ2 =
25, standard deviation σ = √25 = 5.
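The height of the normal curve for X ~ N(8, 25) can be evaluated from the standard formula for the normal density. The sketch below also checks numerically that the curve is symmetric about the mean and that the area under it is close to 1; the function name normal_pdf and the simple midpoint-rule integration are assumptions made for the example.

```python
from math import exp, pi, sqrt

def normal_pdf(x, mu, sigma):
    """Height of the normal curve at x, for mean mu and standard deviation sigma."""
    return exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * sqrt(2 * pi))

mu, variance = 8, 25      # X ~ N(8, 25)
sigma = sqrt(variance)    # standard deviation = 5

peak = normal_pdf(mu, mu, sigma)        # maximum height, at x = mu
left = normal_pdf(mu - 3, mu, sigma)    # 3 units below the mean
right = normal_pdf(mu + 3, mu, sigma)   # 3 units above the mean

# Midpoint-rule estimate of the area between mu - 8*sigma and mu + 8*sigma
n_steps = 8000
step = 16 * sigma / n_steps
start = mu - 8 * sigma
area = sum(normal_pdf(start + (k + 0.5) * step, mu, sigma)
           for k in range(n_steps)) * step

print(round(peak, 4), left == right, round(area, 6))
```

Note that the second parameter in N(µ, σ²) is the variance, so the standard deviation must be obtained by taking the square root before using the formula.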
7.3.3.1 Features of a Normal Distribution
• The curve is bell-shaped and symmetrical about a vertical axis through the
mean µ
• The mode, which is the point on the horizontal axis where the curve
reaches its maximum, occurs at X = µ
• The normal curve approaches the horizontal axis asymptotically as it
proceeds in either direction away from the mean.
• The total area under the curve and above the horizontal axis is equal to 1.
This feature is illustrated graphically in Figure A.
[Figure A: normal curve with total area equal to 1]
The appearance of the curve changes with the values of the parameters µ
and σ². Figure B shows the effect of altering µ with σ fixed and
Figure C shows the effect of altering σ with µ fixed.
[Figure B: normal curves with different µ and the same σ]
[Figure C: normal curves with the same µ and different σ]

7.3.3.2 The Standard Normal Distribution
A normal distribution having µ = 0 and σ² = 1 is called a standard
normal distribution.
X ~ N(µ, σ²): normal distribution; Z ~ N(0, 1): standard normal
distribution
The random variable associated with this distribution is usually
denoted by Z, so that we can write Z ~ N(0, 1).
[Figure: standard normal curve centred at 0]
To obtain the formula for the standard normal distribution we put µ = 0 and
σ² = 1 in the formula of the normal probability density function to give:
$$f(x) = \frac{1}{\sqrt{2\pi}}\, e^{-x^{2}/2}$$
This is usually denoted by φ(x). In other words, the probability density
function of a standard normal variable Z is given by:
$$\phi(z) = \frac{1}{\sqrt{2\pi}}\, e^{-z^{2}/2} \qquad (*)$$
We do not need to evaluate the expression (*) directly. Tables
have been drawn up for a wide range of x. One such table is contained in the
Cambridge Elementary Statistical Tables (CEST), which gives only
probabilities of the form Pr(Z < x) where x ≥ 0. The CEST cannot give
Pr(Z < ∞) or Pr(Z < -2).
Graphically, the table gives probability values of the form:
[Figure: standard normal curve with the area to the left of x shaded]
7.3.3.3 Example:
• Find Pr(Z < 1.64)
The required area is the area to the left of 1.64. Therefore,
Pr(Z < 1.64) = 0.9495 from the table
• Find Pr(Z < 0.1)
Therefore, Pr(Z < 0.1) = 0.5398 from the table
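When a printed table is not to hand, Pr(Z < x) can be computed from the error function in Python's math module, using the identity Pr(Z < x) = ½[1 + erf(x/√2)]. The function name phi_cdf is an assumption; this is a sketch for checking table values, not a replacement for the tables used in the course.

```python
from math import erf, sqrt

def phi_cdf(x):
    """Pr(Z < x) for Z ~ N(0, 1), computed from the error function."""
    return 0.5 * (1 + erf(x / sqrt(2)))

p_164 = round(phi_cdf(1.64), 4)   # compare with the table value for 1.64
p_010 = round(phi_cdf(0.1), 4)    # compare with the table value for 0.1

print(p_164, p_010)
```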

However, we may be asked to find other probabilities for Z, such as
Pr(Z ≥ x), Pr(Z < –a) and others. This is done by appropriately
transforming the given problem into the form Pr(Z < x); x ≥ 0.
The following relationships enable us to calculate all types of probability
connected with the standard normal distribution using the symmetry of the
normal curve.
7.3.3.4 Illustration 1
Find the probability of the form Pr(Z > a); a ≥ 0
[Figure: standard normal curve divided at a into area A (left) and area B (right)]
Total probability = 1, i.e. area A + area B = 1
Pr(Z < a) + Pr(Z > a) = 1. Therefore, Pr(Z > a) = 1 – Pr(Z < a) ………. (Rule 1)
Note that the tabulated probability is the area from the left up to the value
a. For example, find Pr(Z > 0.5).
The probability of the shaded area plus the unshaded area = 1, i.e.
Pr(Z > 0.5) = 1 – Pr(Z < 0.5) = 1 – 0.6915 = 0.3085
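7.3.3.5 Illustration 2
Find the probability of the form Pr(Z > –a)
[Figure: standard normal curve with the area to the right of –a shaded]
Because of symmetry, Area A = Area C. Hence the area to the right of –a
equals the area to the left of a,
i.e. Pr(Z > –a) ≡ Pr(Z < a) ………………… (Rule 2)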
7.3.3.6 Illustration 3:
Find the probability of the form Pr(Z < –a)
[Figure: standard normal curve with area A to the left of –a, area B between –a and a, and area C to the right of a]
Area A = Area C (symmetry), therefore
Area B + Area A = Area B + Area C,
Area A + Area B + Area C = 1
Area A = 1 – (Area B + Area C) = 1 – (Area B + Area A)
Since Area A + Area B = Pr(Z < a),
Pr(Z < –a) = 1 – Pr(Z < a) …………….. (Rule 3)
7.3.3.7 Illustration 4:
Finding Pr(b < Z < c)
[Figure: standard normal curve with area B between b and c shaded]
The required area is Area B. Pr(b < Z < c) = Pr(Z < c) – Pr(Z < b)
………… (Rule 4)
In the above figure, we have assumed that both b and c are positive, but
the same argument applies if either or both of b and c are negative.
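The four rules can be expressed as small functions built on a "table" that, like the printed one, only answers Pr(Z < x) for x ≥ 0. In this sketch the names table, pr_less, pr_greater and pr_between are assumptions, and the table itself is simulated with the error function.

```python
from math import erf, sqrt

def table(x):
    """Stand-in for the printed table: Pr(Z < x), defined only for x >= 0."""
    if x < 0:
        raise ValueError("the table only covers x >= 0")
    return 0.5 * (1 + erf(x / sqrt(2)))

def pr_less(x):
    """Pr(Z < x) for any x; negative x is handled by Rule 3."""
    return table(x) if x >= 0 else 1 - table(-x)

def pr_greater(x):
    """Pr(Z > x) = 1 - Pr(Z < x): Rule 1, which gives Rule 2 when x < 0."""
    return 1 - pr_less(x)

def pr_between(b, c):
    """Pr(b < Z < c) = Pr(Z < c) - Pr(Z < b): Rule 4."""
    return pr_less(c) - pr_less(b)

print(round(pr_greater(0.5), 4))    # Rule 1
print(round(pr_greater(-0.5), 4))   # Rule 2: same as Pr(Z < 0.5)
print(round(pr_less(-0.5), 4))      # Rule 3
```

Every call eventually reduces to a lookup with a non-negative argument, which is exactly how the rules are used with the printed table.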
7.3.3.8 Examples:
• Find Pr(0.95 < Z < 1.36)
Using Rule 4, Pr(0.95 < Z < 1.36) = Pr(Z < 1.36) – Pr(Z < 0.95) = 0.9131 –
0.8289 = 0.0842
• Find Pr(-1.50 < Z < 2.50)
Pr(-1.50 < Z < 2.50) = Pr(Z ≤ 2.5) – Pr(Z ≤ -1.5) = Pr(Z ≤ 2.5) – [1 – Pr(Z ≤
1.50)]
= 0.9938 – (1 – 0.9332) = 0.9270
• Find Pr(95 < X < 105), where the working below takes µ = 100 and σ = 4.
Standardizing with Z = (X – µ)/σ:
for X = 95, Z = (95 – 100)/4 = –5/4 = –1.25
for X = 105, Z = (105 – 100)/4 = 1.25
so Pr(95 < X < 105) = Pr(–1.25 < Z < 1.25) = Pr(Z < 1.25) – Pr(Z < –1.25)
= 0.8944 – (1 – 0.8944) = 0.7888
[Figure: Z-scale showing –1.25, 0 and 1.25]
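The last example relies on standardizing X to the Z-scale before using the table. The sketch below does the same thing numerically; the function name normal_cdf is an assumption, and µ = 100 and σ = 4 are the values implied by the working (95 – 100)/4.

```python
from math import erf, sqrt

def normal_cdf(x, mu=0.0, sigma=1.0):
    """Pr(X < x) for X ~ N(mu, sigma^2), by standardizing to Z = (x - mu)/sigma."""
    z = (x - mu) / sigma
    return 0.5 * (1 + erf(z / sqrt(2)))

mu, sigma = 100, 4            # implied by the working above

z_low = (95 - mu) / sigma     # lower limit on the Z-scale
z_high = (105 - mu) / sigma   # upper limit on the Z-scale
p_x = normal_cdf(105, mu, sigma) - normal_cdf(95, mu, sigma)

# The second example, already on the Z-scale
p_z = normal_cdf(2.50) - normal_cdf(-1.50)

print(z_low, z_high, round(p_x, 4), round(p_z, 4))
```

Results computed this way can differ from table-based answers in the fourth decimal place, because the table values are themselves rounded.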

8.0 HYPOTHESIS TESTING


Testing hypotheses about population characteristics such as the mean,
standard deviation, variance etc. is a fundamental aspect of
statistical inference and analysis. Hypothesis testing refers to the
acceptance or rejection of an assumption made about an unknown
population characteristic.
Here, we shall be concerned with testing hypotheses about the
population mean alone. We first consider the following definitions and then
state the formal steps in testing a hypothesis about the population mean.
• Statistical Hypothesis: this is an assumption about the value of a
parameter of the population under consideration
• Null Hypothesis: a statistical hypothesis which can be tested. Often, it
is the hypothesis that we are suspicious about and wish to
disprove. The symbol used for the null hypothesis is H0 and we write H0:
µ = µ0, where µ = the population mean and µ0 = the assumed value of the mean
• Alternative Hypothesis: the hypothesis that is automatically accepted if
the null hypothesis is rejected. It is represented by H1. For instance, the
alternative hypothesis could be H1: µ ≠ µ0, H1: µ > µ0 or H1: µ < µ0. The
alternative hypothesis adopted depends on the nature of the problem.
There are two types of alternative hypothesis.
• Two-Tailed Test: a two-tailed alternative hypothesis considers any
change in the parameter. The change can be an increase or a decrease. Hence
a two-tailed hypothesis is stated in the form H1: µ ≠ µ0
• One-Tailed Test: a one-tailed alternative hypothesis strictly considers
either an increase or a decrease in the value of the tested parameter, e.g.
H1: µ > µ0 (right-tailed) or H1: µ < µ0 (left-tailed).
• Critical Region: the set of possible values of a test statistic (e.g. the
Z statistic) that leads us to reject the null hypothesis. The critical
region is also called the rejection region
• Acceptance Region: the set of possible values of a test statistic that
leads us to accept the null hypothesis.
• Level of Significance: this is the probability that the test statistic falls
in the critical (rejection) region when H0 is true. It is represented by α.
The most commonly used levels of significance are 5% and 1%. We shall use
the 5% level of significance. At the 0.05 level of significance, the critical
values defining the acceptance and rejection regions are illustrated below
for both two-tailed and one-tailed tests, using the Z-distribution.
[Figure: Two-tailed test]
[Figure: Right-tailed test]
[Figure: Left-tailed test]
• Decision Rule: a decision rule for a statistical test specifies the values
of the test statistic that lead to either acceptance or rejection
of the stated null hypothesis. Decision rules are given below for two-tailed
and one-tailed tests at the 5% level of significance.
• Decision Rule for a two-tailed test: accept H0 if |Z*| < 1.96. Otherwise
reject H0.
$Z^{*} = \dfrac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$, where x̄ = sample mean, µ0 = hypothesized
mean, σ = population standard deviation (or the sample standard deviation
when n is large) and n = sample size
• Decision Rule for a one-tailed test: (a) for a right-tailed test, accept H0 if
Z* < 1.64. Otherwise, reject H0 (b) for a left-tailed test, accept H0 if Z* >
–1.64. Otherwise, reject H0
8.1 Illustration: Suppose we have the following hypotheses to be tested,
given a sample mean x̄ = 7.5:
H0: µ = 8.5; H1: µ ≠ 8.5.
Given a sample of size 50 from a population with standard deviation 10, we
proceed as follows at α = 0.05.
8.1.1 Step 1: Calculate the Z value corresponding to the sample mean
$Z^{*} = \dfrac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \dfrac{7.5 - 8.5}{10/\sqrt{50}} = -0.71$; |Z*| = 0.71
8.1.2 Step 2:
Determine the acceptance and rejection regions from the tables. At α = 0.05,
the value of Z that leaves α/2 = 0.025 of the area in each tail is
Z0.025 = ±1.96
8.1.3 Step 3:
Set up the decision rule. Accept H0 if |Z*| < 1.96. Otherwise, reject H0
8.1.4 Step 4:
Make the decision by comparing the value of Z* with the tabulated value of Z.
Since |Z*| = 0.71 < 1.96, we accept H0.
8.1.5 Step 5:
Draw the conclusion. Since H0 has been accepted, the sample mean of 7.5 is
not significantly different from our a priori value of µ = 8.5
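Steps 1 to 4 of this illustration can be sketched in code. The function name z_statistic is an assumption, and the value 10 is used as the standard deviation σ, following the arithmetic of Step 1.

```python
from math import sqrt

def z_statistic(sample_mean, mu0, sigma, n):
    """Z* = (sample_mean - mu0) / (sigma / sqrt(n))."""
    return (sample_mean - mu0) / (sigma / sqrt(n))

# Step 1: standardize the sample mean (sigma = 10, as in the hand calculation)
z_star = z_statistic(sample_mean=7.5, mu0=8.5, sigma=10, n=50)

# Steps 2 and 3: two-tailed critical value at alpha = 0.05 and decision rule
critical = 1.96
accept_h0 = abs(z_star) < critical

# Step 4: make the decision
print(round(z_star, 2), "accept H0" if accept_h0 else "reject H0")
```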
8.2 Errors in statistical tests
In all types of statistical test, it is possible to make an error. There are
two important errors that we need to be aware of:
• Type I error: this is made when we reject a true hypothesis, e.g. if we
reject H0 when it is true, we have committed a Type I error. Hence,
Pr(Type I error) = Pr(rejecting H0 when it is true) = α. Note that α is the
level of significance. Hence, the level of significance is the probability of
rejecting a true hypothesis, i.e. of committing a Type I error.
• Type II error: this refers to the acceptance of a false hypothesis, e.g. if
we accept H0 when it is false then we have committed a Type II error. Hence,
Pr(Type II error) = Pr(accepting H0 when H1 is true) = β
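The link between the critical values and α can be checked numerically: when H0 is true, Z* follows the standard normal distribution, so the probability of landing in the rejection region is a tail area. The sketch below assumes the critical values 1.96 (two-tailed) and 1.64 (one-tailed) used in these notes; the function name phi_cdf is an assumption.

```python
from math import erf, sqrt

def phi_cdf(x):
    """Pr(Z < x) for Z ~ N(0, 1)."""
    return 0.5 * (1 + erf(x / sqrt(2)))

# Pr(Type I error) for a two-tailed test: Pr(|Z| > 1.96) when H0 is true
alpha_two_tailed = 2 * (1 - phi_cdf(1.96))

# Pr(Type I error) for a right-tailed test: Pr(Z > 1.64) when H0 is true
alpha_right_tailed = 1 - phi_cdf(1.64)

print(round(alpha_two_tailed, 4), round(alpha_right_tailed, 4))
```

Both values come out very close to 0.05, which is why these critical values correspond to the 5% level of significance.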

8.3 Formal steps in testing a hypothesis about the population mean at α = 0.05
• Assume that µ = some hypothetical value, i.e. write H0: µ = µ0. The
alternative hypothesis could be any of H1: µ ≠ µ0, H1: µ < µ0, H1: µ > µ0,
depending on the problem
• Using the 5% level of significance and the Z-distribution, define the
acceptance and rejection regions
• Compute the sample mean from a random sample and standardize it on the
Z scale
• State the decision rule, make the decision and conclude

8.4 Example:
A producer of steel cables wants to test whether the steel cables he is
producing have a breaking strength of 5000 units. A breaking strength of
less than 5000 units will not be adequate, and to produce steel cables with a
breaking strength of more than 5000 units would unnecessarily increase
production cost. The producer takes a random sample of 64 pieces and
finds that the mean breaking strength is 5100 units and the sample
standard deviation is 480 units. Should the producer accept the hypothesis
that his steel cable has a breaking strength of 5000 units at the 5% level of
significance?
H0: µ = 5000 units; H1: µ ≠ 5000 units
$Z^{*} = \dfrac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \dfrac{5100 - 5000}{480/\sqrt{64}} = \dfrac{100}{480/8} = 1.67$
At α = 0.05, α/2 is used because the test is two-tailed: Zα/2 = Z0.05/2 =
Z0.025 = 1.96
8.4.1 Decision rule: Reject H0 if |Z*| > 1.96. Accept H0 if |Z*| < 1.96
8.4.2 Conclusion: Since |Z*| = 1.67 < 1.96, we accept H0 and conclude
that the breaking strength of the cable is not
significantly different from 5000 units.
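The whole procedure can be wrapped in one function that also covers the one-tailed decision rules. This is a sketch under stated assumptions: the function name z_test, the tail labels and the CRITICAL dictionary are introduced here, and the left-tailed rule is written with the critical value –1.64, the mirror image of the right-tailed one.

```python
from math import sqrt

# Critical values at the 5% level of significance used in these notes
CRITICAL = {"two": 1.96, "right": 1.64, "left": -1.64}

def z_test(sample_mean, mu0, sd, n, tail="two"):
    """Return (Z*, accept_H0) for a test of H0: mu = mu0 at alpha = 0.05."""
    z_star = (sample_mean - mu0) / (sd / sqrt(n))
    if tail == "two":
        accept = abs(z_star) < CRITICAL["two"]
    elif tail == "right":
        accept = z_star < CRITICAL["right"]
    elif tail == "left":
        accept = z_star > CRITICAL["left"]
    else:
        raise ValueError("tail must be 'two', 'right' or 'left'")
    return z_star, accept

# Steel cable example: H0: mu = 5000 against H1: mu not equal to 5000
z_star, accept_h0 = z_test(5100, 5000, 480, 64, tail="two")
print(round(z_star, 2), "accept H0" if accept_h0 else "reject H0")
```

The two-tailed alternative is the appropriate choice here because the producer is concerned about deviations from 5000 units in either direction.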
