
INTRODUCTION TO STATISTICS (5 ECTS)

Compiled by:

Alelign Ademe (MSc)


Fentahun Tesafa (MSc)

FEBRUARY, 2013
Statistics for Agribusiness Introduction to Statistics

Table of Contents
3.1. INTRODUCTION TO STATISTICS
3.1.1 INTRODUCTION
3.1.2 OBJECTIVES
3.1.3 SECTIONS
3.1.3.1 INTRODUCTORY CONCEPTS IN STATISTICS
3.1.3.1.1 Definitions of Statistics
3.1.3.1.2 Categories of Statistics
3.1.3.1.3 Functions and scopes of statistics
3.1.3.1.4 Limitations of statistics
3.1.3.2 GRAPHICAL AND NUMERICAL DESCRIPTIVE TECHNIQUES
3.1.3.2.1 Graphical Descriptive Techniques
3.1.3.2.2 Numerical Descriptive Techniques
3.1.3.3 MEASURES OF DISPERSION
3.1.3.3.1 Absolute Measures of Dispersion
3.1.3.3.2 Relative Measures of Dispersion
3.1.3.4 PROBABILITY THEORIES
3.1.3.4.1 Basic Concepts of Probability
3.1.3.4.2 Definitions and Types of Probability
3.1.3.4.3 Basic Rules of Probability
3.1.3.4.4 Normal Probability Distribution
3.1.3.5 CONCEPTS OF SAMPLING AND THEIR APPLICATIONS
3.1.3.5.1 Basic Concepts of Sampling
3.1.3.5.2 Probability and Non-Probability Sampling Methods
3.1.3.5.3 Sampling Problems (Errors in Sample Survey)
3.1.3.5.4 Sampling Distributions
3.1.3.5.4.1 Sampling Distribution of the Mean
3.1.3.5.4.2 Sampling Distribution of the Proportions
3.1.3.6 STATISTICAL ESTIMATION AND HYPOTHESIS TESTING
3.1.3.6.2 Determining the Sample Size
3.1.3.6.3 Hypothesis Testing
3.1.3.6.3.1 Basic Concepts in Hypothesis Testing
3.1.3.6.3.2 Hypothesis Tests about a Population Mean

Jimma, Haramaya, Hawassa, Ambo, Adama, Samara and Wolaita Sodo Universities

3.1.3.6.4 Hypothesis Tests about a Population Proportion
3.1.3.6.5 Analysis of Variance
Proof of Ability


3.1. LEARNING TASK I: INTRODUCTION TO STATISTICS

3.1.1 Introduction
The learning task was designed to equip students with the ability to identify the importance and application areas of statistics in their field of study; interpret statistical information, reports, charts and figures; choose appropriate sampling methods and procedures; explain the basic concepts of probability distributions and their applications; and use estimation and testing methods for prediction and generalization purposes. In addition, the learning task attempts to enable students to describe data collection tools and procedures.

3.1.2 Objectives

This learning task will enable learners to:


 Know the elements, concepts and principles of basic statistics;
 Realize the need for data collection and summarize data sets into
meaningful information;
 Execute appropriate statistical techniques and write sound interpretations
for use in practical decision making;
 Describe data with appropriate measures of central tendency, variability
and relationship;
 Identify the different types of statistical distributions with their
applications;
 Execute appropriate sampling methods for various types of research in agribusiness and value chains;
 Draw conclusions about population parameters based on the information
obtained from the sample and determine its significance by performing
hypothesis testing;
 Illustrate the application of statistics in areas of their discipline; and
 Demonstrate the relationship of statistics to their everyday activities.

3.1.3 SECTIONS

3.1.3.1: INTRODUCTORY CONCEPTS IN STATISTICS

How is statistics defined? What are the applications of statistics in the various spheres of human activity? What are its limitations?
Definitions of Statistics
Statistics has been defined differently by different authors from time to time; one can find more than a hundred definitions in the literature. The word statistics can be used either in the plural or in the singular sense. When it is used in the plural, it refers to a systematic presentation of facts and figures, and it is in this context that the majority of people use the word.
When statistics is used as singular, it is defined as the science of collecting,
organizing, presenting, analyzing and interpreting numerical data for useful
purposes. According to this definition, the area of statistics incorporates the
following five elements:
(a) Proper collection of data
The data itself forms the foundation of statistical analysis, and hence the data must be carefully and accurately collected and accumulated. If the data is faulty, it will lead to wrong conclusions. The data may be available from existing published sources, which may already be organized into a presentable form, or it may be collected by the investigators themselves.
(b) Organization and classification of data
The collected data must now be edited in order to correct any inconsistencies,
biases, omissions and irrelevant answers in the survey or any mistakes in the
necessary computations. After editing, the data must be classified into suitable groups according to some common characteristics of the elements of the data. This makes the data easier to present.
(c) Presentation of data


The organized data can now be presented in the form of tables or diagrams.
This presentation in an orderly manner facilitates the understanding as well as
the analysis of data.
(d) Analysis of data
The basic purpose of data analysis is to make it useful for certain conclusions.
This analysis may simply be critical observation of data to draw conclusions
about it or it may involve highly complex and sophisticated mathematical
techniques.
(e) Interpretation of the data
Interpretation means drawing conclusions from the data which form the basis of
decision making. Correct interpretation requires a high degree of skill and
experience and is necessary in order to draw valid conclusions.
Categories of Statistics
Statistics can be divided into two broad categories such as descriptive statistics
and inferential statistics.
(i) Descriptive statistics
It is a collection of methods that are used to summarize and describe the
important characteristics of a set of measurements (data). As the name
suggests, it merely describes the data and consists of methods used in the
collection, organization, presentation and analysis of the data in order to
describe the various characteristics of such data. The methods can be either
graphical or computational; examples include frequency distribution tables, pie charts, bar graphs, and numerical summaries of data.
(ii) Inferential statistics
It is a set of procedures used to make inferences about the population
characteristics from the information contained in the sample. It can be used, for
example, to predict the price of fertilizer in the coming year based on the
sample information in this year, to estimate the effect of the intensity of rainfall
on plant growth, etc. Figure 1.1 below shows the major divisions of statistics.


Figure 1.1 Major divisions in the field of statistics

Functions and scopes of statistics


Statistics is not a mere device for collecting numerical data, but a means of developing sound techniques for handling and analyzing them and drawing valid inferences from them. Statistics is applied in every sphere of human activity, social as well as physical, such as Biology, Commerce, Education, Planning, Business Management, Information Technology, etc. It is almost impossible to find a single department of human activity where statistics cannot be applied.
The following are some of the applications of statistics in different disciplines:

Statistics and Industry: In industries, control charts are widely used to maintain a certain quality level. In production engineering, statistical tools such as inspection plans and control charts are of extreme importance for determining whether the product conforms to specifications or not. In inspection plans we have to resort to some kind of sampling, a very important aspect of statistics.

Statistics and Commerce: No businessman can afford either to understock or to overstock his goods. He first estimates the demand for his goods and then takes steps to adjust his output or purchases accordingly. Thus statistics is indispensable in business and commerce.


Statistics and Agriculture: Tests of significance based on small samples can be used to test the significance of the difference between sample means. For example, suppose five fertilizers are applied to five plots of wheat each, and the yield of wheat on each of the plots is recorded. In such a situation, we are interested in finding out whether the effects of these fertilizers on the yield are significantly different or not. The answer to this problem is provided by the technique of ANOVA, which is used to test the homogeneity of several population means.

Statistics and Economics: Nowadays statistics is used abundantly in any economic study. Both in economic theory and practice, statistical methods play an important role. Statistical data and tools are also immensely useful in solving many economic problems, such as those concerning wages, prices, production, and the distribution of income and wealth.

Statistics is no longer confined to the domain of mathematicians, but has spread to most branches of knowledge and human operations, including the social and behavioral sciences. One of the reasons for its phenomenal growth is the variety of functions attributed to it. Some of the important functions of statistics are:
(1) It represents facts in the form of numerical data: Qualitative expressions of facts are prone to misinterpretation, since qualitative statements do not lend themselves to precise comparisons. Statistics, however, are numerical expressions in a precise and definite form, which makes them easier to understand. For example, a statement like “unemployment is very high in Ethiopia” does not convey as precise a meaning as the actual unemployment figures for the current year compared with similar figures for previous years.
(2) It condenses and summarizes a mass of data into a few presentable,
understandable and precise figures: For example, the average salary of a
teacher is derived from a mass of data and surveys. But, just one summarized
figure gives us a pretty good idea about the income of a teacher. Similarly,

stock market prices of individual stocks and their trends are highly complex to
comprehend, but a graph of price trends, gives us the picture at a glance.
(3) It facilitates comparisons of data: Absolute figures by themselves often do not convey much meaning. Statistical devices such as averages, percentages, and ratios are the tools that can be employed for the purpose of comparison.
(4) It helps in formulating and testing hypotheses for the purpose of correlation: It helps us establish relationships between two or more variables. For example, the degree of association between the extent of training and productivity can be obtained by using statistical tools.
(5) It helps in predicting future trends: Statistical methods are highly useful for analyzing past data and predicting future trends. For example, the sales of a particular product for the next year can be forecast from the sales volumes of that product in previous years, current market trends, and possible changes in the variables that affect the sales.
(6) It helps the central management in formulating policies: Based upon the
forecast of future trends, events or demand, the management can revise their
policies and plans to meet the future needs.

Limitations of statistics
Nowadays statistics has become an inevitable part of our life. Though the use and application of statistics is vast, it is not always easy to collect reliable information or data, and there is ample room for mistakes in its application. Besides this, statistics has some limitations, some of which are:
(a) Statistics is not suitable for the study of qualitative phenomena: Since statistics basically deals with sets of numerical data, it is applicable only to those subjects of enquiry that can be expressed in terms of quantitative measurements. As a matter of fact, qualitative phenomena like honesty, poverty, beauty, intelligence, etc., cannot be expressed numerically, and statistical analysis cannot be directly applied to such qualitative phenomena.
(b) Statistics does not study individuals: Statistics does not give any specific
importance to the individual items; in fact it deals with an aggregate of objects.

Individual items, when taken individually, do not constitute statistical data and do not serve any purpose for statistical enquiry.
(c) Statistical laws are not exact: It is well known that the mathematical and physical sciences are exact. Statistical laws, however, are only approximations. Statistical conclusions are not universally true; they are true only on the average.
(d) Statistics is liable to be misused: Statistics should be used only by experts; otherwise, statistical methods become dangerous tools in the hands of the inexpert. The use of statistical tools by inexperienced and untrained persons might lead to wrong conclusions, and statistics can easily be misused by quoting wrong figures.
(e) Statistics can analyze only aggregated observations or data: Statistics deals with collections of data; an individual observation by itself does not belong to statistics. Statistics therefore analyzes a collection of data and sheds light on the overall estimated result. For example, the average income of the laborers of a business can be estimated by observing their per capita income. The average income neither singles out nor neglects anybody's income; in this respect, statistics gives an overall idea.
(f) Statistical rules are mutable: In some branches of science, unchangeable principles and data are available, but not in statistics. The principles of statistics are variable, changeable, and approximate.
(g) Statistics is simply a method: The practical solution of any problem can be approached in many ways; statistics is one of the methods of solving a problem. Its evidence gives an idea of the matter beforehand, and needs to be supported by other observations or data.
Learning activities
1. Which definition of statistics is more important? Why?
2. What are the major importance and limitations of statistics?

Continuous Assessment
Question and answer, quiz


Summary
In this section you have learnt the introductory concepts of statistics and acquainted yourself with its definitions. When statistics is used in the singular sense, it incorporates five elements: the collection, organization, presentation, analysis, and interpretation of numerical data. Statistics can also be categorized into descriptive and inferential statistics. Descriptive statistics is used to collect, organize, summarize, and present numerical data, whereas inferential statistics is used to perform hypothesis testing, determine relationships between variables, and make predictions. Finally, you have learnt the major functions and limitations of statistics.

3.1.3.2 GRAPHICAL AND NUMERICAL DESCRIPTIVE TECHNIQUES

What are the various graphical descriptive techniques? What are the various types of measures of central tendency? Which descriptive technique is more important? Why?
The subject area of descriptive statistics includes procedures used to summarize
masses of data and present them in an understandable way. Thus, we can
classify and examine the techniques (or procedures) of descriptive statistics into
two: graphical descriptive techniques (frequency distribution) and numerical
descriptive techniques (measures of central tendency). In this learning task, we
will discuss the various types of graphical descriptive techniques and numerical
descriptive techniques.

Graphical Descriptive Techniques

A frequency distribution is a tabulation of the values that one or more variables
take in a sample. Each entry in the table contains the frequency or count of the
occurrences of values within a particular group or interval, and in this way the
table summarizes the distribution of values in the sample. A frequency
distribution can be constructed for qualitative data (called qualitative frequency
distribution) and quantitative data (called quantitative frequency distribution).


(1) Qualitative frequency distribution: A list of qualitative classes with their corresponding frequency counts (or observations). The non-quantitative classes (or categories) of a qualitative variable are called qualitative classes.
Example 2.1: Assume we have the following information on a sample of 50
individuals who recently purchased a new car.
Table 2.1: Frequency distribution of new car purchases for n = 50

Qualitative classes (type of car)   Citroen   Renault   Toyota   Honda   Ford   Total
Observations (frequency counts)         9        14        8       11      8      50

(2) Quantitative frequency distribution: It is also called a numerical frequency distribution. It is a table which shows how the numbers in a data set (a group of
members) are distributed in an interval which includes the smallest and largest
numbers in the data set. In constructing a quantitative frequency distribution,
we have to be extremely careful in defining the non-overlapping classes to be
used in the distribution.
Example 2.2: Data on the time taken (in days) by an accounting company to
complete its end-of-year audits for a sample of 30 organizations are given
below:

Table 2.2: End-of-year audit times (in days) for 30 sample organizations

25 32 45 8 24 42 22 12 9 15 26 35 23 41 47
18 44 37 27 46 38 24 43 46 10 21 36 45 22 18
In order to construct a frequency distribution from these data, there are
essentially three steps that should be followed: (1) Determining the number of
classes (2) Determining the width of each class and (3) Determining the limits
of each class.
Before constructing a frequency distribution for a quantitative variable, it is
important to note the following concepts:
(a) A class – is a description of a group of similar numbers in a data set.
(b) A class frequency – is the total number of data values in a class.
(c) Class limits – are the numbers used to describe classes in a frequency distribution. A given class is bounded by two limits, called the lower

class limit (LCL) and the upper class limit (UCL). While the LCL identifies the smallest possible data value assigned to the class, the UCL identifies the largest possible data value assigned to the class.
(d) A class mark (mid-class) – is the average of the limits of a given class. The class limits can be either stated or real.
(e) A class width (class interval) – is the difference between successive lower class limits, successive upper class limits, successive class marks, or the real class limits of a given class.
The three steps of constructing a quantitative frequency distribution are now
discussed briefly as follows:
(1) Determining the number of classes: Classes should be mutually exclusive.
They are formed by specifying values that will be used to group or classify
the elements in the data set. Data sets with a large number of elements
usually require a large number of classes. The objective is to use just
enough classes to capture the inherent variation within the data. Therefore,
as a rule of thumb we shall use from 5-20 classes. In general, smaller
number of classes is appropriate for small data sets, and larger number of
classes is appropriate for large data sets. The appropriate number of classes
may be decided by Yule’s formula, which is as follows:

Number of classes = 2.5 × n^(1/4), where “n” is the total number of observations.

For the audit-time data of Example 2.2: Number of classes = 2.5 × 30^(1/4) = 2.5 × 2.34 ≈ 5.85 ≈ 6
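Yule’s rule of thumb is easy to verify directly. The following is a minimal sketch (Python is assumed here; the module itself prescribes no software) applied to the n = 30 audit times:

```python
# Yule's rule of thumb for the number of classes: 2.5 * n^(1/4),
# applied to the n = 30 audit times of Example 2.2.
n = 30
approx = 2.5 * n ** 0.25   # about 5.85
num_classes = round(approx)
print(num_classes)         # 6
```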

(2) Determining class width: A larger number of classes inevitably means a smaller class width, and a larger class width means a smaller number of classes. Once we decide the number of classes, we can compute an approximate class width by using the following formula:

Approximate class width = (largest data value − smallest data value) / number of classes

From the above example, since the chosen number of classes is 6, the approximate class width is (47 − 8)/6 ≈ 6.5, which is rounded to the nearest convenient integer, 7. The appropriate class width may be obtained through a trial-and-error process, which should be viewed as a normal feature of developing a frequency distribution with quantitative data. It is, however, desirable for the class width to be the same for all classes in order to facilitate a meaningful interpretation. For the purpose of this exercise we will use a common class width of 7, measured as the difference between successive lower class limits.

(3) Determining class limits: Once we decide the number of classes, the class
width and the lower limit of the first class, it will be simple to obtain the
values of the lower and upper class limits of each class. Thus, in the above
example, the first interval ranges from 8 to 14, the second 15 to 21… and the
last class 43 to 49. However, the initial and terminal intervals (or limits) are
again determined by the judgment of the investigator. This then completes
the necessary steps for the construction of quantitative frequency
distributions. The frequency distribution for the data set in our example is as
follows:
Table 2.3: Frequency distribution for audit time in days

Class intervals    Frequency
8 – 14                 4
15 – 21                4
22 – 28                8
29 – 35                2
36 – 42                5
43 – 49                7
Total                 30
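The three construction steps can also be carried out in code. The following is a minimal sketch (Python assumed; the module prescribes no software), tallying the 30 audit times into six classes whose lower limits are 7 apart, reproducing Table 2.3:

```python
# Raw audit times (in days) from Table 2.2.
data = [25, 32, 45, 8, 24, 42, 22, 12, 9, 15, 26, 35, 23, 41, 47,
        18, 44, 37, 27, 46, 38, 24, 43, 46, 10, 21, 36, 45, 22, 18]

# Six classes starting at lower limit 8, with successive lower limits 7 apart.
classes = [(8 + 7 * i, 8 + 7 * i + 6) for i in range(6)]   # (LCL, UCL) pairs

# Class frequency = number of data values falling between the class limits.
freq = [sum(lcl <= x <= ucl for x in data) for (lcl, ucl) in classes]

for (lcl, ucl), f in zip(classes, freq):
    print(f"{lcl} - {ucl}: {f}")
# Frequencies: 4, 4, 8, 2, 5, 7 (total 30)
```

Because the classes are mutually exclusive and exhaustive, the frequencies must sum to the number of observations, which is a useful consistency check.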

The important characteristics of classes for a quantitative frequency distribution are:
1. Classes must be mutually exclusive (or non-overlapping) – each data value must fall in only one class.


2. Classes must be exhaustive (or all inclusive) - they must provide a place to
record every data value in the data set.
3. The number of classes and the class limits must be chosen in a way that
empty classes (i.e., classes with zero frequencies) do not occur.
4. All classes shall have the same interval if possible.

Absolute and Relative Frequency Distributions: A relative frequency
distribution is constructed by dividing each actual (or absolute) frequency of a
distribution by the total frequency. The relative frequency can be expressed
either as a decimal or as a percent. Relative frequency distributions are
particularly useful when we need to compare distributions having different total
frequencies, since they provide comparisons in which the total for each
distribution is the same (i.e., 100% if percents are used, or 1.00 if decimals are
used).

Cumulative Frequency Distribution: The cumulative frequency distribution
uses the number of classes, class widths and class limits developed for the
frequency distribution but rather than showing the frequency for each class, the
cumulative frequency provides a running total of frequencies through the
classes. Cumulative frequencies can either be:

(a) ‘from the top’ cumulative frequencies, which start with the first frequency from the top; the following frequencies are then added cumulatively downward.
(b) ‘from the bottom’ cumulative frequencies, which start with the first frequency from the bottom; the following frequencies are then added cumulatively upward.

Table 2.4: Cumulative frequency distributions for audit time in days for n = 30

Class       Absolute    Relative    Class    Cumulative frequencies
intervals   frequency   frequency   mark     From the top    From the bottom
8 – 14          4          0.13       11           4               30
15 – 21         4          0.13       18           8               26
22 – 28         8          0.27       25          16               22
29 – 35         2          0.07       32          18               14
36 – 42         5          0.17       39          23               12
43 – 49         7          0.23       46          30                7
Total          30          1.00
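The relative and cumulative columns of Table 2.4 follow mechanically from the absolute frequencies. A short sketch (Python assumed):

```python
freq = [4, 4, 8, 2, 5, 7]           # absolute frequencies from Table 2.3
total = sum(freq)                    # 30

# Relative frequency: each absolute frequency divided by the total.
relative = [round(f / total, 2) for f in freq]

# 'From the top': running total downward; 'from the bottom': running total upward.
from_top = [sum(freq[:i + 1]) for i in range(len(freq))]
from_bottom = [sum(freq[i:]) for i in range(len(freq))]

print(relative)     # [0.13, 0.13, 0.27, 0.07, 0.17, 0.23]
print(from_top)     # [4, 8, 16, 18, 23, 30]
print(from_bottom)  # [30, 26, 22, 14, 12, 7]
```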

Frequency distribution charts: It is probably true that a picture is worth a thousand words, and it is certainly true when the purpose is to describe a quantitative data set. In other words, it is visually easier to comprehend a data pattern by looking at a chart than by examining numbers in
a frequency table. The major types of frequency distribution charts are:
Histograms, Frequency polygons, Ogives, Bar chart and Pie-chart. The first
three are the graphical representation of quantitative frequency distribution
while the last two are the graphical representation of qualitative frequency
distribution.
(1) Histograms: A graph which displays the data by using vertical bars of
various heights to represent frequencies. The horizontal axis can be either
the class boundaries, the class marks, or the class limits.
(2) Frequency polygons: A frequency polygon is a line graph made by plotting
a point for each pair of numbers (class mark, class frequency), then
connecting the points by line segments. A class mark is chosen to represent
all the data values counted in a class. Thus, we can find the class mark of
each class by using the following formula:

CMi = (LCLi + UCLi)/2, where CMi is the class mark of the ith class, and LCLi and UCLi are the lower and upper class limits of the ith class, respectively. For example, the CM of the first class of the frequency distribution in the above example is CM1 = (8 + 14)/2 = 11.


Frequency polygons are preferred to histograms for graphical comparisons of two or more distributions. In such cases we can use relative frequency polygons if the distributions have different total frequencies.
(3) Ogive Curves: An ogive is a line chart of cumulative frequency distribution.
There are two types of an ogive chart:
(a) A less than ogive chart – a chart that is drawn by using the upper limits of
classes as the values of the horizontal axis and cumulative frequencies added
from the top as the vertical axis.
(b) A greater than ogive chart – a chart that is drawn by using the lower limits of
classes as the values of the horizontal axis and cumulative frequencies added
from the bottom as the vertical axis.
(4) Bar chart: A bar chart is made by drawing rectangles whose bases are the qualitative classes (or categories) and whose heights are the class frequencies. It is a graphical representation of a qualitative frequency distribution. The bars are separated, in contrast to the bars of a histogram, to emphasize that each class is a separate category.
(5) Pie chart: It is usually used to represent qualitative data expressed as relative frequencies. The relative frequencies are used to subdivide the pie into sectors (or parts) corresponding to the relative frequency of each class. The pie chart starts with a circle of 360 degrees; since Citroen has a 0.18 (= 9/50) share of the market in Table 2.1, the degree of its angle is 0.18 × 360 = 64.8. It is also possible to express it in terms of percent: 0.18 × 100% = 18%. The remaining categories can be calculated in a similar way.
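The sector angles for all categories of the car data in Table 2.1 can be computed in the same way. A sketch (Python assumed):

```python
# Frequency counts from Table 2.1.
counts = {"Citroen": 9, "Renault": 14, "Toyota": 8, "Honda": 11, "Ford": 8}
total = sum(counts.values())                       # 50

# Each sector's angle is its relative frequency times 360 degrees.
angles = {car: n / total * 360 for car, n in counts.items()}
print(angles["Citroen"])   # 64.8  (= 0.18 * 360)
```

The angles of all sectors necessarily sum to 360 degrees, since the relative frequencies sum to 1.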

Numerical Descriptive Techniques

The subject area of measures of central tendency provides descriptive
information about a single numerical value that is considered to be the most
typical of the values of a quantitative variable. The following are the commonly
used numerical descriptive techniques: (1) Arithmetic Mean (2) Median (3)
Mode (4) Geometric Mean and (5) Harmonic Mean.

(1) Arithmetic mean: It is also called arithmetic average, or simply the mean,
or the average. It is the most familiar and useful measure of average. It is
computed by dividing the sum of all data values by the total number of
observations in a data set. It provides a measure of a central location. The
arithmetic mean can further be divided into the following three types:
(i) Simple arithmetic mean (SAM): It is the arithmetic mean for ungrouped
data. If X is a variable having n values, x1, x2, …, xn, then its SAM can be
computed as:

SAM = (x1 + x2 + … + xn)/n = (Σ xi)/n, where the symbol Σ (sigma) denotes the summation sign and xi is the ith data value (or observation) of the variable X.

Example 2.3: Suppose a data set consists of the following (n = 7) elements: 3, 5, 6, 6, 7, 10 and 12. Then SAM = (3 + 5 + 6 + 6 + 7 + 10 + 12)/7 = 49/7 = 7.
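A sketch of the simple arithmetic mean for the data of Example 2.3 (Python assumed; the module prescribes no language):

```python
# Simple arithmetic mean (SAM) of the ungrouped data in Example 2.3.
data = [3, 5, 6, 6, 7, 10, 12]
sam = sum(data) / len(data)   # 49 / 7
print(sam)                    # 7.0
```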

(ii) Grouped arithmetic mean (GAM): Let f1, f2, …, fk be the k frequencies corresponding to the mid values (class marks) x1, x2, …, xk of the class intervals. Then

GAM = (f1x1 + f2x2 + … + fkxk)/(f1 + f2 + … + fk) = (Σ fixi)/(Σ fi)

Example 2.4: What is the GAM of the grouped data set in the frequency distribution of Table 2.3? Using the class marks 11, 18, 25, 32, 39 and 46 with the corresponding frequencies 4, 4, 8, 2, 5 and 7,

GAM = (4×11 + 4×18 + 8×25 + 2×32 + 5×39 + 7×46)/30 = 897/30 = 29.9 days
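The grouped arithmetic mean can be sketched in code (Python assumed), using the class marks and frequencies of the audit-time distribution in Table 2.3:

```python
# Class marks and frequencies of the six audit-time classes (Table 2.3).
marks = [11, 18, 25, 32, 39, 46]
freqs = [4, 4, 8, 2, 5, 7]

# GAM = (sum of f_i * x_i) / (sum of f_i)
gam = sum(f * x for f, x in zip(freqs, marks)) / sum(freqs)
print(gam)   # 897 / 30 = 29.9
```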

The mean cannot be computed from an open-ended frequency distribution table
which gives no information about the sizes of the numbers in the open class.
However, if the mean of the numbers in the open class is known or can be
estimated, this average can be taken as the class mark of the open interval. Then
the computation can proceed as before.

(iii) Weighted arithmetic mean (WAM): If the frequencies given above are
substituted by weights, then it is called WAM. If X is a variable having k
values (or class marks), x1, x2, …, xk with the corresponding weights w1, w2,
…, wk, then its WAM can be computed as:

WAM = (Σwixi)/(Σwi)

Example 2.5: Suppose profits per order for small, medium and large orders are
Birr 1, 2 and 3, respectively. The (unweighted) average profit per order
obtained by using the formula of SAM is (1 + 2 + 3)/3 = Birr 2.

Suppose, in the example at hand, the numbers of small, medium and large
orders are 120, 60 and 20, respectively. Then the SAM of the three profits does
not tell us what average profit was actually earned. Thus, to find that, each
profit is weighted (or multiplied) by w, the number of orders having this profit.
The actual average profit per order in this case can be obtained by using a
formula of WAM. That is,

Thus, the weighted average profit per order is Birr 1.5. The predominance of
lower profit orders makes the weighted average lower than the simple average.
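The same computation can be sketched in Python (`weighted_mean` is our own illustrative helper):

```python
def weighted_mean(values, weights):
    # WAM: each value multiplied by its weight, divided by the total weight
    return sum(x * w for x, w in zip(values, weights)) / sum(weights)

profits = [1, 2, 3]      # Birr per small, medium and large order
orders = [120, 60, 20]   # number of orders of each size
print(weighted_mean(profits, orders))  # 300 / 200 = 1.5 Birr
```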

(iv) Geometric mean: If X is a variable having n values, x1, x2, …, xn, then
the geometric mean (GM) of a variable X for ungrouped data is computed as
the nth root of the product of the n values. That is,

GM = (x1 · x2 · … · xn)^(1/n)

For grouped data, suppose X is a variable having k values, x1, x2, …, xk,
occurring with the corresponding frequencies f1, f2, …, fk; then the geometric
mean (GM) of a variable X is computed as:

GM = (x1^f1 · x2^f2 · … · xk^fk)^(1/n), where n = f1 + f2 + … + fk

Example 2.6: Compute the GM of the data set given in example 2.3.

GM = (3 × 5 × 6 × 6 × 7 × 10 × 12)^(1/7) = 453600^(1/7) ≈ 6.43

which is less than its simple arithmetic mean of 7. Thus, the geometric mean is
less than the arithmetic mean for a given data set except in the case where all
the numbers in the data set are the same. In this case, they are equal.
Geometric mean is used when there are
sequences, ratios or percent changes. It is sometimes used as the average for
right-skewed economic data as it is not affected as much by the skewness as the
arithmetic mean.
The average of percentage changes from period-to-period can be computed
using geometric mean. The three steps to compute the geometric mean of
period-to-period percentage change are:

1. Convert the percentage changes into growth factors.


A growth factor is defined as the ratio of a value for one period to the
corresponding value for the previous period (when the variable is given as
raw values). Equivalently, when the changes are given in percent, the growth
factor is one plus the percent change expressed as a decimal.

2. Find the geometric mean of the growth factors.


3. Convert the geometric mean of the growth factors into an average percent
change. This can be obtained by subtracting one from the result obtained in
step 2 above.
Example 2.7: Suppose Bahir Dar Textile Company’s year-to-year changes in
fuel consumption expenditures were -5, 10, 20, 40 and 60%, then what will be
its average yearly percent change in fuel expenditure?

Solution: We can find the average yearly percent change in fuel expenditures
by using a geometric mean of growth factors as follows:

Step 1: Convert the percent changes to growth factors: 0.95, 1.10, 1.20, 1.40
and 1.60.

Step 2: Calculate the geometric mean of the growth factors:

GM = (0.95 × 1.10 × 1.20 × 1.40 × 1.60)^(1/5) ≈ 1.229

Step 3: Convert the geometric mean of the growth factors into an average yearly
percentage change.

The average yearly percent change in expenditure = 1.229 – 1 = 0.229 = 22.9%


(i.e., average annual rise in fuel expenditure is 22.9%).
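The three steps can be sketched in Python (`average_percent_change` is our own name for the procedure):

```python
def average_percent_change(pct_changes):
    # Step 1: convert each percent change to a growth factor
    factors = [1 + p / 100 for p in pct_changes]
    # Step 2: geometric mean = nth root of the product of the factors
    product = 1.0
    for f in factors:
        product *= f
    gm = product ** (1 / len(factors))
    # Step 3: subtract one to return to a percent change
    return (gm - 1) * 100

changes = [-5, 10, 20, 40, 60]  # Bahir Dar Textile, year-to-year %
print(round(average_percent_change(changes), 1))  # about 22.9
```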

(v) Harmonic Mean (HM): Harmonic mean is the reciprocal of the arithmetic
mean of the reciprocals. If X is a variable having n values, x1, x2, …, xn, then
the HM of a variable X for ungrouped data is computed as:

HM = n / (1/x1 + 1/x2 + … + 1/xn) = n / Σ(1/xi)

For grouped data, suppose X is a variable having k values, x1, x2, …, xk,
occurring with the corresponding frequencies f1, f2, …, fk; then the HM of a
variable X is computed as:

HM = n / Σ(fi/xi), where n = Σfi

Example 2.8: What will be the HM of the data set given in example 2.3 above?

Xi          3    5    6    6    7    10   12
Reciprocal  1/3  1/5  1/6  1/6  1/7  1/10 1/12

HM = 7 / (1/3 + 1/5 + 1/6 + 1/6 + 1/7 + 1/10 + 1/12) = 7/1.1929 ≈ 5.87
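A short Python sketch of the same computation (`harmonic_mean` here is our own helper, not a library function):

```python
def harmonic_mean(values):
    # HM: n divided by the sum of the reciprocals of the values
    return len(values) / sum(1 / x for x in values)

data = [3, 5, 6, 6, 7, 10, 12]
hm = harmonic_mean(data)
print(round(hm, 2))  # about 5.87, below the GM and the AM of 7
```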

Note that HM ≤ GM ≤ AM but HM, GM and AM are equal where all the
numbers in the data set are the same. In general, because the mean takes into
account the value of every observation in a sample, it can be greatly distorted
(or affected) by extreme value(s).

(2) Median: The median is the value of the middle observation in a set of
observations (or data) which have been arrayed in order of magnitude. Thus, to
find the median, first put the numbers in ascending or descending order and
then find the middle position in the array. The median is therefore a positional
average.

For ungrouped data, with n observations arrayed in order of magnitude, the
median (Md) is:

Md = the ((n + 1)/2)th observation, if n is odd;
Md = the average of the (n/2)th and (n/2 + 1)th observations, if n is even.

Example 2.9: (a) Suppose there are n=5 data items such as: 1, 5, 4, 1 and 9.
What is the median of this data set? (b) If the data set consists of n=6 such as 1,
1, 4, 8, 6 and 10, then what will be the median?

Solution: In order to find the median of a given data set, we first arrange the
data set in either ascending or descending order.

(a) The data set is then arranged in ascending order as: 1, 1, 4, 5, 9. Since
n = 5 is odd, its median is the value of the ((5 + 1)/2)th = 3rd observation,
which equals 4.


(b) The data set is arranged in ascending order as: 1, 1, 4, 6, 8, 10. Since
n = 6 is even, its median is the average of the two middle values, the 3rd and
4th observations: Md = (4 + 6)/2 = 5.

For the case of grouped data (i.e., data in a frequency distribution table), the
median will be approximated as follows.

The median class of a distribution is the class where the cumulative frequency
reaches or exceeds n/2 for the first time. In a frequency distribution, the
median is interpolated within the class interval containing it, assuming the
observations are uniformly distributed within the median class.

Example 2.10: Suppose a frequency distribution of the number of seeds per
plant for a sample of 28 plants is given in Table 2.5 below. How can you
compute (or approximate) the median of this distribution?

Table 2.5: Frequency Distribution of Seeds per Plant

Seeds per Plant   No. of Plants   Cumulative Frequency   Class Mark
1 – 3             6               6                      2
4 – 6             7               13                     5
7 – 9             10              23                     8
10 – 12           2               25                     11
13 – 15           3               28                     14
Here, the total frequency (n) = 28, so n/2 = 28/2 = 14. Since the cumulative
frequency first reaches or exceeds 14 at the third class (CF = 23), the third
class (7 – 9) is the median class.


We can compute the median by using the following formula:

Md = L + ((n/2 − CF)/fm) × C

Where, L = lower class limit (or boundary) of the median class; CF =
cumulative frequency of the pre-median class; (n/2 − CF) is the number of
items required from the median class to reach a CF of n/2; C = the class
interval (or class width); fm = frequency of the median class. The ratio
(n/2 − CF)/fm locates the median within the median class.

Here, Md = 6.5 + ((14 − 13)/10) × 3 = 6.5 + 0.3 = 6.8.

The median is the 50-50 number in a data set, meaning 50% of the data set (or
observations) fall below 6.8 while the rest 50% fall above 6.8.
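The interpolation can be sketched in Python (the function and the use of class boundaries 0.5–3.5, 3.5–6.5, … are our own reading of the class limits 1–3, 4–6, … as continuous classes):

```python
def grouped_median(boundaries, freqs):
    # Interpolated median: Md = L + ((n/2 - CF) / fm) * C
    n = sum(freqs)
    cf = 0  # cumulative frequency of the classes before the current one
    for (low, high), f in zip(boundaries, freqs):
        if cf + f >= n / 2:  # first class whose CF reaches n/2
            return low + (n / 2 - cf) / f * (high - low)
        cf += f

# Table 2.5 with assumed class boundaries
bounds = [(0.5, 3.5), (3.5, 6.5), (6.5, 9.5), (9.5, 12.5), (12.5, 15.5)]
freqs = [6, 7, 10, 2, 3]
print(grouped_median(bounds, freqs))  # 6.5 + (14 - 13)/10 * 3 ≈ 6.8
```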

The median is resistant to extreme observations. i.e., the value of an outlier does
not affect the median. But, an outlier has a marked effect upon the arithmetic
mean.

Other measures of location computed from a frequency distribution in the same
way as the median are quartiles, deciles and percentiles. The formulas for
quartiles, deciles and percentiles are:

(a) Quartiles: Qi = Li + ((i·n/4 − CF)/fi) × C, where i = 1, 2, 3, 4; Li =
lower limit (or boundary) of the ith quartile class; CF = cumulative frequency
of the pre-ith-quartile class; fi = frequency of the ith quartile class and
C = class interval.

(b) Deciles: Di = Li + ((i·n/10 − CF)/fi) × C, where i = 1, 2, …, 10; Li =
lower limit (or boundary) of the ith decile class; CF = cumulative frequency of
the pre-ith-decile class; fi = frequency of the ith decile class and C = class
interval.


(c) Percentiles: Pi = Li + ((i·n/100 − CF)/fi) × C, where i = 1, 2, …, 100;
Li = lower limit (or boundary) of the ith percentile class; CF = cumulative
frequency of the pre-ith-percentile class; fi = frequency of the ith percentile
class and C = class interval.

In general, we should first find the values of the factors i·n/4, i·n/10 and
i·n/100 in order to locate the ith quartile, the ith decile and the ith
percentile class, respectively. Note that median = Q2 = D5 = P50.

Example 2.11: A sample of measurements of outputs per labor-hour for 80
working days is given in Table 2.6 below. Find Q1, D8 and P20 and interpret
the results.

Table 2.6: A Frequency Distribution of Output per Labor-Hour for 80 Working Days

Class no.   Classes of outputs   Absolute frequency   Cumulative frequency
1           4.00 – 6.99          5                    5
2           7.00 – 9.99          15                   20
3           10.00 – 12.99        22                   42
4           13.00 – 15.99        15                   57
5           16.00 – 18.99        13                   70
6           19.00 – 21.99        10                   80

Solution: First we should find the quartile-one (Q1) class. To get the Q1
class, we first find the value of 1·n/4 = 80/4 = 20. Since the cumulative
frequency (CF) equals 20 for the first time at the second class, the second
class (7.00 – 9.99) is the Q1 class. Once we have the Q1 class, we can compute
Q1 by using the formula:

Q1 = 6.995 + ((20 − 5)/15) × 3 = 6.995 + 3 = 9.995


Interpretation: The outputs per labor-hour for 25% of the total working days
lie below 9.995 while the remaining 75% lie above it. Since all the observed
values in the quartile-one class are included in the computation of Q1, the
value of Q1 equals the upper boundary of the Q1 class.
To get D8, we first find the value 8·n/10 = 8 × 80/10 = 64. Since the CF of 70
is greater than 64 for the first time, the 5th class (16.00 – 18.99) is the
8th decile class (D8).

D8 = 15.995 + ((64 − 57)/13) × 3 = 15.995 + 1.62 ≈ 17.61

Interpretation: The output per labor-hour for 80% of the total working days lie
below 17.61 while the rest 20% fall above it.

To get the factor that determines the 20th percentile class, we first find the
value 20·n/100 = 0.2 × 80 = 16. As the CF of 20 is greater than 16 for the
first time, the 2nd class (7.00 – 9.99) is the 20th percentile class (P20).

P20 = 6.995 + ((16 − 5)/15) × 3 = 6.995 + 2.2 = 9.195

It implies that the output per labor-hour for 20% of the total working days lie
below 9.195 while the rest 80% fall above it.
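All three measures of location follow the same interpolation pattern, which can be sketched in one Python helper (`grouped_quantile` is our own name; the boundaries 3.995–6.995, 6.995–9.995, … are our reading of the stated class limits):

```python
def grouped_quantile(boundaries, freqs, i, m):
    # i-th m-quantile: m = 4 for quartiles, 10 for deciles, 100 for percentiles
    n = sum(freqs)
    target = i * n / m            # the factor i*n/m locating the class
    cf = 0
    for (low, high), f in zip(boundaries, freqs):
        if cf + f >= target:      # first class whose CF reaches the target
            return low + (target - cf) / f * (high - low)
        cf += f

# Table 2.6 with assumed class boundaries
bounds = [(3.995, 6.995), (6.995, 9.995), (9.995, 12.995),
          (12.995, 15.995), (15.995, 18.995), (18.995, 21.995)]
freqs = [5, 15, 22, 15, 13, 10]
print(grouped_quantile(bounds, freqs, 1, 4))     # Q1  ≈ 9.995
print(grouped_quantile(bounds, freqs, 8, 10))    # D8  ≈ 17.61
print(grouped_quantile(bounds, freqs, 20, 100))  # P20 ≈ 9.195
```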

(3) Mode (Mo): Mode is defined as the value that occurs most frequently in a
data set. A data set in which each value occurs only once, or in which every
value occurs with the same frequency, has no mode (a property impossible for
the other measures of central tendency). Hence, the mode is not necessarily
unique. A frequency distribution with one mode is called unimodal, with two
modes bimodal and with more than two modes called
multimodal frequency distribution. It is also the only measure of the average for
the variables measured in nominal scale. In case of continuous frequency

distribution, the mode is obtained from the formula:

Mo = L + (Δ1/(Δ1 + Δ2)) × C, where Δ1 = f − f1 and Δ2 = f − f2

Where, L is the lower limit (or boundary) of the modal class; f is the
frequency of the modal class; f1 and f2 are the frequencies of the classes
preceding and succeeding the modal class, respectively; C is the class interval.
If the distribution is moderately asymmetrical, the mean, median and mode
obey the empirical relationship:

Mean − Mode = 3 (Mean − Median), i.e., Mode ≈ 3 Median − 2 Mean
Example 2.12: Find the mode of the distribution in example 2.4 above.

The modal class of the distribution is also the 3rd class (i.e., 7 – 9). Thus,

Mo = 6.5 + ((10 − 7)/((10 − 7) + (10 − 2))) × 3 = 6.5 + (3/11) × 3 ≈ 7.32
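The modal interpolation can be sketched in Python (`grouped_mode` is our own helper, applied here to the Table 2.5 classes, which share the stated 7 – 9 modal class):

```python
def grouped_mode(boundaries, freqs):
    # Modal class = the class with the highest frequency
    i = freqs.index(max(freqs))
    low, high = boundaries[i]
    d1 = freqs[i] - (freqs[i - 1] if i > 0 else 0)               # f - f1
    d2 = freqs[i] - (freqs[i + 1] if i + 1 < len(freqs) else 0)  # f - f2
    return low + d1 / (d1 + d2) * (high - low)

bounds = [(0.5, 3.5), (3.5, 6.5), (6.5, 9.5), (9.5, 12.5), (12.5, 15.5)]
freqs = [6, 7, 10, 2, 3]
print(round(grouped_mode(bounds, freqs), 2))  # 6.5 + 3/11 * 3 ≈ 7.32
```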

Comparing Mean, Median and Mode: The values of the mean, median and mode
computed from the data set in example 2.4 are 6.53, 6.8, and 7.32,
respectively. This order, mean < median < mode, indicates that the distribution
of this data set is left skewed (panel a, Figure 2.3 below). The opposite order
(i.e., mean > median > mode) would indicate a right-skewed distribution
(panel b, Figure 2.3). If mean = median = mode, the distribution is symmetrical
(or normal) (panel c, Figure 2.3). Both left- and right-skewed distributions
are also called asymmetric distributions.

[Figure: (a) Left skewed distribution; (b) Right skewed distribution;
(c) Normal distribution]

Figure 2.3: Left, Right and Normal Distributions


In general, the mean is the best measure of central tendency for numerical
data with no extreme values. The mode is preferred for non-numerical data
without an order of arrangement (nominal data), while the median is preferred
for numerical data with extreme values or for non-numerical data with a
possible order of arrangement (ordinal data).

Learning activities

1. The following data are the marks of 20 students in Introduction to
Statistics out of 35:

12 14 19 18 15 15 18 17 20 27
22 23 22 21 33 28 14 18 16 13

(a) Construct a frequency distribution table.
(b) Construct a histogram, a frequency polygon and ogive curves based on the
data.
2. The following table gives the marks obtained by 89 students in Statistics.

Marks            10–14  15–19  20–24  25–29  30–34  35–39  40–44  45–49
No. of students  4      6      10     16     21     18     9      5

(a) Find the mean, median and the mode.
(b) Find Q2, D5, and P50 and interpret the results.

Continuous assessment
Test and Quiz
Individual assignment on constructing frequency distribution tables and
computing measures of central tendency and dispersion of a given data set.

Summary

The easiest method of organizing data is a frequency distribution, which


converts raw data into a meaningful pattern for statistical analysis. A
frequency distribution is a tabulation of the values that one or more variables
take in a sample. Each entry in the table contains the frequency or count of the
occurrences of values within a particular group or interval, and in this way the
table summarizes the distribution of values in the sample. One of the most
important aspects of describing a distribution is the central value around
which the observations are distributed. Any mathematical measure which is
intended to represent the center or central value of a set of observations is
known as a measure of central tendency, a measure of location, or an average.

3.1.3.3 MEASURES OF DISPERSION

What is the importance of measuring the variability of a data set? What are the
commonly used measures of dispersion in statistics?
Absolute measures of dispersion
Absolute measures of dispersion consist of range, inter-quartile range, mean
(absolute) deviation, variance, standard deviation and standard error of mean.

(a) Range (R)


The range of a data set is the difference between the highest and the lowest
value: R = Xmax − Xmin. It is thus easy to compute, but because it uses only
the highest and lowest values in the sequence, it is sensitive to these numbers
if one or both are outliers. It is for this reason that it is regarded as
inferior to other measures of variation that use more of the data points in the
sequence. It is also the simplest measure of variability.
(b)Inter-quartile range (IQR)

It is the difference between the first (Q1) and the third (Q3) quartiles. It shows
the interval width which contains the middle half of the data set.

IQR= Q3 – Q1


(c) Mean deviation (MD) or Mean absolute deviation (MAD)

It is the arithmetic mean of the absolute deviations of each observation from
the mean. It measures the average departure of each observation from the mean.

MD = Σ|Xi − x̄|/n (ungrouped data)  and  MD = Σfi|Xi − x̄|/n (grouped data)

Where, n is the sample size and the deviation may be taken from the mean or the
median; k = number of classes; fi = frequency of the ith class; Xi = class mark
of the ith class.

The difference of each observation from the mean is called a deviation. It
shows how much a number varies from the mean.

(d) Variance (S² or σ²): It is the arithmetic mean of the squared deviations of
each observation from the mean. In contrast to the range, the variance is a
measure of dispersion that exploits all data points. It is based on the
difference between each data point and the mean value for the series. In most
statistical applications, the data being analyzed are drawn from a sample and
not a population. The sample can be used to provide an estimate of the
population's variance, which is usually denoted by σ². An unbiased estimator of
the population variance is obtained by dividing the sum of squared deviations
not by n but by n − 1, where n is the number of items in the data sequence. The
expressions for the sample variance and population variance are given in
Table 3.1 below.
Table 3.1: Formulas for Sample and Population Variances

Variance for   For ungrouped data           For grouped data
Sample         S² = Σ(xi − x̄)²/(n − 1)      S² = Σfi(xi − x̄)²/(n − 1)
Population     σ² = Σ(xi − μ)²/N            σ² = Σfi(xi − μ)²/N

Example 3.1: Find the mean and variance of the following data set: 46, 54, 42,
46, 32.

The mean is computed as: x̄ = (46 + 54 + 42 + 46 + 32)/5 = 220/5 = 44

The variance can be computed as:

S² = [(46 − 44)² + (54 − 44)² + (42 − 44)² + (46 − 44)² + (32 − 44)²]/(5 − 1)
   = (4 + 100 + 4 + 4 + 144)/4 = 256/4 = 64
We can ignore the units of measurement for the moment but emphasize that the
variance is an important summary statistic that captures the degree of dispersion
inherent in a data set. A major use is when dispersion is being compared across
two samples of data. The sampling variance will be seen to play an important
role later in the discussion series when we explore statistical inference and
hypothesis testing.

(e) Standard deviation

It is the square root of the variance. It is the most commonly used measure of
dispersion of a distribution from the mean. It is expressed in the same units
of measurement as the original data. The standard deviation of the data set in
example 3.1 is given by S = √S² = √64 = 8.
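The example 3.1 arithmetic can be checked with a short Python sketch (`sample_variance` is our own helper):

```python
def sample_variance(values):
    # Sum of squared deviations divided by n - 1 (unbiased estimator)
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)

data = [46, 54, 42, 46, 32]   # example 3.1
var = sample_variance(data)   # 256 / 4 = 64.0
sd = var ** 0.5               # standard deviation = 8.0
print(var, sd)
```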

(f) The standard error of the mean

It is the measure of the deviation of each sample mean from the mean of the
means of repeated samples drawn from the same population. It measures how much
the value of the mean may vary from sample to sample taken from the same
population. In other words, it is the standard deviation of the distribution of
all possible sample means, if samples of the same size were repeatedly taken
from the same population.

It is estimated as: SE(x̄) = S/√n


Relative Measures of Dispersion

Two or more groups of data sets may not be compared on the basis of absolute
measures of dispersion (although they express the variations in the same units
as the original data) due to the following reasons:

(i) Measures of dispersions of the groups of data sets may be in different


units
(ii)The means of the groups may be quite different

Under these two conditions, the relative measures of dispersion provide a
better indicator of variability since they are unit-free and measure the
dispersion of a variable relative to its mean. The relative measures of
dispersion consist of the relative range, the coefficient of inter-quartile
range, the coefficient of mean deviation, and the coefficient of variation.

(1) Relative range (RR)

RR = (Xmax − Xmin)/(Xmax + Xmin), where X represents the value of a given
variable under consideration.

(2) Coefficient of inter-quartile range (CIR)

CIR = (Q3 − Q1)/Md, where Q3 and Q1 denote quartile 3 and quartile 1,
respectively, and Md represents the median of the distribution.

(3) Coefficient of mean deviation (CMD)

CMD = MD/x̄ (or MD/Md if the deviations are taken from the median)

(4) Coefficient of variation (CV)


Coefficient of variation is the percentage ratio of the standard deviation to
the arithmetic mean. It is usually expressed in percentage. The formula for
CV is:

CV = (S/x̄) × 100%

The coefficient of variation will be small if the variation is small. Of two
groups, the one with the smaller CV is said to be more consistent.

Note: 1. The standard deviation is an absolute measure of dispersion.

2. The coefficient of variation is a relative measure of dispersion.

Example 3.2: Consider the distribution of the yields (per plot) of two ground
nut varieties. For the first variety, the mean and standard deviation are 82 kg
and 16 kg respectively. For the second variety, the mean and standard deviation
are 55 kg and 8 kg respectively.

Then we have, for the first variety: CV1 = (16/82) × 100% ≈ 19.5%. For the
second variety: CV2 = (8/55) × 100% ≈ 14.5%.

It is apparent that the relative variability in the second variety is less than
that in the first variety. Comparing the standard deviations alone, without
reference to the very different means, could give a misleading picture.
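The comparison can be sketched in Python (`coefficient_of_variation` is our own helper):

```python
def coefficient_of_variation(sd, mean):
    # CV: standard deviation expressed as a percentage of the mean
    return sd / mean * 100

cv1 = coefficient_of_variation(16, 82)  # first groundnut variety
cv2 = coefficient_of_variation(8, 55)   # second groundnut variety
print(round(cv1, 1), round(cv2, 1))     # about 19.5 and 14.5
```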

Effect of a constant on the mean (x̄), variance (S²) and standard deviation (S):

If C is a constant number (positive or negative), then adding C to every number
in a given data set will add C to the mean of the data set, but will not change
S² and S. However, multiplying or dividing every number in a given data set by
C will multiply or divide the mean by C, the S² by C², and the S by |C|. These
results can be stated as follows:

(a) Addition or subtraction of a constant C for each observation in a data set:
the new mean is x̄ ± C, while S² and S are unchanged, since each deviation
(xi ± C) − (x̄ ± C) = xi − x̄ is unaffected. This implies that addition or
subtraction of a constant doesn't affect S² and S.

(b) Multiplication of each observation in a data set by a constant C: the new
mean is C·x̄, the new variance is C²·S², and the new standard deviation is
|C|·S.

(c) Division of each observation in a data set by a constant C (C ≠ 0): the new
mean is x̄/C, the new variance is S²/C², and the new standard deviation is
S/|C|.
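These effects can be demonstrated numerically, reusing the example 3.1 data (`mean_and_var` is our own helper):

```python
def mean_and_var(values):
    # Sample mean and sample variance (divisor n - 1)
    n = len(values)
    m = sum(values) / n
    return m, sum((x - m) ** 2 for x in values) / (n - 1)

data = [46, 54, 42, 46, 32]
m, v = mean_and_var(data)                           # 44.0, 64.0
m_add, v_add = mean_and_var([x + 5 for x in data])  # mean shifts by 5
m_mul, v_mul = mean_and_var([3 * x for x in data])  # variance scales by 9
print(m_add - m, v_add - v)  # 5.0 0.0
print(m_mul / m, v_mul / v)  # 3.0 9.0
```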

Learning activities

1. Calculate the standard deviation for the yields (in kg per plot) of a cotton
variety recorded from seven plots: 5, 6, 7, 7, 9, 4 and 5.
2. If the mean wage rate for production employees is $3.50 per hour with a
standard deviation of $0.20, what would be the effect on x̄ and S of raising
all rates by 10%?
3. Discuss why the relative measures of dispersion are better indicators of
variability than the absolute measures when one needs to compare the
dispersions of two or more groups of data sets which are measured in different
units and/or whose means are quite different.

Continuous assessment
Test and Quiz, and individual assignment on selecting representative samples
that can cover the variability of the population under study through using the
appropriate sampling methods

Summary

Measures of central tendency are not adequate to describe a given data set as
they do not provide any information about the spread of the data set (or
observations) about the center and from each other. The subject areas of
measures of dispersion are therefore used to describe the amount of scatter
around the center of the distribution and from each other. If there were no
variability within a population, there would be no need for statistics, since a
single item or sampling unit would tell us all we need to know about the
population as a whole. Hence, in order to learn more about the population from
the sample, the average alone is not enough. The importance of measuring
variability is therefore to determine how representative the average is and to
compare the variability of two or more data sets.

3.1.3.4 PROBABILITY THEORIES

What do you understand by the term probability?


A probability is a quantitative measure of uncertainty. It is a number that
conveys the strength of our belief in the occurrence of an uncertain event. In
dealing with probability problems, it may be useful first to know some basic
concepts/definitions.

Basic Concepts of Probability


(a) A probability experiment- an experiment (or a trial) that can be repeated
any number of times under identical conditions, where all possible outcomes are
known in advance but an individual outcome is not known beforehand. It is also
called a random or a statistical experiment.
Examples 4.1: Identify whether the following is a random experiment or not.
1. If a fair coin is tossed three times, it is possible to list all possible 8
sequences of head (H) and tail (T). But it is not possible to predict which
sequence will occur at any occasion. Thus, this is an example of a
random experiment.
2. Rolling of a fair die. It is also an example of a random experiment.
(b) Sample space (S) - the set of all possible outcomes of a random experiment.
However, some sample spaces are better than others. Consider the experiment of
tossing two coins. It is possible to get 0 heads, 1 head, or 2 heads, so the
sample space could be {0, 1, 2}. Another way to write it is {HH, HT, TH, TT}.
The second way is better because each outcome is equally likely to occur.

(c) An event- is a subset of the sample space. Thus, the events in no. 2 in
example 4.1 above, can be:
A = {An even number will turn up} = {2, 4, 6}; B = {An odd number will
turn up} = {1, 3, 5}; C = {A single number will turn up} = {1}, {2}, {3},
{4}, {5}, or {6}, etc.
If an experiment is performed and the outcome is a member of an event in which
we are interested, then we say that the event has occurred; otherwise, it has
not occurred. Let A = {2, 4, 6} be an event of the random experiment of
throwing a fair die. Now, if on a throw 2, 4 or 6 appears, we say event A has
occurred (happened). If on a throw 5 appears, then event A has not occurred.
Thus, we can say an event occurs if the experiment gives rise to an outcome
belonging to the event.
(d) A simple event- an event consisting of exactly one element. Thus, the
appearance of 1 in a throw of a die is a simple event, whereas the appearance
of a multiple of 3 is not a simple event. Every (non-empty) event can be
written as a disjoint union of simple events.
(e) Complementary event- all the outcomes in the sample space except the given
event. Thus, the complementary event of an event A is the non-occurrence of
event A in the sample space. It is denoted by A′ (or S − A). A′ contains those
elements of the sample space which do not belong to A.
(f)Exhaustive events- Events are said to be (collectively) exhaustive if they
exhaust all the possible outcomes of an experiment. Thus, in the experiment of
tossing two coins, the events (a) two heads, (b) two tails, and (c) one tail, one
head exhaust all the outcomes; hence they are (collectively) exhaustive events.
(g) Equally likely events- Events which have the same probability of
occurrence. Two events A and B are said to be equally likely when there is no
reason to assume that event A occurs more than event B or vice versa. Thus,
p(A) = p(B). For instance, the two events {H} and {T} in the experiment of
tossing an ordinary coin are equally likely.
(h) A joint event- is the occurrence of two or more events in one trial.
(i)Mutually exclusive events- Two events which cannot happen at the same
time. Two events are said to be mutually exclusive if the occurrence of one
event precludes the occurrence of the other. For example, in the experiment of
tossing a coin, the occurrence of both head and tail at the same time is not
possible. Thus, the occurrence of head precludes the occurrence of tail, and
vice versa. i.e., two events A and B are mutually exclusive if they are
disjoint (A ∩ B = ∅). Thus, p(A ∩ B) = 0.
(j) Independent events- Two events are independent if the occurrence of one has
no effect on the occurrence of the other. Thus, two events A and B are said to
be independent if p(A ∩ B) = p(A) · p(B). Two events are dependent if the
occurrence of the first event affects the occurrence of the second event in a
way that changes its probability.
(k) Sure event- An event A is said to be sure if p(A) = 1. Thus, S is a sure event
as p(S) = 1.
(l) Impossible event- An event A is called impossible if p(A) = 0. Thus, the
empty set is an impossible event since p(∅) = 0.
(m) Odds in favor of an event: The odds that an event occurs can be found using
the ratio of the number of ways it can occur to the number of ways it cannot
occur. Let A be an event and the odds in favor of A (or odds for A) be 'a' to
'b' (or a:b).

Thus, p(A) = a/(a + b). Conversely, if p(A) = m/n, then p(A′) = (n − m)/n, and
the odds in favor of A are m to n − m (or m:(n − m)).

Definitions and Types of Probability
Can you mention the two broad approaches that help define probability? If yes,
list them and explain how probability is defined under each approach.
There are two broad approaches to define probability: (a) objective probability
and (b) subjective probability.
(a) Objective probability
A probability is defined as objective when the probability of an event is based
on past data or on circumstances that can be verified by test. It can be
further sub-divided into:
(i)The classical (or a priori) probability: It uses the sample space to
determine the numerical probability that an event will happen. It is also called

theoretical probability. Suppose n is the total number of equally likely
outcomes in the sample space and, out of these, m are in favor of a certain
event A; then the probability of the occurrence of event A is defined as:

p(A) = m/n

A classical probability is thus the relative frequency of an event in the
sample space when each outcome is equally likely.
Examples 4.2: In an experiment of rolling a fair die only once, what is the
probability of getting event A [i.e., p(A)]? Suppose A = {An even number will
turn up}. In this case, m = 3 and n = 6. Thus, p(A) = m/n = 3/6 = 0.5.
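The classical m/n count can be sketched by enumerating the die's sample space (variable names are our own):

```python
sample_space = {1, 2, 3, 4, 5, 6}  # rolling a fair die once
event_a = {s for s in sample_space if s % 2 == 0}  # an even number turns up
p_a = len(event_a) / len(sample_space)  # m / n = 3 / 6
print(p_a)  # 0.5
```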

(ii) The relative frequency (or empirical) probability: Empirical


probability is based on observation. The empirical probability of an event is the
relative frequency of a frequency distribution based upon observation. Suppose
the relative frequency is the frequency (m) of a class (say, event A) divided
by the total number of observations (n). The probability of event A occurring
is then defined as the limit of the relative frequency of event A as the number
of trials is repeated indefinitely, i.e.,

p(A) = lim (m/n) as n → ∞

This implies that the probability of an event happening in the long run is the
ratio of the number of times the event occurred in the past to the total number of
observations.
(b) Subjective probability: Subjective probability is defined as the degree of
belief assigned to the occurrence of an event by a particular individual. This
type of probability uses value judgment based on an educated guess or estimate,
employing opinions and inexact information. Subjective probabilities are also
called assessed probabilities.
Note that whether probabilities are objective or subjective, once they are
established they are used in the same way: the basic rules of probability apply
to both.

Basic Rules of Probability


Some of the basic rules of probabilities are briefly discussed as follows:
The range of values of probability: The probability of an event always lies
between 0 and 1 inclusive, i.e., for any event A in the sample space,
0 ≤ p(A) ≤ 1. The probability of an event that cannot occur (an impossible
event) is 0, and the probability of any event which is not in the sample space
is zero. The probability of an event that must occur (a sure event) is 1, and
the probability of the sample space is 1.
1) Complement rule: The probability of an event not occurring is one minus
the probability of it occurring. For an event A, the complement rule is
expressed as: P(A′) = 1 − P(A).
This follows from P(A) + P(A′) = 1, as A and A′ are mutually exclusive
events whose union is the whole sample space.
For example, if the probability of getting a single dot in rolling a fair die is
1/6, the probability of getting other than that single dot is 1 − 1/6 = 5/6.
2) Addition rules (or rules of unions): To determine the probability that one or
another event will occur, we use the addition rule. There are two rules of
addition:
General addition rule: For events which are not mutually exclusive, there is
some overlap. When P(A) and P(B) are added, the probability of the
intersection is added twice, so it must be subtracted once:
P(A or B) = P(A) + P(B) - P(A and B), or P(A ∪ B) = P(A) + P(B) -
P(A ∩ B)
This rule is always valid.
Specific addition rule: It is valid only when the events are mutually exclusive.
If two events A and B are mutually exclusive, then their joint probability
P(A ∩ B) is zero, so
P(A or B) = P(A) + P(B), or P(A ∪ B) = P(A) + P(B)
That is, if two events are mutually exclusive, the probability of either occurring
is the sum of the probabilities of each occurring.
3) Multiplication rules (or rules of intersections): Like the addition rules,
there are two rules of multiplication.

General multiplication rule: It is always valid:
P(A and B) = P(A) · P(B|A), or P(A ∩ B) = P(A) · P(B|A)
The probability of event B occurring given that event A has already occurred is
read "the probability of B given A" and is written P(B|A). This is called
conditional probability.
Specific multiplication rule: It is valid only for independent events:
P(A and B) = P(A) · P(B), or P(A ∩ B) = P(A) · P(B)
If events are independent, then the probability of both occurring is the
product of the probabilities of each occurring.
4) Rules for conditional probability: The probability of an event occurring
given that another event has already occurred is called a conditional probability.
The conditional probability of the event A given that event B has occurred is
denoted by P(A|B).
P(A|B) = P(A ∩ B) / P(B), provided P(B) > 0.
The probability that event B occurs, given that event A has already occurred, is
P(B|A) = P(A ∩ B) / P(A), provided P(A) > 0.
Conditional probability is therefore the ratio of joint probability to marginal
probability. Marginal probability refers to the probability of an individual
event on its own.
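The ratio definition can be illustrated on the same die-rolling experiment (an illustrative Python sketch; the events chosen here are my own examples):

```python
# Conditional probability as joint over marginal, on one fair die roll.
sample_space = {1, 2, 3, 4, 5, 6}
A = {2, 4, 6}   # "an even number turns up"
B = {4, 5, 6}   # "a number greater than 3 turns up"

def p(event):
    # classical probability: favourable outcomes over total outcomes
    return len(event) / len(sample_space)

p_A_given_B = p(A & B) / p(B)   # P(A|B) = P(A and B) / P(B)
print(p_A_given_B)              # (2/6) / (3/6) = 2/3
```

Restricting attention to B = {4, 5, 6} leaves two even outcomes out of three, which is exactly what the formula returns.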

Examples 4.3
1. The question, "Do you smoke?" was asked of 100 people. Results are shown
in the table.
              Do you smoke?
            Yes     No   Total
Male         19     41      60
Female       12     28      40
Total        31     69     100
(a) What is the probability of a randomly selected individual being a male
who smokes?


(b) What is the probability of a randomly selected individual being a male?

(c) What is the probability that a randomly selected individual is a smoker?

(d) What is the probability that a randomly selected male is a smoker?

(e) What is the probability that a randomly selected smoker is male?

Solutions:
(a) This is just a joint probability. The number of "Male and Smoke"
divided by the total = 19/100 = 0.19
(b) This is the total for male divided by the total = 60/100 = 0.60. Since no
mention is made of smoking or not smoking, this is a marginal
probability; it includes all the cases.

(c) Again, since no mention is made of gender, this is a marginal
probability: the total who smoke divided by the total = 31/100 = 0.31.

(d) This time, you're told that you have a male (think of restricting attention
to one stratum, as in stratified sampling); this is a conditional probability.
What is the probability that the male smokes? Well, 19 males smoke out of
60 males, so 19/60 ≈ 0.317.

(e) This time, you're told that you have a smoker and asked to find the
probability that the smoker is also male. This is also a conditional
probability. There are 19 male smokers out of 31 total smokers, so 19/31
= 0.6129.
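The five probabilities of Example 4.3 can be reproduced with plain arithmetic. The sketch below is an illustrative Python encoding of the two-way table (the dictionary layout is just one convenient representation):

```python
# Reproducing the probabilities of Example 4.3 from the two-way table.
table = {("male", "yes"): 19, ("male", "no"): 41,
         ("female", "yes"): 12, ("female", "no"): 28}
total = sum(table.values())                                        # 100

p_male_and_smoker = table[("male", "yes")] / total                 # (a) joint
p_male = (table[("male", "yes")] + table[("male", "no")]) / total  # (b) marginal
p_smoker = (table[("male", "yes")]
            + table[("female", "yes")]) / total                    # (c) marginal
p_smoker_given_male = p_male_and_smoker / p_male                   # (d) conditional
p_male_given_smoker = p_male_and_smoker / p_smoker                 # (e) conditional

print(p_male_and_smoker, p_male, p_smoker)     # 0.19 0.6 0.31
print(p_smoker_given_male, p_male_given_smoker)
```

Note how (d) and (e) are both built from the same joint probability (a), divided by different marginals, exactly as the conditional-probability rule states.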
5) Rule of total probability: Let B and B′ be complementary events and let A
denote an arbitrary event. Then, P(A) = P(A ∩ B) + P(A ∩ B′). The rule says
that the probability of the event A occurring is the sum of the probabilities of
all joint events in which A occurs.
Remarks:
(i) The events of interest here are B and B′; their probabilities P(B) and
P(B′) are called prior probabilities.
(ii) P(B|A) and P(B′|A) are called posterior (revised) probabilities. By Bayes'
theorem, P(B|A) = P(A|B)P(B) / [P(A|B)P(B) + P(A|B′)P(B′)].
(iii) Bayes' theorem is important in several fields of application.


6) Counting rules
(a) Basic principle of counting (mn rule): Suppose that two experiments
are to be performed. Then if experiment 1 can result in any one of m possible
outcomes and if, for each outcome of experiment 1, there are n possible
outcomes of experiment 2, then together there are mn possible outcomes of the
two experiments.
(b)Generalized basic principle of counting: If r experiments that are to be
performed are such that the first one may result in any of n1 possible outcomes,
and if for each of these n1 possible outcomes there are n2 possible outcomes of
the second experiment, and if for each of the possible outcomes of the first two
experiments there are n3 possible outcomes of the third experiment, and if, . . .,
then there are a total of n1.n2…nr possible outcomes of the r experiments.
(c) Permutations (ordered arrangements): For r ≤ n, a permutation is the
number of ways of ordering n distinct objects taken r at a time (order is
important). It is given by
nPr = n!/(n − r)!
Thus, nPn = n! and nP0 = 1.

(d) Combinations: For r ≤ n, a combination is the number of ways of
selecting r of n objects with no regard to the order. It is given by
nCr = n!/(r!(n − r)!)
Thus, nCn = 1 and nC0 = 1.
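Python's standard library computes both counts directly, which makes the relationship between them easy to check (n = 5 and r = 2 below are illustrative choices):

```python
import math

# Permutations and combinations via the standard library (Python >= 3.8).
n, r = 5, 2
print(math.perm(n, r))   # nPr = 5!/(5-2)! = 20 ordered arrangements
print(math.comb(n, r))   # nCr = 5!/(2!·3!) = 10 unordered selections

# Every combination of r objects can be ordered in r! ways, so nPr = nCr · r!
assert math.perm(n, r) == math.comb(n, r) * math.factorial(r)
```

The final assertion spells out why the combination formula is the permutation formula divided by r!.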
Normal Probability Distribution
The normal distribution is probably the most important distribution in statistics.
It is a probability distribution of a continuous random variable and is often used
to model the distribution of discrete random variables as well as the
distributions of other continuous random variables. The basic form of the
normal distribution is

that of a bell: it has a single mode and is symmetric about its central value. The
flexibility of the normal distribution is due to the fact that the curve may be
centered over any number on the real line and may be flat or peaked to
correspond to the amount of dispersion in the values of the random variable.

A random variable X is said to follow a normal distribution with parameters µ
and σ² if its density function is given by the probability law:

f(x) = (1/(σ√(2π))) · e^(−(x − µ)²/(2σ²)), −∞ < x < ∞

Where, π = a mathematical constant, approximately 3.1416; e = the Napierian
base, approximately 2.7183; µ = population mean; σ = population standard
deviation; x = a given value of the random variable in the range −∞ < x < ∞

Characteristics of a normal distribution:
It is bell-shaped; it is symmetric about the mean; it is a continuous probability
distribution; the curve never touches the x-axis; its mean = median = mode;
approximately 68% of the observations lie within 1 standard deviation of
the mean, 95% within 2 standard deviations, and 99.7% within 3 standard
deviations of the mean; and its total area under the curve is 1.00, i.e.,
∫ f(x) dx = 1 over −∞ < x < ∞.

Empirical rule: Given a set of measurements x1, x2, . . ., xn whose distribution
is bell-shaped, then approximately
1. 68.26% of the measurements lie within one SD of their sample mean:
(x̄ − S, x̄ + S)
2. 95.44% of the measurements lie within two SDs of their sample mean:
(x̄ − 2S, x̄ + 2S)
3. 99.73% of the measurements lie within three SDs of their sample mean:
(x̄ − 3S, x̄ + 3S)
Example 4.4: A data set has x̄ = 75, S = 6. The frequency distribution is
known to be normal. Then
a. (69, 81) contains approximately 68% of the observations
b. (63, 87) contains approximately 95% of the observations

c. (57, 93) contains approximately 99.7% (almost all) of the observations
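The three intervals of Example 4.4 can be recomputed in one line each (a small Python check of the arithmetic above):

```python
# Recomputing the intervals of Example 4.4 (sample mean 75, SD 6):
# (mean - k*S, mean + k*S) for k = 1, 2, 3 standard deviations.
mean, s = 75, 6
intervals = {k: (mean - k * s, mean + k * s) for k in (1, 2, 3)}
print(intervals)  # {1: (69, 81), 2: (63, 87), 3: (57, 93)}
```

These match the intervals (69, 81), (63, 87) and (57, 93) quoted in parts a to c.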


The Normal Curve: The graph of the normal distribution depends on two
factors- the mean and the standard deviation. The mean of the distribution
determines the location of the centre of the graph, and the standard deviation
determines the height and width of the graph. When the standard deviation is
large, the curve is short and wide; when the standard deviation is small, the
curve is tall and narrow. All normal distributions look like a symmetric, bell-
shaped curve, as shown below.

Figure 5.1: Normal curves


The curve on the left is shorter and wider than the curve on the right, because
the curve on the left has a bigger standard deviation.
Standard Normal Distribution: A normally distributed random variable with
μ = 0 and σ = 1 is said to have the standard normal distribution. It is denoted by
the letter Z. Its probability density function is given by:

f(z) = (1/√(2π)) · e^(−z²/2), −∞ < z < ∞
There is a table which must be used to look up standard normal probabilities.


The Z-score is broken into two parts, the whole number and tenth are looked up
along the left side and the hundredth is looked up across the top. The value in
the intersection of the row and column is the area under the curve between zero
and the Z-score looked up. Because of the symmetry of the normal distribution,
look up the absolute value of any Z-score.
How to compute normal probabilities: There are several different situations
that can arise when asked to find normal probabilities. This can be shortened
into two rules:
1. If there is only one z-score given, use 0.5000 for the second area; otherwise
look up both z-scores in the table


2. If the two z-scores have the same sign, then subtract the areas; if they have
different signs, then add them. If there is only one z-score, then use the
inequality to determine the second sign (< is negative, and > is positive).

Finding Z-scores from probabilities- This is more difficult and requires you
to use the table inversely. You must look up the area between zero and the
value on the inside part of the table, and then read the z-score from the outside.
Finally, decide if the z-score should be positive or negative, based on whether it
was on the left side or the right side of the mean. Remember, z-scores can be
negative, but areas or probabilities cannot be.
Tabulated Values: Values of P(0 ≤ Z ≤ z) are tabulated in the appendix of any
statistics book.
Critical Values: Zα of the standard normal distribution are given by P(Z ≥ Zα) =
α which is in the tail of the distribution.
Examples 4.5:
1. Find the probability from the standard Z-table such that
a. P(0 ≤ z ≤ 1) b. P(−1 ≤ z ≤ 1) c. P(−2 ≤ z ≤ 2) d. P(−3 ≤ z ≤ 3)
Solutions: a) 0.3413, b) 0.6826, c) 0.9544, d) 0.9974
The Excel formula for this is =NORMSDIST(z). For example, P(0 ≤ z ≤ 1) can
be written with the following Excel formula:
= NORMSDIST(1) − NORMSDIST(0) = 0.841345 − 0.5 = 0.341345
2. Find the Z-score (i.e., z0) from the standard Z-table such that
a. P(Z > z0) = 0.100 b. P(−z0 < Z < z0) = 0.050 c. P(Z > z0) = 0.025
d. P(Z < −z0) = 0.010
Answer: a) z0 = 1.28, b) z0 ≈ 0.06, c) z0 = 1.96, d) z0 = 2.33
The Excel formula for this is =NORMSINV(probability). For example, for
P(Z > z0) = 0.100, =NORMSINV(1 − 0.100) = 1.28; for P(−z0 < Z < z0) =
0.050, =NORMSINV(0.5 + 0.025) = 0.0627
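Python's standard library offers the same lookups through `statistics.NormalDist` (available from Python 3.8), which can replace both the printed Z-table and the Excel functions:

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mu = 0, sigma = 1

# Equivalent of Excel's NORMSDIST: P(0 <= z <= 1) as a difference of CDFs.
p = Z.cdf(1) - Z.cdf(0)
print(round(p, 4))           # 0.3413, matching the table value

# Equivalent of Excel's NORMSINV: z0 such that P(Z > z0) = 0.100.
z0 = Z.inv_cdf(1 - 0.100)
print(round(z0, 2))          # 1.28
```

`cdf` plays the role of the cumulative table (area to the left of z), and `inv_cdf` is its inverse, so the two rules for combining table areas carry over directly.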
Standard score (Z-score): It is obtained by the following formula:

Z = (x − µ)/σ

The mean of the standard scores is zero and the standard deviation is 1.


Example 4.6: The length of life of a certain type of automatic washer is
approximately normally distributed, with a mean of 3.1 years and a standard
deviation of 1.2 years. If this type of washer is guaranteed for 1 year, what
fraction of original sales will require replacement?
Solution: Let X be the length of life of an automatic washer selected at
random. Then

Z = (1 − 3.1)/1.2 = −1.75

Therefore, P(X < 1) = P(Z < −1.75)


Exercise: Complete the solution of this problem.

Learning Activities
1. In a sample of 100 plots of land, 25 have red (R), 15 grey (G), 50 black (D)
and 10 others (O) soil type. Set up a frequency distribution and find the
probabilities of the occurrence of the following events:
A = {A plot which has red soil type}
B = {A plot which has red or black soil type}
C = {A plot which has neither red nor black soil type}
2. Suppose a manufacturing firm receives spare parts from two different
suppliers (supplier I and supplier II). Currently, 80% of the spare parts are
purchased from supplier I and the remaining 20% from supplier II. It is also
known that 5% and 2% of supplier I's and supplier II's provision is defective,
respectively. If the firm uses a defective part in the process of production, the
processing machine will break down and stop production.
(i) If a firm receives a spare part from one of the two suppliers, what is the
probability of that spare part being: (a) Good? (b) Defective?
(ii) Suppose a firm now receives a defective spare part, what is the
probability that it came from
(a) Supplier 1? (b) Supplier 2?
3. If X is a normal random variable with parameters μ = 3 and σ² = 9, find
a. P(2 < X < 5), b. P(X > 0), c. P(X > 9).

Continuous Assessment

Test and Quiz, and individual assignment on how to estimate and test
significance of the relationship between different variables

Summary
 The theory of probability forms the basis of statistical inference, the drawing
of inferences on the basis of a random sample of data.
 Bayes’ theorem provides a formula for calculating a conditional probability.
It forms the basis of Bayesian statistics, allowing us to calculate the
probability of a hypothesis being true, based on the sample evidence and
prior beliefs.
 The Normal distribution is appropriate for problems where the random
variable has the familiar bell-shaped distribution. This often occurs when the
variable is influenced by many, independent factors, none of which
dominates the others.
 Each of these distributions is actually a family of distributions, differing in
the parameters of the distribution. Both the Binomial and Normal
distributions have two parameters: n and P in the former case, μ and σ² in the
latter. The Poisson distribution has one parameter, its mean μ.
 The mean of a random sample follows a normal distribution, because it is
influenced by many independent factors (the sample observations), none of
which dominates in the calculation of the mean. This statement is always true
if the population from which the sample is drawn follows a normal
distribution.
 If the population is not normally distributed then the Central Limit Theorem
states that the sample mean is normally distributed in large samples. In this
case ‘large’ means a sample of about 30 or more.

3.1.3.5. CONCEPTS OF SAMPLING AND THEIR APPLICATIONS

Have you come across the words sampling and population? If yes, please try to
describe what sampling and a population are. Why do we need to take a sample
instead of studying the population as a whole?


Basic Concepts of Sampling


Before we start to discuss details of the various sampling methods, it is
important to understand the following terminologies.
Population is defined as the totality of things under consideration. That is, it is
a collection of all values of the variable that is being studied. Population is
therefore defined by the investigator based on the objectives of his/her study. A
population may consist of individuals, objects, scores, measurements,
characteristics, etc. A population may be finite or infinite.
Census is the collection of data from each element of the population. Or it is a
complete enumeration of all items in the population.
Sample is a portion of a total population under study. It is a set of elements
selected in some way from a population. A sample is used to inform certain
characteristics of the population. For statistical inferences to be made about the
population from the sample, it is essential that the samples are representative of
the population. The process of selecting the sample from the population is
called sampling.
A statistic can be viewed as a numerical measurement describing some
characteristic of a sample. For example, sample mean and variance.
A parameter is a numerical measurement that describes some characteristic of
the population. For example, population mean and variance
Why do you need to take a sample instead of studying the population as a
whole? Some of the major reasons for sampling include:

1. Cost reduction. Sampling saves the cost of the study. It would be very
expensive to cover the entire population in the study; only governments can
undertake some large-scale censuses. For example, a population census has
been conducted every 10 years in Ethiopia.
2. Timeliness. It saves time. Census involves a great deal of time to contact the
whole population.
3. The physical impossibility of checking all the items when a population
contains infinitely many members. Thus, it is impossible to generate detailed


information. Sometimes it is possible to obtain sufficiently accurate results by
studying only the sample.
4. The destructive nature of certain tests. In this case, sampling is the only
choice. This works particularly in medical sciences. For example, if a physician
needs to take a blood test to check whether a patient is exposed to malaria
infection, he will simply take a sample of blood rather than taking the whole
blood of the patient.
5. Adequacy of sample results. Even if the above four problems are solved,
the adequacy of sample results makes sampling necessary. In a full
enumeration, even the slightest element of bias gets larger and larger as the
number of observations increases, and there is no way of checking the element
of bias (or its extent) except through a resurvey or the use of sample checks.

In general, sampling has the advantages of simplicity, cost reduction and


timeliness. A small, well-designed and well-executed sample can give more
accurate information than a larger one, because it can be much better
managed and the data collected on individuals may accordingly be more
reliable.

Sampling frame: It is also called source list from which sample is to be drawn.
A sampling frame is a list that closely approximates all the elements in the
population. If a sampling frame is not available, you have to prepare it if the
size of the population is small. Such a list should be comprehensive, reliable
and appropriate.
Sampling units: They are the collections of elements which do not overlap and
which together exhaust the entire population. A sampling unit may be a geographical area
such as region, district, village, kebele, etc., or a construction unit such as
house, flat, etc., or it may be an individual.
Sampling elements: They are the units of analysis (the units of final
observations) or cases in a population. It can be a person, a group, or an
organization that is being measured.
Sampling ratio: It is the ratio of sample size to size of the target population. If,
for example, the size of a given population is 20000 and the size of the sample


drawn from this population is 200, then the sampling ratio will be 200 divided
by 20000, which equals 0.01 or 1%.

Probability and Non-Probability Sampling Methods


There are two principal methods of drawing a sample from a population:
Probability (or random) sampling methods and non-probability (or non-random)
sampling methods.

Probability Sampling Methods: Probability sampling method is also called


random (or chance) sampling method. It is a method of selecting sample from
the population in such a way that each element has an equal chance of being
chosen. Random sampling ensures the law of statistical regularity which states
that if on average the sample chosen is a random one, the sample will have the
same composition and characteristics as the population. The results obtained
from this sampling method can be assured in terms of probability, i.e., we can
measure the errors of estimation or the significance of results obtained from a
random sample. Thus, random sampling is considered the best method of
selecting a representative sample. The five major types of probability sampling
methods are:
1. Simple random sampling (SRS): It is the easiest method of probability
sampling. It is more commonly used in the case of homogeneous population.
There are two methods of drawing simple random samples from the population:
Lottery method: For a small population, slips corresponding to the element
numbers (in the sampling frame) may be placed in the container and then we
may form a sample by drawing slips from it.
Random numbers table: a table of random numbers generated by a random
process (by computer or mechanically). If the size of the population is so large,
the task of selecting a simple random sample is simpler when we use a random
numbers table.
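A computer can play the role of both the lottery and the random numbers table. The sketch below (an illustrative Python example; the frame of 100 numbered elements is an assumption) draws a simple random sample without replacement:

```python
import random

# Lottery-style simple random sample: every element of the frame has an
# equal chance of selection, and elements are drawn without replacement.
random.seed(1)  # fixed seed so the illustration is reproducible
frame = list(range(1, 101))           # a sampling frame numbered 1..100
sample = random.sample(frame, k=10)   # SRS of size n = 10
print(sample)
```

`random.sample` never picks the same element twice, which mirrors drawing slips from a container without putting them back.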

2. Systematic Sampling: A sample is selected at every sampling interval, i.e.,


instead of using a list of random numbers, you will calculate a sampling interval
(say, K). Procedure of selecting a sample is that by using a list of a population

we simply go down the list taking every Kth individual, starting with a randomly
selected one among the first K individuals. Thus, in systematic sampling only the
first unit is selected randomly and the remaining units of the sample are selected
at fixed intervals.
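The procedure can be sketched in a few lines (an illustrative Python example; the frame size N = 100 and sample size n = 10 are assumptions for the demonstration):

```python
import random

# Systematic sample: interval K = N/n, one random start among the first
# K elements, then every Kth element of the frame after that.
random.seed(7)  # fixed seed so the illustration is reproducible
N, n = 100, 10
frame = list(range(1, N + 1))
K = N // n                      # sampling interval
start = random.randrange(K)     # only this first unit is chosen randomly
sample = frame[start::K]        # the rest follow at fixed intervals
print(K, sample)
```

Note that only `start` involves chance; once it is fixed, the whole sample is determined, which is exactly the point made above.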

3. Stratified Sampling: It is more convenient when the population is


heterogeneous. The population is divided into sub-populations (strata) such
that the elements in each stratum are fairly uniform. Thus, we can draw a random
sample from each stratum by using either simple random sampling or
systematic sampling method. Stratified sampling can also be used when the
information about some parts of the population is required. These parts can be
treated as population in their own rights. Stratified sampling produces samples
that are more representative of the population than simple random sampling or
systematic sampling if the stratum information is accurate.
For example, if we know the target population has 51% female and 49% male,
the population parameter is a sex ratio of 51 to 49. We can draw random
samples among females and among males so that the sample contains a 51 to 49
sex ratio. Thus, fewer errors are made in representing the population and the
sampling errors are smaller with stratified sampling.
Stratified sampling can further be classified into two types: proportional and
disproportional stratified sampling. Thus, the size of a stratified sample may
either be proportional or disproportional to the stratum size.
Proportional stratified sampling: a method of stratified sampling where the
sampling fractions (or sampling ratios) for each stratum are equal.
Example 5.1: Suppose there are 600 Protestants, 300 Catholics and 100 Jews in
a given population.
(a) If an investigator needs to draw a proportional stratified sample from this
population, then what will be the sampling fraction for all these three strata?
Find the size of your sample elements drawn from each stratum.
(b)If an investigator needs to draw a size of 150 proportional stratified samples
from this population, then what will be the sampling fraction for all these
three strata? Find the size of your sample elements drawn from each stratum.


The sampling ratio and the size of sample elements drawn from each stratum
for question (a) can be computed as follows:
 First choose the stratum which has the smallest size and then divide the size
of each stratum by the size of the smallest stratum. Thus, you will get the
following proportions among the three strata:
Protestant : Catholic : Jew = 600/100 : 300/100 : 100/100 = 6 : 3 : 1
These proportions mean that if you draw 1 element from Jews, at the same
time you should draw 3 and 6 elements from Catholic and Protestant,
respectively to be included in your sample. In this way you can draw a
proportional stratified sample from each stratum. Thus, in order to keep this
proportionality, the sampling ratio will be calculated using these figures.
 If you then add these proportions (6 + 3 + 1), you will get 10, the value of the
total proportion. Finally, if you divide the smallest proportion (which equals
1) by the total proportion 10, you will get a sampling ratio of 0.1 for all
strata.
 The elements of the sample drawn from each stratum will be:
Protestants: 600 × 0.1 = 60; Catholics: 300 × 0.1 = 30; Jews: 100 × 0.1 = 10
(a total sample of 100 elements).
The sampling ratio and the size of sample elements drawn from each stratum in
the case of question (b) can be computed as:
Sampling ratio = n/N = 150/(600 + 300 + 100) = 150/1000 = 0.15
The elements of the sample drawn from each stratum will be:
Protestants: 600 × 0.15 = 90; Catholics: 300 × 0.15 = 45; Jews: 100 × 0.15 = 15.

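The proportional allocation of Example 5.1(b) can be checked in a few lines (an illustrative Python sketch using the stratum sizes from the example):

```python
# Proportional allocation for Example 5.1(b): a sample of n = 150 from
# strata of sizes 600, 300 and 100 (N = 1000, so the ratio is 0.15).
strata = {"Protestant": 600, "Catholic": 300, "Jew": 100}
n = 150
N = sum(strata.values())
ratio = n / N                                       # the same for every stratum
allocation = {g: round(size * ratio) for g, size in strata.items()}
print(ratio, allocation)   # 0.15 {'Protestant': 90, 'Catholic': 45, 'Jew': 15}
```

Because every stratum uses the same sampling fraction, the allocated sizes automatically sum back to n = 150.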
Disproportional stratified sampling: a method of stratified sampling where


the sampling fractions for each stratum or for some of the strata are unequal.
Suppose we want to compare the three religious groups in the above example
with respect to their church attendance. Obviously, both simple random
sampling and proportionate stratified sampling would give too few Jews in the
sample to make meaningful comparisons. We might therefore decide to select
equal numbers from each group. If we were to select 50 from each group, the

respective sampling fractions would be 1/12 (= 50/600), 1/6 (= 50/300) and
1/2 (= 50/100).

We use stratified sampling when a stratum of interest is a small percentage of a


population and a random process could miss the stratum by chance. By using a
disproportionate stratified sample, we cannot generalize directly from the
sample to the population. However, disproportionate sampling helps the
analysts who want to focus on issues most relevant to a sub-population (in our
example, Jews). It is usually wise to follow the rule of using proportional
stratification unless cost differentials are very great or variations within stratum
are substantially large.

4. Cluster Sampling: If the total area of interest happens to be a large one, a


convenient way in which a sample can be drawn is cluster sampling. In
stratified sampling we divide the population into strata and sampled from each
stratum. In cluster sampling, however, we divide the population into a large
number of non-overlapping groups (usually called clusters) and then randomly
draw sample among the clusters. In cluster sampling, we do not sample our
elements directly; instead we first sample clusters (or groups of elements).
Thus, cluster sampling can be carried out in one or several stages (multi-stage
cluster sampling), as the following example illustrates:
Suppose we want to examine the productive efficiency of investors on
agricultural sector in Ethiopia and we want to take a sample of few investors for
this purpose. The first stage is to select large primary sampling unit such as
regions in a country. Then we may select certain zones and interview all the
investors in the chosen zones. This would represent a two-stage cluster
sampling with the ultimate sampling units being clusters of zones. If, instead of
taking a census of all investors within the selected zones, we choose certain
districts and interview all investors in the selected districts, this would
represent a three-stage cluster sampling. If, instead of taking a census of all
investors within the chosen districts, we choose randomly certain villages and
interview all investors in the selected villages, then it is the case of using a
four-stage cluster sampling. Usually multi-stage cluster sampling is applied in


big studies extending to a considerable large geographical area, say the entire
country.

Like stratified sampling, cluster sampling can use probability proportionate


to size (or weighted cluster sampling), which keeps the selection probability
equal for each element in the population. If the cluster sampling units do not
have the same number or approximately the same number of elements, it is
appropriate to use a random selection process where the probability of each
cluster being included in the sample is proportional to the size of cluster. For
this purpose, we have to list the number of elements in each cluster irrespective
of the method of ordering the cluster.

5. Hybrid sampling: Where there is no single good way to sample a particular


population we use multistage frame designs or a combination of the four
different methods discussed above.
Non-Probability Sampling Methods: Non-probability sampling method is
also called non-random sampling method. It is not quite possible to introduce
randomization under this type of sampling. There are several types of non-
probability sampling methods. Some of these are listed as follows:
1. Haphazard/convenient sampling: Select anyone who is convenient. It
can produce ineffective, highly unrepresentative samples and is not
recommended. Such samples are cheap and quick, but the bias and systematic
errors that easily occur make them worse than no sample at all.
2. Purposive/judgmental sampling: In this sampling technique the
researcher chooses respondents who in his opinion are thought to be relevant to
the subject under study. It can be appropriate in some situations.
3. Quota sampling: In quota sampling we first identify categories of people
(say, male and female), then decide how many to get in each category. Thus,
the number of people in various categories of the sample is fixed. It is widely
used in market research. For example, if a population of customers is known to
consist of 60% males and 40% females, then even in a sample of 10 it is better
to have 6 males and 4 females.


4. Snowball sampling: Snowball sampling is based on analogy to a


snowball, which begins small but becomes larger as it is rolled on wet snow and
picks up additional snow. It begins with one or a few people/cases and spreads
out on the basis of links to the initial cases, i.e., we begin with a few
respondents who are available. After interviewing them, we ask the respondents
to recommend other persons who meet the criteria of the research and who are
willing to participate in the study.

Sampling Problems (Errors in Sample Survey)


Error is a word with a special meaning in sampling theory. It is not synonymous
with mistake. However, a mistake by an interviewer or a wrong answer to a
question would each contribute to error in a survey, whether a sample survey or
a census. Fieldwork problems, interviewer-induced bias, clerical problems in
managing the amounts of data, etc. would also contribute to error in a survey,
irrespective of whether a sample is drawn or a census is taken. Biases or errors
due to such reasons or sources are called non-sampling error. On the other
hand, error which is attributable to sampling, and which therefore is not present
in census gathered information is called sampling error.

Since a sample has both kinds of errors, whereas a census has only non-
sampling error, you might conclude that the advantage really rests with the
census. But the scale of taking a census makes it difficult to reduce the risk of
non-sampling error. Many sources of bias, for instance, management problems,
faulty measurement, lost or corrupted data will be easier to control in a tightly
constructed sample survey than in a full census. Moreover, sampling error can
be controlled (or at least the extent of it can be estimated) with sample surveys.
Thus, there are occasions when a sample survey could produce less error overall
than full census. Occasionally, sample results may not be representative. Bad
sample can be detected if we check whether the sample approximately matches
with the percentage for gender, race, educational level, etc given by the latest
data from census. In general, there are two types of errors, which may happen in
sample surveys: sampling and non-sampling errors.


(a) Sampling errors: Sampling errors are random variations of the sample
estimates around the true population parameters. For example, while the
population mean is fixed for a given population, the sample mean (which
estimates the population mean) will vary from sample to sample. A discrepancy
will arise between the population parameter and the sample statistic; the error
thus introduced by this statistical discrepancy is called sampling error.

Sampling errors can be estimated only for probability (or random) samples.
Random sampling allows unbiased estimates of sampling error. The
measurement of sampling errors is known as the precision of the sampling plan.
Thus, there are two major factors that cause sampling errors:
Sample bias: This is caused by the method of sample selection. When the
method of selection is inappropriate, statistical discrepancy between population
parameter and sample statistic will arise and will persist even when the sample
size is large. The sample bias is due to selection of the sample, which does not
truly represent the population from which it is drawn. Errors due to sample bias
could be corrected by the use of proper sampling method.
By chance: Even in the absence of sample bias, discrepancy between
population parameters and sample statistics could occur due to chance, as a
sample will never be the same as the population from which it is drawn and will
never reproduce exactly the characteristics of the population.
Non-sampling errors: These are likely to occur in both sample surveys and
censuses. Some of the non-sampling errors are described as follows:
1. Non-coverage (sampling frame defects): Omission of part of the
intended population. For example, soldiers, students living on campus,
people in hospitals, prisoners, etc. are typically excluded from national
samples.
2. Non-response error: Some people refuse to be interviewed because they
may be ill, too busy, or simply don’t trust the interviewer.
3. Response error: This occurs due to response bias, which is a result of
vague, inaccurate, or wrong answers given by the respondent. For
example, there may be tendency on the part of the respondent to make


an understatement in certain cases (say, income) or an overstatement
in certain cases (say, expenditure).
4. Interviewer error: This occurs when some characteristic of the
interviewer, such as age or sex, affects the way in which respondents
answer questions. For instance, questions about racial discrimination
might be answered differently depending on the racial group of the
interviewer.
5. Instrument error: In a sample survey, the word 'instrument' means the device
by which we collect data, usually a questionnaire to be filled out
by the respondent. Different wordings of a question can lead to different
answers being given by a respondent. When a question is badly worded,
the resulting error is called instrument error.
6. Observational error: It is a bias due to observation. For instance, in
making a visual estimate of the extent of crop damage due to drought,
flood, or a certain disease, an element of observational error is likely to
occur.

Sampling Distributions

Sampling Distribution of the Mean


The sampling distribution of the mean (x̄) is the probability distribution of all
possible values that the random variable x̄ may take when a sample of a
given size n is taken from a population. We assume the random variable in
question is a continuous variable. For example, it may be the population of
employees' salaries in a particular industry or the population of scores of
students in an Introduction to Statistics course. It is possible to draw different
simple random samples of a given size n from the underlying population, and
there is no reason to expect the point estimates of µ and σ to be identical for
each simple random sample drawn. The sample mean x̄ can be viewed as the
numerical description of the outcome of the experiment. The sample mean
is a random variable; consequently, x̄ has a mean, a variance and a
probability distribution. Information about the sampling distribution and its


properties enables an investigator to make probability statements about how
close the sample mean (x̄) is to the population mean (µ).

The expected value of x̄ is simply the population mean µ, and this can be
written more formally as E(x̄) = µ. (Proof)
Various random samples can be expected to generate a variety of x̄ values. It
can be shown that with simple random sampling, the expression for the standard
deviation of x̄ depends on whether the underlying population is finite or
infinite. It might be useful to define some notation that will be used
subsequently:
σ_x̄ = the standard deviation of all possible x̄ values; σ = the standard
deviation of the underlying population; n = the sample size; N = the population
size.
If the population is finite, the standard deviation of x̄ is given by:
σ_x̄ = (σ/√n)·√((N − n)/(N − 1))
If the population is infinite, the standard deviation of x̄ is given by:
σ_x̄ = σ/√n
The factor √((N − n)/(N − 1)) is called the finite population correction factor.


In most practical contexts, the finite population is relatively large, though not
infinite, and the sample size is relatively small. In such cases,
√((N − n)/(N − 1)) ≈ 1. There is then little difference between the two
expressions, and as a rule of thumb, if the sample size is less than or equal to
5% of the population size, the latter expression should be used. In all cases in
this discussion, the population is assumed finite but large, and the finite
correction factor will be viewed as redundant. Therefore, we will always use the
latter expression. It should be noted that σ_x̄ is sometimes referred to as the
standard error of the mean because of its role in the computation of estimation
errors. It is possible to demonstrate how the latter expression is derived. (Show!)
Example 5.2: Consider a simple population consisting of a random variable
with 5 elements designated, for reference, by the letters A, B, C, D, and E. The
elements and their values are listed below as:

A B C D E
0 3 6 3 18
(a) What is the population mean (µ)?
(b) What is the population standard deviation (σ)?
(c) Construct the sampling distribution of the mean (x̄) for a sample size of 3.
(d) What is the mean of the sampling distribution of the mean (µ_x̄), i.e., the
mean of means?
(e) What is the standard deviation of the sampling distribution of the mean
(standard error of the mean, σ_x̄)?
(f) What observations can be made with respect to the population and the
sampling distribution?
Solutions:
(a) Population mean (µ): µ = (0 + 3 + 6 + 3 + 18)/5 = 30/5 = 6
(b) Population standard deviation (σ):
σ = √(Σ(xᵢ − µ)²/N) = √(198/5) = √39.6 ≈ 6.293
(c) Sampling distribution of the means (x̄)
In order to get the sampling distribution of the mean (x̄), we should first
find the number of simple random samples of size n = 3 that can be drawn
without replacement from a population of size N = 5. The total number of
possible samples can be obtained by using the rule of combination:
₅C₃ = 5!/(3!·2!) = 10
Thus, 10 simple random samples of size 3 can be drawn from our population.
The means for all these possible samples are given in Table 5.1.
Table 5.1 Sample Means (x̄) from Population Values

Population values   Samples   Sample values   Sample mean (x̄)
A = 0               ABC       0, 3, 6         3
B = 3               ABD       0, 3, 3         2
C = 6               ABE       0, 3, 18        7
D = 3               ACD       0, 6, 3         3
E = 18              ACE       0, 6, 18        8
                    ADE       0, 3, 18        7
                    BCD       3, 6, 3         4
                    BCE       3, 6, 18        9
                    BDE       3, 3, 18        8
                    CDE       6, 3, 18        9

The sampling distribution of the means for sample size n=3 is given in Table
5.2.
Table 5.2 Sampling Distribution of the Means for n = 3

Sample mean (x̄)   Number of means (frequency, f)   Probability p(x̄) (relative frequency)
2                  1                                0.1
3                  2                                0.2
4                  1                                0.1
7                  2                                0.2
8                  2                                0.2
9                  2                                0.2
(d) Mean of the sampling distribution (µ_x̄)
µ_x̄ = Σx̄ᵢ/10 = (3 + 2 + 7 + 3 + 8 + 7 + 4 + 9 + 8 + 9)/10 = 60/10 = 6, by using
values in Table 5.1.
Or
µ_x̄ = Σx̄·p(x̄) = 2(0.1) + 3(0.2) + 4(0.1) + 7(0.2) + 8(0.2) + 9(0.2) = 6,
by using values in Table 5.2.
(e) Standard error of the mean (σ_x̄)
It is possible to find σ_x̄ by using the x̄ values and their associated frequencies
stated in Table 5.2:
σ_x̄ = √(Σf(x̄ − µ_x̄)²/n) = √(66/10) = √6.6 ≈ 2.569


In this case n refers to the possible number of samples to be drawn from the
population, which equals 10. We can also find σ_x̄ from the formula relating
σ_x̄ and σ. This will be computed as follows:
σ_x̄ = (σ/√n)·√((N − n)/(N − 1)) = (6.293/√3)·√((5 − 3)/(5 − 1)) ≈ 3.633 × 0.707 ≈ 2.569,
which is equal to the value of σ_x̄ computed above using values of Table 5.2.
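The enumeration in Example 5.2 can be verified with a short script. This is an illustrative sketch (not part of the module): it lists all ₅C₃ = 10 samples, computes their means, and checks that the mean of the sample means equals µ and that the standard error agrees with the finite-population formula.

```python
from itertools import combinations
import math

pop = [0, 3, 6, 3, 18]
N, n = len(pop), 3

mu = sum(pop) / N                                        # population mean: 6
sigma = math.sqrt(sum((x - mu) ** 2 for x in pop) / N)   # ≈ 6.293

# All C(5,3) = 10 samples drawn without replacement and their means
means = [sum(s) / n for s in combinations(pop, n)]
mean_of_means = sum(means) / len(means)                  # equals mu = 6
se = math.sqrt(sum((m - mean_of_means) ** 2 for m in means) / len(means))  # ≈ 2.569

# Same result from the finite-population formula
se_formula = (sigma / math.sqrt(n)) * math.sqrt((N - n) / (N - 1))
```

Both routes give σ_x̄ ≈ 2.569, matching the hand computation above.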
(f) The following observations can be made:
1. µ_x̄ = µ. It is always true that the mean of the sample means equals the
population mean.
2. σ_x̄ < σ. A standard deviation measures variability. Averages vary less
than the population values from which the averages are computed. Thus, in
our example, the population variability is σ = 6.293, but the variability of the
sample averages is the smaller number σ_x̄ = 2.569. Averaging reduces
variability, and the larger the sample size, the smaller σ_x̄ will be.
3. The range of the sample means is less than the range in the population; i.e.,
the sample means vary from 2 to 9, whereas the population values vary
from 0 to 18.
4. The sampling distribution of the sample means tends to be bell-shaped
and to approximate the normal probability distribution.

The normal distribution of the sample mean (x̄): It is now necessary to
determine the probability distribution of x̄. If the population from which the
samples are drawn is normal with mean µ and standard deviation σ, then the
sampling distribution of x̄ is also normal with mean µ_x̄ = µ and standard
deviation σ_x̄ = σ/√n. That is, if x̄ is a normally distributed random
variable, then its standard normal variable (z) will be computed as:
z = (x̄ − µ)/(σ/√n)


This is the formula for the standard normal variable for the distribution of x̄.
Thus, values of z computed from the formula can be used to enter the standard
normal table in the usual manner.
The Central Limit Theorem (CLT): When the population is normally
distributed, it is obvious that the sampling distribution of the mean is also
normal. Yet decision makers must deal with many populations that are not
normally distributed, or whose distributions are unknown. Under these
conditions, we invoke what is called the Central Limit Theorem. In the context
of this application, the CLT states that in selecting simple random samples of
size n from a population, the sampling distribution of the mean can be
approximated by a normal probability distribution as the sample size becomes
large. It has been generally found that for most populations, a sample size of
n ≥ 30 makes the normal approximation a reasonable assumption. However, the
larger the sample size, the better the approximation to a normal probability
distribution. If, on the other hand, the population distribution is known to be
normal, the sampling distribution of x̄ is a normal probability distribution for
any sample size.

The CLT states that if x₁, x₂, ..., xₙ is a simple random sample drawn from a
population X (whose distribution may not be normal), then the distribution of
the mean (x̄) approaches the normal distribution with mean µ and standard
deviation σ/√n as the sample size n becomes large. That is,
z = (x̄ − µ)/(σ/√n) → N(0, 1) as n → ∞.
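The theorem can also be illustrated empirically with a small simulation (a sketch, not part of the module): draw repeated samples of size n = 50 from a clearly non-normal population, here an exponential distribution with mean 2, and check that the simulated sample means have mean ≈ µ and standard deviation ≈ σ/√n.

```python
import random
import statistics

random.seed(42)  # fixed seed for reproducibility

# Exponential population with rate 0.5, so µ = 2 and σ = 2 (our illustrative choice)
n, reps = 50, 10_000
means = [statistics.mean(random.expovariate(0.5) for _ in range(n))
         for _ in range(reps)]

m = statistics.mean(means)    # ≈ µ = 2
s = statistics.pstdev(means)  # ≈ σ/√n = 2/√50 ≈ 0.283
```

Despite the skewed population, the simulated means cluster symmetrically around 2 with spread close to σ/√n, as the CLT predicts.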
Example 5.3: Suppose hourly wages of workers in an industry have a mean
wage rate of $5.00 per hour and a standard deviation of $0.60.
(a) What is the probability that the mean wage of a random sample of 50
workers will be between $5.10 and $5.20?

(b) What is the probability that the mean wage of a random sample of 36
workers will be between $5.10 and $5.20?
Solutions: In order to find the probability of the occurrence of given values of
a random variable, we should first convert that random variable into a standard
normal variable. For doing so, we need the values of µ_x̄ and σ_x̄.
(a) For the sampling distribution of x̄, µ_x̄ = µ = $5.00.
The standard error of the mean: σ_x̄ = σ/√n = 0.60/√50 ≈ 0.0849
As n = 50 satisfies the n ≥ 30 rule, the distribution of x̄ will be approximately
normal. Thus, we can obtain the z values using the formula:
z₁ = (5.10 − 5.00)/0.0849 ≈ 1.18 and z₂ = (5.20 − 5.00)/0.0849 ≈ 2.36
P(5.10 ≤ x̄ ≤ 5.20) = P(1.18 ≤ z ≤ 2.36) = 0.9909 − 0.8810 = 0.1099
This implies that the probability that the sample mean wage is between $5.10
and $5.20 is 0.1099, or about 11%.
(b) By simply substituting n = 36 instead of n = 50 and keeping the other values
as they were, you get a probability of 0.1359. This value is larger than
the one above because the sample size in this case is smaller than before, so
the standard error of the mean is larger; the resulting z-values (1.0 and 2.0)
therefore lie closer to the centre of the distribution, where more of the
probability is concentrated.
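Both probabilities in Example 5.3 can be checked directly with Python's statistics.NormalDist. This is an illustrative sketch; the helper name prob_mean_between is ours.

```python
from statistics import NormalDist

def prob_mean_between(mu, sigma, n, lo, hi):
    """P(lo <= x̄ <= hi) under the normal sampling distribution of the mean."""
    se = sigma / n ** 0.5          # standard error of the mean
    z = NormalDist()               # standard normal distribution
    return z.cdf((hi - mu) / se) - z.cdf((lo - mu) / se)

p50 = prob_mean_between(5.00, 0.60, 50, 5.10, 5.20)  # ≈ 0.110 (text's 0.1099 uses rounded z)
p36 = prob_mean_between(5.00, 0.60, 36, 5.10, 5.20)  # ≈ 0.1359
```

Using exact rather than table-rounded z values gives 0.1101 for part (a), which rounds to the 0.1099 obtained from the table.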
In summary, for a normally distributed population, the sample means are
normally distributed regardless of the size of the sample, while for any
population the sample means are nearly normally distributed when the sample
is large (n ≥ 30). Note that the CLT is not restricted to normally distributed
populations; the tendency also occurs for all populations usually encountered as
the sample size n increases. Thus, empirically, the CLT implies that with a
sufficiently large sample size (n ≥ 30) one can make inferences based on the
assumptions of the normal distribution. The larger the sample size n is, the
smaller the standard error of the mean (σ_x̄) and the taller and thinner the
distribution of the means (x̄).


Sampling Distribution of the Proportions


The proportion of “successes” in a population is a population parameter
(denoted by p), and the proportion of “successes” in a sample is a statistic
(denoted by p̂). The sampling distribution of the proportion (p̂) is the
probability distribution of the values of p̂ in all samples of a given size n that
can be drawn from the population. Suppose the sizes of a given population and
of samples drawn from that population are denoted by the letters N and n,
respectively, and X and x represent the number of successes in the population
and in a sample, respectively, that possess a specific characteristic of interest.
The population and sample proportions are then computed as follows:
p = X/N and p̂ = x/n

The mean of the sampling distribution of the proportion (µ_p̂) equals the
population proportion:
µ_p̂ = Σp̂ᵢ/n = p
Where n in this case is the number of all possible samples of a particular size
that can be drawn from a population.
The standard error of the proportion (σ_p̂) is the standard deviation of all
possible sample proportions. σ_p̂ is then computed by applying the following
formula:
σ_p̂ = √(pq/n)·√((N − n)/(N − 1)), if the sample size n is more than 5% of the
population size N.

Where q = 1 − p. When n is no more than 5% of the population size N, the finite
population multiplier can be omitted, and hence we will usually compute σ_p̂ by
the formula:
σ_p̂ = √(pq/n)

Example 5.4: Consider a simple population consisting of a random variable


with 5 elements designated, for reference, by the letters A, B, C, D, and E. The
elements and their values are listed as:

A B C D E
0 3 6 3 18
This population contains three even numbers (0, 6 & 18) and two odd numbers
(3 & 3).
(a) What will be the population proportion of even numbers (p)? (I.e., the
successes are now even numbers.)
(b) Construct the sampling distribution of the proportions (p̂) for a sample size
n = 3.
(c) What is the mean of the sampling distribution of the proportion (µ_p̂), i.e.,
the mean of the sample proportions?
(d) What is the standard deviation (or standard error) of the sampling
distribution of the proportion (i.e., standard error of the proportion, σ_p̂)?
Solutions:
(a) Population proportion of even numbers (p)
Given that X = 3 and N = 5. Then, p = X/N = 3/5 = 0.6.
(b) Sampling distribution of the proportion (p̂)
In order to obtain the sampling distribution of the proportion (p̂), we should
first find the number of simple random samples of size n = 3 that can be drawn
without replacement from a population of size N = 5. The total number of
possible samples can be obtained by using the rule of combination: ₅C₃ = 10.
Thus, 10 simple random samples of size 3 can be drawn from our population.
The proportions for all these possible samples are given in Table 5.3.
Table 5.3 Sample Proportions (p̂) from Population Values

Population values   Samples   Sample values   Sample proportion (p̂)
A = 0               ABC       0, 3, 6         2/3
B = 3               ABD       0, 3, 3         1/3
C = 6               ABE       0, 3, 18        2/3
D = 3               ACD       0, 6, 3         2/3
E = 18              ACE       0, 6, 18        1
                    ADE       0, 3, 18        2/3
                    BCD       3, 6, 3         1/3
                    BCE       3, 6, 18        2/3
                    BDE       3, 3, 18        1/3
                    CDE       6, 3, 18        2/3
The sampling distribution of the proportions for sample size n=3 is given in
Table 5.4.
Table 5.4 Sampling Distribution of the Proportion for n = 3

Sample proportion (p̂)   Number of proportions (frequency, f)   Probability p(p̂) (relative frequency)
1/3                      3                                      0.3
2/3                      6                                      0.6
1                        1                                      0.1

(c) Mean of the sampling distribution of the proportion (µ_p̂)
µ_p̂ = Σp̂·p(p̂) = (1/3)(0.3) + (2/3)(0.6) + (1)(0.1) = 0.1 + 0.4 + 0.1 = 0.6 = p,
by using values in Table 5.4.
(d) Standard error of the proportion (σ_p̂)
It is possible to find σ_p̂ by using the p̂ values and their associated frequencies
stated in Table 5.4:
σ_p̂ = √(Σf(p̂ − µ_p̂)²/n) = √(0.4/10) = √0.04 = 0.2
In this case n refers to the possible number of samples to be drawn from the
population, which equals 10. We can also find σ_p̂ from the formula relating
σ_p̂, p and q. This will be computed as follows:
σ_p̂ = √(pq/n)·√((N − n)/(N − 1)) = √((0.6 × 0.4)/3)·√((5 − 3)/(5 − 1)) ≈ 0.283 × 0.707 = 0.2,
which is equal to the value of σ_p̂ computed above using the 10
sample proportions.
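As with the means, Example 5.4 can be verified by brute force. This sketch (not part of the module) enumerates the 10 samples, computes each sample proportion of even numbers, and checks both the mean and the standard error of the proportions.

```python
from itertools import combinations
import math

pop = [0, 3, 6, 3, 18]
N, n = len(pop), 3

p = sum(1 for x in pop if x % 2 == 0) / N   # population proportion of evens: 0.6

# Sample proportion of even numbers in each of the C(5,3) = 10 samples
props = [sum(1 for x in s if x % 2 == 0) / n for s in combinations(pop, n)]
mean_p = sum(props) / len(props)            # equals p = 0.6
se_p = math.sqrt(sum((ph - mean_p) ** 2 for ph in props) / len(props))  # 0.2

# Same result from the finite-population formula
q = 1 - p
se_formula = math.sqrt(p * q / n) * math.sqrt((N - n) / (N - 1))        # 0.2
```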

The CLT may also be used to justify the normal approximation to the
distribution of the sample proportion p̂ for a sufficiently large sample size n. If
n is large (usually n ≥ 30) and both np ≥ 5 and nq ≥ 5, then the sampling
distribution of the proportion (p̂) is approximately normal with mean µ_p̂ = p
and standard deviation σ_p̂ = √(pq/n). Thus, the standard normal variable z is
computed as:
z = (p̂ − p)/√(pq/n)

Learning Activities
1. Suppose 55% of the television audience watched a particular
program one Saturday evening.
(a) What is the probability that, in a random sample of 100 viewers, less than
50% of the sample watched the program?
(b)If a random sample of 500 viewers is taken, what is the probability that less
than 50% of the sample saw the program?
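After attempting the exercise by hand, the answers can be checked with a short script. This is a sketch using the normal approximation to the sampling distribution of the proportion; the helper name prob_prop_below is ours.

```python
from statistics import NormalDist
import math

def prob_prop_below(p, n, threshold):
    """P(p̂ < threshold) under the normal approximation with se = √(pq/n)."""
    se = math.sqrt(p * (1 - p) / n)
    return NormalDist().cdf((threshold - p) / se)

p100 = prob_prop_below(0.55, 100, 0.50)   # ≈ 0.157
p500 = prob_prop_below(0.55, 500, 0.50)   # ≈ 0.012
```

The larger sample shrinks the standard error, so a sample proportion below 50% becomes far less likely.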

Continuous Assessment
Individual assignment and/or written test will be given about sampling
distributions and statistical estimation.

Summary
 A primary data source is one where you obtain the data yourself or have
access to all the original observations. A secondary data source contains a
summary of the original data, usually in the form of tables. When collecting
data always keep detailed notes of the sources of all information, how it was
collected, precise definitions of the variables, etc. Some data can be obtained
electronically, which saves having to type it into a computer, but the data still
need to be checked for errors.
 There are two principal types of sampling, namely random and non-random
sampling. The main types of random sample include simple, systematic,
stratified and clustered random samples; the methods are sometimes
combined in multistage samples. Common types of non-random sampling
methods include convenience, purposive, quota and snowball sampling.
 The type of sampling affects the size of the standard errors of the sample
statistics. The most precise sampling method is not necessarily the best if it
costs more to collect (since the overall sample size that can be afforded will
be smaller).


 The sampling frame is the list (or lists) from which the sample is drawn. If it
omits important elements of the population its use could lead to biased
results.
 Careful interviewing techniques are needed to ensure reliable answers are
obtained from participants in a survey.
 Estimation is the process of using sample information to make good
estimates of the value of population parameters.
 There are several criteria for finding a good estimate. Two important ones are
the (lack of) bias and precision of the estimator. Sometimes there is a tradeoff
between these two criteria – one estimator might have a smaller bias but be
less precise than another.
 An estimator is unbiased if it gives a correct estimate of the true value on
average. Its expected value is equal to the true value.
 The precision of an estimator can be measured by its sampling variance.

3.1.3.6 STATISTICAL ESTIMATION AND HYPOTHESIS TESTING


Do you know how to estimate the relationship between two variables and test
whether their relationship is significant or not?

Statistical Estimation
Estimation is a process by which we estimate various unknown population
parameters from sample statistics. The first step in estimation is to obtain
observations on one or more random variables. Suppose there is a single
random variable X and a single parameter θ. The observations are used to
construct estimates of θ. The formula for obtaining the estimate of a parameter
is referred to as an estimator, and the numerical value associated with it is
called an estimate. Thus, an estimator is a sample statistic that is used to
estimate an unknown population parameter. The theory of estimation can be
divided into two parts: point estimation and interval estimation.

Point Estimation: A point estimate is a single number or value that estimates the
exact value of the unknown population parameter of interest, while an interval
estimate is an interval that provides upper and lower bounds for a population
parameter. For example, the sample mean x̄ is an estimator of the population

mean (µ). Suppose we have the following random sample of n = 6 elements
from a population whose parameter values are unknown:
1 2 4 11 7 5
The point estimators and point estimates of the population mean, proportion and
standard deviation will be as follows:
The sample mean, x̄ = Σxᵢ/n = (1 + 2 + 4 + 11 + 7 + 5)/6 = 30/6 = 5. The point
estimator is x̄, and 5 is a point estimate of the unknown population mean. Note
that our sample of n = 6 elements contains two even numbers (2 & 4). Calling an
even number a success, the sample proportion of successes is p̂ = 2/6 = 1/3. The
statistic p̂ is an estimator of the unknown population proportion of successes,
and 1/3 is a point estimate of the population proportion. Let the estimator of the
unknown population standard deviation be denoted by S. The estimator S is
defined by the formula:
S = √(Σ(xᵢ − x̄)²/(n − 1))
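The three point estimates can be reproduced with Python's statistics module (an illustrative sketch, not part of the module); note that statistics.stdev uses the n − 1 divisor, matching the definition of S.

```python
import math
from statistics import mean, stdev

sample = [1, 2, 4, 11, 7, 5]

xbar = mean(sample)                                          # point estimate of µ: 5
phat = sum(1 for x in sample if x % 2 == 0) / len(sample)    # evens as "successes": 1/3
s = stdev(sample)                                            # S = √(66/5) = √13.2 ≈ 3.633
```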

Interval Estimation: The point estimates of the population parameters do not
provide any indication of the precision of the estimation; they will not
provide exact estimates of the population parameters, given that some degree of
error is likely to be introduced by sampling. Interval estimation has an
advantage over point estimation because it provides information regarding the
precision of the estimates. The interval estimate is also called the confidence
interval estimate and consists of a range of values instead of just a single value.
It can be applied when sample sizes are either small (n < 30) or large (n ≥ 30),
but modifications to the approach are required depending upon the sample size.
In addition, the approach adopted is determined by whether the population
standard deviation (σ) is known or unknown.

(a) Interval estimation of the mean in large sample case: Here we assume
that n ≥ 30, so the sampling distribution of x̄ is approximated by the
normal probability distribution. We also assume that the sample size is small
relative to the size of the population (n/N ≤ 0.05), and hence the finite
population correction factor is not required in the computation of the standard
deviation of x̄.

Example 6.1: Assume we have a random sample of 36 civil servants in Addis
Ababa (n = 36) and we want to calculate an interval estimate for the age of the
civil servants. The sample mean age is computed as 40 years (x̄ = 40). It is
known from studies that the population standard deviation of civil servants' age
is 9 years (σ = 9). From the discussion of Section 7, we know that the sampling
distribution of x̄ is approximately normal with mean µ and a standard
deviation given by σ_x̄ = σ/√n, which in this case is 9/√36 = 1.5. We are now in
a position to provide a probability statement about the sampling error that
attaches to the use of a simple random sample of 36 civil servants to obtain a
point estimate of the mean age of the population. Using the table of areas from
the standard normal probability distribution, 95% of the values of a normally
distributed random variable lie within ±1.96 standard deviations of the mean.
Thus, in the example here, 95% of all x̄ values lie within ±1.96σ_x̄ of the
population mean (µ). Since 1.96σ_x̄ = 1.96 × 1.5 = 2.94, we can state that 95%
of all sample means lie within ±2.94 years of the population mean. We could
re-express this by saying there is a 0.95 probability that the sample mean will
provide a sampling error of 2.94 years or less; there is only a 0.05 probability
that the sampling error will be greater than 2.94 years. This tells us something
about the precision of the estimate. If the investigator is unhappy with this level
of precision, a larger sample will be necessary, since there is an inverse
relationship between the standard deviation of the mean and the sample size.
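The interval in Example 6.1 can be computed directly (an illustrative sketch): NormalDist.inv_cdf supplies the critical value z_α/2 instead of a printed table.

```python
from statistics import NormalDist

xbar, sigma, n, conf = 40, 9, 36, 0.95

se = sigma / n ** 0.5                          # standard error: 1.5
z = NormalDist().inv_cdf(1 - (1 - conf) / 2)   # z_0.025 ≈ 1.96
margin = z * se                                # ≈ 2.94 years
ci = (xbar - margin, xbar + margin)            # ≈ (37.06, 42.94)
```

So the 95% interval estimate of the mean age runs from about 37.1 to about 42.9 years.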

Statisticians use the notation α to denote the probability that the sampling error
is larger than the sampling error reported in the precision statement. Thus,
α = 0.05 states there is a 0.05 probability that the sampling error is larger than
that reported in the precision statement. Since the normal distribution is
symmetric, α/2 is the area or probability in each tail of the distribution, and
1 − α will be the area or probability that a sample mean will provide a sampling
error less than or equal to the sampling error used in the precision statement.

Using z to denote the standard normal random variable, a subscript on the z
value is used to denote the area in the upper tail of the probability distribution.
For a 95% level, this corresponds to the z-value with 0.025 of the area in the
upper tail of the probability distribution. As can be found in the standard normal
probability distribution table, z₀.₀₂₅ = 1.96. In general, z_α/2 denotes the value
of the standard normal variable corresponding to an area of α/2 in the upper tail
of the distribution. There is a 1 − α probability that the value of a sample mean
will provide a sampling error of z_α/2·σ_x̄ or less.
Another way of expressing this is in terms of the interval estimate itself. There
is a 1 − α probability that the interval constructed by
x̄ ± z_α/2·(σ/√n) (6.1)
contains the true population mean µ. This could be expressed as:
P(x̄ − z_α/2·σ/√n ≤ µ ≤ x̄ + z_α/2·σ/√n) = 1 − α (6.2)
This can only be constructed if we know the value of σ. The complementary
way of looking at this is that there is a probability of α that the interval
constructed by equation 6.1 does not contain the true population mean µ. In the
above case, 1 − α is the confidence coefficient and z_α/2 is the z value providing
an area of α/2 in the upper tail of the standard normal probability distribution.
The above expressions are sometimes referred to as confidence intervals. Use of
the standard normal is based on the central limit theorem and enables us to
conclude that with a large sample, z = (x̄ − µ)/(σ/√n) is approximately a
standard normal variable.
If σ is known, σ_x̄ can be computed relatively easily, x̄ can be computed from
the sample, and values for z_α/2 can be obtained from the z-table. If σ is
unknown, then the sample variance, defined as:
S² = Σ(xᵢ − x̄)²/(n − 1) (6.3)
can be used in its place (via its square root, S). Thus, the estimated standard
deviation of the sample means is given by:
S_x̄ = S/√n


The following table contains the values of z_α/2 for the most commonly used
confidence intervals. It should be noted that the common choices for the degree
of confidence can be expressed in percentage terms as 90%, 95% and 99%. The
z-value corresponding to z_α/2 is also referred to as the critical value.
Table 6.1 Confidence Levels

Confidence level (%)   α      α/2     z_α/2 (critical value)
90                     0.10   0.050   1.645
95                     0.05   0.025   1.96
99                     0.01   0.005   2.575
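The critical values in Table 6.1 can be reproduced from the standard normal quantile function (an illustrative sketch):

```python
from statistics import NormalDist

# z_(α/2) for each common confidence level: the quantile at 1 - α/2
critical = {conf: NormalDist().inv_cdf(1 - (1 - conf) / 2)
            for conf in (0.90, 0.95, 0.99)}
# critical[0.90] ≈ 1.645, critical[0.95] ≈ 1.960,
# critical[0.99] ≈ 2.576 (the table's 2.575 is the same value rounded differently)
```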

(b) Interval estimation of the mean in small sample case: The central limit
theorem played an important role in the development of the confidence interval
in Section 7, but only for n ≥ 30. If n < 30, the sampling distribution of x̄
depends upon the distribution of the population. If the population distribution is
normal, the sampling distribution of x̄ will also be normal regardless of the
sample size. If the population standard deviation (σ) is known, then expression
(6.2) can be used regardless of the sample size. The more likely case is that σ is
unknown, and the sample standard deviation, denoted S (the square root of
expression 6.3), must be used to obtain an estimate for σ. The resultant
confidence interval, x̄ ± z_α/2·S/√n, is only valid when n ≥ 30. In the small
sample case, the confidence interval is based on an alternative probability
distribution known as the t-distribution. The t-distribution is actually a family
of similar probability distributions; a specific distribution is determined by a
parameter known as the degrees of freedom. A t-distribution has a zero mean
and becomes less dispersed as the degrees of freedom increase. As the number
of degrees of freedom increases, the difference between the t-distribution and
the standard normal probability distribution becomes smaller and smaller.


A table of critical values provides the necessary information for the
t-distribution. The value t_α/2 denotes the point with an area of α/2 in the upper
tail of the t-distribution. The t-value can be used to construct an interval
estimate for the population mean when n < 30 and σ is unknown. The following
expression provides the confidence interval using the t-distribution in these
circumstances:
x̄ ± t_α/2·S/√n
The number of degrees of freedom (df) that the t-distribution has in this case is
n − 1. This is related to the fact that S, and not σ, is used as the population
standard deviation. The concept of df is related to the number of independent
pieces of information involved in the computation of the numerator
Σ(xᵢ − x̄)², where i = 1, ..., n. We should be aware that Σ(xᵢ − x̄) = 0. This
means that if you know n − 1 of the deviations, the nth can be worked out using
the fact that the deviations sum to zero. Thus, n − 1 is the number of df
associated with Σ(xᵢ − x̄)² and hence with the t-distribution. The t-distribution
is used in testing the significance of: 1) a sample mean, 2) the difference
between two sample means, 3) the sample correlation coefficient, and 4) the
sample regression coefficient.
(c) Interval estimation of the proportion in large sample case: Developing a
confidence interval for a population proportion is similar to doing so for a
mean. The confidence interval for a population proportion under this case can
be computed as:
p̂ ± z_α/2·√(p̂(1 − p̂)/n)
Example 6.2: John Gail is running for Congress from a Nebraska district. In a
random sample of 100 voters in the district, 60 indicate they plan to vote for
him in the upcoming election. The sample proportion (p̂) is 0.6, but the
population proportion (p) is unknown. (a) Estimate the population proportion.
(b) Develop a 95% confidence interval for the population proportion. (c)
Interpret the confidence interval estimate.
Solutions:
(a) p̂ = 60/100 = 0.6. This is the point estimate of the population proportion.


(b) p̂ ± z0.025 √(p̂(1 − p̂)/n) = 0.60 ± 1.96√(0.60 × 0.40/100) = 0.60 ± 0.096

(c) We are 95% confident that the population proportion (p) is between 0.504 and 0.696.
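The arithmetic in part (b) can be checked with a minimal Python sketch (the function name is illustrative); recomputing gives limits of 0.504 and 0.696, with any small differences from the figures above attributable to rounding:

```python
import math

def prop_conf_interval(successes, n, z_crit=1.96):
    """Large-sample CI for a proportion: p_hat +/- z * sqrt(p_hat(1-p_hat)/n)."""
    p_hat = successes / n
    margin = z_crit * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

lo, hi = prop_conf_interval(60, 100)  # Example 6.2: 60 of 100 voters
print(round(lo, 3), round(hi, 3))  # 0.504 0.696
```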

(d) Interval estimation of the proportion in the small sample case: when n < 30, the t-distribution with n − 1 degrees of freedom replaces z:
p̂ ± tα/2 √(p̂(1 − p̂)/n)
Example 6.3: With the information given in Example 6.2 above, if the size of the sample is 20 instead of 100 voters, then (1) estimate the population proportion, (2) develop a 95% confidence interval for the population proportion, (3) interpret the confidence interval estimate.
Solutions:
1) p̂ = 0.60, as before. This is the point estimate of the population proportion.

2) 0.60 ± t0.025,19 √(0.60 × 0.40/20) = 0.60 ± 2.093 × 0.110 = 0.60 ± 0.23

3) We are 95% confident that the population proportion (p) is between 0.37 and 0.83 (roughly 0.4 and 0.8).

Determining the Sample Size


In designing a sample survey the crucial problem is to determine the sample
size. Too small a sample will increase the sampling error in estimating
population parameters and too large a sample will increase the cost of a survey.
Thus, an optimum sample size should be determined to strike a balance between
the accuracy in estimation and the cost. How do we determine the sample size? Determination of the necessary sample size depends on the following three factors:
1. The level of confidence desired. The higher the level of confidence selected, the larger will be the sample size.
2. The margin of error the researcher will tolerate (allowable error). The maximum allowable error (denoted by E) is the amount that is added to and subtracted from the sample mean (or proportion) to determine the end points of the
confidence interval. It is one-half the width of the corresponding confidence
interval. A small allowable error will require a larger sample, and vice versa.
3. The variability in the population being studied (the population standard
deviation). If the population is widely dispersed, a large sample is required.
Population Mean: The formula for the sample size is obtained by solving the maximum-error-of-estimate formula E = zα/2 σ/√n for n:
n = (zα/2 σ/E)2
Example 6.4: A researcher wishes to know the minimum sample size required
to ascertain the average wage of all employees who work in Dire Dawa city,
with a margin of error of ± Birr 10 and with a confidence level of 95%. This
researcher knows from previous survey that the standard deviation of the
monthly wage of these employees is 100 Birr. This researcher also knows that
the total number of employees who work in Dire Dawa city is approximately
5000. What will be the sample size for this study?
Solution: n = (zα/2 σ/E)2 = (1.96 × 100/10)2 = 384.16 ≈ 385. Since the population size (N = 5000) is known, the finite-population correction may also be applied, giving n = 384.16/(1 + 384.16/5000) ≈ 357.
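The computation can be sketched as follows. Since the module's worked solution is not shown, both the uncorrected figure and the finite-population-corrected one are printed; the correction formula n0/(1 + n0/N) is a standard one assumed here, not quoted from the module:

```python
import math

z, sigma, E, N = 1.96, 100, 10, 5000   # data of Example 6.4

n0 = (z * sigma / E) ** 2              # 384.16, before rounding up
n_simple = math.ceil(n0)               # 385 without the correction
n_fpc = math.ceil(n0 / (1 + n0 / N))   # 357 with the finite-population correction
print(n_simple, n_fpc)  # 385 357
```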
Hypothesis Testing
The assumption we make about the value of a population parameter that is subject to verification is called a hypothesis. Sample evidence is then used to test the reasonableness of the hypothesis; hence the statistical inference made in this
way is referred to as hypothesis testing. It is defined as a procedure to determine
whether the hypothesis is a reasonable statement based on the sample evidence
and probability theory. It is the branch of inferential statistics that is concerned
with how well the sample data support a null hypothesis and when the null
hypothesis can be rejected in favor of the alternative hypothesis.

In statistical analysis we make a claim (state a hypothesis), collect data, and
then use the data to test the assertion. Hypothesis testing starts with making a
tentative assumption/statement about the population parameter usually referred
to as the null hypothesis (denoted by Ho). The null hypothesis is the hypothesis
to be tested. Another hypothesis referred to as the alternative hypothesis

(denoted by Ha or H1) is usually the opposite of the null hypothesis. The
hypothesis-testing procedure involves using data from a sample to test two
competing statements (Ho and Ha).

Basic Concepts in Hypothesis Testing


(a) Choosing the null and the alternative hypothesis
The first step in setting up a hypothesis testing is to decide on the null and the
alternative hypothesis. Let’s first focus on hypothesis tests concerning a single
parameter of one population. The parameter that we focus on is, say, the population mean (μ). The null hypothesis will always specify a single value for the parameter in question. The null hypothesis could be expressed concisely as Ho: μ = μ0, where μ0 is some numeric value the population mean μ is assumed to be. Three choices are possible for the alternative hypothesis.
(1) Ha: μ ≠ μ0. In this case, we are concerned with whether a population mean is different from a specified value μ0, regardless of whether the population mean is greater or less than the specified value. A hypothesis test of this kind is known as a two-tailed test.
(2) Ha: μ < μ0. Here, the primary concern is whether a population mean is less than a specified value μ0. A hypothesis test of this kind is called a left-tailed test; it is also a one-tailed test.
(3) Ha: μ > μ0. The primary concern in this case is whether a population mean is greater than a specified value μ0. A hypothesis test of this kind is referred to as a right-tailed test; it is also a one-tailed test.
(b) Type I and Type II Errors
The null and alternative hypotheses represent competing statements about the
true state of nature. Either Ho or Ha is true, but not both. The hypothesis testing
procedure should lead to accepting Ho when Ho is true and rejecting Ho when Ha
is true. Since hypothesis testing is made based on the use of sample
information, the possibility of errors must be allowed for. Table 6.2 summarizes
the possibilities of errors that may occur in hypothesis testing.
Table 6.2: Type I and Type II Errors in Hypothesis Testing
                                State of Nature
Conclusion        Ho is true              Ho is false
Reject Ho         Type I Error            Correct Conclusion
Accept Ho         Correct Conclusion      Type II Error

It is usually argued that making a Type I error is more serious than making a Type II error. We can control the probability of making a Type I error, since this is nothing more than the level of significance of the test. In the earlier discussion we used the notation α = 0.05 to denote the level of significance of the test. This specifies the maximum allowable probability of making a Type I error. The probability of making a Type I error is controlled by setting a low value for the significance level of the test. Conventional values for α are 0.05 and 0.01. These values are set low to enhance our confidence that a decision to reject Ho is correct. Because of the uncertainty associated with making a Type II error, statisticians often recommend use of the statement "do not reject Ho" rather than the statement "accept Ho". The former statement avoids committing to the truth of Ho and hence to a possible Type II error. Only two conclusions are possible: either reject Ho or do not reject Ho.

Hypothesis Tests about a Population Mean


1. State the null hypothesis and the alternative hypothesis: The first step in hypothesis testing is to state the hypothesis being tested, which is usually called the null hypothesis, denoted by Ho. The capital letter H stands for hypothesis and the subscript 0 implies "no difference". The null hypothesis is a statement that is not rejected unless our sample data provide convincing evidence that it is false. Failing to reject Ho doesn't prove Ho is true; it means we have failed to disprove Ho. Thus, Ho is a statement about a population parameter developed for the purpose of testing numerical evidence. To prove without any doubt that Ho is true, the population parameter would have to be known. To actually determine it, we would have to test, survey, or count every element in the population. This is usually not feasible. Hence, taking a sample from the population is important. The alternative hypothesis (denoted by Ha) describes what you will conclude if you reject Ho. It is also called the research
hypothesis. Ha is a statement that is accepted if sample data provide sufficient
evidence that Ho is false.

2. Select a level of significance (denoted by α): It is the probability of rejecting Ho when it is true. In other words, it is the probability of committing a Type I error. It is also called the level of risk. The most common values of α are 1%, 5% and 10%. Usually, 5% is chosen for consumer research
projects, 1% for quality assurance and 10% for political polling. However, the
level of significance must be determined by the researcher before formulating a
decision rule and collecting sample data.

3. Select the appropriate test statistic: A test statistic is a value, obtained from sample information, that is used to determine whether Ho is rejected or not. There are many test statistics, such as those based on the z-, t-, F- and χ2-distributions.
(a) The test statistic used for testing a claim about a population mean
The large sample case: The objective of hypothesis testing is to determine
whether a sample point estimate (e.g., the sample mean) is significantly different from a claimed value. The relevant sample statistic (x̄) is converted into a test statistic and compared to a critical value. For a large sample (n ≥ 30), whether the standard deviation of the population is known or unknown, the appropriate test statistic is the z-score. The test statistic can be computed as follows:
z = (x̄ − μ0)/σx̄, where σx̄ is the standard error of the mean   (6.4)

If σ is known, we can re-express this as:
z = (x̄ − μ0)/(σ/√n)   (6.5)

If σ is unknown, as is usually the case, we can use the sample standard deviation (S) in its place and re-write the statistic as:
z = (x̄ − μ0)/(S/√n)   (6.6)


The z-score is always evaluated under the null hypothesis and is not affected by
whether the alternative hypothesis suggests a left-tailed test, a right-tailed test
or a two-tailed test. The use of this test can best be illustrated by reference to
an example.
Example 6.5: A certain manufacturer claims on its label that each jar of coffee
it produces contains at least three pounds of coffee. A researcher intends to test
the validity of this and establish whether the manufacturing company is in
violation of its label claim.
The null and alternative hypotheses may be expressed as: Ho: μ ≥ 3; Ha: μ < 3.
It is clear that the test is left-tailed. Previous studies suggest that σ = 0.20 pounds. Suppose the researcher took a sample of 49 jars of coffee and computed an average per jar of 2.95 pounds (x̄ = 2.95). We can now insert our sample values into expression (6.5) to obtain the z-score value:
z = (2.95 − 3)/(0.20/√49) = −0.05/0.0286 = −1.75
and we will refer to this as the z-score test statistic.
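Example 6.5 can be reproduced with Python's standard library; the left-tail p-value line is extra information not computed in the text, obtained from the erf-based normal CDF:

```python
import math

def z_statistic(xbar, mu0, sigma, n):
    """z = (xbar - mu0) / (sigma / sqrt(n)), expression (6.5)."""
    return (xbar - mu0) / (sigma / math.sqrt(n))

z = z_statistic(2.95, 3.0, 0.20, 49)
p_left = 0.5 * (1 + math.erf(z / math.sqrt(2)))  # left-tail p-value via the normal CDF
print(round(z, 2))       # -1.75
print(round(p_left, 3))  # 0.04
```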

The small sample case: If the sample size is small (n < 30) and the population
standard deviation is unknown, then we cannot use the z-distribution but must
use the t-distribution. We assume in using the t-distribution that the parent

population is normal. The t-statistic can be expressed as follows:

t = (x̄ − μ0)/(S/√n)   (6.7)
and is distributed with n-1 degrees of freedom. The critical values for this test
are determined by the degrees of freedom. This test can be used to undertake
left-sided hypothesis tests, right-sided hypothesis tests and two-tailed tests. It is
very similar to the z-score test with the provision that n < 30 and the critical
values vary depending on the degrees of freedom. The null and alternative
hypotheses are set up in exactly the same way.

Example 6.6: The mean expenditure on fuel for all Addis Ababa households in
1990 was Br.600. A randomly selected sample of 15 upper-income households
was chosen to ascertain whether high-income households spent more on fuel.


The sample average for this group was calculated at Br.825 with an estimated
standard deviation of Br.150. Do high income families spend more on fuel?
This is clearly a right-tailed test. The small sample size in conjunction with the
estimated standard deviation suggests the use of the t-statistic.
The null and alternative hypotheses in this case are:
Ho: μ ≤ 600 (i.e., the mean expenditure is not greater than the national mean)
Ha: μ > 600 (i.e., the mean expenditure is greater than the national mean)
We can now compute the t-statistic itself. Inserting the values obtained from the sample into expression (6.7), we obtain the t-value as
t = (825 − 600)/(150/√15) = 225/38.73 = 5.81, with n − 1 = 14 degrees of freedom.
4. Formulate the decision rule: A decision rule is a statement of the specific
conditions under which the Ho is rejected and the conditions under which it is
not rejected. A critical value is the dividing point between the region where the
Ho is rejected and the region where it is not rejected. Thus, before we make the decision, we should find the critical value by using the table for the required test.
The decision rule (at the α level of significance) in the large sample case for:
(a) A left-tailed test is: Reject Ho if the z-score statistic < −zα.
(b) A right-tailed test is: Reject Ho if the z-score statistic > zα.
(c) A two-tailed test is: Reject Ho if the z-score statistic < −zα/2 or if the z-score statistic > zα/2.
One way to determine the location of the rejection region is to look at the direction in which the inequality sign in Ha is pointing (either <, > or ≠). The important point to note in the two-tailed case is that the area cut off in each tail is not α but α/2. If α = 0.05, then 0.025 is assigned to the lower tail and 0.025 to the upper tail. If we use a one-tailed test with α = 0.05, the critical value for the z-statistic is either −1.645 (left-tailed case) or +1.645 (right-tailed case). In the two-tailed case, if we use an overall significance level of α = 0.05, we assign 0.025 to the upper and lower tails respectively; the critical value for the z-statistic is then −1.96 in the lower tail of the sampling distribution and +1.96 in the upper tail.

The prob-value or p-value of a test is also a useful piece of information that can
be used when undertaking hypothesis testing. A p-value is the probability of
getting a value of the sample statistic that is at least as extreme as the one from
the sample data, assuming that the null hypothesis is true. The ‘rule-of-thumb’
is that you should reject the null hypothesis if the p-value of the test is less than
or equal to the significance level of the test . The traditional approach
outlined compares the test statistic to the critical value whereas the p-value
compares the p-value to the significance level. Most statistical software now
generates p-values as a matter of course. The results obtained whether you use
the traditional or the p-value approach will always result in the same
conclusion.
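The agreement of the two approaches can be illustrated with the z-score from Example 6.5 (−1.75, left-tailed, α = 0.05). The normal CDF is built from math.erf, and −1.645 is the usual 5% left-tail critical value:

```python
import math

def norm_cdf(z):
    """Standard normal CDF built from the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

z_stat, alpha = -1.75, 0.05  # z-score and significance level from Example 6.5
z_crit = -1.645              # left-tail 5% critical value of the standard normal

reject_traditional = z_stat < z_crit  # compare statistic with critical value
p_value = norm_cdf(z_stat)            # left-tailed p-value, about 0.04
reject_p_value = p_value <= alpha     # compare p-value with alpha
print(reject_traditional, reject_p_value)  # True True -> same conclusion either way
```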
The decision rule (at a level of significance of α) in the small sample case for:
(a) A left-tailed test is: Reject Ho if the t-statistic (tcal) < −tα,n−1.
(b) A right-tailed test is: Reject Ho if the t-statistic > tα,n−1.
(c) A two-tailed test is: Reject Ho if the t-statistic < −tα/2,n−1 or if the t-statistic > tα/2,n−1.
5. Make a decision: Here we compare the test statistic with the critical value and decide either to reject Ho or not to reject Ho. If Ho is rejected, it means that it is highly improbable that the large computed z-value/t-value is due to sampling error (chance). If Ho is not rejected, the small computed z-value/t-value is attributed to sampling error (chance).

Hypothesis Tests about a Population Proportion


a) Testing a claim about a population proportion
The test statistic used for a large sample: z = (p̂ − p0)/√(p0(1 − p0)/n)
The test statistic used for a small sample: t = (p̂ − p0)/√(p0(1 − p0)/n), with n − 1 degrees of freedom
b) Comparing the population proportions


z = (p̂1 − p̂2 − (P1 − P2))/√(p̄(1 − p̄)(1/n1 + 1/n2)), where p̄ = (x1 + x2)/(n1 + n2) is the pooled proportion.
Under the null hypothesis P1 − P2 = 0. Hence, the test statistic is reduced to the following form:
z = (p̂1 − p̂2)/√(p̄(1 − p̄)(1/n1 + 1/n2))

Example 6.7: Suppose a prior election in a certain state indicated it is necessary for a candidate for governor to receive at least 80% of the vote in the northern section of the state to be elected. The incumbent governor wants to assess his chances of returning to office and plans to conduct a sample survey of 2000 registered voters in the northern section of the state. The results of this sample survey revealed that 1550 planned to vote for the incumbent governor. Is the sample proportion close enough to 0.80 to conclude that the difference is due to sampling error at α = 0.05? (Or is there a significant difference between the sample and population proportions?)
Steps:
1. Ho: p ≥ 0.80; Ha: p < 0.80.
2. It is given that α = 0.05.
3. Select the appropriate test statistic and compute its value.
Given p̂ = 1550/2000 = 0.775, p0 = 0.80 and n = 2000:
z = (0.775 − 0.80)/√(0.80 × 0.20/2000) = −0.025/0.00894 = −2.80
4. Formulate the decision rule and find the critical value:
If z < −z0.05 = −1.645, we reject Ho; equivalently, if the p-value < 0.05, we reject Ho.
5. Make a decision: As the p-value = P(Z < −2.80) = 0.0026 < 0.05, we reject Ho. Or, as −2.80 < −1.645, we reject Ho.
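Steps 3–5 of Example 6.7 can be checked numerically; the p-value line is extra information computed from the erf-based normal CDF:

```python
import math

p_hat = 1550 / 2000          # sample proportion = 0.775
p0, n, alpha = 0.80, 2000, 0.05

z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)
p_value = 0.5 * (1 + math.erf(z / math.sqrt(2)))  # left-tail area
print(round(z, 2), round(p_value, 4))  # -2.8 0.0026
```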


Chi-Square Test and F-Test


Chi-square (χ2) Distribution: It allows us to establish confidence interval estimates for a variance, just as the Normal and t distributions were used in the case of a mean. Further, just as the Binomial distribution was used to examine situations where the result of an experiment could be either 'success' or 'failure', the χ2 distribution allows us to analyse situations where there are more than two categories of outcome. The F distribution enables us to conduct hypothesis tests regarding the equality of two variances and also to make comparisons between the means of multiple samples, not just two. When a random sample of size n is drawn from a normal population with mean (μ) and variance (σ2), the sampling distribution of S2 depends on n. The standardized distribution of S2 is called the chi-square distribution and is given by:
χ2 = (n − 1)S2/σ2; degrees of freedom (df), ν = n − 1   (6.8)
Note that the χ2 distribution has the following characteristics: (1) It is always non-negative, as it is the ratio of two non-negative values. (2) It is non-symmetric, i.e., it is skewed to the right. (3) There are many different chi-square distributions, one for each number of degrees of freedom. (4) It becomes more symmetric as the number of degrees of freedom increases. (5) The number of degrees of freedom when working with a single population variance is n − 1.
Chi-Square Probabilities: Since the χ2 distribution is not symmetric, the
method for looking up left-tail values is different from the method for looking
up right tail values.
• Area to the right: just use the area given.
• Area to the left: the table requires the area to the right, so subtract the given area from one and look this area up in the table.
• Area in both tails: divide the area by two. Look up this area for the right critical value and one minus this area for the left critical value.
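These look-up rules can be expressed as a small helper. The df = 19 critical values used here (8.91 and 32.85) are the Table A4 entries quoted later in this section; the table representation and function name are illustrative:

```python
# Right-tail "area" columns of a chi-square table for df = 19 (values from Table A4)
table_df19 = {"0.975": 8.91, "0.025": 32.85}

def both_tail_lookup(table, two_tail_area):
    """Both-tails rule: read column area/2 for the right critical value
    and column 1 - area/2 for the left critical value."""
    half = two_tail_area / 2
    left = table[f"{1 - half:.3f}"]
    right = table[f"{half:.3f}"]
    return left, right

left, right = both_tail_lookup(table_df19, 0.05)
print(left, right)  # 8.91 32.85
```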
The χ2 distribution has a number of uses. In this section, we make use of it in
three ways:
(a) To calculate a confidence interval estimate of the population variance.
(b) To compare actual observations on a variable with the (theoretically)
expected values.

(c) To test for association between two variables in a contingency table.

The use of the χ2 distribution is in many ways similar to that of the Normal and t
distributions already encountered. Once again, it is actually a family of
distributions depending upon one parameter, the degrees of freedom, similar to
the t distribution. The number of degrees of freedom can have slightly different
interpretations, depending upon the particular problem, but is often related to
sample size. Some typical χ2 distributions are drawn in Figure 6.1 for different
values of the parameter. Confidence intervals are constructed in the usual way,
by using the critical values of the distribution (given in Table A4 in the
appendix) which cut off an area α/2 in each tail of the distribution. For
hypothesis tests, a rejection region is defined which cuts off an area α in either
one or both tails of the distribution, whichever is appropriate. The following
examples show how this works for the χ2 distribution.

Figure 6.1: The χ2 Distribution with Different Degrees of Freedom


(a) Estimating a variance: The sample variance is also a random variable like
the mean; it takes on different values from sample to sample. We can therefore
ask the usual question: given a sample variance, what can we infer about the
true value?
Example 6.8: A sample of 20 Labour-controlled local authorities shows that
they spend an average of £175 per taxpayer on administration with a standard
deviation of £25. What can we say about the true variance and standard
deviation?

Solution: To solve this problem it is more convenient to use the χ2 distribution.


First of all, the sample variance is an unbiased estimator of the population variance, i.e., E(S2) = σ2, so we may use it as our point estimate, which is therefore 252 = 625. To construct the confidence interval around this we need to know

about the distribution of S2. Unfortunately, this does not have a convenient probability distribution, so we transform it to (n − 1)S2/σ2 (equation 6.8), which does have a χ2 distribution, with ν = n − 1 degrees of freedom.

To construct the 95% confidence interval around the point estimate we proceed
in a similar fashion to the Normal or t distribution. First, we find the critical
values of the χ2 distribution which cut off 2.5% in each tail. These are no longer
symmetric around zero as was the case with the standard Normal and t
distributions (See Table A4 in the Appendix).

Like the t distribution, the first column gives the degrees of freedom, so we
require the row corresponding to ν = n − 1 = 19.
• For the left-hand critical value (cutting off 2.5% in the left-hand tail) we look at the column headed 0.975, representing 97.5% in the right-hand tail. This critical value is 8.91.
• For the right-hand critical value we look up the column headed '0.025' (2.5% in the right-hand tail), giving 32.85. The Excel formula =CHIINV(0.025, 19) gives the same answer, 32.85, the right-hand critical value.
We can therefore be 95% confident that (n − 1)S2/σ2 lies between these two values, i.e.
8.91 ≤ (n − 1)S2/σ2 ≤ 32.85   (6.9)
We actually want an interval estimate for σ2, so we need to rearrange equation (6.9) so that σ2 lies between the two inequality signs. Rearranging yields
(n − 1)S2/32.85 ≤ σ2 ≤ (n − 1)S2/8.91   (6.10)
and evaluating this expression leads to the 95% confidence interval for σ2, which is
19 × 625/32.85 ≤ σ2 ≤ 19 × 625/8.91, i.e. 361.5 ≤ σ2 ≤ 1332.8
Note that the point estimate, 625, is no longer at the centre of the interval but is closer to the lower limit. This is a consequence of the skewness of the χ2 distribution.
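The whole interval calculation can be sketched in Python, with the Table A4 critical values hard-coded; the standard-deviation interval is obtained by taking square roots:

```python
import math

n, s = 20, 25                        # Example 6.8: 20 authorities, sd GBP 25
chi2_left, chi2_right = 8.91, 32.85  # 2.5% points of chi-square, 19 df (Table A4)

var_lo = (n - 1) * s**2 / chi2_right  # divide by the LARGER value for the lower bound
var_hi = (n - 1) * s**2 / chi2_left
sd_lo, sd_hi = math.sqrt(var_lo), math.sqrt(var_hi)
print(round(var_lo, 1), round(var_hi, 1))  # 361.5 1332.8
print(round(sd_lo, 1), round(sd_hi, 1))    # 19.0 36.5
```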

(b) Comparing actual and expected values of a variable: A second use of
the χ2 distribution provides a hypothesis test, allowing us to compare a set of
observed values to expected values, the latter calculated on the basis of some
null hypothesis to be tested. If the observed and expected values differ
significantly, as judged by the χ2 test (the test statistic falls into the rejection
region of the χ2 distribution), then the null hypothesis is rejected. Again, this is
similar in principle to hypothesis testing using the Normal or t distributions, but
allows a slightly different type of problem to be handled. This can be illustrated
with a very simple example.
Example 6.9: Suppose that throwing a die 72 times yields the following
data:
Score on die    1    2    3    4    5    6
Frequency       6   15   15    7   15   14
Are these data consistent with the die being unbiased? A crude examination of
the data suggests a slight bias against 1 and 4, but is this truly bias or just a
random fluctuation quite common in this type of experiment?
Solution: First the null and alternative hypotheses are set up:
H0: the die is unbiased; H1: the die is biased
Note that the null hypothesis should be constructed in such a way as to permit
the calculation of the expected outcomes of the experiment. Thus the null and
alternative hypotheses could not be reversed in this case, since ‘the die is
biased’ is a vague statement (exactly how biased, for example?) and would not
permit the calculation of the expected outcomes of the experiment. On the basis
of the null hypothesis, the expected values are based on the uniform
distribution, i.e. each number should come up an equal number of times. The
expected values are therefore 12 (=72/6) for each number on the die. This gives
the data shown in Table 6.3 with observed and expected frequencies in columns
two and three respectively (ignore columns 4-6 for the moment).
Table 6.3: Calculation of the χ2 Statistic for the Die Problem
Score    Observed frequency (O)   Expected frequency (E)   O − E   (O − E)2   (O − E)2/E
1         6                       12                        −6      36         3
2        15                       12                         3       9         0.75
3        15                       12                         3       9         0.75
4         7                       12                        −5      25         2.08
5        15                       12                         3       9         0.75
6        14                       12                         2       4         0.33
Totals   72                       72                         0                 7.66

The χ2 test statistic is now constructed using the formula:
χ2 = Σ (O − E)2/E   (6.11)
which has a χ2 distribution with v = k - 1 df (k is the number of different
outcomes, here 6). O represents the observed frequencies and E the expected. If
the value of this test statistic falls into the rejection region, then we conclude the
die is biased, rejecting the null. The calculation of the test statistic is shown in columns 4–6 of Table 6.3, yielding a value of χ2 = 7.66, to be compared to the critical value of the distribution for 6 − 1 = 5 df.

Looking up the critical value for this test takes a little care as one needs first to
consider if it is a one- or two-tailed test. Looking at the alternative hypothesis
suggests a two-sided test, since the error could be in either direction. However,
this intuition is wrong, for the following reason. Looking closely at equation (6.11) reveals that a large discrepancy between observed and expected values (however it occurs) can only lead to large values of the test statistic.
Conversely, small values of the test statistic must mean that differences
between O and E are small, so the die must be unbiased. Thus the null is only
rejected by large values of the χ2 statistic or, in other words, the rejection region
is in the right-hand tail only of the χ2 distribution. It is a one-tailed test. This is
illustrated in Figure 6.2.


Figure 6.2: The Rejection Region for the χ2 test


The critical value of the χ2 distribution in this case (ν = 5, 5% significance level)
is 11.1, found from Table A4 in the appendix. Note that we require 5% of the
distribution in the right-hand tail to establish the rejection region. Since the test
statistic is less than the critical value (7.66 < 11.1) the null hypothesis is not
rejected.
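The die example can be reproduced directly. Note that the exact statistic is 92/12 ≈ 7.67; the text's 7.66 comes from rounding each term to two decimals. The 5% point of χ2 with 5 df is 11.07, which the text rounds to 11.1:

```python
observed = [6, 15, 15, 7, 15, 14]
expected = [sum(observed) / len(observed)] * 6  # uniform die: 72/6 = 12 each

# chi-square goodness-of-fit statistic, equation (6.11)
chi2_stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
critical = 11.07  # 5% right-tail point, 5 df (the text rounds to 11.1)
print(round(chi2_stat, 2), chi2_stat < critical)  # 7.67 True -> do not reject Ho
```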
Example 6.10: One hundred volunteers each toss a coin twice and note the number of heads. The results of the experiment are as follows:
Heads 0 1 2 Total
Frequency 15 55 30 100
Can we reject the hypothesis that a fair coin (or strictly, coins) was used for the
experiment?
Solution: On the basis of the Binomial distribution the probability of no heads is 0.25 (= 1/2 ×
1/2), of one head is 0.5 and of two heads is again 0.25, as explained in Section 4. The expected
frequencies are therefore 25, 50 and 25. The calculation of the test statistic is set out below:
No. of heads     O     E    O − E   (O − E)2   (O − E)2/E
0               15    25    −10      100        4
1               55    50      5       25        0.5
2               30    25      5       25        1
Totals         100   100                        5.5
The test statistic of 5.5 compares to a critical value of 5.99 (ν = 2) so we cannot
reject the null hypothesis of a fair coin being used.

(c) Contingency Tables: Data are often presented in the form of a two-way
classification as shown in Table 6.4, known as a contingency table and this is
another situation where the χ2 distribution is useful. It provides a test of whether
or not there is an association between the two variables represented in the table.
Table 6.4: Data on Voting Intentions by Social Class
Social class   Labour   Conservative   Liberal Democrat   Total
A                10          15              15             40
B                40          35              25            100
C                30          20              10             60
Totals           80          70              50            200

The table shows the voting intentions of a sample of 200 voters, cross-classified
by social class. Test whether there is any association between people’s voting
behaviour and their social class. Are manual workers (social class C in the
table) more likely to vote for the Labour party than for the Conservative party?
The table would appear to indicate support for this view, but is this truly the
case for the whole population or is the evidence insufficient to draw this
conclusion? This sort of problem is amenable to analysis by a χ2 test. The data
presented in the table represent the observed values, so expected values need to
be calculated and then compared to them using a χ2 test statistic.

The first task is to formulate a null hypothesis, on which to base the calculation
of the expected values, and an alternative hypothesis. These are: H 0: there is no
association between social class and voting behaviour; H 1: there is some
association between social class and voting behaviour. As always, the null
hypothesis has to be precise, so that expected values can be calculated. If H0 is
true and there is no association, we would expect the proportions voting
Labour, Conservative and Liberal Democrat to be the same in each social class.
Further, the parties would be identical in the proportions of their support
coming from social classes A, B and C. This means that, since the whole
sample of 200 splits 80:70:50 for the Labour, Conservative and Liberal
Democrat parties (see the bottom row of the table), each social class should
split the same way. Thus of the 40 people of class A, 80/200 of them should
vote Labour, 70/200 Conservative and 50/200 Liberal Democrat. This yields:

Voting intention     Split of social class
                     A                  B                   C
Labour               40 × 80/200 = 16   100 × 80/200 = 40   60 × 80/200 = 24
Conservative         40 × 70/200 = 14   100 × 70/200 = 35   60 × 70/200 = 21
Liberal Democrat     40 × 50/200 = 10   100 × 50/200 = 25   60 × 50/200 = 15


Both observed and expected values are presented in Table 6.5 (expected values
are in brackets). Notice that both the observed and expected values sum to the
appropriate row and column totals.
Table 6.5: Observed and Expected Values (latter in brackets)
Social class   Labour    Conservative   Liberal Democrat   Total
A              10 (16)   15 (14)        15 (10)             40
B              40 (40)   35 (35)        25 (25)            100
C              30 (24)   20 (21)        10 (15)             60
Totals         80        70             50                 200

It can be seen that, compared with the ‘no association’ position, Labour gets too
few votes from Class A and the Liberal Democrats too many. However, Labour
gets disproportionately many class C votes, the Liberal Democrats too few. The
Conservatives’ observed and expected values are identical, indicating that the
propensities to vote Conservative are the same in all social classes. A quick way
to calculate the expected value in any cell is to multiply the appropriate row
total by column total and divide through by the grand total (200). For example,
to get the expected value for the class A/Labour cell:

E = (row total × column total)/grand total = (40 × 80)/200 = 16

In carrying out the analysis care should again be taken to ensure that
information is retained about the sample size, i.e. the numbers in the table
should be actual numbers and not percentages or proportions. This can be
checked by ensuring that the grand total is always the same as the sample size.
The χ² test on a contingency table is similar to the one carried out before, the
formula being the same, χ² = Σ (O − E)²/E, with the number of degrees of freedom
given by ν = (r − 1) × (c − 1) where r is the number of rows in the table and c is
the number of columns. In this case, r = 3 and c = 3 so ν = (3 − 1) × (3 − 1) = 4.

The reason why there are only four degrees of freedom is that once any four
cells of the contingency table have been filled, the other five are constrained by
the row and column totals. The number of ‘free’ cells can always be calculated
as the number of rows less one, times the number of columns less one, as given
above. The test statistic in this case can be calculated as follows, cell by cell:

χ² = (10 − 16)²/16 + (15 − 14)²/14 + (15 − 10)²/10 + (40 − 40)²/40 + (35 − 35)²/35
   + (25 − 25)²/25 + (30 − 24)²/24 + (20 − 21)²/21 + (10 − 15)²/15
   = 2.25 + 0.07 + 2.50 + 0 + 0 + 0 + 1.50 + 0.05 + 1.67 = 8.04
Find the critical value from the χ 2 distribution with 4 degrees of freedom. At
the 5% significance level, it is 9.50 (see Table A4 in the appendix).

Make a decision: Since 8.04 < 9.50 the test statistic is smaller than the critical
value, so the null hypothesis cannot be rejected. The evidence is not strong
enough to support an association between social class and voting intention. We
cannot reject the null of the lack of any association with 95% confidence. Note,
however, that the test statistic is fairly close to the critical value, so there is
some weak evidence of an association, but not enough to satisfy conventional
statistical criteria.
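The whole calculation above can be checked in a few lines of code. The sketch below uses scipy's chi2_contingency routine (assuming scipy is available); correction=False gives the plain Pearson χ² statistic used in the text.

```python
# Reproducing the voting/social-class chi-square test with scipy.
import numpy as np
from scipy import stats

# Observed counts: rows = social classes A, B, C;
# columns = Labour, Conservative, Liberal Democrat.
observed = np.array([[10, 15, 15],
                     [40, 35, 25],
                     [30, 20, 10]])

# correction=False: no Yates continuity correction (plain Pearson statistic).
chi2, p, dof, expected = stats.chi2_contingency(observed, correction=False)

print(round(chi2, 2))   # test statistic, approx. 8.04
print(dof)              # (3 - 1) * (3 - 1) = 4 degrees of freedom
print(expected[0, 0])   # expected class A / Labour count: 16.0
```

The routine also returns the full table of expected counts, so the row total × column total / grand total rule does not have to be applied cell by cell.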
The F distribution: It has a variety of uses in statistics; for this session we only
look at: testing for the equality of two variances and conducting an analysis of
variance (ANOVA) test. The F family of distributions resembles the χ2
distribution in shape: it is always non-negative and is skewed to the right. It has
two sets of degrees of freedom (labelled ν 1 and ν2) and these determine its
precise shape. Typical F distributions are shown in Figure 6.3. As usual, for a
hypothesis test we define an area in one or both tails of the distribution to be the
rejection region. If a test statistic falls into the rejection region then the null
hypothesis upon which the test statistic was based is rejected.

Figure 6.3: The F distribution, for different ν1(ν2 = 25)


Testing the equality of two variances: It is unusual to conduct a test of a
specific value of a variance, since we usually have little intuitive idea what the
variance should be in most circumstances. A more likely circumstance is a test
of the equality of two variances (across two samples). Suppose the two car
factories were tested for the equality of average daily output levels. One can
also test whether the variance of output differs or not. A more consistent output
(lower variance) from a factory might be beneficial to the firm, e.g. dealers can
be reassured that they are more likely to be able to obtain models when they
require them. If the first factory had a standard deviation of daily output of 25,
the second of 20, both from samples of size 30 (i.e. 30 days’ output was
sampled at each factory), we can now test whether the difference between these
figures is significant or not. Such a test is set up as follows. It is known as a
variance ratio test for reasons which will become apparent.
The null and alternative hypotheses are:

H0: σ1² = σ2²; H1: σ1² ≠ σ2²

Or, equivalently,

H0: σ1²/σ2² = 1; H1: σ1²/σ2² ≠ 1 (6.12)

It is appropriate to write the hypotheses in the form shown in (6.12) since the
random variable we shall use is the ratio of sample variances, s1²/s2². This is a
random variable which follows an F distribution with ν1 = n1 − 1, ν2 = n2 − 1
degrees of freedom. We require the assumption that the two samples are
independent for the variance ratio to follow an F distribution. Thus we write:

s1²/s2² ~ F(ν1, ν2) (6.13)

The F distribution thus has two parameters, the two sets of degrees of freedom,
one (ν1) associated with the numerator, the other (ν2) associated with the
denominator of the formula. In each case, the degrees of freedom are given by
the sample size minus one. Note that s2²/s1² is also an F distribution (i.e. it
doesn’t matter which variance goes into the numerator) but with the degrees of
freedom reversed, ν1 = n2 − 1, ν2 = n1 − 1. The sample data are: s1 = 25, s2 = 20,
n1 = 30, n2 = 30.


The test statistic is simply the ratio of sample variances. In testing it is less
confusing if the larger of the two variances is made the numerator of the test
statistic. Therefore, we have the following test statistic:

F = s1²/s2² = 25²/20² = 625/400 = 1.5625

This must be compared to the critical value of the F distribution with ν1 = 29,
ν2 = 29 degrees of freedom. The rejection regions for the test are the two tails of
the distribution, cutting off 2.5% in each tail. Since we have placed the larger
variance in the numerator, only large values of F reject the null hypothesis so
we need only consult the upper critical value of the F distribution, i.e. that value
which cuts off the top 2.5% of the distribution. The degrees of freedom for the
test are given along the top row (ν1) and down the first column (ν2). The
numbers in the table give the critical values cutting off the top 2.5% of the
distribution. The critical value in this case is 2.09 (see Table A5 in the Annex),
at the intersection of the row corresponding to ν 2 = 29 and the column
corresponding to ν1 = 30 (ν1 = 29 is not given so 30 is used instead; this gives a
very close approximation to the correct critical value). Since the test statistic
(F = 1.5625) does not exceed the critical value, the null hypothesis of equal
variances cannot be rejected with 95% confidence.
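The variance-ratio test can be sketched as follows (scipy assumed available; scipy.stats.f.ppf gives the inverse CDF, i.e. the quantile, of the F distribution):

```python
# Variance-ratio (F) test for the equality of the two factories' variances.
from scipy import stats

s1, s2 = 25.0, 20.0      # sample standard deviations
n1, n2 = 30, 30          # sample sizes

F = s1**2 / s2**2        # larger variance in the numerator: 625/400
nu1, nu2 = n1 - 1, n2 - 1

# Upper 2.5% point of F(29, 29) for a two-tailed 5% test.
crit = stats.f.ppf(0.975, nu1, nu2)

print(F)                 # 1.5625
print(round(crit, 2))    # close to the tabulated 2.09 used in the text
print(F < crit)          # True: cannot reject equal variances
```

Computing the quantile directly avoids the approximation of reading the ν1 = 30 column of the table in place of ν1 = 29.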

Analysis of Variance
In the previous sub-section we discussed how to test the hypothesis that the
means of two samples are the same, using a z or t test, depending upon the
sample size. This type of hypothesis test can be generalised to more than two
samples using a technique called analysis of variance (ANOVA), based on the
F distribution. Although it is called analysis of variance, it actually tests
differences in means. Using this technique we can test the hypothesis that the
means of all the samples are equal, versus the alternative hypothesis that at least
one of them is different from the others. The assumptions underlying the
ANOVA technique are essentially the same as those used in the t test when
comparing two different means. We assume that the samples are randomly and
independently drawn from normally distributed populations which have equal
variances. Suppose there are three factories, whose outputs have been sampled,
with the results shown in Table 6.6. We wish to answer the question whether
this is evidence of different outputs from the three factories, or simply random
variations around an average output level.
Table 6.6: Samples of output from three factories
Observation   Factory 1   Factory 2   Factory 3
1             415         385         408
2             430         410         415
3             395         409         418
4             399         403         440
5             408         405         425
6             418         400
7                         399

The null and alternative hypotheses are therefore: H0: μ1 = μ2 = μ3; H1: at least
one mean is different from the others. This is the simplest type of ANOVA,
known as one-way analysis of variance. In this case there is only one factor
which affects output – the factory. The factor which may affect output is also
known as the independent variable. In more complex designs, there can be two
or more factors which influence output. The output from the factories is the
dependent or response variable in this case. To decide whether we reject H0, we
compare the variance of output within factories to the variance of output
between (the means of) the factories. Both methods provide estimates of the
overall true variance of output and, under the null hypothesis that factories
make no difference, should provide similar estimates. The ratio of the variances
should be approximately unity. If the null is false however, the between-
samples estimate will tend to be larger than the within-samples estimate and
their ratio will exceed unity. This ratio has an F distribution and so if it is
sufficiently large that it falls into the upper tail of the distribution then H 0 is
rejected.

To formalise this we break down the total variance of all the observations into:
1. The variance due to differences between factories, and
2. The variance due to differences within factories (also known as the error
variance).


Initially we work with sums of squares rather than variances. Recall that the
sample variance is given by:

s² = Σ(x − x̄)²/(n − 1)
The numerator of the right-hand side of this expression gives the sum of
squares, i.e. the sum of squared deviations from the mean. Accordingly we have
to work with three sums of squares:
 The total sum of squares measures (squared) deviations from the overall or
grand average using all the observations. It ignores the existence of the
different factors.
 The between sum of squares is based upon the averages for each factor and
measures how they deviate from the grand average.
 The within sum of squares is based on squared deviations of observations
from their own factor mean.
It can be shown that there is a relationship between these sums of squares, i.e.
Total sum of squares (TSS) = Between sum of squares + Within sum of squares

TSS = Σi Σj (Xij − X̿)², where Xij is the output from factory i on day j and
X̿ is the grand average. The index i runs from 1 to 3 in this case (there are
three classes or groups for this factor) and the index j (indexing the
observations) goes from 1 to 6, 7, or 5 (for factories 1, 2 and 3 respectively).
Although this looks complex, it simply means that you calculate the sum of
squared deviations from the overall mean. The overall mean of the 18 values is
410.11 and the total sum of squares may be calculated as:

TSS = (415 − 410.11)² + (430 − 410.11)² + . . . + (425 − 410.11)² = 2977.78

 An alternative formula for the total sum of squares is:

TSS = ΣX² − (ΣX)²/n

where n is the total number of observations and ΣX is the sum of all the
observations.
 The sum of the squares of all the observations is ΣX² = 415² + 430² + . . . +
425² = 3 030 418, and ΣX = 7382, so the total sum of squares is given by

TSS = 3 030 418 − 7382²/18 = 2977.78

as before.


 The between sum of squares (BSS) (or treatment sum of squares, TrSS) is
calculated using the formula:

BSS = Σi ni (X̄i − X̿)²

where X̄i denotes the mean output of factor i and ni the number of
observations on it.


This part of the calculation effectively ignores the differences that exist within
factors and compares the differences between them. It does this by replacing the
observations within each factor by the mean for that factor. Hence all the factor
1 observations are replaced by 410.83, for factor 2 they are replaced by the
mean 401.57 and for factor 3 by 421.2. We then calculate the sum of squared
deviations of these values from the grand mean. Hence we get

BSS = 6 × (410.83 − 410.11)² + 7 × (401.57 − 410.11)² + 5 × (421.2 − 410.11)²
    = 1128.43 (using the unrounded means in the calculation)
Once again there is an alternative formula which may be simpler for calculation
purposes:

BSS = Σi (Ti²/ni) − (ΣX)²/n

where Ti is the total output of factory i.
We have arrived at the result that about 38% (= 1128.43/2977.78) of the total
variation (sum of squared deviations) is due to differences between factories
and the remaining 62% is therefore due to variation (day to day) within
factories. We can therefore immediately calculate the within sum of squares
(WSS) (or error sum of squares, ESS) as:

WSS = TSS − BSS = 2977.78 − 1128.43 = 1849.35
For completeness, the formula for the within sum of squares is

WSS = Σi Σj (Xij − X̄i)² (6.17)

The term Xij − X̄i measures the deviation of each observation from its factor
mean and so the within sum of squares gives a measure of dispersion within the
classes. Hence, it can be calculated as:

WSS = (415 − 410.83)² + . . . + (425 − 421.2)² = 1849.35
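The three sums of squares and the identity TSS = BSS + WSS can be confirmed with a short numpy sketch (numpy assumed available):

```python
# Computing TSS, BSS and WSS for the three-factory data by hand.
import numpy as np

factories = [np.array([415, 430, 395, 399, 408, 418]),
             np.array([385, 410, 409, 403, 405, 400, 399]),
             np.array([408, 415, 418, 440, 425])]

all_obs = np.concatenate(factories)
grand_mean = all_obs.mean()                      # about 410.11

# Total: squared deviations of every observation from the grand mean.
tss = ((all_obs - grand_mean) ** 2).sum()
# Between: each factory mean replaces its observations.
bss = sum(len(f) * (f.mean() - grand_mean) ** 2 for f in factories)
# Within: squared deviations from each factory's own mean.
wss = sum(((f - f.mean()) ** 2).sum() for f in factories)

print(round(tss, 2))   # about 2977.78
print(round(bss, 2))   # about 1128.43
print(round(wss, 2))   # about 1849.35
```

Working with the unrounded means avoids the small discrepancies that appear if 410.83, 401.57 and 421.2 are used directly.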
Compute the F test statistic: The F statistic is based upon the comparison of the
between and within sums of squares (BSS and WSS) but we must also take account of
the degrees of freedom for the test. The degrees of freedom adjust for the
number of observations and for the number of factors. Formally, the test
statistic is:

F = MSB/MSW = [BSS/(k − 1)]/[WSS/(n − k)]

where MSB and MSW are the mean squares between and within, which have
k − 1 and n − k degrees of freedom respectively. k is the number of factors, 3 in
this case, and n the overall number of observations, 18. We thus have

F = (1128.43/2)/(1849.35/15) = 564.215/123.29 = 4.576
The critical value of F for 2 and 15 degrees of freedom at the 5% significance
level is 3.682. As the test statistic exceeds the critical value we reject the null
hypothesis of no difference between factories.
The ANOVA table for the given example

Source of variation   SS        df   MS                    F
Between               1128.43    2   1128.43/2 = 564.215   564.215/123.29 = 4.576
Within                1849.35   15   1849.35/15 = 123.29
Total                 2977.78   17
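The same one-way ANOVA can be reproduced with scipy's f_oneway routine (scipy assumed available), which returns the F statistic and its p-value directly:

```python
# One-way ANOVA on the three factories' output samples.
from scipy import stats

f1 = [415, 430, 395, 399, 408, 418]
f2 = [385, 410, 409, 403, 405, 400, 399]
f3 = [408, 415, 418, 440, 425]

F, p = stats.f_oneway(f1, f2, f3)
crit = stats.f.ppf(0.95, 2, 15)   # 5% critical value with (k-1, n-k) = (2, 15) df

print(round(F, 3))     # about 4.576
print(round(crit, 3))  # about 3.682
print(F > crit)        # True: reject H0 of equal factory means
```

Note that f_oneway reports a p-value, so the reject/do-not-reject decision can equally be made by comparing p with the chosen significance level.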

Learning Activities

1. The mean annual income for a sample of 250 Midroc factory workers was
calculated to be Br.25,000. The population standard deviation of annual
income for Midroc factory workers is known to be Br.5000. Assume the
sample is less than 5% of the relevant population.
a) Construct a 95% confidence interval for the population mean.
b) Construct a 90% confidence interval for the population mean.
2. A point estimate is correct if it is equal to the actual value of the parameter
being estimated while an interval estimate is correct if the actual value of the
parameter is in the interval. Which of these two estimates has the greatest
chance of being correct?


3. If two interval estimates for a population mean, at two different confidence
levels, are made from the same sample, which has the greatest chance of being
correct? Why?
4. Given a sample variance of 65 from a sample of size n = 30,
(a) Calculate the 95% CI for the variance of the population from which the
sample was drawn.
(b) Calculate the 95% CI for the standard deviation.
Continuous Assessment
Test, individual assignment on how to estimate and test significance of the
relationship between economic variables.
Summary
 Reject H0 if the test statistic (calculated value) exceeds the critical value
(tabulated value) and don’t reject otherwise. Reject H0 if the proposed
population parameter is not found within the confidence interval; otherwise,
don’t reject. Reject H0 if the P-value is less than or equal to α; otherwise,
don’t reject.
 The χ2 and F distributions play important roles in statistics, particularly in
problems relating to the goodness of fit of the data to that predicted by a null
hypothesis.
 A random variable based on the sample variance, (n − 1)s²/σ², has a χ²
distribution with n − 1 degrees of freedom. Based on this fact, the χ²
distribution may be used to construct confidence interval estimates for the
variance σ². Since the χ² is not a symmetric distribution, the confidence
interval is not symmetric around the (unbiased) point estimate s².
 The χ2 distribution may also be used to compare actual and expected values
of a variable and hence to test the hypothesis upon which the expected values
were constructed.
 A two-way classification of observations is known as a contingency table.
The independence or otherwise of the two variables may be tested using the
χ2 distribution, by comparing observed values with those expected under the
null hypothesis of independence.


 The F distribution is used to test a hypothesis of the equality of two
variances. The test statistic is the ratio of two sample variances which, under
the null hypothesis, has an F distribution with n1 − 1, n2 − 1 degrees of freedom.
 The F distribution may also be used in an analysis of variance, which tests
for the equality of means across several samples. The results are set out in an
analysis of variance table, which compares the variation of the observations
within each sample to the variation between samples.
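As a sketch of the χ²-based interval estimate for a variance summarised above, the following code computes a 95% confidence interval for σ² from hypothetical sample figures (n = 25 and s² = 40 are illustrative values, not taken from the text; scipy assumed available):

```python
# 95% CI for sigma^2 based on (n - 1)s^2/sigma^2 ~ chi-square(n - 1).
from scipy import stats

n, s2 = 25, 40.0                          # hypothetical sample size and variance
q_upper = stats.chi2.ppf(0.975, n - 1)    # upper 2.5% point of chi-square(24)
q_lower = stats.chi2.ppf(0.025, n - 1)    # lower 2.5% point

ci_low = (n - 1) * s2 / q_upper           # dividing by the larger quantile
ci_high = (n - 1) * s2 / q_lower          # gives the lower limit, and vice versa

print(round(ci_low, 1), round(ci_high, 1))   # approx. 24.4 77.4
```

The interval is visibly asymmetric around s², exactly as the summary bullet notes, because the χ² distribution itself is skewed.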
Proof of Ability
Name of student:
Date of assessment:
Name of assessors:
Assessor 1: ________________________________________
Assessor 2: ________________________________________
Products (the student has:)                 Criteria/method of assessment      Competence to be assessed   Score (50%)
Collected, organized and processed data     Applying appropriate statistical   To organize                 15
                                            tools and software
Estimated population parameters based on    Applying appropriate statistical   To estimate                 10
the sample information                      tools and software
Tested significance of the population       Applying appropriate statistical   To test                     10
parameters                                  tools and software
Analyzed functional relationships between   Presenting the results of the      To analyze                  15
variables in agribusiness and value chain   analysis
management

References

Agrawal, B.L. (1996). Basic Statistics. New Age International Pub. Ltd., New Delhi.
Barrow, Michael (2006). Statistics for Economics, Accounting and Business Studies. 4th ed., Pearson Education Limited.
Bowem, E.K. and Starr, M.K. (1982). Basic Statistics for Business and Economics. McGraw-Hill, Inc.
Frank, H. and Althoen, S.C. (1994). Statistics: Concepts and Applications. Cambridge University Press, UK.
Gupta, C.B. (1997). An Introduction to Statistical Methods. Vikas Publishing House.
Hooda, R.P. (2001). Statistics for Business and Economics. 2nd ed., New York.
Johnson, R.A. and Bhata, K.G. (1992). Statistics: Principles and Methods. New York.
Mann, P.S. (1997). Introductory Statistics. 3rd ed.
Manson, D. et al. (1999). Statistical Techniques in Business and Economics. 10th ed., McGraw-Hill.
Salvatore, D. and Reagle, D. (2005). Statistics and Econometrics.
Wayne, W. (1995). Biostatistics: A Foundation for Analysis in Health. 6th ed., New York.
In addition to these books, students are advised to read manuals of statistical
software.
