Introduction To Statistics Note

Uploaded by tafeseobsi
Copyright © All Rights Reserved

DEBRE TABOR UNIVERSITY

Course Module for

Introduction to Statistics (Econ-1041)

OCTOBER, 2013

General Introduction

This module is primarily intended for economics students who need to understand the principles of
data collection, presentation, analysis and interpretation, and is designed for first-degree students
of economics. The material may also be valuable to anyone interested in business research.

Chapter one introduces basic concepts of statistics, chapter two covers sampling theory, and
chapters three and four focus on the collection, presentation and summarization of data. Chapter
five discusses measures of variation comprehensively, using practical examples and data. Chapter
six is about simple linear correlation and regression, and the last chapter presents elementary
probability with practical examples.

Each chapter begins with general learning objectives followed by an introductory section specific
to that chapter. The module also includes many problems for the student, most of them based on
real data and the majority with detailed solutions. A few reference materials for further reading
are also given at the end of each chapter.

Organization of the Module

This course module has seven chapters: an introduction, covering the definition of statistics, its
uses, applications and limitations, and the types of variables and their measurement scales;
sampling and sampling distributions, covering types and methods of sampling and the distribution
of the sample mean and proportion; methods of data collection and presentation, covering data
collection techniques and presentation tools; measures of central tendency, dealing with the mean,
median and mode; measures of dispersion, covering absolute and relative measures of dispersion;
simple linear regression and correlation; and elementary probability, including random
experiments, sample spaces, events and approaches to measuring the probability of events. Each
chapter contains exercises and assignments.

Objectives of the Module

General objectives
The general objective of this course module is to introduce, explain and show the basic concepts
and applications of statistical models in business-related problems using real data examples.
Specifically, upon completion of this course module, students will be able to:
• develop basic statistical knowledge;
• justify basic statistical thinking and reasoning;
• apply the methods of statistics in scientific research;
• demonstrate the usefulness of statistics in real life.

Module prerequisites: None
Credit hours: 3

Table of Contents
General Introduction
Organization of the Module
Objectives of the Module
Assessment Methods
Chapter 1: Introduction
1.1 Introduction
1.2 Definitions & classification of statistics
1.3 Stages in statistical investigation
1.4 Definition of some basic terms
1.5 Application, uses and limitation of statistics
Summary
Exercise 1
Assignment 1 (4%)
Chapter 2: Sampling Theory
2.1 Introduction
2.2 Basic Concepts
2.3 Reasons for Sampling
2.4 A Review of Methods of Sampling
Summary
Exercises 2
Assignment 2 (5%)
1. Describe the basic difference between probability and non-probability sampling.
2. A survey will be conducted on household water supply in a district comprising 20,000 households,
of which 20% are urban and 80% rural, since it is suspected that in urban areas access to safe water
sources is much more satisfactory. What sampling technique is best for such a condition? Justify.
Reference
Chapter 3: Methods of Data Collection and Presentations
3.1 Introduction
3.2 Classification of Data
3.3 Data Collection
3.3 Methods of Data Presentation
3.3.1 Frequency Distribution
3.3.2 Diagrammatic and Graphic presentation of data
Summary
Exercise 3
Assignment 3 (6%)
Chapter 4: Measures of Central Tendency
4.1 Introduction and Objectives of Measuring Central Tendency
4.2 Desirable properties of measure of central tendency
4.3 The Summation Notation (∑)
4.4 Types of Measures of Central Tendency
4.4.1 The Mean
4.4.2 The Median
4.4.3 The Mode
4.3.4 Quantiles
Summary
Exercise 4
Assignment 4 (5%)
Chapter 5: Measures of Variation (Dispersion), Skewness and Kurtosis
5.1 Introduction and objectives of measuring Variation
5.2 Types of Measures of Dispersion
5.2.1 Range (R) and Relative Range (RR)
5.2.2 The Quartile Deviation
5.2.3 The Mean Deviation and Coefficient of Mean Deviation
5.2.4 The Variance, the Standard Deviation and Coefficient of Variation
5.2.5 Standard Scores (Z-scores)
5.3 Moments
5.4 Skewness
5.5 Kurtosis
Summary
Exercises 5
Assignment 5 (5%)
Chapter 6: Simple Linear Regression and Correlation
6.1 Introduction
6.2 Scatter Plot
6.3 Simple Correlation Analysis
6.4 Simple Linear Regression Analysis
6.4.1 Assumption of simple linear regression
6.4.2 Parameter estimation
Summary
Exercise 6
Assignment 6 (10%)
Chapter 7: Elementary Probability
7.1 Definitions of Some Probability Terms
7.2 Counting rule (Addition, Multiplication, Permutation, Combination)
7.3 Approaches of measuring probability (Classical, frequents, Subjective)
7.4 Conditional Probability and Independency
7.5 Basic Concepts of Probability Distributions
Summary
Exercises 7
Assignment 7 (5%)
Answer Key
Bibliography (References)
Appendix: Standard Tables of Statistics

Chapter 1: Introduction

Objectives
At the end of this unit, learners will be able to:
1.1 define statistics and biostatistics and identify the types of statistics

1.2 identify the stages of statistical investigation

1.3 define basic terminology

1.4 enumerate the applications, uses and limitations of statistics

1.5 categorize the types of variables and identify their measurement scales.

Contents
1.1 Introduction

1.2 Definitions & classification of statistics

1.3 Stages in statistical investigation

1.4 Definition of some basic terms

1.5 Application, uses and limitation of statistics

1.6 Types of variables and measurement scales

1.1 Introduction

In the modern world of computers and information technology, the importance of statistics is
well recognized by all disciplines. Statistics originated as a science of statehood (the affairs of
the state) and slowly and steadily found applications in agriculture, economics, commerce,
biology, medicine, industry, planning, education and so on. Nowadays, there is hardly any walk
of life to which statistics cannot be applied.

1.2 Definitions & classification of statistics

Statistics has been defined differently by different authors over time. The word 'statistics' is
derived from the Latin word 'status', which means a political state. Statistics originated from two
quite dissimilar fields: games of chance and affairs of state. In olden days statistics was confined
to state affairs only, but in modern times it embraces almost every sphere of human activity.

The word statistics has several meanings. In the first place, it is a plural noun describing a
collection of numerical data, such as employment statistics, accident statistics, population
statistics, economic statistics and agricultural statistics. It is in this sense that the word
'statistics' is usually understood by a layman.

Second, as a singular noun, the word statistics describes a branch of applied mathematics whose
purpose is to provide methods of dealing with collections of data and extracting information from
them in compact form by tabulating, summarizing and analyzing the numerical data or set of
observations.

Generally, statistics can be defined in two ways: as statistical data or as statistical methods.
1. Plural sense (layman's definition)
Statistics is an aggregate or collection of numerical facts, such as the number of people living in
a particular area or the distribution of family incomes in Debre Tabor town (statistics in the sense
of statistical data).
2. Singular sense (formal definition)
When used in the sense of statistical methods, statistics is defined as the science of collecting,
organizing, presenting, analyzing and interpreting numerical data for the purpose of assisting in
making more effective decisions.

NB: Even though statistical data always denote figures (numerical descriptions), it must be
remembered that not all numerical descriptions are statistical data.

Characteristics of statistical data

In order that numerical descriptions may be called statistics (statistical data) they must possess
the following characteristics:

i) They must be aggregates. This means that statistics are a "number of facts." A single fact,
even though numerically stated, cannot be called statistics.
ii) They must be affected to a marked extent by a multiplicity of causes. This means that the
numerical value of any quantity at any particular moment is the result of the action and
interaction of a number of forces differing amongst themselves, and it is not possible to say
how much of it is due to any one particular cause.
iii) They must be enumerated or estimated according to a reasonable standard of accuracy.
This means that if aggregates of numerical facts are to be called 'statistics', they must be
reasonably accurate.
iv) They must be collected in a systematic manner for a predetermined purpose. Numerical
data can be called statistics only if they have been compiled in a properly planned manner
and for a purpose about which the enumerator had a definite idea.
v) They should be placed in relation to each other. Numerical facts may be placed in
relation to each other in terms of time, space or condition. This means that the facts
should be comparable.

Classifications of statistics

Depending on how data are used, statistics is sometimes divided into two main areas or
branches.

1. Descriptive statistics is concerned with summary calculations, graphs, charts and tables for
the data that have been collected. Examples are the average number of female births per week in
a hospital, or the statement that the attrition rate of Debre Tabor University in 2012/13 was 6.0%.

2. Inferential statistics is the branch of statistics that deals with techniques for drawing
conclusions about a population; it builds upon descriptive statistics and is used to generalize from
a sample to a population. For example, the average income of all families (the population) in
Ethiopia can be estimated from figures obtained from a few hundred families (the sample).

• It is important because statistical data usually arise from samples.
• Statistical techniques based on probability theory are required.
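To make the distinction concrete, the following minimal Python sketch first describes a small sample and then uses it for a simple inferential step. The income figures are hypothetical, invented only for illustration, and the interval uses the normal critical value 1.96 as a simplifying assumption; for a sample this small a t critical value would normally be preferred.

```python
import math
import statistics

# Hypothetical monthly family incomes (birr) for a sample of 10 families.
# These values are invented for illustration, not real data.
sample_incomes = [2400, 3100, 2800, 4500, 3900, 2600, 3300, 5200, 3000, 3700]

# Descriptive statistics: summarize the data actually collected.
n = len(sample_incomes)
mean = statistics.mean(sample_incomes)
sd = statistics.stdev(sample_incomes)  # sample standard deviation (divisor n - 1)

# Inferential statistics: generalize from the sample to the population mean.
# Approximate 95% interval using the normal critical value 1.96 (an assumption;
# a t critical value would be more appropriate for n = 10).
se = sd / math.sqrt(n)
lower, upper = mean - 1.96 * se, mean + 1.96 * se

print(f"Sample mean: {mean}")
print(f"Approximate 95% interval for the population mean: {lower:.1f} to {upper:.1f}")
```

The first two computed quantities only describe the ten families observed; the interval is a statement about all families, which is why it depends on probability theory.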

1.3 Stages in statistical investigation

There are five stages or steps in any statistical investigation.

1. Collection of data: the process of measuring, gathering and assembling the raw data upon
which the statistical investigation is to be based.
• Data can be collected in a variety of ways; two of the most common are surveys and
experiments. Surveys can in turn be conducted in different ways; three of the most
common are:
  o Telephone survey
  o Mailed questionnaire
  o Personal interview
2. Organization of data: summarizing the data in some meaningful way, e.g. in table form.
3. Presentation of data: re-organizing, classifying, compiling and summarizing the data to
present them in a meaningful form.
4. Analysis of data: extracting relevant information from the summarized data, mainly through
elementary mathematical operations.
5. Inference from data: interpreting the statistical measures obtained in the analysis, using
methods by which conclusions are formed and inferences are made.
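The stages can be traced in a toy example. This Python sketch assumes hypothetical household data (number of children per household) and walks through collection, organization into a frequency table, a plain-text presentation and a simple analysis; the inference stage is only indicated in a comment, since it relies on the sampling and probability methods of later chapters.

```python
import statistics
from collections import Counter

# 1. Collection: hypothetical raw responses to "How many children live in this household?"
raw_data = [2, 3, 1, 4, 2, 2, 0, 3, 5, 2, 1, 3]

# 2. Organization: summarize the raw data as a frequency table (value -> count).
frequency = Counter(raw_data)

# 3. Presentation: display the table in a readable, ordered form.
print("Children | Households")
for value in sorted(frequency):
    print(f"{value:>8} | {frequency[value]}")

# 4. Analysis: extract a summary measure from the data.
mean_children = statistics.mean(raw_data)
print(f"Mean number of children: {mean_children:.2f}")

# 5. Inference: using this mean to draw conclusions about all households in the
#    area would require the sampling and probability methods of later chapters.
```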

1.4 Definition of some basic terms

a. Statistical population: the collection of all possible observations of a specified
characteristic of interest (on elements possessing a certain common property) that is under
study. An example is all of the students taking course AAU 3101 this term.
b. Sample: a subset of the population, selected using some sampling technique in such a way
that it represents the population.
c. Sampling: the process or method of selecting a sample from the population.
d. Sample size: the number of elements or observations to be included in the sample.
e. Census: a complete enumeration of the elements of the population, i.e. the collection of data
from every element in a population.
f. Parameter: a characteristic or measure obtained from a population.
g. Statistic: a characteristic or measure obtained from a sample.
h. Variable: an item or characteristic of interest that can take on different values for different
elements.
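The difference between a parameter and a statistic can be illustrated with a simulated population. In this minimal sketch the population of 500 student ages is randomly generated (hypothetical data), a census of every element gives the population mean (a parameter), and a simple random sample of size n = 40 gives the sample mean (a statistic). The seed, the age range and both sizes are arbitrary choices.

```python
import random
import statistics

random.seed(42)  # fixed seed so the example gives the same numbers each run

# Hypothetical statistical population: ages of all 500 students in a course.
population = [random.randint(18, 30) for _ in range(500)]

# Census: every element is observed, so the mean is a parameter.
parameter_mean = statistics.mean(population)

# Sampling: draw a simple random sample (without replacement) of size n = 40.
sample_size = 40
sample = random.sample(population, sample_size)

# Statistic: the same measure computed from the sample only.
statistic_mean = statistics.mean(sample)

print(f"Population mean (parameter): {parameter_mean:.2f}")
print(f"Sample mean (statistic):     {statistic_mean:.2f}")
```

Drawing another sample from the same population would generally give a different statistic, while the parameter does not change.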

1.5 Application, uses and limitation of statistics

Applications of statistics:

Statistics is not a mere device for collecting numerical data, but a means of developing sound
techniques for handling and analyzing them and drawing valid inferences from them. Statistics is
applied in every sphere of human activity, social as well as physical, such as biology, commerce,
education, planning, business management and information technology. It is almost
impossible to find a single department of human activity where statistics cannot be applied. For
instance:
• Almost all human beings deal with numerical facts in their daily life, e.g. about prices.
• It is applied in processes such as the invention of certain drugs or measuring the extent of
environmental pollution.
• It is used in industry, especially in quality control.
• It is used to test the effectiveness of a new drug or medicine.

Uses of statistics:

The main function of statistics is to enlarge our knowledge of complex phenomena. The
following are some uses of statistics:

1. It presents facts in a definite and precise form.

2. Data reduction: statistical measures help to reduce the complexity of data and thus to make
sense of any huge mass of data. Large amounts of data are summarized by forming frequency
distributions, histograms, scatter diagrams, etc., and by calculating statistics such as means,
variances and correlation coefficients.

3. Measuring the magnitude of variations in data.

4. Furnishing a technique of comparison: classification and tabulation are the two methods
used to condense data, and they help us compare data collected from different sources. Grand
totals, measures of central tendency, measures of dispersion, graphs and diagrams, coefficients of
correlation, etc. provide ample scope for comparison.

5. Estimating unknown population characteristics: one of the main objectives of statistics is
to draw inferences about a population from the analysis of a sample drawn from that
population. This can be done using either point or interval estimation techniques.

6. Formulating and testing hypotheses: a statistical hypothesis is a statement about the
probability distribution characterizing a population, which is tested on the basis of the
information available from sample observations. Statistical methods are extremely useful in
formulating and testing hypotheses. Whether crop yield has increased because of the use of a
new fertilizer, or whether a new medicine is effective in eliminating a particular disease, are
examples of questions that can be stated as hypotheses.

7. Studying the relationship between two or more variables: relationships between variables
can be examined using, for example, the chi-square test or regression models.

8. Forecasting future events: for instance, a biologist can forecast rainfall for the near future
based on rainfall data from the last ten years for a particular ecological zone.

Applications of statistics for Economics

Statistical methods are useful for measuring numerical changes in complex groups and
interpreting collective phenomena. Nowadays statistics is used extensively in economic studies,
and statistical methods play an important role in both economic theory and practice. Alfred
Marshall said, “Statistics are the straw out of which I, like every other economist, have to make
the bricks.” Statistical data and techniques are also immensely useful in solving many economic
problems, such as those concerning wages, prices, production, and the distribution of income
and wealth. Statistical tools such as index numbers, time series analysis, estimation theory and
tests of statistical hypotheses are used extensively in economics.

Limitations of statistics

As a science, statistics has its own limitations. The following are some of them:

• It deals only with subjects of inquiry that are capable of being quantitatively measured
and numerically expressed.
• It deals with aggregates of facts, and no importance is attached to individual items; it is
suited only to studying group characteristics.
• Statistical data are only approximately, not mathematically, correct.

1.6 Types of variables and measurement scales

i. Types of variables

Any characteristic of an individual or case that is measured and can take different values for
different individuals or cases, such as age, sex, income or demand, is called a variable.

It is helpful to divide variables into different types, as different statistical methods are applicable
to each. The main division is into qualitative (categorical) or quantitative (numerical) variables.

1. Qualitative variables are variables or characteristics which cannot be measured in
quantitative form but can only be identified by name or category; examples include gender,
religious affiliation and place of birth.

2. Quantitative variables are variables or characteristics which can be measured and expressed
numerically, for instance the balance in a checking account, the number of children in a family,
expenses, income and salary.

Quantitative variables are of two types:

a. Discrete variables can assume only certain values, and there are usually "gaps" between
the values; examples are the number of bedrooms in a house, the number of households in a
kebele, and the number of VAT-registered cafés in each sub-city of Addis Ababa.
b. Continuous variables can assume any value within a specific range; examples are the air
pressure in a tire, income, weight and the age of a household head.

ii. Scales of measurement

Proper knowledge about the nature and type of data to be dealt with is essential in order to
specify and apply the proper statistical methods for their analysis and inference. The
measurement scale of a variable is determined by which of the properties of order, distance and
fixed zero the values assigned to the data possess.

In mathematical terms, measurement is a functional mapping from the set of objects {Oi} to the
set of real numbers {M(Oi)}.

The goal of measurement systems is to structure the rule for assigning numbers to objects in such
a way that the relationship between the objects is preserved in the numbers assigned to the
objects. The different kinds of relationships preserved are called properties of the measurement
system. The three properties of measurement scales are listed below:

a. Order

The property of order exists when an object that has more of the attribute than another object is
given a bigger number by the rule system. This relationship must hold for all objects in the "real
world".

The property of ORDER exists when, for all i, j, if Oi > Oj then M(Oi) > M(Oj).
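The order property can be checked mechanically when the "real world" amounts are known. In this sketch, `true_amounts` holds hypothetical amounts of an attribute for four objects (in practice these are not directly observable), and `measured` is the number the rule assigns to each object; the helper name `has_order_property` is illustrative only.

```python
from itertools import permutations


def has_order_property(true_amounts, measured):
    """Return True if, for all i, j, O_i > O_j implies M(O_i) > M(O_j)."""
    for i, j in permutations(range(len(true_amounts)), 2):
        if true_amounts[i] > true_amounts[j] and not measured[i] > measured[j]:
            return False
    return True


# Hypothetical "real world" amounts of an attribute for four objects.
true_amounts = [1.0, 2.0, 3.0, 20.0]

print(has_order_property(true_amounts, [1, 2, 3, 4]))  # True: ranks keep the order
print(has_order_property(true_amounts, [3, 1, 4, 2]))  # False: arbitrary codes do not
```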

b. Distance

The property of distance is concerned with the relationship of differences between objects. If a
measurement system possesses the property of distance, the unit of measurement means the same
thing throughout the scale of numbers. That is, an inch is an inch no matter where it falls,
immediately ahead or a mile down the road.

More precisely, an equal difference between two numbers reflects an equal difference in the "real
world" between the objects that were assigned the numbers. Defining the property of distance in
mathematical notation requires four objects: Oi, Oj, Ok and Ol. The difference between objects
is represented by the "-" sign; Oi - Oj refers to the actual "real world" difference between object i
and object j, while M(Oi) - M(Oj) refers to the difference between the numbers assigned to them.
The property of DISTANCE exists if, for all i, j, k, l:
if Oi - Oj ≥ Ok - Ol, then M(Oi) - M(Oj) ≥ M(Ok) - M(Ol).
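The same idea extends to the distance property by comparing every pair of differences among four objects. The amounts are again hypothetical, and `has_distance_property` is an illustrative helper, not a standard function. Ranks preserve the order of these objects but not their distances, whereas a linear rule such as 2x + 5 preserves both.

```python
from itertools import product


def has_distance_property(true_amounts, measured):
    """Return True if, for all i, j, k, l,
    O_i - O_j >= O_k - O_l implies M(O_i) - M(O_j) >= M(O_k) - M(O_l)."""
    idx = range(len(true_amounts))
    for i, j, k, l in product(idx, repeat=4):
        if (true_amounts[i] - true_amounts[j] >= true_amounts[k] - true_amounts[l]
                and not measured[i] - measured[j] >= measured[k] - measured[l]):
            return False
    return True


# Hypothetical "real world" amounts of an attribute for four objects.
true_amounts = [1.0, 2.0, 3.0, 20.0]
ranks = [1, 2, 3, 4]                         # ordinal-style assignment
linear = [2 * x + 5 for x in true_amounts]   # equal steps stay equal steps

print(has_distance_property(true_amounts, ranks))   # False
print(has_distance_property(true_amounts, linear))  # True
```

Here the rank assignment fails because the real gap between the third and fourth objects (17) is larger than the gap between the first and third (2), yet their rank differences are 1 and 2.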
c. Fixed Zero
A measurement system possesses a rational zero (fixed zero) if an object that has none of the
attribute in question is assigned the number zero by the system of rules. The object does not need
to actually exist in the "real world", as it is somewhat difficult to visualize a "man with no height".
The requirement for a rational zero is this: if objects with none of the attribute did exist, they
would be given the value zero. Defining O0 as the object with none of the attribute in question, the
definition of a rational zero becomes:

The property of FIXED ZERO exists if M(O0) = 0.


The property of fixed zero is necessary for ratios between numbers to be meaningful.
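One way to see why a fixed zero matters: when two units share the same zero, converting from one to the other multiplies every value by a constant, so ratios do not change. This sketch uses hypothetical heights and the standard conversion 1 inch = 2.54 cm.

```python
import math


def cm_to_inches(cm):
    """Convert centimetres to inches. Zero height is 0 in both units (fixed zero)."""
    return cm / 2.54


# M(O0) = 0: an object with none of the attribute gets zero in both units.
assert cm_to_inches(0.0) == 0.0

# Hypothetical heights of two people, in centimetres.
height_a_cm, height_b_cm = 180.0, 150.0

ratio_cm = height_a_cm / height_b_cm
ratio_in = cm_to_inches(height_a_cm) / cm_to_inches(height_b_cm)

# The ratio is the same whichever unit is used, so "1.2 times as tall" is meaningful.
print(math.isclose(ratio_cm, ratio_in))  # True
```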

Scale type

Measurement is the assignment of numbers to objects or events in a systematic fashion. Four
levels of measurement scales are commonly distinguished: nominal, ordinal, interval and ratio,
and each possesses a different set of the properties of measurement systems.

i. Nominal Scales

Nominal scales are measurement systems that possess none of the three properties stated above.
• Level of measurement which classifies data into mutually exclusive, all-inclusive
categories in which no order or ranking can be imposed on the data.
• No arithmetic operations, and no relational operations other than equality, can be applied.

Examples:
o Political party preference (Republican, Democrat or Other)
o Sex (Male or Female)
o Marital status (married, single, widowed, divorced)
o Country code
o Regions of Ethiopia

ii. Ordinal Scales

Ordinal scales are measurement systems that possess the property of order, but not the property
of distance. The property of fixed zero is not important if the property of distance is not satisfied.
- A level of measurement that classifies data into categories that can be ranked.
- Differences between ranks are not meaningful.
- Arithmetic operations are not applicable, but relational operations are.
- Ordering is the sole property of the ordinal scale.

Examples:
o Letter grades (A, B, C, D, F)
o Rating scales (Excellent, Very good, Good, Fair, Poor)
o Socio-economic class (low, middle, high)
o Country status (underdeveloped, developing, developed)
o Rate of satisfaction (very satisfied, satisfied, less than satisfied, very unsatisfied)

iii. Interval Scales

Interval scales are measurement systems that possess the properties of order and distance, but
not the property of fixed zero.
- A level of measurement that classifies data that can be ranked and whose differences are
meaningful. However, there is no meaningful zero, so ratios are meaningless.
- All arithmetic operations except division are applicable.
- Relational operations are also possible.

Examples:
o IQ
o Temperature in °F

iv. Ratio Scales

Ratio scales are measurement systems that possess all three properties: order, distance, and fixed
zero. The added power of a fixed zero allows ratios of numbers to be meaningfully interpreted;
for example, we can say that the ratio of Bekele's height to Martha's height is 1.32, whereas this is
not possible with interval scales.
- A level of measurement that classifies data that can be ranked, whose differences are
meaningful, and which has a true zero. True ratios exist between the different units of
measure.
- All arithmetic and relational operations are applicable.

Examples:
o Weight, Height, income, expense
o Number of students
o Age
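One practical consequence of the four scales is the choice of summary measure. The Python sketch below assumes the conventional pairing (mode for nominal data, median for ordinal data, mean for interval and ratio data); that pairing, the function name `central_tendency`, and the example values are illustrative assumptions rather than rules stated in this section.

```python
from collections import Counter
from statistics import mean

def central_tendency(values, scale, order=None):
    """Return a summary that the given measurement scale supports.

    Assumes the conventional pairing: mode for nominal data, median for
    ordinal data, mean for interval and ratio data.
    """
    if scale == "nominal":
        # Only equality is meaningful, so report the most frequent category
        return Counter(values).most_common(1)[0][0]
    if scale == "ordinal":
        if order is None:
            raise ValueError("ordinal data need an explicit category order")
        # Order is meaningful: rank the categories and take the middle one
        # (the lower middle one when the count is even)
        ranked = sorted(values, key=order.index)
        return ranked[(len(ranked) - 1) // 2]
    if scale in ("interval", "ratio"):
        # Distances are meaningful, so averaging is allowed
        return mean(values)
    raise ValueError(f"unknown scale: {scale!r}")

print(central_tendency(["married", "single", "single", "widowed"], "nominal"))
print(central_tendency(["B", "A", "C", "B", "F"], "ordinal", order=["F", "D", "C", "B", "A"]))
print(central_tendency([150, 160, 170], "ratio"))
```

For the ordinal grades the category order has to be supplied explicitly, because the labels themselves carry no arithmetic meaning.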

Summary

Statistics is the science of collecting, organizing, presenting, analyzing and interpreting
numerical data in order to make effective decisions. Statistics is generally divided into two main
branches: descriptive and inferential statistics. Any scientific research should follow the stages
of a statistical investigation; otherwise the researcher's findings may not be accurate. Statistics is
applied in every sphere of human activity. Identifying the nature and type of the data is necessary
for selecting the appropriate statistical model for a specified problem. Variables are divided into
four types based on their measurement scales: nominal, ordinal, interval and ratio.

Exercise 1
1. The average score for an entire population would be an example of a __________.
a. Parameter c. Variable
b. Statistic d. Constant
2. After measuring the heights of two trees, a researcher finds that one tree is three times as tall
as the other. These measurements must come from a _______ scale.
a. Nominal c. interval
b. Ordinal d. ratio
3. A variable that has an infinite number of possible values between any two specific
measurements is called __________ variable.
a. Independent c. Discrete
b. Dependent d. Continuous

Assignment 1: (4%)

1.1 The following is a list of different attributes and rules for assigning numbers to objects.
Classify each measurement system into one of the four types of scales.
A. Your score on the first statistics test as a measure of your knowledge of statistics.
B. Your score on an individual intelligence test as a measure of your intelligence.
C. A response to the statement "Abortion is a woman's right" where "Strongly Disagree" =
1, "Disagree" = 2, "No Opinion" = 3, "Agree" = 4, and "Strongly Agree" = 5, as a
measure of attitude toward abortion.
D. Times for swimmers to complete a 50-meter race
E. Months of the year: September, October, …
F. Blood type of individuals: A, B, AB and O
G. Region numbers of Ethiopia (1, 2, 3, etc.)
H. The number of cattle belonging to a household head
I. Annual income
References
- Bluman, A. G. (1995). Elementary Statistics (2nd ed.).
- Gupta, C. P. Introduction to Statistical Methods (9th ed.).

Chapter 2: Sampling Theory

Objectives
At the end of this unit the learners will be able to:

2.1 Define basic concepts & terms of sampling theory


2.2 Identify different types of sampling methods

Contents
2.1 Introduction
2.2 Basic Concepts
2.3 Reason for sampling
2.4 A Review of Methods of Sampling

2.1 Introduction
Sampling is very often used in our daily life. For example, while purchasing food grains from a
shop, we usually examine a handful from the bag to assess the quality of the commodity. A
doctor examines a few drops of blood as a sample and draws conclusions about the blood of
the whole body. Thus most of our investigations are based on samples. In this
chapter, we will see the basic concepts of sampling theory, the reasons for sampling and the various
methods of selecting a sample from a population.

2.2 Basic Concepts

We are going to analyze and interpret data to draw conclusions not about the data but about the
source of the data (population consisting of all elements being studied). We collect a sample of
data from the population and use it to make inferences about the population. Very often we will
be interested in estimating a population parameter. In order to estimate this we need to define
our terms carefully:
Population: In a statistical enquiry, all the items that fall within the purview of the enquiry are
known as the population or universe. In other words, the population is the complete set of all
possible observations of the type which is to be investigated.
Example:
- All students studying in a school or college,
- all books in a library,
- all houses in a village or town.
Sometimes it is possible and practical to examine every person or item in the population we wish
to describe. We call this a complete enumeration, or census. We use sampling when it is not
possible to measure every item in the population. Statisticians use the word population to refer
not only to people but to all items that have been chosen for study.

Finite population and infinite population:

A population is said to be finite if it consists of a finite number of units. The workers in a
factory and the articles produced by a company on a particular day are examples of finite
populations. The total number of units in a population is called the population size. A population is
said to be infinite if it has an infinite number of units; for example, the stars in the sky or the
people watching television programmes are usually treated as infinite populations.

Census Method:
Information on a population can be collected in two ways: the census method and the sample method. In
census method every element of the population is included in the investigation.

Example
If we study the average annual income of the families of a particular village or area, and if there
are 1000 families in that area, we must study the income of all 1000 families. In this method no
family is left out, as each family is a unit.

Merits and limitations of Census method:

Merits:
1. The data are collected from each and every item of the population.
2. The results are more accurate and reliable, because every item of the universe is included.
3. Intensive study is possible.
4. The data collected may be used for various surveys, analyses, etc.
Limitations:
1. It requires a large number of enumerators, and it is a costly method.
2. It requires more money, labour, time, energy, etc.
3. It is not possible when the universe is infinite.

Sampling:

The theory of sampling has been developed relatively recently, but the practice of sampling is not new. In
our everyday life we have been using sampling, as discussed in the introduction. In all those cases we
believe that the samples give a correct idea about the population. Most of our decisions are based
on the examination of a few items, that is, on sample studies.

Sample:

Statisticians use the word sample to describe a portion chosen from the population. A finite
subset of statistical individuals defined in a population is called a sample. The number of units in
a sample is called the sample size.

Sampling unit:

The constituents of a population that are to be sampled, and that cannot be further subdivided for
the purpose of the sampling, are called sampling units.
For example,
1. To know the average income per family, the head of the family is a sampling unit.
2. To know the average yield of rice, each farm owner’s yield of rice is a sampling unit.
3. If somebody studies the socio-economic status of households, the household is the sampling
unit.
4. If one studies the performance of freshman students in some college, the student is the sampling
unit.

Sampling frame:

For adopting any sampling procedure it is essential to have a list identifying each sampling unit
by a number. Such a list or map is called a sampling frame. A list of voters, a list of householders,
a list of villages in a district, a list of farmers etc. are a few examples of sampling frame.

Target population: the population about which one wishes to make an inference.

Sample size (n): the total number of individuals or sampling units selected as a sample.

Sampling population: the population from which one actually draws a sample. The sampling
population covers the elements from which the sample was actually selected.

Parameter and statistic:

We can describe samples and populations by using measures such as the mean, median, mode
and standard deviation. When these terms describe the characteristics of a population, they are
called parameters. When they describe the characteristics of a sample, they are called statistics.
A parameter is a characteristic of a population and a statistic is a characteristic of a sample. Since
samples are subsets of the population, statistics provide estimates of the parameters. That is, when the
parameters are unknown, they are estimated from the values of the statistics.

In general, we use Greek or capital letters for population parameters and lowercase Roman
letters to denote sample statistics. [N, μ and σ are the standard symbols for the size, mean and
standard deviation of a population; n, x̄ and s are the standard symbols for the size, mean and
standard deviation of a sample, respectively.]
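The notation can be made concrete with Python's standard `statistics` module. In the sketch below the ten population values are made up for illustration, and s is computed with the usual n - 1 divisor (`statistics.stdev`), a convention this section does not itself specify; σ uses the divisor N (`statistics.pstdev`).

```python
import random
import statistics

# Hypothetical population of N = 10 values (illustrative only)
population = [12, 15, 9, 20, 14, 18, 11, 16, 13, 22]

N = len(population)
mu = statistics.mean(population)         # parameter: population mean
sigma = statistics.pstdev(population)    # parameter: divides by N

rng = random.Random(2013)                # fixed seed so the run is repeatable
sample = rng.sample(population, 4)       # simple random sample without replacement

n = len(sample)
x_bar = statistics.mean(sample)          # statistic: sample mean
s = statistics.stdev(sample)             # statistic: divides by n - 1

print(f"N = {N}, mu = {mu}, sigma = {sigma:.3f}")
print(f"n = {n}, x_bar = {x_bar}, s = {s:.3f}")
```

The parameters μ = 15 and σ ≈ 3.873 are fixed properties of this population, while x̄ and s change from one sample to the next.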

Errors in sample surveys: there are two types of errors.

a) Sampling error:
- The error that arises because only a sample, rather than the whole population, is used to
estimate a population parameter.
- It is the discrepancy between the population value and the sample value.
- It may arise from the use of inappropriate sampling techniques or from taking too small a
sample.
b) Non-sampling errors: errors due to procedural bias, such as:
- measurement errors, recording errors, non-response errors, respondent bias,
interviewer errors, incorrect responses, and reporting errors;
- errors at different stages in processing the data.
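Sampling error can be watched directly by simulation. The Python sketch below uses a made-up population (the numbers 1 to 100, so μ = 50.5), draws many simple random samples of a given size, and averages the discrepancy |x̄ - μ|. Because every value is recorded exactly, only sampling error is present; the function name and the number of draws are illustrative choices.

```python
import random
import statistics

def mean_abs_sampling_error(population, n, draws=1000, seed=1):
    """Average |x_bar - mu| over repeated simple random samples of size n."""
    rng = random.Random(seed)                # fixed seed so the run is repeatable
    mu = statistics.mean(population)
    total = 0.0
    for _ in range(draws):
        sample = rng.sample(population, n)   # drawn without replacement
        total += abs(statistics.mean(sample) - mu)
    return total / draws

# Hypothetical population: the numbers 1 to 100, so mu = 50.5
population = list(range(1, 101))
for n in (5, 20, 50, 100):
    print(f"n = {n:3d}: average sampling error = {mean_abs_sampling_error(population, n):.2f}")
```

The average error shrinks as n grows and is exactly zero when the "sample" is the whole population, which is why a census is free from sampling error.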
Note: (i) The sampling population may be smaller than the target population because of
non-coverage or incomplete coverage (missing units).

(ii) Statistical inference procedures allow one to make inferences about the sampling population. Only
when the sampling population and the target population are the same can one make inferences about
the target population.

2.3 Reasons for Sampling

1. The destructive nature of certain tests.
2. The physical impossibility of checking all items if the population is infinite.
3. The cost of studying all items in the population is often prohibitive.
4. Adequately accurate results can be obtained from a sample.
5. To save resources.
Advantages and Limitations of Sampling:
There are many advantages of sampling methods over census method. They are as follows:
1. Sampling saves time and labour.
2. It results in reduction of cost in terms of money and time.
3. Sampling can yield more accurate results, since a smaller survey can be conducted more carefully.
4. It has greater scope.
5. It has greater flexibility.
6. If the population is too large, hypothetical, or destroyed by testing, sampling is the only method
that can be used.
The limitations of sampling are given below:
1. Sampling has to be done by qualified and experienced persons. Otherwise, the information
will be unreliable.
2. A sample may sometimes contain extreme values instead of a representative mix of values.
3. There is the possibility of sampling error, whereas a census is free from sampling error.

2.4 A Review of Methods of Sampling

There are two types (groups) of sampling techniques.

1. Random Sampling or Probability Sampling
- A method of sampling in which every element in the population has a pre-assigned non-zero
probability of being included in the sample.
- A sampling method in which the items are included in the sample on a random basis.
Some of these are:
(a). Simple random sampling
(b). Stratified random sampling
(c). Cluster sampling
(d). Systematic sampling
a. Simple random sample: a sampling technique in which each member of the population is
equally likely to be included in the sample. Suppose we have a population of N objects and
we wish to choose n of them to form a sample. There are NCn (N choose n) ways of
choosing the sample without replacement and N^n ways with replacement.
Examples
Lottery method: the units to be included in the sample are chosen by lottery. Assign a number
to each element in the population, write each number on a slip of paper, mix the slips, and then
draw one number at a time. This method can only be used if the population is not very large;
otherwise it is cumbersome.
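The counts NCn and N^n and the lottery draw itself can be reproduced with Python's standard library. In the sketch below the sizes N = 10 and n = 3 are arbitrary illustrative choices; `math.comb` (Python 3.8 or later) gives NCn, and `random.Random.sample` plays the role of drawing slips without putting them back.

```python
import math
import random

N, n = 10, 3                     # small illustrative sizes

ways_without = math.comb(N, n)   # NCn unordered samples without replacement
ways_with = N ** n               # N^n ordered draws with replacement

# Lottery method: number the units 1..N, mix the slips, draw n of them
slips = list(range(1, N + 1))
rng = random.Random(7)           # fixed seed so the draw is repeatable
drawn = rng.sample(slips, n)     # a slip, once drawn, is not returned

print(ways_without, ways_with, sorted(drawn))
```

With N = 10 and n = 3 there are 120 possible samples without replacement and 1000 ordered draws with replacement.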

Table of random numbers: used to select a representative sample from a large population. The
sample is selected by reading random digits from the table.

Procedure to select a sample using random number table:

Units of the population from which a sample is required are assigned numbers with an equal number
of digits. When the size of the population is less than one thousand, the three-digit numbers 000,
001, 002, ..., 999 are assigned. We may start at any place and go on in any direction, such as
column-wise or row-wise, in a random number table, but consecutive numbers are to be used. On the
basis of the size of the population and the random number table available to us, we proceed
according to our convenience. If any random number is greater than the population size N, then
N can be subtracted from the random number drawn. This is repeated until the number is
less than or equal to N. We proceed with the following steps:
Step 1: Number each element; for example, for a population of size 500 we assign 001 to 500.
Step 2: Select a random starting point.
Step 3: Read numbers with only the required number of digits. Proceed in this fashion until the
required number of sample units has been selected.

Example 1:

In an area there are 500 families. Using the following extract from a table of random numbers
select a sample of 15 families to find out the standard of living of those families in that area.

4652 3819 8431 2150 2352 2472 0043 3488
9031 7617 1220 4129 7148 1943 4890 1749
2030 2327 7353 6007 9410 9179 2722 8445
0641 1489 0828 0385 8488 0422 7209 4950
Solution:
In the above random number table we can start from any row or column and read three-digit
numbers continuously, row-wise or column-wise.
Starting from the third row and reading row-wise (continuing into the fourth row), the numbers are:
203 023 277 353 600 794 109 179
272 284 450 641 148 908 280
Since some numbers are greater than 500, we subtract 500 from those numbers and we rewrite
the selected numbers as follows:
203 023 277 353 100 294 109 179
272 284 450 141 148 408 280
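The reading procedure of Example 1 can be checked mechanically. The Python sketch below copies the four rows of the table extract, reads three-digit numbers row-wise from the third row, and applies the text's rule of subtracting N = 500 from any number above 500. It additionally skips 000 and repeated family numbers (an assumption for numbering 001 to 500 and sampling without replacement; this example contains neither), and it simply returns fewer numbers if the extract runs out. The function name is hypothetical.

```python
# Extract of the random number table from Example 1
TABLE_ROWS = [
    "4652 3819 8431 2150 2352 2472 0043 3488",
    "9031 7617 1220 4129 7148 1943 4890 1749",
    "2030 2327 7353 6007 9410 9179 2722 8445",
    "0641 1489 0828 0385 8488 0422 7209 4950",
]

def select_from_table(rows, start_row, population_size, sample_size, width=3):
    """Read `width`-digit numbers row-wise from `start_row` on and map them to unit numbers."""
    digits = "".join(rows[start_row:]).replace(" ", "")
    selected = []
    for pos in range(0, len(digits) - width + 1, width):
        number = int(digits[pos:pos + width])
        # Rule from the text: subtract N until the number is at most N
        while number > population_size:
            number -= population_size
        # Assumption: skip 000 and repeats, since units are numbered 001..N
        # and the sample is drawn without replacement
        if number == 0 or number in selected:
            continue
        selected.append(number)
        if len(selected) == sample_size:
            break
    return selected

families = select_from_table(TABLE_ROWS, start_row=2, population_size=500, sample_size=15)
print(" ".join(f"{f:03d}" for f in families))
```

Run as written, it prints the same fifteen family numbers as the worked solution, starting with 203 023 277 353 100.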

b. Stratified random sampling: is often used when the population is split into subgroups or
“strata”. The different subgroups are believed to be very different from each other, but it is
thought that the individuals who make up each subgroup are similar. The number of units to be
chosen from each sub-group is fixed in advance and the units are chosen by simple random
sampling within the sub group. Some of the criteria for dividing a population into strata are: Sex
(male, female); Age (under 18, 18 to 28, 29 to 39); Occupation (blue-collar, professional, other).
It is applied if the population is heterogeneous.

Example: An investigator is interested in securing a particular response that would be
representative of undergraduate college students. He might stratify the population into four
groups: freshman, sophomore, junior and senior.

Merits and limitations of stratified sampling:

Merits:

1. It is more representative.
2. It ensures greater accuracy
3. It is easy to administer, as the universe is sub-divided.
4. Greater geographical concentration reduces time and expenses.
5. When the original population is badly skewed, this method is appropriate.
6. For a non-homogeneous population, it may yield good results.

Limitations:
1. Dividing the population into homogeneous strata requires more money, time and
statistical experience, which makes it difficult.
2. Improper stratification leads to bias; if the different strata overlap, the sample will not
be representative.
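The two steps of stratified sampling (fix the number per stratum in advance, then draw a simple random sample inside each stratum) can be sketched in Python. The text does not say how the per-stratum numbers are fixed; the sketch assumes proportional allocation, n_h = n × N_h / N, and the class sizes for the four student strata are hypothetical.

```python
import random

def proportional_allocation(strata_sizes, n):
    """n_h = n * N_h / N, rounded (totals can drift by one when shares are not whole)."""
    N = sum(strata_sizes.values())
    return {name: round(n * size / N) for name, size in strata_sizes.items()}

def stratified_sample(strata, allocation, seed=0):
    """Simple random sample, without replacement, inside each stratum."""
    rng = random.Random(seed)
    return {name: rng.sample(units, allocation[name]) for name, units in strata.items()}

# Hypothetical class sizes, chosen only for illustration
strata_sizes = {"freshman": 400, "sophomore": 300, "junior": 200, "senior": 100}
strata = {
    name: [f"{name}-{i}" for i in range(1, size + 1)]
    for name, size in strata_sizes.items()
}

allocation = proportional_allocation(strata_sizes, n=50)
sample = stratified_sample(strata, allocation)
print(allocation)
print({name: units[:3] for name, units in sample.items()})
```

Other allocation rules exist; proportional allocation is used here only because it keeps each stratum's share of the sample equal to its share of the population.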
c. Cluster sampling: in some cases, identifying and locating the ultimate units for sampling may
require considerable time and cost; in such cases cluster sampling is used. A simple random
sample of groups or clusters of elements is chosen, and all the sampling units in the selected
clusters are surveyed. In cluster sampling the population is subdivided into groups or clusters,
and a probability sample of these clusters is then drawn and studied. Clusters may be Regions,
Zones, Weredas, Kebeles, etc. This method of sampling is less costly, faster and more convenient,
but it may not be very efficient or representative, because units within the same cluster tend
to be similar.
Example: if we want to study the travel habits of families in Ethiopia, which is divided into
Regions and Zones, we first draw a random sample of the Zones to be studied and then, from
these selected Zones or clusters, draw a random sample of households for the purpose of
investigation. To estimate the average annual household income in a large city we use cluster
sampling, because simple random sampling would require a complete list of households in the
city from which to sample.
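A one-stage version of cluster sampling, in which whole clusters are drawn and every unit inside them is surveyed, can be sketched as below. The city of 20 kebeles with 5 to 9 households each is hypothetical, as is the function name; only the selection of clusters is random.

```python
import random

def cluster_sample(clusters, k, seed=0):
    """Draw k clusters by simple random sampling and keep every unit in them."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), k)               # pick whole clusters
    return {name: list(clusters[name]) for name in chosen}  # survey all their units

# Hypothetical city of 20 kebeles with 5 to 9 households each
clusters = {
    f"kebele-{c:02d}": [f"kebele-{c:02d}/hh-{h}" for h in range(1, 6 + c % 5)]
    for c in range(1, 21)
}

surveyed = cluster_sample(clusters, k=4)
for name, households in surveyed.items():
    print(name, len(households))
```

Only a list of kebeles is needed to run the selection, not a list of every household in the city, which is the practical advantage described above.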
d. Systematic sampling: the items or individuals of the population are arranged in some way:
alphabetically, in a file drawer by date received, or by some other method. A complete list of
all elements within the population (sampling frame) is therefore required. A random starting
point is selected and then every Kth member of the population is selected for the sample. For
example, if we want to select n items from a population of size N using systematic sampling,
we divide N by n (N/n = K), choose a number i between 1 and K, and then take every Kth
member. So the sample will consist of units i, i + K, i + 2K, i + 3K, etc., where 1 ≤ i ≤ K.
Example: Suppose we want to choose a sample of about 20 students out of a class of 100
students. First we put the class in order (maybe alphabetical order, or by ID number) and give
each student a number between 1 and 100. Next we divide 100 by 20 and get 100/20 = 5. We now
choose a number at random between 1 and 5. The student corresponding to that number is the
first student in the sample, and we then take every 5th student. So if, for example, we choose the
number 2, the sample will consist of the 2nd, 7th, 12th, 17th, ..., 92nd and 97th students on the list.

Merits:
1. This method is simple and convenient.
2. Time and work are greatly reduced.
3. If proper care is taken, the results will be accurate.
4. It can be used for infinite populations.
Limitations:
1. Systematic sampling may not represent the whole population.
2. There is a chance of personal bias of the investigators.
Systematic sampling is preferably used when the information is to be collected from trees in a
forest, houses in blocks, entries in a register which are in a serial order, etc.
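The selection rule i, i + K, i + 2K, ... can be written in a few lines of Python. This is a sketch under two stated choices: K is taken as the whole-number part of N/n (so when N is not a multiple of n the list can contain one more unit than n), and the random start is drawn with `randint(1, K)`, which includes both ends. The function name is hypothetical.

```python
import random

def systematic_sample(N, n, start=None, seed=None):
    """Unit numbers i, i + K, i + 2K, ... with K = N // n and 1 <= i <= K."""
    k = N // n                                        # sampling interval K
    if start is None:
        start = random.Random(seed).randint(1, k)     # random start, both ends inclusive
    if not 1 <= start <= k:
        raise ValueError("start must lie between 1 and K")
    return list(range(start, N + 1, k))

print(systematic_sample(100, 20, start=2))   # the 2nd, 7th, ..., 97th students
```

Fixing `start=2` reproduces the class example above: twenty students, from the 2nd to the 97th.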

e. Multi-stage sampling

In multi-stage sampling the sample is selected in stages: larger units are sampled first, and then
smaller units are sampled within the selected larger units.
For example, in a study of utilization of pit latrines in a district, 150 homesteads are to be visited
for interviews with family members as well as for observations on the types and cleanliness of
latrines. The district is composed of 6 weredas and each wereda has between 6 and 9 villages. First,
select 3 weredas out of the 6 by simple random sampling. Second, for each selected wereda, select
5 villages by simple random sampling (15 villages in total). Third, in each of the selected villages,
select 10 households (3 × 5 × 10 = 150 households in total).
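The three stages of the pit-latrine example can be sketched in Python. The text does not say how households are chosen in the third stage; the sketch assumes simple random sampling there as well. The district layout (6 weredas with 6 to 9 villages and 40 households per village) and all names are hypothetical.

```python
import random

def multistage_sample(district, n_weredas, n_villages, n_households, seed=0):
    """Sample weredas, then villages within them, then households within those."""
    rng = random.Random(seed)
    selected = []
    for wereda in rng.sample(sorted(district), n_weredas):              # stage 1
        villages = district[wereda]
        for village in rng.sample(sorted(villages), n_villages):        # stage 2
            for hh in rng.sample(villages[village], n_households):      # stage 3
                selected.append((wereda, village, hh))
    return selected

# Hypothetical district: 6 weredas, 6 to 9 villages each, 40 households per village
district = {
    f"W{w}": {
        f"W{w}-V{v}": [f"W{w}-V{v}-H{h}" for h in range(1, 41)]
        for v in range(1, 7 + w % 4)
    }
    for w in range(1, 7)
}

households = multistage_sample(district, n_weredas=3, n_villages=5, n_households=10)
print(len(households), households[:2])
```

A full list of households is needed only for the 15 villages reached in the second stage, not for the whole district.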

2. Non-Random Sampling or Non-Probability Sampling

It is a sampling technique in which the choice of individuals for a sample depends on
convenience, personal opinion or interest. Some of these are:
(a). Judgment sampling
(b). Convenience sampling
(c). Quota sampling

(a). Judgment Sampling

In this case, the person taking the sample has direct or indirect control over which items are
selected for the sample. The subjective judgment of the researcher is the basis for selecting items
to be included in a sample. Judgment sampling is often used to pre-test questionnaires.

(b). Convenience Sampling

In this method, the decision maker selects a sample from the population in a manner that is
relatively easy and convenient to the investigator or the data collectors. This technique is simply
convenient to the researcher in terms of time, money and administration.
(c). Quota Sampling
In this sampling technique major population characteristics play an important role in selection of
the sample. It has some aspects in common with stratified sampling, but has no randomization. In
this method, the decision maker requires the sample to contain a certain number of items with a
given characteristic. Many political polls are, in part, quota sampling.

Example: Suppose a scientist recognizes that the variability in daily milk production may be due to
differences in age. Cows will then be selected from different age groups. For instance, if 30% of
the cows are between 4 and 6 years old and the remaining 70% are between 6 and 8 years old, a
quota sample must reflect those same percentages.
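The contrast with stratified sampling shows up clearly in code: a quota sample fills its quotas with whichever units come first, with no random draw. In the Python sketch below the herd, its ages, the sample size of 20 and the treatment of the shared boundary at 6 years (4 ≤ age < 6 versus 6 ≤ age ≤ 8) are all assumptions made for illustration.

```python
def quota_sample(units, group_of, quotas):
    """Take units in the order they are met until every group's quota is full.

    There is no random selection: whoever comes first fills the quota.
    """
    counts = {group: 0 for group in quotas}
    chosen = []
    for unit in units:
        group = group_of(unit)
        if group in quotas and counts[group] < quotas[group]:
            chosen.append(unit)
            counts[group] += 1
        if counts == quotas:
            break
    return chosen

def age_group(cow):
    # Assumption: 4 <= age < 6 is "4-6 years", 6 <= age <= 8 is "6-8 years"
    _, age = cow
    if 4 <= age < 6:
        return "4-6 years"
    if 6 <= age <= 8:
        return "6-8 years"
    return "other"

n = 20
quotas = {"4-6 years": n * 30 // 100, "6-8 years": n * 70 // 100}   # 6 and 14

# Hypothetical herd of 50 cows listed as (id, age in years), in the order they are met
herd = [(cow_id, 4 + cow_id % 5) for cow_id in range(1, 51)]

sample = quota_sample(herd, age_group, quotas)
print(quotas, len(sample))
```

Because the order in which cows are met decides who is included, the result carries no probability guarantee, which is why inference from such a sample is limited.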

Note: we cannot make valid inferences about the population from a non-probability sample.

Summary

Sampling is very often used in our daily life. To carry out real research or to solve a specific
problem in any discipline, we had better use probability sampling, because probability sampling
allows us to infer the characteristics of the population from information taken from the sample.
To apply sampling theory, we should know the definitions of some basic terms such as population,
sample, statistic, parameter, sampling unit and sampling frame.

Exercises 2
1. The difference between a sample estimate and the population parameter is termed
a. Human error c. Non-sampling error
b. Sampling error d. None of the above
2. If each and every unit of population has equal chance of being included in the sample, it is
known as:
a. Restricted sampling c. Simple random sampling
b. Purposive sampling d. None of the above
3. A simple random sample can be drawn with the help of
a. Slip method c. Calculator
b. Random number table d. All the above
4. Explain the difference between the following pairs of terms.
a) Sample and sample size
b) Sampling frame and sampling unit
c) Target population and sampling population
d) Stratified sampling and cluster sampling

Assignment 2: (5%)

1. Describe the basic difference between probability and non-probability sampling.

2. A survey will be conducted on household water supply in a district comprising 20,000
households, of which 20% are urban and 80% rural. It is suspected that access to safe water
sources is much more satisfactory in urban areas. What sampling technique is best
for such a condition? Justify your answer.
3. Suppose a sample of 50 students is to be selected from a school of 250 students. Using a list
of all 250 students, each student is given a number (1 to 250), and these numbers are written
on small pieces of paper. All the 250 papers are put in a box, after which the box is shaken
vigorously, to ensure randomization. Then, 50 papers are taken out of the box, and the
numbers are recorded. What is the methodology used in this sample selection?

Reference
- Freund, J. E. and Simon, G. A. (1998). Modern Elementary Statistics (9th ed.).
- Cochran, W. G. (1977). Sampling Techniques (3rd ed.). New York: John Wiley & Sons.

Chapter 3: Methods of Data Collection and Presentations
Objectives
At the end of this unit the learners will be able to:
- Describe the types of data and their sources
- Identify methods of data collection
- Construct a frequency distribution for raw data
- Identify the different methods of data organization and presentation
Contents
3.1 Introduction
3.2 Classification of data
3.3 Data collection

3.3.1 Methods of data collection

3.4 Methods of data presentation

3.4.1 Frequency distribution

3.4.2 Tabular presentation of data

3.4.3 Diagrammatic presentation of data

3.4.4 Graphical presentation of data

3.1 Introduction
Everybody collects, interprets and uses information, much of it in numerical or statistical form,
in day-to-day life. People receive large quantities of information every day through
conversations, television, computers, radio, newspapers, posters, notices and instructions.
Precisely because so much information is available, people need to be able to absorb, select and
reject it. In everyday life, in business and in industry, certain statistical information is
necessary, and it is important to know where to find it and how to collect it. As consumers,
everybody has to compare prices and quality before making any decision about what goods to buy.
As employees of a firm, people want to compare their salaries, working conditions, promotion
opportunities and so on. Firms, in turn, want to control costs and expand their profits.

One of the main functions of statistics is to provide information that helps in making
decisions. Statistics provides this information by giving a description of the present, a
profile of the past and an estimate of the future.

3.2 Classification of Data

It may be noted that different types of data can be collected for different purposes. The data can
be collected in connection with time, with geographical location, or with both time and
location. The following are the main types of data:
1. Time series data: a set of numerical values collected over a period of time. The data
may have been collected either at regular or at irregular intervals of time. An example is a
family's expenditure on three items (food, education, other) in D/Tabor for the four years
2001, 2002, 2003 and 2004.
2. Spatial data: if the data collected are connected with places, they are termed spatial data.
Examples are the number of runs scored by a batsman in different test matches of a test series
played at different places, district-wise rainfall in Ethiopia, and prices of silver in four
metropolitan cities.
3. Spatio-temporal data: if the data collected are connected with both time and place, they are
known as spatio-temporal data.
4. Cross-sectional data: data on many individuals collected at the same point or period of time.
5. Longitudinal data: data on the same entities collected at two or more points in time.

3.3 Data Collection

3.3.1 Methods of data collection

Data is a general term for observations and measurements collected during any type of scientific
investigation. Based on their source, data can be classified as:
1. Primary Data
Primary data are data collected by the investigator himself for the purpose of a specific
inquiry or study. Such data are original in character and are generated by surveys conducted
by individuals, research institutions or other organizations.

Example
If a researcher is interested in knowing the impact of a noon meal scheme for school children, he
has to undertake a survey and collect data on the opinions of parents and children by asking
relevant questions. Data collected for this purpose are called primary data.
Collecting primary data involves two activities: planning and measuring.
a) Planning:
- Identify the source and elements of the data.
- Decide whether to conduct a sample survey or a census.
- If sampling is preferred, decide on the sample size, selection method, etc.
- Decide on the measurement procedure.
- Set up the necessary organizational structure.
Primary data can be collected by the following methods:
- Focus group discussion
- Telephone interviews
- Mail questionnaires
- Interviews
- Self-administered questionnaires
- Experiments
- Diaries
- Observations

Merits and Demerits of primary data:

1. The collection of data by the method of personal survey is possible only if the area
covered by the investigator is small. Collection of data by sending enumerators is
bound to be expensive, and care should be taken that the enumerators record the correct
information provided by the informants.
2. Collection of primary data by framing schedules, or by distributing and collecting
questionnaires by post, is less expensive and can be completed in a shorter time.
3. If the questions are embarrassing or complicated, or if they probe into the personal
affairs of individuals, the schedules may not be filled in with accurate and correct
information, and hence this method is unsuitable.
4. Information collected as primary data is more reliable than that obtained from
secondary data.

2. Secondary Data

Secondary data are those data which have been already collected and analyzed by some earlier
agency for its own use; and later the same data are used by a different agency. According to
W. A. Neiswanger, ‘A primary source is a publication in which the data are published by the same
authority which gathered and analyzed them. A secondary source is a publication, reporting the
data which have been gathered by other authorities and for which others are responsible’.
Sources of Secondary data:
In most of the studies the investigator finds it impracticable to collect first-hand information on
all related issues and as such he makes use of the data collected by others. There is a vast amount
of published information from which statistical studies may be made and fresh statistics are
constantly in a state of production. The sources of secondary data can broadly be classified under
two heads:
a. Published sources, and
b. Unpublished sources.
a) Published Sources:
The various sources of published data are:
1. Reports and official publications of:
i. International bodies such as the International Monetary Fund, International Finance
Corporation and United Nations Organization.
ii. Central and State Governments such as the Report of the Tandon Committee and Pay
Commission.
2. Semi-official publication of various local bodies such as Municipal Corporations and District
Boards.
3. Private publications, such as the publications of:
i. Trade and professional bodies such as the Federation of Indian Chambers of Commerce
and Institute of Chartered Accountants.
ii. Financial and economic journals such as ‘Commerce’, ‘Capital’ and ‘Indian Finance’.
iii. Annual reports of joint stock companies.
iv. Publications brought out by research agencies, research scholars, etc.
It should be noted that the publications mentioned above vary with regard to the periodicity of
publication. Some are published at regular intervals (yearly, monthly, weekly etc.,) whereas
others are ad hoc publications, i.e., with no regularity about periodicity of publications.

b) Unpublished Sources
All statistical material is not always published. There are various sources of unpublished data
such as records maintained by various Government and private offices, studies made by research
institutions, scholars, etc. Such sources can also be used where necessary.
When our source is secondary data, check:
- the type and objective of the original inquiry;
- that the purpose for which the data were collected is compatible with the present
problem;
- that the nature and classification of the data are appropriate to our problem;
- that there are no biases or misreporting in the published data.
Precautions in the use of Secondary data
The following are some of the points that are to be considered in the use of secondary data
1. How the data have been collected and processed
2. The accuracy of the data
3. How far the data have been summarized
4. How comparable the data are with other tabulations
5. How to interpret the data, especially when figures collected for one purpose are used for another
Generally speaking, with secondary data, people have to compromise between what they want
and what they are able to find.
Merits of Secondary Data:
1. Secondary data are cheap to obtain. Many government publications are relatively cheap, and libraries stock quantities of secondary data produced by the government, by companies and by other organizations.
2. Large quantities of secondary data can be obtained through the internet.
3. Much of the available secondary data has been collected over many years and can therefore be used to plot trends.
4. Secondary data are of value to:
- The government: they help in making decisions and planning future policy.
- Business and industry: in areas such as marketing and sales, to appreciate the general economic and social conditions and to provide information on competitors.
- Research organizations: by providing social, economic and industrial information.
Note: Data that are primary for one investigator may be secondary for another.

3.3 Methods of Data Presentation

Having collected and edited the data, the next important step is to organize them, that is, to present them in a readily comprehensible, condensed form from which inferences can be drawn. It is also necessary that like items be separated from unlike ones.

The presentation of data is broadly classified into the following two categories:

- Tabular presentation
- Diagrammatic and graphic presentation

3.3.1 Frequency Distribution

The process of arranging data into classes or categories according to their similarities is technically called classification.
Classification is a preliminary step that prepares the ground for the proper presentation of data.

Definitions:

- Raw data: recorded information in its original collected form, whether counts or measurements, is referred to as raw data.
- Frequency: the number of values in a specific class of the distribution.
- Frequency distribution: the organization of raw data in table form using classes and frequencies.

There are three basic types of frequency distributions:

- Categorical frequency distribution
- Ungrouped frequency distribution
- Grouped frequency distribution

There are specific procedures for constructing each type.

1) Categorical Frequency Distribution:

Used for data that can be placed in specific categories, such as nominal or ordinal data (e.g. marital status).

Example: A social worker collected the following data on marital status for 25 persons (M = married, S = single, W = widowed, D = divorced).

M S D W D
S S M M M
W D S M M
W D D S S
S W W D D

Solution:

Since the data are categorical, discrete classes can be used. There are four types of marital status, M, S, D and W, and these will be used as the classes of the distribution. We follow the procedure below to construct the frequency distribution.
Step 1: Make a table as shown.

Class   Tally   Frequency   Percent
(1)     (2)     (3)         (4)
M
S
D
W

Step 2: Tally the data and place the result in column (2).
Step 3: Count the tallies and place the result in column (3).
Step 4: Find the percentage of values in each class by using

$$\% = \frac{f}{n} \times 100$$

where f = frequency of the class and n = total number of values.
Percentages are not normally part of a frequency distribution, but they can be added since they are used in certain types of diagrams, such as pie charts.
Step 5: Find the totals for columns (3) and (4).
Combining all the steps, one obtains the following frequency distribution.

Class   Tally      Frequency   Percent
(1)     (2)        (3)         (4)
M       ///// /    6           24
S       ///// //   7           28
D       ///// //   7           28
W       /////      5           20
Total              25          100
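The tallying and percentage steps can be checked with a short script. This is a minimal Python sketch, not part of the original module; the function name `categorical_table` is hypothetical, and the data are the 25 marital-status codes above, read row by row.

```python
from collections import Counter

# Marital-status data from the example, read row by row
# (M = married, S = single, W = widowed, D = divorced).
rows = [
    "M S D W D",
    "S S M M M",
    "W D S M M",
    "W D D S S",
    "S W W D D",
]
data = " ".join(rows).split()


def categorical_table(values, classes):
    """Return (class, frequency, percent) rows, with percent = f / n * 100."""
    counts = Counter(values)
    n = len(values)
    return [(c, counts[c], counts[c] / n * 100) for c in classes]


table = categorical_table(data, ["M", "S", "D", "W"])
for cls, f, pct in table:
    print(f"{cls}  {f:2d}  {pct:5.1f}")
print("Total", sum(f for _, f, _ in table))
```

Counting with a `Counter` replaces the manual tally; the class order is passed in explicitly so the table keeps the order M, S, D, W.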

2) Ungrouped Frequency Distribution:

It is a table of all the potential raw score values that could possibly occur in the data, along with the number of times each actually occurred. It is often constructed for small data sets or for data on a discrete variable.
Constructing an ungrouped frequency distribution:

- First find the smallest and largest raw scores in the collected data.
- Arrange the data in order of magnitude and count the frequency of each value.
- To facilitate counting, one may include a column of tallies.


Example:
The following data represent the marks of 20 students.
80 76 90 85 80
70 60 62 70 85
65 60 63 74 75
76 70 70 80 85
Construct an ungrouped frequency distribution.
Solution:
Step 1: Find the range, Range=Max-Min=90-60=30.
Step 2: Make a table as shown
Step 3: Tally the data.
Step 4: Compute the frequency.
Mark   Tally   Frequency
60     //      2
62     /       1
63     /       1
65     /       1
70     ////    4
74     /       1
75     /       1
76     //      2
80     ///     3
85     ///     3
90     /       1
Total          20
Each individual value is presented separately, which is why this is called an ungrouped frequency distribution.
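A minimal Python sketch of the ungrouped procedure (illustrative only; the variable names are our own): the range is computed, and the marks are counted and listed in order of magnitude with a crude tally column of `/` characters.

```python
from collections import Counter

marks = [80, 76, 90, 85, 80,
         70, 60, 62, 70, 85,
         65, 60, 63, 74, 75,
         76, 70, 70, 80, 85]

# Step 1: range = maximum - minimum
data_range = max(marks) - min(marks)

# Each distinct value becomes its own row, listed in order of magnitude.
freq = sorted(Counter(marks).items())

print("Range:", data_range)
for mark, f in freq:
    print(f"{mark:3d}  {'/' * f:5s}  {f}")
```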

3) Grouped Frequency Distribution:

When the range of the data is large, the data must be grouped into classes that are more than one unit in width.

Definitions:

- Grouped frequency distribution: a frequency distribution in which several values are grouped in one class.
- Class limits: separate one class in a grouped frequency distribution from another. The limits can actually appear in the data, and there are gaps between the upper limit of one class and the lower limit of the next.
- Unit of measurement (U): the distance between two possible consecutive measurements. It is usually taken as 1, 0.1, 0.01, 0.001, and so on.
- Class boundaries: separate one class in a grouped frequency distribution from another. The boundaries have one more decimal place than the raw data and therefore do not appear in the data. There is no gap between the upper boundary of one class and the lower boundary of the next class. The lower class boundary is found by subtracting U/2 from the corresponding lower class limit, and the upper class boundary is found by adding U/2 to the corresponding upper class limit.
- Class width: the difference between the upper and lower class boundaries of any class. It is also the difference between the lower limits of any two consecutive classes, or the difference between any two consecutive class marks.
- Class mark (midpoint): the average of the lower and upper class limits, or equivalently the average of the lower and upper class boundaries.
- Cumulative frequency: the number of observations less than (or more than) or equal to a specific value.
- Cumulative frequency above (more-than type): the total frequency of all values greater than or equal to the lower class boundary of a given class.
- Cumulative frequency below (less-than type): the total frequency of all values less than or equal to the upper class boundary of a given class.
- Cumulative frequency distribution (CFD): the tabular arrangement of class intervals together with their corresponding cumulative frequencies. It can be of the more-than or less-than type, depending on the type of cumulative frequency used.
- Relative frequency (rf): the frequency divided by the total frequency.
- Relative cumulative frequency (rcf): the cumulative frequency divided by the total frequency.
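The cumulative and relative frequencies defined above follow mechanically from the class frequencies. The Python sketch below is an illustration under our own naming (`cumulative_tables` is hypothetical); it uses the six class frequencies of Example 1* later in this section, listed from the lowest class to the highest.

```python
from itertools import accumulate


def cumulative_tables(freqs):
    """Given class frequencies in ascending class order, return the less-than cf,
    the more-than cf, the relative frequencies and the relative (less-than) cf."""
    n = sum(freqs)
    cf_less = list(accumulate(freqs))
    # More-than cf: accumulate from the last class backwards, then reverse.
    cf_more = list(accumulate(reversed(freqs)))[::-1]
    rf = [f / n for f in freqs]
    rcf = [c / n for c in cf_less]
    return cf_less, cf_more, rf, rcf


# Frequencies of the classes 6-11, 12-17, ..., 36-41 in Example 1*.
cf_less, cf_more, rf, rcf = cumulative_tables([2, 2, 7, 4, 3, 2])
print(cf_less)  # [2, 4, 11, 15, 18, 20]
print(cf_more)  # [20, 18, 16, 9, 5, 2]
```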

Guidelines for classes

1. There should be between 5 and 20 classes.
2. The classes must be mutually exclusive. This means that no data value can fall into two different classes.
3. The classes must be all-inclusive, or exhaustive. This means that all data values must be included.
4. The classes must be continuous. There are no gaps in a frequency distribution.
5. The classes must be equal in width. The exception is the first or last class: it is possible to have a "below ..." or "... and above" class. This is often used with ages.

Steps for constructing a grouped frequency distribution

1. Find the largest and smallest values.
2. Compute the range, R = Maximum - Minimum.
3. Select the number of classes desired, usually between 5 and 20, or use Sturges' rule

$$k = 1 + 3.32 \log_{10} n$$

where k is the number of classes desired and n is the total number of observations.
4. Find the class width by dividing the range by the number of classes and rounding up, not off:

$$w = \frac{R}{k}$$

5. Pick a suitable starting point less than or equal to the minimum value. The starting point is called the lower limit of the first class. Continue to add the class width to this lower limit to get the rest of the lower limits.
6. To find the upper limit of the first class, subtract U from the lower limit of the second class. Then continue to add the class width to this upper limit to find the rest of the upper limits.
7. Find the boundaries by subtracting U/2 from the lower limits and adding U/2 to the upper limits. The boundaries are also halfway between the upper limit of one class and the lower limit of the next class. Depending on the purpose, it may not be necessary to find the boundaries.
8. Tally the data.
9. Find the frequencies.
10. Find the cumulative frequencies. Depending on what you are trying to accomplish, it may not be necessary to find the cumulative frequencies.
11. If necessary, find the relative frequencies and/or relative cumulative frequencies.
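The steps above can be sketched in Python as follows. This is an illustrative sketch, not the module's own procedure: it assumes whole-number data (U = 1), uses the minimum value as the starting point unless `start` is given, and keeps adding classes until the maximum value is covered, so the classes stay exhaustive even when R/k happens to be a whole number. The function name `grouped_frequency` is hypothetical.

```python
import math

U = 1  # unit of measurement; this sketch assumes whole-number data


def grouped_frequency(data, start=None):
    """Build classes following steps 1-9 and return one row per class:
    (lower limit, upper limit, lower boundary, upper boundary, class mark, frequency)."""
    n = len(data)
    lo, hi = min(data), max(data)                  # Step 1
    r = hi - lo                                    # Step 2: range
    k = math.ceil(1 + 3.32 * math.log10(n))        # Step 3: Sturges' rule, rounded up
    w = max(U, math.ceil(r / k))                   # Step 4: width, rounded up (not off)
    lower = lo if start is None else start         # Step 5: first lower limit
    rows = []
    # Keep adding classes until the maximum is covered (guideline 3: exhaustive).
    while lower <= hi:
        upper = lower + w - U                      # Step 6: next lower limit minus U
        lb, ub = lower - U / 2, upper + U / 2      # Step 7: class boundaries
        f = sum(1 for x in data if lower <= x <= upper)   # Steps 8-9: tally and count
        rows.append((lower, upper, lb, ub, (lower + upper) / 2, f))
        lower += w
    return rows


example = [11, 29, 6, 33, 14, 31, 22, 27, 19, 20,
           18, 17, 22, 38, 23, 21, 26, 34, 39, 27]
for lower, upper, lb, ub, mark, f in grouped_frequency(example):
    print(f"{lower:>2}-{upper:<2}  {lb}-{ub}  {mark}  {f}")
```

Run on the data of Example 1*, it reproduces the six classes 6-11, ..., 36-41 with frequencies 2, 2, 7, 4, 3 and 2.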

Example 1*:
Construct a frequency distribution for the following data.
11 29 6 33 14 31 22 27 19 20
18 17 22 38 23 21 26 34 39 27
Solution:
Step 1: Find the highest and the lowest values: H = 39, L = 6.
Step 2: Find the range: R = H - L = 39 - 6 = 33.
Step 3: Select the number of classes desired using Sturges' formula:

$$k = 1 + 3.32 \log_{10} n = 1 + 3.32 \log_{10} 20 = 5.32, \text{ which rounds up to } 6$$

Step 4: Find the class width: w = R/k = 33/6 = 5.5, which rounds up to 6.
Step 5: Select the starting point; let it be the minimum observation.
- 6, 12, 18, 24, 30, 36 are the lower class limits.
Step 6: Find the upper class limits; e.g. the first upper class limit = 12 - U = 12 - 1 = 11.
- 11, 17, 23, 29, 35, 41 are the upper class limits.
So, combining Steps 5 and 6, one can construct the following classes.
Class limits
6 – 11
12 – 17
18 – 23
24 – 29
30 – 35
36 – 41
Step 7: Find the class boundaries.
E.g. for class 1, lower class boundary = 6 - U/2 = 5.5 and upper class boundary = 11 + U/2 = 11.5.

Then continue adding w to both boundaries to obtain the remaining boundaries. By doing so one obtains the following classes.

Class boundaries
5.5 – 11.5
11.5 – 17.5
17.5 – 23.5
23.5 – 29.5
29.5 – 35.5
35.5 – 41.5
Step 8: Tally the data.
Step 9: Write the numeric values for the tallies in the frequency column.
Step 10: Find the cumulative frequencies.
Step 11: Find the relative frequencies and/or relative cumulative frequencies.
The complete frequency distribution follows:

Class     Class         Class   Tally      Freq.   Cf            Cf            rf     rcf
limits    boundaries    mark                       (less-than)   (more-than)          (less-than)
6 – 11    5.5 – 11.5    8.5     //         2       2             20            0.10   0.10
12 – 17   11.5 – 17.5   14.5    //         2       4             18            0.10   0.20
18 – 23   17.5 – 23.5   20.5    ///// //   7       11            16            0.35   0.55
24 – 29   23.5 – 29.5   26.5    ////       4       15            9             0.20   0.75
30 – 35   29.5 – 35.5   32.5    ///        3       18            5             0.15   0.90
36 – 41   35.5 – 41.5   38.5    //         2       20            2             0.10   1.00

3.3.2 Diagrammatic and Graphic presentation of data

These are techniques for presenting data in visual displays using geometric figures and pictures.

Some advantages of diagrammatic and graphic presentations are:

- They have greater attraction.
- They facilitate comparison.
- They are easily understandable.

Diagrams are appropriate for presenting discrete data. The three most commonly used diagrammatic presentations for discrete as well as qualitative data are:
- Pie charts
- Pictograms
- Bar charts

Pie chart

A pie chart is a circle that is divided into sections or wedges according to the percentage of frequencies in each category of the distribution. The angle of each sector is obtained using:

$$\text{angle of sector} = \frac{\text{value of the part}}{\text{the whole quantity}} \times 360^\circ$$

Example: Draw a suitable diagram to represent the following population of a town.

Men    Women   Girls   Boys
2500   2000    4000    1500

Solution:
Step 1: Find the percentages.
Step 2: Find the number of degrees for each class.

Class   Frequency   Percent   Degree
Men     2500        25        90
Women   2000        20        72
Girls   4000        40        144
Boys    1500        15        54
Total   10000       100       360

Step 3: Using a protractor and compass, draw each section and write its name and corresponding percentage.

[Figure: pie chart of the town's population with sectors for Men, Women, Girls and Boys]
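The sector angles can be computed directly from the formula above. A minimal Python sketch (the helper `pie_angles` is hypothetical); multiplying by 360 before dividing keeps these particular results exact.

```python
population = {"Men": 2500, "Women": 2000, "Girls": 4000, "Boys": 1500}


def pie_angles(parts):
    """Angle of sector = value of the part / whole quantity * 360 degrees."""
    whole = sum(parts.values())
    return {name: value * 360 / whole for name, value in parts.items()}


angles = pie_angles(population)
print(angles)  # {'Men': 90.0, 'Women': 72.0, 'Girls': 144.0, 'Boys': 54.0}
```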

Pictogram
In these diagrams, we represent data by means of picture symbols. We choose a suitable picture to represent a definite number of units in which the variable is measured.

Example: Draw a pictogram to represent the population of a town.

Year         1989   1990   1991   1992
Population   2000   3000   5000   7000

This can be represented by a pictogram:

[Figure: pictogram of the town's population; each symbol stands for 1000 people, so 1989, 1990, 1991 and 1992 show 2, 3, 5 and 7 symbols respectively]

Bar Charts:

- A bar chart is a set of bars (thick lines or narrow rectangles) representing some magnitude over time or space.
- Bar charts are useful for comparing aggregates over time or space.
- Bars can be drawn either vertically or horizontally.
- There are different types of bar charts, the most common being:
  - Simple bar chart
  - Deviation or two-way bar chart
  - Broken bar chart
  - Component or subdivided bar chart
  - Multiple bar chart

Simple Bar Chart

- Simple bar charts are used to display data on one variable.
- They are thick lines (narrow rectangles) of the same breadth. The magnitude of a quantity is represented by the height or length of the bar.

Example: The following data represent the sales by product of a given company for three products A, B and C over 1957-1959.

Product   Sales ($) in 1957   Sales ($) in 1958   Sales ($) in 1959
A         12                  14                  18
B         24                  21                  18
C         24                  35                  54

Solution:
[Figure: simple bar chart "Sales by product in 1957"; x-axis: product (A, B, C); y-axis: sales in $]

Component Bar chart

When we want to show how a total (or aggregate) is divided into its component parts, we use a component bar chart. The bars represent the total value of a variable, with each total broken into its component parts, and different colors or designs are used for identification.

Example:
Draw a component bar chart to represent the sales by product from 1957 to 1959.
Solution:

[Figure: component bar chart "Sales by product 1957-1959"; x-axis: year of production (1957, 1958, 1959); y-axis: sales in $; each bar is divided into Product A, Product B and Product C]
Multiple Bar charts

These are used to display data on more than one variable. They are used for comparing different variables at the same time.
Example:
Draw a multiple bar chart to represent the sales by product from 1957 to 1959.
Solution:

[Figure: multiple bar chart "Sales by product 1957-1959"; x-axis: year of production (1957, 1958, 1959); y-axis: sales in $; adjacent bars for Product A, Product B and Product C in each year]

Graphical Presentation of data

- The histogram, the frequency polygon and the cumulative frequency graph (ogive) are the most commonly applied graphical representations for continuous data.
Procedures for constructing statistical graphs:

- Draw and label the X and Y axes.
- Choose a suitable scale for the frequencies or cumulative frequencies and label it on the Y axis.
- Represent the class boundaries (for the histogram or ogive) or the midpoints (for the frequency polygon) on the X axis.
- Plot the points.
- Draw the bars or lines to connect the points.

Histogram: a graph that displays the data by using vertical bars of various heights to represent frequencies. Class boundaries are placed along the horizontal axis. Class marks and class limits are sometimes used as the quantity on the X axis.

Example: Construct a histogram to represent the previous data (Example 1*).

Frequency polygon: a line graph in which the frequency is placed along the vertical axis and the class midpoints (class marks) are placed along the horizontal axis. It is customary to add the next higher and next lower class intervals with a corresponding frequency of zero, so as to make it a complete polygon.

Example: Draw a frequency polygon for the above data (Example 1*).

Solution:

[Figure: frequency polygon for Example 1*; x-axis: class midpoints 2.5, 8.5, 14.5, 20.5, 26.5, 32.5, 38.5, 44.5; y-axis: frequency]
Ogive (cumulative frequency polygon): a graph showing the cumulative frequency (less-than or more-than type) plotted against the upper or lower class boundaries, respectively. That is, class boundaries are plotted along the horizontal axis and the corresponding cumulative frequencies are plotted along the vertical axis. The points are joined by a freehand curve.
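The points of both ogives can be listed before plotting. This Python sketch uses the class boundaries and frequencies of Example 1*; it only computes the (boundary, cumulative frequency) pairs and leaves the drawing to whatever plotting tool is at hand. Some texts also start the less-than ogive at (5.5, 0); that point is omitted here.

```python
from itertools import accumulate

# Class boundaries and frequencies of Example 1*.
lower_b = [5.5, 11.5, 17.5, 23.5, 29.5, 35.5]
upper_b = [11.5, 17.5, 23.5, 29.5, 35.5, 41.5]
freqs = [2, 2, 7, 4, 3, 2]

# Less-than ogive: upper boundary paired with the cf counted from the first class.
less_than = list(zip(upper_b, accumulate(freqs)))
# More-than ogive: lower boundary paired with the cf counted from the last class.
more_than = list(zip(lower_b, list(accumulate(reversed(freqs)))[::-1]))

print(less_than)
print(more_than)
```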

Summary

Data is a general term for observations and measurements collected during any type of scientific investigation. Based on their source, data are classified as primary or secondary. Primary data can be collected through observation, experimentation, interviews or questionnaires. A frequency distribution is an arrangement of data in table form using classes and frequencies. A cumulative frequency distribution is the tabulation of a sample of observations in terms of the numbers falling below (or above) particular values. Tabulated data are presented using either diagrammatic or graphic presentation methods. Diagrammatic presentations such as bar charts, pie charts and pictograms are more appropriate for discrete data, while graphic presentations such as histograms, ogives (less-than/more-than cumulative frequency curves) and frequency polygons are appropriate for continuous frequency distributions.

Exercise 3

1. A researcher observes aggressive behavior for a sample of n = 15 boys and classifies each boy as high, medium, or low in terms of aggression. If the frequency distribution for these scores is presented in a graph, what kind of graph would be appropriate?

a. a bar graph
b. a histogram
c. a polygon
d. all of the above
2. In a frequency distribution graph, frequencies are presented on the ____ and the scores (categories) are listed on the ____, respectively.

a. X axis/Y axis
b. horizontal line/vertical line
c. Y axis/X axis
d. class interval/horizontal line

Assignment 3: (6%)
1. What are the points to be considered in the use of secondary data?
2. What are the sources of secondary data?
3. Give the merits and demerits of primary data and secondary data.
4. In a survey, it was found that 64 families bought milk in the following quantities in a particular month.
Quantity of milk (in litres) bought by 64 families in a month:
19 16 22 9 22 12 39 19 14 23 6 24 16 18 7 17
20 25 28 18 10 24 20 21 10 7 18 28 24 20 14 23
25 34 22 5 33 23 26 29 13 36 11 26 11 37 30 13
8 15 22 21 32 21 31 17 16 23 12 9 15 27 17 21
Construct a continuous frequency distribution.

Reference

- Bluman, A. G. (1995). Elementary Statistics (2nd edition).
- Gupta, C. P. (2004). Introduction to Statistical Methods (9th edition).
- Freund, J. E. and Simon, G. A. (1998). Modern Elementary Statistics (9th edition).

Chapter 4: Measures of Central Tendency

Objectives

At the end of this unit the learners will be able to:

- explain the objectives of measures of central tendency;
- describe the important characteristics of a good average;
- compute the mean;
- compute the median;
- compute the mode of a grouped frequency distribution.

Contents

4.1 Introduction and objectives of measures of central tendency


4.2 Properties of a good average
4.3 Summation notation
4.4 Types of measures of central tendency
4.4.1 Mean
4.4.1.1 Arithmetic Mean
4.4.1.2 Weighted arithmetic mean
4.4.1.3 Geometric Mean
4.4.1.4 Harmonic Mean
4.4.2 Median
4.4.3 Mode

4.1 Introduction and Objectives of Measuring Central Tendency

A single value that describes the characteristics of an entire mass of data is called a measure of central tendency, or an average.

Objectives of measuring central tendency are:

- to get a single value that represents (describes) the characteristics of the entire data;
- to summarize the data, reducing its volume;
- to facilitate comparison within one group or between groups of data;
- to enable further statistical analysis.

4.2 Desirable Properties of a Measure of Central Tendency

We say a measure of central tendency is best if it possesses most of the following properties. It should:
- be simple to understand and easy to calculate/interpret,
- exist and be unique,
- be rigidly defined by a mathematical formula,
- be based on all observations,
- not be seriously affected by extreme observations,
- be capable of further statistical analysis and/or algebraic manipulation.

4.3 The Summation Notation (∑)

Let a data set consist of a number of observations, represented by $x_1, x_2, \ldots, x_n$, where n (the last subscript) denotes the number of observations in the data and $x_i$ is the i-th observation. Then the sum

$$x_1 + x_2 + \cdots + x_n = \sum_{i=1}^{n} x_i$$

For instance, a data set consisting of the six measurements 21, 13, 54, 46, 32 and 37 is represented by $x_1, x_2, \ldots, x_6$, where $x_1 = 21$, $x_2 = 13$, $x_3 = 54$, $x_4 = 46$, $x_5 = 32$ and $x_6 = 37$. Their sum is

$$\sum_{i=1}^{6} x_i = 21 + 13 + 54 + 46 + 32 + 37 = 203.$$

Similarly,

$$x_1^2 + x_2^2 + \cdots + x_n^2 = \sum_{i=1}^{n} x_i^2$$

Some Properties of the Summation Notation

1. $\sum_{i=1}^{n} c = n c$, where c is a constant number.
2. $\sum_{i=1}^{n} b x_i = b \sum_{i=1}^{n} x_i$, where b is a constant number.
3. $\sum_{i=1}^{n} (a + b x_i) = n a + b \sum_{i=1}^{n} x_i$, where a and b are constant numbers.
4. $\sum_{i=1}^{n} (x_i \pm y_i) = \sum_{i=1}^{n} x_i \pm \sum_{i=1}^{n} y_i$
5. $\sum_{i=1}^{n} x_i y_i \neq \sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i$ (in general)

Example:

Let $\sum_{i=1}^{12} x_i = 26$, $\sum_{i=1}^{12} y_i = 17$, $\sum_{i=1}^{12} x_i^2 = 484$ and $\sum_{i=1}^{12} y_i^2 = 362$.

Find (I) $\sum_{i=1}^{12} (4x_i + 3y_i)$ and (II) $\sum_{i=1}^{12} 2x_i(x_i - 7)$.

Solution:

(I) $\sum_{i=1}^{12} (4x_i + 3y_i) = 4\sum_{i=1}^{12} x_i + 3\sum_{i=1}^{12} y_i = 4(26) + 3(17) = 155$

(II) $\sum_{i=1}^{12} 2x_i(x_i - 7) = 2\sum_{i=1}^{12} x_i^2 - 14\sum_{i=1}^{12} x_i = 2(484) - 14(26) = 604$

4.4 Types of Measures of Central Tendency

Several types of averages, or measures of central tendency, can be defined; the most common are

- the mean
- the mode
- the median

4.4.1 The Mean

There are four types of means: the arithmetic mean, the weighted arithmetic mean, the harmonic mean and the geometric mean.

4.4.1.1 Arithmetic Mean

It is defined as the sum of the measurements of the items divided by the total number of items. That is, for n observations $x_1, x_2, \ldots, x_n$, $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} x_i$.

Arithmetic Mean for Ungrouped Frequency Distribution

When the data are arranged or given in the form of an ungrouped frequency distribution, the formula for the mean is

$$\bar{X} = \frac{f_1 x_1 + f_2 x_2 + \cdots + f_k x_k}{f_1 + f_2 + \cdots + f_k} = \frac{\sum_{i=1}^{k} f_i x_i}{\sum_{i=1}^{k} f_i}$$

Note that $\sum_{i=1}^{k} f_i = n$.

Example 1: You measure the body lengths (in inches) of 10 full-term infants at birth and record the following:

17.5, 19.5, 17.5, 19, 20, 21, 18, 19.5, 18, 10.75

Compute the mean length of the infants for these data.

Example 2: Monthly incomes of fourth year regular students are given in the following
frequency distribution.

Monthly income (birr) 54.5 64.5 74.5 84.5 94.5 104.5 114.5
Number of students 6 9 15 25 13 7 5
Compute the mean for these data.
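Both examples can be worked with the formulas above. This is a minimal Python sketch with function names of our own choosing: `mean` applies the raw-data definition to Example 1, and `mean_from_frequency` applies the ungrouped frequency formula to Example 2.

```python
def mean(values):
    """Arithmetic mean: sum of the observations divided by their number."""
    return sum(values) / len(values)


def mean_from_frequency(values, freqs):
    """Mean of an ungrouped frequency distribution: sum(f_i * x_i) / sum(f_i)."""
    return sum(f * x for x, f in zip(values, freqs)) / sum(freqs)


lengths = [17.5, 19.5, 17.5, 19, 20, 21, 18, 19.5, 18, 10.75]   # Example 1
income = [54.5, 64.5, 74.5, 84.5, 94.5, 104.5, 114.5]          # Example 2
students = [6, 9, 15, 25, 13, 7, 5]

print(mean(lengths))
print(mean_from_frequency(income, students))
```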

Arithmetic Mean for Grouped Frequency Distribution

If data are given in the form of a continuous frequency distribution, the sample mean can be computed as

$$\bar{x} = \frac{f_1 m_1 + f_2 m_2 + \cdots + f_k m_k}{f_1 + f_2 + \cdots + f_k} = \frac{\sum_{i=1}^{k} f_i m_i}{\sum_{i=1}^{k} f_i}$$

where $m_i$ is the class mark of the i-th class (i = 1, 2, ..., k), $f_i$ is the frequency of the i-th class and k is the number of classes.

Note that $\sum_{i=1}^{k} f_i = n$, the total number of observations.

Example: The following table gives the daily wages of laborers. Calculate the average daily wage paid to a laborer.

Wages in birr        11-13   13-15   15-17   17-19   19-21   21-23   23-25
Number of laborers   3       4       5       6       6       4       3
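A minimal Python sketch of the grouped formula applied to the wage table (the helper name `grouped_mean` is our own). Each class mark is taken as the average of the two class limits, as defined earlier.

```python
# Wage classes written as (lower, upper) and the number of laborers in each.
classes = [(11, 13), (13, 15), (15, 17), (17, 19), (19, 21), (21, 23), (23, 25)]
laborers = [3, 4, 5, 6, 6, 4, 3]


def grouped_mean(classes, freqs):
    """x-bar = sum(f_i * m_i) / sum(f_i), with m_i the class mark of class i."""
    marks = [(lo + hi) / 2 for lo, hi in classes]
    return sum(f * m for f, m in zip(freqs, marks)) / sum(freqs)


print(round(grouped_mean(classes, laborers), 2))
```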
Properties of the Arithmetic Mean

- The sum of the deviations of the items from their arithmetic mean is zero. That is, the algebraic sum of the deviations of a set of numbers $x_1, x_2, \ldots, x_n$ from their mean $\bar{x}$ is zero:
$$\sum_{i=1}^{n} (x_i - \bar{x}) = 0$$
- The sum of the squares of the deviations of a set of observations from any number, say A, is minimum when $A = \bar{x}$. That is,
$$\sum_{i=1}^{n} (x_i - \bar{x})^2 \le \sum_{i=1}^{n} (x_i - A)^2$$
- When a set of observations is divided into k groups, and $\bar{x}_1$ is the mean of the $n_1$ observations of group 1, $\bar{x}_2$ is the mean of the $n_2$ observations of group 2, ..., $\bar{x}_k$ is the mean of the $n_k$ observations of group k, then the combined mean $\bar{x}_c$ of all observations taken together is given by
$$\bar{x}_c = \frac{n_1 \bar{x}_1 + n_2 \bar{x}_2 + \cdots + n_k \bar{x}_k}{n_1 + n_2 + \cdots + n_k} = \frac{\sum_{i=1}^{k} n_i \bar{x}_i}{\sum_{i=1}^{k} n_i}$$
- If a wrong figure has been used in calculating the mean, we can correct the mean if we know the correct figure that should have been used. Let $X_{wr}$ denote the wrong figure used in calculating the mean, $X_c$ the correct figure that should have been used, and $\bar{X}_{wr}$ the wrong mean calculated using $X_{wr}$. Then the correct mean is
$$\bar{X}_{\text{correct}} = \frac{n \bar{X}_{wr} + X_c - X_{wr}}{n}$$
- If the mean of $x_1, x_2, \ldots, x_n$ is $\bar{x}$, then
a) the mean of $x_1 \pm k, x_2 \pm k, \ldots, x_n \pm k$ is $\bar{x} \pm k$;
b) the mean of $k x_1, k x_2, \ldots, k x_n$ is $k \bar{x}$.


Example 1: Last year there were three sections taking the Stat 273 course at Alemaya University. At the end of the semester, the three sections got average marks of 80, 83 and 76, and there were 28, 32 and 35 students in the sections respectively. Find the mean mark for all the students.

Solution:
$$\bar{x}_c = \frac{n_1 \bar{x}_1 + n_2 \bar{x}_2 + n_3 \bar{x}_3}{n_1 + n_2 + n_3} = \frac{28(80) + 32(83) + 35(76)}{28 + 32 + 35} = \frac{7556}{95} \approx 79.54$$

Example 2: The average weight of 10 students was calculated to be 65 kg, but later it was discovered that one measurement was misread as 40 kg instead of 80 kg. Calculate the corrected average weight.

Solution:
$$\bar{X}_{\text{correct}} = \frac{n \bar{X}_{wr} + X_c - X_{wr}}{n} = \frac{10(65) + 80 - 40}{10} = 69 \text{ kg}$$
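The combined-mean and corrected-mean formulas translate directly into code. A minimal Python sketch, with hypothetical helper names, reproducing the two examples.

```python
def combined_mean(means, sizes):
    """x-bar_c = sum(n_i * x-bar_i) / sum(n_i)."""
    return sum(n * m for n, m in zip(sizes, means)) / sum(sizes)


def corrected_mean(wrong_mean, n, wrong_value, correct_value):
    """Replace one misrecorded observation: (n * wrong_mean + X_c - X_wr) / n."""
    return (n * wrong_mean + correct_value - wrong_value) / n


print(round(combined_mean([80, 83, 76], [28, 32, 35]), 2))   # Example 1
print(corrected_mean(65, 10, 40, 80))                         # Example 2
```

The same `combined_mean` helper also covers the case where one group is removed, by giving that group a negative size; that use is our own extension, not part of the module.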

Exercise: The average score on the mid-term examination of 25 students was 75.8 out of 100.
After the mid-term exam, however, a student whose score was 41 out of 100 dropped the course.
What is the average/mean score among the 24 students?

4.4.1.2 Weighted Arithmetic Mean

In finding the arithmetic mean, all items were assumed to be of equal importance (each value in the data set has equal weight). When the observations have different weights, we use a weighted average. Weights are assigned to each item in proportion to its relative importance.

If $x_1, x_2, \ldots, x_k$ represent the values of the items and $w_1, w_2, \ldots, w_k$ are the corresponding weights, then the weighted mean $\bar{X}_w$ is given by

$$\bar{X}_w = \frac{w_1 x_1 + w_2 x_2 + \cdots + w_k x_k}{w_1 + w_2 + \cdots + w_k} = \frac{\sum_{i=1}^{k} w_i x_i}{\sum_{i=1}^{k} w_i}$$

Example: A student's final marks in Mathematics, Physics, Chemistry and Biology are respectively 82, 80, 90 and 70. If the respective credits received for these courses are 3, 5, 3 and 1, determine the approximate average mark the student has got per course.

Solution: We use a weighted arithmetic mean, the weight associated with each course being the number of credits received for that course.

$x_i$   82   80   90   70
$w_i$   3    5    3    1

Therefore

$$\bar{X}_w = \frac{\sum w_i x_i}{\sum w_i} = \frac{(3 \times 82) + (5 \times 80) + (3 \times 90) + (1 \times 70)}{3 + 5 + 3 + 1} = \frac{986}{12} \approx 82.17$$

The average mark of the student per course is approximately 82.

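Exercise 1: If a student gets an A in a 4 cr. hr course, a B in a 3 cr. hr course and a D in a 2 cr. hr course, what is his GPA for this semester? (Take A = 4, B = 3 and D = 1 grade points.)

Values   4   3   1
Weight   4   3   2

Answer: GPA = (4 × 4 + 3 × 3 + 1 × 2)/(4 + 3 + 2) = 27/9 = 3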
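A minimal Python sketch of the weighted mean (the helper name `weighted_mean` is our own), applied to the course-mark example and to the GPA exercise.

```python
def weighted_mean(values, weights):
    """x-bar_w = sum(w_i * x_i) / sum(w_i)."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)


marks = [82, 80, 90, 70]
credits = [3, 5, 3, 1]
print(round(weighted_mean(marks, credits), 2))   # course-mark example

# GPA exercise: grade points A = 4, B = 3, D = 1 weighted by credit hours.
print(weighted_mean([4, 3, 1], [4, 3, 2]))
```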

4.4.1.3 Geometric Mean:

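It is used when the observed values are measured as ratios, percentages, proportions, indices or growth rates.

$$GM = \sqrt[n]{x_1 \cdot x_2 \cdots x_n}$$

If the observed values $x_1, x_2, \ldots, x_k$ have frequencies $f_1, f_2, \ldots, f_k$, then

$$GM = \sqrt[n]{x_1^{f_1} \cdot x_2^{f_2} \cdots x_k^{f_k}}, \quad n = \sum_{i=1}^{k} f_i$$

Example: Compute the geometric mean of the following values: 2, 8, 6, 4, 10, 6, 8, 4

Solution:

Values        2   4   6   8   10   Total
Frequencies   1   2   2   2   1    8

$$GM = \sqrt[8]{2 \times 4^2 \times 6^2 \times 8^2 \times 10} = \sqrt[8]{737280} \approx 5.41$$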
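A Python sketch of the frequency form of the geometric mean, computed through logarithms; this is mathematically the n-th root of the product but less prone to overflow for long series. The helper name is hypothetical, and the cross-check uses `statistics.geometric_mean` from the standard library (Python 3.8 or later).

```python
import math
import statistics


def geometric_mean_freq(values, freqs):
    """GM = (x1^f1 * ... * xk^fk)^(1/n), evaluated as exp(sum(f_i * ln x_i) / n)."""
    n = sum(freqs)
    return math.exp(sum(f * math.log(x) for x, f in zip(values, freqs)) / n)


gm = geometric_mean_freq([2, 4, 6, 8, 10], [1, 2, 2, 2, 1])
print(round(gm, 2))   # 5.41
# Cross-check on the raw (unsorted) values.
print(round(statistics.geometric_mean([2, 8, 6, 4, 10, 6, 8, 4]), 2))
```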
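4.4.1.4 Harmonic Mean:
It is a suitable measure of central tendency when the data pertain to speeds, rates and time.

$$HM = \frac{n}{\sum_{i=1}^{n} \frac{1}{x_i}} = \frac{n}{\frac{1}{x_1} + \cdots + \frac{1}{x_n}}$$

If the data are arranged in the form of a frequency distribution,

$$HM = \frac{\sum_{i=1}^{k} f_i}{\sum_{i=1}^{k} \frac{f_i}{x_i}} = \frac{f_1 + \cdots + f_k}{\frac{f_1}{x_1} + \cdots + \frac{f_k}{x_k}}$$

Example: A motorist travels 480 km on each of three days. She travels for 10 hours at a rate of 48 km/hr on the 1st day, for 12 hours at a rate of 40 km/hr on the 2nd day and for 15 hours at a rate of 32 km/hr on the 3rd day. What is her average speed?

Since the same distance is covered each day, the average speed is the harmonic mean of the three speeds:

$$HM = \frac{3}{\frac{1}{48} + \frac{1}{40} + \frac{1}{32}} \approx 38.92 \text{ km/hr}$$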
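A Python sketch of the frequency form of the harmonic mean (hypothetical helper name), checked against `statistics.harmonic_mean` and against total distance divided by total time, which must agree here because each day covers the same 480 km.

```python
import statistics


def harmonic_mean_freq(values, freqs):
    """HM = sum(f_i) / sum(f_i / x_i)."""
    return sum(freqs) / sum(f / x for x, f in zip(values, freqs))


speeds = [48, 40, 32]            # km/hr on each of the three days
hm = harmonic_mean_freq(speeds, [1, 1, 1])
print(round(hm, 2))              # 38.92
print(round(statistics.harmonic_mean(speeds), 2))

# Direct check: total distance / total time.
print(round(3 * 480 / (10 + 12 + 15), 2))
```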

Merits of Arithmetic Mean

- Arithmetic mean has a rigidly defined mathematical formula so that its value is always
definite.
- It is calculated based on all observations.
- Arithmetic mean is simple to calculate and easy to understand.
- It doesn’t need arrangement of data in increasing or decreasing order.
- Arithmetic mean is also capable of further algebraic treatment.
- It affords a good standard of comparison.

Drawbacks of Arithmetic Mean

- It is highly affected by extreme (abnormal) values in the series.
- It can be a number that does not exist in the series.
- It sometimes gives results that appear almost absurd. For example, we may get an average of '3.6 children' per family.
- It can't be calculated for open-ended classes.

4.4.2 The Median

The median of a set of items (numbers) arranged in order of magnitude (i.e. in an array) is the middle value, or the arithmetic mean of the two middle values. We shall denote the median of $x_1, x_2, \ldots, x_n$ by $\tilde{x}$. For ungrouped data the median is obtained by

$$\tilde{x} = \begin{cases} x_{\left(\frac{n+1}{2}\right)} & \text{if the number of items, } n, \text{ is odd} \\ \frac{1}{2}\left(x_{\left(\frac{n}{2}\right)} + x_{\left(\frac{n+2}{2}\right)}\right) & \text{if the number of items, } n, \text{ is even} \end{cases}$$

For grouped data the median, obtained by the interpolation method, is given by

$$\tilde{X} = L_{med} + W\left(\frac{\frac{n}{2} - F}{f_{med}}\right)$$

where $L_{med}$ = lower class boundary of the median class,
F = sum of the frequencies of all classes lower than the median class (in other words, the cumulative frequency immediately preceding the median class),
$f_{med}$ = frequency of the median class and W = class width.

The median class is the class with the smallest cumulative frequency greater than or equal to n/2.
2
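Example 1: The birth weights in pounds of five babies born in a hospital on a certain day are 9.2, 6.4, 10.5, 8.1 and 7.8. Find the median weight of these five babies.
Solution: In order, the weights are 6.4, 7.8, 8.1, 9.2, 10.5, so the median is the 3rd item: $\tilde{X} = 8.1$ pounds.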
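A minimal Python sketch of both median formulas (function names are our own). The grouped version locates the median class as the first class whose cumulative frequency reaches n/2 and then interpolates. The grouped call uses the class boundaries and frequencies of Example 1* from the previous chapter purely as an illustration; the module does not work that case.

```python
def ungrouped_median(values):
    """Middle value of the ordered data, or the mean of the two middle values."""
    x = sorted(values)
    n = len(x)
    mid = n // 2
    return x[mid] if n % 2 == 1 else (x[mid - 1] + x[mid]) / 2


def grouped_median(lower_boundaries, width, freqs):
    """Interpolated median: L_med + W * (n/2 - F) / f_med."""
    n = sum(freqs)
    cum = 0                       # F: cumulative frequency before the current class
    for lb, f in zip(lower_boundaries, freqs):
        if cum + f >= n / 2:      # first class whose cumulative frequency reaches n/2
            return lb + width * (n / 2 - cum) / f
        cum += f
    raise ValueError("empty distribution")


print(ungrouped_median([9.2, 6.4, 10.5, 8.1, 7.8]))   # Example 1
print(round(grouped_median([5.5, 11.5, 17.5, 23.5, 29.5, 35.5], 6, [2, 2, 7, 4, 3, 2]), 2))
```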

Exercise: The following table gives the distribution of the weekly wages of employees of a small
firm.

Wages in birr    No. of employees
126 and below    3
127 – 135        5
136 – 144        9
145 – 153        12
154 – 162        5
163 – 171        4
172 and above    2

a) Find the median weekly wage.
b) Why is the median a more suitable measure of central tendency than the mean in this case?
Merits of median

- It is not influenced by extreme values.
- It is rigidly defined, so its value is always definite.
- It can be calculated even in the case of open-ended intervals.
- It can be computed for ratio, interval and ordinal levels of data.
Demerits of median
- It is not capable of further algebraic treatment.
- It is not a good representative of the data if the number of items is small.
- Arranging the items in order of magnitude is sometimes a very tedious process if the number of items is very large.

4.4.3 The Mode

The mode refers to the value in a distribution that occurs most frequently. It is an actual value, which has the highest concentration of items in and around it. According to Croxton and Cowden, “The mode of a distribution is the value at the point around which the items tend to be most heavily concentrated. It may be regarded as the most typical of a series of values”.

The mode or modal value is denoted by $\hat{x}$. Note that the mode may not exist in the series or, even if it does exist, it may not be unique.

For grouped data, the mode is found by the following formula:

$$\hat{x} = L_{mod} + \left(\frac{\Delta_1}{\Delta_1 + \Delta_2}\right) W$$

where $L_{mod}$ = lower class boundary of the modal class,
$\Delta_1$ = the difference between the frequency of the modal class and the frequency of the class immediately preceding the modal class,
$\Delta_2$ = the difference between the frequency of the modal class and the frequency of the class immediately following the modal class,
W = the class width.

The modal class is the class with the highest frequency in the distribution.

Example 1: The marks obtained by ten students in a semester exam in statistics are 70, 65, 68, 70, 75, 73, 80, 70, 83 and 86. Find the mode of the students' marks.
Solution: Mode = 70, since 70 occurs three times, more often than any other mark.

Example 2: Find the mode for the frequency distribution of the birth weight (in kilogram) of 30
children given below.

Weight 1.9-2.3 2.3-2.7 2.7-3.1 3.1-3.5 3.5-3.9 3.9-4.3

No. of children 5 5 9 4 4 3

Solution: 2.7-3.1 is the modal class since it has the highest frequency (9), so $L_{mod} = 2.7$ and $W = 0.4$.
$\Delta_1 = 9 - 5 = 4$ and $\Delta_2 = 9 - 4 = 5$

$$\hat{x} = 2.7 + \left(\frac{4}{4 + 5}\right) \times 0.4 \approx 2.878$$

Merits of mode

- Mode is not affected by extreme values.


- Mode can be calculated even in the case of open-ended intervals, and it is not necessary to know all the observations.
- It can be computed for all levels of data, i.e. ratio, interval, ordinal or nominal.

Demerits of mode

- Mode may not exist in the series and if it exists it may not be a unique value.
- It does not fulfill most of the requirements of a good measure of central tendency

4.3.4 Quantiles

Quantiles are values that divide a data set, arranged in order of magnitude, into a certain number of equal parts. They are measures of position (non-central location) rather than of central tendency. The most common are quartiles, deciles and percentiles.

I. Quartiles: are values which divide the data set into four equal parts, denoted by $Q_1$, $Q_2$ and $Q_3$. The first quartile is also called the lower quartile and the third quartile the upper quartile. The second quartile is the median.
For ungrouped data:
Let $Q_j$ be the $j$th quartile value for $j = 1, 2, 3$. Then

$$Q_j = \left(\frac{j}{4}(n+1)\right)^{\text{th}} \text{ item}; \quad j = 1, 2, 3.$$
Example:
Compute the quartiles for the data given below: 25, 18, 30, 8, 15, 5, 10, 35, 40 and 45
Solution: Arrange the data in ascending order.
5, 8, 10, 15, 18, 25, 30, 35, 40, 45

$$Q_1 = \left(\frac{1}{4}(10+1)\right)^{\text{th}} \text{ item} = (2.75)^{\text{th}} \text{ item} = 2^{\text{nd}} \text{ item} + 0.75\,(3^{\text{rd}} - 2^{\text{nd}}) \text{ item} = 8 + 0.75(10 - 8) = 9.5$$

$$Q_2 = \left(\frac{2}{4}(10+1)\right)^{\text{th}} \text{ item} = (5.5)^{\text{th}} \text{ item} = 5^{\text{th}} \text{ item} + 0.5\,(6^{\text{th}} - 5^{\text{th}}) \text{ item} = 18 + 0.5(25 - 18) = 21.5$$

$$Q_3 = \left(\frac{3}{4}(10+1)\right)^{\text{th}} \text{ item} = (8.25)^{\text{th}} \text{ item} = 8^{\text{th}} \text{ item} + 0.25\,(9^{\text{th}} - 8^{\text{th}}) \text{ item} = 35 + 0.25(40 - 35) = 36.25$$
For grouped data
We can apply the following formula:

$$Q_j = L_{Q_j} + \left(\frac{\frac{jn}{4} - F_{Q_j}}{f_{Q_j}}\right) W; \quad j = 1, 2, 3.$$

Where $Q_j$ = the $j$th quartile we are going to calculate
$L_{Q_j}$ = lower class boundary of the $j$th quartile class
$F_{Q_j}$ = sum of the frequencies of all classes lower than the $j$th quartile class
$f_{Q_j}$ = frequency of the $j$th quartile class, and $W$ = class width

The $j$th quartile class is the class with the smallest cumulative frequency greater than or equal to $\frac{jn}{4}$.

II. Deciles: are values dividing the data into ten equal parts, denoted by $D_1, D_2, \ldots, D_9$. The fifth decile is the median.
For ungrouped data
Let $D_j$ be the $j$th decile value for $j = 1, 2, \ldots, 9$. Then

$$D_j = \left(\frac{j}{10}(n+1)\right)^{\text{th}} \text{ item}; \quad j = 1, 2, \ldots, 9$$
Example: Compute $D_5$ and $D_3$ for the data given below: 5, 24, 36, 12, 20, 8
Solution: First arrange the data in ascending order: 5, 8, 12, 20, 24, 36

$$D_3 = \left(\frac{3}{10}(6+1)\right)^{\text{th}} \text{ item} = (2.1)^{\text{th}} \text{ item} = 2^{\text{nd}} \text{ item} + 0.1\,(3^{\text{rd}} - 2^{\text{nd}}) \text{ item} = 8 + 0.1(12 - 8) = 8.4$$

$$D_5 = \left(\frac{5}{10}(6+1)\right)^{\text{th}} \text{ item} = (3.5)^{\text{th}} \text{ item} = 3^{\text{rd}} \text{ item} + 0.5\,(4^{\text{th}} - 3^{\text{rd}}) \text{ item} = 12 + 0.5(20 - 12) = 16$$
For grouped data
We can apply the following formula:

$$D_j = L_{D_j} + \left(\frac{\frac{jn}{10} - F_{D_j}}{f_{D_j}}\right) W; \quad j = 1, 2, \ldots, 9$$

The symbols are defined in the same way as for quartiles.
The $j$th decile class is the class with the smallest cumulative frequency greater than or equal to $\frac{jn}{10}$.

III. Percentiles: are values which divide the data into one hundred equal parts, denoted by $P_1, P_2, \ldots, P_{99}$. The fiftieth percentile is the median.

For ungrouped data

Let $P_j$ be the $j$th percentile value for $j = 1, 2, 3, \ldots, 99$. Then

$$P_j = \left(\frac{j}{100}(n+1)\right)^{\text{th}} \text{ item}; \quad j = 1, 2, 3, \ldots, 99$$
Example: Calculate $P_{15}$ for the data given below: 5, 24, 36, 12, 20, 8
Solution: Arranging the given values in increasing order: 5, 8, 12, 20, 24, 36

$$P_{15} = \left(\frac{15}{100}(6+1)\right)^{\text{th}} \text{ item} = (1.05)^{\text{th}} \text{ item} = 1^{\text{st}} \text{ item} + 0.05\,(2^{\text{nd}} - 1^{\text{st}}) \text{ item} = 5 + 0.05(8 - 5) = 5.15$$
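The $(j(n+1)/\text{parts})$th-item rule with linear interpolation, used in the quartile, decile and percentile examples above, can be sketched in Python as follows. The name `positional_quantile` is hypothetical, and returning the smallest or largest value when the position falls outside 1..n is an assumption; the text does not cover that case.

```python
def positional_quantile(data, j, parts):
    """j-th quantile when the data are split into `parts` equal parts
    (4 = quartiles, 10 = deciles, 100 = percentiles), using the
    (j*(n+1)/parts)-th item rule with linear interpolation."""
    x = sorted(data)
    n = len(x)
    pos = j * (n + 1) / parts  # 1-based position, possibly fractional
    k = int(pos)               # whole part of the position
    frac = pos - k             # fractional part used for interpolation
    if k < 1:                  # assumption: clamp below the first item
        return x[0]
    if k >= n:                 # assumption: clamp above the last item
        return x[-1]
    return x[k - 1] + frac * (x[k] - x[k - 1])


data1 = [25, 18, 30, 8, 15, 5, 10, 35, 40, 45]
print([positional_quantile(data1, j, 4) for j in (1, 2, 3)])  # [9.5, 21.5, 36.25]

data2 = [5, 24, 36, 12, 20, 8]
print(round(positional_quantile(data2, 3, 10), 2))    # 8.4  (D3)
print(positional_quantile(data2, 5, 10))              # 16.0 (D5)
print(round(positional_quantile(data2, 15, 100), 2))  # 5.15 (P15)
```

Other software often uses different positional rules (for example, interpolating on $(n-1)$ instead of $(n+1)$), so results from a spreadsheet or library may differ slightly from these hand calculations.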
For grouped data
We can use the following formula:

$$P_j = L_{P_j} + \left(\frac{\frac{jn}{100} - F_{P_j}}{f_{P_j}}\right) W; \quad j = 1, 2, 3, \ldots, 99$$

The symbols are defined in the same way as for quartiles.

The $j$th percentile class is the class with the smallest cumulative frequency greater than or equal to $\frac{jn}{100}$.

Interpretations

1. $Q_j$ is the value below which $(j \times 25)$ percent of the observations in the series are found (where $j = 1, 2, 3$). For instance, $Q_3$ is the value below which 75 percent of the observations in the given series are found.

2. $D_j$ is the value below which $(j \times 10)$ percent of the observations in the series are found (where $j = 1, 2, \ldots, 9$). For instance, $D_4$ is the value below which 40 percent of the values in the series are found.

3. $P_j$ is the value below which $j$ percent of the total observations are found (where $j = 1, 2, 3, \ldots, 99$). For example, 73 percent of the observations in a given series are below $P_{73}$.

Summary
A measure of central tendency is a typical value around which other figures congregate. An average is a single value drawn from the group, yet it represents the whole group. Measures of location are among the most widely used summary figures. Five averages are considered: the mean, median and mode are called simple averages, while the geometric mean and harmonic mean are called special averages. The mode can be computed for any data set, qualitative or quantitative. The arithmetic mean cannot be computed for a frequency distribution with an open-ended class, whereas the median and the mode can.

Exercise 4

Indicate whether the statement is true or false.

1. The mean will provide the best measure of central tendency for any possible set of data.
2. Changing the value of a score in a distribution will always change the value of the mean.
3. For any normal distribution, the mean and the median will have the same value.
Identify the choice that best completes the statement or answers the question.
4. A population of scores has ∑ 𝑋 = 60 and a mean of 𝑥̅ = 12. How many scores are in this
population?
a. 5 c. 60
b. 12 d. None

5. One sample of n = 4 scores has a mean of 𝑥̅ = 10, and a second sample of n = 8 scores has a
mean of 𝑥̅ = 20. If the two samples are combined, the mean for the combined sample will be
a. equal to 15 c. less than 15 but more than 10
b. greater than 15 but less than 20 d. None of the other choices is correct.

Assignment 4: (5%)

1. The arithmetic mean of 50 observations was 100. At the time of calculation, two observations, 180 and 90, were wrongly taken as 100 and 10. Find the corrected mean.
2. Complete the following frequency distribution.
Class limit frequency
11-20 12
21-30 30
31-40 𝑓3
41-50 65
51-60 𝑓5
61-70 25
71-80 18
Total 229

If the median value is 46, find


a) the missing frequencies.
b) Calculate the mean, the second quartile and the 75th percentile, and interpret them.
c) Comment on the values of the median and the second quartile.


Chapter 5: Measures of Variation (Dispersion), Skewness and Kurtosis

Objectives
At the end of this unit the learners will be able to:
- explain the objectives of measuring variation
- compute the range, variance, standard deviation, CV and Z-score for raw/summarized data
- identify the shape and peakedness of frequency curves

Contents

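5.1 Introduction and objectives of measuring variation
5.2 Types of measures of dispersion (variation)
5.2.1 Range and relative range
5.2.2 Quartile deviation and coefficient of quartile deviation
5.2.3 Mean deviation and coefficient of mean deviation
5.2.4 Variance, standard deviation and coefficient of variation
5.2.5 Standard scores
5.3 Moments
5.4 Skewness
5.5 Kurtosis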

5.1 Introduction and objectives of measuring Variation

The scatter or spread of the items of a distribution is known as dispersion or variation. In other words, the degree to which numerical data tend to spread about an average value is called the dispersion or variation of the data.

Objectives of measuring Variation:


- To judge the reliability of measures of central tendency
- To measure and control variability itself
- To compare two or more groups of numbers in terms of their variability
- To facilitate further statistical analysis

5.2 Types of Measures of Dispersion

Measures of dispersion/variation may be either absolute or relative. Absolute measures of dispersion are expressed in the same unit of measurement as the original data. These values may be used to compare the variation in two distributions, provided that the variables are in the same units and of roughly the same average size.

A measure of relative dispersion is the ratio of a measure of absolute dispersion to an appropriate measure of central tendency. It is sometimes called a coefficient of dispersion, because the word “coefficient” denotes a pure number (one that is independent of any unit of measurement); a relative dispersion is therefore a unitless quantity. Various measures of dispersion are in use. The most commonly used ones are discussed here in pairs, each absolute measure together with its relative counterpart.

1) Range and relative range


2) Quartile deviation and coefficient of Quartile deviation
3) Mean deviation and coefficient of Mean deviation
4) Standard deviation and coefficient of variation.
5) Variance
6) Standard score (Z-score)

5.2.1 Range (R) and Relative Range (RR)

Range (R)

The range is the largest score minus the smallest score. It is a quick and dirty measure of
variability, although when a test is given back to students they very often wish to know the range
of scores.

$$R = L - S$$

where L = largest observation and S = smallest observation.

Range for grouped data:

If data are given in the form of a continuous frequency distribution, the range is computed as:

$$R = UCL_k - LCL_1$$

where $UCL_k$ is the upper class limit of the last class and $LCL_1$ is the lower class limit of the first class.
This is sometimes expressed as:

$$R = X_k - X_1$$

where $X_k$ is the class mark of the last class and $X_1$ is the class mark of the first class.

Relative Range (RR): it is also sometimes called the Coefficient of Range (CR) and is given by:

$$RR = \frac{L - S}{L + S} = \frac{R}{L + S}$$

This is sometimes expressed as:

$$RR = \frac{x_{max} - x_{min}}{x_{max} + x_{min}} = \frac{R}{x_{max} + x_{min}} \quad \text{for ungrouped data}$$

$$RR = \frac{M_{last} - M_{first}}{M_{last} + M_{first}} = \frac{R}{M_{last} + M_{first}} \quad \text{for grouped data}$$

Example: Find the range and coefficient of range of the data set: 7, 9, 8, 6, 11, 10, 4
Solution: L = 11, S = 4, R = L - S = 11 - 4 = 7 and

$$CR = \frac{L - S}{L + S} = \frac{11 - 4}{11 + 4} = \frac{7}{15} \approx 0.4667$$

Example: If the range and relative range of a series are 4 and 0.25 respectively, what are the values of the smallest and largest observations?
Solutions:

$$R = 4 \Rightarrow L - S = 4 \qquad (1)$$
$$RR = \frac{L - S}{L + S} = 0.25 \Rightarrow L + S = \frac{4}{0.25} = 16 \qquad (2)$$

Solving (1) and (2) simultaneously gives L = 10 and S = 6.
Example: Find the values of the range and relative range for the following frequency distribution, which shows the distribution of the maximum loads supported by a certain number of cables.
Maximum load(in kilo-Newton) Number of cables
93 – 97 2
118 – 122 6
123 – 127 3
128 – 132 1

Solution:

$M_{first} = 95$ kN, $M_{last} = 130$ kN

$$R = M_{last} - M_{first} = 130 \text{ kN} - 95 \text{ kN} = 35 \text{ kN}$$

$$RR = \frac{M_{last} - M_{first}}{M_{last} + M_{first}} = \frac{130 - 95}{130 + 95} = \frac{35}{225} \approx 0.156$$

Merits and Demerits of Range:
Merits:
1. It is simple to understand.
2. It is easy to calculate.
3. In certain types of problems, like quality control, weather forecasts, share price analysis, etc., range is most widely used.
Demerits:
1. It is very much affected by the extreme items.
2. It is based on only two extreme observations.
3. It cannot be calculated from open-end class intervals.
4. It is not suitable for mathematical treatment.
5. It is a very rarely used measure.

5.2.2 The Quartile Deviation (Semi-Interquartile Range, Q.D) and Coefficient of Quartile Deviation (C.Q.D)

Quartile Deviation (Semi-Interquartile Range), Q.D

The interquartile range is the difference between the third and the first quartiles of a set of items, and the semi-interquartile range is half of the interquartile range.

$$Q.D = \frac{Q_3 - Q_1}{2}$$
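Coefficient of Quartile Deviation (C.Q.D)

- Q.D gives the average amount by which the two quartiles differ from the median, since $\frac{(Q_3 - Q_2) + (Q_2 - Q_1)}{2} = \frac{Q_3 - Q_1}{2}$. C.Q.D expresses this amount relative to the average of the two quartiles, so it is a unitless number.

$$C.Q.D = \frac{(Q_3 - Q_1)/2}{(Q_3 + Q_1)/2} = \frac{2 \times Q.D}{Q_3 + Q_1} = \frac{Q_3 - Q_1}{Q_3 + Q_1}$$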

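Example 1: Find the quartile deviation and its coefficient for the following data: 391, 384, 591, 407, 672, 522, 777, 733, 1490, 2488

Solution: Arrange the given values in ascending order: 384, 391, 407, 522, 591, 672, 733, 777, 1490, 2488.

Using the $\left(\frac{j}{4}(n+1)\right)^{\text{th}}$ item rule, $Q_1$ is the 2.75th item and $Q_3$ is the 8.25th item:

Q1 = 391 + 0.75(407 - 391) = 403, Q3 = 777 + 0.25(1490 - 777) = 955.25

$$Q.D = \frac{Q_3 - Q_1}{2} = \frac{955.25 - 403}{2} = 276.125$$

$$C.Q.D = \frac{Q_3 - Q_1}{Q_3 + Q_1} = \frac{955.25 - 403}{955.25 + 403} = \frac{552.25}{1358.25} \approx 0.407$$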
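The same calculation can be reproduced in Python as a sketch. The helper `quartile` is a hypothetical name implementing the $(j(n+1)/4)$th-item rule with linear interpolation; clamping positions outside 1..n to the extreme values is an added assumption.

```python
def quartile(values, j):
    """j-th quartile by the (j*(n+1)/4)-th item rule with linear interpolation.

    Assumption: positions outside 1..n are clamped to the smallest/largest value.
    """
    x = sorted(values)
    n = len(x)
    pos = j * (n + 1) / 4
    k = int(pos)
    if k < 1:
        return x[0]
    if k >= n:
        return x[-1]
    return x[k - 1] + (pos - k) * (x[k] - x[k - 1])


data = [391, 384, 591, 407, 672, 522, 777, 733, 1490, 2488]
q1, q3 = quartile(data, 1), quartile(data, 3)
qd = (q3 - q1) / 2           # quartile deviation
cqd = (q3 - q1) / (q3 + q1)  # coefficient of quartile deviation
print(q1, q3, qd, round(cqd, 3))  # 403.0 955.25 276.125 0.407
```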
Example 2: Compute Q.D and its coefficient for the following distribution.

Values Frequency
140- 150 17
150- 160 29
160- 170 42
170- 180 72
180- 190 84
190- 200 107
200- 210 49
210- 220 34
220- 230 31
230- 240 16
240- 250 12
Solutions:

Using the quartile formula for grouped data, the quartiles are:

Q1 = 174.90, Q2 = 190.23, Q3 = 203.83

$$Q.D = \frac{Q_3 - Q_1}{2} = \frac{203.83 - 174.90}{2} \approx 14.47$$

$$C.Q.D = \frac{2 \times Q.D}{Q_3 + Q_1} = \frac{2 \times 14.47}{203.83 + 174.90} \approx 0.076$$

Remark: Q.D and C.Q.D are based only on the middle 50% of the observations.

5.2.3 The Mean Deviation and Coefficient of Mean Deviation

The Mean Deviation (M.D):

The mean deviation of a set of items is defined as the arithmetic mean of the absolute deviations of the values from a given average. Depending upon the type of average used, we have different mean deviations.

The mean deviation (MD) measures the average deviation of a set of observations about their
central value, generally the mean or the median, ignoring the plus/minus sign of the deviations.

The mean deviation of a sample of n observations $x_1, x_2, \ldots, x_n$ is given as

$$MD = \frac{\sum |x_i - A|}{n}$$

where A is a central measure (the mean, the median or the mode).

In the case of grouped data, the formula for MD becomes

$$MD = \frac{\sum f_i |m_i - A|}{n}$$

where $m_i$ is the class mark of the $i$th class, $f_i$ is the frequency of the $i$th class and $n = \sum f_i$.
a) Mean Deviation about the Mean

Denoted by $M.D(\bar{X})$ and given by

$$M.D(\bar{X}) = \frac{\sum |x_i - \bar{x}|}{n} \quad \text{for ungrouped data}$$

$$M.D(\bar{X}) = \frac{\sum f_i |m_i - \bar{x}|}{n} \quad \text{for a grouped frequency distribution}$$

where $m_i$ is the class mark of the $i$th class, $f_i$ is the frequency of the $i$th class and $n = \sum f_i$.

Steps to calculate $M.D(\bar{X})$:

1. Find the arithmetic mean, $\bar{X}$.
2. Find the deviation of each reading from $\bar{X}$.
3. Find the arithmetic mean of the deviations, ignoring sign.

b) Mean Deviation about the Median

Denoted by $M.D(\tilde{X})$ and given by

$$M.D(\tilde{X}) = \frac{\sum |x_i - \tilde{x}|}{n} \quad \text{for ungrouped data}$$

$$M.D(\tilde{X}) = \frac{\sum f_i |m_i - \tilde{x}|}{n} \quad \text{for a grouped frequency distribution}$$

where $m_i$ is the class mark of the $i$th class, $f_i$ is the frequency of the $i$th class and $n = \sum f_i$.

Steps to calculate $M.D(\tilde{X})$:

1. Find the median, $\tilde{X}$.
2. Find the deviation of each reading from $\tilde{X}$.
3. Find the arithmetic mean of the deviations, ignoring sign.
c) Mean Deviation about the Mode

Denoted by $M.D(\hat{X})$ and given by:

$$M.D(\hat{X}) = \frac{\sum_{i=1}^{n} |X_i - \hat{X}|}{n}$$

For a frequency (grouped) distribution it is given as:

$$M.D(\hat{X}) = \frac{\sum_{i=1}^{k} f_i |m_i - \hat{X}|}{n}$$

Steps to calculate $M.D(\hat{X})$:

1. Find the mode, $\hat{X}$.
2. Find the deviation of each reading from $\hat{X}$.
3. Find the arithmetic mean of the deviations, ignoring sign.
Examples:

1. The following are the numbers of visits made by ten mothers to the local doctor’s surgery: 8, 6, 5, 5, 7, 4, 5, 9, 7, 4
Find the mean deviation about the mean, the median and the mode.

Solutions:
First calculate the three averages:
$\bar{X} = 6$, $\tilde{X} = 5.5$, $\hat{X} = 5$
Then take the absolute deviation of each observation from these averages.

X_i            4    4    5    5    5    6    7    7    8    9    Total
|X_i - 6|      2    2    1    1    1    0    1    1    2    3    14
|X_i - 5.5|    1.5  1.5  0.5  0.5  0.5  0.5  1.5  1.5  2.5  3.5  14
|X_i - 5|      1    1    0    0    0    1    2    2    3    4    14

$$M.D(\bar{X}) = \frac{\sum_{i=1}^{10} |X_i - 6|}{10} = \frac{14}{10} = 1.4$$

In the same way, $M.D(\tilde{X}) = \frac{14}{10} = 1.4$ and $M.D(\hat{X}) = \frac{14}{10} = 1.4$.

Coefficient of Mean Deviation

The coefficient of mean deviation (CMD) is the ratio of the mean deviation of the observations to their appropriate measure of central tendency: the arithmetic mean or the median.

In general, $CMD = \frac{MD}{A}$, where A is a measure of central tendency: the arithmetic mean or the median.

That is, the CMD about the arithmetic mean is given by $CMD = \frac{MD}{\bar{x}}$, where MD is the mean deviation calculated about the arithmetic mean. On the other hand, the CMD about the median is given by $CMD = \frac{MD}{\tilde{x}}$, in which case MD is calculated about the median of the observations.
Example: Calculate the C.M.D about the mean, median and mode for the data in the example above.

Solutions:

$$C.M.D(\bar{X}) = \frac{M.D(\bar{X})}{\bar{X}} = \frac{1.4}{6} \approx 0.233$$

$$C.M.D(\tilde{X}) = \frac{M.D(\tilde{X})}{\tilde{X}} = \frac{1.4}{5.5} \approx 0.255$$

$$C.M.D(\hat{X}) = \frac{M.D(\hat{X})}{\hat{X}} = \frac{1.4}{5} = 0.28$$
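The mean deviations and their coefficients for the doctor-visits data can be reproduced with a small Python sketch. `mean_deviation` is a hypothetical helper; `mean`, `median` and `mode` come from Python's standard `statistics` module.

```python
from statistics import mean, median, mode


def mean_deviation(values, centre):
    """Arithmetic mean of the absolute deviations |x_i - centre|."""
    return sum(abs(x - centre) for x in values) / len(values)


visits = [8, 6, 5, 5, 7, 4, 5, 9, 7, 4]
for label, centre in (("mean", mean(visits)), ("median", median(visits)), ("mode", mode(visits))):
    md = mean_deviation(visits, centre)
    cmd = md / centre  # coefficient of mean deviation
    print(label, centre, round(md, 2), round(cmd, 3))
```

All three mean deviations come out as 1.4, and the coefficients round to 0.233, 0.255 and 0.28, matching the hand calculations.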

Merits and Demerits of Quartile Deviation


Merits:
1. It is simple to understand and easy to calculate.
2. It is not affected by extreme values.
3. It can also be calculated for data with open-ended classes.
Demerits:
1. It is not based on all the items. It is based on two positional values, Q1 and Q3, and ignores the extreme 50% of the items.
2. It is not amenable to further mathematical treatment.
3. It is affected by sampling fluctuations.

5.2.4 The Variance, the Standard Deviation and Coefficient of Variation

The Variance
Variance is the arithmetic mean of the squared deviations of the observations from their arithmetic mean.

Population Variance ($\sigma^2$)
If we divide the variation (the sum of the squared deviations from the mean) by the number of values in the population, we get the population variance. This variance is the "average squared deviation from the mean".

For ungrouped data

$$\sigma^2 = \frac{\sum (x_i - \mu)^2}{N} = \frac{1}{N}\left[\sum x_i^2 - N\mu^2\right]$$

where $\mu$ is the population arithmetic mean and N is the total number of observations in the population.

For grouped data

$$\sigma^2 = \frac{\sum f_i (m_i - \mu)^2}{N} = \frac{1}{N}\left[\sum f_i m_i^2 - N\mu^2\right]$$

where $\mu$ is the population arithmetic mean, $m_i$ is the class mark of the $i$th class, $f_i$ is the frequency of the $i$th class and $N = \sum f_i$.

Sample Variance ($S^2$)
One would expect the sample variance simply to be the population variance with the population mean replaced by the sample mean. However, because the deviations are then measured from the sample mean, which is computed from the same data, dividing by n tends to underestimate the population variance. To counteract this, the sum of the squares of the deviations is divided by one less than the sample size.

For ungrouped data

$$S^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} = \frac{1}{n - 1}\left[\sum x_i^2 - n\bar{x}^2\right]$$

where $\bar{x}$ is the sample arithmetic mean and n is the total number of observations in the sample.

For grouped data

$$S^2 = \frac{\sum f_i (m_i - \bar{x})^2}{n - 1} = \frac{1}{n - 1}\left[\sum f_i m_i^2 - n\bar{x}^2\right]$$

where $\bar{x}$ is the sample arithmetic mean, $m_i$ is the class mark of the $i$th class, $f_i$ is the frequency of the $i$th class and $n = \sum f_i$.

Standard Deviation

There is a problem with variances. Recall that the deviations were squared, which means that the units were also squared. To get back to the same units as the original data values, the square root must be taken.
Standard deviation is the positive square root of the variance.

Population Standard Deviation ($\sigma$)

$\sigma = \sqrt{\sigma^2}$, where $\sigma^2$ is the population variance.

Sample Standard Deviation ($S$)

$S = \sqrt{S^2}$, where $S^2$ is the sample variance.

Examples: Find the variance and standard deviation of the following sample data
1. 5, 17, 12, 10.
2. The data are given in the form of frequency distribution.
Class Frequency
40-44 7
45-49 10
50-54 22
55-59 15
60-64 12
65-69 6
70-74 3
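Solutions:

1. $\bar{X} = 11$

X_i              5    10   12   17   Total
(X_i - X̄)²       36   1    1    36   74

$$S^2 = \frac{\sum_{i=1}^{4} (X_i - \bar{X})^2}{n - 1} = \frac{74}{3} \approx 24.67$$

$$S = \sqrt{S^2} = \sqrt{24.67} \approx 4.97$$

2. $\bar{X} = 55$ (with $n = \sum f_i = 75$)

m_i (class mark)     42     47    52    57    62    67    72    Total
f_i (m_i - X̄)²       1183   640   198   60    588   864   867   4400

$$S^2 = \frac{\sum_{i=1}^{7} f_i (m_i - \bar{X})^2}{n - 1} = \frac{4400}{74} \approx 59.46$$

$$S = \sqrt{S^2} = \sqrt{59.46} \approx 7.71$$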
Properties of the Variance and the Standard Deviation

Variance

- It removes most of the drawbacks of the measures of dispersion discussed so far.
- Its unit is the square of the unit of measurement of the values. For example, if the variable is measured in kg, the unit of variance is kg².
- It is calculated based on all the observations in the series.
- It gives more weight to extreme values and less to those which are near the mean.
Standard Deviation

- It is considered to be the best measure of dispersion.
- Demerit: if two series have different units of measurement, we cannot compare their variability just by comparing the values of their respective standard deviations.
- It is calculated based on all the observations in the series, and it is capable of further algebraic treatment.
- Standard deviation is, as such, neither easy to calculate nor easy to understand.
- Like the variance, standard deviation gives more weight to extreme values and less to those which are near the mean.
Special Properties of Standard Deviations

1. $\frac{\sum (X_i - \bar{X})^2}{n - 1} < \frac{\sum (X_i - A)^2}{n - 1}$ for any $A \neq \bar{X}$; that is, the sum of squared deviations is smallest when taken about the mean.
2. For a normal (bell-shaped, symmetric) distribution, the following hold:
- Approximately 68.27% of the data values fall within one standard deviation of the mean, i.e. within $(\bar{X} - S, \bar{X} + S)$.
- Approximately 95.45% of the data values fall within two standard deviations of the mean, i.e. within $(\bar{X} - 2S, \bar{X} + 2S)$.
- Approximately 99.73% of the data values fall within three standard deviations of the mean, i.e. within $(\bar{X} - 3S, \bar{X} + 3S)$.
3. Chebyshev's Theorem
For any data set, no matter what the pattern of variation, the proportion of the values that fall within k standard deviations of the mean, i.e. within $(\bar{X} - kS, \bar{X} + kS)$, will be at least $1 - \frac{1}{k^2}$, where k is any number greater than 1. Equivalently, the proportion of items falling beyond k standard deviations of the mean is at most $\frac{1}{k^2}$.
Example: Suppose a distribution has mean 50 and standard deviation 6. What percent of the numbers are:
a) Between 38 and 62
b) Between 32 and 68
c) Less than 38 or more than 62.
d) Less than 32 or more than 68.

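Solutions:

a) 38 and 62 are at equal distance from the mean, 50, and this distance is 12:
50 - 38 = 62 - 50 = 12, so kS = 12 and k = 12/S = 12/6 = 2.
Applying the above theorem, at least $\left(1 - \frac{1}{k^2}\right) \times 100\% = \left(1 - \frac{1}{4}\right) \times 100\% = 75\%$ of the numbers lie between 38 and 62.
b) Similarly, 50 - 32 = 68 - 50 = 18, so k = 18/6 = 3, and at least $\left(1 - \frac{1}{9}\right) \times 100\% \approx 88.9\%$ of the numbers lie between 32 and 68.
c) This is just the complement of a), i.e. at most $\frac{1}{k^2} \times 100\% = 25\%$ of the numbers are less than 38 or more than 62.
d) This is the complement of b): at most $\frac{1}{9} \times 100\% \approx 11.1\%$ of the numbers are less than 32 or more than 68.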
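Chebyshev's bound for an interval that is symmetric about the mean can be sketched in Python as below. The function name is hypothetical, and rejecting non-symmetric intervals or $k \le 1$ reflects the conditions of the theorem as stated above.

```python
def chebyshev_min_proportion(mean, sd, low, high):
    """Chebyshev lower bound 1 - 1/k**2 for the share of values in (low, high).

    Assumes the interval is symmetric about the mean and k > 1.
    """
    k = (high - mean) / sd
    if abs((mean - low) - (high - mean)) > 1e-9 or k <= 1:
        raise ValueError("need an interval symmetric about the mean with k > 1")
    return 1 - 1 / k ** 2


print(chebyshev_min_proportion(50, 6, 38, 62))            # 0.75
print(round(chebyshev_min_proportion(50, 6, 32, 68), 4))  # 0.8889
# The complements give the "at most" shares outside each interval
print(round(1 - chebyshev_min_proportion(50, 6, 32, 68), 4))  # 0.1111
```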

4. If the standard deviation of $X_1, X_2, \ldots, X_n$ is S, and k and a are constants, then the standard deviation of

a) $X_1 + k, X_2 + k, \ldots, X_n + k$ will also be S
b) $kX_1, kX_2, \ldots, kX_n$ will be $|k|S$
c) $a + kX_1, a + kX_2, \ldots, a + kX_n$ will be $|k|S$
Coefficient of Variation (C.V)

- The standard deviation is an absolute measure of dispersion. The corresponding relative measure is known as the coefficient of variation (CV).
- The coefficient of variation is used in problems where we want to compare the variability of two or more different series. It is the ratio of the standard deviation to the arithmetic mean, usually expressed in percent:

$$CV = \frac{S}{\bar{x}} \times 100$$

where S is the standard deviation of the observations.
- A distribution with a smaller coefficient of variation is said to be less variable, or more consistent, uniform or homogeneous.
Example: Last semester, the students of the Biology and Chemistry Departments took the Stat 273 course. At the end of the semester, the following information was recorded.

Department Biology Chemistry


Mean score 79 64
Standard deviation 23 11

Compare the relative dispersions of the two departments’ scores using the appropriate measure.
Solution:

Biology Department: $CV = \frac{S}{\bar{x}} \times 100 = \frac{23}{79} \times 100 \approx 29.11\%$

Chemistry Department: $CV = \frac{S}{\bar{x}} \times 100 = \frac{11}{64} \times 100 \approx 17.19\%$

Interpretation: Since the CV of Biology Department students is greater than that of Chemistry Department students, there is more dispersion relative to the mean in the distribution of Biology students’ scores than in that of Chemistry students’ scores.
Example: An analysis of the monthly wages paid (in Birr) to workers in two firms, A and B, belonging to the same industry gives the following results:
Value Firm A Firm B
Mean wage 52.5 47.5
Median wage 50.5 45.5
Variance 100 121
In which firm, A or B, is there greater variability in individual wages?

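Solutions:
Calculate the coefficient of variation for both firms, using $S_A = \sqrt{100} = 10$ and $S_B = \sqrt{121} = 11$:

$$C.V_A = \frac{S_A}{\bar{X}_A} \times 100 = \frac{10}{52.5} \times 100 \approx 19.05\%$$

$$C.V_B = \frac{S_B}{\bar{X}_B} \times 100 = \frac{11}{47.5} \times 100 \approx 23.16\%$$

Since $C.V_A < C.V_B$, there is greater variability in individual wages in firm B.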
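The two CV comparisons reduce to one formula. The Python sketch below uses a hypothetical helper name. For the firms, the table reports variances, so the sketch takes square roots first.

```python
import math


def coefficient_of_variation(sd, mean):
    """CV in percent: standard deviation divided by the arithmetic mean, times 100."""
    return sd / mean * 100


print(round(coefficient_of_variation(23, 79), 2))  # 29.11  (Biology)
print(round(coefficient_of_variation(11, 64), 2))  # 17.19  (Chemistry)

# Firms A and B: the table gives variances, so take square roots first
print(round(coefficient_of_variation(math.sqrt(100), 52.5), 2))  # 19.05
print(round(coefficient_of_variation(math.sqrt(121), 47.5), 2))  # 23.16
```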

5.2.5 Standard Scores (Z-scores)

A standard score is a measure that describes the relative position of a single score in the entire distribution of scores in terms of the mean and standard deviation. It gives the number of standard deviations by which a particular observation lies above or below the mean.

Population standard score: $Z = \frac{x - \mu}{\sigma}$, where x is the value of the observation, and $\mu$ and $\sigma$ are the mean and standard deviation of the population respectively.

Sample standard score: $Z = \frac{x - \bar{x}}{S}$, where x is the value of the observation, and $\bar{x}$ and S are the mean and standard deviation of the sample respectively.

Interpretation:

If Z is positive, the observation lies above the mean.
If Z is negative, the observation lies below the mean.
If Z is zero, the observation is equal to the mean.

Example: Two sections were given an exam in a course. The average score was 72 with standard
deviation of 6 for section 1 and 85 with standard deviation of 5 for section 2. Student A from
section 1 scored 84 and student B from section 2 scored 90. Who performed better relative to
his/her group?

Solution: Section 1: $\bar{x}_1 = 72$, $S_1 = 6$, and the score of student A from Section 1 is $x_A = 84$.
Section 2: $\bar{x}_2 = 85$, $S_2 = 5$, and the score of student B from Section 2 is $x_B = 90$.

Z-score of student A: $Z = \frac{x_A - \bar{x}_1}{S_1} = \frac{84 - 72}{6} = 2.00$

Z-score of student B: $Z = \frac{x_B - \bar{x}_2}{S_2} = \frac{90 - 85}{5} = 1.00$

From these two standard scores, we can conclude that student A performed better relative to his/her section, because his/her score is two standard deviations above the mean score of section 1, while the score of student B is only one standard deviation above the mean score of section 2.
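The comparison in this example is simple to reproduce in Python. The sketch below uses a hypothetical helper `z_score` that implements the sample standard score formula.

```python
def z_score(x, mean, sd):
    """Number of standard deviations by which x lies above (+) or below (-) the mean."""
    return (x - mean) / sd


z_a = z_score(84, 72, 6)  # student A, section 1
z_b = z_score(90, 85, 5)  # student B, section 2
print(z_a, z_b)  # 2.0 1.0
print("A" if z_a > z_b else "B", "performed better relative to their own section")
```

The same call with (90, 78, 6) and (95, 90, 5) reproduces the MARU and HANA comparison in the next example.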

Example: Two sections were given an introduction to statistics examination. The following information was recorded.

Value Section 1 Section 2


Mean 78 90
Stan. Deviation 6 5
Student MARU from section 1 scored 90 and student HANA from section 2 scored 95.
Relatively speaking who performed better?

Solutions:

Calculate the standard score of both students.

$$Z_{MARU} = \frac{X_A - \bar{X}_1}{S_1} = \frac{90 - 78}{6} = 2$$

$$Z_{HANA} = \frac{X_B - \bar{X}_2}{S_2} = \frac{95 - 90}{5} = 1$$

Student MARU performed better relative to his section, because MARU's score is two standard deviations above the mean score of section 1, while HANA's score is only one standard deviation above the mean score of section 2.

5.3 Moments

If X is a variable that assumes the values $X_1, X_2, \ldots, X_n$, then

1. The rth moment (about the origin) is defined as:

$$\overline{X^r} = \frac{X_1^r + X_2^r + \cdots + X_n^r}{n} = \frac{\sum_{i=1}^{n} X_i^r}{n}$$

- For a frequency distribution this is expressed as:

$$\overline{X^r} = \frac{\sum_{i=1}^{k} f_i X_i^r}{n}, \quad n = \sum f_i$$

- If r = 1, it is the simple arithmetic mean; this is called the first moment.
- This is sometimes called the moment about the origin.

2. The rth moment about the mean (the rth central moment)
Denoted by $M_r$ and defined as:

$$M_r = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^r}{n} = \frac{n - 1}{n} \cdot \frac{\sum_{i=1}^{n} (X_i - \bar{X})^r}{n - 1}$$

For a frequency distribution this is expressed as:

$$M_r = \frac{\sum_{i=1}^{k} f_i (X_i - \bar{X})^r}{n}$$

If r = 2, it is the population variance; this is called the second central moment. If we assume $n - 1 \approx n$, it is also approximately the sample variance.

3. The rth moment about any number A
- Denoted by $M_r'$ and defined as:

$$M_r' = \frac{\sum_{i=1}^{n} (X_i - A)^r}{n}$$

- For a frequency distribution this is expressed as:

$$M_r' = \frac{\sum_{i=1}^{k} f_i (X_i - A)^r}{n}$$

Example:

1. Find the first two moments (about the origin) for the following set of numbers: 2, 3, 7
2. Find the first three central moments of the numbers in problem 1.
3. Find the third moment about the number 3 of the numbers in problem 1.

Solutions:
1. Use the rth moment formula, $\overline{X^r} = \frac{\sum_{i=1}^{n} X_i^r}{n}$:

$$\overline{X^1} = \frac{2 + 3 + 7}{3} = 4, \qquad \overline{X^2} = \frac{2^2 + 3^2 + 7^2}{3} = \frac{62}{3} \approx 20.67$$

2. Use the rth central moment formula, $M_r = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^r}{n}$, with $\bar{X} = 4$:

$$M_1 = \frac{(2 - 4) + (3 - 4) + (7 - 4)}{3} = 0$$

$$M_2 = \frac{(2 - 4)^2 + (3 - 4)^2 + (7 - 4)^2}{3} = \frac{14}{3} \approx 4.67$$

$$M_3 = \frac{(2 - 4)^3 + (3 - 4)^3 + (7 - 4)^3}{3} = \frac{18}{3} = 6$$

3. Use the rth moment about A, $M_r' = \frac{\sum_{i=1}^{n} (X_i - A)^r}{n}$, with A = 3:

$$M_3' = \frac{(2 - 3)^3 + (3 - 3)^3 + (7 - 3)^3}{3} = \frac{63}{3} = 21$$
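The three kinds of moment differ only in the point the deviations are measured from. The Python sketch below makes this explicit. The function names are hypothetical, and all of them divide by n, as the formulas above do.

```python
def raw_moment(xs, r):
    """r-th moment about the origin: sum(x**r) / n."""
    return sum(x ** r for x in xs) / len(xs)


def moment_about(xs, r, a):
    """r-th moment about an arbitrary number a: sum((x - a)**r) / n."""
    return sum((x - a) ** r for x in xs) / len(xs)


def central_moment(xs, r):
    """r-th moment about the mean (a = arithmetic mean)."""
    return moment_about(xs, r, raw_moment(xs, 1))


xs = [2, 3, 7]
print(raw_moment(xs, 1), round(raw_moment(xs, 2), 2))          # 4.0 20.67
print(central_moment(xs, 1), round(central_moment(xs, 2), 2),
      central_moment(xs, 3))                                   # 0.0 4.67 6.0
print(moment_about(xs, 3, 3))                                  # 21.0
```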

5.4 Skewness

Skewness is the degree of asymmetry, or departure from symmetry, of a distribution. A skewed frequency distribution is one that is not symmetrical. Skewness is concerned with the shape of the curve, not its size.
A distribution is said to be symmetric if it can be folded along a vertical axis so that the two sides coincide.
In a symmetrical distribution, the values are spread in the same way below and above the mean, and (for a unimodal distribution) the mean, median and mode are equal.
A distribution that lacks symmetry with respect to a vertical axis is said to be skewed.
If the frequency curve (smoothed frequency polygon) of a distribution has a longer tail to the right of the central maximum than to the left, the distribution is said to be skewed to the right, or to have positive skewness. If it has a longer tail to the left of the central maximum than to the right, it is said to be skewed to the left, or to have negative skewness.
For a moderately skewed distribution, the following approximate relation holds among the three commonly used measures of central tendency:

$$\text{Mean} - \text{Mode} \approx 3 \times (\text{Mean} - \text{Median})$$
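[Figure: symmetrical ($\bar{x} = \tilde{x} = \hat{x}$), positively skewed ($\hat{x} < \tilde{x} < \bar{x}$) and negatively skewed ($\bar{x} < \tilde{x} < \hat{x}$) frequency curves]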

Measures of Skewness

Denoted by $\alpha_3$. There are various measures of skewness.

1. The Pearsonian coefficient of skewness

$$\alpha_3 = \frac{\text{Mean} - \text{Mode}}{\text{Standard deviation}} = \frac{\bar{X} - \hat{X}}{S}$$

2. Bowley’s coefficient of skewness (coefficient of skewness based on quartiles)

$$\alpha_3 = \frac{(Q_3 - Q_2) - (Q_2 - Q_1)}{Q_3 - Q_1} = \frac{Q_3 + Q_1 - 2Q_2}{Q_3 - Q_1}$$

3. The moment coefficient of skewness

$$\alpha_3 = \frac{M_3}{M_2^{3/2}} = \frac{M_3}{\left(\sqrt{M_2}\right)^3} = \frac{M_3}{\sigma^3}$$

where $\sigma$ is the population standard deviation. The shape of the curve is determined by the value of $\alpha_3$:

- If $\alpha_3 > 0$, the distribution is positively skewed.
- If $\alpha_3 = 0$, the distribution is symmetric.
- If $\alpha_3 < 0$, the distribution is negatively skewed.

5.5 Kurtosis

It is the degree of peakedness of the curve of a frequency distribution. The peakedness or


condensation of a distribution can be any of the three different sizes.

Mesokurtic (normal curve): If the frequency distribution is unimodal and if the curve is bell
shaped and symmetrical.

Leptokurtic: If the frequency distribution is more peaked and narrow topped than normal i.e.
large numbers of observations have high frequency

Platykurtic: If the frequency distribution is less peaked and flat topped than normal i.e. large
numbers of observations have low frequency.

Measures of Kurtosis
The moment coefficient of kurtosis:

Denoted by $\alpha_4$ and given by:

$$\alpha_4 = \frac{M_4}{M_2^2} = \frac{M_4}{\sigma^4}$$

Where: $M_4$ is the fourth moment about the mean,
$M_2$ is the second moment about the mean,
$\sigma$ is the population standard deviation.

The peakedness depends on the value of $\alpha_4$:

- If $\alpha_4 > 3$, the curve is leptokurtic.
- If $\alpha_4 = 3$, the curve is mesokurtic.
- If $\alpha_4 < 3$, the curve is platykurtic.

[Figure: leptokurtic, mesokurtic and platykurtic curves]
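The moment coefficient of kurtosis and its classification against 3 can be sketched in Python as follows. The function names are hypothetical. Because floating-point results are rarely exactly 3, the small tolerance `tol` used for the "mesokurtic" case is an added assumption.

```python
def moment_kurtosis(m2, m4):
    """alpha_4 = M4 / M2**2; the value for a normal curve is 3."""
    return m4 / m2 ** 2


def kurtosis_type(alpha4, tol=1e-9):
    """Classify a curve by comparing alpha_4 with 3 (tol is an assumed tolerance)."""
    if alpha4 > 3 + tol:
        return "leptokurtic"
    if alpha4 < 3 - tol:
        return "platykurtic"
    return "mesokurtic"


a4 = moment_kurtosis(16, 162)  # M2 = 16, M4 = 162 from the example below
print(round(a4, 2), kurtosis_type(a4))  # 0.63 platykurtic
```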

Examples

1. If the first four central moments of a distribution are:

$M_1 = 0$, $M_2 = 16$, $M_3 = -60$, $M_4 = 162$

a) Compute a measure of skewness and give your interpretation.
b) Compute a measure of kurtosis and give your interpretation.

Solutions:
a) $$\alpha_3 = \frac{M_3}{M_2^{3/2}} = \frac{-60}{16^{3/2}} = \frac{-60}{64} \approx -0.94 < 0$$
Hence the distribution is negatively skewed.
b) $$\alpha_4 = \frac{M_4}{M_2^2} = \frac{162}{16^2} = \frac{162}{256} \approx 0.63 < 3$$
Hence the curve is platykurtic.
2. Suppose the mean, the mode and the standard deviation of a certain distribution are 32, 30.5 and 10 respectively. What is the shape of the curve representing the distribution?
Solutions:
Use the Pearsonian coefficient of skewness:

$$\alpha_3 = \frac{\text{Mean} - \text{Mode}}{\text{Standard deviation}} = \frac{32 - 30.5}{10} = 0.15$$

Since $\alpha_3 > 0$, the distribution is positively skewed.
3. In a frequency distribution, the coefficient of skewness based on the quartiles is 0.5. If the sum of the upper and lower quartiles is 28 and the median is 11, find the values of the upper and lower quartiles.
Solution:
Given: α3 = 0.5, median = Q2 = 11 and
$$Q_1 + Q_3 = 28 \qquad (*)$$
Required: Q1 and Q3.
$$\alpha_3 = \frac{(Q_3 - Q_2) - (Q_2 - Q_1)}{Q_3 - Q_1} = \frac{Q_3 + Q_1 - 2Q_2}{Q_3 - Q_1} = 0.5$$
Substituting the given values gives (28 − 22)/(Q3 − Q1) = 0.5, so
$$Q_3 - Q_1 = 12 \qquad (**)$$
Solving (*) and (**) simultaneously, we obtain Q1 = 8 and Q3 = 20.
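Examples 2 and 3 can be verified with a short Python sketch. The helper names below are hypothetical; the quartile-based coefficient is the one used above (often called Bowley's coefficient), and `quartiles_from` simply solves the pair of equations (*) and (**).

```python
def pearson_skewness(mean, mode, sd):
    """Pearsonian coefficient: (mean - mode) / standard deviation."""
    return (mean - mode) / sd


def quartile_skewness(q1, q2, q3):
    """Quartile coefficient: ((Q3 - Q2) - (Q2 - Q1)) / (Q3 - Q1)."""
    return (q3 + q1 - 2 * q2) / (q3 - q1)


def quartiles_from(coef, q1_plus_q3, median):
    """Recover Q1 and Q3 from the quartile coefficient, Q1 + Q3 and the median.

    coef = (Q1 + Q3 - 2*median) / (Q3 - Q1)  =>  Q3 - Q1 = (Q1 + Q3 - 2*median) / coef
    """
    q3_minus_q1 = (q1_plus_q3 - 2 * median) / coef
    q3 = (q1_plus_q3 + q3_minus_q1) / 2
    q1 = (q1_plus_q3 - q3_minus_q1) / 2
    return q1, q3


# Example 2: mean 32, mode 30.5, standard deviation 10
sk = pearson_skewness(32, 30.5, 10)
# Example 3: coefficient 0.5, Q1 + Q3 = 28, median 11
q1, q3 = quartiles_from(0.5, 28, 11)
print(sk, q1, q3, quartile_skewness(q1, 11, q3))
```

Plugging the recovered quartiles back into `quartile_skewness` is a cheap consistency check on the algebra.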

Summary

A measure of dispersion, or simply dispersion, may be defined as a statistic signifying the extent of the scatter of items around a measure of central tendency. A measure of dispersion may be expressed in an “absolute form” or in a “relative form”. It is in absolute form when it states the actual amount by which the values of the items deviate, on average, from a measure of central tendency. Absolute measures are expressed in concrete units, i.e. the units in which the data have been expressed. The range is the crudest measure of dispersion. By far the most widely used and most useful measure of dispersion is the standard deviation, or root mean square deviation about the mean. Skewness is concerned with the shape of the curve, while the measure of kurtosis shows the degree to which the curve is more peaked or more flat-topped than the normal curve.

Exercises 5

3. The monthly salaries of eighteen workers in a certain paint factory are given below.

Frequency   2     6     2     1     1     1     2     1     1     1
Salary      462   480   534   624   498   552   606   588   516   570

a. Find the range and relative range.
b. Find the variance and standard deviation.
c. Find the mean deviation and coefficient of MD about the mode.
4. The scores on a special test of knowledge of wood refinishing have a mean of 53 and a standard deviation of 6. Find the range of values within which at least 75% of the scores will lie.
5. The mean and the standard deviation of a set of numbers are respectively 500 and 10.
a. If 10 is added to each of the numbers in the set, what will be the variance and standard deviation of the new set?
b. If each of the numbers in the set is multiplied by −5, what will be the variance and standard deviation of the new set?

Assignment 5: (5%)
1. Two groups of people were trained to perform a certain task and tested to find out which group learns the task faster. The following information was given for the two groups:

Value                 Group one   Group two
Mean                  10.4 min    11.9 min
Standard deviation    1.2 min     1.3 min

Relatively speaking:
a) Which group is more consistent in its performance?
b) Suppose person A from group one takes 9.2 minutes while person B from group two takes 9.3 minutes. Who was faster in performing the task? Why?
3. For a moderately skewed frequency distribution, the mean is 10 and the median is 8.5. If the coefficient of variation is 20%, find the Pearsonian coefficient of skewness and the probable mode of the distribution.
4. The sum of fifteen observations, whose mode is 8, was found to be 150, with a coefficient of variation of 20%.
(a) Calculate the Pearsonian coefficient of skewness and give an appropriate conclusion.
(b) What is the shape of this distribution?
(c) If a constant k is added to each observation, what will be the new Pearsonian coefficient of skewness? Show your steps. What do you conclude from this?
Reference

• Bluman, A.G. (1995). Elementary statistics (2nd edition).
• Gupta, C.P. (2004). Introduction to statistical methods (9th edition).
• Freund, J.E. and Simon, G.A. (1998). Modern Elementary statistics (9th edition).
Chapter 6: Simple Linear Regression and Correlation

6.1 Introduction

In this chapter we study relationships between variables using two closely related techniques, correlation and linear regression, for investigating the linear association between two continuous variables. Correlation measures the closeness of the association, while linear regression gives the equation of the straight line that best describes it and enables the prediction of one variable from the other. For example, how does consumption change as family income changes? Is there a relation between expenditure and income? In a community, what is the relation between the socio-economic status of residents and the extent to which health care is available? All these questions concern the relationship between two variables, each measured on the same units of observation, be they animals, patients, or communities. Correlation and regression are the statistical techniques for investigating such relationships.

6.2 Scatter Plot

Scatter diagram (plot): This is the simplest method of studying the relationship between two variables diagrammatically. It is a two-dimensional plot of a sample of bivariate observations, i.e. of all the ordered pairs (x, y) on the coordinate plane. The diagram is an important aid in assessing what type of relationship links the two variables, and in particular whether the relationship between them is best described by a straight line.

6.3 Simple Correlation Analysis

Correlation analysis is a statistical technique that can be used to describe the degree to which one variable is linearly related to another variable.
Correlation is the method of analysis to use when studying the possible association between two continuous variables. The degree of association can be measured by calculating the correlation coefficient. The standard method (Pearson correlation) leads to a quantity called r, which can take any value from −1 to +1. The correlation coefficient r measures the degree of 'straight-line' association between the values of two variables. Thus a value of +1.0 or −1.0 is obtained if all the points in a scatter plot lie on a perfect straight line (see Figure 9.1).
The correlation between two variables is positive if higher values of one variable are associated with higher values of the other, and negative if one variable tends to be lower as the other gets higher. A value r = 0 implies that there is no linear relationship between the two variables, but there could still be a non-linear relationship between them. In other words, if two variables are independent then r = 0, but r = 0 does not necessarily mean that the variables are independent; it only means that they are not linearly related.

What are we measuring with r? In essence r is a measure of the scatter of the points around an
underlying linear trend: the greater the spread of the points the lower the correlation. The
correlation coefficient usually calculated is called Pearson's r or the 'product-moment' correlation
coefficient (other coefficients are used for ranked data, etc.).

If we have two variables x and y, the correlation between them, denoted by r(x, y) or simply r, is given by

$$r = \frac{\operatorname{Cov}(x, y)}{sd(x)\, sd(y)} = \frac{\frac{\sum (x - \bar{x})(y - \bar{y})}{n - 1}}{\sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}\sqrt{\frac{\sum (y - \bar{y})^2}{n - 1}}} = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \sum (y - \bar{y})^2}} = \frac{\sum xy - \frac{\sum x \sum y}{n}}{\sqrt{\left(\sum x^2 - \frac{(\sum x)^2}{n}\right)\left(\sum y^2 - \frac{(\sum y)^2}{n}\right)}}$$

The numerator is termed the sum of products of x and y, SPxy. In the denominator, the first term is called the sum of squares of x, SSx, and the second term is called the sum of squares of y, SSy. Thus,

$$r = \frac{SP_{xy}}{\sqrt{SS_x\, SS_y}}$$
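The SP/SS form of r translates directly into code. Below is a minimal Python sketch (function names are illustrative); the x values and the three y series are hypothetical, chosen only to show the two extreme values of r and the earlier caution that r = 0 does not rule out an exact non-linear relationship.

```python
import math


def sp_ss(x, y):
    """Return (SPxy, SSx, SSy) computed from deviations about the means."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sp_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    ss_x = sum((xi - x_bar) ** 2 for xi in x)
    ss_y = sum((yi - y_bar) ** 2 for yi in y)
    return sp_xy, ss_x, ss_y


def pearson_r(x, y):
    """Pearson's r = SPxy / sqrt(SSx * SSy)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    sp_xy, ss_x, ss_y = sp_ss(x, y)
    return sp_xy / math.sqrt(ss_x * ss_y)


# Hypothetical points, chosen only to illustrate the extreme cases
x = [-2, -1, 0, 1, 2]
rising = [3 * v + 1 for v in x]      # exactly on an increasing straight line
falling = [-2 * v for v in x]        # exactly on a decreasing straight line
parabola = [v ** 2 for v in x]       # exact, but non-linear, dependence on x
print(pearson_r(x, rising), pearson_r(x, falling), pearson_r(x, parabola))
```

For the parabola, y is completely determined by x, yet SPxy is zero because the positive and negative products cancel.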

Interpretation of r
1. Perfect positive linear relationship (if r = 1)
2. Perfect negative linear relationship (if r = −1)
3. Some positive linear relationship (if r is between 0 and 1)
4. No linear relationship (if r = 0)
5. Some negative linear relationship (if r is between −1 and 0)
Uses of correlation:

1. It is used in physical and social sciences.

2. It is useful for economists studying the relationship between variables like price and quantity, and for businesses estimating costs, sales, prices etc.

3. It is helpful in measuring the degree of relationship between variables like income and expenditure, price and supply, supply and demand etc.

4. Sampling error can be calculated.

5. It is the basis for the concept of regression.

Example 9.1: Compute Pearson’s correlation coefficient for the following data on the age (X) and blood pressure (Y) of six people.

Age (X)   Blood pressure (Y)   XY       X²      Y²
43        128                  5504     1849    16384
48        120                  5760     2304    14400
56        135                  7560     3136    18225
61        143                  8723     3721    20449
67        141                  9447     4489    19881
70        152                  10640    4900    23104
Total     345       819        47634    20399   112443

Here n = 6, ΣX = 345, ΣY = 819, ΣXY = 47634, ΣX² = 20399 and ΣY² = 112443, so

$$r = \frac{n\sum XY - (\sum X)(\sum Y)}{\sqrt{\left[n\sum X^2 - (\sum X)^2\right]\left[n\sum Y^2 - (\sum Y)^2\right]}} = \frac{6(47{,}634) - (345)(819)}{\sqrt{\left[6(20{,}399) - (345)^2\right]\left[6(112{,}443) - (819)^2\right]}} = \frac{3249}{\sqrt{13{,}128{,}993}} \approx 0.897$$

which is close to 1.

Interpretation: There is a strong positive linear relationship between age and blood pressure.
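As a check on Example 9.1, the sketch below rebuilds the column totals from the raw ages and blood pressures and then applies the computational (raw-sums) formula for r. It is a minimal illustration in plain Python; the helper name is our own.

```python
import math


def pearson_r_from_sums(n, sx, sy, sxy, sxx, syy):
    """r = (n*SumXY - SumX*SumY) / sqrt([n*SumX2 - SumX^2][n*SumY2 - SumY^2])."""
    numerator = n * sxy - sx * sy
    denominator = math.sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))
    return numerator / denominator


age = [43, 48, 56, 61, 67, 70]
bp = [128, 120, 135, 143, 141, 152]

n = len(age)
sx, sy = sum(age), sum(bp)
sxy = sum(a * b for a, b in zip(age, bp))
sxx = sum(a * a for a in age)
syy = sum(b * b for b in bp)

r = pearson_r_from_sums(n, sx, sy, sxy, sxx, syy)
print(sx, sy, sxy, sxx, syy)   # column totals of the table
print(round(r, 3))
```

With integer data the sums are exact, so any disagreement with a hand calculation points to an arithmetic slip rather than rounding.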

The above formula and procedure are applicable to quantitative data. When we have qualitative data (efficiency, honesty, intelligence and others), we use Spearman’s Rank Correlation Coefficient (rs).

Rank Correlation Coefficient (rs)

It is a measure of correlation based on the ranks of the observations rather than on their actual magnitudes (values). It is useful for studying qualitative attributes like honesty, colour, beauty, intelligence, character, morality etc. The individuals in the group can be arranged in order, thereby obtaining for each individual a number showing his/her rank in the group.

Steps: 1st: Rank the different values in X and Y.

2nd: Find the difference of the ranks in each pair, i.e. Di = Xi − Yi, where Xi and Yi are the ranks.

3rd: Then use the formula:

$$r_s = 1 - \frac{6\sum D_i^2}{n(n^2 - 1)}$$

Interpretation of rs is similar to that of r.

Example: The following are rankings of seven football players by two coaches. Find rs and comment on the opinions of the two coaches.

Player   Coach-1 = X   Coach-2 = Y   Di = Xi − Yi   Di²
A        4             4             0              0
B        1             2             −1             1
C        6             5             1              1
D        5             6             −1             1
E        3             1             2              4
F        2             3             −1             1
G        7             7             0              0
                                                    ΣDi² = 8

With n = 7,

$$r_s = 1 - \frac{6\sum D_i^2}{n(n^2 - 1)} = 1 - \frac{6(8)}{7(7^2 - 1)} = 1 - \frac{48}{336} \approx 0.857$$

which is close to +1, so the two coaches largely agree in their rankings of the players.
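The three steps for rs can be sketched in Python as follows. This is a minimal illustration with hypothetical helper names; like the formula above, it assumes there are no tied ranks (ties need average ranks, which `to_ranks` does not handle). Whether rank 1 means "smallest" or "best" does not matter, as long as both variables are ranked the same way.

```python
def to_ranks(values):
    """Rank values 1, 2, ... from smallest to largest (assumes no ties)."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]


def spearman_rs(rank_x, rank_y):
    """rs = 1 - 6 * sum(D^2) / (n(n^2 - 1)), where D is the rank difference."""
    n = len(rank_x)
    if n != len(rank_y) or n < 2:
        raise ValueError("need two rankings of the same n >= 2 items")
    sum_d2 = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


# Rankings of players A-G by the two coaches
coach_1 = [4, 1, 6, 5, 3, 2, 7]
coach_2 = [4, 2, 5, 6, 1, 3, 7]
print(round(spearman_rs(coach_1, coach_2), 3))

# For raw scores, rank first (hypothetical scores, no ties)
print(to_ranks([62, 75, 58]))
```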

6.4 Simple Linear Regression Analysis

Regression analysis is concerned with bringing out the nature of the relationship between variables and using it to find the best approximate value of one variable corresponding to a known value of the other variable(s). Simple linear regression deals with fitting a straight line (the regression line) to a sample of data on two variables, expressed as an equation, so that if the value of one variable is given we can predict the value of the other.

If we have two variables under study, one may represent the cause and the other the effect. The variable representing the cause is known as the independent (predictor or regressor) variable and is usually denoted by X. The variable representing the effect is known as the dependent (predicted) variable and is usually denoted by Y. If the relationship between the two variables is a straight line, it is known as simple linear regression.

When there are more than two variables and one of them is assumed to depend on the others, the functional relationship between the variables is known as multiple linear regression. In either case, it is advisable to prepare a scatter plot to see the type of relationship before fitting the model.

There are two principal purposes for building a regression model. The first, and most common, purpose is to build a predictive model, for example in situations in which age and gender are used to predict normal values of lung size or body mass index (BMI). Normal values are the range of values that occur naturally in the general population. In developing a model to predict normal values, the emphasis is on building an accurate predictive model.

The second purpose of using a regression model is to examine the effect of an explanatory variable on an outcome variable after adjusting for other important explanatory factors. These types of models are used for hypothesis testing.

Simple Regression: a regression with one dependent variable (Y) and one independent variable (X).

Multiple Regression: a regression with one dependent variable (Y) and two or more independent variables.

Simple Linear Regression: a regression in which there is a linear relationship between the dependent variable (Y) and a single independent variable (X).

Example: Study Hours (X=independent variable=Cause) versus

Grade Obtained (Y=dependent variable=Effect)

Two variables X and Y are said to be linearly related if their relationship can be expressed by the simple linear model (referred to as the regression of Y on X):

$$Y = \alpha + \beta X + \varepsilon \qquad (9.1)$$

where Y is the dependent variable, X is the independent variable, α is the regression constant (intercept), β is the regression slope and ε is the random disturbance term, with

$$Y \sim N(\alpha + \beta X, \sigma^2), \qquad \varepsilon \sim N(0, \sigma^2)$$

6.4.1 Assumptions of simple linear regression

• There exists a linear relationship between the dependent and the independent variable(s).
• The error terms are normally distributed with zero mean and constant variance σ² (homoscedasticity, i.e. equal variance of the εi), i.e. εi ~ N(0, σ²).
• The values of the predictor variable(s) are fixed in repeated sampling.
• There is no autocorrelation between the disturbances.
• There is variability in the X values (the X values in a given sample must not all be the same).
• There is no perfect multicollinearity, i.e. there are no perfect linear relationships among the explanatory variables (this matters when there is more than one explanatory variable). (Gujarati, 2004)

6.4.2 Parameter estimation

To estimate the parameters (α and β) several methods are available, including:

• The least squares method
• The maximum likelihood method
• The method of moments

In this course we discuss only the least squares method.

a) The Least Squares Estimation

The least squares procedure uses the criterion that the solution must give the smallest possible sum of squared deviations of the observed Yi from the estimates of their true means provided by the solution. Let a and b be the numerical estimates of the parameters α and β respectively, and let

$$\hat{Y}_i = a + bX_i \qquad (9.2)$$

be the estimated mean of Y for each Xi, i = 1, ..., n. Note that Ŷi is obtained by substituting the estimates for the parameters in the functional form of the model relating E(Yi) to X.

Here a is a constant that gives the value of Ŷ when X = 0; it is called the Y-intercept. b is a constant indicating the slope of the regression line; it measures the change in Ŷ for a unit change in X and is also called the regression coefficient of Y on X.

a and b are found by minimizing

$$SSE = \sum e_i^2 = \sum (Y_i - \hat{Y}_i)^2$$

where Yi is the observed value and Ŷi = a + bXi is the estimated value. This method is known as OLS (ordinary least squares).

Minimizing SSE gives

$$b = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum (X_i - \bar{X})^2} = \frac{\sum X_i Y_i - n\bar{X}\bar{Y}}{\sum X_i^2 - n\bar{X}^2}, \qquad a = \bar{Y} - b\bar{X}$$
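The OLS formulas can be written as a small Python function. The sketch below is a minimal illustration (the names `ols_fit` and `sse` are our own); the four data points are hypothetical and serve only to show that moving a or b away from the least squares values increases SSE, which is exactly the criterion the estimates satisfy.

```python
def ols_fit(x, y):
    """Least squares estimates (a, b) for Y^ = a + bX using the deviation formulas."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    if s_xx == 0:
        # The 'variability in X values' assumption is violated: the slope is undefined
        raise ValueError("the X values must not all be equal")
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = s_xy / s_xx
    a = y_bar - b * x_bar
    return a, b


def sse(x, y, a, b):
    """Sum of squared residuals, SSE = sum (Yi - (a + bXi))^2."""
    return sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))


# Hypothetical data, for illustration only
x = [1, 2, 3, 4]
y = [2, 4, 5, 4]
a, b = ols_fit(x, y)
best = sse(x, y, a, b)
# Nudging either estimate away from the OLS solution increases SSE
for da, db in [(0.1, 0), (-0.1, 0), (0, 0.1), (0, -0.1)]:
    assert sse(x, y, a + da, b + db) > best
print(a, b, best)
```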
Example 6.3: The following data show the scores of 12 students in Accounting and Fundamentals of Statistics examinations.

a) Calculate the simple correlation coefficient.
b) Fit a regression line of Statistics on Accounting using least squares estimates.
c) Predict the Statistics score if the Accounting score is 85.

Student   Accounting (X)   Statistics (Y)
1         74               81
2         93               86
3         55               67
4         41               35
5         23               30
6         92               100
7         64               55
8         40               52
9         71               76
10        33               24
11        30               48
12        71               87

From the data, n = 12, ΣX = 687, ΣY = 741, ΣXY = 48407, ΣX² = 45591 and ΣY² = 52525, so X̄ = 57.25 and Ȳ = 61.75.

a)
$$r = \frac{n\sum XY - \sum X \sum Y}{\sqrt{n\sum X^2 - (\sum X)^2}\,\sqrt{n\sum Y^2 - (\sum Y)^2}} = \frac{12 \times 48407 - 687 \times 741}{\sqrt{12 \times 45591 - 687^2}\,\sqrt{12 \times 52525 - 741^2}} = 0.9194$$

The coefficient of correlation (r) is about 0.92. This indicates that the two variables are strongly positively correlated (Y tends to increase as X increases).

b) Using OLS:

$$b = \frac{n\sum XY - \sum X \sum Y}{n\sum X^2 - (\sum X)^2} = \frac{\sum XY - n\bar{X}\bar{Y}}{\sum X^2 - n\bar{X}^2} = \frac{48407 - 12 \times 57.25 \times 61.75}{45591 - 12 \times (57.25)^2} = 0.9560$$

$$a = \bar{Y} - b\bar{X} = 61.75 - 0.9560 \times 57.25 = 7.0194$$

Hence Ŷ = 7.0194 + 0.9560X is the estimated regression line.

c) Substituting X = 85 in the estimated regression line:

$$\hat{Y} = 7.0194 + 0.9560(85) = 88.28$$
Example 6.4: The following hypothetical data set shows the income and monthly food expenditure of households, in hundreds of birr.

a. Fit the least squares regression line to the given data.
b. Calculate the simple correlation coefficient (r).
c. Predict the food expenditure for an income of 800 birr (X = 8).

Income = X   Expenditure = Y   XY      X²       Y²
3.8          3.1               11.78   14.44    9.61
4.5          3.6               16.20   20.25    12.96
2.5          2.3               5.75    6.25     5.29
4.8          3.7               17.76   23.04    13.69
7.7          4.6               35.42   59.29    21.16
5.0          4.1               20.50   25.00    16.81
12.6         6.5               81.90   158.76   42.25
8.5          5.1               43.35   72.25    26.01
5.5          4.0               22.00   30.25    16.00
7.1          4.1               29.11   50.41    16.81
3.5          3.2               11.20   12.25    10.24
Total 65.5   44.3              294.97  472.19   190.83

n = 11, X̄ ≈ 5.95 and Ȳ ≈ 4.03.

a. Ŷ = a + bX, where

$$b = \frac{n\sum XY - \sum X \sum Y}{n\sum X^2 - (\sum X)^2} = \frac{11(294.97) - (65.5)(44.3)}{11(472.19) - (65.5)^2} = \frac{343.02}{903.84} \approx 0.38$$

$$a = \bar{Y} - b\bar{X} = 4.03 - (0.38)(5.95) \approx 1.77$$

Therefore, Ŷ = 1.77 + 0.38X.

Interpretation: when income (X) is zero, the estimated food expenditure is 1.77 hundred birr (177 birr), and for every additional birr of income about 0.38 birr (38%) is spent on food.

b.
$$r = \frac{n\sum XY - (\sum X)(\sum Y)}{\sqrt{\left[n\sum X^2 - (\sum X)^2\right]\left[n\sum Y^2 - (\sum Y)^2\right]}} = \frac{343.02}{\sqrt{(903.84)(136.64)}} \approx 0.976$$

c. For X = 8: Ŷ = 1.77 + 0.38(8) ≈ 4.81, i.e. a predicted food expenditure of about 481 birr.
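The sketch below recomputes Example 6.4 directly from the eleven observations (a minimal Python illustration; variable names are our own). Because the data are decimals, the totals carry tiny floating-point errors, and rounding intermediate values such as X̄ ≈ 5.95 by hand can shift the last digit of a, so compare the printed values with the worked solution at two or three decimal places.

```python
import math

# Example 6.4 data, in hundreds of birr
income = [3.8, 4.5, 2.5, 4.8, 7.7, 5.0, 12.6, 8.5, 5.5, 7.1, 3.5]     # X
food_exp = [3.1, 3.6, 2.3, 3.7, 4.6, 4.1, 6.5, 5.1, 4.0, 4.1, 3.2]    # Y

n = len(income)
sum_x, sum_y = sum(income), sum(food_exp)
sum_xy = sum(x * y for x, y in zip(income, food_exp))
sum_x2 = sum(x * x for x in income)
sum_y2 = sum(y * y for y in food_exp)

numerator = n * sum_xy - sum_x * sum_y
b = numerator / (n * sum_x2 - sum_x ** 2)
a = sum_y / n - b * sum_x / n
r = numerator / math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
predicted = a + b * 8          # income of 800 birr = 8 hundred birr

print(round(b, 4), round(a, 4), round(r, 4), round(predicted, 2))
```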

Summary

Correlation measures the closeness of the association, while linear regression gives the equation of the straight line that best describes it and enables the prediction of one variable from the other. Rank correlation is useful for studying qualitative attributes like honesty, colour, beauty, intelligence, character, morality etc.: the individuals in the group are arranged in order, giving each individual a number showing his/her rank in the group. Simple linear regression deals with fitting a straight line (the regression line) to a sample of data on two variables, expressed as an equation, so that if the value of one variable is given we can predict the value of the other.

Exercise 6

Choose the best answer

1. If cov(x, y) = 0, then
A. x and y are correlated
B. x and y are uncorrelated
C. x and y are linearly related
D. none
2. The limits of the correlation coefficient are:
A. −1 ≤ r ≤ 1
B. 0 ≤ r ≤ 1
C. −1 ≤ r ≤ 0
D. 1 ≤ r ≤ 2
1. Given the bivariate data:
X: 1 5 3 2 1 1 7 3
Y: 6 1 0 0 1 2 1 5
a) Fit a regression line of Y on x and hence predict Y if x=10.
b) Fit a regression line of x on y and hence predict X if y=2.5

Assignment 6: (10%)
1. For 10 observations on price (x) and supply (y) the following data were obtained (in appropriate units): ∑x = 130, ∑y = 220, ∑x² = 2288, ∑y² = 5506, ∑xy = 3467. Obtain the line of regression of y on x and estimate the supply when the price is 16 units.
a. Find the correlation coefficient between supply and price.
b. Fit the linear regression model.
2. The grades of a class of 9 students on a midterm report (x) and on the final examination (y)
are as follows:
x 77 50 71 72 81 94 96 99 67
Y 82 66 78 34 47 85 99 99 68

a. Find the correlation coefficient.


b. Estimate the linear regression line.
c. Estimate the final examination grade of a student who received a grade of 85 on the
midterm report.

References

• Bluman, A.G. (1995). Elementary statistics (2nd edition).
• Gupta, C.P. (2004). Introduction to statistical methods (9th edition).
• Freund, J.E. and Simon, G.A. (1998). Modern Elementary statistics (9th edition).
Chapter 7: Elementary Probability

Objectives
At the end of this unit the students will be able to:
• Define probability and identify a probability experiment
• Explain some terms of probability
• Explain and identify the counting rule for a given problem
• List the axioms of probability
• Compute the probability/conditional probability of an event
Contents:
7.1 Definitions of terms of probability
7.2 Counting rules (addition, multiplication, permutation, combination)
7.3 Approaches to measuring probability (Classical, Frequentist, Subjective)
7.4 Conditional probability and independence
7.5 Basic Concepts of Probability Distributions

7.1 Definitions of Some Probability Terms

Introduction: In general, probability is the chance of an outcome of an experiment. It is a measure of how likely an event is to occur.
1. Experiment: Any process of observation or measurement, or any process which generates well-defined outcomes.
2. Probability (random) experiment: An experiment that can be repeated any number of times under similar conditions and whose possible outcomes can all be listed, although no individual outcome can be predicted with certainty.
Example: If a fair die is rolled once, it is possible to list all the possible outcomes, i.e. 1, 2, 3, 4, 5, 6, but it is not possible to predict with certainty which outcome will occur.

Outcome: The result of a single trial of a random experiment. Each outcome in a sample space is called an element or a member of the sample space, or simply a sample point.
3. Sample space: The set of all possible outcomes of a statistical experiment is called the sample space and is represented by the symbol S.
4. Event: A subset of the sample space. It is a statement about one or more outcomes of a random experiment. Events are denoted by capital letters.

Example: In the die-rolling experiment above, let A be the event of an odd number, B the event of an even number, and C the event of the number 8. Then

A = {1, 3, 5}, B = {2, 4, 6}, C = ∅ (the empty set, i.e. an impossible event).

Remark: If the sample space S has n members, then there are exactly 2^n subsets or events.
5. Equally likely events: Events which have the same chance of occurring.
6. Complement of an event: The complement of an event A means the non-occurrence of A. It is denoted by A′, Aᶜ or Ā, and contains those points of the sample space which do not belong to A.
7. Elementary event: An event having only a single element or sample point.
8. Mutually exclusive events: Two events which cannot happen at the same time. Events E1 and E2 are said to be mutually exclusive if they have no sample point in common, i.e. E1 ∩ E2 = ∅.
9. Independent events: Two events are independent if the occurrence of one does not affect the probability of the other occurring.
10. Dependent events: Two events are dependent if the occurrence of the first event changes the probability of the second event.
Example: An experiment consists of flipping a coin and then flipping it a second time if a head
occurs. If a tail occurs on the first flip, then a die is tossed once. To list the elements of the
sample space providing the most information, we construct the tree diagram of Figure 2.1. The
various paths along the branches of the tree give the distinct sample points. Starting with the top
left branch and moving to the right along the first path, we get the sample point HH, indicating
the possibility that heads occurs on two successive flips of the coin. Likewise, the sample point
T3 indicates the possibility that the coin will show a tail followed by a 3 on the toss of the die.
By proceeding along all paths, we see that the sample space is
𝑆 = {HH, HT, T1, T2, T3, T4, T5, T6}
NB. In some experiments it is helpful to list the elements of the sample space (i.e. the outcomes of a sequence of events) systematically by means of a tree diagram.
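A tree diagram is essentially a nested loop over the branches. The minimal Python sketch below (hypothetical helper names) walks the coin-then-coin-or-die tree of the example to list its sample space, and enumerates every subset of a sample space to confirm the 2^n count of events.

```python
from itertools import chain, combinations


def tree_sample_space():
    """Walk the tree: H leads to a second coin flip, T leads to one die toss."""
    outcomes = []
    for first in "HT":
        if first == "H":
            outcomes.extend(first + second for second in "HT")
        else:
            outcomes.extend(first + str(face) for face in range(1, 7))
    return outcomes


def all_events(sample_space):
    """All subsets of the sample space, from the empty event to S itself."""
    return list(chain.from_iterable(
        combinations(sample_space, k) for k in range(len(sample_space) + 1)))


S = tree_sample_space()
print(S)
print(len(all_events(S)))   # number of events, to compare with 2 ** len(S)
```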

7.2 Counting rule (Addition, Multiplication, Permutation, Combination)

In order to calculate probabilities, we have to know
• the number of elements of an event, and
• the number of elements of the sample space.
That is, in order to judge what is probable, we have to know what is possible.
To determine the number of outcomes, one can use several rules of counting:
- The addition rule
- The multiplication rule
- The permutation rule
- The combination rule

a) The Addition rule


A task A consists of m subtasks A1, A2, ..., Am that can be done in n(A1), n(A2), ..., n(Am) ways, respectively. If no two of them can be done at the same time, then the total number of ways n(A) of doing the task A is: n(A) = n(A1) + n(A2) + ⋯ + n(Am)
Example: An engineering student undertaking a capstone project T has a choice of 4 professors A, B, C, D, each of whom can give him a list of projects from which he can choose one. A has 5 projects, B has 7 projects, C has 6 projects, and D has 2 projects. Since the student can take only one project, the number of choices he has is:
n(T) = n(A) + n(B) + n(C) + n(D) = 5 + 7 + 6 + 2 = 20
Example: A student goes to the nearest snack bar to have breakfast. He can take tea, coffee or milk with bread, cake or a sandwich. How many possibilities does he have?

Solution:

Tea: sandwich, cake, bread
Coffee: sandwich, cake, bread
Milk: sandwich, cake, bread

There are 3 + 3 + 3 = 9 possibilities.

b) The Multiplication Rule

If a choice consists of k steps, of which the first can be made in n1 ways, the second in n2 ways, ..., and the kth in nk ways, then the whole choice can be made in (n1 × n2 × ... × nk) ways.

Example 1: The digits 0, 1, 2, 3 and 4 are to be used in a 4-digit identification card. How many different cards are possible if
a) repetitions are permitted?
b) repetitions are not permitted?
Solutions:

a) 5 × 5 × 5 × 5 = 625 different cards are possible.

b) 5 × 4 × 3 × 2 = 120 different cards are possible.

Example 2: How many sample points are there in the sample space when a pair of dice is thrown
once?
Solution: The first die can land face-up in any one of n1 = 6 ways. For each of these 6 ways, the
second die can also land face-up in n2 = 6 ways. Therefore, the pair of dice can land in n1n2 =
(6)×(6) = 36 possible ways.
Example 3: The personnel department of a large corporation wishes to issue each employee an ID card with two letters followed by a two-digit number. How many different ID cards can be issued?
Solution
Position:   K1   K2   K3   K4
Choices:    26   26   10   10

Thus the total number of ID cards that could be issued is:

26 × 26 × 10 × 10 = 67,600 (with repetition)
26 × 25 × 10 × 9 = 58,500 (without repetition)
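The multiplication rule can be confirmed by brute force for the small examples: generate every outcome with `itertools.product` (with repetition) or `itertools.permutations` (without repetition) and count them. This is a minimal sketch; `math.perm` requires Python 3.8 or later.

```python
from itertools import permutations, product
from math import perm

# Example 1: 4-digit cards from the digits 0-4, counted by enumeration
digits = range(5)
cards_with_rep = sum(1 for _ in product(digits, repeat=4))      # 5*5*5*5
cards_without_rep = sum(1 for _ in permutations(digits, 4))     # 5*4*3*2

# Example 2: outcomes of throwing a pair of dice
dice_pairs = list(product(range(1, 7), repeat=2))               # 6*6

# Example 3: two letters followed by two digits
ids_with_rep = 26 * 26 * 10 * 10
ids_without_rep = perm(26, 2) * perm(10, 2)                     # (26*25) * (10*9)

print(cards_with_rep, cards_without_rep, len(dice_pairs), ids_with_rep, ids_without_rep)
```

Enumeration is only practical for small cases; the point of the rule is that the product gives the same count without listing anything.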

c) Permutation

An arrangement of n objects in a specified order is called a permutation of the objects.

Permutation Rules:

1. The number of permutations of n distinct objects taken all together is n!, where n! = n × (n − 1) × (n − 2) × ⋯ × 3 × 2 × 1.
2. An arrangement of n objects in a specified order using r objects at a time is called a permutation of n objects taken r at a time. It is written as nPr and the formula is

$$_nP_r = \frac{n!}{(n - r)!}$$

3. The number of permutations of n objects of which k1 are alike, k2 are alike, ..., km are alike (k1 + k2 + ⋯ + km = n) is given by:

$$\frac{n!}{k_1!\, k_2! \cdots k_m!}$$

4. The number of permutations of n objects arranged in a circle is (n − 1)!.
Examples:
1. Suppose we have the letters A, B, C, D.
a) How many permutations are there taking all four?
b) How many permutations are there taking two letters at a time?
2. How many different permutations can be made from the letters in the word “CORRECTION”?

Solutions:
1.
a) Here there are n = 4 distinct objects, so there are 4! = 24 permutations.
b) Here n = 4 and r = 2, so the number of permutations is

$$_4P_2 = \frac{4!}{(4 - 2)!} = \frac{24}{2} = 12$$

2. Here n = 10, of which 2 are C, 2 are O, 2 are R, and there is 1 each of E, T, I and N, so k1 = k2 = k3 = 2 and k4 = k5 = k6 = k7 = 1. Using the 3rd rule of permutations, the number of permutations is

$$\frac{10!}{2!\, 2!\, 2!\, 1!\, 1!\, 1!\, 1!} = 453{,}600$$

Example: In one year, three awards (research, teaching, and service) will be given to a class of 25 graduate students in a statistics department. If each student can receive at most one award, how many possible selections are there?
Solution: Since the awards are distinguishable, this is a permutation problem. The total number of sample points is

$$_{25}P_3 = \frac{25!}{(25 - 3)!} = \frac{25!}{22!} = (25)(24)(23) = 13{,}800$$
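The permutation rules map onto Python's `math.factorial` and `math.perm` (3.8+), and rule 3 can be coded with `collections.Counter` to count the repeated letters. The minimal sketch below (the helper name is hypothetical) reproduces the examples and cross-checks rule 3 by listing the distinct orderings of the shorter word “CORRECT” directly.

```python
from collections import Counter
from itertools import permutations
from math import factorial, perm


def arrangements_with_repeats(word):
    """Rule 3: n! / (k1! k2! ... km!), where ki counts each distinct letter."""
    total = factorial(len(word))
    for count in Counter(word).values():
        total //= factorial(count)   # each division is exact
    return total


print(factorial(4), perm(4, 2))                    # Example 1 (a), (b)
print(arrangements_with_repeats("CORRECTION"))     # Example 2
print(perm(25, 3))                                 # awards example

# Brute-force cross-check on a shorter word: count distinct orderings directly
print(len(set(permutations("CORRECT"))), arrangements_with_repeats("CORRECT"))
```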

d) Combination

• A selection of objects without regard to order is called a combination.
• It is an arrangement of objects/items without considering a specific order.

Combination Rule

The number of combinations of r objects selected from n objects is denoted by nCr or $\binom{n}{r}$ and is given by the formula:

$$\binom{n}{r} = \frac{n!}{(n - r)!\, r!}$$

Examples:

1. In how many ways can a committee of 5 people be chosen out of 9 people?

Solution:

n = 9, r = 5

$$\binom{9}{5} = \frac{9!}{4!\, 5!} = 126 \text{ ways}$$

2. Among 15 clocks there are two defectives. In how many ways can an inspector choose three of the clocks for inspection so that:
a) There is no restriction.
b) None of the defective clocks is included.
c) Only one of the defective clocks is included.
d) Both of the defective clocks are included.
Solutions:

n = 15, of which 2 are defective and 13 are non-defective; r = 3.

a) If there is no restriction, we select three clocks from 15 clocks, and this can be done in:

$$\binom{15}{3} = \frac{15!}{12!\, 3!} = 455 \text{ ways}$$

b) None of the defective clocks is included.

This is equivalent to zero defective and three non-defective clocks, which can be chosen in:

$$\binom{2}{0}\binom{13}{3} = 286 \text{ ways}$$

c) Only one of the defective clocks is included.

This is equivalent to one defective and two non-defective clocks, which can be chosen in:

$$\binom{2}{1}\binom{13}{2} = 156 \text{ ways}$$

d) Both of the defective clocks are included.

This is equivalent to two defective clocks and one non-defective clock, which can be chosen in:

$$\binom{2}{2}\binom{13}{1} = 13 \text{ ways}$$
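The combination counts can be checked with `math.comb` (Python 3.8+). In this minimal sketch the clock cases are generated by one dictionary expression; since the cases "0, 1 or 2 defectives" are mutually exclusive and exhaustive, they must add up to the unrestricted count.

```python
from math import comb

# Example 1: committee of 5 from 9 people
committees = comb(9, 5)

# Example 2: choose 3 of 15 clocks, 2 of which are defective
defective, good, r = 2, 13, 3
no_restriction = comb(defective + good, r)
# Ways to get exactly d defective clocks: choose d defectives and r - d good ones
by_defectives = {d: comb(defective, d) * comb(good, r - d) for d in range(defective + 1)}

print(committees, no_restriction, by_defectives)
# The cases are exhaustive and mutually exclusive, so they add up to the total
print(sum(by_defectives.values()) == no_restriction)
```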

7.3 Approaches to measuring probability (Classical, Frequentist, Subjective)

There are four different conceptual approaches to the study of probability theory. These are:

a) The classical approach.
b) The frequentist approach.
c) The axiomatic approach.
d) The subjective approach.

a) The Classical Approach

This approach is used when:

• all outcomes are equally likely, and
• the total number of outcomes is finite, say N.

Definition: If a random experiment with N equally likely outcomes is conducted and NA of these outcomes are favourable to the event A, then the probability that event A occurs, denoted P(A), is defined as:

$$P(A) = \frac{N_A}{N} = \frac{\text{No. of outcomes favourable to } A}{\text{Total number of outcomes}} = \frac{n(A)}{n(S)}$$

Examples:

1. A fair die is tossed once. What is the probability of getting
a) the number 4?
b) an odd number?
c) an even number?
d) the number 8?

Solutions:

First identify the sample space, say S:

S = {1, 2, 3, 4, 5, 6}, so N = n(S) = 6.

a) Let A be the event of the number 4. Then A = {4}, N_A = n(A) = 1, and P(A) = n(A)/n(S) = 1/6.
b) Let B be the event of an odd number. Then B = {1, 3, 5}, N_B = n(B) = 3, and P(B) = n(B)/n(S) = 3/6 = 0.5.
c) Let C be the event of an even number. Then C = {2, 4, 6}, N_C = n(C) = 3, and P(C) = n(C)/n(S) = 3/6 = 0.5.
d) Let D be the event of the number 8. Then D = ∅, N_D = n(D) = 0, and P(D) = n(D)/n(S) = 0/6 = 0.
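The classical definition is simply a count of favourable outcomes over all outcomes. The minimal sketch below (the function name is hypothetical) uses `fractions.Fraction` so the answers come out exactly as 1/6, 1/2 and 0 rather than as rounded decimals.

```python
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}   # sample space for one toss of a fair die


def classical_probability(event, sample_space):
    """P(A) = n(A) / n(S); valid only when all outcomes are equally likely.

    Outcomes outside the sample space (e.g. 8 on a die) can never occur,
    so they are dropped before counting.
    """
    favourable = set(event) & set(sample_space)
    return Fraction(len(favourable), len(sample_space))


print(classical_probability({4}, S))                           # (a)
print(classical_probability({x for x in S if x % 2 == 1}, S))  # (b)
print(classical_probability({x for x in S if x % 2 == 0}, S))  # (c)
print(classical_probability({8}, S))                           # (d)
```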
2. A box of 80 candles consists of 30 defective and 50 non-defective candles. If 10 of these candles are selected at random, what is the probability that
a) all will be defective?
b) 6 will be non-defective?
c) all will be non-defective?
Solutions:

The total number of possible selections is N = n(S) = $\binom{80}{10}$.

a) Let A be the event that all will be defective. A occurs in n(A) = $\binom{30}{10}\binom{50}{0}$ ways, so

$$P(A) = \frac{n(A)}{n(S)} = \frac{\binom{30}{10}\binom{50}{0}}{\binom{80}{10}} \approx 0.00001825$$

b) Let B be the event that 6 will be non-defective (and hence 4 defective). B occurs in n(B) = $\binom{30}{4}\binom{50}{6}$ ways, so

$$P(B) = \frac{n(B)}{n(S)} = \frac{\binom{30}{4}\binom{50}{6}}{\binom{80}{10}} \approx 0.2645$$

c) Let C be the event that all will be non-defective. C occurs in n(C) = $\binom{30}{0}\binom{50}{10}$ ways, so

$$P(C) = \frac{n(C)}{n(S)} = \frac{\binom{30}{0}\binom{50}{10}}{\binom{80}{10}} \approx 0.00624$$
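All three candle probabilities follow one counting pattern (the hypergeometric probability): choose the defectives from the defective group, the rest from the good group, and divide by all possible selections. A minimal Python sketch, with a hypothetical helper name:

```python
from math import comb


def prob_defectives(defective_total, good_total, drawn, defective_drawn):
    """P(exactly `defective_drawn` defectives in a random selection of `drawn` items)."""
    good_drawn = drawn - defective_drawn
    favourable = comb(defective_total, defective_drawn) * comb(good_total, good_drawn)
    return favourable / comb(defective_total + good_total, drawn)


p_all_defective = prob_defectives(30, 50, 10, 10)   # (a)
p_six_good = prob_defectives(30, 50, 10, 4)         # (b): 6 good means 4 defective
p_all_good = prob_defectives(30, 50, 10, 0)         # (c)
print(p_all_defective, p_six_good, p_all_good)
```

Because `math.comb` works with exact integers, the only rounding happens in the final division.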
3. Some customers prefer to see the merchandise but then make their purchase later using Lee’s
Lights’ new Internet site. Tracking customer behavior, Lee determines that there’s a 9%
chance of a customer making a purchase in this way. We know that about 30% of customers
make purchases when they enter the store.
Question: What is the probability that a customer who enters the store makes no purchase at all?
Answer: We can use the Addition Rule because the alternatives “no purchase,” “purchase in the
store” and “purchase online” are disjoint events.
$$P(\text{purchase in the store or online}) = P(\text{purchase in store}) + P(\text{purchase online}) = 0.30 + 0.09 = 0.39$$
Then, by the complement rule,
$$P(\text{no purchase}) = P\big(\text{not (purchase in the store or purchase online)}\big) = 1 - P(\text{in store or online}) = 1 - 0.39 = 0.61$$

4. Lee notices that when two customers enter the store together, their behavior isn’t independent.
In fact, there’s a 20% chance they’ll both make a purchase. Let A and B denote the two customers,
with P(A purchases) = P(B purchases) = 0.30.
Question: When two customers enter the store together, what is the probability that at least one
of them will make a purchase?
Answer: Both customers can make a purchase, so the events are not disjoint and we must use the
General Addition Rule:
$$P(\text{at least one purchases}) = P(A \text{ purchases or } B \text{ purchases})$$
$$= P(A \text{ purchases}) + P(B \text{ purchases}) - P(A \text{ and } B \text{ both purchase})$$
$$= 0.30 + 0.30 - 0.20 = 0.40$$
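Both Lee examples follow from one formula, because the simple Addition Rule is the General Addition Rule with P(A and B) = 0. A minimal Python sketch (the function name `p_union` is our own illustration):

```python
def p_union(p_a: float, p_b: float, p_a_and_b: float = 0.0) -> float:
    """General Addition Rule: P(A or B) = P(A) + P(B) - P(A and B).

    For disjoint events P(A and B) = 0, so this reduces to the simple Addition Rule.
    """
    return p_a + p_b - p_a_and_b

# Example 3: purchases in the store and online are disjoint events.
p_any_purchase = p_union(0.30, 0.09)
p_no_purchase = 1 - p_any_purchase          # complement rule

# Example 4: two customers, P(both purchase) = 0.20.
p_at_least_one = p_union(0.30, 0.30, 0.20)
print(round(p_no_purchase, 2), round(p_at_least_one, 2))  # 0.61 0.4
```

Results are rounded for display because binary floating point cannot represent values such as 0.39 exactly.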
b) The Frequentist Approach (Empirical Approach)

This is based on the relative frequencies of outcomes belonging to an event.


Definition: The probability of an event A is the proportion of outcomes favorable to A in the
long run, when the experiment is repeated under the same conditions:
$$P(A) = \lim_{N \to \infty} \frac{N_A}{N}$$

Example:
1. Records show that 60 out of 100,000 bulbs produced are defective. What is the probability
that a newly produced bulb is defective?
Solution:
Let A be the event that the newly produced bulb is defective. With N = 100,000 bulbs on record,
the relative frequency estimates the limiting proportion:
$$P(A) \approx \frac{N_A}{N} = \frac{60}{100{,}000} = 0.0006$$

2. The National Center for Health Statistics reported that, of every 539 deaths in recent years, 24
resulted from automobile accidents, 182 from cancer, and 333 from other diseases. What is
the probability that a particular death is due to an automobile accident?
Solution: P(automobile) = deaths due to automobile accidents / total deaths = 24/539 ≈ 0.0445
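The long-run idea behind N_A/N can be illustrated by simulation. This is a minimal sketch under assumptions of our own: a simulated fair die (Python's standard `random` module with a fixed seed) stands in for an experiment repeated under the same conditions, and the event of interest is "the die shows 4", whose classical probability is 1/6. The simulation only illustrates the limit; it does not prove it.

```python
import random

def relative_frequency(trials: int, seed: int = 1) -> float:
    """Return N_A / N for the event 'a fair die shows 4' over `trials` simulated tosses."""
    rng = random.Random(seed)  # fixed seed so that every run gives the same numbers
    hits = sum(1 for _ in range(trials) if rng.randint(1, 6) == 4)
    return hits / trials

for n in (100, 10_000, 200_000):
    # The relative frequency tends to settle near 1/6 ≈ 0.1667 as n grows.
    print(n, round(relative_frequency(n), 4))

# Empirical probabilities from the recorded data in the two examples above.
print(60 / 100_000, round(24 / 539, 4))  # 0.0006 0.0445
```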

c) Axiomatic Approach: (Rule)

Let E be a random experiment and S be a sample space associated with E. With each event A we
associate a real number P(A), called the probability of A, which satisfies the following
properties. The first three are the axioms (or postulates) of probability; properties 4 to 6 follow from them.

1. P(A) ≥ 0 for any event A.
2. P(S) = 1, where S, the sample space, is the sure event.
3. If A and B are mutually exclusive events, the probability that one or the other occurs equals
the sum of the two probabilities, i.e., P(A ∪ B) = P(A) + P(B), since P(A ∩ B) = 0.
4. P(A') = 1 − P(A)
5. 0 ≤ P(A) ≤ 1
6. P(ø) = 0, where ø is the impossible event.
Remark: Venn diagrams can be used to solve probability problems.

[Figure: Venn diagrams illustrating A ∪ B, A ∩ B and A]

In general, for any events A and B,
$$P(A \cup B) = P(A) + P(B) - P(A \cap B)$$

Examples
1. John is going to graduate from the economics department of a university by the end of the
semester. After being interviewed at two companies he likes, he assesses that his probability
of getting an offer from company A is 0.8, and his probability of getting an offer from
company B is 0.6. If he believes that the probability that he will get offers from both
companies is 0.5, what is the probability that he will get at least one offer from these two
companies?
Solution: Using the general addition rule, we have
𝑃(𝐴 ∪ 𝐵) = 𝑃(𝐴) + 𝑃(𝐵) − 𝑃(𝐴 ∩ 𝐵) = 0.8 + 0.6 − 0.5 = 0.9
2. Find the errors in each of the following statements.
a. The probability that it will rain tomorrow is 0.40, and the probability that it will not rain
tomorrow is 0.52.
b. The probabilities that a printer will make 0, 1, 2, 3, or 4 or more mistakes in setting a
document are, respectively, 0.19, 0.34,−0.25, 0.43, and 0.29.
Solutions:
a. Let A = the event that it will rain tomorrow and B = the event that it will not rain tomorrow.
Since B = A', if P(A) = 0.40, then P(B) must equal 1 − P(A), i.e., P(B) = 1 − 0.40 = 0.60 ≠ 0.52.

b. There are two errors. First, the probability of an event cannot be negative, yet
P(2 mistakes) = −0.25. Second, the five outcomes are mutually exclusive and exhaustive, so their
probabilities must sum to 1; once −0.25 is replaced by any admissible (non-negative) value, the
sum is at least 0.19 + 0.34 + 0.43 + 0.29 = 1.25, which is greater than 1.
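Both kinds of error can be caught mechanically. The Python sketch below is a hypothetical helper of our own (`check_distribution` is not from any library): it tests the probabilities of mutually exclusive, exhaustive outcomes against P ≥ 0, P ≤ 1 and "total = 1", with a small tolerance for floating-point rounding.

```python
from typing import List, Sequence

def check_distribution(probs: Sequence[float], tol: float = 1e-9) -> List[str]:
    """List axiom violations for the probabilities of mutually exclusive, exhaustive outcomes.

    An empty list means no violation was found.
    """
    problems = []
    for i, p in enumerate(probs):
        if p < 0 or p > 1:
            problems.append(f"outcome {i}: P = {p} lies outside [0, 1]")
    total = sum(probs)
    if abs(total - 1) > tol:
        problems.append(f"probabilities sum to {total:.2f}, not 1")
    return problems

print(check_distribution([0.40, 0.52]))                      # rain / no rain
print(check_distribution([0.19, 0.34, -0.25, 0.43, 0.29]))   # printer: 0, 1, 2, 3, 4+ mistakes
print(check_distribution([0.19, 0.34, 0.00, 0.43, 0.29]))   # negative value replaced by 0
```

For the printer data the listed values happen to total exactly 1, but only because of the negative entry, so the checker reports just that entry. Replacing it by 0 (or any non-negative value) makes the total at least 1.25, which the sum check then flags.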
3. The probability that an American industry will locate in Shanghai, China, is 0.7, the
probability that it will locate in Beijing, China, is 0.4, and the probability that it will
locate in either Shanghai or Beijing or both is 0.8. What is the probability that the
industry will locate
a. in both cities?
b. in neither city?
Solution: Let A = the event that the industry will locate in Shanghai and
B = the event that the industry will locate in Beijing, so that
P(A) = 0.7, P(B) = 0.4 and P(A ∪ B) = 0.8.

a. $$P(A \cap B) = P(A) + P(B) - P(A \cup B) = 0.7 + 0.4 - 0.8 = 0.3$$

b. $$P(A' \cap B') = P\big((A \cup B)'\big) = 1 - P(A \cup B) = 1 - 0.8 = 0.2$$
d) Subjective Approach
Definition: The use of intuition, personal beliefs, and other indirect information in arriving at
probabilities is referred to as the subjective definition of probability.

7.4 Conditional Probability and Independence

Conditional Events: If the occurrence of one event affects the probability that the other event
occurs, the two events are called conditional or dependent events.

Conditional probability:
The conditional probability of an event A in relation to B is defined as the probability that event
A occurs given that event B has already occurred. It is denoted by P(A | B) and is given by
$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}, \quad P(B) > 0, \qquad \text{or} \qquad P(B \mid A) = \frac{P(A \cap B)}{P(A)}, \quad P(A) > 0$$

Remark:
(1) $$P(A' \mid B) = 1 - P(A \mid B)$$
(2) $$P(B' \mid A) = 1 - P(B \mid A)$$
(3) $$P(B \mid A) = \frac{P(A \cap B)}{P(A)} = \frac{P(B \cap A)}{P(A)}$$
(4) $$P(B) = P(B \mid A)\,P(A) + P(B \mid A')\,P(A')$$
Example:
1. Suppose we have two red and three white balls in a bag.
a. Draw a ball with replacement.
Let A = the event that the first draw is red: P(A) = 2/5;
B = the event that the second draw is red: P(B) = 2/5. A and B are independent.
b. Draw a ball without replacement.
Let A = the event that the first draw is red: P(A) = 2/5;
B = the event that the second draw is red: P(B) = ?; this is conditional.
Given that the first draw is red, one red and three white balls remain, so P(B | A) = 1/4.
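Enumerating every equally likely ordered pair of draws makes the difference between the two schemes concrete. This Python sketch is illustrative (the helper `prob` is our own); it treats the five balls as distinct by position, which is what makes the ordered pairs equally likely, and computes P(B | A) from the definition P(A ∩ B) / P(A).

```python
from fractions import Fraction
from itertools import permutations, product

balls = ["R", "R", "W", "W", "W"]   # two red, three white (list positions keep balls distinct)

def prob(outcomes, event) -> Fraction:
    """Classical probability over a list of equally likely ordered outcomes."""
    hits = sum(1 for o in outcomes if event(o))
    return Fraction(hits, len(outcomes))

with_repl = list(product(balls, repeat=2))      # 25 equally likely ordered pairs
without_repl = list(permutations(balls, 2))     # 20 equally likely ordered pairs

for name, space in (("with replacement", with_repl), ("without replacement", without_repl)):
    p_a = prob(space, lambda o: o[0] == "R")                    # A: first draw red
    p_ab = prob(space, lambda o: o[0] == "R" and o[1] == "R")   # A and B: both draws red
    print(name, p_a, p_ab / p_a)    # P(A) and P(B | A) = P(A ∩ B) / P(A)
```

Without replacement the unconditional probability that the second draw is red is still 2/5 (by symmetry); what changes is the conditional probability P(B | A), which drops from 2/5 to 1/4.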
2. The probability that a regularly scheduled flight departs on time is P (D) = 0.83; the
probability that it arrives on time is P (A) = 0.82; and the probability that it departs and arrives
on time is P (D ∩A) = 0.78. Find the probability that a plane
a. Arrives on time, given that it departed on time, and
b. Departed on time, given that it has arrived on time.
Solutions: Using the definition of conditional probability, we have the following.
(a) The probability that a plane arrives on time, given that it departed on time, is
$$P(A \mid D) = \frac{P(D \cap A)}{P(D)} = \frac{0.78}{0.83} \approx 0.94$$

(b) The probability that a plane departed on time, given that it has arrived on time, is
$$P(D \mid A) = \frac{P(D \cap A)}{P(A)} = \frac{0.78}{0.82} \approx 0.95$$
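The definition translates directly into a one-line function. A minimal Python sketch (the name `conditional` is our own); it refuses to condition on an event of probability zero, matching the requirement P(B) > 0 in the definition.

```python
def conditional(p_joint: float, p_given: float) -> float:
    """P(X | Y) = P(X ∩ Y) / P(Y), defined only when P(Y) > 0."""
    if p_given <= 0:
        raise ValueError("the conditioning event must have positive probability")
    return p_joint / p_given

p_d, p_a, p_d_and_a = 0.83, 0.82, 0.78   # departs on time, arrives on time, both
print(round(conditional(p_d_and_a, p_d), 2))  # P(A | D) ≈ 0.94
print(round(conditional(p_d_and_a, p_a), 2))  # P(D | A) ≈ 0.95
```

The same call with P(A ∩ B) = 0.20 and P(A) = 0.25 reproduces the 0.80 of the scholarship example in this section.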

3. The following table classifies the adults in a small town by sex and employment status.


Employed Unemployed Total
Male 460 40 500
Female 140 260 400
Total 600 300 900

One of these individuals is to be selected at random for a certain task.

Find the probability that:

i. The selected individual is a male, given that he/she is employed.
ii. The selected individual is unemployed, given that she is a female.
iii. The selected individual is employed.
iv. The selected individual is a female, given that he/she is unemployed.
Solutions: Let M = Male, F = Female, E = Employed and U = Unemployed.
i. $$P(M \mid E) = \frac{P(M \cap E)}{P(E)} = \frac{23/45}{2/3} = \frac{460}{600} = \frac{23}{30}$$

ii. $$P(U \mid F) = \frac{P(F \cap U)}{P(F)} = \frac{260}{400} = \frac{13}{20}$$

iii. $$P(E) = \frac{600}{900} = \frac{2}{3}$$

iv. $$P(F \mid U) = \frac{P(F \cap U)}{P(U)} = \frac{260}{300} = \frac{13}{15}$$
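With a two-way table, every probability is a count divided by the grand total, so conditional probabilities reduce to ratios of counts. A minimal Python sketch, with the cell counts copied from the table; the dictionary layout and the helper `p` are our own illustrative choices.

```python
from fractions import Fraction

# Cell counts from the table: (sex, employment status) -> number of adults.
counts = {
    ("M", "E"): 460, ("M", "U"): 40,
    ("F", "E"): 140, ("F", "U"): 260,
}
total = sum(counts.values())   # 900 adults

def p(sex=None, status=None) -> Fraction:
    """Probability that a randomly selected adult matches the given sex and/or status."""
    n = sum(c for (s, e), c in counts.items()
            if (sex is None or s == sex) and (status is None or e == status))
    return Fraction(n, total)

print(p("M", "E") / p(status="E"))   # i.   P(M | E) = 23/30
print(p("F", "U") / p(sex="F"))      # ii.  P(U | F) = 13/20
print(p(status="E"))                 # iii. P(E)     = 2/3
print(p("F", "U") / p(status="U"))   # iv.  P(F | U) = 13/15
```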

4. For a student enrolling as a freshman at a certain university, the probability is 0.25 that
he/she will get a scholarship and 0.75 that he/she will graduate. The probability is 0.20 that
he/she will get a scholarship and also graduate. What is the probability that a student who gets
a scholarship will graduate?
Solution: Let A = the event that a student will get a scholarship and
B = the event that a student will graduate.

Given: P(A) = 0.25, P(B) = 0.75, P(A ∩ B) = 0.20.

Required: P(B | A).
$$P(B \mid A) = \frac{P(A \cap B)}{P(A)} = \frac{0.20}{0.25} = 0.80$$

Probability of Independent Events

Two events A and B are independent if and only if $$P(A \cap B) = P(A)\,P(B)$$

Equivalently, when P(A) > 0 and P(B) > 0, P(A | B) = P(A) and P(B | A) = P(B).

Examples:
1. A box contains four black and six white balls. What is the probability of getting two
black balls in drawing one after the other under the following conditions?
a. The first ball drawn is not replaced

b. The first ball drawn is replaced.
2. A small town has one fire engine and one ambulance available for emergencies. The
probability that the fire engine is available when needed is 0.98, and the probability that
the ambulance is available when called is 0.92. In the event of an injury resulting from a
burning building, find the probability that both the ambulance and the fire engine will be
available, assuming they operate independently.
3. The probability that a married man watches a certain television show is 0.4, and the
probability that a married woman watches the show is 0.5. The probability that a man
watches the show, given that his wife does, is 0.7. Find the probability that
a. a married couple watches the show;
b. a wife watches the show, given that her husband does.
Solutions:

1. Let A = the first ball drawn is black and B = the second ball drawn is black. Required: P(A ∩ B).

a. $$P(A \cap B) = P(B \mid A)\,P(A) = \frac{4}{10} \cdot \frac{3}{9} = \frac{2}{15}$$

b. $$P(A \cap B) = P(A)\,P(B) = \frac{4}{10} \cdot \frac{4}{10} = \frac{4}{25}$$

2. Let A and B represent the respective events that the fire engine and the ambulance are
available. Then $$P(A \cap B) = P(A)\,P(B) = (0.98)(0.92) = 0.9016$$

3. Let P(M) = 0.4, P(W) = 0.5 and P(M | W) = 0.7.
a. $$P(W \cap M) = P(M \mid W)\,P(W) = (0.7)(0.5) = 0.35$$

b. $$P(W \mid M) = \frac{P(W \cap M)}{P(M)} = \frac{0.35}{0.4} = 0.875$$
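The product rule doubles as a test for independence. In this Python sketch (the helper `is_independent` is our own, with a small tolerance for floating-point rounding) the fire-engine result uses the independence assumption stated in the problem, while the television example is also checked against the product rule, a check the text itself does not carry out.

```python
def is_independent(p_a: float, p_b: float, p_a_and_b: float, tol: float = 1e-9) -> bool:
    """A and B are independent iff P(A ∩ B) = P(A) * P(B), up to a rounding tolerance."""
    return abs(p_a_and_b - p_a * p_b) <= tol

# Example 2: fire engine and ambulance are assumed independent, so multiply.
p_both_available = 0.98 * 0.92
print(round(p_both_available, 4))            # 0.9016

# Example 3: P(M ∩ W) = P(M | W) * P(W) by the multiplication rule.
p_m, p_w = 0.4, 0.5
p_m_and_w = 0.7 * p_w                        # 0.35
print(round(p_m_and_w / p_m, 3))             # P(W | M) = 0.875
print(is_independent(p_m, p_w, p_m_and_w))   # False: 0.35 differs from 0.4 * 0.5 = 0.20
```

Since P(M | W) = 0.7 differs from P(M) = 0.4, a husband's viewing is not independent of his wife's, which is why the multiplication rule with the conditional probability is needed there.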

7.5 Basic Concepts of Probability Distributions

Definition: A random variable is a numerical description of the outcomes of an experiment, i.e., a
numerically valued function defined on the sample space. Random variables are usually denoted by
capital letters.
• It is a function that associates a real number with each element in the sample space.
• It is a variable whose values are determined by chance.
Example: If X is a random variable, then it is a function from the elements of the sample space
to the set of real numbers, i.e.,
$$X : S \to \mathbb{R}$$
Example: Flip a coin three times and let X be the number of heads in the three tosses.
$$S = \{HHH, HHT, HTH, HTT, THH, THT, TTH, TTT\}$$
$$X(HHH) = 3, \quad X(HHT) = X(HTH) = X(THH) = 2, \quad X(HTT) = X(THT) = X(TTH) = 1, \quad X(TTT) = 0$$
so X takes the values {0, 1, 2, 3}.

X assumes a finite number of values, each with some probability.


Random variables are of two types:
i. Discrete random variable

Discrete random variables are variables that can assume only a countable number of values,
which are clearly separated and can be counted.
Examples:
a. Toss a coin n times and count the number of heads.
b. Number of children in a family.
c. Number of car accidents per week.
d. Number of defective items produced by a given company.
e. Number of bacteria per two cubic centimeters of water.

ii. Continuous random variable

Continuous random variables are variables that can assume any value in an interval.
Example:
a. Height of students at certain college.
b. Mark of a student.
c. Life time of light bulbs.
d. Length of time required to complete a given training.

Probability Distribution of a Random Variable


Definition: A probability distribution is a complete list of all possible values of a random
variable together with their corresponding probabilities. It can be of two types, discrete or
continuous. A probability distribution is denoted by ‘P’ for a discrete random variable and by ‘f’
for a continuous random variable.

Discrete Probability Distribution: a distribution whose random variable is discrete. It is
usually called a probability mass function (pmf).

Each value of a discrete random variable X has a point probability associated with it. These
points collectively are known as the probability mass function, which can be used to obtain
probabilities associated with the random variable.

Let X be a discrete random variable; then the probability mass function is given by
f(x) = P(X = x), for real number x.

The set of ordered pairs (x, P(x)) is a probability function, probability mass function, or
probability distribution of the discrete random variable X if, for each possible outcome x,
1. P(x) ≥ 0;
2. $$\sum_{x} P(X = x) = 1$$
3. P(X = u) = f(u) for any u;
4. for an integer-valued discrete random variable X and any integer constants a < b,
$$P(a < X < b) = \sum_{x=a+1}^{b-1} P(x), \qquad P(a \le X < b) = \sum_{x=a}^{b-1} P(x)$$
$$P(a < X \le b) = \sum_{x=a+1}^{b} P(x), \qquad P(a \le X \le b) = \sum_{x=a}^{b} P(x)$$

Example: Consider the experiment of tossing a coin three times. Let X be the number of heads.
Construct the probability distribution of X.

Solution:
• First identify the possible values that X can assume.
• Calculate the probability of each distinct value of X and present the results in the form of a
frequency distribution:

x          0     1     2     3
P(X = x)   1/8   3/8   3/8   1/8
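Listing the sample space also gives a mechanical way to build this table. The following Python sketch is illustrative only; the names `X`, `pmf` and `prob_between` are hypothetical. It treats X literally as a function from S to the real numbers, collects the point probabilities, and evaluates interval probabilities with the summation rules for an integer-valued discrete random variable (each end either strict or non-strict).

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Sample space of three tosses of a fair coin: 8 equally likely outcomes such as "HTH".
S = ["".join(t) for t in product("HT", repeat=3)]

def X(outcome: str) -> int:
    """The random variable X: S -> R, here the number of heads in the outcome."""
    return outcome.count("H")

counts = Counter(X(s) for s in S)
pmf = {x: Fraction(counts[x], len(S)) for x in sorted(counts)}
print(pmf)   # {0: Fraction(1, 8), 1: Fraction(3, 8), 2: Fraction(3, 8), 3: Fraction(1, 8)}

def prob_between(a: int, b: int, lower_closed: bool = True, upper_closed: bool = True) -> Fraction:
    """P(a <= X <= b) by default; set a flag to False to make that end strict (< instead of <=)."""
    lo = a if lower_closed else a + 1
    hi = b if upper_closed else b - 1
    return sum((pmf.get(x, Fraction(0)) for x in range(lo, hi + 1)), Fraction(0))

print(prob_between(1, 2))                       # P(1 <= X <= 2) = 3/4
print(prob_between(0, 3, lower_closed=False))   # P(0 < X <= 3) = 7/8
```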

Probability density function (continuous probability distribution): a probability distribution
whose random variable is continuous. The probability of a single value is zero, and the probability
of an interval is the area bounded by the curve of the probability density function and the
interval on the x-axis. Let a and b be any two values with a < b. The probability that X assumes a
value that lies between a and b is equal to the area under the curve between a and b:
$$P(a < X < b) = P(a \le X \le b) = \int_{a}^{b} f(x)\,dx = \text{area of the shaded region}$$
Since X must assume some value, it follows that the total area under the density curve must equal 1.

[Figure: probability density function of X, with the area between a and b shaded]

Note:
1. If f(x) is a probability density function of a continuous random variable X, then
a) f(x) ≥ 0 for all x in the domain;
b) $$\int_{-\infty}^{\infty} f(x)\,dx = 1$$
2. If X is a continuous random variable, then
$$P(a < X < b) = \int_{a}^{b} f(x)\,dx$$
3. The probability of any fixed value of a continuous random variable is zero, i.e., P(X = a) = 0
for any point a.

Examples: 1. Suppose that the error in the reaction temperature, in °C, for a controlled
laboratory experiment is a continuous random variable X having the probability density function
$$f(x) = \begin{cases} \dfrac{x^2}{3}, & -1 < x < 2, \\ 0, & \text{elsewhere.} \end{cases}$$

a) Verify that f(x) is a density function.

b) Find P(0 < X ≤ 1).

Solution: By using the definitions above:

a) Obviously, f(x) ≥ 0. To verify condition 1(b) above, we have
$$\int_{-\infty}^{\infty} f(x)\,dx = \int_{-1}^{2} \frac{x^2}{3}\,dx = \left.\frac{x^3}{9}\right|_{-1}^{2} = \frac{8}{9} + \frac{1}{9} = 1$$

b) By using the formula,
$$P(0 < X \le 1) = \int_{0}^{1} \frac{x^2}{3}\,dx = \left.\frac{x^3}{9}\right|_{0}^{1} = \frac{1}{9}$$
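The two integrals can be checked with exact arithmetic on the antiderivative x³/9 and, independently, with a numerical rule. This Python sketch is an illustration under our own choices: the midpoint rule and the step count `n` are assumptions, not part of the module, and any standard quadrature method would serve.

```python
from fractions import Fraction

def f(x: float) -> float:
    """Density from the example: x^2 / 3 on (-1, 2), zero elsewhere."""
    return x * x / 3 if -1 < x < 2 else 0.0

def F(x: Fraction) -> Fraction:
    """Antiderivative x^3 / 9 of the density on (-1, 2), evaluated exactly."""
    return x ** 3 / 9

def midpoint_integral(func, a: float, b: float, n: int = 100_000) -> float:
    """Approximate the integral of func over [a, b] with the midpoint rule on n subintervals."""
    h = (b - a) / n
    return h * sum(func(a + (i + 0.5) * h) for i in range(n))

total_exact = F(Fraction(2)) - F(Fraction(-1))   # 8/9 + 1/9
p_exact = F(Fraction(1)) - F(Fraction(0))        # P(0 < X <= 1)
print(total_exact, p_exact)                      # 1 1/9
print(midpoint_integral(f, -1, 2), midpoint_integral(f, 0, 1))  # close to 1 and to 0.1111
```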

Summary

Any realistic model of a real-world phenomenon must take into account the possibility of
randomness. That is, more often than not, the quantities we are interested in will not be
predictable in advance but, rather, will exhibit an inherent variation that should be taken into
account by the model.

Probability is a measure associated with an event A and denoted by P(A), which takes a value
such that 0 ≤ P(A) ≤ 1. It is essentially the quantitative expression of the chance that an event
will occur: in general, the higher the value of P(A), the more likely it is that the event will occur.
If an event cannot happen, P(A) = 0; if an event is certain to happen, P(A) = 1. Numerical
values can be assigned in simple cases by one of the following two methods:
1) If the sample space can be divided into subsets of n (n ≥2) equally likely outcomes and the
event A is associated with r (0 ≤ r ≤n) of these, then 𝑃(𝐴) = 𝑟/𝑛.
2) If an experiment can be repeated a large number of times, n, and in r cases the event A occurs,
then 𝑟/𝑛 is called the relative frequency of A. If this leads to a limit as 𝑛 → ∞, this limit is
𝑃(𝐴).

Exercises 7

1. A professor assigns a capstone project T to senior engineering students consisting of four
parts A, B, C, D that must be done in sequence, each independent of the previous ones. Part A can
be done in 2 ways, part B in 3 ways, part C in 5 ways, and part D in 4 ways. Since A, B, C, D
are functionally independent, in how many ways can the project be done?
2. Sam is going to assemble a computer by himself. He has the choice of chips from two brands,
a hard drive from four, memory from three, and an accessory bundle from five local stores.
How many different ways can Sam order the parts?
3. In how many ways can 7 graduate students be assigned to 1 triple and 2 double hotel rooms
during a conference?

4. How many different letter arrangements can be made from the letters in the word
“STATISTICS?”
5. In how many ways can 5 different trees be planted in a circle?
6. If the probability that a research project will be well planned is 0.60 and the probability that it
will be well planned and well executed is 0.54, what is the probability that it will be well
executed given that it is well planned?

Assignment 7 (5%)
1. In a college football training session, the defensive coordinator needs to have 10 players
standing in a row. Among these 10 players, there are 1 freshman, 2 sophomores, 4 juniors, and
3 seniors. How many different ways can they be arranged in a row if only their class level is
distinguished?
2. How many ways are there to select 3 candidates from 8 equally qualified recent graduates for
openings in an accounting firm?
3. If 3 books are picked at random from a shelf containing 5 novels, 3 books of poems, and a
dictionary, what is the probability that
i. The dictionary is selected?
ii. 2 novels and 1 book of poems are selected?
References

• Bluman, A. G. (1995). Elementary Statistics (2nd edition).
• Gupta, C. P. (2004). Introduction to Statistical Methods (9th edition).
• Freund, J. E. and Simon, G. A. (1998). Modern Elementary Statistics (9th edition).

Answer Key

Exercise 1
1.1
A. Interval, B. Interval, C. Ordinal, D. Ratio, E. Nominal,
F. Nominal, G. Nominal, H. Ratio, I. Ratio, J. Nominal

1.2 A 1.3 D 1.4 D


Exercise 2
1. B 2. C 3. D
Exercise 3
1.A 2.C
Exercise 4
1. False
2. True
3. True
4. A
5. B
Exercise 6
1. B
2. B
3. a) Y = 2.875 − 0.304X; if X = 10, then Y = 2.875 − 3.04 = −0.165
b) X = 3.431 − 0.278Y; if Y = 2.5, then X = 2.736

Bibliography (References)

• Bluman, A. G. (1995). Elementary Statistics (2nd edition).
• Gupta, C. P. (2004). Introduction to Statistical Methods (9th edition).
• Freund, J. E. and Simon, G. A. (1998). Modern Elementary Statistics (9th edition).
• Hogg, R. V. and Tanis, E. (2009). Probability and Statistical Inference (8th edition). Prentice Hall.
• Cochran, W. G. (1977). Sampling Techniques (3rd edition). John Wiley & Sons, Inc., New York.
Appendix: Standard Tables of Statistics

