Introduction To Statistics For Economists: Allama Iqbal Open University
Introduction to Statistics
for Economists
BS Economics (4 Year)
Credit Hours: 3
DEPARTMENT OF ECONOMICS
FACULTY OF SOCIAL SCIENCES AND HUMANITIES
ALLAMA IQBAL OPEN UNIVERSITY
(Copyright 2023 AIOU Islamabad)
All rights reserved. No part of this publication may be reproduced, stored in a retrieval
system, or transmitted in any form or by any means, electronic, mechanical,
photocopying, recording, scanning or otherwise, except as permitted under the AIOU
Copyright Act.
1st Edition…………………………………….2023
Quantity………………………………………1000
Printer…………………………………………AIOU, Islamabad
Course Team
1. Incharge Dr. Fouzia Jamshaid
CONTENTS
1. Introduction to the Course
2. Course Learning Outcomes
3. Structure of the Study Guide
3.1 How to use Reading Material?
3.2 Study Chart
3.3 How to Attend a Tutorial?
4. Methods of Instructions
4.1 Assignments
4.2 Tutorial Support
4.3 Assessment
5. Prescribed Readings
Unit-1 Introduction
Unit-6 Probability
1 – Introduction to the Course
If you invest in financial markets, you may want to predict the price of a stock
six months from now on the basis of company performance measures and other
economic factors. As a university student, you may be interested in knowing how
the mean starting salary of a college graduate depends on GPA.
These are just some examples that highlight how statistics is used in modern
society. To obtain the desired information in each example, you need data to
analyze and a knowledge of statistics.
The purpose of this course is to introduce you to the subject of statistics as a science
of data. Data abound in this information age; extracting useful knowledge and
gaining a sound understanding of complex data sets has become more of
a challenge. In this course, we will focus on the fundamentals of statistics, which
may be broadly described as the techniques to collect, clarify, summarize, organize,
analyze, and interpret numerical information.
This course will begin with a brief overview of the discipline of statistics and will
then quickly focus on descriptive statistics, introducing graphical methods of
describing data. You will learn about combinatorial probability and random
distributions, the latter of which serves as the foundation for statistical
inference. On the side of inference, we will focus on both estimation and
hypothesis testing issues. We will also examine the techniques to study the
relationship between two or more variables; this is known as regression.
By the end of this course, you should gain a sound understanding of what statistics
represent, how to use statistics to organize and display data, and how to draw valid
inferences based on data by using appropriate statistical tools.
There are nine units in total. The first five units are devoted to an introduction to
statistics, presentation of data, central tendency and variability. In today's technologically advanced
world, we have access to large volumes of data. The first step of data analysis is to
accurately summarize all of this data, both graphically and numerically, so that we
can understand what the data reveals. To be able to use and interpret the data
correctly is essential to making informed decisions. For instance, when you see a
survey of opinion about a certain TV program, you may be interested in the
proportion of those people who indeed like the program. In these units, you will
learn about descriptive statistics, which are used to summarize and display data.
After completing each unit, you will know how to present your findings once you
have collected data. For example, suppose you want to buy a new mobile phone
with a particular type of a camera. Suppose you are not sure about the prices of any
of the phones with this feature, so you access a website that provides you with a
sample data set of prices, given your desired features. Looking at all of the prices
in a sample can sometimes be confusing. A better way to compare this data might
be to look at the mean price, the median price and the variation of prices. The mean,
median and variation are three of the several ways in which you can describe data. You can
also graph the data so that it is easier to see what the price distribution looks like.
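To make the idea concrete, here is a minimal sketch in Python of the summaries mentioned above. The price list is hypothetical (invented purely for illustration), and the standard deviation and the range are used as two common measures of variation; other measures could equally be chosen.

```python
import statistics

# Hypothetical sample of phone prices in rupees (illustrative values only).
prices = [45000, 52000, 48000, 61000, 47000, 150000, 50000]

mean_price = statistics.mean(prices)      # arithmetic average of the prices
median_price = statistics.median(prices)  # middle value of the sorted prices
std_dev = statistics.stdev(prices)        # sample standard deviation
price_range = max(prices) - min(prices)   # simplest measure of spread

print(f"Mean price:         {mean_price:.0f}")
print(f"Median price:       {median_price:.0f}")
print(f"Standard deviation: {std_dev:.0f}")
print(f"Range:              {price_range}")
```

In this made-up sample, one very expensive phone pulls the mean well above the median, which is why it helps to look at both figures together with a measure of variation.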
Probabilities affect our everyday lives. In Unit 6, you will learn about probability
and its properties, how probability behaves, and how to calculate and use it. You
will study the fundamentals of probability and will work through examples that
cover different types of probability questions. These basic probability concepts will
provide a foundation for understanding more advanced statistical concepts, for example,
interpreting polling results. Though you may have already encountered concepts of
probability, after this unit, you will be able to formally and precisely predict the
likelihood of an event occurring given certain constraints.
Probability theory is a discipline that was created to deal with chance phenomena. For
instance, before getting a surgery, a patient wants to know the chances that the surgery
might fail; before taking medication, you want to know the chances that there will be
side effects; before leaving your house, you want to know the chance that it will rain
today. Probability is a measure of likelihood that takes on values between 0 and 1,
inclusive, with 0 representing impossible events and 1 representing certainty. The
chances of events occurring fall between these two values.
The skill of calculating probability allows us to make better decisions. Whether you
are evaluating how likely it is to get more than 50% of the questions correct on a
quiz if you guess randomly; predicting the chance that the next storm will arrive by
the end of the week; or exploring the relationship between the number of hours
students spend at the gym and their performance on an exam, an understanding of
the fundamentals of probability is crucial.
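The quiz example above can be worked out with the binomial distribution. The sketch below is only an illustration under stated assumptions: the quiz length, the number of answer choices per question, and the function name `prob_more_than_half` are all hypothetical, and each guess is assumed to be independent with every choice equally likely.

```python
from math import comb

def prob_more_than_half(n_questions: int, n_choices: int) -> float:
    """Probability of getting strictly more than 50% correct by pure guessing."""
    p = 1 / n_choices                 # chance of guessing one question correctly
    threshold = n_questions // 2      # "more than 50%" means more than n/2 correct
    return sum(
        comb(n_questions, k) * p**k * (1 - p) ** (n_questions - k)
        for k in range(threshold + 1, n_questions + 1)
    )

# Hypothetical quizzes: 10 questions, with 4 choices or 2 choices (true/false).
four_choice = prob_more_than_half(10, 4)
true_false = prob_more_than_half(10, 2)
print(f"10 questions, 4 choices each: {four_choice:.4f}")
print(f"10 questions, true/false:     {true_false:.4f}")
```

Whatever the exact quiz format, the result is a number between 0 and 1, in line with the definition of probability given earlier.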
We will also talk about random variables. A random variable describes the
outcomes of a random experiment. A statistical distribution describes the number
of times each possible outcome occurs in a sample. The values of a random variable
can vary with each repetition of an experiment. Intuitively, a random variable,
summarizing a certain chance phenomenon, takes on values with certain
probabilities. A random variable can be classified as being either discrete or
continuous, depending on the values it assumes. Suppose you count the number of
people who go to a coffee shop between 4 p.m. and 5 p.m. and the amount of
waiting time that they spend in that hour. In this case, the number of people is an
example of a discrete random variable and the amount of waiting time they spend
is an example of a continuous random variable.
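The coffee shop example can be imitated with a short simulation. This is a sketch under assumptions that are not part of the course text: arrivals are modelled with exponentially distributed gaps between customers, waiting times are also taken to be exponential, and the function name `simulate_hour` and all numerical values are hypothetical.

```python
import random

def simulate_hour(rate_per_hour, mean_wait_minutes, rng):
    """Simulate one hour: return (number of arrivals, list of waiting times)."""
    # Count arrivals by adding up random gaps (in hours) between customers.
    elapsed = rng.expovariate(rate_per_hour)
    arrivals = 0
    while elapsed < 1.0:
        arrivals += 1
        elapsed += rng.expovariate(rate_per_hour)
    # expovariate takes the rate, i.e. 1 / mean, as its argument.
    waits = [rng.expovariate(1 / mean_wait_minutes) for _ in range(arrivals)]
    return arrivals, waits

rng = random.Random(2023)  # fixed seed so the run can be repeated
arrivals, waits = simulate_hour(rate_per_hour=20, mean_wait_minutes=3.0, rng=rng)

print("Number of customers (discrete):", arrivals)
print("First few waiting times in minutes (continuous):",
      [round(w, 2) for w in waits[:5]])
```

The count can only be a whole number such as 18 or 19, whereas a waiting time can be any non-negative value, which is the distinction between discrete and continuous random variables.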
In Unit 8, we will discuss situations in which the mean of a population, treated as a
variable, depends on the value of another variable. One of the main reasons why
we conduct such analyses is to understand how two variables are related to each
other. The most common type of relationship is a linear relationship. For example,
you may want to know what happens to one variable when you increase or decrease
the other variable. You want to answer questions such as, "Does one variable
increase as the other increases, or does the variable decrease?" For example, you
may want to determine how the mean reaction time of rats depends on the amount
of drug in the bloodstream.
In Units 8 and 9, you will also learn to measure the degree of a relationship between
two or more variables. Both correlation and regression are measures for comparing
variables. Correlation quantifies the strength of a relationship between two
variables and is a measure of existing data. On the other hand, regression is the
study of the strength of a linear relationship between an independent and a dependent
variable and can be used to predict the value of the dependent variable when the
value of the independent variable is known.
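The difference between the two ideas can be seen in a small worked sketch. The data below (hours of study and exam scores) are hypothetical, and the function name `correlation_and_line` is an illustrative choice; the formulas are the usual Pearson correlation coefficient and the least-squares regression line.

```python
from math import sqrt

def correlation_and_line(x, y):
    """Return (r, slope, intercept) for paired observations x and y."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)   # variation in x
    syy = sum((yi - mean_y) ** 2 for yi in y)   # variation in y
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    r = sxy / sqrt(sxx * syy)                   # correlation coefficient
    slope = sxy / sxx                           # least-squares slope
    intercept = mean_y - slope * mean_x         # least-squares intercept
    return r, slope, intercept

# Hypothetical data: hours of study (independent) and exam score (dependent).
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 64, 68]

r, slope, intercept = correlation_and_line(hours, scores)
predicted = intercept + slope * 6   # predicted score for 6 hours of study

print(f"Correlation r = {r:.3f}")
print(f"Regression line: score = {intercept:.1f} + {slope:.1f} * hours")
print(f"Predicted score for 6 hours: {predicted:.1f}")
```

Here the correlation describes how strongly the existing observations move together, while the regression line is what allows a prediction for a value of the independent variable that was not observed.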
The Study Guide in your hand provides an introduction to each Unit followed
by the objectives of the Unit. In each Unit throughout the Study Guide, we have
given self-assessment questions. They are meant to assist your comprehension after
reading the Unit. A useful reading list is also provided for each Unit.
This study guide/course has been organized to enable you to acquire the skill of
self-learning. For each unit an introduction is given to help you develop an
objective analysis of the major themes and sub-themes discussed in the prescribed reading
materials. Besides this, the learning outcomes of each unit are specifically laid
down to help you develop a logical, analytical approach. A summary of the main
topics has also been included in the contents to help you understand them. We have
given you a few self-assessment questions and activities, which are meant not only
to help you understand the required reading materials but also to give
you an opportunity to assess yourself. Recommended books and important links
have been given to help you understand the main topics. Key terms have also been included
in the study guide.
Every course has a study package including study guides, assignments and tutorial
schedule uploaded by the University. For the books suggested at the end of each
unit you can visit online resources, a nearby library/study center or the Central
Library at main campus in AIOU.
Course Materials
The primary learning materials for this course are:
Readings (e.g., study guides, recommended books, online links and scholarly
articles)
Lectures (tutorials and workshops)
Other resources.
All course materials are free to access and can be found through the links provided
in each unit and sub-unit of the course. Pay close attention to the notes that
accompany these course materials, as they will instruct you as to what specifically
to read or watch at a given point in the course and help you to understand how these
individual materials fit into the course. You can also access a list of all the materials
used in this course by clicking on the resources mentioned in each unit.
Technical Requirements
This course is delivered online through the Learning Management System (LMS). You
will be required to have access to a computer or web-capable mobile device and have
consistent access to the internet either to view or download the necessary course
resources and to attempt any auto-graded course assessments and the final exam.
Methods of Instruction
The following methods of instruction are used in this guide and course to help you
understand the statistics course:
Online lectures
Mandatory workshops
Workshop quizzes
Class discussion during workshops
Individual, paired and small group exercises
Use of library for research projects
Use of video lectures
Use of the internet
Types of Assignments
Students must complete assignments from the recommended books and other
sources.
Students must be able to research and complete the assignments, which will
include library, Internet and other media research.
Activities
In most units, different types of activities are mentioned for better understanding of
the course. If you thoroughly study the materials and follow the links and videos,
then you will be able to understand the course in the easiest way.
Step 1
Go through the following:
1. Course Outlines
2. Course Introduction
3. Course Learning Outcomes
4. Structure of the Course
5. Assessment Methods
6. Recommended Books
7. Suggested Readings
Step 2
Read the whole unit and make notes of those points which you could not fully
understand or wish to discuss with your course tutor.
Step 3
Go through the self-assessment questions at the end of each unit. If you find any
difficulty in comprehension or locating relevant material, discuss it with your tutor.
Step 4
Study the compulsory recommended books listed in your study guide for at least
three hours a week. Try to read them with the help of the
study guide for the course. You can raise questions on both during your tutorial
meetings and workshops.
Step 5
First go through assignments, which are mandatory to solve/complete for this
course. Highlight all the points you consider difficult to tackle, and then discuss in
detail with your tutor. This exercise will keep you regular and ensure good results
in the form of higher grades.
Assessment
For each three-credit-hour course, a student will be assessed as follows:
Assignments
Assignments are written exercises that you are required to complete at home or at your
place of work after having studied the 9 units/study guides with the help of
compulsory and suggested reading material within the scheduled study
period. (See the assignment schedule.)
For this course, two assignments are uploaded on the AIOU portal along with
allied material. You are advised to complete your assignments within the
required time and upload them to your assigned tutor.
This is compulsory course work, and its successful completion will make you
eligible to take the final examination at the end of the semester.
You will upload your assignments to your appointed tutor, whose name is
notified to you through the concerned Regional Office of AIOU, for assessment
and necessary guidance. You can also locate your tutor through the AIOU
website. Your tutor will return your online assignments after marking them and
providing necessary academic guidance and supervision.
Workshops
The mandatory online workshops for Bachelor Studies BS Economics (4 Year)
courses will be arranged through the LMS during each semester or as per
AIOU policy. Attendance and course quizzes are compulsory in workshops.
A student will not be declared to have passed until he/she attends the workshop
satisfactorily and actively.
The duration of a workshop for each 3-credit course will be as per AIOU
policy.
Go through the course units one by one, using your notes during tutorial
meetings to remind you of the key concepts or theories. If you have not
already made notes, do so now.
Prepare a chronology with short notes on the topics/events/personalities
included in all units.
Go through your assignments and check your weak areas in each case.
Test yourself on each of the main topics, write down the main points or go
through all the notes.
Make sure to attend the workshops and revise all the points that you find
difficult to comprehend.
Try to prepare various questions with your fellow students during the last few
tutorial meetings. A group activity in this regard is helpful. Each student
should be given a topic to revise intensively, summarize and then
present to the group, after which all members raise queries and questions. This approach
will make your studies interesting and provide you an opportunity to revise
thoroughly.
For the final exam paper, go through the papers from previous semesters. This can clarify
the types of questions asked and help you decide how to frame an answer.
Before your final exams, make sure that:
you get your roll-number slip
you know the exact location of the examination center
you know the date and time of the examination.
Note:
This study guide has been developed to guide the students about the course
“Introduction to Statistics for Economists”. In this context we want to make it clear
that you are not bound to depend entirely upon the recommended books in the study
guide. In case you are unable to find any recommended book, please feel free to consult
any other book which covers the main contents of the course.
5 – Prescribed Readings
1. Bluman, A.G. (2004). Elementary Statistics. A Step by Step Approach. 5th
Edition. McGraw-Hill Companies Incorporated. London.
2. Chaudhary, S.M. & Kamal, S. (2017). Introduction to Statistical Theory
Part-I. 8th Edition. Ilmi Kitab Khana. Lahore.
3. Chaudhary, S.M. & Kamal, S. (2017). Introduction to Statistical Theory
Part-II. 8th Edition. Ilmi Kitab Khana. Lahore.
4. Daniel, W.W. (1995). Biostatistics: A foundation for Analysis in Health
sciences. Sixth Edition. John Wiley and sons Incorporated. USA.
5. Harper, W.M. (1991). Statistics. Sixth Edition. Pitman Publishing, Longman
Group, United Kingdom.
6. Hoel, P.G. (1976). Elementary Statistics. 4th Edition. John Wiley and Sons
Incorporated, New York.
7. Kiani, G. H., & Akhtar, M. S. (2012). Basic statistics, Majeed Book Depot.
8. Khan, A. A., Mirza, S. H., Ahmad, M. I., Baig, I. & Yaqoob, M. (2011).
Business Statistics, Qureshi Brothers Publishers.
9. Millar, R.L.: Intermediate Microeconomics, McGraw-Hill, Latest Edition.
10. Russel, R.R. and M. Wilkinson: Microeconomics: A Synthesis of modern and
Neo-Classical Theory, John Wiley and Sons, New York, 1978.
11. Scherer, F.M.: Industrial Market Structure and Economics Performance.
12. Varian, H.R: Microeconomic Analysis, Norton W.W. Ince, New York, Latest
Edition.
UNIT 01
INTRODUCTION
CONTENTS
Background
Objectives
1.1 Meaning of Statistics
1.2 Importance of Statistics
1.3 Observation and Variables
1.4 Collection of Data
1.5 Summary
1.6 SELF ASSESSMENT QUESTIONS
Suggested Readings
Background
Statistics has been defined differently by different authors from time to time. One
can find more than a hundred definitions in the literature of statistics.
Objectives
After studying this unit, you will be able to:
1. Explain why knowledge of statistics is important.
2. Define statistics and provide an example of how statistics is applied.
3. Differentiate between descriptive and inferential statistics.
4. Classify variables as qualitative or quantitative, and discrete or continuous.
1.1 Meaning of Statistics
All definitions clearly point out the four aspects of statistics. Statistics is the science
which deals with methods of collecting, classifying, presenting and interpreting
numerical data.
Functions of Statistics:
Examples of populations:
1. All registered voters in Islamabad city
2. All students of Allama Iqbal Open University
3. All daily minimum temperatures in January for major Pakistani cities.
Sample:
A representative part of the population under investigation is called a sample.
The following figure illustrates the idea of a population and a sample.
Parameter:
Estimates calculated from sample data are often used to make inferences about
populations. If a sample is representative of a population, then statistics calculated
from sample data will be close to corresponding values from the population.
Samples contain less information than full populations, so estimates from samples
about population quantities always involve some uncertainty.
Random sampling, in which every potential sample of a given size has the same
chance of being selected, is the best way to obtain a representative sample.
However, it is often impossible or impractical to obtain a random sample.
Nevertheless, we often will make calculations for statistical inference as if a sample
was selected at random, even when this is not the case. Thus, it is important to
understand both how to conduct a random sample in practice and the properties of
random samples.
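A short simulation can illustrate why a random sample tends to be representative. This is only a sketch: the population below is artificial (generated values standing in for some measured characteristic), and the population size, sample size and seed are arbitrary choices made for the illustration.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the run can be repeated

# Hypothetical population: 10,000 values of some measured characteristic.
population = [rng.gauss(50, 10) for _ in range(10_000)]

# Simple random sample without replacement: every possible sample of
# size 100 has the same chance of being selected.
sample = rng.sample(population, 100)

population_mean = statistics.mean(population)  # a population quantity
sample_mean = statistics.mean(sample)          # a statistic from the sample

print(f"Population mean: {population_mean:.2f}")
print(f"Sample mean:     {sample_mean:.2f}")
```

The two means will usually be close but not identical; the gap between them is the uncertainty that comes from working with a sample instead of the whole population.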
➢ Descriptive statistics: classification and diagrammatic representation of
data.
➢ Inferential statistics: drawing conclusions about a population on the basis of
a sample drawn from it.
➢ Data: any measurement of one or more characteristics recorded either
from a population or from a sample.
(i) The planning of operations: This may relate to either special projects or to
the recurring activities of a firm over a specified period.
(ii) The setting up of standards: This may relate to the size of employment,
volume of sales, fixation of quality norms for the manufactured product,
norms for the daily output and so forth.
As such, one may do no more than highlight some of the more important ones to
emphasize the relevance of statistics to the business world. In the sphere of
production, for example, statistics can be useful in various ways.
Statistical quality control methods are used to ensure the production of quality
goods. This is achieved by identifying and rejecting defective or substandard goods.
Sales targets can be fixed on the basis of sales forecasts, which are made by using
various methods of forecasting. Analysis of the sales effected against the targets set
earlier would indicate any shortfall in achievement, which may be on account of
several causes: (i) targets were too high and unrealistic, (ii) salesmen's performance
has been poor, (iii) competition has increased, (iv) the company's product is of poor
quality, and so on. These factors can be further investigated.
very quick and accurate in decision making. He knows what his customers want;
he should therefore know what to produce and sell and in what quantities.
Economics
Mathematics
Statistics plays a central role in almost all natural and social sciences. The methods
used in natural sciences are the most reliable but conclusions drawn from them are
only probable because they are based on incomplete evidence.
Statistics helps in describing these measurements more precisely. Statistics is a
branch of applied mathematics. A large number of statistical methods, such as
probability, averages, dispersion and estimation, are used in mathematics, and
different techniques of pure mathematics, such as integration, differentiation and
algebra, are used in statistics.
Banking
Statistics plays an important role in banking. Banks make use of statistics for a
number of purposes. They work on the principle that not everyone who deposits
money with a bank withdraws it at the same time. The bank earns profits
out of these deposits by lending them to others at interest. Bankers use statistical
approaches based on probability to estimate the number of deposits and the claims
on them for a certain day.
statistics. Statistical data are now widely used in making all administrative
decisions. Suppose the government wants to revise the pay scales of employees
in view of an increase in the cost of living; statistical methods will be used to
determine the rise in the cost of living. The preparation of federal and provincial
government budgets mainly depends upon statistics because it helps in estimating
the expected expenditures and revenue from different sources. So statistics are the
eyes of the administration of the state.
Statistics plays a vital role in almost all the natural and social sciences. Statistical
methods are commonly used for analyzing experimental results and testing their
significance in biology, physics, chemistry, mathematics, meteorology, research,
chambers of commerce, sociology, business, public administration,
communications and information technology, etc.
Astronomy
Astronomy is one of the oldest fields in which statistics has been applied; it deals with the
measurement of the distances, sizes, masses and densities of heavenly bodies by
means of observations. During these measurements errors are unavoidable, so the
most probable measurements are found by using statistical methods.
Example: The distance of the moon from the earth is measured. Throughout history,
astronomers have used statistical methods such as the method of least squares to find
the movements of stars and much more.
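The idea of a "most probable measurement" can be illustrated with the simplest case of least squares: when the same quantity is measured several times, the arithmetic mean is the single value that makes the sum of squared errors as small as possible. The sketch below uses hypothetical repeated measurements (in thousands of kilometres); the figures and the function name `sum_squared_errors` are illustrative only.

```python
import statistics

# Hypothetical repeated measurements of the same distance (thousand km).
measurements = [384.2, 384.6, 384.3, 384.5, 384.4]

def sum_squared_errors(estimate, data):
    """Total squared deviation of the data from a proposed estimate."""
    return sum((x - estimate) ** 2 for x in data)

best = statistics.mean(measurements)  # the least-squares estimate

# Compare the mean with two nearby candidate values.
for candidate in (best - 0.1, best, best + 0.1):
    sse = sum_squared_errors(candidate, measurements)
    print(f"estimate = {candidate:.2f}  sum of squared errors = {sse:.4f}")
```

Any candidate other than the mean gives a larger sum of squared errors, which is the sense in which the mean is the best single summary of measurements that contain random error.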
procedures to follow. Nevertheless, it is based on a combination of empiricism and
theory which uses several overlapping stages of reasoning. These stages of
reasoning include:
Variable
To put it in very simple terms, a variable is an entity whose value varies. A variable
is an essential component of any statistical data. It is a feature of a member of a
given sample or population, which is unique, and can differ in quantity or quality
from another member of the same sample or population. Variables either are the
primary quantities of interest or act as practical substitutes for the same. The
importance of variables is that they help in the operationalization of concepts for data
collection. For example, if you want to do an experiment based on the severity of
urticaria, one option would be to measure the severity using a scale to grade the severity
of itching. This becomes an operational variable. For a variable to be "good," it
needs to have some properties such as good reliability and validity, low bias,
feasibility/practicality, low cost, objectivity, clarity, and acceptance. Variables can
be classified in various ways, as discussed below.
Continuous variables, on the other hand, can take any value in between two
given values (e.g., height may take any value between 5 ft and 6 ft, and weight any
value between 70 kg and 71 kg). One way of differentiating between continuous and
discrete variables is to use the "mid-way" test. If, for every pair of values of a
variable, a value exactly mid-way between them is meaningful, the variable is
continuous. For example, two values for the time taken for a weal to subside can be
10 and 13 min. The mid-way value would be 11.5 min, which makes sense.
However, for the number of weals, suppose you have a pair of values, 5 and 8; the
mid-way value would be 6.5 weals, which does not make sense.
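The arithmetic of the mid-way test can be sketched in a few lines. The function names below are hypothetical, and the check assumes that a count can only take whole-number values while a measured time can take any value; a single pair whose mid-way value is not meaningful is enough to show that a variable is discrete.

```python
def midway(a, b):
    """Value exactly half-way between a and b."""
    return (a + b) / 2

def midway_is_meaningful(a, b, whole_numbers_only):
    """Check one pair of values against the mid-way test.

    whole_numbers_only=True models a count (e.g. number of weals);
    whole_numbers_only=False models a measurement (e.g. time in minutes).
    """
    m = midway(a, b)
    return (not whole_numbers_only) or m == int(m)

# Time for a weal to subside (minutes): any value is meaningful.
print(midway(10, 13), midway_is_meaningful(10, 13, whole_numbers_only=False))

# Number of weals: only whole numbers are meaningful.
print(midway(5, 8), midway_is_meaningful(5, 8, whole_numbers_only=True))
```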
In the context of an experimental study, the dependent variable (also called outcome
variable) is directly linked to the primary outcome of the study. For example, in a
clinical trial on psoriasis, the PASI (psoriasis area severity index) would possibly
be one dependent variable. The independent variable (sometimes also called
explanatory variable) is something which is not affected by the experiment itself
but which can be manipulated to affect the dependent variable. Other terms
sometimes used synonymously include blocking variable, covariate, or predictor
variable. Confounding variables are extra variables, which can have an effect on
the experiment. They are linked with dependent and independent variables and can
cause spurious association. For example, in a clinical trial for a topical treatment in
psoriasis, the concomitant use of moisturizers might be a confounding variable. A
control variable is a variable that must be kept constant during the course of an
experiment.
(i) Primary data: Those data which do not already exist in any form, and
thus have to be collected for the first time from the primary source(s).
By their very nature, these data require fresh and first-time collection
covering the whole population or a sample drawn from it.
The first step in any scientific inquiry is to collect data relevant to the problem in
hand. When the inquiry relates to the physical and/or biological sciences, data
collection is normally an integral part of the experiment itself. In fact, the very
manner in which an experiment is designed determines the kind of data it would
require and/or generate. The problem of identifying the nature and the kind of the
relevant data is thus automatically resolved as soon as the design of the experiment is
finalized. This is possible in the case of the physical sciences. In the case of the social
sciences, where the required data are often collected through a questionnaire from
a number of carefully selected respondents, the problem is not so simply resolved.
For one thing, designing the questionnaire itself is a critical initial problem. For
another, the number of respondents to be accessed for data collection and the
criteria for selecting them have their own implications and importance for the quality
of the results obtained. Further, once the data have been collected, they are assembled,
organized and presented in the form of appropriate tables to make them readable.
Wherever needed, figures, diagrams, charts and graphs are also used for better
presentation of the data. A useful tabular and graphic presentation of data will
require that the raw data be properly classified in accordance with the objectives of
the investigation and the relational analysis to be carried out.
1.5 Summary
In summary, 'Statistics' means numerical information expressed in
quantitative terms. As a matter of fact, data have no limits as to their reference,
coverage and scope. At the macro level, these are data on gross national product
and the shares of agriculture, manufacturing and services in GDP (Gross Domestic
Product). At the micro level, individual firms, howsoever small or large, produce
extensive statistics on their operations. The annual reports of companies contain
a variety of data on sales, production, expenditure, inventories, capital employed and
other activities. These data are often field data, collected by employing scientific
survey techniques. Unless regularly updated, such data are the product of a one-time
effort and have limited use beyond the situation that may have called for their
collection. A student knows statistics more intimately as a subject of study like
economics, mathematics, chemistry, physics and others. It is a discipline which
scientifically deals with data, and is often described as the science of data. In
dealing with statistics as data, statistics has developed appropriate methods of
collecting, presenting, summarizing and analysing data and thus consists of a body
of these methods.
1.6 SELF ASSESSMENT QUESTIONS
1. Define Statistics. Explain its types, and importance to trade, commerce and
business.
2. “Statistics is all-pervading”. Elucidate this statement.
3. Write a note on the scope and limitations of Statistics.
4. What are the major limitations of Statistics? Explain with suitable examples.
5. Distinguish between descriptive Statistics and inferential Statistics.
1.7 SUGGESTED READINGS
Bluman, A.G. (2004). Elementary Statistics. A Step by Step Approach. 5th Edition.
McGraw-Hill Companies Incorporated. London.
Chaudhary, S.M. & Kamal, S. (2017). Introduction to Statistical Theory Part-I. 8th
Edition. Ilmi Kitab Khana. Lahore.
Chaudhary, S.M. & Kamal, S. (2017). Introduction to Statistical Theory Part-II. 8th
Edition. Ilmi Kitab Khana. Lahore.
Daniel, W.W. (1995). Biostatistics: A foundation for Analysis in Health sciences.
Sixth Edition. John Wiley and sons Incorporated. USA.
Harper, W.M. (1991). Statistics. Sixth Edition. Pitman Publishing, Longman
Group, United Kingdom.
Hoel, P.G. (1976). Elementary Statistics. 4th Edition. John Wiley and Sons
Incorporated, New York.
Kiani, G. H., & Akhtar, M. S. (2012). Basic statistics, Majeed Book Depot.
Khan, A. A., Mirza, S. H., Ahmad, M. I., Baig, I. & Yaqoob, M. (2011). Business
Statistics, Qureshi Brothers Publishers.
UNIT 02
PRESENTATION OF DATA
CONTENTS
Introduction
Objectives
2.1 Classification
2.2 Tabulation
2.3 Diagrams and Graphs
2.4 SELF ASSESSMENT QUESTIONS
Suggested Readings
Introduction
In Statistics, presentation of data is very important. In real-life problems, we have to
deal with a lot of data. Tables, graphs and charts are used to summarize the data and to
give the data an attractive look. This chapter will explain how large data sets are
summarized and presented in an understandable form by using different statistical tools.
The presentation of data is not as easy as people think. There is an art to taking
data and creating a story out of it that fulfills the purpose of the presentation.
Presentation refers to the organization of data into tables, graphs or charts, so that logical
and statistical conclusions can be derived from the collected measurements. Data
may be presented in three ways: textual, tabular and graphical.
Whenever we hear the word statistics, we think there will be some information,
data, figures, charts, graphs, diagrams, values or numbers. This means that
statistics relates to data, values or numbers. Before discussing data, let us
step back to the origin of statistics.
Statistics has developed gradually during the last few centuries. Now it is no longer
restricted to the study of human population or the byproduct of administrative
activities of the state. In the present era of information technology, statistics is
regarded as one of the most important tools for making decisions, and its scope has
broadened to cover almost every sphere of life.
What, then, is data? Before interpreting data, let us understand the concept of
observation. Anything that can be measured or observed is called an observation,
and the numbers or measurements that are collected as a result of observations are
called data. In other words, the facts and figures that are collected, analyzed and
interpreted are called data. Data is considered to be useful information.
Objectives
After studying this unit, you will be able to:
1. Explain why discovery and communication are the two objectives of data
visualization.
2. Describe the types of data and the ways in which data are presented.
3. Recognize the basic graphs, charts and diagrams.
4. Interpret a frequency table of quantitative data.
5. Construct a histogram or frequency polygon.
6. Differentiate between a normal distribution, a positively skewed distribution
and a negatively skewed distribution.
2.1 Classification
Classification is the process of arranging observations into different classes or
categories according to some common characteristics. A good example of
classification is the sorting of letters in a courier office. The data may be classified
by one, two or more characteristics at a time. If the data are classified according to
one characteristic, it is called one-way classification, and if the data are classified
according to two characteristics, it is called two-way classification. In a courier
office, the letters are first sorted district-wise, which is an example of one-way
classification. Sorting them further tehsil-wise gives a two-way classification, and
sorting them again by mohallah or town gives a three-way classification. When the
data are classified according to many characteristics, it is called many-way
classification.
Put differently, classification is the process of arranging data into various groups,
classes and subclasses according to some common characteristics, or of separating
them into different but related parts.
v.
vi. The class interval is to be determined. It is obtained by using the
relationship
The classification of the data primarily depends upon the following four bases:
i. Geographical (Spatial)
ii. Chronological (Temporal)
iii. Qualitative
iv. Quantitative
Some characteristics of a good classification are:
• Classification should be unambiguous.
• Classification should be stable.
• Classification should not be rigid.
Activity:
2.2 Tabulation
The process of making tables or arranging the data into rows and columns is called
tabulation.
The following are the parts of tables which are involved in the construction of table.
Parts of a Table:
Title
Prefatory Notes
Stub Box Head
Column Caption
Row Captions Body of the table
Footnote
Source note
i) Title:
The title is the heading at the top of the table. It should be brief and
self-explanatory. It describes the contents of the table.
ii) Column captions and Box-head:
The headings for different columns are called column captions, and the part
of the table containing the column captions is called the box-head.
iii) Row captions and Stub:
The headings for different rows are called row captions, and the part of the
table containing the row captions is called the stub.
iv) Body of table:
The entries in the different cells of the columns and rows of a table are called
the body of the table.
v) Prefatory notes:
The prefatory note is given after the title of the table. It is used to explain
the contents of the data.
vi) Footnotes:
Footnotes are given at the end of the table. They are used to explain the
contents of the data.
vii) Source note:
Source notes are given at the end of the table. They indicate the
compiling agency, publication, the data and page of distribution.
Frequency Distribution:
Open-end classes:
In open-end classes of a frequency table, either the lower limit of the first class or
the upper limit of the last class is not a fixed number.
Class limits:
Each class is described by two numbers, called class limits. The smaller number is
the lower class limit and the larger number is the upper class limit.
Class interval:
The class interval is the difference between the upper-class boundary and the lower-
class boundary of the same class (not the difference between the class limits).
Class frequency:
Class mark:
The class mark or the midpoint is the value which divides the class into two equal
parts. It is obtained by adding the lower- and upper-class limits or class boundaries
of a class and dividing the resulting total by 2.
Class boundaries:
A class boundary is located midway between the upper limit of a class and the lower
limit of the next class. The upper-class boundary of a class coincides with the lower-
class boundary of the next class.
Cumulative Frequency:
Relative Frequency:
Percentage Frequency:
The percentage frequency of a class is obtained by dividing the class frequency by
the total frequency and multiplying by hundred. The sum of all the percentage
frequencies is 100.
Example:
The marks of 30 students of BS class are as follows:
51, 57, 64, 66, 71, 56, 58, 67, 80, 82, 71, 72, 70, 64, 66, 43, 30, 33, 38, 40, 46, 49,
55, 59, 60, 66, 70, 88, 70, 72
Make a suitable frequency distribution. Also find class boundaries and cumulative
frequency.
Solution:
To construct a frequency distribution, we proceed as follows:
a. Range = R = Maximum value – Minimum value
Here Maximum value = 88 and Minimum value = 30
So Range = R = 88 – 30 = 58
b. No. of classes = C = 1 + 3.3 log n, here n = 30
C = 1 + 3.3 log 30 = 1 + 3.3 (1.4771) = 1 + 4.87443
C = 5.87443, so C = 6 (approximately)
c. Class interval = h = R / C = 58 / 6 = 9.67, so h = 10 (approximately)
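The steps above can be carried through to a finished table with a short Python sketch. It is only an illustration: it takes the range directly from the listed marks, rounds the class interval up to a whole number, and assumes that the first class starts at the minimum value and that the class boundaries lie 0.5 below and above the class limits (a common convention for whole-number marks). The variable names are illustrative.

```python
import math

marks = [51, 57, 64, 66, 71, 56, 58, 67, 80, 82, 71, 72, 70, 64, 66,
         43, 30, 33, 38, 40, 46, 49, 55, 59, 60, 66, 70, 88, 70, 72]

n = len(marks)
data_range = max(marks) - min(marks)               # step a: range
num_classes = round(1 + 3.3 * math.log10(n))       # step b: number of classes
width = math.ceil(data_range / num_classes)        # step c: class interval

rows = []            # (lower limit, upper limit, lower boundary, upper boundary, f, cf)
cumulative = 0
lower = min(marks)   # assumption: the first class starts at the minimum value
for _ in range(num_classes):
    upper = lower + width - 1
    freq = sum(lower <= m <= upper for m in marks)
    cumulative += freq
    rows.append((lower, upper, lower - 0.5, upper + 0.5, freq, cumulative))
    lower += width

print("Class   Boundaries    f   cf")
for lo, hi, lb, ub, f, cf in rows:
    print(f"{lo}-{hi}   {lb}-{ub}   {f:2d}  {cf:3d}")
```

The last cumulative frequency equals the number of observations, which is a quick check that no mark was left out of the table.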
2.3 Diagrams and Graphs
Diagrammatic presentation of data gives a more immediate understanding of the real
situation described by the data than tabular or textual presentation. It translates
the complex ideas contained in numbers into a more concrete and quickly
understandable form. Diagrams may be less precise than tables, but they are much
more effective in displaying the data. There are many kinds of diagrams in general use.
Suppose you want to compare the marks of your classmates in a test. How can you
make the comparison interesting? It can be done by a diagrammatic representation
of the data: you can use a bar diagram, a histogram, a pie chart and so on.
How will you find out the number of students in the various categories of marks in
a certain test? What can you say about the marks obtained by most students? How
can you compare the marks of your classmates in five other tests? It is not possible
to remember the marks of each student in all subjects, and you do not have the time
to compare the marks of every student one by one. Merely noting down the marks
and making comparisons is not interesting at all.
The percentage of total income spent under various heads by a family is given
below.
Head                 Food   Clothing   Health   Education   House Rent   Miscellaneous
% of Total Income    40%    10%        10%      15%         20%          5%
Represent the above data in the form of a bar graph.
Multiple Bar Diagram
A multiple bar graph shows the relationship between different values of data. Each
data value is represented by a column in the graph. In a multiple bar graph, multiple
data points for each category of data are shown with the addition of columns.
Subdivided Bar Diagram
This is also called Component bar diagram. Instead of placing the bars for each
component side by side we may place these one on top of the other. This will result
in a component bar diagram.
Example: Draw a component bar diagram for the following data
Pie Diagram
A pie diagram is a circular diagram in which the whole circle represents a ‘total’ and
the components of the total are represented by sectors of the circle. A pie diagram is
also called a sector diagram. It is a popular diagram and is drawn when the components
are to be shown for comparison. The total angle of the circle is 360°, and the total
quantity to be represented is taken as equal to 360°. The angle for each component is
calculated in proportion to its share of the total, and these angles are marked in the
circle to show the different components.
Example: The data on Agricultural Product at current factor cost for Pakistan for
the year 1983-84 are given below. Make a pie diagram to represent the data.
Sub-sector      Product (million Rs.)
Major crops     46231
Minor crops     14971
Livestock       27096
Fishing         3082
Forestry        457
Source: Punjab Development Statistics, 1984
Solution: The necessary calculations to make the pie diagram are shown below and
the diagram is shown.
Sub-sector      Agricultural Product (million Rs.)   Angle of sub-sector (degrees)
Major crops     46231                                46231/91837 × 360 = 181.2
Minor crops     14971                                14971/91837 × 360 = 58.7
Livestock       27096                                27096/91837 × 360 = 106.2
Fishing         3082                                 3082/91837 × 360 = 12.1
Forestry        457                                  457/91837 × 360 = 1.8
Total           91837                                360
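The angle calculation for a pie diagram can be sketched in a few lines of Python. This is an illustrative check rather than part of the original solution. It uses the figure 46231 for major crops, which is the value that agrees with the stated total of 91837, and the dictionary name is chosen only for this example.

```python
# Agricultural product by sub-sector (million Rs.), as used in the solution table
products = {
    "Major crops": 46231,
    "Minor crops": 14971,
    "Livestock": 27096,
    "Fishing": 3082,
    "Forestry": 457,
}

total = sum(products.values())

# Each sector gets a share of the 360 degrees proportional to its value
angles = {name: value / total * 360 for name, value in products.items()}

for name, angle in angles.items():
    print(f"{name:12s} {angle:6.1f} degrees")
print(f"{'Total':12s} {sum(angles.values()):6.1f} degrees")
```

Because each angle is a proportional share of the circle, the angles always add up to 360 degrees apart from rounding.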
Graphs used to describe a categorical variable include the bar diagram, pie diagram,
Pareto diagram and so on.
Graphs used to describe a numerical variable include the histogram, ogive, and stem
and leaf plot.
2.2 Mercury contamination can be particularly high in certain types of fish. The
mercury content (ppm) in the hair of 40 fishermen in a region thought to be
particularly vulnerable is given below (from the paper “Mercury content of
commercially imported fish of the Seychelles, and hair mercury levels of a selected
part of the population.” Environ. Research, (1983), 305-312.)
13.26 32.43 18.10 58.23 64.00 68.20 35.35 33.92 23.94 18.28
22.05 39.14 31.43 18.51 21.03 5.50 6.96 5.19 28.66 26.29
13.89 25.87 9.84 26.88 16.81 38.65 19.23 21.82 31.58 30.13
42.42 16.51 21.16 32.97 9.84 10.64 29.56 40.69 12.86 13.80
Construct a frequency distribution of the above data, and also calculate the
cumulative and percentage frequency distributions.
2.3 You are working for the Transport manager of a large chain of supermarkets
which hires cars for the use of its staff. Your boss is interested in the weekly
distances covered by these cars. Mileages recorded for a sample of hired
vehicles from 'Fleet 1' during a given week yielded the following data:
138 164 150 132 144 125 149 157 161 150 168 126
138 186 163 146 158 140 109 136 148 152 144 145
145 109 154 165 135 156 146 183 105 108 135 153
140 135 142 128
SUGGESTED READINGS
Bluman, A.G. (2004). Elementary Statistics. A Step by Step Approach. 5th Edition.
McGraw-Hill Companies Incorporated. London.
Chaudhary, S.M. & Kamal, S. (2017). Introduction to Statistical Theory Part-I. 8th
Edition. Ilmi Kitab Khana. Lahore.
Chaudhary, S.M. & Kamal, S. (2017). Introduction to Statistical Theory Part-II. 8th
Edition. Ilmi Kitab Khana. Lahore.
Daniel, W.W. (1995). Biostatistics: A foundation for Analysis in Health Sciences.
Sixth Edition. John Wiley and sons Incorporated. USA.
Harper, W.M. (1991). Statistics. Sixth Edition. Pitman Publishing, Longman
Group, United Kingdom.
Hoel, P.G. (1976). Elementary Statistics. 4th Edition. John Wiley and Sons
Incorporated, New York.
Kiani, G. H., & Akhtar, M. S. (2012). Basic statistics, Majeed Book Depot.
Khan, A. A., Mirza, S. H., Ahmad, M. I., Baig, I. & Yaqoob, M. (2011). Business
Statistics, Qureshi Brothers Publishers.
UNIT 03
MEASURE OF CENTRAL
TENDENCY
CONTENTS
Introduction
Objectives
3.1 Importance and Properties of Averages
3.2 Type of Averages
3.3 Mean
3.4 Median
3.5 Mode
3.6 Relative Merits and Demerits of Averages
3.7 SELF ASSESSMENT QUESTIONS
Suggested Readings
Introduction
A measure of central tendency is a single value within the range of the data which
represents the complete data set and falls in the center of the array. The purpose of
measures of central tendency is to identify the location of the center of various
distributions.
Objectives
After studying this unit, you will be able to;
1. Understand why the mean is the balancing point in a distribution of scores.
2. Understand the differences between statistics and parameters.
3. Understand the strengths and weaknesses of the mean, median and mode as
measures of central tendency and when you might use one rather than the
others.
4. Understand when you might use a particular measure of central tendency to
describe a set of data.
5. Understand why there are different formulas for calculating the median for an
odd versus even number of scores for a variable.
6. Understand the purposes of measures of central tendency.
7. Calculate and interpret measures of central tendency (mode, median, mean)
for a set of data.
8. Identify the mode from a frequency distribution table or figure.
3.1 Importance and Properties of Averages
3.3 Mean
The mean is the arithmetic average of all the observations in the data. It is also the
balancing point of the data. The mean is found by adding up all of the observations
and dividing by the total number of observations, either N or n depending upon
whether you are dealing with the population or sample. The formula for the mean
is
\[ \mu = \frac{\sum x}{N} \ \text{(population)}, \qquad \bar{x} = \frac{\sum x}{n} \ \text{(sample)} \]
1. To compute a mean, the data must be measured at the interval or ratio level.
Recall from Chapter 1 that ratio-level data include such data as ages,
incomes, and weights, with the distance between numbers being constant.
3. The mean is unique. That is, there is only one mean in a set of data.
4. The sum of the deviations of each value from the mean is zero. Expressed
symbolically:
\[ \sum (x - \bar{x}) = 0 \]
As an example, the mean of 3, 8, and 4 is 5. Then:
\[ \sum (x - \bar{x}) = (3 - 5) + (8 - 5) + (4 - 5) = -2 + 3 - 1 = 0 \]
Thus, we can consider the mean as a balance point for a set of data. To illustrate,
we have a long board with the numbers 1, 2, 3, . . . , 9 evenly spaced on it. Suppose
three bars of equal weight were placed on the board at numbers 3, 4, and 8, and the
balance point was set at 5, the mean of the three numbers. We would find that the
board balances, because the deviations below the mean (−2 and −1) offset the
deviation above the mean (+3).
STATISTIC A characteristic of a sample.
The arithmetic mean number of minutes used last month by the sample of cell
phone users is 97.5 minutes.
Weighted Mean
The weighted mean is a convenient way to compute the arithmetic mean when there
are several observations of the same value. To explain, suppose the nearby
restaurant sold medium, large, and Biggie-sized soft drinks for Rs. 100, Rs. 150, and
Rs. 200, respectively. Of the last 10 drinks sold, 3 were medium, 4 were large, and 3
were Biggie-sized. To find the mean price of the last 10 drinks sold, we could use
the formula for the arithmetic mean:
\[ \bar{x} = \frac{100 + 100 + 100 + 150 + 150 + 150 + 150 + 200 + 200 + 200}{10} = \frac{1500}{10} = 150 \]
The mean selling price of the last 10 drinks is Rs. 150. An easier way to find the
mean selling price is to determine the weighted mean. That is, we multiply each
observation by the number of times it occurs.
We will refer to the weighted mean as $\bar{x}_w$. This is read “x bar sub w.”
\[ \bar{x}_w = \frac{3(100) + 4(150) + 3(200)}{10} = \frac{1500}{10} = 150 \]
In this case, the weights are frequency counts. However, any measure of
importance could be used as a weight. In general, the weighted mean of a set of
numbers designated x1, x2, x3, . . . , xn with the corresponding weights w1, w2, w3,
. . . , wn is computed by:
\[ \text{WEIGHTED MEAN:} \quad \bar{x}_w = \frac{w_1 x_1 + w_2 x_2 + w_3 x_3 + \cdots + w_n x_n}{w_1 + w_2 + w_3 + \cdots + w_n} = \frac{\sum w x}{\sum w} \]
Xi      Wi    Wi Xi
100     3     3 x 100 = 300
150     4     4 x 150 = 600
200     3     3 x 200 = 600
Total   10    1500
\[ \bar{x}_w = \frac{1500}{10} = 150 \]
EXAMPLE: The Carter Construction Company pays its hourly employees Rs. 1650,
Rs. 1900, or Rs. 2500 per hour. There are 26 hourly employees, 14 of whom are paid
at the Rs. 1650 rate, 10 at the Rs. 1900 rate, and 2 at the Rs. 2500 rate. What is the
mean hourly rate paid to the 26 employees?
SOLUTION To find the mean hourly rate, we multiply each of the hourly rates by
the number of employees earning that rate. From the weighted mean formula,
\[ \bar{x}_w = \frac{14(1650) + 10(1900) + 2(2500)}{14 + 10 + 2} = \frac{47100}{26} = 1811.54 \]
The weighted mean hourly wage is about Rs. 1811.54.
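The two weighted-mean calculations in this section can be checked with a small Python sketch. The function name `weighted_mean` is illustrative, and the third wage rate is taken as Rs. 2500, which is an assumption based on the list of rates given in the example.

```python
def weighted_mean(values, weights):
    """Return sum(w * x) / sum(w) for paired values and weights."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)

# Soft drinks: prices in Rs. weighted by the number of drinks sold
drink_mean = weighted_mean([100, 150, 200], [3, 4, 3])

# Hourly wages: rates in Rs. weighted by the number of employees
wage_mean = weighted_mean([1650, 1900, 2500], [14, 10, 2])

print(f"Mean drink price: Rs. {drink_mean:.2f}")
print(f"Mean hourly rate: Rs. {wage_mean:.2f}")
```

When every weight equals 1, the function reduces to the ordinary arithmetic mean.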
a) Ungrouped Data
If the weights of 7 ear-heads of sorghum are 89, 94, 102, 107, 108, 115 and 126 g,
find the arithmetic mean.
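For the ungrouped sorghum data, the arithmetic mean follows directly from adding the seven weights and dividing by 7. The working below is a sketch of that calculation, since the original solution is not shown here.

```latex
\[
\bar{x} = \frac{\sum x}{n}
        = \frac{89 + 94 + 102 + 107 + 108 + 115 + 126}{7}
        = \frac{741}{7}
        \approx 105.86 \text{ g}
\]
```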
b) Grouped Data
The following are the 405 soybean plant heights collected from a particular plot.
Find the arithmetic mean of the plant heights by the direct and indirect methods.
Plant height (cms)   8-12   13-17   18-22   23-27   28-32   33-37   38-42   43-47   48-52   53-57
No. of plants        6      17      25      86      125     77      55      9       4       1
Solution:
h = class interval = 5 and A = 30 (the mid value of the class 28-32, taken as the
assumed mean), so that d = (x − A)/h.
Class interval   Frequency (f)   Mid value (x)   fx      d = (x − A)/h   fd
8-12             6               10              60      -4              -24
13-17            17              15              255     -3              -51
18-22            25              20              500     -2              -50
23-27            86              25              2150    -1              -86
28-32            125             30              3750    0               0
33-37            77              35              2695    1               77
38-42            55              40              2200    2               110
43-47            9               45              405     3               27
48-52            4               50              200     4               16
53-57            1               55              55      5               5
Total            405                             12270                   24
1) Direct Method:
A.M = Σfx / Σf = 12270/405 = 30.2963
2) Indirect Method:
A.M = A + (Σfd / Σf) × h = 30 + (24/405) × 5
A.M = 30 + 120/405
A.M = 30 + 0.2963
A.M = 30.2963
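Both methods for the grouped soybean data can be reproduced with the Python sketch below. It assumes the mid value of each class is the average of its class limits and uses A = 30 as the assumed mean, as in the table. The variable names are illustrative.

```python
lower_limits = [8, 13, 18, 23, 28, 33, 38, 43, 48, 53]
upper_limits = [12, 17, 22, 27, 32, 37, 42, 47, 52, 57]
frequencies = [6, 17, 25, 86, 125, 77, 55, 9, 4, 1]
h = 5    # class interval
A = 30   # assumed mean (mid value of the class 28-32)

midpoints = [(lo + hi) / 2 for lo, hi in zip(lower_limits, upper_limits)]
n = sum(frequencies)

# Direct method: sum(f * x) / sum(f)
direct_mean = sum(f * x for f, x in zip(frequencies, midpoints)) / n

# Indirect (coding) method: A + h * sum(f * d) / sum(f), with d = (x - A) / h
d = [(x - A) / h for x in midpoints]
indirect_mean = A + h * sum(f * di for f, di in zip(frequencies, d)) / n

print(f"n = {n}")
print(f"Direct method:   {direct_mean:.4f}")
print(f"Indirect method: {indirect_mean:.4f}")
```

The two results agree because the indirect method only shifts and rescales the mid values before averaging, then reverses the change.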
Geometric Mean
The geometric mean is useful in finding the average change of percentages, ratios,
indexes, or growth rates over time. It has a wide application in business and
economics because we are often interested in finding the percentage changes in
sales, salaries, or economic figures, such as the gross domestic product, which
compound or build on each other. The geometric mean of a set of n positive
numbers is defined as the nth root of the product of the n values. The formula for the
geometric mean is written:
\[ \text{Geometric Mean} = GM = \sqrt[n]{x_1 \, x_2 \cdots x_n} \]
The geometric mean will always be less than or equal to (never more than) the
arithmetic mean. Also, all the data values must be positive. As an example of the
geometric mean, suppose you receive a 5% increase in salary this year and a 15%
increase next year. The average annual percent increase is 9.886%, not 10.0%. Why
is this so? Each increase multiplies the salary, so we average the growth factors
1.05 and 1.15:
\[ GM = \sqrt{(1.05)(1.15)} = \sqrt{1.2075} = 1.09886 \]
This can be verified by assuming that your monthly earnings were Rs. 3,000 to start
and you received two increases of 5% and 15%.
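The verification suggested above can be sketched in Python. The starting salary of Rs. 3,000 and the two increases come from the text, while the variable names and the comparison with the arithmetic mean of 10% are added here for illustration.

```python
import math

start = 3000
increases = [0.05, 0.15]
factors = [1 + r for r in increases]            # growth factors 1.05 and 1.15

# Geometric mean of the growth factors
gm_factor = math.prod(factors) ** (1 / len(factors))
avg_rate = gm_factor - 1                         # about 0.09886, i.e. 9.886%

actual = start * math.prod(factors)              # salary after both increases
via_gm = start * gm_factor ** len(factors)       # two increases at the GM rate
via_am = start * (1 + sum(increases) / len(increases)) ** len(factors)  # two at 10%

print(f"Average annual increase (GM): {avg_rate:.5%}")
print(f"Actual final salary:          Rs. {actual:.2f}")
print(f"Using the GM rate twice:      Rs. {via_gm:.2f}")
print(f"Using 10% twice:              Rs. {via_am:.2f}")
```

Applying the geometric mean rate twice reproduces the actual final salary, whereas applying 10% twice overstates it.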
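Example: Calculate the geometric mean of the following data: 50, 72, 54, 82, 93.
Solution:
x      Log of x
50     1.6990
72     1.8573
54     1.7324
82     1.9138
93     1.9685
       Σ log x = 9.1710
\[ GM = \sqrt[5]{50 \times 72 \times 54 \times 82 \times 93} = 68.26 \]
Or
\[ GM = \text{Antilog}\left(\frac{\sum \log x}{n}\right) = \text{Antilog}\left(\frac{9.1710}{5}\right) = \text{Antilog}(1.8342) = 68.26 \]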
Example: The daily incomes of ten families are given below. Find the
geometric mean.
Solution: By using the formula
\[ GM = \text{Antilog}\left(\frac{\sum f \log x}{n}\right) \]
x        f    log x    f log x
10       2    1        2
100      3    2        6
1000     2    3        6
10000    3    4        12
         n = Σf = 10   Σ f log x = 26
\[ GM = \text{Antilog}\left(\frac{\sum f \log x}{n}\right) = \text{Antilog}\left(\frac{26}{10}\right) = 398.1 \]
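The logarithm method used in both geometric mean examples can be written as one small Python function. This is a sketch: the function name `geometric_mean` and its optional `freqs` argument are illustrative, and base-10 logarithms are used to match the tables.

```python
import math

def geometric_mean(values, freqs=None):
    """Antilog of the (frequency-weighted) mean of the base-10 logs."""
    if freqs is None:
        freqs = [1] * len(values)      # ungrouped data: every value counts once
    n = sum(freqs)
    log_sum = sum(f * math.log10(x) for x, f in zip(values, freqs))
    return 10 ** (log_sum / n)

gm_ungrouped = geometric_mean([50, 72, 54, 82, 93])
gm_grouped = geometric_mean([10, 100, 1000, 10000], [2, 3, 2, 3])

print(f"Ungrouped data: GM = {gm_ungrouped:.2f}")
print(f"Grouped data:   GM = {gm_grouped:.1f}")
```

All values must be positive, since the logarithm of zero or of a negative number is not defined.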
Harmonic Mean
Example: Find the harmonic mean for the given data: 3, 5, 6, 6, 7, 10, 12.
Solution:
X      3        5        6        6        7        10       12       Total
1/X    0.3333   0.2000   0.1667   0.1667   0.1429   0.1000   0.0833   1.1929
The formula of Harmonic Mean is
\[ H.M = \frac{n}{\sum \frac{1}{x}} = \frac{7}{1.1929} = 5.868 \]
Example: The monthly incomes of 10 families in a certain village are given below.
Calculate the harmonic mean of monthly income.
Family            1    2    3    4    5     6   7    8     9    10
Income (in Rs.)   85   70   10   75   500   8   42   250   40   36
Solution:
Family   Income (x)   1/x
1        85           0.01176
2        70           0.01429
3        10           0.10000
4        75           0.01333
5        500          0.00200
6        8            0.12500
7        42           0.02381
8        250          0.00400
9        40           0.02500
10       36           0.02778
n = 10                Σ(1/x) = 0.34697
\[ \text{Harmonic Mean} = \frac{n}{\frac{1}{x_1} + \frac{1}{x_2} + \frac{1}{x_3} + \cdots + \frac{1}{x_n}} = \frac{n}{\sum \frac{1}{x}} = \frac{10}{0.34697} = 28.82 \]
Truck no 1 2 3 4 5 5
Minutes per hour 48 40 40 48 32 32
Solution: -
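Size of Items   6   7   8   9   10   11
Frequency       4   6   9   5   2    8
Solution:
The formula of Harmonic Mean for a frequency distribution is
\[ \text{Harmonic Mean} = \frac{n}{\sum f\left(\frac{1}{x}\right)} \]
X     F    1/x      f(1/x)
6     4    0.1667   0.6667
7     6    0.1429   0.8571
8     9    0.1250   1.1250
9     5    0.1111   0.5556
10    2    0.1000   0.2000
11    8    0.0909   0.7273
n = Σf = 34         Σf(1/x) = 4.1317
\[ \text{Harmonic Mean} = \frac{n}{\sum f\left(\frac{1}{x}\right)} = \frac{34}{4.1317} = 8.23 \]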
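The harmonic mean examples in this section can be checked with the following Python sketch. The function name `harmonic_mean` and its optional `freqs` argument are illustrative. The function works with unrounded reciprocals, so its results may differ slightly in the last decimal place from hand calculations based on rounded values of 1/x.

```python
def harmonic_mean(values, freqs=None):
    """Return n / sum(f / x), where n is the total frequency."""
    if freqs is None:
        freqs = [1] * len(values)      # ungrouped data: every value counts once
    return sum(freqs) / sum(f / x for x, f in zip(values, freqs))

hm_simple = harmonic_mean([3, 5, 6, 6, 7, 10, 12])
hm_income = harmonic_mean([85, 70, 10, 75, 500, 8, 42, 250, 40, 36])
hm_grouped = harmonic_mean([6, 7, 8, 9, 10, 11], [4, 6, 9, 5, 2, 8])

print(f"Simple data:    HM = {hm_simple:.4f}")
print(f"Family incomes: HM = {hm_income:.2f}")
print(f"Grouped data:   HM = {hm_grouped:.2f}")
```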
3.4 Median
Median is the value of the variable that divides the ordered set of values into two
equal halves: 50 percent of the values are to the left of the median and 50 percent
are to the right of the median.
Median for odd number of observations:
Formula to calculate median:
The following are the 405 soybean plant heights collected from a particular plot.
Find the Median of the plants height by.
The formula is, again,
\[ \text{Median} = L + \left(\frac{n}{2} - C\right) \times \frac{h}{f} \]
Where:
L is the lower class boundary of the group containing the median
n is the total number of values and f is the frequency of the median group
C is the cumulative frequency of the groups before the median group
h is the class interval or the width
Example: Find the median for the distribution of examination marks given below:
Solution
Class Interval   Class Boundaries   Mid points (x)   Frequency (f)   Cumulative frequency (cf)
30-39            29.5-39.5          34.5             8               8
40-49            39.5-49.5          44.5             87              95
50-59            49.5-59.5          54.5             190             285
60-69            59.5-69.5          64.5             304             589
70-79            69.5-79.5          74.5             211             800
80-89            79.5-89.5          84.5             85              885
90-99            89.5-99.5          94.5             20              905
Total                                                905
n = Σf = 905 and n/2 = 905/2 = 452.5, so the median is the marks of the 452.5th
student, which lies in the class 59.5-69.5 (the first class whose cumulative
frequency, 589, reaches 452.5).
Therefore L = 59.5, C = 285, f = 304 and h = 10, and
Median = L + (n/2 – C) x h/f
= 59.5 + (452.5 – 285) x 10/304
Median = 59.5 + (167.5) x 10/304
Median = 59.5 + 1675/304
Median = 59.5 + 5.5099 = 65 Marks (approximately)
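The grouped-median formula can be sketched as a short Python function. The function name and argument names are illustrative. It assumes equal class widths and takes the median class to be the first class whose cumulative frequency reaches n/2.

```python
def grouped_median(lower_boundaries, freqs, h):
    """Median = L + (n/2 - C) * h / f for a grouped frequency distribution."""
    n = sum(freqs)
    cumulative = 0                      # C: cumulative frequency before the current class
    for L, f in zip(lower_boundaries, freqs):
        if cumulative + f >= n / 2:     # this class contains the (n/2)th value
            return L + (n / 2 - cumulative) * h / f
        cumulative += f
    raise ValueError("frequencies must not be empty")

lower_boundaries = [29.5, 39.5, 49.5, 59.5, 69.5, 79.5, 89.5]
freqs = [8, 87, 190, 304, 211, 85, 20]

median = grouped_median(lower_boundaries, freqs, h=10)
print(f"Median = {median:.4f} marks")
```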
3.5 Mode
Mode is that value of the variable which occurs most frequently in the series of
observations of the variable.
A list of temperatures for one week:
Mon   Tues   Wed   Thurs   Fri   Sat   Sun
77    79     83    77      83    77    82
Here 77 occurs three times, more often than any other value, so the mode is 77.
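Example: Find the mode for the distribution of examination marks given below:
Marks       30-39   40-49   50-59   60-69   70-79   80-89   90-99
Frequency   8       87      190     304     211     85      20
Solution
Class Interval   Class Boundaries   Mid points (x)   Frequency (f)   Cumulative frequency (cf)
30-39            29.5-39.5          34.5             8               8
40-49            39.5-49.5          44.5             87              95
50-59            49.5-59.5          54.5             190             285
60-69            59.5-69.5          64.5             304             589
70-79            69.5-79.5          74.5             211             800
80-89            79.5-89.5          84.5             85              885
90-99            89.5-99.5          94.5             20              905
Total                                                905
\[ \text{Mode} = L + \frac{f_m - f_1}{(f_m - f_1) + (f_m - f_2)} \times h \]
where L is the lower class boundary of the modal class, f_m is the frequency of the
modal class, f_1 and f_2 are the frequencies of the classes before and after it, and
h is the class interval.
The modal class is the class in which the frequency is highest, i.e. frequency = 304,
so L = 59.5, f_m = 304, f_1 = 190, f_2 = 211 and h = 10.
Mode = 59.5 + (304 − 190) / ((304 − 190) + (304 − 211)) x 10
Mode = 59.5 + 114 / (114 + 93) x 10
Mode = 59.5 + 114/207 x 10
Mode = 59.5 + 5.507
Mode = 65.007
Mode = 65 Marks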
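The mode of the grouped marks can be reproduced with the Python sketch below. It assumes the usual grouped-data formula Mode = L + (fm − f1) / ((fm − f1) + (fm − f2)) × h, which gives the value 5.507 shown in the working above. The function name is illustrative, and a missing neighbouring class is treated as having frequency zero.

```python
def grouped_mode(lower_boundaries, freqs, h):
    """Mode = L + (fm - f1) / ((fm - f1) + (fm - f2)) * h for the modal class."""
    i = freqs.index(max(freqs))                       # position of the modal class
    fm = freqs[i]
    f1 = freqs[i - 1] if i > 0 else 0                 # frequency of the preceding class
    f2 = freqs[i + 1] if i < len(freqs) - 1 else 0    # frequency of the following class
    return lower_boundaries[i] + (fm - f1) / ((fm - f1) + (fm - f2)) * h

lower_boundaries = [29.5, 39.5, 49.5, 59.5, 69.5, 79.5, 89.5]
freqs = [8, 87, 190, 304, 211, 85, 20]

mode = grouped_mode(lower_boundaries, freqs, h=10)
print(f"Mode = {mode:.3f} marks")
```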
The arithmetic mean (or simply "mean") of a sample is the sum of the sampled
values divided by the number of items in the sample.
The median is that value of the series which divides the group into two equal parts,
one part comprising all values greater than the median value and the other part
comprising all the values smaller than the median value.
Merits of median
5. Real value: The median is a real value and is a better representative value
of the series than the arithmetic mean, the value of which may not exist in
the series at all.
Demerits of median
Mode:
The value of the variable which occurs most frequently in a distribution is called
the mode.
Merits of mode:
2. Less effect of marginal values: Compared to the mean, the mode is less affected
by marginal values in the series. The mode is determined only by the value with
the highest frequency.
3. Graphic presentation: The mode can be located graphically, with the help of a
histogram.
4. Best representative: The mode is the value which occurs most frequently in the
series. Accordingly, the mode is the best representative value of the series.
Demerits of mode:
3.7 SELF ASSESSMENT QUESTIONS
1. Consider the data below. This data represents the number of miles per gallon
that 30 selected four-wheel drive sports utility vehicles obtained in city driving
12 17 16 14 16 18
16 18 17 16 17 15
15 16 16 15 16 19
10 14 15 11 15 15
19 13 16 18 16 20
2. A student recorded her scores on weekly math quizzes that were marked out of
a possible 10 points. Her scores were as follows: 8, 5, 8, 5, 7, 6, 7, 7, 5, 7, 5, 5,
6, 6, 9, 8, 9, 7, 9, 9, 6, 8, 6, 6, 7. What is the Mean, Median and mode of her
scores on the weekly math quizzes?
3. The following table of grouped data represents the weight (in pounds) of 100
computer towers. Calculate the mean, Median and Mode weight for a computer.
Weight (pounds) Number of Computers
3-5 8
5-7 25
7-9 45
9 - 11 18
11 – 13 4
4. Calculate the Mean, Median and Mode from the frequency distribution for the
weight of 120 students as given in the following Table;
Weights (lbs)   110-119   120-129   130-139   140-149   150-159   160-169   170-179   180-189   190-199   200-209   210-219
f               1         4         17        28        25        18        13        6         5         2         1
SUGGESTED READINGS
Bluman, A.G. (2004). Elementary Statistics. A Step by Step Approach. 5th Edition.
McGraw-Hill Companies Incorporated. London.
Chaudhary, S.M. & Kamal, S. (2017). Introduction to Statistical Theory Part-I. 8th
Edition. Ilmi Kitab Khana. Lahore.
Chaudhary, S.M. & Kamal, S. (2017). Introduction to Statistical Theory Part-II. 8th
Edition. Ilmi Kitab Khana. Lahore.
Daniel, W.W. (1995). Biostatistics: A foundation for Analysis in Health sciences.
Sixth Edition. John Wiley and sons Incorporated. USA.
Harper, W.M. (1991). Statistics. Sixth Edition. Pitman Publishing, Longman
Group, United Kingdom.
Hoel, P.G. (1976). Elementary Statistics. 4th Edition. John Wiley and Sons
Incorporated, New York.
Kiani, G. H., & Akhtar, M. S. (2012). Basic statistics, Majeed Book Depot.
Khan, A. A., Mirza, S. H., Ahmad, M. I., Baig, I. & Yaqoob, M. (2011). Business
Statistics, Qureshi Brothers Publishers.
UNIT 04
MEASURE OF DISPERSION
CONTENTS
Introduction
Objectives
4.1 The Range
4.2 The Mean Deviation
4.3 The Variance and Standard Deviation
4.4 Coefficient of Variation
4.5 Moments
4.6 Skewness
4.7 Kurtosis
4.8 SELF ASSESSMENT QUESTIONS
Suggested Readings
Introduction
Dispersion means the scattering of the observations among themselves or from a
central value (mean, median or mode) of the data. We study dispersion to get an
idea about the variation. Measures of dispersion give us an idea about the amount of
dispersion in a set of observations.
There are two types of measures of dispersion.
1. Absolute measures of dispersion
2. Relative measures of dispersion
Difference between absolute measures and relative measures:
Absolute measures of dispersion are expressed in the same units in which the
original data are presented, but these measures cannot be used to compare the
variation between two series. Relative measures are not expressed in
units; they are pure numbers. A relative measure is the ratio of an absolute
measure of dispersion to an appropriate average, such as the coefficient of
standard deviation or the coefficient of mean deviation.
1. Absolute measures of dispersion
I. Range
II. Mean deviation
III. Standard deviation and variance
IV. Quartile deviation
2. Relative measures of dispersion
I. Coefficient of range
II. Coefficient of mean deviation
III. Coefficient of variation
IV. Coefficient of quartile deviation
Objectives
After studying this unit, you will be able to explain the following uses of measures
of dispersion:
• Comparative study: Measures of dispersion give a single value indicating
the degree of consistency or uniformity of a distribution. This single value
helps us in comparing various distributions. The smaller the magnitude
(value) of dispersion, the higher the consistency or uniformity, and vice versa.
• Reliability of an average: A small value of dispersion means low variation
between the observations and the average. It means the average is a good
representative of the observations and is very reliable. A higher value of
dispersion means greater deviation among the observations. In this case, the
average is not a good representative and cannot be considered reliable.
• Control of variability: Different measures of dispersion describe variability
from different angles, and this knowledge can prove helpful in controlling
the variation. These measures can prove very useful, especially in the
financial analysis of business and in medicine.
4.1 The Range
The range is the absolute difference between the highest and the smallest values in
a set of data.
Range is defined as the difference between the maximum (largest) and the
minimum (smallest) observation of the given data. If x_m denotes the maximum
observation and x_0 denotes the minimum observation, then the range is defined as
\[ R = x_m - x_0 \]
Example:
Suppose we have the following data of weights in lbs (pounds):
126 68 130 129 139 119 115 128 100 186 84 99
The largest value among the data = x_m = 186 lbs
The smallest value among the data = x_0 = 68 lbs
Range = largest value − smallest value = x_m − x_0 = 186 − 68 = 118 lbs
The coefficient of range can be calculated by using the following formula:
\[ CR = \frac{x_m - x_0}{x_m + x_0} = \frac{186 - 68}{186 + 68} = \frac{118}{254} = 0.465 \text{ or } 46.5\% \]
Example:
The heights (in centimeters) of second semester students of BS Statistics, measured
to the nearest whole number, are 56, 71, 62, 65, 59, 67, 64, 68, 70, 63.
Determine the range and the coefficient of range.
Solution: It is simple to find out that x_0 = 56 cm and x_m = 71 cm,
therefore
\[ R = x_m - x_0 = 71 - 56 = 15 \text{ cm} \]
and
\[ CR = \frac{x_m - x_0}{x_m + x_0} = \frac{71 - 56}{71 + 56} = \frac{15}{127} = 0.118 \text{ or } 11.8\% \]
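Both range examples can be checked with a small Python sketch. The function name `range_and_coefficient` is illustrative, and the two lists hold the weights and heights from the examples above.

```python
def range_and_coefficient(values):
    """Return (range, coefficient of range) = (xm - x0, (xm - x0) / (xm + x0))."""
    x_m, x_0 = max(values), min(values)
    return x_m - x_0, (x_m - x_0) / (x_m + x_0)

weights = [126, 68, 130, 129, 139, 119, 115, 128, 100, 186, 84, 99]   # lbs
heights = [56, 71, 62, 65, 59, 67, 64, 68, 70, 63]                     # cm

r_w, cr_w = range_and_coefficient(weights)
r_h, cr_h = range_and_coefficient(heights)

print(f"Weights: range = {r_w} lbs, coefficient of range = {cr_w:.3f}")
print(f"Heights: range = {r_h} cm,  coefficient of range = {cr_h:.3f}")
```

The coefficient of range has no units, so the two data sets can be compared even though one is in pounds and the other in centimeters.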
Activity: Calculate Range and Coefficient of Range for the following information.
5 6 7 7 9 4 5
Activity: Calculate Range and Coefficient of Range for the following information.
0.30, 2.22, 0.71, 3.53, 2.15, 4.18, 0.16, 1.25, 2.46,
8.83, 1.51, 0.92, 2.49, 2.55, 2.35, 0.50, 2.17, 2.35,
0.08, 1.22, 0.31, 1.52, 0.69, 0.24, 0.80, 1.16, 2.98,
3.72, 0.58, 6.57, 0.02, 3.93, 0.02, 1.96, 2.56, 2.61,
1.67, 0.23, 8.61, 4.84, 4.67, 4.63, 5.31, 1.11, 0.54,
1.95, 0.20, 0.57, 2.51, 1.98.
Range is based on two extreme observations. It gives no weight to the central values
of the data. It is a poor measure of dispersion and does not give a good picture of
the overall spread of the observations with respect to the center of the observations.
Let us consider three groups of data which have the same range:
In all the three groups the range is 50 – 30 = 20. In group A there is a concentration
of observations in the center. In group B the observations are concentrated in the
extreme corners, and in group C the observations are almost equally distributed in
the interval from 30 to 50. The range fails to explain differences in the three groups
of data. This defect in range cannot be removed even if we calculate the coefficient
of the range, which is a relative measure of dispersion. If we calculate the range of
a sample, we cannot draw any inferences about the range of the population.
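The formulae for the mean deviation from the mean and the corresponding coefficient
are
\[ MD(mean) = \frac{\sum_{i=1}^{n} |x_i - \bar{x}|}{n} \quad \text{and} \quad CMD = \frac{MD(mean)}{mean} = \frac{MD(mean)}{\bar{x}}. \]
Similarly the formulae for mean deviation from median and the corresponding
coefficient are
\[ MD(median) = \frac{\sum_{i=1}^{n} |x_i - \tilde{x}|}{n} \quad \text{and} \quad CMD = \frac{MD(median)}{median} = \frac{MD(median)}{\tilde{x}}. \]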
Now consider the calculation of mean deviation and coefficient of mean deviation
for the grouped data in the form of following frequency distribution.
Example:
The weights (in kg) of second semester students of BS Statistics are measured
nearest to one decimal point as 37.7, 40.3, 43.3, 44.5, 46.9, 47.6, 48.6, 51.5, 52.4,
53.8. Determine the mean deviation from mean and median and coefficient of mean
deviation from mean and median.
Solution: First we compute the mean and median as
mean $= \bar{x} = \dfrac{\sum_{i=1}^{10} x_i}{10} = \dfrac{466.6}{10} = 46.66$ kg, and median $= \tilde{x} = \dfrac{46.9 + 47.6}{2} = 47.25$ kg.
x        x − x̄     x − x̃     |x − x̄|    |x − x̃|
37.7     −8.96     −9.55     8.96       9.55
40.3     −6.36     −6.95     6.36       6.95
43.3     −3.36     −3.95     3.36       3.95
44.5     −2.16     −2.75     2.16       2.75
46.9     +0.24     −0.35     0.24       0.35
47.6     +0.94     +0.35     0.94       0.35
48.6     +1.94     +1.35     1.94       1.35
51.5     +4.84     +4.25     4.84       4.25
52.4     +5.74     +5.15     5.74       5.15
53.8     +7.14     +6.55     7.14       6.55
Total                        41.68      41.20
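Now
$MD(\text{mean}) = \dfrac{\sum_{i=1}^{n} |x_i - \bar{x}|}{n} = \dfrac{41.68}{10} = 4.17$ kg
and
$CMD = \dfrac{MD(\text{mean})}{\bar{x}} = \dfrac{4.17}{46.66} = 0.0894 = 8.94\%$.
Similarly
$MD(\text{median}) = \dfrac{\sum_{i=1}^{n} |x_i - \tilde{x}|}{n} = \dfrac{41.2}{10} = 4.12$ kg
and
$CMD = \dfrac{MD(\text{median})}{\tilde{x}} = \dfrac{4.12}{47.25} = 0.0872 = 8.72\%$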
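As a cross-check of the hand calculation, the mean deviation about the mean and about the median can be computed directly. The sketch below uses Python's standard `statistics` module; the helper name `mean_deviation` is an illustrative choice, not something defined in the guide.

```python
from statistics import mean, median

# Weights (kg) from the example above.
weights = [37.7, 40.3, 43.3, 44.5, 46.9, 47.6, 48.6, 51.5, 52.4, 53.8]

def mean_deviation(data, centre):
    """Average absolute deviation of the data from a chosen centre."""
    return sum(abs(x - centre) for x in data) / len(data)

x_bar, x_med = mean(weights), median(weights)
md_mean = mean_deviation(weights, x_bar)
md_median = mean_deviation(weights, x_med)

print(f"MD(mean)   = {md_mean:.2f} kg, CMD = {md_mean / x_bar:.4f}")
print(f"MD(median) = {md_median:.2f} kg, CMD = {md_median / x_med:.4f}")
```

Because the code divides the unrounded mean deviation by the mean, its coefficient about the mean may differ from the hand-computed value in the fourth decimal place.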
Activity:
Calculate the mean deviation from mean and median and coefficient of mean
deviation from mean and median from the following data.
6.28 6.42 5.52 6.09 5.71 6.18 5.80 6.10 6.09 6.06 6.11 5.95 6.25
6.10 6.02 6.16 5.61 5.97 5.92 5.89 6.11 5.56 5.70 5.63 6.13 5.94
6.17 6.14 5.80 5.97
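$s = \sqrt{\dfrac{\sum_{i=1}^{k} f_i (x_i - \bar{x})^2}{\sum_{i=1}^{k} f_i}} = \sqrt{\dfrac{\sum_{i=1}^{k} f_i x_i^2}{\sum_{i=1}^{k} f_i} - \left(\dfrac{\sum_{i=1}^{k} f_i x_i}{\sum_{i=1}^{k} f_i}\right)^2}$.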
Example: Compute standard deviation, variance and coefficient of variation for
the following data. 56, 71, 62, 65, 59, 67, 64, 68, 70, 63
x       x − x̄    (x − x̄)²    x²       d = x − 64    d²
56      −8.5     72.25       3136     −8            64
59      −5.5     30.25       3481     −5            25
62      −2.5     6.25        3844     −2            4
63      −1.5     2.25        3969     −1            1
64      −0.5     0.25        4096     0             0
65      0.5      0.25        4225     1             1
67      2.5      6.25        4489     3             9
68      3.5      12.25       4624     4             16
70      5.5      30.25       4900     6             36
71      6.5      42.25       5041     7             49
Total
645     0        202.50      41805    5             205
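In order to calculate the standard deviation, we first need the mean of the data,
which is computed as
$\bar{x} = \dfrac{\sum_{i=1}^{10} x_i}{10} = \dfrac{645}{10} = 64.5$ cm
The standard deviation is
$s = \sqrt{\dfrac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}} = \sqrt{\dfrac{1}{n - 1}\left[\sum_{i=1}^{n} x_i^2 - \dfrac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}\right]}$
$s = \sqrt{\dfrac{1}{10 - 1}\left[41805 - \dfrac{(645)^2}{10}\right]} = \sqrt{\dfrac{1}{9}\left[41805 - \dfrac{416025}{10}\right]} = \sqrt{\dfrac{202.5}{9}} = \sqrt{22.5} = 4.74$ cm
The variance is the square of the standard deviation:
$s^2 = \dfrac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} = \dfrac{202.5}{9} = 22.5$ cm²
and the coefficient of variation is
$CV = \dfrac{s}{\bar{x}} \times 100 = \dfrac{4.74}{64.5} \times 100 = 7.35\%$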
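The definitional and shortcut formulas for the sample variance can be compared numerically. The sketch below relies on `statistics.variance` and `statistics.stdev`, which use the divisor n − 1; the variable names are illustrative.

```python
from statistics import mean, stdev, variance

data = [56, 71, 62, 65, 59, 67, 64, 68, 70, 63]
n = len(data)

x_bar = mean(data)
s2 = variance(data)          # sample variance, divisor n - 1
s = stdev(data)              # sample standard deviation
cv = s / x_bar * 100         # coefficient of variation in percent

# Shortcut formula: (sum of x^2 - (sum of x)^2 / n) / (n - 1)
s2_shortcut = (sum(x * x for x in data) - sum(data) ** 2 / n) / (n - 1)

print(f"mean = {x_bar}, s^2 = {s2}, s = {s:.2f}, CV = {cv:.2f}%")
print(f"shortcut s^2 = {s2_shortcut}")
```

Both routes give the same variance, which is a useful check on the column totals in the table.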
The coefficient of variation (CV) is the ratio of Standard deviation to the mean. The
higher the coefficient of variation, the greater the level of dispersion around mean.
It is generally expressed as a percentage. Without units, it allows for comparison
between distributions of values whose scales of measurement are not comparable.
When we are presented with estimated values, the CV relates the standard deviation
of the estimate to the value of the estimate itself. The lower the value of the coefficient
of variation, the more precise the estimate.
Example:
Below are the scores of two cricket players A & B in 10 innings. Calculate
Coefficient of Variation for Player A and B and decide which player is more
consistent?
Player A   204   68   150   30   70   95   60   76   24   19
Player B    99  190   130   94   80   89   69   85   65   40
Solution:
Now
Similarly
And
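The worked figures for this example do not survive in this copy of the guide, so the following is only a hedged sketch of how the comparison could be carried out. It assumes the sample standard deviation (divisor n − 1); the function name `coefficient_of_variation` is illustrative.

```python
from statistics import mean, stdev

scores = {
    "A": [204, 68, 150, 30, 70, 95, 60, 76, 24, 19],
    "B": [99, 190, 130, 94, 80, 89, 69, 85, 65, 40],
}

def coefficient_of_variation(data):
    """CV in percent, using the sample standard deviation (an assumption)."""
    return stdev(data) / mean(data) * 100

cv = {player: coefficient_of_variation(s) for player, s in scores.items()}
more_consistent = min(cv, key=cv.get)   # lower CV means more consistent

for player, value in cv.items():
    print(f"Player {player}: mean = {mean(scores[player]):.1f}, CV = {value:.1f}%")
print("More consistent player:", more_consistent)
```

Since both players have the same number of innings, using the population standard deviation instead would change the CV values slightly but not the ranking.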
Activity: Calculate the variance, S.D. and C.V. from the following marks obtained
by 9 students: 45 32 37 46 39 36 41 48 36
4.5 Moments
Beyond the measures of central tendency and dispersion explained earlier, there are
measures that further describe the characteristics of a distribution. Moments are a
set of statistical parameters to measure a distribution. Four moments are commonly
used:
1st moment = μ1 = ∑(𝑥 – 𝑥̅)/𝑛
2nd moment = μ2 = ∑(𝑥 – 𝑥̅)²/𝑛
3rd moment = μ3 = ∑(𝑥 – 𝑥̅)³/𝑛
4th moment = μ4 = ∑(𝑥 – 𝑥̅)⁴/𝑛
4.6 Skewness
The term ‘skewness’ refers to a lack of symmetry or departure from symmetry; that is,
when a distribution is not symmetrical (or is asymmetrical) it is called a skewed
distribution. The measures of skewness indicate the difference between the manner
in which the observations are distributed in a particular distribution compared with
a symmetrical (or normal) distribution. The concept of skewness gains importance
from the fact that statistical theory is often based upon the assumption of the normal
distribution. A measure of skewness is, therefore, necessary in order to guard
against the consequence of this assumption. In a symmetrical distribution, the
values of mean, median and mode are alike. If the value of mean is greater than the
mode, skewness is said to be positive. In a positively skewed distribution, mean is
greater than the mode and the median lies somewhere in between mean and mode.
A positively skewed distribution contains some values that are much larger than
most other observations. A distribution is positively skewed when the long tail is
on the positive side of the peak. On the other hand, if the value of mode is greater
than mean, skewness is said to be negative. The following diagrams could clarify
the meaning of skewness.
In a negatively skewed distribution, mode is greater than the mean and the median
lies in between mean and mode. The mean is pulled towards the low-valued item
(that is, to the left). A negatively skewed distribution contains some values that are
much smaller than most observations. A distribution is negatively skewed when the
long tail is on the negative side of the peak.
Karl Pearson’s Coefficient of Skewness = 3(Mean − Median)/Standard Deviation
4.7 Kurtosis
Kurtosis refers to the degree of peakedness of a frequency curve. It tells how tall
and sharp the central peak is, relative to the standard bell (normal) curve.
With β2 = μ4/μ2² and β2 = 3 for the normal curve, kurtosis is usually reported as the
excess kurtosis β2 − 3 and can be described in the following ways:
• Platykurtic– When the excess kurtosis < 0 (β2 < 3), the frequencies throughout the
curve are closer to being equal (i.e., the curve is flatter and wider)
• Mesokurtic– When the excess kurtosis = 0 (β2 = 3), the curve has the same
peakedness as the normal curve
• Leptokurtic– When the excess kurtosis > 0 (β2 > 3), there are high frequencies in
only a small part of the curve (i.e., the curve is more peaked)
Example: Calculate first four moments about mean for ungrouped data for the
following set of examination marks: 32, 36,36, 37, 39, 41, 45, 46, 48
X            32     36    36    37    39   41   45    46     48     ΣX = 360
X − 𝑋̅        −8     −4    −4    −3    −1   1    5     6      8      Σ(X − 𝑋̅) = 0
(X − 𝑋̅)²     64     16    16    9     1    1    25    36     64     Σ(X − 𝑋̅)² = 232
(X − 𝑋̅)³     −512   −64   −64   −27   −1   1    125   216    512    Σ(X − 𝑋̅)³ = 186
(X − 𝑋̅)⁴     4096   256   256   81    1    1    625   1296   4096   Σ(X − 𝑋̅)⁴ = 10708
There are n = 9 observations and the mean is 𝑥̅ = 360/9 = 40 marks.
1st moment = μ1 = ∑(𝑥 – 𝑥̅)/𝑛 = 0/9 = 0 marks
2nd moment = μ2 = ∑(𝑥 – 𝑥̅)²/𝑛 = 232/9 = 25.78 (marks)²
3rd moment = μ3 = ∑(𝑥 – 𝑥̅)³/𝑛 = 186/9 = 20.67 (marks)³
4th moment = μ4 = ∑(𝑥 – 𝑥̅)⁴/𝑛 = 10708/9 = 1189.78 (marks)⁴
Skewness
Moment based measure of skewness = γ1 = √β1 = μ3/μ2^(3/2) = 20.67/(25.78)^(3/2) =
20.67/130.88 = 0.158. The data are close to symmetric, with a slight positive skew.
Kurtosis
Moment based measure of kurtosis = β2 = μ4/μ2² = 1189.78/(25.78)²
= 1189.78/664.49 = 1.79
Since β2 < 3 (excess kurtosis β2 − 3 = −1.21 < 0), the distribution is platykurtic,
i.e., flatter than the normal curve.
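The moment calculations for the examination marks can be verified with a short script. This is a sketch only: it divides by the number of observations (n = 9 for these marks), and the helper name `central_moment` is an illustrative choice.

```python
marks = [32, 36, 36, 37, 39, 41, 45, 46, 48]
n = len(marks)

def central_moment(data, r):
    """r-th moment about the mean, dividing by the number of observations."""
    m = sum(data) / len(data)
    return sum((x - m) ** r for x in data) / len(data)

mu2, mu3, mu4 = (central_moment(marks, r) for r in (2, 3, 4))

skewness = mu3 / mu2 ** 1.5     # gamma_1 = mu3 / mu2^(3/2)
kurtosis = mu4 / mu2 ** 2       # beta_2  = mu4 / mu2^2

print(f"n = {n}, mu2 = {mu2:.2f}, mu3 = {mu3:.2f}, mu4 = {mu4:.2f}")
print(f"skewness = {skewness:.3f}, beta_2 = {kurtosis:.2f}, excess = {kurtosis - 3:.2f}")
```

A β2 value below 3 indicates a curve flatter than the normal curve, and a value above 3 a more peaked one.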
Activity:
Calculate skewness and kurtosis for grouped data (using a continuous grouped case
formula). The following distribution relates to the number of assistants in 50 retail
establishments, the data are given below:
No of Assistant 0 1 2 3 4 5 6 7 8
Frequency 3 4 6 7 10 6 5 5 3
4.8 SELF ASSESSMENT QUESTIONS
Q1. The following data is of Batsman Score in a series
30, 91, 0, 64, 42, 80, 30,
Calculate variance, standard deviation, Co-efficient of Variation, Skewness
and Kurtosis
Q2. The following table gives the frequency distribution of the amounts of
telephone bills for April 2013 for a sample of 50 students.
Q3. The production of jute goods in different days of first and second of the year
are shown below
Q4. Terrier and SFP are two stocks traded on the New York Stock Exchange. For
the past seven weeks Friday closing price (dollars per share) was recorded:
Terrier 32 35 34 36 31 39 41
SFP 51 55 56 52 55 52 57
SUGGESTED READINGS
Bluman, A.G. (2004). Elementary Statistics. A Step by Step Approach. 5th Edition.
McGraw-Hill Companies Incorporated. London.
Chaudhary, S.M. & Kamal, S. (2017). Introduction to Statistical Theory Part-I. 8th
Edition. Ilmi Kitab Khana. Lahore.
Chaudhary, S.M. & Kamal, S. (2017). Introduction to Statistical Theory Part-II. 8th
Edition. Ilmi Kitab Khana. Lahore.
Daniel, W.W. (1995). Biostatistics: A foundation for Analysis in Health sciences.
Sixth Edition. John Wiley and sons Incorporated. USA.
Harper, W.M. (1991). Statistics. Sixth Edition. Pitman Publishing, Longman
Group, United Kingdom.
Hoel, P.G. (1976). Elementary Statistics. 4th Edition. John Wiley and Sons
Incorporated, NewYork.
Kiani, G. H., & Akhtar, M. S. (2012). Basic statistics, Majeed Book Depot.
Khan, A. A., Mirza, S. H., Ahmad, M. I., Baig, I. & Yaqoob, M. (2011). Business
Statistics, Qureshi Brothers Publishers.
UNIT 05
CONTENTS
Introduction
Objectives
5.1 Random Experiment
5.2 Random Variable
5.3 Discrete Distribution
5.4 Continuous Distribution
5.5 Expectation of Random Variables
5.6 Linear Transformation of Random Variable
5.7 Jointly Distributed Random Variables
5.8 Covariance and Correlation
5.9 Some Rules and Symbols
5.10 Rules of Counting
5.11 SELF ASSESSMENT QUESTIONS
Suggested Readings
Introduction
Chance is what makes life worth living – if everything was known in advance,
imagine the disappointment! If decision-makers had perfect information about the
future as well as the present and the past, there would be no need to consider the
concepts of probability. However, it is usually the case that uncertainty cannot be
eliminated and hence its presence should be recognized and used in the process of
decision-making. Information about uncertainty is often available to the
decision-maker in the form of probabilities. This chapter introduces the fundamental
concepts of probability. In other subjects (e.g. Management Science Methods) you
may make full use of probabilities in decision trees and highlight ways in which
such information can be used. Our treatment of probability in this module is quite
superficial. The concepts of probability are simple but applying them in some
circumstances can be very difficult! As a preliminary we consider the basic ideas
concerning sets.
Objectives
After studying this unit, you will be able to understand the ideas of randomness and
variability, and the way in which these link to probability theory to allow the
systematic and logical collection of statistical techniques of great practical
importance in many applied areas.
5.1 Random Experiment
An experiment is any well-defined, repeatable procedure, usually involving one or
more chance events. One repetition of the procedure is called a trial. When a trial
is conducted, it results in some outcome. (Note that, in the usual case where the
experiment involves randomness, different trials can result in different outcomes.)
A random variable is a measurable (numeric) quantity associated with the outcome
of an experiment. An event is a statement about the outcome of the experiment that
is either true or false.
This topic is returned to, and made more substantial use of, in the Statistics,
economics and Management Mathematics courses.
• Sample Space, S. For a given experiment the sample space, S, is the set of all
possible outcomes.
• Event, E. This is a subset of S. If an event E occurs, the outcome of the
experiment is contained in E.
Example. Suppose you arrive at a railway station at a random time. There is a train
once an hour. The random experiment is to observe the number of (rounded up)
minutes that you wait before a train leaves.
The elementary outcomes here are the integers (whole numbers) 1 to 60, and the
sample space is {1,2,3…60}. The event that ‘you wait less than 10 minutes’ is the
subset {1,2,3,4,5,6,7,8,9}.
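Sample spaces and events map naturally onto Python sets. The sketch below represents the railway example; the probability line additionally assumes that all 60 outcomes are equally likely, which the example suggests but does not state outright.

```python
from fractions import Fraction

# Sample space: waiting time in whole minutes, rounded up.
sample_space = set(range(1, 61))

# Event: "you wait less than 10 minutes".
event = {minutes for minutes in sample_space if minutes < 10}

# Assumption: every outcome in the sample space is equally likely.
p_event = Fraction(len(event), len(sample_space))

print(sorted(event))
print("P(wait < 10 minutes) =", p_event)
```

Because an event is just a subset of the sample space, complements, unions and intersections of events can be formed with the usual set operations (`-`, `|`, `&`).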
5.2 Random Variable
1. A random variable is a variable which takes specific values with specific
probabilities.
It can be thought of as a variable whose value depends on the outcome of an
uncertain event.
2. We usually use capital letters to denote random variables, e.g. W,
X, Y or Z.
The word “random” in the term “random variable” does not necessarily
imply that the outcome is completely random in the sense that all values are
equally likely. Some values may be more likely than others; “random”
simply means that the value is uncertain.
When you think of a random variable, immediately ask yourself
• What are the possible values?
• What are their probabilities?
Example: Let Y be the sum of two dice rolls.
The probabilities assigned to the possible values of a random variable are its
distribution. A distribution completely describes a random variable.
A random variable is called discrete if it has countably many possible
values; otherwise, it is called continuous. For example, if the possible values
are any of these:
• {1, 2, 3, ...}
• {..., −2, −1, 0, 1, 2, ...}
• {0, 2, 4, 6, ...}
• {0, 0.5, 1.0, 1.5, 2.0, ...}
• any finite set
then the random variable is discrete.
If the possible values are any of these:
• all numbers between 0 and ∞
• all numbers between −∞ and ∞
• all numbers between 0 and 1
then the random variable is continuous.
Sometimes, we approximate a discrete random variable with a continuous one if
the possible values are very close together; e.g., stock prices are often
treated as continuous random variables.
Example: If X is the outcome of the roll of a die, then P (X = 1)= P (X = 2)= ···
= P (X = 6) = 1/6, and P (X = x) = 0 for all other values of x.
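[Figure: left panel, probability mass function of the sum of two dice; right panel, probability density of a continuous random variable]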
In the figure above, the left panel shows the probability mass function (pmf) for the
sum of two dice; the possible values are 2 through 12 and the heights of the bars
give their probabilities. The bar heights sum to 1. The right panel shows a probability
density for a continuous random variable. The probability P (1 < X ≤ 1.5) is
given by the shaded area under the curve between 1 and 1.5. The total area under
the curve is 1. The probability of any particular value, e.g., P (X = 1), is zero
because there is no area under a single point.
We always use capital letters for random variables. Lower-case letters like x and y
stand for possible values (i.e., numbers) and are not random.
i. The probability that X falls between two points a and b is the area
under f(X) between the points a and b.
ii. The familiar bell-shaped curve is an example of a density.
(X < x). For a discrete random variable, the two probabilities are not in general
equal.
5. The probability that X falls between two points a and b is given by the
difference between the cdf values at these points:
P (a < X ≤ b)= FX (b) − FX (a).
Since FX (b) is the area under fX to the left of b and since FX (a) is the area under
fX to the left of a, their difference is the area under fX between the two points.
2. The formula for the expected value of a discrete random variable is this:
E[X] = Σ x P (X = x), where the sum is over all possible x.
In words, the expected value is the sum, over all possible values x, of x times its
probability P (X = x).
In general, E[g(X)] is not the same as g(E[X]). In particular, E[X²] is not the same
as (E[X])².
The variance is the expected value of the squared difference between X and its mean.
For a discrete distribution, we can write the variance as
$\sigma_X^2 = \sum_x (x - \mu_X)^2 \, P(X = x)$.
7. An alternative expression for the variance (valid for both discrete and
continuous random variables) is
$\sigma_X^2 = E[X^2] - [\mu_X]^2 = E[X^2] - [E(X)]^2$.
This is the difference between the expected value of X² and the square of the mean of X.
9. In fact, there is a Chebyshev rule for random variables: if m > 1, then the
probability that X falls within m standard deviations of its mean is at least
1 − 1/m²; that is,
$P(\mu_X - m\sigma_X \le X \le \mu_X + m\sigma_X) \ge 1 - \dfrac{1}{m^2}$.
10. Find the variance and standard deviation for the roll of one die.
Solution: We use the formula Var[X] = E[X²] − (E[X])². We found previously that
E[X] = 3.5, so now we need to find E[X²]. This is given by
$E[X^2] = \sum_x x^2 P_X(x) = 1^2\left(\tfrac{1}{6}\right) + 2^2\left(\tfrac{1}{6}\right) + \cdots + 6^2\left(\tfrac{1}{6}\right) = 15.167$.
Thus,
$\sigma_X^2 = Var[X] = E[X^2] - (E[X])^2 = 15.167 - (3.5)^2 = 2.917$
and $\sigma_X = \sqrt{2.917} = 1.708$.
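The expectation and variance of a single die roll can be reproduced exactly with rational arithmetic. The following is a minimal sketch; the helper `expectation` is an illustrative name, and the pmf is stored as a plain dictionary.

```python
from fractions import Fraction
from math import sqrt

# pmf of a fair die: each face has probability 1/6.
pmf = {x: Fraction(1, 6) for x in range(1, 7)}

def expectation(pmf, g=lambda x: x):
    """E[g(X)] = sum of g(x) * P(X = x) over all possible x."""
    return sum(g(x) * p for x, p in pmf.items())

e_x = expectation(pmf)                       # E[X]
e_x2 = expectation(pmf, lambda x: x ** 2)    # E[X^2]
var_x = e_x2 - e_x ** 2                      # Var[X] = E[X^2] - (E[X])^2
sd_x = sqrt(var_x)

print(f"E[X] = {float(e_x)}, E[X^2] = {float(e_x2):.3f}")
print(f"Var[X] = {float(var_x):.3f}, sd = {sd_x:.3f}")
```

Passing a different function `g` to the same helper gives the expected value of any function of X.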
5.6 Linear Transformations of Random Variables
For constants a and b, the random variable a + bX
is a linear transformation of X. It scales X by b and shifts it by a. A linear
transformation of X is another random variable; we often denote it by Z.
Example: Suppose you have investments in Japan. The value of your investment
(in yen) one month from today is a random variable X. Suppose you can convert
yen to dollars at the rate of b dollars per yen after paying a commission of a
dollars. What is the value of your investment, in dollars, one month from today?
Activity: Your salary is Rs. a per year. You earn a bonus of Rs. b for every rupee
of sales you bring in. If X is what you sell, how much do you make?
Example: It takes you exactly 16 minutes to walk to the train station. The
train ride takes X hours, where X is a random variable. How long is your trip,
in minutes?
2. Thus, the expected value of a linear transformation of X is just the linear
transformation of the expected value of X. Previously, we said that E[g(X)] and
g(E[X]) are generally different. The only case in which they are the same is when
g is a linear transformation: g(x)= a + bx.
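The rule E[a + bX] = a + bE[X] can be illustrated with the train example, where the trip length in minutes is Z = 16 + 60X. The distribution of X below is purely hypothetical (the guide does not give one); it is there only so that the two sides of the rule can be compared.

```python
from fractions import Fraction

# Hypothetical pmf for the train ride X, in hours (not taken from the guide).
pmf_x = {
    Fraction(1, 2): Fraction(1, 5),
    Fraction(3, 4): Fraction(1, 2),
    Fraction(1, 1): Fraction(3, 10),
}

def expectation(pmf, g=lambda x: x):
    """E[g(X)] for a discrete pmf stored as {value: probability}."""
    return sum(g(x) * p for x, p in pmf.items())

a, b = 16, 60                                      # 16-minute walk, 60 minutes per hour
e_x = expectation(pmf_x)
e_z_direct = expectation(pmf_x, lambda x: a + b * x)   # E[a + bX] computed term by term
e_z_rule = a + b * e_x                                  # a + b E[X]

e_x2 = expectation(pmf_x, lambda x: x ** 2)             # a non-linear g for contrast

print("E[Z] directly:", float(e_z_direct), "| by the rule:", float(e_z_rule))
print("E[X^2] =", float(e_x2), "but (E[X])^2 =", float(e_x ** 2))
```

The first line of output shows the two routes agreeing for a linear g, while the second shows that they differ for the non-linear g(x) = x².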
Examples
(a) Think of the price of each stock in the Pakistan exchange as a random variable;
the movements of these variables are related.
(b) You may be interested in the probability that a randomly selected shopper
buys prepared frozen meals. In designing a promotional campaign you might be
even more interested in the probability that that same shopper also buys instant
coffee and reads a certain magazine.
(c) The number of defects produced by a machine in an hour is a random variable.
The number of hours the machine operator has gone without a break is another
random variable. You might well be interested in probabilities involving these
two random variables together.
For example, if X and Y are random variables, their joint pmf is PX,Y
(x, y) = P (X = x, Y = y) = probability that X = x and Y = y.
For several random variables X1,... , Xn, we denote the joint pmf by P(X1 ,...,Xn)
                      Years of prior experience (X)
Years stayed (Y)      1       2       3
2                     .03     .05     .22
3                     .05     .06     .15
4                     .14     .15     .15
Thus, the proportion of employees that had 1 year prior experience and
stayed for 2 years is 0.03. If we let Y = years stayed and X = years’ experience,
we can express this as
PX,Y (1, 2) = P (X = 1,Y = 2)= 0.03.
The table above determines all values of PX,Y (x, y).
5. What proportion of employees stay 4 years? What proportion are hired with
just 1 year of experience?
These are questions about marginal probabilities; i.e., probabilities involving
just one of the random variables. A marginal probability for one random variable
is found by adding up over all values of the other random variable; e.g.,
P (X = x)=ΣP (X = x, Y = y),
where the sum ranges over all possible y values. In the table, the marginal
probabilities correspond to the column-sums and row-sums. So, the answers
to the two questions just posed are 0.44 and 0.22 (the last row-sum and the
first column-sum).
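The row-sums and column-sums described above can be computed mechanically. In this sketch the joint pmf from the table is stored as a dictionary keyed by (x, y) = (years of experience, years stayed); entering the probabilities in hundredths is just a device to keep the arithmetic exact.

```python
from collections import defaultdict
from fractions import Fraction

# Joint pmf from the table, in hundredths, keyed by (x, y).
hundredths = {
    (1, 2): 3,  (2, 2): 5,  (3, 2): 22,
    (1, 3): 5,  (2, 3): 6,  (3, 3): 15,
    (1, 4): 14, (2, 4): 15, (3, 4): 15,
}
joint = {xy: Fraction(h, 100) for xy, h in hundredths.items()}

# Marginals: add up the joint pmf over all values of the other variable.
p_x, p_y = defaultdict(Fraction), defaultdict(Fraction)
for (x, y), p in joint.items():
    p_x[x] += p
    p_y[y] += p

print("P(Y = 4) =", float(p_y[4]))   # proportion staying 4 years
print("P(X = 1) =", float(p_x[1]))   # proportion hired with 1 year of experience
```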
Cov[X, Y ]= E[(X − µX)(Y − µY )] = E[XY ] − µXµY .
If X tends to be large when Y is large, the covariance will be positive.
A positive ρXY implies that X tends to be large when Y is large and vice-versa. A
negative ρXY implies that X tends to be large when Y is small and vice-versa.
• Correlation measures the strength of linear dependence between two random
variables. If Y = a + bX and b ≠ 0, then |ρXY| = 1; its sign is positive or negative
according to whether b is positive or negative. Conversely, if |ρXY| = 1 then
Y = a + bX for some values of a and b.
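As an illustration of Cov[X, Y] = E[XY] − µXµY, the sketch below reuses the experience and years-stayed table from the section on jointly distributed random variables. The correlation is taken as the covariance divided by the product of the two standard deviations; the helper name `expect` is illustrative.

```python
from fractions import Fraction
from math import sqrt

# Joint pmf of X (years of experience) and Y (years stayed), in hundredths.
hundredths = {
    (1, 2): 3,  (2, 2): 5,  (3, 2): 22,
    (1, 3): 5,  (2, 3): 6,  (3, 3): 15,
    (1, 4): 14, (2, 4): 15, (3, 4): 15,
}
joint = {xy: Fraction(h, 100) for xy, h in hundredths.items()}

def expect(g):
    """E[g(X, Y)] under the joint pmf."""
    return sum(g(x, y) * p for (x, y), p in joint.items())

mu_x, mu_y = expect(lambda x, y: x), expect(lambda x, y: y)
cov = expect(lambda x, y: x * y) - mu_x * mu_y      # E[XY] - mu_X * mu_Y
var_x = expect(lambda x, y: x * x) - mu_x ** 2
var_y = expect(lambda x, y: y * y) - mu_y ** 2
rho = float(cov) / sqrt(var_x * var_y)              # correlation coefficient

print(f"E[X] = {float(mu_x)}, E[Y] = {float(mu_y)}")
print(f"Cov[X, Y] = {float(cov):.3f}, correlation = {rho:.2f}")
```

The sign of the result shows the direction of the linear association, and the correlation always lies between −1 and 1.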
If X, Y are independent, then their covariance is zero and Var[X + Y ]= Var[X]+
Var[Y ].
there is a covariance term for each pair of variables. If the variables are independent,
then this simplifies to
Var[X1 + ··· + Xn]= Var[X1]+ ··· + Var[Xn].
If, in addition, X1,... , Xn all have variance σ², then Var[X1 + ··· + Xn] = σ² + ··· +
σ² = nσ²,
and thus Standard Deviation [X1 + ··· + Xn] = √𝑛 σ.
Example. A population of interest has four members: Ali, Gulzar, Ibrar and Zeenat.
A random experiment selects a sample of size two from the population without
replacement. The sample space is:
S = {(Ali, Gulzar), (Ali, Ibrar), (Ali, Zeenat), (Gulzar, Ibrar), (Gulzar, Zeenat),
(Ibrar, Zeenat)}.
The event that ‘the sample includes Ibrar’ is the subset: {(Ali, Ibrar), (Gulzar,
Ibrar), (Ibrar, Zeenat)}. This example shows that the elementary outcomes can
themselves be sets.
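The sample space for drawing two of the four people without replacement can be generated rather than listed by hand. This is a small sketch using `itertools.combinations`, which produces unordered selections.

```python
from itertools import combinations

population = ["Ali", "Gulzar", "Ibrar", "Zeenat"]

# All samples of size two, drawn without replacement and ignoring order.
sample_space = list(combinations(population, 2))

# Event: the sample includes Ibrar.
event = [sample for sample in sample_space if "Ibrar" in sample]

print(len(sample_space), "possible samples:", sample_space)
print(len(event), "of them include Ibrar:", event)
```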
• Complement. We write the complement of E as Ec . It indicates all the elements
of a set not in event E. Looking at Example again, throwing a die, you can see that
Ec is = {1,2,5,6}.
Examples:
1. Number of phone calls received in a day by a company
2. Number of heads in 5 tosses of a coin
A common way to tabulate all of this information is to make a list or table of all the
possible values of X along with their corresponding probabilities. The associated
function is called the probability density function of X:
Definition: If X is a random variable on the sample space S, then the function p
such that P(X ∈ E) = Σ p(a), summed over all a in E, for any set of numbers E is
called the probability density function (pdf) of X.
Explicitly, the value of p(a) on a real number a is the probability that the random
variable X takes the value a.
For discrete random variables with a small number of outcomes, we usually
describe the probability density function using a table of values. In certain
situations, we can find a convenient formula for the values of the probability density
function on arbitrary events, but in many other cases, the best we can do is simply
to tabulate all the different values.
Example: If two standard 6-sided dice are rolled, find the probability distribution
for the random variable X giving the sum of the outcomes. Then calculate (i)
P(X=7), (ii) P(4< X<9), and (iii) P(X≤ 6).
To find the probability distribution for X, we identify all of the possible values for
X and then tabulate the respective outcomes in which each value occurs.
We can see that the possible values for X are 2, 3, 4, ... , 12, and that they occur as
follows:
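The distribution of the sum of two dice can be tabulated by enumerating all 36 equally likely outcomes. The sketch below does this and then evaluates the three requested probabilities; the variable names are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Count how many of the 36 equally likely outcomes give each sum.
counts = Counter(a + b for a, b in product(range(1, 7), repeat=2))
pmf = {total: Fraction(c, 36) for total, c in sorted(counts.items())}

for total, p in pmf.items():
    print(f"P(X = {total:2d}) = {p}")

p_7 = pmf[7]                                              # (i)   P(X = 7)
p_between = sum(p for s, p in pmf.items() if 4 < s < 9)   # (ii)  P(4 < X < 9)
p_at_most_6 = sum(p for s, p in pmf.items() if s <= 6)    # (iii) P(X <= 6)

print(p_7, p_between, p_at_most_6)
```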
Example: If a fair coin is flipped 4 times, find the probability distributions for the
random variable X giving the number of total heads obtained, and for the random
variable Y giving the longest run of consecutive tails obtained. Then calculate (i)
P(X = 2), (ii) P(X ≥ 3), (iii) P(1 < X < 4), (iv) P(Y = 1), (v) P(Y ≤ 3), and (vi) P(X
= Y = 2).
For X, we obtain the following distribution:
Value (X)   Outcomes                                             Probability
0           (TTTT)                                               1/16
1           (TTTH), (TTHT), (THTT), (HTTT)                       1/4
2           (TTHH), (THTH), (THHT), (HTTH), (HTHT), (HHTT)       3/8
3           (THHH), (HTHH), (HHTH), (HHHT)                       1/4
4           (HHHH)                                               1/16
To find P(X = Y = 2) we must look at the individual outcomes where X and Y are
both equal to 2. There are 3 such outcomes, namely (TTHH), (HTTH) and (HHTT), so
P(X = Y = 2) = 3/16.
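Hand counts over the 16 outcomes of four coin flips are easy to get wrong, so enumeration is a useful cross-check. In this sketch X is the number of heads and Y is the longest run of consecutive tails; the helper names are illustrative.

```python
from fractions import Fraction
from itertools import groupby, product

outcomes = list(product("HT", repeat=4))     # 16 equally likely sequences

def heads(flips):
    """X: total number of heads."""
    return flips.count("H")

def longest_tail_run(flips):
    """Y: length of the longest run of consecutive tails (0 if there are none)."""
    return max((len(list(g)) for face, g in groupby(flips) if face == "T"), default=0)

def prob(condition):
    """Probability of the event described by `condition`."""
    return Fraction(sum(1 for o in outcomes if condition(o)), len(outcomes))

print("P(X = 2)     =", prob(lambda o: heads(o) == 2))
print("P(X >= 3)    =", prob(lambda o: heads(o) >= 3))
print("P(1 < X < 4) =", prob(lambda o: 1 < heads(o) < 4))
print("P(Y = 1)     =", prob(lambda o: longest_tail_run(o) == 1))
print("P(Y <= 3)    =", prob(lambda o: longest_tail_run(o) <= 3))
print("P(X = Y = 2) =", prob(lambda o: heads(o) == 2 and longest_tail_run(o) == 2))
```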
The name for this random variable comes from the idea of a Bernoulli trial, which
is an experiment having only two possible outcomes, success (with probability p)
and failure (with probability 1 − p). We think of E as being the event of success,
while Ec is the event of failure.
The basic counting principles considered here are addition, multiplication,
permutation and combination. The one that is most closely associated with the
title of “fundamental counting principle” is the multiplication rule, where if
there are p ways to do one task and q ways to do another task, then there are p × q
ways to do both.
When selecting elements of a set, the number of possible outcomes depends on the
conditions under which the selection has taken place.
Sometimes counting the "number of ways an Event E can occur" or the "total
number of possible outcomes" can be fairly complicated. In this section, we'll learn
several counting techniques, which will help us calculate some of the more
complicated probabilities.
Addition Principle
The Sum Rule states that if a task can be performed in one of two ways, where the
two ways cannot be performed simultaneously, then the number of ways of
completing the task is the sum of the numbers of ways of each kind.
Example: if an experiment can proceed in one of two ways, with n1 outcomes for
the first way and n2 outcomes for the second, then the total number of outcomes
for the experiment is n1 + n2.
Example
For instance, suppose a bakery has a selection of 20 different cupcakes, 10
different donuts, and 15 different muffins. If you are to select a tasty treat, how
many different choices of sweets can you choose from?
Multiplication Principle
The Product Rule states that if a task can be broken into a sequence of subtasks,
performed one after the other, then the number of ways of completing the task is
the product of the numbers of ways of performing each subtask.
Example
Continuing our story from above, suppose a bakery has a selection of 20
different cupcakes, 10 different donuts, and 15 different muffins — how many
different orders are there?
Solution
What makes this question different from the first problem is that we
are not asking how many total choices there are. We are asking how many
different ways we can select a treat.
It’s possible that you only want one treat, but you can quite easily want more
than one.
So how many different orders can you create, if you’re allowed to choose as few
or as many as you like?
Example
Now let’s look at another example. Suppose a representative is to be chosen from
37 members of a mathematics faculty and 83 mathematics majors, and no one is both
a faculty member and a student. In how many ways can the representative be picked?
Solution: By the sum rule, it follows that there are 37 + 83 = 120 possible ways
to pick a representative.
Remember, the product rule states that if there are p ways to do one task and q
ways to do another task, then there are p × q ways to do both.
Example
A restaurant menu offers 4 starters, 7 main courses and 3 different desserts. How
many different three-course meals can be selected from the menu?
Solution:
Multiplying together the number of choices for each course
gives 4×7×3=84 different three-course meals.
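The sum and product rules can both be confirmed by listing the possibilities. In the sketch below the dish names are placeholders invented for illustration; only the counts (4 starters, 7 mains, 3 desserts, and 20, 10 and 15 bakery items) come from the examples.

```python
from itertools import product

# Product rule: one choice from each course, made in sequence.
starters = [f"starter {i}" for i in range(1, 5)]   # placeholder names
mains = [f"main {i}" for i in range(1, 8)]
desserts = [f"dessert {i}" for i in range(1, 4)]

meals = list(product(starters, mains, desserts))
print("Three-course meals:", len(meals))

# Sum rule: a single treat is a cupcake OR a donut OR a muffin.
cupcakes, donuts, muffins = 20, 10, 15
single_treats = cupcakes + donuts + muffins
print("Choices of a single treat:", single_treats)
```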
Permutation and Combination
Both combination and permutation are concerned with the number of ways of
selecting and arranging of objects. Combination is simply concerned with selection
while permutation is concerned with arrangement. There is therefore a slight
difference between the two.
Combinations
The term, combination refers to the number of ways of selecting objects from a
group of objects at a time without considering the order in which they are selected.
In other words, the combination of n different items taking r objects at a time is the
selection of r out of the n objects with no attention paid to the order of selection.
The number of possible combinations of n objects taking r at a time is denoted by
nCr and is expanded as follows:
$^{n}C_{r} = \dfrac{n!}{(n - r)!\,r!}$
[n! = n(n−1)(n−2)(n−3)···(1); e.g. 5! = 5 × 4 × 3 × 2 × 1 = 120]
For example, consider the selection of two numbers at a time from the set {1, 2, 3,
4, 5}. The possible selections by combination are:
(1, 2) (1, 3) (1, 4) (1, 5) (2, 3) (2, 4) (2, 5) (3, 4) (3, 5) and (4, 5).
Ten possible selections are therefore made. Using the formula,
$^{5}C_{2} = \dfrac{5!}{(5 - 2)!\,2!} = \dfrac{120}{6 \times 2} = 10$ ways,
which agrees with the list above.
Another example can be taken from a lottery in which, out of all the numbers from
1 to 90, five are selected as the winning numbers for the National lottery. The
selection of the five numbers out of the ninety is a combination, since the order
in which the winning numbers are picked does not matter. The number of possible
selections in this case is given by:
$^{90}C_{5} = \dfrac{90!}{(90 - 5)!\,5!} = \dfrac{90!}{85!\,5!} = 43{,}949{,}268$ ways.
Hence, the chance of winning the lottery is very low, since 43,949,268 different
sets of five winning numbers can be selected.
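Combination counts can be obtained either from the formula or by listing the selections. The sketch below uses `math.comb` (available in Python 3.8 and later) for the formula and `itertools.combinations` for the listing.

```python
from itertools import combinations
from math import comb

# Selecting two numbers from {1, 2, 3, 4, 5} without regard to order.
pairs = list(combinations([1, 2, 3, 4, 5], 2))
print(pairs)
print("Listed:", len(pairs), "| formula:", comb(5, 2))

# Lottery: five winning numbers out of ninety.
lottery_sets = comb(90, 5)
print("Possible winning sets:", f"{lottery_sets:,}")
print("Chance of one ticket winning:", 1 / lottery_sets)
```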
Permutations
The term, permutation, on the other hand, refers to the number of ways of arranging
objects from a group of objects at a time with attention given to the order of
arrangement. In other words, a permutation of n objects taking r at a time is the
number of arrangements of r objects out of the n objects with attention paid to the
order of arrangement. Thus, if n is the total number of objects in the group and r
are to be selected at a time taking into consideration the order of arrangement, the
possible number of ways is given by:
$^{n}P_{r} = \dfrac{n!}{(n - r)!}$
For example, consider the selection of two numbers at a time from the set {1, 2, 3,
4, 5}. The possible selections by permutation are:
(1, 2) (2, 1) (1, 3) (3, 1) (1, 4) (4, 1) (1, 5) (5, 1) (2, 3) (3, 2) (2, 4) (4, 2)
(2,5) (5, 2) (3, 4) (4, 3) (3, 5) (5, 3) (4, 5) and (5, 4). Twenty possible selections
are therefore made.
The total number of objects in the set = 5; number of items selected at a time = 2
∴ Number of ways of making the selection is given by:
$^{5}P_{2} = \dfrac{5!}{(5 - 2)!} = \dfrac{5!}{3!} = \dfrac{5 \times 4 \times 3 \times 2 \times 1}{3 \times 2 \times 1} = 20$ ways.
Twenty possible ways of arrangement can be made, as we can see from the
illustration above.
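The same comparison works for permutations, where order matters. This sketch uses `math.perm` (Python 3.8 and later) and `itertools.permutations`, and shows how each unordered pair gives rise to 2! ordered arrangements.

```python
from itertools import permutations
from math import comb, factorial, perm

# Ordered selections of two numbers from {1, 2, 3, 4, 5}.
ordered_pairs = list(permutations([1, 2, 3, 4, 5], 2))
print(ordered_pairs)
print("Listed:", len(ordered_pairs), "| formula:", perm(5, 2))

# Each combination of r objects can be arranged in r! orders: nPr = nCr * r!
print(perm(5, 2) == comb(5, 2) * factorial(2))
```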
The Concept of Exclusion and Inclusion in Combinations
Exclusion
If some objects are to be selected by combinational means in such a way that some
particular objects are to be excluded, the number to be excluded should be deducted
from the total. The experiment is then conducted on the remaining objects.
For example, assume there are 8 boys in a class from which a committee of 3
boys is to be formed. The number of ways of forming the committee so that 2
particular boys are excluded can be determined as follows:
Number of ways $= {}^{8-2}C_{3} = {}^{6}C_{3} = \dfrac{6!}{(6 - 3)!\,3!} = 20$ ways
Inclusion
For an object or objects to be included, it has to affect both the total number of
objects and the number of objects to be selected. The number to be included should
be deducted from the total, and also from the number to be selected. Then, the
number of combinations of the remaining objects gives the number of ways
required.
For example, consider the number of ways a committee of 4 girls can be formed from a
group of 10 girls if:
Solution:
∴ Number of ways $= {}^{9}C_{3} = \dfrac{9!}{(9 - 3)!\,3!} = \dfrac{9!}{6!\,3!} = 84$ ways
(c) Number of ways $= {}^{10-2}C_{4-2} = {}^{8}C_{2} = \dfrac{8!}{(8 - 2)!\,2!} = \dfrac{8!}{6!\,2!} = 28$ ways
(a) No consideration is given to sex, (b) Two boys and three girls should be on the
committee
Solution
Solution
Either one is to be excluded from union members or one to be excluded from the
non-union members. Therefore, number of ways the committee can be constituted
$= ({}^{4}C_{2} \times {}^{7}C_{3})$ or $({}^{5}C_{2} \times {}^{6}C_{3}) = (6 \times 35) + (10 \times 20) = 410$ ways
Solution
Solution
Solution
Since the marbles are coloured differently, the order of arrangement is important.
Therefore the number of ways $= {}^{6}P_{6} = 6! = 720$ ways.
Activity: In how many ways can 8 people be seated on a bench if only 3 seats are
available?
Example: Six men and five women are to be seated in a row so that women occupy
the even places. How many such arrangements are possible?
Solution
Example: In how many ways can 5 people be seated at a round table if (a) They
sit anywhere, (b) Two particular people must sit together, (c) Two particular
people must not sit together
Solution
(a) Since they are to sit around a table, one of them should be made fixed.
Thus the number of ways is given by: $^{5-1}P_{5-1} = {}^{4}P_{4} = 24$ ways
(b) The two particular people to be seated together should be considered as one
person, so that there would effectively be 4 people altogether, and they can be
arranged in $^{4-1}P_{4-1} \times 2! = {}^{3}P_{3} \times 2 = 12$ ways.
(c) The number of ways of arranging 5 people at a round table so that 2
particular people do not sit together is 24 – 12 = 12 ways.
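The three counts for the round table can be checked by listing the seatings directly. The sketch below is only an illustration: the labels P1 to P5 are hypothetical names for the five people, and P1 and P2 are assumed to be the two particular people.

```python
from itertools import permutations

def circular_arrangements(people):
    """Yield seatings around a round table, fixing the first person to remove rotations."""
    first, *rest = people
    for perm in permutations(rest):
        yield (first, *perm)

def adjacent(seating, a, b):
    """True if a and b occupy neighbouring seats (the table wraps around)."""
    n = len(seating)
    i, j = seating.index(a), seating.index(b)
    return (i - j) % n in (1, n - 1)

people = ["P1", "P2", "P3", "P4", "P5"]  # hypothetical labels
seatings = list(circular_arrangements(people))

total = len(seatings)                                             # case (a)
n_together = sum(1 for s in seatings if adjacent(s, "P1", "P2"))  # case (b)
n_apart = total - n_together                                      # case (c)
print(total, n_together, n_apart)
```

Fixing one person in place mirrors the argument in (a): rotations of the same seating are not counted as different arrangements.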
Activity: Six different Mathematics books, three different English books, and
four different Literature books are arranged on a shelf. How many different
arrangements are possible if (a) The books on each particular subject must all
stand together (b) Only the Mathematics books should stand together
Solution
Because no one person should win more than one, the order of arrangement is
important.
∴ Number of ways = $^{20}P_{3} = \frac{20!}{(20-3)!} = 6{,}840$ ways
Solution
Solution
The given numbers are eight in number. Since 6’s are two and 7’s are three and the
remaining numbers do not repeat themselves, the number of ways =
$\frac{8!}{2!\,3!} = 3{,}360$ ways
Solution
(a) We consider the three numbers 6, 7, 8 as one number to give n=4. Number of
items
(6, 7, 8) to be put together, x = 3
Number of ways for 6, 7 and 8 to be to be together = 3! 4! = 6
4!
= 36 ways
(3 − 1)! 2!
(b) 3 and 5 are two numbers. We need to find the total possible arrangements and
also the number of arrangements when the two numbers are together. We then
subtract the latter from the former to get the required answer.
5.11 SELF ASSESSMENT QUESTIONS
1. On your route to work, there are two traffic lights. You are 20% likely to
be stopped at the first and 40% likely to be stopped at the second.
5. You may hear a statistic like “30% of all highway fatalities involve drunk
drivers.” From a statistical point of view, why is this the wrong statistic
upon which to base a MADD (Mothers Against Drunk Driving) lobbying
effort? What probability involving the same events would be relevant?
Hint: Compare to the statistic, “Over 50% of all highway fatalities involve
male drivers.”
7. We write the number “1” on the head of a coin and the number “–1” on the
tail. We then flip the coin. Let N = the number appearing on the top of the
coin, and B be the number on the bottom of the coin. Find E(N), E(B), E(N
+ B), and E(NB). Interpret each.
8. A standard six sided die is made so that the opposite faces always add to
seven. Hence, the “1” face is always opposite the “6” face, and so on. Let
T = the number that appears on the top face of a die that we roll, and B =
the number appearing on the bottom face.
(a) One particular boy and one particular girl are to be included.
(b) Three particular boys are to be included and two particular
girls are to be excluded.
13. In how many ways can 3 prizes be awarded to a class of 10 boys, one for
English, one for Mathematics and one for French, if (a) no boy should win
more than one prize, (b) there is no condition?
14. In how many ways can 10 story books be arranged on a straight shelf?
16. There are 5 different mathematics books and 3 different English books on a
shelf. Find the number of ways the arrangement can be made if
(a) the books on each particular subject must stand together, (b) the books
may stand in any order.
17. In how many ways can 10 girls be seated on a bench if only 5 seats are
available?
SUGGESTED READINGS
Bluman, A.G. (2004). Elementary Statistics. A Step by Step Approach. 5th Edition.
McGraw-Hill Companies Incorporated. London.
Chaudhary, S.M. & Kamal, S. (2017). Introduction to Statistical Theory Part-I. 8th
Edition. Ilmi Kitab Khana. Lahore.
Chaudhary, S.M. & Kamal, S. (2017). Introduction to Statistical Theory Part-II. 8th
Edition. Ilmi Kitab Khana. Lahore.
Daniel, W.W. (1995). Biostatistics: A foundation for Analysis in Health sciences.
Sixth Edition. John Wiley and sons Incorporated. USA.
Harper, W.M. (1991). Statistics. Sixth Edition. Pitman Publishing, Longman
Group, United Kingdom.
Hoel, P.G. (1976). Elementary Statistics. 4th Edition. John Wiley and Sons
Incorporated, New York.
Kiani, G. H., & Akhtar, M. S. (2012). Basic statistics, Majeed Book Depot.
Khan, A. A., Mirza, S. H., Ahmad, M. I., Baig, I. & Yaqoob, M. (2011). Business
Statistics, Qureshi Brothers Publishers.
UNIT 06
PROBABILITY
CONTENTS
Introduction
Objectives
6.1 Sets
6.2 Probability
6.3 Types of Probability
6.4 Random Experiment
6.5 Probability Distribution
6.6 Discrete and Continuous Distribution
6.7 Probability Tree Diagram
6.8 Law of Axiom of Probability
6.9 Self Assessment Questions
Suggested Readings
Introduction
Probability formulas and techniques were developed by Jacob Bernoulli (1654–1705),
the Reverend Thomas Bayes (1702–1761), Abraham de Moivre (1667–1754) and
Joseph Lagrange (1736–1813). In the nineteenth century, Pierre-Simon, Marquis
de Laplace, gathered these early ideas and compiled the first general theory of
probability. Probability theory is a part of our daily life. In many personal and
managerial decisions we face uncertainty and ultimately rely on probability theory,
as in weather forecasting, sales forecasting and so on.
Objectives
After studying this unit, you will be able to:
• Define experiment, outcome, event, probability and equally likely.
• Restate the formula for finding the probability of an event.
• Determine the outcomes and probabilities for experiments.
• Interact with die rolls and spinners to help predict the outcome of
experiments.
• Distinguish between an event and an outcome for an experiment.
• Recognize the difference between outcomes that are equally likely
and not equally likely to occur.
• Apply probability concepts.
6.1 Sets
The concept of sets is very useful in statistics because it is one of the basics of
understanding the principle of probability, which is a vital topic in statistics and
is covered later in this unit.
A set is a well-defined collection of objects. Any group of objects of the same kind
can be considered as a set. The objects in a set are called elements or members of
the set and may be anything whatsoever. We may have a set of goats, a set of cars, a
set of tables, or even a set of sets, sometimes called a class of sets. A set is usually
denoted by a capital letter and an element is represented by a small letter. Thus, if
a is an element of set A, then we write:
a ∈ A.
If a is not an element of A, we write:
a ∉ A.
A set is specified by the content of two braces or curly brackets: { }. There are two
methods for specifying the content of a set. These are:
(i) The Tabular Method, in which the elements are enumerated explicitly.
For example, the set of all even numbers between 1 and 10 will be: {2, 4, 6, 8}
(ii) The Rule Method, in which the content of a set is determined by some
rule, such as: {even numbers between 1 and 10}. The rule method is usually
more convenient to use when the set is large. For example, it would be tedious
to write out explicitly, using the tabular method, the set: {even numbers
between 1 and 10,000}.
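The two methods of specifying a set can be sketched in Python as follows. This is only an illustration; "between" is taken to exclude the end points, as in the example {2, 4, 6, 8} above, and the variable names are our own.

```python
# Tabular method: every element is listed explicitly.
tabular = {2, 4, 6, 8}

# Rule method: membership is decided by a rule
# (here, "even and strictly between 1 and 10").
rule = {n for n in range(2, 10) if n % 2 == 0}

# The same rule scales to a set that would be tedious to list by hand.
large = {n for n in range(2, 10_000) if n % 2 == 0}

print(tabular == rule)   # both methods describe the same set
print(len(large))        # number of even numbers strictly between 1 and 10,000
```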
The empty set is represented with the symbol ∅ or { }. It is often called a null set.
A set is said to be a finite set if it is either empty or has elements which can be
counted, with the counting process starting and ending at certain stages; that is, the
set has a finite or definite number of elements.
On the other hand, an infinite set is one whose elements are not finite. An infinite
set having countable elements is known as a countably infinite set. For example, a
set of all integers or {integers}.
Subsets
The set A is said to be a subset of another set B if all the members of A are also
members of B [i.e. A ⊂ B or B ⊃ A]. In other words, A is said to be contained
in B. If at least one element exists in set B which is not in set A, we say A is a
proper subset of B.
For example, if A = {1, 3, 5} and B = {1, 2, 3, 4, 5, 6}, then since 1, 3 and 5 are all
contained in the set B, we can say A is a subset of B [i.e. A ⊂ B]. The null set is a
subset of all other sets.
If two sets, A and B, have no common set or elements at all, they are called disjoint
or mutually exclusive sets. For example, if A = {1, 3, 5, 7} and B = {2, 4, 6, 8, 10},
then A and B are disjoint sets.
The universal set, also known as the entity set, is the largest possible set containing
all the members in any experiment. In other words, it contains all the possible
subsets. For example, the sets {natural numbers} and {integers} can be considered
as universal sets.
Various operations are carried out in sets. These operations are explained below.
Complement sets
If A is a set, then the complement of A, written as A′, is the set containing all the
other elements in the universal set which are not found in the set A. For example,
if U = {1, 2, 3, ..., 10} and A = {1, 2, 3, 5, 7}, then A′ is given by: A′ = {4, 6, 8,
9, 10}
The intersection of Sets (∩)
The intersection of two sets A and B, written A ∩ B, is the set containing the
common elements of A and B, that is, the elements which are found in both A
and B. For example, if A = {1, 2, 3, 5, 7} and B = {2, 5, 7, 8, 9}, then A ∩ B =
{2, 5, 7}.
The union of Sets (∪)
The union of sets is the set whose elements include the elements of all sets under
consideration. Thus, if A = {1, 2, 3}, B = {2, 3, 4, 7} and C = {6, 8, 9}, then:
A ∪ B = {1, 2, 3, 4, 7}; B ∪ C = {2, 3, 4, 6, 7, 8, 9};
A ∪ C = {1, 2, 3, 6, 8, 9} and A ∪ B ∪ C = {1, 2, 3, 4, 6, 7, 8, 9}.
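The set operations above map directly onto Python's built-in set type. The sketch below reuses the sets from the complement and intersection examples; the variable names are our own.

```python
U = set(range(1, 11))          # universal set {1, 2, ..., 10}
A = {1, 2, 3, 5, 7}
B = {2, 5, 7, 8, 9}

complement_A = U - A           # elements of U not in A
intersection = A & B           # elements common to A and B
union = A | B                  # elements in A, in B, or in both

print(complement_A, intersection, union)
print({1, 3, 5} <= {1, 2, 3, 4, 5, 6})            # subset test
print({1, 3, 5, 7}.isdisjoint({2, 4, 6, 8, 10}))  # disjoint (mutually exclusive) test
```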
Venn Diagram
Two-Set Problems
For any two intersecting sets A and B, the number of members in their union is
given by:
n(A ∪ B) = n(A) + n(B) – n(A ∩ B)
where n(A) means the number of members in Set A and n(B) means the number of
members in Set B.
It must be noted that n(A ∩ B) is subtracted from the sum of members in A and B,
because the intersection region would otherwise be counted twice.
On the other hand, if the two sets are disjoint or mutually exclusive (i.e. do not
intersect), the relation reduces to: n(A ∪ B) = n(A) + n(B)
For example, assume that there are 100 students in a school who are going to take
Geography (G) and History (H) examinations. If it is found that 65 students are to
take Geography whilst 53 are to take History, the number taking both papers can
be found as follows:
Solve the Problem Given in Diagram: we can solve the problem using a Venn
diagram as shown in Figure 6.1, in which x represents the number of students
taking both Geography and History.
[Figure 6.1: Venn diagram of sets G and H within U, with regions 65 − x (Geography only), x (both) and 53 − x (History only)]
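The number taking both papers follows from the two-set relation. The short sketch below assumes that every one of the 100 students takes at least one of the two papers, so that the union has 100 members; the function name is our own.

```python
def both(n_a, n_b, n_union):
    """Size of the intersection, from n(A ∪ B) = n(A) + n(B) - n(A ∩ B)."""
    return n_a + n_b - n_union

x = both(65, 53, 100)
geography_only = 65 - x
history_only = 53 - x
print(x, geography_only, history_only)
```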
Solve the Venn Diagram Problem: we can solve the problem above using a
Venn diagram as shown in Figure 6.2.
[Figure 6.2: Venn diagram of sets F and H within U, with regions x − 20 (F only), 20 (both) and x − 10 (H only)]
x - 20 + 20+ x - 10 = 40 . 2x = 70 x = 35.
Finally, let us assume that in a sports contingent, there are 40 players in the football
team and 36 players in the volleyball team. Eight players play both football and
volleyball. Let us find:
(b) Number who play only football or only volleyball is (40 – 8) + (36 – 8) = 60
Three-Set Problems
Let A, B and C be any three intersecting sets. The following relation gives the
number of members belonging to at least one of the three sets:
n(A ∪ B ∪ C) = n(A) + n(B) + n(C) – n(A ∩ B) – n(A ∩ C) – n(B ∩ C) + n(A ∩ B ∩ C)
It must be noted that when the members of the three sets are added, the members
in the intersection of any two sets are counted twice, and therefore each pairwise
intersection has to be subtracted once. When these subtractions are done,
n(A ∩ B ∩ C), which was counted three times in the sum, is subtracted three times
and hence has to be added back once.
In a group of 300 traders, 210 sell Wheat, 195 sell Maize and 180 sell Rice. Ninety
sell both Wheat and Maize, 100 sell both Wheat and Rice, and 115 sell both Rice
and Maize. If each trader sells at least one of the three items, the number of traders
who sell all three items can be derived as follows:
Let U = {Traders}; G = {Wheat sellers}; R = {Rice sellers}; M = {Maize sellers}
and let x be the number of traders selling all three items.
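Rearranging the three-set relation for x gives the number of traders who sell all three items. The sketch below is one way to carry out the arithmetic; the function name and its arguments are our own and do not come from the text.

```python
def all_three(total, singles, pairs):
    """Solve n(A∪B∪C) = sum(singles) - sum(pairs) + x for x = n(A∩B∩C)."""
    return total - sum(singles) + sum(pairs)

# 300 traders; Wheat 210, Maize 195, Rice 180;
# Wheat and Maize 90, Wheat and Rice 100, Rice and Maize 115
x = all_three(total=300, singles=[210, 195, 180], pairs=[90, 100, 115])
print(x)
```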
EXAMPLES
Example: A company has a large number of typists. A survey shows that 30 can
use a word processor, 25 are audio-typists and 28 are short-hand writers. Of the
typists who are short-hand writers, 3 are audio-typists and can use a word processor,
5 are audio-typists but cannot use a word processor, and 6 can use a word processor
but are not audio-typists. Eight can use a word processor and are audio-typists but
are not short-hand writers.
(a) Present this information on a Venn diagram.
(b) How many typists were involved in the survey?
(c) How many typists have only one skill?
Solution
Let P = {Word processor typists}; A = {Audio-typists}; S = {Short-hand typists}
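Using the Venn diagram below we can solve the problem as follows:
[Venn diagram of sets P, A and S with region counts: 13 (P only), 8 (P and A only), 9 (A only), 6 (P and S only), 3 (all three), 5 (A and S only), 14 (S only)]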
Adding all members in the various regions and solving for x gives the value of x
as 55.
The number of students who passed
Example: In a survey of the 100 out-patients who reported at a hospital one day,
it was found that 70 complained of fever, 50 had stomach trouble and 30 were
injured. Each of the 100 out-patients had one or other of these complaints, and
44 had exactly two of them. How many patients had all three complaints?
Solution
Let U = {out-patients who reported that day}; F = {those who had fever};
S = {those who had stomach trouble}; J = {those who were injured};
x = number of those who had all three complaints.
Let appropriate letters represent the various regions in the Venn diagram:
[Venn diagram of sets F (70), S (50) and J (30): f, s and j are the regions with only one complaint; a = F and S only; b = F and J only; c = S and J only; x = all three]
f = 70 – (a + b + x); s = 50 – (a + c + x); j = 30 – (b + c + x)
Since the union of the three sets adds up to 100, we have:
n(F ∪ S ∪ J) = f + s + j + a + b + c + x
= [70 – (a + b + x)] + [50 – (a + c + x)] + [30 – (b + c + x)] + a + b + c + x
= 150 – 2(a + b + c) + (a + b + c) – 3x + x = 150 – (a + b + c) – 2x.
But a + b + c = 44 and therefore n(F ∪ S ∪ J) = 150 – 44 – 2x = 106 – 2x.
But n(F ∪ S ∪ J) = 100 and therefore
106 – 2x = 100 ⇒ –2x = –6 ⇒ x = 3. Hence, 3 people had all three
complaints.
Example: In an examination, each of the 1,000 students sat for Biology, Chemistry
and Physics. All the Students passed at least one subject, 600 passed Biology, 500
passed Chemistry, and 290 passed Physics, 175 passed both Biology and
Chemistry, 150 passed both Biology and Physics, and 120 passed both Chemistry
and Physics. How many students passed
(a) all the three subjects (b) exactly one subject (c) exactly two subjects
Solution
Let U = {All students}; B = {Students who passed Biology}; C = {Students who
passed Chemistry};
P = {Students who passed Physics} and x = number who passed all three
subjects.
Using the formula:
n(B ∪ C ∪ P) = n(B) + n(C) + n(P) – n(B ∩ C) – n(B ∩ P) – n(C ∩ P) + n(B ∩ C ∩ P)
1,000 = 600 + 500 + 290 – 175 – 150 – 120 + x ⇒ x = 55.
[Venn diagram of sets B, C and P within U; the legible region labels are 150 – x, x, 120 – x and 20 + x]
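Parts (b) and (c) follow once x is known. The sketch below works them out from the figures given in the question; it relies on the stated condition that every student passed at least one subject, and the variable names are our own.

```python
n_B, n_C, n_P = 600, 500, 290
n_BC, n_BP, n_CP = 175, 150, 120
total = 1000

# (a) all three subjects, from the three-set formula
x = total - (n_B + n_C + n_P) + (n_BC + n_BP + n_CP)

# (c) exactly two: each pairwise count includes the x students who passed all three
exactly_two = (n_BC - x) + (n_BP - x) + (n_CP - x)

# (b) exactly one: everyone passed at least one subject
exactly_one = total - exactly_two - x

print(x, exactly_one, exactly_two)
```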
6.2 Probability
Basic to the theory of probability is the idea of a physical experiment. An
experiment is any action that has a number of possible outcomes (or events). For
example, the casting of a die once is an experiment with six possible outcomes,
which are 1, 2, 3, 4, 5 or 6, while the tossing of a coin is an experiment with two
outcomes, head or tail. The experiments of interest here, however, are those whose
outcomes are governed by chance. A single performance of an experiment is called
a trial, for which there is a given set of outcomes.
Definition of Probability
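If all the outcomes of an experiment are equally likely, the probability of an event
A is the number of outcomes favourable to A divided by the total number of
possible outcomes: P(A) = n(A)/n(S).
Thus, for example, in a toss of a fair die, the probability that 6 appears is 1/6.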
Trial, Outcome, Event and Sample space
A trial is any process which when repeated generates a set of results or observations.
An outcome is the result of carrying out a trial. Thus, selecting a student at
random from a class is a trial, while selecting a particular student, say Grace, is an
outcome.
An event is a set which consists of one or more of the possible outcomes of a trial.
A sample space is the set of all possible outcomes in any experiment. It is normally
denoted by the letter S or the symbol Ω. Hence, the sample space is the universal
set for any given experiment while an event is just a subset. All the outcomes in
the sample space are mutually exclusive which, as explained in Section 6.1, means
the occurrence of one of the outcomes rules out all the others. For example, one
cannot have both a head (H) and a tail (T) in a single toss of a fair coin. The
probability of a sample space is equal to 1. Thus, P(S) = 1 or P(Ω) = 1.
Since an event is a set, all our earlier definitions and operations applicable to sets
are also applicable to events. For example, if two events have no common
outcomes they are said to be mutually exclusive, as explained in Section 6.1.
The probability of any event A lies between zero and one. That is: 0 ≤ P(A) ≤ 1.
We can summarize therefore that, any trial has a number of possible outcomes, and
the set of all possible outcomes is called the sample space. An event is defined to
be a subset of sample space.
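The relationship between sample space, event and probability can be sketched for one roll of a fair die. The helper below is our own illustration and assumes that all outcomes are equally likely.

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}   # S for one roll of a fair die

def probability(event, space=sample_space):
    """P(A) = n(A) / n(S), valid only when all outcomes are equally likely."""
    assert event <= space, "an event must be a subset of the sample space"
    return Fraction(len(event), len(space))

six = {6}
even = {2, 4, 6}
print(probability(six), probability(even), probability(sample_space))
```

Using exact fractions rather than decimals keeps results such as 1/6 in the same form as in the text.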
Probability space
A probability space corresponds to a given experiment and comprises three items.
An experiment is a course of action whose consequences are not predetermined. The
three items of the probability space are:
(a) The set of all possible outcomes of the experiment which is usually called
sample space.
(b) A list of all events which may possibly occur as a consequence of the
experiment.
(c) An assessment of the likelihood of these events.
Prior Probability
This is the probability which is concerned with estimating the likelihood that an
event will occur. These probabilities are calculated prior to observing the results
of an experiment. It is the type of probability which can be specified by common
logic. An example is the throwing of a fair die or a coin. This is an exact probability
based on an objective approach.
Posterior Probability
The probability calculated after the outcome of an experiment has been observed,
and which cannot be determined by common logic alone, is called posterior
probability. For example, if we want to find the probability that a worker is
punctual to work on a given day, we will need to observe the attendance of workers
for, say, one month and find the average number of workers who were punctual in a
day. The result divided by the total number of workers is a posterior probability.
Empirical Probability
Subjective Probability
Equally Likely Events
Any set of events in the sample space which has all its members having equal
chance of being drawn are said to be equally likely events. An example of such
events is the outcomes from throwing a fair die. The event of getting 1, 2, 3, 4, 5,
or 6 has a probability of 1/6 for each score.
Unequally Likely Events
A set of events in the sample space whose members do not have equal chance of
being drawn are said to be unequally likely. An example is throwing an unfair die.
The chances of some faces showing up will be more probable than other faces.
Discrete and Continuous Variable
A variable can either be discrete or continuous. A variable is discrete if it assumes
values which are usually whole numbers like 1, 2, 3, ---. A variable is usually
represented by a letter or a symbol. Thus, if x represents the marks scored by 6
students in a class given as 18, 19, 20, 21, 19, and 22, then x is termed as a discrete
variable because it assumes values which indicate disjoint points of whole numbers.
A continuous variable on the other hand, represents all measurements of intervals
of points. A decimal or fractional value can be obtained for a continuous variable.
The lifetime of a light bulb can be a continuous variable. Weight of students can
also represent a continuous variable. It is therefore not restricted to whole numbers.
6.6 Discrete and Continuous Probability
As with the sample space, events may be either discrete or continuous. The
probability of any finite number, or an infinite sequence, of points is said to be a
discrete probability. An example is the probability of an outcome of tossing a fair
coin or die. On the other hand, a continuous probability is the probability of a set
of one or more intervals of points. An example is the probability that the age of a
child lies between 8 and 10 years.
A weather forecast on radio may state the chance of rain as 10% tomorrow, but for
another day it may be 90%. One can thus decide whether or not to carry a
raincoat or an umbrella.
We also apply probability at workplaces during planning and budgeting. Decisions
on how much to produce, what to produce, and when to produce rely heavily on
probability.
An insurance company will have to estimate how long a person is likely to live
before accepting his life assurance policy for processing. This is done by
considering the probability of how long the person will live. A vehicle is usually
granted a comprehensive insurance policy after carefully examining its age and
roadworthiness certificate to determine how likely it is to remain in service and for
what period. All these are determined with the help of probability.
Two-Set Problems
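Problems on probability involving two sets are explained in the Figure below. The
sets A and B are presented as follows:
[Figure: Venn diagram of two intersecting events A and B, with regions A ∩ B′, A ∩ B and A′ ∩ B]
A and B are not mutually exclusive. Hence, to find P(A ∪ B) from the values given
in the Venn diagram, the problem can be solved as follows:
[Figure: Venn diagram of events A and B with region counts 12 (A only), 10 (A and B) and 20 (B only)]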
For any set of events to be mutually exclusive, it must satisfy the following
conditions.
ii. The probability of the union of the events is the sum of the probabilities of the
individual events, e.g. P(A ∪ B ∪ C) = P(A) + P(B) + P(C)
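In the Figure below, events A and B are mutually exclusive. Therefore, the
probability of the union event, P(A ∪ B), is calculated as follows.
[Figure A: Venn diagram of three intersecting events A, B and C with region counts 11 (A only), 3 (A and B only), 19 (B only), 4 (A, B and C), 2 (A and C only), 6 (B and C only) and 5 (C only). Figure B: three mutually exclusive events A, B and C with counts 20, 32 and 18 respectively.]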
From Figure A above, the probability of the union of the three events is calculated
as follows:
$P(A \cup B \cup C) = \frac{20}{80} + \frac{32}{80} + \frac{17}{80} - \frac{3+4}{80} - \frac{2+4}{80} - \frac{4+6}{80} + \frac{4}{80} = \frac{73-23}{80} = \frac{5}{8}$
From Figure 8.5B, since A, B and C are mutually exclusive; the probability of the
union of the three events is calculated as follows:
Thus, the complement of an event is the set of outcomes in the sample space which
are not members of outcomes of the given event.
For example, if the probability that Ben can win a game is 0.8, then the probability
that Ben cannot win the game is: 1 – 0.8 = 0.2.
EXAMPLES
Example. Two boys, A1 and A2, play a game of chance. The probabilities of A1
and A2 winning the game are 3/5 and 5/6 respectively. Find the probability that
(a) Both of them win the game
(b) Only A1 wins the game
(c) Only one wins the game
Solution
Solution
Let K = event of Ali solving the problem; K′ = complement of K
A = event of Amna solving the problem; A′ = complement of A
F = event of Farid solving the problem; F′ = complement of F
(a) P(K and A and F) = P(K) × P(A) × P(F) = 0.7 × 0.4 × 0.8 = 0.224
(b) P(K′ and A and F′) = P(K′) × P(A) × P(F′) = 0.3 × 0.4 × 0.2 = 0.024
(c) P(K and A′ and F) = P(K) × P(A′) × P(F) = 0.7 × 0.6 × 0.8 = 0.336
(d) P(K′ and A′ and F′) = P(K′) × P(A′) × P(F′) = 0.3 × 0.6 × 0.2 = 0.036
(e) P(at least one can solve) = 1 – P(none can solve) = 1 – 0.036 = 0.964
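The pattern in parts (a) to (e) is the same each time: multiply P(event) for those who solve the problem and P(complement) for those who do not. A minimal sketch follows, assuming (as the solution does) that the three attempts are independent; the probabilities 0.7, 0.4 and 0.8 are read off the solution and the function name is our own.

```python
from math import prod, isclose

p = {"Ali": 0.7, "Amna": 0.4, "Farid": 0.8}   # P(each person solves the problem)

def joint(solvers):
    """P(exactly the people in `solvers` solve it), assuming independence."""
    return prod(p[name] if name in solvers else 1 - p[name] for name in p)

all_solve = joint({"Ali", "Amna", "Farid"})   # part (a)
only_amna = joint({"Amna"})                   # part (b)
none = joint(set())                           # part (d)
at_least_one = 1 - none                       # part (e)
print(round(all_solve, 3), round(only_amna, 3), round(none, 3), round(at_least_one, 3))
```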
Example. Three statistically independent events X, Y and Z are such that P(X) =
0.85; P(Y) = 0.72; P(Z) = 0.60. Find the probability of:
(a) X and Y occurring together (b) X and Z occurring together
(c) X, Y, and Z occurring together (d) None of them occurring
Solution
(a) P(X and Y) = P(X) × P(Y) = 0.85 × 0.72 = 0.612, (b) P(X and Z) = P(X) × P(Z) =
0.85 × 0.60 = 0.51
(c) P(X and Y and Z) = P(X) × P(Y) × P(Z) = 0.85 × 0.72 × 0.60 = 0.3672
(d) P(X′ and Y′ and Z′) = P(X′) × P(Y′) × P(Z′) = 0.15 × 0.28 × 0.40 = 0.0168,
where X′, Y′ and Z′ denote the complements of X, Y and Z.
Relative Frequency Interpretation of Probability
Consider the frequency distribution table below, from which the Relative
Frequency Table can be constructed:
Let X denote a random variable showing the age of boys from 2 years to 6 years.
With the frequency table above, the probability distribution will be deduced as
follows:
Table: Probability Distribution Table
X           2      3      4      5      6      Total
Frequency   3      4      8      3      2      20
P(X)        3/20   4/20   8/20   3/20   2/20   20/20 = 1
From the foregoing therefore, the probability distribution of a random variable X is
the list of the relative frequencies of the variable X.
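The probability distribution table can be rebuilt from the frequencies alone. The sketch below divides each frequency by the total; the variable names are our own.

```python
from fractions import Fraction

frequency = {2: 3, 3: 4, 4: 8, 5: 3, 6: 2}     # age X -> number of boys
n = sum(frequency.values())

# Relative frequency of each value of X, used as its probability
distribution = {x: Fraction(f, n) for x, f in frequency.items()}

for x, p in distribution.items():
    print(x, f"{frequency[x]}/{n}", float(p))
print(sum(distribution.values()))   # the probabilities add up to 1
```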
6.7 Probability Tree Diagram
The theory of probability can be illustrated with the probability tree diagram. For
example, if a fair coin is tossed once, the sample space is S = {H, T}. It therefore
consists of two possible outcomes. This can be represented in a Tree Diagram as
shown in the Figure.
[Figure: probability tree for one toss of a fair coin. Outcomes and probabilities: H, P(H) = ½; T, P(T) = ½]
[Figure: probability tree for two tosses; the legible outcomes include TH and TT, with P(no head) = ¼]
If the coin is tossed thrice, the sample space is S = {HHH, HHT, HTH, THH, HTT,
THT, TTH, TTT}. Thus, eight possible outcomes are to be realized. The
probability tree diagram is given as follows:
[Figure: probability tree for three tosses, listing each outcome and its probability]
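Each path through a tree diagram is one outcome, and its probability is the product of the probabilities along its branches. The sketch below lists the branches for any number of tosses; the function and its `p_head` parameter are our own illustration, with a fair coin assumed by default.

```python
from fractions import Fraction
from itertools import product

def tree(n_tosses, p_head=Fraction(1, 2)):
    """Map each branch of the tree (e.g. 'HTH') to its probability."""
    branches = {}
    for path in product("HT", repeat=n_tosses):
        prob = Fraction(1)
        for face in path:
            prob *= p_head if face == "H" else 1 - p_head
        branches["".join(path)] = prob
    return branches

three = tree(3)
p_no_head_two = tree(2)["TT"]             # no head in two tosses
p_at_least_one_head = 1 - three["TTT"]    # at least one head in three tosses
print(len(three), p_no_head_two, p_at_least_one_head)
```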
The addition law is applied to the calculation of the probability of two or more
mutually exclusive events. Under this law, all individual probabilities are added
together. The word ‘or’ and the union sign, ‘∪’, are concerned with addition of
probabilities.
Let A1, A2, A3, ..., An be events in the sample space which are mutually exclusive.
Then:
P(A1 or A2 or A3 or ... or An) = P(A1 ∪ A2 ∪ A3 ∪ ... ∪ An) = P(A1) +
P(A2) + P(A3) + ... + P(An)
$P\left(\bigcup_{i=1}^{n} A_i\right) = \sum_{i=1}^{n} P(A_i)$
This means the events A1, A2, A3, ..., An are disjoint and therefore the probability
of their union is the sum of the individual probabilities.
For example, to find the probability of scoring a ‘6’ with a fair die or a ‘Head’ with
a fair coin after tossing the die and the coin once, we proceed as follows:
For example, to find the probability of scoring a ‘6’ with a throw of a die and a
‘Head’ with a throw of a coin, we proceed as follows:
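The worked results for these two examples are not reproduced here, so the sketch below simply enumerates the joint sample space of one die and one coin. It assumes both are fair and independent. Note that a ‘6’ and a ‘Head’ can occur together, so the ‘or’ case is computed by counting outcomes rather than by adding the two probabilities directly.

```python
from fractions import Fraction
from itertools import product

space = list(product(range(1, 7), "HT"))     # 12 equally likely (die, coin) pairs

def prob(event):
    """Probability of an event given as a true/false test on an outcome."""
    return Fraction(sum(1 for o in space if event(o)), len(space))

p_six = prob(lambda o: o[0] == 6)
p_head = prob(lambda o: o[1] == "H")
p_both = prob(lambda o: o[0] == 6 and o[1] == "H")     # '6' and 'Head'
p_either = prob(lambda o: o[0] == 6 or o[1] == "H")    # '6' or 'Head'
print(p_both, p_either)
```

The counts agree with the product P(6) × P(Head) for ‘and’, and with P(6) + P(Head) – P(6 and Head) for ‘or’.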
Selection with Replacement and Selection without Replacement
Selection with replacement is the selection procedure which requires that an item(s)
selected is/are replaced before subsequent selections. This type of selection
procedure corresponds to independent events. In this case, because an item is put
back into the system before subsequent selection, the probability of any selection
of a particular event and the subsequent ones of the same event, will not change.
As an example, let us find the probability of selecting two red balls from a bag
containing 5 red, 6 blue and 7 green identical balls at random, one after the other,
with replacement.
If R, B and G are the events of selecting red, blue and green balls respectively, then
since the total number of balls is 18 and n(R) = 5, n(B) = 6 and n(G) = 7:
P(R) = 5/18; P(B) = 6/18 and P(G) = 7/18
Hence, the required probability will be calculated as follows:
P(1st is red and 2nd is red) = P(R1 and R2) = P(R1 ∩ R2) = P(R1) × P(R2) = 5/18 × 5/18 = 25/324
Example: let us assume that a bag contains 8 white, 5 brown and 7 green marbles.
Three of them are selected at random with replacement. Let us find the probability
that :
(a) They are all white, (b) They are of the same colour and (c) The first two are
brown and the third green.
Solution:
The problem is solved as follows:
Let W = event of selecting a white marble, B = event of selecting a brown marble,
G = event of selecting a green marble
(a) P(W1 ∩ W2 ∩ W3) = P(W1) × P(W2) × P(W3) = 8/20 × 8/20 × 8/20 = 8/125
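All three parts of this example can be worked out in the same way, since selection with replacement makes the three draws independent. The sketch below is one way to carry out the arithmetic with exact fractions; the variable names are our own.

```python
from fractions import Fraction

counts = {"white": 8, "brown": 5, "green": 7}
total = sum(counts.values())
p = {colour: Fraction(n, total) for colour, n in counts.items()}

# With replacement the draws are independent, so probabilities multiply.
all_white = p["white"] ** 3                                # part (a)
same_colour = sum(q ** 3 for q in p.values())              # part (b)
brown_brown_green = p["brown"] * p["brown"] * p["green"]   # part (c)
print(all_white, same_colour, brown_brown_green)
```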
On the other hand, selection without replacement is the selection procedure in
which every item selected is not replaced before subsequent selections. This type
of selection corresponds to dependent events. For example, let us consider the
previous illustration where this time, the two red balls are selected at random, one
after the other without replacement. When the first red ball is selected, the number
of red balls in the bag will reduce by one and likewise, the total number of balls in
the bag will reduce by one. The required probability will then be given by:
P(1st is red and 2nd is red) = P(R1 and R2) = P(R1) × P(R2/R1) = 5/18 × 4/17 = 10/153
Example: Consider a box containing 7 blue and 5 green marbles which are identical
except for colour. Two marbles are selected at random, one after the other, without
replacement. Let us find the probability that: (a) They are of the same colour,
(b) Each colour is selected.
Solution:
These can be calculated as follows:
n(B) = 7; n(G) = 5. The total number of marbles = 7 + 5 = 12.
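Selection without replacement can be sketched as follows. After the first draw both the count of the drawn colour and the total drop by one, which is what makes the second probability conditional. The helper `draw_two` is our own illustration; it is applied first to the earlier bag of 18 balls and then to the 12 marbles of this example.

```python
from fractions import Fraction

def draw_two(first, second, counts):
    """P(first colour, then second colour) when drawing without replacement."""
    total = sum(counts.values())
    p_first = Fraction(counts[first], total)
    remaining = counts[second] - (1 if first == second else 0)
    return p_first * Fraction(remaining, total - 1)

balls = {"red": 5, "blue": 6, "green": 7}
two_red = draw_two("red", "red", balls)

marbles = {"blue": 7, "green": 5}
same_colour = draw_two("blue", "blue", marbles) + draw_two("green", "green", marbles)
each_colour = draw_two("blue", "green", marbles) + draw_two("green", "blue", marbles)
print(two_red, same_colour, each_colour)
```

The two marble results add up to 1, as they should, because two marbles are either of the same colour or of different colours.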
Let A1 and A2 be any two events which have nonzero probabilities of occurrence;
that is, P(A1) ≠ 0 and P(A2) ≠ 0. The two events, A1 and A2, are said to be
statistically independent if the probability of occurrence of one event is not affected
by the occurrence of the other event. Thus,
P(A1/A2) = P(A1) and P(A2/A1) = P(A2)
As we shall see later from conditional probability, the two events above have a
joint probability equal to the product of the probabilities of the events, given by:
P(A1 ∩ A2) = P(A1) × P(A2)
It has already been stated earlier in this chapter that the joint probability of two
mutually exclusive events is zero. That is, P(A1 ∩ A2) = 0. Thus, if two events
have nonzero probabilities, they cannot be both mutually exclusive and statistically
independent. Therefore, for any two events with nonzero probabilities to be
independent, they must have a non-empty intersection. That is:
A1 ∩ A2 ≠ ∅
Another example can be given about a 52-card deck in which A is the event of
selecting a king; B the event of selecting a jack or queen; and C the event of
selecting a heart. The corresponding probabilities of the three events are:
P(A) = 4/52; P(B) = 8/52; P(C) = 13/52
The following joint probabilities can be computed from the above information.
P(A ∩ B) = 0, since it is not possible to select a king and a jack or queen at
the same time.
Since the other pairs are independent:
P(A ∩ C) = P(A) × P(C) = 4/52 × 13/52 = 1/52, P(B ∩ C) = P(B) × P(C) = 8/52 × 13/52 = 2/52
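These joint probabilities can be checked by counting cards in a standard deck. The sketch below builds the 52 cards and compares each joint probability with the product of the individual probabilities; the rank and suit labels are our own.

```python
from fractions import Fraction
from itertools import product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["hearts", "diamonds", "clubs", "spades"]
deck = list(product(ranks, suits))            # 52 equally likely cards

def prob(event):
    """Probability of an event given as a true/false test on a card."""
    return Fraction(sum(1 for card in deck if event(card)), len(deck))

A = lambda c: c[0] == "K"                     # king
B = lambda c: c[0] in ("J", "Q")              # jack or queen
C = lambda c: c[1] == "hearts"                # heart

p_AB = prob(lambda c: A(c) and B(c))
p_AC = prob(lambda c: A(c) and C(c))
p_BC = prob(lambda c: B(c) and C(c))
print(p_AB, p_AC == prob(A) * prob(C), p_BC == prob(B) * prob(C))
```

A and B are mutually exclusive (their joint probability is zero), whereas A and C, and B and C, satisfy the product rule for independent events.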
Multiple Events
The events A1, A2, A3, ..., An are said to be independent if and only if they are
independent in pairs and also jointly independent in every larger group, up to all n
events. Thus, for three given events A1, A2 and A3 which are independent, the
following conditions must be satisfied.
P(A1 ∩ A2) = P(A1) × P(A2), P(A1 ∩ A3) = P(A1) × P(A3)
P(A2 ∩ A3) = P(A2) × P(A3), P(A1 ∩ A2 ∩ A3) = P(A1) × P(A2) × P(A3)
More generally, for n statistically independent events, it is required that all the
conditions below be satisfied for all 1 ≤ i < j < ... ≤ n:
P(Ai ∩ Aj) = P(Ai) × P(Aj)
P(A1 ∩ A2 ∩ A3 ∩ ... ∩ An) = P(A1) × P(A2) × P(A3) × ... × P(An).
As an example, let us consider three boys A, B, and C who play a game of chance.
The probabilities that A, B, and C win the game are 0.5, 0.7, and 0.9
respectively.
The probability that all three boys win the game can be calculated as follows:
P(A) = 0.5; P(B) = 0.7; P(C) = 0.9 and, since the three events are independent,
P(A and B and C) = P(A ∩ B ∩ C) = P(A) × P(B) × P(C) = 0.5 × 0.7 × 0.9 = 0.315
Events are said to be non-independent (dependent) if the occurrence of a given
event is affected by the occurrence of the previous event or events of the same
sample space. Thus, problems on joint probabilities of such events are just like the
problems under selection without replacement.
The probability P(A ∩ B) is called the joint probability for two events A and B,
which represents the intersection of the events in the sample space. As we saw from
equation 8.6.01 above,
P(A ∪ B) = P(A) + P(B) – P(A ∩ B)
which is equivalent to
P(A ∩ B) = P(A) + P(B) – P(A ∪ B)
Thus, it should be noted that the probability of the union of two events can never
exceed the sum of the probabilities of the individual events. The equality holds only
for mutually exclusive events since, in this case, A ∩ B = ∅ and therefore
P(A ∩ B) = P(∅) = 0
On the other hand, given some event B with nonzero probability, P(B) > 0, we
define the conditional probability of an event A, given that B has occurred, by
P(A/B) = P(A ∩ B) / P(B)
Let P(A) of an event A be any probability defined on a sample space S. P(A) can
be expressed in terms of conditional probabilities on the sample space S which has
been partitioned into n mutually exclusive events Di, i = 1, 2, 3, ---,n; whose union
equals S.
The intersection of any pair or any group of the partitioned events is an empty set.
That is:
n
Bi Bj = ; i j=1, 2, 3, ---, n , and; Bi = S
i =1
n n
Since A S = A, it follows that; A S = A ( Bi) = (A Bi)
i =1 i =1
Since the events, A Bi; I = 1, 2, 3, ---, n are mutually exclusive, as seen from the
axiom above, it follows:
n
P( A Bi)
n
P(A) = P(A S) = P[ (A Bi)] =
i =1
i =1
But from above, we can write: P(A B1)= P(A/B1) P(B1); P(A B2) =
P(A/B2) P(B2);
P(A B3) = P(A/B3) P(B3); -------------; P(A Bn) = P(A/Bn) P(Bn)
n n
Thus, we can write: P( A Bi) = P( A / Bi ) P( Bi)
i =1 i =1
From above equation it is known as the total probability of event A.
Bayes Theorem
The definition of conditional probability, as given by 8.24.02 and 8.24.04, applies
to any two events in the sample space. Thus, if Bi is any one of the events defined
in 8.24.05, we can write:
P(Bi/A) = P(A ∩ Bi) / P(A), so that P(A) P(Bi/A) = P(A ∩ Bi); P(A) ≠ 0
Alternatively,
P(Bi/A) = P(A/Bi) P(Bi) / P(A)
But from the total probability equation,
P(A) = Σ P(A/Bj) P(Bj), summed over j = 1, ..., n.
Thus, the probability that any partitioned event Bi occurs, given that A has
occurred, is
P(Bi/A) = P(A/Bi) P(Bi) / P(A)
= P(A/Bi) P(Bi) / [P(A/B1) P(B1) + P(A/B2) P(B2) + ... + P(A/Bn) P(Bn)]
= P(A/Bi) P(Bi) / Σ P(A/Bj) P(Bj)
Thus, in general, if A1, A2, A3, ..., An are n mutually exclusive events whose union
is the sample space, and W is any other event that can occur together with any of
them, then by Bayes’ theorem:
P(Ai/W) = P(W/Ai) P(Ai) / [P(W/A1) P(A1) + P(W/A2) P(A2) + ... + P(W/An) P(An)]
= P(W/Ai) P(Ai) / Σ P(W/Aj) P(Aj)
One box contains two red balls and a second box of identical appearance contains
one red and one white ball. If a box is selected at random and one ball is drawn
from it, let us find the probability that the first box was the one selected, given
that the drawn ball is red.
To solve such a problem:
Let B1 = event of selecting the first box, B2 = event of selecting the second box,
R = event of selecting a red ball
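The two-box problem can be worked through with a short Python sketch of Bayes' theorem. It assumes that each box is equally likely to be chosen, as the phrase "selected at random" suggests; the function name bayes_posterior is an illustrative choice and does not come from the text.

```python
from fractions import Fraction

def bayes_posterior(priors, likelihoods):
    """Return the list of P(Bi/A) and the total probability P(A),
    given priors P(Bi) and likelihoods P(A/Bi)."""
    joint = [p * l for p, l in zip(priors, likelihoods)]   # P(A ∩ Bi)
    total = sum(joint)                                     # total probability P(A)
    return [j / total for j in joint], total

# Box 1 holds 2 red balls; box 2 holds 1 red and 1 white ball.
priors = [Fraction(1, 2), Fraction(1, 2)]        # P(B1), P(B2) (assumed equal)
likelihoods = [Fraction(1), Fraction(1, 2)]      # P(R/B1), P(R/B2)

posterior, p_red = bayes_posterior(priors, likelihoods)
print(p_red)         # P(R)
print(posterior[0])  # P(B1/R)
```

Under these assumptions P(R) = 3/4 and P(B1/R) = 2/3, so a red ball makes the first box twice as likely as the second.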
Example: There are four different machines A, B, C and D with their respective
degrees of accuracy being 90%, 70%, 50% and 40%. The probabilities that the
machines will give wrong results are given as 2%, 5%, 7% and 9% respectively. If
a machine is operating wrongly we can find the probability that it is machine C as
follows:
Example: A box contains 6 red and 9 blue balls. Two balls are selected at
random, one after the other without replacement. Let us find the probabilities of
the following events:
(a) They are both red (b) They are of the same colour (c) Each colour is selected.
Solution:
The total number of ways of selecting any two balls out of the fifteen is given by:
15C2 = 15! / [(15 − 2)! 2!] = 105
(a) The number of ways of selecting two red balls out of the six red balls is
6C2 = 15, so P(R1 ∩ R2) = 6C2 / 15C2 = 15/105 = 1/7
(b) P(same colour) = (6C2 + 9C2) / 15C2 = (15 + 36)/105 = 51/105 = 17/35
(c) P(each colour is selected) = (6C1 × 9C1) / 15C2 = 54/105 = 18/35
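The counting argument for the 6 red and 9 blue balls can be checked with a few lines of Python. This is only a sketch using the standard-library function math.comb; the variable names are illustrative.

```python
from fractions import Fraction
from math import comb

red, blue = 6, 9
total_pairs = comb(red + blue, 2)          # 15C2 ways to pick any two balls

p_both_red = Fraction(comb(red, 2), total_pairs)
p_same_colour = Fraction(comb(red, 2) + comb(blue, 2), total_pairs)
p_one_of_each = Fraction(red * blue, total_pairs)

print(total_pairs, p_both_red, p_same_colour, p_one_of_each)
```

Since two balls are either of the same colour or of different colours, the last two probabilities add up to 1, which is a quick consistency check.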
EXAMPLES
Example. A bag contains 5 red, 4 blue and 3 white marbles. Three of them are
selected without replacement. Find the probability that:
(a) They are all blue, (b) Each of the colours is selected, (c) Two blue and one white
are selected
(d) At least one red was drawn, (e) Each colour is selected in order red, blue and
white.
Solution
(c) P(2 blue and one white) = (4C2 × 3C1) / 12C3 = 18/220 = 9/110
Solution
(a) P(A ∪ B) = P(A) + P(B) = 0.3 + 0.4 = 0.7 [A and B are mutually exclusive]
Example. If A and B are independent events with P(A) = 0.2 and P(B) = 0.5, find
(where A′ denotes the complement of A):
(a) P(A ∩ B) (b) P(A ∪ B) (c) P(A′ ∩ B′)
Solution
(a) P(A ∩ B) = P(A)P(B) = 0.2 × 0.5 = 0.10
(b) P(A ∪ B) = P(A) + P(B) – P(A ∩ B)
= P(A) + P(B) – P(A)P(B) = 0.2 + 0.5 – (0.2)(0.5) = 0.60
(c) P(A′ ∩ B′) = P(A′)P(B′) = (1 – 0.2)(1 – 0.5) = 0.8 × 0.5 = 0.40
Note that by De Morgan’s Law: P(A′ ∩ B′) = P[(A ∪ B)′] = 1 – 0.6 = 0.4 [from (b)]
Solution
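Example. If P(A) = x, P(B) = ½x and P(A ∪ B) = 0.8, find the value of x if A and
B are independent.
Solution
Since A and B are independent, P(A ∩ B) = P(A)P(B) = ½x², so
P(A ∪ B) = x + ½x – ½x² = 0.8, which gives x² – 3x + 1.6 = 0.
x = [3 ± √((−3)² − 4(1)(1.6))] / 2(1), so x = 2.3 or x = 0.7 (to one decimal place).
Since x should lie between 0 and 1, x = 0.7.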
Example. a)Two events A and B, are independent with P(A) = 0.4 and P(B) = 0.7.
What is P(A/ B)?
(b) Two events E and F are such that P(E ∪ F) = 0.8, P(E) = 0.7 and P(F) = 0.6.
Find
(i) P(E′/F) (ii) P(F′/E′)
Solution
P( E ) P( E )
But P(F ∪ E) = P(F) + P(E) – P(F ∩ E), so P(F ∩ E) = 0.7 + 0.6 – 0.8 = 0.5;
and
P(E′) = 1 – P(E) = 1 – 0.7 = 0.3
Example. The probability that a certain beginner at golf gets a good shot if he uses
the correct club is 1/3, and the probability of a good shot with an incorrect club is
1/4. In his bag are 5 different clubs, only one of which is correct for the shot in
question. If he chooses a club at random and takes a stroke, what is the probability
that:
(a) He gets a good shot, (b) He used the correct club, given that he got a good shot?
Solution
Let A = the correct club is chosen, B = an incorrect club is chosen and D = a good
shot, so that P(A) = 1/5 and P(B) = 4/5.
(a) P(good shot) = P(D) = P(good shot due to A) + P(good shot due to B)
= P(A ∩ D) + P(B ∩ D) = P(A)P(D/A) + P(B)P(D/B) = 1/5 × 1/3 + 4/5 × 1/4 = 4/15
(b) P(A/D) = P(A ∩ D) / P(D) = P(A)P(D/A) / P(D) = (1/5 × 1/3) / (4/15) = 1/4
Example. On a visit to a dentist, a patient is told that his mouth contains 20 of his
original teeth of which 5 are required to be drilled, 3 extracted and the rest left.
What is the probability that if two teeth are chosen at random (a) They would both
be required to be drilled?
(b) One will have to be drilled and one extracted?
Solution
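[Figure: Venn diagram of sets A, B and C inside the universal set U; the region
counts appear in the extracted text as 6, 5, 8, 5, 8, 4, 7]
Find the proportion of players in (a) set A (b) all three sets (c) sets A
and B (d) only one set (e) none of the three sets [Ans: (a) 12/25
(b) 1/10 (c) 11/50 (d) 2/5 (e) 7/50]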
Q2. A survey of the reading habits of 130 students showed that 30 read both Comics
and Novels, 10 read neither Comics nor Novels, and twice as many read
Comics as read Novels. How many read (a) Comics (b) Novels (c) only
Comics or only Novels? [Ans: (a) 100 (b) 50 (c) 90]
Q3. In a class of 50 students, 27 study French, 24 study History and 30 study
Geography. Each student studies at least one of the three subjects. Five
study all three subjects while 11 study French and Geography. How
many study (a) exactly one of the three subjects (b) exactly two subjects?
[Ans: (a) 24 (b) 21]
Q4. Three girls, Amna, Bernice and Mabel, are to write professional examinations.
The probabilities that they will pass the examinations are 0.5, 0.7
and 0.8 respectively. What is the probability that (a) all three girls will pass
the examinations? (b) none of them will pass the examinations? (c) only
Mabel will pass the examinations? (d) only one of them will pass the
examinations? (e) at least one of them will pass the examinations?
[Ans: (a) 0.28 (b) 0.03 (c) 0.12 (d) 0.22 (e) 0.97]
Q5. A box contains 8 red, 3 white and 9 blue balls. If 3 balls are drawn at
random, determine the probability that (a) all 3 are red (b) all 3 are white (c)
2 are red and 1 is white (d) at least 1 is white (e) one of each colour is
drawn (f) the balls are drawn in the order red, white, blue.
[Ans: (a) 14/285 (b) 1/1140 (c) 7/95 (d) 23/57 (e) 18/95 (f) 3/95]
Q6. A diagnostic test for a new disease has the following characteristics: a
person with the disease, if given the test, certainly shows a positive reaction, while
10% of persons without the disease who are administered the test show a
positive reaction. If, in the population sampled, one percent of the people have
the disease, what percentage of those who reacted positively to the test actually
have the disease? [Ans: 9%]
Q7. If two dice are tossed together once, what is the probability that (a) the total
is 7? (b) each of them shows at least 5 points? [Ans: (a) 1/6 (b) 1/9]
Q8. Three fair coins are tossed together. (i) List the members of the sample space.
(ii) Find the probability of getting: (a) at least one head (b) no tail (c)
one head and two tails (d) three tails or two tails [Ans: (a) 7/8
(b) 1/8 (c) 3/8 (d) ½]
Q9. The events A, B and C satisfy these conditions: P(A) = 0.6, P(B) = 0.8, P(B/A)
= 0.45, P(B and C) = 0.28. Calculate: (a) P(A and B) (b) P(C/B) (c) P(A/B)
[Ans: (a) 0.27 (b) 0.35 (c) 0.3375]
Q10. Given that P(A) = 0.75, P(B/A) = 0.8 and P(B/A′) = 0.6, calculate P(B) and
P(A/B) [Ans: 0.75; 0.8]
Q11. The probability that an event A occurs is P(A) = 0.3. The event B is
independent of A and P(B) = 0.4. (a) Calculate P(A or B or both occur).
(b) Event C is defined to be the event that neither A nor B occurs. Calculate
P(C/A′), where A′ is the event that A does not occur. [Ans: (a) 0.58 (b)
0.6]
SUGGESTED READINGS
Bluman, A.G. (2004). Elementary Statistics. A Step by Step Approach. 5th Edition.
McGraw-Hill Companies Incorporated. London.
Chaudhary, S.M. & Kamal, S. (2017). Introduction to Statistical Theory Part-I. 8th
Edition. Ilmi Kitab Khana. Lahore.
Chaudhary, S.M. & Kamal, S. (2017). Introduction to Statistical Theory Part-II. 8th
Edition. Ilmi Kitab Khana. Lahore.
Daniel, W.W. (1995). Biostatistics: A foundation for Analysis in Health sciences.
Sixth Edition. John Wiley and sons Incorporated. USA.
Harper, W.M. (1991). Statistics. Sixth Edition. Pitman Publishing, Longman
Group, United Kingdom.
Hoel, P.G. (1976). Elementary Statistics. 4th Edition. John Wiley and Sons
Incorporated, NewYork.
Kiani, G. H., & Akhtar, M. S. (2012). Basic statistics, Majeed Book Depot.
Khan, A. A., Mirza, S. H., Ahmad, M. I., Baig, I. & Yaqoob, M. (2011). Business
Statistics, Qureshi Brothers Publishers.
UNIT 07
PROBABILITY DISTRIBUTIONS
CONTENTS
Pages
Introduction 135
Objectives 135
7.1 Binomial Random Variable 136
7.2 Normal Approximation to the Binomial Distribution 137
7.3 Poisson Random Variable 140
7.4 Binomial and Normal Approximation to Poisson Distribution 140
7.5 Hypergeometric Distribution 147
7.6 Negative Binomial Distribution 149
7.7 Geometric Distribution 151
7.8 Normal Distribution 152
7.9 Self Assessment Questions 156
Suggested Readings 160
Introduction
Objectives
After studying this unit, you will be able to.
7.1 Binomial Random Variable
Many experiments have responses with two possibilities (Yes/No, Pass/Fail,
True/False).
Certain experiments, called binomial experiments, yield a type of random variable
(r.v.) called a binomial random variable.
The binomial r.v., denoted by X, is then the number of successes in the n trials.
Example: A box contains a large number of screws. The screws are very similar
in appearance but are, in fact, of three different types A, B, C which are present in
equal numbers. For a given job, only screws of type A are suitable. If 4 screws are
chosen at random, find the probability that i. exactly two are suitable, ii. at
least two are suitable.
If twenty screws are chosen at random, find the expected value and variance of
the number of suitable screws.
Solution:
(ii) P(at least two) = P(r ≥ 2) = 1 – P(r = 0) – P(r = 1)
= 1 – 4C0(⅓)^0(⅔)^4 – 4C1(⅓)^1(⅔)^3 = 1 – 16/81 – 32/81 = 33/81 = 11/27
If n = 20, the Expected Value = np = 20 × ⅓ = 20/3 and Variance = np(1 – p)
= 20 × ⅓ × ⅔ = 40/9
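The screw example can be reproduced with a small Python sketch of the binomial probability function. The helper name binomial_pmf is illustrative; exact fractions are used so the results can be compared directly with the hand calculation.

```python
from fractions import Fraction
from math import comb

def binomial_pmf(r, n, p):
    """P(exactly r successes in n independent trials with success probability p)."""
    return comb(n, r) * p**r * (1 - p)**(n - r)

p = Fraction(1, 3)   # one screw type in three is suitable
n = 4

p_exactly_two = binomial_pmf(2, n, p)                                # part (i)
p_at_least_two = 1 - binomial_pmf(0, n, p) - binomial_pmf(1, n, p)   # part (ii)

mean_20 = 20 * p                  # expected value np for n = 20
variance_20 = 20 * p * (1 - p)    # variance np(1 - p) for n = 20

print(p_exactly_two, p_at_least_two, mean_20, variance_20)
```

For part (i) the sketch gives 4C2(⅓)^2(⅔)^2 = 8/27, and the remaining values agree with the solution above.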
Solution :
suitable and convenient than the others. The relationship between the Normal and
the Binomial distributions illustrates this important point.
It must be recalled that if a random variable r follows the Binomial distribution,
then:
r ~ B(n, p)
and the mean of the distribution is np, while the variance is np(1 – p). As the
sample size n gets larger, the Binomial distribution becomes
approximately equal to the Normal distribution with mean np and variance np(1 –
p). The approximation is quite accurate as long as np ≥ 5 and n(1 – p) ≥ 5. However,
the approximation may not be good enough, even if n is large, when p is very
close to zero or one.
To illustrate this important point, let us solve the following problem using both
the Binomial and Normal distributions and compare the results.
Twenty students take an examination in statistics which is simply graded pass or
fail. If the probability, p, of any individual student passing is 60%, let us find the
probability of at least 19 students passing the examination.
From the problem, p = 0.6; 1 – p = 0.4; n = 20.
To solve the problem using the Binomial distribution, we have to find the
probability of exactly 19 students passing, plus the probability of 20 passing. Since
the two events are mutually exclusive, their probabilities can be added. Let r
represent the number passing. Then, the required probability will be given by:
P(r = 19) + P(r = 20) = 20C19 × 0.6^19 × 0.4^1 + 20C20 × 0.6^20 × 0.4^0
= 8 × 0.6^19 + 0.6 × 0.6^19 = 8.6 × 0.6^19 ≈ 0.000524.
Since the Binomial distribution is discrete while the Normal distribution is
continuous, the end values of each interval should be extended by half a unit (a
continuity correction) before the Normal distribution is applied. We can now solve
the above problem using the Normal distribution.
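The comparison between the two distributions for the examination problem can be sketched in Python as follows. The code uses statistics.NormalDist from the standard library and assumes the usual continuity correction, so that "at least 19" on the discrete scale becomes "above 18.5" on the continuous scale.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 20, 0.6

# Exact Binomial probability of 19 or 20 passes.
exact = sum(comb(n, r) * p**r * (1 - p)**(n - r) for r in (19, 20))

# Normal approximation with mean np and variance np(1 - p),
# evaluated from 18.5 upwards (continuity correction).
approx_dist = NormalDist(mu=n * p, sigma=sqrt(n * p * (1 - p)))
approx = 1 - approx_dist.cdf(18.5)

print(f"exact = {exact:.6f}, normal approximation = {approx:.6f}")
```

The exact value is about 0.0005 while the Normal approximation is about 0.0015. Both are very small, but the gap shows that the approximation is rough this far out in the tail, even though np and n(1 – p) both exceed 5.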
EXAMPLE
of which p is the proportion of successes in a single trial, then the variable
z = (x/n − p) / √[p(1 − p)/n]
has a distribution that approaches the normal with mean zero and standard deviation
one as the number of trials increases. This is similar to the Normal
approximation to the Binomial distribution: when both the numerator and the
denominator of equation 9.3.02 are divided by n, we get the equation above (i.e. the
z-score for a proportion).
Solution:
p = 5% or 0.05; 1 – p = 1 – 0.05 = 0.95; n = 100
σ = √[p(1 − p)/n] = √[0.05(1 − 0.05)/100] = 0.022
Let x/n denote any proportion
(a) P(x/n ≥ 10%) = P(x/n ≥ 0.10) = P[(x/n − 0.05)/0.022 ≥ (0.10 − 0.05)/0.022]
= P(z ≥ 2.27) = 0.5 – P(0 ≤ z ≤ 2.27) = 0.5 – 0.4884 = 0.0116
(b) P(2% ≤ x/n ≤ 8%) = P(0.02 ≤ x/n ≤ 0.08)
= P[(0.02 − 0.05)/0.022 ≤ (x/n − 0.05)/0.022 ≤ (0.08 − 0.05)/0.022]
= P(−1.36 ≤ z ≤ 1.36) = 2 P(0 ≤ z ≤ 1.36)
= 2(0.4131) = 0.8262
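The proportion calculation can also be sketched in Python, with p = 0.05 and n = 100 taken from the solution. The code keeps the standard error unrounded, so its results differ slightly from table-based values computed with σ rounded to 0.022.

```python
from math import sqrt
from statistics import NormalDist

p, n = 0.05, 100
sigma = sqrt(p * (1 - p) / n)            # standard error of the sample proportion
proportion = NormalDist(mu=p, sigma=sigma)

p_at_least_10 = 1 - proportion.cdf(0.10)                         # P(x/n >= 0.10)
p_between_2_and_8 = proportion.cdf(0.08) - proportion.cdf(0.02)  # P(0.02 <= x/n <= 0.08)

print(round(sigma, 4), round(p_at_least_10, 4), round(p_between_2_and_8, 4))
```

The small differences from the hand calculation come only from rounding σ and z before using the normal table.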
Poisson Distribution:
Which values can a Poisson r.v. take?
Probability distribution for X (if X is Poisson with mean λ):
P(x) = λ^x e^(−λ) / x! (for x = 0, 1, 2, …)
First, let us consider a wholesale fruit distributor who found that, on
average, one apple in fifty is bruised on arrival from the growers. If the apples arrive
in cartons of 100, calculate the probabilities of a carton having 0, 1, 2, 3, or
more than 3 bruised apples.
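A short Python sketch of the Poisson calculation for the bruised-apple problem follows. It assumes the Poisson mean is λ = 100 × 1/50 = 2 bruised apples per carton; the function name poisson_pmf is illustrative.

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    """P(X = x) for a Poisson random variable with mean lam."""
    return lam**x * exp(-lam) / factorial(x)

lam = 100 * (1 / 50)          # expected number of bruised apples per carton of 100

probs = [poisson_pmf(x, lam) for x in range(4)]   # x = 0, 1, 2, 3
more_than_three = 1 - sum(probs)

for x, pr in enumerate(probs):
    print(f"P(X = {x}) = {pr:.4f}")
print(f"P(X > 3) = {more_than_three:.4f}")
```

Because λ = 2, the probabilities of exactly one and exactly two bruised apples are equal, and the "more than 3" case is obtained as the complement of the first four terms.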
Example: It is known that 0.1% of all people react adversely to a certain type of
drug. What is the probability that, out of a sample of 1,000 people, (a) none will
react to the drug? (b) just one person will react to the drug? (c) more than two
will react to the drug? (d) fewer than three will react to the drug?
7.4 Binomial and Normal Approximation to the Poisson
Distribution
If the probability p of success in a single trial approaches zero while the number n
of trials becomes infinitely large in such a manner that the mean λ = np remains fixed,
then the Binomial distribution will approach the Poisson distribution with mean
λ = np.
This can be illustrated with the following sample problems.
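How close the two distributions become can be seen in a small Python sketch. It reuses the figures from the drug-reaction example (n = 1,000 and p = 0.001, so λ = np = 1); the helper names are illustrative.

```python
from math import comb, exp, factorial

n, p = 1000, 0.001      # sample size and reaction rate from the drug example
lam = n * p             # Poisson mean np

def binomial_pmf(x):
    """Exact Binomial probability of x reactions among n people."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

def poisson_pmf(x):
    """Poisson probability of x reactions with mean lam."""
    return lam**x * exp(-lam) / factorial(x)

for x in range(4):
    print(x, round(binomial_pmf(x), 5), round(poisson_pmf(x), 5))

# Largest difference between the two distributions over x = 0, ..., 10.
max_gap = max(abs(binomial_pmf(x) - poisson_pmf(x)) for x in range(11))
print(max_gap)
```

With n this large and p this small, the two sets of probabilities differ by less than one part in a thousand, which is why the Poisson formula is normally preferred here for hand calculation.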
Example: A factory has 100 machines in stock for sale. Five percent of
the machines were found to be faulty. Find the probability that (a) none will be faulty,
(b) two will be faulty, (c) at most two will be faulty and (d) at least three will be faulty.
There are 10,000 tins of milk in a firm to be tested for quality. The selection of
defective ones follows a Poisson distribution. Let us find the probability that at least
190 are defective.
Solution
The problem can be solved as follows:
EXAMPLES
Example. A machine fills millet flour in nominally 500-gram bags. The actual
weight of the filled bags varies, being approximately normally distributed with
standard deviation 10 grams.
(a) Find the mean weight of bags filled by the machine if 15% filled bags are
underweight.
(b) Calculate the proportion of bags whose weight is between 495 grams and 535
grams.
(c) Bags weighing less than 500 grams are sold at a loss of Rs.3,000. Calculate
the loss associated with the sale of 150 bags.
(d) If the mean weight of filled bags is adjusted to 521.2 grams and the standard
deviation remains unchanged, what percentage of bags would be sold at a loss?
Solution
Let x represent the weight of any filled bag and μ be the mean weight filled by
the machine.
Nominal weight = 500 grams; σ = 10 grams
(a) P(x ≤ 500) = 0.15, so P[(x − μ)/10 ≤ (500 − μ)/10] = P[z ≤ (500 − μ)/10] = 0.15
By symmetry, P[0 ≤ z ≤ (μ − 500)/10] = 0.5 – 0.15 = 0.35, and the value of
(μ − 500)/10 is then read from the normal table.
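The remaining steps of the millet-flour problem can be sketched in Python. The code assumes that "underweight" means below the nominal 500 grams and uses the inverse of the standard normal distribution in place of the printed table, so its answers may differ from table-based ones in the last digit.

```python
from statistics import NormalDist

sigma = 10.0
z_015 = NormalDist().inv_cdf(0.15)   # z value with 15% of the area to its left (negative)

# P(x <= 500) = 0.15  means  (500 - mu) / sigma = z_015
mu = 500 - sigma * z_015             # part (a)

weights = NormalDist(mu=mu, sigma=sigma)
p_495_to_535 = weights.cdf(535) - weights.cdf(495)      # part (b)

# Part (d): mean moved to 521.2 g, same standard deviation.
p_loss_adjusted = NormalDist(mu=521.2, sigma=sigma).cdf(500)

print(round(mu, 1), round(p_495_to_535, 4), round(p_loss_adjusted, 4))
```

The mean comes out a little above 510 grams, and with the adjusted mean of 521.2 grams fewer than 2% of bags fall below 500 grams.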
Solution
Example 3. If a typist makes an average of two errors per page of a book, use the
Poisson distribution to find the probability that (a) exactly four errors will be found
on a page, (b) at least two errors will be found on a given page.
Solution
The mean of the Poisson distribution is λ = 2. Let x represent the number of errors
made per page.
(a) P(x = 4) = 2^4 e^(−2) / 4! = (2/3) e^(−2)
(b) P(x ≥ 2) = 1 – [P(x = 0) + P(x = 1)]
= 1 − [2^0 e^(−2)/0! + 2^1 e^(−2)/1!] = 1 − 3e^(−2)
Example 4. The lifetimes of batteries produced by a company are normally
distributed with mean 110 hours and variance σ². The probability that a battery
has a lifetime of more than 113 hours is 0.3821. (a) Find the variance σ².
(b) Use the variance in (a) to determine the probability that a battery will last
between 90 and 102 hours.
Solution
Let x denote the lifetime of any battery. (a) P(x ≥ 113) = 0.3821
P[(x − 110)/σ ≥ (113 − 110)/σ] = 0.3821
P(z ≥ 3/σ) = 0.3821, P(0 ≤ z ≤ 3/σ) = 0.5 – 0.3821 = 0.1179
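The battery problem can be carried through to the end with the Python sketch below. It replaces the table look-up with the inverse standard normal function from statistics.NormalDist; the variable names are illustrative.

```python
from statistics import NormalDist

std_normal = NormalDist()

# P(z >= 3/sigma) = 0.3821, so 3/sigma is the z value with area 1 - 0.3821 to its left.
z = std_normal.inv_cdf(1 - 0.3821)
sigma = 3 / z
variance = sigma**2                                     # part (a)

lifetimes = NormalDist(mu=110, sigma=sigma)
p_90_to_102 = lifetimes.cdf(102) - lifetimes.cdf(90)    # part (b)

print(round(z, 2), round(sigma, 1), round(variance), round(p_90_to_102, 4))
```

The z value is 0.30, which gives σ = 10 hours and a variance of 100; the probability of a lifetime between 90 and 102 hours is then about 0.19.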
Activity: A call center averages 10 calls per hour. Assume X (the number of calls
in an hour) follows a Poisson distribution. What is the probability that the call
center receives exactly 3 calls in the next hour?
Example: A particular part that is used as an injection device is sold in lots of 10.
The producer deems a lot acceptable if no more than one defective is in the lot. A
sampling plan involves random sampling and testing 3 of the parts out of 10. If
none of the 3 is defective, the lot is accepted. Comment on the utility of this plan.
Solution: Let us assume that the lot is truly unacceptable (i.e., that 2 out of 10 parts
are defective). The probability that the sampling plan finds the lot acceptable is
P(X = 0) = (2C0 × 8C3) / 10C3 = 56/120 ≈ 0.467.
Thus, if the lot is truly unacceptable, with 2 defective parts, this sampling plan will
allow acceptance roughly 47% of the time. As a result, this plan should be
considered faulty. Let us now generalize in order to find a formula for h(x; N, n, k).
The total number of samples of size n chosen from N items is NCn. These samples
are assumed to be equally likely. There are kCx ways of selecting x successes from
the k that are available, and for each of these ways we can choose the n − x failures
in N−kCn−x ways. Thus, the total number of favorable samples among the NCn
possible samples is kCx × N−kCn−x. Hence, we have the following
definition:
h(x; N, n, k) = (kCx × N−kCn−x) / NCn
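The counting argument for h(x; N, n, k) translates directly into a few lines of Python. This is a sketch using math.comb, applied to the lot of 10 parts with 2 defectives; the function name hypergeometric_pmf is illustrative.

```python
from fractions import Fraction
from math import comb

def hypergeometric_pmf(x, N, n, k):
    """h(x; N, n, k): probability of x successes in a sample of n drawn without
    replacement from N items, of which k are successes."""
    return Fraction(comb(k, x) * comb(N - k, n - x), comb(N, n))

# Lot of 10 parts with 2 defectives; 3 parts are sampled and tested.
p_accept = hypergeometric_pmf(0, N=10, n=3, k=2)
print(p_accept, float(p_accept))

# The probabilities over all possible numbers of defectives in the sample sum to 1.
total = sum(hypergeometric_pmf(x, 10, 3, 2) for x in range(3))
print(total)
```

The acceptance probability is 7/15, or roughly 47%, in line with the comment on the sampling plan.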
Consider the use of a drug that is known to be effective in 60% of the cases where
it is used. The drug will be considered a success if it is effective in bringing some
degree of relief to the patient. We are interested in finding the probability that the
fifth patient to experience relief is the seventh patient to receive the drug during a
given week. Designating a success by S and a failure by F, a possible order of
achieving the desired result is SFSSSFS, which occurs with probability
(0.6)(0.4)(0.6)(0.6)(0.6)(0.4)(0.6) = (0.6)^5 (0.4)^2. We could list all possible orders
by rearranging the F’s and S’s except for the last outcome, which must be the fifth
success. The total number of possible orders is equal to the number of partitions of
the first six trials into two groups with 2 failures assigned to the one group and 4
successes assigned to the other group. This can be done in 6C4 = 15 mutually
exclusive ways. Hence, if X represents the outcome on which the fifth success
occurs, then P(X = 7) = (6C4) (0.6)^5 (0.4)^2 = 0.1866.
The number X of trials required to produce k successes in a negative binomial
experiment is called a negative binomial random variable, and its probability
distribution is called the negative binomial distribution. Since its probabilities
depend on the number of successes desired and the probability of a success on a
given trial, we shall denote them by b∗(x; k, p). To obtain the general formula for
b∗(x; k, p), consider the probability of a success on the xth trial preceded by k − 1
successes and x − k failures in some specified order. Since the trials are
independent, we can multiply all the probabilities corresponding to each desired
outcome. Each success occurs with probability p and each failure with probability
q = 1 − p. Therefore, the probability for the specified order ending in success is
p^(k−1) q^(x−k) p = p^k q^(x−k).
The total number of sample points in the experiment ending in a success, after the
occurrence of k − 1 successes and x − k failures in any order, is equal to the number
of partitions of x − 1 trials into two groups with k − 1 successes corresponding to one
group and x − k failures corresponding to the other group. This number is specified
by the term x−1Ck−1, each mutually exclusive and occurring with equal probability
p^k q^(x−k). We obtain the general formula by multiplying p^k q^(x−k) by x−1Ck−1.
If repeated independent trials can result in a success with probability p and a failure
with probability q = 1 − p, then the probability distribution of the random variable
X, the number of the trial on which the kth success occurs, is
b∗(x; k, p) = x−1Ck−1 p^k q^(x−k), x = k, k + 1, k + 2, ...
(a) What is the probability that team A will win the series in 6 games?
(b) What is the probability that team A will win the series?
(c) If teams A and B were facing each other in a regional playoff series, which is
decided by winning three out of five games, what is the probability that team A
would win the series?
(c) P(team A wins the playoff) is b∗(3; 3, 0.55) + b∗(4; 3, 0.55) + b∗(5; 3, 0.55)
= 0.1664 + 0.2246 + 0.2021 = 0.5931.
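The negative binomial probabilities used in this section can be reproduced with the Python sketch below. The per-game probability p = 0.55 is taken from the solution line for the playoff; the function name negative_binomial_pmf is illustrative.

```python
from math import comb

def negative_binomial_pmf(x, k, p):
    """b*(x; k, p): probability that the k-th success occurs on trial x."""
    if x < k:
        return 0.0          # the k-th success cannot occur before trial k
    return comb(x - 1, k - 1) * p**k * (1 - p)**(x - k)

# Fifth success on the seventh trial with p = 0.6 (the drug example).
p_drug = negative_binomial_pmf(7, 5, 0.6)

# Best-of-five playoff: team A needs 3 wins, which can happen in game 3, 4 or 5.
p_playoff = sum(negative_binomial_pmf(x, 3, 0.55) for x in (3, 4, 5))

print(round(p_drug, 4), round(p_playoff, 4))
```

Summing over the possible final games works because the events "the series ends in game 3", "in game 4" and "in game 5" are mutually exclusive.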
The negative binomial distribution derives its name from the fact that each term in
the expansion of p^k (1 − q)^(−k) corresponds to the values of b∗(x; k, p) for x = k, k +
1, k + 2, ... . If we consider the special case of the negative binomial distribution
where k = 1, we have a probability distribution for the number of trials required
for a single success. An example would be the tossing of a coin until a head occurs.
We might be interested in the probability that the first head occurs on the fourth
toss. The negative binomial distribution reduces to the form
b∗(x; 1, p) = p q^(x−1), x = 1, 2, 3, ...
which is the geometric distribution.
Solution: Using the geometric distribution with x = 5 and p = 0.05 yields
p q^4 = (0.05)(0.95)^4 = 0.041.
Quite often, in applications dealing with the geometric distribution, the mean and
variance are important. For example, the expected number of calls necessary to
make a connection is quite important.
The mean and variance of a random variable following the geometric distribution
are
μ = 1/p and σ² = (1 – p)/p²
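The geometric probabilities and the formulas for the mean and variance can be checked with a brief Python sketch. It uses p = 0.05 and x = 5 from the solution above; the numerical sum is only an illustrative check of μ = 1/p, not a proof.

```python
def geometric_pmf(x, p):
    """Probability that the first success occurs on trial x (x = 1, 2, 3, ...)."""
    return p * (1 - p)**(x - 1)

p = 0.05
p_fifth_trial = geometric_pmf(5, p)

mean = 1 / p                    # expected number of trials until the first success
variance = (1 - p) / p**2

# Numerical check of the mean: sum x * P(X = x) over a long range of trials.
approx_mean = sum(x * geometric_pmf(x, p) for x in range(1, 2001))

print(round(p_fifth_trial, 4), mean, variance, round(approx_mean, 4))
```

With p = 0.05, an average of 20 attempts is needed for the first success, which is the kind of figure that matters when each attempt carries a cost.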
Areas of application for the negative binomial and geometric distributions become
obvious when one focuses on the examples in this section and the exercises devoted
to these distributions. In the case of the geometric distribution, the example depicts a
situation where engineers or managers are attempting to determine how inefficient a
telephone exchange system is during busy times. Clearly, in this case, trials
occurring prior to a success represent a cost. If there is a high probability of several
attempts being required prior to making a connection, then plans should be made
to redesign the system. Applications of the negative binomial distribution are
similar in nature. Suppose attempts are costly in some sense and are occurring in
sequence. A high probability of needing a “large” number of attempts to experience
a fixed number of successes is not beneficial to the scientist or engineer.
The density of the normal random variable X, with mean μ and variance σ², is
n(x; μ, σ) = [1 / (√(2π) σ)] e^(−(x − μ)² / (2σ²)), −∞ < x < ∞, where π = 3.14159 ... and
e = 2.71828 ... .
Once μ and σ are specified, the normal curve is completely determined. For
example, if μ = 50 and σ = 5, then the ordinates n(x; 50, 5) can be computed for
various values of x and the curve drawn. Two normal curves
having the same standard deviation but different means are
identical in form but are centered at different positions along the horizontal axis.
Based on inspection of Figures and examination of the first and second derivatives
of n(x; μ, σ), we list the following properties of the normal curve:
1. The mode, which is the point on the horizontal axis where the curve is a
maximum, occurs at x = μ.
5. The total area under the curve and above the horizontal axis is equal to 1.
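Two of these properties can be illustrated numerically with a Python sketch of the density n(x; μ, σ), using μ = 50 and σ = 5 as in the text. The grid and step size are arbitrary choices made for this illustration.

```python
from math import exp, pi, sqrt

def normal_density(x, mu, sigma):
    """n(x; mu, sigma): ordinate of the normal curve at x."""
    return exp(-((x - mu) ** 2) / (2 * sigma**2)) / (sqrt(2 * pi) * sigma)

mu, sigma = 50, 5

# Evenly spaced grid from mu - 8*sigma to mu + 8*sigma.
n_steps = 8000
lo = mu - 8 * sigma
step = 16 * sigma / n_steps
xs = [lo + i * step for i in range(n_steps + 1)]
ys = [normal_density(x, mu, sigma) for x in xs]

mode = xs[ys.index(max(ys))]    # grid point where the curve is highest
area = sum(ys) * step           # simple rectangle-rule estimate of the total area

print(round(mode, 2), round(area, 4))
```

The highest ordinate is found at x = μ and the estimated area under the curve is 1, in agreement with properties 1 and 5.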
The distribution of a normal random variable with mean 0 and variance 1 is called
a standard normal distribution.
Example: Given a standard normal distribution, find the area under the curve that
lies (a) to the right of z = 1.84 and (b) between z = −1.97 and z = 0.86.
Solution: (a) The area to the right of z = 1.84 is equal to 1 minus the area in
Table A. to the left of z = 1.84, namely, 1 − 0.9671 = 0.0329.
(b) The area between z = −1.97 and z = 0.86 is equal to the area to the left of z =
0.86 minus the area to the left of z = −1.97. From Table A. we find the desired area
to be 0.8051 − 0.0244 = 0.7807.
Example: Given a standard normal distribution, find the value of k such that (a)
P(Z>k)=0.3015 and (b) P(k<Z< −0.18) = 0.4197.
Solution: (a) The k value leaving an area of 0.3015 to the right must leave an area
of 0.6985 to the left. From Table A. it follows that k = 0.52. (b) From Table A. we
note that the total area to the left of −0.18 is equal to 0.4286. The area between k and
−0.18 is 0.4197, so the area to the left of k must be 0.4286 − 0.4197 = 0.0089.
Hence, from Table A.3, we have k = −2.37.
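The table look-ups in the two standard normal examples can be reproduced in Python with statistics.NormalDist. This is a sketch that uses the cumulative distribution function in place of the printed table, so the results agree with the table values after rounding.

```python
from statistics import NormalDist

Z = NormalDist()   # standard normal: mean 0, standard deviation 1

# Areas under the curve.
right_of_184 = 1 - Z.cdf(1.84)           # area to the right of z = 1.84
between = Z.cdf(0.86) - Z.cdf(-1.97)     # area between z = -1.97 and z = 0.86

# Values of k for given areas, using the inverse of the cumulative function.
k_a = Z.inv_cdf(1 - 0.3015)                 # P(Z > k) = 0.3015
k_b = Z.inv_cdf(Z.cdf(-0.18) - 0.4197)      # P(k < Z < -0.18) = 0.4197

print(round(right_of_184, 4), round(between, 4), round(k_a, 2), round(k_b, 2))
```

Finding an area uses cdf, while finding a z value from an area uses inv_cdf, which mirrors reading the table in the two directions.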
Example: A certain type of storage battery lasts, on average, 3.0 years with a
standard deviation of 0.5 year. Assuming that battery life is normally distributed,
find the probability that a given battery will last less than 2.3 years.
Solution: First construct a diagram, showing the given distribution of battery lives
and the desired area. To find P(X < 2.3), we need to evaluate the area under the
normal curve to the left of 2.3. This is accomplished by finding the area to the left
of the corresponding z value. Hence, we find that z = (2.3 – 3)/ 0.5 = −1.4, and then,
using Table A., we have P(X < 2.3) = P(Z < −1.4) = 0.0808.
Example: An electrical firm manufactures light bulbs that have a life, before burn-
out, that is normally distributed with mean equal to 800 hours and a standard
deviation of 40 hours. Find the probability that a bulb burns between 778 and 834
hours.
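Solution: The z values corresponding to x1 = 778 and x2 = 834 are
z1 = (778 − 800)/40 = −0.55 and z2 = (834 − 800)/40 = 0.85. Hence,
P(778 < X < 834) = P(−0.55 < Z < 0.85) = P(Z < 0.85) − P(Z < −0.55) = 0.8023 −
0.2912 = 0.5111.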
Solution: The distribution of diameters is illustrated. The values corresponding to
the specification limits are x1 = 2.99 and x2 = 3.01. The corresponding z values are
z1 = (2.99 − 3.0)/ 0.005 = −2.0 and z2 = (3.01 − 3.0)/0.005 = +2.0.
Hence, P(2.99 <X< 3.01) = P(−2.0 <Z< 2.0).
From Table, P(Z < −2.0) = 0.0228. Due to symmetry of the normal distribution, we
find that P(Z < −2.0) + P(Z > 2.0) = 2(0.0228) = 0.0456. As a result, it is
anticipated that, on average, 4.56% of manufactured ball bearings will be scrapped.
7.9 SELF ASSESSMENT QUESTIONS
Q.2 In a certain city district, the need for money to buy drugs is stated as the
reason for 75% of all thefts. Find the probability that among the next 5 theft
cases reported in this district, (a) exactly 2 resulted from the need for money
to buy drugs; (b) at most 3 resulted from the need for money to buy drugs.
Q.7 A random committee of size 3 is selected from 4 doctors and 2 nurses. Write
a formula for the probability distribution of the random variable X
representing the number of doctors on the committee. Find P(2 ≤ X ≤ 3).
Q.8 From a lot of 10 missiles, 4 are selected at random and fired. If the lot
contains 3 defective missiles that will not fire, what is the probability that (a)
all 4 will fire? (b) at most 2 will not fire?
Q.9 If 7 cards are dealt from an ordinary deck of 52 playing cards, what is the
probability that (a) exactly 2 of them will be face cards? (b) at least 1 of them
will be a queen?
Q.10 The probability that a person living in a certain city owns a dog is estimated
to be 0.3. Find the probability that the tenth person randomly interviewed in
that city is the fifth one to own a dog.
Q.11 Find the probability that a person flipping a coin gets (a) the third head on
the seventh flip; (b) the first head on the fourth flip.
Q.12 Three people toss a fair coin and the odd one pays for coffee. If the coins all
turn up the same, they are tossed again. Find the probability that fewer than
4 tosses are needed.
Q.13 A scientist inoculates mice, one at a time, with a disease germ until he finds
2 that have contracted the disease. If the probability of contracting the
disease is 1/6, what is the probability that 8 mice are required?
Q.16 On average, a textbook author makes two word-processing errors per page
on the first draft of her textbook. What is the probability that on the next page
she will make (a) 4 or more errors? (b) no errors?
Q.17 A certain area of the eastern United States is, on average, hit by 6 hurricanes
a year. Find the probability that in a given year that area will be hit by (a) fewer
than 4 hurricanes; (b) anywhere from 6 to 8 hurricanes.
Q.18 Suppose the probability that any given person will believe a tale about the
transgressions of a famous actress is 0.8. What is the probability that (a) the
sixth person to hear this tale is the fourth one to believe it? (b) the third
person to hear this tale is the first one to believe it.
Q.19 The average number of field mice per acre in a 5-acre wheat field is estimated
to be 12. Find the probability that fewer than 7 field mice are found (a) on a
given acre; (b) on 2 of the next 3 acres inspected.
Q.20 The number of customers arriving per hour at a certain automobile service
facility is assumed to follow a Poisson distribution with mean λ = 7. (a)
Compute the probability that more than 10 customers will arrive in a 2-hour
period. (b) What is the mean number of arrivals during a 2-hour period?
Q.21 The probability that a student at a local high school fails the screening test
for scoliosis (curvature of the spine) is known to be 0.004. Of the next 1875
students at the school who are screened for scoliosis, find the probability that
(a) fewer than 5 fail the test; (b) 8, 9, or 10 fail the test. What is the mean
number of students who fail the test?
Q.22 The probability that a person will die when he or she contracts a virus
infection is 0.001. Of the next 4000 people infected, what is the mean number
who will die?
Q.23 The potential buyer of a particular engine requires (among other things) that
the engine successfully start 10 consecutive times. Suppose the probability
of a successful start is 0.990. Let us assume that the outcomes of attempted
starts are independent. (a) What is the probability that the engine is accepted
after only 10 starts? (b) What is the probability that 12 attempted starts are
made during the acceptance process?
Q.24 A couple decides to continue to have children until they have two males.
Assuming that P(male) = 0.5, what is the probability that their second male
is their fourth child?
Q.25 The manufacturer of a tricycle for children has received complaints about
defective brakes in the product. According to the design of the product and
considerable preliminary testing, it had been determined that the probability
of the kind of defect in the complaint was 1 in 10,000 (i.e., 0.0001). After a
thorough investigation of the complaints, it was determined that during a
certain period of time, 200 products were randomly chosen from production
and 5 had defective brakes. (a) Comment on the “1 in 10,000” claim by the
manufacturer. Use a probabilistic argument. Use the binomial distribution
for your calculations. (b) Repeat part (a) using the Poisson approximation.
Q.26 A soft-drink machine is regulated so that it discharges an average of 200
milliliters per cup. If the amount of drink is normally distributed with a
standard deviation equal to 15 milliliters, (a) what fraction of the cups will
contain more than 224 milliliters? (b) what is the probability that a cup
contains between 191 and 209 milliliters? (c) how many cups will probably
overflow if 230-milliliter cups are used for the next 1000 drinks? (d) below
what value do we get the smallest 25% of the drinks?
Q.27 The loaves of rye bread distributed to local stores by a certain bakery have
an average length of 30 centimeters and a standard deviation of 2
centimeters. Assuming that the lengths are normally distributed, what
percentage of the loaves are (a) longer than 31.7 centimeters? (b) between
29.3 and 33.5 centimeters in length? (c) shorter than 25.5 centimeters?
Q.28 A research scientist reports that mice will live an average of 40 months when
their diets are sharply restricted and then enriched with vitamins and
proteins. Assuming that the lifetimes of such mice are normally distributed
with a standard deviation of 6.3 months, find the probability that a given
mouse will live (a) more than 32 months; (b) less than 28 months; (c)
between 37 and 49 months.
Q.29 The finished inside diameter of a piston ring is normally distributed with a
mean of 10 centimeters and a standard deviation of 0.03 centimeter. (a) What
proportion of rings will have inside diameters exceeding 10.075 centimeters?
(b) What is the probability that a piston ring will have an inside diameter
between 9.97 and 10.03 centimeters? (c) Below what value of inside
diameter will 15% of the piston rings fall?
Q.30 A lawyer commutes daily from his suburban home to his main city office.
The average time for a one-way trip is 24 minutes, with a standard deviation
of 3.8 minutes. Assume the distribution of trip times to be normally
distributed. (a) What is the probability that a trip will take at least 1/2 hour?
(b) If the office opens at 9:00 A.M. and the lawyer leaves his house at 8:45
A.M. daily, what percentage of the time is he late for work?
SUGGESTED READINGS
Bluman, A.G. (2004). Elementary Statistics. A Step by Step Approach. 5th Edition.
McGraw-Hill Companies Incorporated. London.
Chaudhary, S.M. & Kamal, S. (2017). Introduction to Statistical Theory Part-I. 8th
Edition. Ilmi Kitab Khana. Lahore.
Chaudhary, S.M. & Kamal, S. (2017). Introduction to Statistical Theory Part-II. 8th
Edition. Ilmi Kitab Khana. Lahore.
Daniel, W.W. (1995). Biostatistics: A foundation for Analysis in Health sciences.
Sixth Edition. John Wiley and sons Incorporated. USA.
Harper, W.M. (1991). Statistics. Sixth Edition. Pitman Publishing, Longman
Group, United Kingdom.
Hoel, P.G. (1976). Elementary Statistics. 4th Edition. John Wiley and Sons
Incorporated, NewYork.
Kiani, G. H., & Akhtar, M. S. (2012). Basic statistics, Majeed Book Depot.
Khan, A. A., Mirza, S. H., Ahmad, M. I., Baig, I. & Yaqoob, M. (2011). Business
Statistics, Qureshi Brothers Publishers.
UNIT 08
CONTENTS
Introduction
Objectives
8.1 Correlation
8.2 Observation Cloud
8.3 Scatter Diagram
8.4 Regression
8.5 Measuring Contribution of X in Predicting Y
8.6 SELF ASSESSMENT QUESTIONS
Suggested Readings
Introduction
The term regression was first used in 1877 by Francis Galton. He made a study that
showed that the height of children born to tall parents tends to move back or regress
towards the mean height of the population. He coined the word regression as the
name of the general process of predicting one variable (the height of the children)
from another (the height of the parents). Later, the term multiple regression came
into use for the case in which several variables are used to predict another.
Objectives
After studying this unit, you will be able to;
8.1 Correlation
How can we explore the relationship between two quantitative variables?
Graphically, we can construct a scatterplot. Numerically, we can calculate a
correlation coefficient and a regression equation.
The Pearson correlation coefficient, r, measures the degree of association, that is,
the strength and the direction of a straight-line relationship.
• The strength of the relationship is determined by the closeness of the points
to a straight line.
• The direction is determined by whether one variable generally increases or
generally decreases when the other variable increases.
• r is always between –1 and +1
• the sign of r indicates the direction
• the magnitude of r indicates the strength
• r = –1 or +1 indicates a perfect linear relationship
• r = 0 indicates no linear relationship
Activity: Among all elementary school children, the relationship between the
number of cavities in a child’s teeth and the size of his or her vocabulary is strong
and positive.
8.2 Observation Cloud
Let us consider data on two interdependent variables, X and Y.
The following data were collected to study the relationship between the sale price,
y and the total appraised value, x, of a residential property located in an upscale
neighborhood.
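Property    x     y     x²    y²    xy
1           2     2     4     4     4
2           3     5     9     25    15
3           4     7     16    49    28
4           5     10    25    100   50
5           6     11    36    121   66
Sum (Σ)     20    35    90    299   163

The Pearson correlation coefficient is computed as

$$ r = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{n\sum x^2 - (\sum x)^2}\,\sqrt{n\sum y^2 - (\sum y)^2}} $$

$$ r = \frac{5(163) - (20)(35)}{\sqrt{5(90) - (20)^2}\,\sqrt{5(299) - (35)^2}} = \frac{115}{7.071 \times 16.432} = \frac{115}{116.19} \approx 0.99 $$

so X and Y are strongly positively correlated.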
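The calculation above can be checked with a short script. This is a minimal sketch in plain Python; the function name `pearson_r_from_sums` and the variable names are illustrative choices, not part of the course text.

```python
from math import sqrt

def pearson_r_from_sums(x, y):
    """Pearson's r from the column sums used in the table above."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt(n * sum_x2 - sum_x ** 2) * sqrt(n * sum_y2 - sum_y ** 2)
    return numerator / denominator

# Data from the property table: appraised value (x) and sale price (y)
appraised_value = [2, 3, 4, 5, 6]
sale_price = [2, 5, 7, 10, 11]

r = pearson_r_from_sums(appraised_value, sale_price)
print(round(r, 4))  # 0.9898
```

The numerator is 115 and the denominator is the square root of 50 × 270, so r is roughly 0.99.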
8.3 Scatter Diagram
Let us consider the scatter diagram of X and Y.
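[Figure: scatter diagram of y against x; both axes run from 0 to 7.]

x     y     x_i − x̄    y_i − ȳ    (x_i − x̄)(y_i − ȳ)
0     3     −3          0          0
2     2     −1          −1         1
3     4     0           1          0
4     0     1           −3         −3
6     6     3           3          9
Σ     15    15    0     0     7
x̄ = 3, ȳ = 3

$$ \operatorname{cov}(x, y) = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n} = \frac{7}{5} = 1.4 $$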
But what does this number tell us?
Not much on its own. The covariance depends on the units of x and y and is unbounded,

$$ -\infty < \operatorname{cov}(x, y) < \infty $$

so we can only compare covariances between different pairs of variables to see
which is greater.
Or, we could standardize this measure, thus obtaining a more intuitive measure of
correlation magnitude.
Correlation: Pearson’s r
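$$ \operatorname{cov}(x, y) = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n} \;\rightarrow\; r_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n\, s_x s_y} $$

$$ r_{xy} = \frac{\operatorname{cov}(x, y)}{s_x s_y} $$

$$ r_{xy} = \frac{\sum_{i=1}^{n} Z_{x_i} Z_{y_i}}{n} $$

Important: each x_i goes with a specific y_i. Why?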
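Example: By changing just two points of the Y variable, the correlation result is
different.

[Figure: two scatter diagrams side by side, the original data (left) and the data
with two y values changed (right); both axes run from 0 to 7.]

Left panel:
x     y     Z_x     Z_y     Z_x · Z_y
0     3     −1.5    0       0
...
3     4     0       0.5     0
...

$$ r_{xy} = \frac{\sum_{i=1}^{n} Z_{x_i} Z_{y_i}}{n} = \frac{1.75}{5} = 0.35 $$

Right panel:
x     y     Z_x     Z_y     Z_x · Z_y
0     3     −1.5    0       0
...
3     4     0       0.5     0
...

$$ r_{xy} = \frac{\sum_{i=1}^{n} Z_{x_i} Z_{y_i}}{n} = \frac{2.75}{5} = 0.55 $$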
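The covariance and the z-score form of Pearson's r can be reproduced for the five points of Section 8.3. This is a minimal sketch: it assumes the standard deviations use the divisor n (as the covariance formula does), and the helper names are illustrative.

```python
from math import sqrt

def mean(values):
    return sum(values) / len(values)

def std(values):
    """Standard deviation with divisor n (assumed to match the covariance formula)."""
    m = mean(values)
    return sqrt(sum((v - m) ** 2 for v in values) / len(values))

def covariance(x, y):
    """Average product of the deviations from the means."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)

def pearson_r(x, y):
    """Average product of the z-scores of x and y."""
    mx, my, sx, sy = mean(x), mean(y), std(x), std(y)
    return sum(((a - mx) / sx) * ((b - my) / sy) for a, b in zip(x, y)) / len(x)

x = [0, 2, 3, 4, 6]
y = [3, 2, 4, 0, 6]

print(covariance(x, y))           # 1.4
print(round(pearson_r(x, y), 2))  # 0.35
```

Dividing the covariance 1.4 by s_x · s_y = 2 × 2 gives the same 0.35, which is the standardization step described above.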
A limitation of r: it is very sensitive to extreme values.
Note: r is actually r̂ .
So when r = +1 or r = −1 we have a perfect linear relationship: y = ax + b.
r = +1 (perfect positive correlation), r = −1 (perfect negative correlation),
r = 0 (no linear correlation)
8.4 Regression
First recorded in 1510–20, regression is from the Latin regressiōn- (stem of
regressiō).
Consider this example scenario from an umbrella-selling company: Suppose you’re a sales
manager trying to predict next month’s numbers. You know that dozens, perhaps
even hundreds of factors from the weather to a competitor’s promotion to the rumor
of a new and improved model can impact the number. Perhaps people in your
organization even have a theory about what will have the biggest effect on sales.
“Trust me. The more rain we have, the more we sell.” “Six weeks after the
competitor’s promotion, sales jump.”
We have seen how to explore the relationship between two quantitative variables
graphically with a scatterplot. When the relationship has a straight-line pattern, the
Pearson correlation coefficient describes it numerically. We can analyze the data
further by finding an equation for the straight line that best describes the pattern.
This equation predicts the value of the response(y) variable from the value of the
explanatory variable.
Often two variables x and y appear to be related to one another, but
not in a deterministic fashion. Suppose we examine the relationship between x = high
school GPA and y = college GPA. The value of y cannot be determined just from
knowledge of x, and two different students could have the same x value but have
very different y values. Yet there is a tendency for those students who have high
(low) high school GPAs also to have high(low) college GPAs. Knowledge of a
student’s high school GPA should be quite helpful in enabling us to predict how that
person will do in college.
Regression analysis is the part of statistics that deals with investigation of the
relationship between two or more variables related in a nondeterministic fashion.
The statistical use of the word regression dates back to Francis Galton, who studied
heredity in the late 1800’s. One of Galton’s interests was whether or not a man’s
height as an adult could be predicted by his parents’ heights. He discovered that it
could, but the relationship was such that very tall parents tended to have children
who were shorter than they were, and very short parents tended to have children
taller than themselves. He initially described this phenomenon by saying that there
was a “reversion to mediocrity” but later changed to the terminology “regression to
mediocrity”.
The least-squares line is the line that makes the sum of the squares of the vertical
distances of the data points from the line as small as possible.
Simple linear regression model equation for the least squares (regression) line:

$$ Y = \beta_0 + \beta_1 X + \varepsilon $$

When talking about regression equations, the following terms are used for X and Y:
X: predictor variable, explanatory variable, or independent variable
Y: response variable or dependent variable

The estimated line is

$$ \hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x $$

$\hat{\beta}_1$ denotes the estimated slope. The slope in the equation equals the amount
that y changes when x increases by one unit.

$$ \hat{\beta}_1 = \frac{n\sum xy - (\sum x)(\sum y)}{n\sum x^2 - (\sum x)^2} $$

$\hat{\beta}_0$ denotes the estimated y-intercept. The y-intercept is the predicted value of y
when x = 0. The y-intercept may not have any interpretive value. If the answer to
either of the two questions below is no, we do not interpret the y-intercept.
1. Is 0 a reasonable value for the explanatory variable?
2. Do any observations near x = 0 exist in the data set?

$$ \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} $$
[Figure: Scatterplot with Least Squares Line. Vertical axis: sale price; horizontal
axis: appraised value. Fitted line: Y = -2.2 + 2.3X; R-squared = 0.980.]
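Equation of the least squares line: ŷ = -2.2 + 2.3x
Sum of squared residuals: Σ (y - ŷ)² = 1.1
The slope, 2.3, is the amount by which the predicted sale price changes when the
appraised value x increases by one unit.
The intercept, -2.2, is the predicted sale price when the appraised value is zero.
Because no observations near x = 0 exist in the data set, and a negative sale price
is not meaningful, the intercept should not be interpreted here.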
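The fitted line can be reproduced from the slope and intercept formulas given earlier. This is a minimal sketch in plain Python; the function name `least_squares_line` is an illustrative choice.

```python
def least_squares_line(x, y):
    """Return (intercept, slope) of the least squares line of y on x."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    slope = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    intercept = (sum_y - slope * sum_x) / n  # y-bar minus slope times x-bar
    return intercept, slope

# Appraised value (x) and sale price (y) from the property table
x = [2, 3, 4, 5, 6]
y = [2, 5, 7, 10, 11]

b0, b1 = least_squares_line(x, y)
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
print(round(b0, 2), round(b1, 2), round(sse, 2))  # -2.2 2.3 1.1
```

The residuals are -0.4, 0.3, 0, 0.7 and -0.6, whose squares add up to 1.1.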
Regression

$$ \hat{y} = a + bx $$

ŷ: predicted value
y: true value
ε = y − ŷ: residual error

The least squares criterion is

$$ \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 \rightarrow \min $$

So what we’re looking for is the parameters (a, b) of the regression line.
Example
From the data we calculate the following:
Σxy = 150605, s_x = 19.3679, ΣY/n = 66.93 and ΣX/n = 144.6. Run a regression of Y
(height of anatomical dead space) on X (range of measurements).
Solution:
Applying these figures to the formulae for the regression coefficients, we have:
The line representing the equation is shown superimposed on the scatter diagram
of the data in figure. The way to draw the line is to take three values of x, one on
the left side of the scatter diagram, one in the middle and one on the right, and
substitute these in the equation, as follows:
Although two points are enough to define the line, three are better as a check.
Having put them on a scatter diagram, we simply draw the line through them.

$$ \hat{y} = a + bx $$

This is true for a sample. As in all statistical methods, we want to make inferences
about the population. So,

$$ y_i = a + b x_i + \varepsilon_i $$

Then the estimated equation is

$$ \hat{y}_i = \hat{a} + \hat{b} x_i $$
Obviously, the stronger the correlation between x and y, the better the prediction;
this is expressed in both parameters:
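$$ \hat{b} = \frac{\hat{r} s_y}{s_x}, \qquad \hat{a} = \bar{y} - \frac{\hat{r} s_y}{s_x}\,\bar{x} $$

Substituting these values of a and b:

$$ \hat{y}_i = \hat{a} + \hat{b} x_i = \bar{y} - \frac{\hat{r} s_y}{s_x}\,\bar{x} + \frac{\hat{r} s_y}{s_x}\,x_i $$

After rearranging, we can write this:

$$ \hat{y}_i = \hat{a} + \hat{b} x_i = \frac{\hat{r} s_y}{s_x}\,x_i - \frac{\hat{r} s_y}{s_x}\,\bar{x} + \bar{y} $$

$$ \hat{y}_i = \frac{\hat{r} s_y}{s_x}\,(x_i - \bar{x}) + \bar{y} $$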
It’s easy to see why, if there’s no correlation, we will simply predict the average of
y for any x. The larger the correlation, the greater the regression line’s slope.
In any case, the average of the predicted values will always equal the average of
y. The variance of the predicted values is

$$ s_{\hat{y}}^2 = \frac{\sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2}{n} = \dots = \hat{r}^2 s_y^2 $$
So this variance is never larger than the true variance (the true variance is
multiplied by r̂², a fraction between 0 and 1).
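The steps behind this variance result can be filled in as follows. This is a sketch that assumes variances are computed with the divisor n and that the predicted values take the form $\hat{y}_i = \frac{\hat{r} s_y}{s_x}(x_i - \bar{x}) + \bar{y}$ given earlier in this section.

```latex
\begin{aligned}
s_{\hat{y}}^2
  &= \frac{1}{n}\sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2
   = \frac{1}{n}\sum_{i=1}^{n} \left[\frac{\hat{r}\, s_y}{s_x}\,(x_i - \bar{x})\right]^2 \\
  &= \frac{\hat{r}^2 s_y^2}{s_x^2}\cdot\frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2
   = \frac{\hat{r}^2 s_y^2}{s_x^2}\, s_x^2
   = \hat{r}^2 s_y^2
\end{aligned}
```

Since $0 \le \hat{r}^2 \le 1$, the variance of the predictions equals the variance of y only when the correlation is perfect.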
Furthermore:

$$ s_{\hat{y}}^2 = \hat{r}^2 s_y^2 \;\Rightarrow\; \hat{r}^2 = \frac{s_{\hat{y}}^2}{s_y^2} $$
r-squared is the explained variance!
It tells us what fraction of the general variance can be attributed to the model.
Therefore:
True variance = predicted variance + error variance
$$ s_y^2 = s_{\hat{y}}^2 + s_{(y_i - \hat{y}_i)}^2 $$

or:

$$ s_y^2 = \hat{r}^2 s_y^2 + (1 - \hat{r}^2)\, s_y^2 $$
This is where we see why it is similar to ANOVA:

$$ \sum_{j}\sum_{i} (y_{ij} - \bar{y})^2 = \sum_{j} n_j (\bar{y}_j - \bar{y})^2 + \sum_{j}\sum_{i} (y_{ij} - \bar{y}_j)^2 $$

$$ F_{(df_{model},\, df_{error})} = \frac{MS_{Reg}}{MS_{Err}} = \dots = \frac{\hat{r}^2 (N - 2)}{1 - \hat{r}^2} $$

Alternatively (as F is the square of t):

$$ t_{(n-2)} = \frac{\hat{r}\sqrt{n - 2}}{\sqrt{1 - \hat{r}^2}} $$
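The relationship between the two test statistics can be checked numerically. This is a minimal sketch for simple regression with one explanatory variable, using r = 0.35 and n = 5 from the five-point example in Section 8.3; the function names are illustrative.

```python
from math import sqrt

def t_statistic(r, n):
    """t statistic with n - 2 degrees of freedom for a sample correlation r."""
    return r * sqrt(n - 2) / sqrt(1 - r ** 2)

def f_statistic(r, n):
    """F statistic with (1, n - 2) degrees of freedom for the same regression."""
    return r ** 2 * (n - 2) / (1 - r ** 2)

r, n = 0.35, 5  # correlation and sample size from the Section 8.3 example
t = t_statistic(r, n)
F = f_statistic(r, n)

# F and t squared agree up to floating-point rounding
print(round(t, 3), round(F, 3), round(t ** 2, 3))
```

The statistics are undefined when r is exactly +1 or -1, because the denominator 1 - r² is then zero.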
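Assumptions

$$ y_i = a + b x_i + \varepsilon_i $$

The regression model in GLM terms:

$$ y_i = b x_i + a \cdot 1 + \varepsilon_i $$

So:

$$ y_1 = b x_1 + a \cdot 1 + \varepsilon_1 $$
$$ y_2 = b x_2 + a \cdot 1 + \varepsilon_2 $$
$$ y_3 = b x_3 + a \cdot 1 + \varepsilon_3 $$

And in matrix notation:

$$ \begin{pmatrix} y_1 \\ y_2 \\ y_3 \end{pmatrix} = \begin{pmatrix} x_1 & 1 \\ x_2 & 1 \\ x_3 & 1 \end{pmatrix} \begin{pmatrix} b \\ a \end{pmatrix} + \begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \varepsilon_3 \end{pmatrix} $$

In matrix form in general:

$$ Y = X\beta + \varepsilon $$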
Extrapolation is the use of the least-squares line for prediction outside the range
of values of the explanatory variable x that you used to obtain the line.
Extrapolation should not be done!
The coefficient of determination can also be obtained by squaring the Pearson
correlation coefficient. This method works only for the simple linear regression
model ŷ = β̂0 + β̂1x; it does not work in general.
The coefficient of determination, r², represents the proportion of the total sample
variation in y (measured by the sum of squares of deviations of the sample y values
about their mean ȳ) that is explained by (or attributed to) the linear relationship
between x and y.
R² (coefficient of determination):

$$ R^2 = \frac{\sum (y - \bar{y})^2 - \sum (y - \hat{y})^2}{\sum (y - \bar{y})^2} = \frac{54 - 1.1}{54} = 0.98 $$
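The two sums of squares in this ratio can be recomputed from the property data. This is a minimal sketch in plain Python; the function name `sums_of_squares` is an illustrative choice.

```python
def sums_of_squares(x, y):
    """Return (total sum of squares, residual sum of squares) for the least squares line."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    sst = sum((b - mean_y) ** 2 for b in y)
    sse = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    return sst, sse

# Appraised value (x) and sale price (y) from the property table
sst, sse = sums_of_squares([2, 3, 4, 5, 6], [2, 5, 7, 10, 11])
r_squared = (sst - sse) / sst
print(round(sst, 1), round(sse, 1), round(r_squared, 2))  # 54.0 1.1 0.98
```

Squaring the correlation of about 0.99 found in Section 8.2 gives the same 0.98, as expected for a regression with a single explanatory variable.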
8.6 SELF ASSESSMENT QUESTIONS
Q.1: The grades of a class of 9 students on a midterm report (x) and on the final
examination (y) are as follows:
X 77 50 71 72 81 94 96 99 67
Y 82 66 78 34 47 85 99 99 68
(a) Find the equation of the regression line. (b) Graph the line on a scatter
diagram. (c) Estimate the amount of chemical that will dissolve in 100
grams of water at 50°C.
Q.3: The following data were collected to determine the relationship between
pressure and the corresponding scale reading for the purpose of calibration.
Pressure (Lb/Sq.In) 10 10 10 10 10 50 50 50 50 50
Reading 13 18 16 15 20 86 90 88 88 92
(a) Find the equation of the regression line.
(b) Find the Correlation coefficient between pressure and readings
(c) Draw a scatter Diagram of readings and pressure
Q.4: A study was made on the amount of converted sugar in a certain process at
various temperatures. The data were coded and recorded as follows:
Temperature (X)       1.0  1.1  1.2  1.3  1.4  1.5  1.6  1.7   1.8  1.9  2.0
Converted Sugar (Y)   8.1  7.8  8.5  9.8  9.5  8.9  8.6  10.2  9.3  9.2  10.5
(a) Estimate the linear regression line. (b) Estimate the mean amount of
converted sugar produced when the coded temperature is 1.75. (c) Plot the
residuals versus temperature. (d) Find the correlation coefficient. (e) Draw a
scatter diagram.
SUGGESTED READINGS
Bluman, A.G. (2004). Elementary Statistics. A Step by Step Approach. 5th Edition.
McGraw-Hill Companies Incorporated. London.
Chaudhary, S.M. & Kamal, S. (2017). Introduction to Statistical Theory Part-I. 8th
Edition. Ilmi Kitab Khana. Lahore.
Chaudhary, S.M. & Kamal, S. (2017). Introduction to Statistical Theory Part-II.
8th Edition. Ilmi Kitab Khana. Lahore.
Daniel, W.W. (1995). Biostatistics: A foundation for Analysis in Health sciences.
Sixth Edition. John Wiley and sons Incorporated. USA.
Harper, W.M. (1991). Statistics. Sixth Edition. Pitman Publishing, Longman
Group, United Kingdom.
Hoel, P.G. (1976). Elementary Statistics. 4th Edition. John Wiley and Sons
Incorporated, NewYork.
Kiani, G. H., & Akhtar, M. S. (2012). Basic statistics, Majeed Book Depot.
Khan, A. A., Mirza, S. H., Ahmad, M. I., Baig, I. & Yaqoob, M. (2011). Business
Statistics, Qureshi Brothers Publishers.
UNIT 09
CONTENTS
Introduction
Objectives
9.1 Dynamic Nature of Relationships
9.2 Least Squares Assumptions
9.3 Stationarity
9.4 Alternative Paths
9.5 Assumptions
9.6 SELF ASSESSMENT QUESTIONS
Suggested Readings
Introduction
When modeling relationships between variables, the nature of the data that have
been collected has an important bearing on the appropriate choice of an
econometric model. In particular, it is important to distinguish between cross-
section data (data on a number of economic units at a particular point in time) and
time-series data (data collected over time on one particular economic unit).
When we say “economic units” we could be
referring to individuals, households, firms, geographical regions, countries, or
some other entity on which data is collected. Because cross-section observations
on a number of economic units at a given time are often generated by way of a
random sample, they are typically uncorrelated. The level of income observed in
the Smiths’ household, for example, does not affect, nor is it affected by, the level
of income in the Jones’s household. On the other hand, time-series observations
on a given economic unit, observed over a number of time periods, are likely to be
correlated. The level of income observed in the Smiths’ household in one year is
likely to be related to the level of income in the Smiths’ household in the year
before. Thus, one feature that distinguishes time-series data from cross-section
data is the likely correlation between different observations. Our challenges for
this chapter include testing for and modeling such correlation. A second
distinguishing feature of time-series data is its natural ordering according to time.
With cross-section data there is no particular ordering of the observations that is
better or more natural than another. One could shuffle the observations and then
proceed with estimation without losing any information. If one shuffles time-series
observations, there is a danger of confounding what is their most important
distinguishing feature: the possible existence of dynamic relationships between
variables. A dynamic relationship is one in which the change in a variable now has
an impact on that same variable, or other variables, in one or more future time
periods. For example, it is common for a change in the level of an explanatory
variable to have behavioral implications for other variables beyond the time period
in which it occurred. The consequences of economic decisions that result in
changes in economic variables can last a long time. When the income tax rate is
increased, consumers have less disposable income, reducing their expenditures on
goods and services, which reduces profits of suppliers, which reduces the demand
for productive inputs, which reduces the profits of the input suppliers, and so on.
The effect of the tax increase ripples through the economy. These effects do not
occur instantaneously but are spread, or distributed, over future time periods. As
shown in Figure 9.1, economic actions or decisions taken at one point in time, t,
have effects on the economy at time t, but also at times t + 1, t + 2, and so on.
Objectives
After studying this unit, you will be able to;
• Explain why lags are important in models that use time-series data, and the
ways in which lags can be included in dynamic econometric models.
• Explain what is meant by a serially correlated time series, and how we
measure serial correlation.
• Specify, estimate, and interpret the estimates from a finite distributed lag
model.
• Explain the nature of regressions that involve lagged variables and the
number of observations that are available.
• Specify and explain how the multiple regression assumptions are modified
to accommodate time series data.
• Compute the autocorrelations for a time-series, graph the corresponding
correlogram, and use it to test for serial correlation.
9.1 Dynamic Nature of Relationships
Given that the effects of changes in variables are not always instantaneous, we
need to ask how to model the dynamic nature of relationships. We begin by
recognizing three different ways of doing so.
One way is to specify that a dependent variable y is a function of current and past
values of an explanatory variable x. That is,

$$ y_t = f(x_t, x_{t-1}, x_{t-2}, \dots) \qquad (9.1) $$

We can think of $(y_t, x_t)$ as denoting the values for y and x in the current period;
$x_{t-1}$ means the value of x in the previous period; $x_{t-2}$ is the value of x two
periods ago, and so on. For the moment f(.) is used to denote any general function.
Later we replace f(.) by a linear function. Equations such as (9.1) say, for example,
that the current rate of inflation $y_t$ depends not just on the current interest rate
$x_t$, but also on the rates in previous time periods $x_{t-1}, x_{t-2}, \dots$. Turning
this interpretation around as in Figure 9.1, it means that a change in the interest
rate now will have an impact on inflation now and in future periods; it takes time for
the effect of an interest rate change to fully work its way through the economy.
Because of the existence of these lagged effects, (9.1) is called a distributed lag
model.
A second way of capturing the dynamic characteristics of time-series data is to
specify a model with a lagged dependent variable as one of the explanatory variables.
For example,

$$ y_t = f(y_{t-1}, x_t) \qquad (9.2) $$

where again f(.) is a general function that we later replace with a linear function.
In this case we are saying that the inflation rate in one period $y_t$ will depend
(among other things) on what it was in the previous period, $y_{t-1}$. Assuming a
positive relationship, periods of high inflation will tend to follow periods of high
inflation and periods of low inflation will tend to follow periods of low inflation.
Or, in other words, inflation is positively correlated with its value lagged one
period. A model of this nature is one way of modeling correlation between current and
past values of a dependent variable. Also, we can combine the features of (9.1) and
(9.2) so that we have a dynamic model with lagged values of both the dependent
and explanatory variables, such as

$$ y_t = f(y_{t-1}, x_t, x_{t-1}, x_{t-2}) \qquad (9.3) $$
Such models are called autoregressive distributed lag (ARDL) models, with
‘‘autoregressive’’ meaning a regression of yt on its own lag or lags.
A third way of modeling the continuing impact of change over several periods is
via the error term. For example, using general functions f(.) and g(.), both of which
are replaced later with linear functions, we can write

$$ y_t = f(x_t) + e_t, \qquad e_t = g(e_{t-1}) \qquad (9.4) $$

where the function $e_t = g(e_{t-1})$ is used to denote the dependence of the error on
its value in the previous period. In this case $e_t$ is correlated with $e_{t-1}$; we
say the errors are serially correlated or autocorrelated. Because (9.4) implies
$e_{t+1} = g(e_t)$, the
dynamic nature of this relationship is such that the impact of any unpredictable
shock that feeds into the error term will be felt not just in period t, but also in future
periods. The current error et affects not just the current value of the dependent
variable yt, but also its future values yt+1; yt+2; ... . As an example, suppose that a
terrorist act creates fear of an oil shortage, driving up the price of oil. The terrorist
act is an unpredictable shock that forms part of the error term et. It is likely to
affect the price of oil in the future as well as during the current period.
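The persistence of shocks in a serially correlated error can be illustrated with a small simulation. This is only a sketch: it assumes the linear form e_t = ρ e_{t-1} + v_t for the function g(.), with ρ = 0.8 and standard normal shocks v_t chosen purely for illustration, and the function names are hypothetical.

```python
import random

def simulate_ar1_errors(rho, n_periods, seed=0):
    """Simulate e_t = rho * e_{t-1} + v_t with independent standard normal shocks v_t."""
    rng = random.Random(seed)
    errors = [rng.gauss(0.0, 1.0)]
    for _ in range(n_periods - 1):
        errors.append(rho * errors[-1] + rng.gauss(0.0, 1.0))
    return errors

def sample_autocorrelation(series, lag):
    """Sample correlation between a series and its own value `lag` periods earlier."""
    n = len(series)
    m = sum(series) / n
    deviations = [value - m for value in series]
    numerator = sum(deviations[t] * deviations[t - lag] for t in range(lag, n))
    denominator = sum(d * d for d in deviations)
    return numerator / denominator

errors = simulate_ar1_errors(rho=0.8, n_periods=5000)
# With a long series the lag-1 autocorrelation should be close to the assumed rho
print(round(sample_autocorrelation(errors, 1), 2))
```

Computing the same quantity for lags 1, 2, 3, ... and plotting the values gives the correlogram mentioned in the objectives of this unit.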
The dynamic models in (9.2), (9.3) and (9.4) imply correlation between $y_t$ and
$y_{t-1}$, or $e_t$ and $e_{t-1}$, or both, so they clearly violate the assumption that
different observations on y and on e are uncorrelated. As mentioned below (9.4), when a
variable is correlated with its past values, we say that it is autocorrelated or
serially correlated. Two questions follow: how to test for serial correlation, and
what it implies for estimation.
9.3 Stationarity
An assumption that we maintain throughout our treatment of time series is that the
variables in our equations are stationary. This assumption will take on more meaning
when it is relaxed. For the moment we note that a stationary variable is one that is
not explosive, not trending, and not wandering aimlessly without returning to its
mean. These features can be illustrated with time-series plots, which are routinely
considered when examining time-series variables. A variable Y is considered
stationary when it tends to fluctuate around a constant mean without wandering or
trending. By contrast, a variable X that tends to wander, or is “slow turning,” and a
variable Z that is trending both possess characteristics of nonstationary variables.
These concepts will be defined more precisely later. For now the important thing to
remember is that we are concerned with modeling and estimating dynamic relationships
between stationary variables whose time series have characteristics similar to those
of Y. That is, they neither wander nor trend.
distributed lag models can be covered as a special case of ARDL models or
omitted.
Finite Distributed Lags
The first dynamic relationship that we consider is that given in (9.1),
$y_t = f(x_t, x_{t-1}, x_{t-2}, \dots)$, with the additional assumptions that the
relationship is linear, and, after q time periods, changes in x no longer have an
impact on y. Under these conditions we have the multiple regression model

$$ y_t = \alpha + \beta_0 x_t + \beta_1 x_{t-1} + \beta_2 x_{t-2} + \dots + \beta_q x_{t-q} + e_t \qquad (9.5) $$
The model in (9.5) can be treated in the same way as the multiple regression model.
Instead of having a number of explanatory variables, we have a number of different
lags of the same explanatory variable. However, for the purpose of estimation,
these different lags can be treated in the same way as different explanatory
variables. It is convenient to change subscript notation on the coefficients: $\beta_s$
is used to denote the coefficient of $x_{t-s}$ and $\alpha$ is introduced to denote the
intercept. Other explanatory variables can be added if relevant, in which case other
symbols are needed to denote their coefficients. Models such as (9.5) have two special
uses. The first is forecasting future values of y. To introduce notation for future
values, suppose our sample period is for t = 1, 2, ..., T. We use t for the index
(rather than i) and T for the sample size (rather than N) to emphasize the time series
nature of the data. Given that the last observation in our sample is at t = T, the
first post-sample observation that we want to forecast is at t = T + 1. The equation
for this observation is given by

$$ y_{T+1} = \alpha + \beta_0 x_{T+1} + \beta_1 x_T + \beta_2 x_{T-1} + \dots + \beta_q x_{T-q+1} + e_{T+1} $$

The forecasting problem is how to use the time series of x-values, $x_{T+1}, x_T,
x_{T-1}, \dots, x_{T-q+1}$, to forecast the value $y_{T+1}$, with special attention
needed to obtain a value for $x_{T+1}$.
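The bookkeeping behind a finite distributed lag model can be sketched in a few lines. This example is illustrative only: the x-values, the intercept and the lag weights are hypothetical numbers rather than estimates, and the function names are not from the course text. It shows that q lags leave T - q usable observations and how a one-step forecast is assembled once a value for the next x is supplied.

```python
def build_lagged_rows(x, q):
    """Rows [x_t, x_{t-1}, ..., x_{t-q}] for every period t that has q earlier values."""
    return [[x[t - s] for s in range(q + 1)] for t in range(q, len(x))]

def forecast_next(alpha, betas, x, x_next):
    """Forecast of y in period T+1, ignoring the error term.

    Uses alpha + beta_0 * x_{T+1} + beta_1 * x_T + ... + beta_q * x_{T-q+1}.
    """
    q = len(betas) - 1
    recent = [x_next] + [x[len(x) - s] for s in range(1, q + 1)]
    return alpha + sum(b * v for b, v in zip(betas, recent))

x = [1.0, 2.0, 4.0, 3.0, 5.0]   # hypothetical sample, T = 5
rows = build_lagged_rows(x, q=2)
print(len(x), len(rows))         # 5 3  (T observations, T - q usable rows)
print(rows[0])                   # [4.0, 2.0, 1.0]

# Hypothetical intercept and lag weights; x_next is an assumed value for x in T+1
forecast = forecast_next(alpha=1.0, betas=[0.5, 0.3, 0.1], x=x, x_next=6.0)
print(round(forecast, 2))        # 5.8
```

In practice the intercept and lag weights would be estimated by least squares from the lagged rows, and the value used for the next x would itself have to be forecast or set by the policy maker.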
The second special use of models like (9.5) is for policy analysis. Examples of
policy analysis where the distributed-lag effect is important are the effects of
changes in government expenditure or taxation on unemployment and inflation
(fiscal policy), the effects of changes in the interest rate on unemployment and
inflation (monetary policy), and the effect of advertising on sales of a firm’s
products. The timing of the effect of a change in the interest rate or a change in
taxation on unemployment, inflation, and the general health of the economy can
be critical. Suppose the government (or a firm or business) controls the values of
x, and would like to set x to achieve a given value, or a given sequence of values,
for y. The coefficient $\beta_s$ gives the change in $E(y_t)$ when $x_{t-s}$ changes by
one unit, but x is held constant in other periods. Alternatively, if we look forward
instead of backward, $\beta_s$ gives the change in $E(y_{t+s})$ when $x_t$ changes by
one unit, but x in other periods is held constant. In terms of derivatives,

$$ \frac{\partial E(y_t)}{\partial x_{t-s}} = \frac{\partial E(y_{t+s})}{\partial x_t} = \beta_s $$
To further appreciate this interpretation, suppose that x and y have been constant
for at least the last q periods and that $x_t$ is increased by one unit, then returned to
its original level. Then, using (9.5) but ignoring the error term, the immediate
effect will be an increase in $y_t$ by $\beta_0$ units. One period later, $y_{t+1}$ will increase by
$\beta_1$ units, then $y_{t+2}$ will increase by $\beta_2$ units and so on, up to period t + q, when $y_{t+q}$
will increase by $\beta_q$ units. In period t + q + 1 the value of y will return to its original
level. The effect of a one-unit change in $x_t$ is distributed over the current and next
q periods, from which we get the term "distributed lag model." It is called a
finite distributed lag model of order q because it is assumed that after a finite
number of periods q, changes in x no longer have an impact on y. The coefficient
$\beta_s$ is called a distributed-lag weight or an s-period delay multiplier. The
coefficient $\beta_0$ (s = 0) is called the impact multiplier. It is also relevant to ask what
happens if $x_t$ is increased by one unit and then maintained at its new level in
subsequent periods (t + 1), (t + 2), ... . In this case, the immediate impact will
again be $\beta_0$; the total effect in period t + 1 will be $\beta_0 + \beta_1$, in period t + 2 it will be
$\beta_0 + \beta_1 + \beta_2$, and so on. We add together the effects from the changes in all preceding
periods. These quantities are called interim multipliers. For example, the two-
period interim multiplier is $\beta_0 + \beta_1 + \beta_2$. The total multiplier is the final effect
on y of the sustained increase after q or more periods have elapsed; it is given by
$\sum_{s=0}^{q} \beta_s$.
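The multipliers described above can be checked numerically. The following is a minimal sketch, assuming a finite distributed lag model of order q = 3 with made-up values for the intercept (`ALPHA`) and the lag weights (`BETAS`); these numbers, and the function names, are illustrative assumptions and not estimates from the text. It traces the change in E(y) after a one-unit change in x that is either temporary or sustained.

```python
# Hypothetical values for illustration only (not estimates from the text):
# intercept ALPHA and lag weights BETAS = [beta_0, ..., beta_q] with q = 3.
ALPHA = 2.0
BETAS = [0.5, 0.3, 0.2, 0.1]


def interim_multipliers(betas):
    """Running sums beta_0, beta_0 + beta_1, ...; the last entry is the total multiplier."""
    sums, running = [], 0.0
    for b in betas:
        running += b
        sums.append(running)
    return sums


def fitted_path(alpha, betas, x):
    """E(y_t) = alpha + sum_s beta_s * x_{t-s}, for each t with q lags available."""
    q = len(betas) - 1
    return [alpha + sum(b * x[t - s] for s, b in enumerate(betas))
            for t in range(q, len(x))]


def response(alpha, betas, sustained, horizon):
    """Change in E(y) over periods t0, ..., t0 + horizon after a one-unit rise in x at t0.

    sustained=False: x returns to its original level after one period.
    sustained=True:  x stays at the new level in all later periods.
    """
    q = len(betas) - 1
    t0 = q                      # first period that has a full set of q lags
    n = t0 + horizon + 1
    baseline = [1.0] * n        # x held constant (the level 1.0 is arbitrary)
    shocked = [1.0 + (1.0 if (t >= t0 if sustained else t == t0) else 0.0)
               for t in range(n)]
    y_base = fitted_path(alpha, betas, baseline)
    y_new = fitted_path(alpha, betas, shocked)
    return [round(new - old, 10) for new, old in zip(y_new, y_base)]


if __name__ == "__main__":
    h = len(BETAS)  # q + 1 periods ahead, to see the effect die out or level off
    print("delay multipliers (temporary change):", response(ALPHA, BETAS, False, h))
    print("interim multipliers (sustained change):", response(ALPHA, BETAS, True, h))
    print("total multiplier:", round(sum(BETAS), 10))
```

Under these assumptions, the temporary change reproduces the lag weights one period at a time and then returns to zero in period t + q + 1, while the sustained change accumulates them until it settles at the total multiplier.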
9.5 Assumptions
When the simple regression model was first introduced in Chapter 8, it was written
in terms of the mean of y conditional on x. Specifically, $E(y|x) = \beta_1 + \beta_2 x$, which
led to the error term assumption $E(e|x) = 0$. Then, so that we could avoid the need
to condition on x, and hence ease the notational burden, we made the simplifying
assumption that the x's are not random. We maintained this assumption throughout
Chapter 8, recognizing that although it is unrealistic for most data sets, relaxing
it in a limited but realistic way would have had little impact on our results and on
our choice of estimators and test statistics. However, because the time-series
variables used in the examples in this chapter are random, it is useful to mention
alternative assumptions under which we can consider the properties of least
squares and other estimators. In distributed lag models both y and x are typically
random. The variables used in the example that follows are unemployment and
output growth. They are both random. They are observed at the same time; we do
not know their values prior to "sampling." We do not "set" output growth and
then observe the resulting level of unemployment. To accommodate this
randomness we assume that the x's are random and that $e_t$ is independent of all x's
in the sample: past, current, and future. This assumption, in conjunction with the
other multiple regression assumptions, is sufficient for the least squares estimator
to be unbiased and to be best linear unbiased conditional on the x’s in the sample.
With the added assumption of normally distributed error terms, our usual t and F
tests have finite sample justification. Accordingly, the multiple regression
assumptions given can be modified for the distributed lag model as follows:
$INF_t = \beta_1 + \beta_2 DU_t + e_t$
with both sets of standard errors (the incorrect least squares ones that ignore
autocorrelation, and the correct HAC ones that recognize the autocorrelation) are
as follows:
The HAC standard errors are larger than those from least squares, implying that
if we ignore the autocorrelation, we will overstate the reliability of the least squares
estimates. The t and p-values for testing $H_0: \beta_2 = 0$ are
An autoregressive distributed lag (ARDL) model is one that contains both lagged
$x_t$'s and lagged $y_t$'s. In its general form, with p lags of y and q lags of x, an
ARDL(p, q) model can be written as
The AR component of the name ARDL comes from the regression of y on lagged
values of itself; the DL component comes from the distributed lag effect of the
lagged x’s. Two examples that we) are
The ARDL model has several advantages. It captures dynamic effects from lagged
x’s and lagged y’s, and by including a sufficient number of lags of y and x, we can
eliminate serial correlation in the errors.
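For reference, the general ARDL(p, q) form described above, with p lags of y and q lags of x, can be sketched as follows. The symbols are assumed labels chosen for this sketch and may differ from the notation used elsewhere in the course: $\delta$ is the intercept, $\theta_1, \ldots, \theta_p$ are the coefficients on lagged y, $\delta_0, \ldots, \delta_q$ are the coefficients on current and lagged x, and $v_t$ is the error term.

```latex
y_t = \delta + \theta_1 y_{t-1} + \cdots + \theta_p y_{t-p}
      + \delta_0 x_t + \delta_1 x_{t-1} + \cdots + \delta_q x_{t-q} + v_t
```

With p = 1 and q = 1, for instance, this reduces to $y_t = \delta + \theta_1 y_{t-1} + \delta_0 x_t + \delta_1 x_{t-1} + v_t$.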
9.6 SELF ASSESSMENT QUESTIONS
Q.1 Consider the following distributed lag model relating the percentage
growth in private investment (INVGWTH) to the federal funds rate of
interest (FFRATE):
$\widehat{INVGWTH}_t = 4 - 0.4\,FFRATE_t - 0.8\,FFRATE_{t-1} - 0.6\,FFRATE_{t-2} - 0.2\,FFRATE_{t-3}$
(b) Suppose FFRATE is raised to 1.5% in period t = 5 and then returned to its
original level of 1% for t = 6, 7, 8, 9. Use the equation to forecast
INVGWTH for periods t = 5, 6, 7, 8, 9. Relate the changes in your forecasts
to the values of the coefficients. What are the delay multipliers?
Q.2 The data set contains 105 weekly observations on sales revenue (SALES) and
advertising expenditure (ADV) in millions of Rupees for a large Midwest
department store in 2008 and 2009. The following relationship was
estimated:
(c) Find 95% confidence intervals for the impact multiplier, the one-period
interim multiplier, and the total multiplier.
(b) Find 95% forecast intervals for $ADV_{108}$ for each of the three allocations.
If maximizing $ADV_{108}$ is your objective, which allocation would you
choose? Why?
Q.4 In question no.1, the following Phillips curve was estimated:
(a) Given that the unemployment rates in the first three post-sample quarters
are $U_{2019Q4} = 5.6$; $U_{2020Q1} = 5.4$; and $U_{2020Q2} = 5.0$, use the estimated
equation to forecast inflation for 2019Q4, 2020Q1 and 2020Q2.
(b) Find the standard errors of the forecast errors for your forecasts in (a).