QUANTITATIVE MANAGEMENT
Contents
estimate, Co-efficient of determination
Unit 3 FORECASTING
Introduction, General Steps of Forecasting Techniques, Types of Forecast Models, Time-
Series Analysis – Components of Time-Series Analysis, Moving Average, Exponential
Smoothing, Measures of Forecast Accuracy, Least Square Regression Analysis, Application
Areas of Forecasting
____________________________________________________________________
____
Block 3 LINEAR PROGRAMMING PROBLEM AND SPECIAL PROBLEMS
Unit 3 TRANSPORTATION
Introduction, Basic structure of transportation, Transportation problem- Initial Basic
feasible solution (North west corner rule, Least Cost Rule, Vogel’s approximation method),
Test for optimality (The Modified Distribution (MODI) method), Special cases of
transportation
Unit 4 ASSIGNMENT
Introduction, Basic structure of assignment, Approach of the Assignment model, Solution
Method (Hungarian method), Special cases of Assignment
_______________________________________________________________
Unit 2 WAITING LINE MODELS
Introduction, Basic Concepts in Game Theory, Two-person zero-sum game, Game with no
Saddle Point, Principle of Dominance, Solution of 2 × n and m × 2 games
Block no.1 Introduction to Quantitative
Management and Statistical methods
______________________________
___
Block Introduction
In this block, an introduction to quantitative methods will be given. The basic difference
between statistics and operations research will be discussed. The role, importance and
applications of quantitative methods in business will be explained. In the second unit, the
meaning and importance of measures of central tendency will be discussed. Various
measures of central tendency and their comparative analysis will be covered. In the third
unit, discrete probability distributions and their various types will be discussed. In the last
unit, continuous probability distributions and their various applications will be covered.
Block Objective
• Understand the meaning of quantitative methods
• Identify the various situations where discrete probability distributions can be applied.
• Understand the importance of continuous probability distributions in decision making
Block Structure
Unit 1 : INTRODUCTION TO QUANTITATIVE
METHODS
______________________________
___
Unit Structure
1.0 Learning Objectives
1.1 Introduction
1.9.2 Application of Operation Research in Business
1.12 Glossary
1.13 Assignment
1.14 Activities
1.0 Learning Objectives
After learning this unit, you will be able to understand:
1.1 Introduction
Decision making is an integral part of management of an organization. Every day business
managers are required to make decisions. The key managerial functions of planning,
organizing, directing and controlling require management to be engaged continuously in
the process of decision making pertaining to each of them. So we can say that management
can be regarded as equivalent to decision making.
Historically, decision making was considered purely an art, acquired over a period of time
based on experience. Various styles of decision making were observed in solving similar
managerial problems by different people in real business situations. Many times managers
resort to their “instincts” to make decisions (unstructured decision making). However, the
environment in which the management has to operate these days is complex and fast
changing. There is a great requirement for augmenting the art of decision making using
systematic and scientific methods. Most decisions cannot be taken on the basis of ‘rule of
thumb’ or common sense or snap judgment. For businesses, a single wrong decision may
have long term painful implications. The present day managers cannot work on trial and
error method. A systematic approach to decision making is also necessary, as the cost of
making errors may be too high and at times irreversible. Thus the managers in the business
world should understand the importance of scientific methodology of decision making. It
means defining the problem in a clear manner, collecting required data, analyzing the data
thoroughly, deriving and forming conclusions about the data and finally implementing the
solution.
Although the qualitative approach is inherent in the manager and usually improves with
experience, the skills of the quantitative approach need to be learned by studying its
assumptions and methods. A manager who is knowledgeable in Quantitative methods can
compare and evaluate the qualitative and quantitative sources of recommendations and
finally combine the two sources to choose the best possible decision.
Quantitative methods involve the use of numbers, symbols, mathematical expressions and
other elements of quantities. These are used to supplement the judgment and intuitions of
the decision makers. The essential idea of the quantitative approach to decision making is
that if the factors that influence the decisions can be identified and quantified, it becomes
easier to resolve the complexity of the problem at hand. These methods help businesses
make optimum use of limited resources. In other words, quantitative methods help in
choosing the best course of action from the alternative courses of action available, so as to
achieve the optimum value of the objective or goal.
1.3.1 Statistical Methods
Statistics is a science dealing with the collection, analysis, interpretation and presentation
of numerical data. As an example, let us suppose that a company is interested in knowing
the satisfaction level of its consumers. The first step will be to collect data on the satisfaction
level, the factors of satisfaction and other variables related to consumer behavior. The data
so obtained can be organized on the basis of various demographic and classification
variables like- age, income, gender, education level, region etc. This organized data may
now be presented by means of tabular data or various types of graphs to facilitate analysis.
The average satisfaction level can be derived and further compared on the basis of
measured variables like age. This information will help to determine if a particular age
group is more satisfied as compared to others. Similarly various kinds of analysis will give
insights to drawing conclusions about the population being studied. This will further help
in decision making related to improvement of satisfaction level of customers of the
targeted product.
Classification of Statistical data
The data used in statistical study is broadly classified into two types- (1) Primary data (2)
secondary data. When the data used in the study is collected specifically for the purpose
of the study, such data is referred to as primary data. Primary data is collected afresh for
the first time and thus is original in character. On the other hand, when the data was
collected for some other purpose and is derived from other sources, then such data is
referred to as secondary data. Secondary data is collected by some organization, is available
in published form and is used by someone else for their research.
The same data can be called primary or secondary, depending on who is using it. For
example, a researcher wants to study the economic conditions of laborers
in India. If the researcher collects the data directly using a questionnaire, it is called
‘primary data’. However if some other researcher uses this data for some other purpose
subsequently, then the same data becomes “secondary data”.
Whenever one is doing research, it must first be checked whether any secondary data is
available on the subject matter of interest which can be used, as this will save a lot of time
and money. However the data must be verified thoroughly for its reliability and accuracy.
Its relevance and the context under which it is collected should also be verified, since it
was originally collected for another purpose. The researcher would need to collect original
data according to his objectives, when either secondary data is not available or is not
reliable.
There are many international bodies that regularly collect and publish large amounts of
data, such as the International Monetary Fund (IMF), World Health Organization (WHO),
Asian Development Bank, International Labour Organization, United Nations Organization,
World Meteorological Organization and Food and Agriculture Organization (FAO); the
Government and its many agencies, such as the Reserve Bank of India, Census Commission
and Ministries (Ministry of Economic Affairs, Commerce Ministry); and private research
organizations, trade associations, etc. Examples of government publications in India are the
Report on Currency and Finance, India Trade Journal, Statistical Abstract of India, Indian
Customs and Central Excise Tariff, Reserve Bank of India Bulletin, Agricultural Statistics
of India, Economic Survey and Indian Foreign Statistics.
1. Problem Formulation: The first step in operations research is to develop a clear and
concise statement of the problem. It is essential to identify and understand the root problem
in order to get the right answer. A symptom should not be confused with the problem. For
example, higher production cost is a symptom, where the underlying problem may be
improper inventory levels, excessive wastage, poor quality control, etc. The symptoms are
only an indication of the problem, and hence the manager should go beyond the symptoms
to identify the real cause of the problem. There may also be multiple problems, and one
may be related to another. The organization often selects those problems whose solution
would either increase profit or decrease cost. So it is imperative for an analyst to have
extensive interaction with the management regarding the selection and interpretation of the
available data. This step often involves various activities like site visits, meetings, research,
conferences and observations, which provide the analyst with the information required to
formulate the right problem.
2. Model Building: Once a problem is identified, the next step is to develop a model. A
model is a representation of some abstract or real-life problem. The models are basically
mathematical models, which describe systems or processes in the form of equations,
formulas and relationships. The activities in this step involve defining the variables, studying
their relationships and formulating equations to represent the problem. The model is then
tested under different environmental constraints and revised so that it works.
3. Obtaining the input data: The next step is to obtain the data to be used in the model
as input. The data should be accurate, relevant and complete in all respects. The quality of
the input data will decide the quality of the output. A number of sources, including company
reports and documents and interviews with company employees, may be used for data
collection.
4. Solution of the Model: The next stage of analysis is finding the solution and
interpreting it in the context of the problem. A solution to a model means determination of
a specific set of decision variables that would give a desired level of output. The desired
level of output is the level which 'optimizes'. Optimization means either maximization of
goal attainment from a given set of resources, or minimization of the cost that will satisfy
the required level of goal attainment.
5. Model Validation: Validation of the model means checking whether the developed model
adequately predicts the behavior of the actual system it represents. It involves checking the
reliability and ascertaining whether the structural assumptions of the model are met. It is
normal practice to test the validity by comparing the model's performance on the available
past data with that of the actual system.
6. Implementation: The final step is the implementation of the results. It is the process
of incorporating the developed model as a solution in the organization. The techniques and
methods of operation research are based on mathematical concepts, and neglect the human
aspects, which are most important at the time of implementation. The impact of the decision
will be influenced by the level of motivation, resistance to change, desire to be informed
among employees. It is very important to handle these issues tactfully for successful
implementation of the solution. A model which gives average theoretical advantage but is
implementable is better than one which ranks high on theoretical advantage but cannot be
implemented.
Check your Progress1
1. Individual respondents, focus groups, and panels of respondents are categorised as
a) Primary Data Sources
b) Secondary Data Sources
c) Itemized Data Sources
d) Pointed Data Sources
Inferential Statistics: If a researcher gathers data from a sample and uses the statistics
generated to reach conclusions about the population from which the sample was taken, the
statistics are inferential statistics. The data gathered are used to infer something about a
larger group. Continuing with the same example, if the professor uses statistics on the
average grade achieved by one class to estimate the average grade achieved by all five
sections of the same English course, the process of estimating this average grade would be
called inferential statistics. Inferential statistics are sometimes also referred to as inductive
statistics. We need to understand the words 'statistic' and 'parameter' to understand
inferential statistics.
• A statistic is a descriptive measure computed from a sample of data. For example, the
sample mean (x̄) and sample standard deviation (s) of a sample are each known as a 'statistic'.
• A parameter is a descriptive measure computed from an entire population of data.
For example, the population mean (µ) and population standard deviation (σ) of a population
are each known as a 'parameter'.
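As a minimal illustration of this distinction (the numbers below are hypothetical), the following Python sketch computes the parameters of a small 'population' and the corresponding statistics of a sample drawn from it:

import statistics

population = [42, 38, 45, 50, 39, 41, 47, 36, 44, 40]   # hypothetical population data
sample = population[:5]                                  # a sample of 5 observations

# Parameters: computed from the entire population
mu = statistics.mean(population)
sigma = statistics.pstdev(population)    # population standard deviation
# Statistics: computed from the sample
x_bar = statistics.mean(sample)
s = statistics.stdev(sample)             # sample standard deviation (n - 1 divisor)
print(mu, sigma, x_bar, s)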
i. Iconic Models
Iconic models represent a system the way it is, but in different size. They are essentially
the scaled up/down versions of the particular thing they represent. They are obtained by
reducing or enlarging the size of the system. In other words they are images. A model of a
proposed building by an architect, model of solar system, model of molecular structure of
a chemical, a toy aeroplane are some examples of iconic model. Maps, photographs,
drawings may also be categorized as iconic models as they look like what they represent
except in size. The advantage of iconic models is that they are specific and represent the
thing visually. But the disadvantage is that they cannot be manipulated for experimental
purposes. They cannot be used to study the changes in the operation of a system.
The analogue models use one set of properties to represent another set of properties. After
the problem is solved, it is interpreted in terms of the original system. For example the
electrical network model may be used as an analogue model to study the flows in a
transportation system. The contour lines on a map are analogues to elevation as they
represent the rise and fall of height. In general the analogue models are less specific and
concrete as compared to the iconic models and can be easily manipulated.
In symbolic models letters, numbers and other types of mathematical symbols are used to
represent variables and the relationship between them. These are the most general and
abstract type of models. These models can be verbal or mathematical. The verbal models
represent a situation in spoken language or written words, whereas, mathematical models
use mathematical notations to represent the situation. The difference between the two can
be understood by taking an example of measuring area of rectangle. A verbal model would
express it as: The area of the rectangle (A) is equal to multiplication of length (L) of the
rectangle by its breadth(B), whereas the mathematical model is represented as: A= L x B.
Both the models yield same result, however a mathematical model is more precise.
Symbolic models are used in Operations Research as they are easier to manipulate and
yield better results as compared to iconic or analogue models.
Once the data is collected, they need to be summarized and presented to the decision maker
in a form that is easy to understand and comprehend. Tabulation helps in this process
through effective presentation. Classification of the data showing the different values of
the variable and their respective frequencies of occurrence is called frequency distribution
of the values. There are two kinds of frequency distributions- discrete frequency
distribution and continuous frequency distribution. Graphical representation is more
effective in communicating the information. Through graphs and charts, the decision maker
can often get an overall picture of the data and reach very useful conclusions merely by
studying the chart or graph.
The concept of central tendency plays an important role in the study and application of
statistics. There is an inherent tendency of the data to cluster or group around their central
value. This behavior of the data to concentrate around the central part of the data is
called the 'central tendency' of the data. Measures of central tendency enable us to find that
single value at which the data is considered to be concentrated. They also help to compare
two or more sets of data, for example the average sales figures of two months. There are
three common measures of central tendency: mean, median and mode. The mean is the
most widely used measure. The arithmetic mean is the average of a group
of numbers and is computed by summing all numbers and dividing by the number of
observations. The median is the middle value in a set of data that has been ordered from
lowest to highest (ascending) or highest to lowest (descending). It is the value that splits
ordered data into two equal parts. The mode is the most frequently occurring value of a set
of data.
Measures of Variability
Measures of variability describe the spread or dispersion of a set of data. They explain the
variation in the values and how different the values are from the mean. Usually, measures
of variability are used together with measures of central tendency to give a complete
description of the data. There are a number of measures of dispersion, such as the range,
interquartile range, mean absolute deviation, variance and standard deviation.
Probability Distribution
Correlation
Correlation is a measure of the degree of relatedness of variables. For example, how strong
is the correlation between the producer price index and the unemployment rate? In retail
sales, are sales related to population density, number of competitors, size of the store,
amount of advertising, or other variables? The correlation coefficient measures the degree
of association of one variable with another. The Pearson product-moment correlation (r)
is used when both variables being analyzed have at least an interval level of data. The term
r is a measure of the linear correlation of two variables. It is a number that ranges from -1
to 0 to +1, representing the strength of the relationship between the variables. An r value
of +1 denotes a perfect positive relationship between two sets of numbers. An r value of -
1 denotes a perfect negative correlation, which indicates an inverse relationship between
two variables. An r value of 0 means no linear relationship is present between the two
variables.
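A minimal Python sketch of how r can be computed from paired observations is shown below; the two small data series are hypothetical and only illustrate the formula.

import math

x = [2, 4, 5, 7, 9]          # hypothetical values of one variable
y = [3, 5, 6, 8, 11]         # hypothetical values of the other variable

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n
# Pearson product-moment correlation: covariance divided by the product of the spreads
cov = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
sx = math.sqrt(sum((xi - mean_x) ** 2 for xi in x))
sy = math.sqrt(sum((yi - mean_y) ** 2 for yi in y))
r = cov / (sx * sy)
print(round(r, 4))           # a value close to +1 indicates a strong positive linear relationship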
Regression
Regression analysis is the process of developing a model to predict the value of a numerical
variable based on the values of other variables (one or more). The most elementary
regression model is called simple regression or bivariate regression involving two
variables in which one variable is predicted by another variable. In simple regression, the
variable to be predicted is called the dependent variable and is designated as y. The
predictor is called the independent variable, or explanatory variable, and is designated
as x. In simple regression analysis, only a straight-line relationship between two variables
is examined. In multiple regression, more than one independent variable is used to
predict the dependent variable.
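The least-squares line of simple regression can be obtained with a few lines of Python; the sketch below uses hypothetical x and y data and computes the slope and intercept of y = a + b·x.

x = [2, 4, 5, 7, 9]                      # hypothetical independent variable
y = [3, 5, 6, 8, 11]                     # hypothetical dependent variable

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n
# Least-squares estimates: b = S_xy / S_xx, a = mean(y) - b * mean(x)
s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
s_xx = sum((xi - mean_x) ** 2 for xi in x)
b = s_xy / s_xx                          # slope
a = mean_y - b * mean_x                  # intercept
predicted = a + b * 6                    # prediction for a new value of x, say x = 6
print(round(a, 3), round(b, 3), round(predicted, 3))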
Forecasting
Forecasting is the art or science of predicting the future values of a variable. Forecasting
methods can be classified as qualitative and quantitative. The quantitative methods can be
used only when the variable under study can be quantified and the historical data is
available. A time series is a set of observations of a variable measured over a period of
time at regular intervals. The objective of time series method is to discover a pattern in the
historical data and then extrapolate this pattern into the future.
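One of the simplest time-series techniques covered later in this course is the moving average; the short Python sketch below, with made-up monthly sales figures, shows how a three-period moving average can serve as a naive forecast of the next period.

sales = [120, 132, 128, 140, 152, 147]          # hypothetical monthly sales

def moving_average_forecast(series, window=3):
    # Forecast the next value as the mean of the last `window` observations
    recent = series[-window:]
    return sum(recent) / len(recent)

print(moving_average_forecast(sales))           # forecast for the next month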
Decision Theory
Decision theory, also called decision analysis, is used to determine an optimal strategy
when a decision maker is faced with several decision alternatives and an uncertain pattern
of future events. All decision making situations have usually two or more alternative
courses of action available to the decision maker to choose from. There are various possible
outcomes, called states of nature, which are beyond the control of decision maker. A
decision may be defined as the selection of an act which is considered to be the best
according to a predefined standard, from the available options.
Index Number
Index number is a ratio of a measure taken during one time period to that same measure
taken during another time period, usually denoted as base period. The ratio is often
multiplied by 100 and expressed as a percentage. These are very useful to reflect the inter-
period differences. Using index numbers, a researcher can transform the data into values
that are more usable and make it easier to compare other years to one particular key year.
Index numbers are widely used around the world to relate information about stock prices,
inflation, sales, exports, imports, agriculture prices etc. Some examples of specific indexes
are employment cost index, price index for construction, producer price index, consumer
price index etc. For example, if the Consumer Price Index for the year 2020 is 150, it means
that prices have gone up by 50% relative to the base period. The Consumer Price Index
(CPI-U) compiled by the Bureau of Labor Statistics, for instance, is based upon a 1982 base
year of 100.
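The arithmetic behind a simple price index is just a ratio scaled to 100; the hypothetical figures in this short Python sketch illustrate it.

base_year_price = 80.0       # hypothetical average price in the base year
current_year_price = 120.0   # hypothetical average price in the current year

index = (current_year_price / base_year_price) * 100
print(index)                 # 150.0, i.e. prices are 50% higher than in the base year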
Linear Programming:
It is a mathematical modeling technique for selecting the best alternative from a set of
feasible alternatives, in situations where the objective function as well as the constraints can
be expressed as linear mathematical functions. The objective function may be maximization
of profit /sales or minimization of cost/time etc. There are many methods to solve a linear
programming problem.
Transportation:
The transportation problem arises in planning for the distribution of goods and services
from various supply locations to different demand locations. Normally the quantity of
goods available at supply location (origin) is limited and the quantity of goods required at
demand location (destination) is known. Mostly the objective is to minimize the total
transportation cost of shipping the goods from origins to destinations.
Assignment
An assignment problem arises in many decision making situations in an organization like
assigning jobs to machines, workers to machines, clerks to counters, sales personnel to sales
territories, etc. It is a special type of linear programming problem, with the constraint that
one job can be assigned to one and only one machine.
Game theory
Game theory is used to make decisions in conflicting situations in which there are two or
more players/adversaries/opponents. Each player selects a strategy independently
without knowing in advance the strategy of other player or players. The combination of the
competing strategies provides the value of the game to the players. Game theory
applications have been developed for situations in which the competing players are teams,
companies, political candidates, armies or contract bidders.
Project scheduling
Managers are responsible for planning, scheduling and controlling projects that consist of
numerous jobs or tasks performed by various departments or individuals. The Program
Evaluation and Review Technique (PERT) and the Critical Path Method (CPM) are extremely
helpful in these situations. The objective is to complete the project on time while adhering
to the precedence requirements (which mean that some activities must be completed before
other activities can be started).
Simulation
Simulation is one of the most widely used quantitative approaches of decision making. It
involves developing a model of some real phenomenon and then performing experiments
on the model evolved. It is a descriptive and not an optimizing technique. In simulation, a
given system is copied and the variables and constants associated with it are manipulated
in an artificial environment to study the behavior of the system.
Quantitative methods provide managers with a variety of tools from statistics and
operations research for handling problems in modern business in a scientific way.
1. Give accurate and specific description: The facts can be conveyed in a precise form
when stated quantitatively using statistics. For example, the statement that the infant mortality
rate was 30% in 2018, as compared to 35% in 2015, is more specific than stating that the
infant mortality rate in 2018 had decreased in comparison to 2015.
2. Convert data into information: Statistics helps in reducing the amount of data collected
and converting it into more meaningful information for making decisions. For example, the
census data of individual households on the number of members is a huge mass of data,
and it would be difficult to draw any conclusions from it without applying statistics.
3. Facilitate Comparison of data: It helps in the comparison of data, as the data is
collected in the form of numbers. The present data can be compared with the previously
collected data to study the pattern of increase or decrease in a phenomenon. For example
there can be a comparison of month-wise sales figure data of a company to identify the
trend.
4. Forecast future events: Statistical methods are very useful in predicting future events.
For example, to take a decision on production scheduling, an automobile manufacturer
would like to know the past sales figures. Based on these figures, future sales can be
predicted and accordingly the required number of automobiles can be manufactured.
Tools for scientific analysis: Operations research models provide a systematic, scientific
and logical way of understanding and solving problems. It is not possible to take decisions
based on intuition alone, owing to the increased complexity of business. These techniques
help the decision maker describe and solve the problem more precisely.
Choosing an optimal business strategy– Using operations research techniques like Game
theory, it is possible to determine the optimal strategy for an organization that is facing
competition from its rivals with conflicting interests.
Facilitate and improve the quality of decision making: A decision maker can use various
mathematical models to take better-informed decisions in the face of uncertainty. Operations
research techniques like decision theory improve the quality of decision making.
Multiple variables or resources can be formulated and manipulated as a model to take
optimum decisions.
1.9 Applications of Quantitative Methods in Business and Management
Managers in all functional areas use statistics and operations research methods to make
better informed decisions.
This course is about how quantitative methods may be used to help managers make better
decisions. This unit attempted to explain the meaning and use of various quantitative
analysis methods in the field of business and management. The two branches of
quantitative analysis, statistics and operations research, have been introduced in detail as
regards their meaning, purpose and applications. Statistics is a science dealing with the
collection, analysis, interpretation and presentation of numerical data. Data gathered on a
group to describe or reach conclusions about that same group is called descriptive statistics.
Data gathered from a sample, and the statistics generated to reach conclusions about the
population from which the sample was taken, are known as inferential statistics.
Operations Research is a method of employing mathematical representations or models
to analyze business problems to take management decisions. The discussion in this unit
was centered on the problem orientation of quantitative methods and an overview of how
mathematical models can be used in analysis. Mathematical models are abstractions of real
world situations and may not be able to capture all the aspects of the real situation. However
if a model can capture the major relevant aspects of the problem and can provide a
recommended solution, it can be valuable in decision making.
Various methods used in statistics and operations research have been discussed in brief.
The benefits and advantages of quantitative methods along with their applications in
various functional areas were also covered in this unit. The importance and complexity of
decision making process has resulted in wide applications of quantitative techniques.
1.12 Glossary
Statistics: Statistics is a science dealing with the collection, analysis, interpretation and
presentation of numerical data.
Descriptive statistics: Data gathered on a group to describe or reach conclusions about
that same group
Inferential Statistics: Data gathered from a sample and statistics generated to reach
conclusions about the population from which the sample was taken.
Primary Data: The data used in the study is collected specifically for the purpose of the
study
Secondary Data: The data was collected for some other purpose and is derived from the
other sources.
Statistic: It is a descriptive measure computed from a sample of data.
Parameter: It is a descriptive measure computed from an entire population of data.
Random Variable: A numerical description of the outcome of an experiment.
Discrete Random Variable: A random variable that can take only a limited (countable)
number of values, typically whole numbers.
Continuous Random Variable: A random variable that can take any value over a range
(including decimal values).
Operations Research: It is a method of employing mathematical representations or
models to analyze business problems to take management decisions.
Model: A representation of real object or situation
Iconic Model: A physical replica or representation of a real object
Analog Model: Analogical models are a method of representing a phenomenon of the
world, by another, more understandable or analyzable system.
Mathematical Model: Mathematical symbols and expressions used to represent a real
situation
1.13 Assignment
1. What is statistics? Explain the types of statistics with examples.
2. Discuss the stages of operations research in detail.
3. Describe the various types of operations research models.
4. List at least five techniques used in statistics and operations research.
5. Describe the advantages of quantitative methods.
6. Discuss the applications of statistics and operations research in the functional areas of
management.
1.14 Activities
Take an example of a major decision you have taken recently. List the steps you had
taken to reach the final decision.
Unit 2 : MEASURES OF CENTRAL TENDENCY
___________________________________________
Unit Structure
2.0 Learning Objectives
2.1 Introduction
2.2 Measures of Central Tendency
2.2.1 Importance of measures of central tendency
2.2.2 Properties of a good measure of central tendency
2.4 Median
2.5 Mode
2.10 Glossary
2.11 Assignment
2.12 Activities
2.13 Case Study
2.0 Learning Objectives
After learning this unit, you will be able to:
• Understand the meaning of Central tendency
• Understand the Importance of measures of central tendency
• Compute various measures of central tendency- arithmetic mean, weighted mean,
geometric mean, harmonic mean, median, mode, Quartiles, Deciles and Percentiles
• Explain the relationship between mean , median and mode
2.1 Introduction
In the introductory unit, an overview of the various types of statistical methods used in
management decision making was given. The purpose of descriptive statistics is to
describe and summarize the data. Descriptive statistics include various measures like-
Measures of Central Tendency, Measures of Variation, Measures of Shape and Measures
of Kurtosis. Measures of central tendency are among the most important and widely used
tools for describing and summarizing the data. In this unit we will explore the concept of
central tendency and various measures used to measure central tendency. The objective is
to identify a single value which can act as a representative of the given data. This value can
be used to make conclusions and decisions related to the entire data set. The computation of
various measures is different for ungrouped and grouped data and hence will be discussed
separately.
A measure of central tendency enables us to get an idea of the entire data from a single
value where the data is considered to be concentrated. For example, it is impossible to
remember the sales figures of various retail outlets in a region. But the average could be
used to make conclusions about the sales of the entire region. The average condenses a
great amount of data, into a single representative value, so that data can be summarized
easily. A measure of central tendency also enables us to compare two or more sets of data.
For example, the average sales figures of two brands in the same product category can be
compared.
2.2.2 Characteristics of a good measure of central tendency
A good measure of central tendency should possess, as far as possible, the following
characteristics:
• Easy to understand
• Easy to compute
• Based on all the observations
• Uniquely defined
• Possibility of further algebraic treatment
• Not unduly affected by extreme values
μ = 396 / 10 = 39.6
Therefore, the average age of employees in the organization is 39.6 years.
The calculation of the sample mean uses the same formula as for the population mean and
would have resulted in the same answer, if computed on the given data. However it is
inappropriate to compute the sample mean for a population and a population mean for a
sample. It should be noted that as the entire employee’s data was included, it is population
data. If a sample of five out of the ten employees had been taken, then we would have
calculated a sample mean. In statistics it is important to clearly differentiate between sample
and population data.
2.3.1 Arithmetic Mean for Grouped Data
We have already seen how to compute the arithmetic mean of ungrouped data. When the
data is classified in the form of a frequency distribution, we are working with grouped data.
Grouped Data can be either a Discrete Frequency Distribution or a Continuous Frequency
Distribution. With grouped data in the form of a Continuous Frequency Distribution, the
specific values are unknown, as the data is in the form of class intervals. The midpoint of
each class interval is used to represent all the values in the class interval. This midpoint is
weighted by the frequency of values in the class interval.
The arithmetic mean for grouped data is computed by summing the product of class
midpoint(Mi) and the class frequency(fi) of each class and dividing that sum by the total
number of frequencies(N). The formula for the mean of grouped data is as follows:
μ = Σfi Mi / N = Σfi Mi / Σfi
This method is illustrated with the help of the given data on the age group of people in an
area.
For calculating the arithmetic mean, we need a table of class midpoints and frequencies.
Only part of the worked table is legible in the source: for example, the class 18-24 has
frequency fi = 17, midpoint Mi = 21 and fi Mi = 357, and the totals are Σfi = 231 and
Σfi Mi = 10,371. Substituting these totals in the formula gives μ = 10,371 / 231 ≈ 44.9.
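A small Python sketch of the grouped-mean calculation is given below; the class midpoints and frequencies are hypothetical stand-ins, since the full table is not legible, but the computation follows the formula shown above.

# Hypothetical grouped data: (class midpoint M_i, frequency f_i)
grouped = [(21, 17), (28, 40), (35, 60), (42, 50), (49, 39), (56, 25)]

total_fm = sum(m * f for m, f in grouped)   # sum of f_i * M_i
total_f = sum(f for _, f in grouped)        # N = sum of f_i
mean = total_fm / total_f
print(round(mean, 2))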
To apply the formula, let us consider the following distribution of marks of 40 students in
an examination:

Class Interval: 10-20  20-30  30-40  40-50  50-60  60-70  70-80
Frequency:       5      3      4      7      2      6     13

The worked table in the source is only partly legible; the surviving rows show the
step-deviation method with an assumed mean of A = 45 (for example, class 20-30 with
midpoint 25, fi = 3, di = -20, fi di = -60, and class 30-40 with midpoint 35, fi = 4, di = -10,
fi di = -40), with totals Σfi = 40 and Σfi di = 280. The mean is therefore
μ = A + Σfi di / Σfi = 45 + 280/40 = 52.
In calculating the arithmetic mean, equal importance is given to all the observations. But
there are situations where the relative importance of the different values is not the same. In
such cases, the weighted arithmetic mean needs to be used. The procedure is similar to the calculation
of grouped data arithmetic mean, where frequency is used as the weight associated with
the class interval. For example for the data value x1,x2 , x3 ….. xn and associated weights
w1, w2 , w3 ….. wn, the weighted arithmetic mean can be computed using the formula:
μw = Σwi xi / Σwi = (w1 x1 + w2 x2 + ... + wn xn) / (w1 + w2 + ... + wn)
You are aware of the use of weighted averages when the various components of an
evaluation are not equally important. For example, suppose your final grade is composed of
30 percent of the mid-term score, 50 percent of the final exam score and 20 percent of the
assignment score. Then the final grade will be calculated by multiplying each score (xi) by
its weight (wi):
μw = Σwi xi / Σwi = (30 x1 + 50 x2 + 20 x3) / (30 + 50 + 20)
So if you score 85 marks in mid term, 75 in final exam and 90 in assignment, then the
weighted average will be:
μw = (30 × 85 + 50 × 75 + 20 × 90) / 100 = 8100 / 100 = 81
Some common applications of the weighted arithmetic mean are in calculating index
numbers like the consumer price index, BSE Sensex, etc., where different weights are
associated with the items or shares. If the frequencies are considered as weights (wi = fi),
then the same method as above gives the arithmetic mean of a discrete frequency
distribution.
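A minimal Python sketch of the weighted mean, using the grade example above (weights 30, 50 and 20 and scores 85, 75 and 90), is:

scores = [85, 75, 90]        # mid-term, final exam, assignment
weights = [30, 50, 20]       # relative importance of each component

weighted_mean = sum(w * x for w, x in zip(weights, scores)) / sum(weights)
print(weighted_mean)         # 81.0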
When we are dealing with quantities that change over a period of time and we need to find
the average rate of change, such as an average growth rate or depreciation rate, the simple
arithmetic mean is inappropriate, as it will give a wrong answer. The appropriate measure
of central tendency in such cases is the geometric mean.
The geometric mean is defined as nth root of the product of ‘n’ values of the data. If
x1,x2,x3…..xn are the values of the data then Geometric mean is equal to:
GM = (x1 × x2 × x3 × ... × xn)^(1/n)
When the number of observations is large, logarithmic transformations can be applied to
simplify the calculations. Taking logs on both sides, the formula becomes:
Log(GM) = ΣLog(xi) / n = (1/n)(Log x1 + Log x2 + Log x3 + ... + Log xn)
GM = Antilog{ ΣLog(xi) / n }
Geometric mean is useful to find the average percentage increase in sales, production,
population etc. It is the most representative average in the construction of index numbers.
When large weights are to be given to smaller values and small weights to larger values,
the most appropriate average to be used is geometric mean. Let’s take an example to
understand computing of geometric mean.
Inflation rate in percentage for the past six months is given as 5.5, 6.2, 7.2, 6, 6.5 and 5.9.
Find the average inflation rate over the past six months.
First, we find the index by dividing the percentage rate by 100 and then adding 1. Then we
take the GM of this index as average index. From this we can find out the average inflation
rate.
GM = (1.055 × 1.062 × 1.072 × 1.06 × 1.065 × 1.059)^(1/6) = (1.4359)^(1/6) = 1.062
Thus the average index is 1.062, which corresponds to an average inflation rate of about 6.2%.
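The same computation in Python, a short sketch of the geometric mean applied to the inflation example above:

rates = [5.5, 6.2, 7.2, 6.0, 6.5, 5.9]             # monthly inflation rates in percent

indices = [1 + r / 100 for r in rates]              # convert each rate to a growth index
product = 1.0
for idx in indices:
    product *= idx
gm = product ** (1 / len(indices))                  # geometric mean of the indices
average_inflation = (gm - 1) * 100
print(round(gm, 4), round(average_inflation, 2))    # about 1.0621 and 6.21%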
The harmonic mean is appropriate when the data values are ratios of two variables with
different measures, called rates. The harmonic mean is very useful for computing the average
speed of a journey or the average price at which a product is sold. In finance, the harmonic
mean is used to determine the average of financial multiples like the P/E ratio.
A journey from place X to Y is completed using four different cars. The average speeds of
the cars are 50 km/hr, 75 km/hr, 60 km/hr and 80 km/hr. Find the average speed of the
journey.
HM = 4 / (1/50 + 1/75 + 1/60 + 1/80) = 64 km/hr
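A short Python sketch of the harmonic mean for the speed example above:

speeds = [50, 75, 60, 80]                        # speeds of the four cars in km/hr

hm = len(speeds) / sum(1 / s for s in speeds)    # harmonic mean: n divided by the sum of reciprocals
print(hm)                                        # 64.0 km/hr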
Like the arithmetic mean and the geometric mean, the harmonic mean uses all the values in
computing the average. However, the harmonic mean cannot be used when one or more
observations have a zero value, or when the observations can take both positive and negative
values. The harmonic mean has very limited applications in business.
Check your Progress 1
1. What is the major assumption we make when computing a mean from grouped data
a) All values are discrete
b) Every value in a class is equal to the midpoint
c) No value occurs more than once
d) Each class contains same number of values
2. When calculating the average rate of debt expansion for a company, the correct mean
to use is the
a) Arithmetic mean
b) Weighted arithmetic mean
c) Geometric mean
d) Either (a) or (c)
3. The following frequency distribution has been constructed from air transport traffic data.
Calculate the arithmetic mean.

No. of passengers travelling: 20-30  30-40  40-50  50-60  60-70  70-80
No. of airports:               8      7      1      0      3      1
4. The management of a restaurant has employed 2 managers, 5 cooks and 10 waiters. The
monthly salaries of the managers, cooks and waiters are 30,000, 20,000 and 10,000 per
month respectively. Find the mean salary paid per month by the management.
2.4 Median
Median is a measure of central tendency different from all the averages we have discussed
so far. Median is the middle value in a set of data that has been arranged in ascending or
descending order. While computing various types of means all the values in the data set
are used, whereas the median is a single value from the data set that is the middle-most or
central item in the set of numbers. Half of the values lie above this point and the other half
lie below it.
To find the median of ungrouped data, first arrange the data in ascending or descending
order. If the data set contains an odd number of values, the middle item (median) is one of
the original observations. If there is an even number of values, the median is the average
of the two middle observations. The formula for median is:
Median = ((N + 1) / 2)th item in the data array
Suppose we want to find the median of a data set containing seven observations. Then as
per the above formula, the median is the (7 + 1)/2 = 4th value in the data set. Let's take an
example of data on the time taken to complete a task daily, for ten days. First the data has
to be arranged in ascending order:
Ordered data
29 31 35 39 39 40 43 44 44 52
Median = ((N + 1) / 2)th item = ((10 + 1) / 2)th = 5.5th item
As the median is at the 5.5th item, we take the average of the 5th and 6th values, which
are 39 and 40. Therefore the median is (39 + 40)/2 = 39.5. The median of 39.5 means that
for half of the days the time taken to do the task is less than or equal to 39.5 minutes, and
for half of the days the time taken is greater than or equal to 39.5 minutes.
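A Python sketch of the ungrouped median for the ten daily task times used above:

times = [29, 31, 35, 39, 39, 40, 43, 44, 44, 52]    # time taken (minutes) on each of ten days

data = sorted(times)
n = len(data)
if n % 2 == 1:
    median = data[n // 2]                            # odd number of values: the middle one
else:
    median = (data[n // 2 - 1] + data[n // 2]) / 2   # even: average of the two middle values
print(median)                                        # 39.5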
For the grouped data (Continuous Frequency Distribution), we first find the N/2 value.
Then from the cumulative frequency we find the class in which N/2th item falls. Such a
class is called median class. Then the median is calculated using the following formula:
Median = L + ((N/2 − cfp) / fmd) × w
where, L= lower limit of the median class
cfp= cumulative frequency of the class preceding the median class
fmd = frequency of the median class
w = width of the class .
To facilitate the process of locating the median class, let's find the cumulative frequency.
Only the last row of the worked table is legible in the source: the class 9-11 has frequency 7
and cumulative frequency 55.
Median class: the (N/2)th value = 60/2 = 30th value. Let's understand how to locate the
median class using the cumulative frequency column. It can be seen that the 1st to 4th values
lie in the class 1-3, the 5th to 16th in the second class, the 17th to 29th in the third class, the
30th to 48th in the fourth class, and similarly for the rest of the values. Thus the 30th value
lies in the class interval 7-9.
Median = L + ((N/2 − cfp) / fmd) × w
       = 7 + ((60/2 − 29) / 19) × 2 = 7 + 0.105 = 7.105
The median value of the unemployment rates is 7.105.
Like the grouped arithmetic mean, the grouped median is an approximate value. It is based
on the assumption that the actual values are spread uniformly across the median class
interval, which may not always be true.
2.5 Mode
The mode is a measure of central tendency that, like the median, is not arithmetically
calculated in the way the mean is. The mode is the value that is repeated most often in the
data set. In case there is a tie for the most frequent value, the data is said to be bimodal.
Data sets with more than two modes are called multimodal.
The mode is rarely used as a measure of central tendency for ungrouped data, as sometimes
a single unrepresentative value might occur most often just by chance. For example, in the
data series 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 7, 7, 8, 9, 9, 12, 12 and 12, the mode is 12 as it occurs
the maximum number of times. But as can be observed, it does not represent the central
part of the data, and most of the values are actually below 10.
When data is grouped in the form of frequency distribution, it is assumed that the mode is
located in the class with the most items. The class with the highest frequency will be called
the modal class. To determine the mode from the Modal class, the given formula will be
used:
Mode = L + (d1 / (d1 + d2)) × w
The modal class is 15-20, as the highest frequency is 10. Let's substitute the values in the
given formula:
d1 = f1 − f0 = 10 − 0 = 10,   d2 = f1 − f2 = 10 − 9 = 1
Mode = L + (d1 / (d1 + d2)) × w = 15 + (10 / (10 + 1)) × 5 = 19.55
The mode of the age of students enrolled for the programme is 19.55 years.
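A minimal Python sketch of the grouped-mode formula applied to the numbers in the example above (modal class 15-20 with frequency 10, preceding frequency 0, succeeding frequency 9):

L = 15                    # lower limit of the modal class
w = 5                     # class width
f1, f0, f2 = 10, 0, 9     # frequency of the modal class, preceding class and succeeding class

d1 = f1 - f0
d2 = f1 - f2
mode = L + d1 / (d1 + d2) * w
print(round(mode, 2))     # 19.55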
2.6 Quartiles
Quartiles are positional measures related to central tendency. They are useful and quite
frequently used measures. The most familiar positional measures are quartiles, deciles
and percentiles.
Quartiles: Quartiles are values that divide the data into four equal parts. To divide the data
into four parts we need three partition values, and these are called Quartile 1, Quartile 2 and
Quartile 3. The first quartile Q1 is such that 25% of the values are smaller and 75% of the
observations are larger than this value. The second quartile Q2 is the median, as 50% of the
values are smaller and 50% of the observations are larger than it. The third quartile Q3
divides the data in such a way that 75% of the values are smaller and 25% of the observations
are larger than Q3.
The quartile Qi is located at the (iN/4)th item of the data set. The class in which the quartile
lies is known as the quartile class. The formula for computing quartiles for grouped data,
similar to the median formula, is as follows:
Qi = L + ((iN/4 − cfp) / fq) × w,   for i = 1, 2, 3
where, L = lower limit of the quartile class
cfp= cumulative frequency of the class preceding the quartile class
fq = frequency of the quartile class
w = width of the class .
Deciles: Deciles are values, that divide the data into ten equal parts. Since we need nine
points to divide data set into ten parts, there are nine deciles denoted as D1, D2, D3, …..D9.
The decile Di is located at the (iN/10)th item of the data set, where i = 1, 2, 3, ..., 9. The
class in which the decile falls is known as the decile class. The formula for computing
deciles for grouped data is:
Di = L + ((iN/10 − cfp) / fd) × w,   for i = 1, 2, 3, ..., 9
where, the symbols have usual meaning and interpretation
Percentiles: Percentiles are values which divide the data into one hundred equal parts.
There are ninety-nine percentiles, denoted P1, P2, P3, ..., P99. The percentile Pi is located at
the (iN/100)th item of the data set. The formula is:
Pi = L + ((iN/100 − cfp) / fp) × w,   for i = 1, 2, 3, ..., 99
where, L = lower limit of the percentile class
cfp = cumulative frequency of the class preceding the percentile class
fp = frequency of the percentile class
w = width of the class.
To illustrate the computation of quartiles, deciles and percentiles, consider the following
data on the sales of companies (in lakhs):

Sales (in lakhs): 0-10  10-20  20-30  30-40  40-50  50-60

(The corresponding row of frequencies is not legible in the source; from the working below,
N = 100, the cumulative frequency up to the class 20-30 is 57, the class 30-40 has
frequency 20 and the class 40-50 has frequency 17.)

Calculate Q1, Q3, D6 and P80.
Solution:
Q3 = (3N/4)th item = 75th item, which falls in the class 30-40, as the cumulative
frequency of this class is 77. Substituting the relevant values in the formula:
Qi = L + ((iN/4 − cfp) / fq) × w
Q3 = 30 + ((3 × 100/4 − 57) / 20) × 10 = 39
This value of Q3 suggests that 75% of the companies have sales of Rs. 39 lakhs or less,
and only 25% of the companies have sales of more than that.
D6 = (iN/10)th item = (6 × 100/10)th = 60th item, which falls in the class 30-40, as the
cumulative frequency of this class is 77. Substituting the relevant values in the formula:
Di = L + ((iN/10 − cfp) / fd) × w
D6 = 30 + ((6 × 100/10 − 57) / 20) × 10 = 31.5
This value of D6 suggests that 60% of the companies have sales of Rs. 31.5 lakhs or less,
and only 40% of the companies have sales of more than that.
P80 = (iN/100)th item = (80 × 100/100)th = 80th item, which falls in the class 40-50, as the
cumulative frequency of the preceding class is 77. Substituting the relevant values in the
formula:
Pi = L + ((iN/100 − cfp) / fp) × w
P80 = 40 + ((80 × 100/100 − 77) / 17) × 10 = 41.76
17
This value of P80 suggest that 80% of the company’s sales are Rs. 41.5 lakhs or less than
that and only 20% of the company’s sales figures are more than that.
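A hedged Python sketch of a general grouped-quantile function covering quartiles, deciles and percentiles follows; the frequency table below is hypothetical (chosen only so that its cumulative frequencies match the figures 57, 77 and 100 used in the worked example), so treat it as an illustration of the formula rather than the original data.

# Hypothetical sales distribution: (lower limit, upper limit, frequency); cumulative: 15, 35, 57, 77, 94, 100
classes = [(0, 10, 15), (10, 20, 20), (20, 30, 22), (30, 40, 20), (40, 50, 17), (50, 60, 6)]

def grouped_quantile(classes, fraction):
    # Value below which `fraction` of the data lies, e.g. 0.75 for Q3, 0.6 for D6, 0.8 for P80
    n = sum(f for _, _, f in classes)
    position = fraction * n                        # (iN/4), (iN/10) or (iN/100) in the text's notation
    cumulative = 0
    for lower, upper, f in classes:
        if cumulative + f >= position:             # quantile class found
            return lower + (position - cumulative) / f * (upper - lower)
        cumulative += f
    return classes[-1][1]

print(grouped_quantile(classes, 0.75))             # Q3 = 39.0
print(grouped_quantile(classes, 0.60))             # D6 = 31.5
print(grouped_quantile(classes, 0.80))             # P80 ≈ 41.76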
Let's try to summarize the differences between the three major measures of central tendency.
A distribution of data in which the right half is a mirror image of the left half is said to be
symmetrical. One example of a symmetrical distribution is the normal distribution, or
bell-shaped curve. In a symmetrical distribution, the mean, median and mode all coincide at
the same point. Lack of symmetry in a distribution is called skewness. If the distribution is skewed, the
mean, median and mode are not equal. In a moderately skewed distribution, the distance
between mean and median is approximately one third of the distance between the mean
and the mode. This can be expressed as
Mean − Median = (1/3) (Mean − Mode)
Mode = 3 Median − 2 Mean
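As a quick illustration of this empirical relationship (with made-up numbers), if a moderately skewed distribution has a mean of 40 and a median of 38, the mode can be approximated as 3 × 38 − 2 × 40 = 34.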
Thus if we know the values of any two measures of central tendency, the third measure can
be approximately determined in any moderately skewed distribution. The curves (a) and (c)
are examples of moderately skewed distributions. A skewed distribution can be of two
types- (1) Negatively skewed distribution (2) Positively skewed distribution.
A negatively skewed distribution is skewed to the left with a long left tail and a positively
skewed distribution is skewed to the right with a long right tail. From the above curves the
relationship between mean, median and mode can be stated as follows: in a negatively
skewed distribution, mean < median < mode, while in a positively skewed distribution,
mode < median < mean. It can also be observed that in any skewed distribution, the median
lies between the mean and the mode. When the population is skewed negatively or
positively, the median is often the best measure, as it always lies between the mean and the
mode. The median is not as highly influenced by the frequency of occurrence of a single
value as the mode is, nor is it pulled by extreme values as the mean is.
2.8 Let Us Sum Up
Measures of central tendency are a branch of descriptive statistics which helps to describe
the characteristics of the data. The most common measures of central tendency are the mean,
median and mode. In addition, quartiles, deciles and percentiles are positional measures of
central tendency. Any one of these measures may be used, based on the data and its
application. These measures are computed differently for ungrouped and grouped data. The
arithmetic mean is computed using all the values and so can be influenced by extreme
values. The median is unaffected by the magnitude of extreme values; this characteristic
makes the median a very useful measure of location, especially for skewed distributions.
The mode should be used when the most frequently occurring value needs to be found.
Σfi = 20, Σfi Mi = 760
μ = Σfi Mi / N = Σfi Mi / Σfi = 760 / 20 = 38
4. Weighted arithmetic mean = (2 × 30,000 + 5 × 20,000 + 10 × 10,000) / (2 + 5 + 10)
= 260,000 / 17 ≈ Rs. 15,294 per month
1. (b)
2.
N/2 = 25/2 = 12.5, which falls in the class 2-3, as the cumulative frequency of this class is
13. Substituting the relevant values in the formula:
Median = L + ((N/2 − cfp) / fmd) × w = 2 + ((12.5 − 5) / 8) × 1 = 2.9375
3. The modal class is 2-3, as the highest frequency is 10. Substituting the values in the given
formula:
d1 = f1 − f0 = 10 − 6 = 4,   d2 = f1 − f2 = 10 − 9 = 1
Mode = L + (d1 / (d1 + d2)) × w = 2 + (4 / (4 + 1)) × 1 = 2.8
Answers to check your Progress 4
1. Percentile
2. True
3. False
4. P30 = 20 means that 30% of the values are less than or equal to 20 and 70% are more
than 20.
1. (d)
2. (c)
2.10 Glossary
Arithmetic Mean: A measure of central tendency, computed by summing all the values
and dividing by the number of observations
Geometric Mean: A measure of central tendency used to measure the average rate of
change or growth for some quantity, computed by taking the nth root of the product of the
values representing change
Harmonic Mean: A measure of central tendency defined as the reciprocal of the arithmetic
mean of the reciprocals of the individual observations.
Median : The middle point of the data set that divides the data into two halves
Mode: The value most often repeated in the data set
Quartile: Fractiles that divide the data into four equal parts
Decile: Fractiles that divide the data into ten equal parts
Percentile: Fractiles that divide the data into hundred equal parts
2.11 Assignment
2.12 Activities
Compare some small cap, mid cap and large cap mutual funds for 3 year and five year
return on the basis of measures of central tendency.
Unit No. 3 Discrete Probability Distributions
___________________________________________
Unit Structure
3.1 Introduction
3.8 Glossary
3.9 Assignment
3.10 Activities
3.11 Case Study
3.0 Learning Objectives
• Identify the various situations where discrete probability distributions can be applied.
3.1 Introduction
Many times organizations are more interested in some function of the outcome of a process
or experiment than in the actual outcome itself. For example, a road safety service may be
interested in knowing the probability of a particular number of accidents taking place in a
day rather than the details of the accidents themselves. We recognize that this information
on probability will be very useful in taking decisions. Let's say a manufacturer randomly
selects two boxes from a large batch of boxes to test their quality. Each selected box can be
rated as good or defective. If the boxes are numbered 1 and 2, a defective box is designated
as D and a good box is designated as G. Then all the possible outcomes in the sample space
are {D1G2, D1D2, G1G2, G1D2}. The expression D1G2 means the first box is defective and
the second is of good quality. The possible outcomes are getting zero, one or two good
boxes. It can be observed that the probability of getting exactly one good box (2/4) is more
than that of getting both good (1/4).
This representation of possible outcomes and their probabilities is known as a probability
distribution. The development of probability theory helps in specifying probability
distributions. There are a number of theoretical probability distributions that have been
analyzed. Many real-life situations can be approximated by these distributions and used
for decision making. We will be studying some common probability distributions in this
and the subsequent unit. The objective of this unit is to study one type of probability
distribution, i.e. the discrete probability distribution. The basic concept and its application
in decision making will be discussed.
For example, a management institute may want to predict the composition of the new MBA
batch on the basis of their stream of graduation, based on its experience of previous batches.
Assume the students are from these streams:

Stream:      B.Com   BBA    BE/BCA   Others
Probability:  0.40    0.30    0.10     0.20

The above data is based on the expectations of the institute about the new batch and is
prepared before collecting any real data. This is called a probability distribution. However,
once the admissions are done, the distribution of the actual data collected on the stream of
graduation will be called a frequency distribution.
AnA experiment is defined as any process that generates well defined outcomes. Let’s
understand the process of assigning numerical values to experimental outcomes. For any
particular experiment, a random variable can be defined in a way that each possible
experimental outcome generates exactly one numerical value for the random variable. For
example if we consider the experiment of cars arriving for repair work at an automobile
service station, we can describe the experimental outcomes in terms of numbers of cars
arriving. In this case if X= Number of cars arriving, X is called the random variable. The
possible values that the random variable X can take, denoted by 'x', are 0, 1, 2, 3, …, n cars.
A random variable is defined as a numerical description of the outcome of an experiment.
A random variable may be classified as either discrete or continuous, depending on the
numerical values they can assume. In the above example, the random variable can take
only discrete values. A random variable that may assume only a finite or countably infinite
number of possible numbers ( eg. x=0, 1, 2, 3, 4, 5…n) is a discrete random variable. In
most situations, discrete random variables produce values that are nonnegative whole
numbers. For example, if 10 people are selected from a population and how many are
female is to be determined, the random variable here is discrete. The only possible numbers
of female in the sample are 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 and 10. There cannot be 3.5 females in
a group of 10 people; obtaining decimal values is impossible. The number of units sold, number of defective parts, number of customers entering a bank, number of voters who voted in an area, etc., are some examples of discrete random variables.
There are certain situations, in which the variable of interest can take infinitely many
values. Consider an example that a company is interested in ascertaining the probability
distribution of the volume of a 1000 ml bottle of soft drink, manufactured by it. The
company has reasons to believe that the packaging process is such that at times the volume may be slightly less than or slightly more than 1000 ml. There are an infinite number of values that the random variable ‘volume’ can take over a range. In such cases, it makes more sense to talk about the probability of the volume lying between two values, rather than the probability of
volume taking a specific value. Random variables that may assume any value over a given
interval are called continuous random variable. It can be said that continuous random
variables are generated from experiments in which things are ‘measured’ and not ‘counted’.
For example, a worker can take any value of time within a reasonable range to assemble a product component, such as 3.5 minutes (3 minutes, 30 seconds). This means that, unlike discrete random variables, continuous random variables can also take decimal values. Weight, time, temperature, percentage of projects completed on time, length of a car, etc., are some examples of continuous random variables.
The outcomes for random variables and their associated probabilities can be organized into
distributions. The distributions constructed from discrete random variables are called
discrete probability distributions and the distributions constructed from continuous random
variables are called continuous probability distributions.
For example, the following data is the distribution of the number of loans approved per
week at the local branch office of a bank. The listing is collectively exhaustive, as all the possible outcomes are listed, and thus the probabilities must add up to 1.
X      0     1     2     3     4      5     6
P(x)   0.1   0.1   0.2   0.3   0.15   0.1   0.05
The given figure is a graphical representation of the data, with the values of the random variable x shown on the horizontal axis. The probability that x takes on these values is shown on the vertical axis.
[Figure: Bar chart of the probability distribution of loans approved per week, with the number of loans (0 to 6) on the horizontal axis and P(x) on the vertical axis]
After constructing the probability distribution for a random variable, we often want to
calculate the mean of the random variable. The mean µ of a probability distribution is the
expected value of a random variable. To calculate the expected value, you multiply each
possible outcome x by its corresponding probability P(x) and then add the resulting
terms. The mathematical formula for computing the expected value of a discrete random
variable is
𝜇 = 𝐸(𝑥) = ∑ 𝑥𝑖 𝑃(𝑥𝑖 )
Let’s find the expected value for the given probability distribution on the loan approved
per week using the formula.
µ = E(x) = Σ xi P(xi)
= (0)(0.1) + (1)(0.1) + (2)(0.2) + (3)(0.3) + (4)(0.15) + (5)(0.1) + (6)(0.05)
= 2.8
The expected value of 2.8 represents the mean number of loans approved per week. For
experiments that can be repeated numerous times, the expected value can be interpreted as
the ‘long run’ average value of the random variable. However, it does not mean that the random variable will assume this value the next time the experiment is conducted. In fact, it is impossible to approve exactly 2.8 loans in any week. This value is important to a manager from both the planning and decision making points of view. For example, suppose the bank is interested to know how many loans will be approved in the next five weeks. Although we cannot specify the exact number of loans approved in a week, based on the expected value of 2.8 loans per week, we can say that the expected number of loans approved over the next five weeks will be 14 (2.8 × 5). In terms of setting targets or allocating work, the
expected value may provide helpful decision making information.
The expected value gives us an idea of the average or central value for the random variable,
but often we want to measure variability of the possible values of random variable. The
variance is a commonly used measure to summarize the variability in the values of the
random variable. The variance of the probability distribution can be computed by
multiplying each possible squared difference (xi-µ)2 by its corresponding probability and
then summing the resulting values. The mathematical expression for the variance of the
discrete variable is:
𝜎 2 = ∑(𝑥𝑖 − 𝜇)2 𝑃(𝑥𝑖 )
The computation of the variance for the number of loans approved per week is summarised below:
x     P(x)     x·P(x)    (x − µ)²·P(x)
0     0.10     0.00      (0 − 2.8)²(0.10) = 0.784
1     0.10     0.10      (1 − 2.8)²(0.10) = 0.324
2     0.20     0.40      (2 − 2.8)²(0.20) = 0.128
3     0.30     0.90      (3 − 2.8)²(0.30) = 0.012
4     0.15     0.60      (4 − 2.8)²(0.15) = 0.216
5     0.10     0.50      (5 − 2.8)²(0.10) = 0.484
6     0.05     0.30      (6 − 2.8)²(0.05) = 0.512
      ∑ = 1.00   µ = E(x) = 2.8   σ² = 2.46
The standard deviation is the positive square root of the variance:
σ = √2.46 = 1.57
The variance of the number of loans approved per week is 2.46. For the purpose of easier
managerial interpretation, the standard deviation may be preferred over the variance, as it
is measured in the same units as the random variable. The variance (σ2) is measured in
squared units and is thus more difficult for a manager to interpret. The variance and standard deviation are mainly useful for comparing the variability of different random variables. For example, the number of loans approved by two credit risk managers can be compared for variability.
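To make the calculation concrete, here is a minimal Python sketch (not part of the original text; variable names are illustrative) that reproduces the expected value, variance and standard deviation of the loan-approval distribution:

```python
# Expected value, variance and standard deviation of a discrete distribution
from math import sqrt

x_values = [0, 1, 2, 3, 4, 5, 6]
probs    = [0.1, 0.1, 0.2, 0.3, 0.15, 0.1, 0.05]

mean = sum(x * p for x, p in zip(x_values, probs))                     # E(x) = 2.8
variance = sum((x - mean) ** 2 * p for x, p in zip(x_values, probs))   # 2.46
std_dev = sqrt(variance)                                               # ~1.57

print(mean, variance, round(std_dev, 2))
```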
There are many discrete probability distributions, but in this unit, we will be discussing
two types of discrete distributions- Binomial distribution and Poisson distribution.
X P(x)
-$1,000 .40
$0 .20
+$1,000 .40
5.The standard deviation of this distribution is _____________.
a) -$400
b) $663
c) $800,000
3.4 Binomial Distribution
The most widely used of all discrete distributions is the binomial distribution. Several
assumptions underlying the use of the binomial distribution are:
• The experiment consists of a sequence of n identical trials
• Each trial has only two possible outcomes denoted as success and failure
• Each trial is independent of previous trials
• Probabilities of the two outcomes remain constant throughout the experiment
As the word binomial suggests, any single trial of a binomial experiment contains only two possible outcomes. The two outcomes are labeled as success or failure. The outcome
of interest to the researcher is usually labeled as success. The symbol ‘p’ represents the
probability of success of a trial and the symbol ‘q’ is the probability of failure of a trial. Let
‘x’ denote the value of the random variable, then x can have a value of 0, 1, 2, 3…..n,
depending on the number of success observed in n trials. The mathematical formula for
computing the probability of any value for the random variable, where binomial
distribution is applicable is:
P(x) = nCx · p^x · q^(n−x) = [n! / (x!(n−x)!)] · p^x · q^(n−x)
where n = number of trials
x = number of successes desired
p = probability of success in one trial
q = 1 − p = probability of failure in one trial
To illustrate the binomial probability distribution, let us consider the experiment of customers entering a toy store. To keep the problem relatively small, we restrict the experiment to the next five customers. Based on experience, the store owner estimates that the probability of a customer making a purchase is 0.30. What is the probability that exactly three of the next five customers make a purchase?
Let’s check the assumptions of binomial experiment:
1. The experiment is described as sequence of five identical trials, one trial each for
the five customers entering the store
2. Each trial has only two possible outcomes- customer making a purchase (success)
and customer does not make a purchase (failure)
3. The purchase decision of each customer is independent of the decisions of the other customers, i.e., each trial is independent of the previous trials
4. Probabilities of purchase p=0.30 and no purchase q=0.70, remains constant
throughout the experiment
The random variable ‘x’ is defined as number of customers making a purchase. With n=5
trials, p=0.30 , q=0.70, the probability that exactly 3 customers out of five make a purchase
can be computed using the formula:
P(x = 3) = nCx · p^x · q^(n−x) = [n! / (x!(n−x)!)] · p^x · q^(n−x)
= 5C3 (0.30)³ (0.70)^(5−3) = [5! / (3!(5−3)!)] × (0.30)³ (0.70)²
= 0.1323
Similarly, we can find the probability of zero (x=0) customers making a purchase
P(x = 0) = 5C0 (0.30)⁰ (0.70)^(5−0) = [5! / (0!(5−0)!)] × (0.30)⁰ (0.70)⁵ = 0.1681
If we are interested in computing the probability of at most 3 customers making a purchase, we need to find the probabilities P(x=0), P(x=1), P(x=2) and P(x=3) and then sum them up.
In the next section, we will be discussing the use of tables to directly get the probability
values.
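The following Python sketch (an illustration added here, with assumed function and variable names) applies the binomial formula to the toy-store example and also shows the "at most 3" sum mentioned above:

```python
# Binomial probabilities for n = 5 trials, p = 0.30 (toy-store example)
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) = nCx * p^x * (1 - p)^(n - x)."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

n, p = 5, 0.30
print(round(binomial_pmf(3, n, p), 4))   # 0.1323 -> exactly 3 purchases
print(round(binomial_pmf(0, n, p), 4))   # 0.1681 -> no purchases
# "At most 3" purchases: sum the individual probabilities for x = 0..3
print(round(sum(binomial_pmf(x, n, p) for x in range(4)), 4))
```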
Binomial distributions are a family of distributions. Every different value of n and/or every
different value of p gives a different binomial distribution and tables are available for
various combinations of n and p values. Such a table for binomial probability values is
provided in Appendix Statistical Table A. In order to use this table, we need to specify
values of n, p and x for the binomial experiment. Each table is headed by a value of n.
Eleven values of p are presented in each table of size n. The column below each value of p
is the binomial distribution for that combination of n and p.
To illustrate the use of binomial tables, let's take an example. ABC Resources publishes data on market share for various product categories in FMCG. As per the latest report, Oreo controls 10% of the cookie market. Suppose 20 purchasers are selected randomly from the population. What is the probability that fewer than four purchasers choose Oreo?
For this problem n = 20, p = 0.10 and x < 4 (i.e., x = 0, 1, 2 or 3). The portion of the binomial tables under n = 20 can be used to find the probability values. Search along the p values for 0.10. Determining the probability of getting fewer than four involves adding the probabilities for x = 0, 1, 2 and 3. The values appear at the intersection of each x value and the p = 0.10 column.
x value Probability
0 0.122
1 0.270
2 0.285
3 0.190
∑= 0.867
P(x < 4) = 0.867. If 10% of all cookie purchasers prefer Oreos and 20 cookie purchasers are randomly selected, then about 86.7% of the time fewer than four of the 20 will select Oreos.
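As an alternative to the printed tables, the same cumulative probability can be obtained with software. A minimal sketch using scipy.stats (an assumption of this illustration, not something the text relies on):

```python
# Cumulative binomial probability for the Oreo example (n = 20, p = 0.10)
from scipy.stats import binom

n, p = 20, 0.10
# P(x < 4) = P(x <= 3): probability of 0, 1, 2 or 3 purchasers choosing Oreo
print(round(binom.cdf(3, n, p), 3))   # ~0.867, matching the table lookup
```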
Let’s say, according to a study, 64% of all consumers believe that public sector banks are
more competitive than five years ago. If 25 consumers are selected randomly, what is the
expected number who believe that public sector banks are more competitive than they were
five years ago?
This problem can be described by the binomial distribution with n = 25 and p = 0.64. The mean for this problem can be computed as:
µ = n·p = 25 × 0.64 = 16
It means that, in the long run, if 25 consumers are selected randomly again and again and if 64% of consumers believe in the given statement, then on average 16 out of 25 will believe that public sector banks are more competitive than five years ago.
The standard deviation of the binomial distribution is denoted as ‘σ’ and is computed
using the following formula:
𝜎 = √𝑛. 𝑝. 𝑞
For the given data, the standard deviation is
𝜎 = √25 × 0.64 × 0.36 = 2.4
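A quick check of the binomial mean and standard deviation formulae for this example (an illustrative sketch only):

```python
# Binomial mean and standard deviation for n = 25, p = 0.64
from math import sqrt

n, p = 25, 0.64
mu = n * p                      # 16.0 expected believers
sigma = sqrt(n * p * (1 - p))   # 2.4
print(mu, sigma)
```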
3.5 Poisson Distribution
Another widely used discrete distribution is the Poisson distribution, which describes the number of occurrences of an event over a specified interval of time or space. A Poisson experiment has the following characteristics:
• It describes discrete occurrences over a continuum
• Each occurrence is independent of the other occurrences
• The occurrence in each interval can range from zero to infinity
• The expected number of occurrences must remain constant throughout the
experiment
This distribution was initially used to describe the occurrence of rare events over some interval. Some common situations that a Poisson random variable can describe are the number of accidents per day, the number of earthquakes occurring over a time period, the number of misprints on a page, the number of interruptions per minute on a server, the number of arrivals at a toll booth, etc.
If a Poisson distributed phenomenon is studied for a long period of time, a long run
average can be determined. This average is denoted as lambda (𝜆) and is used to describe
Poisson Distribution. The Poisson formula used to compute the probability of occurrences
over an interval for a given lambda value is:
P(x) = (λ^x · e^(−λ)) / x!
where x = 0, 1, 2, 3, …
λ = long run average
e = 2.71828
Here x is the number of occurrences per interval for which the probability is to be computed. The λ value must remain constant throughout the Poisson experiment.
Suppose that we are interested in the number of arrivals at a bank window during a 10
minute period on weekday mornings. We assume that the arrival of one customer is
independent of arrival of the other. Based on the historical data it is found that the average
number of customers arriving during a 10 minute interval of time is 8. If we want to find
the probability of arrival of five customers in 10 minutes, we would use x=5, 𝜆 = 8 per
10 minutes and compute:
P(x = 5) = (λ^x · e^(−λ)) / x! = (8⁵ × 2.71828⁻⁸) / 5! = 0.0916
Suppose we want to find the probability for 9 customers arriving in twenty minutes. We
need to note that there is a change in the interval, instead of 10 minutes, the probability is
to be found for 20 minutes. As per the λ value, on average 8 customers arrive in 10 minutes. We can derive the new average rate for 20 minutes by multiplying λ by 2, i.e., 16 customers per 20 minutes. To compute the probability we would use x = 9 and λ = 16 per 20 minutes:
P(x = 9) = (λ^x · e^(−λ)) / x! = (16⁹ × 2.71828⁻¹⁶) / 9! = 0.0213
The probability of 9 customers arriving in a twenty-minute interval is 0.0213. Similarly, if we want to find the probability of an x value for 5 minutes, the lambda value will be 4 customers per five minutes. If we want to find cumulative probabilities, such as fewer than 8 customers, we need to find the individual probabilities (for x = 0, 1, 2, 3, 4, 5, 6, 7) and then add them up. However, in this case it will be easier to use Poisson tables.
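A minimal Python sketch (added for illustration; the helper name is assumed) that reproduces the two Poisson probabilities computed above:

```python
# Poisson probabilities for the bank-arrival example
from math import exp, factorial

def poisson_pmf(x, lam):
    """P(X = x) = lam^x * e^(-lam) / x!"""
    return lam**x * exp(-lam) / factorial(x)

print(round(poisson_pmf(5, 8), 4))    # 0.0916 -> 5 arrivals in 10 minutes
print(round(poisson_pmf(9, 16), 4))   # 0.0213 -> 9 arrivals in 20 minutes
```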
3.5.1 Using Poisson Tables
Every value of lambda determines a different Poisson distribution. Regardless of the nature of the interval associated with lambda, the Poisson distribution for a particular lambda is the same. Appendix Statistical Table B contains the Poisson distribution for selected values of lambda. Probabilities for each x value associated with a given lambda are displayed if that x value has a nonzero probability in the table.
Let’s illustrate the use of Poisson table for the given problem. The number of faults per
month that arise in the gearboxes of travel buses is known to follow a Poisson distribution
with a mean of 2.5 faults per month. What is the probability that in a given month less than
3 faults are found?
For this problem λ = 2.5 faults per month and x = 0, 1, 2. The portion of the Poisson tables under λ = 2.5 can be used to find the probability values. The values appear at the intersection of each x value and the λ = 2.5 column.
x value Probability
0 0.0821
1 0.2052
2 0.2565
∑= 0.5438
The probability that in a given month fewer than three faults are found is 0.5438.
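The table lookup can be verified by summing the individual Poisson probabilities directly, as in this illustrative sketch:

```python
# Cumulative Poisson probability for the gearbox-fault example (lambda = 2.5)
from math import exp, factorial

lam = 2.5
p_less_than_3 = sum(lam**x * exp(-lam) / factorial(x) for x in range(3))
print(round(p_less_than_3, 4))   # ~0.5438
```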
Check Your Progress 4
1. The mean number of occurrences per interval of a Poisson distribution is denoted
by___
2. For a Poisson distribution the standard deviation is calculated as_________
3. The number of cars arriving at a toll booth in five-minute intervals is Poisson
distributed with a mean of 3 cars arriving in five-minute time intervals. The
probability of 3 cars arriving over a five-minute interval is _______.
a) 0.2700
b) 0.0498
c) 0.2240
d) 0.0001
4. A service station has a pump that distributes petrol to automobiles. It is estimated that 7 cars use the petrol pump every 2 hours. Assuming the arrivals are Poisson distributed, what is the probability that at least three cars will arrive to use the petrol pump during a one-hour period?
Answers to check your progress 2
1. True
2. False
3. (b)
4. (d)
Answers to check your progress 3
1. Binomial distribution
2. (b)
3. (d)
4. (d)
Answers to check your progress 4
1. λ
2. σ = √λ
3. (c)
4. P(x ≥ 3); λ = 7 cars per two hours, so for a one-hour interval λ = 3.5 cars.
P(x ≥ 3) = 1 − {P(x=0) + P(x=1) + P(x=2)}
x value Probability
0 0.0302
1 0.1057
2 0.1850
∑= 0.3209
P(x ≥ 3) = 1 − (0.0302 + 0.1057 + 0.1850) = 1 − 0.3209 = 0.6791
3.8 Glossary
Probability distribution: A list of outcomes of an experiment with the probabilities
associated with these outcomes
Random Variable: A variable that takes on different values as a result of outcomes of a
random experiment.
Discrete random variable: A random variable that is allowed to take a finite or countably infinite number of values.
Continuous random variable: A random variable that is allowed to take on any value within a given range.
Binomial distribution: A discrete probability distribution used to compute the probability of x successes in n trials.
Poisson distribution: A discrete probability distribution used to compute the probability of x occurrences over a specified interval.
3.9 Assignment
1. What is meaning of expected value of a probability distribution?
2. What are the assumptions of a Binomial distribution?
3. What are the characteristics of a Poisson distribution?
4. A survey conducted for an insurance company revealed that 70% of workers say job stress caused frequent health problems. Suppose a random sample of 10 workers is selected.
What is the probability that more than seven of them say job stress caused frequent health
problems? What is the expected number of workers who say job stress caused frequent
health problems?
5. A survey conducted by the Consumer Research Centre reported, among other things, that women spend an average of 1.2 hours per week shopping online. Assume that hours per week spent shopping online are Poisson distributed. If the survey result is true for all women and a woman is randomly selected, what is the probability that she did not shop online at all over a one-week period? What is the probability that a woman would shop three or more hours online during a one-week period?
3.10 Activities
Develop graphs for the binomial distribution using the tables for n = 8 and (a) p = 0.20, (b) p = 0.50 and (c) p = 0.80, and comment on the shape of the three graphs.
3.11 Case Study
Starting a business entails understanding and dealing with many issues—legal, financing,
sales and marketing, intellectual property protection, liability protection, human resources,
and more. The interest in entrepreneurship is at an all-time high. And there have been
spectacular success stories of early stage startups growing to be multi-billion-dollar
companies, such as Uber, Facebook, WhatsApp, Airbnb, and many others. Starting a
business is a huge commitment. Entrepreneurs often fail to appreciate the significant amount
of time, resources, and energy needed to start and grow a business.
A survey was done to identify the most important advice for starting a business venture. A
random sample of 12 small business owners was contacted and data were collected. As per the survey, 20% of all small business owners said that the most important advice for starting a business is to prepare for long hours and hard work. Twenty-five percent said that the most important advice is to have good financing ready. Nineteen percent said that having a good plan is the most important advice, 18% said that studying the industry and having industry knowledge is the most important advice, and 18% listed other advice.
Questions
1. What is the probability that six or more owners would say preparing for long hours
and hard work is the most important advice?
2. What is the probability that exactly five owners would say having good financing ready is the most important advice?
3. What is the expected number of owners who would say having a good plan is the
most important advice?
Unit No.4 Continuous Probability
Distributions
Unit Structure
4.8 Glossary
4.9 Assignment
4.10 Activities
4.0 Learning Objectives
After learning this unit, you will be able to:
4.1 Introduction
In the last unit, we discussed situations involving discrete random variables and the
resulting discrete probability distributions. In this unit we will be focusing on random
variable which can take any value over a range. Suppose you are a website designer for a
matrimonial site and you have to make sure that the webpage downloads quickly. The
download time is affected by design of the website and the load on the company’s web
server. The random variable ‘download time’ is a continuous variable, as it can take any
value on a range and not just whole number. This type of random variable which can take
infinite number of values over a range is called a continuous random variable and the
probability distribution of such variable is called continuous probability distribution. The
concepts and assumptions for this type of distributions are quite different from those of
discrete probability distributions. The objective of this unit is to study the concepts and
usefulness of continuous distributions. We will be discussing some important continuous
probability distributions and their applications in this unit.
Figure 1 graphically represents three continuous distributions. Figure 1(a) depicts a uniform distribution, where a value is equally likely to occur anywhere in the range between the smallest value ‘a’ and the largest value ‘b’. Sometimes referred to as the rectangular distribution, the uniform distribution is symmetric, meaning its mean equals its median.
Figure 1(b) depicts a normal distribution. The normal distribution is symmetrical and bell
shaped, so most of the values group around the mean. The mean, median and mode all have
the same value. An exponential distribution is illustrated in Figure 1(c). An exponential distribution is a positively skewed distribution, which makes the mean larger than the median. The range for an exponential distribution is zero to positive infinity, but its shape makes it highly unlikely for extremely large values to occur.
4.3 Uniform Probability Distribution
A uniform distribution refers to a probability distribution for which all of the values that a random variable can take on occur with equal probability in the range between the smallest value ‘a’ and the largest value ‘b’. Suppose the travel time of buses travelling from city X
to city Y is denoted by x. Assume that the minimum time is 3 hours and the maximum time is 3 hours 20 minutes. Thus, in terms of minutes, the travel time can be any value in the interval between 180 and 200 minutes. As the random variable x can take any value between 180 and 200 minutes, x is a continuous variable. Based on the past data, the probability of a travel time between 180 and 181 minutes is the same as the probability of a travel time in any other 1-minute interval up to and including 200 minutes. With every interval being equally likely,
the random variable x has a uniform distribution. The following probability density
function defines a uniform distribution:
f(x) = 1/(b − a)   for a ≤ x ≤ b
f(x) = 0            for all other values
In a uniform distribution, the total area under the curve is 1 and as the shape is rectangular
the area can be computed as the product of length and width of the rectangle. Because, by
definition, the distribution lies between the x values of a and b, the length of the rectangle
is (b-a). Combining this with the fact that area under the curve is equal to 1, height of the
rectangle can be solved as follows:
Area of rectangle = length × height = 1, where length = (b − a)
Therefore (b − a) × height = 1
Height = 1/(b − a)
The mean and the standard deviation of the uniform distribution are given as follows:
µ = (a + b)/2
σ = (b − a)/√12
As an example, suppose a production line manufactures a machine part in lots of 10 per
minute during a shift. When the lots are weighed, variation in weights was observed in the
range of 34 to 48 grams in a uniform distribution. The height of the distribution is:
Height = 1/(b − a) = 1/(48 − 34) = 1/14
The mean and the standard deviation of the uniform distribution are given as follows:
µ = (a + b)/2 = (48 + 34)/2 = 82/2 = 41
σ = (b − a)/√12 = (48 − 34)/√12 = 14/3.464 = 4.041
4.3.1 Area as a measure of probability
In a continuous distribution, probability is measured as the area under the density curve between two points. For the uniform distribution, the probability that x lies between two values x1 and x2 is:
P(x1 ≤ x ≤ x2) = (x2 − x1)/(b − a),   where a ≤ x1 ≤ x2 ≤ b
Suppose for the same problem given above, we are interested to find the probability that
the lot weighs between 40 and 45 grams. The probability can be calculated as:
P(40 ≤ x ≤ 45) = (x2 − x1)/(b − a) = (45 − 40)/(48 − 34) = 0.3571
So the probability that the lot weights are between 40 and 45 grams, is 0.3571. The
probability that the lot weight is less than 34 is zero, as the lowest value is 34. Similarly
the probability that the lot weight is more than 50 is also zero, as the upper value is 48.
Let’s find the probability that the lot weighs less than 40 grams. As the lowest value is 34, finding the probability that the lot weighs less than 40 grams actually means finding the probability of values between 34 and 40 grams. So the probability is calculated as follows:
P(34 ≤ x ≤ 40) = (x2 − x1)/(b − a) = (40 − 34)/(48 − 34) = 0.4286
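The uniform-distribution calculations for the lot-weight example can be sketched in a few lines of Python (names are illustrative, not from the text):

```python
# Uniform distribution on [34, 48] grams: height, mean, std dev, probabilities
from math import sqrt

a, b = 34, 48
height = 1 / (b - a)          # 1/14
mu = (a + b) / 2              # 41 grams
sigma = (b - a) / sqrt(12)    # ~4.041 grams

def uniform_prob(x1, x2, a=a, b=b):
    """P(x1 <= x <= x2) for a uniform distribution on [a, b]."""
    x1, x2 = max(x1, a), min(x2, b)   # clip to the support of the distribution
    return max(x2 - x1, 0) / (b - a)

print(round(uniform_prob(40, 45), 4))   # ~0.3571
print(round(uniform_prob(34, 40), 4))   # ~0.4286
```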
Check your progress 2
4.4 Normal Distribution
A very important continuous probability distribution is the normal distribution. There are
many reasons for normal distribution’s versatility and prominent place in statistics. First,
it has properties that make it applicable to many situations in which it is necessary to make
inferences by taking samples. Quite often, we face the problem of limited data for making
inferences about processes. Irrespective of the shape of the distribution of population, it
has been found that normal distribution can be used to characterize sampling distributions.
This helps considerably in inferential statistics. Second, the normal distribution is similar
to actual frequency distribution of many phenomena, like human characteristics (weight,
height, IQ), outputs from physical processes (dimensions and yield) and other measures of
interest to managers. This knowledge helps us to calculate probabilities of different events
in varied situations and which in turn help us in decision making. Finally, the normal
distribution can be used to approximate certain probability distributions, which help
considerably in simplifying probability calculations.
The normal distribution is described by two parameters: the mean µ and standard
deviation σ. The density function of the normal distribution is:
f(x) = (1/(σ√(2π))) · e^(−(1/2)·((x − µ)/σ)²)
where µ = mean of x
σ = standard deviation of x
π = 3.14159
e = 2.71828
Using calculus to determine areas under the normal curve from this function is difficult and time consuming; therefore, table values are generally used to analyze normal distribution problems.
Every unique pair of µ and σ values defines a different normal distribution. This
characteristic of being a family of curves could make analysis tedious, because of the
volumes of normal curve tables – one for each combination of µ and σ would be required.
A mechanism was developed by which all normal distributions can be converted into a
single distribution (z distribution).This process yields standardized normal distribution.
The conversion formula for any value of x of a given normal distribution is as follows:
z = (x − µ)/σ,   where σ ≠ 0
A z-score is the number of standard deviations that a value, x, is above or below the mean. If the value of x is less than the mean, the z-score is negative; if the value of x is more than the mean, the z-score is positive; and if the value of x is equal to the mean, the z-score is zero. This formula converts the distance from the mean into standard deviation units. A standard z distribution table can be used to find probabilities for any normal curve value that is converted to a z-score. The z distribution is a normal distribution with a mean of 0 and a standard deviation of 1. Any value of x at the mean is zero standard deviations from the mean. Any value of x that is one standard deviation above or below the mean has a z value of 1. As per the empirical rule, in a normal distribution, regardless of the values of µ and σ, 68% of all values are within one standard deviation of the mean; 95% of all values are within two standard deviations of the mean; and 99.7% of all values are within three standard deviations of the mean. The z distribution probability values are given in Appendix
Statistical Table C. The Table C gives the total area between 0 and any point on the positive
z axis. Since the curve is symmetric, the area between z and 0 is the same, irrespective of
whether z is positive or negative. The table areas or probabilities are always positive.
To use the z table to find probabilities, first note that values of z appear in the left hand column, with the second decimal value of z appearing in the top row. For example, for a value of 1.00, we find 1.0 in the left hand column and 0.00 in the top row. Then, by looking into the body of the table, we find that 0.3413 corresponds to the z value of 1.00.
The value of 0.3413 is the area under the curve between the mean (z=0) and z=1.00, as
shown graphically in Figure 2.
[Figure 2: Standard normal curve showing an area (probability) of 0.3413 between z = 0 and z = +1]
Suppose we want to find the probability of obtaining a z value between z=-1.00 and z=1.00.
We already know that the probability value of a z value between z=0.00 and z=1.00 is
0.3413. The normal distribution is symmetrical, i.e. the shape of the curve on the left of the mean is a mirror image of the shape of the curve on the right of the mean. Thus the probability of a z value between z = 0.00 and z = −1.00 is the same as the probability of a z value between z = 0.00 and z = 1.00, i.e., 0.3413. Hence the probability between z = −1.00 and z = 1.00 is 0.3413 + 0.3413 = 0.6826, as shown graphically.
[Figure 3: Standard normal curve showing an area (probability) of 0.6826 between z = −1 and z = +1]
4.4.3 Solving Normal distribution problems
Suppose that the Ceat tyre company just developed a new radial tyre that will be sold
through a national chain of stores. Because the tyre is a new product, the management
believes that the mileage guarantee offered with the tyre will be an important factor in the
consumer acceptance of the product. Before finalizing the tyre’s mileage guarantee policy,
Ceat management wants some probability information concerning the number of miles the
tires will last.
From actual road test with the tyres, the engineering department estimates the mean tyre
mileage to be 36500 miles and the standard deviation to be 5000 miles. In addition the data
collected indicate that a normal distribution is a reasonable assumption. What percentage
of the tyres can be expected to last more than 40000 miles?
z = (x − µ)/σ = (40000 − 36500)/5000 = 0.70
[Figure: Normal curve with µ = 36,500 and σ = 5,000, with the shaded area showing the probability that x exceeds 40,000]
Thus the probability that the normal distribution for tyre mileage will have x values greater
than 40000 is the same as the probability that the z distribution will have a z value greater
than 0.70. Using Z Table, we find that the area corresponding to z=0.70 is 0.2580. But we
need to remember that the table provides area between the mean and the z value. Thus we
know that there is a 0.2580 area between the mean and z = 0.70. The total area under the curve is 1 and, the curve being symmetrical, the area from the mean to the tail is 0.5. Thus the area above z = 0.70 will be 0.5 − 0.2580 = 0.2420. In terms of tyre mileage x, we can conclude that there is a 0.2420 probability that the x value will be above 40,000. Thus about 24.2% of the tyres manufactured can be expected to last more than 40,000 miles.
Let us now assume that the company is considering providing a discount on a new set of tyres if the mileage on the original tyres does not exceed the mileage stated in the guarantee. What should the guarantee mileage be, if Ceat wants no more than 8% of the tyres to be eligible for the discount?
[Figure: Normal curve with µ = 36,500, showing the unknown guarantee mileage (x value) below which 8% of the area lies]
Note that 8% of the area is below the unknown guarantee mileage that we need to calculate. It means the area between the mean and the unknown guarantee value is 0.5 − 0.08 = 0.42. The question is: how many standard deviations (what z value) do we have to be below the mean to get 42% of the area? We have earlier used the z table to find the area using a z value. Now we have the area between the mean and the z value, and need to find the corresponding z value. If we look for 0.42 in the body of the z table, we see that a 0.4200 area occurs at approximately z = 1.41. As the area is below the mean, the z value of interest must be −1.41. Hence the desired guarantee mileage should be 1.41 standard deviations less
than the mean. Putting the known values in the formula,
z = (x − µ)/σ
−1.41 = (x − 36500)/5000
x = 36500 − 1.41(5000) = 29,450
Therefore a guarantee of 29,450 miles will meet the requirement that approximately 8% of the tyres will be eligible for the discount. With this information the firm might confidently decide to set its tyre mileage guarantee at 29,000 miles, a round figure safely below 29,450. Again we see the important role of probability distributions in providing information for decision making.
4.4.4. Normal as an approximation of Binomial
As the sample sizes become large, binomial distribution approaches normal distribution,
regardless of the value of p. This phenomenon occurs faster (for smaller values of n) when p is near 0.50. Working a binomial problem by the normal curve requires a transformation process. The first part is to convert the two parameters of the binomial distribution, n and p, to the two parameters of the normal distribution, µ and σ. It involves the following formulae:
𝜇 = 𝑛. 𝑝 𝑎𝑛𝑑 𝜎 = √𝑛. 𝑝. 𝑞
Suppose we want to find the probability that the random variable x lies between 20 and 24, when a sample of 60 is taken and the probability of success is 0.30. From
the previous unit we know that this can be calculated using the formula:
P(x) = nCx · p^x · q^(n−x) = [n! / (x!(n−x)!)] · p^x · q^(n−x)
We need to calculate P(x) for x=20, 21, 22, 23, and 24 and then sum it up to get the
probability, which is going to be very tedious. Translating from a binomial problem to a
normal curve problem gives:
µ = n·p = 60(0.30) = 18  and  σ = √(n·p·q) = √(60 × 0.30 × 0.70) = 3.55
[Figure: Normal curve with µ = 18 and σ = 3.55, showing the area between x = 19.50 and x = 24.50]
For x = 19.50:
z = (x − µ)/σ = (19.50 − 18)/3.55 = 0.43
For x = 24.50:
z = (x − µ)/σ = (24.50 − 18)/3.55 = 1.83
From the z table we find that the area corresponding to z = 0.43 is 0.1664. This value is the area between the mean and the z value. Similarly, for z = 1.83, the area is 0.4664. To find the required probability, we subtract the two values: 0.4664 − 0.1664 = 0.30.
Thus the probability that the value will fall between 19.50 and 24.50 is approximately 0.30. You may check the value by using the binomial distribution formula; the answer will be approximately the same.
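A short sketch (illustrative only; scipy.stats is assumed) that compares the normal approximation with the exact binomial value for this problem:

```python
# Normal approximation to the binomial (n = 60, p = 0.30) with continuity correction
from math import sqrt
from scipy.stats import norm, binom

n, p = 60, 0.30
mu, sigma = n * p, sqrt(n * p * (1 - p))      # 18 and ~3.55

# P(20 <= x <= 24) via the normal curve, using 19.5 and 24.5
approx = norm.cdf(24.5, mu, sigma) - norm.cdf(19.5, mu, sigma)
# Exact binomial value for comparison
exact = sum(binom.pmf(x, n, p) for x in range(20, 25))
print(round(approx, 4), round(exact, 4))      # both close to 0.30
```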
Check your progress 3
1. A z-score is the number of standard deviations that a value of a random variable is
above or below the mean. (T/F)
2. Since a normal distribution curve extends from minus infinity to plus infinity, the area
under the curve is infinity.(T/F)
3. A standard normal distribution has a mean of zero and a standard deviation of one.(T/F)
4. The area to the left of the mean in any normal distribution is equal to _______.
a) the mean
b) 1
c) the variance
d) 0.5
5. If x is a normal random variable with mean 80 and standard deviation 5, the z-score for x
= 88 is ________.
a) 1.8
b) -1.8
c) 1.6
d) -1.6
6. Suppose x is a normal random variable with mean 60 and standard deviation 2. A z
score was calculated for a number, and the z score is 3.4. What is x?
a) 63.4
b) 56.6
c) 68.6
d) 66.8
Another important continuous distribution is the exponential distribution, which is useful in describing the time or interval between occurrences of an event. Figure 4 shows the typical shape of an exponential distribution. Its probability density function is:
f(x) = λe^(−λx),   where x ≥ 0, λ > 0 and e = 2.71828
The mean of an exponential distribution is 𝜇 = 1⁄𝜆 and the standard deviation is 𝜎 = 1⁄𝜆
Probabilities are computed by determining the area under the curve between two points.
Applying calculus to the exponential probability density function gives a formula that can
be used to compute the probabilities of exponential distribution:
P(x ≥ x0) = e^(−λx0)
where x0 ≥ 0, and x0 is the number (or fraction) of intervals between arrivals specified in the probability question.
Illustration: The exponential distribution can be used to solve Poisson-type problems in which the intervals are not time. The Air Travel Consumer Report published that the average number of mishandled baggage occurrences is 4.06 per 1,000 passengers. Assume mishandled baggage occurrences are Poisson distributed. Determine the average number of passengers between occurrences. Suppose a bag has just been mishandled; what is the probability that the number of passengers until the next occurrence will be fewer than 190? What is the probability that it is between 190 and 495 passengers?
Since λ = 4.06 occurrences per 1,000 passengers, the mean of the exponential distribution can be calculated as:
µ = 1/λ = 1/4.06 = 0.2463 (thousand passengers) = 0.2463 × 1000 = 246.3 passengers
[Figure: Exponential distribution of the number of passengers between occurrences, with the area between 190 and 495 shaded]
From the graph, the required shaded area can be computed by subtracting P(x ≥ 495) from P(x ≥ 190): 0.4624 − 0.1340 = 0.3284.
In operations research, Poisson distribution in conjunction with exponential distribution is
used to solve queuing problems. The Poisson distribution is used to analyze the arrivals
in a queue and exponential distribution is used to analyze inter-arrival time.
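The exponential calculations for the mishandled-baggage example can be reproduced with a few lines of Python (an illustrative sketch; variable names are assumptions):

```python
# Exponential distribution: gaps between mishandled-baggage occurrences
from math import exp

lam = 4.06 / 1000            # rate per passenger
mean_gap = 1 / lam           # ~246.3 passengers between occurrences

p_at_least_190 = exp(-lam * 190)                           # ~0.4624
p_fewer_than_190 = 1 - p_at_least_190                      # ~0.5376
p_between_190_and_495 = exp(-lam * 190) - exp(-lam * 495)  # ~0.3284

print(round(mean_gap, 1), round(p_fewer_than_190, 4), round(p_between_190_and_495, 4))
```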
Check your progress 4
1. If arrivals at a bank followed a Poisson distribution, then the time between arrivals would
follow a binomial distribution.(True/False)
2. For an exponential distribution, the mean is always equal to its variance. (True /False)
3. At a certain workstation in an assembly line, the time required to assemble a component
is exponentially distributed with a mean time of 10 minutes. Find the probability that a
component is assembled in 3 to 7 minutes?
a) 0.5034
b) 0.2592
c) 0.2442
d) 0.2942
4. At a certain workstation in an assembly line, the time required to assemble a component
is exponentially distributed with a mean time of 10 minutes. Find the probability that a
component is assembled in 7 minutes or less?
a) 0.349
b) 0.591
c) 0.714
d) 0.503
5. On Saturdays, cars arrive at Shine Car Wash at the rate of 6 cars per fifteen minute interval.
The probability that at least 2 minutes will elapse between car arrivals is _____________.
a) 0.0000
b) 0.4493
c) 0.1353
d) 1.0000
The most widely used distribution is the normal distribution. Many phenomena are
normally distributed like characteristics of machine parts, many measurements of natural
environment, human characteristics such as height, weight, IQ and test scores. The
parameters necessary to describe a normal distribution are mean and standard deviation.
For convenience, the data should be standardized by using the mean and standard deviation
to compute z score. The probability of the z score of an x value can be determined by the
table of z scores. The normal distribution is also used to work certain type of binomial
distribution problems.
Answers to check your progress 1
1. True
2. (a)
Answers to check your progress 2
1. True
2. (a)
3. (c)
4. (d)
5. (b)
Answers to check your progress 3
1. True
2. False
3.True
4. (d)
5. (c)
6. (d)
Answers to check your progress 4
1. False
2. False
3. (c)
4. (d)
5. (b)
4.8 Glossary
Uniform Probability Distribution: A continuous probability distribution in which the
probability that the random variable will assume a value in any interval of equal length is the same for each interval.
Probability Density function: The function that describes the probability distribution of
a continuous random variable
Normal Distribution: A continuous probability distribution whose probability density
function is bell shaped and is determined by the mean and standard deviation
Standard normal distribution: A normal distribution with mean of 0 and a standard
deviation of 1
Z Score: z score is the distance that an x value is from the mean µ in units of standard
deviations
Exponential Distribution: A continuous probability distribution that is useful in
describing the time to complete a task or the time/interval between occurrences of an event
4.9 Assignment
4.10 Activities
Use the probability density formula to sketch the graphs of the following exponential distributions: (a) λ = 0.2, (b) λ = 0.4, (c) λ = 0.4. Hint: use x = 0, 1, 2, 3, … and find f(x).
4.12 Further Readings
Block Summary
In this block, we studied how quantitative methods may be used to help managers make
better decisions. In the first unit the meaning and use of various quantitative analysis
methods in the field of business and management was explained. In this unit, the basic
difference between statistics and operations research was discussed along with their
techniques. In the second unit, the concept of measures of central tendency was introduced. Various measures of central tendency and their relative importance were discussed. In the third unit the application of various types of discrete probability distributions was discussed. In the last unit continuous probability distributions and their various applications were covered.
Block Assignment
5. The Poisson distribution of annual trips per family to amusement parks gives an average of 0.6 trips per year. What is the probability that a randomly selected family did not make a trip to an amusement park last year? What is the probability that a randomly selected family took three or fewer trips to amusement parks over a three-year period?
Block Introduction
In this block, we will study decision making techniques which are used to make business
decisions and forecasting. In the first unit the concept of decision making along with
decision tree approach and other related concepts like single stage decisions, multi stage
decisions, issues, and types of environments of decisions will be discussed. In the second
unit, we will explore relationships between variables through correlation and regression
analysis and learn how to develop models that can be used to predict one variable by
another variable. Here, we will also learn how to make meaningful predictions from the
given data by fitting them into the linear function. In the third unit some of the basic
concepts of forecasting will be discussed for planning and understanding decisions in a
scientific approach. We will also explore the statistical techniques that can be used to
forecast values from time-series data and to know how well the forecast is being done.
Objectives
After learning this block, you will be able to:
• Understand decision problems which involve various uncertainties in different types
of environments
• Understand the decision-making process
• Analyze problems using Decision Tree Approach
• Make decisions under uncertainty
• Analyze situations where probabilities of outcomes are uncertain
• Understand the concept of correlation
• Understand the role of regression in establishing mathematical relationships
between dependent and independent variables from given data
• Use the least squares criterion to estimate the model parameters
• Learn the meaning and calculation of residuals
• Identify the standard errors of estimate
• Know when to use various forecasting methods.
• Understand different types of forecast models
• Understand time series analysis - moving averages, exponential smoothing, least
square regression trend analysis for demand forecasting.
• Calculate different measures of forecast accuracy.
Block Structure
Unit 1: Decision Theory
Unit 2: Correlation and Regression Analysis
Unit 3: Forecasting
Unit No. 1 Decision Theory
___________________________________________
Unit Structure
1.0 Learning Objectives
1.1 Introduction
1.1.1 Types of decision-making environments
Check your progress 1
1.6 Glossary
1.7 Assignment
1.8 Activities
1.0 Learning Objectives
1.1 Introduction
Every stage of our life, including the day-to-day routine, involves various kinds of decisions. Decision problems are everywhere, and the real challenge is to make good decisions. People from different times and fields have used decision theory in different environments to arrive at final decisions. The analysis varies with the nature of the decision problem, so any basis for classifying decision problems provides us with a means of selecting the decision analysis approach. An important condition for the
existence of a decision problem is the presence of alternative ways of actions. Each action
leads to a consequence through a possible set of outcomes based on the information that
might be known or unknown. One of the several ways of classifying decision problems has
been based on this knowledge about the information on outcomes. Broadly, two
classifications result:
a) The information on outcomes is deterministic and known with certainty, and
b) The information on outcomes is probabilistic (uncertain), with the probabilities known or unknown.
The former is classified as decision making under certainty, while the latter is called
decision making under uncertainty. The theory that has resulted from analyzing decision
problems in uncertain situations is commonly known as Decision Theory. The agenda of
this unit is to study some methods for solving decision problems under uncertainty.
Decision theory is an analytic and systematic approach to decision making. A good decision is one that is based on logic, considers all available data and possible alternatives, and applies the quantitative approach described here.
1.1.1 Types of decision-making environments
Type 1: Decision making under certainty: The decision maker knows with certainty the
consequences of every alternative or decision choice.
Type 2: Decision making under uncertainty: The decision maker does not know the
probabilities of the various outcomes.
Type 3: Decision making under risk: The decision maker knows the probabilities of the
various outcomes.
Check your progress 1
1. A situation in which the information on outcomes is deterministic and known with certainty is known as ____________
2. The necessary condition for the existence of a decision problem is the presence of ___________
4. Which theory concerns making sound decisions under conditions of certainty, risk and uncertainty?
a. Game Theory
b. Network Analysis
c. Decision Theory
d. None of the above
Different issues arise while analyzing decision problems with uncertain outcomes. First, decisions can be viewed either as independent decisions (one-stage, one-time decisions) or as a sequence of decisions taken over a period of time. The planning horizon thus also decides the nature of the decision: we have either a single-stage decision problem or a sequential decision problem. In real life, decisions are generally sequential, and thus they become difficult to solve. Fortunately, in most cases valid assumptions help to reduce the number of stages and make the problem solvable. Decision theory therefore deals basically with the following two types of problems:
(a) One-stage decision making process
(b) Multi-stage decision making process
Now consider the problem of finding the number of magazine copies one should stock in the face of uncertain demand, such that the expected profit is maximized. A critical evaluation of the method shows that the calculation becomes tedious as the number of values the demand can take increases. You can also try the method with a discrete distribution of demand, where demand can take values in some range, and then do trial and error for each and every value of demand, which is again a time-consuming task. Hence separate techniques are called for to make such decisions. We will learn a technique for solving such single stage problems called marginal analysis. For sequential decision
problems, the Decision Tree Approach is helpful and will be explained in a later section.
In the analysis we will be using several criteria, but the main one is the expected monetary value criterion (the other criteria will be explained in the next section). However, this criterion suffers from two problems. Expected Profit or Expected Monetary Value (EMV), as it is more commonly known, does not take into account the decision maker's attitude towards risk. The other problem with Expected Monetary Value is that it can be applied only when the probabilities of outcomes are known. For problems where the probabilities are unknown, one way is to assign equal probabilities to the outcomes and then use EMV for decision-making. However, this is not always rational, and other criteria are available for deciding in such situations.
The following are the steps of the decision-making process, which can be commonly used for any approach:
1. Clearly define the problem at hand.
2. List the possible decision alternatives.
3. Identify the possible outcomes (states of nature).
4. List the payoff of each combination of alternative and state of nature.
5. Select one of the decision theory models or criteria.
6. Apply the model and make the decision.
Example I
Decision Table with Conditional Values for Krishna Manufacturer
                                     State of Nature
Alternative                  Favourable Market    Unfavourable Market
Construct a large plant      200,000              -180,000
Construct a small plant      100,000              -20,000
Do nothing                   0                    0
When the probabilities of the states of nature are not known, the following criteria can be used to choose an alternative:
1. Maximax (optimistic): Used to find the alternative that maximizes the maximum
payoff. Locate the maximum payoff for each alternative. Select the alternative with the
maximum number.
                                     State of Nature
Alternative                  Favourable Market    Unfavourable Market    Maximum in a row
Construct a large plant      200,000              -180,000               200,000
Construct a small plant      100,000              -20,000                100,000
Do nothing                   0                    0                      0
The maximax choice is to construct a large plant (maximum payoff of 200,000).
2. Maximin (pessimistic): Used to find the alternative that maximizes the minimum payoff. Locate the minimum payoff for each alternative. Select the alternative with the maximum number.
                                     State of Nature
Alternative                  Favourable Market    Unfavourable Market    Minimum in a row
Construct a large plant      200,000              -180,000               -180,000
Construct a small plant      100,000              -20,000                -20,000
Do nothing                   0                    0                      0
The maximin choice is to do nothing (the maximum of the minimum payoffs is 0).
4. Criterion of realism (Hurwicz): A weighted average of the best and the worst payoff for each alternative, using a coefficient of realism α between 0 and 1 (here α = 0.8): realism payoff = α(maximum in row) + (1 − α)(minimum in row). Select the alternative with the highest value.
                                     State of Nature
Alternative                  Favourable Market    Unfavourable Market    Criterion of realism (α = 0.8)
Construct a large plant      200,000              -180,000               1,24,000
Construct a small plant      100,000              -20,000                76,000
Do nothing                   0                    0                      0
The criterion of realism choice is to construct a large plant.
5. Equally likely (Laplace): Considers all the payoffs for each alternative. Find the average payoff for each alternative and select the alternative with the highest average.
                                     State of Nature
Alternative                  Favourable Market    Unfavourable Market    Row average
Construct a large plant      200,000              -180,000               10,000
Construct a small plant      100,000              -20,000                40,000
Do nothing                   0                    0                      0
The equally likely choice is to construct a small plant (highest average payoff of 40,000).
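A minimal Python sketch (added for illustration, not from the text) that reproduces the criteria above for the Krishna Manufacturer payoff table; α = 0.8 is the coefficient of realism implied by the worked figures:

```python
# Decision criteria under uncertainty for a two-state payoff table
payoffs = {
    "Construct a large plant": [200_000, -180_000],   # favourable, unfavourable
    "Construct a small plant": [100_000, -20_000],
    "Do nothing":              [0, 0],
}
alpha = 0.8   # coefficient of realism

for name, row in payoffs.items():
    maximax = max(row)
    maximin = min(row)
    realism = alpha * max(row) + (1 - alpha) * min(row)
    laplace = sum(row) / len(row)
    print(name, maximax, maximin, realism, laplace)
```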
1.3.1 Decision making under risk
This is decision making when there are several possible states of nature, and the
probabilities associated with each possible state are known. The most popular method is to
choose the alternative with the highest expected monetary value (EMV).
Suppose in example I each market outcome has a probability of occurrence of 0.50. Which
alternative would give the highest EMV? Calculations are as follows: Select the alternative
with highest EMV.
EMV (large plant) = (200,000)(0.5) + (–180,000)(0.5)= 10,000
EMV (small plant) = (100,000)(0.5) + (–20,000)(0.5)= 40,000
EMV (do nothing) = (0)(0.5) + (0)(0.5)= 0
The highest EMV is obtained with the small plant strategy.
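The EMV calculation can likewise be sketched in code (illustrative only), using the same payoff table and P(favourable) = P(unfavourable) = 0.5:

```python
# Expected monetary value for each alternative in example I
probabilities = [0.5, 0.5]       # favourable, unfavourable market
payoffs = {
    "Construct a large plant": [200_000, -180_000],
    "Construct a small plant": [100_000, -20_000],
    "Do nothing":              [0, 0],
}

emv = {name: sum(p * v for p, v in zip(probabilities, row))
       for name, row in payoffs.items()}
best = max(emv, key=emv.get)
print(emv)    # large plant: 10,000; small plant: 40,000; do nothing: 0
print(best)   # 'Construct a small plant' has the highest EMV
```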
1.3.2 Multi-stage decision making process with certainty (Decision Tree
Approach)
Any problem that can be presented in a decision table can also be graphically represented
in a decision tree. Decision trees are most beneficial when a sequence of decisions must be
made. All decision trees contain decision points or nodes, from which one of the several
alternatives may be chosen. All decision trees contain state-of-nature points or nodes, out
of which one state of nature will occur.
Steps of decision tree analysis
Basic structure of Decision tree of example I
Check your progress 3
1. In the Hurwicz criterion, the value of α is always between _________
2. Selecting the alternative with the highest average payoff is given by the rule of _____________
3. Decision making when a number of alternatives with known probabilities are given is known as ______________
Decision Theory provides us with the structure and methods for analyzing decision
problems under uncertainty, certainty and risk. The decision problems under uncertainty
are characterized by different courses of action and uncertain or risky outcomes
corresponding to each action or alternative. The problems can involve a single stage or a
multi-stage decision process. Expected monetary value and other different criteria are
helpful in solving single stage problems, whereas the decision tree approach is useful
for solving multi-stage problems. In this unit we have learned the applications of these methods to solve decision problems. The main objective behind using these decision-making methods is to maximize the Expected Monetary Value (EMV). Using the EMV as the goal with both methods basically assumes that the decision maker does not want to take risk, i.e., that he or she is risk-neutral, or that approximate decisions can be made based on the outcomes discovered.
4. C
2. EMV
3. Attitude
4. Outcomes
2. Laplace
1.6 Glossary
Decision making under certainty: The decision maker knows with certainty the
consequences of every alternative or decision choice.
Decision making under uncertainty: The decision maker does not know the probabilities
of the various outcomes.
Decision making under risk: The decision maker knows the probabilities of the various
outcomes.
Maximax (optimistic): Used to find the alternative that maximizes the maximum payoff.
Maximin (pessimistic): Used to find the alternative that maximizes the minimum payoff.
Equally likely (Laplace): Considers all the payoffs for each alternative and selects the one with the highest average payoff.
EMV (Expected Monetary Value): The sum of the payoffs of a particular decision alternative, each multiplied by its probability of occurrence.
Lines or branches in a decision tree: They connect the decision nodes and the state-of-nature nodes.
1.7 Assignment
1. A small group of investors is considering planting a tree farm. Their choices are (1) don’t plant trees, (2) plant a small number of trees, or (3) plant a large number of trees. The
investors are concerned about the demand for trees. If demand for trees declines, planting
a large tree farm would probably result in a loss. However, if a large increase in the demand
for trees occurs, not planting a tree farm could mean a large loss in revenue opportunity.
They determine that three states of demand are possible: (1) demand declines, (2) demand
remains the same as it is, and (3) demand increases. Use the following decision table to
compute an expected monetary value for this decision opportunity. Also show decision tree
for the same.
                                     State of Demand
Decision Alternatives        Decline (0.20)    Same (0.30)    Increase (0.50)
Large Tree Farm              -550              -120           750
2. Some oil speculators are interested in drilling an oil well. The rights to the land have
been secured and they must decide whether to drill. The states of nature are that oil is
present or that no oil is present. Their two decision alternatives are drill or don’t drill. If
they strike oil, the well will pay Rs. 2 million. If they have a dry hole, they will lose Rs. 150,000. If they don’t drill, their payoff is Rs. 0 whether oil is present or not. The probability that oil is present is 0.12. Use this information to construct a decision table and a decision tree and compute the expected monetary value for this problem.
3. A car rental agency faces the decision of buying a fleet of cars, all of which will be the
same size. It can purchase a fleet of small cars, medium cars, or large cars. The smallest
cars are the most fuel efficient and the largest cars are the greatest fuel users. One of the
problems for the decision makers is that they do not know whether the price of fuel will
increase or decrease in the near future. If the price increases, the small cars are likely to be
most popular. If the price decreases, customers may demand the larger cars. Following is
a decision table with these decision alternatives, the states of nature, the probabilities, and
the payoffs. Use this information to determine the expected monetary value for this
problem.
                                     State of Nature
Decision Alternatives        Fuel Decrease (0.70)    Fuel Increase (0.30)
1.8 Activities
1. Suppose you have the option of investing either in Project A or in Project B. The
outcomes of both the projects are uncertain. If you invest in Project A, there is a 98%
chance of making Rs. 25,000 profit, and 2% chance of losing Rs. 90,000. If project B is
chosen, there is a 50-50 chance of making a profit of Rs. 7,000 or Rs. 17,000. Which project
will you choose and why?
2. Suppose in above activity 1, you have calculated the expected payoff (EMV) for both
the projects as follows.
EMVA = 0.98 * 25,000 - 0.02 * 90,000 = Rs. 22,700.
EMVB = 0.5 * 7,000 + 0.5 * 17,000 = Rs. 12,000.
You have thus found that by investing in Project A, you can expect more money, so you
have chosen A. Your friend, when given the same option, chooses B, arguing that he would
not like to go bankrupt (losing 90,000) by choosing A. How do you reconcile these two
arguments?
1.9 Case Study
The Property Company: A property owner is faced with a choice of:
(a) A large-scale investment (A) to improve her flats. This could produce a substantial pay-
off in terms of increased revenue net of costs but will require an investment of
Rs.1,400,000. After extensive market research it is considered that there is a 40% chance
that a pay-off of Rs.2,500,000 will be obtained, but there is a 60% chance that it will be
only Rs.800,000.
(b) A smaller scale project (B) to re-decorate her premises. At Rs.500,000 this is less costly
but will produce a lower pay-off. Research data suggests a 30% chance of a gain
of Rs.1,000,000 but a 70% chance of it being only Rs.500,000.
(c) Continuing the present operation without change (C). It will cost nothing, but neither
will it produce any pay-off. Clients will be unhappy and it will become harder and harder
to rent the flats out when they become free.
Unit No. 2 Correlation and Regression
Analysis
______________________________
Unit Structure
2.0 Learning Objectives
2.1 Introduction
2.7 Glossary
2.8 Assignment
2.9 Activities
2.0 Learning Objectives
2.1 Introduction
In industry and business today, large amounts of data are continuously being generated, and this calls for statistical analysis of mass data. Data is an asset for
any business. This data can be company's annual production, annual sales, capacity
utilization, turnover, profits, man-power levels, absenteeism or some other variable
of direct interest to management. In general, the data can be of any of the aspects
related to finance, marketing, human resource, inventory, production or there might
be technical data regarding processes such as temperature, pressure etc. Sometimes
it is related to quality control issues. The accumulated data can be used to gain
information about the system (as for instance what happens to the market return
when Sensex goes down) or to identify past pattern of trends, behavior or simply
used for control purposes to check if the process or system is operating as planned
and designed (as for instance in quality control). So the main objective of learning correlation and regression is primarily to extract the main features of the relationships and impacts hidden in, or implied by, the mass of data.
The data we analyze can have many variables and it is of interest to examine the
effects of some variables on others. Identifying the exact functional relationship between variables can be too complex, but we may wish to approximate the relationship by some simple mathematical function such as correlation and a straight line or least squares line. For instance, the monthly consumption of raw materials at a particular
company, daily demand of a particular product, weekly price change in petrol could
all be variables of interest. We are, however, interested in some key performance variables (let us consider sales and advertisement) and would like to see how this key
variable (called the response variable or dependent variable, here sales) is affected
by the other variables (often called independent or explanatory variable, here
advertisement).
The Pearson correlation coefficient is computed as r = [nΣXY − (ΣX)(ΣY)] / √{[nΣX² − (ΣX)²][nΣY² − (ΣY)²]}, where n is the number of paired observations.
Example I
A study is designed to check the relationship between smoking and longevity. A sample of
15 men, 50 years and older was taken and the average number of cigarettes smoked per
day and their age at death were measured. Here cigarette smoking is the independent variable (X) and longevity is the dependent variable (Y). n = number of pairs = 15.
No.   X (cigarettes/day)   Y (age at death)   X*Y     X*X     Y*Y
8     26                   79                 2054    676     6241
9     11                   81                 891     121     6561
10    19                   75                 1425    361     5625
11    14                   68                 952     196     4624
12    35                   72                 2520    1225    5184
13    29                   58                 1682    841     3364
14    4                    92                 368     16      8464
15    23                   65                 1495    529     4225
Totals: ∑X = 291, ∑Y = 1103, ∑X*Y = 20034, ∑X*X = 7817, ∑Y*Y = 82791
Put all the calculated values in the formula learned above. The answer is r = -0.71343, a moderately strong negative (inverse) correlation between the two variables. In conclusion, the less a person smokes, the greater the longevity tends to be.
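The same calculation can be reproduced in a few lines of Python from the column totals above; the helper names are illustrative.

import math

n = 15
sum_x, sum_y = 291, 1103
sum_xy, sum_x2, sum_y2 = 20034, 7817, 82791

numerator = n * sum_xy - sum_x * sum_y
denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
r = numerator / denominator
print(round(r, 5))   # approximately -0.71343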
Check your progress 1
1. The value of the correlation coefficient r lies between:
a) 0 and 1
b) -1 to 0 to 1
c) -1
d) None of the above
In regression analysis, the variable to be predicted is called the dependent variable and is designated as Y. The predictor is called
the independent variable, or explanatory variable, and is designated as X. In simple
regression analysis, only a straight-line relationship between two variables is examined.
ŷ = b0 + b1x
where
y is the dependent variable (the variable plotted on the Y axis), x is the independent variable (plotted on the X axis), b1 is the slope of the line and b0 is the y-intercept.
Example II
In the table below, the xi column shows scores on the aptitude test and the yi column shows statistics grades. Conduct the regression analysis, the residual analysis and find the standard error of the estimate.
First, we solve for the regression slope (b1) using b1 = [nΣxy − (Σx)(Σy)] / [nΣx² − (Σx)²]. Once we know the value of the slope (b1), we can solve for the intercept (b0):
b0 = 𝑦̅− 𝑏1 𝑥̅
b0 = 77 - (0.644)(78)
b0 = 26.768
Here, for x = 88, substitute this value into the regression equation just developed: ŷ = 26.768 + 0.644 × 88 = 83.44 marks in statistics.
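A minimal Python sketch of the fit and the prediction is given below. The aptitude/grade pairs are not reprinted in this extract, so the arrays used here are reconstructed to be consistent with the totals quoted in the example (x̄ = 78, ȳ = 77, Σy² = 30275); treat them as illustrative.

def least_squares(x, y):
    # Return (b0, b1) for the line y-hat = b0 + b1*x.
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sx2 = sum(a * a for a in x)
    b1 = (n * sxy - sx * sy) / (n * sx2 - sx ** 2)
    b0 = sy / n - b1 * sx / n
    return b0, b1

x = [95, 85, 80, 70, 60]   # aptitude scores (reconstructed data)
y = [85, 95, 70, 65, 70]   # statistics grades (reconstructed data)
b0, b1 = least_squares(x, y)
print(round(b0, 3), round(b1, 3))   # about 26.78 and 0.644 (the text rounds b1 first and gets b0 = 26.768)
print(round(b0 + b1 * 88, 2))       # predicted grade for x = 88, about 83.44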
Each difference between the actual y values and the predicted y values is the error of the
regression line at a given point, and is referred to as the residual. It is the sum of squares
of these residuals that is minimized to find the least squares line. You can find predicted y
values by putting x values one by one in regression line that has been already developed.
Residuals represent errors of estimation for individual points. With large samples of data,
residual computations become laborious. Even with computers, a researcher sometimes has
difficulty working through pages of residuals in an effort to understand the error of the
regression model. An alternative way of examining the error of the model is the standard
error of the estimate, which provides a single measurement of the regression error. Because
the sum of the residuals is zero, attempting to determine the total amount of error by
summing the residuals is fruitless. This zero-sum characteristic of residuals can be avoided
by squaring the residuals and then summing them.
For the data of Example II, Σy² = 30275, Σ(y − ŷ) = 0.00 and Σ(y − ŷ)² = 327.391.
First calculate SSE = Σ(y − ŷ)², or equivalently SSE = Σy² − b0Σy − b1Σxy.
Standard error of the estimate: se = √(SSE / (n − 2)) = 10.44
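Continuing the sketch above, the residuals, SSE and standard error can be computed as follows; with the reconstructed data and the rounded coefficients the results land near the values quoted in the text (SSE ≈ 327.4, se ≈ 10.45).

import math

x = [95, 85, 80, 70, 60]          # reconstructed aptitude scores
y = [85, 95, 70, 65, 70]          # reconstructed statistics grades
b0, b1 = 26.768, 0.644            # coefficients from the worked example

y_hat = [b0 + b1 * xi for xi in x]
residuals = [yi - yh for yi, yh in zip(y, y_hat)]

sse = sum(e ** 2 for e in residuals)        # sum of squared residuals
se = math.sqrt(sse / (len(x) - 2))          # standard error of the estimate
print(round(sum(residuals), 3))             # residuals sum to (about) zero
print(round(sse, 2), round(se, 2))          # about 327.4 and 10.45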
A widely used measure of fit for regression models is the coefficient of determination, or r². The coefficient of determination is the proportion of variability of the dependent variable (y) accounted for or explained by the independent variable (x). The coefficient of determination ranges from 0 to 1. An r² of zero means that the predictor accounts for none of the variability of the dependent variable and that there is no regression prediction of y by x. An r² of 1 means perfect prediction of y by x and that 100% of the variability of y is accounted for by x. Of course, most r² values are between the extremes. The researcher
must interpret whether a particular r² is high or low, depending on the use of the model and the context within which the model was developed. In the correlation example the answer is r = -0.71, so its square is 0.5041. That means about 50% of the variation of the dependent variable y is explained by the independent variable x.
Is r, the coefficient of correlation, related to r², the coefficient of determination, in linear regression? The answer is yes: r² equals (r)². The coefficient of determination is the square of the coefficient of correlation. For example, a regression model was developed to predict FTEs by number of hospital beds. The r² value for the model was 0.886. Taking the square root of this value yields r = 0.941, which is the correlation between the sample number of beds and FTEs.
Note:
Because r² is always positive, solving for r by taking the square root gives the correct magnitude of r but may give the wrong sign. The researcher must examine the sign of the slope of the regression line to determine whether a positive or negative relationship exists between the variables and then assign the appropriate sign to the correlation value.
Check your progress 3
1. The coefficient of determination equals if r = 0.8045
a) 0.6471
b) -0.6471
c) 0
d) 1
2. Suppose the correlation coefficient between height (as measured in feet) versus weight
(as measured in pounds) is 0.40. What is the correlation coefficient of height measured in
inches versus weight measured in ounces? [12 inches = one foot; 16 ounces = one pound]
a) 0.40
b) 0.30
c) 0.533
d) Cannot be determined from information given
In this unit we have learned the basics of correlation and linear regression. Correlation tells us whether two variables are related to each other, and whether the relationship is positive or negative. Regression gives an extended answer: how much change you can expect or predict based on that relationship. With the regression line, the change in the dependent variable Y can be predicted for any value of the independent variable X. Broadly speaking, the
fitting of any chosen mathematical function to given data is termed as regression
analysis. The estimation of the parameters of this model is accomplished by the
least squares criterion which tries to minimize the sum of squares of the errors for
all the data points. After the model is fitted to data the next logical question is to
find out how good the quality of fit is. This question can best be answered by
conducting statistical tests and determining the standard errors of estimate. An
overall percentage variation explained by coefficient of determination can also be
computed. Finally, it can be concluded that the method of least squares used in
linear regression is applicable to a wide range of situations. Correlation and
regression both are important concepts for establishing relationships between
variables from the given data. The identified relationship and mathematical model
may be used for the purpose of prediction. Some of the models used in forecasting
of demand are based on regression-analysis. One of the models of forecasting,
namely time-series analysis, is discussed in the next unit.
Answers to check your progress 1
1. b
2. c
3. c
Answers to check your progress 2
1. c
2. c
3. a
Answers to check your progress 3
1. a
2. a
2.7 Glossary
Independent variable: A variable that can be set either to a desirable value or takes
values that can be observed but not controlled.
Dependent/Response variable: The variable of interest or focus which is influenced by one or more independent variables.
Estimate: A value obtained from data for a certain parameter of the assumed model, or a forecast value obtained from the model.
Linear regression: Fitting of any chosen mathematical model, linear in the unknown parameters, to a given data set.
Model: A general mathematical relationship relating a dependent (or response) variable Y to independent variables X1, X2, …, Xn.
2.8 Assignment
b) What is the estimated revenue when the advertising expenditure is 7?
c) Suppose SSR = 691 and SST = 1002. Find the value of R² and interpret it in the context of the problem.
2. Use the following data to determine the equation of the least squares regression line.
X: 12 21 28 8 20
Y: 17 15 22 19 24
3. What is the measure of correlation between the interest rate of federal funds and the commodities futures index? Use the following data:
Days    Interest Rate    Futures Index
1       7.43             223
2       7.48             221
3       8.00             222
4       7.75             226
5       7.58             225
6       7.64             223
7       7.69             224
8       8.01             221
9       8.23             227
10      8.45             235
11      8.52             241
12      8.56             238
4. Find the equation of the regression line for the following data and compute the residuals.
X: 15 8 21 15 6 8 3
Y: 45 38 55 46 24 33 49
2.9 Activities
A student is required to collect the stock price and stock return of the last 15 days of any particular stock from "Money Control". Now identify the independent and dependent variable, find the Pearson correlation coefficient and the regression line, and comment on the outcome.
According to the Capital Asset Pricing Model (CAPM), the risk associated with a capital asset is proportional to the slope β1 (or simply β: the regression coefficient of Y on X) obtained by regressing the asset's past returns against the corresponding returns of the average portfolio called the market portfolio. (The return of the market portfolio represents the return earned by the average investor. It is a weighted average of the returns from all the assets in the market.) The larger the β of an asset, the larger is the risk associated with that asset. A β of 1.00 represents average risk. The returns from an IT firm's stock and the corresponding returns for the market portfolio for the past 10 years are given below:
Market Return (X):   16 12 11 17 14 13 18 15 08 10
Stock's Return (Y):  21 17 14 22 16 15 24 18 05 08
Answer the following questions:
1. What are the independent and dependent variables?
2. Carry out the regression and find the β for the stock. What is the regression
equation?
3. Does the value of the slope indicate that the stock has above average risk? (in the
range of 1± 0.1, interpret the risk.)
4. If the market portfolio return for the current year is 25%, what is the stock's return?
5. Calculate standard error of estimate
6. Calculate the Pearson correlation coefficient and the coefficient of determination and state their interpretation.
7. Carry out residual analysis for each value
Unit No. 3 Forecasting
_______________________________________
___
Unit Structure
3.0 Learning Objectives
3.1 Introduction
Let Us Sum Up
3.9 Glossary
3.10 Assignment
3.11 Activities
3.0 Learning Objectives
3.1 Introduction
Forecasting is a technique that we use in our day-to-day routine. Every day, forecasting is used in decision making as the science and art of predicting the future and then planning accordingly. It is a process which helps business people to reach
conclusions about buying, selling, producing, hiring, planning, manufacturing, inventory
management and many other actions. As an example, consider the following:
• Market watchers predict a low and high price, return on stock values short term,
medium term and long term.
• City planners forecast rain, temperature etc. in a particular city.
• Rising demand of laptops
• Predicting the future for paper industry
• Life insurance outlooks for number of claims for the next year.
• Trends or changes in demand for clothing or apparels over the period of time
• Change in eating habits over the period of time
How are these and other conclusions reached? What forecasting techniques are used? Are
the forecasts accurate? Here we will discuss several forecasting techniques, how to measure
the error of a forecast, and some of the problems that can occur in forecasting. Managers
are always trying to reduce uncertainty and make better estimates of what will happen in
the future. This is the main purpose of forecasting. So, in this unit we will focus only
on quantitative and causal models where data occur over time, time-series data. Time-
series data are data gathered on a given characteristic over a period of time at regular
intervals. Time-series forecasting techniques attempt to account for changes over time by
examining patterns, cycles, or trends, or using information about previous time periods to
predict the outcome for a future time period. Time-series methods include Moving
averages, Exponential smoothing, Least square regression trend analysis.
Sales Force Composite: This allows individual sales persons to estimate the sales in their
region and the data is compiled at a district or national level.
2. Time-series models: attempt to predict the future based on the past. Common time-
series models are: Moving average, Exponential smoothing, Trend projections.
3. Causal models: use variables or factors that might influence the quantity being
forecasted. The objective is to build a model with the best statistical relationship between
the variable to be forecasted and the independent variables. Regression analysis is
the most common technique used in causal modeling.
Check your progress 1
1. To apply the causal model approach, which of the following concepts can be used?
a) Regression Analysis
b) Decision Theory
c) Moving Average
d) Exponential Smoothing
A time series is a sequence of observations recorded at evenly spaced points in time. Time-series forecasts predict the
future based solely on the past values of the variable, and other variables are ignored.
3.4.1 Components of Time-Series Analysis: A time series typically has four components:
Trend (T): the gradual upward or downward movement of the data over time. It shows up over a longer period, generally more than five years, e.g. the trend in preference for mobile phones or the choice of new homes.
Seasonal Change (S): a pattern of demand fluctuations above or below the trend line that repeats at regular intervals, generally within a year, e.g. flu cases rising every year during the monsoon season.
Cycles (C): patterns in annual data that occur every several years, e.g. a general election every 5 years or a census every 10 years.
Random Variations (R): irregular, unpredictable fluctuations caused by chance events such as strikes, floods or political disturbances.
Moving averages can be used when demand is relatively steady over time. The next
forecast is the average of the most recent n data values from the time series. This method
tends to smooth out short-term irregularities in the data series.
Moving Average Forecast = (Sum of demand in previous n periods) / n
Mathematically, F(t+1) = (Y(t) + Y(t−1) + … + Y(t−n+1)) / n, where Y(t) is the actual value in period t and n is the number of periods in the average.
Example I
The demand for a product in each of the last five months is shown below.
Month: 1 2 3 4 5
Demand ('00s): 13 17 19 23 24
Solution of Example I
The two-month moving averages for months two to five are:
m2 = (13 + 17)/2 = 15.0, m3 = (17 + 19)/2 = 18.0, m4 = (19 + 23)/2 = 21.0, m5 = (23 + 24)/2 = 23.5
The forecast for month six is just the moving average for the month before it, i.e. the moving average for month 5: m5 = 23.5 (hundreds), that is 2350.
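A short sketch of the two-month moving-average computation for Example I (names are illustrative):

def moving_averages(demand, n):
    # n-period moving averages; each one serves as the forecast for the next period.
    return [sum(demand[i - n:i]) / n for i in range(n, len(demand) + 1)]

demand = [13, 17, 19, 23, 24]          # monthly demand in hundreds
mas = moving_averages(demand, 2)
print(mas)              # [15.0, 18.0, 21.0, 23.5]
print(mas[-1] * 100)    # forecast for month six: 2350 units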
Exponential smoothing is a type of moving average that is easy to use and requires little
record keeping of data. The new estimate is the old estimate plus some fraction of the error
in the last period. The general approach is to develop trial forecasts with different values of α and select the α with the lowest mean absolute deviation (MAD), which will be discussed in the next section.
New forecast = Last period's forecast + α × (Last period's actual demand – Last period's forecast)
where α is a weight (or smoothing constant) in which 0 ≤ α ≤ 1.
Mathematically, F(t+1) = F(t) + α × (A(t) − F(t)), where F(t) is the forecast for period t and A(t) is the actual value in period t.
Example II
In January, February’s demand for a certain car model was predicted to be 150.Actual
February demand was 166 autos. Using a smoothing constant of = 0.20, what is the
forecast for March?
Solution of Example II
New forecast (for March demand) = 150 + 0.2 x(166 – 150)= 153.2 or 153 autos
If actual demand in March was 146 autos, the April forecast would be:
New forecast (for April demand) = 153.2 + 0.2x(146 – 153.2)= 151.76 or 152 autos
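The same updates can be scripted as below, using the figures of Example II (names are illustrative):

def exp_smooth(last_forecast, last_actual, alpha):
    # One-step exponential smoothing update.
    return last_forecast + alpha * (last_actual - last_forecast)

march = exp_smooth(150, 166, 0.20)
print(march)                      # 153.2, i.e. about 153 autos
april = exp_smooth(march, 146, 0.20)
print(round(april, 2))            # 151.76, i.e. about 152 autos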
Forecast accuracy is judged by comparing forecasted values with actual values to see how well the model works. Several measures are available, for example the mean absolute deviation, MAD = Σ|Actual − Forecast| / n, and the mean squared error, MSE = Σ(Actual − Forecast)² / n.
Example III
The table below shows the demand for a new aftershave in a shop for each of the last 7
months.
Month: 1 2 3 4 5 6 7
Demand: 23 29 33 40 41 43 49
a) Calculate a two-month moving average for months two to seven. What would be your
forecast for the demand in month eight?
b) Apply exponential smoothing with a smoothing constant of 0.1 to derive a forecast for
the demand in month eight.
c) Which of the two forecasts for month eight do you prefer and why?
a) The two-month moving averages for months two to seven are:
m2 = 26.0, m3 = 31.0, m4 = 36.5, m5 = 40.5, m6 = 42.0, m7 = 46.0
The forecast for month eight is just the moving average for the month before it, i.e. the moving average for month 7: m7 = 46.
b) Applying exponential smoothing with a smoothing constant of 0.1:
M1 = Y1 = 23
M2 = 0.1Y2 + 0.9M1 = 0.1x(29) + 0.9x(23) = 23.60
M3 = 0.1Y3 + 0.9M2 = 0.1x(33) + 0.9x(23.60) = 24.54
M4 = 0.1Y4 + 0.9M3 = 0.1x(40) + 0.9x(24.54) = 26.09
M5 = 0.1Y5 + 0.9M4 = 0.1x(41) + 0.9x(26.09) = 27.58
M6 = 0.1Y6 + 0.9M5 = 0.1x(43) + 0.9x(27.58) = 29.12
M7 = 0.1Y7 + 0.9M6 = 0.1x(49) + 0.9x(29.12) = 31.11
As before the forecast for month eight is just the average for month 7 = M 7 = 31.11 = 31
(as we cannot have fractional demand).
c) To compare the two forecasts we calculate the mean squared deviation (MSD) of the one-month-ahead forecasts for each method. Doing so, the two-month moving average has a noticeably lower MSD than the exponentially smoothed average with a smoothing constant of 0.1. Overall, then, the two-month moving average appears to give the better one-month-ahead forecasts, so we prefer the forecast of 46 produced by the two-month moving average. In the same way MSE can be used to compare the results and arrive at the final decision.
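As an illustrative check, the sketch below recomputes one-month-ahead forecasts for the Example III data with both methods and reports MAD and MSD; the exact figures depend on which months are included in the comparison, so treat them as indicative only.

demand = [23, 29, 33, 40, 41, 43, 49]

# Two-month moving average: forecast for month index t is the mean of the two
# preceding observations.
ma_forecasts = {t: (demand[t - 2] + demand[t - 1]) / 2 for t in range(2, len(demand))}

# Exponential smoothing with alpha = 0.1, initialised at the first observation.
alpha, m = 0.1, demand[0]
es_forecasts = {}
for t in range(1, len(demand)):
    es_forecasts[t] = m                      # forecast for month index t
    m = m + alpha * (demand[t] - m)          # update with the actual value

def mad_msd(forecasts):
    errors = [demand[t] - f for t, f in forecasts.items()]
    n = len(errors)
    return sum(abs(e) for e in errors) / n, sum(e * e for e in errors) / n

print(mad_msd(ma_forecasts))   # moving average: the lower MAD and MSD here
print(mad_msd(es_forecasts))   # exponential smoothing (alpha = 0.1): higher errors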
2. An orderly set of data arranged in accordance with their time of occurrence is called:
(a) Arithmetic series (b) Harmonic series (c) Geometric series (d) Time series
5. Damages due to floods, droughts, strikes, fires and political disturbances are:
(a) Trend (b) Seasonal (c) Cyclical (d) Irregular
Example IV
The sales of a company (in thousand rupees) for each year are shown in the table below.
a) Find the least squares regression line y = ax + b.
b) Use the least squares regression line as a model to estimate the sales of the company in
2012.
Solution of Example IV
a) We first change the variable x into t such that t = x - 2005 and therefore t represents the
number of years after 2005. Using t instead of x makes the numbers smaller and therefore
manageable. The table of values becomes.
t (years after 2005): 0 1 2 3 4
Calculate a and b in the least squares regression line formula, using columns for t, y, t*y and t².
We now calculate a and b using the least squares regression formulas for a and b.
Check your progress 3
Given an actual demand of 61, a previous forecast of 58, and α = .3, what would the forecast for the next period be using simple exponential smoothing?
a. 58.9
b. 45.5
c. 65.5
d. 57.1
e. 61.0
1. Given an actual demand of 103, a previous forecast value of 99, and α = 0.4, the next period exponential smoothing forecast is:
a) 96.9
b) 100.6
c) 101.7
d) 102
3. If a = 11.8 and b = 19 for yearly data from 2008 to 2015, the regression equation is _______________ and the forecast for 2018 is _____________.
• Transport planning and Transportation forecasting
• Weather forecasting and Flood forecasting
The unit mainly focuses on the importance of forecasting in all our short-term, medium-term and long-term planning decisions. For long-term planning decisions, qualitative techniques are used, such as technological forecasting and expert opinion gathered through Delphi studies or opinion polls using personal interviews or questionnaires. For medium-term and short-term
decisions, apart from subjective and intuitive methods there is a wide variety of
statistical techniques that could be employed. The methods like Moving averages or
exponential smoothing are based on past data. Any suitable mathematical function
can be fitted to the demand history by using least squares regression. Regression is also
used in estimation of parameters of causal or econometric models.
Answers to check your progress 1
1. a
2. Qualitative, Feelings/Intuition
3. Future Demand
Answers to check your progress 2
1. c
2. d
3. d
4. b
5. d
Answers to check your progress 3
1. b
2. d
3.9 Glossary
Moving Average: An average computed from the K most recent demand points (for a K-period moving average), commonly used for short-term forecasting.
Prediction: A term to denote the estimate or guess of a future variable that may be arrived at by subjective feelings or intuition.
Time Series: Any data on demand, sales or consumption taken at regular intervals of time is a time series. Analysis of this time series to discover patterns of growth, demand, seasonal trends or random fluctuations is known as time series analysis.
Causal Models: Forecasting models wherein the demand, or variable of interest, is related to explanatory or causal variables.
Delphi: A method of collecting information from experts, useful for long term forecasting.
It is iterative and maintains confidentiality to reduce subjective bias.
3.10 Assignment
1. The table below shows the demand for a particular brand of razor in a shop for each of
the last nine months.
Month 1 2 3 4 5 6 7 8 9
Demand 10 12 13 17 15 19 20 21 20
a) Calculate a three-month moving average for months three to nine. What would be your
forecast for the demand in month ten?
b) Apply exponential smoothing with a smoothing constant of 0.3 to derive a forecast for
the demand in month ten.
c) Which of the two forecasts for month ten do you prefer and why?
2. The table below shows the demand for a particular brand of fax machine in a department
store in each of the last twelve months.
Month 1 2 3 4 5 6 7 8 9 10 11 12
Demand 12 15 19 23 27 30 32 33 37 41 49 58
a) Calculate the four-month moving average for months 4 to 12. What would be your
forecast for the demand in month 13?
b) Apply exponential smoothing with a smoothing constant of 0.2 to derive a forecast for
the demand in month 13.
c) Which of the two forecasts for month 13 do you prefer and why?
3. Find the regression trend line for the following data of equity fund investment (In lakhs
of rupees per year) from 2001 to 2018.
1. You are required to collect the data of corona cases registered and recovered from March 20, 2020 to June 20, 2020. Analyse the trend between the two variables, and forecast the number of new cases for the month of July 2020.
2. Visit a manufacturing company which is established for at least 15 years. Select any
product of the company if they are manufacturing more than one product. Collect the data
of price, production, demand, sales year wise. Now identify the change in each variable
data with respect to years passed.
Following are the average yields of long-term new corporate bonds over a several-month
period published by the Market Finance Department of the Treasury.
Month   Yield   Month   Yield   Month   Yield
3       9.24    12      8.12    21      6.88
4       9.23    13      7.91    22      6.88
5       9.69    14      7.73    23      7.17
6       9.55    15      7.39    24      7.12
7       9.37    16      7.48
8       8.55    17      7.52
9       8.36    18      7.48
a) Explore trends in these data by using regression trend analysis.
b) Use a 4-month moving average to forecast values for each of the ensuing months.
c) Use simple exponential smoothing with α = .3 to forecast values for each of the ensuing months, then repeat with a different value of α. Which weight produces better forecasts?
d) Compute MAD for the forecasts obtained in parts (b) and (c) and compare the results.
Further Readings
1. Business Statistics: For Contemporary Decision Making, by Ken Black, Wiley Publication
2. Quantitative Techniques in Management, by N. D. Vora, McGraw Hill
3. Operations Research: Theory and Applications, by J. K. Sharma, Macmillan
4. Operations Research, by Hamdy A. Taha, Pearson Education
5. Quantitative Analysis for Management, by Barry Render, Ralph M. Stair, Michael E. Hanna and T. N. Badri, Pearson Publication
6. Statistics for Management, by Levin and Rubin, Pearson Education
7. Business Statistics, by David M. Levine et al., Pearson Education
8. Use of software like QM for Windows, Excel Solver
Block Summary
In this block, we learned various techniques about the vital aspects of any business that is
decision making and forecasting. Decisions taken by applying quantitative methods may be used to achieve optimum profit or cost, and they can also help in forecasting the future. In the first unit, one-stage and multi-stage decision-making techniques were explained, and decision making under certainty, uncertainty and risk was discussed along with the decision tree approach. In the second unit, linear relationships between independent
and dependent variables were discussed with the help of concepts like correlation,
coefficient of determination and regression analysis. In the last unit, the forecasting
techniques with various models were explained. Third unit also covered the time series
analysis and least square regression analysis.
Block Assignment
If tenders are to be submitted the company will incur additional costs. These costs will have
to be entirely recouped from the contract price. The risk, of course, is that if a tender is
unsuccessful the company will have made a loss.
The cost of tendering for contract MS1 only is 50,000. The component supply cost if the
tender is successful would be 18,000.
The cost of tendering for contract MS2 only is 14,000. The component supply cost if the
tender is successful would be 12,000.
The cost of tendering for both contract MS1 and contract MS2 is 55,000. The component
supply cost if the tender is successful would be 24,000.
For each contract, possible tender prices have been determined. In addition, subjective
assessments have been made of the probability of getting the contract with a particular
tender price as shown below. Note here that the company can only submit one tender and
cannot, for example, submit two tenders (at different prices) for the same contract. Solve
the dilemma with decision tree approach.
4. Forecast next year's sales based on changes in GDP.
Year    Sales   GDP
2015    100     1.00%
2016    250     1.90%
2017    275     2.40%
2018    200     2.60%
2019    300     2.90%
5. Calculate the Pearson product moment correlation coefficient and regression line for
Block Structure
______________________________
Block Introduction
Operations research has always been a vital part of any industry. The agenda of doing research on operations is maximum utilization of available resources within given restrictions. As resources are generally scarce, there is a need to learn techniques which can help in achieving maximum profit along with minimum cost. Thus, in this block we will explore some of the most common and useful techniques of linear programming problems for two or more variables. The first unit describes the formulation of a given problem into a mathematical function, which is then solved with graphical analysis to arrive at a decision. Decisions always concern one of two possible objectives, either maximization of profit or minimization of cost. The second unit describes the simplex method, which is used when two or more decision variables are involved, for utilizing available resources in the best possible way to maximize profit. The third unit describes developing a transportation schedule for shipments from sources to destinations. In the fourth unit we will explore the assignment concept, which is useful for understanding the allocation of jobs/projects to employees, workers or machines with a scientific approach.
Block Objectives
After learning this block, you will be able to:
• Understand the practicality of the concept with stated assumptions
• Understand the basic feasible solution of a transportation problem by various methods
• Obtain the minimum transportation cost schedule by using Modified Distribution
Method
• Discuss the special cases of transportation
• Discuss the steps of learned method when problem is related to minimization
• Understand the concept and assumptions in comprehensive manner
• Learn algorithm of Hungarian assignment method
• Use the algorithm for solving an assignment problem
• Learn special cases of assignment
Block Structure
Unit 3: Transportation
Unit 4: Assignment
______________________________
Unit Structure
1.0 Learning Objectives
1.1 Introduction
1.1.1 Characteristics of LPP
1.2 Formulation of Linear Programming Problem (LPP)
1.2.1 Steps of Linear Programming Formulation
1.2.2 Examples of LPP Formulation
Check your progress 1
1.3 Graphical Analysis
1.3.1 Steps of Graphical Analysis
1.3.2 Example of Graphical Analysis
1.3.3 Slack and Surplus
1.3.4 Convex and Non-Convex Set
Check your progress 2
1.4 Types of constraints
1.5 Special Cases
1.5.1 Multiple Optimal Solutions
1.5.2 Unbounded Solution
1.5.3 Infeasibility
Check your progress 3
1.6 Application Areas of Linear Programming in Business
1.7 Let Us Sum Up
1.8 Answers for Check your Progress
1.9 Glossary
1.10 Assignment
1.11 Activities
1.12 Case Study
1.13 Further Readings
1.1 Introduction
Only the graphical method for two decision variables is presented in this unit; easy and efficient computational procedures, known as algorithms, are available to solve larger linear programming problems. The development of various software packages has been helpful in solving these problems with a large number of decision variables and constraints.
1.1.1 Characteristics of LPP
• One objective function- maximization or minimization
• One or more constraints- that limits the degree to which the objective can be obtained
• Mathematical relationships of objectives and constraints are always linear
• Linear programming models are always deterministic in nature
• Finite choices – the decision variables can take only non-negative values
The formulation of a linear programming problem can be explained through product mix
problem. Typically, it occurs in a manufacturing industry where there is a requirement of
manufacturing variety of products with given set of resources. Each of the products has a
certain margin of profit per unit and cost per unit. These products use a common bunch of
resources – according to availability. The linear programming technique identifies the
combination of the products which will either maximize the profit or minimize the cost
without violating the restrictions related to resources. So, the company would like to
determine how many units of each product it should produce so as to maximize overall
profit or minimize overall production cost. Basically, it involves two types of LPPs:
Maximization (Profit) and Minimization (Cost).
Example I (Maximization)
The Jay Ambe Company produces two types of products, tables and chairs. The processing requirements are given below.
Solution of Example I
The firm wants to determine the best combination of tables and chairs to produce to reach the maximum profit.
Hours required to produce one unit
Department        Tables (x1)   Chairs (x2)   Available Hours/Week
Carpentry         5             4             250
Painting          2             2             110
Profit Per Unit   65            60
• The hours of carpentry time used cannot exceed 250 hours per week.
• The hours of painting time used cannot exceed 110 hours per week.
The decision variables according to the two types of products are:
• x1 = number of tables to be produced per week
• x2 = number of chairs to be produced per week
Now, write the LP objective function in terms of x1 and x2:
Maximize Z = 65x1 + 60x2
We know that we can use the total carpentry time available, or less, but not more than that:
5x1 + 4x2 ≤ 250 (hours of carpentry time)
Similarly, for painting, the constraint is 2x1 + 2x2 ≤ 110. Both of these constraints restrict the production capacity, and
x1, x2 ≥ 0 (non-negativity constraint)
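As a quick numerical check of the formulation, the model can be handed to scipy.optimize.linprog (a sketch under the assumption that SciPy is available; linprog minimizes, so the profit coefficients are negated):

from scipy.optimize import linprog

# Maximize 65*x1 + 60*x2, i.e. minimize the negated objective.
c = [-65, -60]
A_ub = [[5, 4],     # carpentry hours
        [2, 2]]     # painting hours
b_ub = [250, 110]

res = linprog(c, A_ub=A_ub, b_ub=b_ub)   # x1, x2 >= 0 by default
print(res.x, -res.fun)   # roughly x1 = 30 tables, x2 = 25 chairs, profit = 3450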
Example II (Minimization)
A farm is engaged in breeding pigs. The pigs are fed on various products grown on the farm; the relevant nutrient data are given below.
Nutrient   Nutrient content in feed A   Nutrient content in feed B   Minimum requirement of nutrient for a pig
M1         12                           6                            108
M2         3                            9                            81
M3         15                           10                           150
Solution of Example II
And we know that we are required to feed minimum or equal to nutrient amount but not
less than that
12A + 6B ≥ 108 (Minimum Nutrient M1 Requirement)
Similarly, for nutrients M2 and M3, the constraints are 3A + 9B ≥ 81 and 15A + 10B ≥ 150 respectively.
From the above examples, we can see that maximization problems typically have "less than or equal to" constraints, while minimization problems typically have "greater than or equal to" constraints. Sometimes, however, a problem contains a combination of both "less than" and "greater than" types of constraints, according to the availability of resources.
1.3 Graphical Analysis
The easiest way to solve a small LPP is graphically. The graphical method only works
when there are just two decision variables. When there are more than two variables, a more
complex approach is needed as it is not possible to plot the solution on a two-dimensional
graph. The graphical method provides valuable insight into how other approaches work.
Example III
Step 1
The first step in solving the problem is to identify a set or region of feasible solutions.
To do this we plot each constraint equation on a graph.
For the first (drilling) constraint, 4x1 + 3x2 = 240:
4(0) + 3x2 = 240
3x2 = 240
x2 = 80, giving the point (x1 = 0, x2 = 80)
Setting x2 = 0 gives 4x1 = 240, i.e. x1 = 60, giving the point (x1 = 60, x2 = 0)
Step 2
For the second (milling) constraint, 2x1 + x2 = 100:
2(0) + x2 = 100
x2 = 100, giving the point (x1 = 0, x2 = 100)
Setting x2 = 0 gives 2x1 = 100, i.e. x1 = 50, giving the point (x1 = 50, x2 = 0)
[Graph: drilling constraint 4x1 + 3x2 = 240 and milling constraint 2x1 + x2 = 100, with the corner points 1–4 of the feasible region]
Step 3
In above graph, there is a feasible region which means “The region which satisfies all the
constraints”. For drilling and milling constraints, maximum availability is 240 and 100
respectively so identify the common region which satisfies both the constraints. (For
common feasible region identification, consider the sign of constraint in terms of “Less
than”, “Greater Than” or “Equal To”).
Once the feasible region has been graphed, we need to find the optimal solution from the
many possible solutions. This approach is known as Corner Point Method. It involves
looking at the profit at every corner point of the feasible region. The mathematical theory
behind LP is that the optimal solution must lie at one of the corner points, or extreme point,
in the feasible region. For this example, the feasible region is a four-sided polygon with
four corner points labeled 1, 2, 3, and 4 on the graph.
To find the coordinates of Point 3 accurately, we have to solve for the intersection of the two constraint lines. Using the simultaneous equations method, we multiply the milling equation by –2 and add it to the drilling equation.
Find the final solution by putting all x1 and x2 values into the objective function.
Points   x1    x2    Maximize Z = 70x1 + 50x2
1        0     0     0
2        0     80    4000
3        30    40    4100
4        50    0     3500
Because Point 3 returns the highest profit, this is the optimal solution.
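A tiny sketch of the corner-point evaluation for this example (corner points read from the graph; the helper is illustrative):

# Evaluate the objective at every corner of the feasible region.
corners = [(0, 0), (0, 80), (30, 40), (50, 0)]

def z(x1, x2):
    return 70 * x1 + 50 * x2

best = max(corners, key=lambda p: z(*p))
print(best, z(*best))   # (30, 40) with Z = 4100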
1.3.3 Slack and Surplus
Slack is the amount of a resource that is not used. For a less-than-or-equal constraint, Slack = (amount of resource available) − (amount of resource used).
In Example III the optimal solution is x1 = 30 and x2 = 40. Put these values into 4x1 + 3x2 ≤ 240:
4(30) + 3(40) = 240; here LHS = RHS, so there is no slack.
Similarly, put these values into 2x1 + 1x2 ≤ 100:
2(30) + 1(40) = 100; here LHS = RHS, so there is no slack and no surplus.
In both constraints, full utilization of the resources has occurred, so there is no slack.
Surplus is used with a greater-than-or-equal constraint to indicate the amount by which the right-hand side of the constraint is exceeded.
For example, if the actual amount is 240 but the minimum requirement is only 160, you will have the remaining value of (240 – 160 = 80) as surplus.
If any two points are selected in the region and the line segment formed by joining these two points lies completely within the feasible region, then it is a Convex Set; a feasible region is always a convex set.
If two points can be selected in the region such that the line segment formed by joining them does not lie completely within the region, then it is a Non-Convex Set.
2. The objective function for an LP model is 3x1 + 2x2. If x1 = 20 and x2 = 30, what is the value of the objective function?
A) 0
B) 50
C) 60
D) 120
3. The graphical method can only be used when there are _____ decision variables
1. Binding Constraints: If in the constraints LHS = RHS when optimal values of the
decision variables are substituted into the constraints then those constraints are binding
constraints
2. Non - Binding Constraints: If in the constraints LHS ≠ RHS when optimal values of
the decision variables are substituted into the constraints then those constraints are Non-
binding constraint
3. Redundant Constraints: When a constraint, when plotted, does not form part of the
boundary marking the feasible region of the problem, it is said to be Redundant
It does not affect the optimal solution to the problem
1.5.1 Multiple Optimal Solutions: more than one solution gives the same optimal value of profit or cost, so the optimal solution is not unique.
Example IV
Maximize Z = 4x1 + 3x2
Subject to
4x1+ 3x2 ≤ 24
x1 ≤ 4.5
x2 ≤ 6
x1 ≥ 0 , x 2 ≥ 0
Solution of Example IV
The corner points of feasible region are A, B, C and D. So the coordinates for the corner
points are
A (0, 6)
B (1.5, 6) (Solve the two equations 4x1+ 3x2 = 24 and x2 = 6 to get the coordinates)
C (4.5, 2) (Solve the two equations 4x1+ 3x2 = 24 and x1 = 4.5 to get the coordinates)
D (4.5, 0)
We know that Max Z = 4x1 + 3x2
At A (0, 6)
Z = 4(0) + 3(6) = 18
At B (1.5, 6)
Z = 4(1.5) + 3(6) = 24
At C (4.5, 2)
Z = 4(4.5) + 3(2) = 24
At D (4.5, 0)
Z = 4(4.5) + 3(0) = 18
Max Z = 24, which is achieved at both B and C corner points. It can be achieved not only
at B and C but every point between B and C. Hence the given problem has multiple optimal
solutions.
Example V
Maximize Z = 3x1 + 5x2
Subject to
2x1 + x2 ≥ 7
x1 + x2 ≥ 6
x1 + 3x2 ≥ 9
x1 ≥ 0, x2 ≥ 0
Solution of Example V
The second constraint x1+ x2 ≥ 6, written in a form of
equation x1+ x2 = 6
Put x1 =0, then x2 = 6
Put x2 =0, then x1 = 6
The coordinates are (0, 6) and (6, 0)
The corner points of feasible region are A, B, C and D. So the coordinates for the corner
points are
A (0, 7)
B (1, 5) (Solve the two equations 2x1+ x2 = 7 and x1+ x2 = 6 to get the coordinates)
C (4.5, 1.5) (Solve the two equations x1+ x2 = 6 and x1+ 3x2 = 9 to get the coordinates)
D (9, 0)
We know that Max Z = 3x1 + 5x2
At A (0, 7)
Z = 3(0) + 5(7) = 35
At B (1, 5)
Z = 3(1) + 5(5) = 28
At C (4.5, 1.5)
Z = 3(4.5) + 5(1.5) = 21
At D (9, 0)
Z = 3(9) + 5(0) = 27
The values of objective function at corner points are 35, 28, 21 and 27. But there exists
infinite number of points in the feasible region which is unbounded. The value of objective
function will be more than the value of these four corner points i.e. the maximum value of
the objective function occurs at a point at ∞. Hence the given problem has unbounded
solution.
1.5.3 Infeasibility: The set of values of the decision variables which do not satisfy all the constraints and non-negativity conditions of an LP problem simultaneously is said to constitute an infeasible solution to that linear programming problem. In short, infeasibility arises when it is not possible to find a common region that satisfies all constraints simultaneously.
Example VI
Subject to
x1+ x2 ≤ 1
x1+ x2 ≥ 3
x1 ≥ 0 , x 2 ≥ 0
Solution of Example VI
There is no common feasible region generated by two constraints together i.e. we cannot
identify even a single point satisfying the constraints. Hence there is no optimal solution.
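A hedged sketch of how these special cases surface numerically is given below, using scipy.optimize.linprog on the unbounded Example V and the infeasible Example VI (constraint data as reconstructed above; the solver's status field reports the condition):

from scipy.optimize import linprog

# Example V: maximize 3x1 + 5x2 with only >= constraints -> unbounded.
# ">=" rows are rewritten as "<=" by multiplying both sides by -1.
res_v = linprog(c=[-3, -5],
                A_ub=[[-2, -1], [-1, -1], [-1, -3]],
                b_ub=[-7, -6, -9])
print(res_v.success, res_v.status)    # False, status 3 = problem is unbounded

# Example VI: x1 + x2 <= 1 and x1 + x2 >= 3 cannot both hold -> infeasible.
# (The objective coefficients are placeholders; infeasibility does not depend on them.)
res_vi = linprog(c=[1, 1],
                 A_ub=[[1, 1], [-1, -1]],
                 b_ub=[1, -3])
print(res_vi.success, res_vi.status)  # False, status 2 = problem is infeasible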
4. An infinite number of feasible solutions, none of which can be termed an optimal solution, is known as the ______________ special case of LPP.
5. If two or more solutions have the same value of maximum profit or minimum cost, the case is termed ________________________
Production Mix: deciding the number of units to produce of one or more different products so as to maximize the profit or minimize the cost.
Ingredient Mix: deciding the mixing proportion of ingredients for making one or more products.
In this unit, we started with the general introduction of linear programming problem
followed by identification of the decision variables which are with some economic or
physical quantities, whose values are of major interest to the management. The problem
must have a well-defined objective function expressed in terms of the decision variables.
The objective function must be maximized when it expresses the profit or contribution. In
case the objective function indicates a cost, it must be minimized. When a problem of
management is expressed in terms of the mathematical function by using decision variables
with appropriate objective function and constraints, the problem has been formulated. A
linear programming problem with only two decision variables can be solved graphically.
Any non-negative solution which satisfies all the constraints is known as a feasible solution
of the problem. The common region which satisfies all the constraints is known as a
feasible region. The value of the decision variables which maximize or minimize the
objective function is located on the extreme point of the convex set (Feasible Region)
formed by the feasible solutions. Among all the feasible solutions, there can be one or more optimal solutions. Sometimes the problem may be infeasible, indicating that no feasible solution of the problem exists. Sometimes there is no boundary to form the convex set; the feasible region is then unbounded, and although an infinite number of feasible solutions exist, none of them can be termed an optimal solution. The different applicability of linear programming is also
discussed in this unit.
3x1 + 2x2 ≥ 9 (Mediocre Iron Constraint)
4x1 + 10x2 ≥ 22 (Bad Iron Constraint)
x1, x2 ≥ 0 (Non-negativity constraint)
2. False Formatted: Subscript
3. A
1. A
2. D
3. Two
4. Feasible
1. Binding
2. D
3. True
4. Unboundedness
1.9 Glossary
Decision Variables: are economic or physical quantities whose numerical values indicate
the solution of the linear programming problem.
A Redundant Constraint: is a constraint which does not affect the feasible region.
A Convex Set: is a collection of points such that for any two points on the set, the line
joining the points belongs to the set.
Non-Convex Set: a region in which two points can be selected such that the line segment formed by joining them does not lie completely within the region.
1. A retired person wants to invest up to an amount of Rs. 30,000 in fixed income securities.
His broker recommends investing in two bonds: Bonds A yielding 7% and Bond B
yielding 10%. After some consideration, he decides to invest at most Rs. 12,000 in Bond A and at least Rs. 6,000 in Bond B. He also wants the amount invested in Bond A to be at least equal to the amount invested in Bond B. What should the broker recommend if the investor wants to maximize his return on investment? Solve graphically.
2. A firm manufactures two products TV & DVD player which must be processed through
two processes, Assembly and Finishing. Assembly process has 90 hours available and
finishing process has 82 hours available. One TV set requires 5 hours in assembly and 3 hours in finishing, while one DVD player set requires 6 hours in assembly and 4 hours in
finishing. If profit is Rs. 900 per TV and Rs. 600 per DVD player set, find out the best
combination of TV and DVD player set to realize a maximum profit.
Y≤4,
X,Y≥0.
Observe the solution and comment on it.
5. A firm uses lathes, milling and grinding machines to produce two parts. Following table
represents the machining times required for each part, available machine time on different
machines and the profit values:
Machine Type           Required machine time (min): Part I   Part II   Maximum time available per week (min)
Lathes                 12                                     6         3000
Milling Machines       4                                      10        2000
Grinding Machines      2                                      3         900
Profit per Unit (Rs)   40                                     100
Visit a manufacturing company, collect the data regarding any two types of products they
produce which use any number of common resources, cost or profit per unit of product,
minimum or maximum availability of resources, number of hours or kgs etc. required to
produce one unit of product. Then prepare a table of the information, formulate as LPP and
solve graphically to identify optimal cost or profit
1) Solve the problem using the graphical method to determine the optimum product mix of capacitors and resistors for the next month. Also determine the corresponding optimum achievable profit due to sales of resistors and capacitors. Which facilities are fully utilized and which resources are left unused at the optimal stage?
2) Are there alternate (multiple) optimal solutions available to Mr. Pavan Kumar? If so,
suggest another solution.
1. Quantitative Techniques in Management, by N. D. Vora, McGraw Hill
2. Operations Research: Theory and Applications, by J. K. Sharma, Macmillan
3. Operations Research, by Hamdy A. Taha, Pearson Education
4. Quantitative Analysis for Management, by Barry Render, Ralph M. Stair, Michael E. Hanna and T. N. Badri, Pearson Publication
5. Use of software like QM for Windows, Excel Solver
Unit No. 2 Simplex Method
______________________________________________________________________
Unit Structure
2.0 Learning Objectives
2.1 Introduction
2.2 Simplex Method
2.2.1 Algorithm of simplex method
2.2.2 Principles of simplex method
2.2.3 Computational part of simplex method
Check your progress
2.3 Let Us Sum Up
2.4 Answers for Check your Progress
2.5 Assignment
2.6 Activities
2.7 Case Study
2.8 Further Readings
2.0 Learning Objectives
2.1 Introduction
The graphical method of solving a linear programming problem that you learned in Unit 1 of this block is a vital help in understanding the basic structure of the problem, but the method has limited application to industrial problems, where the number of variables occurring is usually substantially large. A more useful method, known as the Simplex Method, is suitable for solving linear programming problems with a larger number of variables (two or more). This method, through an iterative process, progressively approaches and
finally reaches to the maximum or minimum value of the objective function. The method
also helps the decision maker to identify the redundant constraints, an unbounded solution,
multiple solution and an infeasible problem.
Every linear programming problem has a dual problem associated with it. The solution of
this problem is readily obtained from the solution of the original problem if simplex method
is used for this purpose. The variables of dual problem are known as dual variables or
shadow price of the various resources. The solution of the dual problem can be used by the
decision maker for augmenting the resources.
The simplex method was developed by G. Dantzig in 1947. The simplex method provides an
algorithm which is based on the fundamental theorem of linear programming. The Simplex
algorithm is an iterative procedure for solving LP problems in a finite number of steps. It
consists of following:
• Having a trial basic feasible solution to constraint-equations
• Testing whether it is an optimal solution
• Improving the first trial solution by a set of rules and repeating the process till an optimal
solution is obtained
To solve a linear programming problem in standard form, use the following steps.
1. Convert each inequality in the set of constraints to an equation by adding slack variables.
2. Create the initial simplex tableau and Calculation of Zj and test the basic feasible solution
for optimality.
3. This step is to improve the basic feasible solution, the vector entering the basis matrix
and the vector to be removed from the basis matrix are determined. Locate the highest
negative entry in the bottom row. The column for this entry is called the entering column.
(If ties occur, any of the tied entries can be used to determine the entering column). Now
find the minimum ratio, considering the column corresponding to the incoming variable. Select the row with the minimum ratio as the outgoing variable (rows with negative ratios are never considered). The intersection of the incoming variable's column and the outgoing variable's row is selected as the key element.
4. Mark the key element at the intersection of incoming and outgoing variables. Divide all
the elements of that row by the key element. Then subtract appropriate multiples of this
new row from the remaining rows, so as to obtain zeroes in the remaining position of the
respective column.
5. Repeat steps 3 and 4 until no negative entry remains in the bottom row.
6. If all entries in the bottom row are zero or positive, this is the final tableau.
Since the left-hand side of each inequality is less than or equal to the right-hand side, there must exist non-negative numbers s1 and s2 that can be added to the left side of each inequality to produce the following system of linear equations. The numbers s1 and s2 are called slack
variables because they take up the “slack” in each inequality. Remember that slack
variables are counted only for constraints not for objective function.
Example I
Maximize Z = 3x1 + 2x2
Subject to
x1 + x2 ≤ 4
x1 – x2 ≤ 2
and x1 ≥ 0, x2 ≥ 0
Solution of Example I
1. Convert each inequality in the set of constraints to an equation by adding slack variables.
Subject to
x1 + x2+ s1= 4
x1 – x2 + s2= 2
x1 ≥ 0, x2 ≥ 0, s1 ≥ 0, s2 ≥ 0
2. Create the initial simplex tableau, calculate Zj – Cj, and test the basic feasible solution for optimality.
The simplex method is carried out by performing elementary row operations on a matrix that we call the simplex table (tableau). This table consists of the coefficients of the constraints together with the coefficients of the objective function written in a specific form. In the initial simplex table the objective-function coefficients appear with a negative sign in the bottom (Zj – Cj) row.
In the calculations below, Cj denotes the objective-function coefficient of x1, x2, s1 and s2, and C′B the coefficients of the current basic variables.
x1: C′B X1 – Cj = (0 × 1 + 0 × 1) – 3 = –3
x2: C′B X2 – Cj = (0 × 1 + 0 × (–1)) – 2 = –2
s1: C′B X3 – Cj = (0 × 1 + 0 × 0) – 0 = 0
s2: C′B X4 – Cj = (0 × 0 + 0 × 1) – 0 = 0
In this problem it is observed that there are negative values -3 and -2. Hence proceed to
improve this solution.
3. Improve the basic feasible solution by determining the vector entering the basis matrix and the vector to be removed from it. Locate the highest negative entry in the bottom row; its column is the entering column (if ties occur, any of the tied entries can be used). Then find the minimum non-negative ratio of the solution values to the entries of the incoming column; the row with the minimum ratio gives the outgoing variable (negative ratios are never considered). The intersection of the incoming variable column and the outgoing variable row gives the key element.
Basic variable   CB    x1 (= Xk)    x2     s1     s2    XB (RHS)   Minimum ratio XB/Xk
s1                0        1          1      1      0       4          4/1 = 4
s2                0        1         -1      0      1       2          2/1 = 2  → outgoing variable
Zj – Cj                   -3         -2      0      0     Z = C′B·XB = 0
                           ↑ incoming variable; the key element is the 1 at the intersection of the x1 column and the s2 row.
4. Mark the key element at the intersection of the incoming column and the outgoing row. Divide all the elements of that row by the key element; then subtract appropriate multiples of this new row from the remaining rows so as to obtain zeroes in the remaining positions of the column Xk.
Here the key element is 1, so the second row is divided by 1 and remains unchanged. For the first row use R1 = R1 – R2, that is 1 – 1 = 0, 1 – (–1) = 2, 1 – 0 = 1, 0 – 1 = –1, 4 – 2 = 2 respectively.
In the resulting second tableau the Zj – Cj row still contains a negative value (–5 under x2), so it is not an optimal solution; steps 3 and 4 are repeated with x2 as the incoming variable.
After the next iteration all the values in the Zj – Cj row are zero or positive, so this is an optimal solution. The XB values give the answer: x1 = 3 and x2 = 1, and the maximum profit is Z = 3×3 + 2×1 = 11.
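As a cross-check (assuming SciPy is installed), the same problem can be handed to a general-purpose LP solver; linprog minimizes, so the objective coefficients are negated.

from scipy.optimize import linprog

# Example I: maximise 3*x1 + 2*x2 subject to x1 + x2 <= 4 and x1 - x2 <= 2.
res = linprog(c=[-3, -2], A_ub=[[1, 1], [1, -1]], b_ub=[4, 2],
              bounds=[(0, None), (0, None)], method="highs")
print(res.x, -res.fun)   # [3. 1.] 11.0, matching the tableau result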
Example II
Maximize Z = 80x1 + 55x2
Subject to
4x1 + 2x2 ≤ 40
2x1 + 4x2 ≤ 32
and x1 ≥ 0, x2 ≥ 0
Solution of Example II
Cj →                              80        55        0         0
Basic        CB      XB           x1        x2        s1        s2        Min ratio XB/Xk
Variables
s1            0      40            4         2         1         0        40/4 = 10 → outgoing
s2            0      32            2         4         0         1        32/2 = 16
Z = CB·XB = 0                     -80       -55        0         0
                                   ↑ incoming

x1           80      10            1        1/2       1/4        0        10/(1/2) = 20
s2            0      12            0         3       -1/2        1        12/3 = 4 → outgoing
Z = 800                            0       -15        20         0
                                             ↑ incoming

x1           80       8            1         0        1/3      -1/6
x2           55       4            0         1       -1/6       1/3
Z = 860                            0         0       35/2        5

Answer: x1 = 8 and x2 = 4, so the maximum is Z = 860.
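The same kind of check (again assuming SciPy) confirms the result of Example II:

from scipy.optimize import linprog

# Example II: maximise 80*x1 + 55*x2 subject to 4*x1 + 2*x2 <= 40 and 2*x1 + 4*x2 <= 32.
res = linprog(c=[-80, -55], A_ub=[[4, 2], [2, 4]], b_ub=[40, 32],
              bounds=[(0, None), (0, None)], method="highs")
print(res.x, -res.fun)   # [8. 4.] 860.0, matching the final tableau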
Check your progress 1
1. In the simplex method, a tableau is optimal only if all the Zj – Cj values in the bottom row are:
(a) zero or negative.
(b) zero.
(c) negative and nonzero.
(d) positive and zero.
2. A linear programming problem involving more than two variables can be solved by:
Subject to,
x1 + 2x2 + 2x3 ≤ 8
3x1 + 2x2 + 6x3 ≤ 12
2x1 + 3x2 + 4x3 ≤ 12
x1, x2, x3 ≥ 0
The simplex method is the appropriate method for solving a linear programming problem
with more than two decision variables. For less than or equal to type constraints slack
variables are introduced to convert inequalities to equations. A type of solution known as
a basic feasible solution is important for simplex computation. A basic feasible solution of
a system with m equations and n variables has m non-negative variables known as basic
variables and n – m variables with value zero known as non-basic variables. An initial basic
feasible solution can always be found with the help of the slack variables. The objective function is
maximized at one of the basic feasible solutions. Starting with the initial basic feasible
solution obtained from the slack variables, the simplex method improves the value of the
objective function step by step by bringing in a new basic variable and making one of the
present basic variables non basic. The selection of the new basic variable and the omission
of a current basic variable are performed following certain rules so that the revised basic
feasible solution improves the value of the objective function. The iterative procedure stops
when it is no longer possible to obtain a better value of the objective function than the
present one. The existing basic feasible solution is the optimum solution of the problem
which maximizes objective function.
2.4 Answers for Check your Progress
1. d
2. a
3. Z = 12 where x1 = 4, x2 = 3
2.5 Glossary
2.6 Assignment
Subject to
x1 + 2x2 + x3 ≤ 430
3x1 + 2x3 ≤ 460
x1 + 4x2 ≤ 420
x1, x2, x3 ≥ 0
2. A manufacturer of bags makes three types of bags P, Q and R which are processed on three machines M1, M2 and M3. Bag P requires 2 hours on machine M1, 3 hours on machine M2 and 2 hours on machine M3. Bag Q requires 3 hours on machine M1, 2 hours on machine M2 and 2 hours on machine M3, and bag R requires 5 hours on machine M2 and 4 hours on machine M3. There are 8 hours of time per day available on machine M1, 10 hours per day on machine M2 and 15 hours per day on machine M3. The profit gained from bag P is Rs 3.00 per unit, from bag Q Rs 5.00 per unit and from bag R Rs 4.00 per unit. What should be the daily production of each type of bag so that the products yield the maximum profit?
3. Use the Simplex Method to solve the following LPP:
Subject to
2x1 + x2 + x3 ≤ 2
3x1 + 4x2 + 2x3 ≤ 8
x1, x2, x3 ≥ 0
2.7 Activities
Subject to
2x1 + 3x2 ≤ 8
2x2 + 5x3 ≤ 10
3x1 + 2x2 +4x3 ≤15
x1, x2, x3 ≥ 0
2. The products A, B and C are produced in three machine centers X, Y and Z. Each product
involves operation of each of the machine centers. The time required for each operation for
unit amount of each product is given below. 100, 77 and 80 hours are available at machine
centers X, Y and Z respectively. The profit per unit of A, B and C is Rs.12, Rs.3 and Rs.1
respectively. Find out a suitable product mix so as to maximize the profit.
2.8 Case Study
A manufacturer of three products tries to follow a policy of producing those which contribute most to fixed cost and profit. However, there is also a policy of recognizing certain minimum sales requirements; currently these are stated for the products x1, x2 and x3. There are three producing departments. The production times in hours per unit in each department and the total time available each week in each department are given in the table below. The contribution per unit of products x1, x2 and x3 is Rs. 10.50, Rs. 9.00 and Rs. 8.00 respectively. Solve by the simplex method.
2.9 Further Readings
1. Quantitative Techniques in Management, by N.D. Vora, McGraw Hill
2. Operations Research theory and Applications, by J.K. Sharma, Macmillan
3. Operations Research, by Hamdy A. Taha, Pearson Education
4. Quantitative Analysis for Management, by Barry Render, Ralph M. Stair, Michael E.
Hanna and T. N. Badri, Pearson Publication
5. Use of software like QM for Windows, Excel Solver
Unit No. 3 Transportation Problem
______________________________
Unit Structure
3.0 Learning Objectives
3.1 Introduction
3.1.1 Basic Structure of Transportation
3.7 Glossary
3.8 Assignment
3.9 Activities
3.0 Learning Objectives
3.1 Introduction
The transportation problem deals with the distribution of goods from several points of supply (sources/origins) to a number of points of demand (destinations). Usually we are given the capacity of goods at each source and the requirements at each destination.
Basically, the objective is to minimize total transportation and production costs. Sometimes
we deal with maximization of profit also. This is an iterative procedure in which a solution
to a transportation problem is found and evaluated using a special procedure to determine
whether the solution is optimal. When the solution is optimal, the process stops. If not, then
a new solution is generated. Basic Structure of a transportation problem is discussed with
the help of the following example.
Source      P      Q      R      S     Supply
A          40     45     35     36       300
B          48     50     52     46       200
C          43     44     55     50       400
D          44     50     40     30       400
Demand    250    300    350    400      1300
Consider a manufacturer who operates four factories (Sources) and dispatches his products
to four different retail shops (Destinations). The Table above indicates the capacities
(Supply) of the four factories, the quantity of products required (Demand) at the various
retail shops and the cost of shipping one unit of the product from each of four factories to
each of the four retail shops.
The Table usually referred to as Transportation Table provides the basic data regarding the
transportation problem. The capacity of factories A, B, C, and D is 300, 200, 400, and 400
respectively. The requirements at retail shops P, Q, R, and S are 250, 300, 350, and 400
respectively. The prices inside the intersecting cells (Cell AP – Per Unit Transportation
cost from Source A to Destination P) are known as unit transportation costs. So, the cost
of transportation of one unit from Source A to retail shop P is 40 Rs., Factory A to retail
shop Q is 45 Rs. and so on.
3.2 Initial Basic Feasible Solution of a Transportation Problem
In general, any basic feasible solution of a transportation problem with m origins (such as
factories) and n destinations (such as retail shops) starts with the vital condition check of
SUPPLY=DEMAND which is also known as rim requirement of transportation
problem (Balanced Transportation Problem). The following methods are available for the
calculation of an initial basic feasible solution. All the three methods have been explained
using Example I.
Example I
Source      P     Q     R     S     Supply
A          15    18    22    16       30
B          15    19    20    14       40
C          13    16    23    17       30
Demand     20    20    25    35      100

Solution by the North West Corner Rule
Here supply = demand = 100, so the problem is balanced and we can proceed. First, start with the cell at the
intersection of A and P. The row total (supply) corresponding to source A is 30 and the column total (demand) at destination P is 20. So allocate 20, which is the minimum of the two, at AP; 10 units remain at source A. The requirement at destination P has been satisfied, so eliminate column P and move horizontally to cell AQ. With the supply available at source A being 10 and the demand at Q being 20, allocate the minimum of the two, which is 10, at AQ. As no supply is now left at source A, move down to cell BQ, where 10 units of demand remain to be satisfied. Allocate 10 units to cell BQ and move horizontally again: at BR the remaining supply is 30 and the demand is 25, so allocate 25 at BR. Move horizontally to BS; with 5 units remaining at source B and a demand of 35, allocate 5 units to cell BS. In row C, cells CP, CQ and CR have no demand left to satisfy, so by default the last 30 units are allocated to cell CS.
This is the simplest method to use, but it starts from the north-west corner without looking at the transportation costs, so cells with high costs may sometimes receive allocations.
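The rule can be written as a short Python sketch; the function below is illustrative only, with simplified tie-handling (it moves down whenever a row is exhausted). Applied to the data of Example I it reproduces the allocations of the walkthrough above.

def north_west_corner(supply, demand):
    """Return a dict {(row, column): units} giving the North-West Corner allocation."""
    supply, demand = supply[:], demand[:]          # work on copies
    alloc, i, j = {}, 0, 0
    while i < len(supply) and j < len(demand):
        qty = min(supply[i], demand[j])            # allocate as much as possible
        alloc[(i, j)] = qty
        supply[i] -= qty
        demand[j] -= qty
        if supply[i] == 0:                         # row exhausted: move down
            i += 1
        else:                                      # column satisfied: move right
            j += 1
    return alloc

supply = [30, 40, 30]
demand = [20, 20, 25, 35]
cost = [[15, 18, 22, 16],
        [15, 19, 20, 14],
        [13, 16, 23, 17]]
alloc = north_west_corner(supply, demand)
total = sum(cost[i][j] * q for (i, j), q in alloc.items())
print(alloc, total)    # allocations as in the walkthrough; total works out to Rs. 1750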
Least Cost Method (LCM)
1. First check supply and demand; if they are equal, go to step 2, otherwise add a dummy row if supply is less or a dummy column if demand is less.
2. Choose the cell with the minimum cost.
3. Consider the supply at the source and the demand at the destination corresponding to that cell and allocate the lower of the two to that cell.
4. Delete the row or column, whichever is satisfied by this allocation.
5. If a row is deleted, revise the corresponding column value by subtracting the allocated quantity; if a column is deleted, revise the row value.
6. Again choose the cell with the least cost among the remaining cells, make the allocation and adjust the row and column totals.
7. Continue until all the units are assigned.
Solution
Here supply = demand = 100, so go ahead with step 2. First, select the least cost in the whole matrix, which is 13 at cell CP. At CP the supply is 30 and the demand is 20, so allocate 20 units at CP and cross out column P, as its demand has been satisfied; 10 units remain at source C. Next, the minimum of the remaining costs is 14 at cell BS. At BS the supply is 40 and the demand is 35, so allocate 35 units at BS and cross out column S; 5 units remain at source B. The next smallest remaining cost is 16 at cell CQ; allocate the 10 units left at source C there and cross out row C, leaving a demand of 10 at Q. The next smallest cost is 18 at AQ, where the remaining 10 units of demand at Q are allocated. Of the remaining cells the minimum is 20 at BR; allocate the remaining 5 units of source B there, and finally allocate the last 20 units of source A at AR.
Initial Feasible Solution: LCM Method
Source      P          Q          R          S         Supply
A           15         18 [10]    22 [20]    16           30
B           15         19         20 [5]     14 [35]      40
C           13 [20]    16 [10]    23         17           30
Demand      20         20         25         35          100
Total Cost = (18×10) + (22×20) + (20×5) + (14×35) + (13×20) + (16×10) = Rs. 1,630.
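A corresponding sketch of the least cost method (illustrative only; ties are broken here by position rather than by maximum allocation) reproduces the Rs. 1,630 figure:

def least_cost_method(cost, supply, demand):
    """Greedy least-cost allocation: repeatedly fill the cheapest cell that is still open."""
    supply, demand = supply[:], demand[:]
    alloc = {}
    cells = sorted((cost[i][j], i, j) for i in range(len(supply))
                                      for j in range(len(demand)))
    for c, i, j in cells:                    # cheapest cells first
        qty = min(supply[i], demand[j])
        if qty > 0:                          # skip cells in exhausted rows/columns
            alloc[(i, j)] = qty
            supply[i] -= qty
            demand[j] -= qty
    return alloc

cost = [[15, 18, 22, 16],
        [15, 19, 20, 14],
        [13, 16, 23, 17]]
alloc = least_cost_method(cost, [30, 40, 30], [20, 20, 25, 35])
print(sum(cost[i][j] * q for (i, j), q in alloc.items()))   # 1630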
Vogel's Approximation Method (VAM)
1. For each row/column of the table, find the difference between the two lowest costs (the opportunity cost or penalty).
2. Find the greatest opportunity cost/penalty.
3. Assign as many units as possible to the lowest-cost cell in the row/column with the greatest opportunity cost.
4. Eliminate the row or column which has been completely satisfied.
5. Begin again, omitting the eliminated rows/columns. The process is repeated a number of times, so it is known as an iterative process.
Solution
The highest penalty of 3 occurs in row C, and the minimum cost in row C is 13 at cell CP, so allocate 20 units there and eliminate column P, as its demand has been satisfied; 10 units remain at source C. Repeat steps 1 and 2 in the second iteration (II) with only the remaining values of columns Q, R and S. The highest penalty is now in row B, where the minimum cost is 14, so allocate 35 units at cell BS and eliminate column S, as its demand has been satisfied. Repeat steps 1 and 2 in the third iteration (III) with only the remaining values of columns Q and R. Now the highest penalty is in row C with a minimum cost of 16, so allocate 10 units at cell CQ and eliminate row C, as its supply has been fully delivered. Differences can still be calculated between the remaining values, so repeat steps 1 and 2 in the fourth iteration. The highest penalty is in row A with a minimum cost of 18, so allocate the remaining 10 units of demand at cell AQ and eliminate column Q. Now only one column is left, so no further difference can be calculated and no more iterations are possible; allocate the remaining supply and demand accordingly (5 units at BR and 20 units at AR).
Source      P          Q          R          S          Supply    I    II   III   IV
A           15         18 [10]    22 [20]    16            30      1    2    4     4
B           15         19         20 [5]     14 [35]       40      1    5    1     1
C           13 [20]    16 [10]    23         17            30      3    1    7     –
Demand      20         20         25         35           100
I            2          2          2          2
II           –          2          2          2
III          –          2          2          –
IV           –          1          2          –
Calculate Total Cost = (18*10) + (22*20) + (20*5) + (14*35) + (13*20) + (16*10)
=Rs.1630.
Note: If there is a tie between two minimum costs, select the one where maximum
allocation can be done. If there is a tie between two least cost as well as maximum
allocation, select either of the two.
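For reference, the same data can be posed as a linear programme and handed to a solver (a sketch assuming SciPy and NumPy are available); it confirms that Rs. 1,630 is indeed the minimum possible transportation cost for Example I.

import numpy as np
from scipy.optimize import linprog

cost = np.array([[15, 18, 22, 16],
                 [15, 19, 20, 14],
                 [13, 16, 23, 17]])
supply = [30, 40, 30]
demand = [20, 20, 25, 35]
m, n = cost.shape

A_eq, b_eq = [], []
for i in range(m):                         # each source ships exactly its supply
    row = np.zeros(m * n); row[i * n:(i + 1) * n] = 1
    A_eq.append(row); b_eq.append(supply[i])
for j in range(n):                         # each destination receives exactly its demand
    col = np.zeros(m * n); col[j::n] = 1
    A_eq.append(col); b_eq.append(demand[j])

res = linprog(cost.ravel(), A_eq=A_eq, b_eq=b_eq, bounds=(0, None), method="highs")
print(res.fun)                             # 1630.0, the minimum total cost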
Check your progress 1
1. The initial solution of a transportation problem can be obtained by using any of the three known methods.
However, the only condition is that
(a) the solution be optimal.            (b) the rim conditions are satisfied.
(c) the solution not be degenerate.     (d) all of the above.
2. One disadvantage of using the North-West Corner Rule to find an initial solution to the transportation problem is that
(a) it is complicated to use.
(b) it leads to a degenerate initial solution.
(c) it does not take into account the cost of transportation.
(d) all of the above.
4. The method of finding an initial solution based upon opportunity costs is called __________
5. Identify which initial basic feasible solution method was used to develop the following solution, and find the total cost of transportation.
            TO
FROM        P          Q          R          S          Supply
A           12 [180]   10 [150]   12 [170]   13            500
B            7         11          8 [180]   14 [120]      300
C            6         16         11          7 [200]      200
Demand      180        150        350        320          1000
3.3 Test for Optimality: The Modified Distribution (MODI) Method
1) Find a basic feasible solution of the transportation problem using one of the three methods described in the previous section. Check the condition m + n – 1 = number of occupied cells (where m = number of rows and n = number of columns) before applying the MODI method; this condition must be checked at every step of the method.
2) Introduce dual variables corresponding to the row constraints and the column constraints. If there are m origins and n destinations, there will be m + n dual variables. The dual variables corresponding to the row constraints are denoted by ui (i = 1, 2, …, m), while those corresponding to the column constraints are denoted by vj (j = 1, 2, …, n).
3) The values of the dual variables are determined from the equations ui + vj = cij, which can be written only for the occupied cells. One of the dual variables can be chosen arbitrarily. Since the primal constraints are equations, the dual variables are unrestricted in sign; any positive or negative number may be chosen, but it is convenient to assign the value zero, and the best place to assign it is the row or column containing the maximum number of occupied cells.
4) Now find the opportunity cost of each unoccupied cell (a cell where no allocation has been made) with the help of the following formula:
Δij = cij – (ui + vj)
If any value is negative, there is scope for reducing the transportation cost by that many rupees per unit by bringing that cell into the solution. (A small code sketch of these calculations follows.)
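These MODI calculations can be sketched in a few lines of Python (plain lists suffice; the sketch assumes the occupied cells are independent and m + n – 1 in number). Applied to the Rs. 1,630 schedule obtained above for Example I, every opportunity cost comes out non-negative, confirming that the schedule is optimal.

def modi_check(cost, occupied):
    """Compute the dual variables u, v from the occupied cells and return the
    opportunity cost of every unoccupied cell (assumes m+n-1 independent occupied cells)."""
    m, n = len(cost), len(cost[0])
    u, v = [None] * m, [None] * n
    u[0] = 0                                    # one dual variable is chosen arbitrarily
    changed = True
    while changed:
        changed = False
        for (i, j) in occupied:                 # ui + vj = cij on occupied cells
            if u[i] is not None and v[j] is None:
                v[j] = cost[i][j] - u[i]; changed = True
            elif v[j] is not None and u[i] is None:
                u[i] = cost[i][j] - v[j]; changed = True
    return {(i, j): cost[i][j] - (u[i] + v[j])
            for i in range(m) for j in range(n) if (i, j) not in occupied}

cost = [[15, 18, 22, 16],
        [15, 19, 20, 14],
        [13, 16, 23, 17]]
occupied = {(0, 1): 10, (0, 2): 20, (1, 2): 5, (1, 3): 35, (2, 0): 20, (2, 1): 10}
print(modi_check(cost, occupied))   # all values >= 0 here, so the Rs. 1,630 schedule is optimal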
Let us consider the following transportation problem given in Example 2 with a basic
feasible solution computed by least cost method,
Example II
Steps 2 and 3. The dual variables can be calculated as follows by putting zero in the P1 row (considering only the occupied cells):
u3 + v2 = 4,   u2 + v3 = –3,
P3D3: 60 – (8 + 9) = 43
Final transportation schedule is:
Check your progress 2
1. In a transportation problem, the total demand of the destinations must be identical to the total capacity of the sources, otherwise it cannot be solved. State true or false.
2. In Vogel's approximation method the differences between the smallest and second smallest costs in each row and column are called ______.
3.4 Special Cases of Transportation
When supply and demand are not equal, the problem is known as an unbalanced transportation problem. To make it balanced, add a dummy row with zero cost in each cell if supply is less, or add a dummy column with zero cost in each cell if demand is less. The following example will make the procedure clear:
Example III
            A      B      C     Supply
X            9     11     10       40
Y           10      8     12       60
Z           12      7      8       50
Demand      50     40     30      120 / 150
Solution
Here supply is 150 and demand is 120, so demand is less by 30 units. As demand is less, we add a dummy column for a destination D with zero transportation costs, since it does not actually contribute to the total transportation cost. (If supply were less, a dummy row with zero transportation cost would be added instead.) The balanced solution for this example is shown in the table below:
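The balancing step itself can be sketched as follows (assuming NumPy); it only prepares the balanced cost matrix with a zero-cost dummy row or column, after which any of the earlier methods can be applied.

import numpy as np

cost = np.array([[9, 11, 10],
                 [10, 8, 12],
                 [12, 7, 8]])
supply, demand = [40, 60, 50], [50, 40, 30]

gap = sum(supply) - sum(demand)
if gap > 0:                                    # demand is short: add a dummy destination column
    cost = np.hstack([cost, np.zeros((cost.shape[0], 1))])
    demand = demand + [gap]
elif gap < 0:                                  # supply is short: add a dummy source row
    cost = np.vstack([cost, np.zeros((1, cost.shape[1]))])
    supply = supply + [-gap]

print(cost)                                    # balanced 3x4 matrix with a zero-cost dummy column
print(supply, demand)                          # [40, 60, 50] and [50, 40, 30, 30]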
3.4.3 Degeneracy
A basic feasible solution of a transportation problem has m + n – 1 basic variables, which means that the number of occupied cells in such a solution is one less than the number of rows plus the number of columns. It may happen sometimes that the number of occupied cells is smaller than m + n – 1; such a solution is called a degenerate solution.
Degeneracy in a transportation problem can arise in two ways:
1) While obtaining Initial feasible Solution
2) While Revising the solution
When a solution is degenerate, the difficulty is that it cannot be tested for optimality. To resolve degeneracy, an infinitesimally small quantity ϵ (epsilon) is allocated to one or more independent unoccupied cells so that the number of occupied cells becomes m + n – 1. The quantity ϵ is so small that it obeys:
k + ϵ = k;  k – ϵ = k;  0 + ϵ = ϵ;  ϵ + ϵ = ϵ;  ϵ – ϵ = 0;  k × ϵ = 0.
I. While obtaining Initial feasible Solution
An epsilon is inserted in the least cost independent cell. An independent cell is one from
which a closed loop cannot be traced. It may be further noted that if a given problem
requires two (or more) epsilons, then a cell in which an epsilon has already been placed
will be treated as occupied while determining independence of cells for inserting an epsilon
subsequently.
Example IV:
A company wants to ship loads of its product as shown below. The matrix shows the kilometres from the sources of supply to the destinations. The shipping cost is Rs. 10 per load per km. What shipping schedule should be used to minimize the total transportation cost?
Solution:
Since the total destination requirement of 25 units is more than the total resource capacity of 22 units, this excess requirement is handled by adding a dummy plant Sexcess with a capacity equal to 3 units. We use zero transportation cost for the dummy plant. The modified totals are shown below:
To obtain the initial solution we use Vogel's approximation method and get the following solution.
In order to remove degeneracy, we assign Δ (an epsilon-like quantity) to the unoccupied cell (S2, D5), which has the minimum cost among the unoccupied cells, as shown in Table 2.
We use the MODI method, so first we find ui, vj and Δij from the relations
cij = ui + vj for occupied cells
Δij = cij – (ui + vj) for unoccupied cells.
Here some Δij are not greater than or equal to zero, so this is not an optimal solution and it has to be improved. We choose the cell (Sexcess, D3), because it has the largest negative opportunity cost, to enter the basis. We then trace a closed path for the cell (Sexcess, D3), namely (Sexcess, D3) → (Sexcess, D4) → (S2, D4) → (S2, D5) → (S1, D5) → (S1, D3) → (Sexcess, D3), and min (Δ, 3, 5) = Δ. The new solution is shown in Table 4:
Here again some Δij are not greater than or equal to zero, so this is still not an optimal solution. We choose the cell (S3, D4), which has the largest negative opportunity cost, to enter the basis, and trace the closed path (S3, D4) → (S3, D5) → (S1, D5) → (S1, D3) → (Sexcess, D3) → (Sexcess, D4) → (S3, D4). Here min (3, 5) = 3, and the resulting solution is shown in Table 6.
Again we check optimality; for this we calculate ui, vj and Δij as follows:
The minimum total transportation cost associated with this solution is
Example V
Goods have to be transported from sources S1, S2 and S3 to destinations D1, D2 and D3. The transportation cost per unit, the capacities of the sources and the requirements of the destinations are given in the following table.
Solution:
To find the initial basic feasible solution we use the North-West Corner method. The non-degenerate initial basic feasible solution is given in Table 1.
Here the total number of occupied cells = m + n – 1 = 3 + 3 – 1 = 5, therefore there is no degeneracy. To test optimality we use the MODI method, for which we first calculate ui, vj and Δij.
Since the unoccupied cell (S3, D1) has the largest negative opportunity cost, the cell (S3, D1) is entered into the basis. We then trace the closed path (S3, D1) → (S3, D2) → (S2, D2) → (S2, D1) → (S3, D1). Here the maximum quantity that can be shifted around the path is 300, so the modified solution is given below:
But in this solution degeneracy occurs, because the total number of positive allocations becomes 4, which is less than the required number m + n – 1 = 3 + 3 – 1 = 5. An ϵ is therefore placed in an independent cell and the usual solution procedure is continued. The optimal solution, with a total transportation cost of Rs. 1,900, is given as follows:
Check your progress 3
1. In the result of QM’s transportation model, if it shows that Source 2 should ship 45 units
to a “dummy” destination, then it means that ___________.
In the most general form, a transportation problem has a number of origins and a number
of destinations. A certain amount of a particular shipment is available in each origin.
Likewise, each destination has a certain requirement/demand. The transportation problem
indicates the amount of shipment to be transported from various origins to different
destinations so that the total transportation cost is minimized without violating the
availability constraints and the requirement constraints. A number of techniques are
available for computing an initial basic feasible solution of a transportation problem. These
are the North West Corner rule, Least Cost method and Vogel's Approximation Method
(VAM). Optimum solution of a transportation problem can be calculated from Modified
Distribution (MODI) Method. Sometimes the total available supply at the origins is
different from the total demand at the destinations. Such a transportation problem is said
to be unbalanced. An unbalanced transportation problem can be made balanced by
introducing an additional dummy row or column with zero transportation cost. The basic
feasible solutions of a transportation problem with m origins and n destinations should have
m+n - 1 positive basic variables. However, if basic variables are less than m + n – 1, the
solution is said to be degenerate. A degenerate transportation problem can be handled by placing epsilons in independent cells until the required number of occupied cells is reached.
1. b
2. c
3. False
1. True
2. Penalty
3. m + n -1
3. False
3.7 Glossary
An Unbalanced Transportation Problem: is a transportation problem where the total
availability at the origins is different from the total requirement at the destinations.
Multiple Optimal Solutions: more than one optimal solution with the same total transportation cost.
3.8 Assignment
1. Find an initial basic feasible solution to the following transportation problem. Is it optimal? Use the VAM and MODI methods.
            D1     D2     D3     D4     Available Units
O1           5      4      2      1          130
O2           2      3      7      5          100
O3           5      4      5      6           30
Demand      40     50     70    100
2. Mr. Contractor is a builder and owner of Ashiana Construction Company. Currently he has three large housing projects in hand. They are located at Andheri, Bandra and Chinchwad. He procures cement from four plants located at Dumdum, Ellora, Feroza and Guna. The basic feasible solution as determined by the North West Corner rule is given below:
Plants       Projects                                   Availability
              A          B          C
1             2 [50]     7          4                        50
2             3 [20]     3 [60]     1                        80
3             5          4 [30]     7 [40]                   70
4             1          6          2 [140]                 140
Demand        70         90        180                      340
Mr. Contractor wants to plan the movement of cement in such a manner that the optimal minimum transportation cost is reached. Assist him.
3. A company has three plants and four warehouses. The supply and demand in units and the corresponding transportation costs are given; the table below shows an initial solution of the problem.
Warehouses
             I    II   III   IV
Plants                              Supply
1
1
5 10 4 0 5 10
2 5
2
6 0 8 7 2 25
5 1 5
3
4 2 0 5 7 20
Demand 25 10 15 5 55
Answer the following questions, giving brief reasons:
(a) Is this solution degenerate?
(b) Is this solution optimal?
(c) Does this problem have more than one optimal solution? If so,
show all of them.
4. A company has four factories and four stores. The supply and demand in units and the corresponding transportation costs are given; the table below shows an initial solution of the problem. Find an optimal solution.
Stores
             I     II    III    IV    Supply
Factory A    2      4     6      1       11
Factory B   10      8     7      5        5
Factory C   13      3     9      2       12
Factory D    4      6     8      3        3
3.9 Activities
Select any transportation company or manufacturing company. Select 3 or 4 sources and 3 or 4 destinations. Collect data on the total supply at each source, the total demand at each destination and the transportation cost of shipping from each source to each destination. Arrange the data in a proper transportation table and find the optimum cost schedule.
3.10 Case Study
XYZ Shipping Corp. is a leading shipping corporation of the nation. They have offices in Mumbai and Gandhidham. They provide services to different companies and transport their goods from warehouses to marketplaces. The following table provides all the necessary information on the supply available at each warehouse, the requirement of the various markets, and the unit transportation cost (in thousand Rs) from each warehouse to each market. Mr. Sanjay, the shipping clerk of the agency, usually prepares the schedule of transportation based on his expertise and vast experience. Mr. Sanjay has worked out the following schedule: 12 units from A to Q, 1 unit from A to R, 9 units from A to S, 15 units from B to R, 7 units from C to P and 1 unit from C to R.
             Markets
Warehouse    P     Q     R     S     Supply
A            6     3     5     4       22
B            5     9     2     7       15
C            5     7     8     6        8
Demand       7    12    17     9       45
a) As a consultant of the company, check and analyse whether Mr. Sanjay has arranged an optimal schedule or not. You may apply the transportation method.
b) Find the optimal schedule and the minimum total transportation cost. Does this problem have only one optimal solution or not? Justify your answer.
3.11 Further Readings
Unit No. 4 Assignment Problem
______________________________
Unit Structure
4.0 Learning Objectives
4.1 Introduction
4.1.1 Basic Structure of Assignment
4.6 Glossary
4.7 Assignment
4.8 Activities
4.0 Learning Objectives
• Understand the concept and assumptions of the assignment problem in a comprehensive manner
• Learn algorithm of Hungarian assignment method
• Use the algorithm for solving an assignment problem
• Learn special cases of assignment
4.1 Introduction
The assignment problem in the general form can be stated as follows: given n facilities, n jobs and the effectiveness of each facility for each job, the problem is to assign each facility to one and only one job in such a way that the measure of effectiveness is optimized (maximized or minimized). Several problems of management
may have the applications of assignment problem. A project manager may have five
people available for assignment and five projects to fill. He is interested in knowing
which job should be assigned to which person so that all project tasks may be accomplished
in the shortest possible time. Likewise, an institute may have different subjects to be offered by different faculty members; the task is to assign the subjects in such a way that the faculty can complete them efficiently within a short period of time. In a marketing set-up, by making an
estimate of sales performance for different salesmen as well as for different territories one
could assign a particular salesman to a particular territory with a view to maximize overall
sales. It may be noted that with n facilities and n jobs there are n! possible assignments.
One way of finding an optimum assignment is to write all the n! possible arrangements,
evaluate their total cost (in terms of the given measure of effectiveness) and select the
assignment with minimum cost. The method leads to a lengthy computational process.
Hence it is necessary to develop a suitable computation procedure to solve an assignment
problem.
Consider the following example to understand the basic structure of an assignment problem, given as a cost table. The important condition is that the numbers of rows and columns are the same, and the assignment must always be one to one: each operator is assigned to exactly one machine and each machine to exactly one operator.
Operator         Machine
             A     B     C     D
1           10     2     8     6
2            9     5    11     9
3           12     7    14    14
4            3     1     4     2
4.2 Solution Method: The Hungarian Method
Step 1: Construct the cost table from the given problem. If the number of rows is not equal to the number of columns, a dummy row or column must be added with zero costs.
Step 2: Find the smallest cost in each row of the cost table and subtract it from each element in that row. There will then be at least one zero in each row of this new table, called the first reduced cost table.
Step 3: Find the smallest element in each column of the reduced cost table and subtract it from each element in that column. As a result, each row and each column of the second reduced cost table has at least one zero.
Step 4: Draw the minimum number of horizontal and vertical lines that cover all the zeros.
Step 5: If the number of lines drawn equals the number of rows (and columns), go to Step 6; otherwise go to Step 7.
Step 6: Starting with the first row, make an assignment where there is a single zero and cross out the other zeros in the corresponding column. Repeat the procedure until an assignment is made for every row. An optimal assignment is found when the number of assigned cells equals the number of rows (and columns).
Step 7: Examine the elements not covered by any line. Choose the smallest of these, subtract it from all uncovered elements, and add it to every element that lies at the intersection of two lines. The resulting matrix is a revised cost table. Return to Step 4 and repeat until the number of lines equals the number of rows and columns.
Step 8: Calculate the total cost or profit with reference to the original matrix. (A short library-based sketch of this procedure is given after these steps.)
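In practice the same optimum can be obtained with a library routine. The sketch below (assuming SciPy is available) applies scipy.optimize.linear_sum_assignment, which solves the same one-to-one assignment problem, to the operator–machine table given above.

import numpy as np
from scipy.optimize import linear_sum_assignment

# Cost of assigning operator i (rows) to machine j (columns), from the table above.
cost = np.array([[10, 2, 8, 6],
                 [9, 5, 11, 9],
                 [12, 7, 14, 14],
                 [3, 1, 4, 2]])

rows, cols = linear_sum_assignment(cost)          # optimal one-to-one assignment
for i, j in zip(rows, cols):
    print(f"Operator {i + 1} -> Machine {'ABCD'[j]} (cost {cost[i, j]})")
print("Total cost:", cost[rows, cols].sum())      # works out to 26 for this data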
Example I
Let us assume that Geeta is a sorority pledge coordinator with four jobs and only three
pledges. Geeta decides that the assignment problem is appropriate except that she will
attempt to minimize total time instead of money (since the pledges aren’t paid). Geeta also
realizes that she will have to create a dummy fourth pledge and she knows that whatever
job gets assigned to that pledge will not be done (this semester, anyhow). She creates
estimates for the respective times and places them in the following table, E is, of course,
a dummy pledge, so her times are all zero.
Solution of Example I
(a) The first step in this algorithm is to develop the opportunity cost table. This is done by
subtracting the smallest number in each row from every value in that row, then, using these
newly created figures, by subtracting the smallest number in each column from every other
value in that column. Whenever these smallest values are zero, the subtraction results in no
change.
        Job 1   Job 2   Job 3   Job 4
B         1       6       0       5
No change was produced when dealing with the columns, since the smallest values were always the zeros from row four.
(b) The next step is to draw lines through all of the zeros. The lines are to be straight and either horizontal or vertical. Furthermore, you are to use as few lines as possible. If it requires four of these lines (four because it is a 4 × 4 matrix), an optimal assignment is already possible. If it requires fewer than four lines, another step is required before optimal assignments may be made. In our example, draw a line through row four, column three, and either column one or row three.
        Job 1   Job 2   Job 3   Job 4
B         1       6       0       5
(c) Since the number of lines required was less than the number of assignees, a third step
is required (as is normally the case). Looking at the version of the matrix with the lines
through it, determine the smallest number not covered by a line. Subtract this smallest
number from every number not covered by a line and add it to every number at the
intersection of two lines.
        Job 1   Job 2   Job 3   Job 4
B         0       5       0       4
Draw the minimum number of lines to cover all the zeroes, and we have the matrix below.
        Job 1   Job 2   Job 3   Job 4
B         0       5       0       4
Since only 3 lines are needed to cover the zeroes, we determine the smallest number not
covered by a line. Subtract this smallest number from every number not covered by a line
and add it to every number at the intersection of two lines. The result is shown with the
new lines drawn through the zeroes.
        Job 1   Job 2   Job 3   Job 4
B         0       4       0       3
(d) Since this matrix requires four lines to cover all the zeros, we have now reached the optimal
solution stage.
(e) In our example the assignments must be: C to job 3 = 2, B to job 1 = 4, D to job 2 = 4 and E to job 4 = 0. Since E is a dummy row, the job labeled job 4 does not get completed. So the total time is 10.
2. In an assignment problem,
(a) one agent can do parts of several tasks
(b) one task can be done by several agents
(c) each agent is assigned to its own one best task
(d) none of the above
3. If the number of drawn lines is not equal to the number of rows and columns, an optimal solution can still be found. State true or false.
4. The procedure used to solve assignment problems wherein one reduces the original
assignment costs to a table of opportunity costs is called __________.
4.3.1 Unbalanced Assignment Problem: When the numbers of rows and columns are not the same, it is a case of an unbalanced assignment problem. A dummy row or column, whichever is fewer, needs to be added with zero costs. In Example I we added a dummy row with zero cost to solve the unbalanced assignment problem.
4.3.2 Prohibited Assignment Problem: When some routes are closed, or some tasks/projects/work cannot be assigned to a particular machine, worker or employee for any reason, it is a case of a prohibited assignment problem: there is a restriction on the assignment. To solve such problems, put a dash (–) or a very large cost M in the cells where there is a prohibition, and solve the problem by the algorithm explained for Example I. Do not do anything with the restricted cells; keep them as they are, so that they can never be chosen for an assignment. The following example will make it clear.
In a production unit four new machines M1, M2, M3 and M4 are to be installed in a machine shop. There are five vacant places A, B, C, D and E available. Because of limited space, machine M2 cannot be placed at C and M3 cannot be placed at A. The cost of locating a machine at a place, in thousands of rupees, is as under:
         A     B     C     D     E
M1       4     6    10     5     6
M2       7     4     –     5     4
M3       –     6     9     6     2
M4       9     3     7     2     3
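A sketch of how such a prohibited (and here also unbalanced) problem can be handled numerically, assuming SciPy: the prohibited cells get a very large cost M and a zero-cost dummy machine row balances the matrix.

import numpy as np
from scipy.optimize import linear_sum_assignment

M = 10 ** 6                                    # a very large cost to block prohibited cells
cost = np.array([[4, 6, 10, 5, 6],
                 [7, 4,  M, 5, 4],             # M2 cannot be placed at C
                 [M, 6,  9, 6, 2],             # M3 cannot be placed at A
                 [9, 3,  7, 2, 3]], dtype=float)

# 4 machines but 5 places: pad with a dummy machine (zero costs) to balance the problem.
cost = np.vstack([cost, np.zeros((1, cost.shape[1]))])

rows, cols = linear_sum_assignment(cost)
for i, j in zip(rows, cols):
    name = f"M{i + 1}" if i < 4 else "dummy"
    print(name, "->", "ABCDE"[j])
print("Total cost:", int(cost[rows, cols].sum()))   # 12 (thousand Rs); place C stays vacant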
4.3.3 Multiple Optimal Solutions: When, during the final assignment, no single zero is found in some row or column, it can be a case of multiple optimal solutions. It might be possible to get one row and one column both having two zeros; then arbitrarily start with any zero of that row or column and find the assignment, and in a similar way start again with the second zero. The assignments in rows or columns with a single zero remain the same in both solutions. The following example will make it clear.
Example II
Consider the following assignment problem. The Spicy Spoon restaurant has four payment counters. There are four persons available for service. The cost of assigning each person to each counter is given in the following table. Assign one person to one counter to minimize the total cost.
Person      1     2     3     4
A           1     8    15    22
B          13    18    23    28
C          13    18    23    28
D          19    23    27    31
Solution of Example II
After applying steps 1 to 3 of the Hungarian Method, we obtain the following matrix.
Person      1     2     3     4
A           0     3     6     9
B           0     1     2     3
C           0     1     2     3
D           0     0     0     0
Now by applying the usual procedure, we get the following matrix.
Person      1     2     3     4
A           0     2     5     8
B           0     0     1     2
C           0     0     1     2
D           1     0     0     0
After one more iteration, the resulting matrix suggests the alternative optimal solutions shown below.
Option 1
Person      1     2     3     4
A           0     2     4     7
B           0     0     0     1
C           0     0     0     1
D           2     1     0     0
Option 2
Person      1     2     3     4
A           0     2     4     7
B           0     0     0     1
C           0     0     0     1
D           2     1     0     0
Persons B and C may be assigned either to counter 2 or counter 3.
The two alternative assignments are:
A1 + B2 + C3 + D4 = 1 + 18 + 23 + 31 = 73        A1 + B3 + C2 + D4 = 1 + 23 + 18 + 31 = 73
4.3.4 Maximization Type of Problems: First select the maximum value in the whole matrix and subtract every other value from it; the new matrix is known as the revised cost matrix. Then apply the Hungarian assignment method to this table as explained in Example I. The following example will make it clear.
A company has four sales representatives who are to be assigned to four different sales territories. The monthly sales increase estimated for each sales representative in each territory (in lakh rupees) is shown in the following table. Suggest an optimal assignment and the total maximum sales increase per month.
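Since the sales table itself is not reproduced here, the sketch below uses purely hypothetical figures to illustrate the conversion (assuming SciPy). Both the subtract-from-maximum transformation described above and the solver's built-in maximization option lead to the same maximum total.

import numpy as np
from scipy.optimize import linear_sum_assignment

# Hypothetical sales-increase estimates (lakh Rs); rows = representatives, cols = territories.
sales = np.array([[42, 35, 28, 21],
                  [30, 25, 20, 15],
                  [29, 26, 22, 19],
                  [24, 20, 16, 12]])

# Method 1: convert to a revised (regret) cost matrix and minimise, as described above.
regret = sales.max() - sales
r1, c1 = linear_sum_assignment(regret)
# Method 2: let the routine maximise directly.
r2, c2 = linear_sum_assignment(sales, maximize=True)
print(sales[r1, c1].sum(), sales[r2, c2].sum())   # both print the same maximum total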
2. Solve the following assignment problem so as to minimize the time (in days) required to complete all the tasks.
Person                     Task
            T1    T2    T3    T4    T5
A            6     5     8    11    16
B            1    13    16     1    10
C           16    11     8     8     8
D            9    14    12    10    16
The Assignment Problem considers the allocation of a number of jobs to a number of
persons so that the total completion time or cost is minimized or total profit is maximized.
If the number of persons is the same as the number of jobs, the assignment problem is said
to be balanced. If the number of jobs is different from the number of persons the assignment
problem is said to be unbalanced. An unbalanced assignment problem can be converted
into a balanced assignment problem by introducing a dummy person or a dummy job with
completion time zero.
1. b
2. c
3. False
1. False
2. A – T2 = 5, B – T4 = 1, C – T3 = 8, D – T1 = 9, E – T5 = 0
Total Time = 5 + 1 + 8 + 9 + 0 = 23 Days
4.6 Glossary
Balanced Assignment Problem: is an assignment problem where the number of persons
is equal to the number of jobs.
A Dummy Job: is an imaginary job with cost or time zero introduced to make an
unbalanced assignment problem balanced
4.7 Assignment
1. Solve the following assignment problem. (Assign one machine to one worker so that the total time in hours is minimized.)
Machine M1 M2 M3 M4 M5
Man
Project
Person       1     2     3
Adams       11    14     6
Brown        8    10    11
Cooper       9    12     7
3. ABC company is engaged in manufacturing 5 brands of packed snacks. It has five manufacturing setups, each capable of manufacturing any one of its brands at a time. The cost of making a brand on each setup varies according to the table below:
        S1    S2    S3    S4    S5
B1       4     6     7     5    11
B2       7     3     6     9     5
B3       8     5     4     6     9
B4       9    12     7    11    10
B5       7     5     9     8    11
Find the optimum assignment of brands to setups resulting in the minimum cost.
4. An airline that operates 7 days a week has the timetable given below. Crew must have a minimum layover of 5 hours between flights. Obtain the pairing of flights that minimizes layover time away from home, assuming that the crew can be based at either of the two cities. Suggest an optimum assignment of crew that results in the smallest layover.
Delhi – Jaipur                         Jaipur – Delhi
Flight No.   Depart     Arrive         Flight No.   Depart     Arrive
1            7.00 am    8.00 am        101          8.00 am    9.15 am
2            8.00 am    9.00 am        102          8.30 am    9.45 am
3            1.30 pm    2.30 pm        103          12 Noon    1.15 pm
4            6.30 pm    7.30 pm        104          5.30 pm    6.45 pm
hour. Find the most suitable task for each typist and the least-cost allocation for the following data using the Hungarian method:
Typist     Rate per hour     Number of pages typed     Job/Task     Number of pages
1. Quantitative Techniques in Management, by N.D. Vora, McGraw Hill
2. Operations Research Theory and Applications, by J.K. Sharma, Macmillan
3. Operations Research, by Hamdy A. Taha, Pearson Education
4. Quantitative Analysis for Management, by Barry Render, Ralph M. Stair, Michael E. Hanna and T. N. Badri, Pearson Publication
5. Use of software like QM for Windows, Excel Solver
Block Summary
In this block, we discussed widely used techniques of operations research in detail for obtaining optimal output, in terms of maximum profit or minimum cost, with the available resources. In the first unit, the special technique of operations research called linear programming was explained. Based on the number of products and resources available, formulation and
solution for only two decision variables were discussed through graphical method. Special
problems like infeasibility and unboundedness of LPP were discussed. In the second unit,
the technique of simplex method was discussed for two or more decision variables. In the
third unit, the concept of transportation problem was discussed with initial basic feasible
solution methods and optimal solution method. Special cases of transportation like
unbalanced, multiple optimal solutions and degeneracy were explained. In the last unit, the
method of assignment problem was explained to find the effectiveness of assigning jobs
to each facility along with special cases like unbalanced assignment, prohibited, multiple
optimal solutions and maximization types of assignment problems.
Block Assignment
4. A transportation problem involves the following costs, supply, and demand.
TO
From        1     2     3     4     Supply
(b) Using the VAM initial solution, find the optimal solution using the modified distribution method (MODI).
5. Find the graphical solution of the following problem. Find x1 and x2 so as to
Minimize Z = X1 + X2 subject to the following constraints:
X1 + 2X2 ≤ 2000,
X1 + X2 ≤ 1500,
Block no. 4 Specific Operation Research
Methods
______________________________
Block Introduction
In this block, some more operation research techniques will be discussed. In the first unit
situations related to planning, scheduling and controlling of projects will be discussed. The
process of developing network diagrams and finding project completion time will be
covered. In the second unit the nature and scope of waiting line concept will be discussed.
Some basic waiting line models and their application will also be covered. In the last unit
the concept and scope of game theory will be discussed. The consequences of the interplay of combinations of strategies with a competitor, and the methods employed to derive the optimal strategy, will be covered.
Block Objectives
• Understand situations related to planning, scheduling and controlling of projects
• Develop simple network diagrams with activities.
• Identify the critical path and compute the project completion time
• Compute Slack and float
• Estimate the probability of project completion on a desired date
• Understand the nature and scope of waiting line system
• Describe the characteristics and structure of waiting line system
• Understand the application of statistics in solving waiting line problems
• Apply common waiting line models in suitable business problems
• Determine the optimum parameters of queuing models
• Understand the concept and scope of game theory
• Understand the consequences of interplay of combination of strategies with competitor
• Distinguish between different type of game situations
• Analyse and derive the optimal strategy in a game
• Understand the rule of dominance for solving game problems.
Block Structure
Unit 1 Project Scheduling-PERT/CPM
Unit 2 Waiting Line Models
Unit 3 Game Theory
Unit No. 1 Project Scheduling –
CPM/PERT
______________________________
Unit Structure
1.0 Learning Objectives
1.1 Introduction
1.7 Glossary
1.8 Assignment
1.9 Activities
1.1 Introduction
4. Non critical activities may be delayed by how much time.
5. Probability of completing the project at a desired date.
Today's computerized versions of the PERT and CPM techniques combine the best features of both approaches, so the distinction between the two techniques is no longer necessary. In this unit we will therefore refer to the project scheduling techniques as PERT/CPM.
Activity: An operation or task which utilizes resources and consumes time is known as an activity. An activity is represented by a single arrow, also called an arc, in the project network. The head of the arrow shows the sequence or flow in which activities are to be done. The activity arrow is not scaled; its length is a matter of convenience and clarity and is not related to the time required by the activity. All activities should be defined properly so that their beginning and end can be identified clearly. A project consists of several activities. For example, construction of a house involves many activities, such as getting finance, building the foundation, ordering and receiving materials, building the house, selecting paint, selecting furnishings, painting and finishing work.
Event: An event marks the beginning or completion of an activity. Events are points in time and can be considered as milestones. An event in a network is represented by a circle; events are also called nodes. The difference between an activity and an event is that an activity is a recognizable part of the project, involving physical and mental work and requiring time and resources for its completion, whereas an event is an accomplishment at a point of time which neither requires time nor consumes resources.
Successor Activity: An activity which cannot be started until the completion of one or more other activities is called a successor activity.
Concurrent Activity: Activities that can be done simultaneously are called concurrent activities. It should be noted that an activity can be a predecessor or successor to another activity and may be concurrent with one or more activities.
Dummy Activity: A dummy activity is an activity which does not consume any time or resources. It is needed when:
1. Two or more activities in a project have identical immediate predecessor and successor activities.
2. Two or more activities have some (and not all) of their predecessor activities in common.
Dummy activities are usually shown by arrows with dashed lines. To illustrate, in Fig 1,
we have a situation in which both the activities A and B have the same start and end events
. It is incorrect to represent the activities A and B, as shown in Part (i) because 1-2 is used
to represent either A or B. It is against the rule of assigning unique numbers to activities
for the purpose of identification.
(Fig. 1: Part (i) shows activities A and B both drawn between events 1 and 2; Part (ii) shows the corrected network using a dummy activity, with events 1, 2 and 3.)
By introducing a dummy activity, the activities A and B can be identified as 1-2 and 1-3 respectively, as shown in Part (ii). Thus, in situations where two or more activities have the same beginning and end events, a dummy activity is introduced to resolve the problem.
There are a number of concepts and rules which should be followed in dealing with activities and events when making a network. They help to develop a correct structure of the network.
1. Each activity is represented by one and only one arrow in the network; therefore no single activity can be represented twice in the network.
2. Events are identified by numbers. The number given to an event should be higher than the number allotted to the event immediately preceding it.
3. The activities are identified by the numbers of their starting and ending nodes.
4. Parallel activities between two events are prohibited; thus no two activities can have the same start and end events.
5. Before an activity can be undertaken, all activities preceding it must be completed.
6. Dangling must be avoided in a network. Dangling means an event which is not connected onward to another event by any activity: an activity merges into the event, but no activity starts or emerges from it, so the event becomes detached from the network.
Check your Progress 1
1. PERT stands for program enterprise and resource technique. (True/False)
2. A dummy variable is an activity inserted into the AOA network diagram to show a
precedence relationship, but does not represent any passage of time. (True/False)
3. Unlike PERT, CPM incorporates probabilistic time estimates into the project
management process. (True/False)
4. An activity which should be completed immediately prior to the start of another activity is
a. Successor activity
b. Predecessor activity
c. Dummy activity
d. Concurrent activity
5. An activity which cannot be started until the completion of one or more activities is called a
a. Successor activity
b. Predecessor activity
c. Dummy activity
d. Concurrent activity
The first step in the PERT/CPM scheduling process is to develop a list of all the activities that comprise a project and their interdependence relationships. Let us take the example of construction of a commercial complex. First we need to prepare the plan of the complex. Next we may prepare a prospectus and start looking for potential tenants. A contractor has to be selected, building permits have to be prepared, and approval has to be obtained. Then the construction can be done. Lastly, the contracts can be finalized with the tenants and they can move in. For this project, the various activities required to be performed, along with the time needed for execution, are given in Table 1 below:
Table 1: Construction of commercial complex
Note that this table contains information about immediate predecessors. The immediate predecessors for a particular activity are those that must be completed immediately before this activity may start. For example, activity A (preparing the plan of the commercial complex) can be started at any time, as it is the first activity. Activities B, D and E can be started only after completing activity A. In the same way, the rest of the information in the table can be understood.
Once the activities comprising a project and the interdependency relationship among them
is clearly identified, they can be portrayed graphically using a network or an arrow diagram.
As earlier explained, the arrows in a project network represent various activities in a
project. Along with each arrow the description and duration of the activity is represented.
The circles at the beginning and at the end of the arrow represent the nodes or the events.
Activity A has no predecessor activity, as it is the first activity. Let us assume that activity
‘A’ starts at node 1 and ends at node 2. It is represented graphically as below:
[Figure: Activity A represented as an arrow from node 1 to node 2.]
Next, activities B, D and E have A as their predecessor, so all of them will start at the end node of A. Let us demonstrate:
[Figure: Activities B, D and E drawn from node 2 (the end node of A) to nodes 3, 5 and 4 respectively.]
As activity C has B as its predecessor, it will start at node 3. Similarly, activity F will start at node 4. However, as activity G has two preceding activities, D and F, both of these will end at node 5.
[Figure: Partial network showing activities A to F, with C starting at node 3, F starting at node 4, and D and F merging at node 5.]
Similarly, the rest of the precedence relationships can be followed and the final network can be developed. The figure below depicts the project network for constructing the commercial complex.
[Figure: Complete project network for the commercial complex, with activities A to I drawn on nodes 1 to 8.]
Illustration: Table 2 gives the activities involved in the construction of a house. Develop a project network.
Table 2 – Construction of a house
Activity   Description    Duration   Immediate Predecessor
A          Design House   3          -
[Figure: Partial project network for the house-construction example, showing activity A from node 1 to node 2 and activity B starting at node 2.]
Both activities D and E have activity A as their predecessor and activities B and C as their successors. A dummy is required when two or more activities have identical immediate predecessor and successor activities. Hence a dummy is required at this step, which can start at the end of either activity B or activity C.
[Figure: Project network for the house-construction example, shown first without and then with the dummy activity, with activities A to G drawn on nodes 1 to 7.]
The critical path is the longest path through the project network. If any activity on the critical path is delayed, the whole project will be delayed. There can be multiple critical paths if there is a tie among the longest paths. To understand the concept of the critical path and project completion, let us consider the earlier example given in Table 1.
[Figure: Project network for the commercial-complex example with activity times shown in brackets along the arrows, e.g. B(4), C(6), E(1), F(4), H(12).]
In the above network, the time estimates are mentioned within brackets along with the activity name on the arrow. There are three possible paths for this network. For this simple network, the critical path can be found by enumerating all the possible paths. These paths are listed below:

Path                    Length (months)
(i)   A→B→C→H→I         31
(ii)  A→D→G→I           26
(iii) A→E→F→G→I         28

The first path (A→B→C→H→I) is the critical path, as it takes the longest period of time to complete, i.e., 31 months. For this network the project completion time will be 31 months.
The activities on the critical path are known as critical activities, as a delay in any one of them can delay the entire project. In other words, there is no slack time in the activities on the critical path. Slack time is the time an activity can be delayed without delaying the project.
For a small network it is simple to list all the possible paths and compare them to find the critical path. As the number of activities increases, the network becomes complex and finding the critical path by enumerating all paths becomes time consuming. Therefore there is a need for a systematic approach to find the critical path. These computations involve a forward and a backward pass through the network. The forward pass calculation begins at the start event and moves to the end event of the project network, i.e. from left to right of the network. The backward pass calculation begins at the end event and moves to the start event of the network, i.e. from right to left of the network.
1.3.3 Determination of Earliest Start and Earliest Finish Times – Forward Pass
The earliest start (ES) time indicates the earliest that a given activity can be scheduled, and the earliest finish (EF) time indicates the time by which the activity can be completed at the earliest. To begin with, each of the activities initiated at the starting node is assumed to start at time '0'. The earliest finish time for each activity is obtained by adding the activity time to the ES time. The formula for EF is:
EF = ES + t, where t is the activity time
In our example, activity A is the first activity and therefore will start at ‘0’ time. As the
duration of the activity A is 5 months, so its EF time will be 0+5=5. Now all the subsequent
activities are assumed to start as soon as possible, that is as soon as all of their respective
predecessor activities are completed. For a given activity, the ES would be taken as the
maximum of the EF’s of the activities preceding the activity. For activity B,D and E there
is only one predecessor activity i.e., activity A and EF of A is 5, so [ES, EF] of B is [5,9];
[ES, EF] of D is [5,8] and [ES, EF] of E is [5,6] . Similarly for C and F the [ES, EF] are
[9,15] and [6,10] respectively. The ES time of G has to be the maximum of EF’s of the two
preceding activities D (EF=8) and F(EF=10). Therefore the ES of G is 10 and EF is 24
(10+14). The remaining values are calculated and given in Table 3.
1.3.4 Determination of Latest Start and Latest Finish Times – Backward Pass
The concept of the backward pass is to compute the latest allowable starting and finishing times, LS and LF, for each of the activities of the project. The term 'latest allowable' means how much an activity can be delayed without delaying the project completion time.
The computations for the backward pass start at the terminal event and move towards the
start event. The terminal node is assigned the latest of EF times of activities merging into
it. In our example, there is only one terminal activity, so the time assigned to node 8 will
be 31. This implies that the latest finish (LF) time of activity I is equal to 31. The formula
for Latest start time is:
LS = LF − t, where t is the activity time
The LS time of an activity is equal to its LF time minus its duration, so for activity I the LS would be 31 − 4 = 27. For the other activities, the LF time of an activity is set equal to the smallest (minimum) of the LS times of its successor activities. The LF time of activities G and H would therefore be equal to 27, the LS of their only succeeding activity I. The latest start and finish times of activities F, E, D, C and B are calculated similarly, as each of them has only one succeeding activity. However, activity A has three succeeding activities: B, D and E. In this case, the minimum of the LS times of these three activities is taken as the LF of activity A. In our example the LS times of activities B, D and E are 5, 10 and 8 respectively, so the LF of activity A is 5 and its LS is 0. All the calculated latest start and finish times are given in Table 3.
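The logic of the two passes can be expressed compactly in code. Below is a minimal Python sketch (illustrative only, not part of the original text), assuming the activity durations and predecessor relationships reconstructed from the Table 1 example (A = 5, B = 4, C = 6, D = 3, E = 1, F = 4, G = 14, H = 12, I = 4 months):

durations = {"A": 5, "B": 4, "C": 6, "D": 3, "E": 1, "F": 4, "G": 14, "H": 12, "I": 4}
preds = {"A": [], "B": ["A"], "C": ["B"], "D": ["A"], "E": ["A"],
         "F": ["E"], "G": ["D", "F"], "H": ["C"], "I": ["G", "H"]}

# Forward pass: ES = max EF of predecessors, EF = ES + t
ES, EF = {}, {}
for a in durations:                      # dict preserves insertion (topological) order
    ES[a] = max((EF[p] for p in preds[a]), default=0)
    EF[a] = ES[a] + durations[a]

project_time = max(EF.values())          # 31 months

# Backward pass: LF = min LS of successors, LS = LF - t
succs = {a: [b for b in durations if a in preds[b]] for a in durations}
LS, LF = {}, {}
for a in reversed(list(durations)):
    LF[a] = min((LS[s] for s in succs[a]), default=project_time)
    LS[a] = LF[a] - durations[a]

critical = [a for a in durations if ES[a] == LS[a]]   # activities with zero total float
print(project_time, critical)            # 31, ['A', 'B', 'C', 'H', 'I']

Running this reproduces the project completion time of 31 months and the critical path A→B→C→H→I discussed above.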
Once the forward pass and backward pass times are computed, it becomes very easy to identify the critical path. If the early start and late start (or early finish and late finish) values of an activity are equal, then the activity is referred to as a critical activity. If the values are not equal, the activity is termed non-critical. The path consisting of critical activities is called the critical path.
1.3.5 Determination of Float
The concept of float is of paramount importance to a project manager. A critical activity in a network cannot be scheduled later than its earliest schedule time without delaying the project duration. However, a non-critical activity can be scheduled later, which allows the manager to exercise some control over time, resources or cost. This flexibility is expressed in terms of the float or slack that an activity has. It is the time available to an activity in addition to its duration. Since each activity has four associated times, four types of floats can be identified. In practice, only three are commonly used and are discussed here:
Total Float: The total float of an activity represents the amount of time by which it can be
delayed without delaying the project completion date. It is equal to the difference between
the total time available for the performance of an activity and the time required for its
performance. For any activity, the total float is calculated as follows:
Total Float = LF − EF = LS − ES = LF − ES − t
where t is the activity time
In our example, for activity D: Total Float = LF − EF = 13 − 8 = 5, or equivalently LS − ES = 10 − 5 = 5.
Free Float: The free float is that part of the total float which can be used without
affecting the float of the succeeding activities. The free float is calculated as the earliest
start time for the following activity (j) minus the earliest completion time for this activity
(i).
Independent Float: The independent float time of an activity is the amount of float time
which can be used without affecting either the head or tail events. The value of independent
float is as follows, if ‘i’ is the preceding activity, ‘j’ is the succeeding activity and ‘t’ is the
duration of activity
Independent Float = ESj − LFi − t
In our example, for activity D: Independent Float = ESj − LFi − t = 10 − 5 − 3 = 2
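As a quick numerical recap (an illustrative sketch, not part of the original text), the three floats of activity D can be recomputed from the ES/EF/LS/LF values obtained in the passes above:

ES_D, EF_D, LS_D, LF_D, t_D = 5, 8, 10, 13, 3   # values for activity D from the passes above
ES_G, LF_A = 10, 5                              # ES of successor G, LF of predecessor A

total_float = LF_D - EF_D                       # 13 - 8 = 5 (equivalently LS_D - ES_D)
free_float = ES_G - EF_D                        # 10 - 8 = 2
independent_float = max(ES_G - LF_A - t_D, 0)   # 10 - 5 - 3 = 2 (taken as zero if negative)
print(total_float, free_float, independent_float)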
The independent float is always either equal to or less than the free float of an activity. A negative value of independent float may be obtained, but in that case the independent float is taken as zero. Based on the data given in Table 1, the earliest and latest times and the floats can be calculated as below:
Table 3: Calculation of earliest and latest times and floats
In the previous section, the critical path and the project length were determined on the basis
of activity times that were assumed to be known and constant. However in reality in most
projects these activity times are unlikely to be predicted correctly. In PERT, we assume
that it is not possible to estimate the time for each activity precisely and instead
probabilistic estimates of time are only possible. This method uses three time estimates for
an activity. They are:
• Optimistic time (a): This is the shortest time the activity can take to complete. It is based on the assumption that there will not be any difficulty in completing the work.
• Most likely time (m): This refers to the time it would normally take to complete the activity. The most likely time estimate lies between the optimistic and pessimistic time estimates.
• Pessimistic time (b): This is the longest time the activity could take to finish. It assumes that unexpected problems may occur during the execution of the activity.
Depending on the values of a, m and b, the resulting distribution of activity duration can take a variety of forms. Typically the activity completion time is assumed to follow a beta distribution, as shown in Figure 1. The beta distribution is a skewed curve, which can be either positively or negatively skewed; the one shown below is positively skewed.
The expected time (te) of an activity is a time estimate based on the weighted arithmetic mean of a, m and b. It is calculated as follows:
te = (a + 4m + b) / 6
The variance σ² of the completion time of an activity is calculated as follows:
σ² = ((b − a) / 6)²
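As a brief illustration (not part of the original text), the two formulas can be applied directly; for activity A of Table 4 below (a = 1, m = 4, b = 7) they give te = 4 and σ² = 1:

def pert_estimates(a, m, b):
    te = (a + 4 * m + b) / 6          # expected time
    var = ((b - a) / 6) ** 2          # variance
    return te, var

print(pert_estimates(1, 4, 7))        # (4.0, 1.0)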
To demonstrate the use of PERT, let us take an illustration. Instead of a single estimate,
there are three time estimates.
Table 4: Three time estimates of activity times

Activity   Predecessor   Optimistic (a)   Most likely (m)   Pessimistic (b)
A          -             1                4                 7
B          A             2                6                 7
C          D             3                4                 6
D          A             6                12                14
E          D             3                6                 12
F          B,C           6                8                 16
G          E,F           1                5                 6
First let us draw the project network reflecting the precedence relationships:
[Figure: Project network for the activities of Table 4, with activities A to G drawn on nodes 1 to 5.]
Next we need to find the expected activity times and variances, and then we can apply the concepts learnt earlier to compute the critical path. The calculations of expected times and variances are shown in Table 5.
The probability of completing the project by a desired time x is found from the standard normal variate
z = (x − te) / √(Σ σp²)
where te is the expected completion time along the critical path and Σσp² is the sum of the variances of the critical activities.
It is observed from the z table that the area corresponding to z = 0.39 is 0.1517. However, as we studied in Unit 4 of Block 1, this area is measured from the mean, and we need the total shaded area shown in the figure above. The desired probability is 0.5 + 0.1517 = 0.6517, so we can say there is about a 65% chance of completing the project by the desired time.
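The same probability can be obtained directly from the standard normal cumulative distribution; the short Python sketch below (illustrative only, using the z value from the example) shows the computation:

from math import erf, sqrt

def completion_probability(x, te_path, var_path):
    # z = (x - te) / sqrt(sum of variances along the critical path)
    z = (x - te_path) / sqrt(var_path)
    return 0.5 * (1 + erf(z / sqrt(2)))   # standard normal CDF, i.e. area to the left of z

# For the z value quoted in the text:
print(0.5 * (1 + erf(0.39 / sqrt(2))))    # ~0.652, i.e. about a 65% chance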
Check Your Progress 3
The following table of probabilistic time estimates (in weeks) and activity predecessors
are provided for a project.
Time estimates (weeks)

Activity   a   m   b    Predecessor
A          3   5   7    --
B          4   8   10   A
C          2   3   5    A
D          6   9   12   B, C
E          5   9   15   D
4. Using given data, the variance of the project’s total completion time is
a. 5.472 weeks.
b. 5.222 weeks.
c. 4.872 weeks.
d. 3.752 weeks.
5. Using given data, the probability that the project could be completed in 34 weeks or less is
approximately
a. 86 percent.
b. 89 percent.
c. 91 percent.
d. 96 percent.
1.5 Let Us Sum Up
In this unit we discussed how network techniques can be used to plan, schedule and control
a wide variety of projects. The most important aspect of project scheduling is the
development of PERT/CPM project network which depicts the activities and their
precedence relationships. From this project network and activity time estimates, the critical
path for the network, the associated critical activities can be identified. Based on the critical
path, project completion time can be calculated. A network provides information on
earliest start and finish times, the latest start and latest finish times and the float for each
activity. The length of the time an activity can be delayed without affecting the project
completion time is known as float. Activity times may be probabilistic or deterministic.
PERT uses three time estimates- optimistic, Most likely and Pessimistic. The activity times
are considered to follow beta distribution. The probability of completion of a project within
a specific time period can be determined by the use of the normal distribution.
1.6 Answers for Check Your Progress
1. (a)
Solution: te = (3 + 4×5 + 7)/6 = 5.0 weeks
2. (d)
Solution: variance = ((15 − 5)/6)² = 2.778
3. (c)
4. (b)
5. (c)
Solution: z = (34 − 31)/2.285 = 1.31, probability ≈ 0.91
1.7 Glossary
Program Evaluation and Review Technique (PERT): A network-based project scheduling technique with uncertain activity times.
Critical path method (CPM): A network based scheduling technique with certain activity
times.
Activities: Specific jobs or tasks that are components of a project.
Immediate Predecessor: The activities that must be completed immediately prior to the
start of a given activity
Project Network: A graphical representation of a project that depicts the activities and
shows the predecessor relationships among the activities
Critical Path: The longest path in a project network.
Earliest Start Time: The earliest time an activity can begin.
Earliest Finish Time: The earliest time an activity can be completed
Latest start Time: The latest time an activity may begin without increasing the project
completion time.
Latest Finish Time: The latest time an activity may be completed without increasing the
project completion time.
Float/Slack: The length of the time an activity can be delayed without affecting the project
completion time
Optimistic Time: The minimum activity time if everything progresses ideally
Most Probable Time: The most probable activity time under normal conditions.
Pessimistic Time: The maximum activity time if significant delays are encountered.
Expected Time: The average activity time
Beta Probability Distribution: A probability distribution used to describe activity times
1.8 Assignment
1. State the rules of constructing a network.
2. What is the critical path? State the necessary and sufficient conditions for the critical path. Can a project have multiple critical paths?
3. Explain the concept of float? Distinguish clearly between free and independent float.
4. A small project consists of seven activities for which relevant data is given below:
5. A project consisting of eight activities has the following characteristics:
Activity   Predecessor   Optimistic (a)   Most likely (m)   Pessimistic (b)
A          -             2                4                 12
B          -             10               12                26
C          A             8                9                 10
D          A             10               15                20
E          A             7                7.5               11
F          B,C           9                9                 9
G          D             3                3.5               7
H          E,F,G         5                5                 5
a) Draw the PERT network.
b) Find the critical path and the expected project completion time.
c) If a 30-week deadline is imposed, what is the probability that the project will be completed within the time limit?
1.9 Activities
You are put in charge of planning and coordinating the next sales management training program of your company. List the activities that need to be done to organize the program, with assumed activity times, and develop a network.
Food Solutions Ltd. distributes a variety of food products that are sold through grocery
stores and supermarket outlets. The company receives orders directly from the individual
outlets with a typical order requesting the delivery of several cases of anywhere from 20 to
50 different products. Under the company’s current warehouse operation warehouse clerks
dispatch order picking personnel to fill each order and have the goods moved to the
warehouse shipping area. Because of the high labour costs and relatively low productivity
of hand order picking, management decided to automate the warehouse operation by
installing a computer controlled order picking system, along with a conveyor system for
moving goods from storage to the warehouse shipping area.
The director of material management has been named as the project manager in charge of
the automated warehouse system. After consulting with members of the engineering staff
and warehouse management personnel, the director compiled a list of activities associated
with the project. The optimistic, most probable and pessimistic times have also been provided for each activity.
Activity   Description                    Predecessor   Optimistic   Most Probable   Pessimistic
A          Determine equipment needs      -             4            6               8
B          Obtain vendor proposals        -             6            8               16
C          Select vendor                  A, B          2            4               6
D          Order system                   C             8            10              24
E          Design new warehouse layout    C             7            10              13
F          Design warehouse               E             4            6               8
G          Design computer interface      C             4            6               20
H          Interface computer             D, F, G       4            6               8
I          Install system                 D, F          4            6               14
J          Train system operators         H             3            4               5
K          Test system                    I, J          2            4               6
Develop a report that presents the activity schedule and the expected project completion time for the warehouse expansion project. The top management of Food Solutions has established a required 40-week completion time for the project. Can this completion time be achieved? Include the probability distribution in your discussion. What recommendations do you have if the 40-week completion time is required?
1.11 Further Readings
1. Operations Research, by Hamdy A. Taha, Pearson Education
2. Operations Research: Theory and Applications, by J.K. Sharma, Macmillan India Ltd.
3. Quantitative Techniques in Management, by N.D. Vora, McGraw Hill
4. Quantitative Methods for Business, by Anderson, Sweeney and Williams, Thomson
5. Quantitative Analysis, by Render, Stair, Hanna & Badri, Pearson Education
6. Operations Research, by Pradeep Pai, Oxford University Press
Unit No. 2 Waiting Line Models
______________________________
Unit Structure
2.0 Learning Objectives
2.1 Introduction
2.2 Waiting Line system
2.2.1 Arrival process
2.2.2 Queue Structure
2.2.3 Service System
2.11 Glossary
2.12 Assignment
2.13 Activities
2.0 Learning Objectives
2.1 Introduction
Waiting in line is a common occurrence – in banks, public transportation, restaurants, hospitals, theatres, workshops, saloons and several other situations. The waiting line
problem is identified by the random arrival of a group of customers to receive some service.
Waiting line models are developed to help managers understand and take decisions
concerning the operation of waiting lines. In operations research terminology, a waiting
line is also known as a queue and the body of knowledge dealing with waiting lines is
known as queuing theory. The theory of queuing models has its origin in the work of A.K.
Erlang, a Danish telephone engineer during early 1900’s.
Waiting lines are formed when there are more arrivals than can be handled at the service facility; no waiting line will be formed if arrivals are fewer than that. Thus a lack of adequate facilities causes waiting lines of customers to be formed. Often the time a customer is required to spend in a waiting line is undesirable. The only way the demand for service can be met is to increase the service capacity, or the service efficiency, to a higher level (if possible). The service capacity can be built to such a level that the demand at peak time can be met. But adding more checkout clerks, bank tellers or servers is not always the most economical strategy for improving service, as the system will remain idle when there are few or no customers. The manager therefore needs to decide an appropriate level of service which is neither too low nor too high, so that waiting time can be kept within tolerable limits. The objective of waiting line models is to provide such information to managers that they are able to make decisions to balance desirable service levels against the cost of providing the service.
[Figure: A queuing system – customers from an input source arrive (arrival process), join the queue, are attended to by the service system, and then leave the system.]
The arrivals from the input population can be classified on different bases as follows:
Source of arrival: Customer arrivals at a service system may be drawn from a finite or
infinite population. For example all the people of the city can be potential customers for a
supermarket. The number of people being very large, the population can be taken as infinite. An infinite population is large enough in relation to the service system that the change caused by subtractions from or additions to the existing population does not significantly affect the system probabilities. However, there are business situations where the population is
considered finite. For example, consider a group of six machines being maintained by one
repairman. When one machine breaks down, the source population is reduced to five and
the chance of another machine breaking down is less than when six machines were
operating. The probability of another breakdown is again changed if two machines are
down, with only four operating machines.
Size of arrival: The customers may arrive for service individually or in groups. Single arrivals are illustrated by customers visiting banks, saloons etc. On the other hand, families visiting restaurants and shipments being loaded into trucks are examples of bulk or batch arrivals.
Arrival Distribution: Defining the arrival process for a waiting line involves determining
the distribution of customer arrival times. The queuing models wherein the number of
arrivals in a given period of time is known with certainty are known as deterministic
models. On the other hand for many waiting line situations the arrivals occur randomly
and independently of other arrivals and we cannot predict when an arrival will occur. In
such cases, a frequently employed assumption is that the Poisson probability distribution
provides a good description of the arrival pattern.
Degree of Patience: A patient arrival will wait until the service facility is ready to serve them. There are two types of impatient arrivals. Members of the first class arrive, view the
service facility and length of the line and then decide to leave. Those in the second class
arrive, view, wait in line and leave after some time. The behavior of the first type is known as balking and the second is termed reneging.
In queue structure the important thing to know is the queue discipline which means the set
of rules for determining the order of service to customers in a waiting line. The most
common disciplines are:
1. First come First Served (FCFS)
2. Last come first served (LCFS)
3. Service in random order (SIRO)
4. Priority Service/reservations
There are two aspects to the service system- (1) the structure of the service system (2)
Distribution of service time.
Structure of service system: The structure of a service system means how the service
facilities exist. Waiting line processes are generally classified into four basic structures:
Single-channel single-phase, single-channel multiple-phase, multiple-channel single-phase
and multiple-channel multiple-phase. Channels are the number of parallel servers and
phases denote the number of sequential servers. A bank with a single clerk providing
service to a single line of customers is an example of single-channel single-phase queuing
system. If several clerks are providing service to a single line of customers, it will be an
example of multiple-channel single-phase system. An example of single-channel multiple-
phase system is the manufacturing assembly line type operation in which the product goes
through several sequential machines at work stations to be worked on. If there are two or
more assembly lines manufacturing the same product, it is an example of multiple-channel
multiple-phase.
Distribution of Service Time: The service time is the time a customer spends at the service
facility once the service has started. Waiting line formulae generally specify service rate as
the number of units served per unit of time. A constant service time rule states that each
service takes exactly the same time, as in case of automated operations. When service times
are random, they can be approximated by the exponential probability distribution.
There are numerous waiting line models available. We shall be considering the following
models in this unit:
a) Single Channel Poisson Arrivals with Exponential Service Times (M/M/1)
b) Multiple Channel Poisson Arrivals with Exponential Service Times (M/M/C)
c) Single- Channel with Poisson arrivals and Arbitrary Service Times (M/G/1)
In each of these models the customer arrivals follow Poisson distribution. If the arrivals are
independent with a mean arrival rate of 𝜆 per period of time, the Poisson probability
function provides the probability of x arrivals in a specific time period as (discussed in
detail Block 1, Unit 3).
P(x) = (λ^x e^(−λ)) / x!
Where, x= number of arrivals in the time period
𝜆 = mean number of arrivals per time period
e= 2.71828
For the first two models, the service times are distributed exponentially. Using exponential
probability distribution, the probability that the service time will be less than or equal to a
time of length t is(discussed in detail Block 1, Unit 4):
P(service time ≤ t) = 1 − e^(−µt)
where µ = mean number of units that can be served per unit time period
e = 2.71828
Further in each of the models the customer service is assumed to be in first-come- first –
served order (FCFS). Now we will describe each of the models in detail.
To evaluate a model we need to first check whether a service station can handle the
customer demand of service. If 𝜆 ≥ µ, the waiting line will increase infinitely and the system
will collapse. For the system to be functional, the arrival rate should be less than the service rate (λ < µ).
The following Formulas are used to compute the steady state operating characteristics:
1. Probability that system is busy or probability that a customer has to wait for service :
ρ = λ / µ
where ρ (rho) is also known as the traffic intensity or utilisation factor
2. The probability that zero units are in the system or probability that system is idle
𝜆
𝑃0 = 1 − 𝜌 = 1 −
𝜇
3. Probability of exactly n customers in the system:
Pn = ρ^n P0 = (λ/µ)^n P0
4. Average/expected number of customers in the system:
Ls = λ / (µ − λ)  or  ρ / (1 − ρ)
5. Average/expected number of customers in the queue:
Lq = λ² / (µ(µ − λ))  or  ρ² / (1 − ρ)
6. Average waiting time in queue:
Wq = λ / (µ(µ − λ))  or  ρ / (µ − λ)
7. Average waiting time in system:
Ws = 1 / (µ − λ)
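A small Python sketch (illustrative only, not part of the original text) that evaluates these M/M/1 formulas; the rates λ = 10 and µ = 16 per day anticipate Illustration 2 below:

def mm1(lam, mu):
    assert lam < mu, "system is unstable unless arrival rate < service rate"
    rho = lam / mu                     # utilisation / probability the server is busy
    P0 = 1 - rho                       # probability the system is idle
    Ls = lam / (mu - lam)              # average number of customers in the system
    Lq = lam ** 2 / (mu * (mu - lam))  # average number of customers in the queue
    Wq = lam / (mu * (mu - lam))       # average waiting time in the queue
    Ws = 1 / (mu - lam)                # average time in the system
    return rho, P0, Ls, Lq, Wq, Ws

print(mm1(10, 16))   # rho = 0.625, P0 = 0.375, Ls ≈ 1.67, ...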
Illustration 2: A repairman finds that the time spent on his jobs has an exponential distribution with a mean of 30 minutes. Machines arrive for repair at an average rate of 10 per 8-hour day. What is the repairman's expected idle time each day? How many jobs are ahead of a machine just brought in? What is the probability that four machines are waiting to be repaired?
Here the arrival rate λ = 10 machines/day and the mean servicing time is 30 minutes. This means 2 machines can be repaired in one hour, and (2 × 8) = 16 machines in a day, so µ = 16 machines/day.
The probability that the system is idle is P0 = 1 − λ/µ = 1 − 10/16 = 0.375, so the expected idle time each day is 8 × 0.375 = 3 hours.
To determine the number of jobs ahead of a machine just brought in, we calculate the average number of machines in the system:
Ls = λ / (µ − λ) = 10 / (16 − 10) = 5/3 = 1.67 machines
The probability that four machines are waiting means that in total there are five machines in the system (n = 5):
Pn = (λ/µ)^n P0 = (10/16)^5 × 0.375 ≈ 0.036
The following formulas are used to compute the steady-state operating characteristics for multiple-channel waiting lines, where K is the number of channels (servers), λ the arrival rate and µ the service rate per channel.
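Since the detailed formula list is not reproduced here, the following Python sketch assumes the standard M/M/K results for P0, Pw, Lq, Wq, Ws and Ls; with λ = 10, µ = 4 and K = 3 it reproduces the values used in Illustration 3 below (P0 ≈ 0.045, Pw ≈ 0.703, a queue wait of about 21 minutes):

from math import factorial

def mmc(lam, mu, c):
    r = lam / mu                       # offered load
    rho = lam / (c * mu)               # utilisation per server
    assert rho < 1, "need lam < c*mu for a steady state"
    P0 = 1 / (sum(r ** n / factorial(n) for n in range(c))
              + r ** c / (factorial(c) * (1 - rho)))
    Pw = (r ** c / factorial(c)) * (c * mu / (c * mu - lam)) * P0   # P(customer must wait)
    Lq = P0 * (r ** c) * rho / (factorial(c) * (1 - rho) ** 2)      # number in queue
    Wq = Lq / lam                                                   # time in queue
    Ws = Wq + 1 / mu                                                # time in system
    Ls = lam * Ws                                                   # number in system
    return P0, Pw, Lq, Wq, Ws, Ls

print(mmc(10, 4, 3))   # P0 ≈ 0.045, Pw ≈ 0.703, Wq ≈ 0.35 h (about 21 minutes)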
Illustration 3: The customer care centre of a departmental store helps customers with their questions, complaints or issues regarding credit card bills. Chairs are placed along the wall, making it a single waiting line. The customers are served by three store representatives on a first-come-first-served basis. The store management wants to analyze this queuing system, as excessive waiting times can make customers angry enough to shop at other stores. A study of the customer service department for a 6-month period shows that an average of 10 customers arrive per hour and an average of 4 customers can be served per hour by a customer care representative.
Here, λ = 10 customers/hour
µ = 4 customers/hour
K = 3 customer representatives
Kµ = 3 × 4 = 12 (> λ)
Using the multiple-server model formulas, we can compute the following operating characteristics for the departmental store. The probability that there are no customers in the system is
P0 = 0.045
The probability that a customer arriving in the system must wait for service (i.e. all three servers are busy) is
Pw = ((λ/µ)^K / K!) × (Kµ / (Kµ − λ)) × P0
   = ((10/4)^3 / 3!) × (3×4 / (3×4 − 10)) × 0.045
   = 0.703
The average number of customers in the queue is
Lq = ((λ/µ)^K × ρ / (K!(1 − ρ)²)) × P0
   = ((10/4)^3 × 0.833 / (3!(1 − 0.833)²)) × 0.045
   ≈ 3.5 customers
The department store's management has observed that customers are frustrated by the waiting time of 21 minutes and the 0.703 probability of waiting. The management is considering employing an additional service representative to improve the level of service. The operating characteristics for this system must be recomputed with K = 4 service representatives: P0 = 0.073, Pw = 0.31, Ls = 3 customers, Ws = 18 minutes, Lq = 0.5 customers, Wq = 3 minutes.
The waiting time is considerably reduced from 21 minutes to 3 minutes. However, this improvement in the quality of the service would have to be compared with the cost of adding an extra service representative before taking any decision.
2.7 Single- Channel with Poisson arrivals and Arbitrary Service
Times(M/G/1)
This model is based on following assumptions:
• The arrivals follow Poisson distribution with a mean arrival rate of 𝜆
• The service time has a general probability distribution with a mean service rate of µ
and standard deviation of σ.
• There is a single service station
• A single waiting line is formed
• Customers are served on FCFS basis
The following formulas are used to compute the steady-state operating characteristics for the M/G/1 model.
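As the formula list itself is not reproduced here, the Python sketch below assumes the standard M/G/1 (Pollaczek–Khintchine) results; treat it as an illustration rather than the unit's own notation. Here lam is the arrival rate, mu the mean service rate and sigma the standard deviation of the service time:

def mg1(lam, mu, sigma):
    rho = lam / mu
    assert rho < 1
    P0 = 1 - rho                                                # probability the system is idle
    Lq = (lam ** 2 * sigma ** 2 + rho ** 2) / (2 * (1 - rho))   # number in queue
    Ls = Lq + rho                                               # number in system
    Wq = Lq / lam                                               # time in queue
    Ws = Wq + 1 / mu                                            # time in system
    return P0, Lq, Ls, Wq, Ws

With sigma = 1/mu this reduces to the M/M/1 values, and with sigma = 0 it gives the constant-service-time case.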
2.8 Economic Analysis of waiting Line
The information we derive from the operating characteristics of various models can be used
to determine the appropriate level of service. Inadequate service would cause excessive
waiting which has a cost in terms of customer frustration, loss of goodwill, direct cost of
idle machines (machines to be used in production waiting for repair work) etc. On the other
hand, high service level would result in higher set up cost and idle time for service station.
Thus the goal of queuing modeling is the achievement of an economic balance between the
cost of providing service and the cost associated with the waiting time for service. The
optimum level of service would be where the total of waiting time cost and cost of
providing service is minimum. Figure 1 shows that increasing the service level results in an increase in the cost of service and a reduction in the cost of waiting time.
The thick curve shows that the total cost decreases to a point and then starts increasing.
The service level corresponding to the minimum point on it is the optimum service level.
Case I – One worker
λ = 3/hour and µ = 5/hour
Cost of waiting (Cw) = down-time cost for 1.5 machines = 250 × 1.5 = Rs 375 per hour
Cost of service (Cs) for one worker = Rs 160 per hour
Total cost per hour = Cw + Cs = 375 + 160 = Rs 535
Case II – Two workers
Cost of waiting (Cw) = down-time cost for 0.75 machines = 250 × 0.75 = Rs 187.5 per hour
Cost of service (Cs) for two workers = 160 × 2 = Rs 320 per hour
Total cost per hour = Cw + Cs = 187.5 + 320 = Rs 507.5
Case III – Three workers
Cost of waiting (Cw) = down-time cost for 0.60 machines = 250 × 0.60 = Rs 150 per hour
Cost of service (Cs) for three workers = 160 × 3 = Rs 480 per hour
Total cost per hour = Cw + Cs = 150 + 480 = Rs 630
Comparing the cost of one, two and three workers, the total cost is lowest in Case II. Hence
the optimal solution is hiring 2 workers.
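The cost comparison of the three cases can be summarised in a few lines of Python (an illustrative sketch, using the waiting and service costs quoted above; the Ls values for two and three workers are those stated in the text):

waiting_cost_per_machine = 250            # Rs per machine-hour of down time
service_cost_per_worker = 160             # Rs per worker-hour
Ls_by_workers = {1: 1.5, 2: 0.75, 3: 0.60}   # expected machines down, from the text

for k, Ls in Ls_by_workers.items():
    total = waiting_cost_per_machine * Ls + service_cost_per_worker * k
    print(k, total)    # 1: 535, 2: 507.5, 3: 630 -> two workers is cheapest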
2.9 Let Us Sum Up
Waiting line theory deals with situations where customers arrive, wait for the service, get
the service and leave the system. In this unit we discussed a variety of waiting line models
that have been developed to help managers make better decisions concerning the operation
of waiting lines. The formulae required to compute operating characteristics or
performance measures for each model were presented. The operating characteristics
include- Probability that system is idle, Average number of customers in system, average
number of customers in queue, average time a unit spends in the waiting line, average time
a unit spends in system, probability that arriving customers have to wait for service .
Queuing structures are analyzed for determining the optimum level of service, where the
total cost of providing service and waiting is minimized. An increase in the level of service
increases the cost of providing service but reduces the cost of waiting. While the waiting
line models can be deterministic as well, the probabilistic ones are commonly occurring
and analyzed. Three models discussed in this unit include- Single-channel Poisson- arrival
with exponential- service times (M/M/1), Multiple-channel, Poisson arrivals with
exponential service time(M/M/C) and Single –Channel, Poisson- arrival with arbitrary
service times. For a queuing system to be functional the arrival rate of the customers per
unit of time should be less than the service rate.
2.10 Answers for Check Your Progress
1. True
2. (d)
2.11 Glossary
2.12 Assignment
1. Which assumptions are necessary to employ (M/M/C) waiting Line Model?
2. Discuss the waiting line system in detail with some queuing situations.
3. Describe a single-server waiting line model. Give a real-life example for each of the following queue disciplines:
a. First come first served
b. Last come first served
4. The mechanic at Car point is able to install new mufflers at an average of three per
hour while customers arrive at an average rate of 2 per hour. Assuming that the conditions
for a single –server infinite population model are all satisfied, calculate the following:
a. Utilization parameter
b. The average number of customers in the system
c. The average time a customer spends in the queue
d. The probability that there are more than three customers in the system.
5. A service station has five mechanics each of whom can service a scooter in 2 hours on
an average. The scooters are registered at a single counter and then sent for servicing to
different mechanics. Scooters arrive at the service station at an average rate of 2 scooters
per hour. Assuming that arrivals are Poisson distributed and servicing times are distributed
exponentially, determine:
a. The probability that system is idle
b. The probability that there are 3 scooters in the service centre
c. The expected number of scooters waiting in the queue
d. The average waiting time in the queue.
2.13 Activities
Analyze the following queuing systems by describing their various system properties:
a) Hospital Emergency Room
b) Traffic light
c) Computer system at university
The arrival rate of 24 per hour means that on an average a customer arrives about every 2.5
minutes (60/24). This indicates the store is busy. Because of the nature of the store,
customers purchase few items and expect a quick service. Customers expect to spend more
time in a supermarket where they make larger purchases but they shop at a drive-in market
because it is quicker than a supermarket. Given customer’s expectations, the manager
believes that it will be unacceptable for a customer to wait beyond 5 minutes in the waiting
line.
The market manager wants to determine the operating characteristics for this waiting line
system and wants to test if hiring another employee to pack up purchases will help in
reducing customer waiting time and still be economically viable. An extra employee will
cost the market manager $150 per week. With the help of market research agency, the
manager had determined that for each minute that customer waiting time is reduced, the store avoids a loss in sales of $75 per week. The service rate with two employees will be
40 customers per hour.
Unit No. 3 Game Theory
______________________________
Unit Structure
3.0 Learning Objectives
3.1 Introduction
3.2 Basic Concepts in Game Theory
3.3 Two-person zero-sum game
3.3.1 Payoff Matrix
3.3.2 Maximin Strategy
3.3.3 Minimax Strategy
3.3.4 Saddle Point
3.4 Game with No Saddle point
3.5 Principle of Dominance
3.6 Solution of 2 x n and m x 2 games
3.7 Let Us Sum Up
3.8 Answers for Check your Progress
3.9 Glossary
3.10 Assignment
3.11 Activities
3.12 Case Study
3.13 Further Readings
3.0 Learning Objectives
3.1 Introduction
The models and techniques we have discussed so far in operations research involve the interest of a single organization. For example, in the transportation problem we are interested in the minimization of cost or the maximization of profit, given the organizational constraints. However, in real-life situations, decisions often have to be taken where two or more rational opponents are involved under conditions of competition and conflicting interests. Game theory deals with processes in which an individual, a group or an organization is not in complete control of the other player, the opponent, and addresses situations involving conflict, co-operation or both at different levels.
The main objective of the game theory is to determine the rules of rational behavior in the
situations in which the outcomes are dependent on the actions of the interdependent
players. A game is a situation in which two or more players are competing. The players
may have different objectives but their fate is intertwined. They might have some control
that will influence the outcome but they do not have complete control over others. Game
Theory is the analysis (or science) of rational behavior in interactive decision-making. It is
therefore distinguished from individual decision-making situations by the presence of
significant interactions with other ‘players’ in the game. Game Theory can be used to
help in explaining past events and situations, predict what actions players will take in
future games, and based on it take decisions in interactions with other players to achieve
the best outcome.
If there are two participants in a game it is called a two-person game, and if more than two participants are involved, it is an n-person game. In a game, if the sum of the gains and
losses is equal to zero, it is called zero- sum or constant-sum game. If the sum of the gains
and losses is not equal to zero, it is called non-zero-sum game. A game is said to be finite
if each player has the option of choosing from only a finite number of strategies, or else it
is called infinite.
Some of the key concepts to be used in game theory are described below:
Players: The competitors or decision makers in a game are called the players of the game.
Strategies: The alternative courses of action available to a player are called as strategies
Payoff: The outcome of playing a game is called the payoff to the concerned player.
Optimal Strategy: A strategy in which the player can achieve the maximum payoff is
called the optimal strategy.
Payoff Matrix: The tabular display of the payoffs of the players under various alternatives
is called the payoff matrix.
Pure strategy: A game solution that provides a single best strategy for each player.
Mixed strategy: If there is no one specific strategy as the best strategy for any player in a
game, then the game is referred to as mixed strategy or a mixed game. Each player has to
choose different alternative courses of action from time to time.
2. The zero-sum game implies that any gain of one player is exactly matched by a loss to
the other, so that their sum is equal to zero. ( True/ False)
3. Game theory is concerned with
a) Predicting the results of bets
b) Choice of an optimal strategy in conflict situations
c) Utility maximization by firms
d) Migration pattern in India
4. In game theory, a game in which one firm can gain only what another firm loses is called
a) A non zero-sum game
b) Two-person game
c) Prisoners dilemma
d) Zero-sum game
3.3.1 Payoff Matrix
When players select particular strategies, the payoff can be represented in the form of a
payoff matrix. Suppose firm A has m strategies and firm B has n strategies, a payoff matrix
will be
The matrix is written from player A's point of view. Player A wishes to gain as large a payoff aij as possible, while player B will do his best to make aij as small as possible.
Let us assume that both the firms A and B are considering three strategies to gain market share: advertising, promotion and quality improvement. The strategies of advertising, promotion and better quality are represented as A1, A2 and A3 respectively for firm A, and B1, B2 and B3 respectively for firm B. As shown in the matrix below, in total there are 3 × 3 = 9 combinations of moves. Each pair of moves affects the share of the market in a particular way. As the payoffs are in terms of A, a positive payoff indicates that A has gained at the expense of firm B, while negative pay-offs imply B's gain at A's expense. For example, the strategy of advertising by both firms A and B will lead to a 12% market share gain for firm A, while advertising by A and promotion by B would lead to a shift of 7% market share in favour of B. Similarly there are pay-offs corresponding to the other pairs of moves.
                B's Strategy
                B1    B2    B3
A's Strategy A1  12    -7    -2
             A2   6     7     3
             A3 -10    -5     2
3.3.2 Maximin Strategy
The conservative approach to selecting the best strategy calls for assuming the worst will happen and acting accordingly. With reference to the pay-off matrix, if firm A employs strategy A1, it would expect firm B to employ strategy B2, thereby reducing A's payoff from strategy A1 to its minimum value of -7, representing a loss to firm A. If the firm employs strategy A2, it would expect firm B to employ strategy B3, which would give a three percent gain in market share. Similarly, for strategy A3 it would expect firm B to employ strategy B1, with a loss of 10 percent. Firm A would like to make the best of the situation by choosing the strategy which gives the maximum of these minimum pay-offs. Since the minimum payoffs of strategies A1, A2 and A3 are -7, 3 and -10 respectively, firm A would select A2 as its strategy. This decision rule is called the Maximin strategy.
3.3.3 Minimax Strategy
Firm B would also employ a similar conservative approach. When B employs strategy B1, it expects firm A to employ A1, which gives the maximum gain to A. In a similar way, adoption of B2 or B3 would make it expect firm A to adopt strategy A2. To minimize the gain of the competing firm, firm B would select the strategy which yields the least gain to firm A. This decision rule of firm B is called the Minimax strategy.
As discussed above, it is clear that the maximin strategy A2 of firm A and the minimax strategy of firm B both lead to the same payoff. These strategies are based on the conservative
approach of choosing the best strategy, by assuming that the worst will happen. By
adopting the maximin strategy A can stop B from lowering its gain in the market share
below 3 percent and by adopting minimax strategy firm B can stop A from gaining more
than 3 percent market share. The situation is therefore, one of equilibrium. The point of
equilibrium is known as the saddle point.
To obtain the saddle point, if it exists, we determine the minimum payoff value for each row and the maximum payoff value for each column. If the maximum of the row minima is equal to the smallest of the column maxima, then it represents the saddle point. For illustration, let us continue with the same problem:
It is also possible to have more than one saddle point for a given problem. For example, consider the following matrix:

                B's Strategy                  Row Minima
                B1    B2    B3    B4
A's Strategy A1   2    15    13   -14         -14
             A2  -5     6    -4    -5          -5*
             A3   5    -2     0    -5          -5*
Column Maxima     5    15    13    -5*
Formatted: Indent: Left: 0.5 cm
In relation to B's minimax strategy, firm A could employ either A2 or A3, each of which represents a maximin strategy for it. As the pay-off corresponding to B's minimax strategy and either of A's maximin strategies is identical, there are two saddle points, represented by A2B4 and A3B4. The value of the game is -5, a net loss of 5 points to A and an equivalent gain to B.
Illustration 1: Soul Ltd has forecast the sales of its products and of the products of its competitor, Pure Ltd. There are four strategies for Soul Ltd – S1, S2, S3, S4 – and three strategies available to Pure Ltd – P1, P2, P3. The payoffs for all twelve combinations are given below. Considering this information, state what the optimal strategies for Soul Ltd and Pure Ltd respectively would be. What is the value of the game? Is the game fair?
3.4 Game with No Saddle Point
It is possible that a game has no saddle point, and hence it is not possible to find a solution in terms of pure strategies using the maximin and minimax rules. To solve such problems we need to employ mixed strategies. A mixed strategy represents a combination of two or more strategies that are selected one at a time, with pre-determined probabilities. Therefore, with a mixed strategy, a player decides to choose among the various alternatives in a certain ratio.
Illustration 2: The following is the pay-off matrix of a game being played by A and B. Determine the optimal strategies for the players and the value of the game.
With mixed strategies, let player A employ strategy A1 with probability x and strategy A2 with probability (1 − x). If B plays strategy B1, A's expected payoff can be determined from the first column of the pay-off matrix; similarly, if B plays strategy B2, A's expected payoff can be determined from the second column. We shall find a value of x such that the expected payoff for A is the same irrespective of the strategy adopted by B. This is obtained by equating the two expressions and solving:
9x − 5(1 − x) = −6x + 5(1 − x)
or 9x − 5 + 5x = −6x + 5 − 5x
or 25x = 10
or x = 10/25 = 2/5

A will do best by choosing strategies A1 and A2 in the proportion 2:3 (i.e. A1 2/5 of the time and A2 3/5 of the time). The expected pay-off for A applying this mixed strategy is:
9x − 5(1 − x) = 9 × (2/5) − 5(1 − 2/5) = 3/5
or
−6x + 5(1 − x) = −6 × (2/5) + 5(1 − 2/5) = 3/5
Thus firm A will have a net gain of 3/5 in the long run.
We can determine the mixed strategy of B in a similar way. If player B plays B1 with probability y and B2 with probability (1 − y), then equating B's expected losses against A's two strategies, 9y − 6(1 − y) = −5y + 5(1 − y), gives 25y = 11, i.e. y = 11/25. Thus B would play strategies B1 and B2 in the ratio 11:14 in a random manner.
The expected pay-off when B applies this mixed strategy is:
9y − 6(1 − y) = 9 × (11/25) − 6(1 − 11/25) = 3/5
or
−5y + 5(1 − y) = −5 × (11/25) + 5(1 − 11/25) = 3/5
i.e. A still gains 3/5 per play, which represents a loss of 3/5 to B.
Thus, we conclude that A and B should both use mixed strategies as given below, and the value of the game in the long run is 3/5.

        Strategy    Probability
For A:  A1          2/5
        A2          3/5
For B:  B1          11/25
        B2          14/25
In general, for a two-person zero-sum game in which players A and B have strategies A1, A2 and B1, B2 respectively and the payoffs are as given below, let x be the probability of player A choosing strategy A1 and y be the probability of player B choosing strategy B1:

                B's Strategy
                B1     B2
A's Strategy A1 A11    A12
             A2 A21    A22
Then,
x = (A22 − A21) / [(A11 + A22) − (A12 + A21)]
y = (A22 − A12) / [(A11 + A22) − (A12 + A21)]
and the value of the game is
V = (A11·A22 − A12·A21) / [(A11 + A22) − (A12 + A21)]
Substituting the values from the illustration above, we obtain the same results as before:
x = (5 + 5) / [(9 + 5) − (−6 − 5)] = 10/25 = 2/5
y = (5 − (−6)) / [(9 + 5) − (−6 − 5)] = 11/25
V = (9 × 5 − (−5)(−6)) / [(9 + 5) − (−6 − 5)] = 15/25 = 3/5
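These formulas translate directly into a small solver; the Python sketch below (illustrative only, not part of the original text) reproduces the values x = 2/5, y = 11/25 and V = 3/5 for the illustration above:

def solve_2x2(a11, a12, a21, a22):
    # 2 x 2 zero-sum game without a saddle point; payoffs from A's point of view
    den = (a11 + a22) - (a12 + a21)
    x = (a22 - a21) / den               # probability that A plays A1
    y = (a22 - a12) / den               # probability that B plays B1
    v = (a11 * a22 - a12 * a21) / den   # value of the game
    return x, y, v

print(solve_2x2(9, -6, -5, 5))          # (0.4, 0.44, 0.6), i.e. 2/5, 11/25, 3/5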
Consider the following pay-off matrix:

                B's Strategy
                B1    B2    B3
A's Strategy A1  0    -1     2
             A2  5     4    -3
             A3  2     3    -4
Let us follow the usual procedure for identifying a pure strategy: we compute the row minima and column maxima as below:

                B's Strategy                 Row Minima
                B1    B2    B3
A's Strategy A1  0    -1     2                -1*
             A2  5     4    -3                -3
             A3  2     3    -4                -4
Column Maxima    5     4     2*
The maximum of the row minima is -1 and the minimum of the column maxima is 2. As the maximin and minimax values are not equal, this two-person zero-sum game does not have an optimal pure strategy. For a problem larger than a 2 x 2 matrix, we cannot compute the mixed strategy probabilities using the algebraic equations as we did in the previous section.
If a game larger than 2 x 2 requires a mixed strategy, we need to reduce the size of the matrix by looking for dominated strategies. A strategy is dominated if another strategy is at least as good regardless of what the opponent does. For example, comparing strategies A2 and A3: in column B1, 5 > 2; in column B2, 4 > 3; and in column B3, -3 > -4. Thus, regardless of what player B does, player A will always prefer the higher values of strategy A2 as compared to A3. Therefore we can say strategy A2 dominates strategy A3, and strategy A3 can be dropped from consideration by player A. This helps us reduce the size of the game. After eliminating it, the game becomes:
                B's Strategy
                B1    B2    B3
A's Strategy A1  0    -1     2
             A2  5     4    -3
Now if we compare A1 and A2, we cannot find a dominated strategy. Next we look for dominated strategies for player B. We should remember that player B looks for smaller values, as the matrix is in terms of A's payoff. Comparing strategies B1 and B2: in row A1, -1 < 0; in row A2, 4 < 5. Thus, regardless of what player A does, player B would always prefer the smaller values of strategy B2 over strategy B1. Therefore B1 is dominated by strategy B2 and hence is eliminated.
                B's Strategy
                B2    B3
A's Strategy A1 -1     2
             A2  4    -3
By successively eliminating dominated strategies, we have reduced the game to a 2 x 2 game. The algebraic solution procedure described in the earlier section can now be used to find the optimal probabilities for this mixed strategy problem:
x = (A22 − A21) / [(A11 + A22) − (A12 + A21)] = (−3 − 4) / [(−1 − 3) − (2 + 4)] = −7/−10 = 7/10
y = (A22 − A12) / [(A11 + A22) − (A12 + A21)] = (−3 − 2) / [(−1 − 3) − (2 + 4)] = −5/−10 = 1/2
Thus, we conclude that A and B should both use mixed strategies as given below, and the value of the game in the long run is 1/2.

        Strategy    Probability
For A:  A1          7/10
        A2          3/10
        A3          0
For B:  B1          0
        B2          1/2
        B3          1/2
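Dominance elimination can also be automated. The Python sketch below (illustrative only, not the unit's own procedure) repeatedly drops rows that are dominated for A (who maximises) and columns that are dominated for B (who minimises); applied to the example above it leaves rows A1, A2 and columns B2, B3:

def reduce_by_dominance(matrix):
    rows = list(range(len(matrix)))
    cols = list(range(len(matrix[0])))
    changed = True
    while changed:
        changed = False
        for i in rows[:]:   # drop row i if some other row is at least as good in every column
            if any(all(matrix[k][j] >= matrix[i][j] for j in cols) for k in rows if k != i):
                rows.remove(i); changed = True
        for j in cols[:]:   # drop column j if some other column is at least as good for B everywhere
            if any(all(matrix[i][k] <= matrix[i][j] for i in rows) for k in cols if k != j):
                cols.remove(j); changed = True
    return rows, cols       # indices of the remaining rows and columns

print(reduce_by_dominance([[0, -1, 2], [5, 4, -3], [2, 3, -4]]))   # ([0, 1], [1, 2])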
There are problems where, after applying the dominance rule, the game is reduced to a 2 x n or an m x 2 matrix. In such cases, the problem can be solved graphically.
2. Given the following two-person game, which strategy can be eliminated by the dominance rule?

     Y1    Y2
X1    9    13
X2   12     8
X3    6    14

a) X1
b) X2
c) X3
d) None of the above
3.6 Solution of 2 x n and m x 2 games
When player A has only 2 strategies to choose from and player B has n strategies, the game is of order 2 x n, whereas if B has only two strategies and A has m strategies, the game is an m x 2 game.
The problem may originally be a 2 x n or an m x 2 game, or might have been reduced to such a size after applying the dominance rule. By using the graphical method, the aim is to reduce the game to order 2 x 2 by identifying and eliminating the dominated strategies, and then to solve it by the algebraic method used earlier. The game value and optimal strategy can be read from the graph, but generally the algebraic method is adopted to get the answer.
Consider the following pay-off matrix:

                        Player B's Strategy
                        B1     B2
Player A's Strategy A1   6     -7
                    A2   1      3
                    A3   3      1
                    A4   5     -1
Here, the payoff matrix consists of m rows and 2 columns, so we will discuss how to solve an m x 2 game. The first step is to check whether the problem has a saddle point or not. As can be verified, this game has no saddle point.
Let y be the probability that player B selects strategy B1 and (1 − y) the probability that player B selects strategy B2. When player A chooses to play A1, the expected payoff for B shall be 6y − 7(1 − y) = 13y − 7. Similarly the expected payoffs against strategies A2, A3 and A4 are found and are shown in the table below. To plot the lines graphically, the value of the pay-off when y = 0 and y = 1 is also calculated for each of the strategies.
Player A's Strategy    Expected Pay-off             Pay-off at y=0    Pay-off at y=1
A1                     6y − 7(1 − y) = 13y − 7           -7                6
A2                     1y + 3(1 − y) = −2y + 3            3                1
A3                     3y + 1(1 − y) = 2y + 1             1                3
A4                     5y − 1(1 − y) = 6y − 1            -1                5
Plot the pay-off values, using appropriate scaling, with 'y' on the x-axis and the pay-off values on the y-axis. This is shown in the figure below.
The lines are marked A1, A2, A3 and A4; they represent the respective strategies. For each value of y, the height of a line at that point denotes the pay-off of the corresponding strategy against the mixture (y, 1 − y) chosen by B. Player B is concerned with the maximum pay-off that can be obtained against any particular strategy, which is represented by the uppermost boundary formed by the four lines, and wishes to choose y so as to minimize this maximum pay-off. The lowest intersection point on the upper boundary of the graph is the minimax point for player B. ABCD is the upper boundary of the graph and its lowest point is B, so this is the minimax point. As can be seen from the graph, more than two lines pass through point B. We select any two lines with opposite slopes; so either A2 and A3 or A2 and A4 can be selected. Here we select A2 and A4. The reduced pay-off matrix will be as follows:
                          Player B's Strategy
                          B1      B2
Player A's Strategy  A2    1       3
                     A4    5      -1
The 2 x 2 matrix can be solved using the algebraic method as:
x = (A22 - A21) / [(A11 + A22) - (A12 + A21)] = (-1 - 5) / [(1 - 1) - (3 + 5)] = -6/-8 = 3/4
y = (A22 - A12) / [(A11 + A22) - (A12 + A21)] = (-1 - 3) / [(1 - 1) - (3 + 5)] = -4/-8 = 1/2
Thus, we conclude that A and B should both use mixed strategies as given below. The value of the game in the long run is V = (A11 A22 - A12 A21) / [(A11 + A22) - (A12 + A21)] = (-1 - 15) / (-8) = 2.
Strategy       Probability
For A,   A1       0
         A2       3/4
         A3       0
         A4       1/4
For B,   B1       1/2
         B2       1/2
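As a quick arithmetic check only (not part of the original text), the same 2 x 2 formulas can be evaluated for the reduced matrix read from the graph:

```python
# Verifying the worked answer for the reduced game (rows A2, A4; columns B1, B2).
from fractions import Fraction

a11, a12, a21, a22 = 1, 3, 5, -1                  # A's payoffs in the reduced matrix
denom = (a11 + a22) - (a12 + a21)                 # (1 - 1) - (3 + 5) = -8
x = Fraction(a22 - a21, denom)                    # P(A plays A2) = 3/4
y = Fraction(a22 - a12, denom)                    # P(B plays B1) = 1/2
value = Fraction(a11 * a22 - a12 * a21, denom)    # value of the game = 2
print(x, 1 - x, y, 1 - y, value)                  # 3/4 1/4 1/2 1/2 2
```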
If it is a 2 x n game, the expected payoff values are calculated for player A, with x as the probability of choosing strategy A1 and (1 - x) as the probability of choosing strategy A2. The x-axis is used to represent the values of x and the y-axis the expected payoff of player A. The highest intersection point on the lower boundary of the graph is the maximin point for player A.
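The graphical reasoning can also be mimicked numerically. The sketch below is an assumption-only illustration, not the method prescribed by the text: for the m x 2 example above, it scans values of y in [0, 1] and finds the lowest point of the upper boundary formed by the four expected payoff lines.

```python
# Numeric stand-in for the graph: scan y and locate the minimax point of the
# upper envelope of B's expected payoff lines (m x 2 example from the text).
rows = {"A1": (6, -7), "A2": (1, 3), "A3": (3, 1), "A4": (5, -1)}

def upper_envelope(y):
    # B's worst-case expected payoff when B plays B1 with probability y
    return max(b1 * y + b2 * (1 - y) for b1, b2 in rows.values())

best_y = min((i / 1000 for i in range(1001)), key=upper_envelope)
print(round(best_y, 3), round(upper_envelope(best_y), 3))   # 0.5 and 2.0, as derived above
```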
Summary of the Steps for Solving Two-person Zero-sum Games
1. Use the maximin strategy for player A and the minimax strategy for player B to determine whether a pure strategy solution exists. If there is a saddle point, it is the optimal solution (a short code sketch of this check is given below).
2. If a pure strategy solution does not exist and the game is larger than 2 x 2, identify a dominated strategy to remove a row or a column. Develop a reduced payoff table and continue to apply the dominance rule to eliminate as many rows and columns as possible.
3. If the reduced game is 2 x n or m x 2, solve it graphically to reduce it to a 2 x 2 matrix.
4. If the reduced game is 2 x 2, solve for the optimal mixed strategy probabilities using the algebraic method.
If the game cannot be reduced to a 2 x 2 game, a linear programming model is used to solve
for the optimal mixed strategy probabilities, which is beyond the scope of this unit.
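Step 1 of this summary, the saddle point check, can be sketched in a few lines of Python. This is only an illustration; the payoff matrix is assumed to be given as a list of rows of A's payoffs, and the second matrix in the example is hypothetical data, not taken from the text.

```python
# Saddle point check: compare A's maximin with B's minimax on the payoff matrix.
def saddle_value(payoff):
    maximin = max(min(row) for row in payoff)           # best of A's row minima
    minimax = min(max(col) for col in zip(*payoff))     # best of B's column maxima
    return maximin if maximin == minimax else None      # game value, or None if mixed

print(saddle_value([[6, -7], [1, 3], [3, 1], [5, -1]]))  # None -> mixed strategies needed
print(saddle_value([[3, 5], [2, 1]]))                    # 3 -> pure strategy (hypothetical data)
```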
Check your Progress 5
1. In a 2 x n game, the graphical solution is obtained from which value of the lower boundary?
a) Highest value
b) Lowest value
c) Average value
d) None of the above
2. If, in an m x 2 game, player A has m strategies and player B has two strategies, the y-axis is used to represent
a) x values
b) y values
c) Expected payoff for B
d) Expected payoff for A
In this unit, we described how to solve two-person zero-sum games. In these games, the gain (loss) of one player and the loss (gain) of the other player always sum to zero. The steps used to determine whether a two-person zero-sum game results in an optimal pure strategy were discussed. If a pure strategy exists, a saddle point determines the value of the game. If an optimal pure strategy does not exist for a two-person zero-sum 2 x 2 game, the algebraic method is used to derive the probabilities of the mixed strategy. In a mixed strategy, each player uses probabilities to select a strategy for each play of the game. The dominance rule used to reduce the size of the matrix of a mixed strategy game was also discussed. If the elimination of dominated strategies can reduce a larger game to a 2 x 2 game, the algebraic solution procedure is used to find a solution. The solution of 2 x n and m x 2 games using the graphical method was also discussed.
Check your progress 3
Answers
1. (d)
2. (d)
3.9 Glossary
Game theory: The study of decision situations in which two or more players compete as
adversaries.
Two-person Zero-sum game: A game with two players in which the gain to one player is
equal to the loss to the other player.
Optimal Strategy: A strategy in which the player can achieve the maximum payoff is
called the optimal strategy.
Saddle point: A condition that exists when pure strategies are optimal for both players in
a two-person zero-sum game.
Payoff Matrix: The tabular display of the payoffs of the players under various alternatives
is called the payoff matrix.
Pure strategy: A game solution that provides a single best strategy for each player.
Mixed strategy: A game solution in which each player randomly selects the strategy to play from among several strategies according to specified probabilities.
Dominated strategy: A strategy is dominated if another strategy is at least as good for
every strategy that the opposing player may employ.
3.10 Assignment
1 What is game theory? What do you understand by 'zero-sum' in the context of game theory?
2 Explain the following: saddle point, pure strategy, mixed strategy.
3 Explain the concept of dominance with examples.
4 For the following two-person, zero-sum game, find the optimal strategies for the two players and the value of the game:
                    B's Strategy
                    B1    B2    B3
A's Strategy  A1     5     9     3
              A2     6   -12   -11
              A3     8    16    10
5 Solve the following game graphically:
                          Player B's Strategy
                          B1      B2
Player A's Strategy  A1    3       4
                     A2   -3      12
                     A3    6      -2
                     A4   -4      -9
                     A5    5      -3
3.11 Activities
Discuss applications of game theory with examples
                              Station B
                        Sitcom    News    Travel
Station A  Sitcom, a1      70      80        50
           News,   a2      90      60        95
           Travel, a3     105      90        65
Determine the optimal programming strategy for each station. What is the value of the
game?
Block Summary
In this block, we discussed some operations research techniques in detail. In the first unit, business situations pertaining to managing projects were discussed. The project management techniques involve constructing a network diagram using the rules of networking. The project networking technique with multiple estimates of activity time was also explained. In the second unit, the waiting line concept was introduced along with its applications, and some of the most commonly used queueing models were covered. In the last unit, game theory and its applications were discussed. The consequences of the interplay of combinations of strategies with a competitor were explained, and the various methods employed to derive the optimal strategy were covered.
Block Assignment
b. Calculate the total float for each of the activities.
5. For the following two-person, zero-sum game, find the optimal strategies for the two players and the value of the game:
                    B's Strategy
                    B1    B2    B3
A's Strategy  A1    30    40   -80
              A2     0    15   -20
              A3    90    20    50
6. Customers for a local bakery arrive randomly following a Poisson process. The single salesman can attend to customers at an average rate of 20 customers per hour, the service time being exponentially distributed. The mean arrival rate of customers is 12 per hour. Determine the following:
a. The mean number of customers in the bakery
b. The mean time spent by a customer in the bakery
c. The expected number of customers waiting in the queue
d. The mean waiting time of a typical customer in the queue