Business Statistics Chapter 5
BUSINESS
STATISTICS
BBA LLB
By
The_Lawgical_World
THE_LAWGICAL_WORLD 1
BUSINESS STATISTICS
SYLLABUS
Unit – V: Correlation Analysis: Scatter diagram, Positive and negative
correlation, limits for coefficient of correlation, Karl Pearson’s
coefficient of correlation, Spearman’s Rank correlation. Regression
Analysis: Concept, least squares fit of a linear regression, two lines of
regression, properties of regression coefficients (Simple problems
only). Time Series Analysis:
Components, Models of Time Series – Additive, Multiplicative and
Mixed models; Trend analysis – Free hand curve, Semi averages,
moving averages, Least Square methods (Simple problems only).
Correlation Analysis:
Introduction:
Statistical measures of central tendency, dispersion,
skewness and kurtosis are helpful for the purpose of comparison and
analysis of distributions involving only one variable, i.e., univariate
distributions. However, describing the relationship between two or more
variables is another important part of statistics.
The statistical methods of Correlation and Regression are helpful in
knowing the relationship between two or more variables which may be
related in some way, like interest rate of bonds and prime interest rate;
advertising expenditure and sales; income and consumption; crop-yield
and fertilizer used; heights and weights and so on.
Correlation
Correlation is a measure of association between two or more variables.
When two or more variables vary in sympathy so that movement in one
tends to be accompanied by corresponding movements in the other
variable(s), they are said to be correlated.
“The correlation between variables is a measure of the nature and degree
of association between the variables”.
As a measure of the degree of relatedness of two variables, correlation is
widely used in exploratory research when the objective is to locate
variables that might be related in some way to the variable of interest.
The degree of relationship between the variables under consideration is
measured through correlation analysis. The measure of correlation is
called the correlation coefficient. The degree of relationship is expressed
by this coefficient, which ranges from –1 to +1 (–1 ≤ r ≤ +1). The direction
of change is indicated by its sign. The correlation analysis enables us to
have an idea about the degree and direction of the relationship between
the two variables under study.
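The limits on r can be checked numerically. The following is a minimal sketch, assuming the standard product-moment form of Karl Pearson's coefficient (covariation divided by the square root of the product of the two sums of squares). The function name and the advertising and sales figures are hypothetical and used only for illustration.

```python
from math import sqrt

def pearson_r(x, y):
    """Karl Pearson's coefficient of correlation for paired observations."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must be paired and hold at least two values")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of products of deviations and the two sums of squared deviations.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when one variable is constant")
    return sxy / sqrt(sxx * syy)

# Hypothetical data: advertising expenditure and sales move together.
advertising = [1, 2, 3, 4, 5]
sales = [12, 15, 19, 22, 27]
r = pearson_r(advertising, sales)
print(round(r, 3))  # sign gives the direction, magnitude gives the degree
```

A positive r indicates that the two series move in the same direction and a negative r that they move in opposite directions; the value always lies between –1 and +1.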
Definitions:
“Correlation is an analysis of the covariation between two or more
variables.” —A.M. Tuttle
“Correlation analysis contributes to the understanding of economic
behaviour, aids in locating the critically important variables on which
others depend, may reveal to the economist the connections by which
disturbances spread and suggest to him the paths through which
stabilising forces may become effective.”—W.A. Neiswanger
Types of Correlation
The correlation is a statistical tool which studies the relationship
between two variables and correlation analysis involves various methods
and techniques used for studying and measuring the extent of the
relationship between the two variables.
(a) POSITIVE AND NEGATIVE CORRELATION: If the values of
the two variables deviate in the same direction i.e., if the increase in the
values of one variable results, on an average, in a corresponding increase
in the values of the other variable or if a decrease in the values of one
variable results, on an average, in a corresponding decrease in the values
of the other variable, correlation is said to be positive or direct.
Some examples of series of positive correlation are : (i) Heights and
weights. (ii) The family income and expenditure on luxury items. (iii)
Amount of rainfall and yield of crop (up to a point). (iv) Price and
supply of a commodity and so on.
On the other hand, correlation is said to be negative or inverse if the
variables deviate in the opposite direction i.e., if the increase (decrease)
in the values of one variable results, on the average, in a corresponding
decrease (increase) in the values of the other variable.
(i) If the points are very dense i.e., very close to each other, a fairly good
amount of correlation may be expected between the two variables. On
the other hand, if the points are widely scattered, a poor correlation may
be expected between them.
(ii) If the points on the scatter diagram reveal any trend (either upward
or downward), the variables are said to be correlated and if no trend is
revealed, the variables are uncorrelated.
(iii) If there is an upward trend rising from lower left hand corner and
going upward to the upper right hand corner, the correlation is positive
since this reveals that the values of the two variables move in the same
direction. If, on the other hand, the points depict a downward trend from
the upper left hand corner to the lower right hand corner, the correlation
is negative since in this case the values of the two variables move in the
opposite directions.
(iv) In particular, if all the points lie on a straight line starting from the
left bottom and going up towards the right top, the correlation is perfect
and positive, and if all the points lie on a straight line starting from left
top and coming down to right bottom, the correlation is perfect and
negative.
The following diagrams of the scattered data depict different forms of
correlation.
For example, if there are m classes for the X-variable series and n
classes for the Y-variable series then there will be m × n cells in the two-
way table. By going through the different pairs of the values (x, y) and
using tally marks we can find the frequency for each cell and thus obtain
the so-called bivariate frequency table as shown below.
where the frequency f used for the product xy is nothing but f (x, y) and
the frequency f used in the sums ∑fx and ∑fy are respectively the
frequencies of x and y, viz., fx & fy as explained in the above table. If we
change the origin and scale in X and Y by transforming them to the new
variables U and V by
u = (x – A)/h and v = (y – B)/k; h > 0, k > 0
where h and k are the widths of the x-classes and y-classes respectively
and A and B are constants, then by Property II of r, we have:
the rank of a member of one group from the rank of its associated
member of the other group. The differences (d) are then squared and
summed. The number of pairs in the groups is represented by n.
Computation of Rank Correlation Coefficient.
The method of computing Spearman’s rank correlation
coefficient ρ is discussed under the following two situations:
(i) When actual ranks are given.
(ii) When ranks are not given
CASE (I) — WHEN ACTUAL RANKS ARE GIVEN
In this situation the following steps are involved:
(i) Compute d, the difference of ranks.
(ii) Compute d².
(iii) Obtain the sum ∑d².
(iv) Use the formula ρ = 1 – 6∑d² / [n(n² – 1)] to get the value of ρ.
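The four steps above can be sketched as follows. This is only an illustration: the function name and the two judges' rankings are hypothetical, and the ranks are assumed to contain no ties.

```python
def spearman_rho(ranks_x, ranks_y):
    """Spearman's rho from two lists of ranks (no tied ranks assumed)."""
    n = len(ranks_x)
    if n != len(ranks_y) or n < 2:
        raise ValueError("rank lists must be paired and hold at least two items")
    # Steps (i) to (iii): differences of ranks, squared and summed.
    sum_d2 = sum((rx - ry) ** 2 for rx, ry in zip(ranks_x, ranks_y))
    # Step (iv): rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1)).
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

# Hypothetical ranks awarded by two judges to five candidates.
judge_a = [1, 2, 3, 4, 5]
judge_b = [2, 1, 4, 3, 5]
rho = spearman_rho(judge_a, judge_b)
print(round(rho, 2))
```

Here the differences are –1, 1, –1, 1, 0, so ∑d² = 4 and ρ = 1 – 24/120, which is about 0.8.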
CASE (II)—WHEN RANKS ARE NOT GIVEN:
Spearman’s rank correlation formula can also be used when we are
dealing with variables which are measured quantitatively, i.e., when the
actual data but not the ranks relating to two variables are given. In such a
case we shall have to convert the data into ranks. The highest (or lowest)
observation is given the rank 1. The next highest (or next lowest)
observation is given rank 2 and so on. It is immaterial in which way
(descending or ascending) the ranks are assigned. However, the same
approach should be followed for all the variables under consideration.
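A short sketch of this conversion, assuming that no two observations in a series are equal. The helper names and the marks are hypothetical. It also checks that ranking both series in descending order and ranking both in ascending order lead to the same ρ.

```python
def to_ranks(values, descending=True):
    """Rank distinct values: rank 1 goes to the highest (or lowest) value."""
    ordered = sorted(values, reverse=descending)
    position = {value: index + 1 for index, value in enumerate(ordered)}
    return [position[value] for value in values]

def spearman_rho(ranks_x, ranks_y):
    """rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1)), no tied ranks assumed."""
    n = len(ranks_x)
    sum_d2 = sum((rx - ry) ** 2 for rx, ry in zip(ranks_x, ranks_y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

# Hypothetical marks of five students in two subjects.
maths = [78, 65, 90, 54, 82]
stats = [80, 60, 85, 58, 88]

rho_desc = spearman_rho(to_ranks(maths, True), to_ranks(stats, True))
rho_asc = spearman_rho(to_ranks(maths, False), to_ranks(stats, False))
print(round(rho_desc, 2), round(rho_asc, 2))  # both directions agree
```

Mixing the two directions (one series descending, the other ascending) would reverse the sign of ρ, which is why the same approach must be used for every variable.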
REPEATED RANKS
In case of attributes if there is a tie i.e., if any two or more individuals
are placed together in any classification w.r.t. an attribute or if in case of
variable data there is more than one item with the same value in either or
both the series, then Spearman’s formula for calculating the rank
correlation coefficient breaks down, since in this case the variables X
[the ranks of individuals in characteristic A (1st series)] and Y [the ranks
of individuals in characteristic B (2nd series)] do not take the values
from 1 to n and consequently x̄ ≠ ȳ, while in proving the formula we had
assumed that x̄ = ȳ.
In this case, common ranks are assigned to the repeated items. These
common ranks are the arithmetic mean of the ranks which these items
would have got if they were different from each other and the next item
will get the rank next to the rank used in computing the common rank.
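The assignment of common ranks can be sketched as below. The function name and the scores are hypothetical, and rank 1 is assumed to go to the highest value.

```python
def average_ranks(values):
    """Rank 1 = highest value; tied values share the mean of their ranks."""
    ordered = sorted(values, reverse=True)
    rank_of = {}
    i = 0
    while i < len(ordered):
        j = i
        # Advance j past every item tied with ordered[i].
        while j < len(ordered) and ordered[j] == ordered[i]:
            j += 1
        # The tied group occupies ranks i+1, ..., j; use their arithmetic mean.
        rank_of[ordered[i]] = (i + 1 + j) / 2
        i = j
    return [rank_of[value] for value in values]

# Hypothetical scores with a two-way tie and a three-way tie.
scores = [50, 40, 40, 30, 20, 20, 20]
ranks = average_ranks(scores)
print(ranks)
```

The two items scoring 40 would have taken ranks 2 and 3, so each gets 2.5, and the next item (30) gets rank 4. The three items scoring 20 would have taken ranks 5, 6 and 7, so each gets 6.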
V. METHOD OF CONCURRENT DEVIATIONS
This is a very casual method of determining the correlation between two
series when we are not very serious about its precision. This is based on
the signs of the deviations (i.e., direction of the change) of the values of
the variable from its preceding value and does not take into account the
exact magnitude of the values of the variables. Thus, we put a plus (+)
sign, minus (–) sign or equality (=) sign for the deviation if the value of
the variable is greater than, less than or equal to the preceding value
respectively. The deviations in the values of two variables are said to be
concurrent if they have the same sign, i.e., either both deviations are
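positive or both are negative or both are equal. The formula used for
computing correlation coefficient r by this method is given by
r = ± √[± (2c – n) / n]
where c is the number of concurrent deviations and n is the number of
pairs of deviations compared. The same sign, that of (2c – n), is taken
both inside and outside the square root.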
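A minimal sketch of the procedure, assuming the usual textbook form of the coefficient, r = ±√(±(2c – n)/n), with c concurrent deviations out of n compared pairs of deviations. The function names and the two series are hypothetical.

```python
from math import sqrt

def direction(previous, current):
    """+1, -1 or 0 for a rise, a fall or no change from the preceding value."""
    return (current > previous) - (current < previous)

def concurrent_deviation_r(x, y):
    """Coefficient of correlation by the method of concurrent deviations."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("series must be paired and hold at least two values")
    dx = [direction(a, b) for a, b in zip(x, x[1:])]
    dy = [direction(a, b) for a, b in zip(y, y[1:])]
    n = len(dx)                                   # pairs of deviations
    c = sum(1 for p, q in zip(dx, dy) if p == q)  # concurrent deviations
    ratio = (2 * c - n) / n
    # The sign of (2c - n) is used both inside and outside the square root.
    return sqrt(ratio) if ratio >= 0 else -sqrt(-ratio)

# Hypothetical series of seven observations each.
price = [10, 12, 11, 14, 15, 13, 16]
demand = [50, 48, 49, 45, 47, 46, 44]
r = concurrent_deviation_r(price, demand)
print(round(r, 3))
```

Only the signs of the changes enter the calculation. In this data 2 of the 6 pairs of deviations are concurrent, so (2c – n)/n is negative and r comes out negative.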
This is the error (parallel to the y-axis) for the ith point. We will have
such errors for all the points on scatter diagram. For the points which lie
above the line, the error would be positive and for the points which lie
below the line, the error would be negative.
for the analysis of time series data have been developed by economists.
However, these techniques can also be applied for the study of
behaviour of any phenomenon collected chronologically over a period of
time in any discipline relating to natural and social sciences, though not
directly related to economics or business.
Components of Time Series:
If the values of a phenomenon are observed at different periods of time,
the values so obtained will show appreciable variations or changes.
These fluctuations arise because the value of the phenomenon
is affected not by a single factor but by the cumulative effect of a
multiplicity of factors pulling it up and down. If the various
forces were in a state of equilibrium, the time series would remain
constant.
The various forces affecting the values of a phenomenon in a time series
may be broadly classified into the following four categories, commonly
known as the components of a time series, some or all of which are
present (in a given time series) in varying degrees.
(a) Secular Trend or Long-term Movement (T).
(b) Periodic Movements or Short-term Fluctuations:
(i) Seasonal Variations (S),
(ii) Cyclical Variations (C).
(c) Random or Irregular Variations (R or I).
The value (y) of a phenomenon observed at any time (t) is the net effect
of the interaction of above components.
(a) Secular Trend: The general tendency of the time series data to
increase or decrease or stagnate during a long period of time is called the
secular trend or simple trend. This phenomenon is usually observed in
most of the series relating to Economics and Business, e.g., an upward
Thus, seasonal variations in a time series will be there, if the data are
recorded quarterly (every three months), monthly, weekly, daily, hourly,
and so on. Although in each of the above cases, the amplitudes of the
seasonal variations are different, all of them have the same period, viz.,
1 year. Thus, in a time series data where only annual figures are given,
there are no seasonal variations. Most economic time series are
influenced by seasonal swings, e.g., prices, production and consumption
of commodities; sales and profits in a departmental store; bank clearings
and bank deposits, etc.
The seasonal variations may be attributed to the following two causes:
(i) Those resulting from natural forces and (ii) Those resulting from
man-made conventions.
(ii) Cyclical Variations (C). The oscillatory movements in a time series
with period of oscillation greater than one year are termed as cyclical
variations. These variations in a time series are due to ups and downs
recurring after a period greater than one year. The cyclical fluctuations,
though more or less regular, are not necessarily uniformly periodic, i.e.,
they may or may not follow exactly similar patterns after equal intervals
of time. One complete period which normally lasts from 7 to 9 years is
termed as a ‘cycle’. These oscillatory movements in any business
activity are the outcome of the so-called ‘Business Cycles’ which are the
four-phased cycles comprising prosperity (boom), recession, depression
and recovery from time to time. These booms and depressions in any
business activity follow each other with steady regularity and the
complete cycle from the peak of one boom to the peak of next boom
usually lasts from 7 to 9 years. Most of the economic and business
series, e.g., series relating to production, prices, wages, investments, etc.,
are affected by cyclical upswings and downswings.
The study of cyclical variations is of great importance to business
executives in the formulation of policies aimed at stabilising the level of
business activity. A knowledge of the cyclic component enables a
The additive model assumes that all the four components of the time
series operate independently of each other so that none of these
components has any effect on the remaining three.
This implies that the trend, however fast or slow it may be, has no
effect on the seasonal and cyclical components; nor do seasonal swings
have any impact on cyclical variations and conversely. However, this
assumption is not true in most of the economic and business time series
where the four components of the time series are not independent of
each other.
(ii) Multiplicative Model or Decomposition by Multiplicative
Hypothesis: Keeping the above points in view, most of the economic
and business time series are characterised by the following classical
multiplicative model:
Y = T × S × C × I …(i)
or more precisely, Yt = Tt × St × Ct × It
This model assumes that the four components of the time series are due
to different causes but they are not necessarily independent and they can
affect each other.
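A small numerical sketch of the multiplicative model. All figures are hypothetical: the trend is an absolute amount, while the seasonal, cyclical and irregular components are taken as indices around 1.

```python
def multiplicative_value(trend, seasonal, cyclical, irregular):
    """Y = T x S x C x I for a single time period."""
    return trend * seasonal * cyclical * irregular

# Hypothetical components for one quarter.
trend = 200.0        # absolute level, e.g. units sold
seasonal = 1.10      # 10% above the trend level because of the season
cyclical = 0.95      # 5% below because of the phase of the business cycle
irregular = 1.02     # 2% above because of random disturbances

y = multiplicative_value(trend, seasonal, cyclical, irregular)
print(round(y, 2))

# Dividing the observed value by the indices recovers the trend level.
recovered_trend = y / (seasonal * cyclical * irregular)
print(round(recovered_trend, 2))
```

Because the components multiply, a seasonal index of 1.10 adds more units when the trend level is high than when it is low, which is one way in which the components affect each other.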
In this model S, C and I are not viewed as absolute amounts but rather as
relative variations. Except for the trend component T, the other
components S, C and I are expressed as rates or indices fluctuating
above or below 1 such that the geometric means of all the S = St values
This is the simplest and the most flexible method of estimating the
secular trend and consists in first obtaining a historigram by plotting the
time series values on graph paper and then drawing a free-hand smooth
curve through these points so that it accurately reflects the long-term
tendency of the data. The smoothing of the curve eliminates the other
components, viz., seasonal, cyclical and random variations.
In order to obtain proper trend line or curve, the following points may be
borne in mind:
(i) It should be smooth.
(ii) The number of points above the trend curve/line should be more or
less equal to the number of points below it.
(iii) The sum of the vertical deviations of the given points above the
trend line should be approximately equal to the sum of vertical
deviations of the points below the trend line so that the total positive
deviations are more or less balanced against total negative deviations.
(iv) The sum of the squares of the vertical deviations of the given points
from the trend line/curve is minimum possible.
(v) If the cycles are present in the data then the trend line should be so
drawn that:
(a) It has equal number of cycles above and below it.
(b) It bisects the cycles so that the areas of the cycles above and
below the trend line are approximately same.
(vi) The minor short-term fluctuations or abrupt and sudden variations
may be ignored.
Merits:
(i) It is a very simple and time-saving method and does not require any
mathematical calculations.
each part. The line joining these points gives the straight-line trend
fitting the given data.
Merits:
(i) An obvious advantage of this method is its objectivity in the sense
that it does not depend on personal judgement and everyone who uses
this method gets the same trend line and hence the same trend values.
(ii) It is easy to understand and apply as compared with the moving
average or the least square methods of measuring trend.
(iii) The line can be extended both ways to obtain future or past
estimates.
Limitations:
(i) This method assumes the presence of linear trend (in the time series
values) which may not exist.
(ii) The use of arithmetic mean (for obtaining semi-averages) may also
be questioned because of its limitations. Accordingly, the trend values
obtained by this method and the predicted values for future are not
precise and reliable.
Example: Apply the method of semi-averages for determining trend of
the following data and estimate the value for 2000:
Years: 1993 1994 1995 1996 1997 1998
Sales: 20 24 22 30 28 32
If the actual figure of sales for 2000 is 35,000 units, how do you account
for the difference between the figures you obtain and the actual figures
given to you?
Solution: Here n = 6 (even), and hence the two parts will be 1993 to
1995 and 1996 to 1998.
CALCULATIONS FOR TREND BY SEMI-AVERAGES
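The calculation for this example can be sketched as follows. The sales figures are assumed to be in ’000 units (the actual figure for 2000 is quoted as 35,000 units), and the variable and function names are hypothetical.

```python
years = [1993, 1994, 1995, 1996, 1997, 1998]
sales = [20, 24, 22, 30, 28, 32]   # assumed to be in '000 units

half = len(years) // 2             # n = 6, so two parts of three years each

# Semi-averages, each plotted against the middle year of its part.
first_avg = sum(sales[:half]) / half      # average of 1993 to 1995
second_avg = sum(sales[half:]) / half     # average of 1996 to 1998
first_mid = sum(years[:half]) / half      # 1994
second_mid = sum(years[half:]) / half     # 1997

# The trend line passes through the two semi-average points.
slope = (second_avg - first_avg) / (second_mid - first_mid)

def trend_value(year):
    """Trend value read off the line joining the two semi-averages."""
    return first_avg + slope * (year - first_mid)

estimate_2000 = trend_value(2000)
print(first_avg, second_avg, round(slope, 3), round(estimate_2000, 1))
```

The semi-averages are 22 (against 1994) and 30 (against 1997), giving a rise of 8/3 per year and a trend estimate of about 38 thousand units for 2000. The gap from the actual figure of 35 thousand is consistent with the limitations listed above: the method assumes a linear trend and the estimate reflects the trend alone.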
y = a + bt.
Then for any given time ‘t’, the estimated value ye of y as given by this
equation is:
ye = a + bt
The principle of least squares consists in estimating the values of a and b
in the above equation so that the sum of the squares of the errors of
estimate, ∑(y – ye)², is minimum.
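Written out in the notation above, the quantity being minimised and the two conditions it leads to (usually called the normal equations) are sketched below, with n denoting the number of observations. This is the standard derivation, given here as a reminder rather than as the source's own working.

```latex
E(a, b) = \sum (y - y_e)^2 = \sum (y - a - bt)^2

\frac{\partial E}{\partial a} = 0 \;\Rightarrow\; \sum y = na + b \sum t

\frac{\partial E}{\partial b} = 0 \;\Rightarrow\; \sum ty = a \sum t + b \sum t^2
```

When the origin of t is chosen so that ∑t = 0, the two equations separate and give a = ∑y / n and b = ∑ty / ∑t².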
(iii) The trend equation can be used to estimate or predict the values of
the variable for any period t in future or even in the intermediate periods
of the given series and the forecasted values are also quite reliable.
(iv) The curve fitting by the principle of least squares is the only
technique which enables us to obtain the rate of growth per annum, for
yearly data, if linear trend is fitted.
Demerits:
(i) The most serious limitation of the method is the determination of the
type of the trend curve to be fitted, viz., whether we should fit a linear or
a parabolic trend or some other more complicated trend curve.
Assumptions about the type of trend to be fitted might introduce some
bias.
(ii) The addition of even a single new observation necessitates all the
calculations to be done afresh which is not so in the case of moving
average method.
(iii) This method requires more calculations and is quite tedious and
time consuming as compared with other methods. It is rather difficult for
a non-mathematical person (layman) to understand and use.
(iv) Future predictions or forecasts based on this method are based only
on the long-term variations, i.e., trend and completely ignore the
cyclical, seasonal and irregular fluctuations.
(v) It cannot be used to fit growth curves (Modified exponential curve,
Gompertz curve and Logistic curve) to which most of the economic and
business time series conform. The discussion, however, is beyond the
scope of the book.
Example: Fit a linear trend to the following data by the least squares
method. Verify that ∑ (y – ye) = 0, where ye is the corresponding trend
value of y.
Year: 1990 1992 1994 1996 1998
Production: 18 21 23 27 16
(in ’000 units)
Also estimate the production for the year 1999.
Solution:
Here n = 5 i.e., odd. Hence, we shift the origin to the middle of the time
period viz., the year 1994.
Let x = t – 1994 …(i)
Let the trend line of y (production) on x be:
y = a + bx (Origin 1994) …(ii)
COMPUTATION OF STRAIGHT-LINE TREND
Putting x = – 4, –2, 0, 2 and 4 in (iii), we obtain the trend values (ye) for
the years 1990, 1992, ...., 1998 respectively.
The difference (y – ye) is calculated in the last column of the table. We
have:
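The working for this example can be sketched as follows, using the data given and the origin 1994. Because ∑x = 0, the least squares estimates reduce to a = ∑y / n and b = ∑xy / ∑x². The variable names are hypothetical.

```python
years = [1990, 1992, 1994, 1996, 1998]
production = [18, 21, 23, 27, 16]     # in '000 units

origin = 1994
x = [t - origin for t in years]       # -4, -2, 0, 2, 4, so sum(x) == 0

n = len(years)
a = sum(production) / n                                   # sum(y) / n
b = (sum(xi * yi for xi, yi in zip(x, production))
     / sum(xi ** 2 for xi in x))                          # sum(xy) / sum(x^2)

# Trend values ye and the differences (y - ye) for each year.
trend = [a + b * xi for xi in x]
residuals = [yi - ye for yi, ye in zip(production, trend)]

estimate_1999 = a + b * (1999 - origin)

print(a, b)
print([round(ye, 1) for ye in trend])
print(round(sum(residuals), 10))
print(round(estimate_1999, 1))
```

With ∑y = 105, ∑xy = 4 and ∑x² = 40, the fitted line is y = 21 + 0.1x. The trend values are 20.6, 20.8, 21.0, 21.2 and 21.4, the differences (y – ye) are –2.6, 0.2, 2.0, 5.8 and –5.4, which sum to zero, and the estimate for 1999 (x = 5) is 21.5 thousand units.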
If the period ‘m’ of the moving average is odd, then the successive
values of the moving averages are placed against the middle values of
the corresponding time intervals.
For example, if m = 5, the first moving average value is placed against
the middle period, i.e., the 3rd, the second M.A. value is placed against
time period 4 and so on.
Case (ii). When Period is Even.
If the period ‘m’ of the M.A. is even, then there are two middle periods
and the M.A. values are placed in between the two middle periods of the
time intervals it covers.
Obviously, in this case, the M.A. values will not coincide with a period
of the given time series and an attempt is made to synchronise them with
the original data by taking a two-period average of the moving averages
and placing them in between the corresponding time periods.
This technique is called centering and the corresponding moving average
values are called centred moving averages. In particular, if the period m
= 4, the first moving average value is placed against the middle of 2nd
and 3rd time intervals; the second moving average value is placed in
between 3rd and 4th time periods and so on.
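The two cases can be sketched as below. The function names and the seven-period series are hypothetical. The second function applies the centering step, i.e., a two-period average of successive moving averages.

```python
def moving_average(values, m):
    """Successive m-period moving averages of a series."""
    return [sum(values[i:i + m]) / m for i in range(len(values) - m + 1)]

def centred_moving_average(values, m):
    """For even m: two-period averages of successive m-period averages."""
    ma = moving_average(values, m)
    return [(p + q) / 2 for p, q in zip(ma, ma[1:])]

# Hypothetical series for time periods 1 to 7.
series = [10, 12, 14, 16, 18, 20, 22]

# Odd period (m = 3): values fall against periods 2, 3, 4, 5 and 6.
ma3 = moving_average(series, 3)

# Even period (m = 4): the raw averages fall between periods 2-3, 3-4, ...
ma4 = moving_average(series, 4)
# After centering they fall against periods 3, 4 and 5.
cma4 = centred_moving_average(series, 4)

print(ma3)
print(ma4)
print(cma4)
```

For an odd period the first value sits against period (m + 1)/2. For an even period the first centred value sits against period m/2 + 1, so with m = 4 the centred averages line up with periods 3, 4 and 5 of the original data.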
and consequently the moving average values will not represent a true
picture of the general trend.
4. In case of non-linear trend, which is generally the case in most of
economic and business time series, the trend values given by the moving
average method are biased and they lie either above or below the true
sweep of the data.