2nd Unit
One of the most important objectives of statistical analysis is to get a single value
that describes the characteristics of the entire mass of unwieldy data. Such a value is
called the central value, an average, or the expected value of the variable.
Definition:
According to Clark,
“Average is an attempt to find one single figure to describe whole of figures”
It is clear from the above definition that an average is a single value that
represents a group of values. Such a value is of great significance because it depicts the
characteristics of the whole group. Since an average represents the entire data, its value
lies somewhere between the two extremes, i.e., the largest and smallest items. For this
reason an average is frequently referred to as a measure of central tendency.
Objectives of Averaging:
1. To get one single value that describes the characteristics of the entire group.
2. To facilitate comparisons.
Requisites of a Good Average:
1
22-01-2025
Types of Averages:
1. Arithmetic Mean
2. Median
3. Mode
4. Geometric Mean
5. Harmonic Mean
Arithmetic Mean:
Its value is obtained by adding together all the items and dividing this total
by the number of items.
Symbolically,
X̄ = (X1 + X2 + X3 + … + Xn) / N
Or,
X̄ = ΣX / N
Here,
X̄ = Arithmetic mean
ΣX = Sum of all the values of the variable X, i.e., X1 + X2 + X3 + … + Xn
N = Number of observations
Steps:
1. Add together all the values of the variables X and obtain the total, i.e., ΣX
2. Divide the total by the number of observations i.e., N
X̄ = 8650 / 10 = 865.0
X̄ = ΣfX / N
Where,
f = Frequency
Steps:
1. Multiply the frequency of each row by the value of the variable and obtain the total ΣfX.
2. Divide the total obtained in step 1 by the number of observations, i.e., the total frequency.
Ex: From the following data of the marks obtained by 60 students of a class, calculate the A.M.
Marks (X)    No. of students (f)    fX
20           8                      160
30           12                     360
40           20                     800
50           10                     500
60           6                      360
70           4                      280
N = 60, ΣfX = 2460
X̄ = ΣfX / N
= 2460 / 60
X̄ = 41
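The discrete-series calculation above can be checked with a few lines of Python. This is only a sketch: the function name `frequency_mean` is an illustrative choice, not part of the notes.

```python
def frequency_mean(values, freqs):
    """Arithmetic mean of a discrete series: sum(f * X) / sum(f)."""
    total_fx = sum(f * x for x, f in zip(values, freqs))
    n = sum(freqs)  # total frequency N
    return total_fx / n

marks = [20, 30, 40, 50, 60, 70]
students = [8, 12, 20, 10, 6, 4]
print(frequency_mean(marks, students))  # 41.0
```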
Direct Method
X̄ = Σfm / N
Where,
m = mid-point of each class, f = frequency of each class, N = total frequency
Steps:
• Multiply these mid-points by the respective frequency of each class and obtain the total Σfm.
Short-cut Method:
X̄ = A + (Σfd / N) × i
Where,
A = assumed mean
d = deviation of the mid-points from the assumed mean divided by the class interval, i.e., d = (m − A) / i
N = total number of observations
i = class interval
Steps:
1. Take any assumed mean A.
2. From the mid-point of each class deduct the assumed mean, divide the deviation by i, and
denote the result by d.
3. Multiply the respective frequency of each class by these d values and obtain the total Σfd.
4. Apply the formula:
X̄ = A + (Σfd / N) × i
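As a rough check that the short-cut method agrees with the direct method, the sketch below computes the mean both ways. The class data are borrowed from the 0-10 to 70-80 frequency table that appears later in these notes under mean deviation; the function names and the choice A = 35 are assumptions made for illustration.

```python
def mean_direct(midpoints, freqs):
    """Direct method: sum(f * m) / N."""
    n = sum(freqs)
    return sum(f * m for m, f in zip(midpoints, freqs)) / n

def mean_shortcut(midpoints, freqs, assumed_mean, width):
    """Short-cut method: A + (sum(f * d) / N) * i, with d = (m - A) / i."""
    n = sum(freqs)
    sum_fd = sum(f * (m - assumed_mean) / width for m, f in zip(midpoints, freqs))
    return assumed_mean + sum_fd / n * width

midpoints = [5, 15, 25, 35, 45, 55, 65, 75]   # mid-points of classes 0-10 ... 70-80
freqs = [5, 8, 12, 15, 20, 14, 12, 6]

direct = mean_direct(midpoints, freqs)
shortcut = mean_shortcut(midpoints, freqs, assumed_mean=35, width=10)
print(round(direct, 3), round(shortcut, 3))  # both methods give the same mean
```

Any other assumed mean gives the same result; a value near the middle of the distribution simply keeps the d values small.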
Open-end classes are those in which the lower limit of the first class and the upper limit of
the last class are not known. In such a case we cannot find the arithmetic mean unless we make
an assumption about the unknown limits. The assumption would naturally depend upon the
class intervals following the first class and preceding the last class.
Marks (X)    No. of students (f)
Below 10     4
10 - 20      6
20 - 30      10
In the above case, since the class interval is uniform, the appropriate assumption would be that
the lower limit of the first class is zero and upper
1. The sum of the deviations of the items from the arithmetic mean, taking signs into account, is
always zero, i.e., Σ(X − X̄) = 0. This would be clear from the following data.
X     (X − X̄)
10    −20
20    −10
30    0
40    10
50    20
ΣX = 150, Σ(X − X̄) = 0
X̄ = ΣX / N = 150 / 5 = 30
It is because of this property that the mean is characterized as a point of balance: the
sum of the positive deviations from it is equal to the sum of the negative deviations from it.
2. The sum of the squared deviations of the items from the arithmetic mean is a minimum, i.e., it is less than
the sum of the squared deviations of the items from any other value. The following example
would clarify the point.
X    (X − 4)    (X − 4)²    (X − 3)    (X − 3)²
2    −2         4           −1         1
3    −1         1           0          0
4    0          0           1          1
5    1          1           2          4
6    2          4           3          9
Σ(X − 4)² = 10, Σ(X − 3)² = 15
The sum of the squared deviations is equal to 10 in the above case, where the mean is 4. If the
deviations are taken from any other value, the sum of the squared deviations would be greater than 10.
For example, let us calculate the squares of the deviations of the items from a value less than the
arithmetic mean, say 3. It is clear that Σ(X − 3)² is greater. This property, that the sum of the
squares of the deviations of the items is least when taken from the mean, is of immense use in
regression analysis.
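Both properties can be verified numerically on the series 2, 3, 4, 5, 6 used above. The helper name `sum_sq_dev` is illustrative only.

```python
items = [2, 3, 4, 5, 6]
mean = sum(items) / len(items)  # 4.0

# Property 1: deviations from the mean sum to zero.
sum_dev = sum(x - mean for x in items)

def sum_sq_dev(values, origin):
    """Sum of squared deviations of the values from a chosen origin."""
    return sum((x - origin) ** 2 for x in values)

# Property 2: the sum of squared deviations is smallest about the mean.
about_mean = sum_sq_dev(items, mean)   # 10.0
about_three = sum_sq_dev(items, 3)     # 15
print(sum_dev, about_mean, about_three)
```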
3. Since X̄ = ΣX / N, it follows that N X̄ = ΣX.
In other words, if we replace each item in the series by the mean, then the sum of these
substitutions will be equal to the sum of the individual items.
4. If we have the arithmetic mean and the number of items of two or more related groups,
we can compute the combined average of these groups by applying the following formula:
X̄12 = (N1 X̄1 + N2 X̄2) / (N1 + N2)
Ex: The mean height of 25 male workers in a factory is 61 cm and the mean height of 35 female
workers in the same factory is 58 cm. Find the combined mean height of the 60 workers in the factory.
Solution:
X̄12 = (N1 X̄1 + N2 X̄2) / (N1 + N2)
N1 = 25, X̄1 = 61, N2 = 35, X̄2 = 58
X̄12 = ((25 × 61) + (35 × 58)) / (25 + 35)
= (1525 + 2030) / 60
= 3555 / 60
X̄12 = 59.25
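The combined-mean formula generalizes to any number of groups. A minimal sketch, with `combined_mean` as a hypothetical helper name:

```python
def combined_mean(groups):
    """Combined mean of several groups given as (size, mean) pairs."""
    total_items = sum(n for n, _ in groups)
    weighted_sum = sum(n * m for n, m in groups)
    return weighted_sum / total_items

# 25 male workers with mean 61 and 35 female workers with mean 58
print(combined_mean([(25, 61), (35, 58)]))  # 59.25
```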
Merits:
5. It is relatively reliable.
Demerit:
computed.
purpose.
Median:
case of a median one-half of the items in the distributions have a value the size of the
median value or smaller and one half has a value the size of the median value or larger.
As distinct from the arithmetic mean which is calculated from the value of
every item in the series, the median is what is called a positional average. The term
position refers to the place of a value in a series. The place of the median in a series is
such that an equal number of items lie on either side of it. Thus when N is odd, the
median is an actual value, with the remainder of the series in two equal parts on either
side of it. If N is even, the median is a derived figure i.e., half the sum of the two
middle values.
Steps:
II. In a group composed of an odd number of values, add 1 to the total number of
values and divide by 2. The median value for a group composed of an even number
of items is estimated by finding the arithmetic mean, that is, adding the two
Steps:
IV. Now look at the cumulative frequency column and find the total which is either
equal to (N+1)/2 or next higher than that, and determine the value of the variable
Steps:
I. Determine the particular class in which the value of the median lies. Use N/2 as the rank of the
median and not (N+1)/2. It is N/2 which divides the area of the curve into two equal parts,
and as such we should use N/2 instead of (N+1)/2 in a continuous series. After ascertaining the
class in which the median lies, the following formula is used for determining the exact value
of the median.
Median = L + ((N/2 − c.f.) / f) × i
L = lower limit of the median class, i.e., the class in which the middle item of the distribution lies
c.f= Cumulative frequency of the class preceding the median class or sum of the
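The continuous-series median formula can be sketched as follows. The frequency table used here is borrowed from the 0-10 to 70-80 table that appears later in these notes under mean deviation; the function name `grouped_median` and the assumption of equal class widths are illustrative.

```python
def grouped_median(lower_limits, freqs, width):
    """Median = L + ((N/2 - c.f.) / f) * i for a continuous series."""
    n = sum(freqs)
    half = n / 2          # rank of the median in a continuous series
    cum = 0               # cumulative frequency of the classes before the current one
    for lower, f in zip(lower_limits, freqs):
        if cum + f >= half:
            return lower + (half - cum) / f * width
        cum += f
    raise ValueError("empty distribution")

lower_limits = [0, 10, 20, 30, 40, 50, 60, 70]
freqs = [5, 8, 12, 15, 20, 14, 12, 6]
median = grouped_median(lower_limits, freqs, width=10)
print(round(median, 2))  # median falls in the 40-50 class
```

Here N = 92, so N/2 = 46; the cumulative frequency first reaches 46 in the 40-50 class, which has c.f. = 40 before it and f = 20.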
•The sum of the deviations of the items from the median, ignoring signs, is the least.
Merits:
Demerits:
every observation
Mode:
The mode or the mode value is that value in a series of observation having
The mode is often said to be the value which occurs most often, that is, with the
highest frequency.
Mode
For determining the mode, count the number of times the various values repeat themselves;
the value occurring the maximum number of times is the modal value.
In a discrete series quite often mode can be determined just by inspection i.e., by looking
to that value of the variable around which the items are most heavily concentrated.
Size of Garments 28 29 30 31 32 33
No of persons wearing 10 20 40 65 50 15
For the above data we can clearly say that the mode size is 31 because the value 31 has
Where,
While applying the above formula for calculating the mode, it is necessary to see that
the class intervals are uniform throughout. If they are unequal they should first be made
equal on the assumption that the frequencies are equally distributed throughout the
There may be two values which occur with equal frequency. The distribution is
formula based upon the relationship between mean, median, and mode.
Usefulness:
Highly skewed or non-normal distribution.
Merits:
3. Its value can be determined in open-end distributions without ascertaining the class limits.
Demerits:
1. The value of mode cannot always be determined. In some cases we may have a bimodal
series.
3. The value of the mode is not based on each and every item of the series.
The distribution in which the values of mean, median and mode coincide (i.e., mean =
Mode = 3 Median − 2 Mean.
Geometric Mean:
Geometric Mean is defined as the Nth root of the product of N items or values.
Symbolically:
G.M. = (X1 × X2 × X3 × … × XN)^(1/N)
Where X1, X2, X3, … etc. refer to the various items of the series.
When the number of items is three or more the task of multiplying numbers and of the
Steps:
1. Take the logarithms of the variable X and obtain the total Σ log X.
2. Divide Σ log X by N and take the antilog of the values so obtained. This gives the
value of geometric mean.
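The two logarithm steps above translate directly into code. This sketch uses base-10 logarithms to mirror the log and antilog tables; the function name is an illustrative choice, and the values 2, 4, 8 are the same three numbers the notes use when discussing the properties of the geometric mean.

```python
import math

def geometric_mean(values):
    """G.M. = antilog(sum(log X) / N); requires all values to be positive."""
    if any(x <= 0 for x in values):
        raise ValueError("geometric mean needs strictly positive values")
    mean_log = sum(math.log10(x) for x in values) / len(values)
    return 10 ** mean_log  # antilog

gm = geometric_mean([2, 4, 8])
print(round(gm, 6))  # 4.0, since 2 * 4 * 8 = 64 = 4 * 4 * 4
```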
Example
Monthly income of ten families of a particular place is given below. Find out
Geometric mean
85 70 75 500 8 45 250 40 36
Steps:
2. Multiply these logarithms with the respective frequencies of each class and obtain
1. The product of the values of the series will remain unchanged when the value of
2*4*8 = 64 = 4*4*4
2. The sum of the deviations of the logarithms of the original observations above and
below the logarithm of the geometric mean is equal. This also means that the value of the
geometric mean is such as to balance the ratio deviations of the observations from it.
Thus, using the same numbers as before, we find that (4/2) × (4/4) = 2 = (8/4).
1. The geometric mean is used to find the average percentage increase in production,
population and other economic or business series. For example, from 1986 to 1987 prices
increased by 5%, 10% and 18% respectively. The average annual increase is not
11%, i.e., (5 + 10 + 18)/3 = 11, as given by the arithmetic average, but 10.9% as obtained by the
geometric mean.
2. The geometric mean is theoretically considered to be the best average in the construction
of index numbers. It satisfies the time reversal test and gives equal weight to equal
ratios of change.
3. It is an average most suitable when large weights have to be given to small items
and small weights to large items, situations which we usually come across in
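The price-increase figures in use 1 can be reproduced with a short sketch. It assumes the three percentages are successive compound increases; the function name is hypothetical.

```python
def average_growth_rate(percent_increases):
    """Average compound rate: geometric mean of the growth factors, minus 1."""
    product = 1.0
    for p in percent_increases:
        product *= 1 + p / 100          # growth factor for each period
    gm_factor = product ** (1 / len(percent_increases))
    return (gm_factor - 1) * 100

increases = [5, 10, 18]
arithmetic = sum(increases) / len(increases)
geometric = average_growth_rate(increases)
print(arithmetic, round(geometric, 1))  # 11.0 and about 10.9
```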
Limitations:
1. It is difficult to understand.
3. It cannot be computed when there are both negative and positive values in series or
Harmonic Mean:
defined as the reciprocal of the arithmetic mean of the reciprocals of the individual
In Individual Observations
HM = N / Σ(1/X)
In Discrete Series:
Steps:
2. Multiply the reciprocals by the respective frequencies and obtain the total
Σ(f × 1/X)
Note:
Instead of first finding the reciprocals and then multiplying them by the
frequencies, it will be far easier to divide each frequency by the respective value of the
variable.
In Continuous Series:
HM = N / Σ(f × 1/m) = N / Σ(f/m)
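The harmonic mean formulas can be sketched in one function that accepts optional frequencies. The speeds of 40 and 60 km per hour over equal distances are made-up illustrative figures, not data from these notes.

```python
def harmonic_mean(values, freqs=None):
    """HM = N / sum(f / X); with no frequencies every f is taken as 1."""
    if freqs is None:
        freqs = [1] * len(values)
    n = sum(freqs)
    return n / sum(f / x for x, f in zip(values, freqs))

# Hypothetical journey: equal distances covered at 40 and at 60 km per hour
hm = harmonic_mean([40, 60])
print(round(hm, 2))  # 48.0, lower than the arithmetic mean of 50
```

For a continuous series the same function applies with the class mid-points m passed as `values`.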
speed at which a journey has been performed or the average price at which an article
has been sold. The rate usually indicates the relation between two different types of
Merits:
•In problems relating to time and rates it gives better results than other averages.
Demerits:
AM GM HM
Median:
especially where if plotted as a frequency curve one gets a J or reverse J curve. For
Mode:
It is used to describe qualitative data. Mode can be used in problems involving the
Geometric mean:
increasing or decreasing.
Harmonic mean:
constant quantity of another variable, i.e., rates, time, distance covered within a
certain time, quantity purchased or sold per unit, etc.
Arithmetic mean:
Measures of Dispersion
the numerical values tend to spread about an average. It is clear from above that
from an average they are also called averages of the second order.
Definition:
central tendency is called dispersion. In other words measures which measure the
which the original data are given such as Rs, KG, etc. These measures may be used to
compare variation in two or more distribution provided the data are expressed in the
same units.
1. Co-efficient of Range
Co-efficient of dispersion
Co-efficient:
Standard Deviation:
Definition: It is defined as the positive square root of the mean of the squared deviations of given
letter σ (sigma)
C of SD = σ / X̄
Co-efficient of variation:
most commonly used measures of dispersion. The quartile deviation is the
smallest, the mean deviation is the next, and the standard deviation is the largest, in the
following proportion.
QD = 2/3 σ
MD = 4/5 σ
Range
Definition :
Range is the difference between the two extreme items, i.e., it is the
difference between the maximum value and the minimum value in a series.
Range = maximum value − minimum value
Range = L − S
Coefficient of Range = (L − S) / (L + S)
L = largest value or item
S = smallest value or item
A. UnGrouped Data
Find the range of the series
80 90 63 68 61 67 65 100 75 89 84 86 60
Solution :
Arrange the numbers in ascending order
60 61 63 65 67 68 75 80 84 86 89 90 100
Range = L − S
= 100 − 60
= 40
Coefficient of Range = (L − S) / (L + S)
= (100 − 60) / (100 + 60)
= 40 / 160
= 0.25
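The range and its coefficient for the ungrouped series above can be checked with a short sketch (the function name is illustrative). Note that L + S = 100 + 60 = 160, so the coefficient works out to 40 / 160 = 0.25.

```python
def range_and_coefficient(values):
    """Return (L - S, (L - S) / (L + S)) for a series of values."""
    largest, smallest = max(values), min(values)
    rng = largest - smallest
    return rng, rng / (largest + smallest)

series = [80, 90, 63, 68, 61, 67, 65, 100, 75, 89, 84, 86, 60]
print(range_and_coefficient(series))  # (40, 0.25)
```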
B. Grouped Data
Calculate the range and coefficient of range for the following data.
In the case of grouped data the range is the difference between the upper boundary of the
highest class and the lower boundary of the lowest class. The frequencies are not
considered.
Marks    No. of students
10-20    8
20-30    10
30-40    12
40-50    8
50-60    4
Range = L − S
= 60 − 10
= 50
Coefficient of Range = (L − S) / (L + S)
= (60 − 10) / (60 + 10)
= 50 / 70
= 0.714
Merits
1. It is simple to compute and easy to understand.
2. It is time saving and widely used in Industrial Quality Control.
Demerits
1. It is not a precise measure.
2. It is not based on each and every item of the distribution.
3. Range cannot tell us anything about the character of the distribution within the two
extreme observations.
4. Range cannot be computed in the case of open-end distributions.
Uses Of Range
1. Quality control.
2. Fluctuations in the share prices.
3. Weather forecast.
Individual observation :
MD = (1/N) Σ|X − A|
or
MD = Σ|D| / N, where |D| = |X − A|
Steps :
|D| = |X − A|
A = 4400
Σ|D| = 1200
MD = Σ|D| / N
= 1200 / 5
= 240
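The individual-observation formula can be sketched as below. The data table for this example is not shown in the notes, so the five values here are an assumption, chosen only because they are consistent with A = 4400, Σ|D| = 1200 and N = 5; the median is used as the average A.

```python
import statistics

def mean_deviation(values, average=None):
    """MD = sum(|X - A|) / N; A defaults to the median of the values."""
    if average is None:
        average = statistics.median(values)
    return sum(abs(x - average) for x in values) / len(values)

# Assumed figures, consistent with A = 4400 and sum|D| = 1200
incomes = [4000, 4200, 4400, 4600, 4800]
print(mean_deviation(incomes))  # 240.0
```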
MD = Σf|D| / N
MD = Σf|D| / N
X f m
0-10 5 5
10-20 8 15
20-30 12 25
30-40 15 35
40-50 20 45
50-60 14 55
60-70 12 65
70-80 6 75
Definition :
It is the square root of the quotient obtained by dividing the sum of the
squared deviations of the items from the arithmetic mean by the number of observations.
Standard deviation measures the average spread around the mean.
σ = √(sum of squared deviations from the arithmetic mean / number of observations)
or
σ = √(Σx² / N), where x = (X − X̄)
σ = √(Σx² / N), where x = (X − X̄)
Steps:
i. Calculate the actual mean of the series, i.e., X̄.
ii. Take the deviations of the items from the mean, i.e., find (X − X̄). Denote these
deviations by x.
iii. Square these deviations and obtain the total Σx².
iv. Divide Σx² by the total number of observations, i.e., N, and extract the square root.
This gives us the value of the standard deviation.
σ = √(Σd²/N − (Σd/N)²)
Steps:
i. Take the deviations of the items from an assumed mean, i.e., obtain (X − A).
Denote these deviations by d. Take the total of these deviations, i.e., obtain
Σd.
ii. Square these deviations and obtain the total Σd².
iii. Substitute the values of Σd², Σd and N in the above formula.
X      d = X − 100    d²
120    20             400
60     −40            1600
80     −20            400
20     −80            6400
100    0              0
40     −60            3600
140    40             1600
ΣX = 560, Σd = −140, Σd² = 14000
σ = √(Σd²/N − (Σd/N)²)
= √(14000/7 − (−140/7)²)
= √(2000 − 400)
= √1600
σ = 40
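The assumed-mean calculation above can be cross-checked against the actual-mean definition. In this sketch `statistics.pstdev` from the Python standard library supplies the population standard deviation; the function name `sd_assumed_mean` is illustrative.

```python
import math
import statistics

def sd_assumed_mean(values, assumed_mean):
    """sigma = sqrt(sum(d^2)/N - (sum(d)/N)^2), with d = X - A."""
    n = len(values)
    d = [x - assumed_mean for x in values]
    return math.sqrt(sum(v * v for v in d) / n - (sum(d) / n) ** 2)

data = [120, 60, 80, 20, 100, 40, 140]
sigma = sd_assumed_mean(data, assumed_mean=100)
print(sigma, statistics.pstdev(data))  # both are 40.0
```

The result does not depend on which assumed mean is chosen, because the correction term (Σd/N)² removes the effect of the shift.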
For calculating the standard deviation in a discrete series any of the following methods
can be applied:
1. Actual mean method
2. Assumed mean method
3. Step deviation method
Steps :
i. Take the deviations of the items from an assumed mean and denote these
deviations by d.
ii. Multiply these deviations by the respective frequencies and obtain the total Σfd.
iii. Obtain the squares of the deviations, i.e., calculate d².
iv. Multiply the squared deviations by the respective frequencies and obtain the total
Σfd².
X      f     d = X − 6.5    d²    fd     fd²
3.5    3     −3             9     −9     27
4.5    7     −2             4     −14    28
5.5    22    −1             1     −22    22
6.5    60    0              0     0      0
7.5    85    1              1     85     85
8.5    32    2              4     64     128
9.5    8     3              9     24     72
N = 217, Σfd = 128, Σfd² = 362
σ = √(Σfd²/N − (Σfd/N)²)
= √(362/217 − (128/217)²)
= √(1.668 − 0.348)
= √1.320
σ = 1.149
Calculate the standard deviation from the following data by the step deviation method:
X     f     d = (X − A)/C    fd     fd²
10    3     −3               −9     27
20    7     −2               −14    28
30    9     −1               −9     9
40    23    0                0      0
50    15    1                15     15
60    8     2                16     32
70    6     3                18     54
80    4     4                16     64
N = 75, Σfd = 33, Σfd² = 229
Assumed mean A = 40, C = i = 10
σ = √(Σfd²/N − (Σfd/N)²) × i
= √(229/75 − (33/75)²) × 10
= √(3.05 − 0.1936) × 10
= 1.69 × 10
σ = 16.90
Calculate the standard deviation from the following data in the case of a continuous series:
X         f      m     d = (m − A)/C    d²    fd      fd²
20-30     30     25    −3               9     −90     270
30-40     58     35    −2               4     −116    232
40-50     62     45    −1               1     −62     62
50-60     85     55    0                0     0       0
60-70     112    65    1                1     112     112
70-80     70     75    2                4     140     280
80-90     57     85    3                9     171     513
90-100    26     95    4                16    104     416
N = 500, Σfd = 259, Σfd² = 1885
Assumed mean A = 55, C = i = 10
σ = √(Σfd²/N − (Σfd/N)²) × i
= √(1885/500 − (259/500)²) × 10
= √(3.77 − 0.268) × 10
= 1.87 × 10
σ = 18.7
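The step deviation method for a frequency distribution can be sketched as follows, using the continuous-series table above. The function name and argument names are illustrative.

```python
import math

def sd_step_deviation(midpoints, freqs, assumed_mean, width):
    """sigma = sqrt(sum(f d^2)/N - (sum(f d)/N)^2) * i, with d = (m - A) / i."""
    n = sum(freqs)
    d = [(m - assumed_mean) / width for m in midpoints]
    sum_fd = sum(f * di for f, di in zip(freqs, d))
    sum_fd2 = sum(f * di * di for f, di in zip(freqs, d))
    return math.sqrt(sum_fd2 / n - (sum_fd / n) ** 2) * width

midpoints = [25, 35, 45, 55, 65, 75, 85, 95]
freqs = [30, 58, 62, 85, 112, 70, 57, 26]
sigma = sd_step_deviation(midpoints, freqs, assumed_mean=55, width=10)
print(round(sigma, 1))  # about 18.7
```

The same function reproduces the discrete-series example when the X values are passed in place of the mid-points.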
σ = √((N² − 1) / 12)
3. The sum of the square of the deviation of items in the series from their
arithmetic mean is minimum. This is the reason why standard deviation is
always computed from the arithmetic mean.
COEFFICIENT OF VARIATION
Example: From the following table of marks obtained by A and B in 10 tests of 150 marks
each, find out who is more intelligent and who is more consistent.
A: 25 50 45 30 70 42 36 48 34 60
B: 10 70 50 20 95 55 42 60 48 80
Solution: In order to find out the more intelligent student between A and B we will calculate
the average marks, and for finding out the more consistent student we will compare the
coefficients of variation. Deviations are taken from assumed means: dA = A − 30, dB = B − 50.
A      B      dA     dA²     dB     dB²
25     10     −5     25      −40    1600
50     70     20     400     20     400
45     50     15     225     0      0
30     20     0      0       −30    900
70     95     40     1600    45     2025
42     55     12     144     5      25
36     42     6      36      −8     64
48     60     18     324     10     100
34     48     4      16      −2     4
60     80     30     900     30     900
440    530    140    3670    30     6018
i) X̄A = ΣA / N
= 440 / 10
X̄A = 44
X̄B = ΣB / N
= 530 / 10
X̄B = 53
∴ Since the average score of student B is higher than that of A, student B is more
intelligent.
ii)
σA = √(ΣdA²/N − (ΣdA/N)²)
= √(3670/10 − (140/10)²)
= √(367 − 196)
= √171
σA = 13.077
C.V.A = (σA / X̄A) × 100
= (13.077 / 44) × 100
C.V.A = 29.72
σB = √(ΣdB²/N − (ΣdB/N)²)
= √(6018/10 − (30/10)²)
= √(601.80 − 9)
= √592.8
σB = 24.347
C.V.B = (σB / X̄B) × 100
= (24.347 / 53) × 100
C.V.B = 45.94
∴ Since the coefficient of variation of A is lower than that of B, student A is more consistent.
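The comparison of A and B can be reproduced with the standard library. This is a sketch; `coefficient_of_variation` is an illustrative helper that uses the population standard deviation, as the notes do.

```python
import statistics

def coefficient_of_variation(values):
    """C.V. = (sigma / mean) * 100, using the population standard deviation."""
    return statistics.pstdev(values) / statistics.mean(values) * 100

a = [25, 50, 45, 30, 70, 42, 36, 48, 34, 60]
b = [10, 70, 50, 20, 95, 55, 42, 60, 48, 80]

cv_a = coefficient_of_variation(a)
cv_b = coefficient_of_variation(b)
print(statistics.mean(a), statistics.mean(b))  # B has the higher average
print(round(cv_a, 1), round(cv_b, 1))          # A has the lower C.V., so A is more consistent
```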
Variance
Variance is nothing but the square of the standard deviation,
i.e., Variance = σ² = Σ(X − X̄)² / N
σ = √Variance
Variance = [Σfd²/N − (Σfd/N)²] × i²
where d = (X − A) / i
Correlation analysis
If two quantities vary in such a way that movements in one are accompanied by
movements in other, these quantities are correlated. For example, there exists some
two such sets of observations is called correlation. The correlation analysis refers to the
Types of correlation
It depends upon the direction of change of the variables. If both the variables are
varying in the same direction correlation is said to be positive. If, on the other hand, the
The distinction is based upon the number of variables studied. When only two variables are
studied it is a problem of simple correlation. When more than two variables are studied it is a
problem of either multiple or partial correlation.
In multiple correlation three or more variables are studied simultaneously. In partial
correlation, on the other hand, we recognize more than two variables but consider only two
variables to be influencing each other, the effect of the other influencing variables being kept constant.
The distinction is based upon the constancy of the ratio of change between the variables. If
the amount of change in one variable tends to bear a constant ratio to the amount of change in
the other variable, then the correlation is said to be linear.
2. Graphic method.
The simplest device for ascertaining whether two variables are related is to prepare a
dot chart called a scatter diagram. When this method is used the given data are plotted on
graph paper in the form of dots, i.e., for each pair of X and Y values we put a dot and thus
obtain as many points as the number of observations. By looking at the scatter of the various
points we can form an idea as to whether the variables are related or not. The greater the
scatter of the plotted points on the chart, the weaker is the relationship between the two
variables. The more closely the points come to a straight line, the higher the degree of
relationship.
If all the points lie on a straight line rising from the lower left-hand corner to the upper
right-hand corner, correlation is said to be perfectly positive (i.e., r = +1).
If all the points lie on a straight line falling from the upper left-hand corner to the lower
right-hand corner of the diagram, correlation is said to be perfectly negative (i.e., r = -1).
If the plotted points fall in a narrow band there would be a high degree of correlation
between the variables. Correlation shall be positive if the points show a rising
tendency from the lower left-hand corner to the upper right-hand corner.
Correlation shall be negative if the points show a declining tendency from the
upper left-hand corner to the lower right-hand corner of the diagram.
If the points are widely scattered over the diagram it indicates very little relationship
between the variables. Correlation shall be positive if the points are rising from the lower
left-hand corner to the upper right-hand corner.
vi. Low degree of negative correlation :-
Correlation shall be negative if the points are running from the upper left-hand side to the
vii. No correlation :-
If the plotted points lie on a straight line parallel to the X-axis or in a haphazard
manner, it shows absence of any relationship between the variables (i.e., r = 0).
Merits
Limitations :-
By applying this method we can get an idea about the direction of the
correlation and also whether it is high or low. But we cannot establish the exact
mathematical methods.
2.Graphic method
Under this method the individual values of the two variables are plotted on the
graph paper. We thus obtain two curves, one for the X variable and another for the Y
variable. By examining the direction and closeness of the two curves so drawn we can infer
whether or not the variables are related. If both the curves drawn on the graph are
be positive. On the other hand, if the curves are moving in the opposite directions
• From the following data ascertain whether the income and expenditure of 100 workers of a
1979 100 90
1980 102 91
1981 105 93
1982 105 95
1983 101 92
1984 112 94
The Pearson coefficient of correlation is denoted by the symbol r. The formulas for
computing Pearsonian r are:
(i) When deviations of the items are taken from the actual mean:
r = Σxy / (N σx σy)
Where,
x = (X − X̄) and y = (Y − Ȳ)
σx = standard deviation of series X,
σy = standard deviation of series Y,
N = number of pairs of observations,
r = the correlation coefficient.
The value of r lies between +1 and −1, i.e., it cannot be greater than 1 or less than −1.
If r = +1 correlation is perfect and positive. If r = −1 correlation is perfect and
negative. If r = 0 there is no linear correlation between the variables. The above
formula for computing the Pearson coefficient of correlation can be transformed to the
following form, which is easier to apply.
r = Σxy / √(Σx² × Σy²)
x = (X − X̄)
y = (Y − Ȳ)
(ii) Direct method of finding out correlation:
r = [N ΣXY − (ΣX)(ΣY)] / [√(N ΣX² − (ΣX)²) × √(N ΣY² − (ΣY)²)]
X     Y     x = X − X̄    x²     y = Y − Ȳ    y²     xy
48    45    14           196    10           100    140
35    20    1            1      −15          225    −15
17    40    −17          289    5            25     −85
23    25    −11          121    −10          100    110
47    45    13           169    10           100    130
ΣX = 170, ΣY = 175, Σx = 0, Σx² = 776, Σy = 0, Σy² = 550, Σxy = 280
X̄ = ΣX / N = 170 / 5 = 34, Ȳ = ΣY / N = 175 / 5 = 35
r = Σxy / √(Σx² × Σy²)
= 280 / √(776 × 550)
= 280 / 653.30
= 0.429
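The actual-mean calculation of r for the five pairs above can be sketched as follows; `pearson_r` is an illustrative name, and the function implements r = Σxy / √(Σx² × Σy²) directly.

```python
import math

def pearson_r(xs, ys):
    """r = sum(xy) / sqrt(sum(x^2) * sum(y^2)), deviations taken from the actual means."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    sum_xy = sum(a * b for a, b in zip(dx, dy))
    sum_x2 = sum(a * a for a in dx)
    sum_y2 = sum(b * b for b in dy)
    return sum_xy / math.sqrt(sum_x2 * sum_y2)

X = [48, 35, 17, 23, 47]
Y = [45, 20, 40, 25, 45]
r = pearson_r(X, Y)
print(round(r, 3))  # 0.429
```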
r = 513 / 540 = 0.95
r = (16928 − 5076) / (√(11800 − 2209) × √(27744 − 11664))
r = 11852 / (√9591 × √16080)
r = 11852 / (97.93 × 126.81)
r = 11852 / 12418.50
r = 0.954
2) The two variables under study are affected by a large number of causes so as to form
a normal distribution.
3) There is a cause and effect relationship between the forces affecting the distribution
1) It is most popular.
2) It summarizes in one figure not only the degree of correlation but also the direction,
1)The correlation coefficient always assumes linear relationship regardless of the fact
4)This method takes more time to compute the value of correlation coefficient.
R = 1 − 6ΣD² / (N(N² − 1))
R = Rank correlation coefficient
The value of this coefficient, interpreted in the same way as Karl Pearson's
correlation coefficient, ranges between +1 and −1. When R is +1 there is complete
agreement in the order of the ranks and the ranks are in the same
direction; when R is −1 there is complete agreement in the order of the ranks
and they are in opposite directions.
Example :
R1    R2    D = R1 − R2    D²
1     3     −2             4
2     2     0              0
3     1     2              4
ΣD² = 8
R = 1 − 6ΣD² / (N(N² − 1))
= 1 − (6 × 8) / (3(3² − 1))
= 1 − 48 / 24
R = −1
Where actual ranks are given to us, the steps required for computing rank correlation are:
(i) Take the difference of the two ranks, i.e., (R1-R2) and denote these differences by D.
Example :
The rankings of 10 students in the subjects accounting and auditing are as follows:
Accounting : 3 5 8 4 7 10 2 1 6 9
Auditing   : 6 4 9 8 1 2 3 10 5 7
Solution:
R1    R2    D = R1 − R2    D²
3     6     −3             9
5     4     1              1
8     9     −1             1
4     8     −4             16
7     1     6              36
10    2     8              64
2     3     −1             1
1     10    −9             81
6     5     1              1
9     7     2              4
ΣD² = 214
R = 1 − 6ΣD² / (N³ − N)
= 1 − (6 × 214) / (10³ − 10)
= 1 − 1284 / 990
= −294 / 990
R = −0.297
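When the ranks are already given and there are no ties, the rank correlation formula is a one-liner. A sketch, with `rank_correlation` as an illustrative name:

```python
def rank_correlation(ranks1, ranks2):
    """R = 1 - 6 * sum(D^2) / (N^3 - N), valid when no ranks are tied."""
    n = len(ranks1)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(ranks1, ranks2))
    return 1 - 6 * sum_d2 / (n ** 3 - n)

accounting = [3, 5, 8, 4, 7, 10, 2, 1, 6, 9]
auditing = [6, 4, 9, 8, 1, 2, 3, 10, 5, 7]
R = rank_correlation(accounting, auditing)
print(round(R, 3))  # -0.297
```

The three-pair example earlier, with ranks (1, 2, 3) against (3, 2, 1), gives exactly −1 with the same function.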
When we are given the actual data and not the ranks, it will be necessary to assign
the ranks. Ranks can be assigned by taking either the highest value as 1 or the lowest value
as 1. But whether we start with the lowest value or the highest value we must follow the
Example:
company are given below, using rank correlation method, determine the relationship
R = 1 − 6ΣD² / (N³ − N)
= 1 − (6 × 62) / (7³ − 7)
= 1 − 372 / 336
= −36 / 336
R = −0.107
Equal ranks
If two or more items are of equal value, they can be assigned an average rank. An
adjustment is required for each group of equal ranks. The formula for calculating
If there are one such group of items with common ranks, this value is
Example:
Calculate the rank coefficient of correlation of the following data.
X     Rx     Y     Ry    D = (Rx − Ry)    D²
80    8      12    1     7                49
78    7      13    2     5                25
75    5.5    14    4     1.5              2.25
75    5.5    14    4     1.5              2.25
68    4      14    4     0                0
67    3      16    7     −4               16
60    2      15    6     −4               16
59    1      17    8     −7               49
ΣD² = 159.5
Merits
(1) This method is simpler to understand and easier to apply compared to Karl
Pearson's method.
(2) When the data are of a qualitative nature like honesty, efficiency, etc., this method can
be used with great advantage.
(3) This is the only method that can be used where we are given the ranks and not the actual
data.
(4) Even where actual data are given, the rank method can be applied for ascertaining
correlation.
Limitations
1) This method cannot be used for finding out correlation in a grouped frequency distribution.
2) This method should not be applied where N exceeds 30 unless we are given the ranks and not
Regression analysis
Introduction
Regression analysis reveals the average relationship between two variables, and this
makes estimation or prediction possible. The two-variable regression model assigns one of
the variables the status of an independent variable, and the other variable the status of a
dependent variable.
Regression equation X on Y
The regression equation of X on Y is used to describe the variations in the values of X for
given changes in Y.
It is expressed as
Xc = a + bY
To determine the values of a and b the following normal equations are to be solved
simultaneously:
ΣX = Na + bΣY
ΣXY = aΣY + bΣY²
Regression equation of Y on X
Y − Ȳ = r (σy / σx)(X − X̄)
X     Y     x = X − X̄    x²    y = Y − Ȳ    y²    xy
6     9     0            0     1            1     0
2     11    −4           16    3            9     −12
10    5     4            16    −3           9     −12
4     8     −2           4     0            0     0
8     7     2            4     −1           1     −2
ΣX = 30, ΣY = 40, Σx = 0, Σx² = 40, Σy = 0, Σy² = 20, Σxy = −26
Regression equation of X on Y
X − X̄ = r (σx / σy)(Y − Ȳ)
X̄ = ΣX / N = 30 / 5 = 6, Ȳ = ΣY / N = 40 / 5 = 8
Regression equation of Y on X
Y − Ȳ = r (σy / σx)(X − X̄)
Here r (σy / σx) = Σxy / Σx² = −26 / 40 = −0.65. Hence,
Y − 8 = −0.65 (X − 6)
Y − 8 = −0.65X + 3.9
Y = −0.65X + 11.9
OR Y = 11.9 − 0.65X
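Both regression coefficients for the five pairs above can be computed from deviations about the actual means. This is a sketch: `regression_coefficients` is an illustrative name, and it uses byx = Σxy / Σx² and bxy = Σxy / Σy², which are equivalent to r σy/σx and r σx/σy.

```python
def regression_coefficients(xs, ys):
    """Return (byx, bxy, mean_x, mean_y) using deviations from the actual means."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_x2 = sum((x - mean_x) ** 2 for x in xs)
    sum_y2 = sum((y - mean_y) ** 2 for y in ys)
    return sum_xy / sum_x2, sum_xy / sum_y2, mean_x, mean_y

X = [6, 2, 10, 4, 8]
Y = [9, 11, 5, 8, 7]
byx, bxy, mean_x, mean_y = regression_coefficients(X, Y)

intercept_y_on_x = mean_y - byx * mean_x   # Y = intercept + byx * X
print(round(byx, 2), round(bxy, 2), round(intercept_y_on_x, 1))  # -0.65 -1.3 11.9
```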
Regression equation of X on Y:
(X − X̄) = r (σx / σy)(Y − Ȳ)
Regression equation of Y on X:
(Y − Ȳ) = r (σy / σx)(X − X̄)
r (σy / σx) = [Σdxdy − (Σdx × Σdy) / N] / [Σdx² − (Σdx)² / N]
X     Y     dx = X − 6    dx²    dy = Y − 7    dy²    dxdy
6     9     0             0      2             4      0
2     11    −4            16     4             16     −16
10    5     4             16     −2            4      −8
4     8     −2            4      1             1      −2
8     7     2             4      0             0      0
ΣX = 30, ΣY = 40, Σdx = 0, Σdx² = 40, Σdy = 5, Σdy² = 25, Σdxdy = −26
dx = X − A, dy = Y − A, where A is the assumed mean (6 for X, 7 for Y)
X̄ = ΣX / N = 30 / 5 = 6, Ȳ = ΣY / N = 40 / 5 = 8
Regression equation of Y on X
Y − Ȳ = byx (X − X̄)
byx = [Σdxdy − (Σdx)(Σdy) / N] / [Σdx² − (Σdx)² / N]
= [−26 − (0)(5) / 5] / [40 − (0)² / 5]
= −26 / 40
= −0.65
dx = X − A, dy = Y − A
X̄ = ΣX / N = 30 / 5 = 6, Ȳ = ΣY / N = 40 / 5 = 8
Regression equation of X on Y
X − X̄ = bxy (Y − Ȳ)
bxy = [Σdxdy − (Σdx)(Σdy) / N] / [Σdy² − (Σdy)² / N]
= [−26 − (0)(5) / 5] / [25 − (5)² / 5]
= −26 / 20
= −1.3
2. Correlation merely measures the degree of relationship, not a cause and effect relationship.
Regression studies the cause and effect relationship.
3. rxy is a measure of the direction and degree of linear relationship between two variables X and Y.
rxy and ryx are symmetric (rxy = ryx), i.e., it is immaterial which of X and Y is the dependent
variable and which is the independent variable. In regression analysis the regression coefficients
bxy and byx are not symmetric, i.e., bxy ≠ byx, and hence it definitely makes a difference as to
4. There may be nonsense correlation between two variables, which is purely due to chance and has
no practical relevance. There is nothing like nonsense regression.
The coefficient of correlation (r) takes the same sign as the regression coefficients (bxy and byx).
1. Both the regression coefficients will have the same sign. i.e., either they will be positive or
negative. It is never possible that one of the regression coefficients is negative and other
positive.
2. Since the value of the coefficient of correlation ( r ) cannot exceed one, one of the regression
coefficient must be less than one or in other words, both the regression coefficients can not be
greater than one. For ex – if bxy = 1.2 and byx = 1.4 the value of correlation coefficient would
3. The coefficient of correlation will have the same sign as that of regression coefficients i.e., if
regression coefficient have a negative sign, r will also be negative, if regression coefficient
have a positive sign r would also be positive. For example – if bxy = -0.8 and byx = -1.2 r
4. Since bxy = r σx / σy, we can find any one of the four values given the other three. For
example, if we know that r = 0.6, σx = 0.4 and bxy = 0.8, we can find σy from bxy = r σx / σy.
5. Regression coefficients are independent of change of origin but not of scale (change of origin
means subtracting some constant from every value of X and Y; change of scale means dividing or
multiplying every value of X and Y by a constant).
Introduction
One of the most important tasks before economists and businessmen these days is
Businessmen – sales
Economists – population
“A time series consists of statistical data which are collected, recorded or observed over
-- Morris Hamburg
It is clear from above definitions that time series consists of data arranged
chronologically. Thus if we record the data relating to population, per capita income, prices,
production, etc. for the last 5, 10, 15, 20 years or some other time period, the series so emerging
The problem of time series analysis can be best appreciated with the help of the following
example. The following are the figures of sales of refrigerators of a firm in thousand units :
1992 48
1993 65
1994 42
1. Secular movements : Changes have occurred as a result of general tendency of the data to
increase or decrease.
4) It facilitates comparison.
1. Secular Trend :
Trend, also called secular or long-term trend, is the basic tendency of production,
sales, income, employment, etc., to grow or decline over a period of time. The
concept of trend does not include short-range oscillations but rather steady
movements over a long time.
2. Seasonal variations :
Seasonal variations are the periodic movements in business activity which occur
regularly every year and have their origin in the nature of the year itself.
3. Cyclical variations :
The term ‘cycle’ refers to the recurrent variations in time series that usually last longer than
a year and are regular neither in amplitude nor in length. Cyclical fluctuations are long-term
movements that represent consistently recurring rises and declines in activity. A business
cycle consists of the recurrence of the up and down movements of business activity from
some sort of statistical trend or “normal”. By “normal” we mean some kind of statistical
average; we do not mean that there is anything very permanent or special about it. There are four well
defined periods or phases in the business cycle, namely :
(i) prosperity.
4. Irregular variations :
Irregular variations, also called erratic, accidental or random variations, refer to such variations in
Irregular variations are caused by such isolated special occurrences as floods, earthquakes,
strikes and wars. Sudden changes in demand or very rapid technological progress may also
• Measurement of trend :
The various methods that can be used for determining trend are :
Ex : Below are the figures of production of a sugar factory (in thousand quintals).
Fit a straight line trend to these figures.
Since ΣX = 0 :
a = ΣY / N = 630 / 7 = 90
b = ΣXY / ΣX² = 56 / 28 = 2
Hence the trend line is Yc = 90 + 2X.
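The calculation above can be sketched in code as follows. This is only an illustration: the years and the production figures are hypothetical, not taken from the original table, and were chosen so that the totals agree with the example (ΣY = 630, ΣXY = 56, ΣX² = 28). X is measured as the deviation of each year from the mean year, so that ΣX = 0.

```python
def fit_linear_trend(years, values):
    """Fit Yc = a + bX, where X = year - mean year (so that sum of X is zero)."""
    n = len(years)
    origin = sum(years) / n
    x = [year - origin for year in years]
    a = sum(values) / n                                  # a = sum(Y) / N
    b = sum(xi * yi for xi, yi in zip(x, values)) / sum(xi * xi for xi in x)
    return origin, a, b


def trend_value(year, origin, a, b):
    """Trend value (or forecast) for a given year."""
    return a + b * (year - origin)


# Hypothetical data: seven consecutive years, production in thousand quintals.
years = list(range(1981, 1988))
production = [80, 90, 92, 83, 94, 99, 92]

origin, a, b = fit_linear_trend(years, production)
print(origin, a, b)                        # origin year, a and b of the trend line
print(trend_value(1988, origin, a, b))     # forecast one year beyond the data
```

The same two functions can be used for the later examples that ask for an estimate for a future year, by passing that year to trend_value.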
Ex : Fit a straight line trend to the following series. Estimate the value for 1987.
Since ΣX = 0 : a = ΣY / N and b = ΣXY / ΣX²
Ex : Fit a straight line trend by the method of least squares to the following data. Assuming the same
rate of change continues, what would be the predicted earnings for the year 1988 ?
Since ΣX = 0 : a = ΣY / N and b = ΣXY / ΣX²
The procedure of obtaining a straight line trend by this method is given below.
ii. Examine carefully the direction of the trend based on the plotted information (dots).
iii. Draw a straight line which will best fit to the data according to the personal judgment.
Example : Fit a trend line by the method of semi – averages to the data given below.
When there is an even number of years like 6, 8, 10, etc., two equal parts can
easily be formed and the average of each part obtained. However, when the
average is to be centered there would be some problem in case the number of
years is 8, 12, etc., because each half then contains an even number of years.
For example, if the data relate to 1984, 1985, 1986 and 1987,
which would be the middle year ? In such a case the average will be centered
corresponding to 1st July 1985, i.e., the middle of 1985 and 1986. The following
example shall illustrate the point.
Example : Fit a trend line by the method of semi – averages to the data given below.
1982 454
1983 470
1984 482
1985 490
1986 500
Semi-average of 1983 to 1986 = 1942 / 4 = 485.5
Solution : [Graph : sales plotted against years, showing the actual line and the trend line]
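The method of semi-averages can be sketched as below. This is a minimal illustration, not the worked solution of the example: the first three sales figures (for 1979 to 1981) are hypothetical, because they are not available here, while the figures for 1982 to 1986 are the ones shown above. The function name is also hypothetical.

```python
def semi_averages(values):
    """Split the series into two equal halves and return the average of each.

    With an odd number of values the middle value is left out of both halves.
    """
    half = len(values) // 2
    first = values[:half]
    second = values[len(values) - half:]
    return sum(first) / half, sum(second) / half


# 1979 to 1981 are hypothetical; 1982 to 1986 are the figures shown above.
sales = [412, 438, 446, 454, 470, 482, 490, 500]

first_avg, second_avg = semi_averages(sales)
print(first_avg, second_avg)

# Each half covers 4 years, so the two averages are centered 4 years apart.
annual_change = (second_avg - first_avg) / 4
print(annual_change)
```

The two averages are plotted against the middle of their respective periods and joined by a straight line to obtain the trend line.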
When a trend is to be determined by the method of moving averages, the average value
for a number of years is secured, and this average is taken as the normal or trend value for the
unit of time falling at the middle of the period covered in the calculation of the average. The
effect of averaging is to give a smoother curve, lessening the influence of the fluctuations that
pull the annual figures away from the general trend.
While applying this method, it is necessary to select a period for moving average such as
Calculate the 3-yearly moving averages of the production figures given below and draw the
trend line.
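A 3-yearly moving average can be computed as in the sketch below. The production figures are hypothetical, since the table for this exercise is not available here; the function name is chosen only for illustration. Each average is placed against the middle year of its period, so the first and last years have no trend value.

```python
def moving_average(values, period):
    """Odd-period moving average, placed against the middle item of each window."""
    if period % 2 == 0:
        raise ValueError("this sketch handles odd periods only")
    half = period // 2
    result = [None] * len(values)
    for i in range(half, len(values) - half):
        window = values[i - half:i + half + 1]
        result[i] = sum(window) / period
    return result


# Hypothetical production figures for five consecutive years.
production = [15, 21, 30, 36, 42]
print(moving_average(production, 3))
```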
If the moving average is an even-period moving average, say four-yearly or
six-yearly, the moving total and moving average, which are placed at the centre of
the time span from which they are computed, fall between two time periods. This
placement is inconvenient since the moving average so placed would not coincide
with an original time period. We, therefore, synchronise the moving averages and the
original data. This process is called centering and always consists of taking a
two-period moving average of the moving averages.
Estimate the trend values using the data given below by taking a four-yearly moving average.
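Centering an even-period moving average can be sketched as follows. The figures are hypothetical, since the table for this exercise is not available here, and the function name is chosen only for illustration. The uncentered four-yearly averages fall between two years; averaging each adjacent pair places the result against an actual year.

```python
def centered_moving_average(values, period):
    """Even-period moving average, centered by averaging adjacent pairs."""
    if period % 2 != 0:
        raise ValueError("this sketch handles even periods only")
    n = len(values)
    # Uncentered averages: the k-th one falls midway between
    # positions k + period/2 - 1 and k + period/2.
    raw = [sum(values[k:k + period]) / period for k in range(n - period + 1)]
    half = period // 2
    result = [None] * n
    for k in range(len(raw) - 1):
        # The mean of two adjacent averages is centered on position k + half.
        result[k + half] = (raw[k] + raw[k + 1]) / 2
    return result


# Hypothetical figures for six consecutive years.
figures = [10, 20, 30, 40, 50, 60]
print(centered_moving_average(figures, 4))
```

With a four-yearly average, the first two and the last two years have no trend value.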
(i) ∑ (Y – Yc) = 0
i.e., the sum of deviations of the actual values of Y from the computed values of Y
is zero.
(ii) ∑ (Y – Yc)² is least
i.e., the sum of the squares of the deviations of the actual values from the computed values is least.
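Both conditions can be verified numerically. The sketch below uses hypothetical data with X taken as deviations from the middle period (so that ΣX = 0); it fits the line, checks that the deviations sum to zero, and compares the sum of squares with that of slightly different lines. The function names are chosen only for illustration.

```python
def fit_line(x, y):
    """Least squares line Yc = a + bX, assuming the X values sum to zero."""
    if sum(x) != 0:
        raise ValueError("this sketch assumes deviations with sum(X) = 0")
    a = sum(y) / len(y)
    b = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)
    return a, b


def sum_of_squares(x, y, a, b):
    """Sum of squared deviations of actual Y from computed Yc = a + bX."""
    return sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))


# Hypothetical data.
x = [-2, -1, 0, 1, 2]
y = [2, 4, 5, 4, 5]

a, b = fit_line(x, y)
residual_sum = sum(yi - (a + b * xi) for xi, yi in zip(x, y))
best = sum_of_squares(x, y, a, b)

print(a, b)
print(round(residual_sum, 10))                   # condition (i)
print(best < sum_of_squares(x, y, a + 0.1, b))   # condition (ii), shifted line
print(best < sum_of_squares(x, y, a, b - 0.1))   # condition (ii), tilted line
```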
Theory of probability
Probability defined :
• Broadly speaking there are four different schools of thought on the concepts of probability,
The classical approach to probability is the oldest and simplest. It originated in the eighteenth
century in problems pertaining to games of chance, such as tossing coins, throwing dice or drawing
from a deck of cards. The basic assumption underlying the classical theory is that the outcomes of a
random experiment are “equally likely”. The “event” whose probability is sought consists of one or more
possible outcomes of the given activity. For example, when a die is rolled once, any one of the six
possible outcomes, i.e., 1,2,3,4,5,6, can occur. These activities are referred to in modern
terminology as “experiments”, a term that refers to processes which result in different
possible outcomes or observations. The term “equally likely”, though undefined, conveys the
notion that each outcome of an experiment has the same chance of appearing as any other. Thus
in a throw of a die the occurrences of 1,2,3,4,5,6 are equally likely events.
• The definition of probability given by the French mathematician Laplace, and generally adopted by
disciples of the classical school, runs as follows : probability, it is said, is the ratio of the number
of “favorable” cases to the total number of equally likely cases. If the probability of occurrence of
A is denoted by P(A), then by this definition we have
P(A) = Number of favorable cases / Total number of equally likely cases
For example, if a coin is tossed there are two equally likely results, a head or a tail, hence
the probability of head is ½. Similarly, if a die is thrown, the probability of obtaining an even
number is 3/6 or ½, since, three of the six equally possible results are even numbers.
• Symbolically, if an event A can happen in ‘a’ ways out of a total of ‘n’ equally likely and
mutually exclusive ways, then the probability of occurrence of the event (called its success) is
denoted by p = Pr(A) = a/n, and the probability of non-occurrence of the event (called its
failure) is given by :
q = Pr(not A) = b/n, where b = n – a is the number of ways in which A fails to happen.
Since the sum of the successful and unsuccessful outcomes is equal to the total number of
events, we have,
a+b = n
Dividing by n,
a/n + b/n = 1
So that p+q = 1
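The relation p + q = 1 can be illustrated with exact fractions. This is a small sketch using the die example from the text (an even number in one throw); the function name is hypothetical.

```python
from fractions import Fraction


def classical_probability(favourable, total):
    """p = a / n for 'a' favourable ways out of 'n' equally likely ways."""
    if total <= 0 or not 0 <= favourable <= total:
        raise ValueError("need 0 <= favourable <= total and total > 0")
    return Fraction(favourable, total)


n = 6          # equally likely outcomes of one throw of a die
a = 3          # favourable outcomes: 2, 4, 6
b = n - a      # unfavourable outcomes: 1, 3, 5

p = classical_probability(a, n)   # probability of success
q = classical_probability(b, n)   # probability of failure
print(p, q, p + q)
```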
The scale of probability extends from zero to unity (i.e., one). When p =0 it denotes
impossibility of the event taking place, i.e., the event cannot take place. However, this is true only
when the number of possible outcomes is finite. For example, the probability of throwing
seven with a single die is zero. On the other hand, when p=1 it denotes certainty, i.e., the
event is bound to take place. In most cases, in practical life the probability lies between these
two extremes 0 and 1.
Example (1) : One card is drawn from a standard pack of 52. What is the probability that it
is a king ?
Example (2) : From a bag containing 10 black and 20 white balls, a ball is drawn at random. What
is the probability that it is a black ball ?
Solution :
Total number of balls n = 10 + 20 = 30; number of black balls a = 10.
p(A) = a/n
= 10/30
= 1/3
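Both examples, and the case of throwing a seven with a single die, can be checked by listing the equally likely outcomes and counting the favourable ones. The sketch below is only an illustration; the helper name and the way the pack and the bag are represented are assumptions.

```python
from fractions import Fraction
from itertools import product


def probability(sample_space, event):
    """Favourable outcomes divided by the total number of equally likely outcomes."""
    outcomes = list(sample_space)
    favourable = sum(1 for outcome in outcomes if event(outcome))
    return Fraction(favourable, len(outcomes))


# Example (1): a standard pack of 52 cards as (rank, suit) pairs.
ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["spades", "hearts", "diamonds", "clubs"]
pack = list(product(ranks, suits))
p_king = probability(pack, lambda card: card[0] == "K")

# Example (2): a bag with 10 black and 20 white balls.
bag = ["black"] * 10 + ["white"] * 20
p_black = probability(bag, lambda ball: ball == "black")

# An impossible event: throwing a seven with a single die.
p_seven = probability(range(1, 7), lambda face: face == 7)

print(p_king, p_black, p_seven)
```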
The whole field of probability theory of finite sample spaces is based upon the following
three axioms:
a. The probability of an event ranges from zero to one. If the event cannot take place its
probability shall be zero and if it is certain, i.e., bound to occur, its probability shall be one.
c. If A and B are mutually exclusive (or disjoint) events then the probability of occurrence of
either A or B, denoted by P(A U B), shall be given by
P (A U B) = P (A) + P(B).
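Axiom (c) can be illustrated with one throw of a die. In the sketch below the events A and B are hypothetical examples chosen so that they have no outcome in common; the function name is also an assumption.

```python
from fractions import Fraction

die = {1, 2, 3, 4, 5, 6}   # equally likely outcomes of one throw


def prob(event):
    """Classical probability of an event given as a set of die faces."""
    return Fraction(len(event & die), len(die))


A = {1, 2}   # hypothetical event: the throw shows 1 or 2
B = {6}      # hypothetical event: the throw shows 6

print(A.isdisjoint(B))             # A and B are mutually exclusive
print(prob(A | B))                 # P(A U B)
print(prob(A) + prob(B))           # P(A) + P(B)
```

If A and B shared an outcome, the simple sum would count that outcome twice, which is why the axiom is stated only for mutually exclusive events.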