2nd Unit

The document discusses measures of central tendency, including definitions, objectives, types, and methods for calculating averages such as arithmetic mean, median, and mode. It outlines the requisites of a good average, merits and demerits of each measure, and provides examples for calculating these statistics. Additionally, it highlights the mathematical properties of these measures and their applications in statistical analysis.

Uploaded by

biologyinshorts

22-01-2025

Measures of Central Tendency

One of the most important objectives of statistical analysis is to get a single value
that describes the characteristics of the entire mass of data. Such a value is
called the central value, an average, or the expected value of the variable.

Definition:
According to Clark,
“Average is an attempt to find one single figure to describe whole of figures”
It is clear from the above definition that an average is a single value that
represents a group of values. Such a value is of great significance because it depicts the
characteristics of the whole group. Since an average represents the entire data, its value
lies somewhere between the two extremes, i.e., the largest and smallest items. For this
reason an average is frequently referred to as a measure of central tendency.

Objectives of Averaging:

1. To get one single value that describes the characteristics of the entire group.

Example: National Income / Population = Per Capita Income (PCI)

2. To facilitate comparisons.

Requisites of a Good Average:

1. It should be easy to understand.

2. It should be simple to compute.

3. It should be based on all the items.

4. It should not be affected by extreme observations.

5. It should be rigidly defined.

6. It should be capable of further algebraic treatment.

7. It should have sampling stability.


Types of Averages:

1. Arithmetic Mean

2. Median

3. Mode

4. Geometric Mean

5. Harmonic Mean

Arithmetic Mean:

Its value is obtained by adding together all the items and dividing this total by the number of items.

A.M may be either,

1) Simple Arithmetic Mean

2) Weighted Arithmetic Mean

Calculation of Simple Arithmetic Mean – Individual Observations:

The process of computing the mean in the case of individual observations (i.e., where
the frequencies are not given) is very simple. Add together the various values of the
variable and divide the total by the number of items.

Symbolically,
$$\bar{X} = \frac{X_1 + X_2 + X_3 + \dots + X_n}{N}$$
Or,
$$\bar{X} = \frac{\sum X}{N}$$
Here,
$\bar{X}$ = Arithmetic Mean
$\sum X$ = Sum of all the values of the variable X, i.e., $X_1 + X_2 + X_3 + \dots + X_n$
N = Number of observations

Steps:
1. Add together all the values of the variables X and obtain the total, i.e., ΣX
2. Divide the total by the number of observations i.e., N


Ex: The following table gives the monthly income of 10 families in a town.

Income (Rs): 780, 760, 690, 750, 540, 920, 1100, 810, 1050, 950.

Solution: Calculate the A.M.:

$$\bar{X} = \frac{\sum X}{N} = \frac{780 + 760 + 690 + 750 + 540 + 920 + 1100 + 810 + 1050 + 950}{10} = \frac{8350}{10}$$

$$\bar{X} = 835$$
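The two steps above (add the values, divide by their number) can be sketched in a few lines of Python. The function name `arithmetic_mean` is an illustrative choice, not something defined in these notes; the data are the ten incomes listed in the example.

```python
def arithmetic_mean(values):
    """Return the sum of the items divided by the number of items."""
    return sum(values) / len(values)

# Monthly incomes (Rs) of the 10 families, as listed in the example.
incomes = [780, 760, 690, 750, 540, 920, 1100, 810, 1050, 950]
mean_income = arithmetic_mean(incomes)
print(mean_income)
```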

Calculation of Arithmetic Mean – Discrete Series

$$\bar{X} = \frac{\sum fX}{N}$$

Where,

f = frequency

X = the variable in question

N = the total number of observations, i.e., $\sum f$

Steps:

1. Multiply the frequency of each row by the value of the variable and obtain the total $\sum fX$.

2. Divide the total obtained in step 1 by the number of observations, i.e., the total frequency.


Ex: From the following data of the marks obtained by 60 students of a class, calculate the A.M.

Marks (X)    No. of students (f)    fX
20           8                      160
30           12                     360
40           20                     800
50           10                     500
60           6                      360
70           4                      280
             N = 60                 ΣfX = 2460

$$\bar{X} = \frac{\sum fX}{N} = \frac{2460}{60} = 41$$
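As a rough check of the discrete-series formula, the sketch below multiplies each value by its frequency and divides by the total frequency. The helper name `discrete_mean` is an assumption made for illustration; the data are the marks table above.

```python
def discrete_mean(values, freqs):
    """Mean of a discrete series: sum(f * X) / sum(f)."""
    total_fx = sum(f * x for x, f in zip(values, freqs))
    return total_fx / sum(freqs)

marks = [20, 30, 40, 50, 60, 70]
students = [8, 12, 20, 10, 6, 4]
mean_marks = discrete_mean(marks, students)
print(mean_marks)  # 41.0
```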

Calculation of Arithmetic Mean- Continuous Series

Direct Method
$$\bar{X} = \frac{\sum fm}{N}$$

Where,

m = mid-point of each class,

f = the frequency of each class,

N = the total frequency.

Steps:

•Obtain the mid-point of each class and denote it by m.

•Multiply these mid-points by the respective frequency of each class and obtain the total Σfm.

•Divide the total so obtained by the sum of the frequencies, i.e., N.


Ex: From the following data compute the arithmetic mean.

Solution:

Marks (X)    No. of students (f)    Mid-point (m)    fm
0 - 10       5                      5                25
10 - 20      10                     15               150
20 - 30      25                     25               625
30 - 40      30                     35               1050
40 - 50      20                     45               900
50 - 60      10                     55               550
             N = 100                                 Σfm = 3300

$$\bar{X} = \frac{\sum fm}{N} = \frac{3300}{100} = 33$$
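The direct method for a continuous series can be sketched as follows. Classes are represented as (lower, upper) pairs, which is an assumption made for this illustration, and `continuous_mean` is a hypothetical helper name.

```python
def continuous_mean(classes, freqs):
    """Direct method: replace each class by its mid-point m, then sum(f*m)/sum(f)."""
    midpoints = [(lower + upper) / 2 for lower, upper in classes]
    total_fm = sum(f * m for m, f in zip(midpoints, freqs))
    return total_fm / sum(freqs)

classes = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50), (50, 60)]
freqs = [5, 10, 25, 30, 20, 10]
mean_marks = continuous_mean(classes, freqs)
print(mean_marks)  # 33.0
```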

Short-cut Method:
$$\bar{X} = A + \frac{\sum fd}{N} \times i$$
Where,
A = assumed mean
d = deviation of the mid-points from the assumed mean divided by the class interval, i.e., $d = \frac{m - A}{i}$
N = total number of observations
i = class interval
Steps:
1. Take any assumed mean A.
2. From the mid-point of each class deduct the assumed mean, divide the deviation by i, and
denote the result by d.
3. Multiply the respective frequency of each class by these d values and obtain the total Σfd.
4. Apply the formula:
$$\bar{X} = A + \frac{\sum fd}{N} \times i$$
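The short-cut method should give the same answer as the direct method. The sketch below applies it to the marks distribution used for the direct method (mid-points 5 to 55), with an assumed mean of 35 and a class interval of 10; both choices, and the name `shortcut_mean`, are illustrative assumptions.

```python
def shortcut_mean(midpoints, freqs, assumed_mean, interval):
    """Short-cut (step-deviation) method: A + (sum(f*d) / N) * i."""
    d = [(m - assumed_mean) / interval for m in midpoints]
    sum_fd = sum(f * di for f, di in zip(freqs, d))
    return assumed_mean + (sum_fd / sum(freqs)) * interval

midpoints = [5, 15, 25, 35, 45, 55]
freqs = [5, 10, 25, 30, 20, 10]
shortcut_result = shortcut_mean(midpoints, freqs, assumed_mean=35, interval=10)
print(shortcut_result)
```

Any other assumed mean gives the same result; a value near the middle of the distribution simply keeps the d values small.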


Calculation of arithmetic mean in case of open end classes

Open-end classes are those in which the lower limit of the first class and the upper limit of
the last class are not known. In such a case we cannot find the arithmetic mean unless we make
an assumption about the unknown limits. The assumption would naturally depend upon the
class intervals following the first class and preceding the last class.

Marks (X)       No. of students (f)
Below 10        4
10 - 20         6
20 - 30         10
30 - 40         18
40 - 50         8
50 and above    7

In the above case, since the class interval is uniform, the appropriate assumption would be that
the lower limit of the first class is zero and the upper limit of the last class is 60. The first
class would be 0-10 and the last 50-60.

Mathematical properties of arithmetic mean

1. The sum of the deviations of the items from the arithmetic mean, taking signs into account, is
always zero, i.e., $\sum (X - \bar{X}) = 0$. This would be clear from the following data.

X           (X - X̄)
10          -20
20          -10
30          0
40          10
50          20
ΣX = 150    Σ(X - X̄) = 0

$$\bar{X} = \frac{\sum X}{N} = \frac{150}{5} = 30$$

It is because of this property that the mean is characterized as a point of balance: the
sum of the positive deviations from it is equal to the sum of the negative deviations from it.


2. The sum of the squared deviations of the items from the arithmetic mean is minimum, i.e., it is
less than the sum of the squared deviations of the items from any other value. The following
example would clarify the point.

X          (X - X̄)         (X - X̄)²          (X - 3)    (X - 3)²
2          -2               4                  -1         1
3          -1               1                  0          0
4          0                0                  1          1
5          1                1                  2          4
6          2                4                  3          9
ΣX = 20    Σ(X - X̄) = 0    Σ(X - X̄)² = 10               Σ(X - 3)² = 15

Here $\bar{X} = 20/5 = 4$, and the sum of the squared deviations from the mean is equal to 10. If the
deviations are taken from any other value, the sum of the squared deviations would be greater than 10.
For example, let us calculate the squares of the deviations of the items from a value less than the
arithmetic mean, say 3. It is clear that Σ(X - 3)² = 15 is greater. This property, that the sum of the
squared deviations of the items is least when taken from the mean, is of immense use in regression analysis.
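Both properties can be checked numerically on the series 2, 3, 4, 5, 6 used above. This is only an illustrative check, not a proof; `sum_sq_dev` is a hypothetical helper name.

```python
def sum_sq_dev(values, point):
    """Sum of squared deviations of the values from a chosen point."""
    return sum((x - point) ** 2 for x in values)

values = [2, 3, 4, 5, 6]
mean = sum(values) / len(values)

# Property 1: deviations from the mean, signs included, add up to zero.
sum_dev = sum(x - mean for x in values)

# Property 2: squared deviations are smallest when taken from the mean.
ss_from_mean = sum_sq_dev(values, mean)
ss_from_three = sum_sq_dev(values, 3)
print(sum_dev, ss_from_mean, ss_from_three)
```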

3. Since $\bar{X} = \frac{\sum X}{N}$, it follows that $N\bar{X} = \sum X$.
In other words, if we replace each item in the series by the mean, then the sum of these
substitutions will be equal to the sum of the individual items.

4. If we have the arithmetic means and the number of items of two or more related groups,
we can compute the combined average of these groups by applying the following formula:

$$\bar{X}_{12} = \frac{N_1 \bar{X}_1 + N_2 \bar{X}_2}{N_1 + N_2}$$

$\bar{X}_{12}$ = combined mean of the two groups
$\bar{X}_1$ = arithmetic mean of the first group
$\bar{X}_2$ = arithmetic mean of the second group
$N_1$ = number of items in the first group
$N_2$ = number of items in the second group

5. If the given observations on X are changed to observations on Y = a + bX, then the mean changes in the same way, i.e., $\bar{Y} = a + b\bar{X}$.


Ex: The mean height of 25 male workers in a factory is 61 cm and the mean height of 35 female
workers in the same factory is 58 cm. Find the combined mean height of the 60 workers in the factory.

Solution:

$N_1 = 25$, $\bar{X}_1 = 61$, $N_2 = 35$, $\bar{X}_2 = 58$

$$\bar{X}_{12} = \frac{N_1 \bar{X}_1 + N_2 \bar{X}_2}{N_1 + N_2} = \frac{(25 \times 61) + (35 \times 58)}{25 + 35} = \frac{1525 + 2030}{60} = \frac{3555}{60}$$

$$\bar{X}_{12} = 59.25$$
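The combined-mean formula is a weighted average of the group means, with the group sizes as weights. A minimal sketch, using the figures from the example (the function name `combined_mean` is an illustrative choice), extends naturally to more than two groups:

```python
def combined_mean(sizes, means):
    """Combined mean: sum(N_k * mean_k) / sum(N_k) over all groups."""
    return sum(n * m for n, m in zip(sizes, means)) / sum(sizes)

# 25 male workers with mean 61, 35 female workers with mean 58.
combined = combined_mean([25, 35], [61, 58])
print(combined)  # 59.25
```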

Merits:

1. It is the simplest average to understand and easiest to compute.

2. It is based on all the items.

3. It is defined by a rigid mathematical formula.

4. It is capable of algebraic treatment.

5. It is relatively reliable.

6. The mean is typical in the sense that it is the centre of gravity.

7. It is a calculated value and is not based on position in the series.


Demerits:

1. It is affected by extreme observations (Ex: 60, 70, 90 and 100).

2. In a distribution with open-end classes the value of the mean cannot be computed.

3. The arithmetic mean is not always a good measure of central tendency.

4. In the case of a U-shaped distribution the mean is not likely to serve a useful purpose.

Median:

The median by definition refers to the middle value in a distribution. In the case of a median,
one half of the items in the distribution have a value the size of the median value or smaller,
and one half have a value the size of the median value or larger.

As distinct from the arithmetic mean, which is calculated from the value of every item in the
series, the median is what is called a positional average. The term position refers to the place
of a value in a series. The place of the median in a series is such that an equal number of items
lie on either side of it. Thus when N is odd, the median is an actual value, with the remainder of
the series in two equal parts on either side of it. If N is even, the median is a derived figure,
i.e., half the sum of the two middle values.


Calculation of Median – Individual Observation:

Steps:

I. Arrange the data in ascending or descending order of magnitude (both arrangements
should give the same answer).

II. In a group composed of an odd number of values, add 1 to the total number of
values and divide by 2; this gives the position of the median. For a group composed of an
even number of items, the median is estimated by finding the arithmetic mean of the two
middle values, that is, by adding the two values in the middle and dividing by two.

$$\text{Median} = \text{size of the } \left(\frac{N+1}{2}\right)\text{th item}$$
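The two steps can be sketched as below. The first data set reuses the ten family incomes from the arithmetic-mean example; the second, odd-sized list is purely hypothetical. The name `median_individual` is an illustrative choice.

```python
def median_individual(values):
    """Median of individual observations: sort, then take the middle item(s)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # N odd: the ((N + 1) / 2)th item is an actual value of the series.
        return ordered[mid]
    # N even: half the sum of the two middle values.
    return (ordered[mid - 1] + ordered[mid]) / 2

incomes = [780, 760, 690, 750, 540, 920, 1100, 810, 1050, 950]
median_income = median_individual(incomes)        # middle values are 780 and 810
median_odd = median_individual([7, 3, 9, 5, 11])  # hypothetical odd-sized series
print(median_income, median_odd)
```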


Calculation of Median –Discrete series

Steps:

I. Arrange the data in ascending or descending order of magnitude.

II. Find out the cumulative frequencies.

III. Apply the formula:

$$\text{Median} = \text{size of the } \left(\frac{N+1}{2}\right)\text{th item}$$

IV. Now look at the cumulative frequency column and find the total which is either
equal to (N+1)/2 or next higher than that, and determine the value of the variable
corresponding to it. That gives the value of the median.
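The cumulative-frequency rule in steps I to IV can be sketched as follows, using the marks distribution of 60 students from the arithmetic-mean section as sample data. The function name `median_discrete` is hypothetical, and the sketch follows the textbook rule literally (first cumulative frequency equal to or next higher than (N+1)/2).

```python
def median_discrete(values, freqs):
    """Median of a discrete series via cumulative frequencies."""
    pairs = sorted(zip(values, freqs))      # step I: ascending order
    n = sum(f for _, f in pairs)
    position = (n + 1) / 2                  # step III
    cumulative = 0
    for x, f in pairs:
        cumulative += f                     # step II: running cumulative frequency
        if cumulative >= position:          # step IV: equal to or next higher
            return x

marks = [20, 30, 40, 50, 60, 70]
students = [8, 12, 20, 10, 6, 4]
median_marks = median_discrete(marks, students)
print(median_marks)  # 40
```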


Calculation of Median – Continuous series

Steps:

I. Determine the particular class in which the value of the median lies. Use N/2 as the rank of
the median and not (N+1)/2, because in a continuous series it is N/2 that divides the area of the
curve into two equal parts. After ascertaining the class in which the median lies, the following
formula is used for determining the exact value of the median.

$$\text{Median} = L + \frac{\frac{N}{2} - c.f.}{f} \times i$$

L = lower limit of the median class, i.e., the class in which the middle item of the distribution lies

c.f. = cumulative frequency of the class preceding the median class, i.e., the sum of the
frequencies of all classes lower than the median class

f = simple frequency of the median class

i = the class interval of the median class
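A minimal sketch of the interpolation formula, applied to the marks distribution (classes 0-10 to 50-60) from the arithmetic-mean section. Classes are given as (lower, upper) pairs and `median_continuous` is a hypothetical name; the sketch assumes the classes are listed in ascending order.

```python
def median_continuous(classes, freqs):
    """Median = L + ((N/2 - c.f.) / f) * i for the class containing the N/2-th item."""
    half = sum(freqs) / 2
    cumulative = 0                          # c.f. of the classes before the current one
    for (lower, upper), f in zip(classes, freqs):
        if cumulative + f >= half:          # this is the median class
            return lower + (half - cumulative) / f * (upper - lower)
        cumulative += f

classes = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50), (50, 60)]
freqs = [5, 10, 25, 30, 20, 10]
median_marks = median_continuous(classes, freqs)
print(round(median_marks, 2))
```

Here N/2 = 50 falls in the class 30-40 (c.f. = 40, f = 30, i = 10), so the median is 30 + (10/30) × 10, roughly 33.33.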


Mathematical Properties of Median:

•The sum of the deviations of the items from the median, ignoring signs, is the least.

Merits:

1. It is useful in the case of open-end classes.

2. It is not influenced by extreme values.

3. It is useful in markedly skewed distributions.

4. It is the most appropriate average in dealing with qualitative data.

5. The value of the median can be determined graphically.


Demerits:

1. For calculating the median it is necessary to arrange the data.

2. Since it is a positional average, its value is not determined by each and every observation.

3. It is not capable of algebraic treatment.

4. The value of the median is affected more by sampling fluctuations than the value of the arithmetic mean.

5. The median, in some cases, cannot be computed.

6. It is erratic if the number of items is small.


Mode:
The mode or the modal value is that value in a series of observations which
occurs with the greatest frequency.

Ex: The mode of the series 3, 5, 8, 5, 4, 5, 9, 3 would be 5, since this value
occurs more frequently than any of the others.

The mode is often said to be the value which occurs most often, that is, with the
highest frequency.

Calculation of mode: (Individual observation)

For determining the mode, count the number of times the various values repeat themselves;
the value occurring the maximum number of times is the modal value.

Calculation of Mode –Discrete series

In a discrete series the mode can quite often be determined just by inspection, i.e., by looking
for that value of the variable around which the items are most heavily concentrated.

For example: Observe the following data,

Size of Garments         28   29   30   31   32   33
No. of persons wearing   10   20   40   65   50   15

For the above data we can clearly say that the modal size is 31, because the value 31 has
occurred the maximum number of times, i.e., 65.
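Counting repetitions, as described for individual observations, and picking the value with the largest frequency, as in the discrete series, can be sketched as follows. The helper `modes` is a hypothetical name; it returns a list so that a bi-modal series is reported honestly rather than forced to a single value.

```python
from collections import Counter

def modes(values):
    """Return every value that occurs with the greatest frequency."""
    counts = Counter(values)
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top)

# Individual observations: the series from the text.
series_modes = modes([3, 5, 8, 5, 4, 5, 9, 3])

# Discrete series: the garment sizes and their frequencies.
sizes = [28, 29, 30, 31, 32, 33]
persons = [10, 20, 40, 65, 50, 15]
modal_size = sizes[persons.index(max(persons))]
print(series_modes, modal_size)  # [5] 31
```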


Calculation of Mode –Continuous series


$$\text{Mode} = L + \frac{f_1 - f_0}{2f_1 - f_0 - f_2} \times i$$

Where,

L = lower limit of the modal class

$f_1$ = frequency of the modal class

$f_0$ = frequency of the class preceding the modal class

$f_2$ = frequency of the class succeeding the modal class

i = class interval of the modal class

While applying the above formula for calculating the mode, it is necessary to see that
the class intervals are uniform throughout. If they are unequal they should first be made
equal, on the assumption that the frequencies are equally distributed throughout each
class; otherwise we will get misleading results.
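A minimal sketch of the grouped-mode formula follows. It assumes uniform class intervals and a single modal class, as the text requires, and treats a missing neighbouring class as having frequency 0 (an assumption of this sketch). The data are the marks distribution from the arithmetic-mean section, and `mode_continuous` is a hypothetical name.

```python
def mode_continuous(classes, freqs):
    """Mode = L + (f1 - f0) / (2*f1 - f0 - f2) * i for the modal class."""
    k = freqs.index(max(freqs))                       # position of the modal class
    lower, upper = classes[k]
    f1 = freqs[k]
    f0 = freqs[k - 1] if k > 0 else 0                 # class preceding the modal class
    f2 = freqs[k + 1] if k + 1 < len(freqs) else 0    # class succeeding it
    return lower + (f1 - f0) / (2 * f1 - f0 - f2) * (upper - lower)

classes = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50), (50, 60)]
freqs = [5, 10, 25, 30, 20, 10]
modal_marks = mode_continuous(classes, freqs)
print(round(modal_marks, 2))
```

With the modal class 30-40 (f1 = 30, f0 = 25, f2 = 20) this gives 30 + (5/15) × 10, roughly 33.33.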

There may be two values which occur with equal frequency. The distribution is
then called bi-modal.

Where the mode is ill-defined, its value may be ascertained by the following
formula, based upon the relationship between mean, median, and mode.

Mode = 3 Median - 2 Mean

This is called the empirical method.

Usefulness:
The mode is useful in highly skewed or non-normal distributions.


Merits:

1. By definition the mode is the most typical or representative value of a distribution.

2. It is not affected by extreme items.

3. Its value can be determined in an open-end distribution without ascertaining the class limits.

4. It can be used to describe qualitative phenomena.

5. The value of the mode can be determined graphically.

Demerits:

1. The value of the mode cannot always be determined. In some cases we may have a bimodal series.

2. It is not capable of algebraic treatment.

3. The value of the mode is not based on each and every item of the series.

4. It is not rigidly defined.

Relationship among mean, median, mode:

A distribution in which the values of the mean, median and mode coincide (i.e., mean =
median = mode) is known as a symmetrical distribution. If the mean, median and mode are
not equal, it is known as an asymmetrical distribution. In a moderately asymmetrical
distribution the following empirical relationship holds approximately:

Mode = 3 Median - 2 Mean.


Geometric Mean:

Geometric Mean is defined as the Nth root of the product of N items or values.

Symbolically:

$$G.M. = \sqrt[N]{X_1 \times X_2 \times X_3 \times \dots \times X_n}$$

Where $X_1, X_2, X_3, \dots$ etc. refer to the various items of the series.

When the number of items is three or more, the task of multiplying the numbers and of extracting
the root becomes excessively difficult. To simplify calculations logarithms are used.

The geometric mean is then calculated as follows:

$$\log GM = \frac{\log X_1 + \log X_2 + \dots + \log X_n}{N} = \frac{\sum \log X}{N}$$

i.e.,

$$GM = \text{Antilog}\left(\frac{\sum \log X}{N}\right)$$

Calculation of Geometric Mean – Individual Observation:


$$GM = \text{Antilog}\left(\frac{\sum \log X}{N}\right)$$

Steps:
1. Take the logarithms of the variable X and obtain the total Σ log X.
2. Divide Σ log X by N and take the antilog of the value so obtained. This gives the
value of the geometric mean.

Example

The monthly income of ten families of a particular place is given below. Find the
geometric mean.

85 70 75 500 8 45 250 40 36


Calculation of Geometric Mean – Discrete Series:

$$GM = \text{Antilog}\left(\frac{\sum f \log X}{N}\right)$$

Steps:
1. Find the logarithms of the variable X.
2. Multiply these logarithms by the respective frequencies and obtain the total Σ f log X.
3. Divide Σ f log X by the total frequency and take the antilog of the value so
obtained.

Calculation of Geometric Mean – Continuous Series:

$$GM = \text{Antilog}\left(\frac{\sum f \log m}{N}\right)$$

Steps:

1. Find the mid-points of the classes and take their logarithms.

2. Multiply these logarithms by the respective frequencies of each class and obtain
the total Σ f log m.

3. Divide Σ f log m by the total frequency and take the antilog of the value so obtained.

Properties of Geometric Mean:

1. The product of the values of the series will remain unchanged when the value of the
geometric mean is substituted for each individual value.

For example: the geometric mean of the series 2, 4, 8 is 4; therefore we have

2*4*8 = 64 = 4*4*4
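The logarithm method (sum the logs, divide by N, take the antilog) and the product property can both be checked on the series 2, 4, 8. This is a sketch using base-10 logarithms, as a log table would; `geometric_mean` is an illustrative name, and the result is only approximately 4 because of floating-point rounding.

```python
import math

def geometric_mean(values):
    """GM = antilog(sum(log X) / N); valid only for positive values."""
    log_sum = sum(math.log10(x) for x in values)
    return 10 ** (log_sum / len(values))

series = [2, 4, 8]
gm = geometric_mean(series)

# Product property: replacing every item by the GM leaves the product unchanged.
product_original = math.prod(series)
product_substituted = gm ** len(series)
print(gm, product_original, product_substituted)
```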


2. The sum of the deviations of the logarithms of the original observations above and
below the logarithm of the geometric mean is equal. This also means that the value of the
geometric mean is such as to balance the ratio deviations of the observations from it.
Thus, using the same numbers as before, we find that (4/2) × (4/4) = 2 = (8/4).

Because of this property this measure of central value is especially
adapted to averaging ratios, rates of change, and logarithmically distributed series.

Uses of Geometric Mean:

1. The geometric mean is used to find the average percentage increase in production,
population and other economic or business series. For example, from 1986 to 1987 prices
increased by 5%, 10% and 18% respectively. The average annual increase is not
11% [(5+10+18)/3 = 11] as given by the arithmetic average, but 10.9% as obtained by the
geometric mean.

2. The geometric mean is theoretically considered to be the best average in the construction
of index numbers. It satisfies the time reversal test and gives equal weight to equal
ratios of change.

3. It is the average most suitable when large weights have to be given to small items
and small weights to large items, situations which we usually come across in
social and economic fields.
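The price example in use 1 can be reproduced by averaging the growth factors (1.05, 1.10, 1.18) geometrically rather than averaging the percentages arithmetically. This is a sketch under that interpretation; `average_growth_rate` is a hypothetical helper name.

```python
import math

def average_growth_rate(percent_changes):
    """Average percentage change via the geometric mean of the growth factors."""
    factors = [1 + p / 100 for p in percent_changes]
    gm_factor = math.prod(factors) ** (1 / len(factors))
    return (gm_factor - 1) * 100

changes = [5, 10, 18]
gm_rate = average_growth_rate(changes)
am_rate = sum(changes) / len(changes)
print(round(gm_rate, 1), am_rate)  # 10.9 11.0
```

Compounding the geometric rate three times reproduces the total price rise exactly, which the arithmetic rate of 11% would overstate.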

Limitations:

1. It is difficult to understand.

2. It is difficult to compute and to interpret and so has restricted applications.

3. It cannot be computed when there are both negative and positive values in a series, or when
one or more of the values is zero.


Harmonic Mean:

The Harmonic Mean is based on the reciprocals of the numbers averaged. It is
defined as the reciprocal of the arithmetic mean of the reciprocals of the individual
observations. Thus, by definition,

$$HM = \frac{N}{\frac{1}{X_1} + \frac{1}{X_2} + \frac{1}{X_3} + \dots + \frac{1}{X_n}}$$

In Individual Observations:

$$HM = \frac{N}{\sum \frac{1}{X}}$$

In Discrete Series:

In a discrete series, the harmonic mean is computed by applying the following formula:

$$HM = \frac{N}{\sum \left(f \times \frac{1}{X}\right)} = \frac{N}{\sum \frac{f}{X}}$$

Steps:

1. Take the reciprocals of the various items of the variable X.

2. Multiply the reciprocals by the respective frequencies and obtain the total
$\sum \left(f \times \frac{1}{X}\right)$.

3. Substitute the values of N and $\sum \left(f \times \frac{1}{X}\right)$ in the above formula.

Note:

Instead of first finding the reciprocals and then multiplying them by the
frequencies, it will be far easier to divide each frequency by the respective value of the
variable.


In Continuous Series:
$$HM = \frac{N}{\sum \left(f \times \frac{1}{m}\right)} = \frac{N}{\sum \frac{f}{m}}$$
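The harmonic-mean formulas differ only in the weights, so one small function can cover individual observations (all frequencies equal to 1) and frequency data, using the shortcut from the note of dividing each frequency by its value. The function name and the speed figures below are hypothetical, chosen only to illustrate an average-speed calculation.

```python
def harmonic_mean(values, freqs=None):
    """HM = N / sum(f / X); with no frequencies given, every f is taken as 1."""
    if freqs is None:
        freqs = [1] * len(values)
    return sum(freqs) / sum(f / x for x, f in zip(values, freqs))

# Hypothetical: equal distances covered at 40 km/h and at 60 km/h.
average_speed = harmonic_mean([40, 60])
print(round(average_speed, 2))
```

For a continuous series the same function can be called with the class mid-points m in place of X.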

Uses of Harmonic Mean:

The harmonic mean is restricted in its field of usefulness. It is useful for
computing the average rate of increase in the profits of a concern, the average
speed at which a journey has been performed, or the average price at which an article
has been sold. A rate usually indicates the relation between two different types of
measuring units that can be expressed reciprocally.

Merits:

•Its value is based on every item of the series.

•It lends itself to algebraic manipulation.

•In problems relating to time and rates it gives better results than other averages.

Demerits:

•It is not easily understood.

•It is difficult to compute.

•It gives the largest weights to the smallest items.

•Its value cannot be computed when there are both positive and negative items in a series
or when one or more items are zero.

Relationship among the averages:

AM ≥ GM ≥ HM (for any series of positive values; the three are equal only when all the values are equal)


Which Average to Use:

Median:

The median is generally the best average in open-end grouped distributions,
especially where, if plotted as a frequency curve, one gets a J or reverse J curve, for
example, in the case of price distributions or income distributions.

Mode:

It is used to describe qualitative data. The mode can be used in problems involving the
expression of preference, where quantitative measurements are not possible.

Geometric mean:

It is useful in averaging ratios and percentages and in computing average rates of
increase or decrease.

Harmonic mean:

It is useful in problems in which values of the variable are compared with a
constant quantity of another variable, i.e., rates, time, distance covered within a
certain time, quantity purchased or sold per unit, etc.

Arithmetic mean:

The AM should not be used in the following cases:

1. In highly skewed distributions.

2. In distributions with open-end intervals.

3. To average ratios and rates of change.

4. When the distribution is unevenly spread.


Measures of Dispersion

A measure of dispersion or variation in any data shows the extent to which
the numerical values tend to spread about an average. Dispersion is also known as
scatter, spread or variation; it measures the extent to which
the items vary from some central value.

Since measures of dispersion give an average of the differences of various items
from an average, they are also called averages of the second order.

Definition:

The study of the scatter or variation of values in a data set from any measure of
central tendency is called dispersion. In other words, measures which measure the
lack of uniformity are called measures of dispersion.

Objectives of measures of dispersion:

1. To judge the reliability of an average

2. To compare two or more series with respect to their variability

3. To serve as a basis for the control of variability

4. To enable the use of other statistical measures

Prerequisites of a good measure of dispersion:

1. It should be easy to understand and simple to calculate

2. It should be based on each and every item of the series

3. It should possess sampling stability

4. It should be capable of further mathematical treatment

5. It should not be affected by abnormal extreme values

6. It should be well defined by a mathematical formula

7. It should be easily calculated for an open-end distribution.


Absolute measures of dispersion:

Absolute measures of dispersion are expressed in the same statistical unit in
which the original data are given, such as Rs, kg, etc. These measures may be used to
compare the variation in two or more distributions, provided the data are expressed in the
same units.

Types of Absolute measures of dispersion:

1. Range
2. Quartile Deviation
3. Mean Deviation
4. Standard Deviation

Relative measures of dispersion:

A relative measure of variation is one which is independent (free) of the units of
measurement of the variable and is suitable for the comparative study of variation.

Types of Relative measures of dispersion:

1. Co-efficient of Range

2. Co-efficient of Quartile Deviation

3. Co-efficient of Mean Deviation

4. Co-efficient of Standard Deviation

Co-efficient of dispersion

A relative measure of variation is the ratio of a measure of absolute
dispersion to an appropriate average; it is called the co-efficient of dispersion.

Co-efficient:

It means a pure number that is independent of the unit of measurement.


Standard Deviation:

The concept of SD was first used by Karl Pearson in 1893.

Definition: It is defined as the positive square root of the mean of the squared deviations of the
given observations from their A.M.

In short, SD is the root mean square deviation from the mean. It is denoted by the Greek
letter σ (sigma).

Its relative measure is known as the co-efficient of standard deviation:

$$\text{C. of SD} = \frac{\sigma}{\bar{X}}$$

Co-efficient of variation:

The co-efficient of variation is a relative measure of dispersion.

$$C.V. = \frac{\sigma}{\bar{X}} \times 100$$

Relationship between Measures of Dispersion:

In a normal distribution there is a fixed relationship between the three
most commonly used measures of dispersion. The quartile deviation is the
smallest, the mean deviation is the next, and the standard deviation is the largest, in the
following proportion:

QD = 2/3 σ

MD = 4/5 σ


Range
Definition:
Range is the difference between the two extreme items, i.e., it is the
difference between the maximum value and the minimum value in a series.
Range = maximum value - minimum value
Range = L - S
$$\text{Coefficient of Range} = \frac{L - S}{L + S}$$
L = largest value or item
S = smallest value or item

A. Ungrouped Data
Find the range in the series
80 90 63 68 61 67 65 100 75 89 84 86 60
Solution:
Arrange the numbers in ascending order
60 61 63 65 67 68 75 80 84 86 89 90 100
Range = L - S
= 100 - 60
= 40

$$\text{Coefficient of Range} = \frac{L - S}{L + S} = \frac{100 - 60}{100 + 60} = \frac{40}{160} = 0.25$$
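For ungrouped data the range and its coefficient need only the largest and smallest items, so sorting is optional. A minimal sketch on the thirteen values listed in the example (`range_and_coefficient` is an illustrative name):

```python
def range_and_coefficient(values):
    """Return (L - S, (L - S) / (L + S)) for the largest L and smallest S."""
    largest, smallest = max(values), min(values)
    spread = largest - smallest
    return spread, spread / (largest + smallest)

data = [80, 90, 63, 68, 61, 67, 65, 100, 75, 89, 84, 86, 60]
value_range, coefficient = range_and_coefficient(data)
print(value_range, coefficient)  # 40 0.25
```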


B. Grouped Data
Calculate the range and coefficient of range for the following data.
In the case of grouped data, the range is the difference between the upper boundary of the
highest class and the lower boundary of the lowest class. The given frequencies are not
taken into consideration.

Marks      No. of Students
10-20      8
20-30      10
30-40      12
40-50      8
50-60      4

Range = L - S
= 60 - 10
= 50

$$\text{Coefficient of Range} = \frac{L - S}{L + S} = \frac{60 - 10}{60 + 10} = \frac{50}{70} = 0.714$$

• Merits
1. It is simple to compute and easy to understand.
2. It is time saving and widely used in industrial quality control.

• Demerits
1. It is not a precise measure.
2. It is not based on each and every item of the distribution.
3. Range cannot tell us anything about the character of the distribution within the two
extreme observations.
4. Range cannot be computed in the case of an open-end distribution.

• Uses Of Range
1. Quality control.
2. Fluctuations in share prices.
3. Weather forecasts.


The Mean Deviation

Mean deviation is also known as the average deviation. It is the average
difference between the items in a distribution and the median or mean of that series.
Theoretically, there is an advantage in taking the deviations from the median,
because the sum of the deviations of the items from the median is minimum when
signs are ignored.

However, in practice the arithmetic mean is more frequently used in
calculating the value of the average deviation, and this is the reason why it is more
commonly called mean deviation. In any case, the average used must be clearly
stated in a given problem so that any possible confusion in meaning is avoided.

Computing Of Mean Deviation

Individual observations:

$$MD = \frac{1}{N} \sum |X - A| = \frac{\sum |D|}{N}$$

where A is the average (median or mean) from which the deviations are taken and |D| = |X - A|.

$$\text{Coefficient of Mean Deviation} = \frac{MD}{\text{Median}}$$

(If the deviations are taken from the mean, MD is divided by the mean instead.)

Steps:

i. Compute the median of the series.
ii. Take the deviations of the items from the median, ignoring ± signs, and denote the
deviations by |D|.
iii. Obtain the total of these deviations, i.e., ∑|D|.
iv. Divide the total obtained in step (iii) by the number of observations to get the
mean deviation.


Calculate Mean Deviation

Solution:

Income group (X)    |D| = |X - A|
4000                400
4200                200
4400                0
4600                200
4800                400
                    ∑|D| = 1200

A = 4400 (the median)

$$MD = \frac{\sum |D|}{N} = \frac{1200}{5} = 240$$
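The four steps for individual observations can be sketched as follows, taking deviations from the median by default as in the worked example. `mean_deviation` is a hypothetical helper; passing a different `center` (for instance the mean) changes the average from which deviations are measured.

```python
import statistics

def mean_deviation(values, center=None):
    """Average of the absolute deviations |X - A|; A defaults to the median."""
    if center is None:
        center = statistics.median(values)
    return sum(abs(x - center) for x in values) / len(values)

incomes = [4000, 4200, 4400, 4600, 4800]
md = mean_deviation(incomes)
coefficient_md = md / statistics.median(incomes)
print(md, round(coefficient_md, 4))
```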

Calculate Mean Deviation In Discrete Series

$$MD = \frac{\sum f|D|}{N}$$

X     f     C.F.    |D| = |X - A|    f|D|
10    3     3       2                6
11    12    15      1                12
12    18    33      0                0
13    12    45      1                12
14    3     48      2                6
      N = 48                         ∑f|D| = 36

A = 12 (the median)

$$MD = \frac{\sum f|D|}{N} = \frac{36}{48} = 0.75$$


Calculate Mean Deviation In Continuous Series

$$MD = \frac{\sum f|D|}{N}$$

X f m
0-10 5 5
10-20 8 15
20-30 12 25
30-40 15 35
40-50 20 45
50-60 14 55
60-70 12 65
70-80 6 75
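The notes give only the class table for this case, so the sketch below shows one way the calculation could proceed. It assumes that deviations are taken from the median, that the median is found by the interpolation formula given earlier for continuous series, and that each class is represented by its mid-point m; the function names are hypothetical.

```python
def grouped_median(classes, freqs):
    """Median = L + ((N/2 - c.f.) / f) * i for the class containing the N/2-th item."""
    half = sum(freqs) / 2
    cumulative = 0
    for (lower, upper), f in zip(classes, freqs):
        if cumulative + f >= half:
            return lower + (half - cumulative) / f * (upper - lower)
        cumulative += f

def grouped_mean_deviation(classes, freqs):
    """MD = sum(f * |m - median|) / N, with m the class mid-points."""
    center = grouped_median(classes, freqs)
    midpoints = [(lower + upper) / 2 for lower, upper in classes]
    total = sum(f * abs(m - center) for m, f in zip(midpoints, freqs))
    return total / sum(freqs)

classes = [(0, 10), (10, 20), (20, 30), (30, 40),
           (40, 50), (50, 60), (60, 70), (70, 80)]
freqs = [5, 8, 12, 15, 20, 14, 12, 6]
median_value = grouped_median(classes, freqs)
md = grouped_mean_deviation(classes, freqs)
print(median_value, round(md, 2))
```

Under these assumptions the median class is 40-50 and the median works out to 43; taking deviations from the mean instead would give a somewhat different figure.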

Merits And Limitations Of Mean Deviation

Merits:
1. It is simple to understand and easy to compute.
2. It is based on each and every item of the data.
3. Mean deviation is less affected by the values of extreme items than standard deviation.
4. Since deviations are taken from a central value, comparisons about the formation of different
distributions can easily be made.
Limitations:
1. The greatest drawback of this method is that algebraic signs are ignored while taking the
deviations of the items. This is mathematically unsound and makes the method
non-algebraic.
2. This method may not give us very accurate results.
3. It is not capable of further algebraic treatment.
4. It is rarely used in sociological studies.
Usefulness of the Mean Deviation
This measure is useful for small samples where no elaborate analysis is required.


The Standard Deviation

• The standard deviation concept was introduced by Karl Pearson in 1893.
• Its significance lies in the fact that it is free from those defects from which the
earlier methods suffered and satisfies most of the properties of a good measure of
dispersion.
• Standard deviation is also known as root mean square deviation, because
it is the square root of the mean of the squared deviations from the arithmetic
mean.

Definition:
It is the square root of the quotient obtained by dividing the sum of the
squared deviations of the items from the arithmetic mean by the number of observations.
Standard deviation measures the average spread around the mean.

Formula for calculating standard deviation:

$$\sigma = \sqrt{\frac{\text{sum of squared deviations from the arithmetic mean}}{\text{number of observations}}}$$

or

$$\sigma = \sqrt{\frac{\sum x^2}{N}}, \quad \text{where } x = (X - \bar{X})$$

Differences Between Mean Deviation And Standard Deviation


1. Algebraic signs are ignored while calculating mean deviation, whereas in the
calculation of standard deviation signs are taken into account.
2. Mean deviation can be computed either from the median or from the mean; standard
deviation, on the other hand, is always computed from the arithmetic mean.


Calculation Of Standard Deviation In Individual Series


It may be calculated by applying either of the following two methods:
1. By taking deviations of the items from the actual mean.
2. By taking deviations of the items from an assumed mean.

1. Deviations Taken From Actual Mean

$$\sigma = \sqrt{\frac{\sum x^2}{N}}, \quad \text{where } x = (X - \bar{X})$$

Steps:
i. Calculate the actual mean of the series, i.e., $\bar{X}$.
ii. Take the deviations of the items from the mean, i.e., find $(X - \bar{X})$. Denote these
deviations by x.
iii. Square these deviations and obtain the total $\sum x^2$.
iv. Divide $\sum x^2$ by the total number of observations, i.e., N, and extract the square root.
This gives us the value of the standard deviation.

2. Deviations Taken From Assumed Mean:

When the actual mean is a fraction, deviations are taken from an
assumed mean.

$$\sigma = \sqrt{\frac{\sum d^2}{N} - \left(\frac{\sum d}{N}\right)^2}$$

Steps:
i. Take the deviations of the items from the assumed mean, i.e., obtain (X - A).
Denote these deviations by d. Take the total of these deviations, i.e., obtain
Σd.
ii. Square these deviations and obtain the total Σd².
iii. Substitute the values of Σd², Σd and N in the above formula.


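Find the standard deviation of the following data by the assumed mean method:

X           d = (X - A)    d²
120         20             400
60          -40            1600
80          -20            400
20          -80            6400
100         0              0
40          -60            3600
140         40             1600
ΣX = 560    Σd = -140      Σd² = 14000

Assumed mean A = 100

$$\sigma = \sqrt{\frac{\sum d^2}{N} - \left(\frac{\sum d}{N}\right)^2} = \sqrt{\frac{14000}{7} - \left(\frac{-140}{7}\right)^2} = \sqrt{2000 - 400} = \sqrt{1600}$$

$$\sigma = 40$$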
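The assumed-mean formula can be checked against the definition (deviations from the actual mean), since both must give the same standard deviation. The sketch below uses the seven values from the example with A = 100; `sd_assumed_mean` is an illustrative name, and `statistics.pstdev` is Python's population standard deviation.

```python
import math
import statistics

def sd_assumed_mean(values, assumed_mean):
    """sigma = sqrt(sum(d^2)/N - (sum(d)/N)^2), with d = X - A."""
    d = [x - assumed_mean for x in values]
    n = len(values)
    return math.sqrt(sum(di ** 2 for di in d) / n - (sum(d) / n) ** 2)

data = [120, 60, 80, 20, 100, 40, 140]
sigma = sd_assumed_mean(data, assumed_mean=100)
sigma_direct = statistics.pstdev(data)   # deviations from the actual mean (80)
print(sigma, sigma_direct)
```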

Calculating Standard Deviation In Discrete Series

For calculating standard deviation in discrete series any of the following Method
can be applied :
1. Actual mean method
2. Assumed mean method
3. Step deviation method

1. Actual Mean Method :


$$ \sigma = \sqrt{\frac{\sum f x^2}{N}}, \qquad \text{where } x = X - \bar{X},\ N = \sum f $$

2. Assumed Mean Method


$$ \sigma = \sqrt{\frac{\sum f d^2}{N} - \left(\frac{\sum f d}{N}\right)^2} $$

where d = (X − A) and N = Σf.


Steps :
i. Take the deviations of the items from the assumed mean and denote these
deviations by d.
ii. Multiply these deviations by the respective frequencies and obtain the total Σfd.
iii. Obtain the squares of the deviations, i.e. calculate d².
iv. Multiply the squared deviations by the respective frequencies and obtain the total
Σfd².
v. Substitute the values in the above formula.


Calculate the standard deviation from the following data (discrete series, assumed mean A = 6.5):

    X      f     d = X − A    d²     fd     fd²
   3.5     3       −3          9     −9     27
   4.5     7       −2          4    −14     28
   5.5    22       −1          1    −22     22
   6.5    60        0          0      0      0
   7.5    85        1          1     85     85
   8.5    32        2          4     64    128
   9.5     8        3          9     24     72
Total    217                        128    362

$$
\begin{aligned}
\sigma &= \sqrt{\frac{\sum f d^2}{N} - \left(\frac{\sum f d}{N}\right)^2} \\
       &= \sqrt{\frac{362}{217} - \left(\frac{128}{217}\right)^2} \\
       &= \sqrt{1.668 - 0.348} = \sqrt{1.320} \\
\sigma &\approx 1.149
\end{aligned}
$$


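Calculate the standard deviation from the following data by the step deviation
method (assumed mean A = 40, common factor i = 10):

    X      f    d = (X − A)/i    fd     fd²
   10      3        −3           −9     27
   20      7        −2          −14     28
   30      9        −1           −9      9
   40     23         0            0      0
   50     15         1           15     15
   60      8         2           16     32
   70      6         3           18     54
   80      4         4           16     64
Total     75                     33    229

$$
\begin{aligned}
\sigma &= \sqrt{\frac{\sum f d^2}{N} - \left(\frac{\sum f d}{N}\right)^2} \times i \\
       &= \sqrt{\frac{229}{75} - \left(\frac{33}{75}\right)^2} \times 10 \\
       &= \sqrt{3.053 - 0.194} \times 10 \\
       &= 1.691 \times 10 \\
\sigma &\approx 16.91
\end{aligned}
$$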

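Calculate the standard deviation from the following data (continuous series,
assumed mean A = 55, class interval i = 10; m is the mid-point of the class):

  Class     f     m    d = (m − A)/i    d²     fd     fd²
  20-30     30    25       −3            9    −90    270
  30-40     58    35       −2            4   −116    232
  40-50     62    45       −1            1    −62     62
  50-60     85    55        0            0      0      0
  60-70    112    65        1            1    112    112
  70-80     70    75        2            4    140    280
  80-90     57    85        3            9    171    513
  90-100    26    95        4           16    104    416
  Total    500                                259   1885

$$
\begin{aligned}
\sigma &= \sqrt{\frac{\sum f d^2}{N} - \left(\frac{\sum f d}{N}\right)^2} \times i \\
       &= \sqrt{\frac{1885}{500} - \left(\frac{259}{500}\right)^2} \times 10 \\
       &= \sqrt{3.77 - 0.268} \times 10 \\
       &= 1.871 \times 10 \\
\sigma &\approx 18.71
\end{aligned}
$$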
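
The step deviation calculation for a frequency distribution can be sketched in Python as follows. This is only an illustration of the formula used above, applied to the continuous series example; the function name and argument names are assumptions made for the sketch.

```python
import math

def sd_step_deviation(midpoints, freqs, assumed_mean, interval):
    """Standard deviation of a frequency distribution using
    step deviations d = (m - A) / i."""
    n = sum(freqs)
    d = [(m - assumed_mean) / interval for m in midpoints]
    sum_fd = sum(f * di for f, di in zip(freqs, d))
    sum_fd2 = sum(f * di * di for f, di in zip(freqs, d))
    return math.sqrt(sum_fd2 / n - (sum_fd / n) ** 2) * interval

midpoints = [25, 35, 45, 55, 65, 75, 85, 95]
freqs = [30, 58, 62, 85, 112, 70, 57, 26]
sigma = sd_step_deviation(midpoints, freqs, assumed_mean=55, interval=10)
print(round(sigma, 2))
```

Setting `interval=1` turns the same function into the plain assumed mean method for a discrete series.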


Mathematical Properties Of Standard Deviation:


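1. Combined standard deviation. The standard deviation of two groups taken
together can be obtained from the sizes, means and standard deviations of the
individual groups:

$$ \sigma_{12} = \sqrt{\frac{N_1\sigma_1^2 + N_2\sigma_2^2 + N_1 d_1^2 + N_2 d_2^2}{N_1 + N_2}} $$

σ12 = combined standard deviation
σ1 = standard deviation of first group
σ2 = standard deviation of second group
N1, N2 = number of items in the first and second group
d1 = X̄1 − X̄12 and d2 = X̄2 − X̄12, where X̄12 is the combined mean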

2. Standard deviation of the first N natural numbers. The standard deviation of the
first N natural numbers can be obtained by

$$ \sigma = \sqrt{\frac{1}{12}\left(N^2 - 1\right)} $$

3. The sum of the square of the deviation of items in the series from their
arithmetic mean is minimum. This is the reason why standard deviation is
always computed from the arithmetic mean.

4. For a symmetrical (normal) distribution the following relationships hold good:

mean ± 1σ covers 68.27% of the items
mean ± 2σ covers 95.45% of the items
mean ± 3σ covers 99.73% of the items
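
Property 2 can be checked numerically. The short Python sketch below compares the formula √((N² − 1)/12) with the standard deviation computed directly from the numbers 1 to N; the helper names are chosen only for this illustration.

```python
import math

def sd_population(values):
    """Standard deviation with divisor N, as used throughout these notes."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((v - mean) ** 2 for v in values) / n)

def sd_first_n_naturals(n):
    """Closed-form standard deviation of 1, 2, ..., n."""
    return math.sqrt((n * n - 1) / 12)

n = 10
direct = sd_population(list(range(1, n + 1)))
formula = sd_first_n_naturals(n)
print(direct, formula)
```

The two values agree for any N, since both describe the same set of numbers.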

Relation Between Measure Of Dispersion


In a normal distribution there is a fixed relationship between the three most
commonly used measures of dispersion. The quartile deviation is the smallest, the
mean deviation next and the standard deviation the largest, in the following
proportions:
Q.D. = 2/3 σ and M.D. = 4/5 σ


COEFFICIENT OF VARIATION

The standard deviation is an absolute measure of dispersion. The corresponding
relative measure is known as the coefficient of variation. This measure was developed by
Karl Pearson. It is used in problems where we want to compare the variability of
two or more series. The series (or group) for which the coefficient of
variation is greater is said to be more variable, or less consistent, less uniform, less
stable or less homogeneous. On the other hand, the series for which the coefficient of
variation is smaller is said to be less variable, or more consistent, more uniform, more
stable or more homogeneous.

The coefficient of variation is denoted by C.V. and is obtained as follows:

$$ \text{C.V.} = \frac{\sigma}{\bar{X}} \times 100 $$

C.V. expresses the standard deviation as a percentage of the mean.

Example: From the following table of marks obtained by A and B in 10 tests of 150 marks
each, find out who is more intelligent and who is more consistent.
A: 25 50 45 30 70 42 36 48 34 60
B: 10 70 50 20 95 55 42 60 48 80
Solution: In order to find out the more intelligent student between A and B we calculate
the average marks, and for finding out the more consistent student we compare the
coefficients of variation. Deviations are taken from assumed means 30 (for A) and 50 (for B).

     A      B    dA = A − 30    dA²    dB = B − 50    dB²
    25     10       −5           25       −40        1600
    50     70       20          400        20         400
    45     50       15          225         0           0
    30     20        0            0       −30         900
    70     95       40         1600        45        2025
    42     55       12          144         5          25
    36     42        6           36        −8          64
    48     60       18          324        10         100
    34     48        4           16        −2           4
    60     80       30          900        30         900
Total 440  530      140         3670        30        6018


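i) Average marks:

$$ \bar{X}_A = \frac{\sum X_A}{N} = \frac{440}{10} = 44, \qquad \bar{X}_B = \frac{\sum X_B}{N} = \frac{530}{10} = 53 $$

∴ Since the average score of student B is higher than that of A, student B is more
intelligent.

ii) Coefficient of variation:

$$
\begin{aligned}
\sigma_A &= \sqrt{\frac{\sum d_A^2}{N} - \left(\frac{\sum d_A}{N}\right)^2}
          = \sqrt{\frac{3670}{10} - \left(\frac{140}{10}\right)^2}
          = \sqrt{367 - 196} = \sqrt{171} \approx 13.08 \\
\text{C.V.}_A &= \frac{\sigma_A}{\bar{X}_A} \times 100 = \frac{13.08}{44} \times 100 \approx 29.72
\end{aligned}
$$

$$
\begin{aligned}
\sigma_B &= \sqrt{\frac{\sum d_B^2}{N} - \left(\frac{\sum d_B}{N}\right)^2}
          = \sqrt{\frac{6018}{10} - \left(\frac{30}{10}\right)^2}
          = \sqrt{601.8 - 9} = \sqrt{592.8} \approx 24.35 \\
\text{C.V.}_B &= \frac{\sigma_B}{\bar{X}_B} \times 100 = \frac{24.35}{53} \times 100 \approx 45.94
\end{aligned}
$$

∴ Since the coefficient of variation of A is smaller than that of B, student A is more
consistent.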
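
The comparison of the two students can be reproduced with a few lines of Python. The sketch below computes the mean, standard deviation and coefficient of variation directly from the marks; the function name is hypothetical and the divisor N is used, as in the hand calculation.

```python
import math

def coefficient_of_variation(values):
    """Return (mean, standard deviation, C.V. in percent) using divisor N."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return mean, sd, sd / mean * 100

marks_a = [25, 50, 45, 30, 70, 42, 36, 48, 34, 60]
marks_b = [10, 70, 50, 20, 95, 55, 42, 60, 48, 80]

mean_a, sd_a, cv_a = coefficient_of_variation(marks_a)
mean_b, sd_b, cv_b = coefficient_of_variation(marks_b)

print(round(mean_a, 2), round(sd_a, 2), round(cv_a, 2))
print(round(mean_b, 2), round(sd_b, 2), round(cv_b, 2))
# The series with the smaller C.V. is the more consistent one.
more_consistent = "A" if cv_a < cv_b else "B"
print(more_consistent)
```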


Variance
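Variance is the square of the standard deviation, i.e.

$$ \text{Variance} = \sigma^2 = \frac{\sum (X - \bar{X})^2}{N}, \qquad \sigma = \sqrt{\text{Variance}} $$

For a frequency distribution, using step deviations,

$$ \text{Variance} = \left[\frac{\sum f d^2}{N} - \left(\frac{\sum f d}{N}\right)^2\right] \times i^2, \qquad d = \frac{X - A}{i} $$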

Merits Of Standard Deviation :


1) It is the most reliable measure of dispersion. It is based on every item and it is rigidly
defined.
2) It is not as much affected by sampling fluctuations as other measures of dispersion.
3) The standard deviation is free from most of the defects of the other measures.
4) It is suitable for algebraic manipulation.
5) It is used for higher statistical work.


Correlation analysis
If two quantities vary in such a way that movements in one are accompanied by
movements in the other, these quantities are correlated. For example, there exists some
relationship between the price of a commodity and the amount demanded. A relationship between
two such sets of observations is called correlation. Correlation analysis refers to the
techniques used in measuring the relationship between the variables.

Thus correlation is a statistical device which helps us in understanding the co-variation
of two or more variables.

Types of correlation

1. Positive and negative correlation :

It depends upon the direction of change of the variables. If both the variables are
varying in the same direction, correlation is said to be positive. If, on the other hand, the
variables are varying in opposite directions, correlation is said to be negative.

2. Simple, partial and multiple correlation :

The distinction is based upon the number of variables studied. When only two variables are
studied it is a problem of simple correlation. When more than two variables are studied it is a
problem of either multiple or partial correlation.

In multiple correlation three or more variables are studied simultaneously. In partial
correlation, on the other hand, we recognize more than two variables but consider only two
variables to be influencing each other, the effect of the other influencing variables being kept constant.

3. Linear and non-linear (curvilinear) correlation :

The distinction is based upon the constancy of the ratio of change between the variables. If
the amount of change in one variable tends to bear a constant ratio to the amount of change in
the other variable, then the correlation is said to be linear.

Correlation would be called non-linear or curvilinear if the amount of change in one
variable does not bear a constant ratio to the amount of change in the other variable.

Methods of studying correlation :

1. Scatter diagram method.

2. Graphic method.

3. Karl Pearson's coefficient of correlation.

4. Concurrent deviation method.

5. Method of least squares.

1. Scatter diagram method :

The simplest device for ascertaining whether two variables are related is to prepare a
dot chart called a scatter diagram. When this method is used the given data are plotted on
graph paper in the form of dots, i.e., for each pair of X and Y values we put a dot and thus
obtain as many points as the number of observations. By looking at the scatter of the various
points we can form an idea as to whether the variables are related or not. The greater the
scatter of the plotted points on the chart, the weaker is the relationship between the two
variables. The more closely the points come to a straight line, the higher the degree of
relationship.

i. Perfect positive correlation :-

If all the points lie on a straight line rising from the lower left-hand corner to the upper
right-hand corner, correlation is said to be perfectly positive (i.e., r = +1).

ii. Perfect negative correlation :-

If all the points lie on a straight line falling from the upper left-hand corner to the lower
right-hand corner of the diagram, correlation is said to be perfectly negative (i.e., r = -1).


iii. High degree of positive correlation :-

If the plotted points fall in a narrow band there would be a high degree of correlation
between the variables. Correlation shall be positive if the points show a rising
tendency from the lower left-hand corner to the upper right-hand corner.

iv. High degree of negative correlation :-

Correlation shall be negative if the points show a declining tendency from the
upper left-hand corner to the lower right-hand corner of the diagram.

v. Low degree of positive correlation :-

If the points are widely scattered over the diagram it indicates very little relationship
between the variables. Correlation shall be positive if the points are rising from the lower
left-hand corner to the upper right-hand corner.

vi. Low degree of negative correlation :-

Correlation shall be negative if the points are running from the upper left-hand side to the
lower right-hand side of the diagram.


vii. No correlation :-

If the plotted points lie on a straight line parallel to the X-axis or are scattered in a
haphazard manner, it shows the absence of any relationship between the variables (i.e., r = 0).

Merits

a) It is a simple and non-mathematical method of studying correlation between the
variables. As such it can be easily understood and a rough idea can very quickly be
formed as to whether or not the variables are related.

b) It is not influenced by the size of extreme items.

c) It is the first step in investigating the relationship between two variables.

Limitations :-

By applying this method we can get an idea about the direction of the
correlation and also whether it is high or low. But we cannot establish the exact
degree of correlation between the variables, as is possible by applying the
mathematical methods.


2.Graphic method

Under this method the individual values of the two variables are plotted on
graph paper. We thus obtain two curves, one for the X variable and another for the Y
variable. By examining the direction and closeness of the two curves so drawn we can infer
whether or not the variables are related. If both the curves drawn on the graph are
moving in the same direction (either upward or downward) correlation is said to
be positive. On the other hand, if the curves are moving in opposite directions
correlation is said to be negative.

• From the following data ascertain whether the income and expenditure of 100 workers of a
factory are correlated.

Year    Average income (in Rs)    Average expenditure (in Rs)
1979            100                        90
1980            102                        91
1981            105                        93
1982            105                        95
1983            101                        92
1984            112                        94
1985            118                       100
1986            120                       105
1987            125                       108
1988            130                       110


3. Karl Pearson's coefficient of correlation.

The Pearson coefficient of correlation is denoted by the symbol r. The formulae for
computing the Pearsonian r are:

(i) When deviations of the items are taken from the actual means:

$$ r = \frac{\sum xy}{N \sigma_x \sigma_y} $$

Where,
x = (X – X̅) and y = (Y – Y̅),
σx = standard deviation of series X,
σy = standard deviation of series Y,
N = number of pairs of observations,
r = the correlation coefficient.
The value of r lies between +1 and -1, i.e., it cannot be greater than 1 or less than -1.
If r = +1 correlation is perfect and positive. If r = -1 correlation is perfect and
negative. If r = 0 there is no linear correlation between the variables. The above
formula for computing the Pearson coefficient of correlation can be transformed to the
following form, which is easier to apply:

$$ r = \frac{\sum xy}{\sqrt{\sum x^2 \, \sum y^2}} $$

(ii) Direct method of finding out correlation:

$$ r = \frac{N\sum XY - (\sum X)(\sum Y)}{\sqrt{N\sum X^2 - (\sum X)^2}\,\sqrt{N\sum Y^2 - (\sum Y)^2}} $$

(iii) When deviations are taken from an assumed mean:

$$ r = \frac{N\sum d_x d_y - (\sum d_x)(\sum d_y)}{\sqrt{N\sum d_x^2 - (\sum d_x)^2}\,\sqrt{N\sum d_y^2 - (\sum d_y)^2}} $$

(iv) Correlation of grouped data:

$$ r = \frac{N\sum f d_x d_y - (\sum f d_x)(\sum f d_y)}{\sqrt{N\sum f d_x^2 - (\sum f d_x)^2}\,\sqrt{N\sum f d_y^2 - (\sum f d_y)^2}} $$


Calculate Karl Pearson's coefficient of correlation for the following data:

    X      Y    x = X − X̅     x²    y = Y − Y̅     y²      xy
   48     45       14        196       10        100     140
   35     20        1          1      −15        225     −15
   17     40      −17        289        5         25     −85
   23     25      −11        121      −10        100     110
   47     45       13        169       10        100     130
Total 170  175      0        776        0        550     280

$$ \bar{X} = \frac{\sum X}{N} = \frac{170}{5} = 34, \qquad \bar{Y} = \frac{\sum Y}{N} = \frac{175}{5} = 35 $$

$$ r = \frac{\sum xy}{\sqrt{\sum x^2 \, \sum y^2}} = \frac{280}{\sqrt{776 \times 550}} = \frac{280}{653.30} \approx 0.429 $$

(ii) Direct method of finding out correlation


$$ r = \frac{N\sum XY - (\sum X)(\sum Y)}{\sqrt{N\sum X^2 - (\sum X)^2}\,\sqrt{N\sum Y^2 - (\sum Y)^2}} $$

Example: Calculate the correlation coefficient from the following data by the direct method.

    X      Y     X²     Y²     XY
    9     15     81    225    135
    8     16     64    256    128
    7     14     49    196     98
    6     13     36    169     78
    5     11     25    121     55
    4     12     16    144     48
    3     10      9    100     30
    2      8      4     64     16
    1      9      1     81      9
Total 45  108    285   1356    597

$$
\begin{aligned}
r &= \frac{9 \times 597 - 45 \times 108}{\sqrt{9 \times 285 - (45)^2}\,\sqrt{9 \times 1356 - (108)^2}} \\
  &= \frac{5373 - 4860}{\sqrt{2565 - 2025}\,\sqrt{12204 - 11664}} \\
  &= \frac{513}{\sqrt{540}\,\sqrt{540}} = \frac{513}{540} \\
r &= 0.95
\end{aligned}
$$
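
The direct method translates almost line by line into code. The following Python sketch computes the five totals of the table and substitutes them into the formula; the function name `pearson_r_direct` is an illustrative choice, not a library routine.

```python
import math

def pearson_r_direct(xs, ys):
    """Karl Pearson's r by the direct method (no deviations taken)."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    numerator = n * sum_xy - sum_x * sum_y
    denominator = (math.sqrt(n * sum_x2 - sum_x ** 2)
                   * math.sqrt(n * sum_y2 - sum_y ** 2))
    return numerator / denominator

xs = [9, 8, 7, 6, 5, 4, 3, 2, 1]
ys = [15, 16, 14, 13, 11, 12, 10, 8, 9]
r = pearson_r_direct(xs, ys)
print(round(r, 2))
```

For the nine pairs of the example this reproduces r = 513/540 = 0.95.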


(iii) When deviations are taken from an assumed mean :

$$ r = \frac{N\sum d_x d_y - (\sum d_x)(\sum d_y)}{\sqrt{N\sum d_x^2 - (\sum d_x)^2}\,\sqrt{N\sum d_y^2 - (\sum d_y)^2}} $$

Example :
Calculate Karl Pearson's coefficient of correlation between the values of X and Y for the
following data:
X :  78   89   96   69   59   79   68   61
Y : 125  137  156  112  107  136  123  108

    X    dx = X − 69    dx²      Y    dy = Y − 112    dy²     dxdy
   78        +9          81     125       +13         169     +117
   89       +20         400     137       +25         625     +500
   96       +27         729     156       +44        1936    +1188
   69         0           0     112         0           0        0
   59       −10         100     107        −5          25      +50
   79       +10         100     136       +24         576     +240
   68        −1           1     123       +11         121      −11
   61        −8          64     108        −4          16      +32
Total    Σdx = 47      1475            Σdy = 108     3468     2116

Here N = 8.

$$
\begin{aligned}
r &= \frac{8 \times 2116 - (47)(108)}{\sqrt{8 \times 1475 - (47)^2}\,\sqrt{8 \times 3468 - (108)^2}} \\
  &= \frac{16928 - 5076}{\sqrt{11800 - 2209}\,\sqrt{27744 - 11664}} \\
  &= \frac{11852}{\sqrt{9591}\,\sqrt{16080}} \\
  &= \frac{11852}{97.93 \times 126.81} = \frac{11852}{12418.7} \\
r &\approx 0.954
\end{aligned}
$$


Assumptions of the Pearson's coefficient

1) There is a linear relationship between the variables.

2) The two variables under study are affected by a large number of causes so as to form
a normal distribution.

3) There is a cause and effect relationship between the forces affecting the distribution
of the items in the two series.

Merits of the Pearson's coefficient

1) It is the most popular method.

2) It summarizes in one figure not only the degree of correlation but also the direction,
i.e., whether correlation is positive or negative.

Limitations of the Pearson's coefficient

1) The correlation coefficient always assumes a linear relationship, regardless of
whether that assumption is correct or not.

2) Great care must be taken in calculating and interpreting the value.

3) It is unduly affected by extreme items.

4) This method takes more time to compute the value of the correlation coefficient.


Properties of coefficient of correlation :


1. The coefficient of correlation lies between -1 and +1. Symbolically, -1 ≤ r ≤ +1.
2. The coefficient of correlation is independent of the change of scale and origin of
the variables X and Y.
By change of origin we mean subtracting some constant from every given
value of X and Y, and by change of scale we mean dividing or multiplying every value
of X and Y by some (positive) constant.
3. The coefficient of correlation is the geometric mean of the two regression coefficients.
Symbolically,

$$ r = \pm\sqrt{b_{xy} \times b_{yx}} $$

where r takes the sign of the regression coefficients.
4. The degree of relationship between the two variables is symmetric, as shown below:

$$ r_{xy} = \frac{\sum xy}{N \sigma_x \sigma_y} = \frac{\sum yx}{N \sigma_y \sigma_x} = r_{yx} $$

where σx and σy are the standard deviations of X and Y.
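
Property 2 explains why deviations from an assumed mean give the same r as the original values. A small Python sketch of this check follows; it uses the eight pairs of the assumed mean example (with Y = 137 for the second pair, as in its working table), and the helper `pearson_r` is a name made up for the illustration.

```python
import math

def pearson_r(xs, ys):
    """Karl Pearson's r from deviations about the actual means."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_x2 = sum((x - mean_x) ** 2 for x in xs)
    sum_y2 = sum((y - mean_y) ** 2 for y in ys)
    return sum_xy / math.sqrt(sum_x2 * sum_y2)

xs = [78, 89, 96, 69, 59, 79, 68, 61]
ys = [125, 137, 156, 112, 107, 136, 123, 108]

r_raw = pearson_r(xs, ys)
# Change of origin: subtract the assumed means 69 and 112.
r_shifted = pearson_r([x - 69 for x in xs], [y - 112 for y in ys])
# Change of scale: divide X by 10 and multiply Y by 2.
r_scaled = pearson_r([x / 10 for x in xs], [2 * y for y in ys])
print(round(r_raw, 3), round(r_shifted, 3), round(r_scaled, 3))
```

All three values coincide, which is what independence of origin and scale means in practice.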

Spearman’s coefficient Rank correlation :

Spearman's rank correlation coefficient is defined as

$$ R = 1 - \frac{6\sum D^2}{N(N^2 - 1)} $$

R = Rank correlation coefficient

D = Difference of ranks between paired items in the two series.

N = Total number of observations.

The value of this coefficient, interpreted in the same way as Karl Pearson's
correlation coefficient, ranges between +1 and -1. When R is +1 there is complete
agreement in the order of the ranks and the ranks are in the same direction. When R is -1
there is complete agreement in the order of the ranks and they are in opposite directions.


Example :

   R1    R2    D = R1 – R2    D²
    1     3       -2           4
    2     2        0           0
    3     1        2           4
                          ΣD² = 8

$$ R = 1 - \frac{6\sum D^2}{N(N^2 - 1)} = 1 - \frac{6 \times 8}{3(3^2 - 1)} = 1 - \frac{48}{24} = -1 $$

In rank correlation we may have two types of problems :

A) Where ranks are given.

B) Where ranks are not given.

A.Where ranks are given :

Where actual ranks are given to us, the steps required for computing rank correlation are :

(i) Take the difference of the two ranks, i.e., (R1 - R2), and denote these differences by D.

(ii) Square these differences and obtain the total ΣD².

(iii) Apply the formula

$$ R = 1 - \frac{6\sum D^2}{N^3 - N} $$


Example :
The rankings of 10 students in the subjects accounting and auditing are as follows:
Accounting : 3 5 8 4 7 10 2 1 6 9
Auditing   : 6 4 9 8 1 2 3 10 5 7

Solution:

   R1    R2    D = R1 – R2    D²
    3     6       -3           9
    5     4        1           1
    8     9       -1           1
    4     8       -4          16
    7     1        6          36
   10     2        8          64
    2     3       -1           1
    1    10       -9          81
    6     5        1           1
    9     7        2           4
                        ΣD² = 214

$$
\begin{aligned}
R &= 1 - \frac{6\sum D^2}{N^3 - N} = 1 - \frac{6 \times 214}{10^3 - 10} \\
  &= 1 - \frac{1284}{990} = -\frac{294}{990} \\
R &\approx -0.297
\end{aligned}
$$
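
When the ranks are already given, the whole calculation reduces to one sum. A minimal Python sketch, assuming the two lists hold the ranks of the same items in the same order (the function name is only illustrative):

```python
def spearman_from_ranks(ranks_1, ranks_2):
    """Spearman's R = 1 - 6*sum(D^2) / (N^3 - N) for untied ranks."""
    n = len(ranks_1)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(ranks_1, ranks_2))
    return 1 - 6 * sum_d2 / (n ** 3 - n)

accounting = [3, 5, 8, 4, 7, 10, 2, 1, 6, 9]
auditing = [6, 4, 9, 8, 1, 2, 3, 10, 5, 7]
R = spearman_from_ranks(accounting, auditing)
print(round(R, 3))
```

The small negative value indicates a weak tendency for a student ranked high in one subject to be ranked low in the other.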

B. Where ranks are not given :

When we are given the actual data and not the ranks, it will be necessary to assign
the ranks. Ranks can be assigned by taking either the highest value as 1 or the lowest value
as 1. But whether we start with the lowest value or the highest value, we must follow the
same method in the case of both the variables.

Example:

Quotations of index numbers of security prices of a certain joint stock
company are given below. Using the rank correlation method, determine the relationship
between debenture prices and share prices.

Year   Debenture price (x)   Rx   Share price (y)   Ry   D = (Rx – Ry)   D²
  1          97.8             3        73.2          1         2          4
  2          99.2             7        85.8          6         1          1
  3          98.8             6        78.9          4         2          4
  4          98.3             4        75.8          2         2          4
  5          98.4             5        77.2          3         2          4
  6          96.7             1        87.2          7        -6         36
  7          97.1             2        83.8          5        -3          9
                                                                    ΣD² = 62

$$
\begin{aligned}
R &= 1 - \frac{6\sum D^2}{N^3 - N} = 1 - \frac{6 \times 62}{7^3 - 7} \\
  &= 1 - \frac{372}{336} = -\frac{36}{336} \\
R &\approx -0.107
\end{aligned}
$$

Equal ranks

If two or more items are of equal value, they are assigned the average rank. An
adjustment is required for each group of equal ranks. The formula for calculating the
rank coefficient of correlation in the case of equal ranks is :

$$ r = 1 - \frac{6\left\{\sum D^2 + \frac{1}{12}(m^3 - m) + \frac{1}{12}(m^3 - m) + \cdots\right\}}{N^3 - N} $$

m stands for the number of items whose ranks are common.

If there is more than one such group of items with common ranks, the term
(1/12)(m³ − m) is added as many times as the number of such groups.


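Example:
Calculate the rank coefficient of correlation for the following data.

    X     Rx     Y     Ry    D = (Rx – Ry)     D²
   80     8     12     1         7            49
   78     7     13     2         5            25
   75     5.5   14     4         1.5           2.25
   75     5.5   14     4         1.5           2.25
   68     4     14     4         0             0
   67     3     16     7        -4            16
   60     2     15     6        -4            16
   59     1     17     8        -7            49
                                        ΣD² = 159.5

The value 75 occurs twice in X (m = 2) and the value 14 occurs three times in Y (m = 3).

$$
\begin{aligned}
r &= 1 - \frac{6\left\{\sum D^2 + \frac{1}{12}(m^3 - m) + \frac{1}{12}(m^3 - m)\right\}}{N^3 - N} \\
  &= 1 - \frac{6\left\{159.5 + \frac{1}{12}(2^3 - 2) + \frac{1}{12}(3^3 - 3)\right\}}{8^3 - 8} \\
  &= 1 - \frac{6\{159.5 + 0.5 + 2\}}{504} \\
  &= 1 - \frac{972}{504} = 1 - 1.929 \\
r &= -0.929
\end{aligned}
$$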
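
The two steps that are easy to get wrong by hand, assigning average ranks to tied values and adding one correction term per tied group, can be sketched in Python as below. The function names are hypothetical; ranks are assigned with the lowest value as 1, as in the table above.

```python
from collections import Counter

def average_ranks(values):
    """Rank values with 1 = smallest; tied values share the average rank."""
    positions = {}
    for position, value in enumerate(sorted(values), start=1):
        positions.setdefault(value, []).append(position)
    return [sum(positions[v]) / len(positions[v]) for v in values]

def spearman_with_ties(xs, ys):
    """Rank correlation with the (m^3 - m)/12 adjustment for each tied group."""
    n = len(xs)
    rx, ry = average_ranks(xs), average_ranks(ys)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    correction = 0.0
    for series in (xs, ys):
        for m in Counter(series).values():
            if m > 1:  # m items share a common rank
                correction += (m ** 3 - m) / 12
    return 1 - 6 * (sum_d2 + correction) / (n ** 3 - n)

xs = [80, 78, 75, 75, 68, 67, 60, 59]
ys = [12, 13, 14, 14, 14, 16, 15, 17]
r_ties = spearman_with_ties(xs, ys)
print(average_ranks(xs), average_ranks(ys), round(r_ties, 3))
```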

Merits

(1) This method is simpler to understand and easier to apply compared to Karl
Pearson's method.

(2) When the data are of a qualitative nature like honesty, efficiency etc., this method can
be used with great advantage.

(3) This is the only method that can be used where we are given the ranks and not the actual
data.

(4) Even where actual data are given, the rank method can be applied for ascertaining
correlation.

Limitations

1) This method cannot be used for finding out correlation in a grouped frequency distribution.

2) This method should not be applied where N exceeds 30, unless we are given the ranks and not
the actual values of the variables.


Regression analysis

Introduction

Regression analysis reveals the average relationship between two variables and this
makes possible estimation or prediction. The two-variable regression model assigns one of
the variables the status of an independent variable, and the other variable the status of a
dependent variable.

Uses of regression analysis :

1. Regression analysis provides estimates of values of the dependent variable from values
of the independent variable. The device used to accomplish this estimation procedure is
the regression line.
2. A second goal of regression analysis is to obtain a measure of the error involved in using
the regression line as a basis for estimation. For this purpose the standard error of estimate is
calculated.
3. With the help of the regression coefficients we can calculate the correlation coefficient.

Difference between correlation and regression analysis


I. Regression equations
Regression equations, also known as estimating equations, are algebraic expressions of the
regression lines.

Regression equation of Y on X
The regression equation of Y on X is used to describe the variation in the values of Y for
given changes in X.
It is expressed as
Yc = a + bX
Here a and b are constants, and the symbol Yc stands for the value of Y computed from the
relationship for a given X.
To determine the values of a and b the following normal equations are to be solved
simultaneously:
ΣY = Na + bΣX
ΣXY = aΣX + bΣX²


Regression equation of X on Y
The regression equation of X on Y is used to describe the variations in the values of X for
given changes in Y.
It is expressed as
Xc = a + bY
To determine the values of a and b the following normal equations are to be solved
simultaneously:
ΣX = Na + bΣY
ΣXY = aΣY + bΣY²

Deviations taken from arithmetic means of X and Y

Regression equation of X on Y

$$ X - \bar{X} = r\,\frac{\sigma_x}{\sigma_y}\,(Y - \bar{Y}) $$

X̅ is the mean of the X series,
Y̅ is the mean of the Y series,
σx and σy are the standard deviations of X and Y,
r σx / σy is known as the regression coefficient of X on Y.
The regression coefficient of X on Y is denoted by the symbol bxy:

$$ b_{xy} = r\,\frac{\sigma_x}{\sigma_y} = \frac{\sum xy}{\sum y^2}, \qquad x = X - \bar{X},\ y = Y - \bar{Y} $$

Regression equation of Y on X

$$ Y - \bar{Y} = r\,\frac{\sigma_y}{\sigma_x}\,(X - \bar{X}) $$

r σy / σx is known as the regression coefficient of Y on X.
The regression coefficient of Y on X is denoted by the symbol byx:

$$ b_{yx} = r\,\frac{\sigma_y}{\sigma_x} = \frac{\sum xy}{\sum x^2} $$


Example: From the following data calculate the regression equations:

    X     Y    x = X − X̅    x²    y = Y − Y̅    y²     xy
    6     9       0          0        1         1      0
    2    11      -4         16        3         9    -12
   10     5       4         16       -3         9    -12
    4     8      -2          4        0         0      0
    8     7       2          4       -1         1     -2
Total 30  40      0         40        0        20    -26

$$ \bar{X} = \frac{\sum X}{N} = \frac{30}{5} = 6, \qquad \bar{Y} = \frac{\sum Y}{N} = \frac{40}{5} = 8 $$

Regression equation of X on Y:

$$ X - \bar{X} = r\,\frac{\sigma_x}{\sigma_y}\,(Y - \bar{Y}), \qquad r\,\frac{\sigma_x}{\sigma_y} = \frac{\sum xy}{\sum y^2} = \frac{-26}{20} = -1.3 $$

Hence,
X – 6 = -1.3 (Y – 8)
X – 6 = -1.3Y + 10.4
X = 16.4 – 1.3Y

Regression equation of Y on X:

$$ Y - \bar{Y} = r\,\frac{\sigma_y}{\sigma_x}\,(X - \bar{X}), \qquad r\,\frac{\sigma_y}{\sigma_x} = \frac{\sum xy}{\sum x^2} = \frac{-26}{40} = -0.65 $$

Hence,
Y – 8 = -0.65 (X – 6)
Y – 8 = -0.65X + 3.9
Y = 11.9 – 0.65X
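
Both regression lines of the example can be obtained programmatically from the same three sums Σxy, Σx² and Σy². The Python sketch below is an illustration under that approach; the function and variable names are assumptions made for the sketch.

```python
import math

def regression_lines(xs, ys):
    """Return (a_yx, b_yx, a_xy, b_xy) for Y = a_yx + b_yx*X and X = a_xy + b_xy*Y."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_x2 = sum((x - mean_x) ** 2 for x in xs)
    sum_y2 = sum((y - mean_y) ** 2 for y in ys)
    b_yx = sum_xy / sum_x2          # regression coefficient of Y on X
    b_xy = sum_xy / sum_y2          # regression coefficient of X on Y
    a_yx = mean_y - b_yx * mean_x   # both lines pass through (mean_x, mean_y)
    a_xy = mean_x - b_xy * mean_y
    return a_yx, b_yx, a_xy, b_xy

xs = [6, 2, 10, 4, 8]
ys = [9, 11, 5, 8, 7]
a_yx, b_yx, a_xy, b_xy = regression_lines(xs, ys)
# r is the geometric mean of the two coefficients and carries their sign.
r = math.copysign(math.sqrt(b_yx * b_xy), b_yx)
print(a_yx, b_yx, a_xy, b_xy, round(r, 3))
```

This gives Y = 11.9 − 0.65X and X = 16.4 − 1.3Y, matching the hand calculation, and as a by-product a correlation coefficient of about −0.919.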

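Deviations taken from assumed means :

Regression equation of X on Y :

$$ X - \bar{X} = r\,\frac{\sigma_x}{\sigma_y}\,(Y - \bar{Y}) $$

The value of r σx / σy will now be obtained as follows:

$$ r\,\frac{\sigma_x}{\sigma_y} = \frac{\sum d_x d_y - \dfrac{\sum d_x \sum d_y}{N}}{\sum d_y^2 - \dfrac{(\sum d_y)^2}{N}} $$

dx = (X − A) and dy = (Y − A), where A is the assumed mean of the respective series.

Similarly, the regression equation of Y on X is

$$ Y - \bar{Y} = r\,\frac{\sigma_y}{\sigma_x}\,(X - \bar{X}), \qquad r\,\frac{\sigma_y}{\sigma_x} = \frac{\sum d_x d_y - \dfrac{\sum d_x \sum d_y}{N}}{\sum d_x^2 - \dfrac{(\sum d_x)^2}{N}} $$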

Example: From the following data calculate the regression equations, taking deviations
from assumed means 6 (for X) and 7 (for Y):

    X     Y    dx = X – 6    dx²    dy = Y – 7    dy²    dxdy
    6     9        0          0         2          4       0
    2    11       -4         16         4         16     -16
   10     5        4         16        -2          4      -8
    4     8       -2          4         1          1      -2
    8     7        2          4         0          0       0
Total 30  40       0         40         5         25     -26

$$ \bar{X} = \frac{\sum X}{N} = \frac{30}{5} = 6, \qquad \bar{Y} = \frac{\sum Y}{N} = \frac{40}{5} = 8 $$

Regression equation of Y on X:

$$ Y - \bar{Y} = b_{yx}(X - \bar{X}), \qquad
b_{yx} = \frac{\sum d_x d_y - \dfrac{\sum d_x \sum d_y}{N}}{\sum d_x^2 - \dfrac{(\sum d_x)^2}{N}}
       = \frac{-26 - \dfrac{(0)(5)}{5}}{40 - \dfrac{0}{5}} = \frac{-26}{40} = -0.65 $$

Regression equation of X on Y:

$$ X - \bar{X} = b_{xy}(Y - \bar{Y}), \qquad
b_{xy} = \frac{\sum d_x d_y - \dfrac{\sum d_x \sum d_y}{N}}{\sum d_y^2 - \dfrac{(\sum d_y)^2}{N}}
       = \frac{-26 - 0}{25 - \dfrac{(5)^2}{5}} = \frac{-26}{20} = -1.3 $$

Difference Between Correlation And Regression


1. The correlation coefficient measures the degree of covariability between X and Y. Regression
analysis studies the nature of the relationship between a dependent and an independent variable.

2. Correlation merely measures the degree of relationship; it does not establish a cause and
effect relationship. In regression analysis one variable is taken as the cause (independent
variable) and the other as the effect (dependent variable).

3. rxy is a measure of the direction and degree of the linear relationship between two variables
X and Y. rxy and ryx are symmetric (rxy = ryx), i.e., it is immaterial which of X and Y is the
dependent variable and which is the independent variable. In regression analysis the regression
coefficients bxy and byx are not symmetric, i.e., bxy ≠ byx, and hence it definitely makes a
difference as to which variable is dependent and which is independent.

4. There may be nonsense correlation between two variables, which is purely due to chance and has
no practical relevance. There is nothing like nonsense regression.

5. The correlation coefficient is independent of change of scale and origin. Regression coefficients
are independent of change of origin but not of scale. The coefficient of correlation ( r ) takes
the same sign as the regression coefficients ( bxy and byx ).


• The following points should be noted about the regression coefficients :

1. Both the regression coefficients will have the same sign, i.e., either both will be positive or
both negative. It is never possible that one of the regression coefficients is negative and the other
positive.

2. Since the value of the coefficient of correlation ( r ) cannot exceed one, one of the regression
coefficients must be less than one; in other words, both the regression coefficients cannot be
greater than one. For example, if bxy = 1.2 and byx = 1.4 the value of the correlation coefficient would
be √(1.2 × 1.4) = 1.296, which is not possible.

3. The coefficient of correlation will have the same sign as that of the regression coefficients, i.e., if
the regression coefficients have a negative sign, r will also be negative, and if the regression coefficients
have a positive sign r will also be positive. For example, if bxy = -0.8 and byx = -1.2, r
would be -√(0.8 × 1.2) = -0.98 and not +0.98.

4. Since bxy = r σx / σy (σx and σy being the standard deviations of X and Y), we can find any one of
the four values given the other three. For example, if we know that r = 0.6, σx = 0.4 and bxy = 0.8
we can find σy:
0.8 = 0.6 (0.4 / σy ) or σy = 0.24 / 0.8 = 0.3

5. Regression coefficients are independent of change of origin but not of scale. (Change of origin
means subtracting some constant from every value of X and Y; change of scale means dividing or
multiplying every value of X and Y by some constant.)


Analysis Of Time Series

Introduction

One of the most important tasks before economists and businessmen these days is
to make estimates for the future. For example, a businessman is interested in estimating
his likely sales, and an economist in estimating the likely population, in the years to come.

“A time series consists of statistical data which are collected, recorded or observed over
successive increments of time” -- Patterson

“A time series is a set of statistical observations arranged in chronological order”
-- Morris Hamburg

It is clear from the above definitions that a time series consists of data arranged
chronologically. Thus if we record the data relating to population, per capita income, prices,
production, etc. for the last 5, 10, 15, 20 years or some other time period, the series so emerging
would be called a time series.

The problem of time series analysis can be best appreciated with the help of the following
example. The following are the figures of sales of refrigerators of a firm in thousand units :

Year    Sales of firm A ('000 units)
1987            40
1988            42
1989            47
1990            41
1991            43
1992            48
1993            65
1994            42

The changes observed in such a series can be attributed to four kinds of movement:

1. Secular movements : Changes that have occurred as a result of the general tendency of the data to
increase or decrease.

2. Seasonal variations : Changes due to climate and season.

3. Cyclical variations : Changes due to booms and depressions.

4. Irregular variations : Erratic variations caused by events like floods, earthquakes, etc.

Utility of Time Series Analysis

1) It helps in understanding past behavior.

2) It helps in planning future operations.

3) It helps in evaluating current accomplishments.

4) It facilitates comparison.

Components of time series

1. Secular Trend :

Trend, also called secular or long-term trend, is the basic tendency of production,
sales, income, employment, etc., to grow or decline over a period of time. The
concept of trend does not include short-range oscillations but rather steady
movements over a long time.

Secular trend movements are attributable to factors such as population change,
technological progress and large-scale shifts in consumer tastes.

2. Seasonal variations :

Seasonal variations are the periodic movements in business activity which occur
regularly every year and have their origin in the nature of the year itself.

• The factors that cause seasonal variations are:

(i) Climate and weather conditions.

(ii) Customs, traditions and habits.


3. Cyclical variations :

The term ‘cycle’ refers to the recurrent variations in time series that usually last longer than
a year and are regular neither in amplitude nor in length. Cyclical fluctuations are long-term
movements that represent consistently recurring rises and declines in activity. A business
cycle consists of the recurrence of the up and down movements of business activity from
some sort of statistical trend or “normal”. By “normal” we mean some kind of statistical
average; we do not mean that there is anything very permanent or special about it. There are four
well defined periods or phases in the business cycle, namely :

(i) Prosperity,

(ii) Decline or recession,

(iii) Depression, and

(iv) Improvement or recovery.

4. Irregular variations :

Irregular variations, also called erratic, accidental or random variations, refer to such
variations in business activity as do not repeat in a definite pattern.

Irregular variations are caused by such isolated special occurrences as floods, earthquakes,
strikes and wars. Sudden changes in demand or very rapid technological progress may also
be included in this category.


• Measurement of trend :

The various methods that can be used for determining trend are :

(1) Freehand or graphic method.

(2) Semi-average method.

(3) Moving average method.

(4) Method of least squares.

Method of least squares

The trend line fitted by this method satisfies the following two conditions, where Yc denotes
the computed (trend) value of Y:
1) Σ(Y − Yc) = 0, i.e. the sum of the deviations of the actual values of Y from the computed values of Y is zero.
2) Σ(Y − Yc)² is least, i.e. the sum of the squares of the deviations of the actual values from the computed values is the least.

The straight line trend is represented by the equation:

Y = a + bX

In order to determine the values of the constants a and b the following two normal equations are to be
solved:
ΣY = Na + bΣX          (i)
ΣXY = aΣX + bΣX²       (ii)
When the deviations X are measured from the middle of the time period, ΣX = 0 and the equations reduce to
ΣY = Na                (i)
ΣXY = bΣX²             (ii)
The values of a and b can now be determined easily:
Since ΣY = Na, ∴ a = ΣY/N
Since ΣXY = bΣX², ∴ b = ΣXY/ΣX²


Ex : Below are the figures of production of a sugar factory (in thousand quintals).
Fit a straight line trend to these figures.

Year     Production Y        X      XY     X²    Trend value
        ('000 quintals)
1980          80            -3    -240     9        84
1981          90            -2    -180     4        86
1982          92            -1     -92     1        88
1983          83             0       0     0        90
1984          94             1      94     1        92
1985          99             2     198     4        94
1986          92             3     276     9        96
N = 7      ΣY = 630       ΣX = 0  ΣXY = 56  ΣX² = 28

The equation of the straight line is Y = a + bX. Since ΣX = 0,

$$ a = \frac{\sum Y}{N} = \frac{630}{7} = 90, \qquad b = \frac{\sum XY}{\sum X^2} = \frac{56}{28} = 2 $$

Hence the equation of the straight line trend is

Y = 90 + 2X      (origin 1983, X unit = 1 year)

For 1980  X = -3, Y = 90 + 2 (-3) = 84
          X = -2, Y = 90 + 2 (-2) = 86
          X = -1, Y = 90 + 2 (-1) = 88
          X = 0,  Y = 90 + 2 ( 0) = 90
          X = 1,  Y = 90 + 2 ( 1) = 92
          X = 2,  Y = 90 + 2 ( 2) = 94
          X = 3,  Y = 90 + 2 ( 3) = 96
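
For an odd number of years the fitting procedure is short enough to sketch in Python. The function below takes deviations from the middle year so that ΣX = 0, exactly as in the example; the names are illustrative and the sketch assumes consecutive yearly data.

```python
def fit_trend_line(years, values):
    """Least-squares straight line Y = a + b*X with X measured from the
    middle of the period, so that sum(X) = 0."""
    n = len(years)
    origin = sum(years) / n                 # middle of the time period
    xs = [year - origin for year in years]
    a = sum(values) / n                     # a = sum(Y) / N
    b = sum(x * y for x, y in zip(xs, values)) / sum(x * x for x in xs)
    return origin, a, b

years = list(range(1980, 1987))
production = [80, 90, 92, 83, 94, 99, 92]
origin, a, b = fit_trend_line(years, production)
trend = [a + b * (year - origin) for year in years]
print(origin, a, b)
print(trend)
```

The printed trend values are the last column of the table; a value for a later year is obtained by substituting its X in the same equation.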


Ex : Fit a straight line trend to the following series. Estimate the value for 1987.

Year    Production of steel    Deviation from 1983      XY     X²    Trend value
          (in tonnes) Y                 X
1980          60                       -3             -180     9      61.429
1981          72                       -2             -144     4      66.286
1982          75                       -1              -75     1      71.143
1983          65                        0                0     0      76.000
1984          80                        1               80     1      80.857
1985          85                        2              170     4      85.714
1986          95                        3              285     9      90.571
N = 7      ΣY = 532                  ΣX = 0        ΣXY = 136  ΣX² = 28

The equation of the straight line is Y = a + bX. Since ΣX = 0,

$$ a = \frac{\sum Y}{N} = \frac{532}{7} = 76, \qquad b = \frac{\sum XY}{\sum X^2} = \frac{136}{28} = 4.857 $$

Hence the equation of the straight line trend is

Y = 76 + 4.857X      (origin 1983, X unit = 1 year)

For 1980  X = -3, Y = 76 + 4.857 (-3) = 61.429
          X = -2, Y = 76 + 4.857 (-2) = 66.286
          X = -1, Y = 76 + 4.857 (-1) = 71.143
          X = 0,  Y = 76 + 4.857 ( 0) = 76.000
          X = 1,  Y = 76 + 4.857 ( 1) = 80.857
          X = 2,  Y = 76 + 4.857 ( 2) = 85.714
          X = 3,  Y = 76 + 4.857 ( 3) = 90.571

For 1987, X = 4 and the estimated value is Y = 76 + 4.857 (4) ≈ 95.43 tonnes.


Ex : Fit a straight line trend by the method of least squares to the following data. Assuming the same
rate of change continues, what would be the predicted earnings for the year 1988?

Year    Earnings Y     Deviation      X                   XY     X²
        (Rs. lakhs)    from 1982.5    (deviation × 2)
1979    38             -3.5           -7                 -266    49
1980    40             -2.5           -5                 -200    25
1981    65             -1.5           -3                 -195     9
1982    72             -0.5           -1                  -72     1
1983    69              0.5            1                   69     1
1984    60              1.5            3                  180     9
1985    87              2.5            5                  435    25
1986    95              3.5            7                  665    49
N = 8   ΣY = 526   ΣX = 0   ΣXY = 616   ΣX² = 168

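The equation of the straight line is

Y = a + bX

Since ΣX = 0,

a = ΣY / N = 526 / 8 = 65.75
b = ΣXY / ΣX² = 616 / 168 = 3.667

Hence the equation of the straight line trend is

Y = 65.75 + 3.667X

For 1988, X = 2 (1988 - 1982.5) = 11, so the predicted earnings are
Y = 65.75 + 3.667 (11) = 106.1 (Rs. lakhs, approx.)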
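
With an even number of years the origin falls between two years, so the deviations are doubled to keep X in whole numbers. The short Python sketch below illustrates this coding. It is a sketch under stated assumptions: the variable names are illustrative, and the 1982 earnings figure is taken as 72, the value consistent with the totals ΣY = 526 and ΣXY = 616.

```python
years = list(range(1979, 1987))
earnings = [38, 40, 65, 72, 69, 60, 87, 95]        # Rs. lakhs; 1982 taken as 72 (assumption)

origin = (years[0] + years[-1]) / 2                # 1982.5, midway between 1982 and 1983
x = [round(2 * (year - origin)) for year in years] # doubled deviations: -7, -5, ..., 5, 7

a = sum(earnings) / len(earnings)                  # a = ΣY / N
b = (sum(xi * yi for xi, yi in zip(x, earnings))
     / sum(xi * xi for xi in x))                   # b = ΣXY / ΣX²

x_1988 = round(2 * (1988 - origin))                # 11, in the same doubled units
predicted_1988 = a + b * x_1988
print(x)                                           # [-7, -5, -3, -1, 1, 3, 5, 7]
print(a, round(b, 3), round(predicted_1988, 1))    # 65.75 3.667 106.1
```

Because X is in half-year units, b here is the change per half year; the yearly change is 2b.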

1. Freehand or graphic method :

The procedure of obtaining a straight line trend by this method is given below.

i. Plot the time series on a graph.

ii. Examine carefully the direction of the trend based on the plotted points.

iii. Draw a straight line which best fits the data according to personal judgment.

Example : Fit a trend line to the following data by the freehand method.

Year    Production of steel
        (million tonnes)
1978    20
1979    22
1980    24
1981    21
1982    23
1983    25
1984    23
1985    26
1986    25


2. Method of semi averages :

Example : Fit a trend line to the following data by the method of semi averages.

Year    Sales of firm A ('000 units)
1980    102
1981    105
1982    114
1983    110
1984    108
1985    116
1986    112

• Solution :
With seven years of data the middle year (1983) is left out, and the averages of the first three
years and the last three years are taken:

(102 + 105 + 114) / 3 = 107
(108 + 116 + 112) / 3 = 112

These two points, 107 and 112, shall be plotted corresponding to their respective
middle years, i.e., 1981 and 1985, and joined to give the trend line.

[Figure: actual data and trend line]

Example : Fit a trend line by the method of semi-averages to the data given below.

Solution : Fitting of trend line by the method of semi-averages

Year    Profit (in '000 Rs)    Semi-total    Semi-average
1983    100
1984    120                    360           360 / 3 = 120
1985    140
1986    150
1987    130                    480           480 / 3 = 160
1988    200

Corresponding to 1984 we will take 120 and corresponding to 1987 we will take 160, and join
these two points. This will give the trend line.

[Figure: actual line and trend line; vertical axis: Profit; horizontal axis: years 1984 to 1988]


Even number of years :

When there is an even number of years, like 6, 8, 10, etc., two equal parts can
easily be formed and the average of each part obtained. However, when the
average is to be centred there would be some problem in case the number of
years is 8, 12, etc., because each part then itself contains an even number of years.
For example, if the data relate to 1984, 1985, 1986 and 1987,
which would be the middle year? In such a case the average will be centred
corresponding to 1st July 1985, i.e., the middle of 1985 and 1986. The following
example shall illustrate the point.

Example : Fit a trend line by the method of semi-averages to the data given below.

Year    Sales (Rs. lakhs)    Semi-total    Semi-average
1979    412
1980    438
                             1748          1748 / 4 = 437
1981    444
1982    454
1983    470
1984    482
                             1942          1942 / 4 = 485.5
1985    490
1986    500

Solution :

The average of the first four years is 437 and that of the last four years is 485.5. These
two points shall be taken corresponding to the middle periods, i.e., 1st July 1980 and 1st July
1984.

[Figure: actual line and trend line; vertical axis: Sales; horizontal axis: years 1980 to 1986]
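
The semi-average procedure for both an odd and an even number of years can be sketched in Python as follows. The function names semi_average_points and trend_value are hypothetical helpers introduced for illustration, and a time such as 1980.5 is simply the numerical mid-point of a half, corresponding to what the text calls 1st July 1980.

```python
def semi_average_points(years, values):
    """Return the two (time, average) points used by the method of semi-averages.

    With an odd number of years the middle year is left out of both halves.
    """
    half = len(values) // 2
    first = (sum(years[:half]) / half, sum(values[:half]) / half)
    second = (sum(years[-half:]) / half, sum(values[-half:]) / half)
    return first, second


def trend_value(year, first, second):
    """Read a value off the straight line joining the two semi-average points."""
    slope = (second[1] - first[1]) / (second[0] - first[0])
    return first[1] + slope * (year - first[0])


# Seven years (firm A): the middle year 1983 is dropped.
odd_points = semi_average_points(list(range(1980, 1987)),
                                 [102, 105, 114, 110, 108, 116, 112])
print(odd_points)                        # ((1981.0, 107.0), (1985.0, 112.0))

# Eight years: each half has four years, so each average falls between two years.
even_points = semi_average_points(list(range(1979, 1987)),
                                  [412, 438, 444, 454, 470, 482, 490, 500])
print(even_points)                       # ((1980.5, 437.0), (1984.5, 485.5))

print(trend_value(1983, *odd_points))    # 109.5
```

Joining the two points gives the trend line; trend_value reads off that line for any year, for instance 109.5 for firm A in 1983.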


3. Method of moving averages :

When a trend is to be determined by the method of moving averages, the average value
for a number of years is secured, and this average is taken as the normal or trend value for the
unit of time falling at the middle of the period covered in the calculation of the average. The
effect of averaging is to give a smoother curve, lessening the influence of the fluctuations that
pull the annual figures away from the general trend.

While applying this method, it is necessary to select a period for the moving average, such as
3-yearly moving averages, 5-yearly moving averages, etc.

The 3-yearly moving averages shall be computed as follows:

(a + b + c) / 3, (b + c + d) / 3, (c + d + e) / 3, (d + e + f) / 3, ...

And the 5-yearly moving averages:

(a + b + c + d + e) / 5, (b + c + d + e + f) / 5, (c + d + e + f + g) / 5, ...

Example : Calculate the 3-yearly moving averages of the production figures given below and draw the
trend line.

Year    Production     3-yearly    3-yearly moving
        (in tonnes)    total       average
1973    15             -           -
1974    21             66          22.00
1975    30             87          29.00
1976    36             108         36.00
1977    42             124         41.33
1978    46             138         46.00
1979    50             152         50.67
1980    56             169         56.33
1981    63             189         63.00
1982    70             207         69.00
1983    74             226         75.33
1984    82             246         82.00
1985    90             267         89.00
1986    95             287         95.67
1987    102            -           -

[Figure: actual line and trend line, years 1973 to 1986]
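
The 3-yearly moving averages in the table can be reproduced with a few lines of Python. This is a minimal sketch; the helper name moving_average is hypothetical, and it handles only the simple case of an odd period, where each average is placed against the middle year.

```python
def moving_average(values, period):
    """Simple moving averages of the given period.

    For an odd period, result[i] belongs to the year at index i + period // 2.
    """
    return [sum(values[i:i + period]) / period
            for i in range(len(values) - period + 1)]


production = [15, 21, 30, 36, 42, 46, 50, 56, 63, 70, 74, 82, 90, 95, 102]  # 1973 to 1987
ma3 = moving_average(production, 3)

# The first average is placed against 1974 and the last against 1986.
for year, average in zip(range(1974, 1987), ma3):
    print(year, round(average, 2))
```

The first and last years (1973 and 1987) get no trend value, which is why the table shows dashes there.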


• Even period of moving average :

If the moving average is an even-period moving average, say four-yearly or
six-yearly, the moving total and moving average, which are placed at the centre of
the time span from which they are computed, fall between two time periods. This
placement is inconvenient, since the moving average so placed would not coincide
with an original time period. We therefore synchronise the moving averages and the
original data. This process is called centering and always consists of taking a
two-period moving average of the moving averages.

Example : Estimate the trend values from the data given below by taking four-yearly moving averages.

Year    Values    4-yearly        4-yearly          4-yearly moving
                  moving total    moving average    average centred (trend)
1974    12        -               -                 -
1975    25        -               -                 -
1976    39        130             32.5              39.75
1977    54        188             47.0              54.75
1978    70        250             62.5              70.75
1979    87        316             79.0              84.75
1980    105       362             90.5              92.00
1981    100       374             93.5              90.75
1982    82        352             88.0              81.00
1983    65        296             74.0              65.75
1984    49        230             57.5              49.75
1985    34        168             42.0              34.75
1986    20        110             27.5              -
1987    7         -               -                 -

[Figure: actual line and trend line]
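
Centering can be sketched in Python as a two-step calculation: first the four-yearly moving averages, then a two-period moving average of those. The variable names are illustrative only. Note that the moving total 374 gives a moving average of 374 / 4 = 93.5, which is the value that reproduces the centred figures 92.00 and 90.75.

```python
values = [12, 25, 39, 54, 70, 87, 105, 100, 82, 65, 49, 34, 20, 7]   # 1974 to 1987
period = 4

# Step 1: four-yearly moving totals and averages (each falls between two years).
totals = [sum(values[i:i + period]) for i in range(len(values) - period + 1)]
averages = [total / period for total in totals]

# Step 2: centering, i.e. a two-period moving average of the moving averages.
centred = [(averages[i] + averages[i + 1]) / 2 for i in range(len(averages) - 1)]

# The first centred value belongs to the third year (1976), the last to 1985.
for year, trend in zip(range(1976, 1986), centred):
    print(year, trend)
```

Two years at each end of the series (1974, 1975 and 1986, 1987) are left without a centred trend value.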


4. Method of least squares :

The line of best fit obtained by this method satisfies the following two conditions.

(i) ∑(Y - Yc) = 0

i.e., the sum of the deviations of the actual values of Y from the computed values of Y
is zero.

(ii) ∑(Y - Yc)² is least

i.e., the sum of the squares of the deviations of the actual values from the computed values is least.
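
Both conditions can be checked numerically. The Python sketch below uses the sugar production figures and the fitted line Y = 90 + 2X from the earlier example; the helper name sum_of_squares and the small set of alternative lines tried are illustrative choices, not part of the method itself.

```python
production = [80, 90, 92, 83, 94, 99, 92]      # actual values Y
x = [-3, -2, -1, 0, 1, 2, 3]                   # deviations from 1983


def sum_of_squares(a, b):
    """Σ(Y - Yc)² for the line Yc = a + bX."""
    return sum((y - (a + b * xi)) ** 2 for xi, y in zip(x, production))


# Condition (i): the deviations from the fitted line add up to zero.
residual_sum = sum(y - (90 + 2 * xi) for xi, y in zip(x, production))
print(residual_sum)                            # 0

# Condition (ii): any other line tried here gives a larger sum of squares.
best = sum_of_squares(90, 2)
others = [sum_of_squares(90 + da, 2 + db)
          for da in (-1, 0, 1) for db in (-0.5, 0, 0.5)
          if (da, db) != (0, 0)]
print(best, min(others))                       # 142 149.0
```

Trying a handful of nearby lines does not prove minimality, but it shows the fitted line doing better than each alternative tested.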


Theory of probability

In mathematics and statistics we try to present conditions under which we
can make sensible numerical statements about uncertainty and apply certain
methods of calculating numerical values of probabilities and expectations. In the
statistical sense the term probability is thus established by definition and is not
connected with beliefs or any form of wishful thinking.

Probability defined :

The probability of a given event is an expression of the likelihood of occurrence of
that event.

A probability is a number which ranges from 0 (zero) to 1 (one): zero for an
event which cannot occur and 1 for an event which is certain to occur.

• Broadly speaking there are four different schools of thought on the concepts of probability,

i. Classical or a priori probability.

ii. Relative frequency theory of probability.

iii. Subjective approach to probability.

iv. Axiomatic approach to probability.

i. Classical or a priori probability :

The classical approach to probability is the oldest and simplest. It originated in the eighteenth
century in problems pertaining to games of chance, such as the throwing of coins or dice, or the
drawing of cards. The basic assumption underlying the classical theory is that the outcomes of a random
experiment are “equally likely”. The “event” whose probability is sought consists of one or more
possible outcomes of the given activity; for example, when a die is rolled once, any one of the six
possible outcomes, i.e., 1, 2, 3, 4, 5, 6, can occur. These activities are referred to in modern
terminology as “experiments”, a term that refers to processes which result in different
possible outcomes or observations. The term “equally likely”, though undefined, conveys the
notion that each outcome of an experiment has the same chance of appearing as any other. Thus
in a throw of a die the occurrences of 1, 2, 3, 4, 5, 6 are equally likely events.


• The definition of probability given by the French mathematician Laplace, and generally adopted
by disciples of the classical school, runs as follows: probability is the ratio of the number
of “favorable” cases to the total number of equally likely cases. If the probability of occurrence of
A is denoted by P(A), then by this definition we have

P(A) = Number of favorable cases / Total number of equally likely cases

For calculating probability we have to find out two things :

1. Number of favorable cases.

2. Total number of equally likely cases.

For example, if a coin is tossed there are two equally likely results, a head or a tail, hence
the probability of head is ½. Similarly, if a die is thrown, the probability of obtaining an even
number is 3/6 or ½, since, three of the six equally possible results are even numbers.

• Symbolically, if an event A can happen in ‘a’ ways out of a total of ‘n’ equally likely and
mutually exclusive ways, then the probability of occurrence of the event (called its success) is
denoted by p = Pr(A) = a/n, and the probability of non-occurrence of the event (called its
failure) is given by :

q = Pr(not A) = (n - a)/n = b/n, where b = n - a is the number of ways in which A fails to happen,

  = 1 - a/n = 1 - p = 1 - Pr(A).

Since the sum of the successful and unsuccessful outcomes is equal to the total number of
outcomes, we have

a + b = n

Dividing by n,

a/n + b/n = 1

so that p + q = 1.

Probability, therefore, may be written as a ratio. The numerator of the fraction
corresponding to this ratio represents the number of successful (or unsuccessful) outcomes
while the denominator represents the total number of possible outcomes.
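
The relation p + q = 1 can be illustrated with the die example, counting outcomes directly. This is a small Python sketch using exact fractions; the variable names follow the symbols a, b, n, p and q used above.

```python
from fractions import Fraction

outcomes = [1, 2, 3, 4, 5, 6]                       # equally likely results of one throw
favourable = [o for o in outcomes if o % 2 == 0]    # event A: an even number

n = len(outcomes)      # total number of equally likely cases
a = len(favourable)    # cases favourable to A
b = n - a              # cases in which A fails to happen

p = Fraction(a, n)     # probability of success
q = Fraction(b, n)     # probability of failure
print(p, q, p + q)     # 1/2 1/2 1
```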


The scale of probability extends from zero to unity (i.e., one). When p = 0 it denotes
impossibility of the event taking place, i.e., the event cannot take place. However, this is true only
when the number of possible outcomes is finite. For example, the probability of throwing
a seven with a single die is zero. On the other hand, when p = 1 it denotes certainty, i.e., the
event is bound to take place. In most cases in practical life the probability lies between these
two extremes, 0 and 1.

Example (1) : One card is drawn from a standard pack of 52. What is the probability that it
is a king?

Total number of cards in the pack, n = 52

Number of kings, a = 4

Probability of getting a king,

p(A) = Number of favorable cases / Number of equally likely cases = a/n

     = 4/52

     = 1/13

Example (2) : From a bag containing 10 black and 20 white balls, a ball is drawn at random. What
is the probability that it is a black ball?

Solution :

Total number of balls in the bag, n = 10 + 20 = 30

Number of black balls, a = 10

Probability of getting a black ball,

p(A) = Number of favorable cases / Number of equally likely cases = a/n

     = 10/30

     = 1/3
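
Both examples can be checked by listing the equally likely cases and counting the favourable ones. The Python sketch below does this; the rank and suit labels and the variable names are illustrative choices.

```python
from fractions import Fraction
from itertools import product

# Example (1): a standard pack of 52 cards, built as every (rank, suit) pair.
ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["spades", "hearts", "diamonds", "clubs"]
pack = list(product(ranks, suits))
kings = [card for card in pack if card[0] == "K"]
p_king = Fraction(len(kings), len(pack))
print(len(pack), len(kings), p_king)      # 52 4 1/13

# Example (2): a bag with 10 black and 20 white balls.
bag = ["black"] * 10 + ["white"] * 20
p_black = Fraction(bag.count("black"), len(bag))
print(len(bag), p_black)                  # 30 1/3
```

Fraction reduces 4/52 and 10/30 to their lowest terms automatically, matching the hand calculation.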


iv. Axiomatic approach to probability :

The whole field of probability theory for finite sample spaces is based upon the following
three axioms:

a. The probability of an event ranges from zero to one. If the event cannot take place its
probability shall be zero, and if it is certain, i.e., bound to occur, its probability shall be one.

b. The probability of the entire sample space S is 1, i.e., P(S) = 1.

c. If A and B are mutually exclusive (or disjoint) events, then the probability of occurrence of
either A or B, denoted by P(A ∪ B), shall be given by

P(A ∪ B) = P(A) + P(B).
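
The three axioms can be illustrated on the finite sample space of a single die, with equally likely outcomes assumed. In this Python sketch the function prob and the particular events A and B are hypothetical choices made for illustration.

```python
from fractions import Fraction

sample_space = frozenset(range(1, 7))      # S: the six faces of a die


def prob(event):
    """Probability of an event (a set of outcomes) under equally likely outcomes."""
    return Fraction(len(set(event) & sample_space), len(sample_space))


A = {1, 2}
B = {5, 6}                                 # A and B have no outcome in common

print(0 <= prob(A) <= 1)                   # axiom a: True
print(prob(sample_space))                  # axiom b: 1
print(prob(A | B), prob(A) + prob(B))      # axiom c: 2/3 2/3
print(prob({7}))                           # an impossible event: 0
```

The addition rule in axiom c relies on A and B being disjoint; for overlapping events the shared outcomes would be counted twice.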
