
Fy Bba Unit 1

The document discusses different types of data and scales of measurement used in statistics. It defines qualitative and quantitative data, as well as continuous and discrete variables. It also defines four levels of measurement scales: nominal, ordinal, interval and ratio scales. Examples are provided to explain each scale of measurement.

FYBBA SEM I

BASIC STATISTICS FOR


UNIT I DESCRIPTIVE STATISTICS

Introduction:
The views commonly held about statistics are numerous, but often incomplete. It has different meanings to different
people depending largely on its use. For example, (i) for a cricket fan, statistics refers to numerical information or data
relating to the runs scored by a cricketer; (ii) for an environmentalist, statistics refers to information on the quantity of
pollution released into the atmosphere by all types of vehicles in different cities; (iii) for the census department,
statistics consists of information about the birth rate per thousand and the sex ratio in different states; (iv) for a share
broker, statistics is the information on changes in share prices over a period of time; and so on.
Croxton and Cowden define statistics in the singular sense, that is, as statistical method. According to them, statistics may be defined as the science of collection, presentation, analysis and interpretation of numerical data.
This definition points out four stages of statistical investigation, to which one more stage, ‘organization of data’, rightly deserves to be added. Accordingly, statistics may be defined as the science of collecting, organizing, presenting, analyzing, and interpreting numerical data for making better decisions.

Collection → Organization → Presentation → Analysis → Interpretation

Statistical methods, broadly, fall into the following two categories:


(i) Descriptive statistics (ii) Inferential statistics
Descriptive statistics includes statistical methods involving the collection, presentation, and characterization of a set of
data in order to describe the various features of that set of data.
Inferential statistics includes statistical methods which facilitate estimating the characteristics of a population or making decisions concerning a population on the basis of sample results.

Types of data:
The collected data are of two types (i) Qualitative data (ii) Quantitative data
Qualitative Data:
Data classified according to some qualitative phenomenon that is not capable of quantitative measurement, such as honesty, beauty, employment, intelligence, occupation, sex or literacy, are termed qualitative data. The qualitative phenomena under study are known as Attributes.
For example,
(i) The population is divided into two classes according to the presence or absence of an attribute: male and female, honest and dishonest, employed and unemployed, beautiful and not beautiful.

(ii) The population is classified into more than two classes. For the attribute “Intelligence” the various classes may be, say, genius, very intelligent, average intelligent, below average and dull.

(iii) The population is classified by sex into two classes, males and females. Each of these is then classified according to smoking into smokers and non-smokers, and each of the resulting four classes is classified with respect to a third attribute, religion, into two classes, Hindu and non-Hindu.

Quantitative data:
Data classified on the basis of a phenomenon which is capable of quantitative measurement, such as age, height, weight, prices, production, income, expenditure, sales or profits, are called quantitative data. The quantitative phenomenon under study is known as a Variable.
Variables are of two kinds: (i) Continuous variable. (ii) Discrete variable (Discontinuous variable).
(i) Those variables which can take all the possible values (integral as well as fractional) in a given specified range are
termed as continuous variables.
For example, the age of students in a school (Nursery to Higher Secondary) is a continuous variable because age
can take all possible values (as it can be measured to the nearest fraction of time : years, months, days, minutes,
seconds, etc.), in a certain range, say, from 3 years to 20 years.
More precisely a variable is said to be continuous if it is capable of passing from any given value to the next value
by infinitely small gradations.
(ii) On the other hand, those variables which cannot take all the possible values within a given specified range are termed discrete (discontinuous) variables.
For example, family size (members in a family), the population of a city, the number of accidents on the road and the number of typing mistakes per page are discrete variables.

Scales of measurements:
The data are categorized using different scales of measurement. Each scale has specific properties that determine which statistical analyses can be used. Data can be classified as belonging to one of four scales of measurement:
1. Nominal Scale
2. Ordinal Scale
3. Interval Scale
4. Ratio Scale
1. Nominal Scale: 1st Level of Measurement
• Definition:
Nominal Scale, also called the categorical variable scale, is defined as a scale used for labeling variables into distinct classifications; it doesn’t involve a quantitative value or order. This scale is the simplest of the four variable measurement scales. Arithmetic calculations on these variables are meaningless, as the options have no numerical value.
The nominal scale is the most fundamental scale in quantitative research.
The sequence in which subgroups are listed makes no difference, as there is no relationship among subgroups. A nominal variable with only two categories (e.g. male/female) is called “dichotomous.”
• Nominal Scale Data and Analysis:
There are two primary ways in which nominal scale data can be collected:
(i) By asking an open-ended question, the answers to which can be coded to a number or label decided by the researcher.
(ii) By including a multiple-choice question in which the answers are labeled.
In both cases, the gathered data are analyzed using percentages or the mode, i.e., the most common answer received for the question. A single question can have more than one mode, as two equally common favorites can exist in a target population.
• Nominal Scale Examples:
Nominal scale is often used in research surveys and questionnaires where only variable labels hold significance.
(1) For instance, a customer survey asking, “Which brand of smart phones do you prefer?”
Options: “Apple”- 1 , “Samsung”-2, “OnePlus”-3.
In this survey question, only the names of the brands are significant for the researcher conducting
consumer research. There is no need for any specific order for these brands. However, while capturing
nominal data, researchers conduct analysis based on the associated labels.
In the above example, when a survey respondent selects Apple as their preferred brand, the data entered will be “1”. This helps in quantifying and answering the final question: how many respondents selected Apple, how many selected Samsung, how many went for OnePlus, and which count is the highest.
(2) What is your Gender?
Options: “Male”-1, “Female”-2
(3) What is your Political preference?
Options: 1- Independent, 2- Democrat, 3- Republican
(4) Where do you live?
Options: 1- Suburbs, 2- City, 3- Town
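The analysis described above (percentages and mode of coded nominal answers) can be sketched in a few lines of Python. The list of responses below is invented purely for illustration; only the brand codes 1, 2 and 3 come from the survey example.

```python
from collections import Counter

# Codes from the survey example: 1 = Apple, 2 = Samsung, 3 = OnePlus
labels = {1: "Apple", 2: "Samsung", 3: "OnePlus"}

# Hypothetical coded answers from ten respondents (not real survey data)
responses = [1, 2, 1, 3, 2, 1, 2, 3, 1, 2]

counts = Counter(responses)
total = len(responses)

# Percentage share of each brand
percentages = {labels[code]: 100 * n / total for code, n in counts.items()}

# Every brand tied for the highest count is a mode (there may be more than one)
highest = max(counts.values())
modes = sorted(labels[code] for code, n in counts.items() if n == highest)

print(percentages)
print(modes)
```

Note that the codes are only labels: the sketch counts them but never adds or averages them.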

2. Ordinal Scale: 2nd Level of Measurement


• Definition:
Ordinal Scale is defined as a categorical variable measurement scale used to depict the order of variables but not the difference between them. These scales are generally used to depict non-mathematical ideas such as frequency, satisfaction, happiness, degree of pain, etc. It is quite straightforward to remember the purpose of this scale, as ‘Ordinal’ sounds similar to ‘Order’.
Ordinal Scale maintains descriptional qualities along with an order but is void of an origin of scale, and thus the distance between variables can’t be calculated. Descriptional qualities indicate tagging properties similar to the nominal scale, in addition to which the ordinal scale also has a relative position of variables. The origin of this scale is absent, so there is no fixed start or “true zero”.
• Ordinal Data and Analysis:
Ordinal scale data can be presented in tabular or graphical formats so that the researcher can analyze the collected data conveniently. The gathered data are analyzed using percentages or the median.
• Ordinal Scale Examples:
Status at workplace, tournament team rankings, order of product quality, and order of agreement or satisfaction are
some of the most common examples of Ordinal Scale. These scales are generally used in market research to gather
and evaluate relative feedback about product satisfaction, changing perceptions with product upgrades etc.
For example, ordinal scale question such as:
How satisfied are you with our services?
Very Unsatisfied – 1
Unsatisfied – 2
Neutral – 3
Satisfied – 4
Very Satisfied – 5
1. Here, the order of variables is of prime importance and so is the labeling. Very unsatisfied will always be
worse than unsatisfied and satisfied will be worse than very satisfied.
2. This is where ordinal scale is a step above nominal scale – the order is relevant to the results and so is their
naming.
3. Analyzing results based on the order along with the name becomes a convenient process for the researcher.
4. If researchers intend to obtain more information than they would collect using a nominal scale, they can use an ordinal scale.
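As a small illustration of analyzing ordinal data with the median, the sketch below uses the five satisfaction codes from the question above. The responses are hypothetical, and `statistics.median_low` is chosen so that the result is always one of the actual categories.

```python
import statistics

# Codes from the satisfaction question above
scale = {
    1: "Very Unsatisfied",
    2: "Unsatisfied",
    3: "Neutral",
    4: "Satisfied",
    5: "Very Satisfied",
}

# Hypothetical coded answers from eleven respondents
responses = [4, 5, 3, 4, 2, 5, 4, 1, 3, 4, 5]

# median_low returns an actual data value, so it always maps to a category
median_code = statistics.median_low(responses)
median_label = scale[median_code]

print(median_code, median_label)
```

Only the order of the codes is used here; the gaps between neighbouring categories are not assumed to be equal.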

3. Interval Scale: 3rd Level of Measurement:


• Definition:
Interval Scale is defined as a numerical scale where the order of the variables is known as well as the difference
between these variables. Variables which have familiar, constant and computable differences are classified using the
Interval scale.
These scales are effective as they open doors for the statistical analysis of provided data. Mean, median or mode can be used to calculate the central tendency in this scale. The only drawback of this scale is that there is no pre-decided starting point or true zero value.
Interval scale contains all the properties of ordinal scale, in addition to which it allows the difference between variables to be calculated. The main characteristic of this scale is the equidistant difference between objects. For instance,
- 80 degrees is always higher than 50 degrees, and the difference between these two temperatures is the same as the difference between 70 degrees and 40 degrees.
- Also, the value of 0 is arbitrary because negative values of temperature do exist, which makes Celsius/Fahrenheit temperature a classic example of interval scale.
- Due to the absence of an absolute zero, one cannot tell by what ratio one temperature is higher or lower than another. For example, you cannot say that 40 degrees is twice as hot as 20 degrees or that -20 degrees is half as cold as -40 degrees.
Interval scale is often chosen in research where the difference between variables is required, which can’t be achieved using nominal or ordinal scale. Interval scale quantifies the difference between two variables, whereas the other two scales are only capable of associating qualitative values with variables.
The mean can be evaluated on an interval scale, unlike on the previous two scales.
In statistics, interval scale is frequently used because numerical values can not only be assigned to variables, but calculations on the basis of those values can also be carried out.
• Interval Data and Analysis
All the techniques applicable to nominal and ordinal data analysis are applicable to interval data as well. Apart from those techniques, there are a few analysis methods, such as descriptive statistics and correlation and regression analysis, which are used extensively for analyzing interval data.
Descriptive statistics is the term given to the analysis of numerical data which helps to describe, depict or summarize data in a meaningful manner; it includes the calculation of mean, median, and mode.
However useful interval scales are, they have no “true zero” value, which is why the next scale comes into the picture.
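A short numerical check may help show why differences are meaningful on an interval scale while ratios are not. The two temperatures (40 and 20 degrees Celsius) are assumed values chosen for illustration, and the standard Celsius to Fahrenheit conversion is used.

```python
def celsius_to_fahrenheit(celsius):
    """Standard conversion: F = C * 9/5 + 32."""
    return celsius * 9 / 5 + 32

# Two assumed temperatures in degrees Celsius
warm_c, cool_c = 40.0, 20.0
warm_f, cool_f = celsius_to_fahrenheit(warm_c), celsius_to_fahrenheit(cool_c)

# Differences carry over from one unit to the other (only the unit size changes)
diff_c = warm_c - cool_c
diff_f = warm_f - cool_f

# Ratios do not carry over, because the zero point of each scale is arbitrary
ratio_c = warm_c / cool_c
ratio_f = warm_f / cool_f

print(diff_c, diff_f)
print(ratio_c, round(ratio_f, 3))
```

The same pair of temperatures gives a ratio of 2 in Celsius but a different ratio in Fahrenheit, so "twice as hot" has no fixed meaning on this scale.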
4. Ratio Scale: 4th Level of Measurement:
• Definition:
Ratio Scale is defined as a variable measurement scale that not only produces the order of variables but also makes the
difference between variables known along with information on the value of true zero. It is calculated by assuming that
the variables have an option for zero, the difference between the two variables is the same and there is a specific order
between the options.
Ratio scale accommodates the characteristics of the three other variable measurement scales, i.e. labeling the variables, the significance of the order of variables and a calculable difference between variables (which are usually equidistant).
Because of the existence of a true zero value, the ratio scale doesn’t have negative values. To decide when to use a ratio scale, the researcher must check whether the variables have all the characteristics of an interval scale along with the presence of an absolute zero value.
• Ratio Data and Analysis:
With the option of true zero, varied inferential and descriptive analysis techniques can be applied to the variables. In
addition to the fact that the ratio scale does everything that a nominal, ordinal and interval scale can do, it can also
establish the value of absolute zero.
Ratio scale provides the most detailed information as researchers and statisticians can calculate the central tendency
using statistical techniques such as mean, median, mode and methods such as geometric mean, the coefficient of
variation or harmonic mean can also be used on this scale.
• Ratio Scale Examples:
Best examples of ratio scales are weight and height.
The following questions fall under the Ratio Scale category:
(i) What is your daughter’s current height?
    Less than 5 feet
    5 feet 1 inch – 5 feet 5 inches
    5 feet 6 inches – 6 feet
    More than 6 feet
(ii) What is your weight in kilograms?
    Less than 50 kilograms
    51 – 70 kilograms
    71 – 90 kilograms
    91 – 110 kilograms

Summary of Levels of Measurement

Offers:                                          Nominal   Ordinal   Interval   Ratio
The sequence of variables is established         –         Yes       Yes        Yes
Mode                                             Yes       Yes       Yes        Yes
Median                                           –         Yes       Yes        Yes
Mean                                             –         –         Yes        Yes
Difference between variables can be evaluated    –         –         Yes        Yes
Addition and Subtraction of variables            –         –         Yes        Yes
Multiplication and Division of variables         –         –         –          Yes
Absolute zero                                    –         –         –          Yes

Frequency distribution:
The organization of the data pertaining to a quantitative phenomenon involves the following four stages:
(1) The set or series of individual observations - unorganized (raw) or organized (arrayed) data.
(2) Discrete or ungrouped frequency distribution.
(3) Grouped frequency distribution.
(4) Continuous frequency distribution

(1) Array. A better presentation of the above raw data would be to arrange them in an ascending or descending order
of magnitude which is called the ‘arraying’ of the data. However, this presentation (arraying), though better than
the raw data does not reduce the volume of the data.

(2) Discrete or ungrouped frequency distribution: A much better way of representing the data is to express it in the form of a discrete or ungrouped frequency distribution, in which we count the number of times each value of the variable occurs in the data. This is facilitated through the technique of tally bars. If the variable takes values in a wide (large) range, then the data still remain unwieldy and need further processing for statistical analysis.
Example
The following data show the total number of overtime hours worked for 30 consecutive weeks by machinists in a machine shop. The data are displayed in raw form:
91 89 88 89 90 92 93 88 87 85 88 93 91 93 91
93 92 88 92 90 93 84 93 84 91 93 85 91 89 92
Represent the above information by appropriate frequency distribution.
Solution:
Variable (X): Number of overtime hours per week
Frequency (𝑓): Number of weeks, N= no. of weeks = 30
Maximum observation: 93, Minimum observation: 84

Ungrouped frequency distribution of overtime (in hrs) by machinists in a machine shop

X        Tally Bars     𝒇
84       ||             2
85       ||             2
86                      0
87       |              1
88       ||||           4
89       |||            3
90       ||             2
91       |||||          5
92       ||||           4
93       ||||| ||       7
Total                   N = 30
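The tally procedure used in this solution can be reproduced with a short Python sketch. It uses the 30 overtime observations from the example; the variable names are illustrative only.

```python
from collections import Counter

# Overtime hours for 30 consecutive weeks (raw data from the example)
hours = [
    91, 89, 88, 89, 90, 92, 93, 88, 87, 85, 88, 93, 91, 93, 91,
    93, 92, 88, 92, 90, 93, 84, 93, 84, 91, 93, 85, 91, 89, 92,
]

# Count how many times each value occurs
frequency = Counter(hours)

# Print one row per value from the minimum to the maximum observation,
# including values that never occur (frequency 0)
for x in range(min(hours), max(hours) + 1):
    print(f"{x:>3}  {'|' * frequency[x]:<8} {frequency[x]}")

print("Total", sum(frequency.values()))
```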

(3) Grouped frequency distribution: If the identity of the units about which particular information is collected is not relevant, nor is the order in which the observations occur, then the first real step is to classify the data into different classes (or class intervals) by dividing the entire range of the values of the variable into a suitable number of groups, and then to record the number of observations in each group. The various groups into which the values of the variable are classified are known as classes or class intervals; the length of a class interval is called the width of the class. The two values specifying a class are called the class limits; the larger value is called the upper class limit and the smaller value is called the lower class limit. Here the classes are of inclusive form, so that both the upper and the lower limit are included in the respective class. This type of class is generally used for a discrete variable.
Example
A computer company received a rush order for as many home computers as could be shipped during a six-week
period. Company records provide the following daily shipments:
22 65 65 67 55 50 65 77 73 30 62 54 48 65 79 60 63 45 51 68 79
83 33 41 49 28 55 61 65 75 55 75 39 87 45 50 66 65 59 25 35 53
Represent the above information by appropriate frequency distribution.
Solution:
Variable: Number of computers shipped per day (discrete variable, make inclusive classes)
Frequency: Number of days
Maximum observation: 87; Minimum observation: 22

N = total no. of days during six weeks = 42
Number of classes = k = 6, as $2^5 = 32 < 42$ and $2^6 = 64 > 42$
Class interval = $\dfrac{87 - 22}{6} \approx 11$
Grouped frequency distribution of computers shipped per day during the six-week period in the computer company

Classes   Tally marks          No. of days
22-32     ||||                 4
33-43     ||||                 4
44-54     ||||| ||||           9
55-65     ||||| ||||| ||||     14
66-76     ||||| |              6
77-87     |||||                5
Total                          42
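The steps of this solution (choose k, compute the class width, build inclusive classes, count) can be sketched in Python as follows. The rule for k is written here as "the smallest k with 2^k greater than N", which is how the worked solution arrives at k = 6; rounding the width to the nearest whole number is likewise an assumption matching the value 11 used above.

```python
# Daily shipments over the six-week period (raw data from the example)
shipments = [
    22, 65, 65, 67, 55, 50, 65, 77, 73, 30, 62, 54, 48, 65, 79, 60, 63, 45, 51, 68, 79,
    83, 33, 41, 49, 28, 55, 61, 65, 75, 55, 75, 39, 87, 45, 50, 66, 65, 59, 25, 35, 53,
]

n = len(shipments)

# Smallest k such that 2**k exceeds the number of observations
k = 1
while 2 ** k <= n:
    k += 1

# Class width: range divided by the number of classes, rounded to a whole number
width = round((max(shipments) - min(shipments)) / k)

# Inclusive classes: both limits belong to the class
lowest = min(shipments)
classes = [(lowest + i * width, lowest + (i + 1) * width - 1) for i in range(k)]

frequencies = [sum(lower <= x <= upper for x in shipments) for lower, upper in classes]

for (lower, upper), f in zip(classes, frequencies):
    print(f"{lower}-{upper}: {f}")
```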

(4) Continuous frequency distribution: While dealing with a continuous variable, it is not desirable to present the data in a grouped frequency distribution with classes like 0-9, 10-19, 20-29, etc., because this classification does not take into consideration observations between 9 and 10, between 19 and 20, and so on. In such a situation one should form continuous class intervals like 0-10, 10-20, 20-30, etc. The presentation of the data in continuous classes with corresponding frequencies is known as a continuous frequency distribution.
Example
The following data represent the annual family expenses (in thousands of rupees) on food items in a city.
13.8 14.1 14.7 15.2 16.8 15.6 14.9 16.7 19.2 14.9 14.9 14.9 15.2 15.9
15.2 14.8 14.8 19.1 14.6 18.0 14.9 14.2 14.1 15.3 15.5 18.0 17.2 17.2
14.1 14.5 18.0 14.4 14.2 14.6 14.2 14.8
Represent the above information by appropriate frequency distribution.
Solution:
Variable(X): Annual family expense in thousands of rupees
(Continuous Variable, so make exclusive classes)
Frequency (𝑓): Number of families
N = total no. of families = 36, Minimum Observation: 13.8; Maximum Observation: 19.2.
Number of classes = k = 6, as $2^5 = 32 < 36$ and $2^6 = 64 > 36$
Class interval = $\dfrac{19.2 - 13.8}{6} \approx 1$

Grouped frequency distribution of annual expenses (in ’000 Rs) on food items in a city

Classes (X)   Tally Bars           𝒇
13.8-14.8     ||||| ||||| ||       12
14.8-15.8     ||||| ||||| ||||     14
15.8-16.8     ||                   2
16.8-17.8     |||                  3
17.8-18.8     |||                  3
18.8-19.8     ||                   2
Total                              36
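The exclusive classes of this solution can be checked with the sketch below, in which an observation equal to an upper limit is counted in the next class. To avoid floating-point surprises at the class boundaries, the values are converted to whole tenths before counting; this conversion is an implementation choice of the sketch, not part of the method itself.

```python
# Annual family expenses on food (in thousands of rupees), raw data from the example
expenses = [
    13.8, 14.1, 14.7, 15.2, 16.8, 15.6, 14.9, 16.7, 19.2, 14.9, 14.9, 14.9, 15.2, 15.9,
    15.2, 14.8, 14.8, 19.1, 14.6, 18.0, 14.9, 14.2, 14.1, 15.3, 15.5, 18.0, 17.2, 17.2,
    14.1, 14.5, 18.0, 14.4, 14.2, 14.6, 14.2, 14.8,
]

# Work in tenths (138 means 13.8) so that boundary comparisons are exact
tenths = [round(x * 10) for x in expenses]

k = 6                   # number of classes, as chosen in the solution
width = 10              # class width of 1 (thousand rupees), in tenths
lowest = min(tenths)    # 138, i.e. 13.8

# Exclusive classes: lower limit included, upper limit excluded
classes = [(lowest + i * width, lowest + (i + 1) * width) for i in range(k)]
frequencies = [sum(lower <= t < upper for t in tenths) for lower, upper in classes]

for (lower, upper), f in zip(classes, frequencies):
    print(f"{lower / 10:.1f}-{upper / 10:.1f}: {f}")
```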

• Relative frequency distribution:
The frequency of each class can also be expressed as a fraction or as a percentage. These are known as relative frequencies. In other words, a relative frequency is the class frequency expressed as a ratio of the total frequency.
$\text{Relative frequency} = \dfrac{\text{Class frequency}}{\text{Total frequency}}$
For Example:
Consumption of electricity   Number of    Relative
(in Kilowatt)                factories    frequency
20-30                        18           0.18
30-40                        18           0.18
40-50                        25           0.25
50-60                        22           0.22
60-70                        17           0.17
Total                        100          1
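The relative frequencies in this table follow directly from the formula; a minimal Python sketch using the same class frequencies is given below.

```python
# Class intervals and number of factories from the electricity example
classes = ["20-30", "30-40", "40-50", "50-60", "60-70"]
factories = [18, 18, 25, 22, 17]

total = sum(factories)

# Relative frequency = class frequency / total frequency
relative = [f / total for f in factories]

for label, f, r in zip(classes, factories, relative):
    print(f"{label}: {f} factories, relative frequency {r:.2f} ({100 * r:.0f}%)")
```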

• Cumulative frequency distribution:


Cumulative frequency distribution is of two types.

(a) Less than type: Here cumulative frequencies (C.F.) are obtained by adding successive class frequencies from top to bottom. Each cumulative frequency is the number of observations below the upper limit of the class.
For example:

Consumption of electricity   Number of        Less than (C.F.)
(in Kilowatt)                factories (f)
                                              Less than 20 = 0
20-30                        18               Less than 30 = 18
30-40                        18               Less than 40 = 36
40-50                        25               Less than 50 = 61
50-60                        22               Less than 60 = 83
60-70                        17               Less than 70 = 100

(b) More than type: Here cumulative frequencies are obtained by adding class frequencies from bottom to top. Each cumulative frequency is the number of observations at or above the lower limit of the class.
For example:

Consumption of electricity   Number of        More than (C.F.)
(in Kilowatt)                factories (f)
20-30                        18               More than 20 = 100
30-40                        18               More than 30 = 82
40-50                        25               More than 40 = 64
50-60                        22               More than 50 = 39
60-70                        17               More than 60 = 17
                                              More than 70 = 0
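Both types of cumulative frequency can be computed with running sums. The sketch below uses the class frequencies of the electricity example and `itertools.accumulate` from the Python standard library.

```python
from itertools import accumulate

# Class frequencies for 20-30, 30-40, 40-50, 50-60, 60-70
factories = [18, 18, 25, 22, 17]

# Less than type: add frequencies from top to bottom
# (values correspond to "less than 30", "less than 40", ..., "less than 70")
less_than = list(accumulate(factories))

# More than type: add frequencies from bottom to top
# (values correspond to "more than 20", "more than 30", ..., "more than 60")
more_than = list(accumulate(reversed(factories)))[::-1]

print(less_than)
print(more_than)
```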

Graphical presentation of data:
In Frequency distribution graphs data are presented by
(1) Histograms,
(2) Frequency Polygons,
(3) Frequency Curves,
(4) Ogives.

(1) Histograms
- It consists of a number of vertical rectangles that are adjacent to one another.
- For drawing a histogram, class intervals are taken on the X-axis and frequency density on the Y-axis, so that the areas of the rectangles represent the frequencies of the classes. In the case of equal class intervals, frequencies are taken on the Y-axis for simplicity.
- Histograms can’t be constructed for frequency distributions with open end classes unless we assume that the
magnitude of the first open class is same as that of the succeeding (second) class and the magnitude of the
last open class is same as that of the preceding (i.e., last but one) class.
- The purpose of drawing histogram is to locate Mode (measure of central tendency) graphically and to
comment about the nature of frequency distribution whether it is positively skewed, negatively skewed or
symmetric.
The technique of constructing histogram is as follows:
1. For ungrouped frequency distribution: Here erect a vertical line at each value of the variable, with height equal to its frequency.
For example:
The following data show the number of accidents sustained by 314 drivers of a public utility company over a period of five years.
No. of accidents   0    1    2    3    4    5    6    7    8    9    10   11
No. of drivers     82   44   68   41   25   20   13   7    5    4    3    2

From the graph one can say that the distribution is positively skewed and that the mode is 0 accidents, as this value has the highest vertical line (82 drivers).

2. For distribution having equal class-intervals:


When class-intervals are equal, take frequency on the Y-axis and the variable on the X-axis, and construct adjacent rectangles. In such cases the height of the rectangles will be proportional to the frequency.
For example:
[Figure: histogram with equal class intervals] The frequency distribution is close to symmetric. The mode lies between 40 and 50.

3. For distribution having unequal class intervals:


When class-intervals are unequal, find the frequency density for each class. The frequency density is the frequency of that class divided by the width of that class. The histogram is constructed using frequency density instead of frequency on the Y-axis, so that the area of each rectangle represents the frequency of the corresponding class.
$\text{Frequency density of class} = \dfrac{\text{frequency of class}}{\text{corresponding class width}}$
For example:
[Figure: histogram with unequal class intervals] The frequency distribution is negatively skewed. The mode lies between 20 and 25.
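The frequency density calculation can be sketched as follows. The data behind the figure above are not available, so the classes and frequencies below are hypothetical values chosen only to show the arithmetic.

```python
# Hypothetical classes with unequal widths: (lower limit, upper limit, frequency)
classes = [(0, 10, 5), (10, 15, 6), (15, 20, 10), (20, 25, 15), (25, 40, 9)]

# Frequency density = class frequency / class width (height of the rectangle)
densities = [freq / (upper - lower) for lower, upper, freq in classes]

# Area of each rectangle (height x width) gives back the class frequency
areas = [d * (upper - lower) for d, (lower, upper, _) in zip(densities, classes)]

# The modal class is the one with the tallest rectangle, not the largest frequency
tallest = densities.index(max(densities))
modal_class = classes[tallest][:2]

for (lower, upper, freq), d in zip(classes, densities):
    print(f"{lower}-{upper}: frequency {freq}, density {d:.2f}")
print("Modal class:", modal_class)
```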

(2) Frequency polygon:


- It can be constructed by taking the middle points of the class intervals on the X-axis and the frequencies on the Y-axis. These points are joined by straight lines.
- The two extreme points are joined to the base in such a way that the lines touch the X-axis at half the length of a class interval outside the extreme points.
- If we have to construct a histogram and a frequency polygon on the same graph paper, then first draw the histogram. Then join the middle points of the tops of the rectangles of the histogram by straight lines.
- A frequency polygon is thus a closed figure. Frequency polygons are useful for comparing distributions. This is achieved by overlaying the frequency polygons drawn for different data sets.
- Using a frequency polygon, one can comment on the nature of the frequency distribution: whether it is positively skewed, negatively skewed or symmetric.
Example:
The following table gives the frequency distribution of the weekly wages (in ’00 Rs.) of 100 workers in a
factory

(3) Frequency curve


- A frequency curve is obtained by joining the points of the frequency polygon by a smooth freehand curve, whereas in a frequency polygon the points are joined by straight lines.
- It is used to remove the ruggedness of the polygon and to present it in a good form or shape. One should smooth only the angularities of the polygon, without making any basic change in the shape of the curve.
Example:

(4) Cumulative Frequency Distribution (Ogives)
- Sometimes it is preferable to present data in a cumulative frequency (Cf) distribution, which shows the cumulative number of observations below the upper boundary (limit) of each class in the given frequency distribution. For example, we may be interested in knowing how many workers of a factory earn less than Rs. 700 per month, how many workers earn more than Rs. 1,000 per month, what percentage of students have failed, etc. To answer these questions it is necessary to add the frequencies. When frequencies are added they are called cumulative frequencies. A table of cumulative frequencies is then prepared; when plotted on graph paper it gives the cumulative frequency curve, more popularly known as the 'Ogive'.
- A cumulative frequency distribution is of two types: (i) less than type and (ii) more than type.
- Less than cumulative frequency: In the less than method we start with the upper limits of the classes and go on adding the frequencies. When these frequencies are plotted we get a rising curve.
- More than cumulative frequency: Here we start with the lower limits of the classes and go on subtracting the frequency of each class from the total. When these frequencies are plotted we get a decreasing curve.
- Using an Ogive, the median can be located graphically.
For example:
The following table gives the distribution of monthly income of 600 families in a certain city.

Monthly Income (’000 Rs.)   No. of Families (f)   Less than CF    More than CF
                                                  < 0 = 0
0-75                        60                    < 75 = 60       ≥ 0 = 600
75-150                      170                   < 150 = 230     ≥ 75 = 540
150-225                     200                   < 225 = 430     ≥ 150 = 370
225-300                     60                    < 300 = 490     ≥ 225 = 170
300-375                     50                    < 375 = 540     ≥ 300 = 110
375-450                     40                    < 450 = 580     ≥ 375 = 60
450-525                     20                    < 525 = 600     ≥ 450 = 20
                                                                  ≥ 525 = 0

Measures of Central Tendency
One of the important objectives of statistical analysis is to determine various numerical measures which describe the inherent characteristics of a frequency distribution. The first of such measures is the average. Averages are measures which condense a huge, unwieldy set of numerical data into single numerical values which are representative of the entire distribution. The tendency of most observations in a data set to cluster around a particular numerical value (called the central value) is called the central tendency.
• Requisites of an Ideal measure of Central Tendency
An ideal measure of central tendency should satisfy the following requirements:
1. It should be rigidly defined: The definition of an average should be rigid so that there is uniformity in its interpretation by different users or investigators.
2. It should be easy to understand and calculate: The value of an average should be calculated by using simple methods without reducing its accuracy and other advantages.
3. It should be based on all observations: Since it represents the entire data set, it must be computed using all the observations.
4. It should be suitable for further mathematical treatment: This means that, if the averages of certain groups are known, then it is possible to calculate their combined average without knowing the actual observations of all the groups. For example, it should be possible to determine the average production in a particular year from the average production in each month of the year.
5. It should be affected as little as possible by fluctuations of sampling: This means that it should have sampling stability. That is, the values of an average calculated from various independent random samples of the same size from a given population should not vary much from one another.
6. It should not be affected much by extreme observations: The value of an average should not be affected much by very small or very large observations in the given data.

• Various Measures of Central Tendency


The various measures of central tendency or averages commonly used can be broadly classified in the following
categories:
1. Mathematical Averages
a) Arithmetic Mean commonly called the mean or average
i. Simple
ii. Weighted
b) Geometric Mean
c) Harmonic Mean
2. Averages of Position
a) Mode
b) Median
c) Quartiles
d) Deciles
e) Percentiles

1. Mathematical Averages
a) Arithmetic Mean
i. Simple Mean:
It is the quantity obtained by dividing the sum of all observations by the total number of observations. If X is the variable involved, then the arithmetic mean of X is abbreviated as A.M. of X and denoted by $\bar{x}$.
(a) Raw data (data without any statistical treatment): If $x_1, x_2, \ldots, x_n$ are n observations of random variable X, then the arithmetic mean or mean is denoted by $\bar{x}$ and is given by
$\bar{x} = \dfrac{\sum x}{n}$
(b) Discrete ungrouped frequency distribution:
If $x_1, x_2, \ldots, x_n$ are n distinct observations of discrete variable X with frequencies $f_1, f_2, \ldots, f_n$ respectively, then the arithmetic mean of X is given by
$\bar{x} = \dfrac{\sum f x}{\sum f}$
where f = frequency.
(c) Continuous frequency distribution:
If $L_1 - U_1, L_2 - U_2, \ldots, L_n - U_n$ are n exhaustive and exclusive classes of random variable X with frequencies $f_1, f_2, \ldots, f_n$ respectively, then the arithmetic mean of X is given by
$\bar{x} = \dfrac{\sum f m}{\sum f}$
where f = frequency; m = mid value of class. Here classes may be inclusive or exclusive.
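The three formulas can be tried out with the sketch below. Case (a) uses a small illustrative sample (the first five overtime observations), case (b) the ungrouped overtime distribution, and case (c) the electricity consumption classes from earlier in this unit; the function names are illustrative.

```python
def mean_raw(values):
    """Arithmetic mean of raw data: sum of x divided by n."""
    return sum(values) / len(values)


def mean_frequency(points, frequencies):
    """Arithmetic mean of a frequency distribution: sum of f*x divided by sum of f."""
    return sum(f * x for x, f in zip(points, frequencies)) / sum(frequencies)


# (a) Raw data: first five overtime observations
raw = [91, 89, 88, 89, 90]
mean_a = mean_raw(raw)

# (b) Ungrouped frequency distribution of overtime hours (X = 84, ..., 93)
hours = list(range(84, 94))
weeks = [2, 2, 0, 1, 4, 3, 2, 5, 4, 7]
mean_b = mean_frequency(hours, weeks)

# (c) Continuous frequency distribution: use the mid value of each class
class_limits = [(20, 30), (30, 40), (40, 50), (50, 60), (60, 70)]
factories = [18, 18, 25, 22, 17]
mid_values = [(lower + upper) / 2 for lower, upper in class_limits]
mean_c = mean_frequency(mid_values, factories)

print(round(mean_a, 2), round(mean_b, 2), round(mean_c, 2))
```

In case (c) the result is an approximation, because every observation in a class is treated as if it were equal to the mid value of that class.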
Merits:

1. It is rigidly defined.
2. It is easy to understand.
3. It is simple to compute.
4. It is based upon all the observations.
5. It is capable of further algebraic treatment (i.e. it is possible to find a combined mean).
6. It is least affected by sampling fluctuations.
Demerits:

1. It is very much affected by extreme values. (i.e., too high and too low values).
2. It can’t be calculated when end classes are open-ended.
3. It can’t be located graphically.
4. The mean cannot be calculated for qualitative characteristics such as intelligence, honesty, beauty, or
loyalty.
Properties of Arithmetic Mean (or Mean):
1. The sum of the deviations of all observations from their arithmetic mean is always zero,
i.e., $\sum (x - \bar{x}) = 0$ for raw data
and $\sum f (x - \bar{x}) = 0$ for frequency data.
2. The sum of squared deviations of all the observations is minimum when the deviations are taken about their arithmetic
mean,
i.e., $\sum (x - \bar{x})^2 \le \sum (x - A)^2$ for raw data
and $\sum f (x - \bar{x})^2 \le \sum f (x - A)^2$ for frequency data.
Here A is any value other than $\bar{x}$.

3. If each observation in the data is equal to the same constant, then the mean is that constant itself. That is, if
$x_i = c$ for all $i$, then $\bar{x} = c$.
4. $\sum x = n\bar{x}$ for raw data
$\sum f x = N\bar{x}$ for frequency data
5. The arithmetic mean depends on both change of origin and change of scale.
That is, if a fixed number is subtracted from/added to each observation, then their mean is
diminished/increased by this same number, and if each observation is divided/multiplied by a fixed number, then
their mean is divided/multiplied by this same number,
i.e., if $Y = a + bX$ then $\bar{Y} = a + b\bar{X}$.

6. If $\bar{x}_1$ and $\bar{x}_2$ are the arithmetic means of two groups of $N_1$ and $N_2$ observations, then the combined mean of these
two groups can be computed by
$\bar{x}_{12} = \frac{N_1 \bar{x}_1 + N_2 \bar{x}_2}{N_1 + N_2}$
This can be generalized in the same way for more than two groups of observations having
different arithmetic means:
$\bar{x}_c = \frac{N_1 \bar{x}_1 + N_2 \bar{x}_2 + \dots + N_k \bar{x}_k}{N_1 + N_2 + \dots + N_k}$
7. The arithmetic mean of the first $n$ natural numbers is $\frac{n + 1}{2}$.
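Several of the properties listed above can be checked numerically. The sketch below uses small hypothetical groups; the helper names `mean` and `combined_mean` are illustrative only.

```python
def mean(xs):
    """Simple arithmetic mean."""
    return sum(xs) / len(xs)


def combined_mean(sizes, means):
    """Combined mean of k groups: sum(N_i * mean_i) / sum(N_i)."""
    return sum(n * m for n, m in zip(sizes, means)) / sum(sizes)


group1 = [10, 20, 30]   # hypothetical group with mean 20
group2 = [40, 50]       # hypothetical group with mean 45

# Property 6: the combined mean equals the mean of the pooled observations.
pooled = combined_mean([len(group1), len(group2)], [mean(group1), mean(group2)])
print(pooled, mean(group1 + group2))          # 30.0 30.0

# Property 1: deviations from the mean sum to zero.
deviation_sum = sum(x - mean(group1) for x in group1)
print(deviation_sum)                          # 0.0

# Property 5: if Y = a + bX then mean(Y) = a + b * mean(X).
a, b = 5, 2
y = [a + b * x for x in group1]
print(mean(y), a + b * mean(group1))          # 45.0 45.0

# Property 7: the mean of the first n natural numbers is (n + 1) / 2.
n = 10
print(mean(range(1, n + 1)), (n + 1) / 2)     # 5.5 5.5
```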

ii. Weighted Arithmetic Mean:
In the computation of the simple arithmetic average, the assumption is that all the items in the distribution are of equal
importance. However, in practice, it is possible to come across situations where the relative importance of the items
of the distribution is not the same. In such cases, due weightage is to be given to the various items and a weighted mean is
computed. For example, if it is desired to have an idea of the change in the cost of living of a certain group of
people, then the simple arithmetic average of the prices of the commodities consumed by the people will not do,
as all commodities are not equally important; e.g. items like wheat, rice, pulses, fuel, housing, lighting, etc. are
more important than cigarettes, confectionery, cosmetics, etc. Hence, different items should be assigned weights
according to their relative importance for the computation of the mean, which will be the weighted mean.
Let w1, w2, …, wn be the weights assigned to the variable values x1, x2, …, xn respectively; then the weighted arithmetic
mean, usually denoted by $\bar{x}_w$, is given by

$\bar{x}_w = \frac{w_1 x_1 + w_2 x_2 + \dots + w_n x_n}{w_1 + w_2 + \dots + w_n} = \frac{\sum w x}{\sum w}$
In the case of a frequency distribution, if f1, f2, …, fn are the frequencies of the variable values x1, x2, …, xn respectively,
then the weighted arithmetic mean is given by
$\bar{x}_w = \frac{w_1 (f_1 x_1) + w_2 (f_2 x_2) + \dots + w_n (f_n x_n)}{w_1 + w_2 + \dots + w_n} = \frac{\sum w f x}{\sum w}$
The weighted arithmetic mean should be used
1. When the importance of all the numerical values in the given data set is not equal.
2. When the frequencies of various classes are widely varying.
3. When there is a change either in the proportion of numerical values or in the proportion of their frequencies.
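As an illustration of the weighted mean for ungrouped values, the sketch below compares it with the simple mean. The price figures and weights are hypothetical and serve only to show the effect of weighting.

```python
def weighted_mean(values, weights):
    """Weighted arithmetic mean: sum(w * x) / sum(w)."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)


# Hypothetical price levels of three commodity groups (food, fuel, cosmetics)
# and weights reflecting their relative importance in a family budget.
prices = [110, 120, 150]
weights = [5, 3, 2]

simple = sum(prices) / len(prices)
weighted = weighted_mean(prices, weights)
print(round(simple, 2), weighted)   # 126.67 121.0
```

Because the heavily weighted items have the lower values here, the weighted mean falls below the simple mean.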

b) Geometric Mean:
The geometric mean is the nth root of the product of n observations.
$G.M. = \sqrt[n]{x_1 \cdot x_2 \cdots x_n}$ ; for raw data
$G.M. = \sqrt[N]{x_1^{f_1} \cdot x_2^{f_2} \cdots x_n^{f_n}}$ ; for discrete grouped data
$G.M. = \sqrt[N]{m_1^{f_1} \cdot m_2^{f_2} \cdots m_n^{f_n}}$ ; for data in class form, where $m_i$ represents the mid-value of the ith class

For computational purposes:
$G.M. = \text{Antilog}\left(\frac{\sum \log x}{n}\right)$ ; for raw data
$G.M. = \text{Antilog}\left(\frac{\sum f \log x}{N}\right)$ ; for discrete grouped data
$G.M. = \text{Antilog}\left(\frac{\sum f \log m}{N}\right)$ ; for data in class form, where $m_i$ represents the mid-value of the ith class
Note:
(i) If any one of the observations is zero, then the G.M. is zero.
(ii) If any one of the observations is negative, then the G.M. may be imaginary, so it is computed only for positive observations.
Application:
(i) The concept of G.M. is used in the construction of Index number.
(ii) Since G.M. ≤ A.M., therefore G.M. is useful in those cases where smaller observations are to be given
importance. Such cases usually occur in social and economic areas of study.
(iii) The G.M. of a data set is useful in estimating the average rate of growth in the initial value of an
observation per unit period. For example, it is useful in finding the percentage increase in sales, profit,
production, population, and so on.
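The logarithmic form of the G.M. and its use for average growth can be sketched as below. Natural logarithms are used instead of common logarithms, which gives the same result; the growth figures are hypothetical.

```python
import math


def geometric_mean(xs):
    """G.M. via logs: antilog of the mean of the logarithms (positive values only)."""
    if any(x <= 0 for x in xs):
        raise ValueError("geometric mean is computed here only for positive observations")
    return math.exp(sum(math.log(x) for x in xs) / len(xs))


# Hypothetical yearly growth of sales: +10%, +20%, +50%, expressed as growth factors.
growth_factors = [1.10, 1.20, 1.50]
gm = geometric_mean(growth_factors)
average_growth_percent = (gm - 1) * 100
print(round(average_growth_percent, 2))   # average yearly growth rate in percent

# Growing at the average rate for three years reproduces the total growth.
print(math.isclose(gm ** 3, 1.10 * 1.20 * 1.50))   # True
```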
c) Harmonic Mean:
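Harmonic mean is the reciprocal of the arithmetic mean of the reciprocals of the given observations.
$H.M. = \frac{1}{\frac{1}{n} \sum \frac{1}{x}} = \frac{n}{\frac{1}{x_1} + \frac{1}{x_2} + \dots + \frac{1}{x_n}} = \frac{n}{\sum \frac{1}{x}}$ ; for raw data
$H.M. = \frac{1}{\frac{1}{N} \sum \frac{f}{x}} = \frac{N}{\frac{f_1}{x_1} + \frac{f_2}{x_2} + \dots + \frac{f_n}{x_n}} = \frac{N}{\sum \frac{f}{x}}$ ; for frequency data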

Note:
i) If any observation is zero, then the H.M. is not defined.
ii) The harmonic mean is especially useful in averaging rates and ratios where the time factor is variable and the act
being performed remains constant, e.g., for finding the average speed of vehicles, typists, etc.
Application: The harmonic mean is particularly useful for computation of average rates and ratios. Such rates and
ratios are generally used to express relations between two different types of measuring units that can be
expressed reciprocally.
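A common illustration is the average speed over equal distances. The sketch below assumes a hypothetical journey covered at 40 km/h one way and 60 km/h on the return; the function names are illustrative.

```python
def harmonic_mean(xs):
    """H.M. of raw data: n / sum(1/x). Not defined if any observation is zero."""
    if any(x == 0 for x in xs):
        raise ValueError("harmonic mean is not defined when an observation is zero")
    return len(xs) / sum(1 / x for x in xs)


def harmonic_mean_freq(values, freqs):
    """H.M. of frequency data: N / sum(f/x)."""
    return sum(freqs) / sum(f / x for x, f in zip(values, freqs))


# Hypothetical trip: the same distance covered at 40 km/h and then at 60 km/h.
speeds = [40, 60]
print(round(harmonic_mean(speeds), 2))     # 48.0, the true average speed
print(sum(speeds) / len(speeds))           # 50.0, the arithmetic mean overstates it
```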
- Relationship among A.M., G.M. and H.M.
For any set of positive observations, its A.M., G.M. and H.M. are related to each other by
A.M. ≥ G.M. ≥ H.M.
The sign of ‘=’ holds if and only if all the observations are identical.
Note: (i) If the observations in a data set take the values a, ar, ar², …, ar^(n−1), each with single frequency,
then (G.M.)² = A.M. × H.M.
(ii) If a variable assumes only two values, then (G.M.)² = A.M. × H.M.
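These relations can be verified numerically. The sketch below uses a hypothetical geometric progression 2, 4, 8 and a hypothetical pair of values; floating-point results are compared with a tolerance rather than exact equality.

```python
import math


def am(xs):
    return sum(xs) / len(xs)


def gm(xs):
    return math.exp(sum(math.log(x) for x in xs) / len(xs))


def hm(xs):
    return len(xs) / sum(1 / x for x in xs)


gp = [2, 4, 8]                      # values a, ar, ar^2 with a = 2, r = 2
a_gp, g_gp, h_gp = am(gp), gm(gp), hm(gp)
print(a_gp > g_gp > h_gp)           # True: A.M. > G.M. > H.M. for unequal values
print(math.isclose(g_gp ** 2, a_gp * h_gp))   # True: (G.M.)^2 = A.M. x H.M.

pair = [3, 12]                      # a variable taking only two values
print(math.isclose(gm(pair) ** 2, am(pair) * hm(pair)))   # True
```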

2. Averages of Position:
(a) Mode: Mode is the value which occurs most frequently in a set of observation and around which the other items
of the set cluster densely.
For example, if a sandwich shop sells 10 different types of sandwiches, the mode would represent the
most popular sandwich.
(a) Raw data: M0 = the value of the variable which occurs most frequently in the data set.
(b) Discrete ungrouped frequency distribution: M0 = the value of the variable which corresponds to the highest
frequency.
(c) Continuous frequency distribution: First find the modal class, that is, the class having the highest frequency. Then
$M_0 = L + \frac{f_1 - f_0}{2 f_1 - f_0 - f_2} \times c$
Where L : Lower limit or boundary of modal class
f0 : Frequency of the class above (preceding) the modal class
f1 : Frequency of modal class
f2 : Frequency of the class below (succeeding) the modal class
c : Class width of modal class
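The grouped-data formula for the mode can be sketched as follows. The class table is hypothetical, and treating a missing neighbouring class as having frequency 0 is an assumption of this sketch, not something stated in the text.

```python
def grouped_mode(classes, freqs):
    """Mode of a continuous frequency distribution.

    classes: list of (lower, upper) class boundaries in increasing order.
    freqs:   frequency of each class.
    """
    k = max(range(len(freqs)), key=lambda i: freqs[i])    # index of the modal class
    lower, upper = classes[k]
    f1 = freqs[k]
    f0 = freqs[k - 1] if k > 0 else 0                     # assumption: 0 if no preceding class
    f2 = freqs[k + 1] if k < len(freqs) - 1 else 0        # assumption: 0 if no succeeding class
    denominator = 2 * f1 - f0 - f2
    if denominator == 0:
        raise ValueError("formula breaks down when f1 = f0 = f2")
    return lower + (f1 - f0) / denominator * (upper - lower)


# Hypothetical frequency table.
classes = [(0, 10), (10, 20), (20, 30), (30, 40)]
freqs = [5, 8, 12, 7]
print(round(grouped_mode(classes, freqs), 2))   # 24.44
```

Here the modal class is 20 to 30, so L = 20, f1 = 12, f0 = 8, f2 = 7 and c = 10, giving 20 + (4/9) × 10.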
Mode is especially useful in finding the most popular size in studies relating to marketing, trade, business and
industry. It is the appropriate average to be used to find the ideal size e.g., in business forecasting, in the
manufacture of shoes or readymade garments, in sales, in production, etc.
If two or more values occur the same highest number of times, then two or more modes exist and the
distribution is said to be bi-modal or multi-modal. If the data have only one mode, the distribution is said to
be uni-modal; if the data have two modes, the distribution is said to be bi-modal.
Merits:
1. It is easy to calculate, easy to understand.
2. It is not affected by extreme values.
3. It can be determined in open-end classes.
4. It can be represented graphically by Histogram.
5. It is the most suitable average to find the ideal size. For example, its value is used for comparing consumer
preferences for various types of products, say soaps, cigarettes, toothpastes or other products, and in the
manufacture of readymade garments, shoes, etc.
Demerits:
1. It is not based on all the observations.
2. It is not capable of further algebraic treatment.
3. As compared to mean and median, it is affected to a greater extent by sampling fluctuations

(b) Median: The median is that value of variable which divides the group in two equal parts, one part comprising
all the values greater and the other, all values less than median. Since its value depends on the position
occupied by a value in the frequency distribution it is also known as positional measure of central tendency.
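(a) Raw data: First arrange the observations in ascending (increasing) order. Then
$M_e = \text{value of the } \left(\frac{n + 1}{2}\right)^{\text{th}} \text{ observation}$ ; if n is odd
$M_e = \frac{\left(\frac{n}{2}\right)^{\text{th}} \text{ observation} + \left(\frac{n}{2} + 1\right)^{\text{th}} \text{ observation}}{2}$ ; if n is even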
(b) Discrete ungrouped frequency distribution: First find the cumulative frequency (C.F.) of less-than type.
M e = the value of the variable which corresponds to the C.F. just greater than (N/2)
(c) Continuous frequency distribution:
- First find the C.F. of less-than type.
- Find the median class, i.e. the class having C.F. just greater than (N/2).
$M_e = L + \frac{\frac{N}{2} - C_f}{f} \times c$
Where, L : Lower limit or boundary of median class
C f : Cumulative frequency of class above median class
f : Frequency of median class
c : Class width of median class
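The raw-data rule and the grouped-data formula for the median can be sketched as below. The data are hypothetical, and classes are assumed to be given as (lower, upper) boundaries in increasing order.

```python
def median_raw(xs):
    """Median of raw data after arranging the observations in ascending order."""
    s = sorted(xs)
    n = len(s)
    if n % 2 == 1:
        return s[(n + 1) // 2 - 1]              # ((n + 1) / 2)th observation
    return (s[n // 2 - 1] + s[n // 2]) / 2      # mean of (n/2)th and (n/2 + 1)th


def grouped_median(classes, freqs):
    """Median of a continuous frequency distribution: L + ((N/2 - Cf) / f) * c."""
    target = sum(freqs) / 2
    cumulative = 0                              # C.F. of the classes before the current one
    for (lower, upper), f in zip(classes, freqs):
        if cumulative + f > target:             # first class with C.F. just greater than N/2
            return lower + (target - cumulative) / f * (upper - lower)
        cumulative += f
    raise ValueError("empty frequency table")


print(median_raw([7, 3, 9, 1, 5]))      # 5
print(median_raw([1, 3, 5, 7]))         # 4.0

# Hypothetical frequency table: N = 32, so N/2 = 16 falls in the class 20 to 30.
classes = [(0, 10), (10, 20), (20, 30), (30, 40)]
freqs = [5, 8, 12, 7]
print(grouped_median(classes, freqs))   # 22.5
```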
Note:

1) The sum of absolute deviations is minimum when the deviations are taken from the median: $\sum f |x - A| \ge \sum f |x - M_e|$


2) Median is the only average to be used while dealing with qualitative characteristics which cannot be
measured quantitatively e.g. to find the average intelligence, average beauty, average honesty, etc. among
a group of people.
Merits
1. It is easy to understand.
2. It is easy to calculate.
3. It is not affected by extreme value.
4. It can be computed in case of open end classes.
5. It is most suitable average in a study of qualitative data.
6. It can be located graphically by ogive.
Demerits
1. It is not capable for further algebraic treatment.
2. It is not based on all observations.
3. It is affected more by sampling fluctuations than the arithmetic mean.
4. It requires arranging the data before it can be found, which is tedious work.
Application:
The median is helpful in understanding the characteristic of data set when
1. Observations are qualitative in nature.
2. Extreme values are present in the data set.
3. A quick estimate of an average is desired

Relationship between Mean, Median and Mode:


1. In a unimodal and symmetrical distribution: (See Graph-1)
Mean = Median = Mode
2. For a unimodal and moderately asymmetrical distribution (See Graph-2, Graph-3), the empirical relationship
holds, which is given by
Mean − Mode = 3(Mean − Median)

Other positional measures
1. Quartiles
- The values which divide the given data into four equal parts are known as Quartiles.
- There will be three such points Q1, Q2 and Q3, Such that Q1 ≤ Q2 ≤ Q3.
- Quartiles divide a rank-ordered data set into four equal parts. There are three quartiles, called the first
quartile, second quartile and third quartile. The second quartile (Q2) is equal to the median. The first
quartile is also called the lower quartile and is denoted by Q1. The third quartile is also called the upper quartile
and is denoted by Q3.
- The lower quartile Q1 is a point which has 25% observations less than it and 75% observations are above it.
- The upper quartile Q3 is a point with 75% observations below it and 25% observations above it.
(a) Raw Data (Quartile for Individual Observations):
If x1, x2, …, xn are n observations of random variable X, then
$Q_1$ = value of the $\left(\frac{n + 1}{4}\right)^{\text{th}}$ observation
$Q_2$ = value of the $2\left(\frac{n + 1}{4}\right)^{\text{th}}$ observation
$Q_3$ = value of the $3\left(\frac{n + 1}{4}\right)^{\text{th}}$ observation
(b) Discrete ungrouped frequency distribution ( When the data follows the discrete set of values grouped by
size):
If x1, x2, …, xn are n distinct observations of random variable X with frequencies f1, f2, …, fn respectively, then
$Q_1$ = value of the $\left(\frac{N + 1}{4}\right)^{\text{th}}$ observation = value corresponding to the C.F. just greater than $\frac{N + 1}{4}$
$Q_2$ = value of the $2\left(\frac{N + 1}{4}\right)^{\text{th}}$ observation = value corresponding to the C.F. just greater than $2\left(\frac{N + 1}{4}\right)$
$Q_3$ = value of the $3\left(\frac{N + 1}{4}\right)^{\text{th}}$ observation = value corresponding to the C.F. just greater than $3\left(\frac{N + 1}{4}\right)$
where N = ∑f

(c) Continuous frequency distribution (When data arranged in tabular form containing different groups):
If L1 – U1, L2 – U2, …, Ln – Un are n exhaustive and exclusive class of random variable X with
frequency f1, f2, …, fn respectively, then
➢ First find the C.F. of less-than type and find the ith quartile class, i.e. the class having C.F. just greater than [i(N/4)].
Then the ith quartile is given by $Q_i = L + \frac{\frac{iN}{4} - C_f}{f} \times c$, where i = 1, 2, 3.
Where, L: Lower limit or boundary of i th quartile class
Cf : Cumulative frequency of class above i th quartile class
f : Frequency of i th quartile class
c : Class width of i th quartile class
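The grouped-data quartile formula can be sketched as below, using a hypothetical frequency table. The function name `grouped_quartile` and the (lower, upper) class representation are assumptions of this sketch.

```python
def grouped_quartile(classes, freqs, i):
    """ith quartile (i = 1, 2, 3) of a continuous frequency distribution.

    Implements Q_i = L + ((i*N/4 - Cf) / f) * c, where the quartile class is the
    first class whose less-than cumulative frequency is just greater than i*N/4.
    """
    if i not in (1, 2, 3):
        raise ValueError("i must be 1, 2 or 3")
    target = i * sum(freqs) / 4
    cumulative = 0                               # C.F. of the classes before the current one
    for (lower, upper), f in zip(classes, freqs):
        if cumulative + f > target:
            return lower + (target - cumulative) / f * (upper - lower)
        cumulative += f
    raise ValueError("empty frequency table")


# Hypothetical frequency table with N = 32.
classes = [(0, 10), (10, 20), (20, 30), (30, 40)]
freqs = [5, 8, 12, 7]
q1 = grouped_quartile(classes, freqs, 1)
q2 = grouped_quartile(classes, freqs, 2)
q3 = grouped_quartile(classes, freqs, 3)
print(q1, q2, round(q3, 2))   # 13.75 22.5 29.17
```

The second quartile coincides with the median of the same table, as expected.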

2. Deciles
The deciles are the partition values which divide the set of observations into ten equal parts. We have nine
deciles, denoted respectively by D1, D2, …, D9.
The first decile D1 is a point which has 10% of the observations below it.
(a) Raw Data: If x1, x2, …, xn are n observations of random variable X, then the ith decile is given by
$D_i$ = value of the $i\left(\frac{n + 1}{10}\right)^{\text{th}}$ observation, where i = 1, 2, …, 9.
Therefore, $D_1$ = value of the $\left(\frac{n + 1}{10}\right)^{\text{th}}$ observation (first decile),
$D_2$ = value of the $2\left(\frac{n + 1}{10}\right)^{\text{th}}$ observation (second decile),
…………
$D_9$ = value of the $9\left(\frac{n + 1}{10}\right)^{\text{th}}$ observation (ninth decile).
(b) Discrete ungrouped frequency distribution
If x1, x2, …, xn are n distinct observations of random variable X with frequencies f1, f2, …, fn respectively, then,
$D_1$ = value of the $\left(\frac{N + 1}{10}\right)^{\text{th}}$ observation (first decile),
$D_2$ = value of the $2\left(\frac{N + 1}{10}\right)^{\text{th}}$ observation (second decile),
…………
$D_9$ = value of the $9\left(\frac{N + 1}{10}\right)^{\text{th}}$ observation (ninth decile), where N = ∑f.
(c) Continuous frequency distribution (When data arranged in tabular form containing different groups):
If L1 – U1, L2 – U2, …, Ln – Un are n exhaustive and exclusive class of random variable X with frequency f1,
f2, …, fn respectively, then,
First find the C.F. of less-than type and find the ith decile class, i.e. the class having C.F. just greater than [i(N/10)].
Then the ith decile is $D_i = L + \frac{\frac{iN}{10} - C_f}{f} \times c$, where i = 1, 2, …, 9
Where, L : Lower limit or boundary of the ith decile class
Cf : Cumulative frequency of the class above the ith decile class
f : Frequency of the ith decile class
c : Class width of the ith decile class
Thus, $D_1 = L + \frac{\frac{N}{10} - C_f}{f} \times c$
$D_2 = L + \frac{\frac{2N}{10} - C_f}{f} \times c$
…………
$D_9 = L + \frac{\frac{9N}{10} - C_f}{f} \times c$
3. Percentiles: Percentiles divide the series into hundred equal parts.
There are ninety-nine percentiles, P1, P2, …, P99, such that
P1 ≤ P2 ≤ … ≤ P99. Pi has i% of the items less than it.
(a) Raw data: First arrange the observations in ascending (increasing) order.
$P_i$ = value of the $i\left(\frac{n + 1}{100}\right)^{\text{th}}$ observation, i = 1, 2, …, 99
(b) Discrete ungrouped frequency distribution: First find the cumulative frequency (C.F.) of less-than type.
Pi = the value of the variable which corresponds to the C.F. just greater than [i(N/100)], i = 1, 2, …, 99.
(c) Continuous frequency distribution: First find the C.F. of less-than type.
Find the ith percentile class, i.e. the class having C.F. just greater than [i(N/100)].
$P_i = L + \frac{\frac{iN}{100} - C_f}{f} \times c$, i = 1, 2, …, 99
Where, L : Lower limit or boundary of i th percentile class
C f : Cumulative frequency of class above i th percentile class
f : Frequency of i th percentile class
c : Class width of i th percentile class
- Relation between median, Quartiles, Deciles and percentiles.
Median = Q2 = D5 = P50
Q1 = P25, Q3 = P75
D1 = P10, D2 = P20, … D9 = P90
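The relations above can be checked on raw data with a single positional rule. In the sketch below, the value of the i(n + 1)/k th observation is found for k = 4, 10 or 100; linear interpolation between neighbouring observations for a fractional position is a common convention and is an assumption of this sketch, as are the function name and the data.

```python
def positional_value(data, i, k):
    """Value of the i*(n + 1)/k th observation of the sorted data.

    k = 4 gives quartiles, k = 10 deciles, k = 100 percentiles.
    A fractional position is resolved by linear interpolation (assumed convention).
    """
    xs = sorted(data)
    n = len(xs)
    position = i * (n + 1) / k          # 1-based position in the ordered data
    whole = int(position)
    fraction = position - whole
    if whole < 1:
        return xs[0]
    if whole >= n:
        return xs[-1]
    return xs[whole - 1] + fraction * (xs[whole] - xs[whole - 1])


# Hypothetical data set with n = 11 observations.
data = [2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22]
q2 = positional_value(data, 2, 4)
d5 = positional_value(data, 5, 10)
p50 = positional_value(data, 50, 100)
print(q2, d5, p50)                                                       # 12.0 12.0 12.0
print(positional_value(data, 1, 4), positional_value(data, 25, 100))     # 6.0 6.0
```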
➢ Summary of various location measures or the measures of central tendency
| Property | Arithmetic mean | Median | Mode | Geometric mean | Harmonic mean |
|---|---|---|---|---|---|
| 1. Rigidly defined | Yes | Yes | Not very | Yes | Yes |
| 2. Based on all values of series | Yes | No | No | Yes | Yes |
| 3. Easy to calculate and understand | Yes | Quite | Quite | Difficult | Difficult |
| 4. Amenable to algebraic treatment | Yes | No | No | Yes | Yes |
| 5. Effect of sample variations | Stable | Moderate | Moderate | Moderate | Moderate |
| 6. Effect of extreme values | Large | None | None | Very low | Very low |
| 7. Most useful in | General purpose | Least net discomfort problem | Typifying series | Averaging rates | Averaging ratios |

Measures Of Dispersion
One important characteristic of a distribution is central tendency, which gives one single value that represents the
entire data. Another important characteristic of a distribution is the dispersion of the data. Dispersion
also means scatteredness, spread or variation of the observations. Averages alone cannot adequately describe
a set of observations, unless all the observations are the same. It is necessary to describe the variability or
dispersion of the observations. In two or more distributions the central value may be the same
but still there can be wide disparities in the formation of the distributions. The extent to which the individual
observations differ on an average from the mean or any other measure of central value is called a measure of dispersion
or measure of variation. As these measures give an average of the differences of the observations included in a
group from an average of these items, they are also known as “averages of second order”. Measures of
central value are, therefore, called “averages of first order”.
Definition of Dispersion: According to Spiegel – “The degree to which numerical data tend to spread about an
average value is called the variation or dispersion of the data.”
Different Measures of Dispersion
For the study of dispersion, we need some measures which show whether the dispersion is small or large. There
are two types of measure of dispersion which are:
a) Absolute Measure of Dispersion
b) Relative Measure of Dispersion
a) Absolute Measures of Dispersion
These measures give us an idea about the amount of dispersion in a set of observations. They give the answers in
the same units as the units of the original observations. When the observations are in kilograms, the absolute
measure is also in kilograms. If we have two sets of observations, we cannot always use the absolute measures to
compare their dispersion. We shall explain later as to when the absolute measures can be used for comparison of
dispersion in two or more than two sets of data.
The absolute measures which are commonly used are:
1. The Range (R)
2. The Quartile Deviation (Q.D)
3. The Mean Deviation (M.D)
4. The Standard deviation (S.D) and Variance
b) Relative Measure of Dispersion
These measures are calculated for the comparison of dispersion in two or more than two sets of observations.
These measures are free of the units in which the original data is measured. If the original data is in dollar or
kilometers, we do not use these units with relative measure of dispersion. These measures are a sort of ratio and
are called coefficients. Each absolute measure of dispersion can be converted into its relative measure.
Thus the relative measures of dispersion are:
1. Coefficient of Range or Coefficient of Dispersion.
2. Coefficient of Quartile Deviation or Quartile Coefficient of Dispersion.
3. Coefficient of Mean Deviation or Mean Deviation of Dispersion.
4. Coefficient of Variation (C.V.)
Absolute measures vs. relative measures of variation
Measures of dispersion may be either absolute or relative. Absolute measures of dispersion are expressed in the
same statistical unit in which the original data are given such as rupees, kilograms, kilometers etc. These values
may be used to compare the variations in two distributions provided the variables are expressed in the same units
and of the same average size. In case the two sets of data are expressed in different units, however, such as
quintals of sugar versus tonnes of sugarcane, or if the average size is very different, such as manager’s salary versus
workers’ salary, the absolute measures of dispersion are not comparable. In such cases measures of relative
dispersion should be used.
A relative measure of dispersion is the ratio of an absolute measure of dispersion to an appropriate average. It is
called a coefficient of dispersion, because “coefficient” means a pure number that is independent of the unit of
measurement. It should be remembered that while computing the relative dispersion the average used as base
should be the same one from which the absolute deviations were measured.

1. Range
In any statistical series, the difference between the largest and the smallest values is called as the range.
Thus
Range (R) = L − S
where L = largest value of the observations
S = smallest value of the observations

Coefficient of Range: The relative measure of the range.
Coefficient of Range = $\frac{L - S}{L + S}$
Merits
1. It is simplest and easiest measure of dispersion.
2. It is readily comprehensible and it requires very little calculations.
3. It is free of measure of central tendency.
Demerits
1. It is not a suitable measure of dispersion as it is affected by extreme values.
2. It is not a suitable measure of dispersion for data with open-ended classes.
3. It is a very crude measure, as it is based on the two extreme values and not on the other values.
Applications
1. It is useful in studying the variations in the prices of share and stocks.
2. It is useful in studying weather conditions where minimum and maximum temperature is
identified.
3. It is widely used in industrial quality control.
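As a small illustration of the range and its coefficient, the sketch below uses hypothetical daily maximum temperatures; the function names are illustrative.

```python
def value_range(xs):
    """Range R = L - S (largest minus smallest observation)."""
    return max(xs) - min(xs)


def coefficient_of_range(xs):
    """Coefficient of range = (L - S) / (L + S), a unit-free relative measure."""
    largest, smallest = max(xs), min(xs)
    return (largest - smallest) / (largest + smallest)


# Hypothetical daily maximum temperatures (degrees Celsius).
temperatures = [22, 25, 31, 28, 35]
print(value_range(temperatures))                      # 13
print(round(coefficient_of_range(temperatures), 3))   # 0.228
```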

2. Mean Deviation (M.D.)


The mean deviation or the average deviation is defined as the mean of the absolute deviations of observations
from some suitable average which may be the arithmetic mean, the median or the mode.
(a) Raw data:
$M.D.(\text{about mean}) = \frac{\sum |X - \bar{x}|}{n}$
$M.D.(\text{about median}) = \frac{\sum |X - M_e|}{n}$
$M.D.(\text{about mode}) = \frac{\sum |X - M_0|}{n}$
(b) Discrete ungrouped and continuous frequency distribution:
$M.D.(\text{about mean}) = \frac{\sum f |X - \bar{X}|}{N}$
$M.D.(\text{about median}) = \frac{\sum f |X - M_e|}{N}$
$M.D.(\text{about mode}) = \frac{\sum f |X - M_0|}{N}$
Remark:
1. The difference (X − Average) is called a deviation, and when we ignore the negative sign, this deviation is
written as |X − Average| and is read as the absolute deviation.
2. The value of the mean deviation is minimum if the deviations are taken from the median.
∑|𝑋 − 𝐴| ≥ ∑|𝑋 − 𝑀𝑒| for raw data
∑ 𝑓|𝑋 − 𝐴| ≥ ∑ 𝑓|𝑋 − 𝑀𝑒| for frequency data
Where, A is any constant other than Median.
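The mean deviation and the minimum property stated in the remark can be illustrated as below. The data set is hypothetical and deliberately contains one extreme value.

```python
def mean_deviation(xs, about):
    """Mean of the absolute deviations of the observations from the value `about`."""
    return sum(abs(x - about) for x in xs) / len(xs)


# Hypothetical raw data with an odd number of observations.
data = [2, 4, 6, 8, 30]
mean = sum(data) / len(data)              # 10.0
median = sorted(data)[len(data) // 2]     # 6, the middle observation

md_about_mean = mean_deviation(data, mean)
md_about_median = mean_deviation(data, median)
print(md_about_mean, md_about_median)     # 8.0 6.4

# The mean deviation is smallest when taken about the median.
print(all(mean_deviation(data, a) >= md_about_median for a in range(0, 31)))   # True
```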

Coefficient of the Mean Deviation:


A relative measure of dispersion based on the mean deviation is called the coefficient of the mean
deviation or the coefficient of dispersion. It is defined as the ratio of the mean deviation to the average used
in the calculation of the mean deviation. Thus,
Coefficient of M.D. = $\frac{\text{M.D.}}{\text{Average from which it is calculated}}$

Merits
1. It is easy to calculate and simple to understand.
2. It is less affected by the extreme values of variable.
3. It is based on all observations in the distribution.
Demerits
1. It ignores the negative deviation and treats them as positive which is not justified mathematically.
2. It is not a satisfactory measure when the deviations are taken from the mode.
3. It is not suitable when the class intervals are open end type.
4. The mean deviation cannot be used in statistical inference.

3. Quartile Deviation: (The Semi Inter-quartile Range (or Semi IQR))


- The Quartile deviation (Q.D.) is a measure of variability, based on dividing a data set into quartiles.
- It is based on the lower quartile Q1 and the upper quartile Q3.
- The difference Q3-Q1 is called the inter quartile range.
- The difference Q3-Q1 divided by 2 is called semi-inter-quartile range or the quartile deviation.
- Thus, The Absolute measure of Quartile Deviation is
Q3 − Q1
Q. D. =
2
- Coefficient of Quartile Deviation:
A relative measure of dispersion based on the quartile deviation is called the coefficient of quartile
deviation. It is defined as
Coefficient of Q.D. = $\frac{Q_3 - Q_1}{Q_3 + Q_1}$
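Given the two quartiles, the quartile deviation and its coefficient follow directly. The sketch below assumes hypothetical quartile values Q1 = 40 and Q3 = 60.

```python
def quartile_deviation(q1, q3):
    """Semi inter-quartile range: (Q3 - Q1) / 2."""
    return (q3 - q1) / 2


def coefficient_of_qd(q1, q3):
    """Coefficient of quartile deviation: (Q3 - Q1) / (Q3 + Q1)."""
    return (q3 - q1) / (q3 + q1)


# Hypothetical quartiles of a data set.
q1, q3 = 40, 60
print(q3 - q1)                      # 20, the inter-quartile range
print(quartile_deviation(q1, q3))   # 10.0
print(coefficient_of_qd(q1, q3))    # 0.2
```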
Merits
1. It is easy to calculate and simple to understand.
2. It is not affected by extreme values of the variable as it is concerned with the central half portion of
the distribution.
3. It is not at all affected by open end class data.
Demerits
1. It ignores completely the portions below the lower quartile and above upper quartile.

2. It is not capable of further mathematical treatment.
3. It is greatly affected by the fluctuations in the sampling.

4. Standard Deviation

Note: Standard deviation is the best measure of variation because its value is based on all observations in a set
of data. It is the only measure of variation capable of algebraic treatment.
a. It is less affected by sampling fluctuations as compared to other measures of variation.
b. Standard deviation has a definite relationship with the area under the symmetric curve of a frequency
distribution.
Coefficient of Variation (C. V.)

To compare the variations (dispersion) of two different series, a relative measure of standard deviation must be
calculated. This is known as the coefficient of variation or the coefficient of s.d. Its formula is
$\text{C.V.} = \frac{\sigma}{\bar{x}} \times 100$
Thus it is defined as the ratio of the s.d. to the mean, expressed as a percentage.
Remark: It is given as a percentage and is used to compare the consistency or variability of two or more series. The
higher the C.V., the higher the variability, and the lower the C.V., the higher the consistency of the data.
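The comparison of two series by C.V. can be sketched as follows. The standard deviation here is taken as the square root of the mean squared deviation from the mean (divisor n); that form is an assumption of this sketch, and the scores of the two batsmen are hypothetical.

```python
import math


def standard_deviation(xs):
    """s.d. as the square root of the mean squared deviation from the mean (divisor n, assumed)."""
    m = sum(xs) / len(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / len(xs))


def coefficient_of_variation(xs):
    """C.V. = (s.d. / mean) * 100, expressed as a percentage."""
    return standard_deviation(xs) / (sum(xs) / len(xs)) * 100


# Hypothetical scores of two batsmen with the same mean of 50 runs.
batsman_a = [40, 50, 60]
batsman_b = [10, 50, 90]
cv_a = coefficient_of_variation(batsman_a)
cv_b = coefficient_of_variation(batsman_b)
print(round(cv_a, 2), round(cv_b, 2))   # 16.33 65.32
print("A is more consistent" if cv_a < cv_b else "B is more consistent")
```

Both series have the same average, so only the C.V. reveals that the first batsman is the more consistent one.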

Measures of skewness:
Introduction
The voluminous raw data cannot be easily understood; hence, we calculate the measures of central tendencies and
obtain a representative figure. From the measures of variability, we can know that whether most of the items of the
data are close to or away from these central tendencies. But these statistical means and measures of variation are not
enough to draw sufficient description about the data. Another aspect of the data is to know its symmetry. The
symmetry of data is well studied by the knowledge of the "Skewness."
The literal meaning of skewness is ‘lack of symmetry’. The study of skewness helps to give an idea about the shape of the
curve which can be drawn with the help of the given frequency distribution. If the frequency curve of the distribution is
not a symmetric bell-shaped curve but is stretched more to one side than the other, then it is called a skewed distribution.
A frequency distribution for which the curve has longer tail towards the right is said to be positively skewed and if the
longer tail lies towards the left, it is said to be negatively skewed.
- Symmetric Distribution: For symmetric distribution curve falls at same rate from the highest peak. Thus
frequency curve has same tail from mean. For such curve
Mean = Median = Mode
- Positively skewed distribution: For a positively skewed distribution curve rises rapidly, reaches the maximum and
falls slowly. In other words, if the frequency curve has longer tail to right the distribution is known as positively
skewed distribution and for a positively skewed distribution
Mean > Median > Mode.

- Negatively skewed distribution: A negatively skewed distribution curve rises slowly, reaches its maximum and
falls rapidly. In other words, if the frequency curve has longer tail to left the distribution is known as negatively
skewed distribution and for negatively skewed distribution
Mean < Median < Mode.

Measures of Skewness
A measure which gives the extent of asymmetry is known as a measure of ‘skewness’. Measures of
skewness are categorized in two ways.
(i) Absolute measures (ii) Relative measures
(i) Absolute Measures:
Sk = (Mean – Mode)
Sk = (Mean –Median)
Sk = Q3+ Q1 – 2 Median
Absolute measures are not very practical because they involve the units of measurement and hence cannot be
used for comparative study of distributions measured in different units. Even with the same
units of measurement, one may come across different distributions which have more or less identical absolute
measures but which vary widely in their measures of central tendency and dispersion.
(ii) Relative Measures
For comparing two or more distributions for skewness compute relative measures of skewness called
coefficient of skewness which are pure numbers independent of the units of measurement.
1. Karl Pearson’s Coefficient of skewness:
$S_k = \frac{\text{Mean} - \text{Mode}}{\text{s.d.}} = \frac{\bar{x} - M_o}{\sigma}$
If the mode is not uniquely defined, then
$S_k = \frac{3(\text{Mean} - \text{Median})}{\text{s.d.}} = \frac{3(\bar{x} - M_e)}{\sigma}$
If
$S_k > 0$ then distribution is positively skewed
$S_k < 0$ then distribution is negatively skewed
$S_k = 0$ then distribution is symmetric.
Theoretically the value of Sk varies between ±3, but for a moderately skewed distribution the value of Sk
varies between ±1.

2. Bowley’s coefficient of skewness: In the case of an open-end distribution, Bowley’s coefficient of skewness is used.
$S_k = \frac{Q_3 + Q_1 - 2M_e}{Q_3 - Q_1}$
If
$S_k > 0$ then distribution is positively skewed
$S_k < 0$ then distribution is negatively skewed
$S_k = 0$ then distribution is symmetric.
The limits for Bowley’s coefficient of skewness are $-1 \le S_k \le +1$
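Both coefficients are simple ratios once the summary measures are known. The sketch below uses hypothetical summary values chosen so that the empirical relation Mean − Mode = 3(Mean − Median) holds exactly; the function names are illustrative.

```python
def pearson_skewness(mean, mode, sd):
    """Karl Pearson's coefficient: (Mean - Mode) / s.d."""
    return (mean - mode) / sd


def pearson_skewness_median(mean, median, sd):
    """Alternative form when the mode is ill-defined: 3 * (Mean - Median) / s.d."""
    return 3 * (mean - median) / sd


def bowley_skewness(q1, median, q3):
    """Bowley's coefficient: (Q3 + Q1 - 2 * Median) / (Q3 - Q1)."""
    return (q3 + q1 - 2 * median) / (q3 - q1)


def describe(sk):
    if sk > 0:
        return "positively skewed"
    if sk < 0:
        return "negatively skewed"
    return "symmetric"


# Hypothetical summary measures of a distribution.
mean, median, mode, sd = 50, 48, 44, 10
q1, q3 = 40, 60

sk_mode = pearson_skewness(mean, mode, sd)
sk_median = pearson_skewness_median(mean, median, sd)
sk_bowley = bowley_skewness(q1, median, q3)
print(sk_mode, sk_median, sk_bowley)   # 0.6 0.6 0.2
print(describe(sk_bowley))             # positively skewed
```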

Uses of Skewness
1. It helps in finding out the nature and degree of concentration whether it is in higher or the lower values.
2. The empirical relationship between Mean, Median and Mode is based on the
assumption of a moderately skewed distribution. The measure of skewness shows to what extent
such an empirical relationship holds good.
3. It helps in knowing whether the distribution is normal or not. Many statistical measures are based on the
assumption of a normal distribution.

Moments
Introduction:

Moment is a familiar mechanical term for the measure of a force with reference to its tendency to produce
rotation. In statistics, moments are used to describe the various characteristics of a frequency distribution such as
central tendency, variation, skewness and kurtosis.
Different types of moments:
(1) Central Moment:
Central moments are calculated about the arithmetic mean. The arithmetic mean of the various powers of the
deviations of observations from the arithmetic mean of a distribution is called a moment of the distribution.
These moments about the mean are called the "central moments" and are denoted by $\mu_1, \mu_2, \mu_3, \mu_4, \dots$
Symbolically, the rth moment about the A.M. ($\bar{x}$) is termed the rth central moment, denoted by $\mu_r$, and defined as
$\mu_r = \frac{1}{N} \sum (x - \bar{x})^r$ for raw data
$\mu_r = \frac{1}{N} \sum f (x - \bar{x})^r$ for frequency data

Remark: $\mu_0 = 1$
The first four central moments are
$\mu_1 = \frac{1}{N} \sum (x - \bar{x})$ for raw data
$\mu_1 = \frac{1}{N} \sum f (x - \bar{x})$ for frequency data
$\mu_1 = 0$
$\mu_2 = \frac{1}{N} \sum (x - \bar{x})^2$ for raw data
$\mu_2 = \frac{1}{N} \sum f (x - \bar{x})^2$ for frequency data
$\mu_2 = \text{Variance} = V(x) = (\text{s.d.})^2$
$\mu_3 = \frac{1}{N} \sum (x - \bar{x})^3$ for raw data
$\mu_3 = \frac{1}{N} \sum f (x - \bar{x})^3$ for frequency data
$\mu_4 = \frac{1}{N} \sum (x - \bar{x})^4$ for raw data
$\mu_4 = \frac{1}{N} \sum f (x - \bar{x})^4$ for frequency data
(2) Raw Moment:
In many cases it is very difficult to calculate moments about the actual mean, particularly when the actual mean is
not a whole number. In such cases we first compute moments about an arbitrary origin ‘A’ and then convert these
moments into moments about the actual mean. These are called ‘raw moments’ and are denoted by $\mu'_r$.
Thus, the rth moment about any point A is termed the rth raw moment, denoted by $\mu'_r$, and defined as
$\mu'_r = \frac{1}{N} \sum (x - A)^r$ for raw data
$\mu'_r = \frac{1}{N} \sum f (x - A)^r$ for frequency data

The first four raw moments are
$\mu'_1 = \frac{1}{N} \sum (x - A)$ for raw data
$\mu'_1 = \frac{1}{N} \sum f (x - A)$ for frequency data
$\mu'_2 = \frac{1}{N} \sum (x - A)^2$ for raw data
$\mu'_2 = \frac{1}{N} \sum f (x - A)^2$ for frequency data
$\mu'_3 = \frac{1}{N} \sum (x - A)^3$ for raw data
$\mu'_3 = \frac{1}{N} \sum f (x - A)^3$ for frequency data
$\mu'_4 = \frac{1}{N} \sum (x - A)^4$ for raw data
$\mu'_4 = \frac{1}{N} \sum f (x - A)^4$ for frequency data

Note that $\mu'_1 = \bar{x} - A$
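Central and raw moments differ only in the point about which the deviations are taken, so one function can compute both. The sketch below works on a hypothetical discrete frequency distribution; the function name `moment` and the choice A = 1 are illustrative.

```python
def moment(values, freqs, point, r):
    """rth moment about `point` for frequency data: (1/N) * sum(f * (x - point)^r)."""
    n_total = sum(freqs)
    return sum(f * (x - point) ** r for x, f in zip(values, freqs)) / n_total


# Hypothetical discrete frequency distribution.
values = [1, 2, 3]
freqs = [1, 2, 1]
mean = moment(values, freqs, 0, 1)      # the first moment about zero is the mean: 2.0

# Central moments: deviations taken from the mean.
central = [moment(values, freqs, mean, r) for r in range(5)]
print(central)                          # [1.0, 0.0, 0.5, 0.0, 0.5]

# Raw moments: deviations taken from an arbitrary origin A.
A = 1
raw = [moment(values, freqs, A, r) for r in range(1, 5)]
print(raw[0], mean - A)                 # 1.0 1.0, i.e. the first raw moment equals mean - A
```

For raw (ungrouped) data, the same function can be used with every frequency set to 1.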


(3) Moments about origin: The rth moment about the origin, i.e. zero, is denoted by $\nu_r$ and defined as
$\nu_r = \frac{1}{n} \sum x^r$ for raw data
$\nu_r = \frac{1}{N} \sum f x^r$ for frequency data

$\nu_0 = 1$
The first four moments about the origin are
$\nu_1 = \frac{1}{n} \sum x$ for raw data
$\nu_1 = \frac{1}{N} \sum f x$ for frequency data
$\nu_1 = \bar{x}$
$\nu_2 = \frac{1}{n} \sum x^2$ for raw data
$\nu_2 = \frac{1}{N} \sum f x^2$ for frequency data
$\nu_3 = \frac{1}{n} \sum x^3$ for raw data
$\nu_3 = \frac{1}{N} \sum f x^3$ for frequency data
$\nu_4 = \frac{1}{n} \sum x^4$ for raw data
$\nu_4 = \frac{1}{N} \sum f x^4$ for frequency data


Property of Moments
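1. Moments about the mean are independent of change of origin and dependent on change of scale.
Let $d = \frac{x - \bar{x}}{h}$, where h is the class width; then the first four moments about the mean are:
$\mu_1 = h \, \frac{\sum f d}{N} = 0$
$\mu_2 = h^2 \, \frac{\sum f d^2}{N}$
$\mu_3 = h^3 \, \frac{\sum f d^3}{N}$
$\mu_4 = h^4 \, \frac{\sum f d^4}{N}$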
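2. Central moments in terms of raw moments

$$\mu_1 = 0$$

$$\begin{aligned}
\mu_2 &= \frac{1}{N}\sum (x - \bar{x})^2 \\
&= \frac{1}{N}\sum \big[(x - A) - (\bar{x} - A)\big]^2 \\
&= \frac{1}{N}\sum (x - A)^2 - 2(\bar{x} - A)\,\frac{1}{N}\sum (x - A) + (\bar{x} - A)^2 \\
&= \mu'_2 - 2(\mu'_1)^2 + (\mu'_1)^2 \\
\mu_2 &= \mu'_2 - (\mu'_1)^2
\end{aligned}$$

$$\begin{aligned}
\mu_3 &= \frac{1}{N}\sum (x - \bar{x})^3 \\
&= \frac{1}{N}\sum \big[(x - A) - (\bar{x} - A)\big]^3 \\
&= \frac{1}{N}\sum (x - A)^3 - 3(\bar{x} - A)\,\frac{1}{N}\sum (x - A)^2 + 3(\bar{x} - A)^2\,\frac{1}{N}\sum (x - A) - (\bar{x} - A)^3 \\
&= \mu'_3 - 3\mu'_2\mu'_1 + 3\mu'_1(\mu'_1)^2 - (\mu'_1)^3 \\
\mu_3 &= \mu'_3 - 3\mu'_2\mu'_1 + 2(\mu'_1)^3
\end{aligned}$$

$$\begin{aligned}
\mu_4 &= \frac{1}{N}\sum (x - \bar{x})^4 \\
&= \frac{1}{N}\sum \big[(x - A) - (\bar{x} - A)\big]^4 \\
&= \frac{1}{N}\sum (x - A)^4 - 4(\bar{x} - A)\,\frac{1}{N}\sum (x - A)^3 + 6(\bar{x} - A)^2\,\frac{1}{N}\sum (x - A)^2 \\
&\qquad - 4(\bar{x} - A)^3\,\frac{1}{N}\sum (x - A) + (\bar{x} - A)^4 \\
&= \mu'_4 - 4\mu'_3\mu'_1 + 6\mu'_2(\mu'_1)^2 - 4\mu'_1(\mu'_1)^3 + (\mu'_1)^4 \\
\mu_4 &= \mu'_4 - 4\mu'_3\mu'_1 + 6\mu'_2(\mu'_1)^2 - 3(\mu'_1)^4
\end{aligned}$$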
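The three conversion formulas derived above can be sketched as a small Python function. The name `central_from_raw` is hypothetical, and the sums used below (N = 100, Σfd = 50, Σfd² = 1970, Σfd³ = 2948, Σfd⁴ = 86752, with d measured from an arbitrary origin) are illustrative figures.

```python
def central_from_raw(m1, m2, m3, m4):
    """Convert raw moments mu'_1..mu'_4 (about any origin A) to central moments."""
    mu2 = m2 - m1 ** 2
    mu3 = m3 - 3 * m2 * m1 + 2 * m1 ** 3
    mu4 = m4 - 4 * m3 * m1 + 6 * m2 * m1 ** 2 - 3 * m1 ** 4
    return 0.0, mu2, mu3, mu4   # the first central moment is always zero


# Illustrative sums of powers of deviations from an arbitrary origin (assumed).
N = 100
raw = [s / N for s in (50, 1970, 2948, 86752)]   # mu'_1 .. mu'_4

mu1, mu2, mu3, mu4 = central_from_raw(*raw)
print(round(mu2, 4), round(mu3, 4), round(mu4, 4))
```

With these sums the raw moments are 0.5, 19.7, 29.48 and 867.52, and the central moments come out as approximately 19.45, 0.18 and 837.9225.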

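3. Raw moments in terms of central moments:

$$\mu'_1 = \bar{x} - A$$

$$\begin{aligned}
\mu'_2 &= \frac{1}{N}\sum (x - A)^2 \\
&= \frac{1}{N}\sum \big[(x - \bar{x}) + (\bar{x} - A)\big]^2 \\
&= \frac{1}{N}\sum (x - \bar{x})^2 + 2(\bar{x} - A)\,\frac{1}{N}\sum (x - \bar{x}) + (\bar{x} - A)^2 \\
\mu'_2 &= \mu_2 + (\mu'_1)^2
\end{aligned}$$

$$\begin{aligned}
\mu'_3 &= \frac{1}{N}\sum (x - A)^3 \\
&= \mu_3 + 3\mu_2\mu'_1 + 3\mu_1(\mu'_1)^2 + (\mu'_1)^3 \\
\mu'_3 &= \mu_3 + 3\mu_2\mu'_1 + (\mu'_1)^3
\end{aligned}$$

$$\begin{aligned}
\mu'_4 &= \frac{1}{N}\sum (x - A)^4 \\
&= \mu_4 + 4\mu_3\mu'_1 + 6\mu_2(\mu'_1)^2 + 4\mu_1(\mu'_1)^3 + (\mu'_1)^4 \\
\mu'_4 &= \mu_4 + 4\mu_3\mu'_1 + 6\mu_2(\mu'_1)^2 + (\mu'_1)^4
\end{aligned}$$

Thus,

$$\begin{aligned}
\bar{x} &= \mu'_1 + A \\
\mu'_2 &= \mu_2 + (\mu'_1)^2 \\
\mu'_3 &= \mu_3 + 3\mu_2\mu'_1 + (\mu'_1)^3 \\
\mu'_4 &= \mu_4 + 4\mu_3\mu'_1 + 6\mu_2(\mu'_1)^2 + (\mu'_1)^4
\end{aligned}$$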

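Moments about Origin in terms of Central Moments

Putting A = 0 in the relations for raw moments gives the moments about the origin:

$$\begin{aligned}
\nu_1 &= \bar{x} \\
\nu_2 &= \mu_2 + \nu_1^2 \\
\nu_3 &= \mu_3 + 3\mu_2\nu_1 + \nu_1^3 \\
&= \mu_3 + 3\nu_2\nu_1 - 2\nu_1^3 \\
\nu_4 &= \mu_4 + 4\mu_3\nu_1 + 6\mu_2\nu_1^2 + \nu_1^4
\end{aligned}$$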

Central moments in terms of moments about origin

$$\begin{aligned}
\mu_1 &= 0 \\
\mu_2 &= \nu_2 - \nu_1^2 \\
\mu_3 &= \nu_3 - 3\nu_2\nu_1 + 2\nu_1^3 \\
\mu_4 &= \nu_4 - 4\nu_3\nu_1 + 6\nu_2\nu_1^2 - 3\nu_1^4
\end{aligned}$$
Measure of Skewness based on Moments
Beta coefficient for the measurement of skewness:
The coefficient $\beta_1 = \dfrac{\mu_3^2}{\mu_2^3}$ is a relative measure of skewness. Since $\beta_1$ can never be negative, it does not show the direction of skewness; the type of skewness is therefore decided by the sign of the third central moment ($\mu_3$).
If $\mu_3$ is positive ($\mu_3 > 0$), the distribution is positively skewed.
If $\mu_3$ is negative ($\mu_3 < 0$), the distribution is negatively skewed.
If $\mu_3$ is equal to zero, the distribution is symmetric.
Gamma coefficient for the measurement of skewness:

Karl Pearson defined four coefficients ($\beta_1$, $\beta_2$, $\gamma_1$ and $\gamma_2$) based upon the first four moments about the mean. For skewness,

$$\gamma_1 = \pm\sqrt{\beta_1} = \frac{\mu_3}{\sigma^3}$$

where the sign is that of $\mu_3$. These coefficients are pure numbers, and they provide information about the shape of the curve obtained from the frequency distribution.
If $\gamma_1 < 0$, skewness is negative.
If $\gamma_1 > 0$, skewness is positive.
If $\gamma_1 = 0$, the distribution is symmetric (no skewness).

Kurtosis
The term has its origin in a Greek word meaning "bulginess." In statistics it is the degree of flatness or ‘peakedness’ in the region of
the mode of a frequency curve. It is measured relative to the ‘peakedness’ of the normal curve. It tells us the extent to
which a distribution is more peaked or flat-topped than the normal curve.
If the curve is more peaked than the normal curve, it is called ‘leptokurtic’. In this case items are more clustered about
the mode.
If the curve is more flat-topped than the normal curve, it is called ‘platykurtic’. The normal curve itself is known as
‘mesokurtic’.

β2 gives the measure of kurtosis, i.e. the flatness or peakedness in the region of the mode.

$$\beta_2 = \frac{\mu_4}{\mu_2^2} = \frac{\mu_4}{\sigma^4}$$

If β2 = 3, the curve is normal, which is neither flat nor peaked, i.e. mesokurtic.
If β2 > 3, the curve is more peaked than the normal curve and is called leptokurtic.
If β2 < 3, the curve is flatter than the normal curve and is called platykurtic.

The measure of kurtosis is also sometimes represented by $\gamma_2$:

Gamma coefficient, $\gamma_2 = \beta_2 - 3$
If
$\gamma_2 = 0$, the curve is mesokurtic
$\gamma_2 > 0$, the curve is leptokurtic
$\gamma_2 < 0$, the curve is platykurtic

Distinction between Dispersion and Skewness

1. Dispersion: deals with the variability of individual items in a distribution.
   Skewness: deals with the symmetry of the distribution of values on either side of the mode.
2. Dispersion: is a type of average, since it is the mean of deviations around a central value.
   Skewness: is known by the use of various types of averages, such as the mode, median and arithmetic mean, but is not itself an average.
3. Dispersion: helps in finding out the degree of variability in the data.
   Skewness: helps in finding out whether the concentration is in the higher values or in the lower values.
4. Dispersion: shows how far the mean is representative of the values.
   Skewness: helps in judging whether the distribution is normal.
5. Dispersion: changes the shape of a frequency distribution in general.
   Skewness: indicates the extent to which dispersion on the two sides of the mode varies in the arrangement of frequencies.
EXERCISE
1. Define Statistics as Statistical Method.
2. Discuss different types of statistical data by giving examples.
3. Explain in brief Measurement Scale with proper examples.
4. State the qualities offered by the different measurement scales.
5. Discuss the different types of frequency distribution of a variable.
6. Explain briefly the various methods that are used for graphical representation of frequency distribution.
7. Define: Frequency density, relative frequency, cumulative frequency.
8. What are the measures of central tendency? Why are they called measures of central tendency?
9. What are the different measures of central tendency? Discuss the essentials of an ideal average.
10. Distinguish between simple and weighted average and state the circumstances under which the latter should
be employed.
11. Define Arithmetic mean (AM), Geometric Mean (GM) and Harmonic mean (HM). Under which circumstances are AM,
GM and HM used? What relation exists between AM, GM and HM?
12. Define Arithmetic mean (AM). State properties of Arithmetic Mean.
13. Which is the ideal measure of central tendency? Why?
14. Give empirical relation between Mean, Median and Mode.
15. Discuss the relation between the median, quartiles, deciles and percentiles.
16. For each of the following, identify which scale of measurement is applicable.
1) Brand name or numbers on readymade garments. Nominal
2) Percentage of male and female coffee drinkers in Baroda City. Nominal
3) Ranking of teams in a tournament. Ordinal
4) Occupational status of people of India. Ordinal
5) Rating given by the UGC to different Universities. Interval
6) Daily temperature report given by news channel. Interval
7) Rank given to the football teams of various countries. Ordinal
8) Age of respondents. Ratio
9) Daily sales of different products in a departmental store. Ratio
10) DA given by the government in 2017 to government employees. Interval
11) When the respondents are asked to rank local shopping malls so that their first choice is 1, their second
choice is 2, and so forth. Ordinal scale
12) Coding household income into “above Rs. 1,00,000”, “between Rs. 50,000 and Rs. 1,00,000”, and “below
Rs. 50,000”. Ordinal scale
13) The Fahrenheit temperature scale. Interval scale
14) Quality grades such as “good”, “better” and “best”. Ordinal scale
15) Indication of approximate age by checking the appropriate age category (1) 0 to 18 (2) 19 to 34 (3) 35
and over. Ordinal scale
16) Ranking of school students – 1st, 2nd, 3rd, etc. Ordinal scale
17) Ratings in restaurants. Ordinal scale
18) Assessing the degree of agreement. Ordinal scale
17. In each of the following cases, explain whether the description applies to mean, median or mode.
1) Can be calculated from a frequency distribution with open end classes. Median or Mode
2) The values of all items are taken into consideration in the calculation. Mean
3) The values of extreme items do not influence the average. Median
4) In a distribution with a single peak and moderate skewness to the right, it is closer to the concentration
of the distribution. Median
5) The distribution has a wide range of variation. Median
18. Which average would be more suitable in the following cases?
1) Average size of ready-made garments. Mode
2) Average intelligence of students in a class. Median
3) Average production per shift in a factory. Arithmetic mean
4) Average rate of growth of population per decade. Geometric mean
5) Average speed of a typist. Harmonic mean
6) When depreciation is charged by the diminishing balance method and an average rate of depreciation is to be
calculated. Geometric mean
7) The distance covered is fixed but speeds are varying and an average speed is to be calculated. Harmonic
mean
8) The quantities having units are in ratios. Harmonic mean.
19. What is dispersion? Explain what you understand by absolute and relative measures of dispersion.
20. What do you understand by standard deviation? State the properties of Standard Deviation.
21. Explain the term variation. Comment on its relative measure along with its uses.
22. State giving reasons whether the following statements are true or false.
(a) Standard deviation can never be Negative.
(b) The sum of squared deviation measured from median is least.
(c) The sum of absolute deviation measured from mean is least.
23. Explain the concept of skewness. Draw the sketch of a skewed frequency distribution and show the position of
the mean, median and mode when the distribution is asymmetric.
24. Explain the concept of positive and negative skewness.
25. Discuss the various absolute and relative measures of skewness.
26. Distinguish between skewness and kurtosis. Bring out their importance in describing a frequency distribution.
27. What are raw moments and central moments? Give the formulas relating the first four moments of both
types.
28. What is kurtosis? Discuss the moment-based measure of kurtosis.
29. From the following data find
(i) Mean, median, mode, quartiles, second decile and 82nd percentile. Also interpret your answers.
(ii) Absolute and Relative Measures of dispersion.
(iii) Measures of skewness and interpret.
(1)
Overtime hours    84   85   86   87   88   89   90   91   92   93
Weeks              2    2    0    1    4    3    2    5    4    7
(2)
No. of computers shipped   22-32   33-43   44-54   55-65   66-76   77-87
No. of days                  4       4       9      14       6       5
(3)
Height (cm)     59.1-60.4   60.4-61.7   61.7-63.0   63.0-64.3   64.3-65.6   65.6-66.9   66.9-68.2
No. of plants       5           4           9          13           8           7           4

30. A candidate obtains the following percentages in an examination. English 46%, Mathematics 67%, Sanskrit
72%, Economics 58%, Political science 53%. It is agreed to give double weights to marks in English and
Mathematics as compared to other subjects. What is the average mark?

31. Calculate the mean, median, mode, quartiles, third decile and 82nd percentile of the following data, which relate to
the service time (in minutes) per customer for 7 customers at a railway reservation counter: 3.5, 4.5,
3, 3.8, 5.0, 5.5, 4

32. Calculate the median and mode of the following data that relates to the number of patients examined per
hour in the outpatient ward (OPD) in a hospital: 10, 12, 15, 20, 12, 24, 17, 18

33. The mean monthly salary paid to all employees in a company is Rs.16000. The mean monthly salaries paid to
technical and non-technical employees are Rs.18000 and Rs. 12,000 respectively. Determine the percentage of
technical and non-technical employees in the company.

34. Given the following frequency distribution with some missing frequencies:
Class 10-20 20-30 30-40 40-50 50-60 60-70 70-80
frequency 185 ---- 34 180 136 ---- 50
If the total frequency is 685 and the median is 42.6, find the missing frequencies.
35. Find the missing information in the following table:
          A      B      C      Combined
Number    10     8      ----   24
Mean      20     ----   6      15
36. The following table gives the weekly wages in rupees of workers in a certain commercial organization. The
frequency of the class interval 49-52 is missing.
Weekly wages(Rs.) 40-43 43-46 46-49 49-52 52-55
No. of workers 31 58 60 ---- 27
It is known that the mean of the above frequency distribution is Rs. 47.2. Find the missing frequency.

37. Calculate mean, mode and median from the following data of the heights (in inches) of a group of students: 61,
62, 63, 61, 63, 64, 60, 65, 63, 64, 65, 65, 66, 64
Now suppose that a group of students whose heights are 60, 66, 59, 68, 67 and 70 inches, is added to the
original group. Find the mean, mode and median of the combined group.
38. An incomplete frequency distribution is given below.
Marks 0-10 10-20 20-30 30-40 40-50 50-60 60-70 Total
Frequency 4 16 - - - 6 4 230
Find the three missing frequencies of the table, given that median = 33.5 and mode = 34. Also calculate the
mean using empirical relation between mean, median and mode.
39. The average daily wage of all workers in a factory is Rs. 444. If the average daily wages paid to male and female
are Rs. 480 and Rs.360 respectively, find the percentage of male and female workers employed by the factory.
(Ans: 70% and 30%)

40. Calculate the missing frequency for the following data given that the mode is Rs. 68.44
Earnings (Rs.) 66-67 67-68 68-69 69-70 70-71 71-72
No.of persons. 15 24 --- 20 14 11
(Ans : 40)
41. Find the missing figures:
a) Mean = ? (3 Median – Mode )
b) Mean – Mode = ? (Mean – Median)
c) Median = Mode + ?(Mean – Median)
d) Mode = Mean - ? (Mean – Median)
42. Doctors X and Y measured the systolic blood pressure of two groups of men, all of the same age, and the results were:
Doctors No. of Men Mean Systolic Blood Pressure S.D
X 113 159 mm 22.4 mm
Y 121 149 mm 20.0 mm
Find the mean and S.D of the two groups taken together.

43. From the following table compute the missing values:
Sub group Number A.M Variance
I - 25 9
II 250 - 16
III 300 15 -
Combined 750 16 51.73
[Ans: $N_1 = 200$, $\bar{X}_2 = 10$, $\sigma_3^2 = 25$]
44. The following table gives the distribution of wages in the two branches of a factory:
Monthly Number of workers
wages (Rs) Branch A Branch B
100-150 167 63
150-200 207 93
200-250 253 157
250-300 205 105
300-350 168 82
a) Find mean and standard deviation for the two branches for the wages separately.
b) Which branch pays higher average wages?
c) Which branch has greater variability in wages in relation to the average wages?
d) What is the average monthly wage of the factory as a whole?
e) What is the variation of wages of all the workers in the two branches A and B taken together?
45. The following table gives the distribution of income of households based on hypothetical data:
Income (Rs.)        Percentage of households
Under 100           7.2
100-199             11.7
200-299             12.1
300-399             14.8
400-499             15.9
500-599             14.9
600-699             10.4
700-999             9.0
1000 and above      4.0
Compute a suitable measure of dispersion. Also find its relative measure.
46. Calculate an appropriate Karl Pearson’s coefficient of skewness from the following data.
Classes    Frequency
40-60      25
30-40      15
20-30      12
15-20      8
10-15      6
5-10       4
3-5        3
0-3        2
(Hint: since the classes are of unequal width, the median-based coefficient should be computed.)
(ANS: Mean =31.13, median =31.67, S.D = 16.06 and Sk = -0.1)
47. The following facts are gathered before and after an industrial dispute.
Before dispute After dispute
No. of workers employed 515 500
Mean wage Rs. 49.5 Rs. 52.7
Median Wage Rs. 52.80 Rs. 50.00
Variance of wage (Rs.)² 121.00 (Rs.)² 144.00
Compare the position before and after the dispute in respect of
(i) Total wages (ii) modal wages (iii) standard deviation (iv) skewness
(ANS:
               Before dispute    After dispute
Total wages    Rs. 25492.50      Rs. 26849.75
Modal wages    Rs. 59.40         Rs. 44.50
C.V.           22.22             22.74
Skewness       -0.90             0.69)
48. By using the quartiles, find a measure of skewness for the following distribution.
Annual sales (Rs. ’000)    No. of firms
Less than 20               30
Less than 30               225
Less than 40               465
Less than 50               580
Less than 60               634
Less than 70               644
Less than 80               650
Less than 90               665
Less than 100              680
(ANS : Q1 = 27.18, Q3 = 43.90, Median = 34.79, skewness 0.0903)
49. Calculate the first four moments about mean for the following distribution.
Also calculate beta coefficients, and comment upon the nature of skewness and Kurtosis.
Profit (Rs. In lakh) 10-20 20-30 30-40 40-50 50-60
Number of companies 18 20 30 22 10
(ANS: 152.04, 21.312, 47327.51, Sk = 0.0114, kurtosis= 2.047)
50. Karl Pearson’s measure of skewness of a distribution is 0.5. The median and mode of the distribution are
respectively, 42 and 32. Find (i) mean (ii) S.D. (iii) Coefficient of variation. (ANS: 47, 30, 63.83)
51. The first three moments of a distribution about the value 3 of a variable are 2, 10 and 30 respectively. Comment
upon the nature of the distribution.
52. The following measures were computed for a frequency distribution:
Mean = 50, coefficient of Variation = 35% and Karl Pearson's Coefficient of Skewness = - 0.25.
Compute Standard Deviation, Mode and Median of the distribution. (ANS: 17.5, 54.375, 51.45833)
53. If the first quartile is 142 and the semi-interquartile range is 18, find the median, assuming the distribution to
be symmetrical. (Ans: 160)
54. In a frequency distribution the coefficient of skewness based on quartiles is 0.6. If the sum of the upper and
lower quartiles is 100 and the median is 38, find the values of the upper and lower quartiles. (ANS: 70, 30)
55. In a distribution the difference of the two quartiles is 15, their sum is 35 and the median is 20. Find the
coefficient of skewness. (ANS: -0.33)
56. Find coefficient of skewness from the following data and show which section is more skewed.
Income(Rs.) 55-58 58-61 61-64 64-67 67-70
Section A 12 17 23 18 11
Section B 20 22 25 13 4
(ANS: Sk(A) = -0.0061, Sk(B) = -0.06, Section B is more skewed)
57. The first four moments of a distribution about the origin are 1, 4, 10 and 46. Obtain the first four central moments and
comment upon the nature of the distribution. (ANS: mean = 1, s.d. = 1.732, central moments = 0, 3, 0, 27, sk = 0,
kurtosis = 3, distribution is symmetric and mesokurtic, hence normal)
58. If β1 = +1, β2 = 4 and variance = 9, find the values of µ3 and µ4 and comment upon the nature of the distribution. (ANS:
27, 324)
59. For a mesokurtic distribution the first moment about 7 is 23 and the second moment about origin is 1000. Find
the coefficient of variation and the fourth moment about mean. (Ans: 33.33, 30000)
60. For a distribution, the mean is 10, the variance is 16, β1 is +1 and β2 is 4. Obtain the first four moments about the origin.
Comment upon the nature of the distribution. (ANS: 10, 116, 1544, 23184)
61. The following data are given to an economist for the purpose of economic analysis. The data refer to the length
of a certain type of battery:
$N = 100$, $\sum fd = 50$, $\sum fd^2 = 1970$, $\sum fd^3 = 2948$ and $\sum fd^4 = 86752$, in which $d = (X - 48)$.
Do you think the distribution is platykurtic? Also comment on the skewness. (ANS: β2 = 2.214)
62. Give any three measures of skewness of a frequency distribution. Explain briefly with suitable diagrams the
term skewness.
63. Distinguish between Skewness and Kurtosis.
64. Explain briefly how the measures of skewness and kurtosis can be used in describing a frequency distribution.
65. Define moments. “A frequency distribution can be described almost completely by the first four moments and
two measures based on moments.” Examine the statement.
