Unit 1-2
Editor
Deekshant Awasthi
Published by:
Department of Distance and Continuing Education
Campus of Open Learning, School of Open Learning,
University of Delhi, Delhi-110007
Printed by:
School of Open Learning, University of Delhi
IT SKILLS AND DATA ANALYSIS-I
Reviewer
Ms. Asha Yadav
Disclaimer
Printed at: Taxmann Publications Pvt. Ltd., 21/35, West Punjabi Bagh,
New Delhi - 110026 (500 Copies, 2024)
UNIT-I
Lesson 1: Introduction to Statistics
UNIT-II
Lesson 4: Measures of Central Tendency
Lesson 6: Moments
Department of Distance & Continuing Education, Campus of Open Learning,
School of Open Learning, University of Delhi
1
Introduction to Statistics
STRUCTURE
1.1 Introduction
1.2 Types of Data
1.3 Collection of Data
1.4 Collection of Primary Data
1.5 Sources of Secondary Data
1.6 Quantitative Data and Qualitative Data
1.7 Levels of Measurements
1.8 Presentation of Data
1.9 Exercise
1.1 Introduction
Present-day society is essentially information-oriented: we need information in
almost every field. A set of information in the form of numerical figures is known as data.
The numerical figures may be, for example, about imports and exports of different
countries, per-capita national income, minimum and maximum temperature, food production,
increase in population, per-capita expenditure, income tax, sales tax and property tax, etc.
Thus, data may be defined as figures, numerical or otherwise, collected with
a definite purpose and from which meaningful information can be obtained. The branch
of study which deals with data is known as Statistics.
Statistics is a branch of science which deals with collecting, organising, summarising,
presenting and analysing data, drawing valid conclusions, and thereafter making
reasonable decisions on the basis of such analysis.
The word ‘statistics’ seems to have been derived from the Latin word ‘status’, the Italian
word ‘statista’ or the German word ‘statistik’, each of which means a “political state”. In
ancient times, governments used to collect information regarding the “population” and
“property” of the country. Statistics was then confined to the affairs of the state, but
now it embraces almost every sphere of human activity. It finds
wide application in almost all sciences, social as well as physical, such
as biology, psychology, education, economics, business management, etc.
It has become indispensable in all phases of human endeavor.
(i) Availability of Finance: This is one of the factors which influence
the selection of the method of data collection. When the financial
resources at the disposal of the investigator are scanty, he will have
to leave aside expensive methods, even though they are better than
others which are comparatively cheap.
(ii) Availability of Time: Some methods involve a long duration of enquiry,
while with others the enquiry can be conducted in a comparatively
shorter duration. The time at the disposal of the investigator thus
affects the selection of the technique by which data is to be collected.
conducted by these bodies and these findings are not published and are
usually meant for the consumption of their members only.
significant variations in different regions. Further, the data may not cover
suitable periods: for a monthly study of a phenomenon, yearly figures are
inadequate. Again, the degree of accuracy achieved in the data may be
found to be inadequate for the purpose of the investigation in which they
are proposed to be used. Thus, it is very risky to use statistics collected
by other people unless they have been thoroughly scrutinized and found
reliable, suitable and adequate.
1.7 Levels of Measurements
The scale (or level) of measurement describes the nature of the information
carried by the values assigned to a variable. The best-known classification
has four levels, or scales, of measurement: nominal, ordinal,
interval and ratio.
1. Nominal Scale: The nominal scale deals with variables that are
non-numeric, or whose numbers serve only as labels without any numerical
value. That is, the measurement is said to be on a nominal scale if the
observations are classified according to some attribute or quality. For
example: married or unmarried; literate or illiterate; male or female;
nationality; religion; etc. In nominal scaling the categories are defined
in such a way that they are mutually exclusive and collectively exhaustive.
Since the nominal scale has no order and no arithmetic origin, it is said
to be the least powerful of the four scales.
2. Ordinal Scale: The scale is said to be ordinal when some definite
order or rank is given in addition to the nominal classification. For
example, data collected on the basis of intelligence may be classed as
genius, above average, average, dull, etc., and ranked as 1, 2, 3
and so on. In ordinal scaling the values both denote qualitative
differences among the various categories and rank them in some
meaningful order of preference. However, the ordinal scale does not give
any indication of the magnitude of the difference between ranks. The
ordinal scale therefore suits variables where the order matters but the
differences do not. Suppose A is better than B, which is better than C,
and so on. Is A four times better than D? Is it two times better? The
scale cannot say: the order is meaningful, but the differences are not.
3. Interval Scale: In the interval scale the data are represented on
a scale of definite intervals. The interval scale is interpretable, i.e.
it not only classifies individuals into categories and determines the
order of these categories, it also measures the magnitude of the
differences among the individuals, and arithmetic operations can be
performed on the data collected.
An example is temperature, since the difference between successive values
is the same: the difference between 45 and 30 degrees is a measurable 15
degrees.
1.9 Exercise
1. Divide your class into five groups and ask them to collect data from
day to day life.
2. Classify the data collected in question no. 1 into primary and secondary
data.
3. Discuss the meaning and scope of Statistics.
4. What do you understand by the word ‘Statistics’ in its (i) singular form
and (ii) plural form?
5. Define some fundamental characteristics of Statistics.
6. What are primary and secondary data? Which of the two is more
reliable and why?
7. Explain the purpose and methods of classification of data.
8. Distinguish between primary and secondary data. What are the various
methods used in collecting primary data? Examine the relative merits
and limitations of each method.
9. “In collection of statistical data commonsense is the chief requisite
and experience the chief teacher.” Discuss the above statement with
comments.
10. Mention the different kinds of statistical methods generally used in
investigations. Are there any fields of enquiry where these methods
cannot be used?
11. “Though figures cannot lie, yet liars can figure”. Expand the above
statement so as to explain its bearing on the use of secondary
statistical data.
12. How will you organise an investigation into the handloom weaving
industry of Uttar Pradesh? Prepare a questionnaire for the purpose.
13. How far do the results of statistical investigations depend upon correct
sampling? Compare the methods used to secure representative data.
14. State and explain the law of statistical regularity. Discuss the methods
generally used in sampling.
15. Compare the different methods used in the collection of statistical
data. Explain the importance of determining a statistical unit in the
collection of data.
16. Distinguish between a census and a sample enquiry and briefly discuss
their comparative advantages. Which of these methods would you
prefer for calculating the total wages of workers in a given industry?
2
Frequency Distribution
STRUCTURE
2.1 Introduction
2.2 Frequency Distribution
2.3 Frequency Distribution of an Ungrouped Data
2.4 Procedure of Arranging the Given Data Into Class Intervals
2.5 When the Mid Value of the Class and the Class Size are Given
2.6 Cumulative Frequency
2.7 Types of Cumulative Frequencies
2.8 Miscellaneous Questions
2.9 Exercise
2.1 Introduction
Classification of data helps in organizing raw data into smaller groups, which
facilitates comparison. It helps in studying the relationship between several
characteristics and facilitates further statistical treatment.
The primary rules that should be followed while classifying are:
1. The classes should be unambiguously defined.
2. Every observation must belong to one class or the other, i.e. the classes should be exhaustive.
3. As far as possible, classes should be of equal width.
4. The number of classes should be neither too large nor too small.
Thus, there are two types of frequency distributions of grouped data:
1. Inclusive form (Discontinuous form): A frequency distribution in
which both the lower and the upper limit of each class are included in the
class.
2. Exclusive form (Continuous form): A frequency distribution in which
the upper limit of each class is excluded and the lower limit is included.
Example 2: The distances (in km) of 40 female engineers from their
residence to their place of work were found to be as follows:
5 3 10 20 25 11 13 7 12 31
19 10 12 17 18 11 32 17 16 2
7 9 7 8 3 5 12 15 18 3
12 14 2 9 6 15 15 7 6 12
Construct a grouped frequency distribution table with class size 5 for the
data given above, taking the first interval as 0-5 (5 not included). What
main features do you observe from this tabular representation?
Solution: Frequency distribution of above data in tabular form is as follows:
Distances (in km) Tally Marks Frequency
0–5 |||| 5
5–10 |||| |||| | 11
10–15 |||| |||| | 11
15–20 |||| |||| 9
20–25 | 1
25–30 | 1
30–35 || 2
Total 40
We observe that:
(i) The residences of 22 female engineers are within 5 to 15 km.
(ii) The residences of 4 female engineers are within 20 to 35 km.
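The tally in Example 2 can be cross-checked with a short Python sketch. The helper name `grouped_frequency` and its parameters are illustrative assumptions, not part of the lesson; it assumes exclusive classes, where the lower limit is included and the upper limit is excluded.

```python
def grouped_frequency(values, start, width, n_classes):
    """Count values in exclusive classes [start, start + width), and so on.

    Hypothetical helper written for this illustration only.
    """
    counts = [0] * n_classes
    for v in values:
        index = int((v - start) // width)   # class that the value falls into
        if 0 <= index < n_classes:
            counts[index] += 1
    return counts


# Distances (in km) from Example 2
distances = [
    5, 3, 10, 20, 25, 11, 13, 7, 12, 31,
    19, 10, 12, 17, 18, 11, 32, 17, 16, 2,
    7, 9, 7, 8, 3, 5, 12, 15, 18, 3,
    12, 14, 2, 9, 6, 15, 15, 7, 6, 12,
]

counts = grouped_frequency(distances, start=0, width=5, n_classes=7)
for i, f in enumerate(counts):
    print(f"{i * 5}-{(i + 1) * 5}: {f}")
```

Because the class index comes from integer division, a value such as 5 lands in 5-10 and not in 0-5, which is exactly the "5 not included" condition stated in the example.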
Example 3: The relative humidity (in %) of a certain city for a month
of 30 days was as follows:
98.1 98.6 99.2 90.3 86.5 95.3 92.9 96.3 94.2 95.1
89.2 92.3 97.1 93.5 92.7 95.1 97.2 93.3 95.2 97.3
96.2 92.1 84.9 90.2 95.7 98.3 97.3 96.1 92.1 89
(i) Construct a grouped frequency distribution table with classes 84–86,
86–88 etc.
(ii) Which month or season do you think this data is about?
(iii) What is the range of this data?
Solution:
(i) The frequency distribution of the above data in tabular form is as follows:
Relative Humidity (in %) Frequency
84–86 1
86–88 1
88–90 2
90–92 2
92–94 7
94–96 6
96–98 7
98–100 4
Total 30
(ii) This data is related to the rainy season.
(iii) Range = 99.2 – 84.9 = 14.3
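Parts (i) and (iii) of Example 3 can be reproduced with a few lines of Python. This is only a sketch: the variable names are illustrative, and the range is rounded to one decimal place because floating-point subtraction is not exact.

```python
from collections import Counter

# Relative humidity readings (in %) from Example 3
humidity = [
    98.1, 98.6, 99.2, 90.3, 86.5, 95.3, 92.9, 96.3, 94.2, 95.1,
    89.2, 92.3, 97.1, 93.5, 92.7, 95.1, 97.2, 93.3, 95.2, 97.3,
    96.2, 92.1, 84.9, 90.2, 95.7, 98.3, 97.3, 96.1, 92.1, 89,
]

# Range = largest observation minus smallest observation
data_range = round(max(humidity) - min(humidity), 1)

# Classes 84-86, 86-88, ..., 98-100 (width 2, upper limit excluded)
class_index = Counter(int((h - 84) // 2) for h in humidity)
table = [class_index[i] for i in range(8)]

print("Range:", data_range)
for i, f in enumerate(table):
    print(f"{84 + 2 * i}-{86 + 2 * i}: {f}")
```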
Example 4: The following data represents the lifetimes (in hours) of 20
bulbs produced by a factory:
235, 236, 238, 232, 230, 239, 235, 236, 237, 231, 240, 239, 233, 232,
240, 231, 239, 238, 233, 230.
Construct a frequency table for the given data. Use your table to answer
the following questions.
(i) How many bulbs had a lifetime of more than 235 hours?
(ii) How many bulbs had a lifetime of less than 234 hours?
(iii) What percent of the bulbs had a lifetime of more than 238 hours?
(iv) The manufacturer claimed that 70% of the bulbs produced by his
factory have a minimum lifetime of 236 hours. What percent of the
bulbs did not fulfil the claim?
Solution: Frequency Distribution of bulbs is as follows:
Lifetime (in Hours) Tally Marks Frequency
230 || 2
231 || 2
232 || 2
233 || 2
Solution:
Marks Obtained Tally Marks Frequency
0 || 2
1 | 1
2 |||| 4
3 |||| 5
4 |||| 4
5 ||| 3
6 ||| 3
7 ||| 3
8 || 2
9 ||| 3
10 0
Total 30
(i) Number of students who got pass marks = 18 Ans.
(ii) Number of students failed = 12 Ans.
(iii) Number of students who secured highest marks = 3 Ans.
(iv) Number of students who received more than 60% = 8 Ans.
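The four answers above can be read off the frequency table mechanically. The sketch below assumes a pass mark of 4 out of 10; that value is an assumption chosen because it reproduces answer (i), since the statement of the example is not available here. All variable names are illustrative.

```python
# Frequency table from the solution: marks obtained -> number of students
frequency = {0: 2, 1: 1, 2: 4, 3: 5, 4: 4, 5: 3, 6: 3, 7: 3, 8: 2, 9: 3, 10: 0}

PASS_MARK = 4    # assumption: not stated in the surviving text
MAX_MARKS = 10

total = sum(frequency.values())
passed = sum(f for mark, f in frequency.items() if mark >= PASS_MARK)
failed = total - passed

# Highest mark that at least one student actually obtained
highest_scored = max(mark for mark, f in frequency.items() if f > 0)
top_scorers = frequency[highest_scored]

# "More than 60%" means strictly more than 6 marks out of 10
above_60_percent = sum(
    f for mark, f in frequency.items() if mark * 100 > 60 * MAX_MARKS
)

print(total, passed, failed, top_scorers, above_60_percent)
```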
Sometimes the data is so large that it is inconvenient to list every mark
in the frequency distribution table. We then group the marks into
convenient intervals. Consider the following data of the marks obtained by
30 students: 84, 75, 93, 48, 60, 57, 61, 67, 53, 39, 50, 66, 81, 49, 54,
88, 78, 64, 35, 69, 46, 52, 73, 5
1. In class 54.5 – 64.5, the lower class boundary is 54.5 and the upper
class boundary is 64.5.
2. The class size is (64.5 – 54.5) = 10, and it is the same for each class.
3. Marks of 54.5 will be included in the class 54.5 – 64.5, while marks of
64.5 will be included in the next class, i.e. 64.5 – 74.5.
Class intervals of this type are called the continuous or exclusive form,
since the upper limit of a class is excluded from that class and is
included in the next class.
Class Mark: Class mark is the mid value of a particular class i.e., the
average of its class limits or class boundaries.
For example, for the class 35 – 44, the class mark = (35 + 44)/2 = 39.5
Class mark is the representative of its class.
Class Mark (or Mid-Value of the class) = (Upper class limit + lower
class limit)/2
Example 6: The following data gives marks out of 60 obtained by 30
students of a class in a test:
50, 22, 56, 47, 27, 37, 40, 16, 12, 33, 29, 49, 35, 15, 43, 29, 31, 22, 51,
27, 29, 27, 22, 18, 20, 11, 19, 31, 23, 58.
Arrange them in ascending order and present them as grouped data (i) in
inclusive form and (ii) in exclusive form.
Solution: Arranging the marks in ascending order, we get 11, 12, 15, 16,
18, 19, 20, 22, 22, 22, 23, 27, 27, 27, 29, 29, 29, 31, 31, 33, 35, 37, 40,
43, 47, 49, 50, 51, 56, 58.
(i) Inclusive Form
Frequency Distribution of Marks
Marks (Class Intervals) Tally Marks Frequency
11 – 20 |||| || 7
21 – 30 |||| |||| 10
31 – 40 |||| | 6
41 – 50 |||| 4
51 – 60 ||| 3
Total 30
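The inclusive-form table can be checked with a small Python sketch in which both class limits are counted as part of the class. The variable names are illustrative only.

```python
# Marks out of 60 from Example 6
marks = [
    50, 22, 56, 47, 27, 37, 40, 16, 12, 33, 29, 49, 35, 15, 43,
    29, 31, 22, 51, 27, 29, 27, 22, 18, 20, 11, 19, 31, 23, 58,
]

ordered = sorted(marks)   # ascending order, as asked in the example

# Inclusive classes: both the lower and the upper limit belong to the class
classes = [(11, 20), (21, 30), (31, 40), (41, 50), (51, 60)]
table = {
    (lower, upper): sum(lower <= m <= upper for m in marks)
    for lower, upper in classes
}

for (lower, upper), f in table.items():
    print(f"{lower} - {upper}: {f}")
```

Note that 20 is counted in 11 – 20 and 50 in 41 – 50, which is what distinguishes the inclusive form from the exclusive form.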
Represent the data in the form of a frequency distribution with class
size 10.
Solution: Minimum monthly wage = 804
Maximum monthly wage = 898
Range = Maximum monthly wage – Minimum monthly wage = 898 – 804 = 94
Size of the class interval = 10
Range / class size = 94/10 = 9.4
Number of class intervals = 10 (9.4 rounded up to the next whole number).
The minimum value = 804
The first class interval is 804 – 814
Thus, the other class intervals are
814 – 824, 824 – 834, 834 – 844, 844 – 854, 854 – 864, 864 – 874,
874 – 884, 884 – 894, 894 – 904.
So, we obtain the following frequency distribution table:
Frequency Distribution of Monthly Wages
Monthly Wages Tally Marks Frequency
804 – 814 |||| 5
814 – 824 | 1
824 – 834 ||| 3
834 – 844 |||| ||| 8
844 – 854 || 2
854 – 864 || 2
864 – 874 || 2
874 – 884 | 1
884 – 894 |||| 5
894 – 904 | 1
Total 30
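The step from the range to the list of class intervals can be sketched as follows. The function name is a hypothetical one chosen for this illustration. It uses integer division plus one, which agrees with rounding 9.4 up to 10 and also adds the extra class that the exclusive method needs when the maximum falls exactly on an upper limit.

```python
def exclusive_class_intervals(minimum, maximum, width):
    """Return (lower, upper) pairs of equal width covering minimum..maximum.

    Hypothetical helper; the upper limit of each class is treated as excluded.
    """
    n_classes = (maximum - minimum) // width + 1
    return [
        (minimum + i * width, minimum + (i + 1) * width)
        for i in range(n_classes)
    ]


intervals = exclusive_class_intervals(804, 898, 10)
print(len(intervals))
for lower, upper in intervals:
    print(f"{lower} - {upper}")
```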
Solution: The minimum and maximum weekly wages in the given raw
data are 62 and 168. It is given that 90 – 100 is one of the class intervals
and that the class size is the same throughout. So, the classes of equal size are:
60 – 70, 70 – 80, 80 – 90, 90 – 100, 100 – 110, 110 – 120, 120 – 130,
130 – 140, 140 – 150, 150 – 160, 160 – 170.
Frequency Distribution of Weekly Wages
Weekly Wages (in Rs.) Tally Marks Frequency
60 – 70 | 1
70 – 80 |||| 4
80 – 90 ||| 3
90 – 100 ||| 3
100 – 110 ||| 3
110 – 120 .... 0
120 – 130 ||| 3
130 – 140 || 2
140 – 150 | 1
150 – 160 |||| | 6
160 – 170 |||| 4
Total 30
Note: The above frequency table is prepared by the exclusive
method. The observation 150 is not included in the class 140 – 150 but
is included in the next class, 150 – 160. Similarly, 160 is included
in the class 160 – 170.
Example 9: The class marks of a distribution are 61, 66, 71, 76, 81, 86,
91, 96, 101, 106. Determine the class size, class limits and true class limits.
Solution: Here the class marks are uniformly spaced, so the class size
is the difference between any two consecutive class marks.
Therefore, class size = 66 – 61 = 5.
Let the lower limit of the first-class interval be a.
Then its upper limit = a + 5
Mid-value of the first interval = 61
Class mark = (Upper limit + Lower limit)/2
Then,
[a + (a + 5)]/2 = 61
a + (a + 5) = 122
2a + 5 = 122
2a = 117
a = 58.5
The first class interval is 58.5 – 63.5,
and the other class intervals are 63.5 – 68.5, 68.5 – 73.5,
73.5 – 78.5, 78.5 – 83.5, 83.5 – 88.5, 88.5 – 93.5, 93.5 – 98.5, 98.5 –
103.5 and 103.5 – 108.5.
Class Marks Class Interval
61 58.5 – 63.5
66 63.5 – 68.5
71 68.5 – 73.5
76 73.5 – 78.5
81 78.5 – 83.5
86 83.5 – 88.5
91 88.5 – 93.5
96 93.5 – 98.5
101 98.5 – 103.5
106 103.5 – 108.5
Since the classes are exclusive, the true class limits are the same as the
class limits. The lower class limits are 58.5, 63.5, 68.5, 73.5, 78.5, 83.5,
88.5, 93.5, 98.5 and 103.5.
The upper class limits are 63.5, 68.5, 73.5, 78.5, 83.5, 88.5, 93.5, 98.5,
103.5 and 108.5.
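The algebra in Example 9 amounts to moving half a class size to either side of each class mark. A minimal Python sketch, assuming uniformly spaced class marks and using a hypothetical function name:

```python
def limits_from_class_marks(class_marks):
    """Return (class size, list of (lower, upper) limits).

    Assumes the class marks are uniformly spaced; hypothetical helper.
    """
    size = class_marks[1] - class_marks[0]   # gap between consecutive marks
    half = size / 2
    return size, [(m - half, m + half) for m in class_marks]


class_marks = [61, 66, 71, 76, 81, 86, 91, 96, 101, 106]
size, limits = limits_from_class_marks(class_marks)

print("Class size:", size)
for mark, (lower, upper) in zip(class_marks, limits):
    print(mark, lower, upper)
```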
2.5 When the Mid Value of the Class and the Class Size are Given
Example 10: The following data gives the weights (in grams) of 50
oranges picked from a basket. Construct a grouped frequency distribution
taking class intervals of equal width 20 in such a way that the mid-value
of the first class interval is 10.
106, 107, 76, 82, 109, 107, 115, 93, 187, 95, 123, 125, 11, 92, 86, 70,
126, 68, 130, 129, 139, 119, 115, 128, 100, 18, 84, 99, 113, 204, 111,
141, 136, 123, 90, 115, 98, 110, 7, 90, 107, 81, 131, 75, 84, 104, 1
Solution: Here the size of each class = 20
Mid-value of the first class = 10
Let the lower limit of the first class interval be a.
Then its upper limit = a + 20
Mid-value of the first interval = 10
(a + (a + 20))/2 = 10
a + a + 20 = 20
2a = 0
a = 0
Upper limit of the first class = 0 + 20 = 20
Thus, the first class interval is 0 – 20 and the other classes are:
20 – 40, 40 – 60, 60 – 80, 80 – 100, 100 – 120, 120 – 140, 140 – 160,
160 – 180, 180 – 200 and 200 – 220.
Frequency Distribution of Weights
Weights (in Grams) Tally Marks Frequency
0 – 20 ||| 3
20 – 40 .... 0
40 – 60 .... 0
60 – 80 |||| 4
80 – 100 |||| |||| |||| 14
100 – 120 |||| |||| |||| | 16
120 – 140 |||| |||| 10
140 – 160 | 1
160 – 180 .... 0
180 – 200 | 1
200 – 220 | 1
Total 50
Example 11: The weights in gms of 50 mangoes picked at random from
a consignment are as follows:
141, 123, 92, 85, 214, 91, 94, 128, 114, 120, 90, 117, 121, 151, 146, 133,
100, 88, 100, 125, 120, 108, 116, 109, 117, 94, 86, 196, 92, 110, 119,
138, 125, 117, 125, 129, 103, 197, 149, 139, 140, 78, 205, 133, 135,
121, 102, 96, 80, 136.
Form the grouped frequency table by dividing the variable range into
intervals of equal width of 20 gms such that the mid-value of the first
interval is 80 gms.
Solution: It is given that size of each class = 20.
Let the lower limit of the first-class interval be a. Then, its upper limit
= (a + 20)
Mid-value of the first-class interval = 80
(a + (a + 20))/2= 80
2a + 20 = 160
2a = 140
a = 70
Thus, the first-class interval is 70 – 90 and the other classes are:
90 – 110, 110 – 130, 130 – 150, 150 – 170, 170 – 190, 190 – 210, 210
– 230
So, the frequency distribution table is as under:
Frequency Distribution of Weights of Mangoes
Weight (in gm) Tally Marks Frequency
70 – 90 |||| 5
90–110 |||| |||| ||| 13
110–130 |||| |||| |||| || 17
130 – 150 |||| |||| 10
150 – 170 | 1
170 – 190 .... 0
190 – 210 ||| 3
210 – 230 | 1
Total 50
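Example 11 can be reproduced in Python by first solving for the lower limit of the first class (mid-value minus half the width) and then counting. This is a sketch with illustrative variable names, assuming exclusive classes.

```python
# Weights (in gm) of the 50 mangoes from Example 11
mangoes = [
    141, 123, 92, 85, 214, 91, 94, 128, 114, 120, 90, 117, 121, 151, 146, 133,
    100, 88, 100, 125, 120, 108, 116, 109, 117, 94, 86, 196, 92, 110, 119,
    138, 125, 117, 125, 129, 103, 197, 149, 139, 140, 78, 205, 133, 135,
    121, 102, 96, 80, 136,
]

width = 20
first_mid_value = 80

# (a + (a + width)) / 2 = mid-value  gives  a = mid-value - width / 2
lower = first_mid_value - width // 2

# Enough exclusive classes to reach the largest observation
n_classes = (max(mangoes) - lower) // width + 1

counts = [0] * n_classes
for w in mangoes:
    counts[(w - lower) // width] += 1

for i, f in enumerate(counts):
    print(f"{lower + i * width} - {lower + (i + 1) * width}: {f}")
```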
2.6 Cumulative Frequency
Definition: The cumulative frequency corresponding to a class is the
sum of all the frequencies up to and including that class. Let us consider
the marks obtained by 50 M.Sc. students in a test.
1 student secured zero marks.
3 students each secured 5 marks.
9 students each secured 10 marks, and so on.
How many students secured 10 marks or less?
We have to add all the frequencies corresponding to 0 marks, 5
marks and 10 marks,
i.e., (1 + 3 + 9) students = 13 students.
It means 13 students secured 10 marks or less. 13 is
termed the cumulative frequency for marks 10.
Similarly, for 14 marks (secured by 11 students) we add the frequencies up
to and including 14 marks,
i.e., 1 + 3 + 9 + 11 = 24
It means 24 students secured 14 marks or less.
24 is termed the cumulative frequency for marks 14.
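The running totals described above are what `itertools.accumulate` from the Python standard library computes. The sketch below uses only the four frequencies quoted in the text; the variable names are illustrative.

```python
from itertools import accumulate

marks = [0, 5, 10, 14]
frequencies = [1, 3, 9, 11]

# Less-than-or-equal cumulative frequency: running total of the frequencies
cumulative = list(accumulate(frequencies))

for m, f, cf in zip(marks, frequencies, cumulative):
    print(f"marks {m}: frequency {f}, cumulative frequency {cf}")
```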
Solution:
Age No. of Persons
0 – 10 11 = 150 – 139
10 – 20 22 = 139 – 117
20 – 30 19 = 117 – 98
30 – 40 23 = 98 – 75
40 – 50 12 = 75 – 63
50 – 60 22 = 63 – 41
60 – 70 15 = 41 – 26
70 – 80 8 = 26 – 18
80 – 90 11 = 18 – 7
90 – 100 7 = 7 – 0
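The subtraction pattern in this solution, each class frequency being one "more than" cumulative frequency minus the next, can be written as a short Python sketch. The list below is read from the right-hand sides of the table; the variable names are illustrative.

```python
# "More than" cumulative frequencies for lower limits 0, 10, ..., 90
more_than = [150, 139, 117, 98, 75, 63, 41, 26, 18, 7]

# Frequency of a class = its cumulative frequency minus the next one;
# nobody lies beyond the last class, so the final "next" value is 0.
frequencies = [
    current - following
    for current, following in zip(more_than, more_than[1:] + [0])
]

for i, f in enumerate(frequencies):
    print(f"{10 * i} - {10 * (i + 1)}: {f}")
```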
Example 16: Find the unknown entries (a, b, c, d, e, f, g) in the
following frequency distribution table of heights of 50 students in a class:
Class (Heights in cm) Frequency Cumulative Frequency
150 – 155 12 a
155 – 160 b 25
160 – 165 10 c
165 – 170 d 43
170 – 175 e 48
175 – 180 2 f
Total g 50
Solution: Cumulative Frequency Distribution Table (Less than type)
Class (Height in cm) Frequency Cumulative Frequency
less than 155 12 12 = a
less than 160 b 12 + b = 25
less than 165 10 12 + b + 10 = c
less than 170 d 12 + b + 10 + d = 43
less than 175 e 12 + b + 10 + d + e = 48
less than 180 2 12 + b + 10 + d + e + 2 = f
Total g 50
a = 12
12 + b = 25
b = 25 – 12 = 13
12 + b + 10 = c
12 + 13 + 10 = c
c = 35
12 + b + 10 + d = 43
12 + 13 + 10 + d = 43, or d = 43 – 12 – 13 – 10 = 8
12 + b + 10 + d + e = 48, or 12 + 13 + 10 + 8 + e = 48
43 + e = 48, or e = 5
12 + b + 10 + d + e + 2 = f, or 12 + 13 + 10 + 8 + 5 + 2 = f
f = 50
g = 50 (the total of all the frequencies)
Hence, a = 12, b = 13, c = 35, d = 8, e = 5, f = 50, g = 50
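The same reasoning can be automated: walking down the table, each row has either its frequency or its cumulative frequency known, and the other follows from the previous cumulative value. This is a sketch under that assumption; the function name and the use of `None` for unknown entries are illustrative choices.

```python
def fill_table(frequency, cumulative):
    """Fill unknown (None) entries of a less-than cumulative frequency table.

    Hypothetical helper; assumes every row has at least one known entry.
    """
    frequency, cumulative = list(frequency), list(cumulative)
    previous = 0
    for i in range(len(frequency)):
        if cumulative[i] is None:
            cumulative[i] = previous + frequency[i]
        elif frequency[i] is None:
            frequency[i] = cumulative[i] - previous
        previous = cumulative[i]
    return frequency, cumulative


#             150-155  155-160  160-165  165-170  170-175  175-180
frequency = [12,      None,    10,      None,    None,    2]
cumulative = [None,   25,      None,    43,      48,      None]

filled_frequency, filled_cumulative = fill_table(frequency, cumulative)
a, c, f = filled_cumulative[0], filled_cumulative[2], filled_cumulative[5]
b, d, e = filled_frequency[1], filled_frequency[3], filled_frequency[4]
g = sum(filled_frequency)   # total of all the frequencies

print(a, b, c, d, e, f, g)
```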
Notes:
1. Total Contribution = Average contribution × No. of persons joined
the trip
= 15.5 × 80 = 1240
2. Contribution of the staff per head has been obtained by deducting the
contribution of students from the total and dividing the difference
by the number of teaching staff i.e.
Example 19: A survey of 370 students from the Commerce Faculty and 130
students from the Science Faculty revealed that 180 students were studying
for only the C.A. Examination, 140 for only the Costing Examination and 80
for both the C.A. and Costing Examinations. The rest had offered the part-time
Management Course. Of those studying for Costing only, 13 were girls,
and 90 boys belonged to the Commerce Faculty. Out of the 80 studying for both
C.A. and Costing, 72 were from the Commerce Faculty, amongst whom 70
were boys. Amongst those who offered the part-time Management Course,
50 boys were from the Science Faculty and 30 boys and 10 girls from the
Commerce Faculty. In all, there were 110 boys in the Science Faculty. Present the
above information in tabular form. Find the number of students studying
for the part-time Management Course.
Solution:
Distribution of Students According to Professional Course
                        Commerce Faculty      Science Faculty       Total
Courses                 Boys  Girls  Total    Boys  Girls  Total    Boys  Girls  Grand Total
Part-time Management      30     10     40      50     10     60      80     20          100
Only C.A.                150      8    158      16      6     22     166     14          180
Only Costing              90     10    100      37      3     40     127     13          140
C.A. and Costing          70      2     72       7      1      8      77      3           80
Total                    340     30    370     110     20    130     450     50          500
Note on Calculations:
Total number of students = 370 (Commerce) + 130 (Science) = 500
Students studying in part-time management courses = 500 – (180
+ 140 + 80) = 500 – 400 = 100
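A table of this kind is easy to get wrong in a single cell, so it helps to recompute every marginal total from the boys and girls counts alone. The sketch below does that for the cells of Example 19; the dictionary layout and function names are illustrative assumptions.

```python
# (boys, girls) for each course and faculty, taken from the table cells
cells = {
    "Part-time Management": {"Commerce": (30, 10), "Science": (50, 10)},
    "Only C.A.":            {"Commerce": (150, 8), "Science": (16, 6)},
    "Only Costing":         {"Commerce": (90, 10), "Science": (37, 3)},
    "C.A. and Costing":     {"Commerce": (70, 2),  "Science": (7, 1)},
}


def course_total(course):
    """Students of both faculties and both genders in one course."""
    return sum(boys + girls for boys, girls in cells[course].values())


def faculty_total(faculty):
    """Students of one faculty across all courses."""
    return sum(sum(row[faculty]) for row in cells.values())


grand_total = sum(course_total(course) for course in cells)

for course in cells:
    print(course, course_total(course))
print("Commerce", faculty_total("Commerce"), "Science", faculty_total("Science"))
print("Grand total", grand_total)
```

Each recomputed total should match a figure given in the statement of the example: 370 and 130 for the faculties, and 180, 140 and 80 for the three examination groups.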
Example 20: Prepare a frequency table for the following data with width
of each class-interval as 10. Use exclusive method of classification.
57 44 80 75 0 18 45 14 4 64
72 51 69 34 22 83 70 20 57 28
96 56 50 47 10 43 61 66 80 46
22 10 84 50 47 73 42 33 48 65
10 34 66 53 75 90 58 46 38 69
Solution:
Preparation of Frequency Distribution
Variable Tally Bars Frequency
0–10 || 2
10–20 |||| 5
Example 24: Prepare a statistical table from the following weekly wages
of 100 workers (in Rs.) of Factory A:
88 23 27 28 86 96 94 93 86 99
82 24 24 55 88 99 55 86 82 36
96 39 26 54 87 100 56 84 83 86
102 48 27 26 29 100 59 83 84 48
104 46 30 29 40 101 60 89 46 49
106 33 36 30 40 103 70 90 49 50
104 36 37 40 46 108 72 24 50 60
24 39 49 46 66 107 76 96 46 67
26 78 50 44 43 46 49 99 96 68
29 67 56 99 93 48 80 102 32 51
Solution: The lowest value is 23 and the highest is 108. The difference
between the highest and the lowest value is 85. If we take a class interval of 10,
nine classes would be formed. The first class would be taken as 20–30
instead of 23–33, as per the principles of classification.
Wages (Rs.) Tally Frequency
20-30 |||| |||| |||| 14
30-40 |||| |||| 10
40-50 |||| |||| |||| ||| 18
50-60 |||| |||| 10
60-70 |||| | 6
70-80 |||| 4
80-90 |||| |||| |||| 15
90-100 |||| |||| || 12
100-110 |||| |||| | 11
Total = 100
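With 100 observations a hand tally is error-prone, so a machine count is a useful check. The sketch below counts directly from the wages listed in Example 24 and also shows the rounding-down step that turns the lowest value 23 into the first lower limit 20. Variable names are illustrative, and exclusive classes are assumed.

```python
# Weekly wages (in Rs.) of the 100 workers, row by row as printed
wages = [
    88, 23, 27, 28, 86, 96, 94, 93, 86, 99,
    82, 24, 24, 55, 88, 99, 55, 86, 82, 36,
    96, 39, 26, 54, 87, 100, 56, 84, 83, 86,
    102, 48, 27, 26, 29, 100, 59, 83, 84, 48,
    104, 46, 30, 29, 40, 101, 60, 89, 46, 49,
    106, 33, 36, 30, 40, 103, 70, 90, 49, 50,
    104, 36, 37, 40, 46, 108, 72, 24, 50, 60,
    24, 39, 49, 46, 66, 107, 76, 96, 46, 67,
    26, 78, 50, 44, 43, 46, 49, 99, 96, 68,
    29, 67, 56, 99, 93, 48, 80, 102, 32, 51,
]

width = 10

# Start the first class at a multiple of the width at or below the minimum
start = (min(wages) // width) * width
n_classes = (max(wages) - start) // width + 1

counts = [0] * n_classes
for w in wages:
    counts[(w - start) // width] += 1

for i, f in enumerate(counts):
    print(f"{start + i * width}-{start + (i + 1) * width}: {f}")
print("Total:", sum(counts))
```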
Example 25: Present the following data of the percentage marks of 60
students in the form of a frequency table with 10 classes of equal widths,
one class being 50-59.
41 17 83 63 54 92 60 58 70 06 67 82
33 44 57 49 34 73 54 63 36 52 32 75
60 33 09 72 28 30 42 93 43 80 03 32
57 67 24 64 63 11 35 82 10 23 00 41
60 32 72 53 92 88 62 55 60 33 40 57
Solution:
Formulation of Frequency Distribution
Marks Tally Bars Frequency
0-9 |||| 4
10-19 ||| 3
20-29 ||| 3
30-39 |||| |||| 10
40-49 |||| ||| 7
50-59 |||| |||| 9
60-69 |||| |||| | 11
70-79 |||| 5
80-89 |||| 5
90-99 ||| 3
Total = 60
2.9 Exercise
1. Explain the cumulative frequency distribution.
2. What is the difference between a frequency distribution and a
cumulative frequency distribution?
3. The following is the distribution of weights (in kg) of 40 persons:
Weight (in kg) No. of Persons
40 – 45 4
45 – 50 4
50 – 55 13
55 – 60 5
60 – 65 6
65 – 70 5
70 – 75 2
75 – 80 1
Total 40
(i) Determine the class marks of the class 40 – 45, 45 – 50, etc.
(ii) Construct the cumulative frequency distribution table. (Less
than type and more than type)
4. The following is the distribution of marks of 180 primary school
students of Allahabad:
Frequency Distribution Table
Marks No. of Students
Less than 20 10
20 – 25 33
25 – 30 52
30 – 35 47
35 – 40 28
40 – 45 6
45 – 50 4
Total 180
Construct a cumulative frequency distribution table. (Less than type)
5. In a study of diabetic patients, the following data are obtained.
14. In the annual report of a mobile oil company, it is indicated that the
company drilled a total of 882 wells in 2008 and 487 in 2009. Two
types of drilling operations were conducted: wildcat and developmental.
In 2008, a total of 40 wildcat wells and 842 developmental wells
were drilled. The comparable figures for 2009 were 46 and 441.
There were 3 possible outcomes when a well was drilled: oil, gas,
or dry hole. Of the wildcat wells drilled in 2008, 6 resulted in oil,
4 in gas, 30 in dry holes. The comparable figures for 2009 were 6,
4, and 36. Of the developmental wells drilled in 2008, 660 resulted
in oil, 77 in gas, and 105 in dry holes; the comparable figures for
2009 were 333, 44, and 64.
Present the information in the above paragraph in a formal table
giving an appropriate title.
15. The weight in grams of 50 apples, picked from a box are as follows:
110 103 89 75 98 121 110 108 93 128
185 123 113 92 86 70 126 78 139 120
129 119 105 120 100 116 85 99 114 189
205 111 141 136 123 90 115 128 160 78
90 107 81 137 75 84 104 109 87 115
18. Classify the following data by taking class intervals such that their
mid-values are 17, 22, 27, 32, and so on:
30 42 30 54 40 48 15 17 51
42 25 41 30 27 42 36 28 26
37 54 44 31 36 40 36 22 30
31 19 48 16 42 32 21 22 46
33 41 21
19. The following are the numbers of two-wheelers sold by a dealer during
eight weeks of six working days each.
13 19 22 14 13 16 19 21
23 11 27 25 17 17 13 20
23 17 26 20 24 15 20 21
23 17 29 17 19 14 20 20
10 22 18 25 16 23 19 23
21 17 18 24 21 20 19 26
(i) Group these figures into a table having the classes 10–12,
13–15, 16–18, ..., and 28–30.
(ii) Convert the distribution of part (i) into a corresponding frequency
distribution and also a cumulative frequency distribution.
20. Of the 1,125 students studying in a co-ed college during a year,
720 were Hindus, 628 were boys, and 440 were science students;
the number of Hindu boys was 392, that of boys studying science
205 and that of Hindu students studying science 262; the number
of science students among the Hindu boys was 148. Enter these
frequencies in a three-way table with the rows representing the
Faculty (Science and Arts), and the columns representing Religion
(Hindus and Non-Hindus) and Gender (Boys and Girls), and complete
the table by obtaining the frequencies of the remaining cells.
3
Histogram, Frequency
Polygons, Frequency
Curves and Ogives
STRUCTURE
3.1 Histogram
3.2 Bar Graph
3.3 Frequency Polygon
3.4 Pie Diagrams
3.5 Cumulative Frequency and Ogives
3.6 Miscellaneous Questions
3.7 Exercise
3.1 Histogram
A histogram is a graphical representation of a grouped frequency distribution with continuous
classes. It consists of a set of rectangles whose bases are the class intervals and whose heights
are proportional to the class frequencies, for equal class intervals. There is no gap between
two successive rectangles.
Drawing of Histogram for Continuous Grouped Frequency Distribution
1. Along the x-axis class intervals are marked.
2. The corresponding frequencies are marked on the y-axis.
3. The rectangles are constructed with class intervals as bases and the corresponding
frequencies as heights.
Note:
If the class intervals are not continuous then they are to be converted into continuous
distribution by the following method:
Let h = (Lower limit of II class interval – Upper limit of I class interval)
Then subtract h/2 from the lower limit of each class and add h/2 to the
upper limit of each class.
If the mid-points of class intervals are given, then compute the difference
between the second and first mid-point.
Let this difference = h, and find h/2.
Then subtract h/2 from each mid-point to get the lower limit of each class
and add h/2 to each mid-point to get the upper limit of each class.
Suppose the first class interval starts from 20 and not from zero. We show this
on the graph by making a "kink" or a break on the axis.
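The conversion rule above can be sketched in Python. This is only an illustration: the function names are hypothetical, the sample class limits 125–134, ..., 175–184 are those of a daily-earnings table, and the mid-values 17, 22, 27, 32 are those of an exercise on classification.

```python
def to_continuous(classes):
    """Convert inclusive class limits, e.g. (125, 134), into continuous boundaries."""
    # h = lower limit of the second class - upper limit of the first class
    h = classes[1][0] - classes[0][1]
    return [(lower - h / 2, upper + h / 2) for lower, upper in classes]


def limits_from_midpoints(midpoints):
    """Recover class boundaries from equally spaced mid-points."""
    # h = difference between the second and the first mid-point
    h = midpoints[1] - midpoints[0]
    return [(m - h / 2, m + h / 2) for m in midpoints]


earnings = [(125, 134), (135, 144), (145, 154), (155, 164), (165, 174), (175, 184)]
continuous = to_continuous(earnings)
print(continuous[0])                                # (124.5, 134.5)
print(limits_from_midpoints([17, 22, 27, 32])[0])   # (14.5, 19.5)
```

Both helpers assume that all classes have the same width and the same gap h.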
(ii) In the graph drawn, different sections of the society are denoted along
the horizontal axis and the number of girls to the nearest ten per
thousand boys are denoted along the vertical axis, their intersection
represents 900.
Scale: 1 cm = 10 girls.
From the graph, we find that the number of girls (to the nearest ten)
per thousand boys is maximum in scheduled tribes, whereas it
is minimum in the urban section.
Example 3: The daily earnings of 50 workers are given below:
Daily Earnings 125–134 135–144 145–154 155–164 165–174 175–184
(in Rs.)
No. of Workers 2 7 10 15 10 6
Draw a histogram.
Solution: Since the given class intervals are not continuous, we first convert
them into a continuous frequency distribution.
Let h = lower limit of II class – upper limit of I class
h = 135 – 134 = 1
h/2 = 0.5
We subtract 0.5 from the lower limit of each class and add 0.5 to the
upper limit of each class. We get the following continuous frequency
distribution:
Daily Earnings (in Rs.) No. of Workers
124.5 – 134.5 2
134.5 – 144.5 7
144.5 – 154.5 10
154.5 – 164.5 15
164.5 – 174.5 10
174.5 – 184.5 6
Solution: First, we draw the histogram of the given data. Then we
find the mid-points of the tops of the rectangles and join these mid-points by
dotted straight lines. We complete the polygon by joining the mid-points of the
first and last class intervals to the mid-points of the imagined class intervals,
with zero frequency, adjacent to them.
Step V: Join the points (x1, f1), (x2, f2), …, (xn, fn) by the line segments.
Step VI: Take two class intervals of zero frequency, one at the beginning
and other at the end. Obtain their mid-points.
Step VII: Complete the frequency polygon by joining the mid-points of
the first and the last intervals to the mid-point of the imagined classes
adjacent to them.
Example 7: Construct a frequency polygon for the following data:
Age (in years) 0–4 4–8 8–12 12–16 16–20 20–24 24–28 28–32 32–36
No. of Persons 1 3 6 8 10 8 5 3 2
Solution:
Age Class-Marks No. of Persons
0 – 4 2 1
4 – 8 6 3
8 – 12 10 6
12 – 16 14 8
16 – 20 18 10
20 – 24 22 8
24 – 28 26 5
28 – 32 30 3
32 – 36 34 2
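The vertices of this frequency polygon can be listed with a short Python sketch. The function name is hypothetical and equal class widths are assumed; the two imagined zero-frequency classes are added at the ends as described in Steps VI and VII.

```python
def polygon_vertices(classes, frequencies):
    """Return (class mark, frequency) points, padded with zero-frequency classes."""
    width = classes[0][1] - classes[0][0]          # assumes equal class widths
    marks = [(lo + hi) / 2 for lo, hi in classes]
    first = (marks[0] - width, 0)                  # imagined class before the first
    last = (marks[-1] + width, 0)                  # imagined class after the last
    return [first] + list(zip(marks, frequencies)) + [last]


ages = [(0, 4), (4, 8), (8, 12), (12, 16), (16, 20),
        (20, 24), (24, 28), (28, 32), (32, 36)]
persons = [1, 3, 6, 8, 10, 8, 5, 3, 2]
vertices = polygon_vertices(ages, persons)
print(vertices[:3])   # [(-2.0, 0), (2.0, 1), (6.0, 3)]
```

Joining consecutive points of `vertices` by line segments gives the polygon; the first and last points lie on the x-axis.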
I.Q. No. of Pupils
125.5 – 132.5 1
118.5 – 125.5 3
111.5 – 118.5 6
104.5 – 111.5 8
97.5 – 104.5 10
90.5 – 97.5 8
83.5 – 90.5 5
76.5 – 83.5 3
69.5 – 76.5 2
62.5 – 69.5 1
Construct a frequency polygon for above data.
Solution:
I.Q. Class-Marks No. of Pupils
125.5 – 132.5 129 1
118.5 – 125.5 122 3
111.5 – 118.5 115 4
104.5 – 111.5 108 6
97.5 – 104.5 101 10
90.5 – 97.5 94 12
83.5 – 90.5 87 15
76.5 – 83.5 80 5
69.5 – 76.5 73 3
62.5 – 69.5 66 1
Example 9: The runs scored by two teams A and B on the first 60 balls
in a cricket match are given below:
Number of Balls Team A Team B
1 – 6 2 5
7 – 12 1 6
13 – 18 8 2
Example 11: The following table shows the numbers of hours spent by
a child on different events on a working day.
Activity No. of Hours
School 6
Sleep 8
Playing 2
Study 4
T. V. 1
Others 3
Represent the adjoining information on a pie chart.
Solution: The central angles for various observations can be calculated as:
Activity No. of Hours Measure of Central Angle
School 6 (6/24 × 360)° = 90°
Sleep 8 (8/24 × 360)° = 120°
Playing 2 (2/24 × 360)° = 30°
Study 4 (4/24 × 360)° = 60°
T. V. 1 (1/24 × 360)° = 15°
Others 3 (3/24 × 360)° = 45°
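The central angles in the table can be checked with a few lines of Python. This is a minimal sketch; the function name `central_angles` is an assumption made for illustration.

```python
def central_angles(hours):
    """Central angle (in degrees) of each sector: value / total * 360."""
    total = sum(hours.values())
    return {activity: value * 360 / total for activity, value in hours.items()}


day = {"School": 6, "Sleep": 8, "Playing": 2, "Study": 4, "T. V.": 1, "Others": 3}
angles = central_angles(day)
for activity, angle in angles.items():
    print(activity, angle)
```

The angles always add up to 360°, which is a quick check on the arithmetic before drawing the sectors.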
3.5 Cumulative Frequency and Ogives
Cumulative frequency curve or an ogive is the graphical representation
of a cumulative frequency distribution.
There are two methods of constructing an ogive.
(i) Less than method (ii) More than method
Mark the upper class limits along x-axis and the cumulative frequency
along y-axis. Thus we plot the points (8, 8), (16, 20), (24, 40), (32, 56),
(40, 64) and (48, 74). Join these points by a free hand curve. Complete the
curve by joining the first point of the curve to the point (lower limit, 0).
Example 13: Draw a cumulative frequency curve (or an ogive) for the
following data:
Class Interval 0 – 19 20 – 39 40 – 59 60 – 79 80 – 99 100–119
Frequency 4 14 19 26 17 12
Solution: We first convert the class limits into true class limits and the
frequency distribution into a cumulative frequency distribution.
We also take one imaginary point (lower limit of first class, 0).
Class Interval Frequency True Class Limits Cumulative Frequency
0 – 19 4 –0.5 – 19.5 4
20 – 39 14 19.5 – 39.5 18
40 – 59 19 39.5 – 59.5 37
60 – 79 26 59.5 – 79.5 63
80 – 99 17 79.5 – 99.5 80
100 – 119 12 99.5 – 119.5 92
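The points to be plotted for this less than ogive can be generated as follows. This is a sketch under the assumption that all classes are inclusive with the same gap between them; the function name is hypothetical.

```python
from itertools import accumulate


def less_than_ogive(classes, frequencies):
    """Points (upper true limit, cumulative frequency) for a less than ogive."""
    h = classes[1][0] - classes[0][1]              # gap between inclusive classes
    upper_bounds = [hi + h / 2 for _, hi in classes]
    cumulative = list(accumulate(frequencies))
    start = (classes[0][0] - h / 2, 0)             # imaginary point (lower limit, 0)
    return [start] + list(zip(upper_bounds, cumulative))


classes = [(0, 19), (20, 39), (40, 59), (60, 79), (80, 99), (100, 119)]
frequencies = [4, 14, 19, 26, 17, 12]
points = less_than_ogive(classes, frequencies)
print(points[:3])   # [(-0.5, 0), (19.5, 4), (39.5, 18)]
```

Joining these points by a free hand curve gives the ogive.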
Construct a frequency table and draw a bar diagram to present the data.
Solution: The frequency distribution of letters received is shown in the
table below. The figure depicts a frequency bar diagram for the number
of letters received during a period of 50 days, as presented in the table:
Example 16: The profits before tax of Bharat Heavy Electricals Ltd. are
given below:
Year Profits before Tax (Rs. Millions)
1981-82 516.5
1982-83 604.2
1983-84 750.3
1984-85 1136.9
1985-86 1503.3
Represent the data by a bar diagram.
Solution: The above data can be represented by a simple bar diagram:
Example 20: Prepare a histogram from the following data and find the mode.
Marks No. of Students
0-10 4
10-20 6
20-40 14
40-50 16
50-60 14
60-70 8
70-90 16
90-100 5
Solution: Since the class intervals are unequal, we first find frequency
densities. The histogram is then plotted using frequency densities.
Class Intervals Class Width No. of Students Frequency Density
0–10 10 4 0.4
10–20 10 6 0.6
20–40 20 14 0.7
40–50 10 16 1.6
50–60 10 14 1.4
60–70 10 8 0.8
70–90 20 16 0.8
90–100 10 5 0.5
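The frequency densities (frequency divided by class width) can be reproduced with a short Python sketch. The variable names are illustrative only.

```python
marks = [(0, 10), (10, 20), (20, 40), (40, 50), (50, 60), (60, 70), (70, 90), (90, 100)]
students = [4, 6, 14, 16, 14, 8, 16, 5]

# Frequency density = class frequency / class width
densities = [f / (hi - lo) for (lo, hi), f in zip(marks, students)]

# The tallest rectangle of the histogram marks the modal class
modal_class = marks[densities.index(max(densities))]

for (lo, hi), d in zip(marks, densities):
    print(f"{lo}-{hi}: {d:.1f}")
print("Modal class:", modal_class)
```

With unequal widths the heights of the rectangles are the densities, not the raw frequencies, so the modal class is read from the largest density.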
Example 21: Prepare a histogram and a frequency polygon from the
following data.
Class Interval 0-6 6-12 12-18 18-24 24-30 30-36
Frequency 4 8 15 20 12 6
Solution:
Class Interval Frequency
(–6)–0 0
0–6 4
6–12 8
12–18 15
18–24 20
24–30 12
30–36 6
36–42 0
Solution:
Pretax Income Number of Residence (f) Less than Cumulative Frequency
0–1000 5 5
1000–2000 15 20
2000–3000 15 35
3000–4000 10 45
4000–5000 5 50
5000–6000 2 52
6000–7000 6 58
7000–8000 2 60
3.7 Exercise
1. A family with monthly income of Rs. 20,000 had planned the
following expenditure per month under various heads:
Heads Expenditure (in Rs. 1000)
Grocery 4
Rent 5
Education of children 5
Medicine 2
Fuel 2
Entertainment 1
Miscellaneous 1
Draw the graph for the above data.
2. The following table gives the marks scored by 100 students in an
entrance examination:
Marks No. of Students (Frequency)
0–10 4
10–20 10
20–30 16
30–40 22
40–50 20
50–60 18
60–70 8
70–80 2
3. The following table shows the number of illiterate persons in the
age group (10 – 58 years) in a town:
Age Group (in years) Number of Illiterate Persons
10–16 175
17–23 325
24–30 100
31–37 150
38–44 250
45–51 400
52–58 525
Draw a histogram to represent the above data.
4. Draw a histogram to represent the following data which shows the
monthly cost of living index at a city in a period of two years.
Cost of Living Index Number of Months
440–460 2
460–480 4
480–500 3
500–520 5
520–540 3
540–560 2
560–580 1
580–600 4
5. Draw the histogram and frequency polygon of the following frequency
distribution of the monthly wages:
Monthly Wages (in rupees) Number of Workers
325 – 350 30
350 – 375 45
375 – 400 75
400 – 425 60
425 – 450 55
Total 265
6. Draw the frequency polygon representing the following frequency
distribution.
Class Interval Frequency
30 – 34 12
35 – 39 16
40 – 44 20
45 – 49 8
50 – 54 10
55 – 59 4
7. Construct a frequency polygon from the following data.
Score Frequency
32 – 34 13
35 – 37 10
38 – 40 20
41 – 43 16
44 – 46 12
47 – 49 8
8. Draw an ogive for the following frequency distribution by the less than
method.
Marks Number of Students
0 – 10 7
10 – 20 10
20 – 30 23
30 – 40 51
40 – 50 6
50 – 60 3
14. The following data represent the outlays (in Rs. crore)
by heads of development.
Heads of Development Centre States
Agriculture 4765 7039
Irrigation and Flood Control 6635 11395
Energy 9995 8293
Industry And Minerals 12770 2985
Transport And Communication 12200 5120
Social Services 8216 1420
Total 54581 36252
15. Draw a histogram and frequency polygon from the following data:
Class Frequency
0-10 4
10-20 6
20-40 14
40-50 16
50-60 14
60-70 8
70-90 16
90-100 5
16. Construct a histogram from the following data:
Class Limits Frequency
9-10 16
10-11 22
11-12 45
12-13 60
13-14 50
14-15 24
15-16 10
17. The frequency distribution of marks obtained by 60 students of a
class in a college is given below:
Marks 30-34 35-39 40-44 45-49 50-54 55-59 60-64
No. of Students 3 5 12 18 14 6 2
Draw a histogram for this distribution and find modal value. Draw a
cumulative frequency curve also.
18. Draw a less than Ogive from the following data:
Weekly Income (Rs.)
(Equal to or More Than) No. of Families
12,000 0
11,000 6
10,000 14
8,000 26
6,000 42
4,000 54
4
Measures of Central
Tendency
STRUCTURE
4.1 Measures of Central Tendency
4.2 Arithmetic Mean
4.3 Median
4.4 Mode
4.5 Weighted Mean
4.6 Partition Values
4.7 Miscellaneous Questions
4.8 Exercise
4.2 Arithmetic Mean
If x1, x2, x3, ..., xn are n numbers, then their arithmetic mean (A.M.) is
defined by
Arithmetic Mean = x̄ = (x1 + x2 + x3 + ... + xn)/n
where,
x1, x2, x3, ..., xn are the observations.
n is the number of observations.
When each value xi occurs with frequency fi, the mean can be written
symbolically as shown below:
Arithmetic Mean Formula = x̄ = ∑xifi/∑fi
In the above equation, the symbol ∑ is known as sigma. It denotes the
summation of the values.
Solution:
Σfx = 40 + 80 + 120 + 80 = 320
Σf = 5 + 8 + 8 + 4 = 25
A.M. = Σfx/Σf = 320/25 = 12.8
(b) Shortcut Method/Assumed Mean Method
When the direct technique becomes too time-consuming, we use the
assumed mean method to calculate the average of a set of grouped data.
We can use the assumed mean method to calculate the mean
by following the procedure listed below:
Make a table using the five columns listed below:
Column 1: Class intervals.
Column 2: Class marks, represented by xi. Assumed Mean A: Pick the
middle value from the class marks and denote it by A.
Column 3: Determine the deviations using the formula
di = xi – A.
Column 4: Frequencies (fi) of the corresponding classes.
Column 5: Products fidi. The mean of the deviations is ∑fidi/∑fi.
Finally, calculate the mean of the original data by adding the assumed
mean to the mean of the deviations.
In this method we assume a value A as the mean and measure the
deviations of the class marks from it. The data are presented as a
grouped frequency distribution table. The formula for calculating
the mean using the assumed mean method is therefore:
Mean (x̄) = A + ∑fidi/∑fi
Example 3: Find the arithmetic mean of the following distribution:
Class 0–10 10–20 20–30 30–40 40–50
Frequency 7 8 20 10 5
approach to simplify the calculations. The following are the steps to take
while using the step deviation method:
Make a table with five columns, as shown below:
Column 1: Class intervals.
Column 2: Corresponding class marks, represented by xi. Take the middle
value from the class marks and denote it by A.
Column 3: Corresponding frequencies (fi).
Column 4: Determine the deviations using the formula
di = xi – A, then use the formula ui = di/h to calculate the values of ui, where
h is the class width.
Column 5: The products of the frequencies (fi) and ui.
The step-deviation method can be used to find the mean when the data
values are large. The formula is as follows:
Mean (x̄) = A + (h ∑fiui/∑fi)
Example 4: Find the arithmetic mean of the data given in example 3 by
step deviation method
Solution: Let a = 25 and h = 10.
Class Mid-value (x) Frequency (f) u = (x – a)/h fu
0–10 5 7 –2 –14
10–20 15 8 –1 –8
20–30 25 20 0 0
30–40 35 10 +1 +10
40–50 45 5 +2 +10
Total Σf = 50 Σfu = –2
A.M. = 25 + (10 × (–2)/50) = 25 – 0.4
= 24.6
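As a cross-check, the direct, assumed mean and step deviation methods can be applied to the same distribution in Python. This is a minimal sketch; the variable names are illustrative, and the assumed mean 25 and class width 10 are taken from the solution above.

```python
classes = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50)]
freqs = [7, 8, 20, 10, 5]
marks = [(lo + hi) / 2 for lo, hi in classes]      # class marks 5, 15, 25, 35, 45
n = sum(freqs)

# Direct method: sum(f * x) / sum(f)
direct = sum(f * x for f, x in zip(freqs, marks)) / n

# Assumed mean method: A + sum(f * d) / sum(f), with d = x - A
A = 25
assumed = A + sum(f * (x - A) for f, x in zip(freqs, marks)) / n

# Step deviation method: A + h * sum(f * u) / sum(f), with u = (x - A) / h
h = 10
step = A + h * sum(f * ((x - A) / h) for f, x in zip(freqs, marks)) / n

# All three agree with 24.6 up to floating-point rounding
print(direct, assumed, step)
```

The three methods are algebraically equivalent; the shortcut methods only reduce the size of the numbers handled by hand.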
4.3 Median
The median is the value of the central item when the observations are
arranged in ascending or descending order of magnitude.
Median = ((n + 1)/2)th term, if the data set has an odd number of values
= average of the (n/2)th and ((n/2) + 1)th terms, if the data set has an even number of values.
Steps for calculating the Median
Step 1: Sort your observations into ascending or descending order.
Step 2: The median is the middle observation if the number of observa-
tions is odd, or the average of the two middle observations if the number
of observations is even.
The point at which the perpendicular touches the x-axis gives the
median.
Note: We can determine the other partition values like quartiles, deciles,
percentiles, etc. by using method 2 described above.
Example 5: Find the median of 6, 8, 9, 10, 11, 12, 13.
Solution: Total number of items = 7
The middle item = ((7 + 1)/2) = 4
Median = Value of the 4th unit = 10
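The two steps for ungrouped data can be written as a small Python function. This is a sketch for illustration; the function name `median` is our own, and the standard library also offers `statistics.median`.

```python
def median(values):
    """Median of ungrouped data: sort, then take the middle item(s)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                        # the ((n + 1)/2)th item
    return (ordered[mid - 1] + ordered[mid]) / 2   # mean of (n/2)th and (n/2 + 1)th


print(median([6, 8, 9, 10, 11, 12, 13]))   # 10
```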
Calculation of Median for Grouped Data
Median = l + ((n/2 − cf)/f) × h
where, l is the lower limit of the median class,
f is the frequency of the median class,
h is the width of the median class,
cf is the cumulative frequency of the class preceding the median class and
n is the total frequency of the data.
Example 6: Find the value of Median from the following data:
No. of Days for which Absent (less than) 5 10 15 20 25 30 35 40 45
No. of students 29 224 465 582 634 644 650 653 655
Solution: The given cumulative frequency distribution will first be con-
verted into ordinary frequency as under:
Class-Interval Cumulative Frequency Ordinary Frequency
0–5 29 29=29
5–10 224 224–29=195
10–15 465 465–224=241
15–20 582 582–465=117
20–25 634 634–582=52
25–30 644 644–634=10
30–35 650 650–644=6
35–40 653 653–650=3
40–45 655 655–653=2
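The remaining steps of the grouped median formula can be sketched in Python from the table above. The function name and argument layout are assumptions made for this illustration; equal class widths are assumed.

```python
def grouped_median(lower_limits, h, cumulative):
    """Median = l + ((n/2 - cf) / f) * h for a less than cumulative distribution."""
    n = cumulative[-1]
    # Ordinary frequencies are differences of successive cumulative frequencies
    freqs = [cumulative[0]] + [b - a for a, b in zip(cumulative, cumulative[1:])]
    # Median class: the first class whose cumulative frequency reaches n/2
    k = next(i for i, c in enumerate(cumulative) if c >= n / 2)
    cf = cumulative[k - 1] if k > 0 else 0
    return lower_limits[k] + (n / 2 - cf) / freqs[k] * h


lower_limits = [0, 5, 10, 15, 20, 25, 30, 35, 40]
cumulative = [29, 224, 465, 582, 634, 644, 650, 653, 655]
result = grouped_median(lower_limits, 5, cumulative)
print(round(result, 2))   # 12.15
```

Here n/2 = 327.5 falls in the class 10–15, so l = 10, cf = 224, f = 241 and h = 5, giving a median of about 12.15 days.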
4.4 Mode
The mode is that value of the observation in the series which occurs the
largest number of times or which has greatest frequency. Thus, mode is
the most popular item of the series around which there is the highest
frequency density. It is usually denoted by Mo.
When we speak of ‘average student’, ‘average collar size’, ‘average T-shirt
size’, ‘average shoe size’, we are referring to mode.
Mode can be calculated for ungrouped and grouped series. A distribution
may have more than one mode.
For Ungrouped Data
The mode is calculated just by inspection or by counting the number of
items. It is the value of the observation corresponding to maximum frequency.
Example 7: Find the mode of the following items: 0, 1, 6, 7, 2, 3, 7,
6, 6, 2, 6, 0, 5, 6, 0.
Solution: As 6 occurs 5 times and no other item occurs 5 or more than
5 times, hence the mode is 6.
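Counting by inspection can be mirrored with `collections.Counter` from the Python standard library. The sketch below also reports ties, since a distribution may have more than one mode.

```python
from collections import Counter

items = [0, 1, 6, 7, 2, 3, 7, 6, 6, 2, 6, 0, 5, 6, 0]
counts = Counter(items)                 # value -> number of occurrences
top = max(counts.values())              # highest frequency
modes = sorted(v for v, c in counts.items() if c == top)
print(modes, top)   # [6] 5
```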
For Grouped Data
To calculate the mode, we first ensure that the data form a continuous exclusive
series having equal class intervals. By inspection, locate the modal class.
The modal class is the one having the maximum frequency. The mode is then
given by
Mo = l + ((f1 − f0)/(2f1 − f0 − f2)) × h
where l is the lower limit of the modal class, f1 is the frequency of the
modal class, h is the width of the class, f0 is the frequency of the class before the
modal class and f2 is the frequency of the class after the modal class.
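A minimal Python sketch of the grouped mode formula is given below. To avoid confusion over subscripts, the code names the three frequencies explicitly as `f_modal`, `f_prev` and `f_next`; these names and the function name are our own. The sample data are the distribution 0-6, 6-12, ..., 30-36 with frequencies 4, 8, 15, 20, 12, 6 used in an earlier histogram example.

```python
def grouped_mode(classes, freqs):
    """Mode = l + (f_modal - f_prev) / (2*f_modal - f_prev - f_next) * h."""
    k = freqs.index(max(freqs))                     # modal class by inspection
    lo, hi = classes[k]
    f_modal = freqs[k]
    f_prev = freqs[k - 1] if k > 0 else 0           # class before the modal class
    f_next = freqs[k + 1] if k < len(freqs) - 1 else 0   # class after it
    return lo + (f_modal - f_prev) / (2 * f_modal - f_prev - f_next) * (hi - lo)


classes = [(0, 6), (6, 12), (12, 18), (18, 24), (24, 30), (30, 36)]
freqs = [4, 8, 15, 20, 12, 6]
mode = grouped_mode(classes, freqs)
print(round(mode, 2))   # 20.31
```

The modal class is 18–24, so the mode is 18 + (5/13) × 6, a little above 20.3.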
Note:
1. In a moderately skewed or asymmetrical distribution, when the values
of any two averages are given, the third average can be obtained
using the following empirical formula:
Mean – Mode = 3 (Mean – Median)
2. Mode is used when the most typical value of a distribution is desired.
For example, mode is used to find the average amount of money spent
by students per month, average collar size, average shoe size, etc.
parts respectively. All these values can be determined in the same way
as median. The only difference is their location.
4.6.1 Quartiles
The values of a variate which divide the series into four equal parts are
called quartiles. First arrange the data in ascending or descending order.
We know that three points are required to divide the data into four equal
parts, so we have three quartiles, denoted by Q1, Q2 and Q3.
The first quartile (Q1), also known as lower quartile is that value of the
variate below which there are 25% of the observations and above which
there are 75% of the observations.
The second quartile (Q2), also known as middle quartile or median, is
that value of the variate which divides the series into two equal parts, i.e.
50% of the observations are below it and 50% of the observations are above it.
The third quartile (Q3), also known as upper quartile, is that value of the
variate below which there are 75% of the observations and above which
there are 25% of the observations.
We see that Q1 ≤ Q2 ≤ Q3
Computation of Quartiles
In case of individual series or discrete series
First arrange the data into ascending or descending order of magnitude.
Now
Q1 = value of the ((n + 1)/4)th item in the series
Q2 = value of the (2(n + 1)/4)th item in the series
Q3 = value of the (3(n + 1)/4)th item in the series
Q1 = l + ((n/4 − cf)/f) × h
Q2 = l + ((2n/4 − cf)/f) × h
Q3 = l + ((3n/4 − cf)/f) × h
where,
l = lower limit of the quartile class
cf = cumulative frequency of the class prior to the quartile class
f = frequency of the quartile class
h = width of the quartile class
n = total number of observations in the distribution
Note: Q2 is same as the median
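Because quartiles, deciles and percentiles of a continuous series differ only in the position jn/4, jn/10 or jn/100, one Python function can cover all three. This is a sketch; the function name and the `parts` argument are assumptions made for illustration, and the sample data are the classes 0–10, ..., 40–50 with frequencies 7, 8, 20, 10, 5 from an earlier example.

```python
def partition_value(classes, freqs, j, parts):
    """j-th partition value: parts = 4 (quartiles), 10 (deciles), 100 (percentiles)."""
    n = sum(freqs)
    target = j * n / parts              # position j*n/parts in the cumulative count
    cf = 0                              # cumulative frequency of preceding classes
    for (lo, hi), f in zip(classes, freqs):
        if f > 0 and cf + f >= target:
            # l + ((j*n/parts - cf) / f) * h
            return lo + (target - cf) / f * (hi - lo)
        cf += f
    raise ValueError("j must lie between 0 and parts")


classes = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50)]
freqs = [7, 8, 20, 10, 5]
q1, q2, q3 = (partition_value(classes, freqs, j, 4) for j in (1, 2, 3))
print(q1, q2, q3)   # 16.875 25.0 32.5
```

Calling the function with `parts=4` and `j=2`, `parts=10` and `j=5`, or `parts=100` and `j=50` returns the same value, the median.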
4.6.2 Deciles
The values of a variate that divide the series into ten equal parts are
called deciles. Since nine points are required to divide the arranged data
into ten equal parts, there are 9 deciles denoted by D1, D2,….., D9. Each
part contains 10% of the data.
Computation of deciles in case of individual series or discrete series
The general formula of the jth decile is
Dj = value of the (j(n + 1)/10)th item in the series
where j = 1, 2, ..., 9
In case of continuous series, the jth decile is
Dj = l + ((jn/10 − cf)/f) × h
where j = 1, 2, ..., 9
and cf is the cumulative frequency preceding the jth decile class. Other
notations have their usual meaning.
4.6.3 Percentile
The values of the variate which divide the series into 100 equal parts are
called percentiles. Since ninety-nine points are required to divide the
data into 100 equal parts, there are 99 percentile values denoted by Pj
(j = 1, 2,…..,99). Each percentile contains 1% of the total number of
observations.
Computation of percentiles
In case of discrete or individual series
The formula for the jth percentile is
Pj = value of the (j(n + 1)/100)th item in the series
where j = 1, 2, ..., 99
In case of continuous series
The formula for the jth percentile is
Pj = l + ((jn/100 − cf)/f) × h
where j = 1, 2, ..., 99
cf is the cumulative frequency preceding the jth percentile class.
Example 9: The data set given below has 19 observations, calculate all
three quartiles.
Solution: First of all, we need to sort this data in the ascending order
as given below;
Here total observation count N = 19, now we will calculate the quartiles
using their respective formulas given above in this section.
Q1 = 1 * (19+1)/4 = 20/4 = 5 (i.e. 5th observation) = 33
Q2 = 2 * (19+1)/4 = 40/4 = 10 (i.e. 10th observation) = 48
Q3 = 3 * (19+1)/4 = 60/4 = 15 (i.e. 15th observation) = 61
Example 10: The table below shows the scores (out of 100) in a mathematics
test for 30 students in a class. Calculate the 1st, 4th and 5th decile
values for the given data.
Solution:
The first step is to arrange the given data in
ascending order of test scores, as shown below:
Solution: First of all we have to arrange this data set in the ascending
order of magnitude.
Here N = 50, now using the percentile formula we will calculate P30,
P46 & P90
P30 = 30 × (50 + 1)/100 = 1530/100 = 15.3
= 15th observation + 0.3 × (16th observation – 15th observation)
= 157.7 + 0.3 × (158.1 – 157.7) = 157.82
P46 = 46 × (50 + 1)/100 = 2346/100 = 23.46
= 23rd observation + 0.46 × (24th observation – 23rd observation)
= 164.1 + 0.46 × (164.7 – 164.1) = 164.38
P90 = 90 × (50 + 1)/100 = 4590/100 = 45.9
= 45th observation + 0.9 × (46th observation – 45th observation)
= 182.6 + 0.9 × (182.7 – 182.6) = 182.69
Therefore, the heights (in cm) at 30th, 46th, and 90th percentile values
are 157.82, 164.38, and 182.69 respectively.
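The positional rule used in these examples, namely the j(n + 1)/k-th item with linear interpolation between neighbouring observations when the position is fractional, can be sketched in Python. The function name is hypothetical and the sample list is made-up data used only to exercise the rule.

```python
import math


def positional_value(ordered, j, parts):
    """Value at the j(n + 1)/parts-th position of data sorted in ascending order."""
    n = len(ordered)
    position = j * (n + 1) / parts      # 1-based and possibly fractional
    whole = math.floor(position)
    fraction = position - whole
    if whole < 1:
        return ordered[0]
    if whole >= n:
        return ordered[-1]
    lower, upper = ordered[whole - 1], ordered[whole]
    return lower + fraction * (upper - lower)


# Hypothetical data: 19 sorted values 10, 20, ..., 190
sample = list(range(10, 200, 10))
quartiles = [positional_value(sample, j, 4) for j in (1, 2, 3)]
print(quartiles)   # [50.0, 100.0, 150.0]
```

With 19 observations the quartiles fall exactly on the 5th, 10th and 15th items, while `positional_value(sample, 46, 100)` lands at position 9.2 and interpolates between the 9th and 10th items.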
4.7 Miscellaneous Questions
Example 12: The following data give an actual distribution, obtained
by tossing ten coins 1024 times and recording the number of heads that
appeared on each toss. What is the average number of heads per toss?
Number of Heads Frequency
0 1
1 16
2 42
3 126
4 199
5 253
6 209
7 118
8 53
9 4
10 3
Solution: Computation of average number of heads per toss.
Number of Heads (m) Frequency (f) mf
0 1 0
1 16 16
2 42 84
3 126 378
4 199 796
5 253 1265
6 209 1254
7 118 826
8 53 424
9 4 36
10 3 30
n = 1024 Σmf = 5109
Arithmetic average a = Σmf/n = 5109/1024 = 4.99, i.e. approximately 5 heads per toss.
Thus, the average number of heads per toss is approximately 5.
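The totals in this computation can be verified with a few lines of Python; the variable names are illustrative only.

```python
heads = list(range(11))                                   # 0, 1, ..., 10 heads
freq = [1, 16, 42, 126, 199, 253, 209, 118, 53, 4, 3]

n = sum(freq)                                             # total number of tosses
total = sum(m * f for m, f in zip(heads, freq))           # sum of m * f
mean = total / n
print(n, total, round(mean, 2))   # 1024 5109 4.99
```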
Example 13: The following table gives the population of males in different
age-groups of the U.K. and India at the time of a census.
Age-Group U.K. (Lakhs) India (Lakhs)
0-5 18 214
5-10 19 258
10-15 20 222
15-20 18 157
20-25 16 145
25-30 14 161
30-40 27 257
40-50 25 184
50-60 19 120
Above 60 17 100
Compare the average age of males in the two countries, and account for
difference, if any.
Solution:
(f = population of males in lakhs; a = 27.5 is the assumed mean)
Age-group Mid-Values (x) d = x – 27.5 U.K. (f) U.K. fd India (f) India fd
0-5 2.5 -25 18 -450 214 -5350
5-10 7.5 -20 19 -380 258 -5160
10-15 12.5 -15 20 -300 222 -3330
15-20 17.5 -10 18 -180 157 -1570
20-25 22.5 -5 16 -80 145 -725
25-30 27.5 0 14 0 161 0
30-40 35.0 7.5 27 202.5 257 1927.5
40-50 45.0 17.5 25 437.5 184 3220
50-60 55.0 27.5 19 522.5 120 3300
Above 60 65.0 37.5 17 637.5 100 3750
Total n = 193 Σfd = 410 n = 1818 Σfd = -3937.5
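The two average ages follow from the assumed mean formula, mean = a + Σfd/n. The Python sketch below recomputes Σfd from the raw populations; the function name is hypothetical, and the mid-value 65 for the open class "Above 60" is the one used in the table.

```python
mid = [2.5, 7.5, 12.5, 17.5, 22.5, 27.5, 35.0, 45.0, 55.0, 65.0]
uk = [18, 19, 20, 18, 16, 14, 27, 25, 19, 17]
india = [214, 258, 222, 157, 145, 161, 257, 184, 120, 100]


def assumed_mean(mid_values, freqs, a):
    """Mean = a + sum(f * d) / sum(f), with deviations d = x - a."""
    n = sum(freqs)
    fd = sum(f * (x - a) for f, x in zip(freqs, mid_values))
    return a + fd / n


uk_mean = assumed_mean(mid, uk, 27.5)
india_mean = assumed_mean(mid, india, 27.5)
print(round(uk_mean, 2), round(india_mean, 2))   # 29.62 25.33
```

On these figures the average age of males is about 29.6 years in the U.K. and about 25.3 years in India, reflecting the larger share of the young age-groups in India.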
Weighted average = ∑pw/∑w = 584.25/12.5 = 46.74%
Example 16: The arithmetic mean, the mode and the median of a group
of 75 observations were calculated to be 27, 34 and 29 respectively. It
was later discovered that one observation was wrongly read as 43 instead
of the correct value 53. Examine to what extent the calculated values of
the three averages will be affected by the discovery of this error.
Solution:
1. The arithmetic mean is calculated as the sum of all observations
divided by the number of observations.
Initially:
Incorrect: ΣX = 75 × 27 = 2025
Correct: ΣX = 2025 − 43 + 53 = 2035
Correct mean = 2035/75 ≈ 27.13
2. Mode: The mode is the value that appears most frequently in the
data set. Correcting the observation from 43 to 53 changes
just one value, and neither 43 nor 53 is the modal value 34, so the mode
remains unchanged at 34, provided the corrected value 53 does not
thereby become as frequent as 34.
3. Median: The median is the middle value when the data set is
ordered. Given there are 75 observations, the median is the 38th
observation. Both the incorrect value 43 and the correct value 53 are
larger than the median 29, so the observation stays in the upper half of
the ordered data and the 38th observation is unaffected. The median
remains 29.
The discovery of the error raises the arithmetic mean slightly, by
10/75 ≈ 0.13, and leaves the median and, in general, the mode unchanged.
Example 17: Find the weighted arithmetic mean of the first n natural numbers
whose weights are equal to the corresponding numbers.
Solution: To find the weighted arithmetic mean of the first n natural
numbers where the weights are equal to the corresponding numbers, you
can use the following approach:
The first n natural numbers are 1, 2, 3, …, n.
The weights for these numbers are also 1, 2, 3, …, n.
For the first n natural numbers,
1. The weighted sum is: ∑(i × i) = ∑i²
2. The sum of squares of the first n natural numbers is given by:
∑i² = n(n + 1)(2n + 1)/6
3. The total weight is: ∑i = n(n + 1)/2
4. The weighted arithmetic mean is the weighted sum divided by the
total weight:
[n(n + 1)(2n + 1)/6] ÷ [n(n + 1)/2] = (2n + 1)/3
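Dividing the weighted sum n(n + 1)(2n + 1)/6 by the total weight n(n + 1)/2 gives the closed form (2n + 1)/3. The short Python check below compares that closed form with a direct computation; the function name `weighted_mean_first_n` is an assumption made for this sketch.

```python
from fractions import Fraction


def weighted_mean_first_n(n):
    """Weighted mean of 1..n where each number is its own weight."""
    values = range(1, n + 1)
    weighted_sum = sum(i * i for i in values)   # sum of i * i
    total_weight = sum(values)                  # sum of i
    return Fraction(weighted_sum, total_weight)


for n in (1, 5, 10, 100):
    # direct computation versus the closed form (2n + 1)/3
    print(n, weighted_mean_first_n(n), Fraction(2 * n + 1, 3))
```

Exact fractions are used so that the comparison is not affected by floating-point rounding.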
where:
L = Lower boundary of the median class = 20
N = Total number of observations = 105
C = Cumulative frequency of the class before the median class = 41
f = Frequency of the median class = 30
h = Class width of the median class = 24 – 20 + 1 = 5
Substitute the values in the formula:
\[ \text{Median} = 20 + \frac{52.5 - 41}{30} \times 5 = 21.92 \]
Example 19: Find the missing frequency from the following distribution
if median is 35 and N = 170.
Variable 0-10 10-20 20-30 30-40 40-50 50-60 60-70
Frequency 10 20 - 40 - 25 15
Solution:
To find the missing frequencies (say f1 and f2), we’ll follow these steps:
1. Determine the Median Class:
The median is given as 35, which falls in the class interval 30-40.
2. Calculate Cumulative Frequencies:
Calculate cumulative frequencies up to the median class:
Cumulative frequency up to 0-10: 10
Cumulative frequency up to 10-20: 10+20=30
Cumulative frequency up to 20-30: 30+f1
Cumulative frequency up to 30-40: 30+f1+40
Let’s denote the cumulative frequency just before the median class by
C. So, C=30+f1.
3. The Median Formula:
\[ \text{Median} = L + \frac{\frac{N}{2} - C}{f} \times h \]
where:
L = Lower boundary of the median class = 30
N = Total number of observations = 170
C = Cumulative frequency of the class before the median class = 30+f1.
f = Frequency of the median class = 40
h = Class width = 10
The median is given as 35, so:
\[ 35 = 30 + \frac{\frac{170}{2} - (30 + f_1)}{40} \times 10 \]
Simplify this equation:
35 = 30 + (85 − 30 − f1)/4
5 = (55 − f1)/4
20 = 55 − f1
f1 = 55 − 20 = 35
To find the second missing frequency f2, use the fact that the total
frequency is N = 170:
10 + 20 + f1 + 40 + f2 + 25 + 15 = 170
Substitute f1=35
10 + 20 + 35 + 40 + f2 + 25 + 15 = 170
145 + f2 = 170
f2 = 170 −145 = 25
The missing frequencies are: f1 = 35 and f2 = 25.
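The same steps can be sketched in Python by inverting the median formula for C, the cumulative frequency before the median class. The helper name `cumulative_before_median_class` and the variable names are assumptions made for this illustration.

```python
def cumulative_before_median_class(median, L, N, f, h):
    """Invert Median = L + ((N/2 - C) / f) * h to recover C."""
    return N / 2 - (median - L) * f / h


# Example 19: median 35 lies in class 30-40, N = 170, f = 40, h = 10
C = cumulative_before_median_class(median=35, L=30, N=170, f=40, h=10)
f1 = C - (10 + 20)                          # C = 10 + 20 + f1
f2 = 170 - (10 + 20 + 40 + 25 + 15) - f1    # remaining frequency
print(f1, f2)  # 35.0 25.0
```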
Example 20: Calculate median from the following data:
Mid-value 15 25 35 45 55 65 75 85
Frequency 5 9 13 21 20 15 8 3
Solution: To calculate the median from the given data, follow these steps:
1. Determine the Class Interval
The class interval (h) can be found by observing the difference
between consecutive mid-values.
Class Interval (h) = Difference between consecutive mid-values
= 25−15 = 10
So, the class interval is 10.
where,
Lower boundary (L) = 40
Frequency of the median class (f) = 21
Cumulative frequency before the median class (C) = 27
Class interval (h) = 10
Substituting the values:
\[ \text{Median} = 40 + \frac{47 - 27}{21} \times 10 \]
= 40 + (20/21 ) × 10
= 40 + (0.952) × 10
= 40 + 9.52 ≈ 49.52
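The values L = 40, C = 27 and f = 21 used above follow from rebuilding the classes around the mid-values (each class extends h/2 on either side of its mid-value) and accumulating the frequencies until N/2 = 47 is reached. A minimal Python sketch of that procedure is given below; it assumes equally spaced mid-values, and the function name `median_from_midvalues` is hypothetical.

```python
def median_from_midvalues(mids, freqs):
    """Grouped median when only class mid-values are given."""
    h = mids[1] - mids[0]        # class width (assumes equal spacing)
    half = sum(freqs) / 2        # N/2
    cum = 0                      # cumulative frequency before the current class
    for m, f in zip(mids, freqs):
        if cum + f >= half:      # first class whose c.f. reaches N/2
            lower = m - h / 2    # lower boundary of the median class
            return lower + (half - cum) / f * h
        cum += f


mids = [15, 25, 35, 45, 55, 65, 75, 85]
freqs = [5, 9, 13, 21, 20, 15, 8, 3]
print(round(median_from_midvalues(mids, freqs), 2))  # 49.52
```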
Example 21: How are the mean and median affected when it is known
that for a group of 10 students, scoring an average of 60 marks, the best
paper was wrongly marked 80 instead of 75?
Solution: Here’s how to assess the impact on the mean and median when
correcting a score:
1. Initial Data:
Average (Mean) score = 60
Number of students = 10
Total score (Sum) = Average × Number of students = 60 × 10 = 600
2. Error in Scoring:
The best paper was wrongly marked 80 instead of 75.
3. Correct Total Score:
Correct total score = 600 - 80 + 75 = 595
4. Correct Mean: Correct Mean = Correct Total Score/Number of
Students = 595/10 = 59.5
Median: The median is the middle value of the dataset when it is ordered. For a
dataset with 10 students, the median is the average of the 5th and
6th values. The corrected paper is the best paper, so its score is the
highest value whether it is recorded as 80 or as 75. The 5th and 6th
values are therefore not affected, and the median remains the same.
Example 22: Find out the median of the following series:
Wages (in Rupees) No. of Labourers
260-270 5
250-260 10
240-250 20
230-240 5
220-230 3
Solution:
To find the median of the given series, follow these steps:
1. Calculate the Cumulative Frequencies:
For the interval 220-230: Cumulative Frequency = 3
For the interval 230-240: Cumulative Frequency = 3 + 5 = 8
For the interval 240-250: Cumulative Frequency = 8 + 20 = 28
For the interval 250-260: Cumulative Frequency = 28 + 10 = 38
For the interval 260-270: Cumulative Frequency = 38 + 5 = 43
Arranging the classes in ascending order gives the following table:
Wages (in Rupees) (m)   No. of Labourers (f)   Cumulative Frequency (c.f.)
220-230                 3                      3
230-240                 5                      8
240-250                 20                     28
250-260                 10                     38
260-270                 5                      43
                        n = 43
2. Determine the Median Class:
Total Number of Labourers = 43
Median position = 43/2 = 21.5
The median class is the class interval where the cumulative frequency
just exceeds 21.5. Here, it is the interval 240 – 250 because the
cumulative frequency is 28 (which is greater than 21.5).
3. Use the Median Formula:
\[ \text{Median} = L + \frac{\frac{N}{2} - C}{f} \times h \]
where:
L = Lower boundary of the median class = 240
N = Total number of labourers = 43
C = Cumulative frequency of the class before the median class = 8
f = Frequency of the median class = 20
h = Class interval width = 10 (e.g., 250 – 240)
\[ \text{Median} = 240 + \frac{21.5 - 8}{20} \times 10 \]
= 240 + (13.5/20 ) × 10
= 240 + 6.75 = 246.75.
So, the median of the series is 246.75 Rupees.
Example 23: Calculate the mode from the following data:
Income 15-24 25-34 35-44 45-54 55-64 65-74
No. of Workers 8 10 15 25 40 20
Solution:
To calculate the mode for the given data, follow these steps:
1. Identify the Modal Class: The modal class is the class interval with
the highest frequency. Here, the highest frequency is 40, which
corresponds to the class interval 55-64. So, the modal class is 55-64.
2. Apply the Mode Formula: The formula for the mode in grouped
data is:
\[ \text{Mode} = l + \frac{f_1 - f_0}{2f_1 - f_0 - f_2} \times h \]
Where:
l = Lower boundary of the modal class = 55
f1 = Frequency of the modal class = 40
f0 = Frequency of the class before the modal class = 25
f2 = Frequency of the class after the modal class = 20
h = Class interval width = 10 (since 25-34, 35-44, etc., all have a
width of 10)
3. Substitute the Values into the Formula:
\[ \text{Mode} = 55 + \frac{40 - 25}{2(40) - 25 - 20} \times 10 \]
= 55+(15/(80−45))×10
= 55+(15/35) ×10
= 55+(0.4286) ×10
= 55+4.29 ≈ 59.29.
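The mode formula can be sketched as a small Python function. The function name `grouped_mode` is an assumption made here. The first call uses l = 55, as in the solution above. The second call is an added comparison: because the classes are written inclusively (45-54, 55-64), some treatments first convert them to class boundaries, which would make the lower boundary 54.5 and move the result down by 0.5.

```python
def grouped_mode(lower, f1, f0, f2, h):
    """Mode = l + (f1 - f0) / (2*f1 - f0 - f2) * h for grouped data."""
    return lower + (f1 - f0) / (2 * f1 - f0 - f2) * h


# Example 23 with the lower limit used in the text
print(round(grouped_mode(55, 40, 25, 20, 10), 2))    # 59.29
# Same data with the class boundary 54.5 (an alternative convention)
print(round(grouped_mode(54.5, 40, 25, 20, 10), 2))  # 58.79
```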
Example 24: Find the missing frequency for the following incomplete
distribution by using the appropriate formula when mode is 36.
Variable Frequency
0-10 5
10-20 7
20-30 ?
30-40 ?
40-50 10
50-60 6
Total 50
Solution: We are given that the mode is 36, and the total frequency is
50. Let the missing frequency corresponding to class 20-30 and 30-40
be x and y respectively. Let’s follow the steps to calculate the missing
frequencies.
1. Identify the Modal Class: The mode is given as 36, which falls
in the class interval 30-40. Therefore, the modal class is 30-40.
2. Use the Total Frequency: We know that the total frequency is 50:
5+7+x+y+10+6=50
28+x+y=50
x+y=22
y=22-x
3. Apply the Mode Formula:
The formula for the mode in grouped data is:
\[ \text{Mode} = l + \frac{f_1 - f_0}{2f_1 - f_0 - f_2} \times h \]
Where:
l = Lower boundary of the modal class = 30
f1 = Frequency of the modal class (30-40) = y = 22-x
f0 = Frequency of the class before the modal class (20-30) = x
f2 = Frequency of the class after the modal class (40-50) = 10
h = Class interval width = 10 (since 0-10, 10-20, etc., all have a
width of 10)
Substituting these values:
\[ 36 = 30 + \frac{(22 - x) - x}{2(22 - x) - x - 10} \times 10 \]
6 = ((22 − 2x)/(34 − 3x)) × 10
6(34 − 3x) = 10(22 − 2x)
204 − 18x = 220 − 20x
2x = 16
x = 8 and y = 22 − 8 = 14
The missing frequencies are 8 (class 20-30) and 14 (class 30-40).
4.8 Exercise
1. Define an Average. Describe briefly advantages and disadvantages
of Arithmetic Mean.
2. What are the mathematical properties of Arithmetic Mean?
3. The heights (in cms) of 10 students of a class were noted as shown
below. Compute the arithmetic mean.
S. No. 1 2 3 4 5 6 7 8 9 10
Height 160 167 174 158 155 171 162 152 156 175
4. Determine median from the following data:
30, 37, 54, 58, 61, 64, 31, 34, 52, 55, 32, 62, 28, 47, 55
5. Determine the mode of the following data:
58, 60, 31, 62, 48, 37, 78, 43, 65, 48
6. The following table shows the distribution of the number of students
per teacher in 750 colleges
Students 1 4 7 10 13 16 19 22 25 28
Frequency 7 46 165 195 189 89 28 19 9 3
Calculate arithmetic mean, median and mode.
7. Calculate arithmetic mean, median and mode for the following data.
Marks Obtained 0 – 10 10 – 20 20 – 30 30 – 40 40 – 50 50 – 60 60 – 70
No. of Students 8 14 22 26 15 10 5
8. Calculate the median, quartiles, 4th decile and 60th percentile for the
following data:
Marks Obtained 0 – 10 10 – 20 20 – 30 30 – 40 40 – 50 50 – 60 60 – 70
No. of Students 8 14 22 26 15 10 5
9. Calculate the median income for the following distribution:
Income (Rs.) 40 – 45 45 – 50 50 – 55 55 – 60 60 – 65 65 – 70 70 – 75
No. of Persons 2 7 10 12 8 3 3
10. Find median, Q1, Q3, D4, D7, P26, P45 and P70 from the following data
S. No 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
Marks: 5 12 17 23 28 31 37 41 42 49 54 58 65 68 17
11. The mean age of a group of 100 children was 9.35 years. The mean
age of 25 of them was 8.75 years and that of another 65 was 10.51
years. What was the mean age for the remainder?
12. A firm of readymade garments makes both men’s and women’s
shirts. Its profits average 6 percent of sales, its profits in men’s
shirts average 8 percent of sales; and women’s shirts comprise 60
percent of output. What is the average profits per sales rupee in
women’s shirts?
13. The mean wage of 150 labourers working in a factory running three
shifts of 60, 40, and 50 labourers is Rs. 114.00. The mean wage
of 60 labourers working in the first shift is Rs. 121.50 and that of
40 labourers working in the second shift is Rs. 107.75. Find the
mean wage of the labourers working in the third shift.
14. A market with 168 operating firms has the following distribution
of average number of workers in various income groups:
Income Groups 150-300 300-500 500-800 800-1200 1200-1800
No. of Firms 40 32 26 28 42
Average No. of Workers 8 12 7.5 8.5 4
Find the average salary paid in the whole market.
15. The expenditure of 1000 families is given as under:
Expenditure (in Rs.) 40–59 60–79 80–99 100–119 120–139
No. of Families 50 ? 500 ? 50
The median and mean for the distribution are both Rs. 87.50.
Calculate the missing frequencies.
16. The median and mode of the following wage distribution are known
to be Rs. 335 and Rs. 340 respectively. Three frequency values
from the table, however, are missing. Find the missing values.
Wages in Rs. Frequency
0-100 4
100-200 16
200-300 60
300-400 ?
400-500 ?
500-600 ?
600-700 4
Total 230
17. From the following data related to unemployment, calculate the
standardized unemployment rate, the standardized rate of unemployment
of local population and the crude rate of unemployment of local
population.
5
Measures of Dispersion
STRUCTURE
5.1 Dispersion
5.2 Range
5.3 Interquartile Range and Quartile Deviation
5.4 Mean Deviation
5.5 Standard Deviation and Root Mean Square Deviation
5.6 Miscellaneous Examples
5.7 Exercise
5.1 Dispersion
Averages discussed earlier fail to reveal the full details of the distribution. Two or three
distributions may have the same average but still they may differ from each other in many
ways.
Suppose, there are three series of nine items each as follows:
Series A Series B Series C
50 48 5
50 50 15
50 46 20
50 49 25
50 47 35
50 52 80
50 53 85
50 51 90
50 54 95
In the series A, the mean is 50 and the value of all the items is identical. The items are
not at all scattered, and the mean is the representative of this distribution. However, in
the series B, though the mean is 50 yet all the items of the series have different values.
But the items are not very much scattered, as the minimum value of
series B is 46 and the maximum is 54. In the series C also,
the mean is 50 and the values of different items are also different, but
here the values are very widely scattered. Though the mean is the same
in all the three series, yet the series differ widely from each other in
their formation. Obviously, the average is not giving us the complete
information about the series. In such cases, further Statistical analysis
of the data is necessary so that these differences between various series
can be studied and accounted for. Such analysis will make our results
more accurate and we shall be more confident of our conclusions. As
we can see the spread among the items in the Series A is zero, Series B
varies within a small range, while in the Series C the values are widely
scattered. It is evident from the above, that a study of the extent of the
scatter around average should also be made to throw more light on the
composition of a series.
This spread or scatteredness of the data is called dispersion.
Some important definitions of dispersion are given below:
“Dispersion or spread is the degree of the scatter or variation of the
variable about a central value.”
– Brooks and Dick
“Dispersion is the measure of the variations of the items.”
– A. L. Bowley
“The degree to which numerical data tend to spread about an average
value is called the variation or dispersion of the data.”
– Spiegel
“Measures of variability are usually used to indicate how tightly bunched
the sample values are around the mean.”
– Dyckman and Thomas
From the above definitions, it is clear that in a general sense the term
dispersion refers to the variability in the size of the items. If the variation
is substantial, dispersion is said to be significant and if the variation is
very little, dispersion is said to be insignificant.
5.2 Range
It is the difference between the greatest and the smallest observations
of the distribution.
If L is the maximum value and S is the minimum value,
then Range = L – S.
It is the simplest but a crude measure of dispersion. It is based on two
extreme observations which are subject to chance fluctuations, hence it
is not at all a reliable measure of dispersion. This is an absolute measure
of dispersion and is not suitable for comparison in case distributions
are in different units. For comparison, a relative measure is used, called
coefficient of range.
\[ \text{Coefficient of Range} = \frac{L - S}{L + S} \]
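As an illustration, the range and the coefficient of range, (L − S)/(L + S), can be computed for Series B and Series C of Section 5.1. The function names below are assumptions made for this sketch.

```python
def value_range(values):
    """Range = largest observation minus smallest observation."""
    return max(values) - min(values)


def coefficient_of_range(values):
    """Relative measure: (L - S) / (L + S)."""
    largest, smallest = max(values), min(values)
    return (largest - smallest) / (largest + smallest)


series_b = [48, 50, 46, 49, 47, 52, 53, 51, 54]
series_c = [5, 15, 20, 25, 35, 80, 85, 90, 95]
print(value_range(series_b), coefficient_of_range(series_b))  # 8 0.08
print(value_range(series_c), coefficient_of_range(series_c))  # 90 0.9
```

Both series have the same mean of 50, yet the coefficient of range shows at a glance how much more widely Series C is scattered.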
n is even, so median = (1/2) [(n/2)th observation + (n/2 + 1)th observation]
= (1/2) [8th observation + 9th observation]
= (10 + 14)/2
= 24/2
= 12
Q2 = 12
Now, lower half of the data is:
2, 5, 7, 7, 8, 8, 10, 10 (even number of observations)
Q1 = Median of lower half of the data
= (1/2) [4th observation + 5th observation]
= (7 + 8)/2
= 15/2
= 7.5
Also, the upper half of the data is:
14, 15, 17, 18, 24, 27, 28, 48 (even number of observations)
Q3 = Median of upper half of the data
= (1/2) [4th observation + 5th observation]
= (18 + 24)/2
= 42/2
= 21
Quartile deviation = (Q3 – Q1)/2
= (21 – 7.5)/2
= 13.5/2
= 6.75
Therefore, the quartile deviation for the given data set is 6.75.
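The "median of each half" approach used in this solution can be sketched in Python with the standard `statistics` module. The data list is assembled from the lower and upper halves shown above; the function name `quartile_deviation` is an assumption. For an odd number of observations this sketch leaves the middle item out of both halves, which is one common convention among several.

```python
from statistics import median


def quartile_deviation(values):
    """Quartile deviation (Q3 - Q1) / 2, with quartiles as medians of the halves."""
    ordered = sorted(values)
    half = len(ordered) // 2
    q1 = median(ordered[:half])                   # median of the lower half
    q3 = median(ordered[len(ordered) - half:])    # median of the upper half
    return (q3 - q1) / 2


data = [2, 5, 7, 7, 8, 8, 10, 10, 14, 15, 17, 18, 24, 27, 28, 48]
print(quartile_deviation(data))  # 6.75
```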
Mean deviation is least when taken from median. However, in practice,
arithmetic mean is most frequently used in calculating mean deviation.
It is a better measure of dispersion as it is based on all the observations.
But since it ignores the signs of the deviations it becomes useless for
further mathematical treatment.
Mean deviation in case of individual observations:
If xi, i = 1, 2, ......, n are the observations, then the mean deviation from the
average A is
\[ \text{M.D.} = \frac{1}{N}\sum |x_i - A| = \frac{\sum |D|}{N} \]
where | xi – A | = |D| is the absolute value of (xi – A) (ignoring the sign).
‘A’ is usually taken as mean, median or mode.
Mean deviation in case of discrete or continuous distribution:
If xi/fi ; i = 1, 2, ......, n is the frequency distribution, then the mean deviation
from the average A is
\[ \text{M.D.} = \frac{\sum f\,|x - A|}{\sum f} = \frac{\sum f\,|D|}{N} \]
Demerits:
1. It is not capable of further algebraic treatments since it ignores sign
2. It does not always give accurate results. It gives the best result when
the deviations are taken from the median
3. It is rarely used in sociological studies. Hence it has limited use.
Example 4: Determine the mean deviation about mean for the data
values 5, 3, 7, 8, 4, 9.
Solution:
Given data values are 5, 3, 7, 8, 4, 9.
First, find the mean for the given data:
Mean, µ = (5+3+7+8+4+9)/6
µ = 36/6
µ = 6
Therefore, the mean value is 6.
Now, subtract the mean from each of the data value, and ignore the
minus symbol if any
The obtained values are 1, 3, 1, 2, 2, 3.
Therefore, the mean deviation is
= (1 + 3 + 1 + 2 + 2 + 3)/6
= 12/6
= 2
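The calculation in Example 4 can be sketched in Python as follows. The function name `mean_deviation` and its optional `about` parameter are assumptions for this illustration; when `about` is omitted, deviations are taken from the arithmetic mean.

```python
def mean_deviation(values, about=None):
    """Average absolute deviation from `about` (the mean if not given)."""
    center = sum(values) / len(values) if about is None else about
    return sum(abs(x - center) for x in values) / len(values)


data = [5, 3, 7, 8, 4, 9]
print(mean_deviation(data))  # 2.0
```

Passing the median as `about` gives the mean deviation from the median instead.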
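(i) \[ \sigma = \sqrt{\frac{1}{N}\sum (x - \bar{x})^2} \quad \text{or} \quad \sigma = \sqrt{\frac{\sum x^2}{N} - \bar{x}^2} \]
(ii) \[ \sigma = \sqrt{\frac{\sum d^2}{N} - \left(\frac{\sum d}{N}\right)^2} \]
where, d = X – A, A is any assumed mean, N is total number of observations.
\[ \sigma = \sqrt{\frac{1}{N}\sum f (x - \bar{x})^2} \quad \text{or} \quad \sigma = \sqrt{\frac{\sum f x^2}{N} - \bar{x}^2} \]
Where N = ∑f
(ii) Assumed mean method
\[ \sigma = \sqrt{\frac{\sum f d^2}{N} - \left(\frac{\sum f d}{N}\right)^2} \]
\[ \sigma = h \times \sqrt{\frac{\sum f u^2}{N} - \left(\frac{\sum f u}{N}\right)^2} \]
Where, u = (x − A)/h, A is assumed mean and h is class size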
5.5.3 Standard Deviation in Case of Continuous Series is obtained
using any of the following formula:
For the frequency distribution xi/fi ; i = 1, 2, ......, n
\[ \sigma = h \times \sqrt{\frac{\sum f u^2}{N} - \left(\frac{\sum f u}{N}\right)^2} \]
Where, u = (m − A)/h, A is assumed mean, m is mid-value of the class and
h is class size
5.5.5 Variance
The square of standard deviation is called the variance. It is denoted by σ².
\[ \sigma^2 = \frac{1}{N}\sum f (x - \bar{x})^2 = \frac{1}{N}\sum f x^2 - \bar{x}^2 \]
Solution:
Class Interval Frequency (f) Mid Value (x) fx fx2
0 – 10 27 5 135 675
10 – 20 10 15 150 2250
20 – 30 7 25 175 4375
30 – 40 5 35 175 6125
40 – 50 4 45 180 8100
50 – 60 2 55 110 6050
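Total ∑f = 55 ∑fx = 925 ∑fx² = 27575
N = ∑f = 55
Mean = (∑fx)/N = 925/55 = 16.818
\[ \text{Variance} = \frac{1}{N}\sum f x_i^2 - \bar{x}^2 = \frac{27575}{55} - \left(\frac{925}{55}\right)^2 \approx 501.364 - 282.851 = 218.51 \]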
5.6 Miscellaneous Examples
Example 7: Determine range and coefficient of range from the following
data:
Wages 100-110 110-120 120-130 130-140 140-150 150-160
No. of Workers 10 12 15 16 9 8
Solution:
Minimum value = 100
Maximum value = 160
Range = L – S = 160 – 100 = 60
\[ \text{Coefficient of range} = \frac{L - S}{L + S} = \frac{160 - 100}{160 + 100} = \frac{60}{260} = 0.23 \]
Example 8: Find the interquartile range and coefficient of quartile
deviation from the data given below:
200, 210, 208, 160, 220, 250, 300
Solution: Arranging the data in ascending order, we get
160, 200, 208, 210, 220, 250, 300
N = 7
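\[ Q_1 = \text{size of } \left(\frac{N + 1}{4}\right)^{\text{th}} \text{ item} = \left(\frac{7 + 1}{4}\right)^{\text{th}} \text{ item} = 2\text{nd item} = 200 \]
\[ Q_3 = \text{size of } \left(\frac{3(N + 1)}{4}\right)^{\text{th}} \text{ item} = 6\text{th item} = 250 \]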
Interquartile range = Q3 – Q1
= 250 – 200
= 50
\[ \text{Coefficient of quartile deviation} = \frac{Q_3 - Q_1}{Q_3 + Q_1} = \frac{250 - 200}{250 + 200} = \frac{50}{450} = 0.11 \]
Example 9: Find the interquartile range from the following data:
Marks (more than) 0 20 40 60 80 100 120
No. of Students 80 76 50 28 18 9 3
Solution:
Marks Frequency (f) Cumulative Frequency (c.f.)
0 – 20 4 4
20 – 40 26 30
40 – 60 22 52
60 – 80 10 62
80 – 100 9 71
100 – 120 6 77
120 – 140 3 80
N = 80
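\[ Q_1 = \text{size of } \left(\frac{N}{4}\right)^{\text{th}} \text{ item} = \left(\frac{80}{4}\right)^{\text{th}} \text{ item} = 20\text{th item} \]
Therefore, the first quartile class is 20 - 40
\[ Q_1 = l + \frac{\frac{N}{4} - cf}{f} \times h = 20 + \frac{20 - 4}{26} \times 20 = 20 + 12.3 = 32.3 \]
\[ Q_3 = \text{size of } \left(\frac{3N}{4}\right)^{\text{th}} \text{ item} = 60\text{th item} \]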
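The whole procedure for Example 9 can be sketched in Python: convert the "more than" cumulative frequencies into class frequencies, then interpolate inside the class containing the required item. Applying to the 60th item the same interpolation used above for Q1 is an assumption of this sketch, as are the function and variable names.

```python
def grouped_quantile(lowers, freqs, h, position):
    """Interpolate the value of the item at `position` in a grouped series."""
    cum = 0                               # cumulative frequency before the class
    for lower, f in zip(lowers, freqs):
        if cum + f >= position:           # class containing the required item
            return lower + (position - cum) / f * h
        cum += f


more_than = [80, 76, 50, 28, 18, 9, 3]    # students scoring more than 0, 20, ..., 120
lowers = [0, 20, 40, 60, 80, 100, 120]
# class frequency = difference between successive "more than" counts
freqs = [a - b for a, b in zip(more_than, more_than[1:] + [0])]
N = more_than[0]

q1 = grouped_quantile(lowers, freqs, 20, N / 4)
q3 = grouped_quantile(lowers, freqs, 20, 3 * N / 4)
print(freqs)
print(round(q1, 2), round(q3, 2), round(q3 - q1, 2))
```

Under these assumptions the 60th item falls in the class 60-80 (cumulative frequency 62), so Q3 = 60 + ((60 − 52)/10) × 20 = 76 and the interquartile range is about 76 − 32.3 = 43.7.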
\[ \text{Median} = \text{size of } \left(\frac{N}{2}\right)^{\text{th}} \text{ item} = 195\text{th item} \]
\[ \text{Mean} = A + \frac{\sum fd}{N} \times i = 35 + \frac{0}{60} \times 10 = 35 \]
\[ \text{Mean deviation} = \frac{\sum f\,|D|}{N} \times i = \frac{68}{60} \times 10 = 11.33 \]
Example 12: Find the standard deviation for the following distribution:
X 4.5 14.5 24.5 34.5 44.5 54.5 64.5
f 1 5 12 22 17 9 4
Solution:
X f d = (X – 34.5)/10 fd fd2
4.5 1 -3 -3 9
14.5 5 -2 -10 20
24.5 12 -1 -12 12
34.5 22 0 0 0
44.5 17 1 17 17
54.5 9 2 18 36
64.5 4 3 12 36
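N = 70 Σfd = 22 Σfd² = 130
\[ \sigma = i \times \sqrt{\frac{\sum fd^2}{N} - \left(\frac{\sum fd}{N}\right)^2} \]
\[ = 10 \times \sqrt{\frac{130}{70} - \left(\frac{22}{70}\right)^2} \]
= 10 × √1.758 = 10 × 1.326 = 13.26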
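A minimal Python sketch of the step-deviation calculation for Example 12 is shown below, using d = (X − 34.5)/10 as in the table. The function name `step_deviation_sd` is an assumption introduced for illustration.

```python
from math import sqrt


def step_deviation_sd(xs, fs, A, h):
    """sigma = h * sqrt(sum(f*d^2)/N - (sum(f*d)/N)^2), with d = (x - A)/h."""
    N = sum(fs)
    ds = [(x - A) / h for x in xs]
    sum_fd = sum(f * d for f, d in zip(fs, ds))
    sum_fd2 = sum(f * d * d for f, d in zip(fs, ds))
    return h * sqrt(sum_fd2 / N - (sum_fd / N) ** 2)


xs = [4.5, 14.5, 24.5, 34.5, 44.5, 54.5, 64.5]
fs = [1, 5, 12, 22, 17, 9, 4]
print(round(step_deviation_sd(xs, fs, A=34.5, h=10), 2))  # 13.26
```

Note that the quantity under the square root is about 1.758, and its square root, about 1.326, is what gets multiplied by the class width 10.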
Example 13: The number of employees, wages per employee and the
variance of the wages of employees for two factories are given below:
Factory A Factory B
No. of Employees 50 100
Average Wages per Employee per Month 1200 850
Variance of the Wages (Rs.) 81 256
(i) In which factory is the greater variation in the distribution of wages
per employee?
(ii) Suppose in factory B, the wages of an employee were wrongly noted
as Rs. 900 instead of Rs. 910. What would be the correct variance
of wages in Factory B?
Solution:
(i) In order to compare the distribution of wages in the two factories,
we find coefficient of variation. The Coefficient of variation is
given by the formula
\[ \text{C.V.} = \frac{\sigma}{\bar{x}} \times 100 \]
Now, the standard deviations are √81 = 9 for factory A and √256 = 16
for factory B, so
\[ \text{C.V. for factory A} = \frac{9}{1200} \times 100 = 0.75\% \]
\[ \text{C.V. for factory B} = \frac{16}{850} \times 100 = 1.88\% \]
Since the coefficient of variation is greater for factory B, it shows
greater variation in the distribution of wages per employee.
(ii) For factory B,
ΣX = 850 × 100 = 85000
Corrected ΣX = 85000 − 900 + 910 = 85010
Thus the corrected mean, X̄ = 85010/100 = 850.10
Now,
\[ \text{Variance} = \sigma^2 = \frac{\sum X^2}{N} - (\bar{X})^2 \]
It is given for factory B, variance = 256, N = 100, mean = 850
Hence, from the variance formula, we get
\[ 256 = \frac{\sum X^2}{100} - (850)^2 \]
25600 = ∑X² – 72250000
∑X² = 72250000 + 25600 = 72275600
Hence, correct ∑X² = 72275600 – (900)² + (910)²
= 72293700
Now the correct variance is
\[ \text{correct variance} = \frac{\text{correct } \sum X^2}{N} - (\text{correct } \bar{X})^2 = \frac{72293700}{100} - (850.10)^2 \]
= 266.99
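The correction procedure (recover ΣX and ΣX² from the reported mean and variance, swap the wrong value for the right one, then recompute) can be sketched in Python. The function name `corrected_mean_and_variance` and its parameters are assumptions made for this illustration.

```python
def corrected_mean_and_variance(n, mean, variance, wrong, right):
    """Correct a reported mean and variance after one misrecorded value."""
    total = n * mean                        # sum of X
    total_sq = n * (variance + mean ** 2)   # sum of X^2, from variance = sum(X^2)/n - mean^2
    total += right - wrong                  # replace the wrong value in the sum
    total_sq += right ** 2 - wrong ** 2     # and in the sum of squares
    new_mean = total / n
    return new_mean, total_sq / n - new_mean ** 2


# Factory B: n = 100, mean 850, variance 256, 900 recorded instead of 910
new_mean, new_variance = corrected_mean_and_variance(100, 850, 256, 900, 910)
print(round(new_mean, 2), round(new_variance, 2))  # 850.1 266.99
```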
\[ \text{Variance} = \left[\frac{1310}{480} - \left(\frac{-220}{480}\right)^2\right] \times 5^2 \]
= (2.729 – 0.21) × 25 = 62.975
Example 15: From the data given below state which series is more
consistent:
Variable Series A Series B
10 – 20 10 18
20 – 30 18 22
30 – 40 32 40
40 – 50 40 32
50 – 60 22 18
60 – 70 18 10
Solution:
                            Series A                   Series B
Variable   Mid Value (x)    f     fx     fx²           f     fx     fx²
10 – 20    15               10    150    2250          18    270    4050
20 – 30    25               18    450    11250         22    550    13750
30 – 40    35               32    1120   39200         40    1400   49000
40 – 50    45               40    1800   81000         32    1440   64800
50 – 60    55               22    1210   66550         18    990    54450
60 – 70    65               18    1170   76050         10    650    42250
Series A totals: N = 140, Σfx = 5900, Σfx² = 276300
Series B totals: N = 140, Σfx = 5300, Σfx² = 228300
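\[ \text{Mean for series A} = \frac{\sum fx}{N} = \frac{5900}{140} = 42.14 \]
\[ \text{Standard deviation for series A} = \sqrt{\frac{\sum fx^2}{N} - \left(\frac{\sum fx}{N}\right)^2} = \sqrt{\frac{276300}{140} - \left(\frac{5900}{140}\right)^2} = 14.05 \]
\[ \text{C.V. of series A} = \frac{\sigma}{\bar{x}} \times 100 = \frac{14.05}{42.14} \times 100 = 33.34\% \]
Now,
\[ \text{Mean for series B} = \frac{\sum fx}{N} = \frac{5300}{140} = 37.86 \]
\[ \text{Standard deviation for series B} = \sqrt{\frac{\sum fx^2}{N} - \left(\frac{\sum fx}{N}\right)^2} = \sqrt{\frac{228300}{140} - \left(\frac{5300}{140}\right)^2} = 14.06 \]
\[ \text{C.V. of series B} = \frac{\sigma}{\bar{x}} \times 100 = \frac{14.06}{37.86} \times 100 = 37.14\% \]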
Since the coefficient of variation is less for series A hence series A is
more consistent.
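The comparison in Example 15 can be reproduced with a short Python sketch that computes the coefficient of variation directly from mid-values and frequencies. The function name `coefficient_of_variation` is an assumption. Because the sketch carries full precision instead of rounding the mean and standard deviation first, its results may differ from the figures above in the second decimal place.

```python
from math import sqrt


def coefficient_of_variation(mids, freqs):
    """C.V. = (sigma / mean) * 100 for a grouped frequency distribution."""
    N = sum(freqs)
    mean = sum(f * x for x, f in zip(mids, freqs)) / N
    variance = sum(f * x * x for x, f in zip(mids, freqs)) / N - mean ** 2
    return sqrt(variance) / mean * 100


mids = [15, 25, 35, 45, 55, 65]
series_a = [10, 18, 32, 40, 22, 18]
series_b = [18, 22, 40, 32, 18, 10]
cv_a = coefficient_of_variation(mids, series_a)
cv_b = coefficient_of_variation(mids, series_b)
print(round(cv_a, 2), round(cv_b, 2))
print("More consistent:", "Series A" if cv_a < cv_b else "Series B")
```

The two series have the same standard deviation, so the difference in consistency comes entirely from the difference in their means.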
5.7 Exercise
1. What is Dispersion? Discuss the Merits and Demerits of (i) Range
(ii) Mean Deviation.
2. Find the range for the following data:
(a) 63, 89, 98, 125, 79, 108, 117, 68
(b) 43.5, 13.6, 18.9, 38.4, 61.4, 29.8
3. A teacher asked the students to complete 60 pages of a record note
book. Eight students have completed only 32, 35, 37, 30, 33, 36,
35 and 37 pages. Find the standard deviation of the pages yet to
be completed by them.
4. From the following frequency distribution, compute the standard
deviation of 100 students:
Mass in kg. 60–62 63–65 66–68 69–71 72–74
Number of Students 5 18 42 27 8
5. Calculate the mean and standard deviation for the following data:
Size of Item 6 7 8 9 10 11 12
Frequency 3 6 9 13 8 5 4
6. What is the range for the following data set:
1,2,8,9,7,4,1,1,3,2,3
7. What is the range of the data set 6, 2, 11, 14, 19, and 15?
8. Determine the highest value in the data set if the range equals 40
and the lowest value is 6.
9. Find the range of the first 5 composite numbers.
10. Determine the interquartile range value for the first ten prime numbers.
11. Find the variance for an ungrouped data 5,12,3,18,6,8,2,10.
12. Find the variance of the following distribution.
Class Interval Frequency
20 - 24 15
25 - 29 25
30 - 34 28
35 - 39 12
40 - 44 12
45 - 49 8
13. During the 10 weeks of a session, the marks obtained by two
candidates, Ramesh and Suresh, taking the computer programme
course are given below:
Ramesh 58 59 60 54 65 66 52 75 69 52
Suresh 87 89 78 71 73 84 65 66 56 46
(i) Who is the better scorer – Ramesh or Suresh?
(ii) Who is more consistent?
14. From the prices of shares of X and Y given below, state which share
is more stable in value:
X 55 54 52 53 56 58 52 50 51 49
Y 108 107 105 105 106 107 104 103 104 101
15. Calculate the mean deviation from the median for the following data:
Age (yrs.) 4-6 6-8 8-10 10-12 12-14 14-16 16-18
No. of Students 30 90 120 150 80 60 20
16. A factory produces two types of electric lamps A and B. In an
experiment relating to their life, the following results were obtained:
Length of Life (in hrs) No. of Lamps A No. of Lamps B
500 – 700 5 4
700 – 900 11 30
900 – 1100 26 12
1100 – 1300 10 8
1300 - 1500 8 6
Compare the variability of the life of the two varieties using coefficient
of variation.
17. Find out who is the better and more consistent batsman from the
following data:
Batsman A 10 12 80 70 60 100 0 4
Batsman B 8 9 7 10 5 9 10 8
18. The mean and standard deviation of 15 items were found to be 8
and 2 respectively. On checking, it was discovered that one item
11 has been misread as 5. Calculate the correct mean and standard
deviation.
6
Moments
STRUCTURE
6.1 Moments
6.2 Skewness
6.3 Kurtosis
6.4 Normal Curve
6.5 Miscellaneous Questions
6.6 Exercise
6.1 Moments
In statistics, “moments” are quantitative measures related to the shape of a set of points.
Moments are used in various fields such as physics, engineering, and probability theory
to understand the distribution and characteristics of data sets. First four moments are of
key importance in statistics. These moments provide a comprehensive summary of the
data’s characteristics.
The first moment about the origin is the mean, which is a measure of central tendency.
The second moment about the mean is the variance, which measures the spread or
dispersion of the data.
The third moment about the mean is used to measure skewness, that is, the asymmetry
of the data distribution.
The fourth moment about the mean is used to measure kurtosis, that is, the “tailedness”
or peakedness of the data distribution. It reflects the presence of outliers and the shape
of the data distribution’s tails.
\[ \mu_r = \frac{1}{N}\sum_{i=1}^{n} f_i (x_i - \bar{x})^r \]
Or
\[ \mu_r = \frac{1}{N}\sum_{i=1}^{n} f_i (z_i)^r, \]
where, zi = xi − x̄
Here, fi is the frequency of the ith class and N is the total frequency.
Keeping r = 1, 2, 3, 4 gives the first four moments about mean.
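In particular, for r = 0
\[ \mu_0 = \frac{1}{N}\sum_{i=1}^{n} f_i (x_i - \bar{x})^0 \Rightarrow \mu_0 = \frac{1}{N}\sum_{i=1}^{n} f_i = 1 \]
For r = 1
\[ \mu_1 = \frac{1}{N}\sum_{i=1}^{n} f_i (x_i - \bar{x})^1 = 0 \]
For r = 2
\[ \mu_2 = \frac{1}{N}\sum_{i=1}^{n} f_i (x_i - \bar{x})^2 = \sigma^2 = \text{variance} \]
For r = 3
\[ \mu_3 = \frac{1}{N}\sum_{i=1}^{n} f_i (x_i - \bar{x})^3 \]
For r = 4
\[ \mu_4 = \frac{1}{N}\sum_{i=1}^{n} f_i (x_i - \bar{x})^4 \]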
\[ \mu_r' = \frac{1}{N}\sum_{i=1}^{n} f_i (x_i - A)^r, \]
where, N = Σfi
Or
\[ \mu_r' = \frac{1}{N}\sum_{i=1}^{n} f_i (d_i)^r, \]
where, di = xi – A
Keeping r = 1, 2, 3, 4 in the above formula, we get the first four raw
moments about any point A.
Thus,
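For r = 1
\[ \mu_1' = \frac{1}{N}\sum_{i=1}^{n} f_i (x_i - A)^1 \]
For r = 2
\[ \mu_2' = \frac{1}{N}\sum_{i=1}^{n} f_i (x_i - A)^2 \]
For r = 3
\[ \mu_3' = \frac{1}{N}\sum_{i=1}^{n} f_i (x_i - A)^3 \]
For r = 4
\[ \mu_4' = \frac{1}{N}\sum_{i=1}^{n} f_i (x_i - A)^4 \]
Note:
\[ \bar{x} = A + \frac{1}{N}\sum_{i=1}^{n} f_i d_i = A + \mu_1', \quad \text{where } d_i = x_i - A \]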
\[ \mu_r' = \frac{1}{N}\sum_{i=1}^{n} f_i (x_i)^r \]
For r = 1
\[ \mu_1' = \frac{1}{N}\sum_{i=1}^{n} f_i (x_i)^1 = \bar{x} = \text{mean} \]
For r = 2
\[ \mu_2' = \frac{1}{N}\sum_{i=1}^{n} f_i (x_i)^2 \]
\[ \mu_3' = \frac{1}{N}\sum_{i=1}^{n} f_i (x_i)^3, \quad \mu_4' = \frac{1}{N}\sum_{i=1}^{n} f_i (x_i)^4 \]
\[ \mu_3' = \mu_3 + 3\mu_2\mu_1' + \mu_1'^3 \]
\[ \mu_4' = \mu_4 + 4\mu_3\mu_1' + 6\mu_2\mu_1'^2 + \mu_1'^4 \]
Example 1: Find the first four moments for the following individual series:
X 1 3 9 12 20
Solution:
Mean x̄ = Σx/n = 45/5 = 9
Sl. No.   x     x − x̄   (x − x̄)²   (x − x̄)³   (x − x̄)⁴
1         1     –8       64          –512        4096
2         3     –6       36          –216        1296
3         9     0        0           0           0
4         12    3        9           27          81
5         20    11       121         1331        14641
n = 5     Σx = 45   Σ(x − x̄) = 0   Σ(x − x̄)² = 230   Σ(x − x̄)³ = 630   Σ(x − x̄)⁴ = 20114
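\[ \mu_1 = \frac{\sum_{i=1}^{5} (x_i - \bar{x})^1}{n} = 0 \]
\[ \mu_2 = \frac{\sum_{i=1}^{5} (x_i - \bar{x})^2}{n} = \frac{230}{5} = 46 \]
\[ \mu_3 = \frac{\sum_{i=1}^{5} (x_i - \bar{x})^3}{n} = \frac{630}{5} = 126 \]
\[ \mu_4 = \frac{\sum_{i=1}^{5} (x_i - \bar{x})^4}{n} = \frac{20114}{5} = 4022.8 \]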
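The four central moments of Example 1 can be checked with a few lines of Python. The function name `central_moments` is an assumption made for this sketch; it applies the definition μr = Σ(x − x̄)^r / n directly to an individual series.

```python
def central_moments(values, orders=(1, 2, 3, 4)):
    """Moments about the mean for an individual (ungrouped) series."""
    n = len(values)
    mean = sum(values) / n
    return [sum((x - mean) ** r for x in values) / n for r in orders]


print(central_moments([1, 3, 9, 12, 20]))  # [0.0, 46.0, 126.0, 4022.8]
```

The first central moment is always zero, which is a useful check on the arithmetic.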
Example 2: Calculate the variance and third central moment from the
following data:
xi 0 1 2 3 4 5 6 7 8
fi 1 9 26 59 72 52 29 7 1
Solution:
xi fi xi – 4 fi (xi – 4) fi (xi – 4)2 fi (xi – 4)3
0 1 – 4 – 4 16 – 64
1 9 – 3 – 27 81 – 243
2 26 – 2 – 52 104 – 208
3 59 – 1 – 59 59 – 59
4 72 0 0 0 0
5 52 1 52 52 52
6 29 2 58 116 232
7 7 3 21 63 189
8 1 4 4 16 64
Total: Σfi = 256, Σfi(xi – 4) = –7, Σfi(xi – 4)² = 507, Σfi(xi – 4)³ = –37
\[ \mu_1' = \frac{1}{N}\sum_{i=1}^{9} f_i (x_i - 4)^1 = \frac{-7}{256} \]
\[ \mu_2' = \frac{1}{N}\sum_{i=1}^{9} f_i (x_i - 4)^2 = \frac{507}{256} \]
\[ \mu_3' = \frac{1}{N}\sum_{i=1}^{9} f_i (x_i - 4)^3 = \frac{-37}{256} \]
\[ \mu_2 = \mu_2' - \mu_1'^2 = \frac{507}{256} - \left(\frac{-7}{256}\right)^2 = 1.98047 - 0.00075 = 1.97972 \]
\[ \mu_3 = \mu_3' - 3\mu_2'\mu_1' + 2\mu_1'^3 = \frac{-37}{256} - 3\left(\frac{507}{256}\right)\left(\frac{-7}{256}\right) + 2\left(\frac{-7}{256}\right)^3 \]
= –0.14453 + 0.16246 – 0.00004
= 0.01789
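The conversion used in Example 2, from raw moments about A to the variance and third central moment, can be sketched in Python as follows. The function name variance_and_mu3 is an illustrative assumption; the data are those of Example 2.

```python
def variance_and_mu3(xs, fs, a):
    """Variance and third central moment from raw moments about the point a."""
    n_total = sum(fs)
    # Raw moments about a: m_r = (1/N) * sum f * (x - a)^r
    m1, m2, m3 = (sum(f * (x - a) ** r for x, f in zip(xs, fs)) / n_total
                  for r in (1, 2, 3))
    mu2 = m2 - m1 ** 2                      # mu_2 = mu_2' - mu_1'^2
    mu3 = m3 - 3 * m2 * m1 + 2 * m1 ** 3    # mu_3 = mu_3' - 3 mu_2' mu_1' + 2 mu_1'^3
    return mu2, mu3


xs = [0, 1, 2, 3, 4, 5, 6, 7, 8]
fs = [1, 9, 26, 59, 72, 52, 29, 7, 1]
mu2, mu3 = variance_and_mu3(xs, fs, a=4)
print(round(mu2, 5), round(mu3, 5))
```

Any other choice of a should give the same central moments, which makes a convenient consistency check.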
Example 3: Calculate μ1, μ2, μ3, μ4 for the following frequency distribution:
Marks              0–10    10–20    20–30    30–40    40–50    50–60
No. of Students    1       6        10       15       11       7
Solution: There are n = 6 classes. The mean is x̄ = Σfx/N = 1750/50 = 35.
Marks    f     Mid value x    fx      x − x̄    f(x − x̄)    f(x − x̄)²    f(x − x̄)³    f(x − x̄)⁴
0–10     1     5              5       −30       −30          900           −27000        810000
10–20    6     15             90      −20       −120         2400          −48000        960000
20–30    10    25             250     −10       −100         1000          −10000        100000
30–40    15    35             525     0         0            0             0             0
40–50    11    45             495     10        110          1100          11000         110000
50–60    7     55             385     20        140          2800          56000         1120000
Total    50                   1750              0            8200          −18000        3100000
\[ \mu_1 = \frac{1}{N}\sum_{i=1}^{n} f_i (x_i - \bar{x}) = \frac{0}{50} = 0 \]
\[ \mu_2 = \frac{1}{N}\sum_{i=1}^{n} f_i (x_i - \bar{x})^2 = \frac{8200}{50} = 164 \]
\[ \mu_3 = \frac{1}{N}\sum_{i=1}^{n} f_i (x_i - \bar{x})^3 = \frac{-18000}{50} = -360 \]
\[ \mu_4 = \frac{1}{N}\sum_{i=1}^{n} f_i (x_i - \bar{x})^4 = \frac{3100000}{50} = 62000 \]
6.2 Skewness
Skewness denotes lack of symmetry. A distribution is said to be skewed
if its mean, median and mode are not equal.
Skew Symmetrical Distribution
A distribution which is not symmetrical is said to be a skew symmetrical
(skewed) distribution. In such a distribution the left tail and the right
tail are not of equal length; one tail is longer than the other.
Negatively Skewed Distribution
In a negatively skewed distribution, the left tail of the curve is longer
than the right tail.
Symmetric Distribution
In a symmetric distribution, both tails of the curve are of equal length.
Definition of Skewness
A distribution is said to be ‘skewed’ when the mean, median and mode
fall at different points in the distribution and the balance is shifted to
one side or the other, that is, to the left or to the right. The
coefficient of skewness is denoted by Sk.
Note:
(i) There is no skewness in the distribution if mean = mode = median
(ii) There is no skewness in the distribution if, third quartile – median
= median – first quartile.
(iii) There is no skewness if the sum of the frequencies which are less
than mode = sum of the frequencies which are greater than mode
(iv) There is no skewness if quartiles are equidistant from the median.
(v) The distribution is negatively skewed if mean is less than mode.
Types of Skewness
1. Fairly symmetrical
2. Positively skewed
3. Negatively skewed.
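11    5    2    10    20    44
12    4    3    12    36    48
Σf = 48, Σfd = 0, Σfd² = 124
\[ \text{Mean} = A + \frac{\sum fd}{\sum f} = 9 + \frac{0}{48} = 9 \]
Mode = value of x corresponding to maximum frequency (13) = 9
\[ \text{SD} = \sqrt{\frac{\sum fd^2}{\sum f} - \left(\frac{\sum fd}{\sum f}\right)^2} = \sqrt{\frac{124}{48} - \left(\frac{0}{48}\right)^2} = 1.61 \]
\[ \text{Karl Pearson's Coefficient of Skewness} = \frac{\text{Mean} - \text{Mode}}{\text{Standard Deviation}} = \frac{9 - 9}{1.61} = 0 \]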
Example 7: Calculate Karl Pearson’s Coefficient of Skewness from the
table given below:
Wages of Day      55–58    58–61    61–64    64–67    67–70
No. of Workers    12       17       23       18       11
Solution: Let A = 62.5
Wages of Day    No. of Workers (f)    Mid Value (x)    d = x − 62.5    fd     fd²     c.f.
55–58           12                    56.5             −6              −72    432     12
58–61           17                    59.5             −3              −51    153     29
61–64           23                    62.5             0               0      0       52
64–67           18                    65.5             3               54     162     70
67–70           11                    68.5             6               66     396     81
Total           81                                                     −3     1143
\[ \text{Mean} = A + \frac{\sum fd}{\sum f} = 62.5 + \frac{-3}{81} = 62.46 \]
Median class is 61–64.
\[ \text{Median} = l + \frac{\frac{N}{2} - cf}{f} \times i = 61 + \frac{\frac{81}{2} - 29}{23} \times 3 = 62.5 \]
\[ \text{SD} = \sqrt{\frac{1143}{81} - \left(\frac{-3}{81}\right)^2} = \sqrt{\frac{10286}{729}} = 3.76 \]
\[ \text{Karl Pearson's Coefficient of Skewness} = \frac{3(\text{Mean} - \text{Median})}{\text{SD}} = \frac{3(62.46 - 62.5)}{3.76} = -0.032 \]
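The two forms of Karl Pearson's coefficient used in the examples above, one based on the mode and one based on the median, can be written as small Python functions. This is a minimal sketch; the function names are illustrative assumptions, and the inputs are the rounded values from the worked examples.

```python
def pearson_skewness_mode(mean, mode, sd):
    """Karl Pearson's coefficient: (mean - mode) / standard deviation."""
    return (mean - mode) / sd


def pearson_skewness_median(mean, median, sd):
    """Karl Pearson's coefficient: 3 * (mean - median) / standard deviation."""
    return 3 * (mean - median) / sd


# Rounded values taken from the worked examples.
sk_mode = pearson_skewness_mode(mean=9, mode=9, sd=1.61)
sk_median = pearson_skewness_median(mean=62.46, median=62.5, sd=3.76)
print(sk_mode, round(sk_median, 3))
```

The median form is the usual choice when the mode is ill-defined. Because it is computed here from rounded inputs, the last digit of the result can differ slightly from a calculation carried at full precision.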
6.2.3 Bowley’s Coefficient of Skewness
Bowley’s Coefficient of Skewness is based on the quartiles and median.
A distribution is symmetrical if the distance between the first quartile and
median is equal to the distance between the median and third quartile.
It is defined as
\[ \text{Bowley's Coefficient of Skewness} = \frac{Q_3 + Q_1 - 2\,\text{Median}}{Q_3 - Q_1} \]
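Bowley's formula translates directly into code. The sketch below is illustrative only; the function name bowley_skewness and the sample quartiles are assumptions. The coefficient is zero when the quartiles are equidistant from the median and negative when the median lies closer to Q3.

```python
def bowley_skewness(q1, median, q3):
    """Bowley's coefficient: (Q3 + Q1 - 2 * Median) / (Q3 - Q1)."""
    if q3 == q1:
        raise ValueError("Q3 and Q1 coincide; the coefficient is undefined")
    return (q3 + q1 - 2 * median) / (q3 - q1)


symmetric = bowley_skewness(q1=2, median=5, q3=8)   # quartiles equidistant from the median
negative = bowley_skewness(q1=1, median=3, q3=4)    # median closer to Q3
print(symmetric, round(negative, 2))
```

Since the numerator can never exceed the denominator in magnitude, the coefficient always lies between −1 and +1.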
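\[ Q_3 = \text{size of } \left(\frac{3(N + 1)}{4}\right)^{\text{th}} \text{ item} = 4 \]
\[ \text{Median} = \text{size of } \left(\frac{N + 1}{2}\right)^{\text{th}} \text{ item} = 3 \]
\[ \text{Bowley's Coefficient of Skewness} = \frac{Q_3 + Q_1 - 2\,\text{Median}}{Q_3 - Q_1} = \frac{4 + 1 - 2(3)}{4 - 1} = -0.33 \]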
6.3 Kurtosis
Kurtosis describes the shape of a frequency distribution. It is a measure
of the flatness or peakedness of the curve.
Example 10: Find the relation between moment about the mean and moment
about any arbitrary point. The first four moments of a distribution
about the value 4 of the variate are –1.5, 17, –30 and 108. Calculate the
first four moments about the mean and find β1 and β2.
Solution:
We have,
4    70    0    0     0      0      0
5    56    1    56    56     56     56
6    28    2    56    112    224    448
7    8     3    24    72     216    648
8    1     4    4     16     64     256
Σf = 256, Σfd = 0, Σfd² = 512, Σfd³ = 0, Σfd⁴ = 2816
Moments about the point A = 4 are
Solution: Let A = 20 and h = 5.
x     f     d = (x − 20)/5    fd     fd²    fd³     fd⁴
5     8     −3                −24    72     −216    648
10    15    −2                −30    60     −120    240
15    20    −1                −20    20     −20     20
20    32    0                 0      0      0       0
25    23    1                 23     23     23      23
30    17    2                 34     68     136     272
35    5     3                 15     45     135     405
Total 120                     −2     288    −62     1608
\[ \mu_1' = h \times \frac{1}{N}\sum_{i=1}^{7} f_i d_i = 5 \times \frac{-2}{120} = -0.0833 \]
\[ \mu_2' = h^2 \times \frac{1}{N}\sum_{i=1}^{7} f_i d_i^2 = 25 \times \frac{288}{120} = 60 \]
\[ \mu_3' = h^3 \times \frac{1}{N}\sum_{i=1}^{7} f_i d_i^3 = 125 \times \frac{-62}{120} = -64.583 \]
\[ \mu_4' = h^4 \times \frac{1}{N}\sum_{i=1}^{7} f_i d_i^4 = 625 \times \frac{1608}{120} = 8375 \]
The moments about the mean are:
\[ \mu_1 = 0 \]
\[ \mu_2 = \mu_2' - \mu_1'^2 = 60 - (-0.0833)^2 = 59.993 \]
\[ \mu_3 = \mu_3' - 3\mu_2'\mu_1' + 2\mu_1'^3 = -64.583 - 3(60)(-0.0833) + 2(-0.0833)^3 = -49.584 \]
\[ \mu_4 = \mu_4' - 4\mu_3'\mu_1' + 6\mu_2'\mu_1'^2 - 3\mu_1'^4 \]
\[ = 8375 - 4(-64.583)(-0.0833) + 6(60)(-0.0833)^2 - 3(-0.0833)^4 = 8355.97 \]
Example 13: Calculate the first four moments about the mean and comment
on the nature of the distribution:
x    1    2    3     4     5     6     7    8    9
f    1    6    13    25    30    22    9    5    2
Solution: Take A = 5.
x     f      fx     d = x − 5    fd     fd²    fd³     fd⁴
1     1      1      −4           −4     16     −64     256
2     6      12     −3           −18    54     −162    486
3     13     39     −2           −26    52     −104    208
4     25     100    −1           −25    25     −25     25
5     30     150    0            0      0      0       0
6     22     132    1            22     22     22      22
7     9      63     2            18     36     72      144
8     5      40     3            15     45     135     405
9     2      18     4            8      32     128     512
Total 113    555                 −10    282    2       2058
\[ \text{Mean} = \frac{1}{N}\sum_{i=1}^{9} f_i x_i = \frac{555}{113} = 4.91 \]
\[ \mu_1' = \frac{1}{N}\sum_{i=1}^{9} f_i (x_i - 5) = \frac{-10}{113} \]
\[ \mu_2' = \frac{1}{N}\sum_{i=1}^{9} f_i (x_i - 5)^2 = \frac{282}{113} \]
\[ \mu_3' = \frac{1}{N}\sum_{i=1}^{9} f_i (x_i - 5)^3 = \frac{2}{113} \]
\[ \mu_4' = \frac{1}{N}\sum_{i=1}^{9} f_i (x_i - 5)^4 = \frac{2058}{113} \]
The moments about the mean are:
\[ \mu_1 = 0 \]
\[ \mu_2 = \mu_2' - \mu_1'^2 = \frac{282}{113} - \left(\frac{-10}{113}\right)^2 = 2.496 - 0.0078 = 2.488 \]
\[ \mu_3 = \mu_3' - 3\mu_2'\mu_1' + 2\mu_1'^3 = \frac{2}{113} - 3\left(\frac{282}{113}\right)\left(\frac{-10}{113}\right) + 2\left(\frac{-10}{113}\right)^3 \]
\[ = 0.017699 + 0.662542 - 0.001386 = 0.67886 \]
\[ \mu_4 = \mu_4' - 4\mu_3'\mu_1' + 6\mu_2'\mu_1'^2 - 3\mu_1'^4 \]
\[ = \frac{2058}{113} - 4\left(\frac{2}{113}\right)\left(\frac{-10}{113}\right) + 6\left(\frac{282}{113}\right)\left(\frac{-10}{113}\right)^2 - 3\left(\frac{-10}{113}\right)^4 \]
\[ = 18.212 + 0.00626 + 0.11727 - 0.00018 = 18.336 \]
\[ \beta_1 \text{ (skewness)} = \frac{\mu_3^2}{\mu_2^3} = \frac{0.67886^2}{2.488^3} = \frac{0.46085}{15.401} = 0.0299 \]
\[ \beta_2 \text{ (kurtosis)} = \frac{\mu_4}{\mu_2^2} = \frac{18.336}{2.488^2} = \frac{18.336}{6.19} = 2.96 \]
Since β1 is close to 0 and μ3 is positive, the distribution is very slightly positively skewed. Since β2 is slightly less than 3, it is nearly mesokurtic (very slightly platykurtic).
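The coefficients β1 and β2 can also be computed directly from the deviations about the mean, which avoids the conversion from raw moments and gives an independent check on the hand calculation. The Python sketch below uses the data of Example 13; the function name beta_coefficients is an illustrative assumption.

```python
def beta_coefficients(xs, fs):
    """Return (beta1, beta2) for a discrete frequency distribution."""
    n_total = sum(fs)
    mean = sum(f * x for x, f in zip(xs, fs)) / n_total

    def central(r):
        # r-th moment about the mean
        return sum(f * (x - mean) ** r for x, f in zip(xs, fs)) / n_total

    mu2, mu3, mu4 = central(2), central(3), central(4)
    beta1 = mu3 ** 2 / mu2 ** 3   # measure of skewness
    beta2 = mu4 / mu2 ** 2        # measure of kurtosis
    return beta1, beta2


xs = [1, 2, 3, 4, 5, 6, 7, 8, 9]
fs = [1, 6, 13, 25, 30, 22, 9, 5, 2]
beta1, beta2 = beta_coefficients(xs, fs)
print(round(beta1, 4), round(beta2, 2))
```

A value of β2 near 3 indicates a roughly mesokurtic curve, and β1 near 0 indicates little skewness.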
\[ \text{S.D.} = \sqrt{\frac{\sum X^2}{N} - (\bar{X})^2} = \sqrt{\frac{144280}{20} - (72.6)^2} = 44.08 \]
Hence,
\[ Sk = \frac{72.6 - 63.7}{44.08} = 0.202 \]
Example 15: Calculate the Karl Pearson’s coefficient of skewness from
the following data:
Size (x)         3.5    4.5    5.5    6.5    7.5    8.5    9.5
Frequency (f)    3      7      22     60     85     32     8
Solution:
x      f      d = x − 6.5    fd     fd²
3.5    3      −3             −9     27
4.5    7      −2             −14    28
5.5    22     −1             −22    22
6.5    60     0              0      0
7.5    85     1              85     85
8.5    32     2              64     128
9.5    8      3              24     72
Total  217                   128    362
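\[ \text{S.D.} = \sqrt{\frac{\sum fd^2}{N} - \left(\frac{\sum fd}{N}\right)^2} = \sqrt{\frac{362}{217} - \left(\frac{128}{217}\right)^2} = 1.149 \]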
\[ Q_1 = l + \frac{\frac{N}{4} - cf}{f} \times i = 37 + \frac{\frac{204}{4} - 34}{42} \times 1 = 37.405 \]
Q3 = size of (3N/4)th item = 153rd item. Hence Q3 will lie in the class 40–41.
\[ Q_3 = l + \frac{\frac{3N}{4} - cf}{f} \times i = 40 + \frac{\frac{3(204)}{4} - 130}{45} \times 1 = 40 + 0.511 = 40.511 \]
Median = size of (N/2)th item = 102nd item. Hence the median will lie in the class 38–39.
\[ \text{Median} = l + \frac{\frac{N}{2} - cf}{f} \times i = 38 + \frac{\frac{204}{2} - 76}{54} \times 1 = 38 + 0.48 = 38.48 \]
Now,
\[ \text{Bowley's Coefficient of Skewness} = \frac{Q_3 + Q_1 - 2\,\text{Median}}{Q_3 - Q_1} \]
Solution:
Income (Rs.)    No. of Persons (f)    Cumulative Frequency (c.f.)
Below 200       25                    25
200–400         40                    65
400–600         80                    145
600–800         75                    220
800–1000        20                    240
Above 1000      16                    256
Here N = 256.
Q1 = size of (N/4)th item = 64th item. Hence Q1 lies in the class 200–400.
\[ Q_1 = l + \frac{\frac{N}{4} - cf}{f} \times i = 200 + \frac{\frac{256}{4} - 25}{40} \times 200 = 200 + 195 = 395 \]
Q3 = size of (3N/4)th item = 192nd item. Hence Q3 will lie in the class 600–800.
\[ Q_3 = l + \frac{\frac{3N}{4} - cf}{f} \times i = 600 + \frac{\frac{3(256)}{4} - 145}{75} \times 200 = 600 + 125.33 = 725.33 \]
Median = size of (N/2)th item = 128th item. Hence the median will lie in the class 400–600.
\[ \text{Median} = l + \frac{\frac{N}{2} - cf}{f} \times i = 400 + \frac{\frac{256}{2} - 65}{80} \times 200 = 400 + 157.5 = 557.5 \]
Now,
\[ \text{Bowley's Coefficient of Skewness} = \frac{Q_3 + Q_1 - 2\,\text{Median}}{Q_3 - Q_1} \]
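The interpolation formula l + ((position − c.f.)/f) × i used above for Q1, the median and Q3 can be sketched in Python. The function name grouped_quantile is an illustrative assumption. As a further assumption made only for this sketch, the open-ended classes "Below 200" and "Above 1000" are given the same width of 200 as the other classes; this does not affect the result because no quartile falls in them.

```python
def grouped_quantile(lowers, width, freqs, fraction):
    """Interpolated quantile of grouped data: l + ((fraction*N - cf) / f) * i."""
    target = fraction * sum(freqs)   # position of the required item
    cum = 0                          # cumulative frequency before the current class
    for lower, f in zip(lowers, freqs):
        if f > 0 and cum + f >= target:
            return lower + (target - cum) / f * width
        cum += f
    raise ValueError("fraction must lie between 0 and 1")


# Income distribution from the example; lower limit 0 for the first class is assumed.
lowers = [0, 200, 400, 600, 800, 1000]
freqs = [25, 40, 80, 75, 20, 16]

q1 = grouped_quantile(lowers, 200, freqs, 0.25)
median = grouped_quantile(lowers, 200, freqs, 0.50)
q3 = grouped_quantile(lowers, 200, freqs, 0.75)
bowley = (q3 + q1 - 2 * median) / (q3 - q1)
print(q1, median, round(q3, 2), round(bowley, 4))
```

The function locates the class containing the required item from the running cumulative frequency, so the same call works for any quartile, decile or percentile.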
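Or \( \gamma_1^2 = \beta_1 \)
\( 1^2 = \beta_1 \)
Or \( \beta_1 = 1 \)
Also,
\[ \beta_1 = \frac{\mu_3^2}{\mu_2^3} \]
\[ 1 = \frac{\mu_3^2}{16^3} \]
Or \( \mu_3^2 = 4096 \Rightarrow \mu_3 = 64 \)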
Or \( \mu_4 = 4\mu_2^2 = 4(16)^2 = 1024 \)
\[ 44 = 40 + \frac{27 - x}{2(27) - x - y} \times 20 \]
\[ \frac{44 - 40}{20} = \frac{27 - x}{54 - x - y} \]
\[ \frac{1}{5} = \frac{27 - x}{54 - x - y} \]
Or 54 – x – y = 5(27 – x)
Or 54 – x – y = 135 – 5x
Or 4x – y = 81 ------ *
Also total frequency = N = 100
Thus, 13 + x + 27 + y + 16 = 100
Or x + y = 44 ------ **
Solving * and **, we get
x = 25 and y = 19
Now, we calculate Karl Pearson’s coefficient of skewness.
Daily Expenditure    Mid Value (x)    f      d = (x − 50)/20    fd     fd²
0–20                 10               13     −2                 −26    52
20–40                30               25     −1                 −25    25
40–60                50               27     0                  0      0
60–80                70               19     1                  19     19
80–100               90               16     2                  32     64
Total                                 100                       0      160
\[ \text{Karl Pearson's coefficient of skewness} = Sk = \frac{\text{Mean} - \text{Mode}}{\text{S.D.}} \]
Here, A = 50, i = 20 and N = 100.
\[ \text{Mean} = A + \frac{\sum fd}{N} \times i = 50 + \frac{0}{100} \times 20 = 50 + 0 = 50 \]
\[ \text{S.D.} = i \times \sqrt{\frac{\sum fd^2}{N} - \left(\frac{\sum fd}{N}\right)^2} = 20 \times \sqrt{\frac{160}{100} - \left(\frac{0}{100}\right)^2} = 25.3 \]
Mode = 44
Hence,
\[ Sk = \frac{50 - 44}{25.3} = 0.237 \]
Since Sk > 0, the distribution is positively skewed.
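The step-deviation method used for the mean and standard deviation in this example can be sketched in Python. The function name step_deviation_stats is an illustrative assumption; the mid values and frequencies are those of the table above, with assumed mean A = 50 and class width i = 20.

```python
import math


def step_deviation_stats(mids, freqs, assumed_mean, width):
    """Mean and standard deviation of grouped data by the step-deviation method."""
    n_total = sum(freqs)
    ds = [(x - assumed_mean) / width for x in mids]      # d = (x - A) / i
    sum_fd = sum(f * d for f, d in zip(freqs, ds))
    sum_fd2 = sum(f * d * d for f, d in zip(freqs, ds))
    mean = assumed_mean + sum_fd / n_total * width
    sd = width * math.sqrt(sum_fd2 / n_total - (sum_fd / n_total) ** 2)
    return mean, sd


mids = [10, 30, 50, 70, 90]
freqs = [13, 25, 27, 19, 16]
mean, sd = step_deviation_stats(mids, freqs, assumed_mean=50, width=20)
sk = (mean - 44) / sd   # mode = 44, as given in the example
print(mean, round(sd, 2), round(sk, 3))
```

Dividing the deviations by the class width keeps the column totals small, and multiplying back by the width at the end restores the original units.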
6.6 Exercise
1. Calculate first four moments about the mean, for the following
individual series:
5 5 5 5 5 5
2. Find the first four moments about the mean of the following series:
1 3 7 9 10
3. The first four moments of a distribution about the value 5 are
–4, 22, –117 and 560. Determine the corresponding moments about
(i) the mean, and (ii) zero.
4. Compute first four moments of the data 3, 5, 7, 9 about the mean.
Also, compute the first four moments about the point
5. Calculate Karl Pearson’s Coefficient of Skewness from the data given
below:
1. S.D. = 6.5, mean = 29.6, mode = 27.52.
2. Mean = 100, Variance = 35, Median = 99.61.
3. Mean = 45, Median = 48, S.D. = 22.5.
6. Find the Karl Pearson’s Coefficient of Skewness for the following:
Years Under 10 20 30 40 50 60
18. Given the following information, find the first four central moments