
STATISTICS

(THEORY AND PRACTICALS)


(As per New CBCS Syllabus for 1st Year, 1st Semester,
Common for B.A./B.Sc. for Osmania University and for All Other Universities in
Telangana State w.e.f. 2016-17)

Dr. M. Jagan Mohan Rao


M.Sc., M.Phil., Ph.D.
Principal,
Jagruti Degree and PG College,
Narayanaguda, Hyderabad - 29.

ISO 9001:2008 CERTIFIED


© Author
No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or
by any means, electronic, mechanical, photocopying, recording and/or otherwise without the prior
written permission of the publisher.

First Edition : 2017

Published by : Mrs. Meena Pandey for Himalaya Publishing House Pvt. Ltd.,
“Ramdoot”, Dr. Bhalerao Marg, Girgaon, Mumbai - 400 004.
Phone: 022-23860170/23863863; Fax: 022-23877178
E-mail: [email protected]; Website: www.himpub.com
Branch Offices :
New Delhi : “Pooja Apartments”, 4-B, Murari Lal Street, Ansari Road, Darya Ganj,
New Delhi - 110 002. Phone: 011-23270392, 23278631; Fax: 011-23256286
Nagpur : Kundanlal Chandak Industrial Estate, Ghat Road, Nagpur - 440 018.
Phone: 0712-2738731, 3296733; Telefax: 0712-2721216
Bengaluru : Plot No. 91-33, 2nd Main Road Seshadripuram, Behind Nataraja Theatre,
Bengaluru-560020. Phone: 08041138821, 9379847017, 9379847005
Hyderabad : No. 3-4-184, Lingampally, Besides Raghavendra Swamy Matham, Kachiguda,
Hyderabad - 500 027. Phone: 040-27560041, 27550139
Chennai : New-20, Old-59, Thirumalai Pillai Road, T. Nagar, Chennai - 600 017.
Mobile: 9380460419
Pune : First Floor, "Laksha" Apartment, No. 527, Mehunpura, Shaniwarpeth
(Near Prabhat Theatre), Pune - 411 030. Phone: 020-24496323/24496333;
Mobile: 09370579333
Lucknow : House No. 731, Shekhupura Colony, Near B.D. Convent School, Aliganj,
Lucknow - 226 022. Phone: 0522-4012353; Mobile: 09307501549
Ahmedabad : 114, “SHAIL”, 1st Floor, Opp. Madhu Sudan House, C.G. Road, Navrang Pura,
Ahmedabad - 380 009. Phone: 079-26560126; Mobile: 09377088847
Ernakulam : 39/176 (New No.: 60/251) 1st Floor, Karikkamuri Road, Ernakulam,
Kochi – 682011. Phone: 0484-2378012, 2378016 Mobile: 09387122121
Bhubaneswar : 5 Station Square, Bhubaneswar - 751 001 (Odisha).
Phone: 0674-2532129, Mobile: 09338746007
Kolkata : 108/4, Beliaghata Main Road, Near ID Hospital, Opp. SBI Bank,
Kolkata - 700 010, Phone: 033-32449649, Mobile: 7439040301
DTP by : Sunita
Printed at : M/s. Aditya Offset Process (I) Pvt. Ltd., Hyderabad. On behalf of HPH.
PREFACE

As no existing book exactly meets the requirements of B.A./B.Sc. First Year students of
Statistics under the Semester System (CBCS), an attempt has been made to write a book that caters
to the needs of students. The book is also made more suitable for classroom teaching by providing a
large number of real-life examples at appropriate places and practicals using MS-EXCEL at the
end.
I express my sincere gratitude to Prof. R.J. Ramalinga Swamy, Prof. M. Krishna Reddy,
Prof. M. Gopal Rao, Prof. P. Udaya Sree, Prof. P. Lakshmi Manga, Prof. V.V. Haragopal,
Dr. S. Jyothi Rani, Dr. K. Vani, Dr. C. Jaya Lakshmi, and Prof. Y. Jagadeswar, for their blessings,
valuable suggestions and encouragement during the preparation of the book. I also thank the other
staff members of the Department of Statistics, Osmania University, Hyderabad for their support.
I take this opportunity to thank Dr. R. Sudhakar Reddy and Dr. K. Ranga Rao,
Dr. Raghunadha Charya and to my best friends Mr. V. Papa Rao, Mr. K. Venkat Raman,
Mr. Yugandhar, Mr. Goverdhan, Mr. Mohan Prasad, Mr. Shekharam and Mrs. Parimala Sudheer,
Lecturers in Statistics for their suggestions and help.
I am thankful to Dr. O.S. Reddy, Chairman, Jagruti Group of Institutions, Mrs. D. Josephine,
Academic Director, Mr. S. Surya Prakash, Vice-Principal, Jagruti Degree & P.G. College.
Special thanks to Mr. Raghavendra Kulkarni, Principal, Nrupathunga College, Hyderabad,
Mr. Murali Krishna, Principal, G. Pulla Reddy College, Hyderabad, Dr. K. Padmavathi, Babu
Jagjivan Ram Govt. Degree College, Hyderabad and Dr. S. Srinivasa Rao, Principal, Govt.
Degree College, Atmakur, Mahaboob Nagar Dist.
Lastly, I would like to mention that although every possible care has been taken to make the
book free from printing errors, the possibility of some error creeping in inadvertently
cannot be ruled out. I shall feel highly obliged to all readers if such errors are brought to my notice.
Critical evaluation and suggestions for improvement are most welcome and shall be gratefully
acknowledged.

Hyderabad Dr. M. Jagan Mohan Rao


SYLLABUS

B.A./B.Sc. 1st Year First Semester (CBCS)


Statistics Syllabus
Paper-I/I: Descriptive Statistics and Probability (DSC-2A)
(4HPW with 4 Credits and 100 Marks)
UNIT – I
Descriptive Statistics: Concept of primary and secondary data. Methods of collection and
editing of primary data. Designing a questionnaire and a schedule. Sources and editing of secondary
data. Classification and tabulation of data. Measures of central tendency (mean, median, mode,
geometric mean and harmonic mean) with simple applications. Absolute and relative measures of
dispersion. Importance of moments, central and non-central moments and their interrelationships, Sheppard’s
corrections for moments for grouped data. Measures of skewness based on quartiles and moments and
kurtosis based on moments with real-life examples.
UNIT – II
Probability: Basic concepts in probability: deterministic and random experiments, trial,
outcome, sample space, event, and operations of events, mutually exclusive and exhaustive events, and
equally likely and favourable outcomes with examples. Mathematical, statistical and axiomatic
definitions of probability with merits and demerits. Properties of probability based on axiomatic
definition. Conditional probability and independence of events. Addition and multiplication theorems
for n events. Boole’s inequality and Bayes theorem. Problems on probability using counting methods
and theorems.
UNIT – III
Random Variables: Definition of random variable, discrete and continuous random variables,
functions of random variables, probability mass function and probability density function with
illustrations. Distribution function and its properties. Transformation of one-dimensional random
variable (simple 1-1 functions only). Notion of bivariate random variable, bivariate distribution and
statement of its properties. Joint, marginal and conditional distributions. Independence of random
variables.
UNIT – IV
Mathematical Expectation: Mathematical expectation of a function of a random variable. Raw
and central moments and covariance using mathematical expectation with examples. Addition and
multiplication theorems of expectation. Definition of moment generating function (M.G.F.), cumulant
generating function (C.G.F.), probability generating function (P.G.F.) and characteristic function (C.F.)
and statements of their properties with applications. Chebyshev’s and Cauchy-Schwarz’s inequalities
and their applications. Statement and applications of weak law of large numbers and central limit
theorem for identically and independently distributed (I.I.D.) random variables with finite variance.
STATISTICS
Practical Paper – I/I

1. Basics of EXCEL – Data entry, editing and saving, establishing and copying formulae,
built in functions in EXCEL, copy and paste and exporting to MS WORD document.
2. Graphical presentation of data (Histogram, Frequency Polygon, Ogives).
3. Graphical presentation of data (Histogram, Frequency Polygon, Ogives) using MS-EXCEL.
4. Diagrammatic presentation of data (Bar and Pie).
5. Diagrammatic presentation of data (Bar and Pie) using MS-EXCEL.
6. Computation of non-central and central moments – Sheppard’s corrections for grouped data.
7. Computation of coefficients of Skewness and Kurtosis – Karl Pearson’s and Bowley’s
β1 and β2.
8. Computation of measures of central tendency, dispersion and coefficients of Skewness,
Kurtosis using MS-Excel.
CONTENTS

1. Descriptive Statistics
2. Analysis of Quantitative Data
3. Theory of Probability
4. Random Variables
5. Mathematical Expectation
6. Transformation of Random Variable
7. Inequalities
8. Practicals using MS-EXCEL
Descriptive Statistics 1

CHAPTER 1

DESCRIPTIVE STATISTICS

OUTLINE OF THE CHAPTER

1.1 Introduction

1.2 Primary Data and its Collection

1.3 Secondary Data

1.4 Sources of Secondary Data

1.5 Designing of a Questionnaire and Schedule

1.6 Classification of Data

1.7 Tabulation of Data

1.8 Exercise
2 Statistics (Theory and Practicals)

1.1 Introduction
Without the availability of data, the science of statistics would cease to exist. The information
gathered from a sample is typically used to gain insight into a much larger population. The reliability
of conclusions drawn from sample data depends to a great extent on the quality of the data. Are
they accurate? Do the data really represent the population of interest? Was the sample collected
properly? Before we answer these questions, we first examine how the data have been collected
and from where the data have been collected. Thus, this chapter is devoted to study data collection
methods.
The systematic, planned and meaningful way of collecting information is known as collection
of data. The methods for collection of data depend upon several considerations such as objective,
scope and nature of the problem under study. Keeping in view the aim of the investigation, the data
may be collected either from a primary source or from a secondary source. Primary data are
information collected for the first time by an enumerator for his investigation. Information
that has already been collected by others, or by the same investigator for another purpose, and is used for
the current study is known as secondary data. A detailed discussion on the method of collection of
primary and secondary data has been taken up in the subsequent sections.

1.2 Primary Data and its Collection


When we start a new project, the required information may not be available; even if it is available,
it may be insufficient or not totally reliable. In such cases, the data have to be collected first hand,
and such data are known as primary data.
Primary data are original and first hand information. For example, Osmania University regularly
enumerates data on various aspects of examination results such as number of candidates failed,
number of candidates passed, number of candidates who secured first class, etc. of a certain examination.
These results constitute primary data.
For collecting primary data, the enumerator may select any one of the following methods:
(i) Direct personal interview or observation
(ii) Indirect personal interview or observation
(iii) Mailed questionnaires
(iv) Schedules sent through enumerators.

Direct Personal Interview


Under this method, the information will be collected by the investigator through personal
interview from the informants. The reliability of collected data depends upon the training and attitude
of the investigator and supporting attitude of the respondent.
This method is most suitable for a type of investigation where: (i) the investigation is confidential
and (ii) the process of investigation is so complex that it requires personal attention of the investigator.
Merits:
(i) It is possible to collect original, accurate and exact data.
(ii) The doubts of informants can be checked and clarified.
(iii) The informants’ doubts can be cleared in the language most suitable to them.
Demerits:
(i) Skilled enumerators are required to collect data by this method.
(ii) This method consumes resources like time and money.

Indirect Personal Interview


This method is used when the informants are reluctant to provide information directly. When
the field of investigation is very large, the information about a large number of respondents can
indirectly be obtained from one person who may be head of an institution or community. This
method is useful to collect even secret information. It is generally adopted by police and CBI for
the collection of information regarding crimes. In the investigation of crime, they collect data from
a third party or witness or head of an institution, who is supposed to be in touch with the person
under investigation.
Merits:
(i) This method of data collection is most suitable when the area of investigation is very
large or when the respondents are reluctant to give the information directly.
(ii) If a person is not willing to reveal his habits of drinking, smoking, gambling, etc.,
the information can be collected from a third party by applying this method.
Demerits:
(i) In the absence of direct contact between investigator and informant, it may happen that
many important points remain unnoticed.
(ii) The information may be biased as it is provided by the third party.
(iii) The information collected from different persons may not be same and comparable.

Information through Local Agencies (or) Correspondents


In this method, local agents or correspondents are appointed in different parts of the area
under investigation. These agents send the required information at regular intervals of time. This
method is often adopted by newspapers.
Merits:
(i) It is economical in terms of time, money and labour.
(ii) When periodic information is required at regular intervals and area of investigation is
large, this method is very useful.
Demerits:
(i) The information lacks originality.
(ii) The bias of the correspondent affects the information.

Mailed Questionnaire Method


In this method, a set of questions is sent by mail to the informants. They are expected to
answer the questions and mail them back to the investigator. It is very useful when the informants
are educated and when the area of investigation is very wide.
Merits:
(i) It is less costly than personal interviews, particularly when the area of investigation is wide.
(ii) Collected information is free from the bias of the enumerators.
Demerits:
(i) It is applicable only to educated informants.
(ii) All informants may not send back the questionnaire.
(iii) Some of the informants may send incomplete questionnaires.

1.3 Secondary Data


The information that has already been collected by others is called secondary information.
As we have mentioned in the previous section, examination results enumerated by Osmania
University are primary data to the Osmania University but the same statistics used by anyone else
would become secondary data for that user. Similarly, vital statistics, collected every ten years,
are primary data to the Registrar General of India but the same statistics used by anyone else
would be secondary data for that user. So, secondary data can be collected from various sources.
The sources of secondary data are given in detail in the following section.

1.4 Sources of Secondary Data


The sources of secondary data can broadly be classified in two categories: (i) published and
(ii) unpublished sources.

Published Sources:
For the sake of public, information is published and made available to all interested parties.
The sources of published data are:
(i) Government Publications:
State and central governments publish reports of various committees and commissions
and official publications like Gazettes, Vital Statistics, etc.
(ii) International Publications:


Various foreign governments and international agencies like UNO, World Bank and
International Monetary Fund regularly publish reports on the data collected by them on
various aspects.
(iii) Semi-official Publications:
Various local bodies such as District Boards, Municipal Corporations, etc. publish periodicals
providing information about vital factors like health, births, deaths, etc.

Private Publications
The following private publications may also be enlisted as the source of secondary data.
(i) Reports prepared by research scholars, universities, etc.
(ii) Publications of professional bodies like Indian Statistical Institute (ISI), ICAR, NCERT,
ICMR and CSIR.
(iii) Annual reports of banks and joint stock companies, stock exchanges, etc.
(iv) Information published in newspapers, books, magazines, etc.

Unpublished Sources
All the information need not be in published form. Information can also be taken from
unpublished sources like diaries, letters, unpublished biographies and autobiographies. Unpublished
data may also be available with scholars, research workers, trade associations and individuals.

Precautions for Using Secondary Data


The secondary data may not always be useful, because they might have been collected to meet
different objectives. Before using such data, it is necessary to examine the following:
(a) Are the data reliable and suitable?
(b) Are the data sufficient for present investigation?

1.5 Designing of a Questionnaire and Schedule


1.5.1 Questionnaire
Collection of data through questionnaires is the most popular method for collecting primary
data. A questionnaire is a list of questions pertaining to the enquiry. Under this method, a questionnaire
is sent to various informants with a request to answer the questions and return the questionnaire.
The questionnaire is mailed to the respondents who are expected to read the questions and record
their response in the space meant for the purpose on the questionnaire itself. The respondents have
to answer the questions on their own. This method is extensively employed in various economic
and business surveys.
Merits:
The merits of this method are as under:
1. This method is very economical particularly when the universe is large and spread
geographically on a vast area.
2. Since the answers are in the respondent’s own words, they are free from the
bias of the interviewer.
3. Respondents can take their own time to answer the questions. So, they give well thought-
out answers.
4. Respondents that are at remote places and are not easily approachable can also be reached
conveniently.
5. Large samples can be covered and thus the results can be more dependable and reliable.

Limitations:
This method also suffers from the following limitations:
1. Sometimes the respondents do not bother to return the questionnaires. So, there is the
problem of low rate of return of the duly filled-in questionnaires. Also, the bias due to
non-response often cannot be determined.
2. Questionnaires can be circulated only among the respondents who are educated and
cooperative.
3. Once the questionnaires are sent to the respondents, the investigator cannot change or
modify the questions for individual respondents.
4. There is no flexibility because of the difficulty of amending the approach once the
questionnaires have been despatched.
5. There is also the possibility of ambiguous replies or omission of replies to certain questions.
Interpretation of omissions is difficult.
6. It is difficult to know whether willing respondents are truly representative.
7. This method is likely to be the slowest of all, because the respondents take their own time
to return the filled-in questionnaires.
Before sending the questionnaires to the respondents, it is advisable to conduct a ‘Pilot Survey’ for
pre-testing them. A Pilot Survey is, in fact, a replica and rehearsal of the main survey. From the experience
gained in this sort of survey, changes can be made in the questionnaire for the final collection of
data. The pre-testing is necessary particularly in case of a big enquiry.
Features of a Good Questionnaire: In order to make the questionnaire more effective, it
must be very carefully drafted. The form and tone of the questionnaire must be designed so as to
bring in the personal element which is lost in the mailed questionnaire. The following are the qualities
of a good questionnaire:
1. It should be short and simple.


2. Questions should proceed in logical sequence starting with easy questions and then moving
on to more difficult ones. Personal questions should generally be avoided or may be left to
the end.
3. Questions may be dichotomous (yes or no type) or multiple choice. Open ended questions
are difficult to analyse and should be avoided to the extent possible.
4. In order to ensure the reliability of respondent, there should be some control questions.
They introduce a cross-check to see whether the information collected is correct or not.
5. Adequate space for answers should be provided in the questionnaire itself. There should
always be provision for indications of uncertainty, e.g., “do not know”, “no preference”
and so on.
6. Layout and design of the questionnaire should also be attractive so that it may attract the
attention of the respondents.

1.5.2 Schedule
This method of data collection is similar to that of the questionnaire. The schedule is also a
proforma containing a set of questions. The difference between the questionnaire and the schedule
is that the schedule is being filled in by the enumerators who are specially appointed for the purpose.
These enumerators go to respondents with the schedules and ask them the questions from the
schedule in the order they are listed. The enumerator records the replies in the space meant for the
same in the schedule itself. In certain situations, schedules are handed over to respondents and the
enumerators help the respondents in recording the answers. Enumerators explain the objectives of
the investigation and also remove the difficulties which the respondent may feel in understanding
the implications of a particular question(s) or the definition or concept of difficult terms. Thus, the
essential difference between the questionnaire and schedule is that the former (i.e., questionnaire)
is sent to the informants by post and in the latter case, the enumerators carry the schedule personally
to informants and fill them in their own handwriting. This method is usually adopted in investigations
conducted by governmental agencies or by some big organisations. For instance, population census
all over the world is conducted through this method.
Data collection through schedules requires enumerators for filling up schedules and as such
they should be very carefully selected. They should be trained to perform their job well. They should
be intelligent and must possess the capacity of cross-examination in order to find out the fact. Above
all, they should be honest, sincere, hard working and should have the patience and perseverance. In
drafting the schedules, all points stated for a good questionnaire, must as well be observed.
Merits:
The main advantages of this method are as follows:
1. It can be adopted in those cases where informants are illiterate.
2. The problem of non-response is avoided as the enumerators go personally to obtain the
information.
3. The method is very useful in extensive enquiries and can lead to fairly reliable results.
4. The identity of the respondent is known which is not always clear in case of a questionnaire.
Limitations:
This method has the following limitations:
1. This method is very expensive as enumerators are generally paid persons. Money also
has to be spent in training them.
2. Another limitation is that if the investigator is not good at interviewing, most of the information
collected by him may be unreliable.
3. Since the investigator is present when the respondent is giving the answers, the respondent
may not give answers to some personal questions freely.

1.6 Classification of Data


1.6.1 Introduction
You have learnt about various sources and methods of collecting primary data and secondary
data. As the collected data is in the raw form, you cannot interpret it and draw useful conclusions.
Therefore, to draw meaningful conclusions on the basis of collected data, it is essential to present
it in summarised and simple form. Classification of data helps us in presenting the mass of data in
summarised and simple form. In this chapter, you will learn the meaning, objectives and different
methods of classification.

1.6.2 Meaning of Classification


Classification means arranging the mass of data into different classes or groups on the basis
of their similarities and resemblances. All similar items of data are put in one class and all dissimilar
items of data are put in different classes. Statistical data is classified according to its characteristics.
For example, if we have collected data regarding the number of students admitted to a university in
a year, the students can be classified on the basis of sex. In this case, all male students will be put
in one class and all female students will be put in another class. The students can also be classified
on the basis of age, marks, marital status, height, etc. The set of characteristics we choose for the
classification of the data depends upon the objective of the study. For example, if we want to study
the religious mix of the students, we classify the students on the basis of religion.

1.6.3 Objectives of Classification


Classification helps in achieving the following objectives:
1. It helps in presenting the mass of data in a concise and simple form.
2. It divides the mass of data on the basis of similarities and resemblances so as to enable
comparison.
3. It is a process of presenting raw data in a systematic manner enabling us to draw meaningful
conclusions.
4. It provides a basis for tabulation and analysis of data.
5. It provides us a meaningful pattern in the data and enables us to identify the possible
characteristics in the data.

1.6.4 Methods of Classification


You have studied the meaning and objectives of classification. Now, let us study the methods
of classification. Broadly, there are two methods of classification: (i) classification according to
attributes, and (ii) classification according to variables.
(i) Classification According to Attributes
An attribute is a qualitative characteristic which cannot be expressed numerically. Only the
presence or absence of an attribute can be known. For example, intelligence, religion, caste, sex,
etc. are attributes. You cannot quantify these characteristics. When classification is to be done on
the basis of attributes, groups are differentiated either by the presence or absence of the attribute
(e.g., male and female) or by its differing qualities. The qualities of an attribute can easily be
differentiated by means of some natural line of demarcation. Based on this natural difference, we
can determine the group into which a particular item is placed. For instance, if we select colour of
hair as the basis of classification, there will be a group of brown haired people and another group of
black haired people. There are two types of classification based on attributes.
1. Simple Classification: In simple classification, the data is classified on the basis of only
one attribute. The data classified on the basis of sex will be an example of simple
classification. It can be shown as under:

              Students
             /        \
          Male       Female

2. Manifold Classification: In this classification, the data is classified on the basis of more
than one attribute. For example, the data relating to the number of students in a university
can be classified on the basis of their sex and marital status as shown below:
                      Students
                    /          \
               Male              Female
              /    \            /      \
        Married  Unmarried  Married  Unmarried
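
The two kinds of attribute-based classification can be illustrated with a short Python sketch. The eight student records below are hypothetical and serve only to show the counting; the names `students`, `simple` and `manifold` are illustrative and do not come from the text.

```python
from collections import Counter

# Hypothetical records: (sex, marital status) for eight students.
students = [
    ("Male", "Married"), ("Male", "Unmarried"), ("Female", "Unmarried"),
    ("Female", "Married"), ("Male", "Unmarried"), ("Female", "Unmarried"),
    ("Male", "Unmarried"), ("Female", "Unmarried"),
]

# Simple classification: only one attribute (sex) is used.
simple = Counter(sex for sex, _ in students)

# Manifold classification: two attributes (sex and marital status) are used.
manifold = Counter(students)

print(dict(simple))
for (sex, status), count in sorted(manifold.items()):
    print(f"{sex:<6} {status:<9} {count}")
```

Each student falls into exactly one group at each level, so the group counts at either level add up to the total number of students.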


(ii) Classification According to Variables


Variables refer to quantifiable characteristics of data and can be expressed numerically.
Examples of variable are wages, age, height, weight, marks, distance, etc. As you know, all these
variables can be expressed in quantitative terms. In this form of classification, the data is shown in
the form of a frequency distribution. A frequency distribution is a tabular presentation that generally
organises data into classes and shows the number of observations (frequencies) falling into each of
these classes. Based on the number of variables used, there are three categories of frequency
distribution: (1) uni-variate frequency distribution, (2) bi-variate frequency distribution and (3) multi-
variate frequency distribution.
1. Uni-variate Frequency Distribution: The frequency distribution with one variable is
called a uni-variate frequency distribution. For example, the students in a class may be classified on
the basis of marks obtained by them. This is presented in Example 1.
Example 1: An example of Uni-variate Frequency Distribution.
Marks in Statistics    No. of Students
0–10                         15
10–20                        25
20–30                        30
30–40                        20
40–50                        10
Total                       100

The following points should be noted about the frequency distribution:


1. The marks in statistics have been divided into various classes of 0–10, 10–20, 20–30, etc.
2. The first class 0–10 marks signifies that the students securing 0 marks or above but less
than 10 marks will be put in this class. Similarly, the class 10–20 denotes that the students
securing 10 marks or above but less than 20 will be placed in this class.
3. The students falling into these classes have been put in the respective classes, which
means that there are 15 students in the class 0–10, 25 students in the class 10–20 and so
on. The number of students falling in a particular class is known as the frequency of that
class.
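
The class convention described in point 2 (lower limit included, upper limit excluded) can be sketched in Python as follows. The function name `class_frequencies` and the ten marks are hypothetical; they are not the data behind Example 1.

```python
def class_frequencies(values, lower, upper, width):
    """Count values into classes [a, a + width): lower limit included, upper excluded."""
    freq = {(a, a + width): 0 for a in range(lower, upper, width)}
    for v in values:
        if not lower <= v < upper:
            raise ValueError(f"{v} lies outside {lower}-{upper}")
        # Lower limit of the class that contains v.
        a = lower + ((v - lower) // width) * width
        freq[(a, a + width)] += 1
    return freq

# Hypothetical marks of ten students.
marks = [0, 5, 9, 10, 19, 20, 20, 35, 40, 49]
table = class_frequencies(marks, 0, 50, 10)
for (a, b), f in table.items():
    print(f"{a}-{b}: {f}")
```

A student with exactly 10 marks is counted in the class 10–20 and not in 0–10, which is the rule stated above.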
2. Bi-variate Frequency Distribution: The frequency distribution with two variables is
called a bi-variate frequency distribution. The uni-variate frequency distribution given in Example 1
shows only the marks of the students in statistics. If a frequency distribution shows two variables,
e.g., marks in statistics and age, it is known as a bi-variate frequency distribution. Look at the following
Example 2 of a bi-variate frequency distribution.
Example 2: An example of Bi-variate Frequency Distribution.

Marks in      Number of Students with the Age (in Years on Last Birthday)
Statistics    18 years   19 years   20 years   21 years   Total
0–10              1          4          8          2        15
10–20             4          8          9          4        25
20–30             -          3         17         10        30
30–40             -          5         10          5        20
40–50             -          -          1          9        10
Total             5         20         45         30       100

The following points should be noted about the bi-variate distribution presented in above
Example 2:
1. The marks in statistics have been divided as 0–10, 10–20, 20–30, etc., whereas age in
years on the last birthday has been taken as 18 years, 19 years, 20 years, etc.
2. The students securing 0 marks or above but less than 10 marks and 18 years of age have
been put against 0–10/18 years. This number is 1. Similarly, the number of students falling
in 0–10 class but with 19, 20 and 21 years of age are 4, 8 and 2 respectively. The total
number of students with 18 years of age is five, among whom one is placed in the 0–10
marks class and the remaining four in the 10–20 marks class.
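
The row and column totals of Example 2 can be checked with a small Python sketch. The cell frequencies are taken from Example 2, with blank cells entered as 0 (an assumption about how the blanks should be read); the variable names are illustrative.

```python
ages = [18, 19, 20, 21]

# Cell frequencies from Example 2; blank cells are entered as 0.
table = {
    "0-10":  [1, 4, 8, 2],
    "10-20": [4, 8, 9, 4],
    "20-30": [0, 3, 17, 10],
    "30-40": [0, 5, 10, 5],
    "40-50": [0, 0, 1, 9],
}

# Row totals: number of students in each marks class.
row_totals = {marks: sum(cells) for marks, cells in table.items()}

# Column totals: number of students of each age.
col_totals = [sum(column) for column in zip(*table.values())]

grand_total = sum(row_totals.values())

print(row_totals)
print(dict(zip(ages, col_totals)))
print(grand_total)
```

The row totals reproduce the uni-variate distribution of Example 1, which shows that a bi-variate table contains the uni-variate distribution of each of its variables in its margins.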
3. Multi-variate Frequency Distribution: The frequency distribution with more than two
variables is called multi-variate frequency distribution. For example, the students in a class may be
classified on the basis of marks, age and sex. Now, let us take the example presented in Example 2
and further classify the students based on sex. Study Example 3 carefully and examine how it is
done.
Example 3: Example of Multi-variate Frequency Distribution.
Marks in     Number of Students with the Age (in Years on Last Birthday)
Statistics   18 years   19 years   20 years   21 years   22 years    Total
              M    F     M    F     M    F     M    F     M    F     M    F
0–10          -    1     2    2     5    3     2    -     -    -     9    6
10–20         3    1     4    4     4    5     4    -     -    -    15   10
20–30         -    -     1    2     7   10     7    3     -    -    15   15
30–40         -    -     3    2     5    5     3    2     -    -    11    9
40–50         -    -     -    -     -    1     2    2     3    2     5    5
Total         3    2    10   10    21   24    18    7     3    2    55   45
(M = Male, F = Female)
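One way to see how a multi-variate table relates to a bi-variate one is to add the male and female counts in each cell. The sketch below encodes the cells of Example 3 and recomputes its totals. It assumes that blank cells in the printed table mean zero students, and the names `EXAMPLE_3`, `totals_by_sex`, `collapse_sex` and `totals_by_age` are illustrative only.

```python
from collections import Counter

# Cells of Example 3 as {marks class: {age: (male, female)}}.
# Assumption: blank cells in the printed table mean zero students.
EXAMPLE_3 = {
    "0-10":  {18: (0, 1), 19: (2, 2), 20: (5, 3), 21: (2, 0)},
    "10-20": {18: (3, 1), 19: (4, 4), 20: (4, 5), 21: (4, 0)},
    "20-30": {19: (1, 2), 20: (7, 10), 21: (7, 3)},
    "30-40": {19: (3, 2), 20: (5, 5), 21: (3, 2)},
    "40-50": {20: (0, 1), 21: (2, 2), 22: (3, 2)},
}

def totals_by_sex(table):
    """Return (total males, total females) over all classes and ages."""
    male = sum(m for ages in table.values() for m, _ in ages.values())
    female = sum(f for ages in table.values() for _, f in ages.values())
    return male, female

def collapse_sex(table):
    """Add male and female counts to obtain a (marks class, age) table."""
    return {cls: {age: m + f for age, (m, f) in ages.items()}
            for cls, ages in table.items()}

def totals_by_age(table):
    """Return the number of students of each age, both sexes combined."""
    totals = Counter()
    for ages in table.values():
        for age, (m, f) in ages.items():
            totals[age] += m + f
    return dict(totals)

print(totals_by_sex(EXAMPLE_3))
print(totals_by_age(EXAMPLE_3))
```

Dropping one variable in this way turns the three-variable table back into a two-variable table, which is the reverse of the further classification described in the text.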

1.7 Tabulation
1.7.1 Introduction
In Section 1.6, we discussed the objectives of classifying a mass of data so as to make
comparison of data possible. We also explained the procedure for constructing a frequency
distribution involving one or two variables. When two variables are given, the arrangement in rows
and columns is ordinarily known as a statistical table. Such tables can be constructed even when
the given data relate to attributes. In this section, you will study in detail the meaning and
objectives of tabulation and the procedure for constructing statistical tables.

1.7.2 Meaning of Tabulation


Tabular presentation is one of the three techniques of presenting data, the other two being
diagrammatic presentation and graphic presentation. Tabular presentation means arranging the
collected data in an orderly manner in rows and columns. The horizontal arrangements of the data
are known as rows, whereas the vertical arrangements are called columns. The classified facts are
recorded in rows and columns to give the tabular form.

1.7.3 Objectives of Tabulation


Tabular presentation serves the following objectives:
1. Systematic Presentation of Data: Generally, the collected data is in fragmented form.
The mass of data is presented in a concise and simple manner by means of statistical
tables. Thus, tabulation helps in presenting the data in an orderly manner.
2. Facilitates Comparison of Data: If the data is in the raw form, it is very difficult to
compare. Comparison is possible when the related items of data are presented in a simple
and concise form. The presentation of complex and unorganised data in the form of
tables facilitates the comparison of the various aspects of the data.
3. Identification of the Desired Values: In tabulation, data is presented in an orderly
manner by arranging it in rows and columns. Therefore, the desired values can be identified
without much difficulty. In the absence of tabulated data, it would be rather difficult to
locate the required values.
4. Provides a Basis for Analysis: Presentation of data in tabular form provides a basis for
analysis of such data. The statistical methodology suggests that analysis follows presentation
of data. A systematic presentation of data in tabular form is a prerequisite for the analysis
of data. Statistical tables are useful aids in analysis.
5. Exhibits Trend of Data: By presenting data in a condensed form at one place, tabular
presentation exhibits the trend of data. By looking at a statistical table, you can identify
the overall pattern of the data.

1.7.4 Distinction between Classification and Tabulation


Many people consider classification and tabulation to be synonyms. The two appear to
convey the same meaning and to serve the same objectives. However, there is a difference
between them. In classification, the data is divided on the basis of similarity and
resemblance, whereas tabulation is the process of recording the classified facts in rows
and columns. The two are links in the same chain: tabulation begins where classification
ends. In fact, classification provides a basis for tabular presentation. In Section 1.6, we stated
that a frequency distribution is a tabular presentation of the number of observations falling in
different sizes or classes. Therefore, after the data has been classified into various classes, it
should be shown in tabular form.

1.7.5 Kinds of Tables


Depending upon the use and objectives of the data to be presented, there are different types
of statistical tables. They can be classified under the following broad heads:
I. Information or Classifying Tables
II. General Purpose or Reference Tables
III. Special Purpose or Summary Tables

I. Information or Classifying Tables


Tables of this type are prepared to show the important characteristics of the collected facts.
They are prepared on the basis of similarities in the collected data. Their main purpose is to
present the data in a condensed and simple form. These tables can be further classified as:
(i) simple tables and (ii) complex tables.
1. Simple Tables: Tables of this type are also known as one-way tables. They are
prepared on the basis of only one characteristic of the collected data. A table showing the
number of students in a college in different years is an example of a simple or one-way table.
Look at Example 1 for a simple table.
Example 1: Example of a simple table.
Number of Students in a College from 1982-83 to 1988-89
Year No. of Students
1982–83 1500
1983–84 1550
1984–85 1600
1985–86 1650
1986–87 1600
1987–88 1675
1988–89 1700

Similarly, the students of the college can be divided on the basis of their age and separate
simple tables for each year can be prepared.
2. Complex Tables: As you know, simple tables present only one characteristic of the data.
When the tables show more than one characteristic of the data, they are called complex tables. We
may have a two-fold table showing three characteristics or a many-fold table showing several
characteristics of the data. The table showing the number of students in a college on the basis of
their sex and marital status during different years is an example of a complex table. Look at
Example 2 for a complex table.
Example 2: An example of a complex table.
Sex and Marital Status of Students in a College during 1982-83 to 1988-89
Year        No. of Students                                    Total
            Male                    Female
            Unmarried   Married     Unmarried   Married
1982–83        950         50          475         25          1,500
1983–84        975         55          490         30          1,550
1984–85      1,000         55          510         35          1,600
1985–86      1,035         60          520         35          1,650
1986–87      1,010         50          510         30          1,600
1987–88      1,080         50          510         35          1,675
1988–89      1,090         55          515         40          1,700
Total        7,140        375        3,530        230         11,275
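The link between the simple and complex tables can be checked with a short Python sketch: adding the four sex and marital status counts for each year of the complex table gives back the yearly totals shown in the simple table of Example 1. The variable and function names used here are illustrative assumptions, not part of the text.

```python
# Rows of the complex table:
# (year, male unmarried, male married, female unmarried, female married)
rows = [
    ("1982-83", 950, 50, 475, 25),
    ("1983-84", 975, 55, 490, 30),
    ("1984-85", 1000, 55, 510, 35),
    ("1985-86", 1035, 60, 520, 35),
    ("1986-87", 1010, 50, 510, 30),
    ("1987-88", 1080, 50, 510, 35),
    ("1988-89", 1090, 55, 515, 40),
]

def row_totals(rows):
    """Total students per year (the Total column)."""
    return {year: sum(counts) for year, *counts in rows}

def column_totals(rows):
    """Total per sex and marital status category (the Total row)."""
    return [sum(col) for col in zip(*(counts for _, *counts in rows))]

yearly = row_totals(rows)
categories = column_totals(rows)

print(yearly)
print(categories, sum(categories))
```

The row totals and the column totals must add up to the same grand total, which is a quick consistency check on any table of this kind.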

II. General Purpose or Reference Tables


Tables of this type are prepared to store information, and they contain a wide range of
information relating to a specified subject. Such tables are complex tables and are generally found
as appendices to various reports. They should be prepared in a systematic manner so as to make
reference easier. The tables appended to the census reports are good examples of general purpose
or reference tables.

III. Special Purpose or Summary Tables


These tables show a specific point relating to the data and are helpful in statistical analysis.
They provide a basis for comparison by giving specific answers to given questions. These tables are
also called text tables as they are complementary to a given text. They indicate rates,
percentages, averages, etc. For instance, consider a study discussing the increasing rate of
industrial accidents in a country and the number of persons killed in these accidents. The table
shown in Example 3 can follow the text to show the high rate of persons killed in accidents in coal
mines.

Example 3: An example of special purpose or summary tables.


Relationship between the Total Number of Persons Died in
Industrial Accidents and Persons Died in Coal Mines
Year    Persons Died in         Persons Died in    Persons Died in Coal Mines
        Industrial Accidents    Coal Mines         as a % of Total Deaths in
                                                   Industrial Accidents
1976           930                   150                   16.1
1977         1,154                   285                   24.7
1978         1,250                   115                    9.2
1979           930                   108                   12.0
1980         1,350                   270                   20.0
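The percentage column of a summary table of this kind is derived from the two count columns. The Python sketch below recomputes it as coal mine deaths divided by total industrial deaths, times 100, rounded to one decimal place. The function name `coal_mine_share` is a hypothetical helper. Note that recomputing from the printed counts gives about 11.6 for 1979, whereas the table prints 12.0, so one of the 1979 figures may be a misprint; the other four years agree with the table.

```python
# (year, persons died in industrial accidents, persons died in coal mines)
accidents = [
    (1976, 930, 150),
    (1977, 1154, 285),
    (1978, 1250, 115),
    (1979, 930, 108),
    (1980, 1350, 270),
]

def coal_mine_share(total_deaths, coal_deaths):
    """Coal mine deaths as a percentage of all industrial deaths (1 decimal)."""
    return round(100 * coal_deaths / total_deaths, 1)

percentages = [coal_mine_share(total, coal) for _, total, coal in accidents]

for (year, total, coal), pct in zip(accidents, percentages):
    print(year, total, coal, pct)
```

Deriving the rate column from the counts in this way is what makes a summary table useful for comparison across years, since the raw counts alone do not show the relative share.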

1.8 Exercise
1. What are various methods of collecting statistical data? Which of these is more reliable
and why?
2. Explain the comparative merits of various methods of collecting primary data.
3. What do you understand by secondary data? State their main sources.
4. Distinguish between a questionnaire and a schedule.
5. “It is never safe to use secondary data without proper scrutinisation.” Explain.
6. Explain the meaning and objectives of classification. Also discuss the various methods of
classification.
7. What is tabulation? What are the objectives of statistical tables?
8. Describe the requisites of a good statistical table.
9. Distinguish between simple and complex statistical tables and give examples of the two
types of tables.

 
