Chapter 1666
Published by : Mrs. Meena Pandey for Himalaya Publishing House Pvt. Ltd.,
“Ramdoot”, Dr. Bhalerao Marg, Girgaon, Mumbai - 400 004.
Phone: 022-23860170/23863863; Fax: 022-23877178
E-mail: [email protected]; Website: www.himpub.com
Branch Offices :
New Delhi : “Pooja Apartments”, 4-B, Murari Lal Street, Ansari Road, Darya Ganj,
New Delhi - 110 002. Phone: 011-23270392, 23278631; Fax: 011-23256286
Nagpur : Kundanlal Chandak Industrial Estate, Ghat Road, Nagpur - 440 018.
Phone: 0712-2738731, 3296733; Telefax: 0712-2721216
Bengaluru : Plot No. 91-33, 2nd Main Road Seshadripuram, Behind Nataraja Theatre,
Bengaluru-560020. Phone: 08041138821, 9379847017, 9379847005
Hyderabad : No. 3-4-184, Lingampally, Besides Raghavendra Swamy Matham, Kachiguda,
Hyderabad - 500 027. Phone: 040-27560041, 27550139
Chennai : New-20, Old-59, Thirumalai Pillai Road, T. Nagar, Chennai - 600 017.
Mobile: 9380460419
Pune : First Floor, "Laksha" Apartment, No. 527, Mehunpura, Shaniwarpeth
(Near Prabhat Theatre), Pune - 411 030. Phone: 020-24496323/24496333;
Mobile: 09370579333
Lucknow : House No. 731, Shekhupura Colony, Near B.D. Convent School, Aliganj,
Lucknow - 226 022. Phone: 0522-4012353; Mobile: 09307501549
Ahmedabad : 114, “SHAIL”, 1st Floor, Opp. Madhu Sudan House, C.G. Road, Navrang Pura,
Ahmedabad - 380 009. Phone: 079-26560126; Mobile: 09377088847
Ernakulam : 39/176 (New No.: 60/251) 1st Floor, Karikkamuri Road, Ernakulam,
Kochi – 682011. Phone: 0484-2378012, 2378016 Mobile: 09387122121
Bhubaneswar : 5 Station Square, Bhubaneswar - 751 001 (Odisha).
Phone: 0674-2532129, Mobile: 09338746007
Kolkata : 108/4, Beliaghata Main Road, Near ID Hospital, Opp. SBI Bank,
Kolkata - 700 010, Phone: 033-32449649, Mobile: 7439040301
DTP by : Sunita
Printed at : M/s. Aditya Offset Process (I) Pvt. Ltd., Hyderabad. On behalf of HPH.
PREFACE
As no existing book exactly meets the requirements of B.A./B.Sc. First Year students of
Statistics under the Semester System (CBCS), an attempt has been made to write a book that caters to the
needs of these students. The book is also made more suitable for classroom teaching by providing a
large number of real-life examples at appropriate places and practicals using MS-EXCEL at the
end.
I express my sincere gratitude to Prof. R.J. Ramalinga Swamy, Prof. M. Krishna Reddy,
Prof. M. Gopal Rao, Prof. P. Udaya Sree, Prof. P. Lakshmi Manga, Prof. V.V. Haragopal,
Dr. S. Jyothi Rani, Dr. K. Vani, Dr. C. Jaya Lakshmi and Prof. Y. Jagadeswar for their blessings,
valuable suggestions and encouragement during the preparation of the book. I also thank the other
staff members of the Department of Statistics, Osmania University, Hyderabad, for their support.
I take this opportunity to thank Dr. R. Sudhakar Reddy, Dr. K. Ranga Rao and
Dr. Raghunadha Charya, and my best friends Mr. V. Papa Rao, Mr. K. Venkat Raman,
Mr. Yugandhar, Mr. Goverdhan, Mr. Mohan Prasad, Mr. Shekharam and Mrs. Parimala Sudheer,
Lecturers in Statistics, for their suggestions and help.
I am thankful to Dr. O.S. Reddy, Chairman, Jagruti Group of Institutions, Mrs. D. Josephine,
Academic Director, Mr. S. Surya Prakash, Vice-Principal, Jagruti Degree & P.G. College.
Special thanks to Mr. Raghavendra Kulkarni, Principal, Nrupathunga College, Hyderabad,
Mr. Murali Krishna, Principal, G. Pulla Reddy College, Hyderabad, Dr. K. Padmavathi, Babu
Jagjivan Ram Govt. Degree College, Hyderabad and Dr. S. Srinivasa Rao, Principal, Govt.
Degree College, Atmakur, Mahaboob Nagar Dist.
Lastly, I would like to mention that although every possible care has been taken to make the
book free from printing errors, the possibility of some errors creeping in inadvertently
cannot be ruled out. I shall feel highly obliged to all readers if such errors are brought to my notice.
Critical evaluation and suggestions for improvement are most welcome and shall be gratefully
acknowledged.
1. Basics of EXCEL – Data entry, editing and saving, establishing and copying formulae,
built-in functions in EXCEL, copy and paste, and exporting to an MS WORD document.
2. Graphical presentation of data (Histogram, Frequency Polygon, Ogives).
3. Graphical presentation of data (Histogram, Frequency Polygon, Ogives) using MS-EXCEL.
4. Diagrammatic presentation of data (Bar and Pie).
5. Diagrammatic presentation of data (Bar and Pie) using MS-EXCEL.
6. Computation of non-central and central moments – Sheppard’s corrections for grouped data.
7. Computation of coefficients of Skewness and Kurtosis – Karl Pearson’s and Bowley’s
β1 and β2.
8. Computation of measures of central tendency, dispersion and coefficients of Skewness,
Kurtosis using MS-Excel.
CONTENTS
1. Descriptive Statistics
2. Analysis of Quantitative Data
3. Theory of Probability
4. Random Variables
5. Mathematical Expectation
6. Transformation of Random Variable
7. Inequalities
8. Practicals using MS-EXCEL
Descriptive Statistics 1
CHAPTER 1
DESCRIPTIVE STATISTICS
1.1 Introduction
1.8 Exercise
2 Statistics (Theory and Practicals)
1.1 Introduction
Without the availability of data, the science of statistics would cease to exist. The information
gathered from a sample is typically used to gain insight into a much larger population. The reliability
of conclusions drawn from sample data depends to a great extent on the quality of the data. Are
they accurate? Do the data really represent the population of interest? Was the sample collected
properly? Before we answer these questions, we first examine how and from where the data have
been collected. Thus, this chapter is devoted to the study of data collection
methods.
The systematic, planned and meaningful way of collecting information is known as collection
of data. The methods for collection of data depend upon several considerations such as the objective,
scope and nature of the problem under study. Keeping in view the aim of the investigation, the data
may be collected either from a primary source or from a secondary source. Primary data are
information collected for the first time by an investigator for the investigation at hand. Information
that has already been collected by others, or by the same investigator for another purpose, and is used for
the current study is known as secondary data. A detailed discussion of the methods of collecting
primary and secondary data is taken up in the subsequent sections.
Merits:
(i) It is possible to collect original, accurate and exact data.
(ii) The doubts of informants can be checked and clarified.
(iii) Informants’ doubts can be cleared in the language most suitable to them.
Demerits:
1. Skilled enumerators are required to collect data by this method.
2. This method consumes resources like time and money.
Demerits:
(i) The information lacks originality.
(ii) The bias of the correspondent affects the information.
Published Sources:
For the benefit of the public, information is published and made available to all interested parties.
The sources of published data are:
(i) Government Publications:
State and central governments publish reports of various committees and commissions
and official publications like Gazettes, Vital Statistics, etc.
Private Publications
The following private publications may also be listed as sources of secondary data.
(i) Reports prepared by research scholars, universities, etc.
(ii) Publications of professional bodies like Indian Statistical Institute (ISI), ICAR, NCERT,
ICMR and CSIR.
(iii) Annual reports of banks and joint stock companies, stock exchanges, etc.
(iv) Information published in newspapers, books, magazines, etc.
Unpublished Sources
Not all information is available in published form. Information can also be taken from
unpublished sources like diaries, letters, unpublished biographies and autobiographies. Unpublished
data may also be available with scholars, research workers, trade associations and individuals.
Merits:
The merits of this method are as under:
1. This method is very economical, particularly when the universe is large and spread
over a vast geographical area.
2. Since the answers are in the respondent’s own words, they are free from the
bias of the interviewer.
3. Respondents can take their own time to answer the questions. So, they give well thought-
out answers.
4. Respondents that are at remote places and are not easily approachable can also be reached
conveniently.
5. Large samples can be covered and thus the results can be more dependable and reliable.
Limitations:
This method also suffers from the following limitations:
1. Sometimes the respondents do not bother to return the questionnaires. So, there is the
problem of a low rate of return of duly filled-in questionnaires. Also, the bias due to
non-response often cannot be determined.
2. Questionnaires can be circulated only among the respondents who are educated and
cooperative.
3. Once the questionnaires are sent to the respondents, the investigator cannot change or
modify the questions for individual respondents.
4. There is no flexibility because of the difficulty of amending the approach once the
questionnaires have been despatched.
5. There is also the possibility of ambiguous replies or omission of replies to certain questions.
Interpretation of omissions is difficult.
6. It is difficult to know whether willing respondents are truly representative.
7. This method is likely to be the slowest of all, because the respondents take their own time
to return the filled-in questionnaires.
Before sending the questionnaires to the respondents, it is advisable to conduct a ‘Pilot Survey’ to
pretest them. A pilot survey is, in fact, a replica and rehearsal of the main survey. From the experience
gained in this sort of survey, changes can be made in the questionnaire for the final collection of
data. Pre-testing is particularly necessary in the case of a big enquiry.
Features of a Good Questionnaire: In order to make the questionnaire more effective, it
must be very carefully drafted. The form and tone of the questionnaire must be designed so as to
bring in the personal element which is lost in the mailed questionnaire. The following are the qualities
of a good questionnaire:
1.5.2 Schedule
This method of data collection is similar to that of the questionnaire. The schedule is also a
proforma containing a set of questions. The difference between the questionnaire and the schedule
is that the schedule is filled in by enumerators who are specially appointed for the purpose.
These enumerators go to respondents with the schedules and ask them the questions from the
schedule in the order they are listed. The enumerator records the replies in the space meant for the
same in the schedule itself. In certain situations, schedules are handed over to respondents and the
enumerators help the respondents in recording the answers. Enumerators explain the objectives of
the investigation and also remove any difficulties the respondent may have in understanding
the implications of particular questions or the definitions of difficult terms. Thus, the
essential difference between the questionnaire and the schedule is that the former
is sent to the informants by post, whereas in the latter case the enumerators carry the schedules personally
to the informants and fill them in in their own handwriting. This method is usually adopted in investigations
conducted by governmental agencies or by some big organisations. For instance, population censuses
all over the world are conducted through this method.
Data collection through schedules requires enumerators to fill in the schedules, and as such
they should be selected very carefully. They should be trained to perform their job well. They should
be intelligent and must be capable of cross-examination in order to find out the facts. Above
all, they should be honest, sincere and hard-working, and should have patience and perseverance. In
drafting the schedules, all the points stated for a good questionnaire must also be observed.
Merits:
The main advantages of this method are as follows:
1. It can be adopted in those cases where informants are illiterate.
2. The problem of non-response is avoided as the enumerators go personally to obtain the
information.
3. The method is very useful in extensive enquiries and can lead to fairly reliable results.
4. The identity of the respondent is known which is not always clear in case of a questionnaire.
Limitations:
This method has the following limitations:
1. This method is very expensive as enumerators are generally paid persons. Money also
has to be spent in training them.
2. Another limitation is that if the investigator is not good at interviewing, most of the information
collected may be unreliable.
3. Since the investigator is present when the respondent is giving the answers, the respondent
may not give answers to some personal questions freely.
Students
Male Female
2. Manifold Classification: In this classification, the data are classified on the basis of more
than one attribute. For example, the data relating to the number of students in a university
can be classified on the basis of their sex and marital status as shown below:
Students
Male Female
Marks in       Age in Years (on Last Birthday)       Total
Statistics      18       19       20       21
0–10             1        4        8        2           15
10–20            4        8        9        4           25
20–30            -        3       17       10           30
30–40            -        5       10        5           20
40–50            -        -        1        9           10
Total            5       20       45       30          100
The following points should be noted about the bi-variate distribution presented in
Example 2 above:
1. The marks in statistics have been divided as 0–10, 10–20, 20–30, etc., whereas age in
years on the last birthday has been taken as 18 years, 19 years, 20 years, etc.
2. The students securing 0 marks or above but less than 10 marks and 18 years of age have
been put against 0–10/18 years. This number is 1. Similarly, the numbers of students falling
in the 0–10 class but with 19, 20 and 21 years of age are 4, 8 and 2 respectively. The total
number of students with 18 years of age is five, of whom one is placed in the 0–10
marks class and the remaining four in the 10–20 marks class.
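The counting rule described in the points above (lower class limit included, upper class limit excluded) can be sketched in Python. This is only an illustrative sketch: the function names, the class width and the (marks, age) records are assumptions invented for the example and are not the data of Example 2.

```python
from collections import Counter


def marks_class(marks, width=10):
    """Return the class label for a mark; lower limit inclusive, upper exclusive."""
    lower = (marks // width) * width
    return f"{lower}-{lower + width}"


def bivariate_frequency(records):
    """Count the students falling in each (marks class, age) cell."""
    return Counter((marks_class(marks), age) for marks, age in records)


# Hypothetical (marks, age in years) records, invented for illustration only.
records = [(7, 18), (12, 18), (9, 19), (15, 19), (10, 19), (25, 20), (29, 20)]

table = bivariate_frequency(records)

# Marginal totals for each age, i.e. the "Total" row of a bi-variate table.
age_totals = Counter()
for (_, age), count in table.items():
    age_totals[age] += count
```

Note that a student with exactly 10 marks is counted in the 10-20 class, not in the 0-10 class, which is the convention stated in point 2.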
3. Multi-variate Frequency Distribution: The frequency distribution with more than two
variables is called multi-variate frequency distribution. For example, the students in a class may be
classified on the basis of marks, age and sex. Now, let us take the example presented in Example 2
and further classify the students based on sex. Study Example 3 carefully and examine how it is
done.
Example 3: Example of Multi-variate Frequency Distribution.
Number of Students with the Age (in Years on Last Birthday); M = Male, F = Female
Marks in      18 years   19 years   20 years   21 years   22 years    Total
Statistics     M    F     M    F     M    F     M    F     M    F     M    F
0–10           -    1     2    2     5    3     2    -     -    -     9    6
10–20          3    1     4    4     4    5     4    -     -    -    15   10
20–30          -    -     1    2     7   10     7    3     -    -    15   15
30–40          -    -     3    2     5    5     3    2     -    -    11    9
40–50          -    -     -    -     -    1     2    2     3    2     5    5
Total          3    2    10   10    21   24    18    7     3    2    55   45
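The relation between a multi-variate and a bi-variate frequency distribution can also be sketched in Python: classifying further by sex splits each cell, and adding the male and female counts of a cell gives back the bi-variate count. The records and names below are assumptions invented for illustration; they are not the data of Example 3.

```python
from collections import Counter

# Hypothetical (marks class, age, sex) records, invented for illustration only.
students = [
    ("0-10", 19, "M"), ("0-10", 19, "F"), ("0-10", 19, "M"),
    ("10-20", 20, "F"), ("10-20", 20, "F"), ("10-20", 20, "M"),
]

# Multi-variate frequency distribution: one count per (marks, age, sex) cell.
multivariate = Counter(students)


def collapse_sex(cells):
    """Add the male and female counts of each cell to get the bi-variate table."""
    bivariate = Counter()
    for (marks, age, _sex), count in cells.items():
        bivariate[(marks, age)] += count
    return bivariate


bivariate = collapse_sex(multivariate)
```

Collapsing a variable in this way never changes the grand total, which gives a simple check on any table built by hand.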
1.7 Tabulation
1.7.1 Introduction
In Section 1.6, we discussed the objectives of classifying the mass of data so as to render
comparison of data possible. We also explained the procedure for the construction of a frequency
distribution involving one or two variables. When two variables are given, the arrangement in rows
and columns is ordinarily known as a statistical table. Such tables can be constructed even when
the given data relate to attributes. In this section, you will study in detail the meaning and objectives
of tabulation and the procedure of constructing statistical tables.
Similarly, the students of the college can be divided on the basis of their age and separate
simple tables for each year can be prepared.
2. Complex Tables: As you know, simple tables present only one characteristic of the data.
When the tables show more than one characteristic of the data, they are called complex tables. We
may have a two-fold table showing three characteristics or a many-fold table showing several
characteristics of the data. The table showing the number of students in a college on the basis of
their sex and marital status during different years is an example of a complex table. Look at
Example 2 for a complex table.
Example 2: An example of a complex table.
Sex and Marital Status of Students in a College during 1982-83 to 1988-89
Year                      No. of Students                       Total
                   Male                    Female
            Unmarried   Married     Unmarried   Married
1982–83         950        50           475        25          1,500
1983–84         975        55           490        30          1,550
1984–85       1,000        55           510        35          1,600
1985–86       1,035        60           520        35          1,650
1986–87       1,010        50           510        30          1,600
1987–88       1,080        50           510        35          1,675
1988–89       1,090        55           515        40          1,700
Total         7,140       375         3,530       230         11,275
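The totals of a complex table can be cross-checked mechanically. The following Python sketch re-enters the four counts given for each year in the table above and recomputes the row totals, the column totals and the grand total. The variable names are illustrative assumptions and are not part of the original text.

```python
# Year -> (male unmarried, male married, female unmarried, female married),
# as given in the table of students by sex and marital status.
students = {
    "1982-83": (950, 50, 475, 25),
    "1983-84": (975, 55, 490, 30),
    "1984-85": (1000, 55, 510, 35),
    "1985-86": (1035, 60, 520, 35),
    "1986-87": (1010, 50, 510, 30),
    "1987-88": (1080, 50, 510, 35),
    "1988-89": (1090, 55, 515, 40),
}

# Row totals: total number of students in each year.
row_totals = {year: sum(counts) for year, counts in students.items()}

# Column totals: total for each sex and marital status over all years.
column_totals = [sum(column) for column in zip(*students.values())]

# The grand total must be the same whether rows or columns are added.
grand_total = sum(row_totals.values())
totals_agree = grand_total == sum(column_totals)
```

Here the recomputed totals for 1982-83 and 1983-84 are 1500 and 1550, and the row totals and column totals both add up to 11275.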
1.8 Exercise
1. What are the various methods of collecting statistical data? Which of these is the most reliable
and why?
2. Explain the comparative merits of various methods of collecting primary data.
3. What do you understand by secondary data? State their main sources.
4. Distinguish between a questionnaire and a schedule.
5. “It is never safe to use secondary data without proper scrutinisation.” Explain.
6. Explain the meaning and objectives of classification. Also discuss the various methods of
classification.
7. What is tabulation? What are the objectives of statistical tables?
8. Describe the requisites of a good statistical table.
9. Distinguish between simple and complex statistical tables and give examples of the two
types of tables.