DBB2102 Unit-02
BACHELOR OF BUSINESS
ADMINISTRATION
SEMESTER 3
DBB2102
QUANTITATIVE TECHNIQUES FOR
MANAGEMENT
Unit 2
Collection, Classification, and Presentation of
Data
Table of Contents
4 Drafting Questionnaire
5 Sample Selection
6.4 Charts
7 Summary
8 Glossary
9 Terminal Questions
10 Answers
1. INTRODUCTION
In the last unit, we studied the role of statistics in various areas of science and
engineering. The current unit focuses on the collection, analysis, and interpretation of
uncertain business data. The compilation and analysis of data are fundamental to science
and engineering. Scientists discover the principles that govern the physical world, and
techno-managers learn how to design important new products and processes, by analyzing
data collected in scientific experiments. A major difficulty with scientific data is that they are
subject to random variation, or uncertainty. To deal with the uncertainty of data, knowledge
of statistics is essential.
Data collection and analysis for the business are playing an ever-increasing role in all aspects
of modern life. For better or worse, huge amounts of data are collected about our opinions
and our lifestyles, for purposes ranging from the creation of more effective marketing
campaigns to the development of social policies designed to improve the way of life. On
almost any given day, newspaper articles are published that purport to explain social or
economic trends through the analysis and interpretation of data. A basic knowledge of
statistics, covering the collection, classification, and presentation of data, is therefore
required.
1.1 Objectives
After studying this unit, you should be able to:
❖ State the scope of data collection
❖ Differentiate between primary and secondary data
❖ List the points to observe when drafting a questionnaire for data gathering
❖ Present the data in the form of graphs or charts
❖ Construct a frequency distribution
2. DATA COLLECTION
Data can be generated from actual observations or from records that are kept for normal
purposes. For billing purposes and doctors’ reports, a health centre, for example, will record
the number of patients using the CT-scan facilities. But this information can also be organized to
produce data that statisticians can describe and interpret.
Data can assist the production manager in making educated guesses about the causes, and
therefore the probable effects, of certain characteristics in given situations. Also, knowledge
of trends from past experience can make us aware of potential outcomes and help us plan in
advance.
When data are arranged in usable forms, decision makers can take reliable information from
the environment and use it to make intelligent decisions. Today, computers allow the
decision maker to collect enormous volumes of observations and compress them instantly
into tables, graphs, and numbers. But are they reliable? Remember that the data that come
out of a computer are only as accurate as the data that go in. The production manager must
take care that the data being used are based on correct assumptions and
interpretations. Before relying on any interpreted data, from a computer or not, test the data
by asking the following questions:
1) Where did the data come from? Is the source biased, that is, is it likely to have an
interest in supplying data points that will lead to one conclusion rather than another?
2) Do the data support or contradict other facts we have?
3) How many observations do we have? Do they represent all the groups we wish to
study?
4) Is the conclusion valid? Have we made conclusions that the data do not support?
Study your answers to these questions to judge whether the data are reliable.
2.1 Broad Categories of Data
Statistical data may be classified under two categories depending upon the sources
utilized. These categories are:
1) Primary data
Primary data are those data, which are collected by the researcher himself for the purpose
of a specific study. Such data are original in character and are generated by surveys
conducted by individuals or research institutions.
2) Secondary data
When an investigator uses data, which have already been collected by others, such data are
called “secondary data”. Such data are primary data for the agency that collected them, and
become secondary for others who use these data for their own purposes. These data can be
obtained from journals, reports, and publications of professional or research organizations.
While collecting the data, the nature, scope and objectives of a statistical investigation should
be taken into account in deciding whether data are to be collected originally or whether the
available secondary data are to be utilized. It is generally preferable to use primary
data, for several reasons: i) Such data usually show detailed information and a
description of the definitions of the terms used. ii) Very often, a note on the method
of collection and any approximations used is also available, so that it can be decided
in advance how much reliance can be placed on the figures. iii) Secondary
data may contain errors due to transcription, rounding, etc., and hence are less reliable.
Despite the merits of primary data, secondary data are used when limitations of
time and money prevent the investigator from collecting the data
directly, when it is necessary to compare data collected over a period of time, or when
utmost accuracy is not essential.
Self-Assessment Questions - 1
1. Data can be collected from experimentation. (True/False)
2. Computers allow the decision maker to collect enormous volume of observations.
(True/False)
3. Before relying on any interpreted data, ______________________ the data by asking
standard questions.
4. Scientific data is related to _________________________.
Since the quality of the results obtained from statistical data depends upon the quality of
the information collected, it is important that a sound investigative process be established
to ensure that the data are representative and unbiased. The following steps
may be considered in the primary data collection process:
In a ‘telephone survey’, the investigator, instead of meeting the informants in person,
contacts them by telephone and collects information from them. This method is less
time-consuming and more convenient than a personal interview.
In ‘information through local agents’, the information is not collected directly by the
investigator; instead, local agents, commonly known as correspondents, are appointed in
different parts of the area under investigation. These agents collect information in their areas
and transmit it to the investigator. This method is economical for
extensive investigations.
In the ‘mailed questionnaire method’, the most important instrument is the questionnaire.
This contains a set of questions relevant to the subject of enquiry, the answers to which are
expected to yield the requisite information. Printed questionnaires are sent by mail to a
selected list of persons, with the request to return them duly filled in. Supplementary
definitions of the terms used and instructions for filling in the forms should also accompany the
questionnaire. This method is cheap and expeditious, and a large area can be covered at
a limited cost.
Activity 1
Distinguish between primary and secondary data. Describe the various methods of
collecting primary data.
Self-Assessment Questions - 2
5. Sources of data can be primary or secondary. (True/False)
6. Match the following:
a) Attribute i) Quantitative data
b) Variable ii) Qualitative data
c) Classification of data iii) Four types
d) Reserve Bank of India bulletin iv) Secondary data
e) Monthly coal bulletin v) Primary data
f) Monthly abstract of statistics vi) Primary data
7. Primary data can be collected by census method or by sample enquiry method.
(True/False)
8. Direct personal interview method is used to collect ___________________.
4. DRAFTING QUESTIONNAIRE
The ‘questionnaire’ is a proforma containing a sequence of questions relevant to a statistical
enquiry. Since the questionnaire is the only medium of communication between the
investigator and the respondents, it must be designed or drafted with utmost care and
caution so that all relevant and essential information for the enquiry may be collected
without any difficulty, ambiguity, and vagueness. Designing a questionnaire, therefore,
requires a high degree of skill and experience on the part of the investigator. The following
points should be observed in drafting the questionnaire:
1) The questionnaire should be as short as possible. Many questions may arise during an
investigation. But if all are included, the questionnaire will become unduly lengthy, with
the consequence that the respondents (i.e., persons who are required to answer them)
will feel bored and reluctant to answer all the questions.
3) If possible, questions should be so set as to elicit only two possible definite answers:
‘yes’ or ‘no’.
4) The units in which the information is to be collected should be clearly and precisely
mentioned in the questionnaire.
5) The arrangement of questions in the proforma should be such as to have an easy and
systematic flow of answers in turn. Questions should not skip back and forth from one
topic to another.
6) After the questionnaire has been devised, it is desirable to try it on a few individuals.
The procedure, which is known as pilot survey, is useful in detecting the shortcomings
of the questionnaire, so that necessary modifications may be made before it is used in
the actual enquiry.
Together, the answers to these questions produce a large database, which the statistician
uses for further analysis and prediction of results.
Activity 2
An automobile manufacturing unit has selling branches in each metropolitan city in
India. It manufactures four different types of vehicles, which are sold by its branches.
The Head Office wishes to plan a sales campaign based on the past sales and likely
future demand. Design a questionnaire for the collection of necessary data and draft
the instructions for completing the questionnaire.
Self-Assessment Questions - 3
9. The questionnaire should be as short as possible. (True/False)
10. Complicated questions should be split up into smaller parts. (True/False)
5. SAMPLE SELECTION
The large database generated from the outcomes of the questionnaire is known as the population.
Statisticians, however, usually gather data from a sample and use this information to make inferences
about the population that the sample represents. Thus, a population is a large database, and
a sample is a fraction or segment of that database.
Studying samples is easier than studying the whole population; it costs less and takes less
time. Often, testing an airplane part for strength destroys the part; thus testing fewer parts
is desirable. Sometimes testing involves human risk; sampling then reduces that risk
to an acceptable level. Finally, even examining an entire population can still
allow defective items to be accepted. Thus sampling, in some instances, can raise the quality
level.
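The idea of a sample as a fraction of the population can be sketched in a few lines of Python. This is only an illustration: the text does not name a sampling method, so simple random sampling is assumed here, and the population of 1,000 numbered questionnaire responses, the function name `draw_sample`, and the seed are all hypothetical.

```python
import random

def draw_sample(population, sample_size, seed=0):
    """Return a simple random sample drawn without replacement."""
    rng = random.Random(seed)  # fixed seed so the draw is repeatable
    return rng.sample(population, sample_size)

# Hypothetical population: response IDs 1..1000 from a questionnaire.
population = list(range(1, 1001))
sample = draw_sample(population, 50)

print(len(sample), "of", len(population), "elements sampled")
```

Every element of the sample belongs to the population, but only a small fraction of the population has to be examined.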
A population is a collection of all the elements we are studying and about which we are trying
to draw conclusions. We must define the population so that it is clear whether an element is
a member of the population. The population of our market study may be all men within a
15-mile radius of center-city Mumbai who have family incomes between 5000 and 10000 and
have completed at least 11 years of school. A man living in Mumbai with a family income of
7500 and a college degree would be a part of this population. A man living outside this radius, or with
a family income of 4500, or with 5 years of schooling would not qualify as a member of this
population.
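A clear population definition works like a yes/no test that can be applied to any element. The sketch below expresses the market-study conditions as a Python function. The function name `in_population` is hypothetical, the check assumes the person is already known to be a man, the income limits are treated as inclusive, and a college degree is assumed to mean 15 years of schooling.

```python
def in_population(distance_miles, family_income, years_of_school):
    """Check the three membership conditions from the market-study example."""
    return (
        distance_miles <= 15                    # within the 15-mile radius
        and 5000 <= family_income <= 10000      # income limits, assumed inclusive
        and years_of_school >= 11               # at least 11 years of school
    )

# Assumed values: 3 miles from the centre, 15 years of school for a college degree.
print(in_population(3, 7500, 15))   # income and schooling both qualify
print(in_population(3, 4500, 15))   # income below the lower limit
print(in_population(3, 7500, 5))    # too few years of schooling
```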
Self-Assessment Questions - 4
11. A population is a collection of all the elements we are studying (True/False).
12. A ____________________ is a fraction of the population.
13. Studying a sample is time consuming (True/False).
6. DATA PRESENTATION
All business decisions are based on the evaluation of some data. When numerical data is
listed in the same order as it is collected, it is known as raw data. The following example
shows the use of different kinds of data in statistics:
Suppose that 10 boys and 5 girls enter a race. Assume that the boys’ competition
was won by Ayush Poddar, who finished the race in 25 minutes and 15 seconds. Similarly,
assume that the girls’ competition was won by Nabnita Choudhury in 30 minutes and 10
seconds for the same race. The names and finishing times of all participants were
recorded in order of finishing, and awards were given for the first three positions.
The above example contains several different kinds of data. The information that puts each
contestant into the category of boy or girl is known as “qualitative data”. For example, the
information that the winner Nabnita Choudhury is a female is qualitative data.
The finishing times, such as 25 minutes and 15 seconds, are “quantitative data”.
The order in which the contestants finished is known as “ordinal data”. The persons finishing
first, second, third etc., are examples of ordinal data.
All these voluminous data must be presented in a condensed form to the management
without any loss of information contained in it. Hence, the collected data must be organized,
carefully summarized and presented either in the form of tables or graphs that can be easily
interpreted. The tools of classification and presentation of statistical data are listed as
follows:
• Frequency distribution
• Cumulative frequency distribution
• Relative frequency distribution
• Charts
A frequency distribution is a table that shows all the values obtained in the data and the
frequency with which these values occur in the data.
The frequency distribution can be classified into discrete frequency distribution and
continuous frequency distribution, which are demonstrated in Tables 2.1 and 2.2,
respectively. In Table 2.1, the variable takes discrete numerical values, whereas the monthly
income in Table 2.2 is a continuous variable.
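The difference between the two kinds of frequency distribution can be illustrated with a short Python sketch. The raw data below are hypothetical and are not taken from Tables 2.1 or 2.2; the helper `grouped_frequencies` and the choice of class width are likewise assumptions made for the illustration.

```python
from collections import Counter

# Hypothetical raw data (not taken from Tables 2.1 or 2.2).
children = [0, 2, 1, 3, 2, 1, 0, 2, 4, 1]            # discrete variable
incomes = [1250, 1890, 2400, 3100, 1700,
           2950, 3600, 2200, 1500, 2750]              # continuous variable

# Discrete frequency distribution: count each distinct value.
discrete = dict(sorted(Counter(children).items()))

def grouped_frequencies(values, lower, width, n_classes):
    """Count values in classes [lower, lower + width), [lower + width, ...), ..."""
    counts = [0] * n_classes
    for v in values:
        index = int((v - lower) // width)
        if 0 <= index < n_classes:
            counts[index] += 1
    return [(lower + i * width, lower + (i + 1) * width, c)
            for i, c in enumerate(counts)]

# Continuous frequency distribution: count values falling in each class interval.
continuous = grouped_frequencies(incomes, lower=1000, width=1000, n_classes=4)

total = len(incomes)
for low, high, f in continuous:
    print(f"{low}-{high}: frequency {f}, relative frequency {f / total:.2f}")
```

The discrete case counts individual values, while the continuous case first groups the values into class intervals; dividing each frequency by the total gives the relative frequency.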
6.4 Charts
Statistical data can be presented in the graphical form (or charts), which are classified as: pie
chart, bar chart, histogram, frequency polygon and ogive curve.
Pie chart
Pie chart is a circle whose area is divided proportionately among the different components
by straight lines drawn from the centre to the circumference of the circle. When statistical
data are given for a number of categories, and we are interested in the comparison of the
various categories or between a part and the whole, such a chart is very helpful in effectively
displaying the data. For drawing a pie chart (Fig. 2.1), it is necessary to express the value of
each category as a percentage of the total. The chart can be drawn with the help of a compass
and a protractor. The following numerical example demonstrates its application.
Example 1: Construct a pie chart for the following data on the principal exporting countries of
cotton (1000 bales), 1955-56.
Country                U.S.A.   India   Egypt   Brazil   Argentina
Exports (1000 bales)   6,367    2,999   1,688   650      202
Source: Statistical Methods by N.G. Das
Solution:
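The arithmetic behind the pie chart for Example 1 can be sketched in Python as follows. Each country's exports are expressed as a percentage of the total, and the same share of 360 degrees gives the angle to mark off with the protractor. The variable names are illustrative only.

```python
# Exports in thousands of bales, from Example 1.
exports = {
    "U.S.A.": 6367,
    "India": 2999,
    "Egypt": 1688,
    "Brazil": 650,
    "Argentina": 202,
}

total = sum(exports.values())

# Each sector's angle is its share of the total applied to 360 degrees.
sectors = {
    country: (100 * value / total, 360 * value / total)
    for country, value in exports.items()
}

for country, (percent, angle) in sectors.items():
    print(f"{country:<10} {percent:6.2f}%  {angle:7.2f} degrees")
```

The percentages add up to 100 and the angles to 360, which is a useful check before drawing the chart.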
Bar chart
In bar charts, we make use of rectangles to present the given data. A bar chart consists of a group of
equispaced rectangular bars, one for each category (or class) of the given statistical data. The bars,
starting from a common base line, must be of equal width, and their lengths represent the
values of the statistical data. A bar chart can be set up in two forms: vertical or horizontal.
Vertical bars are used to represent time series data or data classified by the values of a
variable. Horizontal bars are used to depict data classified by attributes only.
Example 2: The following table shows the average approximate yield of food grains in lbs.
per acre in various countries of the world in 1988-89.
Country                  India   Bangladesh   Sri Lanka   Nepal   Bhutan
Yield in lbs. per acre   845     567          760         453     234
Indicate this by a suitable diagram, which will highlight the relative backwardness of Bhutan
in this regard.
Fig. 2.4: Average Yields of Food Grains in lbs per Acre in Various Countries
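A rough text version of such a bar chart can be produced with the Python sketch below. Horizontal bars are used because the data are classified by an attribute (country). The scale of one `#` per 25 lbs per acre and the helper name `bar` are assumptions chosen for the illustration.

```python
# Yield of food grains in lbs per acre, from Example 2.
yields = {"India": 845, "Bangladesh": 567, "Sri Lanka": 760, "Nepal": 453, "Bhutan": 234}

SCALE = 25  # assumed scale: one '#' per 25 lbs per acre

def bar(value, scale=SCALE):
    """Return a bar whose length is proportional to the value."""
    return "#" * round(value / scale)

# Sort in descending order so the lowest yield stands out at the bottom.
for country, value in sorted(yields.items(), key=lambda item: item[1], reverse=True):
    print(f"{country:<11}|{bar(value)} {value}")

lowest = min(yields, key=yields.get)
print("Lowest yield:", lowest)
```

Sorting the bars by length places Bhutan last, which highlights its relative position.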
Histogram
In a ‘histogram’, the given data are plotted in the form of a series of
rectangles. Class intervals are marked along the X-axis and the frequencies along the Y-axis
according to a suitable scale. Unlike the bar chart, which is one-dimensional, meaning that
only the length of the bar matters and not the width, a histogram is two-dimensional:
both the length and the width are important. A histogram is constructed from a frequency
distribution of grouped data, where the height of the rectangle is proportional to the
respective frequency and the width represents the class interval. Each rectangle is joined
with the next, and a blank space between rectangles means that the category is
empty and there are no values in that class interval.
Frequency polygon
A ‘frequency polygon’ is a line chart of frequency distribution in which either the values of
discrete variables or the mid-points of class intervals are plotted against the frequencies and
these plotted points are joined together by straight lines. Since the frequencies do not start at
zero or end at zero, this diagram as such would not touch the horizontal axis. However, since
the area under the entire curve is the same as that of a histogram, which is 100%, the curve
must be ‘enclosed’. The starting mid-point is joined with a ‘fictitious’ preceding mid-point
whose frequency is zero. This makes the beginning of the curve touch the horizontal axis. The last
mid-point is joined with a ‘fictitious’ succeeding mid-point, whose frequency is also zero. Now,
the curve will end at the horizontal axis. The enclosed diagram is known as the ‘frequency
polygon’.
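The points to be plotted for a frequency polygon can be worked out as in the Python sketch below. It assumes equal class widths, and the grouped data and the function name `polygon_points` are hypothetical. The two extra points with zero frequency are the ‘fictitious’ mid-points that close the polygon on the horizontal axis.

```python
def polygon_points(lower, width, frequencies):
    """Return (mid-point, frequency) pairs closed with zero-frequency end points."""
    midpoints = [lower + width * (i + 0.5) for i in range(len(frequencies))]
    points = list(zip(midpoints, frequencies))
    # Fictitious classes before the first and after the last real class.
    first = (midpoints[0] - width, 0)
    last = (midpoints[-1] + width, 0)
    return [first] + points + [last]

# Hypothetical grouped data: classes 10-20, 20-30, 30-40, 40-50.
points = polygon_points(lower=10, width=10, frequencies=[3, 7, 5, 2])

for midpoint, frequency in points:
    print(midpoint, frequency)
```

Joining these points in order with straight lines gives the enclosed polygon.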
Fig. 2.7: (A) More-than Ogive Curve (B) Less-than Ogive Curve
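An ogive is a cumulative frequency curve, so the points for both curves in Fig. 2.7 come from running totals of the class frequencies. The Python sketch below shows one way to compute them. The grouped data are hypothetical, and pairing the less-than totals with upper class boundaries and the more-than totals with lower class boundaries is the usual convention, assumed here.

```python
from itertools import accumulate

# Hypothetical grouped data: classes 10-20, 20-30, 30-40, 40-50.
boundaries = [10, 20, 30, 40, 50]
frequencies = [3, 7, 5, 2]

total = sum(frequencies)
cumulative = list(accumulate(frequencies))  # running totals of the frequencies

# Less-than ogive: number of observations below each upper class boundary.
less_than = list(zip(boundaries[1:], cumulative))

# More-than ogive: number of observations at or above each lower class boundary.
more_than = list(zip(boundaries[:-1], [total - c for c in [0] + cumulative[:-1]]))

print("Less than:", less_than)
print("More than:", more_than)
```

The less-than curve rises from left to right and the more-than curve falls, because each starts from the opposite end of the distribution.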
Self-Assessment Questions - 5
14. The frequency distribution can be classified into ___________________ and
___________________ frequency distribution.
15. Ogive curve is a cumulative frequency curve. (True/False)
16. Match the following types of chart and their representations:
a) Pie chart i) Rectangular bar
b) Bar chart ii) Series of rectangles
c) Histogram iii) Circle
7. SUMMARY
Let’s recapitulate the important aspects of the unit:
• In this unit, we first learnt the process of collecting statistical data at
minimum cost. The collected data must be verified by asking some standard questions.
• In the second stage, we studied the classification of the data, which is based on the
answers to the first-stage questions. In this topic, sources of primary and secondary
data have been discussed. The primary source generates original data and corresponds
to the objective of the investigation. Secondary data, however, are often available in
published form, collected originally by some other agency with a different goal.
Primary data are generally more reliable than secondary data. There are several methods of
collecting primary data. The choice of a particular method depends, apart from the
objective, scope and nature of the investigation, on the availability of resources, literacy
levels of the respondents, etc. Secondary data should be used carefully, only after
examining that they are suitable, adequate and reliable for the purpose of the
investigation under consideration.
• In the third stage, we discussed the process of framing the questionnaires for the
statistical enquiry.
• In the fourth stage, we learnt the process of sample selection from the large population.
8. GLOSSARY
Data: A collection of any number of related observations on one or more variables.
Data point: A single observation from a data set.
Data set: A collection of data.
Raw data: Information before it is arranged or analyzed by statistical methods.
Sample: A collection of some, but not all, of the elements of the population under study, used
to describe the population.
Population: A collection of all the elements we are studying, and about which we are trying
to draw conclusions.
Frequency curve: A frequency polygon smoothed by adding classes and data points to a data
set.
Frequency distribution: An organized display of data that shows the number of
observations from the data set that fall into each of a set of mutually exclusive and
collectively exhaustive classes.
Cumulative frequency distribution: A tabular display of data showing how many observations
lie above or below certain values.
9. TERMINAL QUESTIONS
1. What do you mean by classification? What purpose does it serve?
2. What do you mean by primary data? What are the various methods of collecting
primary data?
3. Define secondary data. What are the sources of secondary data?
4. What is a questionnaire? Discuss the main points that you will take into account while
drafting a questionnaire.
5. Discuss the various methods of presenting the statistical data.
10. ANSWERS
Self-Assessment Questions
1. True
2. True
3. Test
4. Uncertainty
5. True
6. a-ii, b-i, c-iii, d-vi, e-v, f-iv
7. True
8. Primary data
9. True
10. True
11. True
12. Sample
13. False
Terminal Questions
1. Refer section 2.3 – There are four types of classification of data.
2. Refer section 2.3 – Primary data are those data, which are collected by the researcher
himself. The methods of collecting primary data are direct, personal observation etc.
3. Refer section 2.3 – When an investigator uses data, which have already been collected
by others, such data are called “secondary data.” Secondary data sources are statistical
abstract of Indian union etc.
4. The ‘questionnaire’ is a proforma containing a sequence of questions relevant to a
statistical enquiry. There are certain points that should be followed while drafting a
questionnaire. Refer section 2.4 for the same.
5. The tools of classification and presentation of statistical data are frequency
distribution, cumulative frequency distribution, relative frequency distribution and
charts. Refer section 2.6 for detailed explanation of each of them.
References:
• Bharadwaj, R. (2001). Business statistics. New Delhi: Excel Books.
• Chandan, J., Singh, J., & Khanna, K. (2003). Business statistics. Vikas Publishing
House.
• Das, N. (2009). Statistical methods (Vol. I). New Delhi: The McGraw-Hill Companies.
• Levin, R. I., & Rubin, D. S. (2007). Statistics for management. New Delhi:
Eastern Economy Edition.
• Gupta, C. B., & Gupta, V. (2004). An introduction to statistical methods. New Delhi:
Vikas Publishing House.
• Panneerselvam, R. (2005). Research methodology. New Delhi: Prentice-Hall of India
Private Limited.
• Shenoy, G., Srivastava, U., & Sharma. Business statistics. New Age International.