
Types of Data and Data Sources Mba

This document discusses types of data and data sources used in research methodology. It describes two broad categories of data: qualitative and quantitative. Qualitative data refers to attributes that can be categorized but not numerically expressed, while quantitative data can be measured in units. The document also discusses primary and secondary sources of data, with primary data collected directly and secondary data previously collected. It notes some sources of error in data collection and emphasizes the importance of organizing data through classification and tabulation before analysis.

Uploaded by

Himanshu Sharma
Copyright
© © All Rights Reserved

LNCT GROUP OF COLLEGES

Research Methodology

TYPES OF DATA AND DATA SOURCES


Statistical data are the basic raw material of statistics. Data may relate to an activity of our interest, a
phenomenon, or a problem situation under study. They derive as a result of the process of measuring,
counting and/or observing. Statistical data, therefore, refer to those aspects of a problem situation that
can be measured, quantified, counted, or classified. Any object, subject, phenomenon, or activity that
generates data through this process is termed as a variable. In other words, a variable is one that shows a
degree of variability when successive measurements are recorded.

Figure 1
In statistics, data are classified into two broad categories: quantitative data and qualitative data. This
classification is based on the kind of characteristics that are measured.
Qualitative data refer to qualitative characteristics of a subject or an object. Such data can be placed
into distinct categories according to some characteristic or attribute, but cannot be expressed
numerically. A characteristic is qualitative in nature when its observations are defined and noted in
terms of the presence or absence of a certain attribute. These data are further classified as nominal and
rank data.
(i) Nominal data are the outcome of classification into two or more categories of items or units
comprising a sample or a population according to some quality characteristic. Classification of
students according to gender (as males and females), of workers according to skill (as skilled,
semi-skilled, and unskilled), and of employees according to the level of education (as matriculates,
undergraduates, and post-graduates), all result in nominal data. Given any such basis of
classification, it is always possible to assign each item to a particular class and make a summation
of items belonging to each class. The count data so obtained are called nominal data.
(ii) Rank data are the result of assigning ranks to specify order in terms of the integers 1, 2, 3, ..., n.
Ranks may be assigned according to the level of performance in a test, a contest, a competition, an
interview, or a show. The candidates appearing in an interview, for example, may be assigned ranks
in integers ranging from 1 to n, depending on their performance in the interview. Ranks so assigned
can be viewed as the ordered values of a variable involving performance as the quality
characteristic.

Dr.Bhavana Likhitkar/Faculty/LNCTU
Quantitative data are those that can be quantified in definite units of measurement. These refer to
characteristics whose successive measurements yield quantifiable observations. Depending on the
nature of the variable observed for measurement, quantitative data can be further categorized as
continuous and discrete data. The level of measurement for quantitative data is either interval or ratio.
Correspondingly, a variable may be a continuous variable or a discrete variable.
(i) Continuous data represent the numerical values of a continuous variable. A continuous variable
is the one that can assume any value between any two points on a line segment, thus representing
an interval of values. The values are quite precise and close to each other, yet distinguishably
different. All characteristics such as weight, length, height, thickness, velocity, temperature,
tensile strength, etc., represent continuous variables. Thus, the data recorded on these and similar
other characteristics are called continuous data.
(ii) Discrete data are the values assumed by a discrete variable. A discrete variable is the one whose
outcomes are measured in fixed numbers. Such data are essentially count data. These are derived
from a process of counting, such as the number of items possessing or not possessing a certain
characteristic. The number of customers visiting a departmental store every day, the number of
incoming flights at an airport, and the number of defective items in a consignment received for sale,
are all examples of discrete data.
TYPES OF DATA ACCORDING TO DATA SOURCE
Data sources are of two types:
(i) Primary data: Those data which do not already exist in any form, and thus have to be collected for
the first time from the primary source(s). By their very nature, these data require fresh and first-time
collection covering the whole population or a sample drawn from it. These may be collected by any
one of the following methods:
• Observation Method
• Interview Method
• Questionnaire Method
• Schedule Method
• Focus group
(ii) Secondary data: They already exist in some form, published or unpublished, in an identifiable
secondary source. They are generally available from the following sources, though not necessarily in
the form actually required:
• Official publications of Govt., International Organizations, Banks, Trade Unions, CII, BSE etc.
• Internet, Journals, Newspapers, Magazines etc.
• Unpublished Research Reports

The secondary data can be relied upon only after examining the following factors:
(1) the source from which they have been obtained;
(2) their true significance;
(3) their completeness; and
(4) the method of collection.
Choice between Primary and Secondary Data
An investigator has to decide whether he will collect fresh (primary) data or compile data from
published sources. The following factors should be considered while making the choice between
primary and secondary data:
(i) Nature and scope of enquiry.
(ii) Availability of time and money.
(iii) Degree of accuracy required and
(iv) The status of the investigator i.e., individual, Pvt. Co., Govt. etc.
However, in certain investigations both primary and secondary data may have to be used, one
supplementing the other.
Table 1: Difference between Primary and Secondary Data
Basis | Primary Data | Secondary Data
Nature | Primary data are original and are collected for the first time. | Data which were collected earlier by someone else, and which are now in published or unpublished state.
Collecting agency | These data are collected by the investigator himself. | Secondary data were collected earlier by some other person.
Post-collection alterations | These data do not need alteration as they are according to the requirement of the investigation. | These have to be analyzed and necessary changes have to be made to make them useful as per the requirements of investigation.
Time & money | More time, energy and money have to be spent in collection of these data. | Comparatively less time and money is to be spent.

SOURCES OF ERROR IN MEASUREMENT (COLLECTION OF DATA):
• Respondent: Fear, fatigue, boredom or little knowledge.
• Situation: Stress, presence of a third person.
• Interviewer: Bias, behaviour, style, error in recording the response.
• Instrument: Ambiguous alternatives, poor printing, inadequate space for replies etc.

ORGANIZATION
So far, we know how to collect data. Now we have to organise the collected data so that they can be
analysed. The collected data (also known as raw data) are always in an unorganized form and need to be
organized. But before organizing the data we need to edit them by correcting errors (such as missing
entries, extreme values etc.), if any. After editing we first classify the data and then tabulate them.
Thus, the organisation of data consists of the following steps: editing, classification and tabulation.

Figure 2
CLASSIFICATION OF DATA
Arranging data into sequences and groups according to their common characteristics is known as
classification. For example, letters in the post office are classified according to their destinations viz.,
Delhi, Jaipur, Agra, Kanpur, etc.; the televisions in a shop may be classified according to their screen
sizes; the individuals in a group may be classified into various income groups according to their income.
Main objectives of classifying the data are:
1. It condenses the mass of data in a simple form.
2. It eliminates unnecessary details.
3. It facilitates comparison and highlights the significant aspect of data.
4. It enables one to get a mental picture of the information and helps in drawing inferences.
5. It helps in the statistical treatment of the information collected.
TYPES OF CLASSIFICATION
1. Chronological or Temporal classification: In chronological classification the collected data are
arranged according to the order of time expressed in years, months, weeks etc. The data are
generally classified in ascending order of time. For example, data relating to population, sales
of a firm, and imports and exports of a country are always subjected to chronological classification.
Specifically, data relating to road accidents (in thousands) in Madhya Pradesh during 2003 – 2010
are as follows:
Table 2: Number of Road Accidents in MP (in thousands)

Year        2003   2004   2005   2006   2007   2008   2009   2010
Accidents   30     32.5   35.1   38     41.9   43.8   47.2   50
2. Geographical or Spatial Classification: In this type of classification the data are classified
according to geographical region or place, for instance the production of paddy in different states
of India, or the production of wheat in different countries. For example, state wise computer and
internet users in the north-eastern states of India are shown in the following table:

Table 3: State wise Computer and Internet users (North east India)

State       Users
Sikkim      4228
AP          5232
Mizoram     5527
Nagaland    6799
Meghalaya   8074
Tripura     8428
Manipur     10650

3. Qualitative or categorical classification: In this type of classification, data are classified on the
basis of some attribute or quality like sex, literacy, religion, employment, etc. Such attributes
cannot be measured along a scale.
When the classification is done with respect to one attribute, which is dichotomous in nature, two
classes can be formed, one possessing the attribute and the other not possessing the attribute. This
type of classification is called simple or dichotomous classification. A simple classification may be
shown as under:

Figure 3
The classification where two or more attributes are considered and several classes are formed is
called a manifold classification. An example of a manifold classification is shown in the
following chart:

Figure 4
4. Quantitative classification: In quantitative classification, the collected data are grouped with
reference to characteristics which can be measured and numerically described, such as height,
weight, sales, imports, age, income, etc. The first step in the direction of putting observations in
some ordered form is to arrange them in ascending or descending order of magnitude. The data are
then said to be in an array. For example, the following table presents the number of smartphones
sold in 40 consecutive days by a cell phone dealer. The data displayed here are in raw form, that is,
the numerical observations are not arranged in any particular order or sequence.
Table 4: Raw Data Pertaining to Number of Smartphones Sold in 40 Consecutive Days
7 8 5 10 9 10 5 12 8 6 8 12 8 8 10 15 7 6 8 8
10 11 6 5 10 11 10 5 9 13 5 6 9 7 14 8 7 5 5 14

The raw data can be reorganized in a data array and frequency distribution. Such an arrangement
enables us to see quickly some of the characteristics of the data we have collected. When a raw data
set is arranged in rank order, from the smallest to the largest observation or vice-versa, the ordered
sequence obtained is called an ordered array. Following table reorganizes the above raw data
Table 5: Ordered Array

5 5 5 5 5 5 5 6 6 6 6 7 7 7 7 8 8 8 8 8
8 8 8 9 9 9 10 10 10 10 10 10 11 11 12 12 13 14 14 15
It may be observed that an ordered array does not summarize the data in any way as the number of
observations in the array remains the same. To overcome this problem we reorganize the data in the
form of a frequency distribution.
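As a minimal sketch, the ordered array of Table 5 can be reproduced from the raw data of Table 4 with Python's built-in `sorted`. The variable names `raw_sales`, `ordered_array` and `descending_array` are illustrative assumptions and do not come from the original notes.

```python
# Raw data from Table 4 (40 daily observations), typed in row by row.
raw_sales = [
    7, 8, 5, 10, 9, 10, 5, 12, 8, 6, 8, 12, 8, 8, 10, 15, 7, 6, 8, 8,
    10, 11, 6, 5, 10, 11, 10, 5, 9, 13, 5, 6, 9, 7, 14, 8, 7, 5, 5, 14,
]

# An ordered array is the same set of observations sorted by magnitude.
ordered_array = sorted(raw_sales)                    # smallest to largest
descending_array = sorted(raw_sales, reverse=True)   # or vice versa

# The array still holds all 40 observations: ordering does not summarize.
print(len(ordered_array), ordered_array)
```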
A frequency distribution shows the frequency (number of occurrences) of different values of a
single phenomenon (for example, the number of smartphones sold), or one may divide observations
in the data set into conveniently established numerically ordered classes (groups or categories).
The number of observations in each class is referred to as the frequency of that class. There are two
different types of frequency distributions:
1. Discrete Frequency Distribution: When the raw data are related to a discrete variable, we
arrange the different values in ascending or descending order along with their number of
occurrences. The frequency distribution of the number of smartphones sold is shown in the
following table:
Table 6: Array and Tallies

Number of       Tally       Frequency
Smart Phones                (Number of Days)
5               |||| ||     7
6               ||||        4
7               ||||        4
8               |||| |||    8
9               |||         3
10              |||| |      6
11              ||          2
12              ||          2
13              |           1
14              ||          2
15              |           1
Total                       40
As the number of observations obtained gets larger, the above method to condense the data
becomes difficult and time-consuming. Thus, to further condense the data into frequency
distribution tables, we create a continuous frequency distribution.
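The discrete frequency distribution of Table 6 can be sketched in a few lines using `collections.Counter` from the Python standard library. The name `raw_sales` and the plain `|` tally rendering (no struck-through bundles of five) are assumptions made for illustration.

```python
from collections import Counter

# Raw data from Table 4 (number of smartphones sold on each of 40 days).
raw_sales = [
    7, 8, 5, 10, 9, 10, 5, 12, 8, 6, 8, 12, 8, 8, 10, 15, 7, 6, 8, 8,
    10, 11, 6, 5, 10, 11, 10, 5, 9, 13, 5, 6, 9, 7, 14, 8, 7, 5, 5, 14,
]

# Count how many days each distinct value occurred.
frequency = Counter(raw_sales)

# Print value, one tally stroke per occurrence, and the frequency.
for value in sorted(frequency):
    count = frequency[value]
    print(f"{value:>2}  {'|' * count:<8}  {count}")

print("Total", sum(frequency.values()))
```

With a large number of distinct values this table grows as long as the list of values itself, which is the limitation the notes point out before moving to class intervals.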

2. Continuous Frequency Distribution: When the data are related to a continuous variable, the
data are classified on the basis of class intervals which are exhaustive and mutually exclusive. It is
accomplished by performing the following steps:
• Select an appropriate number of non-overlapping class intervals.
The decision on the number of class groupings depends largely on the judgment of the
individual investigator and/or the range that will be used to group the data, although there
are certain guidelines that can be used. As a general rule, a frequency distribution should
have at least five class intervals (groups), but not more than fifteen.
• Determine the width of the class intervals.
$$\text{Width of the class interval} = \frac{\text{Largest value} - \text{Smallest value}}{\text{Number of class intervals}}$$
• Determine class limits (or boundaries) for each class interval to avoid overlapping.
The limits of each class interval should be clearly defined so that each observation (element) of
the data set belongs to one and only one class. Each class has two limits: a lower limit and an
upper limit. The usual practice is to let the lower limit of the first class be a convenient number
slightly below or equal to the lowest value in the data set. For example, the data given below
relate to the time (in minutes) 30 different customers had to wait at a two-wheeler service
centre:
Table 7: Customer Waiting Time (in Minutes)

11.8 3.6 16.6 4.8 8.3 8.9 9.1 7.7 2.3 12.1 6.1 6.2 11.0 10.4 13.5
10.2 8.1 11.4 6.8 9.6 19.5 15.3 12.3 8.5 15.9 18.7 7.2 5.5 14.5 11.7
Step 1: Number of class intervals taken = 6
Step 2: Class width = (19.5 – 2.3)/6 ≈ 2.87, which is rounded up to 3
Step 3: Class boundaries are taken as 2 and 5, 5 and 8, 8 and 11, and so on up to 17 and 20
Thus, the frequency distribution table is as follows:
Table 8: Exclusive Frequency Distribution

Class intervals              Tally marks   Frequencies
(Waiting Time in Minutes)                  (Number of customers)
2 and under 5                III           3
5 and under 8                IIII I        6
8 and under 11               IIII III      8
11 and under 14              IIII II       7
14 and under 17              IIII          4
17 and under 20              II            2
Total                                      30
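The three steps above can be sketched as follows for the waiting-time data of Table 7. Rounding the width up with `math.ceil` and taking the floor of the smallest value as the first lower limit are assumptions chosen to match the worked example (width 3, first limit 2); the notes leave both choices to the investigator's judgment. All variable names are illustrative.

```python
import math

# Waiting times from Table 7 (30 customers, in minutes).
waiting_times = [
    11.8, 3.6, 16.6, 4.8, 8.3, 8.9, 9.1, 7.7, 2.3, 12.1, 6.1, 6.2, 11.0, 10.4, 13.5,
    10.2, 8.1, 11.4, 6.8, 9.6, 19.5, 15.3, 12.3, 8.5, 15.9, 18.7, 7.2, 5.5, 14.5, 11.7,
]

num_classes = 6                                            # Step 1
raw_width = (max(waiting_times) - min(waiting_times)) / num_classes
width = math.ceil(raw_width)                               # Step 2: about 2.87, rounded up
start = math.floor(min(waiting_times))                     # Step 3: convenient first lower limit

# Class boundaries: start, start + width, ..., start + num_classes * width.
boundaries = [start + i * width for i in range(num_classes + 1)]

# Exclusive method: an observation x falls in the class with lower <= x < upper.
frequencies = [0] * num_classes
for x in waiting_times:
    index = int((x - start) // width)
    frequencies[index] += 1

for lower, upper, f in zip(boundaries, boundaries[1:], frequencies):
    print(f"{lower} and under {upper}: {f}")
```

Because the index is computed with floor division, an observation equal to a class's upper limit (such as 11.0) lands in the next class, which is exactly the exclusive convention.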

There are two ways in which observations in the data set are classified on the basis of class
intervals, namely the exclusive method and the inclusive method.
Exclusive Method (above example): When the data are classified in such a way that the upper
limit of a class interval is the lower limit of the succeeding class interval, it is said to be the
exclusive method of classifying data. In this method any observation equal to the upper limit
of a class interval is included in the next class interval. For example, in the above problem the
observation 11 is not included in the class “8 and under 11” but in the class “11 and under 14”.
Inclusive Method: When the data are classified in such a way that both lower
and upper limits of a class interval are included in the interval itself, it is said to be the
inclusive method of classifying data. The frequency distribution formed by this method looks
like the following table:
Table 9: Inclusive Frequency Distribution

Class intervals                   Frequencies
(Weight of Gold Coins in gms)     (Number of Coins)
1 – 4                             5
5 – 8                             22
9 – 12                            13
13 – 16                           8
17 – 20                           2
If a continuous variable is classified according to the inclusive method, then a certain adjustment
in the class intervals is needed to obtain continuity. To ensure continuity, first calculate the
correction factor as
$$\text{Correction factor} = \frac{\text{lower limit of a class} - \text{upper limit of the previous class}}{2}$$
and then subtract it from the lower limits of all the classes and add it to the upper limits of all the
classes. In the above example we calculate the correction factor as
$$\frac{(\text{lower limit of the class 5--8}) - (\text{upper limit of the class 1--4})}{2} = \frac{5 - 4}{2} = 0.5$$
Subtracting 0.5 from the lower limits of all the classes and adding 0.5 to the upper limits, the
adjusted classes would be as shown in the table below:
Table 10: Converting to Exclusive Distribution

Class intervals                   Frequencies
(Weight of Gold Coins in gms)     (Number of Coins)
0.5 – 4.5                         5
4.5 – 8.5                         22
8.5 – 12.5                        13
12.5 – 16.5                       8
16.5 – 20.5                       2
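The correction-factor adjustment can be checked with a short sketch. The class limits are those of Table 9; the names `inclusive_classes`, `correction` and `adjusted_classes` are illustrative assumptions, and the sketch assumes equal gaps between all consecutive classes, as in the example.

```python
# Inclusive class limits from Table 9 (weight of gold coins in gms).
inclusive_classes = [(1, 4), (5, 8), (9, 12), (13, 16), (17, 20)]

# Correction factor: half the gap between a class's lower limit
# and the previous class's upper limit (here (5 - 4) / 2).
first_upper = inclusive_classes[0][1]
second_lower = inclusive_classes[1][0]
correction = (second_lower - first_upper) / 2

# Subtract from every lower limit and add to every upper limit.
adjusted_classes = [
    (lower - correction, upper + correction)
    for lower, upper in inclusive_classes
]

for lower, upper in adjusted_classes:
    print(f"{lower} - {upper}")
```

After the adjustment each upper limit coincides with the next lower limit, so the classes are continuous and the frequencies of Table 9 carry over unchanged.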
Cumulative frequency distribution:
A cumulative frequency distribution is used to determine the number of observations that lie
above (or below) a particular value in a data set. The cumulative frequency is calculated using a
frequency distribution table. There are two types of cumulative frequency distributions:
1. Less than type Cumulative Frequency Distribution
The less than cumulative frequencies are related to the upper limits of the classes and form an
increasing sequence. For this type of distribution, the cumulative frequency is calculated by
adding each frequency from a frequency distribution table to the sum of its predecessors.
The last value will always be equal to the total for all observations, since all frequencies will
already have been added to the previous total. For example, the less than type cumulative
frequency distribution from the frequency distribution given in Table 8 will be as follows:
Table 11: Less than Type Cumulative Distribution

Class intervals    Frequency   Waiting Time   Cumulative Frequencies
                               (Min.)         (Number of Customers)
2 and under 5      3           Less than 5    3  (3)
5 and under 8      6           Less than 8    9  (6 + 3)
8 and under 11     8           Less than 11   17 (8 + 6 + 3)
11 and under 14    7           Less than 14   24 (7 + 8 + 6 + 3)
14 and under 17    4           Less than 17   28 (4 + 7 + 8 + 6 + 3)
17 and under 20    2           Less than 20   30 (2 + 4 + 7 + 8 + 6 + 3)
2. More than type Cumulative Frequency Distribution
The more than cumulative frequencies are related to the lower limits of the classes and form a
decreasing sequence. The cumulative frequency corresponding to any class is obtained by
adding its frequency to all the successor frequencies. Therefore, the cumulative
frequency corresponding to the first class is always equal to the sum of all the
frequencies, since every observation is at least as large as the lower limit of the first class. For
example, the more than type cumulative frequency distribution from the frequency
distribution given in Table 8 will be as follows:
Table 12: More than Type Cumulative Distribution

Class intervals    Frequency   Waiting Time   Cumulative Frequencies
                               (Min.)         (Number of Customers)
2 and under 5      3           More than 2    30 (3 + 6 + ... + 2)
5 and under 8      6           More than 5    27 (6 + 8 + ... + 2)
8 and under 11     8           More than 8    21 (8 + 7 + ... + 2)
11 and under 14    7           More than 11   13 (7 + 4 + 2)
14 and under 17    4           More than 14   6  (4 + 2)
17 and under 20    2           More than 17   2  (2)
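Both cumulative distributions are running sums of the class frequencies of Table 8, one taken from the top and one from the bottom. The sketch below uses `itertools.accumulate` from the standard library; the variable names are illustrative assumptions.

```python
from itertools import accumulate

# Class boundaries and frequencies from Table 8.
boundaries = [2, 5, 8, 11, 14, 17, 20]
frequencies = [3, 6, 8, 7, 4, 2]

# Less than type: running total from the first class, tied to upper limits.
less_than = list(accumulate(frequencies))

# More than type: running total from the last class, tied to lower limits.
more_than = list(accumulate(reversed(frequencies)))[::-1]

for upper, cf in zip(boundaries[1:], less_than):
    print(f"Less than {upper}: {cf}")

for lower, cf in zip(boundaries[:-1], more_than):
    print(f"More than {lower}: {cf}")
```

The two sequences are linked: for every class after the first, the more than value equals the total minus the less than value of the preceding class (for example 27 = 30 - 3).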

Relative Frequency Distribution


To convert a frequency distribution into a corresponding relative frequency distribution, we
divide each class frequency by the total number of observations in the entire distribution. Each
relative frequency is thus a proportion as shown in the table 13 below
Percentage Frequency Distribution
A percentage frequency distribution is one in which the number of observations for each class
interval is converted into a percentage frequency by dividing it by the total number of
observations in the entire distribution. The quotient so obtained is then multiplied by 100, as
shown in the table 13 below
Table 13: Relative and Percentage Frequency Distribution

Class intervals    Frequency   Relative Frequency   Percentage Frequency
2 and under 5      3           3/30 = 0.100         10.00
5 and under 8      6           6/30 = 0.200         20.00
8 and under 11     8           8/30 = 0.267         26.67
11 and under 14    7           7/30 = 0.233         23.33
14 and under 17    4           4/30 = 0.133         13.33
17 and under 20    2           2/30 = 0.067         6.67
Total              30          1                    100%
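The relative and percentage frequencies of Table 13 follow directly from the class frequencies, as this minimal sketch shows. The variable names are illustrative assumptions; values are rounded only when printed.

```python
# Class frequencies from Table 8 (waiting times of 30 customers).
frequencies = [3, 6, 8, 7, 4, 2]
total = sum(frequencies)

# Relative frequency: class frequency divided by the total number of observations.
relative = [f / total for f in frequencies]

# Percentage frequency: relative frequency multiplied by 100.
percentage = [100 * r for r in relative]

for f, r, p in zip(frequencies, relative, percentage):
    print(f"{f:>2}  {r:.3f}  {p:.2f}")
```

The relative frequencies sum to 1 and the percentages to 100, which is a quick check on any such table.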

TABULATION
Tabulation is the process of summarizing classified or grouped data in the form of a table so that it is
easily understood and an investigator is quickly able to locate the desired information. A Table is a
systematic arrangement of classified data in columns and rows. Thus, a statistical table makes it
possible for the investigator to present a huge mass of data in a detailed and orderly form. It facilitates
comparison and often reveals certain patterns in data, which are otherwise not obvious. Before
tabulation, data are classified and then displayed under different columns and rows of a table.
Advantages of Tabulation

Statistical data arranged in a tabular form serve the following objectives:
1. It simplifies complex data and the data presented are easily understood.
2. It facilitates comparison of related facts.
3. It facilitates computation of various statistical measures like averages, dispersion, correlation etc.
4. It presents facts in minimum possible space and unnecessary repetitions and explanations are
avoided. Moreover, the needed information can be easily located.
5. Tabulated data are good for references and they make it easier to present the information in the
form of graphs and diagrams.
Preparing a Table
The making of a compact table is itself an art. This should contain all the information needed within the
smallest possible space. What the purpose of tabulation is and how the tabulated information is to be
used are the main points to be kept in mind while preparing for a statistical table. An ideal table should
consist of the following main parts:
1. Table number: A table should be numbered for easy identification and reference in future. The table
number may be given either in the centre or side of the table but above the top of the title of the
table. If the number of columns in a table is large, then these can also be numbered so that easy
reference to these is possible.
2. Title of the table: Each table must have a brief, self-explanatory, and complete title which can
• indicate the nature of data contained;
• explain the locality (i.e., geographical or physical) of data covered;
• indicate the time (or period) of data obtained;
• contain the source of the data to indicate the authority for the data, as a means of verification
and as a reference. The source is always placed below the table.
3. Caption and stubs: The headings for columns and rows are called caption and stub, respectively.
They must be clear and concise.
4. Body: The body of the table should contain the numerical information. The numerical information is
arranged according to the descriptions given for each column and row.
5. Prefatory or head note: If needed, a prefatory note is given just below the title, in a prominent
type, for its further description. It is usually enclosed in brackets and states the unit of
measurement.
6. Footnotes: Anything written below the table is called a footnote. It is written to further clarify
the title, captions or stubs. For example, if the data described in the table pertain to profits earned by
a company, then the footnote may define whether it is profit before tax or after tax. There are
various ways of identifying footnotes: numbering them consecutively with small numbers 1, 2, 3, …,
or letters a, b, c, …, or stars *, **, …, or symbols like @ or $.
7. Sources of data: The source of data should be given below the table.

A model structure of a table consisting of the above parts is given below:
Table 14: Newspaper Preferred by Various Occupations

                    Occupation                                          (Caption)
Newspaper*          Private Sector   Public Sector   Self               (Sub-captions)
(Stub)              Employee         Employee        Employed
Hindustan Times     43               18              51
Indian Express      21               38              22
The Hindu           15               37              20
Times of India      29               27              33
(Sub-stubs: the newspaper names in the first column; Body: the numerical entries)
Footnote - * All newspapers are in English language
Source Note - PEW Research Center’s Indian Life Project, Tracking Study, July 25–August 26, 2011

Type of Tables
Tables can be classified according to their purpose, stage of enquiry, nature of data or number of
characteristics used. On the basis of the number of characteristics, tables may be classified as follows:
1. Simple or one-way table: A simple or one-way table is the simplest table, which contains data of
one characteristic only. A simple table is easy to construct and simple to follow. For example, the
following table shows the number of readers of various English newspapers in a locality.
Table 15: Readers of various Newspapers

Newspaper Number of Readers


Hindustan Times 112
Times of India 89
Indian Express 81
The Hindu 72
2. Two-way table (Cross Classification Table): A table which contains data on two characteristics
is called a two-way table. In such a case, therefore, either the stub or the caption is divided into two
co-ordinate parts. For example, the caption of the above table may be further divided
in respect of the occupation of the readers. This sub-division is shown in the following two-way table,
which now contains two characteristics, namely the occupation and the preferred newspaper.
Table 16: Readers of Popular Newspapers from different Occupations

                    Occupation
Newspaper*          Private Sector   Public Sector   Self
                    Employee         Employee        Employed
Hindustan Times     43               18              51
Times of India      29               27              33
Indian Express      21               38              22
The Hindu           15               37              20

3. Manifold table: A table, in which more than two characteristics of data are considered, is called a
manifold table. For instance, table below shows three characteristics, namely, occupation,
newspaper and marital status.
Table 17: Newspaper Readers from different Occupations and Marital status

                    Occupation
Newspaper*          Private Sector   Public Sector   Self
                    Employee         Employee        Employed
                    M      S         M      S        M      S
Hindustan Times     21     22        8      10       40     11
Times of India      20     9         20     7        20     13
Indian Express      12     9         20     18       13     9
The Hindu           10     5         25     12       15     5
Footnote - M stands for married and S for single.
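The three kinds of table are nested: summing out one characteristic of the manifold table gives the two-way table, and summing out a second gives the one-way table. The sketch below illustrates this with the figures of Table 17; the nested-dictionary layout and the names `manifold`, `two_way` and `one_way` are assumptions made for illustration.

```python
# Table 17 as {newspaper: {occupation: (married, single)}}.
manifold = {
    "Hindustan Times": {"Private Sector Employee": (21, 22),
                        "Public Sector Employee": (8, 10),
                        "Self Employed": (40, 11)},
    "Times of India": {"Private Sector Employee": (20, 9),
                       "Public Sector Employee": (20, 7),
                       "Self Employed": (20, 13)},
    "Indian Express": {"Private Sector Employee": (12, 9),
                       "Public Sector Employee": (20, 18),
                       "Self Employed": (13, 9)},
    "The Hindu": {"Private Sector Employee": (10, 5),
                  "Public Sector Employee": (25, 12),
                  "Self Employed": (15, 5)},
}

# Sum over marital status: newspaper by occupation (two-way table).
two_way = {
    paper: {occupation: married + single
            for occupation, (married, single) in row.items()}
    for paper, row in manifold.items()
}

# Sum over occupation as well: readers per newspaper (one-way table).
one_way = {paper: sum(row.values()) for paper, row in two_way.items()}

print(two_way)
print(one_way)
```

The results agree with Table 16 and Table 15 respectively, which confirms that the three example tables describe the same set of readers.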

GRAPHICAL REPRESENTATION OF DATA


A graphical representation is a visual display of data and statistical results. It is often more effective
than presenting data in tabular form. There are many different types of graphical representation;
which one is used depends on the nature of the data and the type of statistical results. The most
commonly used graphical representations are as follows:
Presentation of Qualitative (Categorical) Data
1. Bar Graph
A bar graph (also known as a bar chart) is used to represent categorical or qualitative data. Since
the values of a categorical variable are labels for the categories, the distribution of a categorical
variable gives either the count or the percent of individuals falling into each category. It uses
rectangular bars (columns plotted on a graph) to represent different values and to show comparisons
among categories, such as the amount of rainfall that occurred during different months of a year,
or the average salary in different states. The columns are positioned over a label that represents a
categorical variable. The height of the column indicates the size of the group defined by the
column label. Bar graphs are most commonly drawn vertically, though they can also be depicted
horizontally. For example, the following bar chart represents the number of readers of the four
newspapers given in Table 15.

Number of Readers of Popular Newspapers (vertical axis: Number of Readers; bars: Hindustan Times 112,
Times of India 89, Indian Express 81, The Hindu 72)

Figure 5: Bar Chart


2. Multiple Bar Chart
A multiple bar chart is used to represent the data given by a two-way table. For example, the following
bar chart represents the data given in Table 16.
(Horizontal axis: Hindustan Times, Times of India, Indian Express, The Hindu; one bar per occupation:
Private Sector Employee, Public Sector Employee, Self Employed; vertical axis from 0 to 60)

Figure 6: Multiple Bar Chart


3. Pie Chart
A pie chart is an appropriate graphical representation of category frequencies when we want to
show the proportion or percentage belonging to each category out of the total. A pie chart is a circle
that is subdivided into slices whose areas are proportional to the frequencies (or relative
frequencies), thereby displaying the proportion (%) of occurrences of each category. Every 1%
that a category contributes to the total corresponds to a slice with an angle of 3.6
degrees. Pie charts are useful tools that help you figure out and understand polls, statistics,
complex data, and income or spending. For example, the following pie chart represents the
percentage of readers out of the total number of readers given in Table 15.

Percentage distribution of Readers (slices: Hindustan Times 32%, Times of India 25%,
Indian Express 23%, The Hindu 20%)

Figure 7: Pie Chart
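The slice sizes of such a pie chart follow from the rule of 3.6 degrees per 1%. The sketch below works this out for the reader counts of Table 15; the names `readers`, `percentages` and `angles` are illustrative assumptions, and no plotting library is used.

```python
# Number of readers per newspaper, from Table 15.
readers = {
    "Hindustan Times": 112,
    "Times of India": 89,
    "Indian Express": 81,
    "The Hindu": 72,
}
total = sum(readers.values())

# Percentage share of each category, and the slice angle at 3.6 degrees per 1%.
percentages = {paper: 100 * count / total for paper, count in readers.items()}
angles = {paper: 3.6 * share for paper, share in percentages.items()}

for paper in readers:
    print(f"{paper}: {percentages[paper]:.0f}%  {angles[paper]:.1f} degrees")
```

The angles add up to the full circle of 360 degrees, and the rounded percentages match the labels of the pie chart.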


Presentation of Quantitative Data
4. Line Chart: Line charts are used to represent chronological data, i.e. data related to a
quantitative variable (sales, demand, production etc.) over a few consecutive time periods (years,
months, weeks etc.). The values of the variable are plotted above the point on the horizontal axis
representing the corresponding time period. Line charts are particularly useful when the trend over
time is to be emphasised. For example, the following line chart shows the stock prices of Bharti
Airtel for nineteen consecutive days of October 2009.

Figure 8: Line Graph


5. Bar Chart: A bar chart is also a suitable graphical representation of the frequency distribution of
discrete quantitative data. For example, the following bar chart represents the frequency
distribution of the number of smartphones sold in 40 days.

[Bar chart titled "Frequency Distribution of No. of Smartphones Sold in 40 Days"; horizontal axis: No. of Smartphones (5 to 15); vertical axis: Frequency (No. of Days)]

Figure 9
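
The frequencies behind such a bar chart are obtained by counting how often each value occurs. The following Python sketch shows the idea and prints a rough text version of the bars. The list daily_sales is hypothetical and is not the data behind Figure 9.

```python
from collections import Counter

# Hypothetical number of smartphones sold on each of 10 days (illustrative only).
daily_sales = [7, 9, 8, 10, 7, 8, 9, 8, 11, 8]

# Count how many days each sales value occurred.
frequency = Counter(daily_sales)

# One row per distinct value; the row of '#' marks plays the role of the bar.
for value in sorted(frequency):
    print(f"{value:>2} | {'#' * frequency[value]} ({frequency[value]})")
```

Each distinct value of the discrete variable gets its own bar, and the height of the bar is the number of days on which that value was observed.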
6. Histogram: For the frequency distribution of continuous quantitative data, one of the most useful
graphs is a histogram. It is a diagram consisting of rectangles whose area is proportional to the
frequency of a class and whose width is equal to the class interval.
Making a Histogram Using a Frequency Distribution Table
To make a histogram, follow these steps:
1. On the vertical axis (Y – axis), place frequencies. Label this axis "Frequency".
2. On the horizontal axis (X – axis), place the lower value of each interval.
3. Draw a bar extending from the lower value of each interval to the lower value of the next
interval. The height of each bar should be equal to the frequency of its corresponding
interval.
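
The steps above can be sketched in Python as follows. This is only an illustration under stated assumptions: the waiting times are hypothetical (they are not the data of table 13), the classes are assumed to have equal width and to include their lower value but not their upper value, and the function name frequency_table is our own.

```python
# Hypothetical waiting times in minutes (illustrative, not the data of table 13).
waiting_times = [2.5, 4.0, 5.5, 6.0, 7.2, 8.1, 8.9, 9.4, 10.0, 10.7,
                 11.3, 12.8, 13.5, 14.2, 16.0, 18.4]

def frequency_table(values, start, width, n_classes):
    """Count the values falling in each class [lower, lower + width).

    Returns a list of (lower, upper, frequency) tuples, one per class.
    """
    counts = [0] * n_classes
    for v in values:
        index = int((v - start) // width)
        if 0 <= index < n_classes:
            counts[index] += 1
    return [(start + i * width, start + (i + 1) * width, counts[i])
            for i in range(n_classes)]

table = frequency_table(waiting_times, start=2, width=3, n_classes=6)

# Each bar runs from the lower value of its class to the lower value of the
# next class; its height (here, the number of '#' marks) is the frequency.
for lower, upper, freq in table:
    print(f"{lower:>2} - {upper:<2} | {'#' * freq} ({freq})")
```

Since all classes have the same width, making the height of each bar equal to the frequency also makes its area proportional to the frequency.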
Example: A histogram showing the frequency distribution of the waiting time of customers at a
two-wheeler service station (see table 13)

[Histogram; horizontal axis: Waiting Time (Min), with class limits 2, 5, 8, 11, 14, 17, 20; vertical axis: Frequency (0 to 9)]

Figure 10
A histogram provides a visual representation that shows where most of the measurements
(values of the variable) are located and how spread out they are. In the above histogram most of
the values are located in the interval 8 – 11, and they are almost symmetrically distributed on both
sides of this interval.
7. Frequency Polygon
We create a frequency polygon from a histogram. If the midpoints of the tops of the bars of the
histogram are joined, a frequency polygon is formed. A frequency polygon and a histogram serve the
same purpose. However, the former is more useful for comparing different datasets. Following is
an example of a frequency polygon created from the histogram (Figure 10) of the waiting times of
customers.
[Frequency polygon; horizontal axis: Waiting Time (Minutes), with points at 0.5, 3.5, 6.5, 9.5, 12.5, 15.5, 18.5, 21.5; vertical axis: Frequency (0 to 9)]

Figure 11
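
The points of a frequency polygon can be computed from the class limits, as in the Python sketch below. The class limits are those on the axis of the histogram, whereas the frequencies are hypothetical. Closing the polygon with a zero-frequency point one class width beyond each end is a common convention that we assume here; it agrees with the axis values 0.5 and 21.5 of Figure 11. The function name polygon_points is our own.

```python
# Class limits as on the horizontal axis of the histogram.
class_limits = [2, 5, 8, 11, 14, 17, 20]
# Hypothetical class frequencies (illustrative only).
frequencies = [2, 3, 5, 3, 2, 1]

def polygon_points(limits, freqs):
    """Return the (x, y) points of the frequency polygon.

    x values are the class midpoints; one extra point with frequency 0 is
    added a class width before the first and after the last midpoint.
    """
    width = limits[1] - limits[0]
    midpoints = [(lo + hi) / 2 for lo, hi in zip(limits, limits[1:])]
    xs = [midpoints[0] - width] + midpoints + [midpoints[-1] + width]
    ys = [0] + list(freqs) + [0]
    return list(zip(xs, ys))

points = polygon_points(class_limits, frequencies)
for x, y in points:
    print(f"midpoint {x:>4}: frequency {y}")
```

Joining these points in order with straight lines gives the frequency polygon. Polygons of two datasets can be drawn on the same axes, which is why they are convenient for comparison.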
8. Ogive
The ogive is a graph that represents the cumulative frequencies for the classes in a frequency
distribution. An ogive uses class boundaries along the horizontal scale and cumulative frequencies
along the vertical scale. Ogives are useful for determining the number of values below or above
some particular value. The following two diagrams represent a less than type ogive and a more than
type ogive created from the cumulative frequency distributions given in table 11 and table 12:

[Two panels: "Less than Ogive" and "More than Ogive"; horizontal axis of each: Waiting Time (Minutes), with class boundaries 2, 5, 8, 11, 14, 17, 20; vertical axis of each: Cumulative Frequency (0 to 35)]

Figure 12 : Ogive
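
The points plotted in the two ogives are running totals of the class frequencies. The Python sketch below shows one way to compute them. The class boundaries are those on the axes of Figure 12, but the frequencies are hypothetical (they are not the values of table 11 or table 12), and the function names are our own.

```python
from itertools import accumulate

# Class boundaries as on the horizontal axis of Figure 12.
class_boundaries = [2, 5, 8, 11, 14, 17, 20]
# Hypothetical class frequencies (illustrative only).
frequencies = [2, 3, 5, 3, 2, 1]

def less_than_ogive(boundaries, freqs):
    """Pair each boundary with the number of values below it."""
    cumulative = [0] + list(accumulate(freqs))
    return list(zip(boundaries, cumulative))

def more_than_ogive(boundaries, freqs):
    """Pair each boundary with the number of values at or above it."""
    total = sum(freqs)
    below = [0] + list(accumulate(freqs))
    return list(zip(boundaries, [total - b for b in below]))

less_than = less_than_ogive(class_boundaries, frequencies)
more_than = more_than_ogive(class_boundaries, frequencies)

for (boundary, below), (_, above) in zip(less_than, more_than):
    print(f"{boundary:>2} min: {below:>2} below, {above:>2} at or above")
```

The less than ogive rises from 0 to the total frequency, and the more than ogive falls from the total frequency to 0. At every boundary the two cumulative frequencies add up to the total.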
9. Scatter Plot
A scatter plot (or scatter diagram) is a plot of paired (x, y) quantitative data with a horizontal x-axis
and a vertical y-axis. The horizontal axis is used for the first (x) variable, and the vertical axis is
used for the second (y) variable. The pattern of the plotted points is often helpful in determining
whether there is a relationship between two quantitative variables such as price and demand, sales
and price, temperature and volume etc. For example, doctors are interested in the possible
relationship between the dosage of a medicine and the time required for a patient’s recovery. The
following table shows, for a sample of 10 patients, dosage levels (in grams) and recovery times (in
hours).
Table 18

Dosage level (grams)    1.2  1.3  1.0  1.4  1.5  1.8  1.2  1.3  1.4  1.3
Recovery time (hours)   25   28   40   38   10   9    27   30   16   18
We can describe the data graphically with a scatter plot as shown below:

[Scatter plot titled "Dosage Level and Recovery time"; horizontal axis: Dosage Level (0.75 to 2); vertical axis: Recovery Time (Hours), 0 to 50]

Figure 13
The above graph shows that the recovery time tends to decrease as the dosage level increases, i.e. we can
say that there is a negative relationship between the two variables.
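
The direction of the relationship seen in the scatter plot can also be checked numerically. The Python sketch below computes the sample correlation coefficient, a common summary that is not used in the text above and is offered only as a cross-check. The pairing of dosage and recovery values follows our reading of Table 18 and should be treated as an assumption; the function name pearson_r is our own.

```python
from math import sqrt

# Values as read from Table 18 (dosage in grams, recovery time in hours).
# The pairing of the two rows is an assumption based on our reading of the table.
dosage = [1.2, 1.3, 1.0, 1.4, 1.5, 1.8, 1.2, 1.3, 1.4, 1.3]
recovery = [25, 28, 40, 38, 10, 9, 27, 30, 16, 18]

def pearson_r(xs, ys):
    """Sample correlation coefficient of paired data (between -1 and +1)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

r = pearson_r(dosage, recovery)
print(f"r = {r:.2f}")
```

A negative value of r agrees with the downward pattern of the points: patients with higher dosage levels tend to have shorter recovery times.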
