Module 3 Data Collection and Organization
Contents
1. Data collection methods
2. Collection of primary data
3. Secondary data
4. Data organization
5. Methods of data grouping
6. Diagrammatic representation of data
7. Graphic representation of data.
The primary data are those which are collected fresh and for the first time, and thus happen to
be original in character.
The secondary data are those which have already been collected by someone else and which
have already been passed through the statistical process.
The researcher would have to decide which sort of data he would be using for his study and
accordingly he will have to select the method of data collection.
The observation method is the most commonly used method in studies relating to behavioural
sciences.
Prepared By
Dr.Swapna Raghunath,
Department of ECE,
GNITS, Hyderabad
For instance, in a study relating to consumer behaviour, the investigator instead of asking the
brand of wrist watch used by the respondent, may himself look at the watch.
Advantages:
1. Subjective bias is eliminated, if observation is done accurately.
2. Information observed relates to what is currently happening; it is not complicated by
either the past behaviour or future intentions or attitudes.
3. Independent of respondents’ willingness to respond.
4. Suitable in studies which deal with respondents who are not capable of giving verbal
reports.
Limitations:
1. Expensive method
2. Information provided is limited.
3. Unforeseen factors may interfere with the observational task
4. Some people are rarely accessible to direct observation, creating an obstacle for
effective data collection.
If the observer observes by making himself a member of the group he is observing, so that he
can experience what the members of the group experience, the observation is called
participant observation.
When the observer observes without any attempt on his part to experience what the
participants feel, the observation is called non-participant observation.
When the observer is observing in such a manner that his presence may be unknown to the
people he is observing, such an observation is described as disguised observation.
In controlled observation:
1. Mechanical (or precision) instruments are used as aids to accuracy and standardisation.
2. There is a tendency to supply formalised data upon which generalisations can be built with
some degree of assurance.
Controlled observation takes place in various experiments that are carried out in a laboratory
or under controlled conditions.
If the observation takes place in the natural setting, it may be termed as uncontrolled
observation.
In non-controlled observation:
1. No attempt is made to use precision instruments.
2. The major aim is to get a spontaneous picture of life and persons.
3. It has a tendency to supply naturalness and completeness of behaviour, allowing
sufficient time for observing.
Interview Method
The interview method of collecting data can be used through personal interviews and, if
possible, through telephone interviews.
(a) Personal interviews: This method requires a person known as the interviewer to ask
questions, generally in face-to-face contact, of the other person or persons.
This sort of interview may be in the form of direct personal investigation or it may be indirect
oral investigation.
The interviewer has to be on the spot and has to meet the people from whom data have to be collected.
Most of the commissions and committees appointed by government to carry on investigations
make use of this method.
The method of collecting information through personal interviews is usually carried out in a
structured way.
Such interviews involve the use of a set of predetermined questions and of highly
standardised techniques of recording.
(iii) Certain types of respondents such as important officials or executives or people in high
income groups may not be easily approachable under this method and to that extent the data
may prove inadequate.
(iv) This method is relatively more time-consuming, especially when the sample is large and
recalls upon the respondents are necessary.
(v) The presence of the interviewer on the spot may over-stimulate the respondent, sometimes
even to the extent that he may give imaginary information just to make the interview
interesting.
(vi) Under the interview method the organisation required for selecting, training and
supervising the field-staff is more complex with formidable problems.
(vii) Interviewing at times may also introduce systematic errors.
(viii) Effective interview presupposes proper rapport with respondents that would facilitate
free and frank responses.
It is not a very widely used method, but it plays an important part in industrial surveys,
particularly in developed regions.
The researcher should note the following with regard to these three main aspects of a
questionnaire:
1. General form: It can either be structured or unstructured questionnaire.
2. Question sequence: A proper sequence of questions reduces considerably the chances of
individual questions being misunderstood. The question-sequence must be clear and
smoothly-moving, meaning thereby that the relation of one question to another should be
readily apparent to the respondent, with questions that are easiest to answer being put in the
beginning.
3. Question formulation and wording: Questions should also be impartial in order not to give a
biased picture of the true state of affairs.
The schedule is generally filled out by the research worker or the enumerator, who can
interpret questions when necessary.
2. To collect data through questionnaire is relatively cheap and economical since we have to
spend money only in preparing the questionnaire and in mailing the same to respondents.
Here no field staff is required.
To collect data through schedules is relatively more expensive since a considerable amount of
money has to be spent in appointing enumerators and in imparting training to them.
Money is also spent in preparing schedules.
3. Non-response is usually high in case of questionnaire as many people do not respond and
many return the questionnaire without answering all questions.
As against this, non-response is generally very low in case of schedules because these are
filled by enumerators who are able to get answers to all questions. But there remains the
danger of interviewer bias and cheating.
4. In case of questionnaire, it is not always clear as to who replies, but in case of schedule the
identity of respondent is known.
5. The questionnaire method is likely to be very slow since many respondents do not return
the questionnaire in time despite several reminders, but in case of schedules the information
is collected well in time as they are filled in by enumerators.
7. Questionnaire method can be used only when respondents are literate and cooperative, but
in case of schedules the information can be gathered even when the respondents happen to
be illiterate.
8. Wider and more representative distribution of sample is possible under the questionnaire
method, but in respect of schedules there usually remains the difficulty in sending
9. Risk of collecting incomplete and wrong information is relatively more under the
questionnaire method, particularly when people are unable to understand questions properly.
But in case of schedules, the information collected is generally complete and accurate as
enumerators can remove the difficulties, if any, faced by respondents in correctly
understanding the questions.
As a result, the information collected through schedules is relatively more accurate
than that obtained through questionnaires.
10. The success of the questionnaire method depends more on the quality of the questionnaire
itself, but in the case of schedules much depends upon the honesty and competence of enumerators.
11. In order to attract the attention of respondents, the physical appearance of questionnaire
must be quite attractive, but this may not be so in case of schedules as they are to be filled
in by enumerators and not by respondents.
12. Along with schedules, observation method can also be used but such a thing is not
possible while collecting data through questionnaires.
(e) reports and publications of various associations connected with business and industry,
banks, stock exchanges, etc.;
(f) reports prepared by research scholars, universities, economists, etc. in different fields;
(g) public records and statistics, historical documents, and other sources of published
information.
Researcher must be very careful in using secondary data. He must make a minute scrutiny
because it is just possible that the secondary data may be unsuitable or may be inadequate in
the context of the problem which the researcher wants to study.
In this connection Dr. A.L. Bowley very aptly observes that it is never safe to take published
statistics at their face value without knowing their meaning and limitations and it is always
necessary to criticise arguments that can be based on them.
By way of caution, the researcher, before using secondary data, must see that they possess
the following characteristics:
1. Reliability of data: The reliability can be tested by finding out such things about the said
data:
(a) Who collected the data?
(b) What were the sources of data?
(c) Were they collected by using proper methods?
(d) At what time were they collected?
(e) Was there any bias of the compiler?
(f) What level of accuracy was desired? Was it achieved?
2. Suitability of data: The data that are suitable for one enquiry may not necessarily be
found suitable in another enquiry.
Hence, if the available data are found to be unsuitable, they should not be used by the
researcher.
In this context, the researcher must very carefully scrutinise the definition of various terms
and units of collection used at the time of collecting the data from the primary source
originally.
Similarly, the object, scope and nature of the original enquiry must also be studied.
If the researcher finds differences in these, the data will remain unsuitable for the present
enquiry and should not be used.
3. Adequacy of data: If the level of accuracy achieved in data is found inadequate for the
purpose of the present enquiry, they will be considered as inadequate and should not be used
by the researcher.
The data will also be considered inadequate, if they are related to an area which may be either
narrower or wider than the area of the present enquiry.
The already available data should be used by the researcher only when he finds them reliable,
suitable and adequate.
But he should not blindly discard such data if they are readily available from authentic
sources and are also suitable and adequate, for in that case it will not be economical
to spend time and energy in field surveys for collecting information.
Data Organization
Systematically organized data are very important for future analysis. When you work with
data every day, their organization is obvious to you, but it may be hard to understand for
others who know nothing about the project.
A good logical system of data organization helps to share and exchange data. When deciding
how to organize your data, think about the nature of data. It is also possible to organize
chronologically, especially when you work with both real-time and historical data.
1. Alphabetical organization
This is probably the first option that many people consider.
It is easy and fast to organize the data alphabetically.
However, data organization is not just for the purpose of storage of the information, but its
retrieval is important as well.
Organizing information by alphabetical order works excellently only if people know some
specific terms or topics that they are looking for. In such a case, accessing some particular
content will be more like ‘Taking a walk in the park’. Since you know that the topic you
are searching for begins with letter ‘T’, you will not have to waste time going through the
contents beginning with the other letters.
This method of data organization will be more or less useless if the individual does not know
the topic that they are looking for.
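The retrieval benefit described above can be sketched in a few lines of Python. This is only an illustration: the topic titles and the helper name topics_starting_with are hypothetical, and binary search (the standard bisect module) is one possible way to jump straight to the section for a known first letter.

```python
from bisect import bisect_left

# Hypothetical topic titles, used only for illustration.
topics = ["Variance", "Tabulation", "Sampling", "Time series", "Budgeting"]

# Organize the topics alphabetically (ignoring case).
index = sorted(topics, key=str.lower)

def topics_starting_with(sorted_topics, letter):
    """Return the topics whose first letter is `letter`.

    Binary search jumps directly to that letter's section of the
    alphabetically sorted list, so other letters are never scanned.
    """
    keys = [t.lower() for t in sorted_topics]
    start = bisect_left(keys, letter.lower())
    # The section ends where the next letter of the alphabet begins.
    end = bisect_left(keys, chr(ord(letter.lower()) + 1))
    return sorted_topics[start:end]

t_topics = topics_starting_with(index, "T")
print(t_topics)
```

If the reader does not know the first letter of the topic, this shortcut is unavailable and the whole list has to be scanned, which is the weakness noted above.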
2. Location
Data can be organized by showing a visual depiction of some physical space.
Maps are the most common ways to organize information based on location.
Consider maps like those of some college campus or the shopping mall directories.
They give you a mental image of where a particular shop or lecture hall is located in relation
to another.
Organizing data based on location helps to show the relationships between the various types
of content that are relevant to each other.
4. Hierarchy
Hierarchies are beneficial when you want to show how one piece of information is related to
another one in the order of importance or their ranks.
They are used in organizational charts when you want to show who should report to whom in
a human resource department.
They can also be used to show scale, for instance things like biggest to smallest or lightest to
heaviest.
You can organize the data in ascending or descending order although many people choose to
work with descending order.
5. Category
Categories are very useful for a variety of purposes, for example describing different types of
data that are being generated by an institution.
The problem with this method is that it is so broad compared to the other methods.
You can organize the data in just about any way imaginable: by color, gender, price, shape,
model, etc. The options are infinite.
Data Grouping
Grouped data are data formed by aggregating individual observations of a variable into
groups, so that a frequency distribution of these groups serves as a convenient means of
summarizing or analyzing the data.
The idea of grouped data can be illustrated by considering the following raw dataset:
20  25  24  33  13
26   8  19  31  11
16  21  17  11  34
14  15  21  18  17
The above data can be grouped in order to construct a frequency distribution in any of several
ways. One method is to use intervals as a basis.
The smallest value in the above data is 8 and the largest is 34. The interval from 8 to 34 is
broken up into smaller subintervals (called class intervals).
For each class interval, the number of data items falling in this interval is counted. This
number is called the frequency of that class interval.
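The counting procedure described above can be sketched in Python as follows. The data are the 20 raw values from the text; the class width of 5 starting at 5 is an assumption, chosen so that the smallest value (8) and the largest (34) are both covered, and the function name frequency_distribution is illustrative.

```python
# Raw dataset from the text (20 observations).
data = [20, 25, 24, 33, 13,
        26, 8, 19, 31, 11,
        16, 21, 17, 11, 34,
        14, 15, 21, 18, 17]

def frequency_distribution(values, start, width, n_classes):
    """Count how many values fall in each class interval [lower, upper)."""
    table = []
    for i in range(n_classes):
        lower = start + i * width
        upper = lower + width
        freq = sum(1 for v in values if lower <= v < upper)
        table.append((lower, upper, freq))
    return table

# Assumed classes: 5 <= t < 10, 10 <= t < 15, ..., 30 <= t < 35.
dist = frequency_distribution(data, start=5, width=5, n_classes=6)
for lower, upper, freq in dist:
    print(f"{lower} <= t < {upper}: {freq}")
```

A different choice of starting point or class width gives a different, equally valid frequency distribution of the same data.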
Another method of grouping the data is to use some qualitative characteristics instead of
numerical intervals.
For example, suppose the data above are the response times of students, who are classified
into three types:
1) below normal, if the response time is 5 to 14 seconds,
2) normal, if it is between 15 and 24 seconds, and
3) above normal, if it is 25 seconds or more.
The grouped data then look like:
Table 3: Frequency distribution of the three types of students
Type of student    Frequency
Below normal       5
Normal             10
Above normal       5
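The grouping by qualitative category can be reproduced with a short Python sketch. The thresholds are the ones stated in the text; the function name category is illustrative, and the data are assumed to be whole seconds, as in the raw dataset.

```python
from collections import Counter

# Raw dataset from the text, read as response times in seconds.
data = [20, 25, 24, 33, 13,
        26, 8, 19, 31, 11,
        16, 21, 17, 11, 34,
        14, 15, 21, 18, 17]

def category(seconds):
    """Map a response time (whole seconds) to the category named in the text."""
    if seconds <= 14:
        return "Below normal"
    if seconds <= 24:
        return "Normal"
    return "Above normal"

# Count how many observations fall in each category.
counts = Counter(category(t) for t in data)
for name in ("Below normal", "Normal", "Above normal"):
    print(name, counts[name])
```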
Age Frequency
10 10
11 20
12 10
Mean of grouped data
An estimate, $\bar{x}$, of the mean of the population from which the data are drawn can be
calculated from the grouped data as:
$\bar{x} = \frac{\sum f x}{\sum f}$
In this formula, x refers to the midpoint of the class intervals, and f is the class frequency.
Note that the result of this will be different from the sample mean of the ungrouped data. The
mean for the grouped data in the above example, can be calculated as follows:
Class Intervals    Frequency (f)    Midpoint (x)    fx
 5 ≤ t < 10        1                 7.5              7.5
10 ≤ t < 15        4                12.5             50
15 ≤ t < 20        6                17.5            105
20 ≤ t < 25        4                22.5             90
25 ≤ t < 30        2                27.5             55
30 ≤ t < 35        3                32.5             97.5
TOTAL              20                               405
Thus, the mean of the grouped data is $\bar{x} = \frac{\sum f x}{\sum f} = \frac{405}{20} = 20.25$.
The mean for the grouped data in example 4 above can be calculated as follows:
Age Group    Frequency (f)    Midpoint (x)    fx
10           10               10.5            105
11           20               11.5            230
12           10               12.5            125
TOTAL        40                               460
Thus, the mean of the grouped data is $\bar{x} = \frac{\sum f x}{\sum f} = \frac{460}{40} = 11.5$.
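The grouped-mean calculation can be checked numerically with a minimal Python sketch. The function name grouped_mean is illustrative; the class frequencies for the response-time example are those obtained by counting the raw dataset in classes of width 5, and the age example uses the midpoints and frequencies tabulated above.

```python
def grouped_mean(classes):
    """Estimate the mean from (midpoint x, frequency f) pairs: sum(f*x) / sum(f)."""
    total_fx = sum(x * f for x, f in classes)
    total_f = sum(f for _, f in classes)
    return total_fx / total_f

# (midpoint, frequency) for the response-time classes 5-10, 10-15, ..., 30-35.
response_times = [(7.5, 1), (12.5, 4), (17.5, 6), (22.5, 4), (27.5, 2), (32.5, 3)]
# (midpoint, frequency) for the age groups 10, 11 and 12.
ages = [(10.5, 10), (11.5, 20), (12.5, 10)]

# Raw response-time data, for comparison with the grouped estimate.
raw = [20, 25, 24, 33, 13, 26, 8, 19, 31, 11,
       16, 21, 17, 11, 34, 14, 15, 21, 18, 17]

mean_times = grouped_mean(response_times)   # sum(fx) = 405, sum(f) = 20
mean_ages = grouped_mean(ages)              # sum(fx) = 460, sum(f) = 40
raw_mean = sum(raw) / len(raw)              # mean of the ungrouped data

print(mean_times, mean_ages, raw_mean)
```

The grouped estimate for the response times differs slightly from the mean of the raw values because every observation in a class is treated as if it were equal to the class midpoint.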
Diagrammatic representation can be used for both the educated section and uneducated
section of the society. Furthermore, any hidden trend present in the given data can be noticed
only in this mode of representation.
However, compared to tabulation, this is less accurate. So if there is a priority for accuracy,
we have to recommend tabulation.
1. Bar diagram
2. Component Bar Charts
3. Comparative Bar Charts or Multiple Bar Charts
4. Pie chart
5. Pictogram Chart
Bar diagram
There are two types of bar diagrams namely, Horizontal Bar diagram and Vertical
bar diagram.
While horizontal bar diagram is used for qualitative data or data varying over space, the
vertical bar diagram is associated with quantitative data or time series data.
Bars, i.e. rectangles of equal width and usually of varying lengths, are drawn
either horizontally or vertically.
Component bar diagrams are used for comparing the different components of a variable and for
relating the components to the whole. For this purpose, we may also use a pie chart (also
called a pie diagram or circle diagram).
Example :
The total number of runs scored by a few players in one-day match is given.
Solution :
To diagrammatically represent this you can split each bar into three different sections, where
the length of each section represents the number of guests in that age group.
The resulting bar chart would then be a component bar chart, as shown in Figure. Notice
that a key (or ‘legend’) has been added to the graph to explain the shading for the different
age categories.
When you read a component bar chart, you need to find the height of the relevant section to
determine the quantity that it represents.
For instance, the top of the section representing British adults is opposite the 10 on the
vertical scale. The bottom of that section is opposite the 4. Thus, the number of British adults
is the difference between the top and the bottom of the section, which is 6.
It is easy to see why this is called a comparative bar chart, as it is straightforward to compare
the data; for example, it is easy to tell that many more visitors from Britain were female.
Pie chart
In a pie chart, the various observations or components are represented by the sectors of a
circle and the whole circle represents the sum of the value of all the components.
Clearly, the total angle of 360° at the center of the circle is divided according to the values of
the components.
Central angle of a component = (Value of the component / Total value) × 360°
Sometimes, the values of the components are expressed in percentages. In such cases,
Central angle of a component = (Percentage value of the component / 100) × 360°
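The central-angle formula can be illustrated with a short Python sketch. The component names and values below are hypothetical and chosen only for illustration; because they add up to 100, they can also be read as percentages, so the same numbers illustrate both forms of the formula.

```python
def central_angles(components):
    """Return the central angle (in degrees) of each component of a pie chart."""
    total = sum(components.values())
    # Central angle = (value of the component / total value) x 360 degrees.
    return {name: value * 360 / total for name, value in components.items()}

# Hypothetical monthly expenditure; the values sum to 100.
budget = {"Food": 40, "Rent": 30, "Travel": 20, "Other": 10}

angles = central_angles(budget)
for name, angle in angles.items():
    print(f"{name}: {angle} degrees")
```

The angles always add up to 360°, which is a quick check that no component has been left out.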
From the above table, clearly, we obtain the required pie chart as shown below.
Pictogram Chart
A Pictogram Chart is also known as a Pictograph Chart, Pictorial Chart, Pictorial Unit Chart or Picture Graph.
Pictogram Charts use icons to give a more engaging overall view of small sets of discrete
data.
Typically, the icons represent the data’s subject or category, for example, data on population
would use icons of people. Each icon can represent one unit or any number of units (e.g. each
icon represents 10).
Data sets are compared side-by-side in either columns or rows of icons, to compare each
category to one another.
The use of icons can sometimes help overcome differences in language, culture and
education. Icons can also give a more representational view of the data.
So for example, if your data is of 5 cars, you show 5 icons of cars in the chart.
Two things to avoid when using Pictogram Charts are:
1. Using them for large data sets, as this makes values on the chart hard to count.
2. Displaying partial icons, as this can add confusion about what they represent.
Data Sources: Include the source of information wherever it is necessary at the bottom
of the graph.
Keep it Simple: Construct a graph in an easy way that everyone can understand.
Neat: Choose the correct size, lettering, colours, etc. in such a way that the graph
serves as a visual aid for the presentation of information.
Solution :
We can represent the profits for 7 consecutive years by drawing a line diagram as given
below.
Let us consider years on horizontal axis and profits on vertical axis.
3. Frequency Curve
4. Cumulative Frequency Curves
Histogram
This graph uses bars to represent the frequency of numerical data that are organised into
intervals. Since all the intervals are equal and continuous, all the bars have the same width.
A two dimensional graphical representation of a continuous frequency distribution is called a
histogram.
In a histogram, the bars are placed continuously side by side with no gap between adjacent
bars.
That is, in a histogram, rectangles are erected on the class intervals of the distribution. The
areas of the rectangles are proportional to the frequencies.
Histogram - Example
Example 1 :
Draw a histogram for the following table, which represents the marks obtained by 100 students
in an examination:
Solution :
The class intervals are all equal with length of 10 marks.
Let us denote these class intervals along the X-axis.
Denote the number of students along the Y-axis, with appropriate scale.
The histogram is given below.
Frequency polygon
The steps for drawing a frequency polygon from a frequency distribution are as follows:
1. Obtain the frequency distribution and find the midpoint of each class interval.
2. Represent the midpoints along the X-axis and the frequencies along the Y-axis.
3. Plot the points corresponding to the frequency at each midpoint.
4. Join these points in order, using straight lines.
5. To complete the polygon, join the point at each end to the class mark immediately below
or above it on the X-axis, taken at zero frequency.
Class interval    10-20  20-30  30-40  40-50  50-60  60-70  70-80  80-90
Frequency         4      6      8      10     12     14     7      5
Solution :
Mark the class intervals along the X-axis and the frequencies along the Y-axis.
Let us assume a class interval 0-10 with frequency zero and a class interval 90-100 with
frequency zero.
Now calculate the midpoint of each class interval.
Class interval    Midpoint    Frequency
0-10              5           0
10-20             15          4
20-30             25          6
30-40             35          8
40-50             45          10
50-60             55          12
60-70             65          14
70-80             75          7
80-90             85          5
90-100            95          0
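The points of the frequency polygon in this example can be generated with a small Python sketch. The class limits and frequencies are those of the example; the variable names are illustrative, and the two extra classes with zero frequency are added at the ends exactly as assumed in the solution.

```python
# (lower limit, upper limit, frequency) for each class in the example.
classes = [(10, 20, 4), (20, 30, 6), (30, 40, 8), (40, 50, 10),
           (50, 60, 12), (60, 70, 14), (70, 80, 7), (80, 90, 5)]

width = classes[0][1] - classes[0][0]   # all classes have the same width (10)
first_lower = classes[0][0]
last_upper = classes[-1][1]

# Add one class with zero frequency at each end so the polygon closes on the X-axis.
padded = ([(first_lower - width, first_lower, 0)]
          + classes
          + [(last_upper, last_upper + width, 0)])

# Each vertex of the polygon is (class midpoint, frequency).
points = [((lower + upper) / 2, freq) for lower, upper, freq in padded]

for midpoint, freq in points:
    print(midpoint, freq)
```

Joining these points in order with straight lines gives the frequency polygon; any plotting tool can be used for the drawing itself.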
Frequency Curve
When you join the vertices of a frequency polygon using a smooth curve, the resulting figure is a
Frequency Curve. It is a limiting form of a histogram or frequency polygon. The frequency
curve for a distribution can be obtained by drawing a smooth freehand curve through the
midpoints of the upper sides of the rectangles forming the histogram. A frequency curve is a
smooth curve for which the total area is taken to be unity.
As the number of observations increases, we need to accommodate more classes, so the
width of each class reduces. In such a scenario, the variable tends to become continuous and the
frequency polygon starts taking the shape of a frequency curve.
Marks      Frequency (No. of Students)    Cumulative Frequency
0 - 5      2                              2
5 - 10     10                             12
10 - 15    5                              17
15 - 20    5                              22
A curve that represents the cumulative frequency distribution of grouped data on a graph is
called a Cumulative Frequency Curve or an Ogive. Representing cumulative frequency data on a
graph is an efficient way to understand the data and derive results.
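The cumulative frequencies in the marks table can be computed with a brief Python sketch. The class boundaries and frequencies are those of the table; pairing each cumulative frequency with the upper class boundary assumes a "less than" type of ogive, which is one common convention and is not stated in the text.

```python
from itertools import accumulate

# Class boundaries and frequencies from the marks table.
upper_boundaries = [5, 10, 15, 20]
frequencies = [2, 10, 5, 5]

# Running total of the frequencies gives the cumulative frequencies.
cumulative = list(accumulate(frequencies))

# Points of a "less than" ogive: (upper class boundary, cumulative frequency).
ogive_points = list(zip(upper_boundaries, cumulative))

print(cumulative)
print(ogive_points)
```

The last cumulative frequency equals the total number of students, which is a simple check on the table.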