Processing and Analysis of Data

The document outlines the processes of data processing and analysis, detailing steps such as editing, coding, classification, and tabulation. It emphasizes the importance of statistics in research for summarizing and analyzing data, introducing concepts like measures of central tendency, dispersion, and asymmetry. The document also describes various statistical measures and their applications in understanding data relationships and distributions.

Uploaded by

it2022088
Introduction
• After data collection, the next step is the processing and analysis of
the data.
• Technically speaking, processing implies editing, coding,
classification and tabulation of collected data so that they are
amenable to analysis.
• The term analysis refers to the computation of certain measures
along with searching for patterns of relationship that exist among
data-groups.
PROCESSING OPERATIONS
1. Editing:
• Editing of data is a process of examining the collected raw data
(especially in surveys) to detect errors and omissions and to correct
these when possible.
• As a matter of fact, editing involves a careful scrutiny of the completed
questionnaires and/or schedules.
• Editing is done to assure that the data are accurate, consistent with
other facts gathered, uniformly entered, as complete as possible and
well arranged to facilitate coding and tabulation.
• Editing may be of two types:
» Field Editing
» Central Editing
Field Editing
• Field editing consists of the investigator reviewing the reporting
forms in order to complete (translate or rewrite) what was written in
abbreviated and/or illegible form at the time of recording the
respondents’ responses.
• This type of editing is necessary because individual writing styles
can often be difficult for others to decipher.
• This sort of editing should be done as soon as possible after the
interview, preferably on the very day or on the next day.
Central Editing
• Central editing should take place when all forms or schedules have
been completed and returned to the office.
• This type of editing implies that all forms should get a thorough
editing by a single editor in a small study and by a team of editors in
case of a large inquiry.
• Editor(s) may correct the obvious errors such as an entry in the
wrong place, entry recorded in months when it should have been
recorded in weeks, and the like.
• In case of inappropriate or missing replies, the editor can sometimes
determine the proper answer by reviewing the other information in
the schedule.
• At times, the respondent can be contacted for clarification.
• The editor must strike out an answer if it is inappropriate and there
is no basis for determining the correct answer or response.
PROCESSING OPERATIONS contd..
2. Coding:
• Coding refers to the process of assigning numerals or other symbols
to answers so that responses can be put into a limited number of
categories or classes.
• Such classes should be appropriate to the research problem under
consideration.
• They must also possess the characteristic of exhaustiveness (i.e.,
there must be a class for every data item) and also that of mutual
exclusivity, which means that a specific answer can be placed in one
and only one cell in a given category.
• Coding is necessary for efficient analysis and through it the several
replies may be reduced to a small number of classes which contain
the critical information required for analysis.
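As a rough illustration of coding, the sketch below maps raw answers to numerals. The code book, the catch-all code 9 and the sample answers are all hypothetical; they only show how exhaustiveness (every answer gets a class) and mutual exclusivity (every answer gets exactly one class) can be enforced.

```python
# Hypothetical code book: every expected answer maps to exactly one numeral.
CODE_BOOK = {
    "yes": 1,
    "no": 2,
    "don't know": 3,
}
OTHER = 9  # assumed catch-all class, so that the scheme stays exhaustive


def code_response(answer: str) -> int:
    """Return the numeric code for one raw answer."""
    key = answer.strip().lower()
    # A dictionary lookup yields a single value, so an answer can fall
    # into one and only one class (mutual exclusivity).
    return CODE_BOOK.get(key, OTHER)


responses = ["Yes", "no", " YES ", "maybe", "Don't know"]
codes = [code_response(r) for r in responses]
print(codes)  # [1, 2, 1, 9, 3]
```

Normalising the text before the lookup keeps differently written versions of the same reply in the same class.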
PROCESSING OPERATIONS contd..
3. Classification:
• Most research studies result in a large volume of raw data which must
be reduced into homogeneous groups if we are to get meaningful
relationships.
• This fact necessitates classification of data which happens to be the
process of arranging data in groups or classes on the basis of common
characteristics.
• Data having a common characteristic are placed in one class, and in
this way the entire data get divided into a number of groups or classes.
Types of Classification
Classification according to attributes:
• Data are classified on the basis of common characteristics which can
either be descriptive (such as literacy, sex, honesty, etc.) or
numerical (such as weight, height, income, etc.).
• Classification can be simple classification or manifold classification.
• In simple classification we consider only one attribute and divide the
universe into two classes: one class consisting of items possessing
the given attribute and the other class consisting of items which do
not possess the given attribute.
• But in manifold classification we consider two or more attributes
simultaneously, and divide the data into a number of classes (the total
number of classes of final order is given by $2^n$, where n = number of
attributes considered).
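The count of final-order classes can be checked by enumeration. In this minimal sketch the three attribute names are hypothetical; each attribute is treated as either possessed or not possessed, as in the description above.

```python
from itertools import product

# Hypothetical attributes; each one is either possessed (True) or not (False).
attributes = ["literate", "employed", "urban"]

# Every final-order class is one combination of True/False over all attributes.
classes = list(product([True, False], repeat=len(attributes)))

print(len(classes))  # 8, i.e. 2 ** 3
for combo in classes:
    print(dict(zip(attributes, combo)))
```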
Types of Classification Contd...
Classification according to class-intervals:
• Unlike descriptive characteristics, numerical characteristics refer
to quantitative phenomena which can be measured through some
statistical units.
• Data relating to income, production, age, weight, etc. come under
this category. Such data are known as statistics of variables and are
classified on the basis of class intervals.
• For instance, persons whose incomes, say, are within Rs 201 to Rs
400 can form one group, those whose incomes are within Rs 401 to
Rs 600 can form another group and so on.
• In this way the entire data may be divided into a number of groups
or classes or what are usually called, ‘class-intervals.’
• Each group or class-interval, thus, has an upper limit as well as a
lower limit, which are known as class limits.
• The difference between the two class limits is known as class
magnitude.
Types of Classification Contd...
Classification according to class-intervals: (contd...)
• Class intervals are mainly of two types:
» Exclusive type class interval
» Inclusive type class interval

Exclusive Type class interval: They are usually stated as follows:


10–20
20–30
30–40
40–50
The above intervals should be read as under:
10 and under 20
20 and under 30
30 and under 40
40 and under 50
Thus, under the exclusive type class intervals, the items whose values are equal to the upper limit
of a class are grouped in the next higher class. For example, an item whose value is exactly 30
would be put in 30–40 class interval and not in 20–30 class interval.
Here the upper limit of a class interval is excluded and items with values less than the upper limit
(but not less than the lower limit) are put in the given class interval.
Types of Classification Contd...
Classification according to class-intervals: (contd...)
Inclusive Type class interval: They are usually stated as follows:
11–20
21–30
31–40
41–50
In inclusive type class intervals the upper limit of a class interval is also
included in the class interval concerned. Thus, an item whose value is 20 will
be put in the 11–20 class interval.
The stated upper limit of the class interval 11–20 is 20 but the real limit is
20.99999 and as such the 11–20 class interval really means 11 and under 21.
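The two conventions can be compared side by side. This is a minimal sketch: the function names, the default starting limits (10 and 11) and the class width of 10 are assumptions chosen to match the intervals listed above.

```python
def exclusive_class(value, lower=10, width=10):
    """Exclusive type: 10-20 means 10 and under 20, so the upper limit is excluded."""
    start = lower + ((value - lower) // width) * width
    return (start, start + width)


def inclusive_class(value, lower=11, width=10):
    """Inclusive type: 11-20 holds 11 up to and including 20 (11 and under 21)."""
    start = lower + ((value - lower) // width) * width
    return (start, start + width - 1)


print(exclusive_class(30))  # (30, 40): a value equal to an upper limit moves up a class
print(inclusive_class(20))  # (11, 20): the stated upper limit stays in its class
```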
PROCESSING OPERATIONS contd..
4. Tabulation:
• Tabulation is the process of summarising raw data and displaying
the same in compact form (i.e., in the form of statistical tables) for
further analysis.
• In a broader sense, tabulation is an orderly arrangement of data in
columns and rows.
• Tabulation is essential because of the following reasons.
1. It conserves space and reduces explanatory and descriptive statements
to a minimum.
2. It facilitates the process of comparison.
3. It facilitates the summation of items and the detection of errors and
omissions.
4. It provides a basis for various statistical computations.
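A simple frequency table shows the idea. The coded answers and labels below are hypothetical; the sketch only illustrates how raw items are summarised into rows and columns with totals that help in detecting omissions.

```python
from collections import Counter

# Hypothetical coded answers (1 = yes, 2 = no, 3 = don't know).
coded_answers = [1, 2, 1, 1, 3, 2, 1, 3, 1, 2]
labels = {1: "Yes", 2: "No", 3: "Don't know"}

counts = Counter(coded_answers)
total = sum(counts.values())

# Title above the table, captions for the columns, stubs for the rows.
print("Table 1: Distribution of answers (hypothetical data)")
print(f"{'Answer':<12}{'Frequency':>10}{'Per cent':>10}")
for code in sorted(labels):
    freq = counts[code]
    print(f"{labels[code]:<12}{freq:>10}{100 * freq / total:>10.1f}")
print(f"{'Total':<12}{total:>10}{100.0:>10.1f}")
```

The total row gives a quick check: if it differs from the number of returned schedules, some items were omitted or counted twice.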
PROCESSING OPERATIONS contd..
4. Tabulation: (Contd..)
Generally accepted principles of tabulation:
i. Every table should have a clear, concise and adequate title and this title should
always be placed just above the body of the table.
ii. Every table should be given a distinct number to facilitate easy reference.
iii. The column headings (captions) and the row headings (stubs) of the table should
be clear and brief.
iv. The units of measurement under each heading or sub-heading must always be
indicated.
v. Explanatory footnotes, if any, concerning the table should be placed directly
beneath the table, along with the reference symbols used in the table.
vi. Source or sources from where the data in the table have been obtained must be
indicated just below the table.
vii. Usually the columns are separated from one another by lines which make the
table more readable and attractive.
viii. The columns may be numbered to facilitate reference.
ix. Those columns whose data are to be compared should be kept side by side.
Similarly, percentages and/or averages must also be kept close to the data.
x. Miscellaneous and exceptional items, if any, should usually be placed in the last
row of the table.
Analysis of Data
• As stated earlier, by analysis we mean the computation of certain
indices or measures along with searching for patterns of relationship
that exist among the data groups.
• In most cases, statistics is used as the tool for data analysis.
• We calculate different statistical measures, such as the mean, median and
mode, to analyse our data.
STATISTICS IN RESEARCH
• The role of statistics in research is to function as a tool in designing
research, analysing its data and drawing conclusions therefrom.
• Most research studies result in a large volume of raw data which
must be suitably reduced so that the same can be read easily and can
be used for further analysis.
• Clearly the science of statistics cannot be ignored by any research
worker, even though he may not have occasion to use statistical
methods in all their details and ramifications.
• Classification and tabulation, as stated earlier, achieve this objective
to some extent, but we have to go a step further and develop certain
indices or measures to summarise the collected/classified data.
• Only after this can we adopt the process of generalisation from
small groups (i.e., samples) to the population.
STATISTICS IN RESEARCH Contd...
• There are two major areas of statistics viz., descriptive statistics and
inferential statistics.
• Descriptive statistics concern the development of certain indices
from the raw data, whereas inferential statistics are concerned with
the process of generalisation.
• Inferential statistics are also known as sampling statistics and are
mainly concerned with two major types of problems:
(i) the estimation of population parameters, and
(ii) the testing of statistical hypotheses
Some of the Important Statistical Measures
The important statistical measures that are used to summarise the
survey/research data are:
(1) measures of central tendency or statistical averages;
(2) measures of dispersion;
(3) measures of asymmetry (skewness);
(4) measures of relationship; and
(5) other measures.
Measures of Central Tendency
• Amongst the measures of central tendency, the three most important ones
are the arithmetic average or mean, median and mode. Geometric mean and
harmonic mean are also sometimes used.
• Measures of central tendency (or statistical averages) tell us the point about
which items have a tendency to cluster.
• Such a measure is considered as the most representative figure for the
entire mass of data.
• Measure of central tendency is also known as statistical average.
• Mean, median and mode are the most popular averages.
Measures of Central Tendency Contd..
Mean:
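• Mean, also known as the arithmetic average, is the most common measure
of central tendency.
• It may be defined as the value which we get by dividing the total of the
values of the various items in a series by the total number of items.
• We can work it out as under:
$\bar{X} = \frac{\sum X_i}{n} = \frac{X_1 + X_2 + \dots + X_n}{n}$
where $\bar{X}$ is the mean, $X_i$ is the value of the i-th item and n is
the total number of items.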
Measures of Central Tendency Contd..
Median:
Median is the value of the middle item of a series when it is arranged in
ascending or descending order of magnitude.
It divides the series into two halves; in one half all items are less than
the median, whereas in the other half all items have values higher than
the median.
If the values of the items arranged in ascending order are:
60, 74, 80, 95, 100, then the value of the 3rd item, viz., 80, is
the value of the median.
We can also write thus:
Median (M) = value of the $\left(\frac{n+1}{2}\right)$th item = value of the
$\left(\frac{5+1}{2}\right)$th item = value of the 3rd item = 80
Measures of Central Tendency Contd..
Mode:
• Mode is the most commonly or frequently occurring value in a series.
• The mode in a distribution is that item around which there is maximum
concentration.
• In general, mode is the size of the item which has the maximum
frequency.
• Mode is particularly useful in the study of popular sizes.
• For example, a manufacturer of shoes is usually interested in finding out
the size most in demand so that he may manufacture a larger quantity of
that size.
• In other words, he wants a modal size to be determined, because the
median or mean size would not serve his purpose.
• Here Mode can be useful.
Example: Mean, Median and Mode
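The worked example from the original slides was not recoverable, so the following is only a minimal sketch. It reuses the series 60, 74, 80, 95, 100 from the median discussion; the list of shoe sizes is hypothetical and stands in for the manufacturer's sales record.

```python
import statistics

series = [60, 74, 80, 95, 100]           # series used in the median discussion
shoe_sizes = [7, 8, 8, 9, 8, 10, 7, 8]   # hypothetical sales record

mean_value = statistics.mean(series)      # total of the values / number of items
median_value = statistics.median(series)  # value of the middle (3rd) item
modal_size = statistics.mode(shoe_sizes)  # most frequently occurring size

print(mean_value)    # 81.8
print(median_value)  # 80
print(modal_size)    # 8
```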
Measures of Central Tendency Contd..
Geometric mean:
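The geometric mean is also useful under certain conditions. It is defined as
the nth root of the product of the values of the n items in a given series.
Symbolically, we can put it thus:
$\text{G.M.} = \sqrt[n]{X_1 \cdot X_2 \cdot X_3 \cdots X_n}$
where $X_1, X_2, \dots, X_n$ are the values of the items and n is the number
of items.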
Measures of Central Tendency Contd..
Harmonic mean:
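The harmonic mean is defined as the reciprocal of the average of the
reciprocals of the values of the items of a series.
Symbolically, we can express it as under:
$\text{H.M.} = \frac{n}{\frac{1}{X_1} + \frac{1}{X_2} + \dots + \frac{1}{X_n}}$
where $X_1, X_2, \dots, X_n$ are the values of the items and n is the number
of items.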
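Both definitions can be applied directly to a small series. The values below are hypothetical, and the comparison with the standard-library functions (available from Python 3.8) is included only as a cross-check of the hand-written formulas.

```python
import math
import statistics

values = [4, 6, 9]  # hypothetical series
n = len(values)

# Geometric mean: nth root of the product of the n values.
gm = math.prod(values) ** (1 / n)

# Harmonic mean: reciprocal of the average of the reciprocals.
hm = n / sum(1 / x for x in values)

am = sum(values) / n  # arithmetic mean, for comparison

print(am, gm, hm)
# Cross-check against the standard library implementations.
print(statistics.geometric_mean(values), statistics.harmonic_mean(values))
```

For positive values that are not all equal, the arithmetic mean is the largest of the three and the harmonic mean the smallest, which the printed figures reflect.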
MEASURES OF DISPERSION
• An average can represent a series only as well as a single
figure can, but it certainly cannot reveal the entire story of any
phenomenon under study.
• In particular, it fails to give any idea about the scatter of the
values of items of a variable in the series around the true value
of the average.
• In order to measure this scatter, statistical devices called
measures of dispersion are calculated.
MEASURES OF DISPERSION contd...
• Dispersion refers to the variation of the items among
themselves / around an average.
• The greater the variation among the different items of a series, the
more will be the dispersion.
• As per Bowley, “Dispersion is a measure of variation of items.”
MEASURES OF DISPERSION contd...
Important measures of dispersion

• Important measures of dispersion are


– range,
– mean deviation, and
– standard deviation
Range
Range: Merits and Demerits
Mean Deviation
Numerical problems on finding Mean Deviation
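The problems from the original slides did not survive extraction, so this is only a minimal sketch on hypothetical data. It assumes the usual textbook definitions: range as the difference between the highest and lowest values, and mean deviation as the average of the absolute deviations from the arithmetic mean.

```python
data = [4, 6, 9, 12, 14]  # hypothetical series

# Range: highest value minus lowest value.
value_range = max(data) - min(data)

# Mean deviation about the mean: average absolute deviation from the mean.
mean = sum(data) / len(data)
mean_deviation = sum(abs(x - mean) for x in data) / len(data)

print(value_range)     # 10
print(mean_deviation)  # 3.2
```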
Standard Deviation
Standard Deviation contd...
Example 1. Find the standard deviation (SD) of the following data:
16, 20, 18, 19, 20, 20, 28, 17, 22, 20 [Ans: 3.13]
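The stated answer can be reproduced with a short calculation. The sketch assumes the population form of the standard deviation (dividing by n rather than n - 1), since that is the form that yields 3.13 for this data.

```python
import math

data = [16, 20, 18, 19, 20, 20, 28, 17, 22, 20]
n = len(data)

mean = sum(data) / n                               # 200 / 10 = 20
squared_devs = [(x - mean) ** 2 for x in data]     # squared deviations from the mean
variance = sum(squared_devs) / n                   # 98 / 10 = 9.8
sd = math.sqrt(variance)

print(round(sd, 2))  # 3.13
```

Dividing by n - 1 instead would give about 3.30, so the choice of divisor matters when comparing answers.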
Standard Deviation contd...
Solution of Example 2
Standard Deviation from Continuous Series
Measure of Dispersion: Variance
MEASURES OF ASYMMETRY (SKEWNESS)

When the distribution of items in a series happens to be perfectly
symmetrical, we then have the following type of curve for the
distribution:

[Figure: symmetrical, bell-shaped curve]

Such a curve is technically described as a normal curve and the related distribution as a
normal distribution. Such a curve is a perfectly bell-shaped curve, in which case the values of
the mean ($\bar{X}$), the median (M) and the mode (Z) are just the same and skewness is
altogether absent.
MEASURES OF ASYMMETRY (SKEWNESS) contd..

• But if the previous curve is distorted (whether on the right side or on
the left side), we have an asymmetrical distribution, which indicates that
there is skewness.
• If the curve is distorted on the right side, we have positive skewness,
but when the curve is distorted towards the left, we have negative
skewness, as shown hereunder:

[Figure: positively and negatively skewed curves]
MEASURES OF ASYMMETRY (SKEWNESS) contd..

• Skewness is, thus, a measure of asymmetry and shows the manner in
which the items are clustered around the average.
• In a symmetrical distribution, the items show a perfect balance on
either side of the mode, but in a skew distribution the balance is
thrown to one side.
• The amount by which the balance exceeds on one side measures the
skewness of the series.
• The differences between the mean, the median and the mode provide an
easy way of expressing skewness in a series.
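One common way to turn these differences into numbers is Pearson's pair of skewness coefficients. The slides do not state which formula they use, so the formulas and the data below are assumptions offered as a minimal sketch.

```python
import statistics

data = [2, 3, 3, 3, 4, 4, 5, 6, 8, 12]  # hypothetical series with a long right tail

mean = statistics.mean(data)
median = statistics.median(data)
mode = statistics.mode(data)
sd = statistics.pstdev(data)  # population standard deviation

# Absolute skewness: difference between mean and mode.
skewness = mean - mode

# Pearson's coefficients scale the difference by the standard deviation.
pearson_mode = (mean - mode) / sd
pearson_median = 3 * (mean - median) / sd

print(skewness, pearson_mode, pearson_median)
```

All three figures come out positive here because the mean is pulled above the median and the mode by the few large values, which corresponds to positive skewness.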
MEASURES OF ASYMMETRY (SKEWNESS) contd..