Data AnalysisM - Tech

Uploaded by

Simmi Khurana

Data Collection, Assessment of Qualitative Data, Data Processing: Key Issues
Presentation Layout

• Introduction to data
• Classification of data
• Collection of data
• Methods of data collection
• Assessment of qualitative data
• Processing of data
- Editing
- Coding
- Tabulation
- Graphical representation
What is data?

• Data are observations or evidence about the social world
• Data, the plural of datum, can be quantitative or qualitative in nature
• ‘Data is produced, not given’; that is, researchers choose what to call data, it is not just ‘there’ to be ‘found’ (Marsh 1988)

- The Sage Dictionary of Social Research Methods


Data & Information
• The terms 'data' and 'information' are often used interchangeably
• However, the terms have distinct meanings

Data
• Facts, events and transactions which have been recorded
• The input raw materials from which information is processed

Information
• Data that have been processed in such a way as to be useful to the recipient
• Basic data are processed in some way to form information
Nature of Data
• Research studies in behavioral science are mainly concerned with characteristics or traits
• Tools are therefore administered to quantify these characteristics, but not all traits or characteristics can be quantified
• Data can be classified into two broad categories:

Data
- Qualitative Data or Attributes
- Quantitative Data or Variables
Nature of Data
1. Qualitative Data or Attributes

Characteristics or traits to which a numerical value cannot be assigned are called attributes, e.g. gender, motivation, etc.

2. Quantitative Data or Variables

Characteristics or traits to which a numerical value can be assigned are called variables, e.g. height, weight, etc.
Constants
A constant is any characteristic or condition that is the same for all the observed units or sample subjects of a study

Variables
A characteristic or trait in behavioral science that can be quantified is termed a variable

Variables
- Continuous variables
- Discrete variables


Variables

1. Continuous variables
• A characteristic whose observations can take any value over a particular range
• It can assume either fractional or integral values
• E.g. weight of children in kg, height of patients

2. Discrete variables
• Variables that exist only in whole units (usually units of one), not in fractional values
• E.g. number of cataract patients in a village, WBC count
Attribute vs. Variable

Attribute
• A category of a characteristic to which a subject either belongs or does not belong, or a property that a subject either possesses or does not possess
• Examples of attributes: becoming sick, blood group, etc.

Variable
• A variable describes a characteristic in terms of a numerical value, which is expressed in units of measurement
• Examples of variables: height, weight, blood pressure, age of patients, etc.
Qualitative Data
• In such data there is no notion of magnitude or size of the characteristic
• They are just categorized
• The data are classified by counting the individuals having the same characteristic or attribute, not by measurement
• For example:
  Gender: male/female
  Disease: present/absent
  Smoke: smoking/not smoking
• These data can be measured on nominal and ordinal scales
Quantitative Data
• Anything that can be expressed as a number, quantity or magnitude
• Describes characteristics in terms of a numerical value, which is expressed in units of measurement
• E.g. level of hemoglobin in the blood, number of glaucoma patients, intraocular pressure, weight, etc.
• Quantitative observations: each individual is represented by a number
• These data can be measured on interval and ratio scales


Measurement Scale
• The choice of an appropriate statistical technique depends upon the type of data in question

Qualitative Data
• Nominal Scale
• Ordinal Scale

Quantitative Data
• Interval Scale
• Ratio Scale
Nominal Scale
• The least precise, or crudest, of the 4 basic scales of measurement
• Implies the classification of an item into 2 or more categories without any extent or magnitude
• There is no particular order assigned to the categories
• Numbers are used only to give a name to something; the resulting frequencies may be used for determining percentages and the mode

E.g. boys and girls; pass and fail; rural and urban
Ordinal Scale
• The ordinal scale is a more precise scale than the nominal scale
• The variables are categorized into levels with a meaningful natural order
• But there is no information about the interval between the levels

E.g. Pain: none, mild, moderate, severe
Interval Scale
• The interval scale is a more precise and refined scale than the nominal and ordinal scales
• This scale has all the characteristics and relationships of the ordinal scale, and in addition the distances between any two numbers on the scale are known
• The size of the interval between two observations can be measured

E.g. the temperature of a body


Ratio Scale
• It has the same properties as an interval scale, as well as a true or absolute zero value
• The ratio scale numerals have the qualities of real numbers, and can be added, subtracted, multiplied or divided

E.g. mean systolic BP
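
As a rough illustration of why the scale matters when choosing a statistic, the sketch below uses Python's standard `statistics` module. All sample values and variable names are invented for illustration; only the category labels (rural/urban, pain levels, body temperature, systolic BP) come from the slides.

```python
from statistics import mean, mode

# Nominal: categories without order, so counts, percentages and the mode make sense
residence = ["rural", "urban", "rural", "rural", "urban"]
nominal_mode = mode(residence)

# Ordinal: ordered categories, so a median level is meaningful once the order is fixed
pain_levels = ["none", "mild", "moderate", "severe"]   # natural order
pain = ["mild", "severe", "none", "moderate", "mild"]
ranks = sorted(pain_levels.index(p) for p in pain)
ordinal_median = pain_levels[ranks[len(ranks) // 2]]   # middle of an odd-sized sample

# Interval: differences are meaningful (body temperature in deg C), ratios are not
temps_c = [36.6, 37.2, 38.0]
temp_range = max(temps_c) - min(temps_c)

# Ratio: true zero, so means and ratios are meaningful (systolic BP in mmHg)
systolic = [118, 126, 140, 132]
mean_bp = mean(systolic)

print(nominal_mode, ordinal_median, round(temp_range, 1), mean_bp)
```

The point of the sketch is only that each scale permits the statistics of the scales below it plus some of its own; it is not a complete list of permissible techniques.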


Collection of Data
• The process of systematically gathering data for a particular purpose from various sources, so that the data are systematically observed, recorded and organized
• It is the first step of a statistical study
• There are several ways of collecting data
• The choice of procedure usually depends on the objectives and design of the study and the availability of time, money and personnel
Purpose of Data Collection

 To obtain information
 To keep on record
 To make decisions about important issues
 To pass information onto others
 For research study
How Important Is It?

Data collection is an extremely important part of any research because the conclusions of a study are based on what the data reveal
Factors to be considered before data collection
• Nature, scope & objective of the enquiry
• Sources of information
• Availability of funds
• Techniques of data collection
• Availability of trained persons


Sources of Data

Source of Data
- External
- Internal

Primary Data
Examples: surveys, documents, creative works, interviews, man-made materials

Secondary Data
Examples: unpublished theses and dissertations, manuscripts, books, journals
Internal & External Sources of Data

Internal sources of data
o Many institutions and departments have information about their regular functions, kept for their own internal purposes
o When that information is used in a survey, it is called an internal source of data
o E.g. social welfare society

External sources of data
o When information is collected from outside agencies, it is called an external source of data
o Such data are either primary or secondary
o This type of information can be collected by the census or sampling method by conducting a survey
Primary Data
• Data collected by the investigator from personal experimental studies for a specific research goal are called primary data
• The data are collected specially for a research project
• Used when secondary data are unavailable or inappropriate
• The data should be unique, original, reliable and accurate in nature
• Primary data have not been changed or altered by human beings, therefore their validity is greater than that of secondary data
Primary Data

Merits
• Targeted issues are addressed
• Data interpretation is better
• High accuracy of data
• Addresses specific research issues
• Greater control

Demerits
• Elevated cost
• Time consuming
• More resources are required
• Inaccurate feedback
• Requires a lot of skill and labor
Primary Data Collection Techniques
• Interview (direct/indirect)
• Schedule
• Questionnaire survey
• Focus group discussion (FGD)
• Community forums and public hearings
• Observation
• Case studies
• Key informant interviews
• Internet/E-mail/SMS
Direct personal observation
• The data are collected by the investigator personally; he/she must be a keen observer
• He/she asks or cross-examines the informant and collects the necessary information
• It is original in character
Suitability of direct personal observation

Direct personal observation is adopted in the following cases:
• Where greater accuracy is needed
• Where the field of enquiry is not large
• Where confidential data are to be collected
• Where sufficient time is available
Direct personal observation

Merits
• Original data
• True and reliable data
• Encouraging response because of the personal approach
• A high degree of accuracy

Demerits
• Unsuitable for a large area
• Expensive & time-consuming
• An untrained investigator brings poor results
• Information is collected according to the convenience of the informant
Indirect oral interview
• The investigator approaches witnesses or third parties who are in touch with the informant
• The enumerator interviews people who are directly or indirectly connected with the problem under study
• Generally this method is employed by enquiry committees and commissions
• The police department generally adopts this method to get clues about thefts, riots, murders, etc.
Suitability of indirect oral interview

 It is more suitable when the area to be studied is large

 It is used when direct information cannot be obtained

 This system is generally adopted by governments


Indirect oral interview
Merits
• Simple and convenient
• Saves time, money and labor
• Useful in the investigation of a large area
• Adequate information can be obtained
Demerits
• The information cannot be fully relied upon because there is no direct contact
• Interviewing an unsuitable person will spoil the results
• To get real data, a sufficient number of people have to be interviewed
• A careless attitude of the informant affects the degree of accuracy
Information through agencies
• Local agents or correspondents are appointed; they collect the information and transmit it to the office or person
• They work according to their own ways and tastes
• Adopted by newspapers, agencies, etc.
• The informants are generally called correspondents
• Suitable in cases where the information is to be obtained at regular intervals from a wide area
Information through agencies
Merits
• Extensive information can be obtained
• It is the cheapest and most economical method
• Speedy information is possible
• It is useful where information is needed regularly
Demerits
• The information may be biased
• The degree of accuracy cannot be maintained
• Uniformity cannot be maintained
• Data may not be original
Mailed questionnaires
• The questionnaire is sent to the respondents, with blank spaces for answers
• A covering letter is also sent along with the questionnaire, requesting the respondents to extend their full cooperation
• Adopted by research workers, private individuals, non-official agencies and government
• Appropriate in cases where informants are spread over a wide area
Mailed questionnaires

Merits
• Of all the methods, the mailed questionnaire is the most economical
• It can be widely used when the area of investigation is large
• It saves money, labor and time

Demerits
• One cannot be sure about the accuracy and reliability of the data
• There is often a long delay in receiving the completed questionnaires
Data Collection Through Schedules
• Very similar to the questionnaire method
• The main difference is that a schedule is filled in by an enumerator who is specially appointed for the purpose
• The enumerator goes to the respondents, asks them the questions from the proforma in the order listed, and records the responses in the space provided
• Enumerators must be trained in administering the schedule


Survey
• A detailed study of a geographical area to gather data, attitudes, impressions, opinions, satisfaction levels, etc., by polling a section of the population

Types
• Census survey: conducted regularly at large intervals of time
• Continuous survey: conducted regularly and frequently
• Ad-hoc survey: conducted at specific times for a specific need, ‘as and when’ required
Survey

Merits
• Covers a large population
• Less expensive
• Information is accurate

Demerits
• Avoided for small-scale studies
• Time consuming
• Information does not penetrate deeply
• The researcher must have good knowledge
Case Study
• It is a method of comprehensive study of a social unit, which may be a person, a family, an institution, an organization or a community

Merits
• Direct behavioral study
• Real & personal record of experience
• Makes possible the study of social change
• Increases analytical ability & skills

Demerits
• One case is almost always different from another case
• Personal bias
• Can be used only in a limited sphere
• More time & money consuming
Focus Group Discussion
• Useful to further explore a topic, providing a broader understanding of why the target group may behave or think in a particular way
• Assists in determining the reasons for attitudes and beliefs
• Conducted with a small sample of the target group
• Used to stimulate discussion and gain greater insights


Focus Group Discussion
Merits
• Useful when exploring cultural values and health beliefs
• Can be used to explore complex issues
• Can be used to develop hypotheses for further research
• Does not require participants to be literate
Demerits
• Lack of privacy/anonymity
• Risk of ‘group think’
• Potential for the group to be dominated by one or two people
• The group leader needs to be skilled at conducting focus groups, dealing with conflict and drawing out passive participants
• Time consuming to conduct and analyse
Triangulation
• Application and combination of several research methods in the study of the same phenomenon
Beating the Bias
• Researchers can hope to overcome the weaknesses, intrinsic biases and problems that come from single-method, single-observer and single-theory studies
• The purpose of triangulation in qualitative research is to increase the credibility and validity of the results
Types (Denzin 1978)
- Data Triangulation
- Investigator Triangulation
- Theory Triangulation
- Methodological Triangulation
Secondary Data
• Secondary data are data which have already been collected and analysed by some earlier agency for its own use, and are later used by a different agency

Sources of Secondary Data
- Published Sources
- Unpublished Sources
Published Sources
Various governmental, international and local agencies publish statistical data; chief among them are:
• International publications: e.g. UNO, WHO, Nature, etc.
• Official publications of Government: Department of Drug Administration, Central Bureau of Statistics
• Semi-official publications: semi-government institutions like Municipal Corporation, District Board, etc. publish reports
Published Sources
• Publications of research institutions: Nepal Development Research Institute, Nepalese Journal of Ophthalmology, etc. publish the findings of their research programs
• Journals and newspapers: current and important material on statistics and socio-economic problems can be obtained from journals and newspapers like Swasthya Khabar Patrika, Health Today Magazine, The Sight, etc.
Unpublished Sources
• Records maintained by various government and private offices
• Research carried out by individual research scholars in universities or research institutes

According to Prof. Bowley, “It is never safe to take published statistics at their face value without knowing their meaning and limitations and it is always necessary to criticize arguments that can be based on them.”
Precautions in the use of Secondary Data

Before using secondary data, the investigator should consider the following factors:
• Suitability of data
• Adequacy of data
• Reliability of data
Secondary data must possess the following characteristics

Reliability of data: may be tested by checking:
• Who collected the data?
• What were the sources of the data?
• Were the data collected properly?

Suitability of data
• Data that are suitable for one enquiry may not necessarily be suitable for another enquiry
• The objective, scope and nature of the original enquiry must be studied

Adequacy of data: data are considered inadequate if they relate to an area which is either narrower or wider than the area of the present enquiry
Primary data vs. Secondary data

Primary data
o Real-time data
o Sure about the sources of data
o Helps to give results/findings
o Costly and time-consuming process
o Avoids bias in response data
o More flexible

Secondary data
o Past data
o Not sure about the sources of data
o Helps in refining the problem
o Cheap and not a time-consuming process
o Cannot know whether the data are biased or not
o Less flexible
Assessment of Qualitative Data
• Characteristics or traits to which a numerical value cannot be assigned are called qualitative data (attributes), e.g. gender, color, honesty, etc.
• Methods of collecting qualitative data

Methods of Qualitative Data Collection
- Direct Observation
- In-depth Interview
- Case Study
- Triangulation
- Use of Secondary Data
Assessment of Qualitative Data
• Classification of qualitative data

Qualitative Data
- Geographical Classification
- Chronological Classification
- Qualitative Classification
Assessment of Qualitative Data

Tabulation of Qualitative Data


 Qualitative data values can be organized by a
frequency distribution
 A frequency distribution lists
– Each of the categories
– The frequency/counts for each category
Assessment of Qualitative Data

Frequency Table
• A simple data set is: cataract, cataract, keratoconus, glaucoma, glaucoma, cataract, glaucoma, cataract
• A frequency table for this qualitative data is

Eye condition    Frequency
Cataract         4
Keratoconus      1
Glaucoma         3

• The most commonly occurring eye condition is cataract


Assessment of Qualitative Data

What Is A Relative Frequency?
• The relative frequencies are the proportions (or percentages) of the observations out of the total
• A relative frequency distribution lists
– Each of the categories
– The relative frequency for each category
• Relative frequency = Frequency / Total


Assessment of Qualitative Data

Relative Frequency Table
• A relative frequency table for this qualitative data is

Eye condition    Relative Frequency
Cataract         .500 (=4/8)
Keratoconus      .125 (=1/8)
Glaucoma         .375 (=3/8)

• A relative frequency table can also be constructed with percentages (50%, 12.5% and 37.5% for the above table)
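
The frequency and relative frequency tables above can be reproduced with a few lines of Python. This is a minimal sketch using only the standard library; the eight observations are the ones listed on the frequency table slide, while the variable names and the printed layout are illustrative choices.

```python
from collections import Counter

conditions = ["cataract", "cataract", "keratoconus", "glaucoma",
              "glaucoma", "cataract", "glaucoma", "cataract"]

freq = Counter(conditions)                 # frequency of each category
total = sum(freq.values())                 # total number of observations
rel_freq = {c: n / total for c, n in freq.items()}   # Frequency / Total

print(f"{'Eye condition':<15}{'Frequency':>10}{'Relative':>10}{'Percent':>9}")
for cond, n in freq.most_common():         # most frequent category first
    print(f"{cond.capitalize():<15}{n:>10}{rel_freq[cond]:>10.3f}{rel_freq[cond]:>9.1%}")
```

The relative frequencies always sum to 1 (or 100%), which is a quick check that no observation was dropped while counting.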
Assessment of Qualitative Data
• Graphical representation of qualitative data
- Bar Diagram
- Pie or Sector Diagram
- Line Diagram
- Pictogram
- Map Diagram or Cartogram
Data Processing
• After collection, the data have to be prepared for analysis
• Collected data are raw and must undergo some processing before analysis
• The results of the analysis are strongly affected by the form of the data
• So, proper data processing is a must to get reliable results


Objectives of Data Processing
• Checking the questionnaires and schedules
• Reducing the mass of data to manageable proportions
• Summing up the material so as to prepare tables, charts, graphs and various groupings and breakdowns for presenting the results
• Minimizing the errors which may creep in at various stages of the survey
Types of Data Processing

1. Manual Data Processing
• Involves human intervention
• Implies many chances for errors, such as delays in data capture and a high number of operator errors
• Implies higher labor expenses, as well as spending on equipment, supplies, rent, etc.
Types of Data Processing

2. Mechanical Data Processing
• Different calculations and processing are performed using mechanical machines like calculators, etc.
• The use of mechanical machines makes data processing easier and less time-consuming
• The chances of errors also become far lower than in manual data processing
Types of Data Processing
3. Electronic Data Processing
• Processing of data by use of a computer and its programs


Types of Data Processing
4. Real Time Processing
• There is a continual input, process and output of data
• Data have to be processed in a small stipulated time period (real time)
• E.g. when a bank customer withdraws a sum of money from his or her account, it is vital that the transaction be processed and the account balance updated as soon as possible
Types of Data Processing
5. Batch Processing
• In batch processing, a group of transactions is collected over a period of time, entered and processed, and then the batch results are produced
• Batch processing requires separate programs for input, process and output
• It is an efficient way of processing a high volume of data
• E.g. payroll system, examination system and billing system


Important Steps in Data Processing

The processing of data involves activities such as:
1. Questionnaire checking
2. Editing
3. Coding
4. Classification
5. Tabulation
6. Graphical representation
7. Data cleaning
8. Data adjusting
Questionnaire Checking
• When the data are collected through questionnaires, the first step of data processing is to check whether the questionnaires are acceptable

A questionnaire is not accepted if it:
• Gives the impression that the respondent could not understand the questions
• Is incomplete, partially or fully
• Was answered by a person who has inadequate knowledge
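
The rejection criteria above could be applied programmatically once responses are entered. The sketch below is only an illustration: the field names (`age`, `gender`, `eye_condition`) and the two flags set by the field worker (`understood_questions`, `respondent_qualified`) are hypothetical and not taken from any real questionnaire.

```python
# Hypothetical questionnaire: the field names below are invented for illustration
REQUIRED_FIELDS = ["age", "gender", "eye_condition"]

def check_questionnaire(response):
    """Return (accepted, reasons) for one returned questionnaire (a dict)."""
    reasons = []
    # Incomplete, partially or fully
    missing = [f for f in REQUIRED_FIELDS if response.get(f) in (None, "")]
    if missing:
        reasons.append("incomplete: " + ", ".join(missing))
    # Flags assumed to be recorded by the field worker
    if response.get("understood_questions") is False:
        reasons.append("respondent did not understand the questions")
    if response.get("respondent_qualified") is False:
        reasons.append("answered by a person with inadequate knowledge")
    return (not reasons, reasons)

complete = {"age": 54, "gender": "female", "eye_condition": "cataract"}
partial = {"age": 61, "gender": "", "eye_condition": None}

print(check_questionnaire(complete))
print(check_questionnaire(partial))
```

Returning the reasons alongside the accept/reject decision makes it easier to decide later whether a questionnaire can be sent back to the field for correction.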
Data Editing
• The process of examining the data collected in questionnaires/schedules
  - to detect errors and omissions
  - to correct these when possible
  - to make sure the schedules are ready for tabulation
Data Editing
• The editor is responsible for seeing that the data are:
  - As accurate as possible
  - Consistent with other facts secured
  - Uniformly entered
  - As complete as possible
  - Acceptable for tabulation and arranged to facilitate coding and tabulation
Types of Editing

Editing for quality
• Data form complete
• Free of bias, errors, inconsistency and dishonesty

Editing for tabulation
• Modification to facilitate tabulation
• Ignoring extremely high/low values

Field editing
• Translating or rewriting

Central editing
• Wrong entries and their replacement
Necessity of Editing
• To gather information
• To make data relevant and appropriate for analysis
• To find errors and correct them
• To ensure that the information provided is accurate
• To establish the consistency of data
• To determine whether or not the data are complete
• To obtain the best possible data available


Coding of Data
• The process of assigning numerals or other symbols to answers so that responses can be put into a limited number of categories or classes
• Translating answers into numerical values, or assigning numbers to the various categories of a variable, to be used in data analysis
• Coding is done by using a code book, a code sheet and a computer card
• Coding is done on the basis of the instructions given in the codebook
• The codebook gives a numerical code for each variable
Codebook
• A codebook contains coding instructions and the
necessary information about variables in the data set

• A codebook generally contains the following information:


- column number
- record number
- variable number
- variable name
- question number
- instructions for coding
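
A codebook can be pictured as a lookup table from answers to numeric codes. The sketch below is a minimal illustration under stated assumptions: the variables, question numbers, code values and the missing-value code 9 are all invented, and a real codebook would also carry the column and record numbers listed above.

```python
# Hypothetical codebook: variable names, question numbers and codes are assumptions
CODEBOOK = {
    "gender": {"question": "Q1", "codes": {"male": 1, "female": 2}},
    "pain":   {"question": "Q2", "codes": {"none": 0, "mild": 1, "moderate": 2, "severe": 3}},
}
MISSING = 9  # code reserved here for blank or unrecognised answers

def code_response(response):
    """Translate one respondent's answers into numeric codes."""
    coded = {}
    for variable, entry in CODEBOOK.items():
        answer = str(response.get(variable, "")).strip().lower()
        coded[variable] = entry["codes"].get(answer, MISSING)
    return coded

coded = code_response({"gender": "Female", "pain": "moderate"})
print(coded)
```

Keeping the codes in one structure, rather than scattered through the analysis, means every coder applies the same instructions.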

Necessity of Coding
• To organize the coded data
• To form a structure for coding
• For interpretation of data
• For drawing conclusions from the coded data
• To translate answers into numerical values
• To assign numbers to the various categories for data analysis
• It is necessary for efficient analysis


Classification of Data
• The process of arranging the primary data in a definite pattern and presenting it in a systematic way
• The crude data obtained from an experiment or survey are classified according to their properties
• Classification can be done qualitatively or quantitatively


Objectives of classification
• Classified data are more easily understood
• It presents the facts in a simpler form
• It facilitates quick comparison
• It helps with further statistical treatment such as averages, dispersion, etc.
• It makes errors easier to detect


Types of classification

Qualitative classification
- Geographical classification
- Chronological classification
- Qualitative classification

Quantitative classification
- Discrete classification
- Continuous classification
Qualitative Classification

Geographical classification
• Data are classified by location of occurrence (i.e. area, region), e.g. cataract patients district-wise

Chronological classification
• Data are classified by time of occurrence of the observations or events
• The categories are arranged in chronological order, e.g. number of trachoma patients recorded from 2000 to 2010
Qualitative Classification
Qualitative classification (classification according to attributes)
• Data are classified according to some quality such as religion, literacy, sex, occupation, etc.

• Simple classification
  - Classification is made into 2 classes, such as classification as male or female

• Manifold classification
  - 2 or more attributes are studied simultaneously
  - E.g. classification according to sex, then marital status, and then literacy
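
The difference between simple and manifold classification can be sketched with counts over tuples of attributes. The records below are invented for illustration; only the three attributes (sex, marital status, literacy) come from the slide.

```python
from collections import Counter

# Invented records: (sex, marital status, literacy)
people = [
    ("male", "married", "literate"),
    ("female", "married", "literate"),
    ("female", "single", "illiterate"),
    ("male", "single", "literate"),
    ("female", "married", "literate"),
]

# Simple classification: one attribute, two classes
by_sex = Counter(sex for sex, _, _ in people)

# Manifold classification: several attributes considered together
by_all = Counter(people)

print(dict(by_sex))
for combo, n in by_all.items():
    print(" / ".join(combo), n)
```

Each additional attribute multiplies the number of possible classes, which is why manifold classifications are usually presented as tables.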
Tabulation
• The process of systematically organizing and recording a long series of data in rows and columns for further analysis and interpretation
• It is a concise, logical & orderly arrangement of data in columns & rows
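
As a small illustration of arranging data in rows and columns, the sketch below tabulates invented patient records by eye condition (rows) and gender (columns), with a row total. The data and the layout are assumptions made for the example.

```python
from collections import Counter

# Invented records: (gender, eye condition)
records = [
    ("male", "cataract"), ("female", "cataract"), ("female", "glaucoma"),
    ("male", "glaucoma"), ("female", "cataract"), ("male", "keratoconus"),
]

counts = Counter(records)
rows = sorted({cond for _, cond in records})    # row labels
cols = sorted({g for g, _ in records})          # column labels

def build_table():
    """Return the table as a list of text lines: header first, then one line per row."""
    lines = [f"{'Eye condition':<15}" + "".join(f"{c:>8}" for c in cols) + f"{'Total':>8}"]
    for r in rows:
        cells = [counts[(c, r)] for c in cols]
        lines.append(f"{r:<15}" + "".join(f"{n:>8}" for n in cells) + f"{sum(cells):>8}")
    return lines

table = build_table()
print("\n".join(table))
```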
Usefulness of Tabulation
• It presents an overall view of the findings in a simpler way
• It helps to identify trends
• It displays relationships between parts of the findings in a comparable way
• It conserves space and reduces explanatory and descriptive statements to a minimum
• It facilitates the process of comparison
• It provides a basis for various statistical computations


Graphical Representation
• Graphs help to understand the data easily
• "A single picture is worth a thousand words", as a common saying goes
• People who are not statistically minded can also easily understand the data and compare them
• The most common graphs are bar charts and pie charts in qualitative studies and the histogram in quantitative studies
Graphical Representation

Advantages
• It is easier to read
• Can show the relationship between 2 or more sets of observations at a glance
• Universally applicable
• Has high communication power
• Simplifies complex data
• Has a more lasting effect on the brain
Graphical Representation

Presentation of Qualitative data

1. Bar Diagram
• Consists of equally spaced vertical (or horizontal) rectangular bars of equal width placed on a common horizontal (or vertical) base line
• The categories are placed on the X-axis and their frequencies on the Y-axis
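
A bar diagram can be approximated in plain text to show the idea of one bar per category with length proportional to frequency. In the sketch below the student numbers are invented (the values of the slide's chart are not given in the text), and the horizontal layout and the scale of 20 students per `#` are arbitrary choices.

```python
# Student numbers are invented; the slide's chart values are not given in the text
students = {"BPH": 120, "MBBS": 360, "B.Optom": 40, "B.Pharma": 80}

def text_bar_chart(data, unit=20):
    """Draw one horizontal bar per category; each '#' stands for `unit` students."""
    lines = []
    for category, count in data.items():
        bar = "#" * round(count / unit)
        lines.append(f"{category:<10}|{bar} {count}")
    return lines

chart = text_bar_chart(students)
print("\n".join(chart))
```

A plotting library would normally be used for a real chart; the text version only makes the proportionality between bar length and frequency explicit.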
Graphical Representation

[Figure: bar diagram "Health Program at IOM"; X-axis: health program (BPH, MBBS, B.Optom, B.Pharma); Y-axis: no. of students (0 to 400)]

Types of bar diagram shown:
- Simple bar diagram
- Multiple bar diagram
- Component bar diagram


Graphical Representation

2. Pie Chart
• Circular diagram divided into segments, where each segment represents the frequency in a category
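
The angle of each pie segment is the category's relative frequency multiplied by 360 degrees. As a minimal sketch, the counts below are the eye-condition frequencies from the frequency table slide; the variable names are illustrative.

```python
freq = {"Cataract": 4, "Keratoconus": 1, "Glaucoma": 3}
total = sum(freq.values())

# Segment angle in degrees = 360 * frequency / total
angles = {category: 360 * n / total for category, n in freq.items()}

for category, angle in angles.items():
    print(f"{category:<12}{angle:6.1f} degrees")
```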
Graphical Representation

[Figure panels: Line diagram; Pictogram; Cartogram. One panel is titled "Production of health manpower yearly"]
Graphical Representation

Presentation of Quantitative Data

1. Histogram
• Graphical representation of the data as a set of contiguously drawn bars
• Most popular graph for a continuous variable
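
Behind a histogram is a count of how many observations fall into each contiguous class. The sketch below is an illustration only: the intraocular pressure readings are invented, and the class width of 5 mmHg and the half-open classes (lower limit included, upper limit excluded) are assumptions.

```python
# Invented intraocular pressure readings in mmHg
iop = [12.5, 14.0, 15.2, 16.8, 17.1, 18.4, 19.9, 21.3, 22.0, 24.6]

def histogram_counts(values, start, width, n_bins):
    """Count values in contiguous classes [start + i*width, start + (i+1)*width)."""
    counts = [0] * n_bins
    for v in values:
        i = int((v - start) // width)
        if 0 <= i < n_bins:          # values outside all classes are ignored
            counts[i] += 1
    return counts

counts = histogram_counts(iop, start=10, width=5, n_bins=3)
for i, n in enumerate(counts):
    low = 10 + 5 * i
    print(f"{low}-{low + 5} | {'#' * n}")
```

Because the classes share their boundaries, the bars of the resulting histogram touch each other, unlike the separated bars of a bar diagram.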
Graphical Representation

[Figure panels: Frequency Curve; Frequency Polygon; Scatter Diagram; Time Plot]


Graphical Representation

Stem-leaf Display

Box-and-whisker Plot
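
Both displays can be derived from the sorted data. The sketch below builds a stem-and-leaf display and the five-number summary that a box-and-whisker plot draws. The ages are invented, and the quartiles use the "inclusive" method of Python's `statistics.quantiles`; other textbooks and packages use slightly different quartile conventions.

```python
from statistics import quantiles

# Invented ages (years) of patients
ages = [23, 27, 31, 34, 35, 38, 42, 45, 47, 51, 58]

# Stem-and-leaf: stem = tens digit, leaf = units digit
stems = {}
for a in sorted(ages):
    stems.setdefault(a // 10, []).append(a % 10)

for stem, leaves in stems.items():
    print(f"{stem} | {' '.join(str(leaf) for leaf in leaves)}")

# Five-number summary underlying a box-and-whisker plot
q1, q2, q3 = quantiles(ages, n=4, method="inclusive")
five_number = (min(ages), q1, q2, q3, max(ages))
print(five_number)
```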
Data Cleaning
• Includes consistency checks and treatment of missing responses
• Although preliminary consistency checks have been made during editing, the checks at this stage are more thorough and extensive, because they are made by computer
• Computer packages like SPSS, SAS, EXCEL and MINITAB can be programmed to identify out-of-range values for each variable
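
The same kind of out-of-range and missing-value check can be sketched in a few lines of Python instead of a statistical package. The variables and their valid ranges below are assumptions chosen for illustration, not limits taken from the slides.

```python
# Hypothetical valid ranges per variable; these limits are assumptions
VALID_RANGES = {"age": (0, 120), "gender_code": (1, 2), "iop_mmhg": (5, 60)}

def find_problems(records):
    """Return (row index, variable, value) for each missing or out-of-range value."""
    problems = []
    for i, rec in enumerate(records):
        for var, (low, high) in VALID_RANGES.items():
            value = rec.get(var)
            if value is None or not (low <= value <= high):
                problems.append((i, var, value))
    return problems

records = [
    {"age": 54, "gender_code": 2, "iop_mmhg": 18},
    {"age": 154, "gender_code": 1, "iop_mmhg": None},   # typing error and a missing response
    {"age": 37, "gender_code": 3, "iop_mmhg": 21},      # undefined gender code
]
problems = find_problems(records)
print(problems)
```

The check only flags suspicious values; deciding whether to correct, recode or discard them remains a judgment for the researcher.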
Data Adjusting
• If any correction needs to be made for the statistical analysis, the data are adjusted accordingly
• Data adjusting is not always necessary, but it may sometimes improve the quality of the analysis

Data Analysis

Thank you
