UNIT V - Data Presentation and Analysis
Data types, sources, advantages and
disadvantages of secondary data, primary
sources and methods; questionnaire design,
components and principle of questionnaire
design, face to face interview, telephone
interview, computer assisted interviews,
observation-concept and methods, Data
Analysis: Descriptive and inferential statistics
Types of data
• Primary Data: Data collected by the researcher to
fulfill the objective of the research. Therefore,
primary data are the first hand data and they are
generated from the field work (through
administration of questionnaire, telephone
contact, observation, group discussion and
interview)
• Secondary data: Second-hand data used by the
researcher to meet their own purpose. Secondary
data can be obtained from published and
unpublished sources (for example, reports,
manuals and monographs).
• Qualitative data: Data collected on the basis of
certain attributes, qualities or characteristics
are known as qualitative data. Perceptions and
attitudes are examples.
Qualitative data measure the subjective
dimensions.
• Quantitative data: Data that are based on
numbers or numeric description are termed
quantitative data. Age, number of students,
speed and time are numeric data. Numeric data
measure the objective dimensions.
Sources of data
Sources of primary data: There are various sources of
primary data. However, interview, questionnaire and
observations are the main sources of primary data.
Interview: Collection of information by asking questions
orally to the respondents. It is one of the most widely practiced
techniques in primary data collection. Structured and
unstructured interviews can be conducted. Schedules
are used in face to face interviews.
Questionnaire: List of questions that are asked to the
respondents for collecting information on a specific issue.
Observation: It is a method of collecting primary data in
which the researcher observes the situation and analyzes
it personally.
Secondary data: Literature reviews and textbooks are
examples of secondary sources. Some secondary
sources also attempt to influence the reader to a
particular point of view.
Secondary data analysis provides many opportunities for
furthering research through replication, re-analysis
and re-interpretation of existing research.
It provides researchers with opportunities to engage in
work to test new ideas, theories, frameworks, and
models of research design.
Sources of secondary data
• Published Sources (written)
– Organizational records, books, journals, newsletters,
newspapers, research reports, committee reports,
websites and internet, private publications.
• Unpublished Sources
– CD ROMs, TV and radio recordings, pictures and drawings, films
and documentaries, recorded interviews
• Census
– Population, industrial, land holding, agricultural
• Surveys
– Demographic and health surveys, labor force survey, attitude
survey, occasional surveys, price index survey, organizational
surveys etc.
Advantages of secondary data
• The first advantage of using secondary data has always
been the saving of time
• It helps to save money.
• Generating new insights: Reanalyzing data can also lead
to unexpected new discoveries.
• Feasibility of both longitudinal and international
comparative studies: Continuous or regular surveys such
as government censuses or official registers are
especially good for such research purposes.
• Accessibility: Secondary data are easily accessible in
libraries and on websites.
• Facilitates cross-checking: The findings from primary data
can be checked against secondary data.
Disadvantages of secondary data
• Inappropriateness of the data: Data are collected to
meet the specific objective of the original researcher; therefore,
they may not meet the objective of the second-hand
researcher.
• Lack of control over data quality: One of the serious
issues with secondary data is the quality of information. It is
not certain that the information provided by the
institutions meets the required standard.
• Difficult to find rationality: It is not easy to find the
rationale behind the data.
• Limitations: The limitations of the data may be a serious
obstacle for the researcher in meeting the objective.
Questionnaire
• The main tool used in survey research.
• A questionnaire is a formal list of questions designed to
gather responses from respondents on a given topic.
• It translates the research objectives into specific
questions (measurement questions), and the answers to
those questions provide the data for testing the research
hypothesis.
• As a method of data collection, the questionnaire is a
very flexible tool, but it must be used carefully in order
to fulfill the requirement of a particular piece of
research.
• Preparing a questionnaire involves several steps, including writing
question items, organizing the question items on the
questionnaire, layout and design, printing, administering the
questionnaire and so on.
Types of questionnaires
• Classification information
– This section of the questionnaire comprises household and personal
information including age, gender, education, marital status,
family income, occupation, place of residence etc.
• Basic information
– The main part of the questionnaire: the information needed
to solve the problem, including all necessary subject matter
under study.
Questionnaire design
• It can be designed to secure different types of
primary data from the respondents: intentions,
attitudes and opinion, activities or behavior and
demographic characteristics.
• We should pay attention to what information we
would like to secure from the respondent.
• The keys to successful questionnaire design are
order, wording, layout, length, and appearance.
In designing or constructing a questionnaire, we
have to consider these aspects properly.
Dimensions of questionnaire Design
1. Information Desired
– Obtain only the information that is needed and
necessary.
2. Types and form of questionnaire
Choose the type of questions carefully:
open ended or closed ended.
3. Length
Simple and short questions are to be preferred.
4. Wording
Use simple, unambiguous words that have a clear
meaning.
5. Order
The order of the questions should be
appropriate; the following points are important to
consider:
• Go from general to particular.
• Go from easy to difficult.
• Go from factual to abstract.
• Go from non-controversial to controversial ones.
• Start with closed format questions.
• Start with questions relevant to the main subject.
6. Physical appearance
A good physical appearance and an eye-pleasing format are
required to hold the attention of respondents.
Tips for preparing questionnaire
A. Identification of data needs
The initial task is to identify the types of data needed to meet the
objective of the proposed study. Plan the analysis before
collecting the data.
B. Formulate the questions:
During the formulation of questions, the following points are
considered:
1. Pay attention to the language of the question.
2. Use common words that have the same meaning for
everyone.
3. Avoid long questions.
4. Avoid negative, or even worse, double negative questions.
For example, which question is easier to understand?
(1) Are all the menu options easy to navigate to?
(2) Are none of the menu options not easy to navigate to?
5. Provide all possible alternatives for the question.
6. Make a choice between open-ended and closed-ended questions.
7. Decide whether general or specific questions are to be formed.
8. Avoid ambiguous wording.
Example: Has mobile technology changed society?
– Does society mean my country, the whole modern civilization, or
are we talking about the people in my street?
– In what way did society change?
9. Avoid biased or leading questions.
Example: Do you also hate this ugly mobile design? People might find it difficult to say
No.
10. Use short and simple sentences in the question.
11. Do not assume that respondents are experts on themselves.
12. Be careful about inadequate alternatives.
Example: Are you married? Yes or No. These alternatives are not enough because
they leave out respondents who are separated, living apart or widowed.
C. Organizing the questionnaire
1. Start with easy questions that the respondents enjoy.
Example: do not start with age, occupation, or marital status.
2. Pose one question at a time.
– Do not: How would you rate the usability and the usefulness of this
application?
– Do: How would you rate the usability of this application?
– Do: How would you rate the usefulness of this application?
3. Ask precise questions.
4. Go from easy to difficult.
5. Avoid asking recall-dependent questions.
6. Ask personal questions at the end of the interview.
D. Pre-test the questionnaire and revise if necessary.
E. Type the questionnaire neatly so that the layout is eye
pleasing.
F. Prepare the letter of introduction.
Research Interview
Interviews are a systematic way of talking and listening to
people and are another way to collect data from
individuals through conversations.
So Interviewing is a way to collect data as well as to gain
knowledge from individuals.
There are many reasons to use interviews as a research
instrument for collecting data. For example:
• There is a need to obtain highly personalized data.
• Opportunities for probing are required.
• A good return rate is important.
• Respondents are not fluent in the native language of the
country, or they have difficulties with written
language.
Types of interview
A. Face to face Interview: In a face to face interview, an interviewer is
physically present to ask the survey questions and to assist the
respondent in answering them.
It is generally taken at the home, offices or in any appropriate or
convenient places.
Researcher can get other additional information (gestures and facial
expressions as well).
Advantages
Probing is possible; therefore, clear answers are obtained.
Additional information can also be gathered from body movement and
facial expressions.
In-depth and detailed information on the issue can be obtained.
Disadvantages:
Time consuming if there are more respondents
Expensive
Chance of getting biased information because respondents hesitate to
say something in front of outsiders.
B. Telephone Interview
The technique used to get information by telephone is known as a
telephone interview.
It can cover a wide geographical region within a short period of
time at minimum cost.
Greater care is needed while asking questions because the respondents
are only able to reply on the basis of oral information.
Advantages
Less time and less cost
Response rate may be higher
More reliable and more flexible
Disadvantages
It is not suitable for in-depth surveys
Limited number of respondents (due to availability of phone numbers)
Additional information is not available (appearance, gestures and body
language)
C. Computer Assisted Interview (CAI)
When computers (laptops, iPhones, tablets) are used to collect
information, the interview is known as a computer assisted interview. In such interviews,
computers are used to develop and administer the survey questionnaire.
It is sometimes also known as Computer-Assisted Survey Information
Collection (CASIC). Under this technique, the enumerators use laptops to
ask questions, and responses are entered directly into software or
other media.
There are various types of CAI, for example:
CAPI (Computer Assisted Personal Interview): Questions are asked face to
face, but respondents may not see the questions, and the responses are
typed by the interviewer.
CATI (Computer Assisted Telephone Interview): A telephone interview conducted using a
computer assisted telephone system.
CASI (Computer Assisted Self Interviewing): Respondents themselves use a
computer to enter their responses.
Advantages
There is no need to transcribe the results into a computer form.
Cost and time can be minimized in terms of interviewing and
administration.
The computer checks for inadmissible or inconsistent responses
Errors from separate data entry are eliminated.
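The checks for inadmissible or inconsistent responses mentioned above can be sketched in a few lines of Python. This is only an illustration: the field names (age, years_of_schooling, marital_status), the admissible ranges and the consistency rule are all assumptions, not part of any particular CAI package.

```python
# Hypothetical admissible codes for one survey item (an assumption).
MARITAL_STATUSES = {"married", "unmarried", "separated", "widowed"}

def check_response(response):
    """Return a list of problems found in one response (empty if none)."""
    problems = []
    age = response.get("age")
    schooling = response.get("years_of_schooling")
    # Admissibility: each answer must have the right type and range.
    if not isinstance(age, int) or not 0 <= age <= 120:
        problems.append("age is inadmissible")
    if not isinstance(schooling, int) or schooling < 0:
        problems.append("years_of_schooling is inadmissible")
    if response.get("marital_status") not in MARITAL_STATUSES:
        problems.append("marital_status is inadmissible")
    # Consistency: answers must agree with each other. This rule is only
    # checked when every single answer is admissible.
    if not problems and schooling > age:
        problems.append("years_of_schooling is inconsistent with age")
    return problems

print(check_response({"age": 15, "marital_status": "unmarried",
                      "years_of_schooling": 20}))
```

In a real interview, such a check would run as soon as the answer is entered, so the enumerator can correct it on the spot.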
Disadvantages
It is not possible in all places or in all areas of research.
Not all enumerators and respondents may be able to handle the
computer and software.
CAI is generally administered with the help of special software,
which may be costly and not easy to obtain.
D. Observation
Observation is a systematic process of recording behavioral patterns of people,
objects, and occurrences as they happen.
No questioning or communicating with people is needed.
Observational studies gather a wide variety of information about behavior.
Besides collecting data visually, observation can involve listening, reading, smelling
and touching.
It is one of the important techniques in social science research.
It is most suitable for understanding feelings and perceptions towards certain
events.
Methods of Observation
A. Participant and non-participant Observation
When the researcher participates in the activity being observed, it is participant observation. But if the
researcher does not participate in the activity, it is known as non-participant
observation.
B. Structured and Unstructured Observation
Structured observation is systematic and has a high level of
predetermined structure. Very little flexibility is given to the
respondents.
Unstructured observation has no systematic,
predetermined structure. The design and data collection process are
determined in the field.
C. Controlled and uncontrolled Observation
If the observation takes place in a controlled environment, it is
known as controlled observation. But if the observation takes place
in a natural environment, it is known as uncontrolled observation.
E. Focus group discussion (FGD)
A focus group is a small group of six to ten people (but the number
can vary according to situation) led through an open discussion by
a skilled moderator. The group needs to be large enough to
generate rich discussion but not so large that some participants
are left out.
Focus groups combine elements of both interviewing and participant
observation.
The technique inherently allows observation of group dynamics,
discussion, and firsthand insights into the respondents’ behaviors,
attitudes, language, etc.
Focus groups can be useful at both the formative and summative
stages of an evaluation. They provide answers to the same types
of questions as in-depth interviews, except that they take place in
a social context.
It should be done carefully and requires good skills (interviewer
should be skillful to manage the situations and time).
F. In-depth Interview (Key Informant Interview)
An in-depth interview study typically involves 15-35 people and focuses on a list of issues
regarding a topic of which the interviewees (leaders, professionals
and residents) have first-hand knowledge. With
their particular knowledge and understanding, these respondents can provide
insight into the nature of the problem and give recommendations for
the solution.
Primary goal is to obtain qualitative description of perceptions or
experiences, rather than measuring aspects of the experience.
In-depth interview can provide;
• Qualitative, descriptive information for decision-making.
• Understanding of motivation, behavior, and perspectives of
participants.
• Examples of successes and shortcomings of the activity or program.
• Recommendations or future directions.
• Information to support interpretation of quantitative data
collected through other methods.
Steps in conducting in-depth interview
I. Formulate the study questions
These relate to specific concerns of the study. Study questions
generally should be limited to five or fewer.
II. Prepare a short interview guide
Key informant interviews do not use rigid questionnaires, which
inhibit free discussion. However, interviewers must have an idea
of what questions to ask. The guide should list major topics and
issues to be covered under each study question.
III. Select key informant
The number should not normally exceed 35. Key informants should
be selected for their specialized knowledge and unique
perspectives on a topic.
IV. Conduct interview
V. Take adequate note of Interview
VI. Analyze Interview data and check the reliability and validity
Analysis
[Figure: possible values for the height of a girl: 3 ft, 4 ft, 5 ft, 6 ft]
Objective
to estimate the value of unknown parameters of the population
based on the sample information and test of hypotheses to draw
the conclusion in case of survey or experimental research.
Effective data analysis involves
– keeping your eye on the main game
– managing your data
– engaging in the actual process of quantitative and / or qualitative analysis
– presenting your data
– drawing meaningful and logical conclusions
Types
Descriptive Analysis
Inferential Analysis
Descriptive Analysis
• Descriptive analysis is the statistical procedure used to summarize,
organize, and simplify the data.
• Therefore, descriptive statistics is used to describe a set of data in
terms of its frequency of occurrence, its central tendency, and its
dispersion.
Marital status    Percent (n)
Unmarried         66.7 (20)
Married           33.3 (10)
[Diagram: marital status of respondents (Married, Unmarried)]
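A frequency distribution such as the marital-status summary above can be computed directly from raw responses. The sketch below assumes a hypothetical list of 30 responses (20 unmarried, 10 married, matching the counts above) and uses only Python's standard library.

```python
from collections import Counter

# Hypothetical raw responses: 20 unmarried and 10 married respondents.
responses = ["Unmarried"] * 20 + ["Married"] * 10

def frequency_table(values):
    """Return {category: (count, percent)}; percent is rounded to 1 decimal."""
    counts = Counter(values)
    total = sum(counts.values())
    return {cat: (n, round(100 * n / total, 1)) for cat, n in counts.items()}

table = frequency_table(responses)
for category, (count, percent) in table.items():
    print(f"{category}: {percent}% ({count})")
```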
Graphs - Bad
[Figure: bar chart of Blue Balls and Red Balls by month (January to April), vertical axis from 0 to 100, shown as an example of a bad graph]
Source: Mentegomery, 2004
Graphs - Good
[Figure: example of a good graph; recoverable labels: Months, Hip Roof, I, II]
Source: Kelley, 2007
Describing Time-Series Data
• Data can be classified according to the time it is
collected.
– Cross-sectional data are all collected at the same time.
– Time-series data are collected at successive points in time.
• Time-series data is often depicted on a line chart (a plot
of the variable over time).
Line Chart
[Figure: line chart of a time series for the years 87 to 99, vertical axis from 0 to 1,200,000]
Use of diagrams and charts
Pie charts are excellent for summarizing financial data
so that the relationship can easily be seen.
Pie charts are particularly useful for presenting
proportional data.
Bar charts effectively demonstrate differences in the
data.
Line graphs are most suitable for displaying sequential
data and useful for trend analysis.
The text of the paper that refers to a table or chart
should briefly summarize the most important findings
in the visual element, not simply repeat your data.
• Bar charts are very useful for presenting data in a
comprehensible way to a non-statistical audience.
• Histogram is useful to know more about the exact
spread and distribution of a data set. Are there many
outliers, or is the data distribution equally spread out?
• A box plot (also known as a box-and-whisker diagram)
is a very efficient way of describing numerical data. It is
based on a five-number summary of a data set
(minimum, first quartile, median, third quartile and
maximum).
• Scatter Plot is a simple graph in which the values of one
variable are plotted against those of the other. These
plots are often the first step in the correlation and
regression analyses.
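The five-number summary behind a box plot can be sketched as follows. The data are hypothetical, and the quartile rule is an assumption: textbooks and software use several slightly different definitions, and this sketch uses the "inclusive" method of Python's statistics.quantiles.

```python
import statistics

def five_number_summary(data):
    """Return (minimum, first quartile, median, third quartile, maximum)."""
    # quantiles() with n=4 returns the three cut points between quarters.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return min(data), q1, median, q3, max(data)

values = [1, 2, 3, 4, 5, 6, 7, 8, 9]  # hypothetical observations
summary = five_number_summary(values)
print(summary)
```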
Measure of central tendency
Process of getting a central value that more or less represents
the entire population.
There are various methods;
Mean (AM, GM and HM)
Median
Mode
Mean
• In general, mean is the sum of the scores divided by the
number of scores. Although, it is based on all
observations, not affected by sampling fluctuation, it is
not suitable for
Mode
Mode is the score or category that has the greatest
frequency (Maximum repetition of observation).
• Although the mode is suitable for finding the modal value,
it is not suitable for irregular frequency distributions.
Remember
• When scores are measured on a NOMINAL SCALE, it is
meaningless to calculate either mean or median, so
Mode is the only way to describe central tendency.
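The measures listed above can be computed with Python's standard statistics module. The data below are hypothetical; the last line illustrates the point about nominal data, where only the mode is meaningful.

```python
import statistics

scores = [4, 7, 7, 8, 10, 12]              # hypothetical numeric scores
colours = ["red", "blue", "red", "green"]  # hypothetical nominal data

am = statistics.mean(scores)            # arithmetic mean: sum / count
gm = statistics.geometric_mean(scores)  # geometric mean
hm = statistics.harmonic_mean(scores)   # harmonic mean
median = statistics.median(scores)      # middle value of the sorted scores
mode = statistics.mode(scores)          # most frequent score

# For nominal data, mean and median are meaningless; use the mode.
modal_colour = statistics.mode(colours)

print(am, median, mode, modal_colour)
```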
Chi-Square Tests
                               Value      df   Asymp. Sig. (2-sided)
Pearson Chi-Square             16.204a    2    .000
Likelihood Ratio               16.266     2    .000
Linear-by-Linear Association   .974       1    .324
N of Valid Cases               1000
a. 0 cells (.0%) have expected count less than 5. The
minimum expected count is 40.00.
Regression (Causal Analysis)
The statistical method for studying cause and effect
relationship between the variables is known as
REGRESSION.
Strictly speaking, regression line is the best fitting straight
line for a set of data.
Line of simple linear Regression
Y = b0 + b1 X
Where b0 is the intercept and b1 is the slope
Illustration
Equation 1 represents the relationship
between the height and weight of an individual. Y is
the dependent variable representing WEIGHT,
whereas X1 is the independent variable representing
HEIGHT.
Y = 0.02 + 1.25 X1      (1)
Y = 3.1 - 0.25 X2       (2)
Y = 2.5 + 1.2 X1 + 3.05 X2      (3)
(Is the interpretation of equation 3 similar to that of equations 1 and 2?)
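To make the intercept and slope concrete, the sketch below fits Y = b0 + b1 X to a small data set. It assumes that "best fitting" means ordinary least squares, which the slides do not state explicitly, and the data are hypothetical values chosen to lie exactly on Y = 2 + 3X.

```python
def fit_line(xs, ys):
    """Least-squares estimates (b0, b1) for the line Y = b0 + b1 * X."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: co-variation of X and Y divided by the variation of X.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b1 = sxy / sxx
    # Intercept: the fitted line passes through (mean_x, mean_y).
    b0 = mean_y - b1 * mean_x
    return b0, b1

xs = [1, 2, 3, 4]    # hypothetical independent variable
ys = [5, 8, 11, 14]  # hypothetical dependent variable, Y = 2 + 3X
b0, b1 = fit_line(xs, ys)
predicted = b0 + b1 * 5  # predicted Y when X = 5
print(b0, b1, predicted)
```

The slope b1 is read as the change in Y for a one-unit increase in X, and the intercept b0 as the value of Y when X is zero.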
Inferential Analysis
Inferential analysis consists of techniques that allow us to study
samples and then make generalizations about the population
from which the samples have been selected.
Inferential statistics infer from the sample to the population.
[Figure: two data sets (45, 51, 75, 78, 80, 85 and 40, 70, 65, 30, 68, 31) summarized with descriptive statistics]
Purpose of statistical analysis, by type of outcome data (dependent variable); entries are listed in column order:
• Against a specific null hypothesis about an expected mean or proportion: Chi Square test; Binomial test; Chi Square test; One-sample t-test; One-sample t-test
• Relationship with a continuous explanatory variable: Spearman’s Correlation; Pearson’s Correlation; Pearson’s Correlation
• Difference in expected mean or proportions between two groups: Cross tab (Chi Square test); Cross tab (Chi Square test); Cross tab (Chi Square test); Two-sample t-test (Mann–Whitney U test); Two-sample t-test (Mann–Whitney U test)
• Difference between means or proportions between more than two groups: Cross tab (Chi Square test); Cross tab (Chi Square test); Cross tab (Chi Square test); ANOVA (Kruskal–Wallis H test); ANOVA (Kruskal–Wallis H test)
Step II
Alternative Hypothesis: (two tail): Sample mean is
not equal to the population mean or there is
significant difference between population and
sample means
Symbolically, we write
$$H_1: \mu \neq \mu_0 \quad \text{(two-tail test)}$$
$$H_1: \mu < \mu_0 \quad \text{(left-tail test)}$$
$$H_1: \mu > \mu_0 \quad \text{(right-tail test)}$$
Step III
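Compute an appropriate test statistic
$$Z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}, \qquad SE(\bar{x}) = \frac{\sigma}{\sqrt{n}}$$
For the difference between two sample means with common variance $\sigma^2$, the standard error is
$$SE(\bar{x}_1 - \bar{x}_2) = \sqrt{\sigma^2 \left( \frac{1}{n_1} + \frac{1}{n_2} \right)}$$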
If the common variance ($\sigma^2$) is unknown, then we
use its estimate based on the sample variances as
follows.
$$\sigma^2 = \frac{n_1 s_1^2 + n_2 s_2^2}{n_1 + n_2}$$
The testing of hypothesis is similar to that of earlier
cases.
Null Hypothesis: Two population means are equal
$$H_0: \mu_1 = \mu_2$$
Alternative Hypothesis: Two population means are
not equal.
$$H_1: \mu_1 \neq \mu_2 \quad \text{or} \quad H_1: \mu_1 < \mu_2 \quad \text{or} \quad H_1: \mu_1 > \mu_2$$
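The large-sample test for two means can be sketched as below. It assumes the common variance is estimated as (n1 s1² + n2 s2²) / (n1 + n2) from the sample standard deviations, as described above, and it uses the standard normal distribution from Python's statistics module for the p-values. All input numbers are hypothetical.

```python
import math
from statistics import NormalDist

def two_sample_z(mean1, s1, n1, mean2, s2, n2):
    """Z statistic and p-values for H0: mu1 = mu2 (large samples)."""
    # Estimate of the common variance from the two sample variances.
    var = (n1 * s1 ** 2 + n2 * s2 ** 2) / (n1 + n2)
    z = (mean1 - mean2) / math.sqrt(var * (1 / n1 + 1 / n2))
    cdf = NormalDist().cdf(z)  # P(Z <= z) under H0
    p_values = {
        "two_tail": 2 * min(cdf, 1 - cdf),  # H1: mu1 != mu2
        "left_tail": cdf,                   # H1: mu1 < mu2
        "right_tail": 1 - cdf,              # H1: mu1 > mu2
    }
    return z, p_values

# Hypothetical summary statistics for two samples of 100 each.
z, p = two_sample_z(52, 10, 100, 50, 10, 100)
print(round(z, 3), {k: round(v, 4) for k, v in p.items()})
```

The null hypothesis is rejected at a chosen significance level when the p-value for the relevant alternative falls below that level.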
[Figure: t distribution (n = 21) compared with the standard normal distribution]
The exact shape of a t-distribution changes with the degrees of freedom. As the d.f. gets
very large, the distribution gets closer in shape to a normal distribution.
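$$S^2 = \frac{1}{n-1}\sum (x - \bar{x})^2 = \frac{1}{n-1}\left[\sum x^2 - \frac{(\sum x)^2}{n}\right]$$
Sometimes the biased estimate of the population variance can be used:
$$s^2 = \frac{1}{n}\sum (x - \bar{x})^2$$
$$n s^2 = \sum (x - \bar{x})^2 \qquad (1)$$
Again, the unbiased variance is computed as
$$S^2 = \frac{1}{n-1}\sum (x - \bar{x})^2$$
$$(n - 1) S^2 = \sum (x - \bar{x})^2 \qquad (2)$$
Comparing (1) and (2), we have
$$\frac{S^2}{n} = \frac{s^2}{n-1}$$
Now the t-statistic can also be written as
$$t = \frac{\bar{x} - \mu}{\sqrt{S^2 / n}} = \frac{\bar{x} - \mu}{\sqrt{s^2 / (n-1)}}$$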
Choice between S and s
Use (s) when the sample standard deviation is given:
$$t = \frac{\bar{x} - \mu}{\sqrt{s^2 / (n-1)}}$$
Use (S) when the actual data are given, then find the
unbiased sample variance:
$$t = \frac{\bar{x} - \mu}{\sqrt{S^2 / n}}$$
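A quick numerical check shows that the two forms of the t-statistic agree. The sample and the hypothesized mean below are hypothetical; s is the biased standard deviation (divisor n) and S² the unbiased variance (divisor n - 1).

```python
import math

def t_from_data(data, mu):
    """t using the unbiased variance S^2 computed from the raw data."""
    n = len(data)
    mean = sum(data) / n
    S2 = sum((x - mean) ** 2 for x in data) / (n - 1)
    return (mean - mu) / math.sqrt(S2 / n)

def t_from_biased_sd(mean, s, n, mu):
    """t using a given biased sample standard deviation s."""
    return (mean - mu) / math.sqrt(s ** 2 / (n - 1))

data = [12, 15, 11, 14, 13]  # hypothetical sample
mu0 = 12                     # hypothesized population mean
n = len(data)
mean = sum(data) / n
s = math.sqrt(sum((x - mean) ** 2 for x in data) / n)  # biased SD

t1 = t_from_data(data, mu0)
t2 = t_from_biased_sd(mean, s, n, mu0)
print(t1, t2)  # both forms give the same value, with n - 1 df
```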
Procedure of testing hypothesis
• The mechanism of testing a hypothesis with the t-
test is similar to that of the large sample test (Z).
• However, it is to be remembered that the
degrees of freedom are to be considered
when identifying the tabulated value.
• Finally, compare the computed t-value and the
tabulated t-value to decide
whether to accept the null hypothesis or the
alternative hypothesis.
T-test for difference between two means
When we want to test whether two independent
samples come from two normal populations having
the same mean and variance, the t-test for the
difference between two means is used.
Assumptions
1. The populations from which the samples are
drawn are normally distributed.
2. The two samples must be random and should be
drawn independently.
3. The variances of the two populations must be equal
and unknown.
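The test statistic for the difference between means
is computed as
$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{S^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} \sim t \text{ distribution with } (n_1 + n_2 - 2) \text{ df}$$
where
$$\bar{x}_1 = \frac{\sum x_1}{n_1} \quad \text{and} \quad \bar{x}_2 = \frac{\sum x_2}{n_2}$$
Similarly,
$$S^2 = \frac{1}{n_1 + n_2 - 2}\left[\sum x_1^2 - \frac{(\sum x_1)^2}{n_1} + \sum x_2^2 - \frac{(\sum x_2)^2}{n_2}\right]$$
When the (biased) sample standard deviations are given, then
$$S^2 = \frac{n_1 s_1^2 + n_2 s_2^2}{n_1 + n_2 - 2}$$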
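The pooled-variance computation can be sketched as follows for two small hypothetical samples. The degrees of freedom returned are n1 + n2 - 2, the same quantity that divides the pooled sum of squares.

```python
import math

def pooled_t(sample1, sample2):
    """Return (t, df) for the equal-variance two-sample t-test."""
    n1, n2 = len(sample1), len(sample2)
    mean1 = sum(sample1) / n1
    mean2 = sum(sample2) / n2
    # Sums of squared deviations via the computational formula.
    ss1 = sum(x * x for x in sample1) - sum(sample1) ** 2 / n1
    ss2 = sum(x * x for x in sample2) - sum(sample2) ** 2 / n2
    df = n1 + n2 - 2
    S2 = (ss1 + ss2) / df  # pooled (unbiased) variance
    t = (mean1 - mean2) / math.sqrt(S2 * (1 / n1 + 1 / n2))
    return t, df

t, df = pooled_t([2, 4, 6], [1, 2, 3])  # hypothetical samples
print(round(t, 4), df)
```

The computed t is then compared with the tabulated t-value for df degrees of freedom at the chosen level of significance.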
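The hypothesis testing procedure is similar to that of the
earlier cases. However, we set up the hypotheses as
follows.
Null hypothesis: The samples are from normal
populations with the same mean (the two population
means are equal). Symbolically, we write
$$H_0: \mu_1 = \mu_2$$
When the observations are paired, with $d$ denoting the difference within each pair, we compute
$$\bar{d} = \frac{\sum d}{n} = \text{mean of the differences, and}$$
$$S^2 = \frac{1}{n-1}\sum (d - \bar{d})^2 = \frac{1}{n-1}\left[\sum d^2 - \frac{(\sum d)^2}{n}\right]$$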
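The mean and variance of the differences can be computed as in the sketch below. The before/after values are hypothetical, and the final step, t = d̄ / sqrt(S²/n) with n - 1 degrees of freedom, is an assumption that applies the earlier one-sample t formula to the differences with a hypothesized mean difference of zero.

```python
import math

def paired_t(before, after):
    """Return (d_bar, S2, t) for paired observations."""
    d = [a - b for a, b in zip(after, before)]  # difference per pair
    n = len(d)
    d_bar = sum(d) / n
    # Unbiased variance of the differences (computational formula).
    S2 = (sum(x * x for x in d) - sum(d) ** 2 / n) / (n - 1)
    # Assumed test statistic: one-sample t on the differences, H0: mean = 0.
    t = d_bar / math.sqrt(S2 / n)
    return d_bar, S2, t

before = [10, 12, 9, 11]   # hypothetical scores before a treatment
after = [12, 15, 10, 13]   # hypothetical scores after the treatment
d_bar, S2, t = paired_t(before, after)
print(d_bar, round(S2, 4), round(t, 4))
```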
Chi Square test for test of independence
As mentioned previously, it tests the
association between two attributes
(categorical variables, such as gender and
education, or gender and income measured in
categories).
The process is similar to that of the goodness of fit test, but the
expected values are computed using the row and
column totals with respect to the grand total.
          Column 1   Column 2   Total
Row 1     a          b          a+b
Row 2     c          d          c+d
Total     a+c        b+d        a+b+c+d
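The expected values and the test statistic can be sketched as follows. Each expected count is (row total × column total) / grand total; the statistic is the usual sum of (observed - expected)² / expected, with (rows - 1)(columns - 1) degrees of freedom. The observed counts are hypothetical.

```python
def chi_square_independence(table):
    """Return (expected, chi2, df) for a table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    # Expected count for each cell under independence.
    expected = [[r * c / grand_total for c in col_totals]
                for r in row_totals]
    chi2 = sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(table, expected)
               for o, e in zip(obs_row, exp_row))
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return expected, chi2, df

observed = [[10, 20],   # hypothetical counts: a, b
            [30, 40]]   # hypothetical counts: c, d
expected, chi2, df = chi_square_independence(observed)
print(expected, round(chi2, 4), df)
```

The computed value is compared with the tabulated chi-square value for df degrees of freedom to decide whether the two attributes are independent.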