
UNIT V - Data Presentation and Analysis


Data Collection and Analysis
Data types, sources, advantages and disadvantages of secondary data; primary sources and methods; questionnaire design, components and principles of questionnaire design; face-to-face interviews, telephone interviews, computer-assisted interviews; observation (concept and methods); Data Analysis: descriptive and inferential statistics
Types of data
• Primary data: Data collected by the researcher to fulfill the objective of the research. Primary data are therefore first-hand data, generated from field work (through administration of questionnaires, telephone contact, observation, group discussion and interviews).
• Secondary data: Second-hand data used by a researcher for their own purpose. Secondary data can be obtained from published and unpublished sources (e.g. reports, manuals and monographs).
• Qualitative data: Data collected on the basis of certain attributes, qualities or characteristics are known as qualitative data. Perceptions and attitudes are examples. Qualitative data measure subjective dimensions.
• Quantitative data: Data based on numbers or numeric descriptions are termed quantitative data. Age, number of students, speed and time are examples of numeric data. Numeric data measure objective dimensions.
Sources of data
Sources of primary data: There are various sources of primary data; however, interviews, questionnaires and observation are the main ones.
Interview: Collection of information by asking respondents questions orally. It is one of the most widely practiced techniques of primary data collection. Both structured and unstructured interviews can be conducted. Interview schedules are used in face-to-face interviews.
Questionnaire: A list of questions asked of respondents to collect information on a specific issue.
Observation: A method of collecting primary data in which the researcher observes the situation and analyzes it personally.
Secondary data: Literature reviews and textbooks are examples of secondary sources. Some secondary sources also attempt to persuade the reader toward a particular point of view.
Secondary data analysis provides many opportunities for furthering research through replication, re-analysis and re-interpretation of existing research.
It provides researchers with opportunities to test new ideas, theories, frameworks and models of research design.
Sources of secondary data
• Published sources (written)
– Organizational records, books, journals, newsletters, newspapers, research reports, committee reports, websites and the internet, private publications
• Unpublished sources
– CD-ROMs, TV and radio recordings, pictures and drawings, films and documentaries, recorded interviews
• Census
– Population, industrial, landholding, agricultural
• Surveys
– Demographic and health surveys, labor force surveys, attitude surveys, occasional surveys, price index surveys, organizational surveys, etc.
Advantages of secondary data
• Saving time: this has always been the first advantage of using secondary data.
• Saving money.
• Generating new insights: reanalyzing data can lead to unexpected new discoveries.
• Feasibility of both longitudinal and international comparative studies: continuous or regular surveys such as government censuses or official registers are especially good for such research purposes.
• Accessibility: secondary data are easily accessible in libraries and on websites.
• Cross-checking: the findings from primary data can also be checked against secondary data.
Disadvantages of secondary data
• Inappropriateness of the data: every data set is collected to meet the specific objective of its original researcher; therefore, it may not meet the objective of a second-hand researcher.
• Lack of control over data quality: one of the serious issues with secondary data is the quality of information. There is no certainty that the information provided by institutions meets the required standard.
• Difficulty in understanding the rationale: it is not easy to understand the rationale behind the data.
• Limitations: the limitations of the data may be a serious obstacle to meeting the research objective.
• The data may not match the objective of the research.
Questionnaire
• The main tool used in survey research.
• A questionnaire is a formal list of questions designed to gather responses from respondents on a given topic.
• It translates the research objectives into specific questions (measurement questions), and the answers to those questions provide the data for testing research hypotheses.
• As a method of data collection, the questionnaire is a very flexible tool, but it must be used carefully to fulfill the requirements of a particular piece of research.
• Developing a questionnaire involves several steps, including writing the question items, organizing them on the questionnaire, layout, design and printing, administering the questionnaire, and so on.
Types of questionnaires
• Self-administered
– Online
– Mail (postal)
– Delivery and collection
• Interviewer-administered
– Telephone questionnaire
– Interview schedule

Components of a questionnaire
• Explanation information
– Introduce yourself and your institution, state the objective of the study, give general instructions, assure respondents that anonymity will be kept, tell them that participation is voluntary, provide a contact number and return address, and add words of thanks.

• Classification information
– This section of the questionnaire comprises household and personal information, including age, gender, education, marital status, family income, occupation, place of residence, etc.

• Basic information
– The main part of the questionnaire: the information needed to solve the problem, including all necessary subject matter under study.
Questionnaire design
• A questionnaire can be designed to secure different types of primary data from respondents: intentions, attitudes and opinions, activities or behavior, and demographic characteristics.
• We should pay attention to what information we would like to secure from the respondents.
• The keys to successful questionnaire design are order, wording, layout, length and appearance. In designing or constructing a questionnaire, we have to consider these aspects properly.
Dimensions of questionnaire design
1. Information desired
– Obtain only the information that is needed and necessary.
2. Type and form of questions
– Choose the type of questions carefully: open-ended or closed-ended.
3. Length
– Simple and short questions are preferred.
4. Wording
– Use simple, unambiguous words with a clear meaning.
5. Order
– The order of the questions should be appropriate; the following points are important to consider:
• Go from general to particular.
• Go from easy to difficult.
• Go from factual to abstract.
• Go from non-controversial to controversial.
• Start with closed-format questions.
• Start with questions relevant to the main subject.
6. Physical appearance
– A good physical appearance and an eye-pleasing format are required to attract the attention of respondents.
Tips for preparing a questionnaire
A. Identification of data needs
The initial task is to identify the types of data needed to meet the objective of the proposed study. Remember that planning the analysis always comes before collecting the data.
B. Formulate the questions
When formulating questions, consider the following points:
1. Pay attention to the language of the question.
2. Use common words that have the same meaning for everyone.
3. Avoid long questions.
4. Avoid negative, or even worse, double-negative questions.
For example, which question is easier to understand: (1) "Are all the menu options easy to navigate to?" or (2) "Are none of the menu options not easy to navigate to?"
5. Provide all possible alternatives for each question.
6. Choose between open-ended and closed-ended questions.
7. Decide whether general or specific questions will be asked.
8. Avoid ambiguous wording.
Example: "Has mobile technology changed society?"
– Does society mean my country, the whole of modern civilization, or the people in my street?
– In what way did society change?
9. Avoid biased or leading questions.
Example: "Do you also hate this ugly mobile design?" People might find it difficult to say no.
10. Use short and simple sentences in the question.
11. Do not assume the respondents are experts on themselves.
12. Be careful about inadequate answer alternatives.
Example: "Are you married? Yes or No." These alternatives are not enough because they provide no option for separated, living apart or widowed respondents.
C. Organizing the questionnaire
1. Start with easy questions that the respondents enjoy.
Example: do not start with age, occupation or marital status.
2. Pose one question at a time.
– Do not: How would you rate the usability and the usefulness of this application?
– Do: How would you rate the usability of this application?
– Do: How would you rate the usefulness of this application?
3. Ask precise questions.
4. Go from easy to difficult.
5. Avoid asking recall-dependent questions.
6. Ask personal questions at the end of the interview.
D. Pre-test the questionnaire and revise it if necessary.
E. Type the questionnaire neatly so that the layout is eye-pleasing.
F. Prepare the letter of introduction.
Research Interview
Interviews are a systematic way of talking and listening to people, and another way to collect data from individuals through conversation.
Interviewing is thus a way to collect data as well as to gain knowledge from individuals.
There are many reasons to use interviews as a research instrument for collecting data, for example when:
• Highly personalized data are needed.
• Opportunities for probing are required.
• A good return rate is important.
• Respondents are not fluent in the native language of the country, or have difficulties with written language.
Types of interview
A. Face-to-face interview: In a face-to-face interview, an interviewer is physically present to ask the survey questions and to assist the respondent in answering them.
It is generally conducted at homes, offices or any other appropriate and convenient place.
The researcher can also gather additional information (such as gestures and facial expressions).
Advantages
Probing is possible; therefore, clear answers are obtained.
Additional information can be gathered from body movements and facial expressions.
In-depth, detailed information on the issue is possible.
Disadvantages
Time-consuming when there are many respondents.
Expensive.
Risk of biased information, because respondents may hesitate to say something in front of outsiders.
B. Telephone interview
The technique of obtaining information by telephone is known as telephone interviewing.
It can cover a wide geographical region within a short period of time at minimum cost.
Greater care is needed when asking questions, because respondents can only reply on the basis of oral information.
Advantages
Less time and less cost
Response rate may be higher
More reliable and more flexible
Disadvantages
Not suitable for in-depth surveys
Limited number of respondents (depends on the availability of phone numbers)
No additional information is available (appearance, gestures and body language)
C. Computer-Assisted Interview (CAI)
When computers (laptops, iPhones, tablets) are used to collect information, the method is known as computer-assisted interviewing. In such interviews, computers are used to develop and administer the survey questionnaire. It is sometimes also known as Computer-Assisted Survey Information Collection (CASIC). Under this technique, enumerators use laptops to ask questions, and responses are entered directly into software or other media.
There are various types of CAI, for example:
CAPI (Computer-Assisted Personal Interviewing): Questions are asked face to face, but respondents may not see the questions, and the responses are typed in.
CATI (Computer-Assisted Telephone Interviewing): The interview is conducted by telephone, with the interviewer reading the questions from, and entering the responses into, a computer system.
CASI (Computer-Assisted Self-Interviewing): Respondents themselves use the computer to enter their responses.
Advantages
There is no need to transcribe the results into computer form.
The cost and time of interviewing and administration can be minimized.
The computer checks for inadmissible or inconsistent responses.
Errors from separate data entry are eliminated.
Disadvantages
It is not feasible in all places or in all areas of research.
Not all enumerators and respondents may be able to handle the computer and software.
CAI is generally administered with special software, which may be costly and not easy to obtain.
D. Observation
Observation is a systematic process of recording behavioral patterns of people, objects and occurrences as they happen.
No questioning of or communication with people is needed.
Observational studies gather a wide variety of information about behavior.
Besides collecting data visually, observation may involve listening, reading, smelling and touching.
It is one of the important techniques in social science research.
It is well suited to understanding feelings and perceptions toward certain events.
Methods of Observation
A. Participant and non-participant observation
When the researcher participates in the activity being observed, it is participant observation. If the researcher does not participate in the activity, it is known as non-participant observation.
B. Structured and unstructured observation
Structured observation is systematic and has a high level of predetermined structure. Very little latitude is given to the respondents.
Unstructured observation has no systematic, predetermined structure; the design and data collection process are determined in the field.
C. Controlled and uncontrolled observation
If the observation takes place in a controlled environment, it is known as controlled observation. If it takes place in a natural environment, it is known as uncontrolled observation.
E. Focus group discussion (FGD)
A focus group is a small group of six to ten people (but the number
can vary according to situation) led through an open discussion by
a skilled moderator. The group needs to be large enough to
generate rich discussion but not so large that some participants
are left out.
Focus groups combine elements of both interviewing and participant
observation.
The technique inherently allows observation of group dynamics,
discussion, and firsthand insights into the respondents’ behaviors,
attitudes, language, etc.
Focus groups can be useful at both the formative and summative
stages of an evaluation. They provide answers to the same types
of questions as in-depth interviews, except that they take place in
a social context.
It should be conducted carefully and requires good skills (the moderator must be able to manage the situation and the time).
F. In-depth Interview (Key Informant Interview)
In-depth interviews are conducted with 15-35 people, focusing on a list of issues about a topic of which the interviewees (leaders, professionals and residents) have first-hand knowledge. With their particular knowledge and understanding, these respondents can provide insight into the nature of the problem and give recommendations for its solution.
The primary goal is to obtain a qualitative description of perceptions or experiences, rather than to measure aspects of the experience.
In-depth interview can provide;
• Qualitative, descriptive information for decision-making.
• Understanding of motivation, behavior, and perspectives of
participants.
• Examples of successes and shortcomings of the activity or program.
• Recommendations or future directions.
• Information to support interpretation of quantitative data
collected through other methods.
Steps in conducting in-depth interview
I. Formulate the study questions
These relate to specific concerns of the study. Study questions
generally should be limited to five or fewer.
II. Prepare a short interview guide
Key informant interviews do not use rigid questionnaires, which
inhibit free discussion. However, interviewers must have an idea
of what questions to ask. The guide should list major topics and
issues to be covered under each study question.
III. Select key informant
The number should not normally exceed 35. Key informants should
be selected for their specialized knowledge and unique
perspectives on a topic.
IV. Conduct interview
V. Take adequate notes of the interview
VI. Analyze the interview data and check their reliability and validity
Analysis

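Fallacy of the average: A river is 4 ft, 6 ft and 3 ft deep at three points, so its average depth is (4 + 6 + 3)/3 ≈ 4.3 ft. The girl's height is 5 ft.
What happens if she tries to cross the river? Although 5 ft > 4.3 ft, she cannot cross safely, because the river is 6 ft deep at one point. An average alone can hide critical extreme values.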
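The river example can be checked numerically. A minimal Python sketch, assuming the three depth readings (4, 6 and 3 ft) recovered from the slide figure; `can_wade` is a hypothetical helper that compares the girl's height with the deepest point rather than with the average.

```python
from statistics import mean

# Depth readings (ft) at three points across the river, from the slide figure.
depths_ft = [4, 6, 3]
girl_height_ft = 5


def can_wade(height, depths):
    """Hypothetical helper: crossing is safe only if she is taller than the deepest point."""
    return height > max(depths)


average_depth = mean(depths_ft)
print(f"Average depth: {average_depth:.1f} ft")                   # Average depth: 4.3 ft
print("Taller than the average?", girl_height_ft > average_depth)  # True
print("Safe to cross?", can_wade(girl_height_ft, depths_ft))       # False
```

The point of the sketch is the last line: a decision based on the mean (4.3 ft) and one based on the maximum (6 ft) point in opposite directions.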


Measurement Scales
Nominal: Classification based on names or certain characteristics.
E.g. Sex (male or female)
Religion (Hindu, Christian, Muslim, Jain and others)
Occupation (doctor, engineer, mechanic) WHY?
• The only allowable calculation on nominal data is to count the
frequency of each value of a variable.
• When the raw data can be naturally categorized in a meaningful
manner, we can display frequencies by
– Bar charts – emphasize frequency of occurrences of the different
categories.
– Pie chart – emphasize the proportion of occurrences of each category.
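Because counting is the only valid operation on nominal data, a frequency table is the summary behind both the bar chart and the pie chart. A minimal Python sketch using the standard-library `collections.Counter`; the `responses` list is hypothetical illustration data, not taken from the source.

```python
from collections import Counter

# Hypothetical nominal responses (religion of ten respondents).
responses = ["Hindu", "Christian", "Hindu", "Muslim", "Hindu",
             "Jain", "Christian", "Hindu", "Muslim", "Hindu"]

counts = Counter(responses)      # frequency of each category
total = sum(counts.values())

# Frequencies suit a bar chart; proportions suit a pie chart.
for category, freq in counts.most_common():
    print(f"{category:<10} {freq:>3} {freq / total:6.1%}")
```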
Ordinal Scale
Classification based on order or rank.
E.g. Professor, Associate Professor, Assistant Professor
Socioeconomic status (low, medium and high) WHY?
Interval Scale
Measures the values of quantitative variables in the absence of an absolute zero.
E.g. Temperature, IQ test scores, SAT scores
WHY?
Ratio Scale
Measures quantitative variables in the presence of an absolute zero (zero has a meaningful interpretation).
E.g. Weight, profit, number of patients who died in hospitals
WHY?
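One way to see the difference between interval and ratio scales is to check whether a ratio survives a change of units. A minimal Python sketch; the temperatures and weights are hypothetical values chosen for illustration, and the conversions are the standard Celsius-to-Fahrenheit and kilogram-to-pound formulas.

```python
def c_to_f(celsius):
    """Celsius to Fahrenheit: the zero point shifts, as on an interval scale."""
    return celsius * 9 / 5 + 32


def kg_to_lb(kg):
    """Kilograms to pounds: zero stays zero, as on a ratio scale."""
    return kg * 2.20462


# Interval scale: "20 °C is twice as hot as 10 °C" is not meaningful.
t1, t2 = 10, 20
print(t2 / t1, c_to_f(t2) / c_to_f(t1))      # prints 2.0 1.36

# Ratio scale: "40 kg is twice 20 kg" holds in any unit.
w1, w2 = 20, 40
print(w2 / w1, kg_to_lb(w2) / kg_to_lb(w1))  # prints 2.0 2.0
```

The temperature ratio changes with the unit because the zero point is arbitrary; the weight ratio does not, because zero weight means no weight in every unit.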
Managing Data
Regardless of data type, managing your data
involves
– familiarizing yourself with appropriate software
– developing a data management system
– systematically organizing and screening your data
– entering the data into a program
– and finally ‘cleaning’ your data
Organization and Preparation of Data
Before analysis, various activities are essential to make the data more readable and informative. Some of the important ones are:
Editing: Detecting and correcting errors and preparing the data for coding.
Coding: Assigning numeric values or symbols to the categories of variables.
Classification: Separating items according to similar characteristics and grouping them.
Tabulation: Presenting the data in systematic order.
Summarization: After completing these stages, the researcher presents the data in a precise form for description, analysis and interpretation.
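The preparation stages can be sketched on a few raw records. A minimal Python sketch, assuming hypothetical survey records and a hypothetical coding scheme (1 = Unmarried, 2 = Married); editing here only trims and normalizes text and drops blank entries, which is a simplification of real editing.

```python
from collections import Counter

# Hypothetical raw questionnaire records (marital status as typed by enumerators).
raw = [" married", "Unmarried", "UNMARRIED ", "", "Married", "unmarried"]

# Editing: normalize spacing and case, and drop blank (missing) entries.
edited = [r.strip().capitalize() for r in raw if r.strip()]

# Coding: assign a numeric code to each category (hypothetical scheme).
codes = {"Unmarried": 1, "Married": 2}
coded = [codes[r] for r in edited]

# Classification and tabulation: group identical codes and count them.
table = Counter(coded)
labels = {v: k for k, v in codes.items()}
n = len(coded)
for code in sorted(table):
    print(f"{labels[code]:<10} {table[code]:>3} {100 * table[code] / n:5.1f}%")
```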
Analysis
Data analysis is primarily concerned with computing certain indices and searching for patterns of relationship that exist among the variables (data groups).
In other words, data analysis is the process of collecting, arranging, classifying and analyzing information with the purpose of generating useful information.

Objective
To estimate the values of unknown population parameters from sample information, and to test hypotheses in order to draw conclusions in survey or experimental research.
Effective data analysis involves
– keeping your eye on the main game
– managing your data
– engaging in the actual process of quantitative and / or qualitative analysis
– presenting your data
– drawing meaningful and logical conclusions
Types
Descriptive Analysis
Inferential Analysis
Descriptive Analysis
• Descriptive analysis is the statistical procedure used to summarize,
organize, and simplify the data.
• Therefore, descriptive statistics is used to describe a set of data in
terms of its frequency of occurrence, its central tendency, and its
dispersion.

Tools of Descriptive Analysis
Frequency distribution (percentage, proportion and ranking)
Diagrams and charts
Measures of central tendency
Measures of variability
Distribution type (skewness and kurtosis)
Correlation analysis
Causal analysis
Time series and index numbers
Classification (table)
It is one of the important stages, in which we classify the given information according to similar characteristics and group it into various classes.
Classification may be of various types: chronological, geographical, qualitative and quantitative.

Table 1: Percent of respondents by marital status
Marital status | Percent (n)
Unmarried | 66.7 (20)
Married | 33.3 (10)
Total | 100.0 (30)
Diagrams
[Fig 1: Pie diagram showing the marital status of the respondents (Married, Unmarried)]
Graphs - Bad
[Figure: bar chart of Blue Balls and Red Balls by month, January to April, y-axis 0 to 100]
Source: Mentegomery, 2004
Graphs - Good
[Figure: improved version of the graph, x-axis: Months]
Source: Mentegomery, 2004


Compare in terms of clarity
[Figure panels: Hip Roof, Pavilion Roof, Lean-to Roof]
Source: Paul, 2009


Which one is informative?
[Figure panels: I, II]
Source: Kelley, 2007
Describing Time-Series Data
• Data can be classified according to the time at which they are collected.
– Cross-sectional data are all collected at the same time.
– Time-series data are collected at successive points in time.
• Time-series data are often depicted on a line chart (a plot of the variable over time).

Line Chart
[Figure: line chart of a variable over the years 87 to 99, y-axis 0 to 1,200,000]
Use of diagrams and charts
Pie charts are excellent for summarizing financial data
so that the relationship can easily be seen.
Pie charts are particularly useful for presenting
proportional data.
Bar charts effectively demonstrate differences in the
data.
Line graphs are most suitable for displaying sequential
data and useful for trend analysis.
The text of the paper that refers to a table or chart
should briefly summarize the most important findings
in the visual element, not simply repeat your data.
• Bar charts are very useful for presenting data in a comprehensible way to a non-statistical audience.
• A histogram is useful for learning more about the exact spread and distribution of a data set: are there many outliers, or is the data evenly spread out?
• A box plot (also known as a box-and-whisker diagram) is a very efficient way of describing numerical data. It is based on the five-number summary of a data set (minimum, first quartile, median, third quartile and maximum).
• A scatter plot is a simple graph in which the values of one variable are plotted against those of another. These plots are often the first step in correlation and regression analyses.
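The five-number summary behind a box plot can be computed with Python's standard `statistics.quantiles`. A minimal sketch; it reuses the Sample A sales figures from the advertisement example later in this unit purely as convenient numbers, and uses `method="inclusive"` because quartile conventions differ between software packages, so other tools may report slightly different quartiles.

```python
from statistics import quantiles


def five_number_summary(data):
    """Return (minimum, Q1, median, Q3, maximum) for a box plot."""
    values = sorted(data)
    q1, med, q3 = quantiles(values, n=4, method="inclusive")
    return values[0], q1, med, q3, values[-1]


sales = [45, 51, 75, 78, 80, 85]
print(five_number_summary(sales))   # (45, 57.0, 76.5, 79.5, 85)
```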
Measure of central tendency
The process of obtaining a central value that more or less represents the entire population.
There are various methods;
Mean (AM, GM and HM)
Median
Mode
Mean
• In general, the mean is the sum of the scores divided by the number of scores. Although it is based on all observations and is less affected by sampling fluctuation than other averages, it is not suitable for:
– Open-ended data
– Qualitative data
– Highly skewed data
Median
The median is the score that divides a distribution exactly in half.
• Although the median is suitable for qualitative (ordinal) data and open-ended data, it is not based on all the observations and is affected by fluctuations in the observations.
Mode
The mode is the score or category with the greatest frequency (the most repeated observation).
• Although the mode is suitable for finding the most typical (modal) value, it is not suitable for irregular frequency distributions.
Remember
• When scores are measured on a NOMINAL SCALE, it is meaningless to calculate either the mean or the median, so the mode is the only way to describe central tendency.
• The median is suitable for data measured on an ORDINAL SCALE.
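The rule of thumb "mode for nominal, median for ordinal, mean for interval/ratio" can be illustrated with Python's `statistics` module. A minimal sketch on hypothetical data; `median_low` is used for the ordinal case so that the result is always an actual category rather than an average of two codes.

```python
from statistics import mean, median, median_low, mode

# Nominal scale: only the mode is meaningful (hypothetical responses).
religion = ["Hindu", "Buddhist", "Hindu", "Others", "Hindu", "Buddhist"]
print("Modal religion:", mode(religion))                   # Hindu

# Ordinal scale: code the ranks, then take a median that is a real category.
levels = ["Low", "Medium", "High"]
status = ["Low", "High", "Medium", "Medium", "Low"]
ranks = [levels.index(s) + 1 for s in status]              # 1 = Low ... 3 = High
print("Median status:", levels[median_low(ranks) - 1])     # Medium

# Interval/ratio scale: one extreme value pulls the mean but not the median.
income = [20, 22, 25, 27, 30, 250]                         # hypothetical, one outlier
print("Mean:", round(mean(income), 2), "Median:", median(income))  # Mean: 62.33 Median: 26.0
```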
Central Tendency
Measure | Advantages | Disadvantages
Mean (sum of all values ÷ no. of values) | Best known average; exactly calculable; makes use of all data; useful for statistical analysis | Affected by extreme values; can be absurd for discrete data (e.g. family size = 4.5 persons); cannot be obtained graphically
Median (middle value) | Not influenced by extreme values; obtainable even if the data distribution is unknown (e.g. group/aggregate data); unaffected by irregular class width; unaffected by open-ended classes | Needs interpolation for group/aggregate data (cumulative frequency curve); may not be characteristic of the group when (1) items are only few or (2) the distribution is irregular; very limited statistical use
Mode (most frequent value) | Unaffected by extreme values; easy to obtain from a histogram; determinable from only the values near the modal class | Cannot be determined exactly in grouped data; very limited statistical use
Dispersion (Variability)
Variability provides a quantitative measure of the degree to which scores in a distribution are spread out. In general, it is measured in terms of deviations from the mean.
Example: Why is a “measure of dispersion” important?
• Consider returns from two categories of shares:
* Shares A (%) = {1.8, 1.9, 2.0, 2.1, 3.6}
* Shares B (%) = {1.0, 1.5, 2.0, 3.0, 3.9}
Mean A = Mean B = 2.28%
But the variability differs!
Var(A) = 0.557, Var(B) = 1.367 (sample variances)
* Would you invest in category A shares or category B shares?
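The share-return figures can be reproduced with Python's standard `statistics` module. This sketch shows that Var(A) = 0.557 and Var(B) = 1.367 are sample variances (divisor n - 1); population variances (divisor n) would be smaller.

```python
from statistics import mean, pvariance, variance

shares_a = [1.8, 1.9, 2.0, 2.1, 3.6]
shares_b = [1.0, 1.5, 2.0, 3.0, 3.9]

for name, r in (("A", shares_a), ("B", shares_b)):
    # variance() divides by n - 1 (sample); pvariance() divides by n (population).
    print(name, round(mean(r), 2), round(variance(r), 3), round(pvariance(r), 4))
```

Both categories have the same mean return, so only a measure of dispersion distinguishes the riskier category B.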
Methods of measuring dispersion
• Range (one of the simplest measures of dispersion)
• Quartile deviation (particularly useful for open-ended distributions)
• Mean deviation (average absolute deviation, taken from the mean, median or mode)
• Standard deviation (a useful and widely practiced measure of dispersion)
• Coefficient of variation (CV) (the preferred measure for comparing relative dispersion across data sets)
• Lorenz curve (particularly useful for measuring income inequality)
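Most of these measures can be computed with Python's standard library. A minimal sketch applied to the share returns from the dispersion example; it assumes quartile deviation = (Q3 - Q1)/2 with "inclusive" quartiles, mean deviation taken about the median, and sample standard deviation. The Lorenz curve is omitted because it needs income-distribution data.

```python
from statistics import mean, median, quantiles, stdev


def dispersion_summary(data):
    """Range, quartile deviation, mean deviation (about the median), SD and CV."""
    q1, _, q3 = quantiles(data, n=4, method="inclusive")
    med = median(data)
    sd = stdev(data)  # sample standard deviation (divisor n - 1)
    return {
        "range": max(data) - min(data),
        "quartile_deviation": (q3 - q1) / 2,
        "mean_deviation_median": mean(abs(x - med) for x in data),
        "sd": sd,
        "cv_percent": 100 * sd / mean(data),
    }


shares_a = [1.8, 1.9, 2.0, 2.1, 3.6]
shares_b = [1.0, 1.5, 2.0, 3.0, 3.9]
for name, r in (("A", shares_a), ("B", shares_b)):
    summary = dispersion_summary(r)
    print(name, {k: round(v, 3) for k, v in summary.items()})
```

Because CV divides the standard deviation by the mean, it remains comparable across data sets with different means or units, which the other measures do not.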
Correlation Analysis
Correlation is a statistical technique that is used to
measure and describe a relationship between two
variables.

Three characteristics of correlation analysis
Direction of relationship (positive or negative)
Form of relationship (linear or non-linear)
Degree of relationship (high, mild, low)
Data Requirements
For Pearson’s correlation, the data should be measured on at least an interval scale.
However, a measure of association between two ordinal variables can be obtained with Spearman’s rank correlation.
Do not use nominal variables to calculate a correlation coefficient.
For nominal variables, the chi-square test for independence is used to explore the association.
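A minimal, dependency-free Python sketch of both coefficients on hypothetical data. Spearman's coefficient is computed here as Pearson's r on the ranks (tied values receive the average of their ranks), which is one standard way to compute it. For a perfectly monotonic but non-linear relationship, Spearman's coefficient is 1 while Pearson's is lower.

```python
from math import sqrt


def pearson(x, y):
    """Pearson's product-moment correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def ranks(values):
    """Rank values 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1      # sorted positions i..j share one rank
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman's rank correlation: Pearson's r computed on the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical data: y always rises with x, but not linearly.
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 100]
print(round(pearson(x, y), 3), spearman(x, y))
```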
Bivariate tabulation
Two variables are presented in a single table on the basis of their characteristics.
Age group | Sex of the respondent: Male | Female
15-19 | 80.0 (4) | 20.0 (1)
20-24 | 50.0 (4) | 50.0 (4)
25-29 | 38.5 (5) | 61.5 (8)
30-34 | 50.0 (2) | 50.0 (2)
(Row percentages, with counts in parentheses)

Chi Square test
The chi-square test is one of the most widely used methods of identifying association between categorical variables (attributes). For example:
gender and knowledge
education and salesmanship

This test does not require strict assumptions (such as normality) about the population from which the samples are drawn, although expected cell counts should generally be at least 5.
Religious affiliation * Sex of the respondents Crosstabulation
Religious affiliation | | Female | Male | Total
Hindu | Count | 250 | 200 | 450
 | % within Sex of the respondents | 41.7% | 50.0% | 45.0%
Buddhist | Count | 300 | 150 | 450
 | % within Sex of the respondents | 50.0% | 37.5% | 45.0%
Others | Count | 50 | 50 | 100
 | % within Sex of the respondents | 8.3% | 12.5% | 10.0%
Total | Count | 600 | 400 | 1000
 | % within Sex of the respondents | 100.0% | 100.0% | 100.0%

Chi-Square Tests
Test | Value | df | Asymp. Sig. (2-sided)
Pearson Chi-Square | 16.204a | 2 | .000
Likelihood Ratio | 16.266 | 2 | .000
Linear-by-Linear Association | .974 | 1 | .324
N of Valid Cases | 1000 | |
a. 0 cells (.0%) have expected count less than 5. The minimum expected count is 40.00.
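The Pearson chi-square and likelihood-ratio values in the SPSS output can be reproduced by hand from the observed counts. A minimal Python sketch using only the standard library; it assumes no zero cells (the likelihood-ratio term uses a logarithm), and the closed-form p-value is valid only because df = 2 in this table.

```python
from math import exp, log

# Observed counts from the crosstab: rows = religion, columns = (Female, Male).
observed = {
    "Hindu": (250, 200),
    "Buddhist": (300, 150),
    "Others": (50, 50),
}

rows = list(observed.values())
row_totals = [sum(r) for r in rows]
col_totals = [sum(col) for col in zip(*rows)]
n = sum(row_totals)

chi2 = 0.0   # Pearson chi-square: sum of (O - E)^2 / E
g2 = 0.0     # likelihood ratio: 2 * sum of O * ln(O / E); assumes every O > 0
min_expected = float("inf")
for i, r in enumerate(rows):
    for j, o in enumerate(r):
        e = row_totals[i] * col_totals[j] / n   # expected count under independence
        min_expected = min(min_expected, e)
        chi2 += (o - e) ** 2 / e
        g2 += 2 * o * log(o / e)

df = (len(rows) - 1) * (len(col_totals) - 1)
# For df = 2 the chi-square upper-tail probability is exactly exp(-x / 2);
# other df values need a statistics library (e.g. scipy.stats.chi2.sf).
assert df == 2
p_value = exp(-chi2 / 2)
print(f"chi2 = {chi2:.3f}, G2 = {g2:.3f}, df = {df}, "
      f"min expected = {min_expected:.2f}, p = {p_value:.4f}")
```

A p-value of roughly 0.0003 is consistent with the ".000" reported by SPSS, which rounds to three decimals.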
Regression (Causal Analysis)
The statistical method for studying cause and effect
relationship between the variables is known as
REGRESSION.
Strictly speaking, the regression line is the best-fitting straight line for a set of data (usually in the least-squares sense).
Line of simple linear regression

$Y = b_0 + b_1 X$
where $b_0$ is the intercept and $b_1$ is the slope.
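The intercept and slope of the line $Y = b_0 + b_1 X$ are usually estimated by ordinary least squares: $b_1 = S_{xy}/S_{xx}$ and $b_0 = \bar{y} - b_1\bar{x}$. A minimal Python sketch; `fit_simple_regression` is a hypothetical helper name and the data are hypothetical points lying exactly on $Y = 1 + 2X$.

```python
def fit_simple_regression(x, y):
    """Least-squares estimates (b0, b1) for the line Y = b0 + b1 * X."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b1 = sxy / sxx                 # slope: change in Y per unit change in X
    b0 = mean_y - b1 * mean_x      # intercept: value of Y when X = 0
    return b0, b1


# Hypothetical data lying exactly on Y = 1 + 2X.
x = [1, 2, 3, 4]
y = [3, 5, 7, 9]
b0, b1 = fit_simple_regression(x, y)
print(f"Y = {b0:.2f} + {b1:.2f}X")   # Y = 1.00 + 2.00X
```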
Illustration
Equation (1) represents the relationship between the height and weight of an individual. Y is the dependent variable representing WEIGHT, whereas X is the independent variable representing HEIGHT.

$Y = 0.02 + 1.25X$ (1)

Interpret the result.

$Y = 3.1 - 0.25X$ (2)

$Y = 2.5 + 1.2X_1 + 3.05X_2$ (3)

(Is the interpretation of equation 3 similar to that of equations 1 and 2?)
Inferential Analysis
Inferential analysis consists of techniques that allow us to study samples and then make generalizations about the population from which the samples were selected.
Inferential statistics infer from the sample to the population.

Tools of Inferential Analysis
Estimation (point and interval)
Hypothesis testing (Student's t-test, Z test and non-parametric tests)
Point estimate
• A point estimate uses a single sample value to estimate the desired population value.
E.g. a sample mean is used as a point estimate of the population mean. Similarly, the sample standard deviation is an estimate of the population standard deviation.
Drawbacks
• It may miss the population parameter because of sampling uncertainty.
• It does not provide a level of confidence that the estimate is close to the population parameter.
Interval Estimation

• An interval estimate uses a range of values to estimate the desired population value.

• When an interval estimate is accompanied by a specific level of confidence, it is called a confidence interval.
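A confidence interval for a population mean has the form mean ± critical value × s/√n. A minimal Python sketch; `mean_confidence_interval` is a hypothetical helper, the data reuse the Sample A sales figures from the advertisement example, and the default critical value is a large-sample normal (z) value. For small samples a Student t critical value is more appropriate; 2.571 is the standard two-sided 95% t-table value for 5 degrees of freedom.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def mean_confidence_interval(data, confidence=0.95, critical=None):
    """Confidence interval for a population mean: mean ± critical * s / sqrt(n).

    By default the critical value comes from the standard normal distribution
    (a large-sample z interval). For small samples, pass a Student t critical
    value for n - 1 degrees of freedom instead (e.g. from a t table).
    """
    n = len(data)
    if critical is None:
        critical = NormalDist().inv_cdf(0.5 + confidence / 2)
    centre = mean(data)
    margin = critical * stdev(data) / sqrt(n)
    return centre - margin, centre + margin


sales_a = [45, 51, 75, 78, 80, 85]
# t critical value for 95% confidence and 5 degrees of freedom is about 2.571.
low, high = mean_confidence_interval(sales_a, critical=2.571)
print(f"Point estimate: {mean(sales_a)}, 95% CI: ({low:.1f}, {high:.1f})")
```

The point estimate is the single value 69; the interval adds a stated level of confidence around it.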
Analysis of Data
• Descriptive statistics and causal analysis
– Univariate: 1. Central tendency; 2. Dispersion; 3. One-way ANOVA; 4. Time series and index number
– Bivariate: 1. Cross tabulation/correlation; 2. Simple regression and correlation; 3. Two-way ANOVA; 4. Association of attributes
– Multivariate: 1. Multiple regression; 2. Logit/Probit; 3. Factor and cluster analysis
• Inferential statistics
– Estimation: 1. Point; 2. Interval
– Hypothesis testing: 1. Parametric; 2. Non-parametric
RELATIONSHIP BETWEEN DESCRIPTIVE AND INFERENTIAL STATISTICS
Step I (Experiment): Compare two methods of advertisement. From the population, one sample is exposed to advertisement medium A and another to advertisement medium B.
Sample A (medium A): 45, 51, 75, 78, 80, 85
Sample B (medium B): 40, 70, 65, 30, 68, 31
Step II (Descriptive statistics): Organize and simplify the data.
[Figure: bar charts of the sales data for Sample A and Sample B]
Average sales for Sample A = 69; average sales for Sample B ≈ 51
Step III (Inferential statistics): Interpret the results.
The sample data show a difference of about 18 points between the two methods of advertisement. There are two possible ways to interpret this result:
• First, there is no real difference between the two methods of advertisement, and the sample difference is due to chance (sampling error).
• Second, there is a real difference between the populations, and the sample data accurately reflect this difference.
The aim of inferential analysis is therefore to help the researcher decide between these interpretations.
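Step II can be reproduced in a few lines of Python. This sketch uses only the standard library. The choice of summary measures (mean, standard deviation, range) and the helper name `describe` are our own. It also shows that Sample B's exact mean is 50.67, which the slide rounds to 51.

```python
from statistics import mean, stdev

# Sales figures from the advertisement example
sample_a = [45, 51, 75, 78, 80, 85]  # medium of advertisement A
sample_b = [40, 70, 65, 30, 68, 31]  # medium of advertisement B

def describe(data):
    """Organize and simplify: size, mean, SD (n - 1 divisor) and range."""
    return {
        "n": len(data),
        "mean": mean(data),
        "sd": stdev(data),
        "range": max(data) - min(data),
    }

summary_a = describe(sample_a)
summary_b = describe(sample_b)
difference = summary_a["mean"] - summary_b["mean"]
print(f"A: mean = {summary_a['mean']:.2f}, sd = {summary_a['sd']:.2f}")
print(f"B: mean = {summary_b['mean']:.2f}, sd = {summary_b['sd']:.2f}")
print(f"Difference in means = {difference:.2f}")
```

Descriptive statistics stop at this summary. Deciding whether the difference of about 18.33 reflects a real population difference is the job of Step III.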
Statistical tests or models depending on the properties of the outcome and explanatory variables
(Columns give the type of outcome data, i.e. the dependent variable. For discrete and continuous outcomes, the test in parentheses is the non-parametric alternative.)

| Purpose of statistical analysis | Nominal | Binary | Ordinal | Discrete | Continuous |
|---|---|---|---|---|---|
| Against a specific null hypothesis about the expected mean or proportion | Chi-square test | Binomial test | Chi-square test | One-sample t-test | One-sample t-test |
| Relationship with a continuous explanatory variable | | | Spearman's correlation | Pearson's correlation | Pearson's correlation |
| Difference in expected means or proportions between two groups | Cross tab (chi-square test) | Cross tab (chi-square test) | Cross tab (chi-square test) | Two-sample t-test (Mann–Whitney U test) | Two-sample t-test (Mann–Whitney U test) |
| Difference in means or proportions between more than two groups | Cross tab (chi-square test) | Cross tab (chi-square test) | Cross tab (chi-square test) | ANOVA (Kruskal–Wallis H test) | ANOVA (Kruskal–Wallis H test) |
| Analyzed as a linear statistical model | Multinomial logistic regression | Binary logistic regression | Ordinal logistic regression | Linear regression / general linear model | Linear regression / general linear model |
| Two clustered or repeated measurements | McNemar–Bowker test | McNemar test | McNemar–Bowker test | Paired-sample t-test (Wilcoxon signed-rank test) | Paired-sample t-test (Wilcoxon signed-rank test) |
Hypothesis testing
It is almost impossible, or impractical, for a researcher to observe every individual in a population.
Therefore, the researcher usually collects data from a sample and then uses the sample data to answer questions about the population.
Working with samples is one of the most effective ways to reduce the cost and time of research.
What is hypothesis testing?
Hypothesis
A hypothesis is an assumption (or set of assumptions) that can be tested logically.
Hypothesis Testing Definition
A hypothesis test is a statistical method that uses sample data to examine an assumption made about a population parameter.
Simple logic underlying hypothesis testing
Testing a hypothesis involves three simple steps:
Step 1: State an assumption about the population. Usually the hypothesis concerns the value of a population parameter.
Step 2: Obtain a random sample from the population about which the assumption has been made.
Step 3: Compare the sample data with the assumption made about the population parameter. If the data are consistent with the assumption, conclude that the hypothesis is reasonable; otherwise, reject the hypothesis.
Steps in hypothesis testing
Step I: Describe in words the population characteristic about which hypotheses are to be tested.
Step II: State the null hypothesis, H0.
Step III: State the alternative hypothesis, H1 or Ha.
Step IV: Identify the rejection region:
  • determine whether the test is one-tailed (right or left) or two-tailed;
  • determine the level of significance (α).
Step V: Choose an appropriate test statistic for testing the hypothesis and compute its value.
Step VI: Compare the computed value with the tabulated (critical) value and draw the decision as follows:
Option I: Accept (strictly, do not reject) the null hypothesis if the computed value is less than or equal to the tabulated value (in absolute terms for a two-tailed test).
Option II: Reject the null hypothesis if the computed value is greater than the tabulated value (in absolute terms for a two-tailed test).
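Steps IV and VI can be sketched in Python for the normal (Z) case. This is an illustration rather than a prescribed procedure. `critical_z` and `decide` are hypothetical helpers, and the tabulated value comes from `statistics.NormalDist.inv_cdf` instead of a printed table. For one-tailed tests, the sketch compares the signed statistic with the relevant tail. That is our assumption and goes beyond the absolute-value rule stated above.

```python
from statistics import NormalDist

def critical_z(alpha=0.05, tail="two"):
    """Tabulated Z value bounding the rejection region (returned as a positive number)."""
    if tail == "two":
        return NormalDist().inv_cdf(1 - alpha / 2)  # alpha split over both tails
    if tail in ("left", "right"):
        return NormalDist().inv_cdf(1 - alpha)      # all of alpha in one tail
    raise ValueError("tail must be 'two', 'left' or 'right'")

def decide(computed, tabulated, tail="two"):
    """Step VI: compare the computed statistic with the tabulated value."""
    if tail == "two":
        reject = abs(computed) > tabulated
    elif tail == "right":
        reject = computed > tabulated
    else:  # left tail: the rejection region lies below -tabulated
        reject = computed < -tabulated
    return "reject H0" if reject else "accept H0"

print(round(critical_z(0.05), 3), round(critical_z(0.05, "right"), 3))
print(decide(2.1, critical_z(0.05)))
```

For α = 0.05 this gives about 1.96 (two-tailed) and about 1.645 (one-tailed), matching standard Z tables.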
Hypothesis testing for means
There are two tests of hypothesis for means: the Z-test for large samples (n > 30) and the t-test for small samples (n ≤ 30).
The Z-test can be used to test a single mean and the difference between two means (independent samples).
Similarly, the t-test can be used to test a single mean and the difference between two means.
Z-test for single mean
Assumptions
A. Random sampling (randomness)
The sample should be drawn randomly from the parent population.
B. Independent observations (independence)
Each observation should be independent of the others (that is, there is no consistent and predictable relationship between the observations).
C. Constant variance (σ²)
The variance should remain unchanged before and after the treatment.
Step I
Null hypothesis: The population mean has the specified value, i.e. there is no significant difference between the sample and population means.
Symbolically, we write
$$H_0: \mu = \mu_0$$
Step II
Alternative hypothesis (two-tailed): The population mean is not equal to the specified value, i.e. there is a significant difference between the population and sample means.
Symbolically, we write
$$H_1: \mu \neq \mu_0 \quad \text{(two-tailed test)}$$
$$H_1: \mu < \mu_0 \quad \text{(left-tailed test)}$$
$$H_1: \mu > \mu_0 \quad \text{(right-tailed test)}$$

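Step III
Compute an appropriate test statistic:
$$Z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}, \qquad SE(\bar{x}) = \frac{\sigma}{\sqrt{n}}$$
where $\bar{x}$ is the sample mean, $\mu$ is the population mean specified under H0, $\sigma$ is the population standard deviation and $n$ is the sample size.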
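Here is a minimal sketch of Step III in Python, assuming σ is known. The function name `z_single_mean` and the figures in the example (n = 64, sample mean 52, hypothesized mean 50, σ = 8) are hypothetical.

```python
import math

def z_single_mean(x_bar, mu, sigma, n):
    """Z = (x_bar - mu) / (sigma / sqrt(n)) for a single mean with known sigma."""
    standard_error = sigma / math.sqrt(n)  # SE of the sample mean
    return (x_bar - mu) / standard_error

# Hypothetical figures: n = 64, sample mean 52, hypothesized mean 50, sigma 8
z = z_single_mean(x_bar=52, mu=50, sigma=8, n=64)
print(z)  # 2.0
```

Here SE = 8/√64 = 1, so Z = 2.0. Because |Z| exceeds the tabulated 1.96, H0 would be rejected at the 5% level in a two-tailed test.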
Step IV
Draw the decision by comparing the calculated and tabulated Z values. There are two options:
A. If the calculated Z (in absolute value) is less than or equal to the tabulated Z, the null hypothesis is accepted. This indicates that there is no significant difference between the population and sample means.
Symbolically, accept the null hypothesis if |calculated Z| ≤ tabulated Z.
B. If the calculated Z (in absolute value) is greater than the tabulated Z, the null hypothesis is rejected and the alternative hypothesis is accepted. This indicates that there is a significant difference between the population and sample means.
Symbolically, reject the null hypothesis and accept the alternative hypothesis if |calculated Z| > tabulated Z.
Z-test for the difference between two means
This test is particularly useful for testing the equality of two population means. Let $n_1$ and $n_2$ be the sizes of two samples drawn independently from two populations with means $\mu_1$ and $\mu_2$ and variances $\sigma_1^2$ and $\sigma_2^2$. Let $\bar{x}_1$ and $\bar{x}_2$ be the sample means. The test statistic is then defined as
$$Z = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$$
For large samples, the population variances can be replaced by the sample variances, and the Z-statistic becomes
$$Z = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$$
If the two samples come from the same population (common variance $\sigma^2$), the Z-statistic can be written as
$$Z = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\sigma^2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$$
If the common variance ($\sigma^2$) is unknown, we use its estimate based on the sample variances:
$$\sigma^2 = \frac{n_1 s_1^2 + n_2 s_2^2}{n_1 + n_2}$$
The testing of the hypothesis is similar to the earlier cases.
Null hypothesis: The two population means are equal.
$$H_0: \mu_1 = \mu_2$$
Alternative hypothesis: The two population means are not equal.
$$H_1: \mu_1 \neq \mu_2 \quad \text{or} \quad H_1: \mu_1 < \mu_2 \quad \text{or} \quad H_1: \mu_1 > \mu_2$$
The remaining steps are the same as before.
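The two large-sample forms of this statistic can be sketched in Python. One uses separate sample variances; the other uses the common-variance estimate $(n_1 s_1^2 + n_2 s_2^2)/(n_1 + n_2)$. The function names and summary figures are hypothetical and serve only as an illustration.

```python
import math

def z_two_means(x1_bar, x2_bar, s1, s2, n1, n2):
    """Large-sample Z for two independent means, using the sample variances."""
    standard_error = math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)
    return (x1_bar - x2_bar) / standard_error

def z_two_means_common(x1_bar, x2_bar, s1, s2, n1, n2):
    """Z when both samples are assumed to come from a population with a common variance."""
    sigma2 = (n1 * s1 ** 2 + n2 * s2 ** 2) / (n1 + n2)  # estimate of the common sigma^2
    return (x1_bar - x2_bar) / math.sqrt(sigma2 * (1 / n1 + 1 / n2))

# Hypothetical summary data for two independent large samples
z = z_two_means(x1_bar=52, x2_bar=48, s1=8, s2=9, n1=50, n2=60)
z_common = z_two_means_common(x1_bar=52, x2_bar=48, s1=8, s2=9, n1=50, n2=60)
print(f"Z (separate variances) = {z:.3f}, Z (common variance) = {z_common:.3f}")
```

When the two sample standard deviations are equal, both functions give the same Z. The pooled form only matters when the sample variances differ.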


Z-test for significance of a single proportion
• The Z-test for means is appropriate when the data are quantitative. Sometimes, however, the data cannot be quantified but only categorized; in such situations the Z-test for a proportion is appropriate.
• An example is the presence or absence (yes or no) of defective items from a manufacturing unit.
• The test for a proportion is therefore suitable for qualitative data.
• The logic behind this test is much the same as that of the test for means.
Let $p$ and $P$ be the sample proportion and the population proportion, where
$$p = \frac{x}{n} = \frac{\text{number of successes (items with the characteristic)}}{\text{total number of trials}}$$
$P$ = population proportion with the characteristic (proportion of successes in the population), and $P + Q = 1$, so that $Q = 1 - P$.
The Z-test for a proportion can then be written as
$$Z = \frac{p - P}{SE(p)} = \frac{p - P}{\sqrt{PQ/n}} \sim N(0, 1)$$
If the population is finite, of size $N$, and the population proportion $P$ is known, Z can be written as
$$Z = \frac{p - P}{\sqrt{\dfrac{(N - n)\,PQ}{(N - 1)\,n}}}$$
Procedure of testing hypothesis
Step I: Null hypothesis: There is no significant difference between the population and sample proportions. Symbolically, H0: p = P.
Step II: Alternative hypothesis: There is a significant difference between the population and sample proportions. Symbolically,
H1: p ≠ P (two-tailed test)
H1: p < P (left-tailed test)
H1: p > P (right-tailed test)
The remaining steps are similar to those of the Z-test for means.
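The single-proportion test, including the optional finite-population form, can be sketched as follows. `z_single_proportion` and the numbers are hypothetical: 40 defective items in a sample of 400, a claimed population proportion P = 0.08, and a population of N = 2000 for the finite case.

```python
import math

def z_single_proportion(x, n, P, N=None):
    """Z for a sample proportion x/n against a known population proportion P.

    N is an optional finite population size."""
    p = x / n                  # sample proportion
    Q = 1 - P
    variance = P * Q / n       # variance of p when the population proportion is P
    if N is not None:
        variance *= (N - n) / (N - 1)  # finite population correction
    return (p - P) / math.sqrt(variance)

# Hypothetical: 40 defective items in a sample of 400, claimed P = 0.08
z = z_single_proportion(40, 400, 0.08)
z_finite = z_single_proportion(40, 400, 0.08, N=2000)
print(f"Z = {z:.3f}, Z with N = 2000: {z_finite:.3f}")
```

Without the correction, Z is about 1.47. That is below 1.96, so H0 would be accepted at the 5% level in a two-tailed test. The finite-population factor shrinks the standard error and so raises Z.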
Z-test for the difference between two proportions
This test is used when two population proportions must be compared on the basis of sample proportions.
Let $n_1$ and $n_2$ be the sizes of samples drawn from two large populations, and let $X_1$ and $X_2$ be the observed numbers of successes (or failures). The observed sample proportions are then
$$p_1 = \frac{X_1}{n_1} \quad \text{and} \quad p_2 = \frac{X_2}{n_2}$$
Let $P_1$ and $P_2$ be the population proportions, which are estimated by the sample proportions, so that
$$E(p_1) = P_1 \quad \text{and} \quad E(p_2) = P_2$$
The variances of the two sample proportions are
$$Var(p_1) = \frac{P_1 Q_1}{n_1} \quad \text{and} \quad Var(p_2) = \frac{P_2 Q_2}{n_2}$$
where $Q_1 = 1 - P_1$ and $Q_2 = 1 - P_2$.
Similarly, the standard error of the difference between the proportions is
$$SE(p_1 - p_2) = \sqrt{\frac{P_1 Q_1}{n_1} + \frac{P_2 Q_2}{n_2}}$$
and the test statistic is
$$Z = \frac{p_1 - p_2}{\sqrt{\dfrac{P_1 Q_1}{n_1} + \dfrac{P_2 Q_2}{n_2}}}$$
If a common population proportion is assumed, the test statistic is
$$Z = \frac{p_1 - p_2}{\sqrt{PQ\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$$
where
$$P = \frac{n_1 p_1 + n_2 p_2}{n_1 + n_2} \quad \text{and} \quad Q = 1 - P$$
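The pooled (common-proportion) form can be sketched as follows. The function name and counts are hypothetical: 45 successes out of 150 in one sample and 30 out of 150 in the other.

```python
import math

def z_two_proportions(x1, n1, x2, n2):
    """Z for the difference between two proportions, using the common (pooled) P."""
    p1, p2 = x1 / n1, x2 / n2
    P = (n1 * p1 + n2 * p2) / (n1 + n2)  # equals (x1 + x2) / (n1 + n2)
    Q = 1 - P
    return (p1 - p2) / math.sqrt(P * Q * (1 / n1 + 1 / n2))

# Hypothetical counts: 45 of 150 in sample 1, 30 of 150 in sample 2
z = z_two_proportions(45, 150, 30, 150)
print(f"Z = {z:.3f}")
```

Here P = 0.25, so the standard error is √(0.25 × 0.75 × 2/150) = 0.05 and Z = 0.10/0.05 = 2.0. This is just above 1.96, so the difference would be significant at the 5% level in a two-tailed test.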
Exact sample test (t-test for small samples)
The Z-test is used when the sample is large (n > 30). Sometimes, however, we have to deal with a small sample (n ≤ 30) drawn from a normal population whose standard deviation is unknown. In that case, the sample standard deviation cannot be used as an approximation of the population standard deviation.
In such situations, the t-test is used to test for a significant difference between the population and sample means.
For small samples, the sampling distribution of the sample mean (standardized with the sample standard deviation) follows the t-distribution.
Nature of t-distribution
[Figure: comparison of normal (Z) and t-distributions; t-distribution curves for n = 21, 23, 26 and 29 plotted against the standard normal curve]
The exact shape of a t-distribution changes with the degrees of freedom (d.f.). As the d.f. become very large, the distribution gets closer in shape to the normal distribution.
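The claim that the t-curve approaches the normal curve can be checked numerically. This sketch evaluates Student's t density, which follows the standard textbook formula, and compares it with the standard normal density. The helper name `t_pdf` is hypothetical. We label the curves by degrees of freedom (n − 1) rather than by n. `math.lgamma` is used so the gamma terms do not overflow for large d.f.

```python
import math
from statistics import NormalDist

def t_pdf(t, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    # log of Gamma((df+1)/2) / (sqrt(df*pi) * Gamma(df/2)); lgamma avoids overflow
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c) * (1 + t * t / df) ** (-(df + 1) / 2)

normal_peak = NormalDist().pdf(0)
for df in (1, 5, 20, 28, 1000):
    # height of each curve at t = 0 compared with the normal curve
    print(df, round(t_pdf(0, df), 4), round(normal_peak, 4))
```

The t-curve is lower at the centre and heavier in the tails than the normal curve. The gap closes as the d.f. grow.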


Degree of freedom
• The number of values in a sample that can be chosen freely is known as the degrees of freedom (df).
• The t-test uses degrees of freedom because the shape of the curve changes with the sample size (n), so sample size plays an important role in the test of significance.
• In the t-test for a single mean, we use (n − 1) degrees of freedom, where n is the sample size.
Assumptions
• The parent population from which the sample is drawn should be normal.
• The sample should be drawn randomly.
• The observations should be independent of each other.
T-test for single mean
The test statistic for the t-test for a single mean is given by
$$t = \frac{\bar{x} - \mu}{S/\sqrt{n}}$$
where $\mu$ is the (assumed) population mean and $S$ is the sample standard deviation, computed as
$$S = \sqrt{\frac{1}{n-1}\sum (x - \bar{x})^2} = \sqrt{\frac{1}{n-1}\left[\sum x^2 - \frac{(\sum x)^2}{n}\right]}$$
Sometimes the biased estimate of the population variance is used:
$$s^2 = \frac{1}{n}\sum (x - \bar{x})^2 \;\Rightarrow\; n s^2 = \sum (x - \bar{x})^2 \qquad (1)$$
The unbiased variance is computed as
$$S^2 = \frac{1}{n-1}\sum (x - \bar{x})^2 \;\Rightarrow\; (n-1) S^2 = \sum (x - \bar{x})^2 \qquad (2)$$
Comparing (1) and (2), $(n-1)S^2 = n s^2$, so
$$\frac{S^2}{n} = \frac{s^2}{n-1}$$
The t-statistic can therefore also be written as
$$t = \frac{\bar{x} - \mu}{\sqrt{S^2/n}} = \frac{\bar{x} - \mu}{\sqrt{s^2/(n-1)}}$$
Choice between S and s
Use $s$ when the (biased) sample standard deviation is given:
$$t = \frac{\bar{x} - \mu}{\sqrt{s^2/(n-1)}}$$
Use $S$ when the actual data are given. First find the unbiased sample variance, then compute
$$t = \frac{\bar{x} - \mu}{\sqrt{S^2/n}}$$
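Both forms can be checked numerically. In this sketch, Python's `statistics.variance` gives the unbiased $S^2$ (divisor n − 1) and `statistics.pvariance` gives the biased $s^2$ (divisor n). The function name `t_single_mean`, the data and the hypothesized mean are hypothetical.

```python
import math
from statistics import mean, variance, pvariance

def t_single_mean(data, mu):
    """Return t computed from S and from s (they agree), plus the d.f."""
    n = len(data)
    x_bar = mean(data)
    S2 = variance(data)    # unbiased estimate: divides by n - 1
    s2 = pvariance(data)   # biased estimate: divides by n
    t_from_S = (x_bar - mu) / math.sqrt(S2 / n)
    t_from_s = (x_bar - mu) / math.sqrt(s2 / (n - 1))
    return t_from_S, t_from_s, n - 1

data = [12, 15, 11, 14, 13, 16]  # hypothetical small sample
t1, t2, df = t_single_mean(data, mu=12)
print(f"t = {t1:.3f} (via S) = {t2:.3f} (via s), d.f. = {df}")
```

Both routes give t ≈ 1.96 with 5 d.f. The tabulated two-tailed 5% value for 5 d.f. is 2.571, so H0 would be accepted.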
Procedure of testing hypothesis
• The mechanics of testing a hypothesis with the t-test are similar to those of the large-sample (Z) test.
• However, the degrees of freedom must be taken into account when finding the tabulated value.
• Finally, compare the computed t-value with the tabulated t-value to decide whether to accept the null hypothesis or the alternative hypothesis.
T-test for difference between two means
When we want to test whether two independent samples come from two normal populations having the same mean and variance, we use the t-test for the difference between two means.
Assumptions
1. The populations from which the samples are drawn are normally distributed.
2. The two samples must be random and drawn independently.
3. The variances of the two populations must be equal (but unknown).
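The test statistic for the difference between means is computed as
$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{S^2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} \sim t \text{ distribution with } (n_1 + n_2 - 2) \text{ df}$$
where
$$\bar{x}_1 = \frac{\sum x_1}{n_1} \quad \text{and} \quad \bar{x}_2 = \frac{\sum x_2}{n_2}$$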
Similarly,
$$S^2 = \frac{1}{n_1 + n_2 - 2}\left[\sum x_1^2 - \frac{(\sum x_1)^2}{n_1} + \sum x_2^2 - \frac{(\sum x_2)^2}{n_2}\right]$$
When the (biased) sample standard deviations are given,
$$S^2 = \frac{n_1 s_1^2 + n_2 s_2^2}{n_1 + n_2 - 2}$$
The hypothesis-testing procedure is similar to the earlier cases. However, we set up the hypotheses as follows.
Null hypothesis: The samples come from normal populations with the same mean (the two population means are equal). Symbolically, we write
$$H_0: \mu_1 = \mu_2$$
Alternative hypothesis: The samples do not come from normal populations with the same mean (the two population means are not equal).
$$H_1: \mu_1 \neq \mu_2$$
The remaining procedure is the same as before.
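As an illustration, this sketch applies the pooled-variance t-test to the advertisement data given earlier in this unit: Sample A (45, 51, 75, 78, 80, 85) and Sample B (40, 70, 65, 30, 68, 31). The function name `pooled_t` is hypothetical. It follows the $\sum x^2 - (\sum x)^2/n$ form of $S^2$ with $(n_1 + n_2 - 2)$ d.f.

```python
import math

def pooled_t(sample1, sample2):
    """Two-sample t with pooled variance; returns (t, degrees of freedom)."""
    n1, n2 = len(sample1), len(sample2)
    x1_bar, x2_bar = sum(sample1) / n1, sum(sample2) / n2
    # Sum of squared deviations via sum(x^2) - (sum x)^2 / n for each sample
    ss1 = sum(v * v for v in sample1) - sum(sample1) ** 2 / n1
    ss2 = sum(v * v for v in sample2) - sum(sample2) ** 2 / n2
    S2 = (ss1 + ss2) / (n1 + n2 - 2)  # pooled variance estimate
    t = (x1_bar - x2_bar) / math.sqrt(S2 * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2

sample_a = [45, 51, 75, 78, 80, 85]
sample_b = [40, 70, 65, 30, 68, 31]
t, df = pooled_t(sample_a, sample_b)
print(f"t = {t:.3f} with {df} d.f.")
```

The computed t (about 1.77 with 10 d.f.) is below the tabulated two-tailed 5% value of 2.228. On this test, the 18-point difference could be explained by sampling error.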
Paired t-test
The paired t-test is particularly useful for comparing before-and-after values. It helps to measure the effectiveness of an intervention program (treatment), for example the volume of sales before and after an advertisement.
Data Requirements
1. The sample sizes before and after the intervention should be equal.
2. The same sample (the same units) should be measured twice, before and after the treatment.
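The statistic used for the paired t-test is
$$t = \frac{\bar{d}}{S/\sqrt{n}} = \frac{\bar{d}}{\sqrt{S^2/n}} \sim t_{n-1}$$
where $d = x - y$ is the difference between the values before and after the treatment,
$$\bar{d} = \frac{\sum d}{n} = \text{mean of the differences}$$
and
$$S^2 = \frac{1}{n-1}\sum (d - \bar{d})^2 = \frac{1}{n-1}\left[\sum d^2 - \frac{(\sum d)^2}{n}\right]$$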
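The paired statistic can be sketched as follows, with $d = x - y$ taken as before minus after. The function name `paired_t` and the before/after sales for five shops are hypothetical.

```python
import math

def paired_t(before, after):
    """Paired t-test statistic and d.f. for before/after measurements."""
    if len(before) != len(after):
        raise ValueError("paired data need equal sample sizes")
    d = [x - y for x, y in zip(before, after)]  # d = x - y for each unit
    n = len(d)
    d_bar = sum(d) / n
    S2 = (sum(v * v for v in d) - sum(d) ** 2 / n) / (n - 1)
    return d_bar / math.sqrt(S2 / n), n - 1

before = [20, 25, 18, 30, 22]  # hypothetical weekly sales before advertisement
after = [24, 27, 21, 33, 22]   # hypothetical weekly sales after advertisement
t, df = paired_t(before, after)
print(f"t = {t:.3f} with {df} d.f.")
```

Here $\bar{d} = -2.4$ and $S^2 = 2.3$, so t ≈ −3.54 with 4 d.f. Since |t| exceeds the tabulated two-tailed 5% value of 2.776, the change would be judged significant. The negative sign means sales were higher after the advertisement.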
Chi-square test for test of independence
As mentioned previously, this test examines the association between two attributes, i.e. categorical variables such as gender and education, or gender and income group.
The process is similar to the goodness-of-fit test, but the expected values are computed from the row and column totals relative to the grand total.
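| | Column 1 | Column 2 | Row total |
|---|---|---|---|
| Row 1 | a | b | a+b |
| Row 2 | c | d | c+d |
| Column total | a+c | b+d | a+b+c+d |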

The expected values are computed as follows. For cell a, the expected frequency is
E(a) = (row total × column total) / grand total
E(a) = (a+b)(a+c)/(a+b+c+d)
E(b) = (a+b)(b+d)/(a+b+c+d)
E(c) = (c+d)(a+c)/(a+b+c+d)
E(d) = (c+d)(b+d)/(a+b+c+d)
$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
where E = expected values and O = observed values.
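The expected-frequency rule and the χ² sum can be sketched for any r × c table. The d.f. are taken as (r − 1)(c − 1), the usual rule for a test of independence. The function name and the 2 × 2 counts are hypothetical.

```python
def chi_square_independence(table):
    """Chi-square statistic and d.f. for an r x c contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # E = row total * column total / grand total
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df

# Hypothetical 2 x 2 table laid out as [[a, b], [c, d]]
table = [[30, 20],
         [10, 40]]
chi2, df = chi_square_independence(table)
print(f"chi-square = {chi2:.2f} with {df} d.f.")
```

The expected frequencies are 20, 30, 20 and 30, giving χ² ≈ 16.67. This exceeds the tabulated 3.841 for 1 d.f. at the 5% level, so independence would be rejected.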
Yates’s Correction for Continuity
Yates's correction for continuity (or Yates's chi-squared test) is used in certain situations when testing for independence in a contingency table.
In a 2×2 contingency table, pooling cells (as is done when expected frequencies are small) would reduce the d.f. to zero, which makes the chi-square test meaningless. To overcome this limitation, Yates's correction for continuity is applied. The formula for the test is
$$\chi^2_{Yates} = \sum \frac{(|O - E| - 0.5)^2}{E}$$
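For a 2×2 table, the correction is commonly written as χ² = Σ (|O − E| − 0.5)² / E. The sketch below implements that textbook form for 2×2 tables only. `chi_square_yates` and the small table of counts are hypothetical. Clamping |O − E| − 0.5 at zero is our own guard so the correction never overshoots; it is not part of the formula as usually printed.

```python
def chi_square_yates(table):
    """Yates-corrected chi-square for a 2 x 2 table [[a, b], [c, d]]."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals, col_totals = (a + b, c + d), (a + c, b + d)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            # subtract 0.5 from |O - E|; max(..., 0) keeps the term from overshooting
            chi2 += max(abs(observed - expected) - 0.5, 0.0) ** 2 / expected
    return chi2

small_table = [[3, 7],
               [8, 2]]  # hypothetical counts with small expected frequencies
print(f"Yates chi-square = {chi_square_yates(small_table):.3f}")
```

For this table, every |O − E| is 2.5, and the corrected value is about 3.23. That is below the tabulated 3.841 for 1 d.f. The uncorrected value (about 5.05) would have exceeded it, which shows how the correction makes the test more conservative.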
Analyzing and Interpreting Qualitative Data
• Qualitative data are rich in detail and description.
• The data are often in narrative form.
• The data are often collected by observation, open-ended interviewing and document review.
• Analysis often emphasizes understanding phenomena as they exist, rather than testing predetermined hypotheses.
What is Qualitative data analysis (QDA)?
• Qualitative data analysis (QDA) is the process of turning written data, such as interview transcripts and field notes, into findings.
• There are no fixed formulas or rules for this process. It requires skills, knowledge, experience, insight and a willingness to keep learning and working at it.
• In a research proposal, this section should give a concise account of the intended strategy for analyzing the qualitative data. The audience needs to know what will be done with the collected data and how this fits the researcher's worldview, research philosophy and research strategy.
• There are many different ways of doing QDA, including the case study approach, theory-based approaches, and collaborative and participatory forms of analysis.
• Triangulation is a method used in qualitative research that involves cross-checking multiple data sources and collection procedures to evaluate the extent to which all evidence converges.
• The three major methods of QDA are content analysis, narrative analysis and thematic analysis.
I. Content Analysis
Content analysis may be seen as a method where the
content of the message forms the basis for drawing
inferences and conclusions about the content.
It is a research methodology that utilizes a set of
procedures to make valid inferences from text.
Qualitative content analysis can include:
• Case records
• Audio tapes, videotapes, TV shows, and films.
• Books
• People’s diaries
• Newspaper accounts of events
• Live experiences
Key Features
Like any other research method, content analysis conforms to three basic principles of the scientific method:
Objectivity: The analysis is pursued on the basis of explicit rules, which enable different researchers to obtain the same results from the same documents or messages.
Systematic: Content is included or excluded according to consistently applied rules. This eliminates the possibility of including only material that supports the researcher's ideas.
Generalizability: The results obtained by the researcher can be applied to other similar situations.
Steps for conducting content analysis
• Find the necessary information.
• Develop the bases for tabulation.
• Develop the bases for content analysis.
• Develop the layout for the construction of the design.
• Classify the variables into groups.
• Establish procedures for using the materials.
• Prepare an outline of the analysis and use it.
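One simple way to apply "explicit rules" mechanically is dictionary-based coding. Each category is defined by a fixed keyword list, and the same texts always produce the same counts. The sketch below is only an illustration. The coding scheme, category names and responses are hypothetical, and a real study would need a carefully developed and tested coding frame.

```python
import re
from collections import Counter

# Hypothetical coding scheme: explicit rules mapping keywords to categories
CODING_SCHEME = {
    "price": {"price", "cost", "expensive", "cheap"},
    "quality": {"quality", "durable", "broken", "defective"},
    "service": {"staff", "service", "delivery", "support"},
}

def code_documents(documents, scheme=CODING_SCHEME):
    """Count how often each category's keywords occur across the documents."""
    counts = Counter()
    for text in documents:
        words = re.findall(r"[a-z']+", text.lower())  # simple word tokenizer
        for category, keywords in scheme.items():
            counts[category] += sum(1 for w in words if w in keywords)
    return counts

responses = [
    "The price was too high but the quality is good.",
    "Delivery was late and the staff were rude.",
    "Cheap, but the handle arrived broken.",
]
print(code_documents(responses))
```

Because the rules are written down, a second researcher running the same scheme on the same responses obtains identical counts. This is the objectivity principle described above.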
II. Narrative Analysis
Narrative research captures the voice of the participant and offers a collection of themes that help us understand the phenomenon being investigated.
The outcome of narrative research is a researcher-generated story (a retelling) that answers “How” and “What” questions about a life story and meaningful experiences that have implications for others.
A narrative (story) occurs when one or more speakers share and recount an experience or event.
Narratives can be:
• oral or written;
• very short or long;
• told as a way to share one’s life story;
• focused on events and on the meaning of these events for those experiencing them.
Sources and forms of Narrative Data
• Open ended questions
• Individual interview
• Discussion groups
• Observations
• Documents and Reports
• Stories
• Case studies
Steps in Narrative Analysis
• Obtain the data.
• Focus the analysis on the data obtained from the various sources.
• Code the data by assigning a sign or symbol to each class of data.
• Identify the relationships among the various classes.
• Finally, draw conclusions based on the observations.
III. Thematic Analysis
When data are analyzed by theme, the process is called thematic analysis.
THEMATIC ANALYSIS is a technique of qualitative data analysis used to identify the major points in the data, analyze them and prepare a report.
It involves searching for the themes (topics) in data, events or subjects that are important for describing the phenomenon.
The process identifies themes through careful reading and rereading of the data. It involves noting down initial ideas, coding interesting features of the data, collating codes into themes, generating a thematic map, refining each theme through ongoing analysis, and producing the report through continuous analysis.
Steps in thematic analysis
• Review the previous literature.
• Generate the initial codes: highlight the essential information found in the literature and prepare notes on such data (memos and notes).
• Search for themes: identify the probable themes in the information.
• Review the themes: after collecting the themes, the researcher should review the data and themes to check their reliability.
• Prepare the report: finally, prepare the report according to the objectives of the research.
