
CHAPTER 13

QUANTITATIVE DATA ANALYSIS


WHAT IS DATA?
 Data are a collection of values of one or more variables.
 A variable is something that takes on different values.
 Values can be numbers or names, depending on the variable:
 Numeric, e.g. weight
 Nominal, e.g. gender (values are names: male or female)
WHAT IS DATA ANALYSIS?

 Data analysis is the process of making sense of the data:

 getting meaning from the data

 understanding the data

Raw Data
 The unedited responses from a respondent exactly as indicated by that respondent.

Editing
 The process of checking the completeness, consistency, and legibility of data and
making the data ready for coding and transfer to storage.

Checking for Consistency


 Respondents match defined population
 Check for consistency within the data collection framework

Item Nonresponse
 The technical term for an unanswered question on an otherwise complete
questionnaire resulting in missing data.
Coding
 The process of assigning a numerical score or other character symbol to
previously edited data.

Codes
 Rules for interpreting, classifying, and recording data in the coding
process.
 The actual numerical or other character symbols assigned to raw data.

Data File
 The way a data set is stored electronically in spreadsheet-like form in
which the rows represent sampling units and the columns represent
variables.
QUANTIFYING DATA/CODING
 Before we can do any kind of analysis, we need to
quantify our data

 “Quantification” is the process of converting data to a


numeric format
 Convert social science data into a “machine-readable” form, a
form that can be read & manipulated by computer programs
QUANTIFYING DATA/CODING

Some transformations are simple:


 Assign numeric representations to nominal or ordinal variables:
 Turning male into “1” and female into “2”
 Assigning “3” to Very Interested, “2” to Somewhat Interested, “1” to Not Interested
 Assign numeric values to continuous variables:
 Turning born in 1972 into “48” (age in 2020)
 Number of children = “02”
(A short coding sketch follows below.)
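The deck carries out this coding in SPSS; purely as an illustration, a minimal pandas sketch of the same transformations (column names and code values here are hypothetical) might look like this:

```python
import pandas as pd

# Hypothetical raw survey responses
raw = pd.DataFrame({
    "gender": ["male", "female", "female"],
    "interest": ["Very Interested", "Not Interested", "Somewhat Interested"],
    "birth_year": [1972, 1985, 1990],
})

# Nominal: map category labels to numeric codes
raw["gender_code"] = raw["gender"].map({"male": 1, "female": 2})

# Ordinal: preserve the order of the categories in the codes
interest_codes = {"Not Interested": 1, "Somewhat Interested": 2, "Very Interested": 3}
raw["interest_code"] = raw["interest"].map(interest_codes)

# Continuous: derive age from year of birth (as of 2020)
raw["age"] = 2020 - raw["birth_year"]

print(raw)
```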
DEVELOPING CODE CATEGORIES
Some data are more challenging. Open-ended responses
must be coded.

 Two basic approaches:


 Begin with a coding scheme derived from the research
purpose.
 Generate codes from the data.
CODING QUANTITATIVE DATA
 Goal – reduce a wide variety of information to a more
limited set of variable attributes:
 “What is your occupation?”
 Use pre-established scheme: Professional, Managerial, Clerical,
Semi-skilled, etc.
 Create a scheme after reviewing the data

 Assign value to each category in the scheme: Professional = 1,

Managerial = 2, etc.
 Classify the response: “Secretary” is “clerical” and is coded as “3”
Two Basic Rules for Coding Categories:

 They should be exhaustive, meaning that a coding category should exist for all possible responses.

 They should be mutually exclusive and independent, meaning that there should be no overlap among the categories, to ensure that a subject or response can be placed in only one category.
Possible-Code Cleaning

Any given variable will have a specified set of answer choices and codes to match each answer choice.
For example, the variable gender will have three answer choices and codes for each: 1 for male, 2 for
female, and 0 for no answer. If you have a respondent coded as 6 for this variable, it is clear that an error
has been made since that is not a possible answer code. Possible-code cleaning is the process of checking
to see that only the codes assigned to the answer choices for each question (possible codes) appear in
the data file.

If you are not using a computer program that checks for coding errors during the data entry process, you
can locate some errors simply by examining the distribution of responses to each item in the data set.
For example, you could generate a frequency table for the variable gender and here you would see the
number 6 that was mis-entered. You could then search for that entry in the data file and correct it.
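As an illustration of possible-code cleaning, a minimal pandas sketch under the gender coding described above (the data values are hypothetical):

```python
import pandas as pd

# Hypothetical data file with gender coded 1 = male, 2 = female, 0 = no answer
data = pd.DataFrame({"gender": [1, 2, 2, 6, 1, 0, 2]})

# Frequency table: any value outside {0, 1, 2} is a coding error
print(data["gender"].value_counts(dropna=False))

# Locate the offending rows so the original questionnaire can be checked
valid_codes = {0, 1, 2}
errors = data[~data["gender"].isin(valid_codes)]
print(errors)
```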
• Data integrity is essential to successful research and
decision making.

• What limits data integrity?


• Researcher making up data
• Nonresponse
• Poor editing or coding

• Consistent coding is important for secondary data.


TYPES OF STATISTICS
 Descriptive statistics use the data to describe the sample or population through numerical summaries presented in graphs or tables.

 Inferential statistics draw inferences/conclusions and make predictions about a population based on a sample of data taken from that population.
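For illustration only, a short pandas sketch of the kind of frequency-and-percent summary shown in Table 4.2 below (the data here are hypothetical, not the study's):

```python
import pandas as pd

# Hypothetical sample of 150 respondents (cf. Table 4.2)
df = pd.DataFrame({
    "age_group": ["21-22", "23-24", "25-26"] * 50,
    "gender": ["Female", "Male"] * 75,
})

# Frequency and percentage for a categorical variable
freq = df["gender"].value_counts()
pct = df["gender"].value_counts(normalize=True) * 100
print(pd.DataFrame({"Frequency": freq, "Percent": pct.round(1)}))
```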
DESCRIPTIVE STATISTICS

Table 4.2 General Demographics

Variable   Category   Frequency   Percent
Age        21 - 22       39        26.0
           23 - 24       30        20.0
           25 - 26       30        20.0
           27 - 28       26        17.3
           29 - 30       25        16.7
Gender     Female        70        46.7
           Male          80        53.3
Race       Malay         47        31.3
           Chinese       40        26.7
           Indian        34        22.7
           Others        29        19.3
Total                   150       100.0
DESCRIPTIVE STATISTICS

Table 4.4 Reliability, Mean and Standard Deviation

Construct                     No. of Items   Reliability   Mean    Standard Deviation
Ecological Concern                  5            0.837     5.260         1.043
Instrumental Motive                 5            0.811     5.107         1.044
Symbolic Motive                     5            0.756     4.539         1.288
Trust                               8            0.887     5.174         0.985
Security                            4            0.858     5.073         1.229
Adoption of Driverless Cars         3            0.859     5.002         1.524


INFERENTIAL STATISTICS

Table 4.5 Regression Table and Model Summary

Hypothesis   Construct             Standardized Coefficient   P-Value (2-tailed)   P-Value (1-tailed)
H1           Ecological Concern             0.329                   0.003               0.0015
H2           Instrumental Motive           -0.224                   0.038               0.019
H3           Symbolic Motive                0.262                   0.006               0.003
H4           Trust                          0.237                   0.084               0.042
H5           Security                       0.360                   0.001               0.0005

F-Statistic    18.695
Significance    0.000
Adjusted R²     0.373
PARAMETRIC STATISTICAL TEST
 In the literal meaning of the terms, a parametric statistical test is one that makes assumptions about the parameters (defining properties) of the population distribution(s) from which one's data are drawn.

 For practical purposes, you can think of “parametric” as referring to tests, such as t-tests and the analysis of variance, that assume the underlying source population(s) to be normally distributed;

 they generally also assume that one's measures derive from an equal-interval scale (interval or ratio variables).

 And you can think of “non-parametric” as referring to tests that do not rest on these particular assumptions.
Basis for comparison           Parametric test                              Nonparametric test
Meaning                        A statistical test in which specific         A statistical test used in the case of
                               assumptions are made about the               non-metric (nominal or ordinal)
                               population parameters                        independent variables
Basis of test statistic        Distribution                                 Arbitrary
Measurement level              Interval or ratio                            Nominal or ordinal
Measure of central tendency    Mean                                         Median
Information about population   Completely known                             Unavailable
Applicability                  Variables                                    Variables and attributes
Correlation test               Pearson                                      Spearman


Parametric test                                      Nonparametric test
Independent-samples t-test                           Mann-Whitney U test
Paired-samples t-test                                Wilcoxon signed-rank test
One-way analysis of variance (ANOVA)                 Kruskal-Wallis test
One-way repeated-measures analysis of variance       Friedman's ANOVA
2. BASIC ASSUMPTION OF PARAMETRIC
TEST – NORMALLY DISTRIBUTED DATA SET
 Mean, median, and mode are equal.
 A standard deviation close to zero.
 Skewness and kurtosis close to zero, or within the range of -1 to +1.
 Shapiro-Wilk's W or Kolmogorov-Smirnov D test is not significant.
 A histogram of the variable shows rough normality and takes the form of a symmetric bell-shaped curve.
 The Q-Q plot forms an approximately straight line.
 The boxplot shows few outliers, the median is in the centre of the box, and all four quartiles are about equally ranged.
(A quick check of several of these indicators is sketched below.)
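The deck checks these assumptions in SPSS; as a rough illustration, equivalent numerical checks in Python with scipy (hypothetical data) might look like this:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
x = rng.normal(loc=3.4, scale=0.7, size=150)   # hypothetical purchase-intention scores

print("Mean:", x.mean(), "Median:", np.median(x))
print("Skewness:", stats.skew(x))          # close to 0 for normal data
print("Kurtosis:", stats.kurtosis(x))      # excess kurtosis; close to 0 for normal data

# Shapiro-Wilk: a non-significant p-value (> 0.05) is consistent with normality
w, p = stats.shapiro(x)
print(f"Shapiro-Wilk W = {w:.3f}, p = {p:.3f}")
```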
MEAN, MEDIAN, MODE, SD, SKEWNESS

Descriptives for Purchase, by gender (Statistic / Std. Error where applicable):

Statistic                        Male                   Female
Mean                             3.3645 / 0.06771       3.3866 / 0.05262
95% CI for Mean, Lower Bound     3.2305                 3.2824
95% CI for Mean, Upper Bound     3.4984                 3.4908
5% Trimmed Mean                  3.3715                 3.3769
Median                           3.4286                 3.4286
Variance                         0.582                  0.338
Standard Deviation               0.76302                0.58121
Minimum                          1.14                   1.86
Maximum                          5.00                   5.00
Range                            3.86                   3.14
Interquartile Range              0.86                   0.71
Skewness                         -0.243 / 0.215         0.173 / 0.219
Kurtosis                         -0.016 / 0.427         0.371 / 0.435

Reading the output (male purchase intention):
1. The mean (3.3645) is close to the median (3.4286).
2. The standard deviation (0.76302) is close to 0 and not more than 1.
3. Skewness and kurtosis are close to zero, within ±1.
USING SKEWNESS AND KURTOSIS TO
INDICATE DATA NORMALITY
SHAPIRO-WILK’S W OR KOLMOGOROV-
SMIRNOV D

Tests of Normality for Purchase, by gender:

           Kolmogorov-Smirnov(a)            Shapiro-Wilk
Gender     Statistic    df    Sig.          Statistic    df    Sig.
Male         0.105     127    0.002           0.984     127    0.141
Female       0.067     122    0.200           0.986     122    0.223

*. This is a lower bound of the true significance.
a. Lilliefors significance correction.

- Male: the Kolmogorov-Smirnov test is significant (0.002), indicating the data are not normally distributed (so another, non-parametric, test should be considered).
- The Kolmogorov-Smirnov test checks whether the data distribution differs significantly from the normal distribution; when it is significant, the data are not normally distributed.
- Male and female: Shapiro-Wilk is not significant (0.141; 0.223), indicating normally distributed data.
HISTOGRAM

Female purchase:

A symmetric bell-shaped curve indicates normally distributed data.
Q-Q PLOT

 The circles lie quite close to the line, indicating that the data follow the normal line closely; thus the data are normally distributed.
BOXPLOT

 The four quartiles are not of equal range, indicating that the data are not normally distributed.

 Removing the outliers may bring the distribution closer to normal.

(A plotting sketch for the histogram, Q-Q plot, and boxplot follows below.)
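The histogram, Q-Q plot, and boxplot above come from SPSS; a rough matplotlib/scipy equivalent (hypothetical data) is sketched here:

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

rng = np.random.default_rng(0)
x = rng.normal(3.4, 0.6, size=122)   # hypothetical female purchase-intention scores

fig, axes = plt.subplots(1, 3, figsize=(12, 4))
axes[0].hist(x, bins=15)                      # histogram: look for a symmetric bell shape
axes[0].set_title("Histogram")
stats.probplot(x, dist="norm", plot=axes[1])  # Q-Q plot: points should hug the line
axes[1].set_title("Q-Q plot")
axes[2].boxplot(x)                            # boxplot: few outliers, median near the centre
axes[2].set_title("Boxplot")
plt.tight_layout()
plt.show()
```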
3. STATE THE HYPOTHESES
 State the null and alternative hypotheses based on underlying theory.
 Choose the significance level: 0.01, 0.05, or 0.10.
 Collect data and compute the p-value using an appropriate statistical test.
 Compare the p-value (generated by the software) with the significance level (0.01, 0.05, or 0.10, determined by the researcher).
 Draw a conclusion.
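A minimal sketch of the comparison step, assuming the p-value has already been produced by the statistical software (the numbers are hypothetical):

```python
alpha = 0.05          # significance level chosen by the researcher
p_value = 0.003       # p-value reported by the statistical software (hypothetical)

if p_value < alpha:
    print("Reject the null hypothesis; the result is statistically significant.")
else:
    print("Fail to reject the null hypothesis.")
```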
4. TESTING OF MEANS:
STATE THE HYPOTHESES
a) T-Test is used to assess hypotheses involving a single sample,
paired sample, or two independent samples.

b) Three techniques of t-test procedures: one-sample t-test,


paired-sample t-test, and independent-samples t-test.

c) One-way Analysis of Variance (ANOVA) is used to compare the mean of a variable across two or more independent groups.
ONE SAMPLE T-TEST

 Used to compare the mean of a variable against a standard (test) value.

 H1: Mean customer satisfaction is significantly greater than 3.

 Univariate analysis.

 CS – interval variable.

 SPSS steps: Analyse/Compare Means/One-Sample T Test/bring CS to the “Test Variable” box/type “3” as the test value.

SPSS output:
a) The p-value in the one-sample t-test table is 0.001, meaning the sample mean is significantly different from 3.
b) The one-sample statistics table shows mean = 3.366, indicating that the sample mean is significantly greater than 3.
Thus, H1 is supported.
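For readers not using SPSS, a minimal scipy sketch of the same kind of one-sample t-test (the CS scores below are hypothetical; the one-tailed option requires scipy 1.6 or later):

```python
import numpy as np
from scipy import stats

cs = np.array([3.8, 3.2, 3.5, 3.9, 3.1, 3.6, 3.4, 3.3])   # hypothetical CS scores

# Two-tailed test against the standard mean of 3
t_stat, p_two = stats.ttest_1samp(cs, popmean=3)
print(f"t = {t_stat:.3f}, two-tailed p = {p_two:.3f}, sample mean = {cs.mean():.3f}")

# One-tailed version of H1 (mean > 3)
t_stat, p_one = stats.ttest_1samp(cs, popmean=3, alternative="greater")
print(f"one-tailed p = {p_one:.3f}")
```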
PAIRED-SAMPLE T-TEST

 Used to compare the mean of a variable


for two related samples or to compare
two means from the same sample.

 H2: The mean score of customer


satisfaction level between before and
after changes in the service is not equal.

 Bivariate analysis.

 CS before & after – Interval variable

 SPSS steps: Analyse/Compare


means/paired-samples t-test/bring “CS
before & after” to variable box/OK.
SPSS output:
a) Refer to the paired-samples test table: the p-value is < 0.05, indicating a mean difference between before and after the service changes.
b) Refer to the paired-samples statistics table: CS before = 3.46 and CS after = 3.37, suggesting that CS before the service changes is significantly higher than after.
Thus, H2 is supported.
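An equivalent paired-samples t-test in scipy, purely as an illustration (hypothetical before/after scores):

```python
import numpy as np
from scipy import stats

cs_before = np.array([3.6, 3.4, 3.8, 3.5, 3.2, 3.7])   # hypothetical paired scores
cs_after  = np.array([3.4, 3.3, 3.5, 3.4, 3.1, 3.5])

t_stat, p_value = stats.ttest_rel(cs_before, cs_after)
print(f"t = {t_stat:.3f}, p = {p_value:.3f}")
print(f"mean before = {cs_before.mean():.2f}, mean after = {cs_after.mean():.2f}")
```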
INDEPENDENT-SAMPLE T-TEST
Used to compare the mean of a variable between two unrelated groups.

H3: The mean score of customer satisfaction


between males and females is not equal.

Bivariate analysis.

Gender = Nominal

CS = Interval

SPSS Steps: Analyze/Compare means/Independent-


sample t-test/bring CS to “test variable” box/bring
gender to grouping variable box/define groups: 1
into group 1 box, 2 into group 2 box/continue/OK
SPSS output:
a) Levene's test gives a p-value of 0.98, suggesting equal variances in the male and female data.
b) Refer to the t-test row for equal variances assumed: the p-value is 0.558 (> 0.05), suggesting the male and female means are not significantly different.
c) Examine the group statistics: the male mean is 3.34 and the female mean is 3.38, which indeed are not significantly different. Thus, H3 is not supported.
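A scipy sketch of the Levene's-test-then-t-test sequence described above (hypothetical scores):

```python
import numpy as np
from scipy import stats

cs_male   = np.array([3.3, 3.5, 3.2, 3.6, 3.1, 3.4])   # hypothetical CS by gender
cs_female = np.array([3.4, 3.6, 3.3, 3.5, 3.2, 3.3])

# Levene's test for equality of variances
lev_stat, lev_p = stats.levene(cs_male, cs_female)
print(f"Levene p = {lev_p:.3f}")   # > 0.05: equal variances can be assumed

# Independent-samples t-test (equal_var mirrors the Levene result)
t_stat, p_value = stats.ttest_ind(cs_male, cs_female, equal_var=lev_p > 0.05)
print(f"t = {t_stat:.3f}, p = {p_value:.3f}")
```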
ONE-WAY ANALYSIS OF VARIANCE (ANOVA)
 Used to compare the mean of a variable
between three or more independent groups.

 H4: The mean score of the employees’ job


performance after the training programmes
A, B and C is not equal.

 Bivariate analysis.

 Performance: Interval

 Training: Nominal

 SPSS steps: Analyze/compare means/one


way ANOVA/move performance to
dependent list box/move training to factor
box/options/tick descriptive/tick
homogeneity of variance test/continue/OK
SPSS output:
a) Examine the homogeneity of variances table: the p-value is 0.986, so the assumption of homogeneity of variance is met.
b) Examine the ANOVA table: the p-value is 0.00 (< 0.05), so there is a significant difference across the means of training programmes A, B and C.
Thus, H4 is supported.
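A scipy sketch of a one-way ANOVA with a homogeneity-of-variance check (the performance scores are hypothetical):

```python
import numpy as np
from scipy import stats

# Hypothetical job-performance scores after three training programmes
prog_a = np.array([3.8, 4.1, 3.9, 4.0, 4.2])
prog_b = np.array([3.2, 3.4, 3.1, 3.5, 3.3])
prog_c = np.array([4.5, 4.6, 4.3, 4.7, 4.4])

# Homogeneity of variances
print("Levene p =", stats.levene(prog_a, prog_b, prog_c).pvalue)

# One-way ANOVA: a significant p-value indicates at least one group mean differs
f_stat, p_value = stats.f_oneway(prog_a, prog_b, prog_c)
print(f"F = {f_stat:.3f}, p = {p_value:.4f}")
```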
5. MEASURES OF ASSOCIATION

a. The Pearson correlation coefficient is used to measure the strength of a linear association between two variables.
b. Linear regression analysis is used to predict changes in a dependent variable based on the value of independent variable(s) or predictor(s).
c. Logistic regression analysis is used to predict changes in a categorical dependent variable based on the value of independent variable(s) or predictor(s).
PEARSON CORRELATION
COEFFICIENT
 Used to measure the strength of a
linear association between two
variables.
 H5: There is a relationship between
employee motivation and
performance.
 Bivariate analysis.

 Employee motivation &


performance = interval
 SPSS steps:
analyze/correlate/bivariate/move
motivation & performance to
variable box/tick pearson/OK
SPSS output:
a) Refer to the correlations table: r = 0.59 and p-value = 0.00 (< 0.05), indicating a significant result. H5 is supported.
b) Since r is positive, there is a positive and significant relationship between employee motivation and performance.
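A scipy sketch of the Pearson correlation (hypothetical motivation and performance scores):

```python
import numpy as np
from scipy import stats

motivation  = np.array([2.1, 3.4, 3.0, 4.2, 3.8, 2.7, 4.5])   # hypothetical scores
performance = np.array([2.5, 3.1, 3.2, 4.0, 3.9, 2.9, 4.3])

r, p_value = stats.pearsonr(motivation, performance)
print(f"r = {r:.2f}, p = {p_value:.3f}")   # a positive r with p < 0.05 would support H5
```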
LINEAR REGRESSION

 Used to predict changes in the dependent variable


based on the value of independent variable(s) or
predictor(s).
 H6: There is a relationship between performance
and involvement.
 H7: There is a relationship between performance
and welfare.
 Multivariate analysis.

 Dependent variable: performance = interval

 Independent variable: Involvement & welfare =

interval.
 SPSS steps: Analyze/regression/linear/move
performance to dependent box/move involvement
& welfare to independent box/statistics/tick
estimates, model fit, descriptives & collinearity
diagnostics/continue/OK
SPSS output:
 Refer to the model summary: adjusted R square = 0.45, indicating that 45% of the variance in the dependent variable can be predicted from the independent variables.
 Refer to the ANOVA table: p-value = 0.00 (< 0.05), indicating the regression equation is a good fit.
 Refer to the coefficients table: the standardized coefficient of involvement is 0.435 (p = 0.00) and of welfare is 0.314 (p = 0.00), indicating that both are significantly related to performance.
 Thus, H6 and H7 are supported.
LOGISTIC REGRESSION

 Used to predict changes in a categorical dependent variable based on the value of independent variable(s) or predictor(s).
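A statsmodels sketch of a logistic regression with a binary dependent variable (the variable names and data are hypothetical):

```python
import pandas as pd
import statsmodels.api as sm

# Hypothetical data: adopt (1) or not adopt (0) a product
df = pd.DataFrame({
    "adopt": [0, 1, 0, 1, 1, 0, 1, 0, 1, 1],
    "trust": [2.1, 4.2, 2.5, 3.9, 4.5, 2.8, 4.0, 3.0, 4.3, 3.8],
})

X = sm.add_constant(df[["trust"]])
logit_model = sm.Logit(df["adopt"], X).fit()
print(logit_model.summary())    # coefficients are log-odds of adoption
```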
6. MEDIATION AND MODERATION

a. In mediation, an initial independent variable X1 may influence the dependent variable Y through a mediator X2. The interrelationship can be grouped into three relationships: (a) the independent variable influences the dependent variable (X1 → Y), (b) the independent variable influences the mediator (X1 → X2), (c) the mediator directly influences the dependent variable (X2 → Y).
b. A moderator is a variable that alters the relationship between an independent variable and a dependent variable.
MEDIATION
 An initial independent variable X1 may influence the
dependent variable Y through a mediator X2.
 http://davidakenny.net/cm/mediate.htm

[Path diagram: (a) X1 → Y; (b) X1 → X2; (c) X2 → Y]
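One common way to examine the three relationships above is to estimate them as separate regressions (in the spirit of the Baron and Kenny approach, which is an assumption here, not the deck's prescribed method); a minimal statsmodels sketch with hypothetical data:

```python
import pandas as pd
import statsmodels.api as sm

# Hypothetical data: X1 (independent), X2 (mediator), Y (dependent)
df = pd.DataFrame({
    "X1": [1.2, 2.4, 3.1, 4.0, 2.8, 3.5, 1.9, 4.4],
    "X2": [1.5, 2.6, 3.0, 4.2, 2.9, 3.7, 2.0, 4.5],
    "Y":  [1.8, 2.9, 3.3, 4.4, 3.0, 3.9, 2.2, 4.7],
})

# (a) X1 -> Y
print(sm.OLS(df["Y"], sm.add_constant(df[["X1"]])).fit().params)
# (b) X1 -> X2
print(sm.OLS(df["X2"], sm.add_constant(df[["X1"]])).fit().params)
# (c) X2 -> Y, here estimated while controlling for X1
print(sm.OLS(df["Y"], sm.add_constant(df[["X1", "X2"]])).fit().params)
```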
MEDIATING VARIABLE
 surfaces between the time the independent
variables start operating to influence the
dependent variable and the time their impact is
felt on it.
 Example
MODERATION

 A moderator is a variable that alters the relationship between


an independent variable and a dependent variable.
 https://statistics.laerd.com/spss-tutorials/dichotomous-moderator-analysis-using-spss-statistics.php

[Path diagram: X → Y, with the moderator M acting on the X → Y relationship]
MODERATORS
 Moderating variable
 A moderator is a qualitative (e.g., gender, race, class) or quantitative (e.g., level of reward) variable that affects the direction and/or strength of the relation between an independent variable and a dependent variable.

 Example
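Moderation is often tested by adding an interaction term between the independent variable and the moderator; as an illustration only, a minimal statsmodels sketch with hypothetical data:

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical data: X (independent), M (moderator, e.g. 0 = male, 1 = female), Y
df = pd.DataFrame({
    "X": [2.0, 3.1, 4.2, 2.5, 3.8, 4.5, 2.2, 3.4, 4.0, 2.9],
    "M": [0, 0, 0, 0, 0, 1, 1, 1, 1, 1],
    "Y": [2.2, 3.0, 4.4, 2.6, 3.9, 3.5, 2.0, 2.8, 3.2, 2.5],
})

# The X:M interaction term tests whether M alters the X -> Y relationship
model = smf.ols("Y ~ X + M + X:M", data=df).fit()
print(model.summary())   # a significant X:M coefficient indicates moderation
```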
7. STRUCTURAL EQUATION MODELLING
(SEM)
Structural equation modelling (SEM) is used when a researcher is faced with a set of interrelated variables that none of the other multivariate techniques can address on its own. SEM is widely used for the following:

a. Confirmatory factor analysis (CFA) is used to examine the patterns of relationships among a number of variables and to specify which variables load onto which factors.

b. Estimating a path model is used to show the path diagram, decompose covariances and correlations, and obtain direct, indirect, and total effects.
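SEM is usually run in dedicated software such as AMOS or lavaan; purely as an illustration, a sketch using the third-party semopy package (the lavaan-style model syntax, variable names, and data file below are all hypothetical, and the exact API may vary by version):

```python
import pandas as pd
import semopy   # one of several SEM packages for Python

# Hypothetical model: two latent factors measured by observed items x1..x6,
# with a structural path between them (a simple CFA plus path model)
model_desc = """
Trust    =~ x1 + x2 + x3
Adoption =~ x4 + x5 + x6
Adoption ~ Trust
"""

data = pd.read_csv("survey.csv")   # hypothetical data file with columns x1..x6
model = semopy.Model(model_desc)
model.fit(data)
print(model.inspect())             # loadings, path estimates, and p-values
```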
PLAN YOUR ANALYSIS

Time Management
PLANNING YOUR ANALYSIS
 Leave enough time for data entry and data formatting
 Can take much longer than you expect

 In your codebook – note the TYPE of variable for each


measurement/question

 This will allow you to plan the proper levels and types of
analysis
PLANNING YOUR ANALYSIS
 If your research question requires a level of analysis your variables won't allow, you'll need to transform them:
 Create 'dummy' variables
 Collapse categories

 Determine the acceptable level of significance & apply the proper tests
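A pandas sketch of the two transformations mentioned above, dummy variables and collapsed categories (the data are hypothetical):

```python
import pandas as pd

df = pd.DataFrame({
    "race": ["Malay", "Chinese", "Indian", "Others", "Malay"],   # hypothetical
    "age":  [21, 27, 33, 45, 52],
})

# Create dummy (indicator) variables from a nominal variable
dummies = pd.get_dummies(df["race"], prefix="race")
df = pd.concat([df, dummies], axis=1)

# Collapse a continuous variable into ordered categories
df["age_group"] = pd.cut(df["age"], bins=[20, 30, 40, 60],
                         labels=["21-30", "31-40", "41-60"])
print(df)
```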
PLANNING YOUR ANALYSIS
 Proper planning will make things easier later

 Take good notes on any transformations of data that you


do

 Save all the elements of your analysis programs - In


SPSS, Paste to syntax file, and save SYNTAX file;
allows you to save a copy of all steps you perform in
SPSS.
