Chapter 7 - Data Analysis Process (Full - Updated)
Data Analysis
Lecturer: Le Hoai Kieu Giang
Email: [email protected]
Contents:
1. Data preparation
Comment on the following questionnaire items and answers:
1. Data preparation
❖ Editing
Data Quality Issues:
• Incomplete questionnaire
Case C
→ Little variance
Case D
Editing detects errors and omissions, corrects them when possible, and
certifies that maximum data quality standards are achieved.
Purposes: to ensure that the data are
• Accurate and precise.
• Consistent with the intent of the question and other information in the survey.
• Uniformly entered.
• Complete.
• Arranged to simplify coding and tabulation.
To:
• Identify technical omissions such as a blank page on an interview form
Treatment of Unsatisfactory Responses:
❖ Coding
Coding involves assigning numbers or other symbols to answers so that
the responses can be grouped into a limited number of categories.
Coding rules:
• Appropriateness (suitability)
• Exhaustiveness (comprehensiveness)
Appropriateness
Exhaustiveness
Unidimensionality
❖ Data file
A data file is a collection of related records that make up a data set.
2. Data analysis process
Start
2 variables
- T-test: used to determine whether there is a significant difference between the means of two groups, or to compare a sample mean with a known or hypothesized population mean. It is appropriate when the population standard deviation is unknown and the sample size is small.
- Z-test: similar to the t-test, but used when the population standard deviation is known or the sample size is large, so that the test statistic can be referred to the standard normal distribution.
- Kolmogorov-Smirnov test (KS test): appropriate when the data are at least ordinal and the research situation calls for comparing an observed sample distribution with a theoretical distribution, or for determining whether two distributions are the same. (Ex: whether your data set follows a normal distribution)
- Chi-square test: particularly useful in tests involving nominal data, but it can be used for higher scales. It determines whether there is a significant association or difference between categorical variables. (Ex: whether color preference is evenly distributed across five colors)
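The color-preference example can be worked through as a chi-square goodness-of-fit test. The sketch below is only illustrative: the observed counts are hypothetical (not from the slides), and the decision uses the standard table critical value of 9.488 for 4 degrees of freedom at the 0.05 level.

```python
# Chi-square goodness-of-fit: is color preference evenly spread over five colors?
# The observed counts are hypothetical, invented for this illustration.
observed = [30, 14, 34, 45, 27]            # respondents choosing each color
n = sum(observed)                          # total respondents
expected = [n / len(observed)] * len(observed)  # even distribution under H0

# Test statistic: sum over categories of (O - E)^2 / E
chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

df = len(observed) - 1                     # degrees of freedom = categories - 1
critical_value = 9.488                     # chi-square table value, df = 4, alpha = 0.05
reject_h0 = chi_square > critical_value

print(f"chi-square = {chi_square:.3f}, df = {df}, reject H0: {reject_h0}")
```

With these made-up counts the statistic exceeds the critical value, so the hypothesis of an even distribution would be rejected at the 5% level.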
Bivariate analysis:
Gender by Function group crosstabulation (only the Male row survives):

                                Function group
Gender  Statistic               Mkt & Sales   Prodct   Others   Total
Male    Count                   41            62       41       144
        % within Gender         28.5%         43.1%    28.5%    100.0%
        % within Functgr        74.5%         93.9%    53.2%    72.7%
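The "% within Gender" row of the crosstab is each cell count divided by the row total. A minimal sketch, using only the Male counts shown in the table (the variable names are illustrative):

```python
# Row percentages for the Male row of the Gender x Function group crosstab.
male_counts = {"Mkt & Sales": 41, "Prodct": 62, "Others": 41}
male_total = sum(male_counts.values())     # 144 in the table

# % within Gender = cell count / row total * 100, rounded to one decimal
row_pct = {group: round(100 * count / male_total, 1)
           for group, count in male_counts.items()}

for group, pct in row_pct.items():
    print(f"{group}: {pct}%")
```

The "% within Functgr" row is computed the same way but divides by the column totals, which are not shown in the surviving part of the table.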
Multivariate analysis:
• Dependence methods: one or more variables have been designated as being predicted by a set of independent variables.
• Interdependence methods: used when the researcher wants to explore the interrelationships among the variables taken together.
• The mode: the observation that occurs most frequently in a set of data.
Unimodal: 4; 5; 5; 5; 6; 6; 7; 7; 7; 7; 7; 8; 9; 10 (mode = 7)
Bimodal/multimodal: 4; 5; 5; 5; 5; 6; 6; 7; 7; 7; 7; 8; 9; 10 (modes = 5 and 7)
16 29 20 9 34 10 23 12 15 22
• The mean: The sum of all the scores in a distribution divided by the
number of those scores.
1; 3; 3; 5; 7; 9; 6; 8
Mean = (1 + 3 + 3 + 5 + 7 + 9 + 6 + 8) / 8 = 5.25
Since the mean is determined by the value of every score, it is generally the preferred
measure of central tendency for interval and ratio data, although it is sensitive to extreme values.
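The mode and mean examples above can be checked with Python's standard `statistics` module. This is a small sketch using the number lists from the slides; the variable names are illustrative.

```python
from statistics import mean, multimode

# Data sets copied from the mode and mean examples
unimodal = [4, 5, 5, 5, 6, 6, 7, 7, 7, 7, 7, 8, 9, 10]
bimodal = [4, 5, 5, 5, 5, 6, 6, 7, 7, 7, 7, 8, 9, 10]
scores = [1, 3, 3, 5, 7, 9, 6, 8]

# multimode returns every value tied for the highest frequency
modes_unimodal = multimode(unimodal)
modes_bimodal = multimode(bimodal)

# mean = sum of scores / number of scores
average = mean(scores)

print(modes_unimodal, modes_bimodal, average)
```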
2.1. Descriptive statistics
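$$V = \frac{\sum (X - \bar{X})^2}{N}$$
where $X$ is each score, $\bar{X}$ is the mean, and $N$ is the number of scores.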
• Standard deviation (SD) is the square root of the mean squared
deviation from the mean of the distribution. It reflects the amount of
spread that the scores exhibit around the mean.
$$SD = \sqrt{V}$$
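As a worked check, the variance and standard deviation can be computed for the eight scores used in the mean example. The sketch assumes the population form (dividing by N), which matches the "mean squared deviation" wording above; statistical packages often report the sample form (dividing by N - 1) instead.

```python
from statistics import fmean, pstdev, pvariance

scores = [1, 3, 3, 5, 7, 9, 6, 8]          # same scores as the mean example

# Manual calculation: mean of the squared deviations from the mean
m = fmean(scores)
v_manual = sum((x - m) ** 2 for x in scores) / len(scores)
sd_manual = v_manual ** 0.5

# Library equivalents (population variance and standard deviation)
v_library = pvariance(scores)
sd_library = pstdev(scores)

print(f"V = {v_manual:.4f}, SD = {sd_manual:.4f}")
```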
[Figure: negative skewness (left-tailed) versus positive skewness (right-tailed), with the positions of the mode, median, and mean marked on each curve]
Kurtosis is a measure of the relative peakedness or flatness of the curve defined by the frequency distribution.
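Skewness and kurtosis can both be computed from central moments. The sketch below uses the simple moment-based definitions (kurtosis of 3 for a normal distribution); SPSS reports bias-corrected versions and "excess" kurtosis, so its figures will differ somewhat. The data sets are hypothetical.

```python
from statistics import fmean

def central_moment(data, k):
    """k-th central moment: mean of (x - mean)^k."""
    m = fmean(data)
    return fmean([(x - m) ** k for x in data])

def skewness(data):
    """Moment-based skewness: negative = left-tailed, positive = right-tailed."""
    return central_moment(data, 3) / central_moment(data, 2) ** 1.5

def kurtosis(data):
    """Moment-based kurtosis: equals 3 for a normal distribution."""
    return central_moment(data, 4) / central_moment(data, 2) ** 2

symmetric = [1, 2, 3, 4, 5]                # hypothetical, perfectly symmetric
right_tailed = [1, 1, 2, 2, 3, 10]         # hypothetical, one large value

print(skewness(symmetric), skewness(right_tailed), kurtosis(symmetric))
```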
2.2. Distributions
❖ Normal distribution
It is affected only by random influences, has no skew and the mean, median,
and mode all fall at exactly the same point.
Characteristics:
• A normal distribution is the proper term for a probability bell curve.
• In the standard normal distribution (Z-scores), the mean is 0 and the standard deviation is 1.
• It has zero skew and a kurtosis of 3.
• Normal distributions are symmetrical, but not all symmetrical distributions are
normal.
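Z-scores rescale any set of scores to a mean of 0 and a standard deviation of 1. A minimal sketch, using hypothetical scores and the population standard deviation:

```python
from statistics import fmean, pstdev

scores = [1, 3, 3, 5, 7, 9, 6, 8]          # hypothetical raw scores

mu = fmean(scores)                         # mean of the raw scores
sigma = pstdev(scores)                     # population standard deviation

# z = (x - mean) / SD for every score
z_scores = [(x - mu) / sigma for x in scores]

print([round(z, 2) for z in z_scores])
print(round(fmean(z_scores), 6), round(pstdev(z_scores), 6))
```

Standardizing changes the scale only, not the shape: skewed data remain skewed after conversion to Z-scores.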
2.3. Using SPSS to calculate and display descriptive statistics
2.4. Assessing reliability of a measurement scale
• The more the test items intercorrelate, the higher the internal reliability.
→ All the items in the test are measuring the same characteristic.
• An item is a candidate for removal when the alpha coefficient after removing the item is greater than the current
alpha coefficient.
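The comparison between the current alpha and the alpha after removing an item can be sketched in a few lines. This uses the usual Cronbach's alpha formula, k/(k-1) × (1 − sum of item variances / variance of total scores); the item names and responses below are hypothetical, not taken from the course data set.

```python
from statistics import variance

def cronbach_alpha(items):
    """items: list of item columns, each a list with one score per respondent."""
    k = len(items)
    totals = [sum(row) for row in zip(*items)]       # total score per respondent
    item_variance_sum = sum(variance(col) for col in items)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))

def alpha_if_item_deleted(items):
    """Alpha recomputed with each item left out in turn."""
    return [cronbach_alpha(items[:i] + items[i + 1:]) for i in range(len(items))]

# Hypothetical responses from six respondents to four items
q1 = [1, 2, 3, 4, 5, 4]
q2 = [2, 2, 3, 4, 5, 5]
q3 = [1, 3, 3, 5, 5, 4]
q4 = [5, 1, 4, 2, 3, 1]                    # deliberately inconsistent item
scale = [q1, q2, q3, q4]

alpha = cronbach_alpha(scale)
alpha_deleted = alpha_if_item_deleted(scale)
print(round(alpha, 3), [round(a, 3) for a in alpha_deleted])
```

In this made-up example, dropping `q4` raises alpha well above the current value, which is the pattern that flags an item for removal.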
2.5. Using SPSS to calculate Cronbach’s Alpha
→ Eliminate INF18
2.6. Factor analysis to assess validity
❖ Requirements in EFA:
Sample size: the ratio of observations (N) to variables should usually be at least 5:1, preferably 10:1.
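The sample-size rule of thumb can be turned into a quick check. The function name, labels, and example figures below are illustrative only.

```python
def efa_sample_size_check(n_respondents, n_variables):
    """Classify the N-to-variables ratio against the 5:1 and 10:1 guidelines."""
    ratio = n_respondents / n_variables
    if ratio >= 10:
        return ratio, "preferred (at least 10:1)"
    if ratio >= 5:
        return ratio, "acceptable (at least 5:1)"
    return ratio, "too small (below 5:1)"

# Hypothetical study: 200 respondents, 25 observed variables
ratio, verdict = efa_sample_size_check(200, 25)
print(f"ratio = {ratio:.1f}:1, {verdict}")
```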
2.7. Using SPSS for exploratory factor analysis (EFA)
❖ EFA for each individual scale:
❖ EFA for all scales:
→ Eliminate INF17
2.8. Regression analysis
Y = b0 + b1 X
Y: dependent variable
X: independent variable
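For a single predictor, the coefficients b0 and b1 can be estimated by ordinary least squares. The sketch below assumes that estimation method (the slide gives only the model equation) and uses hypothetical data that lie exactly on a line.

```python
from statistics import fmean

def fit_simple_regression(x, y):
    """Least-squares estimates for Y = b0 + b1 * X."""
    mean_x, mean_y = fmean(x), fmean(y)
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    b1 = s_xy / s_xx                       # slope
    b0 = mean_y - b1 * mean_x              # intercept
    return b0, b1

# Hypothetical data generated from Y = 1 + 2X
x = [1, 2, 3, 4, 5]
y = [3, 5, 7, 9, 11]

b0, b1 = fit_simple_regression(x, y)
print(f"Y = {b0:.2f} + {b1:.2f} X")
```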
Y = b0 + b1 X1 + b2 X2 + b3 X3 + …
Ex: an R² of 0.779 implies that all the predictor variables taken together
explain 77.9% of the variation in the dependent variable.
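R² is the share of the variation in Y that the fitted values account for: 1 minus the residual sum of squares divided by the total sum of squares. The sketch below uses hypothetical observations and a single hypothetical predictor for brevity; the same formula applies to predictions from a model with several predictors.

```python
from statistics import fmean

def r_squared(y, y_hat):
    """R^2 = 1 - SS_residual / SS_total."""
    mean_y = fmean(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot

# Hypothetical observations and predictions from the line Y = 2.2 + 0.6 X
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
y_hat = [2.2 + 0.6 * xi for xi in x]

r2 = r_squared(y, y_hat)
print(f"R^2 = {r2:.3f}")                   # share of variation in Y explained
```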
2.9. Using SPSS for regression analysis
❖ Create a representative variable
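The slide does not say how the representative variable is built. One common approach, assumed here, is to average the items of a scale for each respondent so that the scale enters the regression as a single variable. The item names and responses below are hypothetical.

```python
from statistics import fmean

# Hypothetical responses: one dict per respondent, one key per questionnaire item
responses = [
    {"item1": 4, "item2": 5, "item3": 3},
    {"item1": 2, "item2": 3, "item3": 1},
    {"item1": 5, "item2": 5, "item3": 5},
]
scale_items = ["item1", "item2", "item3"]  # items retained for this scale

# Representative variable = mean of the scale's items for each respondent
for respondent in responses:
    respondent["scale_mean"] = fmean([respondent[item] for item in scale_items])

print([r["scale_mean"] for r in responses])
```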
Pearson correlation coefficient
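The Pearson correlation coefficient measures the strength and direction of the linear relationship between two variables, from -1 to +1. A minimal sketch of the calculation with hypothetical data:

```python
from statistics import fmean

def pearson_r(x, y):
    """Pearson r = covariation of X and Y / sqrt(variation of X * variation of Y)."""
    mean_x, mean_y = fmean(x), fmean(y)
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    s_yy = sum((yi - mean_y) ** 2 for yi in y)
    return s_xy / (s_xx * s_yy) ** 0.5

# Hypothetical paired observations
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r = pearson_r(x, y)
print(f"r = {r:.4f}")
```

In SPSS output, the coefficient is reported together with its significance level, which indicates whether the correlation differs significantly from zero.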