Chapter 7 - Data Analysis Process (Full - Updated)

Chapter 7: Data Analysis

Lecturer: Le Hoai Kieu Giang
Email: [email protected]

Contents:
1. Data preparation
2. Data analysis process
Comment on the following questionnaire items and answers:

1. How long have you lived at your current address?
Answer: 48

2. What is your age?
Answer: 32 years old
1. Data preparation

• Raw data: the unedited responses from a respondent, exactly as indicated by that respondent.
• Respondent error
• Non-respondent error

Preparation steps: Editing è Coding è Data file
1. Data preparation
v Editing
Data quality issues:
• Incomplete questionnaire
• The respondent did not understand or follow the instructions
• The responses show little variance
• The returned questionnaire is physically incomplete: one or more pages are missing
• The questionnaire is answered by someone who does not qualify for participation
1. Data preparation
v Editing

Case A è Incomplete response
Case B è Items omission
Case C è Little variance
Case D
1. Data preparation
v Editing

Editing is the review of the questionnaires with the objective of increasing accuracy and precision. Editing detects errors and omissions, corrects them when possible, and certifies that maximum data quality standards are achieved.
1. Data preparation
v Editing

Purposes:
• Accuracy and precision
• Consistency with the intent of the question and other information in the survey
• Uniform data entry
• Completeness
• Arrangement that simplifies coding and tabulation
1. Data preparation
v Editing

Field editing (preliminary editing):
• Field editing review is a responsibility of the field supervisor/interviewer
• Performed as soon as possible after the interview

To:
• Identify technical omissions such as a blank page on an interview form
• Check legibility of handwriting for open-ended responses
• Clarify responses that are logically or conceptually inconsistent
1. Data preparation
v Editing

In-house editing (central editing):
• A rigorous editing job performed by a centralized office staff
• Requires the editor to have a lot of experience and knowledge
1. Data preparation
v Editing
Treatment of unsatisfactory responses:
• Returning to the field
• Assigning missing values
• Discarding unsatisfactory respondents
1. Data preparation
v Coding
Coding involves assigning numbers or other symbols to answers so that the responses can be grouped into a limited number of categories.
• Create variable names for each question
• Convert the choices of each question into numbers/labels
1. Data preparation
v Coding
Coding rules:
• Appropriateness
• Exhaustiveness
• Mutual exclusivity
• Unidimensionality
1. Data preparation
v Coding

Appropriateness

The classification/grouping must be appropriate to the research problem/objective.
1. Data preparation
v Coding
Exhaustiveness
• The codes should represent the types to be studied.
• The “other” category should account for the smallest percentage.

Ex: Which app do you most often use to shop online?
1. Shopee
2. Lazada
3. Tiki
4. Sendo
5. Other: …
1. Data preparation
v Coding
Mutual exclusivity

Each response corresponds to only one cell in a category set.

(1) Professional
(2) Managerial
(3) Sales
(4) Clerical
(5) Crafts
(6) Operatives
(7) Unemployed
(8) Student

How would you code a participant’s answer that specified “salesperson and student”?
1. Data preparation
v Coding

Unidimensionality

Each answer corresponds to a unique dimension.

1. Data preparation
v Data file
A data file is a collection of related records that make up a data set.
2. Data analysis process

Start è Number of variables?
• One variable è Univariate analysis
• 2 variables è Bivariate analysis
• More than 2 variables è Multivariate analysis
2. Data analysis process
Univariate analysis:

- Descriptive analysis: frequency, mean, mode, median, variance, standard deviation,…

- T-test: used to determine whether there is a significant difference between the means of two groups, or to compare the mean of a sample to a known or hypothesized population mean. It is used when the population standard deviation is unknown and the sample size is small.

- Z-test: similar to a t-test, but it relies on the standard normal distribution and is used when the population standard deviation is known or the sample size is large.

- Kolmogorov-Smirnov test (KS test): appropriate when the data are at least ordinal. The research situation calls for a comparison of an observed sample distribution with a theoretical distribution, or determines whether two distributions are the same. (Ex: whether your data set follows a normal distribution)

- Chi-square test: particularly useful in tests involving nominal data but can be used for higher scales. It is a statistical method used to determine whether there is a significant association or difference between categorical variables. (Ex: whether color preference is evenly distributed across five colors)
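The slides later demonstrate these tests in SPSS; as an illustrative cross-check, the same univariate tests can be sketched in Python with SciPy. All sample data below are made up for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
sample = rng.normal(loc=5.1, scale=1.0, size=25)

# One-sample t-test: is the sample mean different from a hypothesized mean of 5?
t_stat, t_p = stats.ttest_1samp(sample, popmean=5.0)

# Kolmogorov-Smirnov test: does the sample follow a normal distribution?
# (standardize first, since kstest compares against the standard normal)
z = (sample - sample.mean()) / sample.std(ddof=1)
ks_stat, ks_p = stats.kstest(z, "norm")

# Chi-square goodness-of-fit: is color preference evenly distributed
# across five colors? (observed counts are invented for illustration)
observed = [18, 22, 20, 19, 21]
chi2_stat, chi2_p = stats.chisquare(observed)  # default: uniform expected counts

print(t_p, ks_p, chi2_p)
```

In each case a p-value below 0.05 would lead to rejecting the null hypothesis at the conventional significance level.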
2. Data analysis process
Bivariate analysis: Two-way tabulation (Gender × Function group)

                                Mkt & Sales   Product   Others    Total
Male     Count                       41          62        41      144
         % within Gender           28.5%       43.1%     28.5%   100.0%
         % within Functgr          74.5%       93.9%     53.2%    72.7%
         % of Total                20.7%       31.3%     20.7%    72.7%
Female   Count                       14           4        36       54
         % within Gender           25.9%        7.4%     66.7%   100.0%
         % within Functgr          25.5%        6.1%     46.8%    27.3%
         % of Total                 7.1%        2.0%     18.2%    27.3%
Total    Count                       55          66        77      198
         % within Gender           27.8%       33.3%     38.9%   100.0%
         % within Functgr         100.0%      100.0%    100.0%   100.0%
         % of Total                27.8%       33.3%     38.9%   100.0%
2. Data analysis process
Bivariate analysis:

Test method                    Application                              Illustration
Chi-square contingency test    Association between 2 nominal scaled     Association between income class and
                               variables                                favorite beer brand.
Spearman correlation           Correlation between 2 ordinal scaled     Correlation between level of reward (1, 2, 3)
                               variables                                and level of performance (1, 2, 3).
Pearson correlation            Correlation between 2 metric variables   Correlation between customer’s age and
                                                                        spending on healthcare.
Simple regression              Linear mathematical expression of        Sales and advertising expenses.
                               2 metric variables
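The three test methods in the table above can be sketched in Python with SciPy; the paired observations below (ages, spending, reward/performance levels, and the contingency counts) are all invented for illustration.

```python
import numpy as np
from scipy import stats

# Pearson correlation: two metric (interval/ratio) variables.
age      = np.array([22, 30, 35, 41, 50, 58, 63])
spending = np.array([150, 180, 210, 260, 340, 420, 500])
r, r_p = stats.pearsonr(age, spending)

# Spearman correlation: two ordinal variables, e.g. level of reward vs
# level of performance, each coded 1, 2, 3.
reward      = [1, 1, 2, 2, 3, 3, 3, 1, 2, 3]
performance = [1, 2, 2, 3, 3, 2, 3, 1, 1, 3]
rho, rho_p = stats.spearmanr(reward, performance)

# Chi-square contingency test: two nominal variables, e.g. income class
# (rows) by favorite beer brand (columns).
table = np.array([[30, 10, 20],
                  [20, 25, 15]])
chi2, p, dof, expected = stats.chi2_contingency(table)

print(round(r, 3), round(rho, 3), dof)
```

A significant chi-square (p < 0.05) would indicate an association between the two nominal variables; the correlations report both strength and direction.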
2. Data analysis process

Multivariate analysis:

q Involves the simultaneous analysis of more than 2 variables
q Advanced statistical techniques
q Powerful in solving complex research problems
2. Data analysis process

§ Dependence methods: one or more variables are designated as being predicted by a set of independent variables.
è Multiple regression, ANOVA, Conjoint analysis, Discriminant analysis, Structural Equation Modeling...

§ Interdependence methods: no variables are designated as being predicted by others; used when the researcher wants to explore the interrelationship among the variables taken together.
è Factor analysis, Cluster analysis, Multidimensional Scaling (MDS).
2. Data analysis process

Method                              Required scale of dependent    Required scale of independent
                                    variable(s)                    variable(s)
One dependent variable
  Multiple regression               Interval                       Interval
  ANOVA                             Interval                       Nominal
  Multiple regression
  with dummy variables              Interval                       Nominal
  Discriminant analysis             Nominal                        Interval
  Conjoint analysis                 Ordinal                        Nominal
Two or more dependent variables
  Canonical analysis                Interval                       Interval
  MANOVA                            Interval                       Nominal
Network structure including many dependent and independent variables
  SEM                               Interval                       Interval
2. Data analysis process
2.1. Descriptive statistics

v Measures of central tendency: mode, mean, and median

• The mode: the observation that occurs most frequently in a set of data

Unimodal:             4; 5; 5; 5; 6; 6; 7; 7; 7; 7; 7; 8; 9; 10
Bimodal/multimodal:   4; 5; 5; 5; 5; 6; 6; 7; 7; 7; 7; 8; 9; 10

è The mode has limited usefulness as a measure of central tendency.
2. Data analysis process
2.1. Descriptive statistics

v Measures of central tendency: mode, mean, and median

• The median (‘middle item’): the middle observation after all data have been placed in rank order.

Odd number of observations:
16  6  11  24  17  4  19  9  20  è arranged in order: 24  20  19  17  16  11  9  6  4
Median = 16 (the middle observation)

Even number of observations:
16  29  20  9  34  10  23  12  15  22  è arranged in order: 34  29  23  22  20  16  15  12  10  9
Median = (20 + 16) / 2 = 18
2. Data analysis process
2.1. Descriptive statistics

v Measures of central tendency: mode, mean, and median

• The mean: the sum of all the scores in a distribution divided by the number of those scores.

1; 3; 3; 5; 7; 9; 6; 8
Mean = (1 + 3 + 3 + 5 + 7 + 9 + 6 + 8) / 8 = 5.25

Since the mean is determined by the value of every score, it is the preferred measure of central tendency.
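The three measures of central tendency can be computed with Python’s standard library, using the same example data as the slides above:

```python
import statistics

# Mode: the most frequent observation (7 appears five times below).
scores = [4, 5, 5, 5, 6, 6, 7, 7, 7, 7, 7, 8, 9, 10]
print(statistics.mode(scores))      # 7

# Median: the middle observation after ranking; with an even number of
# observations it is the mean of the two middle values.
values = [16, 29, 20, 9, 34, 10, 23, 12, 15, 22]
print(statistics.median(values))    # (20 + 16) / 2 = 18.0

# Mean: the sum of all scores divided by the number of scores.
data = [1, 3, 3, 5, 7, 9, 6, 8]
print(statistics.mean(data))        # 42 / 8 = 5.25
```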
2. Data analysis process
2.1. Descriptive statistics

• Variance is the average squared deviation from the mean (M).

V = Σ(X − M)² / N

• Standard deviation (SD) is the square root of the mean squared deviation from the mean of the distribution. It reflects the amount of spread that the scores exhibit around the mean.

SD = √V
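A minimal sketch of these two formulas, applied to the data set from the mean example (note the formula above divides by N, i.e. the population variance):

```python
import math
import statistics

data = [1, 3, 3, 5, 7, 9, 6, 8]
M = sum(data) / len(data)                        # mean = 5.25

V = sum((x - M) ** 2 for x in data) / len(data)  # V = Σ(X − M)² / N
SD = math.sqrt(V)                                # SD = √V

# The standard library's population variance agrees with the formula:
assert V == statistics.pvariance(data)
print(V, round(SD, 3))
```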
2. Data analysis process

[Figure: negative (left-tailed) vs. positive (right-tailed) skewness. With negative skewness the mean lies below the median, which lies below the mode; with positive skewness the order is reversed.]

Skewness is the tendency of the deviations from the mean to be larger in one direction than in the other.
2. Data analysis process

Kurtosis is a measure of the relative peakedness or flatness of the curve defined by the frequency distribution.
2. Data analysis process
2.2. Distributions

v Normal distribution
It is affected only by random influences, has no skew, and the mean, median, and mode all fall at exactly the same point.
Characteristics:
• A normal distribution is the proper term for a probability bell curve.
• In the standard normal distribution (Z scores), the mean is 0 and the standard deviation is 1.
• It has zero skew and a kurtosis of 3.
• Normal distributions are symmetrical, but not all symmetrical distributions are normal.
2. Data analysis process
2.2. Distributions

v Normal distribution

For a normal distribution, 68% of the observations are within +/- one standard deviation of the mean, 95% are within +/- two standard deviations, and 99.7% are within +/- three standard deviations.
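The 68–95–99.7 rule can be verified directly from the standard normal CDF, e.g. with SciPy:

```python
from scipy.stats import norm

# Probability mass within k standard deviations of the mean of a
# normal distribution: P(-k < Z < k) = Φ(k) − Φ(−k).
within_1sd = norm.cdf(1) - norm.cdf(-1)   # ≈ 0.6827
within_2sd = norm.cdf(2) - norm.cdf(-2)   # ≈ 0.9545
within_3sd = norm.cdf(3) - norm.cdf(-3)   # ≈ 0.9973

print(round(within_1sd, 4), round(within_2sd, 4), round(within_3sd, 4))
```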
2. Data analysis process
2.3. Using SPSS to calculate and display descriptive statistics

v Frequency of demographic variables (SPSS screenshots omitted)

v Descriptive statistics for (in)dependent variables (all items) (SPSS screenshots omitted)

|Skewness| < 3 and |Kurtosis| < 10 (Kline, 2011)
2. Data analysis process
2.4. Assessing reliability of a measurement scale

v Internal consistency method:
• The more the test items intercorrelate, the higher the internal reliability.
è All the items in the test are measuring the same characteristic.
• Cronbach’s Alpha is very useful to indicate whether the items are measuring the same construct.
2. Data analysis process
2.4. Assessing reliability of a measurement scale

v Internal consistency method:
• Cronbach’s alpha ∈ [0; 1]
• An alpha of 0.6 or above is acceptable.
• Cronbach’s alpha ∈ [0.7; 0.8] è good reliability
• The Corrected Item-Total Correlation must be 0.3 or above.
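SPSS reports Cronbach’s alpha directly; as a sketch of what it computes, the coefficient can be derived from its definition, α = k/(k−1) · (1 − Σ item variances / variance of the total score). The 5-item response matrix below is simulated for illustration.

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha; items has respondents in rows, scale items in columns."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)        # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)    # variance of the summed score
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Simulate 100 respondents answering 5 Likert items that share one construct.
rng = np.random.default_rng(0)
base = rng.integers(1, 6, size=(100, 1))         # shared "construct" signal
noise = rng.integers(-1, 2, size=(100, 5))       # item-specific noise
items = np.clip(base + noise, 1, 5)

alpha = cronbach_alpha(items)
print(round(alpha, 3))
```

Because the simulated items share most of their variance, the resulting alpha comfortably exceeds the 0.6 acceptability threshold from the slide above.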
2. Data analysis process
2.4. Assessing reliability of a measurement scale

v An item may be eliminated when:
• Cronbach’s alpha < 0.6 (or 0.7)
• Corrected Item-Total Correlation < 0.3
• The alpha coefficient after removing the item is greater than the current alpha coefficient.
• BUT content validity is very important è carefully consider eliminating any item.
2. Data analysis process
2.5. Using SPSS to calculate Cronbach’s Alpha (SPSS screenshots omitted)

è Eliminate INF18
2. Data analysis process
2.6. Factor analysis to assess validity

Exploratory Factor Analysis (EFA): EFA aims to reduce data sets comprising a large number of variables into a smaller number of factors and thereby identify the underlying factor structure or model.

• Principal Component Analysis (PCA) - Varimax
• Principal Axis Factoring (PAF) - Promax
2. Data analysis process
2.6. Factor analysis to assess validity
v Requirements in EFA:
Sample size: the ratio of N to the number of variables should usually be at least 5:1, preferably 10:1.

Kaiser-Meyer-Olkin (KMO) and Bartlett’s Test:
• The KMO measures the sampling adequacy, which must be greater than 0.5 for a satisfactory factor analysis to proceed.
• Bartlett’s test is significant when its associated probability is less than 0.05 (p-value < 0.05). This means that the variables do have some correlation to each other.
è If either of these indicators is unsatisfactory, it is unwise to continue with the factor analysis.
2. Data analysis process
2.6. Factor analysis to assess validity
v Requirements in EFA:
Eigenvalues and % of variance extracted:
• The eigenvalue is the sum of the squared factor loadings of a particular factor. The eigenvalue for a factor indicates the total variance attributed to that factor.
• Only factors with eigenvalues greater than (or equal to) 1.0 are retained.
• The cumulative percentage of variance extracted by the factors should account for at least 50% of the variance.
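The eigenvalue ≥ 1 retention rule can be illustrated numerically: take the eigenvalues of the correlation matrix and accumulate the percentage of variance they explain. The six variables below are simulated to load on two underlying factors.

```python
import numpy as np

rng = np.random.default_rng(2)
f1 = rng.normal(size=(300, 1))
f2 = rng.normal(size=(300, 1))
# Six observed variables: three driven by factor 1, three by factor 2.
X = np.hstack([f1 + 0.4 * rng.normal(size=(300, 3)),
               f2 + 0.4 * rng.normal(size=(300, 3))])

R = np.corrcoef(X, rowvar=False)
eigenvalues = np.sort(np.linalg.eigvalsh(R))[::-1]   # descending order

retained = eigenvalues[eigenvalues >= 1.0]           # "eigenvalue >= 1" rule
cum_pct = eigenvalues.cumsum() / eigenvalues.sum() * 100

print(len(retained), round(cum_pct[len(retained) - 1], 1))
```

Two factors are retained, and together they account for well over the 50% cumulative-variance threshold from the slide above.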
2. Data analysis process
2.6. Factor analysis to assess validity

v An item is eliminated when:
• Factor loading < 0.5 è does not meet convergent validity.
• Cross-loading on multiple factors, |Fx1| − |Fx2| ≤ 0.3 è does not meet discriminant validity and unidimensionality.
2. Data analysis process
2.7. Using SPSS for exploratory factor analysis (EFA)

v EFA for each individual scale (SPSS screenshots omitted)

v EFA for all scales (SPSS screenshots omitted)

Factor loading = 0.46 < 0.5
|0.46 − 0.381| = 0.079 < 0.3
è Eliminate INF17
2. Data analysis process
2.8. Regression analysis

Simple linear regression (SLR): a technique for estimating a score or observation in one variable based on a score or observation in another variable. It enables estimates to be made of Y values from known values of X.

Y = b0 + b1 X

Y: dependent variable
X: independent variable
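A minimal sketch of fitting Y = b0 + b1 X by least squares, using the standard formulas b1 = cov(X, Y) / var(X) and b0 = mean(Y) − b1 · mean(X). The data (e.g. advertising expenses vs. sales) are invented for illustration.

```python
import numpy as np

X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # e.g. advertising expenses
Y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])  # e.g. sales

b1 = np.cov(X, Y, ddof=1)[0, 1] / np.var(X, ddof=1)  # slope
b0 = Y.mean() - b1 * X.mean()                        # intercept

y_hat = b0 + b1 * 6.0   # estimate Y for a new X value
print(round(b1, 3), round(b0, 3))
```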
2. Data analysis process
2.8. Regression analysis

Multiple regression: a technique for estimating the value of the dependent variable from values of two or more other variables.

Y = b0 + b1 X1 + b2 X2 + b3 X3 + …

The coefficient of multiple determination (R²) measures the strength of the relationship between Y and the independent variables.

Ex: an R² of 0.779 implies that all the predictor variables taken together explain 77.9% of the variation in the dependent variable.
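As a sketch, a multiple regression can be fitted with NumPy least squares, with R² computed as 1 − SS_residual / SS_total. The three predictors and their true coefficients below are simulated for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100
X = rng.normal(size=(n, 3))                  # three predictors
# True model: Y = 1 + 2*X1 - 1.5*X2 + 0.5*X3 + small noise.
Y = (1.0 + 2.0 * X[:, 0] - 1.5 * X[:, 1] + 0.5 * X[:, 2]
     + rng.normal(scale=0.5, size=n))

A = np.column_stack([np.ones(n), X])         # add intercept column
coefs, *_ = np.linalg.lstsq(A, Y, rcond=None)  # [b0, b1, b2, b3]

Y_hat = A @ coefs
r_squared = 1 - ((Y - Y_hat) ** 2).sum() / ((Y - Y.mean()) ** 2).sum()
print(round(r_squared, 3))
```

Since the noise is small relative to the signal, R² comes out close to 1: the predictors taken together explain most of the variation in Y.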
2. Data analysis process
2.8. Regression analysis

v Assumptions of multiple regression
• There is a linear correlation between the dependent variable and each independent variable (Pearson’s correlation coefficient, p-value < 0.05).
• Multicollinearity (high correlation between independent variables) should be avoided è VIF < 2.
• The data should be normally distributed.
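The VIF used in the multicollinearity check can be sketched from its definition, VIF_j = 1 / (1 − R_j²), where R_j² comes from regressing predictor j on the remaining predictors. The data below are simulated so that one predictor is nearly a copy of another.

```python
import numpy as np

def vif(X: np.ndarray) -> np.ndarray:
    """Variance inflation factor for each column of the predictor matrix X."""
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(n), others])
        coefs, *_ = np.linalg.lstsq(A, X[:, j], rcond=None)
        resid = X[:, j] - A @ coefs
        r2 = 1 - (resid ** 2).sum() / ((X[:, j] - X[:, j].mean()) ** 2).sum()
        out[j] = 1 / (1 - r2)
    return out

rng = np.random.default_rng(4)
x1 = rng.normal(size=200)
x2 = rng.normal(size=200)             # independent of x1 -> VIF near 1
x3 = x1 + 0.1 * rng.normal(size=200)  # nearly a copy of x1 -> large VIF

vifs = vif(np.column_stack([x1, x2, x3]))
print(vifs.round(1))
```

Here x1 and x3 fail the VIF < 2 rule badly while x2 passes, which is exactly the pattern the multicollinearity check is meant to catch.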
2. Data analysis process
2.9. Using SPSS for regression analysis (SPSS screenshots omitted)

v Create a representative variable
v Pearson correlation coefficient
2. Data analysis process
2.8. Regression analysis

SAT (dependent variable) = 0.202 SALA + 0.177 TRAIN + 0.289 COW + 0.254 COND + 0.115 NAT
3. Summary

1. Prepare data file (editing, coding, creating data file)
2. Descriptive statistics (frequency, mean, SD,…)
3. Exploratory factor analysis (EFA) (analyze each separate scale)
4. Assessing reliability of a measurement scale (Cronbach’s alpha)
5. Exploratory factor analysis (EFA) (for all scales)
6. Calculate Cronbach’s alpha again
7. Linear correlation & regression analysis
8. …
