
Applications of Multivariate Data Analysis in Research using SPSS
Logistic Regression Analysis
O This is used to predict a categorical dependent
variable from a set of continuous and/or
dichotomous predictors.
O The objective is to predict and explain the bases
for each object’s group membership through a
set of independent variables selected by the
researcher.
Some Variations
O Binary Logistic Regression – is used when the
dependent variable is a binary or dichotomous
variable.
O Multinomial Logistic Regression - is used when
the dependent variable has more than two
categories.
O Ordinal Logistic Regression – is used if the
categories are ranked in some increasing or
decreasing order.
Some Applications of Logistic
Regression Analysis
O In marketing and finance, this can be used to
predict the success or failure of a new product
or determine the category of credit risk for a
person. This may also be used to predict if a
firm will be successful.
O In education, it can help administrators to
decide whether a student should be admitted to
graduate school, classify students as to
vocational interests, etc.
Some Research Examples
General Goals
O Determine the effects of the independent variables on the probability of group membership.
O Attain the highest predictive accuracy possible
with a given set of predictor variables.
Some Important Terms
O Odds of success – the ratio of the probability of success, P, to the probability of failure, 1 – P.
Suppose that 80 students (50 female and 30 male) took an achievement test and the results show that 40 females passed while 25 males passed. The odds are:

Odds(success, if female) = 0.80 / (1 – 0.80) = 4
Odds(success, if male) = 0.833 / (1 – 0.833) = 5
Some Important Terms
O Odds Ratio (OR). Given the odds of success for males and females, the odds ratio for success is:

OR = Odds(success, male) / Odds(success, female) = 5 / 4 = 1.25

The odds of passing for a male are 1.25 times the odds for a female.
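To make the arithmetic concrete, the same figures can be reproduced with a short Python sketch (the counts are taken from the example above):

# Pass counts from the example: 40 of 50 females and 25 of 30 males passed
p_female = 40 / 50                        # probability of success, females = 0.80
p_male = 25 / 30                          # probability of success, males = 0.833

odds_female = p_female / (1 - p_female)   # = 4
odds_male = p_male / (1 - p_male)         # = 5
odds_ratio = odds_male / odds_female      # = 1.25

print(odds_female, odds_male, odds_ratio)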
Dichotomous or Binary Logistic
Regression Analysis
From the linear model

Ŷi = b0 + b1X1i + b2X2i + … + bkXki

Y is transformed as ln(odds), or logit. The Binary Logistic Model is:

ln(odds) = b0 + b1X1i + b2X2i + … + bkXki
Binary Logistic Regression
Analysis provides…
O Predicted category membership of each case
O Probability of membership
O Classification table
O Ordering of the relative importance or impact of
the predictor variables
Requirements for Binary Logistic
Regression Analysis
O The predictors must be interval, ratio, or dichotomous categorical variables.
O The form of relationship must be linear and
must only include relevant predictors.
O The expected value of the error term is zero.
O There is no correlation between the error and
the predictors.
O There is an absence of perfect multicollinearity among the predictors.
Sample Size Consideration
O The recommended sample size for each group
in the dependent variable is at least ten (10)
observations per estimated parameter. (Hair,
2010)
Binary Logistic Regression
Analysis with one predictor
O Using birthweight.sav data
O Is age significantly related to birthweight of babies?
O Dependent variable: birthweight of baby (0 –
normal, 1 – low birthweight)

Model:

ln(odds of low birthweight) = b0 + b1*age
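The same one-predictor model can be reproduced outside SPSS; the sketch below uses Python's statsmodels and assumes birthweight.sav has been exported to a CSV file with columns named low (0 = normal, 1 = low birthweight) and age.

import pandas as pd
import statsmodels.formula.api as smf

# Assumed CSV export of birthweight.sav with columns: low (0/1) and age
df = pd.read_csv("birthweight.csv")

# Binary logistic model: ln(odds of low birthweight) = b0 + b1*age
model = smf.logit("low ~ age", data=df).fit()
print(model.summary())        # coefficients, standard errors, z statistics, p-values
print(-2 * model.llf)         # -2 log-likelihood (-2LL) of the fitted model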


SPSS output: The Null or Baseline Model
Statistical Measures of Model
Fit: Full Model vs Baseline
Model

O This shows the chi-square test for the change in the –2LL value. The prediction of low birth weight from age is not statistically significant.
SPSS output: These are pseudo R² measures.

Pseudo R² values are R²-like measures of effect size. They should not be reported as an actual proportion of variance in the dependent variable Y explained by the predictors together. Here, the value is 1.8%, which is very low in terms of practical significance.
This test is a goodness-of-fit test. A good "fit" between observed and expected frequencies is indicated by a non-significant chi-square statistic (p > 0.05). If p is less than 0.05, the estimates of the model parameters are inaccurate.

A better model fit is indicated by a smaller difference between the actual and predicted values. A p-value less than 0.05 indicates that there are still significant differences between the actual and predicted values of Y.
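The chi-square test on the change in –2LL can be reproduced by fitting the baseline (intercept-only) model and the full model and comparing their log-likelihoods; a sketch, again assuming the birthweight.csv export described earlier:

import pandas as pd
import statsmodels.formula.api as smf
from scipy import stats

df = pd.read_csv("birthweight.csv")                  # assumed CSV export of birthweight.sav
full = smf.logit("low ~ age", data=df).fit(disp=0)
null = smf.logit("low ~ 1", data=df).fit(disp=0)     # baseline model with intercept only

# Change in -2LL between the baseline and full models, tested against chi-square
chi_square = 2 * (full.llf - null.llf)
df_change = int(full.df_model - null.df_model)
p_value = stats.chi2.sf(chi_square, df_change)
print(f"Model chi-square = {chi_square:.3f}, df = {df_change}, p = {p_value:.3f}")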
Test for significance of predictor
variables, Wald test
O This is obtained by dividing the squared logistic
regression coefficient by its squared standard
error.
Test for significance of predictor
variables, Wald test

ln (odds, low birthwt)= 0.32 – 0.48*age

Since p > 0.05, mother's age is not significantly associated with low birth weight.
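The Wald statistic reported by SPSS can be recomputed from the fitted coefficient and its standard error; a minimal sketch using the same assumed birthweight.csv export:

import pandas as pd
import statsmodels.formula.api as smf
from scipy import stats

df = pd.read_csv("birthweight.csv")                  # assumed CSV export of birthweight.sav
model = smf.logit("low ~ age", data=df).fit(disp=0)

# Wald = (coefficient / standard error)^2, referred to chi-square with 1 degree of freedom
b, se = model.params["age"], model.bse["age"]
wald = (b / se) ** 2
p_value = stats.chi2.sf(wald, df=1)
print(f"b = {b:.3f}, Wald = {wald:.3f}, p = {p_value:.3f}")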
Binary Logistic Regression
Analysis with more than one
predictor
O Using birthwt.sav data
O Dependent variable: birthweight of baby (0 –
normal, 1 – low birthweight)
O Predictors: age of mother, weight of mother at
last menstrual period (lwt) and history of
hypertension
O The prediction of the incidence of low birth weight from age, weight during last menstruation, and history of hypertension of the mother is statistically significant, χ²(3) = 13.329, p < 0.01.
O The model fits the data adequately, as evidenced by a non-significant goodness-of-fit chi-square statistic (p > 0.05). Thus, one can trust the estimates of the model parameters.
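A sketch of the corresponding three-predictor model in Python, assuming birthwt.sav has been exported to birthwt.csv with columns named low, age, lwt, and ht (history of hypertension, 0/1); the column names are assumptions:

import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("birthwt.csv")   # assumed CSV export of birthwt.sav; column names are assumptions
model = smf.logit("low ~ age + lwt + ht", data=df).fit()
print(model.summary())            # B, S.E., test statistics and p-values for each predictor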
The Logistic Regression Model

O Log odds of low birth weight = 1.903 – 0.034*age + 1.752*history of hypertension – 0.016*weight during last menstruation
The Logistic Regression Model

O A mother with a history of hypertension (coded as 1) has higher odds of having a low birth weight baby.
O The heavier the mother during the last menstruation, the lower the odds of having a low birthweight baby.
Hit Ratio
O Percentage of subjects correctly classified by
the model.
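The hit ratio can be computed by classifying each case at the usual 0.5 cut-off and comparing the predicted with the observed category; a sketch under the same assumed birthwt.csv export:

import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("birthwt.csv")   # assumed CSV export; column names are assumptions
model = smf.logit("low ~ age + lwt + ht", data=df).fit(disp=0)

# Classify as low birthweight when the predicted probability is at least 0.5
predicted = (model.predict(df) >= 0.5).astype(int)
hit_ratio = (predicted == df["low"]).mean() * 100
print(f"Hit ratio = {hit_ratio:.1f}% of cases correctly classified")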
In Summary,
O Model significance is tested with a chi-square test on the difference in –2LL between the proposed and null models, and effect size is assessed with several R²-like measures.
O Interpretation of the coefficients for direction can be directly
assessed in the original coefficients (+ or – signs) or
indirectly in the exponentiated coefficients (<1, negative,
>1, positive).
O Magnitude is assessed by Exp(B), with the percentage change in the odds of the dependent variable equal to (Exp(B) – 1) x 100.
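The exponentiated coefficients and the corresponding percentage change in the odds can be obtained directly from the fitted model; a sketch under the same assumptions as above:

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("birthwt.csv")   # assumed CSV export; column names are assumptions
model = smf.logit("low ~ age + lwt + ht", data=df).fit(disp=0)

exp_b = np.exp(model.params)      # Exp(B): change in odds for a one-unit increase in the predictor
pct_change = (exp_b - 1) * 100    # percentage change in the odds, (Exp(B) - 1) x 100
print(pd.DataFrame({"B": model.params, "Exp(B)": exp_b, "% change in odds": pct_change}).round(3))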
Multinomial Logistic Regression
O This allows the categorical dependent variable
to contain more than two categories.
O It can be applied to such situations as
predicting voting preference (vote for candidate
A, vote for candidate B, vote for candidate C,
don’t vote) or predicting laptop brand purchase
(Asus, Macbook or Acer).
Some Research Examples
Favorite Ice Cream Flavor
O Dependent Variable: Favorite ice cream flavor
O Predictors: gender, score on video game and
score on puzzle game
O Using SPSS data set: mlogit.sav
Favorite Ice Cream Flavor

O Vanilla is the most preferred ice cream flavor (47.5%).
O For this example, vanilla (coded 2) will be used as the reference group.
Favorite Ice Cream Flavor

O Two models will be generated:
O Model 1: Chocolate relative to vanilla
O Model 2: Strawberry relative to vanilla
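The two models can be estimated together with a multinomial logit outside SPSS; the sketch below assumes mlogit.sav has been exported to mlogit.csv with columns flavor (1 = chocolate, 2 = vanilla, 3 = strawberry), female (1 = female), video, and puzzle, all of which are assumed names and codes.

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("mlogit.csv")    # assumed CSV export of mlogit.sav; names and codes are assumptions

# Recode so that vanilla gets the lowest code, since statsmodels uses the lowest
# category as the reference group (here 0 = vanilla, 1 = chocolate, 2 = strawberry)
df["flavor_ref"] = df["flavor"].map({2: 0, 1: 1, 3: 2})

model = smf.mnlogit("flavor_ref ~ female + video + puzzle", data=df).fit(disp=0)
print(model.summary())            # one set of coefficients per non-reference category
print(np.exp(model.params))       # Exp(B): odds relative to the vanilla reference group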
On SPSS:
SPSS output:
O For Model 1 (Chocolate relative to vanilla), the significant predictors
are gender and puzzle score.
O The odds of preferring chocolate relative to vanilla are higher for
female than male (female coded as 1), that is, the odds of preferring
chocolate relative to vanilla are 2.263 times greater for females than
males.
O The odds of preferring chocolate relative to vanilla decrease as puzzle score increases.
O For Model 2 (strawberry relative to vanilla), the significant predictor is puzzle score (p < 0.05).
O The odds of preferring strawberry relative to vanilla increase, Exp(B) = 1.044, as puzzle score increases.
Factor Analysis
O Factor Analysis is an interdependence
technique whose primary purpose is to define
the underlying structure among the variables in
the analysis.
O Factor analytic techniques can achieve their
purposes from either an exploratory or
confirmatory perspective.
Exploratory Factor Analysis
Some research examples
The Design
O It is performed most often only on metric
variables.
O If a study is designed to reveal factor structure,
strive to have at least five variables for each
proposed factor.
The Design
O For sample size (Hair, 2010):
O The sample must have more observations than
variables.
O The minimum absolute sample size should be
50 observations.
O Strive to maximize the number of observations
per variable, with a desired ratio of 5
observations per variable.
Sample Size
(Comrey and Lee, 1992):
Size Remarks
50 Minimum; very poor
100 Poor
200 Fair
300 Good
500 Very good
At least 1000 Excellent
Rule of thumb 10 per item
Assumptions in Factor Analysis
O A strong conceptual foundation needs to
support the assumption that a structure does
exist before the factor analysis is performed.
O A statistically significant Bartlett's test of
sphericity (sig < 0.05) indicates that sufficient
correlations exist among the variables to
proceed.
Assumptions in Factor Analysis
O KMO-Measure of sampling adequacy (MSA)
values must exceed 0.50 for both the over-all
test and each individual variable.
O Variables with MSA < 0.5 should be omitted
from the factor analysis one at a time, with the
smallest being omitted each time.
O The Kaiser-Meyer-Olkin Index tells whether
factor analysis can be done.
KMO-MSA Index interpretation
Index Value Interpretation
Greater than 0.9 Marvelous
Close to 0.8 Meritorious
Close to 0.7 Middling
Close to 0.6 Mediocre
Close to 0.5 Miserable
Below 0.5 Unacceptable
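Both checks can be reproduced with the factor_analyzer package in Python; the sketch below assumes the ten BPI Express Assist items used later in the deck have been exported from bea.sav to bea.csv with columns Q1–Q10.

import pandas as pd
from factor_analyzer.factor_analyzer import calculate_bartlett_sphericity, calculate_kmo

items = pd.read_csv("bea.csv")[[f"Q{i}" for i in range(1, 11)]]   # assumed CSV export of bea.sav

chi_square, p_value = calculate_bartlett_sphericity(items)
print(f"Bartlett's test of sphericity: chi-square = {chi_square:.2f}, p = {p_value:.4f}")  # want p < 0.05

kmo_per_item, kmo_overall = calculate_kmo(items)
print(f"Overall KMO-MSA = {kmo_overall:.3f}")          # must exceed 0.50 overall
print(pd.Series(kmo_per_item, index=items.columns))    # drop items with MSA < 0.50 one at a time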
Choosing Factor Models
O The component analysis model is most
appropriate when data reduction is paramount.
O The common factor model is best in well-
specified theoretical applications.
O Both arrive at essentially identical results if the
number of variables exceeds 30 or the
communalities exceed 0.6 for most variables.
Choosing Factor Models
Some authors also suggest:
O Maximum Likelihood method if data are
normally distributed.
O Principal Axis Factoring if multivariate normality
is severely violated.
Choosing Number of Factors
O Stopping criteria to determine the initial number of
factors to retain:
O Factors with eigenvalues greater than 1.0. (Latent root
criterion)
O A predetermined number of factors based on research
objectives and/or prior research. (A priori criterion)
O Enough factors to meet a specified percentage of
variance explained, usually 60% or higher. (Percentage
of variance criterion)
O Factors shown by the scree test to have substantial
amounts of common variance (i.e., factors before
inflection points). (Scree test criterion)
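The latent root and scree criteria can be examined by inspecting the eigenvalues of the correlation matrix; a sketch with the factor_analyzer package, under the same bea.csv assumption as before:

import pandas as pd
from factor_analyzer import FactorAnalyzer

items = pd.read_csv("bea.csv")[[f"Q{i}" for i in range(1, 11)]]   # assumed CSV export of bea.sav

fa = FactorAnalyzer(rotation=None)       # unrotated solution, used only to inspect eigenvalues
fa.fit(items)
eigenvalues, _ = fa.get_eigenvalues()
print(eigenvalues.round(3))                                           # scree: plot against factor number
print("Factors with eigenvalue > 1:", int((eigenvalues > 1).sum()))   # latent root criterion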
Choosing Factor Rotation
Methods
The goal of rotation is to simplify and clarify the data
structure.
O Orthogonal Rotation Methods
O Are the most widely used methods
O Are preferred when the research goal is data
reduction to either a smaller number of variables or
a set of uncorrelated measures for subsequent use
in other multivariate techniques.
O Examples are quartimax, varimax, and equimax
Choosing Factor Rotation
Methods
O Orthogonal Rotation Methods
O The varimax method simplifies the interpretation of
the factors. This minimizes the number of variables
that have high loadings on each factor.
O The quartimax method simplifies the interpretation
of the observed variables. It minimizes the number
of factors needed to explain the variables.
O The equimax method combines varimax and
quartimax.
Factor Rotation Methods
O Oblique Rotation Methods (OBLIMIN in SPSS)
O Are best suited to the goal of obtaining several
theoretically meaningful factors or constructs,
because few constructs in the real world are
uncorrelated. These methods allow the rotated factors to be correlated.
Assessing Factor Loadings
O Factor loadings represent the degree to which
each of the variables correlates with each of the
factors.
O Although values of ±0.30 and ±0.40 are minimally acceptable, values greater than ±0.50 are considered necessary for practical significance.
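A two-factor solution with an orthogonal varimax rotation, and its loadings, can be sketched as follows, again assuming the bea.csv export of the ten items:

import pandas as pd
from factor_analyzer import FactorAnalyzer

items = pd.read_csv("bea.csv")[[f"Q{i}" for i in range(1, 11)]]   # assumed CSV export of bea.sav

# Principal-component extraction with varimax rotation; rotation="oblimin" gives an oblique solution
fa = FactorAnalyzer(n_factors=2, method="principal", rotation="varimax")
fa.fit(items)

loadings = pd.DataFrame(fa.loadings_, index=items.columns, columns=["Factor 1", "Factor 2"])
print(loadings.round(3))          # loadings above +/-0.50 are taken as practically significant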
Assessing Factor Loadings
Factor Loading    Sample Size Needed for Significance at 0.05
.30 350
.35 250
.40 200
.45 150
.50 120
.55 100
.60 85
.65 70
.70 60
.75 50
Source: Computations made with SOLO power analysis, BMDP Statistical
Software, Inc., 1993
Regarding Summated Scales
O Assess its unidimensionality with EFA or CFA.
O The reliability score is ideally a minimum of
0.70, although a 0.60 level can be used in
exploratory research.
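Cronbach's alpha for a summated scale can be computed directly from the item scores; a sketch, assuming the ten items of the example that follows have been exported from bea.sav to bea.csv:

import pandas as pd

items = pd.read_csv("bea.csv")[[f"Q{i}" for i in range(1, 11)]]   # assumed CSV export of bea.sav

# Cronbach's alpha: k/(k-1) * (1 - sum of item variances / variance of the summated scale)
k = items.shape[1]
alpha = k / (k - 1) * (1 - items.var(ddof=1).sum() / items.sum(axis=1).var(ddof=1))
print(f"Cronbach's alpha = {alpha:.3f}")   # ideally >= 0.70; 0.60 may be acceptable in exploratory work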
Example:
O Survey about BPI Express Assist with 198
respondents (bea.sav file)
Q1 BEA is never out of order.

Q2 The touch screen monitor of BEA always works smoothly.

Q3 BEA always prints out the queuing (customer) number.

Q4 My BEA grievances are settled within reasonable time by the bank.

Q5 BEA’s interface is user-friendly.

Q6 BEA requires fewer steps to accomplish what I want to do.

Q7 BEA’s instructions are clear and understandable.

Q8 BEA saves me time and effort.

Q9 BEA is accessible and conveniently located inside the branch.

Q10 It is easy to process several transactions with BEA.


Reliability
Bivariate correlations
Correlations

        Q1      Q2      Q3      Q4      Q5      Q6      Q7      Q8      Q9      Q10
Q1      1
Q2      .387**  1
Q3      .340**  .500**  1
Q4      .341**  .301**  .397**  1
Q5      .273**  .319**  .223**  .193**  1
Q6      .274**  .203**  .199**  .188**  .456**  1
Q7      .318**  .341**  .339**  .260**  .537**  .514**  1
Q8      .186**  .375**  .203**  .180*   .305**  .259**  .402**  1
Q9      .284**  .320**  .224**  .222**  .296**  .322**  .365**  .380**  1
Q10     .282**  .322**  .271**  .189**  .348**  .427**  .461**  .371**  .479**  1

**. Correlation is significant at the 0.01 level (2-tailed).
*. Correlation is significant at the 0.05 level (2-tailed).
KMO and Bartlett’s test

Meritorious!

Significant since p < 0.05
Communalities
What are communalities?
O A variable’s communality is the estimate of its
shared or common variance among the
variables as represented by the derived factors.
Common variance is accounted for based on a
variable’s correlation with all other variables in
the analysis.
O Variables with communalities less than 0.5 are considered not to have acceptable levels of explanation.
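A variable's communality can be recovered as the sum of its squared loadings across the retained factors; a sketch continuing the assumed bea.csv example:

import pandas as pd
from factor_analyzer import FactorAnalyzer

items = pd.read_csv("bea.csv")[[f"Q{i}" for i in range(1, 11)]]   # assumed CSV export of bea.sav
fa = FactorAnalyzer(n_factors=2, method="principal", rotation="varimax")
fa.fit(items)

# Communality of each item = sum of squared loadings across the two retained factors
communalities = (fa.loadings_ ** 2).sum(axis=1)
print(pd.Series(communalities, index=items.columns).round(3))     # items below 0.5 are poorly explained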
Eigenvalues and Total Variance

There are two eigenvalues greater than 1, so two factors are extracted. The proportion of variance accounted for by the two factors is 51.602%.
Scree Plot
Factor Solution
Factor Interpretation
(Survey about BPI Express Assist with 198 respondents, using bea.sav file)

Factor 1: Reliability and Responsiveness
Q1 BEA is never out of order.
Q2 The touch screen monitor of BEA always works smoothly.
Q3 BEA always prints out the queuing (customer) number.
Q4 My BEA grievances are settled within reasonable time by the bank.

Factor 2: Ease of Use
Q5 BEA's interface is user-friendly.
Q6 BEA requires fewer steps to accomplish what I want to do.
Q7 BEA's instructions are clear and understandable.
Q8 BEA saves me time and effort.
Q9 BEA is accessible and conveniently located inside the branch.
Q10 It is easy to process several transactions with BEA.
References:
a. Cohen, J., et al. (2002). Applied Multiple Regression/Correlation Analysis for the Behavioral Sciences, 3rd Edition.
b. Comrey, A. L., & Lee, H. B. (1992). A First Course in Factor Analysis.
c. Dimitrov, D. M. (2008). Quantitative Research in Education. Whittier Publications Inc.
d. Hair, J. F., et al. (2010). Multivariate Data Analysis: A Global Perspective. Pearson.
e. IBM Corp. (2012). IBM SPSS Statistics for Windows, V. 21.0.
f. Wilkinson, L. (1979). Tests of Significance in Stepwise Regression. Psychological Bulletin, 86.
