Factorial Analysis of Variance
SPSS:
Choose from menu: [Analyze] → Compare Means → One-Way ANOVA
Steps:
1. Add the IV & DV
2. Choose {Options} → Select (Descriptive) → Press {Continue}
3. Choose {Post-hoc} → Select (LSD) → Press {Continue}
4. Press {Ok}
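For reference, the equivalent pasted syntax looks roughly like this (a sketch: DV and IV are placeholders for your own variable names, and the exact subcommands SPSS pastes can vary slightly by version):
* One-way ANOVA with descriptive statistics and LSD post hoc tests.
ONEWAY DV BY IV
  /STATISTICS DESCRIPTIVES
  /MISSING ANALYSIS
  /POSTHOC=LSD ALPHA(0.05).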
SPSS:
Choose from menu: [Analyze] → General Linear Model → Univariate
Steps:
1. Add Credit_card_debt to the Dependent Variable box (DV)
2. Add Age_Cat & Gender to the Fixed Factor(s) list (IVs)
3. Choose {Options} → Select (Descriptive) → Press {Continue}
4. Choose {Post-Hoc} → Move the factors from the left pane to the (Post Hoc Tests for:) pane (in this example we add only Age_Cat, because Gender has only two categories and a post hoc test needs at least three groups) → Select (LSD) → Press {Continue}
5. Choose {Plots} → Move the factor with more categories to the (Horizontal Axis) box and the factor with fewer categories to the (Separate Lines) box → Click {Add} → Press {Continue}
6. Press {Ok}
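The steps above correspond roughly to the following pasted UNIANOVA syntax (a sketch: Credit_card_debt, Age_Cat and Gender are the variable names assumed in this example, and newer SPSS versions may paste extra keywords on the PLOT subcommand):
* Two-way (factorial) ANOVA: Credit_card_debt by Age_Cat, Gender, and their interaction.
UNIANOVA Credit_card_debt BY Age_Cat Gender
  /METHOD=SSTYPE(3)
  /INTERCEPT=INCLUDE
  /POSTHOC=Age_Cat(LSD)
  /PLOT=PROFILE(Age_Cat*Gender)
  /PRINT=DESCRIPTIVE
  /CRITERIA=ALPHA(.05)
  /DESIGN=Age_Cat Gender Age_Cat*Gender.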
Example:
It is claimed that Start_Salary & Current_Salary differ according to Gender & Minority & the interaction
between them.
Test this assumption using a 95% confidence level (α = 0.05).
Independent Variable (IV): Gender Var_Type: Categorical Called: Factor 1
Independent Variable (IV): Minority Var_Type: Categorical Called: Factor 2
SPSS:
Choose DB: Employee Data
Choose from menu: [Analyze] → General Linear Model → Multivariate
Steps:
1. Add Beginning_Salary to the Dependent Variables list (DV)
2. Add Current_Salary to the Dependent Variables list (DV)
3. Add Gender & Minority to the Fixed Factor(s) list (IVs)
4. Press {Ok}
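A rough pasted-syntax equivalent (a sketch, assuming the standard Employee Data sample-file variable names salbegin = Beginning Salary, salary = Current Salary, gender and minority; adjust to your dataset):
* Two-way multivariate GLM: both salaries by Gender, Minority, and their interaction.
GLM salbegin salary BY gender minority
  /METHOD=SSTYPE(3)
  /INTERCEPT=INCLUDE
  /PRINT=DESCRIPTIVE
  /CRITERIA=ALPHA(.05)
  /DESIGN=gender minority gender*minority.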
Extra Sources:
Link: https://www.ibm.com/support/pages/corrected-model-sums-squares-unianova-and-glm-multivariate
Problem: What is the meaning of the 'Corrected Model' term in the 'Tests of Between-Subjects Effects' table in the output of SPSS UNIANOVA or GLM Multivariate?
Regression Analysis
Example: Study the relationship between Beginning Salary & Months since Hire (IVs) and Current Salary (DV).
A general rule is: if the correlation between two independent variables is between -0.70 and 0.70,
multicollinearity between the two variables is most likely not a problem.
If the VIF for an independent variable is more than 10, multicollinearity is likely and the independent
variable should be removed from the analysis.
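For reference, the VIF for an independent variable j is computed as VIF_j = 1 / (1 - R_j²), where R_j² is the R² from regressing variable j on all the other independent variables; VIF = 10 therefore corresponds to R_j² = 0.90 (and to a tolerance of 1/VIF = 0.10).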
SPSS:
Choose DB: Employee Data
Choose from menu: [Analyze] → Regression → Linear
Steps:
1. Add Current_Salary to the Dependent box (DV)
2. Add Beginning_Salary & Months_since_Hire to the Independent(s) list (IVs)
3. Choose {Statistics} → Check (Collinearity Diagnostics) & (Durbin-Watson) → Press {Continue}
4. Press {Ok}
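The pasted syntax looks roughly like this (a sketch, assuming the Employee Data variable names salary = Current Salary, salbegin = Beginning Salary, jobtime = Months since Hire; COLLIN and TOL request the collinearity diagnostics and VIF, and /RESIDUALS DURBIN requests the Durbin-Watson statistic):
* Linear regression of Current Salary on Beginning Salary and Months since Hire.
REGRESSION
  /MISSING LISTWISE
  /STATISTICS COEFF OUTS R ANOVA COLLIN TOL
  /CRITERIA=PIN(.05) POUT(.10)
  /NOORIGIN
  /DEPENDENT salary
  /METHOD=ENTER salbegin jobtime
  /RESIDUALS DURBIN.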
Coefficients(a)
Model                  Unstandardized B   Std. Error   Standardized Beta        t     Sig.
1  (Constant)              -12120.813       3082.981                       -3.932    <.001
   Beginning Salary             1.914           .046              .882     41.271    <.001
   Months since Hire          172.297         36.276              .102      4.750    <.001
a. Dependent Variable: Current Salary
If you want to check the relative importance of the two IVs, compare their Standardized Beta coefficients.
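The Standardized Beta is simply the unstandardized B rescaled to standard-deviation units, Beta_j = B_j × (s_xj / s_y), so predictors measured on different scales become directly comparable; here Beginning Salary (.882) carries far more weight than Months since Hire (.102).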
Extra Resource:
What is the Durbin-Watson test?
The Durbin-Watson test is a measure of autocorrelation (also called serial correlation) in the residuals from a regression analysis. Autocorrelation is the similarity of a time series over successive time intervals. It can lead to underestimates of the standard error and can cause you to think predictors are significant when they are not.
The Durbin-Watson test reports a test statistic with a value from 0 to 4, where:
2 means no autocorrelation;
0 to <2 means positive autocorrelation (common in time-series data);
>2 to 4 means negative autocorrelation (less common in time-series data).
A rule of thumb is that test statistic values in the range of 1.5 to 2.5 are relatively normal. Values outside of this range could be cause for concern.
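For reference, the statistic is computed from the regression residuals e_t as DW = Σ(e_t - e_(t-1))² / Σ e_t², which is approximately 2(1 - r), where r is the first-order autocorrelation of the residuals; r = 0 gives DW ≈ 2.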
When an important independent variable is missing from the regression model, the residuals in the scatter plot tend to follow a pattern, which means the regression model needs revisiting to find and include the missing IV. (From Shady's notes)
Serial correlation (also called Autocorrelation) is where error terms in a time series transfer from one period to
another. In other words, the error for one time period a is correlated with the error for a subsequent time period b.
For example, an underestimate for one quarter’s profits can result in an underestimate of profits for subsequent
quarters.
Types of Autocorrelations:
The most common form of autocorrelation is first-order serial correlation, which can either be positive or
negative.
Positive serial correlation is where a positive error in one period tends to be followed by a positive error in
the following period.
Negative serial correlation is where a positive error in one period tends to be followed by a negative error in
the following period (the errors alternate in sign).
Second-order serial correlation is where an error affects data two time periods later. This can happen when your
data has seasonality. Orders higher than second-order do happen, but they are rare.
What is the difference between collinearity and interaction?
https://stats.stackexchange.com/questions/113733/what-is-the-difference-between-collinearity-and-interaction
An interaction may arise when considering the relationship among three or more variables, and describes a
situation in which the simultaneous influence of two variables on a third is not additive. Most commonly,
interactions are considered in the context of regression analyses.
The presence of interactions can have important implications for the interpretation of statistical models. If two
variables of interest interact, the relationship between each of the interacting variables and a third "dependent
variable" depends on the value of the other interacting variable. In practice, this makes it more difficult to predict
the consequences of changing the value of a variable, particularly if the variables it interacts with are hard to
measure or difficult to control.
Collinearity is a statistical phenomenon in which two or more predictor variables in a multiple regression model
are highly correlated, meaning that one can be linearly predicted from the others with a non-trivial degree of
accuracy. In this situation the coefficient estimates of the multiple regression may change erratically in response to
small changes in the model or the data. Collinearity does not reduce the predictive power or reliability of the model
as a whole, at least within the sample data themselves; it only affects calculations regarding individual predictors.
That is, a multiple regression model with correlated predictors can indicate how well the entire bundle of predictors
predicts the outcome variable, but it may not give valid results about any individual predictor, or about which
predictors are redundant with respect to others.
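To make the distinction concrete in SPSS terms: an interaction is something you add to the model, while collinearity is something you diagnose. A minimal sketch, assuming the Employee Data variable names salary, salbegin and jobtime (the product-term approach shown here is only one way to enter an interaction in the Linear Regression procedure):
* Create an interaction (product) term and enter it alongside the two predictors.
COMPUTE salbegin_x_jobtime = salbegin * jobtime.
EXECUTE.
* Re-run the regression; COLLIN and TOL show how collinear the three predictors are.
REGRESSION
  /STATISTICS COEFF OUTS R ANOVA COLLIN TOL
  /DEPENDENT salary
  /METHOD=ENTER salbegin jobtime salbegin_x_jobtime.
Note that a raw product term is usually highly correlated with its components, so the VIF check above matters even when the interaction is theoretically justified; centring salbegin and jobtime before computing the product reduces this collinearity.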