Regression
R2 represents the fit of the model: how much of the variation in the dependent variable the model explains. As a rule of thumb, an R2 above 0.5 is considered a good fit.
Durbin-Watson (DW) checks for autocorrelation, i.e. correlation between the error terms of the regression. Autocorrelation is an undesirable property because it distorts the standard errors and significance tests of the regression estimates, so the inferences and predictions become unreliable.
The DW statistic ranges between 0 and 4, and a value around 2 is preferred. A DW below 2 indicates positive autocorrelation, a DW above 2 indicates negative autocorrelation, and a DW of 2 indicates no autocorrelation.
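A minimal sketch in Python of how these two diagnostics can be read off a fitted model, using a small simulated salary data set (years of experience, gender) in place of the notes' data:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.stattools import durbin_watson

# Simulated stand-in data: salary driven by years of experience and gender
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "years": rng.uniform(0, 20, 100),
    "gender": rng.integers(0, 2, 100),
})
df["salary"] = 17 + 1.6 * df["years"] + 3.6 * df["gender"] + rng.normal(0, 3, 100)

# OLS fit with the interaction term, as in the notes' salary equation
model = smf.ols("salary ~ years * gender", data=df).fit()

print("R-squared:", round(model.rsquared, 3))                  # fit of the model
print("Durbin-Watson:", round(durbin_watson(model.resid), 3))  # ~2 => no autocorrelation
```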
ANOVA
H0: The model is not a good fit
Ha: The model is a good fit
Since an F table is not available here, we look at the significance (p) value instead.
P value test (significance test):
If the p value is less than 0.05, reject the null hypothesis.
If the p value is greater than or equal to 0.05, do not reject the null hypothesis.
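Continuing the same sketch, the overall F test of the regression is this ANOVA test; its p-value drives the decision rule above:

```python
# Overall F test of the regression (the ANOVA table in SPSS output)
print("F =", round(model.fvalue, 2), " p =", round(model.f_pvalue, 4))

if model.f_pvalue < 0.05:
    print("Reject H0: the model is a good fit")
else:
    print("Do not reject H0: the model is not a good fit")
```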
Collinearity Statistics
These statistics test for multicollinearity, i.e. strong correlation among the independent variables, which makes the coefficient estimates unstable and the predictions unreliable. (Heteroskedasticity, where the residuals fan out in a cone shape, showing a positive or negative trend, is a separate problem that is checked from the residual plot.) If the value of VIF is greater than 10, there is serious multicollinearity.
Tolerance = 1/VIF
VIF = Variance Inflation Factor.
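Continuing the sketch, tolerance and VIF for each predictor can be computed with statsmodels; the cut-off applied is the VIF > 10 rule from above, with Tolerance = 1/VIF:

```python
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

# Design matrix with an intercept; VIF is computed per predictor (skip the constant)
X = sm.add_constant(df[["years", "gender"]])
vif = pd.Series(
    [variance_inflation_factor(X.values, i) for i in range(1, X.shape[1])],
    index=["years", "gender"],
)
print(pd.DataFrame({"VIF": vif, "Tolerance": 1 / vif}))  # VIF > 10 => serious multicollinearity
```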
Regression Equation
Y = a + b1X1 + b2X2 + b3X3 + e
Salary = 17.143 + 1.589*(No. of Years) + 3.643*(Gender) - 0.89*(Gender * No. of Years) + e
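A worked reading of this equation, plugging in purely illustrative values (gender coded 1, 5 years of experience):

```python
# Uses the coefficients quoted above; the inputs are hypothetical
def predicted_salary(years, gender):
    return 17.143 + 1.589 * years + 3.643 * gender - 0.89 * gender * years

print(predicted_salary(5, 1))   # 17.143 + 7.945 + 3.643 - 4.450 = 24.281
```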
Significance
Constant
H0: The constant is not significant
Ha: The constant is significant
P value test (see the sketch after the conclusions below):
If p < 0.05, we reject the null
If p >= 0.05, we do not reject the null
Constant is significant
Gender is significant
No of years of experience is significant
Interaction between gender and no of years is not significant.
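Continuing the regression sketch, these conclusions come from comparing each coefficient's p-value with 0.05:

```python
# Coefficient-level significance test on the simulated fit
for term, p in model.pvalues.items():
    verdict = "significant" if p < 0.05 else "not significant"
    print(f"{term}: p = {p:.4f} -> {verdict}")
```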
FACTOR ANALYSIS
KMO Test
This checks the adequacy (sampling adequacy) of the data for factor analysis. If the KMO value is above 0.5, we can proceed with the factor analysis.
Bartlett’s test of Sphericity
H0: The correlation matrix is an identity matrix, we cannot proceed with factor analysis
Ha: The correlation matrix is not an identity matrix, we can proceed with factor analysis
P value test
If p < 0.05, we reject the null
If p >= 0.05, we do not reject the null
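A minimal sketch of both adequacy checks with the factor_analyzer package, run on a small simulated item data set (the notes' car-attribute data is not reproduced here):

```python
import numpy as np
import pandas as pd
from factor_analyzer.factor_analyzer import calculate_kmo, calculate_bartlett_sphericity

# Simulated ratings: 12 items driven by 3 underlying factors plus noise
rng = np.random.default_rng(1)
latent = rng.normal(size=(200, 3))
weights = rng.normal(size=(3, 12))
items = pd.DataFrame(latent @ weights + rng.normal(scale=0.5, size=(200, 12)),
                     columns=[f"item{i}" for i in range(1, 13)])

kmo_per_item, kmo_total = calculate_kmo(items)
chi2, p_value = calculate_bartlett_sphericity(items)

print("Overall KMO:", round(kmo_total, 3))                       # > 0.5 => adequate
print("Bartlett chi2 =", round(chi2, 1), " p =", round(p_value, 4))
# p < 0.05 => reject H0 (identity correlation matrix): factor analysis can proceed
```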
Communalities
A communality is the extent to which an item correlates with all other items. Higher communalities are
better. If communalities for a particular variable are low (between 0.0-0.4), then that variable may
struggle to load significantly on any factor.
No communalities are below 0.4; therefore, the extraction results will be good.
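Continuing the factor-analysis sketch, communalities can be read from a fitted FactorAnalyzer; the 0.4 cut-off is the one used above (the three factors here match the simulated data, not the notes' six):

```python
from factor_analyzer import FactorAnalyzer

# Unrotated extraction on the simulated items
fa = FactorAnalyzer(n_factors=3, rotation=None)
fa.fit(items)

communalities = pd.Series(fa.get_communalities(), index=items.columns)
print(communalities.round(3))
print("Any communality below 0.4?", bool((communalities < 0.4).any()))
```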
Factor Extraction
Any factor with an eigenvalue greater than 1 is extracted. The total variance explained by the extraction appears in the last column of the last row, and the variance explained by each individual factor appears in the second-to-last column of each row.
Scree Plot
Wherever there is a sharp bend (elbow), extraction stops. Here the bend cannot be clearly recognized, so we fall back on the eigenvalue criterion (eigenvalues are plotted on the y-axis): factors with an eigenvalue > 1 are extracted. Therefore we extract 6 factors here.
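Continuing the sketch, the eigenvalue > 1 (Kaiser) rule can be applied to the eigenvalues FactorAnalyzer reports; the simulated data will retain fewer factors than the notes' six:

```python
# Original eigenvalues of the correlation matrix (the scree plot values)
eigenvalues, _ = fa.get_eigenvalues()
n_retained = int((eigenvalues > 1).sum())

print("Eigenvalues:", eigenvalues.round(2))
print("Factors with eigenvalue > 1:", n_retained)
```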
Notation: communalities are denoted h2; rotated component matrix values are given as p1, p2, etc.; F1, F2 stand for factor 1, factor 2.
Rotated Component Matrix
Here we keep loadings greater than 0.5 in absolute value (take the modulus); the resulting groupings are listed below, with a sketch after the list.
1st factor – Convenience features, Driving Pleasure, Colors Available, Safety
2nd factor – engine capacity, fuel efficiency, running and maintenance cost, economical
3rd factor – brand name, after sales service, performance information available
4th factor – purpose of purchase, car image and positioning, advertising and marketing
5th factor – price on road, discount scheme, resale value
6th factor – looks and design
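Continuing the sketch, a varimax-rotated solution with loadings kept only where their absolute value exceeds 0.5 is how groupings like the ones above are read; the variable names and results here come from the simulated data, not the car study:

```python
# Varimax rotation on the retained factors
fa_rot = FactorAnalyzer(n_factors=n_retained, rotation="varimax")
fa_rot.fit(items)

loadings = pd.DataFrame(fa_rot.loadings_,
                        index=items.columns,
                        columns=[f"F{i+1}" for i in range(n_retained)])
# Blank cells are loadings at or below |0.5|
print(loadings.where(loadings.abs() > 0.5).round(2))
```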
Latent Constructs
The name you can give to each of the extracted factors:
1st factor – Consumer preference
2nd factor – technical efficiencies
3rd factor – value addition
4th factor – marketing
5th factor – price
6th factor - glamour
Box’s Test of Equality of Covariance Matrices
H0: The covariance matrices of the three types of flowers are not different (they are the same)
Ha: The covariance matrices of the three types of flowers are different
P value test
If sig. is < 0.05, then reject the null
If sig. is >=0.05 , then do not reject null
Wilks’ Lambda
H0: There is no discriminating power for the 4 variables (PL, PW, SL, SW)
Ha: There is discriminating power for the 4 variables (PL, PW, SL, SW)
P value test
If sig. is < 0.05, then reject the null
If sig. is >=0.05 , then do not reject null
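A minimal sketch of the Wilks’ Lambda test with statsmodels' MANOVA, using the iris data as a stand-in for the three flower types (SL, SW, PL, PW are sepal/petal length and width):

```python
import pandas as pd
from sklearn.datasets import load_iris
from statsmodels.multivariate.manova import MANOVA

# Iris as a stand-in: three flower types, four measurements
iris = load_iris(as_frame=True)
df_iris = iris.data.copy()
df_iris.columns = ["SL", "SW", "PL", "PW"]
df_iris["flower"] = iris.target

mv = MANOVA.from_formula("SL + SW + PL + PW ~ C(flower)", data=df_iris)
print(mv.mv_test())   # includes Wilks' lambda and its p-value for the flower factor
```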
The standardized discriminant function coefficients work like regression coefficients; they are used to write out the discriminant functions.
1st Function
D1 = 0.608PW + 0.887PL - 0.518SW - 0.359SL
2nd Function
D2 = 0.581PW - 0.431PL + 0.695SW + 0.093SL
Group centroids (mean discriminant values):
1st Function: Flower 0 (-7.218), Flower 2 (1.822), Flower 1 (5.396)
2nd Function: Flower 2 (-0.728), Flower 0 (0.206), Flower 1 (0.522)
Function 1 shows more variance (wider separation) between the three types of flowers; therefore, function 1 is the better discriminant function.
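A sketch of the discriminant functions with scikit-learn's LDA on the same iris stand-in: scalings_ plays the role of the function coefficients, the means of the transformed scores are the group centroids, and the explained-variance ratio shows why function 1 discriminates better. Exact numbers will differ from the notes.

```python
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X = df_iris[["PW", "PL", "SW", "SL"]]
y = df_iris["flower"]

# Two discriminant functions for three groups
lda = LinearDiscriminantAnalysis(n_components=2).fit(X, y)
scores = pd.DataFrame(lda.transform(X), columns=["D1", "D2"])
scores["flower"] = y.values

print(pd.DataFrame(lda.scalings_, index=X.columns, columns=["D1", "D2"]))  # function coefficients
print(scores.groupby("flower")[["D1", "D2"]].mean())                       # group centroids
print(lda.explained_variance_ratio_)                                       # D1 carries most of the separation
```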
------------------------------------------------------------------------------------------------------------------------------------