
SPSS GUIDELINE
FOR FINAL YEAR PROJECT - RESEARCH SUPERVISION
FACULTY OF BUSINESS, ACCOUNTANCY AND MANAGEMENT
Prepared by: Dineswary Nadarajan, Noor Ain Zeni
Data Analysis: Step-by-Step Approach
• 1. Pilot Study – 20-30 responses – Reliability Analysis (Method 1 only)
• 2. Sample size – Rule of 5
• 3. Reliability Analysis for all data (Methods 1 & 2)
• 4. Compute the mean for each IV and the DV
• 5. Normality Test (DV only) – check outliers
• 6. Linearity Test (each IV) – R² and slope
• 7. Homogeneity Test (Levene test & residual-vs-predicted scatter plot)
• 8. Independence of errors (Durbin-Watson)
• 9. Descriptive Analysis (tables & pie/bar charts for demographic questions, line charts for secondary data)
• 10. Correlation Analysis (2-tailed test, Sig. value, positive/negative direction of the relationship, strength – for correlations with the DV only)
• 11. Regression Analysis (model summary – R² & Durbin-Watson; ANOVA table – Sig. value, F statistic not required;
coefficient table – VIF, B & Beta)
• 12. Model equation
Data Entry (Variable View Tab)
Step 1: Code all the answers in the questionnaire with numbers.
Step 2: Define each variable in the Variable View tab in SPSS.
E.g.: demographic variables, independent variables, dependent variables
• Name : name each question by section (unique, no spaces or symbols)
• Type : Numeric
• Width, Decimals, Missing Values and Align : default
• Label : the question wording
• Values : the meaning of each coded value
• Measure :
• Nominal : categorical data
• E.g.: gender, marital status, nationality
• Ordinal : ordered (ranked) data
• E.g.: age, winning position, education level
• Scale : Likert-scale and numeric answers
• E.g.: strongly agree, agree, neutral, disagree and strongly disagree; weight in kg; height in metres.
Variable View in SPSS
Data View in SPSS
Sample Size

Rule of 5
• Total number of items in the questionnaire × 5 = p
• Required sample size > p
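For example (using a hypothetical count), a questionnaire with 28 measurement items gives p = 28 × 5 = 140, so more than 140 responses should be collected.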
Reliability Analysis
Reliability analysis uses Cronbach's alpha (α)
• Expressed on a scale of 0–1
• 1 is the most reliable outcome
• Optimum reliability is usually at least 0.70
• A value above 0.70 indicates the data are reliable

SPSS Instruction :
Analyze – Scale – Reliability Analysis – move all questions into the Items
box (except demographic questions) – click OK.
Two steps to analyse reliability
• Step 1 – overall data reliability
Analyse all the variables in the questionnaire (excluding
demographic factors)
• Step 2 – to identify which part of the data is unreliable
Independent variables (each variable separately)
Dependent variable (separately)

Since the Cronbach's alpha value is greater than 0.70,
the data are reliable for further analysis.
Research Tips:
• In the pilot study, the reliability analysis is done to establish the
consistency of the data collected from the questionnaires.

• If the Cronbach's alpha value falls below 0.70, Method Two must be run
to identify which variable is not reliable, so that its questions can be
amended (possible redundancy, leading wording, etc.) and new data collected.

• Once the Cronbach's alpha value is above 0.70, complete data collection
for the rest of the sample respondents can proceed before further analysis.
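The same reliability check can be reproduced outside SPSS. A minimal sketch in Python, assuming the survey responses sit in a hypothetical file survey.csv with item columns named B3.1–B3.4:

import pandas as pd

def cronbach_alpha(items: pd.DataFrame) -> float:
    """Cronbach's alpha for a set of questionnaire items (rows = respondents)."""
    k = items.shape[1]                         # number of items in the scale
    item_vars = items.var(axis=0, ddof=1)      # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)  # variance of the summed scale
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

df = pd.read_csv("survey.csv")                      # hypothetical data file
alpha = cronbach_alpha(df[["B3.1", "B3.2", "B3.3", "B3.4"]])
print(f"Cronbach's alpha = {alpha:.3f}")            # data considered reliable if > 0.70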
Compute Mean
• Click Transform → Compute Variable

Step 1:
Give the target variable a unique name, in the same way each
variable was named in the Variable View tab.

Step 2:
In the function selection panel, select Statistical → Mean →
click the UP arrow button → select the questions that belong to
the variable one by one → click the LEFT arrow button.
E.g.: MEAN(B3.1,B3.2,B3.3,B3.4)
→ Click OK

Step 3: Repeat these steps for every independent variable and
the dependent variable.
The new COMPUTED variables will appear in the Data View tab.
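A minimal sketch of the same computation in Python, assuming hypothetical item columns B3.1–B3.4 in survey.csv:

import pandas as pd

df = pd.read_csv("survey.csv")   # hypothetical data file exported from SPSS

# Mean of the items that make up one variable, equivalent to
# SPSS: MEAN(B3.1, B3.2, B3.3, B3.4)
df["IV3_mean"] = df[["B3.1", "B3.2", "B3.3", "B3.4"]].mean(axis=1)

# Repeat for every IV and for the DV; these new columns play the role of
# the COMPUTED variables that appear in the SPSS Data View.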
Normal Distribution Analysis
• There are a number of ways that we can do this :
• Statistically
• Significant value of Kolmogorov–Smirnov and Shapiro–Wilk test
• Z-score test of Skewness and Kurtosis (Second approach for normality
assessment)

• Graphically
• Box plot (to exclude the outliers when data is not normal)
• Q-Q plots
• Histogram with normal curve

• We will only examine the normality for the DEPENDENT VARIABLE.


Statistical Approach

Rules of the statistical approach for the normality test:

• For this test we want a NON-SIGNIFICANT outcome, which
indicates the data are normally distributed.
• Significant: Sig. value is less than 0.05 → reject H0 (data are not normal).
• Non-significant: Sig. value is more than 0.05 → fail to reject H0 (data are normal).
Using SPSS for normality
• Select Analyze → Descriptive Statistics → Explore → (in the
new window) transfer the COMPUTED MEAN of the
DEPENDENT VARIABLE into the Dependent List box → click
Statistics → tick Outliers → click Plots → tick Normality
plots with tests, Stem-and-leaf, Histogram → click Continue
→ click OK

• Kolmogorov–Smirnov Test
- can be used when N > 50
• Shapiro–Wilk Test
- for smaller samples, when N < 50
Output for Normality Test
Kolmogorov–Smirnov: use when N > 50
Shapiro–Wilk: for smaller samples, when N < 50

Since the Sig. value is greater than 0.05,
we fail to reject H0.
So, the data are NORMALLY DISTRIBUTED.
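A minimal Python sketch of both tests, assuming the computed DV mean is in a hypothetical column DV_mean of survey.csv (the K-S variant here is only a rough stand-in for SPSS's Lilliefors-corrected test):

from scipy import stats
import pandas as pd

df = pd.read_csv("survey.csv")      # hypothetical file
dv = df["DV_mean"].dropna()          # computed mean of the dependent variable

# Shapiro-Wilk: preferred for smaller samples (roughly N < 50)
sw_stat, sw_p = stats.shapiro(dv)

# Kolmogorov-Smirnov against a normal with the sample's own mean and SD
ks_stat, ks_p = stats.kstest(dv, "norm", args=(dv.mean(), dv.std(ddof=1)))

for name, p in [("Shapiro-Wilk", sw_p), ("Kolmogorov-Smirnov", ks_p)]:
    verdict = "normally distributed (fail to reject H0)" if p > 0.05 else "not normal (reject H0)"
    print(f"{name}: Sig. = {p:.3f} -> {verdict}")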
Z-score Test for Skewness and Kurtosis
• If the data do not pass the normality test using the Kolmogorov-Smirnov or
Shapiro-Wilk test (Sig. < 0.05) → use the Z-score test!
• The Z-score is obtained by dividing the skewness statistic by its standard
error, both taken from the Descriptives output in SPSS.
• Guidelines for the Z-score cut-off points:

Sample size    Z-score cut-off
< 50           ±1.96
51–100         ±2.58
> 100          ±3.29
Z-score Output – Descriptives table

Mood Score            Statistic   Std. Error
Male    Skewness      .909        .378
        Kurtosis      .908        .741

Z-scores
Divide the skewness statistic (highlighted in blue) by its standard error (green):

z-score = statistic / std. error = 0.909 / 0.378 = 2.4047

Therefore:
Sample size    Cut-off    Conclusion
< 50           ±1.96      NOT NORMAL – check outliers
51–100         ±2.58      NORMAL
> 100          ±3.29      NORMAL
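A minimal Python sketch of the same calculation; the standard-error formula is the usual one for sample skewness, and the reproduced numbers are the slide's worked example:

import numpy as np
from scipy import stats

def skewness_z(x: np.ndarray) -> float:
    """z-score of skewness = skewness statistic / its standard error."""
    n = len(x)
    skew = stats.skew(x, bias=False)   # sample skewness, close to the SPSS statistic
    se = np.sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))  # SE of skewness
    return skew / se

# Worked example from the slide: statistic 0.909, std. error 0.378 -> z = 2.4047,
# which exceeds the ±1.96 cut-off for N < 50, so the data are judged NOT normal.
print(0.909 / 0.378)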
Check For The Outliers
• When the statistical approaches to normality show that the data are
NOT NORMAL, check for outliers.

• HOW? Based on the box plot in the output, delete the
respondents/case numbers that appear outside the box plot.
Delete them all at the same time.

• RUN THE NORMALITY ANALYSIS AGAIN. If the problem persists,
increase the sample size or transform the data.
Box-Whisker Plot

• These are outliers.

• They must be deleted
simultaneously in
the Data View tab.
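A minimal sketch of the usual boxplot outlier rule (values beyond 1.5 × IQR from the quartiles), assuming the hypothetical DV_mean column used earlier:

import pandas as pd

def boxplot_outliers(s: pd.Series) -> pd.Series:
    """Flag values outside the boxplot whiskers (1.5 x IQR rule)."""
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return s[(s < lower) | (s > upper)]

df = pd.read_csv("survey.csv")              # hypothetical file
print(boxplot_outliers(df["DV_mean"]))       # case numbers to delete, then re-run normality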
Graphical Approach for
Normality
• Normal Q-Q plot
• Histogram with normal curve
Normal Q-Q Plot
• Right-click → Copy
Linearity Test

Comment 1:
R-square = 0.186 means that 18.6%
of the total variation
in the DV (CS) can be
explained by the IV (ivv4).

Comment 2:
Slope = 0.4 means that for
every 1-unit increase
in the IV (ivv4), there will
be an increase of 0.4
in the DV (CS).
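A minimal Python sketch reproducing the slope and R-square commentary, assuming hypothetical computed-mean columns ivv4_mean (IV) and CS_mean (DV) in survey.csv:

from scipy import stats
import pandas as pd

df = pd.read_csv("survey.csv")                            # hypothetical file
res = stats.linregress(df["ivv4_mean"], df["CS_mean"])     # IV (ivv4) vs DV (CS)

print(f"slope = {res.slope:.3f}")          # e.g. 0.4: DV rises 0.4 per 1-unit rise in the IV
print(f"R-squared = {res.rvalue**2:.3f}")  # e.g. 0.186: IV explains 18.6% of variation in DV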
Homoscedasticity

• There are a number of ways we can check this:

• Statistically
• Levene statistic

• Graphically
• Scatter plot of standardized residuals (ZRESID) against standardized predicted values (ZPRED)
Levene Statistic
Homogeneity – Levene statistic
• Significant value > 0.05 – homogeneity exists
• Significant value < 0.05 – homogeneity does not exist

In this example output, the Sig. value is below 0.05, so homogeneity does not exist.
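A minimal Python sketch of the Levene test, assuming a hypothetical grouping variable Gender splitting the DV_mean column:

from scipy import stats
import pandas as pd

df = pd.read_csv("survey.csv")                                        # hypothetical file
groups = [g["DV_mean"].dropna() for _, g in df.groupby("Gender")]      # DV split by group

stat, p = stats.levene(*groups)
print("Homogeneity exists" if p > 0.05 else "Homogeneity does not exist", f"(Sig. = {p:.3f})")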


Homoscedasticity

• We want the points to be
distributed in a roughly
rectangular band (between -3 and 3).

• No clustering or systematic
pattern in the plot indicates
that the variance is constant.

• Homoscedasticity
(homogeneity) exists in
the data model.
Independence of errors

• Independence of errors means that the distribution of errors is
random and not influenced by, or correlated with, the errors in prior
observations. The opposite of independence is
called autocorrelation.

• Independence/autocorrelation is assessed quantitatively
by calculating the Durbin-Watson statistic.

• A Durbin-Watson value between 1 and 3 indicates there is NO
AUTOCORRELATION problem among the residuals (errors). Therefore,
the errors are independent.
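A minimal Python sketch of the Durbin-Watson check, assuming hypothetical computed-mean columns for the IVs and DV in survey.csv:

import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson

df = pd.read_csv("survey.csv")                                   # hypothetical file
X = sm.add_constant(df[["IV1_mean", "IV2_mean", "IV3_mean", "IV4_mean"]])
model = sm.OLS(df["DV_mean"], X).fit()                            # regression of DV on IVs

dw = durbin_watson(model.resid)
print(f"Durbin-Watson = {dw:.2f}")   # roughly 1-3 -> no autocorrelation, errors independent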
DESCRIPTIVE STATISTICS

Interpretation: report the average value (mean) and the
standard deviation for each variable; the variable with the
smallest standard deviation is the most consistent.
CHARTS
CORRELATION
• Relationship assessed in terms of a 'correlation coefficient'
• Positive correlation
• Values change in the same direction
• Temperature ↑ → sale of ice creams ↑
• Negative correlation
• Values change in opposite directions
• Temperature ↑ → sale of overcoats ↓
• No relationship between temperature and sale of hamsters
Scatterplots
• Negative correlation – the line slopes downwards from left to right

• Positive correlation – the line slopes upwards from left to right

• No correlation – no pattern to the cluster of data points
Scale of correlation coefficient    Value

0 < r ≤ 0.19       Very Low Correlation
0.2 ≤ r ≤ 0.39     Low Correlation
0.4 ≤ r ≤ 0.59     Moderate Correlation
0.6 ≤ r ≤ 0.79     High Correlation
0.8 ≤ r < 1.0      Very High Correlation

Using SPSS for correlation

Select Analyze → Correlate → Bivariate → transfer all the independent
variables and the dependent variable into the Variables box → click OK

• Use the computed mean of each variable (the means of the
independent variables and of the dependent variable).
Output of Correlation result
• The Sig. value should be ≤ 0.05 for a significant correlation; a correlation
is non-significant when Sig. ≥ 0.05.

COMMENTS:
1) State whether the correlation is SIGNIFICANT.
2) State the STRENGTH (low/moderate/high) and SIGN (positive or negative relationship)
of the Pearson correlation (r) value.

Example:
There is a SIGNIFICANT relationship between
• Gender inequality and female employee performance
• Sexual harassment and female employee performance
• Inflexible working hours and female employee performance
There is NO SIGNIFICANT relationship between
• Low wages and female employee performance
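A minimal Python sketch of the same bivariate correlations, assuming hypothetical computed-mean columns named after the example variables:

from scipy import stats
import pandas as pd

df = pd.read_csv("survey.csv")                                        # hypothetical file
ivs = ["gender_inequality", "sexual_harassment", "inflexible_hours", "low_wages"]  # assumed names

for iv in ivs:
    r, p = stats.pearsonr(df[iv], df["performance"])   # 2-tailed test of each IV against the DV
    verdict = "SIGNIFICANT" if p <= 0.05 else "not significant"
    sign = "positive" if r > 0 else "negative"
    print(f"{iv}: r = {r:.2f} ({sign}), Sig. = {p:.3f} -> {verdict}")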
Multiple Linear Regression
• The model has several regression lines
• Each with its own gradient
• But the 'constant' will change each time another predictor is added
• The regression equation is now a little more complex:
  Yi = β0 + β1X1 + β2X2 + ⋯ + βnXn + εi
• There is a gradient for each predictor
  • β1X1 for the first predictor, β2X2 for the second predictor, ...

• Using SPSS for regression

• Select Analyze → Regression → Linear… → (in the new window) transfer the dependent variable to the
Dependent box → transfer the independent variables to the Independent(s) box → click Statistics… →
(in the new window) tick the boxes for Estimates, Model fit, Collinearity diagnostics and
Durbin-Watson → click Continue → click Plots → move ZRESID to the Y box and ZPRED to the X box →
tick Histogram and Normal Probability Plot → click Continue → click OK
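A minimal Python sketch producing the same pieces of output the following slides interpret (R-square, ANOVA Sig., B coefficients, Durbin-Watson), assuming the hypothetical column names used above:

import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson

df = pd.read_csv("survey.csv")                       # hypothetical file and column names
y = df["performance"]
X = sm.add_constant(df[["inflexible_hours", "gender_inequality",
                        "sexual_harassment", "low_wages"]])

model = sm.OLS(y, X).fit()
print(model.summary())                                # R-squared, ANOVA F and Sig., B coefficients
print("Durbin-Watson:", round(durbin_watson(model.resid), 2))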
Homoscedasticity

• No clustering or systematic
pattern in the plot indicates
that the variance is constant.

• Homoscedasticity
(homogeneity) exists in
the data model.

• Acceptable – assumption
satisfied!
Output Interpretation
1) Examine the regression assumptions:
• Normality analysis
• Homoscedasticity (scatter plot of residuals vs predicted values)
• Durbin-Watson test of autocorrelation (independence of residuals)
• Multicollinearity assessment (Tolerance & VIF)

2) Interpret the Multiple Regression Analysis results:

• Model summary table : interpret R-square
• ANOVA table : interpret the Sig. value
• Coefficient table : interpret the Sig. values and the Beta/B coefficients

R-square = 0.185 → 18.5% of the total variation of the dependent variable in this model can be explained
by the independent variables. (Please list the names according to your variables.)

A Durbin-Watson value between 1 and 3 indicates there is NO AUTOCORRELATION problem among the
residuals (errors).

The ANOVA table is read at its significance value:
a Sig. value ≤ 0.05 means the model is fit to use for further analysis.
1) Significance
• Sig. value ≤ 0.05: check each variable; if significant, reject the null
hypothesis and accept your proposed hypothesis.

2) Unstandardized & Standardized Coefficients
• Beta value (standardized coefficients – secondary data)
• B value (unstandardized coefficients – primary data)
• Comment on the value and sign:
  • The highest value indicates the most crucial factor
  • Positive: when X increases, Y increases too
  • Negative: when X increases, Y decreases

3) Collinearity (TOL & VIF)
• VIF value ≤ 10 and close to 1:
  • 1 = not correlated
  • Between 1 and 5 = moderately correlated
  • Greater than 5 = highly correlated
• If these conditions hold, there is no multicollinearity problem.
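A minimal Python sketch of the collinearity check, computing VIF (and Tolerance as its reciprocal) with statsmodels, assuming the same hypothetical column names:

import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

df = pd.read_csv("survey.csv")                        # hypothetical file and column names
X = sm.add_constant(df[["inflexible_hours", "gender_inequality",
                        "sexual_harassment", "low_wages"]])

for i, name in enumerate(X.columns):
    if name == "const":
        continue
    vif = variance_inflation_factor(X.values, i)
    print(f"{name}: VIF = {vif:.2f}, Tolerance = {1 / vif:.2f}")  # VIF well below 10 -> no problem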
Model Equation
• Overall regression model equation:
Yi = −0.024(inflexible hours) + 0.386(gender inequality)
     − 0.125(sexual harassment) + 0.152(low wages)

• Regression model equation with significant predictors:
Yi = 0.386(gender inequality) + 0.152(low wages)
