SPSS Basic Statistics Tutorial 2020
Important: The instructions below are based on version 24 of the SPSS software,
which is available on the University of Nottingham network (type “SPSS” in the programs
search bar). Please note that if you use an earlier or later version of this software there
will be differences in the layout of the output. However, the configuration of the menus is
largely the same and all underlying principles remain the same.
Access: For details of how to access SPSS software, either in the computer
labs or off campus, please visit the UoN software library.
https://workspace.nottingham.ac.uk/display/Software/SPSS+Statistics
Aim: Use SPSS v. 24 to perform some simple descriptive and bivariate analyses
using data from the Gedling area of Nottingham. The hints below will help you to
select the correct commands from the menus in SPSS. Please note this tutorial will
not provide guidance on entering your own data in SPSS.
Opening the dataset: Save the Gedling dataset (on Moodle Evidence Based
Medicine MEDS3039/MEDS3038 UNUK AUT Statistics 1 area) in a folder or drive of
your choice. Then choose File, Open, Data and select the Gedling dataset from the
location where you saved it.
SPSS is a Windows-based computer package used for statistical analyses. For
version 17 the software became known as PASW, before IBM took over the
package and restored the SPSS title for version 19. SPSS is menu-driven rather than
command-driven. There are two main windows: the Data Editor window and the
Output or Viewer window.
When you first start the SPSS package, a Data Editor window opens. This displays
the contents of the working data file. This window has two different views: Data View
and Variable View and you can switch between the two using the tabs at the bottom.
MEDS3038/9 2020
To carry out procedures in Data View, use the menus along the top, or the toolbar
buttons, which are shortcuts to the most popular commands. Position the mouse
over any toolbar button for a description of the command (which appears at the foot
of the screen).
Use the scroll bars to move up and down, and from right to left, to view the data.
Use the arrow keys on your keyboard, or the mouse, to move from one cell to
another in the spreadsheet.
To move to the top, bottom, far right or far left of the spreadsheet, press Ctrl
and the appropriate arrow key together. Jumping to the bottom is an easy way of
seeing how many people are in your dataset, because the number of the last row is
the number of cases.
You can display the values of your categorical variables either as the numeric codes
entered (e.g. 0s and 1s for sex) or as the value labels which have been
defined in Variable View (e.g. male and female). To view the value labels, go to View
on the menu bar and choose Value Labels. Alternatively, use the Value Labels button
on the toolbar to swap between the two display options.
Obtain a frequency table for the variable ‘weightgp’. (Hint: Analyze, Descriptive
Statistics, Frequencies, then put 'weightgp' (not 'weight') in the Variable(s) list
and click OK).
Question 2: How many people weigh between 70 and 84 kg? Also, express this
as a proportion and as a percentage.
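The link between a count, a proportion and a percentage can be sketched in a few lines of Python. This is only an illustration: the category labels and the ten values below are made up and are not the Gedling data, and `frequency_table` is a hypothetical helper name.

```python
from collections import Counter

# Hypothetical weight-group labels for ten people (NOT the Gedling data).
weight_groups = ["<55", "55-69", "70-84", "70-84", "85+",
                 "55-69", "70-84", "55-69", "70-84", "85+"]


def frequency_table(values):
    """Return {category: (count, proportion, percent)} for a list of values."""
    counts = Counter(values)
    total = len(values)
    return {cat: (n, n / total, 100 * n / total) for cat, n in counts.items()}


table = frequency_table(weight_groups)
for category, (count, proportion, percent) in table.items():
    print(f"{category:>6}  n={count}  proportion={proportion:.2f}  percent={percent:.1f}%")
```

The proportion is the count divided by the total number of people, and the percentage is the proportion multiplied by 100.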
Now display the distribution of the variable ‘weightgp’ graphically. (Hint: under
‘Graphs’, 'Legacy Dialogs', choose either ‘Bar…’ followed by ‘Simple’, or ‘Pie…’. Leave
the ‘Summaries for groups of cases’ option selected, then click Define. For bar charts,
drag ‘weightgp’ into the ‘Category Axis’ box. For pie charts, drag 'weightgp' into
the ‘Define Slice by:’ box. Click OK.)
For the bar chart, repeat but present the % of cases. (Hint: select ‘% of cases’ at
the top of the dialog box.)
Question 5: From the graph, approximately what % of the sample fall into the
lightest weight category?
The research question: Is there a relationship between sex and hay fever?
Summarise the data using the table below to help you. (HINT: you can create this
table using Analyze, Descriptive Statistics, Crosstabs... Put Sex into the ‘Row(s)’ box
and hay fever in 1991 into the ‘Column(s)’ box, then click OK.)
Question 8: What percentage of men had hay fever in 1991? (Hint: Repeat the
above procedure, but this time click the ‘Cells...’ button and tick ‘Row’ (under
Percentages) to get the percentage with hay fever for each sex. Remember to click
Continue after this.)
Question 9: Using SPSS, calculate the odds ratio and 95% confidence interval
for hay fever in women as compared to men. (HINT: use Analyze, Descriptive
Statistics, Crosstabs, click the ‘Statistics...’ button and check the 'Risk' box. Men
would be considered the “unexposed” group in this instance because they take
the lower numeric value, i.e. 1.)
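To see where an odds ratio and its confidence interval come from, the calculation for a 2x2 table can be sketched as follows. The counts are invented for illustration and are not the Gedling results. The interval uses the common log-based (Woolf) approximation, which is assumed here and may not match SPSS output to the last decimal place. `odds_ratio_ci` is a hypothetical helper name.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and approximate 95% CI for a 2x2 table.

    a = exposed with outcome,   b = exposed without outcome
    c = unexposed with outcome, d = unexposed without outcome
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of the log odds ratio (Woolf's method)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts: women (exposed) 30 with and 70 without hay fever,
# men (unexposed) 20 with and 80 without hay fever.
or_estimate, ci_lower, ci_upper = odds_ratio_ci(30, 70, 20, 80)
print(f"OR = {or_estimate:.2f}, 95% CI {ci_lower:.2f} to {ci_upper:.2f}")
```

If the interval includes 1, the data are compatible with there being no difference in odds between the two groups.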
Question 11: Is this relationship significant? Carry out a chi-squared test (HINT:
using Analyze, Descriptive statistics, Crosstabs and tick the ‘Chi-square’ box
under the Statistics... option). What is the p value? Interpret the p value.
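The chi-squared statistic compares the observed counts in each cell with the counts expected if there were no association. A minimal sketch for a 2x2 table, using invented counts rather than the Gedling data, is given here. It computes the uncorrected Pearson chi-squared statistic, and the p-value formula used is valid only for 1 degree of freedom. `chi_squared_2x2` is a hypothetical helper name.

```python
import math


def chi_squared_2x2(a, b, c, d):
    """Pearson chi-squared statistic (no continuity correction) and p-value
    for a 2x2 table with rows (a, b) and (c, d)."""
    n = a + b + c + d
    observed = [[a, b], [c, d]]
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            # Expected count if row and column variables were unrelated
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed[i][j] - expected) ** 2 / expected
    # With 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts: women 30 with / 70 without, men 20 with / 80 without.
chi2, p = chi_squared_2x2(30, 70, 20, 80)
print(f"chi-squared = {chi2:.3f}, p = {p:.3f}")
```

A p-value above 0.05 would usually be read as insufficient evidence of an association between the two variables.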
Question 12: Explore the following associations using the same approach as above and
determine whether the results are statistically significant:
Sex and atopy in 1991
Sex and smoking
Social class and hay fever
(Hint: Remember that chi-square also works when you have exposure and outcome
variables which have >2 categories. The chi-square statistic and p-value can be
interpreted in the same way as before.)
Obtain a histogram and summary statistics for the variable ‘FEV191’. (Hint:
choose Analyze, Descriptive Statistics, Explore and drag 'FEV191' into the
Dependent List. Click on the 'Plots...' button and select 'Histogram' (also uncheck
Stem-and-leaf). Click Continue followed by OK.)
Question 14: What is the mean FEV1 from the output produced?
Question 17: What is the range (the minimum and maximum values, not the
difference between them)?
Question 18: Repeat the above steps for the following variables:
1. Height
2. Weight
3. Age
The research question: Is there a difference in FEV1 between men and women?
Question 19: Write down the mean FEV1 and 95% CI for the mean FEV1 for men
and women (Hint: choose Analyze, Descriptive statistics, Explore. This time you
will need to drag the variable ‘Sex’ into the Factor list). Putting a variable in the
factor list will mean the procedure will be repeated for each category of the
variable.
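As a rough illustration of a mean and 95% confidence interval reported separately for each category of a factor, the sketch here assumes the usual t-based formula (mean plus or minus the critical t value times the standard error). The FEV1 values are invented, and the critical value 2.262 is the standard table value for 9 degrees of freedom, so it only applies to groups of ten.

```python
import math
import statistics

# Critical value of the t distribution for a 95% CI with 9 degrees of
# freedom (n = 10), taken from standard t tables.
T_CRIT_DF9 = 2.262


def mean_with_ci(values, t_crit):
    """Return (mean, lower, upper) for a t-based confidence interval."""
    n = len(values)
    mean = statistics.mean(values)
    std_error = statistics.stdev(values) / math.sqrt(n)
    return mean, mean - t_crit * std_error, mean + t_crit * std_error


# Hypothetical FEV1 values in litres, ten people per group (NOT Gedling data).
fev1_by_sex = {
    "men":   [3.9, 4.2, 3.5, 4.8, 4.0, 3.7, 4.4, 4.1, 3.8, 4.6],
    "women": [2.9, 3.1, 2.6, 3.4, 3.0, 2.8, 3.3, 2.7, 3.2, 3.0],
}

# Repeat the same calculation for each category, as the Factor List does.
results = {sex: mean_with_ci(vals, T_CRIT_DF9) for sex, vals in fev1_by_sex.items()}
for sex, (mean, lower, upper) in results.items():
    print(f"{sex}: mean = {mean:.2f}, 95% CI {lower:.2f} to {upper:.2f}")
```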
Question 20: Write down an estimate of the mean difference in FEV1 between men
and women and its 95% confidence interval (HINT: you will find this under the
option Analyze, Compare Means, Independent-Samples T Test. Place the
outcome measure in the Test Variable(s) box and your exposure variable (i.e.
sex) in the Grouping Variable box. Click on Define Groups… and define your
groups as ‘1’ and ‘2’).
Question 21: Compare the two means using an independent samples t-test to
see if this result is significant. What is the p-value, and what does it mean?
(HINT: the output shows the results of two different tests, one assuming equal
variances between the two groups (top row), and one not assuming equal
variances (bottom row). Look at Levene’s test to decide which row to use: if
Levene’s test has a significant p-value (P<0.05, indicated by Sig.) then there is
evidence of a difference in variances and you need to use the bottom row to compare
means; if not, use the top row. In practice you will find these results are very similar,
so it is usually inconsequential which row of output you interpret.)
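The difference between the two rows of the t-test output can be sketched as follows. The top row pools the two group variances, whereas the bottom row (Welch's version) keeps them separate and adjusts the degrees of freedom. The data are invented and `t_statistics` is a hypothetical helper name. Only the t statistics and degrees of freedom are computed, not the p-values.

```python
import math
import statistics


def t_statistics(group1, group2):
    """Return ((t, df) assuming equal variances, (t, df) not assuming them)."""
    n1, n2 = len(group1), len(group2)
    m1, m2 = statistics.mean(group1), statistics.mean(group2)
    v1, v2 = statistics.variance(group1), statistics.variance(group2)

    # Equal variances assumed: pooled variance, df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    t_pooled = (m1 - m2) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    df_pooled = n1 + n2 - 2

    # Equal variances not assumed: Welch's t, Welch-Satterthwaite df
    se1, se2 = v1 / n1, v2 / n2
    t_welch = (m1 - m2) / math.sqrt(se1 + se2)
    df_welch = (se1 + se2) ** 2 / (se1 ** 2 / (n1 - 1) + se2 ** 2 / (n2 - 1))
    return (t_pooled, df_pooled), (t_welch, df_welch)


# Hypothetical FEV1 values in litres (NOT the Gedling data).
men = [3.9, 4.2, 3.5, 4.8, 4.0, 3.7, 4.4, 4.1, 3.8, 4.6]
women = [2.9, 3.1, 2.6, 3.4, 3.0, 2.8, 3.3, 2.7]

pooled, welch = t_statistics(men, women)
print(f"Equal variances assumed:     t = {pooled[0]:.2f}, df = {pooled[1]}")
print(f"Equal variances not assumed: t = {welch[0]:.2f}, df = {welch[1]:.1f}")
```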
Question 22: In this example the confidence interval for the mean difference does
not include zero. Explain why we would expect this even before looking at the
confidence interval.
Question 23: Following the same steps as above investigate whether vitamin E
intake differs between people with and without hay fever (hint: please make sure
groups are defined correctly).
5. Mann-Whitney U Test
Please note that understanding the output from this test is not required for the REM
exam (only why you would use it).
The research question: Are iron levels different for men and women?
Question 24: Are iron levels in 1991 (variable 'iron91') approximately normally
distributed in men and women? (HINT: plot separate histograms for men and
women using the Explore... command as described above, i.e. put 'iron91' in the
Dependent List and 'Sex' in the Factor List.)
Question 25: Depending on the answer to Q.24, carry out either an independent
samples t-test or a Mann-Whitney U test to determine whether there is a
significant difference in iron levels between men and women.
online material, although understanding this test goes beyond what you would be
expected to know for the REM module and your dissertations.
Question 26: If you have chosen to use the Mann-Whitney test for Q.25 then
present the median and inter-quartile range for the iron levels in men and
women (these should be used because, if the response variable is not normally
distributed, it is not appropriate to use the mean and standard deviation).
HINT: to get the median and inter-quartile range use the Explore command as
described above. The inter-quartile range consists of two values, the 25th and
75th percentiles (the median is the 50th percentile). To get these values click on
‘Statistics...’ and tick ‘Percentiles’, then click ‘Continue’ and ‘OK’. The values
underneath 25 and 75 in the Weighted Average row of output represent the
inter-quartile range (do not use the Interquartile Range value given as part of the main
output table, as this represents the difference between these percentiles and is
less informative).
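The median and the two quartiles can be illustrated with a short Python sketch using invented, right-skewed values. The "exclusive" method of `statistics.quantiles` places the p-th percentile at position (n + 1) x p in the sorted data. This is assumed, not confirmed, to correspond to the Weighted Average definition in SPSS. `median_and_iqr` is a hypothetical helper name.

```python
import statistics


def median_and_iqr(values):
    """Return (median, 25th percentile, 75th percentile)."""
    q1, q2, q3 = statistics.quantiles(values, n=4, method="exclusive")
    return q2, q1, q3


# Hypothetical right-skewed measurements (NOT the Gedling data).
values = [1, 2, 2, 3, 3, 4, 5, 8, 12, 20, 35]

median, lower_quartile, upper_quartile = median_and_iqr(values)
print(f"median = {median}, IQR = {lower_quartile} to {upper_quartile}")
print(f"mean = {statistics.mean(values):.2f}")
```

In skewed data like these the mean is pulled towards the long tail, which is why the median and inter-quartile range are the preferred summary.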
Important note about assessing for normality: In the lecture slides for this module
you are provided with a flow diagram to select the most appropriate statistical test
based on aspects of your data. Such a diagram can provide a useful rule of thumb for
selecting the correct test but this should not be a substitute for good academic
judgement. In particular, a very common pitfall is choosing to perform a
non-parametric test (such as the Mann-Whitney test) when a parametric test (such as the
independent samples t-test) would be appropriate. Whilst it is true that a normal distribution is one of
the assumptions for an independent samples t-test, in reality the test is robust to a
certain level of non-normality in your outcome measure. In this module, we
emphasise the importance of presenting a measure of the size of the difference (which we
term “effect size”) in addition to a p-value. A limitation of the Mann-Whitney test (and
certain other non-parametric tests) is that it does not relate directly to an effect size
(such as the difference between means, which is the basis of a t-test), so you are left
with a measure of significance (from a hypothesis test) but no measure of effect size.
Another, more subtle disadvantage is that you are unlikely to get small (i.e.
significant) p-values from a non-parametric test when sample sizes are small, even
when the difference between your groups is extreme. In the example above, the
distribution of iron levels was probably skewed enough to indicate that the Mann-Whitney
test should be used instead of a t-test. Now consider the variable vitamin E
(vite91), which is only slightly skewed, and compare this between people with and
without hay fever (hayf91) as in Question 23 above. You will notice that with a big
dataset (such as this one) the p-values obtained from the two alternative tests are
very similar.
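The point about small samples can be made concrete with a short calculation. For an exact Mann-Whitney test with no tied values, the most extreme result possible is complete separation of the two groups, and there are only two such orderings out of all the ways of splitting the ranks. The sketch concerns the exact test only (software may also report an approximate p-value), and `smallest_two_sided_p` is a hypothetical helper name.

```python
import math


def smallest_two_sided_p(n1, n2):
    """Smallest exact two-sided p-value a Mann-Whitney test can give
    (no ties): 2 extreme orderings out of C(n1 + n2, n1) equally likely ones."""
    return 2 / math.comb(n1 + n2, n1)


for n1, n2 in [(3, 3), (3, 4), (4, 4), (5, 5)]:
    print(f"n1 = {n1}, n2 = {n2}: smallest possible p = {smallest_two_sided_p(n1, n2):.4f}")
```

With three people per group the exact p-value can never fall below 0.1, however large the difference between the groups.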
7. Correlation
The research question: Are age and Vitamin C levels in 1991 correlated?
Question 27: First, check to make sure that both age and vitamin C levels are
normally distributed (HINT: Use either ‘Explore’ as in section 3, or select Graphs,
Legacy Dialogs, Histogram. If using the ‘Explore’ command, both variables can
be put in the ‘Dependent List’ at the same time to save having to do this twice).
Question 28: Perform the appropriate correlation test between age and Vitamin C
level using the results to answer the research question. (HINT: use Analyze,
Correlate, Bivariate... and put both the variables you wish to compare in the
‘Variables’ box together. Make sure the ‘Pearson’ coefficient, the test covered in
the REM lecture, is the only one selected.)
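For reference, the Pearson correlation coefficient is the covariation of the two variables divided by the product of their individual variations. A minimal sketch with invented ages and vitamin C values (not the Gedling data) is given here; `pearson_r` is a hypothetical helper name.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical ages (years) and vitamin C levels (NOT the Gedling data).
ages = [20, 30, 40, 50, 60]
vitamin_c = [70, 65, 66, 58, 55]

r = pearson_r(ages, vitamin_c)
print(f"r = {r:.3f}")
```

The coefficient always lies between -1 and 1. Negative values mean that one variable tends to fall as the other rises.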
8. Regression
The research question: What is the relationship between height and vitamin E
consumption? Does vitamin E consumption predict an individual’s height?
Note: To answer questions 29 to 32 below you will need to decide which is your
dependent variable (y-axis) and which is your independent variable (x-axis).
Question 29: Draw a scatter diagram between height in cm (height) and vitamin E
consumption (vite91). Describe the relationship. (HINT: use the commands
Graphs, Legacy Dialogs, Scatter/Dot..., Simple Scatter followed by Define. Put
the appropriate variables in the X and Y Axis: boxes). Click OK (ignore all the
other boxes).
Question 31: Write down the equation of the line in the form Y = a + bX,
where Y is the outcome (dependent) variable and X is the predictor
(independent) variable. (HINT: In the Coefficients table the intercept (a) is
given under B in the '(Constant)' row and the regression slope (b) is given in the
'Vitamin E (mg) 1991' row.)
Question 32: Use the above equation to predict an individual’s height if their
vitamin E consumption is 4.5 mg.
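The way a fitted line is used for prediction can be sketched as follows. The five data points are invented, so the intercept, slope and predicted height here are not the answers to the questions above. `fit_line` and `predict` are hypothetical helper names.

```python
def fit_line(x, y):
    """Least-squares intercept a and slope b for the line Y = a + bX."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def predict(a, b, x_new):
    """Predicted Y for a new value of X."""
    return a + b * x_new


# Hypothetical vitamin E intake (mg) and height (cm), NOT the Gedling data.
vitamin_e = [2.0, 4.0, 6.0, 8.0, 10.0]
height = [160.0, 166.0, 165.0, 172.0, 177.0]

a, b = fit_line(vitamin_e, height)
print(f"Y = {a:.1f} + {b:.1f}X")
print(f"Predicted height at 4.5 mg: {predict(a, b, 4.5):.1f} cm")
```

To answer the question with the real data, substitute the '(Constant)' and slope values from the SPSS Coefficients table for `a` and `b`.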