SPSS Basic Statistics Tutorial 2020

Uploaded by

Kavin Ruk
Copyright: © All Rights Reserved

SPSS TUTORIAL - MEDS3038/9 Research Statistics 1 and 2

Important: The instructions below are based on version 24 of the SPSS software,
which is available on the University of Nottingham network (type “SPSS” in the programs
search bar). Please note that if you use an earlier or later version of this software there
will be differences in the layout of the output. However, the configuration of the menus is
largely the same and all underlying principles remain the same.

Access: For details of how you can access SPSS software, either in the computer
labs or for use off campus, please visit the UoN software library:
https://workspace.nottingham.ac.uk/display/Software/SPSS+Statistics

Aim: Use SPSS v. 24 to perform some simple descriptive and bivariate analyses
using data from the Gedling area of Nottingham. The hints below will help you to
select the correct commands from the menus in SPSS. Please note this tutorial will
not provide guidance on entering your own data in SPSS.

Opening the dataset: Save the Gedling dataset (in the Moodle Evidence Based
Medicine MEDS3039/MEDS3038 UNUK AUT Statistics 1 area) in a folder or drive of
your choice. Select File, Open, Data and select the Gedling dataset from where it is
saved.

1. About SPSS and Exploring the data

1.1 What is SPSS?

SPSS is a Windows-based computer package used for statistical analyses. For
version 17 the software became known as PASW, before IBM took over the
package and restored the SPSS title for version 19. SPSS is menu driven rather than
command driven. There are two main windows: the Data Editor window and the
Output or Viewer window.

1.2 The Data Editor window

When you first start the SPSS package, a Data Editor window opens. This displays
the contents of the working data file. This window has two different views: Data View
and Variable View and you can switch between the two using the tabs at the bottom.

MEDS3038/9 2020

Data View displays the data in a spreadsheet format.

Variable View displays a list of the variables in the dataset.

To carry out procedures in Data View, use the menus along the top, or the toolbar
buttons, which are shortcuts to the most popular commands. Position the mouse
over any toolbar button for a description of the command (which appears at the foot
of the screen).

1.3 Viewing data

To get a feel for your data, go to Data View.

• Use the scroll bars to move up and down, and left and right, to view the data.

• Use the arrow keys on your keyboard, or the mouse, to move from one cell to
another in the spreadsheet.

• To move to the top, bottom, far right or far left of the spreadsheet, press Ctrl
and the appropriate arrow key together. This is an easy way of seeing how many
people are in your dataset.

• You can display the values of your categorical variables either as the numeric codes
entered (e.g. 0s and 1s for sex) or as the value labels which have been
defined in Variable View (e.g. male and female). To switch, go to View on the menu
bar and choose Value Labels. Alternatively, use the Value Labels button on the toolbar
to swap between the two display options.

Question 1: How many people are in this dataset?

2. Exploring categorical variables

• Obtain a frequency table for the variable ‘weightgp’. (Hint: Analyze, Descriptive
Statistics, Frequencies, then put 'weightgp' (not 'weight') in the Variable(s) list
and click OK).

Question 2: How many people weigh between 70 and 84 kg? Also, express this
as a proportion and as a percentage.
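
The arithmetic behind Question 2 can be sketched in a few lines of Python. This is only an illustration: the counts are made up (they are not taken from the Gedling data) and the function name is hypothetical. In the SPSS frequency table, the ‘Percent’ and ‘Valid Percent’ columns differ only in whether missing values are counted in the denominator.

```python
def proportion_and_percentage(count, total):
    """Return (proportion, percentage) for a category count out of a total."""
    if total <= 0:
        raise ValueError("total must be positive")
    proportion = count / total
    return proportion, 100 * proportion


# Hypothetical counts for illustration only (NOT from the Gedling dataset).
prop, pct = proportion_and_percentage(count=30, total=120)
print(f"proportion = {prop:.2f}, percentage = {pct:.1f}%")
```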

Question 3: Which category has the smallest proportion of people in it?

Question 4: Are there any missing values?

• Now display the distribution of the variable ‘weightgp’ graphically. (Hint: under
‘Graphs’, ‘Legacy Dialogs’, choose either ‘Bar…’ then ‘Simple’, or ‘Pie…’. Leave the
‘Summaries for groups of cases’ option selected, then click Define. For bar charts,
drag ‘weightgp’ into the ‘Category Axis’ box. For pie charts, drag ‘weightgp’ into
the ‘Define Slices by’ box. Click OK.)

• For the bar chart, repeat but present the % of cases. (Hint: select ‘% of cases’ at the
top of the dialog box.)

Question 5: From the graph, approximately what % of the sample fall into the
lightest weight category?

Question 6: Repeat the above steps for the following variables:

• Smoking status (smk91)
• Social class (social)

3. Describing the relationships between two categorical variables

• The research question: Is there a relationship between sex and hay fever?
Summarise the data using the table below to help you. (HINT: you can create this
table using Analyze, Descriptive Statistics, Crosstabs... Put Sex into the ‘Row(s)’ box
and hay fever in 1991 into the ‘Column(s)’ box, then click OK.)

                Hay fever in 1991
Gender          Yes         No          TOTAL
Male
Female
TOTAL

Question 7: What is the “outcome measure”? And what is the “exposure”?

Question 8: What percentage of men had hay fever in 1991? (Hint: Repeat the
above procedure but this time click the ‘Cells...’ button and tick ‘Row’ (under
Percentages) to get the percentages with hay fever by gender. Remember to click
Continue after this).
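
As a rough illustration of what the ‘Row’ percentages option reports, the sketch below converts each row of a crosstab into percentages of that row's total. The counts are hypothetical (not the Gedling results) and the function name is an assumption made for this example.

```python
def row_percentages(table):
    """Convert each row of counts into percentages of that row's total."""
    result = {}
    for row_label, counts in table.items():
        row_total = sum(counts.values())
        result[row_label] = {
            col: 100 * n / row_total for col, n in counts.items()
        }
    return result


# Hypothetical counts for illustration only (NOT from the Gedling dataset).
crosstab = {
    "Male": {"Yes": 20, "No": 80},
    "Female": {"Yes": 40, "No": 60},
}
row_pct = row_percentages(crosstab)
for sex, cols in row_pct.items():
    print(sex, {col: f"{p:.1f}%" for col, p in cols.items()})
```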

Question 9: Using SPSS, calculate the odds ratio and 95% confidence interval
for hay fever in women as compared to men (HINT: use Analyze, Descriptive
Statistics, Crosstabs, click the ‘Statistics...’ button and tick the 'Risk' box. Men
would be considered the “unexposed” group in this instance because they take
the lower numeric value, i.e. 1).
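
For readers who want to see where an odds ratio and its confidence interval come from, here is a minimal sketch using the common log-based (Woolf) approximation. It is assumed, not confirmed by this tutorial, that this matches the interval SPSS prints. The counts are hypothetical and the function name is made up for the example.

```python
import math


def odds_ratio_ci(exp_yes, exp_no, unexp_yes, unexp_no, z=1.96):
    """Odds ratio (exposed vs unexposed) with an approximate 95% CI.

    Uses the log-based (Woolf) standard error; all four counts must be > 0.
    """
    odds_ratio = (exp_yes * unexp_no) / (exp_no * unexp_yes)
    se_log_or = math.sqrt(
        1 / exp_yes + 1 / exp_no + 1 / unexp_yes + 1 / unexp_no
    )
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts (NOT the Gedling data): women are "exposed",
# men are "unexposed", and the outcome is hay fever yes/no.
or_, lo, hi = odds_ratio_ci(exp_yes=40, exp_no=60, unexp_yes=20, unexp_no=80)
print(f"OR = {or_:.2f}, 95% CI {lo:.2f} to {hi:.2f}")
```

An interval that excludes 1 suggests the odds differ between the two groups.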

Question 10: Summarise your findings for this relationship.

Question 11: Is this relationship significant? Carry out a chi-squared test (HINT:
use Analyze, Descriptive Statistics, Crosstabs and tick the ‘Chi-square’ box
under the Statistics... option). What is the p value? Interpret the p value.
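
The sketch below shows how a Pearson chi-squared statistic is built from observed and expected counts. It is an illustration with hypothetical counts, not a reproduction of the SPSS output. The statistic and degrees of freedom work for tables of any size, but the p-value helper shown here is valid for 1 degree of freedom (a 2 x 2 table) only.

```python
import math


def pearson_chi_square(table):
    """Return (chi-squared statistic, degrees of freedom) for a table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, df


def p_value_1df(chi2):
    """Upper-tail p-value for a chi-squared statistic with exactly 1 df."""
    return math.erfc(math.sqrt(chi2 / 2))


# Hypothetical 2 x 2 counts (NOT the Gedling data):
# rows = men, women; columns = hay fever yes, no.
chi2, df = pearson_chi_square([[20, 80], [40, 60]])
p = p_value_1df(chi2)
print(f"chi-squared = {chi2:.2f}, df = {df}, p = {p:.4f}")
```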

Question 12: Explore the following associations using the same approach as above and
determine whether the results are statistically significant:
• Sex and atopy in 1991
• Sex and smoking
• Social class and hay fever

(Hint: Remember that chi-square also works when you have exposure and outcome
variables which have >2 categories. The chi-square statistic and p-value can be
interpreted in the same way as before)

4. Exploring continuous variables

• Obtain a histogram and summary statistics for the variable ‘FEV191’. (Hint:
choose Analyze, Descriptive Statistics, Explore and drag 'FEV191' into the
Dependent List. Click on the 'Plots...' button and select 'Histogram' (also uncheck
Stem-and-leaf). Click Continue followed by OK.)

Question 13: Does the histogram produced look approximately ‘Normally
distributed’?

Question 14: What is the mean FEV1 from the output produced?

Question 15: What is the median FEV1?

Question 16: What is the standard deviation?

Question 17: What is the range (the minimum and maximum values, not the
difference between them)?
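
The summary statistics asked for in Questions 14 to 17 can be illustrated with Python's standard library. The values below are invented for the example and are not FEV1 readings from the Gedling data. The standard deviation uses the n - 1 (sample) formula.

```python
import statistics

# Hypothetical measurements for illustration only (NOT from the Gedling dataset).
values = [2.1, 2.8, 3.0, 3.4, 3.9, 4.2]

summary = {
    "mean": statistics.mean(values),
    "median": statistics.median(values),
    "sd": statistics.stdev(values),  # sample standard deviation (n - 1)
    "min": min(values),
    "max": max(values),
}
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```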

Question 18: Repeat the above steps for the following variables:
1. Height
2. Weight
3. Age

5. Independent Samples t test

• The research question: Is there a difference in FEV1 between men and women?

Question 19: Write down the mean FEV1 and 95% CI for the mean FEV1 for men
and women (Hint: choose Analyze, Descriptive statistics, Explore. This time you
will need to drag the variable ‘Sex’ into the Factor list). Putting a variable in the
factor list will mean the procedure will be repeated for each category of the
variable.

Question 20: Write down an estimate of the mean difference in FEV1 between men
and women, with its 95% confidence interval (HINT: you will find this test under the
option Analyze, Compare Means, Independent-Samples T Test. Place the
outcome measure in the Test Variable(s) box and your exposure variable (i.e.
sex) in the Grouping Variable box. Click on Define Groups… and define your
groups as ‘1’ and ‘2’).

Question 21: Compare the two means using an independent samples t test to
see if this result is significant. What is the p-value? And what does it mean?
(HINT: the output shows the results of two different tests, one assuming equal
variances between the two groups (top row), and one not assuming equal
variances (bottom row). Look at Levene’s test to decide which row to use: if
Levene’s test has a significant p value (P<0.05, indicated by Sig.) then there is a
difference in variances and you need to use the bottom row to compare means; if
not, use the top row. In practice you will often find these results are very similar, so
it usually makes little difference which row of output you interpret.)
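
To see why the two rows of the t test output are often close, the sketch below computes the t statistic and degrees of freedom both ways: with a pooled variance (equal variances assumed) and with the Welch formula (equal variances not assumed). The data and the function name are hypothetical. The sketch stops short of a p-value, which would need the t distribution.

```python
import math
import statistics


def t_statistics(group1, group2):
    """Return (mean difference, pooled-variance result, Welch result)."""
    n1, n2 = len(group1), len(group2)
    diff = statistics.mean(group1) - statistics.mean(group2)
    v1, v2 = statistics.variance(group1), statistics.variance(group2)

    # Top row in SPSS: equal variances assumed (pooled variance).
    pooled_var = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    se_pooled = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    equal = {"t": diff / se_pooled, "df": n1 + n2 - 2}

    # Bottom row in SPSS: equal variances not assumed (Welch).
    se_welch = math.sqrt(v1 / n1 + v2 / n2)
    df_welch = (v1 / n1 + v2 / n2) ** 2 / (
        (v1 / n1) ** 2 / (n1 - 1) + (v2 / n2) ** 2 / (n2 - 1)
    )
    unequal = {"t": diff / se_welch, "df": df_welch}
    return diff, equal, unequal


# Hypothetical values for two groups (NOT from the Gedling dataset).
group_a = [3.1, 3.5, 3.8, 4.0, 4.6]
group_b = [2.4, 2.7, 3.0, 3.1, 3.3]
diff, equal, unequal = t_statistics(group_a, group_b)
print(f"mean difference = {diff:.2f}")
print(f"equal variances assumed:     t = {equal['t']:.3f}, df = {equal['df']}")
print(f"equal variances not assumed: t = {unequal['t']:.3f}, df = {unequal['df']:.2f}")
```

With equal group sizes, as here, the two t statistics coincide and only the degrees of freedom differ.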

Question 22: In this example the confidence interval for the mean difference does
not include zero. Explain why we would expect this even before looking at the
confidence interval.

Question 23: Following the same steps as above investigate whether vitamin E
intake differs between people with and without hay fever (hint: please make sure
groups are defined correctly).

6. Mann-Whitney U Test

Please note that understanding the output from this test is not required for the REM
exam (only why you would use it).

The research question: Are iron levels different for men and women?

Question 24: Are iron levels in 1991 (variable 'iron91') approximately normally
distributed in men and women? (HINT: plot separate histograms for men and
women using the Explore... command as described above, i.e. put 'iron91' in the
Dependent List and 'Sex' in the Factor List.)

Question 25: Depending on the answer to Q.24, carry out either an Independent
Samples t-test or a Mann-Whitney U test to determine whether there is a
significant difference in iron levels between men and women.

HINT (for Mann-Whitney U test): select Analyze, Nonparametric Tests,
Independent Samples... You will be asked ‘What is your objective?’. Leave the
‘Automatically compare distributions across groups’ option selected and click on the
‘Fields’ tab at the top of the window. Put 'iron91' in the Test Fields box and 'sex'
in the Groups box. Then click on ‘Settings’, then on ‘Customize tests’,
followed by ‘Mann-Whitney U (2 samples)’, and make sure all other boxes are
unchecked. Finally, click ‘Run’. In the Output you should get a p-value only (in
the column ‘Sig.’). In order to give an indication of the effect size you should
present the medians (and interquartile ranges) for each group (see Q.26 below).

Technical note: The Mann-Whitney U test is used to compare the distribution of
the data values between your two exposure groups (in this case ‘gender’) to
determine whether, on average, values are higher in one group than the other. It
does not directly compare means (as does the Independent Samples t-test) or
medians. To find out more about this test please consult a statistics textbook or
online material, although understanding this test goes beyond what you would be
expected to know for the REM module and your dissertations.
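
For the curious, the rank-based idea behind the test can be sketched as follows. This is an illustration only, with made-up data and a hypothetical function name. The p-value uses a large-sample normal approximation with no tie or continuity correction, so it will not necessarily match the figure SPSS reports, especially for small samples.

```python
import math
from statistics import NormalDist


def mann_whitney_u(x, y):
    """Return (U, approximate two-sided p-value) for two independent samples."""
    # Pool the values, remembering which group (0 = x, 1 = y) each came from.
    combined = sorted((v, g) for g, grp in enumerate((x, y)) for v in grp)
    n = len(combined)
    rank_sum_x = 0.0
    i = 0
    while i < n:
        # Find the run of tied values and give each the average rank.
        j = i
        while j < n and combined[j][0] == combined[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        for k in range(i, j):
            if combined[k][1] == 0:
                rank_sum_x += avg_rank
        i = j

    n1, n2 = len(x), len(y)
    u_x = rank_sum_x - n1 * (n1 + 1) / 2
    u = min(u_x, n1 * n2 - u_x)

    # Normal approximation (assumption: no tie or continuity correction).
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    p = 2 * NormalDist().cdf((u - mu) / sigma)
    return u, min(p, 1.0)


# Hypothetical values (NOT from the Gedling dataset).
u, p = mann_whitney_u([1, 2, 3, 4, 5], [6, 7, 8, 9, 10])
print(f"U = {u}, approximate p = {p:.4f}")
```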

Question 26: If you have chosen to use the Mann-Whitney test for Q.25 then
present the median and inter-quartile range for the iron levels in men and
women (these should be used because, if the response variable is not normally
distributed, it is not appropriate to use the mean and standard deviation).

HINT: to get the median and inter-quartile range use the Explore command as
described above. The inter-quartile range consists of two values: the 25th and
75th percentiles (the median is the 50th percentile). To get these values, click on
‘Statistics’, tick ‘Percentiles’, click ‘Continue’ and then ‘OK’. The values
underneath 25 and 75 in the Weighted Average row of output represent the
interquartile range (do not use the Interquartile Range value given as part of the main
output table, as this represents the difference between these percentiles and is
less informative).
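
As an illustration of what the 25th, 50th and 75th percentiles are, the sketch below uses Python's `statistics.quantiles` with its ‘exclusive’ method, which interpolates at position (n + 1) x p. It is assumed here, not confirmed by this tutorial, that this corresponds to the ‘Weighted Average’ row in SPSS. The data are made up.

```python
import statistics

# Hypothetical values for illustration only (NOT from the Gedling dataset).
values = [1, 2, 3, 4, 5, 6, 7]

# n=4 splits the data into quarters and returns the three cut points.
q1, median, q3 = statistics.quantiles(values, n=4, method="exclusive")
print(f"median = {median}, inter-quartile range = {q1} to {q3}")
```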

Important note about assessing for normality: In the lecture slides for this module
you are provided with a flow diagram to select the most appropriate statistical test
based on aspects of your data. Such a diagram can provide a useful rule of thumb for
selecting the correct test but this should not be a substitute for good academic
judgement. In particular, a very common pitfall is choosing to perform a
non-parametric test (such as the Mann-Whitney test) when a parametric test (such as the
independent samples t-test) would be appropriate. Whilst it is true that a normal
distribution is one of the assumptions for an independent samples t-test, in reality the
test is robust to a certain level of non-normality in your outcome measure. In this
module, we emphasise the importance of presenting a measure of the size of the
difference (which we term “effect size”) in addition to a p-value. A limitation of the
Mann-Whitney test (and certain other non-parametric tests) is that they do not relate
directly to an effect size (such as the difference between means, which is the basis of a
t-test), so you are left with a measure of significance (from a hypothesis test) but no
measure of effect size. Another, more subtle disadvantage is that you are unlikely to
get small (i.e. significant) p-values from a non-parametric test when sample sizes are
small, even when the difference between your groups is extreme. In the example above,
the distribution of iron levels was probably skewed enough to indicate that the
Mann-Whitney test should be used instead of a t-test. Now consider the variable vitamin E
(vite91), which is only slightly skewed, and compare this between people with and
without hay fever (hayf91) as in Question 23 above. You will notice that with a big
dataset (such as this one) the p-values obtained from the two alternative tests are
very similar.

7. Correlation

• The research question: Are age and Vitamin C levels in 1991 correlated?

Question 27: First, check to make sure that both age and vitamin C levels are
normally distributed (HINT: Use either ‘Explore’ as in the ‘Exploring continuous
variables’ section, or select Graphs, Legacy Dialogs, Histogram. If using the ‘Explore’
command, both variables can be put in the ‘Dependent List’ at the same time to save
having to do this twice).

Question 28: Perform the appropriate correlation test between age and Vitamin C
level, using the results to answer the research question. (HINT: use Analyze,
Correlate, Bivariate... and put both the variables you wish to compare in the
‘Variables’ box together. Make sure the ‘Pearson’ coefficient, the test covered in
the REM lecture, is the only one selected.)
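
The Pearson coefficient itself is straightforward to compute by hand, as the following sketch shows. The paired values and the function name are hypothetical, and the sketch returns only r, not the p-value that SPSS also reports.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / math.sqrt(s_xx * s_yy)


# Hypothetical paired values (NOT from the Gedling dataset).
x_values = [1, 2, 3, 4, 5]
y_values = [2, 4, 5, 4, 5]
r = pearson_r(x_values, y_values)
print(f"r = {r:.3f}")
```

Values of r close to +1 or -1 indicate a strong linear relationship, and values near 0 indicate little or no linear relationship.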

8. Regression

• The research question: What is the relationship between height and vitamin E
consumption? Does vitamin E consumption predict an individual’s height?

Note: To answer questions 29 to 32 below you will need to decide which is your
dependent variable (y-axis) and which is your independent variable (x-axis).

Question 29: Draw a scatter diagram of height in cm (height) against vitamin E
consumption (vite91). Describe the relationship. (HINT: use the commands
Graphs, Legacy Dialogs, Scatter/Dot..., Simple Scatter followed by Define. Put
the appropriate variables in the ‘X Axis’ and ‘Y Axis’ boxes and click OK; ignore all
the other boxes.)

Question 30: Perform linear regression between height and vitamin E
consumption. Is there a significant association between height and vitamin E
consumption? (HINT: use Analyze, Regression, Linear... Put your y axis variable
in the Dependent box and your x axis variable in the Independent(s)
box.)

Question 31: Write down the equation of the line in the form Y = a + bX,
where Y is the outcome (dependent) variable and X is the predictor
(independent) variable. (HINT: In the Coefficients table the intercept (a) is
given under B in the '(Constant)' row and the regression slope (b) is given under B
in the 'Vitamin E (mg) 1991' row.)

Question 32: Use the above equation to predict an individual’s height if their
vitamin E consumption is 4.5 mg.
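
The prediction step in Question 32 is simply a matter of substituting X into Y = a + bX. The sketch below fits a least-squares line to made-up data and then predicts Y at a new X. The numbers are hypothetical and are not the Gedling coefficients, so use the values from your own SPSS output to answer the question.

```python
def fit_line(x, y):
    """Least-squares intercept (a) and slope (b) for the line Y = a + bX."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    b = s_xy / s_xx
    a = mean_y - b * mean_x
    return a, b


def predict(a, b, x_new):
    """Predicted Y for a given X using Y = a + bX."""
    return a + b * x_new


# Hypothetical data for illustration only (NOT from the Gedling dataset).
a, b = fit_line([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
y_hat = predict(a, b, 4.5)
print(f"Y = {a:.2f} + {b:.2f}X; predicted Y at X = 4.5 is {y_hat:.2f}")
```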
