Introduction To SPSS
SPSS v15
Page 2 of 44
CONTENTS
PROCEDURE 0: GETTING DATA
Click on Start > JMU Applications > System Utilities > Change
Library
This displays the Connect Library Screen.
Expand the 'Change To' dropdown list and select 'Avril Robarts LRC'
from the options displayed.
A message is displayed:
Click on 'OK'.
Minimise all programs so you can see your desktop.
Click on the 'My Computer' icon.
You will see an icon for your L: drive under Network Drives:
LESSON 1 –
SPSS BASICS
In this lesson, you will learn how to:
Start SPSS
Open an existing data file
Save a data file
Create a new data file
Create SPSS variables
Enter SPSS data
Navigate SPSS
Lesson 1: SPSS Basics
1.1 SPSS Files
There are 3 types of file used by SPSS:
1.2 Starting SPSS
PROCEDURE 2: OPENING AN EXISTING DATA FILE
The selected data file is opened
1.5 Creating a New Data File
PROCEDURE 4: CREATING A NEW DATA FILE
If the Welcome Screen is visible: Select the 'Cancel' button.
If the Welcome Screen is not visible: Select 'File/New/Data' from the Menu
Bar.
The SPSS data screen is shown (see figure 4).
1.6 Creating SPSS Data
1.6.1 SPSS Variables
SPSS requires variables to hold its data, so we must create these variables before we
can enter any data.
1.6.2 Identifying Variables from Questionnaires
Since it is common to obtain data for our statistical experiments from questionnaires,
it is important to understand how to use our questionnaires to identify the variables we
will need.
EXAMPLE 1: QUESTIONNAIRE
QUESTIONNAIRE
No.  Question                                     Answer
1    Age (in years)
2    Height (in meters)
3    Weight (in kilogrammes)
4    How many cigarettes do you smoke per         None
     day (please tick as appropriate)?            1 - 10
                                                  11 - 20
                                                  21 - 30
                                                  31 - 40
                                                  41 or more
5    Do you own a set of Bathroom Scales          Y/N
6    Do you own a Rowing Machine                  Y/N
7    Do you own an Exercise Bike                  Y/N
8    Do you own a Punch Bag                       Y/N
9    Do you own any other Sports Equipment        Y/N
10   How many hours a week (approx) do you
     spend exercising?
11   Are you a member of a sporting team?         Y/N
12   How many books do you own?
13   Do you own a video recorder?                 Y/N
14   Do you own a DVD player?                     Y/N
15   Do you own a PC?                             Y/N
16   Do you own a Hi Fi?                          Y/N
17   How many computer games do you own?
18   Salary (times thousands of pounds). Please   < 20
     tick one as appropriate.                     20.01 – 30
                                                  30.01 – 40
                                                  40.01 plus
There are 18 questions, and we will need one variable per question. So we will need
18 variables:
1.6.3.1 Rules for Variable Names
The following rules apply to variable names:
The name must begin with a letter. The remaining characters can be any letter,
any digit, a period, or the symbols @, #, _, or $.
Variable names cannot end with a period.
Variable names that end with an underscore should be avoided (to avoid
conflict with variables automatically created by some procedures).
Blanks and special characters (for example, !, ?, ', and *) cannot be used.
Each variable name must be unique; duplication is not allowed. Variable
names are not case sensitive. The names NEWVAR, NewVar, and newvar are
all considered identical.
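
The rules above can be checked mechanically before any names are typed into SPSS. The following Python sketch is only an illustration: the function name check_variable_name is hypothetical, and it assumes that 'letter' means an unaccented letter A to Z. It implements only the rules listed in this section.

```python
import re

# Pattern built from the rules above: a letter first, then any mix of
# letters, digits, periods, or the symbols @, #, _, $.
# Assumption: "letter" is taken to mean A-Z / a-z only.
_NAME_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9.@#_$]*")


def check_variable_name(name, existing_names=()):
    """Return a list of problems with a proposed name (empty list = acceptable)."""
    problems = []
    if not _NAME_PATTERN.fullmatch(name):
        problems.append("must begin with a letter and contain only "
                        "letters, digits, '.', '@', '#', '_' or '$'")
    if name.endswith("."):
        problems.append("cannot end with a period")
    if name.endswith("_"):
        problems.append("should not end with an underscore")
    # Names are not case sensitive, so compare in lower case.
    if name.lower() in {n.lower() for n in existing_names}:
        problems.append("duplicates an existing variable name")
    return problems


print(check_variable_name("age"))                  # no problems
print(check_variable_name("my var"))               # contains a blank
print(check_variable_name("NewVar", ["NEWVAR"]))   # duplicate, ignoring case
```

An empty list means the name passes every rule; otherwise each entry names the rule that was broken.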
1.6.3.3 Creating a Text Variable
PROCEDURE 7: CREATING A TEXT VARIABLE
Open the Variable View form, if it is not already visible (See section 1.6.3)
On the first available blank row: Enter a variable name in the 'Name' column.
This name will be used to identify your variable
within SPSS. See section 1.6.3.1 for the
Variable Name Rules.
Press the 'tab' or 'return' key.
This fills the row with default settings for that
variable.
Select the cell in the 'Type' column. Click on
the button that appears on the right hand
side of the field. This displays the 'Variable
Type' box.
For example, a Yes/No variable could use the value Y to represent 'Yes' and N to
represent 'No'. These are string values.
Alternatively, you might want to use the value 1 to represent 'Yes' and 0 to represent
'No'. These are numeric values.
PROCEDURE 8: CREATING A CATEGORICAL VARIABLE
Open the Variable View form, if it is not already visible (See section 1.6.3)
On the first available blank row: Create either a numeric or a string variable,
depending on the type required for the label.
Select the cell in the 'Values' column. Click on
the button that appears on the right hand
side of the field. This displays the 'Value
Labels' box.
1.6.4 Deleting a variable
PROCEDURE 9: DELETING A VARIABLE
Open the Variable View
Right-click on the variable number.
EXERCISE 1
You have been handed the following questionnaire, which needs to be coded into
SPSS:
EXERCISE 1
QUESTIONNAIRE
No.  Question              Answer
1    Age (in years)
2    Height (in meters)
3    Name
4    Gender                Male
                           Female
Create an SPSS data file (called Q1.sav) to hold the data.
Before you create the variables, decide:
o How many variables are needed
o What type (numeric, string, categorical) does each variable need to be?
EXERCISE 2
The following 2 completed questionnaires have been submitted.
Enter the data into your SPSS data file Q1.sav.
EXERCISE 2
QUESTIONNAIRE
No.  Question              Answer
1    Age (in years)        32
2    Height (in meters)    1.56
3    Name                  Julie Jones
4    Gender                Male
                           Female  X

No.  Question              Answer
1    Age (in years)        41
2    Height (in meters)    1.92
3    Name                  Andy Andrews
4    Gender                Male    X
                           Female
1.6.5 Excluding Data from Calculations: Defining Missing
Variables
In SPSS there are no empty cells within the data file (which is assumed to be
rectangular). If no value has been entered, the system supplies the system-missing
value (represented on the screen as a dot). SPSS will automatically exclude system-
missing values from its statistical calculations.
It may be, however, that the user wishes SPSS to treat certain responses present
within the data as if they were missing. For example, one question on our survey
might have five possible answers (A, B, C, D, E). Suppose that category E
indicates that the respondent refused to answer the question, while category D
indicates that the respondent failed to understand the question. We might wish to
complete an analysis of our data without including categories D and E in the
calculations, even though we do want to see their frequencies and counts in the
output. In this case we could define categories D and E as user-missing.
SPSS allows us to define special values (missing values) which will be used to
indicate that the data is user-missing. Data values specified as user-missing are
flagged for special treatment and are excluded from most calculations.
For each variable, you can choose one of the following options:
Define no missing values
Define up to three values that the system will treat as 'user-missing'. In our
example above, we could define categories D and E as missing values.
Define a range plus one discrete value. For example in an analysis of examination
results we might want to exclude results less than 20% from our analysis, but still
have the counts of those results displayed on our output. We might also want to
exclude those students who left the examination without submitting a paper (a walk
out could be coded as an arbitrary number, such as –9. The minus sign makes it
obvious that it is not an examination result). Here we could define the range 0%-20%
and the discrete value –9 as missing values.
All string values, including null or blank values, are considered valid unless you
explicitly define them as missing. To define null or blank values as missing for a
string variable, enter a single space in one of the fields for Discrete missing values.
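
The options above can be sketched in a few lines of Python. This is an illustration of the idea, not of how SPSS works internally: make_missing_rule is a hypothetical helper, the exam results are made up, and the range limits are assumed to be inclusive.

```python
def make_missing_rule(discrete=(), low=None, high=None):
    """Build a test for user-missing values.

    discrete  -- up to three discrete values, or at most one if a range is given
    low, high -- optional limits of a range of missing values (assumed inclusive)
    """
    has_range = low is not None and high is not None
    limit = 1 if has_range else 3
    if len(discrete) > limit:
        raise ValueError(f"at most {limit} discrete value(s) allowed here")

    def is_user_missing(value):
        if value in discrete:
            return True
        return has_range and low <= value <= high

    return is_user_missing


# Survey example: categories D and E are user-missing.
survey_rule = make_missing_rule(discrete=("D", "E"))

# Examination example: the range 0-20 plus the walk-out code -9.
exam_rule = make_missing_rule(discrete=(-9,), low=0, high=20)

results = [55, 12, -9, 73, 20, 48]          # made-up exam results
valid = [r for r in results if not exam_rule(r)]
mean_valid = sum(valid) / len(valid)        # mean of the valid results only
print(valid, mean_valid)
```

The user-missing values stay in the data, so they can still be counted; they are simply left out when the mean is calculated.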
PROCEDURE 10: DEFINING MISSING VALUES
Open the Variable View form, if it is not already visible (See section 1.6.3)
On the row representing the variable to have missing values defined:
Select the cell in the 'Missing' column. Click on the button that appears on the
right hand side of the field. This displays the 'Missing Values' box.
Figure 11: The Missing Values Box
To define 'No missing values', either:
Select the 'Cancel' button, or
Ensure that 'No missing values' is checked, then select the 'OK' button.
To define 'Discrete missing values':
Check on 'Discrete missing values'
Enter up to three separate values in the fields provided
Select the 'OK' button.
Any datum with a value corresponding to one of these entered values will be
marked user-missing and ignored for most calculations.
To define 'Range plus one optional discrete missing value':
Check on 'Range plus one optional discrete missing value'
Enter the limits of the range in the 'Low' and 'High' fields.
If required, enter a further discrete value in the 'Discrete value' field.
Any datum having a value within the entered range, or corresponding to the
discrete value entered, will be marked user-missing and ignored for most
calculations.
For example, a variable might contain respondents' ages, held as numerical data.
However, we may decide to redefine this data so it has the following categories:
19 or below, 20 to 29, 30 to 39, 40 to 49, 50 to 59, 60 or above.
When categorising data, the results can be placed in the same variable, or a new
variable can be created. It is more common to create a new variable:
PROCEDURE 11: CONVERTING NUMERIC DATA TO CATEGORICAL
DATA
From the 'Transform' menu, select 'Recode Into Different Variables…'
This displays the 'Recode into Different Variables' dialog box
Select the variable you want to transform in the left hand box
Select the arrow to move the variable into the 'Input Variable -> Output
Variable' box.
Enter a name for the new variable in the 'Name' field.
Enter a label for the new variable in the 'Label' field.
Select the 'Change' button.
Select the 'Old and New Values…' button
This displays the 'Recode into Different Variables: Old and New Values' dialog
box.
To enter the lowest category (e.g. 19 or below):
Ensure that the 'Range: LOWEST through value:' checkbox is checked 'on'.
Enter the corresponding value (19, in our example) in the 'LOWEST through value' field.
Ensure that the 'Value' check box is set to 'on' in the 'New Value' frame.
Enter the category value in the 'Value' field of the 'New Value' frame.
Select the 'Add' button
To enter the highest category (e.g. 60 or above):
Ensure that the 'Range: value through HIGHEST:' checkbox is checked 'on'.
Enter the corresponding value (60, in our example) in the 'value through HIGHEST:' field.
Ensure that the 'Value' check box is set to 'on' in the 'New Value' frame.
Enter the category value in the 'Value' field of the 'New Value' frame.
Select the 'Add' button
To enter central categories (e.g. 20 to 29):
Ensure that the 'Range: through' checkbox is checked 'on'.
Enter the lowest and highest values of the range in the corresponding boxes
(20 and 29, in our example)
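
The effect of this recoding on the age example can be sketched in Python as follows. The category codes 1 to 6 are an assumption made for illustration (the procedure leaves the choice of category values to you), ages are assumed to be whole numbers, and recode_age is a hypothetical name.

```python
# Assumed category codes and their labels, following the age bands in the example.
AGE_LABELS = {
    1: "19 or below",
    2: "20 to 29",
    3: "30 to 39",
    4: "40 to 49",
    5: "50 to 59",
    6: "60 or above",
}


def recode_age(age):
    """Return the category code for a whole-number age."""
    if age <= 19:
        return 1              # LOWEST through 19
    if age >= 60:
        return 6              # 60 through HIGHEST
    # Central ranges: 20-29 -> 2, 30-39 -> 3, 40-49 -> 4, 50-59 -> 5
    return age // 10


ages = [17, 20, 29, 41, 59, 60, 83]          # made-up ages
age_groups = [recode_age(a) for a in ages]   # the new variable
print(age_groups)
print([AGE_LABELS[g] for g in age_groups])
```

The original ages are left untouched and the categories are held in a separate list, which mirrors recoding into a different variable.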
1.6.6.2 Regrouping Categorical Variables
Sometimes it is necessary to combine the categories of a categorical variable to form a
new set of categories. For example, we might have a categorical variable with 5
categories, which we want to recode into 3 categories, as follows:
Select the categorical variable you want to regroup in the left hand box
Select the arrow to move the variable into the 'Input Variable -> Output
Variable' box.
Enter a name for the new variable in the 'Name' field.
Enter a label for the new variable in the 'Label' field.
Select the 'Change' button.
Select the 'Old and New Values…' button
This displays the 'Recode into Different Variables: Old and New Values' dialog
box.
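
Regrouping amounts to a lookup from each old category value to a new one. The Python sketch below uses a purely hypothetical mapping of five categories onto three (1 and 2 become 1, 3 becomes 2, 4 and 5 become 3) and made-up data; substitute the mapping that applies to your own variable.

```python
# Hypothetical mapping: old category value -> new category value.
REGROUP = {1: 1, 2: 1, 3: 2, 4: 3, 5: 3}

old_values = [1, 5, 3, 2, 4, 4]                  # made-up responses
new_values = [REGROUP[v] for v in old_values]    # held in a new variable
print(new_values)
```

Each entry of the mapping corresponds to one old value and new value pair added in the dialog box.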
1.6.6.3 Calculating New Variables from Existing Data
It is possible to create new variables by performing calculations on existing data. For
example, in your questionnaire you may have two variables, called 'before' and
'after', describing respondents' weight before and after a slimming treatment.
However, once the data has been gathered, you may decide that you need to perform
some tests on the actual difference in weight. It is necessary to calculate this value
from the 'before' and 'after' variables.
Enter the name of the new variable in the 'Target Variable:' field.
Build the calculation:
Select a variable from the left hand box
Select the arrow to move it into the 'Numeric Expression' box.
Select an arithmetic operator from the displayed keypad (+, -, *, / etc.).
Select another variable from the left hand box
Select the arrow to move it into the 'Numeric Expression' box
Continue until the complete expression is built
NOTE: VARIABLES USED IN CALCULATIONS MUST BE NUMERIC
Select 'OK'
Open the Variable View form (see PROCEDURE 5).
Create a label for the new variable.
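
For the slimming example, the calculation being built is simply 'before' minus 'after' for each respondent. A minimal Python sketch, using made-up weights and a hypothetical target name weight_loss:

```python
# Made-up weights (in kilogrammes) for three respondents.
before = [82.5, 91.0, 77.3]
after = [79.0, 88.5, 77.3]

# New variable: one difference per respondent, calculated row by row.
weight_loss = [b - a for b, a in zip(before, after)]
print(weight_loss)
```

A positive value means the respondent lost weight; zero means no change.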
1.7 Getting Around in SPSS
To change any part of a variable, double-click on the horizontal grey bar at the top of
the spreadsheet. This brings you to the Define Variable window.
Use the arrow keys on the keyboard to move forward and backward.
To go all the way to the end of the data, press <control> and the right arrow key.
To go all the way to the beginning of the data, press <control> and the left arrow key.
LESSON 2 –
SUMMARISING DATA
In this lesson, you will learn how to:
Use SPSS to summarise data
Calculate frequencies (i.e. number of occurrences) of data
Calculate maximum, minimum and mean values
Calculate Standard Deviation
Display data in graphical format
Lesson 2: Summarizing Data
2.1 Summarizing Data Numerically
Once your data has been entered, it is often easy to spot broad trends by inspecting
summaries of that data. In SPSS these summaries are called Descriptive Statistics.
Frequencies
With a categorical variable it is often useful to know how many responses each
category has received. These can be expressed as straightforward counts or as
percentages.
Means
It is often useful to know the average values of our variables, particularly when
comparing one subset of data against another.
Select the variable(s) you want to examine in the left-hand panel (to select more
than one variable, hold down the control key while left-clicking on the required
variables).
Select the arrow button to move the selected variable(s) into the 'Variable(s):'
box.
Ensure that the 'Display frequency tables' checkbox is checked 'on'.
Select the OK button. The SPSS Output Viewer is displayed.
2.1.1.1 SPSS Output
The following output is derived from a frequency analysis (PROCEDURE 14) of the
dataset demo.sav. The particular variable being analysed is carcat.
                   Frequency   Percent   Valid Percent   Cumulative Percent
Valid   Economy         1841      28.8            28.8                 28.8
        Standard        2275      35.5            35.5                 64.3
        Luxury          2284      35.7            35.7                100.0
        Total           6400     100.0           100.0
Notes
o Labels are displayed instead of variable names or category values.
o In the 'Statistics' table we are shown the number of valid responses (6400),
and the number of missing responses (0) for that variable.
o We are shown the frequency (i.e. the number of responses) for each category
of the variable.
o We are shown the percentages for each category of the variable: 28.8% of
respondents said they had an Economy car.
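
To see where the Percent and Cumulative Percent columns come from, the table can be rebuilt from the category counts. The Python sketch below is an illustration only (frequency_table is a hypothetical helper); it uses the counts shown in the output above.

```python
from collections import Counter


def frequency_table(values):
    """Return rows of (category, frequency, percent, cumulative percent)."""
    counts = Counter(values)          # keeps categories in order of first appearance
    total = sum(counts.values())
    rows = []
    cumulative = 0.0
    for category, count in counts.items():
        percent = 100.0 * count / total
        cumulative += percent
        rows.append((category, count, round(percent, 1), round(cumulative, 1)))
    return rows


# Rebuild the responses from the counts in the output table.
responses = ["Economy"] * 1841 + ["Standard"] * 2275 + ["Luxury"] * 2284
table = frequency_table(responses)
for row in table:
    print(row)
```

Each percentage is the category count divided by the 6400 valid responses, and the cumulative column is the running total of those percentages.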
2.1.2 Calculating Frequencies for Tabulated Categorical Variables
(Crosstabs)
2.1.2.1 Discussion
We may want to calculate how the categories of one variable break down against the
categories of another. For example, the data set demo.sav contains two variables:
1. gender, and
2. carcat (Primary vehicle price category).
Say that we would like to determine how car ownership within the vehicle price
categories is broken down between the sexes (i.e. the gender categories).
For this we can calculate a crosstab, displaying the relevant data in the cells of a table.
PROCEDURE 15: CALCULATING FREQUENCIES FOR TABULATED
CATEGORICAL DATA (CROSSTABS)
This displays the 'Crosstabs: Cell Display' dialog box
                                                    Cases
                                   Valid            Missing          Total
                                   N     Percent    N    Percent     N     Percent
Gender * Primary vehicle
price category                     6400  100.0%     0    .0%         6400  100.0%
OUTPUT 2: CROSSTAB CALCULATION
Gender * Primary vehicle price category Crosstabulation
Notes
o The variable labels are used throughout
o The top table tells us that there are 6400 respondents in the data set, with no
missing values
o The second table tells us, for example, that 28.6% of females bought an
Economy car, while 28.9% of males have Economy cars.
o 49.6% of the owners of luxury cars are female, while the other 50.4% are
male.
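
The two kinds of percentage quoted in these notes are calculated differently: the first is a percentage within a gender (a row percentage), the second a percentage within a car category (a column percentage). The Python sketch below illustrates the difference on a tiny made-up data set, not on demo.sav; all function names are hypothetical.

```python
from collections import Counter


def crosstab(row_values, col_values):
    """Count how often each (row category, column category) pair occurs."""
    return Counter(zip(row_values, col_values))


def row_percent(table, row, col):
    """Percentage of the cases in this row that fall in this column."""
    row_total = sum(n for (r, _), n in table.items() if r == row)
    return 100.0 * table[(row, col)] / row_total


def column_percent(table, row, col):
    """Percentage of the cases in this column that fall in this row."""
    col_total = sum(n for (_, c), n in table.items() if c == col)
    return 100.0 * table[(row, col)] / col_total


# Made-up data for eight respondents.
gender = ["Female", "Female", "Female", "Female", "Male", "Male", "Male", "Male"]
car = ["Economy", "Economy", "Luxury", "Standard", "Economy", "Luxury", "Luxury", "Luxury"]

table = crosstab(gender, car)
print(row_percent(table, "Female", "Economy"))    # share of females with an Economy car
print(column_percent(table, "Female", "Luxury"))  # share of Luxury owners who are female
```

When reading a crosstab, check whether a percentage is taken within the row or within the column before interpreting it.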
2.1.3 Calculating the Mean, Maximum and Minimum of a Scalar
(Non-Categorical) Variable
Select the variable(s) you want to examine in the left-hand box (to select more than
one variable, hold down the control key while left-clicking on the required variables).
Select the arrow button to move the selected variable(s) into the 'Variable(s):'
box.
Select the OK button.
The SPSS Output Viewer is displayed.
Notes
o The variable's label is displayed instead of its name.
o We can see that there were 6400 respondents.
o The minimum salary is 9.00
o The maximum salary is 1116.00
o The mean salary is 96.4748
2.1.4 The Standard Deviation
Referring back to PROCEDURE 16, we can see that another number, the Standard
Deviation, is also calculated.
Standard Deviation is a measure of the 'spread' of the data. If all the numbers in your
sample are close to the mean value, then there is a 'low spread', and the sample has a
Low Standard Deviation.
If the numbers in your sample are not all close to the mean value, then there is a 'high
spread', and the sample is said to have a High Standard Deviation.
SET 1
1.3 1.4 1.2 1.1 1.3 1.5 1.2 1.3 1.2 1.1
SET 2
170 10 110 210 70 65 400 253 200 326
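
The difference in spread between the two sets can be confirmed by calculating their standard deviations. The Python sketch below uses the standard library's statistics.stdev, which gives the sample standard deviation (dividing by n - 1); this is assumed here to be the form reported in the SPSS output.

```python
import statistics

set_1 = [1.3, 1.4, 1.2, 1.1, 1.3, 1.5, 1.2, 1.3, 1.2, 1.1]
set_2 = [170, 10, 110, 210, 70, 65, 400, 253, 200, 326]

mean_1 = statistics.mean(set_1)
mean_2 = statistics.mean(set_2)

# Sample standard deviation (divides by n - 1).
sd_1 = statistics.stdev(set_1)
sd_2 = statistics.stdev(set_2)

print("SET 1: mean", round(mean_1, 2), "standard deviation", round(sd_1, 2))
print("SET 2: mean", round(mean_2, 2), "standard deviation", round(sd_2, 2))
```

SET 1 has a mean of about 1.26 with a standard deviation of about 0.13 (low spread), while SET 2 has a mean of about 181.4 with a standard deviation of about 122.7 (high spread).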
2.2 Summarizing Data Graphically
Graphs are useful when you want to be able to assimilate your data in one glance.
They provide an easy way of summarizing data in a way that almost anyone can
understand.
[Figures: example charts showing Count by income category (Under $25, $25 - $49, $50 - $74, $75+)]
There are many options available when creating a bar chart. In this example we will
use the dataset demo.sav to create a simple bar chart showing the number of
respondents in each income category. The required variable is called inccat.
PROCEDURE 17: CREATING A SIMPLE BAR CHART
On the menu bar, select Graphs/Legacy Dialogs/Bar…
This displays the 'Bar Charts' dialog box
Figure 23: The Define Simple Bar: Summaries for Groups of Cases
Dialog Box
Select the variable(s) you want to examine in the left-hand box. In this case we
select the variable 'Income category in thousands [inccat]'
Select the arrow button to move the selected variable(s) into the 'Category
Axis' field.
Select the OK button.
The SPSS Output Viewer is displayed.
2.2.1.1 SPSS Output
The following output is the bar chart derived from PROCEDURE 17. The
particular variable being graphed is inccat.
[Bar chart: Count by income category (Under $25, $25 - $49, $50 - $74, $75+)]
Here, you can easily see that most of the respondents were in the $25 - $49 income
category.
2.2.2 Creating a Pie Chart
In this example we will use the dataset demo.sav to create a simple pie chart showing
the number of respondents in each income category. The required variable is called
inccat.
Figure 25: The Define Pie: Summaries for Groups of Cases Dialog
Box
Select the variable(s) you want to examine in the left-hand box. In this case we
select the variable 'Income category in thousands [inccat]'
Select the arrow button to move the selected variable(s) into the 'Define
Slices by:' field.
Select the OK button.
The SPSS Output Viewer is displayed.
2.2.2.1 SPSS Output
The following output is the pie chart derived from PROCEDURE 18. The
particular variable being graphed is inccat.
[Pie chart: one slice per income category (Under $25, $25 - $49, $50 - $74, $75+)]
Here, you can easily see that most of the respondents were in the $25 - $49 income
category.
2.2.3 Creating a Line Chart
There are many options available when creating a line chart. In this example we will
use the dataset demo.sav to create a simple line chart showing the number of
respondents in each income category. The required variable is called inccat.
Figure 27: The Define Simple Line: Summaries for Groups of Cases
Dialog Box
Select the variable(s) you want to examine in the left-hand box. In this case we
select the variable 'Income category in thousands [inccat]'
PROCEDURE 19: CREATING A SIMPLE LINE CHART
Select the arrow button to move the selected variable(s) into the 'Category
Axis' field.
Select the OK button.
The SPSS Output Viewer is displayed.
[Line chart: Count by income category (Under $25, $25 - $49, $50 - $74, $75+)]