What Is SPSS ND2 Work
What Is SPSS ND2 Work
What Is SPSS ND2 Work
SPSS stands for “Statistical Package for Social Sciences”. It is an IBM tool. This tool first launched
in 1968. This is one software package. This package is mainly used for statistical analysis of the data.
SPSS is mainly used in the following areas like healthcare, marketing, and educational research,
market researchers, health researchers, survey companies, education researchers, government,
marketing organizations, data miners, and many others.
It provides data analysis for descriptive statistics, numeral outcome predictions, and identifying
groups. This software also gives data transformation, graphing and direct marketing features to
manage data smoothly.
Why SPSS?
They came under IBM SPSS Statistics, and most of the users refer to it as SPSS only.
It is straight forward, and its English-like command language helps the user to go through the flow.
SPSS introduces the following four programs that help researchers with their complex data analysis
Statistics Program
SPSS’s statistics program gives a large amount of basic statistical functionality; some include
frequencies, cross-tabulation, bivariate statistics, etc.
Modeler Program
Researchers are able to build and validate predictive models with the help of advanced statistical
procedures.
It gives robust feedback analysis. which in turn get a vision for the actual plan.
Visualization Designer
Researchers found this visual designer data to create a wide variety of visuals like density charts and
radial box plots.
Features of SPSS
The data from any survey collected via Survey Gizmo gets easily exported to SPSS for detailed and
good analysis.
In SPSS, data gets stored in.SAV format. These data mostly comes from surveys. This makes the
process of manipulating, analyzing and pulling data very simple.
SPSS have easy access to data with different variable types. These variable data is easy to
understand. SPSS helps researchers to set up model easily because most of the process is
automated.
After getting data in the magic of SPSS starts. There is no end to what we can do with this data.
SPSS has a unique way to get data from critical data also. Trend analysis, assumptions, and predictive
models are some of the characteristics of SPSS.
SPSS is easy for you to learn, use and apply.
It helps in to get data management system and editing tools handy.
SPSS offers you in-depth statistical capabilities for analyzing the exact outcome.
SPSS helps us to design, plotting, reporting and presentation features for more clarity.
Prediction for a variety of data for identifying groups and including methodologies such as cluster
analysis, factor analysis, etc.
Descriptive statistics, including the methodologies of SPSS, are frequencies, cross-tabulation, and
descriptive ratio statistics, which are very useful.
Also, Bivariate statistics, including methodologies like analysis of variance (ANOVA), means,
correlation, and nonparametric tests, etc.
Numeral outcome prediction such as linear regression.
It is a kind of self-descriptive tool which automatically considers that you want to open an existing
file, and with that opens a dialog box to ask which file you would like to open. This approach of
SPSS makes it very easy to navigate the interface and windows in SPSS if we open a file.
Besides the statistical analysis of data, the SPSS software also provides data management features;
this allows the user to do a selection, create derived data, perform file reshaping, etc. Another feature
is data documentation. This feature stores a metadata dictionary along with the data file.
It has two types of views those are Variable View and Data View:
Variable View
1. Name: This is a column field, which accepts the unique ID. This helps in sorting the data. For
example, the different demographic parameters such as name, gender, age, educational qualification
are the parameters for sorting data.
The only restriction is special characters which are not allowed in this type.
2. Label: The name itself suggests it gives the label. Which also gives the ability to add special
characters.
3. Type: This is very useful when different kind of data’s are getting inserted.
4. Width: We can measure the length of characters.
5. Decimal: While entering the percentage value, this type helps us to decide how much one needs to
define the digits required after the decimal.
6. Value: This helps the user to enter the value.
7. Missing: This helps the user to skip unnecessary data which is not required during analysis.
8. Align: Alignment, as the name suggests, helps to align left or right. But in this case, for ex. Left align.
9. Measure: This helps to measure the data being entered in the tools like ordinal, cardinal, nominal.
The data has to enter in the sheet named “variable view”. It allows us to customize the data type as
required for analyzing it.
To analyze the data, one needs to populate the different column headings like Name, Label, Type,
Width, Decimals, Values, Missing, Columns, Align, and Measures.
These headings are the different attributes which, help to characterize the data accordingly.
Data View
The data view is structured as rows and columns. By importing a file or adding data manually, we
can work with SPSS.
First of all, we need to check the minimum system requirements at the SPSS Statistics System
Requirements.
Then the option selects the Operating system loaded on your system and figure out the prerequisites.
Open browser for the SPSS website; this will lead to the downloading software application. Start
with the SPSS free trial version.
Following are the Steps for importing Excel file into SPSS.
=> Open
After selecting the excel file that will be imported for performing the data analysis, we need to ensure
that in the dialog box that we selected is “read variable names from the first row of data”.
And at the end, click OK. Your file has now imported in SPSS.
Conclusion
The bottom line is though Excel offers a good way of data organization, SPSS is more suitable for in-
depth data analysis. This tool is very useful in the analysis and visualization of data.
SPSS (Statistical Package for the Social
Sciences)
SPSS (Statistical Package for the Social Sciences), also known as IBM SPSS Statistics, is a software
package used for the analysis of statistical data.
Although the name of SPSS reflects its original use in the field of social sciences, its use has since
expanded into other data markets. SPSS is commonly used in healthcare, marketing and education
research.
The types of data analyzed using SPSS is widely varied. Common sources include survey results,
organization customer databases, Google Analytics, scientific research results and server log files.
SPSS supports both analysis and modification of many kinds of data and almost all formats of
structured data. The software supports spreadsheets, plain text files and relational databases such as
SQL, SATA and SAS.
SPSS provides data analysis for descriptive and bivariate statistics, numeral outcome predictions and
predictions for identifying groups. The software also provides data transformation, graphing and
direct marketing features.
The software interface displays open data similarly to a spreadsheet in its main view. With its
secondary variable view, the metadata that describes the variables and data entries present in the data
file are displayed.
The software package was created in 1968 by SPSS Inc. and was acquired by IBM in 2009. While
the software was renamed to IBM SPSS Statistics, it is still commonly referred to as just SPSS.
Conducting an analysis on your own can be a daunting task, especially if you are not familiar with
the software you have to use to perform your analyses. There are many different programs you can
use to analyze data; even Excel allows for some of the simpler analyses. Here we will discuss some
data management tasks in a very commonly used data analysis software in doctoral research: the
Statistical Package for the Social Sciences, or SPSS.
SPSS holds data in the same way as many other programs, with columns dedicated to different
variables, and one observation (or participant) per row. This is the main way to view your data in
SPSS, though you can also view just the list of variables in the dataset as well. These two data views
have different uses; the variable view allows you to label categories, rename variables, and change
each variable’s type, but does not allow you to see the data or edit any actual data entries. The data
viewer lets you see the data for each variable, but also just looks like a wall of numbers. Switching
between the two views is as easy as double clicking on a variable’s number (in variable view) or a
variable’s name (in data view), holding CTRL and pressing T, or selecting the view format from the
tabs at the bottom of either window
Here we will focus on the variable view. The first step in creating your data set is usually naming the
variables, but might be skipped if your data are downloaded with variable names already included. Double
clicking the variable name allows it to be renamed as needed. In the sample below, all of our variables are
numeric, but you can click on the numeric label to change it as needed – numeric is usually used, but some
datasets contain string variables, which just means that the variable can contain non-numeric text. Keep in
mind that a text variable will not be available for most statistical analyses when you open your analysis
windows. The width and decimals fields allow you to change the way the data is displayed in the data view.
Clicking on the values in the value column allows you to assign labels to different numbers. For
example, clicking on the age values brings up this window, where you can see that each number
represents a range of ages (e.g., 1 = 18 – 25). This makes a difference when you are looking at
descriptive information or running your analyses, where the number is automatically replaced with
the label. You can add new labels, or change or remove existing ones.
If you scroll over to the right side of the variable view window, you will also see the measure
column. This allows you to show what kind of data each variable holds. Ordinal data is usually used
to identify Likert-type responses, where participants respond with a set of ordered categories. Scale
variables, like the number of times a participant has done a certain task, should be labeled scale,
while nominal variables are those that contain categories with no specific order, like ethnicity or
gender. These can be changed with a simple drop-down.
Having a nice clean dataset is the first step to getting your thoughts organized, and makes your
output much easier to interpret. As an added bonus, if you ever need a second party to jump in to
help you out, they will really appreciate that you have already taken a few moments to make all of
your data clearly labeled and identified.
ntroduction to SPSS
1. Introduction
The aim of this module is to introduce you to the SPSS software, including
navigating the program interface and using syntax to manage and analyse
data. The module will also cover how to access and undertake commonly used
descriptive data analysis procedures in SPSS.
I would recommend splitting your screen with this module on one side, and the
SPSS program on the other side of your screen. Even better if you have dual
screens! It will be much easier to work through the module if you have visual
access to both this module and SPSS (otherwise you'll constantly be flicking
back and forth from one to the other).
To undertake this module, you will need access to a couple of datasets and
related files.
Please download and save the following files to a sensible location, where you
can easily access them for the duration of this module:
1. 2016 Health and Society student health survey data (SPSS format, 36
KB): HLTH1025_2016.sav
2. 2016 Health and Society student health survey data (Excel format,
93 KB): HLTH1025_2016.xlsx
Page Menu
Dashboard
Content
Activities
Recordings
Participants
1. Home
2. NC00437
3. Introduction to SPSS
Introduction to SPSS
3. Part I: Introduction to the SPSS software
It’s important to note that SPSS is not the only statistical software – there are
many others that you may come across if you pursue a career that requires you
to work with data. Some of the other more common statistical packages
include Stata and SAS (and there are many others). The focus for this session,
however, is on SPSS.
Click on the Start menu ( ) > All Programs > IBM SPSS Statistics
> IBM SPSS Statistics 21 (or whatever is the latest version number) to open
the SPSS program.
When you first open SPSS on your computer, you should see something that
looks similar to to following screenshot:
SPSS automatically assumes that you want to open an existing file, and
immediately opens a dialogue box to ask which file you’d like to open. It’ll
make it easier to navigate the interface and windows in SPSS if we open a file.
We’re going to open and use from data collected as part of a first-year course at
UniSA, called Health and Society (H&S). The data we are going to use were
collected at the beginning of 2016. So let’s first open our demonstration data
set – the 2016 H&S student health survey data.
Table of contents
1. Introduction
2. Datasets and files to download
3. Part I: Introduction to the SPSS software
o 3.1. What is SPSS?
o 3.2. Opening data - Part 1
o 3.3. Opening data - Part 2
o 3.4. Navigating the SPSS interface
o 3.5. Getting organised
4. Part II: Using Syntax
o 4.1. Using syntax to ... open and name a dataset
o 4.2. Using syntax to ... set your working directory
o 4.3. Using syntax to ... save data
5. Part III: Managing data in SPSS
o 5.1. Importing data from Excel to SPSS
o 5.2. Inspecting your data
o 5.3. Labelling your data
o 5.4. Sorting and merging data
6. Part IV: Describing your data
o 6.1. Procedures: Frequencies and Descriptives
o 6.2. Procedure: Crosstabs (and chi-square)
o 6.3. Procedure: Explore
o 6.4. Procedure: Correlations
7. Part V: Modifying your data
o 7.1. Compute a new variable
o 7.2. Recode an existing variable into a new variable
8. Getting help
9. References
Disclaimer
Copyright
Privacy
Web accessibility
Contact
Student Help
Staff Help
Facebook
Twitter
Linkedin
Youtube
Instagram
All material published on this website is protected by copyright and is made available for use only
within University of South Australia courses and programs under licence, with permission or in
compliance with the provisions of the Australian Copyright Act 1968 (Cth). Except as permitted
under the Copyright Act 1968, no part may be reproduced by any means or process without prior
written permission of the University of South Australia and the copyright owners.
Policies
In the active dialogue box that asks very nicely, "What would you like to
do?", we are going to "Open an existing data source", by clicking the "More files
..." option and finding our way to where we earlier saved our H&S dataset.
Once you have found the file you want to open, click Open and you’ll
(hopefully!) see a data file in front of you that looks something like this:
The other bit you may have noticed pop up was another window, mostly blank,
but with a bit of funny looking writing in it, like this:
Now that we have opened a data file, SPSS automatically opens what is called
an “Output Viewer” window. We’re going to talk about the SPSS interface,
including this window, shortly. Next, we have an example of how to open a
data file using the SPSS drop down menus.
Click on the File menu at the top left of your Data Editor window, then Open >
Data … :
Here you can navigate to your working folder and open your data file.
This is the window where you can see your data, and information about the
variables in your dataset. It is also possible to change your data in this window,
but I would strongly recommend against ever changing your data that way
because things can go terribly wrong and you have no record of what you
changed and how you changed it!
There are two ‘views’ for the Data Editor window:
1) Data view – you can see the actual data in your dataset for each record
and each variable; and
A second window in the SPSS interface is the Syntax Editor window. SPSS
doesn’t automatically open a syntax file for us, so let’s open one now. Click on
the File menu in the top left corner of your screen, and select New > Syntax:
The syntax window is where we can run commands (tell SPSS what we want it
to do), such as opening files, editing and managing data, undertaking statistical
procedures and tests, and saving files. The Syntax Editor window is essentially
a file that we can use to record and save everything that we do for a particular
piece of analysis, or when managing/editing a dataset.
When you first open a syntax file, it looks similar to a blank Word document:
While many people feel extremely uncomfortable using syntax, and would much
rather use the built-in menus (sound like you?!), in SPSS you can actually do
both (use the menus and the syntax file) for many procedures. We will start off
by doing this, to ease you into using SPSS syntax.
Regardless of whether you write the syntax yourself, or paste it into a syntax
file from the menus, it really is best practice to use a syntax file to keep a
record of all your data management and analysis procedures.
van den Berg (2013) lists six reasons you should use SPSS syntax:
A little later we will cover the basics of SPSS syntax, including a bit more
information on these six reasons for using it. By the end of this session, you
should be well on your way to using SPSS syntax competently and confidently!
Output viewer window
Any time you do anything in SPSS – even just opening a file – SPSS will
document it in the Output Viewer window. If you don’t already have an output
window open, SPSS will open a new one for you, as it did when we first opened
a dataset.
The output viewer window will keep a record of any and all commands you give
SPSS (e.g., open a file, save a file, etc.), and is also the window in which you
can view the results of any data/statistical procedures undertaken.
You can see in the screenshot below, that SPSS has made a record in the
Output file that you opened a particular SPSS file:
It shows us the syntax command that SPSS recognises to open a file (GET
FILE), and it also shows us the structure of the syntax:
GET FILE is the command, then there is an equals sign (=), followed by the file
path and file name.sav in single quotation marks, and a full stop (.) at the end
of the command. SPSS syntax always starts with the command name and
always ends with a full stop (.).
The second line starting with the command DATASET NAME is a very useful
piece of syntax – we’ll get to that shortly!
Similar to the Syntax Editor window, the Output Viewer window is simply a file
that you can save and edit if you wish. You can also copy tables and graphs (or
anything else) presented in the output file into Microsoft Word, Excel or other
similar programs.
I: Introduction to the SPSS software
When you’re working with data in SPSS (or any software for that matter),
ideally you should keep your work organised. How you organise your work is
up to you, but here are some general tips that may help make your life easier:
1. Keep all the files relating to a single ‘project’ in the one place. A ‘project
could be an analysis for a chapter of your thesis or for a publication, primary
data collection, data management processes, etc.
2. Make sure you work is backed up. Either save it somewhere that is
automatically backed up, or back it up yourself, regularly.
3. Never edit your original data file. You should always keep a copy of your
original data, and keep it somewhere that you won’t accidentally edit or
overwrite it. You may want to keep it in a sub-folder, for instance. Sometimes
we make mistakes in managing and using data, and need to go back to the
original file. Storing your original data can also be an important ethical
consideration.
4. Label your files well, with descriptive file names.
o You may like to number them in the order that they occur in the project
by placing a number at the front of the filename (a good way to do this is
to number using tenfolds (10, 20, 30 …) so that you can insert files later if
needed. For example, you could prefix a file with 15 if you wanted it to be
run between file 10 and 20.
o Use descriptive file names so that you have a good idea what’s in there
when you want to find something a year or two (or five) down the track.
If you have never been so organised with your data sets, now is an excellent
time to start! There’s really not a whole lot worse than doing a bunch of
analyses using multiple datasets and having the files all jumbled in with other
documents … or worse, spread all over the place in separate folders!
Here is your opportunity to get familiar with using syntax (code) in a statistical software
package. You may have used other statistical packages, but each one has a unique syntax. Although
once you become comfortable with the idea of using syntax, it’s not so difficult to move from one
program to the next and learn the unique commands and syntax structure for each.
One benefit of SPSS, is that most of the procedures accessed in the interface menus actually allow
you to paste the code for those procedures into a syntax file. This can be a great way to get
comfortable with using syntax. Let’s have a go at doing this now!
If you want to write a comment, or any sort of text that is NOT an SPSS
command (telling SPSS to do something), then you need to preface your text
with an asterisk (*). You also need to end your text with a full stop (.). When
reading your syntax, SPSS will ignore any text within an asterisk and a full stop.
1. Save your syntax file in the same place you saved your datasets (the organised
version!), with a descriptive name (I’ve called my syntax file “10 Introduction to
SPSS workshop_data management”).
2. Have a go at documenting what the file is for, who wrote it (that’s you), and
today’s date. You can see how I’ve commented my file below:
You may have noticed from the image above that SPSS colour codes its syntax
– rather a neat feature I think. Commands (words that tell SPSS to do
something) are in blue. Just for the record, it doesn’t matter if they’re written
in all caps or not. Objects are in red, and criteria are in yellow. Text (like my
comments) are in grey. We may come across some other colours as we go.
If you were wondering, the window=front code tells SPSS to make the dataset
the top window (in front of everything else already open on your desktop).
Notice also that I re-wrote the syntax commands in lowercase just to show you
that it still works (!).
Task: Open an SPSS data file using syntax
1. Save the dataset and syntax file you have open in SPSS (don’t worry about the
Output file).
2. Close SPSS (all files).
3. Open Windows Explorer and navigate to the folder where you have saved your
SPSS workshop files.
4. Locate the syntax file you just created and saved, and open it by double clicking
the file.
5. Select all of the text in the syntax file (text and commands) and then click the
“Run Selection” button on the menu bar:
This should now open the Health and Society dataset, and label it as
“HLTH1025_2016”. You’ll see the dataset name in the top right hand corner of
the Data Editor window in square brackets:
Now that you know how to comment an SPSS syntax file, and can open and
name a dataset using syntax, we’re going to also learn how to set a working
directory (I’ll explain …) and save files using syntax.
4. Part II: Using Syntax
There’s a really neat command in SPSS (and most statistical programs have
something similar) that allows you to define a directory where you are
accessing and saving your files – a place like we’ve just recently set up where
all our files are in the same place – called the working directory.
It’s good practice to define the location of your working directory at the
beginning of your SPSS syntax file, and will save you a lot of time in the long
run. You can do this using the cd command (cd stands for “change directory”).
You can also check the working directory using the show directory command.
The key commands for defining and checking your working directory:
We are now going to define our own working directory, which will be the place
you saved your SPSS files earlier. Now would be a perfect time to get out your
USB stick, plug it in, and then open windows explorer to determine the file path
for your working directory.
1. In your syntax file, under the text identifying the file, who wrote it and when it
was written, but before the get file command, use the cd command to define
your working folder location, e.g.,:
cd "C:\Temp\SPSS Workshop".
Notice how soon after you type “cd”, it changes colour to blue? Cool, huh?!
Here’s what your syntax file might now look like:
Now that we’ve set out working directory, we can modify our get file command
to remove the file path (which is now set using the cd command). Now,
whenever we open and/or save a file using this syntax, we won’t need to type
the file path. Makes writing the syntax much easier!
Obviously when you create or are given a data set for your work, you need
to save this file somewhere sensible. It’s really bad practice to use it from your
‘downloads’ folder, or your desktop (just for example … and yes, people actually
do this … but not you!).
Once you’ve decided where you’re going to save your files (you would have
already done this when defining your working directory, so we should be ok at
this point), you can use the save command to save your file where you want it.
For those of you who are still not entirely comfortable with the concept of
using/writing syntax yet, let’s first use the menus and then paste the save
syntax into our open syntax file.
The thing is, though, that we’ve set the working directory already, so there’s a
bit of inefficiency going on here. Also, the syntax is pretty straightforward, so
let’s have a go at writing it ourselves, making use of the fact that we’ve also set
the working directory up front.
The outfile criteria tells SPSS that we want to “save out” the file to a
location, which equals (=) the location we specify in quotation marks. Since
we set the working directory, we now don’t need to specify the whole file
path – just the file name. SPSS now knows that we want everything saved
to the working directory. Efficiency!
The text at the end of the command – /compressed – tells SPSS to compress
the data file to save space. This is an “option” for the save command (most
commands have one or more options to be more specific about what you
want SPSS to do) – options are highlighted in green.
Managing data well is a really important skill, and SPSS is a good place to learn some of the basics
of data management.
Before and after doing each of these processes, you should also know how to inspect your data.
One of the more common formats for entering and storing raw data is
Microsoft Excel. If you want to work with your data in SPSS, however, you’ll
need to import the data from Excel to SPSS. This can be relatively
straightforward, but there are a few things to look out for in the conversion
process.
We are going to have a go at importing a data file from Excel to SPSS, so that
we can see firsthand some of the issues that may come up.
My best advice would be to make sure that you have set up your data well in
Excel – this can make the importing process much easier. Also, make sure you
take note of a few key data elements in your original data, such as the total
number of respondents or records, and the number of variables.
Task: Import data from Excel to SPSS
If you have a look at the import syntax, you can see that it’s doing exactly what
we told it to do. Similar to the get file command, the get data command also
gives you the option to name your dataset.
Select only the get data procedure and click the Run Selection button on the
menu bar (the green arrow).
The first thing you should do after importing data from another format is to
check your output file to see if there are any error messages. The output file is
where you will see whether the syntax you have run has ‘worked’ properly.
In this case, you should not see any error messages, as in the image below:
Also, if you haven’t already put a description (comment) in front of your import
syntax, you should do it now!
The next thing to do is to check that the data look like they should, e.g.,
numeric data has imported as numeric, string data as string, the values look ok,
you have the right number of records, the right number of variables, etc. This
part requires knowing how to inspect your data in SPSS, which leads us nicely
into the next part of this workshop.
Before you can start doing data analyses, you need to familiarise yourself with your data. I like
to call this ‘getting to know your data’. There are several commands that help you to understand the
characteristics your data. Some of these commands provide similar information and it’s up to you
which ones you prefer to use.
There are a couple of useful commands for inspecting your data:
1. The codebook command is best run through the menus if you have a large number of variables and
you want to look at information for all of them
1. Click Analyze > Reports > Codebook…
2. Select all of the variables you want information for, and move them from the left hand box
to the right hand box:
3. Click Paste.
4. Select and run your codebook syntax from your syntax file.
5. Have a look at the information presented about your variables in the Output window:
2. To display the data dictionary, all you need to type into your syntax file is:
display dictionary.
You should now see something like this in your Output Viewer window:
. Part III: Managing data in SPSS
Later on, when you’re running some analyses and you want to look at, say, how many of your
respondents are male or female, you’ll run a procedure and the results will be displayed in the Output
Viewer window. If we left our data as they are, we’d get the results we want, but we probably
wouldn’t know how to read them. E.g., if I ran that procedure, here’s the output I’d get:
But which ones are male and which ones are female?! Because I haven’t labelled my data, it can be
difficult to remember what all the values for each variable refer to.
So here’s a tip: always label your variables and their values (particularly for categorical variables).
To give the variable sex a descriptive label, here’s what you could type in your syntax file:
Then, to label the categories of sex (in this case, male and female), we could type the following:
After labelling my variable, sex, if I run the same procedure as above I get a table that very clearly
indicates what is being presented:
The sort syntax is quite simple, and you can write this yourself. If I want to sort cases by sex,
in ascending order (although for this type of variable it probably doesn’t matter whether ascending or
descending), I can simply type into my syntax file the following:
The command to sort your data is sort cases. The “A” in brackets at the end of the statement
indicates we want the data sorted in ascending order – you can write “D” if you want descending.
Let’s say now, we have an additional variable – year – that we have in a separate dataset. This
second dataset has the same participants as our HLTH1025_2016 dataset, and they are identified by
the variable IDnumber. But we have several years’ worth of data from this course, and when we put
it all together, we want to be able to identify which students belong to which cohort (year of study).
The students we have information about belong to the second year of the study. We need to merge
this variable in to our main dataset.
This is a relatively simple merge procedure, but demonstrates the key steps involved.
1. You already have open the main dataset (HLTH1025_2016). You now need to open the
second dataset, called HLTH1025_2016_yr. Do this using the get file syntax:
get file=‘HLTH1025_2016_yr’.
We also want to name this second dataset – it will help us to tell SPSS exactly which dataset we
want to do things with … we’ll see in a minute when we sort. So use the dataset name syntax to
name your new dataset:
2. Now we need to sort cases by IDnumber in both datasets. Here’s where it’s helpful to have
datasets named, since we can ‘activate’ a dataset to run a procedure, then activate another dataset
to run the same or different procedure. It’s easy to activate a particular dataset if you’ve named
it. Use the following syntax:
3. Now that you’ve written the syntax to open, name and sort your data, you will need to select
those lines of code and Run Selection (big green arrow button).
4. The final step is to merge the year variable into your main dataset. Here we are going to use
the menus to help us, and then paste the syntax and run it from our syntax file.
c. Select the dataset HLTH1025_2016_yr to merge with your active dataset, and click
Continue:
f. Click the IDnumber variable in the “Excluded Variables:” box to the top left, and move it
using the arrow button into the “Key Variables:” box.
This tells SPSS to match cases in both datasets by the IDnumber variable, and we confirm
that the data in both datasets are sorted by IDnumber (the variable we want to match on).
When you click Paste (or OK) a warning message will come up to say that if your data are
not sorted in ascending order of the Key Variable(s) then the procedure will fail. Luckily we
already sorted the data as required, so we click OK.
After you run a procedure like merging, it’s best to check a few things. First, I always check
my output to see whether there were any error messages. Hopefully you don’t receive any
error messages, and your output record looks something like this:
The next thing I normally check is the Data Editor window in Variable View, to see whether
my new variable is now at the bottom of my list of variables, and check whether it looks like
it merged correctly.
Then, I’d just double check using Data View that the values have actually merged for that
variable, and that they look right. I know that all records should have a value of “2” for the
variable year.
Before initiating data analyses, it’s a good idea to check the frequencies of all variables, how
categorical variables have been coded, the minimum and maximum values and number of missing
observations. This is a good way to identify any outliers and potential mistakes in the dataset.
Before initiating data analyses, it’s a good idea to check the frequencies of all variables, how
categorical variables have been coded, the minimum and maximum values and number of missing
observations. This is a good way to identify any outliers and potential mistakes in the dataset.
The commands frequencies and descriptives I tend to write myself in my syntax file, but I will
often use the menus for crosstabs, explore and correlations, depending on what options I want to use.
Regardless of whether I use syntax or menus, I ALWAYS paste the code into my syntax file so
that I have a record of what I have done!!
See the following for an example of the frequencies (you can shorten this to freq) and descriptives
syntax:
freq smokestat.
Here’s the output:
descriptives age. [total count (n), min, max, mean & SD for age]
2. In the dialogue box that pops up, place your outcome of interest (smoking) in the columns, and the
variable you want to group participants by (to compare one against the other) in the rows (i.e., sex):
3. If we leave it at this, we’ll just get a count of respondents in each combination of categories (e.g.,
males who smoke, or females who are ex-smokers). But we want to know how to also get a
percentage for each group. For this, we need to select the “Cells…” button to the top right of the
dialogue box:
4. Here we can select percentages for rows, columns and total. Let’s select all three then click
Continue, then Paste, to paste the syntax to our syntax file for the record.
the majority of males and females (83.7% and 95.3%, respectively) have never smoked;
a greater proportion of females (95.3%) than males (83.7%) have never smoked;
a greater percentage of males (7.6%) than females (2.3%) are current smokers; and
a greater percentage of males (8.7%) are ex-smokers compared to females (2.3%).
It’s also possible to run chi square tests using crosstabs to test whether those differences observed are
statistically significant. We can do this through the “Statistics…” option in the crosstabs dialogue
box:
1. Tick the “Chi-square” check box at the top left of the Statistics dialogue box, then click Continue.
2. Paste your updated crosstabs syntax to your syntax file (don’t delete the previous one!), and run
selection.
3. In addition to the crosstabs results table you had before, your output should also include another
table at the bottom with the results of the chi-square test:
Chi-Square Tests
Value df Asymp. Sig. (2-
sided)
Pearson Chi-Square 11.633a 2 .003
Likelihood Ratio 10.547 2 .005
Linear-by-Linear Association 10.714 1 .001
N of Valid Cases 306
a. 2 cells (33.3%) have expected count less than 5. The minimum expected
count is 3.61.
The Pearson Chi-spare test (top row of data in the table) indicates that there are significant
differences between groups, given by the p-value less than 0.05 in the third column of the table.
As an example, let’s have a look at the mean age by smoking status categories.
For our final descriptive procedure, we’ll have a look at the correlation between weight and
height.
2. All you need to do is add the two continuous variables of interest (weight and height) into the
“Variables:” box, and click Paste.
3. Run selection in your syntax file and then look at your output:
4. This table gives us the results of our procedure. It tells us that there is a weak, negative
association between height and weight of -0.014, and that it is not statistically significant (p>0.05).
Normally when we collect data, we don’t get everything in the format that we actually want to
analyse. For example, in a survey we often collect information about height and weight – but what
we really want to know is their body mass index (BMI). We use the height and weight data to
calculate this, and create a new variable BMI. We might then also want to categorise our new BMI
variable into: underweight, normal weight, overweight & obese.
We’re now going to look at a couple of very commonly used commands to modify, and create new
variables from, existing data.
You can access these procedures from the menus, under the Transform menu. There are many
things you can do here, such as create standardised variables (with a mean of 0, SD of 1), create new
variables from ntiles (tertiles, quartiles, quintiles, deciles, etc.), and many more. It’s worth having a
look to see what you can do.
The compute and recode commands can be used for simple data modification processes, though,
so we’ll write the code ourselves in our syntax file.
7. Part V: Modifying your data
First, let’s compute a new variable, BMI, from weight and height [Note: BMI = weight/height2]:
compute heightm=height/100.
execute.
2. We should check our newly created variable heightm – how should we do this?
compute bmi=weight/height**2.
execute.
Now that we have our new variable BMI, we want to recode it into a new variable, bmicat,
where the BMI values are categorised into underweight (BMI<18.5), normal weight (BMI >=18.5
and <25), overweight (BMI >=25 and <30), and obese (BMI>=30).
execute.
The recode command is followed by the variable name (bmi), then we put in brackets the value
ranges for each category, and give them a label (1, 2, 3, and 4 for the four categories). We also tell
SPSS that any data that were missing (sysmis) we want to be missing in our new recoded variable.
We are recoding the bmi variable INTO a new variable, bmicat.
If we didn’t include the “into bmicat” at the end of the syntax, we would recode the original
bmi variable and would lose the individual BMI values and replace them with categories. I like to
keep my original variables, so tend to use the INTO option to recode into a new variable.
Getting help
SPSS has well-developed help facilities. In addition, Google is always a good way to find help!
Within the SPSS interface, there are two options for getting help:
Also see the online IBM SPSS knowledge centre, which is like a searchable manual for SPSS:
https://www.ibm.com/support/knowledgecenter/SSLVMB/welcome
There are also some other excellent online resources, such as this SPSS Tutorials website:
http://www.spss-tutorials.com/
References
IBM 2016, IBM Knowledge Center: SPSS Statistics, IBM, viewed 18 May
2016, <https://www.ibm.com/support/knowledgecenter/SSLVMB/welcome/>.
van den Berg RG 2013, 8.1.1 SPSS Syntax – Six reasons you should use it, SPSS Tutorials,
viewed 17 May 2016, <http://www.spss-tutorials.com/spss-syntax-six-reasons-you-should-
use-it/>.
van den Berg RG 2015, 1.2.1 SPSS Combining data with syntax and output, SPSS Tutorials,
viewed 17 May 2016, < http://www.spss-tutorials.com/spss-combining-data-with-syntax-
and-output/>.
van den Berg RG 2016, SPSS Tutorials, viewed 17 May 2016, <http://www.spss-
tutorials.com/>.