
SPSS Project

This document provides a summary of SPSS (Statistical Package for the Social Sciences) programming. It discusses the history and capabilities of SPSS, how to work with SPSS command syntax, rules for running SPSS commands interactively versus in batch mode, and tips for customizing the SPSS programming environment and protecting original data. The document is intended to serve as an introduction to programming with SPSS command syntax.

Uploaded by

Adekayero Tope
Copyright
© Attribution Non-Commercial (BY-NC)


NAME: ADEKAYERO TOPE

MATRIC NO: 07/029

DEPT: COMPUTER SCIENCE

COURSE: STAT 203

TITLE: SPSS PROGRAMMING AND DATA MANAGEMENT

HISTORY OF SPSS

The history of SPSS can be traced back to 1967, when Norman H. Nie, then a
22-year-old Ph.D. candidate at Stanford University, decided to develop his own
solution after becoming "frustrated trying to use a computer to analyze data
describing the political culture of five nations," according to the September 22,
2003, issue of the Chicago Tribune. The application Nie was trying to use was
created for biologists, not social scientists. With that in mind, Nie took detailed
notes about what he needed in a software application and enlisted the help of Dale
H. Bent, a fellow doctoral candidate whose background was in operations research,
to design a file structure. Hadlai "Tex" Hull, who had recently received his MBA
from Stanford, was tapped to write the code, and by 1968 the Statistical Package
for the Social Sciences (SPSS) was born. Nie and Hull left Stanford to pursue
careers at the University of Chicago, and they brought their SPSS program along
with them. However, their main focus was on academics and research--not on
developing or selling software. Hull became head of the university's Computation
Center. Nie joined its National Opinion Research Center and eventually was
named chairman of the political science department.

Experienced data analysts know that a successful analysis or meaningful report
often requires more work in acquiring, merging, and transforming data than in
specifying the analysis or report itself. SPSS contains powerful tools for
accomplishing and automating these tasks. While much of this capability is
available through the graphical user interface, many of the most powerful features
are available only through command syntax.
SPSS is a comprehensive system for analyzing data. It can read data from many
types of files and use that information to produce tabulated reports, charts, plots,
and trend analyses.

Working with Command Syntax


You don’t need to be a programmer to write SPSS command syntax, but there are a
few basic things you should know. A detailed introduction to SPSS command
syntax is available in the “Universals” section in the SPSS Command Syntax
Reference.
An SPSS command file is a simple text file. You can use any text editor to create a
command syntax file, but SPSS provides a number of tools to make your job
easier. Most features available in the graphical user interface have command
syntax equivalents, and there are several ways to reveal this underlying command
syntax:

• Use the Paste button. Make selections from the menus and dialog boxes,
and then click the Paste button instead of the OK button. This will paste the
underlying commands into a command syntax window.

• Record commands in the log. Select Display commands in the log on the
Viewer tab in the Options dialog box (Edit menu, Options) or run the
command SET PRINTBACK ON. As you run analyses, the commands for
your dialog box selections will be recorded and displayed in the log in the
Viewer window. You can then copy and paste the commands from the
Viewer into a syntax window or text editor. This setting persists across
sessions, so you have to specify it only once.

• Retrieve commands from the journal file. Most actions that you perform
in the graphical user interface (and all commands that you run from a
command syntax window) are automatically recorded in the journal file in
the form of command syntax. The default name of the journal file is spss.jnl.
The default location varies, depending on your operating system. Both the
name and location of the journal file are displayed on the General tab in the
Options dialog box (Edit menu, Options).

Running SPSS Commands


Once you have a set of commands, you can run the commands in a number of
ways:

• Highlight the commands that you want to run in a command syntax window
and click the Run button.
• Invoke one command file from another with the INCLUDE or INSERT
command. For more information, see “Using INSERT with a Master
Command Syntax File” on p. 20.
• Use the Production Facility to create production jobs that can run unattended
and even start unattended (and automatically) using common scheduling
software. See the Help system for more information about the Production
Facility.
• Use SPSSB (available only with the server version) to run command files
from a command line and automatically route results to different output
destinations in different formats. See the SPSSB documentation supplied
with the SPSS server software for more information.
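
As a minimal sketch of the INSERT approach listed above, a master command file might simply call other command files in sequence. The file names and locations below are hypothetical. SYNTAX=INTERACTIVE asks SPSS to apply the interactive syntax rules to the inserted file, and ERROR=STOP stops command processing at the first error in an inserted file.

```spss
* master.sps (hypothetical file names and locations).
INSERT FILE='c:\examples\commands\prepare_data.sps'
  SYNTAX=INTERACTIVE ERROR=STOP.
INSERT FILE='c:\examples\commands\run_reports.sps'
  SYNTAX=INTERACTIVE ERROR=STOP.
```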

Figure 2-1
Command syntax pasted from a dialog box
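
As a rough illustration of what Paste produces, selections made in the Frequencies dialog box for a variable named age (a hypothetical name) would typically yield syntax along these lines. The exact subcommands depend on the options selected in the dialog box.

```spss
* Variable name age is assumed for illustration only.
FREQUENCIES VARIABLES=age
  /STATISTICS=MEAN MEDIAN
  /ORDER=ANALYSIS.
```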

Syntax Rules
• Commands run from a command syntax window during a typical SPSS
session must follow the interactive command syntax rules.

• Command files run via SPSSB or invoked via the INCLUDE command
must follow the batch command syntax rules.

Interactive Rules
The following rules apply to command specifications in interactive mode:

• Each command must start on a new line. Commands can begin in any
column of a command line and continue for as many lines as needed. The
exception is the END DATA command, which must begin in the first
column of the first line after the end of data.

• Each command should end with a period as a command terminator. It is best
to omit the terminator on BEGIN DATA, however, so that inline data is
treated as one continuous specification.
• The command terminator must be the last non-blank character in a
command.
• In the absence of a period as the command terminator, a blank line is
interpreted as a command terminator.

Batch Rules
The following rules apply to command specifications in batch or production mode:
• All commands in the command file must begin in column 1. You can use
plus (+) or minus (-) signs in the first column if you want to indent the command
specification to make the command file more readable.
• If multiple lines are used for a command, column 1 of each continuation
line must be blank.
• Command terminators are optional.
• A line cannot exceed 256 bytes; any additional characters are truncated.
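
The two rule sets can be contrasted with a short sketch. The variable names var1, var2, and flag are placeholders, and the commands assume a data file with numeric variables var1 and var2 is already open.

```spss
* Interactive rules: a command may start in any column and ends with a period.
    FREQUENCIES VARIABLES=var1 var2
      /STATISTICS=MEAN.

* Batch rules: every command starts in column 1, a leading plus or minus sign
  allows visual indentation, and continuation lines leave column 1 blank.
DO IF (var1 > 3).
+  COMPUTE flag=1.
ELSE.
+  COMPUTE flag=0.
END IF.
FREQUENCIES VARIABLES=flag
  /STATISTICS=MEAN.
```

Writing command files so that they satisfy the batch rules and also terminate every command with a period is one way to keep them usable in either mode.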

Customizing the Programming Environment


There are a few global settings and customization features that may make working
with command syntax a little easier.

Displaying Commands in the Log

By default, commands that have been run are not displayed in the log, which can
make it difficult to interpret error messages. To display commands in the log, use
the command:
SET PRINTBACK = ON.
Or, using the graphical user interface:
From the menus, choose:
Edit
Options...
Click the Viewer tab.
Select (check) Display commands in the log.

Displaying the Status Bar in Command Syntax Windows


In addition to various status messages, the status bar at the bottom of a command
syntax window displays the current line number and character position within the
line. Since error messages typically contain information about the column position
where an error was encountered, the column position information in the status bar
can help you to pinpoint errors. (Note: You may have to increase the width of the
command syntax window to see this information.) The status bar is displayed by
default. If it is currently not displayed, choose Status Bar from the View menu in
the command syntax window.

Status bar in command syntax window with current line number and column
position displayed

Protecting the Original Data


The original data file should be protected from modifications that may alter or
delete original variables and/or cases. If the original data are in an external file
format (for example, text, Excel, or database), there is little risk of accidentally
overwriting the original data while working in SPSS. However, if the original data
are in SPSS-format data files (.sav), there are many transformation commands that
can modify or destroy the data, and it is not difficult to inadvertently overwrite the
contents of an SPSS-format data file. Overwriting the original data file may result
in a loss of data that cannot be retrieved.
There are several ways in which you can protect the original data, including:

• Storing a copy in a separate location, such as on a CD, that can’t be
overwritten.
• Using the operating system facilities to change the read-write property of the
file to read-only. If you aren’t familiar with how to do this in the operating
system, you can choose Mark File Read Only from the File menu or use the
PERMISSIONS subcommand on the SAVE command.

The ideal situation is then to load the original (protected) data file into SPSS and
do all data transformations, recoding, and calculations using SPSS. The objective
is to end up with one or more command syntax files that start from the original
data and produce the required results without any manual intervention.
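
As one possible sketch of the SAVE approach, the commands below write a copy of the data marked read-only and then save the transformed data under a different name. All file names and locations are hypothetical, and var1 stands in for any numeric variable in the file.

```spss
* Hypothetical file names and locations.
GET FILE='c:\examples\data\survey.sav'.
* Write a copy of the untouched data that is marked read-only.
SAVE OUTFILE='c:\examples\data\survey_original.sav'
  /PERMISSIONS=READONLY.
* Transformations go here, and the results are saved under a different name.
COMPUTE var1_new=var1*2.
SAVE OUTFILE='c:\examples\data\survey_working.sav'.
```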

Do Not Overwrite Original Variables


It is often necessary to recode or modify original variables, and it is good practice
to assign the modified values to new variables and keep the original variables
unchanged.
For one thing, this allows comparison of the initial and modified values to verify
that the intended modifications were carried out correctly. The original values can
subsequently be discarded if required.
Example
*These commands overwrite existing variables.
COMPUTE var1=var1*2.
RECODE var2 (1 thru 5 = 1) (6 thru 10 = 2).
*These commands create new variables.
COMPUTE var1_new=var1*2.
RECODE var2 (1 thru 5 = 1) (6 thru 10 = 2)(ELSE=COPY)/INTO var2_new.
• The difference between the two COMPUTE commands is simply the
substitution of a new variable name on the left side of the equals sign.
• The second RECODE command includes the INTO subcommand, which
specifies a new variable to receive the recoded values of the original variable.
ELSE=COPY makes sure that any values not covered by the specified ranges are
preserved.
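
To illustrate the verification step, one option is to crosstabulate the original and recoded variables once both exist, and only then drop the original. This is a sketch that reuses the variable names from the example; whether to delete the original at all is a judgment call.

```spss
* Each original value should fall in exactly one recoded category.
CROSSTABS /TABLES=var2 BY var2_new.
* Optional step that discards the original variable once the recode is checked.
DELETE VARIABLES var2.
```

DELETE VARIABLES cannot run while transformations are pending, which is why it follows the CROSSTABS procedure here.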

Using Temporary Transformations


You can use the TEMPORARY command to temporarily transform existing
variables for analysis. The temporary transformations remain in effect through the
first command that reads the data (for example, a statistical procedure), after which
the variables revert to their original values.

Example
*temporary.sps.
DATA LIST FREE /var1 var2.
BEGIN DATA
1 2
3 4
5 6
7 8
9 10
END DATA.
TEMPORARY.
COMPUTE var1=var1+ 5.
RECODE var2 (1 thru 5=1) (6 thru 10=2).
FREQUENCIES
/VARIABLES=var1 var2
/STATISTICS=MEAN STDDEV MIN MAX.
DESCRIPTIVES
/VARIABLES=var1 var2
/STATISTICS=MEAN STDDEV MIN MAX.
• The transformed values from the two transformation commands that follow
the TEMPORARY command will be used in the FREQUENCIES
procedure.
• The original data values will be used in the subsequent DESCRIPTIVES
procedure, yielding different results for the same summary statistics.

Under some circumstances, using TEMPORARY will improve the efficiency of
a job when short-lived transformations are appropriate. Ordinarily, the results of
transformations are written to the virtual active file for later use and eventually
are merged into the saved SPSS data file. However, temporary transformations
will not be written to disk, assuming that the command that concludes the
temporary state is not otherwise doing this, saving both time and disk space.
(TEMPORARY followed by SAVE, for example, would write the
transformations.) If many temporary variables are created, not writing them to
disk could be a noticeable saving with a large data file. However, some
commands require two or more passes of the data. In this situation, the
temporary transformations are recalculated for the second or later passes. If the
transformations are lengthy and complex, the time required for repeated
calculation might be greater than the time saved by not writing the results to
disk. Experimentation may be required to determine which approach is more
efficient.

Using Temporary Variables

For transformations that require intermediate variables, use scratch
(temporary) variables for the intermediate values. Any variable name that begins
with a pound sign (#) is treated as a scratch variable that is discarded at the end of
the series of transformation commands when SPSS encounters an EXECUTE
command or other command that reads the data (such as a statistical procedure).

Example
*scratchvar.sps.
DATA LIST FREE / var1.
BEGIN DATA
1 2 3 4 5
END DATA.
COMPUTE factor=1.
LOOP #tempvar=1 TO var1.
- COMPUTE factor=factor * #tempvar.
END LOOP.
EXECUTE.

Figure 2-4
Result of loop with scratch variable

• The loop structure computes the factorial for each value of var1 and puts
the factorial value in the variable factor.
• The scratch variable #tempvar is used as an index variable for the loop
structure.
• For each case, the COMPUTE command is run iteratively up to the value of
var1.
• For each iteration, the current value of the variable factor is multiplied by
the current loop iteration number stored in #tempvar.
• The EXECUTE command runs the transformation commands, after which
the scratch variable is discarded.
The use of scratch variables doesn’t technically “protect” the original data in any
way, but it does prevent the data file from getting cluttered with extraneous
variables. If you need to remove temporary variables that still exist after reading
the data, you can use the DELETE VARIABLES command to eliminate them.

Use EXECUTE Sparingly


SPSS is designed to work with large data files (the current version can
accommodate 2.15 billion cases). Since going through every case of a large data
file takes time, the software is also designed to minimize the number of times it has
to read the data.
Statistical and charting procedures always read the data, but most transformation
commands (for example, COMPUTE, RECODE, COUNT, SELECT IF) do not
require a separate data pass.
The default behavior of the graphical user interface, however, is to read the data
for each separate transformation so that you can see the results in the Data Editor
immediately. Consequently, every transformation command generated from the
dialog boxes is followed by an EXECUTE command. So if you create command
syntax by pasting from dialog boxes or copying from the log or journal, your
command syntax may contain a large number of superfluous EXECUTE
commands that can significantly increase the processing time for very large data
files. In most cases, you can remove virtually all of the auto-generated EXECUTE
commands, which will speed up processing, particularly for large data files and
jobs that contain many transformation commands. To turn off the automatic, immediate
execution of transformations and the associated pasting of EXECUTE commands:

From the menus, choose:
Edit
Options...
Click the Data tab.
Select Calculate values before used.
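
The effect of trimming pasted EXECUTE commands can be sketched as follows. The variable names are borrowed from the earlier examples and are placeholders. Both versions produce the same FREQUENCIES output; the first simply reads the data more often.

```spss
* As pasted, where each EXECUTE forces a separate pass through the data.
COMPUTE var1_new=var1*2.
EXECUTE.
RECODE var2 (1 thru 5 = 1) (6 thru 10 = 2) (ELSE=COPY) INTO var2_new.
EXECUTE.
FREQUENCIES VARIABLES=var1_new var2_new.

* Trimmed, so the transformations run when FREQUENCIES reads the data.
COMPUTE var1_new=var1*2.
RECODE var2 (1 thru 5 = 1) (6 thru 10 = 2) (ELSE=COPY) INTO var2_new.
FREQUENCIES VARIABLES=var1_new var2_new.
```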

Getting Data into SPSS

Before you can work with data in SPSS, you need some data to work with. There
are several ways to get data into the application:
• Open a data file that has already been saved in SPSS format.
• Enter data manually in the Data Editor.
• Read a data file from another source, such as a database, text data file,
spreadsheet, SAS, or Stata.
Opening an SPSS-format data file is simple, and manually entering data in the
Data Editor is not likely to be your first choice, particularly if you have a large
amount of data. This chapter focuses on how to read data files created and saved in
other applications and formats.

Getting Data from Databases


SPSS relies primarily on ODBC (open database connectivity) to read data from
databases. ODBC is an open standard with versions available on many platforms,
including Windows, UNIX, and Macintosh.

Installing Database Drivers


You can read data from any database format for which you have a database driver.
In local analysis mode, the necessary drivers must be installed on your local
computer. In distributed analysis mode (available with the Server version), the
drivers must be installed on the remote server.
ODBC database drivers for a wide variety of database formats are included on the
SPSS installation CD, including:

• Access
• Btrieve
• DB2
• dBase
• Excel
• FoxPro
• Informix
• Oracle
• Paradox
• Progress
• SQL Base
• SQL Server
• Sybase

Most of these drivers can be installed by installing the SPSS Data Access Pack.
You can install the SPSS Data Access Pack from the AutoPlay menu on the SPSS
installation CD.
If you need a Microsoft Access driver, you will need to install the Microsoft Data
Access Pack. An installable version is located in the Microsoft Data Access Pack
folder on the SPSS installation CD.
Before you can use the installed database drivers, you may also need to configure
the drivers using the Windows ODBC Data Source Administrator. For the SPSS
Data Access Pack, installation instructions and information on configuring data
sources are located in the Installation Instructions folder on the SPSS installation
CD.

Reading a Single Database Table


SPSS reads data from databases by reading database tables. You can read
information from a single table or merge data from multiple tables in the same
database. A single database table has basically the same two-dimensional structure
as an SPSS data file: records are cases and fields are variables. So, reading a single
table can be very simple.
Example
This example reads a single table from an Access database. It reads all records and
fields in the table.

*access1.sps.
GET DATA /TYPE=ODBC /CONNECT=
 'DSN=MS Access Database;DBQ=C:\examples\data\dm_demo.mdb;'+
 'DriverId=25;FIL=MS Access;MaxBufferSize=2048;PageTimeout=5;'
 /SQL = 'SELECT * FROM CombinedTable'.
EXECUTE.

• The GET DATA command is used to read the database.

• TYPE=ODBC indicates that an ODBC driver will be used to read the data.
This is required for reading data from any database, and it can also be used
for other data sources with ODBC drivers, such as Excel workbooks. For
more information, see “Reading Multiple Worksheets.”

• CONNECT identifies the data source. For this example, the CONNECT
string was copied from the command syntax generated by the Database
Wizard. The entire string must be enclosed in single or double quotes. In this
example, we have split the long string onto two lines using a plus sign (+) to
combine the two strings.

• The SQL subcommand can contain any SQL statements supported by the
database format. Each line must be enclosed in single or double quotes.

• SELECT * FROM CombinedTable reads all of the fields (columns) and all
records (rows) from the table named CombinedTable in the database.

• Any field names that are not valid SPSS variable names are automatically
converted to valid variable names, and the original field names are used as
variable labels. In this database table, many of the field names contain
spaces, which are removed in the variable names.
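
Because the SQL subcommand accepts any statement the database supports, the same connection can also be used to read only some fields or records. The sketch below assumes that the table contains fields named ID and Age, which are hypothetical. The quoted lines of the SQL statement are joined together, so the second line begins with a space.

```spss
* Field names ID and Age are assumed for illustration only.
GET DATA /TYPE=ODBC /CONNECT=
 'DSN=MS Access Database;DBQ=C:\examples\data\dm_demo.mdb;'+
 'DriverId=25;FIL=MS Access;MaxBufferSize=2048;PageTimeout=5;'
 /SQL = 'SELECT [ID], [Age] FROM CombinedTable'
 ' WHERE [Age] >= 18'.
EXECUTE.
```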

Figure 3-1
Database field names converted to valid variable names

Reading Multiple Tables


You can combine data from two or more database tables by “joining” the tables.
The active dataset can be constructed from more than two tables, but each “join”
defines a relationship between only two of those tables:

• Inner join. Records in the two tables with matching values for one or more
specified fields are included. For example, a unique ID value may be used in
each table, and records with matching ID values are combined. Any records
without matching identifier values in the other table are omitted.
• Left outer join. All records from the first table are included regardless of
the criteria used to match records.
• Right outer join. Essentially the opposite of a left outer join. So, the
appropriate one to use is basically a matter of the order in which the tables
are specified in the SQL SELECT clause.

Example
So far, all of the data resided in a single database table. But what if the data
were divided between two tables? This example merges data from two different
tables: one containing demographic information for survey respondents and one
containing survey responses.
*access_multtables1.sps.
GET DATA /TYPE=ODBC /CONNECT=
 'DSN=MS Access Database;DBQ=C:\examples\data\dm_demo.mdb;'+
 'DriverId=25;FIL=MS Access;MaxBufferSize=2048;PageTimeout=5;'
 /SQL =
 'SELECT * FROM DemographicInformation, SurveyResponses'
 ' WHERE DemographicInformation.ID=SurveyResponses.ID'.
EXECUTE.
• The SELECT clause specifies all fields from both tables.
• The WHERE clause matches records from the two tables based on the value
of the ID field in both tables. Any records in either table without matching
ID values in the other table are excluded.
• The result is an inner join in which only records with matching ID values in
both tables are included in the active dataset.

Example
In addition to one-to-one matching, as in the previous inner join example, you can
also merge tables with a one-to-many matching scheme. For example, you could
match a table in which there are only a few records representing data values and
associated descriptive labels with values in a table containing hundreds or
thousands of records representing survey respondents.
In this example, we read data from an SQL Server database, using an outer join to
avoid omitting records in the larger table that don’t have matching identifier values
in the smaller table.
*sqlserver_outer_join.sps.
GET DATA /TYPE=ODBC
 /CONNECT= 'DSN=SQLServer;UID=;APP=SPSS For Windows;'
 'WSID=ROLIVERLAP;Network=DBMSSOCN;Trusted_Connection=Yes'
 /SQL =
 'SELECT SurveyResponses.ID, SurveyResponses.Internet,'
 ' [Value Labels].[Internet Label]'
 ' FROM SurveyResponses LEFT OUTER JOIN [Value Labels]'
 ' ON SurveyResponses.Internet'
 ' = [Value Labels].[InternetValue]'.

Figure 3-2
SQL Server tables to be merged with outer join

Reading Excel Files


SPSS can read individual Excel worksheets and multiple worksheets in the same
Excel workbook. The basic mechanics of reading Excel files are relatively
straightforward: rows are read as cases and columns are read as variables.
However, reading a typical Excel spreadsheet, where the data may not start in row
1, column 1, requires a little extra work, and reading multiple worksheets requires
treating the Excel workbook as a database. In both instances, we can use the
GET DATA command to read the data into SPSS.

Reading a “Typical” Worksheet


When reading an individual worksheet, SPSS reads a rectangular area of the
worksheet, and everything in that area must be data related. The first row of the
area may or may not contain variable names (depending on your specifications);
the remainder of the area must contain the data to be read. A typical worksheet,
however, may also contain titles and other information that may not be appropriate
for an SPSS data file and may even cause the data to be read incorrectly if you
don’t explicitly specify the range of cells to read.

Example
Figure 3-4 Typical Excel worksheet

To read this spreadsheet without the title row or total row and column:

*readexcel.sps.
GET DATA
 /TYPE=XLS
 /FILE='c:\examples\data\sales.xls'
 /SHEET=NAME 'Gross Revenue'
 /CELLRANGE=RANGE 'A2:I15'
 /READNAMES=on .

Reading Multiple Worksheets
An Excel file (workbook) can contain multiple worksheets, and you can read
multiple worksheets from the same workbook by treating the Excel file as a
database. This requires an ODBC driver for Excel.

Figure 3-6
Multiple worksheets in same workbook

When reading multiple worksheets, you lose some of the flexibility available for
reading individual worksheets:
• You cannot specify cell ranges.
• The first non-empty row of each worksheet should contain column labels
that will be used as variable names.
• Only basic data types (string and numeric) are preserved, and string
variables may be set to an arbitrarily long width.

Example
In this example, the first worksheet contains information about store location, and
the second and third contain information for different departments. All three
contain a column, Store Number, that uniquely identifies each store, so the
information in the three sheets can be merged correctly regardless of the order in
which the stores are listed on each worksheet.
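
A command for this kind of merge might look like the following sketch, which follows the ODBC pattern used for the Access examples. The DSN, the workbook location, and the worksheet names (Location, Tools, and Auto) are all assumptions, and in practice the CONNECT string is best copied from the syntax generated by the Database Wizard. With the Excel ODBC driver, a worksheet is referred to as a table by appending a dollar sign to its name and enclosing it in brackets.

```spss
* Sketch only, with a hypothetical DSN, path, and worksheet names.
GET DATA /TYPE=ODBC /CONNECT=
 'DSN=Excel Files;DBQ=c:\examples\data\sales.xls;'
 /SQL =
 'SELECT * FROM [Location$], [Tools$], [Auto$]'
 ' WHERE [Tools$].[Store Number]=[Location$].[Store Number]'
 ' AND [Auto$].[Store Number]=[Location$].[Store Number]'.
EXECUTE.
```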

Reading Text Data Files


A text data file is simply a text file that contains data. Text data files fall into two
broad categories:
• Simple text data files, in which all variables are recorded in the same order
for all cases, and all cases contain the same variables. This is basically how
all data files appear once they are read into SPSS.
• Complex text data files, including files in which the order of variables may
vary between cases and hierarchical or nested data files in which some
records contain variables with values that apply to one or more cases
contained on subsequent records that contain a different set of variables (for
example, city, state, and street address on one record and name, age, and
gender of each household member on subsequent records).

Text data files can be further subdivided into two more categories:

Delimited: Spaces, commas, tabs, or other characters are used to separate
variables. The variables are recorded in the same order for each case but not
necessarily in the same column locations. This is also referred to as freefield
format. Some applications export text data in comma-separated values (CSV)
format; this is a delimited format.
Fixed width: Each variable is recorded in the same column location on the same
line (record) for each case in the data file. No delimiter is required between
values. In fact, in many text data files generated by computer programs, data
values may appear to run together without even spaces separating them. The column
location determines which variable is being read.
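
The two layouts are read with different commands. The sketch below assumes a comma-separated file with variable names in its first line and a fixed-width file holding the same three variables. The file names, variable names, and column positions are all hypothetical.

```spss
* Delimited file in which commas separate values and line 1 holds the names.
GET DATA /TYPE=TXT
  /FILE='c:\examples\data\survey.csv'
  /DELCASE=LINE
  /DELIMITERS=","
  /QUALIFIER='"'
  /ARRANGEMENT=DELIMITED
  /FIRSTCASE=2
  /IMPORTCASE=ALL
  /VARIABLES=id F4 age F3 gender A1.
FREQUENCIES VARIABLES=gender.

* Fixed-width file in which column positions identify each variable.
DATA LIST FIXED FILE='c:\examples\data\survey_fixed.txt'
  /id 1-4 age 5-7 gender 8 (A).
FREQUENCIES VARIABLES=gender.
```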

Saving Data in Text Format


You use the SAVE TRANSLATE command to save data as tab-delimited text or
the WRITE command to save data as fixed-width text. See the SPSS Command
Syntax Reference for more information.
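
A minimal sketch of both commands follows, assuming the active dataset contains the variables id, age, and gender (hypothetical names) and that the output locations exist.

```spss
* Tab-delimited text with variable names written to the first row.
SAVE TRANSLATE OUTFILE='c:\examples\data\survey.dat'
  /TYPE=TAB
  /FIELDNAMES
  /REPLACE.

* Fixed-width text with each variable written to fixed column positions.
WRITE OUTFILE='c:\examples\data\survey_fixed.txt'
  /id 1-4 age 5-7 gender 8 (A).
EXECUTE.
```

WRITE is a transformation command, so nothing is written until the data are read, which is the job of the EXECUTE command here.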

Exporting Results to Word, Excel, and PowerPoint
The OMS command is the method of choice for exporting results in XML or text
format, but OMS is not appropriate if you want to export results to Microsoft
Word, Excel, or PowerPoint.
To export results to Word, Excel, or PowerPoint, you need to use the Export
facility in the Viewer. From the Viewer window menus, choose: File Export.
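
For the text and XML formats that OMS does handle, a request is opened before the procedure runs and closed afterward. The sketch below routes the tables from one FREQUENCIES command to a text file. The output location and variable names are hypothetical.

```spss
* Route pivot tables from FREQUENCIES to a text file at a hypothetical path.
OMS
  /SELECT TABLES
  /IF COMMANDS=['Frequencies']
  /DESTINATION FORMAT=TEXT OUTFILE='c:\examples\output\frequencies.txt'.
FREQUENCIES VARIABLES=var1 var2.
OMSEND.
```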
