SPSS Project
HISTORY OF SPSS
The history of SPSS can be traced back to 1967, when Norman H. Nie, then a 22-
year-old Ph.D. candidate at Stanford University, decided to develop his own
solution after becoming "frustrated trying to use a computer to analyze data
describing the political culture of five nations," according to the September 22,
2003, issue of the Chicago Tribune. The application Nie was trying to use was
created for biologists, not social scientists. With that in mind, Nie took detailed
notes about what he needed in a software application and enlisted the help of Dale
H. Bent, a fellow doctoral candidate whose background was in operations research,
to design a file structure. Hadlai "Tex" Hull, who had recently received his MBA
from Stanford, was tapped to write the code, and by 1968 the Statistical Package
for the Social Sciences (SPSS) was born. Nie and Hull left Stanford to pursue
careers at the University of Chicago, and they brought their SPSS program along
with them. However, their main focus was on academics and research--not on
developing or selling software. Hull became head of the university's Computation
Center. Nie joined its National Opinion Research Center and eventually was
named chairman of the political science department.
Use the Paste button. Make selections from the menus and dialog boxes,
and then click the Paste button instead of the OK button. This will paste the
underlying commands into a command syntax window.
Record commands in the log. Select Display commands in the log on the
Viewer tab in the Options dialog box (Edit menu, Options) or run the
command SET PRINTBACK ON. As you run analyses, the commands for
your dialog box selections will be recorded and displayed in the log in the
Viewer window. You can then copy and paste the commands from the
Viewer into a syntax window or text editor. This setting persists across
sessions, so you have to specify it only once.
Retrieve commands from the journal file. Most actions that you perform
in the graphical user interface (and all commands that you run from a
command syntax window) are automatically recorded in the journal file in
the form of command syntax. The default name of the journal file is spss.jnl.
The default location varies, depending on your operating system. Both the
name and location of the journal file are displayed on the General tab in the
Options dialog box (Edit menu, Options).
Highlight the commands that you want to run in a command syntax window
and click the Run button.
Invoke one command file from another with the INCLUDE or INSERT
command. For more information, see “Using INSERT with a Master
Command Syntax File” on p. 20.
Use the Production Facility to create production jobs that can run unattended
and even start unattended (and automatically) using common scheduling
software. See the Help system for more information about the Production
Facility.
Use SPSSB (available only with the server version) to run command files
from a command line and automatically route results to different output
destinations in different formats. See the SPSSB documentation supplied
with the SPSS server software for more information.
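As a rough illustration of the INCLUDE and INSERT option listed above, a master command file might look like the following sketch. The file names and paths are hypothetical, and ERROR=STOP is just one possible choice for how INSERT handles errors.

```spss
* Hypothetical master file; the file names and paths are assumptions.
INSERT FILE='c:\examples\commands\prepare_data.sps' ERROR=STOP.
INCLUDE FILE='c:\examples\commands\run_reports.sps'.
```

Each command runs the contents of the named file as if they had been typed at that point in the master file.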
Figure 2-1
Command syntax pasted from a dialog box
Syntax Rules
Commands run from a command syntax window during a typical SPSS
session must follow the interactive command syntax rules.
Command files run via SPSSB or invoked via the INCLUDE command
must follow the batch command syntax rules.
Interactive Rules
The following rules apply to command specifications in interactive mode:
Each command must start on a new line. Commands can begin in any
column of a command line and continue for as many lines as needed. The
exception is the END DATA command, which must begin in the first
column of the first line after the end of data.
Batch Rules
The following rules apply to command specifications in batch or production mode:
All commands in the command file must begin in column 1. You can use plus
(+) or minus (–) signs in the first column if you want to indent the command
specification to make the command file more readable.
If multiple lines are used for a command, column 1 of each continuation
line must be blank.
Command terminators are optional.
A line cannot exceed 256 bytes; any additional characters are truncated.
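The batch rules above can be sketched with a short fragment. The variable names var1 and flag are hypothetical. Every command starts in column 1, a plus sign in column 1 lets the nested commands be indented, and the continuation line of FREQUENCIES leaves column 1 blank.

```spss
* Hypothetical batch-style fragment. The names var1 and flag are assumptions.
DO IF (var1 > 5).
+  COMPUTE flag=1.
ELSE.
+  COMPUTE flag=0.
END IF.
FREQUENCIES
  /VARIABLES=flag.
```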
Status bar in command syntax window with current line number and column
position displayed
Example
*temporary.sps.
DATA LIST FREE /var1 var2.
BEGIN DATA
1 2
3 4
5 6
7 8
9 10
END DATA.
TEMPORARY.
COMPUTE var1=var1 + 5.
RECODE var2 (1 thru 5=1) (6 thru 10=2).
FREQUENCIES
/VARIABLES=var1 var2
/STATISTICS=MEAN STDDEV MIN MAX.
DESCRIPTIVES
/VARIABLES=var1 var2
/STATISTICS=MEAN STDDEV MIN MAX.
The transformed values from the two transformation commands that follow
the TEMPORARY command will be used in the FREQUENCIES
procedure.
The original data values will be used in the subsequent DESCRIPTIVES
procedure, yielding different results for the same summary statistics.
Example
*scratchvar.sps.
DATA LIST FREE / var1.
BEGIN DATA
1 2 3 4 5
END DATA.
COMPUTE factor=1.
LOOP #tempvar=1 TO var1.
- COMPUTE factor=factor * #tempvar.
END LOOP.
EXECUTE.
Figure 2-4
Result of loop with scratch variable
The loop structure computes the factorial for each value of var1 and puts
the factorial value in the variable factor.
The scratch variable #tempvar is used as an index variable for the loop
structure.
For each case, the COMPUTE command is run iteratively up to the value of
var1.
For each iteration, the current value of the variable factor is multiplied by
the current loop iteration number stored in #tempvar.
The EXECUTE command runs the transformation commands, after which
the scratch variable is discarded.
The use of scratch variables doesn’t technically “protect” the original data in any
way, but it does prevent the data file from getting cluttered with extraneous
variables. If you need to remove temporary variables that still exist after reading
the data, you can use the DELETE VARIABLES command to eliminate them.
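A minimal sketch of that cleanup step follows. It assumes a dataset with numeric variables var1 and var2, and the names tempsum, tempflag, and result are hypothetical.

```spss
* Hypothetical example. The names tempsum, tempflag, and result are assumptions.
COMPUTE tempsum=var1 + var2.
COMPUTE tempflag=(tempsum > 10).
COMPUTE result=tempsum * tempflag.
* Run the pending transformations before deleting the helper variables.
EXECUTE.
DELETE VARIABLES tempsum tempflag.
```

Unlike scratch variables, tempsum and tempflag are ordinary variables, so they remain in the active dataset until they are explicitly deleted.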
Before you can work with data in SPSS, you need some data to work with. There
are several ways to get data into the application:
Open a data file that has already been saved in SPSS format.
Enter data manually in the Data Editor.
Read a data file from another source, such as a database, text data file,
spreadsheet, SAS, or Stata.
Opening an SPSS-format data file is simple, and manually entering data in the
Data Editor is not likely to be your first choice, particularly if you have a large
amount of data. This chapter focuses on how to read data files created and saved in
other applications and formats.
Access
Btrieve
DB2
dBase
Excel
FoxPro
Informix
Oracle
Paradox
Progress
SQL Base
SQL Server
Sybase
Most of these drivers can be installed by installing the SPSS Data Access Pack.
You can install the SPSS Data Access Pack from the AutoPlay menu on the SPSS
installation CD.
If you need a Microsoft Access driver, you will need to install the Microsoft Data
Access Pack. An installable version is located in the Microsoft Data Access Pack
folder on the SPSS installation CD.
Before you can use the installed database drivers, you may also need to configure
the drivers using the Windows ODBC Data Source Administrator. For the SPSS
Data Access Pack, installation instructions and information on configuring data
sources are located in the Installation Instructions folder on the SPSS installation
CD.
*access1.sps.
GET DATA /TYPE=ODBC /CONNECT=
'DSN=MS Access Database;DBQ=C:\examples\data\dm_demo.mdb;'+
'DriverId=25;FIL=MS Access;MaxBufferSize=2048;PageTimeout=5;'
/SQL = 'SELECT * FROM CombinedTable'.
EXECUTE.
TYPE=ODBC indicates that an ODBC driver will be used to read the data.
This is required for reading data from any database, and it can also be used
for other data sources with ODBC drivers, such as Excel workbooks. For
more information, see “Reading Multiple Worksheets” on p. 33.
CONNECT identifies the data source. For this example, the CONNECT
string was copied from the command syntax generated by the Database
Wizard. The entire string must be enclosed in single or double quotes. In this
example, we have split the long string onto two lines using a plus sign (+) to
combine the two strings.
The SQL subcommand can contain any SQL statements supported by the
database format. Each line must be enclosed in single or double quotes.
SELECT * FROM CombinedTable reads all of the fields (columns) and all
records (rows) from the table named CombinedTable in the database.
Any field names that are not valid SPSS variable names are automatically
converted to valid variable names, and the original field names are used as
variable labels. In this database table, many of the field names contain
spaces, which are removed in the variable names.
Figure 3-1
Database field names converted to valid variable names
Inner join. Records in the two tables with matching values for one or more
specified fields are included. For example, a unique ID value may be used in
each table, and records with matching ID values are combined. Any records
without matching identifier values in the other table are omitted.
Left outer join. All records from the first table are included regardless of
the criteria used to match records.
Right outer join. Essentially the opposite of a left outer join: all records
from the second table are included. Which one to use is therefore a matter of
the order in which the tables are specified in the SQL SELECT statement.
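As an illustration of the outer join types described above, the following sketch reads two Access tables with a left outer join. It assumes tables named DemographicInformation and SurveyResponses that share an ID field, and it reuses the connection string style shown earlier. Treat the table and field names as assumptions about the database.

```spss
* Hypothetical sketch. The table and field names are assumptions.
GET DATA /TYPE=ODBC /CONNECT=
 'DSN=MS Access Database;DBQ=C:\examples\data\dm_demo.mdb;'+
 'DriverId=25;FIL=MS Access;MaxBufferSize=2048;PageTimeout=5;'
 /SQL =
 'SELECT * FROM DemographicInformation LEFT OUTER JOIN SurveyResponses'
 ' ON DemographicInformation.ID=SurveyResponses.ID'.
EXECUTE.
```

Here every record in DemographicInformation is kept, whether or not it has a match in SurveyResponses. Replacing LEFT with RIGHT would instead keep every record in SurveyResponses, which has the same effect as reversing the order of the two tables in a left outer join.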
Example
In the previous two examples, all of the data resided in a single database
table. But what if the data were divided between two tables? This example
merges data from two different tables: one containing demographic
information for survey respondents and one containing survey responses.
*access_multtables1.sps.
GET DATA /TYPE=ODBC /CONNECT=
 'DSN=MS Access Database;DBQ=C:\examples\data\dm_demo.mdb;'+
 'DriverId=25;FIL=MS Access;MaxBufferSize=2048;PageTimeout=5;'
 /SQL =
 'SELECT * FROM DemographicInformation, SurveyResponses'
 ' WHERE DemographicInformation.ID=SurveyResponses.ID'.
EXECUTE.
The SELECT clause specifies all fields from both tables.
The WHERE clause matches records from the two tables based on the value
of the ID field in both tables. Any records in either table without matching
ID values in the other table are excluded.
The result is an inner join in which only records with matching ID values in
both tables are included in the active dataset.
Example
In addition to one-to-one matching, as in the previous inner join example, you can
also merge tables with a one-to-many matching scheme. For example, you could
match a table in which there are only a few records representing data values and
associated descriptive labels with values in a table containing hundreds or
thousands of records representing survey respondents.
In this example, we read data from an SQL Server database, using an outer join to
avoid omitting records in the larger table that don’t have matching identifier values
in the smaller table.
*sqlserver_outer_join.sps.
GET DATA /TYPE=ODBC
 /CONNECT= 'DSN=SQLServer;UID=;APP=SPSS For Windows;'
 'WSID=ROLIVERLAP;Network=DBMSSOCN;Trusted_Connection=Yes'
 /SQL =
 'SELECT SurveyResponses.ID, SurveyResponses.Internet,'
 ' [Value Labels].[Internet Label]'
 ' FROM SurveyResponses LEFT OUTER JOIN [Value Labels]'
 ' ON SurveyResponses.Internet'
 ' = [Value Labels].[InternetValue]'.
Figure 3-2
SQL Server tables to be merged with outer join
Example
Figure 3-4
Typical Excel worksheet
To read this spreadsheet without the title row or total row and column:
*readexcel.sps.
GET DATA
/TYPE=XLS
/FILE='c:\examples\data\sales.xls'
/SHEET=NAME 'Gross Revenue'
/CELLRANGE=RANGE 'A2:I15'
/READNAMES=on.
Reading Multiple Worksheets
An Excel file (workbook) can contain multiple worksheets, and you can read
multiple worksheets from the same workbook by treating the Excel file as a
database. This requires an ODBC driver for Excel.
Figure 3-6
Multiple worksheets in same workbook
When reading multiple worksheets, you lose some of the flexibility available for
reading individual worksheets:
You cannot specify cell ranges.
The first non-empty row of each worksheet should contain column labels
that will be used as variable names.
Only basic data types (string and numeric) are preserved, and string
variables may be set to an arbitrarily long width.
Example
In this example, the first worksheet contains information about store location, and
the second and third contain information for different departments. All three
contain a column, Store Number, that uniquely identifies each store, so the
information in the three sheets can be merged correctly regardless of the order in
which the stores are listed on each worksheet.
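A merge of this kind could be sketched as follows. The data source name (Excel Files), the DriverId value, the file path, and the worksheet names (Location, Tools, Auto) are all assumptions for illustration. Only the Store Number column comes from the description above.

```spss
* Hypothetical sketch. DSN, DriverId, path, and sheet names are assumptions.
GET DATA
 /TYPE=ODBC
 /CONNECT=
 'DSN=Excel Files;DBQ=c:\examples\data\sales.xls;'+
 'DriverId=790;MaxBufferSize=2048;PageTimeout=5;'
 /SQL =
 'SELECT * FROM [Location$], [Tools$], [Auto$]'
 ' WHERE [Tools$].[Store Number]=[Location$].[Store Number]'
 ' AND [Auto$].[Store Number]=[Location$].[Store Number]'.
EXECUTE.
```

In this sketch each worksheet is referenced like a database table, with a dollar sign appended to the sheet name, and the WHERE clause matches rows on Store Number in the same way as the inner join of two database tables.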
Text data files can be further subdivided into two more categories: