PSPP - Chapter 3
40 days 3,456,000
A date, on the other hand, is a particular instant in the past or the future. PSPP
represents a date as a number of seconds since midnight preceding 14 Oct 1582. The
midnight preceding each date below corresponds to the numeric PSPP date given:
15 Oct 1582 86,400
4 Jul 1776 6,113,318,400
1 Jan 1900 10,010,390,400
1 Oct 1978 12,495,427,200
24 Aug 1995 13,028,601,600
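The values in this table can be checked outside PSPP. The following Python sketch is not part of PSPP: it assumes the proleptic Gregorian calendar used by Python's datetime.date, and the helper name pspp_date is hypothetical.

```python
from datetime import date

# Midnight preceding 14 Oct 1582 is the origin (value 0) of PSPP dates.
PSPP_ORIGIN = date(1582, 10, 14)
SECONDS_PER_DAY = 86_400


def pspp_date(day, month, year):
    """Seconds from the origin to the midnight preceding the given date."""
    return (date(year, month, day) - PSPP_ORIGIN).days * SECONDS_PER_DAY


print(pspp_date(15, 10, 1582))  # 86400
print(pspp_date(1, 1, 1900))    # 10010390400
print(pspp_date(24, 8, 1995))   # 13028601600
```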
quarter Refers to a quarter of the year between 1 and 4. The quarters of the year begin
on the first day of months 1, 4, 7, and 10.
week Refers to a week of the year between 1 and 53.
yday Refers to a day of the year between 1 and 366.
year Refers to a year, 1582 or greater. Years between 0 and 99 are treated according
to the epoch set on SET EPOCH, by default beginning 69 years before the
current date (see [SET EPOCH], page 165).
If these functions’ arguments are out-of-range, they are correctly normalized before
conversion to date format. Non-integers are rounded toward zero.
DATE.DMY (day, month, year) [Function]
DATE.MDY (month, day, year) [Function]
Results in a date value corresponding to the midnight before day day of month month
of year year.
DATE.MOYR (month, year) [Function]
Results in a date value corresponding to the midnight before the first day of month
month of year year.
DATE.QYR (quarter, year) [Function]
Results in a date value corresponding to the midnight before the first day of quarter
quarter of year year.
DATE.WKYR (week, year) [Function]
Results in a date value corresponding to the midnight before the first day of week
week of year year.
DATE.YRDAY (year, yday) [Function]
Results in a date value corresponding to the midnight before day yday of year year.
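As a rough illustration of how three of these functions relate to one another, the Python sketch below models DATE.MOYR, DATE.QYR, and DATE.YRDAY. It is not PSPP code: the function names are hypothetical, the proleptic Gregorian calendar of Python's datetime module is assumed, and neither the normalization of out-of-range arguments nor the handling of two-digit years is modeled.

```python
from datetime import date, timedelta

ORIGIN = date(1582, 10, 14)   # PSPP date 0 is the midnight preceding this day
SECONDS_PER_DAY = 86_400


def _to_pspp(d):
    """Convert a calendar date to seconds since the origin."""
    return (d - ORIGIN).days * SECONDS_PER_DAY


def date_moyr(month, year):
    """Model of DATE.MOYR: midnight before the first day of the month."""
    return _to_pspp(date(year, month, 1))


def date_qyr(quarter, year):
    """Model of DATE.QYR: quarters begin in months 1, 4, 7, and 10."""
    return date_moyr(3 * (quarter - 1) + 1, year)


def date_yrday(year, yday):
    """Model of DATE.YRDAY: day 1 of the year is 1 January."""
    return _to_pspp(date(year, 1, 1) + timedelta(days=yday - 1))


print(date_qyr(4, 1978))       # 12495427200, i.e. 1 Oct 1978
print(date_yrday(1995, 236))   # 13028601600, i.e. 24 Aug 1995
```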
Dates and times may have very large values. Thus, it is not a good idea to take powers
of these values; also, the accuracy of some procedures may be affected. If necessary, convert
times or dates in seconds to some other unit, like days or years, before performing analysis.
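The loss of accuracy can be seen in a small Python sketch. It assumes that numeric values are held as IEEE 754 double-precision numbers (Python's float); this is an assumption about the storage format, not something stated above.

```python
SECONDS_PER_DAY = 86_400

date_seconds = 13_028_601_600.0   # 24 Aug 1995 expressed in seconds
squared = date_seconds ** 2       # about 1.7e20, far above 2**53

# At this magnitude adjacent doubles are 32768 apart, so adding 1 is lost.
print(squared + 1.0 == squared)   # True

date_days = date_seconds / SECONDS_PER_DAY   # 150794.0
squared_days = date_days ** 2

# Expressed in days, the square is still small enough to be exact.
print(squared_days + 1.0 == squared_days)    # False
```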
PSPP supplies a few functions for date arithmetic:
are reserved for PSPP’s internal use, and attribute names that begin with @ or $@ are not
displayed by most PSPP commands that display other attributes. Other attribute names
are not treated specially.
Attributes may also be organized into arrays. To assign to an array element, add an
integer array index enclosed in square brackets ([ and ]) between the attribute name and
value. Array indexes start at 1, not 0. An attribute array that has a single element (number
1) is not distinguished from a non-array attribute.
Use the DELETE subcommand to delete an attribute. Specify an attribute name by itself
to delete an entire attribute, including all array elements for attribute arrays. Specify an
attribute name followed by an array index in square brackets to delete a single element of an
attribute array. In the latter case, all the array elements numbered higher than the deleted
element are shifted down, filling the vacated position.
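The shifting behavior can be pictured with a short Python sketch. It is only a model: the function name is hypothetical, the attribute array is represented as a plain list, and raising an error for an index that does not exist is an assumption rather than documented behavior.

```python
def delete_element(values, index):
    """Return a copy of the array without the element at 1-based index."""
    if not 1 <= index <= len(values):
        # Assumption: the text does not say what happens for a missing index.
        raise IndexError(f"no element [{index}]")
    return values[: index - 1] + values[index:]


attr = ["red", "green", "blue"]   # elements [1], [2], [3]
attr = delete_element(attr, 2)    # delete element [2]

print(attr)          # ['red', 'blue']
print(attr[2 - 1])   # element [2] is now 'blue'
```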
To associate custom attributes with particular variables, instead of with the entire active
dataset, use VARIABLE ATTRIBUTE (see Section 11.15 [VARIABLE ATTRIBUTE], page 108)
instead.
DATAFILE ATTRIBUTE takes effect immediately. It is not affected by conditional and
looping structures such as DO IF or LOOP.
The DATASET CLOSE command deletes a dataset. If the active dataset is specified by
name, or if ‘*’ is specified, then the active dataset becomes unnamed. If a different dataset
is specified by name, then it is deleted and becomes unavailable. Specifying ALL deletes all
datasets except for the active dataset, which becomes unnamed.
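The three cases can be summarized in a toy Python model. The Session class and its attributes are hypothetical and track only dataset names, not data. It is a sketch of the rules stated above, not of how they are implemented.

```python
class Session:
    """Toy model: a set of dataset names plus one active dataset."""

    def __init__(self, names, active):
        self.named = set(names)   # names currently defined
        self.active = active      # name of the active dataset, None if unnamed

    def close(self, target):
        if target == "ALL":
            # Every dataset except the active one is deleted,
            # and the active dataset loses its name.
            self.named.clear()
            self.active = None
        elif target == "*" or target == self.active:
            # The active dataset stays available but becomes unnamed.
            self.named.discard(self.active)
            self.active = None
        else:
            # Some other dataset: deleted and no longer available.
            self.named.remove(target)


s = Session({"a", "b", "c"}, active="a")
s.close("b")
print(sorted(s.named), s.active)   # ['a', 'c'] a
s.close("*")
print(sorted(s.named), s.active)   # ['c'] None
s.close("ALL")
print(sorted(s.named), s.active)   # [] None
```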
The DATASET DISPLAY command lists all the currently defined datasets.
Many DATASET commands accept an optional WINDOW subcommand. In the PSPPIRE
GUI, the value given for this subcommand influences how the dataset’s window is displayed.
Outside the GUI, the WINDOW subcommand has no effect. The valid values are:
ASIS Do not change how the window is displayed. This is the default for DATASET
NAME and DATASET ACTIVATE.
FRONT Raise the dataset’s window to the top. Make it the default dataset for running
syntax.
MINIMIZED
Display the window “minimized” to an icon. Prefer other datasets for running
syntax. This is the default for DATASET COPY and DATASET DECLARE.
HIDDEN Hide the dataset’s window. Prefer other datasets for running syntax.
The FILE subcommand must be used if input is to be taken from an external file. It may
be used to specify a file name as a string or a file handle (see Section 6.9 [File Handles],
page 42). If the FILE subcommand is not used, then input is assumed to be specified
within the command file using BEGIN DATA. . . END DATA (see Section 8.1 [BEGIN DATA],
page 62). The ENCODING subcommand may only be used if the FILE subcommand is also
used. It specifies the character encoding of the file. See Section 16.16 [INSERT], page 161,
for information on supported encodings.
The optional RECORDS subcommand, which takes a single integer as an argument, is used
to specify the number of lines per record. If RECORDS is not specified, then the number of
lines per record is calculated from the list of variable specifications later in DATA LIST.
The END subcommand is only useful in conjunction with INPUT PROGRAM. See Section 8.9
[INPUT PROGRAM], page 71, for details.
The optional SKIP subcommand specifies a number of records to skip at the beginning
of an input file. It can be used to skip over a row that contains variable names, for example.
DATA LIST can optionally output a table describing how the data file will be read. The
TABLE subcommand enables this output, and NOTABLE disables it. The default is to output
the table.
The list of variables to be read from the data list must come last. Each line in the
data record is introduced by a slash (‘/’). Optionally, a line number may follow the slash.
After the slash and optional line number, any number of variable specifications may follow.
Each variable specification consists of a list of variable names followed by a description
of their location on the input line. Sets of variables may be specified using the DATA LIST
TO convention (see Section 6.7.3 [Sets of Variables], page 32). There are two ways to specify
the location of the variable on the line: columnar style and FORTRAN style.
In columnar style, the starting column and ending column for the field are specified after
the variable name, separated by a dash (‘-’). For instance, the third through fifth columns
on a line would be specified ‘3-5’. By default, variables are considered to be in ‘F’ format
(see Section 6.7.4 [Input and Output Formats], page 32). (This default can be changed; see
Section 16.20 [SET], page 163, for more information.)
In columnar style, to use a variable format other than the default, specify the format
type in parentheses after the column numbers. For instance, for alphanumeric ‘A’ format,
use ‘(A)’.
In addition, implied decimal places can be specified in parentheses after the column
numbers. As an example, suppose that a data file has a field in which the characters ‘1234’
should be interpreted as having the value 12.34. Then this field has two implied decimal
places, and the corresponding specification would be ‘(2)’. If a field that has implied
decimal places contains a decimal point, then the implied decimal places are not applied.
Changing the variable format and adding implied decimal places can be done together;
for instance, ‘(N,5)’.
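The implied-decimal rule can be modeled in a few lines of Python. This is a sketch under stated assumptions: the function name read_field is hypothetical, Decimal is used only to keep the arithmetic exact, and blank or malformed fields are not handled.

```python
from decimal import Decimal


def read_field(text, implied_decimals=0):
    """Interpret a numeric field, applying implied decimal places."""
    text = text.strip()
    value = Decimal(text)
    if "." in text:
        # An explicit decimal point suppresses the implied decimals.
        return value
    # Shift the decimal point left by the implied number of places.
    return value.scaleb(-implied_decimals)


print(read_field("1234", 2))   # 12.34
print(read_field("12.5", 2))   # 12.5
```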
When using columnar style, the input and output width of each variable is computed
from the field width. The field width must be evenly divisible by the number of variables
specified.
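For example, three variables that share columns 12 through 17 each receive two columns. A minimal Python sketch of that division follows; the function name split_columns is hypothetical, and the error raised for an uneven split merely stands in for whatever diagnostic is actually issued.

```python
def split_columns(first, last, n_vars):
    """Divide columns first..last evenly among n_vars variables."""
    width = last - first + 1
    if width % n_vars != 0:
        raise ValueError("field width is not divisible by the number of variables")
    per_var = width // n_vars
    # Each variable gets a consecutive block of per_var columns.
    return [(first + i * per_var, first + (i + 1) * per_var - 1)
            for i in range(n_vars)]


print(split_columns(12, 17, 3))   # [(12, 13), (14, 15), (16, 17)]
```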
FORTRAN style is an altogether different approach to specifying field locations. With
this approach, a list of variable input format specifications, separated by commas, is
placed after the variable names inside parentheses. Each format specifier advances as many
characters into the input line as it uses.
Implied decimal places also exist in FORTRAN style. A format specification with d
decimal places also has d implied decimal places.
In addition to the standard format specifiers (see Section 6.7.4 [Input and Output
Formats], page 32), FORTRAN style defines some extensions:
X Advance the current column on this line by one character position.
Tx Set the current column on this line to column x, with column numbers consid-
ered to begin with 1 at the left margin.
NEWRECx Skip forward x lines in the current record, resetting the active column to the
left margin.
Repeat count
Any format specifier may be preceded by a number. This causes the action of
that format specifier to be repeated the specified number of times.
(spec1, . . . , specN )
Group the given specifiers together. This is most useful when preceded by a
repeat count. Groups may be nested arbitrarily.
FORTRAN and columnar styles may be freely intermixed. Columnar style leaves the
active column immediately after the ending column specified. Record motion using NEWREC
in FORTRAN style also applies to later FORTRAN and columnar specifiers.
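The way the active column moves can be traced with a small Python sketch. The tuple notation used here is hypothetical and is not PSPP syntax: ("field", w) stands for any format specifier that uses w characters, ("X",) and ("T", x) stand for X and Tx, and ("repeat", n, group) stands for a repeat count applied to a group. NEWREC is not modeled.

```python
def field_columns(specs, column=1):
    """Return (ranges, next_column) for a list of format specifiers."""
    ranges = []
    for spec in specs:
        kind = spec[0]
        if kind == "field":        # uses spec[1] columns of the input line
            width = spec[1]
            ranges.append((column, column + width - 1))
            column += width
        elif kind == "X":          # advance one character position
            column += 1
        elif kind == "T":          # jump to an absolute column (1-based)
            column = spec[1]
        elif kind == "repeat":     # repeat count applied to a group
            for _ in range(spec[1]):
                inner, column = field_columns(spec[2], column)
                ranges.extend(inner)
        else:
            raise ValueError(f"unknown specifier {spec!r}")
    return ranges, column


specs = [("field", 3), ("X",),
         ("repeat", 2, [("field", 2), ("X",)]),
         ("T", 20), ("field", 5)]
print(field_columns(specs))
# ([(1, 3), (5, 6), (8, 9), (20, 24)], 25)
```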
Examples
1.
DATA LIST TABLE /NAME 1-10 (A) INFO1 TO INFO3 12-17 (1).
BEGIN DATA.
John Smith 102311
Bob Arnold 122015
Bill Yates  918 6
END DATA.
Defines the following variables:
• NAME, a 10-character-wide string variable, in columns 1 through 10.
• INFO1, a numeric variable, in columns 12 through 13.
• INFO2, a numeric variable, in columns 14 through 15.
• INFO3, a numeric variable, in columns 16 through 17.
The BEGIN DATA/END DATA commands cause three cases to be defined:
Case  NAME        INFO1  INFO2  INFO3
1     John Smith  10     23     11
2     Bob Arnold  12     20     15
3     Bill Yates  9      18     6
The TABLE keyword causes PSPP to print out a table describing the four variables
defined.
2.
DAT LIS FIL="survey.dat"
/ID 1-5 NAME 7-36 (A) SURNAME 38-67 (A) MINITIAL 69 (A)
/Q01 TO Q50 7-56
/.
Defines the following variables:
• ID, a numeric variable, in columns 1-5 of the first record.
• NAME, a 30-character string variable, in columns 7-36 of the first record.
• SURNAME, a 30-character string variable, in columns 38-67 of the first record.
• MINITIAL, a 1-character string variable, in column 69 of the first record.
• Fifty variables Q01, Q02, Q03, . . . , Q49, Q50, all numeric, Q01 in column 7, Q02 in
column 8, . . . , Q49 in column 55, Q50 in column 56, all in the second record.
Cases are separated by a blank record.
Data is read from file survey.dat in the current directory.
This example shows keywords abbreviated to their first 3 letters.
The variables to be parsed are given as a single list of variable names. This list must
be introduced by a single slash (‘/’). The set of variable names may contain format
specifications in parentheses (see Section 6.7.4 [Input and Output Formats], page 32). Format
specifications apply to all variables back to the previous parenthesized format specification.
In addition, an asterisk may be used to indicate that all variables preceding it are to
have input/output format ‘F8.0’.
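One way to read these rules is sketched below in Python. The function name and token list are hypothetical. Two points are assumptions rather than statements from the text: the asterisk is taken to cover only the variables since the previous format specification, and trailing variables with no specification are given F8.0.

```python
def assign_formats(tokens, default="F8.0"):
    """Map variable names to formats for a free-format variable list."""
    formats = {}
    pending = []                   # names not yet covered by a format
    for token in tokens:
        if token == "*":
            spec = "F8.0"
        elif token.startswith("(") and token.endswith(")"):
            spec = token[1:-1]
        else:
            pending.append(token)
            continue
        # A format applies back to the previous format specification.
        for name in pending:
            formats[name] = spec
        pending = []
    for name in pending:           # assumption: trailing names get the default
        formats[name] = default
    return formats


print(assign_formats(["a", "b", "(A10)", "c", "d", "*", "e", "(F5.2)"]))
# {'a': 'A10', 'b': 'A10', 'c': 'F8.0', 'd': 'F8.0', 'e': 'F5.2'}
```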
Specified field widths are ignored on input, although all normal limits on field width
still apply; they are honored on output.