PSPP - Chapter 3


Chapter 7: Mathematical Expressions 47

TRUNC (number [, mult[, fuzzbits]]) [Function]


Rounds number to a multiple of mult, toward zero. For the default mult of 1, this
is equivalent to discarding the fractional part of number. Values that fall short of a
multiple of mult by less than fuzzbits of errors in the least-significant bits of number
are rounded away from zero. If fuzzbits is not specified then the default is taken from
SET FUZZBITS (see [SET FUZZBITS], page 167), which is 6 unless overridden.
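The rounding rule for TRUNC can be sketched in Python. This is only an illustration of "toward zero, to a multiple of mult"; the helper name `trunc_to_multiple` is hypothetical, and the fuzzbits adjustment is deliberately left out.

```python
import math

def trunc_to_multiple(number, mult=1):
    """Round number toward zero to a multiple of mult.

    Assumption: the fuzzbits correction described in the text is ignored.
    """
    return math.trunc(number / mult) * mult

print(trunc_to_multiple(7.9))      # fractional part discarded
print(trunc_to_multiple(-7.9))     # rounds toward zero, not down
print(trunc_to_multiple(17, 5))    # multiple of 5 nearest zero
```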

7.7.3 Trigonometric Functions


Trigonometric functions take numeric arguments and produce numeric results.

ARCOS (number) [Function]


ACOS (number) [Function]
Takes the arccosine, in radians, of number. Results in system-missing if number is
not between -1 and 1 inclusive. This function is a pspp extension.

ARSIN (number) [Function]


ASIN (number) [Function]
Takes the arcsine, in radians, of number. Results in system-missing if number is not
between -1 and 1 inclusive.

ARTAN (number) [Function]


ATAN (number) [Function]
Takes the arctangent, in radians, of number.

COS (angle) [Function]


Takes the cosine of angle which should be in radians.

SIN (angle) [Function]


Takes the sine of angle which should be in radians.

TAN (angle) [Function]


Takes the tangent of angle which should be in radians. Results in system-missing at
values of angle that are too close to odd multiples of π/2. Portability: none.
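As a rough model of the domain rule for the inverse trigonometric functions, the Python sketch below returns `None` where the text says the result is system-missing. Both the `SYSMIS = None` stand-in and the function name are assumptions made for illustration.

```python
import math

SYSMIS = None  # stand-in for the system-missing value (an assumption)

def acos_or_missing(number):
    """Arccosine in radians, or SYSMIS outside the interval [-1, 1]."""
    if number is SYSMIS or not -1 <= number <= 1:
        return SYSMIS
    return math.acos(number)

print(acos_or_missing(0.5))
print(acos_or_missing(2))  # outside the domain, so SYSMIS
```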

7.7.4 Missing-Value Functions


Missing-value functions take various numeric arguments and yield various types of results.
Except where otherwise stated below, the normal rules of evaluation apply within expression
arguments to these functions. In particular, user-missing values for numeric variables are
converted to system-missing values.

MISSING (expr) [Function]


When expr is simply the name of a numeric variable, returns 1 if the variable has
the system-missing value or if it is user-missing. For any other value 0 is returned.
If expr takes another form, the function returns 1 if the value is system-missing, 0
otherwise.

NMISS (expr [, expr]. . . ) [Function]


Each argument must be a numeric expression. Returns the number of system-missing
values in the list, which may include variable ranges using the var1 TO var2 syntax.

NVALID (expr [, expr]. . . ) [Function]


Each argument must be a numeric expression. Returns the number of values in the
list that are not system-missing. The list may include variable ranges using the var1
TO var2 syntax.
SYSMIS (expr) [Function]
Returns 1 if expr has the system-missing value, 0 otherwise.
VALUE (variable) [Function]
Prevents the user-missing values of variable from being transformed into system-
missing values, and always results in the actual value of variable, whether it is valid,
user-missing, or system-missing.
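The counting behaviour of NMISS and NVALID, and the variable-name form of MISSING, can be modelled in a few lines of Python. This is a sketch under stated assumptions: `None` stands in for system-missing, the argument values are taken as already evaluated, and the `user_missing` parameter is a hypothetical way of passing a variable's user-missing values.

```python
SYSMIS = None  # stand-in for the system-missing value (an assumption)

def nmiss(*values):
    """Count the system-missing values in the argument list."""
    return sum(1 for v in values if v is SYSMIS)

def nvalid(*values):
    """Count the values that are not system-missing."""
    return sum(1 for v in values if v is not SYSMIS)

def missing(value, user_missing=()):
    """Model MISSING(variable): 1 if system- or user-missing, else 0."""
    return 1 if value is SYSMIS or value in user_missing else 0

print(nmiss(1, SYSMIS, 3, SYSMIS), nvalid(1, SYSMIS, 3, SYSMIS))
print(missing(99, user_missing=(99,)))
```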

7.7.5 Set-Membership Functions


Set membership functions determine whether a value is a member of a set. They take a set
of numeric arguments or a set of string arguments, and produce Boolean results.
String comparisons are performed according to the rules given in Section 7.6 [Relational
Operators], page 45.
ANY (value, set [, set]. . . ) [Function]
Results in true if value is equal to any of the set values. Otherwise, results in false.
If value is system-missing, returns system-missing. System-missing values in set do
not cause ANY to return system-missing.
RANGE (value, low, high [, low, high]. . . ) [Function]
Results in true if value is in any of the intervals bounded by low and high inclusive.
Otherwise, results in false. Each low must be less than or equal to its corresponding
high value. low and high must be given in pairs. If value is system-missing, returns
system-missing. System-missing values among the low and high arguments do not cause
RANGE to return system-missing.
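The following Python sketch models ANY and RANGE as described above. It is illustrative only: `None` stands in for system-missing, the trailing underscores avoid clashing with Python built-ins, and raising `ValueError` for unpaired bounds is an assumption about how to signal the error.

```python
SYSMIS = None  # stand-in for the system-missing value (an assumption)

def any_(value, *candidates):
    """True if value equals any candidate; SYSMIS if value is SYSMIS."""
    if value is SYSMIS:
        return SYSMIS
    return any(c is not SYSMIS and c == value for c in candidates)

def range_(value, *bounds):
    """True if value lies in any inclusive [low, high] pair."""
    if value is SYSMIS:
        return SYSMIS
    if len(bounds) % 2:
        raise ValueError("low and high must be given in pairs")
    pairs = zip(bounds[0::2], bounds[1::2])
    return any(
        lo is not SYSMIS and hi is not SYSMIS and lo <= value <= hi
        for lo, hi in pairs
    )

print(any_(3, 1, 2, 3))
print(range_(5, 1, 3, 4, 6))
```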

7.7.6 Statistical Functions


Statistical functions compute descriptive statistics on a list of values. Some statistics can
be computed on numeric or string values; others can only be computed on numeric values.
Their results have the same type as their arguments. The current case’s weighting factor
(see Section 13.7 [WEIGHT], page 125) has no effect on statistical functions.
These functions’ argument lists may include entire ranges of variables using the var1 TO
var2 syntax.
Unlike most functions, statistical functions can return non-missing values even when
some of their arguments are missing. Most statistical functions, by default, require only 1
non-missing value to have a non-missing return, but CFVAR, SD, and VARIANCE require 2.
These defaults can be increased (but not decreased) by appending a dot and the minimum
number of valid arguments to the function name. For example, MEAN.3(X, Y, Z) would
only return non-missing if all of ‘X’, ‘Y’, and ‘Z’ were valid.
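The minimum-valid-arguments rule can be sketched in Python. Here the hypothetical `min_valid` parameter plays the role of the number after the dot (as in MEAN.3), and `None` stands in for system-missing.

```python
SYSMIS = None  # stand-in for the system-missing value (an assumption)

def mean(values, min_valid=1):
    """Mean of the non-missing values, or SYSMIS if too few are valid."""
    valid = [v for v in values if v is not SYSMIS]
    if len(valid) < min_valid:
        return SYSMIS
    return sum(valid) / len(valid)

print(mean([1, SYSMIS, 3]))               # like MEAN(X, Y, Z)
print(mean([1, SYSMIS, 3], min_valid=3))  # like MEAN.3(X, Y, Z)
```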
CFVAR (number, number[, . . . ]) [Function]
Results in the coefficient of variation of the values of number. (The coefficient of
variation is the standard deviation divided by the mean.)

MAX (value, value[, . . . ]) [Function]


Results in the greatest of the values. The values may be numeric or string.

MEAN (number, number[, . . . ]) [Function]


Results in the mean of the values of number.

MEDIAN (number, number[, . . . ]) [Function]


Results in the median of the values of number. Given an even number of nonmissing
arguments, yields the mean of the two middle values.

MIN (value, value[, . . . ]) [Function]


Results in the least of the values. The values may be numeric or string.

SD (number, number[, . . . ]) [Function]


Results in the standard deviation of the values of number.

SUM (number, number[, . . . ]) [Function]


Results in the sum of the values of number.

VARIANCE (number, number[, . . . ]) [Function]


Results in the variance of the values of number.

7.7.7 String Functions


String functions take various arguments and return various results.

CONCAT (string, string[, . . . ]) [Function]


Returns a string consisting of each string in sequence. CONCAT("abc", "def",
"ghi") has a value of "abcdefghi". The resultant string is truncated to a maximum
of 255 characters.

INDEX (haystack, needle) [Function]


Returns a positive integer indicating the position of the first occurrence of needle in
haystack. Returns 0 if haystack does not contain needle. Returns system-missing if
needle is an empty string.

INDEX (haystack, needles, needle_len) [Function]


Divides needles into one or more needles, each with length needle_len. Searches
haystack for the first occurrence of each needle, and returns the smallest value.
Returns 0 if haystack does not contain any of the needles. It is an error if needle_len
does not evenly divide the length of needles. Returns system-missing if needles is an
empty string.
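A Python sketch of the three-argument INDEX may help. It assumes that "the smallest value" means the smallest position among the needles that are actually found, uses `None` for system-missing, and signals the uneven-length error with `ValueError`; the name `index_any` is hypothetical.

```python
def index_any(haystack, needles, needle_len):
    """1-based position of the earliest match of any fixed-length needle."""
    if needles == "":
        return None  # stand-in for system-missing
    if needle_len < 1 or len(needles) % needle_len:
        raise ValueError("needle_len must evenly divide len(needles)")
    # Split needles into consecutive parts of needle_len characters each.
    parts = [needles[i:i + needle_len]
             for i in range(0, len(needles), needle_len)]
    hits = [haystack.find(p) + 1 for p in parts if p in haystack]
    return min(hits, default=0)

print(index_any("abcdef", "cdxy", 2))  # "cd" is found, "xy" is not
```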

LENGTH (string) [Function]


Returns the number of characters in string.

LOWER (string) [Function]


Returns a string identical to string except that all uppercase letters are changed
to lowercase letters. The definitions of “uppercase” and “lowercase” are system-
dependent.

LPAD (string, length) [Function]


If string is at least length characters in length, returns string unchanged. Otherwise,
returns string padded with spaces on the left side to length length. Returns an empty
string if length is system-missing, negative, or greater than 255.

LPAD (string, length, padding) [Function]


If string is at least length characters in length, returns string unchanged. Otherwise,
returns string padded with padding on the left side to length length. Returns an
empty string if length is system-missing, negative, or greater than 255, or if padding
does not contain exactly one character.
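Both forms of LPAD can be modelled with Python's `str.rjust`. This is a sketch only: `None` stands in for a system-missing length, and the 255 limit is taken directly from the text above.

```python
def lpad(string, length, padding=" "):
    """Left-pad string to length; empty string for invalid arguments."""
    if length is None or length < 0 or length > 255 or len(padding) != 1:
        return ""
    # rjust leaves the string unchanged when it is already long enough.
    return string.rjust(int(length), padding)

print(repr(lpad("abc", 5)))
print(repr(lpad("abc", 5, "*")))
```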

LTRIM (string) [Function]


Returns string, after removing leading spaces. Other white space, such as tabs, car-
riage returns, line feeds, and vertical tabs, is not removed.

LTRIM (string, padding) [Function]


Returns string, after removing leading padding characters. If padding does not con-
tain exactly one character, returns an empty string.

NUMBER (string, format) [Function]


Returns the number produced when string is interpreted according to format specifier
format. If the format width w is less than the length of string, then only the first w
characters in string are used, e.g. NUMBER("123", F3.0) and NUMBER("1234", F3.0)
both have value 123. If w is greater than string’s length, then it is treated as if
it were right-padded with spaces. If string is not in the correct format for format,
system-missing is returned.

REPLACE (haystack, needle, replacement[, n]) [Function]


Returns string haystack with instances of needle replaced by replacement. If nonneg-
ative integer n is specified, it limits the maximum number of replacements; otherwise,
all instances of needle are replaced.

RINDEX (haystack, needle) [Function]


Returns a positive integer indicating the position of the last occurrence of needle in
haystack. Returns 0 if haystack does not contain needle. Returns system-missing if
needle is an empty string.

RINDEX (haystack, needle, needle_len) [Function]


Divides needle into parts, each with length needle_len. Searches haystack for the last
occurrence of each part, and returns the largest value. Returns 0 if haystack does
not contain any part in needle. It is an error if needle_len does not evenly divide the
length of needle. Returns system-missing if needle is an empty string or if needle_len
is less than 1.

RPAD (string, length) [Function]


If string is at least length characters in length, returns string unchanged. Otherwise,
returns string padded with spaces on the right to length length. Returns an empty
string if length is system-missing, negative, or greater than 255.

RPAD (string, length, padding) [Function]


If string is at least length characters in length, returns string unchanged. Otherwise,
returns string padded with padding on the right to length length. Returns an empty
string if length is system-missing, negative, or greater than 255, or if padding does
not contain exactly one character.
RTRIM (string) [Function]
Returns string, after removing trailing spaces. Other types of white space are not
removed.
RTRIM (string, padding) [Function]
Returns string, after removing trailing padding characters. If padding does not con-
tain exactly one character, returns an empty string.
STRING (number, format) [Function]
Returns a string corresponding to number in the format given by format specifier
format. For example, STRING(123.56, F5.1) has the value "123.6".
STRUNC (string, n) [Function]
Returns string, first trimming it to at most n bytes, then removing trailing spaces.
Returns an empty string if n is missing or negative.
SUBSTR (string, start) [Function]
Returns a string consisting of the value of string from position start onward. Returns
an empty string if start is system-missing, less than 1, or greater than the length of
string.
SUBSTR (string, start, count) [Function]
Returns a string consisting of the first count characters from string beginning at
position start. Returns an empty string if start or count is system-missing, if start is
less than 1 or greater than the number of characters in string, or if count is less than
1. Returns a string shorter than count characters if start + count - 1 is greater than
the number of characters in string. Examples: SUBSTR("abcdefg", 3, 2) has value
"cd"; SUBSTR("nonsense", 4, 10) has the value "sense".
UPCASE (string) [Function]
Returns string, changing lowercase letters to uppercase letters.
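The 1-based indexing used by SUBSTR maps onto Python slices as sketched below. The sentinel `_OMITTED` is a hypothetical device for telling the two-argument form apart from a system-missing count, which is modelled as `None`.

```python
_OMITTED = object()  # marks the two-argument form (illustrative device)

def substr(string, start, count=_OMITTED):
    """Model SUBSTR with 1-based start; empty string on invalid input."""
    if start is None or start < 1 or start > len(string):
        return ""
    if count is _OMITTED:
        return string[start - 1:]
    if count is None or count < 1:
        return ""
    # Slicing past the end simply yields a shorter string.
    return string[start - 1:start - 1 + count]

print(substr("abcdefg", 3, 2))
print(substr("nonsense", 4, 10))
```

The two calls reproduce the examples given in the SUBSTR entry.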

7.7.8 Time & Date Functions


For compatibility, pspp considers dates before 15 Oct 1582 invalid. Most time and date
functions will not accept earlier dates.

7.7.8.1 How times & dates are defined and represented


Times and dates are handled by pspp as single numbers. A time is an interval. pspp
measures times in seconds. Thus, the following intervals correspond with the numeric
values given:
10 minutes                          600
1 hour                            3,600
1 day, 3 hours, 10 seconds       97,210
40 days                       3,456,000
A date, on the other hand, is a particular instant in the past or the future. pspp
represents a date as a number of seconds since midnight preceding 14 Oct 1582. Thus,
midnight preceding each of the dates given below corresponds to the numeric pspp date given:
15 Oct 1582          86,400
4 Jul 1776    6,113,318,400
1 Jan 1900   10,010,390,400
1 Oct 1978   12,495,427,200
24 Aug 1995  13,028,601,600
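The date representation can be reproduced with Python's `datetime` module, which uses the same Gregorian calendar for these dates. The helper name `pspp_date` is hypothetical; the sketch only counts seconds from midnight preceding 14 Oct 1582 as described above.

```python
from datetime import date

EPOCH = date(1582, 10, 14)   # day whose preceding midnight is zero
SECONDS_PER_DAY = 86_400

def pspp_date(year, month, day):
    """Seconds from the epoch to midnight preceding the given date."""
    return (date(year, month, day) - EPOCH).days * SECONDS_PER_DAY

for ymd in [(1582, 10, 15), (1776, 7, 4), (1900, 1, 1),
            (1978, 10, 1), (1995, 8, 24)]:
    print(ymd, f"{pspp_date(*ymd):,}")
```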

7.7.8.2 Functions that Produce Times


These functions take numeric arguments and return numeric values that represent times.

TIME.DAYS (ndays) [Function]


Returns a time corresponding to ndays days.

TIME.HMS (nhours, nmins, nsecs) [Function]


Returns a time corresponding to nhours hours, nmins minutes, and nsecs seconds.
The arguments may not have mixed signs: if any of them are positive, then none may
be negative, and vice versa.

7.7.8.3 Functions that Examine Times


These functions take numeric arguments in pspp time format and give numeric results.

CTIME.DAYS (time) [Function]


Results in the number of days and fractional days in time.

CTIME.HOURS (time) [Function]


Results in the number of hours and fractional hours in time.

CTIME.MINUTES (time) [Function]


Results in the number of minutes and fractional minutes in time.

CTIME.SECONDS (time) [Function]


Results in the number of seconds and fractional seconds in time. (CTIME.SECONDS
does nothing; CTIME.SECONDS(x) is equivalent to x.)
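Because a time is just a count of seconds, the functions that produce and examine times reduce to multiplication and division. The Python names below are illustrative stand-ins, and the mixed-sign restriction on TIME.HMS is not checked.

```python
def time_days(ndays):
    """Model TIME.DAYS: days to seconds."""
    return ndays * 86_400

def time_hms(nhours, nmins, nsecs):
    """Model TIME.HMS (assumes the arguments do not have mixed signs)."""
    return nhours * 3_600 + nmins * 60 + nsecs

def ctime_days(time):
    return time / 86_400

def ctime_hours(time):
    return time / 3_600

def ctime_minutes(time):
    return time / 60

# 1 day, 3 hours, 10 seconds, as in the table of intervals
print(time_days(1) + time_hms(3, 0, 10))
print(ctime_hours(time_hms(1, 30, 0)))
```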

7.7.8.4 Functions that Produce Dates


These functions take numeric arguments and give numeric results that represent dates.
Arguments taken by these functions are:
day Refers to a day of the month between 1 and 31. Day 0 is also accepted and
refers to the final day of the previous month. Days 29, 30, and 31 are accepted
even in months that have fewer days and refer to a day near the beginning of
the following month.
month Refers to a month of the year between 1 and 12. Months 0 and 13 are also
accepted and refer to the last month of the preceding year and the first month
of the following year, respectively.
quarter Refers to a quarter of the year between 1 and 4. The quarters of the year begin
on the first day of months 1, 4, 7, and 10.
week Refers to a week of the year between 1 and 53.
yday Refers to a day of the year between 1 and 366.
year Refers to a year, 1582 or greater. Years between 0 and 99 are treated according
to the epoch set on SET EPOCH, by default beginning 69 years before the
current date (see [SET EPOCH], page 165).
If these functions’ arguments are out-of-range, they are correctly normalized before con-
version to date format. Non-integers are rounded toward zero.
DATE.DMY (day, month, year) [Function]
DATE.MDY (month, day, year) [Function]
Results in a date value corresponding to the midnight before day day of month month
of year year.
DATE.MOYR (month, year) [Function]
Results in a date value corresponding to the midnight before the first day of month
month of year year.
DATE.QYR (quarter, year) [Function]
Results in a date value corresponding to the midnight before the first day of quarter
quarter of year year.
DATE.WKYR (week, year) [Function]
Results in a date value corresponding to the midnight before the first day of week
week of year year.
DATE.YRDAY (year, yday) [Function]
Results in a date value corresponding to the day yday of year year.
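The normalization of out-of-range day and month arguments can be sketched in Python. This is a guess at one consistent reading of the rules above (day 0 is the last day of the previous month, month 13 is January of the next year); it assumes integer arguments and a four-digit year, and `date_dmy` is a hypothetical stand-in for DATE.DMY.

```python
from datetime import date, timedelta

EPOCH = date(1582, 10, 14)

def date_dmy(day, month, year):
    """Model DATE.DMY with day and month overflow carried forward."""
    # Carry months outside 1..12 into the year.
    year += (month - 1) // 12
    month = (month - 1) % 12 + 1
    # Offsetting from the first of the month lets day 0 or day 31 spill
    # into the neighbouring month.
    d = date(year, month, 1) + timedelta(days=day - 1)
    return (d - EPOCH).days * 86_400

print(date_dmy(0, 3, 2000) == date_dmy(29, 2, 2000))   # day 0 of March
print(date_dmy(1, 13, 1999) == date_dmy(1, 1, 2000))   # month 13
```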

7.7.8.5 Functions that Examine Dates


These functions take numeric arguments in pspp date or time format and give numeric
results. These names are used for arguments:
date A numeric value in pspp date format.
time A numeric value in pspp time format.
time-or-date
A numeric value in pspp time or date format.
XDATE.DATE (time-or-date) [Function]
For a time, results in the time corresponding to the number of whole days that
time-or-date includes. For a date, results in the date corresponding to the latest
midnight at or before time-or-date; that is, gives the date that time-or-date is in.
XDATE.HOUR (time-or-date) [Function]
For a time, results in the number of whole hours beyond the number of whole days
represented by time-or-date. For a date, results in the hour (as an integer between 0
and 23) corresponding to time-or-date.

XDATE.JDAY (date) [Function]


Results in the day of the year (as an integer between 1 and 366) corresponding to
date.

XDATE.MDAY (date) [Function]


Results in the day of the month (as an integer between 1 and 31) corresponding to
date.

XDATE.MINUTE (time-or-date) [Function]


Results in the number of minutes (as an integer between 0 and 59) after the last hour
in time-or-date.

XDATE.MONTH (date) [Function]


Results in the month of the year (as an integer between 1 and 12) corresponding to
date.

XDATE.QUARTER (date) [Function]


Results in the quarter of the year (as an integer between 1 and 4) corresponding to
date.

XDATE.SECOND (time-or-date) [Function]


Results in the number of whole seconds after the last whole minute (as an integer
between 0 and 59) in time-or-date.

XDATE.TDAY (date) [Function]


Results in the number of whole days from 14 Oct 1582 to date.

XDATE.TIME (date) [Function]


Results in the time of day at the instant corresponding to date, as a time value. This
is the number of seconds since midnight on the day corresponding to date.

XDATE.WEEK (date) [Function]


Results in the week of the year (as an integer between 1 and 53) corresponding to
date.

XDATE.WKDAY (date) [Function]


Results in the day of week (as an integer between 1 and 7) corresponding to date,
where 1 represents Sunday.

XDATE.YEAR (date) [Function]


Returns the year (as an integer 1582 or greater) corresponding to date.
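Most of the XDATE functions can be modelled by splitting a date value into whole days and leftover seconds. The Python sketch below is an illustration for date values only; the dictionary keys mirror the XDATE suffixes, XDATE.WEEK is omitted because its week-numbering rule is not spelled out here, and `xdate_parts` is a hypothetical name.

```python
from datetime import date, timedelta

EPOCH = date(1582, 10, 14)

def xdate_parts(value):
    """Decompose a date value (seconds since the epoch) into components."""
    days, secs = divmod(int(value), 86_400)
    cal = EPOCH + timedelta(days=days)
    return {
        "DATE": days * 86_400,          # latest midnight at or before value
        "TDAY": days,                   # whole days since 14 Oct 1582
        "TIME": secs,                   # seconds since midnight
        "HOUR": secs // 3_600,
        "MINUTE": secs % 3_600 // 60,
        "SECOND": secs % 60,
        "YEAR": cal.year,
        "MONTH": cal.month,
        "MDAY": cal.day,
        "JDAY": cal.timetuple().tm_yday,
        "QUARTER": (cal.month - 1) // 3 + 1,
        "WKDAY": cal.isoweekday() % 7 + 1,  # 1 represents Sunday
    }

# 24 Aug 1995 at 13:45:30
parts = xdate_parts(13_028_601_600 + 13 * 3_600 + 45 * 60 + 30)
print(parts["YEAR"], parts["MONTH"], parts["MDAY"], parts["HOUR"])
```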

7.7.8.6 Time and Date Arithmetic


Ordinary arithmetic operations on dates and times often produce sensible results. Adding
a time to, or subtracting one from, a date produces a new date that much earlier or later.
The difference of two dates yields the time between those dates. Adding two times produces
the combined time. Multiplying a time by a scalar produces a time that many times longer.
Since times and dates are just numbers, the ordinary addition and subtraction operators
are employed for these purposes.
Adding two dates does not produce a useful result.
Dates and times may have very large values. Thus, it is not a good idea to take powers
of these values; also, the accuracy of some procedures may be affected. If necessary, convert
times or dates in seconds to some other unit, like days or years, before performing analysis.
pspp supplies a few functions for date arithmetic:

DATEDIFF (date2, date1, unit) [Function]


Returns the span of time from date1 to date2 in terms of unit, which must be a quoted
string, one of ‘years’, ‘quarters’, ‘months’, ‘weeks’, ‘days’, ‘hours’, ‘minutes’, and
‘seconds’. The result is an integer, truncated toward zero.
One year is considered to span from a given date to the same month, day, and time of
day the next year. Thus, from Jan. 1 of one year to Jan. 1 the next year is considered
to be a full year, but Feb. 29 of a leap year to the following Feb. 28 is not. Similarly,
one month spans from a given day of the month to the same day of the following
month. Thus, there is never a full month from Jan. 31 of a given year to any day in
the following February.
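The year and month rules for DATEDIFF can be illustrated with Python `datetime` values. This sketch assumes date2 is not earlier than date1 and covers only the 'years' and 'months' units; the function names are hypothetical.

```python
from datetime import datetime

def datediff_years(date2, date1):
    """Whole years from date1 to date2 (assumes date2 >= date1)."""
    years = date2.year - date1.year
    # Not a full year until the same month, day, and time of day recur.
    if (date2.month, date2.day, date2.time()) < \
            (date1.month, date1.day, date1.time()):
        years -= 1
    return years

def datediff_months(date2, date1):
    """Whole months from date1 to date2 (assumes date2 >= date1)."""
    months = (date2.year - date1.year) * 12 + date2.month - date1.month
    if (date2.day, date2.time()) < (date1.day, date1.time()):
        months -= 1
    return months

print(datediff_years(datetime(2001, 1, 1), datetime(2000, 1, 1)))
print(datediff_years(datetime(2001, 2, 28), datetime(2000, 2, 29)))
print(datediff_months(datetime(2001, 2, 28), datetime(2001, 1, 31)))
```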

DATESUM (date, quantity, unit[, method]) [Function]


Returns date advanced by the given quantity of the specified unit, which must be
one of the strings ‘years’, ‘quarters’, ‘months’, ‘weeks’, ‘days’, ‘hours’, ‘minutes’,
and ‘seconds’.
When unit is ‘years’, ‘quarters’, or ‘months’, only the integer part of quantity is
considered. Adding one of these units can cause the day of the month to exceed
the number of days in the month. In this case, the method comes into play: if it is
omitted or specified as ‘closest’ (as a quoted string), then the resulting day is the
last day of the month; otherwise, if it is specified as ‘rollover’, then the extra days
roll over into the following month.
When unit is ‘weeks’, ‘days’, ‘hours’, ‘minutes’, or ‘seconds’, the quantity is not
rounded to an integer and method, if specified, is ignored.
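The difference between the 'closest' and 'rollover' methods shows up when a month is added to the 31st. The Python sketch below handles only the 'months' unit on plain dates and is an interpretation of the text above, not the actual implementation; `datesum_months` is a hypothetical name.

```python
import calendar
from datetime import date, timedelta

def datesum_months(start, quantity, method="closest"):
    """Advance start by whole months, resolving day-of-month overflow."""
    # Only the integer part of quantity is considered.
    total = start.year * 12 + (start.month - 1) + int(quantity)
    year, month0 = divmod(total, 12)
    month = month0 + 1
    last = calendar.monthrange(year, month)[1]  # days in the target month
    if start.day <= last:
        return date(year, month, start.day)
    if method == "rollover":
        # Extra days roll over into the following month.
        return date(year, month, last) + timedelta(days=start.day - last)
    return date(year, month, last)              # 'closest'

print(datesum_months(date(2001, 1, 31), 1))
print(datesum_months(date(2001, 1, 31), 1, "rollover"))
```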

7.7.9 Miscellaneous Functions


LAG (variable[, n]) [Function]
variable must be a numeric or string variable name. LAG yields the value of that
variable for the case n before the current one. Results in system-missing (for numeric
variables) or blanks (for string variables) for the first n cases.
LAG obtains values from the cases that become the new active dataset after a procedure
executes. Thus, LAG will not return values from cases dropped by transformations
such as SELECT IF, and transformations like COMPUTE that modify data will change
the values returned by LAG. These are both the case whether these transformations
precede or follow the use of LAG.
If LAG is used before TEMPORARY, then the values it returns are those in cases just
before TEMPORARY. LAG may not be used after TEMPORARY.
If omitted, n defaults to 1. Otherwise, n must be a small positive constant
integer. There is no explicit limit, but use of a large value will increase memory
consumption.
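Ignoring the interaction with procedures and TEMPORARY, the basic effect of LAG on a numeric variable resembles shifting a column down by n cases. In this Python sketch a list stands in for the variable's values across cases and `None` for system-missing.

```python
def lag(values, n=1):
    """Shift values down by n cases, padding the first n with None."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    padded = [None] * n + list(values)
    return padded[:len(values)]

print(lag([10, 20, 30]))
print(lag([10, 20, 30], 2))
```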

YRMODA (year, month, day) [Function]


year is a year, either between 0 and 99 or at least 1582. Unlike other pspp date
functions, years between 0 and 99 always correspond to 1900 through 1999. month
is a month between 1 and 13. day is a day between 0 and 31. A day of 0 refers to
the last day of the previous month, and a month of 13 refers to the first month of the
next year. year must be in range. year, month, and day must all be integers.
YRMODA results in the number of days between 15 Oct 1582 and the date specified,
plus one. The date passed to YRMODA must be on or after 15 Oct 1582. 15 Oct 1582
has a value of 1.
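YRMODA counts days rather than seconds, so 15 Oct 1582 maps to 1. The Python sketch below follows the rules stated above, including the fixed 1900 through 1999 interpretation of two-digit years; range checking is left out, and the lower-case name is a stand-in.

```python
from datetime import date, timedelta

def yrmoda(year, month, day):
    """Days from 15 Oct 1582 to the given date, plus one."""
    if 0 <= year <= 99:
        year += 1900            # two-digit years always mean 19xx
    if month == 13:
        year, month = year + 1, 1
    # Day 0 falls back to the last day of the previous month.
    d = date(year, month, 1) + timedelta(days=day - 1)
    return (d - date(1582, 10, 14)).days

print(yrmoda(1582, 10, 15))
print(yrmoda(0, 1, 1) == yrmoda(1900, 1, 1))
```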
VALUELABEL (variable) [Function]
Returns a string matching the label associated with the current value of variable. If
the current value of variable has no associated label, then this function returns the
empty string. variable may be a numeric or string variable.

7.7.10 Statistical Distribution Functions


pspp can calculate several functions of standard statistical distributions. These functions
are named systematically based on the function and the distribution. The table below
describes the statistical distribution functions in general:
PDF.dist (x[, param. . . ])
Probability density function for dist. The domain of x depends on dist. For
continuous distributions, the result is the density of the probability function at
x, and the range is nonnegative real numbers. For discrete distributions, the
result is the probability of x.
CDF.dist (x[, param. . . ])
Cumulative distribution function for dist, that is, the probability that a random
variate drawn from the distribution is less than x. The domain of x depends
on dist. The result is a probability.
SIG.dist (x[, param. . . ])
Tail probability function for dist, that is, the probability that a random variate
drawn from the distribution is greater than x. The domain of x depends on dist.
The result is a probability. Only a few distributions include a SIG function.
IDF.dist (p[, param. . . ])
Inverse distribution function for dist, the value of x for which the CDF would
yield p. The value of p is a probability. The range depends on dist and is
identical to the domain for the corresponding CDF.
RV.dist ([param. . . ])
Random variate function for dist. The range depends on the distribution.
NPDF.dist (x[, param. . . ])
Noncentral probability density function. The result is the density of the given
noncentral distribution at x. The domain of x depends on dist. The range is
nonnegative real numbers. Only a few distributions include an NPDF function.
NCDF.dist (x[, param. . . ])
Noncentral cumulative distribution function for dist, that is, the probability
that a random variate drawn from the given noncentral distribution is less than
x. The domain of x depends on dist. The result is a probability. Only a few
distributions include an NCDF function.

Each distribution is described individually below.
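To make the PDF, CDF, IDF, and RV naming concrete, here is a Python sketch of the four functions for a continuous uniform distribution on [a, b], chosen because its formulas are simple closed forms. The Python names are illustrative and are not the functions themselves.

```python
import random

def pdf_uniform(x, a, b):
    """Density at x: constant inside [a, b], zero outside."""
    return 1 / (b - a) if a <= x <= b else 0.0

def cdf_uniform(x, a, b):
    """Probability that a variate is below x, clamped to [0, 1]."""
    return min(max((x - a) / (b - a), 0.0), 1.0)

def idf_uniform(p, a, b):
    """Inverse of the CDF: the x for which cdf_uniform gives p."""
    return a + p * (b - a)

def rv_uniform(a, b):
    """Draw one random variate."""
    return random.uniform(a, b)

x = idf_uniform(0.25, 2, 6)
print(x, cdf_uniform(x, 2, 6))  # IDF and CDF undo each other
```

A tail probability in the SIG sense would be one minus the CDF value.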

7.7.10.1 Continuous Distributions


The following continuous distributions are available:

PDF.BETA (x, a, b) [Function]


CDF.BETA (x, a, b) [Function]
IDF.BETA (p, a, b) [Function]
RV.BETA (a, b) [Function]
NPDF.BETA (x, a, b, lambda) [Function]
NCDF.BETA (x, a, b, lambda) [Function]
Beta distribution with shape parameters a and b. The noncentral distribution takes
an additional parameter lambda. Constraints: a > 0, b > 0, lambda >= 0, 0 <= x <=
1, 0 <= p <= 1.

PDF.BVNOR (x0, x1, rho) [Function]


CDF.BVNOR (x0, x1, rho) [Function]
Bivariate normal distribution of two standard normal variables with correlation coef-
ficient rho. Two variates x0 and x1 must be provided. Constraints: 0 <= rho <= 1,
0 <= p <= 1.

PDF.CAUCHY (x, a, b) [Function]


CDF.CAUCHY (x, a, b) [Function]
IDF.CAUCHY (p, a, b) [Function]
RV.CAUCHY (a, b) [Function]
Cauchy distribution with location parameter a and scale parameter b. Constraints:
b > 0, 0 < p < 1.

CDF.CHISQ (x, df) [Function]


SIG.CHISQ (x, df) [Function]
IDF.CHISQ (p, df) [Function]
RV.CHISQ (df) [Function]
NCDF.CHISQ (x, df, lambda) [Function]
Chi-squared distribution with df degrees of freedom. The noncentral distribution
takes an additional parameter lambda. Constraints: df > 0, lambda > 0, x >= 0, 0
<= p < 1.

PDF.EXP (x, a) [Function]


CDF.EXP (x, a) [Function]
IDF.EXP (p, a) [Function]
RV.EXP (a) [Function]
Exponential distribution with scale parameter a. The inverse of a represents the rate
of decay. Constraints: a > 0, x >= 0, 0 <= p < 1.

PDF.XPOWER (x, a, b) [Function]


RV.XPOWER (a, b) [Function]
Exponential power distribution with positive scale parameter a and nonnegative power
parameter b. Constraints: a > 0, b >= 0, x >= 0, 0 <= p <= 1. This distribution is
a pspp extension.
PDF.F (x, df1, df2) [Function]
CDF.F (x, df1, df2) [Function]
SIG.F (x, df1, df2) [Function]
IDF.F (p, df1, df2) [Function]
RV.F (df1, df2) [Function]
F-distribution of two chi-squared deviates with df1 and df2 degrees of freedom. The
noncentral distribution takes an additional parameter lambda. Constraints: df1 > 0,
df2 > 0, lambda >= 0, x >= 0, 0 <= p < 1.
PDF.GAMMA (x, a, b) [Function]
CDF.GAMMA (x, a, b) [Function]
IDF.GAMMA (p, a, b) [Function]
RV.GAMMA (a, b) [Function]
Gamma distribution with shape parameter a and scale parameter b. Constraints: a
> 0, b > 0, x >= 0, 0 <= p < 1.
PDF.LANDAU (x) [Function]
RV.LANDAU () [Function]
Landau distribution.
PDF.LAPLACE (x, a, b) [Function]
CDF.LAPLACE (x, a, b) [Function]
IDF.LAPLACE (p, a, b) [Function]
RV.LAPLACE (a, b) [Function]
Laplace distribution with location parameter a and scale parameter b. Constraints:
b > 0, 0 < p < 1.
RV.LEVY (c, alpha) [Function]
Levy symmetric alpha-stable distribution with scale c and exponent alpha. Con-
straints: 0 < alpha <= 2.
RV.LVSKEW (c, alpha, beta) [Function]
Levy skew alpha-stable distribution with scale c, exponent alpha, and skewness pa-
rameter beta. Constraints: 0 < alpha <= 2, -1 <= beta <= 1.
PDF.LOGISTIC (x, a, b) [Function]
CDF.LOGISTIC (x, a, b) [Function]
IDF.LOGISTIC (p, a, b) [Function]
RV.LOGISTIC (a, b) [Function]
Logistic distribution with location parameter a and scale parameter b. Constraints:
b > 0, 0 < p < 1.
PDF.LNORMAL (x, a, b) [Function]
CDF.LNORMAL (x, a, b) [Function]
IDF.LNORMAL (p, a, b) [Function]


RV.LNORMAL (a, b) [Function]
Lognormal distribution with parameters a and b. Constraints: a > 0, b > 0, x >= 0,
0 <= p < 1.
PDF.NORMAL (x, mu, sigma) [Function]
CDF.NORMAL (x, mu, sigma) [Function]
IDF.NORMAL (p, mu, sigma) [Function]
RV.NORMAL (mu, sigma) [Function]
Normal distribution with mean mu and standard deviation sigma. Constraints: sigma >
0, 0 < p < 1. Three additional functions are available as shorthand:
CDFNORM (x) [Function]
Equivalent to CDF.NORMAL(x, 0, 1).
PROBIT (p) [Function]
Equivalent to IDF.NORMAL(p, 0, 1).
NORMAL (sigma) [Function]
Equivalent to RV.NORMAL(0, sigma).
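The three shorthands correspond to standard-normal operations that Python's standard library also provides, which gives a convenient way to check expected values. The lower-case function names are stand-ins; `statistics.NormalDist` and `random.gauss` are the real standard-library calls used.

```python
import random
from statistics import NormalDist

STANDARD = NormalDist(mu=0, sigma=1)

def cdfnorm(x):
    """Analogue of CDFNORM(x), that is CDF.NORMAL(x, 0, 1)."""
    return STANDARD.cdf(x)

def probit(p):
    """Analogue of PROBIT(p), that is IDF.NORMAL(p, 0, 1); needs 0 < p < 1."""
    return STANDARD.inv_cdf(p)

def normal(sigma):
    """Analogue of NORMAL(sigma), that is RV.NORMAL(0, sigma)."""
    return random.gauss(0, sigma)

print(cdfnorm(0))
print(round(probit(0.975), 4))
```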
PDF.NTAIL (x, a, sigma) [Function]
RV.NTAIL (a, sigma) [Function]
Normal tail distribution with lower limit a and standard deviation sigma. This dis-
tribution is a pspp extension. Constraints: a > 0, x > a, 0 < p < 1.
PDF.PARETO (x, a, b) [Function]
CDF.PARETO (x, a, b) [Function]
IDF.PARETO (p, a, b) [Function]
RV.PARETO (a, b) [Function]
Pareto distribution with threshold parameter a and shape parameter b. Constraints:
a > 0, b > 0, x >= a, 0 <= p < 1.
PDF.RAYLEIGH (x, sigma) [Function]
CDF.RAYLEIGH (x, sigma) [Function]
IDF.RAYLEIGH (p, sigma) [Function]
RV.RAYLEIGH (sigma) [Function]
Rayleigh distribution with scale parameter sigma. This distribution is a pspp exten-
sion. Constraints: sigma > 0, x > 0.
PDF.RTAIL (x, a, sigma) [Function]
RV.RTAIL (a, sigma) [Function]
Rayleigh tail distribution with lower limit a and scale parameter sigma. This distri-
bution is a pspp extension. Constraints: a > 0, sigma > 0, x > a.
PDF.T (x, df) [Function]
CDF.T (x, df) [Function]
IDF.T (p, df) [Function]
RV.T (df) [Function]
T-distribution with df degrees of freedom. The noncentral distribution takes an
additional parameter lambda. Constraints: df > 0, 0 < p < 1.

PDF.T1G (x, a, b) [Function]


CDF.T1G (x, a, b) [Function]
IDF.T1G (p, a, b) [Function]
Type-1 Gumbel distribution with parameters a and b. This distribution is a pspp
extension. Constraints: 0 < p < 1.

PDF.T2G (x, a, b) [Function]


CDF.T2G (x, a, b) [Function]
IDF.T2G (p, a, b) [Function]
Type-2 Gumbel distribution with parameters a and b. This distribution is a pspp
extension. Constraints: x > 0, 0 < p < 1.

PDF.UNIFORM (x, a, b) [Function]


CDF.UNIFORM (x, a, b) [Function]
IDF.UNIFORM (p, a, b) [Function]
RV.UNIFORM (a, b) [Function]
Uniform distribution with parameters a and b. Constraints: a <= x <= b, 0 <= p
<= 1. An additional function is available as shorthand:

UNIFORM (b) [Function]


Equivalent to RV.UNIFORM(0, b).

PDF.WEIBULL (x, a, b) [Function]


CDF.WEIBULL (x, a, b) [Function]
IDF.WEIBULL (p, a, b) [Function]
RV.WEIBULL (a, b) [Function]
Weibull distribution with parameters a and b. Constraints: a > 0, b > 0, x >= 0, 0
<= p < 1.

7.7.10.2 Discrete Distributions


The following discrete distributions are available:

PDF.BERNOULLI (x, p) [Function]


CDF.BERNOULLI (x, p) [Function]
RV.BERNOULLI (p) [Function]
Bernoulli distribution with probability of success p. Constraints: x = 0 or 1, 0 <= p
<= 1.

PDF.BINOM (x, n, p) [Function]


CDF.BINOM (x, n, p) [Function]
RV.BINOM (n, p) [Function]
Binomial distribution with n trials and probability of success p. Constraints: integer
n > 0, 0 <= p <= 1, integer x <= n.

PDF.GEOM (x, n, p) [Function]


CDF.GEOM (x, n, p) [Function]
RV.GEOM (n, p) [Function]
Geometric distribution with probability of success p. Constraints: 0 <= p <= 1,
integer x > 0.

PDF.HYPER (x, a, b, c) [Function]


CDF.HYPER (x, a, b, c) [Function]
RV.HYPER (a, b, c) [Function]
Hypergeometric distribution when b objects out of a are drawn and c of the available
objects are distinctive. Constraints: integer a > 0, integer b <= a, integer c <= a,
integer x >= 0.

PDF.LOG (x, p) [Function]


RV.LOG (p) [Function]
Logarithmic distribution with probability parameter p. Constraints: 0 <= p < 1, x
>= 1.

PDF.NEGBIN (x, n, p) [Function]


CDF.NEGBIN (x, n, p) [Function]
RV.NEGBIN (n, p) [Function]
Negative binomial distribution with number of successes parameter n and probability
of success parameter p. Constraints: integer n >= 0, 0 < p <= 1, integer x >= 1.

PDF.POISSON (x, mu) [Function]


CDF.POISSON (x, mu) [Function]
RV.POISSON (mu) [Function]
Poisson distribution with mean mu. Constraints: mu > 0, integer x >= 0.
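A short sketch of how the discrete distribution functions might be used in a transformation follows. The variable names and data values are hypothetical; the arguments were chosen to satisfy the constraints listed above.

```
DATA LIST LIST NOTABLE /x.
BEGIN DATA.
0
1
2
END DATA.
* Probability of exactly x events when the mean is 1.5.
COMPUTE p_pois = PDF.POISSON(x, 1.5).
* Probability of at most x successes in 2 trials with p = 0.5.
COMPUTE c_binom = CDF.BINOM(x, 2, 0.5).
LIST.
```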

7.8 Operator Precedence


The following table describes operator precedence. Smaller-numbered levels in the table
have higher precedence. Within a level, operations are always performed from left to right.
The first occurrence of ‘-’ represents unary negation, the second binary subtraction.
1. ( )
2. **
3. -
4. * /
5. + -
6. EQ GE GT LE LT NE
7. AND NOT OR
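The following sketch applies the table to a few small expressions. The one-case dataset and the variable names are invented for the example; the results in the comments are what the precedence levels above imply.

```
DATA LIST LIST NOTABLE /x.
BEGIN DATA.
1
END DATA.
* Exponentiation binds tighter than unary negation, giving -(2 ** 2) = -4.
COMPUTE a = -2 ** 2.
* Multiplication precedes addition, giving 2 + 12 = 14.
COMPUTE b = 2 + 3 * 4.
* Equal-precedence operators run left to right, giving (10 - 4) - 3 = 3.
COMPUTE c = 10 - 4 - 3.
* Parentheses override the default order, giving (2 + 3) * 4 = 20.
COMPUTE d = (2 + 3) * 4.
LIST.
```

When in doubt, adding parentheses makes the intended grouping explicit and costs nothing.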

8 Data Input and Output


Data are the focus of the pspp language. Each datum belongs to a case (also called an
observation). Each case represents an individual or “experimental unit”. For example, in
the results of a survey, the names of the respondents, their sex, age, etc. and their responses
are all data, and the data pertaining to a single respondent is a case. This chapter examines the
pspp commands for defining variables and reading and writing data. There are alternative
commands to read data from predefined sources such as system files or databases (see
Section 9.3 [GET], page 83).
Note: These commands tell pspp how to read data, but the data will not
actually be read until a procedure is executed.

8.1 BEGIN DATA


BEGIN DATA.
...
END DATA.
BEGIN DATA and END DATA can be used to embed raw ASCII data in a pspp syntax file.
DATA LIST or another input procedure must be used before BEGIN DATA (see Section 8.5
[DATA LIST], page 64). BEGIN DATA and END DATA must be used together. END DATA must
appear by itself on a single line, with no leading white space and exactly one space between
the words END and DATA, like this:
END DATA.
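A minimal sketch of the usual pattern follows: DATA LIST declares the variables, the inline data comes next, and a procedure (here LIST) causes the data to actually be read. The variable names and values are made up for the example.

```
DATA LIST LIST NOTABLE /id score.
BEGIN DATA.
1 90
2 85
3 77
END DATA.
LIST.
```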

8.2 CLOSE FILE HANDLE


CLOSE FILE HANDLE handle_name.
CLOSE FILE HANDLE disassociates the name of a file handle from a given file. The only
specification is the name of the handle to close. Afterward FILE HANDLE.
The file named INLINE, which represents data entered between BEGIN DATA and END
DATA, cannot be closed. Attempts to close it with CLOSE FILE HANDLE have no effect.
CLOSE FILE HANDLE is a pspp extension.
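A sketch of a typical sequence follows, assuming a text file named survey.dat exists in the current directory. The handle name survey and the variable layout are hypothetical.

```
FILE HANDLE survey /NAME='survey.dat'.
DATA LIST FIXED FILE=survey NOTABLE /id 1-5.
LIST.
CLOSE FILE HANDLE survey.
```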

8.3 DATAFILE ATTRIBUTE


DATAFILE ATTRIBUTE
        ATTRIBUTE=name('value') [name('value')]...
        ATTRIBUTE=name[index]('value') [name[index]('value')]...
        DELETE=name [name]...
        DELETE=name[index] [name[index]]...
DATAFILE ATTRIBUTE adds, modifies, or removes user-defined attributes associated with
the active dataset. Custom data file attributes are not interpreted by pspp, but they are
saved as part of system files and may be used by other software that reads them.
Use the ATTRIBUTE subcommand to add or modify a custom data file attribute. Specify
the name of the attribute as an identifier (see Section 6.1 [Tokens], page 26), followed by
the desired value, in parentheses, as a quoted string. Attribute names that begin with $
are reserved for pspp’s internal use, and attribute names that begin with @ or $@ are not
displayed by most pspp commands that display other attributes. Other attribute names
are not treated specially.
Attributes may also be organized into arrays. To assign to an array element, add an
integer array index enclosed in square brackets ([ and ]) between the attribute name and
value. Array indexes start at 1, not 0. An attribute array that has a single element (number
1) is not distinguished from a non-array attribute.
Use the DELETE subcommand to delete an attribute. Specify an attribute name by itself
to delete an entire attribute, including all array elements for attribute arrays. Specify an
attribute name followed by an array index in square brackets to delete a single element of an
attribute array. In the latter case, all the array elements numbered higher than the deleted
element are shifted down, filling the vacated position.
To associate custom attributes with particular variables, instead of with the entire active
dataset, use VARIABLE ATTRIBUTE (see Section 11.15 [VARIABLE ATTRIBUTE], page 108)
instead.
DATAFILE ATTRIBUTE takes effect immediately. It is not affected by conditional and
looping structures such as DO IF or LOOP.
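The sketch below sets a plain attribute and a three-element attribute array, then deletes one array element. The attribute names (Source, Reviewer) and their values are invented for illustration.

```
DATA LIST LIST NOTABLE /x.
BEGIN DATA.
1
END DATA.
DATAFILE ATTRIBUTE
    ATTRIBUTE=Source('2010 customer survey')
              Reviewer[1]('Alice') Reviewer[2]('Bob') Reviewer[3]('Carol').
* Remove the second element of the Reviewer array.
DATAFILE ATTRIBUTE DELETE=Reviewer[2].
```

Following the shifting rule described above, the value 'Carol' should end up in Reviewer[2] after the deletion.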

8.4 DATASET commands


DATASET NAME name [WINDOW={ASIS,FRONT}].
DATASET ACTIVATE name [WINDOW={ASIS,FRONT}].
DATASET COPY name [WINDOW={MINIMIZED,HIDDEN,FRONT}].
DATASET DECLARE name [WINDOW={MINIMIZED,HIDDEN,FRONT}].
DATASET CLOSE {name,*,ALL}.
DATASET DISPLAY.
The DATASET commands simplify use of multiple datasets within a pspp session. They
allow datasets to be created and destroyed. At any given time, most pspp commands work
with a single dataset, called the active dataset.
The DATASET NAME command gives the active dataset the specified name, or if it
already had a name, it renames it. If another dataset already had the given name, that
dataset is deleted.
The DATASET ACTIVATE command selects the named dataset, which must already
exist, as the active dataset. Before switching the active dataset, any pending
transformations are executed, as if EXECUTE had been specified. If the active dataset is
unnamed before switching, then it is deleted and becomes unavailable after switching.
The DATASET COPY command creates a new dataset with the specified name, whose
contents are a copy of the active dataset. Any pending transformations are executed, as
if EXECUTE had been specified, before making the copy. If a dataset with the given name
already exists, it is replaced. If the name is the name of the active dataset, then the active
dataset becomes unnamed.
The DATASET DECLARE command creates a new dataset that is initially “empty,”
that is, it has no dictionary or data. If a dataset with the given name already exists, this has
no effect. The new dataset can be used with commands that support output to a dataset,
e.g. AGGREGATE (see Section 12.1 [AGGREGATE], page 112).
The DATASET CLOSE command deletes a dataset. If the active dataset is specified by
name, or if ‘*’ is specified, then the active dataset becomes unnamed. If a different dataset
is specified by name, then it is deleted and becomes unavailable. Specifying ALL deletes all
datasets except for the active dataset, which becomes unnamed.
The DATASET DISPLAY command lists all the currently defined datasets.
Many DATASET commands accept an optional WINDOW subcommand. In the PSPPIRE
GUI, the value given for this subcommand influences how the dataset’s window is displayed.
Outside the GUI, the WINDOW subcommand has no effect. The valid values are:
ASIS Do not change how the window is displayed. This is the default for DATASET
NAME and DATASET ACTIVATE.
FRONT Raise the dataset’s window to the top. Make it the default dataset for running
syntax.
MINIMIZED
Display the window “minimized” to an icon. Prefer other datasets for running
syntax. This is the default for DATASET COPY and DATASET DECLARE.
HIDDEN Hide the dataset’s window. Prefer other datasets for running syntax.
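As a rough sketch of how these commands fit together, the session below names the active dataset, copies it, modifies the original, and then switches to the copy. The dataset names original and backup are hypothetical.

```
DATA LIST LIST NOTABLE /x.
BEGIN DATA.
1
2
3
END DATA.
DATASET NAME original.
* The copy is taken before the COMPUTE below, so it keeps the values as read.
DATASET COPY backup.
COMPUTE x = x * 10.
* Switching executes the pending COMPUTE on the original dataset first.
DATASET ACTIVATE backup.
LIST.
DATASET DISPLAY.
DATASET CLOSE original.
```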

8.5 DATA LIST


Used to read text or binary data, DATA LIST is the most fundamental data-reading
command. Even the more sophisticated input methods use DATA LIST commands as a building
block. Understanding DATA LIST is important to understanding how to use pspp to read
your data files.
There are two major variants of DATA LIST, which are fixed format and free format. In
addition, free format has a minor variant, list format, which is discussed in terms of its
differences from vanilla free format.
Each form of DATA LIST is described in detail below.
See Section 9.4 [GET DATA], page 84, for a command that offers a few enhancements
over DATA LIST and that may be substituted for DATA LIST in many situations.

8.5.1 DATA LIST FIXED


DATA LIST [FIXED]
        {TABLE,NOTABLE}
        [FILE='file_name' [ENCODING='encoding']]
        [RECORDS=record_count]
        [END=end_var]
        [SKIP=record_count]
        /[line_no] var_spec ...

where each var_spec takes one of the forms

        var_list start-end [type_spec]
        var_list (fortran_spec)
DATA LIST FIXED is used to read data files that have values at fixed positions on each
line of single-line or multiline records. The keyword FIXED is optional.
The FILE subcommand must be used if input is to be taken from an external file. It may
be used to specify a file name as a string or a file handle (see Section 6.9 [File Handles],
page 42). If the FILE subcommand is not used, then input is assumed to be specified
within the command file using BEGIN DATA. . . END DATA (see Section 8.1 [BEGIN DATA],
page 62). The ENCODING subcommand may only be used if the FILE subcommand is also
used. It specifies the character encoding of the file. See Section 16.16 [INSERT], page 161,
for information on supported encodings.
The optional RECORDS subcommand, which takes a single integer as an argument, is used
to specify the number of lines per record. If RECORDS is not specified, then the number of
lines per record is calculated from the list of variable specifications later in DATA LIST.
The END subcommand is only useful in conjunction with INPUT PROGRAM. See Section 8.9
[INPUT PROGRAM], page 71, for details.
The optional SKIP subcommand specifies a number of records to skip at the beginning
of an input file. It can be used to skip over a row that contains variable names, for example.
DATA LIST can optionally output a table describing how the data file will be read. The
TABLE subcommand enables this output, and NOTABLE disables it. The default is to output
the table.
The list of variables to be read from the data list must come last. Each line in the
data record is introduced by a slash (‘/’). Optionally, a line number may follow the slash.
After that, any number of variable specifications may be present.
Each variable specification consists of a list of variable names followed by a description
of their location on the input line. Sets of variables may be specified using the DATA LIST
TO convention (see Section 6.7.3 [Sets of Variables], page 32). There are two ways to specify
the location of the variable on the line: columnar style and FORTRAN style.
In columnar style, the starting column and ending column for the field are specified after
the variable name, separated by a dash (‘-’). For instance, the third through fifth columns
on a line would be specified ‘3-5’. By default, variables are considered to be in ‘F’ format
(see Section 6.7.4 [Input and Output Formats], page 32). (This default can be changed; see
Section 16.20 [SET], page 163, for more information.)
In columnar style, to use a variable format other than the default, specify the format
type in parentheses after the column numbers. For instance, for alphanumeric ‘A’ format,
use ‘(A)’.
In addition, implied decimal places can be specified in parentheses after the column
numbers. As an example, suppose that a data file has a field in which the characters ‘1234’
should be interpreted as having the value 12.34. Then this field has two implied decimal
places, and the corresponding specification would be ‘(2)’. If a field that has implied
decimal places contains a decimal point, then the implied decimal places are not applied.
Changing the variable format and adding implied decimal places can be done together;
for instance, ‘(N,5)’.
When using columnar style, the input and output width of each variable is computed
from the field width. The field width must be evenly divisible by the number of variables
specified.
FORTRAN style is an altogether different approach to specifying field locations. With
this approach, a list of variable input format specifications, separated by commas, is
placed after the variable names inside parentheses. Each format specifier advances as many
characters into the input line as it uses.
Implied decimal places also exist in FORTRAN style. A format specification with d
decimal places also has d implied decimal places.
In addition to the standard format specifiers (see Section 6.7.4 [Input and Output
Formats], page 32), FORTRAN style defines some extensions:
X Advance the current column on this line by one character position.
Tx Set the current column on this line to column x, with column numbers consid-
ered to begin with 1 at the left margin.
NEWRECx Skip forward x lines in the current record, resetting the active column to the
left margin.
Repeat count
Any format specifier may be preceded by a number. This causes the action of
that format specifier to be repeated the specified number of times.
(spec1, . . . , specN )
Group the given specifiers together. This is most useful when preceded by a
repeat count. Groups may be nested arbitrarily.
FORTRAN and columnar styles may be freely intermixed. Columnar style leaves the
active column immediately after the ending column specified. Record motion using NEWREC
in FORTRAN style also applies to later FORTRAN and columnar specifiers.
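To make the two styles concrete, the sketch below reads the same record layout twice, first in columnar style and then in FORTRAN style. The variable names and the data line are made up; the price field reuses the ‘1234’ example from above, which is read as 12.34 because of the two implied decimal places.

```
* Columnar style, where each variable names its own column range.
DATA LIST FIXED NOTABLE /id 1-3 price 5-8 (2) code 10-12 (A).
BEGIN DATA.
001 1234 abc
END DATA.
LIST.

* FORTRAN style, where X skips the blank column between fields.
DATA LIST FIXED NOTABLE /id price code (F3.0, X, F4.2, X, A3).
BEGIN DATA.
001 1234 abc
END DATA.
LIST.
```

Columnar style tends to be easier to check against a printed record layout, while FORTRAN style is more compact when many adjacent fields share a format.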

Examples
1.
DATA LIST TABLE /NAME 1-10 (A) INFO1 TO INFO3 12-17 (1).

BEGIN DATA.
John Smith 102311
Bob Arnold 122015
Bill Yates  918 6
END DATA.
Defines the following variables:
• NAME, a 10-character-wide string variable, in columns 1 through 10.
• INFO1, a numeric variable, in columns 12 through 13.
• INFO2, a numeric variable, in columns 14 through 15.
• INFO3, a numeric variable, in columns 16 through 17.
The BEGIN DATA/END DATA commands cause three cases to be defined:
Case  NAME        INFO1  INFO2  INFO3
   1  John Smith     10     23     11
   2  Bob Arnold     12     20     15
   3  Bill Yates      9     18      6
The TABLE keyword causes pspp to print out a table describing the four variables
defined.
2.
DAT LIS FIL="survey.dat"
/ID 1-5 NAME 7-36 (A) SURNAME 38-67 (A) MINITIAL 69 (A)
/Q01 TO Q50 7-56
/.
Defines the following variables:
• ID, a numeric variable, in columns 1-5 of the first record.
• NAME, a 30-character string variable, in columns 7-36 of the first record.
• SURNAME, a 30-character string variable, in columns 38-67 of the first record.
• MINITIAL, a 1-character string variable, in column 69 of the first record.
• Fifty variables Q01, Q02, Q03, . . . , Q49, Q50, all numeric, Q01 in column 7, Q02 in
column 8, . . . , Q49 in column 55, Q50 in column 56, all in the second record.
Cases are separated by a blank record.
Data is read from file survey.dat in the current directory.
This example shows keywords abbreviated to their first 3 letters.

8.5.2 DATA LIST FREE


DATA LIST FREE
        [({TAB,'c'}, ...)]
        [{NOTABLE,TABLE}]
        [FILE='file_name' [ENCODING='encoding']]
        [SKIP=record_cnt]
        /var_spec ...

where each var_spec takes one of the forms

        var_list [(type_spec)]
        var_list *
In free format, the input data is, by default, structured as a series of fields separated
by spaces, tabs, or line breaks. If the current DECIMAL separator is DOT (see Section 16.20
[SET], page 163), then commas are also treated as field separators. Each field’s content
may be unquoted, or it may be quoted with a pair of apostrophes (‘’’) or double quotes
(‘"’). Unquoted white space separates fields but is not part of any field. Any mix of spaces,
tabs, and line breaks is equivalent to a single space for the purpose of separating fields, but
consecutive commas will skip a field.
Alternatively, delimiters can be specified explicitly, as a parenthesized, comma-separated
list of single-character strings immediately following FREE. The word TAB may also be
used to specify a tab character as a delimiter. When delimiters are specified explicitly, only
the given characters, plus line breaks, separate fields. Furthermore, leading spaces at the
beginnings of fields are not trimmed, consecutive delimiters define empty fields, and no form
of quoting is allowed.
The NOTABLE and TABLE subcommands are as in DATA LIST FIXED above. NOTABLE is
the default.
The FILE, SKIP, and ENCODING subcommands are as in DATA LIST FIXED above.
The variables to be parsed are given as a single list of variable names. This list must
be introduced by a single slash (‘/’). The set of variable names may contain format
specifications in parentheses (see Section 6.7.4 [Input and Output Formats], page 32). Format
specifications apply to all variables back to the previous parenthesized format specification.
In addition, an asterisk may be used to indicate that all variables preceding it are to
have input/output format ‘F8.0’.
Specified field widths are ignored on input, although all normal limits on field width
apply, but they are honored on output.
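A small sketch of free format follows, with hypothetical variable names and data. It shows a quoted field containing a space and a case whose fields continue onto the next line, since line breaks are just another field separator here.

```
DATA LIST FREE NOTABLE /name (A10) age height.
BEGIN DATA.
'John Smith' 34 180 Bob 27
175
END DATA.
LIST.
```

The six fields are consumed three at a time, so this input defines two cases.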

8.5.3 DATA LIST LIST


DATA LIST LIST
        [({TAB,'c'}, ...)]
        [{NOTABLE,TABLE}]
        [FILE='file_name' [ENCODING='encoding']]
        [SKIP=record_count]
        /var_spec ...

where each var_spec takes one of the forms

        var_list [(type_spec)]
        var_list *
With one exception, DATA LIST LIST is syntactically and semantically equivalent to DATA
LIST FREE. The exception is that each input line is expected to correspond to exactly one
input record. If more or fewer fields are found on an input line than expected, an appropriate
diagnostic is issued.
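The sketch below, using made-up variable names and values, shows list format with one deliberately short line. The second data line supplies only two of the three expected fields, so it should trigger the diagnostic described above.

```
DATA LIST LIST NOTABLE /id x y.
BEGIN DATA.
1 10 20
2 30
3 50 60
END DATA.
LIST.
```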

8.6 END CASE


END CASE.
END CASE is used only within INPUT PROGRAM to output the current case. See Section 8.9
[INPUT PROGRAM], page 71, for details.

8.7 END FILE


END FILE.
END FILE is used only within INPUT PROGRAM to terminate the current input program.
See Section 8.9 [INPUT PROGRAM], page 71.

8.8 FILE HANDLE


For text files:
FILE HANDLE handle_name
        /NAME='file_name'
        [/MODE=CHARACTER]
        [/ENDS={CR,CRLF}]
        /TABWIDTH=tab_width
        [ENCODING='encoding']
