Arrays
Most mathematical and computer languages have some notation for repeating or
otherwise related values. These repeated structures are often called a matrix, a vector,
a dimension, or a table; in the SAS data step, this structure is called an array. While
computer memory is itself an array of sorts, a SAS array is a named group of related
variables within a data step. SAS arrays differ from those of many other languages in
that the elements don't need to be contiguous, the same length, or even related at
all. However, the elements must be either all character or all numeric.
For example, suppose we have a file where each record contains 5 temperature
readings taken at different hours of the day. These temperatures are in Fahrenheit and
we need to convert them to 5 Celsius values. Without arrays we need to repeat the
same calculation for each of the 5 temperature variables. The data step below reads
the sample data:
data temp;
input place $ temp1 temp2 temp3 temp4 temp5;
/* Some lines hold more than 5 values; only the first 5 are read */
cards;
AP 40 50 60 70 80 90
MP 52 62 72 82 92 102
UP 46 56 66 76 86 96 106
KA 25 35 45 55 65 75 85
HA 30 40 50 60 70 80 90
;
run;
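For comparison, here is a minimal sketch of the Celsius conversion written without arrays. It re-creates two of the sample records so that it runs on its own. The data set names TEMP_DEMO and CELSIUS_NO_ARRAY are illustrative, and the variables CELSIUS_TEMP1-CELSIUS_TEMP5 match the array example later in this section.

```sas
/* Two sample records re-created so the step is self-contained */
data temp_demo;
   input place $ temp1-temp5;
   datalines;
AP 40 50 60 70 80
MP 52 62 72 82 92
;
run;

/* Without arrays, the same formula is written once per variable */
data celsius_no_array;
   set temp_demo;
   celsius_temp1 = round(5/9*(temp1 - 32));
   celsius_temp2 = round(5/9*(temp2 - 32));
   celsius_temp3 = round(5/9*(temp3 - 32));
   celsius_temp4 = round(5/9*(temp4 - 32));
   celsius_temp5 = round(5/9*(temp5 - 32));
run;
```

Each additional temperature variable would need another copy of the assignment. The array version replaces all of them with a single statement inside a DO loop.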
Although the temperature example has only 5 elements in each array, the same code
would work just as well with hundreds of elements. In addition to simplifying the
calculations, defining arrays for the temperature values would also let us use them in
the INPUT statement to simplify the input process. Note also that, while TEMP1 is
equivalent to the first element, TEMP2 to the second, and so on, the variables do not
need to be named consecutively. The array would work just as well with
non-consecutive variable names.
For example, if an array lists the variable X first and A second, X is equivalent to the first element, A to the second, and so on.
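Here is a minimal sketch of an array whose elements are not named consecutively. The array name T and the variables X, A, B, C, and D are hypothetical. Only the order in which the variables are listed determines their element numbers.

```sas
data nonconsecutive;
   /* Elements are listed in order: x is element 1, a is element 2, ... */
   array t{*} x a b c d;
   do i = 1 to dim(t);
      t{i} = i * 10;   /* x=10, a=20, b=30, c=40, d=50 */
   end;
   drop i;
run;
```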
ARRAY STATEMENT
The statement used to define an array is the ARRAY statement.
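The ARRAY statement is a compile-time statement within the data step. Array
references (the array name or an element reference such as TEMPERATURE_ARRAY{1})
cannot be used in other compile-time statements such as DROP or KEEP; list the
variables themselves instead.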
An array must be defined within the data step before it is referenced. Referencing
an array that has not been defined causes an error. Defining an array in one data
step and referencing it in another also causes errors, because an array exists only
for the duration of the data step in which it is defined. Therefore, an array must be
defined in every data step where it is referenced.
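The ARRAY statement provides the following information about the SAS array, in this order: the array name, the subscript (the number of elements), an optional $ and length, the list of elements, and optional initial values. The general form is ARRAY array-name {subscript} <$> <length> <array-elements> <(initial-value-list)>;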
The array name must follow the same rules as variable names; therefore, any valid
SAS name is a valid array name. It is best to avoid giving an array the same name as
a SAS function, to avoid confusion. While parentheses or square brackets can be used
when referencing array elements, braces {} are used most often because they are not
used in other SAS statements. SAS places one restriction on the name of the array:
the array name may not be the same as the name of any variable in the same data
step.
Restriction:
The elements of an array must be all numeric or all character. When the elements
are new character variables, you must indicate this when the array is defined by
including the dollar sign ($) in the ARRAY statement after the number of elements.
If the dollar sign is omitted and the elements have not already been defined as
character variables, the array is assumed to be numeric.
data text;
array names{*} $ n1-n10;
array capitals{*} $ c1-c10;
/* Read 10 names; list input continues onto the next line as needed */
input names{*};
do i=1 to 10;
capitals{i}=upcase(names{i});
end;
drop i;
datalines;
smithers michaels gonzale zhurth frank bleigh
rounder joseph peters sam
;
run;
When all numeric or all character variables in the data set are to be elements of
the array, several special variable lists may be used instead of listing the
individual variables as elements. The special variable lists are:
_NUMERIC_ - all numeric variables defined in the data step at that point
_CHARACTER_ - all character variables defined in the data step at that point
_ALL_ - when all variables on the data set will be used as elements and the
variables are all the same type
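The _CHARACTER_ list selects every character variable defined at the point of the ARRAY statement. The minimal sketch below uses it to uppercase all character values. The data set PEOPLE and its variables are made up for illustration.

```sas
data people;
   input first $ last $ age;
   datalines;
ann lee 34
raj patel 29
;
run;

data people_upper;
   set people;
   /* _character_ picks up FIRST and LAST; AGE is numeric and is excluded */
   array chars{*} _character_;
   do i = 1 to dim(chars);
      chars{i} = upcase(chars{i});
   end;
   drop i;
run;
```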
N is the array subscript in the array definition, and it specifies the number of
elements in the array. In the ARRAY statement, the subscript can be a numeric
constant, a range of bounds such as 2006:2010, or an asterisk (*). When you later
reference an element, the subscript can be a numeric constant, a variable whose
value is a number, or a numeric SAS expression. The subscript must be enclosed
within braces {}, square brackets [], or parentheses (). In our temperature example,
the subscript is 5, one for each of the 5 temperature variables.
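With an explicit subscript, the ARRAY statement for the temperature variables can be written as in this minimal sketch. The number in braces must match the number of elements listed. The data set name TEMP_COUNT and the variable THIRD are illustrative.

```sas
data temp_count;
   input temp1-temp5;
   /* {5}: the array has exactly five elements, TEMP1-TEMP5 */
   array temperature_array{5} temp1-temp5;
   /* temperature_array{3} refers to TEMP3 */
   third = temperature_array{3};
   datalines;
40 50 60 70 80
;
run;
```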
When the asterisk is used, you don't need to know how many elements the array
contains; SAS counts them for you. The asterisk is typically used when one of the
special variable lists defines the elements, as in this example:
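array allnums{*} _numeric_;   /* SAS counts the numeric variables */
do i = 1 to dim(allnums);
allnums{i} = round(allnums{i},.1);   /* round each value to one decimal place */
end;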
In this example, when the array ALLNUMS is defined, SAS counts the numeric
variables that become elements of the array. Then, in the DO loop, the DIM function
returns that count as the ending value of the loop.
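The same approach converts the 5 Fahrenheit temperatures to Celsius in a single loop:
data array_temp;
set temp;
array temperature_array {*} temp1-temp5;
array celsius_array {*} celsius_temp1-celsius_temp5;   /* new variables */
do i = 1 to dim(temperature_array);
/* C = 5/9 * (F - 32), rounded to the nearest whole degree */
celsius_array{i} = round(5/9*(temperature_array{i} - 32));
end;
drop i;
run;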
OF Operator
It is common to perform a calculation using all of the variables or elements
associated with an array. For example, to sum the 5 temperatures, pass the array
name with the {*} subscript to the SUM function by using the OF operator.
data of_operator;
set temp;
array temperature_array {*} temp1-temp5;
array celsius_array {*} celsius_temp1-celsius_temp5;
do i = 1 to dim(temperature_array);
celsius_array{i} = round(5/9*(temperature_array{i} - 32));
end;
/* OF passes every element of the array to the function */
sum_temp = sum(of temperature_array{*});
avg_temp = mean(of temperature_array{*});
drop i;
run;
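Any function that accepts an OF variable list works the same way. The minimal sketch below applies MAX, MIN, and N to a single made-up record that has one missing value. The data set name OF_MORE and the result variable names are illustrative.

```sas
data of_more;
   input temp1-temp5;
   array temperature_array{*} temp1-temp5;
   max_temp = max(of temperature_array{*});  /* largest value                */
   min_temp = min(of temperature_array{*});  /* smallest value               */
   n_temps  = n(of temperature_array{*});    /* count of non-missing values  */
   datalines;
40 50 . 70 80
;
run;
```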
data holidays;
input (holiday1-holiday3) (: $9.);
datalines;
EASTER LABOR_DAY CHRISTMAS
;
run;
data find_christmas;
set holidays;
/* Note that the $ sign is not necessary within the ARRAY statement */
/* because the HOLIDAY variables are defined previously as character variables */
array holiday_list{*} holiday1-holiday3;
run;
IN Operator
The IN operator can test whether a specific value occurs within any element or
variable of an array. You can use this operator with numeric as well as character
arrays. When you use an array with the IN operator, only the name of the array is
used. For example, you can write IF statements similar to the following:
data if_operator;
set holidays;
array holiday_list[*] holiday1-holiday3;
if 'CHRISTMAS' in holiday_list then holiday3 = 'found';
run;
VNAME Function
You can use the VNAME function to identify the name of a variable that is
associated with an element of an array. The name of the array, along with the
element number that appears within brackets, is passed as the single argument to
the VNAME function, as shown in this example:
data Vname_ex;
array my_array[*] $ Swathi Raj Sri;
i = 2;
var_name = vname(my_array[i]);
put var_name;   /* writes Raj, the name of the second element, to the log */
run;
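To list every variable name in an array, call VNAME inside a DO loop. This minimal sketch reuses the names from the example above. The data set VNAME_ALL, the variable ELEMENT_NAME, and its length of 32 are assumptions.

```sas
data vname_all;
   array my_array[*] $ Swathi Raj Sri;
   length element_name $ 32;
   do i = 1 to dim(my_array);
      element_name = vname(my_array[i]);
      put i= element_name=;   /* writes each element number and name to the log */
      output;
   end;
   keep i element_name;
run;
```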
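You can assign initial values to array elements by listing them in parentheses at
the end of the ARRAY statement. Separate the values either with commas or spaces,
and enclose character values in either single or double quotation marks. The
following ARRAY statement illustrates the initialization of character values: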
data aray_char;
array cities[4] $10 ('New York' 'Los Angeles' 'Dallas' 'Chicago');
run;
Consider the following example, in which you have a data set with a character
variable that contains comments about drug prescriptions and dosages. In this
example, the task is to determine whether certain antibiotics are referred to in the
comments.
Program
data drug_comments;
input drugs_prescribed_commentary $char80.;
datalines;
20mg cephamandole taken 2xday
cefoperazone as needed
one aspirin per day
1 dose furazolidone before bed-time
;
run;
data find_antibiotics;
set drug_comments;
/* Initialize a character array with the names of antibiotics. */
/* A length of 13 is needed to hold 'metronidazole' without truncation */
array antibiotics[7] $13 ('metronidazole',
'tinidazole', 'cephamandole',
'latamoxef', 'cefoperazone', 'cefmenoxime',
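The rest of FIND_ANTIBIOTICS is cut off above. The following is a hedged sketch of one way to finish the search, not the original program. It uses only the six antibiotic names that are visible, so its array has 6 elements rather than 7. It also makes the array _TEMPORARY_ so that the names are not written to the output. The variables FOUND and WHICH_DRUG and the data set names are hypothetical. The sketch searches the lowercased comment with INDEX, which also matches a name that appears inside a longer word.

```sas
/* Two of the sample comments, re-created so the sketch runs on its own */
data drug_demo;
   input drugs_prescribed_commentary $char80.;
   datalines;
20mg cephamandole taken 2xday
one aspirin per day
;
run;

data find_antibiotics_sketch;
   set drug_demo;
   /* Only the six names visible above; length 13 fits 'metronidazole' */
   array antibiotics[6] $13 _temporary_ ('metronidazole', 'tinidazole',
         'cephamandole', 'latamoxef', 'cefoperazone', 'cefmenoxime');
   length which_drug $ 13;
   found = 0;
   /* Stop looping as soon as one name is found */
   do i = 1 to dim(antibiotics) while (found = 0);
      /* STRIP removes trailing blanks so INDEX matches the bare name */
      if index(lowcase(drugs_prescribed_commentary), strip(antibiotics[i])) > 0 then do;
         found = 1;
         which_drug = antibiotics[i];
      end;
   end;
   drop i;
run;
```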
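By default, the lower bound of an array is 1. This means that the first element is
referred to as element 1, the second element as element 2, and so on. You can
specify other bounds by giving the lower and upper bounds separated by a colon: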
data d;
array years [2006:2010] yr2006-yr2010;
run;
In this example, the subscripts run from 2006 (lower bound) to 2010 (upper bound),
so YEARS{2006} refers to YR2006 and YEARS{2010} refers to YR2010.
data bound2;
array years [2006:2010];
run;
If you run the above example, which lists no elements, SAS creates the variables YEARS1 through YEARS5, while the subscripts still run from 2006 to 2010.
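When the subscripts do not start at 1, a loop can use the LBOUND and HBOUND functions instead of running from 1 to DIM. This minimal sketch uses the YEARS array from the first bounds example. The data set name BOUND_LOOP, the loop variable Y, and the assigned values are illustrative.

```sas
data bound_loop;
   array years [2006:2010] yr2006-yr2010;
   /* LBOUND returns 2006 and HBOUND returns 2010; DIM would return 5 */
   do y = lbound(years) to hbound(years);
      years[y] = y;   /* yr2006 = 2006, ..., yr2010 = 2010 */
   end;
   drop y;
run;
```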
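Temporary arrays, defined with the _TEMPORARY_ keyword, hold values that are not
written to the output data set, and their elements are automatically retained from
one iteration of the data step to the next. Temporary arrays can be either numeric or
character. Just as you do with non-temporary character arrays, you create a
temporary character array by including the dollar sign ($) after the array brackets:
array my_array[25] $ _temporary_;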
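Because temporary array elements are retained and never written to the output data set, they work well as lookup tables. In this minimal sketch, the MONTH_NAMES array, the variables MONTH_NUM and MONTH_NAME, and the input values are all illustrative.

```sas
data month_lookup;
   array month_names[3] $ 9 _temporary_ ('January' 'February' 'March');
   input month_num;
   /* Look up the name by position; the array itself is not in the output */
   month_name = month_names[month_num];
   datalines;
2
3
;
run;
```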
For example, an array defined as array temperature_array{24} temp1-temp24; has 24
elements, one for each of the variables TEMP1 through TEMP24. When the array
elements are used within the data step, they are referenced by the array name and
the element number. The reference to the ninth element in the temperature array is:
temperature_array{9}
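Suppose the data set EXPENSES holds the off-season rates for each resort in the
variables EXPENSE1 through EXPENSE6. You want a SAS data set with both the
off-season and seasonal rates. The seasonal rates are 25% higher than the
off-season rates.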
SOLUTIONS:
PROGRAM – VERSION 1:
Data SeasonRates;
set expenses(drop=ResortName);
seasonal1 = expense1*1.25;
seasonal2 = expense2*1.25;
seasonal3 = expense3*1.25;
seasonal4 = expense4*1.25;
seasonal5 = expense5*1.25;
seasonal6 = expense6*1.25;
run;
PROGRAM – VERSION 2:
To solve this problem using an array, you must first establish the correspondence
between
• one array name and the existing variables, expense1 - expense6
• a second array name and the new variables, seasonal1 - seasonal6.
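Following those two steps, here is a minimal sketch of the array version. The array names EXPENSE and SEASON are assumptions. The sketch also builds a one-record stand-in data set, EXPENSES_DEMO, with made-up values, so it runs without the real EXPENSES data.

```sas
/* A one-record stand-in for EXPENSES so the sketch runs on its own */
data expenses_demo;
   input ResortName $ expense1-expense6;
   datalines;
Demo 100 200 80 40 20 60
;
run;

data SeasonRates2;
   set expenses_demo(drop=ResortName);
   array expense{6} expense1-expense6;    /* existing variables */
   array season{6} seasonal1-seasonal6;   /* new variables      */
   do i = 1 to 6;
      season{i} = expense{i} * 1.25;      /* 25% higher */
   end;
   drop i;
run;
```

The six assignment statements from Version 1 become one statement inside the loop.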
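The next program uses a two-dimensional temporary array as a lookup table. Each of
the 10 rows holds two values for one resort: the tax (column 1) and the gratuity
(column 2). The resort number taken from the RESORT variable selects the row.
data work.total;
drop num;
array charge{10,2} _temporary_
(18.54 14
17.84 20
12.50 15
14.25 18
16.33 12
19.00 16
12.75 19
14.98 16
15.76 20
13.75 17);
set expenses;
/* The characters of RESORT starting at position 6 hold the resort number */
num = input(substr(resort,6),2.);
tax = charge{num,1};
gratuity = charge{num,2};
total = sum(of expense1-expense6,tax,gratuity);
run;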
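SAS fills a multidimensional array row by row, so the rightmost subscript changes fastest. This minimal sketch uses VNAME to show which variable each element of a 2-by-3 array refers to. The array GRID, the variables G1-G6, and the output names are illustrative.

```sas
data grid_map;
   array grid{2,3} G1-G6;
   length element $ 32;
   do r = 1 to 2;
      do c = 1 to 3;
         /* grid{1,1}=G1, grid{1,3}=G3, grid{2,1}=G4, grid{2,3}=G6 */
         element = vname(grid{r,c});
         output;
      end;
   end;
   keep r c element;
run;
```

In the same way, charge{num,1} and charge{num,2} above sit next to each other in the initial-value list.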
Below is the screenshot that shows how a SAS program reads a two-dimensional array
behind the scenes.
data ratings;
/* :$9. keeps all 9 letters of "Fantastic" (the default length is 8) */
input ROOM FOOD ACTIVITY RATING :$9.;
cards;
4 3 1 Awful
2 4 2 Awful
5 2 1 Maybe
2 4 2 Good
4 1 3 Great
2 2 3 Good
5 3 2 Great
3 2 3 Fantastic
2 4 3 Good
5 3 1 Great
;
run;