
Arrays

Uploaded by Ramya
Copyright © All Rights Reserved

INTRODUCTION

Most mathematical and computer languages have some notation for a group of repeated
or related values. Such structures are variously called a matrix, a vector, a
dimension, or a table; in the SAS DATA step, the structure is called an array. While
every block of computer memory is an array of sorts, in SAS an array is a named group
of related variables in a DATA step. Unlike arrays in many other languages, SAS array
elements don't need to be contiguous, the same length, or even related at all.
However, all elements must be of the same type: all character or all numeric.

WHY DO WE NEED ARRAYS?


The use of arrays may allow us to simplify our processing. We can use arrays to
help read and analyze repetitive data with a minimum of coding. An array and a
loop can make the program smaller.

For example, suppose we have a file where each record contains 5 temperature readings
taken at different hours of the day. These temperatures are in Fahrenheit and we need
to convert them to 5 Celsius values. Without arrays, we need to repeat the same
calculation for each of the 5 temperature variables:

data temp;
input place $ temp1 temp2 temp3 temp4 temp5;
cards;
AP 40 50 60 70 80
MP 52 62 72 82 92
UP 46 56 66 76 86
KA 25 35 45 55 65
HA 30 40 50 60 70
;
run;

E-Mail: [email protected] Phone: +91-9848733309/+91-9676828080


www.covalentech.com & www.covalenttrainings.com
/* Convert Fahrenheit to Celsius without arrays:
   one assignment statement per variable */
data new;
set temp;
tempc1 = 5/9*(temp1-32);
tempc2 = 5/9*(temp2-32);
tempc3 = 5/9*(temp3-32);
tempc4 = 5/9*(temp4-32);
tempc5 = 5/9*(temp5-32);
run;

/* With arrays, a single assignment inside a DO loop
   converts any number of Fahrenheit variables to Celsius */
data array_temp;
set temp;
array temperature_array {5} temp1-temp5;
array celsius_array {5} celsius_temp1-celsius_temp5;
do i = 1 to 5;
   celsius_array{i} = 5/9*(temperature_array{i} - 32);
end;
drop i;
run;

Although each array in the above example has only 5 elements, the same code would
work just as well with hundreds of elements. In addition to simplifying the
calculations, by defining an array for the temperature values we could also have used
the array in the INPUT statement to simplify the input process. Note also that, while
TEMP1 corresponds to the first element, TEMP2 to the second, and so on, the variables
do not need to be named consecutively. The array would work just as well with
non-consecutive variable names:

Array sample_array {5} x a i r d;

In this example, the variable X corresponds to the first element, A to the second, and so on.
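Here is a minimal sketch of the INPUT shortcut mentioned above. The array is declared first, and the INPUT statement then reads the five temperatures through the array reference with the {*} subscript instead of naming TEMP1-TEMP5. The data set name TEMP_ARRAY_INPUT is hypothetical, and the two data lines reuse the layout of the TEMP data set.

```sas
/* Sketch: reading values through an array reference in INPUT. */
/* TEMP_ARRAY_INPUT is a hypothetical data set name.           */
data temp_array_input;
   /* The array must be defined before INPUT uses it; this     */
   /* also creates the numeric variables TEMP1-TEMP5.          */
   array temperature_array {5} temp1-temp5;
   input place $ temperature_array{*};
   datalines;
AP 40 50 60 70 80
MP 52 62 72 82 92
;
run;
```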

E-Mail: [email protected] Phone: +91-9848733309/+91-9676828080


www.covalentech.com & www.covalenttrainings.com
BASIC ARRAY CONCEPTS
Arrays in SAS are different from arrays in other languages. SAS arrays are another
way to temporarily group and refer to SAS variables. A SAS array is not a new data
structure, and the array name is not a variable. Apart from creating any listed
variables that do not already exist, the ARRAY statement does not define additional
variables. Rather, a SAS array provides a different name to reference a group of
variables.

The ARRAY statement defines variables to be processed as a group. The variables
referenced by the array are called elements. Once an array is defined, the array
name and an index reference the elements of the array. Because the same processing
is usually applied to every element, references to the array are usually found
within DO groups.

ARRAY STATEMENT
The statement used to define an array is the ARRAY statement.

array array-name {n} <$><length> array-elements <(initial-values)>;

The ARRAY statement is a compile-time statement within the DATA step. Array
references cannot be used in compile-time statements such as DROP or KEEP; those
statements must name the underlying variables instead. An array must be defined in
the DATA step before it is referenced, and referencing an array that has not been
defined causes an error. Defining an array in one DATA step and referencing it in
another DATA step also causes errors, because an array exists only for the duration
of the DATA step in which it is defined. Therefore, you must define the array in
every DATA step where it is referenced.

The ARRAY statement provides the following information about the SAS array:

• array-name – Any valid SAS name
• n – Number of elements within the array
• $ – Indicates that the elements within the array are character variables
• length – A common length for the array elements
• array-elements – List of SAS variables to be part of the array
• initial-values – Initial values for the array elements

The array name follows the same rules as variable names; therefore, any valid SAS
name is a valid array name. To avoid confusion, do not give an array the same name as
a SAS function. Parentheses or square brackets can be used when referencing array
elements, but braces {} are used most often because they are not used in other SAS
statements. SAS places one restriction on the array name: it cannot be the same as
the name of any variable in the same DATA step.

Restriction:
The elements of an array must be all numeric or all character. When the elements
are new character variables, you must indicate this when the array is defined by
including the dollar sign ($) in the ARRAY statement after the number of elements.
If the dollar sign is omitted and the elements have not been previously defined as
character variables, the array is assumed to be numeric.

Example 1: Using Character Variables in an Array

data text;
array names{*} $ n1-n10;
array capitals{*} $ c1-c10;
input names{*};
do i=1 to 10;
   capitals{i}=upcase(names{i});
end;
datalines;
smithers michaels gonzale zhurth frank bleigh
rounder joseph peters sam
;

proc print data=text;
title 'Names Changed from Lowercase to Uppercase';
run;

When all numeric or all character variables in the data set are to be elements of the
array, you can use one of the following special variable lists instead of listing the
individual variables as elements:

_NUMERIC_ - when all the numeric variables will be used as elements

_CHARACTER_ - when all the character variables will be used as elements

_ALL_ - when all variables on the data set will be used as elements and the
variables are all the same type
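Here is an illustrative sketch of the _CHARACTER_ list. The step below upper-cases every character variable in a data set without naming any of them. The PEOPLE data set and its variables FIRST, LAST, and AGE are hypothetical. Because the array is declared after the SET statement, _CHARACTER_ picks up only the character variables already defined at that point (FIRST and LAST), and the numeric AGE is left alone.

```sas
/* Hypothetical input data: two character variables, one numeric. */
data people;
   input first $ last $ age;
   datalines;
ann lee 34
raj rao 41
;
run;

/* Upper-case every character variable via _CHARACTER_. */
data people_upper;
   set people;
   /* Declared after SET, so the list holds FIRST and LAST only. */
   array allchars {*} _character_;
   do i = 1 to dim(allchars);
      allchars{i} = upcase(allchars{i});
   end;
   drop i;
run;
```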

n is the array subscript in the array definition; it specifies the number of elements
in the array. In the ARRAY statement, n must be a numeric constant, a range of bounds
such as 2006:2010, or an asterisk (*). When you later reference an element, the
subscript can be a numeric constant, a variable whose value is a number, or a numeric
SAS expression. The subscript must be enclosed within braces {}, square brackets [],
or parentheses (). In our temperature example the subscript is 5, one for each of the
5 temperature variables:

array temperature_array {5} temp1-temp5;

When the asterisk is used, you do not need to know how many elements the array
contains; SAS counts them for you. The asterisk is useful, for example, when one of
the special variable lists defines the elements:

array allnums {*} _numeric_;

E-Mail: [email protected] Phone: +91-9848733309/+91-9676828080


www.covalentech.com & www.covalenttrainings.com
DIM Function:
When it is necessary to know how many elements are in the array, the DIM
function can be used to return the count of elements.

do i = 1 to dim(allnums);

allnums{i} = round(allnums{i},.1);

end;

In this example, when the array ALLNUMS is defined, SAS will count the number
of numeric variables used as elements of the array. Then, in the DO group
processing, the DIM function will return the count value as the ending range for
the loop.

data array_temp;
set temp;
array temperature_array {*} temp1-temp5;
array celsius_array {*} celsius_temp1-
celsius_temp5;
do i = 1 to dim(temperature_array);
celsius_array{i} =
round(5/9*(temperature_array[i] - 32));
end;
drop i;
run;

OF Operator
It is common to perform a calculation using all of the variables or elements that are
associated with an array, such as summing the 5 temperatures. To do that, pass the
array name with the [*] subscript to the SUM function, preceded by the OF operator.

Example for Using the OF Operator:

data of_operator;
set temp;
array temperature_array {*} temp1-temp5;
array celsius_array {*} celsius_temp1-celsius_temp5;
do i = 1 to dim(temperature_array);
   celsius_array{i} = round(5/9*(temperature_array{i} - 32));
end;
sum_temp = sum(of temperature_array{*});
avg_temp = mean(of temperature_array{*});
drop i;
run;

Applying Character Functions to Arrays:


You can pass arrays to functions (such as concatenation functions) that expect
character arguments. The following example uses the CATX function to create a
character string that contains the concatenation of three holidays (Easter, Labor
Day, and Christmas).

data holidays;
input (holiday1-holiday3) (: $9.);
datalines;
EASTER LABOR_DAY CHRISTMAS
;
run;
data find_christmas;
set holidays;
/* The $ sign is not necessary in the ARRAY statement because the
   HOLIDAY variables were previously defined as character variables. */
array holiday_list[*] holiday1-holiday3;
all_holidays = catx(' ', of holiday_list[*]);
run;
proc print;
run;

IN Operator

The IN operator can test whether a specific value occurs within any element or
variable of an array. You can use this operator with numeric as well as character
arrays. When you use an array with the IN operator, specify only the array name,
without a subscript. For example, you can write IF statements similar to the following:

data if_operator;
set holidays;
array holiday_list[*] holiday1-holiday3;
if 'CHRISTMAS' in holiday_list then holiday3 = 'found';
run;
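The same operator works with numeric arrays. The sketch below sets a flag that records whether any of the five readings is exactly 70. The READINGS data set, the IN_NUMERIC output, and the HAS_70 variable are hypothetical names for illustration.

```sas
/* Hypothetical readings: five numeric values per record. */
data readings;
   input place $ temp1-temp5;
   datalines;
AP 40 50 60 70 80
MP 52 62 72 82 92
;
run;

data in_numeric;
   set readings;
   array temperature_array {*} temp1-temp5;
   /* IN with a bare array name checks every element. */
   if 70 in temperature_array then has_70 = 'Y';
   else has_70 = 'N';
run;
```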

VNAME Function
You can use the VNAME function to identify the name of a variable that is
associated with an element of an array. The name of the array, along with the
element number that appears within brackets, is passed as the single argument to
the VNAME function, as shown in this example:

Data Vname_ex;
Array my_array[*]$ Swathi Raj Sri;
i=2;
var_name=vname(my_array[i]);
put var_name;
run;

In this case, the value of the VAR_NAME variable is Raj, and the PUT statement writes
Raj to the SAS log.

Assigning Initial Values to Array Variables or Elements:


For some applications, it can be beneficial to assign initial values to the variables
or elements of an array at the time that the array is defined. To do this, enclose the
initial values in parentheses at the end of the ARRAY statement. Separate the values
with either commas or spaces, and enclose character values in either single or double
quotation marks. The following ARRAY statements illustrate the initialization of
numeric and character values:

array sizes[*] petite small medium large extra_large (2, 4, 6, 8, 10);

array cities[4] $10 ('New York' 'Los Angeles' 'Dallas' 'Chicago');

Examples for Assigning values into an Array:


data array_num;
array sizes[*] petite small medium large extra_large (2, 4, 6, 8, 10);
run;

data array_char;
array cities[4] $10 ('New York' 'Los Angeles' 'Dallas' 'Chicago');
run;

Example: Determining Whether Antibiotics Are Referenced in Patient
Prescriptions

Consider the following example in which you have a data set with a character
variable that contains comments about drug prescriptions and dosages. In this
example, the task is to determine whether certain antibiotics are referred to in the
prescriptions. An array is initialized with a list of seven antibiotics. Then a DO
group loops through the array, checking for each of the antibiotics.

Key Tasks in This Example

• Initialize a character array with the names of antibiotics.
• Initialize a flag variable to indicate that no antibiotics are found.
• Use the INDEXW and UPCASE functions to search for each antibiotic.
• Reset the value of the flag variable if an antibiotic is found.

Program

Data drug_comments;
Input drugs_prescribed_commentary$char80.;
datalines;
20mg cephamandole taken 2xday
cefoperazone as needed
one aspirin per day
1 dose furazolidone before bed-time
;
run;
data find_antibiotics;
set drug_comments;
/* Initialize a character array with the names of antibiotics.
   The length must be at least $13 because 'metronidazole' has
   13 characters; a shorter length would truncate it. */
array antibiotics[7] $13 ('metronidazole', 'tinidazole', 'cephamandole',
                          'latamoxef', 'cefoperazone', 'cefmenoxime',
                          'furazolidone');
/* Initialize a flag variable to N, meaning that no antibiotic is found. */
antibiotic_found = 'N';
/* Cycle through the array of antibiotics. */
do i = 1 to dim(antibiotics);
   /* Use the INDEXW and UPCASE functions to check for each drug. */
   if indexw(upcase(drugs_prescribed_commentary), upcase(antibiotics[i])) then do;
      /* When an antibiotic is found, change the flag variable to Y. */
      antibiotic_found = 'Y';
      /* No need to continue checking; terminate the DO loop. */
      leave;
   end;
end;
keep drugs_prescribed_commentary antibiotic_found;
run;
proc print;
run;

Specifying Lower and Upper Bounds of an Array

The arrays discussed previously in this document use either a single value or an
asterisk within the array brackets. When this is done, by default, the lower bound is
1 and the upper bound is the number of elements in the array. For example, in both
of the following ARRAY statements, the lower bound is 1 and the upper bound is 5:

array years[5] yr2006-yr2010;

array years[*] yr2006-yr2010;

This means that the first element is referred to as element 1, the second element as
element 2, and so on.

Examples that specify the bounds explicitly:

data d;
array years [2006:2010] yr2006-yr2010;
run;

In the above example, the lower bound is 2006 and the upper bound is 2010, so
YEARS[2006] refers to YR2006 and YEARS[2010] refers to YR2010.

data bound2;
array years [2006:2010];
run;

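In the above example no variables are listed, so SAS creates the variables YEARS1, YEARS2, ..., YEARS5 (the array name followed by the numbers 1 through 5), even though the bounds are 2006:2010.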
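When the bounds are not 1 and n, a loop such as do i = 1 to dim(years) would use subscripts outside the 2006:2010 range. The sketch below handles this with the SAS LBOUND and HBOUND functions, which return an array's lower and upper bounds. The BOUNDS_DEMO name and the five initial values are made up for illustration.

```sas
/* Sketch: looping over an array declared with explicit bounds. */
/* BOUNDS_DEMO and the initial values are hypothetical.         */
data bounds_demo;
   array years [2006:2010] yr2006-yr2010 (100 110 125 130 150);
   total = 0;
   /* For this array, LBOUND returns 2006 and HBOUND returns 2010. */
   do y = lbound(years) to hbound(years);
      total = total + years{y};
   end;
   drop y;
run;
```

With these values, TOTAL is 615.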

Creating a Temporary Array:


Occasionally, you might need an array to hold values temporarily for subsequent
calculations but have no need for the actual variables. In such cases, it is beneficial
to create arrays in which specific variables are never created or associated with
elements of an array. Arrays of this nature are frequently referred to as
non-variable-based arrays. To define such an array, use the _TEMPORARY_ keyword,
as shown in this example:

array my_array[25] _temporary_;

Temporary arrays can be either numeric or character. Just as you do with
non-temporary character arrays, you create a temporary character array by including
the dollar sign ($) after the array brackets.

array my_array[25] $ _temporary_;
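Here is a sketch of a temporary array used as a lookup table. The step below keeps the number of days in each month in a _TEMPORARY_ array, ignoring leap years. Elements of a temporary array are not written to the output data set and keep their values across iterations of the DATA step, so only MONTH and DAYS end up in the output. The MONTHS data set and the DAYS_IN_MONTH array name are hypothetical.

```sas
/* Sketch: a temporary array as an in-memory lookup table.      */
/* MONTHS and DAYS_IN_MONTH are hypothetical names; leap years  */
/* are ignored for simplicity.                                  */
data months;
   array days_in_month[12] _temporary_
         (31 28 31 30 31 30 31 31 30 31 30 31);
   do month = 1 to 12;
      days = days_in_month[month];
      output;                    /* one observation per month */
   end;
run;
```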

ONE DIMENSION ARRAYS


A simple array may be created when the variables grouped together conceptually
appear as a single row. This is known as a one-dimensional array. Within the
Program Data Vector, the variables may be visualized as a single row: TEMP1, TEMP2,
..., TEMP24.

The ARRAY statement to define this one-dimensional array is:

array temperature_array {24} temp1-temp24;

The array has 24 elements for the variables TEMP1 through TEMP24. When the
array elements are used within the data step the array name and the element
number will reference them. The reference to the ninth element in the temperature
array is:

temperature_array{9}

Example for a One-Dimensional Array:

You want a SAS data set with both the off-season and seasonal rates. The seasonal
rates are 25% higher than the off-season rates.

SOLUTIONS:

• Use six assignment statements.
• Use an array.

PROGRAM – VERSION 1:

The first solution requires considerable typing:

Data SeasonRates;
set expenses(drop=ResortName);
seasonal1 = expense1*1.25;
seasonal2 = expense2*1.25;
seasonal3 = expense3*1.25;
seasonal4 = expense4*1.25;
seasonal5 = expense5*1.25;
seasonal6 = expense6*1.25;
run;

PROGRAM – VERSION 2:
To solve this problem using an array, you must first establish the correspondence
between:
• one array name and the existing variables, expense1 - expense6
• a second array name and the new variables, seasonal1 - seasonal6.

data SeasonRates(drop=i);
set expenses(drop=ResortName);
array ex{6} expense1 - expense6;          /* (1) */
array seasonal{6} seasonal1 - seasonal6;  /* (2) */
do i=1 to 6;                              /* (3) */
   seasonal{i} = ex{i}*1.25;
end;
format expense1 - expense6 seasonal1 - seasonal6 dollar9.2;  /* (4) */
run;

Explanation of the above program:

1. To use an array, you must first declare it in an ARRAY statement. At
compile time, the ARRAY statement establishes a correspondence between
the name of the array, EX, and the numeric DATA step variables to which it
refers, EXPENSE1 - EXPENSE6.
The association of an array reference with a variable lasts only for the duration
of the DATA step. If you need to refer to the array in another DATA step,
you must repeat the ARRAY statement in that DATA step.
2. The array SEASONAL references variables that do not exist in the
Program Data Vector. SAS creates the variables SEASONAL1 -
SEASONAL6 from the ARRAY statement.
3. The DO loop processes each element of the arrays sequentially. The
assignment statement using the array reference calculates seasonal rates for
each of the six expenses.
4. Any FORMAT, LABEL, DROP, KEEP, or LENGTH statements must use
DATA step variable names, not array references.

Two Dimensional Array:
Two-dimensional arrays can be thought of as providing row and column
arrangement of a group of variables. Arrays can be defined with as many
dimensions as necessary.
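To make the row-and-column picture concrete, the sketch below fills a 3 x 4 array with nested DO loops. When elements are listed, SAS assigns them in row order, with the rightmost subscript varying fastest, so GRID{1,4} is G4 and GRID{2,1} is G5. DIM(array, n) returns the number of elements in dimension n. The names GRID_DEMO, GRID, and G1-G12 are hypothetical.

```sas
/* Sketch: filling a two-dimensional array with nested DO loops. */
/* GRID_DEMO, GRID, and G1-G12 are hypothetical names.           */
data grid_demo;
   array grid{3,4} g1-g12;          /* 3 rows x 4 columns      */
   do r = 1 to dim(grid, 1);        /* rows: 1 to 3            */
      do c = 1 to dim(grid, 2);     /* columns: 1 to 4         */
         grid{r,c} = r * c;
      end;
   end;
   drop r c;
run;
```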

Example for a Two Dimensional Array:

Data work.total;
Drop num;
array charge{10,2} _temporary_
(18.54 14
17.84 20
12.50 15
14.25 18
16.33 12
19.00 16
12.75 19
14.98 16
15.76 20
13.75 17);
set expenses;
num = input(substr(resort,6),2.);
tax = charge{num,1};
gratuity = charge{num,2};
total = sum(of expense1-expense6,tax,gratuity);
run;

The screenshot below shows how a SAS program reads a two-dimensional array behind
the scenes.

Creating a Multi-Dimensional Array:
data htlrate;
input Hotel $ RoomRate FoodRate ActRate;
cards;
HOTEL1 4 3 1
HOTEL2 2 4 2
HOTEL3 5 2 1
HOTEL4 2 4 2
HOTEL5 4 1 3
HOTEL6 2 2 3
HOTEL7 5 3 2
HOTEL8 2 4 3
HOTEL9 3 2 3
HOTEL10 5 3 1
;
run;

data ratings;
/* :$9. keeps the longest rating, 'Fantastic', from being truncated */
input ROOM FOOD ACTIVITY RATING :$9.;
cards;
4 3 1 Awful
2 4 2 Awful
5 2 1 Maybe
2 4 2 Good
4 1 3 Great
2 2 3 Good
5 3 2 Great
3 2 3 Fantastic
2 4 3 Good
5 3 1 Great
;
run;

data compare(keep=hotel roomrate foodrate actrate HotelRating);
/* 3-D lookup table indexed by room, food, and activity scores */
array rate{5,4,3} $9 _temporary_;
/* On the first iteration, load every RATINGS observation into the table */
if _n_ = 1 then do i = 1 to all;
   set ratings nobs=all;
   rate{room,food,activity} = rating;
end;
set htlrate;
HotelRating = rate{roomrate,foodrate,actrate};
run;

Transpose a SAS data set:

Method 1
data dat1;
input name $ e1-e3;
datalines;
John 89 90 92
Mary 92 . 81
;
proc transpose data=dat1 out=dat1_t1 (drop=_name_);
var e1-e3;
id name;
run;
title 'DAT1 in the Original Form';
proc print data=dat1;
run;
title 'Transposing DAT1 Using PROC TRANSPOSE';
proc print data=dat1_t1;
run;

Method 2
data dat1_t2(keep=John Mary);
set dat1 end=last;
array score[3] e1-e3;
array all[2,3] _temporary_;
array new_score[2] John Mary;
i + 1;
do j = 1 to dim(score);
   all[i,j] = score[j];
end;
if last then do;
   do j = 1 to dim(score);
      do i = 1 to 2;
         new_score[i] = all[i,j];
      end;
      output;
   end;
end;
run;

TRANSPOSING BY-GROUPS (ONE OBSERVATION PER SUBJECT)

Example Using PROC TRANSPOSE:
Proc sort data=dat1 out=dat1_sort;
by name;
run;
proc transpose data=dat1_sort
out=dat1_bygrp1 (rename=(col1=score)
where=(score ne .))
name=test;
by name;
var e1-e3;
run;

title 'Transposing By-groups by Using PROC TRANSPOSE';
proc print data=dat1_bygrp1;
run;

Transpose example using an Array:

data dat1_bygrp2 (drop=e1-e3 i);
set dat1;
array e[3];
do i = 1 to 3;
   test = cats("e", i);
   score = e[i];
   if not missing(score) then output;
end;
run;
title 'Transposing By-groups by Using Array Processing';
proc print data=dat1_bygrp2;
run;
