SAS Ch1 Introduction 2

This document discusses techniques for modifying SAS data, including creating and modifying variables using assignment statements, arithmetic operators, and functions. It covers using IF-THEN/ELSE logic to conditionally modify data based on certain criteria. Methods like arrays and DO loops can simplify programs that perform the same operations on multiple variables. Variables can also be removed from a dataset using DROP or retained using KEEP statements.

Chapter 1: Introduction to SAS (2/2)

Junshu Bao
University of Pittsburgh


Table of contents

1.5 Modifying SAS Data

1.6 Proc Step

1.7 Global Statements

* More about Modifying and Combining Data Sets

1.8 SAS Graphics

1.5 Modifying SAS Data

Creating and Modifying Variables


The assignment statement can be used both to create new
variables and to modify existing ones. The basic form is
variable = expression;
For example:
weightloss=startweight-weightnow;
startweight=startweight*0.4536;
SAS has the usual set of arithmetic operators: +, -, /
(divide), * (multiply), and ** (exponentiation), plus various
arithmetic, mathematical and statistical functions.
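A minimal sketch of these two assignment statements inside a complete data step. The data set name slimming, the id variable idno, the data values, and the assumption that weights are entered in pounds are all made up for illustration.

```sas
/* Hypothetical data: names and values are illustrative only. */
data slimming;
   input idno startweight weightnow;        /* weights assumed to be in pounds */
   weightloss  = startweight - weightnow;   /* creates a new variable */
   startweight = startweight * 0.4536;      /* modifies an existing variable: pounds to kg */
   datalines;
1 200 180
2 165 170
;
run;

proc print data=slimming;
run;
```

Statements execute in order, so weightloss is computed in pounds before startweight is converted to kilograms.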


Assignment Statements
Here are examples of the basic types of assignment statements:
Type of expression          Assignment statement
numeric constant            NewVar = 10;
character constant          NewVar = 'ten';
a variable                  NewVar = OldVar;
a function of variable(s)   NewVar = function(OldVariable);
Whether the variable NewVar is numeric or character depends
on the expression that defines it.


Example: Survey of Home Gardeners


Gardeners were asked to estimate the number of pounds they
harvested for four crops: tomatoes, zucchini, peas, and grapes.
Gregor 10 2 40 0
Molly 15 5 10 1000
Luther 50 10 15 50
Susan 20 0 . 20
The following program reads the data and then modifies it.
DATA homegarden;
INFILE 'c:\MyRawData\Garden.dat';
INPUT Name $ 1-7 Tomato Zucchini Peas Grapes;
Zone = 14;                                  * numeric constant;
Type = 'home';                              * character constant;
Zucchini = Zucchini * 10;                   * modify an existing variable;
Total = Tomato + Zucchini + Peas + Grapes;  * missing if any crop is missing;
PerTom = (Tomato / Total) * 100;            * tomatoes as a percentage of Total;
RUN;

See SAS program and output.



Missing Values

- The result of an arithmetic operation performed on a missing
  value is itself a missing value.
- Missing values for numeric variables are represented by a period.
- A numeric variable can be set to a missing value by an
  assignment statement such as:
  age = .;
- A missing value may be assigned to a character variable as
  follows:
  team=' ';
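The rules above can be sketched in a short data step. The data set name missdemo, the variables x, y, plus_total, sum_total and flag, and the data values are hypothetical. The SUM function is not mentioned in the text; it is shown only as a contrast, since it ignores missing arguments.

```sas
/* Hypothetical data: the second observation has a missing y. */
data missdemo;
   input x y;
   plus_total = x + y;        /* missing whenever x or y is missing */
   sum_total  = sum(x, y);    /* SUM skips missing arguments */
   if y = . then flag = 'y missing';
   datalines;
1 2
3 .
;
run;

proc print data=missdemo;
run;
```

For the second observation plus_total is missing, while sum_total is 3.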


Using SAS Functions


SAS has hundreds of functions in general areas including:

Character        Character String Matching   Date and Time
Distance         Financial                   Descriptive Statistics
Macro            Mathematical                Probability
Random Number    State and Zip Code          Variable Information
For example,
AvgScore = MEAN(Scr1, Scr2, Scr3, Scr4, Scr5);
DayEntered = DAY(Date);
Type = UPCASE(Type);

- The MEAN function returns the mean of its non-missing arguments.
- The DAY function returns the day of the month from a SAS date value.
- The UPCASE function transforms the variable values to uppercase.
* SAS is case sensitive when it comes to variable values; a 'd' is
not the same as 'D'.
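A minimal sketch that runs the three functions on made-up data. The data set name scores, the variable Name, the date informat and the data values are assumptions for illustration.

```sas
/* Hypothetical data: two contestants with five scores each. */
data scores;
   input Name $ Type $ Date :mmddyy10. Scr1-Scr5;
   format Date mmddyy10.;
   AvgScore   = MEAN(Scr1, Scr2, Scr3, Scr4, Scr5);  /* missing scores are skipped */
   DayEntered = DAY(Date);                           /* day of the month, 1 to 31 */
   Type       = UPCASE(Type);                        /* 'd' becomes 'D' */
   datalines;
Ann d 10/28/2012 7.5 8.0 . 9.0 8.5
Bob D 11/03/2012 6.0 6.5 7.0 . .
;
run;

proc print data=scores;
run;
```

After the step both observations have Type equal to 'D', so a later comparison such as Type = 'D' treats them alike.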

Using IF-THEN Statements


Frequently, you want an assignment statement to apply to some
observations, but not all. This is called conditional logic, and you do it
with IF-THEN statements:
IF condition THEN action;
Example: IF Model='Mustang' THEN Make='Ford';
Here are the basic comparison operators:
Symbolic     Mnemonic   Meaning
=            EQ         equals
^= or ~=     NE         not equal
>            GT         greater than
<            LT         less than
>=           GE         greater than or equal
<=           LE         less than or equal
The IN operator also makes comparisons. Here is an example:
IF Model IN ('Corvette', 'Camaro') THEN Make='Chevrolet';

DO-END Keywords
A single IF-THEN statement can only have one action. If you add the
keywords DO and END, then you can execute more than one action.
The basic form is as follows:
IF condition THEN DO;
   action1;
   action2;
END;

For example,
IF Model='Mustang' THEN DO;
   Make='Ford';
   Size='compact';
END;


Specifying Multiple Conditions


You can also specify multiple conditions with the keywords AND and
OR:
IF condition1 AND condition2 THEN action;

For example:
IF Model='Mustang' AND Year<1975 THEN Status='classic';

Like the comparison operators, AND and OR may be symbolic or
mnemonic:
Symbolic   Mnemonic   Meaning
&          AND        all comparisons must be true
| or !     OR         at least one comparison must be true
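A small sketch showing that the symbolic and mnemonic forms are interchangeable. The data set name cars and the data values are hypothetical; the LENGTH statement is added so that the character variables are not truncated to the length of the first value assigned.

```sas
/* Hypothetical data: three cars. */
data cars;
   length Model $ 8 Status $ 7;
   input Model $ Year;
   /* symbolic operators */
   if Model = 'Mustang' & Year < 1975 then Status = 'classic';
   /* the same test written with mnemonic operators */
   if Model eq 'Mustang' and Year lt 1975 then Status = 'classic';
   datalines;
Mustang 1966
Mustang 1999
Miata 2002
;
run;
```

Only the 1966 Mustang receives Status = 'classic'; the other two observations keep a blank Status.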


Example
The following data about used cars contain values for model, year,
make, number of seats, and color:
Corvette 1955 . 2 black
XJ6 1995 Jaguar 2 teal
Mustang 1966 Ford 4 red
Miata 2002 . . silver
CRX 2001 Honda 2 black
Camaro 2000 . 4 red
We will fill in the missing data and create a new variable, Status.
DATA sportscars;
INFILE 'c:\MyRawData\UsedCars.dat';
INPUT Model $ Year Make $ Seats Color $;
IF Year < 1975 THEN Status = 'classic';
IF Model = 'Corvette' OR Model = 'Camaro' THEN Make = 'Chevy';
IF Model = 'Miata' THEN DO;
   Make = 'Mazda';
   Seats = 2;
END;
RUN;

Grouping Observations with IF-THEN/ELSE


One common use of IF-THEN statements is for grouping
observations. By adding the keyword ELSE to your IF statements,
you can tell SAS that these statements are related.
IF-THEN/ELSE logic takes this basic form:
IF condition1 THEN action1;
ELSE IF condition2 THEN action2;
ELSE IF condition3 THEN action3;
... ...
ELSE action;
The last ELSE statement contains just an action. An ELSE of this
kind becomes a default which is automatically executed for all
observations failing to satisfy any of the previous IF statements. For
example,
IF Cost = . THEN CostGroup = 'missing';
ELSE IF Cost < 2000 THEN CostGroup = 'low';
ELSE IF Cost < 10000 THEN CostGroup = 'medium';
ELSE CostGroup = 'high';

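* SAS considers missing values to be smaller than non-missing values, so
the test for a missing Cost must come first; otherwise a missing Cost
would satisfy Cost < 2000 and be grouped as 'low'.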
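The CostGroup logic can be tried on a few made-up values. The data set name costgroups, the variable Item and the data values are hypothetical.

```sas
/* Hypothetical data: one missing cost and three non-missing costs. */
data costgroups;
   length CostGroup $ 7;
   input Item $ Cost;
   if Cost = . then CostGroup = 'missing';
   else if Cost < 2000  then CostGroup = 'low';
   else if Cost < 10000 then CostGroup = 'medium';
   else CostGroup = 'high';
   datalines;
A .
B 1500
C 2000
D 12000
;
run;

proc print data=costgroups;
run;
```

The four observations are classified as missing, low, medium and high respectively. Because each ELSE is skipped once an earlier condition is true, every observation lands in exactly one group.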



Simplifying Programs with Arrays


When the same operation is to be carried out on several variables, it
is often convenient to use an array and an iterative DO loop in
combination.
For example, suppose you have 20 variables, q1 to q20, for which "not
applicable" has been coded -1, and you wish to set those values to
missing. You might do it as follows:
array qall{20} q1-q20;
do i = 1 to 20;
   if qall{i} = -1 then qall{i} = . ;
end;

The array statement defines an array by specifying the name of the
array, 'qall' here, the number of variables to be included in it in
braces, and the list of variables to be included.
* All the variables in the array must be of the same type, that is, all
numeric or all character.
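A runnable sketch of the same technique, shortened to five variables so that the data fit on one line. The data set name survey, the variable id and the data values are hypothetical.

```sas
/* Hypothetical data: -1 marks "not applicable". */
data survey;
   input id q1-q5;
   array qall{5} q1-q5;
   do i = 1 to 5;
      if qall{i} = -1 then qall{i} = .;   /* recode -1 to missing */
   end;
   drop i;                                /* the loop counter is not needed in the output */
   datalines;
1 3 -1 4 2 -1
2 -1 5 5 1 3
;
run;

proc print data=survey;
run;
```

The array itself is not stored in the data set; it is only a temporary way of referring to q1 through q5 during the data step.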

Deleting Variables
Variables may be removed from the data set being created by
using the drop and keep statements.
- The drop statement names a list of variables that are to be
  excluded from the data set. For example:
  data gradebook_final;
  set gradebook;
  drop quiz5;
  run;
- The keep statement names a list of variables that are to be
  the only ones retained in the data set. For example:
  data gradebook_final;
  set gradebook;
  keep quiz1 quiz2 quiz3 quiz4;
  run;

Deleting Observations
It may be necessary to delete observations from the data set
either because they contain errors or because the analysis is to
be carried out on a subset of the data.
- Deleting erroneous observations is best done by using the
  if-then statement with the delete statement. For example,
  if weightloss>startweight then delete;
- In the case above, it would also be useful to write out a
  message giving more information about the observation
  that contains the error.
  if weightloss>startweight then do;
     put 'Error in weight data' idno= startweight= weightloss=;
     delete;
  end;
The put statement writes text (in quotes) and the values of
variables to the log.
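A minimal sketch of the delete-with-message pattern in a complete data step. The data set name clean and the data values are hypothetical; the second observation is deliberately impossible.

```sas
/* Hypothetical data: observation 2 reports losing more than the start weight. */
data clean;
   input idno startweight weightloss;
   if weightloss > startweight then do;
      put 'Error in weight data: ' idno= startweight= weightloss=;  /* written to the log */
      delete;                                                       /* observation is not output */
   end;
   datalines;
1 200 20
2 150 175
3 180 5
;
run;
```

The resulting data set contains observations 1 and 3, and the log carries one error message for idno 2.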
Subsetting Data Sets


It may be necessary to delete observations from the data set
either because they contain errors or because the analysis is to
be carried out on a subset of the data. This can be achieved
with the subsetting if statement in a data step.
For example,
data women;
set bodyfat;
if sex = 'F';
run;

1.6 Proc Step

Proc Statement

- Once data have been read into a SAS data set, SAS
  procedures can be used to analyze the data.
- The proc step is a block of statements that specify the data
  set to be analyzed, the procedure to be used, and any
  further details of the analysis.
- The proc statement names the procedure to be used and
  may also specify options for the analysis.
  The most important option is the data= option, which names the
  data set to be analyzed. If the option is omitted, the
  procedure uses the most recently created data set.


Var Statement
The var statement specifies the variables that are to be
processed by the proc step. For example,
proc print data = SlimmingClub;
var name team weightloss;
run;
restricts the printout to the three variables mentioned, whereas
the default would be to print all variables.


Where Statement
The where statement selects the observations to be processed.
The keyword where is followed by a logical condition, and only
those observations for which the condition is true are included
in the analysis.
proc print data = SlimmingClub;
where weightloss>0;
run;
only prints out observations with positive weight loss.


By Statement
The by statement is used to process the data in groups.
- The observations are grouped according to the values of the
  variable named in the by statement, and a separate analysis
  is conducted for each group.
- The data set must first be sorted on the by variable.

proc sort data=SlimmingClub;
by team;
run;
proc means data=SlimmingClub;
var weightloss;
by team;
run;
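A self-contained sketch that combines the var, where and by statements in one proc step. The data values, and the exact layout of SlimmingClub (name, team, weightloss), are assumptions for illustration.

```sas
/* Hypothetical data: four club members in two teams. */
data SlimmingClub;
   input name $ team $ weightloss;
   datalines;
Ann red 4.5
Bob blue -1.0
Cy red 2.0
Di blue 3.5
;
run;

proc sort data=SlimmingClub;
   by team;                 /* by-group processing needs sorted data */
run;

proc means data=SlimmingClub;
   where weightloss > 0;    /* only members who lost weight */
   var weightloss;          /* analyze this variable only */
   by team;                 /* one set of statistics per team */
run;
```

Naming the data set in each proc statement avoids relying on the "most recently created data set" default.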


Class Statement
The class statement is used with many procedures to name
variables that are to be used as classification variables, or
factors.
The variables named may be character or numeric variables and
will typically contain a relatively small range of discrete values.
For example:

proc logistic data=ghq;
class sex;
model cases/total=sex ghq;
run;

1.7 Global Statements

Global Statements (1) Title


Global statements may occur at any point in a SAS program
and remain in effect until reset. The title statement is a global
statement and provides a title that will appear on each page of
printed output and each graph until reset. An example would be
title 'Analysis of Slimming Club Data';

- The text of the title should be enclosed in quotes.
- Multiple lines of titles can be specified with the title2
  statement for the second line, title3 for the third line, and
  so on up to title10.
- The title statement is synonymous with title1.


Global Statements (2) Comments


Comment statements are global statements in the sense that
they can occur anywhere. There are two forms of comment
statement.
- The first form begins with an asterisk and ends with a
  semicolon, for example,
  * this is a comment;
- The second form begins with /* and ends with */:
  /* this is also a
  comment
  */
  Comments may appear on the same line as a SAS
  statement, for example
  bmi=weight/height**2; /* Body Mass Index */

Global Statements (3) Options


The options statement is used to set SAS system options. Most
of these can be safely left at their default values. Some useful
options are:
- Nocenter aligns the output at the left, rather than
  centering it on the page.
- Nodate suppresses printing of the date and time on the
  output.
- Pageno=n sets the page number for the next page of
  output. Alternatively, nonumber turns page numbering
  off.
For example
options nodate nocenter nonumber;
* More about Modifying and Combining Data Sets

Concatenating Data Sets - Adding Observations


The set statement can be used to concatenate, or stack, data sets
one on top of the other.
This is useful when you want to combine data sets with all or most of
the same variables but different observations. The basic form is:
data new-dataset;
set dataset1 dataset2;
run;

- The number of observations in the new data set will equal the
  sum of the number of observations in the old data sets.
- If one of the data sets has a variable not contained in the other
  data sets, then the observations from the other data sets will
  have missing values for that variable.


Example: Stacking Data Sets


The Fun Times Amusement Park has two entrances where they
collect data about their customers.
South Entrance Data:
Entrance   Pass Number   Size of Party   Age
S          43            3               27
S          44            3               24
S          45            3               2
North Entrance Data:
Entrance   Pass Number   Size of Party   Age   Parking Lot
N          21            5               41    1
N          87            4               33    3
N          65            2               67    1
N          66            2               7     1
Note that the north entrance data set has one more variable, the
parking lot number. The south entrance has only one parking lot.

Example: Stacking Data Sets (cont.)


Suppose we would like to combine the data of the two entrances
and create a new variable, AmountPaid, which tells how much
each customer paid based on their age.
DATA both;
SET southentrance northentrance;
IF Age = . THEN AmountPaid = .;
ELSE IF Age < 3 THEN AmountPaid = 0;
ELSE IF Age < 65 THEN AmountPaid = 35;
ELSE AmountPaid = 27;
RUN;
See SAS program and output.
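The referenced program is not reproduced here. The following is a sketch of how the whole example might look with the data entered inline; the variable names PassNumber, PartySize and Lot are assumptions, while the data values come from the two tables in the example.

```sas
data southentrance;
   input Entrance $ PassNumber PartySize Age;
   datalines;
S 43 3 27
S 44 3 24
S 45 3 2
;
run;

data northentrance;
   input Entrance $ PassNumber PartySize Age Lot;
   datalines;
N 21 5 41 1
N 87 4 33 3
N 65 2 67 1
N 66 2 7 1
;
run;

/* Stack the two data sets and price each ticket by age. */
data both;
   set southentrance northentrance;
   if Age = . then AmountPaid = .;
   else if Age < 3  then AmountPaid = 0;
   else if Age < 65 then AmountPaid = 35;
   else AmountPaid = 27;
run;

proc print data=both;
run;
```

The stacked data set has seven observations (3 + 4), and Lot is missing for the three south entrance observations because that variable exists only in the north entrance data.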


Merging Data Sets - Adding Variables (1)


Data for a study may arise from more than one source, or at different
times, and need to be combined.
- For matching purposes, you will want to have a common variable
  or several variables which, taken together, uniquely identify each
  observation. If the data are not already sorted, use the sort
  procedure to sort all data sets by the common variables.
- The basic form is as follows:
  proc sort data=dataset1;
  by variable-list;
  proc sort data=dataset2;
  by variable-list;
  data new-dataset;
  merge dataset1 dataset2;
  by variable-list;

* If the two data sets have variables with the same names (other than
the by variables), then the variables from the second data set will
overwrite any variables having the same name in the first data set.

Example: Belgian Chocolatier


A Belgian chocolatier keeps track of the number of each type of
chocolate sold each day.
- The code number for each chocolate and the number of pieces
  sold that day are kept in a file.
- In a separate file she keeps the names and descriptions of each
  chocolate as well as the code number.
In order to print the day's sales along with the descriptions of the
chocolates, the two files must be merged together using the code
number as the common variable.

See SAS program and output.
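The referenced program is not reproduced here. As a rough sketch of a one-to-one match merge of this kind, with entirely made-up data: the data set names sales and descriptions, the variables CodeNum, PiecesSold and Name, and all values below are hypothetical.

```sas
/* Hypothetical data: today's sales by code number. */
data sales;
   input CodeNum $ PiecesSold;
   datalines;
C865 15
K086 9
A536 21
;
run;

/* Hypothetical data: one description per code number. */
data descriptions;
   input CodeNum $ Name $;
   datalines;
A536 Walnut
C865 Caramel
K086 Mocha
;
run;

proc sort data=sales;
   by CodeNum;
run;

proc sort data=descriptions;
   by CodeNum;
run;

/* Match the two data sets on the common variable. */
data chocolates;
   merge sales descriptions;
   by CodeNum;
run;

proc print data=chocolates;
run;
```

Each observation in the merged data set carries both the number of pieces sold and the name for the same code number.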


One-to-Many Match Merge


Sometimes you need to combine two data sets by matching one
observation from one data set with more than one observation in
another.
Suppose you had data for every state in the U.S. and wanted to
combine it with data for every county. This would be a one-to-many
match merge.
The statements for a one-to-many match merge are identical to those
for a one-to-one match merge:
data new-dataset;
merge dataset1 dataset2;
by variable-list;

- The order of the data sets in the merge statement does not affect
  the matching. In other words, a one-to-many merge will match
  the same observations as a many-to-one merge.
- Before you merge two data sets, they must be sorted by one or
  more common variables.

Example: One-to-Many Match Merge


A distributor of athletic shoes is putting all its shoes on sale at 20 to
30% off the regular price. The distributor has two data sets:
- Data set 1: information about each type of shoe. It contains one
  record for each shoe with values for style, type of exercise
  (running, walking, or cross-training), and regular price.
- Data set 2: discount factor. It contains one record for each type
  of exercise and its discount.
To find the sale price, we need to merge the two data sets and
calculate a new price after the discount.

See SAS program and output.
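The referenced program is not reproduced here. A sketch of such a one-to-many merge, under stated assumptions: the data set names shoes and discount, the variable names, the shoe styles, the prices and the discount rates are all made up, and 'training' stands in for cross-training.

```sas
/* Hypothetical data: one record per shoe style. */
data shoes;
   input Style $ ExerciseType $ RegularPrice;
   datalines;
Max running 142.99
Zip running 83.99
Stroll walking 76.99
Flex training 129.99
;
run;

/* Hypothetical data: one record per type of exercise. */
data discount;
   input ExerciseType $ Adjustment;
   datalines;
training 0.25
running 0.30
walking 0.20
;
run;

proc sort data=shoes;
   by ExerciseType;
run;

proc sort data=discount;
   by ExerciseType;
run;

/* Each discount record is matched with every shoe of that type. */
data prices;
   merge discount shoes;
   by ExerciseType;
   NewPrice = round(RegularPrice - RegularPrice * Adjustment, 0.01);
run;

proc print data=prices;
run;
```

Both running shoes receive the same Adjustment value, because the single running record in discount is repeated for every matching observation in shoes.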


Tracking and Selecting Observations


When you combine two data sets, you can use in= options to track
which of the original data sets contributed to each observation in the
new data set.
For example, the data step below creates a data set named both by
merging two data sets state and county. Then the in= options
create two variables named InState and InCounty.
data both;
merge state (in=InState) county (in=InCounty);
by StateName;

SAS gives the in= variables a value of 0 or 1. A value of 1 means that
the data set contributed to the current observation, and a value of 0
means that it did not.
* You can use the in= variables to subset data sets. They are temporary
variables and are not written to the new data set.

Example: The IN= Option


A sporting goods manufacturer wants to send a sales rep to contact
all customers who did not place any orders during the third quarter of
the year. The company has two data files:
- Data file 1: customer information
- Data file 2: orders placed during the third quarter
To compile a list of customers without orders, you merge the two data
sets using the IN= option, and then select customers who had no
observations in the orders data set.

See SAS program and output.
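The referenced program is not reproduced here. A sketch of the approach with made-up data: the data set names customer, orders and noorders, the variable names, the in= variable Recent and all values are hypothetical.

```sas
/* Hypothetical data: three customers. */
data customer;
   input CustomerNumber Name $;
   datalines;
101 Murphy
102 Sun
104 Martinez
;
run;

/* Hypothetical data: third-quarter orders (customer 101 has none). */
data orders;
   input CustomerNumber Total;
   datalines;
102 562.01
104 254.98
104 1642.00
;
run;

proc sort data=customer;
   by CustomerNumber;
run;

proc sort data=orders;
   by CustomerNumber;
run;

data noorders;
   merge customer orders (in=Recent);
   by CustomerNumber;
   if Recent = 0;    /* keep customers with no observation in orders */
run;

proc print data=noorders;
run;
```

With these values noorders contains a single observation, customer 101, and the variable Recent does not appear in it.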


Selecting Observations with the WHERE= Option


The where= data set option is the most flexible of all ways to subset
data. You can use it in data steps or proc steps. The basic form of a
where= option is:
where = (condition)
- If used in a set or merge statement, the where= option will be
  applied to the data set that is being read. For example,
  data gone;
  set animals (where = (Status = 'Extinct'));
- If used in a data statement, the where= option will be applied to
  the data set that is being written. For example,
  data uncommon (where = (Status IN ('Endangered', 'Threatened')));
  set animals;


Example: WHERE= Option


The following data contain information about the Seven Summits, the
highest mountains on each continent. Each line of data includes the
name of a mountain, its continent, and its height in meters.
Kilimanjaro     Africa          5895
Vinson Massif   Antarctica      4897
Everest         Asia            8848
Elbrus          Europe          5642
McKinley        North America   6194
Aconcagua       South America   6962
Kosciuszko      Australia       2228

We will create two data sets named "tallpeaks" (above 6000 meters)
and "American".

See SAS program and output.
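The referenced program is not reproduced here. One way the two data sets might be created in a single data step is sketched below; the variable names Name, Continent and Height, the inline datalines, and the use of the & modifier to read names with embedded blanks (fields are separated by two or more spaces) are assumptions.

```sas
/* where= on each output data set filters what is written to it. */
data tallpeaks (where = (Height > 6000))
     American  (where = (Continent in ('North America', 'South America')));
   length Name $ 14 Continent $ 13;
   input Name $ & Continent $ & Height;
   datalines;
Kilimanjaro    Africa         5895
Vinson Massif  Antarctica     4897
Everest        Asia           8848
Elbrus         Europe         5642
McKinley       North America  6194
Aconcagua      South America  6962
Kosciuszko     Australia      2228
;
run;

proc print data=tallpeaks;
run;

proc print data=American;
run;
```

With these data, tallpeaks holds Everest, McKinley and Aconcagua, and American holds McKinley and Aconcagua; an observation can be written to both data sets.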


1.8 SAS Graphics

SAS Graphics
When the SAS/GRAPH module has been licensed, there are a
number of ways of producing high-quality graphical output. There are
three main approaches:
- Graphical options within a statistical procedure
- Traditional graphics procedures (gplot, gchart, etc.)
  Graphics procedures that existed in versions of SAS prior to 9.2.
- Statistical graphics procedures (sgplot, sgpanel, sgscatter and
  sgrender)
  New graphics procedures which can produce a wide range of
  attractive graphics.
We will focus on the statistical graphics procedures for now. The
specific graphical options that are available within statistical
procedures will be dealt with in later chapters.

xy Plots - Proc sgplot


An xy plot is one in which the data are represented in two dimensions
defined by the values of two variables. For example, to create a
scatterplot,
proc sgplot data=bodyfat;
scatter y=pctfat x=age;
run;

The syntax is straightforward:
- A scatter statement is used to tell SAS to create a scatterplot.
- In the scatter statement, both the x and y variables are specified
  explicitly.
For different types of plot, a statement other than scatter is used, as
listed under Types of xy Plots.


Types of xy Plots

Type of Plot                                               Plotting Statement
Scatter plot - data values are plotted                     scatter
Line plot - data values are joined with lines              series
Step plot - data values joined with stepped lines          step
Needle plot - vertical line joins the value to the x axis  needle
Regression plot - a scatter plot with a regression line    reg
Locally weighted regression                                loess
Penalized B-splines                                        pbspline

* For line plots and step plots the points will be plotted in the
order in which they occur in the data set, so sort the data by the
x axis variable first.
* A common variant of the xy plot distinguishes groups in the data
by using different symbols/lines. This is done by the group=var
option. For example: scatter y=pctfat x=age/group=sex;

Overlaying Plots
It is often useful to combine the information from two or more plots
by overlaying them. Sgplot does this automatically. For example, a
plot to compare the fits from linear regression and locally weighted
regression could be produced as follows:
proc sgplot data=bodyfat;
reg y=pctfat x=age;
loess y=pctfat x=age/nomarkers;
run;

The nomarkers option is specified to prevent the data points being
plotted twice, as sgplot uses different plotting symbols for each.
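To try the overlay without the original data, a small stand-in data set can be created first. The values below are made up for illustration and are not the bodyfat data used in the course; only the variable names age, pctfat and sex follow the text.

```sas
/* Hypothetical stand-in for the bodyfat data set. */
data bodyfat;
   input age pctfat sex $;
   datalines;
23 9.5 M
23 27.9 F
27 7.8 M
27 17.8 M
39 31.4 F
41 25.9 F
45 27.4 M
49 25.2 F
50 31.1 F
53 34.7 F
54 29.1 F
58 33.0 F
;
run;

proc sgplot data=bodyfat;
   reg   y=pctfat x=age;               /* scatter plot with a fitted regression line */
   loess y=pctfat x=age / nomarkers;   /* loess curve only; markers already drawn by reg */
run;
```

Each plotting statement adds one layer to the same axes, so further statements could be added to the same proc step to overlay more fits.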
