SAS Ch1 Introduction 2
Junshu Bao
University of Pittsburgh
Chapter 1: Introduction to SAS (2/2)
1.5 Modifying SAS Data
Assignment Statements
Here are examples of the basic types of assignment statements:

Type of expression          Assignment statement
numeric constant            NewVar = 10;
character constant          NewVar = 'ten';
a variable                  NewVar = OldVar;
a function of variable(s)   NewVar = function(OldVariable);

Whether the variable NewVar is numeric or character depends on the
expression that defines it.
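As a minimal sketch, the four forms can be combined in one DATA step. The data set name demo and the variable names are hypothetical; sqrt is a built-in SAS function used here only to illustrate "a function of a variable".

```sas
data demo;                  /* hypothetical data set name */
  OldVar = 16;              /* numeric constant: OldVar is numeric */
  Label  = 'ten';           /* character constant: Label is character */
  Copy   = OldVar;          /* a variable: Copy takes the type of OldVar */
  Root   = sqrt(OldVar);    /* a function of a variable: Root is numeric */
run;

proc print data=demo;       /* one observation with four variables */
run;
```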
Missing Values
DO-END Keywords
A single IF-THEN statement can only have one action. If you add the
keywords DO and END, then you can execute more than one action.
The basic form is as follows:
IF condition THEN DO;
action1;
action2;
END;
For example,
IF Model='Mustang' THEN DO;
Make='Ford';
Size='compact';
END;
Conditions can be combined with the keywords AND and OR. For example,
IF Model='Mustang' AND Year<1975 THEN Status='classic';
Example
The following data about used cars contain values for model, year,
make, number of seats, and color:
Corvette 1955 . 2 black
XJ6 1995 Jaguar 2 teal
Mustang 1966 Ford 4 red
Miata 2002 . . silver
CRX 2001 Honda 2 black
Camaro 2000 . 4 red
We will fill in missing data, and create a new variable, Status.
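DATA sportscars;
  INFILE 'c:\MyRawData\UsedCars.dat';
  INPUT Model $ Year Make $ Seats Color $;
  * Status is created here and stays missing for newer cars;
  IF Year < 1975 THEN Status = 'classic';
  * Fill in the missing Make values;
  IF Model = 'Corvette' OR Model = 'Camaro' THEN Make = 'Chevy';
  * DO-END lets one condition trigger two assignments;
  IF Model = 'Miata' THEN DO;
    Make = 'Mazda';
    Seats = 2;
  END;
RUN;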
Deleting Variables
Variables may be removed from the data set being created by
using the drop and keep statements.
- The drop statement names a list of variables that are to be
  excluded from the data set. For example:
data gradebook_final;
set gradebook;
drop quiz5;
run;
- The keep statement names a list of variables that are to be
  the only ones retained in the data set. For example:
data gradebook_final;
set gradebook;
keep quiz1 quiz2 quiz3 quiz4;
run;
Deleting Observations
It may be necessary to delete observations from the data set
either because they contain errors or because the analysis is to
be carried out on a subset of the data.
- Deleting erroneous observations is best done by using the if-then
  statement with the delete statement. For example,
  if weightloss>startweight then delete;
- In the case above, it would also be useful to write out a
  message giving more information about the observation
  that contains the error.
  if weightloss>startweight then do;
    put 'Error in weight data' idno = startweight = weightloss = ;
    delete;
  end;
The put statement writes text (in quotes) and the values of
variables to the log.
1.6 Proc Step
Proc Statement
- Once data have been read into a SAS data set, SAS
  procedures can be used to analyze the data.
- The proc step is a block of statements that specify the data
  set to be analyzed, the procedure to be used, and any
  further details of the analysis.
- The proc statement names the procedure to be used and
  may also specify options for the analysis.
The most important option is the data= option, which names the
data set to be analyzed. If the option is omitted, the
procedure uses the most recently created data set.
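A minimal sketch of the two cases, assuming a data set named SlimmingClub (the name used in the examples of this section) already exists in the current session:

```sas
/* The data= option names the data set explicitly */
proc print data=SlimmingClub;
run;

/* No data= option: SAS uses the most recently created data set */
proc print;
run;
```

Naming the data set explicitly is usually the safer choice, since the most recently created data set changes as a program grows.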
Var Statement
The var statement specifies the variables that are to be
processed by the proc step. For example,
proc print data = SlimmingClub;
var name team weightloss;
run;
restricts the printout to the three variables mentioned, whereas
the default would be to print all variables.
Where Statement
The where statement selects the observations to be processed.
The keyword where is followed by a logical condition, and only
those observations for which the condition is true are included
in the analysis.
proc print data = SlimmingClub;
where weightloss>0;
run;
only prints out observations with positive weight loss.
By Statement
The by statement is used to process the data in groups.
- The observations are grouped according to the values of the
  variable named in the by statement, and a separate analysis
  is conducted for each group.
- The data set must first be sorted on the by variable.
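The two points above can be sketched as follows, assuming the SlimmingClub data set with the variables team and weightloss used earlier in this section. The choice of proc means for the analysis is only an illustration.

```sas
/* Sort first so that observations with the same team are adjacent */
proc sort data=SlimmingClub;
  by team;
run;

/* A separate set of summary statistics is produced for each team */
proc means data=SlimmingClub;
  var weightloss;
  by team;
run;
```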
Class Statement
The class statement is used with many procedures to name
variables that are to be used as classification variables, or
factors.
The variables named may be character or numeric variables and
will typically contain a relatively small range of discrete values.
For example
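A minimal sketch, assuming the SlimmingClub data set with the variables team and weightloss used earlier in this section, and using proc means as one procedure that accepts a class statement:

```sas
proc means data=SlimmingClub;
  class team;        /* classification variable: statistics are reported per team */
  var weightloss;
run;
```

Unlike the by statement, the class statement in proc means does not require the data set to be sorted beforehand.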
1.7 Global Statements
- The number of observations in the new data set will equal the
  sum of the number of observations in the old data sets.
- If one of the data sets has a variable not contained in the other
  data sets, then the observations from the other data sets will
  have missing values for that variable.
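Assuming these points refer to stacking data sets by listing more than one data set in a set statement, a minimal sketch looks like this. The data set names (north, south, all) and their variables are hypothetical.

```sas
data north;                      /* hypothetical: 2 observations */
  input Name $ Sales;
  datalines;
Ann 10
Bob 20
;
run;

data south;                      /* hypothetical: 1 observation, extra variable Region */
  input Name $ Sales Region $;
  datalines;
Cat 30 S
;
run;

data all;                        /* 2 + 1 = 3 observations */
  set north south;               /* Region is missing for the rows that come from north */
run;
```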
* More about Modifying and Combining Data Sets
* If the two data sets have variables with the same names, then the
variables from the second data set will overwrite any variables having
the same name in the first data set.
- The order of the data sets in the merge statement does not affect
  the matching. In other words, a one-to-many merge will match
  the same observations as a many-to-one merge.
- Before you merge two data sets, they must be sorted by one or
  more common variables.
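A minimal sketch of a one-to-many match merge follows. The data sets customers and orders and the variables ID, Name, and Amount are hypothetical; the pattern of sorting both data sets by the common variable and then using merge with by is the point of the example.

```sas
data customers;                  /* hypothetical: one row per customer */
  input ID Name $;
  datalines;
2 Bob
1 Ann
;
run;

data orders;                     /* hypothetical: possibly several rows per customer */
  input ID Amount;
  datalines;
1 25
2 40
1 15
;
run;

/* Both data sets must be sorted by the common variable before merging */
proc sort data=customers; by ID; run;
proc sort data=orders;    by ID; run;

data combined;
  merge customers orders;        /* each order row picks up the matching Name */
  by ID;
run;
```

Writing merge orders customers instead matches the same observations; only the order of the variables in the output data set changes.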
We will create two data sets named "tallpeaks" (above 6000 meters)
and "American".
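One way to write to two data sets from a single DATA step is the output statement. The sketch below is an assumption about how this could be done, not the original example: the variables Name, Continent, and Height, the value 'America', and the inline data (approximate heights in meters) are all hypothetical choices for illustration.

```sas
data tallpeaks american;                  /* two output data sets from one DATA step */
  length Name $ 12 Continent $ 10;        /* avoid truncating names longer than 8 characters */
  input Name $ Continent $ Height;        /* hypothetical variables; Height in meters */
  if Height > 6000 then output tallpeaks;
  if Continent = 'America' then output american;
  datalines;
Everest     Asia    8848
Aconcagua   America 6961
Denali      America 6190
Kilimanjaro Africa  5895
;
run;
```

An observation that satisfies both conditions is written to both data sets, and one that satisfies neither is written to none.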
1.8 SAS Graphics
When the SAS/GRAPH module has been licensed, there are a
number of ways of producing high-quality graphical output. There
are three main approaches:
- Graphical options within a statistical procedure
- Traditional graphics procedures (gplot, gchart, etc.)
  Graphics procedures that existed in versions of SAS prior to 9.2.
- Statistical graphics procedures (sgplot, sgpanel, sgscatter and
  sgrender)
  Newer graphics procedures that can produce a wide range of
  attractive graphics.
We will focus on the statistical graphics procedures for now. The
specific graphical options that are available within statistical
procedures will be dealt with in later chapters.
Types of xy Plots
* For line plots and step plots the points will be plotted in the
order in which they occur in the data set, so sort the data by the
x-axis variable first.
* A common variant of the xy plot distinguishes groups in the data
by using different symbols or lines. This is done with the group=var
option. For example:
scatter y=pctfat x=age/group=sex;
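Both notes can be sketched with the bodyfat data set and the variables pctfat, age, and sex named in this section. The data set is assumed to exist already, and the series statement is used here as the line plot.

```sas
/* Sort by the x-axis variable so the line connects points from left to right */
proc sort data=bodyfat;
  by age;
run;

proc sgplot data=bodyfat;
  series y=pctfat x=age;               /* line plot, drawn in data set order */
run;

proc sgplot data=bodyfat;
  scatter y=pctfat x=age / group=sex;  /* a different marker style for each value of sex */
run;
```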
Overlaying Plots
It is often useful to combine the information from two or more plots
by overlaying them. The sgplot procedure does this automatically when
a step contains more than one plot statement. For example, a plot to
compare the fits from linear regression and locally weighted
regression could be produced as follows:
proc sgplot data=bodyfat;
reg y=pctfat x=age;
loess y=pctfat x=age/nomarkers;
run;