Introducing SAS Software: Acknowledgements to David Williams and Caroline Brophy

This document introduces SAS software. It covers the SAS environment, files, programs, steps, and data sets, and shows how to get data into SAS, create variables, screen data for errors, and deal with data errors.

Introducing SAS software

Acknowledgements to
David Williams
Caroline Brophy
Statistics in Science

Need to know
SAS environment
SAS files (datasets, catalogs etc) & libraries
SAS programs
How to:
Get data in
Manipulate data
Get results out

SAS software environment

SAS Windows (SAS 9)

Some (!) SAS windows

Editor: where code is written or imported, and submitted
Log: what happened, including what went wrong
Output: results of program procedures that produce output
Explorer: shows libraries (SAS & Windows), their files, and where you can see data, graphs
Results: shows how the output is made up of tables, graphs, datasets etc
Notepad: a useful place to keep bits of code

SAS software programs

SAS Programs
data one;
input x y;
datalines;
-3.2 0.0024
-3.1 0.0033
. . .
;
run;
proc print data = one (obs = 5);
run;
proc means data = one;
run;

DATA step: creates a SAS data set
PROC steps: process data in a data set

Step Boundaries
SAS steps begin with a
  DATA statement, or
  PROC statement.

SAS detects the end of a step when it encounters
  a RUN statement (for most steps)
  a QUIT statement (for some procedures)
  the beginning of another step (DATA statement or PROC statement).

Recommendation: use RUN; at end of each step

Step Boundaries
data seedwt;
  input oz $ rad wt;
  datalines;
Low 118.4 0.7
High 109.1 1.3
Low 215.2 2.9
;
run;
/* no RUN here: this step ends where the next PROC statement begins */
proc print data = seedwt;
proc means data = seedwt;
  class oz;
  var rad wt;
run;

Submitting a SAS Program


When you execute a SAS program, the output generated by SAS is divided into two major parts:
SAS log: contains information about the processing of the SAS program, including any warning and error messages.
SAS output: contains reports generated by SAS procedures and DATA steps.

Recommended steps!
1) Submit all (or selected) code by
   F4, or
   clicking on the runner in the toolbar
2) Read log
3) Look in output window if you expect code to produce output
4) Problems
   Bad syntax
   Missing ; at end of statement
   Missing quote at end of title (nasty!)

Improved output - HTML


Tools -> Options -> Preferences -> Results

Do this & resubmit code

Check HTML output in Results Window

SAS data sets

SAS data sets


SAS procedures (PROC steps) process data from SAS data sets
Need to know (briefly!)
What a SAS data set looks like
How to get our data into a SAS data set

SAS data sets


live in libraries
have a descriptor part (with useful info)
have a data part which is a rectangular table of character and/or numeric data values (rows called observations)
have names with syntax <libname.>datasetname
libname defaults to work if omitted

work library
SAS data sets with a single part name like oz, wp or mybestdata99
1) are stored in the work library
2) can be referenced e.g. as mybestdata99 or work.mybestdata99
3) are deleted at end of SAS session!

Don't lose your data!
Keep the SAS program that read the data from its original source

. . . More later!
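The naming rules above can be sketched as follows. This is a minimal illustration, not part of the original slides: the library name mylib and the folder C:\mydata are hypothetical, and the folder is assumed to exist already.

```sas
/* one-level name: stored in the temporary work library */
data mybestdata99;
  x = 1;
run;

/* the same data set, referenced by its two-level name */
proc print data = work.mybestdata99;
run;

/* hypothetical permanent library (assumed folder, must already exist) */
libname mylib "C:\mydata";

/* copy into the permanent library so it survives the end of the session */
data mylib.mybestdata99;
  set work.mybestdata99;
run;
```

Keeping the program that reads the original source remains the safer habit, since the data set can then always be rebuilt.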

Viewing descriptor & data


/* view descriptor part */
proc contents data = wp;
run;
/* view data part */
proc print data = work.wp;
run;
Alternatively:
Use SAS Explorer: Open (for data) Properties (for descriptor)
Properties is not as clear as CONTENTS
SAS variables
There are two types of variables:
character: contain any value: letters, numbers, special characters, and blanks.
  Character values are stored with a length of 1 to 32,767 bytes (default is 8).
  One byte equals one character.
numeric: stored as floating point numbers in 8 bytes of storage by default.
  Eight bytes of floating point storage provide space for 16 or 17 significant digits.
  You are not restricted to 8 digits.
  Don't change the 8 byte length!
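When a character value may be longer than the default 8 bytes, a LENGTH statement placed before INPUT reserves more room. The sketch below is illustrative only: the data set name oz_long and the value VeryHighDose are made up for the example.

```sas
data oz_long;
  /* assumed maximum of 12 characters; must come before INPUT */
  length oz $ 12;
  input oz $ rad wt;
  datalines;
VeryHighDose 118.4 0.7
Low 109.1 1.3
;
run;

/* the descriptor part should now list oz with a length of 12 */
proc contents data = oz_long;
run;
```

Without the LENGTH statement, list input would keep only the first 8 characters of the long value.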

SAS variables

OUTPUT
The CONTENTS Procedure
Alphabetic List of Variables and Attributes

#  Variable  Type  Len
1  oz        Char    8
2  rad       Num     8
3  wt        Num     8

SAS names
for data sets & variables
can be up to 32 characters long.
can be uppercase, lowercase, or mixed-case but are not case sensitive!
must start with a letter or underscore. Subsequent characters can be letters, underscores, or numeric digits - no %$!*&#@ or spaces.

Missing Data Values


A value must exist for every variable for each observation.
Missing values are valid values.

LastName   FirstName  JobTitle  Salary
TORRES     JAN        Pilot      50000
LANGKAMM   SARAH      Mechanic   80000
SMITH      MICHAEL    Mechanic       .
WAGSCHAL   NADJA      Pilot      77500
TOERMOEN   JOCHEN                65000

A character missing value is displayed as a blank.
A numeric missing value is displayed as a period.
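As a small sketch of how a numeric missing value is entered and counted, the step below reads two of the rows above with list input, where a period stands for the missing salary. The data set name staff is an assumption for illustration.

```sas
data staff;
  input LastName $ Salary;
  datalines;
TORRES 50000
SMITH .
;
run;

/* N counts non-missing values, NMISS counts missing ones */
proc means data = staff n nmiss mean;
  var Salary;
run;
```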

SAS syntax
Not case sensitive
Each statement usually begins with a keyword and ends with ;
Common Errors:
Forgetting the ;
Misspelt or wrong keyword
Missing final quote in title

title "Woodpecker Habitat; /* quote mark missing */

title "Woodpecker Habitat";

Comments
1. Type /* to begin a comment.
2. Type your comment text.
3. Type */ to end the comment.

To comment selected typed text remember: Ctrl+/

Alternative:
* comment ;

SAS

Creating a SAS data set

Getting data in!

Consider 2 methods
1) Data in program (briefly!)
2) Data in Excel workbook

Getting data in!

Data in program file:
data oz;
input oz $ rad wt;
datalines;
Low 118.4 0.7
High 109.1 1.3
Low 215.2 2.9
. . .
;
run;

Note:
1. oz is a text variable so requires $
2. No missing values
3. Values of oz
   don't contain spaces
   are at most 8 characters long

Getting data in!

from Excel
Use IMPORT wizard, saving the generated program to reduce future clicking!
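The program saved from the wizard is typically a PROC IMPORT step along the lines of the sketch below. The workbook path and sheet name are hypothetical, and DBMS=XLSX assumes a SAS installation licensed to read .xlsx files; older setups may need a different DBMS value.

```sas
/* hypothetical workbook path and sheet name */
proc import datafile = "C:\mydata\seeds.xlsx"
            out = work.oz
            dbms = xlsx
            replace;
  sheet = "Sheet1";
  getnames = yes;   /* take variable names from the first row */
run;

/* check what was actually read */
proc contents data = work.oz;
run;
```

Resubmitting this step later rebuilds work.oz without going through the wizard again.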

Creating new variables


Adding a new variable to an existing SAS data
set (say work.old)
1. Use set
2. Give definition of new variable
data new;
/* read data from work.old */
set old;
y2 = y**2;
ly = log(y);
ly_base10 = log10(y);
t1 = (treat = 1);
run;

Data set: work.new
Columns: Obs, treat, y, ysquared, logy, logy_base10, t1 (Obs, treat and t1 values not shown)

    y   ysquared      logy  logy_base10
 10.0     100.00   2.30259            1
100.0   10000.00   4.60517            2
-10.0     100.00         .            .
  0.0       0.00         .            .
  0.1       0.01  -2.30259           -1

Data Screening

Data Screening
checking input data for gross errors

Use PRINT procedure to scan for obvious anomalies
Use MEANS procedure & examine summary table
  MAXIMUM, MINIMUM reasonable?
  MEAN - near middle of range?
  MISSING VALUES - input or calculation error e.g. log(0)?
  CV (= 100*std.dev/mean) - < 10% for plant growth, between 12 & 30% for animal production variables, > 50% implies skewness for any positive variable
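A screening run of this kind might look like the sketch below. It assumes the seedwt data set from the Step Boundaries example; the choice of statistics simply mirrors the checklist above.

```sas
/* scan the first few observations for obvious anomalies */
proc print data = seedwt (obs = 10);
run;

/* summary table: counts, missing values, range, mean, spread and CV */
proc means data = seedwt n nmiss min max mean std cv;
  var rad wt;
run;
```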

SAS syntax
MEANS syntax

What else should go here?

Dealing with data errors


Check original records
Change mistakes in recording where the correct value is beyond question
Regenerate observations where possible e.g. reweigh sample, redo chemical analysis
With a large body of data in an unbalanced design err on the side of omitting questionable data
Do not proceed until data has been properly cleaned; if necessary perform a number of screening runs
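One way to apply such corrections while leaving the original data set untouched is a DATA step like the sketch below. The corrections themselves are invented for illustration: the observation number, the corrected value and the rule for impossible values are all assumptions, and seedwt is the data set from the Step Boundaries example.

```sas
data seedwt_clean;
  set seedwt;
  /* hypothetical correction confirmed against the original record */
  if _n_ = 2 then wt = 1.3;
  /* hypothetical rule: omit an impossible value by setting it to missing */
  if wt < 0 then wt = .;
run;

/* rerun the screening on the cleaned data */
proc means data = seedwt_clean n nmiss min max mean;
  var rad wt;
run;
```

Writing the corrections as code keeps a record of every change made to the data.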
