Chapter 1 and 2

This document provides an overview of data preparation for analytics using SAS. It discusses SAS programming essentials including running SAS programs, fundamental concepts, and debugging. It also covers making use of SAS Enterprise Guide for programming. The document outlines tasks involving business data from an airline, including importing data, creating reports, and data analysis using SAS procedures.


ISQS 6339, Data Management & Business Intelligence

Data Preparation for Analytics Using SAS
Zhangxi Lin
Texas Tech University

ISQS 6347, Data & Text Mining

Outline

An overview of data preparation for analytics
SAS Programming Essentials
   Running SAS programs
   Mastering fundamental concepts
   SAS program debugging
Make use of SAS Enterprise Guide for programming

Structure and Components of Business Intelligence

Overview: From Data Warehousing to Data Analysis

Previous major topics in data warehousing (using SQL Server 2008)
   Dimensional model design
   ETL
   Cube design and OLAP
Data analysis topics (using SAS)
   Data preparation
      Analytic business questions
      Data format and data conversion
   Data cleansing
   Data exploration
   Data analysis
   Data visualization

Components of the SAS System

Diagram labels:
Base SAS
Reporting and Graphics
Data Access and Management
User Interface
Analytical
Application Development
Visualization and Discovery
Business Solutions
Web Enablement

SAS Programming Essentials

Find more information at http://support.sas.com

Data-driven Tasks

The functionality of the SAS System is built around four data-driven tasks common to virtually any application:
Data access
Data management
Data analysis
Data presentation

Turning Data into Information

Process of delivering meaningful information
80% data-related
   Access
   Scrub
   Transform
   Manage
   Store and retrieve
20% analysis

Turning Data into Information

Data -> DATA Step -> SAS Data Sets -> PROC Steps -> Information

Design of the SAS System: MultiVendor Architecture

90% independent, 10% dependent
Platforms: PC, Workstation, Servers/Midrange, Mainframe, Super Computer

Design of the SAS System: MultiEngine Architecture

Diagram labels: DATA, DB2, Teradata, SAP, dBase, ORACLE, SYBASE, Microsoft Excel

SAS Programming Level I

Fundamentals (ch1-3)
Producing list reports (ch4)
Enhancing output (ch5)
Creating data sets (ch6)
Data step programming (ch7)
   Reading data
   Creating variables
   Conditional processing
   Keeping and dropping variables
   Reading Excel files
Combining SAS data sets (ch8)
Producing summary reports (ch9)
SAS graphing (ch10)

Course Scenario

In this course, you work with business data from International Airlines (IA). The various kinds of data that IA maintains are listed below:
flight data
passenger data
cargo data
employee data
revenue data

Course Scenario

The following are some tasks that you will perform:
importing data
creating a list of employees
producing a frequency table of job codes
summarizing data
creating a report of salary information

SAS Programs

A SAS program is a sequence of steps that the user submits for execution.

Raw Data -> DATA Step -> SAS Data Set -> PROC Step -> Report

DATA steps are typically used to create SAS data sets.
PROC steps are typically used to process SAS data sets (that is, generate reports and graphs, edit data, and sort data).

SAS Programs

DATA Step
data work.staff;
   infile 'raw-data-file';
   input LastName $ 1-20 FirstName $ 21-30
         JobTitle $ 36-43 Salary 54-59;
run;

PROC Steps
proc print data=work.staff;
run;

proc means data=work.staff;
   class JobTitle;
   var Salary;
run;

Step Boundaries
SAS steps begin with either of the following:
DATA statement
PROC statement
SAS detects the end of a step when it encounters
one of the following:
a RUN statement (for most steps)
a QUIT statement (for some procedures)
the beginning of another step (DATA statement
or PROC statement)
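
The three kinds of step boundary listed above can be sketched as follows. This is a minimal illustration that assumes the work.staff data set used elsewhere in these slides already exists. PROC SQL is used only as one familiar procedure that ends with QUIT; it is not part of the original slides.

```sas
/* Step 1: PROC PRINT has no RUN statement of its own.      */
/* It ends when SAS reaches the next PROC statement.        */
proc print data=work.staff;

/* Step 2: PROC MEANS ends at its RUN statement. */
proc means data=work.staff;
   var Salary;
run;

/* Step 3: PROC SQL ends at its QUIT statement. */
proc sql;
   select JobTitle, mean(Salary) as AvgSalary
      from work.staff
      group by JobTitle;
quit;
```

Relying on the next step to close the previous one works, but an explicit RUN or QUIT makes the boundaries easier to see in the log.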


Step Boundaries

data work.staff;
   infile 'raw-data-file';
   input LastName $ 1-20 FirstName $ 21-30
         JobTitle $ 36-43 Salary 54-59;
run;
proc print data=work.staff;
proc means data=work.staff;
   class JobTitle;
   var Salary;
run;

Running a SAS Program

You can invoke SAS in the following ways:
interactive windowing mode (SAS windowing environment)
interactive menu-driven mode (SAS Enterprise Guide, SAS/ASSIST, SAS/AF, or SAS/EIS software)
batch mode
noninteractive mode

Preparation for SAS Programming

Data sets: \SAS-Programming
Create a user-defined library reference

Statement
LIBNAME libref 'SAS-data-library' <options>;

Example
LIBNAME ia 'c:\workshop\winsas\prog1';

Two-level SAS file names
libref.filename
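
As a small sketch of how a libref is used once it is assigned, the following lists the members of the library. It assumes the folder from the example above exists on your machine; adjust the path otherwise.

```sas
/* Assign the libref; the folder must already exist */
libname ia 'c:\workshop\winsas\prog1';

/* List the data sets in the library.                        */
/* _ALL_ selects every member; NODS suppresses the detailed  */
/* descriptor output for each individual data set.           */
proc contents data=ia._all_ nods;
run;
```

Any data set shown in the listing can then be referenced with a two-level name of the form ia.filename.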

SAS Programming Essentials

Demo: c02s2d1
Exercise: c02ex1

Browsing the Descriptor Portion

General form of the CONTENTS procedure:

PROC CONTENTS DATA=SAS-data-set;
RUN;

Example:
proc contents data=work.staff;
run;

c02s3d1

SAS Data Sets: Data Portion

The data portion of a SAS data set is a rectangular table of character and/or numeric data values.

LastName   FirstName  JobTitle  Salary
TORRES     JAN        Pilot     50000
LANGKAMM   SARAH      Mechanic  80000
SMITH      MICHAEL    Mechanic  40000
WAGSCHAL   NADJA      Pilot     77500
TOERMOEN   JOCHEN     Pilot     65000

The first row holds the variable names; the remaining rows hold the variable values.
LastName, FirstName, and JobTitle hold character values. Salary holds numeric values.

Variable names are part of the descriptor portion, not the data portion.

SAS Variable Values

There are two types of variables:
character
   contain any value: letters, numbers, special characters, and blanks. Character values are stored with a length of 1 to 32,767 bytes. One byte equals one character.
numeric
   stored as floating point numbers in 8 bytes of storage by default. Eight bytes of floating point storage provide space for 16 or 17 significant digits. You are not restricted to 8 digits.
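
The two variable types can be seen in a short sketch like the one below. The data set name work.types and the variable BigValue are made up for illustration; LastName and Salary mirror the variables used in these slides.

```sas
data work.types;
   length LastName $ 20;          /* character variable, 20 bytes     */
   LastName = 'TORRES';
   Salary   = 50000;              /* numeric, 8 bytes by default      */
   BigValue = 123456789012345;    /* 15 digits: more than 8 digits    */
   format BigValue 15.;           /* wide enough to show every digit  */
run;

/* The descriptor portion reports each variable's type and length */
proc contents data=work.types;
run;

proc print data=work.types;
run;
```

The PROC CONTENTS output should show LastName as a character variable of length 20 and the other two as numeric variables of length 8.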

SAS Data Set and Variable Names

SAS names have these characteristics:
can be up to 32 characters long.
can be uppercase, lowercase, or mixed-case.
are not case sensitive.
must start with a letter or underscore. Subsequent characters can be letters, underscores, or numerals.

Valid SAS Names

Select the valid default SAS names.

data5mon (valid)
5monthsdata (not valid: starts with a numeral)
data#5 (not valid: contains the special character #)
five months data (not valid: contains blanks)
fivemonthsdata (valid)
FiveMonthsData (valid)
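
One way to check candidate names yourself is the Base SAS NVALID function, which tests a string against the naming rules. The sketch below is not part of the original slides; it assumes the default V7 naming rules and writes one line per candidate to the log.

```sas
data _null_;
   length Candidate $ 32;
   do Candidate = 'data5mon', '5monthsdata', 'data#5',
                  'five months data', 'fivemonthsdata', 'FiveMonthsData';
      /* NVALID returns 1 when the string is a valid name under */
      /* the V7 rules and 0 otherwise                            */
      IsValid = nvalid(strip(Candidate), 'v7');
      put Candidate= IsValid=;
   end;
run;
```

STRIP removes the padding blanks that the fixed-length variable adds, so only the name itself is tested.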

Missing Data Values

A value must exist for every variable for each observation. Missing values are valid values.

LastName:  TORRES, LANGKAMM, SMITH, WAGSCHAL, TOERMOEN
FirstName: JAN, SARAH, MICHAEL, NADJA, JOCHEN
JobTitle:  Pilot, Mechanic, Mechanic, Pilot (one observation has a missing value)
Salary:    50000, 80000, ., 77500, 65000

A character missing value is displayed as a blank.
A numeric missing value is displayed as a period.
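
A minimal sketch of how missing values are read and detected follows. The data set name work.missing_demo and the SAMPLE row are hypothetical; in list input, a single period stands for a missing value for both character and numeric variables.

```sas
data work.missing_demo;
   input LastName $ JobTitle $ Salary;
   /* MISSING returns 1 for a missing value of either type */
   TitleMissing  = missing(JobTitle);
   SalaryMissing = missing(Salary);
   datalines;
TORRES Pilot 50000
SMITH Mechanic .
SAMPLE . 65000
;
run;

/* The missing JobTitle prints as a blank, the missing Salary as a period */
proc print data=work.missing_demo;
run;
```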

Browsing the Data Portion

The PRINT procedure displays the data portion of a SAS data set.
By default, PROC PRINT displays the following:
all observations
all variables
an Obs column on the left side

Browsing the Data Portion

General form of the PRINT procedure:

PROC PRINT DATA=SAS-data-set;
RUN;

Example:
proc print data=work.staff;
run;

c02s3d1
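
Each PROC PRINT default can be overridden. The sketch below goes slightly beyond these slides and assumes the work.staff data set already exists: the OBS= data set option limits the observations, the VAR statement selects the variables, and the NOOBS option drops the Obs column.

```sas
proc print data=work.staff(obs=3) noobs;   /* first 3 observations, no Obs column */
   var LastName JobTitle Salary;            /* only these variables, in this order */
run;
```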

SAS Data Set Terminology

SAS documentation and text in the SAS windowing environment use the following terms interchangeably:

SAS Data Set = SAS Table
Variable     = Column
Observation  = Row

SAS Syntax Rules


SAS statements have these characteristics:
usually begin with an identifying keyword
always end with a semicolon
data work.staff;
infile 'raw-data-file';
input LastName $ 1-20 FirstName $ 21-30
JobTitle $ 36-43 Salary 54-59;
run;
proc print data=work.staff;
run;
proc means data=work.staff;
class JobTitle;
var Salary;
run;


SAS Syntax Rules

SAS statements are free-format.
One or more blanks or special characters can be used to separate words.
They can begin and end in any column.
A single statement can span multiple lines.
Several statements can be on the same line.

Unconventional Spacing
data work.staff;
infile 'raw-data-file';
input LastName $ 1-20 FirstName $ 21-30
JobTitle $ 36-43 Salary 54-59;
run;
proc means data=work.staff;
class JobTitle;
var
Salary;run;

SAS Syntax Rules

Good spacing makes the program easier to read.

Conventional Spacing
data work.staff;
   infile 'raw-data-file';
   input LastName $ 1-20 FirstName $ 21-30
         JobTitle $ 36-43 Salary 54-59;
run;

proc print data=work.staff;
run;

proc means data=work.staff;
   class JobTitle;
   var Salary;
run;

SAS Comments

Type /* to begin a comment.
Type your comment text.
Type */ to end the comment.

/* Create work.staff data set */
data work.staff;
   infile 'raw-data-file';
   input LastName $ 1-20 FirstName $ 21-30
         JobTitle $ 36-43 Salary 54-59;
run;

/* Produce listing report of work.staff */
proc print data=work.staff;
run;

c02s3d2
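
SAS also accepts a second comment style that the slide does not show: a statement that begins with an asterisk and ends with a semicolon. A short sketch, assuming work.staff exists:

```sas
* Statement-style comment: starts with an asterisk, ends with a semicolon;
proc print data=work.staff;   /* a block comment can share a line with code */
run;

/* Block comments can also span
   several lines, which makes them handy
   for temporarily disabling code. */
```

Because a statement-style comment ends at the first semicolon, it cannot contain one, whereas a /* */ comment can.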

Syntax Errors
Syntax errors include the following:
misspelled keywords
missing or invalid punctuation
invalid options
daat work.staff;
infile 'raw-data-file';
input LastName $ 1-20 FirstName $ 21-30
JobTitle $ 36-43 Salary 54-59;
run;

proc print data=work.staff


run;
proc means data=work.staff average max;
class JobTitle;
var Salary;
run;

Debugging a SAS Program

c02s4d1.sas
userid.prog1.sascode(c02s4d1)
c02s4d2.sas
userid.prog1.sascode(c02s4d2)

This demonstration illustrates how to submit a SAS program that contains errors, diagnose the errors, correct the errors, and save the corrected program.

Recall a Submitted Program

Program statements accumulate in a recall buffer each time you issue a SUBMIT command.

Submit Number 1
daat work.staff;
infile 'raw-data-file';
input LastName $ 1-20 FirstName $ 21-30
JobTitle $ 36-43 Salary 54-59;
run;
proc print data=work.staff
run;
proc means data=work.staff average max;
class JobTitle;
var Salary;
run;

Submit Number 2
data work.staff;
infile 'raw-data-file';
input LastName $ 1-20 FirstName $ 21-30
JobTitle $ 36-43 Salary 54-59;
run;
proc print data=work.staff;
run;
proc means data=work.staff mean max;
class JobTitle;
var Salary;
run;

Recall a Submitted Program

Issue the RECALL command once to recall the most recently submitted program.

data work.staff;
infile 'raw-data-file';
input LastName $ 1-20 FirstName $ 21-30
JobTitle $ 36-43 Salary 54-59;
run;
proc print data=work.staff;
run;
proc means data=work.staff mean max;
class JobTitle;
var Salary;
run;

Submit Number 2 statements are recalled.

Recall a Submitted Program

Issue the RECALL command again to recall Submit Number 1 statements.

Submit Number 1
daat work.staff;
infile 'raw-data-file';
input LastName $ 1-20 FirstName $ 21-30
JobTitle $ 36-43 Salary 54-59;
run;
proc print data=work.staff
run;
proc means data=work.staff average max;
class JobTitle;
var Salary;
run;

Submit Number 2
data work.staff;
infile 'raw-data-file';
input LastName $ 1-20 FirstName $ 21-30
JobTitle $ 36-43 Salary 54-59;
run;
proc print data=work.staff;
run;
proc means data=work.staff mean max;
class JobTitle;
var Salary;
run;

Exercise 8: Basic SAS Programming

Define libraries IA and Out.
Go through all SAS programs in Chapters 2-5.
Write a SAS program to read a dataset created by yourself, or simply use Person0.txt in \\TechShare\coba\d\ISQS3358\OtherDatasets\ . The dataset is output to your library Out.
Try to apply whatever SAS features in Chapter 5 of Prog-I to generate a nice-looking report.
Go through all exercises for Ch 2, 3, 4, 5, 6 (answer keys are available, so no need to submit the results).

Hands-on exercise

Write a SAS program to calculate the number of days passed in 2012 up to 3/3/2012. The input is in the date9. format:
01JAN2012 03MAR2012
Answer: 62 days
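
One possible solution is sketched below; it is not the official answer key. SAS stores dates as day counts, so subtracting two date values gives the number of days between them. The variable names StartDate, EndDate, and DaysPassed are chosen for illustration.

```sas
data _null_;
   /* The colon modifier applies the DATE9. informat with list input */
   input StartDate :date9. EndDate :date9.;
   DaysPassed = EndDate - StartDate;   /* difference of two day counts */
   put DaysPassed= 'days';             /* written to the SAS log       */
   datalines;
01JAN2012 03MAR2012
;
run;
```

The count works out as 31 days in January plus 29 in February (2012 is a leap year) plus 2 in March, which matches the stated answer of 62.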

Making Use of SAS Enterprise Guide Code

Import a text file
   Example: Orders.txt
Import an Excel file
   Example: SupplyInfo.xls
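
For comparison with the code that Enterprise Guide generates, a hand-written text import might look like the sketch below. Everything specific in it is an assumption: the folder path is hypothetical, the file is assumed to be comma-delimited with variable names in the first row, and the code Enterprise Guide actually produces may look quite different.

```sas
/* Hypothetical path and delimiter; adjust both to match your copy of Orders.txt */
proc import datafile='C:\your-folder\Orders.txt'
            out=work.orders
            dbms=dlm
            replace;
   delimiter=',';      /* assumed delimiter                     */
   getnames=yes;       /* assumes the first row holds the names */
run;

proc print data=work.orders(obs=5);
run;
```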

Learn from Examples

SAS Help
   Contents -> Learning to use SAS -> Sample SAS Programs -> Base SAS
Base Usage Guide Examples
   Chapter 3, 4

Import an Excel Sheet

proc import out=work.commrex
            datafile="C:\Lin\Shared\ISQS6339\Commrex_3358.xls"
            dbms=excel
            replace;
   sheet="Company";
   getnames=yes;
   mixed=no;
   scantext=yes;
   usedate=yes;
   scantime=yes;
run;

proc print data=work.commrex;
run;

Excel SAS/ACCESS LIBNAME Engine

libname xlsdata 'C:\Lin\Shared\ISQS6339\Commrex_3358.xls';
proc print data=xlsdata.New1;
run;

Exercise 8: SAS Data Step Programming

http://zlin.ba.ttu.edu/6339/ExerciseSASProgramming.htm