PART 2: DATA WRANGLING
Understanding DATA Step Programming
You will learn:
How the DATA step works
General form of the programming statements
Programming techniques
Reading Materials
Chapters 6 and 7, Step-by-Step Programming with Base SAS Software. 2001. Cary, NC: SAS Institute Inc.
You can find the programs and data used in these examples at the following link:
https://documentation.sas.com/?docsetId=basess&docsetTarget=titlepage.htm&docsetVersion=9.4&locale=en
You can also search online for the book's title and choose a link to its online documentation.
Input data used for examples
Note that the input data include a missing value.
Assignment statement
Consider how a new variable is created and assigned the value of a mathematical expression:
Data Tours1;
Run;
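In the assignment statement, the variable on the left of the equals sign is the new variable being created, and the expression on the right supplies its value.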
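A minimal sketch of a complete DATA step built around one assignment statement. It assumes a hypothetical input data set named Tours (the values are invented for illustration, not taken from the book) and a hypothetical new variable named TotalCost. The SET statement reads Tours, and the assignment statement creates TotalCost in the new data set Tours1.

```sas
/* Hypothetical input data (invented values, not the book's); '.' marks a missing value */
data Tours;
   length Country $ 12 Vendor $ 10;
   input Country $ Nights AirCost LandCost Vendor $;
   datalines;
Japan 8 980 1020 Express
Greece 12 . 740 Express
Peru 9 430 510 Mundial
;
run;

data Tours1;
   set Tours;                        /* read each observation of Tours */
   TotalCost = AirCost + LandCost;   /* new variable = arithmetic expression */
run;

proc print data=Tours1;
run;
```

With these hypothetical rows, TotalCost is 2000 for Japan and 940 for Peru; for Greece it is missing because AirCost is missing.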
Adding information to some observations but not others
Using IF-THEN/ELSE statements
Note that the 1st and 3rd observations have a missing value for the BonusPoints variable, because none of the IF-THEN/ELSE conditions assigned them a value.
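A sketch of IF-THEN/ELSE assigning a value to only some observations. The data, the conditions, and the BonusPoints values are hypothetical. An observation that matches no condition is left with a missing (blank) BonusPoints, as described above. The LENGTH statement reserves room for the longest value; storage length is discussed later in this section.

```sas
/* Hypothetical input data (invented values, not the book's) */
data Tours;
   length Country $ 12 Vendor $ 10;
   input Country $ Nights AirCost LandCost Vendor $;
   datalines;
Japan 8 980 1020 Express
Greece 12 . 740 Express
Peru 9 430 510 Mundial
;
run;

data Bonus;
   set Tours;
   length BonusPoints $ 12;           /* room for the longest value below */
   if Vendor = 'Mundial' then BonusPoints = 'Yes';
   else if Nights > 10 then BonusPoints = 'Long tour';
   /* Japan matches neither condition, so BonusPoints stays blank (missing) */
run;
```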
Making uniform changes to data without creating new variables
Notice that AirCost appears on both sides of the equals sign: the statement replaces each existing value of AirCost with the result of the expression.
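A minimal sketch of changing a variable in place, assuming a hypothetical $10 surcharge on every airfare. No new variable is created; each value of AirCost is overwritten. A missing AirCost stays missing, because arithmetic on a missing value yields a missing result.

```sas
/* Hypothetical input data (invented values, not the book's) */
data Tours;
   length Country $ 12 Vendor $ 10;
   input Country $ Nights AirCost LandCost Vendor $;
   datalines;
Japan 8 980 1020 Express
Greece 12 . 740 Express
Peru 9 430 510 Mundial
;
run;

data Surcharge;
   set Tours;
   AirCost = AirCost + 10;   /* hypothetical surcharge; overwrites AirCost */
run;
```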
Efficient use of variables
Inefficient: creating variables that scatter information
Efficient: using variables to contain maximum information
More information is packed into one variable.
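A sketch contrasting the two approaches, with hypothetical remark text and variable names (ExpressRemark, MundialRemark, Remarks). The first step scatters remarks across one variable per vendor, so most cells are blank; the second keeps every remark in a single Remarks variable.

```sas
/* Hypothetical input data (invented values, not the book's) */
data Tours;
   length Country $ 12 Vendor $ 10;
   input Country $ Nights AirCost LandCost Vendor $;
   datalines;
Japan 8 980 1020 Express
Greece 12 . 740 Express
Peru 9 430 510 Mundial
;
run;

/* Inefficient: one variable per kind of remark, mostly blank */
data Scattered;
   set Tours;
   if Vendor = 'Express' then ExpressRemark = 'Bonus points';
   if Vendor = 'Mundial' then MundialRemark = 'Group discount';
run;

/* Efficient: a single variable holds whichever remark applies */
data Packed;
   set Tours;
   length Remarks $ 14;   /* length of the longest remark */
   if Vendor = 'Express' then Remarks = 'Bonus points';
   else if Vendor = 'Mundial' then Remarks = 'Group discount';
run;
```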
Defining enough storage space for variables
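Use a LENGTH statement if the longest value of a variable does not appear in its first assignment statement. SAS sets the length of a character variable from the first value assigned to it, so longer values assigned later are truncated.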
The LENGTH statement must come before the first assignment statement for that variable.
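A sketch of the truncation problem and its fix, using hypothetical data and a hypothetical Remarks variable. In the first step, the first assignment ('Yes') gives Remarks a length of 3, so 'Ask about discounts' is stored as 'Ask' with no error. In the second step, a LENGTH statement placed before the first assignment reserves 19 characters.

```sas
/* Hypothetical input data (invented values, not the book's) */
data Tours;
   length Country $ 12 Vendor $ 10;
   input Country $ Nights AirCost LandCost Vendor $;
   datalines;
Japan 8 980 1020 Express
Greece 12 . 740 Express
Peru 9 430 510 Mundial
;
run;

/* Without LENGTH: Remarks gets length 3 from its first assignment */
data NoLength;
   set Tours;
   if Vendor = 'Mundial' then Remarks = 'Yes';
   else Remarks = 'Ask about discounts';   /* stored as 'Ask' */
run;

/* With LENGTH before the first assignment: full value is kept */
data WithLength;
   set Tours;
   length Remarks $ 19;
   if Vendor = 'Mundial' then Remarks = 'Yes';
   else Remarks = 'Ask about discounts';
run;
```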
Conditionally deleting observations
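A minimal sketch of conditional deletion, assuming hypothetical data and a hypothetical rule of dropping tours whose airfare is missing. When the DELETE statement executes, SAS stops processing the current observation and does not write it to the output data set.

```sas
/* Hypothetical input data (invented values, not the book's) */
data Tours;
   length Country $ 12 Vendor $ 10;
   input Country $ Nights AirCost LandCost Vendor $;
   datalines;
Japan 8 980 1020 Express
Greece 12 . 740 Express
Peru 9 430 510 Mundial
;
run;

data Priced;
   set Tours;
   if AirCost = . then delete;   /* drop observations with a missing airfare */
run;
```

Here Greece is removed, leaving two observations.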
Working with Numeric Variables (Chapter 7)
You will learn the following:
Input data set for the examples
Numeric variables
Creating new variables by arithmetic expressions
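A sketch of new variables built with the basic arithmetic operators (+, -, *, /). The data and the variable names TotalCost, LandPerNight, LandFor2, and AirMinusLand are hypothetical.

```sas
/* Hypothetical input data (invented values, not the book's) */
data Tours;
   length Country $ 12 Vendor $ 10;
   input Country $ Nights AirCost LandCost Vendor $;
   datalines;
Japan 8 980 1020 Express
Greece 12 . 740 Express
Peru 9 430 510 Mundial
;
run;

data Costs;
   set Tours;
   TotalCost    = AirCost + LandCost;   /* addition */
   AirMinusLand = AirCost - LandCost;   /* subtraction */
   LandFor2     = LandCost * 2;         /* multiplication */
   LandPerNight = LandCost / Nights;    /* division */
run;
```

For the hypothetical Japan row this gives TotalCost = 2000, AirMinusLand = -40, LandFor2 = 2040, and LandPerNight = 127.5.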
Understanding how SAS handles missing values
Propagating missing values
When you use a missing value in an arithmetic expression, SAS
sets the result of the expression to missing. If you use that
result in another expression, the next result is also missing.
In SAS, this method of treating missing values is called
propagation of missing values.
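A sketch of propagation with hypothetical data and variable names. AirCost is missing for Greece, so TotalCost is missing, and Deposit, which is computed from TotalCost, is missing as well.

```sas
/* Hypothetical input data (invented values, not the book's) */
data Tours;
   length Country $ 12 Vendor $ 10;
   input Country $ Nights AirCost LandCost Vendor $;
   datalines;
Japan 8 980 1020 Express
Greece 12 . 740 Express
Peru 9 430 510 Mundial
;
run;

data Propagate;
   set Tours;
   TotalCost = AirCost + LandCost;   /* missing when AirCost is missing */
   Deposit   = TotalCost / 10;       /* uses a missing result, so also missing */
run;
```

SAS typically also writes a NOTE to the log reporting that missing values were generated.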
Calculations Using SAS Functions
Rounding Values
The following assignment statement rounds the value of AirCost to
the nearest $50:
RoundAir = round(AirCost, 50);
The following statement calculates the total cost of each tour,
rounded to the nearest $100:
TotalCostR = round(AirCost + LandCost, 100);
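The two ROUND statements above, placed in a complete DATA step with hypothetical data. The second argument of ROUND is the rounding unit, so the result is the nearest multiple of 50 or 100.

```sas
/* Hypothetical input data (invented values, not the book's) */
data Tours;
   length Country $ 12 Vendor $ 10;
   input Country $ Nights AirCost LandCost Vendor $;
   datalines;
Japan 8 980 1020 Express
Greece 12 . 740 Express
Peru 9 430 510 Mundial
;
run;

data Rounded;
   set Tours;
   RoundAir   = round(AirCost, 50);              /* nearest multiple of 50 */
   TotalCostR = round(AirCost + LandCost, 100);  /* nearest multiple of 100 */
run;
```

For Peru, 430 rounds to 450 and 430 + 510 = 940 rounds to 900; for Japan, 980 rounds to 1000 and 2000 stays 2000.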
SAS contains around 280 built-in numeric functions, such as ROUND and SUM.
Calculating total cost when some values are missing
An assignment statement that creates the TotalCost variable produces a missing value whenever one of its components is missing, for example:
TotalCost = AirCost + LandCost;
The SUM function, however, ignores missing arguments and adds only the nonmissing values (it returns a missing value only if every argument is missing):
CostSum = sum(AirCost, LandCost);
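A side-by-side sketch of the two statements above, using hypothetical data. For the row with a missing AirCost, the + operator gives a missing TotalCost, while SUM returns the nonmissing LandCost.

```sas
/* Hypothetical input data (invented values, not the book's) */
data Tours;
   length Country $ 12 Vendor $ 10;
   input Country $ Nights AirCost LandCost Vendor $;
   datalines;
Japan 8 980 1020 Express
Greece 12 . 740 Express
Peru 9 430 510 Mundial
;
run;

data Sums;
   set Tours;
   TotalCost = AirCost + LandCost;       /* missing for Greece */
   CostSum   = sum(AirCost, LandCost);   /* 740 for Greece: missing argument ignored */
run;
```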
Combining functions
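A sketch of nesting one function inside another, with hypothetical data and a hypothetical variable name RoundSum. SAS evaluates the inner SUM first (ignoring missing values) and then passes its result to ROUND.

```sas
/* Hypothetical input data (invented values, not the book's) */
data Tours;
   length Country $ 12 Vendor $ 10;
   input Country $ Nights AirCost LandCost Vendor $;
   datalines;
Japan 8 980 1020 Express
Greece 12 . 740 Express
Peru 9 430 510 Mundial
;
run;

data Combined;
   set Tours;
   /* inner SUM skips missing values; outer ROUND rounds to the nearest $100 */
   RoundSum = round(sum(AirCost, LandCost), 100);
run;
```

For Greece, SUM returns 740, which ROUND turns into 700.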
Logical operators
Input data set used in examples
Comparing numeric variables using logical operators
Note: the Australia tour (more than $2000) is deleted.
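A sketch of comparison and logical operators in IF statements. The data, the $1,500 cutoff, and the Special variable are hypothetical; the slide's own threshold and variables may differ. The first IF uses a comparison operator to delete expensive tours. The second uses AND so that both comparisons must be true.

```sas
/* Hypothetical input data (invented values, not the book's) */
data Tours;
   length Country $ 12 Vendor $ 10;
   input Country $ Nights AirCost LandCost Vendor $;
   datalines;
Japan 8 980 1020 Express
Greece 12 . 740 Express
Peru 9 430 510 Mundial
;
run;

data Affordable;
   set Tours;
   TotalCost = sum(AirCost, LandCost);
   if TotalCost > 1500 then delete;              /* comparison: Japan (2000) is deleted */
   if Vendor = 'Mundial' and Nights < 10 then    /* both conditions must be true */
      Special = 'Yes';
run;
```

SAS also accepts mnemonic comparison operators, so `TotalCost gt 1500` is equivalent to `TotalCost > 1500`.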