EPG1V2_Summary of Lesson 3_ Exploring and Validating Data
Exploring Data
PROC PRINT lists all columns and rows in the input table by default. The OBS= data set option limits the number of rows listed. The VAR statement
limits and orders the columns listed.
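As a minimal sketch of these options, the step below assumes the SASHELP.CARS sample table that ships with most SAS installations; the table and column names are illustrative and not part of the lesson text.

```sas
/* List only the first 10 rows of the table.                      */
/* The VAR statement keeps three columns and sets their order.    */
proc print data=sashelp.cars(obs=10);
    var Make Model MSRP;
run;
```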
PROC MEANS generates simple summary statistics for each numeric column in the input data by default. The VAR statement limits the variables to
analyze.
PROC UNIVARIATE also generates summary statistics for each numeric column in the data by default, but includes more detailed statistics related to
distribution and extreme values. The VAR statement limits the variables to analyze.
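The two procedures can be compared side by side with a short sketch. It again assumes the SASHELP.CARS sample table; the columns chosen are only examples.

```sas
/* Simple summary statistics for two numeric columns. */
proc means data=sashelp.cars;
    var MSRP MPG_City;
run;

/* More detailed distribution statistics and extreme values for one column. */
proc univariate data=sashelp.cars;
    var MSRP;
run;
```

Omitting the VAR statement in either step would analyze every numeric column in the table.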
PROC FREQ creates a frequency table for each variable in the input table by default. You can limit the variables analyzed by using the TABLES
statement.
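A minimal sketch of the TABLES statement, assuming the SASHELP.CARS sample table and its Origin and Type columns:

```sas
/* One frequency table per column listed in the TABLES statement. */
proc freq data=sashelp.cars;
    tables Origin Type;
run;
```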
Filtering Rows
The WHERE statement filters rows. A row is read only if the expression is true for that row; rows for which the expression is false are skipped.
Character values are case sensitive and must be enclosed in quotation marks.
Numeric values are not enclosed in quotation marks and can include only digits, decimal points, and negative signs.
When an expression includes a fixed date value, use the SAS date constant syntax: "ddmmmyyyy"d, where dd represents a 1- or 2-digit day, mmm
represents a 3-letter month abbreviation in uppercase or lowercase, and yyyy represents a 2- or 4-digit year.
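The rules for numeric values and date constants can be sketched as follows. The example assumes the SASHELP.CARS and SASHELP.STOCKS sample tables; the threshold and the date are arbitrary illustration values.

```sas
/* Numeric comparison: the value is written without quotation marks. */
proc print data=sashelp.cars;
    where MSRP > 80000;
    var Make Model MSRP;
run;

/* Date comparison: the fixed date is written as a SAS date constant. */
proc print data=sashelp.stocks;
    where Date >= "01jan2005"d;
    var Stock Date Close;
run;
```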
WHERE Operators
= or EQ
^= or ~= or NE
> or GT
< or LT
>= or GE
<= or LE
SAS Date Constant
"ddMONyyyy"d
IN Operator
WHERE col-name IN(value-1<...,value-n>);
WHERE col-name NOT IN(value-1<...,value-n>);
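A short sketch of both forms of the IN operator, assuming the SASHELP.CARS sample table. The values listed are case sensitive and are assumed to match how they are stored in that table.

```sas
/* Keep rows whose Type matches any value in the list. */
proc print data=sashelp.cars;
    where Type in ("SUV", "Truck");
    var Make Model Type;
run;

/* Keep rows whose Origin matches none of the values in the list. */
proc freq data=sashelp.cars;
    where Origin not in ("USA");
    tables Origin;
run;
```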
Macro Variables
%LET macro-variable=value;
WHERE numvar=&macrovar;
WHERE charvar="&macrovar";
WHERE datevar="&macrovar"d;
A macro variable stores a text string that can be substituted into a SAS program.
The %LET statement defines the macro variable name and assigns a value.
Macro variables can be referenced in a program by preceding the macro variable name with an ampersand (&).
If a macro variable reference is used inside quotation marks, double quotation marks must be used, because references inside single quotation marks are not resolved.
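The three reference patterns (numeric, character, and date) might look like this in practice. This is a sketch only: the macro variable names CarType, MaxPrice, and StartDate are hypothetical, and the SASHELP.CARS and SASHELP.STOCKS sample tables are assumed to be available.

```sas
/* Define the values once so they can be changed in a single place. */
%let CarType=Wagon;
%let MaxPrice=30000;
%let StartDate=01jan2005;

/* Character reference in double quotation marks; numeric reference without. */
proc print data=sashelp.cars;
    where Type="&CarType" and MSRP<=&MaxPrice;
    var Make Model Type MSRP;
run;

/* Date reference: double quotation marks followed by the letter d. */
proc print data=sashelp.stocks;
    where Date >= "&StartDate"d;
    var Stock Date Close;
run;
```

Changing a value in a %LET statement updates every step that references that macro variable.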
Formatting Columns
Formats are used to change the way values are displayed in data and reports.
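See the SAS Language Elements documentation for a list of available SAS formats.
<$>format-name<w>.<d>
In this syntax, $ indicates a character format, w is the total display width, the period is required, and d is the number of decimal places for numeric formats.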
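Formats are typically applied with the FORMAT statement inside a procedure step. The sketch below assumes the SASHELP.STOCKS sample table; DATE9., DOLLAR10.2, and COMMA15. are standard SAS formats chosen only for illustration.

```sas
/* Display dates as ddMONyyyy, prices with a dollar sign and two decimals, */
/* and volume with comma separators. The stored values are not changed.    */
proc print data=sashelp.stocks(obs=5);
    var Stock Date Close Volume;
    format Date date9. Close dollar10.2 Volume comma15.;
run;
```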
Sorting Data and Removing Duplicates
PROC SORT sorts the rows in a table on one or more character or numeric columns.
The OUT= option specifies an output table. Without this option, PROC SORT changes the order of rows in the input table.
The BY statement specifies one or more columns in the input table whose values are used to sort the rows. By default, SAS sorts in ascending order.
The NODUPKEY option keeps only the first row for each unique value of the column(s) listed in the BY statement.
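The NODUPKEY option together with the BY _ALL_ statement removes rows that are entirely duplicated: sorting on every column places identical rows next to each other, and only the first row of each set is kept.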
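The sorting options described here can be sketched in three steps. The example assumes the SASHELP.CARS sample table; the output table names in the WORK library are hypothetical. OUT= is used in each step so that the input table is left unchanged.

```sas
/* Sort in ascending order by Make, then by MSRP within each Make. */
proc sort data=sashelp.cars out=work.cars_sorted;
    by Make MSRP;
run;

/* Keep only the first row for each unique value of Make. */
proc sort data=sashelp.cars out=work.one_per_make nodupkey;
    by Make;
run;

/* Remove rows that are duplicated across every column. */
proc sort data=sashelp.cars out=work.cars_nodups nodupkey;
    by _all_;
run;
```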
Copyright © 2023 SAS Institute Inc., Cary, NC, USA. All rights reserved.