
Sas Handbook: By: Luis Montes

The document provides an overview of SAS procedures and statements for data management, sorting/printing/summarizing data, and statistical analysis. It covers various SAS statements and procedures for importing and manipulating data like the DATA, SET, MERGE, FORMAT, and ARRAY statements. Statistical procedures discussed include PROC PRINT, PROC FREQ, PROC MEANS, PROC TTEST and more for tasks like descriptive statistics, hypothesis testing, and modeling.

Copyright
© Attribution Non-Commercial (BY-NC)

SAS HANDBOOK

By: Luis Montes

Math 338 - Introduction to SAS Fall 2013

Table of Contents

1. DATA MANAGEMENT
1.1 Data Step
A. DATA Statement Options
B. Defining Variables
C. Input Statement
D. Datalines Statement
E. Set Statement
F. Merge Statement
G. Length Statement
H. Label Statement
I. If-Else Statement
J. Infile Statement
K. Do Statement
L. Keep-Drop Statements
M. Output Statement
N. Generating Random Numbers
O. Internal Values
P. Format and Informat Statements
Q. File Statement
R. Put Statement
S. Array Statement
1.2 Proc Import Step
A. Proc Import Statement Options
B. Getnames Statement
1.3 Statements Outside of Data and Procedure Steps
A. Libname Statement
B. Options Statement

2. SORTING, PRINTING, AND SUMMARIZING DATA
2.1 Proc Print Step
A. Proc Print Statement Options
B. ID Statement
C. By Statement
D. Sum Statement
E. Title & Footnote Statements
F. Var Statement
G. Sumby Statement
2.2 Proc Frequency Step
A. Proc Frequency Statement Options
B. Weight Statement
C. Tables Statement
D. Where Statement
2.3 Proc Contents Step
A. Proc Contents Statement Options
2.4 Proc Tabulate Step
A. Proc Tabulate Statement Options
B. Class Statement
C. Var Statement
D. Table Statement
2.5 Proc Sort Step
A. Proc Sort Statement Options
2.6 Proc GChart Step
A. Proc GChart Statement Options
B. HBar, VBar, and VBar3D
C. Block Statement
2.7 Proc GPlot Step
A. Proc GPlot Statement Options
B. Plot Statement
C. Symbol Statement
2.8 Proc Format Step
A. Proc Format Statement Options
B. Value Statement
C. Picture Statement

3. STATISTICAL ANALYSIS IN SAS
3.1 Proc Univariate Step
A. Proc Univariate Statement Options
B. Var Statement
C. Histogram Statement
3.2 Proc Means Step
A. Proc Means Statement Options
B. Var Statement
3.3 Proc ttest Step
A. Proc ttest Statement Options
B. Class Statement
C. Var Statement
D. Paired Statement
3.4 Proc Corr Step
A. Proc Corr Statement Options
B. Var Statement
3.5 Proc Reg Step
A. Proc Reg Statement Options
B. Model Statement
C. Plot Statement
3.6 Proc GLM Step
A. Proc GLM Statement Options
B. LSMeans Statement
3.7 Proc Logistic Step
A. Proc Logistic Statement
B. Class Statement
C. Model Statement

1 DATA MANAGEMENT
1.1 DATA STEP
A. Data Statement Options
SYNTAX: DATA DATA-SET-NAME-1 <(DATA-SET-OPTIONS-1)> <... DATA-SET-NAME-N <(DATA-SET-OPTIONS-N)>>;

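-When a data set is named _NULL_, the DATA step runs but does not produce a data set.
-When a data set's name has the form <libname>.<data-set-name>, a permanent data set is created in the folder assigned to that libname. A one-level name (no libname) creates a temporary data set in the WORK library, which is deleted when the SAS session ends.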
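The sketch below shows both cases. The data set name WORK.SQUARES and its variables are hypothetical. The first step writes a temporary data set and uses a KEEP= data set option. The second step uses _NULL_, so nothing is saved and only the log receives output.

```sas
/* Temporary data set (WORK library); KEEP= keeps only N and SQUARE */
data work.squares(keep=n square);
  do n = 1 to 5;
    square = n**2;
    scratch = square + 1;   /* not kept, because of the KEEP= option */
    output;
  end;
run;

/* _NULL_: the step runs, but no data set is created */
data _null_;
  set work.squares end=last;
  total + square;           /* sum statement: TOTAL is retained across iterations */
  if last then put total=;  /* writes total=55 to the log */
run;
```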

B. Defining Variables
SYNTAX (assignment statement): VARIABLE-NAME = VALUE;
-Variables can take alphanumeric (character) values or numeric values. They can also take the output value of a function, such as SMALLEST(1, 3, 1, 2) = 1 (the first-smallest of the listed values).
-Variables are also created when they appear in INPUT, LENGTH, FORMAT, INFORMAT, and similar statements; until a value is read or assigned, they are missing.
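Here is a small example of these points; the data set WORK.VARS and its variable names are hypothetical. GRADE is created by LENGTH but never assigned, so it stays blank (missing).

```sas
data work.vars;
  length grade $ 1;              /* creates GRADE; it stays blank (missing) */
  name  = "Ann";                 /* alphanumeric (character) value */
  score = 88;                    /* numeric value */
  low   = smallest(1, 3, 1, 2);  /* first-smallest of 3, 1, 2 is 1 */
run;
```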

C. Input Statement
SYNTAX: INPUT VARIABLE <SPECIFICATION(S)> <@|@@>;
-Variables are listed and separated by spaces. A specification may follow a variable name.
-$ Follows a variable name and specifies that it is alphanumeric (character).
-i and i-j Column specifications. The former reads a variable from column i only, and the latter reads it from columns i through j.
-@ This is a trailing @. When it is the last item in the INPUT statement, it holds the current input line, so the next INPUT statement in the same iteration of the DATA step continues reading at that spot. Anywhere else in the statement, @ is a pointer control (see @n below).
-@@ This is a double trailing @. It holds the input line across iterations of the DATA step, so several observations can be read from one line.
-@n This is a column pointer. It moves the input reader to column n.
-/ Advances the input reader to the first column of the next line.
-#n This is a line pointer. It moves the input reader to line n.
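The sketch below shows two of these specifications. The data set names and raw data are hypothetical. The first step uses column input. The second uses @@ to read several observations from one line. The data lines start in column 1, so the column ranges line up with the text.

```sas
/* Column input: NAME from columns 1-10, AGE from columns 11-12 */
data work.people;
  input name $ 1-10 age 11-12;
  datalines;
Alice     34
Bob       27
;
run;

/* Double trailing @@: several observations on one line */
data work.pairs;
  input id score @@;
  datalines;
1 90 2 85 3 77
4 60
;
run;
```

The second step creates four observations from two lines.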

D. Datalines Statement
SYNTAX: DATALINES;
-The DATALINES statement is followed by raw data entered by the user; a line containing a semicolon ends the data. The SAS Enhanced Editor highlights these raw data lines in yellow.

-DLM=<dlm> (DELIMITER=) option
This option belongs to the INFILE statement rather than to DATALINES itself. To use it with in-stream data, write INFILE DATALINES DLM=<dlm>; before the INPUT statement. By default SAS treats blanks as delimiters, but it can also use commas (dlm=','), tabs (dlm='09'x), and many other characters.
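Here is a minimal sketch of comma-delimited in-stream data; the data set name and values are hypothetical. The INFILE DATALINES statement supplies the delimiter, and the raw data follows the DATALINES statement.

```sas
data work.measures;
  infile datalines dlm=',';   /* commas separate the values */
  input name $ height weight;
  datalines;
Ann,62,120
Ben,70,165
;
run;
```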

E. Set Statement
SYNTAX: SET DATA-SET-1 <(DATA-SET-OPTION(S))> <... DATA-SET-N <(DATA-SET-OPTION(S))>> <SET-OPTION(S)>;
-Recall that the DATA step is itself a loop. Each time the SET statement executes, it reads one observation (including all of its variables) into the program data vector, where it can be manipulated and then output. When several data sets are listed, SET reads all observations of the first, then all of the second, and so on, stacking them vertically.
-IN=<USER-GENERATED-VARIABLE-NAME> data set option
Written in parentheses after a data set name, this option creates a temporary variable (which we name). It takes the value 1 if that data set contributed the current observation and 0 otherwise.
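The sketch below stacks two small data sets with SET and uses IN= to record where each row came from. All data set and variable names are hypothetical. The IN= variable INA is not written to the output data set, so its value is copied into SECTION.

```sas
data work.class_a;
  input name $ score;
  datalines;
Ann 90
Ben 85
;
run;

data work.class_b;
  input name $ score;
  datalines;
Cal 77
;
run;

/* Rows of CLASS_A come first, then rows of CLASS_B */
data work.all_classes;
  set work.class_a(in=ina) work.class_b;
  if ina then section = "A";
  else section = "B";
run;
```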

F. Merge Statement
SYNTAX: MERGE DATA-SET(S) <(DATA-SET-OPTIONS)>;

-The MERGE statement differs from the SET statement: instead of stacking observations vertically, it combines observations of data sets horizontally, adding variables. A BY <VARIABLE(S)>; statement following the MERGE statement performs a match-merge, which joins observations that share the same BY value. Each data set must first be sorted by the BY variable(s).
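Here is a minimal match-merge sketch with hypothetical data sets keyed by ID. Both inputs are sorted by ID first, because a BY merge requires sorted data.

```sas
data work.names;
  input id name $;
  datalines;
2 Ben
1 Ann
;
run;

data work.scores;
  input id score;
  datalines;
1 90
2 85
;
run;

proc sort data=work.names;  by id; run;
proc sort data=work.scores; by id; run;

/* Each output row combines the variables of both data sets for one ID */
data work.merged;
  merge work.names work.scores;
  by id;
run;
```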

G. Length Statement
SYNTAX: LENGTH VARIABLE-1 <$> LENGTH-1 <... VARIABLE-N <$> LENGTH-N>;
-The LENGTH statement sets the number of bytes used to store a variable: 2-8 or 3-8 bytes for numeric variables (depending on the operating environment) and 1-32767 for alphanumeric variables. Variables can also be defined in the LENGTH statement; placing a $ after a variable name makes it alphanumeric. A variable's length is fixed the first time SAS encounters it, so the LENGTH statement should come before any other statement that uses the variable.
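The example below, with a hypothetical data set, shows the effect of a short length. CODE is declared with 3 bytes, so a longer value assigned to it is truncated.

```sas
data work.lengths;
  length code $ 3 city $ 20;   /* placed before CODE and CITY are first used */
  code = "ABCDE";              /* stored as "ABC": only 3 bytes are available */
  city = "Springfield";
run;
```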

H. Label Statement
SYNTAX: LABEL VARIABLE-1='LABEL-1' <... VARIABLE-N='LABEL-N'>;
-The LABEL statement gives a variable a descriptive label that procedures can display in place of its name. If it is used in a DATA step, the label is stored with the variable in the data set. It can also be used in a procedure step, but then the label applies only to that procedure step.

I. If-Else Statement
SYNTAX: IF LOGICAL-EXPRESSION THEN STATEMENT; <ELSE STATEMENT;>
-SAS evaluates the logical expression after IF. If it is TRUE, SAS executes the statement after THEN. An ELSE statement is optional, but when used it must immediately follow the IF-THEN statement; its statement is executed if the logical expression after IF is FALSE.
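The sketch below chains IF-THEN/ELSE to assign letter grades. The cutoffs and data are hypothetical. Each ELSE applies only when every condition above it was false.

```sas
data work.graded;
  input name $ score;
  if score >= 90 then grade = "A";
  else if score >= 80 then grade = "B";
  else grade = "C";
  datalines;
Ann 95
Ben 83
Cal 70
;
run;
```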

J. Infile Statement
SYNTAX: INFILE FILE-PATH <OPTIONS>;
-The file path points to an external file we want to read into SAS, such as a .txt file. As with in-stream data, DLM=<dlm> can be used as an option here.
-FLOWOVER option
This is the default for INFILE. If the INPUT statement reaches the end of a line before every variable has a value, the input reader continues onto the next line to fill the remaining variables.

-MISSOVER option
If the input reader reaches the end of a line before every variable has a value, it does not move to the next line; the remaining variables are set to missing.

-STOPOVER option
If the input reader reaches the end of a line before every variable has a value, SAS stops the DATA step with an error.

[Figure: screenshot of the MISSOVER, FLOWOVER, and STOPOVER options applied to the data set below.]

1, 2, 3
1, , 3
, 2, 3
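Here is a small sketch of MISSOVER with hypothetical data, where the second line is one value short. INFILE DATALINES is used so that the example needs no external file.

```sas
data work.short_lines;
  infile datalines missover;   /* a short line leaves C missing */
  input a b c;
  datalines;
1 2 3
4 5
6 7 8
;
run;
```

Under the default FLOWOVER, the second observation would instead take c=6 from the third line, and only two observations would be created.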

K. Do Statement
SYNTAX: DO INDEX-VARIABLE = START <TO STOP> <BY INCREMENT>;
SAS-STATEMENT(S);
END;

-Conditional Do Loops (While): DO WHILE (LOGICAL-EXPRESSION); ... END;
SAS executes the statements while the logical expression is true. The expression is checked at the top of each iteration, before the statements are executed, so the loop body may not run at all.

-Conditional Do Loops (Until): DO UNTIL (LOGICAL-EXPRESSION); ... END;
SAS executes the statements until the logical expression becomes true. The expression is checked at the bottom of each iteration, after the statements are executed, so the loop body always runs at least once.

-Iterative Do Loops (Ex. i=1 to 100 by 5)
SAS executes the statements a fixed number of times while stepping an index variable. The BY option designates the increment (1 if omitted).
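The sketch below contrasts the three loop forms; the data set name is hypothetical. The iterative loop outputs i = 1, 6, 11, ..., 96 (20 rows). The two conditional loops start from a value that already fails the test, which shows where each check happens.

```sas
data work.loops;
  do i = 1 to 100 by 5;   /* iterative DO */
    output;
  end;
run;

data _null_;
  n = 10;
  do while (n < 5);       /* tested at the top: the body never runs */
    n = n + 1;
  end;
  m = 10;
  do until (m >= 5);      /* tested at the bottom: the body runs once */
    m = m + 1;
  end;
  put n= m=;              /* log: n=10 m=11 */
run;
```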

L. Keep-Drop Statement

SYNTAX: DROP VARIABLE-1 <... VARIABLE-N>;
KEEP VARIABLE-1 <... VARIABLE-N>;

-The DROP statement excludes all listed variables from the output data set. Variables not listed remain.
-The KEEP statement keeps only the listed variables in the output data set. Variables not listed are dropped.
-KEEP= and DROP= can also be used as data set options, for example in a SET statement: SET DATA-SET(KEEP=VARIABLE(S));
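Here is a minimal sketch of both forms with a hypothetical data set. The DROP statement acts on the output, while the KEEP= data set option restricts what SET reads in the first place.

```sas
data work.full;
  input id a b c;
  datalines;
1 10 20 30
;
run;

data work.dropped;
  set work.full;
  drop b c;                     /* WORK.DROPPED has ID and A */
run;

data work.kept;
  set work.full(keep=id c);     /* only ID and C are read */
run;
```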

M. Output Statement
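SYNTAX: OUTPUT <DATA-SET(S)>;
-Without a data set list, the OUTPUT statement writes the current observation to every data set named in the DATA statement. Otherwise, only the listed data sets receive the current observation. Once a DATA step contains an OUTPUT statement, SAS no longer writes an observation automatically at the end of each iteration.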
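The sketch below splits one loop into two data sets by naming the target in each OUTPUT statement. The data set names are hypothetical.

```sas
data work.evens work.odds;
  do i = 1 to 6;
    if mod(i, 2) = 0 then output work.evens;  /* 2, 4, 6 */
    else output work.odds;                    /* 1, 3, 5 */
  end;
run;
```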

N. Generating Random Numbers


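SYNTAX: VARIABLE = RAND('DISTRIBUTION' <, PARAMETER(S)>);
-The RAND function generates a random number from the given distribution; the distribution name is a quoted string:
RAND('BINOMIAL', p, n) ~ Bin(p,n)
RAND('GEOMETRIC', p) ~ Geom(p)
RAND('POISSON', m) ~ Pois(m)
RAND('UNIFORM') ~ U(0,1)
RAND('BERNOULLI', p) ~ Bern(p)
-Calling CALL STREAMINIT(SEED); before the first RAND call makes the random sequence reproducible.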
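Here is a minimal simulation sketch. The seed 338 and the data set name are arbitrary, hypothetical choices. Each row draws one value from three of the distributions listed above.

```sas
data work.sim;
  call streaminit(338);                 /* fixed seed: same draws on every run */
  do trial = 1 to 1000;
    u     = rand('UNIFORM');            /* U(0,1) */
    coin  = rand('BERNOULLI', 0.5);     /* 0 or 1 */
    heads = rand('BINOMIAL', 0.5, 10);  /* successes in 10 trials, p = 0.5 */
    output;
  end;
run;
```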

O. Internal Values
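_N_ : The number of times the DATA step has iterated. When each iteration reads one observation (for example, with a single SET or INPUT statement), _N_ equals the number of the current observation. It is not written to the output data set.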
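Because _N_ is not saved, it must be copied into an ordinary variable to keep it. A small sketch with hypothetical data:

```sas
data work.numbered;
  input x;
  row = _n_;                                  /* 1, 2, 3 */
  if _n_ = 1 then put "First iteration";      /* runs only once */
  datalines;
5
7
9
;
run;
```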

P. Format and Informat Statements


SYNTAX: FORMAT VARIABLE-1 FORMAT-1 <... VARIABLE-N FORMAT-N>;
INFORMAT VARIABLE-1 INFORMAT-1 <... VARIABLE-N INFORMAT-N>;

-The FORMAT statement changes how a variable's values are displayed without changing the stored values. Format and informat names end in a period (for example, DATE9.). A list of formats can be found at:
http://support.sas.com/documentation/cdl/en/lrdict/64316/HTML/default/viewer.htm#a001263753.htm

-The INFORMAT statement tells SAS how to read raw data into a variable. For example, it converts a date written as 03/15/2012 into a SAS date value. Informats can also be applied in the INPUT statement. Informats for SAS 9.2 can be found at:
http://support.sas.com/documentation/cdl/en/lrdict/64316/HTML/default/viewer.htm#a001239776.htm
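The sketch below shows the two statements working together on hypothetical data. MMDDYY10. reads the raw date text. DATE9. and DOLLAR10. only change how the stored values are displayed.

```sas
data work.hires;
  informat hired mmddyy10.;              /* how to READ the raw date */
  format hired date9. salary dollar10.;  /* how to DISPLAY the values */
  input name $ hired salary;
  datalines;
Ann 03/15/2012 52000
Ben 11/01/2013 48500
;
run;
```

PROC PRINT would show Ann's row as 15MAR2012 and $52,000, while HIRED still stores the number of days since January 1, 1960.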

Q. File Statement
SYNTAX: FILE FILE-PATH <DEVICE-TYPE> <OPTIONS>;
-The FILE statement names the external file that the PUT statements in the DATA step write to. Using the reserved name PRINT (FILE PRINT;) instead of a file path sends the PUT output to the Output window instead of to an external file.

R. Put Statement
SYNTAX: PUT VARIABLE <SPECIFICATION(S)> <@|@@>;
-The PUT statement works like the INPUT statement in reverse: instead of reading raw data, it writes lines to the file named by the most recent FILE statement, or to the SAS log if there is no FILE statement.
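Here is a minimal FILE/PUT sketch. It writes to a temporary external file (fileref OUTTXT, a hypothetical name) so that it runs without choosing a folder. The second step reads the lines back and echoes them to the log, because that step has no FILE statement.

```sas
filename outtxt temp;   /* temporary external file, removed when SAS ends */

data _null_;
  file outtxt;          /* PUT below writes to OUTTXT */
  do i = 1 to 3;
    square = i**2;
    put i square;       /* list output: the two values separated by a blank */
  end;
run;

data _null_;
  infile outtxt;
  input;
  put _infile_;         /* no FILE statement in this step, so this goes to the log */
run;
```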

S. Array Statement
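SYNTAX: ARRAY ARRAY-NAME {SUBSCRIPT} <$> <LENGTH> <ARRAY-ELEMENTS>;
-The ARRAY statement groups variables under a single name, so the same operation can be applied to each of them in a DO loop. The user chooses the name, the subscript (number of elements), whether the elements are alphanumeric (by placing the $ symbol), their length, and the elements themselves. The array name is not a variable; it exists only while the DATA step runs.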
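The sketch below converts three hypothetical Fahrenheit readings to Celsius with two parallel arrays. The first array refers to existing variables. The second creates new ones.

```sas
data work.celsius;
  input t1 t2 t3;              /* temperatures in Fahrenheit */
  array f{3} t1-t3;            /* existing variables */
  array c{3} c1-c3;            /* creates new variables C1-C3 */
  do i = 1 to 3;
    c{i} = (f{i} - 32) * 5 / 9;
  end;
  drop i;
  datalines;
32 212 50
;
run;
```

The result is C1=0, C2=100, and C3=10.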

1.2 PROC IMPORT STEP


A. Proc Import Statement Options
SYNTAX: PROC IMPORT DATAFILE='FILE-PATH' OUT=<LIBNAME.>DATA-SET <(DATA-SET-OPTIONS)>
<DBMS=FILE-TYPE> <REPLACE>;

-The PROC IMPORT step is helpful for reading large external files (given by the file path) into SAS, such as Excel (.xls) files and export (.xpt) files. The OUT= argument names the SAS data set that is produced. The REPLACE option overwrites any existing data set with the same name.

B. GetNames Statement
SYNTAX: GETNAMES=YES|NO;
-This statement, placed inside the PROC IMPORT step, specifies whether PROC IMPORT should take the first row of the input data file as the list of variable names.
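Here is a self-contained sketch that first writes a small CSV file into the WORK library's folder and then imports it. The file name scores.csv and the data set WORK.IMPORTED are hypothetical. GETNAMES=YES turns the header row into the variable names NAME and SCORE.

```sas
/* Create a tiny CSV file so the example does not depend on an existing file */
data _null_;
  file "%sysfunc(pathname(work))/scores.csv";
  put "name,score";
  put "Ann,90";
  put "Ben,85";
run;

proc import datafile="%sysfunc(pathname(work))/scores.csv"
            out=work.imported dbms=csv replace;
  getnames=yes;
run;
```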

1.3 STATEMENTS OUTSIDE OF DATA AND PROCEDURE STEPS


A. Libname Statement
SYNTAX: LIBNAME LIBREF 'FOLDER-PATH';
-The LIBNAME statement assigns a library reference (libref) to a folder, so that DATA steps can store permanent SAS data sets there. A permanent data set is created in a DATA step if it is named LIBREF.DATA-SET, where LIBREF is the name of the library.

B. Options Statement
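SYNTAX: OPTIONS OPTION(S);
-The OPTIONS statement sets system options that stay in effect for the rest of the session, such as the line size (LINESIZE=), page size (PAGESIZE=), and page orientation (ORIENTATION=).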
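The fragment below combines both statements. The folder path C:\SAS\math338 is a placeholder and must be replaced with a folder that exists on your computer. The libref MYLIB and the data set name are also hypothetical.

```sas
/* Placeholder path: replace with an existing folder */
libname mylib "C:\SAS\math338";

/* System options stay in effect until changed */
options linesize=80 pagesize=60 orientation=landscape nodate nonumber;

/* Two-level name: saved permanently in the folder assigned to MYLIB */
data mylib.scores;
  input name $ score;
  datalines;
Ann 90
Ben 85
;
run;
```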

2 SORTING, PRINTING, AND SUMMARIZING DATA


2.1 PROC PRINT STEP
A. Proc Print Statement Options
SYNTAX: PROC PRINT <OPTION(S)>;
-The PROC PRINT step lists the observations of a data set, with several options for controlling the listing. The PROC PRINT statement itself has a few options:
DATA=data-set Specifies which data set to print.
LABEL Uses variable labels as column headings, whether they were created in the data set's DATA step or in this PROC PRINT step.
NOOBS Removes the observation numbers from the print output.
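The sketch below uses the three options with a hypothetical sales data set. The label stored in the DATA step is used as a column heading because of LABEL. NOOBS removes the Obs column, and the TITLE and FOOTNOTE text is placeholder text.

```sas
data work.sales;
  input region $ rep $ amount;
  label amount = "Sales (USD)";
  datalines;
East Ann 100
East Ben 150
West Cal 200
;
run;

title1 "Quarterly Sales";
footnote1 "Example data only";
proc print data=work.sales label noobs;
  var rep amount;        /* print only these columns, in this order */
run;
title;                   /* clear the title and footnote afterwards */
footnote;
```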

B. ID Statement
SYNTAX: ID VARIABLE(S);
-Tells SAS to identify each printed row by the value(s) of the listed variable(s) instead of by the observation number. If more than one variable is listed, all of them appear at the left of each row.

C. By Statement
SYNTAX: BY <DESCENDING> VARIABLE(S);
-The BY statement prints a separate section for each BY group. The data set must already be sorted by the BY variable(s), usually with PROC SORT. If it was sorted in descending order of a variable, add the DESCENDING option before that variable's name. If more than one variable is listed, a new group starts whenever any of them changes.

D. Sum Statement
SYNTAX: SUM VARIABLE(S);

-The SUM statement totals the values of the given variable(s) and prints the totals at the bottom of the report. When a BY statement is used, it also prints a subtotal for each BY group.

E. Title & Footnote Statement


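SYNTAX: TITLE<1-10> 'TEXT MESSAGE';
FOOTNOTE<1-10> 'TEXT MESSAGE';

-The TITLE and FOOTNOTE statements work the same way. The number specifies the line: TITLE1 (or just TITLE) is the main title at the top, and higher numbers add lines below it; footnotes are numbered the same way at the bottom of the page. Defining a line cancels any lines with higher numbers. These are global statements, so they apply to the output of every procedure that follows until they are changed.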

F. Var Statement
SYNTAX: VAR VARIABLE(S);
-The VAR statement specifies which variables to print and in what order. It is used in many other procedures.

G. Sumby Statement
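SYNTAX: SUMBY BY-VARIABLE;
-Used together with the BY and SUM statements, SUMBY limits the subtotals. When the BY statement lists several variables, sums are printed only when the named BY variable (or a BY variable listed before it) changes, rather than at every BY group.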
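The sketch below combines BY, ID, SUM, and SUMBY. It assumes a standard SAS installation that includes the SASHELP.CLASS sample data set (19 students with Name, Sex, Age, Height, and Weight). The data are sorted first because the BY statement requires it.

```sas
proc sort data=sashelp.class out=work.class_by_sex;
  by sex age;
run;

proc print data=work.class_by_sex;
  by sex age;         /* one section per SEX and AGE combination */
  id name;            /* NAME replaces the Obs column */
  var height weight;
  sum weight;         /* totals of WEIGHT */
  sumby sex;          /* subtotals only when SEX changes, not at every AGE */
run;
```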

2.2 PROC FREQUENCY STEP


A. Proc Frequency Statement Options
SYNTAX: PROC FREQ <DATA=DATA-SET> <ORDER=ORDER>;
-The frequency procedure is effective in analyzing categorical data: it provides frequency counts and proportions, and it can perform chi-square tests. The ORDER= option controls the order in which the levels of each variable are listed:
DATA: order of first appearance in the input data set
FORMATTED: sorted by formatted value
FREQ: sorted by descending frequency count
INTERNAL: sorted by unformatted value (the default)

B. Weight Statement

SYNTAX: WEIGHT VARIABLE;
-Specifies a numeric variable whose values give the count (weight) of each observation in the input data set. This is useful when the data are already summarized as counts.

C. Tables Statement
SYNTAX: TABLES REQUEST(S) </ OPTIONS>;
-The TABLES statement generates one-way to n-way tables; a two-way table is requested with an asterisk, as in TABLES A*B;.
-ALPHA= option
Sets the significance level for confidence intervals, which are 100(1-alpha)% intervals (alpha is 0.05 by default).
-BINOMIAL option
Gives the binomial proportion, confidence limits, and tests for one-way tables.
-CHISQ option
Gives chi-square tests and statistics.

D. Where Statement
SYNTAX: WHERE EXPRESSION-1 <AND|OR EXPRESSION-N>;
-Restricts the PROC FREQ analysis to observations for which the expression(s) are true. It can be used in many other procedures.
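The sketch below brings these statements together on hypothetical survey counts that are already summarized. WEIGHT N tells PROC FREQ that each row stands for N respondents. The first TABLES request gives 90% binomial confidence limits for the proportion answering Yes. The second request gives a chi-square test of GENDER against ANSWER.

```sas
data work.survey;
  input gender $ answer $ n;   /* N holds the count for each row */
  datalines;
F Yes 30
F No 20
M Yes 25
M No 25
;
run;

proc freq data=work.survey order=freq;
  weight n;
  tables answer / binomial(level="Yes") alpha=0.10;
  tables gender*answer / chisq;
  where gender in ("F", "M");
run;
```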

2.3 PROC CONTENTS STEP


A. Proc Contents Statement Options
SYNTAX: PROC CONTENTS <OPTION(S)>;
-The contents procedure describes a data set: its number of observations, plus a list of its variables with attributes such as type, length, format, and label. The DATA= option names the data set to describe.
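Here is a minimal sketch using the SASHELP.CLASS sample data set, assuming a standard installation where it is available. VARNUM lists variables in creation order rather than alphabetically. OUT= saves one row per variable to a hypothetical data set.

```sas
proc contents data=sashelp.class varnum;
run;

proc contents data=sashelp.class out=work.class_vars noprint;
run;
```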

2.4 PROC TABULATE STEP


A. Proc Tabulate Statement Options
SYNTAX: PROC TABULATE <OPTION(S)>;

-The tabulate procedure provides statistics that can be produced in other procedures, but places them in a compact table/set of tables.

B. Class Statement
SYNTAX: CLASS VARIABLE(S) </OPTION(S)>;
-The CLASS statement, used in many procedures, specifies one or more classification variables whose values define groups.

C. Var Statement
SYNTAX: VAR VARIABLE(S) </OPTION(S)>;
-The VAR statement, used in many procedures, specifies one or more analysis variables; how they are analyzed depends on the procedure. In PROC TABULATE, VAR variables must be numeric.

D. Table Statement
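SYNTAX: TABLE <<PAGE-EXPRESSION,> ROW-EXPRESSION,> COLUMN-EXPRESSION </OPTION(S)>;
-The TABLE statement lays out the table. Commas separate the page, row, and column dimensions. An asterisk crosses variables with each other or with statistics (for example, HEIGHT*MEAN), and a space concatenates them. Every variable in a TABLE statement must also appear in a CLASS or VAR statement.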
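The sketch below uses the SASHELP.CLASS sample data set, assuming it is available. It builds one table with a row for each SEX plus an ALL (overall) row. The columns show the mean and count of HEIGHT and WEIGHT.

```sas
proc tabulate data=sashelp.class;
  class sex;
  var height weight;
  table sex all, (height weight)*(mean n);
run;
```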

2.5 PROC SORT STEP


A. Proc Sort Statement Options
SYNTAX: PROC SORT <DATA=DATA-SET> <OUT=DATA-SET>;
BY <DESCENDING> VARIABLE(S);
-The sort procedure sorts a data set by the variable(s) in its required BY statement. Without OUT=, the sorted result replaces the original data set. It is commonly run before a DATA step that merges data sets by a particular variable.
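Here is a minimal sketch with the SASHELP.CLASS sample data set, assuming it is available. SASHELP is normally read-only, so OUT= writes the sorted copy to a hypothetical WORK data set. Rows are ordered by SEX and, within each sex, from tallest to shortest.

```sas
proc sort data=sashelp.class out=work.class_sorted;
  by sex descending height;
run;
```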

2.6 PROC GCHART STEP


A. Proc GChart Statement Options
SYNTAX: PROC GCHART <DATA=DATA-SET>; -The GChart procedure produces visual summaries of data in the form of charts. We can produce block charts, horizontal and vertical bar charts, pie and donut charts, and star charts.

B. HBar, VBar, and VBar3D Statements

SYNTAX: HBAR VARIABLE-1 <VARIABLE(S)> </OPTIONS>;
VBAR VARIABLE-1 <VARIABLE(S)> </OPTIONS>;
VBAR3D VARIABLE-1 <VARIABLE(S)> </OPTIONS>;

-The HBAR statement creates a horizontal bar chart of frequencies (the default), sums, or means; sums and means are requested with the SUMVAR= and TYPE= options. VBAR is similar, only the bars are vertical.

-The HBAR3D statement creates a three-dimensional horizontal bar chart of frequencies (the default), sums, or means. VBAR3D is similar, only the bars are vertical.

C. Block Statement
SYNTAX: BLOCK VARIABLE(S) </OPTIONS>;
-The BLOCK statement is very similar to the bar statements, except that it produces the visual summaries in the form of blocks instead of bars.
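The sketch below uses the SASHELP.CLASS sample data set and assumes SAS/GRAPH is licensed, since PROC GCHART belongs to it. The VBAR chart shows mean HEIGHT for each SEX. The HBAR chart counts students at each AGE; DISCRETE keeps each age as its own bar instead of grouping ages into ranges.

```sas
proc gchart data=sashelp.class;
  vbar sex / type=mean sumvar=height;
  hbar age / discrete;
run;
quit;   /* GCHART stays active until QUIT */
```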

2.7 PROC GPLOT STEP


A. Proc GPlot Statement Options
SYNTAX: PROC GPLOT <DATA=DATA-SET>;
-The GPLOT procedure produces visual summaries of data, this time as plots on a set of axes.

B. Plot Statement
SYNTAX: PLOT Y-VARIABLE*X-VARIABLE </OPTIONS>; -We can plot a y-variable against an x-variable very easily with the plot statement.

C. Symbol Statement
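SYNTAX: SYMBOL<1-255> <COLOR=SYMBOL-COLOR> <VALUE=SYMBOL-SHAPE> <INTERPOL=METHOD>;
-The SYMBOL statement controls how the GPLOT output is drawn: the color (COLOR=), the plotting symbol (VALUE=), and whether points are joined or fitted with a line (INTERPOL=). Like TITLE, it is a global statement.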
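The sketch below uses the SASHELP.CLASS sample data set and assumes SAS/GRAPH is available. SYMBOL1 draws blue dots. INTERPOL=RL overlays a fitted linear regression line on the plot of WEIGHT against HEIGHT.

```sas
symbol1 value=dot color=blue interpol=rl;

proc gplot data=sashelp.class;
  plot weight*height;    /* y-variable * x-variable */
run;
quit;
```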

2.8 PROC FORMAT STEP


A. Proc Format Statement Options
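SYNTAX: PROC FORMAT;
-The format procedure creates user-defined formats, which change how values appear in output. Formats defined with VALUE or PICTURE statements are applied with a FORMAT statement in a DATA or PROC step.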

B. Value Statement
SYNTAX: VALUE <$> NAME <(FORMAT-OPTION(S))> VALUE-RANGE-1='LABEL-1' <... VALUE-RANGE-N='LABEL-N'>;
-The VALUE statement defines a format that displays each listed value, or range of values, as a label we specify. For example, it can group values into categories or rename codes. A $ before the name makes it a format for alphanumeric values.

C. Picture Statement
SYNTAX: PICTURE NAME <(FORMAT-OPTION(S))> VALUE-RANGE-1='PICTURE-1' <(PICTURE-OPTION(S))> <...>;
-The PICTURE and VALUE statements work similarly, except that a picture format (for numeric values only) still displays the value itself, laid out by a template that can add characters around it. For example, .88 -< 1.0 = '00% A' (mult=100) means that values from 0.88 up to (but not including) 1 are multiplied by 100 and printed followed by '% A'. This is an effective way of applying a syllabus-set grade scale to grade variables.
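The sketch below shows two VALUE formats, a numeric one and a character one. The format names AGEGRP and $SEXFMT and their labels are hypothetical. They are applied to the SASHELP.CLASS sample data set in PROC FREQ, so the tables count students by format label.

```sas
proc format;
  value agegrp
    low -< 13 = "Pre-teen"
    13 - high = "Teen";
  value $sexfmt
    "F" = "Female"
    "M" = "Male";
run;

proc freq data=sashelp.class;
  tables age sex;
  format age agegrp. sex $sexfmt.;
run;
```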

3 STATISTICAL ANALYSIS IN SAS


3.1 PROC UNIVARIATE STEP
A. Proc Univariate Statement Options
SYNTAX: PROC UNIVARIATE <OPTIONS>;
-The univariate procedure produces univariate statistical analysis (moments, quantiles, tests for location, and more) for one or more variables. Options include:
ALPHA=<significance-level> Specifies the significance level for the 100(1-alpha)% confidence intervals.
CIBASIC<(ALPHA=significance-level)> Requests confidence intervals for the mean, standard deviation, and variance of the specified variable(s), assuming they are normally distributed.
MU0=<value> Changes the hypothesized mean for the tests for location from the default of 0 to the specified value.

B. Var Statement
SYNTAX: VAR VARIABLE(S);
-This statement specifies the variable(s) for univariate analysis.

C. Histogram Statement
SYNTAX: HISTOGRAM <VARIABLE> </OPTIONS>; The histogram statement produces a frequency bar chart for a specified variable(s). In the options field, we can specify a continuous distribution (ex. Normal, Exponential, etc.) and the procedure will superimpose its estimate of the appropriate probability density curve, and it will also provide goodness of fit tests.
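The sketch below uses the SASHELP.CLASS sample data set. The hypothesized mean of 60 inches is an arbitrary, hypothetical value. The run tests whether mean HEIGHT differs from 60 and prints 95% confidence intervals. It also draws a histogram with a fitted normal curve and goodness-of-fit tests.

```sas
proc univariate data=sashelp.class mu0=60 cibasic(alpha=0.05);
  var height;
  histogram height / normal;
run;
```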

3.2 PROC MEANS STEP

A. Proc Means Statement Options


SYNTAX: PROC MEANS <OPTIONS> <DESIRED-STATISTICS>;
-The means procedure is a more compact version of the univariate procedure. The options field is similar to that of the univariate procedure, but we can limit which statistics are displayed by listing them in the desired-statistics field (for example, N for the number of nonmissing observations, MEAN, or SUM).

B. Var Statement
SYNTAX: VAR VARIABLE; The Var statement works the same way here as it does in the univariate procedure.
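The sketch below uses the SASHELP.CLASS sample data set. It requests only N, MEAN, and STD, rounded to two decimals. The CLASS statement splits the results by SEX.

```sas
proc means data=sashelp.class n mean std maxdec=2;
  class sex;
  var height weight;
run;
```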

3.3 PROC TTEST STEP


A. Proc ttest Statement Options
SYNTAX: PROC TTEST <OPTIONS>; The ttest procedure produces t-tests for single samples, paired observation sets, and two independent samples. The options are similar to those of the Means and Univariate procedures.

B. Class Statement
SYNTAX: CLASS VARIABLE; As in the tabulate procedure, the CLASS statement specifies a grouping variable for the ttest procedure. It is required for analysis of two independent samples, and the CLASS variable must have exactly two levels.

C. Var Statement
SYNTAX: VAR VARIABLE; Again, the var statement works just as it does in many other procedures.

D. Paired Statement
SYNTAX: PAIRED VARIABLE-A*VARIABLE-B; If we want to analyze a paired sample, we use the PAIRED statement. It tests whether the mean of the differences VARIABLE-A minus VARIABLE-B is zero.
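The sketch below runs the three kinds of t-tests. The first two use the SASHELP.CLASS sample data set, with a hypothetical hypothesized mean of 60. The before/after scores in the last step are made-up data.

```sas
/* One sample: is mean HEIGHT different from 60? */
proc ttest data=sashelp.class h0=60;
  var height;
run;

/* Two independent samples: girls vs. boys */
proc ttest data=sashelp.class;
  class sex;
  var height;
run;

/* Paired sample: tests whether the mean of BEFORE - AFTER is zero */
data work.prepost;
  input before after;
  datalines;
70 75
65 66
80 84
72 71
;
run;

proc ttest data=work.prepost;
  paired before*after;
run;
```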

3.4 PROC CORR STEP


A. Proc Corr Statement Options
SYNTAX: PROC CORR <OPTIONS>; The correlation procedure produces correlation statistics for a specified pair(s) of variables.

B. Var Statement
SYNTAX: VAR VARIABLES; The VAR statement lists the variables to correlate. The correlation procedure computes the correlation between every pair of the listed variables.
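Here is a minimal sketch with the SASHELP.CLASS sample data set. Listing three variables produces a 3-by-3 matrix of Pearson correlations with their p-values.

```sas
proc corr data=sashelp.class;
  var height weight age;
run;
```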

3.5 PROC REG STEP


A. Proc Reg Statement Options
SYNTAX: PROC REG <OPTIONS>; In the regression procedure we can create regression models and statistics for them.

B. Model Statement
SYNTAX: MODEL DEPENDENT-VARIABLE = EXPLANATORY-VARIABLE(S); We describe the structure of the desired regression model in the MODEL statement, listing the explanatory variables after the equals sign. NOTE: PROC REG does not accept interaction terms such as X1*X2 in the MODEL statement. To fit an interaction model, we create the interaction variable in a DATA step and list it here.

C. Plot Statement
SYNTAX: PLOT Y*X; The PLOT statement produces a scatter plot of the paired variables. It overlays a regression line (from the last MODEL statement used) if we have fit one beforehand.
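The sketch below follows the NOTE above. It builds the interaction term HT_AGE (a hypothetical name) in a DATA step from the SASHELP.CLASS sample data set, then lists it in the MODEL statement.

```sas
data work.class2;
  set sashelp.class;
  ht_age = height * age;    /* interaction variable built in a DATA step */
run;

proc reg data=work.class2;
  model weight = height age ht_age;
  plot weight*height;
run;
quit;   /* PROC REG is interactive; QUIT ends it */
```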

3.6 PROC GLM STEP


A. Proc GLM Statement Options
SYNTAX: PROC GLM <OPTIONS>;

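The GLM procedure is similar to the regression procedure: both fit models by the method of least squares. GLM, however, also accepts CLASS (categorical) variables and interaction terms written directly in the MODEL statement, so it can fit general linear models such as ANOVA and ANCOVA models.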

B. LSMeans Statement
SYNTAX: LSMEANS EFFECT(S) </OPTIONS>; The LSMEANS statement calculates least squares means for each listed effect (a CLASS variable or an interaction of CLASS variables). It can also test differences among them, for example with the PDIFF option.
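The sketch below fits an ANCOVA-style model to the SASHELP.CLASS sample data set. WEIGHT depends on the CLASS variable SEX and the covariate HEIGHT. LSMEANS compares the sexes at the mean height, and PDIFF adds the p-value for their difference.

```sas
proc glm data=sashelp.class;
  class sex;
  model weight = sex height;
  lsmeans sex / pdiff;
run;
quit;
```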

3.7 PROC LOGISTIC STEP


A. Proc Logistic Statement
SYNTAX: PROC LOGISTIC <OPTIONS>; The logistic procedure is useful in creating logistic models (a model to predict probabilities given explanatory variables) and producing analysis for them.

B. Class Statement
SYNTAX: CLASS VARIABLE(S); The class statement works in the logistic procedure similarly to how it does in previously mentioned procedures.

C. Model Statement
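SYNTAX: MODEL DEPENDENT-BINARY-VARIABLE<(EVENT='LEVEL')> = EFFECT(S); The MODEL statement works similarly to the MODEL statement in the regression procedure, except that here the dependent variable is binary. By default PROC LOGISTIC models the probability of the lowest-ordered response value (0 for a 0/1 variable), so EVENT= is used to choose the level whose probability is modeled.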
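The sketch below fits a logistic model. The binary outcome OVER100 (1 if weight exceeds 100 pounds) is hypothetical and is derived from the SASHELP.CLASS sample data set. EVENT='1' makes the model predict the probability that OVER100 equals 1. With so few students, SAS may warn about separation; the example only illustrates the syntax.

```sas
data work.class3;
  set sashelp.class;
  over100 = (weight > 100);    /* 1 if true, 0 if false */
run;

proc logistic data=work.class3;
  class sex / param=ref;       /* reference-cell coding for SEX */
  model over100(event='1') = sex height;
run;
```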
