Import Xls Sas Code

The document discusses various statistical options and techniques that can be used with the PROC MEANS procedure in SAS. It explains how to generate summary statistics, perform group analysis, save output to a dataset, and more. Statistical options like N, NMISS, MEAN, STD, MIN, MAX are described.

Uploaded by Nik Kumar
Copyright © All Rights Reserved
/* Download the dataset file */

filename mydata temp;


proc http
url="https://github.com/deepanshu88/Datasets/raw/master/UploadedFiles/Test.xls"
method="GET"
out=mydata;
run;

/* Import */
proc import
file=mydata
out=test replace
dbms=xls;
run;

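With only a VAR statement, PROC MEANS reports its default statistics (N, MEAN, STD, MIN and MAX) for variables Q1 to Q5:

Proc Means Data = test;
Var q1 - q5;
Run;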

To request specific statistics, list their keywords on the PROC MEANS statement. The common keywords are:

Statistical Option   Description
N                    Number of non-missing observations
NMISS                Number of missing observations
MEAN                 Arithmetic average
STD                  Standard deviation
MIN                  Minimum
MAX                  Maximum
SUM                  Sum of observations
MEDIAN               50th percentile
P1                   1st percentile
P5                   5th percentile
P10                  10th percentile
P90                  90th percentile
P95                  95th percentile
P99                  99th percentile
Q1                   First quartile (25th percentile)
Q3                   Third quartile (75th percentile)

Other Statistical Options

Statistical Option   Description
VAR                  Variance
RANGE                Range (MAX - MIN)
USS                  Uncorrected sum of squares
CSS                  Corrected sum of squares (about the mean)
STDERR               Standard error of the mean
T                    Student's t statistic for testing H0: population mean = 0
PRT                  Two-tailed p-value associated with the t statistic
SUMWGT               Sum of the WEIGHT variable values
QRANGE               Interquartile range (Q3 - Q1)
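As a sketch of combining keywords from both tables, the step below requests several of them at once. It assumes SASHELP.CLASS, the sample data set that ships with SAS, so it runs without the downloaded file; the output data set name quartiles and the variable names height_q1, height_q3 and height_qrange are illustrative choices. MAXDEC=2 only limits the decimals shown in the printed table, and the OUTPUT statement (covered later in this document) saves the quartiles so you can confirm that QRANGE equals Q3 minus Q1.

```sas
/* Assumes SASHELP.CLASS (ships with SAS). For the tutorial data, */
/* use data=test and var q1 - q5 instead.                          */
proc means data=sashelp.class n nmiss mean std stderr q1 median q3 qrange maxdec=2;
   var height weight;
   /* Save the quartiles of HEIGHT to check that QRANGE = Q3 - Q1 */
   output out=quartiles
          q1(height)=height_q1
          q3(height)=height_q3
          qrange(height)=height_qrange;
run;

proc print data=quartiles noobs;
run;
```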

How to See Specific Statistics


Suppose you want to see only two statistics: the number of non-missing values and the number of missing values.

Proc Means Data = test N NMISS;
Var q1 - q5;
Run;
N is the number of non-missing values and NMISS is the number of missing values.
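To see the difference between N and NMISS on data you can check by eye, here is a minimal sketch with a hypothetical four-row data set named scores, where a period marks a missing value. Q1 has three non-missing values and one missing value; Q2 has two of each. The output variable names n_q1, nmiss_q1 and so on are illustrative.

```sas
/* Hypothetical four-row data set; a period (.) is a missing value */
data scores;
   input q1 q2;
   datalines;
1 .
2 3
. 4
5 .
;
run;

/* Expect N=3, NMISS=1 for Q1 and N=2, NMISS=2 for Q2 */
proc means data=scores n nmiss;
   var q1 q2;
   output out=counts n(q1 q2)=n_q1 n_q2 nmiss(q1 q2)=nmiss_q1 nmiss_q2;
run;
```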

[Figure: NMISS option in PROC MEANS]


Tip: Add the NOLABELS option to suppress the Label column in the PROC MEANS output table.

Proc Means data = test N NMISS NOLABELS;
Var q1 - q5;
Run;
Group Analysis using PROC MEANS
Suppose you want to group or classify the analysis by Age. You can use the CLASS statement to accomplish this task. It works much like GROUP BY in SQL.
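As a rough illustration of the GROUP BY analogy, the sketch below computes the same per-group count and mean twice, once with CLASS and once with PROC SQL. It assumes SASHELP.CLASS, with AGE as the grouping variable; the data set and variable names are illustrative. One difference is worth knowing: PROC MEANS drops observations with a missing CLASS value by default (unless you add the MISSING option), while SQL GROUP BY keeps them as their own group. For that reason, the SQL version filters them out explicitly.

```sas
/* Assumes SASHELP.CLASS. NWAY keeps only the per-AGE rows. */
proc means data=sashelp.class noprint nway;
   class age;
   var height;
   output out=by_class n=n_height mean=mean_height;
run;

/* PROC SQL version of the same summary. The WHERE clause mimics */
/* the CLASS default of excluding missing AGE values.            */
proc sql;
   create table by_sql as
   select age,
          count(height) as n_height,   /* counts non-missing values */
          mean(height)  as mean_height
   from sashelp.class
   where age is not missing
   group by age
   order by age;
quit;
```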

Proc Means data = test N NMISS NOLABELS;
Class Age;
Var q1 - q5;
Run;
[Figure: Group analysis using PROC MEANS]
You can use the NONOBS option to remove the N Obs column from the PROC MEANS table.

Proc Means data = test N NMISS NOLABELS NONOBS;
Class Age;
Var q1 - q5;
Run;
How to use Format in Proc Means
First, you need to create a user-defined format.

Proc Format;
Value Age
1 = 'Less than 25'
2 = '25-34'
3 = '35-43'
4 = '44-50'
5 = '51-59'
6 = '60 or more';
Run;
Add a FORMAT statement to apply the user-defined format in PROC MEANS.

Proc Means data = test N MEAN;
Class Age;
Format Age Age.;
Var q1 - q5;
Run;
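Because PROC MEANS groups CLASS variables by their formatted values, a format with value ranges can also bin a numeric variable on the fly. The sketch below assumes SASHELP.CLASS. The format name AGEGRP and its cut points are hypothetical. It should produce three groups with 7, 7 and 5 students.

```sas
/* Hypothetical range format that bins a numeric AGE */
proc format;
   value agegrp
      low - 12   = '12 or younger'
      13  - 14   = '13-14'
      15  - high = '15 or older';
run;

/* Assumes SASHELP.CLASS. CLASS groups by the formatted value, */
/* so each range becomes one row of output.                     */
proc means data=sashelp.class n mean maxdec=1;
   class age;
   format age agegrp.;
   var height;
   output out=grouped n=n_height;
run;
```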
How to change Sorting Order
The DESCENDING option, placed after the slash in the CLASS statement, instructs PROC MEANS to analyze the data in descending order of the values of Age.

Proc Means Data = test;
Class Age / descending;
Var q1 - q5 ;
Run;
Instead of displaying the results in the sorted order of the values of the classification variable(s) listed in the CLASS statement, you can order them by descending frequency count with the ORDER=FREQ option in the CLASS statement.

Proc Means Data = test N;
Class Age / Order = FREQ;
Var q1 - q5 ;
Run;
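You can order the results by the formatted values of a variable listed in the CLASS statement using the ORDER=FORMATTED option in the CLASS statement. Formatted values are sorted as character strings, so on ASCII systems 'Less than 25' sorts after '60 or more'.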

Proc Means data = test N MEAN;
Class Age / Order = formatted;
Format Age Age.;
Var q1 - q5;
Run;
[Figure: Custom formats in PROC MEANS]
Note: If you specify a CLASS statement without a VAR statement, PROC MEANS analyzes all numeric variables in your data set that are not listed in the CLASS statement (or in other statements such as BY, FREQ or WEIGHT).
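Here is a minimal sketch of this behavior, assuming SASHELP.CLASS (numeric AGE, HEIGHT and WEIGHT; character NAME and SEX). With CLASS SEX and no VAR statement, all three numeric variables are analyzed, and AUTONAME creates Age_Mean, Height_Mean and Weight_Mean in the output data set. The name all_numeric is illustrative.

```sas
/* Assumes SASHELP.CLASS. No VAR statement: every numeric variable */
/* not used in CLASS (AGE, HEIGHT, WEIGHT) is analyzed.             */
proc means data=sashelp.class mean maxdec=1;
   class sex;
   output out=all_numeric mean= / autoname;
run;
```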

Grouping and Output in Separate Tables


Suppose you want to analyze variables Q1 - Q5 by variable AGE and want the output for each level of AGE in a separate table. You can use the BY statement to accomplish this task. See the example below.

Make sure the data are sorted by the BY variable before using the BY statement.

proc sort data= test;
by age;
run;
proc means data = test;
by age;
var q1 - q5 ;
run;
Difference between CLASS and BY statement
The CLASS statement returns the analysis for a grouping (classification) variable in a single table, whereas the BY statement returns it in a separate table for each BY group. Another difference is that the CLASS statement does not require the data to be sorted by the classification variable, whereas the BY statement does.
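To see that both statements compute the same numbers, the sketch below summarizes HEIGHT by SEX both ways and saves each result. It assumes SASHELP.CLASS. Because that library is read-only, PROC SORT writes the sorted copy with OUT= to a WORK data set; the names class_sorted, by_out and class_out are illustrative.

```sas
/* Assumes SASHELP.CLASS. BY needs sorted input, so sort a copy. */
proc sort data=sashelp.class out=class_sorted;
   by sex;
run;

/* BY: one separate table per SEX level */
proc means data=class_sorted mean;
   by sex;
   var height;
   output out=by_out mean=mean_height;
run;

/* CLASS: one combined table, no sorting needed */
proc means data=sashelp.class mean nway;
   class sex;
   var height;
   output out=class_out mean=mean_height;
run;
```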

[Figure: Difference between CLASS and BY statement in PROC MEANS]


Save Output in a Dataset
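You can use the NOPRINT option to tell SAS not to print output in the output window. This is useful when you only want to save the statistics in a data set with the OUTPUT statement.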

Proc Means data = test NOPRINT;
Class Age / Order = formatted;
Format Age Age.;
Var q1 - q5;
Output out = readin mean= median= /autoname;
Run;
In the above code, readin is the data set in which the output is stored. The MEAN= and MEDIAN= options tell SAS to write the mean and median to the output data set. The AUTONAME option automatically assigns unique names to the variables in the output data set that hold the statistics requested in the OUTPUT statement. It does this by appending the statistic keyword to the analysis variable name, for example q1_Mean and q1_Median.
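Here is a small sketch of the naming rule, assuming SASHELP.CLASS. Two statistics on two variables yield four distinct columns: Height_Mean, Weight_Mean, Height_Median and Weight_Median. PROC CONTENTS with VARNUM lists the variables in creation order. The data set name named is illustrative.

```sas
/* Assumes SASHELP.CLASS. AUTONAME builds <variable>_<Statistic> names. */
proc means data=sashelp.class noprint;
   var height weight;
   output out=named mean= median= / autoname;
run;

/* List the generated variable names */
proc contents data=named varnum;
run;
```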

You can use the AUTOLABEL option to automatically assign labels to the variables in the output data set that hold the statistics requested in the OUTPUT statement. It appends the statistic name to the label of each analysis variable.

Proc Means Data = test noprint;
Class Age ;
Var q1 q2;
Output out=F1 mean= / autoname autolabel;
Run;

You can also specify the variables for which each statistic is saved in the output data set. In the example below, only the mean of Q1 and the median of Q2 are saved.

Proc Means Data = test noprint;
Class Age ;
Var q1 q2;
Output out=F1 mean(q1)= median(q2)= / autoname;
Run;
You can give custom names to the variables stored in an output data set.

Proc Means Data = test noprint;
Class Age;
Var q1 - q5 ;
Output out=F1 mean=_mean1-_mean5 median=_median1-_median5;
Run;
DROP= and KEEP= Options
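We can use the DROP= and KEEP= data set options to remove or keep specific variables in the output data set. The example below drops the automatic _TYPE_ and _FREQ_ variables.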

Proc Means Data = test noprint;
Class Age;
Var q1 - q5 ;
Output out=F1 (drop = _type_ _freq_) mean=_mean1-_mean5 median=_median1-_median5;
Run;
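KEEP= works the other way round: you list the variables to retain. In this sketch, which assumes SASHELP.CLASS, the output keeps only SEX and the two means. As a result, _TYPE_ and _FREQ_ are dropped without being named. The data set name kept is illustrative.

```sas
/* Assumes SASHELP.CLASS. KEEP= retains only the listed variables. */
proc means data=sashelp.class noprint nway;
   class sex;
   var height weight;
   output out=kept (keep=sex height_mean weight_mean) mean= / autoname;
run;
```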
WHERE Statement
The WHERE statement is used to filter or subset data. In the code below, we filter on variable Q1 and tell SAS to keep only those observations in which the value of Q1 is greater than 1.

Proc Means Data = test noprint;
Where Q1 > 1;
Class Age;
Var q1 - q5 ;
Output out=F1(drop= _FREQ_) mean= median= / autoname;
Run;
Like the WHERE statement, the WHERE= data set option filters data. See the following program:

Proc Means Data = test (Where=( Q1 > 1)) noprint;
Class Age;
Var q1 - q5 ;
Output out=F1(drop= _FREQ_) mean= median= / autoname;
Run;
Grouping by Two or More Variables
When two or more variables are included in the CLASS statement, PROC MEANS computes the statistics for every combination of the classification variables, and the _TYPE_ variable identifies each combination. Suppose we specify the variables AGE and BU in the CLASS statement. _TYPE_ = 0 holds the analysis over all observations. _TYPE_ = 1 holds the mean and median of Q1-Q5 by BU (the rightmost CLASS variable), which can be selected with WHERE = ( _TYPE_ = 1). The same analysis by AGE is stored against _TYPE_ = 2. When _TYPE_ = 3, SAS returns the analysis by both AGE and BU.
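The tutorial data's BU variable is not available here, so the sketch below uses SASHELP.CLASS with CLASS SEX AGE in place of CLASS AGE BU to show all four _TYPE_ values. AGE takes the role of the rightmost variable (_TYPE_ = 1) and SEX the leftmost (_TYPE_ = 2). PROC FREQ then counts how many output rows belong to each _TYPE_. The data set name all_types is illustrative.

```sas
/* Assumes SASHELP.CLASS; CLASS SEX AGE mirrors CLASS AGE BU above. */
proc means data=sashelp.class noprint;
   class sex age;
   var height;
   output out=all_types n=n_height;
run;

/* 0 = overall, 1 = by AGE, 2 = by SEX, 3 = by SEX and AGE */
proc freq data=all_types;
   tables _type_ / nocum nopercent;
run;
```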

Proc Means Data = test noprint;
Class Age BU;
Var q1 - q5 ;
Output out=F1 (where=(_type_=1) drop= AGE _FREQ_) mean= median= / autoname;
Output out=F2 (where=(_type_=2) drop= BU _FREQ_) mean= median= / autoname;
Output out=F3 (where=(_type_=3) drop= _FREQ_) mean= median= / autoname;
Run;
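The NWAY option instructs PROC MEANS to write only the observations with the highest value of _TYPE_ (the combination of all CLASS variables) to the output data set. With a single CLASS variable, as below, this simply drops the overall _TYPE_ = 0 row.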

Proc Means Data = test nway noprint;
Class Age;
Var q1 - q5 ;
Output out=F1 mean=_mean1-_mean5 median=_median1-_median5;
Run;
By default, PROC MEANS analyzes the numeric analysis variables for all possible combinations of the classification variables. With the TYPES statement, PROC MEANS carries out only the combinations listed in it. In the example below, () requests the overall analysis, followed by Age * BU and Age * BU * Q1.

Proc Means Data = test noprint;
Class Age BU Q1;
Types ()
      Age * BU
      Age * BU * Q1;
Var q1 - q5;
Output out=F1 mean=_mean1-_mean5 median=_median1-_median5;
Run;
DESCENDTYPES option: orders the observations in the output data set by descending value of _TYPE_.

Proc Means Data = test DESCENDTYPES noprint;
Class Age;
Var q1 - q5 ;
Output out=F1 mean=_mean1-_mean5 median=_median1-_median5;
Run;
Multiple CLASS Statements
Multiple CLASS statements let you control how the levels of each classification variable are ordered in the printed output and in the data sets created by PROC MEANS. For example, you can display one classification variable in descending order while leaving the other in ascending order.

Proc Means Data = test noprint;
Class Age / descending;
Class BU;
Var q1 - q5 ;
Output out=F1 mean=_mean1-_mean5 median=_median1-_median5;
Run;
Identifying Extreme Values
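The IDGROUP option in the OUTPUT statement saves the values of selected variables from the observations with the largest (MAX) or smallest (MIN) values of a variable. The OUT[2] argument requests the two most extreme observations. In the example below, the selected variable is revenue itself, so maxrev_1 and maxrev_2 hold the two largest revenues for each product, and minrev_1 and minrev_2 hold the two smallest.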

data sales;
input products $ revenue;
datalines;
ProductA 100
ProductA 200
ProductA 300
ProductA 150
ProductA 250
ProductB 350
ProductB 200
ProductB 300
ProductB 400
;
run;

proc means data=sales noprint nway;
class products;
var revenue;
output out= myoutput
idgroup (max(revenue) out[2] (revenue)=maxrev)
idgroup (min(revenue) out[2] (revenue)=minrev)
sum= mean= /autoname;
run;
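You can check the result by hand using the sales data in the program above. ProductA's revenues are 100, 200, 300, 150 and 250, and ProductB's are 350, 200, 300 and 400. For ProductA, maxrev_1 and maxrev_2 should hold 300 and 250, the minimums should be 100 and 150, revenue_Sum should be 1000 and revenue_Mean should be 200. For ProductB, the expected values are 400 and 350, then 200 and 300, with a sum of 1250 and a mean of 312.5. The sketch below re-creates the data compactly, using the @@ pointer to read several records per line, and prints the result.

```sas
/* Same sales data as above, several records per line via @@ */
data sales;
   input products $ revenue @@;
   datalines;
ProductA 100 ProductA 200 ProductA 300 ProductA 150 ProductA 250
ProductB 350 ProductB 200 ProductB 300 ProductB 400
;
run;

proc means data=sales noprint nway;
   class products;
   var revenue;
   output out=myoutput
          idgroup (max(revenue) out[2] (revenue)=maxrev)
          idgroup (min(revenue) out[2] (revenue)=minrev)
          sum= mean= / autoname;
run;

proc print data=myoutput noobs;
run;
```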
One-Sample t-Test using PROC MEANS
With PROC MEANS, we can perform a one-sample t-test of the hypothesis that a population mean equals zero.

Null Hypothesis - Population Mean of Q1 is equal to 0.
Alternative Hypothesis - Population Mean of Q1 is not equal to 0.

proc means data = test t prt;
var Q1;
run;
The PRT option returns the two-tailed p-value, which is the lowest level of significance at which we can reject the null hypothesis. Since the p-value is less than 0.05, we reject the null hypothesis and conclude that the mean of Q1 is significantly different from zero.
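PROC MEANS always tests against zero. The T statistic is the mean divided by its standard error, t = mean / (std / sqrt(n)), so to test against another value you can subtract that value first. The sketch below assumes SASHELP.CLASS and an arbitrary hypothesized mean height of 60. The data set and variable names are illustrative. PROC TTEST with the H0= option runs the same test directly.

```sas
/* Assumes SASHELP.CLASS; 60 is an arbitrary hypothesized mean. */
data class_diff;
   set sashelp.class;
   height_diff = height - 60;
run;

/* Tests H0: mean(height) = 60 via t = mean / (std / sqrt(n)) on the difference */
proc means data=class_diff n mean std t prt;
   var height_diff;
   output out=ttest_out n=n mean=mean std=std t=t prt=p;
run;

/* Same test, giving the null value directly */
proc ttest data=sashelp.class h0=60;
   var height;
run;
```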

Difference between PROC MEANS and PROC FREQ


PROC MEANS calculates summary statistics, such as the mean and count, of numeric variables. It requires at least one numeric analysis variable, whereas PROC FREQ has no such limitation. In other words, if you only have a character variable to analyze, PROC FREQ is the procedure to use.
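Here is a minimal sketch, assuming SASHELP.CLASS. SEX is a character variable, so PROC FREQ is the right tool for counting its levels. The OUT= option saves the counts to a data set (sex_counts is an illustrative name). The source data contain 9 F rows and 10 M rows.

```sas
/* Assumes SASHELP.CLASS; SEX is character, so use PROC FREQ. */
proc freq data=sashelp.class;
   tables sex / out=sex_counts;
run;
```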
