
SAS Procedures For Common Statistical Analyses: Contents

This document provides an overview of common statistical analyses that can be conducted using SAS procedures. It describes procedures for describing and summarizing data, conducting hypothesis tests for independent and correlated samples, performing linear and generalized linear regressions, and analyzing data from experimental designs including one-way and two-way ANOVAs and randomized block designs. For each analysis, it lists the appropriate SAS procedure and provides an example syntax.

Uploaded by

maythri
Copyright
© Attribution Non-Commercial (BY-NC)

SAS Procedures for Common Statistical Analyses

Contents:

1. Introduction/Data Set Up
2. Describing Quantitative Variables
3. Describing Qualitative Variables
4. Two-Sample Tests (Independent Samples)
5. Completely Randomized Design (1-Way ANOVA)
6. Randomized Block Design
7. 2-Factor ANOVA
8. Chi-Square Tests
9. Linear Regression
10. Correlation
11. Generalized Linear Models
a) Logistic Regression
b) Poisson Regression
c) Negative Binomial Regression
1. Introduction/Data Set-Up

Unless otherwise noted, each dataset has one line per individual case, with
3 quantitative variables (X, Y, Z) and 2 qualitative variables (A, B).

DATA ONE;
INPUT X Y Z A B;
CARDS;
Data Here
;
RUN;
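If the data are stored in an external file rather than typed after CARDS, PROC IMPORT can build the same dataset ONE. A minimal sketch, assuming a hypothetical comma-separated file mydata.csv whose first row holds the names X, Y, Z, A, B:

```sas
/* mydata.csv is a hypothetical file name; adjust the path for your system */
PROC IMPORT DATAFILE="mydata.csv"
            OUT=ONE        /* same dataset name as the DATA step above */
            DBMS=CSV
            REPLACE;       /* overwrite ONE if it already exists */
    GETNAMES=YES;          /* first row supplies the variable names */
RUN;
```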

NOTE: Any procedure can be run separately for each level of one or more
factors (BY statement), or only for the cases that meet some criterion
(WHERE statement).

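Analysis conducted separately for each level of Factor A:

/* DATA step as in Section 1, ending with RUN; */
PROC SORT DATA=ONE; BY A; RUN;  /* BY processing requires data sorted by A */
PROC PROCNAME DATA=ONE;         /* PROCNAME = MEANS, TTEST, GLM, ... */
BY A;
/* other PROC statements */
RUN;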

Analysis conducted only on cases where (say) A=1:

/* DATA step as in Section 1, ending with RUN; */
PROC PROCNAME DATA=ONE;         /* PROCNAME = MEANS, TTEST, GLM, ... */
WHERE A=1;
/* other PROC statements */
RUN;
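The WHERE condition can combine several criteria, and the same subset can be attached to the dataset itself with the WHERE= dataset option. A sketch, assuming dataset ONE from Section 1; the cutoff X > 0 is purely illustrative:

```sas
/* WHERE statement with a compound condition */
PROC MEANS DATA=ONE;
    WHERE A = 1 AND X > 0;
    VAR Y Z;
RUN;

/* Equivalent subset using the WHERE= dataset option */
PROC MEANS DATA=ONE(WHERE=(A = 1 AND X > 0));
    VAR Y Z;
RUN;
```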
2. Describing Quantitative Variables
Dataset contains:
• 3 quantitative variables: X, Y, Z
• 2 qualitative factors: A, B

Basic Statistics: PROC MEANS


Default: N, Mean, Standard Deviation, Minimum, Maximum
For all cases:
PROC MEANS;
VAR X Y Z;
RUN;
For cases separately by Factor A:
PROC MEANS;
CLASS A;
VAR X Y Z;
RUN;
For cases separately by combinations of Factors A & B:
PROC MEANS;
CLASS A B;
VAR X Y Z;
RUN;
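The default statistics can be replaced by naming statistic keywords on the PROC MEANS statement. A sketch requesting the sample size, mean, standard deviation, standard error, 95% confidence limits for the mean, and median, printed to two decimals:

```sas
PROC MEANS DATA=ONE N MEAN STD STDERR CLM MEDIAN MAXDEC=2;
    CLASS A;        /* separate statistics for each level of A */
    VAR X Y Z;
RUN;
```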

Full-blown Summary: PROC UNIVARIATE


Default: Moments (including sums of squares, CV, and standard error of the
mean), median, IQR, tests for location (µ = 0: t-test; median = 0: sign and
signed-rank tests), quantiles, and extreme observations.
PROC UNIVARIATE;
VAR X Y Z;
RUN;
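The location tests are against µ = 0 by default. A sketch that uses MU0= to change the null value (10 here is an illustrative assumption), NORMAL to add tests of normality, and a normal Q-Q plot:

```sas
PROC UNIVARIATE DATA=ONE MU0=10 NORMAL;   /* MU0=10 is an illustrative null value */
    VAR X;
    QQPLOT X / NORMAL(MU=EST SIGMA=EST);  /* reference line from estimated mean and SD */
RUN;
```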
3. Describing Qualitative Variables
Note: The dataset need not contain the quantitative variables X, Y, Z, but
it does contain the qualitative responses A and B.

Frequency Tabulation for a Single Qualitative Response (A):


PROC FREQ; TABLES A; RUN;

Frequency Cross-Tabulation for a Pair of Qualitative Responses (A, B):


PROC FREQ; TABLES A*B; RUN;
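Several tables can be requested in one step, and TABLES options control which percentages are printed. A sketch using standard PROC FREQ options (NOROW, NOCOL, and NOPERCENT suppress the row, column, and overall percentages; MISSING treats missing values as a level):

```sas
PROC FREQ DATA=ONE;
    TABLES A B;                                  /* one-way tables for A and for B */
    TABLES A*B / NOROW NOCOL NOPERCENT MISSING;  /* counts only in the cross-tab */
RUN;
```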

NOTE: You may often wish to reproduce and further analyze data previously
published as a contingency table. Then each “case” is a cell of the table:
enter one line per cell with its count (NUMCASE below) and name that count
in a WEIGHT statement.

DATA ONE;
INPUT A B NUMCASE;
CARDS;
1 1 25
1 2 32
2 1 17
2 2 42
;
RUN;
PROC FREQ; TABLES A*B; WEIGHT NUMCASE; RUN;

4. Two-Sample Tests (Independent Samples)


For this case, assume Factor A has 2 levels, and X is our response
variable.

TTEST Procedure: $H_0: \mu_1 - \mu_2 = 0$ versus $H_A: \mu_1 - \mu_2 \neq 0$


The procedure reports the t-test under both the equal-variance (pooled)
and unequal-variance (Satterthwaite) assumptions, along with a folded
F-test for equality of variances to help you choose between them.

PROC TTEST;
CLASS A;
VAR X;
RUN;
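For reference, the statistics PROC TTEST reports correspond to the standard textbook formulas below, where $\bar X_i$, $s_i^2$, and $n_i$ are the sample mean, variance, and size for level $i$ of A:

```latex
% Pooled (equal variances), df = n_1 + n_2 - 2
t = \frac{\bar X_1 - \bar X_2}{s_p\sqrt{1/n_1 + 1/n_2}}, \qquad
s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}

% Satterthwaite (unequal variances), approximate df = \nu
t' = \frac{\bar X_1 - \bar X_2}{\sqrt{s_1^2/n_1 + s_2^2/n_2}}, \qquad
\nu = \frac{\left(s_1^2/n_1 + s_2^2/n_2\right)^2}
           {\dfrac{(s_1^2/n_1)^2}{n_1 - 1} + \dfrac{(s_2^2/n_2)^2}{n_2 - 1}}

% Folded F test for equal variances
F = \frac{\max(s_1^2, s_2^2)}{\min(s_1^2, s_2^2)}
```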

NPAR1WAY Procedure (Wilcoxon rank-sum test): $H_0: M_1 - M_2 = 0$ versus $H_A: M_1 - M_2 \neq 0$, where $M_i$ is the population median

PROC NPAR1WAY WILCOXON;
CLASS A;
VAR X;
RUN;
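With small samples the normal approximation to the rank-sum statistic can be poor. A sketch that adds an EXACT statement to request the exact Wilcoxon p-value (exact computation can be slow for large samples):

```sas
PROC NPAR1WAY DATA=ONE WILCOXON;
    CLASS A;
    VAR X;
    EXACT WILCOXON;   /* exact p-value in addition to the asymptotic one */
RUN;
```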

5. Completely Randomized Design (1-Way ANOVA)

Statistical Model: $Y_{ij} = \mu + \tau_i + \varepsilon_{ij} = \mu_i + \varepsilon_{ij}$, $i = 1,\dots,a$; $j = 1,\dots,n_i$


Let Factor A represent the treatment factor and Y be the response
variable. The dataset AOVOUT will contain the original dataset and
residuals (with variable name E).

ANOVA F-test, Levene’s Test for Equal Variance and Bonferroni/Tukey Comparisons

PROC GLM;
CLASS A;
MODEL Y = A;
MEANS A / BON TUKEY HOVTEST;
OUTPUT OUT=AOVOUT R=E;
RUN;
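The residuals saved in AOVOUT are typically used to check the normality assumption of the ANOVA. A sketch that tests the residuals E for normality and draws a normal Q-Q plot:

```sas
PROC UNIVARIATE DATA=AOVOUT NORMAL;      /* NORMAL adds tests of normality */
    VAR E;
    QQPLOT E / NORMAL(MU=EST SIGMA=EST);  /* points near the line suggest normal residuals */
RUN;
```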

Kruskal-Wallis H-Test (Nonparametric): when A has more than two levels,
the WILCOXON option produces the Kruskal-Wallis test.

PROC NPAR1WAY WILCOXON;
CLASS A;
VAR Y;
RUN;

6. Randomized Block Design


Statistical Model: $Y_{ij} = \mu + \tau_i + b_j + \varepsilon_{ij} = \mu_i + b_j + \varepsilon_{ij}$, $i = 1,\dots,a$; $j = 1,\dots,b$
Let A represent the treatment factor, B represent the blocking factor,
and Y be the response variable. The dataset AOVOUT will contain the
original dataset and residuals (with variable name E).

ANOVA F-test and Bonferroni/Tukey Comparisons

PROC GLM;
CLASS A B;
MODEL Y = A B;
MEANS A / BON TUKEY;
OUTPUT OUT=AOVOUT R=E;
RUN;

Friedman’s Test (Nonparametric)

PROC FREQ;
TABLES B*A*Y / CMH2 SCORES=RANK NOPRINT;
RUN;

The Friedman statistic and its p-value are printed in the “Row Mean Scores
Differ” row of the CMH output.
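For reference, the textbook form of Friedman's statistic, with the responses ranked within each of the $b$ blocks and $R_i$ the sum of the ranks received by treatment $i$ (assuming no ties), is:

```latex
Q = \frac{12}{b\,a\,(a+1)} \sum_{i=1}^{a} R_i^2 \;-\; 3\,b\,(a+1)
```

Under $H_0$ of no treatment effect, $Q$ is approximately $\chi^2$ with $a - 1$ degrees of freedom, which matches the degrees of freedom of the “Row Mean Scores Differ” statistic for the B*A*Y table.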

7. 2-Factor ANOVA

Statistical Model:
$Y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{ijk}$, $i = 1,\dots,a$; $j = 1,\dots,b$; $k = 1,\dots,n$
The dataset AOVOUT will contain the original dataset and residuals
(with variable name E).

Additive Model – No Interaction

PROC GLM;
CLASS A B;
MODEL Y = A B;
MEANS A B / BON TUKEY;
OUTPUT OUT=AOVOUT R=E;
RUN;

Model With Interaction

PROC GLM;
CLASS A B;
MODEL Y = A B A*B;
MEANS A B / BON TUKEY;
OUTPUT OUT=AOVOUT R=E;
RUN;
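When the interaction is important, main-effect comparisons can mislead, so one factor is often compared separately at each level of the other. A sketch using LSMEANS in the same PROC GLM step: SLICE=B tests the effect of A within each level of B, and ADJUST=TUKEY adjusts the pairwise comparisons of the cell means:

```sas
PROC GLM DATA=ONE;
    CLASS A B;
    MODEL Y = A B A*B;
    LSMEANS A*B / SLICE=B;             /* F-test for A at each level of B */
    LSMEANS A*B / PDIFF ADJUST=TUKEY;  /* Tukey-adjusted pairwise cell comparisons */
RUN;
```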

8. Chi-Square Test
Cases are classified on two qualitative variables, A and B. We want to
test whether the classifications are independent (equivalently, whether
the conditional distribution of B is the same at every level of A).

PROC FREQ;
TABLES A*B / CHISQ EXPECTED;
RUN;
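As a worked illustration, the Pearson statistic reported by CHISQ is $\chi^2 = \sum (O - E)^2 / E$ with $E = (\text{row total})(\text{column total})/N$. For a 2×2 table with cell counts $n_{11}, n_{12}, n_{21}, n_{22}$ and margins $n_{i+}$, $n_{+j}$, it reduces to the shortcut below. Applied to the cell counts entered with NUMCASE in Section 3 (25, 32, 17, 42):

```latex
\chi^2 = \frac{N\,(n_{11}n_{22} - n_{12}n_{21})^2}{n_{1+}\,n_{2+}\,n_{+1}\,n_{+2}}
       = \frac{116\,(25 \cdot 42 - 32 \cdot 17)^2}{57 \cdot 59 \cdot 42 \cdot 74}
       = \frac{116 \cdot 506^2}{10\,452\,204} \approx 2.84
```

with $(2-1)(2-1) = 1$ degree of freedom. This should match the uncorrected “Chi-Square” row when that dataset is run with CHISQ and WEIGHT NUMCASE.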

When measures of association (with their asymptotic standard errors) are
desired instead of the Chi-Square test, use:

PROC FREQ;
TABLES A*B / MEASURES;
RUN;
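MEASURES prints each measure with its asymptotic standard error; a TEST statement adds significance tests for selected measures. A sketch with the CL option for confidence limits and tests for gamma and Kendall's tau-b (chosen here only as examples):

```sas
PROC FREQ DATA=ONE;
    TABLES A*B / MEASURES CL;  /* CL adds confidence limits for the measures */
    TEST GAMMA KENTB;          /* asymptotic tests that gamma = 0 and tau-b = 0 */
RUN;
```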

9. Linear Regression
Simple Linear Regression
Statistical Model: $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$, $i = 1,\dots,n$
The dataset REGOUT will contain the original dataset and residuals
(with variable name E).

PROC REG;
MODEL Y = X;
OUTPUT OUT=REGOUT R=E;
RUN;
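A sketch that also requests confidence limits for $\beta_0$ and $\beta_1$ (CLB), saves predicted values next to the residuals, and plots residuals against fitted values; the name YHAT is an illustrative choice:

```sas
PROC REG DATA=ONE;
    MODEL Y = X / CLB;             /* 95% confidence limits for the coefficients */
    OUTPUT OUT=REGOUT R=E P=YHAT;  /* residuals E and predicted values YHAT */
RUN;
QUIT;

PROC SGPLOT DATA=REGOUT;
    SCATTER X=YHAT Y=E;            /* residual vs. fitted plot */
    REFLINE 0 / AXIS=Y;            /* horizontal reference line at zero */
RUN;
```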

Multiple Linear Regression (Dataset contains variables X1,…,Xk)

Statistical Model: $Y_i = \beta_0 + \beta_1 X_{1i} + \dots + \beta_k X_{ki} + \varepsilon_i$, $i = 1,\dots,n$

PROC REG;
MODEL Y = X1 X2 … Xk;
OUTPUT OUT=REGOUT R=E;
RUN;
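With several predictors, PROC REG can also report collinearity diagnostics and perform automatic variable selection. A sketch assuming, for illustration only, k = 3 predictors X1, X2, X3 (X1-X3 is SAS shorthand for the numbered list X1 X2 X3); the entry and stay significance levels are illustrative:

```sas
PROC REG;
    MODEL Y = X1-X3 / VIF;                       /* variance inflation factors */
    MODEL Y = X1-X3 / SELECTION=STEPWISE
                      SLENTRY=0.15 SLSTAY=0.15;  /* stepwise selection */
RUN;
QUIT;
```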

10. Correlation
Data: Variables Y1,…,Yk
Pairwise Bivariate Correlations
PROC CORR; VAR Y1 … Yk; RUN;
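PROC CORR prints Pearson correlations by default; rank-based coefficients can be requested on the PROC statement. A sketch assuming, for illustration, k = 3 variables:

```sas
PROC CORR PEARSON SPEARMAN KENDALL;  /* each coefficient is printed with its p-value */
    VAR Y1-Y3;
RUN;
```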

Partial Correlation between Y and Z, Controlling for X

PROC CORR; VAR Y Z; PARTIAL X; RUN;

11. Generalized Linear Models


Logistic Regression
Statistical Model: Y is a binary outcome:
$\pi_i = P(Y_i = 1) = \dfrac{e^{\beta_0 + \beta_1 X_i}}{1 + e^{\beta_0 + \beta_1 X_i}}$, $i = 1,\dots,n$

PROC GENMOD DESCENDING;  /* model P(Y=1), not P(Y=0), for a 0/1 response */
MODEL Y = X / DIST=BIN LINK=LOGIT;
RUN;
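PROC LOGISTIC fits the same model and also prints the odds ratio for X. A sketch, assuming Y is coded 0/1; EVENT='1' makes the procedure model P(Y = 1):

```sas
PROC LOGISTIC;
    MODEL Y(EVENT='1') = X;  /* log-odds of Y=1; odds ratio for X is exp(beta1) */
RUN;
```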

Poisson Regression
Statistical Model: Y is a count outcome:
$Y_i \sim \text{Poisson}(\lambda_i)$, $\log(\lambda_i) = \beta_0 + \beta_1 X_i$, $E(Y_i) = \lambda_i$, $V(Y_i) = \lambda_i$

PROC GENMOD;
MODEL Y = X / DIST=POI LINK=LOG;
RUN;
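When counts are recorded over different exposures (time, area, population), a log-exposure offset models the rate rather than the count, and SCALE=DEVIANCE inflates the standard errors for overdispersion. A sketch assuming a hypothetical exposure variable T in the dataset:

```sas
DATA TWO;
    SET ONE;
    LOGT = LOG(T);  /* T is a hypothetical exposure variable */
RUN;

PROC GENMOD DATA=TWO;
    MODEL Y = X / DIST=POI LINK=LOG OFFSET=LOGT SCALE=DEVIANCE;
RUN;
```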

Negative Binomial Regression


Statistical Model: Y is a count outcome:
$Y_i \sim \text{NB}(\lambda_i, k)$, $\log(\lambda_i) = \beta_0 + \beta_1 X_i$, $E(Y_i) = \lambda_i$, $V(Y_i) = \lambda_i + \lambda_i^2/k$

PROC GENMOD;
MODEL Y = X / DIST=NB LINK=LOG;
RUN;
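A note on parameterization: GENMOD writes the negative binomial variance as $\mu + k\mu^2$ and reports that $k$ as its “Dispersion” parameter. In the notation of the model above, GENMOD's dispersion corresponds to $1/k$:

```latex
V(Y_i) = \lambda_i + \frac{\lambda_i^2}{k} = \lambda_i + \alpha\,\lambda_i^2,
\qquad \alpha = \frac{1}{k} \quad (\alpha = \text{GENMOD's reported Dispersion})
```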
