Let's Explore SAS Proc T-Test: Ana Yankovsky
Ana Yankovsky
Research Statistical Analyst
Screening Programs, AHS
Goals of the presentation:
• The t-test, or Student’s t-test, is any statistical hypothesis test in which the
test statistic (the t statistic) follows a Student’s t distribution when the null
hypothesis is true.
• William Sealy Gosset introduced the t statistic in 1908 while working at the
Guinness brewery in Dublin.
• He applied his findings to monitor the quality of stout in the production of dark beer.
• Because the company forbade its scientists to publish, Gosset published his work under the
pseudonym “Student”.
Syntax
• No statement can be used more than once. There is no restriction on the order of the
statements after the PROC statement.
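As a minimal sketch of how these statements fit together, the step below runs a two-sample t test on SASHELP.CLASS, a sample data set shipped with SAS. The data set and its variables (height, sex) are illustrative choices, not part of this presentation's data. The VAR statement is deliberately placed before CLASS to show that statement order does not matter.

```sas
/* Two-sample t test on the SASHELP.CLASS sample data set (illustrative only). */
proc ttest data=sashelp.class;
    var height;   /* analysis variable */
    class sex;    /* grouping variable with exactly two levels (F, M) */
run;
```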
The following options can appear in the PROC TTEST statement:
BY Statement:
BY variables ;
• You can specify a BY statement with PROC TTEST to obtain separate analyses on
observations in groups defined by the BY variables.
• When a BY statement appears, the procedure expects the input data set to be sorted
in order of the BY variables.
• If the input data set is not sorted in ascending order, use one of the following
alternatives:
  - Sort the data using the SORT procedure with a similar BY statement.
  - Specify the BY statement option NOTSORTED or DESCENDING in the BY
    statement for the TTEST procedure.
  - Create an index on the BY variables using the DATASETS procedure (in base
    SAS software).
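The first alternative (sort, then analyze by group) might look like the sketch below. It again assumes the SASHELP.CLASS sample data set; the output data set name CLASS_SORTED and the hypothesized mean of 60 are made up for illustration.

```sas
/* Sort by the BY variable first, so that PROC TTEST receives sorted input. */
proc sort data=sashelp.class out=work.class_sorted;
    by sex;
run;

/* One-sample t test of mean height against 60, run separately for each sex. */
proc ttest data=work.class_sorted h0=60;
    by sex;
    var height;
run;
```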
CLASS Statement:
CLASS variable ;
• The CLASS statement names the classification (or grouping) variable and must
accompany the PROC TTEST statement in the two-independent-sample case.
• If it is used without the VAR statement, all numeric variables in the input data
set (except those appearing in the CLASS, BY, FREQ, or WEIGHT statement)
are included in the analysis.
• The class variable must have two, and only two, levels.
• You can use either a numeric or a character variable in the CLASS statement.
FREQ Statement:
FREQ variable ;
• If the frequency value is less than 1 or is missing, the observation is not used
in the analysis.
• The FREQ statement cannot be used if the DATA= data set contains
statistics instead of the original observations.
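A small sketch of the FREQ statement, using made-up data: each row stores a score together with the number of times it occurred, and PROC TTEST treats the row as if it appeared that many times. The data set name SCORES, the variable names, the values, and the hypothesized mean of 72 are all hypothetical.

```sas
/* Hypothetical data: each row is a distinct score and how often it occurred. */
data work.scores;
    input score count;
    datalines;
70 3
75 5
80 2
;
run;

/* One-sample t test of the mean score against 72, weighting rows by frequency. */
proc ttest data=work.scores h0=72;
    var score;
    freq count;   /* each observation counts as COUNT observations */
run;
```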
PAIRED Statement:
PAIRED PairLists ;
• The CLASS and VAR statements cannot be used with the PAIRED statement.
PAIRED Statement (continued):
PAIRED PairLists ;
• The (*) requests comparisons between each variable on the left with each variable on the
right.
• The (:) requests comparisons between the first variable on the left and the first on the right,
the second on the left and the second on the right, and so forth. The number of variables on
the left must equal the number on the right when the colon is used.
• The VAR statement can be used with one- and two-sample t tests
but cannot be used with the PAIRED statement.
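The difference between the two operators can be sketched as follows. The data set PRESSURE, its four variables, and all values are hypothetical and serve only to show which comparisons each form of the PAIRED statement requests.

```sas
/* Hypothetical before/after measurements on five subjects. */
data work.pressure;
    input sbp_before sbp_after dbp_before dbp_after;
    datalines;
120 118 80 78
132 125 85 84
128 130 82 79
141 135 90 86
118 117 76 77
;
run;

/* (:) pairs first with first and second with second, giving two comparisons:
   sbp_before - sbp_after and dbp_before - dbp_after. */
proc ttest data=work.pressure;
    paired (sbp_before dbp_before):(sbp_after dbp_after);
run;

/* (*) crosses each left variable with each right variable, giving four comparisons. */
proc ttest data=work.pressure;
    paired (sbp_before dbp_before)*(sbp_after dbp_after);
run;
```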
WEIGHT Statement:
WEIGHT variable ;
• The WEIGHT statement weights each observation in the input data set by the
value of the WEIGHT variable.
• The values of the WEIGHT variable can be non-integer, and they are not truncated.
• Observations with negative, zero, or missing values for the WEIGHT variable are
not used in the analyses.
• The WEIGHT statement cannot be used with an input data set of summary
statistics.
PROC TTEST in SAS performs t-tests for:
• One sample;
• Paired observations;
• Two independent samples.
The underlying assumption of the t test in all three cases is that the
observations are random samples drawn from normally distributed populations.
For the group comparison, PROC TTEST computes the t statistic based on the
assumption that the variances of the two groups are equal. It also reports an
approximate t statistic for the case where the variances are unequal.
One sample t-test
H0: μ = μ0 VS Ha: μ ≠ μ0
One sample t-test : Example
For this example, we will compare the mean of the variable weight_loss in
the clinic group to a pre-selected value of 4, using an alpha level of 0.1:
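/* Keep the clinic group and compute each subject's weight loss (kg). */
DATA CLINIC;
    SET TTEST.MI_PHVSCL_CHDATA;
    IF GROUP='Clinic';                           /* subsetting IF: keep clinic rows only */
    WEIGHT_LOSS=WEIGHT_BASE_KG-WEIGHT_6MON_KG;   /* baseline minus 6-month weight */
RUN;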
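The PROC TTEST call for this example is not shown above. A plausible sketch, assuming the test was requested with the H0= and ALPHA= options of the PROC TTEST statement, is:

```sas
/* One-sample t test: is the mean weight loss in the clinic group equal to 4? */
proc ttest data=clinic h0=4 alpha=0.1;   /* ALPHA=0.1 requests 90% confidence limits */
    var weight_loss;
run;
```

H0= sets the hypothesized mean (it defaults to 0), while ALPHA= controls the confidence level of the reported intervals; the p-value itself does not depend on ALPHA=.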
Summary statistics
Test Statistics
• DF - The degrees of freedom for the single sample t-test is simply the number of valid
observations minus 1.
• t Value - This is the Student t statistic: the ratio of the difference between the sample
mean and the hypothesized value to the standard error of the mean. Because the standard
error of the mean measures the variability of the sample mean, the smaller the standard
error, the more likely it is that our sample mean is close to the true population mean.
• Pr > |t| - The two-tailed p-value computed from the t distribution. It is the
probability, under the null hypothesis, of observing an absolute value of t at least as
large as the one obtained. For a one-tailed test, halve this probability (provided the
sample mean lies on the side predicted by the alternative hypothesis). If the p-value is
less than the pre-specified alpha level (usually .05 or .01), we conclude that the mean is
statistically significantly different from the hypothesized value. In our example, the
p-value for weight_loss is smaller than 0.1, so we conclude that the mean of weight_loss
is significantly different from 4.
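The DF and t Value entries described above correspond to the standard one-sample formulas. The notation is an assumption introduced here, not taken from the slides: \bar{x} is the sample mean, s the sample standard deviation, n the number of valid observations, and \mu_0 the hypothesized value (4 in this example).

```latex
t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}, \qquad \mathrm{DF} = n - 1
```

The denominator s / \sqrt{n} is the standard error of the mean.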
One sample t-test : Example
Graphs
Paired t-test
For this example, we will compare the mean weight at baseline to the mean
weight at 6 months in the clinic group.
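The code for this example is not shown. A minimal sketch, assuming the CLINIC data set built in the one-sample example and its variables WEIGHT_BASE_KG and WEIGHT_6MON_KG, might be:

```sas
/* Paired t test: is the mean of (baseline weight - 6-month weight) equal to 0? */
proc ttest data=clinic;
    paired weight_base_kg*weight_6mon_kg;
run;
```

Only subjects with both measurements present contribute a difference to the test.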
Summary statistics
Test Statistics
Paired t-test: Example Results
Graphs
Two sample t-test
• PROC TTEST is used to test for the equality of means in
a two-sample (independent group) t-test.
• The key assumptions underlying the two-sample t-test are that the random
samples are independent and that the populations are normally distributed
with equal variances.
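Under the equal-variance assumption, the test uses the standard pooled two-sample t statistic. The notation is an assumption introduced here: \bar{x}_i, s_i^2 and n_i are the sample mean, sample variance and sample size of group i.

```latex
s_p^2 = \frac{(n_1 - 1)\,s_1^2 + (n_2 - 1)\,s_2^2}{n_1 + n_2 - 2},
\qquad
t = \frac{\bar{x}_1 - \bar{x}_2}{s_p \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}},
\qquad
\mathrm{DF} = n_1 + n_2 - 2
```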
Two sample t-test: Example
We would like to know whether the mean weight loss in the phone group is equal to
the mean weight loss in the clinic group.
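/* Compute weight loss (kg) for every subject in both groups. */
DATA TWOSAMPLE;
    SET TTEST.MI_PHVSCL_CHDATA;
    WEIGHT_LOSS=WEIGHT_BASE_KG-WEIGHT_6MON_KG;
RUN;

PROC SORT DATA=TWOSAMPLE;
    BY GROUP;
RUN;

/* Multiple imputation of missing values; imputed data are written to MI_TWOSAMPLE. */
PROC MI DATA=TWOSAMPLE SEED=54987 SIMPLE NIMPUTE=5 OUT=MI_TWOSAMPLE;
    MCMC CHAIN=MULTIPLE DISPLAYINIT INITIAL=EM(ITPRINT);
    VAR WEIGHT_LOSS Age RacialCategory;
RUN;

/* Two-sample t test on TWOSAMPLE (the data set as it was before imputation). */
PROC TTEST DATA=TWOSAMPLE;
    CLASS GROUP;
    VAR WEIGHT_LOSS;
    TITLE 'TTEST OF EQUALITY OF MEANS';
RUN;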
Two sample t-test: Example Results
Summary statistics
Same as for the one-sample t-test, but reported for each group and for the difference of means.
Two sample t-test: Example Results
Test Statistics
Graphs