Sas Tiny Manual
2 Data sets
Working with data sets in SAS isn't difficult, because you can use the so-called LIBNAME statement. This statement creates a reference (a libref) to the physical location of permanent data sets. SAS can also work with temporary data sets, which are deleted at the end of the SAS session. The LIBNAME statement makes a reference to the place on the hard disk or diskette where the permanent data sets are located. We can illustrate this with an example. Suppose that the permanent data set TEST.SD2 is located in the directory C:\FILES\SAS. Instead of typing the whole path name over and over again, we can use a LIBNAME statement to make a reference to this directory. We call this reference SAAB, but any other valid name will do (a libref may be at most eight characters long). The following SAS program makes the name SAAB refer to the directory C:\FILES\SAS.
SAS Program 1 (LIBNAME statement)
libname saab 'c:\files\sas';
run;
After submitting the program, the following will be shown in the LOG window of SAS.
[LOG window output not reproduced]

Note: To submit a program, choose LOCALS and then SUBMIT, or click the toolbar icon marked with an S (for SUBMIT). Permanent SAS data sets are files with the extension .sd2.
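You can check that the libref works with a minimal sketch like the one below. It assumes a SAS release recent enough to support the LIST option of the LIBNAME statement. The sketch writes the attributes of SAAB to the LOG window and lists the data sets stored in the directory:

```sas
/* Write the attributes of libref SAAB (engine, physical path) to the LOG */
libname saab list;

/* List all data sets in the library, without the per-data-set details */
proc contents data=saab._all_ nods;
run;
```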
As said before, permanent data sets are files with the extension .sd2. With the LIBNAME statement it's easy to access these files within SAS. Inside SAS, a permanent data set is referred to by a two-level name, libref.name, without the .sd2 extension. The file PROEF.SD2 in C:\FILES\SAS therefore becomes SAAB.PROEF. The procedure PRINT shows the contents of the data set PROEF.SD2 in the OUTPUT window. To do this we use the following SAS program:
SAS program 2 (Showing the contents of the data set PROEF.SD2)
proc print data=saab.proef;
run;
After submitting this program, we get the following output in the OUTPUT window:
OUTPUT window (partial)

OBS     UREN    LOGLOON
  1  40.0000    1.99461
  2  48.0000    1.03213
  3  31.9615    0.42720
  4  40.0000    1.30146
  5  49.8077    1.77532
  6  38.0000    2.02910
  7  40.9808    1.43139
  8  34.5192   -0.85376
  9  23.0769    2.71651
 10  40.0000    1.60157
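The listing above is restricted to the first ten observations and two variables. The sketch below produces a comparable listing by combining the OBS= data set option with a VAR statement. It assumes, as the output shows, that UREN and LOGLOON are variables in PROEF:

```sas
proc print data=saab.proef(obs=10);  /* first ten observations only */
  var uren logloon;                  /* print only these two variables */
run;
```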
If we want to change some of the variables in the data set we are working with, it's useful to work with a temporary copy, so that the permanent data set cannot be altered by accident. SAS program 3 uses the SET statement to copy the permanent data set PROEF.SD2 to the temporary data set TEST.
SAS program 3 (Copying to a temporary data set)
data test;
  set saab.proef;
run;
As you can see in the program, the libref SAAB is not used in the DATA statement. This tells SAS that the data set TEST is temporary: it is stored in the WORK library, which is cleared at the end of the session. Had we used the libref SAAB in the DATA statement (data saab.test;), the data set PROEF.SD2 would have been copied to the permanent data set TEST.SD2 in C:\FILES\SAS.
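SAS program 4 itself does not appear in this copy of the manual. SAS program 5 is described as SAS program 4 with the option /ACOV added. Program 4 was therefore presumably the following regression of UREN on LOGLOON, VAKBOND, MAN and GETR. Treat it as a reconstruction rather than the original text.

SAS program 4 (Linear regression with PROC REG)

```sas
proc reg data=saab.proef;                 /* permanent data set PROEF.SD2 */
  model uren = logloon vakbond man getr;  /* constant term added automatically */
run;
```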
In the PROC REG statement we specify the data set to be used. In our case it's the data set PROEF.SD2 in the directory C:\FILES\SAS. In the MODEL statement we write the equation we want to estimate: the dependent variable goes to the left of the equals sign and the explanatory variables to the right. SAS automatically adds a constant term. To estimate without a constant, add the option /NOINT at the end of the MODEL statement. SAS program 4 generates the following output:
OUTPUT window (Generated by SAS program 4)

Model: MODEL1
Dependent Variable: UREN

Analysis of Variance

                  Sum of         Mean
Source    DF     Squares       Square   F Value   Prob>F
Model      4  1567.50211    391.87553     4.324   0.0028
Error    108  9787.09178     90.62122
C Total  112 11354.59389

Root MSE    9.51952    R-square   0.1381
Dep Mean   41.74149    Adj R-sq   0.1061
C.V.       22.80589

Parameter Estimates

              Parameter      Standard    T for H0:
Variable  DF   Estimate         Error  Parameter=0
INTERCEP   1  41.378101    3.16675351       13.066
LOGLOON    1  -2.670460    1.72915476       -1.544
VAKBOND    1   3.422418    2.14442556        1.596
MAN        1   6.795788    1.83019412        3.713
GETR       1   0.672989    1.89331473        0.355
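For comparison, here is a sketch of the same regression estimated without a constant term, assuming the regressors of SAS program 5:

```sas
proc reg data=saab.proef;
  model uren = logloon vakbond man getr / noint;  /* no intercept */
run;
```

Without an intercept, SAS redefines R-square: the sums of squares are taken around zero instead of around the mean. This R-square therefore cannot be compared with the R-square of 0.1381 above.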
PROC REG has several options that make SAS generate more output. Suppose we want the estimated covariance matrix of the parameter estimates. Adding the option /COVB at the end of the MODEL statement in SAS program 4 makes SAS calculate this matrix and show it as extra output in the OUTPUT window. SAS can also calculate the so-called White covariance matrix, which remains valid when the errors are heteroskedastic. To get it, use the option /ACOV instead of /COVB in SAS program 4. SAS program 5 shows how to calculate this covariance matrix.
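For reference, the two matrices can be written in standard OLS notation. This notation is not used elsewhere in the manual:

- X is the n × k matrix of regressors, including the constant.
- x_i is the i-th row of X, written as a column vector.
- ê_i are the OLS residuals.
- s² is the mean square error (90.62122 in the output of SAS program 4, with n − k = 108 error degrees of freedom).

/COVB shows the usual estimate; /ACOV shows White's heteroskedasticity-consistent estimate:

```latex
\widehat{V}_{\mathrm{COVB}} = s^{2}\,(X'X)^{-1},
\qquad s^{2} = \frac{1}{n-k}\sum_{i=1}^{n}\hat{e}_i^{2}

\widehat{V}_{\mathrm{ACOV}} = (X'X)^{-1}
  \left(\sum_{i=1}^{n}\hat{e}_i^{2}\,x_i x_i'\right)(X'X)^{-1}

\operatorname{se}_j = \sqrt{\bigl[\widehat{V}\bigr]_{jj}}
```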
SAS program 5 (Calculation of the White Covariance Matrix)
proc reg data=saab.proef;
  model uren = logloon vakbond man getr / acov;
run;
Taking the square root of the diagonal elements of this matrix gives the so-called White standard errors. To calculate the Durbin-Watson statistic, which tests whether the errors have first-order autocorrelation, add the option /DW to the MODEL statement. This test is only meaningful when the observations are ordered in time.
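The program discussed next is not reproduced in this copy. Based on the description that follows, it had roughly the shape below. The regressors X1 and X2 are hypothetical placeholders, since the original explanatory variables are unknown. The libref BIEB is assumed to have been assigned earlier with a LIBNAME statement.

```sas
proc probit data=bieb.test;
  class union;                     /* binary dependent variable */
  model union = x1 x2 / d=normal;  /* normal: probit; d=logistic gives logit */
  output out=bieb.uit1 p=pred1;    /* writes UIT1 with the prediction PRED1 */
run;
```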
What does this program do? It uses the data set TEST.SD2 in the directory to which the libref BIEB points. The CLASS statement specifies that the variable UNION is the observed binary dependent variable of the response model under consideration. The latent variable behind it is not observed.
The MODEL statement specifies the model we want to estimate. It also tells SAS which distribution to use in the estimation. The example uses the option /D=NORMAL, so SAS estimates a PROBIT model. If we changed the option to /D=LOGISTIC, we would estimate a LOGIT model. The OUTPUT statement writes the data set UIT1 to the directory to which the libref BIEB points. This data set also contains the variable PRED1, which holds the prediction for every individual based on the estimated model. IMPORTANT: with PROC PROBIT, SAS estimates -β instead of β, because by default it models the probability of the lowest ordered response level (here UNION = 0). Be careful about this when interpreting the estimation results. A cross-tabulation can be used to determine the prediction quality of the model. To make such a cross-tabulation in SAS, we use the following program.
SAS program (Cross-tabulation to determine the prediction quality of the model)
data bieb.hulp1;
  set bieb.uit1;
  /* predicted class: 1 if PRED1 >= 0.5, else 0; missing stays missing */
  if pred1 = . then union_p1 = .;
  else if pred1 >= 0.5 then union_p1 = 1;
  else union_p1 = 0;
run;

proc freq data=bieb.hulp1;
  tables union*union_p1;
run;
The first part of this program creates the data set HULP1 from the data set UIT1. In HULP1, UNION_P1 is 1 for individuals whose prediction PRED1 is at least 0.5 and 0 otherwise. The second part produces a cross-tabulation of the predicted value of UNION (according to the model) against the observed value of UNION. From this cross-tabulation you can calculate a measure of fit of the model, such as the share of correctly predicted observations.
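The share of correctly predicted observations is the sum of the diagonal cells of the cross-tabulation (UNION = UNION_P1) divided by the number of observations. The sketch below computes it directly. It assumes UNION is coded numerically as 0/1 and uses a hypothetical temporary data set HULP2:

```sas
data hulp2;
  set bieb.hulp1;
  /* CORRECT = 1 if the prediction matches the observation, 0 if not;
     left missing when either value is missing */
  if union ne . and union_p1 ne . then correct = (union = union_p1);
run;

proc means data=hulp2 mean;
  var correct;  /* the mean of CORRECT is the share correctly predicted */
run;
```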