Correlation: Type Informat Name What It Does

The document provides examples of SAS code for common data management and analysis tasks including correlation analysis, importing and formatting data, date calculations, sorting, merging datasets, printing observations by group, and frequency analysis. Key functions and procedures demonstrated include PROC CORR, PROC FORMAT, DATDIF, YRDIF, MDY, WEEKDAY, DAY, MONTH, YEAR, PROC SORT, DATA MERGE, PROC PRINT, and PROC FREQ.

Uploaded by

bhavya
Copyright
© All Rights Reserved
CORRELATION

PROC CORR DATA=dataset <options>;
VAR variable(s);
WITH variable(s);
RUN;

PROC CORR DATA=sample PLOTS=SCATTER(NVAR=all);
VAR weight height;
RUN;

DATA WineRanking;
INPUT company $ type $ score 3. date MMDDYY10.;
FORMAT date MMDDYY8.;
DATALINES;
Helmes Pinot 56 09/14/2012
Helmes Reisling 38 09/14/2012
Vacca Merlot 91 09/15/2012
Sterling Pinot 65 06/30/2012
Sterling Prosecco 72 06/30/2012
;
RUN;

Type       Informat Name   What It Does
Character  $w.             Reads in character data of length w.
Numeric    w.d             Reads in numeric data of length w with d decimal places.
Date       MMDDYYw.        Reads in date data in the form 10-01-81.

PROC FORMAT;
VALUE GENDERCODE
0 = 'Male'
1 = 'Female';
VALUE ATHLETECODE
0 = 'Non-athlete'
1 = 'Athlete';
VALUE SMOKINGCODE
0 = 'Nonsmoker'
1 = 'Past smoker'
2 = 'Current smoker';
RUN;
DATA sample_formatted2;
SET sample;
FORMAT gender GENDERCODE. athlete ATHLETECODE. smoking SMOKINGCODE.;
RUN;

PROC PRINT DATA=sample LABEL; /* LABEL option: print variable labels instead of names */
VAR bday;
LABEL bday = "Date of Birth";
RUN;

DATE

DATDIF(start_date, end_date, basis);

DATA sample;
SET sample;
date = DATDIF(DOB, Admdate, 'ACT/ACT');
RUN;

Here the DATDIF function returns the difference between two date variables
(DOB and Admdate) in number of days and saves it in the new numeric variable date.

DATA sample;
SET sample;
years = YRDIF(DOB, Admdate, 'ACT/ACT');
RUN;

Here the YRDIF function gives the difference between two dates (DOB and Admdate) in
number of years and saves it in the new numeric variable years.

DATA sample;
SET sample;
date = MDY(mn, days, yr);
FORMAT date MMDDYY10.;
RUN;
Here a new variable date will be created by combining the values in the
variables mn, days, and yr using the MDY function. The (optional) MMDDYY10. format
tells SAS to display the date values in the form MM/DD/YYYY.

DATA sample;
SET sample;
wkday = WEEKDAY(DOB);
RUN;

Here the WEEKDAY function extracts the day of the week (1 = Sunday, ..., 7 = Saturday) from the
date variable DOB and saves it in the new numeric variable wkday.

DATA sample;
SET sample;
days = DAY(DOB);
RUN;

Here the DAY function extracts the day value from the date variable DOB and saves it in
the new numeric variable days.

DATA sample;
SET sample;
mn = MONTH(DOB);
RUN;

Here the MONTH function extracts the month value from the date variable DOB and
saves it in the new numeric variable mn.

DATA sample;
SET sample;
yr = YEAR(DOB);
RUN;

Here the YEAR function extracts the year portion from the date variable DOB and
saves it in the new numeric variable yr.
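
The date functions above can be tried together on a single observation. The following is only a sketch: the dataset name date_demo, the variable names age_days, age_years and rebuilt, and the two dates are made up for illustration and are not part of the sample data.

```sas
/* Hypothetical one-row dataset with made-up dates, for illustration only. */
DATA date_demo;
DOB = '20SEP1980'd;       /* SAS date literal */
Admdate = '31JAN2012'd;
age_days = DATDIF(DOB, Admdate, 'ACT/ACT');   /* number of days between the two dates */
age_years = YRDIF(DOB, Admdate, 'ACT/ACT');   /* number of years, may include a fraction */
wkday = WEEKDAY(DOB);     /* 1 = Sunday, ..., 7 = Saturday */
days = DAY(DOB);          /* 20 */
mn = MONTH(DOB);          /* 9 */
yr = YEAR(DOB);           /* 1980 */
rebuilt = MDY(mn, days, yr);   /* puts the three parts back together; equals DOB */
FORMAT DOB Admdate rebuilt MMDDYY10.;
RUN;

PROC PRINT DATA=date_demo;
RUN;
```

Printing the result is a quick way to check that rebuilt matches DOB and that the extracted parts are what you expect.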

SORTING

PROC SORT data=sample;
BY gender descending bday;
RUN;

The data is sorted first by gender. Within each gender, the data is then sorted in descending
order by birth date.

MERGE

DATA New-Dataset-Name (OPTIONS);
MERGE Dataset-Name-1 (OPTIONS) Dataset-Name-2 (OPTIONS);
BY Variable(s);
RUN;

The BY statement lists the variable(s) that identify which observation in the first dataset
represents the same subject as an observation in the second dataset.

DATA patients;
INPUT Subject_ID DOB Gender $;
INFORMAT DOB MMDDYY10.;
FORMAT DOB MMDDYY10.;
DATALINES;
1 9/20/1980 Female
2 6/12/1954 Male
3 4/2/2001 Male
4 8/29/1978 Female
5 2/28/1986 Female
;
RUN;

DATA initial_appointments;
INPUT Subject_ID Visit_Date Doctor $;
INFORMAT Visit_Date MMDDYY10.;
FORMAT Visit_Date MMDDYY10.;
DATALINES;
1 1/31/2012 Walker
2 2/2/2012 Jones
3 1/15/2012 Jones
5 1/29/2012 Smith
;
RUN;

PROC SORT DATA=patients;
BY Subject_ID;
RUN;

PROC SORT DATA=initial_appointments;
BY Subject_ID;
RUN;

/* One-to-one matching assumes that each subject appears exactly
   once in each of the datasets being merged. */
DATA one_to_one_match;
MERGE patients initial_appointments;
BY Subject_ID;
RUN;

ONE-TO-MANY MATCH

One-to-many matching can be used instead of one-to-one matching.
One-to-many matching assumes that each subject appears exactly once in one dataset,
but can have multiple matching records in another dataset.
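
A one-to-many merge might look like the sketch below. The dataset all_appointments and its rows are hypothetical, invented here so that some subjects have more than one visit. The patients dataset is the one created earlier and is assumed to be already sorted by Subject_ID.

```sas
/* Hypothetical dataset: a subject may have more than one visit. */
DATA all_appointments;
INPUT Subject_ID Visit_Date :MMDDYY10. Doctor $;
FORMAT Visit_Date MMDDYY10.;
DATALINES;
1 1/31/2012 Walker
1 3/15/2012 Walker
2 2/2/2012 Jones
3 1/15/2012 Jones
3 4/20/2012 Smith
;
RUN;

/* Both datasets must be sorted by the BY variable before merging. */
PROC SORT DATA=all_appointments;
BY Subject_ID Visit_Date;
RUN;

/* Each patient row is repeated once for every matching appointment row. */
DATA one_to_many_match;
MERGE patients all_appointments;
BY Subject_ID;
RUN;
```

With this made-up data, subjects 1 and 3 each get two rows in the merged dataset, while subjects 4 and 5 have no appointment rows and so keep missing values for Visit_Date and Doctor.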

PRINTING OBSERVATIONS BY GROUPS

PROC SORT DATA=sample;
BY Gender;
RUN;

PROC PRINT DATA=sample LABEL;
BY Gender;
ID ids;
VAR Height Weight;
FORMAT Height Weight 3.0;
RUN;

Because we want to print observations by gender, we must first sort the data using PROC SORT.
The BY statement specifies that we want to group the printed output by the levels of the
variable Gender. The ID statement specifies that the variable ids should be printed instead of
the observation number. Because we are only interested in the height and weight of each
student, these two variables are specified in the VAR statement. (Note, however, that the variable
given in the ID statement will automatically print, regardless of whether or not it is listed in the
VAR statement.) Finally, a FORMAT statement specifies that height and weight should print with
no decimal places. (Specifically, it says that the values should be no wider than three characters
and should have no digits after the decimal point.)

THE FREQ PROCEDURE

PROC FREQ DATA=dataset;
TABLES variable(s) / <options>;
RUN;

The TABLES statement is where you put the names of the variables you want to produce a
frequency table for. You can list as many variables as you want, with each variable separated by
a space.
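
As a small illustration, the sketch below requests three one-way frequency tables in a single TABLES statement. It assumes that the sample dataset contains the variables gender, athlete, and smoking (the ones given formats earlier); substitute your own variable names if it does not.

```sas
/* One frequency table is produced for each variable listed. */
PROC FREQ DATA=sample;
TABLES gender athlete smoking;
RUN;
```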

Descending Order and Missing Values

PROC FREQ DATA=sample ORDER=freq;
TABLE State Rank / MISSING;
RUN;

The ORDER=freq option in the first line of the syntax tells SAS to order the values in the table by
descending frequency, so the most common value appears first. The MISSING option appearing after
the slash (/) in the TABLE statement tells SAS to include the missing values as a row in the table.
