Training Handouts

SAS, or Statistical Analysis System, originated in 1966 and has evolved into a comprehensive software suite for data analysis and business administration. It includes components like DATA and PROC steps for data manipulation and reporting, and allows for various methods of data entry and storage. SAS data sets consist of a descriptor and data portion, with specific naming rules and attributes for variables.

Uploaded by

safura qazi
Copyright
© © All Rights Reserved

SAS Base- I

SAS stands for Statistical Analysis System.


The origins of the SAS system can be traced back to 1966. Originally envisioned by Anthony J. Barr,
the concept was based on the use of algebraic formulas to translate raw data into usable forms. Barr
continued to refine the process through 1968. During that year, Barr began working with James
Goodnight to expand the framework created by Barr. Initially utilized within the academic community,
the idea of SAS began to cross over into the business community in the early 1970s.

The collection of software that is routinely included in the SAS system allows the end user to perform
a wide range of tasks that cover just about every aspect of business administration and function.
Essentially, the SAS system represents a one-stop-shopping approach to getting all the programs
needed under one simple umbrella.

The software included in the typical SAS system provides tools for all sorts of projects and daily tasks.
Writing reports and creating graphics are easy using the tools provided. Research and project
management software aids in creating both operational and marketing strategies. Tools that allow for
quick and efficient data entry and retrieval make it possible to gather statistics or other information for
reports in no time at all. The SAS system usually includes components that will aid in departmental
functions that range from information systems support to human resources management and even
customer care protocols.

Components of a SAS Program

A SAS program is built from two kinds of steps: the DATA step and the PROC step. These two types
of steps, alone or combined, form all SAS programs.

DATA steps typically create or modify SAS data sets. They can also be used to produce
custom-designed reports. For example, you can use DATA steps to

• put your data into a SAS data set
• compute values
• check for and correct errors in your data
• produce new SAS data sets by subsetting, merging, and updating existing data sets.

PROC (procedure) steps are pre-written routines that enable you to analyze and process the data in a
SAS data set and to present the data in the form of a report. PROC steps sometimes create new SAS
data sets that contain the results of the procedure. PROC steps can list, sort, and summarize data.
For example, you can use PROC steps to

• create a report that lists the data
• produce descriptive statistics
• create a summary report
• produce plots and charts.

SAS programs consist of SAS statements. A SAS statement has two important characteristics:
• It usually begins with a SAS keyword.
• It always ends with a semicolon.

SAS statements are in free format. This means that

• they can begin and end anywhere on a line
• one statement can continue over several lines
• several statements can be on a line.

Blanks or special characters separate "words" in a SAS statement.

Methods for getting your data into SAS can be put into four general categories:
• entering data directly into SAS data sets
• creating SAS data sets from raw data files
• converting other software's data files into SAS data sets
• reading other software's data files directly.

Note: You can specify SAS statements in uppercase or lowercase but text that is enclosed in
quotation marks is case sensitive.
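
The free-format and case rules can be seen in a small sketch. The data set name demo and its two observations are invented for illustration. The DATA step uses a conventional layout, while the PROC step is written in lowercase, puts two statements on one line, and splits one statement over two lines; SAS treats both layouts the same way.

```sas
/* Conventional layout: one statement per line, uppercase keywords */
DATA demo;
   INPUT Name $ Score;
   DATALINES;
Ann 90
Bob 85
;
RUN;

/* Free format: lowercase keywords, two statements on one line,
   and one statement (VAR) continued over two lines */
proc print data=demo; where Name = 'Ann';
   var Name
       Score;
run;
```

Only the text inside the quotation marks is case sensitive: the WHERE statement selects the observation whose Name is 'Ann', and writing 'ANN' instead would select no observations.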

SAS Names
SAS names follow a simple naming rule: All SAS variable names and data set names can be no
longer than 32 characters and must begin with a letter or the underscore (_) character. The remaining
characters in the name may be letters, digits, or the underscore character. Characters such as dashes
and spaces are not allowed. Here are some valid and invalid SAS names:

Valid SAS Names


Parts
LastName
First_Name
_Ques5_
Cost_per_Pound
DATE
time
X12Y34Z56

Invalid SAS Names


8_is_enough        Begins with a number
Price per Pound    Contains blanks
Month-total        Contains an invalid character (-)
Num%               Contains an invalid character (%)

SAS Data Sets

1. Variable Attributes: Type

Character ($)
Numeric

2. Variable Type and Missing Values

For character variables, a blank represents a missing value.
For numeric variables, a period represents a missing value.

3. Variable Attributes: Length

• Character variables can be up to 32K (32,767 bytes) long. The default length is 8.
• All numeric variables have a default length of 8. Numeric values (no matter how many
digits they contain) are stored as floating-point numbers in 8 bytes of storage, unless you
specify a different length.
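
As a minimal sketch of these attributes (the data set name types_demo and the data values are invented for illustration), the step below declares one character variable with a non-default length and one numeric variable, and reads a record in which both values are missing.

```sas
DATA types_demo;
   LENGTH City $ 20;          /* character variable, 20 bytes instead of the default 8 */
   INPUT City $ Population;   /* Population is numeric and stored in 8 bytes */
   DATALINES;
Springfield 116250
. .
;
RUN;

PROC PRINT DATA = types_demo;
RUN;
```

In the second observation the period read for City is stored as a blank (the character missing value), while Population is stored and printed as a period (the numeric missing value).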

Processing SAS Programs


When you submit a SAS program, SAS begins reading the statements and checking them for errors.
DATA and PROC statements signal the beginning of a new step. When SAS encounters a
subsequent DATA, PROC, or RUN statement (for DATA steps and most procedures) or a QUIT
statement (for some procedures), SAS stops reading statements and executes the previous step in
the program.
Log Messages
Each time a step is executed, SAS generates a log of the processing activities and the results of
the processing. The SAS log collects messages about the processing of SAS programs and about
any errors that occur.

How SAS Files Are Stored


Every SAS file is stored in a SAS library, which is a collection of SAS files. A SAS data library is
the highest level of organization for information within SAS.
SAS libraries have different implementations depending on your operating environment, but a
library usually corresponds to the level of organization that your host operating system uses to
access and store files. In some operating environments, a library is a physical collection of files.
In others, the files are only logically related.

Storing Files Temporarily or Permanently


Depending on the library name that you use when you create a file, you can store SAS files
temporarily or permanently.

Temporary SAS libraries last only for the current SAS session.

Storing files temporarily:


If you don't specify a library name when you create a file (or if you specify the library name
Work), the file is stored in the temporary SAS data library. When you end the session, the temporary
library and all of its files are deleted.

Permanent SAS libraries are available to you during subsequent SAS sessions.

Storing files permanently:


To store files permanently in a SAS data library, you specify a library name other than the default
library name Work.

For example, the LIBNAME statement below assigns the library name ABC to a folder. By specifying the
library name ABC when you create a file, you specify that the file is to be stored in a permanent SAS
data library until you delete it.

LIBNAME ABC 'c:\users\acct\qtr1\report';

Note: A library name (libref) cannot be longer than 8 characters.

Referencing Permanent SAS Files

Two-Level Names
To reference a permanent SAS data set in your SAS programs, you use a two-level name:
libref.filename
In the two-level name, libref is the name of the SAS data library that contains the file, and
filename is the name of the file itself. A period separates the libref and filename.

For example, ABC.Student is the two-level name for the SAS data set
Student, which is stored in the library named ABC.

Referencing Temporary SAS Files


To reference temporary SAS files, you can specify the default libref Work, a period, and the
filename. For example, the two-level name Work.Test references the SAS data set named Test
that is stored in the temporary SAS library Work.

Alternatively, you can use a one-level name (the filename only) to reference a file in a temporary
SAS library. When you specify a one-level name, the default libref Work is assumed. For
example, the one-level name Test also references the SAS data set named Test that is stored in
the temporary SAS library Work.
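
The naming rules above can be sketched as follows. The libref ABC and its path come from the LIBNAME example earlier in this handout; the folder is assumed to exist on your machine, and the variable x is made up for illustration.

```sas
/* Assign the libref ABC to a folder (assumed to exist) */
LIBNAME ABC 'c:\users\acct\qtr1\report';

/* One-level name: stored as Work.Test and deleted when the session ends */
DATA Test;
   x = 1;
RUN;

/* Two-level name: stored permanently as ABC.Student */
DATA ABC.Student;
   SET Work.Test;   /* Work.Test and Test refer to the same data set */
RUN;
```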
Overview of Data Sets

Conceptually, a SAS data set is a file that consists of two parts: a descriptor portion and a data
portion.

Descriptor Portion
The descriptor portion of a SAS data set contains information about the data set, including
• the name of the data set
• the date and time that the data set was created
• the number of observations
• the number of variables.

In addition to general information about the data set, the descriptor portion contains information about
the attributes of each variable in the data set. The attribute information includes the variable's name,
type, length, format, informat, and label.

Data Portion
The data portion of a SAS data set is a collection of data values that are arranged in a rectangular
table.

Examining the Descriptor Portion of a SAS Data Set


Using PROC CONTENTS

PROC CONTENTS DATA = YUNUS.ADMIT;
RUN;

Data Set Name        YUNUS.ADMIT                      Observations          21
Member Type          DATA                             Variables             9
Engine               V9                               Indexes               0
Created              19:13 Thursday, July 15, 1993    Observation Length    64
Last Modified        19:13 Thursday, July 15, 1993    Deleted Observations  0
Protection                                            Compressed            NO
Data Set Type                                         Sorted                NO
Label
Data Representation  WINDOWS
Encoding             wlatin1 Western (Windows)

Engine/Host Dependent Information

Data Set Page Size          8192
Number of Data Set Pages    1
First Data Page             1
Max Obs per Page            127
Obs in First Data Page      21
Number of Data Set Repairs  0
File Name                   C:\SASDATA\admit.sas7bdat
Release Created             9.0000M0
Host Created                WIN_PRO

Alphabetic List of Variables and Attributes

#  Variable  Type  Len  Format
8  ActLevel  Char    4
4  Age       Num     8
5  Date      Num     8  MMDDYY8.
9  Fee       Num     8  6.2
6  Height    Num     8
1  ID        Char    4
2  Name      Char   14
3  Sex       Char    1
7  Weight    Num     8

Note: Here the variables are listed in alphabetical order.

Demonstrating the VARNUM option of PROC CONTENTS

A more useful way to list variable information is to list the variables in the order in which they are
stored in the SAS data set, rather than alphabetically. To create such a list, use the
VARNUM option of PROC CONTENTS, like this:

PROC CONTENTS DATA = YUNUS.ADMIT VARNUM;


RUN;

Data Set Name        YUNUS.ADMIT                      Observations          21
Member Type          DATA                             Variables             9
Engine               V9                               Indexes               0
Created              19:13 Thursday, July 15, 1993    Observation Length    64
Last Modified        19:13 Thursday, July 15, 1993    Deleted Observations  0
Protection                                            Compressed            NO
Data Set Type                                         Sorted                NO
Label
Data Representation  WINDOWS
Encoding             wlatin1 Western (Windows)

Engine/Host Dependent Information

Data Set Page Size          8192
Number of Data Set Pages    1
First Data Page             1
Max Obs per Page            127
Obs in First Data Page      21
Number of Data Set Repairs  0
File Name                   C:\SASDATA\admit.sas7bdat
Release Created             9.0000M0
Host Created                WIN_PRO

Variables in Creation Order

#  Variable  Type  Len  Format
1  ID        Char    4
2  Name      Char   14
3  Sex       Char    1
4  Age       Num     8
5  Date      Num     8  MMDDYY8.
6  Height    Num     8
7  Weight    Num     8
8  ActLevel  Char    4
9  Fee       Num     8  6.2


Listing All the SAS Data Sets in a SAS Library Using PROC CONTENTS

PROC CONTENTS DATA = YUNUS._all_ nods;


RUN;

Directory

Libref         YUNUS
Engine         V9
Physical Name  C:\SASDATA
File Name      C:\SASDATA

#  Name           Member Type  File Size  Last Modified
1  ABC            DATA              9216  15JUL1993:19:47:01
2  ADMIT          DATA              9216  15JUL1993:19:13:52
3  CUSTOMERS      DATA              5120  15JUL1993:16:36:40
4  CUSTOMERS2     DATA              5120  15JUL1993:16:36:41
5  INVENTORY      DATA              5120  15JUL1993:16:36:41
6  INVOICE        DATA              5120  15JUL1993:16:36:41
7  MANUFACTURERS  DATA              5120  15JUL1993:16:36:41
8  PRODUCTS       DATA              9216  15JUL1993:16:36:41
9  PURCHASES      DATA              5120  15JUL1993:16:37:18


Debugging a SAS Program

When SAS encounters a syntax error, SAS identifies the error and writes the location and
explanation of the error to the SAS log. Syntax errors generally cause SAS software to
stop processing the step where the error is encountered.

Diagnosing and Correcting Syntax Errors

Common syntax errors include

• misspelled keywords
• missing semicolons
• unbalanced quotation marks
• invalid options.

Internal raw data


If you type raw data directly in your SAS program, then the data are internal to your program. You
may want to do this when you have small amounts of data, or when you are testing a program with a
small test data set. Use the DATALINES statement to indicate internal data. The DATALINES
statement must be the last statement in the DATA step. All lines in the SAS program following the
DATALINES statement are considered data until SAS encounters a semicolon. The semicolon can be
on a line by itself or at the end of a SAS statement which follows the data lines. Any statements
following the data are part of a new step. If you are old enough to remember punching computer
cards, you might like to use the CARDS statement instead. The CARDS statement and the
DATALINES statement are synonymous. The following SAS program illustrates the use of the
DATALINES statement.

DATA uspresidents;
INPUT President $ Party $ Number;
DATALINES;
Adams F 2
Lincoln R 16
Grant R 18
Kennedy D 35
;
RUN;

External raw data files


Usually you will want to keep data in external files, separating the data from the program. This
eliminates the chance that data will accidentally be altered when you are editing your SAS program.
Use the INFILE statement to tell SAS the filename and path, if appropriate, of the external file
containing the data. The INFILE statement follows the DATA statement and must precede the INPUT
statement. After the INFILE keyword, the file path and name are enclosed in quotation marks.
Suppose the following data are in a plain text file called President.txt in the directory
/wns/Training:
Adams F 2
Lincoln R 16
Grant R 18
Kennedy D 35

The following program shows the use of the INFILE statement to read the external data file:

DATA uspresidents;
INFILE '/wns/Training/President.txt';
INPUT President $ Party $ Number;
RUN;

Long records
In some operating environments, SAS assumes external files have a record length of 256 or less.
(The record length is the number of characters, including spaces, in a data line.) If your data lines
are long, and it looks like SAS is not reading all your data, then use the LRECL= option in the INFILE
statement to specify a record length at least as long as the longest record in your data file.
INFILE '/wns/Training/President.txt' LRECL=2000;

SAS OPTIONS
DATE|NODATE
NUMBER|NONUMBER
PAGENO=
PAGESIZE=
LINESIZE=
OBS=
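
These system options are set with the OPTIONS statement. As a short sketch (the particular values are arbitrary choices for illustration): DATE|NODATE and NUMBER|NONUMBER control whether the date and page numbers are printed at the top of each output page, PAGENO= resets the page number, PAGESIZE= and LINESIZE= set the number of lines per page and the line width, and OBS= sets the last observation that SAS processes.

```sas
/* No date or page numbers; pages of 60 lines, 80 columns wide */
OPTIONS NODATE NONUMBER LINESIZE = 80 PAGESIZE = 60;

/* Print page numbers again, restarting at page 1 */
OPTIONS NUMBER PAGENO = 1;

/* Process only the first 10 observations of each data set (handy for testing) */
OPTIONS OBS = 10;

/* Go back to processing all observations */
OPTIONS OBS = MAX;
```

An option stays in effect until you change it again or end the SAS session, so it is worth resetting OBS= after a test run.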

INFORMATS

Informats are useful anytime you have non-standard data. (Standard numeric data contain only
numerals, decimal points, minus signs, and E for scientific notation.) Numbers with embedded
commas or dollar signs are examples of non-standard data. SAS has informats for reading these
types of data as well.
Dates are perhaps the most common non-standard data. Using date informats, SAS will convert
conventional forms of dates like 10-31-2003 or 31OCT03 into a number, the number of days since
January 1, 1960. This number is referred to as a SAS date value. This turns out to be extremely
useful when you want to do calculations with dates. For example, you can easily find the number of
days between two dates by subtracting one from the other.

There are three general types of informats: character, numeric, and date.
Character: - $informatw.
Numeric: - informatw.d
Date: - informatw.

The $ indicates character informats, INFORMAT is the name of the informat, w is the total width, and
d is the number of decimal places (numeric informats only). The period is a very important part of the
informat name. Without a period, SAS may try to interpret the informat as a variable name, which by
default, cannot contain any special characters except the underscore. Two informats do not have
names: $w., which reads standard character data, and w.d, which reads standard numeric data.
DATA contest;
INFILE DATALINES;
/* Formatted input reads by position, so the data lines must stay column-aligned */
INPUT Name $16. +1 Age 2. +1 Type $1. +1 Date MMDDYY10. (Score1 Score2
Score3 Score4 Score5) (4.1);
DATALINES;
Alicia Grossman  13 c 10-28-2003 7.8 6.5 7.2 8.0 7.9
Matthew Lee       9 D 10-30-2003 6.5 5.9 6.8 6.0 8.1
Elizabeth Garcia 10 C 10-29-2003 8.9 7.9 8.5 9.0 8.8
Lori Newcombe     6 D 10-30-2003 6.7 5.6 4.9 5.2 6.1
Jose Martinez     7 d 10-31-2003 8.9 9.510.0 9.7 9.0
Brian Williams   11 C 10-29-2003 7.8 8.4 8.5 7.9 8.0
;
RUN;

The variable Name has an informat of $16., meaning that it is a character variable 16 columns wide.
Variable Age has an informat of 2., so it is numeric, two columns wide, and has no decimal places.
The +1 skips over one column. Variable Type is character, and it is one column wide. Variable Date
has an informat MMDDYY10. and reads dates in the form 10-31-2003 or 10/31/2003, each 10
columns wide. The remaining variables, Score1 through Score5, all require the same informat, 4.1. By
putting the variables and the informat in separate sets of parentheses, you only have to list the
informat once.
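
A further sketch shows informats applied to non-standard numbers and to dates in two different forms. The data set sales_demo and its values are invented for illustration, and the colon modifier (described later under delimited files) is used so that each value is read up to the next blank. COMMA10. strips the dollar sign and commas, MMDDYY10. and DATE9. both produce SAS date values, and the subtraction gives the number of days between the two dates.

```sas
DATA sales_demo;
   INPUT Item $ Price :COMMA10. Ordered :MMDDYY10. Shipped :DATE9.;
   DaysToShip = Shipped - Ordered;   /* difference between two SAS date values */
   DATALINES;
Lamp $1,250.50 10-28-2003 31OCT2003
Desk $980 10/30/2003 05NOV2003
;
RUN;

PROC PRINT DATA = sales_demo;
   FORMAT Ordered Shipped DATE9.;    /* display the stored day counts as dates */
RUN;
```

DaysToShip is 3 for the first observation and 6 for the second, because both dates are stored as day counts from January 1, 1960.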

FORMATS

Formats affect only the way that the data values appear in output, not the actual data values as they
are stored in the SAS data set.

PROC PRINT DATA = contest NOOBS;
TITLE 'Pumpkin Carving Contest';
FORMAT Date DATE7.;
RUN;

Name              Age  Type  Date     Score1  Score2  Score3  Score4  Score5
Alicia Grossman    13  c     28OCT03     7.8     6.5     7.2     8.0     7.9
Matthew Lee         9  D     30OCT03     6.5     5.9     6.8     6.0     8.1
Elizabeth Garcia   10  C     29OCT03     8.9     7.9     8.5     9.0     8.8
Lori Newcombe       6  D     30OCT03     6.7     5.6     4.9     5.2     6.1
Jose Martinez       7  d     31OCT03     8.9     9.5    10.0     9.7     9.0
Brian Williams     11  C     29OCT03     7.8     8.4     8.5     7.9     8.0


You can permanently assign a format to a variable in a SAS data set, or you can temporarily specify a
format in a PROC step to determine the way that the data values appear in output.

DATA contest2;
INFILE DATALINES;
INPUT Name $16. +1 Age 2. +1 Type $1. +1 Date MMDDYY10. (Score1 Score2
Score3 Score4 Score5) (4.1);
/* A FORMAT statement in the DATA step stores the format with the data set */
FORMAT Date DATE9.;
DATALINES;
Alicia Grossman  13 c 10-28-2003 7.8 6.5 7.2 8.0 7.9
Matthew Lee       9 D 10-30-2003 6.5 5.9 6.8 6.0 8.1
Elizabeth Garcia 10 C 10-29-2003 8.9 7.9 8.5 9.0 8.8
Lori Newcombe     6 D 10-30-2003 6.7 5.6 4.9 5.2 6.1
Jose Martinez     7 d 10-31-2003 8.9 9.510.0 9.7 9.0
Brian Williams   11 C 10-29-2003 7.8 8.4 8.5 7.9 8.0
;
RUN;

PROC PRINT DATA = CONTEST2 NOOBS;


RUN;

Name              Age  Type  Date       Score1  Score2  Score3  Score4  Score5
Alicia Grossman    13  c     28OCT2003     7.8     6.5     7.2     8.0     7.9
Matthew Lee         9  D     30OCT2003     6.5     5.9     6.8     6.0     8.1
Elizabeth Garcia   10  C     29OCT2003     8.9     7.9     8.5     9.0     8.8
Lori Newcombe       6  D     30OCT2003     6.7     5.6     4.9     5.2     6.1
Jose Martinez       7  d     31OCT2003     8.9     9.5    10.0     9.7     9.0
Brian Williams     11  C     29OCT2003     7.8     8.4     8.5     7.9     8.0

Working with SAS Dates

A SAS date is a numeric value equal to the number of days since January 1, 1960.
The table below lists four dates and their values as SAS dates:

Date               SAS date value
January 1, 1959    -365
January 1, 1960    0
January 1, 1961    366
January 1, 2003    15706

SAS has special tools for working with dates: informats for reading dates, functions for
manipulating dates, and formats for printing dates.

SAS has a variety of date informats for reading dates in many different forms. All of these informats
convert your data to a number equal to the number of days since January 1, 1960.
Setting the default century

When SAS sees a date with a two-digit year like 07/04/76,
SAS has to decide in which century the year belongs. Is the year 1976, 2076, or perhaps 1776? The
system option YEARCUTOFF= specifies the first year of a hundred-year span for SAS to use. The
default value for this option is 1920 in the release described here (the default differs between SAS
releases), but you can change this value with the OPTIONS statement.
To avoid problems, you may want to specify the YEARCUTOFF= option whenever you have data
containing two-digit years. This statement tells SAS to interpret two-digit years as occurring between
1950 and 2049:
OPTIONS YEARCUTOFF = 1950;
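
As a small sketch of the effect (the data set century_demo and its two dates are invented for illustration), the step below reads two-digit years while YEARCUTOFF=1950 is in force. The DATE9. format prints the year with four digits so you can see which century SAS chose.

```sas
OPTIONS YEARCUTOFF = 1950;   /* two-digit years fall in the span 1950-2049 */

DATA century_demo;
   INPUT Event $ EventDate :MMDDYY8.;
   FORMAT EventDate DATE9.;
   DATALINES;
Early 07/04/76
Late 01/15/30
;
RUN;

PROC PRINT DATA = century_demo;
RUN;
```

With this setting the year 76 is read as 1976 and the year 30 as 2030, because both must fall inside the hundred-year span that starts in 1950.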

Dates in SAS expressions

Once a variable has been read with a SAS date informat, it can be
used in arithmetic expressions like other numeric variables. For example, if a library book is due in
three weeks, you could find the due date by adding 21 days to the date it was checked out:
DateDue = DateCheck + 21;

You can use a date as a constant in a SAS expression by adding quotation marks and a letter D.

PROC PRINT DATA = CONTEST2 NOOBS;


WHERE DATE = '29OCT2003'd;
RUN;
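
Date constants, date arithmetic, and date functions can be combined in one DATA step. The sketch below reuses the DateCheck and DateDue names from the library example; the data set name date_demo and the other variable names are made up for illustration. TODAY() and MDY() are standard SAS date functions.

```sas
DATA date_demo;
   DateCheck = '29OCT2003'd;          /* date constant: quoted date followed by D */
   DateDue   = DateCheck + 21;        /* three weeks later: 19NOV2003 */
   Halloween = MDY(10, 31, 2003);     /* build a date from month, day, year */
   RunDate   = TODAY();               /* the date on which the step runs */
   DaysSinceDue = RunDate - DateDue;  /* plain subtraction of two date values */
   FORMAT DateCheck DateDue Halloween RunDate DATE9.;
RUN;

PROC PRINT DATA = date_demo;
RUN;
```

Without the FORMAT statement the date variables would print as raw day counts, for example 16007 for 29OCT2003.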

Input Styles:

1. List input
2. Formatted input
3. Column input
4. Mixed input

List Input style

Characteristics of list input style

• Fields must be separated by at least one blank.
• Each field must be specified in order.
• Missing values must be represented by a period.
• Character values can't contain embedded blanks.
• The default length of character variables is 8. A longer value is truncated when it is written to the
program data vector.
• Data must be standard character or standard numeric data.

data list_input;
input name $ age sal ;
cards;
venu 24 456.09
inder 25 467.17
reddy 21 766.36
hanu 26 765.89
;
run;
Column Input style
In column input, the INPUT statement lists each variable followed by its column location. The
statement below lists the variables in order from left to right:

input Item $ 1-13 IDnum $ 15-19 InStock 21-22 BackOrd 24-25;

However, one of the features of column input is the capability to read fields in any order. For
example, you could have read the values for InStock and BackOrd before the values for
Item and IDnum.

When you use column input, your data must be

• standard character or numeric values
• in fixed fields.

Standard numeric data values can contain only

• numbers
• decimal points
• numbers in scientific or E-notation (2.3E4, for example)
• plus or minus signs.

Nonstandard numeric data includes
• values that contain special characters, such as percent signs (%), dollar signs ($), and
commas (,)
• date and time values
• data in fraction, integer binary, real binary, and hexadecimal forms.

When you use column input, you can

• read any or all fields from the raw data file
• read the fields in any order
• specify only the starting column for values that occupy only one column.

Column input has the following advantages over list input:

• Spaces are not required between values.
• Missing values can be left blank.
• Character data can have embedded spaces.
• You can skip unwanted variables.

DATA column_input_1;
INFILE DATALINES;
/* Column input: each value must sit in the columns named in the INPUT statement */
INPUT Name $ 1-16 Age 18-19 Type $ 21 Score1 23-25;
DATALINES;
Alicia Grossman  13 c 7.8
Matthew Lee       9 D 6.5
Elizabeth Garcia 10 C 8.9
Lori Newcombe     6 D 6.7
Jose Martinez     7 d 8.9
Brian Williams   11 C 7.8
;
RUN;
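
The other column input features listed above (reading fields in any order, skipping a field, and leaving a missing value blank) can be sketched with the same layout. The data set name column_input_2 is made up, and the second data line deliberately leaves column 21 blank.

```sas
DATA column_input_2;
   /* Fields are read out of order; Age (columns 18-19) is skipped entirely */
   INPUT Score1 23-25 Type $ 21 Name $ 1-16;
   DATALINES;
Alicia Grossman  13 c 7.8
Matthew Lee       9   6.5
Elizabeth Garcia 10 C 8.9
;
RUN;

PROC PRINT DATA = column_input_2;
RUN;
```

The variables are stored in the order in which they appear in the INPUT statement (Score1, Type, Name), and Type is missing for Matthew Lee because its column is blank.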
Formatted Input style
Formatted input is a very powerful method for reading both standard and nonstandard data in fixed
fields.

Formatted input style works with two column pointer controls.

• The @n moves the input pointer to a specific column number.
• The +n moves the input pointer forward to a column number that is relative to the current
position.

Using the @n Column Pointer Control

The @n is an absolute pointer control that moves the input pointer to a specific column number.
The @ moves the pointer to column n, which is the first column of the field that is being read.

DATA PATIENT;
INFILE DATALINES;
/* @n jumps to an absolute column, so each field must start in that column */
INPUT @1 ID $4. @6 Name $14. @21 Gender $1. @23 Age 2. @26 Date MMDDYY8.
@35 Height 2. @38 Weight 3. @42 ActLevel $4. @47 Fee 6.2 ;
datalines;
2588 Ivan, H        F 22 06/02/97 63 139 LOW   85.20
2586 Derber, B      M 25 06/04/97 75 188 HIGH  85.20
2458 Murray, W      M 27 06/05/97 72 168 HIGH  85.20
2572 Oberon, M      F 28 06/05/97 62 118 LOW   85.20
2544 Jones, M       M 29 06/07/97 76 193 HIGH 124.80
2574 Peterson, V    M 30 06/08/97 69 147 MOD  149.75
2501 Bonaventure, T F 31 06/09/97 61 123 LOW  149.75
2552 Reberson, P    F 32 06/10/97 67 151 MOD  149.75
;
run;

The +n Pointer Control

The +n pointer control moves the input pointer forward to a column number that is relative to the
current position. The + moves the pointer forward n columns.

DATA formatted_input_rel_1;
INFILE DATALINES;
/* +n skips n columns from the current position; note the +2 (two blanks) before Weight */
INPUT ID $4. +1 Name $14. +1 Gender $1. +1 Age 2. +1 Date MMDDYY8.
+1 Height 2. +2 Weight 3. +1 ActLevel $4. +1 Fee 6.2 ;
datalines;
2588 Ivan, H        F 22 06/02/97 63  139 LOW   85.20
2586 Derber, B      M 25 06/04/97 75  188 HIGH  85.20
2458 Murray, W      M 27 06/05/97 72  168 HIGH  85.20
2572 Oberon, M      F 28 06/05/97 62  118 LOW   85.20
2544 Jones, M       M 29 06/07/97 76  193 HIGH 124.80
2574 Peterson, V    M 30 06/08/97 69  147 MOD  149.75
2501 Bonaventure, T F 31 06/09/97 61  123 LOW  149.75
2552 Reberson, P    F 32 06/10/97 67  151 MOD  149.75
;
run;
Creating a single observation from multiple records

Line pointer controls

Absolute line pointer control: #n
Relative line pointer control: /

Absolute line pointer control

data line_pointer_1;
input #1 Name $ Age Gender $ #2 City $ salary #3 State $;
datalines;
Raj 22 m
Delhi 22000
Delhi
rahul 25 m
Gurgaon 25000
Haryana
;
run;

Relative line pointer control

data line_pointer_rel;
input Name $ Age Gender $ / City $ salary / State $;
datalines;
Raj 22 m
Delhi 22000
Delhi
rahul 25 m
Gurgaon 25000
Haryana
;
run;

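Creating multiple observations from a single record

Line-hold specifier: @@ (double trailing @)

The double trailing @ holds the current data line across iterations of the DATA step, so each
iteration reads the next pair of values from the same line.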


data line_holder;
input Name $ Age @@;
datalines;
raj 22 rahul 23 sachin 24
rani 23 priya 21 ravi 21
;
run;

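Line-hold specifier: @ (single trailing @)

The single trailing @ holds the current data line for the next INPUT statement within the same
iteration of the DATA step.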


data line_holder_2;
input City $ @;
do Quarter= 1 to 4;
input rain @;
output;
end;
datalines;
Delhi 56 63 45 33
Mumbai 52 53 66 32
Calcutta 52 45 26 34
;
run;
Reading Delimited Files with the DATA Step

The DLM= option

If you read your data using list input, the DATA step expects your file to have
spaces between your data values. The DELIMITER=, or DLM=, option in the INFILE statement allows
you to read data files with other delimiters. The comma and tab characters are common delimiters
found in data files, but you could read data files with any delimiter character by just enclosing the
delimiter character in quotation marks after the DLM= option (for example, DLM='&').

If the data had tab characters between values, then you could use the DLM='09'X option, where
'09'X is the hexadecimal code for the tab character in ASCII.

By default, SAS interprets two or more delimiters in a row as a single delimiter. If your file has missing
values, and two delimiters in a row indicate a missing value, then you will also need the DSD option in
the INFILE statement.

The DSD option

The DSD (Delimiter-Sensitive Data) option for the INFILE statement does three
things for you.
• It ignores delimiters in data values enclosed in quotation marks.
• It does not read quotation marks as part of the data value.
• It treats two delimiters in a row as a missing value.

The DSD option assumes that the delimiter is a comma. If your delimiter is not a comma then you can
use the DLM= option with the DSD option to specify the delimiter.
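
A minimal sketch of DLM= combined with DSD for a delimiter other than a comma follows. The vertical bar delimiter, the data set name pipe_demo, and the variable names are assumptions chosen for illustration; the band names are borrowed from the music example in this handout.

```sas
DATA pipe_demo;
   INFILE DATALINES DLM = '|' DSD;   /* bar-delimited; two bars in a row mean a missing value */
   INPUT Band :$30. Early Late;
   DATALINES;
Lupine Lights|45|63
Awesome Octaves||28
;
RUN;

PROC PRINT DATA = pipe_demo;
RUN;
```

For the second record, Early is missing and Late is 28. Without DSD the two bars would count as a single delimiter, so 28 would be assigned to Early and SAS would look for Late on the next line.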

CSV files

Comma-separated values files, or CSV files, are a common type of file that can be read
with the DSD option. Many programs, such as Microsoft Excel, can save data in CSV format. These
files have commas for delimiters and consecutive commas for missing values; if there are commas in
any of the data values, then those values are enclosed in quotation marks.

DATA music_2;
INFILE datalines DLM = ',' DSD ;
INPUT BandName :$30. GigDate :MMDDYY10. EightPM NinePM TenPM ElevenPM;
datalines;
Lupine Lights,12/3/2003,45,63,70,32
Awesome Octaves,12/15/2003,17,28,44,12
"Stop, Drop, and Rock-N-Roll",1/5/2004,34,62,77,91
The Silveyville Jazz Quartet,1/18/2004,38,30,42,43
Catalina Converts,1/31/2004,56,,65,34
;
run;

Colon-modified informats are used for BandName and GigDate.

The colon modifier tells SAS to read for the length of the informat (30 for BandName and 10 for
GigDate), or until it encounters a delimiter, whichever comes first. Because the names of the bands
are longer than the default length of 8 characters, we use the :$30. informat for BandName to read
up to 30 characters.
Missover

By default, SAS will go to the next data line to read more data if SAS has reached the end of the data
line and there are still more variables in the INPUT statement that have not been assigned values.
The MISSOVER option tells SAS that if it runs out of data, don’t go to the next data line. Instead,
assign missing values to any remaining variables.

Example: ‘missover_1.txt’
Lupine Lights,12/3/2003,45,63,70
Awesome Octaves,12/15/2003,17,28,44,12
"Stop, Drop, and Rock-N-Roll",1/5/2004,34,62,77,91
The Silveyville Jazz Quartet,1/18/2004,38,30,42,43
Catalina Converts,1/31/2004,56,,65,34

DATA music_3;
INFILE 'C:\Users\Yunus khwaja\Desktop\missover_1.txt' dlm = ',' DSD
MISSOVER;
INPUT BandName :$30. GigDate :MMDDYY10. EightPM NinePM TenPM ElevenPM ;
run;
proc print ; run;

Truncover
You need the TRUNCOVER option when you are reading data using column or formatted input and some data
lines are shorter than others. If a variable’s field extends past the end of the data line, then, by default, SAS will
go to the next line to start reading the variable’s value. This option tells SAS to read data for the variable until it
reaches the end of the data line, or the last column specified in the informat or column range, whichever comes
first.

This program uses column input to read the address file. Because some of the addresses stop
before the end of the variable Street’s field (columns 20 through 35), you need the TRUNCOVER
option. Without the TRUNCOVER option, SAS would try to go to the next line to read the data for
Street on the first and third records.

Example: ‘Truncover.txt’
John Garcia   114  Maple Ave.
Sylvia Chung  1302 Washington Drive
Martha Newton 45   S.E. 14th St.

DATA homeaddress;
INFILE 'C:\Users\Yunus khwaja\Desktop\truncover.txt' truncover;
INPUT Name $ 1-13 Number 15-18 Street $ 20-35;
RUN;

Note: TRUNCOVER is similar to MISSOVER. Both will assign missing values to variables if the data
line ends before the variable’s field starts. But when the data line ends in the middle of a variable field,
TRUNCOVER will take as much as is there, whereas MISSOVER will assign the variable a missing
value.
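
The difference can be checked side by side with a sketch like the one below. It assumes nothing beyond the address data shown above: the fileref addr and the two output data set names are made up, and the temporary file is written by a DATA _NULL_ step so that the short records keep their real lengths (in-stream DATALINES records are padded with blanks, which would hide the effect).

```sas
FILENAME addr TEMP;   /* temporary external file, removed at the end of the session */

/* Write the three address records exactly as they appear in Truncover.txt */
DATA _NULL_;
   FILE addr;
   PUT 'John Garcia   114  Maple Ave.';
   PUT 'Sylvia Chung  1302 Washington Drive';
   PUT 'Martha Newton 45   S.E. 14th St.';
RUN;

/* TRUNCOVER keeps whatever part of the Street field is present */
DATA with_truncover;
   INFILE addr TRUNCOVER;
   INPUT Name $ 1-13 Number 15-18 Street $ 20-35;
RUN;

/* MISSOVER sets Street to missing when the line ends inside the field */
DATA with_missover;
   INFILE addr MISSOVER;
   INPUT Name $ 1-13 Number 15-18 Street $ 20-35;
RUN;

PROC PRINT DATA = with_truncover; RUN;
PROC PRINT DATA = with_missover;  RUN;
```

In with_truncover all three streets are read; in with_missover Street is blank for the first and third records, whose lines end before column 35.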
IMPORT Procedure

There are a few things that PROC IMPORT does for you that make it easy to read certain types of
data files. PROC IMPORT will scan your data file and automatically determine the variable types
(character or numeric), will assign proper lengths to the character variables, and can recognize some
date formats. PROC IMPORT will treat two consecutive delimiters in your data file as a missing
value, will read values enclosed by quotation marks, and assign missing values to variables when it
runs out of data on a line. Also, if you want, you can use the first line in your data file for the variable
names. The IMPORT procedure actually writes a DATA step for you, and after you submit your
program, you can look in the Log window to see the DATA step it produced.

The simplest form of the IMPORT procedure is


PROC IMPORT DATAFILE = 'filename' OUT = data-set;

where the file you want to read follows the DATAFILE= option, and the name of the SAS data set
you want to create follows the OUT= option. SAS will determine the file type by the extension of the
file as shown in the following table.

Type of File                           Extension   DBMS Identifier
Comma-delimited                        .csv        CSV
Tab-delimited                          .txt        TAB
Delimiters other than commas or tabs               DLM
Microsoft Excel                        .xls        EXCEL
Microsoft Access                       .mdb        ACCESS

If your file does not have the proper extension, or your file is of type DLM, then you must use the
DBMS= option in the PROC IMPORT statement. Use the REPLACE option if you already have a SAS
data set with the name you specified in the OUT= option, and you want to overwrite it. Here is the
general form of PROC IMPORT with both the REPLACE and the DBMS options:

PROC IMPORT DATAFILE = 'filename' OUT = data-set
            DBMS = identifier REPLACE;

The IMPORT procedure will, by default, get variable names from the first line in your data file. If you
do not want this, then add the GETNAMES=NO statement after the PROC IMPORT statement.
PROC IMPORT will then assign the variables the names VAR1, VAR2, VAR3, and so on. Also, if your data
file is type DLM, PROC IMPORT assumes that the delimiter is a space. If you have a different
delimiter, then specify it in the DELIMITER= statement. The following shows both these
statements:

PROC IMPORT DATAFILE = 'filename' OUT = data-set
            DBMS = DLM REPLACE;
   GETNAMES = NO;
   DELIMITER = 'delimiter-character';
RUN;
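
As a concrete illustration of the general form above, the following sketch reads a file whose fields are separated by vertical bars and whose first line holds the variable names. The file name Bands.txt, the directory, and the data set name music are hypothetical and stand in for your own.

```sas
* Hypothetical file with vertical bars between fields and names in the first line;
PROC IMPORT DATAFILE = 'c:\MyRawData\Bands.txt' OUT = music
            DBMS = DLM REPLACE;
   DELIMITER = '|';
   GETNAMES = YES;
RUN;
```

GETNAMES=YES is the default, so that statement could be dropped. It is written out here only to make the choice visible. After the program runs, the Log window shows the DATA step that PROC IMPORT generated.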

Microsoft Access Files. If you want to read Microsoft Access files, then instead of using the
DATAFILE= option, you need a DATABASE= and a DATATABLE= option as follows:

PROC IMPORT DATABASE = 'database-path' DATATABLE = 'table-name'
            OUT = data-set DBMS = identifier REPLACE;

The EXPORT procedure

The general form of PROC EXPORT is

PROC EXPORT DATA = data-set OUTFILE = 'filename';

Here, data-set is the SAS data set you want to export, and filename is the name you make up for
the output data file. The following statement tells SAS to read a temporary SAS data set named
HOTELS and write a comma-delimited file named Hotels.csv in a directory named MyRawData on the
C drive (Windows):

PROC EXPORT DATA = hotels OUTFILE = 'c:\MyRawData\Hotels.csv';

SAS uses the last part of the filename, called the file extension, to decide what type of file to create.
You can also specify the file type by adding the DBMS= option to the PROC EXPORT statement. The
following table shows the filename extensions and DBMS identifiers currently available with Base SAS
software. If you specify the DBMS option, then it takes precedence over the file extension.

Type of File      Extension   DBMS Identifier
Comma-delimited   .csv        CSV
Tab-delimited     .txt        TAB
Space-delimited               DLM

Notice that for space-delimited files, there is no standard extension, so you must use the DBMS=
option. The following statement, containing the DBMS= option, tells SAS to create a space-delimited
file named Hotels.spc. The REPLACE option tells SAS to replace any file with the same name.

PROC EXPORT DATA = hotels OUTFILE = 'c:\MyRawData\Hotels.spc'
            DBMS = DLM REPLACE;

If you want to create a file with a delimiter other than a comma, tab, or space, then you can add the
DELIMITER statement. When you use the DELIMITER statement, the file will have the delimiter you
specify in that statement, regardless of the file extension or the DBMS identifier. For example, the
following would produce a file, Hotels.txt, that has the ampersand (&) as the delimiter:

PROC EXPORT DATA = hotels OUTFILE = 'c:\MyRawData\Hotels.txt'
            DBMS = DLM REPLACE;
   DELIMITER = '&';

The following program writes a plain text, tab-delimited file that you can read with any text editor or
word processor:

LIBNAME sports 'c:\MySASLib';

* Create Tab-delimited file;
PROC EXPORT DATA = sports.golf OUTFILE = 'c:\MyRawData\Golf.txt' REPLACE;
RUN;

Because the name of the output file ends with .txt and there is no DELIMITER statement, SAS will
write a tab-delimited file. If you run this program, your log will contain the following note about
the output file:

NOTE: 7 records were written to the file 'c:\MyRawData\Golf.txt'.


Notice that while the data set contained six observations, SAS wrote seven records. The extra
record contains the variable names. If you read this file into a word processor and set the tabs, it
will look like this:

CourseName          NumberOfHoles  Par  Yardage  GreenFees
Kapalua Plantation  18             73   7263     125
Pukalani            18             72   6945     55
Sandlewood          18             72   6469     35
Silversword         18             71            57
Waiehu Municipal    18             72   6330     25
Grand Waikapa       18             72   6122     200

Any format that you have assigned to variables in the SAS data set will be applied by PROC
EXPORT. If you want to change a format, use a FORMAT statement in a DATA step before running
PROC EXPORT.
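
The approach in the last paragraph might look like the sketch below. The data set name golf_formatted, the output file name GolfFees.txt, and the choice of the DOLLAR8.2 format for GreenFees are assumptions made for illustration only.

```sas
LIBNAME sports 'c:\MySASLib';

* Copy the data set and attach a format to GreenFees before exporting;
DATA golf_formatted;
   SET sports.golf;
   FORMAT GreenFees DOLLAR8.2;
RUN;

* Export the copy so that the new format is applied in the output file;
PROC EXPORT DATA = golf_formatted OUTFILE = 'c:\MyRawData\GolfFees.txt' REPLACE;
RUN;
```

Working on a copy leaves the format stored with sports.golf unchanged, which is convenient when the export format is needed only for one file.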
