
1BASICS - PPT Delhi Institute

SAS, or Statistical Analysis System, is a software developed in the 1960s to meet the statistical analysis needs of agricultural research. It features an Integrated Development Environment (IDE) with various components for data management, including libraries for data storage and manipulation. The document outlines the structure, programming concepts, and terminology associated with SAS, emphasizing its capabilities in data access, management, analysis, and presentation.


What is SAS?

• SAS – Statistical Analysis System/Software


Why the Need for SAS?
In the early 1960s, the Statistics Department at
North Carolina State University was awarded an
agriculture research project. The people working
on the project needed computer software for an
IBM mainframe that could access and manipulate
large volumes of data and perform statistical
analysis on the data. There was no package
available that met their needs, so they started
designing a solution.
History of SAS
• The Early 1960s
• Agricultural research at Land Grant
Universities
• Business need: general purpose statistical
software to manage and manipulate large
volumes of data and perform statistical
analysis.
SAS IDE
INTEGRATED DEVELOPMENT ENVIRONMENT (IDE)

COMPONENTS/WINDOWS

INTERACTIVE (USED FOR PROGRAMMING & DEBUGGING)
1. EXPLORER WINDOW
2. LOG WINDOW
3. ENHANCED EDITOR

NON-INTERACTIVE (USED FOR OUTPUT GENERATION ONLY)
4. OUTPUT WINDOW
5. RESULTS WINDOW
6. HTML WINDOW
SAS IDE
EXPLORER WINDOW

THE PURPOSE OF THE EXPLORER WINDOW IS TO NAVIGATE A SYSTEM/NETWORK

- IT BEHAVES LIKE AN ALTERNATIVE OS (FOR NON-WINDOWS ENVIRONMENTS)
- HELPS IN STORAGE OF DATA & FILES

IT HAS FOUR COMPONENTS:

MY COMPUTER (LOGICAL DISK)
FILE SHORTCUTS (DOCs)
FAVORITE FOLDERS (BROWSE)
LIBRARIES (STORE DATA SETS)

SAS INTEGRATES WITH ANY OPERATING SYSTEM


SAS IDE
LIBRARIES: THESE ARE THE LOGICAL FOLDERS IN SAS USED
FOR DATA MGMT AND FILE MGMT.

THEY ARE OF TWO TYPES:

DEPENDENT – CONNECTED LIBRARY
INDEPENDENT – SAS LIBRARY

DEPENDENT LIBRARY:

- A DEPENDENT LIBRARY IS ALWAYS CONNECTED TO AN EXTERNAL DATA SOURCE (DBMS/RDBMS/FILE)

- A DEPENDENT LIBRARY CAN REFER TO A SINGLE DATA SOURCE ONLY

- DOES NOT OCCUPY ANY MEMORY

- DATA MANIPULATIONS WILL BE REFLECTED IN BOTH SAS & THE EXTERNAL SOURCE

DISADVANTAGE : SECURITY OF THE DB MAY BE COMPROMISED


SAS IDE
INDEPENDENT LIBRARY:

- THESE ARE NOT CONNECTED TO ANY EXTERNAL DATA SOURCES (ONLY FOLDER BASED)

- THE USER HAS TO MANAGE ALL DATA & THE SECURITY OF DATA

DISADVANTAGE :

OCCUPIES MEMORY (MORE MEMORY USE MAY AFFECT PROCESSING SPEED)

ADVANTAGE:

CAN BE USED TO STORE DATA FROM MULTIPLE DATA SOURCES

SECURITY CAN BE PROVIDED FOR INDEPENDENT LIBRARIES
SAS IDE
LIBRARY MODES: THERE ARE TWO TYPES

- TEMPORARY LIBRARY: SINGLE SESSION ONLY

- PERMANENT LIBRARY: MULTIPLE SESSIONS.

PROPERTIES OF A LIBRARY

1. NAME : MAX 8 CHARS, MUST START WITH A LETTER OR UNDERSCORE

2. ENGINE: USED TO DEFINE THE DATA SOURCE OF THE LIBRARY (ORACLE, ACCESS…)

3. ENABLE AT START UP: PERMANENT / TEMPORARY

4. PATH : DEFINES THE LOCATION & PARAMETERS TO CONNECT TO A DB OR DATA SOURCE ('C:\PROG FILES\....')

5. OPTIONS : SECURITY SETTINGS (R-READ, W-WRITE, A-ALTER). OPTIONAL.
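The library properties above correspond to the parts of a LIBNAME statement. The lines below are a minimal sketch, not taken from the slides: the folder C:\sasdata and the librefs mylib and rolib are hypothetical names chosen for illustration, and the folder is assumed to exist already.

```sas
/* Hypothetical folder: replace with a directory that exists on your system. */
libname mylib 'C:\sasdata';

/* Second libref on the same folder, limited to read-only access. */
libname rolib 'C:\sasdata' access=readonly;

/* A two-level name (libref.dataset) stores the data set in that library, */
/* so it is still available in later sessions.                            */
data mylib.items;
   itemno = 1001;
run;

/* The read-only libref can read the data set but cannot replace it. */
proc print data=rolib.items;
run;

/* Release both library references. */
libname rolib clear;
libname mylib clear;
```

Here the libref is the NAME, the quoted folder is the PATH, and ACCESS=READONLY is one example of an optional security setting.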
SAS IDE
LOG WINDOW: IT DISPLAYS THE COMPILATION RESULTS, COLOUR CODED

- ERRORS (RED)
- INFORMATION (BLUE)
- SUGGESTIONS (MAROON)
- WARNINGS (GREEN)
- STATEMENTS (BLACK)

SUPPORTS DEBUGGING (REMOVING ERRORS). TYPES OF ERRORS:

SYNTAX / RUN TIME / LOGICAL / DATA ERRORS

RUN TIME (EXTERNAL ERRORS)
LOGICAL (OUTPUT IS NOT CORRECT)

- OPTIMIZATION: (TIME PERIOD BASED EXECUTION)

- REAL TIME (TOTAL ELAPSED TIME TAKEN TO RUN A STEP)
- CPU TIME (HOW MUCH PROCESSOR TIME WAS CONSUMED)

LOG WINDOW CAN BE STORED AS AN EXTERNAL FILE (.LOG)
SAS IDE
ENHANCED EDITOR WINDOW:

- WRITING SAS SCRIPTS (COLOUR CODED SCRIPTS)
- STORED AS .SAS FILE
- COMBINATION OF COMPILER + INTERPRETER

INTERPRETER WHILE TYPING A PROGRAM
COMPILER WHILE EXECUTING A PROGRAM

INTERPRETER CHECKS EACH LINE FOR ERRORS
COMPILER CHECKS THE WHOLE PROGRAM FOR ERRORS

PROGRAM EDITOR & ENHANCED EDITOR

- WRITING SAS SCRIPTS
- STORED AS .SAS FILE
- PROGRAM EDITOR ONLY FOR DOS & UNIX

NOTE: AN ENHANCED EDITOR WHEN SAVED BECOMES A PROGRAM EDITOR WINDOW
SAS PROGRAMMING
CREATING DATA SETS (TABLES)

- MANUALLY
- USING EXISTING DATA SETS (DBMS/RDBMS)
- USING DATA FROM FILES (FLAT FILES)

DEFAULT LIBRARIES IN SAS

PERMANENT: SASUSER, SASHELP, GISMAP, MAPS

TEMPORARY: WORK (ALL DATA SETS WITHOUT ANY LIBRARY REFERENCE WILL BE STORED IN THE WORK LIBRARY)
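As a small illustration of the WORK default (a sketch with a made-up data set name, scores), a data set created with a one-level name can afterwards be addressed as WORK.scores:

```sas
/* One-level name: no library reference, so SAS stores it in WORK. */
data scores;
   x = 1;
run;

/* The same data set addressed with its full two-level name. */
proc print data=work.scores;
run;
```

Because WORK is a temporary library, this data set is deleted when the SAS session ends.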
Words in the SAS Language
• A word or token in the SAS language is a
collection of characters that communicates a
meaning to SAS and is not divisible into smaller
units capable of independent use. It can contain
a maximum of 32,767 characters.

• A word or token ends when SAS encounters one
of the following: the beginning of a new token; a
blank after a name or a number token; or the
ending quotation mark of a literal token.
Words in the SAS Language
(contd)
• Each word or token in the SAS
language belongs to one of four
categories:
• names
• literals
• numbers
• special characters.
SAS NAMING CONVENTIONS
Name
1. SAS variable names may be up to 32 characters in length.
2. The first character must be an alphabetic character or an
underscore. Subsequent characters can be alphabetic characters,
numeric digits, or underscores.
3. A variable name may not contain blanks.
4. A variable name may not contain any special characters other
than the underscore.
5. A variable name may contain mixed case. The mixed case is
remembered and used for presentation purposes only. When SAS
processes variable names, however, it internally uppercases them.
You cannot, therefore, use the same letters with different
combinations of lower- and uppercase to represent different
variables. For example, cat, Cat, and CAT all represent the
same variable.
Words in the SAS Language (contd)
1. You may not assign the names of special SAS
automatic variables (such as _N_ and _ERROR_) or
variable list names (such as _NUMERIC_,
_CHARACTER_, and _ALL_) to variables.
• NAME is a series of characters that begin with a letter or
an underscore. Later characters can include letters,
underscores, and numeric digits. A name token can
contain up to 32,767 characters. In most contexts,
however, SAS names are limited to a shorter maximum
length, such as 32 or 8 characters. Examples of name
tokens include:
data   _new   yearcutoff   year_99   descending   _n_
Words in the SAS Language
(contd)
• Literal
A literal consists of 1 to 32,767 characters enclosed in single
or double quotation marks. Examples of literals
include
'Chicago'
"1990-91"
'SatyaKalyani Pala'
'Suresh Bharatha'
'Mani''s plane'
"Report for the Third Quarter"
Words in the SAS Language
(contd)
• Number
A number in general is composed entirely of numeric digits, with
an optional decimal point and a leading plus or minus
sign. SAS also recognizes numeric values in the
following forms as number tokens: scientific (E-)
notation, hexadecimal notation, missing value
symbols, and date and time literals. Examples of
number tokens include
5683   2.35   0b0x   -5   5.4E-1   '24aug90'd
Words in the SAS Language
(contd)
• Special character
A special character is usually any single keyboard character other than
letters, numbers, the underscore, and the blank. In
general, each special character is a single token,
although some two-character operators, such as **
and <=, form single tokens. The blank can end a
name or a number token, but it is not a token.
Examples of special-character tokens include
=   ;   '   +   @   /
Placement and Spacing of Words in
SAS Statements
• Examples
• In this statement, blanks are not required because SAS can
determine the boundary of every token by examining the
beginning of the next token:
total=x+y;
• The first special-character token, the equal sign, marks the
end of the name token total. The plus sign, another special-
character token, marks the end of the name token x. The last
special-character token, the semicolon, marks the end of the
y token. Though blanks are not needed to end any tokens in
this example, you may add them for readability, as shown
here:
• total = x + y;
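The point about optional blanks can be checked with a short DATA step. This is only a sketch: the variable spaced and the starting values 2 and 3 are made up for the illustration.

```sas
data _null_;
   x = 2;
   y = 3;
   total=x+y;          /* no blanks: the special characters separate the tokens */
   spaced = x + y;     /* same statement written with blanks for readability    */
   put total= spaced=; /* writes total=5 spaced=5 to the log                    */
run;
```

Both assignments are tokenized the same way, so the two variables hold the same value.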
SAS FORMULA
• THE SAS FORMULA IS DIVIDED INTO TWO TYPES
– SAS TECHNICAL – USED FOR PROGRAMMING IN
ALL LAYERS
– SAS FUNCTIONAL – USED FOR PROGRAMMING &
PROCESS

PROCESS

1) DATA ACCESS
2) DATA MGMT
3) DATA ANALYSIS
4) DATA PRESENTATION
SAS FORMULA
SAS TECHNICAL          SAS FUNCTIONAL

1) DATA STEP           DATA ACCESS
2) DATA SET            DATA MGMT
3) DATA PROG & PROC    DATA ANALYSIS
4) DATA OUTPUTS        DATA PRESENTATION
SAS FORMULA
1. DATA STEP : DEFINE THE STRUCTURE OF DATA

DEFINITION : THERE ARE TWO DATA TYPES

NUMBER       8 BYTES (MAX & MIN)
TEXT/CHAR    1 BYTE PER CHARACTER

EXAMPLE
X=01               8 BYTES
X='01'             2 BYTES
NAME='ALLEN'       5 BYTES
N=9060984976789    8 BYTES
S='SAS SYSTEM'     10 BYTES

SAS STORES A DATE AS A NUMBER (DAYS COUNTED FROM 01-JAN-1960)

CENTURY, YEAR, MONTH, DAY, HOURS, MIN & SEC
01-JAN-1960 = 0
21-JAN-1960 = 20
LARGEST / SMALLEST NUMBER – NUMBERS ARE STORED AS 8-BYTE FLOATING POINT
VALUES, SO ONLY ABOUT 15 SIGNIFICANT DIGITS ARE KEPT EXACTLY
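The date values above can be confirmed with date literals. A minimal sketch follows; the variable names d0 and d20 are made up for the illustration.

```sas
data _null_;
   d0  = '01jan1960'd;   /* date literal for the SAS reference date */
   d20 = '21jan1960'd;   /* twenty days later                       */
   put d0= d20=;         /* writes d0=0 d20=20 to the log           */
   put d20= date9.;      /* same value shown with the DATE9. format */
run;
```

The stored value is just the day count. A format such as DATE9. only changes how that number is displayed.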
SAS FORMULA
1. DATA STEP : STRUCTURE OF SAS

STRUCTURE: STORAGE PATTERN OF DATA. IT MAY BE A COMBINATION OF

VARIABLE + DATA TYPE + SIZE + CONSTRAINT

VARIABLE    DATA TYPE    SIZE    CONSTRAINT
ITEMNO      NUMBER       4       >999 AND <10,000
ITEMNAME    TEXT         200     NMISS (NON MISSING)
PRICE       NUMBER       7.2     >1000
SAS FORMULA
2. DATA SET : STORAGE OF DATA IN SAS

– A TABLE IN SAS IS CALLED A DATA SET.
– A DATA SET CONSISTS OF VARIABLES AND OBSERVATIONS.
– IT IS A COLLECTION OF DATA IN THE FORM OF OBSERVATIONS & VARIABLES.
– IT MUST BE BASED ON A DATA STEP (DEFINITION & STRUCTURE).

INTERNAL (SAS) & EXTERNAL (FILES / DB)


Basic Structure of SAS
• There are two main components in
SAS programs –
• the Data step(s) and
• the Procedure step(s), also called PROC.
• The data step reads data from
external / internal sources,
manipulates and combines it with
other data sets and prints reports.
The data step is used to prepare
your data for use by one of the
procedures (often called "Procs").
SAS FORMULA
3. DATA PROGRAMS : ARE USER DEFINED PROGRAMS
IN SAS (20%), USED FOR

DATA PROCESSING
DATA MANIPULATION
LOGIC BUILDING IN SAS
INTEGRATION
CUSTOMIZATION
SYNTAX:

DATA <DATA SET OPTIONS>;
< PROG STATEMENTS>;
<LOGICAL STATEMENTS>;
< PROCESS STATEMENTS>;
RUN; /*COMPILE & EXECUTE*/
SAS FORMULA
3. DATA PROCEDURE : SAS BUILT-IN
PROGRAMS/FUNCTIONS (80%). HAVING 7638
PROCEDURES
SYNTAX BASED
ALL PROCEDURES ARE PROCESS BASED
DOMAIN BASED
GENERATE OUTPUT
SYNTAX:
PROC <PROC NAME> <OPTIONS>;
< SYNTAX STATEMENTS ONLY>;
RUN; /*BASE SAS PROCEDURES*/
QUIT; /*INTERACTIVE PROCEDURES SUCH AS PROC SQL*/
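As a hedged illustration of the syntax above, the two steps below use SASHELP.CLASS, a sample data set that ships with SAS. The choice of PROC MEANS and PROC SQL is ours, not the slides'.

```sas
/* A procedure step that is ended with RUN. */
proc means data=sashelp.class;
   var height weight;
run;

/* PROC SQL executes each statement immediately and is ended with QUIT. */
proc sql;
   select name, age
   from sashelp.class
   where age > 14;
quit;
```

In both cases the procedure name follows PROC, options such as DATA= follow the name, and the statements inside belong to that procedure only.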
SAS FORMULA
4. DATA OUTPUT : RESPONSIBLE FOR OUTPUT
GENERATIONS FROM SAS.

– THE ENTIRE PROCESS IS PROCEDURE BASED
– SIMPLE REPORTS & GRAPHS
– MULTIDIMENSIONAL REPORTS & GRAPHS
– DATA BASE
– DATA SET/TABLE
– GUI APPLICATION
– USER INTERFACES (WITHIN SAS)
SAS FORMULA
THE SAS FORMULA IS DIVIDED INTO TWO TYPES:

(1) SAS TECHNICAL: USED FOR PROG IN ALL LAYERS
(2) SAS FUNCTIONAL: USED FOR PROG & PROCESS

PROCESS                   TECHNICAL
(A) DATA ACCESS           DATA STEP
(B) DATA MGMT             DATA SET
(C) DATA ANALYSIS         DATA PROG & PROC
(D) DATA PRESENTATION     DATA OUTPUT
Terminology in SAS
• In SAS, you call a
• File - DataSet
• Field - Variable
• Record(s) - OBServations / Rows
An Observation is a collection of data values
that usually relate to a single object.
A Variable is the set of data values that describe
a given characteristic.
An example will be shown to best describe.
SAS FORMULA
[Diagram with the labels: FILES, DATA BASE, DATA STEP (FIRST COMPONENT), DATA SET (SECOND COMPONENT), DATA PROG, DATA PROC, DATA OUTPUT]
Sample SAS program
Data MySample;
   A=4;
   B=2;
   C=A*B;
Run;
Proc Print Data=MySample;
Run;
Why the RUN statement?
Run statement
• Tells SAS that the Data step or Procedure has ended.
• Good practice to end each Data step or Procedure with
a run statement.
• Must still SUBMIT the SAS program for it to be
processed.
Missing Values in SAS
• A character missing value is displayed as a
blank.
• A numeric missing value is displayed as a
period.
• Example:

Data Missing_Test;
Length A B $ 10 ;
A='Ramanathan';
Run;
Proc Print;
Run;
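The example above only produces a character missing value (B). A companion sketch that also shows a numeric one is given below. The data set name missing_demo and its variables are hypothetical.

```sas
data missing_demo;
   length name $ 10;
   name  = ' ';                     /* character missing value: a blank */
   score = .;                       /* numeric missing value: a period  */
   is_missing = missing(score);     /* MISSING returns 1 when score is missing */
run;

proc print data=missing_demo;
run;
```

In the printed output, name appears blank and score appears as a period.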
The Data Step
• The data step provides a wide range of capabilities,
among them reading data from external sources,
reshaping and manipulating data, transforming data
and producing printed reports.

• The data step is actually an implied do loop whose
statements will be executed for each observation
either read from an external source, or accessed
from a previously processed data set.

• For each iteration, the data step starts with a vector
of missing values for all the variables to be placed in
the new observation. It then overwrites the missing
value for any variables either input or defined by the
data step statements. Finally, it outputs the
observation to the newly created data set.
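The implied loop can be made visible with the automatic variable _N_, which counts the iterations of the data step. This is a minimal sketch with made-up names (loop_demo, x, y):

```sas
data loop_demo;
   input x;             /* reads one record per iteration    */
   y = x * 2;           /* y starts as missing, then is set  */
   put _n_= x= y=;      /* one log line per iteration        */
   datalines;
1
2
3
;
run;
```

The statements run once for each of the three input records, and each pass ends by writing one observation to loop_demo.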
Data Step: Basics
• Each data step begins with the word data and
optionally one or more data set names (and
associated options) followed by a semicolon. The
name(s) given on the data statement are the names of
data sets which will be created within the data step. If
you don't include any names on the data statement, SAS
will create default data set names of the form DATAn,
where n is an integer which starts at 1 and is
incremented so that each data set created has a
unique name within the current session. Since it
becomes difficult to keep track of the default names,
it is recommended that you always explicitly specify
a data set name on the data statement.
• When you are running a data step simply to generate
a report, and don't need to create a data set, you can
use the special data set name _null_ to eliminate the
output of observations.
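A short sketch of the _null_ case follows. It reads SASHELP.CLASS, a sample data set shipped with SAS. The age filter is an arbitrary choice for the illustration.

```sas
/* No data set is created; the step only writes lines to the log. */
data _null_;
   set sashelp.class;
   if age > 14 then put name= age=;
run;
```

Because the step is named _null_, SAS runs the statements for every observation it reads but stores no output data set.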
Data Step: Inputting Data
• The input statement of SAS is used to read
data from an external source, or from lines
contained in your SAS program.
• The infile statement names an external file
or fileref from which to read the data;
otherwise the cards; or datalines;
statement is used to precede the data.
• Reading data from an external file:
data one;
   infile "c:\Radhika\Samp.dat";
   input a b c;
run;
Data Step: Inputting Data
(contd)
• Reading from inline data:
data one;
   input a b c;
   datalines;
4 5 3
9 10 12
;
run;
• By default, each invocation of the input
statement reads another record. This
example uses free-form input, with at least
one space between values.
Example
Data Demog;
   length Educ $ 15;
   Input Gender $ Age Race $ Height Weight
         Income Educ $ 21-29 Marstat $
         NumChld;
   /* Same as: If Numchld > 0 then Chldyes=1; Else Chldyes=0; */
   Chldyes=(Numchld>0);
   Keep Gender Age educ chldyes;
   Cards;
M 46 W 72 190 45000 College   M 3
F 30 B 62 110 54000 Grad Sch  S 0
;
run;
How to Use the INFILE Statement
• Because the INFILE statement identifies the file to read, it must execute
before the INPUT statement that reads the input data records. You can use the
INFILE statement in conditional processing, such as an IF-THEN
statement, because it is executable. This allows you to control the
source of the input data records.
• Usually, you use an INFILE statement to read data from an external
file. When data are read from the job stream, you must use a
DATALINES statement. However, to take advantage of certain data-
reading options that are available only in the INFILE statement, you
can use an INFILE statement with the file-specification DATALINES
and a DATALINES statement in the same DATA step.
• When you use more than one INFILE statement for the same file-
specification and you use options in each INFILE statement, the
effect is additive. To avoid confusion, use all the options in the first
INFILE statement for a given external file.
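The combination of INFILE DATALINES with a DATALINES statement might look like the sketch below. The data values and names are made up. DLM= and DSD are INFILE options used here to read comma-separated inline data.

```sas
data scores;
   /* INFILE DATALINES makes INFILE options available for inline data. */
   infile datalines dlm=',' dsd;
   input name $ score;
   datalines;
Asha,90
Ravi,85
;
run;

proc print data=scores;
run;
```

Without the INFILE statement, the INPUT statement would expect blank-separated values in the inline data.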
