Base SAS Interview Questions
The INFILE statement is used to identify an external file, while the INPUT
statement is used to describe your variables.
Note : The variable name, followed by $ (dollar sign), identifies the variable
type as character.
In the example shown above, ID and SEX are numeric variables and Name a
character variable.
2. Difference between Informat and Format
Informats read the data while Formats write the data.
Informat - To tell SAS that a number should be read in a particular format.
For example: the informat mmddyy6. tells SAS to read the
number 121713 as the date December 17, 2013.
Format - To tell SAS how to print the variables.
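As a minimal sketch of the difference, the step below reads the text 121713 with the MMDDYY6. informat and then writes the stored date value with two different formats. It assumes the default YEARCUTOFF setting, so the two-digit year 13 is read as 2013.

```sas
data _null_;
    /* Informat: read the text 121713 as a SAS date value */
    dt = input('121713', mmddyy6.);
    /* Format: control how the stored value is written to the log */
    put dt= date9.;      /* writes dt=17DEC2013 */
    put dt= mmddyy10.;   /* writes dt=12/17/2013 */
run;
```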
3. Difference between Missover and Truncover
Missover - When the MISSOVER option is used on the INFILE statement, the
INPUT statement does not jump to the next line when reading a short line.
Instead, MISSOVER sets variables to missing.
Truncover - It assigns the raw data value to the variable even if the value is
shorter than the length that is expected by the INPUT statement.
The following is an example of an external file that contains data:
1
22
333
4444
This DATA step uses the numeric informat 4. to read a single field in each
record of raw data and to assign values to the variable ID.
MISSOVER Option
data readin;
infile 'external-file' missover;
input ID 4.;
run;
proc print data=readin;
run;
The output is shown below :
Obs    ID
1      .
2      .
3      .
4      4444
TRUNCOVER Option
data readin;
infile 'external-file' truncover;
input ID 4.;
run;
proc print data=readin;
run;
The output is shown below :
Obs    ID
1      1
2      22
3      333
4      4444
The KEEP statement specifies the names of the variables that you want to
retain from the data set.
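A short sketch of KEEP in a DATA step. The data set SCORES and its variables ID and NAME are hypothetical names used only for illustration.

```sas
data scores_small;
    set scores;     /* hypothetical input data set */
    keep id name;   /* only ID and NAME are written to SCORES_SMALL */
run;
```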
PROC MEANS
data mydata2;
set mydata;
a=sum(x,y,z);
p=x+y+z;
run;
The output is shown in the image below :
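In the output, the value of p is missing for the 4th, 5th and 6th observations,
because the + operator returns missing when any operand is missing, whereas
the SUM function ignores missing values.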
14. How to identify and remove unique and duplicate values?
Use PROC SORT with NODUPKEY and NODUP Options.
The detailed explanation is shown below :
SAMPLE DATA SET
ID    Name      Score
      David     45
      David     74
      Sam       45
      Ram       54
      Bane      87
      Mary      92
      Bane      87
      Dane      23
      Jenny     87
      Ken       87
      Simran    63
      Priya     72
There are several ways to identify and remove unique and duplicate values:
PROC SORT
In PROC SORT, there are two options by which we can remove duplicates.
1. NODUPKEY Option
2. NODUP Option
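The two options could be used roughly as follows. This is a sketch that assumes the sample data are stored in a data set called READIN with a variable NAME. The output data set names are made up for illustration.

```sas
/* NODUPKEY: keep the first observation for each value of the BY variable; */
/* DUPOUT= collects the observations that were removed                     */
proc sort data=readin out=unique_names dupout=removed nodupkey;
    by name;
run;

/* NODUP (alias of NODUPRECS): drop an observation only when every */
/* variable matches the previous observation after sorting         */
proc sort data=readin out=no_dup_recs nodup;
    by name;
run;
```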
data _null_ ;
phone = '(312) 555-1212' ;
area_cd = substr(phone, 2, 3) ;
put area_cd = ;
run;
Result : In the log window, it writes area_cd = 312 .
LEFT SIDE APPLICATION
It is used to change just a few characters of a variable.
data _null_ ;
phone = '(312) 555-1212' ;
substr(phone, 2, 3) = '773' ;
put phone = ;
run ;
In this example, the variable PHONE has been changed from (312) 555-1212 to (773) 555-1212.
See the log window shown in the image below:
SET concatenates the data sets, whereas MERGE matches the observations
of the data sets.
SET
MERGE
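A minimal sketch of both statements, assuming two hypothetical data sets A and B that share a variable ID and are already sorted by it:

```sas
/* SET stacks the observations of B after those of A */
data stacked;
    set a b;
run;

/* MERGE joins observations side by side, matching on the BY variable */
data matched;
    merge a b;
    by id;
run;
```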
Recode the set of questions: Q1,Q2,Q3...Q20 in the same way: if the variable
has a value of 6 recode it to SAS missing.
data readin;
set outdata;
array Q(20) Q1-Q20;
do i= 1 to 20;
if Q(i) = 6 then Q(i)= .;
end;
run;
33. How to use arrays to recode all the numeric variables?
Use the _NUMERIC_ variable list and the DIM function with the array.
data readin;
set outdata;
array Q(*) _numeric_;
do i= 1 to dim(Q);
if Q(i) = 6 then Q(i)= .;
end;
run;
Note : DIM returns the number of elements in the array Q.
34. How to calculate mean for a variable by group?
Suppose Q1 is a numeric variable and Age a grouping variable. You wish to
compute mean for Q1 by Age.
PROC MEANS DATA = READIN;
VAR Q1;
CLASS AGE;
RUN;
35. How to generate cross tabulation?
Use PROC FREQ code.
PROC FREQ DATA=auto;
TABLES A*B ;
RUN;
SAS will produce table of A by B.
36. How to generate detailed summary statistics?
data _null_;
set testmean;
call symput("xbarmac",xbar);
run;
%put mean of x is &xbarmac;
41. What are SYMGET and SYMPUT?
SYMPUT puts the value from a data set into a macro variable, whereas
SYMGET gets the value from the macro variable into the data set.
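A small sketch of SYMGET, reusing the TESTMEAN data set and XBAR variable from the earlier SYMPUT example. The macro variable CUTOFF and the output variables are hypothetical.

```sas
%let cutoff = 50;

data flagged;
    set testmean;
    /* SYMGET returns the macro variable's value as text at execution time */
    limit = input(symget('cutoff'), 12.);
    above = (xbar > limit);   /* 1 when XBAR exceeds the cutoff, else 0 */
run;
```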
42. Which date function advances a date, time or datetime value by
a given interval?
INTNX function advances a date, time, or datetime value by a given interval,
and returns a date, time, or datetime value. Ex: INTNX(interval,startfrom,number-of-increments,alignment).
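For instance, the following sketch advances a date by one month, once with the default alignment (beginning of the interval) and once with the 'same' alignment:

```sas
data _null_;
    start = '17DEC2013'd;
    next_month_start = intnx('month', start, 1);           /* 01JAN2014 */
    same_day_next    = intnx('month', start, 1, 'same');   /* 17JAN2014 */
    put next_month_start= date9. same_day_next= date9.;
run;
```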
43. How to count the number of intervals between two given SAS
dates?
INTCK(interval,start-of-period,end-of-period) is an interval function that
counts the number of intervals between two given SAS date, time or
datetime values.
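A brief sketch. With its default method, INTCK counts interval boundaries crossed, so two dates one day apart can still be one month apart:

```sas
data _null_;
    months = intck('month', '31DEC2013'd, '01JAN2014'd);   /* 1: one month boundary crossed */
    days   = intck('day',   '31DEC2013'd, '01JAN2014'd);   /* 1 */
    put months= days=;
run;
```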
44. Difference between SCAN and SUBSTR?
SCAN extracts words from a value, where the words are separated by
delimiters. SUBSTR extracts a portion of the value starting at a specific
position. It is best used when we know the exact position of the substring to
extract from a character value.
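The difference can be seen in a short sketch (the variable names and the sample value are illustrative only):

```sas
data _null_;
    name = 'Smith, John';
    last   = scan(name, 1, ',');    /* first word before the comma: Smith */
    first3 = substr(name, 1, 3);    /* 3 characters from position 1: Smi  */
    put last= first3=;
run;
```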
45. The following data step executes:
Data strings;
Text1 = 'MICKEY MOUSE & DONALD DUCK';
Text = scan(text1,2,'&');
Run;
What will the value of the variable Text be?
*DONALD DUCK (the leading blank is displayed here as an asterisk *)
46. For what purpose would you use the RETAIN statement?
A RETAIN statement tells SAS not to set variables to missing when going
from the current iteration of the DATA step to the next. Instead, SAS retains
the values.
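A typical use is a running total. The sketch below assumes a hypothetical data set SALES with a numeric variable AMOUNT.

```sas
data running;
    set sales;                     /* hypothetical input data set */
    retain total 0;                /* TOTAL keeps its value across iterations */
    total = sum(total, amount);    /* SUM ignores a missing AMOUNT */
run;
```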
47. When grouping is in effect, can the WHERE clause be used in
PROC SQL to subset data?
Not for subsetting groups. WHERE filters individual rows before grouping and
cannot reference summary statistics. In order to subset groups, the HAVING
clause must be used, and its condition is usually based on a summary
statistic.
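A minimal sketch of HAVING, borrowing the READIN data set and the Q1 and AGE variables from the PROC MEANS question above. The threshold 3 is arbitrary.

```sas
proc sql;
    select age, mean(q1) as mean_q1
    from readin
    group by age
    having mean(q1) > 3;   /* keeps only the groups whose mean exceeds 3 */
quit;
```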
48. How to use IF THEN ELSE in PROC SQL?
PROC SQL;
SELECT WEIGHT,
CASE
WHEN WEIGHT BETWEEN 0 AND 50 THEN 'LOW'
WHEN WEIGHT BETWEEN 51 AND 70 THEN 'MEDIUM'
WHEN WEIGHT BETWEEN 71 AND 100 THEN 'HIGH'
ELSE 'VERY HIGH'
END AS NEWWEIGHT FROM HEALTH;
QUIT;
49. How to remove duplicates using PROC SQL?
Proc SQL noprint;
Create Table inter.Merged1 as
Select distinct * from inter.readin ;
Quit;
50. How to count unique values by a grouping variable?
You can use PROC SQL with COUNT(DISTINCT variable_name) to
determine the number of unique values for a column.
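For example, assuming a hypothetical data set READIN with a grouping variable AGE and an identifier ID:

```sas
proc sql;
    select age, count(distinct id) as n_unique_ids
    from readin
    group by age;   /* one row per AGE with its count of distinct IDs */
quit;
```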
51. If you have 2 sets of format libraries, can a SAS program have access to
both during 1 session?
Yes. Use the FMTSEARCH option.
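A sketch of the option in use. The librefs FMTLIB1 and FMTLIB2 and their paths are placeholders, not real locations.

```sas
libname fmtlib1 'path-to-first-format-library';
libname fmtlib2 'path-to-second-format-library';

/* Search the FORMATS catalog in both libraries, in this order */
options fmtsearch=(fmtlib1 fmtlib2);
```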
52. Describe any three SAS functions
len=LENGTH(animal);
Result is - 6
53. What is the purpose of the trailing @ and @@, and how would you use them?
The trailing @ and @@ are line-hold specifiers: they hold the current record
in the input buffer instead of releasing it when the INPUT statement
finishes.
Trailing @ holds the record so that another INPUT statement in the same
iteration of the DATA step can read from it. The record is released when an
INPUT statement without a trailing @ executes or when the current iteration
of the DATA step ends.
Trailing @@ holds the record across iterations of the DATA step, so the same
INPUT statement can keep reading values from one line until the end of the
record is reached. Without it, SAS would read only the first set of values
from each line in the datalines block and ignore the rest of the line.
The single trailing @ is useful for records whose layout depends on a value
in the record, such as
001F38   H
002 F 40 G
To read these values in a DATA step:
Data example;
input @10 type $ @;
if type='H' then
input @1 id 3. @4 gender $1. @5 age 2.;
else if type='G' then
input @1 id 3. @5 gender $1. @7 age 2.;
cards;
001F38   H
002 F 40 G
;
run;
The double trailing @@ holds the record until the end of the record is reached.
Data example2;
input id age @@;
cards;
001 23 002 43 003 65 004 32 005 54
;
run;
54. What is the significance of the OF in X=SUM(OF a1-a4, a6, a9);?
OF tells SAS that what follows is a variable list. In the above example,
SUM(OF a1-a4, a6, a9) resolves to SUM(A1,A2,A3,A4,A6,A9).
If we don't use OF, the hyphen is treated as a minus sign, so a1-a4 is
evaluated as A1 minus A4, which is not what we are trying to accomplish.
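The effect is easy to check with a few made-up values:

```sas
data _null_;
    a1 = 1; a2 = 2; a3 = 3; a4 = 4; a6 = 6; a9 = 9;
    with_of    = sum(of a1-a4, a6, a9);   /* 1+2+3+4+6+9 = 25 */
    without_of = sum(a1-a4, a6, a9);      /* (1-4)+6+9 = 12   */
    put with_of= without_of=;
run;
```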
58.
The DATA step creates a SAS data set, which carries the data along
with a data dictionary. The data dictionary holds the information
about the variables and their properties.
59. Explain how you can debug and test your SAS program?
You can debug and test your SAS program by using OBS=0 and
system options to trace the program execution in the log.
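One possible arrangement is sketched below. OBS=0 together with NOREPLACE lets the steps compile without processing observations or overwriting existing data sets, and MPRINT, MLOGIC and SYMBOLGEN write macro activity to the log.

```sas
options obs=0 noreplace;           /* syntax check: no observations are processed */
options mprint mlogic symbolgen;   /* trace macro execution in the log */

/* ... program under test goes here ... */

options obs=max replace;           /* restore normal processing */
options nomprint nomlogic nosymbolgen;
```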
63.
nodup options?
The difference between NODUP and NODUPKEY is that NODUP
compares all the variables in our data set, while NODUPKEY compares
just the BY variables.
64.
used for?
PROC PRINT displays the contents of a SAS data set and is used to
check that the data were read into SAS correctly, while PROC
CONTENTS displays information about a SAS data set.
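A short sketch of the two procedures side by side, assuming a data set named READIN:

```sas
/* Show the first 10 observations to confirm the values were read correctly */
proc print data=readin (obs=10);
run;

/* Show the descriptor information: variable names, types, lengths, formats */
proc contents data=readin;
run;
```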
66.
SAS informats are used to read, or input, data from external files
(known as flat files, ASCII files, text files or sequential files). The
informat tells SAS how to read data into SAS variables.
69.
PROC gplot has more options and can create more colorful and fancier
graphics.
72. Data step : typically creates or modifies SAS data sets, and
it can also be used to produce custom-designed reports.
The DATA step is used to put data into a SAS data set, compute
variables, check for and correct errors in data, and produce new SAS
data sets by subsetting, merging, and updating existing
data sets.