Base SAS Interview Questions

The document describes differences between INPUT and INFILE statements, Informat and Format, Missover and Truncover options, and purpose of double trailing @@ in Input statement. It also provides examples of using Missover and Truncover options with INPUT statement to read an external file with different number of observations.

1. Difference between INPUT and INFILE

The INFILE statement is used to identify an external file, while the INPUT
statement is used to describe your variables.
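As a minimal sketch of the two statements working together: the in-stream records and their values below are assumptions made for illustration, and INFILE DATALINES stands in for the quoted path of an external file.

```sas
data readin;
    infile datalines;       /* INFILE identifies where the raw data comes from */
    input ID Name $ Sex;    /* INPUT describes the variables to read           */
    datalines;
1 David 1
2 Mary 0
;
run;
```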

Note: A $ (dollar sign) after a variable name identifies the variable
type as character.
For example, in the statement input ID Name $ Sex; the variables ID and SEX
are numeric and Name is a character variable.
2. Difference between Informat and Format
Informats read the data while Formats write the data.
Informat - To tell SAS that a number should be read in a particular format.
For example: the informat mmddyy6. tells SAS to read the
number 121713 as the date December 17, 2013.
Format - To tell SAS how to print the variables.
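A short sketch of both directions, using the mmddyy6. informat from the text and DATE9. as an assumed choice of display format:

```sas
data _null_;
    /* Informat: read the text 121713 as a SAS date value */
    d = input('121713', mmddyy6.);
    /* Format: control how the stored date value is written */
    put d= date9.;    /* writes d=17DEC2013 to the log */
run;
```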
3. Difference between Missover and Truncover
Missover - When the MISSOVER option is used on the INFILE statement, the
INPUT statement does not jump to the next line when reading a short line.
Instead, MISSOVER sets variables to missing.
Truncover - It assigns the raw data value to the variable even if the value is
shorter than the length that is expected by the INPUT statement.
The following is an example of an external file that contains data:
1
22
333
4444

This DATA step uses the numeric informat 4. to read a single field in each
record of raw data and to assign values to the variable ID.
MISSOVER Option
data readin;
infile 'external-file' missover;
input ID 4.;
run;
proc print data=readin;
run;
The output is shown below :
Obs    ID
1      .
2      .
3      .
4      4444

TRUNCOVER Option
data readin;
infile 'external-file' truncover;
input ID 4.;
run;
proc print data=readin;
run;
The output is shown below :

Obs    ID
1      1
2      22
3      333
4      4444

4. Purpose of double trailing @@ in Input Statement?

The double trailing at sign (@@) tells SAS to hold the current input record
for the next execution of the INPUT statement, even across iterations of the
DATA step, rather than advancing to a new record.
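A minimal sketch of @@ reading several observations from one line; the data set name SCORES and the values are assumptions for illustration.

```sas
data scores;
    input Name $ Score @@;   /* @@ holds the record across DATA step iterations */
    datalines;
Sam 45 Ram 54 Mary 92
;
run;

proc print data=scores;      /* three observations come from the single input line */
run;
```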


5. How to include or exclude specific variables in a data set?


Use the DROP and KEEP statements or the DROP= and KEEP= data set options.
DROP, KEEP Statements
The DROP statement specifies the names of the variables that you want to
remove from the data set.

The KEEP statement specifies the names of the variables that you want to
retain in the data set.

DROP, KEEP Data set Options


The main difference between the DROP/KEEP statements and the DROP=/KEEP=
data set options is that you cannot use the DROP/KEEP statements in
procedures.
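The difference can be sketched as follows. The data set names and values are assumptions for illustration only.

```sas
data readin;
    input ID Name $ Score;
    datalines;
1 David 45
2 Sam 74
;
run;

data outdata;
    set readin;
    drop Score;                          /* DROP statement: valid only in a DATA step */
run;

proc print data=readin(keep=ID Name);    /* KEEP= data set option: also valid in a procedure */
run;
```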

6. How to print observations 5 through 10 from a data set?


The FIRSTOBS= and OBS= data set options (FIRSTOBS=5 OBS=10) tell SAS to print
observations 5 through 10 from the data set READIN.
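A minimal sketch, assuming a READIN data set of twelve observations built here only for illustration:

```sas
data readin;
    do ID = 1 to 12;
        output;
    end;
run;

/* OBS= is the number of the last observation to read, not a count */
proc print data=readin(firstobs=5 obs=10);
run;
```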

7. What are the default statistics that PROC MEANS produce?


PROC MEANS produces the default statistics N, MIN, MAX, MEAN and STD
DEV.
8. Name and describe functions that you have used for data
cleaning?

9. Difference between FUNCTION and PROC


Example : MEAN function and PROC MEANS
The MEAN function averages the values of several variables within one
observation.
The average calculated by PROC MEANS is the sum of all the nonmissing
values of a variable divided by the number of nonmissing observations for
that variable.
In other words, the MEAN function sums across a row and the procedure
sums down a column.
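The contrast can be sketched on a small data set; the variable names and values are assumptions for illustration.

```sas
data scores;
    input Q1 Q2 Q3;
    row_mean = mean(Q1, Q2, Q3);   /* MEAN function: across the variables of one observation */
    datalines;
1 2 3
4 5 6
;
run;

proc means data=scores mean;       /* PROC MEANS: down the observations of one variable */
    var Q1;
run;
```

Here row_mean is 2 and 5 for the two observations, while PROC MEANS reports 2.5 as the mean of Q1.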

10. Differences between WHERE and IF statement?


1. The WHERE statement can be used in procedures to subset data, while the IF statement cannot be used in procedures.
2. WHERE can be used as a data set option (WHERE=), while IF cannot be used as a data set option.
3. The WHERE statement is more efficient than the IF statement because it tells SAS not to read all observations from the data set.
4. The WHERE statement can search for character values that sound alike (the sounds-like operator =*), while the IF statement cannot.
5. The WHERE statement cannot be used when reading data with an INPUT statement, whereas the IF statement can.
6. Multiple IF statements can be used to execute multiple conditional statements.
7. When a newly created variable must be tested, use the IF statement, as it does not require the variable to exist in the READIN data set.
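Several of these points can be sketched together. The data set, the variable Bonus, and the cut-off values are assumptions for illustration.

```sas
data readin;
    input ID Score;
    datalines;
1 45
2 74
3 92
;
run;

proc print data=readin;
    where Score > 50;                  /* WHERE works inside a procedure */
run;

data passed;
    set readin(where=(Score > 50));    /* WHERE= used as a data set option */
    Bonus = Score * 0.1;
    if Bonus > 8;                      /* subsetting IF can test a newly created variable */
run;
```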
11. What is Program Data Vector (PDV)?
PDV is a logical area in memory.
How is the PDV created?
SAS creates a data set one observation at a time. The input buffer is created
at compile time to hold a record from an external file. The PDV is created
after the input buffer. SAS builds the data set in the PDV area of memory.
12. What is DATA _NULL_?
The DATA _NULL_ is mainly used to create macro variables. It can also be
used to write output without creating a dataset. The idea of "null" here is
that we have a data step that actually doesn't create a data set.
13. What is the difference between '+' operator and SUM function?
SUM function returns the sum of non-missing arguments whereas +
operator returns a missing value if any of the arguments are missing.
Suppose we have a data set containing three variables - X, Y and Z. They all
have missing values. We wish to compute sum of all the variables.

data mydata2;
set mydata;
a=sum(x,y,z);
p=x+y+z;
run;

In the output, value of p is missing for 4th, 5th and 6th observations.
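The original input table is not reproduced here. As a sketch, assume six observations in which each of the last three has one missing value (the values themselves are invented for illustration):

```sas
data mydata;
    input x y z;
    datalines;
1 2 3
4 5 6
7 8 9
. 2 3
4 . 6
7 8 .
;
run;

data mydata2;
    set mydata;
    a = sum(x, y, z);   /* SUM ignores missing arguments */
    p = x + y + z;      /* + returns missing if any operand is missing */
run;

proc print data=mydata2;
run;
```

With this assumed data, a is populated on every row while p is missing on rows 4 to 6.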
14. How to identify and remove unique and duplicate values?
Use PROC SORT with NODUPKEY and NODUP Options.
The detailed explanation is shown below :
SAMPLE DATA SET
ID    Name      Score
      David     45
      David     74
      Sam       45
      Ram       54
      Bane      87
      Mary      92
      Bane      87
      Dane      23
      Jenny     87
      Ken       87
      Simran    63
      Priya     72

Create this data set in SAS

There are several ways to identify and remove unique and duplicate values:
PROC SORT
In PROC SORT, there are two options by which we can remove duplicates.
1. NODUPKEY Option

2. NODUP Option

The NODUPKEY option removes duplicate observations where the value of a
variable listed in the BY statement is repeated, while the NODUP option removes
duplicate observations where the values in all the variables are repeated
(identical observations).


NODUPKEY has deleted 5 observations with duplicate values, whereas
NODUP has not deleted any observations.
Why has no observation been deleted when the NODUP option is used?
Although ID 3 has two identical records (see observations 5 and 7), the NODUP
option has not removed them. This is because they are not next to one another
in the data set and SAS only looks one record back.
To fix this issue, sort on all the variables in the data set READIN.
To sort by all the variables without having to list them all in the program, you
can use the keyword _ALL_ in the BY statement.
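A minimal sketch of both options on a reduced data set. The five rows and their ID values are assumptions for illustration, not the original sample table, and the output data set names are hypothetical.

```sas
data readin;
    input ID Name $ Score;
    datalines;
1 David 45
1 David 74
3 Bane 87
3 Mary 92
3 Bane 87
;
run;

/* NODUPKEY: keeps the first observation for each BY value (2 rows remain) */
proc sort data=readin out=nodupkey_out nodupkey;
    by ID;
run;

/* NODUP with BY _ALL_: identical rows become adjacent, so one Bane row is dropped (4 rows remain) */
proc sort data=readin out=nodup_out nodup;
    by _all_;
run;
```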


15. Difference between NODUP and NODUPKEY Options?


The NODUPKEY option removes duplicate observations where value of a
variable listed in BY statement is repeated while NODUP option removes
duplicate observations where values in all the variables are repeated
(identical observations).
16. What are _numeric_ and _character_ and what do they do?
1. _NUMERIC_ specifies all numeric variables that are already defined in the
current DATA step.
2. _CHARACTER_ specifies all character variables that are currently defined
in the current DATA step.
3. _ALL_ specifies all variables that are currently defined in the current DATA
step.
Example : To include all the numeric variables in PROC MEANS
proc means;
var _numeric_;
run;
17. How to sort in descending order?
Use DESCENDING keyword in PROC SORT code. The example below shows
the use of the descending keyword.
PROC SORT DATA=auto;
BY DESCENDING engine ;
RUN ;

18. Under what circumstances would you code a SELECT construct instead of IF statements?
When you have a long series of mutually exclusive conditions and the
comparison is numeric, using a SELECT group is slightly more efficient than
using IF-THEN or IF-THEN-ELSE statements because CPU time is reduced.
The syntax for SELECT WHEN is as follows :
SELECT (condition);
WHEN (1) x=x;
WHEN (2) x=x*2;
OTHERWISE x=x-1;
END;
Example :
SELECT (str);
WHEN ('Sun') wage=wage*1.5;
WHEN ('Sat') wage=wage*1.3;
OTHERWISE DO;
wage=wage+1;
bonus=0;
END;
END;
19. How to convert a numeric variable to a character variable?
You must create a differently-named variable using the PUT function.
The example below shows the use of the PUT function.
charvar = put(numvar, 7.) ;
20. How to convert a character variable to a numeric variable?
You must create a differently-named variable using the INPUT function.
The example below shows the use of the INPUT function.
numvar=input(charvar,4.0);
21. What's the difference between VAR A1 - A3 and VAR A1 -- A3?
Single Dash : It is used to specify consecutively numbered variables. A1-A3
implies A1, A2 and A3.

Double-dash : It is used to specify variables based on the order of the
variables as they appear in the file, regardless of the name of the variable.
A1--A3 implies all the variables from A1 to A3 in the order they appear in the
data set.
Example : The order of variables in a data set : ID Name A1 A2 C1 A3
So using A1-A3 would return A1 A2 A3.
A1--A3 would return A1 A2 C1 A3.
22. Difference between PROC MEANS and PROC SUMMARY?
1. PROC MEANS by default produces printed output in the OUTPUT window,
whereas PROC SUMMARY does not. Including the PRINT option on the PROC
SUMMARY statement sends the results to the OUTPUT window.
2. Omitting the VAR statement in PROC MEANS analyzes all the numeric
variables, whereas omitting the VAR statement in PROC SUMMARY produces
a simple count of observations.
How to produce output in the OUTPUT window using PROC
SUMMARY?
Use PRINT option.
proc summary data = retail print;
class services;
var INVESTMENT ;
run;
23. Can PROC MEANS analyze ONLY the character variables?
No, Proc Means requires at least one numeric variable.
24. How SUBSTR function works?
The SUBSTR function is used to extract substring from a character variable.
The SUBSTR function has three arguments:
SUBSTR(character variable, starting position, number of characters
to read from the starting position)
There are two basic applications of the SUBSTR function:
RIGHT SIDE APPLICATION

data _null_ ;
phone = '(312) 555-1212' ;
area_cd = substr(phone, 2, 3) ;
put area_cd = ;
run;
Result : In the log window, it writes area_cd = 312 .
LEFT SIDE APPLICATION
It is used to change just a few characters of a variable.
data _null_ ;
phone = '(312) 555-1212' ;
substr(phone, 2, 3) = '773' ;
put phone = ;
run ;
In this example, the variable PHONE has been changed from (312) 555-1212 to (773) 555-1212.

25. Difference between CEIL and FLOOR functions?


The ceil function returns the smallest integer greater than/equal to the
argument whereas the floor returns the greatest integer less than/equal to
the argument.
For example : ceil(4.4) returns 5 whereas floor(4.4) returns 4.
26. Difference between SET and MERGE?

SET concatenates the data sets, whereas MERGE matches the observations
of the data sets.
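The two statements can be compared on a pair of small data sets. All names and values below are assumptions for illustration.

```sas
data file1;
    input ID Name $;
    datalines;
1 David
2 Sam
;
run;

data file2;
    input ID Score;
    datalines;
1 45
2 74
;
run;

data stacked;      /* SET: 4 observations, the rows of file1 followed by the rows of file2 */
    set file1 file2;
run;

data matched;      /* MERGE with BY: 2 observations, one per ID, carrying Name and Score */
    merge file1 file2;
    by ID;
run;
```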

27. How to do Matched Merge and output only consisting of observations from both files?
Use IN= variable in MERGE statements. It is used for matched merge to
track and select which observations in the data set from the merge
statement will go to a new data set.
data readin;
merge file1(in=infile1) file2(in=infile2);
by id;
if infile1= infile2;
run;

28. How to do Matched Merge and output consisting of observations in file1 but not in file2?
data readin;
merge file1(in=infile1) file2(in=infile2);
by id;
if infile1 and not infile2;
run;
29. How to do Matched Merge and output consisting of observations
from only file1?
data readin;
merge file1(in=infile1) file2(in=infile2);
by id;
if infile1;
run;
30. How do I create a data set with observations =100, mean 0 and
standard deviation 1?
data readin;
do i = 1 to 100;
temp = 0 + rannor(1) * 1;
output;
end;
run;
proc means data = readin mean stddev;
var temp;
run;
31. How to label values and use it in PROC FREQ?
Use PROC FORMAT to set up a format.
proc format;
value score 0 - 100 = '100-'
101 - 200 = '101+'
other = 'others'
;
run;
proc freq data= readin;
tables outdata;
format outdata score. ;
run;
32. How to use arrays to recode set of variables?

Recode the set of questions: Q1,Q2,Q3...Q20 in the same way: if the variable
has a value of 6 recode it to SAS missing.
data readin;
set outdata;
array Q(20) Q1-Q20;
do i= 1 to 20;
if Q(i) = 6 then Q(i)= .;
end;
run;
33. How to use arrays to recode all the numeric variables?
Use _numeric_ and dim functions in array.
data readin;
set outdata;
array Q(*) _numeric_;
do i= 1 to dim(Q);
if Q(i) = 6 then Q(i)= .;
end;
run;
Note : DIM returns a total count of the number of elements in array
dimension Q.
34. How to calculate mean for a variable by group?
Suppose Q1 is a numeric variable and Age a grouping variable. You wish to
compute mean for Q1 by Age.
PROC MEANS DATA = READIN;
VAR Q1;
CLASS AGE;
RUN;
35. How to generate cross tabulation?
Use PROC FREQ code.
PROC FREQ DATA=auto;
TABLES A*B ;
RUN;
SAS will produce table of A by B.
36. How to generate detailed summary statistics?

Use PROC UNIVARIATE code.


PROC UNIVARIATE DATA= READIN;
CLASS Age;
VAR Q1;
RUN;
Note : Q1 is a numeric variable and Age a grouping variable.
37. How to count missing values for numeric variables?
Use PROC MEANS with NMISS option.
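A minimal sketch, with variable names and values assumed for illustration:

```sas
data readin;
    input Q1 Q2;
    datalines;
1 .
. .
3 4
;
run;

proc means data=readin n nmiss;   /* N counts nonmissing values, NMISS counts missing ones */
    var Q1 Q2;
run;
```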
38. How to count missing values for all variables?
proc format;
value $missfmt ' '='Missing' other='Not Missing';
value missfmt . ='Missing' other='Not Missing';
run;
proc freq data=one;
format _CHAR_ $missfmt.;
tables _CHAR_ / missing missprint nocum nopercent;
format _NUMERIC_ missfmt.;
tables _NUMERIC_ / missing missprint nocum nopercent;
run;
39. Describe the ways in which you can create macro variables
There are 5 ways to create macro variables:
1. %LET
2. Iterative %DO statement
3. CALL SYMPUT
4. PROC SQL INTO clause
5. Macro parameters
40. Use of CALL SYMPUT
CALL SYMPUT puts the value from a dataset into a macro variable.
proc means data=test;
var x;
output out=testmean mean=xbar;
run;

data _null_;
set testmean;
call symput("xbarmac",xbar);
run;
%put mean of x is &xbarmac;
41. What are SYMGET and SYMPUT?
SYMPUT puts the value from a dataset into a macro variable, whereas
SYMGET gets the value from the macro variable into the dataset.
42. Which date function advances a date, time or datetime value by
a given interval?
The INTNX function advances a date, time, or datetime value by a given interval,
and returns a date, time, or datetime value. Ex: INTNX(interval, start-from, number-of-increments, alignment).
43. How to count the number of intervals between two given SAS
dates?
INTCK(interval, start-of-period, end-of-period) is an interval function that
counts the number of intervals between two given SAS dates, times, or
datetimes.
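Both interval functions can be sketched in one step; the dates are chosen only for illustration.

```sas
data _null_;
    start = '15JAN2020'd;
    /* INTNX: move forward one month; the default alignment is the beginning of the interval */
    next_month = intnx('month', start, 1);
    /* INTCK: count the month boundaries crossed between the two dates */
    months = intck('month', start, '01FEB2020'd);
    put next_month= date9. months=;   /* writes next_month=01FEB2020 months=1 */
run;
```

Note that INTCK counts interval boundaries, so 15 January to 1 February counts as one month even though fewer than 31 days have passed.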
44. Difference between SCAN and SUBSTR?
SCAN extracts words within a value that is marked by
delimiters. SUBSTR extracts a portion of the value by stating the specific
location. It is best used when we know the exact position of the sub string to
extract from a character value.
45. The following data step executes:
Data strings;
Text1 = 'MICKEY MOUSE & DONALD DUCK';
Text = scan(text1,2,'&');
Run;
What will the value of the variable Text be?
*DONALD DUCK (the leading blank is displayed here as an asterisk *)
46. For what purpose would you use the RETAIN statement?

A RETAIN statement tells SAS not to set variables to missing when going
from the current iteration of the DATA step to the next. Instead, SAS retains
the values.
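A common use is a running total. In this sketch the variable names and values are assumptions for illustration.

```sas
data running;
    input Amount;
    retain Total 0;            /* Total starts at 0 and is not reset between iterations */
    Total = Total + Amount;    /* Total becomes 10, 30, 60 */
    datalines;
10
20
30
;
run;
```

Without the RETAIN statement, Total would be reset to missing at the start of each iteration and the sum would be missing on every row.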
47. When grouping is in effect, can the WHERE clause be used in
PROC SQL to subset data?
Not to subset the groups. WHERE filters individual rows before they are
grouped and cannot reference summary statistics. To subset data when grouping
is in effect, the HAVING clause must be used; its condition typically refers to
a summary statistic.
48. How to use IF THEN ELSE in PROC SQL?
PROC SQL;
SELECT WEIGHT,
CASE
WHEN WEIGHT BETWEEN 0 AND 50 THEN 'LOW'
WHEN WEIGHT BETWEEN 51 AND 70 THEN 'MEDIUM'
WHEN WEIGHT BETWEEN 71 AND 100 THEN 'HIGH'
ELSE 'VERY HIGH'
END AS NEWWEIGHT FROM HEALTH;
QUIT;
49. How to remove duplicates using PROC SQL?
Proc SQL noprint;
Create Table inter.Merged1 as
Select distinct * from inter.readin ;
Quit;
50. How to count unique values by a grouping variable?
You can use PROC SQL with COUNT(DISTINCT variable_name) to
determine the number of unique values for a column.
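A minimal sketch, assuming a grouping variable Age and a variable Name whose unique values are counted (names and values are invented for illustration):

```sas
data readin;
    input Age Name $;
    datalines;
25 David
25 David
25 Sam
30 Mary
;
run;

proc sql;
    select Age, count(distinct Name) as unique_names
    from readin
    group by Age;     /* Age 25 has 2 unique names, Age 30 has 1 */
quit;
```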
51. If you have 2 sets of format libraries, can a SAS program have access to both during 1 session?
Yes. Use the FMTSEARCH= system option.
52. Describe any three SAS functions
LENGTH: Returns the length of an argument, not counting the trailing blanks.
Ex:
animal='my cat';
len=LENGTH(animal);
Result is - 6

SUBSTR: The SUBSTR function extracts a substring from a given
argument, starting at a given position, for n characters or until the
end if n is not specified.
Ex:
data dsn;
value='(916)734-6241';
substring=SUBSTR(value,2,3);
run;
Result is - 916

TRIM: Removes the trailing blanks from a given character expression.
Ex:
Str1='my';
Str2='cat';
Result=TRIM(Str1)||Str2;
Result = mycat

53. What is the purpose of the trailing @ and @@, and how would you use them?
By default, each INPUT statement reads a new record, so if several
observations are stored on one line of a DATALINES block, SAS reads only the
first set of values from each line and ignores the rest. The trailing @ and
@@, known as line-hold specifiers, change this by holding the record in the
input buffer.
Trailing @ holds the record in the input buffer so that another INPUT
statement in the same DATA step iteration can read from the same record. The
record is released when the next INPUT statement without a trailing @
executes or when the current iteration ends.
Trailing @@ holds the record in the input buffer across iterations of the
DATA step, so the same INPUT statement keeps reading from the same record
until the end of the record is reached.
The single trailing @ is used for records such as
001F38   H
002 F 40 G
To read these values in the DATA step:
data example;
input @10 type $ @;   /* read the record type in column 10 and hold the line */
if type='H' then
input @1 id 3. @4 gender $1. @5 age 2.;
else if type='G' then
input @1 id 3. @5 gender $1. @7 age 2.;
cards;
001F38   H
002 F 40 G
;
run;
The double trailing @@ holds the record across DATA step iterations until the end of the record is reached.
Data example2;
input id age @@;
cards;
001 23 002 43 003 65 004 32 005 54
;
run;
54. What is the significance of the OF in X=SUM(OF a1-a4, a6, a9);?
Ans: OF tells SAS to treat what follows as a list of variables. In
the above example, SUM(OF a1-a4, a6, a9) resolves to
SUM(A1,A2,A3,A4,A6,A9).
If we don't use OF, the dash is treated as a minus sign (A1 minus A4),
which is not what we are trying to accomplish.
55. Explain what is the basic structure of SAS programming?
The basic structure of SAS consists of:
Program Editor
Explorer Window
Log Window
56. What is the basic syntax style in SAS?
To run a program successfully, you need the following basic elements:
There should be a semicolon at the end of every statement
A DATA statement that defines your data set
An INPUT statement
There should be at least one space between each word or statement
A RUN statement
57. For example: infile 'H:\StatHW\yourfilename.dat';
58. Explain what is Data Step?
The DATA step creates a SAS data set which carries the data along
with a data dictionary. The data dictionary holds the information
about the variables and their properties.
59. Explain what is PDV?
The PDV, or Program Data Vector, is a logical area in memory. SAS
builds a data set one observation at a time. An input buffer is created
at compile time to hold a record from an external file. The PDV is
created after the input buffer.
60. Mention what are the data types does SAS contain?
The data types in SAS are Numeric and Character.
61. In SAS explain which statement does not perform automatic conversions in comparisons?
In SAS, the WHERE statement does not perform automatic conversions
in comparisons.
62. Explain how you can debug and test your SAS program?
You can debug and test your SAS program by using OBS=0 and
system options to trace the program execution in the log.
63. Mention what is the difference between nodupkey and nodup options?
The difference between NODUP and NODUPKEY is that NODUP
compares all the variables in the data set, while NODUPKEY compares
just the BY variables.
64. Mention the validation tools used in SAS?
For data sets: data set name / debug; data set name / stmtchk
For macros: options mprint mlogic symbolgen
65. Explain what does PROC print, and PROC contents are used for?
PROC PRINT displays the data values of a SAS data set and is used to
check that the data were read into SAS correctly, while PROC
CONTENTS displays descriptive information about a SAS data set.
66. Explain what is the use of PROC SUMMARY?
The syntax of PROC SUMMARY is the same as that of PROC MEANS; it
computes descriptive statistics on numeric variables in the SAS
data set.
67. Explain what Proc glm does?
PROC GLM performs simple and multiple regression, analysis of variance
(ANOVA), analysis of covariance, multivariate analysis of variance, and
repeated measures analysis of variance.
68. Explain what is SAS informats?
SAS informats are used to read, or input, data from external files
(known as flat files, ASCII files, text files, or sequential files). The
informat tells SAS how to read data into SAS variables.
69. Mention the category in which SAS Informats are placed?
SAS informats are placed in three categories:
Character informats: $INFORMATw.
Numeric informats: INFORMATw.d
Date/time informats: INFORMATw.
70. What does the CATX function do?
CATX concatenates character strings, removes leading and trailing
blanks, and inserts separators.
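A minimal sketch, with the strings chosen only for illustration:

```sas
data _null_;
    /* The first argument is the separator; leading and trailing blanks are stripped from each piece */
    full = catx('-', '  916 ', ' 734', '6241  ');
    put full=;    /* writes full=916-734-6241 */
run;
```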
71. Explain what is the use of PROC gplot?
Compared with PROC PLOT, PROC GPLOT has more options and can create more
colorful and fancier graphics.
72. DATA step: DATA steps typically create or modify SAS data sets, and
they can also be used to produce custom-designed reports.
DATA steps are used to: put data into a SAS data set, compute
variables, check for and correct errors in the data, and produce new SAS
data sets by subsetting, merging, and updating existing
data sets.

PROC step: PROC steps are pre-written routines that enable us to
analyze and process the data in a SAS data set and to
present the data in the form of a report.
PROC steps sometimes create new SAS data sets that contain
the results of the procedure.
PROC steps can list, sort, and summarize data.
PROC steps are used to: create a report that lists the data,
produce descriptive statistics, create a summary
report, and produce plots and charts.
