SAS Interview Questions and Answers.

The document contains a comprehensive list of SAS interview questions and answers focused on various topics such as dataset options, merging datasets, date functions, and SAS macros. It covers differences between statements and options, how to manipulate datasets, and various programming techniques in SAS. The content is structured in a question-and-answer format, making it a useful resource for preparing for SAS-related interviews.

Uploaded by fatteamol123
Copyright © All Rights Reserved
SAS Interview Questions and Answers

BASE SAS
Q1. What is the difference between the RENAME= data set option and the RENAME statement?
Ans: 1) The RENAME statement applies to all output data sets, whereas the RENAME= data set option applies only to the data set it is specified for.
2) The RENAME= data set option can also be used in PROC steps.
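A minimal sketch using SASHELP.CLASS (the output data set names class_a and class_b are hypothetical): the RENAME statement renames NAME in both output data sets, while the RENAME= option is used inside a PROC step.

```sas
/* RENAME statement: applies to every output data set of the step */
data class_a class_b;
  set sashelp.class;
  rename name = student_name;
run;

/* RENAME= data set option: also valid in a PROC step */
proc print data=sashelp.class(rename=(name=student_name) obs=3);
  var student_name age;
run;
```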

Q2. If you want to rename different variables in different output data sets, which will you use: the RENAME statement or the RENAME= data set option?

Ans: The RENAME= data set option, because the RENAME statement applies to all output data sets.
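A short sketch, assuming SASHELP.CLASS as input and hypothetical output names: each output data set carries its own RENAME= option, so each gets a different rename.

```sas
/* A different RENAME= on each output data set */
data class_girls(rename=(height=height_in))
     class_boys(rename=(weight=weight_lb));
  set sashelp.class;
  if sex = 'F' then output class_girls;
  else output class_boys;
run;
```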

Q3. Find the error in the program below.

Ans: The error is in the WHERE statement: the score1 variable has already been renamed to score2, so there is no score1 variable for the WHERE statement to use. The correct statement is:
where score2 > 85;

Q4. What will be the output of the program below?

Ans: The Employee bonus data set will contain the observations selected by the second WHERE statement, department="HR". When a step contains multiple WHERE statements, each new WHERE statement replaces the previous one, so only the last one takes effect.
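A sketch using SASHELP.CLASS (output names are hypothetical) that shows the replacement behaviour, and WHERE ALSO (equivalent to WHERE SAME AND) as the way to combine conditions across statements instead.

```sas
/* Two plain WHERE statements: the second replaces the first */
data over_12;
  set sashelp.class;
  where sex = 'F';
  where age > 12;        /* only this condition is applied */
run;

/* WHERE ALSO adds to the existing condition instead of replacing it */
data girls_over_12;
  set sashelp.class;
  where sex = 'F';
  where also age > 12;   /* sex = 'F' and age > 12 */
run;
```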

Q5. What will be the output of the program below?

Ans: Data set employee will contain the observations where salary > 50000, because the WHERE country="US" statement is ignored.

Q6. What is the difference between where dataset option and where statement?

Ans: The WHERE statement applies to all input data sets, whereas the WHERE= data set option
selects observations only from the dataset for which it is specified.
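A minimal sketch, assuming a hypothetical one-row data set EXTRA alongside SASHELP.CLASS. WHERE= filters only the data set it is attached to, while a WHERE statement filters every input. (If both are given for the same data set, SAS uses the WHERE= option and ignores the WHERE statement for that data set.)

```sas
data extra;   /* hypothetical one-row data set */
  name = 'Zoe'; sex = 'F'; age = 10;
run;

/* WHERE= filters SASHELP.CLASS only; Zoe (age 10) is still read from EXTRA */
data option_only;
  set sashelp.class(where=(age > 14)) extra;
run;

/* A WHERE statement filters every input data set, so Zoe is dropped */
data statement_all;
  set sashelp.class extra;
  where age > 14;
run;
```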

Q7. Write a program to create the employee_info data set by reading the employee data described below. The output data set employee_info should contain only the emp_id and emp_doj variables.
Dataset name: employee
Variable names: emp_id, emp_name, emp_age, emp_address,
emp_state, emp_country, emp_dob, emp_doj, emp_dept,
emp_salary, emp_status.

Ans:

Note: the interviewer wants to know how efficient your code is. You need only two of the 11 variables, so use the KEEP= data set option on the input data set (in the SET statement). The program below gives the same result but is less efficient, because it reads and processes all the variables and only then writes two of them to the output data set.
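A sketch of the efficient answer and the less efficient alternative. The one-row EMPLOYEE data set and its values are hypothetical stand-ins for the data described in the question.

```sas
/* Hypothetical one-row stand-in for the employee data set */
data employee;
  length emp_name emp_address emp_state emp_country emp_dept emp_status $20;
  emp_id = 101; emp_name = 'Asha'; emp_age = 34; emp_address = '1 Main St';
  emp_state = 'NY'; emp_country = 'US'; emp_dob = '12mar1988'd;
  emp_doj = '01jun2015'd; emp_dept = 'HR'; emp_salary = 55000; emp_status = 'Active';
  format emp_dob emp_doj date9.;
run;

/* Efficient: KEEP= on the input data set, so only two variables are read */
data employee_info;
  set employee(keep=emp_id emp_doj);
run;

/* Same result, less efficient: all 11 variables are read, 2 are written */
data employee_info_slow;
  set employee;
  keep emp_id emp_doj;
run;
```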

Q8. What is the difference between Keep dataset option and Keep statement?

Ans: The KEEP= data set option can apply to both input and output data sets. The KEEP statement applies only to output data sets.

Q9. Write a program to create the employee_info data set by reading the employee data described below. The output data set employee_info should not contain the emp_id and emp_doj variables.
Dataset name: employee
Variable names: emp_id, emp_name, emp_age, emp_address,
emp_state, emp_country, emp_dob, emp_doj, emp_dept, emp_salary,
emp_status.

Ans:

Note: use the DROP= data set option, because you need 9 of the 11 variables. Use it on the input data set so that only those 9 variables are read.
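A minimal sketch with a hypothetical one-row EMPLOYEE data set: DROP= on the SET statement means the two dropped variables are never read into the PDV.

```sas
/* Hypothetical one-row stand-in for the employee data set */
data employee;
  length emp_name emp_address emp_state emp_country emp_dept emp_status $20;
  emp_id = 101; emp_name = 'Asha'; emp_age = 34; emp_address = '1 Main St';
  emp_state = 'NY'; emp_country = 'US'; emp_dob = '12mar1988'd;
  emp_doj = '01jun2015'd; emp_dept = 'HR'; emp_salary = 55000; emp_status = 'Active';
run;

/* DROP= on the input: the remaining 9 variables are read and written */
data employee_info;
  set employee(drop=emp_id emp_doj);
run;
```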

Q10. What is the difference between Drop dataset option and Drop statement?

Ans: In DATA steps, the DROP= data set option can apply to both input and output data sets. The DROP statement applies only to output data sets.

Q11. What is the IN= data set option?

Ans: It creates a temporary Boolean variable that indicates whether the data set contributed data to the current observation: its value is 1 if the data set contributed to the current observation, and 0 otherwise.
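A small sketch with two hypothetical data sets A and B. IN= variables are not written to the output, so the step copies them into FROM_A and FROM_B to make them visible.

```sas
data a;
  input id x;
  datalines;
1 10
2 20
;
data b;
  input id y;
  datalines;
2 200
3 300
;
data flags;
  merge a(in=in_a) b(in=in_b);
  by id;
  from_a = in_a;   /* 1 if A contributed to this observation */
  from_b = in_b;   /* 1 if B contributed to this observation */
run;
```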

Q12. What is BY Group Processing in SAS?

Ans: BY-group processing is a method of processing observations from one or more SAS data sets
that are grouped or ordered by values of one or more common variables.

Q13. What condition is required for BY group processing?


Ans: The data set must be sorted or indexed by the BY variable(s).

Q14. What do you understand by FIRST.variable and LAST.variable ?

Ans: FIRST.variable and LAST.variable are temporary variables created when you use BY-group processing. SAS sets FIRST.variable = 1 when it is processing the first observation in a BY group and LAST.variable = 1 when it is processing the last observation in a BY group; otherwise they are 0.

Q15. How do you remove duplicate rows from the data set below using FIRST.variable and LAST.variable?

Ans:
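The question's data set is not reproduced here, so this sketch uses hypothetical VISITS data: after sorting by all key variables, keeping only FIRST. of the innermost BY variable leaves one row per distinct combination.

```sas
data visits;   /* hypothetical data with duplicate rows */
  input id visit_date :date9.;
  format visit_date date9.;
  datalines;
1 01jan2022
1 01jan2022
2 05feb2022
3 09mar2022
3 09mar2022
3 10mar2022
;
proc sort data=visits;
  by id visit_date;
run;
data visits_nodup;
  set visits;
  by id visit_date;
  if first.visit_date;   /* keep the first row of each id/visit_date group */
run;
```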

Q16. Write a program to calculate the total salary for each department using FIRST.variable and LAST.variable.

Ans:
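A sketch under the assumption of a hypothetical EMPLOYEE data set with DEPARTMENT and SALARY: the total is reset at FIRST.department, accumulated with a sum statement (which is retained automatically), and written at LAST.department.

```sas
data employee;   /* hypothetical input */
  input department $ salary;
  datalines;
HR 50000
HR 60000
IT 70000
IT 80000
IT 90000
;
proc sort data=employee;
  by department;
run;
data dept_total(keep=department total_salary);
  set employee;
  by department;
  if first.department then total_salary = 0;  /* reset at the start of each group */
  total_salary + salary;                       /* sum statement: retained across rows */
  if last.department then output;              /* one row per department */
run;
```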
Q17. What is dictionary table concept?

Ans: Dictionary tables store metadata about the current SAS session, such as libraries, data sets, macros, indexes and options. You can also use the corresponding SASHELP views; for example, SASHELP.VTABLE corresponds to DICTIONARY.TABLES.

Q18. What type of information do DICTIONARY.TABLES and DICTIONARY.COLUMNS contain?

Ans: As the names suggest, DICTIONARY.TABLES contains information about data sets, such as the number of observations, whereas DICTIONARY.COLUMNS contains information about the columns of each data set.

Q19. What can we achieve using dictionary tables?

Ans: We can perform various operations that use metadata: for example, access all data sets in one location, find the number of observations or columns in each data set, rename data sets, or find out how many data sets contain a particular column.

Q20. Write a program to find which data sets in the SASUSER library contain an "ID" variable.

Ans:
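A minimal sketch: DICTIONARY.COLUMNS stores library and member names in uppercase, and UPCASE(name) makes the column-name match case-insensitive. The output table name ID_DATASETS is hypothetical.

```sas
proc sql;
  create table id_datasets as
  select distinct memname
    from dictionary.columns
    where libname = 'SASUSER'
      and memtype = 'DATA'
      and upcase(name) = 'ID';
quit;
```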

Q21. Write a program to add a month name to the end of the names of all data sets in the SASUSER library (if the data set name is admit, the new name should be ADMIT_MARCH).
Ans:

Q22. Write a program to create output that contains the data set name and number of observations for each data set in the SASUSER library.

Ans:
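A minimal sketch reading the NOBS column of DICTIONARY.TABLES; the output table and column aliases are hypothetical.

```sas
proc sql;
  create table sasuser_counts as
  select memname as dataset_name,
         nobs    as observations
    from dictionary.tables
    where libname = 'SASUSER' and memtype = 'DATA';
quit;
```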

Q23. What are the various ways to combine datasets in SAS?

Ans:

1. SET Statement

2. Proc Append

3. Datastep Merge

4. Proc SQL Joins

5. Proc SQL SET Operators

6. Hash Object

Q24. What is interleaving (Sandwich observations)?

Ans: Interleaving happens when a SET statement with multiple data sets is used together with a BY statement. Observations are arranged by the values of the BY variable; within each BY group, they appear in the order of the data sets in the SET statement.
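A short sketch with two hypothetical, already-sorted data sets JAN and FEB: the output is ordered by ID, and for ID 3 the JAN row comes before the FEB row because JAN is listed first.

```sas
data jan;
  input id amt;
  datalines;
1 10
3 30
;
data feb;
  input id amt;
  datalines;
2 20
3 35
;
data interleaved;
  set jan feb;
  by id;   /* both inputs must be sorted by id */
run;
```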

Q25. What is one to one reading/matching?


Ans: The process of combining observations from two or more data sets into one observation using multiple SET statements is called "one-to-one reading/matching". The number of observations in the new data set equals the number of observations in the smallest data set.

Example: data sets One and Two, and the resulting Output.
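A minimal sketch with hypothetical data sets ONE (3 rows) and TWO (2 rows): each SET statement reads the next observation from its own data set, and the step stops when the shorter one runs out, so the result has 2 rows.

```sas
data one;
  input id name $;
  datalines;
1 Ann
2 Bob
3 Cal
;
data two;
  input score;
  datalines;
90
85
;
data one_to_one;
  set one;   /* observation n of ONE ...                              */
  set two;   /* ... plus observation n of TWO; stops when TWO runs out */
run;
```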

Q26. Which is better: PROC APPEND or the SET statement?

Ans: When combining only two data sets, PROC APPEND is better, because it adds the observations from the second data set to the end of the first without processing any observations of the first data set. With a SET statement, observations from both data sets are processed. PROC APPEND also provides a fast-load option when working with databases.
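A sketch using slices of SASHELP.CLASS as hypothetical BASE and NEW_ROWS data sets: PROC APPEND writes the 3 new rows onto the end of BASE.

```sas
data base;
  set sashelp.class(obs=5);
run;
data new_rows;
  set sashelp.class(firstobs=6 obs=8);
run;

/* Adds NEW_ROWS to the end of BASE without reading BASE's observations */
proc append base=base data=new_rows;
run;
```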

Q27. What type of merging is used in below program?

Ans: One-to-one merge. It is also called a positional merge: because the program does not use a BY statement, the data sets are merged based on the position of the observations. The number of observations in the output data set equals the number in the largest data set used in the merge.

Q28. How many observations would be in the output data set of the match merge?

Ans: 5 observations.
Q29. What is the difference between Proc SQL join and datastep Merge?

Ans:

1. The MERGE statement does not produce a Cartesian product, which is one reason a DATA step merge is often faster than a PROC SQL join.
2. A DATA step merge requires the data sets to be sorted by the BY variable; a PROC SQL join does not.
3. Many-to-many joins work better in PROC SQL.

Q30. Write a program to achieve the scenario below.

Common Variable is ID

Output dataset = M_left

Ans: We need to achieve a left join using a DATA step merge. For this we use the IN= data set option, which sets the variable x or y to 1 when the corresponding data set has an observation for the current BY group.

Q31. Write a program to achieve the scenario below.

Common Variable is ID

Output dataset = M_inner

Ans: We need to achieve an inner join using a DATA step merge. For this we use the IN= data set option and check whether a given BY value is present in both data sets.

Q32. Write a program to achieve the scenario below.

Common Variable is ID
Output dataset = Left_Unique

Ans: We need to keep all observations that are present in A (the left data set) and not present in B.

Data sets A and B, and the result (Left Unique).
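A combined sketch for Q30 to Q32, assuming hypothetical data sets A and B that are already sorted by ID. One MERGE with IN=x and IN=y writes the left join, inner join and left-unique results to M_left, M_inner and Left_Unique.

```sas
data a;
  input id name $;
  datalines;
1 Ann
2 Bob
3 Cal
;
data b;
  input id dept $;
  datalines;
2 HR
3 IT
4 Ops
;
data m_left m_inner left_unique;
  merge a(in=x) b(in=y);
  by id;
  if x then output m_left;                 /* left join: every ID in A    */
  if x and y then output m_inner;          /* inner join: IDs in both     */
  if x and not y then output left_unique;  /* IDs in A but not in B       */
run;
```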

Q33. Write a program to convert the number 22754 into a date (for example, 19Apr2022).

Ans:
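A minimal sketch: 22754 is the SAS date value for 19 April 2022, so either PUT with DATE9. (character result) or a FORMAT statement (numeric value, formatted display) works.

```sas
data dates;
  num_date  = 22754;
  char_date = put(num_date, date9.);   /* character value 19APR2022 */
  sas_date  = num_date;
  format sas_date date9.;              /* same number, displayed as 19APR2022 */
run;
```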

Q34. Write a program to convert the string "19Apr2022" into a date (19/04/2022).

Date literal = "19Apr2022"d;

Ans:
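A sketch: INPUT with the DATE9. informat turns the text into a SAS date value, and the DDMMYY10. format displays it as 19/04/2022. The date literal gives the same number.

```sas
data converted;
  text     = '19Apr2022';
  sas_date = input(text, date9.);      /* character -> SAS date value 22754 */
  literal  = '19Apr2022'd;             /* date literal: same number */
  format sas_date literal ddmmyy10.;   /* displayed as 19/04/2022 */
run;
```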

Q35. Write a program to read the data below and create the EMP data set. DOJ contains dates in multiple formats.
Ans:
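The question's data is not reproduced here, so this sketch uses hypothetical rows. The ANYDTDTE. informat accepts several date layouts; for ambiguous forms such as 04/05/2022, the DATESTYLE= system option decides whether month or day comes first.

```sas
data emp;
  input emp_id doj :anydtdte20.;   /* ANYDTDTE. reads several date layouts */
  format doj date9.;
  datalines;
1 19APR2022
2 2022-04-19
3 04/19/2022
;
```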

Q36. Write a program to convert the current date into Teradata date format. For example, if today is 19apr2022, convert it into the Teradata format date '2022-04-19'.

Ans:
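A minimal sketch: YYMMDD10. writes the date with hyphens, and CATS wraps it in the Teradata literal syntax. Storing it in a macro variable (the name TD_DATE is hypothetical) makes it easy to place in a pass-through query.

```sas
data _null_;
  td_date = cats("date '", put(today(), yymmdd10.), "'");   /* e.g. date '2022-04-19' */
  call symputx('td_date', td_date);
run;
%put &=td_date;
```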

Q36. Write a program to extract the full month name from a numeric date (for example, 22754).

Ans:
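A sketch using the MONNAME. format; STRIP removes the padding the format adds.

```sas
data month_name;
  d          = 22754;                     /* 19APR2022 */
  month_full = strip(put(d, monname.));   /* full month name: April */
run;
```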

Q37. What is PROC SQL Pass through?


Ans: Pass-through is a PROC SQL facility that is used to communicate with a database using native SQL statements without leaving the current SAS session. The SQL statements are passed directly to the database. SAS has two types of pass-through:

1. Implicit Pass-Through
2. Explicit Pass-Through

Q38. What is Implicit Pass-through?

Ans: When we use a LIBNAME statement to access a database, the SAS/ACCESS engine converts SAS statements into database-specific SQL statements; this is called implicit pass-through.

Q39. What is Explicit Pass-through?

Ans: When we directly write native SQL statements of a specific database, it is called explicit pass-through. Here the SAS/ACCESS engine does not convert SAS statements into a native database query; the SQL query is processed by the database engine itself. Example:

The query above returns all records from the client_order table and creates a New_order table in SAS. However, if you want to create a table inside the Teradata database, you need to use an EXECUTE statement, as shown below:

Q40. Write a program to read the Excel file below using explicit pass-through.

Ans:
Q41. Write a program to read an MS Access file using explicit pass-through.

File Name = avg_salary.accdb


Table Name = salary

Ans:

Q42. What is the SASTRACE option?

Ans: The SASTRACE= system option writes messages to the SAS log when you access external data through SAS/ACCESS, for example the SQL statements that SAS sends to the database. Its value controls what is traced.
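A typical setting, shown as a configuration fragment: ',,,d' traces the SQL sent to the DBMS, SASTRACELOC=SASLOG routes it to the log, and NOSTSUFFIX shortens the messages.

```sas
/* Trace the SQL that SAS/ACCESS sends to the DBMS and write it to the SAS log */
options sastrace=',,,d' sastraceloc=saslog nostsuffix;

/* ... run the PROC SQL or DATA step that reads the database here ... */

options sastrace=off;   /* turn tracing off again */
```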

Q43. What is format and informat and what is the difference between both?

Ans: Formats and informats are predefined patterns for interpreting or displaying data values. Informats are applied at input (reading) time, for example when reading data from a raw data file or instream data. Formats are applied when values are written or displayed, for example in an output data set or a report. Formats and informats are of two types:

1. System-defined
2. User-defined
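A small sketch showing both directions: an informat (COMMA5.) reads text into a number, and a format (DOLLAR8.) writes a number as text.

```sas
data fmt_demo;
  amount = input('1,234', comma5.);   /* informat: reads the text 1,234 as the number 1234 */
  shown  = put(amount, dollar8.);     /* format: writes 1234 as the text $1,234 */
run;
```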

Q44. Write a program to create a user-defined format that converts numeric gender values into full gender names in the emp data set below.
Ans:

Q45. What is a dynamic format?

Ans: You can create a format from a SAS data set by using the CNTLIN= option in PROC FORMAT. The input data set must contain at least three variables, FMTNAME, START and LABEL; END is optional:

1. FMTNAME is the name of the informat or format,
2. START gives the range's starting value,
3. END gives the range's end value (if it is omitted, it equals START),
4. LABEL is a character variable whose value is the informatted or formatted value.
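A minimal sketch with a hypothetical control data set: FMTNAME, START and LABEL are present, TYPE='C' makes it a character format ($DEPTFMT.), and END is omitted.

```sas
data dept_fmt;
  retain fmtname 'deptfmt' type 'C';   /* TYPE='C': character format $DEPTFMT. */
  input start $ 1-2 label $ 4-30;
  datalines;
HR Human Resources
IT Information Technology
;
proc format cntlin=dept_fmt;
run;
```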

Q46. What is the use of a single hyphen (-)?

Ans: A single hyphen (-) specifies a numbered range of variables; for example, x1-x3 means x1, x2 and x3.

Q47. What is the use of a double hyphen (--)?

Ans: A double hyphen (--) specifies all variables between two named variables, in the order in which they appear in the data set.

Q48. What is the use of a colon (:)?

Ans: A colon (:) specifies all variables whose names begin with a common prefix.
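A sketch of all three variable lists using SASHELP.CLASS (column order: Name Sex Age Height Weight). X1-X3 and TOTAL_X are hypothetical variables added for the numbered range.

```sas
data var_lists;
  set sashelp.class;
  x1 = 1; x2 = 2; x3 = 3;
  total_x = sum(of x1-x3);     /* single hyphen: numbered range x1, x2, x3 */
  keep name--age h: total_x;   /* double hyphen: Name through Age; colon: Height */
run;
```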

Q49. Add +- SIGN in below Numbers as per below output.

Input = output=

Ans:

Q50. Add Dash – in below Credit Card Numbers as per output.

Input= Output=

Ans:
Q51. Find the errors in the program below.

Ans:

Q52. How many observations will there be in data set audi_cars?

Ans: Data set audi_cars will have 0 observations and all the columns of sashelp.cars, along with the new column Brand. The program copies only the descriptor portion of sashelp.cars into audi_cars.

Q53. How to compare two SAS datasets?

Ans: Proc Compare procedure is used to compare two SAS datasets.
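A minimal sketch: a hypothetical copy of SASHELP.CLASS with one changed value, compared against the original. PROC COMPARE reports the difference and sets the SYSINFO macro variable to a nonzero code.

```sas
data class_copy;
  set sashelp.class;
  if name = 'Alfred' then age = 15;   /* introduce one difference */
run;

proc compare base=sashelp.class compare=class_copy;
run;
```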

Q54. What output is generated if the input data set cars has 100 observations?

data one (obs=10);
set sashelp.cars(firstobs=10);
run;
Ans: There will be a warning, because the OBS= option does not work with an output data set. However, data set one will still be created and will have 91 observations (observations 10 through 100).

Q55. What will be the output?

data subset;
age=20;
where age> 12;
run;

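Ans: No output; the code gives an error, because a WHERE statement can only filter on variables read from an input data set, not on variables created in the DATA step (here there is no SET statement at all).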
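The usual fix is a subsetting IF, which works on variables created in the step. A minimal sketch:

```sas
data subset;
  age = 20;
  if age > 12;   /* subsetting IF: evaluated during execution, on PDV values */
run;
```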

Q56. How many observations will the dataset employee contain?

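Data employee(drop = i);
val = 20;
do i = 1 to 20;
val+1;
end;
output;
run;

Ans: One. The explicit OUTPUT statement runs once, after the loop (when val = 40), and an explicit OUTPUT statement turns off the automatic output at the end of the step.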

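Q57. How do you remove duplicate values from data?

Ans: Use PROC SORT with the NODUPKEY option (duplicate BY values) or the NODUP/NODUPRECS option (duplicate records), or SELECT DISTINCT in PROC SQL.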
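A sketch of the two most common approaches, using SASHELP.CLASS and hypothetical output names: NODUPKEY keeps one observation per BY value, and SELECT DISTINCT returns the distinct values.

```sas
/* Keep one observation per SEX value */
proc sort data=sashelp.class out=one_per_sex nodupkey;
  by sex;
run;

/* PROC SQL alternative */
proc sql;
  create table distinct_sex as
  select distinct sex from sashelp.class;
quit;
```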

Q58. How does SAS store dates?

Ans: SAS stores a date as the number of days since January 1, 1960, which is day 0; earlier dates are negative.
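A quick check of the stored values with date literals:

```sas
data date_values;
  day0   = '01jan1960'd;   /* 0 */
  before = '31dec1959'd;   /* -1: earlier dates are negative */
  d      = '19apr2022'd;   /* 22754 */
run;
```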

Q59. Data set X has 4 observations. How many observations will data set Y have?

Data Y;
set X;
output;
output;
run;

Ans: 8

Q60. Which proc is often used to turn columns into rows?

Ans: Proc Transpose

Q61. Which statement prevents a variable's value from being reset to missing between DATA step iterations?

Ans: RETAIN

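Q62. What date function would you use to calculate a person's age if your data has a birth date column?

Ans: age = intck('YEAR', date1, date2, 'C');
The 'C' (continuous) method counts only complete years; without it, INTCK counts year boundaries crossed, which overstates the age before the birthday.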
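A sketch with hypothetical dates chosen to show the difference: one day after the turn of the year, the default method already counts 22 years, while the continuous method and YRDIF with the 'AGE' basis give 21.

```sas
data age_check;
  dob  = '31dec2000'd;
  asof = '01jan2022'd;
  discrete   = intck('year', dob, asof);        /* 22: counts Jan 1 boundaries crossed */
  continuous = intck('year', dob, asof, 'c');   /* 21: complete years only */
  age_yrdif  = floor(yrdif(dob, asof, 'AGE'));  /* 21 */
  format dob asof date9.;
run;
```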

Q63. What is the use of the YEARCUTOFF= option?

Ans: It specifies the first year of the 100-year span that SAS uses to interpret two-digit years.
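A short sketch: with YEARCUTOFF=1950 the span is 1950 to 2049, so a two-digit year 49 becomes 2049 and 50 becomes 1950.

```sas
options yearcutoff=1950;              /* 100-year span: 1950-2049 */

data two_digit;
  d1 = input('01/01/49', mmddyy8.);   /* year 49 -> 2049 */
  d2 = input('01/01/50', mmddyy8.);   /* year 50 -> 1950 */
  format d1 d2 date9.;
run;
```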
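Q64. Write a program so that missing values of all numeric variables are displayed as the letter X.

Ans: options missing=X;
The MISSING= system option changes only how missing numeric values are displayed; a numeric variable cannot store the letter X.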

Q65. Use the Risk_scores data set to create the Risk_Score_1 data set, copying only the last observation of Risk_scores.

Ans:

data risk_score_1;
set risk_scores end=x;
if x=1;
run;

Q66. Create the Department_salary data set from the Employee data set; the output data set should have the total salary for each department.

Ans:

Proc sort data=employee;
by department;
run;

Data Department_salary;
set employee;
by department;
If first.department=1 then total_sal=salary;
else total_sal+salary;
if last.department;
run;

Q67. What will be the output of the program below?

Dataset A

Dataset B
Ans:

Q68. What will happen if you omit the BY statement while using MERGE in the code below?

Ans: It will be a positional (one-to-one) merge.

Q69. How do you remove a format from a variable?

Ans: Use a FORMAT statement that lists the variable name without a format, for example: format salary;

Q70. Is there a way to use a data set to create formats from all the observations in the data set?

Ans: Yes, use the CNTLIN= option of PROC FORMAT (see Q45).

SAS MACRO
Q71. What are the various ways to create macro variables in SAS?

Ans: You can create macro variables using below methods:

1. %Let statement

2. %Global statement

3. %Local statement

4. INTO clause in PROC SQL

5. CALL SYMPUT and CALL SYMPUTX routines

6. %DO loop (the index variable)

7. Macro parameters (passed when the macro is called)
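A sketch of three of these methods side by side (the macro variable names are hypothetical): %LET, the INTO clause, and CALL SYMPUTX.

```sas
%let course = SAS;                   /* 1. %LET */

proc sql noprint;                    /* 4. INTO clause */
  select max(age) into :max_age trimmed
    from sashelp.class;
quit;

data _null_;                         /* 5. CALL SYMPUTX */
  if 0 then set sashelp.class nobs=n;
  call symputx('n_students', n);
  stop;
run;

%put &=course &=max_age &=n_students;
```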

Q72. Create a macro variable that counts the number of observations in a data set.

Ans:
proc sql noprint;
select count(*) into :nrows trimmed from sashelp.class;
quit;
%put Number of observations: &nrows;

Q73. Difference between %LOCAL and %GLOBAL?

Ans: %LOCAL creates a local macro variable during macro execution; it is removed when the macro finishes processing.
%GLOBAL creates a global macro variable, which remains accessible until the end of the session, when it is removed.
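A minimal sketch with a hypothetical macro SCOPE_DEMO: INNER disappears when the macro ends, while OUTER survives in the global symbol table.

```sas
%macro scope_demo;
  %local inner;
  %let inner = only inside the macro;
  %global outer;
  %let outer = still here after the macro;
%mend scope_demo;

%scope_demo
%put &=outer;
%put INNER exists? %symexist(inner);   /* 0: removed when the macro ended */
```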

Q74. Difference between %STR and %NRSTR?

Ans: Both %STR and %NRSTR functions are macro quoting functions which are used to hide the
normal meaning of special tokens and other comparison and logical operators so that they appear as
constant text. The only difference is %NRSTR can mask the macro triggers ( %, &) whereas %STR
cannot.
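A short sketch (macro variable names are hypothetical): %STR hides semicolons so they become part of the value, and %NRSTR additionally hides the ampersand, so &sysdate is stored as text instead of being resolved.

```sas
%let with_str   = %str(proc print; run;);   /* semicolons are part of the value */
%let with_nrstr = %nrstr(&sysdate);         /* & is masked: not resolved */
%put &=with_str;
%put &=with_nrstr;
```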

Q75. Are the statements below correct or incorrect?

Ans: Correct. %substr() can be used in open code.

Q76. What is the value of macro variable X inside and outside the macro when using a %put statement?

Ans: 20

Q77. What value would be printed in the log?

Ans: z=20+30
Q78. How can we make z resolve to the value 50 in the question above?

Ans: By using %eval() function.

Q79. What if the values in the question above have decimals?

Ans: In this case, use %sysevalf(), because %eval() performs integer arithmetic only.
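A sketch covering Q77 to Q79, assuming values of 20 and 30 for the macro variables: a plain %LET stores the text 20+30, %EVAL does integer arithmetic, and %SYSEVALF handles decimals.

```sas
%let x = 20;
%let y = 30;
%let z     = &x+&y;              /* plain text: 20+30 */
%let z_int = %eval(&x+&y);       /* integer arithmetic: 50 */
%let z_dec = %sysevalf(2.5+&y);  /* decimal arithmetic: 32.5 */
%put &=z &=z_int &=z_dec;
```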

Q80. How can we call a DATA step function in a SAS macro?

Ans: Using %sysfunc().
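Two small examples: an optional format as the second argument of %SYSFUNC, and a character argument passed without quotes (the macro variable name is hypothetical).

```sas
%put Today is %sysfunc(today(), date9.);   /* format as optional 2nd argument */
%let n_chars = %sysfunc(length(Hello));    /* character arguments are not quoted */
%put &=n_chars;
```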

Q81. What is difference between positional parameters and keyword parameters?

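Ans: Positional parameters are matched by position, so their order is fixed. Keyword parameters are passed as name=value, can be given in any order, and can have default values. When both are used, positional parameters must come first.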
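A sketch with a hypothetical macro PRINT_ROWS: DSN and N are positional, VARS= is a keyword parameter with a default value.

```sas
%macro print_rows(dsn, n, vars=_all_);   /* positional first, then keyword */
  proc print data=&dsn(obs=&n);
    var &vars;
  run;
%mend print_rows;

%print_rows(sashelp.class, 3)
%print_rows(sashelp.class, 5, vars=name age)
```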

Q82. What is the difference between Call Symput and call Symputx?

Ans: Both create macro variables in a DATA step. CALL SYMPUTX additionally removes leading and trailing blanks, converts numeric values to character without a log note, and accepts an optional third argument to create the variable in the local or global symbol table.
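A minimal sketch (macro variable names are hypothetical) showing the blank handling and the scope argument:

```sas
data _null_;
  call symput ('with_symput',  '  padded  ');   /* keeps the blanks */
  call symputx('with_symputx', '  padded  ');   /* strips leading/trailing blanks */
  call symputx('g_answer', 42, 'G');            /* numeric converted; G = global table */
run;
%put [&with_symput] [&with_symputx] [&g_answer];
```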

Q83. Write a program to print all the macro variables in the SAS log.

Ans: Use the statement below to print all automatic and user-defined macro variables in the SAS log.
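The %PUT statement accepts these special keywords:

```sas
%put _all_;         /* all macro variables: automatic and user-defined */
%put _automatic_;   /* automatic macro variables only */
%put _user_;        /* user-defined macro variables only */
```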

Q84. How can you remove a macro variable?

Ans: The %SYMDEL statement deletes a macro variable from the global symbol table. In the example below, macro variable X will be deleted. Note: specify the name without an ampersand (&).
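A minimal sketch:

```sas
%let x = 10;
%symdel x;                     /* the name, without & */
%put X exists? %symexist(x);   /* 0 */
```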

Q85. How to debug a SAS macro?

Ans:

1. Symbolgen/nosymbolgen

2. Mprint/nomprint

3. Mlogic/nomlogic

Q86. What is the use of Symbolgen?

Ans: It writes the resolved value of each macro variable reference to the log.

Q87. What is the use of MPrint?

Ans: It displays the SAS statements generated by macro execution, i.e. the code that the macro sends to the SAS compiler.

Q88. What is the use of Mlogic?

Ans: The MLOGIC option traces macro execution step by step. It shows when a macro begins and ends execution, the values of macro parameters, and the results of %IF conditions and %DO loop iterations.
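A sketch with a hypothetical macro SHOW: with all three options on, the log shows the resolution of &dsn (SYMBOLGEN), the %IF result (MLOGIC), and the generated PROC PRINT code (MPRINT).

```sas
options mprint mlogic symbolgen;          /* turn macro tracing on */

%macro show(dsn);
  %if %sysfunc(exist(&dsn)) %then %do;
    proc print data=&dsn(obs=2);
    run;
  %end;
%mend show;

%show(sashelp.class)

options nomprint nomlogic nosymbolgen;    /* and off again */
```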

Q89. How many ampersands (&) are required in the %put statement below?


%put ?c;
Ans: 7 ampersands (&).
Formula: 2^n - 1, where n is the number of variables.
Total variables here are 3 (a, b, c), hence:
2^3 - 1 = 2*2*2 - 1 = 8 - 1 = 7
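A sketch of a three-variable chain (the values are hypothetical): each rescan halves the ampersands, so seven of them resolve c, then b, then a.

```sas
%let a = Hello;
%let b = a;
%let c = b;
%let result = &&&&&&&c;   /* pass 1: &&&b, pass 2: &a, pass 3: Hello */
%put &=result;
```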
Q90. What is the use of %include statement?

Ans: We can save macro programs permanently in an external file and access them with the %include statement. The macro is compiled when the %include statement is submitted, and it can then be called at any time during the SAS session.

Q91. How do you print macro definitions in the log when a macro is accessed with %include?

Ans: Use the SOURCE2 system option.
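A self-contained sketch: a temporary file stands in for a real macro library (the fileref MYMACS and macro HELLO are hypothetical). With SOURCE2 on, the included lines are echoed to the log.

```sas
filename mymacs temp;   /* temporary file standing in for a macro library */
data _null_;
  file mymacs;
  put '%macro hello; %put Hello from an included macro; %mend hello;';
run;

options source2;        /* echo %INCLUDEd code in the log */
%include mymacs;        /* compiles the macro definition */
%hello
```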

PROC SQL
Q92. Find the third highest MSRP in sashelp.cars.

Answer1

Answer2
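One possible approach, using nested subqueries: each level takes the maximum below the previous maximum, which yields the third highest distinct MSRP. The output names are hypothetical.

```sas
proc sql;
  create table third_highest as
  select max(msrp) as third_msrp format=dollar10.
    from sashelp.cars
    where msrp < (select max(msrp) from sashelp.cars
                  where msrp < (select max(msrp) from sashelp.cars));
quit;
```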

Q93. In the query below, what is the use of the GROUP BY clause?

Ans: The query uses a GROUP BY clause but no summary (aggregate) function, so the GROUP BY clause is converted to an ORDER BY clause and the result is sorted in ascending order of the variable make.

Q94. In the query below, what is the effect of a summary function without a GROUP BY clause?

Ans: The query uses a summary (aggregate) function but no GROUP BY clause, so the entire table is treated as one group and Total_sales holds the sum of all MSRP values.
Q95. What is inline view?

Ans: An in-line view is a nested query that is specified in the outer query’s FROM clause.
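A sketch: the in-line view computes the average MSRP per origin, and the outer query filters on it. The output table name and the 30000 threshold are hypothetical.

```sas
proc sql;
  create table pricey_origins as
  select origin, avg_msrp format=dollar10.
    from (select origin, mean(msrp) as avg_msrp   /* in-line view */
            from sashelp.cars
            group by origin)
    where avg_msrp > 30000;
quit;
```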

Q96. What is a SAS view and how do you create one using PROC SQL?

Ans: A PROC SQL view is a stored query that is executed when you use the view in a SAS procedure or
DATA step. The view contains only the logic for accessing the data, not the data itself.
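A minimal sketch with a hypothetical view name: CREATE VIEW stores only the query, which runs each time the view is read.

```sas
proc sql;
  create view work.big_cars as   /* stores the query, not the rows */
    select make, model, msrp
      from sashelp.cars
      where msrp > 50000;
quit;

proc print data=work.big_cars(obs=5);   /* the query runs now */
run;
```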

Q97. What are the benefits of a Proc SQL View?

Ans:

1. Save space (because SAS view is actually a query not physical data)
2. Input data sets are always current because data is derived from tables at
execution time.
3. Hide complex joins or queries from users.

Q98. What is the use of calculated keyword?

Ans: The CALCULATED keyword is used when we refer to a derived (calculated) column, defined earlier in the SELECT clause, in the SELECT or WHERE clause.
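A short sketch (the output table name and threshold are hypothetical): without CALCULATED, the WHERE clause would look for a RATIO column in SASHELP.CLASS and fail.

```sas
proc sql;
  create table heavy_for_height as
  select name, weight / height as ratio format=6.2
    from sashelp.class
    where calculated ratio > 1.5;   /* refers to the alias defined in SELECT */
quit;
```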

Q99. Data step merge and Proc SQL join which one is better.

Ans: It depends on the situation: in some cases a DATA step merge is better, and in others a PROC SQL join is better. Some considerations:

1. PROC SQL requires more resources than a DATA step MERGE for simple joins.
2. PROC SQL can join a maximum of 256 tables in a query, whereas a DATA step merge can use any number of data sets.
3. For a DATA step merge, the data sets must be sorted by the BY variable; PROC SQL does not require this.
4. In a DATA step merge you can write complex logic using DO loops, arrays, etc.
5. PROC SQL performs better in many-to-many joins, where a DATA step merge gives unexpected results.
Q100. What is Cartesian Product?

Ans: In a Cartesian product, each row in the first table is combined with every row in the second table. Conceptually, every join starts from the Cartesian product and then applies the join (subsetting) condition, and the size of a Cartesian product (rows in the first table times rows in the second) can be a problem.

Q101. Write a query to display a Cartesian product as output.


Ans:
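A minimal sketch with two hypothetical 3-row tables: listing both tables in FROM with no join condition returns 3 x 3 = 9 rows.

```sas
data one;
  input x;
  datalines;
1
2
3
;
data two;
  input y $;
  datalines;
a
b
c
;
proc sql;
  create table cartesian as
  select * from one, two;   /* no join condition: 3 x 3 = 9 rows */
quit;
```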

Q102. How do you limit the number of rows displayed by PROC SQL?

Ans: Apart from WHERE or HAVING, you can use the following:
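A sketch of the OUTOBS= and INOBS= options of the PROC SQL statement (the output table name is hypothetical):

```sas
proc sql outobs=5;     /* keep at most 5 rows in the result */
  create table oldest_five as
  select name, age from sashelp.class
    order by age desc;
quit;

proc sql inobs=5;      /* read at most 5 rows from each source table */
  select name, age from sashelp.class;
quit;
```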

Q103. How do you check whether a complex PROC SQL query has syntax errors?

Ans: Use the VALIDATE statement.
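A minimal sketch: VALIDATE checks the query's syntax and writes a note to the log without running it.

```sas
proc sql;
  validate
  select name, age
    from sashelp.class
    where age > 12;
quit;
```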

Q104. Which is preferred: PROC SQL or the DATA step?

Ans: PROC SQL is very convenient for table joins compared to a DATA step merge, because it does not require the key columns to be sorted before the join. A DATA step is more suitable for sequential, observation-by-observation processing.

PROC SQL can save a great deal of time if you want to select and modify variables, apply formats, and create new variables or macro variables while also subsetting the data. PROC SQL offers great flexibility for joining tables.

Q105. What will be the output?

Proc sql;
(Select * from A except
Select * from B)
Union
(Select * from B Except
Select * from A);
quit;
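Ans: The query returns the rows that are in A but not in B, together with the rows that are in B but not in A:
C Joshua
D Laurna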

Q106. Suppose you have employee data containing each employee's name, ID and manager ID. You need to find the manager's name for each employee ID.

Note: this is a self-join question.

Input

Output

Ans:

proc sql;
create table want as
select a.*, b.Name as Manager
from example2 as a left join example2 as b
on a.managerid = b.id;
quit;

Q107. Create a blank table with just a structure of another table?


Ans: PROC SQL;
CREATE TABLE a LIKE b;
QUIT;

Q108. How do you check whether a data set is empty, or how many rows it has?

Ans: Select Count(*) as total_observation from table_name;

Q109. Select the Second Highest Score with PROC SQL

Ans: proc sql;
select * from example5
where score in (select max(score) from example5
                where score not in (select max(score) from example5));
quit;

Q110. How many observations will there be if we produce a Cartesian product of the data sets below?

Table One

Table Two

Ans: 3*3=9

Q111. Looking at the X and Y data sets below, how many rows will each join return?

X data set has 15 observations.

Y data set has 7 observations.

5 observations are common to both.

How many observations will each of these joins return?

i. Left Join
ii. Right Join
iii. Inner Join
iv. Full Join
Ans (assuming each key value appears at most once in each data set): Left join = 15, right join = 7, inner join = 5, and full join = 15 + 7 - 5 = 17 observations.

Q112. What happens if you specify a GROUP BY clause in a query whose SELECT clause does not contain a summary function?

Ans: The GROUP BY clause is converted to an ORDER BY clause.

Q113. What will happen if a query contains a summary function in the SELECT clause but no GROUP BY clause?

Ans: The entire table becomes one group.


Q114. How can you find duplicate records without using distinct keyword in Proc SQL?

Ans:

Proc SQL;
select Account_number , count(*) as num from account_header
group by Account_number
having num>1;
Quit;

Q115. Write a query to find All unique observations from table A, as shown below:

Ans:

Q116. Write a query to find All unique observations from both table A and B, as shown below:

Ans:
SAS Functions

Q117. What is the use of Coalesce function?

Ans: The COALESCE function returns the first non-missing value among its arguments.
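A small sketch: COALESCE works on numeric arguments, and COALESCEC is the character counterpart.

```sas
data coalesce_demo;
  a = .; b = 5; c = 3;
  first_num  = coalesce(a, b, c);         /* 5 */
  first_char = coalescec('', 'x', 'y');   /* x: COALESCEC for character values */
run;
```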

Q118. What will be the length of the variable Message in the program below?

Ans: The length is 200, because the CAT function returns a value with a default length of 200 when the variable's length has not been specified previously.
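A sketch contrasting a variable declared with LENGTH before the CAT call and one that is not (variable names are hypothetical):

```sas
data msg;
  length message $40;                         /* declared first: length 40 */
  message     = cat('Hello', ' ', 'World');
  default_msg = cat('Hello', ' ', 'World');   /* not declared: length 200 */
run;
```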

Q119. Difference between the SCAN() and SUBSTR() functions.

Ans:
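SCAN extracts the nth word based on delimiters, while SUBSTR extracts characters by start position and length. A sketch where both return the same word from different specifications:

```sas
data scan_substr;
  text = 'SAS Base Programming';
  second_word = scan(text, 2, ' ');   /* by word: Base */
  chars_5_8   = substr(text, 5, 4);   /* by position: 4 characters from column 5 = Base */
run;
```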

Q120. Difference between the FIND() and INDEX() functions?

Ans:
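Both return the position of a substring, or 0 if it is not found. FIND also accepts modifiers (such as 'i' to ignore case) and a start position; INDEX does not. A sketch:

```sas
data find_index;
  text = 'Learning SAS';
  pos_index     = index(text, 'sas');      /* 0: INDEX is case-sensitive */
  pos_find      = find(text, 'sas', 'i');  /* 10: 'i' modifier ignores case */
  pos_find_from = find(text, 'n', 6);      /* 7: search starts at position 6 */
run;
```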
Q121. What is the use of INTNX() function?

Ans: It increments or decrements a date, time or datetime value by a given number of intervals, such as months or years. Example:
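A sketch of the alignment argument: by default the result is aligned to the beginning of the interval; 'same' and 'end' change that.

```sas
data intnx_demo;
  d = '19apr2022'd;
  next_month_start = intnx('month', d, 1);          /* 01MAY2022: aligned to the beginning */
  next_month_same  = intnx('month', d, 1, 'same');  /* 19MAY2022 */
  last_year_end    = intnx('year', d, -1, 'end');   /* 31DEC2021 */
  format next_month_start next_month_same last_year_end date9.;
run;
```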

Q122. What is the use of Intck function?

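Ans: The INTCK function returns the number of intervals (for example WEEK, MONTH or YEAR) between two dates. By default it counts the interval boundaries crossed; with the 'C' (continuous) method it counts only complete intervals, which is what an age calculation needs:

data Age;
dob='13oct1981'd;
age_years=intck('year',dob,today(),'c');
run;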

Q123. Difference between INTCK() and INTNX() functions?

Ans:
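In short, INTNX takes a date and returns a date moved by n intervals, while INTCK takes two dates and returns a count of intervals. A sketch showing that they are inverses here:

```sas
data intck_intnx;
  start = '19apr2022'd;
  later = intnx('month', start, 3, 'same');       /* INTNX: date -> date (19JUL2022) */
  months_between = intck('month', start, later);  /* INTCK: two dates -> count (3) */
  format start later date9.;
run;
```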

Q124. Write a program to set the value to missing (blank) for all character variables whose values are either '?' or 'NA'.

Ans:
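A sketch on a hypothetical SURVEY data set: an array over _CHARACTER_ (all character variables defined so far) lets one loop check every column.

```sas
data survey;   /* hypothetical input */
  input name $ answer1 $ answer2 $;
  datalines;
Ann ? yes
Bob NA no
;
data survey_clean;
  set survey;
  array chars{*} _character_;   /* every character variable read so far */
  do _i = 1 to dim(chars);
    if strip(chars{_i}) in ('?', 'NA') then chars{_i} = ' ';
  end;
  drop _i;
run;
```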
Q125. Write a program to find the value 100 in all numeric variables and replace it with 999.

Ans:
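A sketch on a hypothetical SCORES data set: _NUMERIC_ covers the numeric variables defined before the ARRAY statement, so the loop index _I, created later, is not part of the array.

```sas
data scores;   /* hypothetical input */
  input id score1 score2;
  datalines;
1 100 90
2 80 100
;
data scores_fixed;
  set scores;
  array nums{*} _numeric_;   /* numeric variables defined before this statement */
  do _i = 1 to dim(nums);
    if nums{_i} = 100 then nums{_i} = 999;
  end;
  drop _i;
run;
```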

Q126. Is the ARRAY statement a compile-time or an execution-time statement?

Ans: Compile Time.

Q127. What function is used to count array elements?

Ans: Dim() function.

Q128. What type of array should be used for calculations only, when its elements are not needed in the output?

Ans: _temporary_
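A sketch with hypothetical weights: a _TEMPORARY_ array holds constants for the calculation but creates no variables, so nothing like W1-W3 appears in the output.

```sas
data weighted;
  set sashelp.class;
  array w{3} _temporary_ (0.5 0.3 0.2);   /* constants only: no W1-W3 variables */
  score = w{1}*age + w{2}*height + w{3}*weight;
run;
```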
