Are you preparing for your SAS interview and wondering how you can crack it with confidence? This article includes the most commonly asked SAS interview questions covering various topics. This blog is the perfect guide for you to learn all the basic as well as advanced concepts required to clear a SAS interview. For your convenience, we have divided this SAS Interview Questions blog into the following:
- Basic Level SAS Interview Questions
- Intermediate Level SAS Interview Questions
- Advanced Level SAS Interview Questions
Basic Level SAS Interview Questions
Q1. What is SAS? What are its benefits?
Ans. SAS (originally short for Statistical Analysis System) is an integrated software suite used for data analytics, business intelligence, predictive analytics, and data management. It is developed by the SAS Institute. It provides a graphical point-and-click user interface for new users, and many organizations choose it alongside or instead of R and Python. It is easy to learn and also offers PROC SQL as a familiar option for those who have prior knowledge of SQL. Its main benefits are:
- SAS Syntax is easy to learn.
- It can handle a large database easily.
- SAS is a closed source language that comes with thoroughly tested algorithms.
- It is a comprehensible language and is easy to debug.
- SAS has a great Graphical User Interface (GUI) that has various tools like graphs, plots, and a highly versatile library.
- It is the market leader in corporate jobs and offers huge job prospects.
Q2. What are the features of SAS?
Ans. Following are the features of SAS:
- Strong Data Analysis Abilities
- Flexible Fourth-Generation Programming Language (4GL)
- SAS Studio
- Support for Various Types of Data Format
- Report Output Format
- Data Encryption Algorithm
Q3. Explain some capabilities of the SAS framework.
Ans. Following are some capabilities of the SAS framework:
- Access: SAS allows one to access data from various sources such as Excel files, SAS data sets, Oracle databases, and more.
- Manage: It allows us to subset data, create variables, and clean and validate data. SAS manages the existing data to provide the data that you need.
- Analyze: After the data has been managed, SAS can analyze it, from simple evaluations such as frequencies and averages to complex analyses. Its statistical techniques range from descriptive measures such as correlations, through logistic regression and mixed models, to advanced methods such as Bayesian hierarchical models and modern model selection.
- Present: It helps to present the data in the form of a list, graphic report, or summary. It can present the results of the analysis as a report in multiple formats such as HTML, RTF, and PDF. We can print this report, publish it online, or write it to data files.
Q4. Explain the basic structure of a Base SAS program.
Ans. A Base SAS program consists of:
- DATA step, which reads and manipulates data.
- PROC step, which analyzes and reports on the data.
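As a minimal sketch of this two-part structure, the program below pairs one DATA step with one PROC step. The data set name `work.scores` and its variables are made up for illustration.

```sas
/* DATA step: read raw values and derive a new variable */
data work.scores;
    input name $ math science;
    total = math + science;
    datalines;
Asha 78 85
Ben 64 70
Cara 91 88
;
run;

/* PROC step: summarize the data set created above */
proc means data=work.scores;
    var math science total;
run;
```

Each step ends with a RUN statement, and the PROC step only reads what the DATA step has already written.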
Q5. What are some key concepts of SAS?
Ans. Following are some of the key concepts of SAS:
- KEEP=, DROP= dataset options
- Data step logic
- Automatic reset to missing, and the RETAIN statement
- FORMAT procedure for creating value formats
- IN= dataset option
Q6. If a variable contains only numbers, can it be a character data type?
Ans. Yes, it depends on how the variable is used. Some numbers are used as a categorical value rather than a quantity.
Example: The ID column of a table may consist only of digits, but it does not represent a quantity.
Q7. How to minimize the number of decimal places for the variable using PROC MEANS?
Ans. You can limit the decimal places by using the MAXDEC= option on the PROC MEANS statement. Set it to the number of decimal places that you want displayed.
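For illustration, a short sketch using the SASHELP.CLASS sample data set that ships with SAS (the choice of data set and variables is an assumption made for the example):

```sas
/* Display the statistics with at most two decimal places */
proc means data=sashelp.class maxdec=2;
    var height weight;
run;
```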
Q8. Mention the default statistics that PROC MEANS produces.
Ans. Following are the default statistics produced by PROC MEANS:
- N
- MEAN
- STD DEV
- MINIMUM
- MAXIMUM
Q9. What is the length assigned to the target variable by the Scan function?
Ans. The SCAN function returns a given word from a character string, using either the default delimiters or delimiters that you specify. If the target variable has not already been given a length in the DATA step, the length assigned to it by the SCAN function is 200.
Q10. Explain the use of the TRANWRD function.
Ans. The TRANWRD function performs search and replace within a character string. It replaces all occurrences of a given substring with another string. It does not remove trailing blanks in the target string or the replacement string.
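A small sketch of TRANWRD in use; the variable names `text` and `fixed` and the sample string are invented for the example:

```sas
data _null_;
    length text fixed $40;
    text  = 'Mrs. Joan Smith and Mrs. Ann Lee';
    /* Replace every occurrence of 'Mrs.' with 'Ms.' */
    fixed = tranwrd(text, 'Mrs.', 'Ms.');
    put fixed=;   /* writes the result to the SAS log */
run;
```

Declaring the length of `fixed` up front avoids relying on the default length that SAS would otherwise assign to the result.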
Intermediate Level SAS Interview Questions
Q11. Mention the methods to perform a “table lookup” in SAS.
Ans. Following are the five methods to perform “table lookup” in SAS:
- Match Merging
- Format Tables
- Direct Access
- Arrays
- PROC SQL
Q12. What are the most common programming errors that occur in SAS?
Ans. The most common programming errors in SAS are:
- Missing semicolon
- Not checking log after submitting program
- Unmatched quotation marks
- Invalid dataset option
- Invalid statement option
- Not using FSVIEW option vigorously
- Not using debugging techniques
Q13. What is the difference between DO WHILE and DO UNTIL?
Ans. The DO WHILE expression is evaluated at the top of the DO loop. If the expression is false the first time it is evaluated, the loop does not execute even once. The DO UNTIL expression, on the other hand, is evaluated at the bottom of the loop, so the loop always executes at least once.
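The difference can be seen in a small sketch like the one below, where the starting value of `n` is chosen so that both conditions are already decided before the loops begin (the variable and messages are illustrative):

```sas
data _null_;
    n = 10;

    /* Condition checked at the top: n < 5 is false, so the body never runs */
    do while (n < 5);
        put 'DO WHILE body ran, n=' n;
        n = n + 1;
    end;

    /* Condition checked at the bottom: the body runs once before the test */
    do until (n >= 5);
        put 'DO UNTIL body ran, n=' n;
        n = n + 1;
    end;
run;
```

Only the DO UNTIL message should appear in the log, once.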
Q14. When looking for data contained in a character string of 150 bytes, which function is the best to locate that data: scan, index, or indexc?
Ans. Index function – It searches a character expression for a string of characters.
Q15. Write code using PROC SORT on a data set containing State, District, and County as the primary variables, along with several numeric variables.
Ans. The following code sorts the data set by State, District, and County:
proc sort data=Dist_County;
by State District County;
run;
Q16. How to remove duplicates using PROC SQL?
Ans. Duplicates can be removed by:
proc sql noprint;
create table inter.Merged1 as
select distinct * from inter.readin;
quit;
Q17. Write a code to print observation 5 through 10 from a dataset.
Ans. The FIRSTOBS= and OBS= data set options allow SAS to print observations 5 through 10 from the data set READIN.
proc print data = readin (firstobs=5 obs=10);
run;
Q18. What is PDV?
Ans. The PDV, or Program Data Vector, is a logical area in memory where SAS builds a data set, one observation at a time. When the DATA step reads a record from an external file, an input buffer is created at compile time to hold that record, and the PDV is created after the input buffer.
When DATA step statements are compiled, SAS determines whether to create an input buffer. If there is raw data in the input file, SAS creates an input buffer to hold the data before moving the data to the PDV.
Q19. What is the difference between Missover and Truncover?
Ans. Missover – When the MISSOVER option is used on the INFILE statement, the INPUT statement does not jump to the next line when reading a short line. Instead, when the INPUT statement reaches the end of the current input record, any variables that have not yet been assigned values are set to missing.
Truncover – The TRUNCOVER option assigns the raw data value to the variable even if the value is shorter than the length expected by the INPUT statement. It acts like MISSOVER in that it does not move to the next line, but it uses a partial value to fill the first unfilled variable.
The difference between the two is that while Truncover reads partial data that falls at the end of the record, Missover sets the value to missing.
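The sketch below illustrates the contrast on a small raw file with records of varying length. It writes the file to a temporary fileref first so that the example is self-contained; the fileref `raw` and the data set and variable names are assumptions made for the example.

```sas
/* Write five records of increasing length to a temporary file */
filename raw temp;

data _null_;
    file raw;
    put '1';
    put '22';
    put '333';
    put '4444';
    put '55555';
run;

/* MISSOVER: a value shorter than the 5. informat width is set to missing */
data with_missover;
    infile raw missover;
    input testnum 5.;
run;

/* TRUNCOVER: whatever is available on the short record is read */
data with_truncover;
    infile raw truncover;
    input testnum 5.;
run;

proc print data=with_missover;  run;
proc print data=with_truncover; run;
```

With variable-length records like these, the MISSOVER data set should contain a value only for the last record, while the TRUNCOVER data set should contain all five values. In-stream DATALINES records are padded to a fixed length, which is why an external file is used here.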
Q20. Explain the use of trailing @ and @@.
Ans. The single trailing @ (a line-hold specifier) instructs SAS to hold a record in the input buffer so that the next INPUT statement in the same iteration of the DATA step continues reading from it. Using the trailing @ in the INPUT statement allows us to read a part of the raw data line, test it, and decide how to read additional data from the same record. The held record is released at the end of the current iteration.
The double trailing @@ tells the SAS system to hold a record in the input buffer across multiple iterations of the DATA step. It tells the program to release the current raw data line only when there are no data values left to be read from that line.
|Single Trailing @||Double Trailing @ @|
|It instructs SAS to hold a record in the input buffer.||It instructs SAS to hold a record in the input buffer across multiple iterations of the DATA step.|
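Both specifiers can be sketched with in-stream data. The data set names, variables, and record layouts below are invented for the example.

```sas
/* Single trailing @: hold the record, test a field, then read the rest */
data sales;
    input type $ @;
    if type = 'A' then input amount;
    else input qty amount;
    datalines;
A 100
B 2 50
;
run;

/* Double trailing @@: read several observations from each record */
data scores;
    input id score @@;
    datalines;
1 90 2 85 3 77
4 60 5 88
;
run;
```

The first step produces one observation per record, with a layout that depends on `type`. The second produces five observations from two records.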
Q21. How can you Interleave SAS data sets?
Ans. Interleaving allows us to combine individual sorted data sets into one sorted data set. Interleaving is done using a SET statement along with a BY statement. The SET statement is used to mention the data sets we want to interleave and in the BY statement, we mention on which variable we want the final data set sorted.
We can interleave as many data sets as we want. The number of observations in the new data set is the sum of the number of observations in the original data sets.
The following code interleaves two sorted data sets, data1 and data2, by the variable Year (the output name combined is illustrative):
data combined;
set data1 data2;
by Year;
run;
Q22. Explain the difference between One-to-One Merge and Match-Merge in SAS.
Ans. One-to-one merge is used when we want to combine one observation from each data set. It is not important to match observations.
For example, when merging an observation that contains an employee’s name and year with an observation that contains a date, time, and location for a conference, it does not matter which employee gets which time slot. In such a case, we will use a One-to-one merge.
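Thus, a one-to-one merge uses a MERGE statement without a BY statement and pairs observations by position: the first observation of one data set with the first observation of the other, and so on.
On the other hand, a match-merge uses a MERGE statement with a BY statement and combines observations according to the values of the BY variables. It is used when observations must be paired by a common key, and it requires the data sets to be sorted or indexed by that key.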
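A minimal sketch of the two kinds of merge, using two small made-up data sets that share an `id` variable:

```sas
data employees;
    input id name $;
    datalines;
1 Ann
2 Raj
3 Lee
;
run;

data slots;
    input id room $;
    datalines;
3 R103
1 R101
2 R102
;
run;

/* One-to-one merge: no BY statement, observations are paired by position */
data one_to_one;
    merge employees slots;
run;

/* Match-merge: sort by the key, then merge with a BY statement */
proc sort data=slots; by id; run;

data match_merged;
    merge employees slots;
    by id;
run;
```

In `one_to_one`, the first employee is paired with the first slot regardless of `id`, and the `id` value from `slots` overwrites the one from `employees`. In `match_merged`, each employee is paired with the slot that has the same `id`.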
Q23. How to include or exclude specific variables in a data set?
Ans. To include or exclude specific variables in a data set, we use the DROP and KEEP statements or the DROP= and KEEP= data set options.
DROP statement:
It tells SAS the names of the variables to be removed from the data set.
For example, the following statement will drop the variable score from the data set:
drop score;
KEEP statement:
It specifies the names of the variables to be retained in the data set.
For example, the following statement will keep the variable sum in the data set:
keep sum;
DROP=, KEEP= Data set Options:
The DROP= and KEEP= data set options differ from the DROP and KEEP statements in that the statements can be used only in a DATA step, whereas the data set options can also be used in procedures.
data readin1 (drop=score);
data readin1 (keep=sum);
Advanced Level SAS Interview Questions
Q24. How to use arrays to recode all the numeric variables?
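Ans. We can use the _NUMERIC_ variable list and the DIM function with an array to recode all the numeric variables. The following statements, placed inside a DATA step, set every numeric value of 6 to missing:
array Q(*) _numeric_;
do i=1 to dim(Q);
if Q(i)=6 then Q(i)=.;
end;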
Q25. How to create Macro variables in SAS programming?
Ans. There are multiple ways to create macro variables in SAS programming. Some of them are:
- %LET statement
- Macro parameters (named as well as positional)
- CALL SYMPUTX routine
- INTO in PROC SQL
- %DO statement (iterative)
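The first four techniques in the list can be sketched as follows. The macro variable names, the macro `show`, and the use of the SASHELP.CLASS sample data set are all assumptions made for the example.

```sas
/* 1. %LET statement */
%let cutoff = 60;

/* 2. CALL SYMPUTX inside a DATA step: store the number of observations read */
data _null_;
    set sashelp.class end=last;
    if last then call symputx('n_students', _n_);
run;

/* 3. INTO clause in PROC SQL */
proc sql noprint;
    select avg(height) into :avg_height
    from sashelp.class;
quit;

/* 4. Macro parameters: dsn is positional, obs= is a keyword parameter */
%macro show(dsn, obs=5);
    proc print data=&dsn (obs=&obs);
    run;
%mend show;

%show(sashelp.class, obs=3)

/* Write the values to the log */
%put cutoff=&cutoff n_students=&n_students avg_height=&avg_height;
```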
Q26. What does P-value signify about the statistical data?
Ans. The P-value is the probability of obtaining a result at least as extreme as the one observed, assuming that the null hypothesis is true. It helps users draw a conclusion from a statistical test, and its value is always between 0 and 1. With the conventional significance level of 0.05:
- If P-value > 0.05, it denotes weak evidence against the null hypothesis, which means the null hypothesis cannot be rejected.
- If P-value < 0.05, it denotes strong evidence against the null hypothesis and indicates that the null hypothesis can be rejected.
- If the P-value is very close to 0.05, it is marginal, and the conclusion could go either way.
Q27. Explain the difference between the SAS sum function and using the “+” operator?
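Ans. In SAS, the SUM function ignores missing values and returns the sum of the non-missing arguments, whereas the “+” operator returns a missing value if any operand is missing.
data mydata;
input x y z;
p = x + y + z;
q = sum(x, y, z);
datalines;
33 3 3
24 3 4
24 3 4
. 3 2
23 . 3
35 4 2
;
run;
In this code, the value of p is missing for the 4th and 5th observations, because each of them contains a missing value, while q (an illustrative variable computed with the SUM function) is non-missing for every observation.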
Q28. How do you remove duplicate values in SAS?
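Ans. There are three methods to delete duplicate observations in a data set:
- By using NODUPS in PROC SORT
proc sort data=SAS-Dataset nodups; by _all_; run;
- By using an SQL query (inside PROC SQL)
create table SAS-Dataset as select distinct * from Old-SAS-Dataset;
- By cleaning the data in a DATA step with BY-group processing
if first.group and last.group then output;
Note that this last condition keeps only the observations whose BY value occurs exactly once. To keep one observation per BY group instead, test first.group alone.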
Q29. Explain the difference between SAS functions and procedures.
Ans. Functions operate on values across the variables within a single observation, whereas procedures operate on the values of a variable down all the observations in a data set.
data average ;
set temp ;
avgtemp = mean( of T1 - T24 ) ;
run ;
Here, the MEAN function calculates the average of the values T1 through T24 within each observation.
proc sort data=average ;
by month ;
run ;
proc means data=average ;
by month ;
var avgtemp ;
run ;
Here, PROC MEANS calculates the average of avgtemp across the observations for each month. PROC SORT first orders the data by month, as BY-group processing requires.
Q30. How do you identify the number of iterations and specific conditions within a single ‘do’ loop?
Ans. The following code will help you to identify the number of iterations and specific conditions within a single ‘do’ loop:
do i=1 to 20 until(Sum1>=20000);
In this code, the DO statement executes the loop until Sum1 is greater than or equal to 20,000 or until the loop has run 20 times, whichever occurs first.
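To see the combined stopping rule in a complete step, here is a sketch that places the DO statement in a small savings calculation. The deposit amount and interest rate are hypothetical values chosen only for illustration.

```sas
data _null_;
    Sum1 = 0;
    do i = 1 to 20 until (Sum1 >= 20000);
        Sum1 = Sum1 + 2000;          /* hypothetical yearly deposit */
        Sum1 = Sum1 + Sum1 * 0.10;   /* hypothetical 10% interest   */
    end;
    /* Write the loop counter and final balance to the log */
    put 'Loop ended with ' i= Sum1=;
run;
```

With these numbers the balance passes 20,000 during the seventh pass, so the UNTIL condition ends the loop well before the upper bound of 20 is reached.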
Q31. What is a Linear Regression in SAS?
Ans. Linear regression is used to find the relationship between a dependent variable and one or more independent variables. If the score on variable Y is predicted from the score on a second variable X, then X is called the predictor variable and Y the criterion variable.
Example: Relationship between two variables
proc sql;
create table CARS1 as
select invoice, horsepower, length, weight
from sashelp.cars /* assumed source: the SASHELP.CARS sample data set */
where make in ('Audi','BMW');
quit;
proc reg data = cars1;
model horsepower = weight ;
run;
Q32. What is the feature of the max() function in SAS?
Ans. The max() function returns the largest non-missing value among its arguments.
- x = max(1, 5, -2)
// output 5
- x = max(1, ., 6)
// output 6
- x = max(-2)
// output -2
- x = max(7, -3*1.5)
// output 7
Q33. Explain the difference between VAR B1 - B3 and VAR B1 -- B3?
Ans. A single dash (-) specifies a range of consecutively numbered variables. A double dash (--) specifies all the variables between the two named variables, in the order in which they are stored in the data set.
Data Set: ID NAME B1 B2 C1 B3
- B1 - B3 would return B1 B2 B3
- B1 -- B3 would return B1 B2 C1 B3
Q34. Explain the condition where you code a SELECT construct instead of IF statements?
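Ans. When you have a long series of mutually exclusive conditions that compare one expression against constant values, it is better to use a SELECT group rather than a series of IF-THEN or IF-THEN-ELSE statements. It can also reduce CPU time.
The syntax for SELECT WHEN is as follows (day is an illustrative variable name):
SELECT (x);
WHEN (1) x=x;
WHEN (2) x=x*2;
END;
SELECT (day);
WHEN ('Sun') wage=wage*1.5;
WHEN ('Sat') wage=wage*1.3;
END;
If other values can occur, add an OTHERWISE statement before END, because SAS reports an error when no WHEN condition matches and OTHERWISE is absent.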
Q35. How to create a permanent SAS dataset?
Ans. A permanent SAS dataset is saved to a location where it can be retrieved and used later. Thus, it does not need to be recreated each time you restart SAS. There are two steps to create a permanent SAS dataset:
- Assign a library and engine
- Create the data using a two-level name, giving both the library (other than WORK) and the dataset name
A library is a location on your computer. It could be a folder or a directory in which SAS data sets and other SAS files are stored. A library refers to the entire folder and not to individual data sets. The libname statement is used to define a library. An engine specifies the type of files that it is to write.
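The two steps can be sketched as below. The libref `mylib`, the folder path, and the data set name are hypothetical and need to be replaced with values that exist on your system.

```sas
/* Step 1: assign a library (hypothetical folder path) */
libname mylib 'C:\sasdata';

/* Step 2: a two-level name (libref.dataset) stores the data set in that library */
data mylib.class_copy;
    set sashelp.class;
run;
```

In a later SAS session, reissuing the same LIBNAME statement makes `mylib.class_copy` available again without rebuilding it.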
Q36. What is the output of the following program?
do month=1 to 12;
Ans. Output: 12
Q37. Explain some SAS character functions that are used for data cleaning in brief.
Ans. Following are the SAS character functions that are used for data cleaning in brief:
- LOWCASE(char_string) Function: It converts all the characters in a given string to lowercase.
- UPCASE(char_string) Function: This is used for converting all the characters in a given string to uppercase.
- COMPBL(str) Function: It compresses multiple blanks to a single blank.
- TRIM(str) Function: It removes trailing blanks from a given string.
- STRIP(str) Function: It removes leading and trailing spaces.
- COMPRESS(char_string) Function: By default, it removes all spaces: leading, embedded, and trailing.
- FIND(str, substring) Function: It is used to locate a substring within a string.
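As a rough sketch, the functions above can be tried on one untidy value. The variable names and the sample string are invented for the example.

```sas
data _null_;
    length raw $30;
    raw = '  John    SMITH  ';

    lower   = lowcase(raw);        /* all lowercase                     */
    upper   = upcase(raw);         /* all uppercase                     */
    single  = compbl(raw);         /* runs of blanks become one blank   */
    trimmed = trim(raw);           /* trailing blanks removed           */
    clean   = strip(raw);          /* leading and trailing blanks removed */
    nospace = compress(raw);       /* every blank removed               */
    pos     = find(raw, 'SMITH');  /* position of the substring, 0 if absent */

    put clean= nospace= pos=;      /* write selected results to the log */
run;
```

Because character variables are padded with blanks when stored, the effect of TRIM is mainly visible when its result is used directly inside an expression such as a concatenation.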
Q38. What is the difference between NODUP and NODUPKEY options?
Ans. PROC SORT provides the NODUP and NODUPKEY options for removing duplicates. The differences between these options are:
|NODUPKEY||NODUP|
|1. It compares just the BY variable present in the dataset.||1. NODUP compares all the variables present in the dataset.|
|2. It checks for and eliminates observations with duplicate BY values.||2. It considers entire observations. When Nodup is specified, the Sort Procedure compares the current observation to the previous observation. If the observations match for all variables, the current observation is left out of the output data set.|
|3. Syntax:PROC SORT DATA=readin NODUPKEY;BY variable name;RUN;||3. Syntax:PROC SORT DATA=readin NODUP;BY variable name;RUN;|
We hope that this SAS interview questions blog will help you boost your interview preparation.