Springboard - R 27 Essential Interview QA
August 1, 2018
You’ve handled the data science resume, so you’ll surely score the job now, right? Not so fast. A proper resume gets you an interview—no
more, no less. Now comes the part that could land you the job: slaying the interview. To help make that happen, let’s delve into R interview
questions that could come up during a conversation with a hiring manager or an interview-related test, along with suggested answers.
2. Can you write and explain some of the most common syntax in R?
Again, this is an easy—but crucial—one to nail. For the most part, this can be demonstrated through any other code you might write for
other R interview questions, but sometimes this is asked as a standalone. Some of the basic syntax for R that’s used most often might
include:
# — as in many other languages, # can be used to introduce a line of comments. This tells the R interpreter not to process the rest of the
line, so it can be used to make code more readable by reminding future readers what blocks of code are intended to do.
"" — quotes operate as one might expect; they denote a string data type in R.
https://www.springboard.com/blog/data-science/27-essential-r-interview-questions-with-answers/ 1/11
10/9/21, 8:37 PM 27 Essential R Interview Questions (With Answers) | Springboard Blog
<- — one of the quirks of R, the assignment operator is <- rather than the relatively more familiar use of =. This is an essential thing for
those using R to know, so it would be good to display your knowledge of it if the question comes up.
\ — the backslash, or reverse virgule, is the escape character in R. An escape character is used to “escape” (or ignore) the special meaning
of certain characters in R and, instead, treat them literally.
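The four pieces of syntax above can be seen together in a short sketch. The variable names greeting and quoted are made up for illustration and are not from the original article.

```r
# A comment: R ignores everything after the hash on this line.
greeting <- "Hello"                 # <- assigns the string "Hello" to the name greeting
quoted <- "She said \"hi\" to me."  # \" escapes the inner double quotes

# writeLines() shows the string as stored, without the escaping backslashes.
writeLines(quoted)
```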
I’m running R version 3.5.1 for Windows on my machine with a standard, default install. The output in RStudio for me looks something like
this:
Its open-source nature. This qualifies as both an advantage and disadvantage for various reasons, but being open source means it’s
widely accessible, free to use, and extensible.
Its package ecosystem. The built-in functionality available via R packages means you don’t have to spend a ton of time reinventing the
wheel as a data scientist.
Its graphical and statistical aptitude. By many people’s accounts, R’s graphing capabilities are unmatched.
Memory and performance. In comparison to Python, R is often said to be the lesser language in terms of memory and performance. This
is disputable, and many think it’s no longer relevant as 64-bit systems dominate the marketplace.
Open source. Being open source has its disadvantages as well as its advantages. For one, there’s no governing body managing R, so
there’s no single source for support or quality control. This also means that sometimes the packages developed for R are not the highest
quality.
Security. R was not built with security in mind, so it must rely on external resources to mind these gaps.
Attribute                           Python    R
Cost                                ★★★★★     ★★★★★
Security                            ★★★★☆     ★☆☆☆☆
Model building                      ★★★★★     ★★★★★
Learning curve                      ★★★☆☆     ★★☆☆☆
Visualization tools and libraries   ★★★☆☆     ★★★★★
There are many comparisons to draw between Python and R. They are both free. They both have strong modeling capabilities. Python is
generally considered more secure and easier to learn, but R is typically thought to have better visualization tools and libraries. In many
jobs, you’ll be expected to use both R and Python, so it’s good to know about both, even if you aren’t fluent in both languages.
This code iterates through a range of numbers from 1 to 20 and prints the values. I don’t want to print 15, though, so I’ve used the next
statement to skip that iteration and move on to other values. The output would print 1-14 and 16-20.
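The loop described here is not shown in this copy of the article, so the following is a minimal sketch of what such a loop could look like. The printed vector is a hypothetical helper added only to record which values were printed.

```r
printed <- integer(0)  # hypothetical helper: records the values that get printed

for (i in 1:20) {
  if (i == 15) {
    next  # skip the rest of this iteration and move on to 16
  }
  print(i)
  printed <- c(printed, i)
}
```

Running this prints 1 through 14 and 16 through 20, one value per line.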
myVar <- 15
print(myVar)
myVar = 15
print(myVar)
The string “helloWorld” is now stored in the myVar variable. Note that this would produce an error if we got mixed up and tried to plug an
undefined variable object into a string, as in:
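Scoping of variables is something to consider as well. <<- acts as the “superassignment” operator: it assigns to a variable in an
enclosing environment instead of creating a new local one, which makes it useful for closures. A good example of how to use this can be
found on Stack Overflow.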
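As a small illustration of superassignment in a closure, here is a sketch of a counter. The names makeCounter, count, and counter are hypothetical and chosen only for this example.

```r
makeCounter <- function() {
  count <- 0
  function() {
    # <<- updates count in the enclosing function's environment
    # rather than creating a new local variable.
    count <<- count + 1
    count
  }
}

counter <- makeCounter()
counter()  # 1
counter()  # 2
```

With a plain <- inside the inner function, every call would return 1, because count would be recreated locally each time.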
Vectors
Matrices
Lists
Arrays
Factors
Data frames
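For reference, each of the structures listed above can be created with a base R function. This is a minimal sketch with made-up values. The variable names are illustrative only.

```r
v  <- c(1.5, 2.5, 3.5)                              # vector: one type, one dimension
m  <- matrix(1:6, nrow = 2, ncol = 3)               # matrix: one type, two dimensions
l  <- list(name = "Ada", scores = c(90, 85))        # list: elements of mixed types
a  <- array(1:24, dim = c(2, 3, 4))                 # array: one type, any number of dimensions
f  <- factor(c("low", "high", "low"))               # factor: categorical values
df <- data.frame(id = 1:3, grp = c("a", "b", "a"))  # data frame: columns of equal length

str(df)  # compact summary of the data frame's structure
```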
statements
return(object)
Functions can be simple or complex, but they should make your code more extensible, readable, and efficient. This is a chance to show
your ingenuity and experience.
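To make the function syntax concrete, here is a sketch of a small function. The name rescale01 and its behavior are assumptions chosen for illustration, not something from the original article.

```r
# Rescale a numeric vector so its smallest value becomes 0 and its largest becomes 1.
rescale01 <- function(x) {
  rng <- range(x, na.rm = TRUE)               # rng[1] is the minimum, rng[2] the maximum
  scaled <- (x - rng[1]) / (rng[2] - rng[1])
  return(scaled)
}

rescale01(c(2, 4, 6))  # 0.0 0.5 1.0
```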
Though not required, strictly speaking, the argument header = TRUE is used to ensure that labels are not parsed as data. Making this
argument false means you either don’t have labels in your CSV or you want them to be part of the data output in R.
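To do the same with data in a .txt file, change read.csv to read.table. Note that read.table defaults to header = FALSE and splits fields on
whitespace, so set header and sep to match your file.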
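The effect of the header argument is easy to check on a throwaway file. This sketch writes a tiny CSV to a temporary path. The file contents and variable names are made up for illustration.

```r
csvPath <- tempfile(fileext = ".csv")
writeLines(c("name,score", "Ada,90", "Grace,85"), csvPath)

withHeader <- read.csv(csvPath, header = TRUE)   # first line becomes the column names
noHeader   <- read.csv(csvPath, header = FALSE)  # first line is treated as data

nrow(withHeader)  # 2
nrow(noHeader)    # 3
names(noHeader)   # "V1" "V2"

# read.table reads the same file once the separator is given explicitly.
viaTable <- read.table(csvPath, header = TRUE, sep = ",")
```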
install.packages("package_name")
Followed by:
library(package_name)
It’s that simple. The first command installs the package and the second loads the package into the session.
with(randomDataSet, expression.test(sample))
You can use by() to apply a function to a data frame split by factors. Its usage is something like this:
The data frame plugged into this function is split into data frames (by row) subsetted by the values of factor(s), and a function is then
applied to each subset.
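The usage example is missing from this copy of the article, so here is a minimal sketch of by() on a made-up data frame. The names scores, group, value, and groupMeans are hypothetical.

```r
scores <- data.frame(
  group = factor(c("a", "a", "b", "b")),
  value = c(1, 3, 10, 20)
)

# Split scores by group, then take the mean of value within each piece.
groupMeans <- by(scores, scores$group, function(part) mean(part$value))
print(groupMeans)
```

Group a has a mean of 2 and group b has a mean of 15.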
x <- 1:25
mode(x)
[1] "numeric"
y <- "helloWorld"
mode(y)
[1] "character"
mode(state.name)
[1] "character"
In the first line, we assign x to values 1 through 25, so when we run it through mode, we get “numeric” because the variable stores numeric
values. If we, instead, assign the variable to a string such as in the y variable above, we get “character” as the mode. You can try this out
with predefined data sets as well, such as with state.name.
20. What is a factor variable, and why would you use one?
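A factor variable is a categorical variable whose values can be given as either numbers or character strings. R stores each distinct
category once as a level and represents the data as integer codes that point to those levels. The most salient reason to use a factor
variable is that modeling functions treat it as categorical rather than as plain numbers or text, so statistical models handle it
appropriately. Another reason is that the integer codes can be more memory efficient than repeated strings.
Simply use the factor() function to create a factor variable. There’s more information here.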
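A short sketch of creating and inspecting a factor follows. The sizes variable and its values are made up for illustration.

```r
sizes <- factor(c("small", "large", "medium", "small"),
                levels = c("small", "medium", "large"))

levels(sizes)      # "small" "medium" "large"
as.integer(sizes)  # 1 3 2 1  (the integer codes behind each value)
table(sizes)       # counts per level
```

Passing levels explicitly fixes the order of the categories. Without it, factor() sorts the levels alphabetically.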
To get a sense of how this works, plug in the built-in letters vector and search for the index of a specific letter using which().
In my console, I’ve checked the letters vector, which contains the English alphabet in lowercase. I’ve used which() to find the positions of a,
z, and m, which returned the indexes 1, 26, and 13, respectively, because those are their positions in the vector, just as they are
in the alphabet.
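The console session is not reproduced in this copy, but the same checks can be sketched as follows. The positions variable is a hypothetical name added for the combined lookup.

```r
which(letters == "a")  # 1
which(letters == "z")  # 26
which(letters == "m")  # 13

# Several letters at once: results come back in vector order, not lookup order.
positions <- which(letters %in% c("a", "z", "m"))  # 1 13 26
```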
paste(hello, world)
I’ve stored the strings "Hello," and "World." in variables aptly named hello and world. With paste(), I’ve simply plugged in the two
variables, and it concatenates them into the single phrase “Hello, World.”
An oddity with this function is that it will automatically insert spaces between the terms. Try it out with some numbers.
paste(1,2,3,4)
This can be a sort of “gotcha” with this question. If we don’t want spaces, we can adjust the sep parameter in the function, which defaults
to a space " ".
paste(1,2,3,4,sep="")
[1] "1234"
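Putting the paste() examples together gives a runnable sketch. The variables phrase, spaced, and joined are hypothetical names added so the results can be inspected. paste0() is a base R shortcut for sep = "".

```r
hello <- "Hello,"
world <- "World."

phrase <- paste(hello, world)          # "Hello, World."
spaced <- paste(1, 2, 3, 4)            # "1 2 3 4"
joined <- paste(1, 2, 3, 4, sep = "")  # "1234"
paste0(1, 2, 3, 4)                     # "1234", same as sep = ""
```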
R uses the sort() function to order a vector or factor. It supports several sorting algorithms, selected with the method argument; they are listed and described below.
Radix: Usually the most performant algorithm, this is a non-comparative sorting algorithm that avoids overhead. It’s stable, and it’s the
default algorithm for integer vectors and factors.
Quick Sort: This method “uses Singleton (1969)’s implementation of Hoare’s Quicksort method and is only available when x is numeric
(double or integer) and partial is NULL,” according to R Documentation. It’s not considered a stable sort.
Shell: This method “uses Shellsort (an O(n^(4/3)) variant from Sedgewick (1986)),” according to R Documentation.
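The algorithms above are chosen through the method argument of sort(). Here is a minimal sketch on a made-up numeric vector. All three methods return the same ordering and differ only in how they get there.

```r
x <- c(42, 7, 19, 3, 88)

byRadix <- sort(x, method = "radix")
byQuick <- sort(x, method = "quick")  # only available for numeric x
byShell <- sort(x, method = "shell")

byRadix                     # 3 7 19 42 88
sort(x, decreasing = TRUE)  # 88 42 19 7 3
```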
Install the party package to get started with making the tree.
install.packages("party")
This gives you access to a fancy new function: ctree(), and, at its most basic, this is all we need to create a tree. First, let’s grab some data
from our package; make sure the package is loaded.
library(party)
Now we have access to some new data sets. The strucchange package, which is installed and loaded along with party, includes data on youth
homicides in Boston called BostonHomicide. Let’s use that one. You can print the data to the screen if you like.
print(BostonHomicide)
Now we’ll create the tree. The usage of ctree() goes something like this:
ctree(formula,dataset)
We’ve got our data set. I’ll assign it to a variable for simplicity.
plot(treeAnalysis)
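The assignment step is missing from this copy, and the formula used with BostonHomicide is not shown. As a stand-in, here is a sketch of the same ctree() workflow on R's built-in iris data set, which is an assumption and not the article's own example. The requireNamespace() guard is also an addition, so the code does nothing if party is not installed.

```r
treeAnalysis <- NULL  # stays NULL when the party package is unavailable

if (requireNamespace("party", quietly = TRUE)) {
  library(party)
  # Predict Species from all other columns of iris (a stand-in data set).
  treeAnalysis <- ctree(Species ~ ., data = iris)
  plot(treeAnalysis)
}
```

The formula names the response on the left of ~ and the predictors on the right. A dot means "every other column".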
Simply put, R is designed for data manipulation and visualization, so it’s natural that it would be used for data science.
***
R is a must-have tool for data scientists, so it’s a must-know component of the data scientist interview. These R interview questions are
just the tip of the iceberg. Familiarize yourself with the technical aspects of the language, the reasons data scientists use R programming
in comparison with other languages, and how to employ the language to accomplish tasks that will be expected of you, and you’ll nail the
interview.
Ready for more? To further develop your data science skills, check out Springboard’s mentor-led Data Science Career Track today.
Alexander Eakins
Alexander is a freelance technical writer and programming hobbyist.