R Programming Interview Questions-1
in data science. The focus is on basic yet essential data science tasks such as data
manipulation, data cleaning, and data loading. You have 2 hours to complete the task at
hand. Because of the time limit, the questions are kept fairly simple, though some may
make you think twice.
Considering R has numerous packages and base functions to perform a particular task,
this test will make you familiar with popular ways of performing data science tasks in R.
Q 1) Two vectors X and Y are defined as X <- c(3, 2, 4) and Y <- c(1, 2). What
will be the output of vector Z, which is defined as Z <- X*Y?
3,4,0
3,4,4
error
3,4,8
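R recycles the shorter vector in element-wise arithmetic, which is easy to check in a session. A minimal sketch using the vectors from the question (suppressWarnings is added here only to keep the output quiet; without it R warns that the longer length is not a multiple of the shorter one):

```r
X <- c(3, 2, 4)
Y <- c(1, 2)

# Y is recycled to c(1, 2, 1), so the products are 3*1, 2*2 and 4*1.
Z <- suppressWarnings(X * Y)
print(Z)
```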
Q 2) You want to know all the values in c(1, 3, 5, 7, 10) that are not in c(1,
5, 10, 12, 14). Which code in R can be used to do this?
setdiff(c(1,3,5,7,10),c(1,5,10,12,14))
diff(c(1,3,5,7,10),c(1,5,10,12,14))
unique(c(1,3,5,7,10),c(1,5,10,12,14))
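The set difference can be tried directly. The sketch below uses the two vectors from the question; the names a, b, only_in_a and same are illustrative, and the %in% version is shown only as an equivalent base R idiom:

```r
a <- c(1, 3, 5, 7, 10)
b <- c(1, 5, 10, 12, 14)

# Values that appear in a but not in b.
only_in_a <- setdiff(a, b)
print(only_in_a)

# The same result using logical indexing.
same <- a[!(a %in% b)]
```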
33
35
37
31
Q 4) The data shown below is from a csv file. Which of the following
commands can read this csv file as a dataframe into R?
Male 25.5 0
Female 35.6 1
Female 12.03 0
Female 11.30 0
Male 65.46 1
read.csv("Table1.csv")
read.csv("Table1.csv",header=FALSE)
read.table("Table1.csv")
read.csv2("Table1.csv",header=FALSE)
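The effect of the header argument can be seen without a file on disk by passing the data through the text argument. This is a sketch that assumes the file is comma-separated; csv_text stands in for the contents of Table1.csv:

```r
# Assumed comma-separated contents of the file, with no header line.
csv_text <- "Male,25.5,0\nFemale,35.6,1\nFemale,12.03,0\nFemale,11.30,0\nMale,65.46,1"

# Default header = TRUE: the first data row is consumed as column names.
with_header <- read.csv(text = csv_text)

# header = FALSE: all five rows are kept and columns are named V1, V2, V3.
no_header <- read.csv(text = csv_text, header = FALSE)

print(nrow(with_header))
print(nrow(no_header))
```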
Q 5) The missing values in the data shown below, taken from a csv file, are
represented by "?". Which of the following commands will read this csv file
correctly into R?
A 10 Sam
B ? Peter
C 30 Harry
D 40 ?
E 50 Mark
read.csv("Table2.csv")
read.csv("Table2.csv",header=FALSE, strings.na="?")
read.csv2("Table2.csv",header=FALSE,sep=",",na.strings="?")
read.table("Table2.csv")
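The argument that marks custom missing-value codes is na.strings. A small sketch, assuming the file is comma-separated and using read.csv with the text argument in place of the real Table2.csv:

```r
# Assumed comma-separated contents of the file; "?" marks a missing value.
csv_text <- "A,10,Sam\nB,?,Peter\nC,30,Harry\nD,40,?\nE,50,Mark"

df <- read.csv(text = csv_text, header = FALSE, na.strings = "?")

# Both "?" entries become NA, and the second column is read as numeric.
print(df)
print(sum(is.na(df)))
```

Without na.strings = "?", the second column would be read as text because "?" cannot be parsed as a number.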
Q 6) The table shown below from a "Train3.csv" file has row names as well as
column names.
Column 1 Column 2 Column 3
Which of the following code can read this csv file properly into R?
read.delim("Train3.csv",header=T,sep=",",row.names=1)
read.csv2("Train3.csv",header=TRUE,row.names=TRUE)
read.table("Train3.csv",header=TRUE,sep=",")
read.csv("Train3.csv",row.names=TRUE,header=TRUE,sep=",")
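Per the read.table documentation, row.names takes a column number, a column name, or a vector of names. The sketch below uses made-up contents for the file, since the original table rows are not shown here; the row labels and values are hypothetical:

```r
# Hypothetical contents: the first field of each line is the row name.
csv_text <- ",Column1,Column2,Column3\nRow1,1,2,3\nRow2,4,5,6"

# row.names = 1 tells R to use the first column as row names.
df <- read.csv(text = csv_text, header = TRUE, row.names = 1)
print(df)
```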
Q 7) Which of the following code will fail to read the first two rows of the csv
file?
Column 1 Column 2 Column 3
read.csv("Table3.csv",header=TRUE,row.names=1,sep=",",nrows=2)
read.csv("Table3.csv",row.names=1,nrows=2)
read.delim2("Table3.csv",header=T,row.names=1,sep=",",nrows=2)
read.table("Table3.csv",header=TRUE,row.names=1,sep=",",skip.last=2)
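Limiting the number of rows read is done with nrows, and read.table has no skip.last argument. A sketch with hypothetical file contents (the names and values below are made up for illustration):

```r
# Hypothetical contents standing in for Table3.csv.
csv_text <- "Name,Column1,Column2\nA,1,2\nB,3,4\nC,5,6"

# nrows = 2 reads only the first two data rows.
first_two <- read.csv(text = csv_text, header = TRUE, row.names = 1, nrows = 2)
print(first_two)

# skip.last is not a read.table argument, so the call stops with an error;
# tryCatch captures the message instead of halting the script.
res <- tryCatch(
  read.table(text = csv_text, header = TRUE, row.names = 1, sep = ",", skip.last = 2),
  error = function(e) conditionMessage(e)
)
print(res)
```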
Q 8) Which of the following code will read only the second and the third
column (Column 2 and Column 3) into R?
Column 1 Column 2 Column 3
read.table("Table3.csv",header=T,row.names=1,sep=",",colClasses=c("character","NULL",NA,NA))
read.csv("Table3.csv",header=TRUE,row.names=1,sep=",",colClasses=c("character","NULL","NA","NA"))
read.csv("Table3.csv",row.names=1,colClasses=c("Null",na,na))
read.csv("Table3.csv",row.names=T, colClasses=TRUE)
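In colClasses, the string "NULL" drops a column and an unquoted NA lets R guess the type. A sketch with hypothetical file contents, assuming the first field holds the row names:

```r
# Hypothetical contents: row-name column followed by three data columns.
csv_text <- "Name,Column1,Column2,Column3\nA,1,2,3\nB,4,5,6"

# "character" for the row names, "NULL" to skip Column1,
# NA (unquoted) to let R detect the type of Column2 and Column3.
df <- read.table(text = csv_text, header = TRUE, sep = ",", row.names = 1,
                 colClasses = c("character", "NULL", NA, NA))
print(df)
```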
Q 9) Below is a data frame which has already been read into R and stored in a
variable named "dataframe1". Which of the below code will produce a
summary (mean, mode, median etc if applicable) of the entire data set in a
single line of code?
V1 V2 V3
1 Male 12.5 46
2 Female 56 135
3 Male 45 698
4 Female 63 12
7 Female 12 457
summary(dataframe1)
stats(dataframe1)
summarize(dataframe1)
summarise(dataframe1)
Q 10) "dataframe2" has been read into R properly with missing values labelled
as NA. Which of the following code will return the total number of missing
values in the dataframe?
A 10 Sam
B NA Peter
C 30 Harry
D 40 NA
E 50 Mark
table(dataframe2==NA)
table(is.na(dataframe2))
table(hasNA(dataframe2))
which(is.na(dataframe2))
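Counting missing values relies on is.na, because comparing with == NA never yields TRUE. A sketch that rebuilds the table above by hand; the column names V1 to V3 are assumed:

```r
dataframe2 <- data.frame(
  V1 = c("A", "B", "C", "D", "E"),
  V2 = c(10, NA, 30, 40, 50),
  V3 = c("Sam", "Peter", "Harry", NA, "Mark"),
  stringsAsFactors = FALSE
)

# Tabulate missing (TRUE) versus present (FALSE) cells.
na_counts <- table(is.na(dataframe2))
print(na_counts)

# A single total.
total_na <- sum(is.na(dataframe2))

# Comparing with NA gives NA for every element, so it cannot find missing values.
cmp <- dataframe2$V2 == NA
```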
Q 11) Which of the following code will not return the number of missing
values in each column?
A 10 Sam
B NA Peter
C 30 Harry
D 40 NA
E 50 Mark
colSums(is.na(dataframe2))
apply(is.na(dataframe2),2,sum)
sapply(dataframe2,function(x) sum(is.na(x)))
table(is.na(dataframe2))
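Per-column counts need a function that works column by column, whereas table() pools every cell together. A short sketch with the same hand-built data (column names V1 to V3 are assumed):

```r
dataframe2 <- data.frame(
  V1 = c("A", "B", "C", "D", "E"),
  V2 = c(10, NA, 30, 40, 50),
  V3 = c("Sam", "Peter", "Harry", NA, "Mark"),
  stringsAsFactors = FALSE
)

# Number of NA values in each column, computed two ways.
by_colsums <- colSums(is.na(dataframe2))
by_sapply <- sapply(dataframe2, function(x) sum(is.na(x)))
print(by_colsums)
```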
Q 12) The data shown below has been loaded into R in a variable named
"dataframe3". The first row of data represents column names. The powerful
data manipulation package ‘dplyr’ has been loaded.
Gender Marital Status Age Dependents
Male Married 50 2
Female Married 45 5
Female Unmarried 25 0
Male Unmarried 21 0
Male Unmarried 26 1
Female Married 30 2
Female Unmarried 18 0
Which of the following code can select only the rows for which Gender is
"Male"?
subset(dataframe3, Gender="Male")
subset(dataframe3, Gender=="Male")
filter(dataframe3,Gender=="Male")
Both 2 and 3
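The difference between = and == inside subset() is easy to miss, since neither form raises an error. The sketch below uses base R only, so it runs without dplyr; when dplyr is loaded, filter(dataframe3, Gender == "Male") is expected to select the same rows as the == form. Only two of the four columns are rebuilt here:

```r
dataframe3 <- data.frame(
  Gender = c("Male", "Female", "Female", "Male", "Male", "Female", "Female"),
  Age = c(50, 45, 25, 21, 26, 30, 18),
  stringsAsFactors = FALSE
)

# == is a comparison, so only matching rows are kept.
males <- subset(dataframe3, Gender == "Male")

# = passes a named argument that subset() does not use as a condition,
# so every row comes back.
not_filtered <- subset(dataframe3, Gender = "Male")

print(nrow(males))
print(nrow(not_filtered))
```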
Q 13) Which of the following code can select the data of married females
only?
Gender Marital Status Age Dependents
Male Married 50 2
Female Married 45 5
Female Unmarried 25 0
Male Unmarried 21 0
Male Unmarried 26 1
Female Married 30 2
Female Unmarried 18 0
Only 1
Both 1 and 2
Q 14) Which of the following code can select only the "Age" and "Dependents"
columns?
Gender Marital Status Age Dependents
Male Married 50 2
Female Married 45 5
Female Unmarried 25 0
Male Unmarried 21 0
Male Unmarried 26 1
Female Married 30 2
Female Unmarried 18 0
subset(dataframe3, select=c("Age","Dependents"))
select(dataframe3, Age,Dependents)
dataframe3[,c("Age","Dependents")]
Q 15) Which of the following code will convert the class of the "Dependents"
variable to a factor class?
Gender Marital Status Age Dependents
Male Married 50 2
Female Married 45 5
Female Unmarried 25 0
Male Unmarried 21 0
Male Unmarried 26 1
Female Married 30 2
Female Unmarried 18 0
dataframe3$Dependents=as.factor(dataframe3$Dependents)
dataframe3[,"Dependents"]=as.factor(dataframe3[,"Dependents"])
transform(dataframe3,Dependents=as.factor(Dependents))
Q 16) Which of the following code can calculate the mean age of Female?
Gender Marital Status Age Dependents
Male Married 50 2
Female Married 45 5
Female Unmarried 25 0
Male Unmarried 21 0
Male Unmarried 26 1
Female Married 30 2
Female Unmarried 18 0
dataframe3%>%filter(Gender=="Female")%>%summarise(mean(Age))
mean(dataframe3$Age[which(dataframe3$Gender=="Female")])
mean(dataframe3$Age,dataframe3$Female)
Both 1 and 2
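A grouped mean can be checked with base R alone. The sketch below rebuilds the Gender and Age columns from the table above; tapply is shown as an additional base R route and is not one of the quiz options:

```r
dataframe3 <- data.frame(
  Gender = c("Male", "Female", "Female", "Male", "Male", "Female", "Female"),
  Age = c(50, 45, 25, 21, 26, 30, 18),
  stringsAsFactors = FALSE
)

# Mean age of the rows where Gender is "Female": (45 + 25 + 30 + 18) / 4.
mean_female <- mean(dataframe3$Age[dataframe3$Gender == "Female"])
print(mean_female)

# Mean age for every gender at once.
by_gender <- tapply(dataframe3$Age, dataframe3$Gender, mean)
```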
Q 17) The data shown below has been read into R and stored in a dataframe
named "dataframe4". It is given that "Has_Dependents" column is read as a
factor variable. We wish to convert this variable to numeric class. Which code
will help us achieve this?
Gender Marital Status Age Has_Dependents
Male Married 50 0
Female Married 45 1
Female Unmarried 25 0
Male Unmarried 21 0
Male Unmarried 26 1
Female Married 30 1
Female Unmarried 18 0
dataframe4$Has_Dependents=as.numeric(dataframe4$Has_Dependents)
dataframe4[,"Has_Dependents"]=as.numeric(as.character(dataframe4$Has_Dependents))
transform(dataframe4,Has_Dependents=as.numeric(Has_Dependents))
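Calling as.numeric on a factor returns the internal level codes rather than the labels, which is why the intermediate as.character step matters. A minimal sketch using the Has_Dependents values from the table (the variable names are illustrative):

```r
has_dependents <- factor(c(0, 1, 0, 0, 1, 1, 0))

# Level codes: the level "0" is code 1 and the level "1" is code 2.
codes <- as.numeric(has_dependents)

# Converting through character recovers the original values.
values <- as.numeric(as.character(has_dependents))

print(codes)
print(values)
```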
Q 18) There are two dataframes stored in two respective variables named
"dataframe1" and "dataframe2".
Dataframe1 Dataframe2
Which of the following code will produce the output as shown below?
B 2000 35.5
C 3000 45.5
D 4000 55.5
E 5000 65.5
F 6000 75.5
G 7000 85.5
H 8000 95.5
merge(dataframe1,dataframe2,all=TRUE)
merge(dataframe1,dataframe2)
merge(dataframe1,dataframe2,by=intersect(names(x),names(y)))
Q 19) Which of the following code will create a new column named "Size(MB)"
from the existing "Size(KB)" column? The dataframe is stored in a variable
named "dataframe5" (Given 1MB = 1024KB).
Package Name Creator Size(KB)
dataframe5$Size(MB)=dataframe$Size(KB)/1024
dataframe5$Size(KB)=dataframe$Size(KB)/1024
dataframe5%>%mutate(Size(MB)=Size(KB)/1024)
Both 1 and 3
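Column names containing parentheses are not syntactic R names, so they cannot follow $ unquoted. One way to work with them, sketched with made-up sizes, is to quote the name with [[ ]] or backticks:

```r
# check.names = FALSE keeps the non-syntactic column name as written.
# The sizes are hypothetical.
dataframe5 <- data.frame(`Size(KB)` = c(2048, 512), check.names = FALSE)

# Quote the name to create the new column.
dataframe5[["Size(MB)"]] <- dataframe5[["Size(KB)"]] / 1024

# Backticks work with $ as well.
print(dataframe5$`Size(MB)`)
```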
Q 20) Certain algorithms such as XGBoost work only with numerical data. In that
case, categorical variables present in the dataset need to be converted to dummy
variables, which represent the presence or absence of a level of a categorical
variable in the dataset. Below is "dataframe6":
Male Married 50 0
Female Married 45 1
Female Unmarried 25 0
Male Unmarried 21 0
Male Unmarried 26 1
Female Married 30 1
Female Unmarried 18 0
After creating the dummy variable for "Gender", the dataset looks like below.
Gender_Male Gender_Female Marital Status Age Has_Dependents
1 0 Married 50 0
0 1 Married 45 1
0 1 Unmarried 25 0
1 0 Unmarried 21 0
1 0 Unmarried 26 1
0 1 Married 30 1
0 1 Unmarried 18 0
dummies::dummy.data.frame(dataframe6,names=c("Gender"))
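If the dummies package is not installed, base R can produce the same kind of indicator columns. The sketch below swaps in model.matrix as an alternative technique, not the call shown above, and rebuilds only the Gender and Age columns; the renaming step is an illustrative choice:

```r
dataframe6 <- data.frame(
  Gender = c("Male", "Female", "Female", "Male", "Male", "Female", "Female"),
  Age = c(50, 45, 25, 21, 26, 30, 18),
  stringsAsFactors = FALSE
)

# "- 1" removes the intercept so that every level gets its own 0/1 column.
dummy_cols <- model.matrix(~ Gender - 1, data = dataframe6)

# Rename GenderFemale / GenderMale to Gender_Female / Gender_Male.
colnames(dummy_cols) <- sub("^Gender", "Gender_", colnames(dummy_cols))

result <- cbind(as.data.frame(dummy_cols), dataframe6["Age"])
print(result)
```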
Q 21) We wish to calculate the correlation between "Column 2" and "Column
3" of "dataframe7". Which of the below code will achieve the purpose?
Column1 Column2 Column3 Column4 Column5 Column6
Name8 Male 42 84 54 0 Mu
cor(dataframe7$column2,dataframe7$column3)
(cov(dataframe7$column2,dataframe7$column3))/(sd(dataframe7$column4)*sd(dataframe7$column3))
(cov(dataframe7$column2,dataframe7$column3))/(var(dataframe7$column4)*var(dataframe7$column3))
All of the above
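Pearson correlation equals the covariance divided by the product of the two standard deviations of the same pair of variables. A quick numerical check with made-up vectors x and y:

```r
# Hypothetical numeric columns.
x <- c(12, 56, 45, 63, 12)
y <- c(46, 135, 698, 12, 457)

r_builtin <- cor(x, y)

# Covariance scaled by the standard deviations of the same two variables.
r_manual <- cov(x, y) / (sd(x) * sd(y))

print(all.equal(r_builtin, r_manual))
```

Dividing by variances, or by the standard deviation of a different column, does not reproduce cor() in general.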
Name8 Male 42 84 54 0 Mu
dataframe8$Column3[which(dataframe8$Column3==NA)]=mean(dataframe8$Column3)
dataframe8$Column3[which(is.na(dataframe8$Column3))]=mean(dataframe8$Column3)
dataframe8$Column3[which(is.na(dataframe8$Column3))]=mean(dataframe8$Column3,na.rm=TRUE)
dataframe8$Column3[which(is.na(dataframe8$Column3))]=mean(dataframe8$Column3,rm.na=TRUE)
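The four lines above differ in how the missing entries are located and in how the mean is computed. A sketch with a made-up vector shows why na.rm = TRUE matters and what happens with the misspelled rm.na:

```r
# Hypothetical column with one missing value.
column3 <- c(12, NA, 45, 63)

# Without na.rm the mean itself is NA.
plain_mean <- mean(column3)

# rm.na is not an argument of mean, so it is ignored and the result is still NA.
misspelled <- mean(column3, rm.na = TRUE)

# na.rm = TRUE drops the NA before averaging: (12 + 45 + 63) / 3.
clean_mean <- mean(column3, na.rm = TRUE)

# Replace the missing entries with that mean.
filled <- column3
filled[is.na(filled)] <- clean_mean
print(filled)
```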
Q 23) "Column7" contains some names with the salutations. In such cases, it
is always advisable to extract salutations in a new column since they can
provide more information to our predictive model. Your work is to choose the
code that cannot extract the salutations out of names in "Column7" and store
the salutations in "Column8".
Column1 Column2 Column3 Column4 Column5 Column6 Column7
dataframe9$Column8<-sapply(strsplit(as.character(dataframe9$Column7),split = "[.]"),function(x){x[1]})
dataframe9$Column8<-sapply(strsplit(as.character(dataframe9$Column7),split = "."),function(x){x[1]})
dataframe9$Column8<-sapply(strsplit(as.character(dataframe9$Column7),split = ".",fixed=TRUE),function(x){x[1]})
dataframe9$Column8<-unlist(strsplit(as.character(dataframe9$Column7),split = ".",fixed=TRUE))[seq(1,18,2)]
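By default strsplit treats split as a regular expression, in which "." matches any character. A sketch with two made-up names in the "Salutation.Name" style shows the difference:

```r
# Hypothetical names with salutations.
names_col <- c("Mr.Sam", "Ms.Lilly")

# "[.]" is a character class that matches a literal dot.
by_class <- sapply(strsplit(names_col, split = "[.]"), function(x) x[1])

# fixed = TRUE also treats "." literally.
by_fixed <- sapply(strsplit(names_col, split = ".", fixed = TRUE), function(x) x[1])

# A bare "." is a regex wildcard: every character is a separator,
# so only empty strings are left.
by_regex <- sapply(strsplit(names_col, split = "."), function(x) x[1])

print(by_class)
print(by_regex)
```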
as.Date(as.character(dataframe10$Column3),format="%d%m%Y")
as.Date(dataframe10$Column3,format="%d%m%Y")
as.Date(as.character(dataframe10$Column3),format="%d%m%y")
as.Date(as.character(dataframe10$column3),format="%d/%B/%Y")
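The format string has to mirror the layout of the text being parsed: %d is the day, %m the month number, and %Y the four-digit year. A sketch assuming the column holds values such as 24081997 (day, month, four-digit year with no separators); the variable names are illustrative:

```r
# A value stored as a number, as it would be after reading the file.
raw_value <- 24081997

# Convert to text first, then parse as day, month, four-digit year.
parsed <- as.Date(as.character(raw_value), format = "%d%m%Y")
print(parsed)
```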
Q 25) Some ML algorithms work very well with normalized data. Your task is
to convert the "Column2" in the dataframe shown below into a normalised
one. Which of the following code would not achieve that? The normalised
column should be stored in a column named "Column8".
Column1 Column2 Column3 Column4 Column5 Column6 Column7
dataframe11$Column8<-(dataframe11$Column2-mean(dataframe11$Column2))/sd(dataframe11$Column2)
dataframe11$Column8<-scale(dataframe11$Column2)
dataframe11$Column8<-normalizecolumn(dataframe11$Column2)
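Standardising by hand and calling scale() give the same z-scores. A sketch with a made-up numeric column:

```r
# Hypothetical values for Column2.
column2 <- c(12, 56, 45, 63, 12)

# Manual z-score: subtract the mean, divide by the standard deviation.
manual <- (column2 - mean(column2)) / sd(column2)

# scale() returns a one-column matrix; as.vector drops the matrix shape.
scaled <- as.vector(scale(column2))

print(all.equal(manual, scaled))
```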
write.csv("result.csv", dataframe12)
write.csv(file="result.csv",x=dataframe12,row.names = FALSE)
Both 2 and 3
2000
1000
1999
1998
Q 28) The dataset has been stored in a variable named "dataframe13". We
wish to see the location (index) of all those persons who have “Ms” in their
names (Column7). Which of the following code will not help us achieve that?
Column1 Column2 Column3 Column4 Column5 Column6 Column7
grep(pattern="Ms",x=dataframe13$Column7)
grep(pattern="ms",x=dataframe13$Column7, ignore.case=T)
grep(pattern="Ms",x=dataframe13$Column7,fixed=T)
grep(pattern="ms",x=dataframe13$Column7,ignore.case=T,fixed=T)
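When fixed = TRUE is set, grep ignores ignore.case (with a warning) and matches the pattern literally. A sketch with made-up names; suppressWarnings is used only to keep the output quiet:

```r
# Hypothetical values for Column7.
column7 <- c("Mr.Sam", "Ms.Lilly", "Mr.Tom", "Ms.Ann")

exact <- grep(pattern = "Ms", x = column7)
no_case <- grep(pattern = "ms", x = column7, ignore.case = TRUE)
literal <- grep(pattern = "Ms", x = column7, fixed = TRUE)

# ignore.case is dropped when fixed = TRUE, so lower-case "ms" finds nothing here.
both_flags <- suppressWarnings(
  grep(pattern = "ms", x = column7, ignore.case = TRUE, fixed = TRUE)
)

print(exact)
print(both_flags)
```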
Q 29) The data below has been stored in "dataframe14". We wish to find and
replace all the instances of "Male" in "Column1" with "Man". Which of the
following code will help us do that?
Column1 Column2 Column3 Column4 Column5 Column6 Column7
Name1 Male 12 24081997 54 0 Alpha Mr.Sam
sub("Male","Man",dataframe14$Column1)
gsub("Male","Man",dataframe14$Column1)
dataframe14$Column1[which(dataframe14$Column1=="Male")]="Man"
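sub and gsub return a new vector and leave their input unchanged, so the result has to be assigned back to take effect. A sketch with a made-up column:

```r
# Hypothetical values for Column1.
column1 <- c("Male", "Female", "Male")

# gsub replaces every match in each element; sub replaces only the first match per element.
replaced <- gsub("Male", "Man", column1)
print(replaced)

# The original vector is untouched until the result is assigned back.
print(column1)

# In-place replacement through indexing.
in_place <- column1
in_place[in_place == "Male"] <- "Man"
```

The match is case-sensitive, so "Female" is left alone.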
Q 30) Which of the following commands will display the class of each column
of the following dataframe?
Column1 Column2 Column3 Column4 Column5 Column6 Column7
lapply(dataframe,class)
sapply(dataframe,class)
Both 1 and 2
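lapply and sapply apply class to each column; they differ only in the shape of the result. A sketch with a small made-up data frame (df is an illustrative name):

```r
df <- data.frame(a = 1:2, b = c("x", "y"), stringsAsFactors = FALSE)

# lapply returns a list with one entry per column.
as_list <- lapply(df, class)

# sapply simplifies the same result to a named character vector.
as_vector <- sapply(df, class)

print(as_vector)
```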