R Programming Interview Questions-1

Uploaded by

sherlimca

This skill test is designed to evaluate your competence and comfort using R in data
science. The focus is on basic yet essential data science tasks such as data
manipulation, data cleaning, and data loading. You have 2 hours to complete the test.
Because of the time limit, the questions are kept fairly simple, yet they might make you
think twice.

Considering R has numerous packages and base functions to perform a particular task,
this test will make you familiar with popular ways of performing data science tasks in R.

Total number of questions: 55.

Q 1) Two vectors X and Y are defined as X <- c(3, 2, 4) and Y <- c(1, 2). What
will be the output of vector Z, defined as Z <- X*Y?

3,4,0

3,4,4

error

3,4,8
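
Q 1 turns on R's vector recycling rule. As a minimal sketch for checking your reasoning (the suppressWarnings() call is an addition here, used only so the example runs quietly), the shorter vector is repeated until it matches the length of the longer one:

```r
X <- c(3, 2, 4)
Y <- c(1, 2)

# Y is recycled to c(1, 2, 1). R also emits a warning here because
# length(X) is not a multiple of length(Y).
Z <- suppressWarnings(X * Y)
print(Z)  # 3 4 4
```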

Q 2) You want to find all the values in c(1, 3, 5, 7, 10) that are not in c(1,
5, 10, 12, 14). Which R code can be used to do this?

setdiff(c(1,3,5,7,10),c(1,5,10,12,14))

diff(c(1,3,5,7,10),c(1,5,10,12,14))

unique(c(1,3,5,7,10),c(1,5,10,12,14))

None of the Above.
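
A small sketch of the set-difference idea behind Q 2 follows. The variable names are illustrative, and the %in% filter is shown only as an equivalent way to verify the result, not as one of the quiz options:

```r
a <- c(1, 3, 5, 7, 10)
b <- c(1, 5, 10, 12, 14)

only_in_a   <- setdiff(a, b)    # elements of a that do not appear in b
same_result <- a[!(a %in% b)]   # equivalent filter built with %in%

print(only_in_a)  # 3 7
```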

Q 3) What is the output of f(2) ?


b <- 4
f <- function(a) {
  b <- 3
  b^3 + g(a)
}
g <- function(a) {
  a * b
}

33

35

37

31
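
Q 3 depends on lexical scoping: a function looks up free variables in the environment where it was defined, not where it was called. The sketch below repeats the question's code with comments so the lookup can be traced step by step:

```r
b <- 4                 # global b

g <- function(a) {
  a * b                # g has no local b, so it finds the global b (4)
}

f <- function(a) {
  b <- 3               # local to f; g never sees this value
  b^3 + g(a)
}

result <- f(2)         # 3^3 + 2 * 4
print(result)          # 35
```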

Q 4) The data shown below is from a csv file. Which of the following
commands can read this csv file as a dataframe into R?

Male 25.5 0

Female 35.6 1

Female 12.03 0

Female 11.30 0

Male 65.46 1


read.csv("Table1.csv")

read.csv("Table1.csv",header=FALSE)

read.table("Table1.csv")

read.csv2("Table1.csv",header=FALSE)

Q 5) The missing values in the data shown from a csv file have been
represented by "?". Which of the following commands will read this csv file
correctly into R?
A 10 Sam

B ? Peter

C 30 Harry

D 40 ?

E 50 Mark


read.csv("Table2.csv")

read.csv("Table2.csv",header=FALSE, strings.na="?")

read.csv2("Table2.csv",header=FALSE,sep=",",na.strings="?")

read.table("Table2.csv")
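
The header and na.strings arguments that Q 4 and Q 5 rely on can be tried without a file on disk. This is a minimal sketch that assumes the data are comma-separated (the source shows the columns separated by spaces) and uses the text argument of read.csv() in place of "Table2.csv":

```r
# Inline stand-in for the file contents (comma separation is an assumption).
csv_text <- "A,10,Sam\nB,?,Peter\nC,30,Harry\nD,40,?\nE,50,Mark"

# header = FALSE: the first line is data, not column names.
# na.strings = "?": treat the literal "?" as a missing value.
df <- read.csv(text = csv_text, header = FALSE, na.strings = "?")

print(df)
print(sum(is.na(df)))  # 2
```

With na.strings set, the second column is read as numeric instead of character, because "?" no longer forces the column to text.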

Q 6) The table shown below from a "Train3.csv" file has row names as well as
column names.
Column 1 Column 2 Column 3

Row 1 15.5 14.12 69.5

Row 2 18.6 56.23 52.4

Row 3 21.4 47.02 63.21

Row 4 36.1 56.63 36.12

Which of the following code can read this csv file properly into R?

read.delim("Train3.csv",header=T,sep=",",row.names=1)

read.csv2("Train3.csv",header=TRUE,row.names=TRUE)

read.table("Train3.csv",header=TRUE,sep=",")

read.csv("Train3.csv",row.names=TRUE,header=TRUE,sep=",")

Q 7) Which of the following code will fail to read the first two rows of the csv
file?
Column 1 Column 2 Column 3

Row 1 15.5 14.12 69.5

Row 2 18.6 56.23 52.4

Row 3 21.4 47.02 63.21

Row 4 36.1 56.63 36.12


read.csv("Table3.csv",header=TRUE,row.names=1,sep=",",nrows=2)

read.csv("Table3.csv",row.names=1,nrows=2)

read.delim2("Table3.csv",header=T,row.names=1,sep=",",nrows=2)

read.table("Table3.csv",header=TRUE,row.names=1,sep=",",skip.last=2)

Q 8) Which of the following code will read only the second and the third
column (Column 2 and Column 3) into R?
Column 1 Column 2 Column 3

Row 1 15.5 14.12 69.5

Row 2 18.6 56.23 52.4

Row 3 21.4 47.02 63.21

Row 4 36.1 56.63 36.12


read.table("Table3.csv",header=T,row.names=1,sep=",",colClasses=c("character","NULL",NA,NA))

read.csv("Table3.csv",header=TRUE,row.names=1,sep=",",colClasses=c("character","NULL","NA","NA"))

read.csv("Table3.csv",row.names=1,colClasses=c("Null",na,na))

read.csv("Table3.csv",row.names=T, colClasses=TRUE)

Q 9) Below is a data frame which has already been read into R and stored in a
variable named "dataframe1". Which of the following commands will produce a
summary (mean, mode, median, etc., if applicable) of the entire data set in a
single line of code?

V1 V2 V3

1 Male 12.5 46

2 Female 56 135

3 Male 45 698

4 Female 63 12

5 Male 12.36 230

6 Male 25.23 456

7 Female 12 457

summary(dataframe1)

stats(dataframe1)

summarize(dataframe1)

summarise(dataframe1)

Q 10) "dataframe2" has been read into R properly with missing values labelled
as NA. Which of the following code will return the total number of missing
values in the dataframe?
A 10 Sam

B NA Peter

C 30 Harry

D 40 NA

E 50 Mark

table(dataframe2==NA)

table(is.na(dataframe2))

table(hasNA(dataframe2))

which(is.na(dataframe2))

Q 11) Which of the following code will not return the number of missing
values in each column?
A 10 Sam

B NA Peter

C 30 Harry

D 40 NA

E 50 Mark


colSums(is.na(dataframe2))

apply(is.na(dataframe2),2,sum)

sapply(dataframe2,function(x) sum(is.na(x)))

table(is.na(dataframe2))
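
To see the difference between a total count and a per-column count of missing values (Q 10 and Q 11), here is a minimal sketch. The column names id, score, and name are hypothetical, since the source table has no header:

```r
df <- data.frame(
  id    = c("A", "B", "C", "D", "E"),
  score = c(10, NA, 30, 40, 50),
  name  = c("Sam", "Peter", "Harry", NA, "Mark")
)

total_na <- sum(is.na(df))       # one number for the whole data frame
per_col  <- colSums(is.na(df))   # one count per column
na_table <- table(is.na(df))     # FALSE/TRUE counts over all cells, not per column

print(total_na)
print(per_col)
print(na_table)
```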

Q 12) The data shown below has been loaded into R in a variable named
"dataframe3". The first row of data represents column names. The powerful
data manipulation package ‘dplyr’ has been loaded.
Gender Marital Status Age Dependents

Male Married 50 2

Female Married 45 5

Female Unmarried 25 0

Male Unmarried 21 0

Male Unmarried 26 1

Female Married 30 2

Female Unmarried 18 0

Which of the following code can select only the rows for which Gender is
"Male"?

subset(dataframe3, Gender="Male")

subset(dataframe3, Gender=="Male")

filter(dataframe3,Gender=="Male")

Both 2 and 3

Q 13) Which of the following code can select the data of married females
only?
Gender Marital Status Age Dependents

Male Married 50 2

Female Married 45 5

Female Unmarried 25 0

Male Unmarried 21 0

Male Unmarried 26 1

Female Married 30 2

Female Unmarried 18 0


subset(dataframe3,Gender=="Female" & Marital Status=="Married")

filter(dataframe3, Gender=="Female" , Marital Status=="Married")

Only 1

Both 1 and 2

Q 14) Which of the following code can select only the "Age" and "Dependents"
columns?
Gender Marital Status Age Dependents

Male Married 50 2

Female Married 45 5

Female Unmarried 25 0

Male Unmarried 21 0

Male Unmarried 26 1

Female Married 30 2

Female Unmarried 18 0


subset(dataframe3, select=c("Age","Dependents"))

select(dataframe3, Age,Dependents)

dataframe3[,c("Age","Dependents")]

All of the above

Q 15) Which of the following code will convert the class of the "Dependents"
variable to a factor class?
Gender Marital Status Age Dependents

Male Married 50 2

Female Married 45 5

Female Unmarried 25 0

Male Unmarried 21 0

Male Unmarried 26 1

Female Married 30 2

Female Unmarried 18 0


dataframe3$Dependents=as.factor(dataframe3$Dependents)

dataframe3[,"Dependents"]=as.factor(dataframe3[,"Dependents"])

transform(dataframe3,Dependents=as.factor(Dependents))

All of the Above

Q 16) Which of the following code can calculate the mean age of females?
Gender Marital Status Age Dependents

Male Married 50 2

Female Married 45 5

Female Unmarried 25 0

Male Unmarried 21 0

Male Unmarried 26 1

Female Married 30 2

Female Unmarried 18 0


dataframe3%>%filter(Gender=="Female")%>%summarise(mean(Age))

mean(dataframe3$Age[which(dataframe3$Gender=="Female")])

mean(dataframe3$Age,dataframe3$Female)

Both 1 and 2
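
The group mean in Q 16 can be checked with base R alone. This sketch rebuilds only the Gender and Age columns from the table; tapply() is shown as an additional, swapped-in way to get every group mean at once and is not one of the quiz options:

```r
df <- data.frame(
  Gender = c("Male", "Female", "Female", "Male", "Male", "Female", "Female"),
  Age    = c(50, 45, 25, 21, 26, 30, 18)
)

# Logical indexing keeps only the ages of rows where Gender is "Female".
female_mean <- mean(df$Age[df$Gender == "Female"])

# Mean age for every gender in one call.
by_gender <- tapply(df$Age, df$Gender, mean)

print(female_mean)  # 29.5
print(by_gender)
```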

Q 17) The data shown below has been read into R and stored in a dataframe
named "dataframe4". It is given that "Has_Dependents" column is read as a
factor variable. We wish to convert this variable to numeric class. Which code
will help us achieve this?
Gender Marital Status Age Has_Dependents

Male Married 50 0

Female Married 45 1

Female Unmarried 25 0

Male Unmarried 21 0

Male Unmarried 26 1

Female Married 30 1

Female Unmarried 18 0


dataframe4$Has_Dependents=as.numeric(dataframe4$Has_Dependents)

dataframe4[,"Has_Dependents"]=as.numeric(as.character(dataframe4$Has_Dependents))

transform(dataframe4,Has_Dependents=as.numeric(Has_Dependents))

All of the above
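
Converting a factor to numeric (Q 17) has a well-known pitfall: as.numeric() applied directly to a factor returns the internal level codes, not the labels. A minimal sketch using the Has_Dependents values from the table:

```r
has_dep <- factor(c(0, 1, 0, 0, 1, 1, 0))

codes  <- as.numeric(has_dep)                 # level codes:     1 2 1 1 2 2 1
values <- as.numeric(as.character(has_dep))   # original labels: 0 1 0 0 1 1 0

print(codes)
print(values)
```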

Q 18) There are two dataframes stored in two respective variables named
"dataframe1" and "dataframe2".

Dataframe1 Dataframe2

Feature1 Feature2 Feature3 Feature1 Feature2 Feature3

A 1000 25.5 E 5000 65.5

B 2000 35.5 F 6000 75.5

C 3000 45.5 G 7000 85.5

D 4000 55.5 H 8000 95.5

Which of the following code will produce the output as shown below?

Feature1 Feature2 Feature3


A 1000 25.5

B 2000 35.5

C 3000 45.5

D 4000 55.5

E 5000 65.5

F 6000 75.5

G 7000 85.5

H 8000 95.5


merge(dataframe1,dataframe2,all=TRUE)

merge(dataframe1,dataframe2)

merge(dataframe1,dataframe2,by=intersect(names(x),names(y)))

None of the above
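
The two data frames in Q 18 can be rebuilt to compare how merge() behaves with and without all=TRUE. The names df1 and df2 are illustrative, and rbind() is included only as a familiar alternative for stacking rows; it is not one of the quiz options:

```r
df1 <- data.frame(
  Feature1 = c("A", "B", "C", "D"),
  Feature2 = c(1000, 2000, 3000, 4000),
  Feature3 = c(25.5, 35.5, 45.5, 55.5)
)
df2 <- data.frame(
  Feature1 = c("E", "F", "G", "H"),
  Feature2 = c(5000, 6000, 7000, 8000),
  Feature3 = c(65.5, 75.5, 85.5, 95.5)
)

outer_join <- merge(df1, df2, all = TRUE)  # keeps unmatched rows from both sides
inner_join <- merge(df1, df2)              # keeps only rows present in both
stacked    <- rbind(df1, df2)              # simple row-wise stacking

print(nrow(outer_join))  # 8
print(nrow(inner_join))  # 0
print(nrow(stacked))     # 8
```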

Q 19) Which of the following code will create a new column named "Size(MB)"
from the existing "Size(KB)" column? The dataframe is stored in a variable
named "dataframe5" (Given 1MB = 1024KB).
Package Name Creator Size(KB)

Swirl Sean Kross 2568

Ggplot Hadley Wickham 5463

Dplyr Hadley Wickham 8961

Lattice Deepayan Sarkar 3785


dataframe5$Size(MB)=dataframe$Size(KB)/1024

dataframe5$Size(KB)=dataframe$Size(KB)/1024

dataframe5%>%mutate(Size(MB)=Size(KB)/1024)

Both 1 and 3

Q 20) Certain algorithms like XGBoost work only with numerical data. In that
case, categorical variables present in the dataset need to be converted to dummy
variables, which represent the presence or absence of a level of a categorical
variable in the dataset. Below is "dataframe6":

Gender Marital Status Age Has_Dependents

Male Married 50 0

Female Married 45 1

Female Unmarried 25 0

Male Unmarried 21 0

Male Unmarried 26 1

Female Married 30 1

Female Unmarried 18 0

After creating the dummy variable for "Gender", the dataset looks like below.

Gender_Male Gender_Female Marital Status Age Has_Dependents

1 0 Married 50 0

0 1 Married 45 1

0 1 Unmarried 25 0

1 0 Unmarried 21 0

1 0 Unmarried 26 1

0 1 Married 30 1

0 1 Unmarried 18 0

Which of the following command would have helped us to achieve this?



dummies::dummy.data.frame(dataframe6,names=c("Gender"))

dataframe6[,"Gender"] <- split(dataframe6$Gender, ifelse(dataframe6$Gender == "Male",0,1))

contrasts(dataframe6$Gender) <- contr.treatment(2)

None of the above

Q 21) We wish to calculate the correlation between "Column 2" and "Column
3" of "dataframe7". Which of the below code will achieve the purpose?
Column1 Column2 Column3 Column4 Column5 Column6

Name1 Male 12 24 54 0 Alpha

Name2 Female 16 32 51 1 Beta

Name3 Male 52 104 32 0 Gamma

Name4 Female 36 72 84 1 Delta

Name5 Female 45 90 32 0 Phi

Name6 Male 12 24 12 0 Zeta

Name7 Female 32 64 64 1 Sigma

Name8 Male 42 84 54 0 Mu

Name9 Male 56 112 31 1 Eta


cor(dataframe7$column2,dataframe7$column3)

(cov(dataframe7$column2,dataframe7$column3))/
(sd(dataframe7$column4)*sd(dataframe7$column3))

(cov(dataframe7$column2,dataframe7$column3))/
(var(dataframe7$column4)*var(dataframe7$column3))

All of the above

Q 22) "Column 3" has 2 missing values represented as NA in "dataframe8". We
wish to impute the missing values using the mean of "Column 3". Which code
will help us do that?
Column1 Column2 Column3 Column4 Column5 Column6

Name1 Male 12 24 54 0 Alpha

Name2 Female 16 32 51 1 Beta

Name3 Male 52 104 32 0 Gamma

Name4 Female 36 72 84 1 Delta

Name5 Female 45 NA 32 0 Phi

Name6 Male 12 24 12 0 Zeta

Name7 Female 32 NA 64 1 Sigma

Name8 Male 42 84 54 0 Mu

Name9 Male 56 112 31 1 Eta


dataframe8$Column3[which(dataframe8$Column3==NA)]=mean(dataframe8$Column3)

dataframe8$Column3[which(is.na(dataframe8$Column3))]=mean(dataframe8$Column3)

dataframe8$Column3[which(is.na(dataframe8$Column3))]=mean(dataframe8$Column3,na.rm=TRUE)

dataframe8$Column3[which(is.na(dataframe8$Column3))]=mean(dataframe8$Column3,rm.na=TRUE)
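
Mean imputation (Q 22) hinges on how mean() treats NA. The sketch below uses the Column3 values from the table as a plain vector; the names col3, col_mean, and imputed are illustrative:

```r
col3 <- c(24, 32, 104, 72, NA, 24, NA, 84, 112)

plain_mean <- mean(col3)                 # NA: missing values propagate by default
col_mean   <- mean(col3, na.rm = TRUE)   # mean of the 7 observed values

imputed <- col3
imputed[is.na(imputed)] <- col_mean      # fill only the missing positions

print(plain_mean)
print(col_mean)
print(imputed)
```

Note that a comparison such as col3 == NA yields NA for every element, which is why is.na() is needed to locate the missing positions.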

Q 23) "Column7" contains some names with the salutations. In such cases, it
is always advisable to extract salutations in a new column since they can
provide more information to our predictive model. Your work is to choose the
code that cannot extract the salutations out of names in "Column7" and store
the salutations in "Column8".
Column1 Column2 Column3 Column4 Column5 Column6 Column7

Name1 Male 12 24 54 0 Alpha Mr.Sam

Name2 Female 16 32 51 1 Beta Ms.Lilly

Name3 Male 52 104 32 0 Gamma Mr.Mark

Name4 Female 36 72 84 1 Delta Ms.Shae

Name5 Female 45 NA 32 0 Phi Ms.Ria

Name6 Male 12 24 12 0 Zeta Mr.Patrick

Name7 Female 32 NA 64 1 Sigma Ms.Rose

Name8 Male 42 84 54 0 Mu Mr.Peter

Name9 Male 56 112 31 1 Eta Mr.Roose


dataframe9$Column8<-sapply(strsplit(as.character(dataframe9$Column7),split =
"[.]"),function(x){x[1]})

dataframe9$Column8<-sapply(strsplit(as.character(dataframe9$Column7),split =
"."),function(x){x[1]})

dataframe9$Column8<-sapply(strsplit(as.character(dataframe9$Column7),split =
".",fixed=TRUE),function(x){x[1]})

dataframe9$Column8<-unlist(strsplit(as.character(dataframe9$Column7),split =
".",fixed=TRUE))[seq(1,18,2)]

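The options in Q 23 differ mainly in how strsplit() interprets the split pattern. A minimal sketch on three of the names (the vector full_names is illustrative) shows the three interpretations side by side:

```r
full_names <- c("Mr.Sam", "Ms.Lilly", "Mr.Mark")

# "[.]" is a regex character class that matches a literal dot.
by_class <- sapply(strsplit(full_names, split = "[.]"), function(x) x[1])

# fixed = TRUE treats "." as a plain string rather than a regex.
by_fixed <- sapply(strsplit(full_names, split = ".", fixed = TRUE), function(x) x[1])

# An unescaped "." is a regex wildcard, so every character acts as a
# delimiter and the first piece of each split is an empty string.
by_wildcard <- sapply(strsplit(full_names, split = "."), function(x) x[1])

print(by_class)
print(by_fixed)
print(by_wildcard)
```
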
Q 24) "Column3" in the dataframe shown below is supposed to contain dates
in ddmmyyyy format but, as you can see, there is some problem with its
format. Which of the following code can convert the values present in
"Column3" into date format?
Column1 Column2 Column3 Column4 Column5 Column6 Column7

Name1 Male 12 24081997 54 0 Alpha Mr.Sam

Name2 Female 16 30062001 51 1 Beta Ms.Lilly

Name3 Male 52 10041998 32 0 Gamma Mr.Mark


Name4 Female 36 17021947 84 1 Delta Ms.Shae

Name5 Female 45 15031965 32 0 Phi Ms.Ria

Name6 Male 12 24111989 12 0 Zeta Mr.Patrick

Name7 Female 32 26052015 64 1 Sigma Ms.Rose

Name8 Male 42 18041999 54 0 Mu Mr.Peter

Name9 Male 56 11021994 31 1 Eta Mr.Roose


as.Date(as.character(dataframe10$Column3),format="%d%m%Y")

as.Date(dataframe10$Column3,format="%d%m%Y")

as.Date(as.character(dataframe10$Column3),format="%d%m%y")

as.Date(as.character(dataframe10$column3),format="%d/%B/%Y")
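
Parsing ddmmyyyy values (Q 24) can be sketched as follows, assuming the column was read as numbers. The vector raw is illustrative and holds the first three values from the table:

```r
raw <- c(24081997, 30062001, 10041998)   # values as they might be read from the file

# Convert to character first so the format string can be applied:
# %d = day, %m = month as a number, %Y = four-digit year.
dates <- as.Date(as.character(raw), format = "%d%m%Y")

print(dates)
```

One caveat worth keeping in mind: a day that starts with 0 would lose that digit if the value is stored as a number, so reading such a column as character is the safer choice.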

Q 25) Some ML algorithms work very well with normalized data. Your task is
to convert the "Column2" in the dataframe shown below into a normalised
one. Which of the following code would not achieve that? The normalised
column should be stored in a column named "Column8".
Column1 Column2 Column3 Column4 Column5 Column6 Column7

Name1 Male 12 24081997 54 0 Alpha Mr.Sam

Name2 Female 16 30062001 51 1 Beta Ms.Lilly

Name3 Male 52 10041998 32 0 Gamma Mr.Mark

Name4 Female 36 17021947 84 1 Delta Ms.Shae

Name5 Female 45 15031965 32 0 Phi Ms.Ria

Name6 Male 12 24111989 12 0 Zeta Mr.Patrick

Name7 Female 32 26052015 64 1 Sigma Ms.Rose

Name8 Male 42 18041999 54 0 Mu Mr.Peter

Name9 Male 56 11021994 31 1 Eta Mr.Roose



dataframe11$Column8<-(dataframe11$Column2-mean(dataframe11$Column2))/
sd(dataframe11$Column2)

dataframe11$Column8<-scale(dataframe11$Column2)

dataframe11$Column8<-normalizecolumn(dataframe11$Column2)

All of the Above
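
For Q 25, the manual z-score formula and scale() can be compared directly. This sketch uses the Column2 values as a plain vector; the names col2, manual, and scaled are illustrative:

```r
col2 <- c(12, 16, 52, 36, 45, 12, 32, 42, 56)

manual <- (col2 - mean(col2)) / sd(col2)   # z-score computed by hand
scaled <- scale(col2)                      # one-column matrix with centering/scaling attributes

# Dropping the matrix structure makes the two results directly comparable.
same <- isTRUE(all.equal(manual, as.numeric(scaled)))
print(same)
```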

Q 26) "dataframe12" is the output of a certain task. We wish to save this
dataframe into a csv file named “result.csv”. Which of the following
commands would help us accomplish this task?
Column1 Column2 Column3 Column4 Column5 Column6 Column7

Name1 Male 12 24081997 54 0 Alpha Mr.Sam

Name2 Female 16 30062001 51 1 Beta Ms.Lilly

Name3 Male 52 10041998 32 0 Gamma Mr.Mark

Name4 Female 36 17021947 84 1 Delta Ms.Shae

Name5 Female 45 15031965 32 0 Phi Ms.Ria

Name6 Male 12 24111989 12 0 Zeta Mr.Patrick

Name7 Female 32 26052015 64 1 Sigma Ms.Rose

Name8 Male 42 18041999 54 0 Mu Mr.Peter

Name9 Male 56 11021994 31 1 Eta Mr.Roose


write.csv("result.csv", dataframe12)

write.csv(dataframe12,"result.csv", row.names = FALSE)

write.csv(file="result.csv",x=dataframe12,row.names = FALSE)

Both 2 and 3
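
A write-and-read round trip is a quick way to check a write.csv() call like those in Q 26. In this sketch a temporary file stands in for "result.csv", and the two-column data frame is a hypothetical cut-down version of the table:

```r
df <- data.frame(
  Column1 = c("Male", "Female"),
  Column2 = c(12, 16)
)

out_file <- tempfile(fileext = ".csv")        # stand-in for "result.csv"
write.csv(df, out_file, row.names = FALSE)    # data frame first, then the file path

round_trip <- read.csv(out_file)
print(round_trip)
```

Without row.names = FALSE, the row names would be written as an extra unnamed first column.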

Q 27) What is the length of vector y where y=seq(1,1000,by=0.5)?



2000

1000

1999

1998
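
The length asked for in Q 27 follows from (to - from) / by + 1. A short sketch to confirm the arithmetic:

```r
y <- seq(1, 1000, by = 0.5)

n <- length(y)           # (1000 - 1) / 0.5 + 1
print(n)                 # 1999
print(c(y[1], y[n]))     # first and last elements: 1 and 1000
```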

Q 28) The dataset has been stored in a variable named "dataframe13". We
wish to see the location (index) of all those persons who have “Ms” in their
names (Column7). Which of the following code will not help us achieve that?
Column1 Column2 Column3 Column4 Column5 Column6 Column7

Name1 Male 12 24081997 54 0 Alpha Mr.Sam

Name2 Female 16 30062001 51 1 Beta Ms.Lilly

Name3 Male 52 10041998 32 0 Gamma Mr.Mark

Name4 Female 36 17021947 84 1 Delta Ms.Shae

Name5 Female 45 15031965 32 0 Phi Ms.Ria

Name6 Male 12 24111989 12 0 Zeta Mr.Patrick

Name7 Female 32 26052015 64 1 Sigma Ms.Rose

Name8 Male 42 18041999 54 0 Mu Mr.Peter

Name9 Male 56 11021994 31 1 Eta Mr.Roose

grep(pattern="Ms",x=dataframe13$Column7)

grep(pattern="ms",x=dataframe13$Column7, ignore.case=T)

grep(pattern="Ms",x=dataframe13$Column7,fixed=T)

grep(pattern="ms",x=dataframe13$Column7,ignore.case=T,fixed=T)
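
The grep() variants in Q 28 can be compared on a small vector. The vector titles is illustrative; suppressWarnings() is added because R warns that ignore.case is ignored when fixed = TRUE:

```r
titles <- c("Mr.Sam", "Ms.Lilly", "Mr.Mark", "Ms.Shae")

exact       <- grep("Ms", titles)                        # case-sensitive regex match
nocase      <- grep("ms", titles, ignore.case = TRUE)    # case-insensitive regex match
fixed_exact <- grep("Ms", titles, fixed = TRUE)          # literal, case-sensitive match

# With fixed = TRUE the ignore.case argument is ignored, so the lowercase
# pattern "ms" is matched literally and finds nothing in this vector.
fixed_nocase <- suppressWarnings(
  grep("ms", titles, ignore.case = TRUE, fixed = TRUE)
)

print(exact)
print(fixed_nocase)
```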

Q 29) The data below has been stored in "dataframe14". We wish to find and
replace all the instances of "Male" in "Column1" with "Man". Which of the
following code will help us do that?
Column1 Column2 Column3 Column4 Column5 Column6 Column7

Name1 Male 12 24081997 54 0 Alpha Mr.Sam

Name2 Female 16 30062001 51 1 Beta Ms.Lilly

Name3 Male 52 10041998 32 0 Gamma Mr.Mark

Name4 Female 36 17021947 84 1 Delta Ms.Shae

Name5 Female 45 15031965 32 0 Phi Ms.Ria

Name6 Male 12 24111989 12 0 Zeta Mr.Patrick

Name7 Female 32 26052015 64 1 Sigma Ms.Rose

Name8 Male 42 18041999 54 0 Mu Mr.Peter

Name9 Male 56 11021994 31 1 Eta Mr.Roose


sub("Male","Man",dataframe14$Column1)

gsub("Male","Man",dataframe14$Column1)

dataframe14$Column1[which(dataframe14$Column1=="Male")]="Man"

All of the above
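
For Q 29 it helps to recall two facts: sub() replaces only the first match within each string while gsub() replaces every match, and both return a new vector rather than changing the input. A minimal sketch with illustrative values:

```r
gender <- c("Male", "Female", "Male")

replaced <- gsub("Male", "Man", gender)   # matching is case-sensitive, so "Female" is untouched
print(replaced)
print(gender)                             # the original vector is unchanged

first_only <- sub("a", "X", "banana")     # only the first "a" is replaced
all_found  <- gsub("a", "X", "banana")    # every "a" is replaced
print(c(first_only, all_found))
```

To keep the replacement, the result has to be assigned back to the column.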

Q 30) Which of the following commands will display the class of each column
for the following dataframe?
Column1 Column2 Column3 Column4 Column5 Column6 Column7

Name1 Male 12 24081997 54 0 Alpha Mr.Sam

Name2 Female 16 30062001 51 1 Beta Ms.Lilly

Name3 Male 52 10041998 32 0 Gamma Mr.Mark

Name4 Female 36 17021947 84 1 Delta Ms.Shae

Name5 Female 45 15031965 32 0 Phi Ms.Ria

Name6 Male 12 24111989 12 0 Zeta Mr.Patrick

Name7 Female 32 26052015 64 1 Sigma Ms.Rose


Name8 Male 42 18041999 54 0 Mu Mr.Peter

Name9 Male 56 11021994 31 1 Eta Mr.Roose

lapply(dataframe,class)

sapply(dataframe,class)

Both 1 and 2

None of the above
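
The difference between lapply() and sapply() in Q 30 is the shape of the result, not the classes reported. A minimal sketch on a hypothetical three-column data frame:

```r
df <- data.frame(
  name  = c("Sam", "Lilly"),
  age   = c(12L, 16L),
  score = c(54.5, 51),
  stringsAsFactors = FALSE
)

as_list <- lapply(df, class)   # list with one element per column
as_vec  <- sapply(df, class)   # simplified to a named character vector

print(as_list)
print(as_vec)
```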
