CS2610 Final Exam: If Is - Nan Print
1. Describe the most common use of each of the following and show an if statement to test for it in a
variable x. (14 points)
• NaN: "not a number". It most commonly marks the result of an undefined numeric operation, such as 0/0.
if (is.nan(x)) {print("x is NaN")}
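As a brief illustration of where NaN comes from and how it differs from NA, here is a minimal sketch in base R. The variable names are made up for the example. Because is.nan() is vectorized, a vector x needs any() or all() to produce the single logical that if expects.

```r
# NaN results from undefined numeric operations such as 0 / 0
x <- c(1, 0 / 0, NA, sqrt(4))

nan_mask <- is.nan(x)   # TRUE only for NaN
na_mask  <- is.na(x)    # TRUE for both NaN and NA

# is.nan() is vectorized, so collapse to one logical before using if
if (any(nan_mask)) {
  print("x contains NaN")
}
```

The same pattern with all() tests whether every element is NaN.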
2. Show a function that returns a list of two vectors, v1 and v2, holding the three smallest and three largest
values of the numeric vector x. Check that x is numeric and has length at least 3. If it is not, return NA. (17 points)
q2 <- function(x) {
  if (length(x) < 3 || !is.numeric(x)) { return(NA) }
  return(list('v1' = sort(x)[1:3], 'v2' = sort(x, decreasing = TRUE)[1:3]))
}
q2(c(1,6,9,7,2,11,3))
## $v1
## [1] 1 2 3
##
## $v2
## [1] 11 9 7
q2(c(1,2))
## [1] NA
3. Factor questions: (17 points)
• What is a factor in a data frame? A factor is a categorical variable. (2 points)
• What is the main use for factors? Factors are used to represent categorical data. Factors can be ordered
or unordered and are an important class for statistical analysis and for plotting. Factors provide levels
for categorical variables. (2 points)
• Show how to read in a data frame “dataSet” without treating any columns as factors. (2 points)
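# stringsAsFactors must be set when the file is read; wrapping an existing
# data frame in data.frame() does not convert factor columns back to strings
dataSet = read.table("nfl.csv", sep = ",", header = TRUE, stringsAsFactors = FALSE)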
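To make the effect of stringsAsFactors concrete, the sketch below reads the same small table twice and then builds an ordered factor by hand. The inline CSV text and all variable names are invented for illustration and stand in for a real file.

```r
# Inline CSV text stands in for a file on disk (contents are made up)
csv_text <- "team,wins\nA,10\nB,7\nA,12"

as_char <- read.table(text = csv_text, sep = ",", header = TRUE,
                      stringsAsFactors = FALSE)
as_fact <- read.table(text = csv_text, sep = ",", header = TRUE,
                      stringsAsFactors = TRUE)

class(as_char$team)    # "character"
class(as_fact$team)    # "factor"
levels(as_fact$team)   # "A" "B"

# An ordered factor records a ranking among its levels
size <- factor(c("low", "high", "mid"),
               levels = c("low", "mid", "high"), ordered = TRUE)
below_high <- size < "high"
```

Comparison operators such as < are only meaningful for ordered factors; on an unordered factor they return NA with a warning.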
• For a data frame “dataSet”, show how to check that the names of the columns match the names in a
list of strings called “colNames” by writing a function that returns T or F depending on whether they
match. (6 points)
checkNames <- function(dataSet, colNames) {
  # identical() compares length, order, and content, so a shorter
  # colNames is never silently recycled the way it is with ==
  return(identical(colnames(dataSet), colNames))
}
colNames = c("y" , "x1" ,"x2" ,"x3", "x4" ,"x5" ,"x6" ,"x7" ,"x8" ,"x9")
a = c("a", "b")
checkNames(dataSet,colNames)
## [1] TRUE
checkNames(dataSet,a)
## [1] FALSE
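One reason to prefer identical() for this check is that == recycles the shorter vector. The sketch below, with made-up names, shows a case where a sum-of-matches test reports a match even though the two vectors differ in length.

```r
cols      <- c("a", "b", "a", "b")   # hypothetical column names
shortList <- c("a", "b")             # hypothetical names to compare against

# == recycles shortList to length 4, so every comparison is TRUE
recycled <- sum(cols == shortList) == length(cols)

# identical() also compares length, so the mismatch is detected
strict <- identical(cols, shortList)

recycled   # TRUE
strict     # FALSE
```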
• Generate a data frame with n rows and three columns containing random normal samples with mean 2
and sd 7. Name the columns “col1”, “col2” and “col3” in your data frame. (5 points)
generateDataFrame <- function(n, mean = 2, sd = 7) {
  col1 <- rnorm(n, mean, sd)
  col2 <- rnorm(n, mean, sd)
  col3 <- rnorm(n, mean, sd)
  df <- data.frame(col1 = col1, col2 = col2, col3 = col3)
  return(df)
}
generateDataFrame(3)
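As an alternative sketch of the same idea, all three columns can be drawn in one call and reshaped. The seed value and the variable names here are arbitrary choices for the example, not part of the exam answer.

```r
set.seed(42)   # arbitrary seed so the draw is repeatable
n <- 5

# Draw 3 * n samples at once and shape them into n rows by 3 columns
df <- as.data.frame(matrix(rnorm(3 * n, mean = 2, sd = 7), nrow = n))
names(df) <- c("col1", "col2", "col3")

dim(df)   # 5 3
```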
4. Write a function to return the indices of a vector x where 2 ≤ x_i < 5. Do not use any loops.
(17 points)
getIndices <- function(x) {
  if (!is.vector(x) || !is.numeric(x)) {
    return(NA)
  } else {
    return(which(x >= 2 & x < 5))
  }
}
x = c(1,6,"9")
getIndices(x)
## [1] NA
x = c(1:9)
getIndices(x)
## [1] 2 3 4
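The loop-free approach rests on two steps: an element-wise comparison that yields a logical mask, and which() to turn that mask into positions. A small sketch with made-up data shows both steps separately.

```r
x <- c(1, 2, 4.5, 5, 3, 7)

# Element-wise & keeps one logical per element (&& would not)
in_range <- x >= 2 & x < 5

# which() returns the positions of the TRUE entries
idx <- which(in_range)

idx      # 2 3 5
x[idx]   # 2.0 4.5 3.0
```

The value 5 is excluded because the upper bound is strict.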
5. Write a function that returns the product of numeric matrices a and b using two nested for loops. Check
that a and b are numeric matrices of the correct dimensions to multiply or return NA. (18 points)
• Function that returns the product of numeric matrices: (12 points)
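multiplyMatrix <- function(a, b) {
  if (!is.matrix(a) || !is.numeric(a)) {
    return(NA)
  }
  if (!is.matrix(b) || !is.numeric(b)) {
    return(NA)
  }
  if (ncol(a) != nrow(b)) {
    return(NA)
  }
  # Preallocate the result; the name c is avoided because it masks base::c()
  result <- matrix(0, nrow(a), ncol(b))
  for (i in seq_len(nrow(a))) {
    for (j in seq_len(ncol(b))) {
      # Entry (i, j) is the dot product of row i of a and column j of b
      result[i, j] <- sum(a[i, ] * b[, j])
    }
  }
  return(result)
}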
a = matrix(c(1:4), 2 , 2)
a
## [,1] [,2]
## [1,] 1 3
## [2,] 2 4
b = matrix(c(1:6), 2, 3)
b
## [,1] [,2] [,3]
## [1,] 1 3 5
## [2,] 2 4 6
multiplyMatrix(a,b)
## [,1] [,2] [,3]
## [1,] 7 15 23
## [2,] 10 22 34
# Cross-check against the built-in matrix product
a %*% b
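To see what one pass of the inner loop computes, the sketch below works out a single entry of the product by hand and compares it with the built-in operator. The variable names entry_11 and reference are chosen only for this example.

```r
a <- matrix(1:4, 2, 2)
b <- matrix(1:6, 2, 3)

# Entry (1, 1): row 1 of a times column 1 of b, 1*1 + 3*2 = 7
entry_11 <- sum(a[1, ] * b[, 1])

# The built-in product serves as the reference result
reference <- a %*% b
stopifnot(entry_11 == reference[1, 1])

dim(reference)   # 2 3
```

A 2 by 2 matrix times a 2 by 3 matrix gives a 2 by 3 result, which is why the dimension check compares ncol(a) with nrow(b).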
6. Given n by m numeric matrix X and n by 1 numeric vector y, show code to use the function lm and
print only the least squares solution b to y = Xb + e, and nothing else. (17 points)
• There are three ways to solve for b:
vec = rnorm(10,13,1.5)
x = matrix(vec,5,2)
y = c(11:15)
getBetas1 <- function(x, y) {
  if (!is.matrix(x) || !is.vector(y)) {
    return(NA)
  }
  # coef() extracts only the fitted coefficients from the lm object
  print(coef(lm(y ~ x)))
}
getBetas1(x,y)
## (Intercept) x1 x2
## 10.7913138 -0.4761347 0.6193482
getBetas2 <- function(x, y) {
  if (!is.matrix(x) || !is.vector(y)) {
    return(NA)
  }
  x = data.frame(x, y)
  print(coef(lm(y ~ ., x)))
}
getBetas2(x,y)
## (Intercept) X1 X2
## 10.7913138 -0.4761347 0.6193482
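getBetas3 <- function(x, y) {
  if (!is.matrix(x) || !is.vector(y)) {
    return(NA)
  }
  # Prepend a column of ones for the intercept, then solve the normal equations
  x = cbind(1, x)
  b = c(solve(t(x) %*% x) %*% (t(x) %*% y))
  # Name the coefficients for any number of predictor columns
  names(b) = c("(Intercept)", paste0("X", seq_len(ncol(x) - 1)))
  print(b)
}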
getBetas3(x,y)
## (Intercept) X1 X2
## 10.7913138 -0.4761347 0.6193482