Lab 1 Activities
Lab 1 Activities
Author
The R code we have covered in class is available on the lecture section UM Learn page, under Content >
Course Material.
Question 1:
(a) In R, create a vector (call the vector x) of the five values 2, 5, 1, 4 and 7. Use R’s built-in function
mean() to find the mean of these five values. (We say we are passing in the x vector to the mean
function.)
x.vector <- double(5)
x <- c(2,5,1,4,7)
mean(x)
## [1] 3.8
(b) Suppose we have surveyed four respondents to collect their responses on some variable of interest. Only
three of them reply, so we have a missing value for the fourth person. Missing values are coded as NA
in R. We can create the vector of responses: y = c(2,5,1,NA). What does R tell you is the mean of
this vector y?
y = c(2,5,1,NA)
mean(y)
## [1] NA
(c) We can find the mean of the non-missing values for the vector in part (b) using extra options to R’s
mean function. The extra options are called arguments. To see what arguments are available for a
function, we need to access the help documentation. In the R console, type ?mean or help(mean) and
press enter (or return). The documentation will open up in the bottom right corner of RStudio. Write
the R code that will compute the mean of y, omitting missing values, using the proper argument to the
mean function.
y = c(2,5,1,NA)
mean(y,0,TRUE)
## [1] 2.666667
Notice that typing mean(0.75,0.25) in R will not return 0.5 (the mean of 0.75 and 0.25). That is because
R is interpreting only the value 0.75 as belonging to the data set of values of which you want the mean. It
is interpreting 0.25 as the value of the trim argument that is shown in the help documentation (we will
not be working with the trim argument). In order to get the mean of 0.75 and 0.25, the proper code is
mean(c(0.75,0.25)) or we must store 0.75 and 0.25 in a vector and then find the mean of that vector, as
we did in part (a).
Question 2:
(a) There are two datasets built-in to R named state.area and state.name referring to the 50 U.S. states.
Type these names at the R console to see the data they contain. The datasets are linked, so, for
example, the first component of state.area (which is the value 51609) gives you the area in square
miles of the first state in state.name (which is Alabama).
1
Observing the output of state.name, we see that California is the 5th state. Therefore, we can access
California’s area (in square miles) by accessing the 5th component of state.area:
state.area[5]
## [1] 158693
Write the R code that obtains the area in square miles of Pennsylvania.
state.area[38]
## [1] 45333
(b) A third related dataset is named state.region, for which the order of the components also corresponds
to the order in state.name. Write the R code that obtains the region of the U.S. that the state of
Iowa is in, based on this dataset. (When you extract the appropriate region, you may also get a list of
all regions, which is okay.)
state.region[15]
2
incomes = state.x77[,2]
hist(incomes, col="red")
Histogram of incomes
15
Frequency
10
5
0
incomes
Question 3:
(a) Create a vector in R with 7 components: three TRUEs and four FALSEs. (The order of the TRUEs
and FALSEs does not matter). Pass in this vector to the sum() function and report the output.
R=c(TRUE,TRUE,TRUE,FALSE,FALSE,FALSE,FALSE)
sum(R)
## [1] 3
(b) Create a vector in R with 5 components: two TRUEs and three FALSEs. (The order of the TRUEs
and FALSEs does not matter). Pass in this vector to the sum() function and report the output.
R=c(TRUE,TRUE,FALSE,FALSE,FALSE)
sum(R)
## [1] 2
(c) Create a vector in R with 3 components: one TRUE and two FALSEs. (The order of the TRUEs and
FALSEs does not matter). Pass in this vector to the sum() function and report the output.
R=c(TRUE,FALSE,FALSE)
sum(R)
## [1] 1
(d) Create a vector in R with 4 components: zero TRUEs and all four FALSEs. Pass in this vector to the
sum() function and report the output.
3
R=c(FALSE,FALSE,FALSE,FALSE)
sum(R)
## [1] 0
(e) Based on these results, what does it seem the sum() function is doing when it is given a vector of
TRUEs and FALSEs?
It seems like the sum function is counting all the TRUE and FALSE pairs that can made from the contents
of the vector