Lab 1 Activities

Uploaded by

rishamdeep44

Lab #1 Activities

Author

The R code we have covered in class is available on the lecture section UM Learn page, under Content >
Course Material.
Question 1:
(a) In R, create a vector (call the vector x) of the five values 2, 5, 1, 4 and 7. Use R’s built-in function
mean() to find the mean of these five values. (We say we are passing in the x vector to the mean
function.)
x <- c(2, 5, 1, 4, 7)  # store the five values in a vector named x
mean(x)                # pass x to mean()

## [1] 3.8
(b) Suppose we have surveyed four respondents to collect their responses on some variable of interest. Only
three of them reply, so we have a missing value for the fourth person. Missing values are coded as NA
in R. We can create the vector of responses: y = c(2,5,1,NA). What does R tell you is the mean of
this vector y?
y = c(2,5,1,NA)
mean(y)

## [1] NA
(c) We can find the mean of the non-missing values for the vector in part (b) using extra options to R’s
mean function. The extra options are called arguments. To see what arguments are available for a
function, we need to access the help documentation. In the R console, type ?mean or help(mean) and
press enter (or return). The documentation will open up in the bottom right corner of RStudio. Write
the R code that will compute the mean of y, omitting missing values, using the proper argument to the
mean function.
y = c(2,5,1,NA)
mean(y, na.rm = TRUE)  # na.rm = TRUE drops missing values before averaging

## [1] 2.666667
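As a small sketch of what the na.rm argument does, the named-argument call can be compared with removing the missing value by hand. The name y_observed is only illustrative and is not part of the lab.

```r
y <- c(2, 5, 1, NA)

# Named argument: drop NA values before averaging
mean_with_narm <- mean(y, na.rm = TRUE)

# Manual equivalent: keep only the non-missing components, then average
y_observed <- y[!is.na(y)]
mean_by_hand <- mean(y_observed)

mean_with_narm
mean_by_hand
```

Both calls average the three observed values 2, 5 and 1. Naming the argument also avoids relying on the position of na.rm in the argument list.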
Notice that typing mean(0.75,0.25) in R will not return 0.5 (the mean of 0.75 and 0.25). That is because
R is interpreting only the value 0.75 as belonging to the data set of values of which you want the mean. It
is interpreting 0.25 as the value of the trim argument that is shown in the help documentation (we will
not be working with the trim argument). In order to get the mean of 0.75 and 0.25, the proper code is
mean(c(0.75,0.25)) or we must store 0.75 and 0.25 in a vector and then find the mean of that vector, as
we did in part (a).
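The positional matching described above can be checked directly. This is a minimal sketch; the variable names are illustrative only.

```r
# Second positional argument is matched to trim, so only 0.75 is averaged
trimmed_call <- mean(0.75, 0.25)

# Same call with the argument names written out
explicit_call <- mean(x = 0.75, trim = 0.25)

# Both values combined into one vector, which is what we actually want
vector_call <- mean(c(0.75, 0.25))

trimmed_call
explicit_call
vector_call
```

The first two calls return 0.75 and only the third returns 0.5.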
Question 2:
(a) There are two datasets built into R, named state.area and state.name, that refer to the 50 U.S. states.
Type these names at the R console to see the data they contain. The datasets are linked, so, for
example, the first component of state.area (which is the value 51609) gives you the area in square
miles of the first state in state.name (which is Alabama).

Observing the output of state.name, we see that California is the 5th state. Therefore, we can access
California’s area (in square miles) by accessing the 5th component of state.area:
state.area[5]

## [1] 158693
Write the R code that obtains the area in square miles of Pennsylvania.
state.area[38]

## [1] 45333
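Counting positions by eye is error-prone, so one possible alternative is to let R find the position from the name. This sketch assumes only the built-in state.name and state.area vectors; the name pa_index is illustrative.

```r
# Position of Pennsylvania in state.name
pa_index <- which(state.name == "Pennsylvania")
pa_index

# Use that position to index the linked vector
state.area[pa_index]

# Or index with the logical comparison directly
pa_area <- state.area[state.name == "Pennsylvania"]
pa_area
```

Because the two vectors share the same ordering, the comparison on state.name selects the matching component of state.area.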
(b) A third related dataset is named state.region, for which the order of the components also corresponds
to the order in state.name. Write the R code that obtains the region of the U.S. that the state of
Iowa is in, based on this dataset. (When you extract the appropriate region, you may also get a list of
all regions, which is okay.)
state.region[15]

## [1] North Central
## Levels: Northeast South North Central West
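The "Levels" line appears because state.region is stored as a factor. A short sketch of the same lookup by name, with the result converted to plain text, might look like this (iowa_region is an illustrative name):

```r
# Look up Iowa's region by name rather than by position
iowa_region <- state.region[state.name == "Iowa"]
iowa_region

# A factor prints its full set of levels; as.character() keeps only the label
as.character(iowa_region)

# The set of possible regions
levels(state.region)
```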
(c) A fourth related dataset is named state.x77. Type state.x77 at the R console to see the data. The
data is longer, so you can scroll back up to see the beginning of the data. This data is stored in a
matrix. We will extract the per capita incomes (data is from the year 1974) from this dataset with the
following code. We are extracting the second column of state.x77, where the incomes reside. (The
state names are not in any column; rather, they are the names of the rows.)
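incomes = state.x77[,2]      # column 2 (Income): per capita income, 1974
populations = state.x77[,1]  # column 1 (Population)
# Caution: incomes are already per capita, so this ratio is not a per capita
# income; it is kept only because the output below was produced from it.
perCapita = incomes/populations
perCapita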

##        Alabama         Alaska        Arizona       Arkansas     California
##      1.0024896     17.3013699      2.0479204      1.6009479      0.2412492
##       Colorado    Connecticut       Delaware        Florida        Georgia
##      1.9220779      1.7251613      8.3056995      0.5817325      0.8296492
##         Hawaii          Idaho       Illinois        Indiana           Iowa
##      5.7177419      5.0664207      0.4561043      0.8390740      1.6176162
##         Kansas       Kentucky      Louisiana          Maine       Maryland
##      2.0478070      1.0959551      0.9314241      3.4914934      1.2855410
##  Massachusetts       Michigan      Minnesota    Mississippi       Missouri
##      0.8178535      0.5214576      1.1922979      1.3233661      0.8923851
##        Montana       Nebraska         Nevada  New Hampshire     New Jersey
##      5.8270777      2.9196891      8.7271186      5.2721675      0.7141688
##     New Mexico       New York North Carolina   North Dakota           Ohio
##      3.1477273      0.2712436      0.7121853      7.9858713      0.4248719
##       Oklahoma         Oregon   Pennsylvania   Rhode Island South Carolina
##      1.4670350      2.0402802      0.3751265      4.8958110      1.2908381
##   South Dakota      Tennessee          Texas           Utah        Vermont
##      6.1189427      0.9156482      0.3422407      3.3433084      8.2775424
##       Virginia     Washington  West Virginia      Wisconsin        Wyoming
##      0.9437864      1.3666760      2.0105614      0.9736326     12.1436170
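Since state.x77 is a matrix with named rows and columns, the income column can also be selected by name instead of by position. This is a minimal sketch; incomes_by_name is an illustrative name.

```r
# Column names of the matrix; "Income" is the second one
colnames(state.x77)

# Select the whole income column by name (same values as state.x77[, 2])
incomes_by_name <- state.x77[, "Income"]
head(incomes_by_name)

# Select a single state's income using the row name and the column name
state.x77["Iowa", "Income"]
```

Selecting by name keeps working even if the column order is different from what we expect, and the state names stay attached to the values.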
Create a histogram of the incomes object. Type ?hist at the console to see arguments that can be added
to the hist() function to enhance your histogram. Add at least one enhancement (change the title of the
histogram, an axis label, or the colour of the bars).

incomes = state.x77[,2]
hist(incomes, col="red")

[Figure: histogram titled "Histogram of incomes", with red bars; x-axis "incomes" (3000 to 6500), y-axis "Frequency" (0 to 15).]
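The other enhancements mentioned in the question, the title and the axis label, are controlled by the main and xlab arguments of hist(). The sketch below is one possible version; the title text, label text and colour are illustrative choices, not part of the lab.

```r
incomes <- state.x77[, 2]

# main sets the title, xlab the x-axis label, col the bar colour
h <- hist(incomes,
          main = "Per capita income by state, 1974",
          xlab = "Per capita income (dollars)",
          col = "steelblue")

# hist() also returns the bin breaks and counts it used for the plot
h$breaks
h$counts
```

Storing the result in h is optional; it is done here only so the bin counts behind the bars can be inspected.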

Question 3:
(a) Create a vector in R with 7 components: three TRUEs and four FALSEs. (The order of the TRUEs
and FALSEs does not matter). Pass in this vector to the sum() function and report the output.
R=c(TRUE,TRUE,TRUE,FALSE,FALSE,FALSE,FALSE)
sum(R)

## [1] 3
(b) Create a vector in R with 5 components: two TRUEs and three FALSEs. (The order of the TRUEs
and FALSEs does not matter). Pass in this vector to the sum() function and report the output.
R=c(TRUE,TRUE,FALSE,FALSE,FALSE)
sum(R)

## [1] 2
(c) Create a vector in R with 3 components: one TRUE and two FALSEs. (The order of the TRUEs and
FALSEs does not matter). Pass in this vector to the sum() function and report the output.
R=c(TRUE,FALSE,FALSE)
sum(R)

## [1] 1
(d) Create a vector in R with 4 components: zero TRUEs and all four FALSEs. Pass in this vector to the
sum() function and report the output.

R=c(FALSE,FALSE,FALSE,FALSE)
sum(R)

## [1] 0
(e) Based on these results, what does it seem the sum() function is doing when it is given a vector of
TRUEs and FALSEs?
The sum() function appears to count the number of TRUE values in the vector: R treats each TRUE as 1 and
each FALSE as 0, so the sum equals the number of TRUEs (3, 2, 1 and 0 in parts (a) to (d)).
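The results in parts (a) to (d) are consistent with R converting TRUE to 1 and FALSE to 0 before adding. A brief sketch of that conversion, and of how it can be used to count values that satisfy a condition, is given below; the names flags and n_large are illustrative.

```r
flags <- c(TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE)

# The numeric values used when a logical vector is summed
as.integer(flags)
sum(flags)

# Counting with a condition: how many values of x exceed 3?
x <- c(2, 5, 1, 4, 7)
n_large <- sum(x > 3)
n_large

# The mean of a logical vector is the proportion of TRUEs
mean(x > 3)
```

Here x > 3 produces a logical vector, so sum() counts the values 5, 4 and 7, and mean() gives the proportion of the five values that exceed 3.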
