BES - R Lab 1
Review on Basics of R
1. Objectives
Importing data
Dataframe
Basic descriptive functions
The 'by' function
This review lab is not comprehensive. Students are therefore strongly advised to review all
labs from the Probability and Statistics (PAS) course.
STA2 - R LAB 1
Exercise 3:
a. What does the following code do?
extracted1 <- data1[1:10,]
extracted2 <- data1[,1:3]
b. Check the structure of data1 using the str() function. How many variables and observations are
there? Extract the data for the mpg, cyl and am variables and assign it to a data frame called
extracted3.
7. Data frames
A data frame in R is analogous to a dataset in SPSS, SAS or Stata. It is more general than a matrix
because different columns may hold different data types (numeric, character, and so on). Usually, each
column holds the data for one variable (column name = variable name), and each row holds the data for
one observation (a "case" in SPSS terminology). All values within a single column must be of the same type.
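To make the idea concrete, here is a minimal sketch that builds a small data frame by hand. The data frame students and all of its values are hypothetical and are not part of the lab data; they only illustrate that each column has its own type.

```r
# Hypothetical example data: names and values are illustrative only
students <- data.frame(
  name   = c("An", "Binh", "Chi"),   # character column
  age    = c(19, 21, 20),            # numeric column
  passed = c(TRUE, FALSE, TRUE),     # logical column
  stringsAsFactors = FALSE           # keep text as character, not factor
)

str(students)     # one line per column, showing its type
nrow(students)    # number of observations (rows)
ncol(students)    # number of variables (columns)
names(students)   # variable names
```

The same four functions can be applied to data1 to inspect its structure.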
Note down the variable names, the number of variables, and the number of observations in data1.
What does the first column represent?
Let's try the following:
mean(data1$mpg)
sd(data1$mpg)
summary(data1$mpg)
plot(data1$mpg, data1$wt)
In the code above, we access the mpg variable of the data1 data frame with the $ notation. The
expression data1$mpg selects that one variable from the data frame and returns it as a vector.
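The $ notation is one of several ways to pull a column out of a data frame. The sketch below compares them. It assumes that data1 resembles R's built-in mtcars dataset, which has columns with the same names; demo_df is a hypothetical stand-in so that data1 itself is not overwritten.

```r
# Assumption: data1 resembles the built-in mtcars dataset, used here as a stand-in
demo_df <- mtcars

mpg_dollar  <- demo_df$mpg        # $ notation: the column as a numeric vector
mpg_bracket <- demo_df[["mpg"]]   # double brackets: the same vector
mpg_frame   <- demo_df["mpg"]     # single brackets: a one-column data frame

identical(mpg_dollar, mpg_bracket)   # TRUE
is.data.frame(mpg_frame)             # TRUE
```

Functions such as mean() and sd() expect a vector, so the $ or [[ ]] forms are the ones to pass to them.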
8. Descriptive statistics
The following code produces summary statistics for three variables at once.
summary(data1[c("mpg", "drat", "wt")])
If we want more descriptive measures, the stat.desc() function in the pastecs package does a
good job. Install the pastecs package first with install.packages("pastecs"), then run the code below:
library(pastecs)
myvars <- c("mpg", "drat", "wt")
stat.desc(data1[myvars], basic=TRUE, desc=TRUE, norm=FALSE, p=0.95)
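If pastecs is not available, a similar table can be put together with base R alone. This is only a rough sketch and not a replacement for stat.desc(): it computes a handful of measures chosen for illustration, and it again assumes that data1 resembles the built-in mtcars dataset (demo_df and desc_table are hypothetical names).

```r
# Assumption: data1 resembles the built-in mtcars dataset, used here as a stand-in
demo_df <- mtcars
myvars  <- c("mpg", "drat", "wt")

# Apply the same set of descriptive functions to each selected column;
# sapply() binds the results into a matrix with one column per variable
desc_table <- sapply(demo_df[myvars], function(x) {
  c(n      = length(x),
    mean   = mean(x),
    sd     = sd(x),
    median = median(x),
    min    = min(x),
    max    = max(x))
})

round(desc_table, 2)
```

Each row of desc_table is one statistic and each column is one variable, which is the same orientation that stat.desc() uses.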
Exercise 4: Use the describeBy() function from the psych package to produce summary statistics for the
mpg variable grouped by am and cyl. Use mat=TRUE to display the results as a matrix. To group by
more than one variable, pass the grouping variables as a list.
For instance, to obtain a summary of the mpg variable separately for each number of
cylinders in the data1 data frame, we can use either of the following calls. The second names the
grouping variable so that the output is labelled:
by(data1$mpg,data1$cyl,summary)
by(data1$mpg,list(Cylinder=data1$cyl),summary)
To produce a summary of the mpg variable for each combination of levels of the am and cyl
variables, we pass both grouping variables in a list:
by(data1$mpg,list(data1$am, data1$cyl),summary)
by(data1$mpg,list(AM=data1$am, Cyl=data1$cyl),summary)
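The by() calls above work with any function that accepts a vector, not only summary(). The sketch below uses mean() to keep the output compact. It assumes that data1 resembles the built-in mtcars dataset; demo_df, mean_by_cyl and mean_by_am_cyl are hypothetical names.

```r
# Assumption: data1 resembles the built-in mtcars dataset, used here as a stand-in
demo_df <- mtcars

# One grouping variable: mean mpg for each number of cylinders
mean_by_cyl <- by(demo_df$mpg, list(Cyl = demo_df$cyl), mean)
mean_by_cyl
mean_by_cyl[["4"]]   # the value for 4-cylinder cars only

# Two grouping variables: mean mpg for each am-by-cyl combination
mean_by_am_cyl <- by(demo_df$mpg, list(AM = demo_df$am, Cyl = demo_df$cyl), mean)
mean_by_am_cyl
dim(mean_by_am_cyl)  # number of am levels by number of cyl levels
```

Naming the list elements (AM, Cyl) is optional, but it makes the printed groups much easier to read.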
Exercise 5. Load the textbooks data from the openintro package. Examine a summary of the price
differences between new textbooks sold at the UCLA bookstore and on Amazon for two subsets:
courses where one book is required and courses where more than one book is required.
install.packages("openintro")
library(openintro)
data(textbooks)
10. Recap questions
a. What does the assignment operator do?
b. What are the main data types?
c. What function is used to check the data type of an object?
d. Print the data1 dataset. Find the sample mean and standard deviation of the weight
variable. Display the five-number summary for the weight variable. Provide a summary
of the hp variable grouped by the cyl variable using the functions describeBy() and by().