BT1101 L1 Lab - Basics of R AY2425
@2023 NUS. The contents contained in the document may not be distributed or reproduced in any form or by any means without the written permission of NUS
Lab session contents
• Review related concepts
• Cover Part 1 of tutorial
o Discuss strategy and approach to the assignment questions
o Hands-on coding in R
o Discussion of answers
Everyone is here to learn. If you don't know, please ask. Let's learn together
Learning Objectives
• To be able to download and install R and RStudio on your
laptop
• To become familiar with the RStudio interface, and be able to
create an R script file and open and export data files
• To know some common functions for doing quick checks
on a dataset
• To be able to do some basic data manipulation with Base
R and dplyr on a small dataset (a built-in dataset will be
introduced and used)
Basics of R
Setting up R & RStudio
Go to https://posit.co/download/rstudio-desktop/
Data Types in R
A basic concept in programming is variables.
• Variables allow you to store information such as values (e.g. “2”) or objects (e.g.
dataframes, functions) in R.
• Calling a variable’s name retrieves the stored information.
• Variable names are case-sensitive!
• Every variable has a data type (class):
- Numeric
- Integer
- Logical
- Character
- Factor
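The listed types can be checked with class(). This is a minimal sketch; the variable names are illustrative and not from the lab sheet.

```r
# Illustrative variable names (assumptions, not part of the tutorial)
num_val <- 2.5                       # numeric (double)
int_val <- 2L                        # integer: the L suffix forces an integer
lgl_val <- TRUE                      # logical
chr_val <- "2"                       # character: the quotes make it text
fct_val <- factor(c("low", "high"))  # factor: categorical data

class(num_val)  # "numeric"
class(int_val)  # "integer"
class(lgl_val)  # "logical"
class(chr_val)  # "character"
class(fct_val)  # "factor"
```

Note that "2" (character) and 2 (numeric) are different values even though they print similarly.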
Data Structures in R
1. Vectors. One-dimensional (1D); can contain only one data type (e.g. numeric, character, logical).
• y1 <- c(1, 2, 2, 3, 4, 5)
• y2 <- c("small", "medium", "large", "large")
2. Matrices. Two-dimensional (2D); like vectors, can contain only one data type (usually numeric). Data is arranged into a
fixed number of rows and columns.
• mat1 <- matrix(1:4, nrow=2, ncol=2)
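A short sketch of how the "one data type" rule plays out in practice. The vector named mixed is a hypothetical example added for illustration.

```r
y1 <- c(1, 2, 2, 3, 4, 5)
length(y1)                    # 6

# Mixing types in one vector: R coerces everything to a common type
mixed <- c(1, "small", TRUE)  # hypothetical example
class(mixed)                  # "character"

mat1 <- matrix(1:4, nrow = 2, ncol = 2)
dim(mat1)                     # 2 2 (rows, columns)
mat1[1, 2]                    # 3, because matrices are filled column by column
```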
Part 1
1) We will start by exploring the built-in dataset called ToothGrowth. To find out more
about this dataset, type ?ToothGrowth in the R command line.
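Besides the help page, a few base R functions give a quick overview of the dataset. This is one possible set of checks, not necessarily the ones shown in the lab.

```r
# ?ToothGrowth opens the help page when run interactively
data(ToothGrowth)

str(ToothGrowth)      # structure: column names, types, first few values
head(ToothGrowth)     # first 6 rows
dim(ToothGrowth)      # 60 3 (60 rows, 3 columns)
names(ToothGrowth)    # "len" "supp" "dose"
summary(ToothGrowth)  # per-column summary statistics
```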
Part 1
2) Selecting data
There are several variables in ToothGrowth. Using Base R and dplyr functions, can you
perform (i), (ii) and (iii)?
i. Extract the column supp
ii. Extract rows where supp is equal to "VC" and dose is less than 1 and assign
the output to df2
iii. Extract the values of len where supp is equal to "VC"
iv. Try to perform the above operations (i, ii, iii) again but this time, assign the
output to df2.1, df2.2 and df2.3 respectively.
v. Use the class function to check the class attribute for each of the outputs. Use
the is.data.frame function to check whether the output is a dataframe or a
vector.
Indexing and selection with base R
Cheatsheet: https://github.com/rstudio/cheatsheets/blob/main/base-r.pdf
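A few common base R indexing patterns, sketched on a small illustrative vector v (an assumption for this example) and on ToothGrowth.

```r
v <- c(10, 20, 30, 40)   # illustrative vector
v[2]                     # 20: element by position
v[-1]                    # 20 30 40: everything except the first element
v[v > 15]                # 20 30 40: elements meeting a condition

# Data frames use [rows, columns]
ToothGrowth[1, ]                    # first row, all columns
ToothGrowth[, "len"]                # one column, returned as a vector
ToothGrowth[1:3, c("len", "dose")]  # rows 1 to 3, two columns
ToothGrowth$dose[1:3]               # 0.5 0.5 0.5
```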
dplyr Package
• Data manipulation library in R
• Lets you subset, reshape, join and summarize data typically using less code than would
be required in base R
• Part of the R tidyverse
• Install the package (if not already installed), then load it. Loading the tidyverse also loads dplyr.
- install.packages("tidyverse")
- library(tidyverse)
dplyr Package
Documentation: https://cran.r-project.org/web/packages/dplyr/dplyr.pdf
Cheatsheet: https://github.com/rstudio/cheatsheets/blob/main/data-transformation.pdf
dplyr Package
• dplyr provides the pipe operator %>% (re-exported from the magrittr package)
• A pipe passes the result of the function before it as the first argument of the
function that comes after it.
• Essentially, pipes allow you to chain several functions together
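A minimal sketch of the same operation written with and without pipes, assuming dplyr is installed. The choice of filter() and head() here is illustrative and may differ from the example shown in the lab.

```r
library(dplyr)

# Without pipes: nested calls are read from the inside out
nested <- head(filter(ToothGrowth, supp == "OJ"), 3)

# With pipes: the same steps are read from top to bottom
piped <- ToothGrowth %>%
  filter(supp == "OJ") %>%  # keep only the orange juice rows
  head(3)                   # then take the first 3 of those rows

identical(nested, piped)    # TRUE: both forms give the same result
```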
[Figure: the original ToothGrowth data, the result after the first %>%, and the result after both pipes (%>%)]
Part 1
2i) Extract the column supp.
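One possible set of answers for 2i, assuming dplyr is installed; the result variable names are illustrative. Note which forms return a vector and which return a data frame.

```r
library(dplyr)

supp_vec  <- ToothGrowth$supp                # base R: factor vector
supp_vec2 <- ToothGrowth[["supp"]]           # base R: same factor vector
supp_df   <- ToothGrowth["supp"]             # base R: one-column data frame
supp_df2  <- ToothGrowth %>% select(supp)    # dplyr: one-column data frame
supp_vec3 <- ToothGrowth %>% pull(supp)      # dplyr: factor vector
```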
Part 1
2ii) Extract rows where supp is equal to “VC” and dose is less than 1 and assign the
output to df2
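A sketch of 2ii in base R and in dplyr (assuming dplyr is installed). The name df2_dplyr is an illustrative addition so that both versions can be compared.

```r
library(dplyr)

# Base R: logical condition in the row position, all columns kept
df2 <- ToothGrowth[ToothGrowth$supp == "VC" & ToothGrowth$dose < 1, ]

# dplyr: conditions separated by commas are combined with AND
df2_dplyr <- ToothGrowth %>% filter(supp == "VC", dose < 1)

nrow(df2)  # 10
```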
Part 1
2iii) Extract the values of len where supp is equal to “VC”
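One way to approach 2iii, assuming dplyr is installed; the variable names are illustrative. The base R forms return a numeric vector, while filter() followed by select() returns a data frame.

```r
library(dplyr)

# Base R: index the len vector with a logical condition
len_vc  <- ToothGrowth$len[ToothGrowth$supp == "VC"]
len_vc2 <- ToothGrowth[ToothGrowth$supp == "VC", "len"]

# dplyr: filter rows, then keep only the len column (still a data frame)
len_vc_df <- ToothGrowth %>% filter(supp == "VC") %>% select(len)

length(len_vc)  # 30
```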
Part 1
2iv) Try to perform the above operations (i, ii, iii) again but this time, assign the output
to df2.1, df2.2 and df2.3 respectively.
2v) Use the class function to check the class attribute for each of the outputs.
Use the is.data.frame function to check whether the output is a dataframe or a vector.
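A base R sketch of 2iv and 2v. The particular extraction forms used here are one choice among several; a different form (for example ToothGrowth$supp for df2.1) would give a different class.

```r
df2.1 <- ToothGrowth["supp"]                               # single brackets keep a data frame
df2.2 <- subset(ToothGrowth, supp == "VC" & dose < 1)      # data frame of matching rows
df2.3 <- ToothGrowth$len[ToothGrowth$supp == "VC"]         # numeric vector

class(df2.1)          # "data.frame"
class(df2.2)          # "data.frame"
class(df2.3)          # "numeric"

is.data.frame(df2.1)  # TRUE
is.data.frame(df2.3)  # FALSE
is.vector(df2.3)      # TRUE
```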
Part 1
2vi) Use the `slice` function to extract the maximum and minimum values of `len`. Also
use `slice` to extract the 5th to 10th rows of observations.
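A possible answer for 2vi, assuming dplyr is installed; the result names are illustrative. slice() selects rows by position, so which.max() and which.min() are used to find the positions first.

```r
library(dplyr)

max_row   <- ToothGrowth %>% slice(which.max(len))  # row with the largest len
min_row   <- ToothGrowth %>% slice(which.min(len))  # row with the smallest len
rows_5_10 <- ToothGrowth %>% slice(5:10)            # 5th to 10th rows

nrow(rows_5_10)  # 6
```

Recent dplyr versions also provide slice_max() and slice_min(), which take the column directly.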
Part 1
3) Adding/Removing/Changing data columns for the ToothGrowth data.
i. Change the variable name from len to length and assign the output to df3.1
ii. Increase the value of len by 0.5 if supp is equal to OJ and assign the output to
df3.2
iii. Remove the column dose from the data and assign the output to df3.3
iv. Increase the value of dose by 0.1 for all records, rename dose to dose.new and
assign the output to df3.4
v. Create a new variable high.dose and assign it a value of "TRUE" if dose is more
than 1 and "FALSE" if dose is less than or equal to 1. Assign the dataframe with
the new variable high.dose to df3.5. Export df3.5 to a csv file. Discuss the R
code needed to export it as an Excel file (.xlsx).
Part 1
3i) Change the variable name from len to length and assign the output to df3.1
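A sketch of 3i in base R and dplyr (assuming dplyr is installed). The name df3.1_base is an illustrative addition for comparison.

```r
library(dplyr)

# Base R: overwrite the matching entry in the column names
df3.1_base <- ToothGrowth
names(df3.1_base)[names(df3.1_base) == "len"] <- "length"

# dplyr: rename(new_name = old_name)
df3.1 <- ToothGrowth %>% rename(length = len)

names(df3.1)  # "length" "supp" "dose"
```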
Part 1
3ii) Increase the value of len by 0.5 if supp is equal to OJ and assign the output to df3.2
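One way to do the conditional update in 3ii, assuming dplyr is installed. The names oj and df3.2_base are illustrative helpers.

```r
library(dplyr)

# Base R: update only the rows where supp is OJ
df3.2_base <- ToothGrowth
oj <- df3.2_base$supp == "OJ"              # logical vector marking OJ rows
df3.2_base$len[oj] <- df3.2_base$len[oj] + 0.5

# dplyr: ifelse(condition, value_if_true, value_if_false)
df3.2 <- ToothGrowth %>%
  mutate(len = ifelse(supp == "OJ", len + 0.5, len))
```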
Part 1
3iii) Remove the column dose from the data and assign the output to df3.3
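A sketch of 3iii, assuming dplyr is installed; df3.3_base is an illustrative name for the base R version.

```r
library(dplyr)

# Base R: keep every column whose name is not "dose"
df3.3_base <- ToothGrowth[, names(ToothGrowth) != "dose"]

# dplyr: a minus sign inside select() drops the column
df3.3 <- ToothGrowth %>% select(-dose)

names(df3.3)  # "len" "supp"
```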
Part 1
3iv) Increase the value of dose by 0.1 for all records and rename dose to dose.new and
assign output to df3.4
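A possible answer for 3iv that chains the two steps with pipes, assuming dplyr is installed.

```r
library(dplyr)

df3.4 <- ToothGrowth %>%
  mutate(dose = dose + 0.1) %>%  # step 1: increase dose for every row
  rename(dose.new = dose)        # step 2: rename(new_name = old_name)

names(df3.4)  # "len" "supp" "dose.new"
```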
Part 1
3v) Create a new variable high.dose and assign it a value of "TRUE" if dose is more than 1
and "FALSE" if dose is less than or equal to 1. Assign the dataframe with the new variable
high.dose to df3.5. Export df3.5 to a csv file. Discuss the R code needed to export it as an
Excel file (.xlsx).
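A base R sketch of 3v. It stores high.dose as a logical TRUE/FALSE rather than as text, and writes to a temporary folder; the file name df3_5.csv is hypothetical. For Excel output, one option (an assumption, since the lab may use a different package) is write_xlsx() from the writexl package.

```r
df3.5 <- ToothGrowth
df3.5$high.dose <- df3.5$dose > 1   # TRUE if dose > 1, FALSE otherwise

# Export to csv (hypothetical file name, saved in a temporary folder)
out_file <- file.path(tempdir(), "df3_5.csv")
write.csv(df3.5, out_file, row.names = FALSE)

# Export to Excel, assuming the writexl package is installed:
# writexl::write_xlsx(df3.5, "df3_5.xlsx")
```

Setting row.names = FALSE stops R from writing the row numbers as an extra column.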
Sorting
Base R
• sort returns the values of the object, sorted in ascending order by default.
• order returns the indices (positions) that would sort the object, also in ascending order by default.
Part 1
4i) There are two functions in Base R “sort” and “order” to perform sorting. How do
these two functions differ? Try to do a sort with each function on ToothGrowth$len.
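The difference is easiest to see on a small vector first. The vector x is an illustrative example, not part of the tutorial.

```r
x <- c(30, 10, 20)           # illustrative vector

sort(x)                      # 10 20 30: the sorted values
order(x)                     # 2 3 1: positions of the values in sorted order
x[order(x)]                  # 10 20 30: indexing with order() reproduces sort()
sort(x, decreasing = TRUE)   # 30 20 10

# The same relationship holds for ToothGrowth$len
len_sorted  <- sort(ToothGrowth$len)
len_ordered <- ToothGrowth$len[order(ToothGrowth$len)]
identical(len_sorted, len_ordered)  # TRUE
```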
Part 1
4ii) Using a base R function (e.g. order), how can you sort the dataframe ToothGrowth
in decreasing order of len?
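A sketch of 4ii: because order() returns row positions, it can be used in the row index of the dataframe. The name sorted_desc is illustrative.

```r
# Reorder the rows by len, largest first
sorted_desc <- ToothGrowth[order(ToothGrowth$len, decreasing = TRUE), ]

# Equivalent for numeric columns: order(-ToothGrowth$len)
head(sorted_desc, 3)
```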
Part 1
4iii) What dplyr function can you use to sort ToothGrowth in increasing order of len?
Can you also sort the dataframe in decreasing order of len?
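A possible answer for 4iii using arrange(), assuming dplyr is installed; the result names are illustrative.

```r
library(dplyr)

sorted_inc <- ToothGrowth %>% arrange(len)        # increasing order (default)
sorted_dec <- ToothGrowth %>% arrange(desc(len))  # decreasing order via desc()
```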
Part 1
5) Factors
i. Check if supp is a factor vector. First type ToothGrowth$supp. What do you
observe in the output?
ii. Next use is.factor() and is.ordered() to check if supp is a factor and, if
so, whether it is an ordered factor.
iii. Now suppose we find that vitamin C (VC) is a superior supplement compared to
orange juice (OJ), and we want to order supp such that VC is a higher level than
OJ. How could we do this?
Part 1
5i) Check if supp is a factor vector. First type ToothGrowth$supp. What do you
observe with the output?
Part 1
5ii) Next use is.factor() and is.ordered() to check if supp is a factor and, if so,
whether it is an ordered factor.
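A short sketch covering 5i and 5ii on the unmodified built-in dataset. Printing a factor shows its values followed by a "Levels:" line, which is the visual clue that supp is a factor.

```r
ToothGrowth$supp              # values, then "Levels: OJ VC"
levels(ToothGrowth$supp)      # "OJ" "VC"

is.factor(ToothGrowth$supp)   # TRUE: supp is a factor
is.ordered(ToothGrowth$supp)  # FALSE: its levels have no ranking
```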
Part 1
5iii) Now suppose we find that vitamin C (VC) is a superior supplement compared to
orange juice (OJ), and we want to order supp such that VC is a higher level than OJ. How
could we do this? (Hint: Assign factor_supp to ToothGrowth$supp)
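One way to build the ordered factor: list the levels from lowest to highest and set ordered = TRUE. The copy named tg is an illustrative choice so that the built-in dataset is left unchanged; the hint assigns directly to ToothGrowth$supp instead.

```r
# Levels are listed from lowest (OJ) to highest (VC)
factor_supp <- factor(ToothGrowth$supp, levels = c("OJ", "VC"), ordered = TRUE)

is.ordered(factor_supp)   # TRUE
levels(factor_supp)       # "OJ" "VC"

# Assign back to the dataframe (done here on a copy)
tg <- ToothGrowth
tg$supp <- factor_supp

# Ordered factors support comparisons: row 1 is VC, row 31 is OJ
tg$supp[1] > tg$supp[31]  # TRUE
```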
R & RStudio tips
Clearing your workspace can prevent it from becoming messy.
• To clear the console: ctrl + L
• To clear a variable from your environment: rm(variable)
• To clear all variables (use with caution!): rm(list=ls())
Keyboard shortcuts help you work more efficiently. Some common ones are:
• Run the current line of code: cmd+enter (Mac), ctrl+enter (Windows)
• Insert the <- operator: option + - (Mac), Alt + - (Windows)
• Insert the %>% operator: cmd+shift+M (Mac), ctrl+shift+M (Windows)
Documentation provides information about functions in R and examples of how they’re used.
• To read R documentation about a function: help(function_name) or
?function_name
Next week
• Lecture and tutorials as per usual next week
• Basics of R Part 2 is due 9th Sep, 9am. Answers must be submitted in
the form of an R script.