
BT1101 - R Code Cheatsheet 1.0

BT1101 covers various R programming concepts including data types, vectors, matrices, factors, data frames, lists, arrays, dplyr verbs for manipulating data, ggplot2 for data visualization, and R Markdown. Key concepts are: 1. Vectors are the basic data structure in R and can store numeric, character, logical values. Matrices are 2D arrays used to store tabular data. Data frames combine vectors of equal length and are commonly used for storing data tables. 2. The dplyr package contains verbs like select(), filter(), arrange(), mutate() for manipulating data frames. count() summarizes groups, group_by() works with summarize() to perform calculations within groups.

Uploaded by SpideyMumu

📈

BT1101 - R code
Data Types
Vectors
Naming a vector
Summing Vectors
Vector selection
Matrix
Selecting elements from a Matrix
Forming Matrix by combining Vectors
Factors
Data Frames
Lists
Referencing elements from a list
Arrays
Dplyr verbs
Count verbs
Group by and summarize
top_n verb
Transmute verb
ggplot2
Data Visualisation
Pie Charts
Bar Charts/ Barplot
Clustered Bar Charts (grouped Barplot)
Plot() function
Creating a side-by-side plot array
R Markdown
Formatting with Markdown
R Chunk Options
Writing code in R markdown
Images in R markdown
Adding external images to R markdown

Data Types
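Decimal values like 4.5 are called numerics, while whole numbers written with the L suffix, like 4L, are called integers.
Integers are also numerics (a plain 4 is stored as a numeric until it is converted).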

Text (or string) are called characters

Boolean values are called logical

Complex numbers are called complex

x <- 5
y <- 5.5

my_character <- "Hello World"

my_logic <- TRUE

w <- 1+2i

We can convert a value to the integer type in two ways:

y <- as.integer(3) # as.character() converts to the character type in the same way

# OR
y <- 3L

Concatenating 2 character values:

fname = "Harry"
lname = "Potter"
paste(fname, lname)

We can check the class/type of a variable with the class() function:

x <- as.integer(5)
class(x)
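As a quick check of the types listed above, the sketch below calls class() on one value of each kind. The names values and types are introduced here only for illustration.

```r
# One value of each basic type, checked with class()
values <- list(5.5, 4, 4L, "Hello World", TRUE, 1+2i)
types <- sapply(values, class)
types
# "numeric" "numeric" "integer" "character" "logical" "complex"
```

Note that a plain 4 reports "numeric"; only 4L (or as.integer(4)) reports "integer".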

                 Linear    Rectangular

All same type    Vector    Matrix

Mixed types      List      Data Frame

Vectors
Vectors are 1D arrays that can store values of any basic data type

Within a single vector, all values must be of the same type

To create a vector you must use the combine function c()

Examples of vectors:

vector1 <- c(1, 2, 3)


vector2 <- c("a", "b", "c")
vector3 <- c(TRUE, FALSE, TRUE)

Naming a vector
You can give a name to the elements of a vector with the names() function

This will give a label to each value in the vector

my_vector <- c("Mursyid", "Student", "21")


names(my_vector) <- c("Name", "Occupation", "Age")

#print out vector
my_vector

# Output will be something like this:
#
#      Name Occupation        Age
# "Mursyid"  "Student"       "21"

A more reusable way to name vectors is to store the labels in a vector as well, so the same labels can be applied to several vectors

kathleen_vector <- c("Kathleen", "Student", "19")


label_vector <- c("Name", "Occupation", "Age")

names(kathleen_vector) <- label_vector


names(my_vector) <- label_vector

Summing Vectors
Arithmetic operations on numeric vectors are applied element by element

You can sum the total of a numeric vector with the function sum()

vector1 <- c(1, 2, 3)


vector4 <- c(4, 4, 4)

vector5 <- vector1 + vector4 # element-wise sum: 5 6 7


total <- sum(vector1) # 1 + 2 + 3 = 6

Vector selection
To select a specific element in a vector add square brackets to the end of the vector name

💡 Unlike Java arrays, vector indices start at 1, not 0

We can select multiple elements to form a new vector

We can also select element(s) if the vector has been named

vector1 <- c(1, 2, 3, 4, 5, 6, 7)


a <- vector1[1]

# select by reference position


vector2 <- vector1[c(1,4)] # selecting the first and fourth element of the vector
vector3 <- vector1[1:4] # selecting from the first element to the fourth element

#Refer to above mentioned vectors


name_age <- kathleen_vector[c("Name", "Age")] # select by name
name_age

# Output will be
#       Name        Age
# "Kathleen"       "19"

Certain values can be selected with comparison operators

vector1 > 4
# [1] FALSE FALSE FALSE ... ... TRUE

selection_vector <- vector1 > 5


fivemore_vector <- vector1[selection_vector]
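To make the comparison step concrete, here is a small sketch that reuses the seven-element vector1 from above and prints the full logical vector before using it as a selector. The name above_four is made up for this example.

```r
vector1 <- c(1, 2, 3, 4, 5, 6, 7)

# A comparison returns one TRUE/FALSE per element
selection_vector <- vector1 > 4
selection_vector
# [1] FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE

# Indexing with the logical vector keeps only the TRUE positions
above_four <- vector1[selection_vector]
above_four
# [1] 5 6 7
```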

Matrix
We create a matrix using the function matrix()

a <- matrix(data = c(1, 2, 3, 4, 5, 6), nrow = 2, ncol = 3, byrow = TRUE)


# nrow is the number of rows and ncol the number of columns; byrow = TRUE
# fills the matrix row by row. Without byrow = TRUE, the output will be
# 1 3 5
# 2 4 6 instead

Selecting elements from a Matrix

a[2, 3] # Selecting the element at 2nd row, 3rd column


a[2, ] # Selecting all the elements from the 2nd row
a[, c(1, 3)] # Selecting all the elements from 1st and 3rd column
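The sketch below rebuilds the same 2 x 3 matrix and shows what each selection returns. The names single, second_row and two_cols are placeholders introduced here.

```r
a <- matrix(data = c(1, 2, 3, 4, 5, 6), nrow = 2, ncol = 3, byrow = TRUE)
a
#      [,1] [,2] [,3]
# [1,]    1    2    3
# [2,]    4    5    6

single <- a[2, 3]         # 6
second_row <- a[2, ]      # 4 5 6
two_cols <- a[, c(1, 3)]  # a 2 x 2 matrix holding columns 1 and 3
two_cols
#      [,1] [,2]
# [1,]    1    3
# [2,]    4    6
```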

Forming Matrix by combining Vectors

v1 <- c(1, 2, 3, 4)
v2 <- c(5, 6, 7, 8)
rbind(v1, v2) # combine the vectors as rows
# output:
# 1 2 3 4
# 5 6 7 8
cbind(v1, v2) # combine the vectors as columns
# output:
# 1 5
# 2 6
# 3 7
# 4 8

Factors

gender <- factor(c("Female", "Male", "Female", "Female"))


levels(gender) # finding the levels of a factor
# output will be "Female" "Male"
nlevels(gender) # finding the number of levels of a factor
# output will be 2
speed_vector <- c("medium", "slow", "slow", "medium", "fast")
is.ordered(speed_vector) # checks if speed_vector is ordered
factor_speed_vector <- factor(speed_vector, ordered=TRUE, levels = c("slow", "medium", "fast"))
# converts speed_vector into an ordered factor vector
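One practical consequence of an ordered factor is that its values can be compared with < and >. A minimal sketch using the speed example above (is_faster is a name made up here):

```r
speed_vector <- c("medium", "slow", "slow", "medium", "fast")
factor_speed_vector <- factor(speed_vector, ordered = TRUE,
                              levels = c("slow", "medium", "fast"))

is.ordered(speed_vector)         # FALSE: a plain character vector
is.ordered(factor_speed_vector)  # TRUE

# "medium" (1st element) ranks above "slow" (2nd element)
is_faster <- factor_speed_vector[1] > factor_speed_vector[2]
is_faster
# [1] TRUE
```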

Data Frames
A data frame is a list of vectors of equal length, used for storing data tables

We can create a data frame using the function data.frame()

d <- c(1, 2, 3, 4)
e <- c("red", "yellow", "blue", NA)

f <- c(TRUE, TRUE, TRUE, FALSE)
mydata <- data.frame(d, e, f)
names(mydata) <- c("ID", "Color", "Passed")
mydata # print the data frame
# output will be
#   ID  Color Passed
# 1  1    red   TRUE
# 2  2 yellow   TRUE
# 3  3   blue   TRUE
# 4  4   <NA>  FALSE

mydata$ID #return all the elements belonging to the column of ID


# output:
# [1] 1 2 3 4

nrow(mydata) # returns the total number of data rows


ncol(mydata) # returns the total number of data columns
head(mydata) # previews the first six rows of the data frame
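As a rough illustration of these helpers, the sketch below builds the same table in one call and then uses a logical column to pick values from another column. The names dim_info and passed_colors are introduced for this example, and stringsAsFactors = FALSE is set only so the result is the same on older R versions.

```r
mydata <- data.frame(
  ID = c(1, 2, 3, 4),
  Color = c("red", "yellow", "blue", NA),
  Passed = c(TRUE, TRUE, TRUE, FALSE),
  stringsAsFactors = FALSE
)

dim_info <- c(rows = nrow(mydata), cols = ncol(mydata))
dim_info
# rows cols
#    4    3

# Use the logical Passed column to select from the Color column
passed_colors <- mydata$Color[mydata$Passed]
passed_colors
# [1] "red"    "yellow" "blue"
```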

Lists

A generic vector containing other objects

We can create a list using the function list()

a <- c(1, 2, 3)
b <- c(T, F)
list2 <- list(first = a, second = b) # assign names to the list members

Referencing elements from a list

list2[1] # reference by numeric index; returns a list holding the first member
list2[[1]] # double brackets return the member itself
list2$first # reference by name
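The difference between single and double brackets is easy to miss, so here is a short sketch that checks what each form returns for the list defined above. The names sub_list and member are illustrative.

```r
a <- c(1, 2, 3)
b <- c(TRUE, FALSE)
list2 <- list(first = a, second = b)

sub_list <- list2[1]    # still a list, with one member named "first"
member <- list2[[1]]    # the numeric vector itself

class(sub_list)   # "list"
class(member)     # "numeric"
list2$second[2]   # FALSE: second element of the member named "second"
```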

Arrays
Arrays are similar to matrices, but they can have more than 2 dimensions

z <- 1:24 # elements from 1 to 24


dim(z) <- c(2, 4, 3) # dimensions of the array is 2x4x3
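To see how the 24 values are laid out, the sketch below indexes the array with [row, column, layer]. R fills the first dimension fastest, so each 2 x 4 layer holds 8 consecutive values. The names first_of_layer2 and last_value are made up here.

```r
z <- 1:24
dim(z) <- c(2, 4, 3)   # 2 rows, 4 columns, 3 layers

z[1, 2, 1]                     # 3: row 1, column 2 of the first layer
first_of_layer2 <- z[1, 1, 2]  # 9: the second layer starts after 8 values
last_value <- z[2, 4, 3]       # 24: the final element
dim(z[, , 1])                  # 2 4: a single layer is a 2 x 4 matrix
```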

Dplyr verbs
Usually chained with the pipe operator %>%, which is available after library(dplyr)

select() - keeps only the listed columns

counties %>%
  select(state, county, population, poverty)

counties %>%
  select(drive:work_at_home) # another way: select all the columns from drive to work_at_home

arrange() - sorts the rows by a column in ascending or descending order

counties %>%
  arrange(public_work) # arrange your data in ascending order

counties %>%
  arrange(desc(public_work)) # arrange your data in descending order

filter() - keeps only the rows that match the condition; more than one condition can be given

counties %>%
  filter(state == "Texas", population > 10000)
# Only showing Texas counties with a population of more than 10000

counties %>%
  filter(state %in% c("Texas", "Alabama"), population > 10000)
# To match several values of the same variable, use %in% with c()
# instead of ==

# Base R version of filter:


subset(dfO$Tree, dfO$Tree > 3) # extract the values greater than 3 from the Tree column

mutate() - adds a new column computed from existing columns

counties %>%
mutate(public_workers = population * public_work / 100)
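The four verbs are often combined in one pipeline. The sketch below assumes dplyr is installed and uses a small made-up table, toy_counties, in place of the counties data set (which is not reproduced in these notes); all values in it are invented for illustration.

```r
library(dplyr)

# Hypothetical stand-in for the counties table
toy_counties <- data.frame(
  state = c("Texas", "Texas", "Alabama", "Ohio"),
  county = c("A", "B", "C", "D"),
  population = c(20000, 5000, 30000, 40000),
  public_work = c(10, 20, 15, 5),   # percentage of public workers
  stringsAsFactors = FALSE
)

result <- toy_counties %>%
  select(state, county, population, public_work) %>%
  filter(state %in% c("Texas", "Alabama"), population > 10000) %>%
  mutate(public_workers = population * public_work / 100) %>%
  arrange(desc(public_workers))

result
# Two rows remain: county C (4500 public workers), then county A (2000)
```

Each verb receives the table produced by the previous step, which is why the order of the steps matters.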

Another function that can be used without %>% is glimpse(). It is used to examine all the variables
in a table

Count verbs
Non-weighted

counties %>%
count(region, sort=TRUE)

Weighted

counties %>%
count(region, wt = citizen, sort = TRUE)

Group by and summarize
Summarize functions include:

sum()

mean()

median()

min()

max()

n() - to find the size of the group

counties %>%
  group_by(state) %>%
  summarize(total_population = sum(population))

counties %>%
  group_by(state, metro) %>%
  summarize(total_population = sum(population))

group_by(state) means that all the rows with the same state are handled collectively instead of individually, and a new column total_population is created which holds the total population of each state.

group_by(state, metro) does the same, but instead of grouping by state alone, it handles each state's metro and non-metro counties collectively.
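A minimal sketch of both groupings, assuming dplyr is installed. The table toy_counties and its numbers are invented stand-ins for the counties data set.

```r
library(dplyr)

# Hypothetical stand-in for the counties table
toy_counties <- data.frame(
  state = c("Texas", "Texas", "Alabama", "Alabama"),
  metro = c("Metro", "Nonmetro", "Metro", "Metro"),
  population = c(100, 50, 30, 20),
  stringsAsFactors = FALSE
)

# One row per state
by_state <- toy_counties %>%
  group_by(state) %>%
  summarize(total_population = sum(population), n_counties = n())

# One row per state and metro combination
by_state_metro <- toy_counties %>%
  group_by(state, metro) %>%
  summarize(total_population = sum(population)) %>%
  ungroup()   # drop the remaining grouping before further steps

by_state        # Alabama: 50 (2 counties), Texas: 150 (2 counties)
by_state_metro  # Alabama/Metro: 50, Texas/Metro: 100, Texas/Nonmetro: 50
```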

ungroup() - removes the grouping so that later verbs operate on the whole table again

top_n verb
Operates on a grouped table

top_n(<n>, <variable>) - keeps the n rows with the largest values of the variable in each group

counties %>%
  group_by(state) %>%
  top_n(1, population)
# '1' specifies how many of the most populated counties in each state
# are displayed

counties %>%
  group_by(state) %>%
  top_n(2, men)
# this would display the top 2 counties with the most men in each state

Transmute verb
select() + mutate()

returns only the listed columns, including any that are newly computed or transformed

counties %>%
transmute(state, county, fraction_men = men / population)

                      Keeps only the specified variables   Keeps other variables

Can't change values   select()                             rename()

Can change values     transmute()                          mutate()

ggplot2
To create a line graph of the data from your table we use the ggplot() function after importing with
library(ggplot2)

ggplot(<data>, aes(x = ___, y = ___ , color = ___)) + geom_line()

# Example
selected_names <- babynames %>%
filter(name %in% c("Steven", "Thomas", "Matthew"))
ggplot(selected_names, aes(x = year, y = number, color = name)) +
geom_line()

[Figure: line chart of number against year, with one coloured line per name]

Data Visualisation
Pie Charts
Not recommended

Difficult to compare relative sizes

When using pie charts,

restrict to small number of categories

use labels

ensure it adds up to 100%

avoid 3D version

pie(<data>, labels = <data of labels>, main = "Insert name here")

#Example(labels would only contain words with no percentages):


DF1 <- Census_Education_Data_pie
slices <- DF1$'Not a High School Grad'
label <- DF1$Marital_Status
pie(slices, labels = label, main = "Pie Chart of Non-High School Grads")

#Example but with percentages and colour:

#round(<>, 1) functions to round the percentages to 1 d.p


piepercent <- round(100 * DF1$'Not a High School Grad'/ sum(DF1$'Not a High School Grad'), 1)
slices <- DF1$'Not a High School Grad'
label <- DF1$Marital_Status

#use the paste function to join the label and percentage together, then add "%" to the end
label <- paste(label, piepercent)
label <- paste(label, "%", sep = "")

#rainbow(length(label)) gives one colour per label (6 here)
pie(slices, labels = label, col = rainbow(length(label)), main = "Pie Chart of Non-High School Grads")
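Since the census data frame is not included in these notes, the sketch below repeats the labelling steps on three invented categories so the result of paste() can be seen directly. The values and category names are assumptions for illustration only, and pdf(NULL) is used so the chart is drawn without needing a display.

```r
# Made-up counts standing in for DF1$'Not a High School Grad'
slices <- c(10, 30, 60)
label <- c("Single", "Married", "Divorced")

# Percentage of the total for each slice, rounded to 1 d.p.
piepercent <- round(100 * slices / sum(slices), 1)

# Join label and percentage, then append "%"
label <- paste(label, piepercent)
label <- paste(label, "%", sep = "")
label
# [1] "Single 10%"   "Married 30%"  "Divorced 60%"

pdf(NULL)   # null graphics device, so no file or window is needed
pie(slices, labels = label, col = rainbow(length(label)),
    main = "Pie Chart with Percentage Labels")
invisible(dev.off())
```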

Bar Charts/ Barplot


A bar chart represents data as rectangular bars, with the length of each bar proportional to the value of the variable

barplot(DF1$'Not a High School Grad',
        names.arg = DF1$Marital_Status,
        xlab = "Marital Status",
        ylab = "Non-High School Grads",
        main = "Marital Status of Non-High School Grads",
        col = "blue", cex.names = 0.5)

names.arg - labels , xlab - x-axis, ylab - y-axis, main - name of bar chart, col - colour of bar(s),
cex.names - change size of labels

Clustered Bar Charts (grouped Barplot)


We can use the barplot() function to make stacked Barplots and horizontal barplots as well

It compares values across categories using vertical rectangles
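A grouped barplot can be sketched by passing a matrix to barplot() with beside = TRUE; each column becomes one cluster and each row one bar within it. The counts matrix below is invented for illustration, and pdf(NULL) is used so the sketch runs without a display.

```r
# Made-up counts: rows are groups, columns are categories
counts <- matrix(c(5, 8, 3, 6, 4, 7), nrow = 2,
                 dimnames = list(c("Male", "Female"),
                                 c("Single", "Married", "Divorced")))

pdf(NULL)   # null graphics device
# beside = TRUE places the bars side by side; FALSE (the default) stacks them
mids <- barplot(counts, beside = TRUE,
                col = c("blue", "red"),
                legend.text = rownames(counts),
                xlab = "Marital Status", ylab = "Count",
                main = "Clustered Bar Chart")
invisible(dev.off())

dim(mids)   # 2 3: one bar midpoint per cell of counts
```

Adding horiz = TRUE to the same call draws the bars horizontally.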

Plot() function
title(<name of the title>) - adds a title after plotting

log="xy" - scales the x and y axes by log. log="y" - scales only the y axis. log="x" - scales only
the x axis

# to plot scatterplots
plot(<x_data>, <y_data>, xlab = 'x-axis label', ylab = 'y-axis label',
pch = <shape_of_points>, col = 'colour of points')
# use points() to add more scatter plots on the same xy axes
points(<x_data>, <y_data>)

# to plot line charts (for more info refer to data visualisation slides)
v <- c(7, 12, 4, 5, 10)
t <- c(1, 14, 6, 5, 13)
plot(v, type = "o", col = "red", xlab = "x-axis label", ylab = "y-axis label", main = "name")
# use lines() to add more line charts in the same xy axis
lines(t, type = "o", col = "blue")

Creating a side-by-side plot array
par(mfrow = c(<num of row>, <num of col>))
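As an illustration, the sketch below places the two line charts from the plot() section side by side in a 1 x 2 array. The names old_par and layout_used are introduced here, and pdf(NULL) is used so the sketch runs without a display.

```r
v <- c(7, 12, 4, 5, 10)
t <- c(1, 14, 6, 5, 13)

pdf(NULL)                            # null graphics device
old_par <- par(mfrow = c(1, 2))      # 1 row, 2 columns; keep the old setting

plot(v, type = "o", col = "red", main = "v")    # drawn in the left panel
plot(t, type = "o", col = "blue", main = "t")   # drawn in the right panel

layout_used <- par("mfrow")          # 1 2
par(old_par)                         # restore the previous layout
invisible(dev.off())
```

Restoring the saved setting afterwards keeps later plots from being squeezed into the same grid.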

R Markdown
Formatting with Markdown
We use # to make a header. The number of # symbols sets the level of the header

# National University of Singapore <- National University of Singapore is a first level header
## Hello <- Hello is a second level header

We use a single * on each side to italicise text

*Hey* <- Hey will be italicised -> Hey

We use a double * on each side to bold text

**Hello** <- Hello will be bolded -> Hello

We use [<text>](<url>) to insert a link

[YouTube](https://youtube.com) -> the word YouTube will become a blue link -> YouTube

R Chunk Options
R chunk option           Function
include = TRUE/FALSE     Whether to show the R code chunk and its output
echo = TRUE/FALSE        Whether to show the R code chunk
message = TRUE/FALSE     Whether to show output messages
warning = TRUE/FALSE     Whether to show output warnings
eval = TRUE/FALSE        Whether to actually evaluate the R code chunk

Writing code in R markdown

# for writing a block of code


```{r <insert the purpose of this code>, <include any R chunk options>}
# write your r code
```
# for writing inline code within a sentence, e.g.:

2 + 2 equals `r 2+2`
output will be -> 2 + 2 equals 4

Images in R markdown
R chunk option   Possible values                         Effect
fig.height =     Numeric, inches                         the height of the images in inches
fig.width =      Numeric, inches                         the width of the images in inches
fig.align =      one of "left", "right" or "center"      the alignment of the image in the report

Adding external images to R markdown

# Example:
![An impressive mountain!](https://google.com)
# "An impressive mountain!" is the caption for the image and is optional
# the parentheses hold either the link to the image or the path to a local image file
