
ACCT3112 tutorial 1


Tutorial 1: R Programming

Yuqi Sun

2024-09-19

knitr::opts_chunk$set(echo = TRUE, message=FALSE, warning=FALSE)
# Load the necessary libraries
library(tidyverse) # or you can just load library(dplyr)
library(babynames)

We usually load all required libraries in the first chunk, which is named
setup here.

▶ knitr::opts_chunk$set(echo = TRUE): This line sets global options for all
code chunks in the R Markdown document using the knitr package.
▶ echo = TRUE means that the R code is displayed in the final
output, so readers see the code along with its results.
▶ message = FALSE and warning = FALSE keep messages and warnings out of
the final output.
▶ more details on R code chunks: R code chunks

To insert an R code chunk:

▶ the keyboard shortcut is Ctrl + Alt + I (Cmd + Option + I on macOS).

▶ click the Add Chunk command in the editor toolbar

1. Basic Markdown Syntax

Headings

Headings in Markdown are created by using the ‘#’ symbol before the text.
There are six levels of headings, represented by one to six ‘#’ symbols.

# Heading 1
## Heading 2
### Heading 3
#### Heading 4
##### Heading 5
###### Heading 6

Emphasis

1. Bold text
You can make text bold by using two asterisks (**) or underscores (__) before
and after the text.

2. Italic text
You can make text italic by using one asterisk (*) or underscore (_) before and
after the text.

Lists

You can create unordered lists by starting a line with ‘*’, ‘+’, or ‘-’.
▶ Item 1
▶ Item 2
▶ Item 3

You can create ordered lists by starting a line with a number.

1. Item 1
2. Item 2
3. Item 3

Links and Images
1. You can create a link by enclosing the link text in brackets [] and the URL
in parentheses ().
[Visit Google](https://www.google.com)
Visit Google

2. You can display an image by starting with an exclamation mark (!),
followed by alt text in brackets [], and the path to the image in
parentheses ().
![R Logo](C:\Users\FBERPG\Desktop\logo_r.png){width=30%}

Figure 1: R Logo

2. Basic Data Exploration (Review of dplyr)
We will only focus on using dplyr (library(tidyverse)) instead of base R in the
following tutorials.

Five Important Verbs

1. Pick observations by their values (filter()).

2. Reorder the rows (arrange()).
3. Pick variables by their names (select()).
4. Create new variables with functions of existing variables (mutate()).
5. Collapse many values down to a single summary (summarise()).

▶ These can all be used in conjunction with group_by() which changes the
scope of each function from operating on the entire dataset to operating
on it group-by-group.
▶ Together with group_by(), these six functions provide the verbs for a
language of data manipulation, chained together with the pipe operator %>%.
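
As a minimal sketch of how the verbs chain together, the pipeline below applies all five to mtcars, with group_by() changing the scope before the final summary. The hp > 100 cutoff and the names verbs_demo and wt_lbs are illustrative choices, not part of the tutorial.

```r
library(dplyr)

verbs_demo <- mtcars %>%
  filter(hp > 100) %>%              # 1. keep rows by value
  arrange(desc(mpg)) %>%            # 2. reorder rows
  select(mpg, cyl, wt) %>%          # 3. keep columns by name
  mutate(wt_lbs = wt * 1000) %>%    # 4. derive a column (wt is in 1000 lbs)
  group_by(cyl) %>%                 # change the scope to one group per cyl
  summarise(avg_mpg = mean(mpg),    # 5. collapse each group to one row
            avg_wt_lbs = mean(wt_lbs))

verbs_demo
```

Each step receives the result of the previous one, so the pipeline reads top to bottom in the order the operations happen.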

Grouped Summaries with summarise()

summarise() collapses a data frame to a single row:

summarise(mtcars, avg_mpg = mean(mpg, na.rm = TRUE))

## avg_mpg
## 1 20.09062

▶ summarise() is not terribly useful unless we pair it with group_by().

▶ This changes the unit of analysis from the complete dataset to individual
groups.
▶ Then, when you use the dplyr verbs on a grouped data frame, they’ll be
automatically applied “by group”.

by_cyl <- group_by(mtcars, cyl)
by_cyl

## # A tibble: 32 x 11
## # Groups: cyl [3]
## mpg cyl disp hp drat wt qsec vs am gear carb
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 21 6 160 110 3.9 2.62 16.5 0 1 4 4
## 2 21 6 160 110 3.9 2.88 17.0 0 1 4 4
## 3 22.8 4 108 93 3.85 2.32 18.6 1 1 4 1
## 4 21.4 6 258 110 3.08 3.22 19.4 1 0 3 1
## 5 18.7 8 360 175 3.15 3.44 17.0 0 0 3 2
## 6 18.1 6 225 105 2.76 3.46 20.2 1 0 3 1
## 7 14.3 8 360 245 3.21 3.57 15.8 0 0 3 4
## 8 24.4 4 147. 62 3.69 3.19 20 1 0 4 2
## 9 22.8 4 141. 95 3.92 3.15 22.9 1 0 4 2
## 10 19.2 6 168. 123 3.92 3.44 18.3 1 0 4 4
## # i 22 more rows

summarise(by_cyl, avg_mpg = mean(mpg, na.rm = TRUE))

## # A tibble: 3 x 2
## cyl avg_mpg
## <dbl> <dbl>
## 1 4 26.7
## 2 6 19.7
## 3 8 15.1

Group by multiple variables

by_cyl2 <- group_by(mtcars, cyl, vs)

by_cyl2

## # A tibble: 32 x 11
## # Groups: cyl, vs [5]
## mpg cyl disp hp drat wt qsec vs am gear carb
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 21 6 160 110 3.9 2.62 16.5 0 1 4 4
## 2 21 6 160 110 3.9 2.88 17.0 0 1 4 4
## 3 22.8 4 108 93 3.85 2.32 18.6 1 1 4 1
## 4 21.4 6 258 110 3.08 3.22 19.4 1 0 3 1
## 5 18.7 8 360 175 3.15 3.44 17.0 0 0 3 2
## 6 18.1 6 225 105 2.76 3.46 20.2 1 0 3 1
## 7 14.3 8 360 245 3.21 3.57 15.8 0 0 3 4
## 8 24.4 4 147. 62 3.69 3.19 20 1 0 4 2
## 9 22.8 4 141. 95 3.92 3.15 22.9 1 0 4 2
## 10 19.2 6 168. 123 3.92 3.44 18.3 1 0 4 4
## # i 22 more rows

summarise(by_cyl2, avg_mpg = mean(mpg, na.rm = TRUE))

## # A tibble: 5 x 3
## # Groups: cyl [3]
## cyl vs avg_mpg
## <dbl> <dbl> <dbl>
## 1 4 0 26
## 2 4 1 26.7
## 3 6 0 20.6
## 4 6 1 19.1
## 5 8 0 15.1

Group by multiple variables

by_cyl3 <- group_by(mtcars, cyl, vs, am)

by_cyl3

## # A tibble: 32 x 11
## # Groups: cyl, vs, am [7]
## mpg cyl disp hp drat wt qsec vs am gear carb
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 21 6 160 110 3.9 2.62 16.5 0 1 4 4
## 2 21 6 160 110 3.9 2.88 17.0 0 1 4 4
## 3 22.8 4 108 93 3.85 2.32 18.6 1 1 4 1
## 4 21.4 6 258 110 3.08 3.22 19.4 1 0 3 1
## 5 18.7 8 360 175 3.15 3.44 17.0 0 0 3 2
## 6 18.1 6 225 105 2.76 3.46 20.2 1 0 3 1
## 7 14.3 8 360 245 3.21 3.57 15.8 0 0 3 4
## 8 24.4 4 147. 62 3.69 3.19 20 1 0 4 2
## 9 22.8 4 141. 95 3.92 3.15 22.9 1 0 4 2
## 10 19.2 6 168. 123 3.92 3.44 18.3 1 0 4 4
## # i 22 more rows

summarise(by_cyl3, avg_mpg = mean(mpg, na.rm = TRUE))

## # A tibble: 7 x 4
## # Groups: cyl, vs [5]
## cyl vs am avg_mpg
## <dbl> <dbl> <dbl> <dbl>
## 1 4 0 1 26
## 2 4 1 0 22.9
## 3 4 1 1 28.4
## 4 6 0 1 20.6
## 5 6 1 0 19.1
## 6 8 0 0 15.0
## 7 8 0 1 15.4

Together, group_by() and summarise() provide one of the tools that you’ll use
most commonly when working with dplyr: grouped summaries.
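
The "# Groups: cyl, vs [5]" header in the output above appears because summarise() removes only the last grouping variable by default. The sketch below makes that visible with group_vars() and shows the .groups argument, which assumes dplyr 1.0.0 or later; the object names are illustrative.

```r
library(dplyr)

# Same as the default behaviour: only the last grouping variable (am) is
# dropped, so the result is still grouped by cyl and vs.
still_grouped <- mtcars %>%
  group_by(cyl, vs, am) %>%
  summarise(avg_mpg = mean(mpg, na.rm = TRUE), .groups = "drop_last")
group_vars(still_grouped)

# .groups = "drop" returns a plain, ungrouped tibble instead.
ungrouped <- mtcars %>%
  group_by(cyl, vs, am) %>%
  summarise(avg_mpg = mean(mpg, na.rm = TRUE), .groups = "drop")
group_vars(ungrouped)
```

Dropping the groups explicitly is a reasonable habit when the summary is the end of the analysis, because later verbs then operate on the whole table again.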

Missing Values
▶ You may have wondered about the na.rm argument we used above.
▶ What happens if we don’t set it?
▶ If the input contains missing values, we will get missing values back!
(mtcars has no missing values, so the results here would not change.)
▶ That’s because aggregation functions obey the usual rule of missing values:
if there’s any missing value in the input, the output will be a missing value.
▶ Fortunately, aggregation functions such as mean(), sum() and sd() have an
na.rm argument which removes the missing values prior to computation:

mtcars %>%
group_by(cyl) %>%
summarise(avg_mpg = mean(mpg, na.rm = TRUE))

## # A tibble: 3 x 2
## cyl avg_mpg
## <dbl> <dbl>
## 1 4 26.7
## 2 6 19.7
## 3 8 15.1
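
Because mtcars has no gaps, the effect of na.rm is easier to see on a small vector. The values below are hypothetical and exist only for this illustration.

```r
# Hypothetical mpg readings with one missing value
mpg_with_gap <- c(21.0, NA, 22.8)

without_narm <- mean(mpg_with_gap)                # NA: the gap propagates
with_narm <- mean(mpg_with_gap, na.rm = TRUE)     # mean of 21.0 and 22.8

without_narm
with_narm
```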

%in%

▶ The %in% operator in R can be used to identify if an element (e.g., a
number) belongs to a vector or to a column of a data frame.
▶ For example, it can be used to see if the number 1 is in the sequence of
numbers 1 to 10.
1. Compare two Sequences of Numbers (vectors)
▶ In this example, we will use %in% to check if two vectors contain
overlapping numbers.
▶ Specifically, for each element of the shorter vector we get a logical
value that tells us whether it is also present in the longer vector.

# sequence of numbers 1:
a <- seq(1, 5)
# sequence of numbers 2:
b <- seq(3, 12)

# using the %in% operator to check matching values in the vectors

a %in% b

## [1] FALSE FALSE TRUE TRUE TRUE

2. Compare two Vectors Containing Letters or Factors

g <- c("C", "D", "E")

h <- c("A", "E", "B", "C", "D", "E", "A", "B", "C", "D", "E")

h %in% g

## [1] FALSE TRUE FALSE TRUE TRUE TRUE FALSE FALSE TRUE TRUE TRUE

which(h %in% g)

## [1] 2 4 5 6 9 10 11
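
The logical vector returned by %in% can also be used directly for subsetting, and negating it keeps the elements that have no match. A short sketch reusing the vectors a and b from the first example; in_both and only_in_a are illustrative names.

```r
a <- seq(1, 5)
b <- seq(3, 12)

in_both <- a[a %in% b]         # elements of a that also occur in b
only_in_a <- a[!(a %in% b)]    # negate the test to keep the rest

in_both
only_in_a
```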

3. Example: Use of %in% in filter

mtcars %>% filter(cyl %in% c(4,6))

## mpg cyl disp hp drat wt qsec vs am gear carb

## Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
## Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
## Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
## Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
## Valiant 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1
## Merc 240D 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2
## Merc 230 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2
## Merc 280 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4
## Merc 280C 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4 4
## Fiat 128 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1
## Honda Civic 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2
## Toyota Corolla 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1
## Toyota Corona 21.5 4 120.1 97 3.70 2.465 20.01 1 0 3 1
## Fiat X1-9 27.3 4 79.0 66 4.08 1.935 18.90 1 1 4 1
## Porsche 914-2 26.0 4 120.3 91 4.43 2.140 16.70 0 1 5 2
## Lotus Europa 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 2
## Ferrari Dino 19.7 6 145.0 175 3.62 2.770 15.50 0 1 5 6
## Volvo 142E 21.4 4 121.0 109 4.11 2.780 18.60 1 1 4 2
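
The filter above is equivalent to combining two == tests with |. The sketch below checks that, and also shows a common pitfall: cyl == c(4, 6) does not test membership, because == recycles the short vector along the column and compares position by position. Object names are illustrative.

```r
library(dplyr)

with_in <- mtcars %>% filter(cyl %in% c(4, 6))
with_or <- mtcars %>% filter(cyl == 4 | cyl == 6)
all.equal(with_in, with_or)    # same rows either way

# Pitfall: c(4, 6) is recycled as 4, 6, 4, 6, ... down the column,
# so matching rows are silently lost.
recycled <- mtcars %>% filter(cyl == c(4, 6))
nrow(with_in)
nrow(recycled)
```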
Useful Functions for Summary Statistics
1. Sample Size
Whenever you do any aggregation, it’s always a good idea to include either a
count (n()), or a count of non-missing values (sum(!is.na(x))).
▶ n() returns the size of the current group.
▶ To count the number of non-missing values, use sum(!is.na(x)).
▶ To count the number of distinct (unique) values, use n_distinct(x).

mtcars %>%
group_by(cyl) %>%
summarise(
avg_mpg = mean(mpg, na.rm = TRUE),
count = n()
)

## # A tibble: 3 x 3
## cyl avg_mpg count
## <dbl> <dbl> <int>
## 1 4 26.7 11
## 2 6 19.7 7
## 3 8 15.1 14
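
The three counts only differ when the data have gaps, so this sketch works on a copy of mtcars with one mpg value blanked out. The gap is artificial and introduced only for the illustration; mt, counts and the column names are hypothetical.

```r
library(dplyr)

mt <- mtcars
mt$mpg[1] <- NA    # artificial gap in the first row (Mazda RX4, 6 cylinders)

counts <- mt %>%
  group_by(cyl) %>%
  summarise(
    n_rows = n(),                        # size of the group
    n_mpg_known = sum(!is.na(mpg)),      # non-missing mpg values
    n_gear_values = n_distinct(gear)     # distinct gear values
  )

counts
```

For the 6-cylinder group n_rows stays at 7 while n_mpg_known falls to 6, which is the reason to report both alongside any average.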

What if we don’t use summarize(), but instead use mutate()?
mtcars %>%
group_by(cyl) %>%
mutate(
avg_mpg = mean(mpg, na.rm = TRUE),
count = n()
) %>%
select(mpg, cyl, avg_mpg, count)

## # A tibble: 32 x 4
## # Groups: cyl [3]
## mpg cyl avg_mpg count
## <dbl> <dbl> <dbl> <int>
## 1 21 6 19.7 7
## 2 21 6 19.7 7
## 3 22.8 4 26.7 11
## 4 21.4 6 19.7 7
## 5 18.7 8 15.1 14
## 6 18.1 6 19.7 7
## 7 14.3 8 15.1 14
## 8 24.4 4 26.7 11
## 9 22.8 4 26.7 11
## 10 19.2 6 19.7 7
## # i 22 more rows
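
As the output shows, mutate() keeps all 32 rows and repeats each group statistic on every row of its group. One typical use, sketched here with illustrative names, is comparing each car with the average of its own cylinder group.

```r
library(dplyr)

deviations <- mtcars %>%
  group_by(cyl) %>%
  mutate(mpg_vs_group = mpg - mean(mpg)) %>%   # mean(mpg) is computed per group
  ungroup() %>%
  select(mpg, cyl, mpg_vs_group)

head(deviations)
```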
Group by multiple variables:

mtcars %>%
group_by(cyl, vs, am) %>%
summarise(
avg_mpg = mean(mpg, na.rm = TRUE),
count = n()
)

## # A tibble: 7 x 5
## # Groups: cyl, vs [5]
## cyl vs am avg_mpg count
## <dbl> <dbl> <dbl> <dbl> <int>
## 1 4 0 1 26 1
## 2 4 1 0 22.9 3
## 3 4 1 1 28.4 7
## 4 6 0 1 20.6 3
## 5 6 1 0 19.1 4
## 6 8 0 0 15.0 12
## 7 8 0 1 15.4 2

mtcars %>%
group_by(cyl) %>%
summarise(
engine_num = n_distinct(vs)
) %>%
arrange(desc(engine_num))

## # A tibble: 3 x 2
## cyl engine_num
## <dbl> <int>
## 1 4 2
## 2 6 2
## 3 8 1

2. Sum and Average
sum(x), mean(x), median(x)
It’s sometimes useful to combine aggregation with logical subsetting. For
example:

mtcars %>%
group_by(cyl) %>%
summarise(
avg_mpg = mean(mpg),
avg_positive_mpg = mean(mpg[mpg > 20]) # average mpg over cars with mpg > 20 only; NaN if a group has none
)

## # A tibble: 3 x 3
## cyl avg_mpg avg_positive_mpg
## <dbl> <dbl> <dbl>
## 1 4 26.7 26.7
## 2 6 19.7 21.1
## 3 8 15.1 NaN
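
A related idiom applies sum() and mean() to the logical condition itself: TRUE counts as 1 and FALSE as 0, so sum() gives the number of matching rows and mean() gives their proportion. A minimal sketch with illustrative column names:

```r
library(dplyr)

over_20 <- mtcars %>%
  group_by(cyl) %>%
  summarise(
    n_over_20 = sum(mpg > 20),       # how many cars exceed 20 mpg
    prop_over_20 = mean(mpg > 20)    # what share of the group that is
  )

over_20
```

The zero count for the 8-cylinder group is what produces the NaN above: the mean of an empty vector is undefined.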

3. Variation
Measures of spread:
Standard deviation sd(x), Variance var(x), The interquartile range IQR(x),
Median absolute deviation mad(x).

mtcars %>%
group_by(cyl) %>%
summarise(sd_mpg = sd(mpg)) %>%
arrange(desc(sd_mpg))

## # A tibble: 3 x 2
## cyl sd_mpg
## <dbl> <dbl>
## 1 4 4.51
## 2 8 2.56
## 3 6 1.45
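
The other spread measures drop into summarise() in the same way. The sketch below computes them side by side; the column names are illustrative.

```r
library(dplyr)

spread <- mtcars %>%
  group_by(cyl) %>%
  summarise(
    sd_mpg = sd(mpg),
    var_mpg = var(mpg),     # the square of sd_mpg
    iqr_mpg = IQR(mpg),     # 75th minus 25th percentile
    mad_mpg = mad(mpg)      # median absolute deviation
  )

spread
```

IQR() and mad() are less sensitive to outliers than sd() and var(), which can matter in small groups like these.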

4. Rank
Measures of rank: min(x), quantile(x, 0.25), quantile(x, 0.75), max(x).

mtcars %>%
group_by(cyl) %>%
summarise(
min_hp = min(hp, na.rm = TRUE),
max_hp = max(hp, na.rm = TRUE),
percentile_25 = quantile(hp, 0.25, na.rm = TRUE),
percentile_75 = quantile(hp, 0.75, na.rm = TRUE)
)

## # A tibble: 3 x 5
## cyl min_hp max_hp percentile_25 percentile_75
## <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 4 52 113 65.5 96
## 2 6 105 175 110 123
## 3 8 150 335 176. 241.
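
The median belongs to the same family: it is the 0.5 quantile. The sketch below checks that with quantile()'s default settings and adds the range as max minus min; names = FALSE only strips the "50%" label from the result. Column names are illustrative.

```r
library(dplyr)

ranks <- mtcars %>%
  group_by(cyl) %>%
  summarise(
    median_hp = median(hp),
    q50_hp = quantile(hp, 0.5, names = FALSE),   # same value as the median
    range_hp = max(hp) - min(hp)
  )

ranks
```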

3. Babynames Data

Example 1: Statistics for Certain Names

bn <- babynames::babynames
nms <- c('Mary', 'John', 'Emily', 'Lawrence')
bn %>%
filter(name %in% nms) %>%
group_by(name) %>%
summarize('Number of Years' = n(), 'Number of babies' = sum(n))

## # A tibble: 4 x 3
## name `Number of Years` `Number of babies`
## <chr> <int> <int>
## 1 Emily 212 843235
## 2 John 276 5137142
## 3 Lawrence 237 458898
## 4 Mary 268 4138360

length(unique(bn$year))

## [1] 138

We have 138 years in total.

Why could the Number of Years variable be even larger than 138?

bn %>%
filter(name %in% nms) %>%
group_by(name, sex) %>%
summarize(num.years.by.sex = n(),
num.babies.by.sex =sum(n)) %>%
mutate(total.num.babies = sum(num.babies.by.sex),
proportion = num.babies.by.sex/total.num.babies)

## # A tibble: 8 x 6
## # Groups: name [4]
## name sex num.years.by.sex num.babies.by.sex total.num.babies
## <chr> <chr> <int> <int> <int>
## 1 Emily F 138 841491 843235
## 2 Emily M 74 1744 843235
## 3 John F 138 21676 5137142
## 4 John M 138 5115466 5137142
## 5 Lawrence F 99 2125 458898
## 6 Lawrence M 138 456773 458898
## 7 Mary F 138 4123200 4138360
## 8 Mary M 130 15160 4138360
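
The by-sex breakdown answers the earlier question: the data hold one row per year, sex and name, so a name given to both girls and boys contributes up to two rows per year, and n() can reach 2 × 138 = 276. To count years rather than rows, n_distinct(year) can be used. The sketch below uses a small made-up table with the same key columns (year, sex, name, n) instead of the real babynames data.

```r
library(dplyr)

# Made-up rows in the shape of babynames; the counts are not real data.
toy <- tibble(
  year = c(2000, 2000, 2001, 2001, 2002),
  sex  = c("F", "M", "F", "M", "F"),
  name = "Lee",
  n    = c(10, 12, 11, 9, 8)
)

year_counts <- toy %>%
  group_by(name) %>%
  summarise(
    rows = n(),                         # one per year-sex combination
    distinct_years = n_distinct(year)   # each year counted once
  )

year_counts
```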

Example 2: Names that Persistently Appear in Each Year

numyears <- length(unique(bn$year))

persistentnames <- bn %>%
group_by(name, year) %>% ## M/F doesn't matter
summarize() %>%
group_by(name) %>%
summarize(Total = n()) %>%
filter(Total==numyears)
nrow(persistentnames)

## [1] 922
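
The two-step pipeline first reduces the data to distinct name-year pairs and then counts them per name. An alternative, sketched here on a small made-up table rather than the real babynames data, counts the distinct years per name in a single step with n_distinct(). It is offered as an equivalent formulation, not as the tutorial's method.

```r
library(dplyr)

# Made-up rows in the shape of babynames (year, sex, name)
toy <- tibble(
  year = c(2000, 2000, 2001, 2001, 2000),
  sex  = c("F", "M", "F", "M", "F"),
  name = c("Lee", "Lee", "Lee", "Lee", "Ada")
)
numyears <- length(unique(toy$year))

# The tutorial's approach: distinct name-year pairs, then count per name
two_step <- toy %>%
  group_by(name, year) %>%
  summarize() %>%
  group_by(name) %>%
  summarize(Total = n()) %>%
  filter(Total == numyears)

# Alternative: count distinct years per name directly
one_step <- toy %>%
  group_by(name) %>%
  summarize(Total = n_distinct(year)) %>%
  filter(Total == numyears)

two_step
one_step
```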

bn %>%
group_by(name, year) %>% ## M/F doesn't matter
summarize()

## # A tibble: 1,756,284 x 2
## # Groups: name [97,310]
## name year
## <chr> <dbl>
## 1 Aaban 2007
## 2 Aaban 2009
## 3 Aaban 2010
## 4 Aaban 2011
## 5 Aaban 2012
## 6 Aaban 2013
## 7 Aaban 2014
## 8 Aaban 2015
## 9 Aaban 2016
## 10 Aaban 2017
## # i 1,756,274 more rows

bn %>%
group_by(name, year) %>% ## M/F doesn't matter
summarize() %>%
group_by(name)

bn %>%
group_by(name, year) %>% ## M/F doesn't matter
summarize() %>%
group_by(name) %>%
summarize(Total = n())

## # A tibble: 97,310 x 2
## name Total
## <chr> <int>
## 1 Aaban 10
## 2 Aabha 5
## 3 Aabid 2
## 4 Aabir 1
## 5 Aabriella 5
## 6 Aada 1
## 7 Aadam 26
## 8 Aadan 11
## 9 Aadarsh 17
## 10 Aaden 17
## # i 97,300 more rows

bn %>%
group_by(name, year) %>% ## M/F doesn't matter
summarize() %>%
group_by(name) %>%
summarize(Total = n()) %>%
filter(Total==numyears)

## # A tibble: 922 x 2
## name Total
## <chr> <int>
## 1 Aaron 138
## 2 Abbie 138
## 3 Abe 138
## 4 Abel 138
## 5 Abigail 138
## 6 Abner 138
## 7 Abraham 138
## 8 Abram 138
## 9 Ada 138
## 10 Adam 138
## # i 912 more rows

What if we delete group_by(name)?

bn %>%
group_by(name, year) %>% ## M/F doesn't matter
summarize() %>%
##group_by(name) %>%
summarize(Total = n()) %>%
filter(Total==numyears)
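
With two grouping variables, deleting group_by(name) should not change the result: the first summarize() removes only the last grouping variable (year), so the data are still grouped by name when the second summarize() runs. The sketch below checks this on a small made-up table rather than the real babynames data; the object names are illustrative.

```r
library(dplyr)

# Made-up rows in the shape of babynames (year, sex, name)
toy <- tibble(
  year = c(2000, 2000, 2001, 2001, 2000),
  sex  = c("F", "M", "F", "M", "F"),
  name = c("Lee", "Lee", "Lee", "Lee", "Ada")
)

after_first <- toy %>%
  group_by(name, year) %>%
  summarize()
group_vars(after_first)    # only name is left as a grouping variable

with_regroup <- after_first %>%
  group_by(name) %>%
  summarize(Total = n())
without_regroup <- after_first %>%
  summarize(Total = n())

with_regroup
without_regroup
```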

persistentnames2 <- bn %>%
group_by(name, year, sex) %>% ## M/F does matter
summarize() %>%
group_by(name) %>%
summarize(Total = n()) %>%
filter(Total == 2*numyears)
nrow(persistentnames2)

## [1] 15

bn %>%
group_by(name, year, sex) %>% ## M/F does matter
summarize()

## # A tibble: 1,924,665 x 3
## # Groups: name, year [1,756,284]
## name year sex
## <chr> <dbl> <chr>
## 1 Aaban 2007 M
## 2 Aaban 2009 M
## 3 Aaban 2010 M
## 4 Aaban 2011 M
## 5 Aaban 2012 M
## 6 Aaban 2013 M
## 7 Aaban 2014 M
## 8 Aaban 2015 M
## 9 Aaban 2016 M
## 10 Aaban 2017 M
## # i 1,924,655 more rows

bn %>%
group_by(name, year, sex) %>% ## M/F does matter
summarize() %>%
group_by(name) %>%
summarize(Total = n())

## # A tibble: 97,310 x 2
## name Total
## <chr> <int>
## 1 Aaban 10
## 2 Aabha 5
## 3 Aabid 2
## 4 Aabir 1
## 5 Aabriella 5
## 6 Aada 1
## 7 Aadam 26
## 8 Aadan 11
## 9 Aadarsh 17
## 10 Aaden 18
## # i 97,300 more rows

bn %>%
group_by(name, year, sex) %>% ## M/F does matter
summarize() %>%
group_by(name) %>%
summarize(Total = n()) %>%
filter(Total == 2*numyears)

## # A tibble: 15 x 2
## name Total
## <chr> <int>
## 1 Francis 276
## 2 James 276
## 3 Jean 276
## 4 Jesse 276
## 5 Jessie 276
## 6 John 276
## 7 Johnnie 276
## 8 Joseph 276
## 9 Lee 276
## 10 Leslie 276
## 11 Marion 276
## 12 Ollie 276
## 13 Sidney 276
## 14 Tommie 276

What if we delete group_by(name)?

bn %>%
group_by(name, year, sex) %>% ## M/F does matter
summarize() %>%
##group_by(name) %>%
summarize(Total = n())%>%
filter(Total == 2*numyears)

## # A tibble: 0 x 3
## # Groups: name [0]
## # i 3 variables: name <chr>, year <dbl>, Total <int>

bn %>%
group_by(name, year, sex) %>% ## M/F does matter
summarize() %>%
##group_by(name) %>%
summarize(Total = n())

## # A tibble: 1,756,284 x 3
## # Groups: name [97,310]
## name year Total
## <chr> <dbl> <int>
## 1 Aaban 2007 1
## 2 Aaban 2009 1
## 3 Aaban 2010 1
## 4 Aaban 2011 1
## 5 Aaban 2012 1
## 6 Aaban 2013 1
## 7 Aaban 2014 1
## 8 Aaban 2015 1
## 9 Aaban 2016 1
## 10 Aaban 2017 1
## # i 1,756,274 more rows
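
With three grouping variables the outcome differs. The first summarize() removes only sex, so the data stay grouped by name and year, and the second summarize() counts the sexes recorded for each name-year pair. That count is 1 or 2, never 2*numyears, which is why the filter returns no rows. A sketch on a small made-up table rather than the real babynames data, with illustrative object names:

```r
library(dplyr)

# Made-up rows in the shape of babynames (year, sex, name)
toy <- tibble(
  year = c(2000, 2000, 2001, 2001, 2000),
  sex  = c("F", "M", "F", "M", "F"),
  name = c("Lee", "Lee", "Lee", "Lee", "Ada")
)

after_first <- toy %>%
  group_by(name, year, sex) %>%
  summarize()
group_vars(after_first)    # name and year remain as grouping variables

per_name_year <- after_first %>%
  summarize(Total = n())   # counted per name-year pair, not per name

per_name_year
```

Calling group_by(name) before the second summarize() replaces the leftover grouping, which is what lets the per-name count reach 2*numyears.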
