
R For Data Science - Tidyverse For Beginners (ggplot2, dplyr, tidyr, readr, purrr, tibble, stringr, forcats)

The document provides a cheat sheet on using the tidyverse collection of R packages, which includes dplyr for data manipulation, ggplot2 for data visualization, and other packages like tidyr and purrr. It gives examples of using functions from dplyr like filter(), arrange(), and mutate() to select, sort, and modify data. Additionally, it demonstrates how to create scatter plots and line graphs with ggplot2 by mapping variables to aesthetics and using geoms like geom_point() and geom_line().

Uploaded by Jose AG
Copyright © All Rights Reserved

R For Data Science Cheat Sheet
Tidyverse for Beginners
Learn More R for Data Science Interactively at www.datacamp.com

Tidyverse

The tidyverse is a powerful collection of R packages that provide tools for transforming and visualizing data. All packages of the tidyverse share an underlying philosophy and common APIs.

The core packages are:

• ggplot2 implements the grammar of graphics. You can use it to visualize your data.
• dplyr is a grammar of data manipulation. You can use it to solve the most common data manipulation challenges.
• tidyr helps you create tidy data, that is, data where each variable is a column, each observation is a row, and each value is a cell.
• readr is a fast and friendly way to read rectangular data.
• purrr enhances R’s functional programming (FP) toolkit by providing a complete and consistent set of tools for working with functions and vectors.
• tibble is a modern reimagining of the data frame.
• stringr provides a cohesive set of functions designed to make working with strings as easy as possible.
• forcats provides a suite of useful tools that solve common problems with factors.

You can install the complete tidyverse with:
> install.packages("tidyverse")

Then, load the core tidyverse and make it available in your current R session by running:
> library(tidyverse)

Note: there are many other tidyverse packages with more specialised usage. They are not loaded automatically with library(tidyverse), so you’ll need to load each one with its own call to library().

Useful Functions
> tidyverse_conflicts()   List conflicts between tidyverse and other packages
> tidyverse_deps()        List all tidyverse dependencies
> tidyverse_logo()        Get the tidyverse logo, using ASCII or unicode characters
> tidyverse_packages()    List all tidyverse packages
> tidyverse_update()      Update tidyverse packages

Loading in the data
> library(datasets)    Load the datasets package
> library(gapminder)   Load the gapminder package (install it first with install.packages("gapminder"))
> attach(iris)         Attach iris data to the R search path

dplyr

Filter
filter() allows you to select a subset of rows in a data frame.

Select iris data of species "virginica":
> iris %>% filter(Species == "virginica")

Select iris data of species "virginica" and sepal length greater than 6:
> iris %>% filter(Species == "virginica", Sepal.Length > 6)

Arrange
arrange() sorts the observations in a dataset in ascending or descending order based on one of its variables.

Sort in ascending order of sepal length:
> iris %>% arrange(Sepal.Length)

Sort in descending order of sepal length:
> iris %>% arrange(desc(Sepal.Length))

Combine multiple dplyr verbs in a row with the pipe operator %>%. Filter for species "virginica", then arrange in descending order of sepal length:
> iris %>%
    filter(Species == "virginica") %>%
    arrange(desc(Sepal.Length))

Mutate
mutate() allows you to update or create new columns of a data frame.

Change Sepal.Length to be in millimeters:
> iris %>% mutate(Sepal.Length = Sepal.Length * 10)

Create a new column called SLMm:
> iris %>% mutate(SLMm = Sepal.Length * 10)

Combine the verbs filter(), arrange(), and mutate():
> iris %>%
    filter(Species == "virginica") %>%
    mutate(SLMm = Sepal.Length * 10) %>%
    arrange(desc(SLMm))

Summarize
summarize() allows you to turn many observations into a single data point.

Summarize to find the median sepal length:
> iris %>% summarize(medianSL = median(Sepal.Length))

Filter for virginica, then summarize the median sepal length:
> iris %>%
    filter(Species == "virginica") %>%
    summarize(medianSL = median(Sepal.Length))

You can also compute several summaries at once:
> iris %>%
    filter(Species == "virginica") %>%
    summarize(medianSL = median(Sepal.Length), maxSL = max(Sepal.Length))

group_by() allows you to summarize within groups instead of summarizing the entire dataset.

Find the median and max sepal length of each species:
> iris %>%
    group_by(Species) %>%
    summarize(medianSL = median(Sepal.Length), maxSL = max(Sepal.Length))

Find the median and max petal length of each species, for flowers with sepal length > 6:
> iris %>%
    filter(Sepal.Length > 6) %>%
    group_by(Species) %>%
    summarize(medianPL = median(Petal.Length), maxPL = max(Petal.Length))

ggplot2

Scatter plot
Scatter plots allow you to compare two variables within your data. To do this with ggplot2, you use geom_point().

> iris_small <- iris %>% filter(Sepal.Length > 5)

Compare petal width and length:
> ggplot(iris_small, aes(x = Petal.Length, y = Petal.Width)) +
    geom_point()

Additional Aesthetics

• Color
> ggplot(iris_small, aes(x = Petal.Length, y = Petal.Width, color = Species)) +
    geom_point()

• Size
> ggplot(iris_small, aes(x = Petal.Length, y = Petal.Width,
                         color = Species, size = Sepal.Length)) +
    geom_point()

Faceting
Draw one panel per species with facet_wrap():
> ggplot(iris_small, aes(x = Petal.Length, y = Petal.Width)) +
    geom_point() +
    facet_wrap(~Species)

Line Plots
Plot the median GDP per capita by year from the gapminder data:
> by_year <- gapminder %>%
    group_by(year) %>%
    summarize(medianGdpPerCap = median(gdpPercap))
> ggplot(by_year, aes(x = year, y = medianGdpPerCap)) +
    geom_line() +
    expand_limits(y = 0)

Bar Plots
Plot the median petal length of each species, for flowers with sepal length > 6:
> by_species <- iris %>%
    filter(Sepal.Length > 6) %>%
    group_by(Species) %>%
    summarize(medianPL = median(Petal.Length))
> ggplot(by_species, aes(x = Species, y = medianPL)) +
    geom_col()

Histograms
Plot the distribution of petal length:
> ggplot(iris_small, aes(x = Petal.Length)) +
    geom_histogram()

Box Plots
Compare the distribution of sepal width across species:
> ggplot(iris_small, aes(x = Species, y = Sepal.Width)) +
    geom_boxplot()
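The filter() examples in the dplyr section can be checked by counting the rows each call keeps. The sketch below is minimal and assumes dplyr is installed. The variable names are illustrative. It uses only the built-in iris data and shows two things. First, comma-separated conditions are combined with AND. Second, matching a species name is exact and case-sensitive.

```r
library(dplyr)

# All rows of species "virginica"
virginica <- iris %>% filter(Species == "virginica")

# Comma-separated conditions are combined with AND ...
virginica_long <- iris %>% filter(Species == "virginica", Sepal.Length > 6)
# ... so this explicit & version keeps exactly the same rows
same <- iris %>% filter(Species == "virginica" & Sepal.Length > 6)

# Matching is case-sensitive: no Species level is spelled "Virginica"
wrong_case <- iris %>% filter(Species == "Virginica")

nrow(virginica)                   # 50
identical(virginica_long, same)   # TRUE
nrow(wrong_case)                  # 0
```

The last line explains why a pipeline that starts with filter(Species=="Virginica") silently returns an empty data frame.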
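For the Arrange examples and the pipe operator %>%, the sketch below (assuming dplyr is installed; variable names are illustrative) sorts both ways. It also confirms that a piped chain gives the same result as nesting the same calls inside each other.

```r
library(dplyr)

# Ascending order is the default; desc() reverses it
asc         <- iris %>% arrange(Sepal.Length)
desc_sorted <- iris %>% arrange(desc(Sepal.Length))

# The pipe passes each result on as the first argument of the next verb
piped  <- iris %>%
  filter(Species == "virginica") %>%
  arrange(desc(Sepal.Length))
nested <- arrange(filter(iris, Species == "virginica"), desc(Sepal.Length))

identical(piped, nested)   # TRUE
```

The piped form reads top to bottom in the order the steps run, which is why the cheat sheet prefers it.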
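The two Mutate examples differ in one way. Naming an existing column overwrites it, while a new name adds a column. The sketch below assumes dplyr is installed, and its variable names are illustrative. It makes the difference visible by comparing column counts; iris measurements are in centimetres, so multiplying by 10 gives millimetres.

```r
library(dplyr)

# Overwrite an existing column: sepal length from centimetres to millimetres
in_mm <- iris %>% mutate(Sepal.Length = Sepal.Length * 10)

# Use a new name instead to keep the original column and add SLMm
with_new <- iris %>% mutate(SLMm = Sepal.Length * 10)

ncol(iris)      # 5
ncol(in_mm)     # 5: same columns, values replaced
ncol(with_new)  # 6: SLMm is added as the last column
```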
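The Summarize section contrasts summarizing the whole dataset with summarizing inside groups. The minimal sketch below assumes dplyr is installed and uses illustrative variable names. It shows the shape of each result and cross-checks the grouped medians against base R's tapply().

```r
library(dplyr)

# Without grouping, summarize() collapses the whole data set to one row
overall <- iris %>% summarize(medianSL = median(Sepal.Length))

# After group_by(), summarize() returns one row per group
per_species <- iris %>%
  group_by(Species) %>%
  summarize(medianSL = median(Sepal.Length), maxSL = max(Sepal.Length))

nrow(overall)      # 1
nrow(per_species)  # 3, one per level of Species
per_species
```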
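The Color aesthetic and Faceting examples can be inspected without looking at the rendered plots. ggplot2's layer_data() returns the data a layer actually draws. The sketch below assumes dplyr and ggplot2 are installed; variable names are illustrative. It checks that mapping color = Species yields one colour per species, and that facet_wrap(~Species) yields one panel per species.

```r
library(dplyr)
library(ggplot2)

iris_small <- iris %>% filter(Sepal.Length > 5)

# Colour mapped to Species: ggplot2 assigns one colour per species level
p_color <- ggplot(iris_small, aes(x = Petal.Length, y = Petal.Width, color = Species)) +
  geom_point()

# Faceting: the same scatter plot split into one panel per species
p_facet <- ggplot(iris_small, aes(x = Petal.Length, y = Petal.Width)) +
  geom_point() +
  facet_wrap(~Species)

# layer_data() exposes the computed data (colours, panel ids) for a layer
n_colours <- length(unique(layer_data(p_color)$colour))
n_panels  <- length(unique(layer_data(p_facet)$PANEL))
```

All three species have some flowers with sepal length above 5, so both counts are 3 here.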
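The Line Plots example needs the gapminder package. The sketch below substitutes the built-in airquality data instead, so this is an assumption, not the cheat sheet's data: it plots the median daily temperature per month using the same group_by() plus summarize() pattern. It assumes dplyr and ggplot2 are installed, and by_month and p_line are hypothetical names. It also shows how expand_limits(y = 0) works: it adds an invisible geom_blank() layer at y = 0, so the axis is forced to include zero.

```r
library(dplyr)
library(ggplot2)

# Stand-in for the gapminder example: median temperature for each month
by_month <- airquality %>%
  group_by(Month) %>%
  summarize(medianTemp = median(Temp))

p_line <- ggplot(by_month, aes(x = Month, y = medianTemp)) +
  geom_line() +
  expand_limits(y = 0)

# Layer 1 is the line; layer 2 is the blank layer added by expand_limits()
length(p_line$layers)   # 2
```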
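The Bar Plots example draws fewer bars than there are species. No setosa flower in iris has a sepal longer than 6 cm, so filter(Sepal.Length > 6) leaves only two groups. The sketch below assumes dplyr and ggplot2 are installed, and bar_heights is a hypothetical name. It confirms this and checks that geom_col() draws the summarized values as bar heights, whereas geom_bar() would count rows.

```r
library(dplyr)
library(ggplot2)

by_species <- iris %>%
  filter(Sepal.Length > 6) %>%
  group_by(Species) %>%
  summarize(medianPL = median(Petal.Length))

nrow(by_species)   # 2: only versicolor and virginica remain

# geom_col() uses the y values as given, one bar per row of by_species
p_col <- ggplot(by_species, aes(x = Species, y = medianPL)) + geom_col()
bar_heights <- layer_data(p_col)$y
```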
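The Histograms and Box Plots examples both summarize a distribution. The sketch below assumes dplyr and ggplot2 are installed; the bins = 15 value and the variable names are illustrative choices, not from the cheat sheet. It uses layer_data() to confirm that every observation lands in one histogram bin, and that the middle line of each box is that species' median sepal width. Passing bins explicitly also avoids the message geom_histogram() prints when it falls back to its default of 30 bins.

```r
library(dplyr)
library(ggplot2)

iris_small <- iris %>% filter(Sepal.Length > 5)

# Histogram: the bin counts add up to the number of rows plotted
p_hist <- ggplot(iris_small, aes(x = Petal.Length)) + geom_histogram(bins = 15)
hist_counts <- layer_data(p_hist)$count

# Box plot: the middle line of each box is the group median
p_box <- ggplot(iris_small, aes(x = Species, y = Sepal.Width)) + geom_boxplot()
box_medians <- layer_data(p_box)$middle

# The same medians computed with dplyr, in the same species order
medians <- iris_small %>%
  group_by(Species) %>%
  summarize(medianSW = median(Sepal.Width))
```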
