(Tutorial) Graphics With Ggplot2 - DataCamp
DataCamp Team
September 25th, 2020
Data visualization is an essential skill for data scientists. It combines statistics and design
in meaningful and appropriate ways. On the one hand, data visualization is a form of
graphical data analysis, emphasizing accurate representation and interpretation of data. On
the other hand, data visualization relies on good design choices to make our plots
attractive and aid both the understanding and communication of results. On top of that,
there is an element of creativity, since data visualization is a form of visual
communication at its heart.
https://www.datacamp.com/community/tutorials/graphics-with-ggplot2
1/1/2021 (Tutorial) Graphics with ggplot2 - DataCamp
As a data scientist, you'll need to quickly explore data, but you'll also be tasked with
explaining your results to stakeholders. Good design begins with thinking about the
audience, and sometimes that just means ourselves.
Scatter Plot
Below, we have a dataset that contains the average brain and body weights of 62
mammals.
MASS::mammals
                    body   brain
Arctic fox         3.385   44.50
Owl monkey         0.480   15.50
Mountain beaver    1.350    8.10
Cow              465.000  423.00
Grey wolf         36.330  119.50
Goat              27.660  115.00
Roe deer          14.830   98.20
...
Pig              192.000  180.00
Echidna            3.000   25.00
Brazilian tapir  160.000  169.00
Tenrec             0.900    2.60
Phalanger          1.620   11.40
Tree shrew         0.104    2.50
Red fox            4.235   50.40
To understand the relationship here, the most obvious first step is to make a scatter plot,
like the one shown below:
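A scatter plot of this kind could be produced roughly as follows. This is a minimal sketch, not the tutorial's own code: it assumes the ggplot2 and MASS packages are installed, and the object name p_scatter and the axis labels are illustrative choices.

```r
library(ggplot2)

# mammals ships with the MASS package: body weight in kg, brain weight in g
data(mammals, package = "MASS")

# Map body weight to x and brain weight to y, then draw one point per species
p_scatter <- ggplot(mammals, aes(x = body, y = brain)) +
  geom_point() +
  labs(x = "Body weight (kg)", y = "Brain weight (g)")

print(p_scatter)
```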
Two mammals, the African and Asian elephants, have very large brain and body weights,
leading to a positive skew on both axes.
Linear Model
Fitting a linear model to the raw data would be a poor choice, since a few extreme
values have a large influence on the fit.
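One way to see that influence is to overlay the fitted line on the scatter plot and then compare the slope with and without the two elephants. The sketch below is illustrative only: the names p_lm, fit_all, and fit_no_elephants are made up for this example, and selecting the elephants by matching "elephant" in the row names is an assumption about how one might isolate them.

```r
library(ggplot2)

data(mammals, package = "MASS")

# Scatter plot with a least-squares line fitted to all 62 species
p_lm <- ggplot(mammals, aes(x = body, y = brain)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)
print(p_lm)

# Fit the same model with and without the two elephants
is_elephant <- grepl("elephant", rownames(mammals), ignore.case = TRUE)
fit_all <- lm(brain ~ body, data = mammals)
fit_no_elephants <- lm(brain ~ body, data = mammals[!is_elephant, ])

# Compare the estimated slopes of the two fits
slopes <- c(
  all = coef(fit_all)[["body"]],
  no_elephants = coef(fit_no_elephants)[["body"]]
)
print(slopes)
```

If the slope changes noticeably once two of the 62 rows are dropped, the fit is being driven by those points rather than by the bulk of the data.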
So, although we began with a rough exploratory plot, it informed us about our data and
led us to a meaningful result.
Anscombe's Plots
When we picture a linear model, like the one in this anonymous plot, we imagine that
it describes data that look something like this.
But this same model could be describing a very different set of data, such as a parabolic
relationship, which calls for a different model.
Or it could be describing data in which an extreme value has a large effect, which
becomes clear when the outlier is removed.
And sometimes, the model may be describing a relationship where, in fact, there is none
at all because some extreme values may be incorrect.
If we relied solely on the numerical output without plotting our data, we'd have missed
distinct and interesting underlying trends.
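The point can be checked numerically with the anscombe data frame that ships with R (columns x1 to x4 and y1 to y4). The sketch below assumes these are the data behind the plots described above. It fits a separate linear model to each of the four pairs, and the names fits and coefs are illustrative.

```r
# anscombe is part of the datasets package, which R loads by default
fits <- lapply(1:4, function(i) {
  # Build the formula y_i ~ x_i for the i-th pair of columns
  lm(reformulate(paste0("x", i), response = paste0("y", i)), data = anscombe)
})

# Collect intercept and slope of each fit into a 4 x 2 matrix
coefs <- t(sapply(fits, coef))
colnames(coefs) <- c("intercept", "slope")
print(round(coefs, 2))
```

All four fits come out with an intercept of about 3 and a slope of about 0.5, even though the four point clouds look nothing alike when plotted.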
We can see that data visualization is rooted in statistics and graphical data analysis, but
it's also a creative process that involves some amount of trial and error.
Interactive Example
In the following example, you will first load the ggplot2 package using library(). Then,
you will use str() to explore the structure of the mtcars dataset.
Finally, you will draw the plot and try to understand what ggplot2 does with the data.
The mtcars dataset contains information on 32 cars from a 1974 issue of
Motor Trend magazine. This dataset is small, intuitive, and contains a variety of
continuous and categorical variables.
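The three steps might look like the sketch below. The original interactive exercise is not reproduced in this text, so the choice of cyl on the x-axis and mpg on the y-axis, and the name p_cars, are assumptions made for illustration.

```r
# Step 1: load the ggplot2 package
library(ggplot2)

# Step 2: explore the structure of mtcars (32 rows, 11 numeric columns)
str(mtcars)

# Step 3: plot miles per gallon against number of cylinders
p_cars <- ggplot(mtcars, aes(x = cyl, y = mpg)) +
  geom_point()

print(p_cars)
```

Because cyl is stored as a number, ggplot2 treats it as continuous and the x-axis covers the whole range from 4 to 8, even though only the values 4, 6, and 8 occur in the data.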
To learn more about data visualization with ggplot, please see this video from our course
Introduction to Data Visualization with ggplot2.
This content is taken from DataCamp’s Introduction to Data Visualization with ggplot2
course by Rick Scavetta.