
Week 3: Data visualization with ggplot

SOC252 Winter 2023

Table of contents

1 By the end of this lab you should know

2 Read in the data

3 ggplot
3.1 Histograms
3.2 Bar charts
3.3 Box plots
3.4 Line graphs
3.5 Scatter plots
3.6 Faceting

4 Review Questions

1 By the end of this lab you should know

• ggplot basics; how to make each of the important types of graphs

– histogram
– bar chart
– boxplot
– line plot
– scatter plot
• how to color / fill by group
• fct_reorder to reorder categorical values
• selecting only certain values of a variable using %in%
• faceting

2 Read in the data

We’ll be using the GSS and country indicators data sets.

library(tidyverse)

# Read the files

gss <- read_csv("../data/gss.csv")
country_ind <- read_csv("../data/country_indicators.csv")

3 ggplot

ggplot2 (loaded as part of the tidyverse) is a powerful visualization package. Its ggplot() function provides many options for making graphs, maps, and plots of all sorts. We will look at some important graph types today.

3.1 Histograms

ggplot works in layers:

• The first piece is the ggplot function, where you specify

– the data set where the data to be plotted are contained
– the aes function, where you specify the variables to be plotted
• We then add the type of plot to make, using one of the functions prefixed by geom_
• We can then customize titles, labels, themes, etc.
• Each of these layers is added to the plot with +

So for a histogram of ages at first marriage in the GSS, we start with specifying the dataset
and variable:

ggplot(data = gss, aes(age_at_first_marriage))

[Figure: empty plot panel; x axis: age_at_first_marriage]

Note that this is just an empty panel, but it already has the right x axis. Now add a histogram:

ggplot(data = gss, aes(age_at_first_marriage)) +
  geom_histogram()

[Figure: histogram; x axis: age_at_first_marriage, y axis: count]
Customize the labels:

ggplot(data = gss, aes(age_at_first_marriage)) +
  geom_histogram() +
  labs(title = "Age at first marriage, GSS", x = "Age at first marriage (years)")

[Figure: "Age at first marriage, GSS"; histogram; x axis: Age at first marriage (years), y axis: count]

Now we can change the colors of the bars. For histograms, bar charts, and box plots, fill sets the main (interior) color, while color changes the outline:

ggplot(data = gss, aes(age_at_first_marriage)) +
  geom_histogram(fill = "lightblue", color = "navy") +
  labs(title = "Age at first marriage, GSS", x = "Age at first marriage (years)")

[Figure: "Age at first marriage, GSS"; light blue bars with navy outlines; x axis: Age at first marriage (years), y axis: count]

Note that you can also save the plot as an object, add further layers to it, and then print it:

my_plot <- ggplot(data = gss, aes(age_at_first_marriage)) +
  geom_histogram(fill = "lightblue", color = "navy") +
  labs(title = "Age at first marriage, GSS", x = "Age at first marriage (years)")

# add a y-axis label, then print
my_plot + ylab("Number of observations")

[Figure: "Age at first marriage, GSS"; x axis: Age at first marriage (years), y axis: Number of observations]

Histograms divide the range of the variable into bins and count how many of the observations fall within each bin, so a histogram looks different depending on the size of the bins. By default, geom_histogram() uses 30 bins; you can set the width of each bin with binwidth, or supply the number of bins you want with bins.

ggplot(data = gss, aes(age_at_first_marriage)) +
  geom_histogram(fill = "lightblue", color = "navy", binwidth = 1) +
  labs(title = "Age at first marriage, GSS", x = "Age at first marriage (years)")

[Figure: "Age at first marriage, GSS", binwidth = 1; x axis: Age at first marriage (years), y axis: count]

ggplot(data = gss, aes(age_at_first_marriage)) +
  geom_histogram(fill = "lightblue", color = "navy", bins = 10) +
  labs(title = "Age at first marriage, GSS", x = "Age at first marriage (years)")

[Figure: "Age at first marriage, GSS", bins = 10; x axis: Age at first marriage (years), y axis: count]
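To see what binning actually produces, you can ask ggplot for the data it computed. The sketch below is illustrative only: it uses a small made-up vector of ages rather than the GSS, and relies on ggplot2's layer_data() to return one row per bin with its edges (xmin, xmax) and its count.

```r
library(ggplot2)
library(tibble)

# Made-up ages at first marriage (illustration only, not GSS data)
toy <- tibble(age_at_first_marriage = c(19, 22, 23, 25, 26, 31, 34, 45))

# Ask for exactly 4 bins
p <- ggplot(toy, aes(age_at_first_marriage)) +
  geom_histogram(bins = 4)

# layer_data() returns the computed bins: one row per bin with its edges and count
binned <- layer_data(p)
binned[, c("xmin", "xmax", "count")]
```

Every observation lands in exactly one bin, so the counts add up to the number of non-missing values (8 in this toy example).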

We can also split the histogram by another variable to compare the distributions across its categories. For example, here we compare people with and without at least a bachelor's degree:

ggplot(data = gss |> drop_na(has_bachelor_or_higher),
       aes(age_at_first_marriage, fill = has_bachelor_or_higher)) +
  geom_histogram(position = 'dodge') +
  labs(title = "Age at first marriage, GSS", x = "Age at first marriage (years)")

[Figure: "Age at first marriage, GSS"; dodged histograms filled by has_bachelor_or_higher (No, Yes); x axis: Age at first marriage (years), y axis: count]

Importantly, note that the fill color is now specified in the aes function, because it depends
on a variable. Also note that when specifying the data, we have dropped the NAs in the
has_bachelor_or_higher variable.
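To make the difference between setting and mapping concrete, here is a small sketch on made-up data (not the GSS). It builds one histogram with constant colors given outside aes() and one with fill mapped inside aes(), then uses layer_data() to compare the colors ggplot actually assigned.

```r
library(ggplot2)
library(tibble)

# Made-up data (illustration only, not GSS values)
toy <- tibble(
  age_at_first_marriage = c(21, 24, 27, 30, 22, 29, 33, 36),
  has_bachelor_or_higher = c("No", "No", "No", "No", "Yes", "Yes", "Yes", "Yes")
)

# Setting: constants given outside aes() apply to every bar
p_set <- ggplot(toy, aes(age_at_first_marriage)) +
  geom_histogram(binwidth = 5, fill = "lightblue", color = "navy")

# Mapping: a variable inside aes() gets one fill color per category, plus a legend
p_mapped <- ggplot(toy, aes(age_at_first_marriage, fill = has_bachelor_or_higher)) +
  geom_histogram(binwidth = 5, position = "dodge")

unique(layer_data(p_set)$fill)             # "lightblue"
unique(layer_data(p_set)$colour)           # "navy" (ggplot stores color as colour)
length(unique(layer_data(p_mapped)$fill))  # 2, one per category
```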

3.2 Bar charts

Let’s plot the proportion of respondents by province as a bar chart. First, save the proportions as a new data frame:

resp_by_prov <- gss |>
  group_by(province) |>
  tally() |>
  mutate(prop = n / sum(n))

resp_by_prov

# A tibble: 10 x 3
province n prop
<chr> <int> <dbl>
1 Alberta 1728 0.0839
2 British Columbia 2522 0.122
3 Manitoba 1192 0.0579
4 New Brunswick 1337 0.0649
5 Newfoundland and Labrador 1094 0.0531
6 Nova Scotia 1425 0.0692
7 Ontario 5621 0.273
8 Prince Edward Island 708 0.0344
9 Quebec 3822 0.186
10 Saskatchewan 1153 0.0560
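As an aside, dplyr's count() combines group_by() and tally() into a single step. The sketch below, on a made-up handful of provinces, checks that both routes give the same table; either works for this lab.

```r
library(dplyr)
library(tibble)

# Made-up respondents (illustration only, not GSS data)
toy <- tibble(province = c("Ontario", "Ontario", "Quebec", "Alberta", "Ontario", "Quebec"))

# Route used above: group_by() + tally()
by_tally <- toy |>
  group_by(province) |>
  tally() |>
  mutate(prop = n / sum(n))

# Shortcut: count() groups, tallies, and returns an ungrouped table
by_count <- toy |>
  count(province) |>
  mutate(prop = n / sum(n))

all.equal(as.data.frame(by_tally), as.data.frame(by_count))
```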

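Now plot it. Because prop already holds the bar heights, we use stat = "identity" so that geom_bar() plots the values as given instead of counting rows: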

ggplot(data = resp_by_prov, aes(x = province, y = prop)) +
  geom_bar(stat = "identity") +
  labs(title = "Proportion of GSS respondents by province", y = "proportion")

[Figure: "Proportion of GSS respondents by province"; provinces in alphabetical order on the x axis, with overlapping labels; y axis: proportion]
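Since the heights are already in prop, geom_col() is a shorthand for geom_bar(stat = "identity"). A minimal check on made-up proportions (not the GSS values), comparing the bar heights each version computes:

```r
library(ggplot2)
library(tibble)

# Made-up proportions (illustration only)
props <- tibble(province = c("Alberta", "Ontario", "Quebec"),
                prop = c(0.2, 0.5, 0.3))

# Both layers draw bars whose heights are the y values as given
p_bar <- ggplot(props, aes(x = province, y = prop)) + geom_bar(stat = "identity")
p_col <- ggplot(props, aes(x = province, y = prop)) + geom_col()

# Compare the computed bar heights of the two versions
all.equal(layer_data(p_bar)$y, layer_data(p_col)$y)
```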

There are a few things here that would be nice to fix. Firstly, the categories are ordered
alphabetically, which is the default. It would be better visually to order by proportion. We
can do this using the fct_reorder function to alter (mutate) the province variable.

resp_by_prov <- resp_by_prov |>
  mutate(province = fct_reorder(province, prop)) # order by proportion
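To see exactly what fct_reorder() changes, look at the factor levels, which ggplot uses as the axis order. This sketch uses three made-up proportions; .desc = TRUE gives the reverse order, as an alternative to negating the variable.

```r
library(forcats)

# Made-up provinces and proportions (illustration only)
province <- c("Alberta", "Ontario", "Quebec")
prop <- c(0.2, 0.5, 0.3)

levels(factor(province))                           # "Alberta" "Ontario" "Quebec" (alphabetical)
levels(fct_reorder(province, prop))                # "Alberta" "Quebec" "Ontario" (smallest first)
levels(fct_reorder(province, prop, .desc = TRUE))  # "Ontario" "Quebec" "Alberta" (largest first)
```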

Now try plotting again.

ggplot(data = resp_by_prov, aes(x = province, y = prop)) +
  geom_bar(stat = "identity") +
  labs(title = "Proportion of GSS respondents by province", y = "proportion")

[Figure: "Proportion of GSS respondents by province"; provinces ordered from the lowest proportion (Prince Edward Island) to the highest (Ontario), with overlapping labels; y axis: proportion]

To improve readability, we could switch to a horizontal bar chart with coord_flip():

ggplot(data = resp_by_prov, aes(x = province, y = prop)) +
  geom_bar(stat = "identity") +
  labs(title = "Proportion of GSS respondents by province", y = "proportion") +
  coord_flip()

[Figure: "Proportion of GSS respondents by province"; horizontal bars, from Ontario at the top to Prince Edward Island at the bottom; x axis: proportion, y axis: province]
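In recent versions of ggplot2 (3.3.0 and later), you can also get horizontal bars without coord_flip() by putting the categorical variable on the y axis. A sketch on made-up proportions, assuming such a version is installed:

```r
library(ggplot2)
library(tibble)

# Made-up proportions (illustration only)
props <- tibble(province = c("Alberta", "Ontario", "Quebec"),
                prop = c(0.2, 0.5, 0.3))

# Categorical variable on y, numeric on x: bars run horizontally
p_horizontal <- ggplot(props, aes(x = prop, y = province)) +
  geom_col() +
  labs(x = "proportion")
```

The fct_reorder() step works the same way here: the factor levels set the order of the bars from bottom to top.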

3.3 Box plots

Let’s use the country indicators dataset here and draw box plots of child mortality in 2017 by region. As in the bar chart example, it is best to reorder the regions by the variable we are interested in:

country_ind_2017 <- country_ind |>
  filter(year == 2017) |>
  mutate(region = fct_reorder(region, -child_mort)) # descending order

ggplot(data = country_ind_2017, aes(x = region, y = child_mort)) +
  geom_boxplot() +
  labs(title = "Distribution of child mortality by region, 2017",
       y = "under-five child mortality (deaths per 1000 live births)")

[Figure: "Distribution of child mortality by region, 2017"; box plots ordered from Sub-Saharan Africa (highest) to Developed regions (lowest), with overlapping x-axis labels; x axis: region, y axis: under-five child mortality (deaths per 1000 live births)]
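Unlike the bar chart, each region here contains many countries, so fct_reorder() has to summarize each region by a single number; by default it uses the median. The sketch below, on made-up values for two hypothetical regions "A" and "B", shows how the choice of summary function (.fun) can change the order.

```r
library(forcats)
library(tibble)

# Made-up child mortality values, several countries per hypothetical region
toy <- tibble(
  region = c("A", "A", "A", "B", "B", "B"),
  child_mort = c(10, 12, 90, 30, 31, 32)
)

# Default .fun = median: A has median 12, B has median 31,
# so in descending order B comes first
levels(fct_reorder(toy$region, -toy$child_mort))

# .fun = mean: A has mean of about 37.3 (pulled up by 90), B has mean 31,
# so in descending order A comes first
levels(fct_reorder(toy$region, toy$child_mort, .fun = mean, .desc = TRUE))
```

The median is a sensible default for box plots, since the thick line inside each box is also the median.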

The labels on the x axis are hard to read. We could do the same as last time (switch to a horizontal layout), or we can rotate the labels:

ggplot(data = country_ind_2017, aes(x = region, y = child_mort)) +
  geom_boxplot() +
  labs(title = "Distribution of child mortality by region, 2017",
       y = "under-five child mortality (deaths per 1000 live births)") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

[Figure: "Distribution of child mortality by region, 2017"; box plots by region with x-axis labels rotated 45 degrees; x axis: region, y axis: under-five child mortality (deaths per 1000 live births)]
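Another option is to wrap long labels onto several lines. This sketch assumes the stringr package (part of the tidyverse) and passes a small helper function, here called wrap_labels (a name chosen for illustration), to the labels argument of scale_x_discrete(); the child mortality values are made up.

```r
library(ggplot2)
library(stringr)
library(tibble)

# Made-up values for two long region names (illustration only)
toy <- tibble(region = c("Latin America and Caribbean", "Caucasus and Central Asia"),
              child_mort = c(20, 30))

# Hypothetical helper: break each label into lines of roughly 12 characters
wrap_labels <- function(x) str_wrap(x, width = 12)

p <- ggplot(toy, aes(x = region, y = child_mort)) +
  geom_boxplot() +
  scale_x_discrete(labels = wrap_labels)

wrap_labels("Latin America and Caribbean")  # label now contains line breaks ("\n")
```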

If you want to color the boxes, map region to fill, and then remove the legend, which only repeats the x-axis labels:

ggplot(data = country_ind_2017, aes(x = region, y = child_mort, fill = region)) +
  geom_boxplot() +
  labs(title = "Distribution of child mortality by region, 2017",
       y = "under-five child mortality (deaths per 1000 live births)") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1),
        legend.position = 'none')

[Figure: "Distribution of child mortality by region, 2017"; box plots filled by region, no legend, x-axis labels rotated 45 degrees; x axis: region, y axis: under-five child mortality (deaths per 1000 live births)]

3.4 Line graphs

Let’s look at the mean life satisfaction by age of respondent. Firstly, let’s make a new variable
in the gss dataset that groups people into 5-year age groups. Here’s the code to do this:

age_groups <- seq(15, 80, by = 5)

# cut() assigns each age to [15, 20), [20, 25), ..., [80, Inf) and labels it with the lower bound
gss$age_group <- as.numeric(as.character(cut(gss$age,
                                             breaks = c(age_groups, Inf),
                                             labels = age_groups,
                                             right = FALSE)))

#check
gss |> select(age, age_group)

# A tibble: 20,602 x 2
age age_group
<dbl> <dbl>
1 52.7 50
2 51.1 50
3 63.6 60
4 80 80
5 28 25

6 63 60
7 58.8 55
8 80 80
9 63.8 60
10 25.2 25
# ... with 20,592 more rows
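Because every group is 5 years wide, the same grouping can also be written as rounding down to a multiple of 5. This sketch checks the two approaches against the ages printed above. Note that they only agree for ages from 15 up to (but not including) 85: cut() puts every age of 80 or over into the 80 group and returns NA below 15, while rounding down does neither.

```r
# Ages from the printed output above
age <- c(52.7, 51.1, 63.6, 80, 28, 63, 58.8, 25.2)

# Approach used in the lab: cut() into [15, 20), ..., [80, Inf)
age_groups <- seq(15, 80, by = 5)
with_cut <- as.numeric(as.character(cut(age,
                                        breaks = c(age_groups, Inf),
                                        labels = age_groups,
                                        right = FALSE)))

# Alternative: round down to the nearest multiple of 5
with_floor <- 5 * floor(age / 5)

with_floor  # 50 50 60 80 25 60 55 25
```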

Now let’s calculate the average of the life satisfaction variable (feelings_life) by age group and by whether or not respondents have at least a bachelor’s degree. This requires grouping by two variables:

life_satis_age_bach <- gss |>
  drop_na(has_bachelor_or_higher) |>
  group_by(age_group, has_bachelor_or_higher) |>
  summarise(mean_life_satis = mean(feelings_life, na.rm = TRUE),
            .groups = "drop") # return an ungrouped data frame
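Grouping by two variables gives one row per combination of their values. A sketch on made-up scores (not GSS responses), which also shows na.rm = TRUE skipping a missing value:

```r
library(dplyr)
library(tibble)

# Made-up life satisfaction scores (illustration only)
toy <- tibble(
  age_group = c(20, 20, 20, 25, 25, 25),
  has_bachelor_or_higher = c("No", "Yes", "Yes", "No", "No", "Yes"),
  feelings_life = c(7, 8, NA, 9, 7, 10)
)

summary_tbl <- toy |>
  group_by(age_group, has_bachelor_or_higher) |>
  summarise(mean_life_satis = mean(feelings_life, na.rm = TRUE),
            .groups = "drop")

# One row per (age_group, has_bachelor_or_higher) combination:
# (20, No) = 7, (20, Yes) = 8 (the NA is skipped), (25, No) = 8, (25, Yes) = 10
summary_tbl
```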

Plot it as a line chart over age, coloring by education. For this example we use a different colour palette, called “Set1”:

ggplot(data = life_satis_age_bach, aes(x = age_group,
                                       y = mean_life_satis,
                                       colour = has_bachelor_or_higher)) +
  geom_point() +
  geom_line() +
  scale_color_brewer(palette = "Set1", name = "Has Bachelor degree or higher?") + # Set1 palette and a custom legend title
  labs(title = "Average life satisfaction by age and education",
       x = "age group", y = "average life satisfaction")

[Figure: "Average life satisfaction by age and education"; points joined by lines, colored by Has Bachelor degree or higher? (No, Yes); x axis: age group, y axis: average life satisfaction]
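The lines connect the right points because mapping colour to a discrete variable also splits the data into groups, and geom_line() draws one line per group. The sketch below, on made-up averages, inspects the computed data of the line layer to confirm there are two groups. Without a colour mapping, you could get the same split with aes(group = has_bachelor_or_higher).

```r
library(ggplot2)
library(tibble)

# Made-up averages for two groups at three ages (illustration only)
toy <- tibble(
  age_group = c(20, 40, 60, 20, 40, 60),
  has_bachelor_or_higher = c("No", "No", "No", "Yes", "Yes", "Yes"),
  mean_life_satis = c(7.9, 7.8, 8.1, 8.2, 8.0, 8.3)
)

p <- ggplot(toy, aes(age_group, mean_life_satis, colour = has_bachelor_or_higher)) +
  geom_point() +
  geom_line()

# Computed data for the second layer, geom_line(); the group column drives the split
line_data <- layer_data(p, i = 2)
unique(line_data$group)
```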

3.5 Scatter plots

Let’s use the country indicators dataset here. The example in the lecture slides is life expectancy versus TFR (total fertility rate). We also use a new colour palette called viridis; the viridis palettes are designed to remain readable when printed in black and white, and for readers with common forms of color blindness.

ggplot(country_ind_2017, aes(tfr, life_expectancy, color = region)) +
  geom_point() +
  labs(title = "TFR versus life expectancy, 2017",
       y = "life expectancy (years)", x = "TFR (births per woman)") +
  theme_bw(base_size = 14) +
  scale_color_viridis_d()

[Figure: "TFR versus life expectancy, 2017"; points colored by region; x axis: TFR (births per woman), y axis: life expectancy (years)]

Instead of dots, we could plot the country codes. This becomes hard to read, but it makes outliers easy to spot:

ggplot(country_ind_2017, aes(tfr, life_expectancy, color = region, label = country_code)) +
  geom_text() +
  labs(title = "TFR versus life expectancy, 2017",
       y = "life expectancy (years)", x = "TFR (births per woman)") +
  theme_bw(base_size = 14) +
  scale_color_viridis_d()

[Figure: "TFR versus life expectancy, 2017"; three-letter country codes colored by region; x axis: TFR (births per woman), y axis: life expectancy (years)]
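If the labels overlap too much, geom_text() has a check_overlap argument that skips any label overlapping one already drawn (labels are drawn in the order of the data). A sketch with made-up points and hypothetical codes AAA, BBB, and CCC:

```r
library(ggplot2)
library(tibble)

# Made-up points; AAA and BBB sit at exactly the same position
toy <- tibble(tfr = c(1.5, 1.5, 4),
              life_expectancy = c(80, 80, 62),
              country_code = c("AAA", "BBB", "CCC"))

# With check_overlap = TRUE, BBB is not drawn because it would overlap AAA
p <- ggplot(toy, aes(tfr, life_expectancy, label = country_code)) +
  geom_text(check_overlap = TRUE)
```

The overlap check happens when the plot is drawn, so the layer's computed data still contains all three rows; some labels simply are not shown.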

3.6 Faceting

Changing the color and fill is useful for showing one additional variable on a graph. For more complicated set-ups, faceting graphs by an additional variable becomes useful.
For example, let’s go back to plotting a histogram of age at first marriage by whether or not the respondent has at least a bachelor’s degree, but also add whether or not the respondent was born in Canada. First, look at the unique values of the place_birth_canada variable:

gss |>
select(place_birth_canada) |>
unique()

# A tibble: 4 x 1
place_birth_canada
<chr>
1 Born in Canada
2 Born outside Canada
3 <NA>
4 Don't know
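Two alternatives to select() plus unique(): distinct() returns the unique values directly, and count() also reports how often each occurs, including NA. A sketch on a made-up vector of responses:

```r
library(dplyr)
library(tibble)

# Made-up responses (illustration only)
toy <- tibble(place_birth_canada = c("Born in Canada", "Born in Canada", NA,
                                     "Born outside Canada", "Don't know"))

uniq <- toy |> distinct(place_birth_canada)  # same four values as select() + unique()
counts <- toy |> count(place_birth_canada)   # each value with its number of rows
counts
```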

For now, filter the data to include only the first two categories. To do this, use the %in% operator within filter:

gss_subset <- gss |>
filter(place_birth_canada %in% c("Born in Canada", "Born outside Canada")) |>
drop_na(has_bachelor_or_higher) # also remove the NAs from the education variable
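One reason %in% is convenient here: it returns FALSE (not NA) for missing values, so rows with NA or "Don't know" are simply excluded. Compare it with == on a small made-up vector:

```r
keep <- c("Born in Canada", "Born outside Canada")
x <- c("Born in Canada", NA, "Don't know", "Born outside Canada")  # made-up values

x %in% keep            # TRUE FALSE FALSE TRUE
x == "Born in Canada"  # TRUE NA FALSE FALSE
```

A single %in% also replaces a chain of == comparisons joined by |.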

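Now plot the histograms as before, but also facet by place of birth. Note that we plot the density here rather than the count, so that the two education groups, which differ in size, can be compared on the same scale.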

ggplot(data = gss_subset, aes(age_at_first_marriage, fill = has_bachelor_or_higher)) +
  geom_histogram(position = 'dodge', aes(y = after_stat(density))) + # after_stat() replaces the older ..density.. syntax
  facet_wrap(~place_birth_canada) +
  xlab("age at first marriage")

[Figure: density histograms filled by has_bachelor_or_higher (No, Yes) in two panels, Born in Canada and Born outside Canada; x axis: age at first marriage, y axis: density]
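With after_stat(density), each group's bar heights are rescaled so that the total area of that group's bars is 1, which is why groups of different sizes can be compared. The sketch below checks this on made-up data where one group is three times larger than the other; it uses position = "identity" so that the bin edges are not narrowed by dodging.

```r
library(ggplot2)
library(dplyr)
library(tibble)

# Made-up ages: group "No" has 9 rows, group "Yes" has 3 (illustration only)
toy <- tibble(
  age = c(21, 22, 24, 25, 26, 28, 29, 31, 33, 23, 30, 35),
  has_bachelor_or_higher = c(rep("No", 9), rep("Yes", 3))
)

p <- ggplot(toy, aes(age, fill = has_bachelor_or_higher)) +
  geom_histogram(aes(y = after_stat(density)), binwidth = 5,
                 position = "identity", alpha = 0.5)

# Area of each group's histogram: sum of (density * bin width) within the group
areas <- layer_data(p) |>
  group_by(group) |>
  summarise(area = sum(density * (xmax - xmin)))
areas
```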

4 Review Questions

1. Using the country indicators dataset, create a scatter plot of GDP versus life expectancy by region for the year 2014. Edit the labels, set a title, and make sure the graph is color-coded.
2. Using the GSS dataset, create a bar graph of the non-missing values of province of birth (place_birth_province), with the proportions arranged from high to low. Make sure to color-code the bars and that all labels are readable.
