Part 6 Section 1 Introduction to Data Visualization
Dr. Nguyen Quang Huy Part 6 - Section 1: Introduction to data visualization May 16, 2020 1 / 16
Introduction to data visualization
Looking at the raw numbers and character strings in a dataset is rarely
informative. How much information do you get from looking at the murders dataset?
library(dslabs)
murders
[Figure: scatter plot of the total number of murders (y-axis, ticks at 100 and 1000) against populations in millions (x-axis, ticks at 1, 3, 10, 30), with each point labelled by state name.]
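A scatter plot of this kind could be sketched roughly as follows. This is not necessarily the code behind the slide: the log scales are inferred from the axis ticks (1, 3, 10, 30), and the label size and overlap handling are assumptions. The columns `state`, `population` and `total` come from the `murders` data frame in `dslabs`.

```r
library(dslabs)
library(ggplot2)

data(murders)

# One point per state: population (in millions) against total murders.
# Both axes use a log10 scale (an assumption based on the tick marks).
p_murders <- ggplot(murders,
                    aes(x = population / 10^6, y = total, label = state)) +
  geom_point() +
  geom_text(size = 3, vjust = -0.6, check_overlap = TRUE) +  # drop overlapping labels
  scale_x_log10() +
  scale_y_log10() +
  xlab("Populations in millions") +
  ylab("Total number of murders")

p_murders
```

Compared with printing `murders`, the plot makes the relationship between population size and the number of murders visible at a glance.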
[Figure: plot labelled with state names (california, texas, florida, georgia, ...); the visible axis ticks are 25, 30 and 35.]
Why is data visualization important?
Make data easier to understand and remember
Discover unknown facts, outliers and trends
Visualize relationships and patterns quickly
Ask better questions and make better decisions
What makes a good data visualization?
Step 1. Clean the data (so that it is ready to visualize)
Step 2. Pick the right chart
Step 3. Design and customize your visualization
Step 4. Publish, share and communicate
Remember, simplicity is the key.
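The four steps above can be sketched on the `murders` data as one possible workflow. The choice of a murder rate per 100,000 people, the top-10 cut, the bar chart and the output file name are all illustrative assumptions, not part of the original slides.

```r
library(dslabs)
library(dplyr)
library(ggplot2)

data(murders)

# Step 1. Clean: derive a comparable quantity and keep the 10 highest rates
# (rate per 100,000 people is an assumed choice for illustration).
murders_top <- murders %>%
  mutate(rate = total / population * 10^5) %>%
  filter(!is.na(rate)) %>%
  arrange(desc(rate)) %>%
  head(10)

# Step 2. Pick the chart: ordered bars make it easy to compare a few states.
p_rate <- ggplot(murders_top, aes(x = reorder(state, rate), y = rate)) +
  geom_col() +
  coord_flip()

# Step 3. Design and customize: clear labels, minimal decoration.
p_rate <- p_rate +
  labs(x = NULL, y = "Murders per 100,000 people") +
  theme_minimal()

# Step 4. Publish: write the plot to a file (hypothetical name, temp directory).
out_file <- file.path(tempdir(), "murder_rate.pdf")
ggsave(out_file, plot = p_rate, width = 6, height = 4)
```

Keeping each step in its own statement makes it easy to change the chart type or the styling later without touching the data preparation.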
What is the problem with the 2010 NYC Regents Exam results, given that a
score of 65 is needed to pass?
Introduction to ggplot2
Advantages of ggplot2
Users are not limited to a set of pre-specified graphics. You can build
graphics that tell your story precisely.
Disadvantages of ggplot2
ggplot2 is useful only to users who have some basic knowledge of R.
ggplot2 doesn’t suggest what graphics you should use to answer the
questions you are interested in.
ggplot2 is not designed to create dynamic or interactive graphics, i.e.
ggplot2 is best suited to static graphics.
Grammar of graphics
The grammar of graphics describes the deep features that underlie all
statistical graphics:
1. Data that you want to visualise.
2. Aesthetic mappings (aes) describing how variables in the data are
mapped to aesthetic attributes.
3. Geometric objects (geoms) represent what you actually see on
the plot: points, lines, polygons, etc.
4. Faceting describes how to break up the data into subsets.
5. Statistical transformations (stats) summarise data in many
useful ways.
6. A coordinate system describes how data coordinates are mapped
to the plane of the graphic.
7. A theme controls the finer points of display, like the font size and
background colour.
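As a minimal sketch, the seven components listed above can each be matched to one piece of a single ggplot2 call. The `murders` data and the particular geom, stat, facet variable and theme are illustrative choices, not taken from the slides.

```r
library(dslabs)
library(ggplot2)

data(murders)

p_grammar <- ggplot(data = murders,                        # 1. data
                    aes(x = population / 10^6,             # 2. aesthetic mappings
                        y = total)) +
  geom_point() +                                           # 3. geometric object
  facet_wrap(~ region) +                                   # 4. faceting: one panel per region
  stat_smooth(method = "lm", formula = y ~ x, se = FALSE) + # 5. statistical transformation
  coord_cartesian() +                                      # 6. coordinate system (the default)
  theme_bw(base_size = 12)                                 # 7. theme

p_grammar
```

Because the components are independent, any one of them can be swapped (for example `theme_bw()` for `theme_minimal()`) without rewriting the rest of the plot.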
The measles vaccine was licensed in the United States in 1963.
[Figure: tile plot with one row per state (Alabama to Wyoming, including the District of Columbia) on the y-axis and year on the x-axis (ticks at 1940, 1960, 1980, 2000).]
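library(dslabs)   # us_contagious_diseases
library(dplyr)    # %>%, filter(), mutate()
library(ggplot2)

# Measles cases per 1,000 people, drawn as one tile per state and year
p1 <- us_contagious_diseases %>%
  filter(disease == "Measles") %>%
  mutate(rate = count * 1000 / population) %>%
  ggplot(aes(year, state, fill = rate)) +
  geom_tile(color = "grey")

# White-to-red fill on a square-root scale; the green line marks 1963
p1 +
  scale_fill_gradientn(colors = c(rgb(1, 1, 1), rgb(1, 0, 0), rgb(0.8, 0, 0)),
                       trans = "sqrt") +
  geom_vline(xintercept = 1963, col = "green", size = 2) +
  scale_x_continuous(expand = c(0, 0)) +
  theme_minimal() +
  theme(panel.grid = element_blank(),
        legend.position = "bottom",
        text = element_text(size = 12)) +
  xlab(label = "") +
  ylab(label = "")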
End of Section 1