Data Visualization With R (2019-02-14)

Uploaded by

ttnhi
What is R?

R is:
• A programming language used for statistical computing and data visualization.
• Open source and freely available under the GNU General Public License.
• Supported by the R Project for Statistical Computing.
  • Download the latest version of R via:
    https://www.r-project.org/
• Easier to use through RStudio.
  • Download RStudio via:
    https://www.rstudio.com/products/rstudio/download/
Basic R Data Modes
• Numeric
  • e.g. 3.14, -543.6, 0.2, 10., etc.
• Integer
  • e.g. 3L, -544L, 0L, 10L, etc. (the "L" suffix marks a literal as an integer; a bare 3 is stored as numeric)
• Complex
  • e.g. -17+8i
• Logical
  • e.g. TRUE, FALSE
• Character
  • e.g. "Hello", "To be or not to be", "ljfakl;r#", etc.

While other modes of data exist, they are far less common than those listed above.
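
One way to see these modes in practice is to ask R for the class of a few literal values. This is a minimal sketch using only base R; the object name value_classes is chosen here purely for illustration.

```r
# Ask R for the class of one literal value per data mode.
value_classes <- c(
  numeric   = class(3.14),
  integer   = class(3L),       # the L suffix requests an integer
  complex   = class(-17+8i),
  logical   = class(TRUE),
  character = class("Hello")
)
print(value_classes)

# A bare whole number is stored as numeric, not integer.
class(3)
```

Running class() on your own values is a quick check whenever a command complains about the type of data it was given.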
Common R Objects
Name        Dimensions  Contents                                 Example
Vector      1           Series of values;                        12.2   9.6  -4.8   2.5
                        single data mode
Matrix      2           Values stored in rows and columns;       12.2   9.6  -4.8   2.5
                        all values of the same data mode          8.3  -7.6   9.3  -2.7
                                                                 -4.4  17.7  14.7  -6.9
                                                                  1.7   4.5  53.4   5.2
List        1           Series of values;                        Denver  73.4  TRUE
                        allows multiple modes
Data Frame  2           Values stored in rows and columns;       Denver  73.4  TRUE
                        different columns may have               Topeka  49.8  FALSE
                        different data modes

In all of the above, integers are treated as numeric values.
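
The four object types in the table can each be created with a single base R command. The sketch below reuses the example values from the table; the object names and the data frame column names (city, temp_F, sunny) are hypothetical, added only so the example is complete.

```r
# A vector: a series of values sharing one data mode.
v <- c(12.2, 9.6, -4.8, 2.5)

# A matrix: rows and columns, all values of the same mode.
m <- matrix(c(12.2,  9.6, -4.8,  2.5,
               8.3, -7.6,  9.3, -2.7),
            nrow = 2, byrow = TRUE)

# A list: a series of values that may mix modes.
l <- list("Denver", 73.4, TRUE)

# A data frame: different columns may have different modes.
df <- data.frame(city   = c("Denver", "Topeka"),
                 temp_F = c(73.4, 49.8),
                 sunny  = c(TRUE, FALSE),
                 stringsAsFactors = FALSE)

str(df)   # shows one line per column with its mode
```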


Some Distinguishing Characteristics

• R is primarily command-line based: you work by typing commands rather than by pointing and clicking through menus.
• Coding is simplified through the use of products such as RStudio.
• Commands consist of a function with associated arguments that define the manner in which
the function should be executed.
• For example, a command to create a vector:

c("Larry", "Moe", "Curly")

Here c is the command and "Larry", "Moe", "Curly" are its arguments.

• For example, a command to load a comma-delimited data file:

read.csv("bigfile.csv",
         header = TRUE,
         sep = ",",
         colClasses = c("character", "character", "logical", "numeric"))

Here read.csv is the command and everything inside the parentheses is an argument.
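
Since "bigfile.csv" is not supplied with these slides, the following sketch writes a tiny stand-in file to a temporary location and reads it back with the same arguments. The file contents and the column names (name, role, active, score) are invented for illustration.

```r
# Write a small comma-delimited file to a temporary path (hypothetical data).
csv_path <- tempfile(fileext = ".csv")
writeLines(c("name,role,active,score",
             "Larry,stooge,TRUE,7.5",
             "Moe,stooge,FALSE,9.0"),
           csv_path)

# Read it back, declaring the mode of each of the four columns.
stooge_data <- read.csv(csv_path,
                        header = TRUE,
                        sep = ",",
                        colClasses = c("character", "character", "logical", "numeric"))

str(stooge_data)
```

Declaring colClasses up front keeps R from guessing column modes, which matters most for large files and for columns that look numeric but should stay as text.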
Unsure about how to use a command?
• Type a question mark (?), followed by the name of the command (without parentheses or arguments).
• For example, to learn about the “par” command, which is used for formatting various aspects of plots and other data visualizations:
  ?par
Creating New Objects

The output from commands can be assigned to objects. For example:

stooges <- c("Larry", "Moe", "Curly")

This executes the command c("Larry", "Moe", "Curly") and assigns the resulting vector
to an object named “stooges”.
R Packages

• The capabilities of R are continually being expanded and improved through the creation of “packages”.
• A package may be developed to:
  • Provide capabilities previously unavailable.
  • Improve upon existing capabilities.
  • Focus upon the needs of a particular user group.
Take a look for yourself.
• Google “list of R packages”.

• List of R packages:
  https://cran.r-project.org/web/packages/available_packages_by_name.html
  • Over 13,700 packages listed (as of 8 February 2019)

• Examples of R packages:
  • dplyr: manipulates data by taking subsets, summarizing, rearranging and joining data sets.
  • tidyr: reformats layouts of data sets to make them more compatible with R.
  • lubridate: simplifies working with dates and times.
  • oce: analysis of oceanographic data.
  • AMR: antimicrobial resistance analysis.
  • WDI: used for downloading World Development Indicators data from the World Bank.
Accessing Data Sets
• Data sets may be imported from the Internet or from a computer directory.
• R can import data in a wide variety of formats; some of the more common are:
  • Excel
  • CSV
  • TXT
  • SPSS
  • Access
  • SQL Server
  • Minitab
• R also includes a set of standard data sets (for you to practice/play with).
  • To list available data sets, type the following command in R or RStudio:
    library(help = "datasets")
  • A more complete description of many of these may be found on the following site:
    https://stat.ethz.ch/R-manual/R-devel/library/datasets/html/00Index.html
• For the examples in this presentation, we use data sets already included within R.
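
The bundled data sets can also be listed and inspected from code. This is a small sketch using base R's data() function; the object name bundled is an assumption made for the example.

```r
# Collect the names of the data sets shipped in the "datasets" package.
bundled <- data(package = "datasets")$results[, "Item"]
head(bundled)

# mtcars, used throughout this presentation, is one of them.
"mtcars" %in% bundled

# Peek at its first three records.
head(mtcars, 3)
```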
How to Specify Elements in Objects
• Element positions are often indicated using index numbers.
  • Index numbers start at “1” and increase from there.
  • This differs from some programming languages, such as Python, where indices start at “0”.
  • An index of “1” refers to:
    • The 1st element in a vector or list
    • The 1st row or column of a matrix or data frame
• When specifying an element in a matrix or data frame, always indicate the row followed by the column.
  • e.g. mtcars[7, 4] refers to the element in the 7th row and 4th column of the data object mtcars.
• Colons can be used to indicate ranges of index numbers.
  • e.g. 3:9 indicates the indices from 3 through 9.
• Negative signs indicate the indices of elements to be excluded.
  • e.g. mtcars[7, -4] indicates everything in the 7th row of mtcars except what’s in the 4th column.
• Names can be assigned to elements in lists or to rows and columns in data frames.
  • mtcars["Valiant", "hp"] goes to the row for the Plymouth Valiant and retrieves its horsepower (hp).
  • mtcars$hp retrieves the horsepower (hp) column from mtcars (note the lowercase “m”; R is case-sensitive).
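
The indexing rules above can be tried directly on the built-in mtcars data set. The sketch below only uses base R indexing; the values in the comments come from the standard mtcars data.

```r
# Row 7, column 4: the horsepower of the Duster 360.
mtcars[7, 4]              # 245

# Row 7, columns 3 through 5 (a range of indices).
mtcars[7, 3:5]

# Row 7, every column except the 4th (a negative index excludes).
mtcars[7, -4]

# Row and column selected by name.
mtcars["Valiant", "hp"]   # 105

# An entire column selected by name.
head(mtcars$hp)
```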
Let’s try this out.
• Please open the RStudio application.
• If you don’t see 4 panes (windows), as shown here, let me know.
Examples
Suppose we have a list containing information about the Orlando Metropolitan
Statistical Area (MSA).
• Assume that the name of the list is “Orlando”.

Each element of the list has been given a name:

Element names:   Name       sq_miles  population  per_cap_income  avg_temp_F  rainfall_inches
Element values:  “Orlando”  4012      2320195     25164           75.5        17.91
Examples
Question: What value(s) would be retrieved from the list “Orlando” by the
following command?

Orlando[5]

Name sq_miles population per_cap_income avg_temp_F rainfall_inches

“Orlando” 4012 2320195 25164 75.5 17.91

Note: R is case-sensitive.
• Therefore, in this command, the “O” in “Orlando” must be capitalized.
• If it were not capitalized (i.e., orlando[5]), you would get the following error statement:
  Error: object 'orlando' not found
Examples
Question: What value(s) would be retrieved from the list “Orlando” by the
following command?

Orlando[2:4]

Name sq_miles population per_cap_income avg_temp_F rainfall_inches

“Orlando” 4012 2320195 25164 75.5 17.91


Examples
Question: What value(s) would be retrieved from the list “Orlando” by the
following command?

Orlando[c(1, 5, 6)]

Name sq_miles population per_cap_income avg_temp_F rainfall_inches

“Orlando” 4012 2320195 25164 75.5 17.91


Examples
Question: What value(s) would be retrieved from the list “Orlando” by the
following command?

Orlando[c(1, 4:6)]

Name sq_miles population per_cap_income avg_temp_F rainfall_inches

“Orlando” 4012 2320195 25164 75.5 17.91


Examples
Question: What value(s) would be retrieved from the list “Orlando” by the
following command?

Orlando["population"]

Name sq_miles population per_cap_income avg_temp_F rainfall_inches

“Orlando” 4012 2320195 25164 75.5 17.91


Examples
Question: What value(s) would be retrieved from the list “Orlando” by the
following command?

Orlando[Orlando < 25000]

Name sq_miles population per_cap_income avg_temp_F rainfall_inches

“Orlando” 4012 2320195 25164 75.5 17.91


Note:
• When executing this command, you would get the following message:
  Warning message: NAs introduced by coercion
• This is due to R’s attempt to evaluate a character string (“Orlando”) as a number.
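
To check your answers to the list questions, the list can be rebuilt and queried as in this sketch. The list() call is an assumption about how “Orlando” was created; the element names and values are taken from the slide.

```r
# Rebuild the named list shown on the slide.
Orlando <- list(Name = "Orlando", sq_miles = 4012, population = 2320195,
                per_cap_income = 25164, avg_temp_F = 75.5,
                rainfall_inches = 17.91)

Orlando[5]              # the 5th element: avg_temp_F
Orlando[2:4]            # sq_miles, population, per_cap_income
Orlando[c(1, 5, 6)]     # Name, avg_temp_F, rainfall_inches
Orlando[c(1, 4:6)]      # Name, then elements 4 through 6
Orlando["population"]   # selected by name; still a one-element list

# Double brackets return the bare value instead of a one-element list.
Orlando[["population"]]
```

Single brackets always return a list, even for one element; double brackets (or the $ operator) pull out the value itself.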


Examples

Assume that we have the following data frame, named “New_England”
(column names run across the top; row names run down the left side):

               Abbreviation  Population  Square_Miles  Pop_per_Sq_Mile  Per_Capita_Income
Rhode Island   RI               1056298       1033.81           1018.1              30765
Connecticut    CT               3590886       4842.36            738.1              38480
Massachusetts  MA               6794422       7800.06            839.4              36441
Vermont        VT                626042       9216.66             67.9              29535
New Hampshire  NH               1330608       8952.65            147                33821
Maine          ME               1329328      30842.92             43.1              27332
Examples
Question: What value(s) would be retrieved from the data frame
“New_England” by the following command?

New_England[3, 4]

Abbreviation Population Square_Miles Pop_per_Sq_Mile Per_Capita_Income


Rhode Island RI 1056298 1033.81 1018.1 30765
Connecticut CT 3590886 4842.36 738.1 38480
Massachusetts MA 6794422 7800.06 839.4 36441
Vermont VT 626042 9216.66 67.9 29535
New Hampshire NH 1330608 8952.65 147 33821
Maine ME 1329328 30842.92 43.1 27332
Examples
Question: What value(s) would be retrieved from the data frame
“New_England” by the following command?

New_England[3, 4:5]

Abbreviation Population Square_Miles Pop_per_Sq_Mile Per_Capita_Income


Rhode Island RI 1056298 1033.81 1018.1 30765
Connecticut CT 3590886 4842.36 738.1 38480
Massachusetts MA 6794422 7800.06 839.4 36441
Vermont VT 626042 9216.66 67.9 29535
New Hampshire NH 1330608 8952.65 147 33821
Maine ME 1329328 30842.92 43.1 27332
Examples
Question: What value(s) would be retrieved from the data frame
“New_England” by the following command?

New_England["Vermont", 1]

Abbreviation Population Square_Miles Pop_per_Sq_Mile Per_Capita_Income


Rhode Island RI 1056298 1033.81 1018.1 30765
Connecticut CT 3590886 4842.36 738.1 38480
Massachusetts MA 6794422 7800.06 839.4 36441
Vermont VT 626042 9216.66 67.9 29535
New Hampshire NH 1330608 8952.65 147 33821
Maine ME 1329328 30842.92 43.1 27332
Examples
Question: What value(s) would be retrieved from the data frame
“New_England” by the following command?

New_England["Vermont", "Population"]

Abbreviation Population Square_Miles Pop_per_Sq_Mile Per_Capita_Income


Rhode Island RI 1056298 1033.81 1018.1 30765
Connecticut CT 3590886 4842.36 738.1 38480
Massachusetts MA 6794422 7800.06 839.4 36441
Vermont VT 626042 9216.66 67.9 29535
New Hampshire NH 1330608 8952.65 147 33821
Maine ME 1329328 30842.92 43.1 27332
Examples
Question: What value(s) would be retrieved from the data frame
“New_England” by the following command?

New_England[ , "Per_Capita_Income"]

Abbreviation Population Square_Miles Pop_per_Sq_Mile Per_Capita_Income


Rhode Island RI 1056298 1033.81 1018.1 30765
Connecticut CT 3590886 4842.36 738.1 38480
Massachusetts MA 6794422 7800.06 839.4 36441
Vermont VT 626042 9216.66 67.9 29535
New Hampshire NH 1330608 8952.65 147 33821
Maine ME 1329328 30842.92 43.1 27332
Examples
Question: What value(s) would be retrieved from the data frame
“New_England” by the following command?

New_England$Square_Miles

Abbreviation Population Square_Miles Pop_per_Sq_Mile Per_Capita_Income


Rhode Island RI 1056298 1033.81 1018.1 30765
Connecticut CT 3590886 4842.36 738.1 38480
Massachusetts MA 6794422 7800.06 839.4 36441
Vermont VT 626042 9216.66 67.9 29535
New Hampshire NH 1330608 8952.65 147 33821
Maine ME 1329328 30842.92 43.1 27332
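
The data frame questions can be checked the same way. In this sketch the data.frame() call is an assumption about how “New_England” was built; the values are copied from the table on the slides.

```r
# Rebuild the data frame, using the state names as row names.
New_England <- data.frame(
  Abbreviation      = c("RI", "CT", "MA", "VT", "NH", "ME"),
  Population        = c(1056298, 3590886, 6794422, 626042, 1330608, 1329328),
  Square_Miles      = c(1033.81, 4842.36, 7800.06, 9216.66, 8952.65, 30842.92),
  Pop_per_Sq_Mile   = c(1018.1, 738.1, 839.4, 67.9, 147, 43.1),
  Per_Capita_Income = c(30765, 38480, 36441, 29535, 33821, 27332),
  row.names = c("Rhode Island", "Connecticut", "Massachusetts",
                "Vermont", "New Hampshire", "Maine"),
  stringsAsFactors = FALSE
)

New_England[3, 4]                      # 839.4 (Massachusetts, Pop_per_Sq_Mile)
New_England[3, 4:5]                    # Massachusetts, columns 4 and 5
New_England["Vermont", 1]              # "VT"
New_England["Vermont", "Population"]   # 626042
New_England[ , "Per_Capita_Income"]    # the whole income column
New_England$Square_Miles               # the whole area column
```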
Taking subsets of an object

subset(object name, criteria)

Example: using a data set named mtcars (containing data about cars), extract
those records where mileage is greater than 20 mpg and either of the following
is true: the engine has more than 4 cylinders or more than 100 hp.

subset(mtcars, mpg > 20 & (cyl > 4 | hp > 100))

Here mtcars is the data set, “&” means “and”, and “|” means “or”.
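
Running the subset command shows which cars meet the criteria. The sketch below also repeats the selection with a logical row index, an equivalent base R idiom that is not part of the original slides; the object names are chosen for illustration.

```r
# Cars with mpg above 20 that also have more than 4 cylinders or more than 100 hp.
efficient <- subset(mtcars, mpg > 20 & (cyl > 4 | hp > 100))
efficient[, c("mpg", "cyl", "hp")]

# The same rows, selected with a logical index inside square brackets.
keep <- mtcars$mpg > 20 & (mtcars$cyl > 4 | mtcars$hp > 100)
same_rows <- mtcars[keep, ]
nrow(same_rows)
```

The parentheses around the “or” condition matter: without them, “&” is evaluated before “|” and the selection changes.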


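Reordering data in an object

object name[order(sorting field, sort direction), ]

Example: Arrange the mtcars data set in descending order by horsepower (hp).

mtcars[order(mtcars$hp, decreasing = TRUE), ]

Here mtcars is the data set, mtcars$hp is the field to sort by, and decreasing = TRUE sorts in descending order. The comma before the closing bracket places the ordering in the row position, so whole rows are rearranged and all columns are kept.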
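
A short sketch of the reordering command in use follows. The object name by_hp is an assumption; the result is stored in a new object so that the original mtcars stays untouched.

```r
# Sort all rows of mtcars from highest to lowest horsepower.
by_hp <- mtcars[order(mtcars$hp, decreasing = TRUE), ]

# The three most powerful cars, showing two columns only.
head(by_hp[, c("hp", "disp")], 3)

# order() returns row positions, not the sorted values themselves.
head(order(mtcars$hp, decreasing = TRUE))
```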


The Basic Plot

plot(mtcars$disp,   # X coordinates
     mtcars$hp)     # Y coordinates
Jazzing up your plot
Let’s start by introducing “par”.
• par is a command for specifying graphical parameters.
• Type “par()” to get a listing of the current values assigned to all of your par settings.
• Explanations of par settings are available at
  http://stat.ethz.ch/R-manual/R-devel/library/graphics/html/par.html

par(bg = "#262a35",             # Background color using hex color code
    mar = c(5, 4, 3, 2),        # Margins on bottom, left, top and right sides of plot area, in number of lines of text
    oma = c(0, 0, 0, 0),        # Outer margins on bottom, left, top and right sides of plot area, in number of lines of text
    col.lab = "darkorange2",    # Color for x- and y-axis labels (using standard R color set)
    col.axis = "darkorange2",   # Color for axis annotation (using standard R color set)
    col.main = "darkorange2",   # Color for main title (using standard R color set)
    font.main = 2,              # Font setting for main title ("2" indicates bold type)
    font.lab = 2,               # Font setting for x- and y-labels ("2" indicates bold type)
    cex.main = 1.2,             # Scaling factor for size of main title (relative to default value)
    cex.axis = 0.9,             # Scaling factor for size of axis annotations (relative to default value)
    cex.lab = 0.9,              # Scaling factor for size of x- and y-labels (relative to default value)
    tck = 0)                    # Tick setting ("0" indicates no ticks on axes)


Jazzing up your plot
Now, create a blank plot using data from the mtcars data set (contains
specifications for various car models).

plot(0,                                              # Placeholder value; nothing is drawn because type = "n"
     xlim = c(min(mtcars$disp), max(mtcars$disp)),   # Set the limits of the x axis to the minimum and maximum of the disp field
     ylim = c(min(mtcars$hp), max(mtcars$hp)),       # Set the limits of the y axis to the minimum and maximum of the hp field
     type = "n",                                     # Set type of plot ("n" indicates that no data will be plotted)
     bty = "n",                                      # Style of box to draw around plot ("n" indicates no box)
     las = 1,                                        # Orientation of axis labels ("1" indicates labels are always horizontal)
     main = "Power as a Function of Displacement",   # Main title
     xlab = "Displacement",                          # Label for x-axis
     ylab = "Power(hp)",                             # Label for y-axis
     asp = 1/2)                                      # Y-to-x aspect ratio


So, what do we have so far?
Add the data points and a legend

points(mtcars$disp,        # Where to find the x coordinates
       mtcars$hp,          # Where to find the y coordinates
       pch = mtcars$cyl,   # Indicate different symbols for different numbers of cylinders
       col = "green")      # Indicate color for symbols

legend(110,                                # X-coordinate for upper left corner of legend box
       380,                                # Y-coordinate for upper left corner of legend box
       pch = c(4, 6, 8),                   # Indicate the different types of symbols on the plot
       col = "green",                      # Indicate the color for the symbols
       legend = c("4 cylinders",           # Provide the descriptions for the symbols
                  "6 cylinders", "8 cylinders"),
       bg = "azure4",                      # Indicate the background color for the legend box
       text.col = "white")                 # Indicate the color for the text


…and a trend line would be nice

First, create a linear model (lm) based upon the data in the plot. The object being created is named trend.

trend <- lm(hp ~ disp,       # Indicate that power (hp) is a function of displacement (disp)
            data = mtcars)   # Indicate the data set being used

Then, add the line to the plot.

abline(trend,                # Indicate what is being added
       lty = 2,              # Line type ("2" corresponds to a dashed line)
       col = "ghostwhite")   # Line color


Seems like a good time for an Updated Figure
Just for fun, show the cars with the highest power and displacement
Start by identifying the desired car models:

subset(mtcars, hp == max(hp) | disp == max(disp))

This takes the subset of mtcars for which hp matches the maximum hp value or disp matches the maximum disp value.

The result:
Show the cars with the highest power and displacement
Draw attention to the desired points by surrounding them with gold diamonds.

points(c(301, 472),    # The x-coordinates for the maximum hp and maximum disp points, respectively
       c(335, 205),    # The y-coordinates for the maximum hp and maximum disp points, respectively
       pch = 5,        # Indicate the type of symbol ("5" corresponds to an open diamond)
       cex = 2,        # Scaling factor for symbol size (relative to default size)
       col = "gold")   # Symbol color


Show the cars with the highest power and displacement

Add labels for the desired points.

text(290, 360, "Max. Power: Maserati Bora", col = "white")
text(455, 180, "Max. Disp:", col = "white")
text(455, 165, "Cadillac Fleetwood", col = "white")

In each command, the arguments are, in order: the x-coordinate, the y-coordinate, the text, and the text color.
As a final touch, let’s add some gridlines

grid(NULL,             # Number of cells in the x-direction ("NULL" aligns with existing ticks/numbers)
     NULL,             # Number of cells in the y-direction ("NULL" aligns with existing ticks/numbers)
     col = "azure4",   # Line color
     lty = 3,          # Line type ("3" corresponds to a dotted line)
     lwd = 1)          # Line width


…and, the final result
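
For convenience, the steps above can be collected into one script, as in the following sketch. Two small substitutions are made here and are not part of the original slides: range() replaces the min()/max() pairs, and which.max() looks up the highlighted cars rather than hard-coding their coordinates (the objects top_hp and top_disp are hypothetical names).

```r
# Dark background with orange annotation.
par(bg = "#262a35", mar = c(5, 4, 3, 2), oma = c(0, 0, 0, 0),
    col.lab = "darkorange2", col.axis = "darkorange2", col.main = "darkorange2",
    font.main = 2, font.lab = 2, cex.main = 1.2, cex.axis = 0.9, cex.lab = 0.9,
    tck = 0)

# Empty frame sized to the data.
plot(0, xlim = range(mtcars$disp), ylim = range(mtcars$hp),
     type = "n", bty = "n", las = 1,
     main = "Power as a Function of Displacement",
     xlab = "Displacement", ylab = "Power(hp)", asp = 1/2)

# Data points: the symbol number equals the cylinder count (4, 6 or 8).
points(mtcars$disp, mtcars$hp, pch = mtcars$cyl, col = "green")
legend(110, 380, pch = c(4, 6, 8), col = "green",
       legend = c("4 cylinders", "6 cylinders", "8 cylinders"),
       bg = "azure4", text.col = "white")

# Dashed linear trend line.
trend <- lm(hp ~ disp, data = mtcars)
abline(trend, lty = 2, col = "ghostwhite")

# Highlight the cars with the highest power and the highest displacement.
top_hp   <- mtcars[which.max(mtcars$hp), ]
top_disp <- mtcars[which.max(mtcars$disp), ]
points(c(top_hp$disp, top_disp$disp), c(top_hp$hp, top_disp$hp),
       pch = 5, cex = 2, col = "gold")
text(290, 360, paste("Max. Power:", rownames(top_hp)), col = "white")
text(455, 180, "Max. Disp:", col = "white")
text(455, 165, rownames(top_disp), col = "white")

# Dotted gridlines aligned with the axis annotation.
grid(NULL, NULL, col = "azure4", lty = 3, lwd = 1)
```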
Visualizing more than 2 parameters at a time
• Arrays of graphs, each of which represents 2 parameters
• Series of graphs generated using a programming loop
• Bubble Charts
• Heat Maps

Before doing the following examples, execute the following command to clear your plotting area:
dev.off()
Quickly generating arrays of maps
Let’s assume that we want to examine the interactions between the
following parameters from the mtcars data set:
• mileage (column 1)
• engine displacement (column 3)
• horsepower (column 4)
• weight (column 6)

This data can be extracted into a new object and plotted, as follows:

take_4 <- mtcars[c(1, 3, 4, 6)]   # Object being created: take_4; source: mtcars; desired columns: 1, 3, 4 and 6

plot(take_4)
Resulting Array of Maps
Using a programming loop
Suppose that you wanted to generate and simultaneously display 3 graphs using the mtcars
data set, as follows:
• weight versus horsepower, for 4-cylinder cars.
• weight versus horsepower, for 6-cylinder cars.
• weight versus horsepower, for 8-cylinder cars.

1. Create a vector object with the unique cyl (the number of cylinders) values from the
mtcars data.

cyl_num <- unique(mtcars$cyl)

2. Arrange the contents of the vector in ascending order.

cyl_num <- sort(cyl_num, decreasing = FALSE)


Using a programming loop
3. Divide the plotting area into 3 parts (one for each graph).

par(mfrow = c(3, 1),       # Divide the plotting area into 3 rows and 1 column
    mar = c(4, 3, 3, 2),   # Assign margins around edges
    cex = 0.6)             # Scale text and symbols
Using a programming loop
4. Create a loop to plot the 3 graphs.

for (i in 1:length(cyl_num)) {                            # Start loop; execute once for each element in cyl_num
  cyl_count <- cyl_num[i]                                 # Set cyl_count (number of cylinders) equal to the ith element in cyl_num
  main_title <- paste(cyl_count, "Cylinders")             # Create plot main title using concatenation (paste)
  mtcars_sub <- subset(mtcars, mtcars$cyl == cyl_count)   # Extract records from mtcars data set with current number of cylinders
  plot(mtcars_sub$wt, mtcars_sub$hp,                      # Using the extracted data, plot wt versus hp
       main = main_title)                                 # Assign main title of plot
}                                                         # End loop
Plots generated using programming loop
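
The four steps can be run together as one script. The sketch below is a slightly condensed variant, not the slides' exact code: it loops over the cylinder values directly instead of over index numbers, and it saves the previous par settings in a hypothetical object named old_par so they can be restored afterwards.

```r
# Steps 1 and 2: unique cylinder counts in ascending order (4, 6, 8).
cyl_num <- sort(unique(mtcars$cyl))

# Step 3: one row per graph; remember the old settings.
old_par <- par(mfrow = c(length(cyl_num), 1), mar = c(4, 3, 3, 2), cex = 0.6)

# Step 4: one plot per cylinder count.
for (cyl_count in cyl_num) {
  mtcars_sub <- subset(mtcars, cyl == cyl_count)
  plot(mtcars_sub$wt, mtcars_sub$hp,
       main = paste(cyl_count, "Cylinders"))
}

# Restore the previous plotting layout.
par(old_par)
```

Looping over the values themselves avoids the index bookkeeping and behaves sensibly even if the vector happens to be empty.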
Bubble Charts
Bubble charts build upon traditional x-y plots by incorporating multi-sized
and multi-colored “bubbles” (symbols) to convey additional information.

Source: https://www.flickr.com/photos/jawspeak/5944275063
Create a bubble chart in R
Using the mtcars data set, let’s create a bubble chart that:
1. Shows horsepower (hp) as a function of the time required to go a quarter
mile (qsec).
2. Scales the size of the symbols to reflect the engine displacements of the
cars.
3. Colors the symbols to reflect the number of cylinders in the cars.
4. Includes legends indicating the meanings of the symbol colors and sizes.

Let’s start by clearing the existing contents and settings from the plot area, as
follows:
dev.off()
How big should the bubbles be?
• You want the bubble’s area to be proportional to the engine displacement.
• However, in R, the bubble’s size is specified by giving the radius.
• The radius of the bubble is proportional to the square root of its area.
• Establish a new object that reflects the appropriate radius of the bubble for each car model.

bubble_radii <- sqrt(mtcars$disp / pi)   # Create a vector listing the radii: take the square root of the disp field from mtcars divided by π
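
The reasoning behind the formula is that a circle’s area is π times its radius squared, so solving for the radius gives the square root of the area divided by π. The sketch below checks that the computed radii reproduce the displacements as areas; the name bubble_areas is chosen for illustration.

```r
# Radius that gives each bubble an area equal to the car's displacement.
bubble_radii <- sqrt(mtcars$disp / pi)

# Converting the radii back into areas should recover disp.
bubble_areas <- pi * bubble_radii^2
all.equal(bubble_areas, mtcars$disp)

# Doubling the displacement multiplies the radius by sqrt(2), not by 2.
sqrt(2 * 100 / pi) / sqrt(100 / pi)
```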
What color should the bubbles be?

Let’s use the following colors to indicate the number of cylinders:
• 4 cylinders: green
• 6 cylinders: blue
• 8 cylinders: red

Program an R loop that will assign the appropriate color to each car
model in the mtcars data and list the colors in a vector object.
Loop to assign bubble colors

# 1. Create object: bubble_colors initially has no contents
bubble_colors <- NULL

# 2. Start loop: run loop once for each record in mtcars
for (i in 1:length(mtcars$cyl)) {

  # 3. Loop contents
  # Retrieve a new value of cyl (column 2) from mtcars and assign it to the object cyl_value
  cyl_value <- mtcars[i, 2]

  # If-then statement to determine the appropriate color for the number
  # of cylinders and assign it to the object color_i
  if (cyl_value == 4) {
    color_i <- "#1eec1850"
  } else if (cyl_value == 6) {
    color_i <- "#1842ec50"
  } else {
    color_i <- "#ec351850"
  }

  # Add the contents of color_i to the existing contents of bubble_colors
  bubble_colors <- c(bubble_colors, color_i)

# 4. End loop
}

Note: the hex color codes (e.g. "#1842ec50") all have “50” appended to the end. These last two digits set the opacity (alpha) of the color as a hexadecimal number: “50” is 80 out of a possible 255, or roughly 31% opaque. This makes the colors semi-transparent when plotted, thereby preventing the bubbles from obscuring each other
on the bubble plot.
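
The same color vector can also be produced without a loop. This sketch is an alternative idiom, not the method used on the slides: it stores the three colors in a named lookup vector (palette_by_cyl, a hypothetical name) and indexes it by cylinder count.

```r
# One color per cylinder count; names are the counts as text.
palette_by_cyl <- c("4" = "#1eec1850",   # green
                    "6" = "#1842ec50",   # blue
                    "8" = "#ec351850")   # red

# Look up each car's color by converting its cyl value to text.
bubble_colors_vec <- unname(palette_by_cyl[as.character(mtcars$cyl)])

head(bubble_colors_vec)
table(bubble_colors_vec)
```

Indexing by name processes all 32 cars in one step, and adding a new cylinder count only requires one more entry in the lookup vector.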
Plot the data

symbols(mtcars$qsec,                                              # Take X-coordinates from the qsec field in mtcars
        mtcars$hp,                                                # Take Y-coordinates from the hp field in mtcars
        circles = bubble_radii,                                   # Make circle radii proportional to the values in bubble_radii
        inches = 0.35,                                            # Set maximum radius for the circles
        fg = "white",                                             # Color for the outline of the circles
        bg = bubble_colors,                                       # Color the inside of the circles as per bubble_colors
        xlab = "Quarter Mile Time (sec)",                         # Label for x-axis
        ylab = "horsepower",                                      # Label for y-axis
        main = "Assorted Information from the mtcars data set")   # Main title
Here’s what we have so far
Let’s add a colors legend
We will use the legend command to create a legend, as in an
earlier example.

legend(21.5, 250,
fill=c("#1eec1850","#1842ec50","#ec351850"),
legend=c("4 cylinders","6 cylinders","8 cylinders"),
bg="azure1",
text.col="black")

In the above command, the fill argument creates boxes with the specified
colors next to the legend text.
Add a bubble size legend
1. Install and load a new package that allows the addition of shapes to the
plot.
install.packages("plotrix")
library(plotrix)

2. Draw 3 sample circles of different sizes.

draw.circle(21,              # X-coordinate
            325,             # Y-coordinate
            0.35,            # Radius
            nv = 100,        # Number of vertices to use when drawing circle
            border = NULL,   # Use default border
            col = NULL,      # Use default color
            lty = 1,         # Line type
            lwd = 1)         # Line width

draw.circle(22, 325, 0.22, nv = 100, border = NULL, col = NULL, lty = 1, lwd = 1)
draw.circle(22.75, 325, 0.11, nv = 100, border = NULL, col = NULL, lty = 1, lwd = 1)
Add a bubble size legend
3. Label the size legend
text(22, 355, "Displacement", col="black")

4. Label the three circles


text(c(21, 22, 22.75),c(325, 325, 325), c("500","200","50"),col="red")
Ta-dah, the final result!
And now…heat maps!
• Heat maps arrange data tabularly (much like in a conventional table).
• However, the data values are distinguished through variations in color.
• The heat map at left illustrates the relative frequency with which people are born on different days of the year.

By Leonid 2 (Own work) [CC BY-SA 3.0 (http://creativecommons.org/licenses/by-sa/3.0) or GFDL
(http://www.gnu.org/copyleft/fdl.html)], via Wikimedia Commons
Making heat maps with R
• Let’s try creating a heat map using the mtcars data set.
• To refresh our memory about what’s in this data set, use the following command to see the first 15 records:

head(mtcars, 15)
Making heat maps with R
• The command used for creating heat maps works only with data matrices (not data frames).
• Is the mtcars data set a data matrix?
  • Check using this command:
    str(mtcars)
  • No, mtcars is a data frame, not a data matrix.
• Create a data matrix with the contents of mtcars:

mtcars_dm <- data.matrix(mtcars)
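
Besides reading the output of str(), the object type can be tested directly. This short sketch uses the base R functions is.data.frame() and is.matrix() to confirm the conversion.

```r
# Before conversion: a data frame, not a matrix.
is.data.frame(mtcars)   # TRUE
is.matrix(mtcars)       # FALSE

# After conversion: a numeric matrix with the same rows and columns.
mtcars_dm <- data.matrix(mtcars)
is.matrix(mtcars_dm)    # TRUE
dim(mtcars_dm)
```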
Making heat maps with R
• Choose a color palette (range of colors).
  • This website provides an overview of some standard color palettes in R:
    http://www.r-bloggers.com/color-palettes-in-r/
• Generate your heat map:

heatmap(mtcars_dm,               # Data source
        Rowv = NA,               # Choose not to have a dendrogram (tree diagram) on the vertical axis
        Colv = NA,               # No dendrogram on the horizontal axis
        col = heat.colors(30),   # For colors, use the standard heat.colors palette, which ranges from bright red for low values to pale yellow for high values; use 30 different shades
        scale = "column",        # Center and scale values in the direction of the columns
        margins = c(5, 10))      # Margins around the column and row names, respectively
And here’s the heat map
And now, to wrap things up, interactive graphs
• RStudio can be used to create interactive graphs.
• Interactive graphs allow their users to alter the parameters being graphed.
• The manipulate command used to create interactive graphs works within RStudio; it does not work when programming directly in R.
• The manipulate command requires the installation of the manipulate package.
• Therefore, start by installing and loading this package:
install.packages("manipulate")
library(manipulate)
Interactive Graphs
• Let’s invent a mathematical function and assign it to an object, roller.

roller <- function(x) sin(x) - sin(sqrt(x))

• This could then be graphed without an interactive feature, as follows:

curve(roller,     # Function to graph
      0,          # Minimum x value (i.e. start graphing at x = 0)
      100,        # Maximum x value (i.e. stop graphing at x = 100)
      n = 1000)   # Evaluate 1000 values of x
Graph without interactivity
Adding Interactivity

• To add interactivity, use the manipulate command.

manipulate(curve(roller, x.min, x.max, n = breaks),
           x.min = slider(0, 1000),
           x.max = slider(0, 1000),
           breaks = slider(10, 10000))

• The command used to create the graph on the previous slide is now an argument for the manipulate command.
• However, the minimum x, maximum x, and number of breaks have now become variables.
• Sliders are defined that will allow the viewer of the graph to adjust values over the specified ranges.
Graph with Added Interactivity
Questions?
Appendix A

Supplementary Visualizations
Supplementary Visualizations

1. Combining differently sized visualizations within the plot area.
2. Tree Maps
3. Connected Scatter Plots


Combining differently sized visualizations within the plot area
In the exercise for using a programming loop, you generated 3 graphs
arranged vertically in the plotting area.

Suppose that:
• On the left side of your plotting area, you wanted these plots, and
• On the right side, you wanted a bar graph showing the mileage for each model of car from the mtcars data set (in order of increasing mileage).

How would you do this?


Introducing the layout command
1. Start by creating an object (“plot_space”) containing a matrix that reflects the arrangement of the plot area.

plot_space <- matrix(c(1,2,3,4,4,4),   # Contents of matrix; last 3 entries are the same to indicate the merging of the 3 cells on the right
                     3,                # Have 3 rows.
                     2,                # Have 2 columns.
                     byrow=FALSE)      # Do not fill the cells by row (i.e. fill the cells by column, starting with the first column)

2. Apply the matrix to the plotting area.

layout(plot_space)
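Before drawing anything, it can help to confirm that the matrix describes the intended arrangement. The sketch below prints the matrix and uses layout.show to outline the regions. The object name n_figs is introduced here for illustration.

```r
# Same matrix as above, with the row and column counts named
plot_space <- matrix(c(1, 2, 3, 4, 4, 4), nrow = 3, ncol = 2, byrow = FALSE)
print(plot_space)   # column 1 holds plots 1 to 3; column 2 holds plot 4 three times

# layout() invisibly returns the number of figures it has set up
n_figs <- layout(plot_space)

# layout.show() outlines and numbers each region of the plotting area
layout.show(n_figs)
```

The repeated 4 in the second column is what merges the three right-hand cells into one tall region.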
Repeat the 3 original plots
1. Start with the 3 plots from the earlier exercise (you already know
how these work)
for (i in 1:length(cyl_num)) {
  cyl_count <- cyl_num_ordered[i]
  main_title <- paste(cyl_count,"Cylinders")
  mtcars_sub <- subset(mtcars,mtcars$cyl==cyl_count)
  plot(mtcars_sub$wt,mtcars_sub$hp, main=main_title)
}
The 3 plots will appear on the left side of the plotting area.
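The loop relies on cyl_num and cyl_num_ordered, which come from the earlier exercise and are not defined in this appendix. The sketch below assumes they hold the distinct cylinder counts and their sorted values. The par(mfrow) call is only there so the block runs on its own; omit it if layout(plot_space) has already been applied.

```r
# Assumed definitions (the earlier exercise is not reproduced here)
cyl_num <- unique(mtcars$cyl)
cyl_num_ordered <- sort(cyl_num)

# Stack three plots vertically so this block can be run by itself
par(mfrow = c(3, 1))

# seq_along() yields 1, 2, 3 for a vector of length 3
for (i in seq_along(cyl_num_ordered)) {
  cyl_count <- cyl_num_ordered[i]
  main_title <- paste(cyl_count, "Cylinders")
  mtcars_sub <- subset(mtcars, cyl == cyl_count)
  plot(mtcars_sub$wt, mtcars_sub$hp, main = main_title)
}
```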
Add the 4th plot
1. Order your data in the sequence in which you would like it to appear:

mtcars <- mtcars[order(mtcars$mpg, decreasing = TRUE),]

2. Modify your plot area and generate your plot

Command Arguments
par(las=2)                               # Set label orientation (“2” draws labels perpendicular to the axes, so the model names read horizontally)
barplot(mtcars$mpg,                      # Plot the mileage (mpg) data from the mtcars data set
        main = "Mileage (mpg)",          # Main title
        col = "darkgoldenrod1",          # Assign color to the bars
        horiz=TRUE,                      # Orient the bars horizontally
        names.arg = row.names(mtcars),   # Use the row names from the mtcars data set as the labels
        cex.names = 0.5)                 # Scale the size of the labels (relative to the default size)
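The steps above overwrite mtcars and leave las=2 in effect for later plots. A slightly more defensive variant, sketched here, sorts a copy and restores the graphics settings afterwards. The names mtcars_sorted, old_par and bar_mids are introduced for illustration.

```r
# Sorted copy, so the built-in mtcars data set is left unchanged
mtcars_sorted <- mtcars[order(mtcars$mpg, decreasing = TRUE), ]

# par() returns the previous settings, which makes them easy to restore
old_par <- par(las = 2)

# barplot() returns the midpoint of each bar along the category axis
bar_mids <- barplot(mtcars_sorted$mpg,
                    main = "Mileage (mpg)",
                    col = "darkgoldenrod1",
                    horiz = TRUE,
                    names.arg = row.names(mtcars_sorted),
                    cex.names = 0.5)

# Restore the original label orientation
par(old_par)
```

With horizontal bars the first row is drawn at the bottom, so sorting in decreasing order places the lowest mileage at the top of the chart.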
And here’s how it looks…
Tree Maps
Tree maps represent data as a series of colored rectangles.
• Rectangles are grouped to represent hierarchical relationships.
• Rectangles are sized and colored, respectively, to illustrate associated data values.

Tree map showing beverage preferences within a small group
By User:GBoshouwers (MagnaView Designer Pro) [Public domain], via Wikimedia Commons
Create a Tree Map using R

Let’s create a Tree Map, using World Development Indicator (WDI) data from the World Bank, that:
• Groups countries by region of the world.
• Sizes the countries in accordance with their populations.
• Colors countries on the basis of their land areas.

How would we do this?


Create a Tree Map using R

Start by installing and loading two packages that we will need:

1. The WDI package facilitates downloading WDI data from the World Bank (the RJSONIO package must also be loaded to use WDI)
install.packages("WDI")
library(RJSONIO)
library(WDI)

2. The treemap package allows the creation of tree maps.


install.packages("treemap")
library(treemap)
Create a Tree Map using R

Download data from the World Bank


Object being created / Command Arguments

tree_data <- WDI(country = "all",                                  # Obtain data for all countries
                 indicator = c("SP.POP.TOTL","AG.SRF.TOTL.K2"),    # Desired data fields (population and surface area)
                 start = 2014,                                     # Starting year for desired data
                 end = 2014,                                       # Ending year for desired data
                 extra = TRUE)                                     # Include additional data such as capital, region, income level, etc.
Create a Tree Map using R

Examine the data.


head(tree_data,20)

In addition to records for countries, the data includes records for entire regions (e.g. the Arab World, Europe & Central Asia).

We do not want the records for regions.
Create a Tree Map using R

“Clean” the data.

• Exclude any records for which a national capital is not provided (this eliminates the records for regions)
tree_data <- subset(tree_data, capital != "")

“!=” means “is not equal to”

• To make them easier to understand, rename the 4th and 5th columns of the data.
colnames(tree_data)[4:5] <- c("Population", "Area (sq.km)")
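Renaming by position assumes that the two indicators sit in columns 4 and 5, which may not hold for every version of the WDI package. A sketch of renaming by indicator code instead is shown below. It uses a small made-up data frame (toy_data) in place of the downloaded data, so the values are illustrative only.

```r
# Toy stand-in for the downloaded data (values are made up for illustration)
toy_data <- data.frame(
  country        = c("Arab World", "Canada", "Kenya"),
  capital        = c("", "Ottawa", "Nairobi"),
  SP.POP.TOTL    = c(100, 35, 46),
  AG.SRF.TOTL.K2 = c(1000, 9985, 580),
  stringsAsFactors = FALSE
)

# Drop the regional aggregate, which has no capital
toy_data <- subset(toy_data, capital != "")

# Look up the columns by indicator code rather than by position
new_names <- c(SP.POP.TOTL = "Population", AG.SRF.TOTL.K2 = "Area (sq.km)")
hits <- match(names(new_names), colnames(toy_data))
colnames(toy_data)[hits] <- unname(new_names)

colnames(toy_data)
```

The same match-based renaming could be applied to tree_data, since the indicator codes are the ones requested in the download step.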
Create a Tree Map using R

Create the tree map.


Command Arguments

treemap(tree_data,                      # Data source
        index=c("region", "country"),   # Data fields to be used for, respectively, grouping and identifying the rectangles.
        vSize="Population",             # Data field for sizing the rectangles.
        vColor="Area (sq.km)",          # Data field for assigning colors to the rectangles.
        type="value")                   # Type of data in the data field assigned to the vColor argument
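To see how the arguments map onto the data without downloading anything, the same call can be tried on a small made-up data frame. Everything in toy_tree below is invented for illustration. The tree map is drawn only if the treemap package is installed.

```r
# Made-up figures, used only to show the shape of the input
toy_tree <- data.frame(
  region     = c("North", "North", "South", "South"),
  country    = c("A", "B", "C", "D"),
  Population = c(50, 20, 30, 10),
  Area       = c(500, 900, 100, 300),
  stringsAsFactors = FALSE
)

# Total population per region; each region's block is sized by this total
region_totals <- tapply(toy_tree$Population, toy_tree$region, sum)
print(region_totals)

# Draw the tree map only if the treemap package is available
if (requireNamespace("treemap", quietly = TRUE)) {
  treemap::treemap(toy_tree,
                   index  = c("region", "country"),
                   vSize  = "Population",
                   vColor = "Area",
                   type   = "value")
}
```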
Create a Tree Map using R
And here’s our tree map!
Connected Scatter Plots

Previous examples have included scatter plots.
• Useful for establishing the basic relationship between two data parameters.

Connected Scatter Plots, in addition to showing the data, trace the sequence in which the data points were recorded.
Creating a Connected Scatter Plot using R

One of the standard R data sets is faithful

• faithful records the spacing and duration of eruptions of the Old Faithful geyser in Yellowstone National Park.
head(faithful)

What if, in addition to seeing the relationship between the spacing and the duration, you wanted to see if the longer eruptions tended to be followed by shorter eruptions (and vice versa)?
Creating a Connected Scatter Plot using R

Establish the ranges of values in the faithful data set.

# Determine the minimum value of the eruptions field, round it down to an
# integer, and assign that to the object xmin.
xmin <- floor(min(faithful$eruptions))

# Determine the maximum value of the eruptions field, round it up to an
# integer, and assign that to the object xmax.
xmax <- ceiling(max(faithful$eruptions))

# Determine the minimum value of the waiting field, round it down to an
# integer, and assign that to the object ymin.
ymin <- floor(min(faithful$waiting))

# Determine the maximum value of the waiting field, round it up to an
# integer, and assign that to the object ymax.
ymax <- ceiling(max(faithful$waiting))
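The four assignments above can also be written more compactly with range, which returns the minimum and maximum together. This is an equivalent sketch rather than a replacement. The names x_rng, y_rng, x_limits and y_limits are introduced here for illustration.

```r
# range() returns c(minimum, maximum) in a single call
x_rng <- range(faithful$eruptions)
y_rng <- range(faithful$waiting)

# Round the lower bound down and the upper bound up
x_limits <- c(floor(x_rng[1]), ceiling(x_rng[2]))
y_limits <- c(floor(y_rng[1]), ceiling(y_rng[2]))

x_limits   # can be passed directly as xlim
y_limits   # can be passed directly as ylim
```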
Creating a Connected Scatter Plot using R

Plot the points

plot(faithful$eruptions,
faithful$waiting,
xlim=c(xmin,xmax),
ylim=c(ymin,ymax),
xlab="Eruption Duration (min)",
ylab="Interval Between Eruptions (min)",
main="Sequence of Old Faithful Eruptions")
Creating a Connected Scatter Plot using R

Now, add some lines tracing the data points in the sequence in which they were recorded. However,
• Do not add all of the lines, as the plot will become too congested (and therefore difficult to read).
• Alternate between differently colored lines to make it easier to follow from one line to the next.

Therefore, let’s
• plot only the first 10 lines, and
• alternate between red and green lines.
Creating a Connected Scatter Plot using R

To alternate colors, create a matrix with

• 10 rows
• 1 column
• Alternating values of “red” and “green”

line_col <- matrix(c("red","green"),10,1)

The matrix will continue adding values of first “red” and then “green” until all 10 positions in the matrix are filled
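The recycling that fills the matrix can also be requested explicitly with rep, which produces a plain character vector. The sketch below shows that the two approaches give the same sequence of colors. The name line_col_vec is introduced for illustration.

```r
# Matrix version from the slide: the two colors are recycled to fill 10 cells
line_col <- matrix(c("red", "green"), 10, 1)

# Vector version: repeat the two colors until the result has length 10
line_col_vec <- rep(c("red", "green"), length.out = 10)

# Both hold the same colors in the same order
identical(as.vector(line_col), line_col_vec)
```

Either object can be indexed with line_col[i] inside a loop, because a one-column matrix is indexed like a vector.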
Creating a Connected Scatter Plot using R

Use a loop to graph the first 10 lines using alternating colors

for (i in 1:10){                   # Start a loop with 10 iterations
  loop_color <- line_col[i]        # Set the line color for each iteration to the ith value in line_col
  line_set <- c(i,i+1)             # Set the record numbers in the faithful data set that will provide, respectively, the starting and ending points of the line.
  lines(faithful[line_set,1],      # Draw line: X-values for the starting and end points of the line
        faithful[line_set,2],      # Y-values for the starting and end points of the line
        col=loop_color)            # Assign line color
}
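The loop draws one segment per iteration. As an alternative sketch, the base graphics function segments accepts vectors of start and end points, so the same 10 lines can be drawn in a single call. The names n_lines, from_rows, to_rows and seg_col are introduced here for illustration.

```r
n_lines   <- 10
from_rows <- 1:n_lines       # record where each line starts
to_rows   <- from_rows + 1   # record where each line ends

# One color per segment, alternating red and green
seg_col <- rep(c("red", "green"), length.out = n_lines)

# Draw the points first, then add all segments at once
plot(faithful$eruptions, faithful$waiting,
     xlab = "Eruption Duration (min)",
     ylab = "Interval Between Eruptions (min)",
     main = "Sequence of Old Faithful Eruptions")
segments(x0 = faithful$eruptions[from_rows], y0 = faithful$waiting[from_rows],
         x1 = faithful$eruptions[to_rows],   y1 = faithful$waiting[to_rows],
         col = seg_col)
```

Changing n_lines is then enough to trace more or fewer eruptions.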
Creating a Connected Scatter Plot using R

And here’s the plot with the first 10 lines

Although not universally true, the lines plotted on the graph do suggest that, typically, an eruption of longer duration will be followed by a shorter eruption.

An analogous pattern also generally occurs with regard to the waiting periods between eruptions.
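The visual impression from 10 lines can be checked against the whole data set. One simple way, sketched below, is to correlate each eruption's duration with the duration of the eruption that follows it. A negative value would be consistent with long eruptions tending to be followed by short ones. The variable names are introduced for illustration.

```r
durations <- faithful$eruptions

# Pair each eruption with the one recorded immediately after it
current_dur <- head(durations, -1)   # all but the last record
next_dur    <- tail(durations, -1)   # all but the first record

# Correlation between consecutive eruption durations
lag1_cor <- cor(current_dur, next_dur)
lag1_cor
```

The same pairing can be applied to faithful$waiting to examine the waiting periods.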
Appendix B

Reference Materials for Commands used in this Workshop
Function           Package     Reference Materials
abline             graphics    http://www.inside-r.org/r-doc/graphics/abline
barplot            graphics    http://www.inside-r.org/r-doc/graphics/barplot
ceiling            base        http://www.inside-r.org/r-doc/base/round
colnames           base        http://www.inside-r.org/r-doc/base/colnames
curve              graphics    http://www.inside-r.org/r-doc/graphics/curve
data.matrix        base        http://www.inside-r.org/r-doc/base/data.matrix
dev                grDevices   http://www.inside-r.org/r-doc/grDevices/dev.off
draw.circle        plotrix     http://www.inside-r.org/packages/cran/plotrix/docs/draw.circle
floor              base        http://www.inside-r.org/r-doc/base/round
grid               graphics    http://www.inside-r.org/r-doc/graphics/grid
head               utils       http://www.inside-r.org/r-doc/utils/head
heatmap            stats       http://www.inside-r.org/r-doc/stats/heatmap
install.packages   utils       http://www.inside-r.org/r-doc/utils/install.packages
layout             graphics    http://www.inside-r.org/r-doc/graphics/layout
legend             graphics    http://www.inside-r.org/r-doc/graphics/legend
library            base        http://www.inside-r.org/r-doc/base/library
lm                 stats       http://www.inside-r.org/r-doc/stats/lm
Function           Package     Reference Materials
manipulate         manipulate  https://support.rstudio.com/hc/en-us/articles/200551906-Interactive-Plotting-with-Manipulate
matrix             base        http://www.inside-r.org/r-doc/base/matrix
paste              base        http://www.inside-r.org/r-doc/base/paste
points             graphics    http://www.inside-r.org/r-doc/graphics/points
par                graphics    http://www.inside-r.org/r-doc/graphics/par
plot               graphics    http://www.inside-r.org/r-doc/graphics/plot
read.csv (variant of read.table)
                   utils       http://www.inside-r.org/r-doc/utils/read.csv
sort               base        http://www.inside-r.org/r-doc/base/sort
str                utils       http://www.inside-r.org/r-doc/utils/str
subset             base        http://www.inside-r.org/r-doc/base/subset
symbols            graphics    http://www.inside-r.org/r-doc/graphics/symbols
text               graphics    http://www.inside-r.org/r-doc/graphics/text
treemap            treemap     http://www.inside-r.org/node/193882
unique             base        http://www.inside-r.org/r-doc/base/unique
WDI                WDI         http://www.inside-r.org/packages/cran/wdi/docs/WDI
Appendix C

Reference Materials for Selecting Colors
Color Selection Resources

Standard R Colors
• http://www.stat.columbia.edu/~tzheng/files/Rcolor.pdf

Standard R Color Palettes
• http://www.r-bloggers.com/color-palettes-in-r/

HTML Color Codes
• http://htmlcolorcodes.com/
• http://html-color-codes.info/
• http://www.computerhope.com/htmcolor.htm
• http://www.color-hex.com/
• http://www.w3schools.com/colors/colors_picker.asp
Feedback
We would appreciate your feedback on this workshop

http://goo.gl/forms/gcrJ1OSi5m
