Ggmap PDF
Abstract In spatial statistics the ability to visualize data and models superimposed with their
basic social landmarks and geographic context
is invaluable. ggmap is a new tool which enables such visualization by combining the spatial
information of static maps from Google Maps,
OpenStreetMap, Stamen Maps or CloudMade
Maps with the layered grammar of graphics implementation of ggplot2. In addition, several
new utility functions are introduced which allow
the user to access the Google Geocoding, Distance Matrix, and Directions APIs. The result is
an easy, consistent and modular framework for
spatial graphics with several convenient tools for
spatial data analysis.
Introduction
[Figure: plot with axes df$lon (-96.0 to -94.5) and df$lat (29.0 to 30.0)]
CONTRIBUTED ARTICLE
In most cases the plot is understandable to the researcher who has worked on the problem for some
time but is of hardly any use to the audience, who
must work to associate the data of interest with their
location. Moreover, it leaves out many practical
details: are most of the events to the east or west of
landmark x? Are they clustered around more well-to-do parts of town, or do they tend to occur in disadvantaged areas? Questions like these can't really
be answered using these kinds of graphics because
we don't think in terms of small-scale areal boundaries (e.g. zip codes or census tracts).
This is of course a (gross) exaggeration. With a
little effort better plots can be made, and tools such
as maps, maptools, sp, and RgoogleMaps make the
process much easier; in fact, RgoogleMaps was the
inspiration for ggmap. Moreover, there has recently
been a deluge of interest in the subject of mapmaking
in R; Ian Fellows' excellent interactive GUI-driven
DeducerSpatial package based on Bing Maps comes
to mind. ggmap takes another step in this direction
by situating the contextual information of various
kinds of static maps in the ggplot2 plotting framework. The result is an easy, consistent way of specifying plots which are readily interpretable by both
expert and audience and safeguarded from graphical inconsistencies by the layered grammar of graphics framework. The result is a spatial plot resembling
Figure 3.
murder <- subset(crime, offense == "murder")
qmplot(lon, lat, data = murder,
colour = I('red'), size = I(3), darken = .3)
Note that because of the Mercator projection limitations in mapproject, anything above/below 80° latitude cannot currently be plotted.
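To illustrate why high latitudes are awkward under a Mercator projection, the following minimal sketch evaluates the standard spherical Mercator formula y = ln(tan(pi/4 + phi/2)) on a unit sphere. The helper name mercator_y is hypothetical and is not part of ggmap or mapproj. The exact cutoff used by mapproject is not derived here; the sketch only shows how quickly the projected coordinate grows near the poles.

```r
# Spherical Mercator northing (unit sphere) for a latitude given in degrees.
# mercator_y is a hypothetical helper used for illustration only.
mercator_y <- function(lat_deg) {
  phi <- lat_deg * pi / 180
  log(tan(pi / 4 + phi / 2))
}

# The projected coordinate grows without bound as latitude approaches 90.
lats <- c(0, 30, 60, 80, 85, 89)
data.frame(lat = lats, y = round(mercator_y(lats), 3))
```

Equal steps in latitude take up more and more vertical space on the map as the latitude increases, which is why Mercator-based maps are normally truncated well short of the poles.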
ISSN 2073-4859
necessary) and passing them to each of the API-specific get_* functions. To ensure that the resulting
maps are the same across the various sources for
the same location/zoom specification, get_map first
grabs the appropriate Google Map, determines its
bounding box, and then downloads the other map as
needed. In the case of Stamen Maps and CloudMade
Maps, this involves stitching together
several tiles (small map images) and then cropping
the result to the appropriate bounding box. The result is a single, consistent specification syntax across
the four map sources, as seen for Google Maps and
OpenStreetMap in Figure 4.
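The tile bookkeeping behind such a stitching step can be sketched as follows. This is not ggmap's internal code: it assumes the widely used "slippy map" tile numbering scheme of OpenStreetMap-style tile servers, and the function names lonlat_to_tile and tiles_for_bbox, as well as the example bounding box, are hypothetical.

```r
# Convert longitude/latitude (degrees) to slippy-map tile indices at a zoom.
# Hypothetical helper; assumes the standard OpenStreetMap tile scheme.
lonlat_to_tile <- function(lon, lat, zoom) {
  n <- 2^zoom
  lat_rad <- lat * pi / 180
  x <- floor((lon + 180) / 360 * n)
  y <- floor((1 - log(tan(lat_rad) + 1 / cos(lat_rad)) / pi) / 2 * n)
  c(x = x, y = y)
}

# Enumerate every tile needed to cover a bounding box.
tiles_for_bbox <- function(left, bottom, right, top, zoom) {
  ul <- lonlat_to_tile(left, top, zoom)      # upper-left tile
  lr <- lonlat_to_tile(right, bottom, zoom)  # lower-right tile
  expand.grid(x = ul[["x"]]:lr[["x"]], y = ul[["y"]]:lr[["y"]])
}

# Illustrative bounding box roughly around Houston.
tiles <- tiles_for_bbox(-95.8, 29.4, -95.0, 30.1, zoom = 10)
nrow(tiles)
```

Each row of tiles identifies one image to download. Once the images are pasted together in grid order, the mosaic generally extends beyond the requested bounding box, which is why a final cropping step is needed.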
baylor <- "baylor university"
qmap(baylor, zoom = 14)
Before moving into the source and maptype arguments, it is important to note that the underlying API-specific
get_* functions for which get_map is a wrapper provide more extensive mechanisms for downloading from their respective sources. For example,
get_googlemap can access almost the full range of the
Google Static Maps API as seen in Figure 5.
set.seed(500)
df <- round(data.frame(
x = jitter(rep(-95.36, 50), amount = .3),
y = jitter(rep( 29.76, 50), amount = .3)
), digits = 2)
map <- get_googlemap('houston', markers = df,
path = df, scale = 2)
ggmap(map, extent = 'device')
CloudMade Maps takes the tile styling even further by allowing the user to either (1) select among
thousands of user-made sets or (2) create an entirely
new style with a simple online editor where the
user can specify colors, lines, and so forth for various types of roads, waterways, landmarks, etc.,
all of which are generated by CloudMade and accessible in ggmap. ggmap, through get_map (or
get_cloudmademap) allows for both options. This is
a unique feature of CloudMade Maps which really
boosts their applicability and expands the possibilities with ggmap. The one minor drawback to using CloudMade Maps is that the user must register
with CloudMade to obtain an API key and then pass
the API key into get_map with the api_key argument.
API keys are free of charge and can be obtained in a
matter of minutes. Two low-light CloudMade map
styles are seen in Figure 7.
Figure 7: Two out of thousands of user-made CloudMade Maps styles. The top is comparable to Figures
4 and 6, and the bottom is the bodies of water in Figure 5.
ranges specified in the bb attribute of the ggmap object. Thus, the default base layer of the ggplot2 object
created by ggmap is ggplot(aes(x = lon, y = lat), data = fourCorners),
and the default x and y aesthetic
scales are calculated based on the longitude and latitude ranges of the map.
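As a rough illustration of the base layer just described, the sketch below builds a fourCorners data frame from a bounding box. The column names ll.lat, ll.lon, ur.lat and ur.lon are an assumption about how the bb attribute is laid out, and the numeric values are illustrative only.

```r
# Bounding box in the assumed layout of the bb attribute (illustrative values).
bb <- data.frame(ll.lat = 48.6, ll.lon = 2.0, ur.lat = 49.1, ur.lon = 2.8)

# The four corner points of the map; their ranges fix the default x/y scales.
fourCorners <- expand.grid(
  lon = c(bb$ll.lon, bb$ur.lon),
  lat = c(bb$ll.lat, bb$ur.lat)
)
fourCorners

# With ggplot2 loaded, the base layer would then be created as:
# ggplot(aes(x = lon, y = lat), data = fourCorners)
```

Because the corner points span the whole map, any layer added later inherits axis limits that match the raster image unless the user overrides them.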
[Figure: map with axes lon (2.0 to 2.8) and lat (48.6 to 49.1)]
Data
Crime data were compiled from the Houston Police Department's website over the period of January
2010 to August 2010. The data were lightly cleaned and
aggregated using plyr and geocoded using Google
Maps (to the center of the block, e.g., 6150 Main St.);
the full data set is available in ggmap as the data set
crime.
> str(crime)
'data.frame': 86314 obs. of 17 variables:
 $ time    : POSIXt, format: "2010-01-01 0...
 $ date    : chr "1/1/2010" "1/1/2010" "1...
 $ hour    : int 0 0 0 0 0 0 0 0 0 0 ...
 $ premise : chr "18A" "13R" "20R" "20R" ...
 $ offense : chr "murder" "robbery" "aggr...
 $ beat    : chr "15E30" "13D10" "16E20" ...
 $ block   : chr "9600-9699" "4700-4799" ...
 $ street  : chr "marlive" "telephone" "w...
 $ type    : chr "ln" "rd" "ln" "st" ...
 $ suffix  : chr "-" "-" "-" "-" ...
 $ number  : int 1 1 1 1 1 1 1 1 1 1 ...
 $ month   : Factor w/ 12 levels "january"...
 $ day     : Factor w/ 7 levels "monday" ...
 $ location: chr "apartment parking lot" ...
 $ address : chr "9650 marlive ln" "4750 ...
 $ lon     : num -95.4 -95.3 -95.5 -95.4 ...
 $ lat     : num 29.7 29.7 29.6 29.8 29.7...
[Figure legend: Offense (Murder, Rape, Aggravated Assault, Robbery)]
HoustonMap +
stat_density2d(
aes(x = lon, y = lat, fill = ..level..,
alpha = ..level..),
size = 2, bins = 4, data = violent_crimes,
geom = "polygon")
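As background on what a two-dimensional density layer computes, the sketch below runs the kernel density estimate directly with MASS::kde2d, the estimator that ggplot2's stat_density2d is documented to use, and extracts contour lines from it. The point data are simulated stand-ins, not the violent_crimes data, and the grid size is an arbitrary choice.

```r
library(MASS)  # provides kde2d; a recommended package shipped with R

set.seed(42)
# Simulated event locations standing in for violent_crimes (not real data).
pts <- data.frame(
  lon = rnorm(300, mean = -95.37, sd = 0.02),
  lat = rnorm(300, mean = 29.76, sd = 0.02)
)

# Kernel density estimate evaluated on a 50 x 50 grid.
dens <- kde2d(pts$lon, pts$lat, n = 50)

# Contour lines at a handful of density levels, loosely analogous to bins = 4.
contours <- contourLines(dens$x, dens$y, dens$z, nlevels = 4)
length(contours)
```

Each element of contours holds the level together with the x and y coordinates of one closed or open curve. A polygon geom fills curves of this kind, and mapping fill and alpha to the level is what produces the shaded density bands.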
[Figure legend: Violent Crime Density (400 to 1400)]
[Figure: panels by day of week (Tuesday, Wednesday, Thursday, Friday, Saturday and Sunday shown); axes Longitude and Latitude; legend Violent Crime Density (500 to 1500)]
With output = "simple", only longitudes and latitudes are returned. These are actually Mercator projections of
the ubiquitous unprojected 1984 world geodetic system (WGS84), a spheroidal earth model used by
Google Maps. When output is set to "more", a larger
data frame is returned which provides much more
Google Geocoding information on the query:
> geocode("baylor university", output = "more")
        lon      lat       type     loctype
1 -97.11441 31.54872 university approximate
         address    north    south     east
1 [long address] 31.55823 31.53921 -97.0984
       west postal_code       country
1 -97.13042       76706 united states
  administrative_area_level_2
1                    mclennan
  administrative_area_level_1 locality    street
1                       texas     waco s 5th st
  streetNo point_of_interest
1     1311              <NA>
In particular, administrative bodies at various levels
are reported. Going further, setting output = "all"
returns the entire JavaScript Object Notation (JSON)
tree given by the Google Geocoding API parsed by
rjson (Couture-Beil, 2011).
The Geocoding API has a number of request limitations in place to prevent abuse. An unspecified
short-term rate limit is in place (see mapdist below)
as well as a 24-hour limit of 2,500 requests. These are
monitored to some extent by the hidden global variable .GoogleGeocodeQueryCount and exported function geocodeQueryCheck. geocode uses these to monitor its own progress and will either (1) slow its rate
depending on usage or (2) error if the query limit is
exceeded. Note that revgeocode shares the same request pool and is monitored by the same variable and
function. To learn more about the Google Geocoding, Distance Matrix, and Directions API usage regulations, see the websites listed in the bibliography.
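The kind of bookkeeping described above can be sketched with a small counter that remembers when queries were made and refuses new ones once the 24-hour budget is used up. This is a hypothetical illustration, not ggmap's implementation: make_query_counter and its arguments are invented names, and the short-term rate limit is not modelled.

```r
# Hypothetical query budget tracker; not ggmap's actual implementation.
make_query_counter <- function(daily_limit = 2500, window = 24 * 60 * 60) {
  stamps <- numeric(0)  # times of past queries, in seconds since the epoch

  remaining <- function(now = as.numeric(Sys.time())) {
    stamps <<- stamps[stamps > now - window]  # forget queries outside the window
    daily_limit - length(stamps)
  }

  record <- function(n = 1, now = as.numeric(Sys.time())) {
    if (remaining(now) < n) stop("query limit exceeded")
    stamps <<- c(stamps, rep(now, n))
    invisible(remaining(now))
  }

  list(record = record, remaining = remaining)
}

counter <- make_query_counter()
counter$record(5)     # pretend five addresses were just geocoded
counter$remaining()   # queries left in the current 24-hour window
```

Keeping the timestamps rather than a bare count lets the budget recover gradually as old queries age out of the window. A single shared counter of this kind also suits functions that draw on the same request pool, as geocode and revgeocode do.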
revgeocode
In some instances it is useful to convert longitude/latitude coordinates into a physical address.
This is made possible (to the extent to which it is possible) with the revgeocode function which also relies
on the Google Geocoding API.
> gc <- geocode("baylor university")
> (gc <- as.numeric(gc))
[1] -97.11441 31.54872
> revgeocode(gc)
[1] "S 1st St, Baylor University, Waco, TX
76706, USA"
Like geocode, more output can be provided as well:
> revgeocode(gc, output = "more")
  address route establishment
                 time        url elements
1 2012-03-16 00:12:11 [url used]        1
2 2012-03-16 00:16:10 [url used]        2
If the user exceeds the limitations, mapdist either (1)
pauses until the short-term request limit has lapsed
or (2) errors if no queries are remaining. Thus, it
is almost identical to the mechanism in place for
geocoding. If the user believes this to be incorrect,
an override is available with the mapdist specification override_limit = TRUE.
The data frame output of mapdist is very convenient for use with ggplot2. An example is provided
by Figure 14, where travel times from one location
("My Office") to several nearby locations are (1) determined using mapdist, (2) binned into categories
using cut, and then (3) plotted using a combination
of qmap, geom_text, and geom_rect with the fill aesthetic set to the category of travel time. The full code
is in the examples section of the documentation of
ggmap.
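The binning step (2) can be sketched in base R without any network access. The travel times below are made up, standing in for what mapdist would return, and the bin edges are an assumption read off the figure legend (0-3, 3-5, 5-7, 7-10 and 10+ minutes).

```r
# Made-up travel times in minutes; real values would come from mapdist.
travel <- data.frame(
  to = c("Cafe", "Museum", "Stadium", "Grocery", "Market"),
  minutes = c(2.5, 4.1, 6.8, 9.3, 14.0)
)

# Bin the times into ordered categories (right-closed intervals by default).
travel$category <- cut(
  travel$minutes,
  breaks = c(0, 3, 5, 7, 10, Inf),
  labels = c("0-3", "3-5", "5-7", "7-10", "10+")
)
travel
```

The resulting factor can be mapped directly to the fill aesthetic, so each location's label box is colored by its travel-time category.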
[Figure 14 labels: Buzzard Billy's, Ninfa's Mexican, Cafe Cappuccino, Dr Pepper Museum, Mayborn Museum, Baseball Stadium, My Office, Salvation Army, HEB Grocery, Flea Market]
The default mode of transportation is driving; however, the other modes are also available. The input
forms of from and to can be either physical addresses
(ideal), lazy ("the white house"), or geographic coordinates (which are reverse geocoded). While the
output defaults to the data frame format seen above,
setting output = "all" provides the full JSON tree
from Google.
The Distance Matrix API limits users to 100 requests per query, 100 requests per 10 seconds, and
2500 requests per 24 hours. To the extent to which
these can be easily monitored, the exported function
distQueryCheck helps the user keep track of their remaining balance of queries. It relies on the hidden
global variable .GoogleDistQueryCount:
> distQueryCheck()
2495 distance queries remaining.
> .GoogleDistQueryCount
[Figure 14, continued: labels Basketball Arena, Administration; legend Minutes Away by Bike (0-3, 3-5, 5-7, 7-10, 10+)]
Bibliography
R.S. Bivand, E.J. Pebesma, and V. Gomez-Rubio. Applied spatial data analysis with R. Springer, New York.
The R Journal Vol. X/Y, Month, Year
M. Loecher and Sense Networks. RgoogleMaps: Overlays on Google map tiles in R. R package version 1.1.9.3. URL http://CRAN.R-project.org/package=RgoogleMaps.
E.J. Pebesma and R.S. Bivand. Classes and methods for spatial data in R. R News, 5(2), 2005. URL http://cran.r-project.org/doc/Rnews/.
T. Schlesinger and Manuel J. A. Eugster. osmar: OpenStreetMap and R. R package version 1.1-3. URL http://CRAN.R-project.org/package=osmar.
U.S. Census Bureau, Geography Division, Cartographic Products Management Branch. Census 2000: Census Tract Cartographic Boundary Files. URL http://www.census.gov/geo/www/cob/tr2000.html.
H. Wickham. ggplot2: elegant graphics for data analysis. Springer, New York, 2009.
H. Wickham. A layered grammar of graphics. Journal of Computational and Graphical Statistics, 19(1):3-28, 2010.
H. Wickham. The Split-Apply-Combine Strategy for Data Analysis. Journal of Statistical Software, 40(1):1-29, 2011. URL http://www.jstatsoft.org/v40/i01/.
L. Wilkinson. The Grammar of Graphics. 2nd ed., Springer, New York, 2005.
David Kahle
Baylor University
Department of Statistical Science
One Bear Place #97140
Waco, TX 77005
[email protected]
Hadley Wickham
Rice University
Department of Statistics, MS 138
6100 Main St.
Houston, TX 77005
[email protected]
[Figure: route map; legend Route (A); axes Longitude and Latitude]
Figure 16: Plotting shape files: Census tracts in Texas from the 2000 U.S. Census.