MapCruncher: Integrating the World’s Geographic Information
Figure 1. (left) Base imagery of the UCLA campus; (right) UCLA’s campus map superimposed using MapCruncher.
turning it into an interactive map layer. That is, we’d like to superimpose our map onto the road and aerial photography already provided by online mapping sites, such that the two maps can be viewed together, as in Figure 1.

Map overlays have existed for as long as maps have existed, so it may seem surprising that a new tool was necessary to accomplish a seemingly well-known task. In fact, our original intent was not to create a tool, but to create a mashup using existing tools. In this section, we describe some of the hurdles we encountered and how they motivated us to build a new tool to overcome them.

3.1 Reprojection of Unknown Map Projections
The Earth is round, but maps and the computer screens that display them are flat. Maps that depict very small extents of the Earth relative to their level of detail, such as building blueprints, can make the simplifying assumption that the Earth is as flat as the map that depicts it. However, maps of larger extent cannot ignore the curvature of the Earth. A cartographer must therefore select a method to convert the position of points on the three-dimensional Earth’s surface to the two-dimensional map. The mathematical functions used for this purpose are called map projections [20].

One spatial relationship or another is lost whenever the three-dimensional Earth is projected into a two-dimensional representation. Consequently, an astonishing variety of map projections have been invented. Each projection makes different tradeoffs, typically maintaining high fidelity in some aspect of the Earth’s representation (e.g., the shape of objects) by giving up fidelity in some other aspect (e.g., apparent relative sizes of objects). Cartographers select the best projection based on a map’s intended use. Most map projections are parameterized, so that they can be fine-tuned to the location, size, and aspect ratio of the extent of the map.

For two maps to be superimposed correctly, as is our goal, they must both be drawn using the same projection. In the world of traditional GIS systems, this problem is usually easy to solve. Most spatial data comes annotated with metadata describing which projection was used to draw it, along with the projection’s parameters. This information can be used to perform a mathematically exact transformation of a map into any other projection.

For casual mashups, the situation is more difficult. The vast majority of maps available on the Web have been stripped of the metadata that describes the map projection. For maps that do have metadata, it is often in a format that cannot be automatically parsed (for example, a text file describing the projection in English). Consequently, it is nearly impossible to precisely or automatically reproject a typical map found on the Web.

This is a problem for a user who wishes to create an overlay. Most maps are not drawn using the same projection as is used by the major interactive online map services. Microsoft’s and Google’s mapping sites, for example, use the Mercator projection. (Mercator is used because it is conformal. Conformal projections do not distort features’ shapes, making it possible to overlay street maps on undistorted aerial photography.) In contrast, most other maps are not expected to be used as overlays for photographs, so they instead use one of the many projections that produce less scale distortion. It is hard to guess exactly which projection a map uses by inspection because there are so many projections. For example, the USGS¹ produces maps depicting each of the 50 United States using custom projection parameters tailored to each state.

Unfortunately, this problem is not well solved by any of the numerous tools available that aid in the production of map overlays. After a week or two of tinkering with various test maps, we concluded the existing tools were all either too simple or too complex. The simple tools were limited to linear transformations such as scaling, translation, and rotation. Our test maps did not use the Mercator projection, so the simple tools could not warp them sufficiently to produce good alignment at all points. The complex tools could perform arbitrary reprojections, but required complete specification of the projection, which was unavailable for our test maps.

MapCruncher addresses this problem using approximate reprojections. As we will see in Section 4, MapCruncher allows users to point out correspondences between the two maps, then estimates how to reproject the user’s map into Mercator without a model of the source map’s projection. Although less accurate than an exact reprojection, this design choice fills a useful niche between the low and high ends.

3.2 Management of Large Datasets
The simple, intuitive pan-and-zoom interface provided by online maps makes it easy to forget that they are providing access to immense repositories of data. Microsoft’s Virtual Earth platform has nearly 200 terabytes (on the order of 10¹⁴ bytes) of imagery. While a casual user is unlikely to ever create such a large dataset, we’ve found that even modest maps can overwhelm normal desktop image processing tools.

For example, consider the map of neighborhood bicycle routes produced by King County, Washington. Two of the authors commute to work by bicycle, so this map was of particular interest. We tried to overlay it on several interactive maps (Google Maps [8], Google Earth [7], and Microsoft Virtual Earth [16]) using previously existing tools. All of them required that we provide the overlay as a single rasterized image (e.g., a PNG).

The 2005 edition of the King County bicycle map is a 30” x 36” poster. If rendered at a zoom level large enough that its smallest features are easily readable, it is a 3-gigapixel image. Despite considerable effort, we could not find a PDF rendering program under Windows or Linux capable of producing an output image of that size. Their failure modes were diverse and often amusing. Some ran out of RAM (3GB was available). Others filled the disk with temporary files. Some simply froze the computer.

Even if we had succeeded in creating such a large image from our source PDF, other roadblocks would have awaited us. Similar limitations existed in the tools available both for registration of the image to a reference map and for cutting it into browser-compatible 256x256 pixel tiles. Our early failure in the seemingly simple task of creating a bicycle-map overlay was among our motivations to write MapCruncher.

¹ The United States Geological Survey (USGS) is the official mapping agency for the United States.
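Sections 3.1 and 3.2 note that Microsoft’s and Google’s mapping sites draw their imagery in the Mercator projection and serve it as 256x256 pixel tiles. To illustrate what that target coordinate system looks like, the Python sketch below converts a latitude and longitude to global pixel and tile coordinates at a given zoom level. It assumes the spherical Mercator tiling convention commonly used by web map services, in which the world at zoom level z is a square 256 × 2^z pixels on a side and latitudes are clipped at about ±85.05°. The function names and constants are our illustrative assumptions, not MapCruncher’s code.

```python
import math

TILE_SIZE = 256               # pixels per tile edge, as described in the text
MAX_LATITUDE = 85.05112878    # assumed clip latitude that makes the Mercator world square


def latlon_to_pixel(lat_deg: float, lon_deg: float, zoom: int) -> tuple[float, float]:
    """Project lat/lon (degrees) to global pixel coordinates at `zoom`
    using spherical Mercator; x grows eastward, y grows southward."""
    lat = max(-MAX_LATITUDE, min(MAX_LATITUDE, lat_deg))
    map_size = TILE_SIZE * (2 ** zoom)       # world width and height in pixels
    x = (lon_deg + 180.0) / 360.0            # 0 at 180 W, 1 at 180 E
    sin_lat = math.sin(math.radians(lat))
    # Mercator northing, rescaled so that y runs from 0 (north) to 1 (south).
    y = 0.5 - math.log((1 + sin_lat) / (1 - sin_lat)) / (4 * math.pi)
    return x * map_size, y * map_size


def pixel_to_tile(px: float, py: float) -> tuple[int, int]:
    """Column and row of the 256x256 tile containing a global pixel."""
    return int(px // TILE_SIZE), int(py // TILE_SIZE)


def tiles_at_zoom(zoom: int) -> int:
    """Tiles covering the whole world at `zoom`. Each extra level doubles
    both dimensions, so the count (and storage) grows by a factor of four."""
    return (2 ** zoom) ** 2
```

For example, `latlon_to_pixel(0.0, 0.0, 1)` returns `(256.0, 256.0)`, the center of the 512x512 pixel world at zoom level 1, which lies in tile `(1, 1)`.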
MapCruncher was designed with enormous output images in mind. As we will describe in Section 5, our tool uses the same strategy as the large interactive map sites: instead of producing a single image, MapCruncher renders a large number of small (256x256) image tiles. This allows browsers to navigate through large custom overlays just as they do the underlying road maps and aerial photography, efficiently downloading just the sub-images they need, on demand. In contrast, most other overlay generators require the user to download the entire overlay image before displaying any of it. This is impractical for our 3-gigapixel test map.

MapCruncher also handles large source maps gracefully, generating each 256x256 tile individually, directly from just the portion of the source map that it requires. Again, this is in contrast to other tile generators that require the entire source map to be rendered in advance, even though the image may be giga- or even tera-pixels in size.

3.3 Mashing Without Programming
In the earliest days of the Web, content production was an engineering discipline. Writing HTML is similar in some ways to computer programming. Like programming, it is inaccessible to people who do not happen to be experts in the field, that is, inaccessible to most people who want to create content. Various HTML authoring tools quickly appeared, making it easier for non-experts to write web pages without needing an intimate understanding of the underlying technology.

The situation today is similar with the creation of mashups, both geographic and otherwise. They are difficult to create without first learning JavaScript, HTML, XML, esoteric APIs, map projections, and geographic coordinate systems. Our first attempt at creating a bicycle-route mashup was slowed by the requirement that we learn many new disciplines, from web APIs to map projections to online maps’ coordinate systems and naming schemes.

One of our motivations for writing MapCruncher was to make geographic mashups accessible to non-experts, including people who would not have been able to create a mashup without it. As we will see in Section 6, MapCruncher lets beginners create point-and-click mashups, while still allowing advanced users to customize arbitrarily.

transformation. First, we ask the user to identify some landmark that can be found both on the user’s map and on the Virtual Earth map or aerial imagery; we call this identification a “correspondence.” After obtaining several correspondences, we find the coefficients of a polynomial function that best fits them. A transformation with a second-degree polynomial can look very similar to the transformation from many projections into Mercator.

“But wait!” a GIS professional might insist. “Polynomials may look similar to the right answer, but to reproject correctly, you need trigonometry. And asking for user input by pointing out map landmarks is horribly prone to error!” This is true, and users who have spatial data annotated with all the metadata required to do an exact transformation are likely to use GIS tools, not MapCruncher. While not exact, we’ve found that polynomials produce excellent results for a wide variety of maps. By analogy, the existence of AutoCAD does not obviate the value of Microsoft Paint.

In Section 4.1, we describe the process of gathering enough data from the user to reproject the user’s map. In Section 4.2, we describe how MapCruncher uses that input to produce a usable map overlay.

4.1 Georeferencing
The first step in creating an overlay with MapCruncher is specifying a number of correspondence points between the user’s map (the “source map”) and the existing road maps and aerial photography (the “reference map”). Because the reference maps are, themselves, already registered to the Earth’s coordinate system², each correspondence identifies the real latitude and longitude of a point on the source map.

MapCruncher provides a simple interface for specifying correspondence points. The MapCruncher GUI, shown in Figure 2, has two viewing panes. The source pane displays the source map, which can be panned and zoomed to arbitrary locations and zoom levels. The reference pane displays the reference map, using imagery from Microsoft Virtual Earth. The
Figure 3. Establishing a correspondence between a source map and the reference map.
Figure 5. Disagreement vector points toward likely correct location.
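Section 4 describes fitting a polynomial function to the user’s correspondences, and Sections 4.2.1 and 4.2.3 give the affine and quadratic forms, each mapping reference coordinates (r_x, r_y) to source coordinates (s_x, s_y). The paper’s own estimation procedure is the subject of its Section 4.2.5; the sketch below does not reproduce it and simply assumes ordinary least squares, solved with NumPy’s `numpy.linalg.lstsq` (which uses a singular value decomposition internally). It also sketches the two-correspondence case discussed with the rigid reprojections, where a third correspondence C is synthesized so that A, B, and C form a right isosceles triangle on each map. Placing the right angle at A, and assuming both maps’ axes have the same orientation, are our choices. All function names are hypothetical.

```python
import numpy as np


def design_matrix(ref: np.ndarray, degree: int) -> np.ndarray:
    """One row per reference point (rx, ry), one column per coefficient.

    degree=1 -> [rx, ry, 1]                     (affine form, c_i0..c_i2)
    degree=2 -> [rx^2, rx*ry, rx, ry^2, ry, 1]  (quadratic form, c_i0..c_i5)
    """
    rx, ry = ref[:, 0], ref[:, 1]
    one = np.ones_like(rx)
    if degree == 1:
        return np.column_stack([rx, ry, one])
    if degree == 2:
        return np.column_stack([rx * rx, rx * ry, rx, ry * ry, ry, one])
    raise ValueError("this sketch only handles degree 1 or 2")


def fit_reprojection(ref: np.ndarray, src: np.ndarray, degree: int) -> np.ndarray:
    """Least-squares fit of src ~ design_matrix(ref) @ coeffs.

    ref, src: (n, 2) arrays of corresponding reference and source points.
    Returns a (k, 2) array: column 0 holds the s_x coefficients, column 1 the s_y ones.
    """
    a = design_matrix(ref, degree)
    k = a.shape[1]
    if len(ref) < k:  # each correspondence gives one equation per coordinate
        raise ValueError(f"degree {degree} needs at least {k} correspondences")
    coeffs, *_ = np.linalg.lstsq(a, src, rcond=None)
    return coeffs


def apply_reprojection(coeffs: np.ndarray, ref: np.ndarray, degree: int) -> np.ndarray:
    """Map reference points to source points with fitted coefficients."""
    return design_matrix(ref, degree) @ coeffs


def residuals(coeffs: np.ndarray, ref: np.ndarray, src: np.ndarray, degree: int) -> np.ndarray:
    """Per-correspondence error vectors: fitted source point minus user-entered one."""
    return apply_reprojection(coeffs, ref, degree) - src


def synthesize_third(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Point C with a right angle at A and |AC| = |AB| (AB rotated +90 degrees)."""
    d = b - a
    return a + np.array([-d[1], d[0]])


def fit_from_two(ref2: np.ndarray, src2: np.ndarray) -> np.ndarray:
    """Two correspondences: add a synthesized third on each map, then fit affine."""
    ref3 = np.vstack([ref2, synthesize_third(ref2[0], ref2[1])])
    src3 = np.vstack([src2, synthesize_third(src2[0], src2[1])])
    return fit_reprojection(ref3, src3, degree=1)
```

With more correspondences than the minimum (three for affine, six for quadratic), the fit is overdetermined, and `residuals` reports how far each user-entered point lies from the fitted function, which could help flag a mistaken correspondence.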
The mathematically exact relationship between two maps is determined by (1) the projection of each map and (2) the parameters of that projection. The projection of the reference map and its parameters are known (in our case, Mercator). Therefore, one possible approach (which we do not employ) is to try to fit various selections of projection and parameters to the user-entered correspondence data to discover a best fit. Given the fitted model for the source map projection and the known reference projection, the reprojection function is determined.

Unfortunately, the set of projections in which source maps may be drawn is quite large, and the process of fitting parameters to each projection is diverse and involved. An alternative approach, which we use in our application, is to ignore the precise projections and instead use an approximation to model the entire class of potential reprojections.

Like a projection, an approximate reprojection is a class of functions selectable by parameters. MapCruncher includes two classes of reprojections: (1) affine reprojections, including both general affine reprojections and the restricted subclass of rigid reprojections, and (2) bivariate polynomial reprojections, specifically the subclass of quadratic reprojections. These will be discussed in the following sections.

4.2.1 Affine reprojection
The affine reprojection is a linear relationship, plus a constant offset, between the reference coordinates (r_x, r_y) and the source coordinates (s_x, s_y):

$$s_x = c_{00} r_x + c_{01} r_y + c_{02}$$
$$s_y = c_{10} r_x + c_{11} r_y + c_{12}$$

An advantage of the affine reprojection is that it has only six parameters, which can be inferred with as few as three correspondences. (Each correspondence provides two constraint equations, one in x and one in y.) In Section 4.2.5, we discuss how these parameters are estimated.

A limitation of affine reprojection is that it preserves straight lines. If the source map is in, for example, a conic projection, then exact reprojection will change straight lines in the source map into curved lines in the reference projection. Affine reprojection cannot produce this effect, and will therefore introduce errors into maps where this effect is noticeable.

solve for the affine reprojection parameters as described above. Suppose we have two correspondences, A = (A_s, A_r) and B = (B_s, B_r), where the subscripts s and r denote points on the source and reference maps, respectively. To synthesize the third correspondence, we find on each map a point C that forms a right isosceles triangle with A and B.

4.2.3 Quadratic reprojection
To accommodate maps where the constraints of affine reprojection introduce significantly visible error, we also provide polynomial reprojection, in particular the subclass of quadratic reprojections. A quadratic reprojection takes the form:

$$s_x = c_{00} r_x^2 + c_{01} r_x r_y + c_{02} r_x + c_{03} r_y^2 + c_{04} r_y + c_{05}$$
$$s_y = c_{10} r_x^2 + c_{11} r_x r_y + c_{12} r_x + c_{13} r_y^2 + c_{14} r_y + c_{15}$$

By introducing terms of higher degree than the linear terms of affine reprojection, the quadratic reprojection can better approximate an exact reprojection, including some curvature. The curvature is still not perfect, because exact reprojection generally involves trigonometric functions rather than polynomials. In practice, however, we have found that the quadratic reprojection usually suffices. For most source maps, reprojection error is dominated by sources other than the limitations of our quadratic model.

The disadvantage of quadratic reprojection relative to affine reprojection is that it has twelve parameters, so it requires six user-entered correspondence points to completely constrain them. These parameters are inferred in the same manner as those for affine reprojection, as discussed in Section 4.2.5.

4.2.4 Higher-degree polynomials
Of course, the technique used for quadratic reprojections can be extended to polynomials of higher degree. We have found in practice that quadratics are sufficient for most applications. Higher-degree polynomials might better approximate the exact
[Figure panels: boundaries in source coordinates are projected into boundaries in reference coordinates, which are used to select the tiles that contain the region of the source map.]
Figure 7. Identifying the set of tiles that cover a reprojected map.
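Section 5.1 and Figure 7 describe choosing which tiles to render: points on the source map’s boundary are pushed through the inverse of the reprojection function into reference coordinates, and those points are converted to tile coordinates. The Python sketch below is a simplified version under our own assumptions: reference coordinates are global pixel coordinates at the zoom level being rendered, the inverse reprojection is passed in as a callable, and the boundary is sampled densely so that curved edges are followed. It returns the rectangular block of tile columns and rows spanned by the reprojected boundary, which is a superset of the tiles the map actually touches. The names are hypothetical.

```python
import math
from typing import Callable, Iterable

TILE_SIZE = 256
Point = tuple[float, float]


def source_boundary(width: float, height: float, steps: int = 32) -> list[Point]:
    """Sample points along the four edges of a width x height source map."""
    pts: list[Point] = []
    for i in range(steps + 1):
        t = i / steps
        pts += [(t * width, 0.0), (t * width, height),    # top and bottom edges
                (0.0, t * height), (width, t * height)]   # left and right edges
    return pts


def tiles_covering(boundary: Iterable[Point],
                   inverse: Callable[[Point], Point]) -> tuple[range, range]:
    """Tile columns and rows spanned by the boundary after inverse reprojection."""
    ref = [inverse(p) for p in boundary]   # source coords -> reference pixels
    xs = [x for x, _ in ref]
    ys = [y for _, y in ref]
    cols = range(math.floor(min(xs) / TILE_SIZE), math.floor(max(xs) / TILE_SIZE) + 1)
    rows = range(math.floor(min(ys) / TILE_SIZE), math.floor(max(ys) / TILE_SIZE) + 1)
    return cols, rows
```

Because each additional zoom level doubles reference pixel coordinates in both dimensions, the same call would be repeated once per requested zoom level, each time with that level’s inverse function.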
[Figure panels: 1. Tile boundary transformed into source map coordinates. 2. Transformed boundary is axis-aligned to select a region of source map to sample.]
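Section 5.2 and Figure 8 describe how MapCruncher renders each output tile from its pre-image: the tile’s boundary is transformed through the reprojection function into source coordinates, an axis-aligned and slightly enlarged bounding box selects the small region of the source map to rasterize, and each of the tile’s 256x256 pixels is filled by bilinear interpolation of the four nearest source pixels. The NumPy sketch below follows those steps under assumptions of our own: `forward` is a vectorized reference-to-source function (for example, a fitted reprojection), the rasterized region is supplied as an array at one pixel per source unit (in MapCruncher it comes from the PDF renderer), and the 5% margin and edge sampling density are illustrative values. All names are hypothetical.

```python
from typing import Callable

import numpy as np

TILE_SIZE = 256
# Hypothetical signature: arrays of reference pixel coords (rx, ry) -> source coords (sx, sy).
Forward = Callable[[np.ndarray, np.ndarray], tuple[np.ndarray, np.ndarray]]


def preimage_bbox(col: int, row: int, forward: Forward,
                  margin: float = 0.05, steps: int = 16) -> tuple[int, int, int, int]:
    """Enlarged, axis-aligned source box (left, top, right, bottom) for one tile."""
    t = np.linspace(0.0, TILE_SIZE, steps + 1)
    x0, y0 = col * TILE_SIZE, row * TILE_SIZE
    # Sample the tile's four edges in reference coordinates and transform them.
    ex = np.concatenate([x0 + t, x0 + t, np.full_like(t, x0), np.full_like(t, x0 + TILE_SIZE)])
    ey = np.concatenate([np.full_like(t, y0), np.full_like(t, y0 + TILE_SIZE), y0 + t, y0 + t])
    sx, sy = forward(ex, ey)
    # Axis-align the transformed boundary, then pad it to allow for curvature.
    pad_x = margin * (sx.max() - sx.min())
    pad_y = margin * (sy.max() - sy.min())
    return (int(np.floor(sx.min() - pad_x)), int(np.floor(sy.min() - pad_y)),
            int(np.ceil(sx.max() + pad_x)), int(np.ceil(sy.max() + pad_y)))


def bilinear_sample(img: np.ndarray, sx: np.ndarray, sy: np.ndarray) -> np.ndarray:
    """Blend the four pixels surrounding each fractional position (sx, sy) in img."""
    h, w = img.shape[:2]
    sx = np.clip(sx, 0, w - 1)
    sy = np.clip(sy, 0, h - 1)
    x0 = np.clip(np.floor(sx).astype(int), 0, w - 2)
    y0 = np.clip(np.floor(sy).astype(int), 0, h - 2)
    fx, fy = sx - x0, sy - y0
    if img.ndim == 3:  # broadcast the weights over color channels
        fx, fy = fx[..., None], fy[..., None]
    top = img[y0, x0] * (1 - fx) + img[y0, x0 + 1] * fx
    bottom = img[y0 + 1, x0] * (1 - fx) + img[y0 + 1, x0 + 1] * fx
    return top * (1 - fy) + bottom * fy


def render_tile(col: int, row: int, forward: Forward,
                raster: np.ndarray, bbox: tuple[int, int, int, int]) -> np.ndarray:
    """Fill one 256x256 tile from `raster`, the rasterized bbox region.

    Assumes raster[j, i] holds the source map at source coords (left + i, top + j).
    """
    left, top = bbox[0], bbox[1]
    py, px = np.mgrid[0:TILE_SIZE, 0:TILE_SIZE]
    sx, sy = forward(col * TILE_SIZE + px, row * TILE_SIZE + py)
    return bilinear_sample(raster, sx - left, sy - top)
```

Only the region returned by `preimage_bbox` is ever rasterized, so per-tile memory stays small regardless of how large the full source map would be at that zoom level, which is the point of the pre-image approach.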
database of image tiles. Users can first select the maximum zoom level for which tiles are produced. Each additional zoom level increases the spatial resolution of tiles by a factor of two in each dimension, and therefore increases the total storage requirements by a factor of four.

5.1 Determining geographic extent of source map
The geographic extent of the source map is determined by applying the inverse of the reprojection function to the boundaries of the source map. The inverse function maps from source map coordinates to reference map coordinates, so this process produces a boundary in reference coordinates that corresponds to the boundary of the source map. The points on the reference boundary are converted into tile coordinates to select the set of tiles that contain the entire reprojected source image (see Figure 7). This tile selection process is repeated for each zoom level for which the user desires to output tiles.

5.2 Selecting region of source map to sample
In theory, the best-fit reprojection function is all that is needed to produce a complete set of rendered tiles: it allows us to find the source-map pixel that corresponds to every possible reference-map pixel. However, there are many choices in the implementation of tile rendering that can have dramatic effects on its efficiency and resource requirements.

There are two straightforward approaches by which rendering could be done, neither of which we use. First, one could use the reprojection function (along with information about the location and zoom level of the tile being rendered) to map each individual pixel’s location to a location in the source map; render the area of the source map defined by the extent of the pixel; and use the result of the rendering to assign visibility and color to the pixel. This approach is prohibitively expensive in terms of the computational cost per pixel.

A second inefficient approach is to first render the entire source map at the scale dictated by the tile set’s zoom level. Then, for each pixel in a final rendered tile, find the corresponding pixel in the enormous, rendered source map. This approach, used by many overlay tools, is computationally efficient and conceptually simple because the source map needs to be rendered only once. However, it is prohibitively memory-intensive when rendering maps at high zoom levels. This is because rasterizing a vector image such as a PDF source map requires memory proportional to the size of the raster. For many source maps, rasterizing the entire map at a high zoom level can result in a giga- or tera-pixel image.

MapCruncher’s approach is to render the pre-image of each tile one at a time. This approach is efficient in both computation and memory. For each final rendered tile to be generated, it determines the section of the source map needed to generate the tile, and renders only that part of the source map. To determine the section, the boundary of the reference tile in reference coordinates is transformed through the reprojection function to produce a boundary in the source map coordinate system (arrow 1 in Figure 8). An axis-aligned bounding box is drawn around the transformed tile boundary (as shown in the figure). The region is axis-aligned because most source map formats are amenable to sampling in such regions. The region is also slightly enlarged to account for projections with high curvature.

Once this target region is computed, we ask the underlying PDF renderer to produce a sample image of only the portion of the source map needed to render the final tile. This is memory-efficient because it only requires rasterization of small (approx. 300x300 pixel) images. Of course, at high zoom levels, these images may cover a minute portion of the source map.

MapCruncher uses a PDF renderer licensed from Foxit Software [5], which cleverly stores the list of image vectors in the PDF so that most of them can be pruned (not rendered) when viewing a tiny region, making the pre-image approach even more computationally efficient.

Finally, this small region of the rasterized source-map image is sampled to produce the final rendered tile. For each of the 256x256 pixels in the final tile, the reprojection function is used to find the four nearest pixels in the source-map image. These four pixels are combined using bilinear interpolation.

6. DEPLOYMENT
One of our guiding principles in writing MapCruncher was that it should minimize the specialized knowledge required by the user as much as practical. Therefore, it was important that MapCruncher not only create map image tiles, but automatically emit a fully working web application that gives users the instant gratification of seeing their creation come alive.

6.1 Sample Web Page
When MapCruncher renders output tiles, it also creates a sample web page that shows the user’s map layers overlaid on top of Virtual Earth’s street maps and aerial imagery. The sample page also includes a “Find…” box, allowing users to search for businesses (using Virtual Earth’s yellow pages service) and overlay pushpins right on top of their custom maps. The new “VE3D” digital globe is also supported, instantly draping the
user’s map tiles on top of a three-dimensional rendering of Earth that can be viewed from any position and angle. VE3D uses a digital elevation map that is compatible with MapCruncher tiles, so bicycle routes can actually be seen going up and over the mountains (see Figure 9).

Figure 9. Bike trails on tiles emitted by MapCruncher draped over VE3D terrain.

To some, it might seem that this sample web page is unnecessary: surely anyone who bothered to create a mashup will also bother to write their own web page to display it! By way of counterargument, consider Microsoft’s basic HTML editor, FrontPage. When a user opens FrontPage, it titles the default blank document “New Page 1”, a string that appears 6 million times in the MSN Search index as of this writing.

6.2 “Plain Old Web Server” Requirement
Another important constraint in our design was that the rendered mashup can be served by a “POWS” (Plain Old Web Server). That is, we do not depend on the availability of any special server features, such as the ability to execute CGI scripts, interpret server-side includes, or configure custom error documents. Dependence on these features would limit our audience to technical users who have administrative access to a web server.

MapCruncher requires nothing from a web server other than its most basic function: return a file if it exists and a 404 error code if it does not exist. This means that users can create public mashups even without owning a web server; they can simply upload the output directory to any public web service. This includes both beginner-oriented services such as GeoCities and more advanced offerings such as Amazon S3. In both of these examples, server-side execution and custom web configuration are not available.

6.3 Applications
MapCruncher has a wide variety of uses. Three of our favorites are described here and available on the web.

6.3.1 Pacific Northwest Bicycling Guide
Our most ambitious mashup to date is the Pacific Northwest Bicycling Guide [14], a seamless combination of bicycle route maps from 7 counties and 8 municipalities around Washington and Oregon. Overlaying bicycle maps on top of the underlying street maps is quite valuable. Bicycle maps typically do not show smaller off-trail roads, making it difficult to plan an end-to-end trip without the overlay. The seamless integration of aerial photography can also clear up ambiguities in sometimes casually drawn bicycle maps. For example, we used it to discover that a pedestrian overpass was available on a trail that the map did not clearly depict as crossing a major highway. The “Find a business…” feature of Virtual Earth also makes it easy to, say, find an ice cream shop along your route on a hot day.

6.3.2 National Park Service Maps
The United States’ National Park Service publishes maps of more than 200 National Parks in the public domain [10]. Each is annotated with a rich set of data, including hiking trails, the names of many small lakes and rivers, geological formations, etc. In contrast, vendors of the street-map data found in most online mapping sites simply depict the park as a large blank area with the park name.

Using MapCruncher, it’s easy to combine the rich annotations found in the park maps with the aerial and satellite photography provided by Virtual Earth [13]. It’s also easy to leverage Virtual Earth’s other features to produce new composite services, for example, getting driving directions from your home to the ranger station, drawn right on top of the park map.

6.3.3 Do-It-Yourself Aerial Photography
Virtual Earth and Google are both adding and updating imagery as quickly as they can; it’s a top priority for them. However, for the foreseeable future, there will always be people who want high-quality aerial photography in areas that do not yet have coverage. Previously, there was no way for users to add their own photography. MapCruncher makes this easy for the first time.

Two members of the MapCruncher team, coincidentally, are private pilots. While on a flight 4,000 feet over the small town of Forks, Washington, we had the idea of using new aerial
photography as a source image instead of a map. We circled for several minutes, taking a few snapshots out the side window with an old digital camera.

On the ground, we imported the photos into MapCruncher, using distinctive landmarks shared by both our photos and the Virtual Earth reference photos. The results were surprisingly good [12]. While seams between the images are visible, the polynomial fitting function was able to effectively orthorectify large portions of our photos. (Most of them had severe perspective distortion due to being shot at an oblique angle.)

Despite these problems, there was a dramatic increase in image quality, especially relative to the time and financial cost of our project. In May of 2006, Virtual Earth’s coverage of Forks was 1 m/pixel, 12-year-old, black-and-white USGS aerial photography; Google’s was 8 m/pixel satellite photography. After one hour in a small airplane and a few hours on the ground, we had modern, full-color, 0.5 m/pixel photography of a market so small that it’s unlikely to be re-photographed by Microsoft or Google in the near future.

7. A COMPOSABLE VIRTUAL EARTH
Most of the mashups we’ve seen to date are interesting because the whole is greater than the sum of the parts. For example, having a bicycle map integrated with a street map is more useful than either one individually. To get the most utility from mashups, it’s not enough to combine users’ maps with Virtual Earth. We also need a way to make them easily composable with each other.

Ideally, mashups will no longer be thought of as individual sites, disconnected from the rest of the world. Instead, the building blocks of mashups (the layers of rasters, points, and lines that underlie them) should be composable, interchangeable building blocks. We envision a world where mashups have more structure, so that the bicycle layer we render can easily import the Doppler weather data you’ve rendered, and can be imported into the web site that features hiking layers. If people publish their applications and the underlying data in a semantically meaningful way, a nearly infinite set of innovative and diverse applications is sure to follow.

MapCruncher tries to take a step in this direction by cleanly separating the imperative code that runs the mashup from the declarative code that describes the raster layer being imported. Specifically, each time MapCruncher renders tiles, it also describes those tiles (their geographic position, rendering depth, and so forth) in an XML file specially seeded with strings that can be found by search engines. With enough people creating MapCruncher layers, we can collectively create an enormous database of interesting data layers, all geographically registered to compatible coordinate systems and instantly searchable using existing search engines.

Who knows what kind of interesting mega-mashups might follow?

8. ACKNOWLEDGMENTS
The authors would like to extend their sincere thanks to Danyel Fisher, Steve Lombardi, Karen Luecking, Joe Schwartz, Chandu Thota, and the many testers who provided us valuable feedback.

9. REFERENCES
[1] H. Abdi. “Singular Value Decomposition (SVD) and Generalized Singular Value Decomposition (GSVD).” In N. J. Salkind (Ed.): Encyclopedia of Measurement and Statistics. Thousand Oaks, Oct 2006.
[2] chicagocrime.org, http://www.chicagocrime.org/map/.
[3] W. K. Edwards, A. LaMarca. Balancing Generality and Specificity in Document Management Systems, Interact'99.
[4] B. Ford, G. Back, G. Benson, J. Lepreau, A. Lin, O. Shivers. The Flux OSKit: A Substrate for OS and Language Research, 16th SOSP, Oct 1997.
[5] Foxit Software, http://www.foxitsoftware.com/.
[6] GeoRSS. Graphically Encoded Objects for RSS feeds, http://www.georss.org/.
[7] Google. Google Earth, http://earth.google.com/.
[8] Google. Google Maps, http://maps.google.com/.
[9] J. S. Heidemann, G. J. Popek. File-System Development with Stackable Layers, ACM TOCS 12 (1), Feb 1994.
[10] Harpers Ferry Center. National Parks Service Maps, http://www.nps.gov/carto/.
[11] housingmaps.com, http://www.housingmaps.com/.
[12] MapCruncher team. Do-It-Yourself Aerial Photography, http://research.microsoft.com/mapcruncher/Gallery/Forks/.
[13] MapCruncher team. National Park Maps, http://research.microsoft.com/mapcruncher/Gallery/NationalParks/.
[14] MapCruncher team. Pacific Northwest Bicycling Guide, http://research.microsoft.com/mapcruncher/Gallery/NWBike/.
[15] N. C. Hutchinson, L. L. Peterson. The x-Kernel: An Architecture for Implementing Network Protocols, IEEE Transactions on Software Engineering 17 (1), pp. 64-76, Jan 1991.
[16] Microsoft. Microsoft Virtual Earth, http://www.microsoft.com/virtualearth/default.mspx.
[17] Open Geospatial Consortium. Geography Markup Language, version 3.1.1.
[18] RunwayFinder, a flight planning tool for pilots, http://www.runwayfinder.com/.
[19] Seattle Bus Monster, http://www.busmonster.com/.
[20] J. Snyder. Map Projections: A Working Manual, United States Government Printing, Feb 1983.