Make the Map You Want with PROC GMAP and the Annotate Facility
Michael Eberhart, MPH, Philadelphia Department of Public Health
ABSTRACT
This paper describes how to use SAS/GIS® and PROC GMAP to create presentation-quality maps of geographic
data. Topics discussed include using U.S. Census Bureau TIGER/Line® files for geocoding address data, using
PROC GMAP and the annotate facility to display map datasets in a variety of formats, importing map files from other
software products, and assigning geocoded cases to polygons based on spatial location. Additional topics include
coordinate systems and map projections, PROC GMAP options that control appearance of maps, annotating polygon
borders, creating and combining annotate datasets, using annotate macros (%maplabel), summarizing data and
handling missing values, creating map output files using device options, loop processing with NULL datasets,
replaying graphics with high resolution, and customizing map legends with legend options and annotating.
GETTING STARTED
Maps can be an effective method for presenting data that varies geographically. Maps provide a spatial picture of the
data, and allow end-users to easily see clusters or areas of concentration. Spatial data refers to anything that can be
referenced based on its physical location, such as census tracts, zipcodes, and street addresses. Geocoding is the
process of adding spatial information to existing data based on this physical location. Address geocoding attempts to
match a street address in a SAS dataset with spatial information in a SAS spatial database. If a match is found, the
coordinates for the address location (x,y) are added to the observation. Additional information about the address
location (e.g. census tract) can also be added to the address dataset. Data points can be displayed on a map
discretely or aggregated to some geographic unit.
TIGER/LINE DATA
The U.S. Census website contains TIGER/Line® files for all counties in the United States. The term TIGER® refers
to the Topologically Integrated Geographic Encoding and Referencing system used by the U.S. Census Bureau.
Each TIGER/Line file set contains a series of data files that contain spatial information for geographic features such
as roads, rail lines and rivers, as well as boundary lines for census tracts, census blocks and counties. The data
include digital information such as location in latitude and longitude, the names and types of features, address ranges
(from-to, left-right), and relationships between features (e.g. where rails cross streets, or census blocks are contained
within census tracts). The steps to download and import TIGER/Line files are outlined in a previous paper.
Import the dataset containing the addresses to be geocoded. Create a variable that contains all of the elements of
the street address – in this example, four string variables are concatenated and compressed using the COMPBL
function to remove extra spaces. The city and state variables are added to the dataset.
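The address-building step might look like the following minimal sketch. The input dataset RAWADDR and its four component variables (HOUSE, DIR, STREET, TYPE) are hypothetical; only the COMPBL function and the names WORK.GEOCODE, ADDRESS, CITY and STATE come from the text.

```sas
/* Hypothetical raw address data; variable names are illustrative only */
data rawaddr;
   infile datalines dsd truncover;
   length house $8 dir $2 street $30 type $4;
   input house dir street type;
   datalines;
1101,,MARKET,ST
500,S,BROAD,ST
;
run;

data work.geocode;
   set rawaddr;
   length address $60 city $20 state $2;
   /* Concatenate the four components; a missing component leaves two  */
   /* adjacent blanks, which COMPBL collapses to a single blank        */
   address = compbl(strip(house) || ' ' || strip(dir) || ' ' ||
                    strip(street) || ' ' || strip(type));
   /* City and state values are assumed constant for this example */
   city  = 'PHILADELPHIA';
   state = 'PA';
run;
```

For the first record the concatenation yields two blanks between 1101 and MARKET because DIR is missing, and COMPBL reduces them to one.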
NESUG 2008 And Now, Presenting ...
Batch geocoding is initiated using the %GCBATCH macro. The macro launches the geocoding facility and supplies
a series of values that determine what gets geocoded, how addresses are matched, what spatial data is used for
matching, and how the output file is generated. While processing the address data, the geocoding facility writes a
message to the log indicating its progress. As addresses are matched, the coordinates of the address location are
added to the address data set. Additional values contained in the spatial data, such as census tract or census block,
can also be added to the address data set for matched cases. If an address cannot be matched to the spatial data
but the address includes a zipcode, the X and Y coordinates of the zipcode centroid are added instead of the exact
coordinates of the address. To enhance matching, the geocoding process converts address components to
uppercase and attempts to standardize address components such as street direction and street type values. These
standardized versions of the address are also added to the address data set. The variables M_ADDR, M_CITY,
M_STATE, M_ZIP, and M_ZIP4 refer to the values that were actually matched during the geocoding process.
The code below will launch the geocoding facility using MAPDATA.TIGER.PHILA as the spatial reference map. It will
geocode addresses in WORK.GEOCODE using ADDRESS as the street address variable (av), CITY as the city
variable (cv), STATE as the state variable (sv), and ZIP1 as the zipcode variable (zv). It will recreate (newdata=yes)
the address dataset and include TRACT as an additional variable (pv).
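A call matching this description might be written as in the sketch below. The AV=, CV=, SV=, ZV=, PV=, NEWDATA= values and the map name are taken from the text; the GLIB= and ZIPD= values are assumptions, as is the MAPDATA libref already being assigned. In the SAS/GIS documentation the macro call is followed by a statement that runs the batch geocoding SCL entry, so that statement is included here as something to verify against your release.

```sas
/* Sketch only: GLIB= and ZIPD= are assumed values, not given in the text. */
/* The MAPDATA libref is assumed to point at the imported TIGER/Line data. */
%gcbatch(glib=mapdata,               /* library with the spatial data (assumed) */
         zipd=sashelp.zipcode,       /* ZIP code centroid dataset (assumed)     */
         geod=work.geocode,          /* address dataset to geocode              */
         av=address,                 /* street address variable                 */
         cv=city,                    /* city variable                           */
         sv=state,                   /* state variable                          */
         zv=zip1,                    /* ZIP code variable                       */
         pv=tract,                   /* additional variable to add              */
         mname=mapdata.tiger.phila,  /* spatial reference map                   */
         newdata=yes);               /* recreate the address dataset            */

/* Run the batch geocoding program (verify this step for your SAS release) */
dm 'af cat=sashelp.gis.geocodeb.scl; run;';
```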
EXPORT MAP
In order to display maps created in SAS/GIS using SAS/GRAPH, a map file must be exported into a traditional SAS
dataset that can be referenced by PROC GMAP. To do this, SAS provides an experimental sample program called
the GIS Exporter. The program and documentation can be downloaded from:
http://support.sas.com/rnd/datavisualization/mapsonline/html/tools.html
Select the GIS2SAS Exporter – 13jul04 Update (Request Download) link and follow the steps to download. This
download requires a username/password login. As stated in the program code, this is an experimental set of routines
for use only with Version 8 or above. The files are provided by SAS “as is” without warranty or support of any kind.
Once the files are downloaded, the GISEXPORT.CPT file must be taken out of transport format using PROC
CIMPORT.
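A minimal sketch of the import step follows. The directory and the GISEXP libref are hypothetical; substitute the location where the transport file was saved.

```sas
/* Hypothetical locations: adjust the path and the target library */
libname gisexp 'c:\nesug08';

/* Restore the contents of the transport file into the GISEXP library */
proc cimport infile='c:\nesug08\gisexport.cpt' library=gisexp;
run;
```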
Before invoking the GIS exporter, a series of macro variables must be assigned to define parameters of the export.
When the GIS Exporter completes, it generates a graph file using PROC GMAP. This graph uses the exported SAS
dataset (in this case a map of census tract polygons) and a very complex annotate dataset generated by the
exporter. The annotate dataset draws lines for the other layers in the original map. Creating a graph file without the
annotate dataset will allow you to see the exported map of census tracts.
The code and output for a simple choropleth map of census tracts are found below. However, you will notice that the
map appears different than it did in SAS/GIS. The reason for the difference is that the map is not projected. The
map dataset created by the GIS exporter uses latitude and longitude for its spatial values. Latitude and longitude are
coordinates for a spherical surface. Map projections convert these values to present spatial data with less distortion.
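An unprojected map of this kind can be drawn as sketched here. The tiny TRACT dataset (two square "tracts" with coordinates in degrees) is a hypothetical stand-in for the dataset produced by the GIS Exporter; the GMAP statements follow the pattern used elsewhere in this paper.

```sas
/* Hypothetical stand-in for the exported map: two square tracts,   */
/* X = longitude and Y = latitude in degrees (west is negative)     */
data tract;
   input tract $ x y;
   datalines;
0001 -75.20 39.90
0001 -75.20 40.00
0001 -75.10 40.00
0001 -75.10 39.90
0002 -75.10 39.90
0002 -75.10 40.00
0002 -75.00 40.00
0002 -75.00 39.90
;
run;

/* Empty fill so that only the tract outlines are drawn */
pattern1 value=empty;

proc gmap data=tract map=tract;
   id tract;
   choro tract / coutline=gray nolegend;
run;
quit;
```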
MAP PROJECTION
Map projections allow a three-dimensional sphere to be displayed on a flat surface. All map projections contain some
distortion, but using the right projection can help limit the amount of visual distortion. Map projections are not critical
when displaying maps that cover only a small area. However, larger areas will have more distortion, and may even
appear backwards if not projected. Projection is critical for any map that crosses the equator or meridian. The
GPROJECT procedure converts latitude and longitude into Cartesian coordinates which allows you to project maps in
such a way as to minimize distortion and maximize the display area. For most maps of the US, the default Albers
projection should suffice.
The GPROJECT procedure requires an input dataset (data=), an output dataset (out=) and an ID statement. The ID
statement refers to a variable in the dataset that identifies a unit area, and each unit area is evaluated separately.
The DEGREE option specifies that the latitude and longitude are in degrees (the default is radians), and the
EASTLONG option specifies that longitude values increase to the east.
The code below creates a projected output map dataset called PROJTRACT and creates a graph of the projected
map using PROC GMAP.
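The projection step can be sketched as follows. The input dataset TRACT is a hypothetical two-tract map in degrees; the PROJTRACT name, the DEGREE and EASTLONG options and the ID statement are those described in the text.

```sas
/* Hypothetical unprojected map: X = longitude, Y = latitude in degrees */
data tract;
   input tract $ x y;
   datalines;
0001 -75.20 39.90
0001 -75.20 40.00
0001 -75.10 40.00
0001 -75.10 39.90
0002 -75.10 39.90
0002 -75.10 40.00
0002 -75.00 40.00
0002 -75.00 39.90
;
run;

/* Project with the default (Albers) projection */
proc gproject data=tract out=projtract degree eastlong;
   id tract;
run;

pattern1 value=empty;

/* Draw the projected map */
proc gmap data=projtract map=projtract;
   id tract;
   choro tract / coutline=gray nolegend;
run;
quit;
```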
Annotate Variables:
XSYS and YSYS indicate which coordinate system to use. In this example, coordinate system 2 is used, which
places x,y within the data area using absolute data values. For more information regarding coordinate
systems, refer to the SAS help tool.
when indicates when annotate graphics are drawn: ‘a’ indicates that annotate graphics are drawn after the
procedure output and ‘b’ indicates that they are drawn before the procedure output.
position indicates where a text string will be drawn in relation to the x,y coordinates. A value of ‘E’ indicates
the text should be centered one-half cell below the position of the x,y coordinate. The default is ‘5’
(centered).
style applies to the ‘label’ function and indicates the text font.
function indicates the action to be taken. Options include MOVE, LABEL, BAR, and PIE (as well as several
others).
color and size indicate the color and size of the text (or the BAR, PIE, etc.). The size is interpreted based on
the function; for LABEL, size represents the height of the text.
/*annotate dataset */
Because the annotate data must be projected using the same projection criteria as the map, the GPROJECT
procedure is run on the annotate dataset as well. The variable segment=1 is created so that all of the elements are
evaluated together, just as was done in the map projection.
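One way to guarantee that the map and the annotate data share the same projection is sketched below. It is a commonly used alternative to running GPROJECT on each dataset separately, not necessarily the code used for this paper: the two datasets are stacked, projected in a single step, and split again. The TRACT and CASES datasets, the ANNO_FLAG variable and the choice of a PIE marker are all hypothetical; the output names PROJTRACT and ANNOP match the GMAP step in this paper.

```sas
/* Hypothetical map in degrees (two square tracts) */
data tract;
   input tract $ x y;
   datalines;
0001 -75.20 39.90
0001 -75.20 40.00
0001 -75.10 40.00
0001 -75.10 39.90
0002 -75.10 39.90
0002 -75.10 40.00
0002 -75.00 40.00
0002 -75.00 39.90
;
run;

/* Hypothetical geocoded cases: X = longitude, Y = latitude in degrees */
data cases;
   input x y;
   datalines;
-75.15 39.95
-75.05 39.93
;
run;

/* One annotate observation per case: a small filled circle */
data anno;
   length function color style $8 xsys ysys hsys when $1;
   set cases;
   xsys = '2'; ysys = '2';     /* absolute data values             */
   hsys = '3';                 /* size as % of the graphics area   */
   when = 'a';                 /* draw after the map               */
   function = 'pie'; style = 'psolid'; color = 'red';
   rotate = 360;               /* full circle                      */
   size = 0.7;                 /* radius                           */
   segment = 1;
   anno_flag = 1;              /* marks annotate rows for the split */
run;

/* Stack map and annotate rows, then project them together */
data combined;
   set tract anno;
run;

proc gproject data=combined out=combinedp degree eastlong dupok;
   id tract;
run;

/* Split the projected rows back into map and annotate datasets */
data projtract annop;
   set combinedp;
   if anno_flag = 1 then output annop;
   else output projtract;
run;
```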
pattern1 value=empty;
proc gmap data=projtract map=projtract anno=annop;
id tract;
choro tract /
coutline=gray
nolegend
;
run;
quit;
function='image'; style='fit';imgpath='c:\nesug08\n_arrow.bmp';
x=10;y=16;output;
run;
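The fragment above is the end of a DATA step that places a north arrow image. A complete version might look like the sketch below: the IMAGE function draws the file between the current position, set by a preceding MOVE, and its own x,y. The dataset name ANNO_ARROW, the coordinate system ('3', percent of the graphics output area) and the MOVE corner are assumptions; the image path and the x=10, y=16 corner are from the fragment.

```sas
data anno_arrow;
   length function style $8 imgpath $200 xsys ysys when $1;
   xsys = '3'; ysys = '3';   /* percent of graphics output area (assumed) */
   when = 'a';

   /* MOVE sets one corner of the image box (corner values are assumed) */
   function = 'move'; x = 5; y = 6; output;

   /* IMAGE draws from the current position to the opposite corner */
   function = 'image'; style = 'fit';
   imgpath = 'c:\nesug08\n_arrow.bmp';
   x = 10; y = 16; output;
run;
```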
LEGEND - A legend is requested in PROC GMAP; however, most of the legend elements are added using
the annotate facility. The GMAP legend is therefore positioned to leave space for annotating legend values for the
geocoded addresses.
down=3 defines the number of rows for the legend entries (the two options across= and down= allow for
great flexibility in how a legend is displayed)
position=(bottom right inside) places the legend at the bottom of the graph, justified right, inside of the graph output area
mode=share places the legend in the procedure output area and allows other graphic elements to share the same
space (other options include RESERVE and PROTECT)
offset=(-2cm,) moves the legend 2cm to the left of the default position for bottom/right/inside (the offset
option allows you to place the legend anywhere you want)
value= sets the height, font and text of the legend value(s)
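Put together, the options above could be written as a single LEGEND statement, as in this sketch. The across= value, the label text, and the heights, fonts and value strings are hypothetical placeholders; the remaining options are those listed above.

```sas
/* Sketch of a LEGEND statement combining the options described above. */
/* across=1 and all text, font and height values are assumed.          */
legend1 across=1 down=3
        position=(bottom right inside)
        mode=share
        offset=(-2cm,)
        label=(height=1.5 font=swissb 'Cases')
        value=(height=1.2 font=swiss 'Low' 'Medium' 'High');

/* Reference it from the CHORO statement with the option: legend=legend1 */
```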
IMPORT SHAPEFILES
If you want to present data at some other geographic unit, you will need a traditional map dataset drawn to that unit.
Census tracts (and census blocks) as well as county boundaries are available with the TIGER/Line data, and SAS
provides map datasets for zipcodes. But users often want to see data using boundaries that are specific to their
needs. Other GIS software programs allow users to create geographic boundary layers. Layers saved as ESRI
shapefiles can be imported into SAS using PROC MAPIMPORT.
The two required arguments in PROC MAPIMPORT are the name of the output dataset (OUT=) and the complete
pathname of the input shapefile (DATAFILE=). By default, SAS reads and converts all variables in the input data;
however, the INCLUDE and EXCLUDE options can be used to limit which variables are read.
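A minimal import using only the two required arguments is sketched below. The shapefile path is hypothetical; the output name SASUSER.COUNCIL matches the dataset used later in this paper.

```sas
/* Hypothetical shapefile path; adjust to the location of the .shp file */
proc mapimport datafile='c:\nesug08\council.shp'
               out=sasuser.council;
run;

/* List the variables that were read from the shapefile */
proc contents data=sasuser.council;
run;
```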
LABELLING POLYGONS
The %MAPLABEL macro creates an annotate dataset that can be used to label all or some of the polygons in the
map. Prior to running any of the annotate macros provided by SAS, you must first run the %ANNOMAC macro.
Upon completion, a message in the log indicates that annotate macros are now available. The %MAPLABEL macro
has several arguments, and depending on how you want the polygons labeled, you may need to create a new
variable. The output annotate dataset will place the label at the centroid of the polygon.
%annomac;
proc sort data=sasuser.council;by dist_num;run;
%maplabel(sasuser.council,sasuser.council,anno1,dist_num,dist_num,font=swissb,
color=red, size=3);
You can customize the label by creating one or more new variables, or by creating more than one %maplabel output
file and combining the results. For example, to label the polygons with the text “Dist.” on one line and the district
number below it, create two annotate datasets and use the annotate variable ‘position’ to place one label below the
other. The position variable determines where text is placed in relation to the location defined by x,y variables.
Position ‘5’ is centered on the exact location. Position ‘8’ is centered one cell below the location.
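The two-line label described above might be built as in the following sketch. The COUNCIL and LABELS datasets, the DIST_TXT variable and the ANNO2 and ANNOLAB names are hypothetical; the %maplabel arguments follow the call shown earlier.

```sas
%annomac;

/* Hypothetical map: two square districts in projected units */
data council;
   input dist_num $ x y;
   segment = 1;
   datalines;
1 0 0
1 0 1
1 1 1
1 1 0
2 1 0
2 1 1
2 2 1
2 2 0
;
run;

/* One row per district, with a constant prefix for the first label line */
data labels;
   input dist_num $;
   dist_txt = 'Dist.';
   datalines;
1
2
;
run;

proc sort data=council; by dist_num; run;
proc sort data=labels;  by dist_num; run;

/* First annotate dataset: the prefix; second: the district number */
%maplabel(council, labels, anno1, dist_txt, dist_num, font=swissb, color=red, size=3);
%maplabel(council, labels, anno2, dist_num, dist_num, font=swissb, color=red, size=3);

/* Keep the prefix centered on the centroid; place the number one cell below */
data annolab;
   set anno1(in=top) anno2;
   if top then position = '5';
   else position = '8';
run;
```

The combined dataset ANNOLAB can then be supplied to PROC GMAP through the anno= option.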
ANNOTATE POLYGONS
To create an annotate dataset that draws polygon boundaries, first use PROC GREMOVE to remove the observations in
the traditional map dataset that are not part of the polygon boundaries of interest. The GREMOVE procedure combines unit
areas defined in a map data set into larger unit areas by removing shared borders between the original unit areas.
Then use the data step to create an annotate dataset that has functions to start polygons and continue polygons as
needed.
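The two steps can be sketched as follows. The PROJTRACT data, the WARD grouping variable and the output names WARDMAP and ANNO_WARD are hypothetical; the POLY and POLYCONT annotate functions start and continue each polygon.

```sas
/* Hypothetical projected tract map with a WARD variable grouping tracts */
data projtract;
   input ward $ tract $ x y;
   segment = 1;
   datalines;
A 0001 0 0
A 0001 0 1
A 0001 1 1
A 0001 1 0
A 0002 1 0
A 0002 1 1
A 0002 2 1
A 0002 2 0
;
run;

proc sort data=projtract; by ward; run;

/* Remove the borders shared by tracts within each ward */
proc gremove data=projtract out=wardmap;
   by ward;      /* new, larger unit area */
   id tract;     /* original unit area    */
run;

/* Start a polygon at the first point of each segment, then continue it */
data anno_ward;
   length function color style $8 xsys ysys when $1;
   set wardmap;
   by ward segment;
   xsys = '2'; ysys = '2'; when = 'a';
   color = 'black'; style = 'empty'; size = 2;
   if first.segment then function = 'poly';
   else function = 'polycont';
run;
```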
/*Call macro for each district, providing ward and 4 boundary values*/
/*Executed once for each unique value of voting ward*/
data _null_;
   set assign;
   by vote00;
   call execute('%assign('||vote00||','||xmin||','||ymin||','||xmax||','||ymax||')');
run;
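For readers who want to try the CALL EXECUTE pattern on its own, the sketch below builds a small ASSIGN dataset holding one bounding box per ward and calls a stub macro once per row. The WARDMAP data and the body of %assign are hypothetical; the stub only writes its arguments to the log and does not reproduce the assignment logic used in this paper.

```sas
/* Hypothetical projected ward boundaries */
data wardmap;
   input vote00 $ x y;
   datalines;
01 0 0
01 0 1
01 1 1
01 1 0
02 1 0
02 1 2
02 3 2
02 3 0
;
run;

proc sort data=wardmap; by vote00; run;

/* One row per ward with the four bounding-box values */
proc means data=wardmap noprint;
   by vote00;
   var x y;
   output out=assign(drop=_type_ _freq_)
          min(x y)=xmin ymin max(x y)=xmax ymax;
run;

/* Stub macro: stands in for the real assignment logic */
%macro assign(ward, xmin, ymin, xmax, ymax);
   %put NOTE: ward=&ward box=(&xmin, &ymin) to (&xmax, &ymax);
%mend assign;

/* Call the macro once for each ward */
data _null_;
   set assign;
   call execute(cats('%assign(', vote00, ',', xmin, ',', ymin, ',',
                     xmax, ',', ymax, ')'));
run;
```

A bounding box is a cheap first filter: a case whose coordinates fall outside a ward's box cannot lie inside that ward, so a full point-in-polygon check is only needed for the remaining candidates.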
The filename statement is used to create a fileref and an output file pathname. The fileref ‘maps’ is used in the
gsfname option to direct the output of the procedure. The device is set to BMP, just as it was when the graphs were
created. Graphics options are used to improve the resolution of the output graph. Without the ‘xpixels’ and ‘ypixels’
options, xpixels=584 and ypixels=403 for a final resolution of 96 pixels per inch. Changing xpixels and ypixels to
3600 and 2400, respectively, increases the resolution by a factor of approximately 6 (3600/584 ≈ 6.2 and
2400/403 ≈ 6.0) and creates an output file at approximately 600 pixels per inch. The ‘lfactor=6’ option increases
the line thickness by the same factor so that the lines do not disappear when the resolution is increased.
/*Export map*/
filename maps 'e:\nesug08\map_poly_assign.bmp';
goptions xpixels=3600 ypixels=2400 device=bmp gsfname=maps lfactor=6;
proc greplay igout=work.gseg tc=sashelp.templt
template=whole nofs;
treplay 1:gmap;
run;
quit;
/*ACTIVEX GMAP*/
goptions reset=all device=activex;
filename odsout 'c:\nesug08';
ods html path=odsout
file='council activex.html';
legend1 label=(f=swissb j=c 'Cases') across=1 down=4 frame position=(bottom
right inside);
title h=2 f=swissb 'Cases by Council District';
proc gmap data=cdist
map=sasuser.council
;
id dist_num;
choro _freq_ /
levels=4
legend=legend1
;
run;
quit;
ods html close;
ods listing;
REFERENCES
Eberhart, M. “Geocoding and PROC GMAP - Tools for Presenting Spatial Data”, available online at
http://www.nesug.org/proceedings/nesug07/hw/hw05.pdf
Odem, E. and Massengill, D. “Cheap Geocoding: SAS/GIS® and Free TIGER® Data”, available online at
http://support.sas.com/rnd/papers/sugi30/CheapGeocoding.pdf
SAS Institute Inc., SAS 9.1.3 Help and Documentation, Cary, NC: SAS Institute Inc., 2000-2004.
CONTACT INFORMATION
Your comments and questions are valued and encouraged. Contact the author at:
Michael Eberhart, MPH
Philadelphia Department of Public Health
1101 Market Street
8th Floor
Philadelphia, PA 19107
Work Phone: 215-685-4772
Email: [email protected]
SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS
Institute Inc. in the USA and other countries. ® indicates USA registration.
Other brand and product names are trademarks of their respective companies.