SAS Visual Statistics 8.1: The New Self-Service Easy Analytics Experience
From SAS Global Forum 2016 Proceedings, by Xiangxiang Meng, Cheryl LeSaint, and
Don Chapman.
Create Your First Graph: Visual Data Exploration with SAS ODS Graphics Designer
Excerpt from Chapter 3 of SAS ODS Graphics Designer by Example: A Visual Guide to Creating
Graphs Interactively, by Sanjay Matange and Jeanette Bottitta.
Visit sas.com/books
for additional books and resources.
SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration.
Other brand and product names are trademarks of their respective companies. © 2017 SAS Institute Inc. All rights reserved. M1597709 US.0317
About This Book
Topics covered in this free e-book illustrate the power of the SAS software tools that are available
for data visualization, highlighting a variety of domains, including infographics, geomapping, and
clinical graphs for the health and life sciences.
For many more helpful resources, please visit support.sas.com and support.sas.com/books.
Do you have questions about a SAS Press book that you are reading? Contact the author through
saspress@sas.com or https://support.sas.com/author_feedback.
SAS has many resources to help you find answers and expand your knowledge. If you need
additional help, see our list of resources: https://support.sas.com/publishing.
Foreword
Sometimes a picture is worth a thousand words, especially when you need to present complex
information in an easily consumable form. In the past, cavemen painted walls, the Egyptians
carved hieroglyphs, printers used etching and engraving, and draftsmen created technical
drawings, all to visually convey information.
These days, the amount of information has increased exponentially, and luckily we have
computers to help us create our pictures, or as we data scientists like to call them, visualizations.
Visualization encompasses all the graphical techniques that can be used to better understand your
data and the results of your analyses. Visualization includes things like graphs and maps, and even
a text table with visual cues such as colored traffic lighting.
SAS software provides many different techniques to visualize your data, and several excellent
books have been written to demonstrate how to use these techniques. We have selected chapters
from SAS Press books and relevant SAS Global Forum papers to introduce you to the topics and
let you sample what each of the authors has to offer. If any of these chapters pique your interest,
then those will probably be good books to add to your personal library:
1. I-Sah Hsieh and Alice Kassens, "Big Data and the Stories It Can Tell," from Exploring Big
Data with SAS Visual Analytics: Visualizing Trade Data (anticipated Fall 2017; sign
up to be notified when it is available)
Big data is everywhere. To many, defining big data is not the important issue. Rather, it
is how we use it. Structured and unstructured data can tell stories of fraud, need, and
want with the right technologies and analytic tools. In this excerpt, we explore a few
examples and scenarios to illustrate the storytelling process.
2. Tricia Aanderud, Ryan Kumpfmiller, and Rob Collum, "The Where of Data," from An
Introduction to SAS Visual Analytics: How to Explore Numbers, Design Reports, and
Gain Insight into Your Data
Traditional business intelligence systems have focused on answering the who, what, and
when questions, but organizations often need to know the where of data as well. SAS
Visual Analytics makes it easy to plot geospatial data, which can add a completely new
element to your data visualizations and analysis.
3. Travis Murphy, "Infographics Powered by SAS Visual Analytics and SAS Office
Analytics"
Infographics are a representation of information in a graphic format that is designed to
make the data easily understandable at a glance, without requiring deep
knowledge of the data. Because of the amount of data available today, more infographics
are being created to communicate the information and insight from all available data,
both in the boardroom and on social media. This excerpt shows you how to create
information graphics that can be printed, shared, and dynamically explored with objects
and data from SAS Visual Analytics.
4. Xiangxiang Meng, Cheryl LeSaint, and Don Chapman, "SAS Visual Statistics 8.1: The
New Self-Service Easy Analytics Experience"
In today's Business Intelligence world, self-service is a prerequisite because it enables an
everyday knowledge worker to explore data and personalize business reports without
being tech-savvy. The new release of SAS Visual Statistics introduces an HTML5-based,
easy-to-use user interface that combines statistical modeling, business reporting, and
mobile sharing into a one-stop self-service shop. The backbone analytic server of SAS
Visual Statistics is also updated, enabling an end user to analyze data of various sizes in
the cloud. This excerpt illustrates this new self-service modeling experience in SAS
Visual Statistics using telecom churn data, including the steps of identifying distinct user
subgroups using a decision tree; building and tuning regression models; designing business
reports for customer churn; and sharing the final modeling outcome on a mobile device.
5. Sanjay Matange and Jeanette Bottitta, "Create Your First Graph: Visual Data
Exploration with SAS ODS Graphics Designer," from SAS ODS Graphics Designer by
Example: A Visual Guide to Creating Graphs Interactively
The SAS ODS Graphics Designer is an interactive drag-and-drop feature that you can
use to create many graphs, including histograms, box plots, scatter plot matrices,
classification panels, and more. You can render your graph in batch with new data and
output the results to any open ODS destination, or view the generated Graph Template
Language (GTL) code as a leg-up to GTL programming. The book takes you step-by-
step through the features of the designer, providing you with examples of graphs that are
commonly used for the analysis of data in the health care, life sciences, and finance
industries.
6. Sanjay Matange, "Clinical Graphs Using the SAS 9.4 SGPLOT Procedure," from Clinical
Graphs Using SAS
Clinical graphs often display the data in one cell along with derived statistics and other
details that aid in the decoding of the information in the graph. Most of these single-cell
graphs can be created using the SGPLOT procedure. With SAS 9.4, the SGPLOT
procedure supports some new and useful features that simplify the creation of such
graphs. The goal of this excerpt is to cover in detail the creation of some commonly used
clinical graphs using SAS 9.4. The chapter provides not only code that you can use
directly for such graphs, but also ideas on how you can use or combine plot
statements to create your own custom graph.
7. Warren Kuhfeld, "Customizing the Kaplan-Meier Survival Plot"
Heavily used by pharmaceutical companies and other health care researchers, the
Kaplan-Meier plot may be the most important plot in SAS/STAT. In this excerpt, I
provide a technical customization method for the Kaplan-Meier plot to display patient
survival over time for one or more groups of patients.
We hope these selections give you a better picture of the many tools that are available to
visualize your data.
Robert Allison
The Graph Guy
Robert Allison has worked at SAS for over 20 years, and is perhaps the
foremost expert in using SAS/GRAPH to create custom graphs. Robert has
shared his expertise by presenting papers at several conferences such as
SUGI, SAS Global Forum, and SESUG. He also wrote the book
SAS/GRAPH: Beyond the Basics, and he writes blog posts demonstrating
how SAS can be used to visually analyze real-world data.
Robert received his PhD and MS degrees from North Carolina State
University. During his PhD program, he used SAS software to build a data
warehouse and visualization system for textile and apparel-related data.
Big Data and the Stories It Can Tell
I-Sah Hsieh, Alice Louise Kassens
Excerpt from upcoming title Exploring Big Data with SAS Visual Analytics: Visualizing Trade Data,
anticipated Fall 2017. Sign up for our new book notice and receive an email when this book is
available for purchase.
Big data is an amalgamation of information from a variety of sources. Specific skills and powerful
technology are required to properly handle and use big data to its fullest potential. This sample chapter
from Exploring Big Data with SAS Visual Analytics: Visualizing Trade Data shows the characteristics of
big data and provides examples of it from a variety of industries. To contextualize the use of big data in
these industries, this sample chapter provides specific examples of businesses harnessing the power of big
data to find the stories within it and using those stories to cut costs, save lives, and reduce product delivery
time. A drawback to many methods of big data analytics is the steep learning curve. An
alternative, visual analytics, is introduced. The remainder of the book explores an application of SAS
Visual Analytics to divine stories from the millions of rows of data within the UN Comtrade data set, using
the SAS Visual Analytics UN Comtrade platform.
support.sas.com/hsieh
Alice Louise Kassens, PhD, is the John S. Shannon Professor of Economics at Roanoke
College in Salem, Virginia, where she teaches econometrics, labor economics, health
economics, and principles of macroeconomics. Kassens incorporates the SAS Visual
Analytics UN Comtrade platform into her macroeconomics course. Additionally, Kassens
started the first undergraduate SAS Joint Certificate Program at Roanoke College and
teaches all courses required for the certificate. Dr. Kassens earned a BA in economics and
history at the College of William and Mary and a PhD in economics from North Carolina
State University. She taught at Washington and Lee University prior to taking her position
at Roanoke College. She also serves on the Virginia Governor's Joint Advisory Board of Economists.
support.sas.com/kassens
Chapter 2: Big Data and the Stories It Can Tell
What Is Big Data?
The 3Vs
Examples of Big Data
Finding the Stories Within
Monitoring the Spread of Disease
Wanting to Improve Speed of Delivery
Methods of Analysis
Data Preparation
Data Integration
Summary Statistics
Statistical Modeling
Machine Learning
An Alternative: Visual Analytics
UN Comtrade Data and Visual Analytics
References
What Is Big Data?
Over time, the definition of big data changes as data becomes bigger and begets new challenges and opportunities.
The first documented use of the term big data was in a 1997 paper by NASA scientists who were describing the
difficulties of using the enormous amount of information created by supercomputers (Cox and Ellsworth, 1997).
Incorporating some of the challenges of working with big data, Gartner defines it as "high volume, high velocity,
and/or high variety information assets that require new forms of processing to enable enhanced decision making, insight
discovery and process optimization" (Gartner). This definition is an abridged version of a 2001 research note by Laney
(Laney, 2001).
In sum, big data is an amalgamation of information from a variety of sources. Handling the data and using it are both an
art and a challenge that require specific skills and technology. Both the data and the tools to gather, store, and analyze it
are constantly evolving.
The 3Vs
Volume, velocity, and variety are three terms, collectively referred to as the 3Vs, used to label the challenges and
nature of big data. Volume refers to the amount of data collected from many sources including social media,
consumer purchases, Internet searches, financial transactions, and machine-to-machine interactions. Currently, an
estimated 2.5 quintillion bytes of data is created each day: 2,500,000,000,000,000,000 bytes. That is enough bytes
to fill 2.5 trillion novels, which stacked on top of one another would measure 33.3 billion feet. That is the distance
from the Earth to the Moon, times 26! Now that is BIG.
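A few lines of arithmetic can sanity-check the volume figures above. The Python sketch below derives the implied size of one novel and how many Earth-Moon distances the stack covers. The average Earth-Moon distance of about 238,900 miles is our assumption; the text does not give it.

```python
# Back-of-the-envelope check of the volume figures quoted in the text.
BYTES_PER_DAY = 2.5e18   # 2.5 quintillion bytes created each day (from the text)
NOVELS = 2.5e12          # 2.5 trillion novels (from the text)
STACK_FEET = 33.3e9      # height of the stacked novels, in feet (from the text)

# Assumption (not from the text): average Earth-Moon distance of ~238,900 miles.
EARTH_MOON_FEET = 238_900 * 5_280

bytes_per_novel = BYTES_PER_DAY / NOVELS        # implied size of one novel
moon_multiples = STACK_FEET / EARTH_MOON_FEET   # stack height in Earth-Moon distances

print(f"Implied size of one novel: {bytes_per_novel:,.0f} bytes")
print(f"Stack height: {moon_multiples:.1f} Earth-Moon distances")
```

Under this assumption, each novel works out to about 1 MB and the stack comes to roughly 26 Earth-Moon distances, which agrees with the figures in the text.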
Figure 1. 3Vs
Velocity describes the rate at which data is created, stored, and processed. Data is generated every second. Each
social media entry, including video and photos, is data. Every time you use a credit card or loyalty card, information
is created, stored, and processed.
Variety references the different forms of data. Data can be structured or unstructured. Most data is unstructured and
is not easily searchable. Examples of unstructured data are email, tweets, photographs, videos, and purchase
information. Structured data resides in fixed fields and is thus easily searched and analyzed. Data stored in a
spreadsheet is an example of structured data.
Some groups add other dimensions to the big data definition, such as veracity, variability, and complexity (4Vs and
a C). Regardless of acronym or number of dimensions, it is clear that big data is a jumble of quickly growing bits
and bytes of information.
Examples of Big Data
Retail
Retailers sell goods and services to the public. Walmart Stores, Inc., Kroger Company, Amazon.com Inc., The
Home Depot, Inc., and Target Corporation are among the largest retail companies in the world. Consider the data
generated, collected, and stored by such establishments: consumer data (from POS systems, website trackers, social
media, and call center logs), inventories, prices, and employee information and observations.
Manufacturing
Manufacturers create goods using machinery, typically on a large scale. The top manufacturers in the world include
Toyota Motor Corporation, Samsung Electronics Company, Ltd., General Electric, and Apple. Data types that are
important to these companies include production quantities, quality control, and productivity of workers.
Finance
Financial services are those dealing with the management of money including those offered by banks and insurance
companies. Berkshire Hathaway, Inc., American Express Company, BNP Paribas, and Banco Santander are some of
the largest financial services companies in the world. Data important to these firms includes customer call records,
claims records, and inputs for stress tests.
Health Care
The health care industry provides goods and services to treat patients. The industry consists of a variety of markets
such as physician, nurse, insurance, hospital, and pharmaceutical. In the private sector, the largest health care
companies include McKesson Corporation, UnitedHealth Group, Inc., Express Scripts Holding Company, and
Cardinal Health, Inc. Crucial data covers the quality, intensity, quantity, and costs of services and comes from both
patients and providers in a variety of forms (structured and unstructured).
Government
Government agencies (local, state, and federal in the United States) purchase and provide a variety of goods and
services. In the aggregate, these agencies collect and consider a large amount of data concerning trade, individual
income, and public health, to name a few.
Education
Public and private educational institutions and firms provide educational services. Data used includes applicant and
employee information, financial aid dollars, and operating expenses.
Transportation
Transportation firms provide transportation services for cargo and passengers. United Parcel Service Inc., Delta Air
Lines Inc., and Union Pacific Railroad are among the world's largest transportation companies. Essential data for
these firms includes passengers, traffic and routes, cargo tonnage, and weather.
Methods of Analysis
In the previous section, we explained the notion of big data and gave examples of both the data itself and the stories that
it can tell. How do we get from the data to the storytelling? It is a learned art, much like that so carefully mastered by the
likes of Shakespeare and Twain. Telling stories with big data requires special analytic skills and technology and a touch
(or more) of creativity. In this section, we will briefly and simply cover popular methods of analysis used today by data
scientists like those at UPS. We will assume that you have identified your research question or goal.
Data Preparation
The most time-consuming task of analyzing data is frequently the preparation process. We briefly discuss that
process here and then move on to a discussion of analysis methods. First, the data must be gathered. Sometimes the
data must be painstakingly collected by the researcher from historical documents, household surveys, or clinical
trials. In many cases, the data is collected elsewhere by machines, other humans, and computer programs.
Data comes from a variety of sources, as mentioned earlier, and once collected, it almost always requires some
cleaning prior to analysis. Data cleaning describes the detection and removal of errors and inconsistencies to
improve data quality. Problems within a data set might include misspellings, missing observations, entry errors, and
other forms of invalid data. To maximize the effectiveness of a particular data set during analysis, steps should be
taken to remove these errors. Frequently, this is an arduous task because much of it must be undertaken by the
human eye (looking at the data itself or summary statistics) or basic computer programs. The best data-cleaning
processes are consistent and treat errors in a uniform way. Data cleaning does not get the glory of data analysis and
its results, but is a crucial step toward the accuracy of those results and thus the reputation of the researcher.
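The cleaning steps described above can be sketched in Python. The records, field names, correction table, and the 0 to 120 age range below are all hypothetical. The point is that every record passes through the same rules, so errors are treated uniformly and problems are logged rather than silently dropped.

```python
# Hypothetical raw records showing common problems: stray whitespace,
# inconsistent capitalization, a misspelling, a missing value, and an entry error.
RAW = [
    {"state": " Virginia", "age": "34"},
    {"state": "virginia", "age": "41"},
    {"state": "Virgina", "age": ""},     # misspelled state, missing age
    {"state": "Ohio", "age": "430"},     # implausible age (entry error)
]

# Hypothetical correction table, built by reviewing the data.
SPELLING_FIXES = {"Virgina": "Virginia"}


def clean(record):
    """Apply the same rules to every record; return (cleaned record, list of problems)."""
    problems = []

    # Normalize whitespace and capitalization, then correct known misspellings.
    state = record["state"].strip().title()
    state = SPELLING_FIXES.get(state, state)

    # Convert age to an integer, flagging missing, non-numeric, or out-of-range values.
    age_text = record["age"].strip()
    age = None
    if not age_text:
        problems.append("missing age")
    else:
        try:
            age = int(age_text)
        except ValueError:
            problems.append(f"non-numeric age {age_text!r}")
        else:
            if not 0 <= age <= 120:
                problems.append(f"implausible age {age}")
                age = None

    return {"state": state, "age": age}, problems


cleaned = [clean(r) for r in RAW]
for rec, issues in cleaned:
    print(rec, issues)
```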
Data Integration
Big data users draw on data from disparate sources and locations. Examples of sources include email, call logs, social
media, and photographs. Examples of locations include separate businesses. These data sets must be brought
together for analysis on a uniform query space. There are several methods of integration ranging from highly
manual to virtual and highly automated. For example, a company might have three offices in separate locations,
each with their own databases. These databases might contain information pertaining to sales, customer satisfaction,
and inventories (databases A, B, and C). In order to more efficiently run their business and increase profitability, the
company might decide to bring the three databases together so that anyone within the firm can access and analyze
the data as a whole.
These methods range from highly manual integration, with direct access to specific source data, to a virtual access
point that is separate from the data sources. One popular method is to create a warehouse of the
disparate data sources. The warehouse is a unified version of the databases and is managed separately from the
originating sources. Users in any of our example firm's offices can access the warehoused data to monitor processes, find
trends, test hypotheses, and make predictions.
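A warehouse of the kind described above can be sketched with Python's built-in sqlite3 module, using an in-memory database to stand in for the warehouse. The table names, columns, office names, and rows are hypothetical extracts from databases A (sales), B (customer satisfaction), and C (inventories). A real warehouse would be loaded from the actual source systems.

```python
import sqlite3

# Hypothetical extracts from the three source databases.
sales_a = [("boots", "north", 120), ("boots", "south", 80), ("gloves", "south", 45)]
satisfaction_b = [("boots", 4.2), ("gloves", 3.8)]
inventory_c = [("boots", 300), ("gloves", 150)]

# The warehouse is a separate store: loading copies data out of the sources.
wh = sqlite3.connect(":memory:")
wh.executescript("""
    CREATE TABLE sales (product TEXT, office TEXT, units INTEGER);
    CREATE TABLE satisfaction (product TEXT, score REAL);
    CREATE TABLE inventory (product TEXT, on_hand INTEGER);
""")
wh.executemany("INSERT INTO sales VALUES (?, ?, ?)", sales_a)
wh.executemany("INSERT INTO satisfaction VALUES (?, ?)", satisfaction_b)
wh.executemany("INSERT INTO inventory VALUES (?, ?)", inventory_c)

# One query now spans what used to be three separate databases.
rows = wh.execute("""
    SELECT s.product, SUM(s.units) AS units_sold, c.score, i.on_hand
    FROM sales AS s
    JOIN satisfaction AS c ON c.product = s.product
    JOIN inventory AS i ON i.product = s.product
    GROUP BY s.product, c.score, i.on_hand
    ORDER BY s.product
""").fetchall()

for row in rows:
    print(row)
```

Because the warehouse holds copies, analysts can query it without touching the originating databases. This matches the description of a warehouse that is managed separately from its sources.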
The following sections briefly discuss these methods of analysis and how we can learn from data. These discussions
only scratch the surface of the topics, and interested readers are encouraged to explore these topics in more detail.
Summary Statistics
A simple but important type of data analysis involves summary statistics of either an entire data set or a randomly
drawn subset. Summary statistics provide quick and useful information about the nature and distribution of
variables within a data set and can lead to the discovery of data errors. General types of summary statistics are
measures of central tendency, dispersion, and relatedness.
Measures of central tendency, such as the mean, median, and mode, indicate the value around which a distribution is
centered. The values can be used to describe a typical observation. For example, the sample mean household income
provides a measure of the typical income level earned by a group of people. This can be important for a company
trying to understand its consumer base.
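Python's standard statistics module computes all three measures of central tendency. The household incomes below are hypothetical values in thousands of dollars, and one of them is very high.

```python
from statistics import mean, median, mode

# Hypothetical annual household incomes, in thousands of dollars.
incomes = [42, 48, 48, 55, 61, 67, 250]

print(f"mean:   {mean(incomes):.1f}")
print(f"median: {median(incomes)}")
print(f"mode:   {mode(incomes)}")
```

In this sample, the single high income pulls the mean (about 81.6) well above the median (55). This is one reason the median is often reported for skewed variables such as income.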
Measures of dispersion describe the variability of observations from a measure of central tendency. Examples
include the variance and standard deviation. It is useful for a firm to know both the earnings of its typical
customer and how much observations differ from that measure. If the variability is large in our example, many
people earn well below and/or above the mean income value. Targeting the typical household for marketing
purposes might not be successful because very few people actually earn that income level.
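The same module also computes measures of dispersion. The two hypothetical customer groups below share a mean income of 60 (thousands of dollars) but differ greatly in spread, which is the situation the marketing example describes.

```python
from statistics import mean, stdev, variance

# Two hypothetical customer groups with the same mean income (thousands of dollars).
groups = {
    "tight": [58, 59, 60, 61, 62],
    "spread": [20, 35, 60, 85, 100],
}

for name, incomes in groups.items():
    # variance() and stdev() are sample statistics (they divide by n - 1).
    print(f"{name}: mean={mean(incomes)}, variance={variance(incomes)}, "
          f"std dev={stdev(incomes):.1f}")
```

For the population versions, which divide by n, use pvariance and pstdev instead.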
Frequently, analysts are also interested in relationships between variables. Measures of relatedness, such as the
covariance and correlation coefficient, quantify the linear relationship between two variables. For example, a firm
might want to know the relationship between sales of a product (for example, boots) and weather. If the relationship
is strong, the firm knows to track the weather and adjust inventories accordingly. If there is no relationship, or the
variables are independent, there is no need to track weather data (at least as far as sales and inventories are
concerned).
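Covariance and the correlation coefficient can be written out directly from their definitions. The weekly temperature and boot-sales figures below are hypothetical. The sketch uses only math and statistics.mean so that it runs on older Python versions; Python 3.10 and later also provide statistics.covariance and statistics.correlation.

```python
from math import sqrt
from statistics import mean

# Hypothetical weekly observations: average temperature (deg F) and boots sold.
temperature = [20, 28, 35, 44, 52, 60, 71]
boots_sold = [95, 88, 80, 62, 55, 41, 30]


def sample_covariance(x, y):
    """Sum of paired deviations from the means, divided by n - 1."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)


def pearson_correlation(x, y):
    """Covariance rescaled by both standard deviations, so it lies in [-1, 1]."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


print(f"covariance:  {sample_covariance(temperature, boots_sold):.1f}")
print(f"correlation: {pearson_correlation(temperature, boots_sold):.3f}")
```

A correlation near -1 indicates a strong negative linear relationship; in this made-up data, boot sales fall as temperatures rise. The size of the covariance depends on the units of both variables. The correlation, by contrast, is unitless and always lies between -1 and 1.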
Other common summary statistics include minimum and maximum values and measures of range. Not only do
summary statistics provide a description of a data set, but they also help us find data errors. For example, if you
know that the maximum value of a survey response is five, and you review your summary statistics and note the
maximum value reported is a nine, there is a data entry error that should be addressed.
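The range check described above takes only a few lines. The responses and the 1-to-5 scale below are hypothetical.

```python
# Hypothetical survey responses on a 1-to-5 scale; the 9 is an entry error.
responses = [4, 5, 3, 9, 2, 5, 1]
SCALE_MIN, SCALE_MAX = 1, 5

# The minimum and maximum alone reveal that something is wrong...
print("min:", min(responses), "max:", max(responses))

# ...and listing out-of-range entries with their positions shows where to look.
bad = [(i, v) for i, v in enumerate(responses) if not SCALE_MIN <= v <= SCALE_MAX]
print("out-of-range entries (index, value):", bad)
```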
Statistical Modeling
Although summary statistics are simple to generate and useful, they do have drawbacks. One problem is their
narrowness. For example, when you look at the relationship between sales and weather, a correlation coefficient
only measures the linear relationship between the two variables, and ignores all else. Sales are likely impacted not
only by weather, but also by income, consumer sentiment, time of year, and other factors. A more robust measure
will tell us the impact of weather on sales while holding everything else constant.
Statistical modeling is a subfield of mathematics in which relationships between inputs are used to predict an
outcome. Regression is a common statistical method of analysis that measures the impact of a set of variables or
features simultaneously. A dependent variable (for example, each online sale for a particular firm) is regressed on a
set of independent variables. Examples of independent variables include income, age, and gender of the buyer,
weather at the time of sale, and use of a coupon in the transaction. The regression generates an estimated
coefficient for each independent variable. Each coefficient quantifies the change in the dependent
variable associated with a one-unit increase in that independent variable, holding all other independent variables
constant. The differences between statistical modeling and other methods (some of which are overviewed in the
following sections) are illustrated in Figure 3 (Srivastava, 2015).
Regression analysis is useful for hypothesis testing (for example, testing whether customers buy more boots when it is
cold) and for quantifying the relationship between variables while holding others constant (for example, customers buy
20% more boots when it is cold out than when it is warm out, holding all else constant). In addition to testing hypotheses
and generating estimated values, regression estimates can be used for predictions. For example, given values for all
independent variables and estimated coefficients, what is the predicted value of sales? Clearly, statistical models are
more advanced and likely more informative than basic summary statistics.
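A multiple regression like this one can be sketched in pure Python by solving the least-squares normal equations. The data are hypothetical. They were generated from the rule sales = 50 - 0.5 * temperature + 10 * coupon with no noise, so the fit should recover those coefficients. The names fit_ols and predict are illustrative; they are not a SAS or library API.

```python
def fit_ols(rows, y):
    """Estimate b0, b1, ..., bk for y = b0 + b1*x1 + ... + bk*xk by least squares."""
    X = [[1.0, *row] for row in rows]   # prepend an intercept column
    k = len(X[0])

    # Augmented normal equations [X'X | X'y].
    A = [[sum(xr[i] * xr[j] for xr in X) for j in range(k)]
         + [sum(xr[i] * yi for xr, yi in zip(X, y))]
         for i in range(k)]

    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * p for a, p in zip(A[r], A[col])]

    return [A[i][k] / A[i][i] for i in range(k)]


def predict(coefs, row):
    """Predicted value for one observation, given estimated coefficients."""
    return coefs[0] + sum(b * x for b, x in zip(coefs[1:], row))


# Hypothetical data: (temperature in deg F, coupon used: 1 = yes, 0 = no).
features = [(20, 0), (30, 0), (40, 1), (50, 0), (60, 1), (70, 1), (25, 1), (45, 0)]
sales = [50 - 0.5 * t + 10 * c for t, c in features]

b0, b_temp, b_coupon = fit_ols(features, sales)
print(f"intercept={b0:.2f}, temperature={b_temp:.2f}, coupon={b_coupon:.2f}")
print(f"predicted sales at 30 deg F with a coupon: "
      f"{predict([b0, b_temp, b_coupon], (30, 1)):.2f}")
```

The temperature coefficient of -0.5 reads as "each additional degree lowers predicted sales by half a unit, holding coupon use constant." Solving the normal equations directly is fine for a small illustration. Statistical software generally uses more numerically stable methods, and with real, noisy data it also reports standard errors for hypothesis testing.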
Machine Learning
Machine learning (ML) is a subfield of computer science and artificial intelligence (AI) in which algorithms learn
from data. The learned models are then used for predictions. This differs from statistical modeling in which the
model is explicitly defined by the researcher. ML became possible with advances in technology and falling technology
prices, and it is widely used with big data. For example, machine learning tells Pandora, Netflix, and
Amazon which song, movie, or product you might like based on your information, user history, and the choices
made by people like you. ML can be supervised (labeled data is used to train the model), unsupervised (patterns or
groups are found in unlabeled data), or a combination of the two.
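The "people like you" idea can be illustrated with a tiny user-based nearest-neighbor recommender, one simple technique among many. The users, movies, and ratings are hypothetical, and the function names cosine_similarity and recommend are illustrative. This is not a description of how Pandora, Netflix, or Amazon actually build their systems.

```python
from math import sqrt

# Hypothetical ratings (1-5) that users gave to movies; absent means unrated.
RATINGS = {
    "ana":   {"Alien": 5, "Heat": 4, "Up": 1},
    "ben":   {"Alien": 5, "Heat": 5, "Up": 2, "Ran": 4},
    "chloe": {"Alien": 1, "Up": 5, "Coco": 5},
}


def cosine_similarity(a, b):
    """Similarity of two users, using only the items both have rated."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[i] * b[i] for i in shared)
    norm_a = sqrt(sum(a[i] ** 2 for i in shared))
    norm_b = sqrt(sum(b[i] ** 2 for i in shared))
    return dot / (norm_a * norm_b)


def recommend(user, ratings):
    """Return the most similar other user and that user's unseen items, best first."""
    others = [u for u in ratings if u != user]
    nearest = max(others, key=lambda u: cosine_similarity(ratings[user], ratings[u]))
    unseen = {item: score for item, score in ratings[nearest].items()
              if item not in ratings[user]}
    return nearest, sorted(unseen, key=unseen.get, reverse=True)


print(recommend("ana", RATINGS))
print(recommend("chloe", RATINGS))
```

No analyst wrote a rule saying that ana should see Ran. The suggestion comes from the data, which is the contrast with statistical modeling drawn above. Common refinements include mean-centering each user's ratings and combining several neighbors.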
An Alternative: Visual Analytics
Visual analytics describes data visualization methods that permit analysts and nontechnical users to visualize data
and make decisions; it uses pictures to tell a story. Before developing the visuals, the researcher should be familiar
with the data (for example, which variables are numeric); have a specific question or questions to be addressed; and
know the audience to whom the results will be presented (SAS, 2014). The simplest visuals that tell the story are the
best ones, and the best visual can vary from audience to audience.
Visuals range in complexity. Some common, simple graphics include line, bar, and column charts, which are used
to show the relationship between two or more variables. Figure 3 shows the percent change in quarterly US Gross
Domestic Product (GDP) and the unemployment rate between 2000 and 2016 (Federal Reserve Economic Data,
2016). This chart might be of interest to someone analyzing the relationship between production in the economy and
the labor market. It is clear that unemployment is counter-cyclical, meaning it moves opposite to RGDP (Real Gross
Domestic Product) and the business cycle. Highlighting periods of recession (in gray) adds clarity.
This one picture easily relates the story of recent economic history in the United States. If a researcher presented or
analyzed a large table of data, the story would be harder to tell.
NOTE: Data was downloaded from FRED 6/14/2016. Graphic developed by the author.
References
Big data. 2008. In OED Online. Oxford: Oxford University Press. Retrieved June 5, 2016, from
http://www.oed.com.
Bureau of Economic Analysis. June 14, 2016. Real GDP. Federal Reserve Bank of St. Louis.
Cox, M., and D. Ellsworth. 1997. Application-controlled demand paging for out-of-core visualization.
Proceedings of the 8th Conference on Visualization '97, pp. 235-ff. Los Alamitos, CA: IEEE Computer
Society Press.
Davenport, T. H., and J. Dyche. May 2013. Big Data in Big Companies. SAS Institute white paper. Retrieved
from SAS: http://www.sas.com/content/dam/SAS/en_us/doc/whitepaper2/bigdata-bigcompanies-106461.pdf.
Gartner. No date. Gartner IT Glossary. Retrieved from Gartner: http://www.gartner.com/it-glossary/big-data/.
Laney, D. February 6, 2001. 3-D Data Management: Controlling Data Volume, Velocity, and Variety. Retrieved
from Gartner Blog Network:
http://blogs.gartner.com/doug-laney/files/2012/01/ad949-3D-Data-Management-Controlling-Data-Volume-Velocity-and-Variety.pdf.
SAS Institute Inc. 2014. Data Visualization Techniques. SAS Institute white paper. Retrieved from SAS:
http://www.sas.com/content/dam/SAS/en_us/doc/whitepaper1/data-visualization-techniques-106006.pdf.
Srivastava, T. July 1, 2015. Difference between machine learning and statistical learning. Retrieved June 13,
2016, from http://www.analyticsvidhya.com/blog/2015/07/difference-machine-learning-statistical-modeling/.
US Bureau of Labor Statistics. June 14, 2016. Civilian Unemployment Rate. Federal Reserve Bank of St. Louis.
Retrieved June 14, 2016, from https://fred.stlouisfed.org/series/A191RL1Q225SBEA.
US Federal Reserve Board. July 2014. 2013 Federal Reserve Payments Study. Retrieved from Federal Reserve
Bank Services:
https://frbservices.org/files/communications/pdf/general/2013_fed_res_paymt_study_summary_rpt.pdf.
The Where of Data
Tricia Aanderud, Rob Collum, and Ryan Kumpfmiller
Excerpt from An Introduction to SAS Visual Analytics: How to Explore Numbers, Design
Reports, and Gain Insight into Your Data
Maps or geospatial data enable viewers to associate the real concept of land with the abstract
concept of data. Exploring geography with your eyes and associating facts with those areas can
change a viewer's perspective. It's useful to know where data points occur as well as where they
do not. SAS Visual Analytics makes this task easier than ever.
Perhaps you've heard people talk about tornado alley. It's an area down the middle of the United
States where tornadoes occur more frequently. Tornadoes are powerful and scary storms that
produce wind speeds capable of sending a wood board through a metal car door. These storm
events are responsible for massive property damage and loss of life. In this chapter, storm events
(such as tornadoes) recorded by the United States National Climatic Data Center since 1950 are plotted.
The tornadoes are rated by intensity, with F5 being the most destructive.
What is awesome about SAS Visual Analytics is that users can quickly and easily convert these
storm events to a map where they can see tornado touchdowns. With a few more clicks, they can
filter the tornadoes by strength and by year. It's shocking at first to see all the tornadoes! When you
start peeling away the data by filtering for strength, you realize where the events are more likely
to occur. Then if you animate the map with time, you realize how rare these events are. Use this
chapter to learn about the principles of geospatial data and to add new dimensions to your data.
support.sas.com/aanderud
Rob Collum is a Principal Technical Architect in the Professional Services
Division at SAS. For the past twenty years, Rob has enabled the delivery of
high-performance solutions that provide substantial value and meaningful
impact to SAS customers around the world. He currently works alongside a
team of professionals who partner with other divisions to identify, create, and
standardize architectural practices for the newest SAS technologies. Rob
received a bachelor's degree in Computer Science from North Carolina State
University, is a regular contributor to SAS Global Forum, and has coauthored
several SAS certification exams.
support.sas.com/collum
support.sas.com/kumpfmiller
An Introduction to SAS Visual Analytics: How to Explore
Numbers, Design Reports, and Gain Insight into Your Data.
Full book available for purchase here.
chapter six
SAS Visual Analytics makes it easy to plot geospatial data, which can add a completely new element to your
data visualizations and analysis. In a tabular report, multiple columns might represent customers, competitors,
and demographic information. The tabular report might not reveal anything useful. But if you can geocode the
data and overlay it on a map, you quickly see where the better customers are, where they are in relation to
competitors, and the regions that provide the most market potential based on underlying demographics.
SAS Visual Analytics geo mapping capabilities are based on integration with two mapping technologies:
OpenStreetMap and ESRI ArcGIS. The examples in this chapter use OpenStreetMap with SAS Visual
Analytics 7.3 unless otherwise noted. In this chapter, you will learn how to create geographic data items and
geospatial objects.
In her book, The Wall Street Journal Guide to Information Graphics, Dona Wong suggests that there are times when
geography is not part of the story, so it doesn't make sense to force it to be. In her example, she shows two sales
regions, one of which had higher sales. In the following figure, the example was re-created using Australian
states.
132 An Introduction to SAS Visual Analytics
Because the regions are so disproportionate in size, comparing the sales revenue is not helpful. It does not lead
to any conclusion except that Western Australia generated more revenue than Victoria. Your viewer does not
have any useful takeaway, because the conclusion might be expected. It doesn't seem relevant to a data story in
which the focus was really sales revenue. With only two values to display, would a list table or even a pie chart
have been a better data visualization choice? The point of this example is that even if you can
show a cool geo object, you should ask yourself whether it makes sense for your data story.
Europe is an international business center and a leading tourist destination. If you want to tell a data story about
how popular a location it is, you could start by exploring the airport traffic. The Anna Aero website
(http://www.anna.aero) contains data that details trends from most of the world's airports. You can use the
airport codes and passenger counts to start a data story about the most and least popular cities.
In the following figure, the location of each airport is shown along with the passenger count and the difference
from the previous year. In this example, location enhances the story. Instead of fancy calculations, the viewer
can simply use their eyes to search for patterns.
the where of data 133
There are several observations to make. You might notice the darker circles that show increased passenger counts.
Perhaps you notice where there are multiple airports within a close radius, and you are curious why one has a
higher passenger count. When used effectively, geospatial data can reveal previously unknown patterns or assist
with confirming suspicions.
To keep things easy, SAS Visual Analytics has predefined geographic data elements ranging from general
values such as country names, to specific values such as ISO country codes. If your geographic data item
contains a country name, then it can be matched to an internal table so that the location can be plotted on a map.
Starting with SAS Visual Analytics 7.1, geographic data items can be country names, ISO 2-letter codes, ISO
numeric codes, or SAS map ID values. You select the predefined method when you create the geographic data
item.
These geographic data elements represent the center of the area. If you are showing France, it can be shown
with a country outline or at the center of the country. The following table contains the available predefined
geographical data elements provided by default and examples of how the incoming data items are expected to
appear. The table shows the data values expected from three countries.
US State Abbreviations: NC
On the SAS Support site, a page titled Geographic Lookup Values for SAS Visual Analytics (at
http://support.sas.com/rnd/datavisualization/vageo/71/VA71LookupValues.html) lists
these values to help you match your specific locations. The tables at this site list the countries and the
associated ISO numeric codes.
SAS Visual Analytics uses internal tables in the MAPSGFK library that is shipped with the product. You can
review the tables in this library to ensure that your data matches the expected name by using an application such
as SAS Studio. For additional assistance in creating geo data, you can use the GEOCODE procedure that is
available with the SAS/GRAPH software.
There is an international standard called ISO 3166, published by the International Organization for
Standardization. This standard assigns numeric codes to countries and regions that everyone can
use. There are several advantages to using numeric references, particularly in the data world.
If a programmer is using a language written in a non-Latin script, such as Chinese or Hebrew, the number
makes it easy to look up values. Also, when new countries form, a new number can be assigned
while the older number is kept for historical purposes.
After importing data into SAS Visual Analytics, you must assign the data item to a Geography role before it can
be used with any of the geo objects. You can create a geographic data item from an existing character or
numeric data item. To create a geographic data item:
Right-click the data item that contains the geographic element that matches the predefined role. In this
example, the Country data item contains the country names, such as Australia or Brazil.
Note: Some users prefer to duplicate the data item before assigning it to this role.
Select Geography > Country or Region Names. Your data item is moved under the Geography
section. You can use it with geographic roles.
Choose a geographic role.
If SAS Visual Analytics cannot plot your country data item, you might need to convert a country's common
name to its official name. For example, Russia, United States of America, and Great Britain could be in your
data set, but SAS Visual Analytics cannot plot them. When you search the country names in the
MAPSGFK.WORLD data set, you learn that these countries use a different IDNAME. Often it is easier to
convert the values to ISO numeric codes rather than using names.
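As a rough illustration of that conversion, the following Python sketch normalizes country names and looks them up in a small alias table of ISO 3166-1 numeric codes. The table, the function names, and the choice to map Great Britain to the United Kingdom code are assumptions made for this example; it is not the MAPSGFK.WORLD lookup itself, and unmatched names are returned for manual review rather than guessed.

```python
# Hypothetical alias table: common names -> ISO 3166-1 numeric codes.
# Codes are stored as integers; pad to three digits (for example, 036)
# if the target system expects strings. Extend with the names in your data.
ISO_NUMERIC_BY_ALIAS = {
    "australia": 36,
    "brazil": 76,
    "france": 250,
    "ireland": 372,
    "russia": 643,
    "russian federation": 643,
    "great britain": 826,  # assumption: treated as the United Kingdom
    "united kingdom": 826,
    "united states": 840,
    "united states of america": 840,
    "usa": 840,
}


def to_iso_numeric(country_name):
    """Return the ISO numeric code for a country name, or None if it is unknown."""
    # Lowercase and collapse repeated whitespace before the lookup.
    key = " ".join(country_name.strip().lower().split())
    return ISO_NUMERIC_BY_ALIAS.get(key)


def convert_countries(names):
    """Split names into (name, code) pairs and a list of names that need review."""
    matched, unmatched = [], []
    for name in names:
        code = to_iso_numeric(name)
        if code is None:
            unmatched.append(name)
        else:
            matched.append((name, code))
    return matched, unmatched


if __name__ == "__main__":
    found, review = convert_countries(
        ["Russia", "United States of America", "Great Britain", "Atlantis"])
    print(found)
    print(review)
```

Names that come back unmatched can then be checked against MAPSGFK.WORLD by hand.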
All geospatial data items represent a location on the planet Earth. A specific point or address has a set of
coordinates, which are called latitude and longitude. You might recall from elementary school when you studied
the globe and learned how it was divided by imaginary lines: parallel lines that circle the globe from east to west
(called latitude) and lines that run from north to south between the poles (called longitude).
When you provide a location's latitude and longitude, you are referencing these lines. If you think about the
world's airports, it's possible to describe the geospatial location with just latitude and longitude coordinates. In
the following figure, the airports are highlighted on the map, and the table on the left shows the airport name
with its coordinates. Notice that the latitude numbers are similar because these European airports lie at similar
distances north of the equator. The longitude numbers vary more because the airports are spread farther east
and west. Compare the Charles De Gaulle (Paris, France) coordinates to the Dublin (Dublin, Ireland) coordinates
to better understand the values.
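To make that comparison concrete, the following Python sketch prints the latitude and longitude differences between the two airports and their great-circle distance, computed with the haversine formula. The coordinates are approximate values supplied for illustration (they are not read from the book's figure), and the formula assumes a spherical Earth with a mean radius of 6,371 km.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; treats Earth as a sphere


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = math.radians(lat2 - lat1)
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    # min() guards against rounding pushing the argument slightly above 1
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


# Approximate coordinates in degrees; south latitudes and west longitudes are negative.
CDG = (49.0097, 2.5479)      # Paris Charles de Gaulle
DUBLIN = (53.4213, -6.2701)  # Dublin Airport

if __name__ == "__main__":
    print(f"Latitude difference:  {DUBLIN[0] - CDG[0]:+.2f} degrees")
    print(f"Longitude difference: {DUBLIN[1] - CDG[1]:+.2f} degrees")
    print(f"Great-circle distance: {haversine_km(*CDG, *DUBLIN):.0f} km")
```

The negative longitude for Dublin is what places it west of the prime meridian, while Paris sits just east of it.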
To create a custom geographic item, you must have the latitude and longitude coordinates available in the data
set. The coordinates can be based on the World Geodetic System (WGS84), Web Mercator, and the British
National Grid (OSGB36). The default is the World Geodetic System (WGS84).
There are three coordinate systems available for custom data points. These standards were
developed for diverse purposes but are now commonly used.
WGS84 was developed by the United States military for satellite-positioning systems.
Web Mercator is a web standard. It was first used with Google Maps.
OSGB36 is a British-developed system that is heavily used in British-based maps.
You should choose the system that works best for your specific location or geo data. In most cases
the WGS84 system works.
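If you ever need to move between two of these systems outside SAS Visual Analytics, the WGS84 to Web Mercator step is short enough to sketch. The Python function below uses the common spherical formula, with the WGS84 semi-major axis as the sphere radius. It is an illustration under that assumption, not a replacement for a projection library, and it does not cover OSGB36, which requires a datum transformation.

```python
import math

EARTH_RADIUS_M = 6378137.0  # WGS84 semi-major axis, used as the sphere radius
MAX_LATITUDE = 85.05112878  # Web Mercator is conventionally clipped near +/-85.05 degrees


def wgs84_to_web_mercator(lat, lon):
    """Project WGS84 latitude/longitude (degrees) to Web Mercator x/y (meters)."""
    # The projection goes to infinity at the poles, so clamp the latitude first.
    lat = max(-MAX_LATITUDE, min(MAX_LATITUDE, lat))
    x = EARTH_RADIUS_M * math.radians(lon)
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(lat) / 2))
    return x, y


if __name__ == "__main__":
    # Approximate location of Paris Charles de Gaulle, for illustration only.
    print(wgs84_to_web_mercator(49.0097, 2.5479))
```

Because the projection treats WGS84 coordinates as if they lay on a sphere, it is convenient for web map tiles but stretches areas as you move away from the equator.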
Let's use the airport coordinates as the basis for the new geographic item. We use the airport name to create this
data item, but other data items such as Airport Code would also work.
Duplicate the Airport data item and name the new item Airport Name.
Right-click the new data item, and then select Geography > Custom. A Geography window appears.
Select your data items for latitude and longitude in the appropriate fields. Your new data item appears
in the Geography area.
If your data set does not have the geo coordinates available, you can get them through several
sources.
The SAS MAPSGFK library contains data for many countries and regions.
There are open-source databases available that you can find with a web search.
Google has an API that you can query through code. The free service has a daily access limit,
but you can subscribe to their service or other commercial services.
Coordinate: Pinpoints an exact location on the map using a custom geographical item
Regional: Outlines a regional area
Bubble: Combines a bubble plot to show a value at the location
These objects enable you to highlight your geospatial data in different ways and for different stories. Let's
explore the different ways that these data objects are used.
For the remaining topics in this chapter, the examples are created using the storm events data set from the
United States National Climatic Data Center website. This database contains US storm events (such as
tornadoes and thunderstorms) since 1950. The data set contains other facts such as the number of deaths or
injuries and the estimated property damage. The tornadoes are rated by their intensity on the Fujita scale from
F0 to F5 with F5 being the most destructive. In 2007, the Enhanced Fujita scale was introduced and tornadoes
were categorized as EF0-EF5.
Perhaps you've heard people talk about tornado alley; it's an area down the middle of the United States where
tornadoes occur more frequently. Tornadoes are powerful and scary storms that produce wind speeds capable of
sending a wood board through a metal car door. These storm events are responsible for massive property
damage and loss of life. Geo coordinate maps are excellent at showing exactly where an event occurred. In the
following figure, the teal markers indicate where tornadoes of EF5/F5 strength, with winds of 230+ mph (370+ kph),
arose in the past 50 years.
To make this geo coordinate object a little more interesting, let's add the EF4/F4 tornado touchdown points for
the same time period. By contrasting the teal and gold markers, the viewer sees that an EF5/F5 tornado is less
common.
While the data points are chaotic, it's clear where severe tornadoes occurred. This data would not have had the
same impact if we had plotted it as a line chart or even a pie chart. The touchdown points help you realize
why those particular states have a higher disaster recovery budget.
This data object uses a custom geography data item that is based on supplied latitude and longitude values. If
the coordinates are incorrect, then the map might show your data in the middle of the ocean. In our sample data,
some of the coordinates were entered incorrectly. This resulted in tornadoes appearing in the Atlantic Ocean. To
correct this situation, the latitude and longitude values in the data set would have to be edited or filtered.
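One way to catch such records before loading them is to flag any event whose coordinates are out of range or fall outside the area you expect. The Python sketch below uses a hypothetical bounding box that roughly encloses the contiguous United States and hypothetical field names `lat` and `lon`. Events outside the box (including legitimate ones in Alaska or Hawaii, unless you add boxes for them) are set aside for review rather than silently dropped.

```python
# Hypothetical bounding box that roughly encloses the contiguous United States.
# Adjust it, or add boxes for Alaska, Hawaii, and territories, for your own data.
CONUS_BOUNDS = {"lat_min": 24.0, "lat_max": 50.0, "lon_min": -125.0, "lon_max": -66.0}


def is_valid_coordinate(lat, lon):
    """True if the values are present and legal WGS84 latitude/longitude in degrees."""
    return (lat is not None and lon is not None
            and -90 <= lat <= 90 and -180 <= lon <= 180)


def in_bounds(lat, lon, bounds=CONUS_BOUNDS):
    """True if the point falls inside the bounding box."""
    return (bounds["lat_min"] <= lat <= bounds["lat_max"]
            and bounds["lon_min"] <= lon <= bounds["lon_max"])


def split_events(events, bounds=CONUS_BOUNDS):
    """Separate events with plausible coordinates from ones to review or correct."""
    keep, review = [], []
    for event in events:
        lat, lon = event.get("lat"), event.get("lon")
        if is_valid_coordinate(lat, lon) and in_bounds(lat, lon, bounds):
            keep.append(event)
        else:
            review.append(event)
    return keep, review
```

Reviewing the flagged records, rather than deleting them, leaves room to correct values such as a missing minus sign on a longitude.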
When you have too much data to display, SAS Visual Analytics displays a yellow warning icon and suggests that you add
some filters to your data. This is more likely to happen with custom geographic data items. The solution is to
control how much data appears at once by setting filters.
Add a date range slider to compare events along a time scale. Adding Event Year to the slider enables the
user to compare which years might have had more active storm seasons.
Split the data item categories. Use the display rules to assign the tornado scale to a different color so that
each level is clearer. Then add a List filter and assign the Tornado F/EF Scale to the list. Users can select
which tornado scale they want to compare.
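Conceptually, the slider and the list filter combine two conditions. The Python sketch below mimics that logic on plain records. The `year` and `scale` fields and the function names are hypothetical, and SAS Visual Analytics applies its own filters in its engine rather than with code like this.

```python
from collections import Counter


def filter_events(events, year_range, scales):
    """Keep events whose year lies in the inclusive range (like a date range
    slider) and whose F/EF rating is one of the selected values (like a list filter)."""
    first_year, last_year = year_range
    selected = set(scales)
    return [e for e in events
            if first_year <= e["year"] <= last_year and e["scale"] in selected]


def count_by_scale(events):
    """Count the remaining events per tornado rating, for example to compare EF4 and EF5."""
    return Counter(e["scale"] for e in events)
```

Moving the slider corresponds to changing `year_range`, and selecting items in the list corresponds to changing `scales`.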
Use the geo regional data object when you need to introduce a subject about location. This geospatial object
helps a viewer understand where to focus their attention or understand how much variation occurs for a value.
These objects are also called choropleth maps, which is Greek for "multitude of areas."
When you start thinking about dangerous storm events, you can imagine that these events cause considerable
property damage. States more prone to severe tornadoes will plan larger disaster recovery budgets. It would be
interesting to compare the damage costs by state. Using a geo regional map, you can place a value over an entire
region, such as a country or a state. Color is then applied over the regions to indicate the intensity of the value.
In the preceding figure, you can see the associated property damage cost for the tornadoes across the areas. The
darker the color, the costlier the storm damage. Use an average or a percentage to normalize the values so that
the regions are comparable. By exploring the visualization, you can easily see the areas of most damage, but it's
harder to understand where there is the least damage. Be sure to use a legend so that the user understands the color range.
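To see why normalizing matters, compare a raw total with an average per event. The Python sketch below computes the average property damage per event for each state and, separately, each region's percentage of an overall total. The `state` and `damage` field names and the function names are assumptions for this example.

```python
from collections import defaultdict


def average_damage_by_state(events):
    """Return {state: average property damage per event}, a normalized value
    that is easier to compare across regions than a raw total."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for event in events:
        totals[event["state"]] += event["damage"]
        counts[event["state"]] += 1
    return {state: totals[state] / counts[state] for state in totals}


def share_of_total(values):
    """Convert {region: value} to {region: percent of the overall total}."""
    grand_total = sum(values.values())
    if not grand_total:
        return {}
    return {region: 100.0 * value / grand_total for region, value in values.items()}
```

A state with many small events and a state with a few costly ones can have similar totals but very different averages, which is the kind of difference a normalized map can reveal.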
When you position your pointer over each state, a data tip appears that contains the assigned data items' values.
Since Ohio and Kansas are similar in color, viewers might be interested to learn more. Most of Kansas is farmland
and rural areas, while Ohio is more densely populated and industrial. Being in tornado alley, Kansas
probably experiences more tornadoes and thus more crop damage. With the larger population, it might be costlier
for Ohio when there is an extreme storm event.
A few settings can make a geo regional data object a nicer experience for the user.
Add data tips to provide more information when the user positions the pointer over content.
You can add as many as you like, but make sure that the data items enhance instead of confuse the
viewer. For the preceding example, we added the Storm Event Count as a data tip.
Adjust the color transparency for the overlay so that the user can see the underlying values.
If the underlying values are masked, it might cause confusion. For this visualization, the
transparency was adjusted to 25%. It was just enough to maintain the color while still allowing the
underlying value to peek through.
Adjust the gradient color to ensure enough contrast.
The ocean is a light blue, so a contrasting color that does not appear too similar to the landscape
features is required. In Figure 6.9, a single color for the Gradient value is used. It is easier to
decode a value when the color intensity increases as the value increases.
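A single-hue gradient like this one can be described as a straight-line interpolation between a light and a dark shade of the same color. The Python sketch below maps a value onto such a ramp, so larger values receive darker colors. The two teal hex values and the function names are illustrative assumptions, not the colors that SAS Visual Analytics uses.

```python
def hex_to_rgb(color):
    """Convert '#rrggbb' to an (r, g, b) tuple of integers."""
    color = color.lstrip("#")
    return tuple(int(color[i:i + 2], 16) for i in (0, 2, 4))


def rgb_to_hex(rgb):
    """Convert an (r, g, b) tuple of integers to '#rrggbb'."""
    return "#{:02x}{:02x}{:02x}".format(*rgb)


def single_hue_color(value, vmin, vmax, light="#e5f5f5", dark="#006d6d"):
    """Map a value to a color between a light and a dark shade of one hue.
    The two default teal shades are arbitrary examples."""
    if vmax == vmin:
        t = 1.0
    else:
        t = (value - vmin) / (vmax - vmin)
        t = max(0.0, min(1.0, t))  # clamp values outside the legend range
    lo, hi = hex_to_rgb(light), hex_to_rgb(dark)
    return rgb_to_hex(tuple(round(a + (b - a) * t) for a, b in zip(lo, hi)))
```

Keeping a single hue means the viewer only has to judge lightness, which matches the advice that intensity should increase with the value.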
In Envisioning Information, Edward Tufte has a fascinating discussion of color with maps. His
suggestion is to use colors that are found in nature. He encourages using a color palette on the
lighter side and provides several examples used across several centuries.
The geo regional data object is excellent for getting the user to focus on specific areas. It leads to more questions
about the storm events, so it might be convenient to use an info window to provide more details. This info
window shows the storms by duration with estimated damage. A quick storm can result in as much damage as a
longer one, although this probably depends on where the tornado touches down.
When the user clicks a state, the info window appears with additional information. This example uses a
bar-line chart, but it can be anything you can create in a section. This map is a good way to start a story. It
provides an overview and helps the viewer understand where to focus their attention. In this case, it was Kansas
and Ohio.
The only pitfall to an info window is that the viewer might not recall the values from the previous pop-up. Use
this technique for data discovery or as a way to entice someone into your story. This data story is completely
about the location and comparing how the events affected the states.
1. Create a tab. In this example, the tab contains the geo regional map.
2. Create another tab with the data objects of your choice. For this example, a bar-line data object was used to
show the event duration and estimated property damage.
3. Select the down arrow next to the title and select Display as Info Window.
4. Return to the tab that you created in step 1. Right-click the map, and then select Add Link > Info
Window Link. In the window that appears, select which info window you want.
Once you turn the tab into an info window, it no longer appears as a tab to the viewer. You can use an info
window in other situations to provide information about the tab.
Bubble plots receive a lot of criticism for being difficult to understand. These charts can pack a lot of data into a
few variables. A layperson might spend more time trying to understand a bubble plot, but this doesn't seem true
for geo bubble maps. Possibly it's because the user sees the map and understands that it is related to
location.
In the previous topic, we created a geo regional map to show the average damage cost from F5 tornadoes for
each state. One issue with the method was that users had to position the pointer over each state to see how many
storm events were associated with it. If a user wants details, this is a little awkward. A geo bubble map
resolves this issue.
A geo bubble plot places a bubble on the geographic location and enables you to control two aspects of the
bubble: its size and color. In the following example, the bubble size is the event count (the number of
tornadoes) while the color shows the estimated property damage (shown with the scale). Now it is more
apparent that Kansas endured a similar number of events as Mississippi, but the price tag was a little larger.
However, it also shows that Ohio had a similar cost but fewer events than Kansas.
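When bubble size encodes a count, a common convention is to make the bubble's area, not its radius, proportional to the value, because doubling the radius quadruples the area. The Python sketch below applies that square-root scaling. The pixel limits and the function name are hypothetical, and SAS Visual Analytics sizes its bubbles on its own, so treat this only as an illustration of the principle.

```python
import math


def bubble_radius(value, max_value, max_radius_px=30.0, min_radius_px=2.0):
    """Radius such that bubble area (not radius) is proportional to the value.
    Scaling the radius linearly would exaggerate large values, because area
    grows with the square of the radius."""
    if max_value <= 0 or value <= 0:
        return min_radius_px
    radius = max_radius_px * math.sqrt(value / max_value)
    # Keep tiny values visible on the map.
    return max(min_radius_px, radius)
```

With this scaling, a state with four times as many events gets a bubble with four times the area, though only twice the radius.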
When you use bubbles to encode data, you are asking the user to compare the bubble size and the bubble color.
The legend ensures that the user has some visual cues to assist with understanding. You can place the legend
anywhere around the object. In the preceding example, the legend is placed on the right.
By default, the geo bubble object uses a gradient scale of red to blue. This scale is acceptable when working
with performance data and is commonly referred to as traffic lighting. The colors mimic traffic signals, where
red means stop and green means go. However, we have a logic problem in this instance. The red indicates the
least amount of damage and the blue indicates the most. Technically, any property damage is bad. (After all, we
are not measuring how good the tornado was at damaging property!)
The gradient scale was changed to teal in our chart. The teal bubbles are not as close in color to the ocean and
still provide enough contrast. However, notice that the bubble over Tennessee is barely visible. The bubbles
were set to 30% transparency to make the state names visible. Perhaps another color would be more suitable?
You can experiment with your data object and decide.
For users who want additional functionality, they can subscribe to the ESRI premium features. The premium
service offers drive-time analysis, drive-by-distance analysis, and the ability to create custom shapes. In this
example, the user was looking for customers within a 5- to 10-minute driving distance of the store location. The
darker inner area is the 5-minute drive, while the lighter outer band is the 10-minute drive.
OpenStreetMap: This is an open-source project, where a worldwide user community maintains the data about
roads, boundaries, trails, and much more.
ESRI ArcGIS Maps: This advanced mapping platform uses highly interactive and informative geographical
maps. The maps are maintained by ESRI, a SAS partner.
The SAS Visual Analytics environment must be configured to point to one of these mapping technologies. An
OpenStreetMap server is hosted by SAS and is available as part of the default configuration. Organizations
might want to host and maintain their own OpenStreetMap server. Organizations can also use the ESRI server
(ArcGIS for Server, version 10.1 or higher) for access to maps. Refer to the SAS Visual Analytics:
Administration Guide for your release for more configuration details.
Many SAS Visual Analytics users are concerned about what information from their data must be shared in order
to retrieve map tiles from OpenStreetMap or ESRI ArcGIS Maps. After all, if the data is confidential to their
enterprise, it needs to be kept secure. Fortunately, none of your actual data is leaked outside of the environment.
SAS Visual Analytics simply requests the specific map tiles necessary to render the selected geographic area.
The highlighted regions, bubble plots, and other overlays are all created within the SAS Visual Analytics application.
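To see why no report data needs to leave the environment, consider the tile-addressing scheme that OpenStreetMap-style tile servers commonly use: a tile is identified only by a zoom level and an x/y index derived from the map view. The Python sketch below computes those indices with the standard "slippy map" formula. It illustrates the general idea and is not a description of how SAS Visual Analytics builds its requests; the URL template and its host are placeholders.

```python
import math


def tile_for(lat, lon, zoom):
    """Return the (x, y) index of the slippy-map tile that contains a
    WGS84 point at the given zoom level."""
    n = 2 ** zoom
    lat_rad = math.radians(lat)
    x = int((lon + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp so that edge cases (lon = 180, extreme latitudes) stay on the grid.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


def tile_url(lat, lon, zoom, template="https://tile.example.org/{z}/{x}/{y}.png"):
    """Build a tile request URL; the template host is a placeholder, not a real server."""
    x, y = tile_for(lat, lon, zoom)
    return template.format(z=zoom, x=x, y=y)
```

The only values in such a request are the zoom level and the tile indices; the measures and categories that drive the regions and bubbles never appear in it.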
References
Aanderud, Tricia. 2016. "Where in the World is SAS Visual Analytics?" Available at
https://www.zencos.com/blog/review-geoplot-in-sas-visual-analytics/.
Massengill, Darrell. 2016. "The GEOCODE Procedure and SAS Visual Analytics." Proceedings of the SAS
Global Forum 2016 Conference. Paper SAS3480-2016. Cary, NC: SAS Institute Inc. Available at
http://support.sas.com/resources/papers/proceedings16/SAS3480-2016.pdf.
Nori, Murali, and Himesh Patel. 2016. "Location, Location, Location: Analytics with SAS Visual
Analytics and ESRI." Proceedings of the SAS Global Forum 2016 Conference. Paper SAS4060-2016.
Cary, NC: SAS Institute Inc. Available at
http://support.sas.com/resources/papers/proceedings16/SAS4060-2016.pdf.
Schulz, Falko, and Anand Chitale. 2014. "More Than a Map: Location Intelligence with SAS Visual
Analytics." Proceedings of the SAS Global Forum 2014 Conference. Paper SAS021-2014. Cary,
NC: SAS Institute Inc. Available at
http://support.sas.com/resources/papers/proceedings14/SAS021-2014.pdf.
Tufte, Edward R. 1990. Envisioning Information. Cheshire, CT: Graphics Press.
US NOAA Data. Storm Events Database. Accessed 2016. Available at https://www.ncdc.noaa.gov/stormevents/.
Wong, Dona M. 2013. The Wall Street Journal Guide to Information Graphics: The Dos and Don'ts of
Presenting Data, Facts, and Figures. New York, NY: W. W. Norton & Company, Inc.
Infographics Powered by SAS Visual
Analytics and SAS Office Analytics
Travis Murphy
This is an excerpt from the 2016 SAS Global Users Group proceedings. For more
SUGI and SAS Global Forum Proceedings, visit the online versions of the
Proceedings.
A picture is worth a thousand words, but what if there are a billion words?
The old saying is very true today given the rise in data and the appetite for this data in all aspects
of life. The picture becomes even more important today, and this is where infographics step in.
Infographics are representations of information in a graphic format designed to make the data
easily understandable, at a glance, without the viewer needing deep knowledge of the data. The
amount of data available today is vast, and more infographics are being created to communicate
the information and insight from all this available data. This is not just on social media; it is inside
the boardroom as well. This paper shows you how to create information graphics that can be
printed, shared, and dynamically explored with objects and data from SAS Visual Analytics.
Connect your infographics to the high-performance analytical engine from SAS for repeatability,
scale, and performance on big data. In this paper, you will see how to easily leverage elements of
your corporate dashboards and self-service analytics while communicating subjective information
and adding the context that business teams require, in a highly visual format.
This paper looks at how SAS Office Analytics enables a Microsoft Office user to create
infographics for all occasions. You will be presented with a workflow that lets you get the most
from your SAS Visual Analytics system without having to code anything. This paper provides the
perfect blend of creative freedom and data governance that comes from leveraging the power of
SAS Visual Analytics and the familiarity of Microsoft Office.
support.sas.com/murphy
Infographics Powered by SAS Visual Analytics and SAS Office Analytics
Travis Murphy, SAS Institute Inc.
Excerpt from SAS Global Forum 2016 Proceedings
ABSTRACT
A picture is worth a thousand words, but what if there are a billion words? This is where the picture
becomes even more important, and this is where infographics step in. Infographics are a representation
of information in a graphic format designed to make the data easily understandable, at a glance, without
requiring deep knowledge of the data. Because of the amount of data available today, more
infographics are being created to communicate the information and insight from all available data, both in
the boardroom and on social media. This session shows you how to create information graphics that can
be printed, shared, and dynamically explored with objects and data from SAS Visual Analytics. Connect
your infographics to the high-performance analytical engine from SAS for repeatability, scale, and
performance on big data and for ease of use. You see how to leverage elements of your corporate
dashboards and self-service analytics while communicating subjective information and adding the context
that business teams require, in a highly visual format. This session looks at how SAS Office Analytics
enables a Microsoft Office user to create infographics for all occasions. You learn a workflow that lets you
get the most from your SAS Visual Analytics system without having to code anything. You will leave this
session with the perfect blend of creative freedom and data governance that comes from leveraging the
power of SAS Visual Analytics and the familiarity of Microsoft Office.
INTENDED AUDIENCE
This paper is aimed at SAS Visual Analytics users who create and design reports and dashboards for
their users. Managers can use this paper to determine what the teams can create and design with SAS
Visual Analytics and SAS Office Analytics. This paper is for beginner and intermediate SAS users.
INTRODUCTION
The world is at a point where the attention span of a consumer is only about eight seconds. If something
doesn't grab their attention, chances are they will move on. Not only will they move on, but most of the
time they won't come back. This shift has driven massive growth in the use of infographics to quickly
show data-driven visuals for immediate impact. There are now infographics about how many infographics
there are. There is a reason for this growth in infographics.
Now you can circumvent written language to a large extent. A lot of printed words are there to
describe things that occur spatially. In many cases a picture is worth a thousand words. Now we can
generate these pictures and graphics and we can convey them to other people very easily. I think
it's inevitable that visual media are going to become more important in conveying ideas and not just
about raging fires.
Marcel Just, Center for Cognitive Brain Imaging at Carnegie Mellon University, 2010
As the quote implies, data visualization is a more effective way of getting a reaction from the audience.
Analysts need to adapt to this shift in the audience's attention span and combine visuals with the massive
amount of available data. We need to combine infographics with big data to take advantage of the
opportunities that it presents. Infographics fall into two broad categories: artistic and
business. The artistic infographic is one in which graphic designers take information from authors and
create a very artistic visual. This visual could be hung on the wall as a poster or put on the front page to
support a headline. The second is the business infographic. This is a very structured visual and an extension
of the dashboard concept. A dashboard can be an infographic as well. However, dashboards are generally more
about data than about subjective content, which often makes them a one-stop shop for self-service
business intelligence. Both styles of infographics have their place, and when used correctly, they can
have a great impact on the intended audience. The business infographic, which is the focus of this paper,
combines artistic elements, such as clip-art background images, with data-driven content from the
corporate data warehouse or big data platform. Traditionally, the creator needed to spend a large amount
of time with tools such as Microsoft Excel or with programming languages to develop and craft the correct
data to support the right visual. This is not so anymore.
Analysts do not have to write code to get great visualization from the data. They don't have to spend
hours using scripts and spreadsheets to craft the data into a usable format. The analyst can start to be
more creative on the visual layer and get better visualization in the infographic. This paper is not focused
on the theory of design for infographics. I will leave that to the graphic designers, data journalists, and
visual artists of the world. This paper focuses on the SAS analytic engine and associated software to
make extremely powerful tools available to the business analyst to better create infographics.
The boardroom, just like the classroom, has a much shorter attention span to consume information and to
get to the AHA moment as fast as possible. The emergence and growth of many data visualization tools
have placed infographics and data visualization at the forefront when considering business intelligence
solutions. SAS has been a leader in data visualization for 40 years. Over the years, the tools have
improved to be more approachable for more people within an organization. It is proven that data
presented visually is more easily processed by the brain than data presented in tabular format or words
alone. Also, the number of infographics has increased dramatically over the past five years, and
according to Google Trends, this number continues to rise. The only caveat is that pictures and
visualizations themselves should not impede the message or the clarity of the information being
communicated. I know from personal experience, working on many data warehouse projects, that all
stakeholders, whether executive or line manager, need to be guided on a path, a repeatable path, to the
insight that is being communicated from the data platform.
What has changed is the vast increase in data: the volume, variety, and velocity. The Internet of Things
and surrounding applications are guaranteeing the continued growth in data assets. Data visualization
will make the difference between noticing a pattern and missing it altogether. In addition, to capture the
audience, business analysts now need to accomplish in eight seconds what they used to do across multiple
dashboard pages. Business analysts must provide insight and create a reason for the audience to
click through to more detailed information. This is where the business infographic comes in.
Repeatable: You can refresh each of these visuals every week or month without any other rework, so you
never have to start from the beginning.
Scalable: You can scale to massive amounts of data, leveraging the compute power of the SAS analytic
engine underneath what looks like a simple summary or individual gauge.
Approachable: No code is required. This approach leverages skills that an everyday user already has,
including using Microsoft Office, the SAS drag-and-drop user interface, and SAS Visual Analytics.
It is important to understand what technology is being used to achieve this unified approach: the SAS
Enterprise Analytics platform, office productivity tools, and your own design creativity.
Figure 4. SAS Ribbon in Microsoft Office Showing SAS Visual Analytics Add-In for Office (PowerPoint in Office 2016)
THE APPROACHES
The approach for creating infographics with SAS is to build repeatable, dashboard-style business
infographics rather than artistic infographics, which can look great as posters on the kitchen wall but can
take months to create or update. I have backed a few of these artistic infographic posters on Kickstarter
myself. Here, by contrast, the aim is to create a business-ready environment that brings some of the
benefits of infographics to enterprise stakeholders or to the C-level executive in the boardroom.
We need to keep innovating in how we capture and keep the attention of stakeholders, and this is an
option that you can use right now.
Here are three options for incorporating SAS analytics into an infographic.
not the focus of this paper. I will leave this skill to the other SAS experts who have done a heap of work in
this area, like Rob Allison, who is prolific in providing working examples of the art of the possible with
SAS/GRAPH as a data visualization engine. An example of Rob's work can be seen in Figure 7 (Allison
2013). I find that this option works best when you have white-boarded your infographic design and have
a clear vision of where all of the elements will live. This option can run in batch to create files such as
HTML and PDF, or it can be packaged as a SAS Stored Process and run on demand.
Figure 7. Infographic Created by Rob Allison Using SAS/GRAPH with Built-In Links to Drill Through to Details
SAS Enterprise Guide is not covered in this paper, but it is an important part of the SAS Office Analytics
suite. Reports that you send from a SAS Enterprise Guide flow to a Microsoft Office document can be
refreshed, and when you open them in Microsoft Office, they keep their links to the original SAS source.
This aids repeatability and ease of use for your infographics.
MY APPROACH
Of the three options, I have chosen to focus on option 2. I will be leveraging graphics created in SAS
Visual Analytics and inserting them into my infographic template inside PowerPoint and Excel. This
requires no coding skills at all, and it is the most approachable option to get started today.
To do this correctly, I need to do some work inside SAS Visual Analytics first to create the data-driven
objects for my infographics. Of course, I could just open my favorite SAS Visual Analytics report and grab
any object, but then my infographic's look and feel would not be optimized. Instead, I use SAS Visual
Analytics Designer to create report elements that are optimized for this highly visual design.
Choose a subject area and keep your focus on it as you step through the following information. I
chose to focus on the sales data for my organization: a fictitious toy company whose sample data ships
with SAS Visual Analytics. This focus enables me to maximize my efforts as I consider the visual
elements.
Figure 11. Example of external media to create your infographic
It is now time to unify these visual elements into the infographic itself. If you are anything like me, this can
take a long time to get just right. Many iterations of design and trial and error will probably occur before
you have the desired layout and information to tell the narrative that you want from your data.
As Figure 12 shows, you can navigate to all of your SAS Visual Analytics reports and analyses and insert
the desired elements into your infographic. It could not be easier, and don't forget that the computation
and number-crunching happen on the SAS platform, not on your laptop.
Figure 12. SAS Add-In for Microsoft Office Navigating to SAS Report from Microsoft Excel
Once you have added all of the desired content from SAS, you can add your own commentary and
narrative to tell the story that you want in your infographic. Now you are ready to see the end result. I
use the built-in Save As option to save to PDF, which is a publication- and print-ready format that you can
share without allowing others to change the data. If I do want others to edit the design, I can simply email
the document itself. However, they cannot update the data unless they have the required security
permissions in the SAS analytics engine. This provides a great balance of creative freedom and data
governance.
Figure 13 shows the final product of the infographics created with SAS Visual Analytics and SAS Office
Analytics. These infographics show highlights such as the top three sales performers and sales by year
for the past three years. The focus of these infographics is sales and company performance data, and
they show sales performance at a glance.
Figure 13. Final Infographics from PowerPoint and Excel
Turn on the background and wall transparencies in the SAS Visual Analytics report. This makes little
visible difference in SAS Visual Analytics itself, but when you insert these objects into the infographic,
they take on the background color of your layout in Microsoft Office.
Add final touches.
If some visual elements cannot be switched off in an object, cover them with solid-fill shapes in the
background color so that the objects blend in. This helps keep your infographic precise.
Consider adding a link to the SAS Visual Analytics server in your design. This is a great way to unite the
infographic user with the live dashboard. This can drive your audience to the self-service analytics platform.
CONCLUSION
The aim of this paper is to introduce a new use case for your trusted software from SAS, specifically SAS
Visual Analytics and SAS Add-In for Microsoft Office. I hope that this paper has provided some ideas on
how you could achieve data-driven visualization in your business. I know that a step-by-step guide to
create what is outlined in this paper would be a great asset. As an alternative, I plan to provide a video of
creating the infographics, and I will post it to the SAS Communities site following SAS Global Forum
2016. I encourage you to share your creations on the SAS Communities SAS Visual Analytics site.
Remember that infographics and data visualizations need to include narrative or context to meet the
needs of the audience, not just show aesthetics. You can have repeatable infographics with governed
access to analytics resources within your enterprise. This leverages the power of big data and the SAS
analytics engine with your familiar Microsoft Office productivity tools that you use every day.
REQUIRED SOFTWARE
Support documentation is available at http://support.sas.com/software/products/addin/index.html.
Here are the specific software versions used in this paper:
Microsoft Office 2016 and 2013
SAS Visual Analytics 7.3
SAS Office Analytics, SAS Visual Analytics Add-In for Office (with limited features from SAS Office
Analytics), and SAS Add-In 7.12 for Microsoft Office
SAS Visual Analytics Add-In for Office is available in Microsoft PowerPoint and Excel and works only with
SAS Visual Analytics.
REFERENCES
Tukey, John W. 1977. Exploratory Data Analysis. Reading, Massachusetts: Addison-Wesley.
SAS Institute Inc. 2015. SAS Visual Analytics: User's Guide. Cary, NC: SAS Institute Inc. Available at
http://support.sas.com/documentation/onlinedoc/va/index.html.
McSpadden, Kevin. 2015. "You Now Have a Shorter Attention Span than a Goldfish." Time.com. Available
at http://time.com/3858309/attention-spans-goldfish/.
Nieman Reports. 2010. "Watching the Human Brain Process Information." Available at
http://niemanreports.org/articles/watching-the-human-brain-process-information/.
Allison, Robert. 2013. "Creating fancy 'infographics' with SAS." SAS Learning Post (blog). Available at
http://blogs.sas.com/content/sastraining/2013/04/11/creating-fancy-infographics-with-sas/.
RECOMMENDED READING
Bailey, D., Tim Beese, and Casey Smith. 2015. "Take Your Data Analysis and Reporting to the Next
Level by Combining SAS Office Analytics, SAS Visual Analytics, and SAS Studio." Proceedings of
the SAS Global Forum 2015 Conference. Cary, NC: SAS Institute Inc. Available at
https://support.sas.com/resources/papers/proceedings15/SAS1804-2015.pdf.
Bailey, D., Anand Chitale, and I-Kong Fu. 2014. "Share Your SAS Visual Analytics Reports with SAS
Office Analytics." Proceedings of the SAS Global Forum 2014 Conference. Cary, NC: SAS Institute Inc.
Available at http://support.sas.com/resources/papers/proceedings14/.
Devarajan, R., H. Patel, P. Berryman, and L. Everdyke. 2014. "Create Custom Graphs in SAS Visual
Analytics Using SAS Visual Analytics Graph Builder." Proceedings of the SAS Global Forum 2014
Conference. Cary, NC: SAS Institute Inc. Available at
http://support.sas.com/resources/papers/proceedings14/SAS346-2014.pdf.
SAS Institute Inc. SAS Visual Analytics: Video Library. Cary, NC: SAS Institute Inc. Available at
http://support.sas.com/training/tutorial/va73/.
SAS Institute Inc. 2015. SAS Visual Analytics: User's Guide. Cary, NC: SAS Institute Inc. Available at
http://support.sas.com/documentation/cdl/en/vaug/67500/PDF/default/vaug.pdf.
SAS Institute Inc. "SAS Visual Analytics Community." SAS Support Communities. Cary, NC: SAS Institute
Inc. Available at https://communities.sas.com/community/support-communities/sas-visual-analytics.
Allison, Robert. 2015. "Robert Allison's SAS/Graph InfoGraphics!" Available at
http://robslink.com/SAS/democd_infographics/aaaindex.htm.
SAS Institute Inc. 2016. "SAS Office Analytics Fact Sheet." Cary, NC: SAS Institute Inc. Available at
http://www.sas.com/content/dam/SAS/en_us/doc/factsheet/sas-office-analytics-105595.pdf.
CONTACT INFORMATION
Your comments and questions are valued and encouraged. Contact the author at:
Travis Murphy
SAS Institute Australia and New Zealand
300 Burns Bay Road
Lane Cove, Sydney, NSW Australia 2066
Travis.Murphy@sas.com
SAS Visual Statistics 8.1: The New Self-Service Easy Analytics Experience
Xiangxiang Meng, Cheryl LeSaint, Don Chapman, SAS Institute Inc.
This is an excerpt from the SAS Global Forum 2016 Proceedings. For more SUGI and SAS Global Forum
Proceedings, visit the online versions of the Proceedings.
When you are analyzing data, it is important that you can easily identify relationships between the
variables. By identifying relationships, you can make predictions for variables of interest. A
common variable of interest is a binary variable. Should a person be admitted to a program? Is
this transaction fraudulent? This paper examines data on whether a customer cancels his or her
subscription, which is also known as churn. The churn variable's relationship with other
information about the customer's account is examined in a variety of ways throughout this paper.
The exploratory data analysis and feature engineering all build up to identifying significant
predictors of churn through a logistic regression. The combination of SAS Visual Analytics 8.1
and SAS Visual Statistics 8.1 brings data analysis and reporting to your fingertips.
support.sas.com/meng
Excerpt from SAS Global Forum 2016 Proceedings
ABSTRACT
In today's business intelligence world, self-service, which allows an everyday knowledge worker to
explore data and personalize business reports without being tech-savvy, is a prerequisite. The new
release of SAS Visual Statistics introduces an HTML5-based, easy-to-use user interface that combines
statistical modeling, business reporting, and mobile sharing into a one-stop self-service shop. The
backbone analytic server of SAS Visual Statistics is also updated, allowing an end user to analyze data of
various sizes in the cloud. The paper illustrates this new self-service modeling experience in SAS Visual
Statistics using telecom churn data, including the steps of identifying distinct user subgroups using
a decision tree, building and tuning regression models, designing business reports for customer churn, and
sharing the final modeling outcome on a mobile device.
INTRODUCTION
When analyzing data, it is important to be able to easily identify relationships between the variables. By
identifying relationships, you are able to make predictions for variables of interest. A common variable of
interest is a binary variable. Should a person be admitted to a program? Is this transaction fraudulent?
This paper examines data on whether a customer cancels his or her subscription, which is also known as
churn. The churn variable's relationship with other information about the customer's account is examined
in a variety of ways throughout this paper. The exploratory data analysis and feature engineering all build
up to identifying significant predictors of churn through a logistic regression. The combination of SAS
Visual Analytics and SAS Visual Statistics in the 8.1 release brings data analysis and reporting to your
fingertips. The 8.1 release of SAS Visual Analytics and SAS Visual Statistics was pre-production when
this paper was authored; therefore, details are subject to change.
MODERN DESIGN
The modern user experience supports the modeling needs of statisticians and data scientists as well as
the reporting needs of business analysts. The application is designed to run in your browser and is written
entirely in HTML5. This opens up options for developing models away from the desktop.
The underlying infrastructure has been rewritten to meet the deployment demands of the future. It
supports more versatile deployment options, ranging from on-site to in-the-cloud deployments, by
leveraging microservices that meet the scaling and reliability needs expected in today's world. You can
also take advantage of the next-generation in-memory analytics server from SAS, which unifies all the
analytics procedures into a single server.
Data access is built into SAS Visual Statistics. You have full access to enterprise data stored in Hadoop
or in the corporate database. You also have the ability to upload your own data stored in plain text files,
Excel spreadsheets, or a SAS data set. This level of self-service, along with a rich set of data
manipulation capabilities, enables the feature engineering expected by sophisticated users.
USER EXPERIENCE
SAS Visual Analytics and SAS Visual Statistics have been seamlessly integrated, making all modeling
work immediately accessible in a report. Documenting and sharing models is easy: you just save your
work. This work can be viewed on mobile devices using the SAS Visual Analytics mobile viewers.
Often the generation of a report is secondary to interactively building models and generating score code.
SAS Visual Statistics does not require report layout; it is there only if needed. It also provides the tools to
compare two models side-by-side, or to evaluate them interactively using the model comparison task.
Figure 2 provides a basic introduction to the layout of the user interface of SAS Visual Analytics 8.1. SAS
Visual Statistics is an add-on to SAS Visual Analytics, which provides additional statistical modeling tasks
in the Content menu.
Figure 2. User Interface of SAS Visual Analytics 8.1
CASE STUDY USING TELECOM CHURN DATA
DATA DESCRIPTION
The examples in this paper are derived from telecommunications data. The variable of interest is
whether customers cancel their service, also known as churn. The data set is available from the
UCI Machine Learning Repository. It contains 3,333 observations, where each row contains the
information collected for one customer account. Table 1 provides a brief description of the variables in
this data set.
Variable Name | Description
Churn | Label of churn (Yes/No)
Day_Calls, Day_Charge, Day_Mins | Total day calls, charges, and minutes
Eve_Calls, Eve_Charge, Eve_Mins | Total evening calls, charges, and minutes
Night_Calls, Night_Charge, Night_Mins | Total night calls, charges, and minutes
Intl_Calls, Intl_Charge, Intl_Mins | Total international calls, charges, and minutes
Account_Length | Length of account before churn
CustServ_Calls | Total number of customer service calls
State | U.S. state
Intl_Plan | Indicator for international plan
VMail_Message | Number of voice mail messages
VMail_Plan | Indicator for voice message plan
Table 1. Description of the Telecom Churn Data
FEATURE ENGINEERING
Feature engineering is an important step in improving the performance of a statistical or machine learning
model, and it is widely recognized as the most manual and time-consuming part of the learning process.
SAS Visual Analytics 8.1 provides several tools to create new features from existing columns. These
features are created on demand and do not add to the disk or memory footprint. Within SAS Visual
Analytics 8.1, you can do the following tasks:
1. Determine whether a variable should be used as a category or a measure.
2. Create a new hierarchy using a set of categorical variables.
3. Create a custom category that represents a grouping of the levels of a categorical variable with high
cardinality.
4. Create a calculated item using user-specified formulas.
Most of these options are available in the Add menu of the Data pane, as shown in Figure 4.
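To make the idea of a custom category concrete, here is a minimal Python sketch. Python is used only for illustration; SAS Visual Analytics builds custom categories interactively, with no code. The function `custom_category`, the region names, and the convention of putting unassigned levels into an "Other" bucket are hypothetical assumptions, not the product's behavior.

```python
from typing import Dict, Iterable, List


def custom_category(values: Iterable[str],
                    groups: Dict[str, List[str]],
                    other_label: str = "Other") -> List[str]:
    """Map each level to the name of the group that contains it.

    Levels that appear in no group get `other_label`
    (an assumed convention for this sketch).
    """
    lookup: Dict[str, str] = {}
    for group_name, members in groups.items():
        for level in members:
            if level in lookup:
                raise ValueError(f"level {level!r} is assigned to two groups")
            lookup[level] = group_name
    return [lookup.get(v, other_label) for v in values]


# Hypothetical grouping of a few STATE levels into regions.
regions = {"Northeast": ["MA", "NJ", "NY"], "West": ["CA", "NV", "UT", "WA"]}
print(custom_category(["MA", "TX", "CA"], regions))  # ['Northeast', 'Other', 'West']
```

Rejecting a level that appears in two groups keeps the mapping unambiguous, so every row lands in exactly one category.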
Figure 6. Treemap of Churn Rates across States Using the Calculated Churn=Yes Column
The data contains only total charges for each type of call (Day, Evening, Night, and International). You
can easily derive other features, such as the average charge per call, the total number of domestic calls,
the total domestic charge, and so on. Each expression below defines one calculated item, named in the
comment:
'Day_Charge'n / 'Day_Calls'n                        /* Day_Avg_Charge */
'Eve_Charge'n / 'Eve_Calls'n                        /* Eve_Avg_Charge */
'Night_Charge'n / 'Night_Calls'n                    /* Night_Avg_Charge */
'Intl_Charge'n / 'Intl_Calls'n                      /* Intl_Avg_Charge */
'Day_Calls'n + 'Eve_Calls'n + 'Night_Calls'n        /* Total_Domestic_Call */
'Day_Charge'n + 'Eve_Charge'n + 'Night_Charge'n     /* Total_Domestic_Charge */
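The expressions above divide a charge by a call count, so an account with zero calls in a period has no defined average. The following Python sketch computes the same six derived features for one account and returns `None` for a zero denominator. The helper names (`safe_ratio`, `derive_call_features`), the sample account, and the `None`-for-zero convention are assumptions for illustration. This paper does not say how SAS Visual Analytics itself handles a zero denominator.

```python
from typing import Dict, Optional


def safe_ratio(numerator: float, denominator: float) -> Optional[float]:
    """Return numerator / denominator, or None (missing) when the denominator is 0."""
    return numerator / denominator if denominator != 0 else None


def derive_call_features(row: Dict[str, float]) -> Dict[str, Optional[float]]:
    """Compute the six derived features defined by the calculated items above."""
    features: Dict[str, Optional[float]] = {}
    for period in ("Day", "Eve", "Night", "Intl"):
        features[f"{period}_Avg_Charge"] = safe_ratio(
            row[f"{period}_Charge"], row[f"{period}_Calls"])
    features["Total_Domestic_Call"] = (
        row["Day_Calls"] + row["Eve_Calls"] + row["Night_Calls"])
    features["Total_Domestic_Charge"] = (
        row["Day_Charge"] + row["Eve_Charge"] + row["Night_Charge"])
    return features


# One made-up account with no international calls.
account = {"Day_Calls": 100, "Day_Charge": 45.0, "Eve_Calls": 80, "Eve_Charge": 16.0,
           "Night_Calls": 90, "Night_Charge": 9.0, "Intl_Calls": 0, "Intl_Charge": 0.0}
print(derive_call_features(account))
```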
Note that deriving multiple calculated items or deriving calculated items multiple times does not require
additional disk storage or data passes. The definitions of the calculated items are attached to the
in-memory data source and are computed only when they are used by a visualization or model. Figure 7
shows a scatter plot of the two calculated items DAY_AVG_CHARGE and EVE_AVG_CHARGE. You can
easily identify data anomalies: one account has an extremely high average evening charge, and a few
accounts have high average day charges, the majority of which are churners.
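The on-demand behavior described above can be modeled as a table that stores calculated-item definitions as functions and evaluates them only when a column is requested. This Python sketch is a hypothetical model: the `InMemoryTable` class and its methods are invented for illustration and are not the SAS in-memory engine's API. The `evaluations` counter only makes the deferred computation visible.

```python
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]


class InMemoryTable:
    """Hypothetical table that stores calculated-item definitions, not values."""

    def __init__(self, rows: List[Row]) -> None:
        self.rows = rows
        self.calculated: Dict[str, Callable[[Row], Any]] = {}
        self.evaluations = 0  # how many times a formula has been applied to a row

    def add_calculated_item(self, name: str, formula: Callable[[Row], Any]) -> None:
        # Registering a definition reads no rows and stores no new column.
        self.calculated[name] = formula

    def column(self, name: str) -> List[Any]:
        # Values are computed only when a visualization or model requests them.
        if name in self.calculated:
            formula = self.calculated[name]
            self.evaluations += len(self.rows)
            return [formula(row) for row in self.rows]
        return [row[name] for row in self.rows]


table = InMemoryTable([{"Day_Charge": 45.0, "Day_Calls": 100},
                       {"Day_Charge": 30.0, "Day_Calls": 60}])
table.add_calculated_item("Day_Avg_Charge", lambda r: r["Day_Charge"] / r["Day_Calls"])
print(table.evaluations)                  # 0: nothing computed yet
print(table.column("Day_Avg_Charge"))     # computed now, on request
print("Day_Avg_Charge" in table.rows[0])  # False: no stored column
```

Because only the formula is stored, defining more calculated items costs nothing until one is actually used.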
Figure 7. Scatter Plot of the Calculated Average Day and Evening Charges
DATA SEGMENTATION
The geo map in Figure 3 and the treemap in Figure 6 show that the churn rate varies across states. This
suggests that STATE is a useful factor for predicting churn. However, the STATE column is a
high-cardinality variable with 51 levels, and you might not want to use it in a model directly. SAS Visual
Statistics provides several methods for dimension reduction and data segmentation. For example, you
can use the decision tree model in SAS Visual Statistics to group the levels of a high-cardinality variable
into several leaves based on a target variable, as shown in Figure 8.
In this example, a decision tree model is built with CHURN as the response variable and STATE as the
only predictor. Decision trees are widely used in many applications, such as predictive modeling, data
segmentation, and outlier detection. Each application requires different tree parameter settings. For data
segmentation, you often need a smaller tree to ensure that each leaf of the tree has enough observations.
Figure 8 shows a two-level decision tree with four branches. The tooltip shows that the third node (Node
ID = 3) is the data segment with the highest churn rate (19.44 percent) and contains the following states:
CO, IN, KS, MA, MD, MI, MT, NC, NJ, NV, TX, UT, and WA.
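One simple way to group the levels of a single categorical predictor for a binary target is sketched below in Python. The sketch sorts the levels by event rate, picks the split point along that ordering that most reduces Gini impurity, and repeats once on each side to get up to four leaves. For a binary target and the Gini criterion, the best binary split of a categorical predictor is known to lie along this ordering, which is why the sketch only scans it. This is a stand-in technique, not the splitting algorithm that SAS Visual Statistics uses. The functions, the Gini criterion, and the `min_leaf` parameter (which plays the role of ensuring that each leaf has enough observations) are assumptions. The input maps each level to a hypothetical `(churners, accounts)` count.

```python
from typing import Dict, List, Optional, Tuple

Counts = Dict[str, Tuple[int, int]]  # level -> (events, total rows); total must be >= 1


def weighted_gini(events: int, total: int) -> float:
    """Gini impurity of a node, weighted by its number of rows."""
    if total == 0:
        return 0.0
    p = events / total
    return 2.0 * p * (1.0 - p) * total


def best_split(counts: Counts, min_leaf: int) -> Optional[Tuple[List[str], List[str]]]:
    """Best binary split of the levels along the ordering by event rate, or None."""
    levels = sorted(counts, key=lambda k: (counts[k][0] / counts[k][1], k))
    ev_all = sum(e for e, _ in counts.values())
    n_all = sum(n for _, n in counts.values())
    best, best_cost = None, weighted_gini(ev_all, n_all)
    ev_left = n_left = 0
    for i in range(len(levels) - 1):
        e, n = counts[levels[i]]
        ev_left, n_left = ev_left + e, n_left + n
        if n_left < min_leaf or n_all - n_left < min_leaf:
            continue  # one side would be too small to be a leaf
        cost = weighted_gini(ev_left, n_left) + weighted_gini(ev_all - ev_left, n_all - n_left)
        if cost < best_cost - 1e-12:
            best, best_cost = (levels[: i + 1], levels[i + 1:]), cost
    return best


def segment(counts: Counts, depth: int, min_leaf: int) -> List[List[str]]:
    """Split recursively `depth` times; return each leaf as a sorted list of levels."""
    split = best_split(counts, min_leaf) if depth > 0 else None
    if split is None:
        return [sorted(counts)]
    leaves: List[List[str]] = []
    for side in split:
        leaves += segment({k: counts[k] for k in side}, depth - 1, min_leaf)
    return leaves


# Hypothetical (churners, accounts) counts for four levels.
counts = {"A": (1, 100), "B": (2, 100), "C": (20, 100), "D": (22, 100)}
print(segment(counts, depth=2, min_leaf=50))   # [['A'], ['B'], ['C'], ['D']]
print(segment(counts, depth=2, min_leaf=150))  # [['A', 'B'], ['C', 'D']]
```

Raising `min_leaf` stops the second level of splitting, which mirrors the advice above to keep segmentation trees small enough that every leaf has enough observations.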
Figure 8. A Two-level Four-branch Decision Tree for Data Segmentation
It is often desirable to use the data segmentation from a decision tree model in other models. With SAS
Visual Statistics, you can derive a leaf ID column that records the leaf assignment of each observation.
Figure 9 shows the right-click menu (on a mobile device, press and hold to open this menu) for deriving a
leaf ID variable. For this use case, the leaf ID contains four values (1, 2, 3, 4) that represent the grouping
of the 51 states into four segments with different churn rates (CHURN = Yes).
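Deriving a leaf ID amounts to a lookup from each STATE value to its leaf. After the lookup, you can summarize the target within each leaf. The Python sketch below uses the leaf 3 membership reported in Figure 8. The leaf assignments of the other 38 levels are not listed in this paper, so the example adds one hypothetical entry. The function names and the sample rows are illustrative assumptions.

```python
from typing import Dict, List, Optional, Sequence

# Leaf 3 membership as reported in Figure 8.
LEAF_3_STATES = ["CO", "IN", "KS", "MA", "MD", "MI", "MT", "NC", "NJ", "NV", "TX", "UT", "WA"]


def derive_leaf_id(states: Sequence[str], lookup: Dict[str, int]) -> List[Optional[int]]:
    """Return the leaf ID for each row (None for a level the tree never saw)."""
    return [lookup.get(s) for s in states]


def churn_rate_by_leaf(leaf_ids: Sequence[Optional[int]],
                       churn: Sequence[str]) -> Dict[int, float]:
    """Share of rows with Churn = 'Yes' in each leaf."""
    totals: Dict[int, List[int]] = {}
    for leaf, label in zip(leaf_ids, churn):
        if leaf is None:
            continue
        events_total = totals.setdefault(leaf, [0, 0])
        events_total[0] += 1 if label == "Yes" else 0
        events_total[1] += 1
    return {leaf: e / n for leaf, (e, n) in totals.items()}


# Hypothetical lookup: leaf 3 from Figure 8 plus one made-up assignment.
lookup = {state: 3 for state in LEAF_3_STATES}
lookup["HI"] = 1
rows_state = ["TX", "TX", "HI", "WA"]
rows_churn = ["Yes", "No", "No", "Yes"]
leaf_ids = derive_leaf_id(rows_state, lookup)
print(leaf_ids)  # [3, 3, 1, 3]
print(churn_rate_by_leaf(leaf_ids, rows_churn))
```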
Figure 10. Editable Tree-based Segmentation
Deriving a leaf ID variable is one way of saving the results of a model. All models in SAS Visual Statistics
allow you to save their analytical content for later use. You can do the following tasks:
1. Save the model as part of the report. If you open a saved model and the underlying data source has
been updated, the model is automatically retrained.
2. Save a footprint of the model as SAS DATA step code (score code). You can use the score code to
score a new data source for either prediction or validation purposes.
3. Derive new columns from the model. These columns are attached to the currently loaded table and
can be used in any other models or report objects.
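Score code is essentially a frozen copy of the fitted model that can be applied to new rows. For a logistic regression, scoring means computing the linear predictor and passing it through the logistic function. The Python sketch below illustrates that idea only. SAS Visual Statistics exports SAS DATA step code, and the `make_scorer` function, the predictor used, and the coefficient values are hypothetical, not results from this paper.

```python
import math
from typing import Callable, Mapping


def make_scorer(intercept: float,
                coefficients: Mapping[str, float]) -> Callable[[Mapping[str, float]], float]:
    """Freeze fitted logistic-regression parameters into a scoring function.

    The returned function computes P(Churn = Yes) = 1 / (1 + exp(-eta)),
    where eta = intercept + sum(coefficient * value).
    """
    coef = dict(coefficients)  # copy, so later changes do not alter the scorer

    def score(row: Mapping[str, float]) -> float:
        eta = intercept + sum(b * row[name] for name, b in coef.items())
        if eta >= 0:  # numerically stable form of the logistic function
            return 1.0 / (1.0 + math.exp(-eta))
        z = math.exp(eta)
        return z / (1.0 + z)

    return score


# Hypothetical parameters, not estimates from this paper.
score = make_scorer(-3.0, {"CustServ_Calls": 0.5})
for calls in (0, 6, 10):
    print(calls, round(score({"CustServ_Calls": calls}), 3))
```

Like exported score code, the scorer needs only the frozen parameters, not the training data, so it can be applied to any new table that has the same input columns.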
Second, we link both the bar chart and the logistic regression to the button bar control. This is done by
creating a new Filter Action from the button bar, as shown in Figure 12.
Figure 14. Response Distribution and Logistic Regression Model for the Leaf ID = 3 Data Segment
The Fit Summary plot of the logistic regression shows which variables are significant predictors of
whether a customer will cancel (churn). Whether the customer has an international plan (Intl_Plan), the
total number of customer service calls (CustServ_Calls), the average charge for calls during the day
(Day_Avg_Charge), the total number of calls during the day (Day_Calls), and whether the customer has a
voice message plan (VMail_Plan) are all significant. This aligns with some of the exploratory data analysis.
The box plots showed more separation between churners and non-churners for CustServ_Calls than for
Intl_Calls. The scatter plot already suggested that Day_Avg_Charge matters for churn: large values of this
variable are associated with customers who churn. These significant predictors are good candidates to
focus on when attempting to reduce customer churn.
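The significance shown in the Fit Summary comes from the fitted logistic regression. As a rough illustration of where such p-values come from, the Python sketch below fits a logistic regression by Newton-Raphson and reports Wald p-values. It is a self-contained teaching sketch, not the algorithm or the significance measure that SAS Visual Statistics uses. It assumes that the data are not perfectly separable, because otherwise the estimates diverge. The data in the example are made up.

```python
import math
from typing import List, Sequence, Tuple


def _sigmoid(eta: float) -> float:
    """Numerically stable logistic function."""
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    z = math.exp(eta)
    return z / (1.0 + z)


def _solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        pivot_value = m[col][col]
        m[col] = [v / pivot_value for v in m[col]]
        for r in range(n):
            if r != col:
                factor = m[r][col]
                m[r] = [vr - factor * vc for vr, vc in zip(m[r], m[col])]
    return [m[i][n] for i in range(n)]


def _gradient_and_information(rows: List[List[float]], y: Sequence[int],
                              beta: List[float]) -> Tuple[List[float], List[List[float]]]:
    """Score vector and Fisher information of the log-likelihood at beta."""
    k = len(beta)
    grad = [0.0] * k
    info = [[0.0] * k for _ in range(k)]
    for r, yi in zip(rows, y):
        p = _sigmoid(sum(b * v for b, v in zip(beta, r)))
        w = p * (1.0 - p)
        for i in range(k):
            grad[i] += (yi - p) * r[i]
            for j in range(k):
                info[i][j] += w * r[i] * r[j]
    return grad, info


def fit_logistic(x: Sequence[Sequence[float]], y: Sequence[int],
                 max_iter: int = 50, tol: float = 1e-10) -> List[Tuple[float, float, float]]:
    """Fit P(y = 1) = sigmoid(b0 + b1*x1 + ...) by Newton-Raphson.

    Returns (estimate, standard error, two-sided Wald p-value) for the
    intercept followed by each predictor.
    """
    rows = [[1.0] + [float(v) for v in r] for r in x]  # prepend intercept column
    beta = [0.0] * len(rows[0])
    for _ in range(max_iter):
        grad, info = _gradient_and_information(rows, y, beta)
        step = _solve(info, grad)  # Newton step: info^-1 @ grad
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    _, info = _gradient_and_information(rows, y, beta)
    results = []
    for j, b in enumerate(beta):
        unit = [1.0 if i == j else 0.0 for i in range(len(beta))]
        se = math.sqrt(_solve(info, unit)[j])  # sqrt of j-th diagonal of info^-1
        z = b / se
        results.append((b, se, math.erfc(abs(z) / math.sqrt(2.0))))
    return results


# Made-up data: 1 of 4 churners without a plan, 3 of 4 with a plan.
has_plan = [[0]] * 4 + [[1]] * 4
churned = [1, 0, 0, 0, 1, 1, 1, 0]
for name, (est, se, p) in zip(["Intercept", "Plan"], fit_logistic(has_plan, churned)):
    print(f"{name:9s} estimate={est:+.4f}  se={se:.4f}  p={p:.3f}")
```

With a single binary predictor, the fitted slope equals the log odds ratio, here ln(9), which makes a quick sanity check for the sketch. With only eight rows, the Wald p-value for the plan effect is far from significant, which illustrates why significance depends on sample size as well as effect size.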
MOBILE VIEWER
Exploratory data analysis, model construction, and report building can all be done in SAS Visual
Analytics and SAS Visual Statistics through a web browser on a desktop client or mobile device. Once you
have settled on a final report that summarizes the findings of your analysis, many people can view it. A
saved report can be shared simply by opening it in SAS Visual Analytics Viewer. From SAS Visual Analytics
Viewer, users can view and interact with all pages in the report, email the report to others, and print
interesting results.
CONCLUSION
SAS Visual Analytics and SAS Visual Statistics 8.1 provide a unified platform for your analytic journey.
This paper uses churn data to demonstrate the new self-service experience and provides a working example
of exploring, manipulating, and modeling the churn data and building business reports from it, all in a
single user interface.
CONTACT INFORMATION
Your comments and questions are valued and encouraged. Contact the author at:
Xiangxiang Meng
SAS Institute Inc.
Xiangxiang.Meng@sas.com
Cheryl LeSaint
SAS Institute Inc.
Cheryl.LeSaint@sas.com
Don Chapman
SAS Institute Inc.
Don.Chapman@sas.com
Create Your First Graph: Visual Data Exploration with SAS ODS Graphics Designer
Sanjay Matange and Jeanette Bottitta
Excerpt from SAS ODS Graphics Designer by Example: A Visual Guide to Creating Graphs Interactively
The previous chapters of this book covered the domain of visual data analytics, with a focus on
big data. SAS Visual Analytics provides you with the right tools for this job. Often, however,
your data needs fall into the traditional category, where you are dealing with clinical or
pharmaceutical data on drug safety or clinical trials. Here too, visualizing the relationships
between the different categories in your data can lead to quick insights, and indicate the analyses
you may want to undertake.
The SAS ODS Graphics Designer application is a visual tool that enables you to visualize your
data without any programming knowledge. If you know your data and the graph you want to make,
you can take a point-and-click approach to building your graphs. The application generates
the code that you can later use to create reports.
Often, however, you get brand-new data and want quick graphical views of it. SAS ODS Graphics
Designer really shines here: you just pick the variables of interest, and the application creates
hundreds of possible graphs that can visually reveal correlations in your data. You can then use
these graphs as they are or customize them to your needs.
The SAS ODS Graphics Designer application can simplify your task of finding trends in your
data.
support.sas.com/matange
Jeanette Bottitta is a technical writer at SAS Institute, where she specializes in ODS Graphics
software. Jeanette has over 12 years of experience writing programming guides, including the
SAS ODS Graphics Procedures Guide. She has worked with SAS ODS Graphics Designer since
its initial release and enjoys its user-friendly interface.
support.sas.com/bottitta
Let's make a distribution plot of city gas mileage for all cars in the Sashelp.Cars data set.
Note: If the Graph Gallery is not displayed, select View > Graph Gallery.
1. On the Basic tab of the Graph Gallery, select the Histogram icon.
2. Click OK.
You can also double-click the Histogram icon. For future steps, this book uses the double-click
action when it is available.
Figure 3.2 Graph Gallery with Highlighted Histogram and Basic Tab
A graph window is displayed that includes a histogram plot. The Assign Data dialog box for the
histogram appears, showing the placeholder data assignments.
Note: Do not click OK in the Assign Data dialog box. You will change the data assignments
in the next step.
The placeholder data enables the designer to show a visual draft of the type of plot that has been
included in the graph. In this case, the designer uses the HEIGHT variable from the Sashelp.Class
data set. Using this placeholder data, the designer creates the histogram.
The Assign Data dialog box is discussed in more detail in Chapter 4. For now, it's sufficient to
know that you use this dialog box to change the data assigned to the plot and to the analysis
variable.
1. In the Assign Data dialog box, select CARS for Data Set.
The previous variable, HEIGHT, is not available in the new Sashelp.Cars data set, so the
Analysis variable setting has been cleared. Because an analysis variable is required, the
settings for the histogram are not complete. As a result, the plot identifier is shown in red
and the OK button is not available.
The designer creates the graph with a histogram and a placeholder title and footnote.
Note: Your graph might be a different size from the one shown here. For information about
the graph sizes used in this book, see "Graph Size" in "About This Book."
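Behind the histogram, the designer counts how many values of the analysis variable fall into equal-width bins. The Python sketch below shows that counting step. The bin width is a hypothetical parameter here because the designer chooses the bins for you, and the sample values are made up rather than taken from Sashelp.Cars.

```python
import math
from typing import List, Sequence, Tuple


def histogram(values: Sequence[float], bin_width: float) -> List[Tuple[float, float, int]]:
    """Count values in equal-width bins [low, low + bin_width).

    Bins start at a multiple of bin_width at or below the minimum value.
    Returns one (low_edge, high_edge, count) tuple per bin.
    """
    if bin_width <= 0:
        raise ValueError("bin_width must be positive")
    origin = math.floor(min(values) / bin_width) * bin_width
    n_bins = int((max(values) - origin) // bin_width) + 1
    counts = [0] * n_bins
    for v in values:
        # Clamp to guard against floating-point rounding at the bin edges.
        index = min(max(int((v - origin) // bin_width), 0), n_bins - 1)
        counts[index] += 1
    return [(origin + i * bin_width, origin + (i + 1) * bin_width, c)
            for i, c in enumerate(counts)]


# Made-up city mileage values, not the actual Sashelp.Cars data.
mpg_city = [10, 12, 15, 19, 20, 21, 28]
for low, high, count in histogram(mpg_city, bin_width=5):
    print(f"[{low}, {high}): {'#' * count}")
```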
1. Double-click the placeholder title ("Type in your title"). The placeholder text is
highlighted.
To set title properties, right-click on the title, and select Title Properties. The Text
Properties dialog box is displayed from which you can customize the visual properties
of the title, such as font, color, and so on. You can do the same for the footnote.
You can insert more titles and footnotes using the Insert menu.
For this example, let's add a normal density curve, a kernel density curve, and a fringe plot to the
graph. These plot types are compatible with the histogram and can be added to this graph.
1. In the Elements pane, click the Normal icon as shown in Figure 3.6. Drag and drop the
Normal icon onto the graph. The Assign Data dialog box for a normal density curve is
displayed.
Fit an existing plot is selected by default. With this option, the normal density curve
uses the same data settings as the histogram. There is only one plot currently in the
graph: the histogram. If there were more plots, you would have a choice of which
plot to use for the fit.
Because Fit an existing plot is selected, Analysis is not available. The newly added
plot must use the same data as the histogram. In this example, the normal density
curve is fitted with the same analysis variable as the histogram.
The Assign Data dialog box for a kernel density curve is displayed.
This dialog box is similar to the dialog box for the normal density curve. Fit an existing
plot is selected by default.
4. Keep the default selections and click OK. The kernel density curve is added to the graph.
5. In the Elements pane, click the Fringe icon. Drag and drop the Fringe icon onto the
graph.
6. Click OK.
The graph is updated with the new plots.
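The three plots added above summarize the same analysis variable in different ways. The Python sketch below computes a normal density curve from the sample mean and standard deviation, a Gaussian kernel density estimate, and the tick positions of a fringe plot. The estimation choices are assumptions for illustration, and the designer's own defaults (for example, its kernel bandwidth) may differ. The normal-reference bandwidth rule 1.06·s·n^(-1/5) is used here only as a stand-in, and the data values are made up.

```python
import math
from statistics import mean, stdev
from typing import List, Optional, Sequence


def normal_density_curve(values: Sequence[float], grid: Sequence[float]) -> List[float]:
    """Normal density using the sample mean and standard deviation of the data."""
    mu, sigma = mean(values), stdev(values)
    scale = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return [scale * math.exp(-0.5 * ((x - mu) / sigma) ** 2) for x in grid]


def kernel_density_curve(values: Sequence[float], grid: Sequence[float],
                         bandwidth: Optional[float] = None) -> List[float]:
    """Gaussian kernel density estimate evaluated on `grid`.

    Without an explicit bandwidth, the normal-reference rule
    1.06 * s * n**(-1/5) is used as a stand-in default.
    """
    n = len(values)
    h = bandwidth if bandwidth is not None else 1.06 * stdev(values) * n ** (-0.2)
    scale = 1.0 / (n * h * math.sqrt(2.0 * math.pi))
    return [scale * sum(math.exp(-0.5 * ((x - v) / h) ** 2) for v in values)
            for x in grid]


def fringe_positions(values: Sequence[float]) -> List[float]:
    """A fringe plot draws one short tick at each observed value on the x axis."""
    return sorted(values)


data = [24.0, 18.0, 30.0, 21.0, 20.0]        # made-up mileage values
grid = [10.0 + 0.5 * i for i in range(61)]   # 10.0 to 40.0
normal = normal_density_curve(data, grid)
kernel = kernel_density_curve(data, grid)
print(fringe_positions(data))  # [18.0, 20.0, 21.0, 24.0, 30.0]
```

The normal curve has only two parameters, so it stays symmetric whatever the data look like, while the kernel curve follows local bumps in the data. Comparing the two shapes is the point of overlaying them on the histogram.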
Figure 3.7 Graph with Normal Density Curve, Kernel Density Curve, and Fringe Plot
To make the two density curves easier to tell apart, you can modify the visual properties of the kernel
density curve. Select the kernel density curve, and then change its properties.
1. In the graph, select the kernel density curve. (The kernel density curve is the taller of the
two curves.)
The kernel density curve is selected and in bold. The other plots in the cell are dimmed.
2. Right-click on the curve, and then select Plot Properties. The Cell Properties dialog
box is displayed with the Plots tab selected.
3. Make sure that kernel is selected as Plot. If not, select kernel.
The style element currently assigned to the kernel density curve is GraphFit, as shown in
Style Element. (For more information about style elements, see Visual Properties of a
Graph in Chapter 4.)
5. Click OK.
Cell legend
is placed inside the data area of a cell. It is available from the Elements pane's Insets panel. By
default, this legend contains information for all the plots in that cell. It contains entries for
only the plots in that cell.
Global legend
is placed outside the cells. It contains entries from all the plots in the entire graph, including
multi-cell graphs. A global legend can be added to the graph by selecting Insert > Global
Legend or by clicking the global legend toolbar button.
For this example, add a cell legend to the cell in the empty space in the upper right corner. Then,
modify the legend to contain only the entries for the normal density curve and the kernel density
curve.
1. In the Insets panel at the bottom of the Elements pane, click the Discrete Legend icon.
2. Drag and drop this icon onto the upper right corner of the cell.
A legend that contains all the plots in the cell is added to the graph. However, it is
unnecessary to show all the plots in the legend because some of that information is
obvious. In this example, the histogram and the fringe plot are easily identified, so
remove those entries from the legend.
7. Click OK.
Figure 3.9 Graph with a Cell Legend in the Upper Right Corner
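For readers who also work in code, a roughly equivalent single-cell graph can be sketched with PROC SGPLOT. This is not the code that the designer generates: SASHELP.HEART and its CHOLESTEROL variable are assumptions standing in for the chapter's data, and the thicker kernel line only approximates the style-element change. The KEYLEGEND statement lists only the two density curves, mirroring the edited cell legend.

```sas
/* Assumption: SASHELP.HEART / CHOLESTEROL stand in for the chapter's data */
proc sgplot data=sashelp.heart;
  histogram cholesterol;
  density cholesterol / type=normal name="normal" legendlabel="Normal";
  /* Thicker line so that the kernel curve stands out */
  density cholesterol / type=kernel name="kernel" legendlabel="Kernel"
          lineattrs=(thickness=2);
  fringe cholesterol;
  /* Inset (cell) legend that lists only the two density curves */
  keylegend "normal" "kernel" / location=inside position=topright across=1;
run;
```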
To add a new row, right-click anywhere in the plot area, and select Add Row.
The graph area is split into two rows of equal height. You now have a graph with two rows; each
row has one cell.
Tip: You can add a column by right-clicking and selecting Add Column. Cells are always
added in full rows or full columns to create a regular grid. After adding the row for this
example, if you then add a column, the graph will contain four cells.
The new cell is empty except for the text drop a plot here. You can now populate this cell with
plots and insets.
2. The Assign Data dialog box for the horizontal box plot is displayed.
Library and Data Set values are based on the previous settings for the graph. You can
keep those settings for the new plot. However, because this is a separate cell, you can
select a different library and data set. The requirement about using the same data set
applies only to plots in the same cell.
Starting with SAS 9.4, the height of the new cell is automatically reduced to fit the single box
plot.
To use a common X axis for both rows, right-click on one of the X axis areas, and select
Common Column Axis.
A common column axis is created for all the cells in the column (two cells in this example). It is
displayed at the bottom. The axis range for the common column axis is the union of the ranges for
each cell in the column. All plots in each cell in the column are drawn correctly scaled to this new
common column axis.
The two rows resize to fit their contents. The bottom row is shorter than the top row.
Note: In early releases of the designer, the rows are not automatically resized. If that is the
case for you, you can change the height manually.
Position the cursor between the upper and lower row of the graph. A dashed line appears
between the rows, and the cursor changes to a two-headed arrow.
Click and drag the dashed line downward to reduce the height of the bottom row.
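In the Graph Template Language (GTL), the same two-row arrangement can be sketched with a LAYOUT LATTICE block. This is a hedged approximation, not the designer's output: the data (SASHELP.HEART), the template name HISTBOX, and the 85/15 row split are assumptions. COLUMNDATARANGE=UNION gives the union of the cell ranges described above, the COLUMNAXES block draws the single common X axis, and ROWWEIGHTS= plays the role of dragging the row divider.

```sas
/* Assumption: SASHELP.HEART / CHOLESTEROL stand in for the chapter's data */
proc template;
  define statgraph histbox;
    begingraph;
      entrytitle "Distribution of Cholesterol";
      /* Two rows, one column; common X range is the union of both cells */
      layout lattice / rows=2 columns=1 columndatarange=union
                       rowweights=(0.85 0.15);
        columnaxes;
          columnaxis / label="Cholesterol";
        endcolumnaxes;
        /* Top cell: histogram with a normal density curve */
        layout overlay;
          histogram cholesterol;
          densityplot cholesterol / normal();
        endlayout;
        /* Bottom cell: a single horizontal box plot */
        layout overlay;
          boxplot y=cholesterol / orient=horizontal;
        endlayout;
      endlayout;
    endgraph;
  end;
run;

proc sgrender data=sashelp.heart template=histbox;
run;
```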
The code window is displayed in the work area for the active graph.
For simplicity, the following figure shows the code window for the state of the graph when it
contained only a histogram, a normal density curve, and a title.
Figure 3.13 GTL Code for the Histogram and Normal Density Curve
Figure 3.13 shows the GTL code needed to create this graph using the TEMPLATE procedure and
the SGRENDER procedure. The boxed section shows the layout overlay block containing the
histogram and normal density curve.
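The overlay block described for Figure 3.13 has roughly the following shape. Treat it as a sketch: the designer's generated code contains additional options, and the template name HISTNORMAL, the title text, and the SASHELP.HEART data are assumptions.

```sas
/* Sketch of a histogram plus normal density curve in GTL */
proc template;
  define statgraph histnormal;
    begingraph;
      entrytitle "Distribution of Cholesterol";
      layout overlay;
        histogram cholesterol;
        /* NORMAL() fits a normal density to the same analysis variable */
        densityplot cholesterol / normal();
      endlayout;
    endgraph;
  end;
run;

/* Bind the template to data and draw it */
proc sgrender data=sashelp.heart template=histnormal;
run;
```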
You can leave the code window displayed and view the changes to the code as you make changes
to the graph.
The code window is read-only. You can save the code as a SAS file, or you can copy and paste
the code into SAS and run the program.
1. Select Edit > Copy. An image of the active graph is copied to the clipboard.
2. Paste the image into an application by using the application's paste command, such as
Ctrl-V.
Note: You use this SGD file later in the book, so be sure to complete the previous step.
Although graphs cannot be saved on the first six tabs of the gallery, you can add new tabs to the
gallery.
1. Select File > Save in Graph Gallery. The Save in Graph Gallery dialog box is
displayed.
2. For Group Name, select the name of the group into which you want to add the graph.
Each group corresponds to a tab in the gallery.
Group Name contains the names of groups that have been created at your site. It does
not contain the names of the default groups. If no groups have been created at your site,
or if you want to create your own group, do the following:
a. Select the New icon.
b. In the Create New Group dialog box, specify a name for the group. For this example,
specify MyGraphs, and click OK.
The new group appears in Group Name in the Save in Graph Gallery dialog box.
3. For Graph name, enter Histogram.
4. (Optional) The designer creates a default graph icon for the generated graph. If you want
to replace the default icon with an icon from your file system, click Browse, and select a
different icon.
5. (Optional) For Tooltip, you can provide a description that is displayed as a tooltip for
your new graph.
6. Click OK.
The graph is saved in the Graph Gallery for future use. Your gallery might look something like
this:
In the code, replace c:\histogram.sgd with the name and location of the SGD file that you
saved.
The graph is run using the definition in the SGD file and the original data set used to create the
graph. The libref and data set should be available in the SAS session.
SGD graphs can be run with different data as long as all the required variables exist in the new
data set. You can specify different data by using the DATA= option in PROC SGDESIGN. This
topic is covered in more detail in Chapter 7.
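A minimal sketch of running a saved SGD file from a SAS program follows. The path is the example from the text; the WORK.HEART2 subset assumes that the design was built from SASHELP.HEART, so adjust both to match your own graph. The second step relies on the DATA= option described above.

```sas
/* Run the saved design with its original data.
   Replace the path with the location of your own SGD file. */
proc sgdesign sgd="c:\histogram.sgd";
run;

/* Hypothetical: the same design with different data. This assumes the
   design was built on SASHELP.HEART; the new data set must contain
   every variable that the design uses. */
data work.heart2;
  set sashelp.heart(where=(sex = "Female"));
run;

proc sgdesign sgd="c:\histogram.sgd" data=work.heart2;
run;
```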
Clinical Graphs Using the SAS 9.4 SGPLOT Procedure
Sanjay Matange
The previous chapters of this book covered the domain of visual data analytics, with a focus on big data. We
also looked at creating graphs using SAS ODS Graphics Designer with no coding needed. But often after the
visual exploration and analysis of data, you need to convey the results of your analyses to your clients or to
regulatory agencies, such as the FDA, that have stringent requirements.
One such area is in the reporting of the analysis of safety data for clinical trials or for drug safety. The
pharmaceutical industry spends billions of dollars bringing life-saving drugs to communities. The process
requires extensive safety and efficacy testing of the drugs before they are approved. Results have to be shared
with other researchers and the regulators using sophisticated graphical techniques.
The tools in the SAS ODS Graphics Designer enable you to deliver your analytic results to your consumers
in graphs such as survival plots, forest plots, and adverse event plots. This chapter introduces the
procedures that you can use to create such graphs. A moderate level of programming knowledge is
required.
Sanjay Matange is Research & Development Director in the SAS Data Visualization
Division, where he is responsible for the development and support of SAS ODS Graphics
software. This includes the Graph Template Language (GTL), Statistical Graphics (SG)
procedures, SAS ODS Graphics Designer, and other related graphics applications. Sanjay
has been with SAS for over 25 years. He is coauthor of two patents and author of four
SAS Press books.
support.sas.com/matange
From Clinical Graphs Using SAS. The full book is available
for purchase.
Clinical graphs often display the data in one cell along with derived statistics and other details that
aid in the decoding of the information in the graph. Most of these single-cell graphs can be created
using the SGPLOT procedure.
With SAS 9.4, the SGPLOT procedure supports some new and useful features that simplify the
creation of such graphs. These include the following new statements and features:
XAXISTABLE and YAXISTABLE. These two statements support axis tables along the x- and y-axes.
These statements can be used to draw "At-Risk" tables along the x-axis, or study
names and statistic values along the y-axis. Rows and columns of textual data can be
displayed inside the data area or outside.
TEXT plot. This statement displays a text string from a column at the specified location. It
replaces the need for using a SCATTER plot statement with the MARKERCHAR option.
Because a text plot draws only text strings, it offers text-specific features, including control of
the axis offsets that the text values might require.
POLYGON plot. This statement displays polygons in the graph based on the columns in the
data set. This is useful in drawing ranges in the graph for various levels, including complex
regions in graphs for device evaluation like the Clark Error Grid.
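A minimal sketch of the TEXT and POLYGON statements follows. The data sets and their columns (ID, X, Y, LX, LY, LABEL) and the region itself are hypothetical. POLYGON draws a closed region from the ordered vertices that share an ID, and TEXT places a string from a column at a data location without needing a marker.

```sas
/* Hypothetical data: four vertices of one region (ID=1) */
data zone;
  input id x y;
  datalines;
1 0 0
1 0 10
1 10 10
1 10 0
;
run;

/* Append one row that carries only the text-plot columns */
data zone2;
  length label $8;
  set zone end=last;
  output;
  if last then do;
    id = .; x = .; y = .;
    lx = 5; ly = 5; label = "Zone A";
    output;
  end;
run;

proc sgplot data=zone2 noautolegend;
  /* Closed, filled region built from the vertices of each ID */
  polygon x=x y=y id=id / fill outline fillattrs=(color=lightblue);
  /* String from LABEL drawn at (LX, LY) */
  text x=lx y=ly text=label / textattrs=(size=12 weight=bold);
run;
```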
The goal in this chapter is to cover in detail the creation of some commonly used clinical graphs
using SAS 9.4. The chapter will provide not only code that you can use directly for such graphs,
but will also provide ideas on how you can use or combine plot statements to create your own
custom graph.
The SG Annotate facility features are also available for you to use in cases where the result cannot
be achieved using plot layers. SG Annotate was used extensively in Chapter 3 to create the clinical
graphs. See Section 2.9 for an introduction to this feature.
Figure 4.1.1 Graph of QTc Change from Baseline with the Subjects Table at the Bottom
Normally, a box plot treats the category variable as discrete, which would have placed all the tick
values on the x-axis at equally spaced intervals. However, in this case the values on the x-axis
represent days from start of study, and we want to place the data at the correctly scaled distance
along the x-axis. This can be done by explicitly setting TYPE=LINEAR on the x-axis. Now, each
box is placed at the scaled location along the x-axis.
The box plot is classified by treatment, which has two values "Drug A" and "Drug B". The boxes
are sized by the smallest interval along the x-axis. In this case, it is one day at the start of the study.
Hence, the effective midpoint spacing is set by that interval, and all boxes are drawn to fit in this
space.
The box plot uses the GROUPDISPLAY=CLUSTER option to place the groups side by side. We
have used the NOFILL option to draw empty boxes.
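The box-plot settings described above can be sketched as follows. The data are simulated (the DRUG, WEEK, and QTC names and the visit values are assumptions), so the boxes will not match Figure 4.1.1; the sketch shows only how TYPE=LINEAR, GROUP=, GROUPDISPLAY=CLUSTER, and NOFILL fit together.

```sas
/* Simulated data (assumption): QTc change for two drugs at unevenly
   spaced visits, so the linear x-axis matters */
data qtc;
  length drug $6;
  call streaminit(1);
  do drug = "Drug A", "Drug B";
    do week = 1, 2, 4, 8, 12, 24;
      do subj = 1 to 25;
        qtc = rand("normal", 5, 20);
        output;
      end;
    end;
  end;
run;

proc sgplot data=qtc;
  /* Drugs side by side at each visit, drawn as empty boxes */
  vbox qtc / category=week group=drug groupdisplay=cluster nofill;
  /* Place each box at its scaled position instead of equal spacing */
  xaxis type=linear;
run;
```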
The table of the subjects at risk is displayed using the XAXISTABLE statement, showing risk
values by week and drug. The optional X role is not specified, so the table uses the X role that is
active; in this case, it is from the VBOX statement. The option LOCATION=OUTSIDE is used to
display the risk values outside the data area at the default bottom position.
The XAXISTABLE is classified by treatment by setting CLASS=DRUG. Now, the values for risk
are displayed in separate rows by drug. The value of the classifier DRUG is shown in the row label
on the left of the data. The option COLORGROUP=DRUG is used to color the risk values by drug
for easier association. Display attributes such as font size and font weight are set for both the risk
values and labels using the appropriate options.
Reference lines are placed on the y-axis at y=0, 30, and 60 to represent the levels of concern. A
reference line is also placed on the x-axis at x=26 to separate the "Max" value.
The axis tick value "Max" has a value of x=28, and a format is used to display the tick value. The
tick values displayed on the x-axis are determined by the VALUES option on the XAXIS
statement, and the option MAX is set to 29 to allow an even display of the tick values.
The y-axis places the tick values from -120 to 90 by 30, and also sets the displayed axis range.
A legend is automatically added by the procedure because the box plot has a GROUP role. We
have used the KEYLEGEND statement to customize some aspects of the legend. The legend title
is removed, and the lengths of the line segments representing each classification value are
shortened using the LINELENGTH option.
Normally, the procedure draws longer lines for each class value in the legend in order to represent
the full line pattern. In this case, however, we are using the HTMLBlue style, which uses the
attribute priority of color. So, most line styles used are solid, and a long line segment is not
required.
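Putting the axis-table, reference-line, axis, and legend settings together gives roughly the following program. This is not Program 4_1: the data are simulated, the WKFMT format and the RISK counts are assumptions built to match the description, and the font sizes are placeholders.

```sas
/* Simulated data (assumption); WEEK=28 holds the "Max" visit */
data qtc;
  length drug $6;
  call streaminit(1);
  do drug = "Drug A", "Drug B";
    do week = 1, 2, 4, 8, 12, 16, 20, 24, 28;
      do subj = 1 to 25;
        qtc = rand("normal", 5, 25);
        output;
      end;
    end;
  end;
run;

/* Subjects per visit and drug for the axis table */
proc means data=qtc noprint nway;
  class week drug;
  var qtc;
  output out=counts(drop=_type_ _freq_) n=risk;
run;

/* Stack the counts under the plot data */
data plot;
  set qtc counts;
run;

/* Display x=28 as "Max" on the axis */
proc format;
  value wkfmt 28 = "Max" other = [best12.];
run;

proc sgplot data=plot;
  format week wkfmt.;
  vbox qtc / category=week group=drug groupdisplay=cluster nofill name="box";
  /* Risk values below the graph, one row per drug, colored by drug */
  xaxistable risk / class=drug colorgroup=drug location=outside
                    valueattrs=(size=8 weight=bold) labelattrs=(size=8);
  refline 0 30 60 / axis=y lineattrs=(pattern=dash);
  refline 26 / axis=x;  /* separates the "Max" column */
  xaxis type=linear values=(0 4 8 12 16 20 24 28) max=29 display=(nolabel);
  yaxis values=(-120 to 90 by 30) label="QTc change from baseline";
  keylegend "box" / title="" linelength=20;
run;
```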
Relevant details are shown in the code snippet above. For full details, see Program 4_1, available
from the author's page at http://support.sas.com/matange.
4.1.2 Box Plot of QTc Change from Baseline with Inner Risk Table and Bands
The traditional graph commonly in use in the industry, as shown in Figure 4.1.1, shows the "At-
Risk" table at the bottom of the graph, just above the footnote with other items in between. Such a
layout places the risk data relatively far away from the graph. Even though the values are aligned
with the data along the x-axis, the distance and intervening items like the legend and the axis items
create a distraction.
Figure 4.1.2 Graph of QTc Change from Baseline with Subjects Table inside
Graphs are easier to decode when relevant information is placed as close as possible, thus reducing
the amount of eye movement needed to decode the graph. Following this principle, it would be
more effective to place the risk information inside the graph area, closer to the graphical
information. This arrangement is shown in Figure 4.1.2. It was achieved by placing the
XAXISTABLE with LOCATION=INSIDE, as shown in the code snippet below.
Another improvement would be to represent the levels of concern as colored bands with direct
labels. This reduces the eye movement that is required to decode the information in the data. We
can do this by using the BAND plot statements. A text plot is used to label each band with the level
92 Clinical Graphs Using SAS
of concern "Normal", "Concern", and "High". The columns needed are included in the data. The
code snippet for inclusion of bands, band labels, and the inner risk table is shown below.
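The bands, band labels, and inner risk table can be sketched as follows. The data are simulated, and the band boundaries (below 30 for "Normal", 30 to 60 for "Concern", above 60 for "High"), the colors, and the helper columns BX, LX, LY, and BAND are assumptions chosen to match the reference levels described for Figure 4.1.1.

```sas
/* Simulated data (assumption) */
data qtc;
  length drug $6;
  call streaminit(1);
  do drug = "Drug A", "Drug B";
    do week = 1, 2, 4, 8, 12, 16, 20, 24;
      do subj = 1 to 25;
        qtc = rand("normal", 5, 25);
        output;
      end;
    end;
  end;
run;

proc means data=qtc noprint nway;
  class week drug;
  var qtc;
  output out=counts(drop=_type_ _freq_) n=risk;
run;

/* Helper rows: BX spans the x range of the bands; LX/LY/BAND label them */
data extras;
  length band $8;
  bx = 0;  output;
  bx = 25; output;
  bx = .;  lx = 0;
  ly = -45; band = "Normal";  output;
  ly = 45;  band = "Concern"; output;
  ly = 75;  band = "High";    output;
run;

data plot;
  set qtc counts extras;
run;

proc sgplot data=plot;
  band x=bx lower=-120 upper=30 / fillattrs=(color=cxe5f5e0);
  band x=bx lower=30   upper=60 / fillattrs=(color=cxfff7bc);
  band x=bx lower=60   upper=90 / fillattrs=(color=cxfee0d2);
  vbox qtc / category=week group=drug groupdisplay=cluster nofill name="box";
  /* Direct labels for the bands, drawn to the right of LX */
  text x=lx y=ly text=band / position=right;
  /* Risk table inside the data area, just above the x-axis */
  xaxistable risk / class=drug colorgroup=drug location=inside;
  xaxis type=linear display=(nolabel);
  yaxis values=(-120 to 90 by 30);
  keylegend "box" / title="";
run;
```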
For graphs that are consumed in a color medium, this graph provides all the information in a
compact form that is free of clutter. The levels of concern are color coded with direct labels, and
risk values are moved closer to the rest of the data. For full details, see Program 4_1.
Box plots are represented in the legend using the display characteristics of the box. In this case, the
boxes are not filled. Normally, when using grayscale, the line style for the second group is a
dashed line. To avoid drawing boxes with patterned lines, we have specified only one solid pattern
for all groups in the STYLEATTRS statement. So, using lines in the legend will not be effective.
To distinguish the two groups, we would like to display markers in the legend. To do this, we
added a scatter plot that plots QTc by Week and Drug, except that all these QTc values are missing.
So, no markers are actually drawn in the plot, but the legend that is derived from the scatter plot
displays the marker symbols. Relevant details are shown in the code snippet above. For full
details, see Program 4_1.
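The legend technique described above can be sketched as follows. The data are simulated, QTC_M is a hypothetical all-missing copy of the response, and the two marker symbols are arbitrary choices; ODS LISTING is used only as an example destination for the JOURNAL style.

```sas
/* Simulated data (assumption); QTC_M is an all-missing copy of QTC */
data qtc;
  length drug $6;
  call streaminit(1);
  do drug = "Drug A", "Drug B";
    do week = 1, 2, 4, 8, 12;
      do subj = 1 to 25;
        qtc   = rand("normal", 5, 25);
        qtc_m = .;   /* never drawn; it only feeds the legend */
        output;
      end;
    end;
  end;
run;

ods listing style=journal;   /* grayscale style on the LISTING destination */
proc sgplot data=qtc;
  /* One solid pattern for all groups, so no box is drawn dashed */
  styleattrs datalinepatterns=(solid) datasymbols=(circlefilled trianglefilled);
  vbox qtc / category=week group=drug groupdisplay=cluster nofill;
  /* All Y values missing: no markers in the plot, symbols in the legend */
  scatter x=week y=qtc_m / group=drug name="mk";
  xaxis type=linear;
  keylegend "mk" / title="Treatment";
run;
```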
The overlaid series plot could have also used the same response variable "Mean", but then the last
value on the x-axis, "LOCF", would have been joined to the previous one.
To avoid this, we have copied the values from the variable "Mean" to the variable "Mean2", with a
missing value for the x=28. The series plot uses "Mean2" as the response variable, which excludes
the last value to avoid connecting to the "LOCF" value.
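The Mean2 copy can be created in a DATA step. A minimal sketch, assuming a small hypothetical summary data set (drug codes A and B, a few weeks, and the LOCF value stored at week 28): because the missing MEAN2 value is the last point of each group, the series line simply ends before the LOCF point, while the scatter plot still shows it.

```sas
/* Hypothetical summary data: mean change by drug and week; WEEK=28 is LOCF */
data means;
  input drug $ week mean;
  mean2 = mean;                 /* copy of the response */
  if week = 28 then mean2 = .;  /* no line segment into the LOCF point */
  datalines;
A 1 2.0
A 4 3.5
A 8 4.1
A 28 5.0
B 1 1.0
B 4 2.2
B 8 2.9
B 28 3.8
;
run;

proc sgplot data=means;
  /* Markers for every point, including LOCF */
  scatter x=week y=mean / group=drug groupdisplay=cluster clusterwidth=0.5;
  /* Lines only through the non-missing MEAN2 values */
  series x=week y=mean2 / group=drug groupdisplay=cluster clusterwidth=0.5;
  xaxis type=linear;
run;
```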
Note, both the SCATTER and SERIES statements use GROUPDISPLAY=CLUSTER. This option
spreads the position of each group value on the x-axis. CLUSTERWIDTH=0.5 is set to keep the
clusters tight. This means that all the class values will be spread within 50% of the midpoint
spacing. Since both statements use the same settings for group display and cluster width, the lines and
markers match for each group value.
An XAXISTABLE is used to display the "Number of Subjects" values at the bottom of the graph.
The display variable is "N", and the x-axis variable is the same as the x variable for the primary
plot "Week". So, the optional X role does not need to be specified in the statement.
The graph is classified by "Drug". We have specified "Drug" for the CLASS role for the axis table.
This causes the values for the two values of "Drug" to be displayed in separate rows.
COLORGROUP is also set to "Drug", so the values are colored by "Drug".
The table of subjects is displayed at the bottom of the graph by setting LOCATION=OUTSIDE,
which is also the default setting. The table has a title, which was set using the TITLE option. Text
attributes for the values, labels, and title are specified using the appropriate options.
The Y reference line is drawn at y=0, and the X reference line is drawn at x=26, which acts as a
separator for the "LOCF" value. A user-defined format is used to display "LOCF" for x=28.
The legend is generated by default by the procedure because group is in effect. But to prevent
multiple items in the legend, we specify the KEYLEGEND statement with the name of only one
statement. The legend is placed at the top center of the wall.
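The settings described in this section can be combined into a sketch like the following. It is not Program 4_2: the summary data, the WK format, and the N counts are hypothetical, and the font sizes are placeholders.

```sas
/* Hypothetical summary data: mean change and N by drug and week;
   WEEK=28 holds the LOCF value */
data means;
  input drug $ week mean n;
  mean2 = ifn(week = 28, ., mean);  /* no line into the LOCF point */
  datalines;
A 1 2.0 50
A 4 3.5 48
A 8 4.1 45
A 12 4.4 44
A 28 5.0 50
B 1 1.0 52
B 4 2.2 50
B 8 2.9 47
B 12 3.1 46
B 28 3.8 52
;
run;

/* Display x=28 as "LOCF" */
proc format;
  value wk 28 = "LOCF" other = [best12.];
run;

proc sgplot data=means;
  format week wk.;
  scatter x=week y=mean / group=drug groupdisplay=cluster clusterwidth=0.5
          name="sc";
  series x=week y=mean2 / group=drug groupdisplay=cluster clusterwidth=0.5;
  /* Table of subjects below the graph (LOCATION=OUTSIDE is the default) */
  xaxistable n / class=drug colorgroup=drug location=outside
                 title="Number of Subjects" valueattrs=(size=7)
                 labelattrs=(size=7);
  refline 0 / axis=y;
  refline 26 / axis=x;  /* separates the LOCF value */
  /* Legend from the scatter plot only, inside at the top */
  keylegend "sc" / location=inside position=top title="";
  xaxis type=linear values=(1 4 8 12 28) display=(nolabel);
  yaxis label="Mean change in QTc";
run;
```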
Relevant details are shown in the code snippet above. For full details, see Program 4_2.
4.2.2 Mean Change in QTc by Visit and Treatment with Inner Table of Subjects
In this graph, the table of subjects at visit is displayed above the x-axis. This makes it easier to
understand the numbers because they are closer to the rest of the graph.
The graph above is similar to the graph in Section 4.2.1, with the key difference of placing the
"Subjects at Visit" table inside the data area instead of at the bottom of the graph. This improves
the readability of the graph.
The key difference in the code is the use of LOCATION=Inside for the XAXISTABLE. We also
use the SEPARATOR option to draw the line above the table. A reference line is used to separate
the "LOCF" value. Relevant details are shown in the code snippet above. For full details, see
Program 4_2.
To create an effective graph in grayscale, we have run the same graph as in Section 4.2.2 with ODS
STYLE=JOURNAL. We have also used the STYLEATTRS statement to customize the group attributes.
Note the use of FILLEDOUTLINEDMARKERS. With the filled marker symbols that are specified here,
the markers are drawn with both a fill and an outline. MARKERFILLATTRS=GRAPHWALLS sets the marker
fill to the wall color. Relevant details are shown in the code snippet
above. For full details, see Program 4_2.
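A sketch of the grayscale variant with the inner, separated table follows. The summary data are hypothetical, and the symbol list and marker size are arbitrary choices; ODS LISTING is used only as an example destination for the JOURNAL style.

```sas
/* Hypothetical summary data; WEEK=28 holds the LOCF value */
data means;
  input drug $ week mean n;
  mean2 = ifn(week = 28, ., mean);
  datalines;
A 1 2.0 50
A 4 3.5 48
A 8 4.1 45
A 28 5.0 50
B 1 1.0 52
B 4 2.2 50
B 8 2.9 47
B 28 3.8 52
;
run;

ods listing style=journal;
proc sgplot data=means;
  /* Grayscale group attributes: filled symbols, one solid line pattern */
  styleattrs datasymbols=(circlefilled trianglefilled) datalinepatterns=(solid);
  series x=week y=mean2 / group=drug groupdisplay=cluster clusterwidth=0.5;
  /* Fill plus outline; the fill color comes from the GraphWalls element */
  scatter x=week y=mean / group=drug groupdisplay=cluster clusterwidth=0.5
          filledoutlinedmarkers markerfillattrs=graphwalls
          markerattrs=(size=9) name="sc";
  /* Inner table with a separator line drawn above it */
  xaxistable n / class=drug location=inside separator;
  refline 26 / axis=x;
  keylegend "sc" / title="";
  xaxis type=linear values=(1 4 8 28) display=(nolabel);
run;
```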
This graph is likely one of the most complex displays that can be created using the SGPLOT
procedure. This graph displays the distribution of ASAT by treatment over time using a grouped
box plot on a linear x-axis. The visit values are scaled correctly on the time axis. The smallest
interval between the visits determines the "effective" midpoint spacing used for adjacent placement
of the treatment values.
An XAXISTABLE statement is used to display the "Number of Subjects" values at the bottom of
the graph. A second XAXISTABLE at the top is used to display the count of values above 2.0 by
treatment.
Drawing this graph using the Journal style poses a few challenges, mainly in the drawing of the
boxes and their representation in the legend. Using the Journal style, the boxes for Drug "B" would
be drawn using dashed lines. Because those look odd, I used the STYLEATTRS statement to specify only
solid lines.
Chapter 4: Clinical Graphs Using the SAS 9.4 SGPLOT Procedure 101
Although this improves the rendering of the boxes, it will put two solid lines in the legend for "A"
and "B". It would be better to show the mean markers in the legend instead. To do this, I have to
add a scatter plot of asat2 by Week and Drug and include that in the legend. Because values in
"asat2" are all missing, no markers are displayed in the graph itself, but the group markers are
displayed in the legend. Relevant details are shown in the code snippet above. For full details, see
Program 4_3.
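The two axis tables can be sketched as follows. The data are simulated (a skewed ASAT ratio), and the column names N and HIGH and the PROC SQL summary are assumptions; POSITION= places one table at the bottom and the other at the top of the graph.

```sas
/* Simulated data (assumption) */
data asat;
  length drug $1;
  call streaminit(7);
  do drug = "A", "B";
    do week = 1, 2, 4, 8, 12;
      do subj = 1 to 30;
        asat = exp(rand("normal", 0, 0.5));
        output;
      end;
    end;
  end;
run;

/* Per visit and drug: number of subjects and count of values above 2.0 */
proc sql;
  create table counts as
  select week, drug, count(asat) as n, sum(asat > 2) as high
  from asat
  group by week, drug;
quit;

data plot;
  set asat counts;
run;

proc sgplot data=plot;
  vbox asat / category=week group=drug groupdisplay=cluster;
  xaxistable n    / class=drug colorgroup=drug location=outside position=bottom
                    title="Number of Subjects";
  xaxistable high / class=drug colorgroup=drug location=outside position=top
                    title="Count > 2.0";
  refline 2 / axis=y;
  xaxis type=linear;
run;
```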
markerfillattrs=(color=white);
keylegend 's' / title='Treatment' linelength=20;
yaxis label='Median with 95% CL' grid;
xaxis display=(nolabel);
run;
This graph displays the median of the lipid data by visit and treatment. The visits are at regular
intervals and represented as discrete data. However, they could also be on a time axis with unequal
intervals. The values for each treatment are displayed along with the 95% confidence limits as
adjacent groups using GROUPDISPLAY=Cluster and CLUSTERWIDTH=0.5.
The values across visits are joined using a series plot. Note that the series plot also uses cluster groups
with the same cluster width. The lengths of the line segments in the legend are reduced using the
LINELENGTH option. Markers with fill and outlines are used with specific fill attributes.
Relevant details are shown in the code snippet above. For full details, see Program 4_4.
Figure 4.4.2 Median of Lipid Profile by Visit and Treatment on Linear Axis
clusterwidth=0.5;
scatter x=n y=median / yerrorlower=lcl yerrorupper=ucl group=trt
groupdisplay=cluster clusterwidth=0.5
errorbarattrs=(thickness=1) filledoutlinedmarkers
markerattrs=(size=7) name='s'
markerfillattrs=(color=white);
keylegend 's' / title='Treatment' linelength=20;
yaxis label='Median with 95% CL' grid;
xaxis display=(nolabel) values=(1 4 8 12 16);
run;
The visits are not at regular intervals and are displayed at the correct scaled location along the
x-axis. The visits are at week 1, 2, 4, 8, 12, and 16. These values are formatted to the strings shown
on the axis. The "Visit 1" tick value collides with "Baseline", which causes alternate tick values to be
dropped, so I removed the "Visit 1" value from the tick value list.
As you can see, the group values are displayed as clusters, and the "effective midpoint spacing" is
the shortest distance between the values. The markers are made smaller (SIZE=7 in the MARKERATTRS
option) so that the clustering remains visible. Four filled markers are assigned to the list of
markers.
Relevant details are shown in the code snippet above. For full details, see Program 4_4.
A step plot of survival by time by strata displays the curves. A scatter overlay is used to draw the
censored values, and an XAXISTABLE statement is used to display the at-risk values at the bottom
of the graph. Relevant details are shown in the code snippet above. For full details, see Program
4_5.
lineattrs=(pattern=solid) name='s';
scatter x=time y=censored / markerattrs=(symbol=plus) name='c';
scatter x=time y=censored / markerattrs=(symbol=plus) GROUP=stratum;
xaxistable atrisk/x=tatrisk location=outside class=stratum
colorgroup=stratum;
keylegend 'c' / location=inside position=topright;
keylegend 's';
run;
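An end-to-end sketch of this survival plot follows. It assumes the SASHELP.BMT sample data and captures the curve, censoring, and at-risk columns (Time, Survival, Censored, AtRisk, tAtRisk, Stratum) from the SurvivalPlot ODS output object of PROC LIFETEST; the at-risk time points are placeholders. The SGPLOT step mirrors the statements shown above.

```sas
/* Capture the survival-plot data computed by PROC LIFETEST */
ods graphics on;
ods output SurvivalPlot=SurvivalPlotData;
proc lifetest data=sashelp.bmt plots=survival(atrisk=0 to 2500 by 500);
  time T * Status(0);
  strata Group;
run;

proc sgplot data=SurvivalPlotData;
  /* One solid step curve per stratum */
  step x=time y=survival / group=stratum lineattrs=(pattern=solid) name="s";
  /* Ungrouped censored markers: a single "Censored" legend entry */
  scatter x=time y=censored / markerattrs=(symbol=plus) name="c";
  /* Grouped censored markers: colored by stratum */
  scatter x=time y=censored / markerattrs=(symbol=plus) group=stratum;
  /* At-risk counts at the TATRISK time points, below the graph */
  xaxistable atrisk / x=tatrisk location=outside class=stratum
                      colorgroup=stratum;
  keylegend "c" / location=inside position=topright;
  keylegend "s";
run;
```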
This graph only needs LOCATION=INSIDE on the XAXISTABLE statement. In
addition, we have turned on the separator that draws the horizontal line between the table
and the curves.
Relevant details are shown in the code snippet above. For full details, see Program 4_5.
Here we cannot use colors to identify the strata. Normally, the Journal style uses line patterns to
identify the groups. Although line patterns work well for curves, they are not so effective with step
plots because of the frequent breaks. So, it is preferable to use solid lines for all the levels of the
step plot and to use markers to identify the strata.
Figure 4.5.3 Survival Plot with Internal "Subjects At-Risk" Table in Grayscale
In this case, markers are also used to identify the censored observations. So, I have chosen to use
the CURVELABEL option with the SPLITCHAR option to identify the curves. This results in a
clean and effective graph, without the need for a legend for the strata.
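The grayscale variant can be sketched as follows. It reruns PROC LIFETEST on SASHELP.BMT (an assumption standing in for the book's data) and assumes that the stratum names contain a hyphen, as in "AML-High Risk", so that SPLITCHAR="-" breaks the curve labels onto two lines. The symbol list is an arbitrary choice.

```sas
ods graphics on;
ods output SurvivalPlot=SurvivalPlotData;
proc lifetest data=sashelp.bmt plots=survival(atrisk=0 to 2500 by 500);
  time T * Status(0);
  strata Group;
run;

ods listing style=journal;
proc sgplot data=SurvivalPlotData noautolegend;
  /* Solid lines for every stratum; distinct symbols for censored markers */
  styleattrs datalinepatterns=(solid) datasymbols=(circle triangle square);
  /* Label each curve directly; "-" splits a label onto two lines */
  step x=time y=survival / group=stratum curvelabel splitchar="-";
  scatter x=time y=censored / group=stratum;
  /* At-risk table inside the data area, with a separator line */
  xaxistable atrisk / x=tatrisk location=inside separator class=stratum;
  yaxis label="Survival probability";
run;
```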
Relevant details are shown in the code snippet above. For full details, see Program 4_5.
The data for this graph contains the odds ratio, the confidence limits, and the weight for each study.
The studies are reclassified with "1" for individual study names and "2" for "Overall". We use this
information to plot the graph by study using the SCATTER and YAXISTABLE statements.
Note that this graph uses the Analysis style, which has an attribute priority of "None", so the
marker symbols vary by group.
We have used the TEXT statement to place the "Favors" strings at the bottom, using a study value
of NBSP. Y-axis tick values are left-aligned, and the fit policy is set to "none" so that all tick
values are displayed regardless of congestion. For full details, see Program 4_6.
styleattrs axisextent=data;
scatter y=study x=or2 / markerattrs=graphdata2(symbol=diamondfilled);
highlow y=study low=lcl high=ucl / type=line;
highlow y=study low=q1 high=q3 / type=bar barwidth=0.6;
yaxistable study / y=study location=inside position=left
labelattrs=(size=7);
yaxistable or lcl ucl wt / y=study location=inside position=right;
refline 1 / axis=x noclip;
refline 0.01 0.1 10 100 / axis=x lineattrs=(pattern=shortdash) noclip;
text y=study x=xlbl text=lbl / position=center contributeoffsets=none;
xaxis type=log max=100 minor display=(nolabel) valueattrs=(size=7);
yaxis display=none fitpolicy=none reverse valueshalign=left
colorbands=even colorbandsattrs=Graphdatadefault(transparency=0.75);
run;
This graph uses a highlow plot to display the relative weights for each study and the confidence
interval. The scatter plot uses the "OR2" variable, which is non-missing only for the "Overall"
study. So, only the diamond marker is drawn by the scatter plot.
The width of each marker is proportional to the weight on a linear scale. However, because we have
used a log x-axis, the widths might not represent the weights accurately. So, the widths provide only
a qualitative representation of the weight.
The graph has no wall or wall borders and the x-axis line is displayed only to the extent of the data
by using the AXISEXTENT=DATA option on the STYLEATTRS statement. The y-axis is
replaced by a YAXISTABLE on the left side so that the bands extend across the full graph.
Relevant details are shown in the code snippet above. For full details, see Program 4_6.
Rendering this graph in a grayscale medium poses few challenges. We simply set the ODS style to
JOURNAL to produce the graph above. This is structurally similar to the
graph shown in Section 4.6.2.
Relevant details are shown in the code snippet above. For full details, see Program 4_6.
For the graph shown below, only the hazard ratio plot in the middle is displayed using a plot
statement. The tabular data is displayed in axis tables.
The graph above displays the hazard ratio and confidence limits by subgroup, along with the
number of patients in the study and other statistics. The key difference here is the display of the
subgroups and values in the first column. The subgroup titles are displayed in a bold font, and the
values are displayed in a normal font and indented to the right.
The data for the graph is shown above. The study values come from the "Subgroup" column and
are displayed in "ObsId" order. For subgroup labels like "Overall", the Id value is "1", and for the
values in the subgroup, the Id value is "2". This Id is used to control the attributes of the values that are
displayed in the first column of the graph using the attribute map as defined below. Text attributes
are defined for each Id value: Id=1 values are displayed with a bigger, bold font.
In addition, the indentation of some of the values in column 1 is controlled by the "IndentWt" column.
The default indentation is 1/8 inch and can be changed by using the INDENT option. The actual
indentation is the INDENTWEIGHT value multiplied by the INDENT value.
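The bold titles and indented values can be sketched with a text attribute map. The subgroup data and hazard ratios are hypothetical, the map ID "text" and the font sizes are assumptions, and TEXTGROUP=/TEXTGROUPID= require a SAS 9.4 maintenance release that supports text attribute maps on axis tables.

```sas
/* Hypothetical subgroup data: ID=1 rows are titles, ID=2 rows are values */
data forest;
  input obsid id indentwt hr low high subgroup :$12.;
  datalines;
1 1 0 0.80 0.70 0.92 Overall
2 1 0 .    .    .    Age
3 2 1 0.75 0.60 0.93 <65
4 2 1 0.86 0.70 1.05 >=65
5 1 0 .    .    .    Sex
6 2 1 0.78 0.62 0.98 Male
7 2 1 0.83 0.66 1.04 Female
;
run;

/* Text attributes by value of ID; the map ID is "text" */
data attrmap;
  length id $4 value $1 textcolor $5 textweight $6;
  input id $ value $ textcolor $ textsize textweight $;
  datalines;
text 1 black 8 bold
text 2 black 7 normal
;
run;

proc sgplot data=forest dattrmap=attrmap noautolegend nowall noborder;
  /* Titles in bold; values indented by INDENTWT times INDENT */
  yaxistable subgroup / y=obsid location=inside position=left
             textgroup=id textgroupid=text indentweight=indentwt;
  highlow y=obsid low=low high=high;
  scatter y=obsid x=hr / markerattrs=(symbol=squarefilled);
  refline 1 / axis=x;
  yaxis reverse display=none;
  xaxis type=log label="Hazard ratio";
run;
```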
The second column contains a combination of patient count and percentage and is displayed by
another YAXISTABLE statement. The hazard ratio graph in the middle is displayed using a
highlow plot and a scatter plot. Then, the three columns on the right are displayed using another
YAXISTABLE, with three columns.
The insets at the bottom, "<- PCI Better" and "Therapy Better ->", are displayed in a text plot using
the "text", "xl", and "ObsId" columns. The text is center justified. The full code and the data set
are available from the author's page at http://support.sas.com/matange.
Finally, the wide horizontal bands across the graph are drawn using the REFLINE statement with
the "Ref" column. This column is a copy of the ObsId column in which alternating groups of three
observations are set to missing. Reference lines are not drawn when the value is missing. Also note
that the x-axis line is drawn only to the extent of the actual data, rather than across the full width,
by using the AXISEXTENT option on the STYLEATTRS statement.
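Creating the Ref column can be sketched in a DATA step. Assuming ObsId runs 1, 2, 3, and so on, the MOD/CEIL test keeps rows 1 to 3, 7 to 9, and so on, and blanks rows 4 to 6 and 10 to 12. The line thickness and color in the REFLINE statement are placeholder assumptions that must be tuned to the row height so that adjacent lines merge into a band.

```sas
/* Hypothetical rows with ObsId = 1 to 12 */
data bands;
  do obsid = 1 to 12;
    ref = obsid;
    /* Blank alternate blocks of three: rows 4-6 and 10-12 */
    if mod(ceil(obsid / 3), 2) = 0 then ref = .;
    output;
  end;
run;

proc sgplot data=bands noautolegend;
  /* One thick, light line per non-missing REF value */
  refline ref / axis=y lineattrs=(thickness=15 color=cxf0f0f7);
  scatter x=obsid y=obsid;
  yaxis reverse;
run;
```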
Relevant details are shown in the code snippet above. For full details, see Program 4_7.
The graph above displays each adverse event as a bar segment over its duration. The color of the
event is set by the severity. The source data is in the CDISC SDTM tabulation model
format, as shown below.
The data has many columns, but the ones that we are using are aeseq, aedecod, aesev, aestdtc, and
aeendtc. In the example above, all aestdtc values are present and assumed to be valid. If not, some
data cleaning might be needed. In the DATA step, stdate is extracted from aestdtc and endate
from aeendtc. If aeendtc is missing, the largest value of endate is used, and highcap is set to
FilledArrow to indicate that the event does not have an end date. A valid end date is required to
draw the event in the graph. The data set that is required for plotting the graph is shown below.
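The date handling described above can be sketched as follows. The three AE records are hypothetical and keep only the columns named in the text. The YYMMDD10. informat assumes ISO 8601 dates of the form YYYY-MM-DD; any time part after the date is ignored because only 10 characters are read. The largest end date comes from PROC SQL.

```sas
/* Hypothetical SDTM-style AE records (only the columns used here) */
data ae;
  length aedecod $12 aesev $8 aestdtc aeendtc $16;
  infile datalines dsd truncover;
  input aeseq aedecod $ aesev $ aestdtc $ aeendtc $;
  datalines;
1,Headache,Mild,2013-03-06,2013-03-09
2,Cough,Moderate,2013-03-08,
3,Dizziness,Mild,2013-03-10T14:30,2013-03-14
;
run;

/* ISO 8601 text to SAS dates; ?? suppresses notes for missing values */
data ae1;
  set ae;
  stdate = input(aestdtc, ?? yymmdd10.);
  endate = input(aeendtc, ?? yymmdd10.);
run;

/* Largest end date across all events */
proc sql noprint;
  select max(endate) format=best12. into :maxend trimmed from ae1;
quit;

data ae2;
  set ae1;
  length highcap $12;
  format stdate endate date9.;
  /* No end date: extend to the largest end date and mark with an arrow */
  if missing(endate) then do;
    endate = &maxend;
    highcap = "FilledArrow";
  end;
run;
```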
Figure 4.8.3 Data Set for Adverse Event Timeline Graph with Caps
The data set below is computed for creating the graph. Note that in this data set, we do not have
any observations with Severity=Severe. However, the legend in the graph does have an entry for
Severe. These dummy observations do not have valid start and end values, so they are not
actually drawn in the graph. The top x-axis is enabled by using a scatter plot assigned to the x2-
axis. Macro variables are used to align the x- and x2-axes.
Observations with specific group values are assigned the color and other attributes from the
GraphData1-12 style elements. These are assigned in the order in which they are encountered in
the data. In this case, we are using specific colors for "Mild", "Moderate", and "Severe". If we just
change the colors of the style elements, we will get the three colors, but which severity value receives
which color can shift based on the order of the data.
To ensure consistent and reliable color assignment, we will use a discrete attribute map data set,
which explicitly assigns colors to specific group values. Now, the group values get their colors by
value, and not by the order of the values in the data. In this case, LineColor is used both for lines
and text.
Another benefit of the attribute map comes from the SAS 9.4 "Show" column in the map data set,
which can be used with every map ID that is defined in the data set. In this case, there is only one
ID, "Severity". By default, only the values that occur in the data are included in the legend. But if
Show is set to "AttrMap", all the values for that attribute map ID are shown in the legend. In this
case, even though the aesev value "Severe" never occurs in the data, it is still shown in the legend.
This feature also lists the legend values in the same order in which they appear in the attribute
map, so we can use it to get a custom sort order for the legend.
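A minimal sketch of such an attribute map follows, with hypothetical colors and a small hypothetical EVENTS data set that has no Severe rows. The columns ID, Value, FillColor, LineColor, and Show are the ones that PROC SGPLOT reads from a DATTRMAP= data set, and the map is selected on the plot statement through ATTRID=.

```sas
/* Discrete attribute map: every severity value gets fixed colors.
   Show='AttrMap' lists all values of the ID in the legend, in map order. */
data attrmap;
   length id $8 value $10 fillcolor linecolor $12 show $8;
   id   = 'Severity';
   show = 'AttrMap';
   value = 'Mild';     fillcolor = 'green';  linecolor = 'green';  output;
   value = 'Moderate'; fillcolor = 'orange'; linecolor = 'orange'; output;
   value = 'Severe';   fillcolor = 'red';    linecolor = 'red';    output;
run;

/* Hypothetical events: note that no event is Severe */
data events;
   input aeseq aedecod :$12. aesev :$8. stday enday;
   datalines;
1 Headache Mild 1 5
2 Nausea Moderate 3 12
3 Dizziness Mild 8 15
;
run;

proc sgplot data=events dattrmap=attrmap;
   highlow y=aeseq low=stday high=enday / group=aesev attrid=Severity
           type=bar barwidth=0.6;
   yaxis type=discrete reverse display=none;
   xaxis label='Study Day';
run;
```

With this map, the legend should show all three severities, including "Severe", even though no event in the data has that value.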
The highlow plot is ideally suited for such a use case, and it supports drawing labels and
arrowhead caps at each end. In this case, the LOWLABEL option is used to draw the event names,
and the aedecod label is displayed only for the first occurrence of each event. The HIGHCAP option
is used to draw the arrowhead shown for "Cough" at the right end, which indicates an event that
does not have an end date in the data.
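The labeling and capping options can be sketched as follows. The data are hypothetical: the new column EVLABEL holds the event name only on the first occurrence of each event, and the CAP column holds FilledArrow only for the event without an end date.

```sas
/* Hypothetical events; CAP is set only for the event without an end date */
data ae;
   infile datalines missover;
   length aedecod $12 aesev $8 cap $12;
   input aeseq aedecod $ aesev $ stday enday cap $;
   datalines;
1 Cough Mild 1 6
2 Headache Moderate 3 9
3 Cough Moderate 10 20 FilledArrow
;
run;

/* Keep the event name only on the first occurrence of each event */
proc sort data=ae;
   by aedecod stday;
run;

data ae;
   set ae;
   by aedecod;
   length evlabel $12;
   if first.aedecod then evlabel = aedecod;
run;

proc sgplot data=ae;
   highlow y=aeseq low=stday high=enday / group=aesev type=bar barwidth=0.5
           lowlabel=evlabel highcap=cap;
   yaxis type=discrete reverse display=none;
   xaxis label='Study Day';
run;
```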
For the grayscale use case, we can change the highlow plot type to the default, TYPE=LINE. This
allows the line pattern to be used as the visual element for the different severity values. Here is
the graph, along with the appropriate attribute map.
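For the grayscale version, an attribute map that assigns line patterns instead of fill colors might look like the following sketch. The map data set name, the chosen patterns, and the sample data are assumptions for illustration.

```sas
/* Grayscale map: black lines, one line pattern per severity value */
data attrmap_gray;
   length id $8 value $10 linecolor $8 linepattern $12 show $8;
   id        = 'Severity';
   show      = 'AttrMap';
   linecolor = 'black';
   value = 'Mild';     linepattern = 'Solid';     output;
   value = 'Moderate'; linepattern = 'ShortDash'; output;
   value = 'Severe';   linepattern = 'Dot';       output;
run;

/* Hypothetical events */
data events;
   input aeseq aedecod :$12. aesev :$8. stday enday;
   datalines;
1 Headache Mild 1 5
2 Nausea Moderate 3 12
;
run;

proc sgplot data=events dattrmap=attrmap_gray;
   /* TYPE=LINE (the default) so that the line pattern carries the severity */
   highlow y=aeseq low=stday high=enday / group=aesev attrid=Severity
           type=line lineattrs=(thickness=3) lowlabel=aedecod;
   yaxis type=discrete reverse display=none;
   xaxis label='Study Day';
run;
```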
Relevant details are shown in the code snippet above. For full details, see Program 4_8.
The graph displays the change in tumor size in descending order of size increase for the population
by treatment. The data is shown on the right. The response type is shown at the end of the bar.
Confidence limits are shown at +20% and -30%. A partial response is generally indicated for
tumor shrinkage of 30% or more; however, the author does not claim domain-specific expertise.
See domain-centric papers for more information about such details.
The STYLEATTRS statement is used to control the colors for treatments 1 and 2. For specific
assignment of colors by treatment, a discrete attribute map would be preferred.
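A minimal sketch of the STYLEATTRS approach, using a hypothetical TUMOR data set with one bar per subject, is shown below. DATACOLORS= assigns the colors to the treatment groups in the order in which they occur in the data, which is why a discrete attribute map is the safer choice when colors must follow specific values.

```sas
/* Hypothetical percent change in tumor size, one row per subject */
data tumor;
   input id trt $ change;
   datalines;
1 A 40
2 B 22
3 A 10
4 B -5
5 A -18
6 B -32
7 A -45
8 B -60
;
run;

proc sort data=tumor;
   by descending change;
run;

proc sgplot data=tumor;
   /* Colors go to the treatment values in data order */
   styleattrs datacolors=(cx4f81bd cxc0504d);
   vbar id / response=change group=trt;
   refline 20 -30 / lineattrs=(pattern=shortdash);
   xaxis discreteorder=data display=none;
   yaxis label='Change from Baseline (%)';
run;
```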
A serious clinical graph does not necessarily have to have boring aesthetics. The graph below
displays the same information using a different set of colors and presentation aspects, including
bars with a textured look. The confidence region is displayed using a band plot with 50%
transparency.
A VBARPARM statement is used instead of a VBAR statement because we want to layer a band
plot in the graph. Grid lines are enabled, and the legend has an opaque background. Group display
of "Cluster" is used so that we can display the bar data labels.
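The layering can be sketched as follows, again with hypothetical data. The BAND statement uses constant LOWER= and UPPER= values for the -30% and +20% thresholds and is assumed to share the discrete X axis that the VBARPARM statement creates.

```sas
/* Hypothetical percent change in tumor size, one row per subject */
data tumor;
   input id trt $ change;
   datalines;
1 A 40
2 B 22
3 A 10
4 B -5
5 A -18
6 B -32
7 A -45
8 B -60
;
run;

proc sort data=tumor;
   by descending change;
run;

proc sgplot data=tumor;
   /* Region between -30% and +20%, drawn at 50% transparency */
   band x=id lower=-30 upper=20 / transparency=0.5 fillattrs=(color=lightgray)
        legendlabel='-30% to +20%';
   /* VBARPARM (not VBAR) so that the band can be layered with the bars */
   vbarparm category=id response=change / group=trt groupdisplay=cluster datalabel;
   xaxis discreteorder=data display=none;
   yaxis grid label='Change from Baseline (%)';
run;
```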
Relevant details are shown in the code snippet above. For full details, see Program 4_9.
We have used a VBAR statement with Time as the category and Cohort (Group) as the group. The
time values are treated as discrete, and each cluster of incidence bars is positioned at equidistant
midpoints along the axis.
The STYLEATTRS statement is used to set the four colors for the group values. Y-axis grid lines
are enabled, and the tick marks are removed.
baselineattrs=(thickness=0) outlineattrs=(color=gray);
xaxis discreteorder=data valueattrs=(size=8) fitpolicy=none
display=(nolabel);
yaxis offsetmin=0.04 grid display=(noticks);
keylegend / title='' location=inside position=topright across=1 border
autoitemsize valueattrs=(size=8);
run;
We have used a new SAS 9.4 feature, the AXISEXTENT=DATA option on the STYLEATTRS
statement, to draw each axis only over the extent of the data. Many users prefer the resulting
appearance.
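Because the snippet above starts partway through the VBAR statement, here is a complete sketch of the same kind of graph. The INCIDENCE data set, its values, and the four colors are hypothetical, and AXISEXTENT= on STYLEATTRS assumes a SAS 9.4 maintenance release that supports it.

```sas
/* Hypothetical incidence (%) by time point and cohort */
data incidence;
   input time $ cohort $ pct;
   datalines;
Week1 A 12
Week1 B 15
Week1 C 9
Week1 D 11
Week4 A 18
Week4 B 14
Week4 C 16
Week4 D 10
Week8 A 21
Week8 B 17
Week8 C 13
Week8 D 19
;
run;

proc sgplot data=incidence;
   /* Four colors, assigned to cohorts in data order; axes end at the data extent */
   styleattrs datacolors=(cx4f81bd cxc0504d cx9bbb59 cx8064a2) axisextent=data;
   vbar time / response=pct group=cohort groupdisplay=cluster
               baselineattrs=(thickness=0) outlineattrs=(color=gray);
   xaxis discreteorder=data valueattrs=(size=8) display=(nolabel);
   yaxis offsetmin=0.04 grid display=(noticks) label='Incidence (%)';
   keylegend / title='' location=inside position=topright across=1 border;
run;
```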
Relevant details are shown in the code snippet above. For full details, see Program 4_10.
The graph above uses two VBOX statements, one each for the values for drugs A and B.
The levels of concern for the lab tests are different, so we have used the DROPLINE statement to
draw the levels differently for the upper and lower values. Discrete offset is used to start the drop
line halfway between the lab values.
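The side-by-side boxes can be sketched with DISCRETEOFFSET= and BOXWIDTH=, assuming a hypothetical LABS data set with one column per drug. The DROPLINE levels of concern are omitted to keep the sketch short.

```sas
/* Hypothetical wide data: one column per drug, eight values per test */
data labs;
   do test = 'ALT', 'AST';
      do i = 1 to 8;
         drugA = 1 + mod(i * 7, 10) / 4;
         drugB = drugA + 0.4;
         output;
      end;
   end;
   drop i;
run;

proc sgplot data=labs;
   /* Shift each drug's boxes to the left or right of the category midpoint */
   vbox drugA / category=test discreteoffset=-0.15 boxwidth=0.25
                fillattrs=graphdata1 legendlabel='Drug A' name='a';
   vbox drugB / category=test discreteoffset=0.15 boxwidth=0.25
                fillattrs=graphdata2 legendlabel='Drug B' name='b';
   keylegend 'a' 'b';
   xaxis display=(nolabel);
   yaxis label='Lab Value';
run;
```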
In this example, the data is arranged by group, instead of in multiple columns as in 4.11.1. For a
black-and-white medium, we use empty boxes with the Journal style. Because we have set all lines
to solid, we need another way to indicate the treatment name.
Here we overlay a scatter plot with Y=OUT, where OUT is a column whose values are all greater
than 4. We also set MAX=4 on the y-axis so that these fake markers are not drawn, while their
entries are still retained in the legend.
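The legend trick can be sketched as follows, assuming a hypothetical LABS data set in group format and a style such as Journal in which each group gets a different marker symbol. As described in the text, MAX=4 keeps the scatter markers at OUT=5 out of the plot area while their legend entries remain.

```sas
/* Hypothetical lab values by test and treatment; OUT is a fake value above 4 */
data labs;
   do test = 'ALT', 'AST';
      do trt = 'A', 'B';
         do i = 1 to 8;
            value = 1 + mod(i * 7, 10) / 4;
            if trt = 'B' then value = value + 0.4;
            out = 5;
            output;
         end;
      end;
   end;
   drop i;
run;

proc sgplot data=labs;
   /* Empty boxes with all lines solid */
   vbox value / category=test group=trt nofill lineattrs=(pattern=solid)
                whiskerattrs=(pattern=solid) medianattrs=(pattern=solid);
   /* Markers at Y=5 exist only to produce a legend entry per treatment */
   scatter x=test y=out / group=trt name='trt';
   keylegend 'trt' / title='Treatment';
   yaxis max=4;
run;
```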
Relevant details are shown in the code snippet above. For full details, see Program 4_11.
Figure 4.12.1 Clark Error Grid for Blood Glucose Measurement Accuracy
The data for this graph includes the measured and reference glucose observations, the zone
boundaries, and the zone labels.
The scatter plot in the program draws the metered glucose values against the reference values. The
series plot displays the boundaries of each zone, and the text plot displays the zone names. The
text plot is optimized for displaying textual items in a graph. The markers are grouped by zone, so
the markers in each zone are drawn with different attributes.
The attribute map is not used here because the zones are clearly marked in the graph itself. Having
different markers in each zone is helpful but not necessary. However, if each zone must get the
same markers across different graphs, this can be ensured by using a discrete attribute map.
All axis offsets are set to zero to ensure the zone boundaries touch the axes. This also removes the
effect of any offset contributions preferred by the text plot.
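A skeleton of this layering is sketched below with hypothetical readings and illustrative boundary lines; these are not the complete Clarke zone definitions. The three inputs are stacked into one data set so that each plot statement uses only its own columns.

```sas
/* Hypothetical paired readings, already classified into zones */
data readings;
   input ref meas zone $;
   datalines;
100 110 A
150 140 A
300 320 A
200 260 B
250 180 B
;
run;

/* Illustrative boundary segments only (BID = segment ID) */
data bounds;
   input bid bndx bndy;
   datalines;
1 0 0
1 400 400
2 70 84
2 333 400
3 70 56
3 400 320
;
run;

/* Zone label positions */
data labels;
   input lx ly ltext $;
   datalines;
330 300 A
150 250 B
;
run;

data clarke;
   set readings bounds labels;
run;

proc sgplot data=clarke noautolegend;
   series x=bndx y=bndy / group=bid lineattrs=(color=black pattern=solid);
   scatter x=ref y=meas / group=zone markerattrs=(symbol=circlefilled size=6);
   text x=lx y=ly text=ltext / textattrs=(size=14 weight=bold);
   /* Zero offsets so that the boundaries touch the axes */
   xaxis min=0 max=400 offsetmin=0 offsetmax=0 label='Reference Glucose';
   yaxis min=0 max=400 offsetmin=0 offsetmax=0 label='Measured Glucose';
run;
```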
Relevant details are shown in the code snippet above. For full details, see Program 4_12.
Figure 4.13.1.1 The Swimmer Plot for Tumor Response over Time
An arrowhead on the right indicates continuing response. The bar contains durations over which
the "Complete" or "Partial" response is indicated, with a start and end time. The disease stage is
indicated by the color of the bar, with a legend showing the unique values below the x-axis. An
inset is included to decode the different markers in the event bar. A "Durable" response is
indicated by the square marker on the left end of the bar.
Note that the start and end points for each response are represented by colored markers inside each
event bar, while the same points are shown in grayscale in the inset table. This is achieved by first
plotting the markers in a gray color and then overdrawing them with colored markers by using
GROUP=Status. The scatter plots that draw the gray markers are the ones that are included in the
inset.
Also note the "right arrow" marker in the inset, which indicates a continuing event. It comes from a
scatter plot with a right-triangle marker whose data is missing, so nothing is drawn in the plot area,
but the marker is still included in the inset.
The structure of the data set that is needed for the graph is shown below.
Note that although the program for this graph is longer than some of the others, it can be built one
part at a time:
1. Plot the full duration from Low to High by Item by using a grouped highlow plot with
TYPE=BAR and a high cap. Include this plot in the outside legend.
2. Layer the individual "Response" events from Startline to Endline by Status by using a highlow
plot with the default line type. Include this plot in the inset legend.
3. Layer the Start and End events in a gray color. Include these in the inset legend.
4. Layer the Start and End events again by using GROUP=Status.
5. Add a scatter plot with missing data to include the "Right Arrow" in the legend.
The discrete attribute map data set contains two maps: "StatusC" for the color graph and "StatusJ"
for the grayscale graph. AttrId=StatusC is used in this graph. For full details, see Program 4_13.
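The five steps above can be sketched with a small hypothetical SWIM data set in which subject-bar rows and response rows are stacked. This sketch leaves out the attribute map (the book uses AttrId=StatusC), and the marker symbols and legend placement are assumptions.

```sas
/* Hypothetical data: bar rows (LOW/HIGH/STAGE/CAP) and response rows (STARTLINE/ENDLINE/STATUS) */
data swim;
   infile datalines dsd missover;
   length stage $8 cap $12 status $8;
   input item low high stage $ cap $ startline endline status $;
   nodata = .;   /* always missing: used only for a legend entry */
   datalines;
1,0,18,Stage 1,FilledArrow,,,
1,,,,,3,12,Complete
2,0,10,Stage 2,,,,
2,,,,,2,7,Partial
3,0,14,Stage 1,,,,
3,,,,,5,14,Partial
;
run;

proc sgplot data=swim;
   /* 1. Full duration per subject, colored by stage; the cap marks an ongoing response */
   highlow y=item low=low high=high / type=bar group=stage highcap=cap
           barwidth=0.6 transparency=0.4 name='stage';
   /* 2. Response segments as lines, grouped by status */
   highlow y=item low=startline high=endline / group=status
           lineattrs=(thickness=3 pattern=solid) name='resp';
   /* 3. Gray start and end markers: these feed the inset legend */
   scatter y=item x=startline / markerattrs=(symbol=trianglefilled color=gray)
           name='start' legendlabel='Response start';
   scatter y=item x=endline / markerattrs=(symbol=circlefilled color=gray)
           name='end' legendlabel='Response end';
   /* 4. The same markers redrawn on top, colored by status */
   scatter y=item x=startline / group=status markerattrs=(symbol=trianglefilled);
   scatter y=item x=endline / group=status markerattrs=(symbol=circlefilled);
   /* 5. All X values missing: nothing is drawn, but the marker appears in the inset */
   scatter y=item x=nodata / markerattrs=(symbol=trianglerightfilled color=gray)
           name='cont' legendlabel='Continued response';
   keylegend 'stage' / title='Disease Stage' location=outside position=bottom;
   keylegend 'resp' 'start' 'end' 'cont' / location=inside position=bottomright
             across=1 noborder;
   yaxis type=discrete reverse display=(noticks) label='Subject';
   xaxis min=0 label='Months';
run;
```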
4.13.2 The Swimmer Plot for Tumor Response over Time in Grayscale
The tumor response graph is shown in grayscale. Because color cannot be used to indicate the
disease stage, the stage is shown on the left.
Patterned lines are used to draw the response events, and a YAXISTABLE is used to draw the
stage labels on the left. ATTRID=StatusJ is used in this graph. For full details, see Program 4_13.
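A sketch of the grayscale variant follows. It assumes a hypothetical data set and a StatusJ map whose contents (black lines with one pattern per status) are illustrative. YAXISTABLE draws the stage values in a column to the left of the Y axis.

```sas
/* Grayscale map: black lines, one pattern per response status (contents hypothetical) */
data statusj;
   length id $8 value $8 linecolor $8 linepattern $10;
   id        = 'StatusJ';
   linecolor = 'black';
   value = 'Complete'; linepattern = 'Solid';     output;
   value = 'Partial';  linepattern = 'ShortDash'; output;
run;

/* Hypothetical subjects with one response segment each */
data swim2;
   infile datalines dsd;
   length stage $8 status $8;
   input item low high stage $ status $;
   datalines;
1,0,18,Stage 1,Complete
2,0,10,Stage 2,Partial
3,0,14,Stage 1,Partial
;
run;

proc sgplot data=swim2 dattrmap=statusj;
   highlow y=item low=low high=high / group=status attrid=StatusJ type=line
           lineattrs=(thickness=3) name='resp';
   /* Stage values drawn as a column to the left of the Y axis */
   yaxistable stage / location=outside position=left;
   keylegend 'resp' / title='Response';
   yaxis type=discrete reverse display=(noticks) label='Subject';
   xaxis min=0 label='Months';
run;
```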
Figure 4.13.2 The Swimmer Plot for Tumor Response over Time in Grayscale
The graph above renders the full CDC chart for Length and Weight Percentiles from the data for
one subject. The original graph was a bit taller, but I shrank it to fit this page. The data that is
required is created by appending the CDC percentile data with the historical data for one subject.
The CDC data is included in the file named "4_14_CDC_Cleaned.csv".
The CDC data for the percentile curves is shown below. Only a few of the observations are
displayed to conserve space. Also, the data contains all the columns for 5, 10, 25, 50, 75, 90, and
95 percentiles, but only a few columns are included to fit in the space.
The historical data for the subject is appended at the bottom of the curve data, using the column
names Sex, Age, Weight, and Height, as shown below.
title j=l h=9pt 'Birth to 36 months: Boys' j=r "Name: John Smith";
title2 j=l h=8pt "Length-for-age and Weight-for-age percentiles" j=r
"Record # 12345-67890";
footnote j=l h=7pt "Published May 30, 2000 (modified 4/20/01) CDC";
proc sgplot data=Chart_Patient noautolegend;
where sex=1;
refline 3 4 5 6 / axis=y2 lineattrs=graphgridlines;
/*--Curve bands--*/
band x=agemos lower=w5 upper=w95 / y2axis fillattrs=graphdata1
transparency=0.9;
band x=agemos lower=w10 upper=w90 / y2axis fillattrs=graphdata1
transparency=0.8;
band x=agemos lower=w25 upper=w75 / y2axis fillattrs=graphdata1
transparency=0.8;
/*--Curves--*/
series x=agemos y=w5 / y2axis lineattrs=graphdata1 transparency=0.5;
series x=agemos y=w10 / y2axis lineattrs=graphdata1 transparency=0.7;
series x=agemos y=w25 / y2axis lineattrs=graphdata1 transparency=0.7;
series x=agemos y=w50 / y2axis x2axis lineattrs=graphdata1;
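series x=agemos y=w75 / y2axis lineattrs=graphdata1 transparency=0.7;
series x=agemos y=w90 / y2axis lineattrs=graphdata1 transparency=0.7;
series x=agemos y=w95 / y2axis lineattrs=graphdata1 transparency=0.5;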
The program that is required to draw all the elements of this graph is long, but easy to understand.
So, I have shown it in parts across the following pages. The first part of the program is shown
above, with titles, footnotes, and percentile curves for Weight. The bands are drawn with three
transparent overlays to create the appearance of color gradation. The curves are overlaid on the
bands.
/*--Curve labels--*/
text x=agemos y=w5 text=l5 / y2axis textattrs=graphdata1
position=right;
text x=agemos y=w10 text=l10 / y2axis textattrs=graphdata1
position=right;
text x=agemos y=w25 text=l25 / y2axis textattrs=graphdata1
position=right;
text x=agemos y=w50 text=l50 / y2axis textattrs=graphdata1
position=right;
text x=agemos y=w75 text=l75 / y2axis textattrs=graphdata1
position=right;
text x=agemos y=w90 text=l90 / y2axis textattrs=graphdata1
position=right;
text x=agemos y=w95 text=l95 / y2axis textattrs=graphdata1
position=right;
/*--Patient data--*/
series x=age y=weight / lineattrs=graphdata1(thickness=2)
y2axis markers markerattrs=(symbol=circlefilled size=11)
filledoutlinedmarkers markerfillattrs=(color=white)
markeroutlineattrs=graphdata1(thickness=2);
The code section above draws the curve labels for the percentile curves on the right. This is
overlaid by the historical subject weight data as a series plot. The code for Height is shown below.
/*--Curve bands--*/
band x=agemos lower=h5 upper=h95 / fillattrs=graphdata3
transparency=0.9;
band x=agemos lower=h10 upper=h90 / fillattrs=graphdata3
transparency=0.8;
band x=agemos lower=h25 upper=h75 / fillattrs=graphdata3
transparency=0.8;
/*--Curves--*/
series x=agemos y=h5 / lineattrs=graphdata3(pattern=solid)
transparency=0.5;
series x=agemos y=h10 /lineattrs=graphdata3(pattern=solid)
transparency=0.7;
series x=agemos y=h25 /lineattrs=graphdata3(pattern=solid)
transparency=0.7;
series x=agemos y=h50 /lineattrs=graphdata3(pattern=solid) x2axis;
series x=agemos y=h75 /lineattrs=graphdata3(pattern=solid)
transparency=0.7;
series x=agemos y=h90 /lineattrs=graphdata3(pattern=solid)
transparency=0.7;
series x=agemos y=h95 /lineattrs=graphdata3(pattern=solid)
transparency=0.5;
/*--Curve labels--*/
text x=agemos y=h5 text=l5 / textattrs=graphdata3
position=bottomright;
text x=agemos y=h10 text=l10 / textattrs=graphdata3 position=right;
text x=agemos y=h25 text=l25 / textattrs=graphdata3 position=right;
text x=agemos y=h50 text=l50 / textattrs=graphdata3 position=right;
text x=agemos y=h75 text=l75 / textattrs=graphdata3 position=right;
text x=agemos y=h90 text=l90 / textattrs=graphdata3 position=right;
text x=agemos y=h95 text=l95 / textattrs=graphdata3 position=topright;
/*--Patient data--*/
series x=age y=height /
lineattrs=graphdata3(pattern=solid thickness=2)
markers markerattrs=(symbol=circlefilled size=11)
filledoutlinedmarkers markerfillattrs=(color=white)
markeroutlineattrs=graphdata3(thickness=2);
The Height (Length) and Weight data ranges are different, so they need to be plotted with different
vertical scales and axis details. We can do that by using a separate Y axis for each variable. Here
we used the Y2AXIS for Weight and the YAXIS for Height. This breaks the link between the two
variables' scales, allowing us to draw the Height and Weight curves and data independently.
/*--Table--*/
inset "   Date     Age(Mos) Wt(Kg) Ln(Cm)"
"04 May 2010 Birth    3.5    52"
"02 Aug 2010 3        6.5    63"
"01 Nov 2010 6        8.5    68"
"07 Feb 2011 9        9.5    72"
"02 May 2011 12       10.5   75" / border
textattrs=(family='Courier' size=6 weight=bold)
position=bottomright;
Note the options on the YAXIS and the Y2AXIS statements. The Y2AXIS has
OFFSETMAX=0.25, which means that all items that are associated with it are displayed only in the
lower 75% of the plot area. This causes all the "Weight" related items and the axis (drawn in
blue) to be drawn in the lower part.
Similarly, the YAXIS has OFFSETMIN=0.25, so all the "Height or Length" related items are
drawn in the upper part of the graph. More importantly, the scaling for each axis is independent,
allowing us to draw different tick values on the axes. To make the graph easier to read, we have
taken care to position the Y grid lines so that they line up with the values on each side.
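The axis-splitting idea can be tried on the subject data alone. This sketch uses the five visits listed in the inset; the axis labels and tick values are assumptions, and the CDC percentile curves are omitted.

```sas
/* The five visits from the inset (age in months, weight in kg, length in cm) */
data growth;
   input age weight height;
   datalines;
0 3.5 52
3 6.5 63
6 8.5 68
9 9.5 72
12 10.5 75
;
run;

proc sgplot data=growth noautolegend;
   /* Length on the left Y axis, pushed into the upper 75% of the axis */
   series x=age y=height / markers lineattrs=graphdata3 markerattrs=graphdata3;
   /* Weight on the right Y2 axis, kept in the lower 75% */
   series x=age y=weight / y2axis markers lineattrs=graphdata1 markerattrs=graphdata1;
   yaxis  offsetmin=0.25 label='Length (cm)';
   y2axis offsetmax=0.25 label='Weight (kg)';
   xaxis values=(0 to 12 by 3) label='Age (months)';
run;
```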
The program snippet above also shows how we can include the historical data as a tabular display
in the chart for easy reference. I have used the INSET statement to create a tabular display.
Although the values here are hardcoded, we can use macro variables assigned from the DATA step.
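One way to replace the hardcoded strings is to build them in a DATA step and pass them through a macro variable. This is a sketch under assumptions: the VISITS data set and the macro variable name INSETROWS are hypothetical, and the column widths are chosen so that the rows line up under the header in a monospaced font.

```sas
/* The visit records from the inset */
data visits;
   input date :date9. age weight height;
   format date date9.;
   datalines;
04MAY2010 0 3.5 52
02AUG2010 3 6.5 63
01NOV2010 6 8.5 68
07FEB2011 9 9.5 72
02MAY2011 12 10.5 75
;
run;

/* Build one quoted, fixed-width string per visit and collect them in a macro variable */
data _null_;
   set visits end=last;
   length row $40 rows $400;
   retain rows;
   row = put(date, date9.) || ' ' || put(age, 8.) || ' ' ||
         put(weight, 6.1) || ' ' || put(height, 6.);
   rows = catx(' ', rows, quote(trim(row)));
   if last then call symputx('insetrows', rows);
run;

proc sgplot data=visits noautolegend;
   series x=age y=weight / markers;
   /* The header is padded to line up with the fixed-width rows */
   inset "Date      Age(Mos) Wt(Kg) Ln(Cm)" &insetrows / border
         textattrs=(family='Courier' size=6) position=bottomright;
run;
```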
Relevant details are shown in the code snippet above. For full details, see Program 4_14.
4.15 Summary
The graphs discussed in this chapter represent a large fraction of the graphs commonly used in the
clinical trials industry and in Health and Life Sciences in general. Most of these are "single-cell"
graphs where the main data is displayed in one cell in the middle, along with other information.
In this chapter, we have used the SAS 9.4 SGPLOT procedure, which provides you with a large
selection of plot statements that can be used to create many graphs on their own. Many of these
plot statements can be combined in creative ways to create almost any graph that might be needed.
Some new statements, such as the axis tables, and options newly added to SAS 9.4 make it much
easier to create these graphs.
The SG Annotate facility further enhances your ability to create custom graphs using the SGPLOT
procedure. Although we have not used it in these examples, annotation can be very useful to add
some custom details that are otherwise hard to do using plot layers.
Group attributes such as colors or marker symbol shapes can be assigned by specific group values
using the Attribute Map feature. This ensures that attributes are correctly mapped regardless of the
data order, or whether some groups are present or not.
1
My graph is based on ideas presented in a paper. See Phillips, Stacey D. 2014. "Swimmer Plot: Tell a
Graphical Story of Your Time to Response Data Using PROC SGPLOT." Proceedings of the
Pharmaceutical Industry SAS Users Group (PharmaSUG) 2014 Conference. San Diego, CA: SAS
Institute Inc. Available at http://www.pharmasug.org/2014-proceedings.html.
Customizing the Kaplan-Meier Survival Plot
Excerpt from SAS/STAT 14.2 User's Guide
The Kaplan-Meier plot displays patient survival over time for one or more groups of patients.
The Kaplan-Meier plot is heavily used in medical, pharmaceutical, and life-sciences research. It is
a plot that researchers can easily customize in a variety of ways. This chapter shows you how to
customize the Kaplan-Meier plot through a series of examples. It discusses four types of
examples: specifying procedure options, modifying graph templates by using macro variables,
modifying graph templates by using macros, and changing styles. Each example is designed to be
small, simple, self-contained, and easy to copy and use as is or with minor modifications.
For more SAS/STAT chapters with ODS and ODS Graphics examples, please visit these
resources.
http://support.sas.com/documentation/onlinedoc/stat/142/ods.pdf
http://support.sas.com/documentation/onlinedoc/stat/142/odsgraph.pdf
http://support.sas.com/documentation/onlinedoc/stat/142/templt.pdf
support.sas.com/kuhfeld
Chapter 23
Customizing the Kaplan-Meier Survival Plot
Contents
Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 806
Controlling the Survival Plot by Specifying Procedure Options . . . . . . . . . . . . . . . . 807
Enabling ODS Graphics and the Default Kaplan-Meier Plot . . . . . . . . . . . . . . 807
Individual Survival Plots . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 809
Hall-Wellner Confidence Bands and Homogeneity Test . . . . . . . . . . . . . . . . . 811
Equal-Precision Bands . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 812
Displaying the Patients-at-Risk Table inside the Plot . . . . . . . . . . . . . . . . . . 814
Displaying the Patients-at-Risk Table outside the Plot . . . . . . . . . . . . . . . . . 816
Modifying At-Risk Table Times . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 817
Reordering the Groups . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 820
Suppressing the Censored Observations . . . . . . . . . . . . . . . . . . . . . . . . . 823
Failure Plots . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 824
Controlling the Survival Plot by Modifying Graph Templates . . . . . . . . . . . . . . . . . 824
The Modularized Templates . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 825
Changing the Plot Title . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 827
Modifying the Axis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 829
Changing the Line Thickness . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 831
Changing the Group Color . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 832
Changing the Line Pattern . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 833
Changing the Font . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 834
Changing the Legend and Inset Position . . . . . . . . . . . . . . . . . . . . . . . . . 836
Changing How the Censored Points Are Displayed . . . . . . . . . . . . . . . . . . . 838
Adding a Y-Axis Reference Line . . . . . . . . . . . . . . . . . . . . . . . . . . . . 839
Changing the Homogeneity Test Inset . . . . . . . . . . . . . . . . . . . . . . . . . . 841
Suppressing the Second Title and Adding a Footnote . . . . . . . . . . . . . . . . . . 843
Adding a Small Inset Table with Event Information . . . . . . . . . . . . . . . . . . . 844
Adding an External Table with Event Information . . . . . . . . . . . . . . . . . . . 846
Suppressing the Legend . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 848
Kaplan-Meier Plot with Event Table and Other Customizations . . . . . . . . . . . . 849
Compiled Template Cleanup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 850
Graph Templates, Macros, and Macro Variables . . . . . . . . . . . . . . . . . . . . . . . . 851
The Macro Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 853
The Smaller Macros . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 856
The Larger Macros . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 856
Event Table Macros . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 861
Overview
The LIFETEST procedure is a nonparametric procedure for analyzing survival data. You can use PROC
LIFETEST to compute the Kaplan-Meier curve (Kaplan and Meier 1958), which is a nonparametric maximum likelihood
estimate of the survivor function. The Kaplan-Meier plot (also called the product-limit survival plot) is a
popular tool in medical, pharmaceutical, and life sciences research. The Kaplan-Meier plot contains step
functions that represent the Kaplan-Meier curves of different samples (strata). The Kaplan-Meier plot has
many other features that you can add or change through procedure options, graph templates, and style
templates. This chapter explores these features in detail but does not explain how to interpret the graphs
or the underlying analysis. For more information about PROC LIFETEST and the Kaplan-Meier plot, see
Chapter 71, "The LIFETEST Procedure."
This chapter shows you how to modify the Kaplan-Meier plot through a series of examples. It discusses
four types of examples: specifying procedure options, modifying graph templates by using macro variables,
modifying graph templates by using macros, and changing styles. Most examples do not go into detail about
the tools that underlie the template changes. Each example is designed to be small, simple, self-contained,
and easy to copy and use as is or with minor modifications. Subsequent sections provide more details
about the macro variables and macros that are used to modify the graph templates. You can use the simple
examples to make a wide variety of changes without reading or understanding the detailed descriptions at the
end of this chapter.
Statistical procedures produce tables by using the Output Delivery System (ODS) and produce graphs by
using ODS Graphics. Procedures produce graphs as automatically as they produce tables, and graphs and
tables are integrated in the ODS output. Graphs that are produced by ODS Graphics are controlled by options,
the data object (the matrix of information that is graphed), a style template, and a graph template. A style
template is a SAS program that controls the overall appearance of graphs, including colors, line and marker
styles, sizes, fonts, and so on. A graph template is a SAS program, written in the Graph Template Language
(GTL), that provides a detailed specification of the layout and contents of each graph. Each graph that is
created when ODS Graphics is enabled is controlled by a graph template.1
If you want to modify a graph template, you usually use the TEMPLATE procedure to display the template
of interest, and then you copy it into your editor, modify it, and submit it to SAS to compile. Then, when you
run your procedure, it uses the new template. The PROC LIFETEST survival plot is the only plot in SAS
for which an alternative method of template modification is available. SAS provides the survival plot
templates in a series of macros and macro variables that are modular and easier to modify than the original
templates. This chapter provides numerous examples of using these macros and macro variables.
The data that are used in this chapter come from 137 bone marrow transplant patients in a study by Klein and
Moeschberger (1997) and are available in the BMT data set in the Sashelp library. At the time of transplant,
each patient is classified in one of three risk categories: ALL (acute lymphoblastic leukemia), AML (acute
myelocytic leukemia)-Low Risk, and AML-High Risk. The endpoint of interest is the disease-free survival
time, which is the time in days until death, relapse, or the end of the study. The variable Group represents the
patient's risk category, the variable T represents the disease-free survival time, and the variable Status is the
censoring indicator. A status of 1 indicates an event time, and a status of 0 indicates a censored time.
1 ODS Graphics might or might not be enabled by default. ODS Graphics is usually enabled by default in the SAS windowing
environment and disabled when you invoke SAS in other ways. However, these defaults can be changed in a number of ways. ODS
Graphics is enabled in the first example in this chapter by the ODS GRAPHICS ON statement and remains enabled throughout the
chapter.
The following step, which explicitly specifies the default PLOTS=SURVIVAL option, is equivalent to the
preceding step:
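For reference, a minimal sketch of both steps follows, using the Sashelp.BMT variables described above: T is the time, Status(0) declares 0 as the censoring value, and Group defines the strata.

```sas
ods graphics on;

/* Default Kaplan-Meier plot */
proc lifetest data=sashelp.BMT;
   time T * Status(0);
   strata Group;
run;

/* The same plot, with the default PLOTS=SURVIVAL option stated explicitly */
proc lifetest data=sashelp.BMT plots=survival;
   time T * Status(0);
   strata Group;
run;
```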
You can use the STRATA=PANEL option as follows to display the results in separate panels of a single
graphical display:
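A sketch of the STRATA=PANEL request, with the same TIME and STRATA statements as before:

```sas
/* One panel per risk group instead of overlaid curves */
proc lifetest data=sashelp.BMT plots=survival(strata=panel);
   time T * Status(0);
   strata Group;
run;
```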
The rest of this chapter discusses overlaid plots such as the one displayed in Figure 23.1.
Equal-Precision Bands
You can use the following statements to add equal-precision bands to the plot:
You can use the following statements to add both Hall-Wellner and equal-precision bands to the plot:
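A sketch of the band requests through the CB= suboption of PLOTS=SURVIVAL; the TEST suboption adds the homogeneity test p-value to the plot:

```sas
/* Hall-Wellner bands plus the homogeneity test */
proc lifetest data=sashelp.BMT plots=survival(cb=hw test);
   time T * Status(0);
   strata Group;
run;

/* Equal-precision bands */
proc lifetest data=sashelp.BMT plots=survival(cb=ep);
   time T * Status(0);
   strata Group;
run;

/* Both kinds of bands */
proc lifetest data=sashelp.BMT plots=survival(cb=all);
   time T * Status(0);
   strata Group;
run;
```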
The group labels for the at-risk table are group numbers, and these numbers appear in the legend. Numbers
are used rather than the actual labels because the length of the longest label (13) is greater than the default
that is set by the maximum label length option (MAXLEN=12). You can display labels rather than the group
numbers by specifying a MAXLEN= value equal to the maximum group label length as follows:
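A sketch of the MAXLEN= request inside the ATRISK suboption; the at-risk times shown are an arbitrary choice:

```sas
/* MAXLEN=13 matches the longest group label, so labels replace numbers */
proc lifetest data=sashelp.BMT
              plots=survival(atrisk(maxlen=13)=0 to 2500 by 500);
   time T * Status(0);
   strata Group;
run;
```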
You can specify at-risk values that do not correspond to the original time axis tick marks. You can use the
PLOTS=SURVIVAL(ATRISK(ATRISKTICK)) option to add tick marks that correspond to the specified
at-risk values:
You can display tick values only at those times that are given in the ATRISK= list:
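A sketch of both tick options follows; the at-risk times listed are arbitrary examples.

```sas
/* Add tick marks at the at-risk times to the existing ticks */
proc lifetest data=sashelp.BMT
              plots=survival(atrisk(atrisktick)=0 250 700 1400 2500);
   time T * Status(0);
   strata Group;
run;

/* Show tick values only at the at-risk times */
proc lifetest data=sashelp.BMT
              plots=survival(atrisk(atrisktickonly)=0 250 700 1400 2500);
   time T * Status(0);
   strata Group;
run;
```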
proc format;
invalue bmtnum 'AML-Low Risk' = 1 'ALL' = 2 'AML-High Risk' = 3;
value bmtfmt 1 = 'AML-Low Risk' 2 = 'ALL' 3 = 'AML-High Risk';
run;
data BMT(drop=g);
set sashelp.BMT(rename=(group=g));
Group = input(g, bmtnum.);
run;
You can submit the following steps to display ALL first, followed by AML-Low Risk and then AML-High
Risk:
proc format;
invalue bmtnum 'ALL' = 1 'AML-Low Risk' = 2 'AML-High Risk' = 3;
value bmtfmt 1 = 'ALL' 2 = 'AML-Low Risk' 3 = 'AML-High Risk';
run;
data BMT(drop=g);
set sashelp.BMT(rename=(group=g));
Group = input(g, bmtnum.);
run;
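After either pair of PROC FORMAT and DATA steps above has been submitted, a LIFETEST step along the following lines would use the new order. In this sketch, ORDER=INTERNAL sorts the strata by the unformatted numeric codes, and the FORMAT statement displays the labels.

```sas
/* Strata follow the numeric codes 1, 2, 3; the format supplies the labels */
proc lifetest data=BMT plots=survival(atrisk=0 to 2500 by 500);
   time T * Status(0);
   strata Group / order=internal;
   format Group bmtfmt.;
run;
```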
Failure Plots
All the discussion up to this point has been about survival plots. You can instead plot failure probabilities by
using the PLOTS=SURVIVAL(FAILURE) option as follows:
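A minimal sketch of the failure plot request:

```sas
/* Plot failure probabilities (1 - survival) instead of survival */
proc lifetest data=sashelp.BMT plots=survival(failure);
   time T * Status(0);
   strata Group;
run;
```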
data _null_;
%let url = //support.sas.com/documentation/onlinedoc/stat/ex_code/142;
infile "http:&url/templft.html" device=url;
file 'macros.tmp';
retain pre 0;
input;
/* Decode the HTML entities used on the web page */
_infile_ = tranwrd(_infile_, '&amp;', '&');
_infile_ = tranwrd(_infile_, '&lt;' , '<');
/* Write only the lines between <pre> and </pre> */
if index(_infile_, '</pre>') then pre = 0;
if pre then put _infile_;
if index(_infile_, '<pre>') then pre = 1;
run;
%inc 'macros.tmp' / nosource;
2 The two templates that PROC LIFETEST uses are named Stat.Lifetest.Graphics.ProductLimitSurvival
and Stat.Lifetest.Graphics.ProductLimitSurvival2.
3 You might wonder why these macros are not simply made available in the SAS autocall library. The autocall library provides
macros that you can run. In this context, you do not need to simply run a macro. You need to copy it, extract parts of it, modify those
parts, and submit the modified statements. That is not convenient with the autocall library.
4 However, there might be something that you always want to change. For example, if you always want the survival plot to be
entitled "Kaplan-Meier Plot," then you can modify the title once inside the %ProvideSurvivalMacros macro. This is not illustrated in
this chapter. All examples illustrate ad hoc changes that are made outside the context of the %ProvideSurvivalMacros macro.
Submitting these statements only defines the %ProvideSurvivalMacros macro. It does not make any of its
component macros and macro variables available. The URL macro variable is used to avoid an overly long
INFILE statement.
You can provide the default macros and macro variables by running the following macro:
%ProvideSurvivalMacros
Running this macro provides the default macros and macro variables (or restores them if you have previously
submitted the %ProvideSurvivalMacros macro).5 The %ProvideSurvivalMacros macro also runs the
%CompileSurvivalTemplates macro and hence replaces any compiled survival plot templates that you might have
created in the past. You can recompile the templates by submitting the following macro:
%CompileSurvivalTemplates
This macro runs PROC TEMPLATE and compiles the templates from all the macros and macro variables in
the %ProvideSurvivalMacros macro along with any that you modified. Running this macro produces two
compiled templates that are stored in a special SAS data file called an item store. For more information about
SAS item stores, see the section "SAS Item Stores" on page 879. Assuming that you have not modified your
ODS path by using an ODS PATH statement, compiled templates are stored in an item store in the Sasuser
library. Files in the Sasuser library persist across SAS sessions until they are deleted. When you are done
with a modified template, it is wise to clean up all remnants of it by restoring the default macros and by
deleting the modified templates from the Sasuser template item store. You can delete the modified templates
(so that SAS can only find the original templates) by running the following step:
proc template;
delete Stat.Lifetest.Graphics.ProductLimitSurvival /
store=sasuser.templat;
delete Stat.Lifetest.Graphics.ProductLimitSurvival2 /
store=sasuser.templat;
run;
This step deletes the compiled templates from the item store sasuser.templat. You can omit the STORE=
option if you are using the default ODS path, but it is good practice to explicitly control which templates are
deleted. Deleting the compiled templates does not change any of the macros or macro variables. Only the
compiled templates (not the macros or macro variables) affect the graph when you run PROC LIFETEST.
For more information about compiled templates, item stores, and cleanup, see the section "SAS Item Stores"
on page 879.
5 Semicolons are not needed after a macro call like this one, so they are not used in these examples.
%ProvideSurvivalMacros
%CompileSurvivalTemplates
The following statements modify the Y axis so that tick marks start at 0.2:
%ProvideSurvivalMacros
%CompileSurvivalTemplates
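The %LET statement that goes between the two macro calls is not reproduced above. The following minimal sketch of such a statement is based on the yOptions description in the section "The Macro Variables". The label text is an assumption. Because VIEWMIN=0.2 sets the lower end of the visible axis, any part of a curve below 0.2 would be clipped.

```sas
%ProvideSurvivalMacros

/* Hypothetical yOptions value: the LABEL= and SHORTLABEL= text is illustrative.
   VIEWMIN=0.2 starts the visible axis at 0.2, and the tick list starts there too.
   The GTL does not accept 0.2 TO 1 BY 0.2, so the ticks are listed explicitly. */
%let yOptions = label="Survival Probability" shortlabel="Survival"
                linearopts=(viewmin=0.2 viewmax=1
                            tickvaluelist=(0.2 0.4 0.6 0.8 1.0));

%CompileSurvivalTemplates
```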
%ProvideSurvivalMacros
%CompileSurvivalTemplates
%ProvideSurvivalMacros
%CompileSurvivalTemplates
The original colors (as shown in Figure 23.33) are more subtle than those shown in Figure 23.21. If you want
to change the order of the original colors by using this approach, then you need to know what they are so that
you can specify them. The graph colors for the HTMLBlue and Statistical styles are extracted from the style
in the section "Displaying a Style and Extracting Color Lists" on page 868 and displayed in Figure 23.36.
The section "Modifying Color Lists" on page 871 shows you how to change the graph template to specify
the original colors in a different order. The section "Swapping Colors among Style Elements" on page 872
shows you how to use a macro to change a style template to specify the original colors in a different order
(without having to extract and specify the color names).
%ProvideSurvivalMacros
%CompileSurvivalTemplates
Other values for the DATALINEPATTERNS= option are provided in the section "The Macro Variables" on
page 853. You must use the option ATTRPRIORITY=NONE when you want to have varying line patterns in
an ATTRPRIORITY=COLOR style like HTMLBlue or Pearl. In an ATTRPRIORITY=COLOR style, groups
are not distinguished by line patterns, and the line patterns for second and subsequent groups match the line
pattern for the first group.
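The following minimal sketch sets both options through the GraphOpts macro variable, which is described in the section "The Macro Variables". It then runs PROC LIFETEST on the Sashelp.BMT data. The three pattern names are chosen for illustration only, and the resulting appearance can depend on the style in effect.

```sas
%ProvideSurvivalMacros

/* Let line patterns vary by group even in an ATTRPRIORITY=COLOR style
   such as HTMLBlue, and choose the patterns for the first three groups
   (the pattern choice is illustrative) */
%let GraphOpts = attrpriority=none
                 datalinepatterns=(Solid ShortDash LongDash);

%CompileSurvivalTemplates

ods graphics on;
proc lifetest data=sashelp.BMT plots=survival(cb=hw test);
   time T * Status(0);
   strata Group;
run;
```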
%ProvideSurvivalMacros
%CompileSurvivalTemplates
Fonts vary from installation to installation. Sample font names include Times New Roman, Courier New,
Arial, and Calibri. For more information about text and label attribute options, see SAS Graph Template
Language: Reference. For information about changing fonts in ODS styles, see the section "Displaying a
Style and Extracting Font Information" on page 874. ODS Graphics can use a single style element in more
than one place in a graph; this example instead changes individual graph components.
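One way to change the font of a single component is to add GTL text attribute options to the yOptions macro variable. This sketch assumes that an Arial font is installed. The LABELATTRS= and TICKVALUEATTRS= options affect only the Y-axis label and tick values. The remaining yOptions values mirror the options that are described in the section "The Macro Variables", and the label text is an assumption.

```sas
%ProvideSurvivalMacros

/* Hypothetical yOptions value: axis range and ticks as described in
   "The Macro Variables", plus font attributes for the Y-axis label and
   tick values only. "Arial" is assumed to be installed. */
%let yOptions = label="Survival Probability" shortlabel="Survival"
                labelattrs=(family="Arial" size=10pt weight=bold)
                tickvalueattrs=(family="Arial" size=8pt)
                linearopts=(viewmin=0 viewmax=1
                            tickvaluelist=(0 .2 .4 .6 .8 1.0));

%CompileSurvivalTemplates
```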
%ProvideSurvivalMacros
%CompileSurvivalTemplates
%ProvideSurvivalMacros
%CompileSurvivalTemplates
The Unicode Consortium (http://unicode.org/) provides a list of character codes. Also see Figure 22.2.7
in Chapter 22, "ODS Graphics Template Modification," for information about the Unicode
specification for other markers. Although some Unicode characters are supported in some fonts, you should
always specify a Unicode font when using special characters.
referenceline y=0.5;
You can do this by using the %StmtsTop macro. By default, this macro is empty. You can use the %StmtsTop
macro to add new statements to the beginning of the block of statements that define the appearance of the
graph. In contrast, you can use the %StmtsBottom macro to provide statements at the end of the statement
block. ODS Graphics draws statements in the order in which they appear; therefore, reference lines should
be drawn first so they do not obscure other parts of the graph.
The following step creates the plot in Figure 23.26:
%ProvideSurvivalMacros
%macro StmtsTop;
referenceline y=0.5;
%mend;
%CompileSurvivalTemplates
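The same mechanism can also pass options to the added statement. The following sketch is an illustration rather than an example from this chapter. It draws the reference line with a dashed pattern, labels it by using the REFERENCELINE statement's CURVELABEL= option, and then runs PROC LIFETEST on the Sashelp.BMT data.

```sas
%ProvideSurvivalMacros

/* Dashed, labeled reference line at survival probability 0.5. Because
   %StmtsTop adds statements at the top of the overlay, the line is drawn
   before (underneath) the survival curves. */
%macro StmtsTop;
   referenceline y=0.5 / lineattrs=(pattern=ShortDash)
                         curvelabel="Median";
%mend;

%CompileSurvivalTemplates

proc lifetest data=sashelp.BMT plots=survival(cb=hw test);
   time T * Status(0);
   strata Group;
run;
```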
%macro pValue;
if (PVALUE < .0001)
entry TESTNAME " p " eval (PUT(PVALUE, PVALUE6.4));
else
entry TESTNAME " p=" eval (PUT(PVALUE, PVALUE6.4));
endif;
%mend;
The following example directly specifies the test name (replacing the internal name "Logrank" with "Log
Rank") and adds blank spaces around the equal sign:
%ProvideSurvivalMacros
%macro pValue;
if (PVALUE < .0001)
entry "Log Rank p " eval (PUT(PVALUE, PVALUE6.4));
else
entry "Log Rank p = " eval (PUT(PVALUE, PVALUE6.4));
endif;
%mend;
%CompileSurvivalTemplates
Because this template modification replaces a character string that is more appropriately set by PROC
LIFETEST, you should clean up afterward as follows:
%ProvideSurvivalMacros
proc template;
delete Stat.Lifetest.Graphics.ProductLimitSurvival /
store=sasuser.templat;
delete Stat.Lifetest.Graphics.ProductLimitSurvival2 /
store=sasuser.templat;
run;
%ProvideSurvivalMacros
%let ntitles = 1;
%macro StmtsBeginGraph;
entryfootnote halign=left "Acme Company %sysfunc(date(),worddate.)" /
textattrs=GraphDataText;
%mend;
%CompileSurvivalTemplates
By default, the nTitles macro variable is 2, and all titles are displayed. Setting nTitles to 1 suppresses the
second title. You can add titles or footnotes to the plot by adding them to the %StmtsBeginGraph macro.
This example adds a footnote that consists of a company name followed by the current date, formatted by
using the WORDDATE format. The GraphDataText style element is used; it has a smaller font than the
default style element, GraphFootnoteText.
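You can also set nTitles to 0 to suppress both titles. The following minimal sketch does so and supplies a single replacement title through the %StmtsBeginGraph macro. The title text is hypothetical.

```sas
%ProvideSurvivalMacros

/* Suppress both default title lines */
%let nTitles = 0;

/* Supply one replacement title (hypothetical text) */
%macro StmtsBeginGraph;
   entrytitle "Disease-Free Survival by Risk Group";
%mend;

%CompileSurvivalTemplates
```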
%ProvideSurvivalMacros
%macro StmtsBottom;
dynamic %do i = 1 %to 3; StrVal&i NObs&i NEvent&i %end;;
layout gridded / columns=3 border=TRUE autoalign=(TopRight);
entry ""; entry "Event"; entry "Total";
%do i = 1 %to 3;
%let t = / textattrs=GraphData&i;
entry halign=right Strval&i &t; entry NEvent&i &t; entry NObs&i &t;
%end;
endlayout;
%mend;
%CompileSurvivalTemplates
6 This legend is wide and might not be displayed if your graph is small. If the legend is not displayed, try increasing the size of
the graph by specifying the WIDTH= or HEIGHT= option in the ODS GRAPHICS statement.
The macro variable TitleText2, which controls the title for the multiple-strata plot, is changed. You can
change all three title macro variables, as is done in the construction of Figure 23.17, or you can change only
TitleText2 when you have multiple overlaid strata, as in this example. The LegendOpts macro variable value
was changed from TITLE=GROUPNAME LOCATION=OUTSIDE to display the censored value legend in
place of the legend title and to display the legend inside the bottom of the plot. When the InsetOpts macro
variable is null, the usual inset that contains the censored value and p-value is not displayed.
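The statements that change LegendOpts and InsetOpts are not reproduced in the example above. The following minimal sketch shows such settings; the legend title and alignment values are assumptions. These %LET statements would go after %ProvideSurvivalMacros and before %CompileSurvivalTemplates, because %ProvideSurvivalMacros resets the macro variables to their defaults.

```sas
/* Hypothetical values: show "+ Censored" as the legend title and place
   the legend inside the plot, aligned to the bottom */
%let LegendOpts = title="+ Censored" location=inside autoalign=(Bottom);

/* A null value suppresses the usual censored-value and p-value inset */
%let InsetOpts  = ;
```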
The %StmtsBottom macro (null by default) is replaced with a macro that creates the new inset table. This
macro adds statements to the bottom of the templates. If you ignore for a moment most of the options, the
core of the generated statements is as follows:
dynamic StrVal1 NObs1 NEvent1 StrVal2 NObs2 NEvent2 StrVal3 NObs3 NEvent3;
layout gridded / columns=3;
entry ""; entry "Event"; entry "Total";
entry Strval1; entry NEvent1; entry NObs1;
entry Strval2; entry NEvent2; entry NObs2;
entry Strval3; entry NEvent3; entry NObs3;
endlayout;
The macro first constructs a DYNAMIC statement that includes the names of the dynamic variables that
contain some of the results. PROC LIFETEST creates these dynamic variables and sets them to values,
but you must declare them in your template before using them. For more information about these dynamic
variables, see the section "Additional Dynamic Variables" on page 864. The macro then constructs a 4 × 3
grid that contains a table consisting of a title line and a row for each stratum (which consists of the stratum
label, the number of events, and the total number of subjects). The full layout that the %StmtsBottom macro
generates, with all the options, is as follows:
dynamic StrVal1 NObs1 NEvent1 StrVal2 NObs2 NEvent2 StrVal3 NObs3 NEvent3;
layout gridded / columns=3 border=TRUE autoalign=(TopRight);
entry "";
entry "Event";
entry "Total";
entry halign=right Strval1 / textattrs=GraphData1;
entry NEvent1 / textattrs=GraphData1;
entry NObs1 / textattrs=GraphData1;
entry halign=right Strval2 / textattrs=GraphData2;
entry NEvent2 / textattrs=GraphData2;
entry NObs2 / textattrs=GraphData2;
entry halign=right Strval3 / textattrs=GraphData3;
entry NEvent3 / textattrs=GraphData3;
entry NObs3 / textattrs=GraphData3;
endlayout;
%ProvideSurvivalMacros
%SurvivalSummaryTable
%CompileSurvivalTemplates
%ProvideSurvivalMacros
%SurvivalSummaryTable
%CompileSurvivalTemplates
The legend is suppressed when the LegendOpts macro variable is null. This example also illustrates changing
the design height to 500 pixels and moving the at-risk table back inside the body of the plot.
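The macro variable settings for this example are not reproduced above. The following minimal sketch assumes that the design height is set through the GraphOpts macro variable with the BEGINGRAPH DESIGNHEIGHT= option. It also assumes that requesting the at-risk table without the OUTSIDE suboption places the table inside the plot.

```sas
%ProvideSurvivalMacros

/* Assumed approach: a 500-pixel design height via a BEGINGRAPH option */
%let GraphOpts  = designheight=500px;

/* A null LegendOpts value suppresses the legend */
%let LegendOpts = ;

%SurvivalSummaryTable
%CompileSurvivalTemplates

/* At-risk table requested without OUTSIDE, so it stays inside the plot */
proc lifetest data=sashelp.BMT plots=survival(atrisk=0 to 2500 by 500);
   time T * Status(0);
   strata Group;
run;
```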
proc format;
invalue bmtnum 'ALL' = 1 'AML-Low Risk' = 2 'AML-High Risk' = 3;
value bmtfmt 1 = 'ALL' 2 = 'AML-Low Risk' 3 = 'AML-High Risk';
run;
data BMT(drop=g);
set sashelp.BMT(rename=(group=g));
Group = input(g, bmtnum.);
run;
%ProvideSurvivalMacros
%SurvivalSummaryTable
%CompileSurvivalTemplates
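The analysis step that uses this recoded data is not reproduced here. In the following minimal sketch, Group is numeric, the BMTFMT. format displays the original group names, and ORDER=INTERNAL in the STRATA statement orders the strata by their unformatted values 1, 2, and 3. Treat the PLOTS= specification as an assumption.

```sas
/* Group is now numeric (1, 2, 3); BMTFMT. displays the original names,
   and ORDER=INTERNAL orders the strata by the unformatted values */
proc lifetest data=BMT plots=survival(atrisk=0 to 2500 by 500);
   time T * Status(0);
   strata Group / order=internal;
   format Group bmtfmt.;
run;
```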
%ProvideSurvivalMacros
proc template;
delete Stat.Lifetest.Graphics.ProductLimitSurvival /
store=sasuser.templat;
delete Stat.Lifetest.Graphics.ProductLimitSurvival2 /
store=sasuser.templat;
run;
For more information about deleting compiled templates, see the section "SAS Item Stores" on page 879.
Many options, including most of the options that are specified in multiple places in the templates, are
extracted to macro variables.
The %CompileSurvivalTemplates macro provides the main body of the two templates. You can call it
to compile the templates after making changes.
The two templates share many statements, and a macro %DO loop creates both versions.
The portion of the templates for the table for the p-values is stored in the macro %pValue.
The portion of the templates for the single-stratum case is stored in the macro %SingleStratum.
The portion of the templates for the multiple-strata case is stored in the macro %MultipleStrata.
The macro %AtRiskLatticeStart begins the two-cell lattice that contains the plot above the table when
the at-risk table is outside the body of the plot.
The macro %AtRiskLatticeEnd ends the two-cell lattice that contains the plot and the table when the
at-risk table is outside the body of the plot.
Some empty macros (%StmtsBeginGraph, %StmtsTop, and %StmtsBottom) are provided to enable
you to add statements and options to strategic places in the templates.
7 The macros do not affect any graph that uses graph templates other than the two templates that are modified here. The macros
do not affect the STRATA=PANEL plot that uses the template Stat.Lifetest.Graphics.ProductLimitSurvivalPanel
or the failure plot that uses the template Stat.Lifetest.Graphics.ProductLimitFailure.
This organization makes it easy to identify the relevant parts of the templates, modify these parts, and
recompile the templates. A small portion of the %ProvideSurvivalMacros macro follows:
%macro ProvideSurvivalMacros;
%let GraphOpts = ;
The %ProvideSurvivalMacros macro declares that these macro variables are global in scope, so you can
assign values to them in your programs and have them affect the internal macros. These macro variables
specify a variety of GTL options; for more information, see SAS Graph Template Language: Reference. The
macro variables are as follows.
TitleText0 provides the common text that is used in the title for the single-stratum and multiple-strata
cases. METHOD is a dynamic variable that PROC LIFETEST sets. In these examples,
the value of METHOD is Product-Limit; the product-limit method is also known as the
Kaplan-Meier (1958) method.
TitleText1 provides the title text for the single-stratum title (relying on TitleText0).
TitleText2 provides the title text for the multiple-strata title (relying on TitleText0).
nTitles specifies the number of titles. Set the macro variable nTitles to 1 to suppress the second title
line or 0 to suppress all title lines. You can add titles to the plot by adding ENTRYTITLE
statements to the top of the %StmtsBeginGraph macro even when you suppress the usual titles
by setting the nTitles macro variable to 0 or 1. By default, nTitles equals 2.
yOptions provides the Y-axis options. The LABEL= option provides the axis label. The SHORTLABEL=
option provides the axis label for small plots when the label that the LABEL= option specifies is too
long. The LINEAROPTS= option specifies linear axis options. This and most other axes are
linear axes; alternatives include log-scale axes. The VIEWMIN=0 and VIEWMAX=1 options
ensure that the axis goes from 0 to 1 even when the actual results have a more restricted
range. The TICKVALUELIST= option provides the tick values. Standard SAS number list
abbreviations like 0 TO 1 BY 0.2 are not valid in the GTL.
xOptions provides the X-axis options. The LABEL= option is not provided, so the axis label comes
from the column label in the ODS data object. You can add a LABEL= option or other
axis options if you want. The SHORTLABEL= option provides the axis label for small
plots when the label is too long. The short label comes from a dynamic variable that PROC
LIFETEST provides. The OFFSETMIN= option ensures that there is extra space between the
axis and the minimum tick mark. The LINEAROPTS= option specifies linear axis options.
The VIEWMAX= option ensures that the axis goes to the value in the MAXTIME dynamic
variable set by PROC LIFETEST. The TICKVALUELIST= option provides the tick values in
a dynamic variable. The TICKVALUEFITPOLICY= option provides, in a dynamic variable,
the approach for handling dense tick marks. Approaches include rotation, staggering, and
thinning.
Tips provides options for tooltips for the step plots. Tooltips are text boxes that appear in HTML
output when you rest your mouse pointer over part of the plot, provided that the IMAGEMAP=ON
option is specified in the ODS GRAPHICS statement. Tooltips are provided for the X- and
Y-axis columns. Additional columns that are assigned roles (and hence are eligible to use as
tooltips) include the at-risk and event columns. These columns are given the tooltip labels
Number at Risk and Observed Events. Unless you are specifically interested in tooltips,
you probably do not need to modify this macro variable.
TipLabel provides a label for the Y-axis tooltip. Unless you are specifically interested in tooltips, you
do not need to modify this macro variable.
StepOpts provides options for the step functions. This macro variable is null by default. You can use
this macro variable to control the line thickness (for example, LINEATTRS=(THICKNESS=2.5)) and
other aspects of the step functions.
Groups provides the name of the data object columns that provide group names and the index that
provides the order of the group names. You will probably never need to modify this macro
variable.
BandOpts provides the group information for band plots. You will probably never need to modify this
macro variable.
InsetOpts provides options for the inset table that provides the censored value legend and the homogeneity
test p-value. The AUTOALIGN= option specifies the places in the plot where the inset table
can be positioned. If your preferred placement is somewhere other than the top right corner,
you can modify the automatic placement list. The BORDER= option displays a border around
this table. The BACKGROUNDCOLOR= option controls the table background. By default, it
matches the background color for the rest of the plot by using the GraphWalls:Color style
reference. The OPAQUE=TRUE option specifies an opaque table that hides any graphical
elements that are behind the table. You can set the InsetOpts macro variable to null to suppress
the usual inset that contains the censored value and p-value.
LegendOpts provides options for the external legend that identifies the strata. The title comes from a
dynamic variable GroupName that the procedure sets. By default, the legend is outside the
plot. Specify LOCATION=INSIDE and an AUTOALIGN= option such as the one provided
in the InsetOpts macro variable if you want the legend to appear inside the plot. You can set
the LegendOpts macro variable to null to suppress the legend.
AtRiskOpts provides options for the at-risk table. The option DISPLAY=(LABEL) limits the display to
labels. VALUEATTRS=(SIZE=7PT) specifies a font size of seven points.
ClassOpts provides the options that are used in the at-risk table to distinguish groups of observations.
Censored provides the marker (a plus sign) that is displayed in the plot to indicate censored observations.
CensorStr provides the character for the inset table that shows how censored observations appear in the
plot.
GraphOpts provides options for the template BEGINGRAPH statement. By default, the GraphOpts
macro variable is null. The following options are particularly useful:
ATTRPRIORITY=AUTO | NONE | COLOR specifies the priority for varying the attributes
that distinguish groups of observations. AUTO honors the setting that is otherwise
in effect. COLOR varies only the color attribute. NONE simultaneously varies
colors, markers, and lines. Styles such as HTMLBlue and Pearl are ATTRPRIORITY=COLOR
styles, whereas styles such as DEFAULT, Statistical, Listing, and RTF are
ATTRPRIORITY=NONE styles.
DATACOLORS=(color-list) specifies the list of colors (which control confidence bands)
to replace the graph data colors from the GraphData1 through GraphDataN style elements.
DATACONTRASTCOLORS=(color-list) specifies the list of contrast colors (which control
markers and lines) to replace the graph data contrast colors from the GraphData1 through
GraphDataN style elements.
DATALINEPATTERNS=(line-pattern-list) specifies the list of line patterns to replace the
graph data line patterns from the GraphData1 through GraphDataN style elements. There are
46 line patterns, and you can specify each pattern by using an integer in the range 1
to 46. Some patterns have names associated with them. You can specify either the
name or the number for the following number/name pairs: 1 Solid, 2 ShortDash, 4
%macro pValue;
if (PVALUE < .0001)
entry TESTNAME " p " eval (PUT(PVALUE, PVALUE6.4));
else
entry TESTNAME " p=" eval (PUT(PVALUE, PVALUE6.4));
endif;
%mend;
By default, the %StmtsBeginGraph, %StmtsTop, and %StmtsBottom macros are empty. You can use them to
add new statements to the BEGINGRAPH block or to the beginning or end of the block of statements that
define the appearance of the graph.
The %pValue macro is used to control the display of the p-value from the homogeneity test.
%macro CompileSurvivalTemplates;
%local outside;
proc template;
%do outside = 0 %to 1;
define statgraph
Stat.Lifetest.Graphics.ProductLimitSurvival%scan(2,2-&outside);
dynamic NStrata xName plotAtRisk
%if %nrbquote(&censored) ne %then plotCensored;
plotCL plotHW plotEP labelCL labelHW labelEP maxTime xtickVals
xtickValFitPol rowWeights method StratumID classAtRisk
plotTest GroupName Transparency SecondTitle TestName pValue
_byline_ _bytitle_ _byfootnote_;
BeginGraph %if %nrbquote(&graphopts) ne %then / &graphopts;;
if (NSTRATA=1)
%if &ntitles %then %do;
if (EXISTS(STRATUMID)) entrytitle &titletext1;
else entrytitle &titletext0;
endif;
%end;
%StmtsBeginGraph
%AtRiskLatticeStart
layout overlay / xaxisopts=(&xoptions) yaxisopts=(&yoptions);
%StmtsTop
%SingleStratum
%StmtsBottom
endlayout;
%AtRiskLatticeEnd
else
%if &ntitles %then %do; entrytitle &titletext2; %end;
%if &ntitles gt 1 %then %do;
if (EXISTS(SECONDTITLE))
entrytitle SECONDTITLE / textattrs=GRAPHVALUETEXT;
endif;
%end;
%StmtsBeginGraph
%AtRiskLatticeStart
layout overlay / xaxisopts=(&xoptions) yaxisopts=(&yoptions);
%StmtsTop
%MultipleStrata
%StmtsBottom
endlayout;
%AtRiskLatticeEnd(class)
endif;
The primary difference between these templates is that when the macro variable Outside is 1, a LAYOUT
LATTICE statement block is used to place the at-risk table outside the graph. When Outside is 1, the macros
%AtRiskLatticeStart and %AtRiskLatticeEnd provide the LAYOUT LATTICE statement block (two cells,
plot above and at-risk table below) and the LAYOUT OVERLAY statement block for the at-risk table. The
%AtRiskLatticeStart and %AtRiskLatticeEnd macros are defined as follows:
%macro AtRiskLatticeStart;
%if &outside %then %do;
layout lattice / rows=2 rowweights=ROWWEIGHTS
columndatarange=union rowgutter=10;
cell;
%end;
%mend;
%macro AtRiskLatticeEnd(useclassopts);
%if &outside %then %do;
endcell;
cell;
layout overlay / walldisplay=none xaxisopts=(display=none);
axistable x=TATRISK value=ATRISK / &atriskopts
%if &useclassopts ne %then &classopts;;
endlayout;
endcell;
endlayout;
%end;
%mend;
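The at-risk table has to be requested outside the plot to exercise the Outside=1 branch. The following sketch assumes a SAS/STAT release that supports the OUTSIDE suboption of ATRISK=. It also assumes that PROC LIFETEST then selects the ProductLimitSurvival2 template.

```sas
/* Request the at-risk table outside the plot area, which (by assumption)
   uses the two-cell LAYOUT LATTICE version of the template */
ods graphics on;
proc lifetest data=sashelp.BMT
              plots=survival(atrisk(outside)=0 to 2500 by 500);
   time T * Status(0);
   strata Group;
run;
```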
The %CompileSurvivalTemplates macro relies on two other macros: %SingleStratum for the single-stratum
case and %MultipleStrata for the multiple-strata case. The %SingleStratum macro is as follows:
%macro SingleStratum;
if (PLOTHW=1 AND PLOTEP=0)
bandplot LimitUpper=HW_UCL LimitLower=HW_LCL x=TIME /
displayTail=false modelname="Survival" fillattrs=GRAPHCONFIDENCE
name="HW" legendlabel=LABELHW;
endif;
if (PLOTHW=0 AND PLOTEP=1)
bandplot LimitUpper=EP_UCL LimitLower=EP_LCL x=TIME /
displayTail=false modelname="Survival" fillattrs=GRAPHCONFIDENCE
name="EP" legendlabel=LABELEP;
endif;
if (PLOTHW=1 AND PLOTEP=1)
bandplot LimitUpper=HW_UCL LimitLower=HW_LCL x=TIME /
displayTail=false modelname="Survival" fillattrs=GRAPHDATA1
datatransparency=.55 name="HW" legendlabel=LABELHW;
bandplot LimitUpper=EP_UCL LimitLower=EP_LCL x=TIME /
displayTail=false modelname="Survival" fillattrs=GRAPHDATA2
datatransparency=.55 name="EP" legendlabel=LABELEP;
endif;
if (PLOTCL=1)
if (PLOTHW=1 OR PLOTEP=1)
bandplot LimitUpper=SDF_UCL LimitLower=SDF_LCL x=TIME /
displayTail=false modelname="Survival" display=(outline)
outlineattrs=GRAPHPREDICTIONLIMITS name="CL" legendlabel=LABELCL;
else
bandplot LimitUpper=SDF_UCL LimitLower=SDF_LCL x=TIME /
displayTail=false modelname="Survival"
fillattrs=GRAPHCONFIDENCE
name="CL" legendlabel=LABELCL;
endif;
endif;
if (PLOTCENSORED=1)
scatterplot y=CENSORED x=TIME / &censored &tiplabel
name="Censored" legendlabel="Censored";
endif;
%mend;
The %MultipleStrata macro is as follows:
%macro MultipleStrata;
if (PLOTHW=1)
bandplot LimitUpper=HW_UCL LimitLower=HW_LCL x=TIME / &bandopts
datatransparency=Transparency;
endif;
if (PLOTEP=1)
bandplot LimitUpper=EP_UCL LimitLower=EP_LCL x=TIME / &bandopts
datatransparency=Transparency;
endif;
if (PLOTCL=1)
if (PLOTHW=1 OR PLOTEP=1)
bandplot LimitUpper=SDF_UCL LimitLower=SDF_LCL x=TIME / &bandopts
display=(outline) outlineattrs=(pattern=ShortDash);
else
bandplot LimitUpper=SDF_UCL LimitLower=SDF_LCL x=TIME / &bandopts
datatransparency=Transparency;
endif;
endif;
if (PLOTCENSORED=1)
scatterplot y=CENSORED x=TIME / &groups &tiplabel &censored;
endif;
if (PLOTTEST=1)
layout gridded / rows=1 &insetopts;
%pValue
endlayout;
endif;
endif;
%end;
%mend;
%macro SurvTabHeader(multiple);
%if &multiple %then %do; entry ""; %end;
entry "";
entry &r "Median";
entry "";
%macro SurvivalTable;
%local fmt r i t;
%let fmt = bestd6.;
%let r = halign = right;
columnheaders;
layout overlay / pad=(top=5);
if(NSTRATA=1)
layout gridded / columns=6 border=TRUE;
dynamic PctMedianConfid NObs NEvent Median
LowerMedian UpperMedian;
%SurvTabHeader(0)
%macro SurvivalSummaryTable;
%macro AtRiskLatticeStart;
layout lattice / columndatarange=union rowgutter=10
rows=%if &outside %then 2 rowweights=ROWWEIGHTS;
%else 1;;
%if &outside %then %do; cell; %end;
%mend;
%macro AtRiskLatticeEnd(useclassopts);
%if &outside %then %do;
endcell;
cell;
layout overlay / walldisplay=none xaxisopts=(display=none);
axistable x=TATRISK value=ATRISK / &atriskopts
%if &useclassopts ne %then &classopts;;
endlayout;
endcell;
%end;
%SurvivalTable
endlayout;
%mend;
%mend;
If you want to create an event table like the one displayed in Figure 23.30, you need only call the
%SurvivalSummaryTable macro. If you want to modify the table, then you need to modify the %SurvTabHeader
and %SurvivalTable macros.
Dynamic Variables
Graph templates consist of instructions, written by SAS developers, that work in conjunction with SAS procedure
code. However, SAS developers cannot fully specify some instructions when the template is written, because
some elements of some graphs cannot be known until the procedure runs. For example, the legend title in a
graph that has multiple strata corresponds to the label or name of the stratification variable, and the procedure
calculates the p-value for the homogeneity test. SAS procedures create dynamic variables to provide some
run-time information to graphs.⁸ Some dynamic variables are set by the procedure and are declared in the
template. Other dynamic variables are also set by the procedure, but you must declare them directly or
through the template modification macros before you can use them.
_ByFootNote_ is a binary variable that, when true, displays the BY-group BY line as a footnote.
_ByLine_ is a character variable that provides the BY-group BY line.
_ByTitle_ is a binary variable that, when true, displays the BY-group BY line as a title.
ClassAtRisk is a character variable that names the data object column that contains the classification
(stratification) values.
GroupName is a character variable that contains the stratification legend title.
LabelCL is a character variable that contains the label for the confidence limits legend entry
(including the percent sign).
LabelEP is a character variable that contains the label for the equal-precision band legend entry
(including the percent sign).
LabelHW is a character variable that contains the label for the Hall-Wellner band legend entry
(including the percent sign).
MaxTime is a numeric variable that contains the maximum value to display on the X axis.
Method is a character variable that contains the method for the plot title.
NStrata is an integer variable that contains the number of strata.
PValue is a numeric variable that contains the p-value for the homogeneity test.
PlotAtRisk is a binary variable that, when true, displays the at-risk table.
PlotCensored is a binary variable that, when true, displays the censored values on the step functions.
8 Axis labels can be set directly in the template or at run time through dynamic variables or through data object column labels.
PlotCL is a binary variable that, when true, displays the pointwise confidence limits.
PlotEP is a binary variable that, when true, displays the equal-precision band.
PlotHW is a binary variable that, when true, displays the Hall-Wellner confidence band.
PlotTest is a binary variable that, when true, displays the p-value for the homogeneity test.
RowWeights is a pair of relative heights of the plot and the external at-risk table.
SecondTitle is a character variable that provides the second title line.
StratumID is a character variable that provides the value of the stratification variable for the single
stratum case.
TestName is a character variable that provides the name of the homogeneity test (for example,
logrank).
Transparency is a numeric variable that provides the transparency for the confidence bands in the
multiple strata case.
XName is a character variable that contains a short label for the X axis, which might be used in
place of the ordinary X-axis label when the ordinary label is long or the plot is small.
XtickValFitPol is a character variable that contains the option for handling dense tick values on the X
axis.
XtickVals is a list of X-axis tick values.
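You can use dynamic variables that the templates already declare in statements that you add. In the following sketch, METHOD and TESTNAME, both declared in the DYNAMIC statement of %CompileSurvivalTemplates, are placed in a footnote through the %StmtsBeginGraph macro. The footnote text is illustrative. TESTNAME is set only when a homogeneity test is performed, so the test portion might be blank for a single stratum.

```sas
%ProvideSurvivalMacros

/* Footnote built from declared dynamic variables (illustrative text) */
%macro StmtsBeginGraph;
   entryfootnote halign=left "Method: " METHOD "   Test: " TESTNAME /
                 textattrs=GraphDataText;
%mend;

%CompileSurvivalTemplates
```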
9 Because the number of dynamic variables is a function of the number of strata, the template definition cannot automatically
Style Templates
Graphs that are produced by ODS Graphics are controlled by the data object (the matrix of information that
is graphed), the graph template (the program that controls how a specific graph is constructed), and a style
template (a program that controls the overall appearance of graphs, including colors, line and marker styles,
sizes, fonts, and so on). Although it is rarely necessary, you can use different styles or modify styles to change
the appearance of all graphs, including the survival plot. In the past, you could make certain Kaplan-Meier
plot modifications only through style modifications. However, with the addition of the DATACOLORS=,
DATACONTRASTCOLORS=, and DATALINEPATTERNS= options in the GTL, you no longer have to
modify styles in order to modify how groups of observations are displayed. This section shows you how to
change styles, extract group color and other information from styles, and modify styles.
proc template;
define style styles.ListingColor;
parent = styles.Listing;
style Graph from Graph / attrpriority = "Color";
end;
run;
You need to specify the new style name in an ODS destination statement (for example, ODS HTML STYLE=ListingColor). You can display the source code for the HTMLBlue style as follows:
proc template;
source styles.htmlblue;
run;
The results of this step, which are not shown, include the option PARENT=STYLES.STATISTICAL and do
not include definitions of the colors (gData1, gData2, ..., gData12) and contrast colors (gcData1, gcData2,
..., gcData12). These are the color definitions that are used in the style elements GraphData1, GraphData2,
..., GraphData12. You can examine the parent Statistical style as follows:
proc template;
source styles.statistical;
run;
The results of this step are not shown because they are hard to interpret in their raw form, but the desired
color definitions are included. You can submit the following statements to display the colors for the Statistical
(and hence HTMLBlue) style in a more understandable form:
proc template;
source styles.statistical / file='style.tmp';
run;
data colors;
length element Color $ 20;
infile 'style.tmp';
input;
if index(_infile_, 'data') then do;
element = scan(_infile_, 1, ' ');
Color = scan(_infile_, 3, ' ;');
Type = ifc(index(element, 'gc'), 'Line', 'Fill') || ' Colors';
i = input(compress(element, 'gcdat'';'), ?? 2.);
if i then output;
end;
run;
Type Color
Line Colors cx445694
cxA23A2E
cx01665E
cx543005
cx9D3CDB
cx7F8E1F
cx2597FA
cxB26084
cxD17800
cx47A82A
cxB38EF3
cxF9DA04
Type Color
Fill Colors cx6F7EB3
cxD05B5B
cx66A5A0
cxA9865B
cxB689CD
cxBABC5C
cx94BDE1
cxCD7BA1
cxCF974B
cx87C873
cxB7AEF1
cxDDD17E
You can use the following steps to display the GraphData1 through GraphData12 line and fill colors (contrast
colors and colors, respectively):
data display;
array y[12] y1 - y12;
do i = 1 to 12; y[i] = i; end;
do x = 1 to 10; output; end;
do i = 1 to 12; y[i] = i + .5; end;
do x = 1 to 10; output; end;
run;
data _null_;
set colors;
call symputx(compress(type || put(i, 2.)), color);
run;
title;
The results are displayed in Figure 23.36. The colors in Figure 23.36 are richer than the colors in the bands
in the survival plots because of the DATATRANSPARENCY= options in the BANDPLOT statements.
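The plotting step that produces Figure 23.36 is not reproduced in this section. The following minimal sketch shows one way to draw similar color swatches with PROC SGPLOT. It assumes that the DATA _NULL_ step above has created the macro variables FillColors1 through FillColors12 and LineColors1 through LineColors12, which are the names that COMPRESS produces from the Type and i values. The data set Xrange and the macro %ShowColors are hypothetical names.

```sas
/* Two X values are enough to draw horizontal bands */
data xrange;
   do x = 0 to 1;
      output;
   end;
run;

%macro ShowColors;
   proc sgplot data=xrange noautolegend;
      %do i = 1 %to 12;
         /* Fill color (gdata&i): a wide band just below y = &i */
         band x=x lower=%sysevalf(&i - 0.35) upper=&i /
              fillattrs=(color=&&FillColors&i);
         /* Line (contrast) color (gcdata&i): a narrow band just above y = &i */
         band x=x lower=&i upper=%sysevalf(&i + 0.15) /
              fillattrs=(color=&&LineColors&i);
      %end;
      xaxis display=none;
      yaxis integer label="GraphData Element";
   run;
%mend;

%ShowColors
```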
%ProvideSurvivalMacros
%CompileSurvivalTemplates
You can use the information in Figure 23.36 to modify the style template, but the next example shows an
easier way.
proc template;
define style styles.&to;
parent=styles.&from;
%do i = 1 %to 12;
%let s = %scan(&list, &i);
%if &s ne %then %do;
style GraphData&i from GraphData&i /
contrastcolor = GraphColors("gcdata&s")
color = GraphColors("gdata&s");
%end;
%end;
end;
run;
%mend;
The rest of this section is optional. It explains how you can directly modify colors in a style template when
the %Reorder macro or the technique illustrated in the section "Changing the Group Color" on page 832 is
not sufficient.
The source code for the MyStyle style (as generated by the %Reorder macro) is as follows:
proc template;
define style Styles.MyStyle;
parent = styles.htmlblue;
style GraphData1 from GraphData1 /
color = GraphColors('gdata3')
contrastcolor = GraphColors('gcdata3');
style GraphData2 from GraphData2 /
color = GraphColors('gdata2')
contrastcolor = GraphColors('gcdata2');
style GraphData3 from GraphData3 /
color = GraphColors('gdata1')
contrastcolor = GraphColors('gcdata1');
end;
run;
You can create a modified style that has direct color specifications by using the colors in Figure 23.35 as
follows:
proc template;
define style Styles.MyStyle;
parent = styles.htmlblue;
style GraphData1 from GraphData1 /
color = cx66A5A0
contrastcolor = cx01665E;
style GraphData2 from GraphData2 /
color = cxD05B5B
contrastcolor = cxA23A2E;
style GraphData3 from GraphData3 /
color = cx6F7EB3
contrastcolor = cx445694;
end;
run;
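A style template takes effect only when an ODS destination uses it. The following sketch assumes the Sashelp.BMT data set (variables T, Status, and Group) that is distributed with SAS. It produces a Kaplan-Meier plot whose three strata use the GraphData1 through GraphData3 colors from Styles.MyStyle and then restores HTMLBlue. The choice of the HTML destination is an assumption; the STYLE= option works similarly in other ODS destination statements.

```sas
ods graphics on;
ods html style=MyStyle;            /* use the modified group colors      */

proc lifetest data=sashelp.BMT plots=survival(atrisk=0 to 2500 by 500);
   time T * Status(0);             /* Status=0 marks censored times      */
   strata Group;                   /* three strata: GraphData1-3 colors  */
run;

ods html style=HTMLBlue;           /* return to the default style        */
```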
You can define additional GraphDataN style elements as well. For more information about how to define
style elements, see the section "Displaying Other Style Elements" on page 876.
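For example, the following sketch extends the directly specified MyStyle style with a fourth element. Giving GraphData4 the HTMLBlue GraphData5 colors from Figure 23.35 (fill cxB689CD, line cx9D3CDB) is an arbitrary choice for illustration; any CXrrggbb color values can be used.

```sas
proc template;
   define style Styles.MyStyle;
      parent = styles.htmlblue;
      style GraphData1 from GraphData1 /
         color = cx66A5A0 contrastcolor = cx01665E;
      style GraphData2 from GraphData2 /
         color = cxD05B5B contrastcolor = cxA23A2E;
      style GraphData3 from GraphData3 /
         color = cx6F7EB3 contrastcolor = cx445694;
      /* Fourth group: borrow the HTMLBlue GraphData5 colors (illustrative) */
      style GraphData4 from GraphData4 /
         color = cxB689CD contrastcolor = cx9D3CDB;
   end;
run;
```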
You can delete the new style template as follows:
proc template;
delete Styles.MyStyle / store=sasuser.templat;
run;
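To check which style templates remain in the item store (for example, after running the DELETE statement), you can list them. This sketch assumes that the templates were stored in SASUSER.TEMPLAT, as in the preceding DELETE statement.

```sas
proc template;
   /* List the style templates stored in SASUSER.TEMPLAT */
   list styles / store=sasuser.templat;
run;
```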
proc template;
source styles.htmlblue / expand;
run;
The results of this step are long and are not shown. You can write a copy of the style templates to a file as
follows:
proc template;
source styles.htmlblue / expand file='style.tmp';
run;
The EXPAND option writes the specified style (HTMLBlue) to the file, followed by its parent (Statistical) and
then by its parent's parent (DEFAULT). The following step finds and displays the first definition of the
GraphFonts style element in the file, which is the definition that takes effect in the style template:
data _null_;
infile 'style.tmp' pad;
input line $char80.;
file print;
if index(lowcase(line), ' graphfonts ') then y + 1;
if y then put line $char80.;
if y and index(line, ';') then stop;
run;
The results are displayed in Figure 23.38.
If the GraphFonts style element is defined in the HTMLBlue style, then it will appear first in the file, followed
by the definitions from the Statistical style and then the DEFAULT style. In this case, the GraphFonts style
element is defined in the DEFAULT style (last in the file), which is overridden by a definition in the Statistical
style (closer to the top of the file); that is the definition that is inherited by the HTMLBlue style and displayed
in Figure 23.38.
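Because EXPAND writes each style followed by its ancestors, you can confirm the inheritance chain by printing only the DEFINE STYLE and PARENT= lines from the same file. This is a minimal sketch that assumes style.tmp was written by the preceding SOURCE statement with the EXPAND option. LEFT removes leading blanks so that only lines that begin with "parent" match, and not words such as "transparent" elsewhere in a line.

```sas
data _null_;
   infile 'style.tmp' pad;
   input line $char80.;
   file print;
   /* Keep the style headers and their PARENT= statements only */
   if index(lowcase(line), 'define style') or
      index(lowcase(left(line)), 'parent') = 1 then put line $char80.;
run;
```

The output should list HTMLBlue, Statistical, and DEFAULT in that order, which is the order in which the file is searched in the preceding step.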
The following step creates a new style, BigFont, that changes two font attributes of the GraphFonts style
element: GraphLabelFont changes from a regular 10-point font to a bold 12-point font, and GraphValueFont
changes from a regular 9-point font to a bold 8-point font:
proc template;
define style Styles.BigFont;
parent = Styles.HTMLBlue;
style graphfonts from graphfonts /
'GraphLabelFont' = ("<sans-serif>, <MTsans-serif>",12pt,bold)
'GraphValueFont' = ("<sans-serif>, <MTsans-serif>",8pt,bold);
end;
run;
The following step creates the plot that is displayed in Output 23.39:
proc template;
delete Styles.BigFont / store=sasuser.templat;
run;
For information about making ad hoc font changes in the graph templates rather than making more global
font changes in style templates, see the section "Changing the Font" on page 834.
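Because the STYLE statement uses FROM GRAPHFONTS, any font attribute that you do not list is inherited from the parent style. As an illustration, the following hypothetical style, BigTitle, changes only the GraphTitleFont attribute (used for graph titles) and leaves GraphLabelFont and GraphValueFont as they are defined for HTMLBlue. The 14-point size is an arbitrary choice.

```sas
proc template;
   define style Styles.BigTitle;
      parent = Styles.HTMLBlue;
      /* Only GraphTitleFont is redefined; the other GraphFonts   */
      /* attributes are inherited through FROM GRAPHFONTS.        */
      style graphfonts from graphfonts /
         'GraphTitleFont' = ("<sans-serif>, <MTsans-serif>",14pt,bold);
   end;
run;
```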
proc template;
source styles.htmlblue / expand file='style.tmp';
run;
data _null_;
infile 'style.tmp' pad;
input line $char80.;
file print;
if index(lowcase(line), ' graphdata1 ') then y + 1;
if y then put line $char80.;
if y and index(line, ';') then stop;
run;
This example displays the GraphData1 style element. The results are displayed in Figure 23.40.
The following steps display all the GraphFonts style elements from all the styles:
proc template;
source styles / file='style.tmp';
run;
data _null_;
infile 'style.tmp' pad;
length style $ 80;
retain style;
input line $char80.;
file print;
if index(lowcase(line), 'define style') then style = line;
if index(lowcase(line), ' graphfonts ') then do;
y + 1;
put style $char80.;
end;
if y then put line $char80.;
if index(line, ';') then y = 0;
run;
The results of this step are not displayed. You can use this approach to help you better understand the
options that are available for modifying styles. The SOURCE statement specifies a single-level value of
STYLES rather than a specific style name such as STYLES.HTMLBLUE, so all templates that begin with
STYLES as the first level (all style templates) are written to the file. The DATA step displays all definitions
of GraphFonts and the names of all styles that define the GraphFonts style element.
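To see which style templates the SOURCE STYLES statement writes before you search them, you can list their names. This sketch lists every template whose first level is STYLES in the template stores on the current ODS path.

```sas
proc template;
   list styles;   /* names of all style templates on the ODS path */
run;
```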
You can insert the name of another style element (in lowercase with a leading and trailing blank) in the
preceding programs in place of graphdata1 or graphfonts. After you display a style element, you can
modify the definition and create a new style that uses the modified definition, as in the example in the section
"Displaying a Style and Extracting Font Information" on page 874. Some of the style elements that you might
want to display and modify are listed in Table 23.3.
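One way to avoid editing the search string by hand is to store the element name in a macro variable. In the following sketch, the macro variable ELEMENT is hypothetical and GraphBackground is only an example element name. The sketch assumes that style.tmp was most recently written by the PROC TEMPLATE step with SOURCE STYLES.HTMLBLUE / EXPAND, so that the first match is the definition in effect for HTMLBlue. The search string uses double quotes so that the macro variable resolves.

```sas
%let element = graphbackground;   /* any style element name, in lowercase */

data _null_;
   infile 'style.tmp' pad;
   input line $char80.;
   file print;
   /* Start printing at the first line that names the element and  */
   /* stop at the semicolon that ends its definition.              */
   if index(lowcase(line), " &element ") then y + 1;
   if y then put line $char80.;
   if y and index(line, ';') then stop;
run;
```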
References
Hall, W. J., and Wellner, J. A. (1980). "Confidence Bands for a Survival Curve from Censored Data."
Biometrika 67:133–143.
Kaplan, E. L., and Meier, P. (1958). "Nonparametric Estimation from Incomplete Observations." Journal of
the American Statistical Association 53:457–481.
Klein, J. P., and Moeschberger, M. L. (1997). Survival Analysis: Techniques for Censored and Truncated
Data. New York: Springer-Verlag.