Statistical Graphics Procedures by Example Effective Graphs Using SAS
Statistical Graphics Procedures by Example: Effective Graphs Using SAS®. Cary, NC: SAS Institute Inc.
Statistical Graphics Procedures by Example: Effective Graphs Using SAS®
Copyright © 2011, SAS Institute Inc., Cary, NC, USA
ISBN 978-1-60764-887-1 (electronic book)
ISBN 978-1-60764-762-1
All rights reserved. Produced in the United States of America.
For a hard-copy book: No part of this publication may be reproduced, stored in a retrieval system, or
transmitted, in any form or by any means, electronic, mechanical, photocopying, or otherwise, without the
prior written permission of the publisher, SAS Institute Inc.
For a Web download or e-book: Your use of this publication shall be governed by the terms established by
the vendor at the time you acquire this publication.
The scanning, uploading, and distribution of this book via the Internet or any other means without the
permission of the publisher is illegal and punishable by law. Please purchase only authorized electronic
editions and do not participate in or encourage electronic piracy of copyrighted materials. Your support of
others’ rights is appreciated.
U.S. Government Restricted Rights Notice: Use, duplication, or disclosure of this software and related
documentation by the U.S. government is subject to the Agreement with SAS Institute and the restrictions set
forth in FAR 52.227-19, Commercial Computer Software-Restricted Rights (June 1987).
SAS Institute Inc., SAS Campus Drive, Cary, North Carolina 27513-2414
1st printing, November 2011
SAS Publishing provides a complete selection of books and electronic products to help customers use SAS
software to its fullest potential. For more information about our e-books, e-learning products, CDs, and hard-
copy books, visit the SAS Publishing Web site at support.sas.com/publishing or call 1-800-727-3228.
SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS
Institute Inc. in the USA and other countries. ® indicates USA registration.
Other brand and product names are registered trademarks or trademarks of their respective companies.
Contents
Preface
Chapter 1 Introduction
1.1 Principles of Effective Graphics
1.2 Automatic Graphs from SAS Procedures
1.3 Graph Template Language
1.4 Statistical Graphics Procedures
1.5 Organization of This Book
1.6 Data Sets and Custom Styles
1.7 Color and Gray-Scale Graphs
1.8 Effective Graphics and the Use of Decorative Skins
1.9 SAS 9.2 and SAS 9.3 Features
Chapter 2 Statistical Graphics Procedures
2.1 Key Concepts
2.2 SGPLOT Procedure
2.3 SGPANEL Procedure
2.4 Combining Statements
2.5 SGSCATTER Procedure
2.6 Styles and Their Usage
2.7 Template-Based Graphics vs. Device-Based Graphics
Chapter 3 Common Graphs
3.1 Introduction
3.2 Single-Cell Graphs
3.3 Classification Panels
3.4 Comparative and Matrix Graphs
Chapter 4 Basic Plots
4.1 Introduction
4.2 SGPLOT Procedure
4.3 Plot Roles and Options
4.4 Scatter Plot
4.5 Scatter Plots with Data Labels
The question we often got from users was simply, “How do I make this graph?” The best way for
us to answer that question was to show the code needed to create it. If a book could answer that
question, what would it look like? Why not create a book that starts from the end result and works
backward to show exactly what is needed to create the graph? That led to the idea of writing this
book.
The primary audience for this book is the SAS user who wants to visualize raw data or create a
graph from the results of a custom analysis. Often, you already have a mental image of the graph
you want to create; you just need to quickly find the correct syntax. Product documentation and
books on the topic often take a procedure-centric approach and describe the features one at a
time. Figuring out what is needed to create a graph from such resources requires a solid
understanding of the procedure, and it can take a while to obtain the right results. It would be
much easier if we started with the graph you want and showed you the code instead.
This book addresses this situation by using examples to document the procedure options. Users
can look through the large number of graph examples and find the type of graph they want to
create. Each of the graph examples includes the code needed to generate the graph, along with
a brief commentary on key features of the graph.
This approach works so well for the SG Procedures because these procedures
take a building-block approach. You start with the basic plot, and simply add the features you
need one at a time. The procedures support a wide array of plot types, so the combinations and
possibilities grow rapidly. For example, if you know how to build a simple series plot, then
creating a plot with three series plots becomes straightforward. You simply add two more series
statements, and the procedure automatically does the work. The same principle also applies to
combinations of disparate, but compatible, plot types, such as a bar chart and a line chart.
The book also describes, by example, other important features of the procedures such as axes,
insets and legends. For example, you can take a sample graph with a linear axis, and change it
to log. You can also create a custom axis displaying only the values you want, add insets and
customize the legends.
The SG Procedures are designed with the principles of effective graphics built in to convey the
information with maximum clarity and minimum clutter. By default, these procedures will create
graphs that are free of unnecessary clutter in the graph elements, legends, and axes.
These procedures are designed to create graphs that are suitable for the statistical and analytical
use cases. Such graphs emphasize the maximization of “chart ink” and removal of elements that
are not clearly necessary for the delivery of the information. It will be evident that you have to do
very little to get aesthetically pleasing graphs for these use cases.
Visual aesthetics, however, are in the eye of the beholder. In non-statistical use cases, the
expectation of the consumer is for flashy graphs, even at the cost of some effectiveness. One
person’s “chart junk” is another person’s “cool”. These procedures are finding increasing usage
for the creation of graphs for the business domain. They support some options to add “flash” to
these graphs. These options are mainly available for the bar charts and can be used when
necessary.
SG Procedures support full-color graphics. Many styles shipped with SAS are optimized for
creation of color graphs. However, printing color graphs in gray scale can sometimes lead to
undesirable results. This is important, since many technical journals are printed in gray scale.
Since this book is printed in gray scale, we have used the appropriate gray-scale styles.
The examples and techniques discussed in this book will be relevant and useful for all SAS users.
This is particularly so for statisticians and other analytical users in the pharmaceutical, clinical
trials, health care, financial, and other domains. This book is focused on how to create the
required graph given the data. Techniques for modeling and analysis of the data itself are
beyond the scope of this book.
Acknowledgements
Many people have contributed in many different ways to make this book possible. We would like
to thank Bob Rodriguez, Senior Director in Advanced Analytics at SAS, for supporting the
concept behind this book and for his detailed review of the contents. His insightful suggestions
have significantly improved the quality of the materials and the presentation of the book.
On the contents of the book and accuracy of the information, we received invaluable support from
Melisa Turner and Susan Schwartz, our “eagle eye” team of technical reviewers. Both Melisa
and Susan invested many days reviewing the code and improving the contents. We also thank
our reviewers Lelia McConnell, David Schlotzhauer, Peter Christie, and Rebecca Ottesen for their
valuable suggestions. Our heartfelt thanks go to Susan Slaughter for her review, moral support,
and guidance on this project. The table of Statement Combinations in section 2.4 is modeled
after a similar table from Susan and Lora Delwiche’s recent paper, “Using PROC SGPLOT for
Quick High-Quality Graphs.”
Last and most importantly, we thank our families for the understanding and support they provided
while we spent long evenings and weekends on this project.
Author Pages
Each SAS Press author has an author page, which includes several features that relate to the
author including a biography, book descriptions for coming soon titles and other titles by the
author, contact information, links to sample chapters and example code and data, events and
extras, and more.
Comments or Questions?
If you have comments or questions about this book, you may contact the authors through SAS as
follows:
Mail:
E-mail: [email protected]
For a complete list of books available through SAS Press, visit support.sas.com/publishing.
Receive up-to-date information about all new SAS publications via e-mail by subscribing to the
SAS Publishing News monthly eNewsletter. Visit support.sas.com/subscribe.
“Using PROC SGPLOT for Quick High-Quality Graphs” is available at http://support.sas.com/resources/papers/proceedings09/158-2009.pdf.
Chapter 1
Introduction
1.1 Principles of Effective Graphics
1.2 Automatic Graphs from SAS Procedures
1.3 Graph Template Language
1.4 Statistical Graphics Procedures
1.5 Organization of This Book
1.6 Data Sets and Custom Styles
1.7 Color and Gray-Scale Graphs
1.8 Effective Graphics and the Use of Decorative Skins
1.9 SAS 9.2 and SAS 9.3 Features
Chapter 1: Introduction
Graphs are an essential part of modern data analysis. From clinical trials to quality control,
effective graphs are integral to the analysis process. Large quantities of data are collected for
clinical drug trials for safety, retail sales, warranty claims, medical lab results, and financial
transactions. Analysis of this data often relies on review of the data in tabular form. Viewing the
data in the form of a graph along with results of the statistical analysis of the data on the same
graph can significantly enhance the understanding of the data and the results.
A key aspect of this process is the ability to create an effective graph that can communicate the
raw data along with the statistical analysis results in a clear and concise form. These graphs can
help the analyst to visualize the trends and patterns in the data and the associations between
variables that are not evident in tabular form. Such insights can guide the direction of further
questions and formulation of additional testing methods and gathering of more focused data.
This graph is further complicated by addition of the cumulative strips along the outside of the
pie chart. One reason why a pie chart is not an effective graph is the difficulty of making
magnitude comparisons when the data is plotted as an angle from a common (or non-
common) base. This applies to the strips and the slices.
Figure 1.2
Comparison of magnitude along a linear scale from a common base is very reliable. The
same data rendered as a bar chart is much easier to decode as shown above in Figure 1.2.
In this graph, the comparisons between the 2nd, 3rd, and 4th bars for the % case are much
easier and more reliable.
The pie chart can be a useful visual for some use cases as shown in Figure 1.3. This graph
shows the portion of sales for Auto as a fraction of the total sales. The pie chart can work
well for visualizing such “part-to-whole” comparisons.
In Figure 1.4, it is relatively easy to compare the magnitudes of the responses for all drugs.
The line segments in Figure 1.5 are plotted from different baselines, but still it is possible to
compare the magnitudes of each line segment for the drugs.
Differentiation of marker shapes and line patterns is pre-attentive. Groups can be easily
differentiated when marker shapes or line patterns are used as grouping indicators as
shown in Figures 1.6 and 1.7.
Stevens' power law proposes a relationship between the magnitude of a physical stimulus
and its perceived intensity or strength. According to the law, the perception of visual
length is nearly linear: a line twice as long is perceived as almost twice as long.
However, the accuracy of perceived magnitude is reduced for other representations. For
area, the perceived ratio is only about 1.6. That is, an area twice as large seems only
about 1.6 times as large. So, we tend to underestimate areas.
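The arithmetic behind the figure of 1.6 can be sketched with a short DATA step. This is only an illustration: it assumes the power-law form (perceived ratio equals physical ratio raised to an exponent) and uses exponents of 1.0 for length and 0.7 for area, which are assumed values chosen to agree with the text rather than values given in this book.

```sas
/* Sketch: perceived ratio = physical ratio ** exponent.           */
/* Assumed exponents: 1.0 for length, 0.7 for area.                */
data _null_;
  physical_ratio   = 2;                      /* stimulus is twice as large */
  perceived_length = physical_ratio ** 1.0;  /* length is judged linearly  */
  perceived_area   = physical_ratio ** 0.7;  /* area is underestimated     */
  put perceived_length= perceived_area=;     /* values go to the SAS log   */
run;
```

Under these assumptions the perceived area ratio is well below 2, which is why length-based encodings such as bars and dots are decoded more accurately than area-based ones.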
Figure 1.8 and Figure 1.9 both display the mean MPG by type of car. Figure 1.8 uses the dot
plot where the response values are plotted as a linear distance from a common baseline. It is
very easy for the eye to decode the relative values of each car type. Grid lines help to line up
the values.
Figure 1.9 displays the same data using a pie chart. Clearly, it is much harder to decode the
relative values since it is difficult to make good magnitude comparisons using angular
distances, especially from different baselines.
Figure 1.10 and Figure 1.11 both display the mean MPG by type of car. Figure 1.10 uses a
vertical bar chart to display the data. It is very easy for the eye to decode the relative values
for each car type. A format is applied to the response column to reduce the clutter for the
data label.
Figure 1.11 displays the same data using extruded 3-D bars. This is often referred to as a
2.5-D graph, since the data itself has only two dimensions. The third dimension is artificially
added to make the bars appear like 3-D blocks. Usage of such aesthetic features can
sometimes inhibit the process of decoding the data accurately.
There are several potential pitfalls in the 2.5-D representation of the data. The axis values
are displayed on the left at the “front” face of the bars. The grid lines are drawn along the
side and back face of the graph. So to measure the value for each bar, one has to line up the
correct face (front or back).
Often in such representations, the bars do not occupy the full depth of the walls, thus leaving
room for confusion. Even though the bar values are displayed in the 2.5-D case, some
values, such as for the Truck category, can become partially hidden behind other bars.
o Dot plots, needle plots, and bar charts are good for representations of magnitude.
o Pie charts and area plots are not ideal for representations of magnitude.
o Color intensity is not an effective representation of magnitude.
o 3-D representations are often not effective when the data are 2-D.
o Unobtrusive grid lines can help in the decoding of the data.
When you compare magnitude between categories, closer proximity increases the accuracy
of comparison. For an effective graph, it helps to bring the items that are to be compared as
close to each other as possible.
Figure 1.12 is suitable for comparison of MPG for sedans and sports cars manufactured in
different regions (origin). The comparison between car types is facilitated by bringing these
categories closer in proximity. In this graph it is harder to compare “Sedan” from USA with
“Sedan” from Asia. Figure 1.13 is more suitable for such a comparison where “Origin” is
used as the category role.
Edward Tufte’s principles for the creation of effective graphics include the following
recommendations:
Often when creating graphics for marketing and sales presentations, there is a desire to
make the graph visually “compelling”. To add this “Wow” factor, visual elements may be
added to the graph to make it more aesthetically appealing. If one is not careful, these
artifacts may introduce distractions or, worse, actually distort the data, making it harder to
decode the data accurately.
Effectiveness of a graph can be enhanced by removing unnecessary artifacts from the graph.
Avoid usage of gradient background and images. Inclusion of embellishments like drop
shadows for data markers can increase the visual appeal of the graph but can reduce the
effectiveness of the graph.
Generally, people find it difficult to absorb and retain a large number of data values at one
time. Short-term memory is limited. Arranging the data in smaller chunks can aid in the
processing of information. Research in this field shows:
Both of the graphs above display revenues by product over time. The data are grouped by
product, each series representing one product. Figure 1.14 uses a traditional legend at the
bottom of the graph to identify each product.
To compare revenues for desks and chairs, you have to move your eyes down to the legend
and then back up to the plot. Figure 1.15 uses direct labeling for each series. This
eliminates eye movement and thus facilitates easier comparisons of the data.
1.1.7 Summary
You can use the above-mentioned guidelines to create graphs that convey information with
maximum effectiveness and minimum distractions. In summary, we suggest that you:
The SG Procedures provide the features you need to implement the above guidelines.
To create the graphs from SAS 9.2 analytical procedures, you only need to switch on the
ODS Graphics system before running the procedure. This is done by including the following
statements in your program. No additional graph code is required.
ods html;
ods graphics on;
Figure 1.16 shows the usage of ODS Graphics with an example of the REG procedure. In
this example, we have specifically requested the output of the Analysis of Variance table and
the Fit Plot by using the ODS SELECT statements. The resulting HTML output is shown in
Figure 1.17. The Analysis of Variance table and the Fit Plot are produced in the right
sequence in the output HTML file.
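The program shown in Figure 1.16 is not reproduced here, so the following is only a minimal sketch of the same idea. It assumes the SASHELP.CLASS sample data set and a simple regression of weight on height; the model is illustrative and is not taken from the book. ANOVA and FitPlot are used as the ODS object names for the Analysis of Variance table and the Fit Plot.

```sas
/* Sketch only: the data set and model are assumptions. */
ods html;
ods graphics on;

/* Keep only the Analysis of Variance table and the Fit Plot */
ods select ANOVA FitPlot;

proc reg data=sashelp.class;
  model weight = height;
run;
quit;

ods graphics off;
ods html close;
```

The ODS SELECT statement applies to the procedure step that follows it, so other steps in the same program still produce their full output.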
It is worth repeating that the Fit Plot is produced automatically, without any graph coding
required on the part of the user. It is useful to note that the procedures have to run
additional processing steps to create these graphs. Some procedures may create a large set
of graphs, which is something to consider when using this option.
With SAS 9.3, running in DMS mode, the default open destination is HTML, and ODS
Graphics is on by default. For line mode, the default open destination is LISTING, and ODS
Graphics is off by default. This is a change from SAS 9.2, where the default open destination
is LISTING, and ODS Graphics is off by default.
GTL can also be used directly by you, the SAS user, to create your own custom graph
template. Then, you can use the SGRENDER procedure to associate this template with the
appropriate data to produce the resultant graph.
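Although GTL is outside the scope of this book, a very small sketch may help make the template-plus-data idea concrete. The template name scatter_sketch, the title text, and the use of SASHELP.CLASS are all hypothetical choices made for illustration.

```sas
/* Define a graph template (the name scatter_sketch is hypothetical) */
proc template;
  define statgraph scatter_sketch;
    begingraph;
      entrytitle "Weight by Height";
      layout overlay;
        scatterplot x=height y=weight;
      endlayout;
    endgraph;
  end;
run;

/* Associate the template with data to render the graph */
proc sgrender data=sashelp.class template=scatter_sketch;
run;
```

The template contains no data of its own. The same template could be rendered against any data set that has the variables it names.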
A detailed description of GTL and the SGRENDER procedure is beyond the scope of this
book. However, it should be noted that all graphs created by the ODS Graphics system are built
using GTL syntax at some level. This is also true of the graphs produced by the
Statistical Graphics (SG) procedures, which are the topic of this book.
The graphs are created using GTL behind the scenes. So, these graphs have the same look
and feel as the automatic graphs created from the SAS analytical procedures. These graphs
are useful for visualization of the raw data or for custom graphs of analysis results.
2. Analysis of Data
This phase of the project involves the analysis of the data using analytical procedures
and/or your own custom data steps. Automatic graphs can be obtained from individual
procedures as mentioned in section 1.2. In this phase you may also need to create
specialized graphs that are not currently supported by the procedure itself.
For all steps in the process above, you need the ability to create graphs from the raw data or
from the results of your custom analysis. You may also want to use the graphs that are
automatically created for you by individual procedures in the report for the project. In this
case, the SG procedures are the ideal tools for this job for the following reasons:
o Graphs created by the SG procedures are identical in look and feel to the automatic
graphs created by the analytical procedures. Mixing and matching the output from
the SG procedures and the analytical procedures is seamless.
o SG procedure steps can be run along with the analytical procedures and data steps
to produce a sequential output in the open ODS destination.
o SG procedures provide a simple and concise syntax to create many types of graphs,
classification panels, and scatter plot matrices.
o With SAS 9.3, SG procedures also provide the ability to annotate the graph with
Annotate-like functionality using a data set. Additionally, attribute maps can be used
to control the usage of visual attributes like color or marker symbols in the graph.
These topics will be discussed in detail in Chapter 9.
Instead of listing all the options and features of the procedures, we take the reverse
approach. If you have an idea of the graph you want to make, you can just flip through this
book and find the graph closest to what you need. Then, right alongside, you will find the
code necessary to create the graph. From there, you can build on the graph by borrowing
from other examples in the book.
SG procedures utilize a building-block approach to creating a graph. If you see two graphs
that each individually include elements that you want in one graph, it is highly likely that you
can combine the statements in one procedure step and get the combined graph you need.
For example, you can combine a scatter plot, a series plot, and various regression plot
statements from different examples into one procedure step. Some common combinations
are as follows:
o Scatter, Series, Step, Band, Regression, Ellipse, VBarParm, and HBarParm
o Histogram and Density
o Bar Chart, Line Chart, and Dot Plot
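As a sketch of the building-block approach described above, the following step layers a regression fit on a scatter plot in one procedure step. The SASHELP.CLASS data set and the chosen variables are assumptions made for illustration.

```sas
/* Sketch: statements from the first group combined in one step */
proc sgplot data=sashelp.class;
  scatter x=height y=weight;                 /* raw data points        */
  reg x=height y=weight / nomarkers clm;     /* fit line and mean CLM  */
run;
```

Each statement contributes its own layer, and the procedure works out the shared axes and the legend.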
o In Chapter 2, we start with a general description of each procedure. This will show
you the structure of the syntax and the main features with a few examples. From
there on, we focus on examples, starting with single-cell graphs, and then moving on
to more complex cases.
o In Chapter 3, we review graphs that are commonly used in various domains. This
section covers the different graph types you can create. We defer the detailed
discussion of the plot options to subsequent chapters.
o In Chapters 4–7, we cover the main groups of single-cell graphs using the SGPLOT
procedure. Plot statements used in these graphs can be combined within the groups
to create the graph you need. Various supported options are used to demonstrate
the features.
o In Chapter 8, we cover common customizations for axes, legends, and insets.
o In Chapter 9, we cover the topics of annotation and attribute maps. Annotations
allow you to add custom graphical elements to a graph that may or may not be data
driven. Attribute maps provide you the ability to tie the plot attributes like color,
symbols, and line patterns to explicit data values. These powerful features for
detailed customization of graphs are included with SAS 9.3.
o In Chapter 10, we cover classification panels using the SGPANEL procedure. This
topic leverages all you have learned about single-cell graphs to produce graphs that
are classified by multiple class variables.
o In Chapter 11, we cover comparative scatter plots and scatter plot matrices using the
SGSCATTER procedure.
o In Chapter 12, we cover graphs commonly used in the health and life sciences
industry.
o In Chapter 13, we cover some special business graphs. Here you will find detailed
examples that combine features from previous chapters to create the graph.
o In Chapter 14, we cover the topic of styles. Here you will see the inner workings of
styles and the association between style elements and graph features. We will cover
the basics of creating your own custom style for graphs.
o In Chapter 15, we cover the options on the ODS DESTINATION statement that have
a direct bearing on the rendering of the graphs. We also review the options you can
set on the ODS GRAPHICS statement to control aspects of graph rendering.
o In Chapter 16, we cover how to create graphs appropriate for different use cases.
Often, graphs are created for inclusion in a full slide of a Microsoft PowerPoint
presentation, or in one 3-1/4” column of a Microsoft Word document for a printed
journal. We will provide some tips on how to create graphs that are suitable for such
use cases.
Custom styles are sometimes used to render some of the graphs in this book. Primarily,
these are necessary to reduce the font sizes to help fit the graphs into the small space
available. The results you see may vary based on the active style for an ODS destination.
However, when color graphs are printed in gray scale, there is a significant loss of fidelity in
the representation of distinct categories in the graph. For example, a graph with two series
plots, one for Drug A and one for Drug B can be well represented in color with use of two
distinct colors, say red and blue. These colors are often designed to have equal weight to
avoid unintentional bias.
When such a graph is printed in gray scale, these two series plots may look very similar
unless they have other distinguishing features such as line patterns and marker shapes to
facilitate discrimination between groups. Bar charts can benefit from use of fill patterns to
facilitate such discrimination.
This book is printed in gray scale, so it is important to create the graphs that will print well in a
gray-scale format. To ensure this, it is best to create the original graph in the gray-scale
format that maximizes the discriminability of the different categories and groups. All of the
graphs included in this book are created using gray-scale styles such as Journal, Journal2 or
Journal3, or styles derived from these styles.
When you run a program from this book, or one of your own, the graph will be rendered using
the active style of the open destination. For SAS 9.2, this is the LISTING destination. For
SAS 9.3 in a DMS session, this is the HTML destination. Since both of these destinations use
a default color style, you will get a graph rendered in full color. To get a gray-scale graph,
use one of the styles mentioned above.
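One way to request a gray-scale style is to name it on the ODS destination statement. The sketch below assumes the LISTING destination and the SASHELP.CLASS data set. With the HTML destination, the same STYLE= option would go on the ODS HTML statement instead.

```sas
/* Sketch: render with a gray-scale style instead of the default */
ods listing style=journal;

proc sgplot data=sashelp.class;
  /* Groups are told apart by marker shape rather than by color */
  scatter x=height y=weight / group=sex;
run;
```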
The WIDTH or HEIGHT options on the ODS GRAPHICS statement have been used to render
the graphs for this book. However, these options are not shown in the sample code. When
you run the same code without these options, the graphs will render in the default size.
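A sketch of how the graph size might be set follows. The dimensions shown are arbitrary illustrative values, not the ones used for this book.

```sas
/* Sketch: set an explicit graph size (values are illustrative) */
ods graphics / reset width=4in height=3in;

proc sgplot data=sashelp.class;
  scatter x=height y=weight;
run;

/* Return to the default size for later graphs */
ods graphics / reset;
```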
Though initially designed with the statistical user in mind, these procedures are finding
increasing usage in non-statistical domains. In these use cases, there is often a desire for a
flashier graph, even at the expense of effectiveness.
In the SG procedures, the bar chart statements support an option to apply a decorative skin
to a bar. This does not change the shape of the bar but provides a “flashier” rendering. This
option can be used at the discretion of the user. Some examples in this book use this option
to demonstrate this feature. The intention is primarily to expose these available features.
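As a hedged illustration, the following step applies a skin to a bar chart through the DATASKIN= option, which requires SAS 9.3. The data set, variables, and the choice of the GLOSS skin are assumptions made for this sketch.

```sas
/* Sketch: bar chart with a decorative skin (SAS 9.3 or later) */
proc sgplot data=sashelp.cars;
  vbar type / response=mpg_city stat=mean dataskin=gloss;
run;
```

Removing DATASKIN=GLOSS gives the plain bar chart, which is also the form that runs under SAS 9.2.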
In some cases you can remove the new option and still run the code using SAS 9.2.
Note: When running SAS 9.3 in DMS mode, the default open destination is HTML. For non-
DMS mode, the default open destination is LISTING. The ODS GRAPHICS feature is
automatically enabled for the execution of the SG procedures.
Chapter 2
Statistical Graphics Procedures
2.1 Key Concepts
2.2 SGPLOT Procedure
2.3 SGPANEL Procedure
2.4 Combining Statements
2.5 SGSCATTER Procedure
2.6 Styles and Their Usage
2.7 Template-Based Graphics vs. Device-Based Graphics
The SG procedures provide a direct procedure interface into the ODS Graphics system. These
procedures create graphs with very little code. The SGPLOT and SGPANEL procedure syntax
allows you to build up complex graphs by use of plot statements and other features. The
SGSCATTER procedure syntax is designed to give single-statement access to a variety of
scatter plot panels and matrices.
[Figure 2.1: a single-cell graph with its Graph, Data Area, Plots, and Legend components labeled]
Figure 2.1 shows a typical single-cell graph built using the following components:
o one or more titles at the top of the graph
o one or more footnotes at the bottom of the graph
o one region in the middle displaying the data
o one or more plots in the data area
o one or more legends or insets inside the data area or outside
Note: All of these statements are referred to as “plots”, regardless of whether the statement
draws a series plot, scatter plot, bar chart, histogram, and so on.
Graph: Refers to the individual output created by the procedure. In most of the common
use cases, each execution of the procedure creates one graph output file. In some cases, these
procedures produce multiple output files (for BY variable usage, or paging of large panels),
each of which is referred to as a “Graph”.
Cell: Each graph can have one or more data areas where the data is plotted as shown in
Figure 2.2. Each of these is referred to as a “Cell”. A cell may or may not include axes.
Plot statements: Each plot statement is responsible for drawing only its own data
representation. It is told by the container where to draw itself and how to scale its data.
Axes: The X and Y axes are shared by all the plots in the graph. The data range for each
axis is determined by the graph and is based on all the plots placed in it. Each graph can
have a second set of axes, called X2 (at the top) and Y2 (on the right). Each plot can
specify which axes it wants to use.
Legends: A graph can have one or more legends, and each can be placed in any part of
the graph. Each legend can specify the information to be displayed in it.
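The axis and legend concepts above can be sketched in a single step. In this illustration the second scatter plot is assigned to the Y2 axis, and a KEYLEGEND statement names the plots to include. The data set, the variables, and the names 'wt' and 'age' are hypothetical choices.

```sas
/* Sketch: two plots sharing X, with the second plot on Y2 */
proc sgplot data=sashelp.class;
  scatter x=height y=weight / name='wt'  legendlabel='Weight';
  scatter x=height y=age    / y2axis name='age' legendlabel='Age';
  keylegend 'wt' 'age' / position=bottom;   /* legend lists both plots */
run;
```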
Classification panels are very useful to visualize the distribution of data classified by one or
more class variables in one display. Figures 2.2 and 2.3 both display the association
between MPG and Horsepower for vehicles by country of origin and type of car. From both
graphs, one can easily glean some information:
Figure 2.2 displays a classification panel with a Lattice layout, which supports two class
variables, one for row and one for column. Each row and column has a header that
displays the value of the classification variables.
Figure 2.3 displays a classification panel with a Panel layout, which supports multiple class
variables. Each cell in the panel has multiple headers, one for each class variable.
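The two layouts can be sketched with the SGPANEL procedure as follows. The subset of SASHELP.CARS and the choice of variables are assumptions made for illustration and may differ from the data behind Figures 2.2 and 2.3.

```sas
/* Sketch: Lattice layout, one class variable for columns, one for rows */
proc sgpanel data=sashelp.cars(where=(type in ('Sedan', 'Sports')));
  panelby origin type / layout=lattice;
  scatter x=horsepower y=mpg_city;
run;

/* Sketch: Panel layout, each cell carries a header per class variable */
proc sgpanel data=sashelp.cars(where=(type in ('Sedan', 'Sports')));
  panelby origin type / layout=panel;
  scatter x=horsepower y=mpg_city;
run;
```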
Figure 2.4 displays a comparative graph for Mileage and Price by Weight and Horsepower.
This graph has common axes for comparison of the values. Figure 2.5 shows a
Scatter Plot Matrix for three variables, MPG, Horsepower, and MSRP. Such a matrix can
provide preliminary visual indication of direct or inverse associations between the variables.
Comparative and Matrix graphs are created using the SGSCATTER procedure.
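A sketch of both graph types follows, assuming the SASHELP.CARS data set with MPG_CITY standing in for mileage. The exact variables behind Figures 2.4 and 2.5 are not shown in the text.

```sas
/* Sketch: comparative panel with shared axes */
proc sgscatter data=sashelp.cars;
  compare x=(weight horsepower) y=(mpg_city msrp);
run;

/* Sketch: scatter plot matrix for three variables */
proc sgscatter data=sashelp.cars;
  matrix mpg_city horsepower msrp;
run;
```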
If the results of the built-in heuristics are not desirable, they can be turned off.
1. The procedure statement supports multiple options. Use of these options will be
demonstrated in the examples shown in later chapters.
2. One or more plot statements are used to represent the data. Each plot statement has its
own set of required data roles and options. These are described in detail in later
chapters. Many plot statements are supported and can be grouped as shown below.
Plot statements from compatible groups can be combined in one procedure step:
a. Reference Lines
b. Insets
c. Axes
d. Legends
Figure 2.6 shows a typical use case of the SGPLOT procedure. In this example, we have
created a distribution plot for the variable “Horsepower”. Three separate plot statements are
used, one for each data element of the graph. These statements are rendered in the order in
which they are specified as illustrated in Figure 2.7. Figure 2.9 shows the actual rendered
graph.
position=topright
across=1;
run;
In Figure 2.6, we start with the HISTOGRAM statement, and then place two DENSITY plots
on it. The KEYLEGEND statement is used to customize the position of the legend.
o The HISTOGRAM statement is placed first in the area bounded by the axes.
o The first DENSITY curve (“Normal” by default) is placed on top of the histogram.
o The second DENSITY curve (Type=Kernel) is placed on top of the Normal curve.
o The legend is actually built by default by the procedure and placed on top. The
KEYLEGEND statement is used to customize its location.
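The layering described above can be sketched as a complete step. This is a minimal sketch, not the program from Figure 2.6: it assumes the SASHELP.CARS sample data set (with its Horsepower variable) in place of the book's own data, and the title text is a placeholder.

```sas
title 'Distribution of Horsepower';   /* placeholder title */

/* Assumption: SASHELP.CARS stands in for the book's data set */
proc sgplot data=sashelp.cars;
  histogram horsepower;                /* rendered first, inside the axes     */
  density horsepower;                  /* normal curve (the default type)     */
  density horsepower / type=kernel;    /* kernel curve, drawn over the normal */
  keylegend / location=inside          /* move the automatic legend           */
              position=topright
              across=1;
run;

title;                                 /* clear the title */
```

Reordering the three plot statements changes the stacking order, because each statement is drawn on top of the ones before it.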
Figure 2.8 and Figure 2.9 display the program and the resulting graph. The XAXIS statement
is added to suppress the x-axis label, as it is unnecessary given the title of the graph.
position=topright
across=1;
xaxis display=(nolabel);
run;
As mentioned above, the supported plot statements are grouped in four categories. Plots
within each category can be combined in a procedure step. Plots from the “Basic Plots”
group and “Fit and Confidence Plots” group can also be combined.
INSET, REFLINE, and KEYLEGEND statements can be used as necessary to add these
elements to your graph.
Axes statements can be used to customize the appearance of any of the four axes:
RUN;
1. The procedure statement supports multiple options that will be demonstrated in the
examples shown in later chapters.
2. The PANELBY statement is required and must be placed before any of the plot, refline,
inset, axis, or legend statements. This statement is used to set the layout type and other
options that control the overall paneling of the cells.
3. One or more plot statements are used to represent data. Each plot statement has its
own set of data roles and options described in detail in later chapters. This procedure
supports most of the same plot statements as the SGPLOT procedure.
Figure 2.10 shows program statements for the creation of a classification panel displaying the
distribution of mileage by origin and type. The resulting graph is shown in Figure 2.11.
The program code in Figure 2.10 has the following noteworthy items:
o The PANELBY statement is required, and must come before any plot, axes,
reference line, or legend statements. This sets the classification variables.
o The two plot statements, HISTOGRAM and DENSITY, define the “Prototype” that is
used to populate each cell of the panel.
o The ROWAXIS and COLAXIS statements are used to customize the row and column
axes.
The PANELBY statement must provide one or more class variables. The procedure
automatically subdivides the graph region into multiple cells based on all crossings of the
unique values of the class variables. In Figure 2.10, we have specified two class variables
“Origin” and “Type”, each having two unique values. Hence, we get a graph with four cells,
as shown in Figure 2.11. Each cell has two headers.
Each cell in the graph is populated with the same set of plot statements as specified in the
procedure syntax. In this case, each cell gets a HISTOGRAM and a DENSITY plot. The
data for each cell is a subset based on the class value(s) for the cell. In Figure 2.11, the cell
titled “Asia” displays the subset of the data where Origin=’Asia’.
The non-plot statements such as the ROWAXIS and COLAXIS are used to customize the
common axes. A legend is automatically created based on an inspection of the syntax.
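A classification panel of this kind might be written as follows. This is a sketch under stated assumptions rather than the program in Figure 2.10: it uses the SASHELP.CARS sample data set, and the WHERE statement is added only to restrict Origin and Type to two values each so that the panel has four cells.

```sas
/* Assumption: SASHELP.CARS replaces the book's data set; the WHERE
   subset is illustrative and keeps two values per class variable. */
proc sgpanel data=sashelp.cars;
  where origin in ('Asia', 'USA') and type in ('Sedan', 'SUV');
  panelby origin type;          /* must precede plot and axis statements */
  histogram mpg_city;           /* prototype: drawn in every cell        */
  density mpg_city;             /* prototype: drawn in every cell        */
  rowaxis grid;                 /* customize the shared row axis         */
  colaxis label='MPG (City)';   /* customize the shared column axis      */
run;
```

Each cell receives the same HISTOGRAM and DENSITY statements, applied to the subset of observations for that cell's Origin and Type values.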
All cells of the graphs created by this procedure have common external axes:
The SGPANEL procedure supports a number of layout types, which will be described in the
following sections.
This is the default layout type. This layout supports one or more (N) class variables. A cell is
created for each crossing of the unique values of all the class variables. Figure 2.12 shows a
graph with two class variables, each with 2 levels, resulting in a graph with 4 cells.
Figure 2.12
o Each cell has N cell headers, one for each class variable, displaying the value for
each cell.
o By default, only the cells that have data are displayed. Cells without any data are
dropped from the graph. An option can be used to display all cells.
o The procedure automatically decides the number of rows and columns for the grid.
When a graph has many cells, the procedure will automatically break up the graph
into multiple “pages”, to prevent the cells from getting too small.
o Common external row and column axes are used.
o Options are available to allow the user to control the “paging” of the graph.
This layout requires two class variables. The first class variable is treated as the Column
variable, and the second as the Row variable. See Figure 2.13.
o Each unique value of the Column variable creates a column in the grid.
o Each unique value of the Row variable creates a row in the grid.
o Each column gets a column header, by default at the top.
o Each row gets a row header, by default on the right.
o Common external row and column axes are used.
o Every crossing of the unique values of the column and row variables is displayed as
a cell, regardless of whether it has data or not.
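The column and row assignment can be illustrated with a short sketch. It assumes the SASHELP.CARS sample data set and an illustrative subset of Type values; neither comes from the original example.

```sas
/* Assumption: SASHELP.CARS and the WHERE subset are illustrative */
proc sgpanel data=sashelp.cars;
  where type in ('Sedan', 'SUV');
  /* First class variable (Origin) defines the columns,
     second class variable (Type) defines the rows. */
  panelby origin type / layout=lattice;
  scatter x=horsepower y=mpg_city;
run;
```

Swapping the two variables on the PANELBY statement transposes the grid.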
Figure 2.13
This is a special case of the LATTICE layout, where only one variable is provided in the list of
class variables. This layout produces a panel of columns in one row. Each cell has a column
header as shown in Figure 2.14.
Figure 2.14
Figure 2.16: Compatibility matrix of plot statements. An x marks each pair of statements that
can be combined in one procedure step. The matrix covers the following statements, by group:
Basic Plots: SCATTER, SERIES, STEP, BAND, NEEDLE, VECTOR, VBARPARM, HBARPARM, BUBBLE,
HIGHLOW, REFLINE, LINEPARM
Fit and Confidence Plots: REG, LOESS, PBSPLINE, ELLIPSE *
Distribution Plots: HISTOGRAM, DENSITY, VBOX, HBOX
Categorization Plots: VBAR, VLINE, HBAR, HLINE, DOT
REFLINE can be combined with all 25 statements in the matrix.
5. RUN;
1. The procedure statement supports multiple options that will be demonstrated in the
examples shown in later chapters.
2. The PLOT statement creates a set of cells arranged in a uniform grid in the graph based
on the plot request. Each cell includes an independent scatter plot with optional fit and
ellipses. Axes can be made uniform across all plots, if needed. Figures 2.17 and 2.18
show an example of the PLOT statement. The two Y axes are not uniform.
markerattrs=(symbol=circlefilled)
transparency=0.85;
run;
3. The COMPARE statement creates a row and column grid of cells in the graph based on
the list of X and Y variables. Each cell includes a scatter plot with optional fit and
ellipses. Each row of the grid has a common external y-axis. Each column of the grid
has a common external x-axis. Options can be used to customize the plots. Figures
2.19 and 2.20 show an example of the COMPARE statement.
4. The MATRIX statement creates a row and column grid of cells based on the list of
variables specified. Each cell includes a scatter plot with optional fit and ellipses. Each
row and column of the grid has external axes. Figures 2.21 and 2.22 show an example
of the MATRIX statement.
title 'Variable Associations';
proc sgscatter data=sgbook.cars2;
  matrix mpgc weight hp msrp /
    markerattrs=(symbol=circlefilled)
    transparency=0.9;
run;
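For comparison with the MATRIX program, the PLOT and COMPARE statements might be used as sketched here. These are not the programs from Figures 2.17 and 2.19; the SASHELP.CARS sample data set and the variable choices are assumptions made for illustration.

```sas
/* PLOT: one independent cell per plot request; the Y axes are not shared */
proc sgscatter data=sashelp.cars;
  plot (mpg_city mpg_highway) * (horsepower) /
       markerattrs=(symbol=circlefilled)
       transparency=0.85;
run;

/* COMPARE: grid of X by Y variables with shared row and column axes */
proc sgscatter data=sashelp.cars;
  compare x=(weight horsepower) y=(mpg_city msrp) /
          markerattrs=(symbol=circlefilled)
          transparency=0.85;
run;
```

The COMPARE grid has one column per X variable and one row per Y variable, so the two-by-two request above yields four cells.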
Graphs also derive the visual attributes for plot colors, marker symbols, line thickness, axis
label fonts, etc. from specific named elements of the style. The association between the
element of the graph and the style element is well defined and described in detail in the ODS
product documentation. Some relevant information is included in Chapter 14.
You can control the visual appearance of the graphs in different ways:
1. Use a pre-defined SAS style: Every ODS output destination has a default style. All
graphs written to this destination use this style by default. You can change the active style
for an ODS output destination by setting the STYLE= option for the destination. All graphs
written to that destination will then use that style.
2. Use a custom style: If you like one of the pre-defined SAS styles, but would prefer to
change a few of the appearance settings, you can derive a new style from one of the SAS
styles by using the TEMPLATE procedure. For more information on this topic, see the ODS
product documentation for PROC TEMPLATE.
3. Use appearance options: A style set on the ODS destination affects all features of
all graphs rendered to the destination. You can customize the appearance of specific features
of a graph by setting specific appearance options in the procedure syntax. This overrides the
settings derived from the style only for that one use case.
Usage of these options will become apparent through many examples in this book. More
information on this topic is covered in Chapter 14.
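The first and third approaches can be combined in one program, as in this minimal sketch. The choice of the JOURNAL style, the SASHELP.CLASS sample data set, and the marker settings are all assumptions for illustration.

```sas
/* Approach 1: select a predefined style for the destination */
ods html style=journal;

proc sgplot data=sashelp.class;
  /* Approach 3: MARKERATTRS overrides the style for this plot only */
  scatter x=height y=weight /
          markerattrs=(symbol=circlefilled color=darkred size=9);
run;

ods html close;
```

Other graphs written to the same destination would still take their marker appearance from the style, because the override applies only to the statement that carries it.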
Chapter 2 SG Procedures 33
1. SG procedures create output directly in industry standard formats, such as PNG, JPG,
PDF, PS, and so on. SG procedures do not create GSEG output.
a. The GOPTIONS statement is not supported (except for using the RESET option to
reset titles and footnotes). Global options that control the output are specified on the
ODS GRAPHICS statement.
b. TITLE and FOOTNOTE statements are supported, with the exception of a few
options.
c. PATTERN and SYMBOL statements are not supported. However, plot types and
attributes are specified directly on the plot statements, which make the application of
those attributes clearer.
d. AXIS statements are not supported. However, PROC SGPLOT supports four axis
statements (XAXIS, YAXIS, X2AXIS, and Y2AXIS) to control the axes around the
cell. PROC SGPANEL supports the ROWAXIS and COLAXIS statements to control
its axes. PROC SGSCATTER does not have axis statements, but there are
techniques that will be described later for controlling some aspects of the axes.
e. LEGEND statements are not supported. PROC SGPLOT and SGPANEL support a
KEYLEGEND statement to control the contents of the legend, its layout, and position.
PROC SGSCATTER has a LEGEND option to give some legend control.
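The replacements listed above can be seen together in a short sketch. The image name, dimensions, title, and the SASHELP.CARS sample data set are illustrative assumptions, not values from the book.

```sas
/* Global output settings go on ODS GRAPHICS instead of GOPTIONS */
ods graphics / reset width=5in height=3in
               imagename='mileage' imagefmt=png;

title 'Mileage by Horsepower';          /* TITLE is still supported */

proc sgplot data=sashelp.cars;
  /* Marker attributes are set on the plot statement, not via SYMBOL */
  scatter x=horsepower y=mpg_city / markerattrs=(symbol=circlefilled);
  /* XAXIS and YAXIS take the place of AXIS statements */
  xaxis grid;
  yaxis grid label='MPG (City)';
run;

title;
```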
Chapter 3
Common Graphs
3.1 Introduction
3.2 Single-Cell Graphs
3.3 Classification Panels
3.4 Comparative and Matrix Graphs
3.1 Introduction
All the graphs have a common structure:
o one or more titles at the top and one or more footnotes at the bottom
o various graphical representations of the data in the middle
o one or more legends, inside or outside the data area
o insets or statistics table inside or outside the data area
Figure 3.1.1
Single-Cell Graphs: In section 3.2, we will use the SGPLOT procedure to create single-cell
graphs. We will put together the needed syntax to create graphs like the one shown above in
Figure 3.1.1.
Classification Panels: In section 3.3, we will use the SGPANEL procedure to create
classification panels. Once you know how to mix and match statements for PROC SGPLOT, all
you need is a PANELBY statement to set up the classification variables.
Comparative and Matrix Graphs: In section 3.4, we will use the SGSCATTER procedure to
create comparative and matrix graphs.
In the rest of this chapter, we provide a broad coverage of the different types of graphs you can
create using the SG Procedures. The goal is to expose you to the different types of graphs,
without going deep into the features of each plot. We will cover that in subsequent chapters.
4.1 Introduction
Basic plots used in both the SGPLOT and SGPANEL procedures are the workhorse plots for
visualization of raw data, or of summary statistics that have been computed prior to creation of
the graph. These plots do not process the data in any way, but they plot the raw data values in
the graph. The plots included in this group are:
Figure 4.1
Options:
CYCLEATTRS boolean Cycle through style elements for each plot
DATA =sas-data-set Optional data set
DATTRMAP =sas-data-set Data set defining an attribute map
DESCRIPTION =string Description string
NOAUTOLEGEND boolean Do not show the automatic legend
NOCYCLEATTRS boolean Do not cycle style elements for each plot
PAD =value Padding around the outside of the graph
SGANNO =sas-data-set Data set containing the annotations
TMPLOUT =string File name for generated graph template code
UNIFORM =keyword Uniform axis and legends across BY variables
As seen in the examples in Chapter 3, one or more plot statements can be provided that will work
together to create the graph. See the table of permissible combinations shown in section 2.4.
Chapter 4 Basic Plots 63
All plot statements have required and optional parameters necessary to create the plot. These
parameters fall in the following broad categories:
Categories 1 - 3 above include role and option names that are specific to each plot. These role
and option names are used consistently and only as needed. We will discuss these with each
specific plot statement.
Category 4 includes the set of options that are common to all plot statements as shown in the
table below. These options work in the same way across all plot statements. Some plots may
not support one or more of these common options.
Common Options:
ATTRID =string The attribute map ID (See Chapter 9).
CLUSTERWIDTH =value Fraction of the mid-point spacing to be used for
drawing the group cluster.
DISCRETEOFFSET =value Fractional shift within midpoint spacing.
GROUPDISPLAY =keyword OVERLAY | CLUSTER. Default value varies.
GROUPORDER =keyword DATA | ASCENDING | DESCENDING
LEGENDLABEL =string The label that appears in the legend to represent
this (non-group) plot.
NAME =string Specifies a name for this statement. Other
statements, such as KEYLEGEND, can refer to a
plot by its name.
TRANSPARENCY =value Specifies transparency for the visual elements.
URL =string-column URL link when used in HTML output.
These common options are available for each plot statement, and can be used exactly as
described above. To avoid duplication, we will not list the above common options for the
individual plots that are discussed in this chapter.
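As an illustration of how several of these common options work together, here is a minimal sketch. The SASHELP.CLASS sample data set, the plot names, and the label text are assumptions.

```sas
proc sgplot data=sashelp.class;
  /* NAME= lets other statements refer to this plot */
  scatter x=height y=weight / name='obs' legendlabel='Students'
                              transparency=0.3;
  reg x=height y=weight / name='fit' legendlabel='Linear fit' nomarkers;
  /* The legend includes only the plot named 'fit' */
  keylegend 'fit' / position=bottom;
run;
```

Without the name on the KEYLEGEND statement, the legend would list both plots.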
The basic scatter plot is a key plot type for visualization of the raw
data. The syntax is:
Appearance Options:
ERRORBARATTRS =line-attrs Appearance attributes for error bars
MARKERATTRS =marker-attrs Appearance attributes for markers
MARKERCHARATTRS =text-attrs Appearance attributes for char markers
Boolean Options:
DATALABEL Display Y value as data label
Related Style Elements: The GraphDataDefault style element is used to draw the plot elements.
When a GROUP variable is present, or when multiple plot statements are overlaid, the
GraphData1-GraphData12 elements are used, one for each group or plot statement.
ods escapechar='^';
title "Average Temperatures in ^{unicode '00b0'x}C and ^{unicode '00b0'x}F";
proc sgplot data=sgbook.weather;
refline 32 / label='32' labelloc=inside labelpos=min;
scatter x=month y=high / legendlabel='High';
scatter x=month y=lowc / y2axis legendlabel='Low';
xaxis grid valueattrs=(size=6) offsetmin=0.1;
yaxis grid min=14 max=104;
y2axis min=-10 max=40;
keylegend / location=inside position=topleft across=1;
run;
• MARKERCHAR = column
• DATALABEL < = column >
The scatter plot displays a marker at the (x, y) location for each observation. The
SCATTER statement also supports display of data labels with each marker using the
DATALABEL option. In this case, the associated data label is displayed near the marker:
By default, a data label collision avoidance algorithm is used to minimize the collisions
between labels and other markers in the plot. Plots with data labels can get busy very
quickly. When using data labels we recommend the following:
The data label collision avoidance algorithm attempts to move the labels to avoid collision
with markers or other labels. This can sometimes move a label away from the marker. One
technique to retain context is to use group colors as shown later.
Often it is desirable to display a character string at the (x, y) position for each observation
instead of a marker. The MARKERCHAR=column option draws a character string for the
value from the column for each observation at the (x, y) location in the plot area.
Figure 4.5.5 extends the example in Figure 4.4.14 to display the temperature value for each
observation. This makes it easier to see both the trend and the actual value.
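The difference between the two options can be sketched with a pair of steps. The SASHELP.CLASS sample data set is an assumption; the book's own examples use a weather data set.

```sas
/* DATALABEL=: a marker is drawn and the label is placed near it */
proc sgplot data=sashelp.class;
  scatter x=height y=weight / datalabel=name;
run;

/* MARKERCHAR=: the text replaces the marker at the (x, y) location */
proc sgplot data=sashelp.class;
  scatter x=height y=weight / markerchar=name;
run;
```

With MARKERCHAR the text is drawn at the data location itself, so it is not moved by the collision avoidance algorithm described above.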
The basic series plot is a key plot type for visualization of raw data.
This plot does not summarize the data. The syntax is:
Appearance Options:
LINEATTRS =line-attrs Appearance attributes for lines
MARKERATTRS =marker-attrs Appearance attributes for markers
Other Options:
CURVELABEL =string Curve label for non-grouped case
CURVELABELLOC =keyword Curve label location (INSIDE | OUTSIDE)
CURVELABELPOS =keyword Curve label position (AUTO | END | MAX | MIN | START)
Boolean Options:
BREAK Break the series plot at missing values
CURVELABEL Display curve labels from group role
DATALABEL Display Y value as data label
MARKERS Display the markers
Related Style Elements: The GraphDataDefault style element is used to draw the plot elements.
When a GROUP variable is present, or when multiple plot statements are overlaid, the
GraphData1-GraphData12 elements are used, one for each group or plot statement.
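A grouped series plot using some of these options might look like the following sketch. The SASHELP.STOCKS sample data set and the date cutoff are assumptions for illustration.

```sas
proc sgplot data=sashelp.stocks;
  where date >= '01JAN2004'd;            /* illustrative subset            */
  series x=date y=close / group=stock    /* one line per stock             */
                          markers        /* show a marker at each point    */
                          curvelabel;    /* label each curve by its group  */
run;
```

Each group value draws from the next GraphData style element, which is why the lines differ in color without any appearance options.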
A step plot draws a horizontal line through the Y value, with a step
change at the next observation.
Appearance Options:
ERRORBARATTRS =line-attrs Appearance attributes for error bars
LINEATTRS =line-attrs Appearance attributes for lines
MARKERATTRS =marker-attrs Appearance attributes for markers
Other Options:
CURVELABEL =string Curve label for non-grouped case
CURVELABELLOC =keyword Curve label location (INSIDE | OUTSIDE)
CURVELABELPOS =keyword Curve label position (AUTO | END | MAX | MIN | START)
JUSTIFY =value Location of data point relative to step
Boolean Options:
BREAK Break the step plot at missing values
CURVELABEL Display curve labels from group role
DATALABEL Display Y value as data label
MARKERS Display the markers
Related Style Elements: The GraphDataDefault style element is used to draw the plot elements.
When a GROUP variable is present, or when multiple plot statements are overlaid, the
GraphData1-GraphData12 elements are used, one for each group or plot statement.
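A small step plot can be sketched as follows. The WORK.RATES data set and its values are made up purely for illustration and do not come from the book.

```sas
/* Hypothetical data: made-up values for illustration only */
data work.rates;
  input month rate;
  datalines;
1 2.0
2 2.5
3 2.5
4 3.0
5 2.75
;
run;

proc sgplot data=work.rates;
  /* JUSTIFY= controls where the data point sits relative to its step */
  step x=month y=rate / markers justify=center curvelabel='Rate';
run;
```

Changing JUSTIFY= to LEFT or RIGHT shifts each horizontal segment relative to its data point.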
For each value of the X (or Y) variable, a band is drawn between the
lower and upper response values. Response values can come from a column in the data or can
be a constant value.
Appearance Options:
CURVELABELATTRS =text-attrs Appearance of curve label
FILLATTRS =fill-attrs Appearance of fill between values
LINEATTRS =line-attrs Appearance of the lines
Other Options:
CURVELABELLOC =keyword Location of label – INSIDE | OUTSIDE
CURVELABELPOS =keyword Position of label - START | END
CURVELABELLOWER =string Name of lower band curve
CURVELABELUPPER =string Name of upper band curve
MODELNAME =plotname Name of the plot for interpolation info.
NAME =string Name for this statement
TRANSPARENCY =value Transparency for the line and markers
TYPE =value Series or Step
Boolean Options:
FILL | NOFILL Display fill between lower and upper values
OUTLINE | NOOUTLINE Display outline at lower and upper values
NOEXTEND
Related Style Elements: The GraphDataDefault style element is used to draw the plot elements.
When a GROUP variable is present, or when multiple plot statements are overlaid, the
GraphData1-GraphData12 elements are used, one for each group or plot statement.
The band may be filled, with or without an outline. The GROUP role is used to create multiple
bands, one for each unique group value. When used with a series or step plot to display a
confidence or prediction band, the MODELNAME option can be used to create an association
with the fit plot.
Note: The band plot statement is optimal for the display of confidence and prediction bands,
where the X (or Y) values are sorted. The band is drawn in data order, and if the data are not
sorted by X (or Y), then the results may be unpredictable.
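Following the note about sort order, a band might be overlaid with a series plot as sketched here. The SASHELP.STOCKS sample data set, the WORK.IBM output data set name, and the date cutoff are assumptions.

```sas
/* Sort by the X variable first, since the band is drawn in data order */
proc sort data=sashelp.stocks out=work.ibm;
  where stock = 'IBM' and date >= '01JAN2005'd;
  by date;
run;

proc sgplot data=work.ibm;
  band x=date lower=low upper=high /
       fill transparency=0.5 legendlabel='Monthly low to high';
  series x=date y=close / legendlabel='Close';
run;
```

The band is listed first so that the series line is drawn on top of it.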
Appearance Options:
DATALABELATTRS =text-attrs Appearance attributes for labels
LINEATTRS =line-attrs Appearance attributes for the needles
MARKERATTRS =marker-attrs Appearance attributes for markers
Other Options:
BASELINE =number Y-intercept for the baseline
Boolean Options:
DATALABEL Display Y value as data label
MARKERS Display the markers
Related Style Elements: The GraphDataDefault style element is used to draw the plot elements.
When a GROUP variable is present, or when multiple plot statements are overlaid, the
GraphData1-GraphData12 elements are used, one for each group or plot statement.
Appearance Options:
DATALABELATTRS =text-attrs Appearance attributes for data labels
LINEATTRS =line-attrs Appearance attributes for lines
Other Options:
ARROWDIRECTION =keyword OUT | IN | BOTH
ARROWHEADSHAPE =keyword OPEN | CLOSED | FILLED | BARBED
Boolean Options:
DATALABEL Display data labels for the vectors
NOARROWHEADS Suppress display of arrow heads
Related Style Elements: The GraphDataDefault style element is used to draw the plot elements.
When a GROUP variable is present, or when multiple plot statements are overlaid, the
GraphData1-GraphData12 elements are used, one for each group or plot statement.
Appearance Options:
FILLATTRS =fill-attrs Appearance of bar interior
LIMITATTRS =line-attrs Appearance of limit bars
Other Options:
BARWIDTH = value Width of bar as fraction of tick spacing
DATASKIN = skin-value One of the predefined skin types
Boolean Options:
DATALABEL Display default bar data labels
FILL | NOFILL Display filled / unfilled bars
OUTLINE | NOOUTLINE Display bar outlines or not.
MISSING Accept missing as category value
A bubble plot draws circular bubbles, sized by the SIZE role, at the X and Y
locations in the plot. The radius of each bubble is scaled linearly by the
SIZE variable.
Appearance Options:
DATALABELATTRS =text-attrs Appearance attributes of the labels
FILLATTRS =fill-attrs Appearance attributes of bubble interior
LINEATTRS =line-attrs Appearance attributes for bubble outline
Other Options:
BRADIUSMAX =value Radius in pixels of the largest bubble
BRADIUSMIN =value Radius in pixels of the smallest bubble
Boolean Options:
FILL | NOFILL Display filled bubbles or not
OUTLINE | NOOUTLINE Display bubble outline or not
Related Style Elements: The GraphDataDefault style element is used to draw the plot elements.
When a GROUP variable is present, or when multiple plot statements are overlaid, the
GraphData1-GraphData12 elements are used, one for each group or plot statement.
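A bubble plot with explicit radius limits might be written as in this sketch. The SASHELP.CLASS sample data set and the radius values are illustrative assumptions.

```sas
proc sgplot data=sashelp.class;
  bubble x=height y=weight size=age /
         bradiusmin=5       /* radius in pixels of the smallest bubble */
         bradiusmax=15      /* radius in pixels of the largest bubble  */
         transparency=0.4;  /* lets overlapping bubbles show through   */
run;
```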
Appearance Options:
FILLATTRS =fill-attrs Appearance of bar interior
DATALABELATTRS =text-attrs Appearance of bar labels
LINEATTRS =line-attrs Appearance of line or outline
Other Options:
BARWIDTH =value Width of bar as fraction of tick spacing
GROUPDISPLAY =value Cluster or Overlay
GROUPORDER =keyword DATA | ASCENDING | DESCENDING
INTERVALBARWIDTH =value Width of bar in pixels
TYPE =keyword BAR | LINE
Boolean Options:
FILL | NOFILL Display filled bars or not for Type=Bar
OUTLINE | NOOUTLINE Display outline or not for Type=Bar
Appearance Options:
LINEATTRS =line-attrs Appearance attributes for the reference lines
Other Options:
AXIS = axis The axis X, X2, Y, Y2 for the values
LABEL = string Labels for the reference line(s)
LABELLOC = string Location of the label (Inside | Outside)
LABELPOS = string Position of the label (Min | Max)
Boolean Options:
NOCLIP Refline is included to determine axis range
Related Style Elements: The GraphReference style element is used to draw these lines.
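Reference lines on both axes can be sketched as follows. The SASHELP.CLASS sample data set, the reference values, and the label text are assumptions.

```sas
proc sgplot data=sashelp.class;
  scatter x=height y=weight;
  /* Horizontal line: the value is read against the Y axis */
  refline 100 / axis=y label='100 lb' labelloc=inside labelpos=min
                lineattrs=(pattern=dash);
  /* Vertical line: the value is read against the X axis */
  refline 60 / axis=x label='60 in';
run;
```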
Appearance Options:
CURVELABELATTRS =text-attrs Appearance of the label
CURVELABELLOC =keyword OUTSIDE | INSIDE
CURVELABELPOS =keyword AUTO | MAX | MIN
LINEATTRS =line-attrs Appearance of the line
Other Options:
CLIP =boolean Include data for this plot in the axis range
LEGENDLABEL =string Label that appears in legend
NAME =string Name for this statement
Boolean Options:
NOEXTEND Line is not drawn to the axes
X2AXIS Assigns X values to the X2 (top) axis
Y2AXIS Assigns Y values to the Y2 (right) axis
Related Style Elements: The GraphDataDefault style element is used to draw the plot elements.
When a GROUP variable is present, or when multiple plot statements are overlaid, the
GraphData1-GraphData12 elements are used, one for each group or plot statement.
Appearance Options:
DATALABELATTRS =text-attrs Appearance attributes of the labels
FILLATTRS =fill-attrs Appearance attributes of bar interior
FINALBARATTRS =line-attrs Appearance attributes for the final bar
INITIALBARATTRS =line-attrs Appearance attributes for the initial bar
Other Options:
BARWIDTH =value Width of bar as fraction of tick spacing
DATASKIN =skin-value One of the predefined skin types
FINALBARTICKVALUE =string String for labeling final bar on axis
INITIALBARTICKVALUE =string String for labeling initial bar on axis
INITIALBARVALUE =value Response value for initial bar
STAT =keyword MEAN | SUM
Boolean Options:
FILL | NOFILL Display filled bar or not
OUTLINE | NOOUTLINE Display bar outline or not
MISSING Show missing category values
Related Style Elements: The GraphDataDefault style element is used to draw the plot elements.
When a group variable is present, the GraphData1-GraphData12 elements are used, one for
each group.
The order of the plot statements and the axes determine if certain combinations are allowed. The
axis type (linear, discrete, etc.) is decided by the first plot in the list. Subsequent plots must have
a compatible type of data on the same axis for the combination to be acceptable.
Compatibility:
1. If the axis is of type LINEAR, it is compatible with only LINEAR variables.
2. If the axis is of type DISCRETE, then it is compatible with both DISCRETE and LINEAR
variables. Subsequent LINEAR data are treated as DISCRETE.
5.1 Introduction
The SGPLOT and SGPANEL procedures support multiple fit and confidence plots as listed
below. These plots normally also display the raw data as a scatter plot, and also support display
of confidence bands. These plots can be combined with many basic X-Y plots, most commonly
with the scatter plot.
REG LOESS
PBSPLINE ELLIPSE
All plot statements have required and optional parameters necessary to create the visual. These
parameters fall into the following broad categories:
Common Options:
ATTRID =string The attribute map ID used for the group
variable.
CURVELABELATTRS =text-attrs Appearance attributes for the curve label.
CURVELABELLOC =keyword Curve label location - INSIDE | OUTSIDE
CURVELABELPOS =keyword Curve label position - AUTO | MAX | MIN
DATALABEL <=column> Display a scatter label for the Y variable or
from an optional variable.
DATALABELATTRS =text-attrs Appearance attributes for DataLabel.
GROUP =column Classification variable for fit calculations.
LEGENDLABEL =string The label that appears in the legend to
represent this (non-group) plot.
LINEATTRS = line-attrs Appearance attributes for the fit line.
MARKERATTRS = marker-attrs Appearance attributes for the markers.
MAXPOINTS =positive-integer Maximum number of prediction points.
NAME =string Specifies a name for this statement. Other
statements, such as a KEYLEGEND, can
refer to a plot by its name.
TRANSPARENCY =value Specifies the transparency for the visual
elements.
These common options are available for each plot statement, and can be used exactly as
described above. To avoid duplication, we will not list the above common options for the
individual plots that are discussed in this chapter.
Chapter 5 Fit and Confidence Plots 131
Appearance Options:
ALPHA =positive-number Confidence level
CLIATTRS =cli-attrs Appearance attributes for CLI
CLMATTRS =clm-attrs Appearance attributes for CLM
DEGREE =positive-integer The degree of the polynomial
Other Options:
CLI =string Display the confidence limits for individual predicted
values using the string for the legend label.
CLM =string Display confidence limits for mean predicted values
using the string for the legend label.
Boolean Options:
CLI Display confidence limits for individual predicted values.
CLM Display confidence limits for mean predicted values.
NOLEGCLI Exclude the CLI from the legend.
NOLEGCLM Exclude the CLM from the legend.
NOLEGFIT Exclude the fit line from the legend.
Related Style Elements: The GraphDataDefault style element is used to draw the plot elements.
When a GROUP variable is present, or when multiple plot statements are overlaid, the
GraphData1-GraphData12 elements are used, one for each group or plot statement.
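A regression fit with both kinds of confidence limits might be requested as in this sketch. The SASHELP.CLASS sample data set and the legend label text are assumptions.

```sas
proc sgplot data=sashelp.class;
  reg x=height y=weight /
      degree=2              /* quadratic polynomial fit                 */
      alpha=0.05            /* 95% confidence level                     */
      clm='95% CLM'         /* limits for mean predicted values         */
      cli='95% CLI'         /* limits for individual predicted values   */
      nolegfit;             /* leave the fit line out of the legend     */
run;
```

The string forms of CLM= and CLI= both enable the limits and set their legend labels.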
Appearance Options:
ALPHA =positive-number Confidence level to compute.
CLMATTRS =clm-attrs Appearance attributes for CLM.
DEGREE =positive-integer The degree of the polynomial.
INTERPOLATION =keyword The degree of the interpolating polynomials
used for blending local polynomial fits at the
kd tree vertices – LINEAR | CUBIC.
REWEIGHT =positive-integer The number of iterative reweighting steps.
SMOOTH =positive-number Specifies a smoothing parameter value.
Other Options:
CLM =string Display confidence limits for mean predicted values
using the string for the legend label.
Boolean Options:
CLM Display confidence limits for mean predicted values.
NOLEGCLM Exclude the CLM from the legend.
NOLEGFIT Exclude the fit line from the legend.
Related Style Elements: The GraphDataDefault style element is used to draw the plot elements.
When a GROUP variable is present, or when multiple plot statements are overlaid, the
GraphData1-GraphData12 elements are used, one for each group or plot statement.
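A loess fit using several of these options can be sketched as follows. The SASHELP.CARS sample data set and the smoothing value are illustrative assumptions.

```sas
proc sgplot data=sashelp.cars;
  loess x=horsepower y=mpg_city /
        smooth=0.5             /* smoothing parameter (illustrative)    */
        interpolation=cubic    /* blending of the local fits            */
        clm;                   /* confidence limits for the mean        */
run;
```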
A marker is displayed at each (x, y) location. A fit line is computed and displayed. Optionally,
confidence limits for individual predicted values and/or for mean predicted values can be displayed.
Appearance Options:
ALPHA =positive-number Confidence level to compute
CLIATTRS =cli-attrs Appearance attributes for CLI
CLMATTRS =clm-attrs Appearance attributes for CLM
DEGREE =positive-integer The degree of the polynomial
NKNOTS =positive-number Number of evenly-spaced internal knots
Other Options:
CLI =string Display confidence limits for individual predicted
values using the string for the legend label.
CLM =string Display confidence limits for mean predicted values
using the string for the legend label.
CLMTRANSPARENCY =value Transparency for the CLM band.
Boolean Options:
CLI Display confidence limits for individual predicted values.
CLM Display confidence limits for mean predicted values.
NOLEGCLI, NOLEGCLM Exclude the CLI or CLM from the legend.
NOLEGFIT Exclude the fit line from the legend.
Related Style Elements: The GraphDataDefault style element is used to draw the plot elements.
When a GROUP variable is present, or when multiple statements are overlaid, the GraphData1-
GraphData12 elements are used, one for each group or statement.
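The options above can be sketched on a penalized B-spline fit as follows. The statement name (PBSPLINE) is inferred from the NKNOTS option, and the data set and variables are assumptions for illustration, not the book's own example.

```sas
/* Sketch only: SASHELP.CARS and its variables are assumed for illustration */
proc sgplot data=sashelp.cars;
  /* Cubic spline with five internal knots; show both the CLM band and the CLI limits */
  pbspline x=horsepower y=mpg_city / nknots=5 degree=3
           clm cli clmtransparency=0.6;
run;
```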
Appearance Options:
ALPHA =positive-number Confidence level to compute
FILLATTRS =fill-attrs Attributes for the ellipse fill
LINEATTRS =line-attrs Attributes for the ellipse outline
TYPE =keyword Type of the confidence ellipse
PREDICTED | MEAN
Boolean Options:
CLIP Exclude the ellipse data from the axis data range calculation
FILL / NOFILL Enable/Disable display of a fill color in the ellipse
OUTLINE/NOOUTLINE Enable/Disable display of an outline around the ellipse
Related Style Elements: The GraphDataDefault style element is used to draw the plot elements.
When a GROUP variable is present, or when multiple plot statements are overlaid, the
GraphData1-GraphData12 elements are used, one for each group or plot statement.
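A confidence ellipse is normally overlaid on a scatter plot of the same variables. The following is a minimal sketch, assuming the ELLIPSE statement and the SASHELP.CLASS data set (both are assumptions made for illustration):

```sas
/* Sketch only: SASHELP.CLASS is assumed for illustration */
proc sgplot data=sashelp.class;
  scatter x=height y=weight;
  /* 95% prediction ellipse; TYPE=MEAN would draw the ellipse for the mean instead */
  ellipse x=height y=weight / type=predicted alpha=0.05;
run;
```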
6.1 Introduction
The SGPLOT and SGPANEL procedures support multiple distribution plots as listed below.
These plots normally do not display the raw data. These plots can be combined only in specific
ways with other plots in this category or the basic plots.
HISTOGRAM DENSITY
VBOX HBOX
6.2 Histogram
The histogram shows the distribution of a numeric variable. The
syntax is:
Appearance Options:
FILLATTRS =fill-attrs Appearance attributes for bins
Other Options:
BINSTART = value X coordinate of the first bin.
BINWIDTH =value Width of the bin
BOUNDARY =keyword Location of boundary values – UPPER |
LOWER
NBINS =value Number of bins
SCALE =keyword PERCENT | COUNT | PROPORTION for Y axis
scaling
Boolean Options:
FILL | NOFILL Whether or not the bins are filled
OUTLINE | NOOUTLINE Whether or not the bins have outlines
SHOWBINS Tickmark on the X axis is shown at the midpoint of the bin
Related Style Elements: The GraphDataDefault style element is used to draw the plot elements.
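The binning options can be sketched as follows. The data set and the bin values are assumptions chosen for illustration; they are not taken from the book's examples.

```sas
/* Sketch only: SASHELP.CARS and the bin values are assumed for illustration */
proc sgplot data=sashelp.cars;
  /* First bin at 10, bins 5 units wide, counts on the Y axis, ticks at bin midpoints */
  histogram mpg_city / binstart=10 binwidth=5 scale=count showbins;
run;
```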
Chapter 6 Distribution Plots 163
Appearance Options:
LINEATTRS =line-attrs Appearance attributes for line
Other Options:
TYPE =keyword (opts) NORMAL (options) | KERNEL (options)
SCALE =keyword PERCENT | COUNT | PROPORTION for y-axis
scaling
Type=Normal Options:
MU =value Mean value to be used
SIGMA =value Standard deviation to be used
Type=Kernel Options:
C =value Bandwidth
WEIGHT =keyword NORMAL | QUADRATIC | TRIANGULAR
Related Style Elements: The GraphDataDefault style element is used to draw the plot elements.
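As a sketch of how the TYPE= suboptions are written, the following step overlays a normal and a kernel density curve on a histogram. The statement name (DENSITY) is inferred from the option list, and the data set and bandwidth value are assumptions for illustration.

```sas
/* Sketch only: SASHELP.HEART and the bandwidth value are assumed for illustration */
proc sgplot data=sashelp.heart;
  histogram cholesterol;
  /* Normal curve using the sample mean and standard deviation */
  density cholesterol / type=normal;
  /* Kernel estimate with an explicit bandwidth and weight function */
  density cholesterol / type=kernel (c=0.8 weight=quadratic);
run;
```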
A box is displayed for each unique category value. The default display
includes the Q1-Q3 interval, mean, median, whiskers, and outliers.
Appearance Options:
CONNECTATTRS =line-attrs Appearance for connect line
DATALABELATTRS =text-attrs Appearance for data labels
FILLATTRS =fill-attrs Appearance of box interior
LINEATTRS =line-attrs Appearance of box outlines
MEANATTRS OUTLIERATTRS =marker-attrs Appearance of mean and outliers
MEDIANATTRS =line-attrs Appearance of median
WHISKERATTRS =line-attrs Appearance of whiskers
Other Options:
BOXWIDTH =value Width of the box as a fraction of spacing
CAPSHAPE =keyword SERIF | LINE | BRACKET
CONNECT =keyword The statistic to connect – MEAN, MEDIAN, etc.
Boolean Options:
DATALABEL Label the outliers
EXTREME Whiskers are drawn to the extreme values
FILL | NOFILL Specifies whether the boxes are filled or not
LABELFAR Display labels for far outliers
MISSING NOCAPS NOMEAN Various options
NOMEDIAN NOOUTLIERS NOTCHES Various options
Related Style Elements: Same as for Scatter, Series, VBar, and HBar.
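A minimal sketch of a vertical box plot that uses a few of these options is shown here. The VBOX statement name is inferred from the description above, and the SASHELP.HEART data set and its variables are assumptions for illustration.

```sas
/* Sketch only: SASHELP.HEART and its variables are assumed for illustration */
proc sgplot data=sashelp.heart;
  /* One notched box per weight status, with the means connected by a line */
  vbox cholesterol / category=weight_status connect=mean
       notches capshape=bracket;
run;
```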
XAXIS TYPE=linear.
Appearance Options:
CONNECTATTRS =line-attrs Appearance for connect line
DATALABELATTRS =text-attrs Appearance for data labels
FILLATTRS =attrs Appearance of box interior
LINEATTRS =attrs Appearance of box outlines
MEANATTRS OUTLIERATTRS =marker-attrs Appearance of mean and outliers
MEDIANATTRS =line-attrs Appearance of median
WHISKERATTRS =line-attrs Appearance of whiskers
Other Options:
BOXWIDTH =value Width of the box as a fraction of spacing
CAPSHAPE =keyword SERIF | LINE | BRACKET
CONNECT =keyword The statistic to connect – MEAN, MEDIAN, etc.
Boolean Options:
DATALABEL Label the outliers
EXTREME Whiskers are drawn to the extreme values
FILL | NOFILL Specifies whether the boxes are filled or not
LABELFAR Display labels for far outliers
MISSING NOCAPS NOMEAN Various options
NOMEDIAN NOOUTLIERS NOTCHES Various options
7.1 Introduction
The SGPLOT and SGPANEL procedures support the categorization plots listed below. Each of
these plots supports sum, mean, and frequency statistics. In addition, these plots support limit
calculations for mean statistics. The limit statistics include the confidence limit of the mean (CLM),
standard error (STDERR), and standard deviation (STDDEV). The confidence of the CLM statistic
can be adjusted using the ALPHA option, and the number of STDERRs or STDDEVs can be
specified using the NUMSTD option.
These plots cannot be used to draw pre-calculated limits. To draw custom limits on a bar chart,
you should use a VBARPARM or HBARPARM plot (see Figure 4.11.1). For line charts, use a
SERIES plot overlaid with a SCATTER plot (see Figure 4.6.10). For a DOT plot, simply use a
scatter plot (see Figure 4.4.7).
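The pre-calculated case can be sketched as follows: summarize the data first, then hand the summary columns to VBARPARM. The data set, the variable names in the STATS data set, and the choice of PROC MEANS are all assumptions made for illustration.

```sas
/* Sketch only: SASHELP.CLASS and the STATS column names are assumed */
proc means data=sashelp.class noprint nway;
  class sex;
  var height;
  /* Mean and 95% confidence limits of the mean, one row per category */
  output out=stats mean=mean lclm=lower uclm=upper;
run;

proc sgplot data=stats;
  /* The limits are drawn from the data set, not computed by the plot */
  vbarparm category=sex response=mean / limitlower=lower limitupper=upper;
run;
```

Any procedure or DATA step that produces one row per category with lower and upper limit columns would serve equally well.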
All plot statements have required and optional parameters necessary to create the visual. Here is
a list of common options for categorization plots:
Common Options:
ALPHA =value Confidence level of CLM statistic
ATTRID =string The attribute map ID used for the group
variable
CATEGORYORDER =keyword RESPASC | RESPDESC
CLUSTERWIDTH =number Cluster width as a ratio of midpoint spacing
DATALABEL <=column> Display a scatter label for the Y variable or
from an optional variable
DATALABELATTRS =text-attrs Appearance attributes for DataLabel
DISCRETEOFFSET =number Amount to offset all data primitives from the
category midpoints
GROUPORDER =keyword DATA | ASCENDING | DESCENDING
LEGENDLABEL =string The label that appears in the legend to
represent this (non-group) plot
LIMITATTRS =line-attrs Appearance attributes for the limits
LIMITS =keyword BOTH | UPPER | LOWER
LIMITSTAT =keyword CLM | STDDEV | STDERR
NAME =string Specifies a name for this statement. Other
statements, such as a KEYLEGEND, can
refer to a plot by its name.
NUMSTD =positive-integer Number of standard units to compute
STAT =keyword FREQ | MEAN | SUM
TRANSPARENCY =value Specifies the transparency for the visual
elements
URL =string-column Contains URLs for drilldown
WEIGHT =num-column Used to weight observations
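As a sketch of how the statistic and limit options interact, the following bar chart displays group means with limits of two standard errors. The data set and variables are assumptions for illustration.

```sas
/* Sketch only: SASHELP.CARS and its variables are assumed for illustration */
proc sgplot data=sashelp.cars;
  /* Limits apply only to the MEAN statistic; NUMSTD sets the number of standard errors */
  vbar type / response=mpg_city stat=mean
       limitstat=stderr numstd=2 limits=both;
run;
```

With LIMITSTAT=CLM, the ALPHA= option would control the confidence level instead of NUMSTD=.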
Appearance Options:
FILLATTRS =fill-attrs Appearance attributes for Fill
Other Options:
BARWIDTH =value Width of bar as a fraction of tick spacing
DATALABELPOS =keyword DATA | TOP | BOTTOM
DATASKIN =keyword Quasi-3-D effect for the bars (GLOSS, etc.)
GROUPDISPLAY =keyword STACK | CLUSTER
Boolean Options:
FILL | NOFILL Controls bar fill
OUTLINE | NOOUTLINE Controls bar outline
Related Style Elements: The GraphDataDefault style element is used to draw the plot elements.
When a GROUP variable is present, or when multiple plot statements are overlaid, the
GraphData1-GraphData12 elements are used, one for each group or plot statement.
Appearance Options:
FILLATTRS =fill-attrs Appearance attributes for fill
Other Options:
BARWIDTH =value Width of bar as a fraction of tick spacing
DATASKIN =keyword Quasi-3D effect for the bars (GLOSS, etc.)
GROUPDISPLAY =keyword STACK | CLUSTER
Boolean Options:
FILL | NOFILL Controls bar fill
OUTLINE | NOOUTLINE Controls bar outline
Related Style Elements: The GraphDataDefault style element is used to draw the plot elements.
When a GROUP variable is present, or when multiple plot statements are overlaid, the
GraphData1-GraphData12 elements are used, one for each group or plot statement.
Appearance Options:
CURVELABELATTRS =text-attrs Appearance attributes for the line labels
LINEATTRS =line-attrs Appearance attributes for the line
MARKERATTRS =marker-attrs Appearance attributes for the markers
Other Options:
CURVELABEL <=”string”> Label(s) for the line(s)
CURVELABELLOC =keyword INSIDE | OUTSIDE
CURVELABELPOS =keyword AUTO | MIN | MAX | START | END
DATALABELPOS =keyword DATA | TOP | BOTTOM
GROUPDISPLAY =keyword OVERLAY | CLUSTER
Boolean Options:
BREAK Break the line at missing response values
MARKERS Display markers on the data points
X2AXIS Assign the plot to the secondary x-axis
Y2AXIS Assign the plot to the secondary y-axis
Appearance Options:
CURVELABELATTRS =text-attrs Appearance attributes for the line labels
LINEATTRS =line-attrs Appearance attributes for the line
MARKERATTRS =marker-attrs Appearance attributes for the markers
Other Options:
CURVELABEL <=”string”> Label(s) for the line(s)
CURVELABELLOC =keyword INSIDE | OUTSIDE
CURVELABELPOS =keyword AUTO | MIN | MAX | START | END
GROUPDISPLAY =keyword OVERLAY | CLUSTER
Boolean Options:
BREAK Break the line at missing response values
MARKERS Display markers on the data points
X2AXIS Assign the plot to the secondary x-axis
Y2AXIS Assign the plot to the secondary y-axis
Appearance Options:
MARKERATTRS =marker-attrs Appearance attributes for the markers
Other Options:
GROUPDISPLAY =keyword OVERLAY | CLUSTER
Boolean Options:
X2AXIS Assign the plot to the secondary x-axis
Y2AXIS Assign the plot to the secondary y-axis
8.1 Introduction
The SGPLOT procedure supports up to four axes (XAXIS, YAXIS, X2AXIS, and Y2AXIS), while
the SGPANEL procedure supports only two (ROWAXIS and COLAXIS). The axis statements are
contained within the procedure code and can be specified in any order. For the secondary axes
to appear in SGPLOT, a plot must be assigned to them using the X2AXIS or Y2AXIS plot options.
The SGPLOT and SGPANEL procedures support four types of axes: LINEAR, LOG, TIME, and
DISCRETE. The axis type is chosen automatically based on the chart type, the data, and the
formats used; however, the LOG type is never chosen automatically. If you override the default
axis type, be sure you do not specify an axis type that is incompatible with your chart type;
otherwise, your chart will not draw. For example, if you request a vertical bar chart, the x-axis
type is automatically set to DISCRETE. If you attempt to set the x-axis type to any other value,
you will get a note in the log and the chart will not draw.
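Because the LOG type is never chosen automatically, it has to be requested explicitly. A minimal sketch, assuming the SASHELP.CARS data set for illustration:

```sas
/* Sketch only: SASHELP.CARS and its variables are assumed for illustration */
proc sgplot data=sashelp.cars;
  scatter x=horsepower y=msrp;
  /* A scatter plot of numeric data gets LINEAR axes by default; request LOG on Y */
  yaxis type=log logbase=10;
run;
```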
The following axis options can be used regardless of axis type. Axis options that apply to a
specific type will be addressed in the following sections.
The KEYLEGEND statement is used to specify the content and the attributes of custom legends,
as well as control the attributes of automatic legends. This statement works the same way in both
the SGPLOT and SGPANEL procedures, with the exception of the LOCATION option. The
LOCATION option is not available in the SGPANEL procedure because internal legends are not
supported. The SGSCATTER procedure has a LEGEND option that supports a subset of the
options in the following legend table. See the documentation for more details.
If the legend is located inside, and the POSITION option is not specified, the legend will
automatically position itself in an area with the least amount of data collision. This feature is very
useful for batch programs. If the plot data varies significantly from one run to the next, the legend
will move to the best possible position without any program change.
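The following sketch shows two custom legends that refer to plots by name. The plot names ("pts" and "fit") and the data set are hypothetical, chosen only for illustration.

```sas
/* Sketch only: SASHELP.CLASS and the plot names are assumed for illustration */
proc sgplot data=sashelp.class;
  scatter x=height y=weight / group=sex name="pts";
  reg x=height y=weight / nomarkers name="fit";
  /* Inside legend without POSITION=, so it is placed automatically */
  keylegend "pts" / location=inside across=1;
  /* Outside legend for the fit line */
  keylegend "fit" / location=outside position=bottom;
run;
```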
Legend Options:
The INSET statement is used to add additional text or small tables to an SGPLOT graph. As the
statement name suggests, the inset is always drawn within the plot area of the graph. The
information in the inset is specified directly on the statement. See the inset examples for more
details.
If the POSITION option is not specified, the inset will automatically position itself in an area with
the least amount of data collision. This feature is very useful for batch programs. If the plot data
varies significantly from one run to the next, the inset will move to the best possible position
without any program change.
Inset Options:
8.6 Legends
Figure 8.6.1: Automatic Legends
8.7 Insets
Figure 8.7.1: Inset Types
ods escapechar='~';
proc sgplot data=sashelp.class;
reg x=weight y=height / clm;
inset ( "Y~{unicode bar}"="62.34" "R~{sup '2'}"="0.94"
"~{unicode alpha}"=".05" ) / border textattrs=(size=7pt);
run;
Chapter 9
Annotation and Attribute Maps (SAS 9.3)
9.1 Annotation
9.2 Attribute Maps (9.3)
9.1 Annotation
The basic idea for annotation is that you can define a data set that contains the custom drawing
actions you want to perform on a graph (see Figure 9.1.1). The data set has reserved column
names, each providing the specific information needed to create the graphical primitives. The
name of this data set is provided on the procedure statement using the SGANNO option.
Each observation draws one graphical primitive on the graph as defined by the FUNCTION
column. If the function is a POLYGON or a POLYLINE, then this observation, together with
subsequent POLYCONT values, defines the vertices of the figure. The information needed for
each function is provided in the other named columns of the data set. Figure 9.1.2 shows the
program code needed to create the graph shown in Figure 9.1.3.
Figure 9.1.3
When you create the annotation data set using DATA step code, be sure to check the length of
your columns. This annotation definition uses a number of keywords of varying lengths. You
might start with an attribute that is only four characters wide and change it to something eight
characters wide later in the DATA step. The eight-character value will be truncated to four
characters unless you use the LENGTH statement or set a column range in the INPUT statement.
If you see odd results in your graph, be sure to check the log to see if invalid attributes are
specified due to truncation.
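The truncation problem can be sketched as follows. Without the LENGTH statement, FUNCTION would take its length from the first assignment ("text", four characters), and "rectangle" would be truncated to "rect". The coordinates and label text are assumptions for illustration.

```sas
/* Sketch only: coordinates, label text, and SASHELP.CLASS are assumed */
data anno;
  /* Declare lengths up front so later, longer values are not truncated */
  length function $ 9 drawspace $ 12 label $ 30;
  drawspace = "graphpercent";
  function = "text";
  x1 = 50; y1 = 90; label = "Annotated note";
  output;
  function = "rectangle";   /* nine characters: needs the LENGTH above */
  width = 30; height = 8; label = " ";
  output;
run;

proc sgplot data=sashelp.class sganno=anno;
  scatter x=height y=weight;
run;
```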
Many SAS/GRAPH users will be familiar with the annotation concepts described in this chapter,
as they are based on the SAS/GRAPH annotation facility. However, this new facility has been
redesigned to take advantage of the capabilities of the ODS Graphics system, such as
transparency and rich text support. The data set column names and values have been defined in
a way to help make them more memorable and self-documenting. The facility uses various
drawing spaces in non-paneled plots to help simplify the placement of graphics primitives.
For the SAS 9.3 release, the following functions are supported in the FUNCTION column:
• TEXT – Used to draw a text string in the graph.
• TEXTCONT – Used to continue a text string from a previous TEXT or TEXTCONT
function. This function is typically used to perform rich text operations, such as changing color
or font attributes.
• IMAGE – Used to place an image into the graph.
• LINE – Used to draw a line segment in the graph.
• ARROW – Used to draw an arrow in the graph.
• RECTANGLE – Used to draw rectangles or squares in the graph.
• OVAL – Used to draw ovals or circles in the graph.
• POLYGON – Used to draw a closed line figure in the graph. Because this is a closed
figure, the figure may be filled. This operation specifies the starting point of the polygon.
Each subsequent POLYCONT operation is used to draw one line segment from the
previous ending point.
• POLYLINE – Used to draw an open line figure in the graph. This operation specifies the
starting point of the polyline. Each subsequent POLYCONT operation is used to draw
one line segment from the previous ending point.
• POLYCONT – Used to specify the next drawing point for a POLYGON or POLYLINE
figure. All visual attributes for the figure must be specified on the initial POLYGON or
POLYLINE statement.
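The POLYGON and POLYCONT relationship described above can be sketched with a filled triangle. The coordinates, colors, and data set are assumptions for illustration; note that the visual attributes appear only on the initial POLYGON observation.

```sas
/* Sketch only: coordinates, color, and SASHELP.CLASS are assumed */
data anno;
  length function $ 8 drawspace $ 11 fillcolor $ 10;
  drawspace = "wallpercent";
  /* First vertex, carrying the attributes for the whole figure */
  function = "polygon"; x1 = 10; y1 = 10;
  display = "all"; fillcolor = "lightgray";
  output;
  /* Remaining vertices; the figure is closed automatically */
  function = "polycont"; x1 = 30; y1 = 10; output;
  x1 = 20; y1 = 30; output;
run;

proc sgplot data=sashelp.class sganno=anno;
  scatter x=height y=weight;
run;
```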
An important concept to understand when working with annotations is the concept of drawspace.
The drawspace specification is a combination of both an area and a unit. Not all areas are
supported by all SG procedures. The SGPLOT procedure supports all areas, while the SGPANEL
and SGSCATTER procedures support only the graph and layout areas. The graph below shows
where these areas are located in a typical SGPLOT output (Figure 9.1.4).
Figure 9.1.4
The units are either PERCENT or PIXEL, except the DATA area which also supports a VALUE
unit. As an example, if you wanted to draw some annotation in the wall area using percentages,
you could use the WALLPERCENT drawspace. Currently, annotations must be drawn using
absolute coordinates; there is not a relative drawing mode.
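A minimal sketch of the WALLPERCENT drawspace follows: the text is anchored near the upper-right corner of the wall regardless of the data ranges. The label text and the data set are assumptions for illustration.

```sas
/* Sketch only: label text and SASHELP.CLASS are assumed for illustration */
data anno;
  length function $ 8 drawspace $ 11 anchor $ 8 label $ 20;
  function = "text";
  drawspace = "wallpercent";   /* area = wall, unit = percent */
  x1 = 98; y1 = 98;
  anchor = "topright";
  width = 30;
  label = "Preliminary";
  output;
run;

proc sgplot data=sashelp.class sganno=anno;
  scatter x=height y=weight;
run;
```

Setting X1SPACE and Y1SPACE separately, as several examples in this chapter do, allows one coordinate to follow the data while the other stays fixed relative to the wall or graph.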
Annotations do not reserve space in the graph. If you need to reserve space in the graph for an
annotation, there are two techniques you can use. If you need space outside of the plot area, the
SG procedures support a new option called PAD that gives you the ability to add padding to any of the
four edges of the graph. To reserve space within the plot area, use the OFFSETMIN and
OFFSETMAX options on the axis statements.
This example uses a user-defined format to create age ranges for each box. To show these
ranges in the most compact way, you can use a Unicode “≤” in the tick value. To add these
characters, you can turn off the tick values and use annotation to create the tick values. The ODS
ESCAPECHAR functionality can be used to add Unicode, superscripts, or subscripts to your
annotation strings. The data set contains a value for each range so that the values can be
positioned in data space. The label uses the Unicode function to generate the “≤” (‘2264’x). The
new PAD option is used to reserve space for the annotated axis values.
data anno;
retain function 'text' y1space 'graphpercent' x1space 'datavalue' y1 7 width 15;
input x1 label $ 4-33;
cards;
21 20 (*ESC*){unicode '2264'x} 30
31 30 (*ESC*){unicode '2264'x} 40
41 40 (*ESC*){unicode '2264'x} 50
51 50 (*ESC*){unicode '2264'x} 60
61 60 (*ESC*){unicode '2264'x} 70
;
run;
When tick values contain multiple words, it is sometimes more space-efficient to split the value
into multiple lines. There is a technique using annotation that can help you create these multi-line
values. The key is using the WIDTH column to specify a text width. When a text string reaches
the specified width, the system will automatically try to wrap the string. By combining the correct
width with the “center” value on the JUSTIFY column, you can create center-justified, multi-line
tick values.
data anno;
  set sgbook.rainfall (keep=state) end=_last_;
  length x1space $ 11 y1space $ 13 anchor $ 6;
  rename state=xc1;
  retain function "text" x1space "datavalue" y1space "wallpercent" width 15
         justify "center" y1 -0.5 anchor "top";
  label=state;
  output;
  if (_last_) then do;
    x1space = "wallpercent";
    y1space = "graphpercent";
    anchor="bottom";
    x1=50;
    y1=1;
    label="State";
    output;
  end;
run;

title "Maximum Rainfall Caused by a Tropical Cyclone";
proc sgplot data=rainfall sganno=anno
            pad=(bottom=15%) noautolegend;
  xaxis display=(nolabel novalues);
  vbar state / response=rainfall dataskin=pressed
               datalabel;
run;
This graph contains an axis-aligned statistics table of class weight and height. The Y-coordinates
and the table values are extracted from the bar chart’s input data set. After the last value is
processed (_last_ is set), we add the column headers for the table. The PROC SGPLOT call to
make the bar chart uses the new PAD option to provide space for the table.
data anno;
  set sashelp.class (obs=5 keep=name weight height) end=_last_;
  length x1space $ 11 y1space $ 11 anchor $ 8 label $ 6;
  retain function 'text' x1space 'wallpercent' y1space 'datavalue' anchor 'right';
  yc1=name; /* Y-coordinate is the name */
  x1=111; /* percent beyond the edge of the wall */
  label=put(weight, F3.0);
  output;
  x1=122; /* percent beyond the edge of the wall */
  label=put(height, F3.0);
  output;
  if (_last_) then do; /* Add table headers to the end of the data */
    y1space = 'wallpercent';
    width = 20;
    anchor = 'top';
    textweight='bold';
    y1 = 103;
    x1=107.5;
    label="Weight";
    output;
    x1=119.5;
    label="Height";
    output;
  end;
run;

title "Class Statistics";
proc sgplot data=sashelp.class (obs=5) sganno=anno
            pad=(right=40%);
  yaxis display=(nolabel);
  hbar name / response=weight categoryorder=respdesc
              dataskin=pressed nostatlabel;
run;
This example combines the techniques of data alignment and text width in the previous examples
to create a simple forest plot. Notice how the effect labels along the bottom axis are positioned
based on wall position, and the width keeps long labels from colliding. If you know you are going
to have long effect labels, you may want to specify more bottom padding in the SGPLOT
procedure. Also see Figure 12.2 for an alternate way to create a forest plot without annotation.
You can create a small annotate data set to add a company logo to all of the graphs in your
report, assuming you want the logo in the same location. A typical place to put this logo is in one
of the extreme corners of the graph. The key to using the data set is to specify the proper padding
in the SG procedure to prevent collisions with the plot axes or other graph features.
data anno;
retain function "Image" anchor "bottomright" x1 100 y1 0.5
width 20 drawspace "graphpercent" image "logo.png";
run;
In this example, the IMAGESCALE column is used to tile the texture image. The WIDTH and
HEIGHT columns define the size of the area in which to tile the image. The ANCHOR, X1, and Y1
columns determine where the tiling area is positioned. The LAYER column is used to push the
image to the back of the graph, even behind the wall of the plot area.
function anchor x1 y1 width height widthunit heightunit imagescale drawspace transparency layer Image
Image bottomright 100 0.5 640 480 pixel pixel tile graphpercent 0.5 back graywood.jpg
data anno;
retain function "Image" anchor "bottomright" x1 100 y1 0.5 width 640
height 480 widthunit 'pixel' heightunit 'pixel' imagescale 'tile'
drawspace "graphpercent" layer 'back' transparency 0.5
Image "graywood.png";
run;
In this example, an image is used to label each curve instead of a text label. All images are
positioned just outside of the x-axis data area. The Y position of the images comes from the Y
value of the plot points at the year 2000. The ability to position images in data space also opens
up the possibility of using images as plot points. Note that annotations do not reserve space;
therefore, you must provide an axis offset large enough to contain the images within the wall
area.
In addition to the image labels, the company logo and the footnote text are also annotated. The
text is annotated because the text from a FOOTNOTE statement would have been raised up with
the plot when adding padding for the company logo.
data anno;
length x1space $ 13 y1space $ 13 anchor $ 11;
/* Query for the observation at year 2000 */
set meat_consumption (where=(year='01jan2000'd)) end=_last_;
retain anchor 'left' y1space 'datavalue' x1space 'datapercent'
width 40 widthunit 'pixel' function 'image' x1 102;
y1 = chicken;
image = "chicken.jpg";
output; /* Chicken image */
y1 = beef;
image = "cow.jpg";
output; /* Cow image */
y1 = pork;
image = "pig.jpg";
output; /* Pig image */
if (_last_) then do;
/* The company logo */
x1space = "graphpercent";
y1space = "graphpercent";
anchor = "bottomright";
x1 = 99;
y1 = 1;
width=90;
image = "Logo.png";
output;
/* The footnote text */
function = "text";
anchor = "bottomleft";
x1 = 1;
width=150;
textsize = 6;
label = "Source: USDA";
output;
end; run;
In this example, arrows are used to emphasize outlying values in the plot. Notice how common
X2 and Y2 values are used to draw the arrows to the same point. The single label is added after all
of the data is processed.
data anno;
  set sgbook.expensive (keep=msrp mpg_city) end=_last_;
  length x1space $ 11 y1space $ 11;
  retain function "Arrow" x1space "datavalue" y1space "datavalue" x2space "wallpercent"
         y2space "wallpercent" x2 50 y2 75 direction "in" scale 0.5;
  rename msrp=y1 mpg_city=x1;
  mpg_city = mpg_city + 1;
  output;
  if (_last_) then do;
    function = "text";
    x1space = "wallpercent";
    y1space = "wallpercent";
    mpg_city=50;
    msrp=75;
    anchor="left";
    width=50;
    label="These cars are expensive!";
    output;
  end;
run;

title "Is Good Gas Mileage Expensive?";
proc sgplot data=sashelp.cars sganno=anno;
  scatter x=mpg_city y=msrp / group=type;
  loess x=mpg_city y=msrp / nomarkers;
run;
Figure 9.1.13: Making a Bubble Legend
This legend is constructed using four annotation functions: RECTANGLE, OVAL, LINE, and
TEXT. The minimum and maximum size of the annotated bubbles is synchronized with the
bubble plot by using the BRADIUSMIN and BRADIUSMAX options.
data anno;
retain drawspace "wallpercent" widthunit "pixel" heightunit "pixel"
linethickness 1 textsize 8;
length function $ 9;
input function $ x1 y1 width height x2 y2 textsize anchor $ label $ 48-66;
cards;
Rectangle 86 76.5 140 87 .  .    12 bottom
Oval      80 76.5 44  44 .  .    12 bottom
Oval      80 76.5 16  16 .  .    12 bottom
Line      80 87.9 .   .  87 87.9 12 bottom
Line      80 80.5 .   .  87 80.5 12 bottom
Text      86 98   140 .  .  .    12 top        Salary (in dollars)
Text      87 88.1 140 .  .  .    8  left       $32,816
Text      87 80.4 140 .  .  .    8  left       $18,444
;
run;
In this example, a POLYLINE function is used with other functions to create an annotation
showing the result of two companies combining into one. A group variable is used to color the
bars differently from the rest of the companies.
data anno;
length yc1 $ 15;
retain drawspace "datavalue";
function="polyline";
yc1="Chrysler";
x1=2.5;
output;
function="polycont";
x1=3.5;
output;
yc1="Fiat";
output;
x1=3.0;
output;
function="polyline";
yc1="Suzuki";
x1=3.5;
output;
function="polycont";
x1=6.0;
output;
yc1="Fiat + Chrysler";
output;
function="arrow";
x2=5.0;
yc2="Fiat + Chrysler";
output;
function="text";
yc1="Honda";
x1=6.1;
anchor="Left";
width=30;
label="Alliance creates the #6 global automaker by volume";
output;
run;
Required Column(s):
ID Char The ID of the attrmap. This value is referenced from the
ATTRID option on the plot statements.
VALUE Char The data value to be assigned to the attributes. The
keyword _OTHER_ in the column can be used to define
the attributes of any values that are not explicitly defined in
the map. Numeric values should represent their final
formatted form.
Optional Column(s):
MARKERSTYLE Char A style element reference for marker attributes (e.g.
GraphData1, GraphData2, etc.).
MARKERSYMBOL Char Either a literal marker symbol (circle, plus, etc.) or an
attribute style reference (GraphData1:markersymbol).
MARKERCOLOR Char Either a literal marker color (red, cxff0000, etc.) or an
attribute style reference (GraphData1:contrastcolor).
LINESTYLE Char A style element reference for line attributes (e.g.
GraphData1, GraphData2, etc.).
LINEPATTERN Char Either a literal line pattern (solid, dash, etc.) or an attribute
style reference (GraphData1:linepattern).
LINECOLOR Char Either a literal line color (red, cxff0000, etc.) or an attribute
style reference (GraphData1:contrastcolor).
FILLSTYLE Char A style element reference for fill attributes (e.g.
GraphData1, GraphData2, etc.).
FILLCOLOR Char Either a literal fill color (red, cxff0000, etc.) or an attribute
style reference (GraphData1:color).
You can define multiple attribute maps within the same data set; however, make sure that the
data set stays sorted by the ID column. The ID is referenced from the procedure’s PLOT
statement, and the attributes are applied to the GROUP variable. If there is not a GROUP variable on
the PLOT statement, the attribute map is ignored. Also note that values in the attribute map are
case-sensitive.
The optional columns are applied to the plot only if the attributes can be applied to the plot.
Otherwise, the extra columns are ignored. If a group value is not found in the map, the attributes
for that group will come from the ODS style.
In this example, the attribute map is defined to control the appearance of both the plot lines and
the markers for each group value.
data attrmap;
length value $ 9 markersymbol $ 14;
retain id 'my_id' linepattern 'solid';
input value $ markersymbol;
cards;
Microsoft circlefilled
IBM squarefilled
Intel trianglefilled
;
run;
The DATA step produces an attribute map with the following information:
The value “my_id” in the ID column is referenced by the ATTRID option in the following code to
identify which map in the data set to use. The values from the GROUP variable are compared to
the map values in the VALUE column to determine if the defined attributes in the attribute
columns should be applied.
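A minimal sketch of a procedure step that consumes this attribute map is shown here. It assumes the SASHELP.STOCKS data set, whose STOCK variable contains the values IBM, Intel, and Microsoft; the data set choice is an assumption for illustration.

```sas
/* Sketch only: SASHELP.STOCKS is assumed; ATTRMAP is the data set created above */
proc sgplot data=sashelp.stocks dattrmap=attrmap;
  /* ATTRID= selects the map; the map is applied only because GROUP= is present */
  series x=date y=close / group=stock attrid=my_id markers;
run;
```

Because map values are case-sensitive, a group value such as "intel" would not match the "Intel" entry and would fall back to the ODS style.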
Classification panel graphs allow us to examine the relationship between variables for all
crossings of the class variables. Therefore, we can examine effects of a drug over time classified
by the gender of the person in the study.
10.1 Introduction
In addition to the plots supported by the SGPLOT procedure, the SGPANEL procedure supports
a PANELBY statement. This is the key difference between the two procedures. As mentioned in
Section 2.3, the PANELBY statement allows us to specify the LAYOUT type, and the
classification variables. The graph space is subdivided into smaller “cells” based on the number
of levels for each class variable.
Each cell in the graph is populated with a subset of the data that corresponds to the class
variable value(s). The plot statements in the PROC step after the PANELBY statement define the
“prototype” or “rubber stamp” that is used to populate each cell, as shown in Figures 10.1.1 and
10.1.2 below.
panelby sex;
histogram cholesterol;
density cholesterol;
colaxis display=(nolabel);
run;
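For reference, a complete step built around the statements above might look like the following sketch. The PROC statement is not shown in the fragment, so the SASHELP.HEART data set used here is an assumption, as are the PANELBY options.

```sas
/* Sketch only: the data set and PANELBY options are assumed for illustration */
proc sgpanel data=sashelp.heart;
  /* One cell per level of SEX; NOVARNAME shortens the cell headers */
  panelby sex / layout=panel novarname;
  /* These statements are the prototype that is repeated in every cell */
  histogram cholesterol;
  density cholesterol;
  colaxis display=(nolabel);
run;
```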
Note: Each cell essentially contains a graph of the type created by the SGPLOT procedure. So,
all the techniques that we have learned in the previous chapters apply here, with just a few
exceptions.
Options:
CYCLEATTRS boolean Cycle through style elements for each plot
DATA =sas-data-set Optional data set
DESCRIPTION =string Description string
DATTRMAP =sas-data-set Data set defining an attribute map
NOAUTOLEGEND boolean Do not display the automatic legend
NOCYCLEATTRS boolean Do not cycle style elements for each plot
PAD =value Padding around the outside of the graph
SGANNO =sas-data-set Data set containing the annotations
TMPLOUT =string File name for generated graph template code
As seen in the examples in Chapter 3, one or more plot statements can be provided that will work
together to create the graph. See the table of permissible combinations shown in Section 2.4.
The syntax and features of some of these plots are very similar to the same statements in the
SGPLOT procedure, with a few deviations related to positioning of data and curve labels. The
syntax for these plot statements has been discussed in the preceding chapters as listed below, so
we need not discuss them again here.
• Basic Plots : Chapter 4
• Fit and Confidence Plots : Chapter 5
• Distribution Plots : Chapter 6
• Categorization Plots : Chapter 7
The following statements are new to the SGPANEL procedure, or they behave differently than in
the SGPLOT procedure. We will discuss these statements in detail in this chapter:
• PANELBY : Defines the paneling structure of the graph
• KEYLEGEND : Customizes the external legend
• REFLINE : Adds common reference lines
• COLAXIS : Customizes the common external column axis
• ROWAXIS : Customizes the common external row axis
Uniform Data Ranges: For a panel graph, the axis ranges are always uniform as follows:
• All vertical axes in any one row always have a uniform data range.
• All horizontal axes in any one column always have a uniform data range.
• Vertical (or horizontal) data ranges may or may not be uniform between rows (or
columns).
• A Y reference line is drawn at the same Y value in all cells of a row.
• An X reference line is drawn at the same X value in all cells of a column.
The space provided for the graph is subdivided into an even grid of
cells based on the number of crossings for the class variables.
Figure 10.3.1
Four different types of layouts are available. These are panel (default),
lattice, columnlattice and rowlattice as described below.
An example of the lattice layout is shown in Figure 10.3.1, where the first class variable is used as
the column variable and the second as the row variable.
Options:
COLHEADERPOS =keyword TOP | BOTTOM | BOTH
COLUMNS =value Number of columns
LAYOUT =keyword PANEL, LATTICE, ROWLATTICE, etc.
ROWHEADERPOS =keyword LEFT | RIGHT | BOTH
ROWS =value Number of rows
SPACING =value Spacing between cells
START =keyword TOPLEFT | BOTTOMLEFT
UNISCALE =keyword ALL | COLUMN | ROW
Boolean Options:
BORDER | NOBORDER Displays borders
MISSING Includes missing as a separate level
NOVARNAME Displays only the class value in the header
ONEPANEL Prevents paging of the panel
Layout Types – PROC SGPANEL supports four different layouts. Each layout subdivides the
available graph region into a regular grid of cells. The layout of the cells and the location of the
headers are different for each layout type as shown in Figure 10.3.2 below.
Figure 10.3.2: Layout types. Panel (multiple class variables), Lattice (two class variables),
RowLattice (one class variable), ColumnLattice (one class variable).
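As a minimal sketch of the lattice layout, assuming the SASHELP.HEART sample data set is available, the following step uses SEX as the column variable and WEIGHT_STATUS as the row variable. The choice of class variables and plots is ours for illustration and is not taken from the book's figures.

```sas
/* Sketch only: SEX and WEIGHT_STATUS are illustrative class variables. */
proc sgpanel data=sashelp.heart;
  /* Lattice layout: first class variable -> columns, second -> rows */
  panelby sex weight_status / layout=lattice novarname;
  /* The plot statements form the prototype that is stamped into every cell */
  histogram cholesterol;
  density cholesterol;
run;
```

Changing LAYOUT=LATTICE to LAYOUT=PANEL with the same two class variables should place the crossings in a wrapped grid with the headers at the top of each cell.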
Chapter 10 Classification Panels 261
Figure 10.4.10: Grouped Bar Chart
SG procedures with SAS 9.2 do not support adjacent “cluster” grouped bar charts. This graph
type can be simulated using the Layout=ColumnLattice option and other appropriate options as
shown here.
SAS 9.3 does support “cluster” grouped bar charts as discussed earlier.
title 'Vehicle Mileage by Origin ...';
proc sgpanel data=sashelp.cars;
where type <> 'Hybrid';
panelby origin / onepanel novarname
  layout=columnlattice noborder
  colheaderpos=bottom;
vbar type / response=mpg_city
  stat=mean dataskin=gloss group=type;
colaxis grid display=none
  offsetmin=0.2 offsetmax=0.2;
rowaxis grid;
run;
11.1 Introduction
Comparative and matrix graphs primarily consist of scatter plots of multiple response variables or
measures in the data set. These graphs are especially valuable for understanding the raw data
received from the field. Often the analyst would like to get a feel for the data and find
relationships between variables of a data set using “visual” means. This can be a valuable tool in
the pre-analysis phase of the project.
The SGSCATTER procedure is specifically designed for the creation of grids of scatter plots in
the form of comparative or matrix graphs. Now, you certainly can use the SGSCATTER
procedure to create a single-cell scatter plot with fit curves and confidence bands. However, you
will be better off using the SGPLOT procedure to create the single-cell graph as seen in the many
examples shown in previous chapters. The SGSCATTER procedure is better suited for creation
of grids of scatter plots with a few fit and confidence plots.
Options:
DATA =sas-data-set Optional data set
DESCRIPTION =string Description string
DATTRMAP =sas-data-set Data set defining an attribute map
PAD =value Padding around the outside of the graph
SGANNO =sas-data-set Data set containing the annotations
TMPLOUT =string File name for generated graph template code
Plot request is in one of the forms listed below. A regular grid of scatter plots is created based on
the plot request.
• y * x : creates a single scatter plot.
• y1 * x1 y2 * x2 : creates a graph with two scatter plots as defined.
• (y1 y2) * x : creates a graph with two separate scatter plots y1 * x and y2 * x.
• y * (x1 x2) : creates a graph with two separate scatter plots y * x1 and y * x2.
• (y1 y2) * (x1 x2): creates a graph with four separate scatter plots.
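The last form of the plot request can be sketched as follows, assuming the SASHELP.HEART sample data set. The variable choices are illustrative only.

```sas
/* Sketch only: (y1 y2) * (x1 x2) yields four scatter plots in a regular grid. */
proc sgscatter data=sashelp.heart;
  plot (diastolic systolic) * (weight height) / grid;
run;
```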
Options:
ATTRID =variable Associated Attr Map for visual attributes
COLUMNS =value Number of columns in panel
DATALABEL =variable Variable for labeling of scatter points
ELLIPSE <=(options)> Fit a confidence ellipse to the data
GROUP = variable Grouping variable
JOIN <=options> Connect the data points
LEGEND = (options) Controls legend characteristics
LOESS = (options) Fit a Loess curve to the data
MARKERATTRS =marker-attrs Specify marker attributes
PBSPLINE =(options) Fit a penalized B-Spline to the data
REG =(options) Fit a regression line to the data
ROWS =value Number of rows in panel
SPACING =value Spacing between rows and columns
TRANSPARENCY =value Transparency value
UNISCALE =keyword Uniform scaling for X | Y | ALL. Default is NONE
Boolean Options:
GRID Display grid lines
NOLEGEND Suppress display of legend
REFTICK Display duplicate reference tick marks
Chapter 11 Comparative and Matrix Plots 271
Figure 11.3.7: Multiple Plot Request
This graph shows diastolic and systolic blood pressure by age and cholesterol by weight in one
row per the plot request. Regression fit is shown.
The transparency of the markers is increased to make the fit line and bands easier to see.
title "Male Patients Vital Signs Profile";
proc sgscatter data=sashelp.heart;
where sex='Male';
plot (diastolic systolic) * ageatstart cholesterol*weight
  / grid Reg=(cli degree=2)
  markerattrs=(symbol=circlefilled)
  transparency=0.98 rows=1;
run;
Figure 11.3.8: Multiple Plot Request
This graph shows diastolic and systolic blood pressure and “Cholesterol by Age” and “Age at
Death” by “Age at Start”. A penalized B-spline fit is shown.
Marker transparency is set to make the fit line and ellipses easier to see.
title "Male Patients Vital Signs Profile";
proc sgscatter data=sashelp.heart;
where sex='Male';
plot (diastolic systolic cholesterol) * ageatstart ageatdeath * ageatstart
  / grid pbspline=(clm degree=1)
  markerattrs=(symbol=circlefilled)
  transparency=0.98 rows=1 ellipse;
run;
The COMPARE statement creates regular grids of scatter plots. Fit lines and ellipses can be
included. The graphs always have common row and column axes. By default, individual columns
have a uniform data range for the x-axis, and individual rows have a uniform data range for the
y-axis.
Options:
ATTRID =variable Associated Attr Map for visual attributes
DATALABEL =variable Variable for labeling of scatter points
ELLIPSE <=(options)> Fit a confidence ellipse to the data
GROUP = variable Grouping variable
JOIN <=options> Connect the data points
LEGEND = (options) Controls legend characteristics
LOESS = (options) Fit a Loess curve to the data
MARKERATTRS =attrs Specify marker attributes
PBSPLINE =(options) Fit a penalized B-spline to the data
REG =(options) Fit a regression line to the data
SPACING =value Spacing between rows and columns
TRANSPARENCY =value Transparency value
Boolean Options:
GRID Display grid lines
NOLEGEND Suppress display of legend
REFTICK Display duplicate reference tick marks
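A minimal sketch of a COMPARE request, assuming the SASHELP.HEART sample data set, is shown here. The variables and the choice of fit options are illustrative, not the book's example.

```sas
/* Sketch only: two X variables by two Y variables with shared row and column axes. */
proc sgscatter data=sashelp.heart;
  compare x=(ageatstart weight)
          y=(diastolic systolic)
          / grid reg ellipse;
run;
```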
The MATRIX statement creates regular grids of scatter plots of pair-wise combinations of all the
variables in the list. Diagonals are used to label the variable name. Diagonals can alternatively
be used to display the distribution of each individual variable itself:
Options:
ATTRID =variable Associated Attr Map for visual attributes
DATALABEL =variable Variable for labeling of scatter points
ELLIPSE <=(options)> Fit a confidence ellipse to the data
GROUP =variable Grouping variable
JOIN <=options> Connect the data points
LEGEND =(options) Controls legend characteristics
MARKERATTRS =attrs Specify marker attributes
START =keyword Specify the starting position
– TOPLEFT | BOTTOMLEFT
TRANSPARENCY =value Specify the transparency for the markers
Boolean Options:
NOLEGEND Suppress display of legend
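A minimal sketch of a MATRIX request, assuming the SASHELP.HEART sample data set, follows. It uses the DIAGONAL= option, which does not appear in the option list above, to place a distribution plot in each diagonal cell. The variable list is illustrative.

```sas
/* Sketch only: pair-wise scatter plots with distributions on the diagonal. */
proc sgscatter data=sashelp.heart;
  matrix systolic diastolic cholesterol
         / diagonal=(histogram normal) ellipse;
run;
```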
Evaluation of such data can be significantly enhanced by the use of graphical displays of the data
in the form of dot plots, box plots, and lattice and matrix displays. Graphical display of data
provides insights into trends and correlations that are just not possible through tabular data.
Often, combining the data and summary statistics on a single display allows for improved analysis
and interpretation of the results.
The examples include displays of lab results over time, distribution of tests by treatment, lattice
and matrix displays of liver function tests (LFT), patient profiles, adverse event plots sorted by
relative risk, hazard function plots, and displays for evaluation of blood chemistry and
hematology.
SG procedures are very well suited for the creation of such displays using the techniques
described in the previous chapters of this book. In this chapter, we will examine how you can
create some common graphs used in this industry. Some examples are shown in Figure 12.1.
Detailed code for creating such graphs is shown in the following pages.
Figure 12.1
Study           OddsRatio  LowerCL  UpperCL  Weight  Q1    Q3    ObsId  study2   lcl2   ucl2   OR  LCL  UCL  WT
Modano (1967)   0.590      0.096    3.634    5%      0.56  0.62  1               0.096  3.634  OR  LCL  UCL  Weight
Borodan (1981)  0.464      0.201    1.074    18%     0.38  0.55  2               0.201  1.074  OR  LCL  UCL  Weight
                0.328      0.233    0.462    .       .     .     16     Overall  .      .      OR  LCL  UCL  Weight
aestdate  aeendate  aeseq  aedecod    aesev     y   aestdy  aeendy  stday  enday  lcap  hcap   xs
06MAR13   06MAR13   .                 MILD      -9  0       0       0      0                   0
06MAR13   06MAR13   .                 MODERATE  -9  0       0       0      0                   0
06MAR13   06MAR13   .                 SEVERE    -9  0       0       0      0                   0
06MAR13   06MAR13   1      DIZZINESS  MODERATE  1   0       0       0      0                   0
20MAR13   .         2      COUGH      MILD      2   14      .       14     104          ARROW  0
12.10 QTc Change Graph with Annotated “At Risk” Values (9.3)
Here we have added the “At Risk” table to the graph shown in Figure 12.9 by using annotation.
• An annotation data set is created to draw the at-risk values for “Drug A” and “Drug B”.
• The X coordinates are in DataValue space so that they line up with the X axis values.
• The Y coordinates are in WallPixel space, 62 and 74 pixels below the y-axis.
• Annotations are also created for the labels and the “At Risk” label.
• Space is created for the annotation by adding a bottom pad of 80 pixels.
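The steps in the list above can be sketched roughly as follows. This is not the book's code: the data set names QTC and ATRISK_ANNO, the variable values, and the at-risk counts are all hypothetical, and only one row of at-risk values is drawn.

```sas
/* Hypothetical plot data: mean QTc change by week for one treatment. */
data qtc;
  input week mean;
  datalines;
0 0.0
4 1.5
8 2.1
12 1.8
;
run;

/* Annotation data set: one text label per at-risk value (made-up counts). */
data atrisk_anno;
  length function $8 x1space y1space $12 anchor $8 label $12;
  function = 'text';
  x1space  = 'datavalue';   /* X in data units, so labels align with axis values */
  y1space  = 'wallpixel';   /* Y in pixels relative to the plot wall */
  anchor   = 'center';
  y1       = -62;           /* 62 pixels below the bottom of the wall */
  input x1 label $;
  datalines;
0 100
4 96
8 91
12 85
;
run;

/* Bottom padding of 80 pixels leaves room for the annotated values. */
proc sgplot data=qtc sganno=atrisk_anno pad=(bottom=80px);
  series x=week y=mean / markers;
  xaxis values=(0 4 8 12);
run;
```

A second treatment would add another set of observations with Y1 set to -74, along with text annotations for the row labels and the “At Risk” heading.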
footnote j=l 'Most Frequent On Therapy Adverse Events Sorted By Relative Risk';
proc sgplot data=sgbook.MostFrequentAE;
refline refae / lineattrs=(thickness=12) transparency=0.8;
scatter y=ae x=a / name='a' legendlabel='Drug A (N=216)';
scatter y=ae x=b / name='b' legendlabel='Drug B (N=431)';
scatter y=ae x=mean / xerrorlower=low xerrorupper=high x2axis;
refline 40 / axis=x;
keylegend 'a' 'b';
xaxis offsetmax=0.5 grid labelattrs=(size=8) valueattrs=(size=7)
label='Percent ';
x2axis offsetmin=0.5 type=log logbase=2 logstyle=logexpand grid max=64
labelattrs=(size=8) valueattrs=(size=7)
label=' Relative Risk with 95% CL';
yaxis display=(nolabel noticks);
run;
Chapter 12 Health and Life Sciences Graphs 301
PARAM              PERCENT  time       chartvar  lcl   ucl   n   Nlbl  Lcllbl  UclLbl  PctLbl
EYES ITCHY/GRITTY  40.0     Week 1     Placebo   25.7  54.3  45  N     LCL     UCL     %
EYES ITCHY/GRITTY  60.9     Week 1     Drug A    49.4  72.4  69  N     LCL     UCL     %
EYES ITCHY/GRITTY  52.1     Week 1     Drug B    40.6  63.5  73  N     LCL     UCL     %
EYES ITCHY/GRITTY  62.2     End Point  Placebo   48.1  76.4  45  N     LCL     UCL     %
EYES ITCHY/GRITTY  69.6     End Point  Drug A    58.7  80.4  69  N     LCL     UCL     %
EYES ITCHY/GRITTY  57.5     End Point  Drug B    46.2  68.9  73  N     LCL     UCL     %
• X and Y axes are restricted to the lower 75% of the space (OffsetMax=0.25).
• X2 and Y2 axes are restricted to the upper 15% of the axis space (OffsetMin=0.85).
• Each graph is drawn using the appropriate combinations of X, Y, X2, and Y2 axes.
• Reference lines are used to demarcate the cells.
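The axis-splitting idea in the list above can be sketched with a single pair of vertical axes, assuming the SASHELP.STOCKS sample data set. The choice of plots and variables is illustrative; the offsets follow the values quoted in the list.

```sas
/* Sketch only: two plots share one cell by splitting the vertical space. */
proc sgplot data=sashelp.stocks;
  where stock = 'IBM';
  series x=date y=close;              /* uses the Y axis: lower 75% of the space */
  needle x=date y=volume / y2axis;    /* uses the Y2 axis: upper 15% of the space */
  yaxis  offsetmax=0.25;              /* reserve the top 25% of the Y axis */
  y2axis offsetmin=0.85;              /* reserve the bottom 85% of the Y2 axis */
run;
```

The gap between the two regions is where a reference line could be drawn to demarcate the cells.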
The graphs shown in this book have been customized for gray-scale rendering. However, color
graphs are particularly useful for business graphics, and the visual appearance of these graphs
can be further enhanced by the use of color styles.
The techniques described in the earlier chapters of this book can be leveraged in creative ways to
create such graphs. Here we have used the axis-splitting technique to display multiple graphs in
one cell, like the graph for bond yields. We have used combinations of HighLow and VBar to do
the “Product Sales and targets” graph, and we used a bubble plot to do the “Social Graph”.
In this chapter, we will examine how you can create some common graphs used in this industry.
Some examples are shown in Figure 13.1. Detailed code for creating such graphs is shown in
the following pages.
• The Y-Y2 split technique is used to draw price data above the volume data.
• Price data is an overlay of the Bollinger band, the 25- and 50-event moving averages, and
the price.
• Volume data includes volume and moving average.
• The key legend is placed inside the plot area, in the empty upper right corner, with all the
details.
Stock Date Open High Low Close Volume avg25 avg50 upper lower vavg
IBM 01AUG88 $126.00 $126.87 $110.37 $112 5,256,886 $23 . $29 $16 7560307.56
IBM 01SEP88 $111.12 $116.50 $109.50 $115 5,433,352 $22 . $29 $16 7552677.08
IBM 03OCT88 $115.12 $124.87 $112.75 $123 5,790,742 $22 . $29 $16 7462149.84
14.1 Introduction
Graphs created by the SG procedures derive their visual look and feel from the active style for the
output destination. Each ODS destination has a default associated style. Styles supplied by SAS
are designed to ensure effectiveness and an aesthetically pleasing appearance by default.
Default styles for common destinations are shown in Figure 14.1.
You can create your own style to suit the look you prefer or to present a consistent corporate
look by using the TEMPLATE procedure. The description of PROC TEMPLATE is beyond the scope of
this book. However, since styles are intimately related to the topic of graphs, we will cover the
relevant topics here.
ODS Destination   Style Name
LISTING           Listing
HTML              HtmlBlue
RTF               RTF
PDF               PRINTER
Figure 14.1
For a discussion of styles and the TEMPLATE procedure, see the SAS documentation.
When you do this, all subsequent output sent to the destination will use this assigned style until
the destination is closed or the style is changed.
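The sentence above appears to refer to assigning a style on the ODS destination statement; that reading is an assumption, since the preceding text is missing from this excerpt. Under that assumption, a minimal sketch using the SAS-supplied JOURNAL style looks like this:

```sas
/* Assumption: the style is assigned with STYLE= on the destination statement. */
ods listing style=journal;

proc sgplot data=sashelp.class;
  scatter x=height y=weight;   /* rendered with the JOURNAL style */
run;

ods listing style=listing;     /* switch back to the destination default */
```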
The default visual attributes of various graph elements are also derived from the style. Plot
colors, marker symbols, line thickness, axis label fonts, etc. are derived from specific named
elements of the style. The association between the element of the graph and the style element
is well defined and described in detail in the product documentation. Some common style
element names and their associated graph elements are shown in Figures 14.2 and 14.3.
Figure 14.2: Graph and Text Elements
(Style element labels recovered from the figure: GraphWalls, GraphValueText,
GraphFootnoteText, Graph.)
Figure 14.2 above lists some style elements that are commonly used for various elements in
the graph. These elements are used to determine the visual attributes for the graph
background, walls, and the various text statements such as titles, footnotes, etc. The
elements are listed below. This graph was created with SAS 9.3 PROC SGPLOT.
If you want to create a new style that has smaller fonts for the tick values, you can customize
the GraphValueText element appropriately. Then, all the graphs using this style will display
the new font you have selected.
A new style can be created using the TEMPLATE procedure. It is generally good practice to
derive a new style from a parent style. In this way, you only need to customize the elements
that you need to change. For details on how to create a new style, see the SAS
documentation for the TEMPLATE procedure.
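As a hedged sketch of deriving a style from a parent, the following defines a style that changes only the tick value font size. The style name Styles.SmallTicks and the 7pt size are hypothetical choices for illustration.

```sas
/* Sketch only: Styles.SmallTicks is a hypothetical style name. */
proc template;
  define style Styles.SmallTicks;
    parent = Styles.Listing;                  /* inherit all other elements */
    class GraphValueText / fontsize = 7pt;    /* smaller tick value text */
  end;
run;

ods listing style=Styles.SmallTicks;

proc sgplot data=sashelp.class;
  vbar age;
run;
```

Because the style derives from a parent, only the one element that differs needs to be written out.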
Chapter 14 Styles 323
Figure 14.3 (style element labels recovered from the figure: GraphDataDefault, GraphReference)
Figure 14.3 above shows some style elements that are commonly used for various plot
elements in the graph. These elements are used to determine the visual attributes for the
parts of the bar chart or series plot.
The GraphDataDefault element is used to display plots that do not have a GROUP option. In
Figure 14.3, the bar chart does not have a GROUP option set, so all the bars for this plot use
the GraphDataDefault style element.
Plot statements that use a GROUP option utilize the visual attributes from GraphData1 to
GraphData12 style elements. Most styles supplied by SAS have 12 data elements.
However, there is no limit, and you can define more or fewer elements as you please. For a
full list, see the SAS documentation.
For example, the default font size for the x-axis tick values may be just a little too big to fit the
space available. Shrinking the font just a bit will do just fine. One way to do this would be to
change the size of the currently used font:
xaxis valueattrs=(size=7);
You will likely have seen such syntax in many places in the examples. This works fine, but in
other cases you want to set the visual properties of one plot to the same as another item in the
graph. You can use the Data Label font for the Axis Tick Values as follows:
xaxis valueattrs=GraphDataText;
This technique is especially useful when using color output. If you have two separate measures
plotted on the same graph, one to the y-axis and one to the Y2 axis, it can be useful to set the
color of the axis tick values to the same color as the plot line. You can do that as follows:
yaxis valueattrs=GraphData1;
y2axis valueattrs=GraphData2;
For the reason above, and to ensure your graph works well for all use cases, it is recommended
you always use “style relative” settings. If you do not like a color used in the graph, try one of the
other colors defined in the style. Hard-coded colors are best used for one-off graphs that are not
likely to be used with different style settings (see Figure 13.6).
Chapter 15
ODS Destination and ODS Graphics Options
15.1 Introduction
15.2 ODS Destination Options
15.3 ODS Graphics Options
Global options that apply to the graphs are specified on the ODS GRAPHICS statement. In
Chapter 2, we discussed the use of the ODS GRAPHICS statement to switch on the creation of
automatic graphs from the SAS procedures. With SAS 9.3, ODS Graphics is ON by default for all
procedures, including the SG procedures. Global options that control certain aspects of the
graphs can be provided on this statement. These options are discussed below.
SGE option: This option is used to enable or disable the creation of editable graphical output in
the format used by the ODS Graphics Editor. For SAS 9.2, only the LISTING destination
supports this option. For SAS 9.3 this option is available on all ODS destinations.
By default, the SGE option is OFF. When SGE=ON, an editable graph is created along with the
image file. This output is listed in the Results window. You can open this file to launch the ODS
Graphics Editor to make non-persistent customizations to the ODS Graphics output.
DPI and Image_DPI: Graphs are rendered at default DPI for the output destination. For some
destinations, the DPI applies only to the graph image. The size of the graph is based on the
active DPI. If the size is specified in pixels (the default is in pixels), then the baseline for scaling
is the active DPI for the destination.
For complete details on all options for each ODS destination see the SAS documentation for ODS
Language Statements.
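A minimal sketch of the two destination options discussed above, using the LISTING destination, is shown here. The DPI value of 200 is an arbitrary illustration.

```sas
/* Sketch only: request an editable SGE file and a 200 DPI image. */
ods listing sge=on image_dpi=200;

proc sgplot data=sashelp.class;
  scatter x=height y=weight;
run;

ods listing sge=off;   /* stop creating editable output */
```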
Anti-aliasing: This is a technique used to improve the rendering of various elements in the
graph such as the markers, lines, text, and so on. Anti-aliasing improves the visual appearance
of the elements but consumes more computing resources. Text elements in the graph are always
anti-aliased for high quality. Anti-aliasing of the line and marker elements can be controlled by
these options. When there are too many observations in the data, anti-aliasing becomes less
effective and more costly. By default, anti-aliasing is turned off for markers and lines when the
number of observations exceeds 600. The ANTIALIAS option can be used to enable or disable
anti-aliasing. The ANTIALIASMAX option can be used to change the threshold at which
anti-aliasing is turned off.
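As a sketch, the threshold can be raised for a larger data set such as SASHELP.HEART. The value 10000 is an arbitrary illustration.

```sas
/* Sketch only: keep anti-aliasing on for a plot with several thousand markers. */
ods graphics / antialias=on antialiasmax=10000;

proc sgplot data=sashelp.heart;
  scatter x=weight y=systolic;
run;

ods graphics / reset;   /* restore the default settings */
```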
down or even run out of memory. This can happen if the user inadvertently selects a variable that
has a large number of discrete values for a discrete role, such as zip code. In normal usage,
variables used for such roles have fewer than 1000 unique values. In this case, the procedure will
stop processing and log a message. If you really need to run such a use case, you can set these
options to a larger value.
OUTPUTFMT: The default output format is determined by the ODS destination. For many
destinations like LISTING, HTML, etc., the graph is created in industry-standard image formats.
You can specify the format you want by using this option. The value you provide will only be
honored if the output destination supports the format.
IMAGEMAP: For the HTML destination, tool tips can be displayed for the data values in the
graph. By default, this setting is OFF since it consumes computing resources. You can enable
creation of the image map for display of tool tips for HTML destinations by using this option.
IMAGENAME: The default name for the graph output file is “sgplot.xxx” or “sgpanel.xxx” based
on the procedure used. File names are automatically appended with an incremented counter to
avoid overwriting of the files from multiple procedure executions. You can provide your own
preferred name for the filename prefix.
LABELMAX: You can request the display of data labels with a scatter plot. The
procedures will attempt to place the data labels close to the markers, with minimal collision
between labels. At some point, there are too many labels in the graph, and the graph becomes
ineffective. By default, the display of labels is disabled when the number of labels exceeds 200.
You can use this option to change the limit.
MAXLEGENDAREA: By default, the legends for the graph are dropped if the area occupied by
them exceeds 20% of the total area of the graph. This is to prevent cases where the legend is so
big that there is not much space left for the display of the data itself. You can control this limit by
using this option.
PANELCELLMAX: The SGPANEL procedure subdivides the graph into cells based on the
number of crossings of the classification variables. By default, if the number of cells exceeds
10,000, the graph is not created and a message is logged. You can control this limit by using this
option.
RESET: You can reset all the options to the default settings by using this option. You can also
use reset=option-name to reset one or more specific options. You can use reset=index to
reset just the counter that is used to postfix an index to the output file name.
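Several of the options described above can be combined on one ODS GRAPHICS statement. The following is a sketch only; the file name prefix 'mileage' and the TIPMAX value are hypothetical.

```sas
/* Sketch only: name the output file, pick the format, and enable tool tips. */
ods graphics / reset imagename='mileage' outputfmt=png
               imagemap=on tipmax=1000;

proc sgplot data=sashelp.cars;
  vbar type / response=mpg_city stat=mean;
run;

ods graphics / reset=index;   /* reset only the file name counter */
ods graphics / reset;         /* restore all default settings */
```

IMAGEMAP=ON has a visible effect only when the output goes to the HTML destination.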
SCALE: The output graph size is determined by the procedure. See the WIDTH or HEIGHT
option. If you request a different height of the graph, the internal details of the graph are scaled
using a nonlinear algorithm designed to render graphs with reasonable font sizes. In such a
case, the actual font size may be different from what was specified for the text in the graph. You
can enable or disable this scaling for height by using this option.
TIPMAX: Tooltips can be displayed for the data values for the HTML destination. By default, if
the number of observations exceeds 500, tooltips are disabled. You can control this limit by using
this option.
WIDTH and HEIGHT: The default width and height for the graph is determined by the procedure,
and is often 640px x 480px. In case of paneled graphs, the size may vary based on the number
of cells. You can control the size of the graph using these options.
Chapter 16
Tips for Graph Output
For the LISTING destination, the default graph size is 640 pixels x 480 pixels at 96 DPI. These
default settings work well for viewing the graphs on the computer screen. For inclusion of such
graphs in various documents, some customization is useful.
For the graph for Figure 16.2, we need to fit the graph inside a 2.875-inch-wide box. So, we have
created the graph of the correct size by using the ODS Graphics WIDTH option. When this graph
is inserted into the document, no image scaling is required.
The graph in Figure 16.2 has fonts that are easy to read. The lines are also thicker. If the
graph had markers, they would be bigger. Here the graph is doing the scaling, and it uses a
font-friendly way to scale these items, resulting in a more readable graph.
Design and Render Size: Each graph has a default “design” size. For most cases, this is 640px
x 480px. This size cannot be changed directly in SG procedures, but it can be changed if you are
using GTL. A different “render” size can be specified by using the WIDTH and HEIGHT options
on the ODS GRAPHICS statement.
When the render size is different from the design size, the graph is scaled to fit the new size. All
individual elements inside the graph are scaled in a sub-linear fashion. The scale is:
Scale-Factor = (Render-Height / Design-Height) ** 0.25
This is done specifically to slow the shrinking or growth of the font sizes. Therefore, as the graph
height shrinks, the font size shrinks too, but at a slower rate. Similarly, as the graph height
grows, the font size grows too, but at a slower rate.
For our use case shown above, Design-Height = 480px. The Render-Width is 2.875in. If we retain
the aspect ratio, the render height is 2.875 * 0.75 = 2.16in, which at 96 DPI results in a graph
height of 207 pixels. The linear scale for the image size is 207 / 480 = 0.43. However, the
Scale-Factor shown above is 0.43 ** 0.25 = 0.81.
In Figure 16.1, the scale factor for the image and the fonts is 0.43. Therefore, a 10-point font
effectively shows as a 4-point font, which is quite small. For the graph in Figure 16.2, the scale
factor for image is 0.43, but the scale factor for the fonts is 0.81. So, a 10-point font is drawn at 8
point, which is still quite readable.
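The arithmetic above can be checked with a short DATA step. This is a sketch that assumes the 4:3 aspect ratio and 96 DPI used in the text; the variable names are ours.

```sas
/* Sketch only: reproduce the scale-factor arithmetic from the text. */
data _null_;
  design_height   = 480;      /* design height in pixels */
  render_width_in = 2.875;    /* requested width in inches */
  dpi             = 96;

  render_height = render_width_in * 0.75 * dpi;   /* about 207 pixels */
  linear_scale  = render_height / design_height;  /* about 0.43 */
  scale_factor  = linear_scale ** 0.25;           /* about 0.81 */

  font_linear = 10 * linear_scale;   /* 10-point font under linear image scaling */
  font_scaled = 10 * scale_factor;   /* 10-point font under sub-linear scaling */

  put render_height= linear_scale= scale_factor= font_linear= font_scaled=;
run;
```

The values written to the log should round to the figures quoted in the text: roughly a 4-point font under linear scaling and roughly an 8-point font under the sub-linear scale factor.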
Recommendation for Creating Small Graphs: Always render your graph at a size close to its
final usage. Using the default size graph in a document will be fine if the graph fills a width of 6 to
7 inches. But if the graph is to fit a smaller size, do not use the default size graph.
Caveat: This does mean that the final rendered font sizes may differ from those specified in the
code or the style. If it is important that the font size be exactly as specified, then you can use the
NOSCALE option on the ODS GRAPHICS statement. Even if the graph render height is different
from the default design height, the fonts, lines, and markers will not scale.
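A minimal sketch of both approaches, rendering at the final width with and without scaling, is shown here. The data set and plot are illustrative.

```sas
/* Sketch only: render at the final size so that no image scaling is needed. */
ods graphics / reset width=2.875in;

proc sgplot data=sashelp.class;
  series x=age y=height;      /* fonts and lines are scaled sub-linearly */
run;

/* Same size, but keep fonts, lines, and markers exactly as specified. */
ods graphics / width=2.875in noscale;

proc sgplot data=sashelp.class;
  series x=age y=height;
run;

ods graphics / reset;
```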
If you want to create graph output that can be embedded in another document or
presentation, it is convenient to create a separate file using the LISTING destination. With
SAS 9.2, all graph output is in image format. Even for PDF and RTF output, the graph is
essentially an image embedded into the document.
With SAS 9.3, support for vector graphics output has been added. You can now generate
EMF, PDF, PS, SVG, and PCL output, depending on the ODS destination you use. In some
special cases, the graphs use data skins or have gradient legends. In such cases, the output
is in image formats. This is also the case when using transparency for EMF and PS.
Graph size is controlled by setting the WIDTH and HEIGHT options on the ODS GRAPHICS
statement and DPI, as seen in section 16.2. When creating smaller graphs for embedding
into documents, it is useful to change the graph size. It is also useful to increase the graph
size when creating a graph that is more detailed in nature and needs more space for
rendering all the details.
Vector format output is inherently scalable and ideal for use in some cases. So you can use
the default output in EMF or PDF formats for such use cases. You can enlarge the graph on
the computer screen to fill the window, and these vector-based graphs will scale up nicely.
Image-based output created at a low (or default) resolution or DPI does not scale up very
well. When scaled, you will see “jaggies” along the edges of the markers, lines, and fonts. To
include such graphs in a presentation using a projection system to a large audience, you
should increase the DPI of the graph. This will make the graph larger and also make all the
elements of the graph (for example, markers, line thickness, and fonts) linearly bigger.
Therefore, this graph, when viewed from afar, will look readable.
Creating a graph with a size of 6.4in x 4.8in at 200 DPI will generate an image that is 1280px
x 960px. The PNG file will include the DPI information in the file; therefore, when you include
this graph in a document, it will occupy a region 6.4in x 4.8in by default. However, if you view
this file in some image viewers or Internet Explorer, you may see the image in the full 1280 x
960 size because these viewers do not honor the DPI setting in the PNG file.
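The sizing described above can be sketched as follows for the LISTING destination. The file name prefix 'hires' is hypothetical.

```sas
/* Sketch only: 6.4in x 4.8in at 200 DPI gives a 1280px x 960px image. */
ods listing image_dpi=200;
ods graphics / reset width=6.4in height=4.8in imagename='hires';

proc sgplot data=sashelp.class;
  scatter x=height y=weight;
run;

ods graphics / reset;
```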
Recommendations for Graphs in Web Pages: Some Web browsers do not honor the DPI
setting in the PNG file. A graph intended to be 6in x 4in (200 DPI) may be shown as 12in x
8in in the browser. Other browsers do scale, but poorly. So, it is best to use the ODS HTML
destination, or create graphs at 96 DPI for usage in Web pages.
In this book, we have used a size of 3in x 2in at 300 DPI for most of the graphs. Specifying the
size shrinks the fonts a little. Using high DPI gives us graphs that will scale well when printed on
a high-resolution printer. For graphs that occupy a larger area, like those in Chapters 12 and 13,
we have used a larger size of 6in x 3in at 200 DPI.
• More system memory is needed to create the graph. The rendering system is Java based.
SAS system settings for Java specify the amount of memory that can be used by Java.
Using higher size and DPI requires a higher amount of memory. If the graph cannot be
rendered, a warning is logged. If you need the higher resolution graph, you can change the
memory settings using the JREOPTIONS option specified in the SAS system options. See
the SAS documentation for details.
-jreoptions '(-Xmx512m)'
• If sufficient memory resources are available, the graph will be rendered. However, it will
likely consume more system and CPU time to render the graph.
• If the graph is successfully rendered, the output file size will be bigger.
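Before changing the Java memory settings mentioned in the first item above, it can help to see what the current session is using. The following is a minimal sketch: PROC OPTIONS writes the active JREOPTIONS value to the SAS log. The invocation shown in the comment is an assumed command-line form that repeats the -Xmx512m example, not a recommended value.

```sas
/* Sketch: print the current JREOPTIONS setting to the SAS log. */
proc options option=jreoptions;
run;

/* JREOPTIONS is normally supplied when SAS starts (configuration file */
/* or command line). Assumed command-line form:                        */
/*   sas -jreoptions '(-Xmx512m)'                                      */
```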
Index
A
accuracy of magnitude perception 6
ACROSS= option, KEYLEGEND statement (SGPLOT) 231
adverse event timelines 291–292, 300
ALPHA= option
    CLM option 140
    DOT statement (SGPLOT) 215
    ELLIPSE statement (SGPLOT) 153, 155
    HBAR statement (SGPLOT) 194
    HLINE statement (SGPLOT) 208
    LOESS statement (SGPLOT) 140
    PBSPLINE statement (SGPLOT) 146–147
    REG statement (SGPLOT) 132
    VBAR statement (SGPLOT) 186
    VLINE statement (SGPLOT) 201
annotations
    about 235–237
    adding company logos 243
    background images in graphs 244
    bubble legends 248
    creating axis-aligned statistics table 240
    creating axis tick values with Unicode 238
    creating forest plots with 241–242
    creating multi-axis tick values 239
    images as curve labels 245–246
    making polyline figures 249–250
    QTc change graph with annotated at risk values 296
    using arrows to emphasize data 247
ANTIALIAS option, ODS GRAPHICS statement 328
ANTIALIASMAX= option, ODS GRAPHICS statement 44, 79, 328
ARROW function 236
ARROWHEADSHAPE= option, NEEDLE statement (SGPLOT) 99
arrows, emphasizing data with 247
attribute maps 235, 251–253
automatic graphs from procedures 10–11
automatic paging of panels 28
axes
    See also discrete axis
    See also linear axis
    See also log axis
    See also time axis
    about 20, 221
    axis assignment examples 134, 141, 148
    creating axis-aligned statistics table 240
    creating multi-line tick values 239
    creating tick values with Unicode 238
    options supported 221
    SGPLOT procedure 22
AXIS= option, REFLINE statement (SGPLOT) 114

B
background images in graphs 244
band plots
    about 89
    graph examples 61
    grouped 91, 124
    grouped with transparency 91
    overlay 92, 124
    overlay with constant lower limit 92
    overlay with curve labels 93
    overlay with scatter and series plots 122
    overlay with scatter plots 123
    overlay with series plots 123
    overlay with step plots 93
    roles and options supported 89
C
Carr, Daniel 4
categorization plots
    See also bar charts
    See also dot plots
    See also line charts
    about 181
    combining statements 29
    roles and options supported 182
    SGPANEL procedure 181, 258
    SGPLOT procedure 22, 62, 181
CATEGORYORDER= option
    discrete order and 229
    DOT statement (SGPLOT) 216
    HBAR statement (SGPLOT) 193
    VBAR statement (SGPLOT) 187
    VLINE statement (SGPLOT) 209
cell graphs 20, 307
classification panels
    See also SGPANEL procedure
    about 20–21, 37, 257–258
    automatic paging 28
    bar chart panels 55
    box plot row lattices 55
    column lattice 265
    grouped bar chart 265
    histogram lattice 56
    lattice by origin and type 262
    mileage by horsepower and type 261
    mileage by type 261
    paging of large panels 266
    panel by origin and type 262
    panel by type 263–264
    row lattice by type 263–264
    scatter panels 56
Cleveland, William S. 3
CLI option
    PBSPLINE statement (SGPLOT) 147–148
    REG statement (SGPLOT) 40, 133–134
CLIP option, ELLIPSE statement (SGPLOT) 155
CLM option
    LOESS statement (SGPLOT) 140–141, 157
    PBSPLINE statement (SGPLOT) 146–147
    REG statement (SGPLOT) 40, 132
CLMTRANSPARENCY= option, REG statement (SGPLOT) 40
CLOSE= option, HIGHLOW statement (SGPLOT) 53, 109
CLUSTERWIDTH= option
    HBOX statement (SGPLOT) 176
    NEEDLE statement (SGPLOT) 96
    SERIES statement (SGPLOT) 83
COLAXIS statement, SGPANEL procedure
    about 25–26, 33, 257, 259
    box plot row lattice example 55
    histogram lattice example 56
color graphs 14–15
COLUMN= option, PANELBY statement (SGPANEL) 263
COLUMNLATTICE layout (SGPANEL procedure) 28
comparative and matrix graphs 37, 57
comparative and matrix plots
    See also SGSCATTER procedure
    about 269
    comparative graph 277, 279
    grouped comparative graph 278
    grouped comparative plot 278
    grouped scatter plots 284
    heat map 280
    multiple plot requests 274
    plot grids 272
    plot with fit 273
    plot with fit and ellipse 273
    rectangular matrix 280
    scatter plot matrix 282, 284
    scatter plot matrix with diagonals and ellipse 283
    scatter plot matrix with histograms 283
    scatter plots 271
    scatter plots with attributes 271
vertical bar charts (continued)
    grouped using SGPANEL procedure 189
    grouped with data labels 102
    grouped with skins 48, 51
    multiple with patterns, fill colors, skins 51
    overlaid with discrete offset 190
    overlay 103
    overlay with skins and offsets 103
    roles and options supported 183
    stacked with skins 48
    with adjacent groups 188
    with an upper limit 186
    with confidence limits 186
    with data labels 185
    with fill attributes and data skins 184
    with groups and pattern fills 189
    with limits and label positioning 187
    with no fill 184
    with patterns 50
    with reference line 185
    with response sorting 187
    with stacked groups 188
vertical box plots
    about 169
    graph examples 42, 161, 170
    grouped 170
    grouped unfilled 171
    on linear axis 172
    overlay 171
    overlay for linear data 172
    roles and options supported 169
    with labels 173
    with notches 173
vertical line charts
    about 198
    graph examples 181, 199
    grouped 203
    overlaid 204
    overlaid with discrete offset 204
    roles and options supported 198
    with break 200
    with CLM and data label position 202
    with confidence limits 201
    with curve labels 203
    with data labels 200
    with markers 199
    with reference line 201
    with upper limits 202
vertical vector plots 99
The Visual Display of Quantitative Information (Tufte) 3
visual perception, effective graphics and 4–5
Visualizing Data (Cleveland) 3
vital signs by time point name 305
VLINE statement, SGPLOT procedure
    about 198
    ALPHA= option 201
    BREAK option 200
    CATEGORYORDER= option 209
    CURVELABEL= option 203
    DATALABEL option 200
    DATALABELATTRS= option 200
    DATALABELPOS= option 202
    DISCRETEOFFSET= option 204
    LEGENDLABEL= option 204
    LIMITS= option 202
    LINEATTRS= option 49, 199
    MARKERATTRS= option 199
    NUMSTD= option 202

W
waterfall charts
    about 117
    graph examples 118
    grouped 119
    roles and options supported 117
    with data labels 119
    with initial value 118
WATERFALL statement, SGPLOT procedure 117
WHERE statement, SGPLOT procedure 43
WIDTH= option, ODS GRAPHICS statement 15, 328, 330
X
XAXIS statement, SGPLOT procedure
    about 23–24, 33
    DISCRETEORDER= option 229
    FITPOLICY= option 119, 229
    INTERVAL= option 227
    LABEL= option 206, 213
    MAX= option 223
    MIN= option 223
    MINOR option 228
    NOTIMESPLIT option 227
    OFFSETMIN= option 71, 114
    REVERSE option 197
    TICKVALUEFORMAT= option 228
    TYPE= option 43, 172
    VALUEATTRS= option 125
    VALUES= option 223
    VALUESHINT option 224
X2AXIS statement, SGPLOT procedure
    about 24, 33
    DISPLAY option 77
    GRID option 134, 141, 148
XORIGIN= option, NEEDLE statement (SGPLOT) 98

Y
YAXIS statement, SGPLOT procedure
    about 24, 33
    DISPLAY= option 52
    INTEGER option 224
    LABEL= option 199
    LOGBASE= option 226
    LOGSTYLE= option 226
    MIN= option 79, 85
    MINOR option 225
    OFFSETMAX= option 46, 95, 102, 173
    OFFSETMIN= option 118
    REVERSE option 190
    VALUEATTRS= option 125
Y2AXIS statement, SGPLOT procedure
    about 24, 33
    GRID option 134, 141, 148
Yoda 128
YORIGIN= option, NEEDLE statement (SGPLOT) 98