SAS® Documentation
The correct bibliographic citation for this manual is as follows: SAS Institute Inc. 2016. SAS® Clinical Standards Toolkit 1.7:
User’s Guide, Second Edition. Cary, NC: SAS Institute Inc.
SAS® Clinical Standards Toolkit 1.7: User’s Guide, Second Edition
Copyright © 2016, SAS Institute Inc., Cary, NC, USA
Chapter 2 / Framework ... 7
Overview ... 8
Global Standards Library ... 9
What Is a Standard? ... 13
Common Framework Metadata ... 13
Common Usage Scenarios for the Framework ... 16
Maintenance Usage Scenarios ... 25
Index ... 491
What's New in the SAS Clinical Standards Toolkit
Overview
Here are the significant new features in the SAS Clinical Standards Toolkit 1.7:
• Macro changes
• Support for CDISC CDASH 1.1
• Support for CDISC Dataset-XML 1.0
• Additional support for CDISC Define-XML 2.0
• Reduced and consolidated validation_master data sets for SDTM 3.1.2, 3.1.3, 3.2,
  and ADaM 2.1
• Support for the Analysis Results Metadata 1.0 extension for Define-XML 2.0
Macro Changes
Here are the changes to macros that have been made in the SAS Clinical Standards
Toolkit 1.7:
Chapter 1 / Introduction to the SAS Clinical Standards Toolkit
References ... 2
Clinical
The SAS Clinical Standards Toolkit focuses primarily on supporting clinical research
activities. These activities involve the discovery and development of new
pharmaceutical and biotechnology products and medical devices. These activities
occur from project initiation through product submission and throughout the full
product lifecycle. They do not include non-research patient records or health-care,
pharmacy, hospital, and insurance electronic records.
Standards
The SAS Clinical Standards Toolkit initially focuses on standards defined by the
Clinical Data Interchange Standards Consortium (CDISC). CDISC is a global, open,
multidisciplinary, nonprofit organization that has established standards to support the
Toolkit
The term toolkit connotes a collection of tools, products, and solutions. The SAS
Clinical Standards Toolkit provides a set of standards and functionality that will
evolve and grow with future product updates and releases. Customer requirements
and expectations of the SAS Clinical Standards Toolkit will play a key role in
deciding what functionality to provide in future releases.
References
Table 1.1 References
Chapter 2 / Framework
Overview ... 8
What Is a Standard? ... 13
Overview
The Framework module of the SAS Clinical Standards Toolkit enables you to manage
the registration of standards, and provides the metadata and API infrastructure to
interact with those standards.
To understand the Framework module, you must understand the fundamentals of how
the files are structured and used. The Framework module has two distinct pieces:
n the components that are installed as part of the SAS Foundation and shared files
(SAS macros, JAR files, and so on)
n the global standards library
The following sections describe the structure of the global standards library. The
sections use some of the framework macros to show how the shared files are used.
Global Standards Library
During the installation and configuration of the SAS Clinical Standards Toolkit, you are
prompted for the location where the global standards library should be installed. The
configuration process creates a series of directories in this location.
• logs contains the transactionlog data set used by the metadata management
  macros. For more information, see Chapter 4, “Metadata Management,” on page 57.
• metadata contains data sets that have information about the registered standards.
  For more information, see “Common Framework Metadata” on page 13.
• schema-repository contains the schemas for XML-based standards that are
  supported.
• standards contains a standard-specific directory hierarchy for each of the
  supported standards.
• xsl-repository contains directories and XSL files used in reading and writing
  XML files.
The logs directory contains one data set: transactionlog. This data set is populated
only by the metadata management macros. The data set can be updated by one or
more users depending on how the SAS Clinical Standards Toolkit is implemented (file
server installation or single installation on a laptop). The data set contains metadata
update information from all users.
The metadata directory contains three data sets and one XML file: Standards,
Standardlookup, StandardSASReferences, and availabletransforms.xml. The Standards
data set has a list of the registered standards and basic information relating to each
standard.
The following display shows the full content of the global standards library Standards
data set included with the SAS Clinical Standards Toolkit after a new installation of the
application:
Note: The &_cstGRoot directory in the rootpath column maps to the global standards
library directory.
The StandardSASReferences data set defines the typical inputs and outputs of SAS
processes that are associated with each standard.
Figure 2.2 Global Standards Library: Some Rows and Columns of the Metadata
StandardSASReferences Data Set
The type and subtype columns can be used to reference information that the SAS
Clinical Standards Toolkit needs. This information is in the directory structures and file
naming standards used by the customer. A full list of valid types and subtypes is
provided in this document.
The standards directory contains subdirectories for each of the standard versions that
are provided with the SAS Clinical Standards Toolkit. In addition, there are subdirectories
for user-customized versions of these standards and any new user-defined standards.
Each subdirectory should be considered a stand-alone module. This is how the SAS
Clinical Standards Toolkit can keep parallel standards and reduce the need for
revalidation. Within each subdirectory, there might be directories that group the files,
data sets, and housekeeping programs.
The Standardlookup data set contains discrete lookup values specific to a SAS Clinical
Standards Toolkit registered standard. It provides specific information for column values
and data set template names. In addition, this data set is used to perform internal
validation of the SAS Clinical Standards Toolkit.
The following display shows the directory structure for a Microsoft Windows global
standards library with cdisc-sdtm-3.1.3-1.7 expanded:
Figure 2.4 Directory Structure for a Microsoft Windows Global Standards Library
The schema-repository directory contains XML schema definitions that are used to
validate XML files. Standards that use XML should have their schemas in this directory
so that they can be found. For example, the schema-repository directory for CDISC
CRT-DDS 1.0 as defined in the Standards data set maps to this location:
global standards library directory/schema-repository/cdisc-crtdds-1.0.0
The xsl-repository directory contains files that are used to transform XML files from
one format to another. For example, the default style sheet directory for CDISC CRT-
DDS 1.0 define.xml files created by the SAS Clinical Standards Toolkit as defined in the
Standards data set maps to this location:
global standards library directory/xsl-repository/CRT-DDS/1.0/export
What Is a Standard?
The answer to this question depends on what the standard is supposed to do. In the
case of terminology, it might be a format catalog and a data set. In the case of an XML-
based standard, it might be metadata that describes the SAS representation of the
XML. It might be data sets that control validating the SAS representation of the XML. It
might be routines to convert the SAS representation to the actual XML files. Or, it might
be initialization files for standard-specific properties.
At a minimum, registering a standard with the framework requires the data sets that
define the standard and the standard's SASReferences data set. The macro to register
a standard is described in “Registering a New Version of a Standard” on page 26.
For more information about what a SAS Clinical Standards Toolkit standard is, see
Chapter 5, “Supported Standards,” on page 87.
Common Framework Metadata
Overview
The following SAS Clinical Standards Toolkit metadata files support the functions and
common tasks across multiple standards.
File structure and content for each of these metadata files are fully described in Chapter
3, “Metadata File Descriptions,” on page 33. Use of these metadata files is
documented in sections that use the SAS Clinical Standards Toolkit metadata.
Other SAS Clinical Standards Toolkit metadata files specific to supported standards or
specific to actions (such as validation) are described in Chapter 3, “Metadata File
Descriptions,” on page 33. They are also discussed elsewhere in this document.
StandardSASReferences
This data set defines the typical inputs and outputs of SAS processes that are
associated with each standard. The StandardSASReferences data set is in the global
standards library metadata folder and within each registered standard folder hierarchy
here:
global standards library directory/standards/<standard>/control
Standardlookup
This data set contains valid values for discrete variables in the SAS Clinical Standards
Toolkit metadata files. The Standardlookup data set is in the global standards library
directory and within each registered standard folder hierarchy at this location:
global standards library directory/standards/<standard>/control
Properties Files
These files provide the set of name-value pairs that are required to establish the
environment for each SAS Clinical Standards Toolkit process. Properties are translated
into SAS global macro variables at the start of each process. Properties are within each
registered standard folder hierarchy here:
global standards library directory/standards/<standard>/programs
Common Usage Scenarios for the Framework
Overview
The following sections describe usage scenarios that the framework accommodates.
Code that is required to complete the usage scenario is included in each section. All
macros that are provided in the usage scenarios are in the primary SAS Clinical
Standards Toolkit autocall path:
• Microsoft Windows
  !sasroot/cstframework/sasmacro
• UNIX
  !sasroot/sasautos
For complete macro documentation, see the SAS Clinical Standards Toolkit: Macro API
Documentation.
This code looks at the global SASReferences data set for a properties entry with a
SubType value of initialize. By default, this entry is located here:
Global macro variables are initialized based on the name-value pairs in this properties
file. After this macro has been called once, you do not need to call it again during the
SAS session, unless you want to override macro variables or reset them.
In this example, the initialization properties for the default version of the CDISC SDTM
standard (currently 3.2) are used without needing to specify a version.
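As a minimal sketch, the two initialization calls described above might look like the following. The first call uses the parameters that appear for the framework elsewhere in this chapter. The second is an assumption: it supposes that the same macro accepts CDISC-SDTM as the standard name and falls back to the default version when _cstStandardVersion is omitted, as the text states. The %put statement is added for illustration only.

```sas
/* Initialize the framework global macro variables from the properties
   entry whose SubType value is initialize. */
%cst_setstandardproperties(
   _cstStandard=CST-FRAMEWORK
  ,_cstSubType=initialize
);

/* Assumed variant: initialize properties for the default version of
   CDISC SDTM. No _cstStandardVersion is passed, so the default applies. */
%cst_setstandardproperties(
   _cstStandard=CDISC-SDTM
  ,_cstSubType=initialize
);

/* Illustration only: show the return code set by the properties. */
%put NOTE: _cst_rc=&_cst_rc;
```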
The data set work.regStds contains the information from the global standards library
metadata Standards data set. The work.regStds data set's content matches the
information provided in Figure 2.1 on page 10.
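A call along the following lines could produce work.regStds. The macro name %cst_getregisteredstandards and its _cstOutputDS parameter do not appear in this section and are stated here as assumptions; confirm both in the Macro API documentation before relying on them.

```sas
/* Assumed call: copy the registered-standards metadata to work.regStds. */
%cst_getregisteredstandards(
   _cstOutputDS=work.regStds
);

/* Illustration only: list the registered standards. */
proc print data=work.regStds;
run;
```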
The data set work.regStds contains the information from the global standards library
metadata Standards data set. The last column is productRevision. This column
contains the revision of each standard version. If the productRevision column is blank,
then the standard was originally registered with SAS Clinical Standards Toolkit 1.2.
Here is another, simpler method to determine the current SAS Clinical Standards Toolkit
release:
%put CST Version: %cstutil_getcstversion;
The parameters that are used in this macro call specify the standard CST-FRAMEWORK
and the data set to create to contain the information. Because the standard version is
omitted, the default standard version is used. The data set that is returned is a
SASReferences data set. For the macro call, this display shows the first few columns of
data that are returned.
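The call described above might be sketched as follows. The macro name %cst_getstandardsasreferences and the output data set name work.stdSASRefs are assumptions made for illustration, because the original call is not shown in this section.

```sas
/* Assumed call: return the SASReferences records for the default
   version of CST-FRAMEWORK. The version parameter is omitted on purpose. */
%cst_getstandardsasreferences(
   _cstStandard=CST-FRAMEWORK
  ,_cstOutputDS=work.stdSASRefs   /* hypothetical output data set name */
);
```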
The Type and SubType identify that it is a SASReferences table. The Standard
identifies the module to be used. If the standard version is not specified, then the default
for standard version is used. The output is a data set named work.sasrefs that contains
0 observations.
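A sketch of the template call described above, modeled on the %cst_createdsfromtemplate example in the Standardlookup table of Chapter 3, with the output data set changed to work.sasrefs. The PROC SQL check is added for illustration only, and the macro variable name nobs is hypothetical.

```sas
/* Create an empty SASReferences data set from the framework template. */
%cst_createdsfromtemplate(
   _cstStandard=CST-FRAMEWORK
  ,_cstType=control
  ,_cstSubType=reference
  ,_cstOutputDS=work.sasrefs
);

/* Illustration only: confirm that the new data set has no observations. */
proc sql noprint;
  select count(*) into :nobs trimmed from work.sasrefs;
quit;
%put NOTE: work.sasrefs has &nobs observations.;
```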
%cst_createtablesfordatastandard(
_cstStandard=CDISC-SDTM,_cstStandardVersion=3.1.3
,_cstOutputLibrary=work
);
This code creates the domains described by CDISC SDTM version 3.1.3 in the Work
library. Each domain contains 0 observations.
memname='refColumns';
output;
run;
/*
Step 3. Call the macro to get the metadata.
*/
%cst_getstandardmetadata(
_cstSASReferences=work.sasrefs
);
Step 1 uses one macro to create an empty SASReferences data set named
work.sasrefs.
Step 2 determines the information to be returned. The standard and version are CDISC
SDTM 3.1.2. The type and subType identify the types of metadata to be returned. The
sasRef and memname identify the target library and name for each data set.
Step 3 is the actual macro call that does the processing. The data set work.sasrefs
is read, and the global metadata is used to fulfill the request.
The outcome of these steps is two data sets. The data set work.refTables contains
metadata about the CDISC SDTM 3.1.2 domains. The data set work.refColumns
contains metadata about each of the columns defined in the domains.
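The three steps can be sketched end to end as follows. Step 1 reuses the template call shown in Chapter 3, and Step 3 is the call shown above. The Step 2 DATA step is a reconstruction under assumptions: the type value referencemetadata and the subType values table and column are not confirmed by this section.

```sas
/* Step 1. Create an empty SASReferences data set from the template. */
%cst_createdsfromtemplate(
   _cstStandard=CST-FRAMEWORK
  ,_cstType=control
  ,_cstSubType=reference
  ,_cstOutputDS=work.sasrefs
);

/* Step 2. Request table and column metadata for CDISC SDTM 3.1.2.
   The type and subType values below are assumptions. */
data work.sasrefs;
  if 0 then set work.sasrefs;   /* inherit column attributes only */
  standard='CDISC-SDTM';
  standardversion='3.1.2';
  type='referencemetadata';
  sasRef='work';
  reftype='libref';
  subType='table';
  memname='refTables';
  output;
  subType='column';
  memname='refColumns';
  output;
  stop;                         /* no input is read, so end the step explicitly */
run;

/* Step 3. Call the macro to get the metadata. */
%cst_getstandardmetadata(
   _cstSASReferences=work.sasrefs
);
```

The `if 0 then set` idiom copies the column definitions from the template without reading any rows, which is why the explicit `stop` statement is needed.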
• If the global standards library needs to move, it can be moved without changing all of
  the SASReferences files that use a standard.
• To change standard versions, you need only to change the contents of the
  standardversion column.
standard='CST-FRAMEWORK';
standardversion='1.2';
type='messages';
subtype='';
sasref='cstmsg';
reftype='libref';
order=1;
iotype='input';
filetype='dataset';
allowoverwrite='N';
output;
standard='CST-FRAMEWORK';
standardversion='1.2';
type='lookup';
subtype='';
sasref='cstlkup';
reftype='libref';
order=1;
iotype='input';
filetype='dataset';
allowoverwrite='N';
output;
standard='CST-FRAMEWORK';
standardversion='1.2';
type='results';
subtype='validationresults';
sasref='cstrslt';
reftype='libref';
order=1;
iotype='output';
filetype='dataset';
allowoverwrite='Y';
output;
run;
The following display shows what the data set looks like:
The path and memname columns are missing. The user has specified the standard,
standardversion, type, subtype, SASref, and reftype. This information is sufficient. The
rest of the information is available from the registered standard's metadata.
This macro call attempts to insert the missing information if it is found in a registered
standard's metadata:
/*
Step 4. Insert the missing information from registered
standard.
*/
%cst_insertstandardsasrefs(
_cstSASReferences=sasrefs
,_cstOutputDS=outSASRefs
);
The following display shows what the output data set looks like:
Maintenance Usage Scenarios
Overview
The following sections describe usage scenarios that the framework accommodates.
Code that is required to complete the usage scenario is included in each section. All
macros that are provided in the usage scenarios are in the primary SAS Clinical
Standards Toolkit autocall path:
• Microsoft Windows
  !sasroot/cstframework/sasmacro
• UNIX
  !sasroot/sasautos
Note: All of the maintenance usage scenarios require that you have Write access to the
global standards library.
For complete macro documentation, see the SAS Clinical Standards Toolkit: Macro API
Documentation.
TIP Best Practice Recommendation: Do not modify global standards library files
provided with the SAS Clinical Standards Toolkit. Instead, modify copies of these files.
Leaving the SAS files intact enables these files to be updated without concern about
overwriting or losing your changes.
Step 1 ensures that the macro variable that contains the global standards library path is
set. Step 2 registers the standard by passing this information:
• The main path to the directory that contains the standard version's files.
• The path to the registration data sets that are used to populate the global standards
  library metadata data sets. This is the name of the subfolder in the _cstRootPath
  parameter value.
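The two steps might be sketched as follows. Only _cstRootPath is named in this section. The other parameter names (_cstControlSubPath, _cstStdDSName, and _cstStdSASRefsDSName) are assumptions to verify in the Macro API documentation, and the directory name my-standard-1.0 is hypothetical.

```sas
/* Step 1. Make sure that &_cstGRoot points to the global standards library. */
%cstutil_setcstgroot;

/* Step 2. Register the standard. The directory my-standard-1.0 is
   hypothetical; parameters other than _cstRootPath are assumptions. */
%cst_registerstandard(
   _cstRootPath=%nrstr(&_cstGRoot./standards/my-standard-1.0)
  ,_cstControlSubPath=control
  ,_cstStdDSName=standards
  ,_cstStdSASRefsDSName=standardsasreferences
);
```

Wrapping the path in %nrstr keeps &_cstGRoot unresolved in the stored value, which matches the rootpath values that the Standards data set displays.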
When defining and registering a new standard, you should evaluate which of the
metadata files described in “Common Framework Metadata” on page 13 should be
provided to support new standard functionality. For example:
• Should a sample SASReferences file be created to perform some task?
• Should a Messages data set be added to provide standard-specific informational
  messages?
• Should properties files be provided to set standard-specific global macro variables?
For more information about the metadata files that support the SAS Clinical Standards
Toolkit, see Chapter 3, “Metadata File Descriptions,” on page 33. You can define new
metadata types. These new metadata types should be documented in the standard-
specific StandardSASReferences and Standardlookup data sets, and in the SAS
Clinical Standards Toolkit framework Standardlookup data set.
The version 3.1.3 is set as the default version for the CDISC SDTM standard.
This macro call unregisters the CDISC SDTM 3.1.1 standard, removes it from the global
standards library metadata Standards data set, and removes all records for 3.1.1 from
the StandardSASReferences data set:
%cst_unregisterstandard(
_cstStandard=CDISC-SDTM
,_cstStandardVersion=3.1.1
);
2 Confirm that multiple libraries exist for the same standard version.
Figure 2.8 Multiple Versions per Standard in the Global Standards Library
The following display shows the registered version CDISC SDTM 3.1.2-1.6. The
revision suffix indicates that it is the original version that was shipped with the SAS
Clinical Standards Toolkit 1.6:
Figure 2.9 Global Standards Library Metadata Standards Data Set before Updates
CDISC SDTM 3.1.2-1.6 is defined as the default version for the CDISC SDTM
standard.
Step 2: Register the updated CDISC SDTM 3.1.2 metadata in the global standards
library to use the SAS Clinical Standards Toolkit 1.7.
2 Start a SAS session. Make sure that the current directory is the programs directory.
3 To unregister the currently installed revision and version, submit this code:
%cstutil_setcstgroot;
/*
Set the framework properties used for the uninstall
*/
%cst_setstandardproperties(
_cstStandard=CST-FRAMEWORK,
_cstSubType=initialize
);
/*
If the version to be replaced is the default, you must
make another version the default.
In this case, this is the desired final outcome anyway.
*/
%cst_setstandardversiondefault(
_cstStandard=CDISC-SDTM
,_cstStandardVersion=3.1.3
);
/*
Unregister the standard
*/
%cst_unregisterstandard(
_cstStandard=CDISC-SDTM
,_cstStandardVersion=3.1.2
);
4 Check the Results data set. By default, the data set is work._cstResults. The final
line in the data set should report that the standard version is no longer registered as
a standard.
5 Open and submit the registerstandard.sas file from the programs directory into the
Program Editor.
Figure 2.10 Global Standards Library Metadata Standards Data Set after Updates
Chapter 3 / Metadata File Descriptions
Overview ... 34
Standards ... 34
StandardSASReferences ... 37
Standardlookup ... 39
SASReferences ... 42
Properties ... 46
Messages ... 47
Results ... 50
Overview
The SAS Clinical Standards Toolkit provides and uses metadata files to support its basic
core functions, and to support specific functionality within the SAS Clinical Standards
Toolkit. The file content and structure are described in the following sections. The usage
of each of these metadata files is described elsewhere in this document.
Standards
The Standards data set is used by the SAS Clinical Standards Toolkit framework to
store information about a standard version. All standards that are provided with the SAS
Clinical Standards Toolkit, and any standards that you add, are defined in the
global standards library in the metadata/standards data set. All calls to the
%CST_REGISTERSTANDARD macro that are described in Chapter 2 interact directly
with the metadata/standards data set.
Table 3.1 Metadata/Standards Data Set Structure in the Global Standards Library
rootpath ($200) The root path for the standard version's directory
in the global standards library.
studylibraryrootpath ($200) The root path to the study repository. This can be
used to initialize the studyRootPath and
studyOutputPath global macro variables and to
use relative paths to study library subfolders. By
default, this is set to the sample library that is
associated with each standard provided with the
SAS Clinical Standards Toolkit.
The global standards library data set provided with the SAS Clinical Standards Toolkit is
located here:
global standards library directory/metadata/standards.sas7bdat
The global standards library data set contains these records, which are provided with
the SAS Clinical Standards Toolkit (the columns are continued in the subsequent two
images).
Figure 3.1 Metadata/Standards Data Set Content in the Global Standards Library
The &_cstGRoot in the rootpath column maps to the global standards library
directory that is set by calling the %CSTUTIL_SETCSTGROOT macro.
An example of the global standards library data set that is used to register a specific
standard is located here:
global standards library directory/standards/cdisc-sdtm-3.1.2-1.7/control/standards.sas7bdat
StandardSASReferences
The StandardSASReferences metadata data set specifies a set of library and file
records that are used by most processes that are provided with the SAS Clinical
Standards Toolkit implementation of each standard. It contains references to those
libraries and files that are installed with each standard that SAS provides. A standard-
specific StandardSASReferences data set exists for each SAS Clinical Standards
Toolkit data standard that is supported by SAS. For example, the CDISC SDTM 3.1.2
StandardSASReferences data set is located here:
global standards library directory/standards/cdisc-sdtm-3.1.2-1.7/control/standardsasreferences.sas7bdat
The type and subtype values are discussed in the following section. The SASref value
is the default value that is used in the library and filename allocation process. You can
overwrite this value. The path value contains a relative path. The relpathprefix value
rootpath instructs the code to use the rootpath location that is specified in the
standard-specific Standards data set. The resolved path is shown in Figure 3.3 on page
39.
This data set contains the concatenation of each StandardSASReferences data set that
is provided for each supported standard in the SAS Clinical Standards Toolkit. The
data set is enhanced during concatenation only in these ways:
• the path column is resolved to the full global standards library path for each record,
  based on the relpathprefix value
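The path resolution described above can be illustrated with a small stand-alone DATA step. This is a sketch of the idea, not the toolkit's implementation: the sample record, the column lengths other than rootpath and path, and the output data set name work.resolved are hypothetical.

```sas
data work.resolved;
  length rootpath $200 relpathprefix $32 path fullpath $2048;
  /* Hypothetical sample record. Single quotes keep &_cstGRoot unresolved. */
  rootpath='&_cstGRoot/standards/cdisc-sdtm-3.1.2-1.7';
  relpathprefix='rootpath';
  path='control';
  /* Prefix the relative path with rootpath only when relpathprefix asks for it. */
  if lowcase(relpathprefix)='rootpath' then fullpath=catx('/', rootpath, path);
  else fullpath=path;
  put fullpath=;   /* write the resolved path to the log */
run;
```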
The following display shows the content for the CDISC SDTM StandardSASReferences
data set that is described in Figure 3.2 on page 38. In the display, &_cstGRoot maps to
the global standards library directory that is set by calling the
%CSTUTIL_SETCSTGROOT macro:
The structure of all StandardSASReferences data sets is the same for all standards
provided with the SAS Clinical Standards Toolkit. This structure is described in
“SASReferences” on page 42.
Standardlookup
The Standardlookup data set provides a mechanism to capture valid values for discrete
variables in the SAS Clinical Standards Toolkit metadata files. This data set supports
such tasks as validating the content of the SAS Clinical Standards Toolkit metadata files
and providing selectable values in the user interfaces of other tools and solutions.
Table 3.2 Standardlookup Data Set Structure in the Global Standards Library
Column Name    Column Length    Description
standardversion ($20) The version number of the registered standard. This must
be unique within the standard.
templatetype ($8) For the given record, a non-null value (for example, data
set) indicates that a template is available. For example, the
macro call
%cst_createdsfromtemplate(
_cstStandard=CST-FRAMEWORK,
_cstType=control,_cstSubType=reference,
_cstOutputDS=work.sasreferences) finds that a
template is available as csttmplt.sasreferences.
A Standardlookup data set is provided for most standards with the SAS Clinical
Standards Toolkit. This data set can be used in the definition and registration of custom
standards in the SAS Clinical Standards Toolkit.
The cross-standard global standards library Standardlookup data set that is provided
with the SAS Clinical Standards Toolkit is located here:
global standards library directory/metadata/standardlookup.sas7bdat
This data set contains the concatenation of each Standardlookup data set that is
provided for each supported standard in the SAS Clinical Standards Toolkit.
The following display shows an example of the records in a Standardlookup data set:
Figure 3.4 Standardlookup Data Set Content in the Global Standards Library
These records show the valid values for discrete columns in any SDTM 3.1.2
SASReferences (including StandardSASReferences) data set. For example, filetype
can have values of CATALOG, DATASET, FILE, or FOLDER. These records also show
that a SASReferences data set allows two subtype values (REFERENCE and
VALIDATION) when type is CONTROL. When type is CONTROL, the subtype value
must always be non-null.
Templates are available for both the SASReferences data set and the validation_master
data sets. For more information about the columns and values in SASReferences data
sets, see the following section.
SASReferences
Each SAS Clinical Standards Toolkit process (for example, a primary task or action such
as validating source data against a SAS Clinical Standards Toolkit standard) requires
using a SASReferences data set. The SASReferences data set identifies all of the
inputs required and the outputs that are created by the process. Each process might
have its own unique SASReferences data set.
Chapter 6, “SASReferences File,” on page 137, describes the content and usage of
SASReferences data sets.
The following table identifies and describes each column within a SASReferences data
set:
Column Name    Column Length    Description
standard ($20) Standard name. This value should match the standard
field in the Standards data set in global standards
library directory/metadata and in other
metadata files referenced in SASReferences (for
example, CDISC SDTM and CDISC CRT-DDS). This
column is required.
type ($40) The type of input and output data or metadata. This is a
predefined set of values that are documented in the
global standards library directory/standards/cst-framework-1.7/control/standardlookup
data set. These values are also itemized in Table 6.1 on page 140. This column is
required.
subtype ($40) The specific subtype within type of input and output data
or metadata. This is a predefined set of values that are
documented in the
global standards library directory/standards/cst-framework-1.7/control/standardlookup
data set. These values are also itemized in Table 6.1 on page 140. This column
is optional, depending on type.
SASref ($8) The SAS libref or fileref that references the library or file
in the SAS Clinical Standards Toolkit SAS process. This
value should match the value of sasref that is used in any
other associated metadata files (for example, in the
Source Columns data set, the value is type=srcmeta).
This column is required. It must conform to SAS libref or
fileref naming conventions.
reftype ($8) The reference type. This column is required. Valid values
are libref and fileref.
iotype ($8) The input/output type (input, output, or both) of the entity.
Entities defined as “input” or “both” must exist and be
accessible. If not, calls to the
%CSTUTILVALIDATESASREFERENCES macro report
an error condition and halt the process.
allowoverwrite ($1) Allow the file to be overwritten (Y/N), for files with an
iotype value of “output” or “both”.
path ($2048) The path of the library or the path portion of the file
reference. If you want to use the default value for a
standard, standardversion, type, or subtype, then leave
the path blank. The value is added to the &_cstSASRefs
working version of the SASReferences data set from the
standard-specific StandardSASReferences data set.
Specific paths should be provided for any type or subtype
that is study- or run-specific. Paths might be relative to an
environment variable (for example, !sasroot) or to a SAS
macro variable (for example, &studyRootPath).
memname ($48) The name of a specific SAS file (data set or catalog) or
file that is not created by SAS (for example, properties or
an XML file). The memname column should be blank for
library references. This column is optional, depending on
type. As a general rule, memname should be provided if
the path is provided, except where individual file
references are not appropriate (for example,
type=autocall and type=sourcedata). If you want to use
the default value for a standard, standardversion, type, or
subtype, then leave memname blank. The value is added
to the &_cstSASRefs working version of the
SASReferences data set from the standard-specific
StandardSASReferences data set. The file suffix for SAS
files is optional.
The following display shows some information in a typical SAS Clinical Standards
Toolkit SASReferences data set:
From this display, you can see that the data set contains information about types of data
and metadata and where they are located. The SAS Clinical Standards Toolkit imposes
a rigid, minimum SASReferences file structure. All columns defined in Table 3.3 on page
42 are expected; additional columns are allowed. No changes to column attributes are
allowed (for example, changing column lengths).
Note: SASReferences data sets from the SAS Clinical Standards Toolkit releases prior
to version 1.5 can be used in version 1.7 even if they do not include any of the columns
added in version 1.5 (iotype, filetype, allowoverwrite, and relpathprefix).
Properties
The SAS Clinical Standards Toolkit uses properties files to set default preferences for
each process. Properties are name-value pairs that are translated into SAS global
macro variables. These macro variables are available for the duration of a SAS Clinical
Standards Toolkit process. Properties can be defined in any number of files. Both text
file and SAS data set formats are supported. For more information about the SAS
Clinical Standards Toolkit global macro variables, see Appendix 1, “Global Macro
Variables,” on page 459. These macro variables are derived from properties files
provided with the SAS Clinical Standards Toolkit.
The following table describes the contents of a sample properties file in global
standards library directory/standards/cst-framework-1.7/programs/
initialize.properties:
_cstDebug 0
_cst_rc 0
_cst_rcmsg
_cst_MsgID
_cst_MsgParm1
_cst_MsgParm2
_cstResultSeq 0
_cstSeqCnt 0
_cstSrcData
_cstResultFlag 0
_cstResultsDS work._cstresults
_cstMessages work._cstmessages
_cstReallocateSASRefs 0
_cstFMTLibraries
_cstMessageOrder APPEND
_cstSASRefsLoc
_cstSASRefsName
_cstSASRefs work._cstsasrefs
_cstStdSASRefs
_cstSubjectColumns _none_
_cstLRECL LRECL=2048
_cstVersion 1.7
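The translation of name-value pairs into global macro variables can be sketched in plain SAS. The following is a minimal illustration only, not the toolkit's own property-loading code. It assumes that each property is written as a name=value line, and it reads three of the properties listed above from in-stream data rather than from the initialize.properties file.

```sas
/* Illustrative sketch: turn name=value pairs into global macro variables. */
/* The name=value layout and the in-stream data are assumptions.           */
data _null_;
  length line $300 name $32 value $256;
  infile datalines truncover;
  input line $char300.;
  /* Skip blank lines and lines without an equal sign */
  if line = ' ' or index(line, '=') = 0 then return;
  name  = strip(scan(line, 1, '='));
  value = strip(substr(line, index(line, '=') + 1));
  /* 'G' places the macro variable in the global symbol table */
  call symputx(name, value, 'G');
datalines;
_cstDebug=0
_cstResultsDS=work._cstresults
_cstMessageOrder=APPEND
;
run;

%put NOTE: _cstDebug=&_cstDebug _cstResultsDS=&_cstResultsDS;
```

After the step runs, the macro variables remain available for the rest of the SAS session, which mirrors the behavior described for a SAS Clinical Standards Toolkit process.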
Messages
By default, the SAS Clinical Standards Toolkit provides a Messages data set for the
SAS Clinical Standards Toolkit framework and for each data standard provided with the
SAS Clinical Standards Toolkit. Each Messages data set includes a list of codes and
associated text that are specific to each standard. In some cases, actions such as
validation are used to report process results.
The following table describes the structure of all the message files:
Column
Column Name Length Description Required
The Messages data set that supports the SAS Clinical Standards Toolkit framework is
located here:
global standards library directory/standards/cst-framework-1.7/
messages/messages.sas7bdat
The following display provides an excerpt of records and columns from the SAS Clinical
Standards Toolkit framework Messages data set:
Certain message-type data sets that support non-framework standards are described in
this document.
Results
Each SAS Clinical Standards Toolkit process generates a Results data set. The Results
data set can be persisted beyond the SAS session based on SASReferences data set
settings. Each Results data set captures the outcome of specific process actions. Each
Results data set uses the Messages data set to standardize output.
The structure of each SAS Clinical Standards Toolkit Results data set is described in
this table.
Column Column
Name Length Description
resultid ($8) Result ID. The resultid is a message ID from the standard
Messages data set (for example, framework or CDISC
SDTM). The SAS Clinical Standards Toolkit has adopted a
naming convention matching a resultid with each standard.
The resultid values are prefixed with an up to 4-character
prefix (CST for framework messaging; CDISC examples:
ODM, SDTM, ADAM, and CRT). By convention, the prefix
matches the mnemonic field in the Standards data set in
global standards library directory/
metadata. This prefix is followed by a 4-digit numeric that
is unique within the standard (for example, SDTM1234). You
can use any naming convention limited to eight characters.
Value should be non-null.
checkid ($8) Validation check ID. The SAS Clinical Standards Toolkit has
adopted a naming convention matching each standard to be
validated. The checkid values are prefixed with an up to 4-
character prefix (CDISC examples: ODM, SDTM, ADAM,
and CRT). By convention, the prefix matches the mnemonic
field in the Standards data set in global standards
library directory/metadata. This prefix is
followed by a 4-digit numeric that is unique within the
standard (for example, SDTM1234). You can use any
naming convention limited to eight characters.
Value should be non-null for validation processes. Otherwise,
this column is optional.
Column Column
Name Length Description
message ($500) Resolved message text from Messages data set. The
message value includes up to two run-time parameter values
in message text.
Value should be non-null.
Column Column
Name Length Description
_cst_rc (8.) Process status. A value of 0 indicates that the process completed
normally. A nonzero value typically indicates that the process
ended abnormally or was aborted.
Value should be non-null.
actual ($240) Actual value observed. This value is generally used for
validation reporting. It provides the actual column values that
are in error. This column is optional.
keyvalues ($2000) Record-level keys and values. This value is generally used
for validation reporting. It provides domain key values for
records that are in error. This column is optional.
For an example of a SAS Clinical Standards Toolkit Results data set, see Figure 7.9 on
page 213 and Figure 7.10 on page 214.
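A Results data set can be reviewed with ordinary SAS procedures. The following sketch assumes that the Results data set has the default name work._cstresults (the default value of _cstResultsDS shown earlier) and that it contains the columns described in the table above. It is one possible way to review results, not a toolkit-supplied report.

```sas
/* Sketch: list Results records whose process status is nonzero. */
/* Assumes that work._cstresults exists in the current session.  */
proc print data=work._cstresults noobs;
  where _cst_rc ne 0;
  var resultid checkid message _cst_rc actual keyvalues;
run;

/* Sketch: count how often each result ID was reported. */
proc freq data=work._cstresults;
  tables resultid / nocum nopercent;
run;
```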
Overview
The following metadata files can be used for specific tasks. In some cases, the file
structures might be unique to the supported or referenced standard. These metadata
files are provided by the SAS Clinical Standards Toolkit.
Validation Metrics
Each SAS Clinical Standards Toolkit validation process can generate a Metrics data
set that provides a meaningful denominator for most validation checks. The Metrics
data set enables you to more accurately assess the relative scope of errors that are
detected. The generation of this data set is based on validation property settings. This
data set can be persisted beyond the SAS session based on SASReferences data set
settings. For example, Table 7.10 on page 193 describes the metrics metadata for the
CDISC SDTM standard, and Figure 7.2 on page 195 provides sample content for the
CDISC SDTM standard.
The SAS implementation of the CDISC CRT-DDS 1.0 standard comes with the style
sheet define-v1-updated-html.xsl. This style sheet is an updated version of the
style sheet that was used in the updated version of the CDISC SDTM/ADaM Pilot
Project Submission Package in 2013. (See http://www.cdisc.org/sdtmadam-pilot-.)
Because XSL style sheets are not part of the official CDISC standards, you can use
alternative style sheets for display purposes.
4
Metadata Management
Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
Support Macros . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
Common Parameters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
Overview
Management of metadata is performed using macros and driver programs to add,
modify, and delete metadata. Prior to version 1.6 of the SAS Clinical Standards Toolkit,
these macros were in three general categories:
n Macros or driver programs that derive entire data sets or catalogs from standard-
specific data or metadata. For example:
o The driver program create_sourcemetadata, which initializes source metadata
files from a SAS library of data sets or from a CRT-DDS define.xml file.
o The %CST_CREATEDSFROMTEMPLATE macro, which creates a zero-
observation data set that is based on a template.
o The %CST_CREATETABLESFORDATASTANDARD macro, which creates
domain data sets as defined in reference_tables and reference_columns.
o The %CSTUTIL_BUILDFORMATSFROMXML macro, which creates format
catalogs from codelist information in XML-based standards.
n Macros or driver programs that modify run-time process metadata from standard-
specific data or metadata. For example:
o The %CST_INSERTSTANDARDSASREFS macro, which does a look-through to
provide paths and memnames from StandardSASReferences.
o The %CSTUPDATESTANDARDSASREFS macro, which expands all relative
paths to full paths in a SASReferences file.
n Macros or driver programs that register or initialize a new standard or standard
version. For example, the %CST_REGISTERSTANDARD macro registers a new
standard within the global standards library.
There were no macros or driver programs to support modifying metadata files that are
associated with a given standard or standard version at a record level. Beginning with
version 1.6, the SAS Clinical Standards Toolkit provides metadata management
macros that enable you to accomplish these goals:
n Make minor modifications to a domain. For example, increase a column length in the
reference_columns data set.
n Add or remove columns to or from domain metadata, such as reference_columns or
source_columns.
n Update a validation_master record to change the definition of an existing validation
check.
n Add one or more records (validation checks) to a validation_master data set.
n Modify a Messages data set record. For example, modify the text or severity values.
n Update a specific CRT-DDS or Define-XML 2.0 data set (in any of the SAS
representation data sets).
n Add a record to value-level metadata, such as source_values.
n Retain any metadata modifications in a permanent transaction log data set.
n Enable the registration of a new set of controlled terminology.
The SAS Clinical Standards Toolkit has always been an open-source collection of SAS
macros, programs, format catalogs, and data sets. Any SAS programmer, with the
proper security authorization, can modify any of these components of the product. For
this reason, the metadata management macros enable you to make modifications to the
metadata data sets and to track these changes in a transaction log data set. Use of
these macros preserves the metadata of the SAS Clinical Standards Toolkit data sets,
such as data set labels, keys, and sort order.
Metadata management macros are addressed in this chapter. Each macro is briefly
described. In addition to the main metadata management macros, a small group of
supporting macros is available. All actions performed by the metadata management
macros are written to a transaction log data set. Information about all macros is in the
SAS Clinical Standards Toolkit: Macro API Documentation.
60 Chapter 4 / Metadata Management
Note: Transaction records are not written when a macro is run in test mode.
The columns that are written to the transaction log data set are shown in this table.
The default transaction log data set is stored in global standards library
directory/logs as transactionlog.sas7bdat. This location and data set name are set
in the %CST_GETSTATIC autocall macro using the static variable names
CST_LOGGING_PATH and CST_LOGGING_DS, respectively. This default location and
name can be modified by overriding the %CST_GETSTATIC macro or by setting the
value of the global macro variable _cstTransactionDS to a reachable libref.dataset value
before calling any metadata management macro.
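The second option, setting _cstTransactionDS, might look like the following sketch. The libref studylog and its directory are hypothetical. Only the macro variable name _cstTransactionDS comes from the text above.

```sas
/* Sketch: redirect the transaction log before calling any metadata */
/* management macro. The libref and path are hypothetical.          */
libname studylog 'C:\studies\study1\logs';

%global _cstTransactionDS;
%let _cstTransactionDS=studylog.transactionlog;

%put NOTE: Transaction log data set is &_cstTransactionDS;
```

Because the value must be a reachable libref.dataset value, the LIBNAME statement has to run successfully before the first metadata management macro is called.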
Overview
Metadata management macros enable you to customize the metadata of any data
standard that is used by the SAS Clinical Standards Toolkit. The macros provide a
mechanism, the transaction log data set, to track changes.
The metadata management macros included in SAS Clinical Standards Toolkit are
shown in this table.
Macro Description
Note: Information about all macros is in the SAS Clinical Standards Toolkit: Macro API
Documentation.
Test Mode
To verify changes before they are written to a permanent data set, all of the metadata
management macros can be run in test mode except as noted below.
Write access permission is required to the target permanent data set. Write access
permission is checked as an initial step in the metadata management macros. If Write
access permission is not available, the macro does not complete successfully, even in
test mode.
Note: cstutiladddataset and cstutiladddscolumn cannot be run in test mode.
All test mode output is generated in the SAS Work directory, and the transaction log
data set is not updated. After you have verified that the changes are correct, run the
macro again with test mode disabled, and the permanent data set is modified.
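The verify-then-commit sequence can be sketched with one of the macros that supports test mode. The parameter names and values are taken from the update example in this chapter. The exact name and location of the test output in the SAS Work directory are not stated here, so review the Work library and the results after the first call.

```sas
/* Step 1: run in test mode. The permanent data set and the */
/* transaction log data set are not changed.                */
%cstutilupdatemetadatarecords(
  _cstStd=CDISC-SDTM,
  _cstStdVer=3.1.3,
  _cstDS=newstudy.source_values,
  _cstDSIfClause=table='EG' and value='QTC',
  _cstColumn=label,
  _cstValue=QT Interval (QTc),
  _cstTestMode=y);

/* Step 2: after reviewing the test output, repeat the same call */
/* with test mode disabled to modify the permanent data set.     */
%cstutilupdatemetadatarecords(
  _cstStd=CDISC-SDTM,
  _cstStdVer=3.1.3,
  _cstDS=newstudy.source_values,
  _cstDSIfClause=table='EG' and value='QTC',
  _cstColumn=label,
  _cstValue=QT Interval (QTc),
  _cstTestMode=n);
```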
Problem Reporting
There are two ways to report problems: in the _cstResults data set or in the SAS log
file.
Because a full SAS Clinical Standards Toolkit environment (one in which all global
macro variables are defined) is not required for a macro to run, a macro reports
problems in one of two locations, in this order:
1 If the _cstResultsDS macro variable and the data set specified by the value of
_cstResultsDS exist, problems are reported in the _cstResults data set.
2 If the _cstResultsDS macro variable or the data set specified by the value of
_cstResultsDS does not exist, problems are reported in the SAS log file.
Note: After the first submission of a macro, a work._cstresults data set might exist and
the _cstResultsDS macro variable might specify the data set. Subsequent macro
submissions report problems to the work._cstresults data set instead of to the SAS log
file. This happens because some of the macros call other internal macros that generate
a work._cstresults data set. This data set is then used by subsequent macros for
problem reporting.
If the SAS log file is used to report problems, the SAS Clinical Standards Toolkit
distinguishes problems from normal SAS log messages by displaying a message similar
to this one:
[CSTLOGMESSAGE.CSTUTILDELETEDSCOLUMN] ERROR:
results.transactionlog could not be found.
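The order of the two reporting locations can be sketched as a small macro. The macro %report_problem is hypothetical and is not part of the SAS Clinical Standards Toolkit. It only illustrates the decision described above, using the standard %SYMEXIST macro function and the EXIST function.

```sas
/* Hypothetical helper that illustrates the reporting decision. */
%macro report_problem(msg);
  %local use_ds;
  %let use_ds=0;

  /* Location 1 requires both the macro variable and the data set */
  %if %symexist(_cstResultsDS) %then %do;
    %if %length(&_cstResultsDS) > 0 %then %do;
      %if %sysfunc(exist(&_cstResultsDS)) %then %let use_ds=1;
    %end;
  %end;

  %if &use_ds %then
    %put NOTE: Problem would be reported in &_cstResultsDS: &msg;
  %else
    %put [CSTLOGMESSAGE.REPORT_PROBLEM] ERROR: &msg;
%mend report_problem;

%report_problem(results.transactionlog could not be found.)
```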
Support Macros
The support macros that enhance the functionality of the metadata management
macros are shown in this table.
Macro Description
Common Parameters
The metadata management macros share a set of common parameters. These
parameters are used by all of the metadata management macros:
n _cstStd: The SAS Clinical Standards Toolkit registered standard name (for example,
CDISC-SDTM).
n _cstStdVer: The SAS Clinical Standards Toolkit registered standard version (for
example, 3.1.3).
n _cstDS: The target data set to act on. This is specified in libname.dataset form,
where the LIBNAME has been previously allocated.
The parameter _cstTestMode is used by most of the metadata management macros.
_cstTestMode specifies whether a macro is run in test mode. The valid values are Y
(default) or N. For more information, see “Test Mode” on page 62.
Copying a Data Set from One Library to Another Library
libname newstudy '<directory where new study data sets will reside>';
libname srcmeta '<directory supplying data set to be copied>';
libname log 'C:\cstGlobalLibrary\logs';
%cstutiladddataset(
_cstStd=CDISC-SDTM,
_cstStdVer=3.1.3,
_cstDS=newstudy.source_values,
_cstInputDS=srcmeta.source_values,
_cstDSLabel=SDTM Source Value Metadata,
_cstDSKeys=sasref table column value,
_cstOverwrite=Y);
Before running the macro, the Newstudy library is empty. After running the macro, the
data set from the Srcmeta library is copied to the Newstudy library.
The SAS log file contains a message to inform you that the operation was successful:
Note: If the message is not in the SAS log file, review the contents of the
work._cstresults data set.
The following display shows the properties of the newstudy.source_values data set,
which confirm that the keys and label parameter values were used:
The following display shows part of the transaction log data set in the Log library, which
shows that it was updated:
Note: Appending records to a data set always adds rows to the target data set even if
the rows already exist in the target data set. Merging records adds new rows and
updates existing rows. If keys are present, the target data set is sorted and duplicate
key records are deleted.
Note: To merge successfully, the keys for the data sets must match. Any discrepancies
in the keys are reported either to the SAS log file or to the Results data set.
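The difference between appending and merging can be illustrated with plain DATA steps. This is a generic SAS sketch and not the toolkit's implementation. The data sets work.demo_target and work.demo_new and the key column id are invented for the illustration.

```sas
/* Invented demonstration data, already sorted by the key column id */
data work.demo_target;
  length id $4 txt $10;
  input id $ txt $;
datalines;
A old
B old
;
run;

data work.demo_new;
  length id $4 txt $10;
  input id $ txt $;
datalines;
B new
C new
;
run;

/* Append: every new row is added, so key B appears twice */
data work.demo_appended;
  set work.demo_target work.demo_new;
run;

/* Merge by key: the row for B is updated and the row for C is added */
data work.demo_merged;
  merge work.demo_target work.demo_new;
  by id;
run;
```

The merge works only because both data sets share the key column and its attributes, which is why the keys must match.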
Before running the macro, the work.newrecs data set was created for the _cstNewDS
macro parameter.
Note: The data set (work.newrecs) must share the same structure as the
target data set (newstudy.source_values).
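One way to give work.newrecs the same structure as newstudy.source_values is to inherit the column attributes from the target data set. The following sketch assumes that the target data set already exists. The table, value, and label values come from the examples in this chapter. The remaining key columns (sasref and column) are not shown in the source and are left for you to populate.

```sas
/* Sketch: build the new records with the target's column attributes. */
data work.newrecs;
  /* Never executes, but copies names, types, lengths, and labels */
  if 0 then set newstudy.source_values;

  call missing(of _all_);
  table='EG'; value='QTC'; label='PR Interval';
  /* Populate sasref, column, and any other required columns here */
  output;

  call missing(of _all_);
  table='IE'; value='INCL25';
  /* Populate sasref, column, and any other required columns here */
  output;

  stop;
run;
```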
After the macro is run, the newstudy.source_values data set is updated with the new EG
record and the IE record. The following display shows an example of the updated data
set:
The row count (Rows) is increased from 28 to 30 (compare to the image on page 66),
which indicates that the two records were added from the work.newrecs data set.
The following display shows the updated properties of the newstudy.source_values data
set:
The results of running the macro are written to the work._cstresults data set because
that data set was created by the macro in the previous example.
The following display shows part of the work._cstresults data set, in which row 5
contains the message generated after running the
%CSTUTILAPPENDMETADATARECORDS macro:
The following display shows part of the transaction log data set, which shows that it was
updated for each row of data added (rows 2 and 3):
%cstutilappendmetadatarecords(
_cstStd=CDISC-SDTM,
_cstStdVer=3.1.3,
_cstDS=newstudy.source_values,
_cstNewDS=work.newrecs,
_cstUpdateDSType=append,
_cstOverwriteDup=y,
_cstTestMode=n);
%cstutilupdatemetadatarecords(
_cstStd=CDISC-SDTM,
_cstStdVer=3.1.3,
_cstDS=newstudy.source_values,
_cstDSIfClause=table='EG' and value='QTC',
_cstColumn=label,
_cstValue=QT Interval (QTc),
_cstTestMode=n);
The following display shows the value of the Column Description for QTC (PR
Interval) before running the macro:
The following display shows the modified value of the Column Description for QTC
(QT Interval (QTc)):
The following display shows the updated transaction log data set in row 4:
%cstutiladddscolumn(
_cstStd=CDISC-SDTM,
_cstStdVer=3.1.3,
_cstDS=newstudy.source_values,
_cstColumn=comment2,
_cstColumnLabel=Additional comment,
_cstColumnType=c,
_cstColumnLength=200,
_cstColumnFmt=$200.,
_cstColumnInitValue=This is a test to add a new variable);
Before running the macro, the number of columns in the newstudy.source_values data
set was 21. After running the macro, the number of columns is 22 and the comment2
column was added.
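The column count and the attributes of the comment2 column can also be checked in code. This sketch uses PROC CONTENTS and the DICTIONARY.COLUMNS table, both standard SAS features. It assumes that the newstudy libref is allocated.

```sas
/* Sketch: list all columns in creation order */
proc contents data=newstudy.source_values varnum;
run;

/* Sketch: show the attributes of the comment2 column only */
proc sql;
  select name, type, length, format, label
    from dictionary.columns
    where libname='NEWSTUDY'
      and memname='SOURCE_VALUES'
      and upcase(name)='COMMENT2';
quit;
```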
The following display shows the full set of columns in the newstudy.source_values data
set. The comment2 column is at the bottom of the list with length, format, and label as
specified in the macro parameters.
The following display shows the modified newstudy.source_values data set, which
shows that the initial value of This is a test to add a new variable was set for the
new column on all data set records:
The following display shows the updated transaction log data set:
%cstutilmodifycolumnattribute(
_cstStd=CDISC-SDTM,
_cstStdVer=3.1.3,
_cstDS=newstudy.source_values,
_cstColumn=comment2,
_cstAttr=label,
_cstAttrValue=New label for comment2,
_cstTestMode=n);
The following display shows the updated transaction log data set:
%cstutildeletedscolumn(
_cstStd=CDISC-SDTM,
_cstStdVer=3.1.3,
_cstDS=newstudy.source_values,
_cstColumn=comment2,
_cstMustBeEmpty=n,
_cstTestMode=n);
The following display shows the modified columns in the newstudy.source_values data
set. The comment2 column has been removed and the column count is reduced to 21.
The following display shows the updated transaction log data set:
CAUTION! Ensure that the WHERE clause retrieves the correct records to delete.
It is highly recommended that this operation initially be performed in test mode. For
more information, see “Test Mode” on page 62.
In this example, the two rows of data added from the previous examples are deleted
from the newstudy.source_values data set using the same WHERE clause.
*********************
* Delete a record *
*********************;
%cstutildeletemetadatarecords(
_cstStd=CDISC-SDTM,
_cstStdVer=3.1.3,
_cstDS=newstudy.source_values,
_cstDSIfClause=(table='EG' and value='QTC') or (table='IE' and value='INCL25'),
_cstTestMode=n);
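In addition to using test mode, the selection clause can be previewed with a WHERE statement before any records are deleted. This sketch is a simple precaution and not a toolkit feature. It lists the rows that the clause in the example above selects.

```sas
/* Sketch: preview the records that the clause selects */
proc print data=newstudy.source_values;
  where (table='EG' and value='QTC') or (table='IE' and value='INCL25');
  var sasref table column value label;
run;
```

If the listing contains rows that should be kept, refine the clause before running the delete macro with test mode disabled.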
The following display shows the modified newstudy.source_values data set, which
shows that the two rows have been deleted and the record count is reduced from 30 to
28:
The following display shows the updated transaction log data set:
%cstutil_deletedataset(
_cstDataSetName=newstudy.source_values,
_cstLogging=1);
After running the macro, the directory no longer contains the data set.
This macro does not write to the work._cstresults data set. Messages are written
directly to the SAS log file:
[CSTLOGMESSAGE.CSTUTIL_DELETEDATASET] NOTE: newstudy.source_values successfully deleted.
The following display shows the updated transaction log data set:
Note: In the image, Name of standard and Standard version are not populated. The
%CSTUTIL_DELETEDATASET macro is an older SAS Clinical Standards Toolkit macro
that does not require those parameter values for any data lookups. However, the values
for the file path and the data set name are listed in the transaction log data set.
This data set stores metadata about the snapshots and subsets:
global standards library directory/standards/cdisc-
terminology-1.7/control/standardsubtypes.sas7bdat
Each CDISC terminology standard that is provided by SAS includes a SAS format
catalog (cterms.sas7bcat) and a SAS data set (cterms.sas7bdat). The data set is an
extract of the NCI EVS controlled terminology for a given CDISC standard and update.
A similar data set and catalog that represent your snapshot or subset must be created.
The data set and catalog location are identified in the _cstpath parameter of the
%CSTUTILREGISTERCTSUBTYPE macro. The snapshot or subset must be registered
using the %CSTUTILREGISTERCTSUBTYPE macro, which adds a record to this data
set:
global standards library directory/standards/cdisc-
terminology-1.7/control/standardsubtypes.sas7bdat
The following example registers a data set and a catalog named myct in the global
standards library directory/standards/cdisc-terminology-1.7/
cdisc-sdtm/201412/formats folder:
%cstutilregisterctsubtype(
_cststd=CDISC-TERMINOLOGY,
_cststdver=CDISC-SDTM,
_cststandardsubtype=NCI_THESAURUS,
_cststandardsubtypeversion=201412,
_cstpath=&_cstGRoot./standards/cdisc-terminology-1.7/cdisc-sdtm/201412/formats,
_cstmemname=myct,
_cstisstandarddefault=N,
_cstdescription=%nrbquote(CDISC SDTM Controlled Terminology, released by NCI on 2014-12-20));
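After registration, the new record can be inspected in the standardsubtypes data set. The following sketch assumes that &_cstGRoot resolves to the global standards library directory, as it does in the call above. The libref ctctl is hypothetical.

```sas
/* Sketch: review the registered snapshots and subsets. */
/* Read-only access avoids accidental changes.          */
libname ctctl "&_cstGRoot./standards/cdisc-terminology-1.7/control"
        access=readonly;

proc print data=ctctl.standardsubtypes;
run;
```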
Note: The transaction log data set is broken into three displays for clarity.
See Also
5
Supported Standards
CDISC SDTM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
Purpose . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
Release Dates . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
Description . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94
CDISC SDTM 3.1.1 Reference Standard . . . . . . . . . . . . . . . . . . . . . . . . . . . 97
CDISC SDTM 3.1.2 Reference Standard . . . . . . . . . . . . . . . . . . . . . . . . . . . 97
CDISC SDTM 3.1.3 Reference Standard . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
CDISC SDTM 3.2 Reference Standard . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
Overview
The SAS Clinical Standards Toolkit is designed to support various clinical standards.
The SAS Clinical Standards Toolkit was initially built to support the Clinical Data
Interchange Standards Consortium (CDISC) standards. However, the generic
framework enables definition of any type of standard.
Each SAS Clinical Standards Toolkit standard provides a SAS representation of the
published source guidelines or source specification. The SAS representation is
designed to serve as a model or template of the source specification.
Two key design requirements shaped the implementation of the SAS Clinical Standards
Toolkit standards.
n Each supported standard is represented in one or more SAS files. This facilitates
these points:
o It provides SAS users with an implementation of data models and standards that
are based on SAS.
o It enables you to use SAS routines to assess how well any user-defined set of
data and metadata conforms to the standard.
o It enables you to use SAS code to read and derive files in other formats (for
example, XML).
Each SAS Clinical Standards Toolkit standard is an optimized reference standard
from a SAS perspective.
n You are able to define your own customized standards, or you are able to modify
existing SAS standards. For more information about how new standards are
registered in the SAS Clinical Standards Toolkit, see “Registering a New Version of a
Standard” on page 26.
90 Chapter 5 / Supported Standards
SAS provides new standards and updates based on customer requirements, changes to
source guidelines, and changes to source specifications.
This document uses the term “reference standard” to refer to the SAS representation of
each source specification.
The following display shows the global standards library folder hierarchy that is
provided for CDISC SDTM:
The metadata folder contains the data set and column metadata for each
supported domain. The SAS Clinical Standards Toolkit provides a utility macro
(%CST_CREATETABLESFORDATASTANDARD) that reads this metadata, and
builds an empty data set for each supported SDTM domain. All supporting files
required by the SAS Clinical Standards Toolkit to support the specific CDISC SDTM
standard are provided in the remaining folders.
o The control folder provides these data sets:
column metadata documenting the SAS data sets used to build the define.xml file,
and a default style sheet for the generated define.xml file. A broader view of what
comprises the CDISC CRT-DDS reference standard must recognize that the
standard also references data and metadata from other standards.
CDISC SDTM
Purpose
CDISC SDTM defines a standard structure for data tabulations that are submitted as
part of a product application to a regulatory authority such as the FDA. The data sets
and columns required for a regulatory application are not prescribed by the standard.
Instead, these requirements are based on the trial protocol and discussions with the
regulatory authority in charge of reviewing the submission. Therefore, any SAS Clinical
Standards Toolkit standard, including any CDISC SDTM standard, is only a
representative sample or template.
Release Dates
CDISC SDTM 3.1.2
n CDISC SDTM Model, Final Version 1.2, November 12, 2008
n CDISC SDTM Implementation Guide, Final Version 3.1.2, November 12, 2008
Description
CDISC standards, including SDTM, allow for the inclusion and exclusion of some
columns. (For example, timing variables can be included or excluded.) In addition,
CDISC standards do not specify a length for most columns. Therefore, any
implementation of a CDISC standard requires interpretation of that standard, which
might lead to differences in the implementation of that standard. Reference standards
are derived based on internal conventions and experiences, and discussions with
regulatory authorities.
The domain and column metadata that constitute the SAS representation of each
CDISC SDTM standard are derived from the global standards library in these formats:
n as empty data sets (using the utility macro
%CST_CREATETABLESFORDATASTANDARD)
n as table metadata (See Table 5.1 on page 95.)
n as column metadata for each domain (See Table 5.2 on page 96.)
SASref REFMETA
Table AE
Class Events
XmlPath .../transport/ae.xpt
Purpose Tabulation
State Final
Date 2013-11-26
Standard CDISC-SDTM
StandardVersion 3.2
sasref REFMETA
table AE
column AESEV
label Severity/Intensity
order 26
type C
length 20
displayformat
xmldatatype text
xmlcodelist AESEV
core Perm
origin
role RecordQualifier
term (AESEV)
algorithm
qualifiers UPPERCASE
standard CDISC-SDTM
standardversion 3.2
standardref
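The idea of building an empty domain data set from column metadata can be sketched in plain SAS. This is not the code of the %CST_CREATETABLESFORDATASTANDARD macro. It is a generic illustration that uses a two-row stand-in for reference_columns. The AESEV row repeats the values shown above. The length and order values in the AETERM row, and the data set names work.refcols and work.ae_empty, are assumptions made for the illustration.

```sas
/* Stand-in for a few reference_columns values */
data work.refcols;
  length table $32 column $32 label $200 type $1;
  infile datalines dsd truncover;
  input table column label type length order;
datalines;
AE,AESEV,Severity/Intensity,C,20,26
AE,AETERM,Reported Term for the Adverse Event,C,200,8
;
run;

proc sort data=work.refcols;
  by table order;
run;

/* Generate a DATA step with one ATTRIB entry per metadata row */
data _null_;
  set work.refcols end=last;
  where table='AE';
  if _n_=1 then call execute('data work.ae_empty; attrib');
  call execute(catx(' ',
    column,
    cats('length=', ifc(type='C', '$', ''), length),
    cats('label=', quote(strip(label)))));
  /* STOP prevents any observation from being written */
  if last then call execute('; stop; run;');
run;
```

The generated step creates work.ae_empty with zero observations and with the lengths and labels taken from the metadata rows.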
The SAS Clinical Standards Toolkit CDISC SDTM reference standard provides
metadata and code to validate the structure and content of the SDTM domains.
It is this set of files, in whole or in part, that defines each of the CDISC SDTM reference
standards.
Comments - CO Questionnaires - QS
Comments - CO PK Parameters - PP
Microscopic Findings - MI
Purpose
The Analysis Data Model (ADaM) specifies the fundamental principles and standards to
follow when creating analysis data sets and associated metadata. ADaM supports
efficient generation, replication, and review of analysis results. The design of analysis
data sets is generally driven by the scientific and medical objectives of the clinical trial.
A fundamental principle is that the structure and content of the analysis data sets must
support clear, unambiguous communication of the scientific and statistical aspects of
the clinical trial.
The purpose of ADaM is to provide a framework that enables analysis of the data. At
the same time, ADaM enables reviewers and other recipients of the data to have a clear
understanding of the data’s lineage from collection to analysis to results. Whereas
ADaM is optimized to support data derivation and analysis, CDISC Study Data
Tabulation Model (SDTM) is optimized to support data tabulation.
Release Date
CDISC ADaM Analysis Data Model, Final Version 2.1, December 17, 2009
The ADaM Basic Data Structure for Time-to-Event Analyses, Version 1.0, May 8, 2012
Analysis Data Model (ADaM) Data Structure for Adverse Event Analysis, Version 1.0,
May 10, 2012
Regulatory Basis
(Source: Submission of Data in CDISC Format to CBER,
http://www.fda.gov/BiologicsBloodVaccines/DevelopmentApprovalProcess/ucm209137.htm,
page updated: October 18, 2013)
Effective December 15, 2010, SDTM and ADaM are being accepted for CBER IND,
NDA, and BLA submissions.
“In determining how to create ADaM analysis datasets for submission to CDER,
sponsors should refer to three documents: the Analysis Data Model and the ADaM
Implementation Guide (www.CDISC.org), and the FDA Study Data Specifications
Document (http://www.fda.gov/downloads/ForIndustry/DataStandards/StudyDataStandards/UCM199599.pdf).
Close adherence to the ADaM Implementation
Guide is expected and any specific questions that result from attempts to adhere to
these documents should be discussed with the review division.”
Implementation of the CDISC ADaM 2.1 reference standard in the SAS Clinical
Standards Toolkit supports each of these principles.
The number and structure of analysis data sets are highly dependent on the type of
study, the study objectives as defined in the statistical analysis plan, and discussions
with the reviewing authority. ADaM data sets incorporate derived and collected data that
permit analysis with little or no additional programming. Data can be from various SDTM
domains, other ADaM data sets, or any combination thereof.
The CDISC ADaM 2.1 reference standard currently supports these analysis data set
structures:
n The subject-level analysis data set (ADSL) provides descriptive information about
subjects, such as study disposition, demographic, and baseline characteristics. The
ADSL is the primary source for subject-level variables included in other analysis data
sets, such as population flags and treatment variables. There is only one ADSL per
study, and the ADSL and its related metadata are required in each CDISC-based
submission of data from a clinical trial, even if no other analysis data sets are
submitted.
n The ADaM Basic Data Structure (BDS) is used for the majority of ADaM data sets,
regardless of the therapeutic area or type of analysis. Each BDS data set contains
one or more records per subject and analysis parameter. The structure of some BDS
data sets might include an analysis time point. A record in a BDS analysis data set
can represent an observed, derived, or imputed value required for analysis. Each
BDS data set contains a core set of variables that describe the analysis parameter
and the value being analyzed. A data value can be derived from any source file,
including any combination of SDTM and ADaM data sets. The Time-to-Event
analysis data set is an example implementation of the BDS structure.
n The Adverse Event analysis data set (ADAE) structure is built on the nomenclature
of the CDISC SDTM Implementation Guides for collected data. The ADAE data set
adds attributes, variables, and data structures that are required for statistical
analyses. The primary SDTM source domain for the ADAE data set is AE, with the
corresponding SUPPAE. Additional variables can be added from the ADaM ADSL
data set. The ADAE data set is required when SDTM AE is not sufficient to support
all adverse event analyses. The ADAE structure for the standard adverse event
safety data set has at least one record for each AE recorded in the SDTM AE
domain.
Metadata for the ADSL, BDS, and ADAE data sets is defined in the SAS Clinical
Standards Toolkit reference_tables data set in the standard metadata folder.
The Analysis Data Model identifies four types of metadata that are captured and
supported by the SAS Clinical Standards Toolkit.
Table 5.6 ADaM Metadata Types and SAS Clinical Standards Toolkit Locations
Version 1.0 of the Analysis Data Model Implementation Guide (ADaMIG) defines a
common set of ADSL and BDS columns that can be used as templates for ADaM
analysis data sets. This set of ADSL and BDS columns has been supplemented with
Version 1.0 of the Analysis Data Model (ADaM) Data Structure for Adverse Event
Analysis. Metadata for the 290 columns in the SAS representation of ADSL, BDS, and
ADAE is defined in the SAS Clinical Standards Toolkit reference_columns data set in
the standard metadata folder. Empty ADSL, BDS, and ADAE data sets containing these
columns can be derived from the SAS Clinical Standards Toolkit global standards library
using the utility macro %CST_CREATETABLESFORDATASTANDARD.
106 Chapter 5 / Supported Standards
The SAS Clinical Standards Toolkit CDISC ADaM reference standard also provides
metadata and code to validate the structure and content of the ADaM analysis data
sets.
These supplemental files, in whole or in part, define the SAS Clinical Standards Toolkit
CDISC ADaM reference standard.
Purpose
The CDISC CRT-DDS standard defines the metadata structures in a machine-readable
XML format. These metadata structures are used to describe tabulation and analysis
data sets and variables for regulatory submissions. The XML schema that is used to
define the metadata structures in an XML format is based on an extension to the CDISC
Operational Data Model (ODM).
Release Date
CDISC CRT-DDS, Final Version 1.0, February 10, 2005
CDISC CRT-DDS 1.0 107
Regulatory Basis
(Source: CDISC Case Report Tabulation Data Definition Specification)
In 1999, the FDA standardized the submission of clinical and non-clinical data and
metadata in a set of eSubmission guidelines to include metadata descriptions of the
data sets and columns within a Data Definition Document (define.pdf). In 2003, the FDA
published a set of guidance documents on receiving electronic product applications per
the International Conference on Harmonisation (ICH) electronic Common Technical
Document (eCTD) specifications. In these specifications, the FDA expanded the
acceptable file types to include the XML format.
<ItemGroupDef OID="docroot.IG.TE"
    Name="TE"
    Repeating="No"
    IsReferenceData="Yes"
    Purpose="Tabulation"
    def:Label="Trial Elements"
    def:Structure="One record per planned element"
    def:DomainKeys="STUDYID,ETCD"
    def:Class="Trial Design"
    def:ArchiveLocationID="ArchiveLocation.te">
  <!-- All ItemRefs would be listed here -->
  <def:leaf ID="ArchiveLocation.te"
      xlink:href="te.xpt">
    <def:title>te.xpt</def:title>
  </def:leaf>
</ItemGroupDef>
Column              Value
OID                 IG.TE
Name                TE
Repeating           No
IsReferenceData     Yes
SASDatasetName      TE
Domain              TE
Origin
Role
Purpose             Tabulation
ArchiveLocationID   Location.TE
FK_MetaDataVersion  MDV.1
Note: Empty or null attributes are not typically included in the XML file.
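The attribute-to-column mapping described here can be sketched outside SAS. The following Python fragment is an illustration only and is not the toolkit's implementation. It parses a simplified ItemGroupDef element and collects its attributes into a dictionary that plays the role of one row in the SAS representation. The namespace URIs, the helper name itemgroupdef_to_row, and the sample values are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Namespace URIs are assumptions for this sketch; verify them against your define.xml.
ODM_NS = "http://www.cdisc.org/ns/odm/v1.2"
DEF_NS = "http://www.cdisc.org/ns/def/v1.0"

SAMPLE = f"""
<ItemGroupDef xmlns="{ODM_NS}" xmlns:def="{DEF_NS}"
    OID="IG.TE" Name="TE" Repeating="No" IsReferenceData="Yes"
    Purpose="Tabulation" def:Label="Trial Elements"
    def:ArchiveLocationID="Location.TE"/>
"""


def itemgroupdef_to_row(xml_text, metadataversion_oid):
    """Hypothetical helper: flatten one ItemGroupDef element into a row."""
    elem = ET.fromstring(xml_text)
    row = {}
    for name, value in elem.attrib.items():
        # ElementTree reports namespaced attributes as "{uri}local";
        # keep only the local part as the column name.
        row[name.split("}")[-1]] = value
    # The parent key is not an XML attribute; it is added as a foreign key.
    row["FK_MetaDataVersion"] = metadataversion_oid
    return row


row = itemgroupdef_to_row(SAMPLE, "MDV.1")
```

Attributes that are absent from the XML (Origin and Role in this case) simply do not appear in the dictionary, which mirrors the empty cells in the table.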
The highly structured nature of CDISC CRT-DDS data requires that any mapping to a
relational format include a large number of data sets, with foreign key relationships to
help preserve the intended non-relational object structure. In the SAS Clinical Standards
Toolkit, foreign key relationships are enforced when validating the CDISC CRT-DDS
data sets.
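As a rough illustration of what enforcing a foreign key relationship means, the following Python sketch reports child rows whose foreign key value has no matching parent record. It is not the toolkit's validation code. The function name find_orphans and the sample rows are hypothetical, and only the FK_MetaDataVersion column name is taken from the table above.

```python
def find_orphans(child_rows, fk_column, parent_rows, parent_key="OID"):
    """Return the child rows whose foreign key has no matching parent key."""
    parent_keys = {row[parent_key] for row in parent_rows}
    return [row for row in child_rows if row[fk_column] not in parent_keys]


# Hypothetical sample data: one parent record and two child records.
metadataversion = [{"OID": "MDV.1"}]
itemgroupdefs = [
    {"OID": "IG.TE", "FK_MetaDataVersion": "MDV.1"},
    {"OID": "IG.XX", "FK_MetaDataVersion": "MDV.9"},  # deliberately broken key
]

orphans = find_orphans(itemgroupdefs, "FK_MetaDataVersion", metadataversion)
```

Here orphans contains only the IG.XX row, because MDV.9 does not exist in the parent table.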
Field lengths in the CDISC CRT-DDS data sets are consistent by core data type. CDISC
has not specified any limit to the length of most character fields. Arbitrary lengths have
been chosen by data type. These lengths are listed in this table. In the table, standard
data types are distilled into core data types. To be safe, larger lengths have been
chosen to ensure that no data loss occurs in the SAS Clinical Standards Toolkit pre-
installed data sets. Production tables might be compressed using SAS mechanisms to
preserve disk space.
Type Name  Length  Description
text       2000    A character field that can accommodate a large number of characters
Purpose
The CDISC Define-XML 2.0 standard defines the metadata structures in a machine-
readable XML format. These metadata structures are used to describe tabulation and
analysis data sets and variables for regulatory submissions and any proprietary (non-
CDISC) data set structure. The XML schema that is used to define the metadata
structures in an XML format is based on an extension to the CDISC Operational Data
Model (ODM).
Release Date
CDISC Define-XML Version 2.0 specification, Production Version 2.0.0, March 5, 2013.
Regulatory Basis
(Source: CDISC Define-XML Version 2.0 Specification)
“In the United States, the approval process for regulated human and animal health
products requires the submission of data from clinical trials and other studies as
expressed in the Code of Federal Regulations (CFR). The FDA established the
regulatory basis for wholly electronic submission of data in 1997 with the publication of
regulations on the use of electronic records in place of paper records (21 CFR Part 11).
In 1999, the FDA standardized the submission of clinical and non-clinical data using the
SAS Version 5 XPORT Transport Format and the submission of metadata using
Portable Document Format (PDF), respectively. In 2005, the Study Data Specifications
published by the FDA included the recommendation that data definitions (metadata) be
provided as a Define-XML file. In December 2011, the CDER Common Data Standards
Issues Document stated that ‘a properly functioning define.xml file is an important part
of the submission of standardized electronic datasets and should not be considered
optional.’”
The tablecore column in the reference_tables data set indicates whether the table is a
required (Req) or optional (Opt) part of the Define-XML 2.0 metadata, according to the
XML schema. Tables with tablecore equal to Ext are part of the underlying ODM
metadata model, but they should be considered extensions to the Define-XML 2.0
metadata model. The core column in the reference_columns data set indicates whether
a column is required (Req) or optional (Opt) in a table when the table is part of the
metadata.
As a general rule, the SAS representation of the CDISC Define-XML 2.0 standard is
patterned to match the XML element (data set) and attribute (column) structure of
define.xml. The SAS representation of the CDISC Define-XML 2.0 metadata model
contains fewer tables than the CDISC Define-XML 2.0 metadata model. This reduction
was accomplished by combining tables with the same structure.
The TranslatedText table contains the contents of the TranslatedText child elements of
various parent elements (ItemGroupDefs, ItemDefs, ItemOrigin, CodeLists,
CodeListItems, MethodDefs, CommentDefs, and others). Other tables that combine
similar table structures into one table are the Aliases table, the DocumentRefs table,
and the FormalExpressions table.
The highly structured nature of CDISC Define-XML 2.0 data requires that any mapping
to a relational format include a large number of data sets. Foreign key relationships help
preserve the intended non-relational object structure. In the SAS Clinical Standards Toolkit,
these foreign key relationships are enforced when validating CDISC Define-XML 2.0
data sets in a way that is similar to the CDISC CRT-DDS 1.0 data sets.
Field lengths in the CDISC Define-XML 2.0 data sets are consistent by core data type.
CDISC has not specified a limit to the length of most character fields. Arbitrary lengths
have been chosen by data type. Here are the lengths:
Note: CRT-DDS 1.0 and Define-XML 2.0 use the same default lengths.
In the table, standard data types are distilled into core data types. Larger lengths have
been chosen to ensure that no data loss occurs in the SAS Clinical Standards Toolkit
pre-installed data sets. Production tables can be compressed using SAS mechanisms
to preserve disk space.
To support this functionality, supplemental files include these global standards library
files:
n A SAS format catalog (defct.sas7bcat) in the formats folder provides valid values
for selected columns in the 46 data sets of the SAS representation.
n The Messages data set in the messages folder provides unified error messaging for
all Define-XML processes.
n SAS code in the macros folder provides code that is specific to CDISC Define-XML
2.0. This SAS code augments code that is provided in the primary SAS Clinical
Standards Toolkit autocall library (!sasroot/cstframework/sasmacro).
n The style sheet folder contains the define2-0-0.xsl XSL style sheet. The
define2-0-0.xsl style sheet is based on the style sheet that was published by CDISC
in 2013. It can be found at http://www.cdisc.org/define-xml.
A define.xml file can be rendered in a human-readable form (such as HTML) with an
XSL style sheet.
CDISC Analysis Results Metadata 1.0 for Define-XML 2.0 117
Purpose
The CDISC Define-XML 2.0 standard defines the metadata structures in a machine-
readable XML format. These metadata structures are used to describe tabulation and
analysis data sets and variables for regulatory submissions, as well as any proprietary
(non-CDISC) data set structure.
The Analysis Results Metadata extension to Define-XML 2.0.0 describes a model
for submissions to regulatory agencies such as the United States Food
and Drug Administration (FDA), as well as for the exchange of analysis datasets and key
results between other parties. This Analysis Results Metadata extension is based on the
metadata model as described in the CDISC ADaM Analysis Data Model Version 2.1
document.
The XML schema that is used to define the metadata structures in an XML format is
based on an extension to the CDISC Operational Data Model (ODM).
Release Date
CDISC Analysis Results Metadata Specification for Define-XML Version 2, Production
Version 1.0, January 27, 2015.
Regulatory Basis
(Source: Technical Conformance Guide on Electronic Study Data Submissions,
Pharmaceuticals and Medical Devices Agency, Provisional Translation [as of July
2015]).
In order for the review of clinical study data to progress smoothly, it is important that the
relationship between the analysis results shown in the application documents and the
analysis datasets is easily understandable. Therefore, the definition documents of the
ADaM datasets should preferably include Analysis Results Metadata, which shows the
relationship between the analysis results and the corresponding analysis dataset and
the variables used, for the analyses performed to obtain the main results of efficacy and
safety and clinical study results that provide the rationales for setting of the dosage and
administration, shown in 4.1.1.3. The Analysis Results Metadata of each analysis
should preferably include the following items.
n Figure or table numbers and titles showing the analysis results displayed in the
clinical study report
n Purpose and reasons for performing the analysis
n Parameter name and code to be used
n Variables subject to analysis
n Dataset to be used
n Selection criteria for the records subject to analysis
n Corresponding description in the statistical analysis plan, analysis program name,
and summary of the analytical methods
n Extract of the analysis program corresponding to the analysis method
For the format of the Analysis Results Metadata, the applicant should refer to the
Analysis Results Metadata Specification for Define-XML by CDISC to the extent
possible, but if it is difficult to include it into the definition document, it is possible to
submit it as a separated file in PDF format, as specified in “Electronic Specifications of
Common Technical Documents”, and “Handling of Electronic Specifications of Common
Technical Documents”. The explanations in the definition document may be written in
Japanese.
Figure 5.5 reference_tables (CDISC Define-XML 2.0 including Analysis Results Metadata)
Figure 5.6 reference_columns (CDISC Define-XML 2.0 including Analysis Results Metadata)
The tablecore column in the reference_tables data set indicates whether the table is a
required (Req) or optional (Opt) part of the Define-XML 2.0 metadata, according to the
XML schema. Tables with tablecore equal to Ext are part of the underlying ODM
metadata model, but they should be considered extensions to the Define-XML 2.0
metadata model. The core column in the reference_columns data set indicates whether
a column is required (Req) or optional (Opt) in a table when the table is part of the
metadata.
CDISC ODM
Purpose
(Source: CDISC website http://www.cdisc.org/odm)
The CDISC ODM standard facilitates the archival and interchange of the metadata and
data for clinical research. ODM is a vendor-neutral, platform-independent format for the
interchange and archival of clinical study data. ODM includes the clinical data and its
associated metadata, administrative data, reference data, and audit information. All of
the information that needs to be shared during setup, operation, analysis, and
submission, as well as for long-term retention as part of an archive, is included in ODM.
Release Dates
n CDISC ODM, Version 1.3.0, December 15, 2006
n CDISC ODM, Version 1.3.1, February 11, 2010
The SAS Clinical Standards Toolkit does not support this CDISC ODM 1.3.0
functionality:
n reading or writing the DigitalSignatures section of the ODM
n vendor or customer extensions of the ODM
n processing that spans more than one ODM file (for example, the use of PriorFileOID
to reference another file is ignored)
The domain and column metadata that constitute the SAS representation of CDISC
ODM 1.3.0 are derived from the global standards library in these formats:
n as empty data sets (using the utility macro
%CST_CREATETABLESFORDATASTANDARD)
n as table metadata (See Table 5.12 on page 123.)
n as column metadata for 315 columns in the 66 data sets (reference_columns in the
standard metadata folder)
As a general rule, the SAS representation of the CDISC ODM standard is patterned to
match the XML element (data set) and attribute (column) structure of odm.xml. For
example, consider this XML extract:
The following table describes how the XML element and attribute information maps to
the SAS representation:
XML Element or Attribute    SAS Data Set    SAS Column    SAS Column Value
The following table lists the complete set of 66 tables that form the SAS Clinical
Standards Toolkit SAS representation of the CDISC ODM 1.3.0 standard:
admindata itemrangecheckvalues
annotation itemrcformalexpression
annotationflag itemrole
association keyset
auditrecord location
clinicaldata locationversion
clitemdecodetranslatedtext measurementunits
codelistitems metadataversion
codelists methoddefformalexpression
conditiondefformalexpression methoddefs
conditiondefs methoddeftranslatedtext
conditiondeftranslatedtext mutranslatedtext
enumerateditems odm
externalcodelists presentation
formdata protocoleventrefs
formdefarchlayouts protocoltranslatedtext
formdefitemgrouprefs rcerrortranslatedtext
formdefs referencedata
formdeftranslatedtext signature
imputationmethods signaturedef
itemaliases study
itemdata studyeventdata
itemdefs studyeventdefs
itemdeftranslatedtext studyeventdeftranslatedtext
itemgroupaliases studyeventformrefs
itemgroupdata subjectdata
itemgroupdefitemrefs user
itemgroupdefs useraddress
itemgroupdeftranslatedtext useraddressstreetname
itemmurefs useremail
itemquestionexternal userfax
itemquestiontranslatedtext userlocationref
itemrangechecks userphone
The highly structured nature of CDISC ODM data requires that any mapping to a
relational format include a large number of data sets, with foreign key relationships to
help preserve the intended non-relational object structure. In the SAS Clinical Standards
Toolkit, foreign key relationships are enforced when validating the CDISC ODM data
sets.
Field lengths in the CDISC ODM data sets are consistent by core data type. CDISC has
not specified any limit to the length of most character fields. Arbitrary lengths have been
chosen by data type. These lengths are listed in this table. In the table, standard data
types are distilled into core data types. To be safe, larger lengths have been chosen to
ensure that no data loss occurs in the SAS Clinical Standards Toolkit pre-installed data
sets. Production tables might be compressed using SAS mechanisms to preserve disk
space.
Type Name  Length  Description
text       2000    A character field that can accommodate a large number of characters
The table metadata for the 66 data sets and the column metadata for the 315 columns
in those data sets that comprise the SAS representation of the CDISC ODM 1.3.0
standard are here:
global standards library directory/standards/
cdisc-odm-1.3.0-1.7/metadata
Only the ODM data set, which contains valid values for the FileOID, CreationDateTime,
and FileType variables, is needed to create a minimal, but valid, CDISC ODM-compliant
XML document. This is based on the CDISC ODM standard, which is flexible. All table
and column names are case sensitive. They must be specified exactly as shown.
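To make the idea of a minimal document concrete, the following Python sketch writes an ODM root element that carries only the three attributes named above. It is an illustration under stated assumptions and not the toolkit's writer. The namespace URI, the function name build_minimal_odm, the sample FileOID, and the choice of Snapshot as the FileType value are assumptions, and the sketch does not run the result through a schema validator.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

# Assumed namespace URI for ODM 1.3; verify against the schema you use.
ODM_NS = "http://www.cdisc.org/ns/odm/v1.3"


def build_minimal_odm(file_oid, file_type="Snapshot"):
    """Hypothetical helper: serialize an ODM root element with three attributes."""
    # Register the ODM namespace as the default so the output has no prefix.
    ET.register_namespace("", ODM_NS)
    root = ET.Element(f"{{{ODM_NS}}}ODM", {
        "FileOID": file_oid,
        "FileType": file_type,
        "CreationDateTime": datetime.now(timezone.utc).isoformat(timespec="seconds"),
    })
    return ET.tostring(root, encoding="unicode")


xml_text = build_minimal_odm("ODM.EXAMPLE.001")
```

The attribute names are written with the exact capitalization shown in the text because XML names are case sensitive.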
In the SAS implementation of the relational data model, the keys are extended to define
a unique record in every SAS data set. For example, a unique record in the
EnumeratedItems data set is defined by the variables FK_CODELISTS and
CODEDVALUE. These SAS data set keys are in the table metadata in the SAS
reference_tables data set.
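The uniqueness rule can be illustrated with a short Python sketch that looks for repeated key combinations. It is not the toolkit's check. The function name duplicate_keys and the sample rows are hypothetical, and only the key variables FK_CODELISTS and CODEDVALUE come from the text above.

```python
from collections import Counter


def duplicate_keys(rows, key_columns):
    """Return the key combinations that occur in more than one row."""
    counts = Counter(tuple(row[col] for col in key_columns) for row in rows)
    return sorted(key for key, n in counts.items() if n > 1)


# Hypothetical EnumeratedItems records.
enumerateditems = [
    {"FK_CODELISTS": "CL.SEX", "CODEDVALUE": "F"},
    {"FK_CODELISTS": "CL.SEX", "CODEDVALUE": "M"},
    {"FK_CODELISTS": "CL.SEX", "CODEDVALUE": "M"},  # repeats the previous key
    {"FK_CODELISTS": "CL.YN", "CODEDVALUE": "M"},   # same value, different code list
]

dups = duplicate_keys(enumerateditems, ["FK_CODELISTS", "CODEDVALUE"])
```

The last record is not a duplicate, because the key is the combination of both variables and not CODEDVALUE alone.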
Starting in ODM 1.3.0, there are two forms of the ItemData element, which is the
element used by ODM for transmitting clinical data item values. These two forms are
untyped and typed. Here is an example of a typed ItemData element:
Both of these data values are stored in the Value variable in the ItemData SAS data set.
In the case of typed data, the ItemDataType variable in the ItemData SAS data set has
the data type (for example, Float). In the case of untyped data, the ItemDataType
variable in the ItemData SAS data set is null.
Typed and untyped data transmission should not be mixed within a single ODM file.
However, in the example provided by the SAS Clinical Standards Toolkit, both types are
part of the same example for demonstration purposes.
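The difference between the two forms can be sketched in Python. The fragment below is an illustration only and is not the toolkit's reader. It assumes that an untyped element carries its value in a Value attribute, and that a typed element carries the type in its element name (for example, ItemDataFloat) and the value as element text. The namespace URI, the OIDs, and the function name itemdata_rows are hypothetical. Like the toolkit sample, it mixes both forms purely for demonstration.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URI for ODM 1.3.
ODM_NS = "http://www.cdisc.org/ns/odm/v1.3"

SAMPLE = f"""
<ItemGroupData xmlns="{ODM_NS}" ItemGroupOID="IG.VS">
  <ItemData ItemOID="IT.WEIGHT" Value="72.5"/>
  <ItemDataFloat ItemOID="IT.HEIGHT">181.0</ItemDataFloat>
</ItemGroupData>
"""


def itemdata_rows(xml_text):
    """Hypothetical helper: map ItemData elements to Value / ItemDataType rows."""
    rows = []
    for elem in ET.fromstring(xml_text):
        local = elem.tag.split("}")[-1]
        if not local.startswith("ItemData"):
            continue
        data_type = local[len("ItemData"):]  # empty string for the untyped form
        if data_type:
            value = (elem.text or "").strip()  # typed: value is the element text
        else:
            value = elem.get("Value")          # untyped: value is an attribute
        rows.append({
            "ItemOID": elem.get("ItemOID"),
            "Value": value,
            "ItemDataType": data_type or None,  # null for untyped data
        })
    return rows


rows = itemdata_rows(SAMPLE)
```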
In the SAS Clinical Standards Toolkit, the CDISC ODM standard supports reading and
representing in SAS a complete odm.xml file, and building an odm.xml file. The SAS
Clinical Standards Toolkit validates both the structure and content of the SAS
representation of each odm.xml file and the structural integrity of that file. The SAS
Clinical Standards Toolkit also supports the extraction of subject or reference data for a
data set (such as an SDTM AE domain) from the odm.xml file.
To support all of this functionality, supplemental files include these global standards
library files:
n A SAS format catalog (odmct.sas7bcat) in the formats folder provides valid values
for selected columns in the 66 tables of the SAS representation.
n The Messages data set in the messages folder provides error messaging for all
Validation Master checks.
n The Validation Master data set in the validation/control folder contains the
superset of checks validating the structure and content of the 66 tables.
n SAS code in the macros folder provides CDISC ODM-specific code that augments
the code provided in the primary SAS Clinical Standards Toolkit autocall library
(!sasroot/cstframework/sasmacro).
It is this set of files, in whole or in part, that defines the CDISC ODM 1.3.0 reference
standard.
codelistaliases formaliases
codelistitemaliases methodaliases
codelisttranslatedtext mualiases
conditionaliases protocolaliases
enumerateditemaliases studyeventaliases
n The table metadata for these 76 data sets can be found in the reference_tables data
set in the standard metadata folder. Column metadata for the 352 columns in these
76 data sets can be found in the reference_columns data set in the standard
metadata folder.
This set of files, in whole or in part, defines the CDISC ODM 1.3.1 reference standard.
CDISC SEND 3.0 129
Purpose
The CDISC SEND standard defines a standard structure for data tabulations that are
designed to support single-dose general toxicology studies, repeat-dose general
toxicology studies, and carcinogenicity non-clinical studies. CDISC SEND is based on
CDISC SDTM. These data tabulations are submitted as part of a product application to
a regulatory authority such as the FDA.
The data sets and columns required for a product application are not prescribed by the
standard. Instead, requirements are based on the trial protocol and discussions with the
regulatory authority in charge of reviewing the application. Therefore, any SAS Clinical
Standards Toolkit standard, including the CDISC SEND standard, is only a
representative sample or template.
Release Date
CDISC Standard for Exchange of Nonclinical Data (SEND), Final Version 3.0, May 19,
2011
Purpose
Version 1.1 of the Clinical Data Acquisition Standards Harmonization (CDASH) standard
identifies the basic data collection fields needed from a clinical, scientific, and regulatory
perspective. The data collection fields enable more efficient and consistent data
collection at clinical research sites.
This standard is designed to be used by clinical trials personnel who are responsible for
collecting, cleaning, and ensuring the integrity of clinical trials data.
CDISC CDASH 1.1 131
The CDISC CDASH and CDISC SDTM standards are related. The CDISC SDTM
standard provides a standard for the submission of data. The CDISC CDASH standard
is needed earlier in the data flow process. It defines a basic set of data collection fields
(or variables) that are expected to exist in the majority of CRFs. The data collection
fields are highly recommended, recommended, or conditional. The CDASH data
collection fields facilitate mapping to the CDISC SDTM structure, which is required for
the submission of data.
The CDASH 1.1 standard describes the basic recommended data collection fields for 16
domains commonly used in clinical trials.
Release Date
CDISC Clinical Data Acquisition Standards Harmonization (CDASH) Standard, Version
1.1, January 18, 2011
Purpose
The CDISC Controlled Terminology standard supports standardizing values for columns
in data submitted to the regulatory authorities. Standardization facilitates loads into
regulatory databases, data review, and analysis. The initial standardization of values
has primarily been in support of SDTM submission data and the CDISC CDASH
(Clinical Data Acquisition Standards Harmonization) development of standardized data
collection instruments.
The SAS Clinical Standards Toolkit offers snapshots of the NCI EVS Thesaurus. These
snapshots are typically coordinated with the release of other CDISC standards that use
the thesaurus. Several snapshots are currently supported across several standards.
CDISC Controlled Terminology 133
The SAS Clinical Standards Toolkit offers a tool to import controlled terminology from
the ODM XML files that can be downloaded from the NCI CDISC Controlled
Terminology FTP site (http://evs.nci.nih.gov/ftp1/CDISC/).
For SDTM, these snapshots are supplied, which support the Study Data Tabulation
Model Implementation Guide (SDTMIG):
n The 201212 snapshot was taken from the NCI EVS Controlled Terminology for
SDTM, released December 2012.
n The 201312 snapshot was taken from the NCI EVS Controlled Terminology for
SDTM, released December 2013.
n The 201406 snapshot was taken from the NCI EVS Controlled Terminology for
SDTM, released June 2014.
For SEND, these snapshots are supplied, which support the Standard for the Exchange
of Nonclinical Data Implementation Guide Version 3.0 (SENDIG V3.0):
n The 201212 snapshot was taken from the NCI EVS Controlled Terminology for
SEND, released December 2012.
n The 201312 snapshot was taken from the NCI EVS Controlled Terminology for
SEND, released December 2013.
n The 201406 snapshot was taken from the NCI EVS Controlled Terminology for
SEND, released June 2014.
For ADaM, these snapshots are supplied, which support the Analysis Data Model
Implementation Guide Version 1.0 (ADaMIG v1.0):
n The 201101 snapshot was taken from the NCI EVS Controlled Terminology for
ADaM, released January 2011.
n The 201107 snapshot was taken from the NCI EVS Controlled Terminology for
ADaM, released July 2011.
n The 201512 snapshot was taken from the NCI EVS Controlled Terminology for
ADaM, released December 2015.
For Questionnaires (QS), these snapshots are supplied, which support the
Questionnaire Controlled Terminology for the current version of the Study Data
Tabulation Model Implementation Guide:
n The 201312 snapshot was taken from the NCI EVS Controlled Terminology for
Questionnaires, released December 2013.
n The 201406 snapshot was taken from the NCI EVS Controlled Terminology for
Questionnaires, released June 2014.
For CDASH, these snapshots are supplied, which support the Clinical Data Acquisition
Standards Harmonization Standard Version 1.0 (CDASH STD v1.0):
n The 201212 snapshot was taken from the NCI EVS Controlled Terminology for
CDASH, released December 2012.
n The 201312 snapshot was taken from the NCI EVS Controlled Terminology for
CDASH, released December 2013.
n The 201403 snapshot was taken from the NCI EVS Controlled Terminology for
CDASH, released March 2014.
Note: Although SAS does not provide the SAS Clinical Standards Toolkit with the
CDASH standard, the terminology is provided as a convenience.
Purpose
CDISC Dataset-XML defines a standard format for transporting tabular data in XML
between any two entities based on CDISC ODM XML. In addition to supporting the
transport of data sets as part of a submission to the FDA, Dataset-XML can be used to
exchange data between two parties. For example, the Dataset-XML data format can be
used by a CRO to transmit SDTM or ADaM data sets to a sponsor organization.
Dataset-XML supports SDTM, ADaM, and SEND data sets but can also be used to
exchange any other type of tabular data set.
The metadata for a data set in a Dataset-XML file must conform to the Define-XML
standard. Each Dataset-XML file contains data for a single data set, but a single Define-
XML file describes all of the data sets included in the folder. Both Define-XML 1.0 and
Define-XML 2.0 are supported for use with Dataset-XML.
Release Date
CDISC Dataset-XML Version 1.0 Specification, Production Version 1.0.0, April 22, 2014
Regulatory Basis
In the United States, the approval process for regulated human and animal health
products requires the submission of data from clinical trials and other studies as
expressed in the Code of Federal Regulations (CFR). The FDA established the
regulatory basis for wholly electronic submission of data in 1997 with the publication of
regulations on the use of electronic records in place of paper records (21 CFR Part 11).
In 1999, the FDA standardized the submission of clinical and non-clinical data using the
SAS Version 5 XPORT Transport Format and the submission of metadata using
Portable Document Format (PDF), respectively. In 2005, the Study Data Specifications
published by the FDA included the recommendation that data definitions (metadata) be
provided as a Define-XML file.
On November 5, 2012, the FDA held a meeting entitled “Regulatory New Drug Review:
Solutions for Study Data Exchange Standards”, the purpose of which was to solicit input
regarding the advantages and disadvantages of current and emerging open,
consensus-based standards for the exchange of regulated study data. CDISC Dataset-
XML was presented as an alternative for consideration.
In 2014, the FDA conducted a pilot to evaluate CDISC Dataset-XML as a solution to the
challenges of the SAS Version 5 XPORT transport.
6
SASReferences File
Overview
The SAS Clinical Standards Toolkit supports the submission of SAS processes using
predefined metadata files. These files are introduced and described in Chapter 3,
“Metadata File Descriptions,” on page 33. The key metadata file that supports this
functionality is the SASReferences file. This SAS data set essentially identifies all of the
key inputs and outputs for any SAS Clinical Standards Toolkit process. Each unique
process can have an associated, unique SASReferences file. However, the SAS Clinical
Standards Toolkit offers many standardization aids, so more generic SASReferences
files are preferable.
The required SASReferences file structure is provided in Table 3.3 on page 42 and
example content is provided in Figure 3.5 on page 45.
The SAS Clinical Standards Toolkit offers several ways to create a SASReferences file
for use in subsequent processes.
1 Use sample SASReferences files that are provided with the SAS Clinical Standards
Toolkit. These sample SASReferences files contain the required and optional
contents for specific tasks. For example, the task of validating the functionality of
CDISC SDTM 3.1.2 uses the SASReferences file found here in SAS 9.3:
sample study library directory\cdisc-sdtm-3.1.2-1.7\
sascstdemodata\control
An excerpt of this sample SASReferences file is provided in Figure 3.5 on page 45.
2 The SAS Clinical Standards Toolkit provides SASReferences templates for use.
These templates are either zero-observation data sets or data sets containing
records that must be modified. A SASReferences data set template is located here:
global standards library directory/standards/
cst-framework-1.7/templates
The SAS Clinical Standards Toolkit provides default SASReferences data sets for
each supported standard. These default SASReferences data sets contain records
that are commonly required for certain SAS Clinical Standards Toolkit tasks (such as
validation). However, not every required record is necessarily included, and not every
included record is required for every task. In addition, SAS librefs, filerefs, paths,
and memname values might require modification. For example, see
the StandardSASReferences data set found here:
Building a SASReferences File 139
3 The SAS Clinical Standards Toolkit provides utility macros that build and return
many SAS Clinical Standards Toolkit metadata data sets.
n The %CST_GETSTANDARDSASREFERENCES macro returns the
StandardSASReferences data set. (See the file description in Chapter 3,
“Metadata File Descriptions,” on page 33 for the specified standard.)
n The %CST_CREATEDSFROMTEMPLATE macro can be used to return an
empty SASReferences data set.
Use of these utility macros is illustrated later in this chapter.
The primary function of the SASReferences file is to define the SAS Clinical Standards
Toolkit process inputs and outputs. What information does the process need to
reference? What does the process produce? Where does the information come from
and go? The “what” information is determined by the use of two SASReferences fields:
type and subtype. The “where” information is determined by path and memname. The
values for all of these fields are restricted for the SAS Clinical Standards Toolkit to
values itemized in the framework Standardlookup data set found here:
global standards library directory/standards/cst-framework-1.7/control/standardlookup.sas7bdat
You can customize the type and subtype values in the Standardlookup data set.
A custom value must be added to the Standardlookup data set before it can be used in any
SASReferences data set that is used by the SAS Clinical Standards Toolkit.
The following table lists and describes the acceptable type and subtype values in the
framework Standardlookup data set:
Table 6.1 SAS Clinical Standards Toolkit SASReferences Type and Subtype Values
Type              Subtype                Description
resultspackage    xml or log             This type is not used in the SAS Clinical
                                         Standards Toolkit. This type bundles a set
                                         of process inputs and outputs together for
                                         later access.
standardmetadata  attribute or element   Identifies the SAS data set templates for
                                         valid_attributes and valid_elements when
                                         validating ODM files.
Not every instance of the SASReferences file requires a specific path and filename.
At the beginning of this section, a call to this macro was described:
%cst_getstandardsasreferences(_cstStandard=CST-FRAMEWORK,
_cstStandardVersion=1.2,_cstOutputDS=sasreferences);
The following display shows that this macro call produces this SASReferences file:
The SASref values cstmeta and control point to the same path value. The control
SASref was retained to ensure backward compatibility with past
releases.
Figure 6.2 on page 148 shows the information returned by this call to
%CST_GETSTANDARDSASREFERENCES for the CDISC SDTM standard:
%cst_getstandardsasreferences(_cstStandard=CDISC-SDTM,
_cstOutputDS=sasreferences);
A comparison of Figure 6.1 on page 147 and Figure 6.2 on page 148 shows little
similarity in the record types and no overlap in references to specific files. The target
inputs and outputs for CDISC SDTM are more focused on the task (for example,
validating SDTM domains). The SAS Clinical Standards Toolkit validation processes
require specification of a comparative reference standard. Here, there are references to
a standard-specific macro library (autocall), Messages data set, and properties files.
Unique SASref values by type are provided, pointing to distinct files and folders in the
global standards library.
Consider an actual SASReferences file built to support CDISC SDTM 3.1.2 validation.
The task of validating the functionality of CDISC SDTM 3.1.2 uses the SASReferences
file here in SAS 9.3 and SAS 9.4:
sample study library directory\cdisc-sdtm-3.1.2-1.7\sascstdemodata\control
The following display shows the complete contents of the SASReferences file:
Table 6.2 Explanation of Sample SASReferences File for CDISC SDTM Validation
Lines Comment
1 Instructs the SAS Clinical Standards Toolkit to add any SDTM-specific macros to
the autocall path.
2 Documents the name and location of this file. This information is used in the
sample reports that are discussed in this document.
3 Points to the set of validation checks to be run in this validation assessment. The
framework default values for SASref, path, and memname have been overridden.
4, 22 Two standards are referenced to create a format search path. Line 4 references
the SDTM study-specific formats catalog. Line 22 references the more general
CDISC Controlled Terminology cterms catalog. The precedence is set by the
order column.
8 The validation properties path has been modified to point to a location in the
study hierarchy, rather than to the global standards library that is defined in the
StandardSASReferences file.
9–12, 14–15, 21, 24 Points to the reference standard for CDISC SDTM 3.1.2, but unlike the template
defaults in Figure 6.2 on page 148, path and memname are blank. Leaving them
blank tells the SAS Clinical Standards Toolkit to look in the CDISC SDTM 3.1.2
StandardSASReferences file and use the defaults for that standard and version.
This convention facilitates portability of the data set by doing a run-time lookup
for the current information. The lookup results in the inclusion of the path and
memname values as defined in Figure 6.2 on page 148.
16–17 Specifies that process results are to be stored in a location in the study hierarchy.
19–20 These values follow the style used in line 18 for source data. The same SASref is
used for multiple subtypes in a single type because the subtypes reference two
differently named SAS data sets from the same folder.
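The blank path and memname convention described for lines 9–12 can be pictured as a lookup against the defaults of the standard. The following is a minimal sketch in plain SAS, not the toolkit's own implementation (the toolkit performs this lookup through its own macros, such as %CST_INSERTSTANDARDSASREFS). The data set names work.std_defaults, work.study_refs, and work.resolved, the /hypothetical/... paths, and the sample record values are assumptions made for illustration only.

```sas
/* Toy stand-in for the default records of a standard (hypothetical values) */
data work.std_defaults;
  length standard $20 standardversion $20 type $40 subtype $40
         path $200 memname $48;
  infile datalines dsd truncover;
  input standard standardversion type subtype path memname;
  datalines;
CDISC-SDTM,3.1.2,referencemetadata,table,/hypothetical/gsl/metadata,reference_tables
CDISC-SDTM,3.1.2,referencemetadata,column,/hypothetical/gsl/metadata,reference_columns
;
run;

/* Toy study-level records: the first two leave path and memname blank */
data work.study_refs;
  length standard $20 standardversion $20 type $40 subtype $40
         path $200 memname $48;
  infile datalines dsd truncover;
  input standard standardversion type subtype path memname;
  datalines;
CDISC-SDTM,3.1.2,referencemetadata,table,,
CDISC-SDTM,3.1.2,referencemetadata,column,,
CDISC-SDTM,3.1.2,results,validationresults,/hypothetical/study/results,validation_results
;
run;

/* Fill blank path and memname values from the defaults; keep explicit values */
proc sql;
  create table work.resolved as
  select s.standard, s.standardversion, s.type, s.subtype,
         coalescec(s.path, d.path)       as path    length=200,
         coalescec(s.memname, d.memname) as memname length=48
  from work.study_refs as s
       left join work.std_defaults as d
       on  s.standard        = d.standard
       and s.standardversion = d.standardversion
       and s.type            = d.type
       and s.subtype         = d.subtype;
quit;

proc print data=work.resolved;
run;
```

Records that already carry a path and memname pass through unchanged, which is why a study-level SASReferences file can mix explicit study locations with look-through references to the global standards library.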
.
.
quit;
This macro copies the template. New records can be added various ways, including the
previous PROC SQL technique. There is no requirement that the SASReferences file
has to live outside the SAS Work area and be kept beyond the SAS Clinical Standards
Toolkit process. However, these are best practices that enable future capabilities such
as process reruns and reporting.
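As a hedged illustration of the approach just described, the following sketch creates an empty SASReferences data set from the framework template and then adds one record with PROC SQL. It assumes that the SAS Clinical Standards Toolkit autocall macros are available in the session. The %CST_CREATEDSFROMTEMPLATE parameter names shown here should be verified against the macro header in your installation, and the record values (the sdtmauto fileref in particular) are hypothetical.

```sas
/* Assumption: toolkit autocall macros are on the autocall path.           */
/* Assumption: parameter names match the installed macro; verify them.     */
%cst_createdsfromtemplate(
  _cstStandard=CST-FRAMEWORK,
  _cstType=control,
  _cstSubType=reference,
  _cstOutputDS=work.sasreferences
);

/* Add one record. Columns that are not listed are left missing.           */
/* The values below are illustrative and are not taken from a sample study.*/
proc sql;
  insert into work.sasreferences
    set standard        = "CDISC-SDTM",
        standardversion = "3.1.2",
        type            = "autocall",
        sasref          = "sdtmauto",
        reftype         = "fileref",
        iotype          = "input";
quit;
```

Writing the result to a permanent library instead of work.sasreferences follows the best practice noted above, because the file is then available for process reruns and reporting.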
Overview
After a SASReferences file has been created for a task, three key steps occur.
1 The name and location of the file must be communicated to the SAS Clinical
Standards Toolkit.
3 The file content is translated into allocated SAS libraries and filenames, system
options are set, and required work files are created.
After these steps are completed, a SAS environment has been properly established to
support subsequent SAS Clinical Standards Toolkit tasks.
Sample driver programs are provided with the SAS Clinical Standards Toolkit. These
driver programs show how to perform the necessary setup tasks for SAS Clinical
Standards Toolkit processes, and how to reference and use sample data that is
provided with the SAS Clinical Standards Toolkit.
The following table lists the parameters that are supported by the
%CSTUTIL_PROCESSSETUP macro:
Parameter Description
Except for the SAS Clinical Standards Toolkit reporting processes, use one of these two
methods to communicate the name and location of the SASReferences file:
Note: The SAS Clinical Standards Toolkit reporting processes might use the
_cstSASReferencesSource=RESULTS parameter.
1 Create and reference the SASReferences file in the SAS Work library.
%* The following call assumes the existence of work.sasreferences;
%cstutil_processsetup();
The call to the %CSTUTIL_SETCSTROOT macro sets the SAS Clinical Standards
Toolkit global macro variable &_cstSRoot to the sample library.
If you have used previous versions of the SAS Clinical Standards Toolkit, you might see
failures when you use the %CSTUTILVALIDATESASREFERENCES macro against
SASReferences data sets that were created in a version before the SAS Clinical
Standards Toolkit 1.5. These failures are caused by the stricter adherence to the
SASReferences metadata model that the %CSTUTILVALIDATESASREFERENCES
macro enforces.
Here is the syntax of this macro:
%macro cstutilvalidatesasreferences (_cstDSName=,
_cstStandard=, _cstStandardversion=, _cstSASRefsGoldStd=,
_cstallowoverride=, _cstResultsType=, _cstPreAllocated=,
_cstVerbose= );
_cstDSName specifies the two-level name of the data set to be validated. This value is
required. The default value is &_cstSASRefs derived from the process setup macro.
_cstStandard specifies the name of a registered data standard. This value is required.
The default value is CST-FRAMEWORK.
_cstallowoverride specifies one or more checks to skip. Specify the check codes in a
blank-delimited string (for example, CHK01 CHK07). If null,
all conditions are tested.
_cstResultsType specifies where to store report findings: in the SAS log or in the
Results data set. This value is required. It must be either LOG or RESULTS. The default
value is LOG.
_cstPreAllocated specifies whether librefs and filerefs have already been allocated when
this macro is called. If they are not allocated, the validation of data sets and catalogs is
performed based on paths and memnames, not on libref.memnames. This value is required. It
must be either N or Y. The default value is N.
This macro is typically used as a part of the normal process setup. It is called before
or as a part of %CSTUTIL_ALLOCATESASREFERENCES, or as a stand-alone
call outside the normal process setup. The macro sets the _cst_rc
and _cst_rcmsg global macro variables to indicate that the SASReferences data set is
valid (_cst_rc=0) or not valid (_cst_rc ne 0).
There are eight checks associated with this macro when validating a SASReferences
data set.
n CHK01: The data set is structurally correct.
n CHK02: An unknown standard or standardversion exists.
n CHK03: The referenced input and output files and folders can be accessed.
n CHK04: All required look-throughs to the global standards library defaults work.
n CHK05: All discrete character field values are found in the Standardlookup data set.
n CHK06: For the given context, path and memname macro variables are resolved.
n CHK07: Multiple fmtsearch records exist, but valid ordering is not provided.
n CHK08: Multiple autocall records exist, but valid ordering is not provided.
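A stand-alone call might look like the following sketch. It uses only the parameters listed in the syntax above and assumes that the toolkit autocall macros are available and that work.sasreferences already exists. Skipping CHK07 and CHK08 is an arbitrary choice made to show the blank-delimited override string, and parameters that are not shown (such as _cstSASRefsGoldStd and _cstVerbose) are left at their defaults.

```sas
/* Assumption: toolkit macros are available and work.sasreferences exists. */
%cstutilvalidatesasreferences(
  _cstDSName=work.sasreferences,
  _cstStandard=CDISC-SDTM,
  _cstStandardversion=3.1.2,
  _cstallowoverride=CHK07 CHK08,  /* skip the two ordering checks */
  _cstResultsType=LOG,            /* write findings to the SAS log */
  _cstPreAllocated=N              /* validate by path and memname  */
);

/* A value of 0 indicates a valid SASReferences data set. */
%put NOTE: _cst_rc=&_cst_rc _cst_rcmsg=&_cst_rcmsg;
```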
In the SAS Clinical Standards Toolkit 1.5, additional columns were included in the
SASReferences data set to facilitate internal validation. Two of these columns are iotype
and filetype. To remain backward compatible, if the SASReferences data set is missing
these two columns, CHK03 is ignored because the
%CSTUTILVALIDATESASREFERENCES macro assumes that the SASReferences
data set was created in a version before the SAS Clinical Standards Toolkit 1.5.
Results are written to the Results data set defined by the &_cstResultsDS global macro
variable.
Based on the value of iotype, the macro has detected a specified input file, data set,
or catalog that does not exist in the path provided by SASReferences. For iotype
equal to 'output' or 'both', the specified path is Read-Only and does not allow the
SAS Clinical Standards Toolkit to create an output file.
Correct this issue by ensuring that pathnames, filenames, data set names, and
catalog names are entered correctly. For output file references, ensure that the user
account has Write access permission to the folders that are specified in
SASReferences.
n CHK04 - Required look-throughs to the global standards library defaults do not work.
For this check to be meaningful, ensure that a call to
%CST_INSERTSTANDARDSASREFS has been performed before running this
check. Otherwise, empty pathnames might exist that are populated with a call to
%CST_INSERTSTANDARDSASREFS.
This check is not applicable to stand-alone use. This check detects pathnames that
are missing or null.
Correct this issue by verifying that the call to %CST_INSERTSTANDARDSASREFS
was made before running this check. Otherwise, provide a valid pathname for each
missing value.
n CHK05 - Not all discrete character fields were found in the Standardlookup data set.
This check detects missing or incorrect names for the following columns in
SASReferences: reftype, type+subtype combinations, iotype, filetype, and
allowoverwrite.
Note: Because iotype, filetype, and allowoverwrite were introduced in the SAS
Clinical Standards Toolkit 1.5, these columns are ignored when
&_cstCurrentStyle=0. (See check CHK03.)
Correct this issue by providing valid values for these columns in SASReferences. If
needed, update the Standardlookup data set.
Note: Updating the Standardlookup data set is an advanced use of the SAS Clinical
Standards Toolkit and should be performed by an administrator.
n CHK06 - For the given context, not all macro variables have been resolved.
This check detects unresolved macro variables used in the memname and path
columns.
Correct this issue by making sure all macro references used in SASReferences have
been resolved.
n CHK07 - Multiple fmtsearch records exist, but valid ordering is not provided.
The order in which the fmtsearch string is built is important for proper FMTSEARCH
functionality in SAS and for the proper functioning of the SAS Clinical Standards
Toolkit. This check detects multiple fmtsearch records with invalid order values.
Order values are invalid if they are missing or duplicated.
Correct this issue by assigning valid order values for multiple fmtsearch records.
n CHK08 - Multiple autocall records exist, but valid ordering is not provided.
The order in which the autocall macro string is built is important for proper AUTOCALL
macro functionality in SAS and for the proper functioning of the SAS Clinical Standards
Toolkit. This check detects multiple autocall records with invalid order values.
Order values are invalid if they are missing or duplicated.
Correct this issue by assigning valid order values for multiple autocall records.
5 Any property files are passed to %CST_SETPROPERTIES to create global macro variables.
6 The format search path is set if any type=fmtsearch records are found, based on the
order that is specified.
7 The autocall path is set if any type=autocall records are found, based on the order
that is specified. By default, the framework macro library was added to the autocall
path when the SAS Clinical Standards Toolkit was deployed.
8 A Messages data set is created to contain records from each standard, based on the
properties or global macro variables _cstMessages and _cstMessageOrder. The
Messages data set is used for the duration of the process to add fully resolved
messages to the Results data set.
After all of these steps have been performed, all libraries should be allocated, all paths
and global macros should be set, and the global status macro variable _cst_rc should
be set to 0. The process is ready to proceed.
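In plain SAS terms, steps 6 and 7 amount to setting the FMTSEARCH= and SASAUTOS= system options in the order given by the SASReferences order column. The following is a minimal sketch of that idea and not the toolkit's own code. The librefs srcfmt and cstfmt, the catalog names, and the use of the Work folder as a stand-in location are assumptions made for illustration.

```sas
/* Hypothetical references: every location points at the WORK folder here */
%let demo_path = %sysfunc(pathname(work));
libname srcfmt "&demo_path";
libname cstfmt "&demo_path";

/* Catalogs are searched in the order listed, so the study catalog takes  */
/* precedence over the controlled terminology catalog. SAS still searches */
/* WORK and LIBRARY first unless they are listed explicitly.              */
options fmtsearch=(srcfmt.formats cstfmt.cterms);

/* Add a standard-specific macro folder to the front of the autocall path */
options mautosource insert=(sasautos="&demo_path");

%put NOTE: FMTSEARCH=%qsysfunc(getoption(fmtsearch));
%put NOTE: SASAUTOS=%qsysfunc(getoption(sasautos));
```

Because precedence follows list position, missing or duplicate order values leave the search order undefined, which is the condition that checks CHK07 and CHK08 report.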
CAUTION! The SASReferences file is key to the process, and any error in it causes the
process to fail. This is a common process failure point because of the strict structural
and content expectations of the file. For
tips on debugging problems with the SASReferences file, see “Common Errors and
Solutions” on page 156.
TIP Best Practice Recommendation: Each SASReferences file is customized for the
specific task to be completed. Later sections describe SASReferences
implementations required by these specific tasks.
7
Compliance Assessment Against a Reference Standard
The SAS Clinical Standards Toolkit provides a framework to build a process. The
process uses inputs or process controls to evaluate the compliance of source data with
a reference standard. Each SAS Clinical Standards Toolkit process uses a SAS
program file to point to a SASReferences control data set, and to execute a primary
action SAS macro (such as %SDTM_VALIDATE). This SAS program file is referred to
as a driver program in this document.
Generally, validation is performed by running SAS macros against the standard, which
is represented by SAS files. Validation of some standards, such as CDISC CRT-DDS,
might include validating files that are not SAS files (such as define.xml).
164 Chapter 7 / Compliance Assessment Against a Reference Standard
The following display shows a SAS Clinical Standards Toolkit validation process:
o Reference Metadata is a set of SAS data sets that provide metadata. This
metadata defines a specific standard and is typically in a format specific to a
standard. For example, metadata about data sets might be captured in a
reference_tables data set. Metadata about columns in those data sets might be
captured in a reference_columns data set. For an example, see Table 5.1 on
page 95 and Table 5.2 on page 96.
o Properties are a series of name-value pairs that are translated into SAS global
macro variables. These macro variables are available for the duration of the SAS
Clinical Standards Toolkit process. Properties might be defined in any
number of files. Both text file format and SAS data set format are supported. For
information about a sample validation.properties file, see “Validation Check
Metadata: Validation Master” on page 173. For information about the SAS
Clinical Standards Toolkit global macro variables, see Appendix 1, “Global Macro
Variables,” on page 459.
o Set of Checks to Run is a set of checks that represent all or some of the checks
defined for a standard. Each check provides metadata that is used by the
validation code to perform a specific compliance assessment.
n Controlled Terminology is an optional set of lookup values against which source data
columns can be evaluated. These values can be in the form of SAS format catalogs
or SAS data sets.
n Results are presented in a Results data set that itemizes the process findings, and in
a Metrics data set that summarizes the results. The Results data set usually
contains a record indicating that each check was run successfully without error, or it
contains a record that itemizes the errors detected. Information about the process
also might be included. The generation of a Metrics data set is conditional based on
property file settings.
The SAS Clinical Standards Toolkit validation makes these basic assumptions:
1 There is some combination of source data and metadata available as SAS files that
you want to validate.
2 A reference standard has been defined with which the source data and metadata are
to be compared. The SAS Clinical Standards Toolkit provides representative
reference metadata for each supported standard.
3 The source data can be in any number of SAS files, and those SAS files can
have any form. However, the metadata describing the source data must accurately
represent the source data. The metadata must be in a form specific to a supported
standard and defined by the SAS Clinical Standards Toolkit.
4 A set of validation checks must be defined, and the validation checks must conform
to a generic SAS Clinical Standards Toolkit SAS data set structure. The SAS Clinical
Standards Toolkit provides a representative set of validation checks for each
supported standard.
Metadata Requirements
Overview
As noted in Chapter 5, “Supported Standards,” on page 87, a standard consists of
properties, messages, and metadata files that collectively represent the standard in the
SAS Clinical Standards Toolkit. Each SAS Clinical Standards Toolkit registered standard
can support validation if the standards.supportsvalidation flag is set to Y. This setting
indicates that the required set of validation files defining the standard exist. By default,
the set of validation files that supports the standards that are provided by SAS is in the
cstGlobalLibrary folder hierarchy.
For example, validation files that define the CDISC SDTM 3.1.3 standard are in this
folder hierarchy:
global standards library directory/standards/cdisc-sdtm-3.1.3-1.7
The following sections describe each metadata type used by typical validation
processes. For information about metadata files that are common to all SAS Clinical
Standards Toolkit processes, see Chapter 3, “Metadata File Descriptions,” on page 33.
Metadata characteristics specific to compliance assessments are described in the
sections in this chapter.
Metadata Requirements 167
Reference Metadata
For CDISC standards, reference metadata about data sets is defined in a
reference_tables data set, and metadata about columns is defined in a
reference_columns data set. An example of a CDISC SDTM reference_tables record is
provided in Table 7.1 on page 167 and an example of a CDISC SDTM
reference_columns record is provided in Table 7.2 on page 169.
Note: The structure and content of the reference metadata data sets can vary across
standards.
Column Name    Column Length    Description
sasref $8 The SAS libref that refers to the table in the SAS Clinical
Standards Toolkit process. This value should match the
value of the SASReferences.sasref field, where
type=referencemetadata and subtype=table. This column is
required.
table $32 The name of the tabulation domain or analysis data set
being defined in the standard. The value must conform to
SAS naming conventions. This column is required.
label $200 The label of the domain being defined in the standard. The
value must conform to SAS naming conventions. This
column is required for standards from which define.xml
metadata is derived.
xmlpath $200 The path to the SAS transport file. This path can be
specified as a relative path. The value can be used when
creating define.xml to populate the value for the def:leaf
xlink:href link to the domain file. The value should be the
pathname and filename of the SAS transport file relative to
the location of define.xml file. This column is optional and
not relevant for all standards.
xmltitle $200 The title of the SAS transport file. The value can be used
when creating a define.xml file to populate the value for the
def:leaf def:title value. It can provide a meaningful
description, label, or location of the domain leaf (for
example, crt/data sets/Protocol 1234/AE.xpt). This column is
optional and not relevant for all standards.
state $20 A description of the table state, such as Draft or Final. This
column is optional.
standard $20 This value captures the standard name. This value must
match the name of a registered standard in the SAS Clinical
Standards Toolkit framework. For a discussion of registered
standards, see Chapter 2, “Framework,” on page 7. This
value must match the standard field in the SASReferences
data set. Examples are CDISC SDTM and CDISC CRT-
DDS. This column is required.
comment $500 Any character string that provides comments relevant to the
table. This column is optional.
Note: The column length can vary to match submission requirements or corporate
conventions.
sasref $8 The SAS libref that refers to the table containing the
column in the SAS Clinical Standards Toolkit process.
This value should match the value of the
SASReferences.sasref field, where
type=referencemetadata and subtype=column. This
column is required.
table $32 The name of the tabulation domain or analysis data set
being defined in the standard. The value must conform
to SAS naming conventions. This column is required.
column $32 The name of the column in the table. The value must
conform to SAS naming conventions. This column is
required.
label $200 The label of the column. The value must conform to
SAS naming conventions. This column is required for
standards from which define.xml metadata is derived.
displayformat $32 The display format for numeric variables. For example,
8.2 indicates that floating-point variable values should
be displayed to the second decimal place. This value is
optional and not relevant for all standards.
origin $40 Information about the source of the column. Values can
include CRF page numbers and derived or variable
references. Values are user extensible. This column is
optional and not relevant for all standards.
standard $20 This value captures the standard name. This value
must match the name of a registered standard in the
SAS Clinical Standards Toolkit framework. For a
discussion of registered standards, see Chapter 2,
“Framework,” on page 7. This value must match the
standard field in the SASReferences data set.
Examples are CDISC SDTM and CDISC CRT-DDS.
This column is required.
Note: The column length can vary to match submission requirements or corporate
conventions.
The standard reference metadata provided with the SAS Clinical Standards Toolkit is in
the global standards library. By default, this library is located here:
global standards library directory/standards/<specific standard>/metadata
For example, for the CDISC SDTM 3.1.3 standard, the location is:
global standards library directory/standards/cdisc-sdtm-3.1.3-1.7/metadata
This global standards library metadata folder can contain other standard-specific
metadata. For example, CDISC SDTM includes class_tables and class_columns data
sets. These data sets have more generic metadata than specific domain instances like
DM or AE, and they are most useful when deriving new, custom domains. For example,
if a new CDISC SDTM events domain is required, you can initialize table metadata
based on the EVENTS record in the class_tables data set, and can initialize column
metadata based on the EVENTS, IDENTIFIERS, and TIMING records in the
class_columns data set.
Source Metadata
The SAS Clinical Standards Toolkit validation processes require source metadata that
describes source (study) domains and columns. This is the study data that is to be
validated. The SAS Clinical Standards Toolkit assumes that the reference metadata
The SAS Clinical Standards Toolkit assumes that source_tables and source_columns
data sets accurately reflect and are consistent with the source data that they describe.
Although some standard-specific validation checks might look for discrepancies and
report them in detail, failure to accurately reflect and be consistent with the source data
can lead to errors in the SAS Clinical Standards Toolkit validation process. It can even
halt the execution of the process.
The SAS Clinical Standards Toolkit requires that this data set have a fixed structure.
The following table lists the columns in the Validation Master data set:
Column Name    Column Length    Description
standard $20 This value captures the standard name. This value
must match the name of a registered standard in the
SAS Clinical Standards Toolkit framework. For a
discussion of registered standards, see Chapter 2,
“Framework,” on page 7. This value must match the
standard field in the SASReferences data set.
Examples are CDISC SDTM and CDISC CRT-DDS.
This column is required.
checksource $40 A string that identifies the source of the check. CDISC
examples include SAS, WebSDM, and CDISC. This
field can contain any user-defined value. A primary use
of this field is to subset the full set of checks in the
run-time Validation Control data set. This column is
required.
codesource $32 The name of the check macro. The name must conform
to SAS naming conventions. The value must be in the
SAS autocall path. An example is
%CSTCHECK_NOTUNIQUE. This column is required.
lookuptype $20 This value defines the type of information to use for
value comparison to some standard. Values include:
Metadata: Use the SAS Clinical Standards Toolkit
metadata. Specifically, use the value of the column
metadata field xmlcodelist to identify the codelist
(rendered as a SAS format).
Format: Use a SAS format from the SAS format search
path.
Dataset: Use a reference SAS data set (for example,
MedDRA). There are no SAS Clinical Standards Toolkit
requirements for the structure and content of the
reference SAS data set.
<extensible>: Other user-defined values can be used if
they are explicitly referenced in user-written code.
This column is optional.
The content of the Validation Master data set is based on a combination of compliance
requirements and the SAS representation of the standard.
The following table describes a sample Validation Master data set record for the CDISC
SDTM 3.1.3 standard:
Table 7.4 Sample CDISC SDTM 3.1.3 Validation Master Data Set Record
checkseverity Warning
checktype Column
lookuptype
lookupsource
standardref
reportingcolumns
checkstatus 1
uniqueid SDTM086001CST150SDTM3122012-06-08T10:49:21CST
comment
The Validation Master data set contains all validation checks for a standard, whereas
the Validation Control data set is the run-time equivalent and contains just the validation
checks to be run in a validation process. The Validation Control data set is structurally
equivalent to the Validation Master data set. For additional information about how the
validation check metadata in the Validation Control data set is used in the SAS Clinical
Standards Toolkit validation processes, see “Special Topic: How the SAS Clinical
Standards Toolkit Interprets Validation Check Metadata” on page 236.
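Because the two data sets are structurally equivalent, a Validation Control data set can be derived by subsetting the Validation Master data set. The following is a minimal sketch that uses a toy data set rather than a toolkit-supplied file. The data set names, the DEMO check identifiers, the checkid column name, and the rule that a positive checkstatus marks an active check are assumptions made for illustration.

```sas
/* Toy stand-in for a Validation Master data set (hypothetical records) */
data work.validation_master_demo;
  length checkid $8 checksource $40 checkseverity $40 checkstatus 8;
  infile datalines dsd truncover;
  input checkid checksource checkseverity checkstatus;
  datalines;
DEMO0001,CDISC,Warning,1
DEMO0002,SAS,Error,1
DEMO0003,CDISC,Note,0
;
run;

/* Run-time subset: keep only active checks from one check source */
data work.validation_control_demo;
  set work.validation_master_demo;
  where checksource = "CDISC" and checkstatus > 0;
run;

proc print data=work.validation_control_demo;
run;
```

The same pattern extends to other columns, such as checkseverity or checktype, when a validation process should run only part of the full set of checks.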
Column Name    Column Length    Description
standard $20 This value captures the standard name. This value must
match the standard in the associated Validation Master
data set. This column is required.
The content of the Validation_StdRef data set is based on information from any source
that supports the check.
Table 7.6 Sample CDISC SDTM 3.1.3 Validation_StdRef Data Set for Check SDTM0860 —
Record 1
Table 7.7 Sample CDISC SDTM 3.1.3 Validation_StdRef Data Set for Check SDTM0860 —
Record 2
build a list of target tables and columns. For more information, see “Special Topic: How
the SAS Clinical Standards Toolkit Interprets Validation Check Metadata” on page 236.
The Validation_DomainsByCheck data set is located here:
global standards library directory/standards/cdisc-sdtm-3.1.x/validation/control
It contains records for each domain to be validated by each check in the Validation
Master data set. This data set is used by reporting tools that are provided with the SAS
Clinical Standards Toolkit to report domain-specific errors. For more information, see
Chapter 11, “Reporting,” on page 443. It is also available to other programs and
applications that might need to subset checks that are applicable to specific domains.
The SDTM version of the Validation_DomainsByCheck data set that is provided by SAS
is built from the version of the Validation Master data set that is also provided by SAS. If
the tableScope and columnScope columns are modified, then the
Validation_DomainsByCheck data set must also be modified or rebuilt.
Column Name    Column Length    Description
table $32 This value captures the domain or table name. This
column is required.
checksource $40 A string that identifies the source of the check. This
value must match checksource in the associated
Validation Master data set.
This data set is patterned after the data set that is described in Table 7.8 on page 188.
However, the column class ($40, Observation Class within Standard) has been added.
This addition accommodates the different way that the ADaM reference standard is
defined. For example, the reference_tables data set, located in
/standards/cdisc-adam-2.1-1.7/metadata, includes a BDS record that serves as a class template for
all specific implementations of BDS that are required for a study. The SAS Clinical
Standards Toolkit does not know each of the specific analysis data sets, so the
Validation_ClassByCheck data set includes records by class, not by domain, for each
check in the ADaM Validation Master data set.
Validation.Properties
Properties specific to validation processes are provided with the SAS Clinical Standards
Toolkit. These properties enable you to specify how validation checks are to be
processed and whether metrics are to be reported.
As with all SAS Clinical Standards Toolkit properties files, a call to the
%CST_SETPROPERTIES macro is required to translate the properties into SAS global
macro variables. This call can be explicitly made as a driver program setup task, or it
can be made by including the Validation.Properties file as a record in the
SASReferences data set. For all standards that support validation, the
Validation.Properties file is required even if no metrics are wanted, because the SAS
Clinical Standards Toolkit validation process expects and uses the metrics global
macro variables.
_cstMetricsDS This property sets the SAS data set name to use to
accumulate metrics during the process. The default value
is work._cstmetrics.
By default, for all standards that support validation, Validation.Properties is located here:
global standards library directory/standards/<standard>/programs
Properties can logically be associated with each study. Using the CDISC SDTM 3.1.3
sample study provided with the SAS Clinical Standards Toolkit as an example, a study-
specific instance of the Validation.Properties file is located here: sample study
library directory/cdisc-sdtm-3.1.3-1.7.
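As a rough sketch of the explicit driver-program call described above, the following assumes that the study-specific file is named validation.properties and sits in a programs folder under a hypothetical study root. The parameter names _cstPropertiesLocation and _cstLocationType are an assumption recalled from the toolkit's macro documentation; confirm them in the SAS Clinical Standards Toolkit: Macro API Documentation before relying on them.

```sas
/* Assumption: illustrative study root; replace with a real location */
%let studyRootPath=/path/to/study;

/* Translate each name=value pair in the properties file into a */
/* SAS global macro variable (parameter names are assumed)      */
%cst_setproperties(
  _cstPropertiesLocation=&studyRootPath/programs/validation.properties,
  _cstLocationType=PATH);

/* The metrics data set name should now be available to the process */
%put NOTE: _cstMetricsDS=&_cstMetricsDS;
```

The alternative described above, adding the properties file as a record in the SASReferences data set, avoids the explicit call altogether.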
Messages
Each SAS Clinical Standards Toolkit registered standard that supports validation has a
Validation Master data set and an associated Messages data set. The Validation
Master data set provides the superset of checks defined for that standard. The
Messages data set provides messages to be generated during the execution of each
validation process. A distinct Messages data set record is expected for each combination of
checkid and checksource values in the Validation Master data set. Messages can be
parameterized and internationalized.
By default, the standard-specific Messages data set is deployed to this directory in each
supported standard:
global standards library directory/standards/<standard>/messages
All Messages data sets in the SAS Clinical Standards Toolkit should have the same
structure. The structure is defined in Chapter 3, “Metadata File Descriptions,” on page
33.
During a process, the SAS Clinical Standards Toolkit appends any standard-specific
messages that are required by the process to any generic SAS Clinical Standards
Toolkit framework messages that are available to all processes. This appended
Messages data set follows the naming convention that is defined within the global
macro variable _cstMessages.
Validation Metrics
Generating the SAS Clinical Standards Toolkit validation metrics provides a meaningful
denominator for most validation checks. This enables you to more accurately assess
the relative scope of errors that are detected. Generally, the calculated denominator is a
count of the number of records processed in a domain.
This code segment, which is extracted from a validation check macro, shows a typical
calculation of the number of records in a domain. It also shows the macro call to add the
count to the Validation Metrics data set:
data _null_;
if 0 then set &_cstDSName nobs=_numobs;
call symputx('_cstMetricsCntNumRecs',_numobs);
stop;
run;
Because a check can evaluate multiple columns in a domain, the count can be greater
than the number of records in that domain.
In addition, a metadata-level check that does not access the domain data directly might
report the number of metadata records instead.
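To illustrate how such a count serves as a denominator, here is a minimal, self-contained sketch. It uses sashelp.class as a stand-in for a domain data set and an arbitrary rule (age under 12) as a stand-in for a validation check. The macro variable _problemCnt and the rule itself are hypothetical and are not part of the toolkit.

```sas
/* Hypothetical stand-in for a domain data set */
%let _cstDSName=sashelp.class;

/* Denominator: number of records, read from the data set header */
data _null_;
  if 0 then set &_cstDSName nobs=_numobs;
  call symputx('_cstMetricsCntNumRecs',_numobs);
  stop;
run;

/* Numerator: records that fail an illustrative rule */
proc sql noprint;
  select count(*) into :_problemCnt trimmed
  from &_cstDSName
  where age < 12;
quit;

/* Express the problem count relative to the records tested */
%let _pct=%sysevalf(100*&_problemCnt/&_cstMetricsCntNumRecs);
%put NOTE: &_problemCnt of &_cstMetricsCntNumRecs records failed (&_pct percent).;
```

Reporting the problem count alongside the number of records tested is what lets a reviewer judge whether a finding is isolated or widespread.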
The following table provides a description of the Validation Metrics data set, including
the meaning of each field:
Column Name | Column Length | Description
srcdata | $200 | The string that specifies the domain or check macro to which the metricparameter applies. Values should be non-null.
The following display shows the Validation Metrics output from a SAS Clinical Standards
Toolkit validation process running CDISC SDTM validation. The Validation Control data
set contains 11 validation checks.
The missing reccount value in line 90 and the absence of other metrics for SDTM0815
indicate that the check was not run. (SDTM0815 evaluates the value of the POOLID
column, which is not used in any non-POOLDEF domain in the sample study provided
by SAS.) This should be reported in the Results data set.
Lines 93 through 95 report metrics on the SDTM0860 validation check. Two problems
are reported in the Results data set for a single subject, and these metrics (16 subjects
and 36 records tested) provide denominator information to assess how common the
problems are.
Lines 96 through 102 are summary metrics reported at the end of the SDTM validation
process in the %SDTM_VALIDATE macro. The following five problems are noted:
n one check (SDTM0815) could not be run
n two of the three warnings were for SDTM0860
n one other warning and one error condition were found
The Validation Results and Validation Metrics data sets, when used in tandem, provide
a more complete picture of each compliance assessment.
For more information about the Validation Metrics data set, see Table 7.10 on page 193.
Cross-Standard Validation
Overview
The implementation of the ADaM 2.1 standard in the SAS Clinical Standards Toolkit
requires the use of a number of cross-standard validation checks. These cross-standard
validation checks compare data and metadata between two different standards, such as
ADaM 2.1 and SDTM 3.1.2.
The SAS Clinical Standards Toolkit provides two macros that enable cross-standard
comparisons: cstcheck_crossstdcomparedomains.sas and
cstcheck_crossstdmetamismatch.sas. These macros are located here:
!sasroot/cstframework/sasmacro.
The %CSTCHECK_CROSSSTDCOMPAREDOMAINS Macro
The %CSTCHECK_CROSSSTDCOMPAREDOMAINS macro compares values for one
or more columns in one table with those same columns in another domain in another
standard. Or, it compares the values against metadata from the comparison standard.
The macro requires use of _cstCodeLogic as a full DATA step or PROC SQL invocation.
This DATA or SQL step assumes as input a work copy of the column metadata data set
returned by the %CSTUTIL_BUILDCOLLIST macro. Any resulting records in the
derived data set represent errors to be reported.
n ADaM SDTM domain reference (for traceability), but the SDTM domain is unknown
An ADaM 2.1 validation check that uses this macro is ADAM0653. Here is the rule
description for this check:
The %CSTCHECK_CROSSSTDMETAMISMATCH Macro
The %CSTCHECK_CROSSSTDMETAMISMATCH macro identifies inconsistencies in
metadata across registered standards. The macro requires use of _cstCodeLogic as a
full DATA step or PROC SQL invocation. This DATA step or SQL step assumes as input
a work copy of the column metadata data set returned by the
%CSTUTIL_BUILDCOLLIST macro. Any resulting records in the derived data set
represent errors to be reported.
Assumptions:
3 The _cstProblems data set includes at least two columns. The mnemonics are from
the global standards library data set:
n &_cstStMnemonic._value (for example, ADAM_value containing the value of the
column of interest from the primary standard)
n &_cstCrMnemonic._value (for example, SDTM_value containing the value of the
column of interest from the comparison standard)
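The shape of such a _cstProblems data set can be sketched with toy data. The tables work.adsl and work.dm, their contents, and the comparison on AGE are hypothetical, and the two mnemonic macro variables are set by hand here only so that the example runs on its own; in a real process they are assumed to be supplied by the check macro. This is not the codeLogic shipped with the toolkit.

```sas
/* Toy stand-ins for a primary-standard and a comparison-standard table */
data work.adsl;
  length usubjid $10;
  input usubjid $ age;
  datalines;
S001 34
S002 51
;
run;

data work.dm;
  length usubjid $10;
  input usubjid $ age;
  datalines;
S001 34
S002 52
;
run;

/* Set by hand for illustration only */
%let _cstStMnemonic=ADAM;
%let _cstCrMnemonic=SDTM;

/* Each row written to _cstProblems is a mismatch to be reported */
proc sql;
  create table work._cstProblems as
  select a.usubjid,
         a.age as &_cstStMnemonic._value,
         b.age as &_cstCrMnemonic._value
  from work.adsl as a
       inner join work.dm as b
       on a.usubjid = b.usubjid
  where a.age ne b.age;
quit;
```

Carrying both values side by side lets the reported record show what was found in each standard.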
An ADaM 2.1 validation check that uses this macro is ADAM0651. Here is the rule
description for this check, taken from the CDISC ADaM Validation document:
“ADaM column with a column name prefix of 'AE' not found in SDTM”
The full codeLogic PROC SQL step for ADAM0653 is located here:
global standards library directory/standards/cdisc-adam-2.1-1.7/validation/control/validation_master.sas7bdat
Overview
Building a SAS Clinical Standards Toolkit validation process is similar to building any
SAS Clinical Standards Toolkit process. The differences are as follows: the validation
process inputs and outputs, as defined in the SASReferences data set, can differ; a
standard-specific validate macro is called; and process output can include an optional
Metrics data set.
This table shows the standard-specific validation macros for all SAS Clinical Standards
Toolkit standards that support validation.
SASReferences Customizations
A SAS Clinical Standards Toolkit validation process requires that you specify a
reference standard with which the source data and metadata can be compared. The
following display shows the three records, specific to the standard and standardversion
of interest, that should be included in the SASReferences data set:
Figure 7.3 Defining the Reference Standard in the SASReferences Data Set
The empty path field signals that the path and memname information should be derived
from the StandardSASReferences data set associated with the standard and
standardversion. Including the referencecontrol and referencemetadata records is
unique to validation processes in the SAS Clinical Standards Toolkit.
The SAS Clinical Standards Toolkit validation can include references to these files:
Figure 7.4 Defining the Validation-Specific Properties File in the SASReferences Data Set
Figure 7.5 Defining the Metrics Output Location in the SASReferences Data Set
The Metrics data set provides a summary of the validation process, including error
counts, processing time, and denominators for specific checks. For a complete
discussion of validation metrics, see “Validation Metrics” on page 192 and “Validation
Results and Metrics” on page 212. For information about the global macro variables
that govern metrics output, see Appendix 1, “Global Macro Variables,” on page 459.
The Metrics data set is typically output to the same location as the validation Results
data set. This location is common to all SAS Clinical Standards Toolkit processes.
3 The location of any libraries containing controlled terminology, format catalogs, and
coding dictionary data sets.
Figure 7.6 Defining the Location of Controlled Terminology in the SASReferences Data Set
The type=fmtsearch records enable you to specify multiple format catalogs (for
example, company-wide, compound, group-level, and study-level). Order in the
format search path is set by the order field. The type=referencecterm record
enables you to specify one or more lookup data sets (such as dictionary lookups like
LOINC and MedDRA). These lookup data sets do not need to conform to a specific
structure, and they do not need to be in a structure that can be read into a SAS
format. Customized code (typically in the Validation Master codelogic field) is
required to join domain data with each associated lookup data set.
Figure 7.7 Defining the Run-Time Validation Control Location in the SASReferences Data Set
The Validation Control data set is required and discussed in the following section.
A sample CDISC SDTM 3.1.3 Validation Control data set is deployed to this directory:
sample study library directory/cdisc-sdtm-3.1.3-1.7/
sascstdemodata/control
The &studyRootPath value is assumed to have been set to sample study library
directory/cdisc-sdtm-3.1.3/sascstdemodata.
The Validation Master data set (illustrated in Figure 7.3 on page 200 and in this display)
serves as the source for Validation Control content. Note that in this display, the path
and memname information has been derived from the StandardSASReferences data
set and points to the global standards library.
The following table provides examples of how to create a Validation Control data set
from the Validation Master data set. The sample code is written assuming that the code
will be submitted in a context where libraries have been allocated and the format search
and autocall paths have been set.
Check
Subset Sample Code
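As one illustration of deriving a Validation Control data set by subsetting, the sketch below builds a toy Validation Master in the Work library and keeps only the active checks from one source. The check IDs, the column lengths, and the use of the Work library are hypothetical; real Validation Master data sets contain many more columns and are read from the allocated reference library.

```sas
/* Toy Validation Master; the real data set has many more columns */
data work.validation_master;
  length checkid $8 checksource $40 checkstatus 8;
  input checkid $ checksource $ checkstatus;
  datalines;
CHK0001 SAS 1
CHK0002 WebSDM 1
CHK0003 SAS 0
;
run;

/* Keep only active checks (checkstatus > 0) from a single source */
data work.validation_control;
  set work.validation_master
      (where=(checksource="SAS" and checkstatus > 0));
run;
```

Any WHERE condition on Validation Master columns can be used in the same way to select the checks to run.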
Generally, the SAS Clinical Standards Toolkit processes validation checks in the order
in which they appear in the Validation Control data set. Each validation process honors
the default validation property _cstCheckSortOrder. If this property is not set, then the
data set order is assumed. As a part of the Validation Control derivation, checks can be
sorted in any user-defined order. Or, _cstCheckSortOrder can be set to sort the
Validation Control data set at run time by any fields in that data set.
TIP Best Practice Recommendation: You might find the prioritization of checks to be
helpful in identifying problems early in the process, or for using as prerequisites for
checks that follow.
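The two ordering approaches described above might look like the following sketch. The toy data set and check IDs are hypothetical, and the space-delimited list of column names assigned to _cstCheckSortOrder is an assumed syntax based on the statement that the property sorts by fields in the Validation Control data set.

```sas
/* Toy Validation Control data set */
data work.validation_control;
  length checkid $8 checksource $40;
  input checkid $ checksource $;
  datalines;
CHK0003 SAS
CHK0001 WebSDM
CHK0002 SAS
;
run;

/* Approach 1: fix the order when the data set is derived */
proc sort data=work.validation_control;
  by checksource checkid;
run;

/* Approach 2 (assumed syntax): request a run-time sort by these fields */
%let _cstCheckSortOrder=checksource checkid;
```

Sorting at derivation time makes the order visible in the data set itself, while the property keeps the stored data set unchanged.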
These changes should be made before the process setup begins (as changes to the
properties file), or after the process setup ends (as a series of %let statements in the
code stream).
These macro variables are used as substitution parameters later in the driver program
to reduce the number of code changes required.
%cst_setStandardProperties(_cstStandard=CST-FRAMEWORK,_cstSubType=initialize);
Initialize the minimum set of global macro variables used to run any SAS Clinical
Standards Toolkit process. This includes the names of work data sets, default locations
of files, and metadata used to populate the process Results data set.
Each registered standard should have its own initialize.properties. For each standard
that is included in a specific process, the %CST_SETSTANDARDPROPERTIES macro
can be called at this point. Alternatively, type=properties records can be added to the
SASReferences data set, and the properties are processed when the
%CSTUTIL_ALLOCATESASREFERENCES macro is called. This latter approach is
followed in the SDTM validate_data.sas driver program.
%cst_getRegisteredStandards(_cstOutputDS=work._cstStandards);
data _null_;
set work._cstStandards (where=(standard="CST-FRAMEWORK"));
call symputx('_cstVersion',strip(productrevision));
run;
Get the list of registered standards to determine the version of the SAS Clinical
Standards Toolkit.
* Set Controlled Terminology version for this process *;
%cst_getstandardsubtypes(_cstStandard=CDISC-TERMINOLOGY,_cstOutputDS=work._cstStdSubTypes);
data _null_;
set work._cstStdSubTypes (where=(standardversion="&_cstStandard" and isstandarddefault='Y'));
* User can override CT version of interest by specifying a different where clause: *;
* Example: (where=(standardversion="&_cstStandard" and standardsubtypeversion='201104')) *;
call symputx('_cstCTPath',path);
call symputx('_cstCTMemname',memname);
call symputx('_cstCTDescription',description);
run;
Choose the default controlled terminology that is associated with the _cstStandard and
_cstStandardVersion. Clean up work files.
*********************************************************************************************;
* The following data step sets (at a minimum) the studyrootpath and studyoutputpath. These *;
* are used to make the driver programs portable across platforms and allow the code to be *;
* run with minimal modification. These macro variables by default point to locations within *;
* the cstSampleLibrary, set during install but modifiable thereafter. The cstSampleLibrary *;
* is assumed to allow write operations by this driver module. *;
*********************************************************************************************;
%cstutil_setcstsroot;
data _null_;
call symput('studyRootPath',cats("&_cstSRoot",
"/cdisc-sdtm-3.1.3-&_cstVersion/sascstdemodata"));
call symput('studyOutputPath',cats("&_cstSRoot",
"/cdisc-sdtm-3.1.3-&_cstVersion/sascstdemodata"));
run;
The workPath value provides the path to the Work directory. This directory is referenced
within the sample study SASReferences data set path column. It is not required.
*****************************************************************************************;
* One strategy to defining the required library and file metadata for a CST process *;
* is to optionally build SASReferences in the WORK library. An example of how to do *;
* this follows. *;
* *;
* The call to cstutil_processsetup below tells CST how SASReferences will be provided *;
* and referenced. If SASReferences is built in work, the call to cstutil_processsetup *;
* may, assuming all defaults, be as simple as %cstutil_processsetup() *;
*****************************************************************************************;
*****************************************************************************************;
* Build the SASReferences data set *;
* column order: standard, standardversion, type, subtype, sasref, reftype, iotype, *;
* filetype, allowoverwrite, relpathprefix, path, order, memname, comment *;
* note that &_cstGRoot points to the Global Library root directory *;
* path and memname are not required for Global Library references - defaults will be used*;
******************************************************************************************;
%cst_createdsfromtemplate(_cstStandard=CST-FRAMEWORK, _cstType=control,_cstSubType=reference,
_cstOutputDS=work.sasreferences);
proc sql;
insert into work.sasreferences
values ("CST-FRAMEWORK" "1.2" "messages" "" "messages" "libref" "input" "dataset"
For an explanation of the purpose and content of each SASReferences file, see Chapter
6, “SASReferences File,” on page 137. For a fully initialized SASReferences data set for
SDTM validation, see Figure 6.3 on page 149.
Note: For more information about the %CSTUTIL_PROCESSSETUP macro, see the
SAS Clinical Standards Toolkit: Macro API Documentation.
in the validate_data.sas driver reflects the acceptance of the macro parameter defaults
listed above.
*********************************************************************;
* Set global macro variables for the location of the sasreferences *;
* file (overrides default properties initialized above *;
*********************************************************************;
%let _cstSASRefsName=&_cstSASReferencesName;
%let _cstSASRefsLoc=&_cstSASReferencesLocation;
The final setup step for the %CSTUTIL_PROCESSSETUP macro is a call to the
%CSTUTIL_ALLOCATESASREFERENCES utility macro. The SASReferences data set
is now interpreted by the SAS Clinical Standards Toolkit. These actions complete the
process:
3 All filerefs and librefs are allocated. (This action is contingent on the
_cstReallocateSASRefs property or global macro variable value).
5 The format search path is set if any type=fmtsearch records are found. This is based
on the order specified.
6 The autocall path is set if any type=autocall records are found. This is based on the
order specified.
7 A Messages data set is created to contain records from each referenced standard.
This data set is based on the _cstMessages and _cstMessageOrder properties or
global macro variable values. This data set is used for the duration of the process to
add fully resolved messages to the Results data set.
At this point, all libraries should be allocated, all paths and global macros should be set,
and the global status macro variable _cst_rc should be set to 0. The process is ready to
proceed.
CAUTION! The SASReferences data set is key to the process, and any errors in it will
cause the process to fail, which makes it a common failure point. For tips on
debugging problems with the
SASReferences data set, see “Special Topic: Debugging a Validation Process” on page
244 and “Assessing Structural Integrity and Content” on page 153.
1 The macro looks up the Validation Control data set reference from SASReferences.
2 The macro re-sorts the Validation Control data set based on the _cstCheckSortOrder
property or global macro variable value. This step is optional.
3 Metadata about the validation process, such as the standard/version, key files
referenced, and process datetimes, is added to the process Results data set.
4 For each check in the Validation Control data set with a checkstatus > 0, this macro
calls the check macro specified in the Validation Control codesource field. It passes
all of the check metadata to the check macro.
For tips on debugging if unexpected errors occur, see “Special Topic: Debugging a
Validation Process” on page 244.
This step is optional, and it is unnecessary with batch processing. You should not clean
up prematurely or aggressively if additional SAS Clinical Standards Toolkit processes
are to be run in the same interactive SAS session.
Note: For more information about the %CSTUTIL_CLEANUPCSTSESSION macro,
see the SAS Clinical Standards Toolkit: Macro API Documentation.
Figure 7.9 on page 213 summarizes a sample validation process. Here are a few facts
about the sample validation process:
1 The validation process was run on CDISC SDTM 3.1.3 source data.
2 It referenced a Validation Control data set that contained metadata for four checks.
Note: In these displays, some rows have been hidden to reduce redundant
examples.
Table 7.13 Comments about the Validation Results Data Sets in Displays 7.9 and 7.10
Lines Comment
5-6 Informational notes that inform you that the process SASReferences data
set passed internal validation using the
%CSTUTILVALIDATESASREFERENCES macro called from two different
macros.
21 Check SDTM0815 did not run. The check scope as defined in tableScope
and columnScope found no domains other than POOLDEF in the sample
study that contained the column of interest (POOLID).
22-23 A single problem was detected for each of the SDTM0816 and SDTM0860
checks. Actual column values and key values for the problem records are
reported to aid in problem resolution.
For a description of the Validation Metrics data set that is associated with this example
compliance assessment, see Figure 7.2 on page 195.
A set of sample validation reports is available to summarize the SAS Clinical Standards
Toolkit validation process results and metrics. For more information, see Chapter 11,
“Reporting,” on page 443.
Overview
The SAS Clinical Standards Toolkit provides a set of defined checks for each standard
whose supportsvalidation flag is set to “Y” in the Standards data set in global standards
library directory/metadata. By default, each Validation Master data set is
located here:
global standards library directory/standards/<standard>/
validation/control
Table 7.14 Summary of Checks in Each validation_master Data Set That Is Provided with the
SAS Clinical Standards Toolkit
ADaM 2.1 63 56 13
CRT-DDS 1.0 83 12 7
CT 1.0.0 34 14 7
SDTM 3.1.2 26 26 8
SDTM 3.1.3 48 46 11
SDTM 3.2 52 49 11
CST-FRAMEWORK 137 92 11
Note: Starting with the SAS Clinical Standards Toolkit 1.7, OpenCDISC checks have
been removed from the validation_master data sets for CDISC-SDTM and CDISC-
ADaM.
ADaM 2.1
The CDISC ADaM validation checks are derived from the SAS interpretation of the
CDISC ADaM Validation Checks Version 1.0 (final production version dated September
20, 2010) and the CDISC ADaM Validation Checks Version 1.1 maintenance release
(dated and released January 21, 2011 to correct errors and remove duplicate checks).
Excluding the OpenCDISC checks leaves 11 CDISC-defined checks in the SAS Clinical
Standards Toolkit.
In addition, SAS has added 45 unique checks (52 total records) to the Validation Master
data set. These checks can be identified where checksource=“SAS”.
ADaM data sets are typically derived from a tabulation study, such as SDTM or SEND.
Some checks require the comparison of ADaM content with data and metadata from the
tabulation source. Of the 63 validation_master records, 10 involve a comparison with
another CDISC standard such as SDTM 3.1.3.
The validity of CRT-DDS data is determined by the standard in the form of XML schema
definitions. These XML schema definitions must be translated into checks appropriate
for the relational and tabular format.
The SAS Clinical Standards Toolkit provides 83 CDISC CRT-DDS validation checks.
These validation checks were developed by SAS and are based on CRT-DDS and ODM
implementation experience and careful review of the associated implementation guides,
with special emphasis on the occurrence of “should” within each implementation guide.
Table 7.15 on page 218 lists the types of checks for CRT-DDS data.
Each check type is assumed to operate on data that exists in a source column in a
source data set. A check type can reference one or more parameters that validate the
source column data. A parameter can be a character string or a representation of some
column other than the source column against which the source column data must be
compared.
All character comparisons are case sensitive. Character data is assumed to have been
trimmed of leading or trailing white space.
Unique in data set | Structural | No two values for the source column can be the same in the same source data set.
Required character value | Data | The trimmed (white space removed) value of the character data must consist of one or more characters.
Required numeric value | Data | The numeric value of the column cannot be missing.
Foreign key required(targetColumn) | Structural | A value is required for this column in every row. Each value must have an equivalent value in the target column. This check is equivalent to running the required character value check first: if that check fails, this check fails. If the required character value check passes, the foreign key check is run.
Character format: fileName | Data | The character data must not contain any characters other than uppercase and lowercase letters of the alphabet, numeric digits, an underscore (_), or a period. Regular expression: [A-Za-z0-9_.]+.
Unique across data sets(targetcolumn0,...) | Structural | No value in this column can be the same as any value in any of the data set columns.
Primary key | Data | Must satisfy the unique in data set check type and the required character value check type.
Must Have Corresponding Value(targetColumn) | Structural | For each distinct value in this column, there must be at least one equivalent value in the target column.
No Duplicates Per Unique Value(targetColumn) | Structural | For each distinct value in the target column, each value in the source column must be unique. That is, the same value cannot appear more than once in the source column for each distinct value in the target column.
met for the successful generation of a define.xml file. You might want to defer
structural checks until later in the process of populating the CRT-DDS data sets. This
is because foreign key relationships require that the data be made available in a
particular order (that is, a referenced key must be available before the foreign key to
it can exist).
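The character format check type listed above can be approximated with a Perl regular expression in a DATA step. This is a minimal sketch on made-up values, not the check logic shipped with the toolkit. The data set name, the column names, and the anchoring of the pattern with ^ and $ (so that the whole trimmed value must match) are assumptions.

```sas
/* Made-up file names to test against the fileName character format */
data work.filename_check;
  infile datalines truncover;
  input value $char40.;
  /* valid=1 when every character is a letter, digit, underscore, or period */
  valid = (prxmatch('/^[A-Za-z0-9_.]+$/', strip(value)) > 0);
  datalines;
define.xml
bad name.xml
crt-dds.xml
;
run;

/* List the values that would be reported as problems */
proc print data=work.filename_check;
  where valid = 0;
run;
```

Because character comparisons are case sensitive and values are assumed to be trimmed, the sketch strips each value before matching.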
The CDISC CRT-DDS validation also checks the data against a set of expected values.
The expected values have been stored in a format catalog (crtddsct.sas7bcat) and a
data set (crtddsct.sas7bdat). They are in the global standards library
directory/standards/cdisc-crtdds-1.0-1.7/formats folder.
The SASReferences data set must contain a row for fmtsearch in which the SAS libref is
set to crtfmt and the filename refers to crtddsct.sas7bcat.
As in CRT-DDS, the validity of ODM data is determined by the standard in the form of
XML schema definitions. These XML schema definitions must be translated into checks
appropriate for the relational and tabular formats.
o a regular expression
The SAS Clinical Standards Toolkit provides 179 ODM 1.3.0 and 190 ODM 1.3.1
validation checks. These validation checks were developed by SAS and are based on
ODM implementation experience and careful review of the CDISC ODM Implementation
Guide, with special emphasis on the occurrence of “should” within the Implementation
Guide.
By default, the ODM 1.3.0 Validation Master data sets are here:
global standards library directory/standards/cdisc-odm-1.3.0-1.7/
validation/control and the
Table 7.16 on page 222 lists the types of checks for ODM data.
Each check type is assumed to operate on data that exists in a source column in a
source data set. A check type can reference one or more parameters that validate the
source column data. A parameter can be a character string or a representation of a
column other than the source column against which the source column data must be
compared.
All character comparisons are case sensitive. Character data is assumed to have been
trimmed of leading and trailing white space.
Unique in data set | Structural | No two values for the source column can be equivalent within the same source data set.
Required character value | Data | The trimmed (white space removed) value of the character data must consist of one or more characters.
Enumeration(s0,s1,…) | Data | If character data exists, its value must match one of the given enumerated character strings. All string comparisons are case sensitive.
Foreign key required(targetColumn) | Structural | A value is required for this column in every row and each value must have an equivalent value in the given target column. This check is the equivalent of running the required character value check, and failing if that check fails. If a required character value passes, the foreign key check is run.
Character format: fileName | Data | The character data must not contain any characters other than uppercase and lowercase letters of the alphabet, numeric digits, the underscore (_) character, or a period. Regular expression: [A-Za-z0-9_.]+.
Must Have Corresponding Value(targetColumn) | Structural | For each distinct value in this column, there must be at least one equivalent value in the supplied target column.
Unique across data sets(targetcolumn0,…) | Structural | No value in this column can be equal to any value in any of the given data set columns.
Primary key | Data | Must satisfy the Unique in data set check type and the required character value check type.
Invalid Value | Data | Documents based on ODM 1.3 should have the ODM version set to 1.3.
External File Reference Found | Data | External file reference found because the prior file OID is not missing (for example, ODM.PriorFileOID ne '')
Data Set Does Not Exist | Metadata | Invalid root element. The ODM file must contain a root element called ODM. In other words, the ODM data set must exist.
Mixed Data Exists | Multirecord | Typed and Untyped data transmission should not be mixed within a single ODM file.
1 Data checks have no dependencies on data outside of the source table. An example
is ensuring that a value exists in a column in which values cannot be missing.
2 Structural checks deal with relationships and data integrity between tables. An
example is foreign key enforcement. Structural conditions must be met for the
successful generation of an ODM XML file. You might want to defer structural checks
until later in the process when populating the ODM data sets. This is because
foreign key relationships require that the data is made available in a particular order
(that is, a referenced key must be available before the foreign key to it can exist).
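A structural check such as the foreign key check can be sketched with PROC SQL on two toy tables. The table names, column names, and OID values are hypothetical, and the query is only an illustration of the idea, not the codeLogic delivered with the toolkit.

```sas
/* Toy referenced (parent) table */
data work.parent;
  length oid $20;
  input oid $;
  datalines;
IG.DM
IG.AE
;
run;

/* Toy referencing (child) table */
data work.child;
  length itemgroupoid $20;
  infile datalines truncover;
  input itemgroupoid $;
  datalines;
IG.DM
IG.VS
;
run;

/* A child value is a problem if it is missing or has no match in the parent */
proc sql;
  create table work.fk_problems as
  select c.itemgroupoid
  from work.child as c
  where missing(c.itemgroupoid)
     or c.itemgroupoid not in (select oid from work.parent);
quit;
```

The query only gives meaningful results once the parent table is populated, which is why structural checks are often deferred until the data sets are complete.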
For the CDISC ODM validation checks that compare the data against a set of expected
values, the expected values are stored in a format catalog (odmct.sas7bcat) and a data
set (odmct.sas7bdat). For ODM 1.3.0, these are in the
<global standards library directory>/standards/cdisc-odm-1.3.0-1.7/formats folder.
Case-sensitivity compliance is required by the XML schema validation.
CDISC SDTM
The SAS Clinical Standards Toolkit provides validation checks in support of CDISC
SDTM 3.1.2, 3.1.3, and 3.2. These checks are derived from multiple sources that have
evolved over time. Most checks in the SAS Clinical Standards Toolkit are based on SAS
data management and cleaning experiences building CDISC SDTM domains.
Each version of the CDISC SDTM Validation Master data set (such as SDTM 3.1.3) is
named validation_master.sas7bdat. Each version contains a different number of checks
based on the rules that are in effect at the time of each version and the number and
type of supported tabulation domains. For more information about the distribution of
checks by version, see Table 7.14 on page 216.
Each Validation Master data set is built with multiple instances of the checks. This better
supports check selection by version or checksource (that is, WebSDM, SAS, or
customer-defined checks) and enables unique check logic and messaging by version or
checksource.
Multiple instances of a specific check are provided to handle different sets of SDTM
domains. For example, consider a check that assesses whether sequence numbers
(**SEQ) are consecutively numbered. For most domains, this is assessed within each
patient (USUBJID). However, the trial summary (TS) domain does not contain
patient-level data, so the check logic differs for this domain. The Validation Master
metadata differs for these two instances of the check, but both instances report the
same error message.
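The per-patient form of this sequence check can be sketched as follows. This is an illustration only, not the toolkit's check logic: the work.ae data set and its values are hypothetical, and "consecutively numbered" is interpreted here as each value being exactly one more than the previous value for the same USUBJID.

```sas
/* Hypothetical sample AE data: subject S2 skips sequence number 2. */
data work.ae;
  length usubjid $8;
  input usubjid $ aeseq;
  datalines;
S1 1
S1 2
S1 3
S2 1
S2 3
;
run;

proc sort data=work.ae out=work.ae_sorted;
  by usubjid aeseq;
run;

data work.seq_problems;
  set work.ae_sorted;
  by usubjid;
  /* DIF is called on every record so that its internal queue stays in step. */
  _gap = dif(aeseq);
  /* Within a subject, flag any value that is not previous value + 1. */
  if not first.usubjid and _gap ne 1 then output;
run;
```

For a domain without patient-level data, such as TS, the BY grouping would have to change, which is why a separate instance of the check is needed.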
Note: The validation check data set column checkstatus indicates the state of each
check. It indicates that the check is ready to be run in its current defined state, or that
the check can be run based on some external criteria. Current valid values are 1
(active), 0 (inactive), -1 (deprecated), and -2 (not yet implemented). Values are
extensible to meet your requirements. You can choose to use other values such as 1
(draft), 2 (test), and 3 (production). If a check is included in the run-time Validation
Control data set, the SAS Clinical Standards Toolkit attempts to run the check as
defined if the checkstatus value is > 0.
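As a minimal sketch of how checkstatus can drive check selection, the following code subsets a stand-in Validation Master data set to the checks that would be attempted at run time. The data set names, check identifiers, and values are hypothetical; only the checkstatus > 0 rule comes from the note above.

```sas
/* Hypothetical stand-in for a Validation Master data set. */
data work.validation_master;
  length checkid $8;
  input checkid $ checkstatus;
  datalines;
DEMO0001 1
DEMO0002 0
DEMO0003 -1
DEMO0004 -2
;
run;

/* Keep only the checks that would be attempted (checkstatus > 0). */
data work.validation_control;
  set work.validation_master;
  where checkstatus > 0;
run;
```

If you extend the value list (for example, 1=draft, 2=test, 3=production), a stricter WHERE clause such as checkstatus = 3 could restrict a run to production checks.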
Consider the interrelationships among the SAS Clinical Standards Toolkit validation
check metadata. All run-time Validation Control data sets, any programs that build or
derive from these data sets, corresponding Messages data sets, and the
Validation_StdRef data set are examples of how interconnected many SAS Clinical
Standards Toolkit metadata files are. For more information, see “Messages” on page
192. By default, the Validation_StdRef data set is located here:
<global standards library directory>/standards/<specific standard and version>/validation/control
CDISC CT 1.0.0
The CDISC CT validation checks are patterned in part after the CDISC ODM checks.
The checks ensure that SAS rules for format names and non-duplicate values are
followed. A total of 34 records are defined in the Validation Master data set, which, by
default, is located here:
<global standards library directory>/standards/cdisc-ct-1.0.0-1.7/validation/control
Special Topic: Validation Check Macros 229
4 SAS macros should have simple parameter signatures. All macros accept a single
parameter, _cstControl, which is a single-observation data set that contains check-
specific metadata.
5 SAS macros should be implemented as non-compiled open code.
6 SAS macros should be callable using the SAS autocall facility. The SAS Clinical
Standards Toolkit framework supports a single SAS macros library. Each SAS
Clinical Standards Toolkit standard supports an additional macros library, and the
macro library is available using the SAS autocall path.
7 Code modules should be generic and reusable with multiple validation checks. For
example, the check macros %CSTCHECK_COLUMN,
%CSTCHECK_NOTINCODELIST, and %CSTCHECK_NOTUNIQUE are used by
every standard provided with the SAS Clinical Standards Toolkit that supports
validation.
These design requirements should be used when developing custom validation check
macros. The following table identifies and describes the purpose of each of the check
macros provided with the SAS Clinical Standards Toolkit:
%CSTCHECK_COLUMN
%CSTCHECK_COLUMNCOMPARE
%CSTCHECK_COLUMNEXISTS
%CSTCHECK_COLUMNVARLIST
%CSTCHECK_COMPAREDOMAINS
%CSTCHECK_CROSSSTDCOMPAREDOMAINS
%CSTCHECK_CROSSSTDMETAMISMATCH
%CSTCHECK_DSMISMATCH
%CSTCHECK_METAMISMATCH
%CSTCHECK_NOTCONSISTENT
%CSTCHECK_NOTIMPLEMENTED
%CSTCHECK_NOTINCODELIST
%CSTCHECK_NOTSORTED
%CSTCHECK_NOTUNIQUE
%CSTCHECK_RECMISMATCH
%CSTCHECK_RECNOTFOUND
%CSTCHECK_VIOLATESSTD
%CSTCHECK_ZEROOBS
%CSTCHECKCOMPAREALLCOLUMNS
%CSTCHECKENTITYNOTFOUND
%CSTCHECKFOREIGNKEYNOTFOUND
Each validation check macro follows a standard basic workflow. Several of the
validation check macros perform more complex operations and multiple functions. The
basic workflow includes these events:
5 Perform the logic required to properly assess the validation check. This might be the
check macro code itself, or the code in the validation check metadata codeLogic
field.
6 Write any informational or error messages to the Results data set. Metrics are
written to the Metrics data set.
The following display shows the use of each check macro, by standard and version:
More complete documentation is provided for each check macro in the SAS Clinical
Standards Toolkit: Macro API Documentation. This information is derived from the code
headers. See “Special Topic: Validation Customization” on page 252.
Overview
Four Validation Master metadata fields are key to how the SAS Clinical Standards
Toolkit processes source data and source metadata: usesourcemetadata, tablescope,
columnscope, and codelogic.
The SAS Clinical Standards Toolkit uses usesourcemetadata to point to the correct
metadata. If usesourcemetadata is set to Y, then the SAS Clinical Standards Toolkit
knows that the source metadata (source_tables and source_columns) is to be used to
derive the domains and columns to be evaluated for compliance to the standard. If
usesourcemetadata is set to N, reference metadata (reference_tables and
reference_columns) is to be used.
The SAS Clinical Standards Toolkit uses the tablescope and columnscope values to
build the work._csttablemetadata and work._cstcolumnmetadata data sets. Based on
the values of these fields, the SAS Clinical Standards Toolkit creates a subset of source
metadata or reference metadata that satisfies both tablescope and columnscope. That
is, it keeps the columns specified in columnscope that also exist in the tables
specified in tablescope.
For those checks that use codelogic, the SAS Clinical Standards Toolkit builds local
macro variables to communicate tablescope and columnscope settings to the code.
For example, each domain is referenced as &_cstDSName, and each column is
referenced as &_cstColumn.
Code logic is run. If the check code logic is a statement (codetype=1 or 3), then
_cstError=1 is generally set. If the check code logic is a DATA step or PROC SQL code
segment (codetype=2 or 4), then work.cstproblems is created.
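The following sketch imitates, outside the toolkit, how statement-style code logic might use &_cstDSName and &_cstColumn. Everything here is an assumption made for illustration: the sample data, the %LET assignments that stand in for values the toolkit derives from tablescope and columnscope, the output data set name, and the surrounding DATA step. The way the toolkit actually wraps code logic is not shown in this section.

```sas
/* Hypothetical source domain with one missing AETERM value. */
data work.ae;
  length usubjid $8 aeterm $20;
  infile datalines dsd truncover;
  input usubjid $ aeterm $;
  datalines;
S1,HEADACHE
S2,
S3,NAUSEA
;
run;

/* Stand-ins for the macro variables built from tablescope and columnscope. */
%let _cstDSName=work.ae;
%let _cstColumn=aeterm;

data work.flagged_records;
  set &_cstDSName;
  _cstError = 0;
  /* Statement-style check logic: flag records where the column is missing. */
  if missing(&_cstColumn) then _cstError = 1;
  if _cstError = 1 then output;
run;
```

Because the logic refers only to the macro variables, the same statement can be reused for every domain and column that the scope settings select.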
Special Topic: SAS Implementation of ISO 8601 237
Overview
ISO 8601 is a widely used data standard for dates, times, durations, and intervals. The
values are stored as text strings. They are formatted in a way that ensures that all of the
components are always unambiguous. ISO 8601 is both platform and software
independent, which makes it suitable for data interchange.
Many data standards use a simplified subset of ISO 8601 for specifying their own dates,
times, and durations. This is true of several CDISC standards, including SDTM.
A complete discussion of ISO 8601 and the CDISC subset of ISO 8601 is beyond the
scope of this document. The following tables provide a general idea of what the text
strings look like and how to interpret their values. Additional information is in the
references.
This list provides a summary of the SAS Clinical Standards Toolkit support of ISO 8601:
n Consistent with CDISC SDTM guidelines, the SAS Clinical Standards Toolkit does
not support the ISO 8601 basic format. This means that the text strings must contain
the hyphen delimiter for parts of the dates, and the colon delimiter for parts of the
time.
n The SAS Clinical Standards Toolkit does not support some of the rarely used
formats allowed by ISO 8601. The week (W) formats for dates, Julian dates, and
extended dates (used to denote years greater than 9999) are not supported.
SAS provides capabilities for processing ISO 8601 text strings that are far beyond those
capabilities required by the SAS Clinical Standards Toolkit and CDISC standards.
n The SAS informats $N8601B. and $N8601E. convert an ISO 8601 text string to a
special string called an ISO 8601 entity.
The ISO 8601 entity is a complex binary value that is stored as a hexadecimal value
in a SAS string variable.
The ISO 8601 entity string is useful for reporting in the ISO 8601 format because it
prevents the loss of valuable information from the input ISO 8601 text string.
n The ISO 8601 entity value should not be confused with the traditional numeric SAS
date, time, or datetime value.
n The ISO 8601 entity should not be used in calculations or comparisons.
n The CALL IS8601_CONVERT routine can be used to generate traditional numeric
SAS dates, times, and datetime values from an ISO 8601 string.
n For additional information, see the online SAS documentation.
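For complete (non-partial) values, a simpler route than the routine named above is to read the ISO 8601 text with the E8601DA. and E8601DT. informats, which return ordinary numeric SAS date and datetime values. The following sketch uses those informats instead of CALL IS8601_CONVERT; the variable names and sample values are hypothetical, and partial or truncated values are not handled.

```sas
data _null_;
  length iso_dt $19 iso_d $10;
  iso_dt = '2009-03-25T22:29:30';   /* extended format with delimiters */
  iso_d  = '2009-03-25';

  /* Convert the ISO 8601 text to numeric SAS datetime and date values. */
  sas_dt = input(iso_dt, e8601dt19.);
  sas_d  = input(iso_d, e8601da10.);

  /* Numeric values can be used in calculations and comparisons. */
  days_since = intck('day', sas_d, datepart(sas_dt));

  put sas_dt= datetime20. sas_d= date9. days_since=;
run;
```

Unlike the ISO 8601 entity, the resulting numeric values are suitable for arithmetic, but they cannot represent missing date or time components.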
2009-03-25T22:29:30.333+05:00 | March 25, 2009 10:29 and 30.333 seconds p.m. in the time zone GMT + 5 hours | If provided, the time zone must be in HH:MM format. It cannot be truncated or a partial value. Some values in ISO 8601 formats can have decimal places. Most commonly, this is seen in seconds. The decimal place can be denoted as either a period (.) or a comma (,). When a time zone is provided, it must be accompanied by a complete date. The date cannot be truncated or a partial value. This is necessary because the 24 global time zones force the date to be considered as part of the time.
2009-03-25T22:29Z | March 25, 2009 10:29 p.m. Zulu time | Z can be used to substitute for times in GMT (or Zulu) time.
Table 7.20 Example ISO 8601 Values for Dates and Times: Partial Datetime Examples
-----T22:29 | The time 10:29 p.m. No value for the date is provided. | A time value must always be prefixed by a date value. In this example, the date value is completely missing, which would be appropriate for time-only fields.
Durations: Template
Table 7.21 Example ISO 8601 Values for Durations: Template
Durations: Examples
Table 7.22 Example ISO 8601 Values for Durations: Examples
P1D | The span of one day. | Durations always start with P for a period of time. Units of time that are not known are usually omitted. If time is omitted, then T must also be omitted.
P1Y2M3DT4H5M6S | The span of 1 year, 2 months, 3 days, 4 hours, 5 minutes, and 6 seconds. | The units must be in the correct order. The T is required for all time values, but it should not be specified if no time value is given.
Intervals: Template
Table 7.23 Example ISO 8601 Values for Intervals: Template
YYYY-MM-DDTHH:MM:SS/PnYnMnDTnHnMnS
or
PnYnMnDTnHnMnS/YYYY-MM-DDTHH:MM:SS
or
YYYY-MM-DDTHH:MM:SS/YYYY-MM-DDTHH:MM:SS
Intervals: Examples
Table 7.24 Example ISO 8601 Values for Intervals: Examples
2009-03-25T22:29/P1Y | The span of one year starting on March 25, 2009 at 10:29 p.m. | Intervals can express the period of time that starts at a given point in time. The end time is implied.
P0001-00-00/2009-03-25T22:29 | The span of one year ending on March 25, 2009 at 10:29 p.m. | Intervals can express the period of time that ends at a given point in time. The start time is implied.
2008-03-25/2009-03-25 | The span of time between March 25, 2008 and March 25, 2009, which happens to be one year. | Intervals can express the period of time that starts at a given point in time and ends at a given point in time. The duration value itself is implied.
Overview
The SAS Clinical Standards Toolkit provides two properties or global macro variables for
debugging problems occurring with all processes. These are _cstDebug and
_cstDebugOptions.
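The _cstDebug global macro variable toggles debugging options on and off. Many SAS
Clinical Standards Toolkit code modules have conditional branching such as:
  %if &_cstDebug %then
  %do;
    /* perform some action */
  %end;
The _cstDebugOptions global macro variable lists the SAS system options to set when
debugging is on. Code such as the following turns those options on, or turns them off
by prefixing each option name with no:
  data _null_;
    /* Read the current debug setting as a number */
    _cstDebug = input(symget('_cstDebug'),8.);
    if _cstDebug then
      /* Debugging on: set the listed options */
      call execute("options &_cstDebugOptions;");
    else
      /* Debugging off: set the "no" form of each listed option */
      call execute(("%sysfunc(tranwrd(options %cmpres(&_cstDebugOptions),
        %str( ), %str( no)));"));
  run;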
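To see what the option-toggling expression above produces before relying on it, the resolved text can be written to the log. This sketch assumes an illustrative option list (mprint mlogic symbolgen); the value that your installation assigns to _cstDebugOptions might differ.

```sas
/* Assumed values for illustration; use the options that suit your site. */
%let _cstDebug=1;
%let _cstDebugOptions=mprint mlogic symbolgen;

/* Write the OPTIONS statement text that would turn the same options off. */
/* TRANWRD replaces each blank with " no", so every option name gains a   */
/* "no" prefix. %CMPRES first collapses any repeated blanks in the list.  */
%put %sysfunc(tranwrd(options %cmpres(&_cstDebugOptions), %str( ), %str( no)));

/* Turn debugging on directly for the current session. */
options &_cstDebugOptions;
```

Setting _cstDebug to 0 and rerunning the toggling code restores quieter logs once the problem has been diagnosed.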
The following table lists common setup errors and possible causes:
Error | Location Where Error Is Reported | Possible Cause and Corrective Action
Expected libraries are not allocated. | SAS Log, Libraries window, SAS DMS | (1) An invalid physical name for the libref has been used. Is the libref a valid SAS name? A SAS name can contain one to 32 characters. It must start with a letter or an underscore (_), not a number. Subsequent characters must be letters, numbers, or underscores. Blanks cannot appear in SAS names. Is the libref a reserved SAS libref name? You should not use Work, Sasuser, or Sashelp. (2) The path specified for the libref is invalid; it points to a nonexistent directory. Check the path in your SASReferences data set.
Error: SAS system library WORK cannot be reassigned. | SAS Log | Work is being used as a sasref value with or without a path being designated. A similar error occurs if Sasuser or Sashelp is used.
WARNING: One or more libraries specified in the concatenated library CSTTMP do not exist. | SAS Log | One of the paths specified for a libref is invalid; it points to a nonexistent directory.
Special Topic: Debugging a Validation Process 247
Error | Location Where Error Is Reported | Possible Cause and Corrective Action
Error: Physical file does not exist. | SAS Log | (1) The SASReferences data set references a file that does not exist. (2) The filename is not a valid SAS name.
WARNING: Apparent invocation of macro SDTM_VALIDATE not resolved. | SAS Log | (1) The macro is misnamed or has not been added to the expected autocall library. Does the macros folder for this standard exist in the cstGlobalLibrary, in the !sasroot hierarchy, or in some correctly designated custom location? (2) The expected autocall path was not created correctly in the call to %CSTUTIL_ALLOCATESASREFERENCES. Check that the SASReferences data set contains a type=autocall record, defined as a fileref, and points to the correct folder location. Check for an error occurring earlier in the SAS log suggesting that cstutil_allocatesasreferences failed before setting the autocall path.
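When a macro is reported as not resolved, two quick checks are the autocall path currently in effect and the autocall records in SASReferences. The sketch below is an assumption-laden illustration: work.sasreferences is a hypothetical stand-in built here so that the code runs on its own, and the sasref and reftype column names and values are assumed for the example.

```sas
/* Show the autocall search path that is currently in effect. */
%put NOTE: SASAUTOS=%sysfunc(getoption(sasautos));

/* Hypothetical stand-in for a SASReferences data set. */
data work.sasreferences;
  length type $20 sasref $8 reftype $8;
  type='autocall';   sasref='cstauto'; reftype='fileref'; output;
  type='sourcedata'; sasref='srcdata'; reftype='libref';  output;
run;

/* List the autocall records so that their definitions can be reviewed. */
proc print data=work.sasreferences noobs;
  where upcase(type) = 'AUTOCALL';
run;
```

In a real session, point PROC PRINT at your own SASReferences data set and confirm that the folder it references contains the macro named in the warning.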
Most errors that halt a validation process are reported in the Results data set. As a
general rule, these Results data set fields signal process failures and provide
information about the cause of the failure:
n the Process status field (_cst_rc), when the value is set to a nonzero value
n the Problem detected field (resultflag), when the value is set to -1
n the Source Data field (srcdata) identifies the macro reporting the problem
n the Resolved Message text field (message) provides a problem cause
n the Basis for Result field (resultdetails) can provide additional information pertinent
to the problem
Depending on the severity of the problem and when it occurs, the Results data set
might not be saved to the persisted location if that location was requested using a
type=results record in the SASReferences data set. In this case, the Results data set
defined with the &_cstResultsDS global macro variable might be referenced for the
previous information. By default, &_cstResultsDS is set to work._cstresults.
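The fields listed above can be reviewed with a simple filter on the Results data set. In this sketch, work.demo_results is a hypothetical stand-in with made-up records so that the code runs on its own; only the column names (_cst_rc, resultflag, srcdata, message, resultdetails, resultid) come from this chapter.

```sas
/* Hypothetical stand-in for a Results data set with made-up records. */
data work.demo_results;
  length resultid $8 srcdata $40 message $100 resultdetails $100;
  resultid='CST0003'; srcdata='CSTCHECK_COLUMN';
  message='work.xx could not be found'; resultdetails=' ';
  resultflag=-1; _cst_rc=0; output;
  resultid='DEMO0001'; srcdata='DEMO_PROCESS';
  message='Illustrative informational record'; resultdetails=' ';
  resultflag=0; _cst_rc=0; output;
run;

/* Show only records that signal a process failure or a check that could not run. */
proc print data=work.demo_results noobs;
  where _cst_rc ne 0 or resultflag = -1;
  var resultid srcdata message resultdetails resultflag _cst_rc;
run;
```

In a real session, replace work.demo_results with &_cstResultsDS to inspect the in-memory Results data set when the persisted copy was not saved.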
Generally, the SAS Clinical Standards Toolkit does not halt the validation process when
an error is detected in a specific check. The error is noted in the Results data set, the
resultflag value for that check is set to -1, _cst_rc is set to 0, and processing continues
with the next check. A validation process is most likely to be halted (by setting _cst_rc to
1) when there is a significant metadata error that suggests subsequent checks would
likely fail to run.
The following table lists common causes for premature process failure or the failure of
specific checks to run:
Error | Resultid in Results Data Set | Possible Cause or Corrective Action
<Data set> could not be found | CST0003 | This error usually indicates that a specific source column or data set could not be found. The code loops through a set of domains or columns built from the source metadata data sets. This error might result when the source metadata does not accurately reflect the source data.
One or more check metadata column values is invalid. | CST0026 | A value in the Validation Control data set for the check being run is invalid in the context of the specific check macro. Examples include conditions that are required by the check macro but are not found, such as no code logic found, an unexpected usesourcemetadata value, or no lookuptype or lookupsource for valid value assessments.
Code failed due to SAS error-see log. | CST0050 | A SAS DATA step or SAS procedure failed and the cause is reported in the SAS log. This most commonly occurs because of missing data sets, missing columns, incorrectly sorted data sets, and unexpected macro variable values.
<Message lookup failed to find matching record> | <varies> | The check macro code generates a resultid value that does not find a match in the Messages data set. Either the wrong resultid has been specified, or the standard-specific Messages data set has not been updated to include the resultid.
Overview
One of the significant benefits of the SAS Clinical Standards Toolkit is that you can
customize the solution to meet your needs. From a validation perspective, this includes:
n modifying an existing standard or defining a new reference standard
n using any set of source data and metadata
n modifying the SAS validation checks for supported standards
n adding new validation checks for supported standards
n modifying existing validation check macros or adding new macros
n modifying the SAS Clinical Standards Toolkit messaging, including
internationalization
n attempting to validate multiple studies in a single validation process
Each of these factors suggests that the SAS Clinical Standards Toolkit CDISC reference
standards will be modified or replaced with customer-derived standards. The SAS
Clinical Standards Toolkit offers the option of building a reference standard to
encompass domain and column customizations. Or, you can customize check macros
and check logic to perform specific compliance assessments to a standard. For
example, in CDISC SDTM, it is not uncommon to build multiple supplemental qualifier
domains (for example, SUPPAE) associated with a core reference domain (for example,
AE). It is at the customer's discretion whether the reference standard is modified to
include each unique supplemental qualifier domain, or to use existing SAS Clinical
Standards Toolkit validation check macros with unique code logic or custom check
macros to validate the custom domains. These latter options are discussed in the
following case studies.
It is likely that you will derive multiple reference standards. From a SAS Clinical
Standards Toolkit validation perspective, the only relevant reference standard is the one
defined in the SASReferences data set (as type=referencemetadata).
For information about registering a new standard in the SAS Clinical Standards Toolkit,
see “Registering a New Version of a Standard” on page 26.
One key SAS Clinical Standards Toolkit requirement is that source study elements
should be kept in synchronization. Another key requirement is that all relevant source
study elements should be accurately represented in a SASReferences data set. The
synchronization of study elements is a task that is often performed outside the SAS
Clinical Standards Toolkit. The study data libraries must contain the domains of interest,
the study metadata must provide the complete set of table-level and column-level
metadata necessary to describe the source data, and any format catalogs and coding
dictionaries supporting the study must be available.
standard (for example, to include other domains you consistently use) or you have
one or more studies that have new domains, changes are likely to involve alterations
to the Validation Master and Validation Control (run time) tablescope or columnscope
fields.
n Changing the Validation Control codelogic field to alter the logic used to identify error
conditions. This might be a necessary change if a check needs to be generalized to
accommodate new domains or columns. Or, customer conventions might differ from
those in the SAS Clinical Standards Toolkit checks.
n If customer code changes are sufficiently significant, then it might be better to create
a new validation check macro. (See “Case Study 5: Modifying Existing Validation
Check Macros or Adding New Macros” on page 257.) If a new validation check
macro is required, then the Validation Control codesource field needs to be modified
to contain the name of the new check macro.
n The Validation Control uniqueid field provides a way to uniquely identify a specific
validation check for reference. Any substantive change to any Validation Control
data set check field normally leads to a new uniqueid. For information about the
structure of uniqueid, see Table 7.3 on page 174.
n The Validation Control checkstatus field provides an easy way to identify selected
checks with a user-defined status (for example, draft, deprecated, or not available
for a given study). The SAS Clinical Standards Toolkit does not reference this field
within any validation check macro.
n The Validation Control lookupsource field can be changed to reference a different
SAS format or lookup data set (for example, a new version of MedDRA). In the latter
case, a change to the pathname, memname, or both fields in the SASReferences
data set might be a more appropriate action.
n Certain Validation Master fields accept any user-defined value (for example,
checksource, sourceid, checktype, standardref, and checkstatus). These fields are
not referenced by the validation check macros. The remaining fields are used in the
validation check macros, so you must abide by the SAS Clinical Standards Toolkit
conventions. These conventions are described in Chapter 2, “Framework,” on page
7.
n A new check should be added to the (run time) Validation Control data set for
testing. After testing, it can be promoted to the Validation Master data set to be
available to applications and processes. These requirements follow a typical
development process.
n For each new validation check, a matching message is required. This is the
message that you want written to the Results data set when an error condition is
detected. For details, see “Messages” on page 192.
n Use a similar validation check as a template to build the check metadata required by
the SAS Clinical Standards Toolkit. Ask yourself the following types of questions:
o What category or type of check is it?
Look at the Validation Master data set checktype column. Does it look only at
table or column metadata, and not at data values (Metadata)? Does it require a
specific raw column value (ColumnValue), or a value that complies with some
controlled terminology (Cntlterm)? Must the assessment look across multiple
records (Multirecord) or multiple tables (Multitable)?
o Does the check compare columns within a single table?
Consider Validation Master records where the codesource column is
cstcheck_columncompare, cstcheck_columnvarlist, or cstcheck_notunique.
o Does the check compare tables?
Consider Validation Master records where the codesource column is
cstcheck_comparedomains or cstcheck_recnotfound.
o Does the check look across multiple standards?
Consider Validation Master records where the codesource column is
cstcheck_crossstdcomparedomains or cstcheck_crossstdmetamismatch.
Special Topic: Validation Customization 257
Some validation scenarios might require modifications to the SAS Clinical Standards
Toolkit check macros or the derivation of new macros. If so, these guidelines should be
followed. These guidelines facilitate the use of these macros in the general SAS Clinical
Standards Toolkit framework and in the specific SAS Clinical Standards Toolkit
validation framework.
n Follow the current naming convention or adopt a consistent naming convention that
conforms to SAS naming conventions.
n Use the current autocall library or use a customized autocall library that has been
defined in the SASReferences data set (type=autocall).
n Conform to the basic check macro workflow. This workflow is described in “Special
Topic: Validation Check Macros” on page 229.
n Ensure that the macro correctly accepts and interprets the metadata provided as
input from the Validation Control data set. If the new macro fails to do so, then it can
be hardcoded to provide any specific functionality that is desired.
n Ensure that the macro writes appropriate output to the Results and Metrics data
sets.
1 Maintain the relationship between the SAS Clinical Standards Toolkit standard-
specific messages and standard-specific validation checks.
2 Maintain the relationship between messages and validation check macro code.
(Deviations are acceptable to the extent that missing parameters have suitable
defaults.)
3 Internationalize messages.
A SAS Clinical Standards Toolkit message is created for each distinct combination of
the Validation Master standard and checksource fields. This allows the SAS Clinical
Standards Toolkit to support checksource-specific messaging and severity. A unique
SAS Clinical Standards Toolkit message is required for each value of the Validation
Master standardversion field if that value is not the wildcard ***.
Consider the CDISC SDTM Validation Master record excerpt in this display.
Figure 7.12 Validation Master Data Set Excerpt for Check CUST0073
The following display shows that only two Messages data set records are required:
Building the message record to use a default value (as specified in the parameter1 field)
solves the problem when the calling macro fails to pass a substitution value. Using
parameters is optional. Parameters might be needed only if the message is to be used
in multiple contexts where substitutions of parameter values help interpret the message.
Clinical Standards Toolkit supports library concatenation, but SAS only reads data
sets from the first defined library when the same data set name occurs in multiple
libraries. Because standard domain names are expected, this approach does not
work unless a unique domain-naming convention across studies is used. A similar
approach is required for source metadata. These constraints make this approach
less tenable.
n Another alternative methodology is to use multiple SASReferences librefs (multiple
type=sourcedata records). You have one for each study source library, and a single
source metadata library (with one table and one column metadata data set, setting
the SASRef column to each libref used in SASReferences). This methodology works
for any validation check that does not compare columns across domains or
compare domains.
Source data libraries are considered when tablescope and columnscope parsing
occurs in the SAS Clinical Standards Toolkit. However, if tablescope does not
include the libref, unintended comparisons of multiple columns or multiple domains
from different studies can occur. As a result, this methodology is not recommended
unless you consistently use multiple librefs in the source metadata and validation
check metadata.
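The library concatenation limitation described above can be demonstrated with plain SAS code, independent of the toolkit. In this sketch, the study1 and study2 folders, the librefs, and the data values are all hypothetical; the folders are created under the WORK directory so that the example is self-contained.

```sas
/* Create two hypothetical study folders under the WORK directory. */
%let workpath=%sysfunc(pathname(work));
%let rc1=%sysfunc(dcreate(study1, &workpath));
%let rc2=%sysfunc(dcreate(study2, &workpath));

libname study1 "&workpath/study1";
libname study2 "&workpath/study2";

/* Both studies use the same standard domain name, DM. */
data study1.dm;
  usubjid='S1-001';
run;
data study2.dm;
  usubjid='S2-001';
run;

/* Concatenate the two study libraries under one libref. */
libname srcdata (study1 study2);

/* SAS reads DM from the first library in the concatenation (study1) only. */
proc print data=srcdata.dm noobs;
run;
```

The study2 copy of DM is never read through the concatenated libref, which is why this approach requires unique domain names across studies.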
Consider these scenarios and how each one can be handled using the SAS Clinical
Standards Toolkit:
n Scenario 1: You want to create and manage codelists (SAS formats) independent of
the CDISC Controlled Terminology standard provided with SAS Clinical Standards
Toolkit.
This scenario assumes you have one or more user-defined SAS format catalogs that
contain valid values associated with your data columns. These user-defined format
catalogs might include extensions to existing CDISC Controlled Terminology
codelists or to new formats associated with columns in custom domains. The SAS
Clinical Standards Toolkit SASReferences data set enables you to specify
references to multiple catalogs and to manage the order in which these appear in
the format search path. For example, if you have a catalog named MYTERMS that
contains all formats of interest for your study, your SASReferences data set can
contain a single type=fmtsearch record:
However, if you prefer to keep your customizations in a separate format catalog, but
you want to use the CDISC Controlled Terminology codelists provided with the SAS
Clinical Standards Toolkit, your SASReferences data set will have multiple
type=fmtsearch records, with the order column value set to establish the format
search path precedence:
In this case, any extended, like-named formats in MYTERMS are used instead of the
original formats in CTERMS provided with the SAS Clinical Standards Toolkit.
n Scenario 2: You want to manage codelist (SAS format) customizations as a
registered standard in the global standards library of the SAS Clinical Standards
Toolkit.
Special Topic: Using Alternative Controlled Terminologies 263
The SAS Clinical Standards Toolkit provides sample programs that create the data
sets that are needed to register controlled terminology. The programs also register
these data sets. The programs are called create_terminology_standarddatasets.sas
and registerstandard.sas and are here:
<global standards library directory>/standards/cdisc-terminology-1.7/programs
Note: You must have Write access to the global standards library.
If you want to add a completely new set of terminology to the global standards
library, you must follow the information in “Maintenance Usage Scenarios” on page
25.
Assume that your organization has created its own comprehensive set of CDISC
controlled terminology, and you have created the global standards library subfolder
hierarchy (with CDISC ADaM fully expanded) shown in this display.
After the registration process, this display shows how your global standards library
data set might look (using the folder hierarchy above).
The following display shows that the standardsubtypes data set located in the
global standards library directory/standards/cdisc-terminology-1.7/control
folder now contains this CDISC ADaM record:
• Scenario 3: You use multiple versions of the MedDRA dictionary to code Adverse
Events across multiple studies within a submission.
The SAS Clinical Standards Toolkit does not provide copies of the MedDRA coding
dictionary as maintained and distributed by the Maintenance and Support Services
Organization. Your organization most likely maintains multiple versions of
MedDRA, and you might need to reference more than one version in a single
SAS Clinical Standards Toolkit process.
Although it is possible to create and use SAS format catalogs for MedDRA lookups
(and similar coding dictionary lookups), the SAS Clinical Standards Toolkit provides
a mechanism to reference and use a data set lookup methodology in the
SASReferences data set using one or more type=referencecterm records. Each
record points to a specific MedDRA version using a unique SAS libref, with the
resulting libref.dataset available for use, as needed.
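As a rough sketch of this arrangement, the following step builds two type=referencecterm records, one per MedDRA version. Only a subset of SASReferences columns is shown, and the subtype value, the librefs (mdra15, mdra16), the paths, and the data set name are hypothetical.

```sas
/* Sketch only: one type=referencecterm record per MedDRA version.   */
/* Subtype, librefs, paths, and memname are assumed for illustration. */
data work.sasrefs_meddra;
  length type $40 subtype $40 sasref $8 reftype $8 path $200 memname $48;
  type='referencecterm'; subtype='meddra'; reftype='libref'; memname='meddra';
  sasref='mdra15'; path='/dictionaries/meddra/15.0'; output;
  sasref='mdra16'; path='/dictionaries/meddra/16.0'; output;
run;

proc print data=work.sasrefs_meddra noobs;
run;
```

With records like these, each version would be addressable as libref.dataset (here mdra15.meddra and mdra16.meddra), because each record uses its own libref.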
• Scenario 4: You use the WHO Drug dictionary to ensure that your coding of
Concomitant Medications in CMDECOD and CMCLASCD includes valid terms and
class codes.
The SAS Clinical Standards Toolkit does not provide copies of the WHO Drug
dictionary as created by the World Health Organization and managed by the
Uppsala Monitoring Centre. As in Scenario 3, the SAS Clinical Standards Toolkit
provides a mechanism to reference and use a data set lookup methodology in the
SASReferences data set using one or more type=referencecterm records.
The following display shows how your WHO Drug reference might look:
In releases prior to version 1.7, the SAS Clinical Standards Toolkit provided
several CDISC SDTM validation checks that involved lookups to coding
dictionaries. This methodology can still be used in the SAS Clinical Standards
Toolkit 1.7.
The following display shows the relevant metadata columns from the validation
check data set:
The codelogic value is specific to the coding dictionary. In a WHO Drug lookup,
drugname and atc_code (or their equivalents) are used. The
%CSTCHECK_NOTINCODELIST check macro retrieves and uses the lookup data
set named in the lookupsource metadata column based on information stored in the
SASReferences data set records where type=referencecterm.
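The kind of data set lookup that such a check performs can be illustrated with a small stand-alone example. This is not the implementation of %CSTCHECK_NOTINCODELIST; it is a sketch of the idea, and the data sets work.cm and work.whodrug, their contents, and their column lengths are invented for the example.

```sas
/* Hypothetical Concomitant Medications records */
data work.cm;
  length usubjid $12 cmdecod $40;
  infile datalines dsd truncover;
  input usubjid $ cmdecod $;
  datalines;
S001,ASPIRIN
S002,PARACETAMOL
S003,ASPRIN
;
run;

/* Hypothetical stand-in for a WHO Drug lookup data set */
data work.whodrug;
  length drugname $40 atc_code $8;
  infile datalines dsd truncover;
  input drugname $ atc_code $;
  datalines;
ASPIRIN,N02BA
PARACETAMOL,N02BE
;
run;

/* Report CMDECOD values that have no match in the lookup data set */
proc sql;
  create table work.cm_notfound as
    select cm.usubjid, cm.cmdecod
    from work.cm as cm
    where cm.cmdecod not in (select drugname from work.whodrug);
quit;
```

In this example only the misspelled term for subject S003 is written to work.cm_notfound.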
8
Internal Validation
Overview
Each standard as defined in the SAS Clinical Standards Toolkit includes numerous SAS
metadata files and SAS macros. For the SAS Clinical Standards Toolkit to function
properly, each file must contain a core set of columns that have an expected variable
type. Each macro is designed to use these core columns to perform certain functions.
The term internal validation refers to a set of tools that checks the consistency of the
SAS metadata files. The tools use the SAS Clinical Standards Toolkit validation
framework and methodology that assess standard-specific files against a defined
reference standard. The tools determine whether the metadata that the SAS Clinical
Standards Toolkit expects is correctly defined.
Scenario
Supporting Macros
The following macros support SAS Clinical Standards Toolkit internal validation. Many of
these macros are also used for other purposes.
These macros are located in the primary SAS Clinical Standards Toolkit autocall path:
• Microsoft Windows
  !sasroot/cstframework/sasmacro
• UNIX
  !sasroot/sasautos
For complete macro documentation, see the SAS Clinical Standards Toolkit: Macro API
Documentation.
In most driver programs that are provided with the SAS Clinical Standards Toolkit, a call
to the %CSTUTIL_PROCESSSETUP macro initiates a series of steps to establish the
environment to perform a subsequent task, such as validating a study or building a
define.xml file. SAS file and library references are allocated. Updates to the SAS
autocall and format search paths are completed. These steps are completed based
solely on the content of a SASReferences data set.
With the SAS Clinical Standards Toolkit, the SASReferences data set is automatically
validated through a series of calls to the %CSTUTILVALIDATESASREFERENCES
macro. These calls to %CSTUTILVALIDATESASREFERENCES are made within
macros called in the %CSTUTIL_PROCESSSETUP macro workflow. The following
error conditions are reported by default:
CHK01  The data set is structurally incorrect.
       A structural comparison with the template that is provided with the SAS
       Clinical Standards Toolkit is performed using cstutilcomparestructure.
       Minor differences involving labels, informats, and formats are generally
       ignored.

CHK04  A required look-through to the global standards library defaults fails.
       You might choose to leave the path or memname blank in your
       SASReferences data set, which indicates that you want to use the
       defaults as specified in the standard-specific StandardSASReferences
       data set. If the path or memname remains blank (unresolved) after the
       final call to %CSTUTILVALIDATESASREFERENCES in
       %CSTUTIL_ALLOCATESASREFERENCES, this error is reported.

CHK05  One or more discrete character field values cannot be found in the
       Standardlookup data set.
       Columns with discrete values (reftype, type+subtype combinations,
       iotype, filetype, allowoverwrite) must have values as defined in the
       standard-specific Standardlookup data set.

CHK06  For the given context, path or memname macro variables are not resolved.
       If macro variables are used as part of the path or memname value, they
       must resolve to an accessible folder or file.

CHK08  Multiple autocall records exist, but valid ordering is not provided.
       To properly set the autocall path, an unambiguous ordering of multiple
       type=autocall records must be provided.
The occurrence of any of these errors causes the process to terminate. The rationale is
that if the process setup is incomplete, and the SAS Clinical Standards Toolkit cannot
recognize a SASReferences column value or find a specified file, the process output
might be unreliable. Correct problems reported in the process results data set (as
typically defined by the _cstResultsDS global macro variable) and resubmit the process.
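One way to review the reported problems before resubmitting is to print the flagged records from the process results data set. The following is a sketch under stated assumptions: the macro name show_setup_problems is hypothetical, and the column names resultid, message, and resultflag (with a nonzero resultflag marking a problem record) are assumptions about the Results data set structure that should be confirmed against your installation.

```sas
/* Sketch only: list flagged records from the results data set.     */
/* show_setup_problems is a hypothetical helper, not a toolkit macro. */
%macro show_setup_problems;
  %if not %symexist(_cstResultsDS) %then %do;
    %put NOTE: _cstResultsDS is not defined. Run the process setup first.;
    %return;
  %end;
  %if %length(&_cstResultsDS) = 0 %then %do;
    %put NOTE: _cstResultsDS is blank.;
    %return;
  %end;
  %if not %sysfunc(exist(&_cstResultsDS)) %then %do;
    %put NOTE: The results data set &_cstResultsDS does not exist.;
    %return;
  %end;
  proc print data=&_cstResultsDS;
    where resultflag ne 0;   /* assumed: nonzero marks a problem record */
    var resultid message;    /* assumed column names */
  run;
%mend show_setup_problems;

%show_setup_problems
```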
Overview
The SAS Clinical Standards Toolkit internal validation addresses these primary use
cases:
This is implemented with and illustrated by the use of the validate_iqoq sample
driver, which is located here:
sample study library directory/cst-framework-1.7/programs
This is a two-step process:
a Select the CST-FRAMEWORK standard, and run the checks that are defined in
the validation_control_glmeta view of the internal validation validation_master
data set.
This is a set of 64 checks (checkid < CSTV100) that look only at the global
standards library metadata folder.
b Select 1 to n specific standards, and run the checks that are defined in the
validation_control_stdiqoq view of the internal validation validation_master data
set.
This is a set of 50 checks (checkid > CSTV100 that are relevant to installation
qualification and operational qualification issues) that look only at metadata
libraries other than the global standards library metadata folder.
This is implemented with and illustrated by the use of the validate_standard sample
driver. Select 1 to n specific standards, and run the checks that are defined in the
validation_control_std view of the internal validation validation_master data set.
This is a set of 73 checks (checkid > CSTV100) that look only at metadata libraries
other than the global standards library metadata folder.
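A subsetting view of this kind can be sketched as follows, assuming that the view is a simple filter on the checktype column. The miniature validation_master data set here is invented for the example, and every checktype value other than GLMETA is hypothetical.

```sas
/* Hypothetical miniature of a validation_master data set */
data work.validation_master;
  length checkid $8 checktype $20;
  infile datalines dsd truncover;
  input checkid $ checktype $;
  datalines;
CSTV001,GLMETA
CSTV002,GLMETA
CSTV101,STDMETA
CSTV102,STDIQOQ
;
run;

/* Sketch of a view that keeps only the standard-specific checks */
proc sql;
  create view work.validation_control_std as
    select *
    from work.validation_master
    where checktype ne 'GLMETA';
quit;

proc print data=work.validation_control_std noobs;
run;
```

A view rather than a copy keeps a single validation_master as the source of truth, so a check that is edited once is reflected in every view that selects it.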
The sample drivers that support internal validation are described in the following
sections. The SASReferences data set is validated automatically as part of these
sample driver programs during the call to the %CSTUTIL_PROCESSSETUP macro.
This driver program performs all standard-specific validation checks. This excludes
checks that target the global standards library directory/metadata folder
files. Essentially, this is any check defined in validation_master where
checktype NE 'GLMETA'.
data work._cstStandardsforIV;
set work._cstAllStandards (where=(
(upcase(standard) = 'CDISC-ADAM' and standardversion='2.1')
or (upcase(standard) = 'CDISC-CRTDDS' and standardversion='1.0')
or (upcase(standard) = 'CDISC-CDASH' and standardversion='1.1')
/*
or (upcase(standard) = 'CDISC-DATASET-XML' and standardversion='1.0.0')
or (upcase(standard) = 'CDISC-DEFINE-XML' and standardversion='2.0.0')
or (upcase(standard) = 'CDISC-CT' and standardversion='1.0.0')
or (upcase(standard) = 'CDISC-ODM' and standardversion='1.3.0')
or (upcase(standard) = 'CDISC-ODM' and standardversion='1.3.1')
or (upcase(standard) = 'CDISC-SDTM' and standardversion='3.1.2')
or (upcase(standard) = 'CDISC-SDTM' and standardversion='3.1.3')
or (upcase(standard) = 'CDISC-SDTM' and standardversion='3.2')
or (upcase(standard) = 'CDISC-SEND' and standardversion='3.0')
or (upcase(standard) = 'CDISC-TERMINOLOGY' and standardversion='NCI_THESAURUS')
or (upcase(standard) = 'CST-FRAMEWORK' and standardversion='1.2')
*/
));
run;
In this example, validation is performed only for the CDISC ADaM, CDISC CDASH,
and CDISC CRT-DDS standards.
data work.stdvalidation_sasrefs;
set _cstTemp.stdvalidation_sasrefs;
if type='control' and subtype='validation' then
do;
filetype='view';
memname='validation_control_std.sas7bvew';
end;
run;
Note: Alternate views might be used. See “Internal Validation Driver Programs That
Are Provided with the SAS Clinical Standards Toolkit” on page 276.
3 Call the process setup macro to perform all CST-FRAMEWORK file and library
allocations.
The returned &_cstSASRefs data set contains fully resolved path and memname
values.
%cstutil_processsetup(_cstSASReferencesLocation=&workpath,
_cstSASReferencesName=stdvalidation_sasrefs);
*****************************************************************************;
data work.stdvalidation_sasrefs;
  set &_cstSASRefs;
  attrib _srcfile format=$8. label='File source for record';
  **********************************************************************;
  * Framework validation sasreferences: cstcntl.stdvalidation_sasrefs *;
  **********************************************************************;
  * Tag each record with its source so that it can be traced later;
  _srcfile='STDVAL';
run;
Note: This step is optional because it merely provides an indication of the sources
and purposes of specific SASReferences data set records.
5 Call the code-generator macro to build the job stream for each standard:
filename incCode CATALOG "work._cstCode.stds.source" LRECL=255;
%cstutilbuildstdvalidationcode(_cstStdDS=work._cstStandardsforIV,
_cstSampleRootPath=_DEFAULT_, _cstSampleSASRefDSPath=_DEFAULT_,
_cstSampleSASRefDSName=_DEFAULT_);
b Look for the standard-specific StandardSASReferences data set from the global
standards library. If found, run cstutil_processsetup using this data set.
d Look for the standard-specific stdvalidation_sasrefs data set from the sample
library. If found, run cstutil_processsetup using this data set.
g Run
%cstutilbuildmetadatafromsasrefs(cstSRefsDS=work._cstTempSASRefDS,
cstSrcTabDS=work.source_tables, cstSrcColDS=work.source_columns).
This macro dynamically builds reference_tables and reference_columns data
sets from a SASReferences data set. For examples, see Figure 8.1 on page 282
and Figure 8.2 on page 283.
i Call cstvalidate, which uses the validation_control view specific to the driver
focus (in this case, validation_control_std) as specified in “Internal Validation
Driver Programs That Are Provided with the SAS Clinical Standards Toolkit” on
page 276.
6 For each standard selected in validate_standard driver workflow step 1, repeat steps
a through j in step 5.
Results are collated in cstrslt.validation_results. For excerpts of the results, see
Figure 8.3 on page 284.
Note: This is an excerpt only. Not all records and columns are shown.
Validation Checks
The validation_master data set column checktype is used to specify the primary focus of
each check. The following table shows the distribution of records by checktype:
Focus    Checktype    Total Number of Checks (Unique)
The 137 validation checks use 11 of the SAS Clinical Standards Toolkit framework
check macros. The following table shows the distribution of these checks by check
macro:
Check Macro                   Number of Records
%CSTCHECK_COLUMN              38
%CSTCHECK_COLUMNCOMPARE       50
%CSTCHECK_COMPAREDOMAINS      8
%CSTCHECK_DSMISMATCH          7
%CSTCHECK_NOTCONSISTENT       5
%CSTCHECK_NOTINCODELIST       2
%CSTCHECK_NOTUNIQUE           2
%CSTCHECK_RECMISMATCH         4
%CSTCHECK_RECNOTFOUND         11
%CSTCHECK_ZEROOBS             3
%CSTCHECKENTITYNOTFOUND       7
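A distribution like the one in this table can be produced with a frequency count on the column that names the check macro. The sketch below assumes that this column is called codesource and uses a small invented data set in place of the real validation_master, so the counts it prints are not the counts in the table.

```sas
/* Hypothetical miniature; codesource is an assumed column name */
data work.validation_master_demo;
  length checkid $8 codesource $32;
  infile datalines dsd truncover;
  input checkid $ codesource $;
  datalines;
CSTV001,CSTCHECK_COLUMN
CSTV002,CSTCHECK_COLUMN
CSTV003,CSTCHECK_ZEROOBS
;
run;

/* Count the checks that use each check macro */
proc freq data=work.validation_master_demo order=freq;
  tables codesource / nocum nopercent;
run;
```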
For internal validation, the validation_master tablescope value usually must
include the SAS libref. Each SAS libref is associated with a specific SAS
library through the SASReferences record that identifies the library (or
specific SAS file) as an input to the process.
As with all validation check data sets in the SAS Clinical Standards Toolkit, you can add
your own checks or modify existing checks to meet your validation requirements.
(The SAS Clinical Standards Toolkit global standards library and sample study library
have been set to the path that is indicated.)
The location of the views can vary based on where your global standards library and
sample study library are located.
This check reports each instance where the Standards data set column rootpath cannot
be found. This value is important to support the use of relative paths, which are
indicated by a non-null value in the SASReferences relpathprefix column.
The following display shows a portion of the check metadata for this check:
Each of the column values shown in Figure 8.4 on page 287 is explained in the
following table:
9
XML-Based Standards
Overview
Basic Workflow
Creating a CDISC CRT-DDS 1.0 define.xml File
Sample Driver Program: create_crtdds_from_sdtm.sas
Sample Driver Program: create_crtdds_define.sas
Creating a define.pdf File from the SAS Representation of the CDISC CRT-DDS 1.0 Standard
Creating a CDISC Define-XML 2.0 define.xml File (Including Analysis Results Metadata 1.0)
Sample Driver Program: create_sasdefine_from_source.sas
Sample Driver Program: create_definexml.sas
Creating a CDISC ODM XML File
Sample Driver Program: create_odmxml.sas
For CDISC CRT-DDS 1.0, this means that 39 data sets (such as ItemDefs) containing
176 columns are derived from the define.xml element and attribute structure.
For CDISC Define-XML 2.0, there are 46 data sets (such as ItemDefs) containing 215
columns that are derived from the define.xml element and attribute structure. For the
CDISC Analysis Results Metadata extension for Define-XML 2.0, the SAS
representation was extended to 54 data sets containing 239 columns.
For CDISC ODM 1.3.0, there are 66 data sets containing 315 columns in the SAS
representation of the model.
For ODM 1.3.1, there are 76 data sets containing 352 columns in the SAS
representation of the model.
For CDISC CT 1.0, there are 15 data sets containing 73 columns in the SAS
representation of the model.
The SAS representation of each standard can be derived in part from other standards
(such as CDISC SDTM or CDISC ADaM) and can include supporting metadata from
other sources. The SAS Clinical Standards Toolkit can create a CDISC CRT-DDS 1.0
XML file, a CDISC Define-XML 2.0 file (including Analysis Results Metadata), a CDISC
ODM 1.3.0 file, a CDISC ODM 1.3.1 XML file, a Dataset-XML 1.0 file, or a CDISC CT
XML 1.0 file.
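Counts such as these can be checked against any library that holds a SAS representation by querying the DICTIONARY.COLUMNS table. The sketch below runs against the Work library with one empty placeholder data set so that it is self-contained. The macro variable lib and the data set work.itemdefs_demo are names introduced for the example; point lib at the libref of interest in a real session.

```sas
/* Hypothetical: libref to inspect (Work is used so that the sketch runs) */
%let lib=WORK;

/* Placeholder data set with two columns and no observations */
data work.itemdefs_demo;
  length OID $64 Name $128;
  stop;
run;

/* Count the data sets and the columns in the chosen library */
proc sql;
  select count(distinct memname) as n_datasets,
         count(*)                as n_columns
  from dictionary.columns
  where libname = "%upcase(&lib)" and memtype = 'DATA';
quit;
```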
Overview
Support of CDISC XML-based standards, such as CDISC Define-XML 2.0, CDISC CRT-
DDS (define.xml), and CDISC ODM, includes the ability to read XML files into SAS data
set format. In the SAS Clinical Standards Toolkit, you can read these types of files:
• a CDISC CRT-DDS 1.0 define.xml file
• a CDISC Define-XML 2.0 define.xml file (including Analysis Results Metadata 1.0)
• a CDISC ODM 1.3.0 or CDISC ODM 1.3.1 XML file
• the Controlled Terminology files as they are published by the NCI in ODM XML
format
Basic Workflow
Here is the basic workflow for reading XML files:
2 Use valid XSL style sheets for each target data set (such as ItemDefs.xsl).
3 Use the SAS DATA step component JavaObj to create a standardized intermediate
cubeXML file using the XSL style sheets.
4 Read the standardized cubeXML file using the SAS XML LIBNAME engine and
XMLMAP processing.
This basic workflow is used by all XML-based standards that are supported by the SAS
Clinical Standards Toolkit.
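The final step of this workflow, reading a cube-style XML file through the SAS XML LIBNAME engine with an XMLMap, can be tried in isolation. The sketch below is not toolkit code: it writes a tiny hand-made XML file and a two-column map to temporary files, then reads them with the XMLV2 engine. The file contents, the fileref names, and the use of XMLMap version 2.1 with a DATATYPE of string are choices made for the example and differ from the map that the toolkit provides. The XSL transformation step is not shown.

```sas
filename cube temp;   /* stand-in for the intermediate cubeXML file */
filename map  temp;   /* stand-in for the XMLMap file */

/* Write a minimal XML document in the /LIBRARY/table/column layout */
data _null_;
  file cube;
  put '<?xml version="1.0" encoding="UTF-8"?>';
  put '<LIBRARY>';
  put '  <ItemDefs><OID>ID.AETERM</OID><Name>AETERM</Name></ItemDefs>';
  put '  <ItemDefs><OID>ID.PNO</OID><Name>PNO</Name></ItemDefs>';
  put '</LIBRARY>';
run;

/* Write a minimal XMLMap that describes the ItemDefs table */
data _null_;
  file map;
  put '<?xml version="1.0" encoding="UTF-8"?>';
  put '<SXLEMAP name="ItemDefsSketch" version="2.1">';
  put '  <TABLE name="ItemDefs">';
  put '    <TABLE-PATH syntax="XPath">/LIBRARY/ItemDefs</TABLE-PATH>';
  put '    <COLUMN name="OID">';
  put '      <PATH syntax="XPath">/LIBRARY/ItemDefs/OID</PATH>';
  put '      <TYPE>character</TYPE><DATATYPE>string</DATATYPE><LENGTH>64</LENGTH>';
  put '    </COLUMN>';
  put '    <COLUMN name="Name">';
  put '      <PATH syntax="XPath">/LIBRARY/ItemDefs/Name</PATH>';
  put '      <TYPE>character</TYPE><DATATYPE>string</DATATYPE><LENGTH>128</LENGTH>';
  put '    </COLUMN>';
  put '  </TABLE>';
  put '</SXLEMAP>';
run;

/* The libref matches the fileref, so the engine reads the cube file */
libname cube xmlv2 xmlmap=map access=readonly;

data work.itemdefs;
  set cube.ItemDefs;
run;

libname cube clear;

proc print data=work.itemdefs noobs;
run;
```

Each TABLE element in the map becomes one SAS data set, which is why a fixed intermediate layout lets one map serve every input file of a given standard.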
To read an ODM XML file, a specialized macro named %ODM_READ is available in the
ODM 1.3.0 standards macro folder. This folder is located here:
global standards library directory/standards/cdisc-odm-1.3.0-1.7/macros
framework initialization properties and the CDISC ODM 1.3.0 initialization properties.
Throughout the processing of the %ODM_READ macro, the Results data set contains
all framework and ODM 1.3.0 specific messages generated during run time.
Based on file references defined in the SASReferences data set, the %ODM_READ
macro accesses the ODM XML file.
After the %ODM_READ macro confirms that the ODM XML file exists, a call is made to
the SAS DATA step component JavaObj. JavaObj processing converts the ODM XML
file into the cubeXML file through transformations using XSL files and processes. The
cubeXML file is created in the Work library. The name of the cubeXML file is
_cubnnnn.xml, where nnnn is a randomly generated number. The cubeXML file is
accessed using the SAS XML LIBNAME engine and XMLMAP processing. A default
XMLMAP file is stored in the sample ODM 1.3.0 study folder hierarchy
under /referencexml as odm.map. The odm.map file is required to process the
cubeXML file. If it does not exist, then the %ODM_READ macro attempts to create one
using the ODM reference metadata.
<TABLE name="ItemDefs">
<TABLE-PATH syntax="XPath">/LIBRARY/ItemDefs</TABLE-PATH>
<TABLE-DESCRIPTION>Item metadata</TABLE-DESCRIPTION>
<COLUMN name="OID">
<PATH syntax="Xpath">/LIBRARY/ItemDefs/OID</PATH>
<TYPE>character</TYPE>
<DATATYPE>character</DATATYPE>
<DESCRIPTION>Unique identifier for this item</DESCRIPTION>
<LENGTH>64</LENGTH>
</COLUMN>
<COLUMN name="Name">
<PATH syntax="Xpath">/LIBRARY/ItemDefs/Name</PATH>
<TYPE>character</TYPE>
<DATATYPE>character</DATATYPE>
<DESCRIPTION>Item (variable) name</DESCRIPTION>
<LENGTH>128</LENGTH>
</COLUMN>
<COLUMN name="DataType">
<PATH syntax="Xpath">/LIBRARY/ItemDefs/DataType</PATH>
<TYPE>character</TYPE>
<DATATYPE>character</DATATYPE>
When the cubeXML is processed, each of the 66 data sets (such as ItemDefs) that are
included in the SAS representation of the CDISC ODM 1.3.0 model is derived.
Note: For more information about the %ODM_READ macro, see the SAS Clinical
Standards Toolkit: Macro API Documentation.
filenames to use, and the names and locations of data sets to be created by the
process. It can be modified to point to study-specific files. For an explanation of the
SASReferences data set, see Chapter 6, “SASReferences File,” on page 137.
In the SASReferences data set, there are two input file references and five output data
set references that are key to the successful completion of the driver program. Table 9.1
on page 299 lists these files and data sets, and they are discussed in separate
sections. In the sample create_sasodm_fromxml.sas driver program, these values are
set for &studyRootPath and &studyOutputPath:
&studyRootPath=&_cstSRoot/cdisc-odm-&_cstStandardVersion.-&_cstVersion
&studyOutputPath=&_cstSRoot/cdisc-odm-&_cstStandardVersion.-&_cstVersion
Table 9.1 Key Components of the SASReferences Data Set for the create_sasodm_fromxml.sas Driver Program
Metadata Type    SAS LIBNAME or Fileref to Use    Reference Type    Path    Name of File
Input
Output
Process Inputs
The externalxml type refers to the ODM XML file to read. The filename reference
odmxml is defined in the SASReferences data set. This filename reference is used in
the submitted SAS code when referring to the ODM XML file.
The referencexml type refers to the SAS map file that is used to generate the SAS data
sets that represent the ODM file metadata and content. The filename reference
odmmap is defined in the SASReferences data set. This filename reference is used in
the submitted SAS code when referring to the SAS map file. If a path and filename for
the map file are not specified, a temporary map file is created as part of the odm_read
processing.
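The two input records described above can be pictured as follows. This is a sketch with a reduced set of SASReferences columns. The type values and the filerefs odmxml and odmmap come from the text, while the paths, the XML file name, the iotype value, and the column lengths are hypothetical.

```sas
/* Sketch only: the two input records for reading an ODM XML file.   */
/* Paths, the XML file name, iotype, and lengths are assumed.        */
data work.sasrefs_odm_inputs;
  length type $40 sasref $8 reftype $8 iotype $8 path $200 memname $48;
  reftype='fileref'; iotype='input';
  /* The ODM XML file to read */
  type='externalxml'; sasref='odmxml';
  path='/studies/mystudy/sourcexml'; memname='odm_sample.xml'; output;
  /* The SAS map file; leave path and memname blank to get a temporary map */
  type='referencexml'; sasref='odmmap';
  path='/studies/mystudy/referencexml'; memname='odm.map'; output;
run;

proc print data=work.sasrefs_odm_inputs noobs;
run;
```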
Process Outputs
When the driver program finishes running, the read_results data set is created in the
Results library. This data set contains informational, warning, and error messages that
were generated by the driver program.
The following display shows an example of the contents of a Results data set that was
created while reading the sample ODM XML file that was provided with the SAS Clinical
Standards Toolkit:
Figure 9.1 Example of a Partial Results Data Set Created by the create_sasodm_fromxml.sas
Driver Program
The %ODM_READ macro creates the source_tables and source_columns data sets in
the Srcmeta library. These data sets contain the table and column metadata for each of
the SAS data sets that is derived from the ODM XML file.
Figure 9.2 Example of Partial Source_Tables Data Set Derived from the %ODM_READ Macro
Figure 9.3 Example of Partial Source_Columns Data Set Derived from the %ODM_READ
Macro
The Srcdata library contains the SAS data sets that represent the ODM file metadata
and content. By default, the %ODM_READ macro creates 66 unique data sets in the
SAS Clinical Standards Toolkit for ODM 1.3.0. Some of these data sets might be empty
if no associated content was derived from the ODM XML file. There is a one-to-one
correspondence between the tables listed in the Srcdata library and the tables
contained in the source_tables metadata file in the Srcmeta library.
Figure 9.4 Example of Partial Srcdata Library Derived from the %ODM_READ Macro
unless the EDC views have been built using the CDISC CDASH standard. From a SAS
perspective, you might want to extract clinical data from an ODM XML file to serve as
source data for transformations that derive SDTM domain data sets.
ODM integer and float data types are converted to SAS numeric data. All other ODM
data types are converted to SAS character data. If an integer or float data value cannot
be converted, a warning appears in the SAS log and Results data set.
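The conversion rule can be illustrated with a short DATA step. This is a sketch of the rule as stated, not the macro's own code: the input records, the data types assigned to each item, and the wording of the warning are invented for the example.

```sas
/* Sketch only: integer and float values become numeric, all others  */
/* stay character. The input records below are invented.             */
data work.odm_values;
  length itemoid $40 datatype $10 value $40 charvalue $40 numvalue 8;
  infile datalines dsd truncover;
  input itemoid $ datatype $ value $;
  if datatype in ('integer', 'float') then do;
    /* ?? suppresses the invalid-data notes so that we can report ourselves */
    numvalue = input(value, ?? best32.);
    if missing(numvalue) and not missing(value) then
      put 'WARNING: Value could not be converted. ' itemoid= value=;
  end;
  else charvalue = value;
  datalines;
ID.LINE_NO,integer,1
ID.AESTYR,integer,1999
ID.AETERM,text,HEADACHE
ID.DEMO_DOSE,float,1O.5
;
run;

proc print data=work.odm_values noobs;
run;
```

The last record contains the letter O in place of a zero, so it triggers the warning and leaves numvalue missing.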
Here is a partial listing of the metadata in a sample ODM XML file:
<ItemGroupDef OID="ItemGroupDefs.OID.AE" Repeating="Yes"
SASDatasetName="AE" Name="Adverse Events" Domain="AE"
Comment="Some adverse events from this trial">
<ItemRef ItemOID="ID.TAREA" OrderNumber="1" Mandatory="No" />
<ItemRef ItemOID="ID.PNO" OrderNumber="2" Mandatory="No" />
<ItemRef ItemOID="ID.SCTRY" OrderNumber="3" Mandatory="No" />
<ItemRef ItemOID="ID.F_STATUS" OrderNumber="4" Mandatory="No" />
<ItemRef ItemOID="ID.LINE_NO" OrderNumber="5" Mandatory="No" />
<ItemRef ItemOID="ID.AETERM" OrderNumber="6" Mandatory="No" />
<ItemRef ItemOID="ID.AESTMON" OrderNumber="7" Mandatory="No" />
<ItemRef ItemOID="ID.AESTDAY" OrderNumber="8" Mandatory="No" />
<ItemRef ItemOID="ID.AESTYR" OrderNumber="9" Mandatory="No" />
<ItemRef ItemOID="ID.AESTDT" OrderNumber="10" Mandatory="No" />
Here is a partial listing of the data in the same sample ODM XML file:
<ClinicalData StudyOID="Study.OID" MetaDataVersionOID="MetaDataVersion.OID.1">
<SubjectData SubjectKey="S001P011" TransactionType="Insert">
<StudyEventData StudyEventOID="StudyEventDefs.OID.6.AdverseEvent"
StudyEventRepeatKey="1">
<FormData FormOID="FormDefs.OID.AE" FormRepeatKey="1">
<ItemGroupData ItemGroupOID="ItemGroupDefs.OID.AE"
ItemGroupRepeatKey="1">
<ItemData ItemOID="ID.TAREA" Value="ONC" />
<ItemData ItemOID="ID.PNO" Value="143-02" />
<ItemData ItemOID="ID.SCTRY" Value="USA" />
<ItemData ItemOID="ID.F_STATUS" Value="V" />
<ItemData ItemOID="ID.LINE_NO" Value="1" />
<ItemData ItemOID="ID.AETERM" Value="HEADACHE" />
<ItemData ItemOID="ID.AESTMON" Value="06" />
<ItemData ItemOID="ID.AESTDAY" Value="10" />
<ItemData ItemOID="ID.AESTYR" Value="1999" />
The %ODM_EXTRACTDOMAINDATA macro creates the data set shown in Figure 9.5
on page 307 and Figure 9.6 on page 308. The first 12 columns in this data set are the
data set keys. The macro parameter _cstODMMinimumKeyset determines whether
these keys are part of the extracted data set.
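A single call for the Adverse Events item group shown in the listings might look like the following. The parameter names are the ones used by the sample driver code in this section, and the OID comes from the sample ODM XML file. The values chosen for _cstIsReferenceData and _cstOutputDS are assumptions. The call runs only in a session where the toolkit has been initialized and %ODM_READ has already populated the Srcdata library.

```sas
/* Sketch only: extract one item group by its OID.                  */
/* Requires an initialized toolkit session; not runnable on its own. */
%odm_extractdomaindata(
  _cstSelectAttribute=OID,
  _cstSelectAttributeValue=ItemGroupDefs.OID.AE,
  _cstIsReferenceData=No,       /* assumed value */
  _cstMaxLabelLength=256,
  _cstAttachFormats=Yes,
  _cstODMMinimumKeyset=No,
  _cstLang=en,
  _cstOutputDS=AE               /* assumed output data set name */
);
```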
Two sample driver programs for ODM 1.3.1 are provided with the SAS Clinical
Standards Toolkit to demonstrate the use of the %ODM_EXTRACTDOMAINDATA
macro:
sample study library directory/cdisc-odm-1.3.1-1.7/programs/extract_domaindata_all.sas
* Write one %odm_extractdomaindata call per ItemGroupDef to incCode;
* The incCode fileref must be assigned before this step runs;
data _null_;
  set srcdata.itemgroupdefs(keep=OID Name IsReferenceData SASDatasetName Domain);
  file incCode;
  length macrocall $400 _cstOutputName $100;
  _cstOutputName=SASDatasetName;
  * If we have to use the Name, only use letters and digits;
  * (modifiers go in the third COMPRESS argument: k=keep, a=letters, d=digits);
  if missing(_cstOutputName) then _cstOutputName=cats(compress(Name, , 'adk'));
  * If first character a digit, prepend an underscore;
  if anydigit(_cstOutputName)=1 then _cstOutputName=cats('_', _cstOutputName);
  * Cut long names;
  if length(_cstOutputName) > 32 then _cstOutputName=substr(_cstOutputName, 1, 32);
  macrocall=cats('%odm_extractdomaindata(_cstSelectAttribute=OID',
    ', _cstSelectAttributeValue=', OID,
    ', _cstIsReferenceData=', IsReferenceData,
    ', _cstMaxLabelLength=256',
    ', _cstAttachFormats=Yes',
    ', _cstODMMinimumKeyset=No',
    ', _cstLang=en',
    ', _cstOutputDS=', _cstOutputName, ');');
  put macrocall;
run;

* Run the generated macro calls, then release the fileref;
%include incCode;
filename incCode clear;
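The output-name derivation in the DATA step above can be tried on its own. The sketch below applies the same three rules (keep only letters and digits, prefix a leading digit with an underscore, truncate to 32 characters) to a few invented item group names. It uses the three-argument form of COMPRESS, in which the modifiers 'adk' are passed as the third argument.

```sas
/* Sketch only: apply the naming rules to invented item group names */
data work.output_names;
  length Name $60 _cstOutputName $100;
  infile datalines truncover;
  input Name $60.;
  /* Keep (k) only alphabetic characters (a) and digits (d) */
  _cstOutputName = compress(Name, , 'adk');
  /* A SAS name cannot start with a digit */
  if anydigit(_cstOutputName) = 1 then
    _cstOutputName = cats('_', _cstOutputName);
  /* SAS data set names are limited to 32 characters */
  if length(_cstOutputName) > 32 then
    _cstOutputName = substr(_cstOutputName, 1, 32);
  datalines;
Adverse Events
12-Lead ECG
Concomitant Medications And Procedures Log
;
run;

proc print data=work.output_names noobs;
run;
```

Here "Adverse Events" becomes AdverseEvents and "12-Lead ECG" becomes _12LeadECG. The third name is shortened to its first 32 remaining characters.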
This macro is referenced from the create_sasct_fromxml.sas driver program. For more
information, see “Sample Driver Program: create_sasct_fromxml.sas ” on page 314.
File references and other metadata that are required by the macro are set as global
macro variable values. These global macro variable values are set through the
framework initialization properties and the CDISC controlled terminology 1.0
initialization properties. Throughout the processing of the %CT_READ macro, the
Results data set contains all framework-specific messages and CDISC controlled
terminology 1.0-specific messages that were generated during run time.
Based on file references defined in the SASReferences data set, the %CT_READ
macro accesses the ODM controlled terminology XML file.
The following display shows a partial listing of a sample ODM controlled terminology
XML file:
Figure 9.7 Partial Listing of a Sample ODM Controlled Terminology XML File
After the %CT_READ macro confirms that the ODM controlled terminology XML file
exists, a call is made to the SAS DATA step component JavaObj. JavaObj processing
converts the ODM controlled terminology XML file into a cubeXML file through
transformations using XSL files and processes.
The cubeXML file is created in the SAS Work library. The name of the cubeXML file is
_cubnnnn.xml, where nnnn is a randomly generated number.
The cubeXML file is accessed using the SAS XML LIBNAME engine and XMLMap
processing. A default XMLMap file is stored in the sample CDISC controlled terminology
1.0 study folder hierarchy (referencexml/odm.map). An odm.map file is required to
process the cubeXML file. If it does not exist, the %CT_READ macro attempts to create
one using the CDISC controlled terminology reference metadata.
<TABLE name="CodeLists">
<TABLE-PATH syntax="XPath">/LIBRARY/CodeLists</TABLE-PATH>
<TABLE-DESCRIPTION>Codelist metadata</TABLE-DESCRIPTION>
<COLUMN name="OID">
<PATH syntax="Xpath">/LIBRARY/CodeLists/OID</PATH>
<TYPE>character</TYPE>
<DATATYPE>character</DATATYPE>
<DESCRIPTION>Unique identifier for this codelist</DESCRIPTION>
<LENGTH>128</LENGTH>
</COLUMN>
<COLUMN name="Name">
<PATH syntax="Xpath">/LIBRARY/CodeLists/Name</PATH>
<TYPE>character</TYPE>
<DATATYPE>character</DATATYPE>
<DESCRIPTION>CodeList name</DESCRIPTION>
<LENGTH>128</LENGTH>
</COLUMN>
<COLUMN name="DataType">
<PATH syntax="Xpath">/LIBRARY/CodeLists/DataType</PATH>
<TYPE>character</TYPE>
<DATATYPE>character</DATATYPE>
<DESCRIPTION>CodeList item value data type (integer | float | text | string)</DESCRIPTION>
<LENGTH>7</LENGTH>
</COLUMN>
<COLUMN name="SASFormatName">
<PATH syntax="Xpath">/LIBRARY/CodeLists/SASFormatName</PATH>
<TYPE>character</TYPE>
<DATATYPE>character</DATATYPE>
<DESCRIPTION>SAS format name</DESCRIPTION>
<LENGTH>8</LENGTH>
</COLUMN>
<COLUMN name="ExtCodeID">
<PATH syntax="Xpath">/LIBRARY/CodeLists/ExtCodeID</PATH>
<TYPE>character</TYPE>
<DATATYPE>character</DATATYPE>
<DESCRIPTION>Unique numeric code randomly generated by NCI Thesaurus (NCIt)</DESCRIPTION>
<LENGTH>64</LENGTH>
</COLUMN>
<COLUMN name="CodeListExtensible">
<PATH syntax="Xpath">/LIBRARY/CodeLists/CodeListExtensible</PATH>
<TYPE>character</TYPE>
<DATATYPE>character</DATATYPE>
<DESCRIPTION>Defines if controlled terms may be added to the codelist (Yes | No)</DESCRIPTION>
<LENGTH>3</LENGTH>
</COLUMN>
<COLUMN name="CDISCSubmissionValue">
<PATH syntax="Xpath">/LIBRARY/CodeLists/CDISCSubmissionValue</PATH>
<TYPE>character</TYPE>
<DATATYPE>character</DATATYPE>
<DESCRIPTION>Specific value expected for submissions</DESCRIPTION>
<LENGTH>512</LENGTH>
</COLUMN>
When the cubeXML file is processed, each of the 15 data sets (such as CodeLists) that
are included in the SAS representation of the CDISC controlled terminology model is
derived. One input parameter can be specified in the call to the %CT_READ macro. The
parameter offers the option to create source metadata files.
Note: For more information about the %CT_READ macro, see the SAS Clinical
Standards Toolkit: Macro API Documentation.
By default, if a %CT_READ macro call is made with null parameters, source metadata
is derived. The target location of the derived metadata files is defined in the
SASReferences data set.
In the SASReferences data set, there are two input file references and five output data
set references that are key to the successful completion of the driver program. Table 9.2
on page 315 lists these files and data sets. In the sample create_sasct_fromxml.sas
driver program, these values are set for &studyRootPath and &studyOutputPath:
&studyRootPath=sample study library directory/cdisc-ct-1.0-1.7
Table 9.2 Key Components of the SASReferences Data Set for the create_sasct_fromxml.sas Driver Program
Metadata Type    SAS LIBNAME or Fileref to Use    Reference Type    Path    Name of File
Input
Output
Process Inputs
The externalxml type refers to the ODM controlled terminology XML file to read. The
filename reference crtxml is defined in the SASReferences data set. This filename
reference is used in the submitted SAS code when referring to the ODM controlled
terminology XML file.
The referencexml type refers to the SAS map file that is used to generate the SAS data
sets that represent the ODM file metadata and content. The filename reference ctmap is
defined in the SASReferences data set. This filename reference is used in the
submitted SAS code when referring to the SAS map file. If a path and filename for the
map file are not specified, a temporary map file is created as part of the %CT_READ
macro processing.
Process Outputs
When the driver program finishes running, the read_results_sdtm_201212 data set is
created in the Results library. This data set contains informational messages, warnings,
and error messages that were generated by the driver program.
The following display shows an example of the contents of a Results data set that was
created while reading the sample ODM controlled terminology XML file as released by
NCI that was provided with the SAS Clinical Standards Toolkit:
Figure 9.8 Example of a Partial Results Data Set Created by the create_sasct_fromxml.sas
Driver Program
The Srcdata library contains the SAS data sets that represent the ODM controlled
terminology XML file metadata and content. By default, the %CT_READ macro creates
15 unique data sets in the SAS Clinical Standards Toolkit. Some of these data sets
might be empty if no associated content was derived from the ODM controlled
terminology XML file. There is a one-to-one correspondence between the tables listed in
the Srcdata library and the tables contained in the source_tables metadata file in the
Srcmeta library.
Figure 9.9 Example of Partial Srcdata Library Derived from the %CT_READ Macro
The following display shows an example of controlled terminology in ODM XML (the
Action Taken with Study Treatment codelist):
The following display shows the data set created by the %CT_CREATEFORMATS
macro:
Figure 9.11 Partial cterms SAS Data Set Created by the %CT_CREATEFORMATS Macro
The following display shows that the %CT_CREATEFORMATS macro uses the data set
to create the $ACN SAS format:
After these steps, the value of the fmtname variable is validated against the following
regular expression:
'm/^(?=.{1,32}$)([\$a-zA-Z_][a-zA-Z0-9_]*[a-zA-Z_])$/'
If the value of the fmtname variable fails validation, it is not a valid SAS format name.
The value is set to missing, and the codelist is not used to create a SAS format.
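To see which names pass this check, the following Python sketch applies the same pattern with the Python re module. It is an illustration only: the toolkit evaluates the expression in SAS, the helper name is_valid_fmtname is hypothetical, and the Perl-style m/.../ delimiters are dropped because Python does not use them.

```python
import re

# Same pattern as in the text, without the Perl-style m/.../ delimiters.
FMTNAME_PATTERN = re.compile(
    r"^(?=.{1,32}$)([\$a-zA-Z_][a-zA-Z0-9_]*[a-zA-Z_])$"
)


def is_valid_fmtname(fmtname):
    """Return True if fmtname passes the pattern (hypothetical helper)."""
    return FMTNAME_PATTERN.match(fmtname) is not None


if __name__ == "__main__":
    # Print each candidate name with the result of the check.
    for name in ["$ACN", "ACN", "ACN1", "1ACN", "A" * 33]:
        print(name, is_valid_fmtname(name))
```

Under this pattern, a name must have between 2 and 32 characters, can start with a dollar sign, and cannot end with a digit.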
Two sample driver programs are provided with the SAS Clinical Standards Toolkit to
demonstrate the use of the %CT_CREATEFORMATS macro:
sample study library directory/cdisc-ct-1.0-1.7/programs/
create_ctformats.sas
Note: This section demonstrates reading CDISC CRT-DDS 1.0 define.xml files as an
example. The CDISC Define-XML 2.0 process is similar, but uses the %DEFINE_READ
macro instead of the %CRTDDS_READ macro.
The SAS Clinical Standards Toolkit supports reading a define.xml file and translating the
file metadata into a SAS representation of the CDISC CRT-DDS model. To read the
define.xml file, a specialized macro named %CRTDDS_READ is available in the CRT-
DDS 1.0 standards macros folder. This folder is located in global standards
library directory/standards/cdisc-crtdds-1.0-1.7/macros.
File references and other metadata that are required by the macro are set as global
macro variables. These global macro variables are set through the framework
initialization properties and the CDISC CRT-DDS 1.0 initialization properties.
Throughout the processing of the %CRTDDS_READ macro, the Results data set
contains all framework-specific messages and CRT-DDS 1.0-specific messages that
were generated during run time.
Based on file references defined in the SASReferences data set, the %CRTDDS_READ
macro accesses the define.xml file.
xmlns="http://www.cdisc.org/ns/odm/v1.2" FileOID="1"
CreationDateTime="2011-07-13T17:15:43-04:00"
AsOfDateTime="2011-07-13T17:12:42"
Description="define1" FileType="Snapshot" Id="define1"
ODMVersion="1.0">
<Study OID="1">
<GlobalVariables>
<StudyName>study1</StudyName>
<StudyDescription>first study</StudyDescription>
<ProtocolName>Protocol abc</ProtocolName>
</GlobalVariables>
<MetaDataVersion OID="1" Name="CDISC-SDTM 3.1.2"
Description="CDISC-SDTM 3.1.2"
def:DefineVersion="1.0.0"
def:StandardName="CDISC SDTM"
def:StandardVersion="3.1.2">
<ItemGroupDef
OID="AE1" Name="AE" Repeating="Yes"
IsReferenceData="No"
SASDatasetName="AE" Domain="AE"
Purpose="Tabulation" def:Label="Adverse Events"
def:Class="Events"
def:Structure="One record per adverse event per subject"
def:DomainKeys="STUDYID USUBJID AEDECOD AESTDTC"
def:ArchiveLocationID="AE1">
<ItemRef ItemOID="COL1" Mandatory="Yes"
OrderNumber="1" KeySequence="1" Role="Identifier"/>
<ItemRef ItemOID="COL2" Mandatory="Yes"
OrderNumber="2" Role="Identifier"/>
<ItemRef ItemOID="COL3" Mandatory="Yes"
OrderNumber="3" KeySequence="2" Role="Identifier"/>
<ItemRef ItemOID="COL4" Mandatory="Yes"
OrderNumber="4" Role="Identifier"/>
<ItemRef ItemOID="COL5" Mandatory="No"
OrderNumber="5" Role="Identifier"/>
<ItemRef ItemOID="COL6" Mandatory="No"
OrderNumber="6" Role="Identifier"/>
<ItemRef ItemOID="COL7" Mandatory="No"
OrderNumber="7" Role="Identifier"/>
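To make the structure of this excerpt concrete, here is a minimal Python sketch that reads the ItemRef elements from a cut-down, well-formed copy of the ItemGroupDef above and lists the key items in KeySequence order. It is not how the toolkit processes the file (the toolkit uses JavaObj and XSL transformations). The trimmed sample document and the function name key_items are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

ODM_NS = "http://www.cdisc.org/ns/odm/v1.2"

# Cut-down, well-formed version of the excerpt (def: attributes omitted).
SAMPLE = f"""
<ODM xmlns="{ODM_NS}" FileOID="1">
  <Study OID="1">
    <MetaDataVersion OID="1" Name="CDISC-SDTM 3.1.2">
      <ItemGroupDef OID="AE1" Name="AE" Repeating="Yes">
        <ItemRef ItemOID="COL1" Mandatory="Yes" OrderNumber="1" KeySequence="1"/>
        <ItemRef ItemOID="COL2" Mandatory="Yes" OrderNumber="2"/>
        <ItemRef ItemOID="COL3" Mandatory="Yes" OrderNumber="3" KeySequence="2"/>
      </ItemGroupDef>
    </MetaDataVersion>
  </Study>
</ODM>
"""


def key_items(odm_xml, item_group_oid):
    """Return the ItemOID values that carry a KeySequence, in key order."""
    root = ET.fromstring(odm_xml)
    ns = {"odm": ODM_NS}
    keyed = []
    for group in root.iterfind(".//odm:ItemGroupDef", ns):
        if group.get("OID") != item_group_oid:
            continue
        for ref in group.iterfind("odm:ItemRef", ns):
            seq = ref.get("KeySequence")
            if seq is not None:
                keyed.append((int(seq), ref.get("ItemOID")))
    # Sort on the numeric KeySequence, then keep only the ItemOID.
    return [oid for _, oid in sorted(keyed)]


if __name__ == "__main__":
    print(key_items(SAMPLE, "AE1"))
```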
After the %CRTDDS_READ macro confirms that the define.xml file exists, a call is
made to the SAS DATA step component JavaObj. JavaObj processing converts the
define.xml file into a cubeXML file through transformations using XSL files and
processes.
The cubeXML file is created in the Work library. The name of the cubeXML file is
_cubnnnn.xml, where nnnn is a randomly generated number.
The cubeXML file is accessed using the SAS XML LIBNAME engine and XMLMap
processing. A default XMLMap file is stored in the sample CRT-DDS 1.0 study folder
hierarchy (referencexml/define.map). The define.map file is required to process
the cubeXML file. If it does not exist, the %CRTDDS_READ macro attempts to create
one using the CRT-DDS reference metadata.
<TABLE name="AnnotatedCRFs">
<TABLE-PATH syntax="XPath">/LIBRARY/AnnotatedCRFs</TABLE-PATH>
<TABLE-DESCRIPTION>Annotated CRF metadata</TABLE-DESCRIPTION>
<COLUMN name="DocumentRef">
<PATH syntax="Xpath">/LIBRARY/AnnotatedCRFs/DocumentRef</PATH>
<TYPE>character</TYPE>
<DATATYPE>character</DATATYPE>
<DESCRIPTION>The referenced Annotated CRF document</DESCRIPTION>
<LENGTH>2000</LENGTH>
</COLUMN>
<COLUMN name="leafID">
<PATH syntax="Xpath">/LIBRARY/AnnotatedCRFs/leafID</PATH>
<TYPE>character</TYPE>
<DATATYPE>character</DATATYPE>
<DESCRIPTION>The unique ID of the referenced Annotated CRF</DESCRIPTION>
<LENGTH>128</LENGTH>
</COLUMN>
<COLUMN name="FK_MetaDataVersion">
<PATH syntax="Xpath">/LIBRARY/AnnotatedCRFs/FK_MetaDataVersion</PATH>
<TYPE>character</TYPE>
<DATATYPE>character</DATATYPE>
<DESCRIPTION>Foreign key: MetaDataVersion.OID</DESCRIPTION>
<LENGTH>128</LENGTH>
</COLUMN>
</TABLE>
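The relationship between a map entry and the resulting table can be sketched in Python as follows. This is only an approximation of what the SAS XML LIBNAME engine does with an XMLMap, not its implementation. The cubeXML fragment is hypothetical (its layout is inferred from the XPath values in the map), the map is abbreviated, and read_table is an illustrative name.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the AnnotatedCRFs table definition shown above.
MAP_XML = """
<TABLE name="AnnotatedCRFs">
  <TABLE-PATH syntax="XPath">/LIBRARY/AnnotatedCRFs</TABLE-PATH>
  <COLUMN name="DocumentRef">
    <PATH syntax="Xpath">/LIBRARY/AnnotatedCRFs/DocumentRef</PATH>
    <LENGTH>2000</LENGTH>
  </COLUMN>
  <COLUMN name="leafID">
    <PATH syntax="Xpath">/LIBRARY/AnnotatedCRFs/leafID</PATH>
    <LENGTH>128</LENGTH>
  </COLUMN>
  <COLUMN name="FK_MetaDataVersion">
    <PATH syntax="Xpath">/LIBRARY/AnnotatedCRFs/FK_MetaDataVersion</PATH>
    <LENGTH>128</LENGTH>
  </COLUMN>
</TABLE>
"""

# Hypothetical cubeXML fragment; its layout is inferred from the XPath values.
CUBE_XML = """
<LIBRARY>
  <AnnotatedCRFs>
    <DocumentRef>blankcrf.pdf</DocumentRef>
    <leafID>blankcrf</leafID>
    <FK_MetaDataVersion>1</FK_MetaDataVersion>
  </AnnotatedCRFs>
</LIBRARY>
"""


def read_table(map_xml, cube_xml):
    """Return one dict per element named by TABLE-PATH, keyed by column name."""
    table = ET.fromstring(map_xml)
    root = ET.fromstring(cube_xml)
    table_path = table.findtext("TABLE-PATH").strip()
    parts = table_path.strip("/").split("/")
    if parts[0] != root.tag:
        raise ValueError("cubeXML root element does not match TABLE-PATH")
    rows = []
    # Each element matched by the table path becomes one row.
    for row_elem in root.findall("/".join(parts[1:])):
        row = {}
        for col in table.findall("COLUMN"):
            # Column paths extend the table path by one element name.
            leaf = col.findtext("PATH").strip()[len(table_path):].strip("/")
            length = int(col.findtext("LENGTH"))
            row[col.get("name")] = (row_elem.findtext(leaf) or "")[:length]
        rows.append(row)
    return rows


if __name__ == "__main__":
    print(read_table(MAP_XML, CUBE_XML))
```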
Processing of the cubeXML file results in the derivation of the data sets (such as
ItemDefs) currently included in the SAS representation of the CDISC CRT-DDS model.
The final step in the %CRTDDS_READ macro is the derivation of table and column
metadata that describe the data sets in the SAS representation of the define.xml file. At
this point, the %CRTDDS_READ macro is ready to create the source_tables and
source_columns data sets. The tables in the source_tables data set are created and
copied to the output library as defined in the SASReferences data set.
Note: CDISC CRT-DDS 1.0 is discussed in this section. The process is similar for
CDISC Define-XML 2.0.
The create_sascrtdds_fromxml.sas driver program is used to read define.xml files.
In the SASReferences data set, there are two input file references and four output data
set references that are key to the successful completion of the driver program. Table 9.3
on page 326 lists these files and data sets, and they are discussed in separate
sections. In the sample create_sascrtdds_fromxml.sas driver program, these values are
set for &studyRootPath and &studyOutputPath:
&studyRootPath=&_cstSRoot/cdisc-crtdds-1.0-&_cstVersion
&studyOutputPath=&_cstSRoot/cdisc-crtdds-1.0-&_cstVersion
Table 9.3 Key Components of the SASReferences Data Set for the
create_sascrtdds_fromxml.sas Driver Program
Metadata Type    SAS LIBNAME or Fileref to Use    Reference Type    Path    Name of File
Input
Output
Process Inputs
The externalxml type refers to the define.xml file to read. The filename reference crtxml
is defined in the SASReferences data set. This filename reference is used in the
submitted SAS code when referring to the define.xml file.
The referencexml type refers to the SAS map file that is used to generate the SAS data
sets that represent the define.xml file metadata and content. The filename reference
crtmap is defined in the SASReferences data set. This filename reference is used in the
submitted SAS code when referring to the SAS map file. If a path and filename for the
map file are not specified, a temporary map file is created as part of the
%CRTDDS_READ macro processing.
Process Outputs
The sourcedata type is the library where the metadata files are created. These
metadata files are the data sets that comprise the CRT-DDS information.
The sourcemetadata type refers to two data sets that are created from the cubeXML
file: source_tables and source_columns. The source_tables data set contains metadata
about each table that is derived by the %CRTDDS_READ macro. The source_columns data
set contains similar metadata at the column level. Both data sets are written to the
Srcmeta library. The sourcemetadata type also refers to the source_study data set,
which is created in the Srcmeta library and contains study metadata.
The results type refers to the Results data set that contains information from running the
CRT-DDS macro. This information is written to the read_results data set in the Results
library.
Process Results
When the driver program finishes running, the read_results data set is created in the
Results library. This data set contains informational, warning, and error messages that
were generated by the driver program.
The following display shows an example of the contents of a Results data set in the
CRT-DDS sample study:
The %CRTDDS_READ macro creates the source_tables and source_columns data sets
in the Srcmeta library. These data sets contain the table and column metadata for the
SAS representation of CRT-DDS that is derived from the define.xml file. The Srcmeta
Figure 9.14 Example of Partial Source_Tables Data Set Derived from the %CRTDDS_READ
Macro
Figure 9.15 Example of Partial Source_Columns Data Set Derived from the
%CRTDDS_READ Macro
The Srcdata library contains the driver program-generated tables that comprise the SAS
representation of the CRT-DDS model. There is a one-to-one correspondence between
the tables listed in the Srcdata library and the tables contained in the source_tables
metadata file in the Srcmeta library. The Srcdata library corresponds to the location
specified in SASReferences (&studyOutputPath/deriveddata).
Figure 9.16 Example of Partial Srcdata Library Derived from the %CRTDDS_READ Macro
When running the driver programs against non-sample data, you must populate the
SASReferences data set in the driver program with the proper values. For an
explanation of the SASReferences data set, see Chapter 6, “SASReferences File,” on
page 137.
Writing XML Files
Overview
Support of CDISC XML-based standards, such as CDISC CRT-DDS 1.0, CDISC Define-
XML 2.0, and CDISC ODM, includes the ability to render these files in SAS data set
format and the ability to create model-specific XML files from a SAS data set
representation of those standards.
In the SAS Clinical Standards Toolkit, you can create a CDISC CRT-DDS 1.0 define.xml
file or CDISC Define-XML 2.0 file (including Analysis Results Metadata 1.0) that
references a CDISC SDTM study, a SEND study, or a CDISC ADaM study. You can also
create a CDISC ODM 1.3.0 XML file or a CDISC ODM 1.3.1 file.
The next section outlines the basic workflow for the creation of model-specific XML files.
Basic Workflow
Here is the basic workflow for writing XML files:
2 (Optional) Validate the SAS representation of the XML-based standard (to include
foreign key relationships, value conformance to a set of expected values, and so
on).
3 Create a standardized intermediate cubeXML file using the data and metadata
contained in the SAS representation of the standard.
4 (Build and) reference a set of valid XSL style sheets for each target data set (such
as ItemDefs.xsl).
5 Use the SAS DATA step component JavaObj to read the cubeXML file using the XSL
style sheets to create the target standard-specific XML file.
6 (Optional) Validate the structure and syntax of the XML file that was created against
an XML schema.
3 The %CRTDDS_WRITE macro creates the define.xml file from the SAS
representation of the CRT-DDS files.
4 The %CSTUTILXMLVALIDATE macro validates that the XML file is structurally and
syntactically correct according to the XML schema for the CRT-DDS 1.0 standard.
This macro is important if you customize the define.xml file outside of the workflow.
For example, if you edit the define.xml file to add links for annotated CRF pages, this
macro validates the syntax.
These macros are called by driver programs that are responsible for properly setting up
each SAS Clinical Standards Toolkit process to perform a specific SAS Clinical
Standards Toolkit task. Several sample driver programs are provided with the SAS
Clinical Standards Toolkit CDISC CRT-DDS standard related to the creation of the
define.xml file.
These driver programs are examples that are provided with the SAS Clinical Standards
Toolkit. You can use these driver programs or create your own. The names of these
driver programs are not important. However, the content is important and demonstrates
how the various SAS Clinical Standards Toolkit framework macros are used to generate
the required metadata files.
The driver programs create a define.xml based on SDTM metadata. Similar programs
are provided with the SAS Clinical Standards Toolkit for the creation of a define.xml
based on ADaM metadata.
The following table lists the parameters for the driver program:
Parameter              Required   Description
_cstOutLib             Yes        The library reference (LIBNAME) where the tables are
                                  created.
_cstSourceTables       Yes        The data set that contains the SDTM metadata for the
                                  domains to include in the CRT-DDS file.
_cstSourceColumns      Yes        The data set that contains the SDTM metadata for the
                                  domain columns to include in the CRT-DDS file.
_cstSourceStudy        Yes        The data set that contains the SDTM metadata for the
                                  studies to include in the CRT-DDS file.
_cstSourceValues       No         The data set that contains the SDTM metadata for the
                                  Value Level columns to include in the CRT-DDS file.
_cstSourceDocuments    No         The data set that contains the SDTM metadata for the
                                  Document references to include in the CRT-DDS file.
In the example, the %CRTDDS_SDTMTODEFINE macro writes all of the CRT-DDS 1.0
defined tables to the Srcdata library.
In the SASReferences data set, there are five input file references and one output data
set reference that are key to the successful completion of the
create_crtdds_from_sdtm.sas driver program. Table 9.5 on page 336 lists these files
and data sets, and they are discussed in separate sections. In the sample
create_crtdds_from_sdtm.sas driver program, these values are set for &studyRootPath
and &studyOutputPath:
&studyRootPath=sample study library directory/cdisc-sdtm-3.1.3-1.7/sascstdemodata
Table 9.5 Key Components of the SASReferences Data Set for the
create_crtdds_from_sdtm.sas Driver Program
Metadata Type    SAS LIBNAME or Fileref to Use    Reference Type    Path    Name of File
Input
Output
Process Inputs
The sourcemetadata type refers to three data sets that contain the SDTM domain
metadata: source_tables, source_columns, and source_values. These data sets are
stored in the same library.
This location is represented in the driver program by the sampdata library name.
A source study data set (source_study) is required by this driver program. The following
table lists the variables that are required in this data set:
Table 9.6 Variables Required in the Source Study Data Set (source_study)
Variable        Required   Description
SASref          Yes        The reference that ties the study name to the corresponding
                           domains that are associated with this study in the
                           source_tables and source_columns data sets.
ProtocolName    Yes        The name of the protocol for the study. This value is used to
                           populate the srcdata.study.protocolname column.
Only a single study can be referenced in the source study data set.
Process Outputs
The sourcedata type is the library where the metadata files are created. These
metadata files are the data sets that comprise the SAS representation of the CDISC
CRT-DDS 1.0 standard. The create_crtdds_from_sdtm.sas driver program creates 39
data sets. Most of these data sets have zero observations because there is no default
SDTM metadata source. In the SAS Clinical Standards Toolkit sample study, these data
sets are written to the sample study library directory/cdisc-crtdds-1.0-1.7/data
directory. This location is represented in the driver program by the srcdata
library name.
Process Results
When the driver program finishes running, the sdtmtodefine_results data set is created.
This data set contains informational, warning, and error messages that were generated
by the submitted driver program.
Figure 9.17 Example of a Partial Results Data Set from CRT-DDS Sample Study
metadata or data are missing, then empty elements and attributes are not created in the
define.xml file. The inputs and outputs are specified in the SASReferences data set.
Note: For more information about the %CRTDDS_WRITE macro, see the SAS Clinical
Standards Toolkit: Macro API Documentation.
In this example, a default style sheet is generated in the same directory as the XML
output based on the information in the SASReferences data set. XML encoding is set to
UTF-16, and process results are written to the default &_cstResultsDS data set.
Here is the call to the macro from the sample create_crtdds_define.sas driver program:
%crtdds_write(_cstCreateDisplayStyleSheet=1);
The call creates a display style sheet and uses default values for the parameters.
Multiple tasks can be executed in any SAS Clinical Standards Toolkit driver program.
The create_crtdds_define.sas driver program calls both the %CRTDDS_WRITE macro
to create the define.xml file, and the %CSTUTILXMLVALIDATE macro to validate the
syntax of the generated define.xml file. For more information about the
%CSTUTILXMLVALIDATE macro, see “Validation of XML-Based Standards” on page
366.
In the SASReferences data set, there are two input file references and three output data
set references that are key to the successful completion of the create_crtdds_define.sas
driver program. Table 9.7 on page 341 lists these files and data sets, and they are
discussed in separate sections. In the sample create_crtdds_define.sas driver program,
these values are set for &studyRootPath and &studyOutputPath:
&studyRootPath=sample study library directory/cdisc-crtdds-1.0-1.7
Table 9.7 Key Components of the SASReferences Data Set for the %CRTDDS_WRITE Macro
Metadata Type    LIBNAME or Fileref to Use    Reference Type    Path    Name of File
Input
Output
Process Inputs
Use of the control library name that points to the path in the &workpath macro variable
demonstrates a technique of documenting the derivation of the SASReferences data set
in the SAS Work library. The driver program initializes the macro variable &workpath with
this SAS code:
%let workPath=%sysfunc(pathname(work));
The sourcedata type is the library that contains the 39 data sets that might have been
populated by the create_crtdds_from_sdtm.sas driver program. These metadata files
are the data sets that constitute the SAS representation of the CDISC CRT-DDS 1.0
standard. In the SAS Clinical Standards Toolkit sample study, these data sets are read
from the sample study library directory/cdisc-crtdds-1.0-1.7/data
directory. This location is represented in the driver program by the Srcdata library name.
Process Outputs
The externalxml type refers to the define.xml file. This file is accessed in the driver
program from the extxml filename statement, and is written to the sample study
library directory/cdisc-crtdds-1.0-1.7/sourcexml directory.
The referencexml type can serve as either an input or output file reference. If the path
and filename are not specified, the %CRTDDS_WRITE macro interprets the
_cstCreateDisplayStyleSheet=1 parameter to indicate the default style sheet that is
provided by the SAS Clinical Standards Toolkit in the global standards library. If a path
and filename are specified, the referencexml type serves as an output file reference for
the %CRTDDS_WRITE macro. The default style sheet is copied from the global
standards library to the path and filename that are specified.
The results type refers to the write_results data set that documents the results of the
create_crtdds_define.sas driver program. In the SAS Clinical Standards Toolkit CDISC
CRT-DDS folder hierarchy, this information is written to the sample study library
directory/cdisc-crtdds-1.0-1.7/results directory.
Process Results
Inclusion of the results record (row) in the SASReferences data set indicates that the
process results are to be copied to a write_results data set located in the specified SAS
library.
Figure 9.18 Example of a Partial Results Data Set from the CRT-DDS Sample Study
The %CRTDDS_WRITEPDF macro supports the creation of a define.pdf file for the
CDISC ADaM, SDTM, and SEND standards. The contents of the sections (which
attributes are printed) are based on the Study Data Tabulation Model Metadata
Submission Guidelines (SDTM-MSG) (http://www.cdisc.org/sdtm, 2011-12-31).
The define.pdf file has an optional table of contents and these sections:
• Dataset level metadata
• Variable level metadata
• Value level metadata
• Algorithms (Computational Methods)
• Controlled Terminology
The following parameters are the most important parameters for the
%CRTDDS_WRITEPDF macro:
• _cstCDISCStandard
  The CDISC standard for which the define.pdf is created. Valid values: SDTM, SEND,
  and ADAM. The default is SDTM.
• _cstSourceLib
  The library that contains the CRT-DDS SAS data sets. If not provided, the code
  looks in SASReferences for type=sourcedata.
• _cstReportOutput
  The name of the PDF to create. If not provided, the code looks in SASReferences
  for type=report.
• _cstLinks
  Indicates whether the macro creates internal hyperlinks in the PDF. Valid values: Y
  or N. The default is N.
• _cstTOC
  Indicates whether the macro creates a table of contents in the PDF. Valid values: Y
  or N. The default is N.
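The valid values and defaults listed above can be summarized in a short Python sketch. This is not the macro's implementation. The dictionary layout and the function name resolve_parameters are assumptions made for illustration, and only the three parameters with enumerated values are covered.

```python
# Valid values and defaults as listed above; the layout is illustrative.
PARAMETER_RULES = {
    "_cstCDISCStandard": ({"SDTM", "SEND", "ADAM"}, "SDTM"),
    "_cstLinks": ({"Y", "N"}, "N"),
    "_cstTOC": ({"Y", "N"}, "N"),
}


def resolve_parameters(**supplied):
    """Apply the documented defaults and reject values outside the valid set."""
    resolved = {}
    for name, (valid, default) in PARAMETER_RULES.items():
        # A missing or empty value falls back to the documented default.
        value = supplied.get(name) or default
        if value not in valid:
            raise ValueError(f"{name}={value!r} is not one of {sorted(valid)}")
        resolved[name] = value
    return resolved


if __name__ == "__main__":
    print(resolve_parameters(_cstCDISCStandard="ADAM", _cstTOC="Y"))
```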
Two sample driver programs are provided with the SAS Clinical Standards Toolkit to
demonstrate the use of the %CRTDDS_WRITEPDF macro:
sample study library directory/cdisc-crtdds-1.0-1.7/programs/
create_crtdds_define_pdf.sas
The following displays show examples of define.pdf files that were created by the
%CRTDDS_WRITEPDF macro:
3 The %CSTUTILXMLVALIDATE macro validates that the XML file is structurally and
syntactically correct according to the XML schema for the CDISC Define-XML 2.0
standard.
These macros are called by driver programs that are responsible for properly setting up
each SAS Clinical Standards Toolkit process to perform a specific SAS Clinical
Standards Toolkit task. Several sample driver programs are provided with the SAS
Clinical Standards Toolkit CDISC Define-XML 2.0 standard related to the creation of the
define.xml file.
These driver programs are examples that are provided with the SAS Clinical Standards
Toolkit. You can use these driver programs or create your own. The names of these
driver programs are not important. However, the content is important and demonstrates
how the various SAS Clinical Standards Toolkit framework macros are used to generate
the required metadata files.
The driver programs create a define.xml file based on SDTM or ADaM metadata.
When the macro parameter _cstFullModel has the value N, only the 31 Define-XML 2.0
core tables are created. Otherwise, all 46 tables in the Define-XML 2.0 reference
standard are created, but only those tables with available data are populated. The other
tables contain zero observations. When the macro parameter _cstCheckLengths has
the value Y, the macro checks the actual value lengths of variables with DataType=text
against the lengths defined in the metadata templates. If the defined lengths are too
short, a warning is written to the log file and the Results data set.
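The kind of comparison that the _cstCheckLengths option describes can be sketched in Python as follows. This is an illustration under assumptions, not the macro's logic: the function name check_lengths, the column name, and the message wording are hypothetical. The actual behavior is documented in the Macro API Documentation.

```python
def check_lengths(defined_lengths, records):
    """Return a warning for each text value longer than its defined length."""
    warnings = []
    for record in records:
        for column, value in record.items():
            limit = defined_lengths.get(column)
            # Only text values with a defined length are compared.
            if limit is not None and isinstance(value, str) and len(value) > limit:
                warnings.append(
                    f"{column}: value length {len(value)} exceeds defined length {limit}"
                )
    return warnings


if __name__ == "__main__":
    # Hypothetical template length and metadata values.
    print(check_lengths({"Name": 5}, [{"Name": "AE"}, {"Name": "Adverse"}]))
```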
Note: For more information about the %DEFINE_SOURCETODEFINE macro, see the
SAS Clinical Standards Toolkit: Macro API Documentation.
In this example, eight extra tables are created with metadata for analysis results.
The create_sasdefine_from_source.sas driver program is provided with the SAS Clinical
Standards Toolkit, and it is ready to run on any of the SDTM or ADaM sample studies.
The driver program can be run interactively or in batch. To run the driver program
interactively, start a SAS session, and load the driver program into the SAS editor.
In the SASReferences data set, there are seven input file references and one output
data set reference that are key to the successful completion of the
create_sasdefine_from_source.sas driver program. Table 9.8 on page 350 lists these
files and data sets, and they are discussed in separate sections. In the sample
Here are the macro variable assignments in the sample driver program to work with the
sample SDTM 3.1.2 metadata:
%let _cstTrgStandard=CDISC-SDTM;
%let _cstTrgStandardVersion=3.1.2;
Table 9.8 Key Components of the SASReferences Data Set for the
create_sasdefine_from_source.sas Driver Program
Metadata Type    SAS LIBNAME or Fileref to Use    Reference Type    Path    Name of File
Input
Output
Process Inputs
The sourcemetadata type refers to the data sets that contain the SDTM study metadata:
source_study, source_tables, source_columns, source_values, source_codelists,
source_documents, and source_analysisresults. These data sets are stored in the
same library.
This location is represented in the driver program by the sampdata library name.
A source study data set (source_study) can have only one record, and it is required by
this macro. The following table lists the variables that are required in this data set:
Table 9.9 Variables Required in the Source Study Data Set (source_study)
Variable        Required   Description
SASref          Yes        The reference that ties the study name to the corresponding
                           domains that are associated with this study in the
                           source_tables and source_columns data sets.
ProtocolName    Yes        The name of the protocol for the study. This value is used to
                           populate the srcdata.study.protocolname column.
Only a single study can be referenced in a source study data set. The
%DEFINE_SOURCETODEFINE macro selects records from only the source_tables,
source_columns, source_codelists, source_values, source_documents, and
source_analysisresults data sets whose StudyVersion column value is equal to the
value of the StudyVersion column in the source_study data set.
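The selection rule described here amounts to a simple filter on the StudyVersion column. The following Python sketch shows the idea with lists of dictionaries standing in for the SAS data sets. The function name select_study_records and the sample values are hypothetical. Only the StudyVersion column name comes from the text.

```python
def select_study_records(source_study, source_records):
    """Keep the records whose StudyVersion matches the single study record."""
    if len(source_study) != 1:
        raise ValueError("source_study must contain exactly one record")
    study_version = source_study[0]["StudyVersion"]
    return [r for r in source_records if r.get("StudyVersion") == study_version]


if __name__ == "__main__":
    # Hypothetical stand-ins for source_study and source_tables.
    study = [{"StudyVersion": "MDV.1"}]
    tables = [
        {"Table": "AE", "StudyVersion": "MDV.1"},
        {"Table": "DM", "StudyVersion": "MDV.2"},
    ]
    print(select_study_records(study, tables))
```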
Process Outputs
The sourcedata type is the library where the metadata files are created. These
metadata files are the data sets that constitute the SAS representation of the CDISC
Define-XML 2.0 standard. The create_sasdefine_from_source.sas driver program
creates 46 or 31 data sets, depending on the value of the _cstFullModel macro
parameter. Most of these data sets have zero observations because there is no default
SDTM metadata source. In the SAS Clinical Standards Toolkit sample driver program
create_sasdefine_from_source.sas, these data sets are written to this location:
sample study library directory/cdisc-definexml-2.0.0-1.7/data/cdisc-sdtm-3.1.2
This location is represented in the driver program by the srcdata library name.
Process Results
When the driver program finishes running, the sourcetodefine_results data set is
created in the Results library. This data set contains informational, warning, and error
messages that were generated by the driver program.
Figure 9.21 Example of a Partial Results Data Set from Define-XML 2.0 Sample Study
Note: For more information about the %DEFINE_WRITE macro, see the SAS Clinical
Standards Toolkit: Macro API Documentation.
In this example, a default style sheet is generated in the same directory as the XML
output based on the information in the SASReferences data set. XML encoding is set to
UTF-16, and process results are written to the default &_cstResultsDS data set.
Here is the call to the macro from the sample create_definexml.sas driver program:
%define_write(_cstCreateDisplayStyleSheet=1);
The call creates a display style sheet and uses default values for the parameters.
The create_definexml.sas driver program is ready to run on any of the CDISC SDTM
sample studies. The driver program can be run interactively or in batch.
Multiple tasks can be executed in any SAS Clinical Standards Toolkit driver program.
The create_definexml.sas driver program calls both the %DEFINE_WRITE macro to
create the Define-XML file and the %CSTUTILXMLVALIDATE macro to validate the
syntax of the generated Define-XML file. For more information about the
%CSTUTILXMLVALIDATE macro, see “Validation of XML-Based Standards” on page 366.
In the SASReferences data set, there are two input file references and three output data
set references that are key to the successful completion of the create_definexml.sas
driver program. Table 9.10 on page 356 lists these files and data sets, and they are
discussed in separate sections. In the sample create_definexml.sas driver program,
these values are set for &studyRootPath and &studyOutputPath:
&studyRootPath=sample study library directory/cdisc-definexml-2.0.0-1.7
Table 9.10 Key Components of the SASReferences Data Set for the %DEFINE_WRITE
Macro
Metadata Type    LIBNAME or Fileref to Use    Reference Type    Path    Name of File
Input
Output
Process Inputs
Use of the control library name that points to the path in the &workpath macro variable
demonstrates a technique of documenting the derivation of the SASReferences data set
in the SAS Work library. The driver program initializes the macro variable &workpath with
this SAS code:
%let workPath=%sysfunc(pathname(work));
The sourcedata type is the library that contains the Define-XML data sets that might
have been populated by the create_sasdefine_from_source.sas driver program. These
metadata files are the data sets that constitute the SAS representation of the CDISC
Define-XML 2.0 standard. In the SAS Clinical Standards Toolkit sample study, these
data sets are read from the sample study library
directory/cdisc-definexml-2.0.0-1.7/data/cdisc-sdtm-3.1.2 directory. This location is
represented in the driver program by the Srcdata library name.
Process Outputs
The externalxml type refers to the define-sdtm-3.1.2.xml file. This file is accessed in the
driver program from the extxml filename statement, and is written to the sample
study library directory/cdisc-definexml-2.0.0-1.7/sourcexml directory.
The referencexml type can serve as either an input or output file reference. If the path
and filename are not specified, the %DEFINE_WRITE macro interprets the
_cstCreateDisplayStyleSheet=1 parameter to indicate the default style sheet that is
provided by the SAS Clinical Standards Toolkit in the global standards library. If a path
and filename are specified, the referencexml type serves as an output file reference for
the %DEFINE_WRITE macro. The default style sheet is copied from the global
standards library to the path and filename that are specified.
The results type refers to the write_results data set that documents the results of the
create_definexml.sas driver program. In the SAS Clinical Standards Toolkit CDISC
Define-XML folder hierarchy, this information is written to the sample study library
directory/cdisc-definexml-2.0.0-1.7/results directory.
On UNIX, if you have not set up your browser configuration in SAS, you need to copy
define-sdtm-3.1.2.xml and define2-0-0.xsl to an environment where you can display the
XML file in a web browser.
Note: The style sheet in define2-0-0.xsl is not guaranteed to produce correct HTML in all
browser types and versions. However, it does work with Internet Explorer 6.0 and higher.
The Chrome browser, for example, does not allow local XML and XSLT processing.
The sample driver program also creates the HTML rendition in the same folder as the
XML file using this code:
proc xsl
in=extxml
xsl=xslt01
out=html;
run;
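The three filerefs in the PROC XSL step are normally allocated from the SASReferences data set. As a minimal sketch, assuming that the style sheet sits in the same sourcexml folder as the XML file and that you want to assign the filerefs manually, the step could be set up as follows. The outDir macro variable and the HTML file name are hypothetical and are not part of the sample driver program.

```sas
/* Hypothetical helper variable: folder that holds the generated XML file. */
%let outDir=&studyOutputPath/sourcexml;

/* Generated Define-XML file (fileref name taken from the driver program). */
filename extxml "&outDir/define-sdtm-3.1.2.xml";
/* Style sheet; its location in this folder is an assumption. */
filename xslt01 "&outDir/define2-0-0.xsl";
/* HTML rendition; the file name is hypothetical. */
filename html "&outDir/define-sdtm-3.1.2.html";

/* Apply the style sheet to the XML file and write the HTML rendition. */
proc xsl
  in=extxml
  xsl=xslt01
  out=html;
run;
```

Assigning the filerefs by hand like this can be useful for a quick check of the rendering outside of a full driver program run.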
Writing XML Files 359
Instead of opening the XML file in a browser and letting the browser use the XSL file to
render the HTML, you can directly open the HTML file.
Depending on your browser, you might see a security warning because the style sheet
uses JavaScript.
Process Results
Inclusion of the results record (row) in the SASReferences data set indicates that the
process results are to be copied to a write_results data set located in the specified SAS
library.
Figure 9.24 Example of a Partial Results Data Set from the Define-XML 2.0 Sample Study
There are several key macros that are provided with the SAS Clinical Standards Toolkit
that support the creation of an ODM XML file. The macros are listed in the order in
which they are executed:
1 The %ODM_VALIDATE macro submits a set of validation checks based on what is
defined in the Validation Control data set to validate the referenced SAS
representation of each ODM XML file.
2 The %ODM_WRITE macro creates the ODM XML file from the SAS representation
of the ODM files and validates that the XML file is structurally and syntactically
correct. This macro is important if you customize the XML file outside of the
workflow.
3 The %CSTUTILXMLVALIDATE macro validates that the XML file is structurally and
syntactically correct, according to the XML schema for the ODM standard. This
macro is important if you customize the ODM XML file outside of the workflow.
These macros are called by driver programs that are responsible for properly setting up
each SAS Clinical Standards Toolkit process to perform a specific SAS Clinical
Standards Toolkit task. Two sample driver programs are provided with the SAS Clinical
Standards Toolkit CDISC ODM standard related to the creation of the XML file.
These driver programs are examples that are provided with the SAS Clinical Standards
Toolkit. You can use these driver programs or create your own. The names of these
driver programs are not important. However, the content is important and demonstrates
how the various SAS Clinical Standards Toolkit framework macros are used to generate
the required metadata files.
For more information about the %ODM_WRITE macro, see the SAS Clinical Standards
Toolkit: Macro API Documentation.
In this example, no default style sheet is generated for the XML output, XML encoding is
set to UTF-16, and process results are written to the default &_cstResultsDS data set.
Here is the call to the macro from the sample create_odmxml.sas driver program:
%odm_write();
The call uses default values for the parameters. The create_odmxml.sas driver program
is ready to run on the CDISC ODM sample study provided with the SAS Clinical
Standards Toolkit. The driver program can be run interactively or in batch.
In the SASReferences data set, there are one input file reference and two output data
set references that are key to the successful completion of the create_odmxml.sas
driver program. Table 9.11 on page 364 lists these files and data sets, and they are
discussed in separate sections. In the sample create_odmxml.sas driver program, these
values are set for &studyRootPath and &studyOutputPath:
&studyRootPath=sample study library directory/cdisc-odm-1.3.0-1.7
Table 9.11 Key Components of the SASReferences Data Set for the %ODM_WRITE Macro
Input
Output
Process Inputs
The sourcedata type is the library that contains the default 66 data sets that comprise
the SAS representation of an ODM XML file. These data sets might have been
populated by a previous odm_read task, or you might have processes in place that build
these data sets from source files. In the SAS Clinical Standards Toolkit sample study,
these data sets are read from the sample study library directory/cdisc-odm-1.3.0-1.7/data
directory. This location is represented in the driver program by the Srcdata library name.
Process Outputs
The externalxml type refers to the ODM XML file that is to be derived by the process.
This file is accessed in the driver program from the extxml filename statement, and is
written to the sample study library directory/cdisc-odm-1.3.0-1.7/sourcexml directory.
Note: Unlike CDISC CRT-DDS or CDISC Define-XML, CDISC does not supply a
default style sheet for ODM and one is not provided as part of the SAS Clinical
Standards Toolkit. However, you can use the %ODM_WRITE macro, which provides the
_cstCreateDisplayStyleSheet parameter, to use information that you provide in the
Metadata Type referencexml record of the SASReferences file.
The results type refers to the write_results data set that documents the results of the
create_odmxml.sas driver program. In the SAS Clinical Standards Toolkit CDISC ODM
folder hierarchy, this information is written to this location:
sample study library directory/cdisc-odm-1.3.0-1.7/results
Process Results
Inclusion of the results record (row) in the SASReferences data set indicates that the
process results are to be copied to a write_results data set located in the specified SAS
library.
Figure 9.25 Example of a Partial Results Data Set from the ODM Sample Data Hierarchy
XML Validation
When validating XML-based standards (such as CDISC ODM, CDISC CT, CDISC CRT-DDS 1.0,
and CDISC Define-XML 2.0), the SAS Clinical Standards Toolkit offers two
complementary methodologies.
The SAS Clinical Standards Toolkit provides both methodologies to support the
validation of CDISC CRT-DDS 1.0 and CDISC ODM 1.3.0 and 1.3.1 files.
For CDISC Define-XML 2.0 files, SAS Clinical Standards Toolkit supports validation
against an XML schema.
recommended that you replace calls to these macros with a call to the
%CSTUTILXMLVALIDATE macro.
In this example, the %CSTUTILXMLVALIDATE macro is being submitted with a log level
of Info.
Note: For more information about the %CSTUTILXMLVALIDATE macro, see the SAS
Clinical Standards Toolkit: Macro API Documentation.
XML schema validation results are logged using four log-level settings. These log levels
refer to the XML-generated log, not the log that is generated by SAS.
Warning Messages that indicate that there might be an issue with the CRT-DDS
document or with the execution of the validation process.
Fatal Error Messages that indicate that the XML document could not be processed
at all. There are many causes, including file system access errors,
incorrect file paths, and malformed XML.
Each message that is generated during XML validation is associated with one of these
levels. The level that you choose determines what other messages are generated. For
example, if you choose the Warning level, then all Warning messages and anything
more severe, such as Error and Fatal error messages, are generated. If you choose the
Error level, then only Error and Fatal Error messages are generated.
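The threshold behavior described above can be sketched with a short DATA step. This is only an illustration of the rule "the chosen level and anything more severe"; it does not call any SAS Clinical Standards Toolkit macro. The ordering Info, Warning, Error, Fatal Error is an assumption based on the levels named in this section, and the data set and variable names are hypothetical.

```sas
/* Illustration only: which message levels are generated for each chosen level. */
data work.loglevel_demo;
  length chosen $12 level $12 emitted $3;
  /* Assumed severity order, from least to most severe. */
  array lv{4} $12 _temporary_ ('Info' 'Warning' 'Error' 'Fatal Error');
  do c = 1 to 4;
    chosen = lv{c};
    do m = 1 to 4;
      level = lv{m};
      /* A message is generated when its level is at least as severe as the chosen level. */
      emitted = ifc(m >= c, 'Yes', 'No');
      output;
    end;
  end;
  drop c m;
run;

proc print data=work.loglevel_demo noobs;
run;
```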
In the SAS Clinical Standards Toolkit, CDISC CRT-DDS validation uses the same types
of metadata and the same workflow process that is common to validation of all data
standards. SAS provides a set of validation checks for CDISC CRT-DDS that are
designed to verify the metadata definitions and values of the 39 data sets that comprise
the SAS representation of the CRT-DDS model. These checks were created by SAS.
For more information about these checks, see Chapter 7, “Compliance Assessment
Against a Reference Standard,” on page 161. Metadata about each check is provided in
the Validation Master data set in the global standards library directory/standards/cdisc-crtdds-1.0-1.7/validation/control
directory.
If all 39 CRT-DDS tables contribute information to the define.xml file, then the validation
process can run directly against the reference_tables and reference_columns data sets.
In this case, the Use source data flag in the validation check data set needs to be set to
N. However, you are likely to run validation against a subset of the 39 tables. In this
case, a source_tables data set that contains the subset needs to be created from the
reference_tables data set. And, a corresponding source_columns data set needs to be
created from the reference_columns data set. The run-time validation check data set
can contain all of the checks, and Use source data can be set to Y, which is the default
value.
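Creating the subset metadata can be as simple as filtering the reference metadata. The following is a minimal sketch, not the toolkit's own method. It assumes that the reference metadata is available through a libref named refmeta, that the target library is srcmeta, and that both metadata data sets carry a table column that identifies each table. The list of table names is hypothetical.

```sas
/* Hypothetical subset of CRT-DDS tables to validate. */
%let _subset='DEFINEDOCUMENT' 'STUDY' 'ITEMGROUPDEFS' 'ITEMDEFS';

/* Table-level metadata for the subset (refmeta is an assumed libref). */
data srcmeta.source_tables;
  set refmeta.reference_tables;
  where upcase(table) in (&_subset);
run;

/* Matching column-level metadata for the same tables. */
data srcmeta.source_columns;
  set refmeta.reference_columns;
  where upcase(table) in (&_subset);
run;
```

Filtering both data sets with the same list keeps the table and column metadata consistent with each other.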
In the SASReferences data set, there are four input file references, one input library
reference, and one output data set reference that are key to the successful completion
of the validation process. Table 9.13 on page 370 lists these files, libraries, and data
sets, and they are discussed in separate sections. In the sample
validate_crtdds_data.sas driver program, these values are set for &studyRootPath and
&studyOutputPath:
Note: The &studyRootPath and &studyOutputPath paths are the same for this driver
program. Two macro variables have been retained to maintain consistency across the
SAS Clinical Standards Toolkit driver programs.
&studyRootPath=sample study library directory/cdisc-crtdds-1.0-1.7
Table 9.13 Key Components of the SASReferences Data Set for the validate_crtdds_data.sas Driver Program
Metadata Type    SAS LIBNAME or Fileref to Use    Reference Type    Path    Name of File
Input
Output
Process Inputs
The use of the cntl_s LIBNAME that points to the &workpath path demonstrates a
technique of documenting the derivation of the SASReferences data set in the SAS
Work library. The driver program initializes the macro variable &workPath with this
statement:
%let workPath=%sysfunc(pathname(work));
In this case, the cntl_s LIBNAME points to the same directory as the Work LIBNAME.
The second control record points to the validation_control data set (run-time validation
check data set), and is accessed by the cntl_v LIBNAME statement. This LIBNAME is
assigned to the sample study library directory/cdisc-crtdds-1.0-1.7/control directory.
The sourcemetadata type references two metadata data sets that describe the table
(source_tables) and column (source_columns) metadata for the 39 data sets that
comprise the SAS representation of the CRT-DDS model. Both data sets are stored in
the same library. In the SAS Clinical Standards Toolkit, this source metadata is read
from the sample study library directory/cdisc-crtdds-1.0-1.7/metadata directory. This
location is represented in the driver program by the Srcmeta library name.
The sourcedata type is the library where the 39 data sets that comprise the SAS
representation of the CRT-DDS model are stored. These are the data sets that are
being validated. In the SAS Clinical Standards Toolkit, this library is read from the
sample study library directory/cdisc-crtdds-1.0-1.7/data directory.
This location is represented in the driver program by the Srcdata library name.
Process Outputs
For the SAS Clinical Standards Toolkit validation processes, the only process outputs
that are generated are the Validation Results and Validation Metrics data sets. These
data sets are described in the following section.
Process Results
When the validate_crtdds_data.sas driver program finishes running, the
validation_results data set is created in the Results library. The Results data set
contains informational, warning, and error messages that were generated by the driver program.
In the SAS Clinical Standards Toolkit, CDISC ODM validation uses the same types of
metadata and the same workflow process that is common to validation of all data
standards. SAS provides a set of validation checks for CDISC ODM that are designed
to verify the metadata definitions and values of the default 66 data sets that comprise
the SAS representation of the ODM model. These checks were created by SAS. For
more information about these checks, see Chapter 7, “Compliance Assessment Against
a Reference Standard,” on page 161. Metadata about each check is provided in the
Validation Master data set in the global standards library directory/standards/cdisc-odm-1.3.0-1.7/validation/control
directory.
Validation of XML-Based Standards 373
The %ODM_VALIDATE macro controls the validation workflow for ODM. As each check
is processed from the run-time validation check data set, the check determines the
source of the table and column metadata to use. The reference_tables and
reference_columns data sets contain the metadata for the 66 data sets that comprise
the SAS representation for CDISC ODM. Unless you make customizations or run-time
modifications, the source metadata source_tables and source_columns data sets
contain the same content as the reference metadata reference_tables and
reference_columns data sets.
If all 66 ODM tables contribute information to the ODM XML file, then the validation
process can run directly against the reference_tables and reference_columns data sets.
In this case, the Use source data flag in the validation check data set needs to be set to
N. However, you can choose to run validation against a subset of the 66 tables. In this
case, a source_tables data set that contains the subset needs to be created from the
reference_tables data set. And, a corresponding source_columns data set needs to be
created from the reference_columns data set. The run-time validation check data set
can contain all of the checks, and the Use source data flag can be set to Y, which is the
default value.
In the SASReferences data set, there are three input file references, one input library
reference, and one output data set reference that are key to the successful completion
of the validation process. These files, libraries, and data sets are listed in Table 9.14 on
page 374, and they are discussed in separate sections. In the sample
validate_odm_data.sas driver program, these values are set for &studyRootPath and
&studyOutputPath.
Note: The &studyRootPath and &studyOutputPath paths are the same for this driver
program. These two macro variables have been retained to maintain consistency across
the SAS Clinical Standards Toolkit driver programs.
&studyRootPath=sample study library directory/cdisc-odm-1.3.0-1.7
Table 9.14 Key Components of the SASReferences Data Set for the validate_odm_data.sas Driver Program
Metadata Type    LIBNAME or Fileref to Use    Reference Type    Path    Name of File
Input
Output
Process Inputs
The control record points to the validation_control data set (the run-time validation check
data set). It is accessed by the cntl_v LIBNAME statement. This LIBNAME is
assigned to the sample study library directory/cdisc-odm-1.3.0-1.7/control directory.
The sourcemetadata type references two metadata data sets that describe the table
(source_tables) and column (source_columns) metadata for the 66 data sets that
comprise the SAS representation of the ODM model. Both data sets are stored in the
same library. In the SAS Clinical Standards Toolkit, this source metadata is read from
the sample study library directory/cdisc-odm-1.3.0-1.7/metadata directory. This location
is represented in the driver program by the Srcmeta library name.
The sourcedata type is the library where the 66 data sets that comprise the SAS
representation of the ODM model are stored. These are the data sets that are being
validated. In the SAS Clinical Standards Toolkit, this library is read from the sample
study library directory/cdisc-odm-1.3.0-1.7/data directory. This
location is represented in the driver program by the Srcdata library name.
Process Outputs
For the SAS Clinical Standards Toolkit validation processes, the only process outputs
that are generated are the Validation Results and Validation Metrics data sets. These
data sets are described in the following section.
Process Results
When the validate_odm_data.sas driver program finishes running, the validation_results
data set is created in the Results library. The Results data set contains informational,
warning, and error messages that were generated by the driver program. Reporting of
validation process metrics is supported, although it is not implemented for CDISC ODM
validation.
Overview
The typical SAS Clinical Standards Toolkit workflow in support of the CDISC standards
includes the definition and validation of SDTM submission data and the creation and
validation of a define.xml file based on the SDTM domain data. This exercise
demonstrates how you can read a define.xml file to extract the data and metadata for
the purposes of re-creating the original source SDTM study. Re-creating the original
source study has value as a stand-alone exercise, either to extract a new SDTM study
from a define.xml file or to create a new SDTM study using information in a define.xml
file as a template.
Special Topic: A Round-Trip Exercise Involving the CDISC SDTM and CDISC CRT-DDS
Standards 377
As a round-trip exercise, this task validates the performance of the %CRTDDS_WRITE
and %CRTDDS_READ macros and allows a comparison of original and re-created
SDTM metadata and data. This display details the high-level workflow for this exercise.
The Workflow
These steps describe the workflow in more detail. The first five steps describe the
derivation of the CDISC CRT-DDS 1.0 define.xml file.
Note: Steps 1 to 6 can be used with CDISC Define-XML 2.0. However, steps 7 to 9
have not been implemented in the SAS Clinical Standards Toolkit for Define-XML 2.0.
1 Access a study that contains valid CDISC SDTM data and metadata. This is a study
that contains domain data (AE, DM, CO, and so on) and the SAS Clinical Standards
Toolkit metadata about that SDTM study, such as source_tables and
source_columns. The SAS Clinical Standards Toolkit also includes XSL style sheets,
XMLMap files, and any metadata that is provided by SAS during the SAS Clinical
Standards Toolkit installation.
2 Use the set of sample driver programs that are provided in the SAS Clinical
Standards Toolkit to define the input and output files for each process task and to
invoke the macros that support each standard-specific task. The driver programs are
designed to run with the sample studies, but can be modified as needed. New
custom drivers can be created and used.
7 SDTM domain data sets are created based on a reachable set of SAS transport files
that are specified in the define.xml file. Submit the create_sasdata_fromxpt.sas
SDTM driver program. For SDTM 3.1.2, the program is in the sample study
library directory/cdisc-sdtm-3.1.3-1.7/sascstdemodata/programs
directory. This driver program accesses the
%SDTMUTIL_CREATESASDATAFROMXPT macro to generate the SDTM domain
data sets from the SAS transport files. Creation of the SAS transport files is not
performed by the SAS Clinical Standards Toolkit. These files would have been
produced as a prerequisite to the generation of the define.xml file as a part of the
Electronic Common Technical Document preparation process. The
%SDTMUTIL_CREATESASDATAFROMXPT macro assumes that the SAS transport
files are reachable from a folder relative to the location of the referenced define.xml
file. In the create_sasdata_fromxpt.sas SDTM driver program, the XPT files are read
from the sample study library directory/cdisc-crtdds-1.0-1.7/transport directory.
The generated data sets are written to the sample study library
directory/cdisc-sdtm-3.1.3-1.7/sascstdemodata/derived/data directory. At this point, the
SDTM domain data sets should contain the same
information as the original domain data sets that were accessed at the beginning of
this process. By specifying a new target folder location, the SDTM data sets can be
validated against those referenced in steps 1 and 3.
8 Source metadata that describes the SDTM domains and columns is derived using
information contained in the CRT-DDS data sets derived in step 6. Submit the
create_sourcemetadata.sas SDTM driver program. For SDTM 3.1.2, it is installed in
the sample study library directory/cdisc-sdtm-3.1.3-1.7/sascstdemodata/programs
directory. In this exercise, this driver program calls
the %SDTMUTIL_CREATESRCMETAFROMCRTDDS macro, which uses a library
of SAS data sets that capture define.xml metadata (typically derived using the
%CRTDDS_READ macro). The output of this step is a set of SDTM metadata in the
source_tables, source_columns, and source_study data sets. These data sets are
written to the sample study library directory/cdisc-sdtm-3.1.3-1.7/sascstdemodata/derived/metadata
directory. At this point, the SDTM
metadata should contain the same information as the original metadata that was
accessed at the beginning of this process. By specifying a new target folder location,
the SDTM metadata data sets can be validated against those referenced in steps 1
and 3.
9 SAS formats that support SDTM controlled terminology are derived using
information contained in the CRT-DDS data sets that were derived in step 6. Submit
the create_formatsfromcrtdds.sas SDTM driver program. For SDTM 3.1.2, this
program is installed in the sample study library directory/cdisc-sdtm-3.1.3-1.7/sascstdemodata/programs
directory. The driver program
accesses the %SDTMUTIL_CREATEFORMATSFROMCRTDDS macro and
generates the controlled terminology SAS format catalog based on codelists
specified in the define.xml file. The derived SAS format catalog is written to the
sample study library directory/cdisc-sdtm-3.1.3-1.7/sascstdemodata/derived/formats
directory. These formats should match
those formats that were referenced by the SDTM columns at the beginning of this
process. By specifying a new target folder location, the SAS format catalog can be
validated against the catalog referenced in steps 1 and 3.
Once the round-trip exercise is complete, data derived from the process should match
the original data. There might be some metadata collected that does not match exactly
(particularly any date and time fields that collect real-time information). Differences can
be detected by submitting PROC COMPARE on any of the derived data and metadata
data sets against the original data and metadata data sets.
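As a minimal sketch of such a comparison, the following code compares the original and re-created source_columns data sets. The librefs origmeta and newmeta are hypothetical and are assumed to be already assigned to the original and derived metadata folders. The use of table and column as identifying variables is also an assumption about the structure of source_columns.

```sas
/* Sort copies of both data sets so that rows can be matched by key. */
proc sort data=origmeta.source_columns out=work.orig_columns;
  by table column;
run;

proc sort data=newmeta.source_columns out=work.new_columns;
  by table column;
run;

/* Keep only the rows that differ, showing the base and compare values. */
proc compare base=work.orig_columns compare=work.new_columns
             out=work.diff_columns outnoequal outbase outcomp noprint;
  id table column;
run;

/* SYSINFO is 0 when PROC COMPARE finds no differences. */
%put NOTE: PROC COMPARE return code (SYSINFO) = &sysinfo;
```

Fields that record run-time information, such as dates and times, can be excluded with a DROP= data set option if they produce expected differences.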
Generally, if you are resubmitting the same process code again without changing the
&_cststandard or &_cststandardversion global macro variables and you do not have
references to different data or metadata libraries, there are no consequences. However,
if you are attempting to change the standard or standard version in the same SAS
session or you are attempting to reference different studies, code libraries, or
terminology libraries, you must use the following code between each code submission:
%let _cstReallocateSASRefs=1;
%include "&_cstGRoot/standards/cst-framework-1.7/programs/resetautocallpath.sas";
In the driver programs provided with the SAS Clinical Standards Toolkit, the previous
code is commented out so that it is not submitted at run time.
The results of the comparison are presented in a SAS data set that contains the
columns shown in the following table:
Table Table
Column Column
Issue Issue
Comment Comment
The Issue column summarizes issues that are found. The issue is identified by a
keyword.
The following table shows the Issue column keywords and their meanings:
DSLABEL The data set label does not match the data set description in the
Define-XML metadata.
LABEL The variable label does not match the variable description in the
Define-XML metadata.
DATA_COLUMN A data set column does not have a definition in the Define-XML
metadata.
LENGTH Inconsistencies exist between the length of the SAS variable and
the length defined in the Define-XML metadata.
Note: This check is performed only for SAS character variables
because the definition of the length of a numerical variable is not
compatible between SAS and Define-XML.
TYPE Inconsistencies exist between the type of the SAS variable and
the DataType defined in the Define-XML metadata.
Here is an example of the code to check the metadata for a CRT-DDS 1.0 file:
%cst_setStandardProperties(_cstStandard=CST-FRAMEWORK,_cstSubType=initialize);
%cstutil_setcstsroot;
%let studyRootPath=&_cstSRoot/cdisc-crtdds-1.0-&_cstVersion;
%let studyOutputPath=&_cstSRoot/cdisc-crtdds-1.0-&_cstVersion;
%cstutilcomparemetadatasasdefine(
_cstSourceXPTFolder=%sysfunc(pathname(srcdata)),
_cstSourceMetadataLibrary=srcmeta,
_cstRptDS=results.compare_metadata_results
);
Here is an example of the code to check the metadata for a Define-XML 2.0 file:
%cst_setStandardProperties(_cstStandard=CST-FRAMEWORK,_cstSubType=initialize);
%cstutil_setcstsroot;
%let studyRootPath=&_cstSRoot/cdisc-crtdds-1.0-&_cstVersion;
%let studyOutputPath=&_cstSRoot/cdisc-crtdds-1.0-&_cstVersion;
%cstutilcomparemetadatasasdefine(
_cstSourceXPTFolder=%sysfunc(pathname(srcdata)),
_cstSourceMetadataLibrary=srcmeta,
_cstRptDS=results.compare_metadata_results
);
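Once the report data set exists, a quick summary by issue keyword helps to show where the differences are concentrated. The following sketch assumes that results.compare_metadata_results was created by one of the calls above and that it contains the Table, Column, Issue, and Comment columns described earlier.

```sas
/* Count the reported differences by issue keyword. */
proc freq data=results.compare_metadata_results;
  tables issue / missing nocum;
run;

/* List the details for one keyword, for example LENGTH. */
proc print data=results.compare_metadata_results noobs;
  where upcase(issue) = 'LENGTH';
  var table column comment;
run;
```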
Overview
Note: The following process is the same for all ODM versions that are supported by the
SAS Clinical Standards Toolkit. The process is explained using ODM version 1.3.0.
In practice, vendor and custom extensions to ODM are common. For example,
Electronic Data Capture (EDC) vendors use data management features and flags that
might be exported using ODM XML extensions. By default, these extensions are
ignored by the SAS Clinical Standards Toolkit. Recall that the SAS Clinical Standards
Toolkit uses XSL style sheets for each of the default, supported 66 ODM data sets (such
as ItemDefs). These style sheets look for specifically named tags and hierarchical paths
based on the CDISC ODM 1.3.0 published specification. If elements or attributes exist
in the XML file but not in the specification, they are ignored.
For example, in this XML code fragment, note the Vendor:<name> syntax. This
represents a hypothetical extension to the ODM XML, presumably accompanied by a
namespace reference supporting the Vendor naming convention.
<FormData FormOID="FormDefs.OID.Death" FormRepeatKey="00-01"
TransactionType="Remove" Vendor:Revised="No">
<Vendor:DataQuery DQOID="DQ.OID.001"
QueryText="Premature report of patients demise?">
<Flag>Y</Flag>
<AuditRecord>
<UserRef UserOID="User.OID.I024" />
<LocationRef LocationOID="Location.OID.S001" />
<DateTimeStamp>2011-01-24T15:13:22</DateTimeStamp>
</AuditRecord>
</Vendor:DataQuery>
</FormData>
In this code fragment, the Vendor:DataQuery syntax specifies a new element with
several new attributes and references to other existing (supported) elements. Note the
additional Vendor:Revised attribute for FormData.
The SAS Clinical Standards Toolkit provides a macro to parse the ODM XML file to
identify currently unsupported elements and tags. This macro,
%CSTUTIL_READXMLTAGS, is located in the primary SAS Clinical Standards Toolkit
autocall library (!sasroot/cstframework/sasmacro).
In this call, the XML file to be parsed is specified with the inxml fileref. The results of
parsing are to be written to two data sets: work.cstodmelements for all unique elements
found in the XML file and work.cstodmattributes for all unique attributes found that are
associated with each unique element.
Note: For more information about the %CSTUTIL_READXMLTAGS macro, see the
SAS Clinical Standards Toolkit: Macro API Documentation.
Special Topic: Identifying Unsupported Elements and Attributes in a CDISC ODM File 387
This program provides the same process setup functionality supported in most SAS
Clinical Standards Toolkit driver programs, builds a SASReferences data set that
defines process inputs and outputs, and allocates all SAS librefs and filerefs.
Table 9.17 Key Components of the SASReferences Data Set for the find_unsupported_tags.sas Program
Metadata Type    SAS LIBNAME or Fileref to Use    Reference Type    Path    Name of File
Input
Output
Process Inputs
The externalxml type refers to the ODM XML file to read. The filename odmxml is
defined in the SASReferences data set. This filename is used in the submitted SAS
code when referring to the XML file. The ODM XML file odm_extended.xml contains
sample extensions to the core ODM 1.3.0 model.
The standardmetadata type, referenced by the odmmeta SAS libref, references the
global standards library directory/standards/cdisc-odm-1.3.0-1.7/metadata folder.
This folder includes the two data sets valid_elements and
valid_attributes, which contain the full list of ODM core elements and attributes
supported by the SAS Clinical Standards Toolkit. The valid_elements data set contains
a single column, element, that itemizes the ODM core elements. The valid_attributes data set
contains each attribute within the context of its parent tag and containing element.
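The idea behind the check can be sketched as a lookup of each element found in the XML file against the valid_elements data set. The following code illustrates that comparison only; it is not the logic of the %CSTUTIL_READXMLTAGS macro. It assumes that work.cstodmelements already exists and that it stores the element name in a column named element, which is an assumption about that data set's structure.

```sas
/* Elements found in the XML file that are not in the list of supported ODM core elements. */
proc sql;
  create table work.unsupported_elements as
  select distinct e.element
  from work.cstodmelements as e
  where upcase(e.element) not in
        (select upcase(v.element) from odmmeta.valid_elements as v);
quit;
```

For the sample odm_extended.xml file, a result of this kind would be expected to include vendor extensions such as the Vendor:DataQuery element shown earlier.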
The following display shows a partial listing of the valid_attributes data set:
Process Outputs
The results type refers to the Results data set that contains information from running the
process. In the SAS Clinical Standards Toolkit sample code hierarchy, this information is
written to the sample study library directory/cdisc-odm-1.3.0-1.7/results directory.
This location is represented in the program by the Results library name.
Depending on the parameter values associated with the call to the
%CSTUTIL_READXMLTAGS macro, two additional process outputs might be persisted
at the conclusion of the process. If the _cstxmlreporting parameter is set to Dataset, any
unsupported elements are documented in the data set referenced by the
_cstxmlelementds parameter and any unsupported attributes are documented in the
data set referenced by the _cstxmlattrds parameter.
Process Results
When the program finishes running, the readxmltags_results data set is created in the
Results library. This data set contains informational, warning, and error messages that
were generated by the program.
The following display shows an example of the contents of a Results data set run
against the customized odm_extended.xml input file (with the _cstxmlreporting
parameter set to Results):
Figure 9.32 Example of a Partial Results Data Set Created by the find_unsupported_tags.sas
Program
Overview
The typical SAS Clinical Standards Toolkit workflow that supports the creation of a
Define-XML 2.0 file includes the definition of metadata that describes the study,
domains, columns, codelists, value-level metadata, and supporting documents. A
CDISC ADaM study can also include analysis results metadata.
The valid values for the _cstSubType parameter are study, table, column, codelist,
value, analysisresults, and document.
Part of the metadata in these data sets can be derived by macros in the SAS Clinical
Standards Toolkit based on various inputs such as these:
n the study domain data sets
For more information, see “Creating Study Source Metadata from Study Domain
Data Sets” on page 392.
n metadata from an imported Define-XML 2.0 file from a similar study
For more information, see “Deriving Study Source Metadata from an Imported
Define-XML 2.0 File for a Similar Study” on page 394.
n metadata converted from source study metadata that was previously used for the
creation of a CRT-DDS 1.0 define.xml file for a study
For more information, see “Migrating Study Source Metadata Used for the Creation
of a CRT-DDS 1.0 define.xml File for the Study” on page 397.
These macros are called by driver programs that are responsible for properly setting up
each SAS Clinical Standards Toolkit process to perform a specific task. These driver
programs are examples that are provided with the SAS Clinical Standards Toolkit. You
can use these driver programs or create your own. The names of these driver programs
are not important. However, the content is important and demonstrates how the various
SAS Clinical Standards Toolkit framework macros are used to generate the required
metadata files.
The source data is read from a single SAS library. You can modify the code to reference
multiple libraries by using library concatenation. Only one study reference can be
specified. Multiple study references require modification of the code.
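As a minimal sketch of the library concatenation mentioned above, the following code builds one libref from two physical folders. The folder names (sdtm_part1 and sdtm_part2), the librefs part1 and part2, and the sample data are hypothetical. Only the srcdata libref name is taken from the sample driver program.

```sas
/* Hypothetical folders under WORK; replace with your own study locations. */
%let workpath=%sysfunc(pathname(work));

data _null_;
  length d1 d2 $ 512;
  /* DCREATE returns the full path of the new folder (blank if it fails). */
  d1 = dcreate('sdtm_part1', "&workpath");
  d2 = dcreate('sdtm_part2', "&workpath");
run;

libname part1 "&workpath/sdtm_part1";
libname part2 "&workpath/sdtm_part2";

/* Small illustrative data sets, one in each folder. */
data part1.dm; usubjid='001'; run;
data part2.ae; usubjid='001'; aeterm='HEADACHE'; run;

/* Concatenate both libraries under the single libref that the driver reads. */
libname srcdata (part1 part2);

proc sql noprint;
  select count(*) into :n_members trimmed
    from dictionary.tables
    where libname='SRCDATA' and memtype='DATA';
quit;
%put NOTE: srcdata contains &n_members data set(s).;
```

The concatenated libref can then be supplied as the _cstSASDataLib value. When the same member name exists in more than one folder, SAS uses the copy from the first folder in the list.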
%define_createsrcmetafromsaslib(
_cstSASDataLib=srcdata,
_cstStudyMetadata=work.studymetadata,
_cstTrgStandard=&_cstTrgStandard,
_cstTrgStandardVersion=&_cstTrgStandardVersion,
_cstTrgStudyDS=trgmeta.source_study,
_cstTrgTableDS=trgmeta.source_tables,
_cstTrgColumnDS=trgmeta.source_columns,
_cstTrgCodeListDS=trgmeta.source_codelists,
_cstTrgValueDS=trgmeta.source_values,
_cstTrgDocumentDS=trgmeta.source_documents,
_cstTrgAnalysisResultDS=trgmeta.source_analysisresults,
_cstLang=en,
_cstUseRefLib=Y,
_cstRefTableDS=refmeta.reference_tables,
_cstRefColumnDS=refmeta.reference_columns,
_cstClassTableDS=refmeta.class_tables,
_cstClassColumnDS=refmeta.class_columns,
_cstKeepAllCodeLists=Y,
_cstFormatCatalogs=cstfmt.formats ncifmt.cterms,
_cstNCICTerms=ncifmt.cterms
);
After the driver program runs, the srcmeta_saslib_results data set is created. This data
set contains informational, warning, and error messages that were generated by the
driver program.
The following SAS data sets must exist in this Define-XML V2.0.0 SAS data set library:
aliases itemrefwhereclauserefs
codelistitems itemvaluelistrefs
codelists mdvleaf
definedocument mdvleaftitles
documentrefs metadataversion
enumerateditems methoddefs
externalcodelists pdfpagerefs
formalexpressions study
itemdefs translatedtext
itemgroupdefs valuelistitemrefs
itemgroupitemrefs valuelists
itemgroupleaf whereclausedefs
itemgroupleaftitles whereclauserangechecks
itemorigin whereclauserangecheckvalues
When creating the source_analysisresults data set, the following SAS data sets must
exist in this Define-XML V2.0.0 SAS data set library:
analysisdataset analysisresultdisplays
analysisdatasets analysisresults
analysisdocumentation analysisvariables
analysisprogrammingcode analysiswhereclauserefs
%define_createsrcmetafromdefine(
_cstDefineDataLib=srcdata,
_cstTrgStandard=&_cstTrgStandard,
_cstTrgStandardVersion=&_cstTrgStandardVersion,
_cstTrgMetaLibrary=trgmeta,
_cstTrgStudyDS=trgmeta.source_study,
_cstTrgTableDS=trgmeta.source_tables,
_cstTrgColumnDS=trgmeta.source_columns,
_cstTrgCodeListDS=trgmeta.source_codelists,
_cstTrgValueDS=trgmeta.source_values,
_cstTrgDocumentDS=trgmeta.source_documents,
_cstTrgAnalysisResultDS=trgmeta.source_analysisresults,
_cstLang=en,
_cstUseRefLib=Y,
_cstRefTableDS=refmeta.reference_tables,
_cstRefColumnDS=refmeta.reference_columns,
_cstClassTableDS=refmeta.class_tables,
_cstClassColumnDS=refmeta.class_columns,
_cstReturn=_cst_rc,
_cstReturnMsg=_cst_rcmsg
);
After the driver program runs, the srcmeta_define_results data set is created. This data
set contains informational, warning, and error messages that were generated by the
driver program.
For CRT-DDS 1.0.0, the following source metadata SAS data sets are defined in SAS
Clinical Standards Toolkit starting with version 1.5:
n source_study
n source_tables
n source_columns
n source_values
n source_documents
For Define-XML 2.0.0, the source metadata SAS data set source_codelists contains all
metadata needed to create codelists in the define.xml file. The metadata includes
external codelists (for example, MedDRA and WHODRUG) and NCI metadata (for
example, the so-called C-codes).
To create the source_codelists study metadata data set, you must specify two items: a
list of format catalogs that define the study formats and a SAS data set that contains
CDISC/NCI codelist metadata.
Here is an example of the librefs that are defined after the initial setup:
%**********************************************************************************;
%* Define libnames for input *;
%**********************************************************************************;
%* Original CRT-DDS v1 source metadata for SDTM 3.1.2 in CST 1.7;
libname crtdds "&studyRootPath/sascstdemodata/metadata";
%**********************************************************************************;
%* Define libnames for output *;
%**********************************************************************************;
%* Migrated Define-XML v2 source metadata;
libname defv2 "&studyOutputPath/derivedstudymetadata_crtdds/%lowcase(&_cstTrgStandard)-&_cstTrgStandardVersion";
%**********************************************************************************;
%* Define formats *;
%**********************************************************************************;
*********************************************************************;
* Set CDISC NCI Controlled Terminology version for this process. *;
*********************************************************************;
%cst_getstandardsubtypes(_cstStandard=CDISC-TERMINOLOGY,_cstOutputDS=work._cstStdSubTypes);
data _null_;
set work._cstStdSubTypes (where=(standardversion="&_cstTrgStandard" and isstandarddefault='Y'));
* User can override CT version of interest by specifying a different where clause: *;
* Example: (where=(standardversion="&_cstTrgStandard" and standardsubtypeversion='201104'))*;
call symputx('_cstCTPath',path);
call symputx('_cstCTMemname',memname);
run;
Note: It is likely that you must modify some mappings based on the specific data
values. It is important to use the format names as specified because these formats are
used in the conversion macros.
%**********************************************************************************;
%* Migrate source tables *;
%**********************************************************************************;
%cstutilmigratecrtdds2define(_cstSrcLib=crtdds, _cstSrcDS=source_study,
_cstTrgDS=defv2.source_study, _cstStudyVersion=&studyversion,
_cstStandard=&_cstTrgStandard, _cstCheckValues=Y);
%cstutilmigratecrtdds2define(_cstSrcLib=crtdds, _cstSrcDS=source_tables,
_cstTrgDS=defv2.source_tables, _cstStudyVersion=&studyversion,
_cstStandard=&_cstTrgStandard, _cstCheckValues=Y);
%cstutilmigratecrtdds2define(_cstSrcLib=crtdds, _cstSrcDS=source_columns,
_cstTrgDS=defv2.source_columns, _cstStudyVersion=&studyversion,
_cstStandard=&_cstTrgStandard, _cstCheckValues=Y);
%cstutilmigratecrtdds2define(_cstSrcLib=crtdds, _cstSrcDS=source_values,
_cstTrgDS=defv2.source_values, _cstStudyVersion=&studyversion,
_cstStandard=&_cstTrgStandard, _cstCheckValues=Y);
%cstutilmigratecrtdds2define(_cstSrcLib=crtdds, _cstSrcDS=source_documents,
_cstTrgDS=defv2.source_documents, _cstStudyVersion=&studyversion,
_cstStandard=&_cstTrgStandard, _cstCheckValues=Y);
The creation of the source_codelists table is a separate task because this table was not
available in the CRT-DDS 1.0.0 source metadata.
Here is an example of the call to the %CSTUTILGETNCIMETADATA macro, in which
the _cstFormatCatalogs parameter is blank. This indicates that the format catalogs that
define the code lists to include in the source_codelists table are taken from the value of
the FMTSEARCH option.
%**********************************************************************************;
%* Create source_codelists *;
%**********************************************************************************;
%* Get formats ;
%cstutilgetncimetadata(
_cstFormatCatalogs=,
_cstNCICTerms=ncisdtm.cterms,
_cstLang=en,
_cstStudyVersion=&studyversion,
_cstStandard=&_cstTrgStandard,
_cstStandardVersion=&_cstTrgStandardVersion,
_cstFmtDS=work._cstformats,
_cstSASRef=&SASRef,
_cstReturn=_cst_rc,
_cstReturnMsg=_cst_rcmsg
);
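Because a blank _cstFormatCatalogs value makes the macro rely on the FMTSEARCH option, it can help to confirm what that option contains before the call. The following sketch uses only Base SAS. The catalog work.studyfmt and the $SEXF format are hypothetical stand-ins for your study format catalogs.

```sas
/* Create a small, hypothetical format catalog for illustration. */
proc format library=work.studyfmt;
  value $sexf 'M'='Male'
              'F'='Female';
run;

/* Replace the current search list with the study catalog. */
options fmtsearch=(work.studyfmt);

/* Show the catalogs that would be searched for formats. */
%put NOTE: FMTSEARCH=%sysfunc(getoption(fmtsearch));

/* List the entries in the catalog to confirm which formats it defines. */
proc sql;
  select libname, memname, objname, objtype
    from dictionary.catalogs
    where libname='WORK' and memname='STUDYFMT';
quit;
```

The OPTIONS statement replaces any existing FMTSEARCH list, so save the earlier value first if other catalogs must remain searchable.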
Here is an example of the last part of the sample driver program, in which metadata for
external controlled terminology is added to the source_codelists data set:
%**********************************************************************************;
%* Updates for External Controlled Terminology *;
%**********************************************************************************;
proc sql;
insert into defv2.source_codelists
(sasref, codelist, codelistname, codelistdatatype, dictionary, version,
studyversion, standard, standardversion)
values ("&SASRef", "CL.AEDICT", "Adverse Event Dictionary", "text", "MEDDRA", "8.0",
"&studyversion", "&_cstTrgStandard", "&_cstTrgStandardVersion")
values ("&SASRef", "CL.DRUGDCT", "Drug Dictionary", "text", "WHODRUG", "200204",
"&studyversion", "&_cstTrgStandard", "&_cstTrgStandardVersion")
;
quit;
data defv2.source_columns;
set defv2.source_columns;
if table="AE" and column in ("AEDECOD" "AEBODSYS") then xmlcodelist="CL.AEDICT";
if table="CM" and column in ("CMDECOD" "CMCLAS" "CMCLASCD")
then xmlcodelist="CL.DRUGDCT";
run;
CDISC Dataset-XML
Overview
CDISC Dataset-XML defines a standard format for transporting tabular data in XML
between any two entities based on CDISC ODM XML. In addition to supporting the
transport of data sets as part of a submission to the FDA, Dataset-XML can be used to
exchange data between two parties. For example, the Dataset-XML data format can be
used by a CRO to transmit SDTM or ADaM data sets to a sponsor organization.
Dataset-XML supports SDTM, ADaM, and SEND data sets but can also be used to
exchange any other type of tabular data set.
The Define-XML file describes all of the data sets included in the folder. Both Define-XML 1.0 and
Define-XML 2.0 are supported for use with Dataset-XML.
%datasetxml_write(
_cstSourceLibrary=srcdata,
_cstOutputLibrary=xmldata,
_cstSourceMetadataDefineFileRef=srcmeta,
_cstCheckLengths=Y,
_cstIndent=N,
_cstZip=Y,
_cstDeleteAfterZip=N
);
In this example, the Dataset-XML files are compressed into ZIP files, with one ZIP file
per Dataset-XML file. The Dataset-XML files are not deleted after compression.
The Define-XML file that describes the SAS data sets must contain metadata about all
SAS data sets and all variables to convert. The Dataset-XML files by themselves do not
have any information about the SAS data sets (name and label) or the SAS variables
(name, label, data type, length, and display format). When the Dataset-XML file is
converted back to SAS data sets, this information must be provided by the Define-XML
file.
Here is an example of a Dataset-XML file:
<?xml version="1.0" encoding="UTF-8"?>
<ODM xmlns="http://www.cdisc.org/ns/odm/v1.3"
xmlns:data="http://www.cdisc.org/ns/Dataset-XML/v1.0"
ODMVersion="1.3.2" FileType="Snapshot" FileOID="cdisc01.AE"
PriorFileOID="www.cdisc.org.Studycdisc01-Define-XML_2.0.0"
CreationDateTime="2014-06-23T13:18:18"
data:DatasetXMLVersion="1.0.0">
<ClinicalData StudyOID="cdisc01"
MetaDataVersionOID="MDV.CDISC01.SDTMIG.3.1.2.SDTM.1.2">
<ItemGroupData ItemGroupOID="IG.AE" data:ItemGroupDataSeq="1">
...
<ItemData ItemOID="IT.AE.AETERM" Value="AGITATED"/>
Here is an example of a Define-XML file:
<ODM ... >
<Study OID="cdisc01">
...
<MetaDataVersion OID="MDV.CDISC01.SDTMIG.3.1.2.SDTM.1.2"
Name="Study CDISC01, Data Definitions"
Description="Study CDISC01, Data Definitions"
def:DefineVersion="2.0.0" def:StandardName="SDTM-IG"
def:StandardVersion="3.1.2">
...
<ItemGroupDef OID="IG.AE"
Domain="AE" Name="AE" Repeating="Yes" IsReferenceData="No"
SASDatasetName="AE" Purpose="Tabulation"
def:Structure="One record per adverse event per subject"
def:Class="EVENTS" def:ArchiveLocationID="LF.AE">
...
<ItemRef ItemOID="IT.AE.AETERM" OrderNumber="6" Mandatory="Yes"/>
...
<ItemDef OID="IT.AE.AETERM" Name="AETERM" DataType="text" Length="25"
SASFieldName="AETERM">
It would be an error to try to extract the SAS data set name from an ItemGroup object
identifier in the Dataset-XML file (ItemGroupOID="IG.AE"). It would also be an error
to try to extract the variable name from an object identifier (ItemOID="IT.AE.AETERM").
There is no requirement concerning the values of the identifiers.
SAS tables and columns are matched to @SASDatasetName (or, if this value is not
specified, @Name) and @SASFieldName (or, if this value is not specified, @Name).
@SASDatasetName and @SASFieldName are optional, but @Name is required, so
@Name is always available.
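The fallback rule can be sketched in a DATA step with the COALESCEC function, which returns the first nonmissing character argument. The work.itemgroupdefs data set here is a hypothetical miniature with made-up rows. It is not the layout of the toolkit's own metadata data sets, and this is not the toolkit's implementation.

```sas
/* Hypothetical ItemGroupDef attributes; the second row has no SASDatasetName. */
data work.itemgroupdefs;
  length oid $ 20 name $ 32 sasdatasetname $ 32;
  infile datalines dsd truncover;
  input oid name sasdatasetname;
  datalines;
IG.AE,AE,AE
IG.0001,SUPPAE,
;
run;

data work.matched;
  set work.itemgroupdefs;
  length sastable $ 32;
  /* Use @SASDatasetName when present; otherwise fall back to @Name. */
  sastable = coalescec(sasdatasetname, name);
run;

proc print data=work.matched noobs;
run;
```

The second row shows why the object identifier is not used: the OID value IG.0001 says nothing about the table name, but @Name still resolves it.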
If the ItemGroup or ItemDef is not found, the XML is generated with this pattern for
@ItemGroupOID and @ItemOID:
ItemGroupOID = "IG.<table>"
ItemOID = "IT.<table>.<column>"
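As an illustration only, the default pattern can be reproduced with the CATS and CATX functions. The data set name work.default_oids and its rows are hypothetical, and the code is not taken from the %DATASETXML_WRITE macro.

```sas
data work.default_oids;
  length table column $ 32 itemgroupoid itemoid $ 128;
  input table $ column $;
  /* IG.<table> */
  itemgroupoid = cats('IG.', table);
  /* IT.<table>.<column> */
  itemoid = catx('.', 'IT', table, column);
  datalines;
ADAE AEDECOD
ADAE AETERM
;
run;

proc print data=work.default_oids noobs;
run;
```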
Warnings are written to the SAS log file and the write_results data set in the results
folder.
Here is an example of the SAS log file:
WARNING: [CSTLOGMESSAGE.DATASETXML_WRITE] Columns not found in metadata:
ADAE.AEDECOD ADAE.AETERM
WARNING: [CSTLOGMESSAGE.DATASETXML_WRITE] Missing ItemData/@ItemOID for column=AEDECOD
The following display shows an example of the write_results data set as created by the
create_datasetxml.sas sample program:
The @IsReferenceData attribute in the Define-XML file determines whether the data set
is considered ReferenceData or ClinicalData. Here is an example:
<ReferenceData StudyOID="cdisc01"
MetaDataVersionOID="MDV.CDISC01.SDTMIG.3.1.2.SDTM.1.2">
<ItemGroupData ItemGroupOID="IG.TE" data:ItemGroupDataSeq="1">
<ItemData ItemOID="IT.STUDYID" Value="CDISC01"/>
<ItemData ItemOID="IT.TE.DOMAIN" Value="TE"/>
<ItemData ItemOID="IT.TE.ETCD" Value="EOS"/>
<ItemData ItemOID="IT.TE.ELEMENT" Value="End of Study"/>
<ItemData ItemOID="IT.TE.TESTRL" Value="Study Termination"/>
<ItemData ItemOID="IT.TE.TEDUR" Value="P1D"/>
</ItemGroupData>
<ClinicalData StudyOID="cdisc01"
MetaDataVersionOID="MDV.CDISC01.SDTMIG.3.1.2.SDTM.1.2">
<ItemGroupData ItemGroupOID="IG.AE" data:ItemGroupDataSeq="1">
<ItemData ItemOID="IT.STUDYID" Value="CDISC01"/>
<ItemData ItemOID="IT.AE.DOMAIN" Value="AE"/>
<ItemData ItemOID="IT.USUBJID" Value="CDISC01.100008"/>
<ItemData ItemOID="IT.AE.AESEQ" Value="1"/>
<ItemData ItemOID="IT.AE.AESPID" Value="1"/>
<ItemData ItemOID="IT.AE.AETERM" Value="AGITATED"/>
Dataset-XML files into SAS data sets with the %DATASETXML_READ macro. Warnings
are written to the SAS log file and the write_results data set in the results folder.
Here is an example of the SAS log file:
WARNING: [CSTLOGMESSAGE.DATASETXML_WRITE] Length too short: __ItemGroupOID=IG.ADAE
__ItemOID=IT.ADAE.AETERM Length=20 _valueLength=24 value=HEARTBURN-LIKE DYSPEPSIA
WARNING: [CSTLOGMESSAGE.DATASETXML_WRITE] Length too short: __ItemGroupOID=IG.ADAE
__ItemOID=IT.ADAE.AETERM Length=20 _valueLength=25 value=ACID REFLUX (OESOPHAGEAL)
WARNING: [CSTLOGMESSAGE.DATASETXML_WRITE] Length too short: __ItemGroupOID=IG.ADAE
__ItemOID=IT.ADAE.AEDECOD Length=20 _valueLength=32 value=Gastrooesophageal reflux disease
WARNING: [CSTLOGMESSAGE.DATASETXML_WRITE] Length too short: __ItemGroupOID=IG.ADAE
__ItemOID=IT.ADAE.AETERM Length=20 _valueLength=25 value=ACID REFLUX (OESOPHAGEAL)
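A comparable pre-check can be sketched in Base SAS by comparing the longest stored value with the length declared in the metadata. The work.adae data set and the metaLength macro variable are hypothetical. In practice, the declared length comes from the ItemDef in the Define-XML file.

```sas
/* Hypothetical analysis data with values from the log example. */
data work.adae;
  length aeterm $ 40;
  input aeterm $char40.;
  datalines;
HEADACHE
HEARTBURN-LIKE DYSPEPSIA
ACID REFLUX (OESOPHAGEAL)
;
run;

/* Hypothetical Length attribute from the Define-XML ItemDef. */
%let metaLength=20;

/* LENGTHN ignores trailing blanks, so this is the longest real value. */
proc sql noprint;
  select max(lengthn(aeterm)) into :maxLen trimmed
    from work.adae;
quit;

data _null_;
  if &maxLen > &metaLength then
    put "WARNING: AETERM holds values up to &maxLen characters, but the metadata length is &metaLength..";
  else
    put "NOTE: The metadata length &metaLength is sufficient for AETERM.";
run;
```

Running a check like this before creating the Dataset-XML files shows which Length values in the Define-XML metadata need to be increased.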
The %DATASETXML_WRITE macro also checks that numeric variables in ADaM data
sets that represent date and time information have a DisplayFormat defined in the
Define-XML file.
The Define-XML file that describes the Dataset-XML files must contain metadata
information about all Dataset-XML files and all variables to convert to SAS data sets.
The Dataset-XML files by themselves do not have any information about the SAS data
sets (name and label) or the SAS variables (name, label, data type, length, and display
format).
Character variables that represent date- and time-related information in ADaM or SDTM
data conform to the ISO 8601 standard and do not have a length specified in the
Define-XML file. The _cstDateTimeLength parameter specifies the length to use for
these variables when they are converted to SAS data sets. If the lengths of character
variables are too short to hold the data, warnings are written to the SAS log file and the
read_results data set in the results folder.
Here is an example of the SAS log file:
WARNING: [CSTLOGMESSAGE.DATASETXML_READ] TRUNCATION occurred: Length=20 too short for
ItemGroupDataSeq=12 IT.ADAE.AETERM value=HEARTBURN-LIKE DYSPEPSIA (length=24)
WARNING: [CSTLOGMESSAGE.DATASETXML_READ] TRUNCATION occurred: Length=20 too short for
ItemGroupDataSeq=25 IT.ADAE.AETERM value=HEARTBURN-LIKE DYSPEPSIA (length=24)
WARNING: [CSTLOGMESSAGE.DATASETXML_READ] TRUNCATION occurred: Length=20 too short for
ItemGroupDataSeq=28 IT.ADAE.AETERM value=ACID REFLUX (OESOPHAGEAL)
WARNING: [CSTLOGMESSAGE.DATASETXML_READ] TRUNCATION occurred: Length=20 too short for
ItemGroupDataSeq=28 IT.ADAE.AEDECOD value=Gastrooesophageal reflux disease (length=32)
The following display shows an example of the read_results data set as created by the
create_sas_from_datasetxml.sas sample program:
Inconsistencies between the Dataset-XML file and the Define-XML file, which can lead
to issues with matching data to metadata, are written to the SAS log file and the
read_results data set in the results folder.
Here is an example of the SAS log file:
WARNING: [CSTLOGMESSAGE.DATASETXML_READ] Items not found in metadata:
IT.ADAE.AEDECOD IT.ADAE.AETERM
In the following example, the ADAE data set is created without the AETERM and
AEDECOD variables, as shown in this PROC COMPARE output:
[PROC COMPARE output not reproduced: Data Set Summary (Dataset, Created, Modified, NVar, NObs, Label) and Variables Summary]
For every SAS data set that is compared, the macro reports the error code as returned
by PROC COMPARE. The following table shows the error codes:
For example, code=40 (8+32) indicates that a format and a label are different. This
message is written to the SAS log file:
WARNING: [CSTLOGMESSAGE.CSTUTILCOMPAREDATASETS] Comparing srcdata.adqs and
trgdata.adqs - Differences: FORMAT/LABEL (SysInfo=40)
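The return code is a sum of bit flags, so it can be decoded with the BAND function. The following sketch uses PROC COMPARE directly, not the toolkit macro. The two data sets are hypothetical and differ only in the format and label of one variable. The flag names follow the SYSINFO table in the Base SAS PROC COMPARE documentation.

```sas
/* Two hypothetical data sets that differ in a format and a label. */
data work.base;
  x = 1;
  label x = 'Base label';
  format x 8.2;
run;

data work.comp;
  x = 1;
  label x = 'Other label';
  format x 8.3;
run;

proc compare base=work.base compare=work.comp noprint;
run;
/* Capture SYSINFO immediately; later steps can reset it. */
%let rc=&sysinfo;

data _null_;
  length flag $ 8;
  rc = &rc;
  if rc = 0 then put 'NOTE: No differences reported.';
  /* Bit 1 has the value 1, bit 2 the value 2, bit 3 the value 4, and so on. */
  do bit = 1 to 16;
    if band(rc, 2**(bit-1)) then do;
      flag = scan('DSLABEL DSTYPE INFORMAT FORMAT LENGTH LABEL BASEOBS COMPOBS BASEBY COMPBY BASEVAR COMPVAR VALUE TYPE BYVAR ERROR', bit, ' ');
      put 'NOTE: PROC COMPARE flagged ' flag;
    end;
  end;
run;
```

Testing individual bits is more reliable than comparing against a single total, because several conditions can be reported in one return code.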
When converting SAS data sets to Dataset-XML and then converting back to SAS data
sets, here are the differences to expect:
n Date- and time-related columns do not have a length defined in the Define-XML
metadata.
By specifying PROC COMPARE options with the _cstCompOptions parameter, you can
make the comparison less strict. For example, specify
_cstCompOptions=%str(criterion=0.00000000000001). A looser criterion
prevents differences close to machine precision from being reported as errors.
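The effect of the CRITERION= option can be seen with PROC COMPARE alone. This is a sketch under the assumption of IEEE floating-point arithmetic, where 0.1 + 0.2 is not stored as exactly 0.3. The data sets are hypothetical, and the toolkit macro is not called.

```sas
/* Two values that should be equal but can differ in the last bits. */
data work.base;
  x = 0.3;
run;

data work.comp;
  x = 0.1 + 0.2;
run;

/* Default comparison: values are tested for exact equality. */
proc compare base=work.base compare=work.comp noprint;
run;
%let rc_exact=&sysinfo;

/* Same comparison with the criterion from the example above. */
proc compare base=work.base compare=work.comp noprint
             criterion=0.00000000000001;
run;
%let rc_fuzzy=&sysinfo;

%put NOTE: Default comparison SYSINFO=&rc_exact;
%put NOTE: Comparison with CRITERION= SYSINFO=&rc_fuzzy;
```

A nonzero first value and a zero second value indicate that the only difference was at machine precision.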
The following display shows an example of data set differences reported in the
read_results data set:
Chapter 10 / CDISC ADaM Data
Overview
The SAS Clinical Standards Toolkit provides the following support for the CDISC ADaM
2.1 standard:
n A metadata representation of the CDISC ADaM standard in a set of SAS data sets.
For more information, see “SAS Representation of CDISC ADaM Metadata” on page
414.
n The ability to derive template (zero-observation) data sets for the ADaM subject-
level Analysis (ADSL) data set, a representative Basic Data Structure (BDS) data
set, and an ADaM Adverse Event (ADAE) data set.
Note: Templates for additional ADaM data structures will be provided in future
releases after the CDISC ADaM team approves them for use.
n Implementation of version 1.2 CDISC ADaM validation checks as prepared by the
CDISC ADaM team.
In addition, SAS has provided validation checks for the ADAE and ADaM Time-to-
Event (ADTTE) domains. These validation checks are derived from individual
implementation guides provided by CDISC. For the ADAE domain, the release of the
implementation guide is Analysis Data Model (ADaM) Data Structure for Adverse
Event Analysis, Version 1.0. For the ADTTE domain, the release of the
implementation guide is ADaM Basic Data Structure for Time-to-Event Analyses,
Version 1.0.
n A sample reporting methodology that combines the analysis results metadata with a
sample set of tables, listings, and figures (TLF) metadata to create example clinical
study reports.
The specific sources from the ADaM document for each metadata type are shown in the
following table:
Analysis Data Set Section 5.1, Analysis Data Set Metadata, Table 5.1.1
In the SAS Clinical Standards Toolkit, the Analysis data set metadata is captured in the
reference_tables and class_tables data sets, which are located here:
global standards library directory/standards/cdisc-adam-2.1-1.7/metadata
The SAS Clinical Standards Toolkit captures more metadata than might be specified for
a standard. This helps support SAS Clinical Standards Toolkit functionality and provides
greater consistency across supported standards.
The following table shows the mapping of the Analysis data set metadata defined by the
CDISC ADaM team to the SAS metadata representation in the reference_tables data
set:
DATASET LOCATION: The folder and filename where the dataset can be found, ideally
hyperlinked to the actual dataset (that is, XPT file). reference_tables column mapping: xmlpath
**Source: Analysis Data Model (ADaM), Version 2.1, Section 5.1, Analysis Dataset
Metadata, Table 5.1.1
The reference_tables data set provided with the SAS Clinical Standards Toolkit contains
three records for the ADaM ADAE data set, ADaM ADSL data set, and a representative
ADaM BDS data set. CDISC ADaM specifies that only the ADSL data set is required.
Any number of BDS data sets can be defined as required for each study.
In the SAS Clinical Standards Toolkit, Analysis Variable metadata is captured in the
reference_columns and class_columns data sets in the global standards library folder:
The following table shows the mapping of Analysis Variable metadata defined by the
CDISC ADaM team to the SAS metadata representation in the reference_columns data
set:
Analysis Variable Metadata Field**    Description**    reference_columns Column Mapping
**Source: Analysis Data Model (ADaM), Version 2.1, Section 5.2, Analysis Variable
Metadata, Table 5.2.1
The reference_columns data set provided with the SAS Clinical Standards Toolkit
contains one record for each column in each of the three data sets (ADSL, BDS, and
ADAE) in the reference_tables data set. This results in 63 records (columns) for ADSL,
142 records (columns) for BDS, and 85 records (columns) for the ADAE data set.
Core reference_columns metadata for each column is in the Analysis Data Model
(ADaM) Implementation Guide, Version 1.0. Figure 10.1 on page 419 provides an
excerpt of ADSL column metadata as itemized in Table 3.1.1 of the Analysis Data Model
(ADaM) Implementation Guide, Version 1.0. This metadata has been translated into the
SAS representation of ADSL as shown in Figure 10.2 on page 419.
Figure 10.1 ADSL Columns as Specified in the Analysis Data Model (ADaM) Implementation
Guide
The SAS implementation makes assumptions about the data type and length of each
column. These assumptions represent a typical implementation consistent with SDTM
metadata and conventions for specific types of columns. For example, most identifiers
have a default length of 40, most flags have a length of 1, and columns using controlled
terminology are defined with a length that is long enough to capture the longest
controlled term.
A third type of metadata identified in the Analysis Data Model (ADaM), Version 2.1 (see
Table 10.1 on page 415) is analysis parameter value-level metadata. As noted in the
ADaM document:
“Each BDS data set can contain multiple analysis parameters. In a BDS analysis
dataset, the variable PARAM contains a unique description for every analysis parameter
included in that dataset. Each value of PARAM identifies a set of one or more rows in
the dataset. To describe how variable metadata vary by PARAM/PARAMCD, the
metadata element PARAMETER IDENTIFIER is required in variable-level metadata for
a BDS analysis dataset. This PARAMETER IDENTIFIER metadata element identifies
which variables have metadata that vary depending on PARAM/PARAMCD, and links
the metadata for a variable to the appropriate value of PARAM/PARAMCD.”
The SAS Clinical Standards Toolkit CDISC ADaM sample study provides a
source_values data set that captures analysis parameter information. This data set
offers a consistent approach for all CDISC standards that contribute metadata to the
derivation of CRT-DDS (ADaM, SDTM, and SEND).
The following display shows an excerpt of the sample ADaM source_values data set:
For more information about analysis parameter value-level metadata, see sections 5.2.1
and 5.2.2 of the Analysis Data Model (ADaM) Version 2.1 document.
The final set of metadata prescribed by the Analysis Data Model (ADaM) Version 2.1
document is analysis results metadata. Analysis results metadata is described in the
ADaM document:
“These metadata provide traceability from a result used in a statistical display to the
data in the analysis data sets. Analysis results metadata are not required. Analysis
results metadata describe the major attributes of a specified analysis result found in a
clinical study report or submission.”
The metadata fields used to describe an analysis result are listed in Table 10.4 on page
422. The analysis results metadata is illustrated in the SAS Clinical Standards Toolkit
CDISC ADaM sample study analysis_results.sas7bdat data set found in
sample study library directory/cdisc-adam-2.1-1.7/sascstdemodata/metadata.
This sample file can serve as a template to initialize your analysis results
data set, or see “ADaM Data Set Templates” on page 425.
**Source: Analysis Data Model (ADaM), Version 2.1, Section 5.3, Analysis Results
Metadata, Table 5.3.1
Note: The structure of the analysis results metadata as described in Table 10.4 on
page 422 is different from the structure of the metadata that is needed for creating
Analysis Results Metadata 1.0 for Define-XML 2.0 because the latter is based on the
2013 implementation for Define-XML v2.
The successful creation of the data sets is reported in the SAS log:
Specifying additional data sets or columns in the global standards library folder results
in the %CST_CREATETABLESFORDATASTANDARD macro building a different set of
zero-observation data sets. The global standards library folder is located here:
global standards library directory/standards/cdisc-adam-2.1-1.7/metadata
A zero-observation template data set for the analysis_results data set is located here:
global standards library directory/standards/cdisc-adam-2.1-1.7/templates
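To illustrate how column metadata can drive the creation of a zero-observation data set, here is a minimal Base SAS sketch. It is not the implementation of the %CST_CREATETABLESFORDATASTANDARD macro. The work.column_meta data set and all of its column names (tabname, colname, coltype, collength, collabel, and colorder) are hypothetical simplifications of the reference metadata.

```sas
/* Hypothetical, simplified column metadata for one table. */
data work.column_meta;
  length tabname colname $ 32 coltype $ 1 collength 8 collabel $ 40 colorder 8;
  infile datalines dsd truncover;
  input tabname colname coltype collength collabel colorder;
  datalines;
ADSL,STUDYID,C,40,Study Identifier,1
ADSL,USUBJID,C,40,Unique Subject Identifier,2
ADSL,AGE,N,8,Age,3
;
run;

proc sql noprint;
  /* Build one column definition per metadata record. */
  select catx(' ', colname,
              case when coltype = 'C' then cats('char(', collength, ')')
                   else 'num' end,
              cats("label='", collabel, "'"))
    into :coldefs separated by ', '
    from work.column_meta
    where tabname = 'ADSL'
    order by colorder;

  /* CREATE TABLE with column definitions produces a data set with no rows. */
  create table work.adsl (&coldefs);
quit;

proc contents data=work.adsl varnum;
run;
```

Adding a record to the metadata and rerunning the code adds a column to the template, which mirrors the behavior described above for the reference metadata.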
Overview
Validation of CDISC ADaM data sets in the SAS Clinical Standards Toolkit uses the
same validation methodology used for other standards. Within the global standards
library, registering each standard includes setting the flag supportsvalidation in the
Metadata Standards data set. All standards that support validation, including ADaM, use
the same validation framework and processes described in Chapter 7, “Compliance
Assessment Against a Reference Standard,” on page 161.
ADaM validation of ADSL and BDS data sets is based on the CDISC ADaM Validation
Checks Version 1.2 Maintenance Release (dated and released July 5, 2012 to correct
errors and to add and remove checks). This documentation was prepared by the CDISC
ADaM team.
Note: In SAS Clinical Standards Toolkit 1.7, ADaM validation of ADSL and BDS data
sets changed from previous releases. The validation checks covered by OpenCDISC
have been removed, and only checks developed by SAS and 11 CDISC checks remain
(63 total). In SAS Clinical Standards Toolkit 1.7, these remaining 63 checks have no
corresponding checks in OpenCDISC and are provided solely to expand the validation
of ADaM domains.
The SAS Clinical Standards Toolkit defines validation checks using a combination of
these files:
n the Validation Master data set, which is located here:
global standards library directory/standards/cdisc-adam-2.1-1.7/validation/control
This data set contains 63 records, 11 of which are CDISC validation checks.
n the Messages data set, which is located here:
global standards library directory/standards/cdisc-adam-2.1-1.7/messages
This data set contains 56 observations. Some messages in this data set are used
across several checks in the Validation Master data set.
%CSTCHECK_COLUMN %CSTCHECK_CROSSSTDCOMPAREDOMAINS*
%CSTCHECK_COLUMNCOMPARE %CSTCHECK_CROSSSTDMETAMISMATCH*
%CSTCHECK_COLUMNVARLIST %CSTCHECK_METAMISMATCH
%CSTCHECK_COMPAREDOMAINS %CSTCHECK_NOTINCODELIST
%CSTCHECK_DSMISMATCH %CSTCHECK_NOTUNIQUE
%CSTCHECK_NOTCONSISTENT %CSTCHECK_ZEROOBS
%CSTCHECKCOMPAREALLCOLUMNS*
* These macros are used only for CDISC ADaM validation, although they are available
to all standards.
Note: This list represents a subset of check macros that are available to all standards
to be validated.
For information about the purpose and use of each check macro, see the SAS Clinical
Standards Toolkit: Macro API Documentation.
Figure 10.4 Partial Metadata for the CDISC ADaM Cross-Standard Validation Checks
The following figure shows some of the installed SAS files for ADaM, the data and
metadata folders that support reporting, and the baddata and badmetadata folders that
support validation. The corresponding sample driver programs (analyze_data.sas and
validate_data.sas, respectively), which are located in the programs folder (as shown in
Figure 10.5 on page 429) point to the correct source data and metadata folders.
Figure 10.5 Example Folder Hierarchy for a CDISC ADaM Sample Study
Validation Results
The results of an ADaM validation process, as documented in the validation_results
data set, are shown in Figure 10.6 on page 430 and Figure 10.7 on page 431. The first
15 records of the data set shown in Figure 10.6 on page 430 have been excluded from
the display because they report generic process setup and metadata information
common to all validation processes.
Records 22 through 24 report the results of one of the cross-standard validation checks.
This validation check finds a subject (USUBJID) in the ADaM data sets that was not
found in the SDTM DM domain.
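The logic of such a cross-standard check can be sketched with a PROC SQL subquery. This is an illustration only and not the code of the toolkit check macro. The work.dm and work.adsl data sets and the subject identifiers are hypothetical.

```sas
/* Hypothetical SDTM DM subjects. */
data work.dm;
  input usubjid :$20.;
  datalines;
CDISC01.100008
CDISC01.100014
;
run;

/* Hypothetical ADaM ADSL subjects; the last one is not in DM. */
data work.adsl;
  input usubjid :$20.;
  datalines;
CDISC01.100008
CDISC01.100014
CDISC01.999999
;
run;

proc sql;
  /* Keep each ADaM subject that has no matching record in DM. */
  create table work.not_in_dm as
    select distinct usubjid
      from work.adsl
      where usubjid not in (select usubjid from work.dm);
quit;

proc print data=work.not_in_dm noobs;
run;
```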
A partial report of the validation_metrics data set (including a process summary noting
that 17 checks were attempted, two could not be run, and 11 errors were detected) is
shown in Figure 10.8 on page 432. The two checks that could not be run referenced
columns in the check metadata that could not be found or assessed in the source data
sets.
Overview
The primary purpose of the CDISC ADaM standard is to build analysis data sets that
support analysis and reporting of clinical research. This purpose, in turn, supports the
greater goal of submitting clinical research results to regulatory authorities. These
regulatory authorities determine the efficacy and safety of a medical device or product.
The Analysis Data Model (ADaM), Version 2.1 document provides specifications for the
structure and content of analysis data sets, and a suggested metadata format for
documenting the analysis results generated. Analysis results metadata describe the
major attributes of a specified analysis result found in a clinical study report or
submission. Analysis results metadata support traceability from an analysis result used
in a statistical display to the data in the analysis data sets.
Sample Reporting Methodology 433
The SAS Clinical Standards Toolkit representation of the ADaM standard includes a
sample implementation of an analysis reporting methodology.
Note: This methodology is for illustrative purposes only. Each organization has its own
set of processes and workflows that support the generation of a clinical study report or
submission. The sample reporting methodology provided with the SAS Clinical
Standards Toolkit is intended to be representative of similar industry reporting
methodologies. The intent is not to provide a definitive reporting methodology, but to
illustrate the interaction of reporting components through the adoption of the ADaM
standard. The format for the analysis results metadata in the SAS Clinical Standards
Toolkit has been updated for the processes that create a Define-XML 2.0 file that
include analysis results metadata according to the Analysis Results Metadata 1.0 for
Define-XML 2.0 specification.
Key clinical trial reporting components are shown in the following table:

Source Data
    Source data for analysis data sets, often SDTM. Traceability back to source data
    is a key ADaM requirement.

Controlled Terminology
    Set of allowable terms used in any source or analysis data set. For CDISC, NCI EVS
    serves as the primary source of terms.

Analysis Data Sets
    ADaM data sets, typically including the ADSL data set and any number of BDS data
    sets (for example, ADAE and ADLB) required to support analyses.

Analysis Results (tables, listings, and figures)
    The set of statistical displays (for example, text, tabular, or graphical
    presentation of results) or inferential statements (such as p-values or estimates
    of treatment effect). For more information, see “Analysis Results (Tables,
    Listings, and Figures)” on page 441.

TLF Metadata (to include table shells)
    Commonly provided as table shells. Can also include display-specific metadata
    (often as Microsoft Excel files) used by the analysis programs to generate the
    displays. For more information, see “TLF Metadata” on page 435.

Analysis Results Metadata
    Defined by the Analysis Data Model (ADaM), Version 2.1 document, Section 5.3. For
    more information, see Table 10.4 on page 422 and “Analysis Results Metadata” on
    page 442.

Analysis Programs
    Programming code that uses the analysis data sets (and, optionally, TLF metadata)
    to create the analysis results. For more information, see “Analysis Programs” on
    page 438.

Submission Package (for example, eCTD)
    The structured submission used to package data, metadata, code, and results in a
    standard form to facilitate review.
The majority of the files supporting the ADaM sample reporting methodology provided
with the SAS Clinical Standards Toolkit are located in the ADaM analysis folder:
sample study library directory/cdisc-adam-2.1/sascstdemodata/analysis
Figure 10.9 SAS Clinical Standards Toolkit ADaM Analysis Folder Hierarchy
TLF Metadata
A common industry reporting strategy is to create table shells (templates) that specify
the output for each statistical display. The SAS Clinical Standards Toolkit provides
sample table shells in this file:
sample study library directory/cdisc-adam-2.1-1.7/sascstdemodata/analysis/documents/Mock_tables_shells.pdf
One of these displays, a table reporting patient demographics (Table 14.2.01), follows:
The elements of each table shell (for example, titles, footnotes, headings, column and
row labels, cell formatting, and so on) are sometimes captured in a metadata format,
often in Microsoft Excel files. The usual intent is to create reporting macros that can
generate analysis reports based on this metadata, so that changes in metadata are all
that is required to modify and rerun any report.
For the SAS Clinical Standards Toolkit, sample metadata is included that demonstrates
the use of such metadata within the ADaM reporting environment.
Note: The sample metadata provided does not represent a full implementation. Not all
metadata fields used in the report examples are provided.
To interpret this metadata, a sample SAS XML map file (tlfddt.map) is provided in the
same folder. SAS data sets, representing this XML metadata, are provided in the library
of SAS files located here:
sample study library directory/cdisc-adam-2.1-1.7/sascstdemodata/analysis/data
The following figures provide examples of some of the metadata available in the source
XML file. This metadata has been extracted into SAS data sets.
Row 1 of the Tlf_master data set describes a centered landscape table and shows
where the generating code can be found. The title for that table is provided in the
Tlf_titles file. These tables correspond to the table shell titles specified in Figure 10.10
on page 436.
Analysis Programs
The analysis program to generate sample Table 14.2.01 is located here:
sample study library directory/cdisc-adam-2.1-1.7/sascstdemodata/analysis/code
As noted above, these sample analysis programs do not fully use the sample TLF
metadata provided with the SAS Clinical Standards Toolkit. The basic coding strategy
adopted with each SAS Clinical Standards Toolkit sample analysis program is to build
each section (one or more row combinations) and to concatenate these sections into a
single input file used by PROC REPORT.
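The section-by-section strategy described above can be sketched as follows. This is a minimal illustration, not one of the toolkit's sample analysis programs: it uses the SASHELP.CLASS data set that ships with SAS in place of ADaM data, and all WORK data set names and column names are assumptions made for the example.

```sas
proc sql;
  /* Section 1: a single summary row for age. */
  create table work.sec_age as
  select 1 as section,
         'Age (years), mean' as rowlabel length=40,
         put(mean(age), 8.1) as value length=12
  from sashelp.class;

  /* Section 2: one row per category of sex, with counts. */
  create table work.sec_sex as
  select 2 as section,
         catx(' ', 'Sex:', sex) as rowlabel length=40,
         put(count(*), 8.) as value length=12
  from sashelp.class
  group by sex;
quit;

/* Concatenate the sections into the single input data set. */
data work.report_input;
  set work.sec_age work.sec_sex;
run;

/* SECTION controls the row order but is not printed. */
proc report data=work.report_input nowd;
  column section rowlabel value;
  define section  / order noprint;
  define rowlabel / display 'Characteristic';
  define value    / display 'Result';
run;
```

Because every section shares the same three columns, adding a new block of rows to the display only requires building one more section data set and adding it to the SET statement.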
A sample driver program is provided to perform the process setup, to define (or
reference) the SASReferences data set, to perform any required report setup, and to
call the generic ADaM reporting macro %ADAM_CREATEDISPLAY. This sample driver
program is located here:
sample study library directory/cdisc-adam-2.1-1.7/sascstdemodata/programs/analyze_data.sas
To automate this process of creating all analysis reports for a study, it would be
necessary to cycle through any available metadata (such as that described in Figure
10.12 on page 437) to construct multiple calls to the %ADAM_CREATEDISPLAY macro.
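One way to cycle through display metadata and generate one macro call per row is CALL EXECUTE, sketched below. Because the parameters of %ADAM_CREATEDISPLAY are not reproduced here, the sketch calls a hypothetical stand-in macro, %demo_createdisplay, whose name and parameters are assumptions. The metadata data set, its columns, and the program file names are likewise hypothetical.

```sas
/* Stand-in for the real reporting macro. Its name and parameters are */
/* hypothetical; it only echoes what would be requested.              */
%macro demo_createdisplay(dispid=, codepath=);
  %put NOTE: Would create display &dispid using &codepath;
%mend demo_createdisplay;

/* Hypothetical display metadata, one row per display. */
data work.demo_tlf_master;
  length dispid $20 codepath $200;
  infile datalines dlm='|' truncover;
  input dispid $ codepath $;
  datalines;
Table_14.2.01|analysis/code/demo_t_14_2_01.sas
Table_14.3.01|analysis/code/demo_t_14_3_01.sas
;
run;

/* Queue one macro call per metadata row. %NRSTR delays execution of  */
/* the macro until after this DATA step has finished.                 */
data _null_;
  set work.demo_tlf_master;
  call execute(cats('%nrstr(%demo_createdisplay)(dispid=', dispid,
                    ', codepath=', codepath, ')'));
run;
```

In a real driver, the stand-in would be replaced by a call to %ADAM_CREATEDISPLAY with the parameters documented in its macro header.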
The %ADAM_CREATEDISPLAY macro header provides an overview of the macro
functionality and a summary of the defined macro parameters:
adam_createdisplay
The path to the code to create the display is provided either directly in the
macro parameters or is derived from a metadata source. Examples of metadata
sources are analysis results metadata or Tables, Listings, and Figures data
definition metadata (TLFDDT) that you maintain and reference in the
SASReferences data set.
Figure 10.14 Sample Results Data Set Generated by the analyze_data.sas Driver Program
11
Reporting
Sample Reports
Overview
To show how the SAS Clinical Standards Toolkit metadata and results can be
summarized in a report format, several sample reports are available with the SAS
Clinical Standards Toolkit. These reports are offered as templates that can be modified
to facilitate data review. The report templates are PROC REPORT implementations that
use ODS to generate report output in a variety of formats supported by ODS. Three
sample reports are provided:
- Report 1: This report is applicable to most SAS Clinical Standards Toolkit processes.
It itemizes records that are written to the Results data set by the process. In the case
of validation processes, this report itemizes Results data set records by validation
check.
- Report 2: This report is specific to the SAS Clinical Standards Toolkit validation
processes for standards that have the concept of source data domains (for example,
CDISC SDTM and CDISC ADaM). Results are summarized by domain.
- Report 3: This report is specific to the SAS Clinical Standards Toolkit validation
functionality that summarizes all available metadata about validation checks for a
supported standard. This report offers a multi-panel or one-page-per-check
presentation format.
A sample driver program is provided to define the SAS Clinical Standards Toolkit
environment and to call the primary task framework macro
(%CSTUTIL_CREATEREPORT). This excerpt from the driver program header provides
a brief overview:
cst_report.sas
Two options for invoking this routine are addressed in these scenarios:
(1) This code is run as a natural continuation of a CST process, within
the same SAS session, with all required files available. The working
assumption is that the SASReferences data set (referenced by the
_cstSASRefs macro) exists and contains information on all input files
required for reporting.
(2) This code is being run in another SAS session with no CST setup
established, but the user has a CST results data set and therefore can
derive the location of the SASReferences file that can provide the full
CST setup needed to run the reports.
Assumptions:
To generate all panels for both types of reports, the following metadata
is expected:
- the SASReferences file must exist, and must be identified in the
call to cstutil_processsetup if it is not work.sasreferences.
- a Results data set.
- a (validation-specific) Metrics data set.
- the (validation-specific) run-time Control data set itemizing the
validation checks requested.
- access to the (validation-specific) check messages data set.
The reporting as implemented in the SAS Clinical Standards Toolkit attempts to address
these two scenarios described in the driver program header above:
1 Some SAS Clinical Standards Toolkit task (such as validation against a reference
standard) has been completed. The Results data set has been created. And, in the
same SAS session (or batch job stream), you want to generate one or both reports.
In this scenario, the reporting process uses the SASReferences data set defined by
the global macro variable _cstSASRefs that was used by the previous process. The
Results data set to be summarized in the report is the data set that was previously
created and perhaps persisted to a location other than the SAS Work library.
(Whether the data set was persisted was specified in the SASReferences data set.)
Other files required by the report are identified in Table 11.1 on page 447.
2 The Results data set that was created in some prior SAS Clinical Standards Toolkit
session is available. You want to generate one or both reports. The SAS Clinical
Standards Toolkit processes add informational records to the Results data set,
documenting the process itself. For example, a SAS Clinical Standards Toolkit
CDISC SDTM validation process writes records to the Results data set that contain
this sample message text:
Message
PROCESS STANDARD: CDISC-SDTM
PROCESS STANDARDVERSION: 3.1.3
PROCESS DRIVER: SDTM_VALIDATE
PROCESS DATE: 2012-10-01T09:57:14
PROCESS TYPE: VALIDATION
PROCESS SASREFERENCES: &_cstSRoot./cdisc-sdtm-3.1.3-1.7/sascstdemodata/control/sasreferences.sas7bdat
From this information, a reporting process can attempt to find and open the
referenced SASReferences data set to derive information for some or all of the
report sections.
CAUTION! There are obvious limits to how useful any SAS Clinical Standards
Toolkit Results data set can be in rebuilding a session for reporting purposes.
For example, if the SASReferences data set was built in the Work library in a
previous session, then it is not available and the report process fails. Similarly, if the
SASReferences data set references library and file paths using a macro variable
prefix (for example, &_cstGRoot or &studyRootPath), and those macro variables are
not set or point to a different root path than the original process, then the report
process might fail or yield unpredictable results. In the example above, the
referenced SASReferences data set points to the sample library folder hierarchy that
was used for a SAS Clinical Standards Toolkit 1.5 process. This folder hierarchy still
exists in the SAS Clinical Standards Toolkit 1.7, so the results data set would more
likely be found. This scenario or technique is most appropriate for sites that adopt a
consistent means of building and populating SASReferences data sets.
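The lookup described in scenario 2 can be sketched with a DATA step that scans the Results messages for the PROCESS SASREFERENCES record and splits the path into a folder and a member name. This is an illustration only, not the logic that the toolkit reporting macros use. The data set work.demo_results, the path it contains, and the macro variables demo_sasrefs_dir and demo_sasrefs_file are hypothetical.

```sas
/* Hypothetical Results records; only the message text is modeled. */
data work.demo_results;
  length message $500;
  infile datalines truncover;
  input message $char500.;
  datalines;
PROCESS STANDARD: CDISC-SDTM
PROCESS TYPE: VALIDATION
PROCESS SASREFERENCES: /studies/demo/control/sasreferences.sas7bdat
;
run;

data _null_;
  set work.demo_results;
  length fullpath $500;
  if upcase(message) =: 'PROCESS SASREFERENCES:' then do;
    /* Keep everything after the first colon. */
    fullpath = left(substr(message, index(message, ':') + 1));
    /* Search backward from the end of the path for the last slash. */
    lastslash = find(fullpath, '/', -length(fullpath));
    call symputx('demo_sasrefs_dir',  substr(fullpath, 1, lastslash - 1));
    call symputx('demo_sasrefs_file', scan(substr(fullpath, lastslash + 1), 1, '.'));
  end;
run;

%put NOTE: folder=&demo_sasrefs_dir member=&demo_sasrefs_file;
```

A path that begins with an unresolved macro variable reference, as in the sample message above, would still have to resolve to a valid location in the current session, which is the limitation that the CAUTION describes.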
Process Results Reporting 447
SASReferences
    &_cstSASRefs used by the prior task that generated the Results data set.
    The Results data set record containing the message PROCESS SASREFERENCES
    attempts to use the referenced file. &_cstSASRefs is set to this file.
Note: In the SAS Clinical Standards Toolkit, you are able to define report output
locations in the SASReferences data set. These locations can be defined with
type=report in SASReferences. They can be further specified in the framework
Standardlookup data set. For more information, see Chapter 2, “Framework,” on page
7.
This code is excerpted from the cst_report.sas driver program and performs the setup
tasks that are specific to reporting:
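/* Setup that starts from a known SASReferences location (scenario 1) */
%cstutil_processsetup(_cstSASReferencesLocation=&studyrootpath/control);
%cstutil_reportsetup(_cstRptType=Results);

/* Setup that starts from an existing Results data set (scenario 2) */
%let _cstSetupSrc=RESULTS;
%cstutil_processsetup();
%let _cstRptResultsDS=work.validation_results;
%cstutil_reportsetup(_cstRptType=Results);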
As the final step, the reporting driver program makes one or more calls to the utility
reporting macro. At a minimum (using default parameter values), a macro call to create
report 2 might include this code:
%cstutil_createreport(_cstsasreferencesdset=&_cstSASRefs,_cstreportbydomain=Y,
_cstreportoutput=&studyrootpath/results/cstchecktablereport.pdf);
Note: For more information about the %CSTUTIL_CREATEREPORT macro, see the
SAS Clinical Standards Toolkit: Macro API Documentation.
%cstutil_createreport(
_cstsasreferencesdset=&_cstSASRefs,
_cstresultsdset=&_cstRptResultsDS,
_cstmetricsdset=&_cstRptMetricsDS,
_cstreportbytable=N,
_cstreporterrorsonly=Y,
_cstreportobs=50,
_cstreportoutput=%nrbquote(&_cstRptOutputFile),
_cstsummaryReport=Y,
_cstioReport=Y,
_cstmetricsReport=Y,
_cstgeneralResultsReport=Y,
_cstcheckIdResultsReport=Y);
This request produces a (validation) results listing that contains all five report
panels and includes only the first 50 errors that are reported for each validation
check.
The following displays show report content. The displays apply to report 1 (by checkid)
unless otherwise indicated.
A sample driver program is provided to define the SAS Clinical Standards Toolkit
environment and to call the primary task framework macro
(%CSTUTIL_CREATEMETADATAREPORT). This excerpt from the driver program
header provides a brief overview:
cst_metadatareport.sas
Two scenarios for invoking this routine are addressed in this driver module:
(1) This code is run as a natural continuation of a CST process, within
the same SAS session, with all required files available. The working
assumption is that the SASReferences data set (&_cstSASRefs) exists and
contains information on all files required for reporting.
(2) This code is being run in another SAS session with no CST setup
established. In this case, the user assumes responsibility for
defining all librefs and macro variables needed to run the reports,
although defaults are set.
Validation Check Metadata Reporting 455
Assumptions:
(1) SASReferences is not required for this task. If found, it will be used.
If not found, default libraries and macro variables are set and may be
overridden by the user.
(2) The user of this code may override any cstutil_createmetadatareport
parameter values.
(3) Only the cstutil_createmetadatareport &_cstRptControl and &_cstMessages
parameters are REQUIRED.
(4) If the _cststdrefds parameter is not set, the associated panel cannot be
generated.
(5) By default, a PDF report format is assumed. This may be overridden.
(6) Report output will be written to cstcheckmetadatareport.pdf in the SAS
WORK library unless another location is specified in SASReferences or
in the set-up code below.
(7) The report macro cstutil_createmetadatareport will only produce panel 1
(Check Overview) unless any of the last 3 parameters are set to Y.
Report setup is similar to reporting on process results. The only key difference is that
the call to the %CSTUTIL_REPORTSETUP macro passes a different parameter value
to request check metadata reporting:
%cstutil_reportsetup(_cstRptType=Metadata);
To generate the metadata report, the reporting driver program makes one or more calls
to the utility reporting macro. At a minimum (using default parameter values), a macro
call to create report 3 might include this code:
%cstutil_createmetadatareport(
_cstValidationDS=&_cstRptControl
,_cstMessagesDS=&_cstMessages
,_cstReportOutput=%bquote(&_cstRptOutput)
);
%cstutil_createmetadatareport(
_cststandardtitle=%str(CDISC-SDTM 3.1.3 Validation Check Metadata),
_cstvalidationds=refcntl.validation_master,
_cstvalidationdswhclause=,
_cstmessagesds=&_cstMessages,
_cststdrefds=refcntl.validation_stdref,
_cstreportoutput=%nrbquote(&studyOutputPath/results/cstcheckmetadatareport.pdf),
_cstcheckmdreport=Y,
_cstmessagereport=Y,
_cststdrefreport=Y,
_cstrecordview=N);
Appendix 1
Global Macro Variables
Overview
Most of the SAS Clinical Standards Toolkit global macro variables that are provided by
SAS are defined in properties files in the form of name and value pairs. Here is an
example:
_cstDebug=0
2 The file is included in the SASReferences data set (with type=properties), in which
case the %CSTUTIL_ALLOCATESASREFERENCES macro calls the
%CST_SETPROPERTIES macro.
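To make the name and value pair format concrete, the following sketch writes a small properties file and turns each line into a global macro variable. It is not the implementation of %CST_SETPROPERTIES. The file name demo.properties and the variable names demo_debug and demo_setting are hypothetical, chosen so that the sketch does not overwrite any toolkit variable.

```sas
/* Write a small, hypothetical properties file to the WORK folder. */
filename demoprop "%sysfunc(pathname(work))/demo.properties";

data _null_;
  file demoprop;
  put 'demo_debug=0';
  put 'demo_setting=some value';
run;

/* Read each name=value line and create a global macro variable. */
data _null_;
  infile demoprop truncover;
  length line $400 name $32 value $360;
  input line $char400.;
  /* Skip lines that have no name before the equal sign. */
  if index(line, '=') > 1 then do;
    name  = strip(scan(line, 1, '='));
    value = strip(substr(line, index(line, '=') + 1));
    call symputx(name, value, 'G');
  end;
run;

%put NOTE: demo_debug=&demo_debug demo_setting=&demo_setting;

filename demoprop clear;
```

Everything after the first equal sign is treated as the value, so values that themselves contain an equal sign are preserved.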
Global macro variables can be deleted at the end of a process if the SAS Clinical
Standards Toolkit utility macro %CSTUTIL_CLEANUPCSTSESSION is called with the
_cstDeleteGlobalMacroVars parameter set to 1.
Here are several commonly used global macro variables that are not defined in the
properties files previously described:
Appendix 2
Additional Utility Macros
Overview
To help you develop content for a new standard or study, SAS provides these macros:
- %CSTUTILSQLCOLUMNDEFINITION
- %CSTUTILSQLGENERATETABLE
- %CSTUTILFINDFIXEXTDASCIICHARS
The %CSTUTILSQLCOLUMNDEFINITION Macro
The %CSTUTILSQLCOLUMNDEFINITION macro generates the SQL equivalent of the
SAS ATTRIB statement in a SAS data set. The structure and content of the returned
code differ based on what type of SQL code you choose to generate: SAS, ANSI, or
Oracle.
The macro checks each column name in the SAS data set against a list of reserved
words for both ANSI SQL and Oracle SQL. If a column name is a reserved word, a
message is written to the SAS log file, and the macro appends __SQL1 (two single
underscores followed by SQL1) to that column name. You must then decide whether to
modify the column name in the generated code or to rename the column in the SAS data
set before submitting the macro.
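The renaming rule can be sketched in a few DATA step statements. This is an illustration of the idea only, not the macro's code: the reserved-word list in demo_reserved is a deliberately short, hypothetical list, and work.demo_ae is an empty stand-in data set.

```sas
/* Hypothetical, deliberately short reserved-word list. */
%let demo_reserved = DOMAIN DATE LEVEL;

/* Empty stand-in data set that supplies only column names. */
data work.demo_ae;
  length studyid $40 domain $8 usubjid $40;
  stop;
run;

/* Capture the column names of the stand-in data set. */
proc contents data=work.demo_ae out=work.demo_cols(keep=name) noprint;
run;

/* Append __SQL1 to any column name found in the reserved-word list. */
data work.demo_sqlnames;
  set work.demo_cols;
  length sqlname $40;
  if findw("&demo_reserved", strip(upcase(name)), ' ') > 0
    then sqlname = cats(name, '__SQL1');
    else sqlname = name;
run;

proc print data=work.demo_sqlnames noobs;
run;
```

With this list, only the domain column receives the suffix. The other column names pass through unchanged.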
DOMAIN varchar(8),
USUBJID varchar(40),
AESEQ numeric,
AEGRPID varchar(40),
AEREFID varchar(40),
AESPID varchar(40),
AETERM varchar(200),
AEMODIFY varchar(200),
AELLT varchar(100),
AELLTCD numeric,
AEDECOD varchar(200),
AEPTCD numeric,
AEHLT varchar(100),
AEHLTCD numeric,
AEHLGT varchar(100),
AEHLGTCD numeric,
AECAT varchar(40),
AESCAT varchar(40),
AEPRESP varchar(2),
AEBODSYS varchar(80),
AEBDSYCD numeric,
AESOC varchar(80),
AESOCCD numeric,
AELOC varchar(40),
AESEV varchar(20),
AESER varchar(2),
AEACN varchar(40),
AEACNOTH varchar(200),
AEREL varchar(40),
AERELNST varchar(40),
AEPATT varchar(20),
AEOUT varchar(40),
AESCAN varchar(2),
AESCONG varchar(2),
AESDISAB varchar(2),
AESDTH varchar(2),
AESHOSP varchar(2),
AESLIFE varchar(2),
AESOD varchar(2),
AESMIE varchar(2),
AECONTRT varchar(2),
AETOXGR varchar(20),
AESTDTC varchar(64),
AEENDTC varchar(64),
AESTDY numeric,
AEENDY numeric,
AEDUR varchar(64),
AEENRF varchar(20),
AEENRTPT varchar(40),
AEENTPT varchar(40) )
AEREL varchar2(40),
AERELNST varchar2(40),
AEPATT varchar2(20),
AEOUT varchar2(40),
AESCAN varchar2(2),
AESCONG varchar2(2),
AESDISAB varchar2(2),
AESDTH varchar2(2),
AESHOSP varchar2(2),
AESLIFE varchar2(2),
AESOD varchar2(2),
AESMIE varchar2(2),
AECONTRT varchar2(2),
AETOXGR varchar2(20),
AESTDTC varchar2(64),
AEENDTC varchar2(64),
AESTDY numeric,
AEENDY numeric,
AEDUR varchar2(64),
AEENRF varchar2(20),
AEENRTPT varchar2(40),
AEENTPT varchar2(40) )
- For ANSI SQL and Oracle SQL, review modified reserved words. The macro
identifies reserved words and appends __SQL1 (two single underscores followed by
SQL1) to each one.
- For ANSI SQL and Oracle SQL, review and modify the SQL code as needed.
The resulting SAS PROC SQL code is written to the create_sasSQL.sas file, which is
specified by the _cstSQLFile parameter. The resulting table generated by the SQL code
is written to the test library, which is specified by the _cstDSLibraryOut parameter.
The following is an excerpt of the generated code in the create_sasSQL.sas file:
proc sql;
create table work.cst7495 (label="Adverse Events")
(STUDYID char(40) label="Study Identifier", DOMAIN char(8) label="Domain Abbreviation",...);
insert into work.cst7495
values ('SASCSTDEMODATA' , 'AE' , 'S001P002' , 1, '' , '' , '' , 'ABDOMINAL PAIN' , '' , '' ,...)
values ('SASCSTDEMODATA' , 'AE' , 'S001P003' , 2, '' , '' , '' , 'ABDOMINAL CRAMP' , '' , 'Abdominal...)
values ('SASCSTDEMODATA' , 'AE' , 'S001P003' , 3, '' , '' , '' , 'RASH' , '' , 'Rash' , 10037844,...)
.
.
.
;
create table test.AE (label="Adverse Events")
as select * from work.cst7495 order by STUDYID, USUBJID, AEDECOD, AESTDTC
;
drop table work.cst7495
;
quit;
After you submit the create_sasSQL.sas file in a SAS session, the AE data set is
created in the test library.
The following display shows that, in addition to the data, metadata (such as the label
and the sort order) of the AE data set is retained.
insert into AE (STUDYID, DOMAIN__SQL1, USUBJID, AESEQ, AEGRPID, AEREFID, AESPID, AETERM, AEMODIFY,...)
values ('SASCSTDEMODATA' , 'AE' , 'S001P003' , 3, '' , '' , '' , 'RASH' , '' , 'Rash' , 10037844,...)
.
.
.
insert into AE (STUDYID, DOMAIN__SQL1, USUBJID, AESEQ, AEGRPID, AEREFID, AESPID, AETERM, AEMODIFY,...)
values ('SASCSTDEMODATA' , 'AE' , 'S003P019' , 106, '' , '' , '' , 'HEARTBURN-LIKE DYSPEPSIA' , '' ,...);
;
Notice that the DOMAIN column from the AE data set has been renamed in the generated
Oracle SQL code as DOMAIN__SQL1. The word “domain” is a reserved word in Oracle
SQL. Therefore, the macro appends __SQL1. You must decide where to change this
column name: in the data set before submitting the macro or in the generated Oracle
SQL code (to rename the column in the generated table).
After submitting the macro, the SAS log file contains the following warning message:
[CSTLOGMESSAGE.CSTUTILSQLCOLUMNDEFINITION] WARNING: Column [DOMAIN ] is an ORACLE SQL
RESERVED WORD - This column may need to be changed in the contributing SAS data set.
[CSTLOGMESSAGE.CSTUTILSQLCOLUMNDEFINITION] WARNING: Column [DOMAIN ] is being renamed to
DOMAIN__SQL1 .
The %CSTUTILFINDFIXEXTDASCIICHARS Macro
The %CSTUTILFINDFIXEXTDASCIICHARS macro performs these tasks:
- identifies extended ASCII characters in column values in a SAS data set
- creates a SAS data set that contains the extended ASCII characters and their
replacement characters
- generates code to replace the extended ASCII characters with acceptable characters
Extended ASCII characters occur most often when a SAS data set is populated by
reading a Microsoft Excel spreadsheet or Word document that contains characters such
as curly quotation marks and double quotation marks.
Note: The macro itself does not replace the extended ASCII characters in the SAS data
set. The code that the macro generates performs the replacement.
This macro uses a SAS format in the macro code to map replacement characters to the
extended ASCII characters. SAS provides a default format for mapping to common
extended ASCII characters. You should review the mappings, change them, or create
new mappings.
Note: This macro does not handle double-byte character set (DBCS) data.
In addition to the SAS format in the macro code, this macro accepts an external SAS
format that you create. This external SAS format enables you to create different ASCII
mappings for different studies or standards without having to change the global
mappings in the macro code. For more information, see “Example: Using an External
SAS Format” on page 488.
This macro creates a SAS data set (specified by the _cstOutputDS parameter) that
contains the extended ASCII characters and the characters with which to replace them.
An extended ASCII character that does not have a replacement character is indicated
by a question mark (?) (or the value specified by the _cstExtFmtOtherValue parameter)
in the _cstRemapNote column. The ? provides a visual cue that a valid value is needed
to replace an extended ASCII character.
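The role of the question mark can be illustrated with a small format in which any unmapped code falls through to OTHER. The format below is a hypothetical example and is not the format that is defined in the macro code. Its name (demoascii) and its mappings are assumptions.

```sas
proc format;
  /* Hypothetical mapping: extended ASCII code -> replacement code. */
  value demoascii
    145, 146 = '39'
    147, 148 = '34'
    other    = '?';
run;

/* Look up two mapped codes and one unmapped code. */
data _null_;
  do code = 145, 148, 159;
    replacement = put(code, demoascii.);
    put code= replacement=;
  end;
run;
```

Codes 145 and 148 return their replacement codes, and the unmapped code 159 returns the question mark, which signals that a mapping still has to be supplied.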
Replacing Extended ASCII Characters in a SAS Data Set 475
Note: You must map replacement characters in either the SAS format in the macro
code or in an external SAS format, and then resubmit the macro to ensure that all
extended ASCII characters are replaced.
Data sets that are created by generated code are written to the output directory
specified by the _cstWriteToLib parameter. The default output directory is WORK. Data
set labels and the sort order of the original data sets are maintained.
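Carrying a label and sort order over to a new data set can be sketched with the ATTRC function. This is an assumption about one possible approach, not a description of how the toolkit macros do it. The data set names are hypothetical, and the sketch assumes that the source data set is sorted, because an empty SORTEDBY value would make the data set option invalid.

```sas
/* Create a labeled, sorted stand-in data set from SASHELP.CLASS. */
proc sort data=sashelp.class out=work.demo_class(label='Demo class data');
  by name;
run;

/* Read the label and the sort variables of the source data set. */
%let dsid    = %sysfunc(open(work.demo_class));
%let dslabel = %sysfunc(attrc(&dsid, LABEL));
%let dssort  = %sysfunc(attrc(&dsid, SORTEDBY));
%let rc      = %sysfunc(close(&dsid));

/* Re-create the data set, reapplying both attributes. */
data work.demo_copy(label="&dslabel" sortedby=&dssort);
  set work.demo_class;
run;
```

The SORTEDBY= data set option only asserts the sort order, so it is appropriate here because the DATA step does not reorder the rows.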
Note: You must manage the output directory because files can be overwritten by
subsequent submissions of the generated code.
The following display shows the data set before the extended ASCII characters ‘ , ’ , “ ,
and ” (ASCII values 145 through 148) are replaced:
Figure A2.4 testdata.ext_ascii Data Set Before Replacing the Extended ASCII Characters
The _cstNote column identifies the record number and the column position of the record
value of the extended ASCII character. The _cstRemapNote column specifies the
extended ASCII character and its replacement value.
All of the records in testdata.ext_ascii that contain extended ASCII characters have
replacement values. As a result, the macro is submitted and the following SAS code is
generated in the findextendedascii.sas file:
%macro _cstFixASCII;
********************************************;
********** Initialize libraries **********;
********************************************;
***********************************************************************************;
********** Updating data set testdata.ext_ascii **********;
***********************************************************************************;
%let _cstDSLabel=%cstutilgetattribute(_cstDataSetName=testdata.ext_ascii,
_cstAttribute=LABEL);
%let _cstDSSortVars=%cstutilgetattribute(_cstDataSetName=testdata.ext_ascii,
_cstAttribute=SORTEDBY);
%mend;
%_cstFixASCII;
All four extended ASCII characters are included in this generated code. A combination
of the BYTE and TRANWRD functions is used to convert the extended ASCII
characters to replacement characters. The %CSTUTILGETATTRIBUTE macro retrieves
the sort order and the label of the original data set. If they exist, these values are used
when the output data set is created to maintain the original metadata associated with
the original files. Otherwise, the original label and sort order are lost.
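The BYTE and TRANWRD combination can be sketched on a small stand-in data set. The sketch assumes a single-byte session encoding such as WLATIN1, in which codes 145 through 148 are the curly quotation marks. The data set names are hypothetical, and mapping codes 145 and 146 to the apostrophe (code 39) is an assumption made for the example.

```sas
/* Build two values that contain curly quotation marks. */
data work.demo_ext_ascii;
  length stringchars $40;
  stringchars = cats(byte(147), 'quoted', byte(148));
  output;
  stringchars = cats('it', byte(146), 's');
  output;
run;

data work.demo_fixed;
  set work.demo_ext_ascii;
  /* Curly single quotation marks become an apostrophe (code 39). */
  stringchars = tranwrd(stringchars, byte(145), byte(39));
  stringchars = tranwrd(stringchars, byte(146), byte(39));
  /* Curly double quotation marks become a straight quote (code 34). */
  stringchars = tranwrd(stringchars, byte(147), byte(34));
  stringchars = tranwrd(stringchars, byte(148), byte(34));
run;

proc print data=work.demo_fixed noobs;
run;
```

Unlike the generated code, which targets specific records, this sketch applies every replacement to every row, which is simpler but touches more data than necessary.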
The following display shows the data set after replacing the extended ASCII characters:
Figure A2.6 work.ext_ascii Data Set After Replacing the Extended ASCII Characters
To identify the extended ASCII characters that must be replaced, the following
parameters are specified in the %CSTUTILFINDFIXEXTDASCIICHARS macro:
%cstutilfindfixextdasciichars(
_cstDSName=testdata.ext_ascii2,
_cstColumnName=stringchars,
_cstGeneratedCodeFile=c:/fixascii/findextendedascii2.sas,
_cstOutputDS=work._cstProblems2,
_cstWriteToLib=testdat2);
Here are the meanings of the two parameters not specified in the previous example:
- The _cstOutputDS parameter is the data set in which to record the references to
extended ASCII characters in _cstDSName. The value is specified as
work._cstProblems2. (The default is work._cstProblems, which was used by default in
the previous example.)
- The _cstWriteToLib parameter is the library in which to write the data sets created by
the generated code. This is specified as testdat2.
The following display shows the content of the work._cstProblems2 data set:
The _cstNote column identifies the record number and the column position of the record
value of the extended ASCII character. The _cstRemapNote column specifies the
extended ASCII character and its replacement value.
Notice that the fifth record has a ? as the replacement ASCII character. This is the
visual cue shown in Figure A2.3 on page 475.
Note: All extended ASCII characters must be mapped before submitting the generated
code.
Although one of the extended ASCII characters is not mapped, the SAS code is still
generated in the c:/fixascii/findextendedascii2.sas file, which is specified
by the _cstGeneratedCodeFile parameter.
Here is the generated code:
%macro _cstFixASCII;
********************************************;
********** Initialize libraries **********;
********************************************;
***********************************************************************************;
********** Updating data set testdata.ext_ascii2 **********;
***********************************************************************************;
%let _cstDSLabel=%cstutilgetattribute(_cstDataSetName=testdata.ext_ascii2,
_cstAttribute=LABEL);
%let _cstDSSortVars=%cstutilgetattribute(_cstDataSetName=testdata.ext_ascii2,
_cstAttribute=SORTEDBY);
%mend;
%_cstFixASCII;
Notice the differences between this SAS code and the SAS code for the previous
example. This SAS code includes an additional LIBNAME statement for the output
library reference specified by the _cstWriteToLib parameter (testdat2).
The line
stringchars=tranwrd(stringchars,byte(159),byte(?))
contains the unmapped extended ASCII character. In addition to the ? as a visual cue
that a replacement value is needed, a message is written to the SAS log file after the
%CSTUTILFINDFIXEXTDASCIICHARS macro is submitted.
***********************************************************************************************
[CSTLOGMESSAGE.CSTUTILFINDFIXEXTDASCIICHARS] WARNING: Unresolved extended ASCII characters are
present in the data. Refer to work._cstProblems2 for more information.
[CSTLOGMESSAGE.CSTUTILFINDFIXEXTDASCIICHARS] WARNING: These unresolved values need to be
updated in the PROC FORMAT statement of this macro.
***********************************************************************************************
When you are unfamiliar with the data and there are many data sets, the
%CSTUTILFINDFIXEXTDASCIICHARS macro enables you to examine all data sets in
a specific library for extended ASCII characters.
The following example demonstrates identifying the extended ASCII characters in all of
the data sets and in all of the columns in the testdata library:
%cstutilfindfixextdasciichars(
_cstDSName=testdata._ALL_,
_cstGeneratedCodeFile=c:/fixascii/findfixextendedascii3.sas);
The _cstDSName parameter includes the LIBNAME reference and the keyword _ALL_.
Note: The _cstColumnName parameter is omitted and cannot be used with the _ALL_
keyword.
>>>>>
>>>>> Starting test for: TESTDATA.EXT_ASCII2
>>>>>
As each data set is examined, a starting message (Starting test for) and a list of
variables (Variable List to Variable Count) are written to the SAS log file.
The following warning message is written to the SAS log file to inform you that
unresolved extended ASCII characters are present in the data set:
***********************************************************************************************
[CSTLOGMESSAGE.CSTUTILFINDFIXEXTDASCIICHARS] WARNING: Unresolved extended ASCII characters are
present in the data. Refer to work._cstProblems for more information.
[CSTLOGMESSAGE.CSTUTILFINDFIXEXTDASCIICHARS] WARNING: These unresolved values need to be
updated in the PROC FORMAT statement of this macro.
***********************************************************************************************
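One way to review the flagged values is to print the data set named in the warning. This is a minimal sketch that assumes work._cstProblems exists after the macro has run; the obs= limit is an arbitrary choice made for illustration.

```sas
/* Review the first rows of the problem data set named in the log warning. */
/* The obs=20 limit is an assumption; remove it to print every row.        */
proc print data=work._cstProblems(obs=20);
run;
```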
***********************************************************************************;
********** Updating data set TESTDATA.EXT_ASCII **********;
***********************************************************************************;
%let _cstDSLabel=%cstutilgetattribute(_cstDataSetName=TESTDATA.EXT_ASCII,
_cstAttribute=LABEL);
%let _cstDSSortVars=%cstutilgetattribute(_cstDataSetName=TESTDATA.EXT_ASCII,
_cstAttribute=SORTEDBY);
end;
if _n_= 4 then do;
characters=tranwrd(characters,byte(148),byte(34));
stringchars=tranwrd(stringchars,byte(148),byte(34));
end;
run;
***********************************************************************************;
********** Updating data set TESTDATA.EXT_ASCII2 **********;
***********************************************************************************;
%let _cstDSLabel=%cstutilgetattribute(_cstDataSetName=TESTDATA.EXT_ASCII2,
_cstAttribute=LABEL);
%let _cstDSSortVars=%cstutilgetattribute(_cstDataSetName=TESTDATA.EXT_ASCII2,
_cstAttribute=SORTEDBY);
%do;
proc sort data=work.EXT_ASCII2;
by &_cstDSSortVars;
run;
%end;
%mend;
Before any updates can be made to the ext_ascii2 data set, the following lines of code
must be resolved by mapping a value to the extended ASCII character 159:
characters=tranwrd(characters,byte(159),byte(?));
stringchars=tranwrd(stringchars,byte(159),byte(?));
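As an illustration, the two lines might look like this after a replacement value is chosen. The value 89 (the letter Y) is only an assumption for this sketch; choose the replacement that is appropriate for your data. The sketch also assumes a single-byte session encoding, in which byte(159) returns one character.

```sas
/* Edited lines in the generated code file: the ? placeholder is replaced */
/* with a hypothetical mapping of extended ASCII character 159 to 89 (Y). */
characters=tranwrd(characters,byte(159),byte(89));
stringchars=tranwrd(stringchars,byte(159),byte(89));
```

These lines belong inside the DATA step that the macro generates; they are not a complete program on their own.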
Note: For each submission of the macro, the _cstRetainOutputDS parameter must be
specified as Y and the _cstGeneratedCodeFile parameter must specify the same file.
This example examines all columns in testdata.ext_ascii. The output data set is
specified as work.all_asciiProblems.
For the first submission, the _cstRetainOutputDS parameter is specified as N. This
clears the existing data set specified by the _cstOutputDS parameter.
%cstutilfindfixextdasciichars(
_cstDSName=testdata.ext_ascii,
_cstGeneratedCodeFile=c:/fixascii/findfixextendedascii4.sas,
_cstOutputDS=work.all_asciiProblems,
_cstRetainOutputDS=N,
_cstFindFix=Find);
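A later submission that adds to the same output data set and the same generated code file might look like the following sketch. It assumes that the second data set is testdat2.all_ascii (the name that appears in the generated code excerpt); only the _cstDSName and _cstRetainOutputDS values differ from the first submission.

```sas
/* Sketch of a follow-up submission. The data set name is an assumption.   */
/* _cstRetainOutputDS=Y keeps the rows written by the earlier submission,  */
/* and _cstGeneratedCodeFile points to the same file as before.            */
%cstutilfindfixextdasciichars(
  _cstDSName=testdat2.all_ascii,
  _cstGeneratedCodeFile=c:/fixascii/findfixextendedascii4.sas,
  _cstOutputDS=work.all_asciiProblems,
  _cstRetainOutputDS=Y,
  _cstFindFix=Find);
```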
blocks in the code: one for TESTDATA and another for TESTDAT2 (with corresponding
output libraries OUT1 and OUT2).
Here is an excerpt of the generated code in the findfixextendedascii4.sas file:
%macro _cstFixASCII;
********************************************;
********** Initialize libraries **********;
********************************************;
***********************************************************************************;
********** Updating data set testdat2.all_ascii **********;
***********************************************************************************;
%let _cstDSLabel=%cstutilgetattribute(_cstDataSetName=testdat2.all_ascii,
_cstAttribute=LABEL);
%let _cstDSSortVars=%cstutilgetattribute(_cstDataSetName=testdat2.all_ascii,
_cstAttribute=SORTEDBY);
run;
********************************************;
********** Initialize libraries **********;
********************************************;
***********************************************************************************;
********** Updating data set testdata.ext_ascii **********;
***********************************************************************************;
%let _cstDSLabel=%cstutilgetattribute(_cstDataSetName=testdata.ext_ascii,
_cstAttribute=LABEL);
%let _cstDSSortVars=%cstutilgetattribute(_cstDataSetName=testdata.ext_ascii,
_cstAttribute=SORTEDBY);
%mend;
%_cstFixASCII;
When you use an external SAS format, you must specify the value in the external SAS
format that indicates a missing value. You specify this missing value in the
_cstExtFmtOtherValue parameter. For example, if the external SAS format specifies
other=MISSING, the value of the _cstExtFmtOtherValue parameter must be MISSING.
The %CSTUTILFINDFIXEXTDASCIICHARS macro can then act on the missing value.
If the external SAS format does not contain an other= statement, the default value is
**.
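The following minimal sketch shows how an other= range behaves in a SAS format in general, independent of the macro. The format name demofmt and its single mapping are hypothetical and are used only for illustration.

```sas
/* Hypothetical format with one explicit mapping and an other= range. */
proc format;
  value demofmt
    145='39'
    other='MISSING';
run;

data _null_;
  mapped=put(145,demofmt.);   /* 145 has an explicit mapping           */
  unmapped=put(9,demofmt.);   /* 9 has none, so other= supplies a label */
  put mapped= unmapped=;      /* writes both values to the SAS log      */
run;
```

Any value without an explicit mapping receives the other= label, which is why that label must be passed to the macro in the _cstExtFmtOtherValue parameter.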
Here is an example of an external SAS format and the macro submission:
proc format library=work.myformats;
value asciifmt
10=32
19=45
20=45
24=39
25=39
28=34
29=34
139=60
145=39
146=39
147=34
148=34
150=45
151=45
155=62
other=MISSING;
run;
options fmtsearch=(work.myformats);
%cstutilfindfixextdasciichars(
_cstDSName=testdat2.all_ascii,
_cstColumnName=stringchars,
_cstExternalFmt=asciifmt,
_cstExtFmtOtherValue=MISSING,
_cstGeneratedCodeFile=c:/fixascii/findfixextendedascii5.sas,
_cstOutputDS=all_cstProblems,
_cstRetainOutputDS=N,
_cstWriteToLib=work,
_cstFindFix=Find
);
Note: As a best practice, store an external SAS format in a managed permanent
format catalog.
The following display shows the _cstOutputDS data set. The _cstRemapValue for other
is MISSING, which alerts you to a problem:
***********************************************************************************;
********** Updating data set testdat2.all_ascii **********;
***********************************************************************************;
%let _cstDSLabel=%cstutilgetattribute(_cstDataSetName=testdat2.all_ascii,
_cstAttribute=LABEL);
%let _cstDSSortVars=%cstutilgetattribute(_cstDataSetName=testdat2.all_ascii,
_cstAttribute=SORTEDBY);
The line
test_stringchars=tranwrd(test_stringchars,byte(9),byte(MISSING));
is the visual cue that an additional mapping is required. This represents the other=
value specified in the external SAS format.
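One possible way to supply the additional mapping is to add an entry for character 9 to the external format and then submit the macro again. In this sketch, mapping 9 (the tab character) to 32 (a space) is an assumption made for illustration; the other entries repeat the format from the earlier example.

```sas
/* The asciifmt format from the earlier example with one added entry.  */
/* The 9=32 mapping (tab to space) is a hypothetical choice.           */
proc format library=work.myformats;
  value asciifmt
    9=32
    10=32
    19=45
    20=45
    24=39
    25=39
    28=34
    29=34
    139=60
    145=39
    146=39
    147=34
    148=34
    150=45
    151=45
    155=62
    other=MISSING;
run;
```

After the format is updated, the generated line for byte(9) should no longer resolve to the other= value.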