ADaM Traceability
ABSTRACT
One of the fundamental principles of ADaM is that datasets and associated metadata must include
traceability as a link between analysis results, ADaM datasets, and SDTM datasets. The existing ADaM
documents contain some examples of simple traceability, such as variable derivations and inclusion of
the SDTM sequence number, but what about more complex examples?
An ADaM sub-team is currently developing a Traceability Examples Document, showing how traceability
can be employed in a wide variety of practical scenarios. Some of these examples contain content from
other CDISC documents, modified to focus on the traceability aspects. Others are being developed
specifically for the Traceability Examples Document. As members of the Traceability Examples ADaM
sub-team, we are including in this PharmaSUG paper and presentation a selection of examples that
demonstrate the power of traceability in complex analyses.
INTRODUCTION
Traceability is mentioned throughout the published ADaM documents. It is, in fact, one of the fundamental
principles of ADaM. Section 2.1 of the CDISC ADaMIG v1.1 states “ADaM datasets and associated
metadata must provide traceability to show the source or derivation of a value or variable (i.e., the data’s
lineage or relationship between an analysis value and its predecessor(s)).”
ADaM documents include examples that demonstrate this principle of traceability. These examples don’t
imply a rule or required process, just one possible way to handle specific analysis needs that abide by the
fundamental principle of traceability. The ADaM Traceability Examples document, currently in
development, focuses on examples that demonstrate traceability in both data and metadata.
At the time of this writing, fifteen examples were being developed as part of the ADaM Traceability
Examples document. To share them with the community before the document is published for public
review, the ADaM authors who are developing them describe the following set of examples in this
paper:
• General ADSL Traceability
• Traceability with Parameters from Multiple Input Datasets
• Traceability when Creating Rows in BDS
• Traceability When Multiple Analysis Variables are needed on the Same Row
To fully understand this paper, the reader is expected to be fluent in the CDISC standards ADaM,
SDTM, and Define-XML. Documents about these and other CDISC standards can be downloaded from
the CDISC website https://www.cdisc.org/. See the Recommended Reading section for a list of CDISC
standards documents most pertinent to this paper.
Traceability: Some Thoughts and Examples for ADaM Needs, continued
DATA FLOW
Data for ADSL in this example is coming directly from SDTM domains DM, DS, and EX.
[Figure: data flow from SDTM input datasets DM, DS, and EX into ADSL]
TRACEABILITY NEEDS
The following metadata table shows an example of variable-level traceability for each ADSL variable. In
this example, AAGEGR1 was created to serve the analysis of grouping subjects by age categories. The
traceability from AAGEGR1 all the way back to the SDTM variable DM.BRTHDTC is demonstrated with
the following:
• Deriving AAGE as the actual age used for the grouping;
• Creating BRTHDT to calculate AAGE;
• Keeping the predecessor of DM.BRTHDTC to show how it was imputed to BRTHDT.
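The chain from DM.BRTHDTC to AAGEGR1 can be sketched in a few lines of Python. This is only an illustration: the imputation rule (missing month or day set to 1), the reference date, and the age cut point are assumptions made for the sketch and are not taken from the metadata table. The function names are hypothetical.

```python
from datetime import date


def impute_birth_date(brthdtc: str) -> date:
    """Turn a date-only ISO 8601 DM.BRTHDTC string into a complete BRTHDT.

    Assumed rule (illustration only): a missing month or day becomes 1.
    """
    parts = brthdtc.split("-")
    year = int(parts[0])
    month = int(parts[1]) if len(parts) > 1 else 1
    day = int(parts[2]) if len(parts) > 2 else 1
    return date(year, month, day)


def analysis_age(brthdt: date, ref_date: date) -> int:
    """AAGE as completed years between BRTHDT and an assumed reference date."""
    had_birthday = (ref_date.month, ref_date.day) >= (brthdt.month, brthdt.day)
    return ref_date.year - brthdt.year - (0 if had_birthday else 1)


def age_group(aage: int) -> str:
    """AAGEGR1 using a hypothetical cut point of 65 years."""
    return "<65" if aage < 65 else ">=65"


# Each intermediate value is kept, mirroring the variables kept in ADSL.
brthdt = impute_birth_date("1950-07")           # partial date, day imputed
aage = analysis_age(brthdt, date(2016, 3, 15))  # hypothetical reference date
aagegr1 = age_group(aage)
```

Keeping BRTHDT and AAGE alongside AAGEGR1 is what lets a reviewer reproduce each step from the collected value.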
Variable-level metadata, as shown above, is useful for variables that are copied or derived from easily-
referenced data. Because ADSL is one record per subject, there is no opportunity to include variables
such as sequence number to provide data point traceability. For many ADSL variables, including those
mentioned here, variable-level traceability is sufficient. An additional example is being developed for the
ADaM Traceability Examples document to show how an intermediate dataset prior to ADSL can be used
to provide additional traceability for more complex derivations; that example is not included in this
PharmaSUG paper.
DATA FLOW
Data for this example time-to-event analysis dataset (ADHYP) is from at least SDTM hospitalization (HO),
vital signs (VS), and disposition (DS). Figure 2 demonstrates that data flow. It is possible for data to
come from additional input sources, but for this illustration we are only looking at input data that is used to
derive these particular parameter(s) of interest. Also, not shown in the diagram is any data coming from
ADSL.
[Figure 2: data flow from SDTM HO, VS, and DS into ADaM ADHYP]
TRACEABILITY NEEDS
Keeping sub-events (i.e., hospitalization, diastolic blood pressure > 90 and systolic blood pressure > 140)
provides the data to support the hypertension event and is part of the traceability of this analysis dataset.
Table 6 shows example variable metadata for specific ADaM dataset variables, and Table 7 shows
Parameter-Value-Level metadata for selected variables. Note that only variables used to illustrate the
concept of traceability are shown in these tables.
Table 6: Variable Metadata for ADHYP
Variable Name | Variable Label | Type | Codelist/Controlled Terms | Variable Metadata
USUBJID | Unique Subject Identifier | Char | | VS.USUBJID if subject had systolic blood pressure > 140 and/or diastolic blood pressure > 90; HO.USUBJID if subject is hospitalized due to high blood pressure; DS.USUBJID if subject did not have hospitalization or hypertension
PARAM | Parameter | Char | Time to First Hospital Admission (day); Time to First DBP > 90 (day); Time to First SBP > 140 (day); Time to Hypertension Event (day) | Time to Hypertension Event parameter is used for the analysis of time to hypertension. The other ‘time to’ parameters are sub-events and are included.
PARAMCD | Parameter Code | Char | HOSPADM; DBP; SBP; HYPEREVT | Create one record for each PARAMCD even if the subject did not have the event.
AVAL | Analysis Value | Num | | See parameter-level metadata below
CNSR | Censored | Num | 1; 0 | See parameter-level metadata below
EVNTDESC | Event or Censoring Description | Char | | See parameter-level metadata below
SRCDOM | Source Data | Char | HO; VS; DS | See parameter-level metadata below
SRCVAR | Source Variable | Char | HOSTDY; VSDY; DSSTDY | See parameter-level metadata below
SRCSEQ | Source Sequence Number | Num | | The sequence number --SEQ of the row in the input dataset identified in the SRCDOM that relates to the analysis value being derived.
Variable | Where | Controlled Terms/Formats | Source/Derivation/Comment
EVNTDESC | PARAMCD = ‘DBP’ | | Assigned: If subject had diastolic blood pressure > 90, then EVNTDESC = ‘FIRST DBP > 90’. Otherwise if DS.DSDECOD = ‘COMPLETED’ then EVNTDESC = ‘COMPLETED THE STUDY’. Otherwise EVNTDESC = DS.DSDECOD.
EVNTDESC | PARAMCD = ‘SBP’ | | Assigned: If subject had systolic blood pressure > 140, then EVNTDESC = ‘FIRST SBP > 140’. Otherwise if DS.DSDECOD = ‘COMPLETED’ then EVNTDESC = ‘COMPLETED THE STUDY’. Otherwise EVNTDESC = DS.DSDECOD.
EVNTDESC | PARAMCD = ‘HYPEREVT’ | | Derived: If at least one of the sub-events (HOSPADM, DBP, SBP) had EVNTDESC that indicated ‘FIRST …’, then EVNTDESC = ‘HYPERTEN. EVENT’. Otherwise if DS.DSDECOD = ‘COMPLETED’ then EVNTDESC = ‘COMPLETED THE STUDY’. Otherwise EVNTDESC = DS.DSDECOD.
SRCDOM | PARAMCD = ‘HOSPADM’ | HO; DS | Assigned: If subject was admitted to the hospital, then SRCDOM = ‘HO’. Otherwise SRCDOM = ‘DS’.
SRCDOM | PARAMCD = ‘DBP’ | VS; DS | Assigned: If subject had diastolic blood pressure > 90, then SRCDOM = ‘VS’. Otherwise SRCDOM = ‘DS’.
SRCDOM | PARAMCD = ‘SBP’ | VS; DS | Assigned: If subject had systolic blood pressure > 140, then SRCDOM = ‘VS’. Otherwise SRCDOM = ‘DS’.
SRCDOM | PARAMCD = ‘HYPEREVT’ | HO; VS; DS | Derived: Using sub-events determine the earliest event time and set SRCDOM accordingly. Otherwise SRCDOM = ‘DS’.
SRCVAR | PARAMCD = ‘HOSPADM’ | HOSTDY; DSSTDY | Assigned: If subject was admitted to the hospital, then SRCVAR = ‘HOSTDY’. Otherwise, SRCVAR = ‘DSSTDY’.
SRCVAR | PARAMCD = ‘DBP’ | VSDY; DSSTDY | Assigned: If subject had diastolic blood pressure > 90, then SRCVAR = ‘VSDY’. Otherwise, SRCVAR = ‘DSSTDY’.
SRCVAR | PARAMCD = ‘SBP’ | VSDY; DSSTDY | Assigned: If subject had systolic blood pressure > 140, then SRCVAR = ‘VSDY’. Otherwise, SRCVAR = ‘DSSTDY’.
SRCVAR | PARAMCD = ‘HYPEREVT’ | HOSTDY; VSDY; DSSTDY | Derived: Using sub-events determine the earliest event time and set SRCVAR accordingly. Otherwise, SRCVAR = ‘DSSTDY’.
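The HYPEREVT derivation described in the parameter-value-level metadata can be sketched as follows. This is a minimal illustration, not the authors' program: the function name and the dictionary layout are hypothetical, and the sketch assumes that AVAL equals the source study day and that CNSR = 0 marks an event.

```python
def derive_hyperevt(sub_events, ds_record):
    """Build the HYPEREVT row for one subject from its sub-event rows.

    sub_events: dicts for HOSPADM, DBP, and SBP that already carry
                AVAL, CNSR, SRCDOM, SRCVAR, and SRCSEQ.
    ds_record:  dict with DSDECOD, DSSTDY, and DSSEQ from SDTM DS.
    """
    events = [row for row in sub_events if row["CNSR"] == 0]
    if events:
        # The earliest sub-event defines the composite event, and its
        # source pointers are carried over unchanged.
        first = min(events, key=lambda row: row["AVAL"])
        return {
            "PARAMCD": "HYPEREVT",
            "AVAL": first["AVAL"],
            "CNSR": 0,
            "EVNTDESC": "HYPERTEN. EVENT",
            "SRCDOM": first["SRCDOM"],
            "SRCVAR": first["SRCVAR"],
            "SRCSEQ": first["SRCSEQ"],
        }
    # No sub-event occurred: censor at disposition and point back to DS.
    completed = ds_record["DSDECOD"] == "COMPLETED"
    return {
        "PARAMCD": "HYPEREVT",
        "AVAL": ds_record["DSSTDY"],  # assumption: censoring time is DSSTDY
        "CNSR": 1,
        "EVNTDESC": "COMPLETED THE STUDY" if completed else ds_record["DSDECOD"],
        "SRCDOM": "DS",
        "SRCVAR": "DSSTDY",
        "SRCSEQ": ds_record["DSSEQ"],
    }


# Sub-event rows shaped like the ADHYP output for one subject.
subject_rows = [
    {"PARAMCD": "HOSPADM", "AVAL": 9, "CNSR": 0,
     "SRCDOM": "HO", "SRCVAR": "HOSTDY", "SRCSEQ": 99},
    {"PARAMCD": "DBP", "AVAL": 15, "CNSR": 0,
     "SRCDOM": "VS", "SRCVAR": "VSDY", "SRCSEQ": 208},
    {"PARAMCD": "SBP", "AVAL": 22, "CNSR": 1,
     "SRCDOM": "DS", "SRCVAR": "DSSTDY", "SRCSEQ": 301},
]
ds_row = {"DSDECOD": "COMPLETED", "DSSTDY": 22, "DSSEQ": 301}
hyperevt = derive_hyperevt(subject_rows, ds_row)
```

Because the composite row copies SRCDOM, SRCVAR, and SRCSEQ from the earliest sub-event, it points at the same SDTM record as that sub-event.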
Utilizing the SRCDOM, SRCVAR, and SRCSEQ variables in ADHYP allows us to trace each record back
to the source dataset. For example, Table 11 shows the resulting analysis dataset ADHYP.
Table 11: Output Data Example ADHYP
Row | USUBJID | PARAM | PARAMCD | AVAL | CNSR | EVNTDESC | SRCDOM | SRCVAR | SRCSEQ
1 | 2010 | Time to First Hospital Admission (day) | HOSPADM | 9 | 0 | FIRST HOSPITAL ADMISSION | HO | HOSTDY | 99
2 | 2010 | Time to First DBP>90 (day) | DBP | 15 | 0 | FIRST DBP>90 | VS | VSDY | 208
3 | 2010 | Time to First SBP>140 (day) | SBP | 22 | 1 | COMPLETED THE STUDY | DS | DSSTDY | 301
4 | 2010 | Time to Hypertension Event (day) | HYPEREVT | 9 | 0 | HYPERTEN. EVENT | HO | HOSTDY | 99
5 | 3082 | Time to First Hospital Admission (day) | HOSPADM | 10 | 1 | COMPLETED THE STUDY | DS | DSSTDY | 130
6 | 3082 | Time to First DBP>90 (day) | DBP | 10 | 1 | COMPLETED THE STUDY | DS | DSSTDY | 130
7 | 3082 | Time to First SBP>140 (day) | SBP | 10 | 1 | COMPLETED THE STUDY | DS | DSSTDY | 130
8 | 3082 | Time to Hypertension Event (day) | HYPEREVT | 10 | 1 | COMPLETED THE STUDY | DS | DSSTDY | 130
Notice that we can trace the value of Row 1 back to HO data with HOSEQ = 99 (Row 1 in HO) and trace
the value of Row 2 back to VS data with VSSEQ = 208 (Row 6 in VS).
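The trace-back just described amounts to a keyed lookup. The sketch below shows one way to express it, assuming each SDTM domain is held as a list of dictionaries. The function name and the two source rows are made up for illustration and assume that AVAL equals the source study day.

```python
def trace_source(adam_row, sdtm):
    """Return the SDTM record and source value behind one ADHYP row.

    sdtm maps a domain code (for example "HO") to its list of records.
    The sequence variable name is built as SRCDOM + "SEQ".
    """
    domain = adam_row["SRCDOM"]
    seq_var = domain + "SEQ"
    for record in sdtm[domain]:
        if (record["USUBJID"] == adam_row["USUBJID"]
                and record[seq_var] == adam_row["SRCSEQ"]):
            return record, record[adam_row["SRCVAR"]]
    raise LookupError("no source record found for this analysis row")


# Hypothetical SDTM rows, for illustration only.
sdtm_data = {
    "HO": [{"USUBJID": "2010", "HOSEQ": 99, "HOSTDY": 9}],
    "VS": [{"USUBJID": "2010", "VSSEQ": 208, "VSDY": 15}],
}
adhyp_row = {"USUBJID": "2010", "PARAMCD": "HOSPADM", "AVAL": 9,
             "SRCDOM": "HO", "SRCVAR": "HOSTDY", "SRCSEQ": 99}
source_record, source_value = trace_source(adhyp_row, sdtm_data)
```

The three SRC variables together are sufficient to locate one value in one record, which is the point of data point traceability.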
ANALYSIS NEED
In this example, the analysis requirement is to summarize the average of the triplicate ECG interval
values (AVAL) as well as change from baseline (CHG), where baseline (BASE) is defined as the average
of the triplicate ECG intervals collected on the visit prior to the first administration of study drug. This
summary will be performed by analysis visit (AVISIT).
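A minimal sketch of this analysis need is shown below: averaged rows are added with DTYPE = 'AVERAGE', and BASE and CHG are populated on those rows. The function names, the visit label "Baseline", and the sample values are assumptions for illustration; the sketch also assumes the baseline visit has already been identified.

```python
from statistics import mean


def add_average_rows(records):
    """Append one DTYPE='AVERAGE' row per subject, parameter, and AVISIT."""
    groups = {}
    for row in records:
        key = (row["USUBJID"], row["PARAMCD"], row["AVISIT"])
        groups.setdefault(key, []).append(row["AVAL"])
    averages = [
        {"USUBJID": usubjid, "PARAMCD": paramcd, "AVISIT": avisit,
         "AVAL": mean(values), "DTYPE": "AVERAGE", "EGSEQ": None}
        for (usubjid, paramcd, avisit), values in groups.items()
    ]
    # Original rows are kept so the inputs to each average stay visible.
    return records + averages


def add_base_and_chg(rows, baseline_visit="Baseline"):
    """Populate BASE and CHG on the averaged rows only."""
    base = {
        (row["USUBJID"], row["PARAMCD"]): row["AVAL"]
        for row in rows
        if row["DTYPE"] == "AVERAGE" and row["AVISIT"] == baseline_visit
    }
    for row in rows:
        if row["DTYPE"] != "AVERAGE":
            continue
        row["BASE"] = base.get((row["USUBJID"], row["PARAMCD"]))
        if row["BASE"] is not None and row["AVISIT"] != baseline_visit:
            row["CHG"] = row["AVAL"] - row["BASE"]
    return rows


# Made-up triplicate QT readings for one subject at two visits.
triplicates = [
    {"USUBJID": "001", "PARAMCD": "QT", "AVISIT": visit, "AVAL": value,
     "DTYPE": None, "EGSEQ": seq}
    for seq, (visit, value) in enumerate(
        [("Baseline", 400), ("Baseline", 402), ("Baseline", 404),
         ("Week 1", 410), ("Week 1", 412), ("Week 1", 414)], start=1)
]
adeg = add_base_and_chg(add_average_rows(triplicates))
averaged = [row for row in adeg if row["DTYPE"] == "AVERAGE"]
```

The averaged rows carry no EGSEQ because they do not correspond to a single SDTM record. The original rows remain in the dataset with their EGSEQ values.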
Where DTYPE='AVERAGE'
Figure 3: Example Table Showing Analysis Need for Averaging Values Across Each Visit
DATA FLOW
Data for ADEG in this example is coming directly from SDTM domain EG plus ADSL.
TRACEABILITY METADATA
Dataset-level, variable-level, and parameter-value-level metadata, all useful for traceability, are shown
below. Only content pertinent to the example is included here.
ADEG | Electrocardiogram Analysis Dataset | One record per subject, parameter, analysis visit, reference ID, derivation type | BASIC DATA STRUCTURE
Below are example ECG data from ADaM ADEG. Only content pertinent to the example is included
here.
OTHER USES
This example demonstrated how to maintain traceability when you create new records in a BDS ADaM
dataset. The traceability is both metadata driven (i.e., variable-level and parameter-level metadata
defining how the new record is derived) and data-driven (e.g., maintaining the source records and
variables from SDTM in the ADEG dataset, such as using EGSEQ). There are numerous use cases
similar to this example, such as:
• An endpoint analysis where new records are created for an analysis visit of “Endpoint”, using a
derivation type (DTYPE) of LOCF or WOCF.
• Creating a composite endpoint.
• Interpolating missing values (when not using Mixed modeling methods).
• Creating group scores (e.g., total symptom scores as sum of individual symptom scores from an
allergy diary).
• Creating additional collapsed record rows in an adverse event analysis dataset, when a single
adverse event was collected across multiple records.
DATA FLOW
In this example, ADaM dataset ADQS has data from SDTM QS plus ADSL, and ADaM dataset ADQST is
derived directly from and using only ADQS.
TRACEABILITY METADATA
Dataset-level, variable-level, and parameter-value-level metadata are shown below. Only content
pertinent to the example is included here.
The variable metadata for ADQS provides traceability to the source SDTM data variables and describes
the process of deriving new DTYPE='SUM' records:
Variable Name | Where Condition | Variable Metadata
STUDYID | | QS.STUDYID
USUBJID | | QS.USUBJID
TRTP | | ADSL.TRT01P
VISIT | | QS.VISIT
PARAMCD | | Keep QS records where QS.QSCAT='MOTOR FUNCTION QUESTIONNAIRE' and VISIT in ('BASELINE' 'MONTH 1'), set PARAMCD=QS.QSTESTCD and PARAM=QS.QSTEST. Create 4 derived records per subject, with PARAMCD, PARAM values as: UPPER=Upper Body Motor Score; LOWER=Lower Body Motor Score; for timepoints BASELINE and MONTH 1.
PARAM | | See PARAMCD
 | DTYPE EQ 'SUM' | Where PARAMCD='UPPER', the sum of the scores for questions 1-3. Where PARAMCD='LOWER', the sum of the scores for questions 4-6. If any scores are missing, do not impute sum.
ABLFL | | Set to Y where VISIT=BASELINE
BASE | | AVAL value from the record where ABLFL=Y, populate for post-baseline records only
CHG | | AVAL-BASE
QSSEQ | | QS.QSSEQ
QSCAT | | QS.QSCAT
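The derivation of the DTYPE='SUM' records can be sketched as below. The question codes Q1 through Q6 and the function name are hypothetical, since the actual QSTESTCD values are not shown here. The sketch also assumes that "do not impute sum" means the derived record is still created but with a missing AVAL.

```python
# Hypothetical item codes; the real QSTESTCD values are not shown in the paper.
SUBSCALES = {
    "UPPER": ("Q1", "Q2", "Q3"),
    "LOWER": ("Q4", "Q5", "Q6"),
}


def derive_sum_rows(item_rows):
    """Create one DTYPE='SUM' row per subject, visit, and subscale."""
    scores = {}
    for row in item_rows:
        key = (row["USUBJID"], row["VISIT"])
        scores.setdefault(key, {})[row["PARAMCD"]] = row["AVAL"]
    derived = []
    for (usubjid, visit), items in scores.items():
        for paramcd, questions in SUBSCALES.items():
            values = [items.get(question) for question in questions]
            # Assumption: any missing item leaves the sum missing.
            total = None if any(v is None for v in values) else sum(values)
            derived.append({
                "USUBJID": usubjid, "VISIT": visit, "PARAMCD": paramcd,
                "AVAL": total, "DTYPE": "SUM",
                "QSSEQ": None,  # derived rows have no single QS source record
            })
    return derived


# Made-up item scores: BASELINE is complete, MONTH 1 lacks question 5.
items = [
    {"USUBJID": "XYZ-001", "VISIT": "BASELINE", "PARAMCD": f"Q{n}",
     "AVAL": 10 * n, "QSSEQ": n}
    for n in range(1, 7)
] + [
    {"USUBJID": "XYZ-001", "VISIT": "MONTH 1", "PARAMCD": f"Q{n}",
     "AVAL": 5 * n, "QSSEQ": 6 + n}
    for n in (1, 2, 3, 4, 6)
]
sum_rows = derive_sum_rows(items)
```

In this sketch the item-level rows keep their QSSEQ and the derived rows carry DTYPE instead, so the two kinds of rows can be told apart in the data.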
ADQST | Transposed ADQS | ADAM OTHER | One record per subject | This dataset is derived from ADQS by transposing CHG by USUBJID, using the values of PARAMCD as new variable names and the values of PARAM as new variable labels
The variable metadata for ADQST is relatively simple, describing the transpose process and providing the
predecessor origins for each variable:
Table 19: Variable Metadata Example ADQST
Variable Name Variable Label Variable Metadata
9 | XYZ | XYZ-001 | DRUG A | BASELINE | UPPER | Upper Body Score | SUM | 115
10 | XYZ | XYZ-001 | DRUG A | MONTH 1 | UPPER | Upper Body Score | SUM | 135
11 | XYZ | XYZ-001 | DRUG A | BASELINE | LOWER | Lower Body Score | SUM | 110
12 | XYZ | XYZ-001 | DRUG A | MONTH 1 | LOWER | Lower Body Score | SUM | 115
11 (cont) | Y
12 (cont) | 110 | 5
Note: In the sample data for ADQS shown in Table 20, records that originate from SDTM have a value in
QSSEQ and no value in DTYPE, and records which are derived have a value in DTYPE and no value in
QSSEQ. Including variable QSSEQ allows us to identify the exact source record in QS that was used for
the row in ADQS.
Below are example ADQST data. Only content pertinent to the example is included here.
The sample data for ADQST shown in Table 21 supports the needs of the statistical analysis, and through
the dataset, variable, and parameter metadata it is possible to trace each analysis value to a specific
record in ADQS, and from there to the source SDTM records. Importantly, if ADQS were not produced
and only ADQST were provided, the traceability between source and analysis data would be lost.
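The transpose from ADQS to ADQST can be sketched as a simple pivot. The function name and the list-of-dictionaries layout are assumptions for illustration. The CHG value of 20 for UPPER is computed here from the AVAL values 135 and 115 and is not copied from a table.

```python
def transpose_chg(adqs_rows):
    """Pivot CHG to one record per USUBJID, one variable per PARAMCD."""
    wide = {}
    for row in adqs_rows:
        if row.get("CHG") is None:
            continue  # baseline rows carry no CHG and add nothing
        record = wide.setdefault(row["USUBJID"], {"USUBJID": row["USUBJID"]})
        record[row["PARAMCD"]] = row["CHG"]
    return list(wide.values())


# Derived ADQS rows for one subject, in the shape of the sample data.
adqs = [
    {"USUBJID": "XYZ-001", "VISIT": "BASELINE", "PARAMCD": "UPPER",
     "AVAL": 115, "CHG": None},
    {"USUBJID": "XYZ-001", "VISIT": "MONTH 1", "PARAMCD": "UPPER",
     "AVAL": 135, "CHG": 20},
    {"USUBJID": "XYZ-001", "VISIT": "BASELINE", "PARAMCD": "LOWER",
     "AVAL": 110, "CHG": None},
    {"USUBJID": "XYZ-001", "VISIT": "MONTH 1", "PARAMCD": "LOWER",
     "AVAL": 115, "CHG": 5},
]
adqst = transpose_chg(adqs)
```

Each column in the wide record is named after a PARAMCD, so a value can be matched to the ADQS row with the same USUBJID and PARAMCD, and from there to QS.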
OTHER USES
This example demonstrates how each data point in a wide dataset holding multiple analysis variables per
row can be traced back across derivations to its SDTM source, using variable metadata and the data point
traceability provided by the BDS standard. The final horizontal dataset need not be one record per
USUBJID. For example, one may create a dataset with one record per USUBJID per AVISIT timepoint that
arranges all analysis values from that timepoint horizontally.
CONCLUSION
Four examples were shown to demonstrate how data, metadata, and even intermediate datasets, can all
provide traceability when creating ADaM datasets. Each of these examples comes from the ADaM
Traceability Examples document now in development.
When deciding how to create ADaM datasets, the authors encourage you to ask yourself the following
questions:
• Can the end-user determine which data is copied from SDTM and which is derived?
• Can the end-user determine how each variable and row was created in the dataset?
• Can the end-user trace back to the SDTM data that was used to create the value used for
analysis?
By considering the perspective of the end-user, traceability can be built in a natural and useful way.
REFERENCES
All CDISC documents referenced in this paper can be downloaded from https://www.cdisc.org/.
RECOMMENDED READING
• Analysis Data Model Implementation Guide version 1.1
• Analysis Data Model (ADaM) Examples in Commonly Used Statistical Analysis Methods
• CDISC Define-XML Specification Version 2.0
CONTACT INFORMATION
Your comments and questions are valued and encouraged. Contact the authors at:
Sandra Minjoe
PRA Health Sciences
Wayne Zhong
Accretion Softworks
Kent Letourneau
PRA Health Sciences
Richann Watson
DataRich Consulting