CDISC-SAS Clinical Interview QUESTIONS and ANSWERS-5
SAS CLINICAL : Interview Questions and Answers:
1) What do you know about CDISC and its standards?
CDISC stands for Clinical Data Interchange Standards Consortium. It was
established to bring greater efficiency to the entire drug development process
by improving data quality and speeding up development. To do that, CDISC
developed a series of standards, which include the Operational Data Model
(ODM), the Study Data Tabulation Model (SDTM) and the Analysis Data Model
(ADaM).
2) Why are people talking more about CDISC these days, and what
advantages does it bring to the pharmaceutical industry?
A) Generally speaking, only about 30% of programming time is used to generate
statistical results with SAS®. The rest is spent becoming familiar with the data
structure, checking data accuracy, and tabulating or listing raw data and
statistical results in the required formats. This non-statistical programming
time will be significantly reduced after implementing the CDISC standards.
3) What challenges do you think you will face as a SAS programmer
when you first implement CDISC standards in your company?
A) With the new requirements for electronic submission, CRT datasets need to
conform to a set of standards that facilitate the review process. They are no
longer created solely for programmers' convenience. SDS will be treated as the
specification for the datasets to be submitted and, potentially, as a reference
for CRF design. Therefore, statistical programming may need to start from this
common ground.
References:
Pharmasug/2007/fc/fc05
pharmasug/2003/fda compliance/fda055
Before SDTM:
Each domain could have a different name, and domains did not have a standard
structure. There was no standard list of variables for each domain.
Because of this, FDA reviewers had to spend a great deal of effort familiarizing
themselves with different data, domain names and variable names in each
analysis dataset. Reviewers spent most of their valuable time cleaning up the
data into a standard format rather than reviewing the data for accuracy.
This delayed the drug development process.
After SDTM:
There are standard domain names and a standard structure for each domain.
There is a list of standard variables and names for every dataset.
Because of this, it is easier to find and understand the data, and reviewers
need less time to review it than they would without SDTM standards. This
improves consistency in reviewing the data and saves time.
The purpose of creating SDTM domain data sets is to provide Case Report
Tabulation (CRT) data to the FDA in a standardized format. Following these
standards can greatly reduce the effort necessary for data mapping. Improper use
of CDISC standards, such as using a valid domain or variable name incorrectly,
can slow the metadata mapping process and should be avoided [4].
• Verifies that all required variables are present in the data set
• Reports as an error any variables in the data set that are not defined in the
domain
• Reports a warning for any expected domain variables that are not in the data
set
• Notes any permitted domain variables that are not in the data set
• Verifies that all domain variables are of the expected data type and proper
length
• Detects any domain variables that are assigned a controlled terminology
specification by the domain and do not have a format assigned to them
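The first few structural checks listed above can be sketched in a few lines. This is only an illustration in Python, not the procedure's actual implementation: the `DOMAIN_SPEC` table is an abbreviated, hypothetical metadata listing, and `check_structure` is a made-up helper name.

```python
# Hypothetical, abbreviated metadata for a Demographics-like domain.
# A real domain definition contains many more variables.
DOMAIN_SPEC = {
    "STUDYID": "required",
    "DOMAIN": "required",
    "USUBJID": "required",
    "AGE": "expected",
    "RACE": "expected",
    "DMDTC": "permissible",
}


def check_structure(columns, spec):
    """Compare the variables in a data set against the domain metadata."""
    present = set(columns)
    findings = []
    # Variables in the data set that the domain does not define are errors.
    for name in sorted(present - set(spec)):
        findings.append(("ERROR", f"{name} is not defined in the domain"))
    # Missing variables are graded by their core category.
    for name, core in spec.items():
        if name in present:
            continue
        if core == "required":
            findings.append(("ERROR", f"required variable {name} is missing"))
        elif core == "expected":
            findings.append(("WARNING", f"expected variable {name} is missing"))
        else:
            findings.append(("NOTE", f"permissible variable {name} is missing"))
    return findings


findings = check_structure(
    ["STUDYID", "DOMAIN", "USUBJID", "AGE", "HEIGHT"], DOMAIN_SPEC
)
for severity, message in findings:
    print(severity, message)
```

The sketch covers only variable presence. Checks on type, length and controlled terminology formats would need additional metadata columns.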
The procedure also performs the following checks on domain data content of the
source on a per observation basis:
• Verifies that all required variable fields do not contain missing values
• Detects occurrences of expected variable fields that contain missing values
• Detects the conformance of all ISO-8601 specification assigned values,
including date, time, date time, duration, and interval types
• Notes correctness of yes/no and yes/no/null responses
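A per-observation check of this kind might look like the following Python sketch. It is an assumption-laden illustration, not the procedure itself: `check_record` is a hypothetical helper, the regular expression only handles ISO 8601 dates and date-times (not durations or intervals), and the variable names in the example record are chosen for illustration.

```python
import re

# Simplified ISO 8601 pattern: YYYY, YYYY-MM, YYYY-MM-DD, optionally followed
# by Thh, Thh:mm or Thh:mm:ss. Durations and intervals are NOT covered here.
ISO8601_DT = re.compile(
    r"^\d{4}(-\d{2}(-\d{2}(T\d{2}(:\d{2}(:\d{2})?)?)?)?)?$"
)


def check_record(record, required, expected, iso_vars, yn_vars):
    """Return a list of (severity, variable, message) for one observation."""
    findings = []
    for var in required:
        if record.get(var) in (None, ""):
            findings.append(("ERROR", var, "required value is missing"))
    for var in expected:
        if record.get(var) in (None, ""):
            findings.append(("NOTE", var, "expected value is missing"))
    for var in iso_vars:
        value = record.get(var)
        if value and not ISO8601_DT.match(value):
            findings.append(("ERROR", var, "value is not ISO 8601"))
    for var in yn_vars:
        # Yes/no/null: only Y, N or a missing value are accepted.
        if record.get(var) not in (None, "", "Y", "N"):
            findings.append(("ERROR", var, "value is not Y, N or null"))
    return findings


record = {"USUBJID": "", "LBORRES": "5.1", "LBDTC": "03MAR2008", "LBFAST": "YES"}
problems = check_record(
    record,
    required=["USUBJID"],
    expected=["LBORRES"],
    iso_vars=["LBDTC"],
    yn_vars=["LBFAST"],
)
for problem in problems:
    print(problem)
```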
Advantages:
• Your “raw” database is equivalent to your SDTM which provides the most elegant
solution.
• Your clinical data management staff will be able to converse with
end-users/sponsors about the data easily since your clinical data manager and the
end-user/sponsor will both be looking at SDTM datasets.
• As soon as the CDMS database is built, the SDTM datasets are available.
Disadvantages:
• This approach may be cost prohibitive. Forcing the CDMS to create the SDTM
structures may simply be too cumbersome to do efficiently.
• Forcing the CDMS to adapt to the SDTM may cause problems with the operation
of the CDMS which could reduce data quality.
Advantages:
• The great flexibility of SAS will let you transform any proprietary CDMS
structure into the SDTM. You do not have to work around the rigid constraints of
the CDMS.
• Changes could be made to the SDTM conversion without disturbing clinical data
management processes.
• The CDMS is allowed to do what it does best which is to enter, manage, and
clean data.
Disadvantages:
• There would be additional cost to transform the data from your typical CDMS
structure into the SDTM. Specifications, programming, and validation of the SAS
programming transformation would be required.
• Once the CDMS database is up, there would then be a subsequent delay while
the SDTM is created in SAS. This delay would slow down the production of
analysis datasets and reporting. This assumes that you follow the linear
progression of CDMS -> SDTM -> analysis datasets (ADaM).
• Since the SDTM is a derivation of the “raw” data, there could be errors in
translation from the “raw” CDMS data to the SDTM.
• Your clinical data management staff may be at a disadvantage when speaking
with end-users/sponsors about the data since the data manager will likely be
looking at the CDMS data and the sponsor will see SDTM data.
Disadvantages:
• There would still be some additional cost needed to transform the data from the
SDTM-like CDMS structure into the SDTM. Specifications, programming, and
validation of the transformation would be required.
• There would be some delay while the SDTM-like CDMS data is converted to the
SDTM.
• Your clinical data management staff may still have a slight disadvantage when
speaking with end-users/sponsors about the data since the clinical data manager
will be looking at the SDTM-like data and the sponsor will see the true SDTM
data.
The trial design class contains seven domains and the special-purpose class
contains two domains (Demographics and Comments).
The trial design domains provide the reviewer with information on the criteria,
structure and scheduled events of a clinical trial. The only required domain is
Demographics.
There are two other special-purpose relationship data sets, the Supplemental
Qualifiers (SUPPQUAL) data set and the Related Records (RELREC) data set.
SUPPQUAL is a highly normalized data set that allows you to store virtually any
type of information related to one of the domain data sets. SUPPQUAL also
accommodates values longer than 200 characters: the first 200 characters should
be stored in the domain variable and the remainder in SUPPQUAL [5].
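As a rough illustration of the 200-character rule, the Python sketch below splits a long value into a parent value and overflow rows. The helper names and the `QNAM` numbering scheme (`AETERM1`, `AETERM2`, ...) are assumptions made for this example, only a subset of the SUPPQUAL variables is shown, and the split is done at a fixed width without regard to word boundaries.

```python
def split_long_text(text, width=200):
    """Return (parent_value, overflow_chunks) for a long character value."""
    chunks = [text[i:i + width] for i in range(0, len(text), width)]
    if not chunks:
        return "", []
    return chunks[0], chunks[1:]


def overflow_to_suppqual(studyid, rdomain, usubjid, idvar, idvarval,
                         qnam_prefix, chunks):
    """Build one SUPPQUAL-style row per overflow chunk (subset of columns)."""
    rows = []
    for n, chunk in enumerate(chunks, start=1):
        rows.append({
            "STUDYID": studyid,
            "RDOMAIN": rdomain,
            "USUBJID": usubjid,
            "IDVAR": idvar,
            "IDVARVAL": str(idvarval),
            "QNAM": f"{qnam_prefix}{n}",  # hypothetical naming scheme
            "QVAL": chunk,
        })
    return rows


long_term = "x" * 450
parent_value, overflow = split_long_text(long_term)
supp_rows = overflow_to_suppqual(
    "STUDY01", "AE", "STUDY01-101-0007", "AESEQ", 3, "AETERM", overflow
)
print(len(parent_value), [len(row["QVAL"]) for row in supp_rows])
```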
Each of the SDTM domains has a collection of variables associated with it.
There are five roles that a variable can have: Identifier, Topic, Timing,
Qualifier and, for trial design domains, Rule. Using lab data as an example:
• the subject ID, domain ID and sequence number are identifiers,
• the name of the lab parameter is the topic,
• the date and time of sample collection are timing variables,
• the result is a result qualifier and the variable containing the units is a
variable qualifier.
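The lab example can be written down as a small lookup table. The Python sketch below is illustrative only: it lists a handful of Laboratory (LB) variables, and the helper `variables_by_role` is a hypothetical name.

```python
from collections import defaultdict

# Role assignments for a few Laboratory (LB) variables, following the
# example in the text. A real LB domain has many more variables.
LB_ROLES = {
    "STUDYID": "Identifier",
    "DOMAIN": "Identifier",
    "USUBJID": "Identifier",
    "LBSEQ": "Identifier",
    "LBTESTCD": "Topic",
    "LBORRES": "Result Qualifier",
    "LBORRESU": "Variable Qualifier",
    "LBDTC": "Timing",
}


def variables_by_role(roles):
    """Group variable names by their role, keeping insertion order."""
    grouped = defaultdict(list)
    for variable, role in roles.items():
        grouped[role].append(variable)
    return dict(grouped)


grouped = variables_by_role(LB_ROLES)
for role, variables in grouped.items():
    print(role, variables)
```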
Variables that are common across domains include the basic identifiers study ID
(STUDYID), a two-character domain ID (DOMAIN) and unique subject ID
(USUBJID).
In studies with multiple sites that are allowed to assign their own subject
identifiers, the site ID and the subject ID must be combined to form USUBJID.
Most other variable names are formed by prefixing a standard variable name
fragment with the two-character domain ID.
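Both naming rules are easy to express in code. The following Python sketch assumes a hyphen as the separator and an optional study ID prefix. Neither is mandated by the text above, and the function names are made up for illustration.

```python
def make_usubjid(siteid, subjid, studyid=None, sep="-"):
    """Combine site and subject IDs (optionally prefixed by study ID)."""
    parts = [siteid, subjid]
    if studyid is not None:
        parts.insert(0, studyid)
    return sep.join(parts)


def domain_variable(domain, fragment):
    """Prefix a variable name fragment with the two-character domain ID."""
    if len(domain) != 2:
        raise ValueError("domain ID must be two characters")
    return domain.upper() + fragment.upper()


print(make_usubjid("101", "0007"))
print(make_usubjid("101", "0007", studyid="STUDY01"))
print(domain_variable("LB", "TESTCD"))
print(domain_variable("VS", "ORRES"))
```

Whatever separator is chosen, the point is that two sites assigning the same subject number still produce distinct USUBJID values.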
The SDTM specifications do not require all of the variables associated with a
domain to be included in a submission. In regard to complying with the SDTM
standards, the implementation guide specifies each variable as being included in
one of three categories:
REQUIRED – These variables are necessary for the proper functioning of standard
software tools used by reviewers. They must be included in the data set structure
and should not have a missing value for any observation.
EXPECTED – These variables must be included in the data set structure; however
it is permissible to have missing values.
PERMISSIBLE – These variables are not a required part of the domain and they
should not be included in the data set structure if the information they were
designed to contain was not collected.
Electrocardiogram (EG)
Inclusion / Exclusion (IE)
Lab Results (LB)
Physical Examination (PE)
Questionnaire (QS)
Subject Characteristics (SC)
Vital Signs (VS)
• Events Class – Incidents independent of the study that happen to the subject
during the lifetime of the study.
• Trial Design Class – Information about the design of the clinical trial (e.g.,
crossover trial, treatment arms) including information about the subjects with
respect to treatment and visits.
If the data management metadata is not in compliance with SDTM, avoid
auto-mapping. Instead, manually map the datasets to SDTM datasets and map each
variable to the appropriate domain.
The whole mapping process includes:
• Read the corporate data standards into a database table.
• Assign a CDISC domain prefix to each database module.
• Attach a combo box containing the SDTM variables for the selected domain to a
new mapping variable field.
• Search each module, and within each module select the most appropriate CDISC
variable.
• Search for variables mapped to the wrong type (Character mapped to Numeric, or
Numeric mapped to Character).
• Review the mapping to see if any conflicts are resolvable by mapping to a more
appropriate variable.
• Verify that the mapped variable is appropriate for each role.
• Finally, ensure all ‘required’ variables are present in the domain [6].
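Two of the review steps above, the type-conflict search and the final check for required variables, could be automated along these lines. This Python sketch uses a hypothetical, abbreviated Vital Signs metadata table and made-up source variable names (`PATID`, `RESULT`), so treat it as an outline rather than a working mapping tool.

```python
# Abbreviated, illustrative metadata: variable -> (type, core category).
SDTM_VS = {
    "USUBJID": ("Char", "required"),
    "VSTESTCD": ("Char", "required"),
    "VSORRES": ("Char", "expected"),
    "VSSTRESN": ("Num", "expected"),
}

# Hypothetical mapping from corporate-standard variables to SDTM variables.
MAPPING = [
    {"source": "PATID", "source_type": "Char", "target": "USUBJID"},
    {"source": "RESULT", "source_type": "Char", "target": "VSSTRESN"},
]


def type_conflicts(mapping, sdtm_vars):
    """List (source, target) pairs whose types do not match."""
    conflicts = []
    for row in mapping:
        target_type = sdtm_vars[row["target"]][0]
        if row["source_type"] != target_type:
            conflicts.append((row["source"], row["target"]))
    return conflicts


def missing_required(mapping, sdtm_vars):
    """List required SDTM variables that no source variable maps to."""
    mapped = {row["target"] for row in mapping}
    return sorted(
        name for name, (_, core) in sdtm_vars.items()
        if core == "required" and name not in mapped
    )


print(type_conflicts(MAPPING, SDTM_VS))
print(missing_required(MAPPING, SDTM_VS))
```

In this example the character `RESULT` variable would be better mapped to a character target such as `VSORRES`, which is the kind of conflict the review step is meant to resolve.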
8) What do you know about the SDTM Implementation Guide? Have you used it,
and if so, which versions have you used so far?
The SDTM Implementation Guide provides documentation on the metadata (data
about data) for the domain datasets, including file names, variable names,
variable types and labels. I have used SDTM Implementation Guide versions 3.1.1
and 3.1.2.
11) What are the Common Issues in Mapping Dummy corporate standards
to CDISC (SDTM) Standards?
Ref: Mapping Corporate Data Standards to the CDISC Model (SAS Paper)
by David Parker, AstraZeneca, Manchester, United Kingdom
12) Can you explain ADaM or ADaM datasets?
The Analysis Data Model describes the general structure, metadata, and content
typically found in Analysis Datasets and accompanying documentation. The three
types of metadata associated with analysis datasets (analysis dataset metadata,
analysis variable metadata, and analysis results metadata) are described and
examples provided. (Source: CDISC Analysis Data Model, Version 2.0)
Analysis datasets (ADs) are typically developed from the collected clinical
trial data and used to create statistical summaries of efficacy and safety data.
These ADs are characterized by the creation of derived analysis variables and/or
records. These derived data may represent a statistical calculation of an
important outcome measure, such as change from baseline, or may represent the
last observation for a subject while under therapy. As such, these datasets are
one of the types of data sent to a regulatory agency such as the FDA.
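To make the two derivations mentioned above concrete, here is a small Python sketch of change from baseline and last post-baseline observation. The column names (`AVAL`, `BASE`, `CHG`, `ABLFL`, `AVISITN`) follow common ADaM naming but are an assumption here, since the text above does not name any variables. The data are invented.

```python
def derive_change_from_baseline(rows):
    """Add BASE and CHG to each record; CHG is None on the baseline record."""
    baseline = {r["USUBJID"]: r["AVAL"] for r in rows if r.get("ABLFL") == "Y"}
    derived = []
    for r in rows:
        base = baseline.get(r["USUBJID"])
        new_row = dict(r, BASE=base, CHG=None)
        if base is not None and r.get("ABLFL") != "Y":
            new_row["CHG"] = r["AVAL"] - base
        derived.append(new_row)
    return derived


def last_observation(rows):
    """Return the last post-baseline record per subject, by visit number."""
    last = {}
    for r in sorted(rows, key=lambda r: (r["USUBJID"], r["AVISITN"])):
        if r.get("ABLFL") != "Y":
            last[r["USUBJID"]] = r
    return last


rows = [
    {"USUBJID": "001", "AVISITN": 1, "AVAL": 120.0, "ABLFL": "Y"},
    {"USUBJID": "001", "AVISITN": 2, "AVAL": 115.0, "ABLFL": ""},
    {"USUBJID": "001", "AVISITN": 3, "AVAL": 110.0, "ABLFL": ""},
    {"USUBJID": "002", "AVISITN": 1, "AVAL": 130.0, "ABLFL": "Y"},
    {"USUBJID": "002", "AVISITN": 2, "AVAL": 134.0, "ABLFL": ""},
]

analysis = derive_change_from_baseline(rows)
final = last_observation(analysis)
for subject, row in final.items():
    print(subject, row["AVISITN"], row["CHG"])
```

A real analysis dataset would also need rules for missing baselines, unscheduled visits and the definition of "under therapy", which this sketch ignores.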
The CDISC Analysis Data Model (ADaM) defines a standard for analysis datasets
to be submitted to the regulatory agency. It makes clear the content, source,
and quality of the datasets submitted in support of the statistical analysis
performed by the sponsor.
In ADaM, the descriptions of the ADs build on the nomenclature of the SDTM with
the addition of attributes, variables and data structures needed for statistical
analyses. Achieving the principle of clear and unambiguous communication relies
on clear AD documentation. This documentation provides the link between the
general description of the analysis found in the protocol or statistical
analysis plan and the source data.
References:
1) http://support.sas.com/rnd/base/xmlengine/proccdisc/cdiscsdtm.html
2) http://www.fda.gov
3) pharmasug/2005/fdacompliance/fc01.pdf
4) http://www2.sas.com/proceedings/forum2008/207-2008.pdf
5) http://analytics.ncsu.edu/sesug/2006/PO08_06.PDF
6) http://www.lexjansen.com/phuse/2005/cd/cd11.pdf
7) http://www.pharmasug.org/2005/FC03.pdf
Apart from those, you may also need to prepare for these questions:
1) How many years of experience do you have working with CDISC standards?
(Tell me the usual process flow or procedure you have followed for
implementing CDISC standards.)
3) For how many studies have you done SDTM mapping so far?
4) Have you ever been asked to create specifications for SDTM mapping?
5) Do you have experience doing the mapping as per the sponsor's standards?
6) a) Tell me a few details about the databases you have worked with so far.
b) Which database do you think you had the most trouble with? (InForm, Rave,
Clintrial or Oracle Clinical)
a) annotated CRF
b) Specification Document
c) SDTM datasets
8) a) How do you verify that all the standards have been maintained as per the
SDTM Implementation Guide?
9) What will you do when you find a problem during the validation process?
10) What kind of macros have you developed that can be useful in creating
SDTM standard datasets?
11) Do you like to create a single program for each domain and then include it
in a batch program or
12) Do you have any experience talking to the client on a regular basis? If
yes, share your experience with me.
13) Do you have experience working with people in different time zones?
14) Do you have experience with or knowledge of WebSDM checks or OpenCDISC?
17) If you are working as a validator, how do you communicate with the main
programmer?
18) How many weeks do you think you need to finish creating the SDTM
datasets (just for programming)?
19) Is there any sample program you can write or show that will give us an
idea of your SAS programming skills?
20) What is the challenging part of the whole SDTM mapping process?
21) For which domain do you think you always need to be very careful, and why?
22) If I ask you to create an SDTM mapping specification document, what
documents or files do you need, and why?
23) Do you know anything about splitting domains? (Can you split domains
rather than creating one big domain?)
25) What do you know about controlled terminology, and for which domains do you
need controlled terminology?
27) Can you share with me any differences you know of between Implementation
Guide v3.1.1 and v3.1.2?
28) How do you determine the timeline if the client asks you to provide one for
the SDTM mapping conversion process?
29) Is there any way to apply attributes to the SDTM variables other than
manually typing all the details (length, label, format, informat, etc.) in an
ATTRIB statement?
30) You have been asked to create a domain (not included in the Implementation
Guide) for a CRF. What will you do, or how do you create one?
CDISC SDTM questions you might be asked in an interview
1) Have you ever used the --STAT variable? If yes, why, and in what kind of domain did you use it?
2) I see in your CV that you have experience developing SDTM domains based on IG v3.1.1, v3.1.2 and
v3.1.3. Can you share some of the differences between each version of the Implementation Guide?
(Differences between SDTM IG v3.1.1 vs. v3.1.2 and v3.1.2 vs. v3.1.3)
3) Can you give me an example of a variable that can be used to group some of the records?
4) Tell me about your experience using the --SPEC variable.
5) What is the significance of the --PRESP variable, and what do you know about the --OCCUR variable?
6) Can you give me an example of a Topic variable in:
a) Interventions domains
b) Events domains
c) Findings domains
7) What is your experience creating the Related Records (RELREC) domain? Can you give me a few
examples of the domains you have used to create a RELREC SDTM domain?
8) What is your experience creating the Findings About (FA) and Clinical Events (CE) domains?
What is the difference between the FA and CE domains?
9) Can you give me a few examples of the kind of data you would map to the FA and CE domains?
10) Why can't we include Clinical Events data in the AE domain?
11) What is your experience creating custom domains? How do you create a custom domain?
12) What do you do if you have a CRF page and the information collected on it is not related to any
specific SDTM domain?
13) When do you create a SUPPQUAL or a custom domain?
14) If you have any experience creating a custom domain, can you share what kind of data it was and
what prefix you used for the domain name?
15) Tell me about the most difficult thing you have had to do or manage when working as an SDTM
standards implementer.
16) Have you used the --OBJ variable? If so, in which domain, and what is its significance?
17) Tell me about Required, Expected and Permissible variables in SDTM domains.
18) Have you created any tumor domains? Can you give us a few examples of the tumor domains you have
created?