Guidance on Good Programming Practice

Steering Board for Good Programming Practice in Health and Life Sciences
Version 1.1 March 2014

Table of Contents
Introduction
Getting Started With a New Project
Language
Program header
Revision history
Comments
Naming conventions
Coding conventions
Log File Checking
Portability
Hard coding
Defensive programming
APPENDIX

Introduction
This document provides guidance on good programming practice (GPP) for the analysis,
reporting and manipulation of clinical data in health and life sciences organizations.
This guidance is primarily aimed at SAS programmers; however, the principles of GPP also
apply to other languages such as R and Stata. In addition, although this guidance was not
produced with SAS macros in mind, the same principles apply to macros too.

We often have to update existing programs to add new rules, copy programs from one study
to another, and take over programs written by others. The guidance aims to show how to
produce well-structured and well-documented programs so that they are easy to read and
maintain over time. It is meant to be applicable to all programs, and hence to all programmers
regardless of experience. Specific rules may be of more use to novice programmers, but
experienced programmers and mentors should also keep the principles in mind.

Getting Started With a New Project


When starting work on any new study, it is important to familiarize yourself with the study.
Review the study documents and try to understand the following:
• The objectives of the study.
• How many patients will be enrolled, randomized, and treated.
• The schedule of events, i.e. screening, run-in, treatment periods, washouts, how many
  treatments there are and when they are taken.
• What the primary endpoint is and how, when and where its data is collected.
• The timelines for the trial: when the database lock is, when the top-line results should
  be ready, and when all the reporting should be finalized.
• The current status of the project.

Study documents include:


• Clinical Study Protocol (CSP) - study outline and statistical sections are usually of
  relevance.
• Case Report Forms (CRF) / annotated CRF (annotated with the dataset name and
  variable name) - to understand where the data comes from, how it was collected
  and where it is stored.
• Statistical Analysis Plan (SAP) - to see what data is reported and how.
• Analysis Datasets (ADS) specifications - describe which derived datasets should
  be created and what will be stored within them, including detailed definitions of
  endpoints. Used for ADS programming and validation.
• Table shells - used for tables, listings and graphs (TLG) programming or
  validation.
• Publications, if available (to check against already available results).
• Previous Clinical Study Reports (CSR), if available (to check against already
  available results).

Before you start programming, it is important to do the following:
• Familiarize yourself with the system you are working on.
• Check for company-specific programming standards.
• Check for study- and project-specific standards.
• Check for industry standards, such as those of the Clinical Data Interchange Standards
  Consortium (CDISC), which are to be applied or can be applied.
• Check if a similar project/study has been worked on, i.e. check if available SAS code
  can be reused.
• Check for project-independent macros that can be applied.

Organization specific guidance

Reference organization specific guidance here.

Language

The language used in programming code and within headers and comments is English.

Organization specific guidance

Reference organization specific guidance here.

Program header

A standard header should be used for every program. The purpose of the header is to identify
the program and provide documentation including revision history. It provides the necessary
information for a code reviewer to identify and understand the program and its development
life cycle. The elements included in a header will vary from organization to organization but
below is a discussion of some of the most common elements.

Required elements
The following should be included in all program headers:
• Identification of the project of which the program is a part.
• Program name.
• Author identification, which should be human readable and unique.
• Short description of program purpose.
• List of macros used in the program.
• Date the program was first put into production, was finalized, or passed first
  validation.
  o This date will be chosen based on the operational procedures used within the
    company/organization creating the program. The date should indicate the
    first date when the program was released for final use.
• Revision history (see discussion below).

Recommended elements
The following are not required but are highly recommended in all program headers:
• All outputs generated by the program, including both file creation and modification.
• External files used, such as datasets or databases that are used as data inputs to the
  program or to the macros used.
• Platform and operating system on which the program was developed to run.
• Software/programming language and version in which the program was written.

Organization specific guidance

Reference organization specific guidance here.

Revision history

The revision history section is critical to document the revisions made to the program once it
is put into production. A well-designed revision history section should include the author of
the change, the date of release of the change, and a short description of the change. Revision
history may also include a version number for changes, which can be used as a reference in the code.
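
As an illustration only, the header and revision history elements described above might be laid out as follows. The sketch is written in Python rather than SAS, and the project identifier, author, dates, version numbers and the `count_by_arm` function are all hypothetical; the actual layout should follow the organization's own template.

```python
# ---------------------------------------------------------------------------
# Project        : STUDY-XYZ (hypothetical project identifier)
# Program name   : t_demog.py
# Author         : Jane Doe (jdoe)
# Purpose        : Count subjects by treatment arm for the demography table.
# Macros/modules : collections.Counter (standard library)
# Released       : 2014-03-01 (first date released for final use)
# Inputs         : list of subject records, each with an "arm" key
# Outputs        : dictionary of treatment arm -> number of subjects
# Platform       : developed on Linux
# Language       : Python 3
#
# Revision history
# Version  Date        Author  Description
# 1.0      2014-03-01  jdoe    Initial release.
# 1.1      2014-04-15  jdoe    Count subjects with a missing arm as "UNKNOWN"
#                              so that totals match the enrolled population.
# ---------------------------------------------------------------------------
from collections import Counter


def count_by_arm(records):
    """Return the number of subjects in each treatment arm."""
    # v1.1: keep subjects without an arm so the total still matches enrollment.
    return dict(Counter(rec.get("arm") or "UNKNOWN" for rec in records))
```

The comment inside the function refers back to version 1.1 in the revision history, which is one way a version number can be used as a reference in the code.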

Organization specific guidance

Reference organization specific guidance here.

Comments

Comments are important to help anyone reviewing, modifying or using a program to
quickly understand the code. All major data or proc steps should be commented,
especially data specific and complex code. Ideally comments should be comprehensive, and
should describe the rationale and not simply the action. For example, instead of simply
typing "Access demography data", describe which data elements you are accessing and why
they are needed, for example, "Bringing in DM to get gender and age and subset to include
only the intent to treat population". Comments can also include links to external
documentation (requirement specifications, design documents). The programs should also be
split up into sections using a distinct type of comment, e.g. rows of asterisks.
This helps to structure the program and makes it easier for others to get an overview of the
program.

Organization specific guidance

Reference organization specific guidance here.

Naming conventions

All organizations should have standard naming conventions. Program naming conventions
will make it possible to identify groups of related programs such as adverse events tables.
Dataset and variable names should describe their content as well as possible, although
datasets following CDISC standards will have pre-defined names. Space characters should be
avoided in variable, dataset and output file names.

Organization specific guidance

Reference organization specific guidance here.

Coding conventions

In order to be efficient and streamline the sharing of program code between programmers,
with regulatory agencies, and with external partners or vendors, it is vital for code structure to
follow standard conventions. SAS code which follows these conventions is much easier to
read, modify, maintain, and correct. These conventions are divided into those which should
be considered as required, and those which are merely recommendations to be followed as
applicable.

Required conventions
• Do not overwrite existing datasets; use different, meaningful names for each
  temporary dataset.
• Each organization may have its own standards for using case within programming
  code, but use of all uppercase should be avoided.
• Separate data steps and procedures with at least one blank line.
• Use the data=dataset option in procedure statements so that the dataset being used is
  explicitly stated, which ensures that the statement will still work if it is moved to
  another location.
• End data steps and procedures with run or quit to provide a boundary and allow for
  independent execution.
• Split data steps into logical parts.
• Put each statement on a separate line.
• Left justify global statements, data and procedure statements, and their
  corresponding run and quit statements.
• Indent statements belonging to a level by 2 to 5 columns (use the same number of
  spaces throughout the program), i.e. every nesting level should be visibly indented
  from the previous level.
• Do not use tabs for indentation because they will display differently depending on the
  platform and text editor being used; use blanks instead.
• For do loops, place the end statement in the same position as the do statement so that
  they can be easily matched.
• Insert parentheses in meaningful places in order to clarify the sequence in which
  mathematical or logical operations are performed.
• When converting character variables to numeric or vice versa, use the put and input
  functions to explicitly convert the variable to ensure that it is done in the way
  intended and to avoid errors, warnings, and notes in the program log.
• Structure your program to read in all external data at the top, do the processing, then
  produce any outputs or permanent analysis datasets.

Recommended conventions
• Perform only one task per module or macro.
• Use logical groupings to separate code into blocks.
• Double space between sections.
• Group similar statements together.
• Define new variables with the attrib statement in order to ensure that the variable
  properties such as length, format, and label are correct instead of allowing them to be
  implicitly determined by the circumstances in which they are initialized in the code.

Organization specific guidance

Reference organization specific guidance here.

Log File Checking

As part of development and validation practices, it is often mandated that the log file
generated is checked to ensure that the program has executed as intended. Many
companies may have their own automatic log file checking utilities to aid in this, and there
are many examples of such tools in widely available papers. ERROR and WARNING messages in
logs should normally be avoided. There are sometimes exceptions to this, such as warnings
that are output from statistical models that do not have enough data. Ordinarily, any
warnings that are deemed acceptable are documented. There are also some specific
NOTEs that can indicate a problem. The common NOTEs that should normally be
avoided include those relating to "repeats", "more than one", "uninitialized" and
"referenced".

Also, any user defined checks that have been added, such as from defensive programming,
should be checked for in the log and followed up on. A company-specific naming convention
for user defined checks can aid in this, so the specific string can be searched for within the
log. Examples of such conventions include ISSUE:, USER:, and ALERT:. Avoid the
use of user-generated errors and warnings labeled "NOTE:", "WARNING:" or "ERROR:", as
these may make it difficult to find genuine problems when searching the log.
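
The checks described above can be sketched as a small log-scanning utility. This is a minimal illustration in Python, not a validated tool: it assumes that each log message starts at the beginning of a line, that the NOTE keywords are the four terms listed above, and that the company convention for user-defined checks uses the example prefixes ISSUE:, USER: and ALERT:. The function names are hypothetical.

```python
# Keywords below come from the guidance text; the prefixes for user-defined
# checks are only examples of a company-specific convention.
NOTE_KEYWORDS = ("repeats", "more than one", "uninitialized", "referenced")
USER_PREFIXES = ("ISSUE:", "USER:", "ALERT:")


def check_log(lines):
    """Return (line number, category, text) for every log line to follow up."""
    findings = []
    for number, line in enumerate(lines, start=1):
        text = line.strip()
        if text.startswith("ERROR"):
            findings.append((number, "ERROR", text))
        elif text.startswith("WARNING"):
            findings.append((number, "WARNING", text))
        elif text.startswith("NOTE") and any(
            keyword in text.lower() for keyword in NOTE_KEYWORDS
        ):
            findings.append((number, "NOTE", text))
        elif text.startswith(USER_PREFIXES):
            findings.append((number, "USER CHECK", text))
    return findings


def check_log_file(path):
    """Run check_log on a log file stored on disk."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return check_log(handle)
```

Messages that wrap over several log lines are only matched on their first line here, so a keyword that falls on a continuation line would be missed. Warnings that have been documented as acceptable would still need to be reviewed by hand or filtered against an approved list.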

Organization specific guidance

Reference organization specific guidance here.


Portability

Most organizations are now working across multiple platforms, commonly combining
Windows and Unix environments. There can be many occasions where code will work on
one platform and not on another. Portability is more than just working across multiplatform
environments; it is also about making programs easier to be used across projects. Below are
some suggestions to address some of the most common impediments to portability.
• Use rounding in newly created variables (if applicable) in order to avoid different
  results, e.g. between 64-bit and 32-bit operating systems. (However, give careful
  consideration to doing this, and round at the limit of precision, as otherwise it may
  affect results. Where rounding is only required for presenting results, do so after
  calculations and derivations are completed.)
• Avoid explicitly defining file paths in libname, filename, and %include statements
  requiring platform-specific syntax such as forward slash or backslash.
• Avoid the use of X commands to execute statements directly on the operating system.
• Avoid explicit project- or data-specific code by using macro variables where possible. An
  example of this is using macro variables to describe dosing groups in table headers
  instead of typing them out in the report section.
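
The same ideas carry over to other languages. The sketch below uses Python rather than SAS and is only an illustration: the `PROJECT_ROOT` environment variable, the dosing group labels, the function names and the choice of 10 decimal places as the "limit of precision" are all assumptions made for the example.

```python
import os
from pathlib import Path

# Hypothetical environment variable holding the project root on each platform.
ROOT_VARIABLE = "PROJECT_ROOT"

# Dosing groups defined once, instead of being typed out in every report.
DOSE_GROUPS = {"A": "Placebo", "B": "Drug X 10 mg", "C": "Drug X 20 mg"}


def project_path(*parts):
    """Build a path below the project root without typing any separators."""
    root = Path(os.environ.get(ROOT_VARIABLE, "."))
    return root.joinpath(*parts)


def column_header(group, n_subjects):
    """Return a table column header for one dosing group."""
    return f"{DOSE_GROUPS[group]} (N={n_subjects})"


def round_derived(value, decimals=10):
    """Round a derived value near the limit of precision, not for display."""
    return round(value, decimals)
```

For example, `project_path("data", "adsl.csv")` yields a path with the separator appropriate to the platform it runs on, and changing a dosing group label in `DOSE_GROUPS` updates every header built with `column_header`.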

Organization specific guidance

Reference organization specific guidance here.

Hard coding

Hardcoding is the modification of the value of an item of source data within program code.
Hardcoding should be avoided whenever possible in final code, and changes to source data
should be made in data entry or capture systems, which give better compliance with
regulations such as FDA 21 CFR Part 11. Hardcoding may be done temporarily in order to get a
program to run despite dirty data or to correct for database inconsistencies. Permanent
hardcoding to fix incorrect data values in a final database is strongly discouraged, but if it is
unavoidable then it must be approved following a standard process and clearly documented
using standard comments and PUT statements to the log to show what has been hardcoded.
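
One possible way to keep an unavoidable hardcode visible is sketched below in Python (in SAS the message would be written with a PUT statement). The record layout, the `HARDCODE:` log prefix, the approval reference and the function name are hypothetical and would be replaced by the organization's own conventions.

```python
def apply_hardcodes(records, hardcodes, log=print):
    """Apply approved, documented hardcodes and report each one to the log."""
    corrected = []
    for record in records:
        record = dict(record)  # work on a copy; never overwrite the source data
        for fix in hardcodes:
            variable = fix["variable"]
            # Only apply the fix while the source still holds the wrong value,
            # so the hardcode stops acting once the database is corrected.
            if record.get("subject") == fix["subject"] and record.get(variable) == fix["old"]:
                log(
                    f"HARDCODE: subject={fix['subject']} {variable}: "
                    f"{fix['old']!r} -> {fix['new']!r} (approval {fix['approval']})"
                )
                record[variable] = fix["new"]
        corrected.append(record)
    return corrected
```

Keeping the hardcodes in one list, each with its approval reference, makes them easy to review and to remove once the source data has been corrected.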

Organization specific guidance

Reference organization specific guidance here.

Defensive programming

Defensive programming is an approach to programming intended to anticipate future changes
in the data that might influence the coding algorithms. Ideally programs should be written in
such a way that they will continue to work correctly in case of new or unexpected data values
which did not exist at the time the code was developed. Analysis dataset and table programs
are often developed in the early stages of a project or even when the only available data is test
data. In these situations the data often does not contain all possible values of data points such
as visits or time points, race values, and questionnaire responses, but the program must be
able to handle those values when they do become present in the data at a later point.
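
A minimal sketch of this idea, in Python rather than SAS, is shown below. The visit names, the visit numbers, the `ISSUE:` prefix and the function name are assumptions for the example; the point is only that a value not foreseen during development is reported to the log instead of being silently dropped or mis-mapped.

```python
# Visits known when the program was developed (hypothetical values).
EXPECTED_VISITS = {"SCREENING": 1, "BASELINE": 2, "WEEK 4": 3}


def visit_number(visit, log=print):
    """Map a visit name to its number, flagging values not seen before."""
    key = visit.strip().upper() if isinstance(visit, str) else visit
    if key in EXPECTED_VISITS:
        return EXPECTED_VISITS[key]
    # Unexpected value: report it with a searchable prefix and return missing
    # rather than guessing, so the log check picks it up.
    log(f"ISSUE: unexpected visit value {visit!r}; visit number set to missing")
    return None
```

When later data contains a visit such as "WEEK 8", the program still runs, and the message in the log prompts a follow-up to extend `EXPECTED_VISITS`.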

Organization specific guidance

Reference organization specific guidance here.

APPENDIX

Appendices can be added to the document to include organization specific guidance as well
as any templates or examples.
