Application of Bioinformatics


Informatics

Application of bioinformatics in support of precision medicine
Bioinformatics has played a major role in gene sequencing diagnostics and has
been an essential tool to investigate the genetic causes of disease. With the
support of new technologies and tools, bioinformatics can play an important
part in the support and continued development of precision medicine.

By Mike Furness and John Wise

The European Union in vitro diagnostic regulation (EU IVDR) came into force in May 2017 and will come into effect in May 2022. Much work needs to be done to realign companion diagnostics (CDx) R&D processes from the requirements of the old Directive 98/79/EC to the rigours of the new Regulation (EU) 2017/746 and its enhanced demands for CE Marking* (terms marked with an * are described in Table 1). As such, in April 2018 the Pistoia Alliance, a global, not-for-profit alliance of life science companies, technology suppliers, publishers and academic groups that work together to lower barriers to innovation in life science R&D and healthcare, formed a community of interest (CoI) on Companion Diagnostics, Next Generation Sequencing and Regulation (CDx/NGS and Regulation)¹ to consider the many challenges facing the diagnostics industry and to contribute to knowledge sharing within the community. The CoI identified three main areas where knowledge sharing needed to be enhanced, viz:

• Applying NGS technologies in precision medicine.
• Application of bioinformatics in support of precision medicine.
• Aligning research standards with clinical standards for precision medicine.

This article is based on the presentations and discussion of a symposium that was held in March 2019 on the theme of ‘Application of Bioinformatics in support of Precision Medicine’.

An introduction to the biology underpinning clinical bioinformatics

In the last few years there has been a rapid development in NGS technology and a substantial increase in the capacity to generate genomic sequence data, along with a significant decrease in costs. The cost of sequencing the first whole human genome*, completed in 2001, was estimated at $2.7 billion². Veritas Genetics³ is now citing the cost of whole genome sequencing as $599, and the same company is reported to be predicting that the $99 genome will become available in the next three to five years. At that price, whole genome sequencing (WGS) would become an affordable standard component in patient care.

This increase in productivity and cost-effectiveness has led to a growth in genomic sequencing projects to levels where there are now dozens of research and clinical genome projects running across the world⁴. These projects are now including more clinical data with the genomics data and these data sets are being analysed by research bioinformatics teams in pharmaceutical companies

56 Drug Discovery World Summer 2019



Figure 1: Looking at a very crude model of the drug discovery and development process, we can map on where omic technologies are currently having an impact. The creation of registries and biobanks provides access to large, more standardised data sets on participants and helps identify relevant participants for both research and clinical trials. With the growth of clinical trials collecting omic data, we now also have the possibility of stratifying patients with biomarkers (and ultimately CDx) to define those most likely to respond to particular therapeutic agents.

and also embedded within health services. This research is focused on a better understanding of disease mechanisms, the identification of biomarkers* of particular diseases or conditions and a characterisation of the patients’ drug responses (both good and bad). Such work, and the increasing affordability of genome sequencing, provides an opportunity to develop these biomarkers into CDx to enable drugs to be targeted to the specific patient populations most likely to respond positively to treatment with a specific therapeutic agent (Figure 1).

Figure 2 is a schematic that shows how retrospective analysis of clinical trial results might stratify the participants into different cohorts, allowing insight into the genomic profile of those patients who might well respond to a drug therapy and those patients who are unlikely to respond.

It should be noted that if such biomarkers had been identified in the comparatively unregulated research environment, then if these biomarkers were to be used in a clinically applicable CDx, the research work that had been carried out to identify the biomarkers could well need to be repeated and validated under clinical regulatory conditions, in accordance with an appropriate quality system, in order to be eligible for registration as a CDx.

An objective of the Pistoia Alliance ‘CDx/NGS and Regulation’ CoI was to bring together protagonists in the research, clinical and regulatory domains relevant to CDx to consider whether standards could be identified and agreed such that if the research analyses were aligned to them, then the data first used to identify the biomarker could form the basis of the regulatory filing of the CDx. Such an approach would have the benefit of minimising the duplication of effort, saving time and cost in development, and as such getting more effective therapeutics to the marketplace faster, benefitting patients, healthcare providers and the companies providing the therapeutics and CDx.

The inaugural workshop in April 2018 brought together representatives from pharma and biotech, technology companies, clinical scientists and regulators to identify the key issues that would need to be addressed. One of the key learnings was that the CDx business embraced a broad range of disciplines such as genomics, NGS, bioinformatics, clinical and regulatory affairs, and the workshop delegates were in large part familiar with some of those disciplines, but by no means all. As such, a first step was to help address these needs and the Pistoia Alliance CoI organised a symposium on the ‘Application of NGS Technologies in Precision Medicine’ in September 2018, and this symposium on the ‘Application of Bioinformatics in support of


Precision Medicine’ in March 2019. Both were targeted at the interested but non-specialist audience.

Use case: applying bioinformatics in drug discovery and development – Cystic Fibrosis

In the EU, Cystic Fibrosis (CF) affects one in 2,000-3,000 newborns and in the USA one in 3,500. In Asia, existing evidence indicates that CF is rare⁵. CF is a multisystem disease caused by one of several different mutations in the cystic fibrosis transmembrane conductance regulator (CFTR) gene located on chromosome 7⁶. The CFTR gene provides instructions for making a protein which functions as a channel across the membrane of cells that produce mucus, sweat, saliva, tears and digestive enzymes⁷.

Genetic markers can be used to understand the disease stratification in terms of symptoms and severity across populations, as well as to enable drugs to be targeted more effectively.

More than 1,700⁸ genetic variants have been identified in the CFTR gene for patients with Cystic Fibrosis. Only five of these mutations have a frequency greater than 1%. The deletion of phenylalanine in position 508 of the CFTR protein (F508del-CFTR) is the most common mutation⁹, found in ~90% of CF patients.

While six common classes of the disease have been identified¹⁰, based on molecular deficit, there is a move from just genotyping patients towards ‘theratyping’ (matching therapies or medications to specific types of mutations) based on lack of protein (correctors) or lack of function of the protein (potentiators). Drugs such as Ivacaftor initially treated specific groups of patients (primarily those with the G551D mutation), who account for 4-5% of cases of cystic fibrosis. Notably, Ivacaftor was the first medication approved for the management of the underlying causes of CF (abnormalities in CFTR protein function) rather than control of the symptoms of CF. Dual therapies, including a combination of Ivacaftor with Lumacaftor, have since increased the number of patients who can benefit from drug therapy. Furthermore, work is now also under way to use a triple combination of tezacaftor, ivacaftor and an experimental drug, VX-445, which has the potential to treat twice as many CF patients.

This use case shows the importance of being able to stratify patients within the overall population of those suffering from CF. Understanding the details of the genetic abnormalities provides opportunities for drug therapy to address the physiological cause of the disease and not just the symptomatic relief that has been the standard-of-care until recently. Furthermore, such detailed genetic understanding of the abnormalities in the gene will pave the way for gene therapies to be developed.

Some principal concepts of bioinformatics

Bioinformatics can be used in clinical diagnostics. Bioinformatics tools can be used to detect the presence of genetic variants that act as markers for a condition or a disease. However, when deployed in Europe these tools must follow the stipulations of the EU IVDR and be CE marked to demonstrate that they are fit for purpose.

The EU IVDR includes a specific exemption for diagnostics that are used in the same health institution as they are made or modified, but with some specific requirements as set out in Article 5, paragraph 5. Health institutions wishing to apply the exemption in the new Regulation will need to ensure that products meet the relevant General Safety and Performance Requirements. In addition, health institutions will need to have:

• An appropriate quality system in place, eg ISO 15189.
• A justification for applying the exemption, including that the target patient group’s specific needs cannot be met, or cannot be met at the appropriate level of performance, by an equivalent device available on the market.
• Appropriate technical documentation in place.

Several key issues are present in all bioinformatics and probably one of the most important is how the tools are deployed. This includes managing dependencies, eg other data associated with the analysis, software versions and version control, and operating system compatibility. Source code for the tools is generally stored in repositories (eg GitHub, BitBucket) and containers (eg Docker) can be used to wrap up all the source code and its dependencies into a standardised format, ready to run.

In clinical diagnostics, bioinformatics software, including the sequencer’s own software, should be validated, ie shown to be robust and repeatable. So, it must be demonstrated that, given identical input (reads from the sequencer), the analysis pipeline will always produce identical output (markers identified). However, this will not be the case when stochastic analysis techniques are deployed, or AI/ML is used. As such, what needs to


Figure 1

be demonstrated and documented is that the error rate is within acceptable limits.

Analyses and results – sharing and reproducibility – repositories, containers and workflows

Workflows can be deployed to automate analyses to enable them to run faster and more reproducibly and to scale. For example, it had been calculated that running a virtual drug docking simulation on a laptop computer would theoretically take 8.5 years (not useful), but that simulation could be run in the cloud with a workflow using 40,000 CPUs in just four hours. Workflow manager software comes in a variety of packages. A comprehensive list is available on GitHub¹¹.

The current emphasis on the deployment of the FAIR data principles (Findable, Accessible, Interoperable, Reusable) in bioinformatics was noted, as indeed was the Pistoia Alliance project¹² and its multi-author paper on the ‘Implementation of FAIR Data Principles for Pharma and Life Sciences’¹³. The use of workflow managers helps to address the reproducibility of these analyses, and sharing the code through repositories such as GitHub or ContainerHub allows other users to run exactly the code that was used to generate the initial results.

Provenance was a crucial issue to be addressed: who ran it, when, where did they run it, which workflow manager was used, etc. The Common Workflow Language (CWL) makes an important contribution to this challenge and the paper published in 2018 entitled ‘Sharing interoperable workflow provenance: A review of best practices and their practical application in CWLProv’¹⁴ focused specifically on provenance in this environment.

Scaling bioinformatics – high performance computing and the cloud

The ability to scale bioinformatics solutions is important. A good example can be demonstrated by Genomics England. In the 100,000 Genomes Project, DNA is sequenced by Illumina. As analyses scale, so the underlying platforms need to change, eg individual applications might require SaaS (Software as a Service) such as GATK* and Dragen*, through genomics platforms (PaaS [Platform as a Service]) such as BaseSpace Sequence Hub*, SevenBridges* and DNAnexus*, and up to infrastructure platforms (IaaS [Infrastructure as a Service]) such as AWS*, Google Cloud* and Microsoft Azure*.

Key business considerations for investing in bioinformatics include:


• Scalability.
• On-premise hardware investment (capital expenditure) versus cloud-based implementation (operational expenditure).
• Workforce – bioinformaticians, software engineers, DevOps teams.
• Compute and storage costs.
• Direct instrument integration.
• Security and compliance requirements, especially in a clinical environment.
• Accuracy and reproducibility.
• Turnaround times.
• Out-of-the-box, plug-and-play solutions versus custom-built.

Integrating the data silos – analysing separate data sets together

Sharing healthcare genomics data across multinational sites creates a range of challenges. Interesting work to address some of these is being carried out by Elixir and the Global Alliance for Genomics and Health (GA4GH). It has been estimated that in 2012, roughly 1% of all genome sequencing was funded by healthcare; by 2022 that is expected to increase to 80%. As clinical data carries additional security and compliance requirements over research data, most clinical data sets need to remain in defined geographical locations. However, to benefit from these large volumes of data distributed globally, especially when looking at rare diseases, there needs to be a means to protect patient confidentiality while allowing researchers to search these data sets to identify where specific patient populations can be found. Federating databases is the best practice identified to date, but it requires common data models and tools to allow interoperability across sites.

This becomes even more pressing with initiatives being announced such as the MEGA (Million European Genomes Available) Initiative¹⁵. Elixir is working closely with other national organisations and global initiatives, such as GA4GH, to address some of these issues with projects such as the Beacons initiative, which allows users to identify the locations of patients with specific genetic characteristics across multiple sites globally, and is creating a Tools Platform¹⁶ to provide easy access to bioinformatics tools.

NGS bioinformatics: challenges and solutions

For more diagnostics to be developed to identify more diseases, more biomarkers need to be discovered. The ClaraT assay from Almac can analyse gene expression data from microarrays or RNAseq data for biomarker discovery. The data generated can also be integrated with clinical outcomes data to perform patient survival analyses. Currently ClaraT covers three of the 10 biologies identified to have a role in cancer, with plans to ultimately cover all 10 areas. However, this assay is designated as RUO (Research Use Only) and as such cannot be used for diagnostic or prognostic purposes, including predicting responsiveness to a particular therapy.

Machine Learning (ML) has been deployed on sequencing data derived from FFPE (Formalin Fixed, Paraffin Embedded) tissues to automate removal of artefacts introduced by the formalin-fixing process. This approach has reduced the incidence of artefacts from 42% down to just over 1% in some cases. One of the key advantages in doing this is to allow archival samples to be screened more accurately, allowing the possibility of increasing the numbers of available clinical trial participants.

Emerging technologies – Blockchain and AI/ML

Looking forward, Blockchain and Artificial Intelligence/Machine Learning (AI/ML) could play an increasingly important role. Blockchain utilises distributed ledger technology and provides an irrevocable audit trail for all data handling without the requirement of a trusted party. Such an approach could contribute strongly to making sequence data available to researchers in a controlled manner, the partnership between Shivom and Lifebit providing one example of such capability¹⁷. Furthermore, the application of blockchain technology could support the regulatory compliance of diagnostic analyses. It was noted that the Pistoia Alliance had a blockchain project, ‘Blockchain supporting Life Science & Health’¹⁸, to explore the capabilities of this exciting technology in life sciences.

It was anticipated that if AI were optimally exploited it could make a strong contribution to biopharma and healthcare. Some examples were put forward, ie AI could:

• Predict patient drug response.
• Support patient stratification to optimise clinical trials or to personalise treatments.
• Predict disease progression.
• Diagnose disease.
• Discover biomarkers (thereby improving diagnostics).
• Optimise drug design in silico to increase efficacy and decrease toxicity.
• Support multi-omics data analysis optimisation.
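
The audit-trail property attributed to blockchain under ‘Emerging technologies’ can be illustrated with a very small hash chain. This is only a sketch under stated assumptions: it runs in a single process, so it shows tamper evidence but not the distributed, trust-free consensus of a real ledger, and the function names, field names and sample identifiers are all hypothetical rather than taken from any of the projects mentioned.

```python
import hashlib
import json


def entry_hash(body: dict) -> str:
    """Return the SHA-256 hex digest of an entry's canonical JSON form."""
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def append_event(log: list, actor: str, action: str, dataset: str) -> None:
    """Append an event that embeds the hash of the previous entry."""
    previous = log[-1]["hash"] if log else "0" * 64
    body = {"actor": actor, "action": action, "dataset": dataset, "prev": previous}
    log.append({**body, "hash": entry_hash(body)})


def verify(log: list) -> bool:
    """Recompute every hash and check each link to the preceding entry."""
    previous = "0" * 64
    for entry in log:
        body = {k: v for k, v in entry.items() if k != "hash"}
        if body["prev"] != previous or entry_hash(body) != entry["hash"]:
            return False
        previous = entry["hash"]
    return True


# Hypothetical data-handling events on one sequence data set.
audit_log = []
append_event(audit_log, "lab-a", "upload", "sample-001")
append_event(audit_log, "researcher-b", "query", "sample-001")
print(verify(audit_log))   # True

# Editing an earlier record breaks the chain and is detected.
audit_log[0]["actor"] = "someone-else"
print(verify(audit_log))   # False
```

Because each entry stores the hash of its predecessor, changing any earlier record invalidates every later link; a distributed ledger adds replication and consensus on top of this basic idea.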


Table 1

TOOL | DESCRIPTION | URL
AWS | A leading provider of cloud-based Infrastructure-as-a-Service | https://aws.amazon.com/
AWS Batch | AWS Batch enables developers, scientists, and engineers to easily and efficiently run hundreds of thousands of batch computing jobs on AWS | https://aws.amazon.com/batch/
AWS Lambda | AWS Lambda lets you run code without provisioning or managing servers. You pay only for the compute time you consume – there is no charge when your code is not running | https://aws.amazon.com/lambda/
BaseSpace Sequence Hub | Data management and analysis that is simple enough for labs getting started, or powerful enough for rapidly scaling up next-generation sequencing (NGS) operations | https://www.illumina.com/content/dam/illumina-marketing/documents/products/datasheets/datasheet_basespace.pdf
Biomarker | A biomarker is a characteristic that is objectively measured and evaluated as an indicator of normal biologic processes, pathogenic processes, or pharmacologic responses to a therapeutic intervention | https://link.springer.com/referenceworkentry/10.1007%2F978-1-4419-9863-7_211
CloudKnot | A Python Library to Run your Existing Code on AWS Batch | http://conference.scipy.org/proceedings/scipy2018/pdfs/adam_richie-halford.pdf
DNAnexus | A secure, trusted cloud platform and global network for scientific collaboration and accelerated discovery | https://www.dnanexus.com/
Dragen | Dynamic Read Analysis for GENomics. A Bio-IT Platform providing accurate, ultra-rapid secondary analysis of sequencing data | https://emea.illumina.com/products/by-type/informatics-products/dragen-bio-it-platform.html
Galaxy | An open source, web-based platform for data intensive biomedical research | https://usegalaxy.org/
GATK | A genomic analysis toolkit focused on variant discovery | https://software.broadinstitute.org/gatk/
Google Cloud | A leading provider of cloud-based Infrastructure-as-a-Service | https://cloud.google.com/
Lifebit | Automates multi-omics & big data HPC/Cloud deployment. Leverages AI for breakthrough insights generation | https://lifebit.ai/
Microsoft Azure | A leading provider of cloud-based Infrastructure-as-a-Service | https://azure.microsoft.com/en-gb/
Nextflow | Nextflow enables scalable and reproducible scientific workflows using software containers | https://www.nextflow.io/
pywren | Pywren runs existing python code at massive scale via AWS Lambda | http://pywren.io/
SevenBridges | A biomedical data company, specialising in software and data analytics to drive public and private healthcare research | https://www.sevenbridges.com/
CE Marking | CE marking proves that your product has been assessed and meets EU safety, health and environmental protection requirements | https://europa.eu/youreurope/business/product/ce-mark/index_en.htm
Genome | A genome is an organism’s complete set of DNA, including all of its genes | https://ghr.nlm.nih.gov/primer/hgp/genome
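
The validation requirement discussed under ‘Some principal concepts of bioinformatics’ (identical input must give identical output, or, for stochastic and AI/ML methods, a documented error rate within acceptable limits) can be sketched as two simple checks. Everything here is an illustrative assumption: `toy_pipeline` is a hypothetical stand-in for a real analysis pipeline, the read labels are invented, and the 1% limit is an arbitrary example rather than a regulatory threshold.

```python
import hashlib


def output_digest(markers):
    """Hash the pipeline output so that separate runs can be compared exactly."""
    joined = "\n".join(sorted(markers))
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


def is_repeatable(pipeline, reads, runs=3):
    """True if every run on identical input yields an identical output digest."""
    digests = {output_digest(pipeline(reads)) for _ in range(runs)}
    return len(digests) == 1


def error_rate(called, truth):
    """Fraction of markers on which a call set and a reference set disagree."""
    called, truth = set(called), set(truth)
    union = called | truth
    return len(called ^ truth) / len(union) if union else 0.0


def toy_pipeline(reads):
    """Hypothetical deterministic 'pipeline': reports reads tagged as variants."""
    return [r for r in reads if r.startswith("VAR:")]


# Invented input and reference ("truth") set, for illustration only.
reads = ["VAR:F508del", "REF:exon3", "VAR:G551D", "REF:exon7"]
truth = ["VAR:F508del", "VAR:G551D"]

# Deterministic tools: repeated runs must match exactly.
print(is_repeatable(toy_pipeline, reads))               # True

# Stochastic or AI/ML tools: compare against a reference and an agreed limit.
print(error_rate(toy_pipeline(reads), truth) <= 0.01)   # True
```

In practice the reference set would come from well-characterised control samples and the acceptable limit from the quality system under which the diagnostic is validated.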


While AI/ML was subject to much hype, there is a broad range of areas where this technology might usefully be applied.

What’s on the horizon that will impact bioinformatics?

The future will:

• Be driven by companies that either are not well known or do not yet exist.
• Be secure, for cloud-based systems offer advanced security features and alerts (eg UK-OFFICIAL security classification is supported by AWS¹⁹).
• Require more experiments executed more quickly.
• Demand ease-of-use, eg design thinking and user experience engineering will be increasingly important²⁰.
• Require more cloud-accessible software components, datasets, tools and techniques to build sophisticated applications.
• Require cloud-based scalability of applications.
• Be about data sets, tools and techniques, eg AWS is supporting public-hosted data sets and underpins many tools (eg Lifebit*) and techniques (eg pywren*, CloudKnot* and Nextflow*).

Conclusion

CDx are the sine qua non of precision medicine and need to conform to the rigorous quality requirements imposed by the EU IVDR, whether by obtaining CE marking or by exercising the health institution exemption. The capabilities of gene sequencing and its bioinformatics analysis were increasing rapidly, while the associated time and costs were decreasing. When bioinformatics was involved in diagnostics, the bioinformatics systems needed to be validated in accordance with a recognised quality system to demonstrate that their results were robust and repeatable. Bioinformatics was an essential tool to investigate the genetic causes of disease. Data standards and federated approaches to healthcare genetic data needed to be developed and deployed to allow research access to data that was geographically distributed. The cloud was playing an increasingly important role in bioinformatics analyses by enabling the scalability of systems needing to keep up with increased workload. The cloud also made available a wide variety of tools, including AI/ML-based tools, to increase the capability of bioinformatics analyses. Finally, blockchain technology could contribute strongly to the management, availability and analysis of genomics data, allowing the individual to own their data and to make it available for research as and when they choose. DDW

Mike Furness was Founder of TheFirstNuomics and currently works at Qiagen in the Bioinformatics Customer Services Team. He has spent more than 30 years working in genomics and bioinformatics, developing and applying new technologies to understanding disease and drug R&D. He has previously worked for Life Technologies, Cancer Research UK, Pfizer, Incyte Genomics, DNAnexus, Congenica and Lifebit, as well as consulting widely for pharmaceutical and technology companies and investors and the Pistoia Alliance.

John Wise specialises in precompetitive collaboration in the life science R&D information ecosystem. He is a consultant to the Pistoia Alliance, a not-for-profit organisation committed to lowering the barriers to innovation in life science R&D, and also serves as the programme co-ordinator for the PRISME Forum, a not-for-profit biopharma R&D IT/Informatics leadership group focused on the sharing of best practices. John has worked in life science R&D informatics in a variety of organisations, including academia, the pharmaceutical industry and a cancer research charity, as well as in the technology supply side of the industry. John graduated in physiology before obtaining a postgraduate certificate in education.
