Review

Should AI-Powered Whole-Genome Sequencing Be Used Routinely for Personalized Decision Support in Surgical Oncology—A Scoping Review

1 Department of Medicine, Georgian National University, 0144 Tbilisi, Georgia
2 Department of General-, Visceral-, Vascular and Transplantation Surgery, University of Magdeburg, Haus 60a, Leipziger Str. 44, 39120 Magdeburg, Germany
3 Davao Medical School Foundation, Davao City 8000, Philippines
4 Department of Hepatopancreatobiliary Surgery, Vrije Universiteit Brussel (VUB), Universitair Ziekenhuis Brussel (UZ Brussel), Europe Hospitals, 1090 Brussels, Belgium
5 Department of Electrical and Computer Engineering, University of New Mexico, Albuquerque, NM 87131, USA
6 Fourth State of Matter Technologies, Bayonne, NJ 07002, USA
7 Talos Surgical, Inc., New Castle, DE 19720, USA
8 Department of Surgery, American Hospital of Tbilisi, 0102 Tbilisi, Georgia
* Author to whom correspondence should be addressed.
Submission received: 1 April 2024 / Revised: 28 April 2024 / Accepted: 15 July 2024 / Published: 24 July 2024
(This article belongs to the Special Issue Feature Papers in Applied Biomedical Data Science)

Abstract
In this scoping review, we delve into the transformative potential of artificial intelligence (AI) in addressing challenges inherent in whole-genome sequencing (WGS) analysis, with a specific focus on its implications in oncology. Unveiling the limitations of existing sequencing technologies, the review illuminates how AI-powered methods emerge as innovative solutions to surmount these obstacles. The evolution of DNA sequencing technologies, progressing from Sanger sequencing to next-generation sequencing, sets the backdrop for AI’s emergence as a potent ally in processing and analyzing the voluminous genomic data generated. Particularly, deep learning methods play a pivotal role in extracting knowledge and discerning patterns from the vast landscape of genomic information. In the context of oncology, AI-powered methods exhibit considerable potential across diverse facets of WGS analysis, including variant calling, structural variation identification, and pharmacogenomic analysis. This review underscores the significance of multimodal approaches in diagnoses and therapies, highlighting the importance of ongoing research and development in AI-powered WGS techniques. Integrating AI into the analytical framework empowers scientists and clinicians to unravel the intricate interplay of genomics within the realm of multi-omics research, paving the way for more successful personalized and targeted treatments.

1. Introduction

Currently, most advanced genomic studies for patients with cancer involve panels that analyze approximately 500 genes; the human genome, however, contains approximately 20,000 genes. With prices now near USD 1000, cost is no longer the main obstacle to the whole-genome sequencing (WGS) of tumors. The limitation is how to analyze the massive amounts of data created (Figure 1). It could be argued that if no chemotherapeutic targets exist outside the genes already covered by the tumor panels in use, then additional genomic data would provide nothing useful to guide therapeutic decisions. Fortunately, over the last 10–15 years, computer scientists have developed machine learning (ML) algorithms with deep learning (DL) architectures that have paved the way for viable artificial intelligence (AI) [1]. This review will discuss the reasons why WGS is not routinely performed in the management of cancer patients and examine how AI can potentially overcome many of the limitations of WGS analysis in oncology. Here, we offer only a brief sketch of historic genomic sequencing technologies for our purposes and strongly recommend to the reader excellent authoritative reviews such as that by Shendure et al. [2].

2. The Evolution of DNA Sequencing

In the 1970s, Sanger and Maxam–Gilbert sequencing were the first-generation techniques that revolutionized DNA sequencing. Sanger sequencing, also known as the chain termination, dideoxynucleotide, or sequencing-by-synthesis (SBS) method, involves utilizing one strand of double-stranded DNA as a template. This method employs chemically modified dideoxynucleotides (ddNTPs), labeled ddG, ddA, ddT, and ddC for each DNA base. Incorporating these ddNTPs prevents further elongation, resulting in DNA fragments of varied sizes. Gel electrophoresis separates these fragments, which are visualized through imaging systems (X-ray or UV light) [3,4,5,6]. Applied Biosystems automated Sanger sequencing in the late 1980s with the ABI Prism 370 (Life Technologies, Waltham, MA, USA), utilizing capillary electrophoresis for fast and accurate sequencing. This method played a crucial role in sequencing projects for bacteriophages [7] and various plant species [8,9], with its most notable accomplishment being the decoding of the first human genome [10].
While Sanger sequencing remained the standard for single-gene and low-throughput DNA sequencing for over three decades, it remained expensive and time-consuming, and accelerating its application to complex genomes, such as those of plant species, proved challenging [11]. In contrast, Maxam–Gilbert sequencing, another first-generation method known as the chemical degradation method, relies on the chemical cleavage of nucleotides, which is particularly effective for small nucleotide polymers [12,13]. This method, performed without DNA cloning, generates labeled fragments separated by electrophoresis. Despite its early adoption, Maxam–Gilbert sequencing faced practical challenges, and the continued development and refinement of Sanger sequencing favored the latter method. Moreover, Maxam–Gilbert sequencing was deemed unsafe due to its use of toxic and radioactive chemicals [11].
For three decades, Sanger sequencing faced significant challenges in terms of cost and time. However, a transformative shift occurred after 2005 with the advent of a new generation of sequencers, overcoming the limitations of the earlier generations. Second-generation sequencing (SGS) technologies rapidly produce vast amounts of sequence data at a relatively low cost, enabling the completion of a human genome in just a few weeks. This approach involves generating millions of short reads from amplified individual DNA fragments through iterative cycles of nucleotide extensions. However, the extensive data generated pose challenges in terms of interpretation, analysis, and management.
The widespread adoption of SGS technologies has significantly influenced biomedical research, with applications ranging from WGS and targeted resequencing to the characterization of structural and copy number variations, profiling of epigenetic modifications, transcriptome sequencing, and identification of infectious agents. Ongoing developments involve the creation of new methodologies and instruments aimed at sequencing the entire human genome in less than a day [13]. Short-read sequencing methods are broadly categorized into sequencing by ligation (SBL) and SBS, with major platforms such as Roche/454 (Basel, Switzerland; launched in 2005), Illumina/Solexa (in 2006), and ABI/SOLiD (in 2007).
These platforms marked significant advancements in sequencing technology [11]. Roche/454 sequencing, introduced in 2005, utilizes pyrosequencing based on the detection of the pyrophosphate released after each nucleotide incorporation. This method involves random fragmentation of DNA samples, bead attachment with primers, emulsion PCR amplification, and pyrosequencing on a picotiter plate, enabling parallel reactions. The latest instrument, GS FLX+, generates reads of up to 1000 bp [14,15,16]. Ion Torrent semiconductor sequencing, acquired by Life Technologies (Carlsbad, CA, USA) in 2010, employs a chip with microwells and detects the release of hydrogen ions during sequencing instead of fluorescently labeled nucleotides. Ion Torrent sequencers produce read lengths of 200 bp, 400 bp, and 600 bp, offering the advantage of faster sequencing times [16,17].
Solexa, later acquired by Illumina, commercialized the Genome Analyzer (GA) in 2006. Illumina’s SBS approach, currently the most widely used technology, involves random DNA fragmentation, adapter ligation, cluster amplification, and sequencing using reversible terminators. Illumina sequencers yield high data outputs exceeding 600 Gbp, with short read lengths initially around 35 bp but now reaching around 125 bp [18]. Supported Oligonucleotide Ligation and Detection (SOLiD), developed by Applied Biosystems (ABI), utilizes sequencing by ligation. The process involves multiple sequencing rounds, adapter attachment, emulsion PCR, ligation of 8-mers with fluorescent labels, and recording of the emitted colors [14]. ABI/SOLiD produces short reads with lengths initially at 35 bp, improving to 75 bp, with high accuracy due to each base being read twice. However, drawbacks include relatively short reads and long run times, with errors attributed to noise during the ligation cycle, mainly causing substitution errors [11,19].
The SGS technologies discussed earlier have significantly transformed DNA analysis and have been widely adopted compared to the first-generation sequencing technologies. However, SGS technologies often necessitate a time-consuming and expensive PCR amplification step. Moreover, the intricate nature of genomes, featuring numerous repetitive regions, poses challenges for SGS technologies, especially with their relatively short reads, complicating genome assembly. In response to these challenges, scientists have introduced a new era of sequencing known as “third-generation sequencing” (TGS). TGS technologies address the limitations of SGS by offering lower sequencing costs, streamlined sample preparation without the need for PCR amplification, and significantly faster execution times. TGS also excels in generating long reads, surpassing several kilobases, which proves instrumental in resolving assembly issues and dealing with repetitive regions in complex genomes [11].
Two primary approaches characterize TGS [5]: the single-molecule real-time (SMRT) sequencing approach [20] and the synthetic approach. The SMRT approach, building on single-molecule sequencing work from the Quake laboratory [21,22,23], is widely used and implemented by Pacific Biosciences and by Oxford Nanopore sequencing, particularly the MinION sequencer. Pacific Biosciences, a leading developer in TGS, introduced the first genomic sequencer using the SMRT approach. Unlike other technologies, Pacific Biosciences’ sequencer detects signals in real time during nucleotide incorporation instead of executing amplification cycles. The system employs SMRT cells, each containing zero-mode waveguides (ZMWs), nanostructures with diameters in the tens of nanometers [24,25]. These ZMWs exploit the fact that light cannot propagate through an opening narrower than its wavelength, so the light intensity decays along the well and only its bottom is illuminated. Each ZMW houses a DNA polymerase and the target DNA fragment for sequencing. As nucleotides are incorporated, they emit a luminous signal recorded by sensors, enabling the determination of the DNA sequence. Pacific Biosciences technology offers several advantages over SGS. Sample preparation is remarkably swift, taking 4 to 6 h instead of days. Additionally, the technology produces long read lengths, averaging around 10 kbp, with individual reads extending up to 60 kbp—surpassing the capabilities of any SGS technology. Despite its high error rate of approximately 13%, dominated by insertions and deletions, these errors are randomly distributed along the long reads [15,18,26,27,28].
Oxford Nanopore sequencing was devised as a method for determining the sequence of nucleotides in DNA. In 2014, Oxford Nanopore Technologies (ONT; Oxford, UK) introduced the MinION, a compact single-molecule nanopore sequencing device measuring four inches in length and connecting to a laptop computer via a USB 3.0 port. Released for testing through the MinION Access Program (MAP), the MinION sequencer garnered attention for its potential to generate longer reads, facilitating improved resolution of structural genomic variants and repeat content [28,29,30]. The MinION sequencer has several advantages, including its cost-effectiveness, compact size, and real-time data display on the device screen without the need to wait for a run’s completion. Notably, the MinION can yield very long reads, surpassing 150 kbp, which enhances the contiguity of de novo assembly. However, the MinION does exhibit a relatively high error rate of approximately 12%, distributed across ~3% mismatches, ~4% insertions, and ~5% deletions [31].

3. What Is Whole-Genome Sequencing (WGS)?

WGS provides the most comprehensive data about a given organism. SGS and TGS techniques, which together we will refer to as next-generation sequencing (NGS), can deliver large amounts of data in a short amount of time. Profiling an entire genome facilitates the discovery of novel genes and variants associated with disease, particularly those in non-coding areas of the genome. The initial phase of NGS involves the extraction and isolation of nucleic acids, whether genomic DNA, total RNA, or specific RNA types. The DNA (or cDNA) sample undergoes a process that results in relatively short double-stranded fragments, typically ranging from 100 to 800 base pairs. Depending on the specific application, DNA fragmentation can be achieved through various methods such as physical shearing, enzyme digestion, or PCR-based amplification of specific genetic regions. These resulting DNA fragments are then linked to technology-specific adaptor sequences, creating a fragment library. These adaptors may also carry a distinctive molecular “barcode” to uniquely tag each sample with a specific DNA sequence.
Library preparation is the subsequent step, involving the preparation of DNA or RNA samples for processing and reading by sequencers. This is accomplished by fragmenting the samples to produce a pool of appropriately sized targets and adding specialized adapters at both ends, which will later interact with the NGS platform. The resulting prepared samples, referred to as “libraries”, represent a collection of molecules ready for sequencing. The specific library preparation procedure may vary based on the reagents and methods used, but the final NGS libraries must consist of DNA fragments of the desired lengths with adapters at both ends. Before sequencing, the DNA library may be affixed to a solid surface and clonally amplified to enhance the detectable signal from each target during sequencing (this step applies only to certain sequencing technologies, such as Illumina). Throughout this process, each unique DNA molecule in the library is attached to the surface of a bead or a flow cell and subjected to PCR amplification, generating a set of identical clones. These libraries are then subjected to further quality control steps before sequencing to ensure accuracy. Ultimately, all the DNA in the library is sequenced simultaneously using a sequencing instrument.
Each NGS experiment results in substantial quantities of intricate data comprising short DNA reads. While different technology platforms have their distinct algorithms and data analysis tools, they generally follow a similar analysis ‘pipeline’ and employ common metrics to assess the quality of NGS datasets. The analysis can be categorized into three stages: primary, secondary, and tertiary analysis. Primary analysis involves the conversion of raw signals from instrument detectors into digitized data or base calls. Raw data, collected during each sequencing cycle, are processed into files containing base calls assembled into sequencing reads (FASTQ files) along with their associated quality scores (Phred quality score). Secondary analysis encompasses read filtering and trimming based on quality, followed by the alignment of reads to a reference genome, or the assembly of reads for novel genomes, concluding with variant calling. The primary output is a Binary Alignment Map (BAM) file containing aligned reads. Tertiary analysis is the most intricate phase, requiring the interpretation of results and extraction of meaningful information from the data [32,33,34,35,36].
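To make the primary- and secondary-analysis stages concrete, the following minimal Python sketch parses a FASTQ file (the primary-analysis output of base calls plus Phred quality scores) and applies a simple quality filter of the kind used early in secondary analysis. The file name and the Q20 threshold are illustrative assumptions, not values from any specific pipeline.

```python
# Minimal sketch: read FASTQ records and decode Phred+33 quality scores.
# "sample.fastq" and the Q20 cutoff are illustrative, not from the review.

def read_fastq(path):
    """Yield (read_id, sequence, phred_scores) records from a FASTQ file."""
    with open(path) as fh:
        while True:
            header = fh.readline().strip()
            if not header:
                break                      # end of file
            seq = fh.readline().strip()
            fh.readline()                  # '+' separator line
            qual = fh.readline().strip()
            # Phred+33 encoding: Q = ASCII code - 33; P(error) = 10^(-Q/10)
            phred = [ord(c) - 33 for c in qual]
            yield header[1:], seq, phred

# Read filtering, an early secondary-analysis step: keep reads whose mean
# base quality is at least Q20 (~99% per-base accuracy).
kept = [(rid, seq) for rid, seq, phred in read_fastq("sample.fastq")
        if sum(phred) / len(phred) >= 20]
```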

4. AI-Powered Whole Genomic Sequencing

Genomics is progressing into an era of data-driven science. With the emergence of high-throughput technologies in human genomics, we find ourselves inundated with a vast amount of genomic data. AI, particularly deep learning (DL) methods, plays a crucial role in extracting knowledge and patterns from this wealth of genomic information. The proper execution of the variant calling step is pivotal for the success of numerous studies in clinical, association, or population genetics. The array of contemporary genomics protocols, techniques, and platforms complicates the selection of methods and algorithms, as there is no universal solution applicable to all scenarios. The accurate identification of genetic variants in a person’s genome from tens of millions of short, error-prone sequencing reads remains an ongoing challenge despite the fast progress made by sequencing technologies. Poplin et al. showed that a deep convolutional neural network, called DeepVariant, was able to effectively identify genetic variants in aligned NGS read data [37]. This was achieved through the model learning statistical relationships between images of read pileups around potential variants and true genotype calls.
Notably, DeepVariant outperforms existing state-of-the-art tools. The acquired model demonstrates generalization across genome builds and mammalian species, enabling nonhuman sequencing projects to leverage the extensive human ground-truth data. The study further illustrates DeepVariant’s ability to adapt and call variants in various sequencing technologies and experimental designs, including deep whole genomes from 10X Genomics and Ion Ampliseq exomes. This underscores the advantages of employing automated and versatile techniques for variant calling [37,38].
Identifying genetic variants from NGS data presents a formidable challenge due to the inherent errors in NGS reads, which exhibit error rates ranging from approximately 0.1% to 10%. These errors stem from a multifaceted process influenced by factors such as instrument characteristics, upstream data processing tools, and the genomic sequence itself. Accurate and efficient variant calling, that is, identifying nucleotide variations in an individual’s genome relative to a reference sequence, is crucial for detecting the genomic variations responsible for phenotypic disparities and disease. Clairvoyante addresses this challenge by predicting variant type (SNP or indel), zygosity, alternative alleles, and indel length, thereby overcoming a limitation of DeepVariant, which does not report comprehensive variant details such as the precise alternative alleles and variant type. Clairvoyante requires no sample-specific tuning and can identify variants in less than 2 h on a standard server. Notably, it is tailored to long-read sequencing data from technologies such as Pacific Biosciences (PacBio) and Oxford Nanopore Technology (ONT), though it is versatile enough to be applied to short-read datasets as well [38,39,40].
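The core idea shared by DeepVariant and Clairvoyante, treating candidate variant sites as images and classifying them with a convolutional network, can be sketched in a few lines. The toy PyTorch model below only illustrates that framing, with multi-task output heads in the spirit of Clairvoyante; the channel layout, layer sizes, and class counts are our own illustrative assumptions, not the published architectures.

```python
import torch
import torch.nn as nn

# Toy illustration: a read pileup around a candidate site is encoded as a
# tensor (channels might hold base identity, base quality, and strand), and
# a small CNN emits several task-specific outputs. All shapes are assumptions.

class ToyVariantCaller(nn.Module):
    def __init__(self, in_channels=3):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(in_channels, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.zygosity = nn.Linear(32, 3)      # hom-ref / het / hom-alt
        self.variant_type = nn.Linear(32, 3)  # SNP / insertion / deletion

    def forward(self, pileup):                # pileup: (batch, C, reads, window)
        features = self.backbone(pileup)
        return self.zygosity(features), self.variant_type(features)

model = ToyVariantCaller()
batch = torch.randn(4, 3, 100, 33)            # 4 synthetic candidate sites
zygosity_logits, type_logits = model(batch)
```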
Cai et al. expanded this approach to the challenge of calling structural variations. They introduce DeepSV, a DL-based method designed for calling long deletions from sequence reads. DeepSV utilizes a unique visualization method for sequence reads, strategically capturing multiple sources of information in the sequence data relevant to long deletions. Additionally, DeepSV incorporates techniques to handle noisy training data effectively. The model in DeepSV is trained using visualized sequence reads, and deletion calls are made based on this trained model. The authors demonstrate that DeepSV surpasses existing methods in terms of the accuracy and efficiency of deletion calling, particularly on data from the 1000 Genomes Project. This study highlights the potential of DL for effectively calling types of genetic variation that are more complex than single-nucleotide polymorphisms (SNPs) [41].
Intelli-NGS, on the other hand, excels in discerning reliable variant calls from Ion Torrent sequencer data [42]. Ion Torrent is a second-generation sequencing platform with lower capital costs than Illumina, but it is also prone to higher machine error. Given its lower costs, the platform is generally preferred in developing countries where NGS remains an exclusive technique; however, most available software tools target platforms other than Ion Torrent (Life Technologies), making the already tricky analysis step even more error-prone. Additionally, models like DeepGestalt are designed to identify facial phenotypes associated with genetic disorders [43], DeepMiRGene predicts miRNA precursors [44], and DeepMILO (DL for Modeling Insulator Loops) forecasts the impact of non-coding sequence variants on 3D chromatin structure [45].
DeepPVP (PhenomeNet Variant Predictor) is proficient in identifying variants in both whole-exome and whole-genome sequence data [46], while ExPecto excels in accurately predicting the tissue-specific transcriptional effects of mutations and functional single-nucleotide polymorphisms (SNPs) [47]. PEDIA (Prioritisation of exome data by image analysis) is instrumental in prioritizing variants and genes for diagnosing patients with rare genetic disorders [48].
The exome sequencing approach is extensively used in research and diagnostic laboratories to discover pathologic variants and study the genetic architecture of human diseases. However, a significant proportion of identified genetic variants are actually false-positive calls, and these pose a serious challenge for the interpretation of variants. A new tool named Genomic vARiants FIltering by dEep Learning moDels in NGS (GARFIELD-NGS) relies on DL models to dissect false and true variants in exome sequencing experiments. GARFIELD-NGS significantly reduces the proportion of false candidates, thus improving the identification of diagnostically relevant variants. These results define GARFIELD-NGS as a robust tool for all types of Illumina and ION exome data. The GARFIELD-NGS script performs automated variant scoring on VCF files, and it can be easily integrated in existing analysis pipelines [49].
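To illustrate the variant-filtering idea behind tools like GARFIELD-NGS, the sketch below trains a classifier to separate true variants from artifacts using per-variant features. A random forest stands in here for the tool's actual deep learning models, and the features and labels are synthetic placeholders rather than real exome data.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Hedged sketch of variant filtering: a simple classifier scores each
# candidate call. Feature set, labels, and threshold are all illustrative.
rng = np.random.default_rng(0)
# Per-variant features: [read depth, quality score, allele balance, strand bias]
X = rng.random((500, 4))
y = rng.integers(0, 2, 500)           # 1 = true variant, 0 = artifact (toy labels)

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
posterior = clf.predict_proba(X)[:, 1]
confident_calls = X[posterior > 0.9]  # keep only high-confidence variants
```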
Other notable models, including DeepWAS [50], Basset [51], DanQ [52], and SPEID [53], focus on identifying disease-associated single-nucleotide polymorphisms (SNPs), predicting causative SNPs, extracting DNA function directly from sequence data, and enhancing promoter interaction (EPI) prediction, respectively [50,51,52,53]. In the realm of gene expression and regulation, a range of tools such as DeepExpression, DeepGSR, SpliceAI, Gene2vec, and MPRA-DragoNN serve distinct purposes, such as predicting gene expression, recognizing genomic signals and regions, identifying splice function, generating a representation of gene distribution, and predicting as well as analyzing regulatory DNA sequences and non-coding genetic variants [54,55,56,57,58].
Non-coding RNAs, a relatively new focus of intensive investigation, were initially perceived as regulators of gene expression at the post-transcriptional level, without encoding functional proteins. Studies have unveiled that non-coding RNAs, including miRNAs, piRNAs, endogenous siRNAs, and long non-coding RNAs, are prevalent regulators. Importantly, a growing body of evidence underscores the significant contribution of regulatory non-coding RNAs to the realm of epigenetic control, emphasizing the noteworthy role of RNA in governing gene expression [59].
Detecting the functional impacts of non-coding variants poses a significant hurdle in human genetics [60]. To predict the effects of non-coding variants directly from sequencing data, a DL-based algorithmic framework called DeepSEA was created. DeepSEA learns a regulatory sequence code directly from extensive chromatin profiling data, allowing the precise prediction of the chromatin effects of sequence alterations at the single-nucleotide level. Other DL models in epigenomics, such as FactorNet, DeepCpG, and Basenji, are adept at predicting cell-type-specific transcription factor binding, methylation states from single-cell data, and cell-type-specific epigenetic and transcriptional profiles in large mammalian genomes [40,61].
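Models such as DeepSEA consume DNA as a one-hot matrix, and a variant's predicted effect can then be read off as the difference between a model's outputs for the reference and alternate sequences. The minimal encoder below sketches only that input representation; the example sequences are arbitrary.

```python
import numpy as np

# Generic sketch of the one-hot DNA input used by sequence-based models.
BASES = {"A": 0, "C": 1, "G": 2, "T": 3}

def one_hot(seq):
    """Encode a DNA string as a (length, 4) one-hot matrix; N stays all-zero."""
    mat = np.zeros((len(seq), 4), dtype=np.float32)
    for i, base in enumerate(seq.upper()):
        if base in BASES:
            mat[i, BASES[base]] = 1.0
    return mat

ref = one_hot("ACGTTAGC")
alt = one_hot("ACGTCAGC")   # single-nucleotide change at position 4
# A model's predicted chromatin effect of the variant is then f(alt) - f(ref).
```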

5. Pharmacogenomic Deep Learning Models

In pharmacogenomics, models like DeepD, DNN-DTI, DeepBL, DeepDrug3D, DrugCell, and DeepSynergy are applied, respectively, for translating pharmacogenomic features, predicting drug–target interactions, forecasting beta-lactamases, characterizing and classifying protein 3D binding pockets, and predicting drug response and synergy in anticancer drugs. Graph-based methods, like Graph Convolutional Networks (GCNs), are important in biomedical research for predicting protein–protein interactions (PPIs), understanding drug interactions, and facilitating drug repurposing. For PPI prediction, GCNs analyze protein interaction networks to uncover hidden associations among proteins, help identify drug targets, and elucidate disease mechanisms. GCNs model drug–target interactions as graphs, enabling the prediction of interactions and accelerating drug discovery. Further, by integrating diverse biological data, GCNs can identify connections between drugs and diseases to promote cost-effective and time-efficient drug development.
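As a concrete illustration of the propagation rule underlying GCNs, the following minimal NumPy sketch performs one graph-convolution step in the style of Kipf and Welling over a toy protein-interaction graph: each node's features become a degree-normalized average over the node and its neighbors, followed by a learned linear map. The graph, feature dimensions, and random weights are illustrative stand-ins.

```python
import numpy as np

def gcn_layer(A, H, W):
    """A: (n, n) adjacency; H: (n, d_in) node features; W: (d_in, d_out) weights."""
    A_hat = A + np.eye(A.shape[0])               # add self-loops
    D_inv_sqrt = np.diag(1.0 / np.sqrt(A_hat.sum(axis=1)))
    A_norm = D_inv_sqrt @ A_hat @ D_inv_sqrt     # symmetric normalization
    return np.maximum(A_norm @ H @ W, 0.0)       # ReLU activation

A = np.array([[0, 1, 1], [1, 0, 0], [1, 0, 0]], dtype=float)  # 3-protein toy graph
H = np.random.rand(3, 8)                         # e.g., per-protein features
W = np.random.rand(8, 4)                         # learned weights (random here)
H_next = gcn_layer(A, H, W)                      # (3, 4) updated embeddings
```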
Collectively, these models form a comprehensive suite of DL tools crucial for understanding and analyzing genomics and related fields [61,62,63,64,65,66]. DL methods have proven to be highly effective in predicting treatment responses based on “-omic” datasets of cell lines. An illustrative example is DrugCell, a visible neural network (VNN) interpretation model designed to elucidate the structure and function of human cancer cells in response to therapy. This model aligns its central mechanisms with the organizational structure of human biology, enabling the prediction of drug responses across various cancers and the intelligent planning of successful treatment combinations.
DrugCell was specifically engineered to capture both aspects of therapy response within an interpretable model comprising two divisions: the VNN, which integrates cell genotype, and the artificial neural network (ANN), which integrates drug structure. The VNN takes in text files detailing the hierarchical associations between molecular subsystems in human cells, incorporating 2086 biological process terms from the Gene Ontology (GO) database. The ANN takes traditional inputs: text files representing the Morgan fingerprint of each drug, a canonical vector encoding of its chemical structure. The outputs from these two divisions are amalgamated into a single layer of neurons, generating the response of a given genotype to a specific therapy. The prediction accuracy for each drug individually demonstrated significant precision, revealing distinct drug sub-populations. This level of accuracy competes with the state-of-the-art regression methods utilized in previous models for predicting drug responses. Notably, when compared to a parallel neural network model trained solely on drug structure and labeled tissue, DrugCell significantly outperformed the tissue-based model. This underscores that DrugCell has effectively assimilated information from somatic mutations beyond the capabilities of a tissue-only approach [66,67].
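The two-branch layout described above can be summarized schematically as follows. In this PyTorch sketch, a plain multilayer perceptron stands in for DrugCell's GO-structured VNN, a second network embeds the drug's Morgan fingerprint, and the concatenated embeddings feed a response head. All dimensions are illustrative assumptions (the 2048-bit fingerprint is a common convention; the gene count is arbitrary).

```python
import torch
import torch.nn as nn

class TwoBranchResponseModel(nn.Module):
    def __init__(self, n_genes=3008, fp_bits=2048):
        super().__init__()
        self.genotype_branch = nn.Sequential(   # stand-in for the structured VNN
            nn.Linear(n_genes, 256), nn.ReLU(), nn.Linear(256, 64), nn.ReLU())
        self.drug_branch = nn.Sequential(       # ANN over the Morgan fingerprint
            nn.Linear(fp_bits, 256), nn.ReLU(), nn.Linear(256, 64), nn.ReLU())
        self.head = nn.Linear(128, 1)           # predicted drug response

    def forward(self, genotype, fingerprint):
        merged = torch.cat([self.genotype_branch(genotype),
                            self.drug_branch(fingerprint)], dim=1)
        return self.head(merged)

model = TwoBranchResponseModel()
response = model(torch.rand(2, 3008), torch.rand(2, 2048))  # 2 cell-drug pairs
```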

6. Exploring AI-Powered Genomics in Multi-Omics Research

6.1. Radiomics, Pathomics and Surgomics

In addition to WGS, AI has also enabled the development of multiple new “-omic” fields such as radiomics, pathomics, and, more recently, surgomics [68,69,70,71,72]. A fusion of WGS, radiomics, and pathomics was proposed by the Artificial intelligence, Radiomics, Oncopathomics and Surgomics (AiRGOS) project [73,74,75,76,77,78,79,80]. It has been shown that radiomic analysis of single arterial-phase cross-sectional images of hepatocellular cancers can accurately predict which tumors will recur early [81]. By combining AI-powered analysis of the WGS patterns of tumors with pre-operative three-dimensional reconstructions of tumors and pathomic analysis of histopathology slides, it is believed that AI-enhanced tumor boards can be created once they are trained on retrospective chemotherapy, immunotherapy, and radiation treatment regimens. The hope is that an algorithm can be created to reduce the rate of therapeutic non-response from which many cancer patients suffer [82,83,84,85]. By minimizing the time lost to ineffective treatments and reducing exposure to their side effects, it is believed that overall survival can be significantly improved (Figure 2).
Once this concept is validated post-operatively, so that better decisions on adjuvant treatments can be made, the project will be expanded to neoadjuvant treatment regimens and then to real-time analysis in the actual operating room [72,86,87]. In an approach known as Video Surgomics, video data from the operating room could theoretically be analyzed in real time with enhanced imaging techniques such as multispectral imaging, narrow-band imaging, and biophotonics, so that information imperceptible to the human eye during surgery can be captured, analyzed, and interpreted, enabling the operating surgeon to make better decisions regarding whether to operate and the extent of resection. The hope is that Video Surgomics will reduce the number of patients who have occult carcinomatosis, sarcomatosis, or metastases. Conceivably, it could also help guide surgeons in making real-time decisions about more advanced therapies, such as the choice of intra-peritoneal chemotherapy.

6.2. Proteomics, Transcriptomics, and Genomics

The generation and processing of huge biological data sets (-omics data) is made possible by technological and informatics advancements, which are driving a fundamental change in biomedical science research. Even as the fields of proteomics, transcriptomics, genomics, bioinformatics, and biostatistics gain ground, they are still primarily evaluated separately using different methodologies, producing monothematic, rather than integrated, information. Combining and applying (multi)omics data can improve knowledge of the molecular pathways, mechanisms, and processes that distinguish health from disease [88]. Within this field, proteomics, transcriptomics, and genomics are dynamic partners that provide distinct insights into the complex regulation of biological functions.
Proteomics is the scientific study of the proteome, the whole set of proteins expressed and modified by a biological system. Proteomes are extremely dynamic, constantly changing both within and among biological systems. The word “proteomics” was coined by Marc Wilkins in 1996 to emphasize how much more complex and dynamic studying proteins is than studying genomes. Using techniques like mass spectrometry (MS), protein microarrays, X-ray crystallography, chromatography-based methods, and Western blotting, proteomics analyzes a range of factors related to protein content, function, regulation, post-translational modifications, expression levels, mobility within cells, and interactions. Mass spectrometry has become an essential high-throughput proteomic technique, especially when paired with liquid chromatography (LC-MS/MS). The way protein structure is predicted has fundamentally changed as a result of DL advancements like the AlphaFold algorithm [89].
The use of AI technology has resulted in notable advancements in the field of proteomics. The exponential increase in biomedical data, particularly multi-omics and genome sequencing datasets, has ushered in a new era of data processing and interpretation. AI-driven mass spectrometry-based proteomic research has progressed because of data sharing and open access laws. Initially, AI was restricted to data analysis and interpretation, but recent advances in DL have transformed the sector and improved the accuracy and caliber of data. DL may be able to surpass the best-in-class biomarker identification processes that are currently available in predicting experimental peptide values from amino acid sequences. Proteomics and AI convergence presents a transformative paradigm for biomedical research, offering fresh perspectives on biological systems and ground-breaking approaches to diagnosis and treatment [90].
Though proteomics is not expressly discussed, this narrative alludes to its consequences. Understanding protein-level manifestations becomes crucial at the intersection of genetics and proteomics, as highlighted by the focus on PPAR proteins and their therapeutic potential in colonic disorders. Additionally, the integration of flow cytometry and genomics in hematologic malignancies suggests a proteomic component, highlighting the importance of assessing protein expression for accurate diagnosis [91].
The transcriptome, or collection of all RNA transcripts, of an organism is studied by transcriptomic technology. An organism’s DNA encodes its information, which is then expressed by transcription. Since its first attempt in the early 1990s, transcriptomics has seen substantial change. RNA sequencing (RNA-Seq) and microarrays are two important methods in the field. Measurements of gene expression in various tissues, environments, or time periods shed light on the biology and regulation of genes. Understanding human disease and identifying wide-ranging coordinated trends in gene expression have both benefited greatly from this analysis [92].
Since the late 1990s, technological advancements have transformed the sector. Transcriptomics has come a long way since its inception due to techniques like serial analysis of gene expression (SAGE), the emergence of microarrays, and NGS technologies in the 2000s. Because the transcriptome is dynamic, it is challenging to define and analyze, which calls for the application of AI methods with ML techniques like Random Forests and Support Vector Machines. Neural networks and other DL technologies have been shown to be crucial in enhancing transcript categorization by unveiling intricate biological principles. Understanding the molecular mechanisms underlying differential gene expression requires a thorough analysis of gene expression data. AI tools such as Random Forests and Deep Neural Networks (DNNs) analyze massive datasets to distinguish between clinical groups and detect diseases. Gene expression becomes more complex due to polyadenylation and alternative splicing; AI aids in predicting splicing patterns and comprehending splicing codes [93].
Specific challenges are introduced by single-cell RNA sequencing (scRNA-seq), including a high proportion of zero-valued observations, or “dropouts”. The visualization and interpretation of high-dimensional scRNA-seq data has benefited from enhanced dimensionality reduction through the use of ML and DL techniques like Uniform Manifold Approximation and Projection (UMAP) and t-distributed Stochastic Neighbor Embedding (t-SNE) [93].
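The dimensionality-reduction step mentioned above can be sketched in a few lines with scikit-learn. The snippet below embeds synthetic count data, with heavy dropout mimicking scRNA-seq zeros, into two dimensions with t-SNE; the cell and gene counts, dropout rate, and perplexity are arbitrary illustrative choices.

```python
import numpy as np
from sklearn.manifold import TSNE

# Synthetic stand-in for an scRNA-seq count matrix (300 cells x 2000 genes),
# with ~80% of entries zeroed to mimic dropout.
rng = np.random.default_rng(0)
counts = rng.poisson(2.0, size=(300, 2000)).astype(float)
counts *= rng.random(counts.shape) > 0.8

log_counts = np.log1p(counts)                       # standard log-normalization
embedding = TSNE(n_components=2, perplexity=30,
                 random_state=0).fit_transform(log_counts)
# 'embedding' is a (300, 2) map of cells for visualization and cluster inspection.
```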
RNA-seq technology produces substantial transcriptome data that can be mined by AI-based algorithms [94]. The transcriptome is largely composed of non-coding RNAs (ncRNAs), which have become important actors with a variety of roles ranging from complex mRNA regulatory mechanisms to catalytic functions. The field of transcriptome analysis has continued to advance, as seen in the evolution of techniques from conventional Northern blotting to sophisticated RNA-Seq [91]. Transcriptomic research has been greatly impacted by AI, notably in oncology. By enhancing the accuracy of identifying cancer states and stages, AI-driven transcriptomic analysis contributes to the development of precision medicine. AI techniques like denoising and dropout imputation tackle problems in scRNA-seq research such as high noise levels and missing data. AI algorithms are essential for separating biological signals from noise and for integrating multi-omics data as the profiles become more complicated.
AI-assisted transcriptome analysis also addresses challenges in immunotherapy by analyzing tumor heterogeneity, predicting responses, and identifying different cell types. New technologies will continue to be created and applied in immunotherapy research as the era of precision medicine grows, and combining them could boost the effectiveness of immunotherapies and alter the course of cancer research [94]. The integration of AI into transcriptomics has significantly enhanced our comprehension of the transcriptome, particularly considering the growing technologies and new challenges in single-cell research [93].
The study of genomics reveals the complexities encoded within an organism’s entire set of genes. It emphasizes the importance of even single-nucleotide polymorphisms (SNPs) in defining genetic loci that contribute to complicated disorders. It does, however, address the difficulties, such as false-positive connections, highlighting the importance of precision in experimental design. The combined global efforts in genomics, particularly in the quick identification of the SARS-associated coronavirus, demonstrate the discipline’s real-world effect in tackling new health concerns [91].
ML has made immense progress in genomics since the 1980s, particularly with the integration of DL techniques in the 2000s. In fact, ML has been instrumental in predicting DNA regulatory areas, annotating sequence elements, and discovering chromatin effects in genomics. To effectively handle the vast number of sequences and diverse data sources, DL methods such as Convolutional Neural Networks (CNNs) and Generative Adversarial Networks (GANs) have been utilized. The success of unsupervised learning, specifically GANs and Autoencoders (AEs), in identifying intricate patterns in biological sequences through the extraction of representative features from genomics data further demonstrates the potential of these powerful techniques in genomics research.
The integration of generative models and large language models (LLMs) into AI-powered WGS could deepen our understanding of genetics and epigenetics and advance personalized treatment.
Generative models, particularly those based on DL like variational AEs or GANs, can simulate epigenetic modifications with high fidelity. By training these models on large datasets of epigenetic data, we can generate realistic epigenetic profiles for different cell types, tissues, and conditions. This capability can provide insights into the complex interplay between genetic and epigenetic factors to unravel the mechanisms of diseases and guide treatment. LLMs can analyze vast amounts of genetic results and clinical data to predict clinical outcomes for individual patients. By learning patterns and associations from large-scale genomic and clinical databases, LLMs can assist clinicians to improve patient outcomes, minimize adverse effects, and optimize healthcare resources.
These models can integrate multi-omics data sources, such as genomics, transcriptomics, epigenomics, and proteomics, to capture complex molecular interactions. Furthermore, advanced techniques such as transfer learning and domain adaptation can enhance model performance by leveraging knowledge from related domains or datasets. This interdisciplinary approach enables researchers to solve complex biological questions and address clinical challenges more effectively.
The integration of ML and CRISPR-Cas9 technology is a pivotal pairing of experimental and computational biology, expediting research on large-scale genetic interactions. Approaches utilizing ML and DL have displayed potential in identifying connections between diseases and genes, as well as forecasting the genetic susceptibility of intricate disorders. The scope of methods is evident, ranging from SVM-based classification to comprehensive frameworks employing CNNs, as demonstrated by CADD, DANN, and ExPecto [89].
The emergence of AI has greatly impacted research, especially big-data fields like functional genomics. Large amounts of data have been organized and abstracted using deep architectures, which has improved interpretability. Conversely, the lack of explainability in deep AI architectures casts doubt on the applicability and transparency of findings. The wider use of functional genomics depends on the free and open exchange of AI tools, resources, and knowledge. AI technologies are selected in the fields of biology and functional genomics to provide mechanistic understanding of biological processes, enabling the assessment of biological systems or the development of theoretical models that forecast their behavior. The use of AI in systems biology will see competition or cooperation between data-driven and model-driven methods. DeepMind’s AlphaFold highlights the power of AI, particularly transformer-based models. Functional genomics involves complex considerations of individual and communal rights, and the application of AI necessitates navigating heterogeneous data, interpreting the pertinent questions, and addressing legal, ethical, and moral issues. In the developing landscape of AI, a cautious approach is required to ensure that the advantages outweigh the potential negative repercussions [89].
This diverse interplay provides insight into the vast synergy present in molecular biology. Proteomics reveals expression at the protein level, transcriptomics reveals the functional RNA environment, and genomics lays the groundwork by interpreting genetic data. The integration of these disciplines offers a thorough understanding of diseases, emphasizing the value of a multimodal approach in diagnosis and therapy. Ultimately, the dynamic interplay of transcriptomics, proteomics, and genomics holds the key to understanding the complexity of illnesses. This discussion highlights the dynamic nature of molecular biology, where each specialty contributes in a different way to the overarching narrative of health and disease [91].
While fields like proteomics, transcriptomics, and genomics are making individual strides, they are often evaluated in isolation, leading to monothematic information. To overcome this limitation, efforts are being made to integrate (multi)omics data, aiming to enhance our understanding of molecular pathways, mechanisms, and processes related to health and disease. Multi-omics integration reveals synergies to provide unique insights into the intricate regulation of biological functions and offers a comprehensive understanding of diseases, emphasizing the significance of a multimodal approach in diagnosis and therapy. The dynamic interplay among these disciplines holds the key to unraveling the complexity of illnesses, showcasing the nuanced contributions of each specialty in the broader narrative of health and disease.
The combination of AI and WGS provides a new vantage point for understanding disease, spanning prevention, diagnosis, and personalized treatment. However, issues related to technology, ethics, and regulation remain.
First, more attention must be given to data privacy and security. When applying AI to analyze WGS, it is essential that information is neither leaked nor tampered with, so developing robust mechanisms for data sharing that promote scientific research while protecting personal privacy is a major challenge. Second, interpretability and transparency must improve. Although AI can effectively analyze WGS, AI models are often regarded as a “black box” whose decision-making processes are not transparent enough. In the medical field, this opacity can affect the trust of doctors and patients, and improving the interpretability of the models could help medical practitioners understand and verify AI recommendations. Third, data bias needs to be addressed. The predictive performance of AI models is highly dependent on the training data: if the WGS data set is biased in terms of race, gender, or region, the predictions of the AI algorithm may also be biased. Ensuring data representativeness and model generalizability is key to promoting equitable development.
Other challenges, such as the need for large computing resources, open ethical and legal questions, and further model development, also need to be addressed.

7. Conclusions

In this scoping review, we have highlighted the potential of AI-powered WGS to overcome the limitations of existing sequencing technologies, particularly in the context of oncology. We have provided an overview of the evolution of DNA sequencing technologies, the process of WGS, and the application of AI, specifically DL, to variant calling and pharmacogenomics. In light of this, we emphasize the importance of multimodal approaches in diagnoses and therapies and the need for further research and development in AI-powered WGS techniques. We suggest that AI-powered WGS has the potential to revolutionize the field of genomics and improve patient outcomes when combined with multi-omics data, but we also acknowledge the challenges associated with the interpretation and management of the vast amount of data generated by high-throughput technologies. Overall, in providing a comprehensive overview of the potential of AI to address the challenges associated with WGS analysis, we have underscored the need for further research and development in this field to improve patient outcomes through personalized and targeted treatments.

Author Contributions

K.A.: writing of original draft and administration; W.S.: editing and response to reviewers; A.B.: writing of original draft; N.M.: writing of original draft; V.G.: writing of original draft and editing; T.C.: editing and response to reviewers; B.T.: supervision; R.C.: supervision; U.D.K.: editing, response to reviewers, and supervision; A.G.: conceptualization, supervision, administration, writing of original draft, editing, and response to reviewers. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no funding.

Acknowledgments

The authors would like to thank Stephen Song for his help in developing Figure 1 and Figure 2.

Conflicts of Interest

Author Thomas Cattabiani is affiliated with Fourth State of Matter Technologies. Andrew Gumbs is the CEO of Talos Surgical, and Bruce Turner is the Chairman of Talos Surgical. The other authors have no relevant conflicts of interest to report.

References

1. Hasanbek, M. Data science and the role of artificial intelligence in medicine: Advancements, applications, and challenges. Eur. J. Mod. Med. Pract. 2024, 4, 90–93.
2. Shendure, J.; Balasubramanian, S.; Church, G.M.; Gilbert, W.; Rogers, J.; Schloss, J.A.; Waterston, R.H. DNA sequencing at 40: Past, present and future. Nature 2017, 550, 345–353, Erratum in Nature 2019, 568, E11.
3. Sanger, F.; Coulson, A.R. A rapid method for determining sequences in DNA by primed synthesis with DNA polymerase. J. Mol. Biol. 1975, 94, 441–448.
4. Sanger, F.; Nicklen, S.; Coulson, A.R. DNA sequencing with chain-terminating inhibitors. Proc. Natl. Acad. Sci. USA 1977, 74, 5463–5467.
5. Masoudi-Nejad, A.; Narimani, Z.; Hosseinkhan, N. Next Generation Sequencing and Sequence Assembly: Methodologies and Algorithms; Springer Science & Business Media: Berlin/Heidelberg, Germany, 2013; Volume 4.
6. El-Metwally, S.; Ouda, O.M.; Helmy, M. Next Generation Sequencing Technologies and Challenges in Sequence Assembly; Springer Science & Business Media: Berlin/Heidelberg, Germany, 2014; Volume 7.
7. Sanger, F.; Coulson, A.; Barrell, B.G.; Smith, A.J.H.; Roe, B.A. Cloning in single-stranded bacteriophage as an aid to rapid DNA sequencing. J. Mol. Biol. 1980, 143, 161–178.
8. The Arabidopsis Genome Initiative. Analysis of the genome sequence of the flowering plant Arabidopsis thaliana. Nature 2000, 408, 796–815.
9. Goff, S.A.; Ricke, D.; Lan, T.-H.; Presting, G.; Wang, R.; Dunn, M.; Glazebrook, J.; Sessions, A.; Oeller, P.; Varma, H.; et al. A draft sequence of the rice genome (Oryza sativa L. ssp. japonica). Science 2002, 296, 92–100.
10. The 1000 Genomes Project Consortium. A map of human genome variation from population-scale sequencing. Nature 2010, 467, 1061–1073.
11. Kchouk, M.; Gibrat, J.F.; Elloumi, M. Generations of sequencing technologies: From first to next generation. Biol. Med. 2017, 9, 395.
12. Maxam, A.M.; Gilbert, W. A new method for sequencing DNA. Proc. Natl. Acad. Sci. USA 1977, 74, 560–564.
13. Bayés, M.; Heath, S.; Gut, I.G. Applications of second generation sequencing technologies in complex disorders. Curr. Top. Behav. Neurogenet. 2012, 12, 321–343.
14. Mardis, E.R. Next-generation DNA sequencing methods. Annu. Rev. Genom. Hum. Genet. 2008, 9, 387–402.
15. Liu, L.; Li, Y.; Li, S.; Hu, N.; He, Y.; Pong, R.; Lin, D.; Lu, L.; Law, M. Comparison of next-generation sequencing systems. J. Biomed. Biotechnol. 2012, 2012, 251364.
16. Reuter, J.A.; Spacek, D.V.; Snyder, M.P. High-throughput sequencing technologies. Mol. Cell 2015, 58, 586–597.
17. Loman, N.J.; Misra, R.V.; Dallman, T.J.; Constantinidou, C.; Gharbia, S.E.; Wain, J.; Pallen, M.J. Performance comparison of benchtop high-throughput sequencing platforms. Nat. Biotechnol. 2012, 30, 434–439.
18. Kulski, J.K. Next-generation sequencing—An overview of the history, tools, and “Omic” applications. Next Gener. Seq.-Adv. Appl. Chall. 2016, 10, 61964.
19. Alic, A.S.; Ruzafa, D.; Dopazo, J.; Blanquer, I. Objective review of de novo stand-alone error correction methods for NGS data. Wiley Interdiscip. Rev. Comput. Mol. Sci. 2016, 6, 111–146.
20. Bentley, D.R.; Balasubramanian, S.; Swerdlow, H.P.; Smith, G.P.; Milton, J.; Brown, C.G.; Hall, K.P.; Evers, D.J.; Barnes, C.L.; Bignell, H.R.; et al. Accurate whole human genome sequencing using reversible terminator chemistry. Nature 2008, 456, 53–59.
21. Eid, J.; Fehr, A.; Gray, J.; Luong, K.; Lyle, J.; Otto, G.; Peluso, P.; Rank, D.; Baybayan, P.; Bettman, B.; et al. Real-time DNA sequencing from single polymerase molecules. Science 2009, 323, 133–138.
22. Braslavsky, I.; Hebert, B.; Kartalov, E.; Quake, S.R. Sequence information can be obtained from single DNA molecules. Proc. Natl. Acad. Sci. USA 2003, 100, 3960–3964.
23. Harris, T.D.; Buzby, P.R.; Babcock, H.; Beer, E.; Bowers, J.; Braslavsky, I.; Causey, M.; Colonell, J.; DiMeo, J.; Efcavitch, J.W.; et al. Single-molecule DNA sequencing of a viral genome. Science 2008, 320, 106–109.
24. McCoy, R.C.; Taylor, R.W.; Blauwkamp, T.A.; Kelley, J.L.; Kertesz, M.; Pushkarev, D.; Petrov, D.A.; Fiston-Lavier, A.-S. Illumina TruSeq synthetic long-reads empower de novo assembly and resolve complex, highly-repetitive transposable elements. PLoS ONE 2014, 9, e106689.
25. Rhoads, A.; Au, K.F. PacBio sequencing and its applications. Genom. Proteom. Bioinform. 2015, 13, 278–289.
26. Chin, C.-S.; Peluso, P.; Sedlazeck, F.J.; Nattestad, M.; Concepcion, G.T.; Clum, A.; Dunn, C.; O’Malley, R.; Figueroa-Balderas, R.; Morales-Cruz, A.; et al. Phased diploid genome assembly with single-molecule real-time sequencing. Nat. Methods 2016, 13, 1050–1054.
27. Koren, S.; Schatz, M.C.; Walenz, B.P.; Martin, J.; Howard, J.T.; Ganapathy, G.; Wang, Z.; Rasko, D.A.; McCombie, W.R.; Jarvis, E.D.; et al. Hybrid error correction and de novo assembly of single-molecule sequencing reads. Nat. Biotechnol. 2012, 30, 693–700.
28. Mikheyev, A.S.; Tin, M.M. A first look at the Oxford Nanopore MinION sequencer. Mol. Ecol. Resour. 2014, 14, 1097–1102.
29. Laehnemann, D.; Borkhardt, A.; McHardy, A.C. Denoising DNA deep sequencing data—High-throughput sequencing errors and their correction. Brief. Bioinform. 2016, 17, 154–179.
30. Laver, T.; Harrison, J.; O’Neill, P.A.; Moore, K.; Farbos, A.; Paszkiewicz, K.; Studholme, D.J. Assessing the performance of the Oxford Nanopore Technologies MinION. Biomol. Detect. Quantif. 2015, 3, 1–8.
31. Ip, C.L.; Loose, M.; Tyson, J.R.; de Cesare, M.; Brown, B.L.; Jain, M.; Leggett, R.M.; Eccles, D.A.; Zalunin, V.; Urban, J.M.; et al. MinION Analysis and Reference Consortium: Phase 1 data release and analysis. F1000Research 2015, 4, 1075.
32. Behjati, S.; Tarpey, P.S. What is next generation sequencing? Arch. Dis. Child.-Educ. Pract. 2013, 98, 236–238.
33. Grada, A.; Weinbrecht, K. Next-generation sequencing: Methodology and application. J. Investig. Dermatol. 2013, 133, e11.
34. Slatko, B.E.; Gardner, A.F.; Ausubel, F.M. Overview of next-generation sequencing technologies. Curr. Protoc. Mol. Biol. 2018, 122, e59.
35. Podnar, J.; Deiderick, H.; Huerta, G.; Hunicke-Smith, S. Next-generation sequencing RNA-Seq library construction. Curr. Protoc. Mol. Biol. 2014, 106, 4–21.
36. Nakagawa, H.; Fujita, M. Whole genome sequencing analysis for cancer genomics and precision medicine. Cancer Sci. 2018, 109, 513–522.
37. Poplin, R.; Chang, P.-C.; Alexander, D.; Schwartz, S.; Colthurst, T.; Ku, A.; Newburger, D.; Dijamco, J.; Nguyen, N.; Afshar, P.T.; et al. A universal SNP and small-indel variant caller using deep neural networks. Nat. Biotechnol. 2018, 36, 983–987.
38. Chen, N.C.; Kolesnikov, A.; Goel, S.; Yun, T.; Chang, P.C.; Carroll, A. Improving variant calling using population data and deep learning. BMC Bioinform. 2023, 24, 197.
39. Luo, R.; Sedlazeck, F.J.; Lam, T.W.; Schatz, M.C. A multi-task convolutional deep neural network for variant calling in single molecule sequencing. Nat. Commun. 2019, 10, 998.
40. Ahsan, M.U.; Gouru, A.; Chan, J.; Zhou, W.; Wang, K. A signal processing and deep learning framework for methylation detection using Oxford Nanopore sequencing. Nat. Commun. 2024, 15, 1448.
41. Cai, L.; Wu, Y.; Gao, J. DeepSV: Accurate calling of genomic deletions from high-throughput sequencing data using deep convolutional neural network. BMC Bioinform. 2019, 20, 665.
42. Singh, A.; Bhatia, P. Intelli-NGS: Intelligent NGS, a deep neural network-based artificial intelligence to delineate good and bad variant calls from IonTorrent sequencer data. bioRxiv 2019.
43. Gurovich, Y.; Hanani, Y.; Bar, O.; Nadav, G.; Fleischer, N.; Gelbman, D.; Basel-Salmon, L.; Krawitz, P.M.; Kamphausen, S.B.; Zenker, M.; et al. Identifying facial phenotypes of genetic disorders using deep learning. Nat. Med. 2019, 25, 60–64.
44. Park, S.; Min, S.; Choi, H.; Yoon, S. deepMiRGene: Deep neural network based precursor microRNA prediction. arXiv 2016, arXiv:1605.00017.
45. Trieu, T.; Martinez-Fundichely, A.; Khurana, E. DeepMILO: A deep learning approach to predict the impact of non-coding sequence variants on 3D chromatin structure. Genome Biol. 2020, 21, 79.
46. Boudellioua, I.; Kulmanov, M.; Schofield, P.N.; Gkoutos, G.V.; Hoehndorf, R. DeepPVP: Phenotype-based prioritization of causative variants using deep learning. BMC Bioinform. 2019, 20, 65.
47. Zhou, J.; Theesfeld, C.L.; Yao, K.; Chen, K.M.; Wong, A.K.; Troyanskaya, O.G. Deep learning sequence-based ab initio prediction of variant effects on expression and disease risk. Nat. Genet. 2018, 50, 1171–1179.
48. Hsieh, T.-C.; Mensah, M.A.; Pantel, J.T.; Aguilar, D.; Bar, O.; Bayat, A.; Becerra-Solano, L.; Bentzen, H.B.; Biskup, S.; Borisov, O.; et al. PEDIA: Prioritization of exome data by image analysis. Genet. Med. 2019, 21, 2807–2814.
49. Ravasio, V.; Ritelli, M.; Legati, A.; Giacopuzzi, E. GARFIELD-NGS: Genomic variants filtering by deep learning models in NGS. Bioinformatics 2018, 34, 3038–3040.
50. Arloth, J.; Eraslan, G.; Andlauer, T.F.M.; Martins, J.; Iurato, S.; Kühnel, B.; Waldenberger, M.; Frank, J.; Gold, R.; Hemmer, B.; et al. DeepWAS: Multivariate genotype-phenotype associations by directly integrating regulatory information using deep learning. PLoS Comput. Biol. 2020, 16, e1007616.
51. Kelley, D.R.; Snoek, J.; Rinn, J.L. Basset: Learning the regulatory code of the accessible genome with deep convolutional neural networks. Genome Res. 2016, 26, 990–999.
52. Quang, D.; Xie, X. DanQ: A hybrid convolutional and recurrent deep neural network for quantifying the function of DNA sequences. Nucleic Acids Res. 2016, 44, e107.
53. Singh, S.; Yang, Y.; Póczos, B.; Ma, J. Predicting enhancer-promoter interaction from genomic sequence with deep neural networks. Quant. Biol. 2019, 7, 122–137.
54. Zeng, W.; Wang, Y.; Jiang, R. Integrating distal and proximal information to predict gene expression via a densely connected convolutional neural network. Bioinformatics 2020, 36, 496–503.
55. Kalkatawi, M.; Magana-Mora, A.; Jankovic, B.; Bajic, V.B. DeepGSR: An optimized deep-learning structure for the recognition of genomic signals and regions. Bioinformatics 2019, 35, 1125–1132.
56. Jaganathan, K.; Panagiotopoulou, S.K.; McRae, J.F.; Darbandi, S.F.; Knowles, D.; Li, Y.I.; Kosmicki, J.A.; Arbelaez, J.; Cui, W.; Schwartz, G.B.; et al. Predicting splicing from primary sequence with deep learning. Cell 2019, 176, 535–548.
57. Du, J.; Jia, P.; Dai, Y.; Tao, C.; Zhao, Z.; Zhi, D. Gene2vec: Distributed representation of genes based on co-expression. BMC Genom. 2019, 20, 82.
58. Movva, R.; Greenside, P.; Marinov, G.K.; Nair, S.; Shrikumar, A.; Kundaje, A. Deciphering regulatory DNA sequences and noncoding genetic variants using neural network models of massively parallel reporter assays. PLoS ONE 2019, 14, e0218073.
59. Kaikkonen, M.U.; Lam, M.T.; Glass, C.K. Non-coding RNAs as regulators of gene expression and epigenetics. Cardiovasc. Res. 2011, 90, 430–440.
60. Chen, X.; Xu, H.; Shu, X.; Song, C.X. Mapping epigenetic modifications by sequencing technologies. Cell Death Differ. 2023.
  61. Zhou, J.; Troyanskaya, O.G. Predicting effects of noncoding variants with deep learning–based sequence model. Nat. Methods 2015, 12, 931–934. [Google Scholar] [CrossRef] [PubMed]
  62. Chiu, Y.-C.; Chen, H.-I.H.; Zhang, T.; Zhang, S.; Gorthi, A.; Wang, L.-J.; Huang, Y.; Chen, Y. Predicting drug response of tumors from integrated genomic profiles by deep neural networks. BMC Med. Genom. 2019, 12, 18. [Google Scholar]
  63. Xie, L.; He, S.; Song, X.; Bo, X.; Zhang, Z. Deep learning-based transcriptome data classification for drug-target interaction prediction. BMC Genom. 2018, 19, 667. [Google Scholar] [CrossRef]
  64. Wang, Y.; Li, F.; Bharathwaj, M.; Rosas, N.C.; Leier, A.; Akutsu, T.; Webb, G.I.; Marquez-Lago, T.T.; Li, J.; Lithgow, T.; et al. DeepBL: A deep learning-based approach for in silico discovery of beta-lactamases. Brief. Bioinform. 2021, 22, bbaa301. [Google Scholar] [CrossRef]
  65. Pu, L.; Govindaraj, R.G.; Lemoine, J.M.; Wu, H.C.; Brylinski, M. DeepDrug3D: Classification of ligand-binding pockets in proteins with a convolutional neural network. PLoS Comput. Biol. 2019, 15, e1006718. [Google Scholar] [CrossRef]
  66. Kuenzi, B.M.; Park, J.; Fong, S.H.; Sanchez, K.S.; Lee, J.; Kreisberg, J.F.; Ma, J.; Ideker, T. Predicting drug response and synergy using a deep learning model of human cancer cells. Cancer Cell 2020, 38, 672–684. [Google Scholar] [CrossRef]
  67. Mavropoulos, A.; Johnson, C.; Lu, V.; Nieto, J.; Schneider, E.C.; Saini, K.; Phelan, M.L.; Hsie, L.X.; Wang, M.J.; Cruz, J.; et al. Artificial Intelligence-Driven Morphology-Based Enrichment of Malignant Cells from Body Fluid. Mod. Pathol. 2023, 36, 100195. [Google Scholar] [CrossRef] [PubMed]
  68. Qiu, H.; Wang, M.; Cao, T.; Feng, Y.; Zhang, Y.; Guo, R. Low-coverage whole-genome sequencing for the effective diagnosis of early endometrial cancer: A pilot study. Heliyon 2023, 9, e19323. [Google Scholar] [CrossRef] [PubMed] [PubMed Central]
  69. van der Hoeven, J.J.M.; Monkhorst, K.; van de Wouw, A.J.; Roepman, P. Onbekende primaire tumor opsporen met ‘whole genome sequencing’ [Whole genome sequencing to find the primary tumour in cancer of unknown primary origin]. Ned. Tijdschr. Geneeskd. 2023, 167, D7625. [Google Scholar] [PubMed]
  70. Akhoundova, D.; Rubin, M.A. The grand challenge of moving cancer whole-genome sequencing into the clinic. Nat. Med. 2024, 30, 39–40. [Google Scholar] [CrossRef] [PubMed]
  71. Cao, T.M.; Tran, N.H.; Nguyen, P.L.; Pham, H. Multimodal contrastive learning for diagnosing Cardiovascular diseases from electrocardiography (ECG) signals and patient metadata. arXiv 2023, arXiv:2304.11080. [Google Scholar]
  72. Carreras, J.; Nakamura, N. Artificial Intelligence, Lymphoid Neoplasms, and Prediction of MYC, BCL2, and BCL6 Gene Expression Using a Pan-Cancer Panel in Diffuse Large B-Cell Lymphoma. Hemato 2024, 5, 119–143. [Google Scholar] [CrossRef]
  73. Gumbs, A.A.; Croner, R.; Abu-Hilal, M.; Bannone, E.; Ishizawa, T.; Spolverato, G.; Frigerio, I.; Siriwardena, A.; Messaoudi, N. Surgomics and the Artificial intelligence, Radiomics, Genomics, Oncopathomics and Surgomics (AiRGOS) Project. Artif. Intell. Surg. 2023, 3, 180–185. [Google Scholar] [CrossRef]
  74. Li, J.; Liu, H.; Liu, W.; Zong, P.; Huang, K.; Li, Z.; Li, H.; Xiong, T.; Tian, G.; Li, C.; et al. Predicting gastric cancer tumor mutational burden from histopathological images using multimodal deep learning. Brief. Funct. Genom. 2024, 23, 228–238. [Google Scholar] [CrossRef] [PubMed]
  75. Mondol, R.K.; Millar, E.K.A.; Graham, P.H.; Browne, L.; Sowmya, A.; Meijering, E. hist2RNA: An Efficient Deep Learning Architecture to Predict Gene Expression from Breast Cancer Histopathology Images. Cancers 2023, 15, 2569. [Google Scholar] [CrossRef] [PubMed] [PubMed Central]
  76. Bagger, F.O.; Borgwardt, L.; Jespersen, A.S.; Hansen, A.R.; Bertelsen, B.; Kodama, M.; Nielsen, F.C. Whole genome sequencing in clinical practice. BMC Med. Genom. 2024, 17, 39. [Google Scholar] [CrossRef] [PubMed] [PubMed Central]
  77. Ulph, F.; Bennett, R. Psychological and Ethical Challenges of Introducing Whole Genome Sequencing into Routine Newborn Screening: Lessons Learned from Existing Newborn Screening. New Bioeth. 2023, 29, 52–74. [Google Scholar] [CrossRef] [PubMed]
  78. Katsuya, Y. Current and future trends in whole genome sequencing in cancer. Cancer Biol. Med. 2024, 21, 16–20. [Google Scholar] [CrossRef] [PubMed] [PubMed Central]
  79. Preuer, K.; Lewis, R.P.; Hochreiter, S.; Bender, A.; Bulusu, K.C.; Klambauer, G. DeepSynergy: Predicting anti-cancer drug synergy with Deep Learning. Bioinformatics 2018, 34, 1538–1546. [Google Scholar] [CrossRef]
  80. Alharbi, W.S.; Rashid, M. A review of deep learning applications in human genomics using next-generation sequencing data. Hum. Genom. 2022, 16, 26. [Google Scholar] [CrossRef]
  81. Kinoshita, M.; Ueda, D.; Matsumoto, T.; Shinkawa, H.; Yamamoto, A.; Shiba, M.; Okada, T.; Tani, N.; Tanaka, S.; Kimura, K.; et al. Deep Learning Model Based on Contrast-Enhanced Computed Tomography Imaging to Predict Postoperative Early Recurrence after the Curative Resection of a Solitary Hepatocellular Carcinoma. Cancers 2023, 15, 2140. [Google Scholar] [CrossRef] [PubMed] [PubMed Central]
  82. Chen, L.; Zhang, C.; Xue, R.; Liu, M.; Bai, J.; Bao, J.; Wang, Y.; Jiang, N.; Li, Z.; Wang, W.; et al. Deep whole-genome analysis of 494 hepatocellular carcinomas. Nature 2024, 627, 586–593. [Google Scholar] [CrossRef] [PubMed]
  83. Samsom, K.G.; Bosch, L.J.W.; Schipper, L.J.; Schout, D.; Roepman, P.; Boelens, M.C.; Lalezari, F.; Klompenhouwer, E.G.; de Langen, A.J.; Buffart, T.E.; et al. Optimized whole-genome sequencing workflow for tumor diagnostics in routine pathology practice. Nat. Protoc. 2024, 19, 700–726. [Google Scholar] [CrossRef] [PubMed]
  84. Iacobucci, G. Whole genome sequencing can help guide cancer care, study reports. BMJ 2024, 384, q65. [Google Scholar] [CrossRef] [PubMed]
  85. Haga, Y.; Sakamoto, Y.; Kajiya, K.; Kawai, H.; Oka, M.; Motoi, N.; Shirasawa, M.; Yotsukura, M.; Watanabe, S.I.; Arai, M.; et al. Whole-genome sequencing reveals the molecular implications of the stepwise progression of lung adenocarcinoma. Nat. Commun. 2023, 14, 8375. [Google Scholar] [CrossRef] [PubMed] [PubMed Central]
  86. Lancia, G.; Varkila, M.R.J.; Cremer, O.L.; Spitoni, C. Two-step interpretable modeling of ICU-AIs. Artif. Intell. Med. 2024, 151, 102862. [Google Scholar] [CrossRef] [PubMed]
  87. Chow, B.J.W.; Fayyazifar, N.; Balamane, S.; Saha, N.; Clarkin, O.; Green, M.; Maiorana, A.; Golian, M.; Dwivedi, G. Interpreting Wide-Complex Tachycardia using Artificial Intelligence. Can. J. Cardiol. 2024, 1–9. [Google Scholar] [CrossRef] [PubMed]
  88. Auffray, C.; Chen, Z.; Hood, L. Systems medicine: The future of medical genomics and healthcare. Genome Med. 2009, 1, 2. [Google Scholar] [CrossRef]
  89. Caudai, C.; Galizia, A.; Geraci, F.; Le Pera, L.; Morea, V.; Salerno, E.; Via, A.; Colombo, T. AI applications in functional genomics. Comput. Struct. Biotechnol. J. 2021, 19, 5762–5790. [Google Scholar] [CrossRef]
  90. Mann, M.; Kumar, C.; Zeng, W.; Strauss, M.T. Perspective Artificial intelligence for proteomics and biomarker discovery. Cell Syst. 2021, 12, 759–770. [Google Scholar] [CrossRef]
  91. Kiechle, F.L.; Holland-Staley, C.A. Genomics, transcriptomics, proteomics, and numbers. Arch. Pathol. Lab. Med. 2003, 127, 1089–1097. [Google Scholar] [CrossRef]
  92. Lowe, R.; Shirley, N.; Bleackley, M.; Dolan, S.; Shafee, T. Transcriptomics technologies. PLoS Comput. Biol. 2017, 13, e1005457. [Google Scholar] [CrossRef]
  93. Supplitt, S.; Karpinski, P.; Sasiadek, M.; Laczmanska, I. Current Achievements and Applications of Transcriptomics in Personalized Cancer Medicine. Int. J. Mol. Sci. 2021, 22, 1422. [Google Scholar] [CrossRef] [PubMed]
  94. Gui, Y.; He, X.; Yu, J.; Jing, J. Artificial Intelligence-Assisted Transcriptomic Analysis to Advance Cancer Immunotherapy. J. Clin. Med. 2023, 12, 1279. [Google Scholar] [CrossRef] [PubMed]
Figure 1. Whole-genome sequencing can enable prediction of chemotherapy response. Certain genetic markers or patterns may be associated with sensitivity or resistance to specific chemotherapeutic agents. By analyzing the tumor's entire genetic profile, clinicians can predict the likelihood of response to different chemotherapy drugs and select the most effective treatment regimen for an individual patient.
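To make the premise of Figure 1 concrete, the following minimal sketch shows how a tumor's genome-wide variant profile could, in principle, feed a response classifier. The data, the gene-level binary encoding, and the choice of a random-forest model are illustrative assumptions on our part, not the method of any study cited above.

```python
# Illustrative sketch only: a hypothetical classifier mapping a binary
# gene-level variant profile (1 = somatic variant present) to the
# probability of response to a given chemotherapeutic agent.
# All data here are randomly generated stand-ins.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Hypothetical cohort: 200 tumors x 20,000 genes, with known response labels.
X_train = rng.integers(0, 2, size=(200, 20_000))
y_train = rng.integers(0, 2, size=200)  # 1 = responder, 0 = non-responder

model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit(X_train, y_train)

# Score a new tumor's whole-genome variant profile.
x_new = rng.integers(0, 2, size=(1, 20_000))
print(f"Predicted response probability: {model.predict_proba(x_new)[0, 1]:.2f}")
```

A real pipeline would replace the random matrices with curated variant calls and clinically annotated response data, and would require rigorous validation before informing treatment decisions.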
Figure 2. AI-powered analysis of WGS data can create more powerful cancer treatment paradigms, integrating deep learning to optimize surgical resection and neoadjuvant and adjuvant treatment options. Algorithms trained on comparative survival outcomes can identify the best chemotherapy/immunotherapy regimens and predict long-term survival.
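Similarly, a brief hedged sketch of the regimen-ranking idea in Figure 2: train a survival classifier on genomic features plus a one-hot regimen indicator, then score each candidate regimen for a new patient. The regimen names, features, and data below are hypothetical placeholders, not a validated clinical model.

```python
# Illustrative sketch only: rank candidate regimens by a model's predicted
# probability of long-term survival. All names and data are hypothetical.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
regimens = ["FOLFOX", "FOLFIRI", "gemcitabine", "immunotherapy"]  # placeholder list

# Hypothetical training set: genomic features + one-hot regimen -> 5-year survival.
n_patients, n_genes = 500, 1_000
X_genomic = rng.integers(0, 2, size=(n_patients, n_genes))
X_regimen = np.eye(len(regimens))[rng.integers(0, len(regimens), size=n_patients)]
X_train = np.hstack([X_genomic, X_regimen])
y_train = rng.integers(0, 2, size=n_patients)  # 1 = survived >= 5 years

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# For a new patient, score every regimen and pick the highest predicted survival.
x_patient = rng.integers(0, 2, size=n_genes)
scores = {
    name: model.predict_proba(
        np.hstack([x_patient, np.eye(len(regimens))[i]]).reshape(1, -1)
    )[0, 1]
    for i, name in enumerate(regimens)
}
print(max(scores, key=scores.get), scores)
```

Such a ranking is only as trustworthy as the outcome data behind it; comparative survival cohorts, confounder adjustment, and prospective validation would be prerequisites for any clinical use.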