Best In Silico Epitope Design
Immunoinformatics
Third Edition
METHODS IN MOLECULAR BIOLOGY
Series Editor
John M. Walker
School of Life and Medical Sciences
University of Hertfordshire
Hatfield, Hertfordshire, UK
Edited by
Namrata Tomar
Department of BioMedical Engineering, Medical College of Wisconsin, Milwaukee, WI, USA
This Humana imprint is published by the registered company Springer Science+Business Media, LLC, part of Springer Nature.
The registered company address is: 1 New York Plaza, New York, NY 10004, U.S.A.
Preface
The immune system is very complex and consists of numerous cell types, molecular pathways, and signals that help a host distinguish between normal, healthy cells and unhealthy cells. Each immune cell type has a specific role and its own ways of recognizing potentially harmful foreign bodies. By understanding the details of an immune network, a researcher may be able to optimize immune responses for a specific problem, which may range from minor infections to cancer. This requires data mining, statistics, and machine learning approaches to convert high-throughput immune data into meaningful insights. In simpler terms, immunoinformatics is the application of bioinformatics methods, mathematical models, and statistical techniques to the study of immune systems biology. The development of immunoinformatics tools, databases, and models involves computer scientists and modeling experts working closely with immunologists in multidisciplinary teams. Modeling and computational approaches have been widely applied to problems in immunology, such as quantifying the data generated in laboratory experiments and extracting meaningful biological information on their kinetics. To realize the value of computational tools and models in immunology research, we need a variety of immune system-related databases, prediction software and modeling tools, informatics and computational infrastructure for connecting computer modeling with wet-lab experimentation, as well as data analytics and visualization.
This book consists of 23 chapters that cover diverse immunoinformatics research topics, including tools and databases for potential epitope prediction, HLA gene analysis, MHC characterization, in silico vaccine design, mathematical modeling of host-pathogen interactions, and network analysis of immune system data.
Chapter 1 introduces the reverse vaccinology approach and its advantages and applications. Reverse vaccinology searches genomic sequences to predict antigens that can be used as potential vaccine candidates. The chapter describes the web tools, databases, and software required to predict potential epitopes for vaccine development.
Chapter 2 introduces a peptide-based vaccine approach to design an in silico vaccine
against Zika virus.
Chapter 3 focuses on high-definition genomic analysis of the human leukocyte antigen (HLA) genes that encode major histocompatibility complex (MHC) proteins. HLA alleles are genotyped from whole genome sequencing data, whole exome sequencing data, or targeted sequencing of HLA genes using next-generation sequencing technology.
Chapter 4 describes the detailed steps of a computational vaccine design for MERS-CoV infections. It relies mostly on IEDB tools to predict suitable MERS-CoV epitopes for vaccine design.
Chapter 5 introduces an alignment-independent platform for allergenicity prediction. It
utilizes three modular servers to assess the allergenicity of a randomly selected allergenic
protein and demonstrates a protocol for fast and reliable in silico prediction.
Contributors
SHUJI KAWAGUCHI • Center for Genomic Medicine, Kyoto University Graduate School of
Medicine, Kyoto, Japan
KUMARI SNEHKANT LATA • Gujarat Biotechnology Research Centre, Department of Science
and Technology, Government of Gujarat, Gandhinagar, India; Department of Botany,
Bioinformatics and Climate Change, Gujarat University, Ahmedabad, India
SHIDE LIANG • Department of R&D, Bio-Thera Solutions, Guangzhou, China
NEHA LOHIA • Department of Biotechnology, Thapar Institute of Engineering and
Technology, Patiala, India; School of Life Sciences, Jaipur National University, Jaipur,
India
SMARAJIT MANNA • Centre for Interdisciplinary Research and Education, Kolkata, India;
Jagadis Bose National Science Talent Search, Kolkata, India
FUMIHIKO MATSUDA • Center for Genomic Medicine, Kyoto University Graduate School of
Medicine, Kyoto, Japan
KUSUM MEHLA • National Bureau of Animal Genetic Resources, Karnal, Haryana, India
QING MENG • State Key Laboratory of Proteomics, Beijing Proteome Research Center,
National Center for Protein Sciences-Beijing (PHOENIX Center), Beijing Institute of
Lifeomics, Beijing, China
ASHESH NANDY • Centre for Interdisciplinary Research and Education, Kolkata, India
BHARGAV PANDYA • Department of Biotechnology and Biological Sciences, Institute of
Advanced Research (IAR), Gandhinagar, Gujarat, India
SAUMYA PATEL • Department of Botany, Bioinformatics and Climate Change, Gujarat
University, Ahmedabad, India
TRIDIP PHUKAN • Microbiology Division, Department of Botany, Gauhati University,
Guwahati, Assam, India
RICHA RAGHUWANSHI • Department of Botany, Mahila Mahavidyalaya, Banaras Hindu
University, Varanasi, UP, India
JAYASHREE RAMANA • Department of Biotechnology and Bioinformatics, Jaypee University of
Information Technology, Waknaghat, HP, India
ALVARO RAS-CARMONA • Department of Immunology, School of Medicine, Complutense
University of Madrid, Madrid, Spain
PEDRO A. RECHE • Department of Immunology, School of Medicine, Complutense University
of Madrid, Madrid, Spain
DAVID ROSSELL • Department of Economics and Business, Universitat Pompeu Fabra,
Barcelona, Spain
SOUMEN ROY • Department of Physics, Bose Institute, Kolkata, India
JOSE L. SANCHEZ-TRINCADO • Department of Immunology, School of Medicine, Complutense
University of Madrid, Madrid, Spain
ASHWANI SHARMA • Biopredic International, Parc d’activité de la Bretèche Bâtiment A4,
Saint Grégoire, France
PUI YAN SIAK • Department of Biotechnology, Faculty of Applied Sciences, UCSI University,
Kuala Lumpur, Malaysia
SAPTARSHI SINHA • Department of Physics, Bose Institute, Kolkata, India
SUBRATA SINHA • Centre for Biotechnology and Bioinformatics, Dibrugarh University,
Dibrugarh, Assam, India
MICHAEL SLIFKER • Cancer Biology Program, Fox Chase Cancer Center, Philadelphia, PA,
USA
DIVYA TARWADI • Department of Biotechnology and Biological Sciences, Institute of Advanced
Research (IAR), Gandhinagar, Gujarat, India
Abstract
The application of pharmacogenomics and pharmacogenetics to vaccine design, closely combined with bioinformatics, has recently been termed "vaccinomics." The enormous amount of information generated by whole genome sequencing projects and the rise of bioinformatics have triggered a new era of vaccine research and development, leading to a "third generation" of vaccines based on the application of vaccinomics to vaccinology. The first example of such an approach is reverse vaccinology, which reduces the time needed for vaccine target detection and evaluation to 1–2 years. This approach starts from the genomic sequence and predicts the antigens that are most likely to be vaccine candidates. It allows not only the identification of all the antigens found by previous methods but also the discovery of new antigens that work on a totally different paradigm; hence, it helps in the discovery of novel mechanisms of immune intervention. Epitope-based immune-derived vaccines (IDV) are generally considered safer than vectored or live attenuated vaccines. Epitope-based IDV may also provide essential T-cell help for antibody-directed vaccines. Such vaccines may have a significant advantage over earlier vaccine design approaches, as the careful selection of the components may diminish.
1 Introduction
2 Amol M. Kanampalliwar
1.1 Modification in Reverse Vaccinology
1.1.1 Pan Genomic Reverse Vaccinology
In this approach, the genomes of different isolates of the same organism are compared with each other by computational analysis. The first pan-genome approach was applied to Streptococcus agalactiae [13].
1.1.2 Comparative Reverse Vaccinology
In this approach, the pathogenic and nonpathogenic strains of one species are compared at the genetic level. It deals with differences in the structure of the proteins of these organisms.
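At their simplest, the pan-genomic and comparative comparisons above reduce to set operations on gene content. The following Python sketch assumes that each isolate has already been annotated into a set of gene identifiers (the gene and isolate names are hypothetical placeholders). It computes the core genome (genes in every isolate), the pan genome (genes in any isolate), and the genes found in a pathogenic strain but absent from a nonpathogenic one.

```python
from typing import Dict, Set, Tuple


def core_and_pan_genome(isolates: Dict[str, Set[str]]) -> Tuple[Set[str], Set[str]]:
    """Return (core, pan) gene sets for a collection of isolates."""
    gene_sets = list(isolates.values())
    core = set.intersection(*gene_sets)  # genes shared by every isolate
    pan = set.union(*gene_sets)          # genes present in at least one isolate
    return core, pan


def pathogen_specific(pathogenic: Set[str], nonpathogenic: Set[str]) -> Set[str]:
    """Genes present in the pathogenic strain but missing from the nonpathogenic one."""
    return pathogenic - nonpathogenic


# Hypothetical gene content; real input would come from genome annotation.
isolates = {
    "isolate_1": {"geneA", "geneB", "geneC", "geneD"},
    "isolate_2": {"geneA", "geneB", "geneE"},
    "isolate_3": {"geneA", "geneB", "geneC", "geneF"},
}

core, pan = core_and_pan_genome(isolates)
print(sorted(core))  # ['geneA', 'geneB']
print(sorted(pan))   # ['geneA', 'geneB', 'geneC', 'geneD', 'geneE', 'geneF']
print(sorted(pathogen_specific(isolates["isolate_1"], isolates["isolate_2"])))  # ['geneC', 'geneD']
```

In practice, deciding that two genes from different isolates are "the same" requires homology clustering rather than matching names. This sketch assumes that step has already been done.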
1.3 The Role of Epitope Prediction in Reverse Vaccinology
When conventional approaches fail to produce a vaccine, nonconventional approaches must be followed. To date, the genomic sequences of more than 500 pathogens, including bacteria and viruses, are available on the NIH list. Because techniques are now available for studying host-pathogen interactions, whole genomes, and every unique gene, work is now focused on the development of target-specific, epitope-driven vaccines.
An epitope is an antigenic determinant that plays an important role in the immunity of an organism. Epitopes are present on the surface of antigens, where they can be detected by antibodies [15]. Reverse vaccinology uses computational analysis of the genome to predict epitopes on surface proteins, so epitopes play an important role in the development of a candidate vaccine. B and T lymphocytes play the major roles in the immune response. B cells recognize the epitopes of antigens through the paratopes of antibodies. T cells, in turn, mediate cell-mediated immunity: processed antigenic peptides interact with the T cell when they are presented in the context of MHC molecules. So the prediction of the epitopes of T
2 Materials
3 Methods
Table 1
List of selected HIV-1 proteins
Name        Accession     Start (nt)  Stop (nt)  GeneID   Locus  Locus tag  Protein product  Length (aa)
Pr55(gag)   NC_001802.1   336         1838       155030   Gag    HIV1gp2    NP_057850.1      500
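The Start, Stop, and Length columns of Table 1 are mutually consistent, which makes a useful sanity check when selecting proteins. The sketch below assumes 1-based, inclusive nucleotide coordinates whose range includes the stop codon: 1838 − 336 + 1 = 1503 nt, or 501 codons, of which 500 encode amino acids.

```python
def protein_length_from_cds(start: int, stop: int, includes_stop_codon: bool = True) -> int:
    """Number of amino acids encoded by a CDS given 1-based, inclusive coordinates."""
    n_nt = stop - start + 1
    if n_nt % 3 != 0:
        raise ValueError(f"CDS length {n_nt} nt is not a multiple of 3")
    codons = n_nt // 3
    # The stop codon encodes no amino acid, so drop it when the range includes it.
    return codons - 1 if includes_stop_codon else codons


print(protein_length_from_cds(336, 1838))  # 500
```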
Reverse Vaccinology and Its Applications 5
Fig. 1 Query window for the prediction of MHC I binding epitopes at RANKPEP
Table 2
List of predicted epitopes against MHC class I for Pr55 (Gag)
Table 3
List of predicted epitopes against MHC class II for Pr55 (Gag)
Table 4
List of predicted models of MHC I epitopes of Pr55 (Gag)
Table 5
List of predicted models of IEDB MHC II epitopes of Pr55 (Gag)
aln = alignment(env)
aln.append(file='gp1.ali', align_codes='gp1')
10. Running this creates two output files. The Compare.txt file contains a dendrogram of all candidate templates. With the help of this dendrogram, the single best and most closely related template was chosen for the next step: the chosen template had the best (lowest) resolution value and the highest similarity to the query sequence (epitope).
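The template choice in step 10 combines two criteria. The following sketch encodes one possible ordering: it ranks by sequence identity first and uses resolution only to break ties. That priority, the Template fields, and the example values are assumptions for illustration. Nothing here is parsed from Compare.txt.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Template:
    name: str           # hypothetical template identifier
    identity: float     # percent sequence identity to the query
    resolution: float   # resolution in angstroms; a lower value is better


def pick_template(candidates: List[Template]) -> Template:
    """Prefer the highest identity; break ties with the lowest resolution value."""
    return max(candidates, key=lambda t: (t.identity, -t.resolution))


candidates = [
    Template("template_1", identity=62.0, resolution=2.8),
    Template("template_2", identity=62.0, resolution=1.9),
    Template("template_3", identity=55.0, resolution=1.2),
]
print(pick_template(candidates).name)  # template_2
```

If resolution matters more than identity in a given project, swap the order of the two key components.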
11. The Align2d.py file was opened in WordPad, and the PDB ID of the selected template was specified in the program lines:
13. The file model-single.py was opened in WordPad, and the chosen template name was entered in the relevant program lines:
15. This step generated five candidate models for the target epitope, whose details are recorded in the output file model-single.txt. The model with the lowest DOPE score was selected as the final model for the target epitope.
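Selecting the final model in step 15 amounts to taking the minimum over the DOPE scores of the five candidates. In this sketch, the scores are assumed to have been copied from model-single.txt into a dictionary by hand. The numbers are invented placeholders, and the file names only follow the gp1.B9999000N.pdb pattern used in step 17.

```python
from typing import Dict


def best_model_by_dope(dope_scores: Dict[str, float]) -> str:
    """Return the model whose DOPE score is lowest (most favorable)."""
    if not dope_scores:
        raise ValueError("no scored models supplied")
    return min(dope_scores, key=lambda name: dope_scores[name])


# Placeholder scores; real values are read from the modeling output.
scores = {
    "gp1.B99990001.pdb": -1510.2,
    "gp1.B99990002.pdb": -1587.9,
    "gp1.B99990003.pdb": -1543.6,
    "gp1.B99990004.pdb": -1499.1,
    "gp1.B99990005.pdb": -1562.4,
}
print(best_model_by_dope(scores))  # gp1.B99990002.pdb
```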
16. To analyze the generated structure, a Ramachandran plot was generated with Swiss-PdbViewer [16].
17. The modeled structure was opened in Swiss-PdbViewer:
File → open pdb file → gp1.B99990002.pdb.
Select → all.
Wind → Ramachandran Plot.
Images of the modeled epitopes were rendered with RasMol:
Display → wireframe.
Colour → group.
Option → label.
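Swiss-PdbViewer computes the backbone torsion angles behind the Ramachandran plot internally. For readers who want to check individual residues, the following sketch computes a signed dihedral angle from four atom coordinates using the standard atan2 formulation. Extracting the N, CA, and C coordinates from gp1.B99990002.pdb is not shown and would need a PDB parser.

```python
import math
from typing import Sequence

Point = Sequence[float]


def _sub(a: Point, b: Point) -> tuple:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _cross(a: Point, b: Point) -> tuple:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _dot(a: Point, b: Point) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def dihedral(p0: Point, p1: Point, p2: Point, p3: Point) -> float:
    """Signed torsion angle in degrees, in the range [-180, 180], defined by four atoms."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)  # normals of the two planes
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


# phi(i) = dihedral(C[i-1], N[i], CA[i], C[i]); psi(i) = dihedral(N[i], CA[i], C[i], N[i+1])
print(round(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)), 6))  # 90.0
```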
3.4 Molecular Docking (Tables 6, 7, 8, 9 and Figs. 8, 9)
In this project, AutoDock software was used for molecular docking, and the protocol was designed based on the results of previous experiments [17, 18].
1. Broadly neutralizing antibodies (BrNAbs) produced by the immune systems of some HIV-infected patients were selected for molecular docking [19–21]. The PDB structures of the antibodies were downloaded from http://www.rcsb.org/pdb/home/home.do.
2. A folder was created containing the files needed for docking with AutoDock 1.5.4: the PDB files of the target protein (antibody) and the ligand protein (modeled epitope structure), and the autodock and autogrid program files.
Table 6
Docking results of RANKPEP MHC I Pr55 (gag) epitopes
Table 7
Software and their links for B-cell epitope prediction (Adapted from Kanampalliwar et al., 2013) [22]
Epitope prediction tool   URL
Bepitope    http://www-dsv.cea.fr/en/institutes/institute-of-environmental-biology-and-biotechnology-ibeb/services2/department-of-biochemistry-and-nuclear-toxicology-sbtn/molecular-recognition-and-interactions-laboratory-lirm/research/software/bepitope
BcePred     http://www.imtech.res.in/raghava/bcepred/
Pepitope    http://pepitope.tau.ac.il/
Ellipro     http://tools.immuneepitope.org/tools/ElliPro
Epitopia    http://epitopia.tau.ac.il
ABCpred     http://www.imtech.res.in/raghava/abcpred
FBCpred     http://ailab.cs.iastate.edu/bcpreds/
Discotope   http://www.cbs.dtu.dk/services/DiscoTope-2.0
3. The software was opened, and the following steps were followed for docking.
4. The target file (antibody PDB file) was prepared first:
The target file was opened in AutoDock:
File → read molecule → open target file.
Color → by atom type → all geometries → ok.
Edit → hydrogens → add polar only.
Table 8
Immunological databases and their links (Adapted from Kanampalliwar et al., 2013) [22]
Table 9
Software and their links for T-cell epitope prediction (MHC-binding peptides) (Adapted from Kanampalliwar et al., 2013) [22]
5. The ligand file (modeled epitope PDB file) was then prepared:
Ligand → input → open pdb file of ligand → ok.
Ligand → torsion tree → detect root.
Ligand → torsion tree → show/hide root marker.
Ligand → torsion tree → choose torsions → done.
Ligand → torsion tree → set number of torsions → dismiss.
Ligand → output → save PDBQT → save as ligand.pdbqt.
10. After completion of the above step (which takes 1 to 3 h), a file "dock.dlg" was created in the folder. This file contains the binding energies, docking energies, and RMSD values on which the analysis of the docked conformations is based.
11. The dock.dlg file was then opened in AutoDock:
Analyze → Docking → open the dock.dlg file.
Analyze → macromolecule → open.
Analyze → conformations → play → click on "&" → show info.
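Step 11 inspects dock.dlg interactively. To rank many runs at once, the file can also be scanned as text. The following sketch assumes the AutoDock 4 log layout in which each docked conformation carries a "DOCKED: USER ... Estimated Free Energy of Binding = ... kcal/mol" line. That layout is an assumption to verify against your own output, and the sample values are invented.

```python
import re
from typing import List

# Assumed AutoDock 4 .dlg layout: one "DOCKED: USER ... Estimated Free Energy of Binding"
# line per run. Check this against your own dock.dlg before relying on it.
ENERGY_RE = re.compile(
    r"^DOCKED:.*?Estimated Free Energy of Binding\s*=\s*([-+]?\d+(?:\.\d+)?)\s*kcal/mol",
    re.MULTILINE,
)


def binding_energies(dlg_text: str) -> List[float]:
    """Estimated free energies of binding (kcal/mol), in the order the runs appear."""
    return [float(m.group(1)) for m in ENERGY_RE.finditer(dlg_text)]


# Hypothetical excerpt mimicking the assumed format (values are invented).
sample = """\
DOCKED: MODEL        1
DOCKED: USER    Estimated Free Energy of Binding    =   -6.12 kcal/mol  [=(1)+(2)+(3)-(4)]
DOCKED: MODEL        2
DOCKED: USER    Estimated Free Energy of Binding    =   -7.48 kcal/mol  [=(1)+(2)+(3)-(4)]
"""

energies = binding_energies(sample)
best_run = energies.index(min(energies)) + 1  # runs are numbered from 1
print(energies, best_run)  # [-6.12, -7.48] 2
```

For a real run, read the file with `open("dock.dlg").read()` and pass the text to `binding_energies`. Restricting matches to lines that start with "DOCKED:" is intended to avoid counting conformations twice if they are repeated later in the clustering summary.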
References
1. Poland GA, Ovsyannikova IG, Jacobson RM (2009) Application of pharmacogenomics to vaccines. Pharmacogenomics 10(5):837–852
2. Bagnoli F et al (2011) Designing the next generation of vaccines for global public health. OMICS 15(9):545–566
3. Rappuoli R (2000) Reverse vaccinology. Curr Opin Microbiol 3(5):445–450
4. Elliott SL et al (2008) Phase I trial of a CD8+ T-cell peptide epitope-based vaccine for infectious mononucleosis. J Virol 82(3):1448–1457
5. Gahery H et al (2006) New CD4+ and CD8+ T cell responses induced in chronically HIV type-1-infected patients after immunizations with an HIV type 1 lipopeptide vaccine. AIDS Res Hum Retrovir 22(7):684–694
6. Asjo B et al (2002) Phase I trial of a therapeutic HIV type 1 vaccine, Vacc-4x, in HIV type 1-infected individuals with or without antiretroviral therapy. AIDS Res Hum Retrovir 18(18):1357–1365
7. Kran AM et al (2004) HLA- and dose-dependent immunogenicity of a peptide-based HIV-1 immunotherapy candidate (Vacc-4x). AIDS 18(14):1875–1883
8. De Groot AS et al (2011) Tools for vaccine design: prediction and validation of highly immunogenic and conserved class II epitopes and development of epitope-driven vaccines, in development of vaccines. John Wiley & Sons, Inc., Hoboken, New Jersey, pp 65–94
9. Lara HH, Garza-Treviño EN, Ixtepan-Turrent L, Singh DK (2011) Silver nanoparticles are broad-spectrum bactericidal and virucidal compounds. J Nanobiotechnology 9:30
10. Geels MJ et al (2011) European vaccine initiative: lessons from developing malaria vaccines. Expert Rev Vaccines 10(12):1697–1708
11. Rinaudo CD, Telford JL, Rappuoli R, Seib KL (2009) Vaccinology in the genome era. J Clin Invest 119(9):2515–2525
12. LM L (2010) New strategies for vaccine development. SPCV 2:e4
13. Lefébure T, Stanhope MJ (2007) Evolution of the core and pan-genome of streptococcus: positive selection, recombination, and genome composition. Genome Biol 8:R71
14. Donati C, Rappuoli R (2013) Reverse vaccinology in the 21st century: improvements over the original design. Ann N Y Acad Sci 1285:115–132
15. Ansari HR, Raghava GP (2010) Identification of conformational B-cell epitopes in an antigen from its primary sequence. Immunome Res 6:6
16. Guex N, Peitsch MC (1997) SWISS-MODEL and the Swiss-PdbViewer: an environment for comparative protein modeling. Electrophoresis 18(15):2714–2723
17. Goodsell DS, Olson AJ (1990) Automated docking of substrates to proteins by simulated annealing. Proteins 8(3):195–202
18. Sotriffer CA et al (2000) Automated docking of ligands to antibodies: methods and applications. Methods 20(3):280–291
19. Walker LM et al (2011) Broad neutralization coverage of HIV by multiple highly potent antibodies. Nature 477(7365):466–470
20. Walker LM et al (2009) Broad and potent neutralizing antibodies from an African donor reveal a new HIV-1 vaccine target. Science 326(5950):285–289
21. Trkola A et al (1996) Human monoclonal antibody 2G12 defines a distinctive neutralization epitope on the gp120 glycoprotein of human immunodeficiency virus type 1. J Virol 70(2):1100–1108
22. Kanampalliwar AM, Soni R, Girdhar A, Tiwari A (2013) Web based tools and databases for epitope prediction and analysis: a contextual review. Int J Comput Bioinform In Silico Model 2(4):180–185
Chapter 2
Abstract
With the increasing frequency of viral epidemics, vaccines that augment the human immune response have been the medium of choice to combat viral infections. The tragic consequences of the Zika virus epidemic in South and Central America a few years ago brought these issues into sharper focus. While traditional vaccine development is time-consuming and expensive, recent advances in information technology, immunoinformatics, genetics, bioinformatics, and related sciences have opened the doors to new paradigms in vaccine design and application.
Peptide vaccines are one of the new approaches to vaccine formulation. In this chapter, we discuss the various issues involved in the design of peptide vaccines, along with their advantages and shortcomings, with special reference to the Zika virus, for which no drugs or vaccines are yet available. In the process, we outline our work in this field and give a detailed step-by-step description of the protocol we follow for such vaccine design, so that interested researchers can easily follow it and carry out their own designs. Several flowcharts and figures are included to provide background on the software to be used and the results to be anticipated.
Key words Peptide vaccine, Sequence descriptors, Vaccine design protocol, Alignment-free techniques, Average solvent accessibility (ASA) and protein variability, Epitopes, Graphical methods
1 Introduction
18 Ashesh Nandy et al.
Fig. 1 Flowchart: How the immune system works against a viral infection
3.1 Graphical Methods
Sometimes the available information is partial, as, e.g., in the case of the Zika virus during the early days of the outbreak. In this case, we make a simple 2D graphical representation of the available sequence and examine how the various data fragments fit and whether they can be used for a deeper analysis; this was the case for the Zika virus mentioned above, where we needed it to ensure we had the right fit [16]. The method assigns the four bases of a nucleotide sequence to the four cardinal directions of a Cartesian coordinate system [27]; e.g., adenine (a) is assigned to the negative x-axis, cytosine (c) to the positive y-axis, guanine (g) to the positive x-axis, and thymine (t) to the negative y-axis (in the databases, uracil (u) in RNA sequences is generally represented by t). To plot a graph, one takes a step in the assigned direction for the first base in the sequence, then the next step for the next base, and so on until the whole sequence is plotted. This charts the base distribution of a sequence as a curve on the 2D grid (see Note 1), including some degeneracy arising from overlapping steps, and two or more sequences plotted this way can be compared to see visually where they are similar or different. This feature was used in one of our papers to group together several human papillomavirus types in order to design a single vaccine for them [15]. Moreover, the graphical representation can be enumerated by defining a center of mass
$\mu_x = \frac{\sum_i x_i}{N}, \quad \mu_y = \frac{\sum_i y_i}{N}$,
where $(x_i, y_i)$ are the coordinates of the N plotted points, and a graph radius
$g_R = \sqrt{\mu_x^2 + \mu_y^2}$,
which is the distance of the center of mass from the origin [28]. The $g_R$ turns out to be a very sensitive measure of the base distribution of a sequence, and equal $g_R$ values imply practically identical sequences [29]. The $g_R$ has therefore been designated as an index, a descriptor that is characteristic of a sequence. The advent of graphical representations and their numerical characterization gave rise to many other approaches [30], but for ease of computation we operate with the 2D method outlined above (see Notes 2–4).
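The construction of the walk and of $g_R$ can be sketched in a few lines of Python. This is a minimal sketch with two assumptions of our own, which are not necessarily those of the authors' software: the center of mass is taken over the N points reached after each step (the origin itself is not counted), and ambiguous bases such as n are rejected rather than skipped.

```python
import math
from typing import List, Tuple

# Direction assignment described in Section 3.1 (u is treated like t).
STEPS = {"a": (-1, 0), "c": (0, 1), "g": (1, 0), "t": (0, -1), "u": (0, -1)}


def walk(seq: str) -> List[Tuple[int, int]]:
    """Coordinates reached after each base of the 2D graphical walk."""
    x = y = 0
    points = []
    for base in seq.lower():
        if base not in STEPS:
            raise ValueError(f"unsupported base {base!r}")
        dx, dy = STEPS[base]
        x += dx
        y += dy
        points.append((x, y))
    return points


def graph_radius(seq: str) -> float:
    """g_R: distance from the origin to the center of mass of the walk points."""
    points = walk(seq)
    n = len(points)
    mu_x = sum(p[0] for p in points) / n
    mu_y = sum(p[1] for p in points) / n
    return math.hypot(mu_x, mu_y)


print(walk("acgt"))                    # [(-1, 0), (-1, 1), (0, 1), (0, 0)]
print(round(graph_radius("acgt"), 4))  # 0.7071
```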
where $p_R$, as in the case of $g_R$, acts as a descriptor of the sequence. With a small adjustment to the computation, as explained in our paper [14] (see Note 5), $p_R$ turns out to be an index of a sequence or segment, where equal $p_R$ values for two segments imply identical sequences. This property of $p_R$ is very useful for our purposes of vaccine design.
3.2 Viral Mutations
After identifying the surface protein for which we wish to design vaccines, it is necessary to ask whether the protein itself, or segments of it, are stable against mutations, lest our efforts be overcome in a short time. Viruses mutate very rapidly, and RNA viruses such as Zika and dengue viruses, which lack an error-correcting mechanism in their replication machinery, mutate even more so; one therefore has to ensure that the segments identified will be reasonably stable. There are two approaches to this task: alignment of sequences to determine conserved regions using software such as BLAST [32], and an alignment-free approach. Alignment procedures are well established and familiar to most molecular biologists. The difficulties with this approach are that (1) the results depend on the model chosen to handle mismatches and (2) it is highly computation intensive and cannot accommodate too many sequences at a time, being limited by the available computing resources. On the other hand, alignment-free approaches, like the graphical methods outlined above, are relatively recent, take sequences as they are without the need to fit them to other sequences, and can be computed fairly easily for a very large number of sequences.
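The alignment-free alternative can be illustrated with a sliding window. Each sequence is cut into windows at the same start positions, a descriptor is computed for every window, and the number of distinct descriptor values at each position indicates how variable that segment is across isolates. This sketch makes several illustrative assumptions: the nucleotide sequences are unaligned but roughly colinear, $g_R$ serves as a stand-in descriptor (the protein index $p_R$ is not reproduced here), and the variability measure is the count of distinct rounded values.

```python
import math
from typing import Dict, List

STEPS = {"a": (-1, 0), "c": (0, 1), "g": (1, 0), "t": (0, -1)}


def g_r(segment: str) -> float:
    """Graph radius of a nucleotide segment (2D walk, center of mass of visited points)."""
    x = y = 0
    sum_x = sum_y = 0
    for base in segment.lower():
        dx, dy = STEPS[base]
        x += dx
        y += dy
        sum_x += x
        sum_y += y
    n = len(segment)
    return math.hypot(sum_x / n, sum_y / n)


def window_variability(seqs: List[str], width: int, digits: int = 6) -> Dict[int, int]:
    """Map each window start to the number of distinct g_R values across sequences.

    1 means every sequence gave the same value there (a candidate conserved segment).
    Different segments can occasionally share a g_R value, so this count is a lower
    bound on the number of distinct segments.
    """
    shortest = min(len(s) for s in seqs)
    return {
        start: len({round(g_r(s[start:start + width]), digits) for s in seqs})
        for start in range(shortest - width + 1)
    }


seqs = ["acgtacgtaa", "acgtacgtta", "acgtacgtaa"]
print(window_variability(seqs, width=4))
# {0: 1, 1: 1, 2: 1, 3: 1, 4: 1, 5: 2, 6: 2}
```

Windows 5 and 6 are the only ones covering the single substitution at position 8, so they are the only ones flagged as variable.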
3.3 Graphical Sliding Window Method for Protein Variability
This methodology is used to determine those segments of the selected antigen that are fairly well conserved compared to the rest of the sequence by comparing the relevant index for each
Methodology for Peptide Vaccine Design 23
3.5 3D Modeling
However, parts of the identified candidates could be covered by neighboring proteins, and this needs to be checked. For this, we consider a 3D model of the protein structure, using structural information from a protein database such as UniProt and a molecular visualization program such as Cn3D or PyMOL. We have found a space-filling rendering to be the most useful view, in which the identified peptides can be highlighted in different colors (see Fig. 4). If any of the selected peptides are occluded by neighboring proteins, we can leave those out and carry on with a shorter list (see Note 7).
3.6 Epitope Potential
The next step is to determine whether one or more of the peptides in the short list have good potential to generate an immune response. Not all amino acid combinations are able to do that; those that do are classified as epitopes. One of the most widely used web servers for this purpose is the IEDB (Immune Epitope Database and Analysis Resource). We submit the viral protein sequence and select, among the many options, what we want as output. The idea is that our antigen epitopes should elicit T-cell and B-cell (antibody) responses to enable the immune reaction to the invading pathogen. This is also facilitated by knowledge of the target host community's human leukocyte antigen (HLA) alleles, a profile of which is available in a separate database within the IEDB suite. On running the system thereafter, the IEDB output will be a series of 15-mer peptides with percentile ranks indicating how strong the immune response could be (percentile rank 0 is strong, 99 is weak); generally, peptides with percentile ranks above 10 are considered rather weak, and only peptides with ranks below 5 are considered worth further tests. For our vaccine design candidates, we consider only those peptides in the IEDB output that contain the short-listed peptides identified in the previous step, and we determine whether they fit in the overall scheme of acceptable epitopes (see Note 8).
[Fig. 3 plot: y-axis 0–40; x-axis, sliding window position (0–500); candidate segments A–F marked]
Fig. 3 Plots of average solvent accessibility (ASA) profile (upper curve) of H5N1 neuraminidase protein sequence with segment variability (lower curve) determined by the graphical sliding window method. Comparison of the two curves yielded six likely candidates, labeled (A–F), that showed the least protein segment variability and maximum solvent accessibility. (Reproduced from Ref. 12 with permission of the authors)
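The filtering rule of Section 3.6 keeps strong binders and then keeps only those that carry a short-listed segment. It can be written down compactly. In this sketch, the IEDB results are assumed to have been exported as (peptide, percentile rank) pairs. The threshold of 5 follows the text, and the peptides are invented placeholders.

```python
from typing import Iterable, List, Tuple


def shortlist_epitopes(
    predictions: Iterable[Tuple[str, float]],
    core_peptides: Iterable[str],
    max_rank: float = 5.0,
) -> List[Tuple[str, float]]:
    """Keep peptides with percentile rank below max_rank that contain a short-listed core.

    Results are sorted from strongest (lowest rank) to weakest.
    """
    cores = [core.upper() for core in core_peptides]
    kept = [
        (peptide, rank)
        for peptide, rank in predictions
        if rank < max_rank and any(core in peptide.upper() for core in cores)
    ]
    return sorted(kept, key=lambda item: item[1])


predictions = [
    ("AAAKLMNPQRSTAAA", 1.2),  # strong and contains the core
    ("GGGKLMNPQRSTGGG", 7.5),  # contains the core but rank too weak
    ("CCCDDDEEEFFFHHH", 0.8),  # strong but unrelated to the short list
]
print(shortlist_epitopes(predictions, ["KLMNPQ"]))  # [('AAAKLMNPQRSTAAA', 1.2)]
```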
3.7 Autoimmune Threats
If some or all of the short-listed peptides are acceptable as epitopes, we next need to ensure that they will not result in any autoimmune threats. This can happen if any of the new peptides we select as suitable candidates resembles an existing peptide in the host proteins, in which case the immune response may also attack the host protein. To safeguard against such an eventuality, we perform a BLAST analysis with our short-listed peptides to determine whether any host protein or peptide shares sequence with them; any such peptides need to be removed from our list.
What we finally have is a small list of peptides that have the potential to evoke a good immune response with no autoimmune threats. These peptides can be prepared in a laboratory and tested in animals and tissue culture before undertaking phased human clinical trials.
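BLAST reports approximate as well as exact matches, so it is the appropriate tool for the screen in Section 3.7. As a rough first pass only, the idea can be illustrated with exact shared k-mers between each candidate peptide and the host proteins. This is a simplified stand-in, not BLAST. The k-mer length of 5 and the sequences below are assumptions chosen for illustration.

```python
from typing import Dict, List


def host_overlap(peptides: List[str], host_proteins: Dict[str, str], k: int = 5) -> Dict[str, List[str]]:
    """Map each peptide to the host proteins sharing at least one exact k-mer with it."""
    hits: Dict[str, List[str]] = {}
    for peptide in peptides:
        kmers = {peptide[i:i + k] for i in range(len(peptide) - k + 1)}
        hits[peptide] = [
            protein_id
            for protein_id, sequence in host_proteins.items()
            if any(kmer in sequence for kmer in kmers)
        ]
    return hits


host = {"host_protein_1": "MSTKLMNPQWY", "host_protein_2": "MAAAAGGGGS"}
candidates = ["KLMNPQRST", "DDEEFFHHW"]
overlaps = host_overlap(candidates, host)
safe = [p for p in candidates if not overlaps[p]]
print(overlaps)  # {'KLMNPQRST': ['host_protein_1'], 'DDEEFFHHW': []}
print(safe)      # ['DDEEFFHHW']
```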
Table 1
Selection of databases, protein structure, and epitope prediction software
5 Notes
Fig. 5 The Zika virus (KY785476) envelope gene drawn in three different axis systems in the 2D graphical representation model
Acknowledgments
References
1. Nandy A, Basak SC (2017) The epidemic that shook the world—the Zika virus rampage. Explor Res Hypothesis Med 2(3):43–56. https://doi.org/10.14218/ERHM.2017.00018
2. Riedel S (2005) Edward Jenner and the history of smallpox and vaccination. Proc (Baylor Univ Med Cent) 18(1):21–25
3. Fu ZF (1997) Rabies and rabies research: past, present and future. Vaccine 15:S20–S24
4. Roy P, Nandy A, Basak SC (2019) Zika virus—the quest for vaccines. In: Basak SC, Bhattacharjee AK, Nandy A (eds) Zika virus surveillance, vaccinology and anti-Zika drug discovery: computer-assisted strategies to combat the menace. Nova Science Publishers Inc., New York
5. Chit A, Parker J, Halperin SA, Papadimitropoulos M, Krahn M, Grootendorst P (2014) Toward more specific and transparent research and development costs: the case of seasonal influenza vaccines. Vaccine 32(26):3336–3340. https://doi.org/10.1016/j.vaccine.2013.06.055
6. Backert L, Kohlbacher O (2015) Immunoinformatics and epitope prediction in the age of genomic medicine. Genome Med 7:119
7. Tomar N, De RK (2014) Immunoinformatics: a brief review. In: De RK TN (ed) Methods Mol Biol, vol 1184. Springer Science+Business Media, New York. https://doi.org/10.1007/978-1-4939-1115-8_3
8. Poland GA, Kennedy RB, Ovsyannikova IG (2011) Vaccinomics and personalized vaccinology: is science leading us toward a new path of directed vaccine development and discovery? PLoS Pathog 7:e1002344
9. Poland GA, Whitaker JA, Poland CM, Ovsyannikova IG, Kennedy RB (2016) Vaccinology in the third millennium: scientific and social challenges. Curr Opin Virol 17:116–125
10. Rappuoli R (2001) Reverse vaccinology, a genome-based approach to vaccine development. Vaccine 19:2688–2691
11. Purcell AW, McCluskey J, Rossjohn J (2007) More than one reason to rethink the use of peptides in vaccine design. Nat Rev 6:404–414
12. Ghosh A, Nandy A, Nandy P (2010) Computational analysis and determination of a highly conserved surface exposed segment in H5N1 avian flu and H1N1 swine flu neuraminidase. BMC Struct Biol 10:6. https://doi.org/10.1186/1472-6807-10-6
13. Sarkar T, Das S, De A, Nandy P, Chattopadhyay S, Chawla-Sarkar M, Nandy A (2015) H7N9 influenza outbreak in China 2013: in silico analyses of conserved segments of the hemagglutinin as a basis for the selection of peptide vaccine targets. Comput Biol Chem 59:8–15
14. Ghosh A, Chattopadhyay S, Chawla-Sarkar M et al (2012) In Silico study of rotavirus VP7 surface accessible conserved regions for antiviral drug/vaccine design. PLoS One 7:
Abstract
HLA analysis is essential for various medical applications, such as genomic studies of multifactorial diseases, including immune system and inflammation-related disorders. Therefore, an accurate HLA typing method that is applicable to any allele registered in HLA allele databases is required to obtain scientific evidence related to these disorders. Here, we describe a method for determining HLA alleles from next-generation sequencing (NGS) results using the HLA sequence data currently available in public HLA databases, and we show its application in association analysis.
Key words HLA allele, Genotyping, NGS, Software, Database, Bioinformatics, Logistic regression
1 Introduction
32 Shuji Kawaguchi and Fumihiko Matsuda
2 Materials
2.1 HLA-HD Setup
HLA-HD was developed to enable HLA typing from WGS data, WES data, RNA-seq data, and target sequence data of HLA genes. HLA-HD is freely available for academic use and research purposes upon registration and can be downloaded from the HLA-HD website (https://www.genome.med.kyoto-u.ac.jp/HLA-HD/).
HLA-HD works on almost any operating system, including Linux, Mac OS, and Windows. After the HLA-HD source program is downloaded from the website, HLA-HD can be installed by typing "sh install.sh". The GNU Compiler Collection (https://gcc.gnu.org/) is required for installation. After installation, the PATH environment variable must include the HLA-HD bin directory; e.g., add the line "export PATH=$PATH:/path_to_HLA-HD_install_directory/bin" to .bashrc. HLA-HD requires bowtie2 (http://www.metagenomics.wiki/tools/bowtie2/) [12] to align the NGS reads to the sequences of HLA alleles.
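Before running HLA-HD, it can help to confirm that both hlahd.sh and bowtie2 are discoverable on PATH after the setup above. The following is a minimal sketch using Python's standard shutil.which; the executable names come from the text, while the helper name missing_executables is hypothetical.

```python
import shutil

# Executables that the setup above expects to be resolvable on PATH
REQUIRED = ("hlahd.sh", "bowtie2")


def missing_executables(names=REQUIRED):
    """Return the subset of `names` that cannot be found on the current PATH."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_executables()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("HLA-HD and bowtie2 are available.")
```

An empty list means both tools resolve; any name returned points to a PATH entry that still needs to be added to .bashrc.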
2.2 Input Data for HLA-HD
HLA-HD requires fastq (or fastq.gz) data containing NGS reads of the HLA genes. Mapping WGS and WES data takes time because HLA-HD maps all NGS reads to the currently available HLA allele sequences. Therefore, reads that are unnecessary for HLA typing should be removed in advance using samtools (http://samtools.sourceforge.net/) and Picard tools (https://broadinstitute.github.io/picard/) to reduce computation time, as follows (see Notes 1 and 2).
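The idea behind this pre-filtering can be illustrated in plain Python. This sketch is not the authors' samtools/Picard procedure: it assumes alignments have already been parsed into a hypothetical SamRecord container, keeps reads that are unmapped (SAM FLAG bit 0x4) or aligned inside a window on chromosome 6, and retains both mates of a pair when either mate passes. The window coordinates are placeholders that must be set for the reference build in use.

```python
from typing import Iterable, List, NamedTuple


class SamRecord(NamedTuple):
    """Minimal subset of one SAM alignment line (hypothetical container)."""
    qname: str   # read name, shared by both mates of a pair
    flag: int    # SAM FLAG bit field
    rname: str   # reference sequence name, "*" if unmapped
    pos: int     # 1-based leftmost mapping position


UNMAPPED = 0x4  # SAM FLAG bit: segment unmapped


def _passes(rec: SamRecord, chrom: str, start: int, end: int) -> bool:
    """True if the read is unmapped or aligned inside the target window."""
    if rec.flag & UNMAPPED:
        return True
    return rec.rname == chrom and start <= rec.pos <= end


def keep_for_hla_typing(
    records: Iterable[SamRecord],
    chrom: str = "chr6",
    start: int = 28_000_000,  # placeholder window: set for your reference build
    end: int = 34_000_000,    # placeholder window: set for your reference build
) -> List[SamRecord]:
    """Keep every record whose read pair has at least one mate passing the filter."""
    records = list(records)
    wanted = {rec.qname for rec in records if _passes(rec, chrom, start, end)}
    return [rec for rec in records if rec.qname in wanted]
```

Keeping whole pairs matters because HLA-HD takes two paired fastq files as input, so dropping one mate would desynchronize them.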
High-Definition Genomic Analysis of HLA Genes Via Comprehensive HLA. . . 33
2.3 Amino Acid Residue Data for HLA Protein
To perform association analyses of amino acid residues in HLA proteins, a data set of aligned positions for each HLA allele is required. The IPD-IMGT/HLA database provides aligned position data at its FTP site (ftp://ftp.ebi.ac.uk/pub/databases/ipd/imgt/hla/alignments/). Aligned data for the HLA proteins of interest can be downloaded; e.g., the data for HLA-A are provided as "A_prot.txt". However, these positions are simply numbered from the first amino acid residue of the leader peptide and are therefore inconvenient for comparisons across different HLA domains. Furthermore, for almost all alleles, sequences are recorded only for the MHC groove domains (G-DOMAINs). To address this problem, IMGT proposed a unique numbering system that unifies the positions of G-DOMAINs [13]. This numbering has become highly valuable for association studies of amino acid residues in HLA proteins.
2.4 Data Set and Scripts of Analysis for Amino Acid Residues
The data set of typed HLA alleles must be converted to four-digit resolution. If HLA alleles are newly identified and not yet recorded in the IPD-IMGT/HLA database, they should be registered in the database before analysis [8]. Sample scripts and data can be downloaded from https://www.genome.med.kyoto-u.ac.jp/HLA-HD/hla_aa_analysis/ (see Note 3). In the demonstration, we used public HLA data of the Southern Han Chinese (CHS) and Japanese in Tokyo (JPT) populations, whose samples were sequenced for the 1000 Genomes Project [14] and typed previously [15]. The scripts are written in Python and R and have been tested with Python versions 2.7.10 and 3.5.4 and R version 3.6.0.
3 Methods
3.1 HLA Typing by HLA-HD
1. HLA-HD: for paired-end short-read data, the input command is
"hlahd.sh [-m <int>] [-t <int>] [-c 0 to 1.0] [-f /path/to/freq/data] <fastq data 1> <fastq data 2> <hla gene split file> /path/to/dictionary <result name> /output/directory",
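For batch runs, the command above can be assembled programmatically. This minimal sketch only builds the argument list in the order of the usage string shown; it assumes, per our reading of the HLA-HD documentation, that -t sets the number of threads, -m the minimum read length, and -c the trimming rate, so check these against the installed version. The function names and all file paths are hypothetical.

```python
import subprocess
from typing import List, Optional


def build_hlahd_command(
    fastq1: str,
    fastq2: str,
    gene_split_file: str,
    dictionary_dir: str,
    result_name: str,
    output_dir: str,
    threads: Optional[int] = None,          # -t <int> (assumed: threads)
    min_read_length: Optional[int] = None,  # -m <int> (assumed: minimum read length)
    trim_rate: Optional[float] = None,      # -c 0 to 1.0 (assumed: trimming rate)
    freq_dir: Optional[str] = None,         # -f /path/to/freq/data
) -> List[str]:
    """Assemble an hlahd.sh argument list following the documented usage order."""
    cmd = ["hlahd.sh"]
    if min_read_length is not None:
        cmd += ["-m", str(min_read_length)]
    if threads is not None:
        cmd += ["-t", str(threads)]
    if trim_rate is not None:
        if not 0.0 <= trim_rate <= 1.0:
            raise ValueError("-c must lie between 0 and 1.0")
        cmd += ["-c", str(trim_rate)]
    if freq_dir is not None:
        cmd += ["-f", freq_dir]
    cmd += [fastq1, fastq2, gene_split_file, dictionary_dir, result_name, output_dir]
    return cmd


def run_hlahd(cmd: List[str]) -> None:
    """Run HLA-HD; raises subprocess.CalledProcessError on a non-zero exit status."""
    subprocess.run(cmd, check=True)
```

Passing a list rather than a shell string avoids quoting problems with paths that contain spaces.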
3.2 Convert Typing Results to IMGT Unique Numbering
The HLA alleles typed by HLA-HD are recorded at six-digit resolution in sampleID_final.result.txt as tab-separated text. If the gene was typed as homozygous, one allele is represented by a hyphen. If a single candidate could not be determined for an allele pair by the end of the run, multiple candidates are listed in parallel. Conversely, the allele pair is recorded as "Not typed" if no candidates were obtained. To convert HLA alleles to amino acid residues aligned by IMGT unique numbering, first replace each six-digit allele name with its four-digit form by truncating the name at the second colon (e.g., A*02:01:01 becomes A*02:01), and then merge the typed alleles of each sample set into a tab-separated text file with the columns sample, gene 1 allele 1, gene 1 allele 2, gene 2 allele 1, gene 2 allele 2, and so on (Fig. 1).
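This first conversion step can be sketched as follows. The sketch assumes that each line of sampleID_final.result.txt holds the gene name followed by tab-separated allele columns, uses only the first two allele columns when more candidates are listed (a simplification), treats a hyphen in the second column as a homozygous call, and strips an "HLA-" prefix so that names match the IPD-IMGT/HLA alignment files. All of these are assumptions, and the function names are hypothetical.

```python
from typing import Dict, List, Tuple


def to_two_field(allele: str) -> str:
    """Truncate an allele name at the second colon, e.g. A*02:01:01 -> A*02:01."""
    if allele in ("-", "Not typed"):
        return allele
    name = allele[4:] if allele.startswith("HLA-") else allele  # assumed prefix handling
    return ":".join(name.split(":")[:2])


def parse_final_result(lines: List[str]) -> Dict[str, Tuple[str, str]]:
    """Map gene -> (allele 1, allele 2) at four-digit (two-field) resolution."""
    typed = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip malformed lines in this sketch
        gene, a1, a2 = fields[0], to_two_field(fields[1]), to_two_field(fields[2])
        if a2 == "-":  # homozygous call: reuse allele 1
            a2 = a1
        typed[gene] = (a1, a2)
    return typed


def to_table_row(sample: str, typed: Dict[str, Tuple[str, str]], genes: List[str]) -> Tuple[str, str]:
    """Return (header, row) as tab-separated text with genename_1/genename_2 columns."""
    header, row = ["sample"], [sample]
    for gene in genes:
        a1, a2 = typed.get(gene, ("Not typed", "Not typed"))
        header += [f"{gene}_1", f"{gene}_2"]
        row += [a1, a2]
    return "\t".join(header), "\t".join(row)
```

Writing the header once and appending one row per sample produces a file laid out like Fig. 1.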
Second, download the amino acid position files of the HLA proteins to be analyzed from the IPD-IMGT/HLA FTP site (see Subheading 2.3). Using the Python scripts and data files, the typed allele list is then realigned to IMGT unique numbering as follows:
1. Concatenate the amino acid residues of each allele in the position file; e.g., the position file for HLA-A, "A_prot.txt", is concatenated by using the Python script:
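The concatenation can be illustrated with a minimal sketch; it is not the authors' script. It assumes an alignment layout in which each data line starts with an allele name (containing "*") followed by whitespace-separated residue blocks, the same allele reappears in successive blocks, and the first allele listed is the reference. Following the IPD-IMGT/HLA alignment conventions, a hyphen marks identity to the reference and is expanded here, while "*" (not recorded) and "." (gap or untranslated) are kept. Skipping lines whose first token lacks "*" is a simplifying assumption.

```python
from typing import Dict, Iterable, List


def concatenate_alignment(lines: Iterable[str]) -> Dict[str, str]:
    """Join per-allele residue blocks of an IMGT/HLA-style protein alignment."""
    chunks: Dict[str, List[str]] = {}
    for line in lines:
        tokens = line.split()
        if not tokens or "*" not in tokens[0]:
            continue  # header, position ruler, footer, or blank line (assumed layout)
        allele, blocks = tokens[0], tokens[1:]
        chunks.setdefault(allele, []).append("".join(blocks).replace("|", ""))

    sequences = {allele: "".join(parts) for allele, parts in chunks.items()}
    if not sequences:
        return {}

    reference = next(iter(sequences.values()))  # first allele listed = reference
    expanded = {}
    for allele, seq in sequences.items():
        residues = []
        for i, res in enumerate(seq):
            if res == "-" and i < len(reference):
                residues.append(reference[i])  # hyphen: identical to the reference
            else:
                residues.append(res)  # residue, '*' (not recorded) or '.' kept as-is
        expanded[allele] = "".join(residues)
    return expanded
```

Expanding hyphens up front gives every allele an explicit residue string, which simplifies the later mapping of positions to IMGT unique numbering.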
Fig. 1 A sample text file of the typing result. Columns must be separated by a tab character, and the headers of the two allele columns should be named genename_1 and genename_2
DQB1*05:03 D2 EDFVYQFKGLCYFTNGTERVRGVTRHIYNREEYVRFDSDVGVYRAVTPQGRPDAEYWNSQKEVLEGARASVDRVCRHNYEVAYRGILQRR
DQB1*05:04 D2 EDFVYQFKGLCYFTNGTERVRGVTRYIYNREEYVRFDSDVGVYRAVTPQGRPSAEYWNSQKDILEEDRASVDRVCRHNYEVAYRGILQRR
DQB1*05:05 D2 *DFVYQFKGLCYFTNGTERVRGVTRHIYNREEYARFDSDVGVYRAVTPQGRPSAEYWNSQKEVLEGARASVDRVCRHNYEVAYRGILQRR
DQB1*05:106 D2 EDFVYQFKGLCYFTNGTERVRGVTRHIYNREEYVRFDSDVGVYRAVTPQGRPSAEYWNSQKEVLEGARASVDRVCRHNYEVAYRGILQRR
DQB1*05:107 D2 *DFVYQFKGLCYFTNGTERVRGVTRHIYNREEYVRFDSDVGVYRAVTPQGRPVAEYWNSQKEVLEGARASVDRVCRHNYEVAYRGILQRR
DQB1*05:108 D2 *DFVYQFKGLCYFTNGTERVRGVTRHIYNREEYVRFDSDVGVYRAVTPQGRPDAEYWNSQKEVLEGARASVDRVCRHNYEVAYRGILQRR
DQB1*05:109 D2 *DFVYQFKGLCYFTNGTERVRGVTRHIYNREEYVRFDSDVGVYRAVTPQGRPDAEYWNSQKEVLEGARASVDRVCRHNYKVAYRGILQRR
DQB1*05:110N D2 *DFVYQFKGLCYFTNGTERVRGVTRHIYNX............................................................
Fig. 2 Converted data aligned to G-DOMAINs for HLA-DRB1 (DRB1_aa.gd.txt). An asterisk and a dot indicate that the amino acid residue at this position is not recorded in the IPD-IMGT/HLA database and is not translated, respectively
[Fig. 3 plot, panels (a) and (b): y-axis, log10 p; x-axis, positions 1–92 in each G-DOMAIN]
Fig. 3 (a) Result of binomial logistic regression for amino acid residues of G-DOMAINs between the CHS and JPT data sets. The most significant position and the frequencies of the amino acid residues at that position can be checked from the returned values. (b) Plot of p values across G-DOMAINs. The red circle in the HLA-A D2 domain (G-ALPHA2 domain) marks the most significant amino acid position. The gray line shows the significance threshold determined by Bonferroni correction for the number of positions contributing to variation in the G-DOMAINs
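The significance threshold in Fig. 3b follows from simple arithmetic: with n polymorphic G-DOMAIN positions and a family-wise level α, the Bonferroni per-test threshold is α/n (α = 0.05 is assumed here). The sketch below also shows a per-residue test under a stated simplification: when the only covariate is a binary indicator (an allele carries residue X at a position or not) and the outcome is population (CHS vs. JPT), the binomial logistic-regression coefficient equals the log odds ratio of the 2×2 table, and its Wald test has a closed form. The authors' R scripts may model the data differently; the function names are hypothetical.

```python
import math
from typing import Dict, List


def polymorphic_positions(aligned: Dict[str, str], ignore: str = "*.") -> List[int]:
    """0-based columns where the observed residues (excluding '*' and '.') differ."""
    length = max(len(s) for s in aligned.values())
    positions = []
    for i in range(length):
        residues = {s[i] for s in aligned.values() if i < len(s) and s[i] not in ignore}
        if len(residues) > 1:
            positions.append(i)
    return positions


def bonferroni_threshold(n_tests: int, alpha: float = 0.05) -> float:
    """Per-test p-value threshold controlling the family-wise error rate at alpha."""
    return alpha / n_tests


def wald_logistic_2x2(a: int, b: int, c: int, d: int) -> Dict[str, float]:
    """Logistic regression of population on one binary predictor, via the 2x2 table.

    a, b: residue present / absent in population 1; c, d: present / absent in
    population 2. MLE of the coefficient: log(a*d / (b*c));
    SE: sqrt(1/a + 1/b + 1/c + 1/d). All cells must be > 0 in this sketch.
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all cell counts must be positive")
    beta = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value from the normal distribution
    return {"beta": beta, "se": se, "z": z, "p": p}
```

A position is declared significant when its p value falls below bonferroni_threshold(len(polymorphic_positions(...))), which corresponds to the gray line in Fig. 3b on a log10 scale.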
4 Notes
References
1. Okada Y, Kim K, Han B et al (2014) Risk for ACPA-positive rheumatoid arthritis is driven by shared HLA amino acid polymorphisms in Asian and European populations. Hum Mol Genet 23:6916–6926
2. Dostál C, Iványi D, Macurová H et al (1977) HLA antigens in systemic lupus erythematosus. Ann Rheum Dis 36:83–85
3. Terao C, Ota M, Iwasaki T et al (2019) IgG4-related disease in the Japanese population: a genome-wide association study. Lancet Rheumatol 1:e14–e22
4. Mardis ER (2008) The impact of next-generation sequencing technology on genetics. Trends Genet 24:133–141
5. Erlich RL, Jia X, Anderson S et al (2011) Next-generation sequencing for HLA typing of class I loci. BMC Genomics 12:42
6. Gabriel C, Fürst D, Faé I et al (2014) HLA typing by next-generation sequencing—getting closer to reality. Tissue Antigens 83:65–75
7. Hosomichi K, Jinam TA, Mitsunaga S et al (2013) Phase-defined complete sequencing of the HLA genes by next-generation sequencing. BMC Genomics 14:355
8. Robinson J, Soormally AR, Hayhurst JD et al (2016) The IPD-IMGT/HLA database—new developments in reporting HLA variation. Hum Immunol 77:233–237
9. Kawaguchi S, Higasa K, Shimizu M et al (2017) HLA-HD: an accurate HLA typing algorithm for next-generation sequencing data. Hum Mutat 38:788–797
10. Marty Pyke R, Thompson WK, Salem RM et al (2018) Evolutionary pressure against MHC class II binding cancer mutations. Cell 175:416–428.e13
11. Kishikawa T, Momozawa Y, Ozeki T et al (2019) Empirical evaluation of variant calling accuracy using ultra-deep whole-genome sequencing data. Sci Rep 9:1784
12. Langmead B, Salzberg SL (2012) Fast gapped-read alignment with bowtie 2. Nat Methods 9:357–359
13. Lefranc M-P, Duprat E, Kaas Q et al (2005) IMGT unique numbering for MHC groove G-DOMAIN and MHC superfamily (MhcSF) G-LIKE-DOMAIN. Dev Comp Immunol 29:917–938
14. 1000 Genomes Project Consortium, Abecasis GR, Auton A et al (2012) An integrated map of genetic variation from 1,092 human genomes. Nature 491:56–65
15. Abi-Rached L, Gouret P, Yeh J-H et al (2018) Immune diversity sheds light on missing variation in worldwide genetic diversity panels. PLoS One 13:e0206512
16. González-Galarza FF, Takeshita LYC, Santos EJM et al (2015) Allele frequency net 2015 update: new features for HLA epitopes, KIR and disease and HLA adverse drug reaction associations. Nucleic Acids Res 43:D784–D788
17. Raychaudhuri S, Sandor C, Stahl EA et al (2012) Five amino acids in three HLA proteins explain most of the association between MHC and seropositive rheumatoid arthritis. Nat Genet 44:291–296
Chapter 4
Abstract
The aim of this study was to use IEDB tools to predict suitable epitope-based vaccine candidates against MERS-CoV that cover the HLA alleles most common in the world population, using four protein sequences (the spike (S) glycoprotein, the envelope (E) protein, and modified versions of both), following the emergence of MERS-CoV in 2012. Using these IEDB computational methods, we found that the S glycoprotein, the E protein, and the modified S and E sequences of MERS-CoV might be considered protective immunogens with high conservancy, because they can elicit both neutralizing antibodies and T-cell responses through interactions with B cells, helper T cells, and cytotoxic T lymphocytes (CTLs). NetCTL, NetChop, and MHC-NP were used to confirm our results. Population coverage analysis showed that the putative helper T-cell epitopes and CTL epitopes could cover most of the world population in more than 60 geographical regions. According to AllerHunter, all of the selected proteins were predicted to be nonallergens, which makes these computationally predicted candidates more attractive for vaccine synthesis.
Key words Middle East respiratory syndrome coronavirus, Severe acute respiratory syndrome coronavirus, Food and Drug Administration, Immune Epitope Database, FAO, AllerHunter
1 Introduction
40 Hiba Siddig Ibrahim and Shamsoun Khamis Kafi
2.1 Protein Sequence Retrieval
A total of 130 spike (S) glycoprotein and 41 envelope (E) protein sequences of MERS-CoV were retrieved from the NCBI protein database (http://www.ncbi.nlm.nih.gov/protein/) in September 2016. These sequences had been collected in different parts of the world, including Saudi Arabia, China, Thailand, the United Kingdom, Qatar, Tunisia, and South Africa. The accession numbers of the retrieved strains are listed in Supplementary Tables 1 and 2. All methods described below were applied to the S and E proteins and to modified S and E proteins, which were generated by randomly changing some amino acids in the respective reference sequences. GenBank accession numbers are given in Table 1 (envelope protein, E) and Table 2 (spike glycoprotein, S).
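Sequence retrieval can also be scripted. This sketch builds an NCBI E-utilities efetch URL that returns protein records in FASTA format for a list of accession numbers; the endpoint and the db, id, rettype, and retmode parameters belong to the public E-utilities interface, while the function names and the placeholder accessions are hypothetical (the study's actual accessions are listed in Supplementary Tables 1 and 2).

```python
from typing import Iterable
from urllib.parse import urlencode
from urllib.request import urlopen

EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_fasta_url(accessions: Iterable[str]) -> str:
    """Build an efetch URL that returns protein sequences in FASTA format."""
    ids = ",".join(a.strip() for a in accessions if a.strip())
    if not ids:
        raise ValueError("at least one accession is required")
    params = {"db": "protein", "id": ids, "rettype": "fasta", "retmode": "text"}
    return f"{EFETCH}?{urlencode(params)}"


def fetch_fasta(accessions: Iterable[str]) -> str:
    """Download the FASTA text (requires network access; respect NCBI rate limits)."""
    with urlopen(efetch_fasta_url(accessions)) as response:
        return response.read().decode("utf-8")
```

Separating URL construction from the network call keeps the request reproducible and easy to log alongside the retrieval date.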
A Computational Vaccine Designing Approach for MERS-CoV Infections 41
Table 1
GenBank accession numbers of envelope (E) proteins
2.3 Determination of Conserved Regions
The sequences retrieved from NCBI were subjected to multiple sequence alignment (MSA) to identify conserved regions. Sequences were aligned with ClustalW as implemented in the BioEdit program, version 7.0.9.0.
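Once the sequences are aligned, conserved regions can be located column by column. The sketch below is a simplified stand-in for inspecting the BioEdit/ClustalW alignment: it reports runs of columns in which every sequence carries the same residue and no gap. That criterion, the gap symbol "-", and the default minimum length of 9 are assumptions rather than the authors' settings.

```python
from typing import List, Sequence, Tuple


def conserved_regions(alignment: Sequence[str], min_length: int = 9) -> List[Tuple[int, int, str]]:
    """Return (start, end, motif) for fully conserved, gap-free runs (0-based, end exclusive)."""
    if not alignment:
        return []
    width = len(alignment[0])
    if any(len(seq) != width for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")

    conserved = [
        alignment[0][i] != "-" and all(seq[i] == alignment[0][i] for seq in alignment)
        for i in range(width)
    ]
    regions = []
    start = None
    for i, flag in enumerate(conserved + [False]):  # sentinel closes a trailing run
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            if i - start >= min_length:
                regions.append((start, i, alignment[0][start:i]))
            start = None
    return regions
```

Setting min_length to the epitope length of interest (e.g., 9 for MHC class I) keeps only stretches long enough to host a full epitope.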
2.4.1 Prediction of Linear B-Cell Epitopes
BepiPred from the Immune Epitope Database and Analysis Resource (http://tools.iedb.org/bcell/) was used to predict linear B-cell epitopes in the conserved regions, with the default threshold value of 0.350. BepiPred combines the predictions of a hidden Markov model and the propensity scale of Parker et al., as described by Larsen et al. (Immunome Research, 2006).
2.4.2 Prediction of Surface Accessibility
Surface-accessible epitopes were predicted from the conserved regions with the Emini surface accessibility prediction tool of the Immune Epitope Database (IEDB), using the default threshold value of 1.000; residues scoring 1.000 or higher were considered surface accessible.
Table 2
GenBank accession numbers of S glycoproteins
2.4.3 Prediction of Epitope Antigenicity Sites
The Kolaskar and Tongaonkar antigenicity method was used to determine antigenic sites, with a default threshold value of 1.045.
2.4.4 Prediction of Epitope Hydrophilicity
The Parker hydrophilicity prediction tool was used to determine the hydrophilicity of the conserved regions; the default threshold value was 1.286.
2.4.5 Prediction of Beta Turn Sites
The Chou and Fasman beta turn prediction method was used, with the default threshold of 1.009, to determine the sites that contain beta turns.
2.4.6 Prediction of Flexibility
The Karplus and Schulz flexibility prediction tool was used to predict chain flexibility in proteins (for selecting peptide antigens), with a default threshold value of 0.992.
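Scale-based predictors such as Parker hydrophilicity typically assign each residue a value from a published propensity scale and report, for each position, the mean over a window centred on it. The sketch below shows only that averaging mechanism. The scale passed in is caller-supplied (the toy values used for testing are hypothetical and are not the Parker, Kolaskar and Tongaonkar, Chou and Fasman, or Karplus and Schulz tables), and the default window of 7 residues is an assumption.

```python
from typing import Dict, List, Optional


def window_scores(sequence: str, scale: Dict[str, float], window: int = 7) -> List[Optional[float]]:
    """Mean propensity-scale value over an odd-sized window centred on each residue.

    Positions whose window would extend past either end of the sequence get None.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    values = [scale[residue] for residue in sequence.upper()]
    half = window // 2
    scores: List[Optional[float]] = [None] * len(values)
    for centre in range(half, len(values) - half):
        scores[centre] = sum(values[centre - half:centre + half + 1]) / window
    return scores
```

Averaging smooths single-residue noise, so a predicted epitope reflects a locally favourable stretch rather than one extreme residue.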
The thresholds of all tools were provided by IEDB and are mainly calculated by the software as the average score of the tested protein for the corresponding tool.
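The default-threshold rule described above can be sketched as follows: the threshold is the mean of the per-residue scores, and residues scoring at or above it are grouped into contiguous candidate segments. The minimum segment length is a hypothetical parameter. A fixed threshold, such as 0.35 for BepiPred, can be passed explicitly instead.

```python
from typing import List, Optional, Sequence, Tuple


def default_threshold(scores: Sequence[Optional[float]]) -> float:
    """Average of the defined per-residue scores of the tested protein."""
    defined = [s for s in scores if s is not None]
    return sum(defined) / len(defined)


def segments_above(
    sequence: str,
    scores: Sequence[Optional[float]],
    threshold: Optional[float] = None,
    min_length: int = 1,
) -> List[Tuple[int, int, str]]:
    """Contiguous runs with score >= threshold as (start, end, peptide), 1-based inclusive."""
    if threshold is None:
        threshold = default_threshold(scores)
    hits = [s is not None and s >= threshold for s in scores]
    result = []
    start = None
    for i, hit in enumerate(hits + [False]):  # sentinel closes a trailing run
        if hit and start is None:
            start = i
        elif not hit and start is not None:
            if i - start >= min_length:
                result.append((start + 1, i, sequence[start:i]))
            start = None
    return result
```

Counting the residues inside the returned segments gives tallies of the kind reported in Subheading 3.1 (e.g., the number of positive positions out of the protein length).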
2.5 T-Cell Epitope Prediction
Scanning an antigen sequence for amino acid patterns indicative of:
2.5.1 MHC Class I Binding Predictions
Peptide binding to MHC class I molecules was assessed with the IEDB MHC I prediction tool (http://tools.iedb.org/mhci/). For MHC-I binding prediction, several alleles of HLA-A, HLA-B, HLA-C, and HLA-E that have been reported as frequent around the world were used. Presentation of MHC-I peptide complexes to T lymphocytes involves several steps; here, the step in which cleaved peptides attach to MHC molecules was predicted. The consensus method, which combines ANN, SMM, and scoring matrices derived from combinatorial peptide libraries (Comblib_Sidney2008), was used, and the epitope length was set to 9-mers. All conserved epitopes that bound to alleles with a percentile rank of 1.0 or less (a low percentile rank indicates a good binder) were selected for further
2.5.2 MHC Class II Binding Predictions
Peptide binding to MHC class II molecules was assessed with the IEDB MHC II prediction tool (http://tools.immuneepitope.org/mhcii/). For MHC-II binding prediction, the reference set of alleles was used, which includes the HLA-DQ, HLA-DP, and HLA-DR alleles that are most frequent around the world. The MHC class II groove can bind peptides of different lengths. Of the seven prediction methods available in the IEDB MHC II prediction tool, NetMHCIIpan was used in this study. Conserved epitopes that bound to alleles with a percentile rank of 10 or less were selected for further analysis, following the IEDB guidance on selecting thresholds (cutoffs) for MHC class I and II binding predictions (http://help.iedb.org/entries/23854373-Selecting-thresholds-cut-offs-for-MHC-class-I-and-II-binding-predictions) [7, 11–14].
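The percentile-rank cutoffs (1.0 or less for MHC class I, 10 or less for MHC class II) amount to a simple filter over the prediction output. This sketch assumes the predictions have already been parsed into records with allele, peptide, and percentile-rank fields; the Prediction type and its field names are hypothetical and are not the IEDB output column names. It also tallies how many distinct selected peptides each allele binds, the kind of per-allele frequency reported in Subheading 3.2.2.

```python
from collections import Counter
from typing import Iterable, List, NamedTuple

MHC_I_CUTOFF = 1.0    # percentile rank, MHC class I (consensus method)
MHC_II_CUTOFF = 10.0  # percentile rank, MHC class II (NetMHCIIpan)


class Prediction(NamedTuple):
    allele: str
    peptide: str
    percentile_rank: float


def select_binders(predictions: Iterable[Prediction], cutoff: float) -> List[Prediction]:
    """Keep predictions at or below the percentile-rank cutoff (lower = better binder)."""
    return [p for p in predictions if p.percentile_rank <= cutoff]


def binders_per_allele(selected: Iterable[Prediction]) -> Counter:
    """Count distinct selected peptides for each allele."""
    unique_pairs = {(p.allele, p.peptide) for p in selected}
    return Counter(allele for allele, _ in unique_pairs)
```

Counting distinct (allele, peptide) pairs prevents a peptide reported twice for the same allele from inflating that allele's frequency.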
Proteasome cleavage: The scores can be interpreted as logarithms of the total amount of cleavage-site usage liberating the peptide C-terminus; the actual amount also depends on many other factors, e.g., the amount of source protein degraded.
TAP transport: The TAP score estimates an effective log(IC50) value for the binding to TAP of a peptide or its N-terminally prolonged precursors.
MHC binding: The MHC binding prediction is identical to the MHC class I binding prediction, with output given as log(IC50) values.
Processing: This score combines the proteasomal cleavage and TAP transport predictions. It predicts a quantity proportional to the amount of peptide present in the ER, where a peptide can bind to multiple MHC molecules. This allows T-cell epitope candidates to be predicted independently of MHC restriction.
Total: This score combines the proteasomal cleavage, TAP transport, and MHC binding predictions. It predicts a quantity proportional to the amount of peptide presented by MHC molecules on the cell surface. Higher scores indicate higher efficiency.
2.5.4 Neural Network-Based Prediction of Proteasomal Cleavage Sites (NetChop) and T-Cell Epitopes (NetCTL and NetCTLpan)
NetChop, used here, is a neural network-based predictor of proteasomal processing. NetCTL and NetCTLpan predict T-cell epitopes along a protein sequence. Predictions at or above the positive thresholds (0.5, 0.75, and 1 for NetChop, NetCTL, and NetCTLpan, respectively) are displayed in green, and predictions below the threshold are displayed in red.
2.5.5 MHC-NP: Prediction of Peptides Naturally Processed by the MHC
MHC-NP uses data obtained from MHC elution experiments to assess the probability that a given peptide is naturally processed and binds to a given MHC molecule. The tool used in this study was the winner of the second Machine Learning Competition in Immunology. It uses three groups of peptides, namely binders, nonbinders, and eluted peptides, the last of which are considered naturally processed; a higher probability score therefore indicates a peptide that is more likely to be naturally processed.
2.6 Epitope Analysis Tools
2.6.1 Population Coverage Calculation
All potential MHC I and MHC II binders from the S glycoprotein, the E protein, and the modified S and E sequences were assessed for population coverage across the whole world population, with particular attention to Saudi Arabia and the other countries that have reported MERS-CoV. Calculations were performed with the IEDB population coverage calculation tool (http://tools.iedb.org/tools/population/iedb_input) using the MHC-I and MHC-II alleles that interacted with the selected epitopes. The tool computes the projected population coverage, the average number of epitope hits/HLA combinations recognized by the population, and the minimum number of epitope hits/HLA combinations recognized by 90% of the population (PC90).
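The core of the coverage calculation can be sketched for a single locus. Assuming Hardy-Weinberg equilibrium, if f is the summed frequency of the alleles recognized by at least one selected epitope, the fraction of individuals carrying at least one such allele is 1 − (1 − f)². This reflects our understanding of the single-locus step of the IEDB tool; combining loci by assuming independence is a further simplification, and the epitope-hit averages and PC90 are not computed here. The function names are hypothetical, and the allele frequencies used for testing are invented.

```python
from typing import Dict, Iterable


def locus_coverage(allele_freqs: Dict[str, float], recognized: Iterable[str]) -> float:
    """Fraction of individuals with at least one recognized allele at one locus (Hardy-Weinberg)."""
    f = sum(allele_freqs.get(allele, 0.0) for allele in set(recognized))
    if not 0.0 <= f <= 1.0 + 1e-9:
        raise ValueError("summed allele frequency must lie in [0, 1]")
    f = min(f, 1.0)
    return 1.0 - (1.0 - f) ** 2


def combined_coverage(per_locus: Iterable[float]) -> float:
    """Combine loci assuming independence: an individual is missed only if missed at every locus."""
    missed = 1.0
    for coverage in per_locus:
        missed *= 1.0 - coverage
    return 1.0 - missed
```

The squared term reflects the two copies of each HLA gene: an individual escapes coverage only if both inherited alleles are unrecognized.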
PhD-SNP: The SVM input is the sequence and profile at the mutated position.
SNPs&GO: The SVM input comprises all PhD-SNP inputs plus PANTHER and GO term features; the output is a disease probability, and a mutation with a probability >0.5 is predicted to be disease-related.
2.9 Peptide Search Tool
The peptide search tool was used to find all UniProtKB sequences that exactly match a query peptide sequence (http://www.uniprot.org/peptidesearch/). This means we can easily synthesize the
3 Results
3.1 Prediction of B-Cell Epitopes
The S glycoprotein, E protein, and modified S and E proteins were subjected to BepiPred linear epitope prediction, Emini surface accessibility, Kolaskar and Tongaonkar antigenicity, Parker hydrophilicity, Chou and Fasman beta turn prediction methods, and
Fig. 1 BepiPred linear epitope prediction of S glycoprotein. Residues of the desired epitope are shown in yellow. The red horizontal line indicates the BepiPred threshold (0.35)
3.1.1 BepiPred Linear Epitope Prediction Method
The average B-cell binder score of the spike glycoprotein was 0.35; all values equal to or greater than the default threshold of 0.35 were predicted to be potential B-cell binders.
3.1.2 Emini Surface Accessibility Prediction
The average surface accessibility score of the protein was 1.000; all values equal to or greater than the default threshold of 1.0 were regarded as potentially surface exposed. Positive peptides numbered 481 of 1349 in the S glycoprotein and 23 of 77 in the E protein, and 485 of 485 and 17 of 77 in the modified S and E sequences, respectively.
3.1.3 Kolaskar and Tongaonkar Antigenicity
The default antigenicity threshold of the protein was 1.045; all values greater than 1.045 were considered potential antigenic determinants. Positive peptides numbered 655 of 1348 in the S glycoprotein and 55 of 76 in the E protein, and 668 of 668 and 47 of 76 in the modified S and E sequences, respectively.
3.1.4 Parker Hydrophilicity Prediction
The average hydrophilicity score of the protein was 1.286; all values equal to or greater than the default threshold of 1.286 were considered potentially hydrophilic. The number of positive S glycoprotein peptides
Fig. 2 Emini surface accessibility prediction of S glycoprotein. Residues of the desired epitope for surface accessibility are shown in yellow, and residues below the threshold (1.000) in green
Fig. 3 Kolaskar and Tongaonkar antigenicity prediction of S glycoprotein. Residues of the desired epitope for antigenicity are shown in yellow; green residues below the red horizontal line have antigenicity below the threshold (1.045)
Fig. 4 Parker hydrophilicity prediction of S glycoprotein. Residues of the desired epitope are shown in yellow. The red horizontal line indicates the Parker hydrophilicity threshold (1.286)
Fig. 5 Chou and Fasman beta turn prediction of S glycoprotein. Residues of the desired epitope are shown in yellow. The red horizontal line indicates the beta turn prediction threshold (1.009)
Fig. 6 Karplus and Schulz flexibility prediction of S glycoprotein. Residues of the desired epitope are shown in yellow. The red horizontal line indicates the flexibility threshold (0.992)
Fig. 7 BepiPred linear epitope prediction of S glycoprotein modified sequence. Residues of the desired epitope are shown in yellow. The red horizontal line indicates the BepiPred linear epitope threshold (0.35)
Fig. 8 Emini surface accessibility prediction of S glycoprotein modified sequence. Residues of the desired epitope are shown in yellow; green residues below the red horizontal line fall below the surface accessibility threshold (1.000)
Fig. 9 Kolaskar and Tongaonkar antigenicity prediction of S glycoprotein modified sequence. Residues of the desired epitope are shown in yellow. The red horizontal line indicates the antigenicity threshold (1.045)
Fig. 10 Parker hydrophilicity prediction of S glycoprotein modified sequence. Residues of the desired epitope are shown in yellow; green residues below the red horizontal line fall below the hydrophilicity threshold (1.286)
Fig. 11 Chou and Fasman beta turn prediction of S glycoprotein modified sequence. Residues of the desired epitope are shown in yellow. The red horizontal line indicates the beta turn threshold (1.009)
Fig. 12 Karplus and Schulz flexibility prediction of S glycoprotein modified sequence. Residues of the desired epitope are shown in yellow; green residues below the red horizontal line fall below the flexibility threshold (0.992)
Fig. 13 BepiPred linear epitope prediction of E protein. Residues of the desired epitope are shown in yellow. The red horizontal line indicates the BepiPred linear epitope threshold (0.35)
Fig. 14 Emini surface accessibility prediction of E protein. Residues of the desired epitope are shown in yellow; green residues below the red horizontal line fall below the surface accessibility threshold (1.000)
Fig. 15 Kolaskar and Tongaonkar antigenicity prediction of E protein. Residues of the desired epitope are shown in yellow; green residues below the red horizontal line fall below the antigenicity threshold (1.045)
3.1.5 Chou and Fasman Beta Turn Prediction
To determine the sites that contain beta turns, the default threshold was 1.009; all values equal to or greater than the threshold were considered beta turn sites. Positive peptides numbered 668 of 1348 in the S glycoprotein and 19 of 76 in the E protein, and 673 of 673 and 21 of 76 in the modified S and E sequences, respectively.
Fig. 16 Parker hydrophilicity prediction of E protein. Residues of the desired epitope are shown in yellow. The red horizontal line indicates the hydrophilicity threshold (1.286)
Fig. 17 Chou and Fasman beta turn prediction of E protein. Residues of the desired epitope are shown in yellow. The red horizontal line indicates the beta turn threshold (1.009)
3.1.6 Karplus and Schulz Flexibility Prediction
The default threshold value of 0.992 was used to determine chain flexibility in proteins; all values equal to or greater than the threshold were considered flexible. Positive peptides numbered 679 of 1347 in the S glycoprotein and 24 of 24 in the E protein, and 680 of 681 and 24 of 75 in the modified S and E sequences, respectively.
The most common B-cell epitope for the E protein is YVKFQDS at position 69, while for the modified E protein sequence, they are
Fig. 18 Karplus and Schulz flexibility prediction of E protein. Residues of the desired epitope are shown in yellow; green residues below the red horizontal line indicate flexibility below the threshold (0.992)
Fig. 19 BepiPred linear epitope prediction of E protein modified sequence. Residues of the desired epitope are shown in yellow. The red horizontal line indicates the BepiPred linear epitope threshold (0.35)
Fig. 20 Emini surface accessibility prediction of E protein modified sequence. Residues of the desired epitope, above the red horizontal threshold line (1.000), are shown in yellow
Fig. 21 Kolaskar and Tongaonkar antigenicity prediction of E protein modified sequence. Residues of the desired epitope are shown in yellow, and residues with antigenicity below the threshold (1.045) in green
3.2 T-Cell Epitope Prediction
The S glycoprotein, E protein, and modified S and E sequences were subjected to the consensus method for MHC-I binding, NetMHCIIpan for MHC-II binding, NetMHCpan for the combined proteasomal cleavage/TAP transport/MHC class I predictor, NetChop and
Fig. 22 Parker hydrophilicity prediction of E protein modified sequence. Residues of the desired epitope are shown in yellow. The red horizontal line indicates the hydrophilicity threshold (1.286)
Fig. 23 Chou and Fasman beta turn prediction of E protein modified sequence. Residues of the desired epitope are shown in yellow; green residues below the red horizontal line fall below the beta turn threshold (1.009)
3.2.1 MHC Class I Binding Predictions
Binding of peptide sequences to MHC class I molecules was assessed with the consensus method, selecting the conserved epitopes that bound to alleles with a percentile rank of 1.0 or less. The
Fig. 24 Karplus and Schulz flexibility prediction of E protein modified sequence. Residues of the desired epitope, above the flexibility threshold (0.992), are shown in yellow
3.2.2 MHC Class II Binding Predictions
Peptide binding to MHC class II molecules was assessed by selecting the conserved epitopes that bound to alleles with a percentile rank of 10 or less. The numbers of selected epitopes were 212 of 4819 in the S glycoprotein, 685 of 4148 in the E protein, and 6896 of 75,206 and 685 of 4148 in the modified S and E proteins, respectively.
The following alleles were shared by the S glycoprotein, E protein, and modified S and E sequences: HLA-DPA1*01:03/DPB1*02:01, HLA-DPA1*02:01/DPB1*01:01, HLA-DRB1*01:01, HLA-DRB1*01:02, HLA-DRB1*04:04, HLA-DRB1*04:05, HLA-DRB1*04:08, HLA-DRB1*04:10, HLA-DRB1*04:23, HLA-DRB1*07:01, HLA-DRB1*07:03, HLA-DRB1*08:06, HLA-DRB1*11:04, HLA-DRB1*11:06, HLA-DRB1*12:01, HLA-DRB1*13:04, HLA-DRB1*13:11, HLA-DRB1*13:21, and HLA-DRB4*01:01. In addition, the S and modified S glycoproteins both contain 42 other alleles, not shown here. In the E and modified E proteins, HLA-DRB1*01:01 had the highest frequency (20), followed by HLA-DRB1*01:02 (17), HLA-DRB1*12:01 (11), HLA-DRB1*11:04, HLA-DRB1*11:06, and HLA-DRB1*13:11 (10 each), and HLA-DRB1*07:01, HLA-DRB1*07:03, and HLA-DRB1*13:21 (9 each). In the S and modified S glycoproteins, the following alleles had the highest frequencies (S/modified S): HLA-DRB1*04:08 (200/199); HLA-DRB1*04:01, HLA-DRB1*04:21, and HLA-DRB1*04:26 (199/201); HLA-DRB1*09:01 (194/190); HLA-DRB1*04:05 (192/189); HLA-DRB1*07:01 and HLA-DRB1*07:03 (167/167); HLA-DRB1*15:02 (164/167); HLA-DRB1*13:02 (160/159); HLA-DRB1*11:14, HLA-DRB1*11:20, and HLA-DRB1*13:23 (159/159); and HLA-DRB3*01:01 (152/158).
The E and modified E proteins yielded the same peptide sequences with the same frequencies; the most frequent peptides were GFNTLLVQPALSLYM (15), TGFNTLLVQPALSLY (14), FNTLLVQPALSLYMT (13), MTGFNTLLVQPALSL (12),
3.2.3 Proteasomal Cleavage/TAP Transport/MHC Class I Combined Predictor
In NetMHCpan, higher scores indicate higher efficiency, because the tool predicts a quantity proportional to the amount of peptide presented by MHC molecules on the cell surface. Peptides with a total score of 0 or higher were selected for the S and modified S glycoproteins, 0.3 or higher for the E protein, and 2.82 or higher for the modified E protein; see Tables 3 and 4.
3.2.4 Neural Network-Based Prediction of Proteasomal Cleavage Sites (NetChop) and T-Cell Epitopes (NetCTL and NetCTLpan)
The positive prediction thresholds were 0.5 for NetChop and 0.75 for NetCTL (shown in green); positions at or above these thresholds were considered proteasomal cleavage sites for T-cell epitopes; see Figs. 25–38 and Table 5.
A NetChop prediction score of 0.5 or greater represented a positive result. More than 300 of 1353 peptides in the S glycoprotein were positive, compared with 5 of 66 in the modified S glycoprotein, 28 of 82 in the E protein, and 28 of 82 in the modified E protein.
In both the E and modified E proteins, 28 amino acids crossed the 0.5 threshold, mostly at the same residue positions: F33; L58, L50, L39, L51, L28, L56, and L2; Q70; R63; Y59 and Y66; and V67, V65, V41, V21, V22, V52, and V29. The exceptions were V82 in E (V10 in modified E); L76 in E (L34 and L6 in modified E); F69 in E (F17 and F19 in modified E); W81 in E (W11 in modified E); R38, I18, K68, K73, M60, and Y57 in E; and A32 in modified E.
Table 3
Positive peptide sequences selected by the NetMHCpan prediction tool for the S and modified S glycoprotein sequences
S               Modified S
AFYCILEPRa      AFYCILEPRa
ASLNSFKEYa,b    ASLNSFKEYa,b
ATDCSDGNYa,b    ATDCSDGNYa,b
AYQNLVGYYa,b    AYQNLVGYYa,b
ALALCVFFIa      AAIPFAQSI
CGTLLRAFYa      ALGAMQTGF
CTFMYTYNIa,b    AVNNNAQALb
CYSSLILDYa      ALALCVFFIa
CMGKLKCNRa,b    CGTLLRAFYa
DAYQNLVGYa,b    CTFMYTYNIa,b
ESFDVESGV       CYSSLILDYa
EMRLASIAFa      CMGKLKCNRa,b
ETKTHATLFa      DLSQLHCSY
ESAALSAQLa      DAYQNLVGYa,b
FANGFVVRIb      ETKTHATLFa
FLLTPTESYa      EMRLASIAFa
FFNHTLVLLa,b    EAAYTSSLL
FSDGKMGRFa      ESAALSAQLa
FSSRYVDLYa      FLLTPTSSYa
FQFATLPVY       FFNHTLVLLa,b
FSVDGYIRR       FSDGKMGRFa
FYVYKLQPLa      FSSRYVDLYa
FSNPTCLILa,b    FTNCNYNLTb
FQNCTAVGVa,b    FYVYKLQPLa
FSFGVTQEYa      FSNPTCLILa,b
FVVNAPNGLb      FQNCTAVGVa,b
FQDELDEFFa      FVYDAYQNLb
GVHLFSSRYa      FSFGVTQEYa
GLVNSSLFVa,b    FAQSIFYRL
GYYSDDGNYa,b    FQDELDEFFa
GLYFMHVGYa      GVHLFSSRYa
GQGTHIVSF       GVRQQRFVY
GRLTTLNAFa,b    GYYSDDGNYa,b
HSVFLLMFL       GLVNSSLFVa,b
HISSTMSQYa      GWTAGLSSF
IEVDIQQTFa      GRLTTLNAFa,b
IIYPQGRTYc      GLYFMHVGYa
ITITYQGLF       HISSTMSQYa
ITYQGLFPYa      IEVDIQQTFa
ITEDEILEWa      IIYPQTRTYc
IASNCYSSLa,b    ITYQGLFPYa
ILATVPHNLa,b    ITEDEILEWa
ILDYFSYPLa      IASNCYSSLa,b
ITKPLKYSYa      ILATVPHNLa
IAFNHPIQVa,b    ILDYFSYPLa
IEVVSAYGLa      ITKPLKYSYa
IAGLVALALa      IAFNHPIQVa,b
KQFANGFVVa,b    ICAQYVAGY
KAWAAFYVYa      IPFAQSIFY
KLQPLTFLLc      IANKFNQALb
KETKTHATLa      IEVVSAYGLa
KVTIADPGYa      IPNFGSLTFb
KVTVDCKQYa      IAGLVALALa
KELGNYTYYa,b    KQFDNGFVVa,b
KYVAPQVTYa      KAWAAFYVYa
LLRAFYCILa      KLQPLTFLWc
LLDFSVDGY       KETKTHATLa
LPVYDTIKYa      KVTVDCKQYa
LYGGNMFQFb      KVTIADPGYa
LSGTPPQVYa      KYVAPQVTYa
LSLFSVNDFb      KELGNYTYYa,b
LSIPTNFSFa,b    LLRAFYCILa
LQMGFGITVa      LPVYDTIKYa
LINGRLTTLa,b    LSGTPPQVYa
LVRSESAALa      LTFLWDFSV
LYFMHVGYYa      LQMGFGITVa
LVALALCVFa      LSIPTNFSFa,b
MGRFFNHTLa,b    LGSIAGVGW
MLGSSVGNFa,b    LSSFAAIPF
MGFGITVQYa      LASELSNTFb
MTEQLQMGFa      LINGRLTTLa,b
MLKRRDSTY       LVRSESAALa
MSQYSRSTRa      LTFINTTLLb
NLRNCTFMYa,b    LYFMHVGYYa
NSYTSFATYa,b    LVALALCVFa
NSVCPKLEFa,b    MGRFFNHTLa,b
NHIEVVSAYa,b    MLGSSVGNFa,b
NTTLLDLTYb      MGFGITVQYa
PVYDTIKYY       MSQYSRSTRa
QFANGFVVRb      MTEQLQMGFa
QTAQGVHLFa      MEAAYTSSL
QPLTFLLDFc      NLRNCTFMYa,b
QSFSNPTCLa,b    NSYTSFATYa,b
QALHGANLRb      NSVCIKLEFa,b
QSSPIIPGFa      NHIEVVSAYa,b
RFFNHTLVLa,b    QTAQGVHLFa
RNCTFMYTYa      QLHCSYESF
RLVFTNCNYa,b    QPLTFLWDFc
RSTRSMLKRa      QSFSNPTCLa,b
RSAIEDLLFa      QQRFVYDAY
SVFLLMFLL       QVDQLNSSYb
SFKEYFNLRa,b    QSSPIIPGFa
SLNSFKEYFa,b    RFFNHTLVLa,b
SFDVESGVYa      RNCTFMYTYa,b
SGVYSVSSFa      RLVFTNCNYa,b
SLILDYFSYa      RSTRSMLKRa
SQFNYKQSFa,b    RSAIEDLLFa
SSAGPISQFa      SFKEYFNLRa,b
SPLEGGGWLa      SLNSFKEYFa,b
SQLGNCVEYa,b    SFDVESGVYa
STVAMTEQL       SGVYSVSSFa
STVWEDGDYa      SLILDYFSYa
SYINKCSRLa,b    SPLEGGGWLa
SSTMSQYSRa      SQFNYKQSFa,b
STLTPRSVRa      SSAGPISQFa
STRSMLKRRa      STVWEDGDYa
SVRNLFASVa,b    SYINKCSRLa,b
TFFDKTWPRa      SSTMSQYSRa
TYSNITITYa,b    STRSMLKRRa
TAVGVRQQRa      SQLGNCVEYa,b
TVWEDGDYYa      STLTPRSVRa
TLLDLTYEM       SLLGSIAGV
TSIPNFGSLa,b    SVRNLFASVa,b
TYQNISTNLa,b    TFFDKTWPRa
TYYNKWPWYa,b    TYSNITITYa,b
VSKADGIIYa      TTITKPLKY
VYKLQPLTFa      TVWEDGDYYa
VECDFSPLLa      TAVGVRQQRa
VYNFKRLVFa,b    TTNEAFQKVb
VASGSTVAM       TSIPNFGSLa,b
VSIVPSTVWa      TYQNISTNLa,b
VSVPVSVIYa      TYYHKWPWYa
VNAPNGLYFa,b    VSKADGIIYa
VVNAPNGLYa,b    VECDFSPLLa
VALALCVFFa      VYKLQPLTFa
VVKALNESYa,b    VYNFKRLVFa,b
WPWYIWLGFa      VSIVPSTVWa
WAAFYVYKLa      VSVPVSVIYa
YQGDHGDMYc      VNAPNGLYFa,b
YFNLRNCTFa,b    VVNAPNGLYa,b
YYSIIPHSIa      VALALCVFFa
YSIIPHSIRa      VVKALNESYa,b
YNLTKLLSLa,b    WPWYIWLGFa
YPLSMKSDLa      WSYTGSSFY
YSSLILDYFa      WTAGLSSFA
YGVSGRGVFa      WAAFYVYKLa
YINKCSRLLa      YQGDHGDYYc
YSLYGVSGRa      YFNLRNCTFa,b
YSYINKCSRa,b    YNLTKLLSLa,b
YYRKQLSPLa      YSIIPHSIRa
YSRSTRSMLa      YYSIIPHSIa
YYSDDGNYYa,b    YINKCSRLLa,b
YYPSNHIEVa,b    YPLSMKSDLa
YAPEPITSLa      YSSLILDYFa
YTYYNKWPWb,c    YSYINKCSRa,b
YYNKWPWYIb,c    YYRKQLSPLa
–               YGVSGRGVFa
–               YSLYGVSGRa
–               YSRSTRSMLa
–               YYSDDGNYYa,b
–               YAPEPITSLa
–               YYPSNHIEVa,b
–               YTYYHKWPWc
–               YYHKWPWYIc
a Indicates a common peptide sequence
b Indicates presence of arginine in sequence
c Indicates a partial similarity between the reference sequence and the modified sequence
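The a and c markers in Table 3 amount to comparing the two peptide lists for exact and near matches. A minimal Python sketch follows, using a few peptides taken from Table 3. The exact-match rule corresponds to marker a; treating "partial similarity" (marker c) as equal length with at most one mismatched position is an assumption that happens to fit the c-marked pairs, not a rule stated by the authors.

```python
def hamming(a, b):
    """Number of positions at which two equal-length peptides differ."""
    return sum(x != y for x, y in zip(a, b))


def classify_peptides(reference, modified, max_mismatches=1):
    """Label each reference peptide as 'common', 'partial' or 'unique'.

    common  : the identical peptide occurs in the modified list.
    partial : a modified peptide of the same length differs at no more
              than max_mismatches positions (assumed cut-off).
    unique  : neither of the above.
    """
    modified_set = set(modified)
    labels = {}
    for pep in reference:
        if pep in modified_set:
            labels[pep] = "common"
        elif any(len(pep) == len(m) and hamming(pep, m) <= max_mismatches
                 for m in modified):
            labels[pep] = "partial"
        else:
            labels[pep] = "unique"
    return labels


reference = ["AFYCILEPR", "IIYPQGRTY", "KLQPLTFLL", "ESFDVESGV"]
modified = ["AFYCILEPR", "IIYPQTRTY", "KLQPLTFLW", "DLSQLHCSY"]
print(classify_peptides(reference, modified))
```

With these inputs, the labels agree with the markers in Table 3 for the same peptides.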
Table 4
Positive peptide sequences selected by the NetMHCpan prediction tool for the E and modified E proteins
E               Modified E
ALYLYNTGRa      KPPLPEDVW
CMAFLTATR
FTVVCAITL
FVQERIGLF
ITLLVCMAF
LFIVNFFIFa
LVQPALYLY
LYNTGRSVYa
MAFLTATRL
RIGLFIVNFa
TLLVQPALY
a Indicates presence of arginine in sequence
Fig. 25 NetChop prediction of E protein (score vs. position). Positive predictions have scores equal to or greater than the 0.5 threshold
N.B.:
1. The peptide sequences of the E and modified E proteins differed, even where they shared a residue position.
2. NetCTL was applied only to the E and modified E proteins, because applying it to the S glycoprotein generated large amounts of data and was time-consuming.
3. The NetCTL charts for the modified E protein are not shown here.
Fig. 26 NetChop prediction of modified E protein (score vs. position). Positive predictions have scores equal to or greater than the 0.5 threshold
Fig. 27 NetCTL positive prediction of E protein, supertype A1 (score vs. position). Positive predictions (green) have scores equal to or greater than the 0.75 threshold (red)
3.2.5 MHC-NP: Prediction of Peptides Naturally Processed by the MHC
A higher probability score indicates a peptide more likely to be naturally processed; peptides with probability scores greater than 0 were considered naturally processed.
The total number of positive, naturally processed peptides was 10,189 out of 10,760 in the S glycoprotein and
Fig. 28 NetCTL prediction of E protein, supertype A2 (score vs. position). Positive predictions (green) have scores equal to or greater than the 0.75 threshold (red)
Fig. 29 NetCTL prediction of E protein, supertype A3 (score vs. position). Positive predictions (green) have scores equal to or greater than the 0.75 threshold (red)
Fig. 30 NetCTL prediction of E protein, supertype A24 (score vs. position). Positive predictions (green) have scores equal to or greater than the 0.75 threshold (red)
Fig. 31 NetCTL prediction of E protein, supertype A26 (score vs. position). Positive predictions (green) have scores equal to or greater than the 0.75 threshold (red)
Fig. 32 NetCTL negative prediction of E protein, supertype B7 (score vs. position), with scores below the 0.75 threshold
Fig. 33 NetCTL negative prediction of E protein, supertype B8 (score vs. position), with scores below the 0.75 threshold
NetCTL Prediction (score vs. position; threshold 0.75)
Fig. 35 NetCTL negative prediction of E protein, supertype B39 (score vs. position), with scores below the 0.75 threshold
Fig. 36 NetCTL negative prediction of E protein, supertype B44 (score vs. position), with scores below the 0.75 threshold
Fig. 37 NetCTL prediction of E protein, supertype B58 (score vs. position). Positive predictions (green) have scores equal to or greater than the 0.75 threshold (red)
Fig. 38 NetCTL prediction of E protein, supertype B62 (score vs. position). Positive predictions (green) have scores equal to or greater than the 0.75 threshold (red)
3.3 Epitope Analysis Tools
3.3.1 Population Coverage Calculation
The population coverage of the MHC-I and MHC-II interacting alleles was computed with the IEDB population coverage calculation tool. The tool reports the average number of epitope hits/HLA combinations recognized by the population and the minimum number of epitope hits/HLA combinations recognized by 90% of the population (PC90); see Tables 6–11.
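To make the coverage, average-hit, and PC90 quantities above concrete, here is a simplified single-locus Python sketch. It is not the IEDB implementation, which combines several loci and uses curated population allele frequencies. It assumes Hardy–Weinberg genotype frequencies, and the allele frequencies and epitope-hit counts are hypothetical values chosen for illustration.

```python
from itertools import combinations_with_replacement


def hit_distribution(allele_freqs, epitope_hits):
    """Distribution of epitope hits per person at ONE HLA locus.

    allele_freqs: allele -> gene frequency (should sum to 1; lump the
                  remainder into an "other" allele).
    epitope_hits: allele -> number of selected epitopes predicted to bind it.
    Genotype frequencies follow Hardy-Weinberg: p^2 for homozygotes, 2pq otherwise.
    """
    dist = {}
    for a, b in combinations_with_replacement(list(allele_freqs), 2):
        freq = allele_freqs[a] * allele_freqs[b] * (1 if a == b else 2)
        # A homozygote carries one distinct HLA molecule, so count its hits once.
        n_hits = sum(epitope_hits.get(allele, 0) for allele in {a, b})
        dist[n_hits] = dist.get(n_hits, 0.0) + freq
    return dist


def coverage_summary(dist, fraction=0.9):
    """Return (coverage, average hits, PC90) from a hit distribution."""
    coverage = sum(p for n, p in dist.items() if n > 0)
    average_hits = sum(n * p for n, p in dist.items())
    # PC90: the largest hit count h reached by at least 90% of the population.
    pc90 = 0
    for h in sorted(dist):
        if sum(p for n, p in dist.items() if n >= h) >= fraction:
            pc90 = h
    return coverage, average_hits, pc90


# Hypothetical gene frequencies and epitope-hit counts, for illustration only.
freqs = {"A*02:01": 0.5, "A*01:01": 0.3, "other": 0.2}
hits = {"A*02:01": 2, "A*01:01": 1}
print(coverage_summary(hit_distribution(freqs, hits)))
```

With these toy numbers, 96% of individuals carry at least one binding allele. Only one hit is guaranteed for 90% of them, so PC90 is 1, even though the average is about two hits.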
The following E protein epitopes were selected for the population coverage calculation:
PFVQER, VQERIG, QERIGL, FLTATR, LYLYNT,
YLYNTG, LYNTGR, YNTGRS, NTGRSV, TGRSVY, RSVYVK,
YVKFQD, VKFQDS, KFQDSK, FQDSKP, QDSKPP, DSKPPL,
SKPPLP, KPPLPP, PPLPPD, PLPPDE, LPPDEW, PPDEWV,
MLPFVQE, LPFVQER, PFVQERI, VQERIGL, RIGLFIV,
IGLFIVN, GLFIVNF, LFIVNFF, FIVNFFI, IVNFFIF, and
VNFFIFT.
The MHC-I and MHC-II population coverage percentages differed.
MHC-I coverage was similar for the E and modified E proteins, but MHC-II coverage still differed between them.
The following modified E protein epitopes were selected for the population coverage calculation:
RSVYVP, LYMTGR, VYVPQQ, PLPEDV, QERIGW,
TGRSVY, YMTGRS, QFVQER, VPQQDS, SKPPLP, PPLPED,
DSKPPL, YVPQQD, KPPLPE, QDSKPP, PQQDSK, QQDSKP,
PLPEDVW, QFVQERI, AFLTATH, MLQFVQE, ALSLYMT,
Table 5
NetCTL positive results for the E and modified E proteins, indicating the similarities and differences between their peptide sequences, together with their total numbers
Supertype  E protein    Modified E protein  Residue position (E/modified E)
A1         LVQPALYLY    LVQPALSLY           51/51
           LYNTGRSVY                        58/58
A2         FVQERIGWF    FVQERIGWF           4/4
           VVCDITLLV    VVCDITLLV           21/21
           FLTATHLCV    FLTATHLCV           33/33
           LLVQPALSL    LLVQPALSL           50/50
           SLYMTGRSV    SLYMTGRSV           57/57
           YMTGRSVYV    YMTGRSVYV           59/59
A3         ALYLYNTGR    ALSLYMTGR           55/55
           NTGRSVYVK                        60/
           VYVKFQDSK                        65/
A24        MLPFVQERI    MLQFVQERI           1/1
           PFVQERIGL    FVQERIGWF           3/4
           FVQERIGLF    RIGWFIPNF           4/8
           RIGLFIVNF    WFIPNFFDF           8/11
           IGLFIVNFF    FTVVCDITL           9/19
           LFIVNFFIF    ITLLVCTAF           11/25
           FTVVCAITL    LVQPALSLY           19/51
           ITLLVCMAF    LYMTGRSVY           25/58
           MAFLTATRL                        31/
           LVQPALYLY                        51/
           LYNTGRSVY                        58/
           TGRSVYVKF                        61/
           KFQDSKPPL                        68/
A26        FVQERIGWF    FVQERIGWF           4/4
           RIGWFIPNF    RIGWFIPNF           8/8
           WFIPNFFDF    WFIPNFFDF           11/11
           TVVCDITLL    TVVCDITLL           20/20
           ITLLVCTAF    ITLLVCTAF           25/25
           ATHLCVQCM    ATHLCVQCM           36/36
           LCVQCMTGF    LCVQCMTGF           39/39
           QCMTGFNTL    QCMTGFNTL           42/42
           NTLLVQPAL    NTLLVQPAL           48/48
           LVQPALSLY    LVQPALSLY           51/51
B7         –            LLVQPALSL           /50
                        QPALSLYMT           /53
                        KPPLPEDVW           /3
B8         FVQERIGLF    FVQERIGWF           4/4
           TGRSVYVKF    WFIPNFFDF           61/11
B27        –            –                   –
B39        YNTGRSVYV    YMTGRSVYV           59/59
           KFQDSKPPL                        68
B44        –            –                   –
B58        ITLLVCMAF    IGWFIPNFF           25/9
           KPPLPPDEW    ITLLVCTAF           73/25
                        KPPLPEDVW           /3
B62        FVQERIGLF    FVQERIGWF           4/4
           ITLLVCMAF    WFIPNFFDF           25/11
           TLLVQPALY    ITLLVCTAF           49/25
           LVQPALYLY    LVQPALSLY           51/51
           YLYNTGRSV    LYMTGRSVY           57/58
Table 6
MHC-I population coverage for the S and modified S glycoproteins
Class I
Table 7
MHC-II population coverage for the S and modified S glycoproteins
Class II
3.4 Homology Modeling
The results of homology modeling are not shown here because they are not necessary.
Table 8
MHC-I population coverage for the E protein
Class I
3.5 Confirmation of Amino Acid Change in Spike Glycoprotein (S) and Envelope Protein (E) Sequence
The results of the confirmatory amino acid change analysis are not shown here because they are not necessary.
3.6 Peptide Search Tool
The peptide search tool showed that the selected peptide sequences are also present in other organisms, such as Leishmania donovani, Drosophila sechellia (fruit fly), Leishmania infantum,
Table 9
MHC-I population coverage for the modified E protein
Class I
Table 10
MHC-II population coverage for the E protein
Class II
Table 11
MHC-II population coverage for the modified E protein
Class II
3.8 AlgPred: Prediction of Allergenic Proteins and Mapping of IgE Epitopes
AlgPred predicted all four sequences (S, E, modified S, and modified E proteins) to be non-allergens, as follows:
1. Prediction by mapping of IgE epitope: the protein sequence does not contain an experimentally proven IgE epitope.
2. MAST result: no hits found; NON-ALLERGEN.
3. BLAST results of ARPs: no hits found; NON-ALLERGEN.
4. Prediction by hybrid approach: NON-ALLERGEN/ALLERGEN.
There were slight differences between the four sequences in the SVM predictions based on amino acid composition and dipeptide composition (Tables 12 and 13).
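The SVM predictions mentioned above are based on amino acid composition and dipeptide composition, which are fixed-length numeric summaries of a sequence. The Python sketch below computes such 20- and 400-dimensional vectors as simple fractions. AlgPred's exact normalization and its trained SVM models are not reproduced, so treat this only as an illustration of the feature types. The example sequence is a toy input.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]


def aa_composition(seq):
    """Fraction of each of the 20 standard amino acids (20 features)."""
    seq = seq.upper()
    n = len(seq)
    return [seq.count(aa) / n for aa in AMINO_ACIDS]


def dipeptide_composition(seq):
    """Fraction of each of the 400 dipeptides among overlapping residue pairs."""
    seq = seq.upper()
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    counts = dict.fromkeys(DIPEPTIDES, 0)
    for pair in pairs:
        if pair in counts:  # skip pairs that contain non-standard residues
            counts[pair] += 1
    total = len(pairs)
    return [counts[d] / total for d in DIPEPTIDES]


seq = "MLPFVQERIG"  # toy input
aac = aa_composition(seq)
dpc = dipeptide_composition(seq)
print(len(aac), len(dpc))
```

Because both vectors have a fixed length regardless of protein size, sequences of different lengths, such as the S and E proteins, can be fed to the same classifier.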
3.9 VaxiJen v2.0
The VaxiJen server classified the following protein sequences as probable antigens:
S glycoprotein: threshold for this model, 0.4; overall antigen
prediction, 0.4827 (probable ANTIGEN).
Modified S glycoprotein: threshold for this model, 0.4; overall
antigen prediction, 0.4907 (probable ANTIGEN).
Table 12
SVM prediction methods based on amino acid composition for the four protein sequences
Table 13
SVM prediction methods based on dipeptide composition for the four protein sequences
4 Discussion
et al. [6] reported that immunization with full-length S DNA and a truncated S1 subunit glycoprotein elicited high titers of neutralizing antibodies and protected non-human primates (NHPs) from severe lung disease after intratracheal MERS-CoV challenge. In another study, conducted in Iran, Poorinmohammad et al. [15] used the computational prediction tools NetCTL 1.2 (Larsen et al., 2007), EpiJen (Doytchinova et al., 2006), and NHLApred (Bhasin and Raghava, 2007), together with the PEPstr server for modeling (Kaur et al., 2007), to identify cytotoxic T-lymphocyte epitopes in the selected receptor-binding domain (RBD) presented by human leukocyte antigen (HLA)-A∗0201, the most frequent HLA class I allele among Middle Eastern populations. They selected LLSGTPPQV, ILDYFSYPL, ILATVPHNL, NLTTITKPL, LQMGFGITV, and FSNPTCLIL as epitopes, but considered only LLSGTPPQV and FSNPTCLIL to be real epitopes, because peptides with binding orientations closer to the native structure and lower binding free energy scores are ranked higher in their potential to be real epitopes. By contrast, Shi J et al. [19], using the Immune Epitope Database, reported that the nucleocapsid (N) protein of MERS-CoV might be a better protective immunogen than the spike (S) protein, owing to its high conservancy and its potential to elicit both neutralizing antibodies and T-cell responses. They identified 71 peptides as helper T-cell epitopes and 34 peptides as CTL epitopes; only the top 10 helper T-cell and CTL epitopes, ranked by the maximum number of HLA binding alleles, were considered MERS vaccine candidates able to elicit protective cellular immune responses against MERS-CoV, and these cover 15 geographic regions [19].
This study consists of two parts: the reference and the modified sequences of both the S glycoprotein and the E protein. I found that the most common B-cell epitope that passed all B-cell prediction methods [IEDB prediction tool] for the E protein is YVKFQDS at position 69, and for the modified E protein they are VYVPQQD, YVPQQDS, and PPLPED/PPLPEDV at positions 68, 69, and 77, respectively. For the S and modified S glycoproteins, they are DVGPDSV, PDSVKSA, DSVKSAC, PRPIDVS, HTPATDC, AKPSGSV, KPSGSVV, SGTPPQV, GTPPQVY, TPPQVYN, QLSPLEG, YGPLQTP, PRSVRSV, RSVRSVP, SVKSSQS, VKSSQSS, SQSSPII, and SLNTKYV at positions 23, 26, 27, 48, 211, 371, 372, 393, 394, 395, 547, 707, 750, 751, 856, 859 (857 in the modified S glycoprotein), and 1202, respectively. However, the QVDQLNS and VDQLNSS epitopes at positions 772 and 773 are found only in the S glycoprotein, while the LTPTSSY, TPTSSYV, PTSSYVD, TSSYVDV, DHGDYYV, YSQDVKQ, ANQYSPC, NQYSPCV, and YYRKQLS epitopes at positions 15, 16, 17, 18, 83, 108, 523, 524, and 543 are found only in the modified S glycoprotein. The S and modified S glycoprotein results of my study partially agree with the study by Badawi et al. [16] at the Africa City of Technology, Khartoum, Sudan, for the epitopes GTPPQVY at positions 391–397 and LTPRSVRSVP at positions 745–754; the differences may be due to the different numbers of MERS-CoV protein sequences selected.
In the prediction of cytotoxic T-lymphocyte epitopes and their interaction with MHC class I, ILDYFSYPL was common to my study and to the studies of Badawi et al. [16] and Poorinmohammad and Mohabatkar [15]. Partial similarity with the Iranian study [15] was noticed for the LLSGTPPQV, ILATVPHNL, LQMGFGITV, and FSNPTCLIL epitopes, whereas NLTTITKPL was absent from the S and modified S sequences in my study; FSNPTCLIL represents the only epitope that was found in my study in the S and modified S sequences. FSFGVTQEY, in addition to ITYQGLFPY, showed high affinity for many alleles in the S glycoprotein sequence of my study, in agreement with Badawi et al. [16]; however, the number of selected epitopes that interacted with MHC-I was higher in my study than in Badawi et al. [16]. In the E protein, FIFTVVCAI had the highest allele affinity, followed by ITLLVCMAF, IVNFFIFTV, and LVQPALYLY. In contrast, in the modified E protein, LVQPALSLY showed the highest affinity, followed by LYMTGRSVY, WFIPNFFDF, YMTGRSVYV, ITLLVCTAF, FVQERIGWF, FLTATHLCV, and CMTGFNTLL, the last of which is common to both the E and modified E protein sequences.
In the prediction of T-helper cell epitopes and their interactions with MHC class II, Badawi et al. [16] considered FNLTLLEPVSISTGS the most suitable epitope, with high affinity for 26 alleles. This epitope was also found in the S and modified S sequences in my study; the difference is that here it cannot be considered the most suitable epitope with high binding affinity for different alleles, as it was in the study by Badawi et al. [16].
Apart from the partial similarity noted above for the S and modified S glycoproteins, I found no previous research results on epitope vaccines based on the E protein, the modified E protein, or the S glycoprotein.
No previous study has examined the allergenicity of the S glycoprotein and E protein; the study by Shi J et al. [19] examined only the N protein. In this study, the S and E proteins showed no allergenicity according to the AllerHunter service. Furthermore, Shi J et al. [19] reported that, for the N protein, surface accessibility analysis of the predicted peptides gave a maximum surface probability of 6.971 at amino acid positions 363 to 368 (363KKEKKQ368) and a minimum surface probability of 0.074 for the 205GIGAVG210 peptide, while flexibility analysis of the predicted peptides gave a maximum flexibility value of 1.160 at amino acid positions 170 to 176 (167GNSQSSS173), with the minimum value
4.1 Conclusions
As I mentioned before, computational vaccine and drug design has become very important in both developed and developing countries as a way to avoid wasting resources, time, and effort. For MERS-CoV, it is important to design an effective vaccine that protects not only against MERS-CoV but also against newly emerging strains and other human coronaviruses, especially since MERS-CoV vaccine candidates have not yet passed all stages of the vaccine development process.
In this study, I found the following points. The emergence of new strains may cause only minor changes in the peptide sequences of a vaccine, especially when the selected viral regions are neither too long nor too short.
In B-cell prediction, mutations can lead to an increased number of selected epitopes, with very few sequence changes noticed, in addition to a large number of epitopes shared between the reference and modified sequences. This means that the mutated sequence has the ability to elicit the same immune response (IR), that is, a response to the virus by the same antibodies as in the first infection.
Acknowledgments
The author would like to thank Allah; her family, for always supporting her; and the members of the National Ribat University.
References
1. Coronavirus-Vaccine-a-6110.html, 2013
2. http://en.wikipedia.org/wiki/Coronavirus, 2014
3. Khan G (2013) A novel coronavirus capable of lethal human infections: an emerging picture. Virol J 10:66. http://virologyj.biomedcentral.com/articles/10.1186/1743-422X-10-66
4. Modjarrad K (2016) MERS-CoV vaccine candidates in development: the current landscape. Vaccine 34(26):2982–2987
5. Ithete NL, Stoffberg S, Corman VM, Cottontail VM, Richards LR, Schoeman MC, Drosten C, Drexler JF, Preiser W (2013) Close relative of human middle east respiratory syndrome coronavirus in Bat, South Africa. Emerg Infect Dis 19(10):1697–1699
6. Wang L, Shi W, Joyce GM, Modjarrad K, Zhang Y, Leung K, Lees RC, Zhou T, Yassine MH et al (2015) Evaluation of candidate vaccine approaches for MERS-CoV. Nat Commun
Abstract
A great number of novel proteins have been generated from new sources and genetically modified foods
during the last decade. As the allergenicity of these proteins is of particular importance for their safe usage,
fast and reliable screening strategies for allergenicity assessment are required. The WHO/FAO guidance
relies on structural similarities between the novel proteins and known allergens detected by sequence
alignment. However, the allergic response involves conformational IgE epitopes that are undetectable by
sequence alignment. Here, we present a protocol for allergenicity prediction based on a platform of three
alignment-independent servers developed in our lab: AllerTOP v.1, AllerTOP v.2, and AllergenFP. The
servers use similar datasets but different chemical descriptors and methods to derive models for allergenicity
prediction. The platform is freely accessible and user-friendly. The protocol is demonstrated stepwise on a
randomly chosen query protein.
Key words Allergenicity, Allergenicity prediction, Physicochemical properties of amino acids, Alignment-independent methods
1 Introduction
148 Ivan Dimitrov and Irini Doytchinova
2.1 Swiss-Prot
Swiss-Prot is the manually annotated and reviewed database of the Universal Protein Knowledgebase (UniProtKB) (http://www.uniprot.org). It contains more than 50,000 protein sequences (July 2019).
2.4 AllerTOP v.2
The AllerTOP v.2 model was developed on an updated dataset of 2427 allergens and 2427 non-allergens. The set of allergens was collected from several databases containing allergens of different origins and routes of exposure. The set of non-allergens was collected from widely used food species, such as tomato, pepper, potato, bread wheat, and rice, and from human non-immunogenic proteins.
The protein sequences were encoded by five E-descriptors accounting for hydrophobicity, size, helix- or β-strand forming propensities, and relative abundance. After ACC transformation, the dataset was analyzed by the kNN algorithm. The best performing model showed an accuracy of 85% at k = 1 [3].
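The ACC step above turns a variable-length descriptor profile into a fixed-length vector; kNN with k = 1 then assigns the class of the single closest training protein. The following Python sketch is a minimal illustration under stated assumptions. It uses the common auto- and cross-covariance form ACC(j, k, lag) = Σ E(j, i)·E(k, i + lag)/(n − lag), Euclidean distance, and a maximum lag of 2. The two-value descriptor table is a toy placeholder, NOT the published E-descriptors, and the lag range and distance metric used by AllerTOP v.2 are not taken from the text.

```python
import math

# Toy two-value descriptors for four residues (placeholders, NOT E-descriptors).
TOY_DESCRIPTORS = {
    "A": [0.5, -0.2],
    "K": [-1.0, 0.8],
    "L": [1.2, 0.1],
    "E": [-0.9, -0.5],
}


def acc_transform(seq, descriptors, max_lag=2):
    """Auto- and cross-covariance (ACC) of descriptor profiles along a sequence.

    Returns d * d * max_lag values, independent of the sequence length.
    """
    profile = [descriptors[residue] for residue in seq]
    n, d = len(profile), len(profile[0])
    vector = []
    for lag in range(1, max_lag + 1):
        for j in range(d):
            for k in range(d):
                total = sum(profile[i][j] * profile[i + lag][k]
                            for i in range(n - lag))
                vector.append(total / (n - lag))
    return vector


def nearest_neighbour(query_vec, training):
    """kNN with k = 1: label of the closest training vector (Euclidean distance)."""
    best_label, best_dist = None, math.inf
    for vec, label in training:
        dist = math.dist(query_vec, vec)
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


# Hypothetical two-protein training set, for illustration only.
training = [
    (acc_transform("ALLKA", TOY_DESCRIPTORS), "allergen"),
    (acc_transform("EKEKE", TOY_DESCRIPTORS), "non-allergen"),
]
query = acc_transform("ALLAK", TOY_DESCRIPTORS)
print(nearest_neighbour(query, training))
```

The fixed output length (here 2 × 2 × 2 = 8 values) is what lets proteins of different lengths be compared directly.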
2.5 AllergenFP
AllergenFP uses the same ACC-transformed dataset as AllerTOP v.2. Each vector from the dataset is encoded as a binary fingerprint. The tested protein is represented as a string of E-descriptors, transformed by ACC, and binary-coded to generate a fingerprint. This fingerprint is compared with the protein fingerprints in the AllergenFP dataset using the Tanimoto similarity index. The protein showing the highest similarity to the tested protein determines whether the tested protein is predicted to be an allergen or a non-allergen. External validation of the model gave 88% accuracy [4].
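The fingerprint comparison above can be sketched in a few lines of Python. The Tanimoto index for binary fingerprints is c / (a + b − c), where a and b are the numbers of set bits in each fingerprint and c is the number they share. How AllergenFP binarizes its ACC vectors is not described here, so the `to_fingerprint` cut-off at 0 is a hypothetical scheme. The reference fingerprints are toy values.

```python
def to_fingerprint(vector, cutoff=0.0):
    """Hypothetical binarization: 1 where a value exceeds the cutoff, else 0."""
    return [1 if value > cutoff else 0 for value in vector]


def tanimoto(fp_a, fp_b):
    """Tanimoto index c / (a + b - c) for two equal-length binary fingerprints."""
    a = sum(fp_a)
    b = sum(fp_b)
    c = sum(x & y for x, y in zip(fp_a, fp_b))
    denominator = a + b - c
    return c / denominator if denominator else 0.0


def predict(query_fp, reference):
    """Return the label of the reference fingerprint most similar to the query."""
    best_fp, best_label = max(reference, key=lambda item: tanimoto(query_fp, item[0]))
    return best_label


# Toy reference fingerprints, for illustration only.
reference = [
    ([1, 0, 1, 0, 0, 1], "allergen"),
    ([0, 1, 0, 1, 1, 0], "non-allergen"),
]
query = [1, 0, 1, 1, 0, 1]
print(tanimoto(query, reference[0][0]), predict(query, reference))
```

Here the query shares three of its four set bits with the first reference (Tanimoto 0.75) but only one with the second, so it takes the first reference's label.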
3 Methods
3.1 Allergenicity Prediction by AllerTOP v.1
1. Retrieve the protein sequence for Fra a 1.02 from UniProtKB (entry: A0A024B3G5) or GenBank (AHZ10957.1).
2. Download or copy the sequence in FASTA format.
3. Open AllerTOP v.1 (URL: http://www.ddg-pharmfac.net/allertop).
4. Enter the protein sequence in plain format (remove the title line, which starts with the > delimiter).
5. Press the “Get the Result” button.
6. The AllerTOP v.1 RESULTS page lists the statement “PROBABLE ALLERGEN” (Fig. 1) and the three nearest proteins, each given by its accession number (hyperlinked to UniProtKB or the NCBI protein database) and its class, allergen or non-allergen. In our case, the three nearest neighbors are:
(a) 60280861, defined as allergen: the major allergen Mal d 1.03F from Malus domestica (apple) [10], with accession number in the NCBI protein database
Fig. 1 The AllerTOP v.1. RESULTS page for the query protein AHZ10957.1 from
Fragaria x ananassa subsp. ananassa (strawberry)
An Alignment-Independent Platform for Allergenicity Prediction 151
3.2 Allergenicity Prediction by AllerTOP v.2
1. Retrieve the protein sequence for Fra a 1.02 from UniProtKB (entry: A0A024B3G5) or GenBank (AHZ10957.1).
2. Download or copy the sequence in FASTA format.
3. Go to the AllerTOP v.2 homepage (URL: http://www.ddg-pharmfac.net/AllerTOP).
4. Enter the protein sequence in plain format (remove the title line, which starts with the > delimiter).
5. Press the “Get the Result” button.
6. The AllerTOP v.2 RESULTS page lists the statement: “Your sequence is PROBABLE ALLERGEN. The nearest protein is NCBI gi number 60280861 (hyperlink) defined as allergen” (Fig. 2). Here again, the nearest neighbor is the major allergen Mal d 1.03F from Malus domestica (apple) [10].
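Step 4 asks for the sequence in plain format, that is, without the FASTA title line. When preparing many queries, this conversion can be scripted. The following is a minimal Python sketch that assumes a single-record FASTA input; it would concatenate multiple records. The header and residues in the example are placeholders, not the Fra a 1.02 entry.

```python
def fasta_to_plain(fasta_text):
    """Drop '>' title lines and join the remaining lines into one sequence.

    Assumes a single-record FASTA file; multiple records would be concatenated.
    """
    lines = fasta_text.strip().splitlines()
    return "".join(line.strip() for line in lines if not line.startswith(">"))


# Placeholder record, for illustration only.
example = ">sp|XXXXXX|demo placeholder header\nACDEFG\nHIKLMN\n"
print(fasta_to_plain(example))
```

The resulting string can then be pasted into the server's input box.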
3.3 Allergenicity Prediction by AllergenFP
1. Retrieve the protein sequence for Fra a 1.02 from UniProtKB (entry: A0A024B3G5) or GenBank (AHZ10957.1).
2. Download or copy the sequence in FASTA format.
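AllergenFP represents proteins as descriptor fingerprints [4], and the chapter's reference list includes Tanimoto's similarity coefficient [8]. Below is a minimal sketch of the Tanimoto coefficient for binary fingerprints, used here to pick the most similar reference protein. Fingerprints are modelled as sets of "on" bit positions. All fingerprints and names are hypothetical and are not AllergenFP's actual encoding.

```python
def tanimoto(fp_a: set, fp_b: set) -> float:
    """Tanimoto coefficient c / (a + b - c) for two binary fingerprints.

    a and b are the numbers of 'on' bits in each fingerprint, and c is the
    number of 'on' bits they share. Two empty fingerprints get 0.0 here by
    convention, since the ratio is then undefined.
    """
    common = len(fp_a & fp_b)
    denominator = len(fp_a) + len(fp_b) - common
    return common / denominator if denominator else 0.0


def most_similar(query: set, references: dict) -> tuple:
    """Return (name, coefficient) of the reference most similar to query."""
    name = max(references, key=lambda n: tanimoto(query, references[n]))
    return name, tanimoto(query, references[name])


# Hypothetical fingerprints (sets of 'on' bit positions).
references = {"ref_allergen": {1, 2, 3, 5, 8}, "ref_non_allergen": {4, 6, 7}}
query = {1, 2, 3, 4}
print(most_similar(query, references))  # ('ref_allergen', 0.5)
```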
Fig. 2 The AllerTOP v.2. RESULTS page for the query protein AHZ10957.1 from
Fragaria x ananassa subsp. ananassa (strawberry)
152 Ivan Dimitrov and Irini Doytchinova
Fig. 3 The AllergenFP RESULTS page for the query protein AHZ10957.1 from
Fragaria x ananassa subsp. ananassa (strawberry)
Acknowledgments
This work has been accomplished with the financial support of Grant No. BG05M2OP001-1.001-0003, financed by the Science and Education for Smart Growth Operational Program (2014-2020) and co-financed by the European Union through the European structural and investment funds.
References
1. FAO/WHO (2009) Foods derived from modern biotechnology (annex: assessment of possible allergenicity, Codex Alimentarius, joint FAO/WHO food standards), 2nd edn. WHO, Geneva
2. Dimitrov I, Flower DR, Doytchinova I (2013) AllerTOP--a server for in silico prediction of allergens. BMC Bioinformatics 14(Suppl 6):S4
3. Dimitrov I, Bangov I, Flower DR, Doytchinova I (2014) AllerTOP v.2--a server for in silico prediction of allergens. J Mol Model 20:2278
4. Dimitrov I, Naneva L, Bangov I, Doytchinova I (2014) AllergenFP: allergenicity prediction by descriptor fingerprints. Bioinformatics 30:846–851
5. Hellberg S, Sjöström M, Skagerberg B, Wold S (1987) Peptide quantitative structure-activity relationships, a multivariate approach. J Med Chem 30:1126–1135
6. Wold S, Jonsson J, Sjöström M, Sandberg M, Rännar S (1993) DNA and peptide sequences and chemical processes multivariately modelled by principal components analysis and partial least squares projections to latent structures. Anal Chim Acta 277:239–253
7. Venkatarajan MS, Braun W (2001) New quantitative descriptors of amino acids based on multidimensional scaling of a large number of physical-chemical properties. J Mol Model 7:445–453
8. Tanimoto TT (1958) An elementary mathematical theory of classification and prediction. IBM Research, Yorktown Heights, New York
9. Franz-Oberdorf K, Eberlein B, Edelmann K, Hücherig S, Besbes F, Darsow U, Ring J, Schwab W (2016) Fra a 1.02 is the most potent isoform of the bet v 1-like allergen in strawberry fruit. J Agric Food Chem 64:3688–3696
10. Gao ZS, van de Weg WE, Schaart JG, Schouten HJ, Tran DH, Kodde LP, van der Meer IM, van der Geest AHM, Kodde J, Breiteneder H, Hoffmann-Sommergruber K, Bosch D, Gilissen LJWJ (2005) Genomic cloning and linkage mapping of the mal d 1 (PR-10) gene family in apple (Malus domestica). Theor Appl Genet 111:171–183
11. Hsieh LS, Moos M, Lin Y (1995) Characterization of apple 18 and 31 kd allergens by microsequencing and evaluation of their content during storage and ripening. J Allergy Clin Immunol 96:960–970
12. Scheurer S, Pastorello EA, Wangorsch A, Kästner M, Haustein D, Vieths S (2001) Recombinant allergens Pru av 1 and Pru av 4 and a newly identified lipid transfer protein in the in vitro diagnosis of cherry allergy. J Allergy Clin Immunol 107:724–731
Chapter 6
Abstract
With advancements in sequencing technologies, a vast amount of experimental data has accumulated. Owing to rapid progress in the development of bioinformatics tools and the accumulation of data, immunoinformatics, or computational immunology, emerged as a special branch of bioinformatics that applies bioinformatics approaches to understand and interpret immunological data. One extensively studied aspect of applied immunology involves using available databases and tools to predict B- and T-cell epitopes; B and T cells constitute the two arms of adaptive immunity.
This chapter first reviews the methodology we used for the computational identification of B- and T-cell epitopes against enterotoxigenic Escherichia coli (ETEC). We then discuss other epitope databases and analysis tools for T-cell and B-cell epitope prediction and vaccine design. The predicted peptides were analyzed for conservation and population coverage, and HLA distribution analysis of the predicted epitopes identified efficient MHC binders. The epitopes were further tested by computational docking for binding in the MHC-I molecule cleft. The predicted epitopes were conserved and covered more than 80% of the world population.
1 Introduction
156 Jayashree Ramana and Kusum Mehla
2 Our Approach
2.2 Antigenicity and Transmembrane Prediction
The VaxiJen server [9], which transforms protein sequence information into vectors of amino acid properties, was used to identify protein sequences with antigenic potential above a threshold score of 0.6. The cellular location of a protein holds clues to its function. Hence, the cellular locations of the potentially antigenic sequences were predicted with PSORTb, using a localization score cutoff of 7.5. PSORTb uses a Bayesian network model to calculate the associated probability for five major localization sites in Gram-negative bacteria, viz., cytoplasmic, inner membrane, periplasmic, outer membrane, and extracellular [10]. In the search for vaccine candidates, it is imperative to look for genomic segments with high antigenic potential. VaxiJen was therefore first used to identify antigenic proteins and later to identify peptides with high antigenicity values. VaxiJen predicted nine proteins as antigenic above the 0.6 threshold. Three of these were localized in the outer membrane, making them ideal candidates for epitope identification. The Venn diagram in Fig. 1 shows the three prioritized proteins retrieved after filtering on physicochemical properties; these proteins were then used for epitope mapping.
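The two-stage filter described above (VaxiJen score above 0.6, then outer membrane localization at the 7.5 PSORTb cutoff) can be sketched as follows. This minimal sketch assumes the server outputs have already been collected into records. The record fields, the localization label `OuterMembrane`, the use of `>=` for the 7.5 cutoff, and the protein identifiers are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class ProteinPrediction:
    protein_id: str
    vaxijen_score: float   # antigenicity score reported by VaxiJen
    location: str          # PSORTb final localization (label is an assumption)
    location_score: float  # PSORTb localization score


def prioritise(predictions, antigen_cutoff=0.6, location_cutoff=7.5,
               wanted_location="OuterMembrane"):
    """Keep proteins that are predicted antigenic AND confidently localized."""
    return [
        p.protein_id
        for p in predictions
        if p.vaxijen_score > antigen_cutoff
        and p.location == wanted_location
        and p.location_score >= location_cutoff
    ]


# Hypothetical records, not the ETEC results reported in this chapter.
records = [
    ProteinPrediction("prot_1", 0.72, "OuterMembrane", 9.5),
    ProteinPrediction("prot_2", 0.45, "OuterMembrane", 9.9),  # not antigenic
    ProteinPrediction("prot_3", 0.81, "Periplasmic", 9.4),    # wrong location
    ProteinPrediction("prot_4", 0.66, "OuterMembrane", 7.0),  # low PSORTb score
]
print(prioritise(records))  # ['prot_1']
```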
[Fig. 1 Venn diagram; sets: Pathogenic, Non-human, Antigenic, Membrane]
Fig. 1 Representation of three prioritized target proteins: the proteins characterized as (1) shared between pathogenic strains and absent from the commensal strain, (2) non-homologous to humans, (3) antigenic, and (4) membrane localized are shown in different colors in the Venn diagram. Proteins satisfying a particular parameter are shown in the corresponding category of the Venn diagram. Three proteins were prioritized for vaccine candidate identification. The image was generated with Jvenn. (Reproduced by permission of The Royal Society of Chemistry [47])
2.3 T-Cell Epitope Prediction
For T-cell epitope prediction, we considered parameters such as T-cell processivity, the number of human leukocyte antigen (HLA) alleles covered, and significant population coverage. Cytotoxic T-lymphocyte (CTL) epitopes were predicted from the outer membrane-localized sequences with the NetCTL server at a threshold value of 0.75, with predictions restricted to 12 major histocompatibility complex class I (MHC-I) supertypes. NetCTL is a neural network-based model that combines predictions of peptide-MHC-I binding, proteasomal C-terminal cleavage, and TAP transport efficiency [11]. The CTL epitopes were also evaluated for their antigenic potential and, based on their antigenicity scores, were categorized into low-, medium-, and high-priority vaccine candidates. As shown in Table 1, NetCTL predicted a total of 50 epitopes. After antigenicity assessment, 5 of these were grouped as high priority, 11 as medium priority, and 9 as low priority. An Immune Epitope Database (IEDB) tool was used to predict the MHC-I alleles that bind efficiently to the high-priority epitopes [12]. MHCPred, a partial least squares-based multivariate model, was used to compute half-maximal inhibitory concentration (IC50) values for MHC-I and MHC-II alleles [13]. Consensus alleles, predicted by both MHCPred and IEDB to bind with an IC50 below 500 nM, were considered efficient peptide binders. We then compiled a list of the MHC-I and MHC-II alleles that may serve as efficient binders for each high-priority epitope; the results are summarized in Table 2. Although data were unavailable for the majority of MHC-II alleles, our predicted peptides showed good population coverage. Because MHC is highly polymorphic, identifying peptides that can bind to several MHC alleles is an important factor in vaccine identification [14]. We observed that all the predicted epitopes interacted with more than 20 MHC alleles.
Immunoinformatics and Epitope Prediction 159
Table 1
Most probable predicted epitopes selected on the basis of their NetCTL (MHC binding, proteasomal processing, and TAP transport) and VaxiJen (antigenicity) scores
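The consensus rule used above keeps an allele only when both MHCPred and the IEDB tool predict an IC50 below 500 nM. It can be expressed in a few lines. In this minimal sketch, each tool's predictions are assumed to be a dictionary mapping allele names to IC50 values in nM. The values are hypothetical and are not taken from Table 2.

```python
def consensus_binders(mhcpred: dict, iedb: dict, cutoff_nm: float = 500.0) -> dict:
    """Alleles predicted to bind with IC50 < cutoff_nm by BOTH tools.

    Returns {allele: (mhcpred_ic50, iedb_ic50)} in allele-name order.
    Alleles missing from either tool cannot satisfy the consensus rule.
    """
    shared = sorted(set(mhcpred) & set(iedb))
    return {
        allele: (mhcpred[allele], iedb[allele])
        for allele in shared
        if mhcpred[allele] < cutoff_nm and iedb[allele] < cutoff_nm
    }


# Hypothetical IC50 predictions (nM) for one peptide.
mhcpred_ic50 = {"HLA-A*01:01": 92.5, "HLA-A*02:01": 650.0, "HLA-B*35:01": 133.5}
iedb_ic50 = {"HLA-A*01:01": 210.0, "HLA-A*02:01": 120.0,
             "HLA-B*35:01": 480.0, "HLA-C*07:02": 50.0}

binders = consensus_binders(mhcpred_ic50, iedb_ic50)
print(len(binders), list(binders))  # 2 ['HLA-A*01:01', 'HLA-B*35:01']
```

Counting the MHC-I and MHC-II alleles that survive this filter for each epitope gives the kind of per-epitope total reported in the "Total no. of MHC-peptide binders" column of Table 2.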
Table 2
Predicted potential T-cell epitopes, along with their interacting MHC-I and MHC-II alleles with an affinity of <500 nM and the corresponding IC50 values in nM (in parentheses)

Epitope (total no. of MHC-peptide binders)

LICFFTLSY (24)
MHC-I alleles: HLA-A*01:01 (92.47), HLA-A*02:06 (105.68), HLA-A*02:17 (130.82), HLA-A*03:01 (305.49), HLA-A*11:01 (257.22), HLA-A*29:02 (30.89), HLA-A*30:02 (400.37), HLA-A*32:01 (468.75), HLA-A*32:07 (23.72), HLA-A*32:15 (164.53), HLA-A*68:01 (59.57), HLA-A*68:02 (492.04), HLA-A*68:23 (99.25), HLA-A*80:01 (26.42), HLA-B*15:01 (54.83), HLA-B*15:02 (293.24), HLA-B*15:03 (56.51), HLA-B*15:17 (212.18), HLA-B*27:20 (14.40), HLA-B*35:01 (133.53), HLA-B*40:13 (179.03), HLA-C*12:03 (14.73), HLA-C*14:02 (468.40)
MHC-II alleles: HLA-DRB1*01:01 (9.40)

PLNPLILLY (23)
MHC-I alleles: HLA-A*01:01 (285.76), HLA-A*02:03 (4.80), HLA-A*02:17 (280.32), HLA-A*03:01 (323.59), HLA-A*11:01 (144.88), HLA-A*29:02 (25.63), HLA-A*31:01 (210.38), HLA-A*32:07 (20.42), HLA-A*32:15 (47.78), HLA-A*68:01 (127.64), HLA-A*68:23 (91.57), HLA-A*80:01 (2.30), HLA-B*15:02 (144.62), HLA-B*27:20 (4.76), HLA-B*35:01 (436.52), HLA-B*40:13 (399.88), HLA-C*05:01 (102.02), HLA-C*07:01 (326.72), HLA-C*07:02 (295.24), HLA-C*12:03 (48.11), HLA-C*14:02 (228.36)
MHC-II alleles: HLA-DRB1*01:01 (1.01), HLA-DRB1*04:01 (359.75)

PIVNLFLLY (22)
MHC-I alleles: HLA-A*01:01 (14.13), HLA-A*02:01 (456.04), HLA-A*02:03 (280.54), HLA-A*02:06 (38.55), HLA-A*03:01 (20.94), HLA-A*11:01 (49.32), HLA-A*26:01 (450.41), HLA-A*26:02 (108.22), HLA-A*29:02 (28.96), HLA-A*32:07 (26.92), HLA-A*32:15 (52.63), HLA-A*68:01 (99.54), HLA-A*68:23 (76.34), HLA-A*80:01 (4.53), HLA-B*15:02 (159.31), HLA-B*27:20 (35.69), HLA-B*35:01 (251.77), HLA-B*40:13 (93.53), HLA-C*03:03 (255.63), HLA-C*12:03 (23.62)
MHC-II alleles: HLA-DRB1*01:01 (0.78), HLA-DRB1*07:01 (47.64)

SVSVFIFLF (27)
MHC-I alleles: HLA-A*02:01 (306.90), HLA-A*02:02 (269.78), HLA-A*02:06 (307.61), HLA-A*02:50 (74.00), HLA-A*11:01 (6.90), HLA-A*24:02 (467.34), HLA-A*24:03 (388.63), HLA-A*26:02 (63.58), HLA-A*29:02 (325.66), HLA-A*32:01 (111.41), HLA-A*32:07 (7.91), HLA-A*32:15 (179.99), HLA-A*68:01 (50.23), HLA-A*68:02 (202.77), HLA-A*68:23 (11.72), HLA-B*15:02 (222.45), HLA-B*15:03 (495.55), HLA-B*15:17 (93.69), HLA-B*27:20 (70.39), HLA-B*35:01 (242.66), HLA-B*40:13 (106.40), HLA-B*58:01 (454.26), HLA-C*03:03 (92.17), HLA-C*07:01 (239.98), HLA-C*07:02 (297.97), HLA-C*12:03 (84.96)
MHC-II alleles: HLA-DRB1*01:01 (12.45)

SVFIFLFIY (25)
MHC-I alleles: HLA-A*02:01 (316.96), HLA-A*02:02 (467.74), HLA-A*02:06 (77.62), HLA-A*02:17 (90.50), HLA-A*03:01 (239.33), HLA-A*11:01 (18.84), HLA-A*26:01 (454.58), HLA-A*26:02 (73.17), HLA-A*29:02 (21.27), HLA-A*30:02 (77.89), HLA-A*32:01 (253.48), HLA-A*32:07 (13.09), HLA-A*32:15 (53.12), HLA-A*68:01 (15.56), HLA-A*68:02 (283.79), HLA-A*68:23 (5.53), HLA-A*80:01 (18.37), HLA-B*15:02 (157.12), HLA-B*15:17 (46.96), HLA-B*27:20 (41.83), HLA-B*35:01 (160.91), HLA-B*40:13 (20.37), HLA-C*07:02 (444.81), HLA-C*12:03 (11.07)
MHC-II alleles: HLA-DRB1*01:01 (41.59)

(Reproduced by permission of The Royal Society of Chemistry [47])
2.4 HLA Distribution Analysis
The five high-priority epitopes were grouped into two sets because they belonged to two different proteins. Another IEDB tool [15] for human population coverage analysis, based on data from the Allele Frequency Net Database (AFND), was used to study the distribution of human HLA alleles for the high-priority epitopes in these sets. The two sets were predicted to elicit immune responses in more than 95% of the total world population. For one epitope set, maximum coverage was in the European region (99.23%), followed by
2.6 B-Cell Epitope Identification
For B-cell epitope prediction, we identified epitopes that could be efficiently processed by B lymphocytes, selecting them on the criteria of surface accessibility, flexibility, and hydrophilicity. To identify potential antigens that can interact with B lymphocytes, the BCPred [21] and AAP [22] algorithms at the BCPREDS server were used. IEDB tools, viz., Emini surface accessibility prediction [23], Karplus and Schulz flexibility prediction [24], and Parker
Table 3
Population coverage of predicted epitopes (set I and set II) based on MHC-I and MHC-II
restriction data
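The population coverage in Table 3 was computed with the IEDB tool [15]. For a single HLA locus, that approach is based on the frequencies of the alleles that restrict the epitope set. Assuming Hardy-Weinberg equilibrium, the fraction of individuals carrying at least one such allele is 1 - (1 - sum of frequencies)^2. The sketch below implements that formula. Combining loci by multiplying their non-coverage assumes the loci are independent, which is a simplifying assumption of this sketch. The frequencies are hypothetical, not AFND values.

```python
def locus_coverage(allele_freqs):
    """Fraction of individuals with at least one restricting allele at one locus.

    allele_freqs: gene (allele) frequencies of the restricting alleles at this
    locus. Under Hardy-Weinberg equilibrium an individual lacks all of them
    with probability (1 - sum)^2, so coverage is 1 - (1 - sum)^2.
    """
    total = sum(allele_freqs)
    if not 0.0 <= total <= 1.0:
        raise ValueError("allele frequencies at one locus must sum to between 0 and 1")
    return 1.0 - (1.0 - total) ** 2


def combined_coverage(loci):
    """Combine several loci, ASSUMING they are independent of each other."""
    uncovered = 1.0
    for freqs in loci:
        uncovered *= 1.0 - locus_coverage(freqs)
    return 1.0 - uncovered


# Hypothetical frequencies of restricting alleles at two loci.
hla_a = [0.30, 0.10]
hla_b = [0.20]
print(round(locus_coverage(hla_a), 4), round(combined_coverage([hla_a, hla_b]), 4))
```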
Fig. 2 (a) Docked complex of HLA-A*11:01 with epitope SVFIFLFIY visualized in PyMOL and (b) corresponding interactions involved in binding visualized in LigPlot. (Reproduced by permission of The Royal Society of Chemistry [47])
Table 4
The five most promising B-cell epitopes from combined predictions of AAP, BCPred, and IEDB tools (Emini surface accessibility, Karplus and Schulz flexibility, Parker hydrophilicity), further filtered on their antigenicity values (VaxiJen score)
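Table 4 combines several sequence-based predictors. One simple way to form such a consensus is residue-level voting: keep stretches of residues predicted by at least a minimum number of tools, then score the resulting peptides for antigenicity (for example, with VaxiJen). The sketch below shows only the voting step. It is a hypothetical illustration of the idea, not the exact procedure used to build Table 4, and the segment coordinates are toy values.

```python
def consensus_segments(tool_predictions, seq_length, min_tools=2, min_len=5):
    """Stretches of residues supported by at least min_tools predictors.

    tool_predictions: one list per tool of (start, end) segments, 1-based and
    inclusive. Returns (start, end) segments at least min_len residues long.
    """
    votes = [0] * (seq_length + 1)  # index 0 unused so positions stay 1-based
    for segments in tool_predictions:
        covered = set()
        for start, end in segments:
            # Clip to the sequence so out-of-range segments cannot overflow.
            covered.update(range(max(1, start), min(end, seq_length) + 1))
        for pos in covered:  # each tool votes at most once per residue
            votes[pos] += 1

    consensus, run_start = [], None
    for pos in range(1, seq_length + 2):  # one step past the end closes a run
        supported = pos <= seq_length and votes[pos] >= min_tools
        if supported and run_start is None:
            run_start = pos
        elif not supported and run_start is not None:
            if pos - run_start >= min_len:
                consensus.append((run_start, pos - 1))
            run_start = None
    return consensus


# Hypothetical segment predictions from three tools on a 20-residue antigen.
predictions = [[(3, 10)], [(5, 12)], [(15, 18)]]
print(consensus_segments(predictions, seq_length=20))  # [(5, 10)]
```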
Table 5
Potential conformational B-cell epitope residues predicted using CBTOPE and DiscoTope server, along
with the corresponding secondary structure conformation each residue adopts and DiscoTope scores
B-cell epitopes are found more often in coil or turn regions than in helices or beta sheets, and the majority of the antibody-interacting residues predicted in our study also adopt a coil conformation. This agreement further supports the reliability of our predictions.
In summary, the tools and databases employed in our approach allowed us to identify a group of ETEC-specific vaccine candidates with high potential to encode protective immunogens. The approach indicates antigenic and protective potential for a subset of the candidates. Further experimental validation of the predicted epitopes in eliciting humoral and cell-mediated immune responses in vitro and in vivo is required to increase our therapeutic arsenal against ETEC.
3.1 B-Cell Epitope Prediction
B cells mediate humoral adaptive immunity by recognizing solvent-exposed antigens through B-cell receptors (BCRs) present on their surface. BCRs contain membrane-bound immunoglobulins, which after B-cell activation are secreted in soluble form as antibodies. These antibodies act against antigens, for example, by neutralizing them or by tagging them for destruction by phagocytic cells. B-cell epitopes can be continuous (a linear stretch of amino acids) or discontinuous/conformational (a set of spatially separated amino acids brought into close proximity by protein folding). Although about 90% of B-cell epitopes are conformational, little data is available for such epitopes because their study requires an available structure, which makes conformational epitope prediction even more challenging. Bcipep is a collection of 3031 experimentally determined linear B-cell epitopes curated from the literature and other public repositories [27]. Based on immunogenicity, these epitopes are categorized into immunodominant, immunogenic, and null-immunogenic epitopes. CED is another B-cell epitope database. It contains well-defined conformational epitopes and related information curated from published literature [28], such as epitope residue composition and location, structure and immunological properties, and the antibodies binding to a particular epitope. Epitome implements a semi-automated tool for analyzing
3.2 T-Cell Epitope Prediction
Antigen-derived peptides are translocated from the cytoplasm into the endoplasmic reticulum (ER) by TAP (transporter associated with antigen processing). In the ER, the peptides are released from TAP and loaded onto MHC class I molecules. The resulting peptide-MHC complex then leaves the ER and is presented on the surface of antigen-presenting cells (APCs), where T cells recognize it through the T-cell receptor (TCR). T-cell epitopes are bound by MHC class I (MHC-I) and class II (MHC-II) molecules, which are recognized by CD8 and CD4 T cells, respectively. To accurately model the
4 Conclusion
References
1. Kazi A, Chuah C, Majeed ABA et al (2018) Current progress of immunoinformatics approach harnessed for cellular- and antibody-dependent vaccine design. Pathog Glob Health 112(3):123–131
2. Evans MC (2008) Recent advances in immunoinformatics: application of in silico tools to drug development. Curr Opin Drug Discov Devel 11(2):233–241
3. Walker RI (2015) An assessment of enterotoxigenic Escherichia coli and Shigella vaccine candidates for infants and children. Vaccine 33(8):954–965
4. Pizza M, Scarlato V, Masignani V et al (2000) Identification of vaccine candidates against serogroup B meningococcus by whole-genome sequencing. Science 287(5459):1816–1820
5. Moriel DG, Bertoldi I, Spagnuolo A et al (2010) Identification of protective and broadly conserved vaccine antigens from the genome of extraintestinal pathogenic Escherichia coli. Proc Natl Acad Sci U S A 107(20):9072–9077
6. McCarthy A, Lindsay J (2010) Genetic variation in Staphylococcus aureus surface and immune evasion genes is lineage associated: implications for vaccine design and host-pathogen interactions. BMC Microbiol 10(1):173
7. Brocchieri L, Karlin S (2005) Protein length in eukaryotic and prokaryotic proteomes. Nucleic Acids Res 33(10):3390–3400
8. Butt AM, Nasrullah I, Tahir S et al (2012) Comparative genomics analysis of Mycobacterium ulcerans for the identification of putative essential genes and therapeutic candidates. PLoS One 7(8):e43080
9. Doytchinova I, Flower D (2007) VaxiJen: a server for prediction of protective antigens, tumour antigens and subunit vaccines. BMC Bioinformatics 8(1):4
10. Yu NY, Wagner JR, Laird M et al (2010) PSORTb 3.0: improved protein subcellular localization prediction with refined localization subcategories and predictive capabilities for all prokaryotes. Bioinformatics 26(13):1608–1615
11. Larsen MV, Lundegaard C, Lamberth K et al (2007) Large-scale validation of methods for cytotoxic T-lymphocyte epitope prediction. BMC Bioinformatics 8:424
12. Tenzer S, Peters B, Bulik S et al (2005) Modeling the MHC class I pathway by combining predictions of proteasomal cleavage, TAP transport and MHC class I binding. Cell Mol Life Sci 62(9):1025–1037
13. Guan P, Doytchinova IA, Zygouri C et al (2003) MHCPred: a server for quantitative prediction of peptide-MHC binding. Nucleic Acids Res 31(13):3621–3624
14. Thorpe C, Edwards L, Snelgrove R et al (2007) Discovery of a vaccine antigen that protects mice from Chlamydia pneumoniae infection. Vaccine 25(12):2252–2260
15. Bui HH, Sidney J, Dinh K et al (2006) Predicting population coverage of T-cell epitope-based diagnostics and vaccines. BMC Bioinformatics 7:153
16. Harris JA, Roy K, Woo-Rasberry V et al (2011) Directed evaluation of enterotoxigenic Escherichia coli autotransporter proteins as putative vaccine candidates. PLoS Negl Trop Dis 5(12):e1428
17. Thevenet P, Shen Y, Maupetit J et al (2012) PEP-FOLD: an updated de novo structure prediction server for both linear and disulfide bonded cyclic peptides. Nucleic Acids Res 40(Web Server issue):W288–W293
18. Macindoe G, Mavridis L, Venkatraman V et al (2010) HexServer: an FFT-based protein docking server powered by graphics processors. Nucleic Acids Res 38(Web Server issue):W445–W449
19. The PyMOL Molecular Graphics System (2010) Version 1.3r1. Schrödinger, LLC
20. Laskowski RA, Swindells MB (2011) LigPlot+: multiple ligand-protein interaction diagrams for drug discovery. J Chem Inf Model 51(10):2778–2786
21. El-Manzalawy Y, Dobbs D, Honavar V (2008) Predicting linear B-cell epitopes using string kernels. J Mol Recognit 21(4):243–255
22. Chen J, Liu H, Yang J et al (2007) Prediction of linear B-cell epitopes using amino acid pair antigenicity scale. Amino Acids 33(3):423–428
23. Emini EA, Hughes JV, Perlow DS et al (1985) Induction of hepatitis A virus-neutralizing antibody by a virus-specific synthetic peptide. J Virol 55(3):836–839
24. Karplus PA, Schulz GE (1985) Prediction of chain flexibility in proteins. Naturwissenschaften 72(4):212–213
25. Parker JMR, Guo D, Hodges RS (1986) New hydrophilicity scale derived from high-performance liquid chromatography peptide retention data: correlation of predicted surface residues with antigenicity and X-ray-derived accessible sites. Biochemistry 25(19):5425–5432
26. Ansari HR, Raghava GPS (2010) Identification of conformational B-cell epitopes in an antigen from its primary sequence. Immunome Res 6:6
27. Saha S, Bhasin M, Raghava GPS (2005) Bcipep: a database of B-cell epitopes. BMC Genomics 6:79
28. Huang J, Honda W (2006) CED: a conformational epitope database. BMC Immunol 7:7
29. Schlessinger A, Ofran Y, Yachdav G et al (2006) Epitome: database of structure-inferred antigenic epitopes. Nucleic Acids Res 34:D777–D780
30. Odorico M, Pellequer JL (2003) BEPITOPE: predicting the location of continuous epitopes and patterns in proteins. J Mol Recognit 16(1):20–22
31. Saha S, Raghava GPS (2006) Prediction of continuous B-cell epitopes in an antigen using recurrent neural network. Proteins 65(1):40–48
32. Kulkarni-Kale U, Raskar-Renuse S, Natekar-Kalantre G et al (2014) Antigen–antibody interaction database (AgAbDb): a compendium of antigen–antibody interactions. In: De R, Tomar N (eds) Immunoinformatics, Methods in molecular biology (methods and protocols), vol 1184. Humana Press, New York, pp 149–164
33. Jespersen MC, Peters B, Nielsen M et al (2017) BepiPred-2.0: improving sequence-based B-cell epitope prediction using conformational epitopes. Nucleic Acids Res 45(W1):W24–W29
34. Sweredoski MJ, Baldi P (2008) PEPITO: improved discontinuous B-cell epitope prediction using multiple distance thresholds and half sphere exposure. Bioinformatics 24(12):1459–1460
35. Ponomarenko J, Bui HH, Li W et al (2008) ElliPro: a new structure-based tool for the prediction of antibody epitopes. BMC Bioinformatics 9:514
36. Rubinstein ND, Mayrose I, Martz E et al (2009) Epitopia: a web-server for predicting B-cell epitopes. BMC Bioinformatics 10:287
37. Kringelum JV, Lundegaard C, Lund O et al (2012) Reliable B cell epitope predictions: impacts of method development and improved benchmarking. PLoS Comput Biol 8(12):e1002829
38. Rammensee H, Bachmann J, Emmerich NP et al (1999) SYFPEITHI: database for MHC ligands and peptide motifs. Immunogenetics 50(3–4):213–219
39. Lefranc MP (2001) IMGT, the international ImMunoGeneTics database. Nucleic Acids Res 29(1):207–209
40. Sathiamurthy M, Hickman HD, Cavett JW et al (2003) Population of the HLA ligand database. Tissue Antigens 61(1):12–19
41. Toseland CP, Clayton DJ, McSparron H et al (2005) AntiJen: a quantitative immunology database integrating functional, thermodynamic, kinetic, biophysical, and cellular data. Immunome Res 1(1):4
42. Reche PA, Zhang H, Glutting JP et al (2005) EPIMHC: a curated database of MHC-binding peptides for customized
Abstract
Vaccination is the best way to prevent the spread of emerging or reemerging infectious diseases. Current vaccine development research focuses mainly on recombinant, subunit, and peptide-based vaccines. Immunoinformatics has proven to be a powerful approach for identifying potential vaccine candidates by analyzing immunodominant B- and T-cell epitopes. By reducing the number of candidates that must be tested experimentally for efficacy, it can greatly reduce experimental time and cost. This chapter describes the use of immunoinformatics and molecular docking methods to screen potential vaccine candidates, taking Leptospira as a model.
1 Introduction
174 Kumari Snehkant Lata et al.
Fig. 1 Workflow representing the key steps in screening of potential B- and T-cell epitopes. The conforma-
tional B-cell epitope depicted in this flowchart was predicted from 3D structure (PDB ID: 2ZZ8) of LipL32
protein of Leptospira. The structure shown here for visualization of the binding interaction was downloaded from RCSB PDB (PDB ID: 1B0G)
2.1 Retrieval of the Target Sequences
The first step in the immunoinformatic approach to vaccine design is to retrieve protein sequences or a whole proteome in FASTA format. Protein sequences can be extracted from the UniProtKB or NCBI databases, and whole proteome sequences can be retrieved from the UniProt Proteomes database. Once the protein sequences are obtained, they are screened for immunogenicity.
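Proteome downloads arrive as multi-record FASTA files. The minimal, dependency-free sketch below reads such a file into a dictionary keyed by the first word of each header. The function name and toy records are illustrative assumptions. In practice, Biopython's `SeqIO.parse(handle, "fasta")` does the same job.

```python
def parse_fasta(text: str) -> dict:
    """Parse multi-record FASTA text into {identifier: sequence}.

    The identifier is the first whitespace-separated word after '>'.
    Sequence lines are joined without whitespace.
    """
    records, name, chunks = {}, None, []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks)
            words = line[1:].split()
            name = words[0] if words else ""
            chunks = []
        elif name is not None:
            chunks.append(line)  # residues before any header are ignored
    if name is not None:
        records[name] = "".join(chunks)
    return records


# Hypothetical two-record proteome file.
proteome = """>prot_1 hypothetical outer membrane protein
MKKLLA
VAGS
>prot_2 hypothetical cytoplasmic protein
MSTNPK
"""
for identifier, sequence in parse_fasta(proteome).items():
    print(identifier, len(sequence))
```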
2.3 Identification of Outer Membrane Protein (OMP)
The localization of a protein plays a vital role in determining its function. A potential immunogen has to be easily recognized by immune cells. Outer membrane proteins are surface-exposed, are easily recognized by the host immune system, and are involved in the interaction between bacterial cells and their host [17, 18]. In pathogenic bacteria, OMPs have proven to be the most promising vaccine candidates because of their interaction with host immune cells; hence, identification of OMPs is crucial for reliable and rapid vaccine development [3]. Therefore, the protein sequences predicted as antigenic by the VaxiJen server were submitted to the CELLO v.2.5 server [19, 20] (http://cello.life.nctu.edu.tw/) to identify outer membrane proteins (see Note 1). CELLO uses a support vector machine, a machine learning algorithm, to predict protein localization.
2.4 B-Cell Epitope Prediction
A B-cell epitope is the main antigenic region of an antigen. It is recognized by the B-cell receptors of the immune system and can stimulate a humoral immune response, which causes B lymphocytes to differentiate into antibody-secreting plasma cells and memory cells [21]. After activation, plasma cells secrete antigen-specific antibodies, which circulate through the body where they can encounter the antigen; long-lived plasma cells reside mainly in the bone marrow. Memory B cells are distributed throughout the body and respond quickly to eliminate the antigen if it is encountered again [22]. B-cell epitopes can be categorized as linear (continuous) or conformational (discontinuous) based on their spatial structure.
Immunoinformatics for Vaccine Design 177
2.4.1 Linear B-Cell Epitope
A linear B-cell epitope is a consecutive sequence of amino acids on an antigen. B-cell epitopes can be predicted with the IEDB analysis resource: paste the protein sequence in plain format or provide a Swiss-Prot ID, and select an appropriate prediction method. Linear B-cell epitopes can be predicted with the BepiPred method [23]. Antigenic B-cell epitopes were predicted by the Kolaskar and Tongaonkar method at the Immune Epitope Database (IEDB) analysis resource (http://tools.iedb.org/main/bcell/) [24]. This method predicts antigenic peptides by analyzing the physicochemical properties of amino acid residues and their abundance in experimentally determined antigenic epitopes [3, 24]; its accuracy in predicting epitopes is about 75%. Surface accessibility, flexibility, and hydrophilicity are also major characteristics of B-cell epitopes [25]. Hence, to predict these properties, the Emini surface accessibility [26], Karplus and Schulz flexibility [27], and Parker hydrophilicity [28] prediction methods were employed, respectively, with the default parameters of the IEDB analysis resource (see Note 2).
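Scale-based methods such as Parker hydrophilicity assign each amino acid a propensity value and smooth it along the sequence with a sliding window. Residues whose smoothed score exceeds a threshold are reported as likely epitope residues. The sketch below shows that general idea with a centred window average. The three-letter toy scale, the window size, and the threshold are illustrative assumptions, not the published Parker, Emini, Karplus and Schulz, or Kolaskar and Tongaonkar parameters, and the published methods differ in detail.

```python
def window_scores(sequence, scale, window=7):
    """Average a per-residue propensity scale over a centred sliding window.

    Returns one value per residue; residues too close to either end for a full
    window get None.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd number")
    half = window // 2
    scores = [None] * len(sequence)
    for i in range(half, len(sequence) - half):
        segment = sequence[i - half:i + half + 1]
        scores[i] = sum(scale[residue] for residue in segment) / window
    return scores


def predicted_positions(scores, threshold):
    """0-based positions whose smoothed score is above threshold."""
    return [i for i, s in enumerate(scores) if s is not None and s > threshold]


# Toy scale: charged residues "hydrophilic" (+1), leucine "hydrophobic" (-1).
toy_scale = {"D": 1.0, "K": 1.0, "L": -1.0}
scores = window_scores("LLLDKDKDLLL", toy_scale, window=3)
print(predicted_positions(scores, threshold=0.5))  # [4, 5, 6]
```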
2.5 T-Cell Epitope
Unlike B cells, T cells do not recognize antigens directly. The antigen is first processed by antigen-presenting cells (APCs), e.g., dendritic cells, B cells, or macrophages, and then presented to the T-cell receptor (TCR) by the major histocompatibility complex (MHC) [22, 30]. There are mainly two types of T-cell epitopes: those recognized by cytotoxic T lymphocytes (CTLs) and those recognized by helper T cells (HTCs). T cells express cluster of differentiation (CD) receptors on their surface that recognize the antigen presented on the MHC molecule [8]. CTLs express the CD8 co-receptor and recognize peptides presented by MHC class I molecules, while HTCs express CD4, which recognizes antigens presented by MHC class II molecules [8].
2.5.1 Helper T-Cell (HTC) Epitope
Helper T cells are crucial for activating efficient humoral and cell-mediated immune responses by stimulating the differentiation and proliferation of B cells and cytotoxic T cells [31]. The binding of epitopes, complexed with MHC class II, to the T-cell receptor can
2.5.2 Cytotoxic T-Lymphocyte (CTL) Epitope
Consistent prediction of CTL epitopes is very important for coherent vaccine design, because humoral immunity alone is sometimes not enough to clear the infection completely. Cell-mediated immunity is then required to induce cell death and completely destroy the bacterial habitat. Although pathogenic Leptospira is not considered a typical intracellular pathogen, some bacterial proteins may escape from the phagolysosome, reach the cytosol of host cells, and be exposed to the host CD8+ T-cell response [3, 35]. Hence, the presence of CTL epitopes in the OMPs was predicted using the NetCTL 1.2 server (http://www.cbs.dtu.dk/services/NetCTL) with default parameters [36]. This server predicts epitopes by integrating predictions of MHC class I binding, proteasomal C-terminal cleavage, and TAP transport efficiency. MHC class I binding and proteasomal C-terminal cleavage are predicted by artificial neural networks, while a weight matrix is used to predict TAP transport efficiency. The tool lets the user select an MHC-I supertype from a drop-down menu, one supertype at a time. As output, it generates 9-mer epitope sequences with their respective scores for C-terminal cleavage, TAP transport efficiency, the threshold for epitope identification, and the overall combined score.
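NetCTL reports a combined score built from its three component predictions. The minimal sketch below assumes a weighted-sum form: MHC-I binding plus weighted C-terminal cleavage and TAP scores. The weights of 0.15 and 0.05 are an assumption and should be checked against the server's own options. The sketch applies the 0.75 threshold mentioned earlier for the ETEC study to the combined score, and the peptide names and component scores are hypothetical.

```python
def ninemers(sequence):
    """All overlapping 9-residue peptides of a protein sequence."""
    return [sequence[i:i + 9] for i in range(len(sequence) - 8)]


def combined_score(mhc, cleavage, tap, w_cleavage=0.15, w_tap=0.05):
    """Weighted sum of the three component predictions (weights are assumed)."""
    return mhc + w_cleavage * cleavage + w_tap * tap


def call_epitopes(component_scores, threshold=0.75):
    """Return (peptide, combined score) pairs above threshold, best first.

    component_scores: {peptide: (mhc, cleavage, tap)}.
    """
    hits = [(pep, combined_score(*parts)) for pep, parts in component_scores.items()]
    return sorted((h for h in hits if h[1] > threshold), key=lambda h: -h[1])


# Hypothetical component scores for three 9-mers.
scores = {
    "pep_1": (0.70, 0.90, 1.20),   # 0.70 + 0.135 + 0.060 = 0.895
    "pep_2": (0.60, 0.50, 0.80),   # 0.60 + 0.075 + 0.040 = 0.715
    "pep_3": (0.95, 0.10, -0.50),  # 0.95 + 0.015 - 0.025 = 0.940
}
print(len(ninemers("MKTAYIAKQRQ")))  # 11 residues give 3 overlapping 9-mers
print(call_epitopes(scores))
```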
2.6 Immunogenicity Prediction of T-Cell Epitopes
Peptides with strong immunogenicity are more likely to be T-cell epitopes than those with weak immunogenicity. Therefore, the immunogenicity of the putative T-cell epitopes was evaluated using the IEDB immunogenicity prediction tool. The CD4 T-cell immunogenicity prediction method at IEDB (http://tools.iedb.org/CD4episcore/)
2.9 Molecular Binding Interaction Analysis of Predicted Epitopes with HLA Molecules
A molecular docking study was performed to assess the binding interaction between HLA molecules and our predicted T-cell epitopes. The docking study requires three-dimensional (3D) structures of the predicted epitopes and the HLA molecules. Since HLA-A2 is one of the most frequent MHC class I alleles in most human populations, we downloaded the 3D structure of HLA-A2 from the RCSB Protein Data Bank (https://www.rcsb.org/) [41, 42].
2.9.1 3D Structure Prediction of T-Cell Epitopes
The 3D structures of all predicted T-cell epitopes except the one predicted to be an allergen were modeled with the PEP-FOLD3 server (http://mobyle.rpbs.univ-paris-diderot.fr/cgi-bin/portal.py#forms::PEP-FOLD3), using 200 simulation runs [43]. The PEP-FOLD3 server first clusters the conformational models generated for a given epitope and then sorts them by sOPEP energy. The best-ranked model was selected to analyze the interactions with the selected HLA molecules.
2.9.2 Molecular Docking
Molecular docking between the HLA molecule and the predicted T-cell epitopes was performed using the PatchDock rigid-body docking server (https://bioinfo3d.cs.tau.ac.il/PatchDock/php.php). This tool requires the 3D structures of a receptor and a ligand; here, the HLA molecule is the receptor and the epitope is the ligand. PatchDock computes complexes with good molecular shape complementarity based on the geometry of the molecules, and its output is a list of predicted complex structures ranked by score. The best-ranked docked complex was further refined using the FireDock (Fast Interaction Refinement in Molecular Docking) server (http://bioinfo3d.cs.tau.ac.il/FireDock/php.php) [44, 45] (see Notes 1 and 5). The FireDock output lists the ten best refined complexes, ranked by binding score. FireDock scores each refined complex on global energy, attractive and repulsive van der Waals forces, atomic contact energy, and hydrogen bonding. The complex with the lowest global energy was ranked first and considered the most favorable conformation for complex formation. A screenshot of FireDock result outputs, taken from the help file, is shown in Fig. 2.
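Selecting the final complex amounts to sorting the refined solutions by global energy. The R sketch below assumes the FireDock table has been copied into a data frame whose columns mirror the energy terms listed above; the values are hypothetical, and the tie-break on atomic contact energy (ACE) is our own illustrative choice, not a FireDock rule.

```r
# Hypothetical FireDock-style table of refined solutions (placeholder values).
# Lower (more negative) global energy indicates a more favorable complex.
firedock <- data.frame(
  solution      = 1:5,
  global_energy = c(-32.4, -45.1, -12.8, -45.1, -27.9),
  attr_vdw      = c(-15.2, -20.3, -9.1, -18.7, -14.0),  # attractive van der Waals
  rep_vdw       = c(4.1, 6.2, 2.0, 3.3, 5.5),           # repulsive van der Waals
  ace           = c(-8.0, -10.4, -3.2, -11.9, -6.1),    # atomic contact energy
  hb            = c(-1.5, -2.7, -0.4, -2.1, -1.8)       # hydrogen bonding
)

# Rank by global energy (ascending); ties broken by ACE (assumed rule, for illustration)
ranked <- firedock[order(firedock$global_energy, firedock$ace), ]
ranked$rank <- seq_len(nrow(ranked))

best <- ranked[1, ]   # complex carried forward for interaction analysis
print(ranked)
```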
Fig. 2 Screenshot of refined docking complexes, taken from the help file of the FireDock tool. The complexes are ranked by their global energy
3 Notes
1. Users are advised to use the most recent versions of any tools or servers, as these tools are continuously updated with improved prediction algorithms and datasets.
2. As an alternative to the tools mentioned above for B-cell epitope prediction, you can use the BcePred Prediction Server (http://crdd.osdd.net/raghava/bcepred/). This tool gives hydrophilicity, turn, surface exposure, flexibility, polarity, accessibility, and antigenic propensity scores for each residue of the protein.
3. Predicting conformational B-cell epitopes requires a 3D structure of the target protein. If available, the structure can be downloaded from the RCSB PDB database (https://www.rcsb.org/). Otherwise, you can model the 3D structure of the target protein using Modeller (https://salilab.org/modeller/) or I-TASSER (https://zhanglab.ccmb.med.umich.edu/I-TASSER/). Modeller is a command-line tool, whereas I-TASSER is a web server.
4. As an alternative, the allergenicity of epitopes can be predicted using the AllergenFP (http://ddg-pharmfac.net/AllergenFP/) and AlgPred (http://crdd.osdd.net/raghava/algpred/submission.html) tools.
5. Before using any web server or tool, users should read its help/FAQ section to understand the stepwise procedure, the underlying principles, and the meaning of its parameters.
Fig. 3 Parameters for visualization of H-bond between epitope and HLA molecule. The 3D structure of HLA-A2-
peptide complex (visualized in this image) was downloaded from RCSB PDB (PDB ID: 1B0G)
References
1. Vijayachari P, Sugunan AP, Shriram AN (2008) Leptospirosis: an emerging global public health problem. J Biosci 33(4):557–569
2. Adler B, de la Peña Moctezuma A (2010) Leptospira and leptospirosis. Vet Microbiol 140(3–4):287–296
3. Lata KS, Kumar S, Vaghasia V et al (2018) Exploring Leptospiral proteomes to identify potential candidates for vaccine design against leptospirosis using an immunoinformatics approach. Sci Rep 8(1):6935
4. Levett PN (2015) Systematics of leptospiraceae. In: Leptospira and leptospirosis. Springer, Berlin, Heidelberg, pp 11–20
5. Adler B (2015) Vaccines against leptospirosis. In: Leptospira and leptospirosis. Springer, Berlin, Heidelberg, pp 251–272
6. Ellis WA (2015) Animal leptospirosis. In: Leptospira and leptospirosis. Springer, Berlin, Heidelberg, pp 99–137
7. Wang Z, Jin L, Węgrzyn A (2007) Leptospirosis vaccines. Microb Cell Factories 6(1):39
8. Tomar N, De RK (2010) Immunoinformatics: an integrated scenario. Immunology 131(2):153–168
9. De Gregorio E, Rappuoli R (2012) Vaccines for the future: learning from human immunology. Microb Biotechnol 5(2):149–155
10. Patronov A, Doytchinova I (2013) T-cell epitope vaccine design by immunoinformatics. Open Biol 3(1):120139
11. Yang X, Yu X (2009) An introduction to epitope prediction methods and software. Rev Med Virol 19(2):77–96
12. Dellagostin O, Grassmann A, Rizzi C et al (2017) Reverse vaccinology: an approach for identifying leptospiral vaccine candidates. Int J Mol Sci 18(1):158
13. Chaudhuri R, Ramachandran S (2017) Immunoinformatics as a tool for new antifungal vaccines. In: Kalkum M, Semis M (eds) Methods and protocols. Methods in molecular biology, vol 1625. Springer, Heidelberg, pp 31–43
14. Davies MN, Flower DR (2007) Harnessing bioinformatics to discover new vaccines. Drug Discov Today 12(9–10):389–395
15. Groot ASD, Rappuoli R (2004) Genome-derived vaccines. Expert Rev Vaccines 3(1):59–76
16. Doytchinova IA, Flower DR (2007) VaxiJen: a server for prediction of protective antigens, tumour antigens and subunit vaccines. BMC Bioinformatics 8(1):4
17. Lin J, Huang S, Zhang Q (2002) Outer membrane proteins: key players for bacterial adaptation in host niches. Microbes Infect 4(3):325–331
18. Rodríguez-Ortega MJ, Norais N, Bensi G et al (2006) Characterization and identification of vaccine candidate proteins through analysis of the group A Streptococcus surface proteome. Nat Biotechnol 24(2):191
19. Yu CS, Lin CJ, Hwang JK (2004) Predicting subcellular localization of proteins for Gram-negative bacteria by support vector machines based on n-peptide compositions. Protein Sci 13(5):1402–1406
20. Yu CS, Chen YC, Lu CH, Hwang JK (2006) Prediction of protein subcellular localization. Proteins 64(3):643–651
21. Nair DT, Singh K, Siddiqui Z et al (2002) Epitope recognition by diverse antibodies suggests conformational convergence in an antibody response. J Immunol 168(5):2371–2382
22. Paul WE (2012) Fundamental immunology. Lippincott Williams & Wilkins, Philadelphia
23. Jespersen MC, Peters B, Nielsen M, Marcatili P (2017) BepiPred-2.0: improving sequence-based B-cell epitope prediction using conformational epitopes. Nucleic Acids Res 45(W1):W24–W29
24. Kolaskar AS, Tongaonkar PC (1990) A semi-empirical method for prediction of antigenic determinants on protein antigens. FEBS Lett 276(1–2):172–174
25. Fieser TM, Tainer JA, Geysen HM, Houghten RA, Lerner RA (1987) Influence of protein flexibility and peptide conformation on reactivity of monoclonal anti-peptide antibodies with a protein alpha-helix. Proc Natl Acad Sci 84(23):8568–8572
26. Emini EA, Hughes JV, Perlow D, Boger J (1985) Induction of hepatitis A virus-neutralizing antibody by a virus-specific synthetic peptide. J Virol 55(3):836–839
27. Karplus PA, Schulz GE (1985) Prediction of chain flexibility in proteins. Naturwissenschaften 72(4):212–213
28. Parker JMR, Guo D, Hodges RS (1986) New hydrophilicity scale derived from high-performance liquid chromatography peptide retention data: correlation of predicted surface residues with antigenicity and X-ray-derived accessible sites. Biochemistry 25(19):5425–5432
29. Ponomarenko J, Bui HH, Li W et al (2008) ElliPro: a new structure-based tool for the prediction of antibody epitopes. BMC Bioinformatics 9(1):514
30. Sanchez-Trincado JL, Gomez-Perosanz M, Reche PA (2017) Fundamentals and methods for T- and B-cell epitope prediction. J Immunol Res 2017:2680160
31. Chen K, Kolls JK (2013) T cell–mediated host immune defenses in the lung. Annu Rev Immunol 31:605–633
32. Karosiene E, Rasmussen M, Blicher T et al (2013) NetMHCIIpan-3.0, a common pan-specific MHC class II prediction method including all three human MHC class II isotypes, HLA-DR, HLA-DP and HLA-DQ. Immunogenetics 65(10):711–724
33. Lazarski CA, Chaves FA, Jenks SA et al (2005) The kinetic stability of MHC class II:peptide complexes is a key parameter that dictates immunodominance. Immunity 23(1):29–40
34. Weber CA, Mehta PJ, Ardito M et al (2009) T cell epitope: friend or foe? Immunogenicity of biologics in context. Adv Drug Deliv Rev 61(11):965–976
35. Fraga TR, Barbosa AS, Isaac L (2011) Leptospirosis: aspects of innate immunity, immunopathogenesis and immune evasion from the complement system. Scand J Immunol 73(5):408–419
36. Larsen MV, Lundegaard C, Lamberth K et al (2007) Large-scale validation of methods for cytotoxic T-lymphocyte epitope prediction. BMC Bioinformatics 8(1):424
37. Dhanda S, Karosiene E, Edwards L et al (2018) Predicting HLA CD4 immunogenicity in human populations. Front Immunol 9:1369
38. Calis JJ, Maybeno M, Greenbaum JA et al (2013) Properties of MHC class I presented peptides that enhance immunogenicity. PLoS Comput Biol 9(10):e1003266
39. Muh HC, Tong JC, Tammi MT (2009) AllerHunter: a SVM-pairwise system for assessment of allergenicity and allergic cross-reactivity in proteins. PLoS One 4(6):e5861
40. Bui HH, Sidney J, Li W et al (2007) Development of an epitope conservancy analysis tool to facilitate the design of epitope-based diagnostics and vaccines. BMC Bioinformatics 8(1):361
41. Hildesheim A, Apple RJ, Chen CJ et al (2002) Association of HLA class I and II alleles and extended haplotypes with nasopharyngeal carcinoma in Taiwan. J Natl Cancer Inst 94(23):1780–1789
42. Rivoltini L, Loftus DJ, Barracchini K et al (1996) Binding and presentation of peptides derived from melanoma antigens MART-1 and glycoprotein-100 by HLA-A2 subtypes. Implications for peptide-based immunotherapy. J Immunol 156(10):3882–3891
43. Lamiable A, Thévenet P, Rey J et al (2016) PEP-FOLD3: faster de novo structure prediction for linear peptides in solution and in complex. Nucleic Acids Res 44(W1):W449–W454
44. Andrusier N, Nussinov R, Wolfson HJ (2007) FireDock: fast interaction refinement in molecular docking. Proteins 69(1):139–159
45. Mashiach E, Schneidman-Duhovny D, Andrusier N et al (2008) FireDock: a web server for fast interaction refinement in molecular docking. Nucleic Acids Res 36(suppl_2):W229–W232
46. Berendsen HJ, van der Spoel D, van Drunen R (1995) GROMACS: a message-passing parallel molecular dynamics implementation. Comput Phys Commun 91(1–3):43–56
47. Pettersen EF, Goddard TD, Huang CC et al (2004) UCSF chimera—a visualization system for exploratory research and analysis. J Comput Chem 25(13):1605–1612
48. DeLano WL (2002) The PyMOL molecular graphics system. http://www.pymol.org
Chapter 8
Abstract
MHC class I proteins present intracellular peptides on the cell’s surface, enabling the immune system to
recognize tumor-specific neoantigens of early neoplastic cells and eliminate them before the tumor develops
further. However, variability in peptide-MHC-I affinity results in variable presentation of oncogenic
peptides, leading to variable likelihood of immune evasion across both individuals and mutations. Since
the major determinant of peptide-MHC-I affinity in patients is individual MHC-I genotype, we developed
a residue-centric presentation score taking both mutated residues and MHC-I genotype into account and
hypothesized that high scores (which correspond to poor presentation) would correlate with high mutation
frequencies within tumors. We applied our scoring system to 9176 tumor samples from TCGA across 1018
recurrent mutations and found that, indeed, presentation scores predicted mutation probability. These
findings open the door to more personalized treatment plans based on simple genotyping. Here, we outline
the computational tools and statistical methods used to arrive at this conclusion.
Key words Cancer predisposition, Cancer susceptibility prediction, Major histocompatibility complex (MHC), Human leukocyte antigen (HLA), Antigen presentation, Cancer, Immunology, Immunoediting, Immunotherapy, Neoantigens
1 Introduction
186 Lainie Beauchemin et al.
2.1 Acquisition of Patient Data
The Cancer Genome Atlas (TCGA) contains data from thousands of tumor samples, including clinical data, DNA sequence data, copy number information, and mRNA expression data (https://cancergenome.nih.gov/). Data were accessed from the NCI Genomic Data Commons (https://portal.gdc.cancer.gov/).
2.2 HLA Typing of Patient Data
We used exome sequencing data from the TCGA patients to type the HLA-A, HLA-B, and HLA-C alleles of all patients. To maximize the number of patients included in the analysis without compromising the accuracy of the calls, we used three different HLA typing tools: PolySolver [7], OptiType [8], and SNP2HLA
2.3 Generation of the Mutation Matrix
We created a matrix containing the mutation information for every patient in our sample. Among all mutations present in all tumors, we selected only those that occurred in the top 100 ranked oncogenes and top 100 ranked tumor suppressors according to Davoli et al. [10] and that were observed in at least three TCGA patients. In addition, we retained only mutations that generate amino acid changes (missense mutations and in-frame indels). With these criteria, we generated a binary 9176 × 1018 matrix for the presence/absence of mutations, with each column representing a recurrent mutation and each row representing a patient sample. Mutations outside the 200 top-ranked oncogenes and tumor suppressors were deemed passenger-like mutations, and 1000 of them were sampled for analysis. Another 1000 common germline variants were sampled from the Exome Variant Server (http://evs.gs.washington.edu/EVS/).
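The filtering rules above (driver genes only, amino-acid-changing variants only, at least three patients) can be sketched in base R as follows. The long-format table, its column names, the gene list and the patient IDs are hypothetical; the variant class labels follow the MAF `Variant_Classification` convention used by TCGA, which is an assumption about the input format.

```r
# Hypothetical long-format mutation calls: one row per patient-mutation pair
calls <- data.frame(
  patient  = c("P1", "P2", "P3", "P1", "P2", "P3", "P4", "P2"),
  gene     = c("KRAS", "KRAS", "KRAS", "TP53", "TP53", "TP53", "TP53", "GENEX"),
  mutation = c("KRAS_G12D", "KRAS_G12D", "KRAS_G12D", "TP53_R175H", "TP53_R175H",
               "TP53_R175H", "TP53_splice", "GENEX_A10V"),
  class    = c(rep("Missense_Mutation", 6), "Splice_Site", "Missense_Mutation"),
  stringsAsFactors = FALSE
)
all_patients <- c("P1", "P2", "P3", "P4", "P5")   # every HLA-typed patient, mutated or not
driver_genes <- c("KRAS", "TP53")                 # stand-in for the top-ranked oncogenes/TSGs
keep_classes <- c("Missense_Mutation", "In_Frame_Del", "In_Frame_Ins")
min_patients <- 3

# Keep amino-acid-changing variants in driver genes
f <- calls[calls$gene %in% driver_genes & calls$class %in% keep_classes, ]
# Keep mutations seen in at least min_patients distinct patients
n_pat <- tapply(f$patient, f$mutation, function(p) length(unique(p)))
recurrent <- names(n_pat)[n_pat >= min_patients]
f <- f[f$mutation %in% recurrent, ]

# Binary patients x mutations matrix (1 = mutation present)
mut_matrix <- matrix(0L, nrow = length(all_patients), ncol = length(recurrent),
                     dimnames = list(all_patients, recurrent))
mut_matrix[cbind(f$patient, f$mutation)] <- 1L
print(mut_matrix)
```

Patients without any qualifying mutation (P4 and P5 here) are kept as all-zero rows, so the row set matches the set of HLA-typed patients.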
3.1 Depicting the Global Link Between PHBR Scores and Mutation Probability
We condensed our processed data into two matrices, each with 9176 rows and 1018 columns. Each of the 9176 rows represents an HLA-typed patient, and each of the 1018 columns represents a recurrent oncogenic mutation. One matrix contains the PHBR scores corresponding to the patient and the mutation specified by the entry's row and column, respectively. The other is a binary matrix in which a 1 indicates that the patient represented by the row carries the mutation specified by the column and a 0 indicates that the patient does not.
In order to explore the general relationship between PHBR and the presence of mutation in patient samples, we compared the distribution of PHBR scores associated with mutation with that of scores not associated with mutation. That is, we placed PHBR scores corresponding to a 1 (mutation) in one group and those corresponding to a 0 (no mutation) in another. The distributions of PHBR scores in both groups can be visualized using a boxplot (Fig. 1a). As is apparent from the boxplot alone, higher PHBR scores are more commonly associated with mutations. To gain a more detailed understanding of this trend, we generated a histogram using the same data (Fig. 1b). Indeed, PHBR scores not associated with mutation are enriched below 2, while the opposite appears true above 2. Overall, mutation-associated scores tend to be greater than scores not associated with mutations, indicating that mutations tend to occur in the context of lower MHC-I presentation.
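The two-group comparison can be sketched in base R by indexing the PHBR matrix with the binary matrix. The matrices below are small simulated stand-ins (the rising trend with log-PHBR is built into the simulation, not taken from the TCGA data), and the 0.5-wide histogram bins are an arbitrary choice.

```r
set.seed(1)
# Hypothetical stand-ins for the two patient x mutation matrices (much smaller than 9176 x 1018)
n_pat <- 200
n_mut <- 50
phbr <- matrix(rexp(n_pat * n_mut, rate = 0.5), n_pat, n_mut)
# Simulated indicators whose log-odds rise with log(PHBR); this trend is assumed, not real data
mut <- matrix(rbinom(n_pat * n_mut, 1, plogis(-3 + 0.5 * log(phbr))), n_pat, n_mut)

# Split PHBR entries by the value of the matching cell in the binary matrix
scores_mut   <- phbr[mut == 1]
scores_nomut <- phbr[mut == 0]

# Quartiles per group: the quantities a boxplot (as in Fig. 1a) displays
group_summary <- rbind(mutation = quantile(scores_mut), no_mutation = quantile(scores_nomut))
print(group_summary)

# Relative frequency per PHBR bin within each group, as in a histogram (as in Fig. 1b)
breaks <- seq(0, ceiling(max(phbr)), by = 0.5)
freq_mut   <- hist(scores_mut, breaks = breaks, plot = FALSE)$counts / length(scores_mut)
freq_nomut <- hist(scores_nomut, breaks = breaks, plot = FALSE)$counts / length(scores_nomut)
enrichment <- freq_mut - freq_nomut   # > 0 where mutation-associated scores are over-represented
```

Normalizing each histogram by its own group size matters here because mutated entries are far rarer than unmutated ones.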
The functional form of the relationship underlying this tendency, however, remained unclear. We wondered whether the relationship was in some sense linear, in that an increase in PHBR uniformly corresponds to an increased likelihood of mutation. The histogram made clear that differences in PHBR frequency between the two groups are highly variable across the range of PHBR scores, with the most drastic differences seen when PHBR scores are less than 1 and modest differences when PHBR scores exceed 2. We wondered whether there was a threshold beyond which PHBR no longer had a significant effect or whether there was another explanation. To investigate this issue, a generalized additive logistic regression model using cubic spline basis functions can be fitted, as implemented in the mgcv R package [15]. The R code to fit the model is given below. The code assumes that mut is a vector of 1s and 0s indicating mutation presence/absence and that PHBR is a vector storing the PHBR scores.
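library(mgcv)
gam1 <- gam(mut ~ s(log(PHBR)), family = binomial)  # s() defaults to a thin plate basis; s(log(PHBR), bs = "cr") requests cubic regression splines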
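As a cross-check of the linearity question (or where mgcv is not installed), one could compare a logistic model that is linear in log(PHBR) with one that uses a natural cubic spline in log(PHBR), via a likelihood-ratio test. This sketch swaps in an unpenalized regression spline (`splines::ns`, shipped with R) for mgcv's penalized smoother, and it runs on simulated data in which the true relationship is linear on the logit scale; it is not the authors' analysis.

```r
library(splines)   # distributed with base R
set.seed(42)
n <- 20000
log_phbr <- rnorm(n)                                 # hypothetical log-PHBR values
mut <- rbinom(n, 1, plogis(-2 + 0.25 * log_phbr))    # simulated: linear on the logit scale

fit_lin    <- glm(mut ~ log_phbr, family = binomial)
fit_spline <- glm(mut ~ ns(log_phbr, df = 4), family = binomial)

# Likelihood-ratio test: does a flexible curve improve on a straight line in log-PHBR?
lrt <- anova(fit_lin, fit_spline, test = "Chisq")
print(lrt)

slope <- coef(fit_lin)[["log_phbr"]]   # change in log-odds per unit of log-PHBR
exp(slope)                             # corresponding odds ratio
```

A non-significant test, as expected here, would support modeling the log-odds as linear in log-PHBR; a significant one would point to curvature such as a plateau.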
Characterizing MHC-I Genotype Predictive Power for Oncogenic Mutation. . . 191
[Fig. 1 panels: (a) boxplot of PHBR scores by mutation presence (0/1); (b) frequency histogram of PHBR for the "No mutation" and "Mutation" groups]
[Fig. 2 panels: (a) Mutation Probability vs. PHBR; (b) log(PHBR) vs. logit(probability)]
Fig. 2 Nonparametric estimates of the mutation probability as a function of PHBR scores. (a) GAM-based
mutation probability vs. PHBR. (b) GAM-based logit-mutation probability vs. log PHBR
[Fig. 3 panels; axis labels: probability of mutation, logit mutation probability, log PHBR, PHBR]
Fig. 3 Validation of the GAM models. (a) Frequency of mutation for PHBR groups. (b) Scatterplot with each
point representing the log-average PHBR of a group and the logit-mutation frequency within that group
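The grouped validation described in the Fig. 3 caption can be sketched in base R: bin observations by PHBR, then compute, per group, the log of the average PHBR and the logit of the observed mutation frequency. Deciles as the grouping and the simulated data are our own assumptions; the chapter does not state how the groups were formed.

```r
set.seed(7)
n <- 50000
phbr <- rexp(n, rate = 0.5)                           # hypothetical PHBR scores
mut  <- rbinom(n, 1, plogis(-3 + 0.4 * log(phbr)))    # simulated mutation indicators

# Group observations into deciles of PHBR (assumed grouping)
grp <- cut(phbr, breaks = quantile(phbr, probs = seq(0, 1, 0.1)),
           include.lowest = TRUE, labels = FALSE)

# Per group: log of the average PHBR and logit of the observed mutation frequency
log_avg_phbr <- tapply(phbr, grp, function(x) log(mean(x)))
mut_freq     <- tapply(mut, grp, mean)
logit_freq   <- qlogis(mut_freq)

# These points can be overlaid on the fitted GAM curve for a visual check
validation <- data.frame(log_avg_phbr, mut_freq, logit_freq)
print(validation)
```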
3.2 Determining Predictive Power of Presentation Scores for Mutation Probability
At this point we had firm evidence of a bias toward higher presentation scores in observed mutations, establishing a global link between PHBR and mutation status. To determine the extent to which PHBR scores can predict mutation presence, we must take into account that the influence of MHC-I presentation on mutation probability can operate at different levels. For instance, PHBR scores could help identify which patients are more likely to have a given mutation, or they could help identify which mutations are more likely to occur in a given patient. To investigate these different relationships and understand the link between PHBR and mutation probability more precisely, we analyzed mutation prediction using two models: the within-mutation model, which assesses the power of PHBR scores to predict which patients will have a particular mutation, and the within-patient model, which predicts which mutations a particular patient's tumor is more likely to have.
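The difference between the two models lies in which baseline is held fixed. The base-R sketch below uses simulated long-format data and per-mutation or per-patient fixed intercepts (`glm` with a factor term) as a simplified stand-in; the chapter describes patient-specific random effects for the within-patient model, which could instead be fitted with, for example, `lme4::glmer(mut ~ log_phbr + (1 | patient), family = binomial, data = d)` if lme4 is installed. All names and effect sizes here are hypothetical.

```r
set.seed(3)
n_pat <- 150
n_mut <- 40
# Long format: one row per patient-mutation pair (hypothetical, simulated data)
d <- expand.grid(patient = factor(seq_len(n_pat)), mutation = factor(seq_len(n_mut)))
d$log_phbr <- rnorm(nrow(d))
# Simulated truth: baseline rates vary by patient and mutation; log-PHBR adds 0.3 to the log-odds
pat_eff <- rnorm(n_pat, mean = -0.5, sd = 0.5)
mut_eff <- rnorm(n_mut, mean = 0, sd = 0.5)
eta <- pat_eff[as.integer(d$patient)] + mut_eff[as.integer(d$mutation)] + 0.3 * d$log_phbr
d$mut <- rbinom(nrow(d), 1, plogis(eta))

# Within-mutation: mutation-specific intercepts, so PHBR is compared across patients per mutation
within_mutation <- glm(mut ~ log_phbr + mutation, family = binomial, data = d)
# Within-patient: patient-specific intercepts, so PHBR is compared across mutations per patient
within_patient  <- glm(mut ~ log_phbr + patient, family = binomial, data = d)

or_within_mutation <- exp(coef(within_mutation)[["log_phbr"]])
or_within_patient  <- exp(coef(within_patient)[["log_phbr"]])
c(within_mutation = or_within_mutation, within_patient = or_within_patient)
```

In this simulation both models recover a similar effect because it was built in for both; in the real data the two models can diverge, which is exactly what separating them is meant to reveal.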
3.2.2 Determining Predictive Power of Presentation Scores Within a Given Patient
To model the ability of PHBR scores to predict mutation probability within a patient, the same analysis as before can be implemented, except that this time a model with patient-specific random effects is fitted (i.e., certain patients are more likely to have more mutations
3.3 Odds Ratios
As discussed above, there is strong evidence that PHBR scores have an effect on mutation probability within patients and marginal evidence of a slight effect on mutation probability across patients. To help interpret the degree to which scores predict mutation probability in both models, we computed the odds ratio (OR) for each model. The OR is 1.282 in the within-patient analysis; that is, a +1 increase in log-PHBR score is estimated to increase the odds of mutation by 28.2%. The estimated OR for the within-mutation analysis was only 1.028, and the upper end of its CI was 1.059, indicating that any potential increase in odds is unlikely to be larger than 5.9% per unit of log-PHBR (Table 1). The same models can be fitted on common germline variants or passenger mutations, for which no association or a much weaker association is expected, respectively. As expected, the results show that the PHBR scores of passenger mutations or germline variants are not predictive of mutation probability in either the within-mutation or the within-patient model (Table 1).
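The percentages above follow from the odds ratios by simple arithmetic: an OR of r for a +1 change in log-PHBR multiplies the odds by r, a change of 100 × (r − 1) percent. The helper below reproduces the reported figures; the Wald interval function is a generic sketch, and its example standard error is hypothetical. Because per-entry mutation probabilities are small, these odds ratios are also close to relative risks.

```r
# Percent change in the odds implied by an odds ratio for a +1 change in the predictor
pct_change_odds <- function(or) 100 * (or - 1)

# Wald confidence interval for an OR from a log-odds coefficient and its standard error
or_wald_ci <- function(beta, se, level = 0.95) {
  z <- qnorm(1 - (1 - level) / 2)
  exp(c(estimate = beta, lower = beta - z * se, upper = beta + z * se))
}

pct_change_odds(1.282)   # within-patient OR reported above: about 28.2%
pct_change_odds(1.059)   # upper CI bound of the within-mutation OR: about 5.9%

# Example with a hypothetical standard error, for illustration only
or_wald_ci(beta = log(1.028), se = 0.015)
```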
To understand more fully the degree to which PHBR is predictive of mutation probability both within mutations and within patients, we computed odds ratios for both models, within-mutation and within-patient, with multiple mutation frequency cutoffs. That is, we repeated the previous analyses but restricted attention to mutations that appeared in at least 3, 5, 10, 20, and 40 tumors (Table 2).
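Restricting the analysis by mutation frequency only requires counting tumors per column of the binary matrix. The sketch below, on a simulated matrix, builds the nested mutation sets for the cutoffs used above; interpreting each cutoff as "at least k tumors" is our reading of the text.

```r
set.seed(11)
n_patients  <- 500
n_mutations <- 60
# Hypothetical binary patient x mutation matrix; each mutation (column) has its own frequency
col_freq <- runif(n_mutations, min = 0.002, max = 0.12)
mut_matrix <- matrix(rbinom(n_patients * n_mutations, 1, rep(col_freq, each = n_patients)),
                     nrow = n_patients, ncol = n_mutations,
                     dimnames = list(NULL, paste0("mut", seq_len(n_mutations))))

tumors_per_mutation <- colSums(mut_matrix)
cutoffs <- c(3, 5, 10, 20, 40)

# Mutations retained at each minimum-frequency cutoff (nested sets)
retained <- lapply(cutoffs, function(k) names(tumors_per_mutation)[tumors_per_mutation >= k])
names(retained) <- paste0("min_", cutoffs)
print(sapply(retained, length))

# Each subset would then be refitted with both models, e.g. for the cutoff of 20:
sub_matrix <- mut_matrix[, retained[["min_20"]], drop = FALSE]
```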
The within-mutation model has an odds ratio close to 1 for all cutoffs, with 1 included in the 95% confidence interval, suggesting that this model has very limited predictive power, if any. By contrast, the within-patient model proves predictive, with the degree of predictive power highly dependent on the mutation frequency cutoff. The OR is significant at every cutoff, and as the cutoff increases through 3, 5, 10, and 20, the odds ratio of the within-patient model concomitantly increases from 1.183 to 1.545. However, at a frequency cutoff of 40, the OR, although still >1 and highly
Table 1
Quantitative estimate of the association between PHBR scores and mutation probability
Table 2
Quantitative estimate of the association between PHBR scores and mutation probability as a function of oncogenic mutation frequency
[Fig. 4 panels (a)–(d): one row per TCGA cancer type (PCPG, KIRP, LIHC, PRAD, BRCA, READ, BLCA, LAML, HNSC, LUAD, STAD, GBM, UCEC, OV, COAD, THCA, LGG, SKCM, LUSC, TGCT, SARC, PAAD, KIRC), ordered differently in each panel]
Fig. 4 Predictive power of PHBR scores across cancer types. (a–d) ORs as black boxes and 95% CIs as red dotted lines for different cancer types using (a) oncogenic mutations with frequency ≥5 and the within-mutation model, (b) same as in (a) but using the within-patient model, (c) oncogenic mutations with frequency ≥20 and the within-mutation model, (d) same as in (c) but using the within-patient model
within each model, we generated two plots, one for the within-
patient model and one for the within-mutation model, and added
the odds ratio data for each tissue type (Fig. 4).
The analysis is restricted to cancer types with more than 100 samples, to increase confidence in the odds ratio estimates. In addition, as observed in the pan-cancer analyses, when we restrict the cancer type analysis to highly recurrent mutations, specifically those with at least 20 instances as opposed to the default threshold of 5, the ORs obtained across cancer types are significantly higher in the within-patient model, in some cases up to 2.51, meaning that a 1-unit increase in log-PHBR multiplies the odds of mutation by 2.51 (a 151% increase in odds). As expected, none of the cancer types is associated with a significant OR in the within-mutation model.
Acknowledgments
References
1. Kaplan DH et al (1998) Demonstration of an interferon gamma-dependent tumor surveillance system in immunocompetent mice. Proc Natl Acad Sci U S A 95:7556–7561. https://doi.org/10.1073/pnas.95.13.7556
2. Shankaran V et al (2001) IFNgamma and lymphocytes prevent primary tumour development and shape tumour immunogenicity. Nature 410:1107–1111. https://doi.org/10.1038/35074122
3. Sidney J, Peters B, Frahm N, Brander C, Sette A (2008) HLA class I supertypes: a revised and updated classification. BMC Immunol 9:1. https://doi.org/10.1186/1471-2172-9-1
4. Zitvogel L, Tesniere A, Kroemer G (2006) Cancer despite immunosurveillance: immunoselection and immunosubversion. Nat Rev Immunol 6:715–727. https://doi.org/10.1038/nri1936
5. Leach DR, Krummel MF, Allison JP (1996) Enhancement of antitumor immunity by CTLA-4 blockade. Science 271:1734–1736. https://doi.org/10.1126/science.271.5256.1734
6. Marty R et al (2017) MHC-I genotype restricts the oncogenic mutational landscape. Cell 171:1272–1283.e1215. https://doi.org/10.1016/j.cell.2017.09.050
7. Shukla SA et al (2015) Comprehensive analysis of cancer-associated somatic mutations in class I HLA genes. Nat Biotechnol 33:1152–1158. https://doi.org/10.1038/nbt.3344
8. Szolek A et al (2014) OptiType: precision HLA typing from next-generation sequencing data. Bioinformatics 30:3310–3316. https://doi.org/10.1093/bioinformatics/btu548
9. Jia X et al (2013) Imputing amino acid polymorphisms in human leukocyte antigens. PLoS One 8:e64683. https://doi.org/10.1371/journal.pone.0064683
10. Davoli T et al (2013) Cumulative haploinsufficiency and triplosensitivity drive aneuploidy patterns and shape the cancer genome. Cell 155:948–962. https://doi.org/10.1016/j.cell.2013.10.011
Abstract
Diseases and infections elicit a multilayered immune response which consists of molecular and cellular
interaction cascades. Recent advances in high-throughput technologies have facilitated multiparameter
investigation of immune cells involved in human immune responses. These multiparameter investigations
generate large-scale datasets and advanced computational techniques are required to gain useful informa-
tion from them. Networks or graphs offer a practical way to represent complex information and develop
advanced algorithms to unveil the underlying mechanisms. Here we discuss ways to assemble and analyze
networks using genome-wide transcriptional profiles. Additionally, we discuss ways to integrate information
available in primary literature and databases with the networks assembled using large-scale datasets. Finally,
we describe ways in which network analysis offers insights into human immune responses.
Key words Gene co-expression, Network, Immunology, Clustering, Correlation, Mutual information, Transcriptomics
1 Introduction
200 Lauren Benoodt and Juilee Thakar
Fig. 1 Epithelial (top) and dendritic cell (bottom) co-expression networks responsive to influenza infection. (a)
The shortest path based network highlighting the interactions between genes from the KEGG cytosolic DNA
sensing pathway. (b) Subnetworks enriched in genes from the KEGG cytosolic DNA sensing pathway. (c)
Nearest neighbors of the CXCL10 gene [4]
Fig. 2 A chord diagram of fuzzy cluster membership. The width of the lines represents the number of genes
from a cluster overlapping with another cluster. The KEGG pathways indicated next to the cluster numbers
show enrichment of that pathway in a cluster [9]
Network Analysis of Large-Scale Data and Its Application to Immunology 205
data points will be a member of one or more clusters (Fig. 2). This method requires that the number of clusters be set in advance, as in k-means clustering, and that a cutoff be chosen, as in hierarchical clustering. This type of clustering better represents biological pathways, in which many genes are associated with more than one signaling cascade or biological process [9].
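A minimal base-R sketch of fuzzy c-means is shown below: memberships and weighted centers are updated alternately, and a membership cutoff then lets a gene belong to several clusters. The fuzzifier m = 2, the cutoff of 0.3 and the simulated "expression profiles" are illustrative assumptions; this is a from-scratch sketch of the standard algorithm, not the implementation used in [9].

```r
fuzzy_cmeans <- function(x, k, m = 2, max_iter = 300, tol = 1e-6) {
  x <- as.matrix(x)
  n <- nrow(x)
  u <- matrix(runif(n * k), n, k)
  u <- u / rowSums(u)                             # memberships of each gene sum to 1
  for (iter in seq_len(max_iter)) {
    w <- u^m
    centers <- t(w) %*% x / colSums(w)            # k x p membership-weighted cluster centers
    # Squared Euclidean distance from every gene to every center (n x k)
    d2 <- sapply(seq_len(k), function(j)
      rowSums((x - matrix(centers[j, ], n, ncol(x), byrow = TRUE))^2))
    d2 <- pmax(d2, 1e-12)                         # guard against division by zero
    inv <- d2^(-1 / (m - 1))
    u_new <- inv / rowSums(inv)                   # standard FCM membership update
    converged <- max(abs(u_new - u)) < tol
    u <- u_new
    if (converged) break
  }
  list(membership = u, centers = centers)
}

set.seed(5)
# Hypothetical profiles: two separated gene groups plus one gene midway between them
expr <- rbind(matrix(rnorm(20, mean = 0, sd = 0.3), 10, 2),
              matrix(rnorm(20, mean = 4, sd = 0.3), 10, 2),
              c(2, 2))
rownames(expr) <- paste0("gene", 1:21)
fit <- fuzzy_cmeans(expr, k = 2)

cutoff <- 0.3   # assumed membership cutoff; a gene above it in several clusters is shared
members <- lapply(1:2, function(j) rownames(expr)[fit$membership[, j] > cutoff])
print(members)
```

The midway gene ends up in both clusters, mirroring genes that participate in more than one pathway.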
It is also possible to cluster genes into subnetworks based on association scores and the network topology (Fig. 1b). This is also known as community detection, where a community is a densely connected section of a larger graph: a community has more interactions within itself than with the rest of the network. One frequently used approach is based on random walks, in which a node with many edges has more opportunities for a walk to pass through it. An example is the Walktrap method implemented in the R package igraph, which relies on short random walks to determine the likelihood of nodes being members of the same community and then uses hierarchical clustering to merge adjacent communities [10]. Random walks can also be weighted: the association score of each edge is used as an edge weight, and the probability of a random walk traversing an edge depends on that weight [11]. Other methods use prior knowledge, such as canonical pathway networks, to generate subnetworks that incorporate connections known to be biologically meaningful while allowing for the inclusion of previously uncharacterized interactions [12]. Random walk-based methods are just one type of network-topology-based community detection. There are also methods for finding overlapping communities; much like fuzzy clustering, these methods output clusters in which nodes can belong to multiple clusters [13].
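The random-walk idea can be illustrated in base R with the Walktrap distance: each node is described by its t-step transition probabilities on a weighted graph, scaled by node degree, and nodes with similar walk profiles are merged hierarchically. This is a simplified sketch (Ward clustering on the distances instead of Walktrap's own merging procedure), and the toy network is hypothetical; igraph's `cluster_walktrap(graph, weights = ..., steps = 4)` provides the full method.

```r
walk_distance <- function(A, t = 4) {
  d <- rowSums(A)                        # weighted degree of each node
  P <- A / d                             # row-stochastic transitions: P[i, j] = A[i, j] / d[i]
  Pt <- diag(nrow(A))
  dimnames(Pt) <- dimnames(A)
  for (s in seq_len(t)) Pt <- Pt %*% P   # t-step transition probabilities
  # Divide column k by sqrt(d_k), then take Euclidean distances between node profiles
  dist(sweep(Pt, 2, sqrt(d), "/"))
}

# Hypothetical weighted gene network: two dense modules joined by one weak edge
genes <- paste0("g", 1:10)
A <- matrix(0, 10, 10, dimnames = list(genes, genes))
A[1:5, 1:5] <- 0.8
A[6:10, 6:10] <- 0.8
diag(A) <- 0
A[5, 6] <- A[6, 5] <- 0.2                # weak association score bridging the modules

hc <- hclust(walk_distance(A, t = 3), method = "ward.D2")
communities <- cutree(hc, k = 2)
print(communities)
```

Because edge weights enter the transition matrix directly, stronger associations make a walk more likely to stay inside a module, which is the weighting idea described above.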
developed using MI, and gene modules were identified using fuzzy C-means clustering. It has been shown that the systemic immune response to influenza and RSV can be characterized by investigating gene expression in PBMCs. Although PBMCs are located away from the site of infection, they have been shown to have expression patterns of interferon-related genes that correlate with those of airway epithelial cells in RSV and influenza infection [25]. This is important because obtaining samples of the primary infected cells from human subjects is not always possible, and mixtures of cells such as PBMCs are frequently used to measure the immune response. We have also studied the response in specific cells such as epithelial and dendritic cells. Airway epithelial cells are constantly exposed to pathogens and are responsible for secreting immune-activating signals in response to harmful pathogens [26]. Dendritic cells, which are part of the airway mucosa and lung parenchyma, process and present antigens to T cells and are involved in the inflammatory response to infection [26].
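Mutual information (MI), used above to build co-expression networks, can capture non-monotonic dependencies that correlation misses. The base-R sketch below estimates MI between two expression vectors after equal-frequency binning; five bins and the plug-in estimator are illustrative assumptions (tools such as ARACNE [14] use their own estimators), and the data are simulated.

```r
mutual_information <- function(x, y, bins = 5) {
  # Discretize each expression vector into equal-frequency bins (assumed binning scheme)
  bin_x <- cut(x, breaks = quantile(x, probs = seq(0, 1, length.out = bins + 1)),
               include.lowest = TRUE)
  bin_y <- cut(y, breaks = quantile(y, probs = seq(0, 1, length.out = bins + 1)),
               include.lowest = TRUE)
  pxy <- table(bin_x, bin_y) / length(x)     # joint bin probabilities
  px <- rowSums(pxy)
  py <- colSums(pxy)
  nz <- pxy > 0                              # skip empty cells (0 * log 0 = 0)
  sum(pxy[nz] * log(pxy[nz] / outer(px, py)[nz]))   # plug-in MI estimate, in nats
}

set.seed(21)
x <- rnorm(500)
mi_nonlinear   <- mutual_information(x, x^2)          # strong but non-monotonic dependence
mi_independent <- mutual_information(x, rnorm(500))   # no dependence
c(cor_nonlinear = cor(x, x^2), mi_nonlinear = mi_nonlinear, mi_independent = mi_independent)
```

For x and x^2 the correlation is near zero while MI is large, which is why MI-based networks can recover relationships that correlation networks miss.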
When studying the immune response, differences between human subjects are important. Age, gender, race, time since onset of symptoms, and other factors can affect the expression of immune-related genes. For example, in RSV infection in infants, the developing immune system responds differently to infection depending on the age of the patient [27]. Regression-based methods have been developed to account for these covariates when calculating association scores between genes, which can reduce false discoveries [28].
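One simple way to account for covariates in an association score is partial correlation by residualization: regress each gene on the covariates and correlate the residuals. The sketch below uses simulated data in which two otherwise unrelated genes both track age; it illustrates the general idea and is not necessarily the method of [28]. The covariates and effect sizes are hypothetical.

```r
set.seed(9)
n <- 300
age <- runif(n, 0, 24)                          # hypothetical covariate, e.g., age in months
sex <- factor(sample(c("F", "M"), n, replace = TRUE))
# Two genes that both track age but are otherwise unrelated (simulated)
gene_a <- 0.5 * age + rnorm(n)
gene_b <- 0.4 * age + rnorm(n)

# Remove covariate effects from each gene, then correlate what is left
res_a <- resid(lm(gene_a ~ age + sex))
res_b <- resid(lm(gene_b ~ age + sex))

raw_cor      <- cor(gene_a, gene_b)
adjusted_cor <- cor(res_a, res_b)
c(raw = raw_cor, adjusted = adjusted_cor)
```

The raw correlation is high only because both genes depend on age; after adjustment it drops to near zero, so an edge between these genes would be a false discovery.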
To investigate immune response, it is critical to understand the
intercellular and cell-cytokine interactions. In the past, knowledge-
driven networks of cytokine interactions have been combined with
data-driven networks to find novel interactions between cells and
cytokines in the immune response [29]. This study used a database
of cytokine-cytokine relationships to expand the networks derived
from bacterial respiratory infections and asthma. The authors were
able to infer the role of NK cells in combined asthma and bacterial
infection, which was previously uncharacterized.
Recent technological developments such as flow cytometry, CyTOF, and single-cell RNAseq have allowed deeper investigations of cell–cell interactions. Flow cytometry requires predetermined markers for analysis and is limited in how many markers can be assessed at one time. The CyTOF assay identifies immune populations using time-of-flight mass cytometry [30]. Combining these cellular phenotyping assays with gene expression data has facilitated methods to deconvolute cell-type-specific expression networks.
Single-cell RNAseq provides new opportunities to investigate the differences between cells. Methods such as Seurat use a network-based clustering algorithm to identify populations of cells from single-cell RNAseq data [32]. The cells are clustered by expression and
7 Conclusions
8 Notes
References
8. D’Haeseleer P (2005) How does gene expression clustering work? Nat Biotechnol 23:1499–1501
9. Khan A, Katanic D, Thakar J (2017) Meta-analysis of cell-specific transcriptomic data using fuzzy c-means clustering discovers versatile viral responsive genes. BMC Bioinformatics 18:295
10. Pons P, Latapy M (2006) Computing communities in large networks using random walks. J Graph Algorithms Appl 10:191–218
11. Rosvall M, Bergstrom CT (2007) Maps of random walks on complex networks reveal community structure. Proc Natl Acad Sci U S A 105:1118–1123
12. Bourdakou MM, Spyrou GM (2017) Informed walks: whispering hints to gene hunters inside networks’ jungle. BMC Syst Biol 11:1–11
13. Javed MA, Younis MS, Latif S et al (2018) Community detection in networks: a multidisciplinary review. J Netw Comput Appl 108:87–111
14. Margolin AA, Nemenman I, Basso K et al (2006) ARACNE: an algorithm for the reconstruction of gene regulatory networks in a mammalian cellular context. BMC Bioinformatics 7(Suppl 1):S7
15. Butte AJ, Kohane IS (2000) Mutual information relevance networks: functional genomic clustering using pairwise entropy measurements. Pac Symp Biocomput 5:415–426
16. Chen JC, Cerise JE, Jabbari A et al (2015) Master regulators of infiltrate recruitment in autoimmune disease identified through network-based molecular deconvolution. Cell Syst 1:326–337
17. Langfelder P, Horvath S (2008) WGCNA: an R package for weighted correlation network analysis. BMC Bioinformatics 9:559
18. Voigt EA, Grill DE, Zimmermann MT et al (2018) Transcriptomic signatures of cellular and humoral immune responses in older adults after seasonal influenza vaccination identified by data-driven clustering. Sci Rep 8:1–16
19. Faith JJ, Hayete B, Thaden JT et al (2007) Large-scale mapping and validation of Escherichia coli transcriptional regulation from a compendium of expression profiles. PLoS Biol 5:0054–0066
20. Friedman N, Linial M, Nachman I, Pe’er D (2000) Using Bayesian networks to analyze expression data. J Comput Biol 7:601–620
21. Sanchez-Castillo M, Blanco D, Tienda-Luna IM et al (2018) A Bayesian framework for the inference of gene regulatory networks from time and pseudo-time series data. Bioinformatics 34:964–970
22. Huynh-Thu VA, Irrthum A, Wehenkel L, Geurts P (2010) Inferring regulatory networks from expression data using tree-based methods. PLoS One 5:1–10
23. Li J, Zhang Q, Chen Z et al (2019) A network-based pathway-extending approach using DNA methylation and gene expression data to identify altered pathways. Sci Rep 9:11853
24. Qian F, Thakar J, Yuan X et al (2014) Immune markers associated with host susceptibility to infection with west nile virus. Viral Immunol 27:39–47
25. Ioannidis I, McNally B, Willette M et al (2012) Plasticity and virus specificity of the airway epithelial cell immune response during respiratory virus infection. J Virol 86:5422–5436
26. Holt PG, Strickland DH, Wikström ME, Jahnsen FL (2008) Regulation of immunological homeostasis in the respiratory tract. Nat Rev Immunol 8:142–152
27. Walsh EE, Mariani TJ, Chu C et al (2019) Aims, study design, and enrollment results from the assessing predictors of infant respiratory syncytial virus effects and severity study. JMIR Res Protoc 8:e12907
28. Xie J (2018) False discovery rate control for high dimensional networks of quantile associations conditioning on covariates. J R Stat Soc Series B Stat Methodol 80:1015–1034
29. Campbell C, Thakar J, Albert R (2011) Network analysis reveals cross-links of the immune pathways activated by bacteria and allergen. Phys Rev E Stat Nonlin Soft Matter Phys 84:1–12
30. Gadalla R, Noamani B, MacLeod BL et al (2019) Validation of CyTOF against flow cytometry for immunological studies and monitoring of human cancer clinical trials. Front Oncol 9:1–13
31. Krutzik PO, Hale MB, Nolan GP (2005) Characterization of the murine immunological signaling network with phosphospecific flow cytometry. J Immunol 175:2366–2373
32. Butler A, Hoffman P, Smibert P et al (2018) Integrating single-cell transcriptomic data across different conditions, technologies, and species. Nat Biotechnol 36:411–420
Chapter 10
Abstract
Discovery of tumor antigenic epitopes is important for cancer vaccine development. Such epitopes can be
designed and modified to become more antigenic and immunogenic in order to overcome immunosuppression
towards the native tumor antigen. In silico-guided modification of epitope sequences allows potentially
immunogenic candidates to be discriminated predictively, so that only candidates predicted to have high
antigenicity are selected, constructed, and tested in the lab. Here, we describe the use of in silico tools
in a multiparametric approach to assess both potential T-cell epitopes (MHC class I/II binding) and B-cell
epitopes (hydrophilicity, surface accessibility, antigenicity, and linear epitope). A scoring and ranking
system based on these parameters was developed to shortlist potential mimotope candidates for further
development as peptide cancer vaccines.
Key words In silico prediction, T-cell epitope, B-cell epitope, Tumor antigen, IEDB, Vaccine design,
Mimotope
1 Introduction
214 Winfrey Pui Yee Hoo et al.
2 Methods
2.1 In Silico Amino Acid Substitution(s) on Peptide Sequence
1. Go to GenBank on the National Center for Biotechnology Information (NCBI) website at https://www.ncbi.nlm.nih.gov/genbank/
2. Click on the drop-down button beside the “search” bar and select “Protein.” Type the accession number, if known. Otherwise, type the name of the protein and the species of choice. Click “search.”
3. Obtain and download the full-length peptide sequence of interest for sequence manipulation. As the peptide sequence may come from various species, select the sequence based on the target species.
4. Select a region from the full-length peptide sequence for modification (see Note 1).
5. Modify the peptide sequences by introducing single amino acid substitutions flanking the original mutations, thus generating peptide sequences with up to 19 different amino acid possibilities at each substituted position (see Note 2).
6. Use all query sequences generated for subsequent T- and B-cell epitope predictions.
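Step 5 can be scripted rather than done by hand. The Python sketch below enumerates every single amino acid substitution at chosen positions of a peptide, which yields 19 variants per position. The peptide SIINFEKL and the positions are placeholders; which positions to vary (for example, those flanking the original mutations) remains the analyst's choice.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def single_substitutions(peptide, positions):
    """All variants carrying one substitution at one of the given 1-based positions.

    Returns (label, sequence) pairs; the label uses <original><position><new> notation.
    """
    variants = []
    for pos in positions:
        original = peptide[pos - 1]
        for aa in AMINO_ACIDS:
            if aa != original:  # up to 19 alternatives per position
                variants.append((f"{original}{pos}{aa}", peptide[:pos - 1] + aa + peptide[pos:]))
    return variants


# Hypothetical example: substitute positions 1 and 2 of the peptide SIINFEKL.
variants = single_substitutions("SIINFEKL", [1, 2])
print(len(variants))   # 38
print(variants[0])     # ('S1A', 'AIINFEKL')
```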
2.2 T-Cell Epitope Prediction
2.2.1 Input of Query Sequences
1. Go to the IEDB website at http://www.iedb.org/ (Fig. 1) and click on “MHC I Binding” or “MHC II Binding” under “T Cell Epitope Prediction” in the “Epitope Analysis Resource” section. This opens the MHC I and MHC II binding prediction pages, which can also be accessed directly via http://tools.iedb.org/mhci/ (Fig. 2) and http://tools.iedb.org/mhcii/ (Fig. 3), respectively.
2. Under “Specify Sequence(s),” insert the protein sequence in FASTA format or click the “Choose File” button to select a file from your desktop (see Note 3).
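When many query sequences have been generated, they can be entered as a single multi-FASTA file. The helper below is a hypothetical convenience function that formats (name, sequence) pairs as FASTA text; the record names are placeholders, and the resulting text could be saved to a file (for example with pathlib's Path.write_text) and selected through the “Choose File” button.

```python
def to_fasta(records, width=60):
    """Format (name, sequence) pairs as multi-FASTA text, wrapping sequences at `width` residues."""
    lines = []
    for name, seq in records:
        lines.append(f">{name}")
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"


# Hypothetical records: the wild-type peptide and one substituted variant.
fasta_text = to_fasta([("wild_type", "SIINFEKL"), ("S1A", "AIINFEKL")])
print(fasta_text, end="")
```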
Fig. 1 The official website of Immune Epitope Database and Analysis Resource
Fig. 4 Example of the generated web page displaying the results of the MHC class I binding prediction
Fig. 5 Example of the generated web page displaying the results of the MHC class II binding prediction
2.2.2 Interpretation of Output Results
1. Both MHC class I and II binding prediction results display the “Input Sequences” and a table showing the predicted sequences along with their percentile ranks (Figs. 4 and 5) (see Note 7).
2. Click on the “Check to expand the result” checkbox to expand the tabulated predictions, which will show the scores from the individual prediction methods used (Figs. 6 and 7) (see Note 8).
3. Click to download the predicted results in Microsoft Excel format.
Epitope Sequence Modifications and Predictions 219
Fig. 6 Example of the expanded results of the MHC class I binding prediction
Fig. 7 Example of the expanded results of the MHC class II binding prediction
4. Select the peptide sequence with the lowest percentile rank for
each query peptide sequence (see Note 9).
5. For MHC class I binding prediction results, categorize the
peptide sequences into high (0–50 nM), intermediate
(51–500 nM), and low (501–5000 nM) binders based on their
predicted IC50 values.
6. For MHC class II binding prediction results, categorize the
peptide sequences into high (0.01–4.65 nM), intermediate
(4.66–9.28 nM), and low (≥9.29 nM) binders based on the
percentile rank.
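Steps 4 and 5 can be applied to the downloaded predictions with a few lines of Python. The sketch assumes the Excel export has already been read into a list of dictionaries; the keys query, peptide, percentile_rank, and ic50 are hypothetical stand-ins for the actual column headers, and the rows are invented. The class I categories treat the 50/500/5000 nM cut-offs as predicted IC50 values.

```python
def best_per_query(rows):
    """Keep, for each query sequence, the prediction row with the lowest percentile rank."""
    best = {}
    for row in rows:
        current = best.get(row["query"])
        if current is None or row["percentile_rank"] < current["percentile_rank"]:
            best[row["query"]] = row
    return best


def mhc1_binder_class(ic50_nm):
    """Binder category from a predicted IC50 (nM), using the 50/500/5000 nM cut-offs."""
    if ic50_nm <= 50:
        return "high"
    if ic50_nm <= 500:
        return "intermediate"
    if ic50_nm <= 5000:
        return "low"
    return "non-binder"


# Hypothetical prediction rows for two query sequences.
rows = [
    {"query": "wt", "peptide": "SIINFEKL", "percentile_rank": 0.4, "ic50": 35.0},
    {"query": "wt", "peptide": "IINFEKLT", "percentile_rank": 2.1, "ic50": 420.0},
    {"query": "S1A", "peptide": "AIINFEKL", "percentile_rank": 1.2, "ic50": 120.0},
]
for query, row in best_per_query(rows).items():
    print(query, row["peptide"], mhc1_binder_class(row["ic50"]))
```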
2.3 B-Cell Epitope Prediction
2.3.1 Input of Query Sequences
1. Go to the IEDB website at http://www.iedb.org/ and click on “Antigen Sequence Properties” under “B Cell Epitope Prediction” in the “Epitope Analysis Resource” section. This opens the Antibody Epitope Prediction page, which can also be accessed directly via http://tools.immuneepitope.org/bcell/ (Fig. 8).
2. Enter the protein sequence (fewer than 50,000 residues) as plain text into the query box (see Note 10).
3. Choose a prediction method in the “Choose a method” section according to the parameter of choice.
4. Click on the “submit” button to start the prediction.
Fig. 9 Example of a BepiPred linear epitope prediction result of a peptide sequence. Graph and table display
the scores obtained by each residue in the inserted sequence. X- and Y-axis in the graph represent the
sequence position and epitope score, respectively. The yellow region above the threshold was predicted to be
part of an epitope
Parker Hydrophilicity Prediction of Amino Acid Residues
1. Parker hydrophilicity prediction displays a score for each 7-residue window of the query peptide sequence (Fig. 10) (see Note 13).
2. Use the download option in the result table below to download the predicted results in Microsoft Excel format.
3. Select the highest residue score as the raw hydrophilicity score for each query peptide sequence.
4. Categorize the peptide sequences into high (3.444–4.614), intermediate (2.615–3.443), or low (0–2.614) hydrophilicity based on their respective scores.
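The window-averaging idea behind the Parker prediction can be sketched as follows, together with the categorization of step 4. The scale passed in is a toy dictionary, not the published Parker hydrophilicity values [15], and reporting each window at its centre residue is an assumption that may not match IEDB's output convention exactly.

```python
def window_scores(sequence, scale, window=7):
    """Mean scale value of every `window`-residue segment, reported at its centre (1-based)."""
    half = window // 2
    scores = []
    for start in range(len(sequence) - window + 1):
        segment = sequence[start:start + window]
        scores.append((start + half + 1, sum(scale[aa] for aa in segment) / window))
    return scores


def hydrophilicity_class(raw_score):
    """Category from the highest window score, using the cut-offs of step 4."""
    if raw_score >= 3.444:
        return "high"
    if raw_score >= 2.615:
        return "intermediate"
    return "low"


# Toy scale for illustration only; these are NOT the published Parker values.
toy_scale = {"A": 1.0, "K": 5.0, "D": 6.0, "L": -2.0}
scores = window_scores("AKDLAKDLA", toy_scale)
raw = max(score for _, score in scores)
print(raw, hydrophilicity_class(raw))
```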
Kolaskar and Tongaonkar Antigenicity Prediction
1. Kolaskar and Tongaonkar antigenicity prediction displays a score for each 7-residue window of the query peptide sequence (similar to Parker hydrophilicity prediction) (Fig. 11) (see Note 13).
2. Use the download option in the result table below to download the predicted results in Microsoft Excel format.
3. Select the highest residue score as the raw antigenicity score for each query peptide sequence (see Note 14).
4. Categorize the peptide sequences into high (≥1.0) and intermediate (<1.0) antigenicity based on their respective scores.
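The regions above the threshold in the antigenicity plot correspond to runs of consecutive windows scoring at or above the cut-off. The Python sketch below extracts such runs from downloaded (position, score) pairs and applies the categorization of step 4; the scores are hypothetical.

```python
def antigenic_segments(position_scores, threshold=1.0):
    """Runs of consecutive positions scoring at or above `threshold`, as (start, end) pairs.

    Assumes `position_scores` lists (position, score) pairs for consecutive positions.
    """
    segments, run = [], []
    for pos, score in position_scores:
        if score >= threshold:
            run.append(pos)
        elif run:
            segments.append((run[0], run[-1]))
            run = []
    if run:
        segments.append((run[0], run[-1]))
    return segments


def antigenicity_class(raw_score):
    """Category from the highest window score, using the 1.0 cut-off of step 4."""
    return "high" if raw_score >= 1.0 else "intermediate"


# Hypothetical downloaded window scores (position, score).
kt_scores = [(4, 0.98), (5, 1.02), (6, 1.05), (7, 0.99), (8, 1.01)]
print(antigenic_segments(kt_scores))                      # [(5, 6), (8, 8)]
print(antigenicity_class(max(s for _, s in kt_scores)))   # high
```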
Fig. 10 Example of a Parker hydrophilicity prediction result of a peptide sequence. Graph and table display
the scores obtained for each 7-residue window. The X- and Y-axis in the graph represent the
sequence position and hydrophilic propensity score, respectively. The yellow regions above the threshold are
hydrophilic.
Fig. 11 Example of a Kolaskar and Tongaonkar antigenicity prediction result of a peptide sequence. Graph and
table display the scores obtained for each 7-residue window. The X- and Y-axis in the graph
represent the sequence position and antigenicity score, respectively. The yellow regions above the threshold
are antigenic.
Emini Surface Accessibility Prediction
Emini surface accessibility prediction calculates the surface probability $S_n$ using the formula

$$S_n = \left(\prod_{i=1}^{6} \delta_{n+4+i}\right)(0.37)^{-6}$$

where $S_n$ is the surface probability, $\delta$ is the fractional surface probability value of each residue, and $i$ varies from 1 to 6 [19].
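The Emini formula reduces to a product over a sliding hexapeptide window divided by a constant. The sketch below assumes the per-residue fractional surface probabilities δ are already available as a list (the values used here are hypothetical) and indexes each window by its first residue rather than by n. Values above 1.0 indicate a higher-than-average probability of being on the surface.

```python
import math


def emini_surface_probability(delta, window=6, mean_delta=0.37):
    """Emini surface probability for each run of `window` consecutive residues.

    delta: per-residue fractional surface probabilities. Each value is the product of
    `window` consecutive delta values divided by mean_delta ** window, so a segment of
    average residues scores 1.0.
    """
    norm = mean_delta ** window
    return [math.prod(delta[k:k + window]) / norm for k in range(len(delta) - window + 1)]


# Hypothetical per-residue values for an 8-residue peptide.
delta = [0.74, 0.37, 0.37, 0.37, 0.37, 0.37, 0.37, 0.37]
print(emini_surface_probability(delta))  # approximately [2.0, 1.0, 1.0]
```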
1. Emini surface accessibility prediction displays a score for
each 7-residue window of the query peptide
Fig. 12 Example of an Emini surface accessibility prediction result of a peptide sequence. Graph and table
display the scores obtained for each 7-residue window. The X- and Y-axis in the graph
represent the sequence position and surface probability score, respectively. The yellow regions above the
threshold are parts of the peptide that can be accessed by B-cell receptors.
Table 1
Example of a raw score conversion table for each parameter assessed
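The scoring and ranking system summarized in the Abstract can be sketched as converting each parameter's category into points and sorting candidates by their totals. The point values, parameter keys, and candidate data below are hypothetical placeholders, not the conversion values of Table 1.

```python
# Hypothetical points per category; not the conversion values of Table 1.
POINTS = {"high": 3, "intermediate": 2, "low": 1, "non-binder": 0}


def total_score(categories):
    """Sum of points over all assessed parameters of one candidate."""
    return sum(POINTS[category] for category in categories.values())


def rank_candidates(candidates):
    """Candidate names from highest to lowest total score; ties keep their input order."""
    return sorted(candidates, key=lambda name: total_score(candidates[name]), reverse=True)


# Hypothetical categories for three peptide sequences.
candidates = {
    "wild_type": {"mhc_i": "low", "mhc_ii": "intermediate", "hydrophilicity": "low", "antigenicity": "intermediate"},
    "S1A": {"mhc_i": "high", "mhc_ii": "intermediate", "hydrophilicity": "intermediate", "antigenicity": "high"},
    "S1K": {"mhc_i": "intermediate", "mhc_ii": "intermediate", "hydrophilicity": "high", "antigenicity": "intermediate"},
}
print(rank_candidates(candidates))  # ['S1A', 'S1K', 'wild_type']
```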
3 Notes
References
1. Pietersz GA, Pouniotis DS, Apostolopoulos V (2006) Design of peptide-based vaccines for cancer. Curr Med Chem 13(14):1591–1607. https://doi.org/10.2174/092986706777441922
2. Li W, Joshi MD, Singhania S, Ramsey KH, Murthy AK (2014) Peptide vaccine: progress and challenges. Vaccines (Basel) 2(3):515–536. https://doi.org/10.3390/vaccines2030515
3. Sharav T, Wiesmüller KH, Walden P (2007) Mimotope vaccines for cancer immunotherapy. Vaccine 25(16):3032–3037. https://doi.org/10.1016/j.vaccine.2007.01.033
4. Knittelfelder R, Riemer AB, Jensen-Jarolim E (2009) Mimotope vaccination—from allergy to cancer. Expert Opin Biol Ther 9(4):493–506. https://doi.org/10.1517/14712590902870386
5. Buhrman JD, Slansky JE (2013) Mimotope vaccine efficacy gets a "boost" from native tumour antigens. OncoImmunology 2(4):e23492. https://doi.org/10.4161/onci.23492
6. Lipford GB, Bauer S, Wagner H, Heeg K (1995) In vivo CTL induction with point-substituted ovalbumin peptides: immunogenicity correlates with peptide-induced MHC class I stability. Vaccine 13(3):313–320. https://doi.org/10.1016/0264-410x(95)93320-9
7. Pogue RR, Eron J, Frelinger JA, Matsui M (1995) Amino-terminal alteration of the HLA-A∗0201-restricted human immunodeficiency virus pol peptide increases complex stability and in vitro immunogenicity. Proc Natl Acad Sci U S A 92(18):8166–8170. https://doi.org/10.1073/pnas.92.18.8166
8. Fikes J (2004) Chapter 2: the rational design of T cell epitopes with enhanced immunogenicity. In: Morse MA, Clay TM, Lyerly HK (eds) Handbook of cancer vaccines. Humana Press, New Jersey
9. Bei R, Scardino A (2010) TAA polyepitope DNA-based vaccines: a potential tool for cancer therapy. J Biomed Biotechnol:1–12. https://doi.org/10.1155/2010/102758
10. Huarte E, Sarobe P, Lu J, Casares N, Lasarte JJ, Dotor J, Ruiz M, Prieto J, Celis E, Borrás-Cuesta F (2002) Enhancing immunogenicity of a CTL epitope from carcinoembryonic antigen by selective amino acid replacements. Clin Cancer Res 8(7):2336–2344
11. Schreurs MWJ, Kueter EWM, Scholten KBJ, Lemonnier FA, Meijer CJLM, Hooiberg E (2005) A single amino acid substitution improves the in vivo immunogenicity of the HPV16 oncoprotein E7 (11-20) cytotoxic T lymphocyte epitope. Vaccine 23(31):4005–4010. https://doi.org/10.1016/j.vaccine.2005.03.014
12. Hofmann S, Mead A, Malinovskis A, Hardwick NR, Guinn BA (2015) Analogue peptides for the immunotherapy of human acute myeloid leukemia. Cancer Immunol Immunother 64(11):1357–1367. https://doi.org/10.1007/s00262-015-1762-9
13. Kumar SR, Prabakaran M, Ashok Raj KV, He F, Kwang J (2015) Amino acid substitutions improve the immunogenicity of H7N7HA protein and protect mice against lethal H7N7 viral challenge. PLoS One 10(6):e0128940. https://doi.org/10.1371/journal.pone.0128940
14. Brown JH, Jardetzky TS, Gorga JC et al (1993) Three-dimensional structure of the human class II histocompatibility antigen HLA-DR1. Nature 364(6432):33–39. https://doi.org/10.1038/364033a0
15. Parker JM, Guo D, Hodges RS (1986) New hydrophilicity scale derived from high-performance liquid chromatography peptide retention data: correlation of predicted surface residues with antigenicity and X-ray-derived accessible sites. Biochemistry 25(19):5425–5432. https://doi.org/10.1021/bi00367a013
16. Levitt M (1978) Conformational preferences of amino acids in globular proteins. Biochemistry 17(20):4277–4285. https://doi.org/10.1021/bi00613a026
17. Kavitha K, Saritha R, Vinod Chandra S (2013) Computational methods in linear B-cell epitope prediction. Int J Comput Appl 63(12):28–32
18. Kolaskar AS, Tongaonkar PC (1990) A semi-empirical method for prediction of antigenic determinants on protein antigens. FEBS Lett 276(1–2):172–174. https://doi.org/10.1016/0014-5793(90)80535-q
19. Emini EA, Hughes JV, Perlow DS, Boger J (1985) Induction of hepatitis A virus-neutralising antibody by a virus-specific synthetic peptide. J Virol 55(3):836–839
20. Chang ST, Ghosh D, Kirschner DE, Linderman JJ (2006) Peptide length-based prediction
Abstract
Peptide-based vaccines are an appealing strategy that uses short synthetic peptides to engineer a highly
targeted immune response. These short synthetic peptides contain potential T- and B-cell epitopes.
Experimental approaches to identifying these epitopes are time-consuming and expensive; hence,
immunoinformatics approaches have come into use. The immunoinformatics approach combines epitope
prediction tools, molecular docking, and population coverage analysis to design the desired immunogenic
peptides. To overcome the antigenic variation of viruses, conserved regions are targeted to find potential
epitopes. The present chapter demonstrates the use of an immunoinformatics approach to select a potential
peptide containing multiple T- (CD8+ and CD4+) and B-cell epitopes from the Avian H3N2 M1 protein.
Further, molecular docking (to analyse HLA-peptide interaction) and population coverage analysis have been
used to verify the potential of the peptide to be presented by polymorphic HLA molecules. In silico epitope
prediction has proven to be a successful methodology for screening putative epitopes among the numerous
possible vaccine targets in a given protein.
Key words Conservation, Influenza epitope, BLASTp, Peptide-based vaccine, Epitope prediction,
Docking, Population coverage
1 Introduction
230 Neha Lohia and Manoj Baranwal
2 Methodology
Fig. 2 Influenza virus resource webpage displaying various parameters for downloading the protein sequences
2.2 Identification of Conserved Sequences
The extensive mutability of influenza virus significantly enhances its propensity to escape immune recognition, resulting in an inadequate host immune response against the full range of circulating variants. Therefore, a peptide candidate for a universal influenza vaccine should be highly conserved. To identify highly conserved peptides:
1. Open the protein sequence file in Microsoft Word, and replace all the “X” and “J” letters in the protein sequences with “N.” Save the changes.
2. Align the protein sequences using multiple sequence comparison by log-expectation (MUSCLE) or any other multiple sequence alignment tool. Download the alignment in FASTA format.
3. Load the alignment file into the Antigen Variability ANalyzer (AVANA) tool to find conserved peptide stretches.
Note: The Antigen Variability ANalyzer (AVANA) tool,
which is a standalone application, analyzes multiple sequence
alignments based on entropy and calculates variability at a given
Immuninformatics Approach in Epitope Mapping 233
[Figure: plot of values from 0 to 0.75 against position of amino acid (0 to 250)]
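Steps 1 and 3 can also be approximated in Python. The sketch below replaces “X” and “J” only in sequence lines, so FASTA headers that happen to contain these letters (which a global find-and-replace in a word processor would alter) stay intact, and then scores each alignment column by Shannon entropy to find fully conserved stretches. It is a simplified stand-in for AVANA, whose exact variability calculation may differ; gap characters are treated as ordinary symbols, the nine-column default minimum length is an arbitrary choice, and the sequences are hypothetical.

```python
import math
from collections import Counter


def clean_fasta(text, ambiguous="XJ", replacement="N"):
    """Replace ambiguous residue codes in sequence lines only; '>' header lines are untouched."""
    table = str.maketrans({code: replacement for code in ambiguous})
    return "\n".join(
        line if line.startswith(">") else line.translate(table) for line in text.splitlines()
    )


def column_entropy(aligned):
    """Shannon entropy (bits) of each column of equal-length aligned sequences."""
    entropies = []
    for column in zip(*aligned):
        n = len(column)
        entropies.append(-sum((c / n) * math.log2(c / n) for c in Counter(column).values()))
    return entropies


def conserved_stretches(entropies, min_length=9, max_entropy=0.0):
    """1-based (start, end) runs of at least `min_length` columns with entropy <= `max_entropy`."""
    stretches, start = [], None
    for i, h in enumerate(entropies + [math.inf]):  # sentinel closes a trailing run
        if h <= max_entropy:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_length:
                stretches.append((start + 1, i))
            start = None
    return stretches


# Hypothetical inputs.
print(clean_fasta(">seqX1\nMKXJL\n>seqJ2\nMKTAL"))
aligned = ["MKTAYIAKQR", "MKTAYIAKQR", "MKTGYIAKQR"]
print(conserved_stretches(column_entropy(aligned), min_length=3))  # [(1, 3), (5, 10)]
```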
2.3 T-Cell Epitope Prediction by Consensus Approach
Various databases have been established to collect experimental immunological data. Next-generation sequencing and high-throughput HLA binding assays have driven the identification of novel MHC alleles and the characterization of their binding patterns. The computational tools are pattern recognition methods trained on large datasets obtained from in vitro experiments. Pattern recognition is an application of machine learning (ML) in computer science. ML is used to study and construct algorithms that can recognize motifs/patterns in the training data (binding peptides) and make predictions. Various ML techniques are extensively used in immunoinformatics, including support vector machines (SVMs), position-specific scoring matrices (PSSMs), artificial neural networks (ANNs), and hidden Markov models (HMMs). T-cell epitope prediction tools based on different machine learning approaches are listed in Table 1. The consensus
Table 1
List of various T-cell epitope prediction tools
Server name | Predictive server for HLA I (CD8+ T cells) | Predictive server for HLA II (CD4+ T cells) | Predictive method | Link | References
EpiJen | 24 | – | Multistep algorithm | http://www.ddgpharmfac.net/epijen/EpiJen/EpiJen.htm | [20]
IEDB binding | 77 | – | ANN and SMM | http://tools.iedb.org/main/tcell/ | [4]
KISS | 64 | – | SVM | http://cbio.ensmp.fr/kiss/ | [21]
MHC2Pred | – | 42 | SVM | http://www.imtech.res.in/raghava/mhc2pred/ | [22]
MHCPred | 14 | 11 | Additive method | http://www.ddg-pharmfac.net/mhcpred/MHCPred/ | [23]
MMBPred | 46 | – | Quantitative matrix | http://www.imtech.res.in/raghava/mmbpred/ | [24]
NetCTL | 12 supertypes* | – | ANN regression | http://www.cbs.dtu.dk/services/NetCTL | [25]
NetCTL PAN 1.1 | 12 supertypes* | – | ANN-weight matrix | http://www.cbs.dtu.dk/services/NetCTLpan/ | [26]
netMHCpan 4.0 | 172 human | – | ANN | http://www.cbs.dtu.dk/services/NetMHCpan/ | [27]
netMHC 4.0 | 81 human | – | ANN | http://www.cbs.dtu.dk/services/NetMHC/ | [28]
nHLAPred | 67 | – | ANN | http://www.imtech.res.in/raghava/nhlapred/ | [29]
ProPred | – | 51 | Quantitative matrix | http://www.imtech.res.in/raghava/propred/ | [30]
ProPred I | 47 | – | Quantitative matrix | http://www.imtech.res.in/raghava/propred1/ | [31]
RANKPEP | 118 | 62 | PSSM | http://bio.dfci.harvard.edu/RANKPEP/ | [32]
SVRMHC | 36 | 6 | SVM | http://c1.accurascience.com/SVRMHCdb/ | [33]
SYFPEITHI | 33 (human) | 7 | Published motifs | http://www.syfpeithi.de/bin/MHCServer.dll/EpitopePrediction.htm | [34]
*Supertype is defined as the cluster of functionally related HLA alleles that share binding specificities towards the same
panel of peptides owing to similar structural features of their peptide-binding grooves
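Two ingredients of the consensus approach can be sketched in Python: scoring every peptide window of a sequence with a position-specific scoring matrix (PSSM), and combining percentile ranks from several methods into one value. The toy matrix, method names, and rank values are hypothetical; taking the median of per-method percentile ranks follows our understanding of the IEDB consensus method and should be checked against the tool's documentation.

```python
from statistics import median


def pssm_score(peptide, pssm):
    """Sum of position-specific scores; pssm[i] maps an amino acid to its score at position i."""
    if len(peptide) != len(pssm):
        raise ValueError("peptide length must equal the matrix length")
    return sum(pssm[i].get(aa, 0.0) for i, aa in enumerate(peptide))  # unlisted residues score 0


def scan_sequence(sequence, pssm):
    """Score every window of the matrix length and return (peptide, score) pairs, best first."""
    k = len(pssm)
    windows = [sequence[i:i + k] for i in range(len(sequence) - k + 1)]
    return sorted(((w, pssm_score(w, pssm)) for w in windows), key=lambda t: t[1], reverse=True)


def consensus_rank(ranks_by_method):
    """Median of the percentile ranks reported by several prediction methods."""
    return median(ranks_by_method.values())


# Toy 3-position matrix and sequence (hypothetical, for illustration only).
toy_pssm = [{"K": 2.0}, {"L": 1.0, "M": 0.5}, {"R": 3.0}]
hits = scan_sequence("KLRAKMR", toy_pssm)
print(hits[:2])  # [('KLR', 6.0), ('KMR', 5.5)]
print(consensus_rank({"method_1": 0.8, "method_2": 1.5, "method_3": 3.0}))  # 1.5
```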
Table 2
Sequence-based B-cell epitope prediction tools
Server name | Predictive method | URL | Reference
BepiPred | Random forest algorithm | http://www.cbs.dtu.dk/services/BepiPred/ | [35]
CBtope | Support vector machine (SVM) | http://crdd.osdd.net/raghava/cbtope/submit.php | [36]
ABCpred | Standard feed-forward neural network (FNN) and recurrent neural network (RNN) | https://webs.iiitd.edu.in/raghava/abcpred/ABC_submission.html | [37]
IgPred | SVM | https://webs.iiitd.edu.in/raghava/igpred/help.html | [38]
BCPRED | SVM | http://ailab.ist.psu.edu/bcpred/predict.html | [39]
SVMtrip | SVM | http://sysbio.unl.edu/SVMTriP/prediction.php | [40]
Bcepred | Parker, Karplus, Emini, and Kolaskar method | http://crdd.osdd.net/raghava/bcepred/ | [41]
BEST | Support vector machine (SVM) method | Standalone software | [42]
Epitopia | Naive Bayes classifier | http://epitopia.tau.ac.il/index.html | [43]
Pepitope | PepSurf and Mapitope algorithms | http://pepitope.tau.ac.il/ | [44]
iBCE-EL | SVM, RF, ERT, GB, AB, and k-NN | http://www.thegleelab.org/iBCE-EL/ | [45]
COBEpro | SVM and propensity score | http://scratch.proteomics.ics.uci.edu/ | [46]
LBtope | SVM and multiple algorithms in Weka | http://crdd.osdd.net/raghava/lbtope/ | [47]
Table 3
Conserved sequences of Avian H3N2 M1 protein
Table 4
CD8+ and CD4+ T-cell epitopes and B-cell epitopes of human H3N2 virus matrix 1 protein
Table 5
Avian H3N2 matrix 1 peptides containing overlapping CD8+ and CD4+ T-cell epitopes and B-cell
epitope
Table 6
HLA molecules used for docking and binding energy of each HLA-epitope complex
Type of HLA | PDB id | HLA molecules | Resolution | Free energy ΔG (in kcal/mol) after docking with native peptide | Free energy ΔG (in kcal/mol) after docking with M1 epitopes/peptide
HLA class I | 3MRK | HLA-A2 | 1.4 Å | 7.30 | 6.80 (KRMGVQMQR)
HLA class II | 1KLU | HLA-DR1 | 1.93 Å | 5.70 | 6.20 (LENLQTYQKRMGVQM)
Fig. 4 Docking pose obtained after docking the CD8+ T-cell epitope with HLA class I (3MRK)
Fig. 5 Population coverage of the peptide containing CD8+ epitope and CD4+
epitope
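Population coverage analysis of the kind described by Bui et al. [17] rests on allele frequencies and Hardy-Weinberg assumptions. The Python sketch below shows the single-locus calculation, 1 − (1 − Σf)², and a simplified multi-locus combination that assumes independent loci; it is not a reimplementation of the IEDB population coverage tool, and the allele frequencies are hypothetical (in practice they would come from a resource such as the Allele Frequency Net Database [18]).

```python
import math


def locus_coverage(allele_freqs):
    """Fraction of individuals carrying at least one listed allele at one locus.

    Assumes Hardy-Weinberg equilibrium: coverage = 1 - (1 - sum of allele frequencies) ** 2.
    """
    total = sum(allele_freqs)
    if not 0.0 <= total <= 1.0:
        raise ValueError("allele frequencies at one locus must sum to between 0 and 1")
    return 1.0 - (1.0 - total) ** 2


def combined_coverage(freqs_by_locus):
    """Coverage across loci, assuming the loci are independent (no linkage disequilibrium)."""
    uncovered = math.prod(1.0 - locus_coverage(f) for f in freqs_by_locus.values())
    return 1.0 - uncovered


# Hypothetical frequencies of the alleles predicted to bind the peptide.
freqs = {"HLA-A": [0.3], "HLA-DRB1": [0.1]}
print(round(locus_coverage(freqs["HLA-A"]), 4))  # 0.51
print(round(combined_coverage(freqs), 4))         # 0.6031
```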
References
1. Tong S, Zhu X, Li Y et al (2013) New World Bats Harbor diverse influenza A viruses. PLoS Pathog 9(10):e1003657
2. Purcell AW, McCluskey J, Rossjohn J (2007) More than one reason to rethink the use of peptides in vaccine design. Nat Rev Drug Discov 6(5):404–414
3. Slingluff CL (2011) The present and future of peptide vaccines for cancer. Cancer J 17(5):343–350
4. Vita R, Overton JA, Greenbaum JA et al (2008) The immune epitope database (IEDB) 3.0. Nucleic Acids Res 43(D1):D405–D412
5. Backert L, Kohlbacher O (2015) Immunoinformatics and epitope prediction in the age of genomic medicine. Genome Med 7(1):119
6. Bao Y, Bolotov P, Dernovoy D et al (2008) The influenza virus resource at the national center for biotechnology information. J Virol 82(2):596–601
7. Zhang Y, Aevermann BD, Anderson TK et al (2017) Influenza research database: an integrated bioinformatics resource for influenza virus research. Nucleic Acids Res 45(D1):D466–D474
8. Miotto O, Heiny AT, Tan TW et al (2007) Identification of human-to-human transmissibility factors in PB2 proteins of influenza A by large-scale mutual information analysis. BMC Bioinformatics 9(1):1–18
9. Lohia N, Baranwal M (2014) Conserved peptides containing overlapping CD4+ and CD8+ T-cell epitopes in the H1N1 influenza virus: an immunoinformatics approach. Viral Immunol 27(5):225–234
10. Agallou M, Athanasiou E, Koutsoni O et al (2014) Experimental validation of multi-epitope peptides including promising MHC class I- and II-restricted epitopes of four known Leishmania infantum proteins. Front Immunol 5:1–16
11. Vijayan R, Subbarao N, Manoharan N (2015) In silico analysis of conformational changes induced by normal and mutation of macrophage infectivity potentiator catalytic residues and its interactions with Rapamycin. Interdiscip Sci 7(3):326–333
12. Patronov A, Dimitrov I, Flower DR et al (2011) Peptide binding prediction for the human class II MHC allele HLA-DP2: a molecular docking approach. BMC Struct Biol 11:32
13. Thévenet P, Shen Y, Maupetit J et al (2012) PEP-FOLD: an updated de novo structure prediction server for both linear and disulfide bonded cyclic peptides. Nucleic Acids Res 40(W1):288–293
14. Trott O, Olson AJ (2010) AutoDock Vina: improving the speed and accuracy of docking with a new scoring function, efficient optimization, and multithreading. J Comput Chem 31(2):455
15. Lohia N, Baranwal M (2018) Highly conserved hemagglutinin peptides of H1N1 influenza virus elicit immune response. 3 Biotech 8(12):492
16. Jain S, Baranwal M (2019) Computational analysis in designing T cell epitopes enriched peptides of Ebola glycoprotein exhibiting strong binding interaction with HLA molecules. J Theor Biol 465:34–44
17. Bui HH, Sidney J, Dinh K et al (2006) Predicting population coverage of T-cell epitope-based diagnostics and vaccines. BMC Bioinformatics 7:1–5
18. González-Galarza FF, Takeshita LYC, Santos EJM et al (2015) Allele frequency net 2015 update: new features for HLA epitopes, KIR and disease and HLA adverse drug reaction associations. Nucleic Acids Res 43(D1):D784–D788
19. Lohia N, Baranwal M (2015) Identification of conserved peptides comprising multiple T cell epitopes of matrix 1 protein in H1N1 influenza virus. Viral Immunol 28(10):570–579
20. Doytchinova IA, Guan P, Flower DR (2006) EpiJen: a server for multistep T cell epitope prediction. BMC Bioinformatics 7:1–11
21. Jacob L, Vert JP (2008) Efficient peptide-MHC-I binding prediction for alleles with few known binders. Bioinformatics 24(3):358–366
22. MHC2PRED: http://crdd.osdd.net/raghava/mhc2pred/info.html
23. Guan P, Doytchinova IA, Zygouri C (2003) MHCPred: a server for quantitative prediction of peptide-MHC binding. Nucleic Acids Res 31(13):3621–3624
24. Bhasin M, Raghava GPS (2003) Prediction of promiscuous and high-affinity mutated MHC binders. Hybrid Hybridomics 22(4):229–234
25. Larsen MV, Lundegaard C, Lamberth K et al (2007) Large-scale validation of methods for cytotoxic T-lymphocyte epitope prediction. BMC Bioinformatics 8:1–12
26. Stranzl T, Larsen MV, Lundegaard C et al (2010) NetCTLpan: pan-specific MHC class I pathway epitope predictions. Immunogenetics 62(6):357–368
27. Jurtz V, Paul S, Andreatta M et al (2017) NetMHCpan-4.0: improved peptide–MHC class I interaction predictions integrating eluted ligand and peptide binding affinity data. J Immunol 199(9):3360–3368
28. Andreatta M, Nielsen M (2015) Gapped sequence alignment using artificial neural networks: application to the MHC class I system. Bioinformatics 32(4):511–517
29. Bhasin M, Raghava GPS (2007) A hybrid approach for predicting promiscuous MHC class I restricted T cell epitopes. J Biosci 32(1):31–42
30. Singh H, Raghava GPS (2002) ProPred: prediction of HLA-DR binding sites. Bioinformatics 17(12):1236–1237
31. Singh H, Raghava GPS (2003) ProPred1: prediction of promiscuous MHC class-I binding sites. Bioinformatics 19(8):1009–1014
32. Reche PA, Reinherz EL (2007) Prediction of peptide-MHC binding using profiles. Methods Mol Biol 409:185–200
33. Liu W, Wan J, Meng X et al (2007) In silico prediction of peptide-MHC binding affinity using SVRMHC. Methods Mol Biol (Clifton, NJ) 409:283–291
34. Rammensee HG, Bachmann J, Emmerich NPN et al (1999) SYFPEITHI: database for MHC ligands and peptide motifs. Immunogenetics 50(3–4):213–219
35. Jespersen MC, Peters B, Nielsen M et al (2017) BepiPred-2.0: improving sequence-based B-cell epitope prediction using conformational epitopes. Nucleic Acids Res 45(W1):W24–W29
36. Ansari H, Raghava GP (2010) Identification of conformational B-cell epitopes in an antigen from its primary sequence. Immunome Res 6(1):6
37. Saha S, Raghava GP (2006) Prediction of continuous B-cell epitopes in an antigen using recurrent neural network. Proteins 65(1):40–48
38. Gupta S, Ansari HR, Gautam A et al (2013) Identification of B-cell epitopes in an antigen for inducing specific class of antibodies. Biol Direct 8(1):27
39. Chen J, Liu H, Yang J et al (2007) Prediction of linear B-cell epitopes using amino acid pair antigenicity scale. Amino Acids 33(3):423–428
40. Yao B, Zhang L, Liang S et al (2012) SVMTriP: a method to predict antigenic epitopes using support vector machine to integrate tri-peptide similarity and propensity. PLoS One 7(9):e45152
41. Saha S, Raghava GPS (2004) BcePred: prediction of continuous B-cell epitopes in antigenic sequences using physico-chemical properties. In: Nicosia G, Cutello V, Bentley PJ, Timmis J (eds) Artificial immune systems. ICARIS 2004. Lecture notes in computer science, vol 3239. Springer, Berlin, Heidelberg
42. Gao J, Faraggi E, Zhou Y et al (2012) BEST: improved prediction of B-cell epitopes from antigen sequences. PLoS One 7(6):e40104
43. Rubinstein ND, Mayrose I, Martz E et al (2009) Epitopia: a web-server for predicting B-cell epitopes. BMC Bioinformatics 10(1):287
44. Mayrose I, Penn O, Erez E et al (2007) Pepitope: epitope mapping from affinity-selected peptides. Bioinformatics 23(23):3244–3246
45. Manavalan B, Govindaraj RG, Shin TH et al (2018) iBCE-EL: a new ensemble learning framework for improved linear B-cell epitope prediction. Front Immunol 9:1695
46. Sweredoski MJ, Baldi P (2009) COBEpro: a novel system for predicting continuous B-cell epitopes. Protein Eng Des Sel 22(3):113–120
47. Singh H, Ansari HR, Raghava GP (2013) Improved method for linear B-cell epitope prediction using antigen’s primary sequence. PLoS One 8(5):e62216
Chapter 12
Abstract
A proof of concept for a new methodology to detect and potentially quantify mAb aggregation is presented. Assay development involved using an aggregated mAb as bait to screen a phage display peptide library and identifying random-sequence peptides that recognize mAb aggregates. The selected peptides can be used to develop homogeneous quantitative methods for assessing mAb aggregation. Results indicate that a peptide-binding method coupled with fluorescence polarization detection can detect mAb aggregation and potentially monitor the propensity of therapeutic protein candidates to aggregate.
Key words Monoclonal antibodies, Aggregation, Peptide phage display, Next-generation sequencing, FITC-peptide
1 Introduction
245
246 Illarion V. Turko
2 Materials
2.2 Phage Display
1. Ph.D.-12 phage display peptide library kit from New England BioLabs (Ipswich, MA, USA), catalog # E8110S. The library consists of M13 filamentous bacteriophage, on which five copies of a 12-amino-acid linear random peptide sequence are expressed as N-terminal fusions to the minor coat protein pIII of the phage. A short glycine-glycine-glycine-serine (GGGS) linker is present between each displayed peptide and the pIII protein. The M13 phage is propagated in E. coli host strain
A New Approach to Assess mAb Aggregation 247
2.3 Phage DNA Purification
1. TE buffer: 10 mmol/L Tris–HCl (pH 8.0) with 1 mmol/L EDTA.
2. Iodide buffer: TE buffer with 4 mol/L sodium iodide. Store in a dark bottle covered with foil at room temperature.
3. DNA Clean and Concentrator-5 kit (Zymo Research, catalog # D4003).
3 Methods
Fig. 1 SEC (size-exclusion chromatography) profiles of temperature-treated NISTmAb. (a) First run. (b) Second run. (c) Third run
3.2 Phage Display
Protocols covering applications of Ph.D. Phage Display Libraries can be found at https://www.neb.com/-/media/catalog/Datacards%20or%20Manuals/manualE8102.pdf. Here, we describe only the protocol used in the current study:
1. Phage panning: Several wells of a 96-well plate (Nunclon, catalog # 167008) were coated overnight at 4 °C with 150 μL/well of 500 μg/mL untreated NISTmAb or temperature-treated NISTmAb in PBS. All subsequent steps were carried out at room temperature with gentle orbital shaking. First, wells were washed with 200 μL of PBS for 10 min, and the PBS was replaced with 200 μL of blocking buffer for 2 h. Then, each well was loaded with 150 μL of the Ph.D.-12 phage display peptide library diluted 15-fold in blocking buffer. After 2 h of incubation, each well was washed five times with 200 μL of washing buffer for 10 min per wash. Finally, attached phage was eluted with 150 μL of elution buffer for exactly 10 min and transferred to clean autoclaved conical tubes containing 22 μL of neutralizing buffer.
Table 1
Appearance of peptides in NGS data. The 127,396 hits are ranked by their number of appearances. The most abundant peptide (#1) was found 156 times
Peptide | Appearance
#1 | 156
#2 | 65
#3 | 41
#4 | 26
#55 | 10
#1985 | 1
#127396 | 1
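Table 1 amounts to counting identical peptide sequences among the sequencing hits and ranking them by count. A minimal Python sketch, assuming the NGS reads have already been translated into 12-residue peptide strings (read processing is not described here); the toy input reuses peptide sequences from Fig. 2, but its repeat counts are invented for illustration only:

```python
from collections import Counter

def rank_peptides(peptides):
    """Count identical peptide sequences and rank them by decreasing appearance.

    Returns (rank, peptide, count) tuples. Counter.most_common() keeps
    first-seen order among equal counts, so tied peptides get distinct ranks.
    """
    counts = Counter(peptides)
    return [(rank, pep, n) for rank, (pep, n) in enumerate(counts.most_common(), start=1)]

# Toy input: sequences borrowed from Fig. 2, repeat counts invented (not real NGS data).
reads = ["DMLDCRRVGCAP", "GPFLDVKRNMTV", "DMLDCRRVGCAP",
         "HEYQLSIERRLP", "DMLDCRRVGCAP", "GPFLDVKRNMTV"]
for rank, pep, n in rank_peptides(reads):
    print(f"#{rank}\t{pep}\t{n}")
```

With the toy list, DMLDCRRVGCAP comes out as #1 (3 appearances), followed by GPFLDVKRNMTV (2) and HEYQLSIERRLP (1); peptides seen once still receive distinct ranks, as in the last rows of Table 1.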
Motif LDLKR; example sites (left flank | site | right flank):
DM | LDCRR | VGCAP
GPF | LDVKR | NMTV
HEYQ | LSIER | RLP
E | LWSKR | ATYPPL
A | LDLPR | PQIGNR
VTRFHP | LNESR | Y
SYND | LASFR | LTT
NS | VDFTR | YTHSG
SQGKDH | LRLLR | P
Fig. 2 An example of a common motif search using the MEME Suite software. The top 100 NGS sequences are sorted by position p-value
[Plot: x-axis, NISTmAb (µmol/L), 0–250]
Fig. 3 Representative fluorescence polarization saturation binding assay using two FITC-peptides (ranked #1 and #7 by appearance in the NGS data) at 0.5 μmol/L and the indicated concentrations of control and temperature-treated NISTmAb
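The readout in Fig. 3 is steady-state fluorescence polarization, computed from emission intensities measured parallel and perpendicular to the polarized excitation light. The sketch below shows that calculation together with a simple hyperbolic single-site model of saturation binding. The G-factor default of 1.0, the binding model, and the assumption that the mAb is in large excess over the 0.5 μmol/L FITC-peptide are illustrative assumptions, not the chapter's stated fitting procedure:

```python
def polarization(i_parallel, i_perpendicular, g_factor=1.0):
    """Steady-state fluorescence polarization from the two emission channels.

    P = (I_par - G * I_perp) / (I_par + G * I_perp); G corrects for
    instrument bias between channels (1.0 assumes no correction is needed).
    """
    num = i_parallel - g_factor * i_perpendicular
    den = i_parallel + g_factor * i_perpendicular
    return num / den

def one_site_binding(mab_conc, p_free, p_bound, kd):
    """Hyperbolic single-site saturation model for the FITC-peptide signal.

    Assumes the mAb is in large excess over the tracer peptide, so the free
    mAb concentration is approximately the total concentration added.
    """
    fraction_bound = mab_conc / (kd + mab_conc)
    return p_free + (p_bound - p_free) * fraction_bound

print(polarization(1200.0, 800.0))  # 0.2, often reported as 200 mP
```

Fitting `one_site_binding` to the control and temperature-treated series would give one way to compare how strongly each peptide prefers aggregated mAb.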
4 Notes
Acknowledgments
References
1. Lowe D, Dudgeon K, Rouet R, Schofield P, Hydrogen exchange mass spectrometry reveals
Jermutus L, Christ D (2011) Aggregation, sta- protein interfaces and distant dynamic cou-
bility, and formulation of human antibody pling effects during the reversible self-
therapeutics. Adv Protein Chem Struct Biol association of an IgG1 monoclonal antibody.
84:41–61 MAbs 7:525–539
2. Singh SK (2011) Impact of product-related 10. Tessier PM, Wu J, Dickinson CD (2014)
factors on immunogenicity of therapeutics. J Emerging methods for identifying monoclonal
Pharm Sci 100:354–387 antibodies with low propensity to self-associate
3. Cromwell MEM, Hilario E, Jacobson F (2006) during the early discovery process. Expert
Protein aggregation and bioprocessing. AAPS J Opin Drug Deliv 11:461–465
8:E572–E579 11. Yadav S, Shire SJ, Kalonia DS (2010) Factors
4. Vazquez-Rey M, Lang DA (2011) Aggregates affecting the viscosity in high concentration
in monoclonal antibody manufacturing pro- solutions of different monoclonal antibodies.
cesses. Biotechnol Bioeng 108:1494–1508 J Pharm Sci 99:4812–4829
5. Nishi H, Miyajima M, Nakagami H, Noda M, 12. Jezek J, Rides M, Derham B, Moore J,
Uchiyama S, Fukui K (2010) Phase separation Cerasoli E, Simler R, Perez-Ramirez B (2011)
of an IgG1 antibody solution under a low ionic Viscosity of concentrated therapeutic protein
strength condition. Pharm Res 27:1348–1360 compositions. Adv Drug Deliv Rev
6. Manning MC, Chou DK, Murphy BM, Payne 63:1107–1117
RW, Katayama DS (2010) Stability of protein 13. Binabaji E, Ma J, Zydney AL (2015) Intermo-
pharmaceuticals: an update. Pharm Res lecular interactions and the viscosity of highly
27:544–575 concentrated monoclonal antibody solutions.
7. Geng SB, Cheung JK, Narasimhan C, Pharm Res 32:3102–3109
Shameem M, Tessier PM (2014) Improving 14. Schiel JE, Turner A (2018) The NISTmAb
monoclonal antibody selection and engineer- reference material 8671 lifecycle management
ing using measurements of colloidal protein and quality plan. Anal Bioanal Chem
interactions. J Pharm Sci 103:3356–3363 410:2067–2078
8. Razinkov VI, Treuheit MJ, Becker GW (2015) 15. Moerke NJ (2009) Fluorescence polarization
Accelerated formulation development of (FP) assays for monitoring peptide-protein and
monoclonal antibodies (mAbs) and nucleic acid-protein binding. Curr Protoc
mAb-based modalities: review of methods and Chem Biol 1:1–15
tools. J Biomol Screen 20:468–483 16. Rossi AM, Taylor CW (2011) Analysis of
9. Arora J, Hickey JM, Majumdar R, protein-ligand interactions by fluorescence
Esfandiary R, Bishop SM, Samra HS, Mid- polarization. Nat Protoc 6:365–387
daugh CR, Weis DD, Volkin DB (2015)
Chapter 13
Abstract
Many pathogenic organisms have an inherent ability to evolve rapidly into new variants, which enables them to escape previously existing immune responses. Vaccine design strategies should aim to counteract such variability by targeting the conserved antigen regions of the pathogen. Sequence variability analysis of multiple sequence alignments of the relevant antigens allows the identification of conserved regions. In this chapter, we describe a detailed protocol and provide software to build variability-free proteomes for epitope-vaccine design. The procedure, which is illustrated for human herpesvirus 1 (HHV1), involves the identification of protein clusters, followed by multiple sequence alignments and Shannon variability calculations. The software required to build variability-free proteomes is available at http://imed.med.ucm.es/software/mmb2019.
Key words Epitope, Consensus sequence, Clustering, Multiple sequence alignments, Variability,
Shannon entropy
1 Introduction
255
256 Jose L. Sanchez-Trincado and Pedro A. Reche
2 Materials
2.1 Sequences
We obtained HHV1 complete genome sequence files from the NCBI Nucleotide database (https://www.ncbi.nlm.nih.gov/nuccore). A total of 270 HHV1 genomes, each encompassing about 150,000 nucleotides, were downloaded into a single GenBank file (sequences.gb) (Fig. 1). We selected the genome with GenBank accession X14112.1 as the reference and saved it in a separate file (ref.gb).
2.2 Software
Software and scripts used in this report are summarized in Table 1. Sequence clusters were generated using CD-HIT [18], and MSAs using MUSCLE [19] or ClustalΩ [21]. Perl scripts are used for data
Generation of Reference Consensus Proteomes 257
Table 1
Computer applications and scripts for generating variability-free reference proteomes
3 Methods
3.1 Collection of CDS from Genome Files
To build HHV1 consensus proteome sequences, we first have to extract the coding sequences (CDS) from both the genome selected as reference (ref.gb) and the file including all genomes (sequences.gb). Alternatively, one could directly retrieve protein sequences from NCBI (see Note 1). The Perl scripts cds_gb4cdhit.pl and cds_gbref4cdhit.pl are needed to complete this task. Both scripts select the CDS in GenBank records and generate a single protein FASTA file from them. The scripts differ in that cds_gbref4cdhit.pl introduces a “ref” tag in the sequence headers and must be used with the reference genome. These two scripts are executed as follows:
3.2 Generation of Protein Clusters
Clustering of amino acid sequences, in this case from HHV1, is achieved with CD-HIT. This program groups a collection of sequences into clusters according to a sequence identity threshold and generates nonredundant sequence files. The default threshold is 0.9, meaning that sequences sharing at least 90% identity with a cluster's representative sequence are placed in the same cluster. Here, however, we run CD-HIT with an identity threshold of 0.8 (see Note 2).
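CD-HIT's greedy incremental strategy can be sketched as follows: sequences are sorted by decreasing length, the longest becomes the first cluster representative, and each remaining sequence joins the first representative it matches at or above the identity threshold (the `-c` option of `cd-hit`, default 0.9) or otherwise founds a new cluster. In this Python sketch, identity is approximated with `difflib` matching blocks divided by the length of the shorter sequence, a stand-in for CD-HIT's word filter and alignment; it is illustrative only, and the sequence identifiers are hypothetical:

```python
from difflib import SequenceMatcher

def identity(a, b):
    """Approximate identity: matched residues / length of the shorter sequence.

    difflib matching blocks stand in for a real alignment (an assumption).
    """
    blocks = SequenceMatcher(None, a, b, autojunk=False).get_matching_blocks()
    return sum(block.size for block in blocks) / min(len(a), len(b))

def greedy_cluster(seqs, threshold=0.8):
    """CD-HIT-style greedy incremental clustering of {id: sequence}.

    Longest sequences are processed first; each joins the first representative
    it matches at >= threshold, otherwise it becomes a new representative.
    """
    order = sorted(seqs, key=lambda k: len(seqs[k]), reverse=True)  # stable for ties
    clusters = []  # list of (representative_id, [member ids])
    for sid in order:
        for rep, members in clusters:
            if identity(seqs[rep], seqs[sid]) >= threshold:
                members.append(sid)
                break
        else:
            clusters.append((sid, [sid]))
    return clusters

# Hypothetical toy sequences; "ref|" mimics a reference tag.
seqs = {"ref|gD": "MKTAYIAKQR", "s1|gD": "MKTAYIAKQL", "s1|gG": "GGSGGSGGSG"}
print(greedy_cluster(seqs, threshold=0.8))
```

At 0.8 the two gD variants (90% identical) share a cluster, while raising the threshold above 0.9 splits them, which is why lowering CD-HIT's threshold yields larger, more inclusive clusters.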
3.3 Generation of Multiple Sequence Alignments from Sequence Clusters
The next step is to generate multiple sequence alignments (MSAs) from the selected clusters, namely those containing reference CDS (see Note 3). This is achieved using the Perl script parse_cdhit_clstr.pl. The script has several options and arguments; usage help can be obtained by calling the program without arguments. In this chapter, it is used as follows:
$ parse_cdhit_clstr.pl -i sequences_and_ref_08.clstr \
    -p sequences_and_ref.fa -m 1 -s 1 -u 1
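How parse_cdhit_clstr.pl works internally is not described here. As a hedged illustration of the cluster-selection step only, the sketch below reads CD-HIT `.clstr` text (a `>Cluster N` line followed by member lines giving index, length, `>id...`, and `*` for the representative or `at NN%` for other members) and keeps clusters containing an identifier with the reference tag. The position of the “ref” tag in the headers and all identifiers are assumptions; a plain substring test could also match unrelated ids that happen to contain "ref":

```python
import re

MEMBER_ID = re.compile(r">(.+?)\.\.\.")  # id sits between '>' and '...'

def parse_clstr(lines):
    """Parse CD-HIT .clstr lines into {cluster_number: [member ids]}."""
    clusters = {}
    current = None
    for line in lines:
        line = line.strip()
        if line.startswith(">Cluster"):
            current = int(line.split()[1])
            clusters[current] = []
        elif line and current is not None:
            match = MEMBER_ID.search(line)
            if match:
                clusters[current].append(match.group(1))
    return clusters

def clusters_with_reference(clusters, tag="ref"):
    """Keep only clusters with at least one member id containing the tag."""
    return {n: ids for n, ids in clusters.items() if any(tag in i for i in ids)}

# Hypothetical .clstr content.
example = [
    ">Cluster 0",
    "0\t904aa, >ref|UL27... *",
    "1\t904aa, >g042|UL27... at 98.56%",
    ">Cluster 1",
    "0\t120aa, >g042|US12... *",
]
print(clusters_with_reference(parse_clstr(example)))
```

Each selected cluster would then be written out as a FASTA file and aligned with MUSCLE or Clustal Ω.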
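3.4 Generation of Reference Consensus Proteomes with Variable Sites Masked
Obtaining consensus sequences with variable sites masked requires computing the sequence variability at each site (residue position) of the MSAs. To that end, we use Shannon entropy (Eq. 1):
$$H = -\sum_{i=1}^{M} P_i \log_2 (P_i) \qquad (1)$$
where P_i is the fraction of residues of amino acid type i at the site and M is the number of amino acid types (20).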
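Equation 1 and the masking rule of Fig. 2 (sites with H > 0.5 shown as dots) can be sketched in Python as below. Assumptions not stated in the chapter: the MSA is a list of equal-length aligned strings, gap characters (`-`) are excluded from the frequencies, and the consensus residue is the most frequent one at each site:

```python
import math
from collections import Counter

def site_entropy(column, gap="-"):
    """Shannon entropy H = -sum_i P_i log2 P_i of one MSA column (gaps ignored)."""
    residues = [r for r in column if r != gap]
    if not residues:
        return 0.0
    n = len(residues)
    h = -sum((c / n) * math.log2(c / n) for c in Counter(residues).values())
    return abs(h)  # avoids reporting -0.0 for fully conserved sites

def masked_consensus(msa, max_h=0.5, mask=".", gap="-"):
    """Most frequent residue per site; sites with H > max_h are masked."""
    consensus = []
    for column in zip(*msa):
        residues = [r for r in column if r != gap]
        if not residues:
            consensus.append(gap)
        elif site_entropy(column, gap) > max_h:
            consensus.append(mask)
        else:
            consensus.append(Counter(residues).most_common(1)[0][0])
    return "".join(consensus)

msa = ["MKTAE", "MKTAF", "MKTAE", "MKSAE"]  # toy alignment
print(masked_consensus(msa))  # MK.A.
```

A site with one substitution in four sequences already has H ≈ 0.81, above the 0.5 cutoff, so the threshold is fairly strict for small alignments.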
3.5 Identification of HHV1 Conserved CD8 T-Cell Epitopes
Consensus sequences with variable residues masked can be used as input for tools such as RANKPEP (http://imed.med.ucm.es/Tools/rankpep.html) [23] to predict non-variable T-cell epitopes. These consensus sequences are also instrumental in identifying conserved epitopes among experimentally validated ones [4–7]. Here, we illustrate this application by identifying conserved HHV1-specific CD8 T-cell epitopes from experimentally validated ones. A search in the IEDB resource (https://www.iedb.org/) identifies 124 HHV1 CD8 T-cell epitopes that are 9 residues long and meet the following criteria: linear peptides, positive results in T-cell assays, recognition by human subjects, and restriction by HLA class I molecules. Conserved epitopes can be identified as
Fig. 2 Fragment of the HHV1 consensus proteome. Dots replace sites/residues with H > 0.5
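The chapter's exact criterion for calling an epitope conserved is cut off above. One plausible rule, sketched here purely as an assumption, is that the epitope occurs verbatim in the masked consensus proteome; because masked sites are dots, such a match can only span low-variability sites. The protein name, consensus, and peptides are toy values:

```python
def conserved_epitopes(epitopes, consensus_proteome):
    """Keep epitopes found verbatim in any masked consensus protein.

    Masked sites are '.', which never matches a residue letter, so a hit
    implies every epitope position falls on a low-variability site.
    """
    return [ep for ep in epitopes
            if any(ep in protein for protein in consensus_proteome.values())]

consensus = {"protX": "MKTAYIAKQ.RSLLGVAFT"}  # toy masked consensus
print(conserved_epitopes(["MKTAYIAKQ", "KQWRSLLGV", "RSLLGVAFT"], consensus))
```

The middle peptide is rejected because it overlaps the masked (variable) site.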
4 Notes
Acknowledgments
References
1. Cuevas JM, Geller R, Garijo R, Lopez-Aldeguer J, Sanjuan R (2015) Extremely high mutation rate of HIV-1 in vivo. PLoS Biol 13(9):e1002251. https://doi.org/10.1371/journal.pbio.1002251
2. Qiu X, Duvvuri VR, Bahl J (2019) Computational approaches and challenges to developing universal influenza vaccines. Vaccines (Basel) 7(2). https://doi.org/10.3390/vaccines7020045
3. Vogel M, Bachmann MF (2019) Immunogenicity and immunodominance in antibody responses. Curr Top Microbiol Immunol. https://doi.org/10.1007/82_2019_160
4. Gomez-Perosanz M, Russo G, Sanchez-Trincado J, Pennisi M, Reche P, Shepherd A, Pappalardo F (2019) Computational immunogenetics. In: Encyclopedia of bioinformatics and computational biology, vol 2. Elsevier, pp 906–930
5. Sanchez-Trincado JL, Gomez-Perosanz M, Reche PA (2017) Fundamentals and methods for T- and B-cell epitope prediction. J Immunol Res 2017:2680160. https://doi.org/10.1155/2017/2680160
6. Sette A, Rappuoli R (2010) Reverse vaccinology: developing vaccines in the era of genomics. Immunity 33(4):530–541. https://doi.org/10.1016/j.immuni.2010.1009.1017
7. Vivona S, Gardy JL, Ramachandran S, Brinkman FS, Raghava GP, Flower DR, Filippini F (2008) Computer-aided biotechnology: from immuno-informatics to reverse vaccinology. Trends Biotechnol 26(4):190–200. https://doi.org/10.1016/j.tibtech.2007.1012.1006. Epub 2008 Feb 1021
8. Garcia-Boronat M, Diez-Rivero CM, Reinherz EL, Reche PA (2008) PVS: a web server for protein sequence variability analysis tuned to facilitate conserved epitope discovery. Nucleic Acids Res 36(Web Server issue):W35–W41. https://doi.org/10.1093/nar/gkn211
9. Alonso-Padilla J, Lafuente EM, Reche PA (2017) Computer-aided design of an epitope-based vaccine against Epstein-Barr virus. J Immunol Res 2017:9363750. https://doi.org/10.1155/2017/9363750
10. Damfo SA, Reche P, Gatherer D, Flower DR (2017) In silico design of knowledge-based Plasmodium falciparum epitope ensemble vaccines. J Mol Graph Model 78:195–205. https://doi.org/10.1016/j.jmgm.2017.10.004
11. Molero-Abraham M, Lafuente EM, Flower DR, Reche PA (2013) Selection of conserved epitopes from hepatitis C virus for pan-populational stimulation of T-cell responses. Clin Dev Immunol 2013:601943. https://doi.org/10.1155/2013/601943
12. Murphy D, Reche P, Flower DR (2019) Selection-based design of in silico dengue epitope ensemble vaccines. Chem Biol Drug Des 93(1):21–28. https://doi.org/10.1111/cbdd.13357
13. Reche PA, Keskin DB, Hussey RE, Ancuta P, Gabuzda D, Reinherz EL (2006) Elicitation from virus-naive individuals of cytotoxic T lymphocytes directed against conserved HIV-1 epitopes. Med Immunol 5(1). https://doi.org/10.1186/1476-9433-5-1
14. Shah P, Mistry J, Reche PA, Gatherer D, Flower DR (2018) In silico design of Mycobacterium tuberculosis epitope ensemble vaccines. Mol Immunol 97:56–62. https://doi.org/10.1016/j.molimm.2018.03.007
15. Sheikh QM, Gatherer D, Reche PA, Flower DR (2016) Towards the knowledge-based design of universal influenza epitope ensemble vaccines. Bioinformatics 32(21):3233–3239. https://doi.org/10.1093/bioinformatics/btw399
16. Zhang Q, Wang P, Kim Y, Haste-Andersen P, Beaver J, Bourne PE, Bui HH, Buus S, Frankild S, Greenbaum J, Lund O, Lundegaard C, Nielsen M, Ponomarenko J, Sette A, Zhu Z, Peters B (2008) Immune epitope database analysis resource (IEDB-AR). Nucleic Acids Res 36(Web Server issue):W513–W518. https://doi.org/10.1093/nar/gkn254
17. Reche PA, Zhang H, Glutting JP, Reinherz EL (2005) EPIMHC: a curated database of MHC-binding peptides for customized computational vaccinology. Bioinformatics 21(9):2140–2141. https://doi.org/10.1093/bioinformatics/bti269
18. Li W, Godzik A (2006) Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences. Bioinformatics 22(13):1658–1659. https://doi.org/10.1093/bioinformatics/btl158
19. Edgar RC (2004) MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res 32(5):1792–1797. https://doi.org/10.1093/nar/gkh340
20. Shannon CE (1948) A mathematical theory of communication. Bell Syst Tech J 27(3):379–423. https://doi.org/10.1002/j.1538-7305.1948.tb01338.x
21. Sievers F, Wilm A, Dineen D, Gibson TJ, Karplus K, Li W, Lopez R, McWilliam H, Remmert M, Söding J, Thompson JD, Higgins DG (2011) Fast, scalable generation of high-quality protein multiple sequence alignments using Clustal Omega. Mol Syst Biol 7(1):539. https://doi.org/10.1038/msb.2011.75
22. Reche PA, Reinherz EL (2003) Sequence variability analysis of human class I and class II MHC molecules: functional and structural correlates of amino acid polymorphisms. J Mol Biol 331(3):623–641. https://doi.org/10.1016/s0022-2836(03)00750-2
23. Reche PA, Glutting JP, Zhang H, Reinherz EL (2004) Enhancement to the RANKPEP resource for the prediction of peptide binding to MHC molecules using profiles. Immunogenetics 56(6):405–419. https://doi.org/10.1007/s00251-004-0709-7
Chapter 14
Abstract
Immunoinformatics plays a pivotal role in vaccine design and development. Traditional methods depend exclusively on immunological experiments, which are less effective, relatively expensive, and time-consuming. Recent advances in immunoinformatics, however, have provided innovative tools for the rational design of vaccine candidates. This approach allows the selection of immunodominant regions from the whole-genome sequence of a pathogen. The identified immunodominant regions can be used to develop potential vaccine candidates that trigger protective immune responses in the host. Epitope-based vaccines are currently an attractive concept that has been successfully trialed against a number of pathogens. In this chapter, we outline the methodology and workflow for deploying immunoinformatics tools to identify immunodominant epitopes, using Shigella as a model organism. The immunodominant epitopes derived from S. flexneri 2a using this workflow were validated in an in vivo model, indicating the robustness of the outlined workflow.
1 Introduction
265
266 Priti Desai et al.
2 Materials
Suggested databases/servers.
Sr no. | Name | Web address | Use
1 | UniProtKB | https://www.uniprot.org | A database of protein sequence and function information
2 | VaxiJen v2.0 | http://www.ddg-pharmfac.net/vaxijen/VaxiJen/VaxiJen.html | Alignment-independent prediction of protective antigens
3 | Swiss-model server | http://swissmodel.expasy.org/ | Homology modeling and 3D structure prediction
4 | RAMPAGE | http://www-cryst.bioc.cam.ac.uk/rampage | Evaluation of backbone conformation and checking of non-Gly residues in excluded regions
5 | PROSA | https://prosa.services.came.sbg.ac.at/prosa.php | Protein structure validation
6 | BepiPred | http://www.cbs.dtu.dk/services/BepiPred/ | B-cell epitope prediction
7 | ProPred | http://www.imtech.res.in/raghava/propred/ | Prediction of epitopes binding to MHC class II molecules
8 | IEDB-AR | http://tools.iedb.org/mhci/ | Prediction of epitopes binding to MHC class I molecules
9 | Glide | https://www.schrodinger.com/glide | Receptor-ligand docking
10 | Pepstr | http://www.imtech.res.in/raghava/pepstr/ | Prediction of tertiary structure of peptides
11 | BLASTp | https://blast.ncbi.nlm.nih.gov/Blast.cgi?PAGE=Proteins | Searching similar proteins
Immunoinformatic Identification of Potential Epitopes 267
3 Method
Fig. 1 Workflow
(a) [Ramachandran plot: φ (x-axis) versus ψ (y-axis), each from −180° to 180°; labeled outliers C231 ASP and A315 ASN]
Fig. 2 (a) Ramachandran plot of the OmpC model. The plot shows two outliers: Asn315 of chain A and Asp231 of chain C. About 98% of residues are expected to lie in favored regions and about 2% in allowed regions; ideally, there should be no outliers. (b) Ramachandran plots highlighting allowed and favored regions for special cases such as glycine, proline, and pre-proline residues
Fig. 2 (continued) [(b) additional Ramachandran panels, including Pre-Pro and Proline, with φ and ψ axes from −180° to 180°]
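The φ and ψ angles plotted in Fig. 2 are backbone dihedral angles: φ(i) is defined by atoms C(i−1), N(i), Cα(i), C(i), and ψ(i) by N(i), Cα(i), C(i), N(i+1). A minimal pure-Python sketch of the standard signed-dihedral formula follows; it is not RAMPAGE's own code, and coordinates are assumed to be plain (x, y, z) tuples:

```python
import math

def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dihedral(p0, p1, p2, p3):
    """Signed torsion angle in degrees (between -180 and 180) for four 3D points."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))

# phi(i) = dihedral(C[i-1], N[i], CA[i], C[i]); psi(i) = dihedral(N[i], CA[i], C[i], N[i+1])
print(round(dihedral((1, 0, 0), (0, 0, 0), (0, 1, 0), (0, 1, 1)), 1))  # -90.0
```

Using `atan2` rather than `acos` keeps the sign of the angle, which is what places a residue in the correct quadrant of the Ramachandran plot.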
Fig. 3 (a) Z-score of OmpC model, (b) Z-score of OmpC template. The Z-score of the input protein is shown by
the black dot
4 Notes
scores for each amino acid present at position 1–9, and scores
of individual amino acids are summed to give the score for the
peptide.
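The fragment above describes matrix-based scoring: each amino acid receives a position-specific score at positions 1–9, and the peptide score is their sum. A minimal sketch; the matrix values are hypothetical, and scoring unlisted residues as 0.0 is a simplifying assumption:

```python
def matrix_score(peptide, matrix):
    """Sum position-specific scores matrix[pos][aa] over the peptide.

    Residues missing from a position's dict score 0.0 (a simplifying assumption).
    """
    if len(peptide) != len(matrix):
        raise ValueError("peptide length must equal the number of matrix positions")
    return sum(matrix[pos].get(aa, 0.0) for pos, aa in enumerate(peptide))

# Hypothetical 9-position matrix with favourable anchors at P2 (L) and P9 (V).
toy_matrix = [{} for _ in range(9)]
toy_matrix[0]["S"] = -0.5
toy_matrix[1]["L"] = 2.0
toy_matrix[8]["V"] = 1.5
print(matrix_score("SLLKTVLAV", toy_matrix))  # 3.0
```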
8. ANN (Artificial Neural Network) uses neural networks trained with a combination of sequence-encoding schemes, such as sparse encoding and BLOSUM encoding, for quantitative prediction of binding. The final prediction is the equal-weight average of the sparse- and BLOSUM-encoded neural network predictions. One advantage of ANN is that it can account for mutual information, i.e., how neighboring amino acids affect each other's binding affinity: the amino acid present at position i gives information about the amino acid present at position i + 1.
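Sparse encoding represents each residue by 20 inputs with a single 1, so a 9-mer becomes 180 inputs; BLOSUM encoding instead uses the residue's row of a BLOSUM substitution matrix (not reproduced here). The sketch shows sparse encoding and the equal-weight averaging of the two networks' outputs; the outputs themselves are assumed to come from already-trained models:

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def sparse_encode(peptide):
    """Sparse (one-hot) encoding: 20 inputs per position, exactly one set to 1."""
    vector = []
    for residue in peptide:
        vector.extend(1.0 if residue == aa else 0.0 for aa in AMINO_ACIDS)
    return vector

def combined_prediction(sparse_output, blosum_output):
    """Equal-weight average of the sparse- and BLOSUM-encoded network outputs."""
    return 0.5 * (sparse_output + blosum_output)

x = sparse_encode("SLLKTVLAV")
print(len(x), sum(x))  # 180 9.0
```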
9. SMM (Stabilized Matrix Method) uses both the individual scores of amino acids at each position and pair coefficients to predict binding affinity. Matrix entries are derived by minimizing the distance between the predicted scores and the measured affinities of a set of training peptides. Only coefficients for which sufficient training data were available are used. To avoid overfitting, a regularization parameter is included in the minimization function used to derive the matrix entries and pair coefficients.
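The SMM ingredients named in Note 9 (single-position scores, pair coefficients, and a regularized fit to measured affinities) can be sketched as a scoring function plus an objective. We use squared error with an L2 penalty weighted by `lam` as one common choice; the exact distance and penalty used by SMM are not specified in this note, and all values are toy numbers:

```python
def smm_score(peptide, single, pairs):
    """Single-position scores plus pair coefficients.

    single: list of dicts, single[i][aa]
    pairs: dict (i, aa_i, j, aa_j) -> coefficient; absent pairs contribute 0.0
    """
    score = sum(single[i].get(aa, 0.0) for i, aa in enumerate(peptide))
    for i in range(len(peptide)):
        for j in range(i + 1, len(peptide)):
            score += pairs.get((i, peptide[i], j, peptide[j]), 0.0)
    return score

def regularized_loss(peptides, measured, single, pairs, lam):
    """Squared error to measured values plus lam times the sum of squared parameters."""
    error = sum((smm_score(p, single, pairs) - y) ** 2 for p, y in zip(peptides, measured))
    penalty = (sum(v ** 2 for row in single for v in row.values())
               + sum(v ** 2 for v in pairs.values()))
    return error + lam * penalty

single = [{"S": 1.0}, {"L": 2.0}]   # toy two-position "peptide" model
pairs = {(0, "S", 1, "L"): 0.5}
print(smm_score("SL", single, pairs))  # 3.5
```

Minimizing `regularized_loss` over the parameters is what derives the matrix; a larger `lam` shrinks coefficients that the training data support only weakly.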
10. The CombLib (Combinatorial Peptide Libraries) method uses data derived from binding assays of different HLA alleles with positional scanning combinatorial peptide libraries. These libraries contain mixtures of peptides; in each mixture, one amino acid is fixed at a given position, while the other positions carry different amino acids in different peptides. In total, 20 × 9 = 180 mixtures are required to obtain the contribution of all 20 amino acids at each position of a 9-mer peptide. This method is available for 15 alleles.
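The 180 mixtures in Note 10 correspond to every (position, fixed amino acid) pair for a 9-mer. A small sketch enumerating them, with `X` standing for a randomized position (our notation, not the library's):

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def scanning_mixtures(length=9, alphabet=AMINO_ACIDS, wildcard="X"):
    """One mixture per (position, fixed amino acid); wildcard marks randomized positions."""
    return ["".join(aa if pos == fixed_pos else wildcard for pos in range(length))
            for fixed_pos, aa in product(range(length), alphabet)]

mixtures = scanning_mixtures()
print(len(mixtures), mixtures[0], mixtures[-1])  # 180 AXXXXXXXX XXXXXXXXY
```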
11. The NetMHCpan method uses neural networks trained on a very large dataset of 79,137 unique peptide–MHC class I interactions. The training data have low redundancy and contain a large fraction of nonbinding data for each allele. Apart from human alleles, this method can make predictions for chimpanzee, gorilla, rhesus macaque, and mouse alleles.
12. This step checks whether the peptide adopts the same structure on its own as it does within the fully folded protein.
13. The dimensions of the binding box are kept as small as possible because the binding sites are already known. Keeping the binding box just large enough to cover the intended binding site saves computation compared with docking the peptide against the whole MHC molecule.
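Note 13's advice can be made concrete by deriving a box from the coordinates of known binding-site atoms: take their axis-aligned bounding box and add a small margin. This is generic geometry, not Glide's grid-generation interface; the padding value and coordinates are hypothetical:

```python
def binding_box(coords, padding=2.0):
    """Axis-aligned box around binding-site atoms.

    Returns (center, edge lengths) in the coordinates' units (e.g., angstroms).
    """
    xs, ys, zs = zip(*coords)
    lows = (min(xs), min(ys), min(zs))
    highs = (max(xs), max(ys), max(zs))
    center = tuple((lo + hi) / 2 for lo, hi in zip(lows, highs))
    size = tuple((hi - lo) + 2 * padding for lo, hi in zip(lows, highs))
    return center, size

print(binding_box([(0, 0, 0), (4, 2, 6), (1, 1, 1)], padding=1.0))
# ((2.0, 1.0, 3.0), (6.0, 4.0, 8.0))
```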
274 Priti Desai et al.
References
1. Brusic V, Petrovsky N (2005) Immunoinfor- historical perspective. Electrophoresis 30:
matics and its relevance to understanding S162–S173
human immune disease. Expert Rev Clin 12. Benkert P, Biasini M, Schwede T (2011)
Immunol 1(1):145–157 Toward the estimation of the absolute quality
2. Gershoni JM, Roitburd-Berman A, Siman-Tov of individual protein structure models. Bioin-
DD, Tarnovitski Freund N, Weiss Y (2007) formatics 27:343–350
Epitope mapping: the first step in developing 13. Bertoni M, Kiefer F, Biasini M, Bordoli L,
epitope-based vaccines. BioDrugs 21 Schwede T (2017) Modeling protein quater-
(3):145–156 nary structure of homo- and hetero-oligomers
3. Goldsby RA, Kindt TJ, Kuby J, Osborne BA beyond binary interactions by homology. Sci
(2002) Immunology, 5th edn. W. H. Freeman, Rep 7(1):10480
New York 14. Lovell SC, Davis IW, Arendall IIIWB, de Bak-
4. Khan AM, Miotto O, Heiny AT, Salmon J, ker PIW, Word JM, Prisant MG, Richardson JS,
Srinivasan KN, Nascimento EJ, Marques ET Richardson DC (2002) Structure validation by
Jr, Brusic V, Tan TW, August JT (2006) A Calpha geometry: phi,psi and Cbeta deviation.
systematic bioinformatics approach for selec- Proteins: Structure, Function Genetics
tion of epitope-based vaccine targets. Cell 50:437–450
Immunol 244(2):141–147 15. Wiederstein M, Sippl MJ (2007) ProSA-web:
5. Bremel RD, Homan EJ (2010) An integrated interactive web service for the recognition of
approach to epitope analysis I: dimensional errors in three-dimensional structures of pro-
reduction, visualization and prediction of teins. Nucleic Acids Res 35:W407–W410
MHC binding using amino acid principal com- 16. Sippl MJ (1993) Recognition of errors in
ponents and regression approaches. Immu- three-dimensional structures of proteins. Pro-
nome Res 6:7 teins 17:355–362
6. Schubert B, Lund O, Nielsen M (2013) Evalu- 17. Jespersen MC, Peters B, Nielsen M, Marcatili P
ation of peptide selection approaches for (2017) BepiPred-2.0: improving sequence-
epitope-based vaccine design. Tissue Antigens based B-cell epitope prediction using confor-
82(4):243–251. https://doi.org/10.1111/ mational epitopes. Nucleic Acids Res 45(W1):
tan.12199 W24–W29. https://doi.org/10.1093/nar/
7. Mukhopadhaya A, Mahalanabis D, Chakrabarti gkx352
MK (2006) Role of Shigella flexneri 2a 34 kDa 18. Doytchinova IA, Flower DR (2007) VaxiJen: a
outer membrane protein in induction of pro- server for prediction of protective antigens.
tective immune response. Vaccine tumour antigens and subunit vaccines BMC
24:6028–6036 Bioinformatics 8:4
8. Jarza˛b A, Witkowska D, Ziomek E, 19. Doytchinova IA, Flower DR (2007) Identify-
Da˛browska A, Szewczuk Z, Gamian A (2013) ing candidate subunit vaccines using an
Shigella flexneri 3a outer membrane protein C alignment-independent method based on prin-
epitope is recognized by human umbilical cord cipal amino acid properties. Vaccine
sera and associated with protective activity. 25:856–866
PLoS One 8(8):e70539 20. Doytchinova IA, Flower DR (2008) Bioinfor-
9. Waterhouse A, Bertoni M, Bienert S, Studer G, matic approach for identifying parasite and fun-
Tauriello G, Gumienny R, Heer FT, de Beer gal candidate subunit vaccines. Open Vaccines J
TAP, Rempfer C, Bordoli L, Lepore R, 1:22–26
Schwede T (2018) SWISS-MODEL: homol- 21. Singh H, Raghava GPS (2001) ProPred: pre-
ogy modelling of protein structures and com- diction of HLA-DR binding sites. Bioinfor-
plexes. Nucleic Acids Res 46(W1): matics 17(12):1236–1237
W296–W303
22. Nielsen M, Lundegaard C, Worning P, Laue-
10. Bienert S, Waterhouse A, de Beer TAP, moller SL, Lamberth K, Buus S, Brunak S,
Tauriello G, Studer G, Bordoli L, Schwede T Lund O (2003) Reliable prediction of T-cell
(2017) The SWISS-MODEL repository - new epitopes using neural networks with novel
features and functionality. Nucleic Acids Res sequence representations. Protein Sci
45:D313–D319 12:1007–1017
11. Guex N, Peitsch MC, Schwede T (2009) Auto- 23. Peters B, Sette A (2005) Generating quantita-
mated comparative protein structure modeling tive models describing the sequence specificity
with SWISS-MODEL and Swiss-PdbViewer: a
Immunoinformatic Identification of Potential Epitopes 275
of biological processes with the stabilized Glide: a new approach for rapid, accurate dock-
matrix method. BMC Bioinformatics 6:132 ing and scoring. 2. Enrichment factors in data-
24. Sidney J, Assarsson E, Moore C, Ngo S, base screening. J Med Chem 47:1750–1759
Pinilla C, Sette A, Peters B (2008) Quantitative 29. Friesner RA, Banks JL, Murphy RB, Halgren
peptide binding motifs for 19 human and TA, Klicic JJ, Mainz DT, Repasky MP, Knoll
mouse MHC class I molecules derived using EH, Shaw DE, Shelley M, Perry JK, Francis P,
positional scanning combinatorial peptide Shenkin PS (2004) Glide: a new approach for
libraries. Immunome Res 4(2) rapid, accurate docking and scoring. 1. Method
25. Hoof I, Peters B, Sidney J, Pedersen LE, and assessment of docking accuracy. J Med
Sette A, Lund O, Buus S, Nielsen M (2009) Chem 47:1739–1749
NetMHCpan, a method for MHC class I bind- 30. Singh S, Singh H, Tuknait A, Chaudhary K,
ing prediction beyond humans. Immunogenet- Singh B, Kumaran S, Raghava GPS (2015)
ics 61(1):1–13 PEPstrMOD: structure prediction of peptides
26. Bui HH, Sidney J, Dinh K, Southwood S, containing natural, non-natural and modified
Newman MJ, Sette A (2006) Predicting popu- residues. Biol Direct 10:73
lation coverage of T-cell epitope-based diag- 31. Kaur H, Garg A, Raghava GPS (2007) PEPstr:
nostics and vaccines. BMC Bioinformatics a de novo method for tertiary structure predic-
7:153 tion of small bioactive peptides. Protein Pept
27. Friesner RA, Murphy RB, Repasky MP, Frye Lett 14:626–630
LL, Greenwood JR, Halgren TA, Sanschagrin 32. Madden TL, Tatusov RL, Zhang J (1996)
PC, Mainz DT (2006) Extra precision glide: [9] Applications of network BLAST server.
docking and scoring incorporating a model of Computer Methods for Macromolecular
hydrophobic enclosure for protein-ligand com- Sequence Analysis:131–141. https://doi.org/
plexes. J Med Chem 49:6177–6196 10.1016/s0076-6879(96)66011-x
28. Halgren TA, Murphy RB, Friesner RA, Beard
HS, Frye LL, Pollard WT, Banks JL (2004)
Chapter 15
Abstract
Vaccines have become a cost-effective means of preventing or treating viral infections. Designing a vaccine candidate with conventional methods is a laborious process that demands considerable time and money. Many approaches have been developed to reduce the time and cost of vaccine development. In this regard, immunoinformatics is expected to revolutionize vaccine development. This chapter provides an overview of immunoinformatics and its application to in silico vaccine design and development strategies against human viral diseases, with the help of available databases and tools.
Abbreviations
277
278 Richa Anand and Richa Raghuwanshi
1 Introduction
2 Immunoinformatics
[Figure: overview of the immune system, showing anatomic (skin, mucous membranes), physiologic (temperature, low pH), phagocytic (blood monocytes, neutrophils, tissue macrophages), and inflammatory (serum proteins) defenses, together with the cellular (T-cell) and humoral (B-cell) responses]
[Figure: immunoinformatics approaches, including molecular dynamics simulation and docking, and the microarray technique for vaccine design]
3.1 Methods for T-Cell Epitope-Based Vaccine Prediction
The methodology for T-cell epitope-based vaccine design against any viral disease is given below. Epitopes can be predicted with the different computational tools using either default or user-defined threshold parameters. Always select computational tools carefully, according to the needs of the query.
3.1.2 Antigenicity Prediction
The ability of an antigen to bind to, or interact with, B-cell or T-cell receptors is termed antigenicity. It is therefore important to identify the most antigenic protein among the structural proteins of the virus.
• Submit the FASTA-formatted amino acid sequences of all structural proteins to the VaxiJen v2.0 server (an alignment-free approach to antigenicity prediction with 70–89% accuracy), which predicts antigenicity based on the physicochemical properties of amino acids (see Note 1).
• Sort all antigenic proteins by their antigenicity score.
• Select the antigenic protein with the highest antigenicity score for further evaluation.
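The sort-and-select steps above can be sketched as below, assuming the VaxiJen scores have already been collected into a dictionary. The threshold of 0.4 (commonly used as VaxiJen's default for viral proteins) and the protein names and scores are assumptions for illustration:

```python
def select_most_antigenic(scores, threshold=0.4):
    """Keep proteins scoring at or above threshold, sorted from highest to lowest."""
    antigenic = [(name, s) for name, s in scores.items() if s >= threshold]
    return sorted(antigenic, key=lambda item: item[1], reverse=True)

# Hypothetical VaxiJen outputs for three structural proteins.
scores = {"protein_A": 0.55, "protein_B": 0.38, "protein_C": 0.47}
ranked = select_most_antigenic(scores)
print(ranked[0][0])  # protein_A
```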
Designing Viral Vaccine by Immunoinformatics 281
3.1.3 Homology Analysis
• Perform a BLAST search for homology between the predicted most antigenic protein and human proteins (see Note 3).
3.1.4 T-Cell Epitope Prediction
After the predicted most antigenic protein has been evaluated, its T-cell epitopes can be predicted in silico. T-cell epitopes presented by MHC class-I molecules are typically peptides 8–11 amino acids long, whereas those presented by MHC class-II molecules are 13–17 amino acids long. The reliability of a T-cell epitope-based vaccine depends largely on the consistency of peptide prediction. Therefore, apply multiple tools for epitope prediction, and select the top-ranked epitopes predicted by all tools/servers for further evaluation. Some frequently used tools are detailed below.
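The prediction servers score overlapping peptides of the lengths given above. Enumerating the same candidates locally can help when cross-referencing the outputs of different tools. The Python sketch below does this; the protein sequence is an arbitrary example, and `candidate_peptides` is a hypothetical helper, not part of any listed server.

```python
from typing import Dict, List

def candidate_peptides(sequence: str, min_len: int, max_len: int) -> Dict[int, List[str]]:
    """All overlapping peptides of each length from min_len to max_len (inclusive)."""
    seq = sequence.upper()
    return {
        length: [seq[i:i + length] for i in range(len(seq) - length + 1)]
        for length in range(min_len, max_len + 1)
    }

protein = "MKVLAAGIVGLLLAQPSWAEYTRNDQGHKLFSTVEPQWNRDKGAILMYE"  # arbitrary example sequence
class_i = candidate_peptides(protein, 8, 11)    # MHC class-I ligand lengths
class_ii = candidate_peptides(protein, 13, 17)  # MHC class-II lengths given in the text
print({length: len(peptides) for length, peptides in class_i.items()})
```

A sequence of length L yields L − k + 1 peptides of length k, so even a modest protein produces hundreds of candidates, which is why the tools' rankings, rather than exhaustive testing, drive the selection.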
Prediction of MHC Class-I Epitopes
• Submit the sequence of the most antigenic protein, either as plain text or in FASTA/PIR/EMBL format, to CTLPred (QM-, ANN-, and SVM-based algorithms) to predict CTL epitopes (see Note 4).
• Use the QM, ANN, SVM, consensus, or combined approach to predict CTL epitopes (see Note 5).
• Select the top-ranked peptides for processing prediction (see Note 6).
• Other tools that predict CTL epitopes at the supertype and allele-specific levels include:
– BIMAS-HLA, an ANN-based algorithm, predicts peptides that can be recognized by MHC supertypes A1, A3, A24, B7, and B40 and alleles A*0201, B*3501, B*3701, B*5101, and B*5801.
– NetCTL 1.2, an ANN- and weight matrix-based algorithm, predicts CTL epitopes restricted to 12 MHC class-I supertypes (A1, A2, A3, A24, A26, B7, B8, B27, B39, B44, B58, and B62).
– ProPred-I, a PSSM-based server, predicts peptides binding to 47 MHC class-I alleles.
– IEDB, using a consensus-based method, predicts MHC class-I binding peptides.
– nHLAPred, an MHC class-I binding peptide prediction server, consists of two parts: (a) ComPred, based on a hybrid QM and ANN approach, predicts peptides for 67 MHC class-I alleles, and (b) ANNPred, an ANN-based method, predicts peptides for 30 HLA class-I alleles.
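Selecting "the top-ranked epitopes predicted by all tools" amounts to intersecting the top of each tool's ranking. The Python sketch below shows one way to do this, ordering the shared peptides by their mean rank. The peptides and rankings are hypothetical, the tool names are only dictionary labels, and ranking by mean position is an assumed tie-breaking choice, not a rule from this chapter.

```python
from typing import Dict, List

def consensus_epitopes(rankings: Dict[str, List[str]], top_n: int) -> List[str]:
    """Peptides that appear in the top_n of every tool, ordered by mean rank (0 = best)."""
    if not rankings:
        return []
    top_sets = [set(ranked[:top_n]) for ranked in rankings.values()]
    shared = set.intersection(*top_sets)

    def mean_rank(peptide: str) -> float:
        return sum(ranked.index(peptide) for ranked in rankings.values()) / len(rankings)

    # Sort by mean rank, breaking ties alphabetically so the output is deterministic.
    return sorted(shared, key=lambda p: (mean_rank(p), p))

# Hypothetical ranked outputs (best first) copied from three tools.
rankings = {
    "CTLPred": ["AAAKLLLLV", "GGGYFALLA", "KLVNNQTWA", "TTLLAQKVY"],
    "NetCTL": ["GGGYFALLA", "AAAKLLLLV", "PPQRSTVWY", "KLVNNQTWA"],
    "IEDB": ["KLVNNQTWA", "GGGYFALLA", "AAAKLLLLV", "MMMNNQQRR"],
}
shared_top = consensus_epitopes(rankings, top_n=3)
print(shared_top)
```

Raising `top_n` relaxes the consensus and admits more peptides; lowering it keeps only peptides that every tool ranks very highly.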
282 Richa Anand and Richa Raghuwanshi
3.1.5 MHC Class-I Processing Prediction
• Use the MHC-I processing prediction tool in IEDB to estimate the probability that each of the selected top-scoring peptides is a T-cell epitope.
• Assess the peptides based on proteasomal cleavage score, TAP transport efficiency, and MHC class-I binding affinity.
• The IC50 value of MHC-binding peptides is calculated with the SMM (stabilized matrix method) algorithm.
• A lower IC50 value indicates higher binding affinity.
• Select the best epitopes for immunogenicity and antigenicity prediction.
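Because a lower IC50 means stronger binding, the binding term is usually converted to a score that increases with affinity before it is combined with the processing scores. The Python sketch below assumes the convention described for the IEDB MHC-I processing output, where the total score is the sum of the proteasome, TAP, and MHC scores and the MHC score is −log10(IC50). Check the current IEDB documentation before relying on this; the peptides and values are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List

@dataclass
class ProcessingResult:
    peptide: str
    proteasome_score: float
    tap_score: float
    ic50_nm: float  # predicted IC50 (nM); lower means stronger MHC binding

    @property
    def mhc_score(self) -> float:
        # Assumed convention: MHC score = -log10(IC50), so a lower IC50 gives a higher score.
        return -math.log10(self.ic50_nm)

    @property
    def total_score(self) -> float:
        # Assumed convention: total = proteasome + TAP + MHC score.
        return self.proteasome_score + self.tap_score + self.mhc_score

def rank_by_total_score(results: List[ProcessingResult]) -> List[ProcessingResult]:
    """Highest total score first."""
    return sorted(results, key=lambda r: r.total_score, reverse=True)

# Hypothetical values for two candidate peptides.
results = [
    ProcessingResult("GGGYFALLA", proteasome_score=1.20, tap_score=0.85, ic50_nm=25.0),
    ProcessingResult("AAAKLLLLV", proteasome_score=1.45, tap_score=0.40, ic50_nm=400.0),
]
best = rank_by_total_score(results)[0]
print(best.peptide, round(best.total_score, 3))
```

In this example the second peptide has the better proteasome score, but its 16-fold weaker predicted binding outweighs that advantage once the log transform is applied.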
3.1.9 Allergenicity and Toxicity Prediction
A vaccine can shift the immune response toward an allergic reaction by inducing immunoglobulin E and type 2 T-helper (Th2) cells. Therefore, after analyzing population coverage, it is essential to analyze the allergenicity and toxicity of the predicted epitopes.
• Submit the epitope sequences to AllerTOP v. 2.0 and AllergenFP 1.0 for allergenicity prediction.
• AllerTOP v. 2.0 (based on kNN, k = 1) predicts allergenicity with ~89% accuracy on the basis of amino acid properties such as size, hydrophobicity, helix-forming propensity, β-strand-forming propensity, and relative abundance of amino acids.
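AllerTOP v. 2.0 is described above as a k-nearest-neighbor classifier with k = 1. The Python sketch below illustrates what k = 1 means in practice by giving a peptide the label of its single closest reference peptide. It is not AllerTOP. The two descriptors (mean Kyte–Doolittle hydropathy and fraction of aromatic residues) are simplifications chosen for illustration, and the reference peptides and their labels are hypothetical.

```python
import math
from typing import List, Tuple

# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KD = {"A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
      "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
      "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2}

def descriptors(peptide: str) -> Tuple[float, float]:
    """Toy descriptors: mean hydropathy and fraction of aromatic residues (F, W, Y)."""
    n = len(peptide)
    mean_hydropathy = sum(KD[aa] for aa in peptide) / n
    aromatic_fraction = sum(aa in "FWY" for aa in peptide) / n
    return mean_hydropathy, aromatic_fraction

def classify_1nn(query: str, reference: List[Tuple[str, str]]) -> str:
    """Label of the reference peptide closest to the query in descriptor space (k = 1)."""
    q = descriptors(query)
    nearest = min(reference, key=lambda item: math.dist(q, descriptors(item[0])))
    return nearest[1]

# Hypothetical, hand-labelled reference peptides (toy data, not an allergen database).
reference = [("FWYFWYLLA", "allergen"), ("DEKRNQSTG", "non-allergen")]
print(classify_1nn("DEKRNQSTA", reference))
```

With k = 1, a single mislabelled or unusual reference entry can flip a prediction, which is one reason to confirm results with a second tool such as AllergenFP.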
3.1.10 Peptide Match for Autoimmunity Prediction
Before an epitope is selected as a vaccine candidate, the probability of an autoimmune reaction must be considered.
• Submit the sequences of the best predicted epitopes to the Peptide Match service available at PIR.
• Select Homo sapiens (human) as the target organism (see Note 7).
• Restrict the search to UniRef100 representative sequences within UniProtKB.
• Select epitopes that show no similarity to human proteins.
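Peptide Match reports exact occurrences of a query peptide in protein sequences. The Python sketch below reproduces that filter locally with plain substring search. It assumes the human sequences are already loaded into a dictionary; the two short sequences and their keys are hypothetical stand-ins for a real human proteome.

```python
from typing import Dict, List

def non_self_epitopes(epitopes: List[str], human_proteins: Dict[str, str]) -> List[str]:
    """Epitopes with no exact occurrence in any of the supplied human protein sequences."""
    return [ep for ep in epitopes if not any(ep in seq for seq in human_proteins.values())]

# Hypothetical stand-ins for human proteins; a real check would use the full proteome.
human_proteins = {
    "hypothetical_human_1": "MSTNPKPQRKTKRNTNRRPQDVKFPGG",
    "hypothetical_human_2": "MAGYFALLAQKT",
}
candidates = ["GGGYFALLA", "GYFALLAQK", "KLVNNQTWA"]
print(non_self_epitopes(candidates, human_proteins))
```

Exact matching is the strictest form of the check. Epitopes that differ from a human peptide by only one or two residues pass this filter but may still deserve a closer look.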
3.1.11 Preparation of the 3D Structure of Selected Epitope and MHC Allele
After completing all the above steps, prepare the 3D structures of the best selected peptide epitope and the MHC allele, which are required for molecular docking studies.
• Search the PDB for 3D structures of the best peptide epitope and the MHC allele. If structures are available, retrieve them in PDB file format and proceed to molecular docking studies.
• If structures are not available in the PDB, predict the 3D structures as follows:
– Submit the best selected epitopes to the PEP-FOLD3 web server to predict their 3D structures. PEP-FOLD3 describes the conformation of four consecutive amino acid residues with structural alphabet (SA) letters and couples this with a greedy SA assembly algorithm and a coarse-grained force field.
– Save the best model predicted by PEP-FOLD3 in PDB file format for docking with MHC alleles.
– Predict the 3D structure [9] of MHC alleles with two or more protein structure prediction servers/software packages, such as Phyre2, LOMETS, MUSTER, MODELLER, and SWISS-MODEL (see Note 8).
– Analyze the resulting models with a Ramachandran plot, and select the best 3D model for further refinement.
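A Ramachandran plot places each residue by its backbone torsion angles: φ is defined by the atoms C(i−1), N, Cα, C, and ψ by N, Cα, C, N(i+1). The Python sketch below computes such a torsion angle from four atom coordinates, which is the core calculation behind the plot. Dedicated structure-validation tools also classify the angles into favored and disallowed regions; that step is not shown, and `dihedral` is a hypothetical helper name.

```python
import math
from typing import List, Sequence

Vec = Sequence[float]

def _sub(a: Vec, b: Vec) -> List[float]:
    return [a[i] - b[i] for i in range(3)]

def _dot(a: Vec, b: Vec) -> float:
    return sum(a[i] * b[i] for i in range(3))

def _cross(a: Vec, b: Vec) -> List[float]:
    return [a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0]]

def dihedral(p0: Vec, p1: Vec, p2: Vec, p3: Vec) -> float:
    """Signed torsion angle in degrees, in the range [-180, 180], for four atom positions."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm_b1 = math.sqrt(_dot(b1, b1))
    b1 = [x / norm_b1 for x in b1]
    # Project the outer bonds onto the plane perpendicular to the central bond.
    v = [b0[i] - _dot(b0, b1) * b1[i] for i in range(3)]
    w = [b2[i] - _dot(b2, b1) * b1[i] for i in range(3)]
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))

# Four points in a cis (0 degree) and a trans (180 degree) arrangement.
print(dihedral((1, 0, 0), (0, 0, 0), (0, 1, 0), (1, 1, 0)))
print(dihedral((1, 0, 0), (0, 0, 0), (0, 1, 0), (-1, 1, 0)))
```

Using `atan2` with the projected vectors gives the sign of the angle directly, which matters because φ and ψ distinguish, for example, right-handed from left-handed helical regions.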
3.2 Methods for B-Cell Epitope-Based Vaccine Prediction
The methodology for B-cell epitope-based vaccine design is given below. Use either the default threshold parameters or user-defined parameters when predicting epitopes with the different computational tools.
1. Antigenicity prediction.
2. Homology analysis.
3. Linear B-cell epitope prediction.
4. Discontinuous (conformational) epitope prediction.
5. Epitope conservation, immunogenicity, and antigenicity analysis.
6. Allergenicity and toxicity prediction.
7. Physicochemical characterization.
8. Peptide match for autoimmunity analysis.
All of the above steps except steps 3, 4, and 7 are the same as those described for T-cell epitope prediction in Subheading 3.1.
Linear B-Cell Epitope Prediction
B-cell epitopes vary in length from 5 to 30 residues, but most web-based tools/servers predict linear B-cell epitopes 20 amino acid residues long.
• Submit the antigenic protein sequence to the LBtope, ABCpred, Bcepred, and BepiPred-2.0 web servers.
• LBtope, based on SVM, predicts linear B-cell epitopes with an overall accuracy of ~81%.
• ABCpred uses a recurrent neural network for prediction, with an accuracy of ~65.93%.
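Many sequence-based B-cell epitope tools slide a fixed window, commonly 20 residues as noted above, along the antigen and score each window. The Python sketch below shows the simplest form of this idea: it averages a per-residue scale over every 20-residue window. It does not reproduce LBtope, ABCpred, Bcepred, or BepiPred-2.0. The toy scale that rewards charged residues is a hypothetical placeholder, and residues missing from the scale are assumed to contribute 0.0.

```python
from typing import Dict, List, Tuple

def window_scores(sequence: str, scale: Dict[str, float], window: int = 20) -> List[Tuple[int, str, float]]:
    """Mean scale value of every overlapping window, best first.

    Returns (1-based start, window peptide, mean score) tuples. Residues missing
    from the scale contribute 0.0 (an assumption of this sketch).
    """
    seq = sequence.upper()
    results = []
    for start in range(len(seq) - window + 1):
        peptide = seq[start:start + window]
        mean = sum(scale.get(aa, 0.0) for aa in peptide) / window
        results.append((start + 1, peptide, mean))
    return sorted(results, key=lambda item: item[2], reverse=True)

# Toy scale that simply rewards charged residues (hypothetical, for illustration only).
toy_scale = {aa: 1.0 for aa in "DEKR"}
sequence = "A" * 10 + "DEKR" * 5 + "A" * 10
best_start, best_peptide, best_mean = window_scores(sequence, toy_scale)[0]
print(best_start, best_peptide, best_mean)
```

Swapping in a published propensity scale in place of `toy_scale` gives a classical propensity-based predictor. Machine-learning servers replace the simple average with a trained model.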
3.3 T- and B-Cell Epitope Superimposition
Peptides capable of inducing both cellular and humoral responses are often suggested as the most promising candidates. Whether to carry out this additional step therefore depends on the user's interest.
• Select the potential T- and B-cell epitopes obtained through the above approaches and study their alignment.
• Consider overlapping regions of these epitopes as the most promising epitopes for vaccine development.
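When the T- and B-cell epitopes are recorded as start and end positions on the same antigen, their overlapping regions can be found with a simple interval intersection. The Python sketch below assumes 1-based, inclusive positions; the intervals are hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # 1-based, inclusive (start, end) positions in the antigen

def overlapping_regions(t_cell: List[Interval], b_cell: List[Interval]) -> List[Interval]:
    """Overlapping stretches between every T-cell and B-cell epitope pair, sorted."""
    overlaps = set()
    for t_start, t_end in t_cell:
        for b_start, b_end in b_cell:
            start, end = max(t_start, b_start), min(t_end, b_end)
            if start <= end:  # the two intervals share at least one residue
                overlaps.add((start, end))
    return sorted(overlaps)

# Hypothetical epitope positions on the same antigen.
t_cell_epitopes = [(10, 18), (40, 48)]
b_cell_epitopes = [(5, 24), (45, 64), (80, 99)]
print(overlapping_regions(t_cell_epitopes, b_cell_epitopes))
```

The pairwise loop is adequate for the handful of epitopes typically retained at this stage. A sorted sweep would only matter for very large epitope lists.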
4 Notes
References
1. Sundaramurthi JC, Ashokkumar M, Swaminathan S, Hanna LE (2017) HLA based selection of epitopes offers a potential window of opportunity for vaccine design against HIV. Vaccine 35:5568–5575
2. Murrell S, Wu SC, Butler M (2011) Review of dengue virus and the development of a vaccine. Biotechnol Adv 29:239–247
3. Khan MA, Hossain MU, Rakib-Uz-Zaman SM et al (2015) Epitope-based peptide vaccine design and target site depiction against Ebola viruses: an immunoinformatics study. Scand J Immunol 82:25–34
4. Rappuoli R (2000) Reverse vaccinology. Curr Opin Microbiol 3:445–450
5. He Y, Rappuoli R, De Groot AS et al (2010) Emerging vaccine informatics. J Biomed Biotechnol. https://doi.org/10.1155/2010/218590
6. Welly BT, Miller MR, Stot JL et al (2017) Genome report: identification and validation of antigenic proteins from Pajaroellobacter abortibovis using de novo genome sequence assembly and reverse vaccinology. G3 7:321–331
7. Kimbrell DA, Beutler B (2001) The evolution and genetics of innate immunity. Nat Rev Genet 2:256–267
8. Trost B, Bickis M, Kusalik A (2007) Strength in numbers: achieving greater accuracy in MHC-I binding prediction by combining the results from multiple prediction tools. Immunome Res 3:5
9. Anand R (2018) Identification of potential antituberculosis drugs through docking and virtual screening. Interdiscip Sci Comput Life Sci 10:419–429
Chapter 16
Abstract
Accurate prediction of discontinuous antigenic epitopes is important for immunologic research and medical applications, but it remains a difficult problem. Only a few prediction servers are currently available, even though discontinuous epitopes constitute the majority of all B-cell antigenic epitopes. In this chapter, we describe two online servers, EPCES and EPSVR, for discontinuous epitope prediction. All methods were benchmarked on a curated independent test set in which no antigen had a complex structure with an antibody and all epitopes had been identified by various biochemical experiments. The servers and all datasets are available at http://sysbio.unl.edu/EPCES/ and http://sysbio.unl.edu/EPSVR/.
1 Introduction
290 Shide Liang et al.
2 Materials
2.1 Webserver
The webservers EPCES and EPSVR were developed for conformational B-cell epitope prediction and are available at http://sysbio.unl.edu/EPCES/ and http://sysbio.unl.edu/EPSVR/. Figure 1 displays the input page of the two webservers, which allows users to enter a PDB ID or upload a protein structure file in PDB format. The chain name is also required (see Note 1). After the user types the correct four-letter word shown in an image (to prevent robot submissions; see Note 2) and clicks the “submit” button, a new page appears, which acknowledges
Fig. 2 The result window of the EPCES webserver. The result is saved in a PDB file, which can be downloaded
by clicking the button
2.2 Datasets
2.2.1 Training Set
The training set was gathered and screened from three protein datasets: (1) 22 antigen-antibody complexes and their unbound structures from protein docking Benchmark 2.0 [24], (2) 59 representative antigen-antibody complexes compiled by Ponomarenko and Bourne [23], and (3) 17 antigen-antibody complex structures released between February 2006 and October 2008 with available unbound antigen structures, which formed the test set in our previous work [21]. An antigen-antibody complex was discarded if its antigen had no available unbound structure, because unbound structures were required for prediction. A complex structure was also excluded if its antigenic epitope consisted of residues located on multiple chains. A complex was included if, after local sequence alignment, the sequence identity between its antigen and every antigen from the other complex structures was less than 35%. For an antigen with a sequence identity in the range of 35–50%, we accepted the antigen-antibody complex if its binding topology differed from that of its homologous complex. For an antigen with more than one antigenic epitope, only one epitope was used, to avoid confusion in the subsequent application of support vector regression. As a result, a total of 48 complexes and their unbound structures meeting the above criteria were used as the training set, available for download at http://sysbio.unl.edu/services/EPSVR/training.tar.gz.
2.2.2 Test Set
The test set was curated from 293 entries of the Conformational Epitope Database [25] (CED, Release 0.03) with the following criteria. We only considered entries that had unbound antigen structures but no complex structures. Multiple entries with the same antigen structure were combined and considered as one target, and antigenic residues from multiple entries were mapped onto one protein structure. The sequence identity between any two selected proteins was also required to be less than 35%. All selected antigens were also screened against the rest of the CED database and our training set; the sequence identity between a selected antigen and any other antigen with a complex structure in the CED or in the training set was less than 35%. A total of 22 antigenic proteins in the CED met all the above criteria: 1www, 1hgu, 1eku, 1mbn, 1av1, 1pv6, 1al2, 2gmf, 1a7c, 1y8o, 1og5, 1jeq, 1dab, 1w7b, 1ly2, 1rec, 1nu6, 2b5i, 2gib, 1p4t, 1xwv, and 1qgt. Three antigenic proteins, 1www, 1hgu, and 1xwv, were excluded because they had multiple antibody-binding sites and the mapped antigenic residues were evenly distributed over the protein surfaces. The final test set therefore contained 19 antigen structures, available at http://sysbio.unl.edu/services/EPSVR/testing.tar.gz.
EPCES and EPSVR: Prediction of Conformational B-cell Eptiopes 293
3 Method
3.1 EPCES and EPSVR Predictive Model
EPCES and EPSVR use the same set of features but different predictive models. For a description of all features, see Subheading 3.3.
EPCES performs prediction by consensus scoring. To take advantage of multiple features, we used a voting mechanism with the six scoring functions described in Subheading 3.3. A patch was considered an interface patch if five of the six terms placed it in the top-ranked patch set. We did not require votes from all six scoring functions, because a surface patch with a small contact number cannot also have a high planarity score. Each individual term predicts the same number of residues, but the number of top-ranked patches retained can be varied to yield predictions with different sensitivities.
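The voting rule can be sketched in Python as follows. Each scoring term nominates its top-ranked patches, and a patch is kept if enough terms nominate it. This is a minimal illustration, not the EPCES code. It assumes that "five of the six terms" means at least five votes and that a higher value is better for every term after any sign conventions have been applied. The patch IDs and scores are hypothetical.

```python
from typing import Dict, List, Set

def top_patches(scores: Dict[str, float], top_n: int) -> Set[str]:
    """IDs of the top_n patches for one scoring term (higher score assumed better)."""
    return set(sorted(scores, key=lambda patch: scores[patch], reverse=True)[:top_n])

def consensus_patches(term_scores: Dict[str, Dict[str, float]], top_n: int,
                      min_votes: int = 5) -> List[str]:
    """Patches placed in the top-ranked set by at least min_votes scoring terms."""
    votes: Dict[str, int] = {}
    for scores in term_scores.values():
        for patch in top_patches(scores, top_n):
            votes[patch] = votes.get(patch, 0) + 1
    return sorted(patch for patch, n in votes.items() if n >= min_votes)

# Hypothetical scores of three patches under six terms: p1 wins five terms, p2 wins planarity.
term_scores = {f"term{i}": {"p1": 1.0, "p2": 0.5, "p3": 0.1} for i in range(5)}
term_scores["planarity"] = {"p1": 0.1, "p2": 0.9, "p3": 0.2}
print(consensus_patches(term_scores, top_n=1))
```

Here `top_n` is the adjustable threshold mentioned in the text: retaining more top-ranked patches per term lets more patches collect five votes and so raises sensitivity.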
EPSVR employs support vector regression (SVR), implemented with the SVM package LIBSVM [26]. For the training step, the target value of each surface patch was the number of interface residues it contains, an integer between 0 and the patch size (20 in this work). Each surface patch had six SVR attributes, calculated with the six scoring terms: residue epitope propensity, conservation score, side-chain energy score, contact number, surface planarity score, and secondary structure composition. The six scoring terms were the same as those used in our previous work [21]. All SVR parameters were optimized by a grid search ($c = 2^{-10}, \ldots, 2^{1}$; $g = 2^{-12}, \ldots, 2^{3}$; and $p = 2^{-5}, \ldots, 2^{2}$), and for each grid point (c, g, p), a leave-one-out procedure was applied to evaluate the trained SVR model. For prediction, we first enumerated all surface patches of a given antigen structure and calculated their six SVR attributes. For each surface patch, the trained SVR model predicted the number of putative interface residues. The patch score was defined as the ratio of the number of putative interface residues to the total number of residues in the patch (i.e., 20). Each surface residue was assigned a residue score by averaging the scores of all patches that include it. Finally, we sorted surface residues by their residue scores and considered the top-ranked ones to be interface residues. The assumption here is that a residue frequently appearing in top-scoring patches is likely to be an interface residue.
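The step from patch scores to residue scores is a simple average. The Python sketch below assumes that the SVR predictions (the predicted number of interface residues per patch) are already available from a trained model, for example one built with LIBSVM. The residue and patch IDs are hypothetical, and the example uses a patch size of 4 for brevity instead of the 20 used in this work.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

def residue_scores(patches: Dict[str, List[str]], predicted_counts: Dict[str, float],
                   patch_size: int = 20) -> List[Tuple[str, float]]:
    """Average patch scores over all patches containing each residue, best first."""
    sums: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for patch_id, residues in patches.items():
        # Patch score: predicted number of interface residues divided by the patch size.
        patch_score = predicted_counts[patch_id] / patch_size
        for residue in residues:
            sums[residue] += patch_score
            counts[residue] += 1
    averaged = {res: sums[res] / counts[res] for res in sums}
    return sorted(averaged.items(), key=lambda item: item[1], reverse=True)

# Hypothetical example with two overlapping 4-residue patches.
patches = {"P1": ["A1", "A2", "A3", "A4"], "P2": ["A3", "A4", "A5", "A6"]}
predicted = {"P1": 4.0, "P2": 2.0}  # stand-ins for SVR outputs
print(residue_scores(patches, predicted, patch_size=4))
```

Residues shared by a high-scoring and a low-scoring patch (A3 and A4 here) receive intermediate scores. This smoothing is what lets frequent membership in top-scoring patches drive the final ranking.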
3.2 Definition of Surface Residues, Surface Patches, and Interface Residues
Following previous work [27], we consider an amino acid residue to be a surface residue if the relative accessibility of its side chain is greater than 6% with a probe radius of 1.2 Å. A surface patch is defined as a central surface residue and its 19 nearest surface neighbors in space. Solvent vector constraints [28] were applied to avoid sampling patches on different sides of a protein surface. An interface residue is a surface residue whose solvent accessibility decreases by more than 1 Å² upon association.
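The surface-residue and patch definitions above translate directly into code. The Python sketch below makes several assumptions. Relative side-chain accessibilities (in percent, computed with a 1.2 Å probe) are taken as precomputed inputs. Each residue is represented by a single coordinate, such as its Cα. The solvent vector constraints are omitted. The 30-residue straight-line "protein" is a toy example.

```python
import math
from typing import Dict, List, Tuple

Coord = Tuple[float, float, float]

def surface_residues(rel_side_chain_acc: Dict[str, float], cutoff: float = 6.0) -> List[str]:
    """Residues whose relative side-chain accessibility (%) exceeds the cutoff."""
    return [res for res, acc in rel_side_chain_acc.items() if acc > cutoff]

def build_patch(center: str, coords: Dict[str, Coord], surface: List[str],
                patch_size: int = 20) -> List[str]:
    """Central surface residue plus its (patch_size - 1) nearest surface neighbors.

    Solvent vector constraints are not applied in this sketch.
    """
    others = [res for res in surface if res != center]
    others.sort(key=lambda res: math.dist(coords[center], coords[res]))
    return [center] + others[:patch_size - 1]

# Toy example: 30 residues on a straight line, with R5 treated as buried.
coords = {f"R{i}": (float(i), 0.0, 0.0) for i in range(30)}
accessibility = {res: (2.0 if res == "R5" else 20.0) for res in coords}
surface = surface_residues(accessibility)
patch = build_patch("R0", coords, surface, patch_size=6)
print(patch)
```

In the toy output, the buried residue R5 is skipped and R6 takes its place, which is why the 19 neighbors are chosen from surface residues only rather than from all residues.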
3.3 Feature Vectors
Residue epitope propensity [29], conservation score [29], side-chain energy score [29], contact number [15], surface planarity score [30], and secondary structure composition [31] were used for antibody-binding site prediction. We previously used the first three terms for protein-protein interface prediction (PINUP [29]), which had the highest prediction accuracy in an independent study [32]. The last three terms have been used for antibody-binding site prediction by other researchers. We describe these six terms in detail in the following paragraphs.
Residue epitope propensity. The score of antibody-binding site propensity, E_propensity(i), is defined as

$$E_{\text{propensity}}(i) = \ln\left(\frac{P_r^{\text{Interface}}}{P_r^{\text{surface}}} \cdot \frac{S_r}{S_r^{\text{ave}}}\right) \qquad (1)$$
4 Notes
Acknowledgments
References
1. Parker JM, Guo D, Hodges RS (1986) New hydrophilicity scale derived from high-performance liquid chromatography peptide retention data: correlation of predicted surface residues with antigenicity and X-ray-derived accessible sites. Biochemistry 25(19):5425–5432
2. Emini EA, Hughes JV, Perlow DS et al (1985) Induction of hepatitis a virus-neutralizing antibody by a virus-specific synthetic peptide. J Virol 55(3):836–839
3. Karplus PA, Schulz GE (1985) Prediction of chain flexibility in proteins - a tool for the selection of peptide antigens. Naturwissenschaften 72(4):212–213
4. Kolaskar AS, Tongaonkar PC (1990) A semi-empirical method for prediction of antigenic determinants on protein antigens. FEBS Lett 276(1–2):172–174. https://doi.org/10.1016/0014-5793(90)80535-q
5. Larsen JE, Lund O, Nielsen M (2006) Improved method for predicting linear B-cell epitopes. Immunome Res 2:2. https://doi.org/10.1186/1745-7580-2-2
6. Saha S, Raghava GP (2006) Prediction of continuous B-cell epitopes in an antigen using recurrent neural network. Proteins 65(1):40–48. https://doi.org/10.1002/prot.21078
7. Chen J, Liu H, Yang J et al (2007) Prediction of linear B-cell epitopes using amino acid pair antigenicity scale. Amino Acids 33(3):423–428. https://doi.org/10.1007/S00726-006-0485-9
8. El-Manzalawy Y, Dobbs D, Honavar V (2008) Predicting linear B-cell epitopes using string kernels. J Mol Recognit 21(4):243–255. https://doi.org/10.1002/jmr.893
9. Blythe MJ, Flower DR (2005) Benchmarking B cell epitope prediction: underperformance of existing methods. Protein Sci 14(1):246–248. https://doi.org/10.1110/ps.041059505
10. Greenbaum JA, Andersen PH, Blythe M et al (2007) Towards a consensus on datasets and evaluation metrics for developing B-cell epitope prediction tools. J Mol Recognit 20(2):75–82. https://doi.org/10.1002/jmr.815
11. Sweredoski MJ, Baldi P (2009) COBEpro: a novel system for predicting continuous B-cell epitopes. Protein Eng Des Sel 22(3):113–120. https://doi.org/10.1093/protein/gzn075
12. Yang X, Yu X (2009) An introduction to epitope prediction methods and software. Rev Med Virol 19(2):77–96. https://doi.org/10.1002/rmv.602
13. MHV VR (1996) Mapping epitope structure and activity: from one-dimensional prediction
Abstract
Identifying protein antigenic epitopes recognizable by antibodies is a key step in discovering new immuno-diagnostic reagents and designing vaccines. To facilitate this process and improve its efficiency, computational methods have been developed to predict antigenic epitopes. For linear B-cell epitope prediction, many methods have been developed, including BepiPred, ABCPred, AAP, BCPred, BayesB, BEOracle/BROracle, BEST, and SVMTriP. Among these methods, SVMTriP, a frontrunner, uses a support vector machine (SVM) to combine tri-peptide similarity and propensity scores. Applied to non-redundant linear B-cell epitopes extracted from IEDB, SVMTriP achieved a sensitivity of 80.1% and a precision of 55.2% in five-fold cross-validation, with an AUC of 0.702. Combining the similarity and propensity of tri-peptide subsequences can improve the prediction performance for linear B-cell epitopes. A webserver based on this method was constructed for public use. The server and all datasets used in the corresponding study are available at http://sysbio.unl.edu/SVMTriP. This chapter describes the SVMTriP webserver.
1 Introduction
Antigenic epitopes are regions of the protein surface that are preferentially recognized by B-cell antibodies [1]. Prediction of antigenic epitopes is useful for investigating the mechanisms of the body's self-protection systems and can aid the design of vaccine components and immuno-diagnostic reagents [2].
Usually, B-cell antigenic epitopes are classified as either continuous or discontinuous. A continuous (also called linear) epitope is a consecutive fragment of the protein sequence, whereas a discontinuous epitope is composed of several fragments scattered along the protein sequence that nevertheless form an antigen-binding interface in 3D (see Note 1). Currently, the majority of available epitope prediction methods focus on continuous epitopes because of the relative simplicity of the problem and the convenience of available
300 Bo Yao et al.
2 Materials
2.1 Webserver
The SVMTriP webserver was developed for linear B-cell epitope prediction and is available at http://sysbio.unl.edu/SVMTriP/. Figure 1 displays the input page of the webserver, where users can cut and paste the protein sequence. The input required for SVMTriP is a query protein sequence in FASTA format (see Note 2) and the linear epitope length (see Note 3). The 20 standard amino acid characters are accepted, and any other characters are removed by the webserver (see Note 4). Only one sequence per run is allowed as input; if multiple protein sequences are entered, only the first one is processed. After the user types the correct four-letter word shown in an image (to prevent robot submissions; see Note 5) and clicks the “submit” button, a new page appears that acknowledges the successful submission and displays, in red, a URL for checking the prediction results (see Note 6).
The input sequence is first screened against a database of all previously received input sequences to check whether it has been predicted before. If it has, the existing results are returned directly. Otherwise, the protein sequence is passed on to the predictor running in the background, which screens the input sequence with a sliding window, generates feature vectors for each window, and finally uses an SVM classifier to score all candidates (see Note 7). The scores of all candidate sites are returned and displayed on the output page, shown in Fig. 2 (see Note 8). The results are permanently saved in the database, and users can access them with the URL obtained when they first submitted their input sequence.
SVMTriP: A Method to Predict B-Cell Linear Antigenic Epitopes 301
Fig. 1 The input window of the SVMTriP webserver. The only required input is the protein sequence, which can be copied and pasted into the main text box on this page. Name, Organization, and Email are optional.
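The input handling described above (keep only the first sequence, drop non-standard characters, and reuse earlier results for repeated sequences) can be sketched in Python as follows. This is an illustration, not the SVMTriP source code. It assumes that lower-case letters are treated as residues, `PredictionCache` is a hypothetical class, and keying results by a SHA-256 hash of the sequence is a design choice of this sketch.

```python
import hashlib
from typing import Callable, Dict, List

STANDARD_AA = set("ACDEFGHIKLMNPQRSTVWY")

def first_sequence(text: str) -> str:
    """First FASTA record (or raw sequence), keeping only the 20 standard residues."""
    residues: List[str] = []
    seen_header = False
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            if seen_header or residues:
                break  # a second record starts: only the first one is processed
            seen_header = True
            continue
        # Assumption: lower-case letters count as residues; other characters are dropped.
        residues.extend(ch for ch in line.upper() if ch in STANDARD_AA)
    return "".join(residues)

class PredictionCache:
    """Hypothetical result store: identical sequences are predicted only once."""

    def __init__(self, predictor: Callable[[str], List[float]]) -> None:
        self._predictor = predictor
        self._results: Dict[str, List[float]] = {}

    def predict(self, sequence: str) -> List[float]:
        key = hashlib.sha256(sequence.encode("ascii")).hexdigest()
        if key not in self._results:
            self._results[key] = self._predictor(sequence)
        return self._results[key]

query = first_sequence(">query1\nacd xz\nEF\n>query2\nGGGG")
print(query)  # ACDEF
```

A fixed-length hash keeps database keys compact for long proteins. Keying directly on the sequence string would behave identically in this in-memory sketch.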
Fig. 2 The result window of the SVMTriP webserver. All candidate sites for a given protein sequence are
displayed. All candidate sites in one group are ranked based on their predicted scores. The rank, location,
subsequence, and score for a given site are displayed
2.3 Downloading
The source code for the webserver, the compiled training/test datasets, and the trained SVM models used by the webserver are available for download at http://sysbio.unl.edu/SVMTriP/download.php.
3 Methods
3.1 SVM Package and Model Parameters
This webserver uses the SVM package SVMlight, implemented by Joachims (http://svmlight.joachims.org/) [23]. The cost parameter C and the RBF kernel parameter γ were optimized (see Note 10). In five-fold cross-validation, the five test results were used to calculate the mean values and 95% confidence intervals of the sensitivity, precision, and maximal F-measure (see Note 11).
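The evaluation statistics named above can be computed as in the Python sketch below. Sensitivity, precision, and the F-measure follow their standard definitions. The maximal F-measure is taken over thresholds equal to the observed scores. The 95% interval uses a t distribution with 4 degrees of freedom (t ≈ 2.776) for five folds. The exact interval method used for SVMTriP is not stated here, so the t-based interval is an assumption, and the scores in the example are hypothetical.

```python
import math
import statistics
from typing import List, Sequence, Tuple

def sensitivity_precision_f(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Sensitivity (recall), precision, and F-measure from confusion counts."""
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f = 2 * sensitivity * precision / (sensitivity + precision) if sensitivity + precision else 0.0
    return sensitivity, precision, f

def max_f_measure(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Largest F-measure over thresholds taken from the observed scores (label 1 = epitope)."""
    best = 0.0
    for threshold in set(scores):
        predicted = [s >= threshold for s in scores]
        tp = sum(1 for p, y in zip(predicted, labels) if p and y == 1)
        fp = sum(1 for p, y in zip(predicted, labels) if p and y == 0)
        fn = sum(1 for p, y in zip(predicted, labels) if not p and y == 1)
        best = max(best, sensitivity_precision_f(tp, fp, fn)[2])
    return best

def mean_with_95ci(fold_values: List[float]) -> Tuple[float, float, float]:
    """Mean and t-based 95% confidence interval for exactly five folds (t = 2.776, df = 4)."""
    if len(fold_values) != 5:
        raise ValueError("the t critical value below is only valid for five folds")
    mean = statistics.mean(fold_values)
    half_width = 2.776 * statistics.stdev(fold_values) / math.sqrt(len(fold_values))
    return mean, mean - half_width, mean + half_width

# Hypothetical scores and labels for one test fold.
print(max_f_measure([0.9, 0.8, 0.3, 0.1], [1, 1, 0, 0]))
```

Reporting the maximal F-measure avoids committing to a single decision threshold. The per-fold maxima can then be passed to `mean_with_95ci` to summarize the cross-validation.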
3.2 Feature Vectors
The tri-peptide subsequence space was used to encode the SVM attributes. This kernel had a space of 20³ (8,000) attributes for both tri-peptide substring and propensity. The score of the i-th attribute, K(i), is defined as the tri-peptide subsequence similarity kernel modulated by its corresponding tri-peptide propensity (Eq. 1):

$$K(i) = T(i)\,P(i) \qquad (1)$$

where K(i) denotes the score of the i-th attribute, T(i) denotes the
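The size of the attribute space follows from the 20 standard amino acids: 20 × 20 × 20 = 8,000 possible tri-peptides. The Python sketch below enumerates that space and builds one attribute vector for a peptide window as K(i) = T(i) · P(i). It is a simplification, not SVMTriP itself. The definition of T(i) is cut off in this text, so the sketch substitutes a plain count of tri-peptide i in the window. The propensity values and the helper name `feature_vector` are hypothetical.

```python
from itertools import product
from typing import Dict, List

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"
TRIPEPTIDES = ["".join(p) for p in product(ALPHABET, repeat=3)]  # 20**3 = 8000 attributes
INDEX = {tri: i for i, tri in enumerate(TRIPEPTIDES)}

def feature_vector(window: str, propensity: Dict[str, float]) -> List[float]:
    """K(i) = T(i) * P(i), with T(i) simplified to the count of tri-peptide i in the window."""
    counts = [0.0] * len(TRIPEPTIDES)
    for start in range(len(window) - 2):
        tri = window[start:start + 3]
        if tri in INDEX:  # skip tri-peptides containing non-standard characters
            counts[INDEX[tri]] += 1.0
    return [counts[i] * propensity.get(TRIPEPTIDES[i], 0.0) for i in range(len(TRIPEPTIDES))]

# Hypothetical propensity for a single tri-peptide; all others default to 0.0.
vec = feature_vector("AAAA", {"AAA": 0.5})
print(len(vec), vec[INDEX["AAA"]])
```

Most of the 8,000 entries are zero for any single window, so a production implementation would typically store these vectors in sparse form, as SVMlight's input format allows.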
4 Notes
Acknowledgments
References
1. Getzoff ED, Tainer JA, Lerner RA et al (1988) The chemistry and mechanism of antibody binding to protein antigens. Adv Immunol 43:1–98
2. Milich DR (1989) Synthetic T and B cell recognition sites: implications for vaccine development. Adv Immunol 45:195–282
3. Parker JM, Guo D, Hodges RS (1986) New hydrophilicity scale derived from high-performance liquid chromatography peptide retention data: correlation of predicted surface residues with antigenicity and X-ray-derived accessible sites. Biochemistry 25(19):5425–5432
4. Hopp TP, Woods KR (1981) Prediction of protein antigenic determinants from amino acid sequences. Proc Natl Acad Sci U S A 78(6):3824–3828
5. Emini EA, Hughes JV, Perlow DS et al (1985) Induction of hepatitis a virus-neutralizing antibody by a virus-specific synthetic peptide. J Virol 55(3):836–839
6. Pellequer JL, Westhof E, MHV V (1993) Correlation between the location of antigenic sites and the prediction of turns in proteins. Immunol Lett 36(1):83–100
7. Karplus PA, Schulz GE (1985) Prediction of chain flexibility in proteins - a tool for the selection of peptide antigens. Naturwissenschaften 72(4):212–213
8. Kolaskar AS, Tongaonkar PC (1990) A semi-empirical method for prediction of antigenic determinants on protein antigens. FEBS Lett 276(1–2):172–174
9. Vita R, Zarebski L, Greenbaum JA et al (2010) The immune epitope database 2.0. Nucleic Acids Res 38(Database issue):D854–D862. https://doi.org/10.1093/nar/gkp1004
10. Saha S, Bhasin M, Raghava GP (2005) Bcipep: a database of B-cell epitopes. BMC Genomics 6:79. https://doi.org/10.1186/1471-2164-6-79
11. Schonbach C, JLY K, Sheng X et al (2000) FIMM, a database of functional molecular immunology. Nucleic Acids Res 28(1):222–224
12. Larsen JE, Lund O, Nielsen M (2006) Improved method for predicting linear B-cell epitopes. Immunome Res 2:2. https://doi.org/10.1186/1745-7580-2-2
13. Saha S, GPS R (2006) Prediction of continuous B-cell epitopes in an antigen using recurrent neural network. Proteins 65(1):40–48. https://doi.org/10.1002/Prot.21078
14. Chen J, Liu H, Yang J et al (2007) Prediction of linear B-cell epitopes using amino acid pair antigenicity scale. Amino Acids 33(3):423–428. https://doi.org/10.1007/S00726-006-0485-9
15. El-Manzalawy Y, Dobbs D, Honavar V (2008) Predicting linear B-cell epitopes using string kernels. J Mol Recognit 21(4):243–255. https://doi.org/10.1002/Jmr.893
16. Yao B, Zhang L, Liang S et al (2012) SVMTriP: a method to predict antigenic epitopes using support vector machine to integrate tri-peptide similarity and propensity. PLoS One 7(9):e45152. https://doi.org/10.1371/journal.pone.0045152
17. Alix AJ (1999) Predictive estimation of protein linear epitopes by using the program PEOPLE. Vaccine 18(3–4):311–314
18. Odorico M, Pellequer JL (2003) BEPITOPE: predicting the location of continuous epitopes and patterns in proteins. J Mol Recognit 16(1):20–22. https://doi.org/10.1002/jmr.602
19. Wee LJ, Simarmata D, Kam YW et al (2010) SVM-based prediction of linear B-cell epitopes using Bayes feature extraction. BMC Genomics 11(Suppl 4):S21. https://doi.org/10.1186/1471-2164-11-S4-S21
20. Wang Y, Wu W, Negre NN et al (2011) Determinants of antigenicity and specificity in immune response for protein sequences. BMC Bioinformatics 12:251. https://doi.org/10.1186/1471-2105-12-251
21. Gao J, Faraggi E, Zhou Y et al (2012) BEST: improved prediction of B-cell epitopes from antigen sequences. PLoS One 7(6):e40104. https://doi.org/10.1371/journal.pone.0040104
Abstract
Phage–bacteria interaction is a classic example of competitive coevolution in nature. Mathematical modeling of such interactions furnishes new insight into the dynamics of phage and bacteria. Besides its intrinsic value, a somewhat underutilized aspect of such insight is that it can inform better experimental design. In this chapter, we discuss several modeling techniques that can be used to study the dynamics between phages and their host bacteria. Monte Carlo simulations and differential equations (both ordinary and delay differential equations) can be used to model phage–bacteria dynamics in well-mixed populations. Spatial restrictions in the interaction medium significantly affect the dynamics of phage–bacteria interactions. In such cases, techniques such as cellular automata and reaction–diffusion equations can capture these effects adequately. We discuss the modeling techniques in detail with specific examples.
Key words Phage–bacteria interactions, Monte Carlo simulations, Differential equations, Reaction–diffusion equations, Cellular automata
1 Introduction
1.1 Phage–Bacteria Dynamics
Bacteriophages are viruses that depend on host bacterial cells for their metabolism and reproduction. In nature, the distribution of bacteriophages depends on the distribution of their host cell populations, because phages replicate inside host cells. A phage particle consists chiefly of nucleic acid, its genetic material, enclosed in an outer protein coat. The genetic material of a phage can be either DNA or RNA, in single-stranded or double-stranded form. DNA phages have a much simpler replication cycle than RNA phages. The overall size of a phage particle varies between 20 and 200 nm in length or diameter, depending on its shape [1].
The replication cycle of a phage particle can be broadly classified into two types: lytic and lysogenic. The lytic cycle is much simpler than the lysogenic cycle. At the beginning of a typical lytic cycle, a phage particle adsorbs onto the host cell surface. This adsorption is highly specific and depends on both the host and the phage.
310 Saptarshi Sinha et al.
A series of host surface receptors have been identified in various hosts, such as E. coli, Salmonella, and Mycobacterium. The receptor molecules vary in their biochemical composition depending on the host cell [2]. Figure 1 shows the lytic and lysogenic cycles of a phage.
After a phage particle adsorbs onto the bacterial cell surface, the phage genetic material enters the host cell. This step is followed by replication of the phage components and packaging of the phage genetic material inside the capsid. Finally, the newly formed phage particles are released from the host cell by bursting or budding. In this chapter, we mainly consider the lytic cycle in phage–bacteria dynamics.
In the lysogenic cycle, by contrast, the phage genetic material is integrated into the host chromosome after penetration; the integrated phage genome is called a prophage. It remains within the host chromosome for several generations before re-entering the lytic cycle. In the prophage state, the phage genetic material replicates along with the host chromosome.
1.2 Phage–Bacteria Coevolution
Phages and their host bacteria engage in competitive interactions in natural environments [3, 4]. During their coevolution, bacterial cells evolve phage resistance mechanisms, which are host specific and highly diverse. These mechanisms include (1) modification of surface-bound receptors, which disrupts phage adsorption, (2) restriction enzyme modifications, which phage particles may be unable to resist, and (3) modification of metabolic enzymes, which can inhibit phage propagation.
Modeling Phage–Bacteria Dynamics 311
1.3.1 Models Without Spatial Restriction
When we consider phage–bacteria dynamics in liquid media, every phage particle is, in principle, free to interact with any of the host bacterial cells present in the medium [7]. Spatial restriction is minimal because the viscosity of the medium is usually low. Moreover, in the laboratory the reaction mixture is generally incubated with shaking at 120–180 revolutions per minute (rpm), which creates a well-mixed environment during the interaction time. In such situations, experimental results are based on the numbers of host cells and phage particles at particular time points. We can experimentally measure parameters such as the host growth rate and the phage adsorption rate, as well as the burst size, which depends on the specific phage–bacteria system. To model such scenarios, we can use either Monte Carlo simulations or a set of differential equations.
1.3.2 Models with Spatial Restriction
When discussing spatial restriction in phage–bacteria interactions, we mainly focus on interactions in solid or semi-solid media. Such interactions occur in both laboratory and natural environments. The clear zone formed on an opaque bacterial lawn by phage propagation is called a plaque. Plaque counts, expressed as plaque-forming units (PFU), are used in the laboratory to quantify phages. Plaque formation is used not only to quantify phages but also to study phage–bacteria dynamics on solid media.
In this method, a phage sample is diluted and mixed with 0.4–0.8% agar, called soft or top agar. A suspension of host bacteria, which act as indicator bacteria, is also added to this phage–agar mixture. The combination is then poured onto a 1.8–2% hard agar plate and incubated overnight at a suitable temperature. After incubation, clear plaques are visible on the hazy lawn of indicator bacteria.
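Converting a plaque count into a phage titer is simple arithmetic. The sketch below assumes the usual relation titer (PFU/mL) = plaque count / (dilution × volume plated in mL); the function name and the example numbers are illustrative and are not taken from this chapter.

```python
def phage_titer(plaques: int, dilution: float, volume_ml: float) -> float:
    """Estimate a phage titer in PFU/mL from one countable plate.

    plaques   -- number of plaques counted on the plate
    dilution  -- dilution of the sample that was plated (e.g. 1e-6)
    volume_ml -- volume of the diluted sample mixed into the top agar (mL)
    """
    if plaques < 0 or dilution <= 0 or volume_ml <= 0:
        raise ValueError("plaques must be >= 0; dilution and volume must be > 0")
    return plaques / (dilution * volume_ml)

# Hypothetical plate: 150 plaques from 0.1 mL of a 10^-6 dilution
print(f"{phage_titer(150, 1e-6, 0.1):.2e} PFU/mL")  # 1.50e+09 PFU/mL
```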
Plaque morphology depends on various parameters, which should be incorporated into a theoretical model. These include agar concentration, phage particle diffusion in soft agar, the mechanism of phage release, and host cell density, all of which influence plaque diameter. The size and appearance of plaques depend not only on the specific host–phage system but also, in a complex manner, on the parameters mentioned above. Figure 3 shows plaque formation on agar and the various factors controlling plaque diameter.
Adsorption events inside the soft agar depend on agar density, phage diffusion rate, adsorption rate, host cell density, and latent period. In a host-crowded environment, the adsorption rate is high because many host cells are available for phage adsorption. In dense agar, however, it is difficult for a phage particle to diffuse toward a host cell. Thus, agar density has a negative effect on phage adsorption. After the first round of infection, the burst size plays an essential role in determining the
Fig. 3 Plaque formation on agar and various factors controlling plaque diameter
2 Methods
2.1 Monte Carlo Simulations
Among simulation techniques, Monte Carlo is one of the most widely applied, used in fields as diverse as economics and biology [8].
Let us consider a simple example of a Monte Carlo simulation. Suppose we throw two dice together, each of which can show a value from one to six. We want to calculate the probability of obtaining a particular total, that is, the sum of the values shown by the two dice.
There are 36 possible combinations when two dice are cast together. We can calculate the probability of a particular outcome as represented in Fig. 4a. For example, consider a combined score of 5. This score can be obtained with a pair of dice in the following ways: (1,4), (2,3), (3,2), and (4,1). The probability of scoring 5 is, therefore, 4/36 ≈ 0.111.
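The counting argument above can be checked by brute-force enumeration of all 36 outcomes. A minimal Python sketch (the function name is hypothetical) that keeps the probabilities as exact fractions:

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def exact_sum_distribution(faces: int = 6) -> dict[int, Fraction]:
    """Enumerate every outcome of two fair dice and return P(sum) exactly."""
    counts = Counter(a + b for a, b in product(range(1, faces + 1), repeat=2))
    total = faces * faces  # 36 combinations for two six-sided dice
    return {s: Fraction(c, total) for s, c in sorted(counts.items())}

dist = exact_sum_distribution()
print(dist[5], float(dist[5]))  # 1/9 0.1111111111111111
```

Fraction reduces 4/36 to 1/9 automatically, which avoids any rounding in the comparison with simulated estimates.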
Computationally, we can simulate the same experiment repeatedly by generating two mutually independent random integers in the range 1–6 and adding them. Repeating this many times (a Monte Carlo simulation) gives an approximate value of the probability of any given score with two dice. The outcome for 10,000 trials is represented in Fig. 4b. The estimate of this probability improves as the number of trials increases.
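The same distribution can be estimated by Monte Carlo sampling. A minimal sketch, assuming fair six-sided dice and Python's seeded `random.Random` generator; the seed and the trial counts are arbitrary choices, so the printed estimates will differ slightly from those in Fig. 4b.

```python
import random
from collections import Counter

def monte_carlo_sum_distribution(trials: int = 10_000, seed: int = 42) -> dict[int, float]:
    """Estimate P(sum) for two dice from repeated random throws."""
    rng = random.Random(seed)  # seeded generator for reproducible runs
    counts = Counter(rng.randint(1, 6) + rng.randint(1, 6) for _ in range(trials))
    return {s: counts[s] / trials for s in range(2, 13)}

# The estimate for a score of 5 approaches 4/36 as the number of trials grows
for n in (100, 10_000, 100_000):
    est = monte_carlo_sum_distribution(n)
    print(n, round(est[5], 4), "exact:", round(4 / 36, 4))
```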
Beyond the present discussion, Monte Carlo simulations are used in the same way in finance, for example, to perform risk analysis from such distributions for given scenarios.
Fig. 4 (a) Probability of obtaining various outcomes when a pair of dice is thrown simultaneously.
(b) The probability distribution for 10,000 trials
2.1.4 Latent Period and Burst Size
Two other parameters need to be considered in phage–bacteria dynamics, namely, the latent period and the burst size. As described for the lytic cycle, after penetration, phages require some time to synthesize and assemble new particles. The time between infection and the release of new phage particles is referred to as the latent period. Phage growth is typically characterized by a one-step growth curve. Burst size, on the other hand, is the number of new phage particles released from an infected cell after a successful infection. We include both parameters in our Monte Carlo simulations.
2.1.5 Algorithm for Monte Carlo Simulations
An outline of the Monte Carlo algorithm for simulating a typical lytic cycle in phage–bacteria dynamics is shown in Fig. 5. Here, two pseudo-random numbers (R1, R2) are generated from a uniform distribution, using two different “seeds,” for every host cell at every time point. If R1 is less than the adsorption probability mentioned earlier, the host cell is infected by a phage particle. If no infection occurs, then we need
Fig. 5 Algorithm for Monte Carlo simulation of phage–bacteria dynamics. Here, U, P, and I denote the numbers of uninfected host cells, free phages, and infected host cells, respectively. Dp and Ap denote the probabilities of host cell division and phage adsorption, respectively. Lp is the latent period, and R1 and R2 are random numbers drawn from a uniform distribution
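The outline in Fig. 5 can be turned into a small simulation. Because the description of the branch taken when no infection occurs is incomplete here, the sketch below assumes that (i) an uninfected cell is infected when R1 < Ap and free phage remain, consuming one phage; (ii) otherwise it divides when R2 < Dp; and (iii) an infected cell lyses exactly Lp time steps after infection, releasing `burst` phages. All names are hypothetical; this illustrates the scheme under those assumptions and is not the authors' implementation.

```python
import random

def simulate_lytic_mc(U0, P0, Ap, Dp, Lp, burst, steps, seed1=1, seed2=2):
    """Toy Monte Carlo sketch of a lytic cycle (hypothetical rules, see text).

    U0, P0 -- initial numbers of uninfected hosts and free phages
    Ap, Dp -- per-step probabilities of infection and of division
    Lp     -- latent period in time steps; burst -- phages released per lysis
    Returns a list of (t, U, P, I) tuples.
    """
    rng1 = random.Random(seed1)  # stream for R1 (infection test)
    rng2 = random.Random(seed2)  # stream for R2 (division test)
    U, P = U0, P0
    timers = []                  # remaining latent time of each infected cell
    history = [(0, U, P, 0)]
    for t in range(1, steps + 1):
        infected_now, divided = 0, 0
        for _ in range(U):       # every uninfected host at this time point
            R1, R2 = rng1.random(), rng2.random()
            if P > 0 and R1 < Ap:
                infected_now += 1
                P -= 1           # assumption: one free phage consumed per infection
            elif R2 < Dp:
                divided += 1     # assumption: cells escaping infection may divide
        U += divided - infected_now
        timers = [tm - 1 for tm in timers]          # advance latent clocks
        lysed = sum(1 for tm in timers if tm <= 0)  # cells reaching the end of Lp
        timers = [tm for tm in timers if tm > 0]
        P += burst * lysed                          # release of progeny phages
        timers.extend([Lp] * infected_now)          # newly infected cells start their clock
        history.append((t, U, P, len(timers)))
    return history

for t, U, P, I in simulate_lytic_mc(U0=100, P0=10, Ap=0.05, Dp=0.02,
                                    Lp=5, burst=20, steps=30)[::5]:
    print(t, U, P, I)
```

Using two independently seeded generators mirrors the two “seeds” mentioned in the text, and drawing both R1 and R2 for every cell keeps the two random streams aligned between runs.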
$$\dot{P}(t) = b\,k\,S(t-\tau)\,P(t-\tau) - \mu_p P(t) - a P(t)$$
The above equations determine the rates of change of the concentrations of bacteria and free phage, respectively, with respect to time. Let us first discuss the parameters affecting the concentration of bacteria. ‘α’ corresponds to the growth rate of susceptible bacteria, and ‘C’ is the carrying capacity of the bacterial population. The rate of adsorption of free phage particles to bacterial cells is denoted by ‘k’. Phage–bacteria infection is modeled on the principle of mass action: in a well-mixed population, the rate at which two populations interact is directly proportional to the product of their sizes. Note that adsorption is considered an irreversible process.
The constant removal rate of susceptible bacteria and free phage is
denoted by ‘a’. Turning to the free phage particles, an infected cell is lysed after a fixed time called the latent period, denoted by ‘τ’. Upon lysis, each infected bacterium releases ‘b’ new free phage particles; ‘b’ is known as the burst size. Thus, the number of free phage particles produced at time ‘t’ depends on the numbers of interacting susceptible bacterial cells and free phage particles at time (t − τ). The rate of spontaneous decay or inactivation of free phage particles is denoted by ‘μp’. For simplicity, all rate parameters in this model are considered constant. The variables defined in this DDE model keep the same meaning in the later sections of this chapter; new parameters are explained as they are introduced.
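Because of the delay, the phage equation needs the state at t − τ, which a fixed-step integrator can read from its stored history. In the sketch below, the equation for S is an assumption assembled from the parameters defined above (logistic growth, mass-action adsorption, removal at rate a), since only the equation for P is reproduced here. The sketch also assumes that no infections occurred before t = 0, so the delayed term is zero for t < τ. Forward Euler is used for clarity; a dedicated DDE solver would be preferable in practice, and all names are hypothetical.

```python
def simulate_phage_dde(alpha, C, k, a, b, tau, mu_p, S0, P0, t_end, dt=1e-3):
    """Forward-Euler sketch of a Campbell-type delay model.

    Assumed:   dS/dt = alpha*S*(1 - S/C) - k*S*P - a*S
    From text: dP/dt = b*k*S(t-tau)*P(t-tau) - mu_p*P - a*P
    History:   no infections before t = 0, so the delayed term is 0 for t < tau.
    """
    lag = int(round(tau / dt))   # delay expressed as a number of steps
    n = int(round(t_end / dt))
    S, P = [float(S0)], [float(P0)]
    for i in range(n):
        delayed = b * k * S[i - lag] * P[i - lag] if i >= lag else 0.0
        dS = alpha * S[i] * (1 - S[i] / C) - k * S[i] * P[i] - a * S[i]
        dP = delayed - mu_p * P[i] - a * P[i]
        S.append(max(S[i] + dt * dS, 0.0))  # clip tiny negative overshoots
        P.append(max(P[i] + dt * dP, 0.0))
    times = [i * dt for i in range(n + 1)]
    return times, S, P

# Illustrative parameter values only
t, S, P = simulate_phage_dde(alpha=0.7, C=1e9, k=1e-9, a=0.05, b=100,
                             tau=0.5, mu_p=0.01, S0=1e6, P0=1e4, t_end=24)
print(f"t = {t[-1]:.1f}: S = {S[-1]:.3g}, P = {P[-1]:.3g}")
```

Storing the whole trajectory makes the lookup S[i − lag] trivial; for long runs, a ring buffer of length lag + 1 would suffice.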
2.3 Reaction–Diffusion Equations
We have described the use of ODEs and DDEs to understand phage–bacteria interactions in well-mixed environments. However, most bacteria exist in the form of biofilms. Thus, for a better understanding of phage–bacteria ecology, we need methods to study the growth of phage populations in spatially constrained environments, where, unlike in well-mixed environments, the interaction between bacterial and phage populations is subject to spatial limitation.
In the laboratory, such spatially constrained conditions are easily realized in agar gels, which also provide a simpler setup than biofilms for studying phage growth in spatially constrained environments. As discussed above, the factors responsible for phage growth within semi-solid media are infection of bacterial cells, phage particle diffusion, and phage-induced lysis of bacterial cells [14]. Koch first modeled plaque formation, or phage growth in semi-solid media, in 1964 [15]. The plaque enlargement rate, ‘r’, was estimated to be proportional to $(D/L)^{1/2}$, i.e.,
$$r = c\left(\frac{D}{L}\right)^{1/2}$$
where ‘D’ is the phage diffusion coefficient, ‘L’ is the latent period, and ‘c’ is the binding constant of phages [15]. A more mechanistic approach is to construct a set of reaction–diffusion equations to model plaque growth [16]. Three interacting populations were considered in this model, namely, host bacteria (B), infected host bacteria (I), and free phage particles (V), related by
$$V + B \underset{k_{-1}}{\overset{k_1}{\rightleftharpoons}} I \xrightarrow{k_2} Y \cdot V$$
bacterial cell. ‘Y’ is the burst size per lysed bacterial cell. The
resulting set of equations is as follows [16]:
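$$\frac{\partial [V]}{\partial t} = D\,\frac{\partial^2 [V]}{\partial r^2} + \frac{D}{r}\,\frac{\partial [V]}{\partial r} - k_1[V][B] + k_{-1}[I] + Y k_2 [I]$$

$$\frac{\partial [B]}{\partial t} = -k_1[V][B] + k_{-1}[I]$$

$$\frac{\partial [I]}{\partial t} = k_1[V][B] - k_{-1}[I] - k_2[I]$$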
where ‘D’ is the phage diffusion coefficient. Host cells were assumed not to diffuse. Since plaques are considered radially symmetric, the equations were formulated in polar coordinates as functions of time, ‘t’, and radial position, ‘r’, with boundary conditions given by:
$$r\,\frac{\partial [V]}{\partial r} = 0 \quad \text{at } r = 0,$$

$$[V] = 0,\quad [B] = 0,\quad [I] = 0 \quad \text{as } r \to \infty.$$
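A minimal explicit finite-difference sketch of this radial system is shown below. It assumes a finite dish of radius R with [V] = 0 imposed at r = R in place of the condition at infinity (only [V] needs a boundary condition here because B and I do not diffuse), the symmetry condition at r = 0, a uniform initial lawn B0, and a hypothetical phage inoculum V0 inside a small disk of radius r0. Parameter names are illustrative (`km1` stands for k−1), and the time step is chosen conservatively for the explicit diffusion step; strongly reactive parameter sets would need a smaller step.

```python
import numpy as np

def simulate_plaque(D, k1, km1, k2, Y, B0, V0, r0=0.05, R=1.0, nr=100, t_end=0.1):
    """Explicit finite-difference sketch of the radial plaque model.

    Returns the grid r and the final [V], [B], [I] profiles.
    """
    dr = R / nr
    r = np.linspace(0.0, R, nr + 1)
    dt = 0.2 * dr**2 / D                   # conservative step for explicit diffusion
    V = np.where(r < r0, float(V0), 0.0)   # hypothetical central phage inoculum
    B = np.full(nr + 1, float(B0))         # uniform bacterial lawn
    I = np.zeros(nr + 1)
    for _ in range(int(np.ceil(t_end / dt))):
        lap = np.empty_like(V)
        # interior points: V_rr + (1/r) V_r with central differences
        lap[1:-1] = ((V[2:] - 2 * V[1:-1] + V[:-2]) / dr**2
                     + (V[2:] - V[:-2]) / (2 * dr * r[1:-1]))
        # r = 0: symmetry (dV/dr = 0) gives V_rr + V_r/r -> 2 V_rr = 4 (V[1] - V[0]) / dr^2
        lap[0] = 4 * (V[1] - V[0]) / dr**2
        lap[-1] = 0.0
        bind = k1 * V * B        # V + B -> I
        unbind = km1 * I         # I -> V + B
        lysis = k2 * I           # I -> Y V
        V = V + dt * (D * lap - bind + unbind + Y * lysis)
        B = B + dt * (-bind + unbind)
        I = I + dt * (bind - unbind - lysis)
        V[-1] = 0.0              # far-field condition [V] -> 0, imposed at r = R
    return r, V, B, I

r, V, B, I = simulate_plaque(D=1.0, k1=5.0, km1=0.1, k2=2.0, Y=20.0, B0=1.0, V0=1.0)
print(np.round(B[::20], 3))      # remaining lawn density at a few radii
```

An explicit scheme keeps the correspondence between code and equations transparent; an implicit or method-of-lines solver would allow larger time steps.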
2.4.1 2D Cellular Automata
To describe the basic process of 2D cellular automata, consider an infinitely large sheet of graph paper in which each small square is defined as a cell. Each cell is in one of two definite states (0 or 1), and every cell has neighboring cells, which are defined by a neighborhood scheme. In 2D cellular automata, two neighborhood schemes are mainly considered: the von Neumann neighborhood and the Moore neighborhood. The position of each cell is determined by its coordinates [18]. Now if we consider the coordinate of the central cell at
Fig. 6 Different neighborhood systems in cellular automata: (a) radial neighborhood, (b) Moore neighborhood, and (c) von Neumann neighborhood
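The two main neighborhood schemes can be written as lists of coordinate offsets. The sketch below uses range-1 neighborhoods on a finite grid without wrap-around, as a stand-in for infinite graph paper, and applies a hypothetical spreading rule (a cell in state 0 switches to 1 if any neighbor is 1). The rule is chosen only to show how the neighborhood choice changes the resulting pattern; it is not a model from the cited studies, and the radial neighborhood of Fig. 6a is not implemented.

```python
VON_NEUMANN = [(-1, 0), (1, 0), (0, -1), (0, 1)]    # 4 orthogonal neighbors
MOORE = [(dx, dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1)
         if (dx, dy) != (0, 0)]                      # 8 surrounding neighbors

def neighbors(x, y, offsets, width, height):
    """In-grid neighbor coordinates of cell (x, y); no wrap-around at the edges."""
    return [(x + dx, y + dy) for dx, dy in offsets
            if 0 <= x + dx < width and 0 <= y + dy < height]

def step(grid, offsets):
    """One synchronous update under a hypothetical spreading rule."""
    height, width = len(grid), len(grid[0])
    new = [row[:] for row in grid]   # read old states, write new ones
    for y in range(height):
        for x in range(width):
            if grid[y][x] == 0 and any(grid[ny][nx] == 1
                                       for nx, ny in neighbors(x, y, offsets, width, height)):
                new[y][x] = 1
    return new

grid = [[0] * 5 for _ in range(5)]
grid[2][2] = 1                        # single occupied cell in the center
for name, offsets in (("von Neumann", VON_NEUMANN), ("Moore", MOORE)):
    g = step(grid, offsets)
    print(name, sum(map(sum, g)))     # von Neumann 5, Moore 9
```

Copying the grid before updating makes the update synchronous, so every cell sees the states from the previous time step.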
3 SIR-Type Modeling
4 Notes
4.1 Monte Carlo Simulations
Monte Carlo simulations have been used to study mycobacteriophage–mycobacteria infection. Here we discuss how this simulation method revealed an alternative killing mechanism of mycobacteria that is distinct from direct phage-mediated killing of the host cell [19]. The initial simulation of phage dynamics matched the experimental results closely. However, the simulated host cell counts were at variance with the experimental results. This difference could be accounted for by incorporating a secondary, host cell density-dependent killing factor into the simulation. It was hypothesized that, at the time of burst, infected cells produce this unknown secondary killing factor, which causes lysis of neighboring uninfected cells. It was later found experimentally that reactive oxygen species (ROS) were indeed produced by infected cells. This ROS was often
Fig. 8 The secondary killing mechanism of mycobacteria through ROS generation during phage infection
4.2 Ordinary Differential Equations
Phage–bacteria dynamics have been modeled using ODEs in which the degradation rate, ‘μi’, of infected bacterial cells was introduced [20]; the number of infected bacteria is denoted by ‘I’. The model demonstrates the coexistence of bacterial hosts and phages. It is as follows:
$$\dot{S}(t) = \alpha S(t)\left(1 - \frac{S(t)}{C}\right) - k S(t) P(t)$$

$$\dot{I}(t) = k S(t) P(t) - \mu_i I(t)$$

$$\dot{P}(t) = b I(t) - \mu_p P(t)$$
It was shown that there is a stable equilibrium, whose existence depends on the carrying capacity of the bacterial population, at which phage and host coexist. However, if the carrying capacity of the host population, ‘C’, satisfies C < μiμp/(bk), the phage population goes extinct along with the infected host cells [20].
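The extinction threshold follows from the equilibrium conditions. Setting dI/dt = 0 and dP/dt = 0 with P ≠ 0 gives I = μpP/b and kSP = μiI, so the host density at equilibrium is S* = μiμp/(bk); phage can persist only if the host population can reach S*, that is, if C > S*. The sketch below integrates the three equations with a classical fourth-order Runge–Kutta scheme, using illustrative parameter values (not taken from [20]) chosen so that C < S*. Function names are hypothetical.

```python
def phage_ode_rhs(state, alpha, C, k, mu_i, b, mu_p):
    """Right-hand side (dS/dt, dI/dt, dP/dt) of the S-I-P model above."""
    S, I, P = state
    dS = alpha * S * (1 - S / C) - k * S * P
    dI = k * S * P - mu_i * I
    dP = b * I - mu_p * P
    return (dS, dI, dP)

def integrate_rk4(state, params, dt, steps):
    """Classical fixed-step fourth-order Runge-Kutta integration."""
    for _ in range(steps):
        r1 = phage_ode_rhs(state, *params)
        r2 = phage_ode_rhs([x + 0.5 * dt * d for x, d in zip(state, r1)], *params)
        r3 = phage_ode_rhs([x + 0.5 * dt * d for x, d in zip(state, r2)], *params)
        r4 = phage_ode_rhs([x + dt * d for x, d in zip(state, r3)], *params)
        state = tuple(x + dt / 6 * (d1 + 2 * d2 + 2 * d3 + d4)
                      for x, d1, d2, d3, d4 in zip(state, r1, r2, r3, r4))
    return state

def phage_persistence_threshold(mu_i, mu_p, b, k):
    """Host density S* = mu_i * mu_p / (b * k) needed for phage persistence."""
    return mu_i * mu_p / (b * k)

# Illustrative values: S* = 10 > C = 5, so phage and infected cells should die out
params = (1.0, 5.0, 0.01, 1.0, 10.0, 1.0)   # alpha, C, k, mu_i, b, mu_p
S, I, P = integrate_rk4((5.0, 0.0, 1.0), params, dt=0.01, steps=20_000)
print(phage_persistence_threshold(1.0, 1.0, 10.0, 0.01), S, I, P)
```

Raising C above S* in `params` (e.g., to 100) lets the phage invade instead, which is a quick way to see the threshold at work.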
Incorporating the host response against both bacteria and phage into the ODE system provides significant insights into phage–bacteria interactions [21]. The outcome of therapy was shown to depend on several density-dependent thresholds.
4.3 Delay Differential Equations
The model proposed by Campbell was further extended by including infected bacteria in the interacting population to study phage–bacteria dynamics [25]. Lenski and Levin incorporated resource concentrations into the system of DDEs to describe the evolutionary constraints on E. coli and a virulent phage in a chemostat [26]. A heterogeneous population of bacterial cells has also been considered [27], in which the population is divided according to the number of receptors on each bacterial cell wall. Cells with the same number of receptors are considered to have similar sensitivity to phage infection. It was observed that heterogeneity makes the bacterial population robust, allowing it to survive under strong phage pressure. In contrast, homogeneous bacterial populations lead to the extinction of susceptible bacteria in the long run.
It is well known that bacteria can develop a degree of resistance against an interacting phage due to coevolution. Many studies have included this characteristic of bacteria when modeling the population dynamics of phage and bacteria. One of these calculated the bacterial
$$\frac{dI}{dt} = \underbrace{rS(t)P(t)}_{\substack{\text{infected cells}\\ \text{due to adsorption}}} - \underbrace{m\,rS(t-\tau)P(t-\tau)\,\mathrm{Heavi}(t-\tau)}_{\substack{\text{fraction of cells lysed}\\ \text{(infected cell population decay due to lysis)}}}$$
$$\frac{dP}{dt} = \underbrace{b}_{\text{burst size}}\;\underbrace{m\,rS(t-\tau)P(t-\tau)\,\mathrm{Heavi}(t-\tau)}_{\substack{\text{fraction of cells lysed resulting}\\ \text{in new phages}}} - \underbrace{rS(t)P(t)}_{\text{phage decay due to adsorption}}$$
However, the factor ‘m’ alone cannot account for the experimentally observed numbers of phage and bacteria. Only by incorporating an additional parameter, ‘q’, can the experimental observations be reproduced. This parameter comes into play only after lysis following the primary infection. The predicted effects of this “secondary killing factor,” represented mathematically by ‘q’, have been verified experimentally [19].
4.4 Reaction–Diffusion Equations
After Koch’s model of plaque enlargement, various other models have been proposed that shed light on the factors crucial for plaque enlargement [30–32].
Time delay has been included in the set of reaction–diffusion equations [33]. Numerical analysis of this model yields an approximate plaque growth rate that is closer to experimentally observed values. A solution is also obtainable for the following set of equations [33]:
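$$\frac{\tau}{2}[V]_{tt} + [V]_t = D_{\mathrm{eff}}[V]_{rr} - k_1\left\{[V][B] + \frac{\tau}{2}\left([V][B]\right)_t\right\} + Y k_2\left\{[I]\left(1 - \frac{[I]}{[I]_{\max}}\right) + \frac{\tau}{2}\left[[I]\left(1 - \frac{[I]}{[I]_{\max}}\right)\right]_t\right\}$$

$$[B]_t = -k_1[V][B]$$

$$[I]_t = k_1[V][B] - k_2[I]\left(1 - \frac{[I]}{[I]_{\max}}\right)$$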
Here, [. . .] denotes concentration. The subscripts [. . .]rr and [. . .]tt denote second-order derivatives with respect to position and time, respectively, and [. . .]t denotes the first-order time derivative. ‘τ’ is the latent period, and the remaining variables are as defined previously. ‘Deff’ is the hindered diffusion coefficient, which is related to the diffusion coefficient ‘D’ by the following equation:
$$D_{\mathrm{eff}} = D\,\frac{1 - f}{1 + f x}$$
where ‘x’ accounts for bacterium shape and f = B0/Bmax, i.e., the ratio of the bacterial concentration to its maximum possible value.
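The hindered-diffusion relation is easy to tabulate. A minimal sketch implementing D_eff = D(1 − f)/(1 + fx) as written above; the function name and the shape factor value x = 2 used in the example are arbitrary illustrations.

```python
def hindered_diffusion(D: float, f: float, x: float) -> float:
    """D_eff = D * (1 - f) / (1 + f * x), with f = B0/Bmax and x a shape factor."""
    if not 0.0 <= f <= 1.0:
        raise ValueError("f = B0/Bmax must lie in [0, 1]")
    return D * (1.0 - f) / (1.0 + f * x)

# D_eff falls from D (no bacteria) to 0 (maximally packed lawn)
for f in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(f, hindered_diffusion(D=1.0, f=f, x=2.0))
```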
These models were combined with Koch’s plaque growth model to derive simplified estimates of the optimal phage latent period that maximizes plaque size [14].
4.5 Cellular Automata
Cellular automata have been applied in combination with partial differential equations (PDEs) to model phage growth in a bacterial biofilm [37]. This model concludes that the steady state of the mixed phage–bacteria population in a biofilm depends on nutrient availability. Simple cellular automata techniques were introduced to model spatial heterogeneity in phage–bacteria dynamics on the biofilm surface [38]. These models describe the coexistence of phage and bacteria in spatially restricted situations reasonably well.
References
18. Ermentrout GB, Edelstein-Keshet L (1993) Cellular automata approaches to biological modelling. J Theor Biol 160:97–133
19. Samaddar S et al (2016) Dynamics of mycobacteriophage-mycobacterial host interaction: evidence for secondary mechanisms for host lethality. Appl Environ Microbiol 82:124–133
20. Bremermann HJ (1983) Parasites at the origin of life. J Math Biol 16:165–180
21. Payne RJH, Jansen VAA (2001) Understanding bacteriophage therapy as a density-dependent kinetic process. J Theor Biol 208:37–48
22. Smith HL, Trevino RT (2009) Bacteriophage infection dynamics: multiple host binding sites. Math Model Nat Phenom 4:111–136
23. Wang X, Wang J (2017) Modelling the within-host dynamics of cholera: bacterial–viral interaction. J Biol Dyn 11:484–501
24. Holguin AV et al (2019) Host resistance, genomics and population dynamics in a salmonella enteritidis and phage system. Viruses 11:188
25. Levin BR, Stewart FM, Chao L (1977) Resource-limited growth, competition and predation: a model and experimental studies with bacteria and bacteriophage. Am Nat 111:3–24
26. Lenski RE, Levin BR (1985) Constraints on the coevolution of bacteria and virulent phage: a model, some experiments, and predictions for natural communities. Am Nat 125:585–602
27. Chapman-McQuiston E, Wu XL (2008) Stochastic receptor expression allows sensitive bacteria to evade phage attack. Part II: theoretical analyses. Biophys J 94:4537–4548
28. Cairns BJ et al (2009) Quantitative models of in vitro bacteriophage-host dynamics and their application to phage therapy. PLoS Pathog 5(1):e1000253
29. Han Z, Smith HL (2012) Bacteriophage-resistant and bacteriophage-sensitive bacteria in a chemostat. Math Biosci Eng 9:737–765
30. Da K et al (1981) Appendix: a model of plaque formation. Gene 13:221–225
31. Lee Y, Eisner SD, Yin J (1997) Antiserum inhibition of propagating viruses. Biotechnol Bioeng 55:542–546
32. You L, Yin J (1999) Amplification and spread of viruses in a growing plaque. J Theor Biol 200:365–373
33. Ortega-Cejas V et al (2004) Approximate solution to the speed of spreading viruses. Phys Rev E 69:031909
34. Gourley SA, Kuang Y (2005) A delay reaction-diffusion model of the spread of bacteriophage infection. SIAM J Appl Math 65:550–566
35. Jones DA, Smith HL (2011) Bacteriophage and bacteria in a flow reactor. Bull Math Biol 73:2357–2383
36. Mitarai N, Brown S, Sneppen K (2016) Population dynamics of phage and bacteria in spatially structured habitats using phage and Escherichia coli. J Bacteriol 198:1783–1793
37. Simmons M et al (2018) Phage mobility is a core determinant of phage–bacteria coexistence in biofilms. ISME J 12:531
38. Kerr B et al (2006) Local migration promotes competitive restraint in a host–pathogen ‘tragedy of the commons’. Nature 442:75
Chapter 19
Abstract
Mycobacterium species show complex evolution of antimicrobial resistance (AMR) and are therefore considered serious human pathogens. Many strategies have been employed to counter their pathogenesis, but AMR has become a major threat. Molecular tools employing bacteriophages may offer an effective alternative treatment against Mycobacterium. Rather than using bacteriophages directly, phage-encoded products such as lysins, which cause bacterial cell lysis, could be used. Modern technologies that complement bacteriophage strategies, such as in silico immunoinformatics, machine learning, and artificial intelligence, are described in detail as means to counter the pathogenesis. Understanding the underlying molecular mechanisms could therefore provide alternative ways to counter mycobacterial pathogenesis.
1 Introduction
330 Arabinda Ghosh et al.
1.1 Phage-Induced Host Gene Expression and Alteration
Some phages (e.g., D29) have broad host ranges and infect many species, including both the slow-growing M. tuberculosis and the fast-growing M. smegmatis [6], while others have extremely narrow host ranges with a single known host (e.g., Barnyard). At least one phage (DS6A) has a host range limited to M. tuberculosis strains, although only a partial genome sequence of this potentially very useful phage is available [7, 8]. In a variety of bacterial pathogens, such as Escherichia coli, Salmonella sp., Corynebacterium diphtheriae, and Vibrio cholerae, pathogenesis is mediated by phage-encoded toxins. Most M. tuberculosis strains carry one or two small (about 10 kbp) prophage-like elements, namely φRv1 and φRv2. Several mycobacterial species, including M. canettii, M. marinum, M. abscessus, and M. ulcerans, carry semi-intact prophages that could affect their diversity. Expression of phage-encoded proteins is only one way in which phages can affect their hosts. An alternative route is integration of the phage genome into a host gene that is essential for certain physiological processes. Phage integration usually involves site-specific recombination between the phage attachment site (attP) and the bacterial attachment site (attB), mediated by one of two separate kinds of enzymes. Tyrosine integrases are the most prevalent and typically mediate integration into tRNA genes (a significant exception is the well-studied integration of phage lambda). In contrast, phages using a serine integrase typically use an attB site located within a host protein-coding gene, which is
Dynamics of Mycobacteriophage—Mycobacterial Host Interaction 331
1.3 Phage Therapy
Broadly, bacteriophages are of two types: lysogenic (temperate), in which the bacteriophage integrates its genome into the host DNA, and lytic (virulent), in which the bacteriophage replicates rapidly inside the cell and then bursts the host cell to continue infecting other bacterial cells. Lytic bacteriophages multiply exponentially in the host bacterial cell and are released by lysis of the infected bacterium, which involves the holin-endolysin release system [55, 56]. Holins create a lesion in the bacterial membrane through which endolysins gain access to the peptidoglycan layer [56]. Endolysins are cell wall hydrolases that degrade the bacterial peptidoglycan, causing cell lysis and release of progeny phages [57]. It was shown that phage lambda lysis follows a holin-endolysin-dependent process; another parallel pathway governed by spanins, as well as Ms6 LysB, an accessory lytic protein, were described later [58, 59]. Phage therapy has been found effective against various pathogens, for example, Pseudomonas, Staphylococcus, Klebsiella, and E. coli, including in staphylococcal lung infection [60–63]. Recently, phage therapy has shown significant promise in treating diseases caused by pathogens that are resistant to multiple antibiotics. Chhibber et al. demonstrated that phages can be used to treat Klebsiella pneumoniae respiratory tract infection and that a single dose was sufficient to protect the majority of the tested animals [64]. Likewise, phage therapy has been reported to be successful in cerebrospinal meningitis in infants [65]. Phage therapy has also shown promise against numerous diseases, including infections caused by E. coli, skin infections [66], recurrent subphrenic and subhepatic abscesses [67], Pseudomonas aeruginosa infection in cystic fibrosis [68], staphylococcal eye infections [69], Gram-negative neonatal sepsis [70], inflammatory urinary tract infections [71], and Buruli ulcer caused by Mycobacterium ulcerans [40]. It was also shown that, during lethal infection, phage therapy leads to an increase in phage titer over time, whereas antibiotic concentration declines [72, 73]. It was once believed that phages can act only against extracellularly multiplying bacteria; however, a recent report showed that phages are capable of intracellular killing of internalized methicillin-resistant Staphylococcus aureus. This was demonstrated by using host bacteria as vehicles to deliver phages into phagocytic cells [74, 75]. A mathematical model based on population dynamics showed that a single dose of phage was more effective than multiple doses of antibiotics [76].
Biofilm formation lowers the antimicrobial susceptibility of a microorganism; biofilms of Mtb and M. smegmatis show a higher degree of drug resistance than growing or planktonic bacilli [77]. Although there is no clear evidence of biofilm production in TB, particularly in MDR-TB, the expression
2.4 NGS-Based HLA Typing
To predict a T-cell epitope, the HLA allotype must be known. Traditional methods for HLA typing rely on either antibody-based techniques or targeted sequencing [97]. In many clinical applications, NGS data of the patient are already available, so tools that infer the HLA allotype from NGS data (exome, transcriptome) can avoid additional cost. These tools are frequently used to infer HLA types in large-scale genome sequencing projects (e.g., ICGC [98], The Cancer Genome Atlas, and the 1000 Genomes Project [99]), where dedicated HLA typing data are unavailable for the majority of genomes.
2.5 T-cell Epitope Prediction
Given the HLA type of an individual, it is now possible to predict the HLA ligands. This is often referred to as T-cell epitope prediction, although presentation by HLA is necessary, but not sufficient, for a peptide to become an epitope, since recognition by the immune system is not guaranteed. HLA ligand binding is a limiting step in the antigen-processing pathway. It is generally considered more selective than the subsequent steps of the antigen-processing pathway and is therefore essential for vaccine design. PSSM-based predictors (e.g., SYFPEITHI [100], RANKPEP [101], or BIMAS [102]), SVM-based predictors (e.g., SVMHC [94] and SVRMHC), and ANN-based methods (e.g., NetMHC [103]) are some of the most popular of these approaches.
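PSSM-based predictors score a peptide by adding position-specific weights for its residues. The sketch below shows only that additive scoring scheme, applied with a sliding window over a protein. The 9-position matrix is a hypothetical toy with weights only at positions 2 and 9 (where HLA-A*02:01 ligands typically carry anchor residues); it does not reproduce the matrices of SYFPEITHI, RANKPEP, or BIMAS, and the function names are illustrative.

```python
def score_peptide(peptide: str, pssm: list[dict[str, float]]) -> float:
    """Additive PSSM score: sum of per-position residue weights (missing = 0)."""
    if len(peptide) != len(pssm):
        raise ValueError("peptide length must match the matrix length")
    return sum(column.get(residue, 0.0) for residue, column in zip(peptide, pssm))

def scan_protein(protein: str, pssm: list[dict[str, float]], top: int = 3):
    """Score every window of len(pssm) residues; return the best (score, start, peptide)."""
    width = len(pssm)
    windows = [(score_peptide(protein[i:i + width], pssm), i, protein[i:i + width])
               for i in range(len(protein) - width + 1)]
    return sorted(windows, key=lambda w: (-w[0], w[1]))[:top]

# Hypothetical toy 9-mer matrix: only positions 2 and 9 carry weights
TOY_PSSM = [{} for _ in range(9)]
TOY_PSSM[1].update({"L": 3.0, "M": 2.0})   # position 2
TOY_PSSM[8].update({"V": 3.0, "L": 2.0})   # position 9

print(scan_protein("GGLGGGGGGVGG", TOY_PSSM, top=1))  # [(6.0, 1, 'GLGGGGGGV')]
```

The additive form assumes that each position contributes independently to binding; ANN-based tools such as NetMHC were developed partly to capture interactions between positions that this assumption ignores.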
NetMHCpan [104], TEPITOPEpan [105], ADT [106], UniTope [107], and KISS [108] are further specialized methods. PickPocket and TEPITOPEpan compute the binding specificity of an HLA molecule by comparing its pocket residues with those of the HLAs in their libraries and computing a weighted average score. KISS is SVM-based, whereas MULTIPRED trains one predictor for each supertype. Unlike the other methods listed, NetMHCpan allows the user to make predictions for arbitrary HLA class I sequences.
HLA class II epitope predictors include ProPred [109], RANKPEP [101], TEPITOPE [98], SVRMHC [94], MHC2MIL [110], and MHC2Pred. These tools offer predictors for the HLA-DR locus, and NetMHCII, RANKPEP, and MHC2MIL also provide predictions for HLA-DQ and HLA-DP.
2.8 Ligand Entry into HLA
Binding of a ligand to HLA does not guarantee that it is recognized by the TCR. The immunogenicity of HLA ligands was subsequently addressed by POPI, an SVM-based predictor developed by Tung and Ho in 2007 [115].
2.9 Predicting B-cell Epitopes
B-cell epitope prediction differs fundamentally from T-cell epitope prediction. T-cell epitopes are short, linear peptide sequences, whereas B-cell epitopes are not necessarily contiguous in sequence. The complex structure of folded proteins can bring into spatial proximity amino acids that are distant in the antigen sequence. Recently published predictors for continuous (linear) epitopes include COBEpro [116], BCPred, and FBCPred [117].
3 Conclusion
4 Notes
Acknowledgments
References
1. Hatfull GF (2012) The secret lives of mycobacteriophages. Adv Virus Res 82:179–288
2. Chanishvili N (2012) Phage therapy – history from Twort and d’Herelle through Soviet experience to current approaches. Adv Virus Res 83:3–40
3. Hendrix RW (2003) Bacteriophage genomics. Curr Opin Microbiol 6:506–511
4. Hatfull GF (2008) Bacteriophage genomics. Curr Opin Microbiol 11:447–453
5. Suttle CA (2007) Marine viruses – major players in the global ecosystem. Nat Rev Microbiol 5:801–812
6. Rybniker J, Kramme S, Small PL (2006) Host range of 14 mycobacteriophages in Mycobacterium ulcerans and seven other mycobacteria including Mycobacterium tuberculosis – application for identification and susceptibility testing. J Med Microbiol 55:37–42
7. Bowman BU (1969) Properties of mycobacteriophage DS6A. I. Immunogenicity in rabbits. Proc Soc Exp Biol Med 131:196–200
8. Jones WD Jr (1975) Differentiation of known strains of BCG from isolates of Mycobacterium bovis and Mycobacterium tuberculosis by using mycobacteriophage 33D. J Clin Microbiol 1:391–392
9. Phillips LM, Sellers MI (1970) Effects of ethambutol, actinomycin D and mitomycin C on the biosynthesis of D29-infected mycobacterium smegmatis. In: Juhasz SE, Plummer G (eds) Host-virus relationships in mycobacterium, nocardia and actinomyces. Charles C. Thomas, Springfield, pp 80–102
10. David HL, Clavel S, Clement F, Moniz-Pereira J (1980) Effects of antituberculosis and antileprosy drugs on mycobacteriophage D29 growth. Antimicrob Agents Chemother 18:357–359
11. Tokunaga T, Kataoka T, Suga K (1970) Phage inactivation by an ethanol-ether extract of Mycobacterium smegmatis. Am Rev Respir Dis 101:309–313
12. Furuchi A, Tokunaga T (1972) Nature of the receptor substance of Mycobacterium smegmatis for D4 bacteriophage adsorption. J Bacteriol 111:404–411
13. Bisso G, Castelnuovo G, Nardelli MG, Orefici G, Arancia G, Lanéelle G, Asselineau C, Asselineau J (1976) A study on the receptor for a mycobacteriophage: phage phlei. Biochimie 58:87–97
14. Khoo KH, Suzuki R, Dell A, Morris HR, McNeil MR, Brennan PJ, Besra GS (1996) Chemistry of the lyxose-containing mycobacteriophage receptors of Mycobacterium phlei/Mycobacterium smegmatis. Biochemistry 35:11812–11819
15. Chen J, Kriakov J, Singh A, Jacobs WR Jr, Besra GS, Bhatt A (2009) Defects in glycopeptidolipid biosynthesis confer phage I3 resistance in Mycobacterium smegmatis. Microbiology 155:4050–4057
16. Hatfull GF (2013) Complete genome sequences of 63 mycobacteriophages. Genome Announc 1(6):e00847–e00813
17. Hatfull GF (2014) Mycobacteriophages: windows into tuberculosis. PLoS Pathog 10(3):e1003953
18. Cole ST, Brosch R, Parkhill J, Garnier T, Churcher C, Harris D (1998) Deciphering the biology of Mycobacterium tuberculosis from the complete genome sequence. Nature 393(6685):537–544
19. Jacobs-Sera D, Marinelli LJ, Bowman C, Broussard GW, Guerrero Bustamante C, Boyle MM, Petrova ZO, Dedrick RM, Pope WH, Science Education Alliance Phage Hunters Advancing Genomics and Evolutionary Science Sea-Phages Program, Modlin RL, Hendrix RW, Hatfull GF (2012) On the nature of mycobacteriophage diversity and host preference. Virology 434:187–201
20. Court DL, Oppenheim AB, Adhya SL (2007) A new look at bacteriophage lambda genetic networks. J Bacteriol 189:298–304
21. Zumla A, George A, Sharma V, Herbert N, Baroness Masham of Ilton (2013) WHO’s 2013 global report on tuberculosis: successes, threats, and opportunities. Lancet 382(9907):1765–1767
22. Waites MJ, Morgan NL, Rockey JS, Higton G (2001) Industrial microbiology: an introduction. Blackwell Science Ltd, Hoboken, p 177
23. Fruciano DE, Bourne S (2007) Phage as an antimicrobial agent: d’Herelle’s heretical theories and their role in the decline of phage prophylaxis in the West. Can J Infect Dis Med Microbiol 18:19–26
24. Herelle FD (1917) An invisible microbe that is antagonistic to the dysentery bacillus. Comptes rendus Acad Sci 165:373–375
25. Levin BR, Bull JJ (2004) Population and evolutionary dynamics of phage therapy. Nat Rev Microbiol 2:166–173. https://doi.org/10.1038/nrmicro822
26. Lu TK, Koeris MS (2011) The next generation of bacteriophage therapy. Curr Opin Microbiol 14:524–531. https://doi.org/10.1016/j.mib.2011.07.028
344 Arabinda Ghosh et al.
27. Radetsky P (1996) The good virus. Discover. during therapy. PLoS One 6:e18327.
http://discovermagazine.com/1996/nov/ https://doi.org/10.1371/journal.pone.
thegoodvirus918 0018327
28. Samaddar S, Grewal RK, Sinha S, Ghosh S, 39. Gillespie SH (2002) Evolution of drug resis-
Roy S, Gupta SKD (2016) Dynamics of tance in Mycobacterium tuberculosis: clinical
mycobacteriophage-mycobacterial host inter- and molecular perspective. Antimicrob
action: evidence for secondary mechanisms Agents Chemother 46:267–274. https://
for host lethality. Appl Environ Microbiol doi.org/10.1128/AAC.46.2.267-274.2002
82:124–133 40. Trigo G, Martins TG, Fraga AG, Longatto-
29. Berry M, Gurung A, Easty DL (1995) Toxic- Filho A, Castro AG, Azeredo J, Pedrosa J
ity of antibiotics and antifungals on cultured (2013) Phage therapy is effective against
human corneal cells: effect of mixing, expo- infection by Mycobacterium ulcerans in a
sure and concentration. Eye 9(Part murine footpad model. PLoS Negl Trop Dis
1):110–115. https://doi.org/10.1038/eye. 7:e2183. https://doi.org/10.1371/journal.
1995.17 pntd.0002183
30. Lees AW, Allan GW, Smith J, Tyrrell WF, 41. Ford ME, Stenstrom C, Hendrix RW, Hatfull
Fallon RJ (1971) Toxicity form rifampicin GF (1998) Mycobacteriophage TM4:
plus isoniazid and rifampicin plus ethambutol genome structure and gene expression.
therapy. Tubercle 52:182–190. https://doi. Tuber Lung Dis 79:63–73. https://doi.org/
org/10.1016/0041-3879(71)90041-9 10.1054/tuld.1998.0007
31. Fenton M, Ross P, McAuliffe O, O’Mahony J, 42. Fullner KJ, Hatfull GF (1997) Mycobacter-
Coffey A (2010) Recombinant bacteriophage iophage L5 infection of Mycobacterium bovis
lysins as antibacterials. Bioeng Bugs 1:9–16. BCG: implications for phage genetics in the
https://doi.org/10.4161/bbug.1.1.9818 slow-growing mycobacteria. Mol. Microbio.
32. Fischetti VA (2008) Bacteriophage lysins as 26:755–766
effective antibacterials. Curr Opin Microbiol 43. Hatfull GF, Sarkis GJ (1993) DNA sequence,
11:393–400. https://doi.org/10.1016/j. structure and gene expression of mycobacter-
mib.2008.09.012 iophage L5: a phage system for mycobacterial
33. Matsuzaki S, Rashel M, Uchiyama J, genetics. Mol Microbiol 7:395–405. https://
Sakurai S, Ujihara T, Kuroda M, Ikeuchi M, doi.org/10.1111/j.1365-2958.1993.
Tani T, Fujieda M, Wakiguchi H, Imai S tb01131.x
(2005) Bacteriophage therapy: a revitalized 44. Piuri M, Hatfull GF (2006) A peptidoglycan
therapy against bacterial infectious diseases. J hydrolase motif within the mycobacterioph-
Infect Chemother 11:211–219. https://doi. age TM4 tape measure protein promotes effi-
org/10.1007/s10156-005-0408-9 cient infection of stationary phase cells. Mol
34. Schuch R, Nelson D, Fischetti VA (2002) A Microbiol 62:1569–1585. https://doi.org/
bacteriolytic agent that detects and kills Bacil- 10.1111/j.1365-2958.2006.05473.x
lus anthracis. Nature 418:884–889. https:// 45. Pedulla ML, Ford ME, Houtz JM,
doi.org/10.1038/nature01026 Karthikeyan T, Wadsworth C, Lewis JA,
35. Miller ES, Kutter E, Mosig G, Arisaka F, Jacobs-Sera D, Falbo J, Gross J, Pannunzio
Kunisawa T, Ruger W (2003) Bacteriophage NR, Brucker W, Kumar V, Kandasamy J,
T4 genome. Microbiol Mol Biol Rev Keenan L, Bardarov S, Kriakov J, Lawrence
67:86–156. https://doi.org/10.1128/ JG, Jacobs WR Jr, Hendrix RW, Hatfull GF
MMBR.67.1.86-156.2003 (2003) Origins of highly mosaic mycobacter-
36. Monk A, Rees C, Barrow P, Hagens S, Harper iophage genomes. Cell 113:171–182
D (2010) Bacteriophage applications: where 46. Pena CE, Judy S, Hatfull Graham F (1998)
are we now? Lett. Appl. Microbiol Mycobacteriophage D29 integrase-mediated
51:363–369. https://doi.org/10.1111/j. recombination: specificity of mycobacterioph-
1472-765X.2010.02916.x age integration. Gene 225:143
37. Williams MM, Yakrus MA, Arduino MJ, 47. Donnelly-Wu MK, Jacobs WR Jr, Hatfull GF
Cooksey RC, Crane CB, Banerjee SN, Hil- (1993) Superinfection immunity of mycobac-
born ED, Donlan RM (2009) Structural anal- teriophage L5: applications for genetic trans-
ysis of biofilm formation by rapidly and slowly formation of mycobacteria. Mol Microbiol
growing nontuberculous mycobacteria. Appl 7:407–417
Environ Microbiol 75:2091–2098 48. Doke S (1960) Studies on mycobacterio-
38. Colijn C, Cohen T, Ganesh A, Murray M phages and lysogenic mycobacteria. J Kuma-
(2011) Spontaneous emergence of multiple moto Med Soc 34:1360–1373
drug resistance in tuberculosis before and
Dynamics of Mycobacteriophage—Mycobacterial Host Interaction 345
49. Lee MH, Pascopella L, Jacobs WR Jr, Hatfull 62. Kaźmierczak Z, Górski A, Da˛browska K
GF (1991) Site-specific integration of myco- (2014) Facing antibiotic resistance: Staphylo-
bacteriophage L5: integration-proficient vec- coccus aureus phages as a medical tool.
tors for Mycobacterium smegmatis, Viruses 6:2551–2570
Mycobacterium tuberculosis, and bacille 63. Pires DP, Vilas Boas D, Sillankorva S, Azeredo
Calmette-Guerin. Proc Natl Acad Sci U S A J (2015) Phage therapy: a Step forward in the
88:3111–3115 treatment of Pseudomonas aeruginosa infec-
50. Chatterjee S, Mitra M, Das Gupta SK (2000) tions. J Virol 89:7449–7456
A high yielding mutant of mycobacteriophage 64. Chhibber S, Kaur S, Kumari S (2008) Thera-
L1 and its application as a diagnostic tool. peutic potential of bacteriophage in treating
FEMS Microbiol Lett 188:47–53 Klebsiella pneumoniae B5055-mediated lobar
51. Chaudhuri B, Sau S, Datta HJ, Mandal NC pneumonia in mice. J Med Microbiol
(1993) Isolation, characterization, and 57:1508–1513
mapping of temperature-sensitive mutations 65. Strój L, Weber-Dabrowska B, Partyka K,
in the genes essential for lysogenic and lytic Mulczyk M, Wójcik M (1999) Successful
growth of the mycobacteriophage L1. Virol- treatment with bacteriophage in purulent
ogy 194:166–172 cerebrospinal meningitis in a newborn.
52. Freitas-Vieira A, Anes E, Moniz-Pereira J Neurol Neurochir Pol 33:693–698
(1998) The site-specific recombination locus 66. Cisło M, Dabrowski M, Weber-Dabrowska B,
of mycobacteriophage Ms6 determines DNA Woytoń A (1987) Bacteriophage treatment of
integration at the tRNA(Ala) gene of Myco- suppurative skin infections. Arch Immunol
bacterium spp. Microbiology Ther Exp (Warsz) 35:175–183
144:3397–3406 67. Kwarcinski W, Lazarkiewicz B, Weber-
53. Bowman B Jr (1958) Quantitative studies on Dabrowska B, Rudnicki J, Kaminski K, Scie-
some mycobacterialphage host systems. J. bura M (1994) Bacteriophage therapy in the
Bacteriol 76:52–62 treatment of repeated subphrenic abscess and
54. Timme TL, Brennan PJ (1984) Induction of subhepatic abscess with jejunal fistula after
bacteriophage from members of the Myco- stomach resection. Pol Tyg Lek 49:535
bacterium avium, Mycobacterium intracellu- 68. Shabalova IA, Karpanov NI, Krylov VN, Shar-
lare, Mycobacterium scrofulaceum ibjanova TO, Akhverdijan VZ (1995) Pseudo-
serocomplex. J Gen Microbiol monas aeruginosa bacteriophage in treatment
130:2059–2066 of p. aeruginosa infection in cystic fibrosis
55. Young R (1992) Bacteriophage lysis: mecha- patients. In Proceedings of IX International
nism and regulation. Microbiol Rev Cystic Fibrosis Congress. International Cystic
56:430–481 Fibrosis Association, Zurich, Switzerland,
56. Young R (2002) Bacteriophage holins: deadly p. 443
diversity. J Mol Microbiol Biotechnol 69. Proskurov VA (1970) Use of staphylococcal
4:21–36 bacteriophage for therapeutic and preventive
57. Loessner MJ (2005) Bacteriophage endoly- purposes. Zh Mikrobiol Epidemiol Immuno-
sins – current state of research and applica- biol 47:104–107
tions. Curr Opin Microbiol 8:480–487 70. Pavlenishvili I, Tsertsvadze T (1993) Bacter-
58. Berry J, Rajaure M, Pang T, Young R (2012) iophagotherapy and enterosrbtion in treat-
The spanin complex is essential for lambda ment of sepsis of newborns caused by gram
lysis. J Bacteriol 194:5667–5674 negative bacteria. Pren Neon Infect 11:104
59. Catalão MJ, Gil F, Moniz-Pereira J, São- 71. Perepanova TS, Darbeeva OS, Kotliarova GA,
José C, Pimentel M (2013) Diversity in bac- Kondrat’eva EM, Maı̆skaia LM, Malysheva
terial lysis systems: bacteriophages show the VF, Baı̆guzina FA, Grishkova NV (1995)
way. FEMS Microbiol Rev 37:554–571 The efficacy of bacteriophage preparations in
60. Capparelli R, Parlato M, Borriello G, treating inflammatory urologic diseases. Urol
Salvatore P, Iannelli D (2007) Experimental Nefrol (Mosk) 5:14–17
phage therapy against Staphylococcus aureus 72. D’hérelle F (1923) (1993) The Bacterio-
in mice. Antimicrob Agents Chemother phage, Its Role in Immunity. Ind Med Gaz.
51:2765–2773 58(9):443–444
61. Denou E, Bruttin A, Barretto C, Ngom- 73. Brüssow H (2005) Phage therapy: the Escher-
Bru C, Brüssow H, Zuber S (2009) T4 phages ichia coli experience. Microbiology
against Escherichia coli diarrhea: Potential 151:2133–2140
and problems. Virology 388:21–30
346 Arabinda Ghosh et al.
96. Lundegaard C, Lund O, Nielsen MJ (2011) 107. Toussaint NC, Feldhahn M, Ziehm M,
Prediction of epitopes using neural network Stevanovic S, Kohlbacher O (2011) T-cell
based methods. Immunol Methods 374 epitope prediction based on self-tolerance.
(1-2):26–34 In: Proceedings of the 2nd ACM Conference
97. Erlich H (2012) HLA DNA typing: past, on Bioinformatics, Computational Biology
present, and future. Tissue Antigens 80 and Biomedicine - BCB ’11. New York:
(1):1–11 ACM Press, p. 584
98. Zhang L, Chen Y, Wong HS, Zhou S, 108. Jacob L, Vert JP (2008) Efficient peptide-
Mamitsuka H, Zhu S (2012) TEPITOPEpan: MHC-I binding prediction for alleles with
extending TEPITOPE for peptide binding few known binders. Bioinformatics
prediction covering over 700 HLA-DR mole- 24:358–366
cules. PLoS One 7(2):e30483 109. Singh H, Raghava GP (2001) ProPred: pre-
99. Abecasis GR, Auton A, Brooks LD, DePristo diction of HLA-DR binding sites. Bioinfor-
MA, Durbin RM, Handsaker RE, Kang HM, matics 17:1236–1237
Marth GT, McVean GA (2012) An integrated 110. Wan J, Liu W, Xu Q, Ren Y, Flower DR, Li T
map of genetic variation from 1,092 human (2006) SVRMHC prediction server for
genomes. 1000 Genomes Project Consor- MHC-binding peptides. BMC Bioinformatics
tium. Nature 491(7422):56–65 7:463
100. Rammensee H, Bachmann J, Emmerich NP, 111. Vita R, Overton JA, Greenbaum JA,
Bachor OA, Stevanovic S (1999) SYF- Ponomarenko J, Clark JD, Cantrell JR,
PEITHI: database for MHC ligands and pep- Wheeler DK, Gabbard JL, Hix D, Sette A,
tide motifs. Immunogenetics 50:213–219 Peters B (2015) The immune epitope data-
101. Reche PA, Glutting JP, Reinherz EL (2002) base (IEDB) 3.0. Nucleic Acids Res 43(Data-
Prediction of MHC class I binding peptides base issue):405–412
using profile motifs. Human Immunol 112. Doytchinova IA, Guan P, Flower DR (2006)
63:701–709 EpiJen: a server for multistep T cell epitope
102. Parker KC, Bednarek MA, Coligan JE (1994) prediction. BMC Bioinformatics 7:131.
Scheme for ranking potential HLA-A2 bind- https://doi.org/10.1186/1471-2105-7-
ing peptides based on independent binding of 131
individual peptide side-chains. J Immunol 113. Donnes P, Kohlbacher O (2005) Integrated
152:163–175 modeling of the major events in the MHC
103. Lundegaard C, Lamberth K, Harndahl M, class I antigen processing pathway. Protein
Buus S, Lund O (2008) NetMHC-3.0: accu- Sci 4:2132–2140
rate web accessible predictions of human, 114. Larsen MV, Lundegaard C, Lamberth K,
mouse and monkey MHC class I affinities Buus S, Lund O, Nielsen M (2007) Large-
for peptides of length 8-11. Nucleic Acids scale validation of methods for cytotoxic
Res 36(Web Server issue):509–512 T-lymphocyte epitope prediction. BMC Bio-
104. Nielsen M, Lundegaard C, Blicher T, informatics 8:424
Lamberth K, Harndahl M, Justesen S et al 115. Tung CW, Ho SY (2007) POPI: predicting
(2007) NetMHCpan, a Method for Quanti- immunogenicity of MHC class I binding pep-
tative Predictions of Peptide Binding to Any tides by mining informative physicochemical
HLA-A and -B Locus Protein of Known properties. Bioinformatics 23:942–949
Sequence. PLoS ONE 2(8):e796 116. Sweredoski MJ, Baldi P (2008) (2008).
105. Bian H, Hammer J (2004) Discovery of pro- COBEpro: a novel system for predicting con-
miscuous HLA-II-restricted T cell epitopes tinuous B-cell epitopes. Protein Eng Des Sel
with TEPITOPE. Methods 34:468–475 22(3):113–120
106. Jojic N, Reyes-Gomez M, Heckerman D, 117. EL-Manzalawy Y, Dobbs D, Honavar V
Kadie C, Schueler-Furman O (2006) (2008) Predicting linear B-cell epitopes
Learning MHC I–peptide binding. Bioinfor- using string kernels. J Mol Recognit
matics 22:227–235 21:243–255
Chapter 20
Abstract
Electrochemiluminescence immunoassays detect and analyze proteins and other biomolecules through light emitted by species generated in an electrochemical reaction. They have numerous advantages over traditional analytical methods, including low sample consumption, high sensitivity, a broad dynamic range, and relative ease of use. Herein, we describe electrochemiluminescence methods using the Meso Scale Discovery (MSD) system, with recommendations and protocol optimizations to aid the discovery of biologically relevant markers, and we discuss how to avoid major pitfalls for accurate biomarker detection.
1 Introduction
350 Vrushali Abhyankar and Ammaar H. Abidi
2 Materials
3 Methods
3.1 On Cell Western
1. Plate cells at a density of 20,000 cells/50 μL in full growth medium on high-bind plates (Meso-Scale Discovery, Gaithersburg, MD).
2. The following day, gently remove the full growth medium and replace it with 100 μL of medium (e.g., DMEM with 1% FBS and 1% P/S). Let the cells harmonize and synchronize their activity over a 24-h period.
3. The next day, add 100 μL of medium with 1% FBS and 1% P/S containing the stimulus at the concentration needed to reach a final well concentration of 1×. One hour later (once inflammation has been stimulated), add the compounds of choice at previously determined concentrations. Account for volume differences so that the final concentrations of ligands and stimulus are 1× (see Note 1).
4. Discard the medium after 24 h and add antibody at the concentration most appropriate for your assay. For instance, to study the on-cell pro-inflammatory marker, add 2 μg/mL of CD16/32 antibody in 30 μL PBS to each well and incubate for 2 h at room temperature with gentle shaking at 130 rpm, followed by two gentle washes with 150 μL PBS.
5. Then add 30 μL of 2 μg/mL anti-rat (CD16/32) SULFO-TAG antibody and incubate for another 2 h at room temperature with gentle shaking at 130 rpm.
6. After three gentle washes with 150 μL/well PBS, add 2× Surfactant-Free Read Buffer and follow the instructions in Subheading 3.5 [13].
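Step 3 depends on simple C1V1 = C2V2 bookkeeping: adding 100 μL of stimulus medium to the 100 μL already in the well halves its concentration, so the added medium must be prepared at 2× to end at 1×, and any later compound addition dilutes the stimulus again. The helpers below are a minimal sketch of that arithmetic; the function names and the 20 μL compound volume are hypothetical illustrations, not values from the protocol.

```python
def stock_fold_needed(final_fold: float, well_volume_ul: float, added_volume_ul: float) -> float:
    """Fold concentration (relative to the working 1x) that an added volume must
    carry so the well reaches `final_fold` after mixing (C1*V1 = C2*V2)."""
    if added_volume_ul <= 0:
        raise ValueError("added volume must be positive")
    return final_fold * (well_volume_ul + added_volume_ul) / added_volume_ul


def fold_after_addition(current_fold: float, current_volume_ul: float, added_volume_ul: float) -> float:
    """Fold concentration of a component already in the well after adding
    a volume that does not contain that component (simple dilution)."""
    return current_fold * current_volume_ul / (current_volume_ul + added_volume_ul)


# Step 3: 100 uL of stimulus medium added to 100 uL already in the well.
stimulus_stock = stock_fold_needed(1.0, 100.0, 100.0)  # 2.0 -> prepare a 2x stimulus medium
# Hypothetical: 20 uL of compound added an hour later dilutes the 1x stimulus.
stimulus_after_compound = fold_after_addition(1.0, 200.0, 20.0)  # about 0.91x
# Compound stock needed so that 20 uL into 200 uL ends at 1x.
compound_stock = stock_fold_needed(1.0, 200.0, 20.0)  # 11.0
```

One way to keep the stimulus at 1× in this hypothetical case (an assumption, not part of the protocol) is to dilute the compound in medium that already contains 1× stimulus.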
3.2 Conventional Assay (Multiplex)
1. To obtain conditioned media for the immunoassays, seed cells at densities of 10,000–30,000 cells/well in 96-well polystyrene flat-bottom plates (use collagen-coated plates if cell staining protocols are to be performed). Samples are not limited to conditioned media; plasma, serum, urine, and CSF can also be used (see Note 2).
2. In a cell culture–based system, change the medium after 24 h from full growth medium (higher serum content) to medium with 0–1% FBS and the appropriate antibiotic (e.g., P/S, gentamicin), and incubate for another 24 h at 37 °C, 5% CO2 to synchronize cell activity.
3. Check the cells for optimal health and morphology, then add the stimulus of choice (prepared in the 0–1% FBS medium) to induce an immune response. The most common stimulants include LPS, TNF-α, and IL-1β.
4. Several approaches can be used to test the ability of a compound or molecule to increase or decrease immune responses under stimulated conditions. The stimulus can be added at its respective concentration 1 h before the compound to create a proinflammatory response first. For additive or synergistic effects, the compound and stimulus can be added together (co-stimulation). To treat already stimulated inflammation, the compound can be added 30–60 min after the stimulus [15] (see Note 3).
5. Collect the conditioned medium according to the assay parameters or at predefined points in a time course (e.g., 1, 6, 18, 24, 48, and 72 h).
Electrochemiluminescence Immunoassays 353
3.5 ECLIA MesoScale Discovery Reader and Analysis
1. MSD Discovery Workbench (DWB) software can be used to prepare a template for the experiment, or plate layouts from previous experiments can be reused (Fig. 1).
2. Launch the DWB software; either make a new template or use a previously prepared one.
3. Run the plate on the reader (e.g., MSD Sector 2400). Plate orientation does not matter, because the reader uses the plate barcode to determine it. After reading, plates can be returned to the input site or sent to an adjacent site if multiple plates are being read.
4. Click the new plate layout icon and select the appropriate spot for each well. If the plate is a standard kit (e.g., a proinflammatory panel), stored kit layouts can be used to simplify the process.
Fig. 3 Experiment 10-spot and plate information in Mesoscale Discovery Workbench Software
4 Notes
Fig. 4 Inter-plate statistics of standards and unknowns after read in Mesoscale Discovery Workbench Software
Fig. 5 Spot selection in 10-spot plate displays data grid for selected analyte in Mesoscale Discovery Workbench Software
10. Avoid making bubbles during pipetting steps, as they may also lead to variability in results. Important: avoid bubbles especially during the addition of Read Buffer; if bubbles are present, remove them carefully with a pipette tip. DO NOT touch the bottom of the well with the pipette tip, and do not shake the plate after adding Read Buffer.
11. Capture antibodies come pre-coated on the multi-spot plates and treated with a proprietary stabilizing solution, unless U-Plex plates are purchased, in which case a few additional steps are needed to prime the plate with capture antibodies.
12. Increasing the rotary speed (rpm) of the plate shaker during the capture and detection stages may help the binding reactions reach equilibrium. Therefore, a speed above 500 rpm but below 1000 rpm is recommended if available.
13. Keep plates sealed during incubation periods, and remove the seal carefully, as fluids can spill.
Fig. 6 Statistics and protein concentrations of standards and unknowns in Mesoscale Discovery Workbench Software
14. After aspirating media from MSD plates, gently tap the plates over a paper towel to remove residual liquid.
15. Plates can also be run partially. In that case, seal or cover the unused wells during the procedure so that liquid does not spill into them. The plate can be kept in the refrigerator (2–8 °C) for up to 30 days in its foil pack with the desiccant.
16. The general dilution guidelines provided by MSD do not always place every sample within the calibration curve. If samples fall above or below the fitted curve, adjust the dilutions to bring the response into range.
17. If the results after reading are not as expected, consult an MSD technical specialist; dilution errors or the assay template setup in MSD Workbench software may be the issue (a common error).
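Note 16 concerns samples whose signal falls outside the calibration curve. As a sketch of what back-calculation involves, the Python below assumes a four-parameter logistic (4PL) standard curve, a common model for immunoassay standards, with already-fitted, purely illustrative parameters. It inverts the curve to obtain a concentration, multiplies by the dilution factor, and flags signals below the lowest or above the highest standard. The names `FourPL` and `quantify` and all parameter values are hypothetical; this is not a description of the Workbench software's internal algorithm.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class FourPL:
    """4PL curve: signal = d + (a - d) / (1 + (conc / c) ** b)."""
    a: float  # signal at zero concentration
    b: float  # slope factor
    c: float  # concentration at the inflection point
    d: float  # signal at saturating concentration

    def signal(self, conc: float) -> float:
        return self.d + (self.a - self.d) / (1.0 + (conc / self.c) ** self.b)

    def concentration(self, sig: float) -> float:
        """Inverse of signal(); defined only strictly between the two asymptotes."""
        if not min(self.a, self.d) < sig < max(self.a, self.d):
            raise ValueError("signal lies outside the asymptotes of the curve")
        return self.c * ((self.a - self.d) / (sig - self.d) - 1.0) ** (1.0 / self.b)


def quantify(sig: float, curve: FourPL, lowest_std: float, highest_std: float,
             dilution_factor: float = 1.0) -> Tuple[Optional[float], str]:
    """Back-calculate a sample concentration, or flag it as outside the standards.
    Assumes the signal increases with concentration (a < d)."""
    if sig < curve.signal(lowest_std):
        return None, "below curve: run the sample less diluted"
    if sig > curve.signal(highest_std):
        return None, "above curve: dilute the sample further"
    return curve.concentration(sig) * dilution_factor, "in range"


# Illustrative parameters only (arbitrary signal units, pg/mL concentrations).
curve = FourPL(a=100.0, b=1.2, c=500.0, d=1.0e6)
conc, status = quantify(curve.signal(250.0), curve, 10.0, 10000.0, dilution_factor=2.0)
```

Here a sample measured at a 1:2 dilution reads 250 pg/mL on the curve and is reported as 500 pg/mL; a sample flagged "above curve" would be rerun at a higher dilution, as Note 16 suggests.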
Alternative technologies that may also be helpful in the detection of biomarkers do exist. Two comparable technologies are highlighted below.
Fig. 7 Standard curve and fit of unknowns in Mesoscale Discovery Workbench Software
Fig. 8 Workbench data exported to MS Excel to generate graphs and respective sheets
Fig. 9 HTRF-based assay principle to detect donor–acceptor biotinylated biomolecules. Permission to reprint
this image has been kindly provided by Cisbio
References
19. Pan J, Zheng QZ, Li Y et al (2019) Discovery for the simultaneous quantitative detection of
and validation of a serological autoantibody Neuropilin-1 and Neuropilin-2 using xMAP
panel for early diagnosis of esophageal squa- technology and its clinical application. J Clin
mous cell carcinoma. Cancer Epidemiol Bio- Lab Anal 33(4):e22850. https://doi.org/10.
markers Prev. https://doi.org/10.1158/ 1002/jcla.22850
1055-9965.Epi-18-1269 22. Bates AM, Fischer CL, Abhyankar VP et al
20. Cao Q, Xiao B, Jin G et al (2019) Expression of (2018) Matrix metalloproteinase response of
transforming growth factor beta and matrix dendritic cell, gingival epithelial keratinocyte,
metalloproteinases in the aqueous humor of and T-cell transwell co-cultures treated with
patients with congenital ectopia lentis. Mol porphyromonas gingivalis hemagglutinin-B.
Med Rep. https://doi.org/10.3892/mmr. Int J Mol Sci 19(12). https://doi.org/10.
2019.10287 3390/ijms19123923
21. Huang ZL, Meng PP, Yang Y et al (2019)
Establishment of a bead-based duplex assay
Chapter 21
Abstract
Autoantibodies are antibodies against host self-proteins (autoantigens); they play significant roles in maintaining homeostasis and in diseases, particularly autoimmune disorders. Numerous papers on the identification of human autoantigens in different diseases have been published in the past decade. However, there is still no consensus collection of all reported autoantigens. To address this need, we previously developed a human autoantigen database, AAgAtlas 1.0, by text mining and manual curation; it collects 1126 autoantigens associated with 1071 human diseases. AAgAtlas 1.0 provides a user-friendly interface to conveniently browse, retrieve, and download human autoantigen genes, their functional annotation, related diseases, and the supporting evidence from the literature. AAgAtlas is freely available online at http://biokb.ncpsb.org/aagatlas/. In this chapter, we introduce the AAgAtlas 1.0 database and provide a guide for its users.
Key words Database, Autoantibody, Autoantigen, Autoimmune disease, Cancer, Biomarker, Diagnosis, Therapeutic treatment
1 Introduction
366 Dan Wang et al.
2 Methods
2.1 AAg-Related Keywords Extraction
All PubMed abstracts were retrieved from the PubMed database through the NCBI E-utilities API. AAg-related abstracts were identified by a bio-entity recognizer using the keywords “autoantigen” or “autoantibody” and their lexical variants, such as “auto-antigen”, “autoantigens”, “auto-antigens”, “auto-antibody”, “autoantibodies”, and “auto-antibodies”. As a result, 45,830 abstracts and 94,313 sentences were obtained (see Notes 1 and 2).
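The keyword step can be sketched in Python. The regular expression below is an assumption that covers exactly the variants listed above; it is not the bio-entity recognizer the authors used, and the sentence splitter is deliberately naive. The URL helper uses the NCBI E-utilities ESearch endpoint (`esearch.fcgi`) with its `db`, `term`, and `retmax` parameters, but the query term in the example is illustrative, and the code builds the URL without sending any request.

```python
import re
from urllib.parse import urlencode

# Assumed pattern: "auto" + optional hyphen + antigen(s) / antibody / antibodies.
AAG_PATTERN = re.compile(r"\bauto-?anti(?:gens?|bod(?:y|ies))\b", re.IGNORECASE)

ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def esearch_url(term: str, retmax: int = 100) -> str:
    """Build (but do not send) an ESearch request URL for PubMed."""
    return ESEARCH_URL + "?" + urlencode({"db": "pubmed", "term": term, "retmax": retmax})


def split_sentences(text: str) -> list:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def aag_sentences(abstract: str) -> list:
    """Keep only the sentences that mention an autoantigen/autoantibody keyword."""
    return [s for s in split_sentences(abstract) if AAG_PATTERN.search(s)]


abstract = ("We screened sera from patients. Auto-antibodies against p53 were "
            "detected in 20% of cases. No autoantigenic peptides were found.")
hits = aag_sentences(abstract)  # only the second sentence matches
url = esearch_url("autoantigen[tiab] OR autoantibody[tiab]")
```

Note that the word boundary at the end of the pattern excludes derived forms such as "autoantigenic"; whether to include them is a design choice for the recall/precision balance of the extraction.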
2.3 Manual Curation
We performed three rounds of manual curation to remove false positives and select bona fide AAgs for our database: (1) two experienced researchers independently checked all extracted sentences containing AAg names and selected the relevant ones; (2) the resulting sentences were then submitted to an internal review, in which three experts again manually reviewed and approved all AAg names; (3) all co-authors randomly spot-checked the database to make sure that every gene imported into it is a bona fide AAg with appropriate supporting evidence. In total, 1126 AAg genes and 1071 related diseases were obtained (see Note 3). All genes were functionally annotated and uploaded to the database together with their supporting evidence sentences.
3.1 Query by Gene Symbol
On the “Home” page, the user can enter a gene symbol in the “Gene Symbol” search box. The drop-down menu auto-completes gene symbols present in the AAgAtlas 1.0 database. Select one and click the “Search” button; the page returns the search results, which are divided into four columns: Gene, Disease, PubMed Abstracts, and Sentences. Basic information about the AAg gene and cross-references to external databases can be obtained by clicking the hyperlinked AAg gene symbol in the Gene column. Supporting literature evidence can be viewed by clicking the PubMed abstract or sentence. Clicking the “Reset” button clears all current search terms.
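Results retrieved from AAgAtlas can also be summarized offline. The sketch below assumes a hypothetical tab-separated export with `gene`, `disease`, `pmid`, and `sentence` columns (the actual download format may differ) and reproduces the per gene–disease counts shown in the search results: the number of distinct PubMed abstracts and the number of supporting sentences, filtered by gene symbol and/or disease term. The rows and identifiers are made-up placeholders, not database content.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# Hypothetical export layout; every row is a placeholder, not AAgAtlas data.
SAMPLE_TSV = (
    "gene\tdisease\tpmid\tsentence\n"
    "BRCA1\tbreast cancer\tPMID_A\tplaceholder sentence 1\n"
    "BRCA1\tbreast cancer\tPMID_A\tplaceholder sentence 2\n"
    "BRCA1\tbreast cancer\tPMID_B\tplaceholder sentence 3\n"
    "BRCA1\tovarian cancer\tPMID_C\tplaceholder sentence 4\n"
    "TP53\tbreast cancer\tPMID_D\tplaceholder sentence 5\n"
)


def load_rows(tsv_text: str) -> List[Dict[str, str]]:
    return list(csv.DictReader(io.StringIO(tsv_text), delimiter="\t"))


def summarize(rows: List[Dict[str, str]], gene: Optional[str] = None,
              disease: Optional[str] = None) -> Dict[Tuple[str, str], Tuple[int, int]]:
    """Per (gene, disease): (distinct PubMed abstracts, supporting sentences),
    optionally filtered by gene symbol and/or disease term (case-insensitive)."""
    pmids = defaultdict(set)
    sentences = defaultdict(int)
    for r in rows:
        if gene and r["gene"].upper() != gene.upper():
            continue
        if disease and r["disease"].lower() != disease.lower():
            continue
        key = (r["gene"], r["disease"])
        pmids[key].add(r["pmid"])
        sentences[key] += 1
    return {k: (len(pmids[k]), sentences[k]) for k in sorted(sentences)}


rows = load_rows(SAMPLE_TSV)
by_gene = summarize(rows, gene="brca1")              # like a "Gene Symbol" query
by_disease = summarize(rows, disease="Breast cancer")  # like a "Disease Term" query
```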
Here we use breast cancer type 1 susceptibility protein (BRCA1) as an example of a query by gene symbol. BRCA1 is a tumor suppressor that regulates cell growth by maintaining genomic stability. Mutations in BRCA1 and BRCA2 account for about 25% of familial breast cancers [5, 6]. After typing BRCA1 in the “Gene Symbol” search box and clicking “Search”, the results are displayed, showing the BRCA1 gene name, a list of related diseases, PubMed abstracts, and supporting sentences (Fig. 2). Clicking the BRCA1 gene name displays detailed information, including the validated evidence, synonyms, disease section, description, Entrez Gene summary, chromosome, cytoband, chromosome location (bp), and links to the Ensembl, Entrez Gene, UniProt, neXtProt, and Antibodypedia databases (Fig. 3). The search results show that autoantibodies (AAbs) against BRCA1 have been reported in a variety of human cancers, including breast, lung, prostate, and ovarian cancer, at frequencies ranging from 0.7% to 28% [7, 8]. Furthermore, combining BRCA1 with other AAbs (p53, c-myc, HER2, NY-ESO-1, BRCA2, and MUC1) achieved a sensitivity of 65% for primary breast cancer and 45% for ductal carcinoma in situ at a specificity of 85% [8].
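The panel figures above are sensitivity and specificity values for a combination of AAbs. One common way to combine markers, assumed here because the text does not state the rule used in [8], is to call a sample positive if any autoantibody in the panel exceeds its cutoff. The toy signals and cutoffs below are invented purely to show the arithmetic, with sensitivity = TP/(TP + FN) and specificity = TN/(TN + FP); the function names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def panel_positive(levels: Dict[str, float], cutoffs: Dict[str, float]) -> bool:
    """OR rule (assumed): positive if any autoantibody in the panel exceeds its cutoff."""
    return any(levels[marker] > cutoff for marker, cutoff in cutoffs.items())


def sens_spec(calls: Sequence[bool], has_disease: Sequence[bool]) -> Tuple[float, float]:
    tp = sum(c and d for c, d in zip(calls, has_disease))
    fn = sum((not c) and d for c, d in zip(calls, has_disease))
    tn = sum((not c) and (not d) for c, d in zip(calls, has_disease))
    fp = sum(c and (not d) for c, d in zip(calls, has_disease))
    return tp / (tp + fn), tn / (tn + fp)


# Toy signals (arbitrary units), invented for illustration only.
SAMPLES: List[Tuple[Dict[str, float], bool]] = [
    ({"BRCA1": 2.0, "p53": 0.1}, True),
    ({"BRCA1": 0.2, "p53": 1.5}, True),
    ({"BRCA1": 0.1, "p53": 0.2}, True),
    ({"BRCA1": 1.2, "p53": 1.3}, True),
    ({"BRCA1": 0.1, "p53": 0.2}, False),
    ({"BRCA1": 0.3, "p53": 0.1}, False),
    ({"BRCA1": 0.2, "p53": 1.1}, False),
    ({"BRCA1": 0.4, "p53": 0.3}, False),
]


def evaluate(cutoffs: Dict[str, float]) -> Tuple[float, float]:
    calls = [panel_positive(levels, cutoffs) for levels, _ in SAMPLES]
    return sens_spec(calls, [disease for _, disease in SAMPLES])


single = evaluate({"BRCA1": 1.0})              # BRCA1 alone
panel = evaluate({"BRCA1": 1.0, "p53": 1.0})   # two-marker OR panel
```

On these toy data, BRCA1 alone gives a sensitivity of 0.5 and a specificity of 1.0, while the two-marker panel gives 0.75 and 0.75, illustrating the usual trade-off when markers are combined.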
AAgAtlas 1.0: A Database of Human Autoantigens Extracted from Biomedical. . . 369
Fig. 3 Detailed information on the queried AAg gene in the AAgAtlas 1.0 database
3.2 Query by Disease Term
The user can also query the AAgs associated with a specific disease using the “Query by disease term” function. For example, breast cancer, the most common invasive cancer in women, affects ~12% of women worldwide. Risk factors for breast cancer include female sex, obesity, genetics, and alcohol consumption, among others [9]. Accumulating evidence reveals an association between breast cancer and autoimmunity: the risk of developing breast cancer is reduced in patients with rheumatoid arthritis and systemic lupus erythematosus and increased in patients with psoriasis [10].
To find all reported AAbs associated with breast cancer, we entered “breast cancer” in the “Disease Term” search box. The search engine returned 88 AAg genes related to breast cancer, together with the supporting literature evidence and the number of sentences (Fig. 4). These AAgs include well-known cancer-related genes such as ERBB2, BRCA1, TP53, TP63, and MUC1. When the user clicks the supporting literature or the number of sentences, the page displays a table containing the gene, disease, PubMed ID, evidence, and manual verification information. The original
Fig. 5 GO analysis of breast cancer-associated AAgs using the PANTHER database. Panels (A–D) show the analysis of biological process, signaling pathway, subcellular location, and protein class, respectively
4 Notes
Acknowledgments
This work was supported by the Chinese National Major Project for
New Drug Innovation (2018ZX09733003), National Key Basic
Research Project (2018YFA0507503, 2017YFC0906703),
National Natural Science Foundation of China (81673040 and
31870823), State Key Laboratory of Proteomics (SKLP-
O201703 and SKLP-K201505), and Capital’s Funds for Health
Improvement and Research (2018-2-4034).
References
Abstract
One of the major challenges in vaccine design is identifying B-cell epitopes in continuously evolving viruses. Various tools have been developed to predict linear or conformational epitopes, each relying on different physicochemical properties and adopting distinct search strategies. In this chapter, we propose several ensemble meta-learning approaches for epitope prediction based on stacked generalization, cascade generalization, and meta decision trees. Through meta learning, we expect a meta learner to integrate multiple prediction models and outperform the single best-performing model. The objective of this chapter is twofold: (1) to promote the complementary predictive strengths of different prediction tools and (2) to introduce computational models that exploit the synergy among various prediction tools. Our primary goal is not to develop a particular classifier for B-cell epitope prediction, but to show the feasibility of applying meta learning to epitope prediction. With the flexibility of meta learning, researchers can construct various meta-classification hierarchies applicable to epitope prediction in different protein domains.
Key words B-cell epitopes, Meta learning, Stacking, Cascade, Meta decision trees
1 Introduction
376 Yuh-Jyh Hu
2.2 Stacked Generalization (Stacking) and Cascade Generalization (Cascade)
Stacked and cascade generalizations are methods of combining the predictions of multiple learning models that have been trained for a classification task [22–25]. Unlike approaches based on bagging [27] or boosting [28], which aim to reduce the variance of multiple learners to improve performance, stacked and cascade generalizations both work as layered processes with the aim of reducing learner bias.
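Cascade generalization differs from stacking in that the learners are chained rather than placed in parallel: in its common formulation, each level is trained on the original features extended with the class-probability output of the previous level, and the last level makes the final prediction. The Python below is a minimal sketch under toy assumptions. Two hypothetical learners, `CentroidProb` (a nearest-centroid probability estimator) and `Logistic` (logistic regression fitted by stochastic gradient descent), stand in for the k-NN, C4.5, ANN, and SVM levels used in the chapter, and the data are synthetic. For a binary task, one probability per level is appended.

```python
import math
import random


class CentroidProb:
    """Nearest-centroid learner that turns the two centroid distances into P(epitope)."""

    def fit(self, X, y):
        pos = [x for x, t in zip(X, y) if t == 1]
        neg = [x for x, t in zip(X, y) if t == 0]
        self.c_pos = [sum(col) / len(pos) for col in zip(*pos)]
        self.c_neg = [sum(col) / len(neg) for col in zip(*neg)]
        return self

    def prob(self, x):
        gap = math.dist(x, self.c_pos) - math.dist(x, self.c_neg)
        return 1.0 / (1.0 + math.exp(max(-30.0, min(30.0, gap))))


class Logistic:
    """Logistic regression fitted by stochastic gradient descent."""

    def __init__(self, lr=0.05, epochs=200):
        self.lr, self.epochs = lr, epochs

    def prob(self, x):
        s = self.b + sum(w * v for w, v in zip(self.w, x))
        return 1.0 / (1.0 + math.exp(-max(-30.0, min(30.0, s))))

    def fit(self, X, y):
        self.w, self.b = [0.0] * len(X[0]), 0.0
        for _ in range(self.epochs):
            for x, t in zip(X, y):
                g = self.prob(x) - t  # gradient of the log-loss w.r.t. the logit
                self.w = [w - self.lr * g * v for w, v in zip(self.w, x)]
                self.b -= self.lr * g
        return self


class Cascade:
    def __init__(self, factories):
        self.factories = factories  # one learner factory per level, applied in order

    def fit(self, X, y):
        Z = [list(x) for x in X]
        self.levels = []
        for make in self.factories:
            model = make().fit(Z, y)
            self.levels.append(model)
            Z = [z + [model.prob(z)] for z in Z]  # extend features with this level's output
        return self

    def prob(self, x):
        z = list(x)
        for model in self.levels:
            p = model.prob(z)
            z = z + [p]
        return p  # output of the last level


def make_data_2d(n, seed):
    """Synthetic two-feature data; labels alternate so both classes are present."""
    rng = random.Random(seed)
    X = [[2.0 * (i % 2) + rng.gauss(0.0, 1.0) for _ in range(2)] for i in range(n)]
    return X, [i % 2 for i in range(n)]
```

For example, `Cascade([CentroidProb, Logistic]).fit(*make_data_2d(200, seed=0))` trains the centroid level first and then a logistic level that sees the two features plus the centroid probability.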
In stacked generalization, each base learner in a set is trained on a data set, and the predictions of these base learners become the meta features. A successive layer of meta learners receives the meta features as input, trains the meta models in parallel, and passes their output to the subsequent layer. A single classifier at the top level makes the final prediction. Figure 1 shows a stacked generalization architecture. Stacked generalization is considered a
Fig. 1 Generic stacked generalization architecture. In stacked generalization, a varying number of meta
learners are placed in parallel at each level in the hierarchy. They integrate and transform the output from the
preceding level into meta features and pass the meta features as the input to the successive level. One meta
learner serves as the arbitrator at the top level to produce the final meta classification
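A minimal sketch of two-level stacking (the idea of Fig. 1 and the architecture of Fig. 3) follows. It assumes that each Level-0 predictor is a fixed function returning a score in [0, 1] for a residue, mirroring the fact that the epitope servers in this chapter cannot be retrained, and it uses a plain logistic-regression meta learner as a stand-in for the SVM, C4.5, k-NN, or ANN meta learners described in the text. All names (`meta_features`, `LogisticMetaLearner`, `fit_stacking`, `stack_predict`) and the toy data are hypothetical.

```python
import math
from typing import Callable, List, Optional, Sequence

# A base predictor maps one residue's feature vector to a score in [0, 1].
# In the chapter the Level-0 predictors are fixed epitope servers; the ones
# used in the example below are hypothetical stand-ins.
BasePredictor = Callable[[Sequence[float]], float]


def sigmoid(s: float) -> float:
    """Numerically stable logistic function."""
    if s >= 0:
        return 1.0 / (1.0 + math.exp(-s))
    e = math.exp(s)
    return e / (1.0 + e)


def meta_features(predictors: List[BasePredictor], x: Sequence[float]) -> List[float]:
    """Level 0: each base predictor's score becomes one meta feature."""
    return [p(x) for p in predictors]


class LogisticMetaLearner:
    """Level 1: minimal logistic-regression meta learner trained by batch gradient descent."""

    def __init__(self, lr: float = 0.5, epochs: int = 5000) -> None:
        self.lr = lr
        self.epochs = epochs
        self.w: List[float] = []
        self.b = 0.0

    def prob(self, z: Sequence[float]) -> float:
        return sigmoid(self.b + sum(wj * zj for wj, zj in zip(self.w, z)))

    def fit(self, Z: List[List[float]], y: List[int]) -> "LogisticMetaLearner":
        self.w, self.b, n = [0.0] * len(Z[0]), 0.0, len(Z)
        for _ in range(self.epochs):
            grad_w, grad_b = [0.0] * len(self.w), 0.0
            for z, target in zip(Z, y):
                err = self.prob(z) - target  # derivative of the log loss w.r.t. the logit
                grad_b += err
                for j, zj in enumerate(z):
                    grad_w[j] += err * zj
            self.b -= self.lr * grad_b / n
            self.w = [wj - self.lr * g / n for wj, g in zip(self.w, grad_w)]
        return self

    def predict(self, z: Sequence[float]) -> int:
        return int(self.prob(z) >= 0.5)


def fit_stacking(predictors: List[BasePredictor], X: List[List[float]], y: List[int],
                 meta: Optional[LogisticMetaLearner] = None) -> LogisticMetaLearner:
    """Convert labelled residues into meta features and train the meta learner on them."""
    if meta is None:
        meta = LogisticMetaLearner()
    return meta.fit([meta_features(predictors, x) for x in X], y)


def stack_predict(predictors: List[BasePredictor], meta: LogisticMetaLearner,
                  x: Sequence[float]) -> int:
    """Final meta classification (1 = epitope residue) for one residue."""
    return meta.predict(meta_features(predictors, x))


# Usage with two hypothetical base predictors that each read one toy feature.
servers = [lambda x: x[0], lambda x: x[1]]
X = [[0.9, 0.2], [0.8, 0.9], [0.7, 0.8], [0.1, 0.7], [0.2, 0.1], [0.3, 0.2]]
y = [1, 1, 1, 0, 0, 0]
meta = fit_stacking(servers, X, y)
print([stack_predict(servers, meta, x) for x in X])  # [1, 1, 1, 0, 0, 0]
```

Because the base predictors here are fixed, their scores on the training residues can be used directly as meta features; if the base learners were themselves trained on the same data, out-of-fold predictions would normally be used instead. A three-level hierarchy such as Fig. 4 would add a layer of parallel meta learners whose outputs are appended to the input of the top learner.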
Table 1
Summary of base features
Meta Learning to Predict B-cell Epitopes 381
2.3 Meta Decision Trees (MDT)
MDTs [26] are a meta learning method that applies multiple base classifiers to a single data set and exploits the classification results of the base classifiers as a type of meta-knowledge. The structure of an MDT is identical to that of an ordinary decision tree: both have internal nodes and leaves, and both have the same computational complexity. However, in an MDT, the attributes associated with the internal nodes and the meaning indicated by the leaves differ from those of an ordinary decision tree.
In both MDTs and ordinary decision trees, an internal node specifies a test on an attribute value. For an ordinary decision tree, the attribute selected for the internal node must be one of the base
Fig. 3 Two-level stacking architecture. The conformational epitope predictors and linear epitope predictors
were all placed at Level 0. One of the learners (SVM, C4.5, k-NN, or ANN) served as the meta learner that
integrated the output of the base predictors and produced the meta classification as the final result
Fig. 4 Three-level stacking architecture. The conformational epitope predictors and linear epitope predictors
were all placed at Level 0. We selected C4.5, k-NN, and ANN as the Level-1 meta learners that transformed
the output of the base predictors into meta features and passed them to the successive level. We designated
SVM as the top meta learner that learned from the base features and the meta features to produce the meta
classification as the final result
Fig. 5 Cascade generalization architecture. The conformational epitope predictors and linear epitope
predictors all served at Level 0 as the base predictors. We placed k-NN, C4.5, ANN, and SVM sequentially from
Levels 1 to 4 as meta learners. Each meta learner generalized the output from the previous level to meta
knowledge in the form of meta features. The meta features and base features propagated sequentially to the
successive level as input to the subsequent meta learner. The top-level meta learner, SVM, produced the final
meta classification
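The sequential propagation of Fig. 5 can be sketched as follows. This is a hedged illustration under simplifying assumptions: only two toy learners are chained (a k-NN scorer and a nearest-centroid scorer, both hypothetical stand-ins for the k-NN, C4.5, ANN, and SVM levels), and each level appends its score for the positive (epitope) class to the feature vector passed to the next level. The class and function names are illustrative, and both classes must be present in the training data.

```python
import math
from typing import List, Sequence, Tuple


class KNNScorer:
    """Minimal k-NN learner: score = fraction of the k nearest training points labelled 1."""

    def __init__(self, k: int = 3) -> None:
        self.k = k
        self.data: List[Tuple[List[float], int]] = []

    def fit(self, X: List[List[float]], y: List[int]) -> "KNNScorer":
        self.data = list(zip(X, y))
        return self

    def score(self, x: Sequence[float]) -> float:
        nearest = sorted(self.data, key=lambda d: math.dist(d[0], x))[: self.k]
        return sum(label for _, label in nearest) / len(nearest)


class CentroidScorer:
    """Minimal nearest-centroid learner: score in [0, 1], higher when x is nearer the positive centroid."""

    def fit(self, X: List[List[float]], y: List[int]) -> "CentroidScorer":
        pos = [x for x, t in zip(X, y) if t == 1]
        neg = [x for x, t in zip(X, y) if t == 0]
        self.c_pos = [sum(col) / len(pos) for col in zip(*pos)]
        self.c_neg = [sum(col) / len(neg) for col in zip(*neg)]
        return self

    def score(self, x: Sequence[float]) -> float:
        d_pos, d_neg = math.dist(x, self.c_pos), math.dist(x, self.c_neg)
        return d_neg / (d_pos + d_neg) if d_pos + d_neg > 0 else 0.5


def fit_cascade(learners: list, X: List[List[float]], y: List[int]) -> list:
    """Train learners level by level; each level sees the base features plus
    the scores of every earlier level appended as meta features."""
    Z = [list(x) for x in X]
    for learner in learners:
        learner.fit(Z, y)
        Z = [z + [learner.score(z)] for z in Z]  # new lists, so stored training data is untouched
    return learners


def cascade_predict(learners: list, x: Sequence[float], threshold: float = 0.5) -> int:
    """Propagate one example through the cascade; the last level decides."""
    z, s = list(x), 0.0
    for learner in learners:
        s = learner.score(z)
        z = z + [s]
    return int(s >= threshold)


# Usage on toy data: two well-separated groups of residues.
X = [[0.9, 0.8], [0.8, 0.7], [0.85, 0.9], [0.1, 0.2], [0.2, 0.1], [0.15, 0.25]]
y = [1, 1, 1, 0, 0, 0]
levels = fit_cascade([KNNScorer(k=3), CentroidScorer()], X, y)
print(cascade_predict(levels, [0.88, 0.82]), cascade_predict(levels, [0.12, 0.18]))  # 1 0
```

In this sketch the training-set scores of each level come from the model just fitted on those same examples; with a k-NN level that includes each point as its own neighbour, which is one reason a practical implementation might prefer cross-validated scores.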
Fig. 6 Sample ordinary decision tree and MDT. (a) An ordinary decision tree and (b) a meta decision tree
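To make the contrast in Fig. 6 concrete, the following sketch evaluates a small, hand-built meta decision tree. In the spirit of MDTs [26], internal nodes test meta attributes derived from base-classifier outputs (here only one kind: a binary classifier's confidence, taken as its larger class probability), and each leaf names the base classifier whose prediction is returned rather than a class label. The tree, thresholds, and classifier names are hypothetical, and the MDT induction step is not shown.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass
class Leaf:
    use: str  # name of the base classifier whose prediction is returned


@dataclass
class Node:
    attribute: str   # meta attribute tested here, e.g. "conf_knn"
    threshold: float
    left: "Tree"     # branch taken when the meta attribute <= threshold
    right: "Tree"    # branch taken otherwise


Tree = Union[Leaf, Node]


def confidence(prob_positive: float) -> float:
    """Meta attribute: the larger of the two class probabilities of a binary classifier."""
    return max(prob_positive, 1.0 - prob_positive)


def mdt_predict(tree: Tree, base_probs: Dict[str, float]) -> int:
    """Route one example through the MDT using meta attributes computed from the
    base classifiers' outputs, then return the prediction of the chosen classifier."""
    meta = {f"conf_{name}": confidence(p) for name, p in base_probs.items()}
    node = tree
    while isinstance(node, Node):
        node = node.left if meta[node.attribute] <= node.threshold else node.right
    return int(base_probs[node.use] >= 0.5)


# Hypothetical MDT: trust k-NN when it is confident, else SVM when confident, else C4.5.
mdt = Node("conf_knn", 0.8,
           left=Node("conf_svm", 0.7, left=Leaf("c45"), right=Leaf("svm")),
           right=Leaf("knn"))
print(mdt_predict(mdt, {"knn": 0.95, "svm": 0.20, "c45": 0.40}))  # 1 (k-NN chosen)
print(mdt_predict(mdt, {"knn": 0.60, "svm": 0.10, "c45": 0.70}))  # 0 (SVM chosen)
```

An ordinary decision tree over the same inputs would instead test base features or scores directly and end in class labels; the MDT only decides whom to trust for each example.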
$$\mathrm{Score}_N = \alpha \cdot \sum_{i=1}^{m} w_i \cdot \lceil n_i - 0.5 \rceil + (1 - \alpha) \cdot \sum_{i=1}^{m} w_i \cdot n_i \qquad (3)$$
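Eq. (3) can be evaluated directly once its symbols are fixed. Their definitions are not part of this excerpt, so the sketch below assumes that m is the number of combined predictors, n_i in [0, 1] is the normalized score that predictor i assigns to the residue being scored, w_i is that predictor's weight, and α in [0, 1] balances a thresholded vote against the weighted score. Under that assumption, ⌈n_i − 0.5⌉ equals 1 when n_i > 0.5 and 0 otherwise. The function name `score_n` and the example values are hypothetical.

```python
import math
from typing import Sequence


def score_n(n: Sequence[float], w: Sequence[float], alpha: float) -> float:
    """Evaluate Eq. (3) under the assumed meaning of its symbols:
    n[i] = normalized score of predictor i in [0, 1], w[i] = its weight,
    alpha in [0, 1] = balance between the thresholded vote and the weighted score."""
    if len(n) != len(w):
        raise ValueError("n and w must have the same length m")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    # ceil(n_i - 0.5) is 1 when n_i > 0.5 and 0 when n_i <= 0.5 (for n_i in [0, 1]).
    vote = sum(wi * math.ceil(ni - 0.5) for wi, ni in zip(w, n))
    weighted = sum(wi * ni for wi, ni in zip(w, n))
    return alpha * vote + (1.0 - alpha) * weighted


print(round(score_n([0.9, 0.4, 0.6], [0.5, 0.3, 0.2], alpha=0.5), 3))  # 0.695
```

With α = 1 only the weighted vote of predictors scoring above 0.5 counts (0.7 in the example); with α = 0 the raw weighted score is used (0.69).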
2.4 Data Sets and Performance Measures
An epitope prediction server must be trained to obtain its prediction model before it can make a prediction. Because the epitope predictors used in this study were web-based servers or software packages, they could not be retrained using novel training data. To conduct a consistent and unbiased comparative analysis of the prediction performances of these servers, we created an independent data set of antigens with known epitopes. We collected the test data sets used in DiscoTope 2.0 [21], SEPPA 2.0 [14], and Bpredictor [33] and combined them with the data of the Epitome database [58] and the Immune Epitope Database (IEDB) [59]. After removing duplicate proteins and filtering out antigens that lacked annotations or had previously been used to train the base prediction servers, we built an independent data set of 64 antigens for prediction performance evaluation (Table 2). To ensure a fair comparison between different prediction methods, we used these 64 independent antigens, with epitope residues annotated in the IEDB, for testing, and we used 94 antigens that had previously been used to train the base prediction servers (Table 3) to train the classification models. The 3D structures of the antigen proteins were used as input for the structure-based classifiers, and the corresponding antigen sequences were submitted to the sequence-based predictors.
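The filtering steps above amount to set operations on antigen identifiers. A minimal sketch follows, assuming each antigen is identified by a string (for example a PDB chain ID) and that annotation status and prior training use are available as sets; the function name and the example IDs are hypothetical.

```python
from typing import Dict, Iterable, List, Set


def build_independent_test_set(candidate_sets: Iterable[Iterable[str]],
                               annotated: Set[str],
                               used_for_training: Set[str]) -> List[str]:
    """Merge candidate antigen lists, then drop exact duplicate IDs, antigens without
    epitope annotations, and antigens already used to train a base server."""
    merged: Dict[str, None] = {}
    for ids in candidate_sets:
        for antigen_id in ids:
            merged.setdefault(antigen_id, None)  # a dict keeps first-seen order
    return [a for a in merged if a in annotated and a not in used_for_training]


# Hypothetical antigen IDs, for illustration only.
test_set = build_independent_test_set(
    [["AG1", "AG2", "AG3"], ["AG2", "AG4"], ["AG5"]],
    annotated={"AG1", "AG2", "AG4", "AG5"},
    used_for_training={"AG4"},
)
print(test_set)  # ['AG1', 'AG2', 'AG5']
```

Passing only the training antigens of a single predictor as `used_for_training` gives per-comparison test sets of the kind described in Subheading 3.3. Real duplicate removal would usually compare sequences rather than identifiers; this sketch only removes repeated IDs.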
Table 2
Test data set of 64 protein antigens
Table 3
Training data set of 94 protein antigens
1A2Y_C 1ADQ_A 1AFV_A 1AHW_C 1AR1_B 1BGX_T 1BQL_Y 1BVK_C 1C08_C 1DQJ_C
1DZB_X 1DZB_Y 1EGJ_A 1EO8_A 1EZV_E 1FDL_Y 1FNS_A 1FSK_A 1G7H_C 1G7I_C
1G7J_C 1G7L_C 1G7M_C 1G9M_G 1G9N_G 1GC1_G 1HYS_B 1IC4_Y 1IC5_Y 1IC7_Y
1J1O_Y 1J1P_Y 1J1X_Y 1JHL_A 1JPS_T 1JRH_I 1KIP_C 1KIQ_C 1KIR_C 1KYO_E
1LK3_A 1MEL_L 1MHP_B 1MLC_E 1N8Z_C 1NBY_C 1NBZ_C 1NDG_C 1NDM_C 1NSN_S
1OAK_A 1ORS_C 1OSP_O 1QLE_B 1R3K_C 1RJL_C 1RVF_1 1RVF_2 1RVF_3 1RZJ_G
1RZK_G 1TZH_V 1TZI_V 1UA6_Y 1UAC_Y 1UJ3_C 1V7M_V 1W72_A 1WEJ_F 1XIW_A
1YJD_C 1YQV_Y 1YY9_A 1ZTX_E 2AEP_A 2ARJ_Q 2B2X_A 2DD8_S 2EIZ_C 2HMI_B
2Q8A_A 2QQK_A 2QQN_A 2UZI_R 2VH5_R 2VXQ_A 2VXT_I 2W9E_A 2XTJ_A 2ZUQ_A
Table 4
Definitions of performance measures
2.5 Correlation Analysis
A meta classifier can consist of an arbitrary number of base learners, and its overall performance depends on these learning components. If the learning components have complementary predictive strengths, a meta classifier can search a wider variety of hypotheses in the hypothesis space and generalize better to novel test data than a single-component learner can. We introduce two methods to evaluate the correlation between base learning components in meta learning: one based on statistical correlation analysis and the other based on clustering analysis.
2.5.1 Stacking and Cascade
We used statistical techniques to analyze the B-cell epitope prediction tools, evaluating the correlations both between the prediction scores and between the rankings of those scores. Using Pearson's correlation analysis, we measured the strength of the relationship between the prediction scores produced by the tools. We also ranked the prediction scores produced by each tool and calculated Spearman's rank correlation coefficient to investigate the correlations between the score rankings of the prediction tools. The results of the correlation analysis provide a basis for selecting appropriate base learners for stacking and cascade meta learning.
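Both coefficients can be computed without external libraries, as in the sketch below. It assumes that two predictors' scores are aligned residue by residue; tied scores receive average ranks, and Spearman's coefficient is then computed as the Pearson correlation of the ranks. The function names are illustrative, and constant score vectors (zero variance) are not handled.

```python
import math
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson's correlation coefficient of two equal-length score vectors."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least two values")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(v: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(v)), key=lambda i: v[i])
    ranks = [0.0] * len(v)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and v[order[j + 1]] == v[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rank correlation = Pearson correlation of the (tie-averaged) ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical scores from two predictors on the same five residues.
a = [0.10, 0.40, 0.35, 0.80, 0.70]
b = [0.20, 0.30, 0.50, 0.90, 0.60]
print(round(pearson(a, b), 3), round(spearman(a, b), 3))
```

Averaging these coefficients over all predictor pairs gives summary values of the kind reported in Subheading 3.1.1.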
3 Experimental Results
3.1.1 Prediction Correlations Between Base Prediction Servers
We evaluated four conformational and four linear epitope predictors as the base learners in our stacking and cascade architectures. The conformational predictors were DiscoTope 2.0 [21], ElliPro [15], SEPPA 2.0 [14], and Bpredictor [33]; the linear epitope predictors were BepiPred [9], ABCpred [10], AAP [11], and BCPREDS [12]. We calculated Pearson's correlation coefficients for the prediction scores produced by the base prediction tools. To further analyze the correlations among predictions based on score rankings, we sorted the prediction scores of all protein sites provided by each base learner and then conducted a Spearman's rank correlation analysis. Tables 5 and 6 list the Pearson's and Spearman's rank correlation coefficients of all pairs of linear and conformational predictors, respectively. In the Pearson's correlation analysis, the average correlation coefficients of the linear and conformational prediction tools were 0.383 and 0.384, respectively; in the Spearman's rank correlation analysis, they were 0.370 and 0.459. These values indicate relatively weak correlations among the epitope predictions of the base learners.
3.1.2 Prediction Correlations Between Base Inductive Classifiers
We built MDTs based on eight base classifiers: C4.5 [29], k-NN [30], SVM [32], RF [52], PART [53], BN [54], JRip [55], and VP [56]. We measured the correlation between two base classifiers by the adjusted Rand index (ARI) [60] of their classifications. Table 7 lists the ARI values of all pairs of base classifiers for an independent test data set of 18 antigens. The mean ± standard deviation of the ARI values was 0.238 ± 0.084; this relatively low value indicates a relatively weak correlation among the base classifiers.
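The ARI compares two partitions of the same items; here, each classifier's epitope/non-epitope predictions partition the test residues into two groups. A minimal pair-counting implementation of Hubert and Arabie's adjusted Rand index follows; the function name and the example predictions are hypothetical.

```python
from collections import Counter
from math import comb
from typing import Hashable, Sequence


def adjusted_rand_index(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Adjusted Rand index of two labelings of the same items (pair-counting form)."""
    if len(a) != len(b):
        raise ValueError("labelings must cover the same items")
    n = len(a)
    if n < 2:
        raise ValueError("need at least two items")
    index = sum(comb(c, 2) for c in Counter(zip(a, b)).values())  # pairs grouped together in both
    sum_a = sum(comb(c, 2) for c in Counter(a).values())
    sum_b = sum(comb(c, 2) for c in Counter(b).values())
    expected = sum_a * sum_b / comb(n, 2)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        return 1.0  # degenerate case: both labelings are trivial in the same way
    return (index - expected) / (max_index - expected)


# Hypothetical predictions (1 = epitope residue) of two classifiers on six residues.
knn_pred = [1, 1, 0, 0, 1, 0]
svm_pred = [1, 0, 0, 0, 1, 0]
print(round(adjusted_rand_index(knn_pred, svm_pred), 3))
```

The index is invariant to relabeling (swapping 0 and 1 in one labeling gives the same value), is 1 for identical partitions, and has an expected value of 0 for random agreement, which is why values near 0.24 indicate weak correlation.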
Table 5
Correlation analysis of linear epitope predictors
Table 6
Correlation analysis of conformational epitope predictors
Table 7
Correlation analysis of base inductive classifiers based on ARI
Classifier  C4.5   k-NN   VP     PART   RF     BN     JRip
k-NN        0.198  -      -      -      -      -      -
VP          0.189  0.256  -      -      -      -      -
PART        0.290  0.259  0.282  -      -      -      -
RF          0.157  0.245  0.166  0.164  -      -      -
BN          0.232  0.120  0.126  0.240  0.050  -      -
JRip        0.251  0.239  0.281  0.306  0.237  0.191  -
SVM         0.248  0.359  0.382  0.386  0.361  0.125  0.335
k-NN k-nearest neighbor, VP voted perceptron, RF random forest, BN Bayes Net
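The summary statistic in Subheading 3.1.2 can be checked against Table 7. Treating the 28 pairwise values as a sample and using the sample standard deviation (n − 1 denominator) reproduces the reported 0.238 ± 0.084; that this is how the summary was computed is an inference from the numbers, not something the text states.

```python
from statistics import mean, stdev

# Lower-triangular ARI values from Table 7, listed row by row.
TABLE7_ARI = [
    0.198,                                            # k-NN
    0.189, 0.256,                                     # VP
    0.290, 0.259, 0.282,                              # PART
    0.157, 0.245, 0.166, 0.164,                       # RF
    0.232, 0.120, 0.126, 0.240, 0.050,                # BN
    0.251, 0.239, 0.281, 0.306, 0.237, 0.191,         # JRip
    0.248, 0.359, 0.382, 0.386, 0.361, 0.125, 0.335,  # SVM
]

m, s = mean(TABLE7_ARI), stdev(TABLE7_ARI)  # stdev = sample standard deviation
print(f"{len(TABLE7_ARI)} pairs: {m:.3f} ± {s:.3f}")  # 28 pairs: 0.238 ± 0.084
```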
Table 8
Fivefold cross-validation of meta classifiers
Table 9
Fivefold cross-validation of base epitope prediction servers
3.3 Independent Tests
In addition to comparing the meta classifiers with the base epitope predictors in the same fivefold cross-validations (CVs), we also compared the meta classifiers with the epitope predictors separately, using different test antigens selected from the independent test data set of 64 antigens (see Subheading 2.4). We conducted these experiments on several representative epitope predictors: SEPPA 2.0, DiscoTope 2.0, Bpredictor, ElliPro, and CBTOPE [61], each of which had been trained and tested on different data sets. We first trained the meta classifiers (stacking, cascade, and BaggingMDT) on the same training data set of 94 antigens (see Subheading 2.4) that had previously been used to train the predictors being compared. In each experiment, we selected one epitope predictor for comparison. To keep the analysis consistent and unbiased, we removed from the independent test data set of 64 antigens any antigens that had also been used to train the selected predictor, so that the training and test data were mutually exclusive. Table 10 shows that stacking, cascade, and BaggingMDT performed better than or comparably to these representative epitope predictors. These results demonstrate that combining multiple epitope predictors or inductive classifiers can achieve better performance than a single epitope predictor.
4 Conclusion
Table 10
Results of independent tests
5 Notes
References
1. Meloen RH, Puijk WC, Langeveld JP, Langedijk JP, Timmerman P (2003) Design of synthetic peptides for diagnostics. Curr Protein Pept Sci 4:253–260
2. Tanabe S (2007) Epitope peptides and immunotherapy. Curr Protein Pept Sci 8:109–118
3. Naz RK, Dabir P (2007) Peptide vaccines against cancer, infectious diseases, and conception. Front Biosci 12:1833–1844
4. Benjamin DC, Berzofsky JA, East IJ, Gurd FR, Hannum C, Leach SJ et al (1984) The antigenic structure of proteins: a reappraisal. Annu Rev Immunol 2:67–101
5. Pellequer JL, Westhof E, Van Regenmortel MH (1991) Predicting location of continuous epitopes in proteins from their primary structures. Methods Enzymol 203:176–201
6. Hopp TP, Woods KR (1981) Prediction of protein antigenic determinant from amino acid sequences. Proc Natl Acad Sci U S A 78:3824–3828
7. Pellequer J, Westhof E, Van Regenmortel M (1993) Correlation between the location of antigenic sites and the prediction of turns in proteins. Immunol Lett 36(1):83–99
8. Blythe MJ, Doytchinova IA, Flower DR (2002) JenPep: A database of quantitative functional peptide data for immunology. Bioinformatics 18(3):434–439
9. Larsen JE, Lund O, Nielsen M (2006) Improved method for predicting linear B-cell epitopes. Immunome Res 2:2
10. Saha S, Raghava G (2006) Prediction of continuous B-cell epitopes in an antigen using recurrent neural network. Proteins 65(1):40–48
11. Chen J, Liu H, Yang J, Chou K (2007) Prediction of linear B-cell epitopes using amino acid pair antigenicity scale. Amino Acids 33(3):423–428
12. El-Manzalawy Y, Dobbs D, Honavar V (2008) Predicting linear B-cell epitopes using string kernels. J Mol Recognit 21(4):243–255
13. Andersen PH, Nielsen M, Lund O (2006) Prediction of residues in discontinuous B-cell epitopes using protein 3D structures. Protein Sci 15:2558–2567
14. Qi T, Qiu T, Zhang Q, Tang K, Fan Y, Qiu J et al (2014) SEPPA 2.0-more refined server to predict spatial epitope considering species of immune host and subcellular localization of protein antigen. Nucleic Acids Res 42:W59–W63
15. Ponomarenko J, Bui HH, Li W, Fusseder N, Bourne PE, Sette A, Peters B (2008) ElliPro: a new structure-based tool for the prediction of antibody epitopes. BMC Bioinformatics 9:514
16. Karplus PA, Schulz GE (1985) Prediction of chain flexibility in proteins – a tool for the selection of peptide antigens. Naturwissenschaften 72:212–213
17. Rubinstein ND, Mayrose I, Martz E, Pupko T (2009) Epitopia: a web-server for predicting B-cell epitopes. BMC Bioinformatics 10:287
18. Zhang W, Liu J, Zhao M, Li Q (2012) Predicting linear B-cell epitopes by using sequence-derived structural and physicochemical features. Int J Data Min Bioinform 6(5):557–569
19. Liang S, Zheng D, Standley DM, Yao B, Zacharias M, Zhang C (2010) EPSVR and EPMeta: prediction of antigenic epitopes using support vector regression and multiple server results. BMC Bioinformatics 11:381
20. Zhang W, Niu Y, Xiong Y, Zhao M, Yu R, Liu J (2012) Computational prediction of conformational B-cell epitopes from antigen primary structures by ensemble learning. PLoS One 7(8):e43575
21. Kringelum JV, Lundegaard C, Lund O, Nielsen M (2012) Reliable B cell epitope predictions: impacts of method development and improved benchmarking. PLoS Comput Biol 8(12):e1002829
22. Wolpert DH (1992) Stacked Generalization. Neural Netw 5:241–259
23. Ting KM, Witten IH (1997) Stacked generalization: When does it work? In: International Joint Conference on Artificial Intelligence, pp 866–873
24. Gama J (1998) Combining classifiers by constructive induction. In: European Conference on Machine Learning, pp 178–189
25. Gama J, Brazdil P (2000) Cascade Generalization. Mach Learn 41(3):315–343
26. Todorovski L, Dzeroski S (2000) Combining multiple models with meta decision trees. Lect Notes Comput Sci 1910:54–64
27. Breiman L (1996) Bagging predictors. Mach Learn 24:123–140
28. Schapire R (1990) The strength of weak learnability. Mach Learn 5:197–227
29. Quinlan JR (1993) C4.5: programs for machine learning. Morgan Kaufmann Publishers, San Francisco
30. Cover TM, Hart PE (1967) Nearest neighbor pattern classification. IEEE Trans Inf Theory 13(1):21–27
31. Bishop CM (1996) Neural networks for pattern recognition. Oxford University Press, Oxford
32. Chang CC, Lin CJ (2011) LIBSVM: a library for support vector machines. ACM Trans Intell Syst Technol 2(3):1–27
33. Zhang W, Xiong Y, Zhao M, Zou H, Ye X, Liu J (2011) Prediction of conformational B-cell epitopes from 3D structures by random forests with a distance-based feature. BMC Bioinformatics 12:341
34. Nagano K (1973) Logical analysis of the mechanism of protein folding: I. predictions of helices, loops and beta-structures from primary structure. J Mol Biol 75(2):401–420
35. Hubbard SJ, Thornton JM (1993) NACCESS Computer Program. Department of Biochemistry and Molecular Biology, University College London
36. Lipkin HJ (2004) Physics of Debye-Waller Factors. arXiv:cond-mat/0405023
37. Liu R, Hu J (2011) Prediction of discontinuous B-cell epitopes using logistic regression and structural information. J Proteomics Bioinform 4:10–15
38. Sanner MF, Olson AJ, Spehner JC (1996) Reduced surface: an efficient way to compute molecular surfaces. Biopolymers 38(3):305–320
39. Parker JM, Guo D, Hodges RS (1986) New hydrophilicity scale derived from high-performance liquid chromatography peptide retention data: correlation of predicted surface residues with antigenicity and X-ray-derived accessible sites. Biochemistry 25(19):5425–5432
40. Zhang Z, Schäffer AA, Miller W, Madden TL, Lipman DJ, Koonin EV, Altschul SF (1998) Protein sequence similarity searches using patterns as seeds. Nucleic Acids Res 26(17):3986–3990
41. Gerstein M, Tsai J, Levitt M (1995) The volume of atoms on the protein surface: calculated from simulation, using Voronoi Polyhedra. J Mol Biol 249:955–966
42. Lee B, Richards FM (1971) The interpretation of protein structures: estimation of static accessibility. J Mol Biol 55(3):379–400
43. Gerstein M (1992) A resolution-sensitive procedure for comparing protein surfaces and its application to the comparison of antigen-combining sites. Acta Cryst A48:271–276
44. Hausman RE, Cooper GM (2003) The cell: a molecular approach. ASM Press, Washington, DC
45. Kyte J, Doolittle RF (1982) A simple method for displaying the hydropathic character of a protein. J Mol Biol 157(1):105–132
46. Kolaskar AS, Tongaonkar PC (1990) A semi-empirical method for prediction of antigenic determinants on protein antigens. FEBS Lett 276(1–2):172–174
47. Hu Y-J, Lin S-C, Lin Y-L, Lin K-H, You S-N (2014) A meta-learning approach for B-cell conformational epitope prediction. BMC Bioinformatics 15:378
48. Emini EA, Hughes JV, Perlow DS, Boger J (1985) Induction of hepatitis A virus-neutralizing antibody by a virus-specific synthetic peptide. J Virol 55:836–839
49. Janin J, Wodak S, Levitt M, Maigret B (1978) Conformation of amino acid side-chains in proteins. J Mol Biol 125(3):357–386
50. Ponnuswamy PK, Prabhakaran M, Manavalan P (1980) Hydrophobic packing and spatial arrangement of amino-acid-residues in globular-proteins. Biochim Biophys Acta 623:301–316
51. Grantham R (1974) Amino acid difference formula to help explain protein evolution. Science 185:862–864
52. Breiman L (2001) Random forests. Mach Learn 45:5–32
53. Frank E, Witten IH (1998) Generating accurate rule sets without global optimization. In: Proceedings of the Fifteenth International Conference on Machine Learning, pp 144–151
54. Pearl J (1988) Probabilistic reasoning in intelligent systems: networks of plausible inference. Morgan Kaufmann Publishers Inc., Burlington, MA
55. Cohen WW (1995) Fast effective rule induction. In: Proceedings of the Twelfth International Conference on Machine Learning, pp 115–123
56. Freund Y, Schapire RE (1999) Large margin classification using the perceptron algorithm. Mach Learn 37:277–296
57. Breiman L, Friedman JH, Olshen RA, Stone CJ (1984) Classification and regression trees. Wadsworth & Brooks/Cole Advanced Books & Software, Monterey, CA
58. Schlessinger A, Ofran Y, Yachdav G, Rost B (2006) Epitome: database of structure-inferred antigenic epitopes. Nucleic Acids Res 34:D777–D780
59. Ponomarenko J, Papangelopoulos N, Zajonc DM, Peters B, Sette A, Bourne PE (2011) IEDB-3D: structural data within the immune epitope database. Nucleic Acids Res 39:D1164–D1170
60. Hubert L, Arabie P (1985) Comparing partitions. J Classif 2:193–218
61. Ansari HR, Raghava G (2010) Identification of conformational B-cell Epitopes in an antigen from its primary sequence. Immunome Res 6:6
Chapter 23
Abstract
The proteasome complex is mainly responsible for the proteolytic degradation of cytosolic proteins, generating
the C-terminus of MHC I-restricted peptide ligands and CD8 T-cell epitopes. Therefore, prediction of
proteasomal cleavage sites is relevant for anticipating CD8 T-cell epitopes. There are two different
proteasomes: the constitutive proteasome, expressed in all types of cells, and the immunoproteasome,
constitutively expressed in dendritic cells. Although both proteasome forms generate peptides for presentation
by MHC I molecules, the immunoproteasome is the main form involved in providing peptide fragments for
priming CD8 T cells. In contrast, the constitutive proteasome provides peptides for presentation by MHC I
molecules that can be targeted by already primed CD8 T cells. The Proteasome Cleavage Prediction Server
(PCPS) predicts cleavage sites generated by both the constitutive proteasome and the immunoproteasome.
Here, we illustrate the use of PCPS to predict proteasome and immunoproteasome cleavage sites and compare
the results with those provided by NetChop, a related tool available online. PCPS is freely available online
at http://imed.med.ucm.es/Tools/pcps/.
1 Introduction
400 Marta Gomez-Perosanz et al.
2 PCPS Overview
Fig. 1 Web interface of PCPS. Cleavage site predictions by PCPS are performed in three simple steps:
(1) Upload the target sequence, (2) choose the proteasome and/or immunoproteasome cleavage model to
apply, and (3) run analysis
3.1 Input
Protein input sequence(s) for PCPS can be pasted or uploaded from a local file. In our example (Fig. 2a), we entered our target HCV proteome in the “INPUT” field in FASTA format (see Note 1). If the user opts to upload the sequences from a local file, there are two sequential steps: first, browse for and choose the local file containing the sequences; second, click the upload button. This is done to facilitate preprocessing and error checking of the input data prior to submission to the server.
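The two-step upload exists so that input can be checked before submission. A comparable client-side check can be sketched as follows. It is an illustration only, not PCPS's actual validation logic, and it assumes (hypothetically) that only the 20 standard one-letter amino acid codes are acceptable, so sequences containing ambiguity codes such as X would be rejected. `parse_fasta` is a hypothetical helper.

```python
from typing import List, Tuple

# Assumption for illustration: only the 20 standard one-letter residue codes are accepted.
STANDARD_AA = set("ACDEFGHIKLMNPQRSTVWY")


def parse_fasta(text: str) -> List[Tuple[str, str]]:
    """Parse FASTA text into (header, sequence) pairs and reject malformed input."""
    records: List[Tuple[str, List[str]]] = []
    for line_no, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue  # blank lines are ignored
        if line.startswith(">"):
            records.append((line[1:].strip(), []))
        elif not records:
            raise ValueError(f"line {line_no}: sequence data before the first '>' header")
        else:
            records[-1][1].append(line.upper())  # multi-line sequences are joined
    result = []
    for header, chunks in records:
        seq = "".join(chunks)
        if not seq:
            raise ValueError(f"record '{header}' has no sequence")
        bad = sorted(set(seq) - STANDARD_AA)
        if bad:
            raise ValueError(f"record '{header}' contains unexpected characters: {''.join(bad)}")
        result.append((header, seq))
    return result


records = parse_fasta(">seq1 toy example\nMKTAY\nIAKQR\n>seq2\nGGSL\n")
print(records)  # [('seq1 toy example', 'MKTAYIAKQR'), ('seq2', 'GGSL')]
```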
3.2 Cleavage Prediction Models
The web server allows the user to select a single n-gram cleavage prediction model (proteasome or immunoproteasome) or both models simultaneously (see Note 2). The cleavage models were trained on datasets of peptide fragments of different lengths (6, 8, or 12 amino acids) built from 382 MHC I-eluted peptides (proteasome model) or 553 naturally processed CD8 T-cell epitopes (immunoproteasome model). The datasets comprise two distinct portions with the same number of residues: one consisting of the C-terminal end of MHC I-restricted peptides and
Fig. 2 PCPS search example and output. The figure illustrates the parameters selected for computing PCPS
predictions using the HCV proteome (ACN: M62321.1) as an example (a) and the proteasome and
immunoproteasome cleavage site predictions provided by the server (b)
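Scanning a query sequence with a fixed-length model requires cutting it into windows around every candidate cleavage site. The sketch below shows one generic way to do this for fragment lengths such as 6, 8, or 12 residues. It assumes the window is centred on the scissile bond (n/2 residues on each side); that placement is an illustrative assumption, not a description of how PCPS builds its fragments. `cleavage_windows` is a hypothetical helper.

```python
from typing import List, Tuple


def cleavage_windows(seq: str, n: int) -> List[Tuple[int, str]]:
    """Return (p, window) for every bond between seq[p - 1] and seq[p] that has
    n // 2 residues on each side; each window is a candidate model input."""
    if n <= 0 or n % 2:
        raise ValueError("n must be a positive even number, e.g. 6, 8 or 12")
    half = n // 2
    return [(p, seq[p - half:p + half]) for p in range(half, len(seq) - half + 1)]


print(cleavage_windows("MKTAYIAK", 4))
# [(2, 'MKTA'), (3, 'KTAY'), (4, 'TAYI'), (5, 'AYIA'), (6, 'YIAK')]
```

Bonds closer than n/2 residues to either terminus get no window in this sketch; a real tool might pad the sequence instead.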
Proteasomal Cleavage Predictions 403
Table 1
Cleavage models available at PCPS server
Table 2
Predictive performance of PCPS and NetChop
Server SE SP MCC
PCPS 0.88 0.57 0.34
NetChop 0.78 0.66 0.33
SE sensitivity, SP specificity, MCC Matthews correlation coefficient of the cleavage
prediction method, as computed by Eqs. 1, 2, and 3
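The footnote refers to Eqs. 1, 2, and 3, which are not part of this excerpt. Assuming they are the standard definitions, SE = TP/(TP + FN), SP = TN/(TN + FP), and MCC = (TP·TN − FP·FN)/√((TP + FP)(TP + FN)(TN + FP)(TN + FN)), the three measures can be computed from confusion-matrix counts as sketched below. The function name and the counts in the example are hypothetical and are not the PCPS or NetChop results.

```python
import math
from typing import Tuple


def se_sp_mcc(tp: int, fp: int, tn: int, fn: int) -> Tuple[float, float, float]:
    """Sensitivity, specificity, and Matthews correlation coefficient from a confusion matrix."""
    se = tp / (tp + fn)  # fraction of true cleavage sites that are predicted
    sp = tn / (tn + fp)  # fraction of non-cleavage sites correctly rejected
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0  # 0.0 by convention if a margin is empty
    return se, sp, mcc


# Hypothetical counts, for illustration only.
se, sp, mcc = se_sp_mcc(tp=40, fp=20, tn=30, fn=10)
print(f"SE={se:.2f} SP={sp:.2f} MCC={mcc:.3f}")  # SE=0.80 SP=0.60 MCC=0.408
```

Unlike SE and SP taken separately, MCC uses all four cells of the confusion matrix, which is why a method with higher sensitivity but lower specificity can end up with a similar MCC.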
5 Notes
Acknowledgments
References
1. Kloetzel PM (2001) Antigen processing by the predictions of proteasomal cleavage, TAP
proteasome. Nat Rev Mol Cell Biol 2 transport and MHC class I binding. Cell Mol
(3):179–187 Life Sci 62(9):1025–1037
2. Blum JS, Wearsch PA, Cresswell P (2013) 11. Holzhutter HG, Frommel C (1999) Kloetzel
Pathways of antigen processing. Annu Rev PM. A theoretical approach towards the identi-
Immunol 31:443–473 fication of cleavage-determining amino acid
3. Rock KL, Goldberg AL (1999) Degradation of motifs of the 20 S proteasome. J Mol Biol
cell proteins and the generation of MHC class 286(4):1251–1265
I-presented peptides. Annu Rev Immunol 12. Kuttler C, Nussbaum AK, Dick TP, Rammen-
17:739–779 see HG, Schild H, Hadeler KP (2000) An
4. Craiu A, Akopian T, Goldberg A, Rock KL algorithm for the prediction of proteasomal
(1997) Two distinct proteolytic processes in cleavages. J Mol Biol 298(3):417–429
the generation of a major histocompatibility 13. Bhasin M, Raghava GP (2005) Pcleavage: an
complex class I-presented peptide. Proc Natl SVM based method for prediction of constitu-
Acad Sci U S A 94(20):10850–10855 tive proteasome and immunoproteasome cleav-
5. Dalet A, Stroobant V, Vigneron N, Van den age sites in antigenic sequences. Nucleic Acids
Eynde BJ (2011) Differences in the production Res 33(Web Server issue):W202–W207
of spliced antigenic peptides by the standard 14. Saxova P, Buus S, Brunak S, Kesmir C (2003)
proteasome and the immunoproteasome. Eur Predicting proteasomal cleavage sites: a com-
J Immunol 41(1):39–46 parison of available methods. Int Immunol 15
6. Morel S, Levy F, Burlet-Schiltz O, Brasseur F, (7):781–787
Probst-Kepper M, Peitrequin AL et al (2000) 15. Kesmir C, Nussbaum AK, Schild H, Detours V,
Processing of some antigens by the standard Brunak S (2002) Prediction of proteasome
proteasome but not by the immunoprotea- cleavage motifs by neural networks. Protein
some results in poor presentation by dendritic Eng 15(4):287–296
cells. Immunity 12(1):107–117 16. Diez-Rivero CM, Lafuente EM, Reche PA
7. Nielsen M, Lundegaard C, Lund O, Kesmir C (2010) Computational analysis and modeling
(2005) The role of the proteasome in generat- of cleavage by the immunoproteasome and the
ing cytotoxic T-cell epitopes: insights obtained constitutive proteasome. BMC Bioinformatics
from improved predictions of proteasomal 11:479
cleavage. Immunogenetics 57(1–2):33–41 17. Fleri W, Paul S, Dhanda SK, Mahajan S, Xu X,
8. Rivett AJ, Hearn AR (2004) Proteasome func- Peters B et al (2017) The immune epitope
tion in antigen presentation: immunoprotea- database and analysis resource in epitope dis-
some complexes, peptide production, and covery and synthetic vaccine design. Front
interactions with viral proteins. Curr Protein Immunol 8:278
Pept Sci 5(3):153–161 18. Reche PA, Glutting JP, Zhang H, Reinherz EL
9. Nussbaum AK, Kuttler C, Hadeler KP, Ram- (2004) Enhancement to the RANKPEP
mensee HG, Schild H (2001) PAProC: a pre- resource for the prediction of peptide binding
diction algorithm for proteasomal cleavages to MHC molecules using profiles. Immunoge-
available on the WWW. Immunogenetics 53 netics 56(6):405–419
(2):87–94 19. Reche PA, Glutting JP, Reinherz EL (2002)
10. Tenzer S, Peters B, Bulik S, Schoor O, Prediction of MHC class I binding peptides
Lemmel C, Schatz MM et al (2005) Modeling using profile motifs. Hum Immunol 63
the MHC class I pathway by combining (9):701–709
INDEX
A C
Adaptive immunity................................18, 155, 166, 262 Cancer susceptibility prediction ................................... 186
Aggregation ..................................................... vi, 245–253 Cancer vaccine development ............................... 213–226
Allele frequency net database (AFND) ......................... 34, CD8+ and CD4+ T-cells ..................................... 167, 168,
159, 161, 237 235, 237–240
AllergenFP model ......................................................... 148 Cellular automata ................................311, 319–321, 326
Allergenicity prediction............................ v, 147–152, 179 Chimera ..........................................................49, 181, 182
Allergy............................................................................ 147 ClustalW .......................................................................... 42
AllerHunter server ........................................................ 179 Clustering ................................................... 168, 202–206,
AllerTOP ......................................................148–151, 283 208, 210, 253, 258, 389
Annotation .....................................................50, 367, 386 Combinatorial peptide libraries (CombLib) ................ 47,
Antibody ........................................................... vi, 3, 8, 11, 225, 270, 273
12, 21, 140, 147, 155, 156, 165–167, 177, 219, Conformational prediction B-cell eptiopes ........ 289–296
220, 235, 245, 349–354, 375 Consensus approach............................................. 233, 235
Antibody-binding sites .......................291, 292, 294, 295 Conservancy analysis ..................................................... 179
Antigen ....................................................... 20, 22, 24, 47, Cytokines ....................................207, 208, 213, 353, 354
51, 128, 140, 155, 162, 166–168, 173, 174, 176, Cytoscape.............................................................. 203, 210
177, 208, 219, 255, 256, 265, 266, 280, 286, Cytotoxic T-lymphocyte (CTL) ..................................141,
291, 293, 295, 302 142, 158, 177–179, 281
Antigen-Antibody Interaction Database
(AgAbDb) .......................................................... 167 D
Artificial neural network (ANN) .......................... 47, 167, Delay-differential reaction diffusion model................. 326
168, 178, 225, 233, 234, 270, 273, 281, 282,
Data-driven networks ......................................... 200, 202,
300, 339, 340, 376, 379, 381–384, 403 203, 205, 207–209
Antigen-presenting cell (APCs) ................. 167, 177, 214 Dendritic cells............................. 177, 201–203, 208, 400
Autoantibody (AAbs).............................................vii, 365, Discontinuous antigenic epitopes ....................... 286, 299
367, 368, 370, 373
Autoimmunity ...............................vii, 284–287, 365, 370 E
Autoreactivity ................................................................ 230
ECLIA technique .......................................................... 350
B Electrochemiluminescence ............................ vii, 349–363
Emini surface accessibility prediction ........................... 42,
Bacteriophages .............................................. vii, 246, 309,
52, 53, 56, 59, 62, 162, 165, 214, 222, 223
324, 329, 332, 333, 335, 337, 338, 341 Energy minimization .............................................. 4, 181,
BaggingMDT ...................................................... 386, 387, 271, 286, 287
390, 392–394
Enzyme-linked immunosorbent assay
B and T lymphocytes ........................................................ 3 (ELISA)...........................300, 349, 350, 362, 373
Basic Local Alignmet Search Tool (BLAST) ................ 20, EPCES ............................................................. vi, 289–296
22, 24–26, 51, 128, 139, 149, 157, 267, 281, 302 Epitope-based immune-derived vaccines......................... 2
Bayesian network (BN)....................................... 157, 206,
Epitope potential.......................................................20, 23
210, 383, 390 Epitope prediction .............................................. 4, 12, 13,
BcePred server................................................12, 181, 236 26, 42–49, 51, 52, 61, 155–169, 214, 215,
Binding stability ............................................................ 181
219–221, 223, 226, 255, 266, 268, 269,
Breast cancer type 1 susceptibility protein 281–282, 285, 286, 289, 290, 299, 300
(BRCA1)................................................... 368, 370 Epitopes-HLA docked complex................................... 181
407
IMMUNOINFORMATICS
408 Index
EPSVR .......... vi, 289–296
Escherichia coli .......... vi, 156, 330

F
FASTA format .......... 4, 150, 151, 176, 178, 179, 215, 231, 232, 238, 259, 261, 262, 267, 280, 283, 300, 304
Flow cytometers .......... 362
Fluorescein isothiocyanate (FITC) .......... 246
Fluorescence polarization .......... 246, 247, 251–253
Fuzzy C-means clustering .......... 204–206, 208

G
GenBank database .......... 26
Gene co-expression .......... 202, 205–207
Grand average of hydropathicity (GRAVY) .......... 286

H
Helper T-cell (HTC) .......... 141, 177–179, 279
Hidden Markov model (HMM) .......... 42, 214, 233, 300, 339
HIV-1 .......... 4, 255
HLA allele genotyping .......... 31–37
HLA distribution analysis .......... 161–162
HLA sequence data .......... 33
HLA typing .......... 187
Homology modeling .......... 4, 7, 49, 82, 103, 266, 267
HTRF immunoassays .......... 361
Human leukocyte antigen (HLA) .......... 24, 31, 33–36, 49, 80, 92, 102, 110, 120, 128, 138, 141, 158, 159, 162, 168, 169, 178–182, 185
Human papillomavirus .......... 19
Hydropathy index .......... 380, 381
Hydrophilicity .......... 47, 49, 52, 54, 57, 60, 63, 162, 165, 167, 177, 181, 214, 221, 223, 224, 286, 300

I
IC50 value .......... 48, 158–160, 169, 178, 282
IMGT database .......... 168
Immune epitope database (IEDB) .......... v, vi, 13, 23, 26, 29, 42, 47, 49, 69, 80, 141, 158, 159, 161, 162, 165, 177–179, 188, 214–217, 220, 223, 225, 234, 237, 238, 256, 260, 270, 281–283, 300, 301, 340, 386, 404
Immune epitope database analysis resource (IEDB-AR) .......... 13, 214, 266, 270
Immunoassays .......... 349–363, 373
Immunodominant epitopes .......... vi, 265
Immunoglobulins .......... 166, 168
Immunosuppression .......... 186
Immunosurveillance .......... 185–187
Immunotherapy .......... 213
Influenza virus .......... 229–241
Inhibitory concentration .......... 158
In silico amino acid substitution .......... 215
In silico PCR .......... 42
In silico vaccine design .......... v

K
Kernel methods .......... 376
k-nearest neighbour (kNN) algorithm .......... 148, 149
Kolaskar and Tongaonkar antigenicity .......... 47, 51–53, 56, 59, 62, 214, 221–223

L
Latent period .......... 312, 315–318, 324, 336
Leptospirosis .......... vi, 173–182
LIBSVM .......... 293

M
Machine learning (ML) .......... v, 49, 176, 233, 235, 284, 300, 339, 376
Major histocompatibility complex (MHC) .......... 5–7, 12, 13, 31, 33, 48, 49, 63, 67, 74, 155, 156, 159, 162, 167, 168, 177, 213, 216, 217, 225
Meta decision trees (MDT) .......... vii, 376, 381–386, 389, 390, 392, 395
Meta learning .......... vii, 375–395
Middle East Respiratory Syndrome Coronavirus (MERS-CoV) .......... v, 39–144
Mimotopes .......... vi, 167, 213, 214, 224
Molecular docking .......... vi, 4, 8–15, 143, 162, 179, 180, 236–237, 239–240, 271, 284, 285
Molecular dynamics simulations .......... 181
Monoclonal antibody (mAb) .......... vi, 245–253
Monte Carlo simulations .......... 311–313, 315, 316, 321, 322, 324
Motif-based sequence analysis .......... 251
Multiplex assays .......... 350, 352–353
Multiplicity of infection (MOI) .......... 314, 315
Mycobacteria .......... 139, 321, 322, 331, 332, 336–338, 341, 342
Mycobacteriophage .......... vi, 321, 324, 331–334, 336–339, 341

N
NCBI database .......... 4, 148, 176, 280
NetMHCpan .......... 48, 61, 62, 67, 68, 73, 168, 188, 225, 234, 270, 273, 340
Next-generation sequencing (NGS) .......... 31, 32, 34, 233, 246, 247, 250–253, 339, 340

O
Oncogenic mutation probability .......... 185–197
Oncogenome .......... 187
Outer membrane protein (OMP) .......... 174, 176, 178, 266

P
Paratopes .......... 3, 213
Parker hydrophilicity prediction .......... 47, 52, 54, 57, 60, 63, 162, 165, 177, 221, 222
PatchDock rigid-body server .......... 180
Pathogen .......... 2, 3, 17–19, 24, 156, 157, 166, 169, 173, 174, 178, 208, 230, 255, 262, 265, 266, 271, 272, 278, 330, 332, 336, 338, 341, 342, 355, 365
Pearson correlation .......... 200, 201, 204, 209
PEP-FOLD3 .......... 180, 284
Peptide-MHC-I affinity .......... 400
Peptide vaccine design .......... 17–29
Phage-bacteria dynamics .......... vi, 309–326
Phage display .......... vi, 245–247, 249, 252, 253
Phage panning .......... 249
Physicochemical properties .......... 51, 149, 177, 214, 280, 286, 376, 383
Population coverage calculation .......... 49, 80
Position-specific scoring matrices (PSSMs) .......... 168, 233, 234, 281, 339, 340, 380, 381
Primary human periodontal ligament fibroblasts (hPDLFs) .......... 351
Propensity scores .......... vi, 167, 181, 222, 236, 300, 376, 380
Proteasome cleavage .......... 48, 400, 403–405
Protein Data Bank (PDB) .......... 7–9, 26, 148, 162, 179, 181, 182, 267, 271, 284, 286, 290, 291, 295
Protein Information Resource (PIR) .......... 7, 148, 281, 284
Protein-protein interaction network (PPI) .......... vi, 207
Protein Variability Server (PVS) .......... 256

Q
qPCR .......... 250
Quantum matrices .......... 281, 282

R
Ramachandran plot assessment .......... 267
Random forest algorithm .......... 236, 272, 286
RANKPEP .......... 4–6, 12, 13, 168, 234, 260, 282, 340
Residue conservation score .......... 294
Respiratory syncytial virus (RSV) .......... 207, 208
Reverse vaccinology .......... v, 1–15, 19, 156
RNA-seq data .......... 208

S
Secondary structure .......... 26, 165, 167, 214, 290, 293–295, 300, 380
Severe acute respiratory syndrome coronavirus .......... 40
Side chain polarity .......... 380, 381
Size exclusion chromatography (SEC) .......... 246, 248, 249
Stabilized matrix method (SMM) .......... 47, 225, 234, 270, 273, 282
Stromal cell-basal medium (SCBM) .......... 351
Support vector regression (SVR) .......... 168, 291, 293
SVM package .......... 293, 302
SVMTriP .......... vi, 236, 281, 299–305
Swiss-Prot database .......... 168
SYFPEITHI .......... 13, 168, 234, 340
Synthetic peptide vaccine .......... 231

T
TAP transport .......... 48, 49, 61, 62, 67, 158, 159, 167, 178, 282
T-cell receptor (TCR) .......... 167, 168, 177, 213, 230, 280, 341
The Cancer Genome Atlas (TCGA) .......... 187, 188, 193, 195, 340
3D modeling .......... 23
Transcriptomics .......... 200, 206, 207, 338
Transporter associated with antigen presentation (TAP) .......... 48, 167, 168

V
Vaccinomics .......... 1, 19
VaxiJen server .......... 157, 159, 165, 176
Virus Pathogen Database and Analysis Resource (ViPR) .......... 280

W
Whole genome database .......... 4

Z
Zika virus .......... v, 17–29
Zoonotic disease .......... 173
Z-score .......... 148, 268, 270, 272