
Chapter · November 2024


DOI: 10.1007/978-1-0716-4213-9_1



Recent advances in computational prediction of secondary
and supersecondary structures from protein sequences
Jian Zhang 1,2,*, Jingjing Qian 1, Quan Zou 2, Feng Zhou 1, Lukasz Kurgan 3,*
1 School of Computer and Information Technology, Xinyang Normal University, Xinyang 464000, China
2 Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou 324003, China
3 Department of Computer Science, College of Engineering, Virginia Commonwealth University, Richmond, VA 23284, USA
* Corresponding authors: [email protected]; [email protected]

Summary
The secondary structures (SSs) and supersecondary structures (SSSs) underlie the three-dimensional structure of proteins. Prediction of the SSs and SSSs from protein sequences
enjoys high levels of use and finds numerous applications in the development of a broad
range of other bioinformatics tools. Numerous sequence-based predictors of SS and SSS were
developed and published in recent years. We survey and analyze 45 SS predictors that were
released since 2018, focusing on their inputs, predictive models, scope of their prediction, and
availability. We also review 32 sequence-based SSS predictors, which primarily focus on
predicting coiled coils and beta-hairpins and which include five methods that were published
since 2018. A substantial majority of these predictive tools rely on machine learning models,
including a variety of deep neural network architectures. They also frequently use
evolutionary sequence profiles. We discuss details of several modern SS and SSS predictors
that are currently available to the users and which were published in higher impact venues.
Keywords: secondary structure; supersecondary structure; prediction; protein sequence;
neural network; deep learning; evolutionary profile; PSSM; HMM.

1. Introduction
Protein secondary structure (SS) is a local spatial conformation which is formed by the
backbone atoms of the polypeptide chain [1]. The main SS elements include α-helix, β-strand,
and coils, with other more specific or rare elements, such as the 3₁₀ helix, π helix, β-turn, and β-bridge [2]. Inclusion of specific SS elements in a protein structure defines an overall type of
structural fold, such as all-alpha, all-beta, and alpha and beta [3]. For instance, hemoglobin
and myoglobin contain mostly α-helices while ferredoxin has no α-helices. The α-helix is a segment of the protein sequence that is coiled around a central axis and stabilized by hydrogen bonds, typically in a right-handed way and with a pitch (vertical distance between
consecutive turns of the helix) of 5.4 Å (0.54 nm) [4][1]. The β-sheet is another common SS; it is made up of multiple sequence segments (β-strands) that are held together by hydrogen bonds in a parallel (N-to-C direction) or antiparallel arrangement [5]. SSs are typically extracted from the
tertiary structure using specialized software. The arguably most popular software is the
Dictionary of Protein Secondary Structure (DSSP) [2, 6], although many other tools, such as
DEFINE [7], STRIDE [8], SKSP [9], PSSC [10], SACF [11], and HECA [12], are available.
Additionally, the 2Struc server offers access to multiple annotation methods [13]. We show
examples of α-helices and β-strands in Figure 1, which gives a cartoon representation of the
tertiary structure of the DNA-binding protein 7d from the human T-cell leukemia virus type-1
(HTLV-1).

Figure 1. Cartoon representation of the structure of the DNA-binding protein 7d from the human T-cell leukemia virus type-1 (HTLV-1) (PDB ID: 6VOY, chain A). Secondary structures are color coded and include α-helices (blue), β-strands (red), and coils (purple). We also show the β-hairpin supersecondary structure motif (orange), which consists of two β-strands in antiparallel orientation and a coil between them, inside the dashed rectangle.

SS elements that are adjacent in space may interact with each other and form regular SS
aggregates, which are called supersecondary structures (SSSs) [14, 15]. SSS elements
typically cover various combinations of α-helices and β-strands, such as the α-helix-based
units (e.g., α-α), β-sheet combinations (e.g., β-β and β-β-β), and hybrid combinations (e.g., β-
α-β) [16]. Commonly occurring SSSs include α-helix hairpins, β-hairpins, coiled coils, Greek
key, and Rossmann motifs. Similar to SSs, SSSs act as building blocks of the tertiary
structure, contribute to structural stability, and are involved in protein-ligand interfaces [16-
19]. The SSSs can be annotated with the SOCKET [20] and PROMOTIF [21] programs.
Moreover, certain SSSs, such as the coiled coils and β-α-β motifs, can be collected and
analyzed using databases that include CC+ [22] and TOPS [23]. We illustrate one of the
simplest and most common SSS elements, the β-hairpin, in Figure 1.
SS and SSS can also be predicted directly from the protein sequence using computational tools.
There are several dozens of SS and SSS predictors [6, 24-29], and some of them enjoy very
high levels of use. For instance, several SS predictors have secured high citation counts,
including PSIPRED [30], Jpred [31], and PHD [32] that were cited 6843, 1843 and 1720
times, respectively (source: Google Scholar as of February 2024). Besides the direct use to
make predictions from the protein sequences, these tools are also widely utilized to develop

other bioinformatics methods including multiple alignment tools [33, 34], target selection
methods [35, 36], and predictors of binding residues [37-42], residue depth [43, 44] and
residue contacts [45, 46], beta-turns [47], intrinsic disorder [48-52], disordered protein-
binding regions [53, 54] and folding rates and types [55-57], to give a few examples.
Similarly, the SSS predictors were applied to analyze amyloids [58] and microbial pathogens
[59], perform simulations of protein folding [60], perform genome-wide studies of protein
structure [61], and predict protein domains [62] and folds [63].
Given their high number, widespread use and popularity, sequence-based SS predictors were
reviewed in several studies. The earlier reviews focus on the contemporary computational
advancements in this field, which include the use of a sliding sequence window, evolutionary information generated from multiple sequence alignments, and machine-learning algorithms
[64-66]. Surveys in the early 2010s focus on evaluations and applications of the SS predictors
and provide practical advice, such as details concerning their availability and scope [67-69].
Moreover, consensus-based approaches, which combine predictions from multiple methods to
improve predictive performance, were popular at that time [70-72]. The most recent reviews
describe more current SS predictors, including those that rely on deep neural networks [24-
27]. However, they do not consider the SSS predictors and in general these predictors have
been reviewed to a much lesser extent. The β-hairpin and coiled coil predictors together with
the SS predictors were surveyed in 2006 [73, 74]. There are three more recent surveys that
consider sequence-based SS and SSS predictors [6, 28, 29]. We also note a recent article that
overviews various aspects of the SSS characterization including experimental methods,
relation to structural classification of proteins, related databases and a few SSS predictors
[19]. The newest survey article was published in 2019 and overviews 34 methods that include
17 SS and 17 SSS predictors that were released between 2005 and 2019 [6]. This chapter expands on that study by surveying SS and SSS predictors. We identify and summarize 45 SS
predictors and 32 SSS predictors. These numbers demonstrate high levels of interest in the
development of these predictors. We discuss the inputs and predictive models that these
methods utilize, define and list outputs that they generate, and comment on their availability.
Moreover, we provide a more detailed description for a selected collection of recent and
impactful methods.

Table 1. Sequence-based predictors of protein SS that were published since 2018. Methods are sorted in reverse chronological order. The methods in bold font were published in higher impact and more topically relevant journals. The “predictive models” column includes SVM (support vector machine), RF (random forest), kNN (k-nearest neighbor), and numerous variants of neural networks (listed alphabetically): BGRU (bi-directional gated recurrent units network), BiLSTM (bi-directional long short-term memory network), BRNN (bi-directional recurrent neural network), BTCN (bi-directional temporal convolutional network), CNN (convolutional neural network), DBN (deep belief network), DEN (deep embedding network), FFNN (feed-forward neural network), GAN (generative adversarial network), GCN (graph convolutional network), GNN (graph neural network), HGMA (hypergraph multi-head attention network), RDN (residual dense network), ResNet (residual convolutional network), SAN (self-attention network), TCN (temporal convolutional network), and WGAN (Wasserstein generative adversarial network). The “inputs” columns denote the use of the five most common types of inputs: the sequence itself, individual amino acids encoded using the 1-hot scheme, evolutionary profile based on the position-specific scoring matrix (PSSM), physico-chemical properties of amino acids, and evolutionary profile based on the hidden Markov model (HMM).

Predictor | Ref. | Year published | Predictive models | Inputs (sequence, 1-hot encoding, PSSM, physico-chem., HMM profile) | Scope of predictions
PHAT [79] 2023 HGMA, BiLSTM  3 & 8 state
Yuan et al. [75] 2023 BiLSTM, BTCN    3 & 8 state
Zhao et al. [98] 2023 CNN   3 & 8 state
Srushti et al. [99] 2023 CNN, BiLSTM   8 state
Geethu et al. [100] 2023 RDN   3 & 8 state
WG-ICRN [94] 2023 WGAN, CNN   8 state
IGPRED-MultiTask [95] 2023 CNN, GCN, BiLSTM    3 state
Rashid et al. [93] 2023 DBN  3 state
Multi-S3P [101] 2023 CNN, BiLSTM    3 & 8 state
CGAN-PSSP [86] 2022 GAN, CNN   3 & 8 state
DML_SS [84] 2022 DEN     3 & 8 state
DLBLS_SS [76] 2022 BiLSTM, TCN    3 & 8 state
S-CNN-BGRU [102] 2022 CNN, GRU     3 & 8 state
Zhang et al. [96] 2022 CNN, GRU     3 & 8 state
WGACSTCN [103] 2022 TCN   3 & 8 state
SPOT-1D-LM [104] 2022 BiLSTM, BRNN   3 & 8 state
ShuffleNet_SS [85] 2022 CNN   3 & 8 state
SAP4SS [105] 2022 FFNN    8 state
Enireddy et al. [106] 2022 LSTM  3 & 8 state
Charalampous et al. [107] 2022 FFNN   3 state
OCLSTM [77] 2021 CNN, BiLSTM   3 & 8 state
TMPSS [108] 2021 CNN, BiLSTM   3 state
PSSP-MVIRT [109] 2021 CNN, BGRU    3 state
MLPRNN [87] 2021 FFNN, BGRU   3 & 8 state
Xu et al. [110] 2021 CNN   3 state
CSI‑LSTM [111] 2021 BiLSTM   3 state
ProteinUnet [112] 2021 CNN   3 & 8 state
SPOT-1D-Single [97] 2021 LSTM  3 & 8 state
DNSS2 [113] 2021 CNN    3 & 8 state
Jin et al. [114] 2021 GCN, BiLSTM     8 state
IGPRED [115] 2021 CNN, GCN    3 & 8 state
Nahid et al. [116] 2021 GNN  8 state
EN-CSLR [117] 2020 CNN, LSTM, RF  3 & 8 state
OPUS-TASS [118] 2020 CNN     3 & 8 state
Zhao et al. [119] 2020 GAN, CNN  3 state
Nnessy [120] 2020 kNN  3 & 8 state
SAINT [121] 2020 SAN    8 state
Du et al. [89] 2019 RF, SVM  3 state
Long et al. [122] 2019 CNN    3 state
SPOT-1D [123] 2019 BRNN, ResNet    3 & 8 state
DeepACLSTM [78] 2019 CNN, BiLSTM    8 state
Ma et al. [91] 2018 SVM   3 state
MUFOLD-SS [92] 2018 CNN, FFNN     3 & 8 state
Xie et al. [90] 2018 SVM   3 state
CFLM [124] 2018 RDN   3 & 8 state

2. Methods

2.1. Modern sequence-based secondary structure predictors

The sequence-based SS predictors use the protein sequence to generate amino acid-level propensities for specific sets of the SS states, including the 3-state prediction (i.e., helix (H), strand (E), and coil (C)) and the 8-state prediction (i.e., 3₁₀ helix (G), α-helix (H), π helix (I), β-strand (E), β-bridge (B), turn (T), bend (S), and other coils (C)), which are based on the outputs of the DSSP program. We summarize 45 sequence-based SS predictors that were published since 2018 in Table 1. We note a steady inflow of methods over the years, demonstrating consistent levels of interest in developing new tools. The table summarizes three major aspects of these methods: their inputs, predictive models, and outputs.
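The Q3 and Q8 values quoted throughout this chapter denote the percentage of residues that are assigned the correct one of the 3 or 8 states. A minimal, illustrative sketch of this metric (not code from any surveyed tool):

```python
def q_accuracy(predicted: str, observed: str) -> float:
    """Qk accuracy: the percentage of residues whose predicted SS state
    matches the observed (e.g., DSSP-derived) state."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed strings must have equal length")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * matches / len(observed)
```

For example, a 3-state prediction "HHEC" scored against an observed "HHCC" yields a Q3 of 75%.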
We identify the five most commonly used types of inputs: 1) the sequence itself; 2) encoding
of the amino acids in the proximity of the predicted residue using the 1-hot approach; 3)
evolutionary profile based on the position-specific scoring matrix (PSSM); 4) physico-chemical properties of amino acids; and 5) evolutionary profile based on the hidden Markov
model (HMM). The sequence input is typically directly processed by neural networks that are
capable of extracting local and long-range dependencies between amino acids [75-80]. The 1-hot encoding translates the 20 natural amino acids into a 20-dimensional bit vector, where each position in the vector corresponds to the presence of a particular amino acid type. The
physico-chemical properties quantify characteristics of amino acids that are relevant to the formation of SS elements, such as hydrophobicity, van der Waals volume, and polarizability.
The last two input types concern evolutionary information that is extracted from multiple
sequence alignments. Two popular options are to generate the PSSM with the PSI-BLAST [81] and/or MMseqs [82] software, and to use the HMM-based profiles that are produced with the HHblits program [83]. Table 1 reveals that while none of the 45
methods applies all five inputs together, 18 methods apply at least 3 inputs and only 9 rely on
one input type. By far the most popular input is the PSSM that is used by 35 out of 45 tools,
while 37 tools use at least one evolutionary profile type (PSSM and/or HMM).
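The 1-hot encoding and window-based processing described above can be sketched as follows; the window half-width and the all-zero padding for unknown residues and out-of-sequence positions are illustrative assumptions rather than details of any specific predictor:

```python
# Minimal sketch of the 1-hot input scheme (illustrative, not taken from
# any surveyed tool).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 natural amino acids
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}

def one_hot(residue: str) -> list[int]:
    """Encode one amino acid as a 20-dimensional bit vector."""
    vec = [0] * 20
    idx = AA_INDEX.get(residue)
    if idx is not None:  # unknown residues (e.g., 'X') stay all-zero
        vec[idx] = 1
    return vec

def encode_window(sequence: str, center: int, half_width: int = 7) -> list[list[int]]:
    """1-hot encode the residues in a window centered on the predicted
    residue; positions outside the sequence are padded with zero vectors."""
    window = []
    for pos in range(center - half_width, center + half_width + 1):
        if 0 <= pos < len(sequence):
            window.append(one_hot(sequence[pos]))
        else:
            window.append([0] * 20)
    return window
```

With a half-width of 7, each residue is thus represented by a 15 × 20 binary matrix that a downstream model consumes.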
Table 1 demonstrates that modern SS predictors rely exclusively on machine learning models,
with the substantial majority using various types of neural networks that represent several
major network types, such as recurrent, convolutional, graph, generative and feed-forward. A
few models use other types of machine learning models, such as support vector machines and
random forest. We also observe that the majority of methods, 25 out of 45, predict both 3-state
and 8-state SSs. The predictions of the 7 methods that output only the 8-state SS can be
converted into the 3-state prediction. Moreover, we find that many of these methods are
comparatively tested using the same benchmark datasets, primarily derived from the Critical
Assessment of protein Structure Prediction (CASP) experiments, including the CASP10,
CASP11, CASP12, CASP13, and CASP14 datasets [75-78, 84-97].
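The 8-state to 3-state conversion mentioned above is typically a fixed mapping over the DSSP classes. The sketch below uses one widely adopted convention (H/G/I to helix, E/B to strand, the rest to coil); note that individual studies occasionally map the G and B states differently:

```python
# Common 8-to-3 state reduction of the DSSP classes (one widely used
# convention; not prescribed by any single surveyed predictor).
EIGHT_TO_THREE = {
    "H": "H", "G": "H", "I": "H",  # alpha, 3-10, and pi helices -> helix
    "E": "E", "B": "E",            # beta strand and beta bridge -> strand
    "T": "C", "S": "C", "C": "C",  # turn, bend, other coil -> coil
}

def to_three_state(eight_state: str) -> str:
    """Collapse an 8-state SS string into the 3-state alphabet {H, E, C}."""
    return "".join(EIGHT_TO_THREE.get(s, "C") for s in eight_state)
```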
Many of these methods were published in lower tier conferences and journals. We use bold
font in Table 1 to identify methods that were published in higher impact and more topically
relevant journals, which include (alphabetically) Advanced Science, Bioinformatics, Briefings

in Bioinformatics, Frontiers in Bioengineering and Biotechnology, Frontiers in Genetics, and
Proteins. We focus on seven selected methods that were published in the higher impact and relevant journals and that are readily available to the end users as a web server and/or source code. The following sections provide details concerning the authors, inputs, predictive models, and availability of these seven SS predictors.

2.1.1 PHAT

PHAT was introduced in 2023 by Qin Ma’s group at the Ohio State University and Leyi Wei’s
group at the Shandong University [79]. PHAT is implemented as a deep hypergraph learning
framework, predicts SS for peptides, and provides downstream functional analysis for the
predicted peptides. The PHAT’s architecture consists of three main modules: (i) knowledge
transfer module, (ii) hypergraph embedding module, and (iii) feature fusion and classification
module. A key feature of this model is that it combines sequential semantic information from a large-scale biological corpus with structural semantic information from multi-scale structural segmentation. PHAT was shown to secure a Q3 accuracy (accuracy for
the 3-state SS prediction) of 84.1% on a test dataset [79].
Inputs: Protein sequence
Architecture: Deep hypergraph network
Availability: Web server at http://inner.wei-group.net/PHAT/

2.1.2 MLPRNN

This predictor was released in 2021 by Yandong Huang’s group at the Jimei University [87].
MLPRNN is a deep learning model that includes a two-layer stacked bidirectional gated
recurrent unit (BGRU) block sandwiched between two feed-forward neural network (FFNN)
blocks. It predicts both 3-state and 8-state SS for protein sequences. The authors demonstrate that their tool secures a Q3 of 83.3% and a Q8 (accuracy for the 8-state SS prediction) of 70.6% on a
popular benchmark dataset [87].
Inputs: PSSM profile generated from PSI-BLAST and HMM profile calculated with HHblits
Architecture: Deep neural network composed of BGRU and FFNN layers
Availability: Code at https://gitlab.com/yandonghuang/mlpbgru

2.1.3 TMPSS

TMPSS was made available in 2021 by Han Wang’s group at the Northeast Normal
University and Guan Ning Lin’s team at the Shanghai Jiao Tong University [108]. It uses a
multi-task learning strategy to develop a deep network model that predicts SS for the alpha
transmembrane proteins. The predictive model combines a convolutional neural network
(CNN) and two stacked bi-directional long-short-term-memory (BiLSTM) layers, and
predicts only the 3-state SS. This predictor also provides the ability to predict the topology of the transmembrane helices. Evaluation on a test dataset reveals that TMPSS secures a Q3 of 84.0% [108].
Inputs: One-hot encoding and HMM profile generated by HHblits
Architecture: Deep neural network composed of CNN and BiLSTM layers
Availability: Code at https://github.com/NENUBioCompute/TMP-SS

2.1.4 PSSP-MVIRT

This predictor was published in 2021 by Leyi Wei’s group at the Shandong University [109].
Similar to PHAT, PSSP-MVIRT (Peptide Secondary Structure Prediction based on Multi-
View Information, Restriction and Transfer learning) also targets prediction of SS for
peptides. The predictive model is a deep neural network composed of CNN and bi-directional gated recurrent units (BGRU) that was trained with the help of transfer learning. This tool predicts only the 3-state SS and was shown to secure a Q3 of 78.5% on a test dataset [109].
Inputs: One-hot encoding, PSSM profile generated from PSI-BLAST, HMM profile generated
from HMMER3.0
Architecture: Deep neural network composed of CNN and BGRU layers
Availability: Code at https://github.com/massyzs/PSSP-MVIRT and web server at
http://server.malab.cn/PSSP-MVIRT

2.1.5 Nnessy

Nnessy (Nearest-Neighbor-based prediction of SS without searching for homology) was released in 2020 by Spencer Krieger’s group at the University of Arizona [120]. This model performs predictions in two steps: (i) estimate the probability of each residue being in each of the SS classes; and (ii) combine these probabilities with empirical transition rates between SS states to compute the final prediction for the entire protein. The first step is implemented using the k-nearest neighbor (kNN) model while the second step takes advantage of dynamic programming. Nnessy predicts both 3-state and 8-state SS. Tests on several datasets show that it produces results with Q3 ranging between 67.9% and 85.7% (depending on the dataset used) and Q8 between 66.5% and 82.4% [120].
Inputs: Protein sequence
Architecture: 2-step prediction process using kNN and dynamic programming
Availability: Code at https://nnessy.cs.arizona.edu/
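The two-step scheme above, combining per-residue class probabilities with empirical transition rates through dynamic programming, can be sketched as a Viterbi-style decoder. This illustrates the general technique rather than the Nnessy implementation; the 3-state alphabet and any numbers used with it are assumptions:

```python
import math

# Illustrative sketch (not the Nnessy code): combine per-residue state
# probabilities with state-to-state transition rates via Viterbi-style
# dynamic programming to decode the most likely SS string.
STATES = ["H", "E", "C"]

def decode(residue_probs, transitions):
    """residue_probs: list of {state: probability} dicts, one per residue.
    transitions: {prev_state: {next_state: rate}} transition matrix."""
    n = len(residue_probs)
    # log-space scores of the best path ending in each state
    score = [{s: math.log(residue_probs[0][s]) for s in STATES}]
    back = [{}]
    for i in range(1, n):
        score.append({})
        back.append({})
        for s in STATES:
            prev = max(STATES, key=lambda p: score[i - 1][p] + math.log(transitions[p][s]))
            score[i][s] = (score[i - 1][prev] + math.log(transitions[prev][s])
                           + math.log(residue_probs[i][s]))
            back[i][s] = prev
    # traceback from the best final state
    last = max(STATES, key=lambda s: score[n - 1][s])
    path = [last]
    for i in range(n - 1, 0, -1):
        path.append(back[i][path[-1]])
    return "".join(reversed(path))
```

With transition rates that strongly favor staying in the same state, the decoder smooths isolated low-confidence residues that a per-residue argmax would misclassify.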

2.1.6 SPOT-1D

SPOT-1D was published in 2019 by Yaoqi Zhou’s group that is currently located at the
Shenzhen Bay Laboratory [123]. The underlying predictive model is a deep neural network
that combines bi-directional recurrent neural network layers (BRNN) and residual convolutional network layers (ResNets). SPOT-1D predicts 3-state and 8-state SS, and a few
other structural characteristics that include backbone angles, half-sphere exposure, contact
numbers and solvent accessible surface area. Using a large test dataset, authors show that
SPOT-1D yields Q3 of 86.2% and Q8 of 75.4% [123].
Inputs: PSSM generated from PSI-BLAST and HMM profiles generated from HHblits
Architecture: Deep neural network composed of BRNN and ResNet layers
Availability: Web server at https://apisz.sparks-lab.org:8443/SPOT-1D.html

2.1.7 MUFOLD-SS

This predictor was made available in 2018 by Dong Xu’s group at the University of Missouri
[92]. MUFOLD-SS’s predictive model is a deep neural network that includes inception blocks
followed by CNN and FFNN layers. It predicts 3-state and 8-state SS. The authors show that this tool produces predictions with Q3 ranging between 83.4% and 86.5% and Q8 between 72.1% and 76.5%, depending on the test dataset that they use [92]. MUFold-SS is included in the
MUFold-SSW server [125] that provides predictions of protein SS by MUFold-SS, torsion
angles by MUFold-Angle [126], beta-turns by MUFold-BetaTurn [127] and gamma-turns by
MUFold-GammaTurn [128].
Inputs: PSSM generated from PSI-BLAST, HMM profile from HHblits, and protein sequence
Architecture: Deep neural network composed of CNN and FFNN layers
Availability: Code at http://dslsrv8.cs.missouri.edu/~cf797/MUFoldSS/download.html and
web server at http://mufold.org/mufold-ss-angle

2.2. Modern supersecondary structure predictors

The sequence-based SSS predictors are typically designed for a specific SSS type. For
instance, SpiriCoil predicts coiled coils [61] and BhairPred predicts β hairpins [129].
Correspondingly, these methods usually produce 2-state predictions for each amino acid in the
protein sequence: a given SSS state vs. any other conformation. There are numerous
sequence-based SSS predictors. A relatively recent survey discussed 17 methods that were
published before 2019 [6]. Here, we provide an updated review that covers a more complete
collection of predictors, particularly focusing on the methods that were released after 2018.
The arguably most frequently predicted SSS type is coiled coils. The first coiled coil predictor
by David Parry dates back to 1982 [130], and was developed using a small dataset of proteins
with coiled coils. The subsequently released predictors benefited from progressively larger
amounts of data, leading to the development of more sophisticated and accurate models [61,
131-140]. The predictive quality of several coiled coil predictors was evaluated in a
comparative study in ref. [141]. We note that the two most recent coiled coil predictors,
DeepCoil [142] and CoCoPRED [143], rely on modern deep neural network models,
compared to the older methods that primarily utilize the hidden Markov model. Another
commonly predicted SSS type is the β hairpin. Several sequence-based β hairpin predictors were

developed over the last two decades [144-148], with one of the first efforts completed in 2002
[149]. Among these tools, we highlight a relatively popular BhairPred [129] and the most
recent StarPDB [150].

Compared to β hairpins and coiled coils, there are relatively fewer methods that address prediction of other SSS types. The earliest predictors of the α-turn-α (helix-turn-helix) motif were released in the late 1980s and they utilize a rather simple approach that scores similarity between the input sequence and a library of α-turn-α structures [151, 152]. The subsequent methods use larger collections of sequence motifs derived from known α-turn-α structures and make predictions based on sequence similarity to these motifs [153, 154] or using a machine learning algorithm, such as the support vector machine [154]. Methods that cover other types
of SSSs include the predictor by Sun and Hu which addresses identification of the β-α-β motif
[155]. There are also several tools that predict multiple types of SSSs. Chronologically, the
predictor by Zou et al. was published in 2010 and it targets β-β, β-α, α-β and α-α motifs [156].
It performs predictions using quadratic discriminant function that relies on the Mahalanobis
distance to a training dataset of these four SSS motifs. The StackSSSPred tool, which predicts β hairpins and the β-α-β motif, was released in 2019 [157]. It uses a rather complex predictive
architecture where three machine learning models generated by extra trees, kNN, and gradient
boosted trees algorithms are combined using a random decision forest classifier. The newest
method, which was designed by Hu and colleagues and published in 2020, predicts β-β, β-α, α-β and α-α motifs, and it also uses the random decision forest model [158]. The two common
themes for the predictors of the multiple SSS types are that they rely on machine learning
models and that their authors did not release the underlying code or a web server, essentially
requiring users to re-implement their methods.
In total, we identify 32 sequence-based SSS prediction methods that include 16 coiled coil predictors [61, 130-140, 142, 143], eight β hairpin predictors [129, 144-150], four predictors of the α-turn-α motif [151-154], one predictor of the β-α-β motif [155], and three predictors of multiple SSS types [156-158]. Five of these predictors were published since 2018. While this is not an exhaustive list, it nearly doubles the coverage of our previous survey from 2019 that identifies 17 predictors [6]. That survey describes a few of the more impactful methods that are available to the end users: the β hairpin predictor BhairPred [129], the coiled coil predictor MultiCoil2 [159], and the α-turn-α predictor GYM [153]. These three methods were released
before 2012. We observe that many of the 32 predictors lack web servers and their
implementations/code are not available, which substantially limits their utility and impact
[160]. We supplement the detailed discussion of the three methods from ref. [6] with a similarly expanded summary of two recent and more impactful predictors that are available as a web server and/or code: CoCoPRED, which predicts coiled coils [143], and StarPDB, which predicts β hairpins [150].

2.2.1 StarPDB

StarPDB (STructural Annotation of Residues using PDB) was published in 2016 by G.P.S. Raghava’s lab at the Institute of Microbial Technology in Chandigarh, India [150]. This is a
relatively simple approach that makes predictions based on the PSI-BLAST derived sequence
similarity between the input sequence and a database of motifs. This method makes
predictions of the β-hairpin and several SS states that include β-turn, γ-turn, β-bulge, and psi-
loop. The SSS prediction relies on the dataset of β-hairpins extracted from Protein Data Bank
[161] using PROMOTIF [21]. The authors report accuracy (Q2) of 89.1% for the β-hairpin
prediction on a small test dataset [150].
Inputs: Sequence
Architecture: PSI-BLAST based sequence similarity
Availability: Web server at http://crdd.osdd.net/raghava/starpdb/

2.2.2 CoCoPRED

CoCoPRED was released in 2021 by Hong-Bin Shen’s group at the Shanghai Jiao Tong University [143]. CoCoPRED predicts three key structural features of coiled coils: coiled-coil domains (sequence segments composed of 7-residue-long repeats), registers (patterns of amino acids that compose the repeat), and oligomeric state (number of helices in the coiled coil), whereas other coiled coil predictors typically predict only one of these characteristics. This method relies on a modern deep neural network architecture that combines multiple CNN and BiLSTM layers. The inputs are fed into a BiLSTM layer, followed by a CNN layer, and the outputs of that layer are input to three parallel and custom-designed CNN layers, each predicting a different aspect of coiled coils, i.e., the coiled-coil domain, oligomeric state, and register. Evaluation on test datasets shows that CoCoPRED secures AUC of 0.50 for the prediction of the coiled-coil domains, Q4 accuracy (prediction of the four classes of coiled coils: parallel dimer, antiparallel dimer, trimer, and tetramer) of 65.9% for the oligomeric state, and Q7 accuracy (prediction for each of the 7 positions in the repeat) of 70.0% for the register prediction.
Inputs: One-hot encoding, HMM profile generated from HHblits, and a physico-chemical property of amino acids (i.e., hydrophobicity)
Architecture: Deep neural network composed of CNN and BiLSTM layers
Availability: Code at http://www.csbio.sjtu.edu.cn/bioinf/CoCoPRED/Download.htm and web
server at http://www.csbio.sjtu.edu.cn/bioinf/CoCoPRED/

Acknowledgments

This work was supported in part by the Nanhu Scholars Program for Young Scholars of the Xinyang Normal University to J.Z. and the Robert J. Mattauch Endowed Chair funds to L.K.

References
1. Stollar, E.J. and D.P. Smith, Uncovering protein structure. Essays Biochem, 2020.
64(4): p. 649-680.

2. Kabsch, W. and C. Sander, Dictionary of protein secondary structure: Pattern
recognition of hydrogen-bonded and geometrical features. Biopolymers, 1983. 22(12):
p. 2577-2637.
3. Andreeva, A., et al., The SCOP database in 2020: expanded classification of
representative family and superfamily domains of known protein structures. Nucleic
Acids Res, 2020. 48(D1): p. D376-D382.
4. Errington, N., T. Iqbalsyah, and A.J. Doig, Structure and stability of the alpha-helix:
lessons for design. Methods Mol Biol, 2006. 340: p. 3-26.
5. Cohen, N. and C.D. Eisenbach, Molecular Mechanics of Beta-Sheets. ACS Biomater
Sci Eng, 2020. 6(4): p. 1940-1949.
6. Oldfield, C.J., K. Chen, and L. Kurgan, Computational Prediction of Secondary and
Supersecondary Structures from Protein Sequences. Methods Mol Biol, 2019. 1958: p.
73-100.
7. Richards, F.M. and C.E. Kundrot, Identification of structural motifs from protein
coordinate data: Secondary structure and first-level supersecondary structure.
Proteins: Structure, Function, and Genetics, 1988. 3(2): p. 71-84.
8. Frishman, D. and P. Argos, Knowledge-based protein secondary structure assignment.
Proteins: Structure, Function, and Genetics, 1995. 23(4): p. 566-579.
9. Zhang, W., A.K. Dunker, and Y. Zhou, Assessing secondary structure assignment of
protein structures by using pairwise sequence-alignment benchmarks. Proteins:
Structure, Function, and Bioinformatics, 2008. 71(1): p. 61-67.
10. Zacharias, J. and E.W. Knapp, Protein secondary structure classification revisited:
processing DSSP information with PSSC. J Chem Inf Model, 2014. 54(7): p. 2166-79.
11. Cao, C., et al., A New Secondary Structure Assignment Algorithm Using C-alpha
Backbone Fragments. International Journal of Molecular Sciences, 2016. 17(3).
12. Saqib, M.N., J.D. Krys, and D. Gront, Automated Protein Secondary Structure
Assignment from Cα Positions Using Neural Networks. Biomolecules, 2022.
12(6).
13. Klose, D.P., B.A. Wallace, and R.W. Janes, 2Struc: the secondary structure server.
Bioinformatics, 2010. 26(20): p. 2624-2625.
14. Koch, I. and T. Schäfer, Protein super-secondary structure and quaternary structure
topology: theoretical description and application. Current Opinion in Structural
Biology, 2018. 50: p. 134-143.
15. Protein Supersecondary Structures, 2nd Edition. Methods Mol Biol, 2019. 1958: p. 1-438.
16. Pires, D.E., et al., Exploring protein supersecondary structure through changes in
protein folding, stability, and flexibility. Methods Mol Biol, 2019. 1958: p. 173-185.
17. Kister, A. Relationship between Amino Acids Sequences and Protein Structures:
Folding Patterns and Sequence Patterns. in Bioinformatics Research and
Applications: 5th International Symposium, ISBRA 2009 Fort Lauderdale, FL, USA,
May 13-16, 2009 Proceedings 5. 2009. Springer.

18. MacCarthy, E., D. Perry, and D.B. Kc, Advances in protein super-secondary structure
prediction and application to protein structure prediction. Methods Mol Biol, 2019.
1958: p. 15-45.
19. Rudnev, V.R., et al., Current Approaches in Supersecondary Structures Investigation.
International Journal of Molecular Sciences, 2021. 22(21): p. 11879.
20. Walshaw, J. and D.N. Woolfson, SOCKET: a program for identifying and analysing
coiled-coil motifs within protein structures. Journal of Molecular Biology, 2001.
307(5): p. 1427-1450.
21. Hutchinson, E.G. and J.M. Thornton, PROMOTIF-A program to identify and analyze
structural motifs in proteins. Protein Science, 1996. 5(2): p. 212-220.
22. Testa, O.D., E. Moutevelis, and D.N. Woolfson, CC+: a relational database of coiled-
coil structures. Nucleic Acids Research, 2009. 37(Database): p. D315-D322.
23. Michalopoulos, I., TOPS: an enhanced database of protein structural topology.
Nucleic Acids Research, 2004. 32(Database issue): p. D251-D254.
24. Yang, Y., et al., Sixty-five years of the long march in protein secondary structure
prediction: the final stretch? Briefings in Bioinformatics, 2018. 19(3): p. 482-494.
25. Meng, F. and L. Kurgan, Computational Prediction of Protein Secondary Structure
from Sequence. Curr Protoc Protein Sci, 2016. 86: p. 2.3.1-2.3.10.
26. Smolarczyk, T., I. Roterman-Konieczna, and K. Stapor, Protein Secondary Structure
Prediction: A Review of Progress and Directions. Current Bioinformatics, 2020. 15(2):
p. 90-107.
27. Ismi, D.P., R. Pulungan, and Afiahayati, Deep learning for protein secondary structure
prediction: Pre and post-AlphaFold. Comput Struct Biotechnol J, 2022. 20: p. 6271-
6286.
28. Ho, H.K., et al., A survey of machine learning methods for secondary and
supersecondary protein structure prediction. Methods Mol Biol, 2013. 932: p. 87-106.
29. Chen, K. and L. Kurgan, Computational prediction of secondary and supersecondary
structures. Methods Mol Biol, 2013. 932: p. 63-86.
30. Jones, D.T., Protein secondary structure prediction based on position-specific scoring
matrices. Journal of Molecular Biology, 1999. 292(2): p. 195-202.
31. Drozdetskiy, A., et al., JPred4: a protein secondary structure prediction server.
Nucleic Acids Research, 2015. 43(W1): p. W389-W394.
32. Rost, B., PHD: Predicting one-dimensional protein structure by profile-based neural
networks, in Methods in Enzymology. 1996, Elsevier. p. 525-539.
33. Pei, J. and N.V. Grishin, PROMALS: towards accurate multiple sequence alignments
of distantly related proteins. Bioinformatics, 2007. 23(7): p. 802-808.
34. Deng, X. and J. Cheng, MSACompro: improving multiple protein sequence alignment
by predicted structural features. Methods Mol Biol, 2014. 1079: p. 273-83.
35. Mizianty, M.J. and L. Kurgan, Sequence-based prediction of protein crystallization,
purification and production propensity. Bioinformatics, 2011. 27(13): p. i24-i33.
36. Slabinski, L., et al., XtalPred: a web server for prediction of protein crystallizability.
Bioinformatics, 2007. 23(24): p. 3403-3405.

37. Yan, J. and L. Kurgan, DRNApred, fast sequence-based method that accurately
predicts and discriminates DNA- and RNA-binding residues. Nucleic Acids Res, 2017.
45(10): p. e84.
38. Peng, Z. and L. Kurgan, High-throughput prediction of RNA, DNA and protein binding
regions mediated by intrinsic disorder. Nucleic Acids Res, 2015. 43(18): p. e121.
39. Zhang, J., et al., DNAgenie: accurate prediction of DNA-type-specific binding residues
in protein sequences. Brief Bioinform, 2021. 22(6).
40. Zhang, F., et al., DeepPRObind: Modular Deep Learner that Accurately Predicts
Structure and Disorder-Annotated Protein Binding Residues. J Mol Biol, 2023: p.
167945.
41. Zhang, F., et al., HybridRNAbind: prediction of RNA interacting residues across
structure-annotated and disorder-annotated proteins. Nucleic Acids Res, 2023. 51(5):
p. e25.
42. Katuwawala, A., B. Zhao, and L. Kurgan, DisoLipPred: Accurate prediction of
disordered lipid binding residues in protein sequences with deep recurrent networks
and transfer learning. Bioinformatics, 2021.
43. Song, J., et al., Prodepth: Predict Residue Depth by Support Vector Regression
Approach from Protein Sequences Only. PLoS ONE, 2009. 4(9): p. e7072.
44. Zhang, H., et al., Sequence based residue depth prediction using evolutionary
information and predicted secondary structure. BMC Bioinformatics, 2008. 9(1): p.
388.
45. Xue, B., E. Faraggi, and Y. Zhou, Predicting residue-residue contact maps by a two-
layer, integrated neural-network method. Proteins: Structure, Function, and
Bioinformatics, 2009. 76(1): p. 176-183.
46. Cheng, J. and P. Baldi, Improved residue contact prediction using support vector
machines and a large feature set. BMC Bioinformatics, 2007. 8(1): p. 113.
47. Zheng, C. and L. Kurgan, Prediction of beta-turns at over 80% accuracy based on an
ensemble of predicted secondary structures and multiple alignments. BMC
Bioinformatics, 2008. 9: p. 430.
48. Mizianty, M.J., et al., Improved sequence-based prediction of disordered regions with
multilayer fusion of multiple information sources. Bioinformatics, 2010. 26(18): p.
i489-i496.
49. Walsh, I., et al., CSpritz: accurate prediction of protein disorder segments with
annotation for homology, secondary structure and linear motifs. Nucleic Acids Res,
2011. 39(Web Server issue): p. W190-6.
50. Hu, G., et al., flDPnn: Accurate intrinsic disorder prediction with putative propensities
of disorder functions. Nature Communications, 2021. 12(1): p. 4438.
51. Orlando, G., et al., Prediction of Disordered Regions in Proteins with Recurrent
Neural Networks and Protein Dynamics. J Mol Biol, 2022. 434(12): p. 167579.
52. Kurgan, L., et al., Tutorial: a guide for the selection of fast and accurate
computational tools for the prediction of intrinsic disorder in proteins. Nat Protoc,
2023. 18(11): p. 3157-3172.

53. Yan, J., et al., Molecular recognition features (MoRFs) in three domains of life. Mol
Biosyst, 2016. 12(3): p. 697-710.
54. Sharma, R., et al., OPAL: Prediction of MoRF regions in intrinsically disordered
protein sequences. Bioinformatics, 2018.
55. Zhang, H., et al., Determination of protein folding kinetic types using sequence and
predicted secondary structure and solvent accessibility. Amino Acids, 2010. 42(1): p.
271-283.
56. Gao, J., et al., Accurate prediction of protein folding rates from sequence and
sequence-derived residue flexibility and solvent accessibility. Proteins: Structure,
Function, and Bioinformatics, 2010.
57. Huang, J.T., et al., Prediction of protein folding rates from simplified secondary
structure alphabet. J Theor Biol, 2015. 383: p. 1-6.
58. O'Donnell, C.W., et al., A method for probing the mutational landscape of amyloid
structure. Bioinformatics, 2011. 27(13): p. i34-i42.
59. Bradley, P., et al., BETAWRAP: Successful prediction of parallel β-helices from
primary sequence reveals an association with many microbial pathogens. Proceedings
of the National Academy of Sciences, 2001. 98(26): p. 14819-14824.
60. Sun, Z.R., et al., Molecular Dynamics Simulation of Protein Folding with
Supersecondary Structure Constraints. Journal of Protein Chemistry, 1998. 17(8): p.
765-769.
61. Rackham, O.J.L., et al., The Evolution and Structure Prediction of Coiled Coils across
All Genomes. Journal of Molecular Biology, 2010. 403(3): p. 480-493.
62. Reddy, C.C.S., et al., PURE: A webserver for the prediction of domains in unassigned
regions in proteins. BMC Bioinformatics, 2008. 9(1): p. 281.
63. Anton, B., et al., On the use of direct-coupling analysis with a reduced alphabet of
amino acids combined with super-secondary structure motifs for protein fold
prediction. NAR Genom Bioinform, 2021. 3(2): p. lqab027.
64. Barton, G.J., Protein secondary structure prediction. Current Opinion in Structural
Biology, 1995. 5(3): p. 372-376.
65. Heringa, J., Computational Methods for Protein Secondary Structure Prediction Using
Multiple Sequence Alignments. Current Protein & Peptide Science, 2000. 1(3): p. 273-
301.
66. Rost, B., Protein Secondary Structure Prediction Continues to Rise. Journal of
Structural Biology, 2001. 134(2-3): p. 204-218.
67. Zhang, H., et al., Critical assessment of high-throughput standalone methods for
secondary structure prediction. Briefings in Bioinformatics, 2011. 12(6): p. 672-688.
68. Rost, B., Prediction of protein structure in 1D – secondary structure, membrane
regions, and solvent accessibility, in Structural Bioinformatics. 2009, John Wiley &
Sons, Inc. p. 679-714.
69. Pirovano, W. and J. Heringa, Protein Secondary Structure Prediction, in Methods in
Molecular Biology. 2009, Humana Press. p. 327-348.
70. Albrecht, M., et al., Simple consensus procedures are effective and sufficient in
secondary structure prediction. Protein Engineering Design and Selection, 2003.
16(7): p. 459-462.
71. Yan, J., M. Marcus, and L. Kurgan, Comprehensively designed consensus of
standalone secondary structure predictors improves Q3 by over 3%. J Biomol Struct
Dyn, 2014. 32(1): p. 36-51.
72. Kieslich, C.A., et al., conSSert: Consensus SVM Model for Accurate Prediction of
Ordered Secondary Structure. J Chem Inf Model, 2016. 56(3): p. 455-61.
73. Singh, M., Predicting Protein Secondary and Supersecondary Structure, in Chapman
& Hall/CRC Computer & Information Science Series. 2006, Chapman and Hall/CRC.
p. 29-1-29-22.
74. Gruber, M., J. Söding, and A.N. Lupas, Comparative analysis of coiled-coil prediction
methods. Journal of Structural Biology, 2006. 155(2): p. 140-145.
75. Yuan, L., Y. Ma, and Y. Liu, Ensemble deep learning models for protein secondary
structure prediction using bidirectional temporal convolution and bidirectional long
short-term memory. Frontiers in Bioengineering and Biotechnology, 2023. 11: p.
1051268.
76. Yuan, L., et al., DLBLS_SS: protein secondary structure prediction using deep
learning and broad learning system. RSC Adv, 2022. 12(52): p. 33479-33487.
77. Zhao, Y. and Y. Liu, OCLSTM: Optimized convolutional and long short-term memory
neural network model for protein secondary structure prediction. Plos one, 2021.
16(2): p. e0245982.
78. Guo, Y., et al., DeepACLSTM: deep asymmetric convolutional long short-term
memory neural models for protein secondary structure prediction. BMC
Bioinformatics, 2019. 20(1): p. 341.
79. Jiang, Y., et al., Explainable deep hypergraph learning modeling the peptide
secondary structure prediction. Advanced Science, 2023. 10(11): p. 2206151.
80. Yang, X., et al., Modality-DTA: Multimodality fusion strategy for drug–target affinity
prediction. IEEE/ACM Transactions on Computational Biology and Bioinformatics,
2023. 20(2): p. 1200-1210.
81. Altschul, S.F., et al., Gapped BLAST and PSI-BLAST: a new generation of protein
database search programs. Nucleic acids research, 1997. 25(17): p. 3389-3402.
82. Hauser, M., M. Steinegger, and J. Söding, MMseqs software suite for fast and deep
clustering and searching of large protein sequence sets. Bioinformatics, 2016. 32(9):
p. 1323-1330.
83. Remmert, M., et al., HHblits: lightning-fast iterative protein sequence searching by
HMM-HMM alignment. Nat Methods, 2011. 9(2): p. 173-5.
84. Yang, W., Y. Liu, and C. Xiao, Deep metric learning for accurate protein secondary
structure prediction. Knowledge-Based Systems, 2022. 242: p. 108356.
85. Yang, W., et al., Protein secondary structure prediction using a lightweight
convolutional network and label distribution aware margin loss. Knowledge-Based
Systems, 2022. 237: p. 107771.
86. Jin, X., et al., Prediction of protein secondary structure based on an improved channel
attention and multiscale convolution module. Frontiers in Bioengineering and
Biotechnology, 2022. 10: p. 901018.
87. Lyu, Z., et al., Protein secondary structure prediction with a reductive deep learning
method. Frontiers in Bioengineering and Biotechnology, 2021. 9: p. 687426.
88. de Oliveira, G.B., H. Pedrini, and Z. Dias. Fusion of BLAST and ensemble of
classifiers for protein secondary structure prediction. in 2020 33rd SIBGRAPI
Conference on Graphics, Patterns and Images (SIBGRAPI). 2020. IEEE.
89. Du, Y., et al., Protein Secondary Structure Prediction with Dynamic Self-Adaptation
Combination Strategy Based on Entropy. Journal of Quantum Computing, 2019. 1(1):
p. 21.
90. Xie, S., Z. Li, and H. Hu, Protein secondary structure prediction based on the fuzzy
support vector machine with the hyperplane optimization. Gene, 2018. 642: p. 74-83.
91. Ma, Y., Y. Liu, and J. Cheng, Protein Secondary Structure Prediction Based on Data
Partition and Semi-Random Subspace Method. Sci Rep, 2018. 8(1): p. 9856.
92. Fang, C., Y. Shang, and D. Xu, MUFOLD-SS: New deep inception-inside-inception
networks for protein secondary structure prediction. Proteins, 2018. 86(5): p. 592-598.
93. Rashid, S., S. Sundaram, and C.K. Kwoh, Empirical Study of Protein Feature
Representation on Deep Belief Networks Trained With Small Data for Secondary
Structure Prediction. IEEE/ACM Trans Comput Biol Bioinform, 2023. 20(2): p. 955-
966.
94. Li, S., et al., WG-ICRN: Protein 8-state secondary structure prediction based on
Wasserstein generative adversarial networks and residual networks with Inception
modules. Math Biosci Eng, 2023. 20(5): p. 7721-7737.
95. Gormez, Y. and Z. Aydin, IGPRED-MultiTask: A Deep Learning Model to Predict
Protein Secondary Structure, Torsion Angles and Solvent Accessibility. IEEE/ACM
Trans Comput Biol Bioinform, 2023. 20(2): p. 1104-1113.
96. Zhang, X., et al., Multistage Combination Classifier Augmented Model for Protein
Secondary Structure Prediction. Front Genet, 2022. 13: p. 769828.
97. Singh, J., et al., SPOT-1D-Single: improving the single-sequence-based prediction of
protein secondary structure, backbone angles, solvent accessibility and half-sphere
exposures using a large training set and ensembled deep learning. Bioinformatics,
2021. 37(20): p. 3464-3472.
98. Zhao, J. and Q. Xiao. Prediction of Protein Secondary Structure Based on Lightweight
Convolutional Neural Network. in 2023 6th International Conference on Computer
Network, Electronic and Automation (ICCNEA). 2023. IEEE.
99. Srushti, C., P. Prathibhavani, and K. Venugopal. Eight-State Accuracy Prediction of
Protein Secondary Structure using Ensembled Model. in 2023 International
Conference for Advancement in Technology (ICONAT). 2023. IEEE.
100. Geethu, S. and E. Vimina, Protein Secondary Structure Prediction Using Cascaded
Feature Learning Model. Applied Soft Computing, 2023. 140: p. 110242.
101. Mufassirin, M.M., et al., Multi-S3P: Protein Secondary Structure Prediction with
Specialized Multi-Network and Self-Attention-based Deep Learning Model. IEEE
Access, 2023.
102. Sofi, M.A. and M.A. Wani, Protein secondary structure prediction using data-
partitioning combined with stacked convolutional neural networks and bidirectional
gated recurrent units. International Journal of Information Technology, 2022. 14(5): p.
2285-2295.
103. Yuan, L., Y. Ma, and Y. Liu, Protein secondary structure prediction based on
Wasserstein generative adversarial networks and temporal convolutional networks
with convolutional block attention modules. Math Biosci Eng, 2023. 20(2): p. 2203-
2218.
104. Singh, J., et al., Reaching alignment-profile-based accuracy in predicting protein
secondary and tertiary structural properties without alignment. Sci Rep, 2022. 12(1):
p. 7607.
105. Newton, M.A.H., et al., Secondary structure specific simpler prediction models for
protein backbone angles. BMC Bioinformatics, 2022. 23(1): p. 6.
106. Enireddy, V., C. Karthikeyan, and D.V. Babu, OneHotEncoding and LSTM-based deep
learning models for protein secondary structure prediction. Soft Computing, 2022.
26(8): p. 3825-3836.
107. Charalampous, K., et al., Solving the Protein Secondary Structure Prediction Problem
With the Hessian Free Optimization Algorithm. IEEE Access, 2022. 10: p. 27759-
27770.
108. Liu, Z., et al., TMPSS: A Deep Learning-Based Predictor for Secondary Structure and
Topology Structure Prediction of Alpha-Helical Transmembrane Proteins. Front
Bioeng Biotechnol, 2020. 8: p. 629937.
109. Cao, X., et al., PSSP-MVIRT: peptide secondary structure prediction based on a multi-
view deep learning architecture. Briefings in Bioinformatics, 2021. 22(6): p. bbab203.
110. Xu, Y. and J. Cheng, Secondary structure prediction of protein based on multi scale
Convolutional attention neural networks. Mathematical Biosciences and Engineering,
2021. 18(4): p. 3404-3423.
111. Miao, Z., et al., CSI-LSTM: a web server to predict protein secondary structure using
bidirectional long short term memory and NMR chemical shifts. J Biomol NMR, 2021.
75(10-12): p. 393-400.
112. Kotowski, K., et al., ProteinUnet-An efficient alternative to SPIDER3-single for
sequence-based prediction of protein secondary structures. Journal of Computational
Chemistry, 2021. 42(1): p. 50-59.
113. Guo, Z., J. Hou, and J. Cheng, DNSS2: Improved ab initio protein secondary structure
prediction using advanced deep learning architectures. Proteins, 2021. 89(2): p. 207-
217.
114. Jin, H., et al. Combining GCN and Bi-LSTM for protein secondary structure
prediction. in 2021 IEEE International Conference on Bioinformatics and
Biomedicine (BIBM). 2021. IEEE.
115. Görmez, Y., M. Sabzekar, and Z. Aydın, IGPRED: Combination of convolutional
neural and graph convolutional networks for protein secondary structure prediction.
Proteins, 2021. 89(10): p. 1277-1288.
116. Nahid, T.H., F.A. Jui, and P.C. Shill. Protein Secondary Structure Prediction using
Graph Neural Network. in 2021 5th International Conference on Electrical
Information and Communication Technology (EICT). 2021. IEEE.
117. Cheng, J., Y. Liu, and Y. Ma, Protein secondary structure prediction based on
integration of CNN and LSTM model. Journal of Visual Communication and Image
Representation, 2020. 71: p. 102844.
118. Xu, G., Q. Wang, and J. Ma, OPUS-TASS: a protein backbone torsion angles and
secondary structure predictor based on ensemble neural networks. Bioinformatics,
2020. 36(20): p. 5021-5026.
119. Zhao, Y., H. Zhang, and Y. Liu, Protein secondary structure prediction based on
generative confrontation and convolutional neural network. IEEE Access, 2020. 8: p.
199171-199178.
120. Krieger, S. and J. Kececioglu, Boosting the accuracy of protein secondary structure
prediction through nearest neighbor search and method hybridization. Bioinformatics,
2020. 36(Supplement_1): p. i317-i325.
121. Uddin, M.R., et al., SAINT: self-attention augmented inception-inside-inception
network improves protein secondary structure prediction. Bioinformatics, 2020.
36(17): p. 4599-4608.
122. Long, S. and P. Tian, Protein secondary structure prediction with context
convolutional neural network. RSC advances, 2019. 9(66): p. 38391-38396.
123. Hanson, J., et al., Improving prediction of protein secondary structure, backbone
angles, solvent accessibility and contact numbers by using predicted contact maps and
an ensemble of recurrent and residual convolutional neural networks. Bioinformatics,
2019. 35(14): p. 2403-2410.
124. Zhang, B., J. Li, and Q. Lu, Prediction of 8-state protein secondary structures by a
novel deep learning architecture. BMC Bioinformatics, 2018. 19(1): p. 293.
125. Fang, C., et al., MUFold-SSW: a new web server for predicting protein secondary
structures, torsion angles and turns. Bioinformatics, 2020. 36(4): p. 1293-1295.
126. Fang, C., Y. Shang, and D. Xu, Prediction of protein backbone torsion angles using
deep residual inception neural networks. IEEE/ACM transactions on computational
biology and bioinformatics, 2018. 16(3): p. 1020-1028.
127. Fang, C., Y. Shang, and D. Xu, MUFold-BetaTurn: a deep dense inception network for
protein beta-turn prediction. arXiv preprint arXiv:1808.04322, 2018.
128. Fang, C., Y. Shang, and D. Xu, Improving protein gamma-turn prediction using
inception capsule networks. Scientific reports, 2018. 8(1): p. 15741.
129. Kumar, M., et al., BhairPred: prediction of β-hairpins in a protein from multiple
alignment information using ANN and SVM techniques. Nucleic Acids Research,
2005. 33(Web Server issue): p. W154-W159.
130. Parry, D.A.D., Coiled-Coils in Alpha-Helix-Containing Proteins - Analysis of the
Residue Types within the Heptad Repeat and the Use of These Data in the Prediction
of Coiled-Coils in Other Proteins. Bioscience Reports, 1982. 2(12): p. 1017-1024.

131. Delorenzi, M. and T. Speed, An HMM model for coiled-coil domains and a
comparison with PSSM-based predictions. Bioinformatics, 2002. 18(4): p. 617-625.
132. Bartoli, L., et al., CCHMM_PROF: a HMM-based coiled-coil predictor with
evolutionary information. Bioinformatics, 2009. 25(21): p. 2757-2763.
133. McDonnell, A.V., et al., Paircoil2: improved prediction of coiled coils from sequence.
Bioinformatics, 2006. 22(3): p. 356-358.
134. Mason, J.M., et al., Semirational design of Jun-Fos coiled coils with increased
affinity: Universal implications for leucine zipper prediction and design. Proceedings
of the National Academy of Sciences, 2006. 103(24): p. 8989-8994.
135. Gruber, M., J. Söding, and A.N. Lupas, REPPER--repeats and their periodicities in
fibrous proteins. Nucleic Acids Research, 2005. 33(Web Server issue): p. W239-W243.
136. Guzenko, D. and S.V. Strelkov, CCFold: rapid and accurate prediction of coiled-coil
structures and application to modelling intermediate filaments. Bioinformatics, 2018.
34(2): p. 215-222.
137. Berger, B., et al., Predicting coiled coils by use of pairwise residue correlations. Proc
Natl Acad Sci U S A, 1995. 92(18): p. 8259-63.
138. Wolf, E., P.S. Kim, and B. Berger, MultiCoil: a program for predicting two- and three-
stranded coiled coils. Protein Sci, 1997. 6(6): p. 1179-89.
139. Lupas, A., M. Van Dyke, and J. Stock, Predicting coiled coils from protein sequences.
Science, 1991. 252(5009): p. 1162-4.
140. Berger, B. and M. Singh, An iterative method for improved protein structural motif
recognition. J Comput Biol, 1997. 4(3): p. 261-73.
141. Li, C., et al., Critical evaluation of in silico methods for prediction of coiled-coil
domains in proteins. Briefings in Bioinformatics, 2016. 17(2): p. 270-282.
142. Ludwiczak, J., et al., DeepCoil-a fast and accurate prediction of coiled-coil domains
in protein sequences. Bioinformatics, 2019. 35(16): p. 2790-2795.
143. Feng, S.H., C.Q. Xia, and H.B. Shen, CoCoPRED: coiled-coil protein structural
feature prediction from amino acid sequence using deep neural networks.
Bioinformatics, 2022. 38(3): p. 720-729.
144. Jia, S.-C. and X.-Z. Hu, Using Random Forest Algorithm to Predict β-Hairpin Motifs.
Protein & Peptide Letters, 2011. 18(6): p. 609-617.
145. Xia, J.-F., et al., Prediction of β-Hairpins in Proteins Using Physicochemical
Properties and Structure Information. Protein & Peptide Letters, 2010. 17(9): p. 1123-
1128.
146. Zou, D., Z. He, and J. He, β-Hairpin prediction with quadratic discriminant analysis
using diversity measure. Journal of Computational Chemistry, 2009.
147. Hu, X.Z. and Q.Z. Li, Prediction of the β-Hairpins in Proteins Using Support Vector
Machine. The Protein Journal, 2008. 27(2): p. 115-122.
148. Kuhn, M., J. Meiler, and D. Baker, Strand-loop-strand motifs: Prediction of hairpins
and diverging turns in proteins. Proteins: Structure, Function, and Bioinformatics,
2004. 54(2): p. 282-288.
149. de la Cruz, X., et al., Toward predicting protein topology: An approach to
identifying β hairpins. Proceedings of the National Academy of Sciences, 2002.
99(17): p. 11157-11162.
150. Singh, H. and G.P.S. Raghava, BLAST-based structural annotation of protein residues
using Protein Data Bank. Biology Direct, 2016. 11: p. 4.
151. Dodd, I.B. and J.B. Egan, Improved detection of helix-turn-helix DNA-binding motifs
in protein sequences. Nucleic Acids Research, 1990. 18(17): p. 5019-5026.
152. Dodd, I.B. and J.B. Egan, Systematic method for the detection of potential lambda
Cro-like DNA-binding regions in proteins. J Mol Biol, 1987. 194(3): p. 557-64.
153. Narasimhan, G., et al., Mining Protein Sequences for Motifs. Journal of Computational
Biology, 2002. 9(5): p. 707-720.
154. Xiong, W., et al., Local combinational variables: an approach used in DNA-binding
helix-turn-helix motif prediction with sequence information. Nucleic Acids Research,
2009. 37(17): p. 5632-5640.
155. Sun, L.X. and X.Z. Hu, Recognition of beta-alpha-beta Motifs in Proteins by Using
Random Forest Algorithm. Proceedings of the 2013 6th International Conference on
Biomedical Engineering and Informatics (BMEI 2013), Vols 1 and 2, 2013: p. 546-551.
156. Zou, D.S., et al., Supersecondary Structure Prediction Using Chou's Pseudo Amino
Acid Composition. Journal of Computational Chemistry, 2011. 32(2): p. 271-278.
157. Flot, M., et al., StackSSSPred: A Stacking-Based Prediction of Supersecondary
Structure from Sequence. Methods Mol Biol, 2019. 1958: p. 101-122.
158. Hu, X.-z., et al., Using random forest algorithm to predict super-secondary structure
in proteins. The Journal of Supercomputing, 2020. 76: p. 3199-3210.
159. Trigg, J., et al., Multicoil2: Predicting Coiled Coils and Their Oligomerization States
from Sequence in the Twilight Zone. PLoS ONE, 2011. 6(8): p. e23519.
160. Song, J.N. and L. Kurgan, Availability of web servers significantly boosts citation
rates of bioinformatics methods for protein function and disorder prediction.
Bioinformatics Advances, 2023. 3(1).
161. Burley, S.K., et al., RCSB Protein Data Bank: powerful new tools for exploring 3D
structures of biological macromolecules for basic and applied research and education
in fundamental biology, biomedicine, biotechnology, bioengineering and energy
sciences. Nucleic Acids Res, 2021. 49(D1): p. D437-D451.
