Chapter 7 Multiple Sequence Alignment Tools Sof 2022 Bioinformatics For
CHAPTER 7 Multiple sequence alignment tools: software and resources
7.1 Introduction
Multiple sequence alignment (MSA) is one of the oldest problems in computational biology. It involves aligning three or more DNA, RNA or protein sequences. A commonly used approach is to minimise mismatches, insertions and deletions in the alignment, and an optimal alignment can be calculated using the dynamic programming (DP) algorithm. Unfortunately, DP is feasible only for a small number of sequences, so it is mainly used for pairwise alignment. For two sequences of length $n$ the computation is $O(n^2)$, so the optimal pairwise alignment, although costly, can still be calculated. For $k$ sequences of length $n$, however, the complexity grows to $O(2^k n^k)$, and various heuristic methods must be used to construct multiple sequence alignments (MSAs). For example, aligning eight DNA sequences of 100 bases each would take around $2^8 \times 100^8 \approx 3 \times 10^{18}$ s, which is longer than the estimated age of the universe.
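To make the pairwise case concrete, here is a minimal Python sketch of optimal global pairwise alignment by dynamic programming (the Needleman-Wunsch recurrence). The scores (match = 1, mismatch = -1, linear gap = -2) are illustrative assumptions, not parameters of any tool in this chapter. The table has $(n+1)(m+1)$ cells, which is where the $O(n^2)$ cost comes from; for $k$ sequences the same idea needs a $k$-dimensional table with $n^k$ cells and up to $2^k - 1$ predecessors per cell, hence $O(2^k n^k)$.

```python
def global_alignment(a, b, match=1, mismatch=-1, gap=-2):
    """Return (score, aligned_a, aligned_b) for one optimal global alignment."""
    n, m = len(a), len(b)

    def sub(x, y):
        # Substitution score for aligning residue x with residue y
        return match if x == y else mismatch

    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),
                              score[i - 1][j] + gap,   # gap in b
                              score[i][j - 1] + gap)   # gap in a

    # Trace back from the bottom-right cell to recover one optimal alignment
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))
```

For example, `global_alignment("GATTACA", "GATACA")` returns a score of 4 (six matches and one gap). Python's integer arithmetic also confirms the estimate quoted above: `2**8 * 100**8` is 2560000000000000000, about 2.6 × 10^18.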
MSAs are used in sequence comparison, data quality evaluation, protein and RNA structure prediction, database searching and phylogenetic analysis. Because these applications have different requirements, different alignment approaches are used depending on the purpose. The most commonly used MSA software and tools are described here.
7.1.1 Kalign
Kalign is a fast and accurate multiple sequence alignment algorithm for protein, RNA and DNA sequences. It focuses on local regions and scales well to large alignments.
Steps to use:
(1) Open Kalign in your browser at https://www.ebi.ac.uk/Tools/msa/kalign/.
(2) Select the type of sequence to be analysed (DNA or protein).
(3) Paste the sequences in any supported format.
For example, two sequences, Sequence1 and Sequence2, of a Brassica oleracea variety (Fig. 7.1).
FIGURE 7.1
Kalign-Multiple sequence alignment.
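The web forms in this chapter accept sequences pasted as text, and FASTA is the most widely accepted format. The following is a minimal Python sketch for writing and reading FASTA text before pasting it into a form; the function names and the record names in the example are hypothetical placeholders, not the Brassica oleracea sequences shown in the figures.

```python
def to_fasta(records, width=60):
    """Format a {name: sequence} mapping as FASTA text, wrapping lines at `width`."""
    lines = []
    for name, seq in records.items():
        lines.append(f">{name}")
        seq = "".join(seq.split()).upper()  # drop internal whitespace, normalise case
        for start in range(0, len(seq), width):
            lines.append(seq[start:start + width])
    return "\n".join(lines) + "\n"


def parse_fasta(text):
    """Parse FASTA text into {name: sequence}, using the first word of each header."""
    records, name = {}, None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            fields = line[1:].split()
            name = fields[0] if fields else ""
            records[name] = ""
        elif name is not None:
            records[name] += line  # sequence lines are appended to the current record
    return records
```

For example, `to_fasta({"Sequence1": "atgc atgc"}, width=4)` produces a header line followed by two wrapped lines, `ATGC` and `ATGC`.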
7.1.2 MView
MView reformats the results of sequence database searches (BLAST, FASTA, etc.) or multiple alignments (MSF, PIR, CLUSTAL, etc.), optionally adding HTML markup to control colouring and web page layout. MView is neither a multiple alignment program nor a general-purpose alignment editor.
Steps to use:
(1) Open MView in your browser at https://www.ebi.ac.uk/Tools/msa/mview/.
(2) Select the type of sequence to be analysed (DNA or protein).
(3) Paste the sequences in any supported format.
For example, two sequences, Sequence1 and Sequence2, of a Brassica oleracea variety (Fig. 7.4).
(4) Submit.
(5) Download alignment file (Fig. 7.5).
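Viewers such as MView summarise each row of an alignment relative to a reference row. The sketch below computes coverage and percent identity of every row against the first row; the definitions used (identity over columns where both rows have a residue, coverage as the fraction of reference residues paired with a residue) are assumptions for illustration and may differ from MView's own.

```python
def identity_to_reference(aligned, gap="-"):
    """For each aligned row, return (coverage %, identity %) relative to row 0."""
    ref = aligned[0]
    ref_residues = sum(1 for c in ref if c != gap)
    results = []
    for row in aligned:
        if len(row) != len(ref):
            raise ValueError("all rows must have the same aligned length")
        # Columns where both the reference and this row carry a residue
        paired = [(r, c) for r, c in zip(ref, row) if r != gap and c != gap]
        same = sum(1 for r, c in paired if r.upper() == c.upper())
        coverage = 100.0 * len(paired) / ref_residues if ref_residues else 0.0
        identity = 100.0 * same / len(paired) if paired else 0.0
        results.append((round(coverage, 1), round(identity, 1)))
    return results
```

For example, the row `AC-A` against the reference `ACGT` pairs three residues, two of which are identical, giving 75.0% coverage and 66.7% identity.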
FIGURE 7.2
Clustal alignment by Kalign.
FIGURE 7.3
Phylogenetic tree by Kalign.
7.1.3 WebPRANK
WebPRANK is a phylogeny-aware MSA server that uses evolutionary information to help model insertions and deletions. The webPRANK server supports the alignment of DNA, protein and codon sequences, as well as protein-translated alignment of DNA, and can be used for genomic sequences. The resulting alignments can be exported in various formats commonly used in evolutionary sequence analysis. The server also provides a web-based alignment browser for visualising and post-processing the results in the context of a cladogram relating the sequences, for example to remove alignment columns with low reliability. In addition to de novo alignments, webPRANK can be used to infer ancestral sequences with phylogenetically realistic gap patterns, and to annotate and post-process existing alignments.
FIGURE 7.4
MView- Multiple alignment viewer.
FIGURE 7.5
MView results.
Steps to use:
(1) Open webPRANK at https://www.ebi.ac.uk/goldman-srv/webprank/.
(2) Paste the sequences to be aligned in FASTA format.
For example, two sequences, Sequence1 and Sequence2, of a Brassica oleracea variety (Fig. 7.6).
(3) Submit and start alignment.
(4) Download results (Fig. 7.7).
FIGURE 7.6
WebPRANK- Data input.
FIGURE 7.7
WebPRANK results.
7.1.4 TM-aligner
TM-Aligner is an online tool for aligning transmembrane proteins that uses the Wu-Manber string-matching algorithm, and it can display multiple sequence alignments in different colour schemes. TM-align is a sequence-independent algorithm for protein structure comparison. For two protein structures of unknown equivalence, TM-align first generates an optimised residue-to-residue alignment based on structural similarity, using heuristic DP iterations. It then returns an optimal superposition of the two structures based on the detected alignment, along with a TM-score that measures their structural similarity. The TM-score lies in (0, 1], where 1 indicates a perfect match between the two structures. Scores below 0.2 correspond to randomly chosen unrelated proteins, whereas structures with scores above 0.5 generally share the same fold in SCOP/CATH, according to rigorous statistics on structures in the PDB.
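The TM-score can be written down directly. Following Zhang and Skolnick's published definition, for a target structure of length $L$ and $N$ aligned residue pairs at distances $d_i$ (in ångströms) after superposition, $\mathrm{TM} = \frac{1}{L}\sum_{i=1}^{N} \frac{1}{1 + (d_i/d_0)^2}$ with $d_0 = 1.24\sqrt[3]{L - 15} - 1.8$. The sketch below only evaluates this formula for distances that are assumed to come from an existing superposition; TM-align itself also searches for the alignment and superposition that maximise the score. The floor of 0.5 Å on $d_0$ for very short chains is an assumption of this sketch.

```python
def tm_d0(target_length):
    """Distance scale d0 (angstroms) from the TM-score definition."""
    # max(..., 0) avoids a complex cube root for chains shorter than 15 residues
    d0 = 1.24 * max(target_length - 15, 0) ** (1.0 / 3.0) - 1.8
    return max(d0, 0.5)  # assumed floor for very short chains


def tm_score(distances, target_length):
    """TM-score from distances of aligned residue pairs after superposition."""
    if target_length <= 0:
        raise ValueError("target length must be positive")
    d0 = tm_d0(target_length)
    # Each aligned pair contributes between 0 and 1; unaligned residues contribute 0
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / target_length
```

Because every aligned pair contributes at most 1 and the sum is divided by the full target length, the score reaches 1 only when every target residue is aligned at zero distance; a pair at exactly $d_0$ contributes 0.5.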
Steps to use TM-Aligner:
(1) Open TM-Aligner at https://zhanglab.dcmb.med.umich.edu/TM-align/.
(2) Input Structure 1 and Structure 2 in PDB or PDBx/mmCIF format (mandatory).
For example, Structure 1 and Structure 2 of two randomly chosen proteins (Fig. 7.8).
(3) Run TM-Aligner.
(4) Protein visualisation (Protein-1 in blue and Protein-2 in red) (Fig. 7.9).
FIGURE 7.8
TM-Aligner Sequence Input page.
FIGURE 7.9
Protein visualization in TM-Aligner.
FIGURE 7.10
Annotation based on protein (Mustguseal).
7.2.1 PSAweb
PSAweb is a web server designed for the analysis of protein sequences and alignments. This comprehensive online tool allows quick visualisation of the analysis, with output in GIF format. It helps users analyse and present primary protein structure and protein alignments.
Protein sequence analysis: The server enables users to plot amino acid properties along the primary sequence of a protein (e.g. flexibility and hydrophilicity plots). Up to four of the 36 properties available on the server can be selected at a time and displayed either in a single window or in separate windows.
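Property plots of this kind are typically sliding-window averages of a per-residue scale. The sketch below uses the Kyte-Doolittle hydropathy scale as a stand-in (negative values indicate hydrophilic residues); it is not necessarily one of PSAweb's 36 properties or its exact smoothing, and the default window of 9 residues is an arbitrary choice.

```python
# Kyte-Doolittle hydropathy values (used here as an example property scale)
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def property_profile(seq, scale=KYTE_DOOLITTLE, window=9):
    """Return (1-based centre position, mean value) for every full window.

    Unknown residue codes raise KeyError rather than being silently skipped,
    so that positions in the profile always match positions in the sequence.
    """
    values = [scale[aa] for aa in seq.upper()]
    half = window // 2
    profile = []
    for start in range(len(values) - window + 1):
        chunk = values[start:start + window]
        profile.append((start + half + 1, sum(chunk) / window))
    return profile
```

Plotting the second element of each pair against the first gives the familiar hydropathy curve; an odd window keeps each average centred on a real residue.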
Steps to use:
(1) Open the PSAweb server.
(2) Click on Analysis of Single Sequence or MSA.
(3) Run analysis.
For example, amino acid sequence of Insulin (Homo sapiens) (Fig. 7.11).
(4) View analysis of submitted protein sequence (Fig. 7.12).
7.2 How does mustguseal function? 63
FIGURE 7.11
PSAweb- Sequence input.
FIGURE 7.12
Protein analysis in PSAweb.
FIGURE 7.13
PVS homepage.
7.2.3 PRALINE
PRALINE is an MSA programme that offers various alignment strategies, such as integrating secondary structure information into the alignment process. It also provides an overview of the resulting alignment of the sequences.
Steps to use:
(1) Open PRALINE in your browser at https://www.ibi.vu.nl/programs/pralinewww/.
(2) Paste your protein sequences in FASTA format (maximum 500 sequences, each up to 2000 residues long).
(3) Submit and run.
For example, the result for two protein sequences, Sample1 and Sample2, is shown in Fig. 7.14.
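Before pasting into a server with input limits like PRALINE's, it can help to check the input locally. A small sketch follows; it takes the limits quoted above (500 sequences, 2000 residues) as parameters and assumes the 2000-residue limit applies to each sequence, which should be confirmed on the server page. The function name is hypothetical.

```python
def check_limits(fasta_text, max_seqs=500, max_len=2000):
    """Return a list of problems; an empty list means the input is within the limits."""
    names, lengths = [], []
    for line in fasta_text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            fields = line[1:].split()
            names.append(fields[0] if fields else "(unnamed)")
            lengths.append(0)
        elif line and lengths:
            lengths[-1] += len(line)  # sequence lines add to the current record
    problems = []
    if not names:
        problems.append("no FASTA records found")
    if len(names) > max_seqs:
        problems.append(f"too many sequences: {len(names)} > {max_seqs}")
    for name, n in zip(names, lengths):
        if n > max_len:
            problems.append(f"{name} is too long: {n} > {max_len}")
    return problems
```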
7.2.4 PROMALS3D
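PROMALS3D is a web-based tool for creating MSAs of protein sequences and structures. It uses information from sequence database searches, secondary structure prediction and homologs with available 3D structures, together with user-defined constraints.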
Steps to use:
(1) Open PROMALS3D at http://prodata.swmed.edu/promals3d/promals3d.php.
FIGURE 7.14
PRALINE Sequences submit.
(2) Enter two or more protein sequences to be aligned in FASTA format. For example, the sequences of insulin isoform 2 precursor [Homo sapiens] and MicE [Microbacterium arborescens] (Fig. 7.15).
(3) Submit sequences.
(4) Check alignment results (Fig. 7.16).
FIGURE 7.15
Data input-PROMALS3D.
FIGURE 7.16
Colored PROMALS3D alignment result.
FIGURE 7.17
MAFFT Results.
FIGURE 7.18
Phylogenetic tree by MAFFT.
7.3.2 DIALIGN-TX
DIALIGN-TX is the most recent release of the DIALIGN MSA tool. Owing to several algorithmic improvements, it produces substantially better alignments on both locally and globally related sequence sets than previous versions of DIALIGN. However, like the original implementation of the programme, DIALIGN-T uses a straightforward greedy approach to assemble multiple alignments from local pairwise similarities. The most important new feature of DIALIGN-TX is the use of a guide tree.
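The greedy idea behind DIALIGN can be illustrated for two sequences: candidate diagonals (gapless segment pairs) are sorted by weight and accepted one at a time if they are consistent with those already accepted, meaning they neither overlap nor cross them. This is a simplified pairwise sketch; the `Diagonal` type and the weights are made-up inputs, whereas real DIALIGN computes weights statistically and checks consistency across many sequences.

```python
from typing import NamedTuple


class Diagonal(NamedTuple):
    start_a: int    # 0-based start in sequence A
    start_b: int    # 0-based start in sequence B
    length: int     # length of the gapless segment pair
    weight: float   # higher is better; supplied by the caller in this sketch


def consistent(f, g):
    """True if one diagonal lies entirely before the other in both sequences."""
    f_before_g = f.start_a + f.length <= g.start_a and f.start_b + f.length <= g.start_b
    g_before_f = g.start_a + g.length <= f.start_a and g.start_b + g.length <= f.start_b
    return f_before_g or g_before_f


def greedy_select(candidates):
    """Accept diagonals in order of decreasing weight if consistent with all accepted ones."""
    accepted = []
    for d in sorted(candidates, key=lambda d: d.weight, reverse=True):
        if all(consistent(d, a) for a in accepted):
            accepted.append(d)
    return sorted(accepted, key=lambda d: d.start_a)
```

For two sequences, pairwise consistency with every accepted diagonal is enough to guarantee that all accepted diagonals fit into one alignment; with more sequences, consistency must be checked transitively, which is what makes the full method more involved.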
7.3.5 Phylo
Phylo is a platform on which people refine MSAs of DNA by matching patterns. It is very easy to use because no detailed biological expertise is required.
7.3.6 PRANK
PRANK was designed to create multiple alignments that use phylogenetic information to handle insertions and deletions, so that the alignment reflects evolutionary homology.
7.3.7 CRASP
CRASP analyses multiple protein sequence alignments to identify correlated residues. The algorithm assumes that correlated changes at pairs of positions result from functional interactions, and its estimates are based on physicochemical properties of the residues.
7.3.8 ProbCons
ProbCons performs multiple alignment of amino acid sequences based on probabilistic consistency. It combines probabilistic modelling with a consistency-based technique to construct the alignment. The authors report that it produces more accurate alignments than T-Coffee, ClustalW and DIALIGN.
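The consistency step can be shown on small matrices. Given posterior matrices $P_{xy}$ (entry $[i][j]$ is the probability that residue $x_i$ aligns with $y_j$), probabilistic consistency re-estimates each $P_{xy}$ as $P'_{xy} = \frac{1}{|S|}\sum_{z \in S} P_{xz} P_{zy}$ over all sequences $z$ in the set $S$, including $x$ and $y$, whose self-matrices are identities. The sketch implements one round of this transformation with plain lists; the input probabilities are made-up numbers, and the pair-HMM that produces them in ProbCons is omitted.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def consistency_transform(p_xy, via):
    """One round of probabilistic consistency for the sequence pair (x, y).

    p_xy: |x| x |y| matrix of posterior match probabilities.
    via:  list of (p_xz, p_zy) matrix pairs, one for each third sequence z.
    The z = x and z = y terms each contribute p_xy, because the self-matrices are identities.
    """
    n_seqs = 2 + len(via)
    total = [[2 * v for v in row] for row in p_xy]
    for p_xz, p_zy in via:
        prod = matmul(p_xz, p_zy)
        total = [[t + p for t, p in zip(t_row, p_row)] for t_row, p_row in zip(total, prod)]
    return [[v / n_seqs for v in row] for row in total]
```

Intuitively, a match between $x_i$ and $y_j$ is reinforced when both residues also align confidently to the same residue of a third sequence, and weakened otherwise.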
7.3.9 DIALIGN
DIALIGN is a program for MSA. DIALIGN-TX is an enhanced version that improves on DIALIGN-T by combining greedy and progressive approaches.
7.3.11 R-Coffee
R-Coffee is a package derived from T-Coffee for the multiple alignment of RNA sequences. It is a special version of T-Coffee that uses RNA structural information to build multiple sequence alignments. Requirements: RNAplfold (from the Vienna package), MAFFT, MUSCLE, ProbCons and ConSan.
7.3.13 OD-seq
OD-seq is a tool for identifying outlier sequences in an MSA. It works by finding sequences whose average distance to the other sequences in the alignment is anomalous.
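The idea of flagging sequences by their average distance can be sketched as follows: compute the proportion of differing sites (p-distance) between each pair of aligned rows, average it per sequence, and flag rows whose average lies far above the rest. The interquartile-range cut-off (Q3 + 1.5 × IQR) used here is an assumed, simple rule; OD-seq's own distance measure and significance test differ.

```python
import statistics


def p_distance(a, b, gap="-"):
    """Fraction of differing sites among columns where neither row has a gap."""
    sites = [(x, y) for x, y in zip(a, b) if x != gap and y != gap]
    if not sites:
        return 1.0  # no comparable sites: treated as maximally distant (assumption)
    return sum(x != y for x, y in sites) / len(sites)


def flag_outliers(alignment, k=1.5):
    """Return names whose mean p-distance exceeds Q3 + k * IQR (needs >= 3 sequences)."""
    names = list(alignment)
    avg = {}
    for n in names:
        others = [p_distance(alignment[n], alignment[m]) for m in names if m != n]
        avg[n] = sum(others) / len(others)
    q1, _, q3 = statistics.quantiles(avg.values(), n=4)
    cutoff = q3 + k * (q3 - q1)
    return sorted(n for n in names if avg[n] > cutoff)
```

With very few sequences an outlier inflates the quartiles themselves, so a simple rule like this needs a reasonably large alignment to be useful.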
7.3.14 BARCOD
Using Véronique Barriel's method, BARCOD creates a character matrix that codes each insertion/deletion (indel) event as a single character, regardless of its length, and retains indels shared between sequences.
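A much simplified version of this kind of indel coding is sketched below: each distinct gap (same start and end columns) becomes one binary character, scored 1 for sequences that carry exactly that gap and 0 otherwise, whatever its length. This illustrates the general idea only and is not BARCOD's implementation; in particular, it ignores the treatment of overlapping gaps that Barriel's method defines.

```python
def gap_spans(row, gap="-"):
    """Return (start, end) column pairs (inclusive) of each run of gaps in a row."""
    spans, start = [], None
    for i, c in enumerate(row):
        if c == gap and start is None:
            start = i
        elif c != gap and start is not None:
            spans.append((start, i - 1))
            start = None
    if start is not None:
        spans.append((start, len(row) - 1))
    return spans


def indel_matrix(alignment):
    """Code each distinct gap as one binary character, regardless of its length."""
    per_seq = {name: set(gap_spans(row)) for name, row in alignment.items()}
    events = sorted(set().union(*per_seq.values()))
    matrix = {name: "".join("1" if e in spans else "0" for e in events)
              for name, spans in per_seq.items()}
    return events, matrix
```

For example, a shared two-column gap in two sequences becomes a single character with state 1 in both, so it counts as one event rather than two.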
7.3.15 Edialign
Edialign is the EMBOSS version of the DIALIGN 2.2 MSA tool. It takes nucleic acid or protein sequences as input and produces an MSA. The sequences need not be related over their entire length, since the program builds alignments from pairs of gapless sequence segments. Such segment pairs are called diagonals. If (possibly) coding nucleic acid sequences are to be aligned, edialign can optionally translate the compared 'nucleic acid segments' into 'peptide segments', or even perform comparisons at both the nucleic acid and protein levels to improve sensitivity.
7.3.16 MAFCO
MAFCO is a compression tool built specifically for files in Multiple Alignment Format (MAF).
7.3.18 MSAprobs
MSAProbs is a tool for the multiple alignment of protein sequences. It combines hidden Markov models, weighted probabilistic consistency and weighted profile-profile alignment.
7.3.21 Biojs-io-clustal
Biojs-io-clustal is a tool for parsing Clustal files in a web browser.
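Outside the browser, the layout of a Clustal file is simple enough to parse in a few lines. The Python sketch below is not biojs-io-clustal's API; it assumes the common layout of a header line starting with "CLUSTAL", followed by blocks of "name  aligned-segment [optional residue count]" lines, and it skips blank lines and conservation lines (which start with whitespace).

```python
def parse_clustal(text):
    """Parse Clustal-format text into {name: aligned sequence}."""
    lines = text.splitlines()
    if not lines or not lines[0].startswith("CLUSTAL"):
        raise ValueError("missing CLUSTAL header line")
    seqs = {}
    for line in lines[1:]:
        if not line.strip() or line[0].isspace():
            continue  # blank separator or conservation line (*, :, .)
        parts = line.split()
        if len(parts) < 2:
            continue  # malformed line without a sequence segment
        name, segment = parts[0], parts[1]
        # Segments for the same name in successive blocks are concatenated
        seqs[name] = seqs.get(name, "") + segment
    return seqs
```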
7.3.22 PASTA
PASTA (Practical Alignment using SATé and TrAnsitivity) uses a guide tree to build MSAs.
7.3.23 SARA-Coffee
SARA-Coffee is a web server for the multiple alignment of RNA sequences guided by their three-dimensional structures. It combines the pairwise structural alignments produced by SARA into multiple RNA alignments using R-Coffee.
7.3.24 Staccato
Staccato is an MSA tool that combines three-dimensional structure alignment probabilities with standard amino acid substitution probabilities.
7.3.25 MARS
MARS is a method developed specifically for aligning circular genome sequences, such as mitochondrial and viral genomes.
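The core difficulty with circular genomes is that the linear sequence can start at any position. A naive way to handle this, sketched below, is to try every rotation of a sequence and keep the one that agrees best with a reference, position by position. MARS uses a considerably more efficient procedure, so this is only an illustration of the problem; equal-length, gap-free sequences are assumed.

```python
def best_rotation(seq, reference):
    """Return (offset, rotated sequence) maximising position-wise matches (O(n^2))."""
    if len(seq) != len(reference):
        raise ValueError("this sketch assumes equal-length sequences")
    best_offset, best_matches = 0, -1
    for offset in range(len(seq)):
        rotated = seq[offset:] + seq[:offset]  # start the circle at `offset`
        matches = sum(a == b for a, b in zip(rotated, reference))
        if matches > best_matches:
            best_offset, best_matches = offset, matches
    return best_offset, seq[best_offset:] + seq[:best_offset]
```

For example, `best_rotation("GTACCA", "ACCAGT")` finds that starting the circle at offset 2 reproduces the reference exactly.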
7.3.26 Malakite
Malakite (Multiple Alignment Automatic Kinship Tiling Engine) is a web-based tool for studying aligned blocks in multiple alignments of protein sequences.
7.3.27 trimAl
trimAl is a tool, also available online, for removing spurious sequences and poorly aligned regions from an MSA. To maximise the signal-to-noise ratio, it can automatically select suitable trimming parameters.
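A gap-threshold filter, one of the simplest column-trimming rules, is sketched below: a column is kept only if at least a given fraction of sequences have a residue there. trimAl offers a gap-threshold option alongside its automated methods, but the function name and its default of 0.5 here are assumptions for illustration, not trimAl's code.

```python
def trim_gappy_columns(alignment, min_occupancy=0.5, gap="-"):
    """Keep columns where at least `min_occupancy` of the rows have a residue."""
    rows = list(alignment.values())
    if not rows:
        return {}
    width = len(rows[0])
    keep = [c for c in range(width)
            if sum(row[c] != gap for row in rows) / len(rows) >= min_occupancy]
    return {name: "".join(row[c] for c in keep) for name, row in alignment.items()}
```

Raising `min_occupancy` removes more columns and keeps only well-populated ones, trading alignment length for reliability.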
7.3.28 Multi-LAGAN
Multi-LAGAN is a tool for the multiple alignment of genomic sequences. It is also known as MLAGAN.
7.3.29 Pro-Coffee
Pro-Coffee, a component of the T-Coffee package, is designed for the multiple alignment of promoter regions.
7.3.30 R3D-2-MSA
R3D-2-MSA is a web-based application that links RNA 3D structures with RNA multiple sequence alignments.
7.3.31 ProDA
ProDA is a method that identifies homologous regions, including repeated ones, in a set of protein sequences in order to build local multiple sequence alignments (MSAs).
7.3.32 MSAProbs-MPI
MSAProbs-MPI is a parallel version of MSAProbs. Like MSAProbs, it is based on hidden Markov models.
7.3.33 HmmCleaner
HmmCleaner uses profile hidden Markov models (pHMMs) to remove alignment and sequencing errors from multiple sequence alignments. The tool is built upon and
upon and incorporates.
7.3.35 PnpProbs
PnpProbs divides the sequences into two groups, distantly related and ‘normally’ related. It uses a guide tree only for the ‘normally’ related sequences and applies a non-progressive approach to the distantly related ones.
7.3.36 ANTICALIgN
ANTICALIgN is a tool developed specifically for combinatorial protein engineering. It can create MSAs based on a reference sequence template and global sequence alignment.
7.3.37 FAMSA
FAMSA is designed to align large protein families containing many sequences quickly. It uses the longest common subsequence as a measure of sequence similarity and calculates gap costs in a new way. It then applies a new iterative refinement approach to the alignment. The authors report that FAMSA outperforms Clustal Omega and MAFFT.
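The longest common subsequence (LCS) that FAMSA uses as a similarity measure can be computed with a short dynamic program. The sketch below is the textbook quadratic version that keeps only one previous row in memory; FAMSA reportedly uses a much faster bit-parallel LCS algorithm, and how it converts LCS lengths into distances for its guide tree is not shown here.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of a and b (O(len(a) * len(b)) time)."""
    prev = [0] * (len(b) + 1)  # LCS lengths for the previous prefix of a
    for ca in a:
        cur = [0]
        for j, cb in enumerate(b, start=1):
            # Extend the diagonal on a match, otherwise take the better neighbour
            cur.append(prev[j - 1] + 1 if ca == cb else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]
```

Unlike a substring, a subsequence need not be contiguous, which makes the LCS tolerant of insertions and deletions between related sequences.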
7.3.38 KMAD
KMAD is a platform developed specifically to construct multiple alignments of intrinsically disordered proteins (IDPs). IDPs differ from globular proteins because they lack a stable tertiary structure and show less sequence conservation.
7.3.39 VerAlign
VerAlign is a tool that assesses the accuracy of a test alignment by comparing it with a reference alignment of the same sequences. It uses SPdist scoring, which calculates a distance between misaligned pairs of amino acids.
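SPdist builds on the classical sum-of-pairs (SP) comparison. As a simpler stand-in, the sketch below computes the plain SP score: the fraction of residue pairs aligned in the reference that are also aligned in the test alignment. It does not implement SPdist's distance-based treatment of misaligned pairs; rows are matched by name, and gaps are assumed to be '-'.

```python
def aligned_pairs(alignment, gap="-"):
    """Set of ((name1, pos1), (name2, pos2)) residue pairs that share a column."""
    names = sorted(alignment)
    counters = {n: -1 for n in names}  # residue index reached in each sequence
    width = len(alignment[names[0]]) if names else 0
    pairs = set()
    for col in range(width):
        present = []
        for n in names:
            if alignment[n][col] != gap:
                counters[n] += 1
                present.append((n, counters[n]))
        for i in range(len(present)):
            for j in range(i + 1, len(present)):
                pairs.add((present[i], present[j]))
    return pairs


def sp_score(test, reference):
    """Fraction of reference residue pairs recovered by the test alignment."""
    ref_pairs = aligned_pairs(reference)
    if not ref_pairs:
        return 0.0
    return len(ref_pairs & aligned_pairs(test)) / len(ref_pairs)
```

Because residues are identified by their position in the ungapped sequence, the score compares what is aligned with what, independently of where gaps were placed.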