
CHAPTER 7

Multiple sequence alignment tools: software and resources

7.1 Introduction
Multiple sequence alignment (MSA) is one of the oldest problems in computational
biology. It involves aligning more than two DNA, RNA or protein sequences.
A frequently used approach is to score mismatches, insertions and deletions in the
alignment, so that an optimal alignment can be calculated with a dynamic
programming (DP) algorithm. Unfortunately, DP is computationally feasible for only
a small number of sequences, so in practice it is used for calculating pairwise
alignments. Aligning two sequences of length n has complexity O(n^2), which is
computationally costly but can still be solved optimally. For k sequences of
length n, the complexity of exact DP grows to O(2^k n^k). Aligning eight DNA
sequences of 100 bases each would therefore need around 2^8 x 100^8, or about
2.6 x 10^18, elementary steps; at one step per second, that is longer than the
estimated age of the universe (about 4 x 10^17 s). Multiple sequence alignments
(MSAs) must therefore be constructed with heuristic methods.
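
As a rough illustration of the pairwise case, the following Python sketch fills the DP table for a global (Needleman-Wunsch style) alignment score in O(n^2) time. The scoring values (match +1, mismatch -1, gap -1) and both function names are assumptions chosen for this example, not values taken from this chapter. The second function simply evaluates the exact-DP cost estimate quoted above for k sequences of length n.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-1):
    """Best global alignment score of strings a and b (illustrative scoring)."""
    # previous[j] = best score for the processed prefix of a against b[:j]
    previous = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        current = [i * gap]  # aligning a[:i] against an empty prefix of b
        for j in range(1, len(b) + 1):
            substitution = match if a[i - 1] == b[j - 1] else mismatch
            current.append(max(
                previous[j - 1] + substitution,  # align a[i-1] with b[j-1]
                previous[j] + gap,               # gap in b
                current[j - 1] + gap,            # gap in a
            ))
        previous = current
    return previous[-1]


def exact_msa_operations(k, n):
    """Number of steps suggested by the O(2^k n^k) estimate."""
    return (2 ** k) * (n ** k)


print(global_alignment_score("GATTACA", "GATCACA"))
print(f"{exact_msa_operations(8, 100):.2e}")
```

Only two rows of the table are kept in memory here, which is enough for the score; recovering the alignment itself would need the full table or a traceback structure.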
MSAs are used for sequence comparison, data quality evaluation, protein and RNA
structure prediction, database searching and phylogenetic analysis. Different
approaches are therefore used depending on the purpose. The most commonly used
MSA software and tools are described here.

7.1.1 Kalign
Kalign is a fast and accurate multiple sequence alignment algorithm for protein, RNA
and DNA sequences. It concentrates on local regions and is suitable for large alignments.
Steps to use:
(1) Open Kalign in your browser at https://www.ebi.ac.uk/Tools/msa/kalign/.
(2) Select the type of sequence (DNA or protein) to be analysed.
(3) Paste the sequences in any supported format.
For example, use two sequences, Sequence1 and Sequence2, of a Brassica
oleracea variety (Fig. 7.1).

Bioinformatics for Everyone. https://doi.org/10.1016/B978-0-323-91128-3.00012-4


Copyright © 2022 Elsevier Inc. All rights reserved.

FIGURE 7.1
Kalign-Multiple sequence alignment.

(4) Submit your sequences.


(5) Download the alignment file (Fig. 7.2).
(6) You can also download the phylogenetic tree of the submitted sequences (Fig. 7.3).
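
Web forms such as this one generally accept FASTA input. As a minimal sketch, the helper below turns (name, sequence) pairs into FASTA text ready to paste. The function name, the 60-character line width and the two short sequences are placeholders invented for illustration; they are not real Brassica oleracea data.

```python
def to_fasta(records, width=60):
    """Format (name, sequence) pairs as FASTA text."""
    lines = []
    for name, sequence in records:
        lines.append(f">{name}")
        # wrap the sequence so that no line exceeds `width` characters
        for start in range(0, len(sequence), width):
            lines.append(sequence[start:start + width])
    return "\n".join(lines) + "\n"


# hypothetical placeholder sequences
example = [("Sequence1", "ATGGCGTACGTTAGC"), ("Sequence2", "ATGGCGTTCGTTAGC")]
print(to_fasta(example), end="")
```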

7.1.2 MView
MView reformats the results of a sequence database search (BLAST, FASTA, etc.) or a
multiple alignment (MSF, PIR, CLUSTAL, etc.), optionally adding HTML markup to
control colouring and page layout. MView is neither a multiple alignment programme
nor a general-purpose alignment editor.
Steps to use:
(1) Open MView in your browser at https://www.ebi.ac.uk/Tools/msa/mview/.
(2) Select the type of sequence (DNA or protein) to be analysed.
(3) Paste the sequences in any supported format.
For example, paste two sequences, Sequence1 and Sequence2, of a Brassica
oleracea variety (Fig. 7.4).
(4) Submit.
(5) Download alignment file (Fig. 7.5).
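
To give a feel for what reformatting an alignment involves, here is a small sketch that reads a CLUSTAL-style alignment into a dictionary of full-length rows. It is not MView's code. It assumes the usual layout (one header line, then blocks of "name sequence" lines, with conservation lines starting with whitespace), and the function name is hypothetical.

```python
def parse_clustal(text):
    """Collect the aligned rows of a CLUSTAL-style alignment, keyed by name."""
    alignment = {}
    lines = text.splitlines()
    # assumption: the first line is the format header and carries no sequence
    for line in lines[1:]:
        # skip blank lines and conservation lines (they start with whitespace)
        if not line.strip() or line[0].isspace():
            continue
        fields = line.split()
        if len(fields) < 2:
            continue
        name, chunk = fields[0], fields[1]
        # blocks for the same name are concatenated in file order
        alignment[name] = alignment.get(name, "") + chunk
    return alignment
```

Once the rows are held as plain strings, writing them out in another layout (for example FASTA or HTML) is a matter of formatting.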

FIGURE 7.2
Clustal alignment by Kalign.

FIGURE 7.3
Phylogenetic tree by Kalign.

7.1.3 WebPRANK
WebPRANK is a modern, phylogeny-aware MSA programme that uses evolutionary
information during progressive alignment to help place insertions and deletions.
The WebPRANK server supports the alignment of DNA, protein and codon sequences,
as well as protein-translated alignment of DNA. The resulting alignments can be
exported in formats commonly used in evolutionary sequence analysis. The server
also includes a web-based alignment browser for visualising and post-processing
the results in the context of a cladogram relating the sequences, for example to
remove alignment columns of low reliability. In addition to de novo alignments,
WebPRANK can be used to infer ancestral sequences with phylogenetically realistic
gap patterns, and to annotate and post-process existing alignments.

FIGURE 7.4
MView- Multiple alignment viewer.

FIGURE 7.5
MView results.

Steps to use:
(1) Open WebPRANK at https://www.ebi.ac.uk/goldman-srv/webprank/.
(2) Paste the sequences to be aligned in FASTA format.
For example, paste two sequences, Sequence1 and Sequence2, of a
Brassica oleracea variety (Fig. 7.6).
(3) Submit and start alignment.
(4) Download results (Fig. 7.7).

FIGURE 7.6
WebPRANK- Data input.

FIGURE 7.7
WebPRANK results.

7.1.4 TM-aligner
TM-Aligner is an online tool for aligning transmembrane proteins that uses the
Wu-Manber string-matching algorithm. The tool can display multiple sequence
alignments in different colour schemes. TM-align is a sequence-independent
algorithm for protein structure comparison. For two protein structures of unknown
equivalence, TM-align first produces an optimised residue-to-residue alignment
based on structural similarity, using heuristic DP iterations. It then returns an
optimal superposition of the two structures based on the detected alignment, along
with a TM-score that measures their structural similarity. The TM-score lies in
(0, 1], where 1 means the two structures match perfectly. Scores below 0.2
correspond to randomly chosen, unrelated proteins, while scores above 0.5
generally indicate the same fold in SCOP/CATH, based on strict statistics for
structures in the PDB.
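
For readers who want to see how such a score behaves, the sketch below evaluates the TM-score formula of Zhang and Skolnick for a fixed superposition: the sum of 1 / (1 + (d_i / d0)^2) over aligned residue pairs, divided by the target length L, with d0 = 1.24 (L - 15)^(1/3) - 1.8. The formula is not given in this chapter. The function name, the lower bound of 0.5 on d0 for short chains and the input distances are assumptions for illustration, and TM-align's search for the best superposition is not reproduced.

```python
def tm_score(distances, target_length):
    """TM-score for given distances (in angstroms) between aligned residue pairs."""
    if target_length > 15:
        d0 = 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8
    else:
        d0 = 0.5
    # assumption: keep d0 from becoming tiny or negative for short chains
    d0 = max(d0, 0.5)
    total = sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances)
    # unaligned residues add nothing, so the score is normalised by the
    # full target length rather than by the number of aligned pairs
    return total / target_length


perfect = tm_score([0.0] * 100, 100)     # every residue superimposed exactly
half = tm_score([0.0] * 50, 100)         # only half of the target aligned
distant = tm_score([20.0] * 100, 100)    # aligned pairs far apart
print(perfect, half, round(distant, 3))
```

Because the normalisation uses the target length, the score depends on which of the two structures is taken as the reference.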
Steps to use TM-Aligner:
(1) Open TM-Aligner at https://zhanglab.dcmb.med.umich.edu/TM-align/.
(2) Input Structure 1 and Structure 2 in PDB or PDBx/mmCIF format (mandatory).
For example, Structure 1 and Structure 2 of random sequences (Fig. 7.8).
(3) Run TM-Aligner.
(4) View the superimposed proteins (Protein-1 in blue and Protein-2 in red) (Fig. 7.9).

FIGURE 7.8
TM-Aligner Sequence Input page.

7.1.5 Mustguseal (multiple structure-guided sequence alignment)


Mustguseal is a web application for multiple sequence alignment of protein families.
It builds alignments guided by the structural and sequence information available
in public databases.

FIGURE 7.9
Protein visualization in TM-Aligner.

The underlying bioinformatic protocol is designed to build large alignments of
functionally diverse protein families. Within a superfamily, Mustguseal can be used
either to create a focused alignment of a selected protein family or to superimpose
a wide collection of related proteins.

7.2 How does Mustguseal function?


The Mustguseal protocol performs a structure similarity search to collect
evolutionarily distant relatives that are expected to represent different protein
families. Then, for each distant relative collected, Mustguseal runs a sequence
similarity search to collect closely related proteins, that is, members of the
respective families. In this way Mustguseal takes into account the variation of
both sequences and structures within a broad superfamily to obtain a set of
functionally diverse homologous proteins. The final multiple alignment is then
built by a combination of structure and sequence alignment procedures.
Steps to use:
(1) Open Mustguseal at https://mustguseal.belozersky.msu.ru/#scenario=1.
(2) Submit a query code from the PDB database, e.g. 1r3c.
(3) Submit the query chain, e.g. A.
(4) Choose the PDB structures to search, e.g. X-ray structures only or the entire PDB database.
(5) Select the database for the sequence similarity search.
(6) Submit.
(7) Download the results.
(8) View the annotation based on protein 0_1r3c_A (Fig. 7.10).

FIGURE 7.10
Annotation based on protein (Mustguseal).

7.2.1 PSAweb
PSAweb is a web server designed to analyse protein sequences and protein alignments.
It is a comprehensive online tool that enables quick visualisation of an analysis,
with output in GIF format. It helps the user to analyse and present the primary
structure of a protein and protein alignments.
Protein sequence analysis: The server enables users to plot the properties of
amino acids along the protein sequence (e.g. plots of flexibility and hydrophilicity).
Up to four properties can be selected from the 36 available on the server, and
displayed in a single window or in several windows at a time.
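
A property plot of this kind is usually a sliding-window average of per-residue values. The sketch below shows the idea using the Kyte-Doolittle hydropathy scale, a swapped-in example scale; the 36 scales actually offered by PSAweb are not reproduced here. The function name, the window size and the peptide are assumptions made for illustration.

```python
# Kyte-Doolittle hydropathy values (positive = hydrophobic)
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def sliding_window_profile(sequence, scale=KYTE_DOOLITTLE, window=5):
    """Average a per-residue property over each window along the sequence."""
    values = [scale[residue] for residue in sequence.upper()]
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and the sequence length")
    return [
        sum(values[start:start + window]) / window
        for start in range(len(values) - window + 1)
    ]


# hypothetical peptide, used only to show the shape of the output
profile = sliding_window_profile("MKTLLVAGDEKR", window=5)
print([round(value, 2) for value in profile])
```

Each value in the result corresponds to one window position, so a sequence of length n gives n - window + 1 points to plot.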
Steps to use:
(1) Open PSAweb server.
(2) Click on Analysis of Single Sequence or MSA.
(3) Run analysis.
For example, amino acid sequence of Insulin (Homo sapiens) (Fig. 7.11).
(4) View analysis of submitted protein sequence (Fig. 7.12).

FIGURE 7.11
PSAweb- Sequence input.

FIGURE 7.12
Protein analysis in PSAweb.

7.2.2 PVS (protein variability server)


The PVS web server uses several variability metrics to calculate sequence
variability in a multiple protein sequence alignment. The tool can plot the
variability along the sequence, map it onto a supplied 3D structure, return the
conserved fragments of the alignment, locate conserved fragments in the 3D
structure and predict T-cell epitopes. PVS is straightforward to use: enter your
sequence alignment in the input box and run the analysis (Fig. 7.13).
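
As a sketch of what a per-column variability metric looks like, the code below computes the Shannon entropy of each alignment column and then lists runs of columns at or below a threshold. Shannon entropy is used here as one common choice; which metrics and thresholds PVS applies internally is not stated in this chapter. The function names, the handling of gaps (ignored) and the toy alignment are assumptions.

```python
import math
from collections import Counter


def shannon_variability(alignment):
    """Shannon entropy (in bits) of each column, ignoring gap characters."""
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all aligned rows must have the same length")
    scores = []
    for column in zip(*alignment):
        residues = [symbol for symbol in column if symbol != "-"]
        counts = Counter(residues)
        total = len(residues)
        entropy = 0.0
        for count in counts.values():
            frequency = count / total
            entropy -= frequency * math.log2(frequency)
        scores.append(entropy)
    return scores


def conserved_fragments(scores, threshold, min_length=1):
    """Half-open (start, end) column ranges whose scores stay <= threshold."""
    fragments, start = [], None
    # the sentinel value closes a run that reaches the last column
    for index, score in enumerate(list(scores) + [float("inf")]):
        if score <= threshold and start is None:
            start = index
        elif score > threshold and start is not None:
            if index - start >= min_length:
                fragments.append((start, index))
            start = None
    return fragments


toy = ["ACDG", "ACEG", "ACDG", "ACKG"]  # hypothetical aligned rows
scores = shannon_variability(toy)
print(scores, conserved_fragments(scores, threshold=0.5))
```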

FIGURE 7.13
PVS homepage.

7.2.3 PRALINE
PRALINE is an MSA programme that offers various alignment strategies, such as
integrating structural information into the alignment process. It also provides
several views of the resulting sequence alignment.
Steps to use:
(1) Open PRALINE in your browser at https://www.ibi.vu.nl/programs/pralinewww/.
(2) Paste your protein sequences in FASTA format (maximum 500 sequences,
maximum length 2000).
(3) Submit and run.
For example, for two protein sequences, Sample1 and Sample2, the result will
look like Fig. 7.14.
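
Before pasting a large FASTA file into the form, it can help to check it against the limits stated in step (2). The sketch below does that locally. The two limits come from the step above; the function names and the reading of "length 2000" as a per-sequence limit are assumptions.

```python
MAX_SEQUENCES = 500   # limit quoted in step (2)
MAX_LENGTH = 2000     # assumed to apply to each sequence


def read_fasta(text):
    """Parse FASTA text into a list of (name, sequence) pairs."""
    records, name, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = line[1:].strip(), []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def check_limits(records, max_sequences=MAX_SEQUENCES, max_length=MAX_LENGTH):
    """Return a list of human-readable problems; empty if the input fits."""
    problems = []
    if len(records) > max_sequences:
        problems.append(f"{len(records)} sequences exceed the limit of {max_sequences}")
    for name, sequence in records:
        if len(sequence) > max_length:
            problems.append(f"{name}: length {len(sequence)} exceeds {max_length}")
    return problems


records = read_fasta(">Sample1\nMKTAYIAK\n>Sample2\nMKTAYLAK\n")
print(check_limits(records))
```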

7.2.4 PROMALS3D
PROMALS3D is a web-based tool for creating MSAs of protein sequences and structures.
It uses information from database searches and available 3D structures, together
with user-defined constraints.
Steps to use:
(1) Open PROMALS3D at http://prodata.swmed.edu/promals3d/promals3d.php.

FIGURE 7.14
PRALINE Sequences submit.

(2) Enter two or more protein sequences to be aligned, in FASTA format. For
example, the sequences of insulin isoform 2 precursor [Homo sapiens] and
MicE [Microbacterium arborescens] (Fig. 7.15).
(3) Submit sequences.
(4) Check alignment results (Fig. 7.16).

7.2.5 MAFFT (CBRC)


MAFFT is software for aligning nucleotide and protein sequences. Its online service
lets users choose sequences interactively and visualise the results.
Steps to use:
(1) Open MAFFT in your browser at https://mafft.cbrc.jp/alignment/server/.
(2) Paste protein or DNA sequences in FASTA format.
(3) Submit.
(4) Check results (Fig. 7.17).
(5) Check phylogenetic tree of submitted sequences (Fig. 7.18).

FIGURE 7.15
Data input-PROMALS3D.

FIGURE 7.16
Colored PROMALS3D alignment result.

FIGURE 7.17
MAFFT Results.

FIGURE 7.18
Phylogenetic tree by MAFFT.

7.3 Some other MSA tools


7.3.1 OPAL (progressive-iterative alignment)
OPAL is an MSA method based on a 'form-and-polish' strategy. It can align protein and
DNA sequences, and expects FASTA input. The authors report that OPAL is more accurate
than Muscle for protein sequence alignment, and that its accuracy is similar to that
of MAFFT and Muscle for DNA sequence alignment.

7.3.2 DIALIGN-TX
DIALIGN-TX is the most recent release of the DIALIGN MSA tool. Owing to several
algorithmic improvements, it generates substantially better alignments on locally
and globally related sequence sets than previous versions of DIALIGN. Like the
original implementation of the programme, DIALIGN-T uses a straightforward greedy
method to assemble multiple alignments from local pairwise similarities. The most
important algorithmic change in DIALIGN-TX is the use of a guide tree.

7.3.3 CHAOS and DIALIGN web server


It is a web server that uses CHAOS, a fast database search tool, to find a chain
of local sequence similarities. DIALIGN then uses these similarities as anchor
points for the multiple sequence alignment.

7.3.4 UniProt align


An MSA web interface in UniProt that uses Clustal Omega.

7.3.5 Phylo
Phylo is a platform that lets members of the public refine MSAs of DNA by matching
patterns. It is very easy to use, as no detailed biological expertise is needed.

7.3.6 PRANK
PRANK is designed to produce multiple alignments that represent evolutionary
homology, using phylogenetic information to place insertions and deletions.

7.3.7 CRASP
CRASP analyses multiple protein sequence alignments to identify correlated residues.
The algorithm assumes that correlated residues are the result of functional
adaptation. The estimates are based on physicochemical properties.

7.3.8 ProbCons
ProbCons performs multiple alignment of amino acid sequences based on probabilistic
consistency. It combines probabilistic modelling with consistency-based techniques
when constructing the alignment. The authors report that the method produces better
alignments than T-Coffee, Clustal W and DIALIGN.

7.3.9 DIALIGN
DIALIGN is a programme for MSA. DIALIGN-TX is an enhanced variant that improves on
DIALIGN-T by combining greedy and progressive approaches.

7.3.10 Muscle (WS jabaws)


This is Muscle offered as a JABAWS web service. You can use JABAWS through the
Jalview interface or a command-line client, or install and run JABAWS on your own
machine.

7.3.11 R-Coffee
R-Coffee is a component of the T-Coffee package for multiple alignment of RNA
sequences. It is a special version of T-Coffee that uses structural information
when building multiple sequence alignments. Requirements: RNAplfold from the Vienna
package, MAFFT, Muscle, ProbCons and ConSan.

7.3.12 PRANK API


The PRANK API at EBI gives access to PRANK, an MSA method for nucleic acid and
amino acid sequences. Its core algorithm differs from standard ones in order to
avoid overestimating insertion and deletion events. The evolutionary distance
between sequences is taken into account.

7.3.13 OD-seq
OD-seq is a tool for identifying outlier sequences in an MSA. It works by finding
sequences whose average distance to the other sequences in the alignment is anomalous.
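
The idea of flagging a sequence by its average distance to the others can be sketched as follows. This is not OD-seq's implementation: the distance used here (fraction of differing residues at positions without gaps) and the z-score cutoff are stand-ins chosen for illustration, and the function names are hypothetical.

```python
from statistics import mean, pstdev


def p_distance(a, b):
    """Fraction of differing residues over positions where neither row has a gap."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 1.0
    return sum(x != y for x, y in pairs) / len(pairs)


def outlier_sequences(alignment, z_cutoff=2.0):
    """Names whose mean distance to all other rows is unusually high."""
    names = list(alignment)
    if len(names) < 2:
        return []
    average = {
        name: mean([p_distance(alignment[name], alignment[other])
                    for other in names if other != name])
        for name in names
    }
    values = list(average.values())
    centre, spread = mean(values), pstdev(values)
    if spread == 0:
        return []  # all rows are equally distant from one another
    return [name for name in names if (average[name] - centre) / spread > z_cutoff]


# hypothetical alignment: five identical rows and one very different row
rows = {f"s{i}": "ACGTACGT" for i in range(5)}
rows["odd"] = "TTTTTTTT"
print(outlier_sequences(rows))
```

With only a handful of sequences a single outlier cannot reach a high z-score, so the cutoff has to be chosen with the alignment size in mind.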

7.3.14 BARCOD
Using Véronique Barriel's method, BARCOD creates a character matrix that codes each
insertion/deletion event as a single event, regardless of its length, and retains
shared indels.
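
The general idea of coding an indel as one character regardless of its length can be sketched as below. This is a simplified scheme in which every distinct gap run (same start and end column) becomes one presence/absence character; it is not a reimplementation of BARCOD or of Barriel's full method, which treats overlapping indels more carefully. The function names and the toy alignment are assumptions.

```python
def gap_runs(row):
    """Half-open (start, end) column ranges of each run of '-' in a row."""
    runs, start = [], None
    # the trailing sentinel closes a gap run that reaches the end of the row
    for index, symbol in enumerate(row + "X"):
        if symbol == "-" and start is None:
            start = index
        elif symbol != "-" and start is not None:
            runs.append((start, index))
            start = None
    return runs


def indel_matrix(alignment):
    """One binary character per distinct gap run: 1 = this row has that gap."""
    runs_per_row = {name: set(gap_runs(row)) for name, row in alignment.items()}
    characters = sorted(set().union(*runs_per_row.values()))
    matrix = {
        name: [1 if run in runs else 0 for run in characters]
        for name, runs in runs_per_row.items()
    }
    return characters, matrix


toy = {"a": "AC--GT", "b": "AC--GT", "c": "ACTTGT", "d": "A----T"}
characters, matrix = indel_matrix(toy)
print(characters)
print(matrix)
```

In the toy alignment the two-column gap shared by rows a and b is a single character scored 1 for both, which is how a shared indel is retained.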

7.3.15 Edialign
Edialign is an EMBOSS version of the DIALIGN 2.2 MSA tool. It produces an MSA and
takes nucleic acid or protein sequences as input. The sequences do not have to be
similar over their full length, since the programme builds alignments from pairs of
gap-free sequence segments. Such pairs of segments are called diagonals. If
(possibly) coding nucleic acid sequences are to be aligned, edialign can optionally
translate the compared 'nucleic acid segments' into 'peptide segments', or
even perform comparisons at both the nucleic acid and protein levels, to improve
sensitivity.
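
Translating a nucleic acid segment into a peptide segment, as described above, amounts to reading codons against the genetic code. The sketch below uses the standard genetic code and a single reading frame. It is an illustration only, not EMBOSS code; the names and the choice to mark unknown codons with "X" are assumptions.

```python
BASES = "TCAG"
# standard genetic code, codons ordered with T, C, A, G at each position
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}


def translate_segment(nucleotides):
    """Translate a segment in frame 1; '*' is a stop, 'X' an unknown codon."""
    dna = nucleotides.upper().replace("U", "T")
    usable = len(dna) - len(dna) % 3  # ignore an incomplete trailing codon
    return "".join(
        CODON_TABLE.get(dna[start:start + 3], "X")
        for start in range(0, usable, 3)
    )


print(translate_segment("ATGGCCAAGTAA"))
```

Comparing segments at the peptide level can reveal similarity that is hidden at the nucleotide level by synonymous codon changes, which is the sensitivity gain the text refers to.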

7.3.16 MAFCO
MAFCO is a Multiple Alignment Format compression tool built specifically to
compress MAF files.

7.3.17 MAFFT (REST)


It is an MSA tool at EBI with a REST interface.

7.3.18 MSAprobs
MSAProbs is an MSA tool for protein sequences. It combines hidden Markov models,
a weighted probabilistic consistency transformation and weighted profile-profile
alignment.

7.3.19 Clustal Omega (EBI)


Clustal Omega at EBI is available through several interfaces: a web interface, a
REST API, a SOAP API and an Open API.

7.3.20 T-Coffee (EBI)


T-Coffee is a widely used MSA programme. It pre-processes the data by computing
pairwise alignments of all sequences and incorporates this information into the
progressive alignment procedure. Sequence information can be combined with
structural information obtained from different sources. Both amino acid and
nucleotide sequences can be aligned. The programme can also combine the results of
various alignment methods.

7.3.21 Biojs-io-clustal
It is a tool for parsing Clustal files in a web browser.

7.3.22 PASTA
PASTA stands for Practical Alignments using SATé and TrAnsitivity. It uses a
guide tree for MSA.

7.3.23 SARA-Coffee
SARA-Coffee is a web server for multiple alignment of RNA sequences guided by
three-dimensional structure. It combines the pairwise structural alignments
produced by SARA with the R-Coffee multiple RNA aligner.

7.3.24 Staccato
Staccato is an MSA method that combines three-dimensional structure alignment
probabilities with standard amino acid substitution probabilities.

7.3.25 MARS
MARS is a method developed specifically for the alignment of circular genome
sequences, such as mitochondrial and viral genome sequences.
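
Circular sequences need special handling because the position at which a circular genome is linearised is arbitrary, so two copies of the same molecule can start at different points. The brute-force sketch below rotates one sequence to best match a reference before alignment. It assumes equal lengths and uses a simple mismatch count; MARS's own algorithm is not reproduced here, and the function names are hypothetical.

```python
def mismatches(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def best_rotation(reference, sequence):
    """Return (offset, rotated) where rotated best matches the reference."""
    if len(reference) != len(sequence) or not sequence:
        raise ValueError("this sketch needs two non-empty sequences of equal length")
    # try every possible starting point of the circular sequence
    offset = min(
        range(len(sequence)),
        key=lambda k: mismatches(reference, sequence[k:] + sequence[:k]),
    )
    return offset, sequence[offset:] + sequence[:offset]


print(best_rotation("ACGTTGCA", "TGCAACGT"))
```

Trying all rotations costs time quadratic in the sequence length, which is why dedicated tools use faster heuristics on real genomes.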

7.3.26 Malakite
Malakite (Multiple Alignment Automatic Kinship Tiling Engine) is a web-based
tool for analysing aligned blocks in multiple alignments of protein sequences.

7.3.27 trimAl
trimAl is a tool, also available online, for removing spurious sequences and poorly
aligned regions from an MSA. It can automatically select the parameters that
maximise the signal-to-noise ratio.
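
One simple form of alignment trimming is to drop columns that are mostly gaps. The sketch below does this with a fixed gap-fraction threshold. It illustrates the idea only: the threshold value and function name are assumptions, and trimAl's automated parameter selection is not reproduced.

```python
def trim_gappy_columns(alignment, max_gap_fraction=0.5):
    """Remove columns whose fraction of '-' exceeds max_gap_fraction."""
    names = list(alignment)
    rows = [alignment[name] for name in names]
    if len({len(row) for row in rows}) > 1:
        raise ValueError("all aligned rows must have the same length")
    # indices of the columns that pass the gap filter
    keep = [
        index
        for index, column in enumerate(zip(*rows))
        if column.count("-") / len(column) <= max_gap_fraction
    ]
    return {name: "".join(alignment[name][i] for i in keep) for name in names}


toy = {"a": "AC-GT", "b": "A--GT", "c": "ACTG-"}  # hypothetical alignment
print(trim_gappy_columns(toy))
```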

7.3.28 Multi-LAGAN
Multi-LAGAN is a multi-genomic sequence alignment tool. It is also known as
MLAGAN.

7.3.29 Pro-Coffee
Pro-Coffee is a component of the T-Coffee package designed for multiple alignment
of promoter regions.

7.3.30 R3D-2-MSA
R3D-2-MSA is a web-based application that links RNA 3D structures to multiple
sequence alignments of RNA.

7.3.31 ProDA
ProDA is a method that first identifies homologous regions, including repeated
ones, in a set of protein sequences and then builds local multiple sequence
alignments (MSAs) of these regions.

7.3.32 MSAProbs-MPI
It is a parallel version of the MSAProbs multiple sequence aligner. The method is
based on hidden Markov models.

7.3.33 HmmCleaner
HmmCleaner uses profile hidden Markov models (pHMMs) to remove alignment and
sequencing errors from multiple sequence alignments. The tool is built
upon and incorporates.

7.3.34 MSA-PAD 2.0


It is a web-based tool for MSA of DNA sequences. The algorithm uses PFAM or
user-supplied profiles. Registration and login are needed for the web interface.

7.3.35 PnpProbs
PnpProbs classifies the input sequences into two groups, 'normally' related and
distantly related. It uses a guide tree (progressive alignment) only for 'normally'
related sequences; for distantly related sequences it uses a non-progressive approach.

7.3.36 ANTICALIgN
A tool developed specifically for combinatorial protein engineering. ANTICALIgN
creates an MSA based on a reference sequence template and global sequence
alignment.

7.3.37 FAMSA
FAMSA is designed to align large protein families quickly. It uses the longest
common subsequence to measure pairwise similarity, evaluates gap costs in a novel
way and applies a new iterative refinement scheme to the alignment. The authors
report that FAMSA is superior to Clustal Omega and MAFFT on large protein
families.

7.3.38 KMAD
KMAD is a tool developed specifically for multiple alignment of intrinsically
disordered proteins (IDPs). IDPs differ from globular proteins because they lack
a stable tertiary structure and show less sequence conservation.

7.3.39 VerAlign
VerAlign is software that assesses the quality of a test alignment by comparing it
with a reference alignment of the same sequences. It uses SPdist scoring, which
calculates the distance between mismatched pairs of amino acids.
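
Comparing a test alignment with a reference is often done by counting residue pairs. The sketch below computes the plain sum-of-pairs (SP) score, the fraction of residue pairs aligned in the reference that are also aligned in the test alignment. This is a simpler relative of SPdist, shown only to illustrate the comparison; the distance weighting that SPdist adds is not implemented, and the function names are hypothetical.

```python
from itertools import combinations


def aligned_pairs(alignment):
    """Set of residue pairs that share a column, as ((name, index), (name, index))."""
    names = sorted(alignment)
    position = {name: 0 for name in names}  # next ungapped residue index per row
    pairs = set()
    for column in zip(*(alignment[name] for name in names)):
        present = []
        for name, symbol in zip(names, column):
            if symbol != "-":
                present.append((name, position[name]))
                position[name] += 1
        # every two residues in the same column form one aligned pair
        pairs.update(combinations(present, 2))
    return pairs


def sum_of_pairs_score(test, reference):
    """Fraction of reference pairs that the test alignment reproduces."""
    reference_pairs = aligned_pairs(reference)
    if not reference_pairs:
        return 0.0
    return len(reference_pairs & aligned_pairs(test)) / len(reference_pairs)


reference = {"a": "AC-G", "b": "ACTG"}
test = {"a": "ACG-", "b": "ACTG"}
print(sum_of_pairs_score(test, reference))
```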

Further reading
Cabanettes, F., Klopp, C., 2018. D-GENIES: dot plot large genomes in an interactive, efficient
and simple way. PeerJ 6, e4958.
Frazer, K.A., 2004. VISTA: computational tools for comparative genomics. Nucleic Acids
Res. 32 (Web Server issue), W273-W279.
Garcia-Boronat, M., Diez-Rivero, C.M., Reinherz, E.L., Reche, P.A., 2008. PVS: a web server
for protein sequence variability analysis tuned to facilitate conserved epitope discovery.
Nucleic Acids Res. 36 (Web Server issue), W35-W41.
Junier, T., Pagni, M., 2000. Dotlet: diagonal plots in a web browser. Bioinformatics 16 (2),
178-179.
Katoh, K., Rozewicki, J., Yamada, K.D., 2019. MAFFT online service: multiple sequence
alignment, interactive sequence choice and visualization. Briefings Bioinf. 20 (4),
1160-1166.
Noé, L., Kucherov, G., 2005. YASS: enhancing the sensitivity of DNA similarity search.
Nucleic Acids Res. 33, W540-W543.
Pei, J., Tang, M., Grishin, N.V., 2008. PROMALS3D web server for accurate multiple protein
sequence and structure alignments. Nucleic Acids Res. 36 (Web Server issue),
W30-W34.
Raghava, G.P.S., 2001. A graphical web server for the analysis of protein sequences and
alignment. Biotech Softw. Internet Rep. 2 (6).
Simossis, V.A., Heringa, J., 2005. PRALINE: a multiple sequence alignment toolbox that
integrates homology-extended and secondary structure information. Nucleic Acids Res.
33 (Web Server issue), W289-W294.
Suplatov, D.A., Kopylov, K.E., Popova, N.N., Voevodin, V.V., Svedas, V.K., 2018. Mustguseal:
a server for multiple structure-guided sequence alignment of protein families.
Bioinformatics 34 (9).
Lassmann, T., Sonnhammer, E.L.L., 2006. Kalign, Kalignvu and Mumsa: web servers for
multiple sequence alignment. Nucleic Acids Res. 34 (Suppl. 2), W596-W599.
Troshin, P.V., Procter, J.B., Barton, G.J., 2011. Java bioinformatics analysis web services for
multiple sequence alignment-JABAWS: MSA. Bioinformatics 27 (14), 2001-2002.
Wheeler, T.J., Kececioglu, J.D., 2007. Multiple alignment by aligning alignments.
Bioinformatics 23 (13), i559-i568.
Robert, X., Gouet, P., 2014. Deciphering key features in protein structures with the new
ENDscript server. Nucleic Acids Res. 42 (W1), W320-W324.
Zhang, Y., Skolnick, J., 2005. TM-align: a protein structure alignment algorithm based on
TM-score. Nucleic Acids Res. 33, 2302-2309.
