
Lab Report 3 Bioinformatics

The document provides information about performing pairwise and multiple sequence alignments using CLUSTALW and BLAST. It includes the procedure for using each tool, beginning with opening the relevant web browser and uploading or pasting sequences in FASTA format. The results section shows the output of NCBI searches for several protein sequences from different organisms to compare their similarity and includes the relevant sequences in FASTA format.

SCHOOL OF PHARMACY

LAB REPORT 3

PAIRWISE SEQUENCE ALIGNMENT AND MULTIPLE SEQUENCE ALIGNMENT USING CLUSTALW & BLAST

BIOINFORMATICS
(PBI2020IP)

NAME: RABIATUL ADAWIYAH BINTI HASBULLAH
STUDENT ID: 012020091691

PROGRAMME: BACHELOR OF PHARMACEUTICAL TECHNOLOGY (BPHT)
LECTURER: AP DR SANTOSH FATTEPUR AND DR ALICIA NG
DATE OF SUBMISSION: 9th DECEMBER 2020

Practical 3: Pairwise Sequence Alignment and Multiple Sequence Alignment using CLUSTALW & BLAST

A sequence database is useful for predicting the function, structure, or biochemical activity of genes whose sequences have been determined in the laboratory. The sequence of the gene of interest is compared with every sequence in the database, and the gene is predicted to have a function or biochemical activity similar to that of the best-matching sequences. There are two types of sequence alignment: pairwise and multiple.

Introduction:

Sequence alignment is the method of comparing biological sequences and identifying the similarities between them. Which "similarities" are identified depends on the objectives of the particular alignment. Sequence alignment is very valuable in a number of bioinformatics applications. For example, the simplest way to compare two sequences of the same length is to count the number of matching symbols. The value that measures the degree of similarity between two sequences is called their alignment score. The inverse value, corresponding to the level of dissimilarity between sequences, is usually referred to as the distance between them, and the number of non-matching characters is called the Hamming distance.

An alignment is made between a known sequence and an unknown sequence, or between two unknown sequences. The known sequence is called the reference sequence, while the unknown sequence is called the query sequence. There are two types of sequence alignment: pairwise alignment and multiple sequence alignment (MSA). Pairwise alignment compares two biological sequences of protein, DNA, or RNA. It is used to find conserved regions between the two sequences and to perform similarity searches in a database. Multiple sequence alignment, on the other hand, compares three or more biological sequences of protein, DNA, or RNA. It is used to detect regions of variability or conservation in a family of proteins, to detect homology between a newly sequenced gene and an existing gene family, to predict protein structure, and to demonstrate homology in multigene families.

Alignments can also be local or global. A local alignment matches two sequences only in the regions that are most similar to each other. It shows whether a substring in one sequence aligns well with a substring in the other, and it is used to search for local similarities in large sequences, usually newly sequenced genomes. A global alignment, in contrast, matches the residues of two sequences across their entire length. It is used to compare two genes with the same function (for example, human vs mouse) or two proteins with similar function.

ClustalW, like the other Clustal tools, is used to align multiple nucleotide or protein sequences efficiently. It uses progressive alignment, which aligns the most similar sequences first and works its way down to the least similar sequences until a global alignment is built. ClustalW is a matrix-based algorithm, whereas tools such as T-Coffee and Dialign are consistency-based. ClustalW uses a reasonably efficient algorithm that competes well against other programs. It requires three or more sequences to calculate a global alignment; for pairwise sequence alignment (two sequences), tools such as EMBOSS (e.g., Needle) or LALIGN are used.
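The alignment score, the Hamming distance, and the difference between global and local alignment described above can be made concrete with a short Python sketch. It is a minimal illustration only: it assumes a simple scoring scheme (match +1, mismatch -1, gap -2, chosen arbitrarily) instead of the EBLOSUM62 matrix and affine gap penalties used by EMBOSS Needle in the results below, and the function names `hamming` and `align_score` are hypothetical. With `local=False` the dynamic programme is the Needleman-Wunsch global algorithm; with `local=True` it becomes the Smith-Waterman local algorithm, in which the score may restart at zero.

```python
def hamming(a: str, b: str) -> int:
    """Number of non-matching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs sequences of equal length")
    return sum(x != y for x, y in zip(a, b))


def align_score(a: str, b: str, match: int = 1, mismatch: int = -1,
                gap: int = -2, local: bool = False) -> int:
    """Optimal alignment score by dynamic programming (linear gap penalty).

    local=False -> global alignment over the full length (Needleman-Wunsch).
    local=True  -> best-scoring pair of substrings (Smith-Waterman).
    """
    rows, cols = len(a) + 1, len(b) + 1
    dp = [[0] * cols for _ in range(rows)]
    if not local:
        # Global: leading gaps are penalised along the first row and column.
        for i in range(1, rows):
            dp[i][0] = i * gap
        for j in range(1, cols):
            dp[0][j] = j * gap
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = dp[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score = max(diag, dp[i - 1][j] + gap, dp[i][j - 1] + gap)
            if local:
                score = max(score, 0)  # a local alignment may start anywhere
                best = max(best, score)
            dp[i][j] = score
    return best if local else dp[-1][-1]
```

For example, `align_score("CCCCGATTACA", "GATTACAGGGG", local=True)` returns 7 because only the shared substring GATTACA is aligned, whereas the global score for the same pair is lower because the unmatched ends must also be aligned.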
The Basic Local Alignment Search Tool (BLAST) finds regions of similarity between sequences. The program compares nucleotide or protein sequences with sequence databases and calculates the statistical significance of the matches. The results can be used to infer functional and evolutionary relationships between sequences, as well as to help identify members of gene families.
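To illustrate how a query can be compared against every sequence in a database, here is a minimal Python sketch of the first ("seeding") step of a BLAST-like search: it indexes every word of length w in a subject sequence, reports the exact words shared with the query, and ranks a toy database by the number of shared words. This is only a sketch under stated assumptions: real BLAST also uses neighbourhood words scored with a substitution matrix, extends the seeds into gapped local alignments, and reports E-values. The word length w=3, the function names `word_hits` and `rank_database`, and the toy sequences are hypothetical.

```python
from collections import defaultdict


def word_hits(query: str, subject: str, w: int = 3) -> list[tuple[int, int, str]]:
    """Exact length-w words shared by query and subject, returned as
    (query_position, subject_position, word) with 0-based positions."""
    # Index every word of the subject so each query word is looked up once.
    index: dict[str, list[int]] = defaultdict(list)
    for j in range(len(subject) - w + 1):
        index[subject[j:j + w]].append(j)
    hits = []
    for i in range(len(query) - w + 1):
        word = query[i:i + w]
        for j in index.get(word, []):
            hits.append((i, j, word))
    return hits


def rank_database(query: str, database: dict[str, str],
                  w: int = 3) -> list[tuple[str, int]]:
    """Rank database entries by how many word hits they share with the query."""
    counts = {name: len(word_hits(query, seq, w)) for name, seq in database.items()}
    return sorted(counts.items(), key=lambda item: item[1], reverse=True)
```

A hit such as `(1, 1, "KVL")` from `word_hits("MKVLA", "AKVLM")` is only a seed; a BLAST-style search would then try to extend it in both directions and report the resulting local alignment only if it scores well.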
Aim:

To perform pairwise and multiple sequence alignment using both CLUSTALW & BLAST
tools.

Procedure:

A. CLUSTALW
1. Open the web browser and go to https://www.ebi.ac.uk/Tools/msa/clustalw2/.
2. Upload the sequences from a Notepad file or paste them in FASTA format.

3. Upload two sequences for pairwise alignment, or more than two sequences for multiple sequence alignment. After uploading, choose the “Execute Multiple Alignment” option under the alignment icon.

4. The alignment results will appear within a few seconds of execution.

5. Report the result.
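Both tools expect sequences in FASTA format: a header line beginning with ">" followed by one or more (possibly wrapped) sequence lines. As an illustration only, the Python sketch below parses pasted FASTA text into a dictionary keyed by accession. The function name `parse_fasta` and the choice to keep only the first word of each header are assumptions, not part of ClustalW or BLAST.

```python
def parse_fasta(text: str) -> dict[str, str]:
    """Parse FASTA text into {accession: sequence}.

    The accession is taken as the first word after ">"; the rest of the
    header (description, organism) is discarded in this sketch.
    """
    records: dict[str, str] = {}
    header = None
    chunks: list[str] = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)
            words = line[1:].split()
            header = words[0] if words else ""
            chunks = []
        elif header is not None:
            chunks.append(line)  # wrapped sequence lines are joined together
    if header is not None:
        records[header] = "".join(chunks)
    return records
```

Applied to the two lipoxygenase entries in part b), the sequence lengths should be 674 and 859 residues, the end coordinates shown in the alignment output.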

B. BLAST
1. Open the web browser and go to http://blast.ncbi.nlm.nih.gov/Blast.cgi.

2. Click either the nucleotide BLAST or the protein BLAST icon, as required.

3. Tick the “Align two or more sequences” check box for multiple sequence alignment, or leave it unticked to search a query against the database (pairwise alignment).

4. Upload or paste a query sequence (in FASTA format) in the query box and execute BLAST for pairwise alignment. This identifies the most similar sequences in the database.

5. Upload or paste a query sequence (in FASTA format) in the query box, upload more than one sequence (in FASTA format) in the subject box, and then execute BLAST for multiple sequence alignment. This identifies the similarities and differences among the sequences.

6. Report the result.


Result

a) NCBI search for the protein (print screen)

i. Lipoxygenase in Homo sapiens

ii. Lipoxygenase in Glycine max


iii. Proteases in Human rhinovirus sp.

iv. Proteases in Shigella sonnei


v. MetP protein in Salmonella enterica

vi. MetP protein in Streptococcus pyogenes


vii. MetP protein in Clostridioides difficile

viii. MetP protein in Listeria monocytogenes


ix. MetP protein in Streptococcus pneumoniae

x. MetP protein in Acinetobacter baumannii


b) FASTA format for all the queries

i. Lipoxygenase in Homo sapiens

>AAA36183.1 lipoxygenase [Homo sapiens]
MPSYTVTVATGSQWFAGTDDYIYLSLVGSAGCSEKHLLDKPFYNDFERGAVDSYDVTVDEELGEIQLVRI
EKRKYWLNDDWYLKYITLKTPHGDYIEFPCYRWITGDVEVVLRDGRAKLARDDQIHILKQHRRKELETRQ
KQYRWMEWNPGFPLSIDAKCHKDLPRDIQFDSEKGVDFVLNYSKAMENLFINRFMHMFQSSWNDFADFEK
IFVKISNTISERVMNHWQEDLMFGYQFLNGCNPVLIRRCTELPEKLPVTTEMVECSLERQLSLEQEVQQG
NIFIVDFELLDGIDANKTDPCTLQFLAAPICLLYKNLANKIVPIAIQLNQIPGDENPIFLPSDAKYDWLL
AKIWVRSSDFHVHQTITHLLRTHLVSEVFGIAMYRQLPAVHPIFKLLVAHVRFTIAINTKAREQLICECG
LFDKANATGGGGHVQMVQRAMKDLTYASLCFPEAIKARGMESKEDIPYYFYRDDGLLVWEAIRTFTAEVV
DIYYEGDQVVEEDPELQDFVNDVYVYGMRGRKSSGFPKSVKSREQLSEYLTVVIFTASAQHAAVNFGQYD
WCSWIPNAPPTMRAPPPTAKGVVTIEQIVDTLPDRGRSCWHLGAVWALSQFQENELFLGMYPEEHFIEKP
VKEAMARFRKNLEAIVSVIAERNKKKQLPYYYLSPDRIPNSVAI

ii. Lipoxygenase in Glycine max

>NP_001235189.1 lipoxygenase [Glycine max]
MTGGMFGRKGQKIKGTVVLMPKNVLDFNAITSVGKGSAKDTATDFLGKGLDALGHAVDALTAFAGHSISL
QLISATQTDGSGKGKVGNEAYLEKHLPTLPTLGARQEAFDINFEWDASFGIPGAFYIKNFMTDEFFLVSV
KLEDIPNHGTINFVCNSWVYNFKSYKKNRIFFVNDTYLPSATPGPLVKYRQEELEVLRGDGTGKRRDFDR
IYDYDIYNDLGNPDGGDPRPIIGGSSNYPYPRRVRTGREKTRKDPNSEKPGEIYVPRDENFGHLKSSDFL
TYGIKSLSQNVIPLFKSIILNLRVTSSEFDSFDEVRGLFEGGIKLPTNILSQISPLPVLKEIFRTDGENT
LQFPPPHVIRVSKSGWMTDDEFAREMIAGVNPNVIRRLQEFPPKSTLDPATYGDQTSTITKQQLEINLGG
VTVEEAISAHRLFILDYHDAFFPYLTKINSLPIAKAYATRTILFLKDDGSLKPLAIELSKPATVSKVVLP
ATEGVESTIWLLAKAHVIVNDSGYHQLISHWLNTHAVMEPFAIATNRHLSVLHPIYKLLYPHYKDTININ
GLARQSLINAGGIIEQTFLPGKYSIEMSSVVYKNWVFTDQALPADLVKRGLAVEDPSAPHGLRLVIEDYP
YAVDGLEIWDAIKTWVHEYVSVYYPTNAAIQQDTELQAWWKEVVEKGHGDLKDKPWWPKLQTVEDLIQSC
SIIIWTASALHAAVNFGQYPYGGYIVNRPTLARRFIPEEGTKEYDEMVKDPQKAYLRTITPKFETLIDIS
VIEILSRHASDEVYLGQRDNPNWTTDSKALEAFKKFGNKLAEIEGKITQRNNDPSLKSRHGPVQLPYTLL
HRSSEEGMSFKGIPNSISI

iii. Proteases in Human rhinovirus sp.

>AAA45759.1 protease, partial [Human rhinovirus sp.]
AFRPCNVNTKIGNAKCCPFVCGKAVTFKDRSTCSTYNLSSSLHHILEEDKRRRQVVDVMSAIFQGPISLD
APPPPAIADLLQSVRTPRVIKYCQIIMGHPAECQVERDLNIANSIIAIIANIISIAGIIFVIYKLFCSLQ
GPYSGEPKPKTKVPERRVVAQGPEEEFGRSILKNNTCVITTGNGKFTGLGIHDRILIIPTHADPGREVQV
NGVHTKVLDSYDLYNRDGVKLEITVIQLDRNEKFRDIRKYIPETEDDYPECNLALSANQDEPTIIKVGDV
VSYGNILLSGNQTARMLKYNYPTKSGYCGGVLYKIGQILGIHVGGNGRDGFSAMLLRSYFTGQIKVNKHA
TECGLPDIQTIHTPSKTKLQPSVFYDVFPGSKEPAVLTDNDPRLEVNFKEA

iv. Proteases in Shigella sonnei

>WP_052962488.1 sigma E protease regulator RseP [Shigella sonnei]
MLSFLWDLASFIVALGVLITVHEFGHFWVARRCGVRVERFSIGFGKALWRRTDKLGTEYVMALIPLGGYV
KMLDERAEPVVPELRHHAFNNKSVGQRAAIIAAGPVANFIFAIFAYWLGFIIGVPGVRPVVGEIAANSIA
AEAQIAPGTELKAVDGIETPDWDAVRLQLVDKIGDESTTITVAPFGSDQRRDVKLDLRHWAFEPDKEDPV
SSLGIRPRGPQIEPVLENVQPNSAASKAGLQAGDRIVKVDGQPLTQWVTFVMLVRDNPGKSLALEIERQG
SPLSLTLIPESKPGNGKAIGFVGIEPKVIPLPDEYKVVRQYGPFNAIVEATDKTWQLMKLTVSMLGKLIT
GDVKLNNLSGPISIAKGAGMTAELGVVYYLPFLALISVNLGIINLFPLPVLDGGHLLFLAIEKIKGGPVS
ERVQDFCYRIGSILLVLLMGLALFNDFSRL

v. MetP protein in Salmonella enterica

>CAD5307872.1 Methionine import system permease protein MetP [Salmonella enterica subsp. enterica serovar Typhimurium]
MDDLLPDLTLAFNETFQMLSISTVLAILGGLPLGFLIFVTDRHLFWQNRFIYLVASVLVNIIRSVPFVIL
LVLLLPLTQLLLGNTIGPIAASVPLSVAAIAFYARLVDSALREVDKGIIEAALAFGASPMRIICTVLLPE
ASAGLLRGLTITLVSLIGYSAMAGIVGGGGVGDLAIRYGYYRYETEVMVVTVVALIVLVQVVQMLGDWLA
KRADKRDRH

vi. MetP protein in Streptococcus pyogenes

>AKP81145.1 Methionine import system permease protein MetP [Streptococcus pyogenes]
MSQLIQTYLPNVYELGWSGDAGWGLAIWNTLYMTIVPFIVGGAIGLLLGLLLVLTGPDGVIENKTICWVI
DKVTSIFRAIPFVILIAILASFTYLLLRTTLGATAALVPLTFATFPFYARQVQVVFSELDKGVIEAAQAS
GATFWDIVKVYLSEGLPDLIRVSTVTLISLVGETAMAGAIGAGGLGNVAISYGYNRFNNDVTWVATIIIL
LIIFAIQFIGDSLTRRFSHK

vii. MetP protein in Clostridioides difficile

>ALP04977.1 Methionine import system permease protein MetP [Clostridioides difficile]
MNSLIDFLTTLFPNALLQTLYMVIVPTIVATILGFILAIILVVTKPDGLKPNSTINSALGFIVNIFRSFP
FMILIVAMIPITRLIVGTSIGETAAIVPITIGAAPFIARIIESSLNEVDKGLIEAAKSFGATKRQIVFKV
MIKEAMPSIVSGITLSIISILGYTAMAGAVGAGGLGNIALIYGYQRFDTAVMVYTVIALIILVQIIQGVG
NLAYKKLK

viii. MetP protein in Listeria monocytogenes

>CCO64955.1 Methionine import system permease protein MetP [Listeria monocytogenes serotype 4b str. LL195]
MTKLQELFPNVDFQMMWVATQETLYMTLVSLFAVFLLGIVLGLLLFLTNNKKHAGARILYWITAILVNVF
RSIPFIILIVLLLPMTKSLVGTVIGPKAALPALIISAAPFYGRMVEIAFREVDKGVIEAAKSMGANMFTI
IGKVLIPEALPAIISGITVTAISLVGFTAMAGVIGAGGLGNTAYLEGFQRGQPDVTVLATIIILIIVFIF
QFIGDFLTKRTDKR

ix. MetP protein in Streptococcus pneumoniae

>VDG79202.1 Methionine import system permease protein MetP [Streptococcus pneumoniae]
MESLIQTYLPNVYKMGWAGQAGWGTAIYLTLYMTVLSFIIGGFLGLVAGLFLVLTAPGGVLENKVVFWIL
DKITSIFRAVPFIILLAILSPLSHLIVKTSIGPNAALVPLSFAVFAFFARQVQVVLAELDGGVIEAAQAS
GATFWDIVGVYLSEGLPDLIRVTTVTLISLVGETAMAGAVGAGGIGNVAIAYGFNRYNHDVTILATIVII
LIIFAIQFLGDFLTKKLSHK

x. MetP protein in Acinetobacter baumannii

>AVP34927.1 Methionine import system permease protein MetP [Acinetobacter baumannii]
MQYQLIDLLITGTVDTLLMVGASAFIAFLIGLPIAVILVSTSEHGIHPSQKINQALGWVINITRSVPFLI
LMVALIPLTRWIVGTSYGVWAAVVPLTIAAIPFFARIAEVSLREVDQGLIEAAQAMGCNRKQIIWHVLLP
EALPGIVAGFTVTIVTMINSSAIAGAIGAGGLGDIAYRYGYQRFDMQIMLAVILVLIVLVMLVQATGDAL
AQQLDKRKV

c) Pairwise alignment results

i. Lipoxygenase in Homo sapiens and Glycine max

ClustalW (EBI pairwise alignment; the output below was produced by EMBOSS Needle)

########################################
# Program: needle
# Rundate: Wed 9 Dec 2020 08:13:59
# Commandline: needle
# -auto
# -stdout
# -asequence emboss_needle-I20201209-081517-0222-75853467-p1m.asequence
# -bsequence emboss_needle-I20201209-081517-0222-75853467-p1m.bsequence
# -datafile EBLOSUM62
# -gapopen 10.0
# -gapextend 0.5
# -endopen 10.0
# -endextend 0.5
# -aformat3 pair
# -sprotein1
# -sprotein2
# Align_format: pair
# Report_file: stdout
########################################

#=======================================
#
# Aligned_sequences: 2
# 1: AAA36183.1
# 2: NP_001235189.1
# Matrix: EBLOSUM62
# Gap_penalty: 10.0
# Extend_penalty: 0.5
#
# Length: 896
# Identity: 200/896 (22.3%)
# Similarity: 329/896 (36.7%)
# Gaps: 259/896 (28.9%)
# Score: 524.0
#
#
#=======================================
AAA36183.1 1 -------------------------------------------------- 0

NP_001235189. 1 MTGGMFGRKGQKIKGTVVLMPKNVLDFNAITSVGKGSAKDTATDFLGKGL 50

AAA36183.1 1 ---------------------MPSYTVTVATGSQWFAGTDDYIYLSLVGS 29
:.|.|.|..:|. ..||:
NP_001235189. 51 DALGHAVDALTAFAGHSISLQLISATQTDGSGK------------GKVGN 88

AAA36183.1 30 AGCSEKHLLDKPFYNDFERGA-VDSYDVTVDEEL-----GEIQLVRIEKR 73
....||||...| ..|| .:::|:..:.:. |...:
NP_001235189. 89 EAYLEKHLPTLP-----TLGARQEAFDINFEWDASFGIPGAFYI------ 127

AAA36183.1 74 KYWLNDDWYLKYITLK-TPHGDYIEFPCYRWITGDVEVVLRDGRAKLARD 122


|.::.|:::|..:.|: .|:...|.|.|..|:..... .:..|.....|
NP_001235189. 128 KNFMTDEFFLVSVKLEDIPNHGTINFVCNSWVYNFKS--YKKNRIFFVND 175

AAA36183.1 123 DQIHI-----LKQHRRKELET--------RQKQYRWMEW-------NP-- 150


..:.. |.::|::|||. |:...|..:: ||
NP_001235189. 176 TYLPSATPGPLVKYRQEELEVLRGDGTGKRRDFDRIYDYDIYNDLGNPDG 225

AAA36183.1 151 GFPLSI----------------DAKCHKD----------LPRDIQFDSEK 174


|.|..| ..|..|| :|||..|...|
NP_001235189. 226 GDPRPIIGGSSNYPYPRRVRTGREKTRKDPNSEKPGEIYVPRDENFGHLK 275

AAA36183.1 175 GVDFVLNYSKAMENLFINRF------MHMFQSSWNDFADFEKIF---VKI 215


..||:....|::....|..| :.:..|.::.|.:...:| :|:
NP_001235189. 276 SSDFLTYGIKSLSQNVIPLFKSIILNLRVTSSEFDSFDEVRGLFEGGIKL 325

AAA36183.1 216 SNTISERV-----------------------------MNHWQEDLMFGYQ 236


...|..:: .:.|..|..|..:
NP_001235189. 326 PTNILSQISPLPVLKEIFRTDGENTLQFPPPHVIRVSKSGWMTDDEFARE 375

AAA36183.1 237 FLNGCNPVLIRRCTELPEK------------LPVTTEMVECSLERQLSLE 274


.:.|.||.:|||..|.|.| ..:|.:.:|.:| ..:::|
NP_001235189. 376 MIAGVNPNVIRRLQEFPPKSTLDPATYGDQTSTITKQQLEINL-GGVTVE 424

AAA36183.1 275 QEVQQGNIFIVDFELLDGIDA-----NKTDPCTLQFLAAPICLLYKNLAN 319


:.:....:||:|:. || .|.:...:....|...:|:.....
NP_001235189. 425 EAISAHRLFILDYH-----DAFFPYLTKINSLPIAKAYATRTILFLKDDG 469

AAA36183.1 320 KIVPIAIQLNQIPGDENPIFLPSDAKYD---WLLAKIWVRSSDFHVHQTI 366


.:.|:||:|:: |...:.:.||:....: |||||..|..:|...||.|
NP_001235189. 470 SLKPLAIELSK-PATVSKVVLPATEGVESTIWLLAKAHVIVNDSGYHQLI 518

AAA36183.1 367 THLLRTHLVSEVFGIAMYRQLPAVHPIFKLLVAHVRFTIAINTKAREQLI 416


:|.|.||.|.|.|.||..|.|..:|||:|||..|.:.||.||..||:.||
NP_001235189. 519 SHWLNTHAVMEPFAIATNRHLSVLHPIYKLLYPHYKDTININGLARQSLI 568

AAA36183.1 417 CECGLFDKANATGGGGHVQMVQRAMKDLTYASLCFPEAIKARGMESK--- 463


...|:.::....|... ::|.....|:..:.....|..:..||:..:
NP_001235189. 569 NAGGIIEQTFLPGKYS-IEMSSVVYKNWVFTDQALPADLVKRGLAVEDPS 617

AAA36183.1 464 ---------EDIPYYFYRDDGLLVWEAIRTFTAEVVDIYYEGDQVVEEDP 504


||.||.. |||.:|:||:|:..|.|.:||..:..:::|.
NP_001235189. 618 APHGLRLVIEDYPYAV---DGLEIWDAIKTWVHEYVSVYYPTNAAIQQDT 664

AAA36183.1 505 ELQDFVNDVYVYGMRGRKSSGFPKSVKSREQLSEYLTVVIFTASAQHAAV 554


|||.:..:|...|....|...:...:::.|.|.:..:::|:||||.||||
NP_001235189. 665 ELQAWWKEVVEKGHGDLKDKPWWPKLQTVEDLIQSCSIIIWTASALHAAV 714

AAA36183.1 555 NFGQYDWCSWIPNAPPTMRAPPPTAKGVVTIEQIVD--------TLPDRG 596


|||||.:..:|.|.|...|...| .:|....:::|. |:..:.
NP_001235189. 715 NFGQYPYGGYIVNRPTLARRFIP-EEGTKEYDEMVKDPQKAYLRTITPKF 763

AAA36183.1 597 RSCWHLGAVWALSQFQENELFLGMYPEEHF-IEKPVKEAMARFRKNLEAI 645


.:...:..:..||:...:|::||.....:: .:....||..:|...|..|
NP_001235189. 764 ETLIDISVIEILSRHASDEVYLGQRDNPNWTTDSKALEAFKKFGNKLAEI 813

AAA36183.1 646 VSVIAERNKKK---------QLPYYYL--------SPDRIPNSVAI 674


...|.:||... ||||..| |...||||::|
NP_001235189. 814 EGKITQRNNDPSLKSRHGPVQLPYTLLHRSSEEGMSFKGIPNSISI 859
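The Identity and Gaps lines in the Needle header above follow directly from the two aligned rows: each count is divided by the alignment length (896), so 200/896 gives 22.3% and 259/896 gives 28.9%. A minimal Python sketch of that arithmetic is shown below, assuming the two gapped rows are available as equal-length strings with "-" for gaps; `pair_stats` is a hypothetical helper, not part of EMBOSS. The Similarity line is not reproduced, because it also depends on which residue pairs the EBLOSUM62 matrix treats as similar.

```python
def pair_stats(row_a: str, row_b: str) -> dict[str, float]:
    """Identity and gap statistics for two gapped, equal-length aligned rows."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    length = len(row_a)
    pairs = list(zip(row_a, row_b))
    # A column is identical when both rows carry the same residue (not a gap).
    identity = sum(1 for x, y in pairs if x == y and x != "-")
    # A column counts as a gap when either row has "-" there.
    gaps = sum(1 for x, y in pairs if x == "-" or y == "-")
    return {
        "length": length,
        "identity": identity,
        "identity_pct": round(100 * identity / length, 1),
        "gaps": gaps,
        "gaps_pct": round(100 * gaps / length, 1),
    }
```

The gap count is consistent with the sequence lengths: the 896-column alignment leaves 896 - 674 = 222 gap columns in AAA36183.1 and 896 - 859 = 37 in NP_001235189.1, which sum to the 259 reported.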
Blast

Accession Description
lcl|Query_10001 AAA36183.1 lipoxygenase [Homo sapiens]
lcl|Query_10002 NP_001235189.1 lipoxygenase [Glycine max]

Query_10001 1 ------------------------------------------------------------------MPSYTVTVATGSQW 14
Query_10002 1 MTGGMFGRKGQKIKGTVVLMPKNVLDFNAITSVGKGSAKDTATDFLGKGLDALGHAVDALTAFAGHSISLQLISATQTDG 80

Query_10001 15 FAGTDDYIYLSLVGSAGCSEKHLLDKPFYNDFERGAVDSYDVTVDEELGEIQLVRIEKRKYWLNDDWYLKYITLKT-PHG 93
Query_10002 81 SGK-------GKVGNEAYLEKHLPTLPTLG--ARQEAFDINFEWDASFGIPGAFYIKNFM---TDEFFLVSVKLEDIPNH 148

Query_10001 94 DYIEFPCYRWITGDVEVVLRDGRAKLARDDQ-----IHILKQHRRKELETRQ-----------KQYRWMEWNP------- 150
Query_10002 149 GTINFVCNSWVYNFKSY--KKNRIFFVNDTYLPSATPGPLVKYRQEELEVLRGDGTGKRRDFDRIYDYDIYNDLGNPDGG 226

Query_10001 151 ---------------------GFPLSIDAKCHKD----LPRDIQFDSEKGVDFVLNYSKAMENLFINRFMHM------FQ 199
Query_10002 227 DPRPIIGGSSNYPYPRRVRTGREKTRKDPNSEKPGEIYVPRDENFGHLKSSDFLTYGIKSLSQNVIPLFKSIILNLRVTS 306

Query_10001 200 SSWNDFADFEKIFVKI--------------------------------SNTISERVMNHWQEDLMFGYQFLNGCNPVLIR 247
Query_10002 307 SEFDSFDEVRGLFEGGIKLPTNILSQISPLPVLKEIFRTDGENTLQFPPPHVIRVSKSGWMTDDEFAREMIAGVNPNVIR 386

Query_10001 248 RCTELPEKL------------PVTTEMVECSLERQLSLEQEVQQGNIFIVDFELLDGIDANKTDPCTLQFLAAPICLLYK 315
Query_10002 387 RLQEFPPKSTLDPATYGDQTSTITKQQLEINLGG-VTVEEAISAHRLFILDYHDAFFPYLTKINSLPIAKAYATRTILFL 465

Query_10001 316 NLANKIVPIAIQLNQIPGDENPIFLPSDAKYD---WLLAKIWVRSSDFHVHQTITHLLRTHLVSEVFGIAMYRQLPAVHP 392
Query_10002 466 KDDGSLKPLAIELSK-PATVSKVVLPATEGVESTIWLLAKAHVIVNDSGYHQLISHWLNTHAVMEPFAIATNRHLSVLHP 544

Query_10001 393 IFKLLVAHVRFTIAINTKAREQLICECGLFDKANATGGGGHVQMVQRAMKDLTYASLCFPEAIKARGMES---------K 463
Query_10002 545 IYKLLYPHYKDTININGLARQSLINAGGIIEQTFLPGKYS-IEMSSVVYKNWVFTDQALPADLVKRGLAVEDPSAPHGLR 623

Query_10001 464 EDIPYYFYRDDGLLVWEAIRTFTAEVVDIYYEGDQVVEEDPELQDFVNDVYVYGMRGRKSSGFPKSVKSREQLSEYLTVV 543
Query_10002 624 LVIEDYPYAVDGLEIWDAIKTWVHEYVSVYYPTNAAIQQDTELQAWWKEVVEKGHGDLKDKPWWPKLQTVEDLIQSCSII 703

Query_10001 544 IFTASAQHAAVNFGQYDWCSWIPNAPPTMR--APPPTAKGVVTI-----EQIVDTLPDRGRSCWHLGAVWALSQFQENEL 616
Query_10002 704 IWTASALHAAVNFGQYPYGGYIVNRPTLARRFIPEEGTKEYDEMVKDPQKAYLRTITPKFETLIDISVIEILSRHASDEV 783

Query_10001 617 FLGMYPEEH-FIEKPVKEAMARFRKNLEAIVSVIAERNKKK---------QLPYYYLSPDR--------IPNSVAI 674
Query_10002 784 YLGQRDNPNWTTDSKALEAFKKFGNKLAEIEGKITQRNNDPSLKSRHGPVQLPYTLLHRSSEEGMSFKGIPNSISI 859

ii. Proteases in Human rhinovirus sp. and Shigella sonnei

ClustalW (EBI pairwise alignment; the output below was produced by EMBOSS Needle)

########################################
# Program: needle
# Rundate: Wed 9 Dec 2020 12:12:47
# Commandline: needle
# -auto
# -stdout
# -asequence emboss_needle-I20201209-121245-0578-18428069-p2m.asequence
# -bsequence emboss_needle-I20201209-121245-0578-18428069-p2m.bsequence
# -datafile EBLOSUM62
# -gapopen 10.0
# -gapextend 0.5
# -endopen 10.0
# -endextend 0.5
# -aformat3 pair
# -sprotein1
# -sprotein2
# Align_format: pair
# Report_file: stdout
########################################

#=======================================
#
# Aligned_sequences: 2
# 1: AAA45759.1
# 2: WP_052962488.1
# Matrix: EBLOSUM62
# Gap_penalty: 10.0
# Extend_penalty: 0.5
#
# Length: 595
# Identity: 71/595 (11.9%)
# Similarity: 130/595 (21.8%)
# Gaps: 339/595 (57.0%)
# Score: 33.0
#
#
#=======================================

AAA45759.1 1 -------------------------------------------------- 0

WP_052962488. 1 MLSFLWDLASFIVALGVLITVHEFGHFWVARRCGVRVERFSIGFGKALWR 50

AAA45759.1 1 -------------------------------------------------- 0

WP_052962488. 51 RTDKLGTEYVMALIPLGGYVKMLDERAEPVVPELRHHAFNNKSVGQRAAI 100

AAA45759.1 1 -------------------------AFRP----CNVNTKIGNAKCCPFVC 21
..|| ...|:....|:..|...
WP_052962488. 101 IAAGPVANFIFAIFAYWLGFIIGVPGVRPVVGEIAANSIAAEAQIAPGTE 150

AAA45759.1 22 GKAV-----------------TFKDRSTCSTYNLSSSLHHILEEDKRRRQ 54
.||| ...|.||..|.....| |:||..
WP_052962488. 151 LKAVDGIETPDWDAVRLQLVDKIGDESTTITVAPFGS-------DQRRDV 193

AAA45759.1 55 VVDVMSAIF----QGPISL--DAPPPPAIADLLQSVRTPRVIKYCQIIMG 98
.:|:....| :.|:|. ..|..|.|..:|::|:
WP_052962488. 194 KLDLRHWAFEPDKEDPVSSLGIRPRGPQIEPVLENVQ------------- 230

AAA45759.1 99 HPAECQVERDLNIANSIIAIIANIISIAGIIFVIY-----KLFCSLQGPY 143


|.....:..|...:.|:.:....:: ..:.||:. ....:|:...
WP_052962488. 231 -PNSAASKAGLQAGDRIVKVDGQPLT-QWVTFVMLVRDNPGKSLALEIER 278

AAA45759.1 144 SGEPKPKTKVPERRVVAQGPEEEFGRSILKNNTCVITTGNGKFTG-LGIH 192


.|.|...|.:||.: .||||..| :||.
WP_052962488. 279 QGSPLSLTLIPESK-----------------------PGNGKAIGFVGIE 305

AAA45759.1 193 DRILIIPTHADPGREVQVNGVHTKVLDSYDLYNRDGVKLEITVIQLDR-- 240


.:::.:| |..:.|:..|....::::.| :....:::||..|.:
WP_052962488. 306 PKVIPLP---DEYKVVRQYGPFNAIVEATD---KTWQLMKLTVSMLGKLI 349

AAA45759.1 241 --NEKFRDIR------------------KYIPETEDDYPECNLALSANQD 270


:.|..::. .|:| .|||.:
WP_052962488. 350 TGDVKLNNLSGPISIAKGAGMTAELGVVYYLP---------FLALIS--- 387

AAA45759.1 271 EPTIIKVG-------DVVSYGNIL------LSGNQTARMLKYNYPTKSGY 307


:.:| .|:..|::| :.|...:..:: .:
WP_052962488. 388 ----VNLGIINLFPLPVLDGGHLLFLAIEKIKGGPVSERVQ-------DF 426

AAA45759.1 308 CGGVLYKIGQILGIHVGGNGR-DGFSAMLLRSYFTGQIKVNKHATECGLP 356


| |:||.||.:.:.|... :.||.:
WP_052962488. 427 C----YRIGSILLVLLMGLALFNDFSRL---------------------- 450

AAA45759.1 357 DIQTIHTPSKTKLQPSVFYDVFPGSKEPAVLTDNDPRLEVNFKEA 401

WP_052962488. 451 --------------------------------------------- 450


Blast

Accession Description
lcl|Query_10001 AAA45759.1 protease, partial [Human rhinovirus sp.]
lcl|Query_10002 WP_052962488.1 sigma E protease regulator RseP [Shigella sonnei]

Query_10001 1 ----------------------------AFRPCNVNTK---IGNAKCCPFVCGKAVTFKDRSTCSTYNLS---------- 39
Query_10002 1 MLSFLWDLASFIVALGVLITVHEFGHFWVARRCGVRVERFSIG--------FGKALWRRTDKLGTEYVMALIPLGGYVKM 72

Query_10001 40 ---------SSLHHILEEDKRRRQ---------VVDVMSAIFQGPISLDAPPPPAIADLLQSVRTPRVIKYCQIIMGHPA 101
Query_10002 73 LDERAEPVVPELRHHAFNNKSVGQRAAIIAAGPVANFIFAIFAYWLGFIIGVP-GVRPVVGEIAANSIAAEAQIAPGTEL 151

Query_10001 102 ECQVERDLNIANSIIAIIANIISIAGIIFVIYKLFCSLQGP-------YSGEPKPKTKVPERRVVAQGPEEEFGRSILKN 174
Query_10002 152 KAVDGIETPDWDAVRLQLVDKIGDESTTITVAPFGSDQRRDVKLDLRHWAFEPDKEDPVSSLGIRPRGPQIE---PVLEN 228

Query_10001 175 NTCVITTGNGKFTGLGIHDRILIIPTHADPGREVQVNGVHTKVLDSYDLYNRDGVKLEITVIQLDRNEKFRDIRKYIPET 254
Query_10002 229 ---VQPNSAASKAGLQAGDRI------------VKVDGQPLTQWVTFVMLVRDNPGKSLA-LEIERQGSPLS----LTLI 288

Query_10001 255 EDDYPECNLALSANQDEPTIIKVGDVVS-------YGNILLSGNQTARMLKYNYPTKSGYCGGVLYKIGQILGIHVGGNG 327
Query_10002 289 PESKPGNGKAIGFVGIEPKVIPLPDEYKVVRQYGPFNAIVEATDKTWQLMKLTVSMLGKLITGDV-KLNNLSGPISIAKG 367

Query_10001 328 RDGFSAMLLRSY---FTGQIKVNKHATEC-GLPDIQTIHT-----------PSKTKLQP------SVFYDVFPGSKEPAV 386
Query_10002 368 A-GMTAELGVVYYLPFLALISVNLGIINLFPLPVLDGGHLLFLAIEKIKGGPVSERVQDFCYRIGSILLVLLMG----LA 442

Query_10001 387 LTDNDPRLEVNFKEA 401
Query_10002 443 LFNDFSRL------- 450

d) Multiple alignment results

i. MetP protein in Salmonella enterica, MetP protein in Streptococcus pyogenes, MetP protein in Clostridioides difficile, MetP protein in Listeria monocytogenes, MetP protein in Streptococcus pneumoniae and MetP protein in Acinetobacter baumannii

ClustalW (EBI; the output below was produced by Clustal Omega 1.2.4)

CLUSTAL O(1.2.4) multiple sequence alignment

CCO64955.1 -MTKLQELFPNVDFQMMWVA-------TQETLYMTLVSLFAVFLLGIVLGLLLFLTNNKK 52
AKP81145.1 -MSQLIQTYLPNVYELGWSGDAGWGLAIWNTLYMTIVPFIVGGAIGLLLGLLLVLTGPDG 59
VDG79202.1 -MESLIQTYLPNVYKMGWAGQAGWGTAIYLTLYMTVLSFIIGGFLGLVAGLFLVLTAPGG 59
CAD5307872.1 -----MDDL-----------LPDLTLAFNETFQMLSISTVLAILGGLPLGFLIFVTDRHL 44
ALP04977.1 -MNSLIDFL-----------TTLFPNALLQTLYMVIVPTIVATILGFILAIILVVTKPDG 48
AVP34927.1 MQYQLID---------------LLITGTVDTLLMVGASAFIAFLIGLPIAVILVSTSEHG 45
: *: * . *: ..::. *

CCO64955.1 HAGARILYWITAILVNVFRSIPFIILIVLLLPMTKSLVGTVIGPKAALPALIISAAPFYG 112
AKP81145.1 VIENKTICWVIDKVTSIFRAIPFVILIAILASFTYLLLRTTLGATAALVPLTFATFPFYA 119
VDG79202.1 VLENKVVFWILDKITSIFRAVPFIILLAILSPLSHLIVKTSIGPNAALVPLSFAVFAFFA 119
CAD5307872.1 FWQNRFIYLVASVLVNIIRSVPFVILLVLLLPLTQLLLGNTIGPIAASVPLSVAAIAFYA 104
ALP04977.1 LKPNSTINSALGFIVNIFRSFPFMILIVAMIPITRLIVGTSIGETAAIVPITIGAAPFIA 108
AVP34927.1 IHPSQKINQALGWVINITRSVPFLILMVALIPLTRWIVGTSYGVWAAVVPLTIAAIPFFA 105
: : .: *:.**:**:. : :: :: . * ** : ... * .

CCO64955.1 RMVEIAFREVDKGVIEAAKSMGANMFTIIGKVLIPEALPAIISGITVTAISLVGFTAMAG 172
AKP81145.1 RQVQVVFSELDKGVIEAAQASGATFWDIVK-VYLSEGLPDLIRVSTVTLISLVGETAMAG 178
VDG79202.1 RQVQVVLAELDGGVIEAAQASGATFWDIVG-VYLSEGLPDLIRVTTVTLISLVGETAMAG 178
CAD5307872.1 RLVDSALREVDKGIIEAALAFGASPMRIICTVLLPEASAGLLRGLTITLVSLIGYSAMAG 164
ALP04977.1 RIIESSLNEVDKGLIEAAKSFGATKRQIVFKVMIKEAMPSIVSGITLSIISILGYTAMAG 168
AVP34927.1 RIAEVSLREVDQGLIEAAQAMGCNRKQIIWHVLLPEALPGIVAGFTVTIVTMINSSAIAG 165
* : : *:* *:**** : *.. *: * : *. :: *:: ::::. :*:**

CCO64955.1 VIGAGGLGNTAYLEGFQRGQPDVTVLATIIILIIVFIFQFIGDFLTKRTDKR--- 224
AKP81145.1 AIGAGGLGNVAISYGYNRFNNDVTWVATIIILLIIFAIQFIGDSLTRRFSHK--- 230
VDG79202.1 AVGAGGIGNVAIAYGFNRYNHDVTILATIVIILIIFAIQFLGDFLTKKLSHK--- 230
CAD5307872.1 IVGGGGVGDLAIRYGYYRYETEVMVVTVVALIVLVQVVQMLGDWLAKRADKRDRH 219
ALP04977.1 AVGAGGLGNIALIYGYQRFDTAVMVYTVIALIILVQIIQGVGNLAYKKLK----- 218
AVP34927.1 AIGAGGLGDIAYRYGYQRFDMQIMLAVILVLIVLVMLVQATGDALAQQLDKRKV- 219
:*.**:*: * *: * : : . : ::::: .* *: :: .
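In the consensus line under each block, "*" marks a fully conserved column, ":" a column whose residues all belong to one "strong" group of similar amino acids, and "." a column whose residues all fall in one "weak" group. The Python sketch below reproduces that rule for illustration. The residue groups are assumed to match the strong and weak groups described in the ClustalW/Clustal X documentation and should be checked there before relying on the output; treating any column that contains a gap as unconserved is also an assumption, and the names `column_symbol` and `consensus_line` are hypothetical.

```python
# Residue groups assumed to match ClustalW's "strong" and "weak" conservation groups.
STRONG_GROUPS = ["STA", "NEQK", "NHQK", "NDEQ", "QHRK", "MILV", "MILF", "HY", "FYW"]
WEAK_GROUPS = ["CSA", "ATV", "SAG", "STNK", "STPA", "SGND", "SNDEQK",
               "NDEQHK", "NEQHRK", "FVLIM", "HFY"]


def column_symbol(column: str) -> str:
    """Consensus symbol for one alignment column (one residue per sequence)."""
    residues = set(column.upper())
    if "-" in residues:
        return " "  # assumption: any gap makes the column unconserved
    if len(residues) == 1:
        return "*"  # fully conserved
    if any(residues <= set(group) for group in STRONG_GROUPS):
        return ":"  # all residues within one strong group
    if any(residues <= set(group) for group in WEAK_GROUPS):
        return "."  # all residues within one weak group
    return " "


def consensus_line(rows: list[str]) -> str:
    """Consensus line for a set of equal-length aligned rows."""
    if len({len(row) for row in rows}) != 1:
        raise ValueError("aligned rows must have the same length")
    return "".join(column_symbol("".join(column)) for column in zip(*rows))
```

Applied to the five columns of the second block that read RSIPF, RAIPF, RAVPF, RSVPF, RSFPF and RSVPF (one row per sequence), the sketch gives "*:.**", which agrees with the corresponding part of the consensus line. Counting the "*" characters in all four consensus lines of this alignment gives 38, the figure reported in part e).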
Blast

Accession Description
lcl|Query_10001 CAD5307872.1 Methionine import system permease protein MetP [Salmonella enterica subsp. enterica serovar Typhimurium]
lcl|Query_10002 AKP81145.1 Methionine import system permease protein MetP [Streptococcus pyogenes]
lcl|Query_10003 ALP04977.1 Methionine import system permease protein MetP [Clostridioides difficile]
lcl|Query_10004 CCO64955.1 Methionine import system permease protein MetP [Listeria monocytogenes serotype 4b str. LL195]
lcl|Query_10005 VDG79202.1 Methionine import system permease protein MetP [Streptococcus pneumoniae]
lcl|Query_10006 AVP34927.1 Methionine import system permease protein MetP [Acinetobacter baumannii]

Query_10001 1 ---MDDLLPDLTLAFN--------ETFQMLSISTVLAILGGLPLGFLIFVTDRHLFWQNRFIYLVASVLVNIIRSVP 66
Query_10002 1 MSQLIQTYLPNVYELGWSGDagwGLAIWNTLYMTIVPFIVGGAIGLLL[4]VLTGPDGVIENKTICWVIDKVTSIFRAIP 81
Query_10003 1 ---MN-SLIDFLTTLFPNAL---LQTLYMVIVPTIVATILGFILAIILVVTKPDGLKPNSTINSALGFIVNIFRSFP 70
Query_10004 1 MTKLQELFPNVDFQMMWVAT---QETLYMTLVSLFAVFLLGIVLGLLLFLTNNKKHAGARILYWITAILVNVFRSIP 74
Query_10005 1 MESLIQTYLPNVYKMGWAGQagwGTAIYLTLYMTVLSFIIGGFLGLVA[4]VLTAPGGVLENKVVFWILDKITSIFRAVP 81
Query_10006 1 ---MQYQLIDLLIT----GT---VDTLLMVGASAFIAFLIGLPIAVILVSTSEHGIHPSQKINQALGWVINITRSVP 67

Query_10001 67 FVILLVLLLPLTQLLLGNTIGPIAAS---VPLSVAAIAFYARLVDSALREVDKGIIEAALAFGASPMRIICTVLLPEASA 143
Query_10002 82 FVILIAILASFTYLLLRTTLGATAAL---VPLTFATFPFYARQVQVVFSELDKGVIEAAQASGATFWDIVKVYL-SEGLP 157
Query_10003 71 FMILIVAMIPITRLIVGTSIGETAAIvPITIGAAPFIARIIES---SLNEVDKGLIEAAKSFGATKRQIVFKVMIKEAMP 147
Query_10004 75 FIILIVLLLPMTKSLVGTVIGPKAAL-PALIISAAPFYGRMVE--IAFREVDKGVIEAAKSMGANMFTIIGKVLIPEALP 151
Query_10005 82 FIILLAILSPLSHLIVKTSIGPNAAL---VPLSFAVFAFFARQVQVVLAELDGGVIEAAQASGATFWDIVGVYL-SEGLP 157
Query_10006 68 FLILMVALIPLTRWIVGTSYGVWAAVvPLTIAAIPFFARIAEV---SLREVDQGLIEAAQAMGCNRKQIIWHVLLPEALP 144

Query_10001 144 GLLRGLTITLVSLIGYSAMAGIVGGGGVGDLAIRYGYYRYETEVMVVTVVALIVLVQVVQMLGDWLAKRADKRdrh 219
Query_10002 158 DLIRVSTVTLISLVGETAMAGAIGAGGLGNVAISYGYNRFNNDVTWVATIIILLIIFAIQFIGDSLTRRFSHK--- 230
Query_10003 148 SIVSGITLSIISILGYTAMAGAVGAGGLGNIALIYGYQRFDTAVMVYTVIALIILVQIIQGVGNLAYKKLK----- 218
Query_10004 152 AIISGITVTAISLVGFTAMAGVIGAGGLGNTAYLEGFQRGQPDVTVLATIIILIIVFIFQFIGDFLTKRTDKR--- 224
Query_10005 158 DLIRVTTVTLISLVGETAMAGAVGAGGIGNVAIAYGFNRYNHDVTILATIVIILIIFAIQFLGDFLTKKLSHK--- 230
Query_10006 145 GIVAGFTVTIVTMINSSAIAGAIGAGGLGDIAYRYGYQRFDMQIMLAVILVLIVLVMLVQATGDALAQQLDKRkv- 219

e) The total number of best-matched (identical) residues for each alignment

i. Lipoxygenase in Homo sapiens and Glycine max

200 (Identity: 200/896, 22.3%)

ii. Proteases in Human rhinovirus sp. and Shigella sonnei

71 (Identity: 71/595, 11.9%)

iii. MetP protein in Salmonella enterica, MetP protein in Streptococcus pyogenes, MetP protein in Clostridioides difficile, MetP protein in Listeria monocytogenes, MetP protein in Streptococcus pneumoniae and MetP protein in Acinetobacter baumannii

38 (fully conserved positions, marked "*" in the Clustal consensus lines)

Conclusion

What I learned from this lab session is that pairwise and multiple sequence alignments can be performed using CLUSTALW and BLAST. These tools work with protein, DNA, and RNA sequences, and BLAST can search large sequence databases, so they can be used to distinguish one protein from another. The main difference between pairwise sequence alignment and multiple sequence alignment is that pairwise alignment aligns only two sequences, while multiple sequence alignment aligns more than two. Aligning proteins gives information such as their similarities and differences. We can also predict the function of a new protein by aligning it with a known protein: the more similar the two sequences are, the more likely it is that the new protein has a similar function. Lastly, the CLUSTAL output shows how well each aligned position is conserved: fully conserved ("*"), strongly similar (":"), weakly similar ("."), or not conserved (blank). These tools make it easier for everyone, especially scientists and researchers, to align proteins and find out how similar they are to one another.
