Lab Report 3 Bioinformatics
LAB REPORT 3
BIOINFORMATICS
(PBI2020IP)
NAME: RABIATUL ADAWIYAH BINTI HASBULLAH
STUDENT ID: 012020091691
PROGRAMME: BACHELOR OF PHARMACEUTICAL TECHNOLOGY (BPHT)
LECTURER: AP DR SANTOSH FATTEPUR AND DR ALICIA NG
DATE OF SUBMISSION: 9th DECEMBER 2020
Practical 3: Pairwise Sequence Alignment and Multiple Sequence Alignment using
CLUSTALW & BLAST
Introduction:
To perform pairwise and multiple sequence alignment using both CLUSTALW & BLAST
tools.
Procedure:
A. CLUSTALW
1. Open the web browser and go to https://www.ebi.ac.uk/Tools/msa/clustalw2/.
2. Upload the sequences from a text file (for example, one saved in Notepad) or paste the sequences in FASTA format.
3. Upload two sequences for pairwise alignment, or more than two sequences for multiple sequence alignment. After uploading, choose the “Execute Multiple Alignment” option from the alignment icon.
4. The alignment results will appear within a few seconds after execution.
5. Report the result.
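The FASTA input used in step 2 is plain text: a header line starting with ">" followed by the sequence lines. A minimal Python sketch of preparing such input is shown below. The function name `to_fasta` and the 60-character line width are my own choices, and the two sequences are only the first 29 residues of AAA36183.1 and NP_001235189.1 as they appear in the alignment output of this report, not the full proteins.

```python
def to_fasta(records, width=60):
    """Return FASTA text for a list of (identifier, sequence) pairs."""
    lines = []
    for identifier, sequence in records:
        lines.append(">" + identifier)  # header line starts with '>'
        # wrap the sequence so that no line is longer than `width` characters
        for start in range(0, len(sequence), width):
            lines.append(sequence[start:start + width])
    return "\n".join(lines) + "\n"


# Fragments only (residues 1-29 as shown in the alignment output).
records = [
    ("AAA36183.1", "MPSYTVTVATGSQWFAGTDDYIYLSLVGS"),
    ("NP_001235189.1", "MTGGMFGRKGQKIKGTVVLMPKNVLDFNA"),
]
fasta_text = to_fasta(records)
print(fasta_text, end="")
```

The resulting text can be saved as a .fasta or .txt file and uploaded, or pasted straight into the sequence box.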
B. BLAST
1. Open the web browser and go to http://blast.ncbi.nlm.nih.gov/Blast.cgi
2. Click either the Nucleotide BLAST or the Protein BLAST icon, depending on whether the sequences are nucleotides or proteins.
3. Tick the “Align two or more sequences” check box for multiple sequence alignment, or leave it unticked for pairwise alignment.
4. Upload or paste a query sequence (in FASTA format) in the query box and execute BLAST for pairwise alignment. This identifies the most similar sequences in the database.
5. Upload or paste a query sequence (in FASTA format) in the query box, upload more than one sequence (in FASTA format) in the subject box, and then execute BLAST for multiple sequence alignment. This identifies the similarities and differences among the sequences.
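The web form in steps 3 to 5 has a command-line counterpart in the standalone BLAST+ programs: `blastp -query ... -subject ...` compares a query directly with the given subject sequence(s), while `blastp -query ... -db ...` searches a database. The sketch below only builds that argument list. The helper name and the file names query.fasta and subjects.fasta are hypothetical, `-outfmt 0` is the default pairwise text report, and the list would only be run (for example with `subprocess.run(cmd, check=True)`) on a machine where BLAST+ is installed.

```python
def blastp_command(query_fasta, subject_fasta=None, database=None, outfmt=0):
    """Return a BLAST+ blastp argument list (hypothetical helper).

    subject_fasta: compare the query directly with these sequences, like
        ticking "Align two or more sequences" on the web page.
    database: search a BLAST database, like the default web search.
    Exactly one of the two must be given.
    """
    if (subject_fasta is None) == (database is None):
        raise ValueError("give exactly one of subject_fasta or database")
    cmd = ["blastp", "-query", query_fasta, "-outfmt", str(outfmt)]
    if subject_fasta is not None:
        cmd += ["-subject", subject_fasta]
    else:
        cmd += ["-db", database]
    return cmd


# Hypothetical file names; the command is only printed, not executed.
cmd = blastp_command("query.fasta", subject_fasta="subjects.fasta")
print(" ".join(cmd))
```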
Clustalw (pairwise alignment; EBI output from the EMBOSS Needle program)
########################################
# Program: needle
# Rundate: Wed 9 Dec 2020 08:13:59
# Commandline: needle
# -auto
# -stdout
# -asequence emboss_needle-I20201209-081517-0222-75853467-p1m.asequence
# -bsequence emboss_needle-I20201209-081517-0222-75853467-p1m.bsequence
# -datafile EBLOSUM62
# -gapopen 10.0
# -gapextend 0.5
# -endopen 10.0
# -endextend 0.5
# -aformat3 pair
# -sprotein1
# -sprotein2
# Align_format: pair
# Report_file: stdout
########################################
#=======================================
#
# Aligned_sequences: 2
# 1: AAA36183.1
# 2: NP_001235189.1
# Matrix: EBLOSUM62
# Gap_penalty: 10.0
# Extend_penalty: 0.5
#
# Length: 896
# Identity: 200/896 (22.3%)
# Similarity: 329/896 (36.7%)
# Gaps: 259/896 (28.9%)
# Score: 524.0
#
#
#=======================================
AAA36183.1 1 -------------------------------------------------- 0
NP_001235189. 1 MTGGMFGRKGQKIKGTVVLMPKNVLDFNAITSVGKGSAKDTATDFLGKGL 50
AAA36183.1 1 ---------------------MPSYTVTVATGSQWFAGTDDYIYLSLVGS 29
:.|.|.|..:|. ..||:
NP_001235189. 51 DALGHAVDALTAFAGHSISLQLISATQTDGSGK------------GKVGN 88
AAA36183.1 30 AGCSEKHLLDKPFYNDFERGA-VDSYDVTVDEEL-----GEIQLVRIEKR 73
....||||...| ..|| .:::|:..:.:. |...:
NP_001235189. 89 EAYLEKHLPTLP-----TLGARQEAFDINFEWDASFGIPGAFYI------ 127
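Each percentage in the needle header is simply the count divided by the total alignment length (896 columns here, gap columns included). A small Python check is sketched below, assuming the header lines are copied exactly as printed in the output above; the function name `needle_stats` is my own.

```python
import re

# Header lines copied from the first EMBOSS Needle result in this report.
HEADER = """\
# Length: 896
# Identity: 200/896 (22.3%)
# Similarity: 329/896 (36.7%)
# Gaps: 259/896 (28.9%)
# Score: 524.0
"""


def needle_stats(header):
    """Recompute each 'Name: count/length' percentage found in the header."""
    stats = {}
    for name, count, length in re.findall(r"# (\w+):\s+(\d+)/(\d+)", header):
        stats[name] = round(100 * int(count) / int(length), 1)
    return stats


print(needle_stats(HEADER))  # {'Identity': 22.3, 'Similarity': 36.7, 'Gaps': 28.9}
```

The Length and Score lines are skipped because they contain no "count/length" pair.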
BLAST
Accession Description
lcl|Query_10001 AAA36183.1 lipoxygenase [Homo sapiens]
Query_10001 1 ------------------------------------------------------------------MPSYTVTVATGSQW 14
Query_10002 1 MTGGMFGRKGQKIKGTVVLMPKNVLDFNAITSVGKGSAKDTATDFLGKGLDALGHAVDALTAFAGHSISLQLISATQTDG 80
Query_10001 15 FAGTDDYIYLSLVGSAGCSEKHLLDKPFYNDFERGAVDSYDVTVDEELGEIQLVRIEKRKYWLNDDWYLKYITLKT-PHG 93
Query_10002 81 SGK-------GKVGNEAYLEKHLPTLPTLG--ARQEAFDINFEWDASFGIPGAFYIKNFM---TDEFFLVSVKLEDIPNH 148
Query_10001 94 DYIEFPCYRWITGDVEVVLRDGRAKLARDDQ-----IHILKQHRRKELETRQ-----------KQYRWMEWNP------- 150
Query_10002 149 GTINFVCNSWVYNFKSY--KKNRIFFVNDTYLPSATPGPLVKYRQEELEVLRGDGTGKRRDFDRIYDYDIYNDLGNPDGG 226
Query_10001 393 IFKLLVAHVRFTIAINTKAREQLICECGLFDKANATGGGGHVQMVQRAMKDLTYASLCFPEAIKARGMES---------K 463
Query_10002 545 IYKLLYPHYKDTININGLARQSLINAGGIIEQTFLPGKYS-IEMSSVVYKNWVFTDQALPADLVKRGLAVEDPSAPHGLR 623
Query_10001 464 EDIPYYFYRDDGLLVWEAIRTFTAEVVDIYYEGDQVVEEDPELQDFVNDVYVYGMRGRKSSGFPKSVKSREQLSEYLTVV 543
Query_10002 624 LVIEDYPYAVDGLEIWDAIKTWVHEYVSVYYPTNAAIQQDTELQAWWKEVVEKGHGDLKDKPWWPKLQTVEDLIQSCSII 703
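In the BLAST alignment rows, the number before the sequence is the position of the first residue shown and the number after it is the position of the last residue, so the residues in a row (gaps excluded) should exactly fill that range. A minimal Python sketch of this check is given below, using two rows copied from the output above. It assumes the simple "name start sequence end" layout and does not handle rows that contain bracketed insertion counts such as [4]; the function names are my own.

```python
def parse_blast_row(line):
    """Split a row of the form 'Name start aligned_sequence end'."""
    name, start, sequence, end = line.split()
    return name, int(start), sequence, int(end)


def coordinates_consistent(line):
    """True if the residues shown (gaps excluded) exactly fill start..end."""
    _, start, sequence, end = parse_blast_row(line)
    residues = len(sequence) - sequence.count("-")
    return residues == end - start + 1


rows = [
    "Query_10001 15 FAGTDDYIYLSLVGSAGCSEKHLLDKPFYNDFERGAVDSYDVTVDEELGEIQLVRIEKRKYWLNDDWYLKYITLKT-PHG 93",
    "Query_10002 81 SGK-------GKVGNEAYLEKHLPTLPTLG--ARQEAFDINFEWDASFGIPGAFYIKNFM---TDEFFLVSVKLEDIPNH 148",
]
for row in rows:
    print(row.split()[0], coordinates_consistent(row))
```

For example, the Query_10001 row shows 80 columns with one gap, i.e. 79 residues, which matches positions 15 to 93.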
Clustalw (pairwise alignment; EBI output from the EMBOSS Needle program)
########################################
# Program: needle
# Rundate: Wed 9 Dec 2020 12:12:47
# Commandline: needle
# -auto
# -stdout
# -asequence emboss_needle-I20201209-121245-0578-18428069-p2m.asequence
# -bsequence emboss_needle-I20201209-121245-0578-18428069-p2m.bsequence
# -datafile EBLOSUM62
# -gapopen 10.0
# -gapextend 0.5
# -endopen 10.0
# -endextend 0.5
# -aformat3 pair
# -sprotein1
# -sprotein2
# Align_format: pair
# Report_file: stdout
########################################
#=======================================
#
# Aligned_sequences: 2
# 1: AAA45759.1
# 2: WP_052962488.1
# Matrix: EBLOSUM62
# Gap_penalty: 10.0
# Extend_penalty: 0.5
#
# Length: 595
# Identity: 71/595 (11.9%)
# Similarity: 130/595 (21.8%)
# Gaps: 339/595 (57.0%)
# Score: 33.0
#
#
#=======================================
AAA45759.1 1 -------------------------------------------------- 0
WP_052962488. 1 MLSFLWDLASFIVALGVLITVHEFGHFWVARRCGVRVERFSIGFGKALWR 50
AAA45759.1 1 -------------------------------------------------- 0
AAA45759.1 1 -------------------------AFRP----CNVNTKIGNAKCCPFVC 21
..|| ...|:....|:..|...
WP_052962488. 101 IAAGPVANFIFAIFAYWLGFIIGVPGVRPVVGEIAANSIAAEAQIAPGTE 150
AAA45759.1 22 GKAV-----------------TFKDRSTCSTYNLSSSLHHILEEDKRRRQ 54
.||| ...|.||..|.....| |:||..
WP_052962488. 151 LKAVDGIETPDWDAVRLQLVDKIGDESTTITVAPFGS-------DQRRDV 193
AAA45759.1 55 VVDVMSAIF----QGPISL--DAPPPPAIADLLQSVRTPRVIKYCQIIMG 98
.:|:....| :.|:|. ..|..|.|..:|::|:
WP_052962488. 194 KLDLRHWAFEPDKEDPVSSLGIRPRGPQIEPVLENVQ------------- 230
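Both pairwise outputs report "Program: needle", which performs a global (Needleman-Wunsch) alignment. The Python sketch below shows the dynamic-programming idea with deliberately simplified scoring (+1 for a match, -1 for a mismatch, -1 per gap position) instead of the EBLOSUM62 matrix and the affine gap penalties (gap open 10.0, gap extend 0.5) used in the runs above, so its scores are not comparable with the Score lines in this report. It also computes identity the way the needle header does: identical columns divided by the alignment length.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment of a and b with a linear gap penalty.

    Simplified scoring for illustration only: needle itself uses EBLOSUM62
    with affine gap penalties (gap open 10.0, gap extend 0.5).
    """
    def pair_score(x, y):
        return match if x == y else mismatch

    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap  # a[:i] aligned against gaps only
    for j in range(1, m + 1):
        score[0][j] = j * gap  # b[:j] aligned against gaps only
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + pair_score(a[i - 1], b[j - 1]),  # (mis)match
                score[i - 1][j] + gap,  # gap in b
                score[i][j - 1] + gap,  # gap in a
            )

    # Trace back from the bottom-right cell to recover one optimal alignment.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair_score(a[i - 1], b[j - 1]):
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(top)), "".join(reversed(bottom))


def percent_identity(top, bottom):
    """Identical columns / alignment length, as in the needle 'Identity' line."""
    same = sum(1 for x, y in zip(top, bottom) if x == y and x != "-")
    return round(100 * same / len(top), 1)


score, top, bottom = needleman_wunsch("ACGT", "AGT")
print(score, top, bottom, percent_identity(top, bottom))  # 2 ACGT A-GT 75.0
```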
BLAST
Accession Description
lcl|Query_10001 AAA45759.1 protease, partial [Human rhinovirus sp.]
lcl|Query_10002 WP_052962488.1 sigma E protease regulator RseP [Shigella sonnei]
Query_10001 1 ----------------------------AFRPCNVNTK---IGNAKCCPFVCGKAVTFKDRSTCSTYNLS---------- 39
Query_10002 1 MLSFLWDLASFIVALGVLITVHEFGHFWVARRCGVRVERFSIG--------FGKALWRRTDKLGTEYVMALIPLGGYVKM 72
Query_10001 40 ---------SSLHHILEEDKRRRQ---------VVDVMSAIFQGPISLDAPPPPAIADLLQSVRTPRVIKYCQIIMGHPA 101
Query_10002 73 LDERAEPVVPELRHHAFNNKSVGQRAAIIAAGPVANFIFAIFAYWLGFIIGVP-GVRPVVGEIAANSIAAEAQIAPGTEL 151
Query_10001 175 NTCVITTGNGKFTGLGIHDRILIIPTHADPGREVQVNGVHTKVLDSYDLYNRDGVKLEITVIQLDRNEKFRDIRKYIPET 254
Query_10002 229 ---VQPNSAASKAGLQAGDRI------------VKVDGQPLTQWVTFVMLVRDNPGKSLA-LEIERQGSPLS----LTLI 288
Clustalw (multiple sequence alignment)
CCO64955.1 -MTKLQELFPNVDFQMMWVA-------TQETLYMTLVSLFAVFLLGIVLGLLLFLTNNKK 52
AKP81145.1 -MSQLIQTYLPNVYELGWSGDAGWGLAIWNTLYMTIVPFIVGGAIGLLLGLLLVLTGPDG 59
VDG79202.1 -MESLIQTYLPNVYKMGWAGQAGWGTAIYLTLYMTVLSFIIGGFLGLVAGLFLVLTAPGG 59
CAD5307872.1 -----MDDL-----------LPDLTLAFNETFQMLSISTVLAILGGLPLGFLIFVTDRHL 44
ALP04977.1 -MNSLIDFL-----------TTLFPNALLQTLYMVIVPTIVATILGFILAIILVVTKPDG 48
AVP34927.1 MQYQLID---------------LLITGTVDTLLMVGASAFIAFLIGLPIAVILVSTSEHG 45
: *: * . *: ..::. *
CCO64955.1 HAGARILYWITAILVNVFRSIPFIILIVLLLPMTKSLVGTVIGPKAALPALIISAAPFYG 112
AKP81145.1 VIENKTICWVIDKVTSIFRAIPFVILIAILASFTYLLLRTTLGATAALVPLTFATFPFYA 119
VDG79202.1 VLENKVVFWILDKITSIFRAVPFIILLAILSPLSHLIVKTSIGPNAALVPLSFAVFAFFA 119
CAD5307872.1 FWQNRFIYLVASVLVNIIRSVPFVILLVLLLPLTQLLLGNTIGPIAASVPLSVAAIAFYA 104
ALP04977.1 LKPNSTINSALGFIVNIFRSFPFMILIVAMIPITRLIVGTSIGETAAIVPITIGAAPFIA 108
AVP34927.1 IHPSQKINQALGWVINITRSVPFLILMVALIPLTRWIVGTSYGVWAAVVPLTIAAIPFFA 105
: : .: *:.**:**:. : :: :: . * ** : ... * .
CCO64955.1 RMVEIAFREVDKGVIEAAKSMGANMFTIIGKVLIPEALPAIISGITVTAISLVGFTAMAG 172
AKP81145.1 RQVQVVFSELDKGVIEAAQASGATFWDIVK-VYLSEGLPDLIRVSTVTLISLVGETAMAG 178
VDG79202.1 RQVQVVLAELDGGVIEAAQASGATFWDIVG-VYLSEGLPDLIRVTTVTLISLVGETAMAG 178
CAD5307872.1 RLVDSALREVDKGIIEAALAFGASPMRIICTVLLPEASAGLLRGLTITLVSLIGYSAMAG 164
ALP04977.1 RIIESSLNEVDKGLIEAAKSFGATKRQIVFKVMIKEAMPSIVSGITLSIISILGYTAMAG 168
AVP34927.1 RIAEVSLREVDQGLIEAAQAMGCNRKQIIWHVLLPEALPGIVAGFTVTIVTMINSSAIAG 165
* : : *:* *:**** : *.. *: * : *. :: *:: ::::. :*:**
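The symbol line under each ClustalW block summarises every column: "*" when all residues are identical, ":" when they all belong to one "strong" residue group, and "." when they all belong to one "weak" group. The sketch below reproduces that rule in Python. The group lists are the ones given in the Clustal documentation as I understand them (worth checking against the official help), and treating any column that contains a gap as unconserved is my own simplifying assumption.

```python
# Residue groups as listed in the Clustal documentation (assumed here).
STRONG_GROUPS = ["STA", "NEQK", "NHQK", "NDEQ", "QHRK", "MILV", "MILF", "HY", "FYW"]
WEAK_GROUPS = ["CSA", "ATV", "SAG", "STNK", "STPA", "SGND", "SNDEQK",
               "NDEQHK", "NEQHRK", "FVLIM", "HFY"]


def column_symbol(column):
    """Return the Clustal-style symbol for one alignment column (a string)."""
    residues = set(column.upper())
    if "-" in residues:
        return " "  # simplifying assumption: any gap means no symbol
    if len(residues) == 1:
        return "*"  # fully conserved
    if any(residues <= set(group) for group in STRONG_GROUPS):
        return ":"  # all residues fall in one strong group
    if any(residues <= set(group) for group in WEAK_GROUPS):
        return "."  # all residues fall in one weak group
    return " "


def conservation_line(rows):
    """Build the symbol line for equal-length aligned rows."""
    return "".join(column_symbol("".join(col)) for col in zip(*rows))


print(repr(conservation_line(["MLAW", "MIAK", "MVGR"])))  # '*:. '
```

In the toy example, the second column (L, I, V) falls inside the strong group MILV, while the third column (A, A, G) only fits the weak group SAG.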
BLAST
Accession Description
lcl|Query_10001 CAD5307872.1 Methionine import system permease protein MetP [Salmonella enterica subsp. enterica serovar Typhimurium]
lcl|Query_10002 AKP81145.1 Methionine import system permease protein MetP [Streptococcus pyogenes]
lcl|Query_10003 ALP04977.1 Methionine import system permease protein MetP [Clostridioides difficile]
lcl|Query_10004 CCO64955.1 Methionine import system permease protein MetP [Listeria monocytogenes serotype 4b str. LL195]
lcl|Query_10005 VDG79202.1 Methionine import system permease protein MetP [Streptococcus pneumoniae]
lcl|Query_10006 AVP34927.1 Methionine import system permease protein MetP [Acinetobacter baumannii]
Query_10001 1 ---MDDLLPDLTLAFN--------ETFQMLSISTVLAILGGLPLGFLIFVTDRHLFWQNRFIYLVASVLVNIIRSVP 66
Query_10002 1 MSQLIQTYLPNVYELGWSGDagwGLAIWNTLYMTIVPFIVGGAIGLLL[4]VLTGPDGVIENKTICWVIDKVTSIFRAIP 81
Query_10003 1 ---MN-SLIDFLTTLFPNAL---LQTLYMVIVPTIVATILGFILAIILVVTKPDGLKPNSTINSALGFIVNIFRSFP 70
Query_10004 1 MTKLQELFPNVDFQMMWVAT---QETLYMTLVSLFAVFLLGIVLGLLLFLTNNKKHAGARILYWITAILVNVFRSIP 74
Query_10005 1 MESLIQTYLPNVYKMGWAGQagwGTAIYLTLYMTVLSFIIGGFLGLVA[4]VLTAPGGVLENKVVFWILDKITSIFRAVP 81
Query_10006 1 ---MQYQLIDLLIT----GT---VDTLLMVGASAFIAFLIGLPIAVILVSTSEHGIHPSQKINQALGWVINITRSVP 67
Query_10001 67 FVILLVLLLPLTQLLLGNTIGPIAAS---VPLSVAAIAFYARLVDSALREVDKGIIEAALAFGASPMRIICTVLLPEASA 143
Query_10002 82 FVILIAILASFTYLLLRTTLGATAAL---VPLTFATFPFYARQVQVVFSELDKGVIEAAQASGATFWDIVKVYL-SEGLP 157
Query_10003 71 FMILIVAMIPITRLIVGTSIGETAAIvPITIGAAPFIARIIES---SLNEVDKGLIEAAKSFGATKRQIVFKVMIKEAMP 147
Query_10004 75 FIILIVLLLPMTKSLVGTVIGPKAAL-PALIISAAPFYGRMVE--IAFREVDKGVIEAAKSMGANMFTIIGKVLIPEALP 151
Query_10005 82 FIILLAILSPLSHLIVKTSIGPNAAL---VPLSFAVFAFFARQVQVVLAELDGGVIEAAQASGATFWDIVGVYL-SEGLP 157
Query_10006 68 FLILMVALIPLTRWIVGTSYGVWAAVvPLTIAAIPFFARIAEV---SLREVDQGLIEAAQAMGCNRKQIIWHVLLPEALP 144
Query_10001 144 GLLRGLTITLVSLIGYSAMAGIVGGGGVGDLAIRYGYYRYETEVMVVTVVALIVLVQVVQMLGDWLAKRADKRdrh 219
Query_10002 158 DLIRVSTVTLISLVGETAMAGAIGAGGLGNVAISYGYNRFNNDVTWVATIIILLIIFAIQFIGDSLTRRFSHK--- 230
Query_10003 148 SIVSGITLSIISILGYTAMAGAVGAGGLGNIALIYGYQRFDTAVMVYTVIALIILVQIIQGVGNLAYKKLK----- 218
Query_10004 152 AIISGITVTAISLVGFTAMAGVIGAGGLGNTAYLEGFQRGQPDVTVLATIIILIIVFIFQFIGDFLTKRTDKR--- 224
Query_10005 158 DLIRVTTVTLISLVGETAMAGAVGAGGIGNVAIAYGFNRYNHDVTILATIVIILIIFAIQFLGDFLTKKLSHK--- 230
Query_10006 145 GIVAGFTVTIVTMINSSAIAGAIGAGGLGDIAYRYGYQRFDMQIMLAVILVLIVLVMLVQATGDALAQQLDKRkv- 219
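To compare the six MetP proteins numerically, a percent-identity value can be computed for every pair of rows in a multiple alignment. The Python sketch below counts identical residues only over columns where neither sequence has a gap. This is one common convention and differs from the needle header, which divides by the full alignment length. The example rows are the first 20 columns of AKP81145.1 and VDG79202.1 from the ClustalW alignment, and the function names are my own.

```python
from itertools import combinations


def pair_identity(row_a, row_b):
    """Percent identity over columns where neither aligned row has a gap."""
    pairs = [(x, y) for x, y in zip(row_a.upper(), row_b.upper())
             if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    same = sum(1 for x, y in pairs if x == y)
    return round(100 * same / len(pairs), 1)


def identity_matrix(rows):
    """Return {(name_a, name_b): percent identity} for every pair of rows."""
    return {(a, b): pair_identity(rows[a], rows[b])
            for a, b in combinations(rows, 2)}


rows = {
    "AKP81145.1": "-MSQLIQTYLPNVYELGWSG",  # first 20 alignment columns only
    "VDG79202.1": "-MESLIQTYLPNVYKMGWAG",
}
print(identity_matrix(rows))  # {('AKP81145.1', 'VDG79202.1'): 73.7}
```

Here the leading gap column is skipped, leaving 19 compared columns, of which 14 are identical.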
200
71
iii. MetP protein in Salmonella enterica, MetP protein in Streptococcus pyogenes, MetP
protein in Clostridioides difficile, MetP protein in Listeria monocytogenes, MetP
protein in Streptococcus pneumoniae and MetP protein in Acinetobacter baumannii
38
Conclusion
What I learned from this lab session is that pairwise and multiple sequence alignments can be performed with both CLUSTALW and BLAST. BLAST can also search the large NCBI databases of protein and nucleotide (DNA and RNA) sequences to find sequences similar to a query, whereas CLUSTALW aligns the sequences that we supply. With these tools, we can compare one protein with another and see how they differ. The main difference between the two types of alignment is that pairwise sequence alignment compares only two sequences, while multiple sequence alignment compares more than two sequences at once. Aligning proteins shows which residues are identical, which are similar and where gaps occur, and gives overall values such as percentage identity and similarity. An alignment can also suggest the function of a new protein: if the new protein is highly similar to a protein whose function is known, the two are likely to have a similar function. Lastly, the CLUSTALW output shows how conserved each column of the alignment is: an asterisk (*) marks identical residues, a colon (:) marks strongly similar residues, a full stop (.) marks weakly similar residues, and a blank means no conservation. These tools make it easy for everyone, especially scientists and researchers, to align proteins and find out how similar they are to one another.