Bioinformatics Practical File
Bommareddy Sudheshna
Enrollment number - A0504120116
Roll number - BTB/20/252
Section - B
Submitted To - Dr. Bhawna Rathi
(BIOF202)
Exercise – 1
PubMed
Aim:
To study the Omicron variant of SARS-CoV-2 using PubMed.
Output:
Conclusion:
We can conclude that PubMed is a free resource that
supports the search and retrieval of biomedical and life sciences
literature, with the aim of improving health both globally and
personally. The PubMed database contains more than 33 million
citations and abstracts of biomedical literature. In the example
above, searching PubMed for the Omicron variant returned an abstract.
Exercise – 2
Aim:
To learn about PMC and to study the nucleotide
sequence of the Omicron variant using GenBank and GenPept.
Output:
Conclusion:
We can conclude that PubMed Central (PMC) is a
free digital repository that archives open access full-text scholarly
articles published in biomedical and life sciences journals. The
GenBank database is designed to provide and encourage access within
the scientific community to the most up-to-date and comprehensive
DNA sequence information. GenPept is a database of GenBank gene
products, namely the translations of all CDS (coding sequence)
features that carry a translation qualifier. In the example above,
the nucleotide sequence of the Omicron variant was retrieved.
Exercise – 3
Aim:
To learn about the Taxonomy, GEO, Structure and OMIM
databases using different examples and sequences.
Output:
(i) Taxonomy:
(ii) GEO:
(iii) Structure:
(iv) OMIM:
Conclusion:
We can conclude that the NCBI Taxonomy browser
presents the classification used for the genetic databases at the NCBI
as an indented list. Every taxon in the list can be clicked and used
to examine the taxonomic structure and to retrieve sequence data
according to taxonomic criteria. GEO (Gene Expression Omnibus) is a
public functional genomics data repository supporting MIAME-compliant
data submissions. Array- and sequence-based data are accepted. Tools
are provided to help users query and download experiments and curated
gene expression profiles. NCBI Structure is a database of
experimentally resolved structures of proteins and nucleic acids
derived from the Protein Data Bank (PDB), with value-added features
such as explicit chemical graphs and computationally identified
similar 3D structures. OMIM (Online Mendelian Inheritance in Man) is
a comprehensive, authoritative compendium of human genes and genetic
phenotypes that is freely available and updated daily. In the
examples above, each database was queried with a different example
and returned the corresponding output.
Exercise – 4
Aim:
To learn about the Gene and PDB databases using a nucleotide sequence.
Output:
(i) Gene:
(ii) PDB:
Conclusion:
We can conclude that NCBI's Gene resources
include collections of curated nucleotide sequences used as
references, sequence clusters to predict and study homologs, and
various databases and tools for the study of gene expression. The
Protein Data Bank (PDB) is a key database in structural biology.
Structures in PDB have wide applications. They can be used for
various studies, including the identification of new protein
structures via in silico approaches and protein-nucleic acid
interaction studies. In the example above, outputs for the nucleotide
sequence were obtained from Gene and PDB.
Exercise – 5
Aim:
To study the COVID-19 virus (SARS-CoV-2) using UniProt and
to plot dot plots for small and large sequences.
Output:
Conclusion:
We can conclude that the Universal Protein
Resource (UniProt) is a comprehensive resource for protein sequence
and annotation data. In bioinformatics, a dot plot is a graphical
method for comparing two biological sequences: one sequence is placed
on each axis and a dot is drawn wherever the residues match, so
regions of close similarity appear as diagonal lines. It is a type of
recurrence plot. In the example above, outputs were obtained for the
COVID-19 virus in UniProt and for the sequences in the dot plot tool.
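As a rough illustration of the dot plot idea described above, the
following Python sketch marks every position where two sequences
share an identical word. It is not the tool used in this exercise;
the function names dot_plot and render and the window parameter are
assumptions made only for this example.

```python
def dot_plot(seq_a, seq_b, window=1):
    """Return a matrix of booleans.

    Cell [i][j] is True when the word of length `window` starting at
    seq_a[i] is identical to the word starting at seq_b[j].
    """
    rows = len(seq_a) - window + 1
    cols = len(seq_b) - window + 1
    return [
        [seq_a[i:i + window] == seq_b[j:j + window] for j in range(cols)]
        for i in range(rows)
    ]


def render(matrix):
    """Draw the matrix as text: '*' for a match, '.' for no match."""
    return "\n".join(
        "".join("*" if cell else "." for cell in row) for row in matrix
    )
```

Calling render(dot_plot(a, b)) on two similar sequences shows the
shared region as a diagonal run of '*' characters. A larger window
removes isolated chance matches, which matters for long sequences.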
Exercise – 6
Aim:
To perform global alignment of protein and DNA
sequences using the needle program.
Output:
(i) Protein:
(ii) DNA:
Conclusion:
We can conclude that global alignment aligns two
sequences from beginning to end, aligning each letter in each
sequence only once. An alignment is produced regardless of whether
there is any similarity between the sequences. In the example above,
global alignments were obtained for the protein and DNA sequences.
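The end-to-end behaviour described above can be sketched with a small
Needleman-Wunsch implementation. This is only a teaching sketch: the
scores (match=1, mismatch=-1, gap=-2) and the single linear gap
penalty are assumptions chosen for simplicity, and they are not the
substitution matrix or gap settings of the needle program.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Globally align a and b; return (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)
    # score[i][j] is the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,      # gap in b
                score[i][j - 1] + gap,      # gap in a
            )

    # Trace back from the bottom-right corner to the top-left corner,
    # so every letter of both sequences appears exactly once.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i -= 1
            j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))
```

Because the traceback always runs from the last cell to the first,
an alignment is returned even for two unrelated sequences, which is
the property noted in the conclusion.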
Exercise – 7
Aim:
To perform local alignment of protein sequences using
the Water and LALIGN programs.
Output:
(i) Water:
(ii) LALIGN:
Conclusion:
We can conclude that EMBOSS Water uses the
Smith-Waterman algorithm (modified for speed enhancements) to
calculate the local alignment of two sequences. LALIGN is a local
alignment tool that compares two protein or DNA sequences for local
similarity and shows the local sequence alignments. In the example
above, local alignments were obtained for the protein sequences.
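The local alignment idea can be sketched with a basic Smith-Waterman
implementation. This is a plain textbook version, not the
speed-optimised one in EMBOSS Water; the scores (match=2,
mismatch=-1, gap=-2) and the linear gap penalty are assumptions made
for this example only.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Return (score, aligned_a, aligned_b) for the best local alignment."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    best, best_pos = 0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            # Scores never drop below zero, so an alignment can restart
            # anywhere instead of being forced to span both sequences.
            score[i][j] = max(
                0,
                score[i - 1][j - 1] + sub,
                score[i - 1][j] + gap,
                score[i][j - 1] + gap,
            )
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)

    # Trace back from the highest-scoring cell until a zero is reached.
    out_a, out_b = [], []
    i, j = best_pos
    while i > 0 and j > 0 and score[i][j] > 0:
        sub = match if a[i - 1] == b[j - 1] else mismatch
        if score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i -= 1
            j -= 1
        elif score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))
```

Unlike global alignment, only the best-matching region is reported,
so dissimilar flanking residues are left out of the result.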
Exercise – 8
Aim:
To run the BLASTN program to search nucleotide
databases using a nucleotide query, and the BLASTP program to
search protein databases using a protein query.
Output:
(i) BLASTN:
(ii) BLASTP:
Conclusion:
We can conclude that BLAST (Basic Local Alignment
Search Tool) is a computer algorithm that is available for use online
at the National Center for Biotechnology Information (NCBI) website,
as well as on many other sites. BLASTN can rapidly align and compare
a query DNA sequence with a database of sequences, which makes it a
critical tool in ongoing genomic research. Standard protein-protein
BLAST (BLASTP) is used both for identifying a query amino acid
sequence and for finding similar sequences in protein databases. Like
other BLAST programs, BLASTP is designed to find local regions of
similarity. In the example above, the nucleotide and protein database
searches returned hits shown as red lines in the graphic summary; the
key used by NCBI BLAST marks alignment scores of 200 or more in red.
Exercise – 9
Aim:
To run the BLASTX program to search protein
databases using a translated nucleotide query, the TBLASTN program
to search translated nucleotide databases using a protein query, and
the TBLASTX program to search translated nucleotide databases
using a translated nucleotide query.
Output:
Conclusion:
We can conclude that BLASTX is a powerful
gene-finding or gene-predicting tool. It is recommended for
identifying protein-coding genes in genomic DNA or cDNA. It is
also used to detect whether a novel nucleotide sequence is a
protein-coding gene, and it can identify proteins encoded by
transcripts or transcript variants. TBLASTN compares a protein query
against a nucleotide sequence database by translating each database
sequence into hypothetical amino acid sequences in all six reading
frames and aligning those translations to the query. It is widely
used because associating proteins with chromosomes or mRNAs, for
example mapping a protein to genomic DNA, is useful in many
biological studies. TBLASTX translates both the nucleotide query and
the nucleotide database in all six reading frames, and its purpose is
to find very distant relationships between nucleotide sequences. In
the example above, outputs were obtained for all the required protein
databases and translated nucleotide databases, and red and green
lines can be observed in the outputs.
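The six-frame translation used by the translated BLAST programs can
be sketched as follows. This illustrates only the translation step,
not the database search itself. The function names are assumptions
for this example, and the standard genetic code (NCBI translation
table 1) is assumed; codons containing other characters are shown
as X.

```python
from itertools import product

# Standard genetic code, listed in the usual TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons from position 0; '*' marks a stop."""
    codons = (dna[i:i + 3] for i in range(0, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def six_frame_translation(dna):
    """Return a dict mapping frame labels (+1..+3, -1..-3) to peptides."""
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames
```

Three frames come from the given strand and three from its reverse
complement, which is why a protein-coding region can be detected
whichever strand and frame it lies in.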
Exercise – 10
Virtual Lab
Aim:
To learn about the virtual lab.
Output:
Conclusion:
We can conclude that this virtual laboratory of
biotechnology and biomedical engineering helps undergraduate and
postgraduate students gain a deeper understanding of the analysis of
sequence data and its alignment. In the example above, the outputs of
the virtual lab were obtained by completing different activities.
Exercise – 11
Aim:
To perform multiple sequence alignment.
Output:
Conclusion:
We can conclude that Clustal Omega is a
multiple sequence alignment program that uses seeded guide trees and
HMM profile-profile techniques to generate alignments between three
or more sequences. For the alignment of only two sequences, a
pairwise sequence alignment tool should be used instead. In the
example above, the multiple alignment program returned the alignment
of the different input sequences.
Exercise – 12
Clustal
Aim:
To use Clustal.
Output:
Conclusion:
We can conclude that Clustal Omega, like the other
Clustal tools, is used for aligning multiple nucleotide or protein
sequences efficiently. It uses progressive alignment methods, which
align the most similar sequences first and work their way down to the
least similar sequences until a global alignment is created. In the
example above, the multiple sequence alignment output was obtained
through Clustal.
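The "most similar sequences first" ordering can be illustrated with a
small sketch that ranks sequence pairs by similarity. This covers
only the ordering step and is not how Clustal builds its guide tree;
the use of Jaccard similarity on 3-letter words, and the function
names, are assumptions chosen for this example.

```python
from itertools import combinations


def kmer_set(seq, k=3):
    """Return the set of all words of length k found in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def similarity(a, b, k=3):
    """Jaccard similarity of the two k-mer sets (0.0 to 1.0)."""
    ka, kb = kmer_set(a, k), kmer_set(b, k)
    union = ka | kb
    return len(ka & kb) / len(union) if union else 0.0


def pairs_by_similarity(seqs, k=3):
    """Rank all pairs in a {name: sequence} dict, most similar first.

    Returns a list of (similarity, name_1, name_2) tuples.
    """
    scored = [
        (similarity(seqs[x], seqs[y], k), x, y)
        for x, y in combinations(sorted(seqs), 2)
    ]
    return sorted(scored, key=lambda item: item[0], reverse=True)
```

A progressive aligner would align the top-ranked pair first and then
add the remaining sequences in order of decreasing similarity.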
Exercise – 13
Aim:
To predict protein secondary structure using GOR4 and SOPMA.
Output:
Conclusion:
We can conclude that the GOR
(Garnier-Osguthorpe-Robson) method is an information theory-based
method for the prediction of secondary structures in proteins. The
Self-Optimized Prediction Method with Alignment (SOPMA) is a tool
that predicts the secondary structure of a protein from the query,
which is the primary sequence of the protein. In the example above,
the output gave the detailed secondary structure and related
information for the protein.
Exercise – 14
SWISS-MODEL
Aim:
To generate a Ramachandran plot using SWISS-MODEL.
Output:
Conclusion:
We can conclude that SWISS-MODEL provides
several levels of user interaction through its World Wide Web
interface: in the 'first approach mode', only the amino acid sequence
of a protein is submitted to build a 3D model. Template selection,
alignment and model building are performed fully automatically by the
server. In the example above, SWISS-MODEL returned the 3D model
together with information about the structure built from the amino
acid sequence.