Bioinformatic Database Record

The document describes performing a pairwise sequence alignment using EMBOSS tools. It provides the organism names and protein names to be aligned: Homo sapiens and Mus musculus, with the proteins keratin and myosin. EMBOSS-water and EMBOSS-needle are mentioned as tools that use algorithms like Smith-Waterman to calculate local and global alignments between sequences respectively.

Uploaded by

faizal071810

EX.NO: 1 DATE:

NATIONAL CENTRE FOR BIOTECHNOLOGY INFORMATION

AIM:
To perform a PubMed bibliographic search on NCBI using different search options (author name,
keyword in title, abstract, title and/or abstract, related articles) and different display options.

DESCRIPTION:

The National Center for Biotechnology Information (NCBI) is part of the United States
National Library of Medicine (NLM), a branch of the National Institutes of Health (NIH).
The NCBI is located in Bethesda, Maryland and was founded in 1988 through legislation
sponsored by Senator Claude Pepper. The NCBI houses a series of databases relevant to
biotechnology and biomedicine and is an important resource for bioinformatics tools and
services. Major databases include GenBank for DNA sequences and PubMed, a
bibliographic database for the biomedical literature. Other databases include the NCBI
Epigenomics database. All these databases are available online through the Entrez search
engine. PubMed Health provides information for consumers and clinicians on prevention and
treatment of diseases and conditions.

PROCEDURE:

1. Enter the URL: www.ncbi.nlm.nih.gov

2. Select PubMed from the All Databases menu

3. Enter the search term (here, the protein name) in the Search Box

4. Note the information retrieved from PubMed

INPUT: Protein: casein
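
The same field-restricted search can also be expressed as a URL for the NCBI E-utilities ESearch service. The sketch below only builds the URL and does not contact the server; the helper name `pubmed_search_url` and the `retmax` default are illustrative choices, not part of the original exercise.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities ESearch service.
EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def pubmed_search_url(term, field=None, retmax=20):
    """Build an ESearch URL for PubMed.

    `field` is an optional PubMed search tag such as "Title", "Author"
    or "Title/Abstract"; it is appended to the term in square brackets.
    """
    query = f"{term}[{field}]" if field else term
    params = {"db": "pubmed", "term": query, "retmax": retmax, "retmode": "json"}
    return f"{EUTILS_ESEARCH}?{urlencode(params)}"


if __name__ == "__main__":
    # Keyword in title, then keyword in title and/or abstract.
    print(pubmed_search_url("casein", field="Title"))
    print(pubmed_search_url("casein", field="Title/Abstract"))
```

Opening the printed URL in a browser should return the matching PubMed identifiers, which can then be viewed in PubMed with the usual display options.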


HOMEPAGE:
WORKSPACE:

OUTPUT:
RESULT:
Searched NCBI PubMed and retrieved information about the protein casein using the different
search options (author name, keyword in title, abstract and related articles) and the different
display options.
EX.NO:02 DATE:

EMBL(European Molecular Biology Laboratory)


AIM:
To search on EMBL for nucleic acid sequence.

DESCRIPTION:

The European Molecular Biology Laboratory (EMBL) is a molecular biology research
institution supported by 27 member states, two prospect states, and one associate member
state. EMBL was created in 1974 and is an intergovernmental organization funded by public
research money from its member states.
EMBL is one of the highest-ranking scientific research organizations in the world. Its
headquarters is located in Heidelberg (Germany), with additional sites in Grenoble (France),
Hamburg (Germany), Barcelona (Spain), Hinxton (UK) and Rome (Italy).

PROCEDURE:
1. Enter the URL: www.ebi.ac.uk.
2. Enter the organism name in the search box.
3. Retrieve the sequence of the organism from EMBL.

INPUT:
Organism Name: Homo sapiens.
HOME PAGE:
WORK SPACE:

OUTPUT:
>ENA|EAL24519|EAL24519.1 Homo sapiens (human) LOC401434
ATGGCCCTTCGGGGAATGCCCTGGGCGCCCCGAATACTCAGTGGGGCCTGTTACTTGGCT
GTTTCTCAACATGGAGGAGCGTGGCCCAGACGGCTTTCCCACGCAGGGGAGCATGGTCCA
GATGGCTTTCCCACGTGGGGCAGCCTGGCTCAGACGGCTTTCCCACGCAGGGGAGCATGG
TCCAGACGGCTTTCCCACACCGGGGAGCGTGGCCCAGACGGCTTTCCCACGCTGGGGAGC
CTGGTCCAGACGGCTTTCCCACACCGGGGAGCGTGGCCCAGACGGCTTTCCCACACGGGC
AGCCTGGCTCAGACGGCTTTCCCAGCCTCGCAGAGCTCCCTCTTCTGTTTTCCTGCACTG
CTAAAGCTATGGTCACTCCTTCTGCCAATGCTTGGCTTCACTTCCCTCTACTTCTCCAAG
CTGTGTCCTTTTTCTTTATTCTTATTCACTTACTACTGTTTCTCTATTATCCCTGTCTTG
CTCAATTTTGATTCCACTCCCTGGCAGTTTCATCAGTTCAAAGGAACTAGAAGTCTTCAT
CCCCTAAGCCCTCCCTCCCCCAGGGACCCCTGCCGCTGCCTAGTGCTGGAGAGGCAGACG
CCCCCGCAGTGTTTGCTGCACTGA
RESULT:
Searched EMBL and retrieved the nucleic acid sequence of the organism Homo sapiens
(ID No: MN006677).
EX NO:3 DATE:

READSEQ AND TRANSEQ


AIM:
To study sequence formats using ReadSeq (EMBOSS Seqret) and Transeq.

DESCRIPTION:

READSEQ(EMBOSS SEQRET):

EMBOSS Seqret reads and writes (returns) sequences. It is useful for a variety of tasks such as
extracting sequences from databases, displaying sequences, reformatting sequences, producing the reverse
complement of a sequence, extracting fragments of a sequence, sequence case conversion or any
combination of the above functions.
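
Two of the operations listed above, reading a FASTA record and producing the reverse complement, can be sketched in a few lines of Python. This is an illustration of the idea rather than Seqret's own implementation; the function names are hypothetical and only the unambiguous bases A, C, G and T are handled.

```python
# Translation table for complementing unambiguous DNA bases (case preserved).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def read_fasta(text):
    """Parse FASTA text into a list of (header, sequence) pairs."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


if __name__ == "__main__":
    fasta = ">example\nATGGCC\nCTTCGG\n"
    for name, seq in read_fasta(fasta):
        print(name, seq, reverse_complement(seq), seq.lower())
```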

TRANSEQ:

EMBOSS Transeq translates nucleic acid sequences to their corresponding peptide
sequences. It can translate to the three forward and three reverse frames, and output multiple
frame translations at once.
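
A minimal sketch of six-frame translation with the standard genetic code is given below. It assumes plain A/C/G/T input, writes stop codons as `*` and unknown codons as `X`, and numbers the reverse frames by reading the reverse complement from its own start; Transeq's exact numbering of the reverse frames may differ from this simplification.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(seq, frame=1):
    """Translate `seq` starting at forward frame 1, 2 or 3."""
    seq = seq.upper().replace("U", "T")
    start = frame - 1
    codons = (seq[i:i + 3] for i in range(start, len(seq) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def six_frames(seq):
    """Return translations for frames 1, 2, 3 and -1, -2, -3."""
    forward = seq.upper().replace("U", "T")
    reverse = forward.translate(COMPLEMENT)[::-1]
    frames = {f: translate(forward, f) for f in (1, 2, 3)}
    frames.update({-f: translate(reverse, f) for f in (1, 2, 3)})
    return frames


if __name__ == "__main__":
    for frame, peptide in six_frames("ATGGCCCTTCGGTGA").items():
        print(frame, peptide)
```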

PROCEDURE:

1. Go to the URLs:
   https://www.ebi.ac.uk/Tools/sfc/readseq/
   https://www.ebi.ac.uk/Tools/st/emboss_transeq/
2. Enter the input sequence in FASTA format.
3. Click Submit to view the result.
4. Note down and study the output sequence.
HOMEPAGE(WORKSPACE):
OUTPUT:
RESULT:

EMBOSS Seqret was used to read and display the (sequence name & sequence ID).

EMBOSS Transeq was used to translate the nucleic acid sequence (sequence name & sequence
ID) to the corresponding peptide sequence in the three forward and three reverse frames
of translation.
EX.NO:04 DATE:

PROTEIN INFORMATION RESOURCES

Aim:

To perform a similarity search of PIR database for the given protein sequence.

Description:

The Protein Information Resource (PIR) is an integrated public bioinformatics
resource to support genomic, proteomic and systems biology research and scientific studies.
PIR was established in 1984 by the National Biomedical Research Foundation (NBRF) as a
resource to assist researchers in the identification and interpretation of protein sequence
information. Prior to that, the NBRF compiled the first comprehensive collection of
macromolecular sequences in the Atlas of Protein Sequence and Structure, published from
1965 to 1978 under the editorship of Margaret O. Dayhoff. Dr. Dayhoff and her research group
pioneered the development of computer methods for the comparison of protein sequences,
for the detection of distantly related sequences and duplications within sequences, and for the
inference of evolutionary histories from alignments of protein sequences.

Procedure:

1. Enter the URL: https://pir.georgetown.edu/.

2. Enter the protein name in the search box.

3. View the similarity information for the given protein.

Input:

Protein Name: Myosin


Home Page:

Workspace:
OUTPUT:

RESULT:

Searched PIR and retrieved the protein information for myosin.


EX.NO:05 DATE:

UNIVERSAL PROTEIN RESOURCE


AIM:
To perform a similarity search of UniProt database for the given protein sequence.

DESCRIPTION:

The Universal Protein Resource (UniProt) is a comprehensive resource for protein sequence and
annotation data. The UniProt databases are the UniProt Knowledgebase (UniProtKB), the UniProt
Reference Clusters (UniRef), and the UniProt Archive (UniParc). The UniProt consortium and host
institutions EMBL-EBI, SIB and PIR are committed to the long-term preservation of the UniProt
databases.

PROCEDURE:
1. Enter the URL: https://www.uniprot.org

2. Enter the protein name in the search box

3. View the similarity information of the given protein

INPUT:
Organism: Cannabis sativa
Protein Name: Edestin
HOMEPAGE:
WORKSPACE:

OUTPUT:
RESULT:
Performed a similarity search of the UniProt database for the protein edestin from the
organism Cannabis sativa.

EX.NO.:6 DATE:

PROTPARAM TOOL

AIM:

To retrieve various physicochemical properties for a given protein sequence using the ProtParam
tool.

DESCRIPTION:

ProtParam computes various physical and chemical parameters for a given protein
stored in Swiss-Prot or TrEMBL, or for a user-entered protein sequence. The computed
parameters include the molecular weight, theoretical pI, amino acid composition, atomic
composition, extinction coefficient, estimated half-life, instability index, aliphatic index and
grand average of hydropathicity.
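
Three of these parameters are simple enough to sketch directly. The code below assumes the Kyte-Doolittle hydropathy scale for the grand average of hydropathicity (GRAVY) and the usual aliphatic index formula, X(Ala) + 2.9 X(Val) + 3.9 (X(Ile) + X(Leu)) with X in mole percent; the function names are illustrative, and the values ProtParam reports should be treated as the reference.

```python
from collections import Counter

# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def composition(seq):
    """Amino acid composition as mole percent per residue type."""
    seq = seq.upper()
    counts = Counter(seq)
    return {aa: 100.0 * n / len(seq) for aa, n in counts.items()}


def gravy(seq):
    """Grand average of hydropathicity: mean hydropathy over all residues."""
    seq = seq.upper()
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


def aliphatic_index(seq):
    """Relative volume occupied by aliphatic side chains (Ala, Val, Ile, Leu)."""
    pct = composition(seq)
    return (pct.get("A", 0.0) + 2.9 * pct.get("V", 0.0)
            + 3.9 * (pct.get("I", 0.0) + pct.get("L", 0.0)))


if __name__ == "__main__":
    peptide = "MALRVIKLAG"  # hypothetical test peptide, not an actin sequence
    print(composition(peptide))
    print(round(gravy(peptide), 3), round(aliphatic_index(peptide), 2))
```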

PROCEDURE:

1. Go to URL: http://web.expasy.org/protparam/

2. Retrieve the protein sequence of actin.

3. Paste the sequence into the ProtParam search box.

4. Note down the results.

INPUT:

Protein name:Actin
HOMEPAGE:
OUTPUT:
RESULT:

Retrieved various physicochemical parameters for the protein (name & ID) using the
ProtParam tool.
EX.NO:8 DATE:

PAIRWISE SEQUENCE ALIGNMENT

Aim:

To perform pairwise sequence alignment for the following set. Organism name 1: Homo sapiens;
Organism name 2: Mus musculus; Protein name 1: keratin; Protein name 2: myosin.

Description:

EMBOSS Water uses the Smith-Waterman algorithm to calculate the local alignment of a
sequence to one or more other sequences. The gap insertion penalty, gap extension penalty
and substitution matrix used to calculate the alignments are specified by the user. These dynamic
programming algorithms were first developed for protein sequence comparison by Smith and
Waterman, though similar methods were independently devised during the late 1960s and
early 1970s for use in the fields of speech processing and computer science.

EMBOSS Needle reads two input sequences and writes their optimal global sequence
alignment to file, using the Needleman-Wunsch algorithm. The web tool provides one form for
protein sequences and a separate form for aligning DNA or RNA sequences.
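
The difference between local and global alignment shows up clearly in the dynamic programming recurrences. The sketch below computes only the optimal scores, and it makes simplifying assumptions that the EMBOSS tools do not: a flat match/mismatch score instead of a substitution matrix, and a single linear gap penalty instead of separate gap opening and extension penalties. The parameter values are arbitrary.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score (cells are floored at zero)."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            cur[j] = max(0, prev[j - 1] + s, prev[j] + gap, cur[j - 1] + gap)
            best = max(best, cur[j])
        prev = cur
    return best


def needleman_wunsch_score(a, b, match=2, mismatch=-1, gap=-2):
    """Optimal global alignment score (both sequences aligned end to end)."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            cur[j] = max(prev[j - 1] + s, prev[j] + gap, cur[j - 1] + gap)
        prev = cur
    return prev[-1]


if __name__ == "__main__":
    x, y = "TTTACGTTT", "GGGACGGGG"  # hypothetical toy sequences
    print("local :", smith_waterman_score(x, y))
    print("global:", needleman_wunsch_score(x, y))
```

On the toy pair, the local score rewards only the shared ACG core, while the global score is pulled down by the mismatching flanks, which is why Water suits sequences that share a domain and Needle suits sequences that are similar along their full length.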

Procedure:
1. Enter the URL for EMBOSS Needle: https://www.ebi.ac.uk/Tools/psa/emboss_needle/

2. Paste the sequences of the two organisms into box one and box two respectively.

3. View the alignment of the sequences.
Input:
Organism name 1: Homo sapiens, Organism name 2: Mus musculus Protein name 1: keratin,
Protein name 2: myosin
Homepage:

Output:

Emboss Needle:
Result:
Performed pairwise sequence alignment for a set of analogous proteins.
EX.NO:9 DATE:

PRINTS DATABASE

AIM:
To perform motif searching in the derived databases PRINTS and BLOCKS.

DESCRIPTION:

PRINTS is a compendium of protein fingerprints. A fingerprint is a group of
conserved motifs used to characterize a protein family; its diagnostic power is refined by
iterative scanning of a SWISS-PROT/TrEMBL composite. Usually the motifs do not overlap,
but are separated along a sequence, though they may be contiguous in 3D-space. Fingerprints
can encode protein folds and functionalities more flexibly and powerfully than can single
motifs, full diagnostic potency deriving from the mutual context provided by motif neighbors.

PROCEDURE:

1. Go to https://www.uniprot.org/uniprot/P42526
2. Enter the protein sequence of actin.
3. Run the search and retrieve the results.
4. Note down the results.

INPUT:

Protein Name: Actin


HOMEPAGE

OUTPUT
RESULT:

Motif searching in the PRINTS database has been performed for actin.


EX.NO:10 DATE:
PROTEIN DATA BANK

AIM:

To explore the structure using PDB.

DESCRIPTION:

The Protein Data Bank (PDB) is a database for the three-dimensional structural data of large
biological molecules, such as proteins and nucleic acids. The data, typically obtained by X-ray
crystallography, NMR spectroscopy, or, increasingly, cryo-electron microscopy, and submitted
by biologists and biochemists from around the world, are freely accessible on the Internet via
the websites of its member organisations (PDBe, PDBj, and RCSB). The PDB is overseen by an
organization called the Worldwide Protein Data Bank (wwPDB). The PDB is a key resource in
areas of structural biology, such as structural genomics. Most major scientific journals, and
some funding agencies, now require scientists to submit their structure data to the PDB. Many
other databases use protein structures deposited in the PDB.
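
The three-dimensional data in a legacy PDB-format file are stored in fixed-width ATOM and HETATM records. The sketch below reads the atom name, residue, chain and coordinates from those columns; it is a minimal reader for illustration (the `Atom` type and `parse_atoms` name are hypothetical) and ignores fields such as occupancy, temperature factor and alternate locations.

```python
from typing import NamedTuple


class Atom(NamedTuple):
    serial: int
    name: str
    res_name: str
    chain: str
    res_seq: int
    x: float
    y: float
    z: float


def parse_atoms(pdb_text):
    """Extract atoms from the ATOM/HETATM records of PDB-format text."""
    atoms = []
    for line in pdb_text.splitlines():
        if line.startswith(("ATOM  ", "HETATM")):
            atoms.append(Atom(
                serial=int(line[6:11]),        # columns 7-11
                name=line[12:16].strip(),      # columns 13-16
                res_name=line[17:20].strip(),  # columns 18-20
                chain=line[21],                # column 22
                res_seq=int(line[22:26]),      # columns 23-26
                x=float(line[30:38]),          # columns 31-38
                y=float(line[38:46]),          # columns 39-46
                z=float(line[46:54]),          # columns 47-54
            ))
    return atoms


if __name__ == "__main__":
    # Hypothetical record assembled field by field so the columns line up.
    sample = ("ATOM  " + "    1" + " " + " N  " + " " + "MET" + " " + "A"
              + "   1" + "    " + "  27.340" + "  24.430" + "   2.614")
    print(parse_atoms(sample))
```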

PROCEDURE:

1. Enter the URL: http://www.rcsb.org/

2. Enter the protein name in the search box

3. Click a search result and explore the structure

4. Note down the results

INPUT:

PROTEIN NAME: ACTIN (PDB ID: 7ccc)


HOME PAGE:

WORK SPACE:
OUTPUT:

RESULT:

Retrieved and explored the structure and details of 7ccc (actin) using the PDB database.
EX.NO: 11 DATE:

SCOP AND CATH

AIM:

To list SCOP lineage and CATH architecture description for a set of proteins

DESCRIPTION:

SCOP - The Structural Classification of Proteins database is a largely manual classification of
protein structural domains based on similarities of their structures and amino acid sequences.
A motivation for this classification is to determine the evolutionary relationship between
proteins.

CATH- The CATH Protein Structure Classification database is a free, publicly available
online resource that provides information on the evolutionary relationships of protein
domains.

PROCEDURE 1:

1. Enter the URL for SCOP: http://supfam.mrc-lmb.cam.ac.uk

2. Click the Superfamily option and then click Sequence Search.

3. Paste the sequence into the search box.

4. View the result for the given sequence.

PROCEDURE 2:

1. Enter the URL for CATH: http://www.cathdb.info

2. Enter the protein name in the search box.

3. View the result for the given protein.


INPUT:
Protein Name: Actin

HOME PAGE:

SCOP

OUTPUT:
HOME PAGE:

CATH

OUTPUT:
RESULT:

Listed SCOP lineages and CATH architecture description for a set of proteins.
EX.NO: 12 DATE:

STRUCTURE VISUALIZATION USING RASMOL SOFTWARE

Aim:
To perform visualization of a protein structure using RasMol software.

Description:
RasMol is a widely used scientific tool for the visualisation of proteins, nucleic acids and
small molecules and for preparing publication-quality images. It was created by Roger Sayle in
1992. Simple operations are available through the menus, while more complex or more finely
controlled operations require the command-line interface.

Procedure:

1. Open the RasMol software. The main menu has ‘File’, ‘Display’, ‘Colours’, ‘Export’,
‘Options’, ‘Settings’.
2. To load a molecule, click File > Open… and select the file from your computer for
visualization.
3. By default the program displays the content as a ‘wireframe’ model.

4. The display can be shown in different colours by selecting the appropriate option
under ‘Colours’ in the menu bar.
5. Display the distance, angle and torsion measurements using the pick options.
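
The distance, angle and torsion values reported when atoms are picked follow from the atomic coordinates alone. The sketch below shows the underlying geometry for points given as (x, y, z) tuples; it is not RasMol code, and the sign convention of the torsion angle is an assumption that may differ from the one RasMol uses.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _norm(a):
    return math.sqrt(_dot(a, a))


def distance(p, q):
    """Distance between two atoms."""
    return _norm(_sub(p, q))


def angle(p, q, r):
    """Angle p-q-r in degrees, with q at the vertex."""
    u, v = _sub(p, q), _sub(r, q)
    cos_t = _dot(u, v) / (_norm(u) * _norm(v))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_t))))


def torsion(p0, p1, p2, p3):
    """Dihedral angle in degrees about the p1-p2 bond (0 for cis)."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)
    b2_len = _norm(b2)
    m1 = _cross(n1, tuple(x / b2_len for x in b2))
    return math.degrees(math.atan2(_dot(m1, n2), _dot(n1, n2)))


if __name__ == "__main__":
    print(distance((0, 0, 0), (3, 4, 0)))
    print(angle((1, 0, 0), (0, 0, 0), (0, 1, 0)))
    print(torsion((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
```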

Input:

The input protein structure is 1NL1.


Work space:

OUTPUT:

DISPLAY ( WIREFRAME ):
COLOUR (PINK):

PICK IDENT:
PICK DISTANCE:
PICK ANGLE:

RESULT:

The structure of the given protein 1DWD was analyzed using RasMol software with these
commands: display wireframe, pick ident, colour pink, pick coord.
Ex.No: 13 Date:

PYMOL

Aim:

To visualize the protein structure using PYMOL.

Description:

PyMOL is computer software, a molecular visualization system created by Warren
Lyford DeLano. It is user-sponsored, open-source software, released under the Python License. It
was commercialized initially by DeLano Scientific LLC, which was a private software company
dedicated to creating useful tools that become universally accessible to scientific and educational
communities. It is currently commercialized by Schrödinger, Inc. PyMOL can produce
high-quality 3D images of small molecules and biological macromolecules, such as proteins.
According to the original author, by 2009, almost a quarter of all published images of 3D protein
structures in the scientific literature were made using PyMOL.

Procedure:

1. Open the PyMOL software.
2. Click the File option to open the PDB file.
3. Use the PyMOL software options to visualize the structures.
4. Note down the information for the protein structure.

Input:

PDB Id: 1qlx, 5yj5

Protein Name: prion


Workspace:

Output:

Superimposition of Protein:
Result

Protein structures 1qlx and 5yj5 have been viewed and interpreted using PyMOL.
Ex. No: 14 Date:

LALIGN (NUCLEOTIDE ALIGNMENT)

AIM:
To perform pairwise sequence alignment using the LALIGN tool.

DESCRIPTION:
LALIGN compares two sequences looking for local sequence similarities. The tool reports a number of
non-overlapping alignments between sequences. LALIGN is a “linear-space algorithm” in the
sense that it needs only space proportional to the sum of the input size and the output size. lalign
is part of the Fasta3 package. This version replaces that from the Fasta2 package. While
programs such as fasta and ssearch report only the best alignment between the query sequence and
the library sequence, lalign reports a number of non-overlapping alignments between sequences.

PROCEDURE:

1. Enter the URL: http://www.uniprot.org/

2. Retrieve the protein sequences from the UniProt database.

3. Enter the URL: https://omictools.com/lalign-tool

4. Paste the sequences into the search box

5. Click the Submit button

6. Note down the results

INPUT:

Protein name: Glucose transporter

HOMEPAGE:
WORKSPACE:
OUTPUT:

RESULT:
Pairwise alignment has been done between Q9N764 (HOMO SAPIENS) and Q9JJZ
(RATTUS NORVEGICUS) using the LALIGN tool.
EX NO:15 DATE:

NCBI-BLAST TOOL
AIM:

To perform Sequence similarity search using NCBI-BLAST tool

DESCRIPTION:

The most common local alignment tool is BLAST (Basic Local Alignment Search Tool),
developed by Altschul et al. The operative phrase in the name is local alignment. BLAST is
a set of algorithms that attempt to find a short fragment of a query sequence that aligns perfectly
with a fragment of a subject sequence found in a database. The first step of the BLAST algorithm
is to break the query into short words of a specific length. An initial word alignment must score
above a neighborhood score threshold (T). For the original BLAST algorithm, the matching
fragment is then used as a seed, and the alignment is extended in both directions until the score
of the aligned segment no longer increases.
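
The word, seed and extension steps described above can be sketched as follows. This is a simplified illustration rather than the BLAST implementation: it accepts only exact word matches as seeds (real protein BLAST also accepts neighbourhood words scoring above T), uses a flat match/mismatch score, and stops an ungapped extension once the running score falls more than a hypothetical `x_drop` below the best score seen.

```python
def query_words(query, w=3):
    """Break the query into overlapping words of length w with their offsets."""
    return [(i, query[i:i + w]) for i in range(len(query) - w + 1)]


def find_seeds(query, subject, w=3):
    """Return (query_offset, subject_offset) pairs where a word matches exactly."""
    index = {}
    for j in range(len(subject) - w + 1):
        index.setdefault(subject[j:j + w], []).append(j)
    return [(i, j) for i, word in query_words(query, w) for j in index.get(word, [])]


def extend_seed(query, subject, i, j, w=3, match=1, mismatch=-1, x_drop=2):
    """Ungapped extension of a seed; returns (score, q_start, q_end_exclusive)."""
    def walk(qi, sj, step):
        score = best = best_len = steps = 0
        while 0 <= qi < len(query) and 0 <= sj < len(subject):
            score += match if query[qi] == subject[sj] else mismatch
            steps += 1
            if score > best:
                best, best_len = score, steps
            elif best - score > x_drop:
                break
            qi += step
            sj += step
        return best, best_len

    right, right_len = walk(i + w, j + w, 1)
    left, left_len = walk(i - 1, j - 1, -1)
    return w * match + left + right, i - left_len, i + w + right_len


if __name__ == "__main__":
    q, s = "TTACGTGG", "CCACGTGA"  # hypothetical toy sequences
    for qi, sj in find_seeds(q, s):
        print((qi, sj), extend_seed(q, s, qi, sj))
```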

PROCEDURE:

1. Go to URL: https://blast.ncbi.nlm.nih.gov/Blast.cgi

2. Enter the nucleotide sequence for the bovine organism.

3. Click the Submit button.

4. Note down the results.

INPUT:

Sequence name: haemoglobin AJ6168001


HOMEPAGE:
WORKSPACE:
OUTPUT:
RESULT:
Retrieved similar sequences for the haemoglobin sequence using the BLAST tool. (Note: the
input was written as Homo sapiens, whereas the output screenshot is for haemoglobin.)
EX:NO:16 DATE:

MULTIPLE SEQUENCE ALIGNMENT

AIM:

To retrieve amino acid sequences (in FASTA format) of Bowman-Birk inhibitors from
different species (monocots and dicots) and perform multiple alignment with ClustalW to
evaluate their homology. To compare and comment on the conservation of the disulfide bridge
pattern between monocots and dicots.

DESCRIPTION:

Clustal Omega is a multiple sequence alignment program that uses seeded guide trees
and HMM profile-profile techniques to generate alignments between three or more
sequences. For the alignment of two sequences, pairwise sequence alignment tools should be
used instead.

Monocot seeds are defined as seeds that consist of a single (mono) embryonic leaf or
cotyledon. The structure of the seed and the number of cotyledons present in the seed are the
most important characteristics that allow the differentiation of monocots and dicots. Dicot
seeds are defined as seeds that consist of two embryonic leaves or cotyledons. Dicot seeds
contain a single embryo with an embryo axis and two cotyledons around it. Initially, all
angiosperms or flowering plants were grouped under dicots.

Cysteine is a sulfur-containing amino acid. In proteins it usually exists as cystine, formed by
a disulfide bond between two cysteine residues, which is essential for forming the tertiary
structure and for the stability of a protein.
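
Once the sequences are aligned, conserved cysteines can be located by checking each alignment column. The sketch below assumes the aligned sequences are already available as equal-length strings with `-` for gaps; the toy sequences are hypothetical and are not Bowman-Birk inhibitor data. A conserved cysteine column suggests, but does not prove, a conserved disulfide bridge.

```python
def conserved_cysteine_columns(aligned):
    """Return 0-based alignment columns in which every sequence has a cysteine."""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("aligned sequences must all have the same length")
    return [col for col in range(length)
            if all(seq[col].upper() == "C" for seq in aligned)]


if __name__ == "__main__":
    alignment = ["ACC-DC", "TCCADC", "GCC-DS"]  # hypothetical aligned rows
    print(conserved_cysteine_columns(alignment))
```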
PROCEDURE:

1. Go to UniProt.
2. Retrieve the protein sequences of monocot and dicot seeds.
3. Enter the URL: https://www.ebi.ac.uk/Tools/msa/clustalo/
4. Run MSA using CLUSTALW.
5. Note down the result.
HOMEPAGE:

OUTPUT:

RESULT:

Homology has been compared and the conservation of the disulfide bridge pattern between
monocots and dicots has been examined. The sequences share the common pattern CCD.
EX.NO.:17 DATE:

METABOLIC PATHWAY INFORMATION IN KEGG AND METACYC DATABASE

AIM:
To search for metabolic pathway information using the KEGG and MetaCyc databases.

DESCRIPTION:

KEGG:

KEGG is a database resource for understanding high-level functions and utilities of the
biological system, such as the cell, the organism and the ecosystem, from genomic and
molecular-level information. It is a computer representation of the biological system, consisting
of molecular building blocks of genes and proteins (genomic information) and chemical
substances (chemical information) that are integrated with the knowledge on molecular wiring
diagrams of interaction, reaction and relation networks (systems information). It also contains
disease and drug information (health information) as perturbations to the biological system.

METACYC:
MetaCyc is a curated database of experimentally elucidated metabolic pathways from all
domains of life. MetaCyc contains 2937 pathways from 3295 different organisms. MetaCyc
contains pathways involved in both primary and secondary metabolism, as well as associated
metabolites, reactions, enzymes, and genes. The goal of MetaCyc is to catalog the universe of
metabolism by storing a representative sample of each experimentally elucidated pathway.
PROCEDURE:

KEGG:

1. Enter the URL: http://www.genome.jp/kegg/

2. Click the metabolic option.
3. Enter the disease name in the search box and run the search.
4. Retrieve the metabolic pathway information.
5. Note down the result.

METACYC:

1. Enter the URL: http://metacyc.org/

2. Click Analysis and then click Comparative Analysis.
3. Enter the organisms in the search box.
4. Note down the pathway information.
5. Note down the results.

INPUT:

KEGG:

Disease name: malaria

METACYC:

Organism name: Mycobacterium tuberculosis H37Rv

Mycobacterium tuberculosis KZN1435


HOMEPAGE:

WORKSPACE:
METACYC:

HOMEPAGE:

WORKSPACE:
OUTPUT:

KEGG:
METACYC:
Result

Retrieved the metabolic pathway information for Mycobacterium tuberculosis H37Rv and
Mycobacterium tuberculosis KZN1435 from the KEGG and MetaCyc databases.

EX.NO:18 DATE:

MEGA
AIM:
1. To perform phylogenetic analysis by the neighbour-joining method using the Kimura
two-parameter model for a set of nucleotide sequences.

2. To perform phylogenetic analysis by the neighbour-joining method using the Dayhoff PAM
matrix for a set of amino acid sequences (ribonucleases).

DESCRIPTION:

The Molecular Evolutionary Genetics Analysis (MEGA) software is a desktop application
designed for comparative analysis of homologous gene sequences either from multigene families
or from different species, with a special emphasis on inferring evolutionary relationships and
patterns of DNA and protein evolution. In addition to the tools for statistical analysis of data,
MEGA provides many convenient facilities for the assembly of sequence data sets from files or
web-based repositories, and it includes tools for visual presentation of the results obtained in the
form of phylogenetic trees and evolutionary distance matrices.
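
The Kimura two-parameter distance named in the aim can be computed by hand for a pair of aligned nucleotide sequences. With P the proportion of transitions and Q the proportion of transversions, the distance is d = -(1/2) ln[(1 - 2P - Q) sqrt(1 - 2Q)]. The sketch below assumes the sequences are already aligned and skips any column containing a gap or ambiguous base; it illustrates the formula and is not MEGA's implementation.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(s1, s2):
    """Kimura two-parameter distance between two aligned DNA sequences."""
    if len(s1) != len(s2):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(a, b) for a, b in zip(s1.upper(), s2.upper())
             if a in "ACGT" and b in "ACGT"]
    if not pairs:
        raise ValueError("no comparable sites")
    # Transitions: purine<->purine or pyrimidine<->pyrimidine changes.
    transitions = sum(1 for a, b in pairs
                      if a != b and ({a, b} <= PURINES or {a, b} <= PYRIMIDINES))
    transversions = sum(1 for a, b in pairs if a != b) - transitions
    p = transitions / len(pairs)
    q = transversions / len(pairs)
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))


if __name__ == "__main__":
    print(k2p_distance("AAAAAAAAAA", "GAAAAAAAAA"))  # one transition in ten sites
    print(k2p_distance("AAAAAAAAAA", "CAAAAAAAAA"))  # one transversion in ten sites
```

A matrix of such pairwise distances is the input to the neighbour-joining method used to build the tree.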

PROCEDURE:

1. Collect and download sequences similar to that of a particular organism using BLAST.

2. Open the MEGA software and upload the downloaded file.

3. Click the Alignment menu, then Edit/Build Alignment; a sub-window opens.

4. Paste the downloaded sequences.

5. Click the Data menu, then Export Alignment, then MEGA format.

6. Construct the phylogenetic tree and compute the distances for that organism.

7. Close the window.


Result:

Constructed the phylogenetic tree and estimated the genetic distances using the
neighbour-joining algorithm.
