Lecture 2

The document provides an overview of various DNA and protein databases, including Genbank, EMBL, SwissProt, and the Protein Data Bank, detailing their management, contents, and functions. It also discusses data retrieval tools like Entrez, DBGET, and SRS, which facilitate access to biological data. Additionally, the document highlights the impact of bioinformatics on molecular medicine, microbial genomics, and evolutionary studies, emphasizing its significance in modern biology and technology.

Introduction to DNA and protein databases

Lecture 2
Introduction to Bioinformatics
By
Dr. Syed Babar Jamal Bacha
Assistant Professor
Department of Biological Sciences
National University of Medical Sciences

May, 2025
GenBank
-the first database set up to store DNA sequence data, established in 1982
-managed by the National Center for Biotechnology Information (NCBI)
-currently holds about 17 billion bases from more than 100,000 organisms
-each sequence is given an ID number for easy identification in the database
http://www.ncbi.nlm.nih.gov/Genbank/GenbankSearch.html
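The idea of giving every sequence its own ID can be sketched in Python. This is only an illustration of ID-based lookup, not how GenBank itself stores data; the FASTA-style records and the IDs SEQ0001 and SEQ0002 are made up for the example.

```python
def index_fasta(text):
    """Return a dict mapping each sequence ID to its sequence.

    The ID is taken to be the first word after '>' on a header line.
    """
    records = {}
    current_id = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            words = line[1:].split()
            # Skip a header with no ID rather than guessing one
            current_id = words[0] if words else None
            if current_id is not None:
                records[current_id] = ""
        elif current_id is not None:
            # Sequence lines are concatenated and upper-cased
            records[current_id] += line.upper()
    return records


# Hypothetical records; these IDs are not real accession numbers.
EXAMPLE = """>SEQ0001 toy sequence one
ATGGCC
TTAA
>SEQ0002 toy sequence two
ggccat
"""

index = index_fasta(EXAMPLE)
print(index["SEQ0001"])
```

Once the records are indexed this way, any sequence can be retrieved directly by its ID instead of scanning the whole file.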
EMBL (European Molecular Biology Laboratory) Nucleotide
Sequence Database

-a comprehensive database of DNA and RNA sequences


-managed by the European Bioinformatics Institute (EBI)
-information collected from the scientific literature and patent
applications
-supported by 17 countries in Western Europe
-currently contains more than 10 million bases
http://www.ebi.ac.uk/embl/
SwissProt:
-a database of protein sequences, functions and structures
-managed by the European Bioinformatics Institute
-provides a high level of integration with other databases
-a very low level of redundancy (that is, few identical sequences are present in the database)
http://www.ebi.ac.uk/swissprot/
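Low redundancy can be illustrated with a short Python sketch that collapses entries sharing exactly the same sequence. This is an assumption-laden toy, not the curation procedure used for SwissProt, which involves much more than exact matching; the entry IDs and sequences below are invented.

```python
def remove_redundancy(entries):
    """Group entries that share an identical sequence.

    entries: list of (entry_id, sequence) pairs.
    Returns a dict mapping each distinct sequence to the IDs that carry it.
    """
    merged = {}
    for entry_id, sequence in entries:
        merged.setdefault(sequence.upper(), []).append(entry_id)
    return merged


# Hypothetical entries: P_A and P_B carry the same sequence.
ENTRIES = [
    ("P_A", "MKTAYIAK"),
    ("P_B", "MKTAYIAK"),
    ("P_C", "MSTNPKPQ"),
]

nonredundant = remove_redundancy(ENTRIES)
print(len(ENTRIES), "entries ->", len(nonredundant), "distinct sequences")
```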
EC-ENZYME:
The 'ENZYME' data bank contains the following data for a characterized
enzyme:
-EC number
-Recommended name and Alternative names
-Catalytic activity
-Cofactors
-Pointers to the SWISS-PROT entries that correspond to the enzyme
-Diseases associated with a deficiency of the enzyme
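An EC number has four dot-separated levels, and the first level names the broad enzyme class. A minimal Python sketch of splitting one apart is given below; the function name and the returned field names are chosen for this example and are not part of the ENZYME data bank format.

```python
# Top-level EC classes as defined by the enzyme nomenclature
EC_CLASSES = {
    1: "Oxidoreductases",
    2: "Transferases",
    3: "Hydrolases",
    4: "Lyases",
    5: "Isomerases",
    6: "Ligases",
    7: "Translocases",
}


def parse_ec_number(ec):
    """Split an EC number such as 'EC 1.1.1.1' into its four levels."""
    text = ec.strip()
    if text.upper().startswith("EC"):
        text = text[2:].strip()
    parts = text.split(".")
    if len(parts) != 4:
        raise ValueError(f"expected four levels, got {ec!r}")
    top = int(parts[0])
    if top not in EC_CLASSES:
        raise ValueError(f"unknown EC class in {ec!r}")
    return {
        "class": top,
        "class_name": EC_CLASSES[top],
        "subclass": parts[1],
        "sub_subclass": parts[2],
        "serial": parts[3],
    }


print(parse_ec_number("EC 1.1.1.1")["class_name"])
```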
Protein Data Bank
-managed by the Research Collaboratory for Structural Bioinformatics (RCSB)
-a collection of all publicly available 3D structures of proteins, nucleic acids, carbohydrates and a variety of other complexes
-structures are experimentally determined by X-ray crystallography and NMR
http://www.rcsb.org/pdb/
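Each structure in the Protein Data Bank is identified by a short ID. The Python sketch below checks an ID and builds a download address for the structure file. It assumes the classic four-character ID form (a digit from 1 to 9 followed by three letters or digits) and the file server address files.rcsb.org; neither appears in the lecture itself, and 4HHB is used only as an example ID. The code builds the address and does not contact the server.

```python
import re

# Assumed classic PDB ID form: one digit 1-9, then three alphanumerics
PDB_ID_PATTERN = re.compile(r"^[1-9][A-Za-z0-9]{3}$")


def pdb_download_url(pdb_id):
    """Return the assumed download URL for a PDB-format structure file."""
    if not PDB_ID_PATTERN.match(pdb_id):
        raise ValueError(f"not a four-character PDB ID: {pdb_id!r}")
    return f"https://files.rcsb.org/download/{pdb_id.upper()}.pdb"


print(pdb_download_url("4hhb"))
```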
PROSPECT
-Protein Structure Prediction and Evaluation Computer Toolkit
-a protein-structure prediction system
-it constructs 3-D models of proteins by protein threading
-protein threading: algorithms for protein fold recognition
http://www.bioinformaticssolutions.com/products/prospect.php
BLAST (Basic Local Alignment Search Tool)

-a set of homology and similarity search tools
-developed by NCBI
-search programs available through the web and as standalone programs for several platforms, including Windows
-used to perform fast similarity searches for protein or DNA sequences
-users can retrieve their results and format them in different ways
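What "local alignment" means can be shown with a small Python sketch. Note that this is the exhaustive Smith-Waterman dynamic-programming method, not the faster heuristic that BLAST actually uses; the scoring values (match 2, mismatch -1, gap -2) are arbitrary choices for the example.

```python
def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score between sequences a and b.

    Uses the Smith-Waterman recurrence with a linear gap penalty.
    Scores never drop below zero, so the best-scoring region can
    start and end anywhere in either sequence.
    """
    cols = len(b) + 1
    prev = [0] * cols
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * cols
        for j in range(1, cols):
            step = match if a[i - 1] == b[j - 1] else mismatch
            curr[j] = max(
                0,                  # start a new local alignment
                prev[j - 1] + step, # align a[i-1] with b[j-1]
                prev[j] + gap,      # gap in b
                curr[j - 1] + gap,  # gap in a
            )
            best = max(best, curr[j])
        prev = curr
    return best


print(local_alignment_score("TTACGTTT", "ACG"))
```

A high score means the two sequences share a similar region, even when the rest of the sequences differ.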
Access to distributed Data

Biological data is widely distributed over the WWW.

Data can be retrieved by:


1. Search engines

2. Data retrieval tools


Search Engine
Examples of search engines:
• Google
• Bing
• Yahoo!
• Ask, etc.
(Chrome and Firefox are web browsers used to reach these search engines, not search engines themselves.)

Using Search engines


1. They can find relevant web pages.
2. It is difficult to find the desired information among the results.
3. It is difficult to find specific information.
Data retrieval tools
Dedicated to giving molecular biologists access to information. The most widely used are:

• Entrez

• DBGET

• SRS

Each of these allows,

-Text-based searching of a number of linked DBs.

-Sequence searching.

They differ in,

-The DBs they cover

-How the retrieved information is accessed and presented.


Entrez
WWW-based data retrieval system.

Developed by NCBI (National Center for Biotechnology Information).

Integrates information held in different DBs.


Entrez
Databases covered by Entrez are:
• Nucleic acid seqs - GenBank, RefSeq
• Protein seqs - SWISS-PROT, PIR
• 3D structures - MMDB, PDB
• Genomes - Many sources
• PopSet - From GenBank
• OMIM - OMIM
• Taxonomy - NCBI taxonomy database
• Books - Bookshelf
• GEO - Gene Expression Omnibus
• Literature - PubMed
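Entrez can also be queried from a program through NCBI's E-utilities web interface. The Python sketch below only builds a search address and does not send it. The base address, the esearch.fcgi endpoint and the db, term and retmax parameters belong to E-utilities rather than to this lecture, and the helper name esearch_url is made up for the example.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities (an assumption of this example)
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esearch_url(db, term, retmax=20):
    """Build an E-utilities search URL for one Entrez database.

    db: Entrez database name, for example 'pubmed' or 'nucleotide'.
    term: the text query.
    retmax: maximum number of record IDs to return.
    """
    query = urlencode({"db": db, "term": term, "retmax": retmax})
    return f"{EUTILS_BASE}/esearch.fcgi?{query}"


url = esearch_url("pubmed", "Deinococcus radiodurans", retmax=5)
print(url)
```

Changing the db argument points the same text query at a different linked database, which is the kind of integration Entrez provides.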
DBGET
An integrated data retrieval system developed and maintained by:
-The Institute for Chemical Research (Kyoto University)
-The Human Genome Center (University of Tokyo)

Databases covered are:

• Nucleic acid seqs - GenBank, EMBL

• Protein seqs - SWISS-PROT, PIR

• 3D structures - PDB

• Seq motifs - PROSITE

• Enzyme reactions - LIGAND

• Literature - LITDB, Medline, etc.


SRS

SRS - Sequence Retrieval System

- Data retrieval tool developed by EBI

- Integrates 80 molecular biology DBs

- Open-source software (can be installed locally)

SRS has an associated scripting language called Icarus


Molecular Medicine

-most diseases have both a genetic component and an environmental component


-we can search for the genes directly associated with different diseases
-we can begin to understand the molecular basis of these diseases more clearly
-this allows better treatments, cures and even preventative tests to be developed
Microbial genomic application
-the MGP (Microbial Genome Project) aims to sequence the genomes of bacteria
-useful in energy production, industrial processing and toxic waste
reduction
-scientists can begin to understand these microbes at a very
fundamental level
-isolate the genes that give them their unique abilities to survive under
extreme conditions
Waste clean up
-Deinococcus radiodurans is known as the world's toughest bacterium
-the most radiation-resistant organism known
-Scientists are interested in this organism because of its potential
usefulness in cleaning up waste sites that contain radiation and toxic
chemicals
Evolutionary studies
-sequencing of genomes from different organisms
-evolutionary studies can be performed to determine the tree of life
-find the last universal common ancestor
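A first step in comparing organisms is to measure how different their sequences are. The Python sketch below computes the proportion of differing positions (the p-distance) between aligned sequences and picks the closest pair. It is a simplified illustration: real tree building uses evolutionary models and dedicated methods, and the three aligned sequences and species names are invented.

```python
from itertools import combinations


def p_distance(seq_a, seq_b):
    """Proportion of positions that differ between two aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    differences = sum(1 for x, y in zip(seq_a, seq_b) if x != y)
    return differences / len(seq_a)


def closest_pair(aligned):
    """Return the pair of names whose sequences have the smallest distance."""
    return min(
        combinations(sorted(aligned), 2),
        key=lambda pair: p_distance(aligned[pair[0]], aligned[pair[1]]),
    )


# Hypothetical aligned sequences of equal length
ALIGNED = {
    "species_A": "ACGTACGT",
    "species_B": "ACGTACGA",
    "species_C": "TCGAACGA",
}

print(closest_pair(ALIGNED))
```

Pairs with small distances are grouped first when a tree is built from such a distance table.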
Impact of Bioinformatics 1
-Bioinformatics leads to advances in understanding basic biological
processes, treatment, and prevention of many genetic diseases
-Bioinformatics has transformed the discipline of biology from a purely
lab-based science to an information science as well
Impact of Bioinformatics 2
-modern biology and related sciences are increasingly becoming
dependent on Bioinformatics
-Thus, Bioinformatics exhibits great potential in the future development
of science and technology