Unit 6 - Bioinformatics

Bioinformatics is an interdisciplinary field involving biology, computer science, mathematics and statistics. It uses computational techniques to solve biological problems related to data representation, storage, analysis and interpretation of various types of biological data like DNA sequences, protein structures, and functional genomics experiments. There are three main sub-disciplines within bioinformatics - developing algorithms and statistics to analyze relationships in large datasets, analyzing and interpreting different types of biological data, and developing tools to efficiently access and manage different types of information. Biological databases play an important role in bioinformatics by making vast amounts of biological data available in a computer readable format.

Uploaded by

Leon

BIOINFORMATICS

INTRODUCTION
• Bioinformatics is an interdisciplinary field mainly involving
molecular biology and genetics, computer science, mathematics,
and statistics.
• It applies computational techniques to two kinds of problems:
1. data problems: representation (graphics), storage and retrieval
(databases), analysis (statistics, artificial intelligence,
optimization, etc.)
2. biology problems: sequence analysis, structure or function
prediction, data mining, etc.; this side is also called
computational biology
• The National Center for Biotechnology Information (NCBI, 2001)
defines bioinformatics as "the field of science in which biology,
computer science, and IT merge into a single discipline"

• There are three important sub-disciplines within bioinformatics
1. the development of new algorithms and statistics which assess
relationships among members of large data sets
2. the analysis and interpretation of various types of data
including nucleotide and amino acid sequences, protein domains,
and protein structures
3. the development and implementation of tools that enable
efficient access and management of different types of information.
• Types of datasets: genome sequences, macromolecular
structures, and functional genomics experiments (e.g.
microarray data)
• Other data: phylogenetic and metabolic pathway analysis,
the text of scientific papers, and plant varietal information
and statistics.

• Analysis of biological data requires a large number of
techniques, such as primary sequence alignment,
protein 3D structure alignment, phylogenetic tree
construction, prediction and classification of protein
structure, prediction of RNA structure, prediction of protein
function, and expression data clustering.
• Development of suitable algorithms is an important part of
bioinformatics
• Many techniques and algorithms were developed specifically
for the analysis of biological data; for instance, the dynamic
programming algorithm for sequence alignment is one of the
methods most widely used by biologists
• The sequence information generated worldwide is stored
systematically in different types of databases
• Hence, it is necessary to understand these databases
and their different types
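
The dynamic programming approach to sequence alignment mentioned above can be sketched as a small global aligner in the Needleman-Wunsch style. This is a minimal illustration, not code from the slides: the function name and the scoring values (match +1, mismatch -1, gap -2) are assumptions chosen for the example.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Globally align strings a and b; return (score, aligned_a, aligned_b).

    The scoring scheme is an illustrative assumption, not a standard.
    """
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,      # gap in b
                score[i][j - 1] + gap,      # gap in a
            )

    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))
```

Calling `needleman_wunsch("GATTACA", "GATACA")` returns the optimal score together with two gapped strings of equal length. When several alignments tie, the one returned depends on the order of the checks in the traceback.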
Pattern recognition
• The initiation of transcription or translation is
determined by the presence of specific DNA or RNA
patterns, called motifs.
• Research on detecting specific patterns in DNA sequences,
such as genes, protein coding regions and promoters, helps
uncover functional aspects of cells.
• Patterns are used in database searching, e.g. BLOCKS, a
database of protein motifs
• BLAST and FASTA search a database for the closest
matches to a query sequence
PATTERNS MOST EXAMINED IN DNA SEQUENCES

Gene features                         DNA characteristics
Coding sequences                      ORFs, GC rich, CpG content
Translational start and stop sites    Start: ATG; Stop: TAA, TAG, TGA
Splice sites (exon/intron borders)    Consensus sequences
Promoter regions                      TATA, Shine-Dalgarno, Pribnow, Kozak consensus, CpG content
Poly A signals                        Consensus sequence, 10-20 bases upstream of poly A tail
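
Several of the patterns in the table are short enough to be located with plain regular expressions. The sketch below is an illustration under assumptions: the motif strings for the Pribnow box (TATAAT) and the poly A signal (AATAAA) are the commonly quoted consensus forms, real sites vary around them, and the names `MOTIFS` and `scan_motifs` are hypothetical.

```python
import re

# Illustrative consensus patterns; real signals are degenerate and are
# usually modelled with weight matrices rather than exact strings.
MOTIFS = {
    "start_codon": "ATG",
    "stop_codon": "TAA|TAG|TGA",
    "pribnow_box": "TATAAT",
    "polyA_signal": "AATAAA",
}

def scan_motifs(seq):
    """Return {motif name: [(position, matched text), ...]} for a DNA string."""
    seq = seq.upper()
    hits = {}
    for name, pattern in MOTIFS.items():
        # A lookahead lets overlapping occurrences be reported too.
        regex = re.compile(f"(?=({pattern}))")
        hits[name] = [(m.start(), m.group(1)) for m in regex.finditer(seq)]
    return hits
```

Positions are 0-based offsets on the given strand. Scanning the opposite strand would need the reverse complement of the sequence.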
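Prokaryotic gene structure
[Figure: a prokaryotic gene, with a TATA box upstream of an ORF (open
reading frame) that runs from the start codon to the stop codon.
Example sequence: ATGACAGATTACAGATTACAGATTACAGGATAG, read in
Frame 1, Frame 2 and Frame 3.]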
Prokaryotes
• Advantages
– Simple gene structure
– Small genomes (0.5 to 10 million bp)
– No introns
– Genes are called Open Reading Frames (ORFs)
– High coding density (>90%)
• Disadvantages
– Some genes overlap (nested)
– Some genes are quite short (<60 bp)
Gene finding approaches
1) Rule-based (e.g., start & stop codons)
2) Content-based (e.g., codon bias, promoter sites)
3) Similarity-based (e.g., orthologs)
4) Pattern-based (e.g., machine-learning)
5) Ab-initio methods (FFT)

Simple rule-based gene finding
• Look for a putative start codon (ATG)
• Staying in the same frame, scan in groups of three
until a stop codon is found
• If the number of codons is >= 50, assume it's a gene
• At the end of the chromosome, repeat the process for
the reverse complement
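
The rule-based scan above can be sketched as follows. This is a minimal illustration, not a production gene finder. The function names are hypothetical, the codon count is assumed to include the start codon but not the stop codon, and only the bases A, C, G and T are handled.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]

def find_orfs(seq, min_codons=50):
    """Return (strand, start, end, orf_sequence) for each ORF found.

    Coordinates are 0-based and relative to the scanned strand;
    end is exclusive and includes the stop codon.
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            i = frame
            while i + 3 <= len(s):
                if s[i:i + 3] != "ATG":
                    i += 3
                    continue
                # Start codon found: stay in frame until a stop codon.
                j = i
                while j + 3 <= len(s) and s[j:j + 3] not in STOP_CODONS:
                    j += 3
                if j + 3 > len(s):
                    break  # no stop codon left in this frame
                if (j - i) // 3 >= min_codons:
                    orfs.append((strand, i, j + 3, s[i:j + 3]))
                i = j + 3  # resume scanning after the stop codon
    return orfs
```

For the 33-base example sequence from the gene structure slide, `find_orfs("ATGACAGATTACAGATTACAGATTACAGGATAG", min_codons=5)` reports a single ORF on the forward strand covering the whole sequence. With the default threshold of 50 codons it reports nothing, because that ORF has only 10 codons before the stop.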
[Figure: Example ORF]
Content based gene prediction method
• RNA polymerase promoter site (-10 and -35 sites,
or TATA box)
• Shine-Dalgarno sequence (+10, ribosome
binding site) to initiate protein translation
• Codon biases
• High GC content
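
Two of these content measures, GC content and codon usage, are simple to compute. The sketch below uses hypothetical helper names and assumes the input is already an in-frame coding sequence. A real content-based predictor would compare such statistics against values learned from known genes.

```python
from collections import Counter

def gc_content(seq):
    """Fraction of bases in seq that are G or C (0.0 for an empty string)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)

def codon_usage(orf):
    """Count each codon in an in-frame sequence; trailing bases are ignored."""
    orf = orf.upper()
    usable = len(orf) - len(orf) % 3
    return Counter(orf[i:i + 3] for i in range(0, usable, 3))
```

Codon bias shows up as an uneven distribution in the `Counter` returned by `codon_usage`, for example one synonymous codon used far more often than the others.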
Similarity-based gene finding
• Take all known genes from a related genome and compare
them to the query genome via BLAST
• Disadvantages:
– Orthologs (genes in different species that evolved from a common
ancestral gene by speciation) and paralogs (genes related by
duplication within a genome, which may evolve new functions)
sometimes lose function and become pseudogenes
– Not all genes will be known in the comparison genome
– The best species for comparison isn’t always obvious
• Similarity comparisons are good supporting evidence for
prediction validity
Machine Learning Techniques
• Hidden Markov Model
• ANN based method
• Bayes Networks
Ab-initio Methods
• Fast Fourier Transform (FFT) based methods
• Poor performance
• Able to identify new genes
• FTG method: FTG is a web server that analyzes
nucleotide sequences to predict genes
using Fourier transform techniques
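
One signal that Fourier-based methods commonly look for is a three-base periodicity, which tends to be stronger in protein coding regions. The slides do not say which measure FTG computes, so the sketch below is an assumption made for illustration. It evaluates the Fourier component at frequency 1/3 for each base's indicator sequence and sums the squared magnitudes. The function name is hypothetical.

```python
import cmath

def period3_power(seq):
    """Sum over A, C, G, T of |DFT at frequency 1/3|^2 of the base's
    0/1 indicator sequence. Larger values mean stronger 3-periodicity."""
    seq = seq.upper()
    # exp(-2*pi*i*n/3) depends only on n mod 3, so precompute three values.
    phases = [cmath.exp(-2j * cmath.pi * k / 3) for k in range(3)]
    total = 0.0
    for base in "ACGT":
        coeff = sum(phases[n % 3] for n, ch in enumerate(seq) if ch == base)
        total += abs(coeff) ** 2
    return total
```

In practice such a score would be computed in a sliding window along the genome and compared with a threshold. The window size and the threshold are tuning choices that are not covered here.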
Eukaryotic genes
Eukaryotes
• Complex gene structure
• Large genomes (0.1 to 3 billion bases)
• Exons and Introns (interrupted)
• Low coding density (<30%)
– 3% in humans, 25% in Fugu, 60% in yeast
• Alternative splicing (40-60% of all genes)
• Considerable number of pseudogenes
Finding Eukaryotic Genes Computationally

• Rule-based
– Not as applicable – too many false positives
• Content-based Methods
– CpG islands, GC content, hexamer repeats, composition statistics, codon
frequencies
• Feature-based Methods
– donor sites, acceptor sites, promoter sites, start/stop codons, polyA signals,
feature lengths
• Similarity-based Methods
– sequence homology, EST (expressed sequence tags) searches
• Pattern-based
– HMMs, Artificial Neural Networks
• Most effective is a combination of all the above
Gene prediction programs
• Rule-based programs
– Use explicit set of rules to make decisions.
– Example: GeneFinder
• Neural Network-based programs
– Use data set to build rules.
– Examples: Grail, GrailEXP
• Hidden Markov Model-based programs
– Use probabilities of states and transitions between
these states to predict features.
– Examples: Genscan, GenomeScan
Combined Methods

• GRAIL (http://compbio.ornl.gov/Grail-1.3/)
• FGENEH (http://www.bioscience.org/urllists/genefind.htm)
• HMMgene (http://www.cbs.dtu.dk/services/HMMgene/)
• GENSCAN (http://genes.mit.edu/GENSCAN.html)
• GenomeScan (http://genes.mit.edu/genomescan.html)
• Twinscan (http://ardor.wustl.edu/query.html)
Egpred: Prediction of Eukaryotic Genes
• Similarity search
– First BLASTX against the RefSeq database
– Second BLASTX against the sequences found by the first BLASTX
– Detection of significant exons from the BLASTX output
– BLASTN against introns to filter exons
• Prediction using ab-initio programs
– NNSPLICE used to compute splice sites
• Combined method
Biological databases
Introduction
• Biological databases are libraries of life sciences information,
collected from scientific experiments, published literature,
high-throughput experiment technology, and computational analysis
• They contain information from research areas including genomics,
proteomics, metabolomics, microarray gene expression and phylogenetics
• There are two main functions of biological databases:
1. To make biological data available to scientists.
2. To make biological data available in computer readable form.
• Biological databases can be broadly classified into sequence and
structure databases
• Sequence databases cover both nucleic acid sequences
and protein sequences
• Structure databases mainly cover proteins.
• The first database was created shortly after the insulin
protein sequence became available in 1956.

• In the mid 1960s, the first nucleic acid sequence, a yeast tRNA of
77 bases (the individual units of nucleic acids), was determined. During
this period the three dimensional structures of proteins were being
studied, and the well known Protein Data Bank was set up as the first
protein structure database, with only 7 entries, in 1971
• Databases in general can be classified into primary, secondary or composite
databases.
• A primary database contains information on the sequence or structure alone.
• Experimental results are submitted directly into the database by researchers, and
the data are essentially archival in nature.
• Once given a database accession number, the data in primary databases are never
changed: they form part of the scientific record.
• Examples of these include
1. Swiss-Prot, PIR: protein sequences
2. EMBL, GenBank & DDBJ: nucleotide sequences (the International Nucleotide
Sequence Database Collaboration, INSDC)
3. PDB, SCOP: protein structures

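International nucleotide data banks
[Figure: the three collaborating nucleotide data banks and their host
institutions: GenBank (NCBI, NLM; USA), EMBL (EBI; Europe) and DDBJ
(CIB, NIG; Japan), linked through the International Advisory Meeting
and the Collaborative Meeting. TrEMBL and NRDB also appear in the
diagram.]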
Genbank (NCBI)
• NCBI was created in 1988 as a part of the
National Library of Medicine at NIH
– GenBank is an open access, annotated collection of publicly
available nucleotide sequences
– Produced & maintained by NCBI
– Accessed & searched through the Entrez system at NCBI
– NCBI also develops software tools for sequence analysis
EMBL
• European Molecular Biology Laboratory
• Supported by 20 European countries &
Australia
• Nucleotide sequence database
• Maintained by EBI (European
Bioinformatics Institute)
DDBJ
• DNA Data Bank of Japan
• Collaboration with EMBL & Genbank
• Run by National Institute of Genetics
• A secondary database contains information derived
from the primary databases.
• Secondary databases are often referred to as curated
databases, but this is a bit of a misnomer because primary
databases are also curated to ensure that the data
in them is consistent and accurate.
                Primary database                  Secondary database
Synonyms        Archival database                 Curated database; knowledgebase
Source of data  Direct submission of              Results of analysis, literature
                experimentally-derived data       research and interpretation, often
                from researchers                  of data in primary databases
Examples        • ENA, GenBank and DDBJ           • InterPro (protein families,
                  (nucleotide sequence)             motifs and domains)
                • ArrayExpress Archive and GEO    • UniProt Knowledgebase (sequence
                  (functional genomics data)        and functional information on
                • Protein Data Bank (PDB;           proteins)
                  coordinates of three-           • Ensembl (variation, function,
                  dimensional macromolecular        regulation and more layered onto
                  structures)                       whole genome sequences)
Composite protein Databases
• There are a number of "composite" databases of protein
sequences.
• These compile their sequence data from the primary
sequence databases and filter them to retain only the
non-redundant sequences.
• The best-known are OWL and NRDB (NCBI)
• PIR (Protein Information Resource), SWISS-PROT, TrEMBL
• PROSITE, Pfam (motif databases)
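
The compile-and-filter step behind a composite database can be sketched as a merge keyed on the sequence itself. This is only an illustration: the function name, the record layout and the entry IDs used in the usage example are hypothetical, and real composite databases apply more elaborate redundancy criteria than exact string identity.

```python
def build_nonredundant(*sources):
    """Merge records from several source databases, keeping each distinct
    sequence once.

    Each source is an iterable of (database, entry_id, sequence) tuples.
    Returns {sequence: [(database, entry_id), ...]} so that every retained
    sequence still points back to all the entries it came from.
    """
    merged = {}
    for records in sources:
        for database, entry_id, sequence in records:
            key = sequence.upper()
            merged.setdefault(key, []).append((database, entry_id))
    return merged

# Hypothetical records: the first sequence occurs in both sources.
source_a = [("DB_A", "A001", "MKTAYIAK"), ("DB_A", "A002", "MSLNQ")]
source_b = [("DB_B", "B107", "mktayiak")]
nonredundant = build_nonredundant(source_a, source_b)
```

Keeping the list of originating entries for each sequence preserves the cross-references that would otherwise be lost when duplicates are dropped.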

Database searching
• Databases use a system where an entry can be
identified in 2 different ways:
1. Identifier
2. Accession code (or number)

1. Identifier:
– String of letters & digits
– Abbreviation of the full protein or gene name
– Called “locus” in GenBank, “entry name” in SWISS-PROT
– Changeable
E.g.: KRAF_HUMAN is the entry name for the Raf-1 oncogene from Homo
sapiens

2. Accession code (or number):
– A number (possibly with a few characters in front) that uniquely
identifies an entry in its database
– Stable
E.g.: the accession code for KRAF_HUMAN in SWISS-PROT is P04049
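
The practical consequence of "changeable" versus "stable" can be shown with a small in-memory index. This is a sketch under assumptions: the class and method names are hypothetical, and the new entry name used in the usage note is an invented example of a rename, not a statement about the real database.

```python
class EntryIndex:
    """Toy index in which records are stored under their stable accession
    code and identifiers are only aliases that may change."""

    def __init__(self):
        self._by_accession = {}       # accession -> record
        self._name_to_accession = {}  # identifier -> accession

    def add(self, accession, identifier, description):
        self._by_accession[accession] = {
            "identifier": identifier,
            "description": description,
        }
        self._name_to_accession[identifier] = accession

    def rename(self, accession, new_identifier):
        """Change an entry's identifier; its accession code is untouched."""
        record = self._by_accession[accession]
        del self._name_to_accession[record["identifier"]]
        record["identifier"] = new_identifier
        self._name_to_accession[new_identifier] = accession

    def get(self, key):
        """Look up by identifier or accession; return None if not found."""
        accession = self._name_to_accession.get(key, key)
        return self._by_accession.get(accession)
```

After `index.add("P04049", "KRAF_HUMAN", "Raf-1 oncogene, Homo sapiens")` and a hypothetical `index.rename("P04049", "RAF1_HUMAN")`, a lookup by the old identifier fails, while a lookup by the accession code still finds the entry. This is why accession codes are the safer thing to cite.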

• A software system is needed to perform searches such as
– all entries with a keyword (e.g. “GTPase”)
– entries with a given literature reference (by author or
article)
– all proteins with a keyword (e.g. “ribosomal”)
• Two examples of such software systems:
– SRS, the Sequence Retrieval System
– ENTREZ
• SRS:
– Sequence Retrieval System
– Developed by EBI
– System for integrating heterogeneous databases
– Web oriented system, accessed through HTML pages & Common Gateway
Interface (CGI) scripts
• ENTREZ:
– Developed at NCBI & accessible at the NCBI Entrez site
– Provides search facilities for a large number of databases & links
between them
– Provides a well defined web interface
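
A keyword search of the kind listed above can also be sent to Entrez programmatically through NCBI's E-utilities, which are not covered in the slides. The sketch below only builds the request URL for the ESearch utility and does not contact the server. The helper name is hypothetical, and the database name and search term are taken from the keyword example above.

```python
from urllib.parse import urlencode

# Base address of the ESearch utility in NCBI's E-utilities.
EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

def build_esearch_url(db, term, retmax=20):
    """Return an ESearch URL for searching `term` in Entrez database `db`.

    retmax limits how many record IDs the server returns.
    """
    params = {"db": db, "term": term, "retmax": retmax}
    return EUTILS_ESEARCH + "?" + urlencode(params)

url = build_esearch_url("protein", "GTPase")
```

Fetching `url` with any HTTP client returns the IDs of the matching records, by default as XML. Separating URL construction from the network call keeps the example testable offline.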
