Bioinformatics Lecture Notes Database

The document provides an introduction to bioinformatics, emphasizing its role in managing biological data through computational techniques. It outlines the aims, applications, and importance of biological databases, detailing various types of databases such as primary and secondary databases, along with examples like GenBank and EMBL. Additionally, it discusses gene annotation and the significance of accession numbers in identifying sequences within databases.

Introduction to bioinformatics (databases)
Course Code – BOTY 4204
Course Title – Techniques in plant sciences, biostatistics and bioinformatics

By – Dr. Alok Kumar Shrivastava

Department of Botany
Mahatma Gandhi Central University, Motihari
What is bioinformatics?
• In biology, bioinformatics is defined as "the use of computers to store, retrieve, analyse or predict the composition or structure of biomolecules".
• Bioinformatics is the application of computational techniques and information technology to the organisation and management of biological data. Classical bioinformatics deals primarily with sequence analysis.
Aims of bioinformatics
• Development of databases containing all biological information.
• Development of better tools for data designing, annotation and mining.
• Design and development of drugs by using simulation software.
• Design and development of software tools for protein structure prediction, function annotation and docking analysis.
• Creation and development of software to improve tools for analysing sequences for their function and similarity with other sequences.
Applications of bioinformatics
• Crop development
• Antibiotic resistance
• Gene therapy
• Drought resistance
• Drug designing
• Forensic analysis
• Medicine
• Biotechnology
• Weather analysis
• Evolutionary studies
• Waste cleanup
• Veterinary science
Biological databases
• Biological data are complex, exception-ridden, vast and incomplete. Therefore several databases have been created and interpreted to ensure unambiguous results.
• A collection of biological data arranged in computer-readable form, which enhances the speed of search and retrieval and is convenient to use, is called a biological database.
• A good database must have updated information.
Importance of biological databases
• A range of information such as biological sequences, structures, binding sites, metabolic interactions, molecular action, functional relationships, protein families, motifs and homologues can be retrieved by using biological databases.
• The main purpose of a biological database is to store and manage biological data and information in computer-readable form.
Types of biological database
• Primary database
• Secondary database
• Derived database

Database categories and examples
• Nucleotide sequence database: GenBank, DDBJ, EMBL
• Protein sequence database: Swiss-Prot, PIR, GenPept
• Protein structure database: PDB, EBI-MSD, MMDB
• Domain and motif database: PROSITE, Blocks, COG
• Structure database: SCOP, CATH
• Gene expression database: GEO, GXD
• Metabolic pathway database: KEGG, PathDB
• Specialised database: TGI, GSOB, GPCRDB
Primary database vs. secondary database
• A primary database contains only sequence or structural information.
• Databases derived from the analysis or treatment of primary data are secondary databases. They are very important for inferring protein function.
Examples of some primary biological databases
GenBank
• One of the fastest growing repositories of known nucleotide sequences, GenBank (Genetic Sequence Databank) has a flat file structure. Each entry is an ASCII text file, readable by both humans and computers. Besides sequence data, GenBank files contain information such as accession numbers and gene names, phylogenetic classification and references to published literature.
• This database has been developed and maintained at the NCBI, Bethesda, MD, USA, as a part of the International Nucleotide Sequence Database Collaboration (INSDC).
• It is an open access sequence database.
• It coordinates with individual laboratories and other sequence databases like EMBL and DDBJ.
• It is an annotated collection of all nucleotide sequences that are available to the public.
• The nucleotide database was divided into three databases at NCBI: the CoreNucleotide database, Expressed Sequence Tag (EST) and Genome Survey Sequence (GSS).
• The CoreNucleotide database holds most of the nucleotide sequences in use. It contains all nucleotide records that are not in the EST and GSS databases.
• Submission of sequences to GenBank can be done using the BankIt, Sequin and tbl2asn tools.
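
Because a flat-file entry is plain ASCII text, it can be read with very little code. The following is a minimal sketch, assuming the usual flat-file layout in which each keyword (ACCESSION, DEFINITION, ORIGIN) starts at the beginning of a line, the sequence follows ORIGIN, and "//" ends the record. The record itself, the accession X00000 and the function name parse_flat_file are made up for illustration and are not taken from the lecture or from a real entry.

```python
# Hypothetical record laid out like a GenBank flat file (not a real entry).
SAMPLE_RECORD = """\
LOCUS       DEMO0001                  24 bp    DNA     linear   PLN 01-JAN-2000
DEFINITION  Hypothetical example sequence (not a real entry).
ACCESSION   X00000
ORIGIN
        1 atggccaaag gtctgaccta ataa
//
"""


def parse_flat_file(text):
    """Return the accession, definition and sequence of one flat-file record."""
    record = {"accession": None, "definition": None, "sequence": ""}
    in_sequence = False
    for line in text.splitlines():
        if line.startswith("//"):
            # "//" marks the end of the record.
            break
        if in_sequence:
            # Sequence lines hold a position number followed by blocks of bases.
            record["sequence"] += "".join(line.split()[1:]).upper()
        elif line.startswith("ACCESSION"):
            record["accession"] = line.split()[1]
        elif line.startswith("DEFINITION"):
            record["definition"] = line[len("DEFINITION"):].strip()
        elif line.startswith("ORIGIN"):
            # Everything after ORIGIN is sequence data.
            in_sequence = True
    return record


record = parse_flat_file(SAMPLE_RECORD)
print(record["accession"], len(record["sequence"]))
```

Real entries carry many more fields (features, references, taxonomy), so an established parser is the safer choice for actual work; the sketch only shows why the format is readable by both humans and computers.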
EMBL (European Molecular Biology Laboratory)
• A comprehensive database of DNA and RNA sequences, the EMBL nucleotide sequence database is collected from the scientific literature and patent offices, and from direct submissions by researchers. EMBL has been prepared in collaboration with GenBank (USA) and the DNA Database of Japan (DDBJ).
• It was established in 1980.
• It is maintained by the EBI (European Bioinformatics Institute).
Swiss-Prot
• This is a curated protein sequence database that offers a high level of integration with other databases and also has a very low level of redundancy. Swiss-Prot strives to provide protein sequences with a high level of annotation (for instance, the description of protein function, domain structure and post-translational modifications).
• It was established in 1986 and has been maintained collaboratively, since 1987, by the Department of Medical Biochemistry of the University of Geneva and the EMBL Data Library.
• TrEMBL is a computer-annotated supplement of Swiss-Prot that contains all translations of EMBL nucleotide sequence entries that are not yet integrated into Swiss-Prot.
• Currently Swiss-Prot has 0.5 million and TrEMBL has 7.6 million sequences.
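
The idea of a protein entry being a "translation" of a nucleotide entry can be illustrated with a short sketch. It assumes the standard genetic code and a coding sequence that is already in frame; the function name translate and the example sequence are hypothetical, and this is not a description of how TrEMBL itself is produced.

```python
from itertools import product

# Standard genetic code, with codons ordered by the bases T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(nucleotides):
    """Translate an in-frame coding sequence, stopping at the first stop codon."""
    sequence = nucleotides.upper().replace("U", "T")  # accept RNA as well as DNA
    protein = []
    for start in range(0, len(sequence) - 2, 3):
        # Unknown or ambiguous codons are reported as "X".
        amino_acid = CODON_TABLE.get(sequence[start:start + 3], "X")
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate("ATGGCCAAATAA"))
```

The example codes for methionine, alanine and lysine followed by a stop codon, so the printed protein is three residues long.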
Protein Information Resource (PIR)
• PIR is an integrated public bioinformatics resource that supports genomic and proteomic research and scientific studies. Nowadays, PIR offers a wide variety of resources mainly oriented towards assisting the propagation and consistency of protein annotations, such as PIRSF, iProClass and iProLINK.
Examples of some secondary biological databases
Motif databases
• A protein sequence motif is a set of conserved amino acid residues that are important for protein function and are located within a certain distance from one another. These motifs usually provide clues to the functions of otherwise uncharacterised proteins.
• The PROSITE database consists of documentation entries describing protein domains, families and functional sites, as well as associated patterns and profiles to identify them.
• PRINTS is a database of protein fingerprints. A fingerprint is a group of conserved motifs used to characterise a protein family.
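
A pattern of the kind stored in PROSITE can be turned into a regular expression and searched against a protein sequence. The sketch below assumes the commonly described pattern notation (elements separated by "-", "x" for any residue, "[...]" for allowed residues, "{...}" for excluded residues, "(n)" or "(n,m)" for repeats, "<" and ">" for the sequence ends) and uses the N-glycosylation site pattern N-{P}-[ST]-{P} as its example. The function name prosite_to_regex and the test sequence are hypothetical.

```python
import re


def prosite_to_regex(pattern):
    """Convert a PROSITE-style pattern into a Python regular expression."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        at_start = element.startswith("<")   # anchored to the N-terminus
        at_end = element.endswith(">")       # anchored to the C-terminus
        element = element.strip("<>")
        # Split an optional repeat count such as x(2,4) from the residue part.
        core, low, high = re.fullmatch(
            r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element
        ).groups()
        if core == "x":
            core = "."                       # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"   # any residue except those listed
        # "[ABC]" and single letters are already valid regex fragments.
        if low:
            core += "{" + low + ("," + high if high else "") + "}"
        parts.append(("^" if at_start else "") + core + ("$" if at_end else ""))
    return "".join(parts)


regex = prosite_to_regex("N-{P}-[ST]-{P}.")
sequence = "MKNGSAANPSL"  # made-up sequence for illustration
hits = [match.start() for match in re.finditer(regex, sequence)]
print(regex, hits)
```

In the made-up sequence only the first asparagine matches, because the second one is followed by proline. Note that re.finditer reports non-overlapping matches only, so a fuller scanner would need a lookahead or a sliding window to catch overlapping sites.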
Domain databases
• A protein domain is an independently folded, structurally compact unit that forms a stable three-dimensional structure and shows a certain level of evolutionary conservation. Typically, a conserved domain contains one or more motifs.
• ProDom is a protein domain database automatically generated from the Swiss-Prot and TrEMBL sequence databases.
• SMART is a highly reliable and sensitive tool for domain identification.
• COG is a database and a convenient tool for motif and domain identification.
3D structure databases
• PDB (Protein Data Bank) is the main primary database for 3D structures of biological macromolecules determined by X-ray crystallography and NMR. It also accepts experimental data used to determine the structures and homology models.
• SCOP (Structural Classification of Proteins database) classifies protein 3D structures in a hierarchical scheme of structure classes. All the protein structures in PDB are classified here, and the updated new structures are deposited in PDB.
• The CATH database (Class, Architecture, Topology, Homologous) contains a hierarchical classification of protein domain structures.
Protein Data Bank
• PDB (Protein Data Bank) is a repository for 3D structural data of proteins and nucleic acids obtained by X-ray crystallography or NMR spectroscopy.
• The Research Collaboratory for Structural Bioinformatics (RCSB) PDB provides a variety of tools and resources for studying the structures of biological macromolecules and their relationships with sequences, functions and any associated diseases.
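
To make "3D structural data" concrete, the sketch below reads one atom record in the fixed-column text layout of the legacy PDB file format, where the atom serial number, atom name, residue name, chain, residue number and x, y, z coordinates each occupy set columns. The column positions are stated here as an assumption based on that format description, and the sample line, its coordinates and the function name parse_atom_line are made up for illustration.

```python
def parse_atom_line(line):
    """Extract the main fields of a fixed-column ATOM record (0-based slices)."""
    return {
        "serial": int(line[6:11]),           # columns 7-11
        "atom": line[12:16].strip(),         # columns 13-16
        "residue": line[17:20].strip(),      # columns 18-20
        "chain": line[21],                   # column 22
        "residue_number": int(line[22:26]),  # columns 23-26
        "x": float(line[30:38]),             # columns 31-38
        "y": float(line[38:46]),             # columns 39-46
        "z": float(line[46:54]),             # columns 47-54
    }


# Hypothetical ATOM line, built with a format string so the columns are exact.
SAMPLE_LINE = "ATOM  {:>5} {:<4} {:>3} {}{:>4}    {:>8.3f}{:>8.3f}{:>8.3f}".format(
    1, " N", "MET", "A", 1, 27.340, 24.430, 2.614
)

atom = parse_atom_line(SAMPLE_LINE)
print(atom["residue"], atom["chain"], atom["x"], atom["y"], atom["z"])
```

Slicing by column rather than splitting on whitespace matters here, because neighbouring fields can touch when values are wide.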
Gene expression databases
• GEO (Gene Expression Omnibus) is a curated online repository of gene expression and molecular abundance data that supports browsing, query and retrieval.
• GXD (Gene Expression Database) is a community resource for gene expression information.
• MGED (Microarray Gene Expression Data) contains microarray data generated by functional genomics and proteomics experiments.
• ArrayExpress, from the European Bioinformatics Institute, is a repository for transcriptomics data.
Metabolic pathway databases
• The KEGG PATHWAY database contains graphical pathway maps for all known metabolic pathways from various organisms.
• EcoCyc is an E. coli database that stores information on the genome and biochemical machinery of E. coli.
• LIGAND is a chemical database for enzyme reactions at the Institute for Chemical Research, Kyoto. It is a composite database currently consisting of the COMPOUND, DRUG, GLYCAN, REACTION, RPAIR and ENZYME databases.
• MetaCyc is a non-redundant, experimentally elucidated metabolic pathway database.
• BRENDA is an enzyme database that contains information on all aspects of enzymes and enzymatic reactions.
Genome databases
• Genome databases give complete information on the heritable properties of an organism. These databases help to identify genes and predict their functions. A few genome databases have links with specific organism databases.
• GOLD (Genomes Online Database, at the University of Illinois, USA) contains a list of all the complete and ongoing genome projects worldwide.
• Genomes at NCBI (National Center for Biotechnology Information, USA).
• TIGR database (TDB), at The Institute for Genomic Research, Rockville, MD, USA.
Virological databases
• A virological database contains all the sequences and related information of viruses of animals, plants, bacteria, fungi and archaea; for example, the HIV protease database.
• A committee called the International Committee on Taxonomy of Viruses (ICTV) authorises and organises the taxonomic classification of viruses. ICTVdB contains taxonomic information for thousands of virus species.
World biodiversity databases
• Taxonomic databases are built to document all known species and make them available and accessible worldwide. These databases contain taxonomic hierarchies, species names, synonyms, descriptions, illustrations and references. For example: CCINFO, STRAIN and ALGAE.
Databases for various model organisms
• Escherichia coli - E. coli Genome Center (University of Wisconsin, USA), The E. coli Index (University of Birmingham, UK)
• Arabidopsis thaliana - TAIR (The Arabidopsis Information Resource)
• Homo sapiens - Human Genome Resources at NCBI, USA
• Oryza sativa (rice) - RGP (Rice Genome Research Programme, Japan)
• Drosophila melanogaster - FlyBase (Drosophila Genome Database)
• Mus musculus (mouse) - Mouse Genome Informatics
• Danio rerio (zebrafish) - ZFIN (Zebrafish Information Network, at the University of Oregon, USA)
• Saccharomyces cerevisiae (baker's yeast) - SGD (Saccharomyces Genome Database)
Annotation of genes/genomes
In molecular biology, ????? make the basic genetic material and typically consist of DNA. The genome includes the genes (coding regions) and the non-coding regions; of interest to us are the coding regions, as they actively influence basic life processes. Genes contain useful biological information that is required for building up and maintaining an organism. Gene annotation can be defined simply as the process of making a nucleotide sequence meaningful.
Gene annotation involves taking the raw DNA sequence produced by genome sequencing projects and adding the layers of analysis and interpretation necessary to extract biologically significant information and place such derived details into context. Annotation is the process by which pertinent
Accession number
Accession numbers are unique identifiers which permanently identify sequences in the database. Accession numbers are assigned and communicated to authors within two working days of the receipt of submission.
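
A simple format check can catch mistyped accession numbers before a database is queried. The sketch below assumes the letter and digit layouts commonly documented for GenBank accessions (one letter plus five digits, two letters plus six or eight digits for nucleotide records; three letters plus five or seven digits for protein records), with an optional ".version" suffix. These layouts are not stated in the lecture, and the function name classify_accession and the example identifiers are hypothetical.

```python
import re

# Assumed accession layouts; other databases use different schemes.
NUCLEOTIDE = re.compile(r"^(?:[A-Z]\d{5}|[A-Z]{2}\d{6}|[A-Z]{2}\d{8})(?:\.\d+)?$")
PROTEIN = re.compile(r"^(?:[A-Z]{3}\d{5}|[A-Z]{3}\d{7})(?:\.\d+)?$")


def classify_accession(accession):
    """Guess the record type from the shape of an accession number."""
    accession = accession.strip().upper()
    if NUCLEOTIDE.match(accession):
        return "nucleotide"
    if PROTEIN.match(accession):
        return "protein"
    return "unrecognised"


for example in ("U12345", "AF123456.1", "AAA12345", "12345"):
    print(example, classify_accession(example))
```

A check like this only tests the shape of the identifier; whether the record actually exists can be confirmed only by looking it up in the database.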
