
2024.HF BioInformatics Lec3p

The document provides an overview of biological databases, explaining their structure, types, and importance in bioinformatics. It details various database formats, including flat file and XML, and lists significant biological databases such as GenBank and Ensembl. Additionally, it discusses the types of information that can be obtained from database searches related to genes, including evolutionary, genomic, structural, expression, and functional data.


2/11/24

Bioinformatics: Lecture 3
Databases

Dr. Hossein Fallahi
Dep. of Biology,
School of Sciences,
Razi University
Kermanshah
Iran

Databases

• A very simple form of (non-electronic) database is a filing cabinet. In the filing cabinet,
you can store many different records (sheets of paper), each containing multiple data
elements.

• Example: a filing cabinet of invoices
  • the filing cabinet is a table
  • the columns are the fields of data on the individual invoices (customer, product, price,
    quantity)
  • the rows (records) are the individual invoices

• The biggest problem with a filing cabinet is that you can only store your data one way
(e.g., in alphabetical order of the customer’s last name), and there’s no good way of
searching your files based on any other criteria (say, by product ordered).
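
The limitation described above is what an electronic database removes. The following is a minimal sketch, assuming Python's built-in sqlite3 module and made-up invoice data (the customers, products and the function names are illustrative, not from the lecture). It stores the invoices once and then searches them by product rather than by customer.

```python
import sqlite3

# Hypothetical invoice data: (customer, product, price, quantity).
INVOICES = [
    ("Adams", "pipette tips", 12.50, 4),
    ("Baker", "agarose", 89.00, 1),
    ("Clark", "pipette tips", 12.50, 10),
]


def build_invoice_db():
    """Create an in-memory table whose columns are the invoice fields."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE invoices "
        "(customer TEXT, product TEXT, price REAL, quantity INTEGER)"
    )
    conn.executemany("INSERT INTO invoices VALUES (?, ?, ?, ?)", INVOICES)
    return conn


def customers_who_ordered(conn, product):
    """Search by product, a criterion the cabinet's filing order cannot serve."""
    rows = conn.execute(
        "SELECT customer FROM invoices WHERE product = ? ORDER BY customer",
        (product,),
    )
    return [row[0] for row in rows]


conn = build_invoice_db()
print(customers_who_ordered(conn, "pipette tips"))  # ['Adams', 'Clark']
```

The same table could be queried by price or quantity without re-filing anything, which is the point of the comparison.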

What is a biological database?

• Biological databases are libraries of life sciences information, collected
from scientific experiments, published literature, high-throughput
experimental technologies and computational analysis.
• They contain information from genomics, proteomics and microarray gene
expression studies.
• Information contained in biological databases includes gene function,
structure, localization (both cellular and chromosomal), biological
sequences and structures.


Some database terms

Tables (entities)
• basic elements of information to track, e.g., gene, organism, sequence,
citation
Columns (fields)
• attributes of tables, e.g., for a citation table: title, journal, volume, author
Rows (records)
• actual data
• whereas fields describe what data is stored, the rows of a table are where
the actual data is stored
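
To make the distinction between fields and records concrete, here is a short sketch using the citation table named above. It assumes Python's built-in sqlite3 module, and the two inserted citations are placeholders, not real references.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The table (entity) is "citation"; its columns (fields) say WHAT is stored.
conn.execute(
    "CREATE TABLE citation (title TEXT, journal TEXT, volume INTEGER, author TEXT)"
)

# Each row (record) holds the actual data. These values are placeholders.
conn.executemany(
    "INSERT INTO citation VALUES (?, ?, ?, ?)",
    [
        ("Example title A", "Example Journal", 1, "Author A"),
        ("Example title B", "Example Journal", 2, "Author B"),
    ],
)

cursor = conn.execute("SELECT * FROM citation")
fields = [column[0] for column in cursor.description]  # column names
records = cursor.fetchall()                            # stored rows

print(fields)
for record in records:
    print(record)
```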


Flat File Storage Data Formats

• When GenBank, EMBL and DDBJ formed a collaboration (1986), sequence
databases had moved to a defined flat file format with a shared feature
table format and annotation standards.

• The flat file formats from the sequence databases are still used to access
and display sequence and annotation.

• They are also convenient for storage of local copies.
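
As an illustration of why flat files are easy to work with locally, the sketch below reads a GenBank-style record with plain string handling. It is a simplified reader under stated assumptions, not a full implementation of the format: the sample record is invented for this example (EXAMPLE01 is not a real accession), indented sub-keywords and feature lines are simply folded into the preceding field, and the function name is hypothetical.

```python
# A made-up record in GenBank-style layout; not a real database entry.
SAMPLE = """\
LOCUS       EXAMPLE01                 30 bp    DNA     linear
DEFINITION  Hypothetical example record, not a real
            database entry.
ACCESSION   EXAMPLE01
ORIGIN
        1 agcacatgac atgagcagtg ccccaaatga
//
"""


def parse_flat_file(text):
    """Parse GenBank-style flat file text into a list of dicts (simplified)."""
    records = []
    fields, key, in_sequence, seq = {}, None, False, []
    for line in text.splitlines():
        if line.startswith("//"):              # end-of-record marker
            fields["SEQUENCE"] = "".join(seq)
            records.append(fields)
            fields, key, in_sequence, seq = {}, None, False, []
        elif in_sequence:
            # Sequence lines: a position number, then blocks of bases.
            seq.extend(part for part in line.split() if not part.isdigit())
        elif line.startswith("ORIGIN"):        # start of the sequence section
            in_sequence = True
        elif line[:1].isspace():               # continuation of previous field
            if key is not None:
                fields[key] += " " + line.strip()
        elif line.strip():                     # new top-level keyword
            key, _, value = line.partition(" ")
            fields[key] = value.strip()
    return records


record = parse_flat_file(SAMPLE)[0]
print(record["ACCESSION"], len(record["SEQUENCE"]))
```

For real work, an established parser is the safer choice; this sketch only shows that the keyword-plus-value layout can be read line by line.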


XML format

A nucleotide sequence record, encoded in XML

<?xml version="1.0" encoding="UTF-8"?>
<Sequence>
  <accession>NM-171533</accession>
  <organism>Caenorhabditis elegans</organism>
  <sequence-data>
    agcacatgacatgagcagtgccccaaatgatgactgtgagatcgacaaggg
    aacaccttctaccgcttcactttttacaacgctgatgctcagtcaaccatcttcttct
    acagctgttttacagtgtacatattgtggaagctcgtgcacatcttcccaattgca
    aacatgtttattctg
    [Full sequence has been omitted for brevity.]
  </sequence-data>
</Sequence>
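
Because every value in the XML record is wrapped in a named tag, a program can pull out fields by name. The sketch below assumes Python's standard xml.etree.ElementTree module and a shortened copy of the record above (only the first sequence line is kept); the function name is illustrative.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the record shown above (first sequence line only).
RECORD = """<?xml version="1.0" encoding="UTF-8"?>
<Sequence>
  <accession>NM-171533</accession>
  <organism>Caenorhabditis elegans</organism>
  <sequence-data>
    agcacatgacatgagcagtgccccaaatgatgactgtgagatcgacaaggg
  </sequence-data>
</Sequence>
"""


def parse_sequence_record(xml_bytes):
    """Extract accession, organism and sequence from one <Sequence> record."""
    root = ET.fromstring(xml_bytes)
    raw_sequence = root.findtext("sequence-data", default="")
    return {
        "accession": root.findtext("accession"),
        "organism": root.findtext("organism"),
        # Remove line breaks and indentation inside the sequence block.
        "sequence": "".join(raw_sequence.split()),
    }


record = parse_sequence_record(RECORD.encode("utf-8"))
print(record["accession"], record["organism"], len(record["sequence"]))
```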


Biological Databases

• Over 1000 biological databases
• Vary in size, quality, coverage, level of interest
• Many of the major ones are covered in the annual Database Issue of Nucleic Acids Research
• What makes a good database?
  • A simple, easy-to-understand structure
  • A good interface with easy retrieval of data
  • Cross-referenced
  • Accurate
  • Comprehensive, but easy to search
  • Up to date
  • Annotated, but not “too annotated”
  • Batch search/download
  • Minimum redundancy


TYPES OF MOLECULAR DATABASES

• Primary Databases
  • Original submissions by experimentalists
  • Content controlled by the submitter

• Derivative Databases
  • Derived from primary data
  • Content controlled by a third party (e.g., NCBI)


Primary Databases

1. DNA Data Bank of Japan (National Institute of Genetics)
2. EMBL (European Bioinformatics Institute)
3. GenBank (National Center for Biotechnology Information)

All three databases:
1. are repositories for nucleotide sequence data from all organisms,
2. accept nucleotide sequence submissions,
3. exchange new and updated data on a daily basis to achieve optimal synchronization
between them.


Secondary Databases

1. RefSeq
2. SNP / Disease Databases
3. OMIM (Online Mendelian Inheritance in Man): inherited diseases
4. HapMap
5. 23andMe's database


Ten Important Bioinformatics Databases

Database     URL                    Content
GenBank      www.ncbi.nlm.nih.gov   nucleotide sequences
Ensembl      www.ensembl.org        human/mouse genome (and others)
PubMed       www.ncbi.nlm.nih.gov   literature references
NR           www.ncbi.nlm.nih.gov   protein sequences
SWISS-PROT   www.expasy.ch          protein sequences
InterPro     www.ebi.ac.uk          protein domains
OMIM         www.ncbi.nlm.nih.gov   genetic diseases
Enzymes      www.chem.qmul.ac.uk    enzymes
PDB          www.rcsb.org/pdb/      protein structures
KEGG         www.genome.ad.jp       metabolic pathways

Source: Bioinformatics for Dummies


What can be discovered about a gene by a database search?

• A little or a lot, depending on the gene


• Evolutionary information: homologous genes, taxonomic distributions, allele frequencies, synteny, etc.

• Genomic information: chromosomal location, introns, UTRs, regulatory regions, shared domains, etc.

• Structural information: associated protein structures, fold types, structural domains

• Expression information: expression specific to particular tissues, developmental stages, phenotypes, diseases, etc.
• Functional information: enzymatic/molecular function, pathway/cellular role, localization, role in diseases


General types of bioinformatics databases

1. Meta Databases
2. Nucleic Acid Databases
3. Amino Acid / Protein Databases
4. Additional Databases (carbohydrate, systems)
5. Specialized Databases (antibodies, barcode of life)
6. Wiki-Style Databases


Meta Databases

1. BioGraph: knowledge discovery based on the integration of more than 20 heterogeneous databases
2. Neuroscience Information Framework
3. ConsensusPathDB
4. GeneCards (Weizmann Inst.)
5. PathogenPortal
6. iRefIndex: provides an index of protein interactions
7. The Encyclopedia of DNA Elements (ENCODE)
8. Human Epigenome Atlas
9. Metascape


Gene Expression Databases (mostly microarray data)

1. ArrayExpress (European Bioinformatics Institute)


2. Gene Expression Omnibus (GEO, National Center for Biotechnology Information)
3. GPX (Scottish Centre for Genomic Technology and Informatics)
4. maxd (Univ. of Manchester)
5. Stanford Microarray Database (SMD) (Stanford University)
6. Genevestigator - Expression Search Engine (Nebion AG)
7. BioGPS (The Scripps Research Institute)
8. The European Genome-phenome Archive (EGA)
9. The Genotype-Tissue Expression (GTEx) project collects and analyzes thousands of tissues
from hundreds of healthy donors who are also densely genotyped.


Genome Databases

1. Gene Disease Database
2. SNPedia
3. CAMERA, resource for microbial genomics and metagenomics
4. Ensembl
5. Exome Aggregation Consortium (ExAC)
6. FlyBase, genome of Drosophila melanogaster
7. MGI Mouse Genome (Jackson Lab.)
8. RegulonDB, E. coli K-12
9. Repbase, database of repetitive elements (transposons)
10. Saccharomyces Genome Database
11. Viral Bioinformatics Resource Center
12. Xenbase, genome of the model organism Xenopus
13. WormBase, Caenorhabditis elegans
14. Zebrafish Information Network
15. TAIR, The Arabidopsis Information Resource
16. UCSC Malaria Genome Browser
17. RGD Rat Genome Database
18. The 1000 Genomes Project
19. Personal Genome Project
20. Legume Information System (LIS)


Protein Sequence Databases


Structure Databases


Protein Domain And Family Database


Pathway Databases
