Lesson 01 Intro DataBases V2

The document provides an overview of public databases in health and life sciences, focusing on the organization, structure, and access methods for biological data. It discusses the importance of controlled vocabularies, the distinction between primary and secondary databases, and various querying techniques for data retrieval. Additionally, it highlights the National Center for Biotechnology Information (NCBI) as a key resource for accessing a wide range of biological databases and tools.

Uploaded by marti.diez

Public Databases in Health and Life Sciences
“the potential to translate big data into big discovery”

Academic year 2019-2020


Lesson 1. Organizing biological knowledge in databases.

• Technical concepts and definitions.

• Different classifications of biological databases.
• Hierarchical organization of life and levels of annotation.

Practical session #1
• Introduction to the NCBI and the Entrez system
• Tools for online database search and retrieval (part I).
What is a database?

Dictionary definition

Database: a usually large collection of data organized especially for rapid search and retrieval (as by a computer).

The Oxford English Dictionary cites a 1962 technical report as the first to use the term "data-base."
What is a database?

A collection of related data, which are:

o Structured
o Searchable (index)
o Updated periodically (releases)
o Cross-referenced (hyperlinks)

You will need an appropriate database management system (DBMS)!

Query (keywords, sequences, gene ID) → db → analysis → function
How should databases be?
Taken from http://www.ebi.ac.uk/services

Principles of a sequence database:
o Open
o High quality
o Compatible
o Comprehensive
o Portable
What is a database?
– A collection of related data elements
• Tables
• columns (fields)
• rows (records)
• Documents
• Key -> Value
– Records are retrieved using a query language
– Database technology is well established

Relational Databases
Rows (records)
• The actual data: whereas fields describe what data is stored, the rows of a table are where the actual data is stored.

Columns (fields)
• Attributes of a table, e.g. for a citation table: title, journal, volume, author.
How is information organized in databases?

Accession numbers and Identifiers

An identifier is essentially the name of a database, table, or table column.
• As the creator of the database, you are free to name these objects as you please.
• The identifier can change (depending on the curator).

Each record (row in the table) has a unique identifier which, alone or combined with another column, is unique within that table: the primary key (accession number or accession code).
• The primary key should not change.
• Data is indexed according to this primary key.
• The unique identifier serves to identify the data stored in this record across all the tables in the database (relational database).
• It is usually a string of letters and digits that uniquely identifies an entry in its database.
– The accession number for TPIS_CHICK in UniProt/Swiss-Prot is P00940.
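To make the primary-key idea concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The table layout (`protein` with `accession`, `entry_name`, `organism` columns) and the helper functions are hypothetical, chosen only for illustration; the TPIS_CHICK / P00940 pair comes from the slide. The database refuses a second record with the same accession, while the curator-chosen entry name can change without breaking lookups by accession.

```python
import sqlite3

# In-memory database with a hypothetical "protein" table.
conn = sqlite3.connect(":memory:")
with conn:
    conn.execute(
        """
        CREATE TABLE protein (
            accession  TEXT PRIMARY KEY,  -- stable key: unique, should never change
            entry_name TEXT NOT NULL,     -- identifier chosen by the curator: may change
            organism   TEXT NOT NULL
        )
        """
    )


def insert_protein(conn, accession, entry_name, organism):
    """Insert one record; return False if the accession is already taken."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO protein VALUES (?, ?, ?)",
                (accession, entry_name, organism),
            )
        return True
    except sqlite3.IntegrityError:  # PRIMARY KEY uniqueness violated
        return False


def rename_entry(conn, accession, new_name):
    """Change the mutable identifier; the primary key stays the same."""
    with conn:
        conn.execute(
            "UPDATE protein SET entry_name = ? WHERE accession = ?",
            (new_name, accession),
        )


def lookup(conn, accession):
    """Fetch a record through its primary key (SQLite indexes it automatically)."""
    return conn.execute(
        "SELECT accession, entry_name, organism FROM protein WHERE accession = ?",
        (accession,),
    ).fetchone()


insert_protein(conn, "P00940", "TPIS_CHICK", "Gallus gallus")
```

Calling `insert_protein` again with accession `"P00940"` returns `False`, and after `rename_entry(conn, "P00940", "TPIS_NEWNAME")` (a made-up name) the record is still found by `lookup(conn, "P00940")`.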
Biological Databases
• Thousands of biological databases
• Many of the major ones are covered in the annual Database Issue of Nucleic Acids Research (NAR) (2018: 1737 listings), available at http://www.oxfordjournals.org/nar/database/c/
• Vary in size, structure, quality, coverage, level of
interest, data origin …
• Generally accessible through the web
Limitation and challenges of biological databases
• Each of the database resources contains a different subset of biological knowledge.

• There is no standard format

o Every database or program has its own format for storing or presenting data (e.g. http://current.geneontology.org/ontology/go-basic.obo)

• There is no standard nomenclature

o Every database has its own names (controlled vocabularies)

• Data is not fully optimized

o Some datasets have missing information without any indication of it

• Data errors
o Data is sometimes of poor quality, erroneous, or misspelled
o Errors propagate through computer-generated (automatic) annotation

• The integration of biological data remains an additional challenge

o Different DBMS
o Name of biological objects across databases (Controlled vocabularies)
o Biological databases are continuously changing
o Clash of concepts
Controlled vocabularies are a vital ingredient for annotating data stored in databases.
• Non-hierarchical controlled vocabularies
o The simplest type of controlled vocabulary is a non-hierarchical list of terms
• Thesaurus: a controlled and structured vocabulary in which concepts are represented by terms
• Taxonomy: a classification scheme
o You can find everything annotated as a sub-category of the search term
• Ontologies
o An ontology describes the categories of objects described in a body of data, the relationships between those objects, and the relationships between those categories
o E.g. the Gene Ontology (GO) describes the molecular functions, biological processes and cellular locations of gene products across all species, e.g. GO:0045892
Source: EMBL-EBI Train online, Bioinformatics for the terrified, https://www.ebi.ac.uk/training/online/
GO:0045892: negative regulation of transcription, DNA-templated
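The practical benefit of a hierarchical vocabulary is the one noted for taxonomies: a search for a term can also return everything annotated to its sub-terms. The sketch below assumes a tiny, hypothetical vocabulary (simplified term names loosely modeled on GO, not real GO identifiers or relationships) stored as child-to-parent links, plus hypothetical gene annotations.

```python
from collections import defaultdict

# Hypothetical vocabulary: term -> list of parent terms ("is_a" links).
PARENTS = {
    "regulation of gene expression": [],
    "regulation of transcription": ["regulation of gene expression"],
    "negative regulation of transcription": ["regulation of transcription"],
    "positive regulation of transcription": ["regulation of transcription"],
}

# Hypothetical annotations: gene -> terms it is annotated with.
ANNOTATIONS = {
    "geneA": ["negative regulation of transcription"],
    "geneB": ["positive regulation of transcription"],
    "geneC": ["regulation of gene expression"],
}


def build_children(parents):
    """Invert the parent links so the hierarchy can be walked downwards."""
    children = defaultdict(set)
    for term, term_parents in parents.items():
        for parent in term_parents:
            children[parent].add(term)
    return children


def subtree(term, children):
    """Return the term together with all of its descendants (iterative DFS)."""
    found = {term}
    stack = [term]
    while stack:
        current = stack.pop()
        for child in children.get(current, ()):
            if child not in found:
                found.add(child)
                stack.append(child)
    return found


def genes_annotated_under(term, parents, annotations):
    """Genes annotated to `term` or to any sub-term of it."""
    wanted = subtree(term, build_children(parents))
    return sorted(gene for gene, terms in annotations.items() if wanted.intersection(terms))
```

Searching for "regulation of transcription" returns `geneA` and `geneB` even though neither is annotated with that exact term; a flat, non-hierarchical list of terms could not do this.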
What is a flat file database?
• Sequential collection of entries, stored in a set of text files
• Flat-file databases can be represented as holding all of their data in one table only (a two-dimensional table)
• Files are written in plain text, in a standard defined format
• Often tab-delimited or comma-separated text files
• Each line is a record; fields are separated by delimiters (tabs, commas…)
• Or each file is a record, with fields expressed as key -> value (e.g. a JSON database)
• Searching issues! Without an index, a search has to read through every entry.

Accession     Source            Gene     Mol type
AF068625.2    Mus musculus      dnmt3a   mRNA
HD654844.1    Homo sapiens      hba1     mRNA
AD836734.3    Escherichia coli  recA     DNA
BD823723.5    Homo sapiens      hpo3     DNA
TF7823562.1   HIV               p17      cDNA
AS9832656.3   Homo sapiens      hbb      DNA
AF6723523.1   Danio rerio       egf2     mRNA
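A minimal sketch of reading a tab-delimited flat file with Python's standard `csv` module, using six of the example rows (the column names `accession`, `source`, `gene`, `mol_type` are assumptions). Because the file has no index, every query is a linear scan over all records, which is the searching issue mentioned in the list.

```python
import csv
import io

# Six of the example records, serialized as a tab-delimited flat file
# (header line first, one record per line).
RECORDS = [
    ("accession", "source", "gene", "mol_type"),
    ("AF068625.2", "Mus musculus", "dnmt3a", "mRNA"),
    ("HD654844.1", "Homo sapiens", "hba1", "mRNA"),
    ("AD836734.3", "Escherichia coli", "recA", "DNA"),
    ("BD823723.5", "Homo sapiens", "hpo3", "DNA"),
    ("AS9832656.3", "Homo sapiens", "hbb", "DNA"),
    ("AF6723523.1", "Danio rerio", "egf2", "mRNA"),
]
FLAT_FILE = "".join("\t".join(fields) + "\n" for fields in RECORDS)


def search_flat_file(text, field, value):
    """Return accessions whose `field` equals `value`.

    There is no index, so every line is parsed and compared (linear scan).
    """
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return [row["accession"] for row in reader if row[field] == value]
```

For example, `search_flat_file(FLAT_FILE, "source", "Homo sapiens")` reads all six records to return the three human accessions; doubling the file doubles the work.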

What is a relational database?
• A relational database contains multiple tables and defines the relationships
between them.
• Virtually all use SQL (Structured Query Language) as the language for querying and maintaining the data

invoice_id  customer  product            price   quantity  total
1           Elmer     buckshot           $2.00   2         $4.00
2           Wiley     Acme snow machine  $5.00   1         $5.00
3           Elmer     shotgun            $25.00  1         $25.00
4           Bugs      carrots            $0.50   20        $10.00

Database schema
customer_table
name   address           notes
Elmer  Looney Tunes Dr.  likes hunting and opera
Wiley  Southwest desert  big mail order customer
Bugs   Rabbit Hole       likes to cross dress

product_table
product            price   notes
carrots            $0.50
shotgun            $25.00  oddly flexible
buckshot           $2.00
Acme snow machine  $5.00   high defect rate
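The invoice example can be sketched with Python's `sqlite3` module. One design choice here is ours, not the slide's: the invoice table stores only customer, product, and quantity, and the price is looked up through a JOIN on the product table, so totals are computed rather than stored twice. Prices are kept as integer cents to avoid floating-point rounding; table and column names are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customer (name TEXT PRIMARY KEY, address TEXT, notes TEXT);
    CREATE TABLE product (
        product     TEXT PRIMARY KEY,
        price_cents INTEGER NOT NULL,   -- $2.00 is stored as 200
        notes       TEXT
    );
    CREATE TABLE invoice (
        invoice_id INTEGER PRIMARY KEY,
        customer   TEXT NOT NULL REFERENCES customer(name),
        product    TEXT NOT NULL REFERENCES product(product),
        quantity   INTEGER NOT NULL
    );
    INSERT INTO customer VALUES
        ('Elmer', 'Looney Tunes Dr.', 'likes hunting and opera'),
        ('Wiley', 'Southwest desert', 'big mail order customer'),
        ('Bugs',  'Rabbit Hole',      'likes to cross dress');
    INSERT INTO product VALUES
        ('carrots',           50,   NULL),
        ('shotgun',           2500, 'oddly flexible'),
        ('buckshot',          200,  NULL),
        ('Acme snow machine', 500,  'high defect rate');
    INSERT INTO invoice VALUES
        (1, 'Elmer', 'buckshot',          2),
        (2, 'Wiley', 'Acme snow machine', 1),
        (3, 'Elmer', 'shotgun',           1),
        (4, 'Bugs',  'carrots',           20);
    """
)

# Rebuild the flat invoice view by joining each invoice row to its product price.
invoice_rows = conn.execute(
    """
    SELECT i.invoice_id, i.customer, i.product, p.price_cents, i.quantity,
           p.price_cents * i.quantity AS total_cents
    FROM invoice AS i
    JOIN product AS p ON p.product = i.product
    ORDER BY i.invoice_id
    """
).fetchall()

# Aggregate across the relation: total spent per customer, in cents.
spent_per_customer = dict(
    conn.execute(
        """
        SELECT i.customer, SUM(p.price_cents * i.quantity)
        FROM invoice AS i
        JOIN product AS p ON p.product = i.product
        GROUP BY i.customer
        """
    ).fetchall()
)
```

The computed totals (400, 500, 2500, 1000 cents) reproduce the $4.00, $5.00, $25.00, $10.00 column of the invoice table; if a product's price changes, it is updated in one place only.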
A common way of storing biological data in a
structured manner is to use a relational database
tab1
Accession     Source            Gene     Mol type
AF068625.2    Mus musculus      dnmt3a   mRNA
HD654844.1    Homo sapiens      hba1     mRNA
AD836734.3    Escherichia coli  recA     DNA
BD823723.5    Homo sapiens      hpo3     DNA
TF7823562.1   HIV               p17      cDNA
AS9832656.3   Homo sapiens      hbb      DNA
AF6723523.1   Danio rerio       egf2     mRNA

tab2
Species           TaxID  Synonym
Homo sapiens      9606   Human
Mus musculus      10090  Mouse
Danio rerio       7955   Zebrafish
Escherichia coli  562    E. coli
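The same pattern applied to the two biological tables: a sketch with `sqlite3` (table and column names are assumptions) that links each sequence record to its species through the species name, so the TaxID and common-name synonym are stored only once. A `LEFT JOIN` keeps sequences whose source has no row in the species table, such as the HIV p17 record, and reports their TaxID as `None`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE species (
        species TEXT PRIMARY KEY,
        tax_id  INTEGER NOT NULL UNIQUE,
        synonym TEXT
    );
    CREATE TABLE sequence (
        accession TEXT PRIMARY KEY,
        source    TEXT NOT NULL,   -- matches species.species when known
        gene      TEXT,
        mol_type  TEXT
    );
    INSERT INTO species VALUES
        ('Homo sapiens', 9606, 'Human'),
        ('Mus musculus', 10090, 'Mouse'),
        ('Danio rerio', 7955, 'Zebrafish'),
        ('Escherichia coli', 562, 'E. coli');
    INSERT INTO sequence VALUES
        ('AF068625.2', 'Mus musculus', 'dnmt3a', 'mRNA'),
        ('HD654844.1', 'Homo sapiens', 'hba1', 'mRNA'),
        ('AD836734.3', 'Escherichia coli', 'recA', 'DNA'),
        ('BD823723.5', 'Homo sapiens', 'hpo3', 'DNA'),
        ('TF7823562.1', 'HIV', 'p17', 'cDNA'),
        ('AS9832656.3', 'Homo sapiens', 'hbb', 'DNA'),
        ('AF6723523.1', 'Danio rerio', 'egf2', 'mRNA');
    """
)

# Accession -> TaxID; None where the species table has no matching row.
tax_id_by_accession = dict(
    conn.execute(
        """
        SELECT s.accession, sp.tax_id
        FROM sequence AS s
        LEFT JOIN species AS sp ON sp.species = s.source
        """
    ).fetchall()
)

# Query one table through a column of the other: all "Human" sequences.
human_accessions = [
    row[0]
    for row in conn.execute(
        """
        SELECT s.accession
        FROM sequence AS s
        JOIN species AS sp ON sp.species = s.source
        WHERE sp.synonym = 'Human'
        ORDER BY s.accession
        """
    )
]
```

The missing TaxID for the HIV record is an example of the "missing information" problem listed earlier: the relation makes the gap visible instead of hiding it.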
Essential aspects of primary and secondary databases.

Primary database
• Synonyms: archival database
• Source of data: direct submission of experimentally derived data from researchers (database staff organize the data but don't add additional information)
• Once given a database accession number, the data in primary databases are never changed: they form part of the scientific record.

Secondary database
• Synonyms: curated database; knowledgebase
• Source of data: results of analysis, literature research and interpretation, often of data in primary databases
• Continuously updated (biocuration)

Source: EMBL-EBI Train online, Bioinformatics for the terrified, https://www.ebi.ac.uk/training/online/course/bioinformatics-terrified-2018/primary-and-secondary-databases
Definition and aims of biocuration

Biocuration involves the interpretation and integration of information relevant to biology into a database or resource that enables integration of the scientific literature as well as large data sets.

Primary goals of biocuration:
– Accurate and comprehensive representation of biological knowledge
– Easy access to this data for working scientists and a basis for computational analysis
How to access the data?

Database search and retrieval

A request for data from a database is called a query.

Queries can take three forms:

1. Choosing from a list of parameters

2. Query by example (QBE)

• A QBE wizard lets you build the query by specifying which data to display

3. Query language

• SQL (Structured Query Language)

How to access the data in public databases?

Human web interface (web based, small scale)

o Free text search

o The common search modes are keywords with modifiers, or identifiers
o Cross-references link the information of different databases ("click your way")
o You do not see the underlying database structure
o Output defined by the host/provider

Web services and programmatic data access

o Application Programming Interface (API)

o Programming utilities web services, to access the database programmatically

Download the data: File Transfer Protocol (FTP), rsync, HTTP

o Flat files (script based, bulk data download)

Database searching tips to choose from a list of parameters

• Use keywords and enclose phrases in double quotes

• Look for links to Help or Examples
• Boolean searches
Boolean logic consists of three logical operators:
OR
AND
NOT
• Wildcards or query truncation
• Advanced searches using search tags and fielded searching
Searching strategies: Boolean operators

AND restricts, OR expands, NOT filters
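Boolean operators behave like set operations on the lists of matching records: AND is an intersection (restrict), OR a union (expand), NOT a difference (filter). The sketch below uses a hypothetical inverted index of single-word terms and evaluates a query strictly left to right, with no parentheses or operator precedence; real search engines apply their own parsing rules.

```python
# Hypothetical inverted index: term -> set of record IDs containing that term.
INDEX = {
    "insulin": {1, 2, 3, 5, 7},
    "human": {1, 3, 4},
    "mouse": {2, 5, 6},
}

# Each Boolean operator expressed as a set operation.
OPERATORS = {
    "AND": lambda left, right: left & right,  # intersection: restrict
    "OR": lambda left, right: left | right,   # union: expand
    "NOT": lambda left, right: left - right,  # difference: filter out
}


def evaluate(query, index):
    """Evaluate 'term OP term OP term ...' left to right (single-word terms only)."""
    tokens = query.split()
    result = set(index.get(tokens[0], set()))
    # Operators sit at odd positions, terms at the even positions after the first.
    for operator, term in zip(tokens[1::2], tokens[2::2]):
        result = OPERATORS[operator](result, index.get(term, set()))
    return result
```

With this index, `"insulin AND human"` keeps records 1 and 3, `"human OR mouse"` widens the result to six records, and `"insulin NOT mouse"` drops the mouse records 2 and 5.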

Searching with wildcards or query truncation.

• Truncation: wildcards are useful if, for example, you wish to search for a group of words (e.g., all words starting with "cell" and ending with "ase") or if it is unclear how a word is spelled in a databank.
• Cell* (NCBI and EMBL)
• *ase (EMBL)
• *moglobin (EMBL)

• "?" matches exactly one character of any value (EMBL)

• nif? finds the gene names nifa, nifb, nifc, nifd, nife, but not words like Nifedipine

Note: placing a wildcard at the start of a word or string may increase the response time, because all words in the index have to be checked against your string.
Truncation can also give incomplete results: for example, cat* in PubMed, because PubMed expands a truncated term only to a limited number of variants.
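The note about leading wildcards can be illustrated with a sorted term index. In this sketch (a hypothetical, lower-cased index, with Python's standard `bisect` and `fnmatch` modules standing in for a real search engine), a trailing wildcard such as `cell*` is answered by a binary search followed by a short contiguous read, whereas patterns such as `*ase` or `nif?` have to be tested against every term in the index.

```python
import bisect
import fnmatch

# Hypothetical term index, lower-cased and kept sorted.
INDEX_TERMS = sorted([
    "cell", "cells", "cellular", "cellulase", "hemoglobin", "kinase",
    "nifa", "nifb", "nife", "nifedipine", "protease",
])


def prefix_search(prefix, terms):
    """Trailing wildcard (prefix*): binary search, then read matching neighbours."""
    prefix = prefix.lower()
    start = bisect.bisect_left(terms, prefix)
    matches = []
    for term in terms[start:]:
        if not term.startswith(prefix):
            break  # sorted order: no later term can share the prefix
        matches.append(term)
    return matches


def pattern_search(pattern, terms):
    """Any pattern with * and ? (e.g. *ase, nif?): every index term is tested."""
    pattern = pattern.lower()
    return [term for term in terms if fnmatch.fnmatchcase(term, pattern)]
```

`pattern_search("nif?", INDEX_TERMS)` returns `nifa`, `nifb`, `nife` but not `nifedipine`, because `?` stands for exactly one character; `prefix_search("cell", INDEX_TERMS)` touches only the four `cell…` entries.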
Fielded searching using any of the indexed fields (advanced searches)

Enter the phrase followed by a [field description]:

robotic surgery [title]
Miller MJ [author]
“protein domain” [TI]
human [Organism]
insulin [Protein Name]

Combining fielded searching with booleans

enzymes [TI] NOT Gonzales P [AU]
human [Organism] AND insulin [Protein Name]
Search field descriptions are different in each database

NCBI                          UniProt
NC_0000*[Accession]           accession:p62988
Human[Organism]               organism:human
horse[taxonomy]               taxonomy:40674
neoplasms[MeSHTerms]          keyword:neoplasms
prolactin[Protein Name]       name:"prion protein"
APOE[gene]                    gene:HSPC233
srcdb_refseq[Properties]      database:(type:pfam)
2010/06[Publication Date]     created:[20121001 TO *]
110:500[Sequence Length]      length:[100 TO 500]
gene_symbol[sym]              go:0015629
1.1.1.53[ecno]                ec:3.2.1.23
gbdiv_est[PROP]               reviewed:yes
…                             …

http://www.ncbi.nlm.nih.gov/entrez/query/static/help/Summary_Matrices.html#Search_Fields_and_Qualifiers
https://www.ncbi.nlm.nih.gov/books/NBK49540/
http://www.uniprot.org/help/query-fields
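Because each database uses its own field syntax, a small translation table can render the same search criteria for either resource and combine them with a Boolean operator. The templates below are a subset copied from the NCBI/UniProt comparison; the field keys (`organism`, `gene`, …) and the `render_query` helper are hypothetical, and the UniProt field names should be checked against its current query-fields help page before use.

```python
# Field templates taken from the NCBI / UniProt comparison (subset only).
FIELD_SYNTAX = {
    "ncbi": {
        "accession": "{}[Accession]",
        "organism": "{}[Organism]",
        "gene": "{}[gene]",
        "protein_name": "{}[Protein Name]",
    },
    "uniprot": {
        "accession": "accession:{}",
        "organism": "organism:{}",
        "gene": "gene:{}",
        "protein_name": "name:{}",
    },
}


def render_query(database, criteria, operator="AND"):
    """Join (field, value) pairs into one fielded query string for `database`."""
    templates = FIELD_SYNTAX[database]
    return f" {operator} ".join(
        templates[field].format(value) for field, value in criteria
    )
```

For `criteria = [("organism", "human"), ("gene", "APOE")]`, the NCBI rendering is `human[Organism] AND APOE[gene]` and the UniProt one is `organism:human AND gene:APOE`. Multi-word values keep their quotes if the caller supplies them, e.g. `'"prion protein"'`.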
The NCBI is a comprehensive website for biologists (database
of databases)
http://www.ncbi.nlm.nih.gov/gquery/

o The National Center for Biotechnology Information (NCBI)
o Created in 1988 as a part of the National Library of Medicine at NIH
o Establishes public databases
o Research in computational biology
o Develops software tools for sequence analysis
o Disseminates biomedical information
o Bethesda, Maryland
o Over 30 databases (primary, secondary, specialized, meta-databases, etc.)
A brief history of the National Center for Biotechnology Information's formation and growth

1956: US National Library of Medicine (NLM)
1984-1987: related political actions
November 1988: NCBI
1994: NCBI website
1997: NCBI introduces PubMed

http://www.ncbi.nlm.nih.gov/books/NBK148949/
The NCBI home page
http://www.ncbi.nlm.nih.gov/
NCBI hosts over 30 databases
How to access NCBI data?
Entrez: An Integrated Database Search and Retrieval System

Human Web interface (web based, small scale)

o Free text search

o List of identifiers (Batch Entrez)

Web service (Programmatic data access)

o Entrez Utilities Web Service (NCBI): The E-utilities

Searching sequence databases using a sequence query

o BLAST

File Transfer Protocol (FTP)

o Flat files (script based, bulk data download)

Entrez: An Integrated Database Search and Retrieval System

• Access all NCBI resources (Database Integration)

• Entrez Databases
• All Molecular Database entries are organized by
organism (Taxonomy Database).
• Each record is assigned a UID “unique integer
identifier” for internal tracking
• Each record is indexed by data fields: [author],
[title], [organism], and many others
• Each record is given a Document Summary
(DocSum).
• Each record is manually or computationally
assigned links to biologically related UIDs in and
across databases.
https://www.ncbi.nlm.nih.gov/sites/batchentrez
Ways to retrieve information from biological databases

Entrez Utilities Web Service (NCBI)

The E-utilities
• The Entrez Programming Utilities are tools that provide access to Entrez data outside of the regular web query interface
• ESearch: searches and retrieves primary IDs and retains results in the user's environment.
• EFetch: retrieves records from one or more primary IDs or from the user's environment.
• Also: EGQuery, EInfo, ELink, ESpell, ESummary

E-utilities Quick Start

http://www.ncbi.nlm.nih.gov/books/NBK25500/
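A minimal sketch of calling ESearch and EFetch over HTTPS with Python's standard library. The base URL and the `db`, `term`, `retmax`, `retmode`, `id`, and `rettype` parameters follow the E-utilities documentation, but the helper function names are ours; check NCBI's current usage guidelines (request-rate limits, identifying your tool and email, API keys) before running it at scale.

```python
import json
import urllib.parse
import urllib.request

EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def eutils_url(utility, **params):
    """Build an E-utilities URL, e.g. utility='esearch' -> .../esearch.fcgi?..."""
    return f"{EUTILS_BASE}{utility}.fcgi?{urllib.parse.urlencode(params)}"


def esearch_ids(db, term, retmax=20):
    """ESearch: run a query and return the list of matching primary IDs (UIDs)."""
    url = eutils_url("esearch", db=db, term=term, retmax=retmax, retmode="json")
    with urllib.request.urlopen(url) as response:  # network call
        data = json.load(response)
    return data["esearchresult"]["idlist"]


def efetch_fasta(db, ids):
    """EFetch: download the records for the given UIDs in FASTA format."""
    url = eutils_url("efetch", db=db, id=",".join(ids), rettype="fasta", retmode="text")
    with urllib.request.urlopen(url) as response:  # network call
        return response.read().decode()
```

A typical session would be `ids = esearch_ids("protein", "human[Organism] AND insulin[Protein Name]")` followed by `print(efetch_fasta("protein", ids[:5]))`. To keep results in the user's environment on NCBI's side instead of passing IDs around, ESearch also accepts `usehistory="y"` and returns `WebEnv` and `query_key` values that EFetch can reuse.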
BLAST (Basic Local Alignment Search Tool)

Altschul, et al. 1990, J Mol Biol, 215:403-10*

Altschul, et al. 1997, Nucleic Acids Res, 25(17): 3389–3402
(* one of the most highly cited papers)

• BLAST is an algorithm for finding regions of similarity between biological sequences, either proteins or nucleic acids

• BLAST compares a query sequence with a library or database of sequences, and identifies library sequences that resemble the query sequence above a certain threshold.

• BLAST home page: http://www.ncbi.nlm.nih.gov/BLAST

• BLAST is one of the most widely used bioinformatics programs
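BLAST's speed comes from starting with short exact "words" (seeds) shared by the query and the database sequences and only then extending them into alignments. The sketch below shows a toy version of that seeding step alone, on a hypothetical two-sequence database: there is no extension, no scoring matrix, and no E-value, so it is an illustration of the idea, not BLAST.

```python
from collections import defaultdict

# Hypothetical nucleotide "database".
DATABASE = {
    "seq1": "ACGTACGTGGA",
    "seq2": "TTTTCCCCGGGG",
}


def kmer_index(sequences, k):
    """Map every k-letter word to the (sequence name, position) pairs where it occurs."""
    index = defaultdict(list)
    for name, seq in sequences.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].append((name, i))
    return index


def seed_hits(query, index, k):
    """Exact word matches as (query position, subject name, subject position)."""
    hits = []
    for q in range(len(query) - k + 1):
        for name, s in index.get(query[q:q + k], ()):
            hits.append((q, name, s))
    return hits


def rank_by_seeds(query, sequences, k=4):
    """Count seed hits per database sequence, most hits first."""
    counts = defaultdict(int)
    for _, name, _ in seed_hits(query, kmer_index(sequences, k), k):
        counts[name] += 1
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))
```

The index is built once for the whole database, so each query only looks up its own words; sequences with no shared word (here `seq2`) are never examined further.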

NCBI: Taxonomy Database

398,955 species!
Each taxon has an ID, the TaxID
http://www.ncbi.nlm.nih.gov/taxonomy/
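Organizing records by taxonomy means each TaxID sits in a tree, so lineages and shared ancestors can be computed by following parent links. This sketch uses the four TaxIDs from the species table with a heavily simplified, hypothetical parent map: intermediate ranks are collapsed and internal nodes are keyed by name, so it is not the real NCBI tree.

```python
# Simplified parent map: TaxID or node name -> parent node name.
# Intermediate ranks are collapsed; this is NOT the full NCBI lineage.
PARENT = {
    9606: "Mammalia",     # Homo sapiens
    10090: "Mammalia",    # Mus musculus
    7955: "Vertebrata",   # Danio rerio
    562: "Bacteria",      # Escherichia coli
    "Mammalia": "Vertebrata",
    "Vertebrata": "Eukaryota",
    "Eukaryota": "cellular organisms",
    "Bacteria": "cellular organisms",
}


def lineage(node, parent=PARENT):
    """Path from a node up to the root, following parent links."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path


def lowest_common_ancestor(a, b, parent=PARENT):
    """First node on b's lineage that is also on a's lineage."""
    ancestors_of_a = set(lineage(a, parent))
    for node in lineage(b, parent):
        if node in ancestors_of_a:
            return node
    return None
```

Human and mouse meet at `Mammalia`, human and zebrafish at `Vertebrata`, and human and E. coli only at `cellular organisms`.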
NCBI: Molecular Sequence Databases
Sequence databases (primary)
• Nucleotide (GenBank)
• PopSet
• SRA, GSS
• Protein
Marker databases
• Single Nucleotide Polymorphisms (SNPs, dbSNP)
• Sequence Tagged Sites (STSs, dbSTS)
• Expressed Sequence Tags (ESTs, dbEST)

https://www.ncbi.nlm.nih.gov/nuccore/
NCBI: Derivative Databases
• Nucleotide-derived: e.g. RefSeq, Gene
• Protein-derived: e.g. CDD
• Structure-derived: e.g. Structure
• Human curated (compilation and correction of data): e.g. RefSeq
• Computationally derived: e.g. UniGene
• Combinations: e.g. NCBI Genome Assembly

https://www.ncbi.nlm.nih.gov/refseq/
NCBI: Derivative Databases

https://www.slideshare.net/KavisaGhosh/ncbi
The EMBL-EBI
http://www.ebi.ac.uk/

The EMBL-EBI's search engine

The EMBL-EBI nucleotide repository: ENA
http://www.ebi.ac.uk/ena/
