Data Retrieval System: Text-Based Database Searching

Three data retrieval systems are of particular relevance to molecular biologists: Entrez (at NCBI), searchable by GI (GenInfo Identifier) or accession number; the Sequence Retrieval System, SRS (at EBI); and DBGET/LinkDB (at GenomeNet, Japan). The advantage of these retrieval systems is that they not only return matches to a query but also provide handy pointers to additional important information in related databases.

Uploaded by

shikha

DATA RETRIEVAL SYSTEM

Text-based Database Searching

Submitted By:
Dr. Shikha Thakur
Assistant Professor (Guest Faculty)
TCSC
Mumbai
Maharashtra
• The amount of biologically relevant data accessible via the WWW is
increasing at a very rapid rate.
• It is important for scientists to have easy and efficient ways of wading
through the data and finding what is important for their research.
• Knowing how to access and search for information in the database is
essential.
Depending on the type of data at hand, there are
two basic ways of searching:
• Using descriptive words to search text databases.
• Using a nucleotide or protein sequence to search sequence
databases.
Text-Based Database Searching
• There are three important data retrieval systems of particular
relevance to molecular biologists:
• Entrez (at NCBI), searchable by GI (GenInfo Identifier) or accession no.
• Sequence Retrieval System, SRS (at EBI)
• DBGET/LinkDB (at GenomeNet, Japan)
• The advantage of these retrieval systems is that they not only return
matches to a query, but also provide handy pointers to additional
important information in related databases.
Text-Based Database Searching
• The three systems differ in the databases they search and the links
they provide to other information.
• In using any of these systems, queries can be as simple as entering
the accession number of a newly published sequence or as complex
as searching multiple database fields for specific terms.
Text-Based Database Searching
• Basic Search Concepts
• Boolean Search – An advanced query that combines two or more terms
using the Boolean operators AND, OR and NOT; the default is AND.
• Broadening the Search – If a search returns no useful entries,
change or remove terms.
• Narrowing the Search – If a search returns too many entries, add
terms or make the existing terms more specific.
• Proximity Searching – To search for multiword terms or phrases, place
quotes around the terms.
• Wild Card – A character prepended or appended to a search term to make
the search less specific, e.g., to find all authors whose last name
begins with Zav, search using Zav*.
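The search concepts above can be sketched as a small query builder. This is only an illustration: the helper names (phrase, prefix, combine) are hypothetical, and the exact operator and wildcard syntax accepted by a given retrieval system should be checked against its own help pages.

```python
def phrase(term: str) -> str:
    """Quote a multiword term so it is searched as one phrase."""
    return f'"{term}"' if " " in term else term


def prefix(stem: str) -> str:
    """Append a trailing wildcard so any term starting with stem matches."""
    return stem + "*"


def combine(operator: str, *terms: str) -> str:
    """Join terms with one Boolean operator and parenthesize the result."""
    if operator not in {"AND", "OR", "NOT"}:
        raise ValueError(f"unsupported operator: {operator}")
    return "(" + f" {operator} ".join(terms) + ")"


# Narrowing: every term must match.
narrow = combine("AND", phrase("ornithine carbamoyltransferase"),
                 phrase("Escherichia coli"))

# Broadening: either term may match.
broad = combine("OR", "Rhizobium", phrase("Escherichia coli"))

# Wildcard on an author name, combined with the narrowed query.
query = combine("AND", narrow, prefix("Zav"))
print(query)
```

Parenthesizing each combination keeps nested AND/OR expressions unambiguous, whatever precedence the target system applies.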
Entrez
• Entrez – is a molecular biology database and retrieval system
developed by the National Center for Biotechnology Information
(NCBI).
• It is an entry point for exploring distinct but integrated databases.
• (http://www.ncbi.nlm.nih.gov/Entrez/)
Entrez
• The Entrez system provides access to:
• Nucleotide sequence databases – GenBank/DDBJ/EMBL
• Protein sequence databases – Swiss-Prot, PIR, PRF, PDB, and translated
protein sequences from DNA sequence databases.
• Genome and chromosome mapping data
• Molecular modeling databases of 3-D structures.
• Literature database, PubMed – Provides excellent and easy access to
MEDLINE and pre-MEDLINE articles.
• Taxonomy database – Allows retrieval of DNA and protein sequences for
any taxonomic group.
• Specialized Databases – OMIM, dbSNP, UniSTS, etc.
Entrez
• The most valuable feature of Entrez is its exploitation of the
concept of ‘neighbouring’.
• Neighbouring allows related entries in different databases to be
linked to each other, whether or not they are cross-referenced directly.
• Neighbours and links are listed in order of similarity to the query.
• The similarity is based on pre-computed analysis of sequences,
structures and the literature.
Entrez
• One particularly useful feature of Entrez is Batch Entrez: the
ability to retrieve large sets of data based on some criterion and
to download them to a local computer.
• The sequences can then be worked on using the analytical tools
available on the local computer.
Entrez Features
1. Entrez Global Query – Search a subset of Entrez databases.
2. Batch Entrez – Upload a file of GI or accession numbers to retrieve
sequences.
3. Making Links to Entrez – Linking to PubMed and GenBank.
4. E-Utilities – Entrez programming utilities.
5. LinkOut – External links to related resources.
6. Cubby – Provides a stored-search feature to save and update
searches, and lets you customize your LinkOut display.
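As a rough sketch of how Batch Entrez and the E-Utilities fit together, the snippet below reads a list of identifiers (one per line, as in a Batch Entrez upload file) and builds an EFetch request URL for them. The base address and the db, id, rettype and retmode parameters are taken from NCBI's public E-utilities documentation rather than from these slides, so verify them before use; the helper names are hypothetical. No network request is made here.

```python
from urllib.parse import urlencode

# Assumption: the E-utilities base address as published by NCBI.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def read_id_list(text: str) -> list[str]:
    """Parse a Batch Entrez style list: one GI or accession number per line."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def efetch_url(db: str, ids: list[str],
               rettype: str = "fasta", retmode: str = "text") -> str:
    """Build an EFetch URL that requests the given records from one database."""
    params = {
        "db": db,
        "id": ",".join(ids),  # several IDs are sent as a comma-separated list
        "rettype": rettype,
        "retmode": retmode,
    }
    return f"{EUTILS_BASE}/efetch.fcgi?{urlencode(params)}"


# One accession, chosen for illustration, with stray blank lines in the file.
ids = read_id_list("\nP04391\n\n")
url = efetch_url("protein", ids)
print(url)
```

Passing the resulting URL to urllib.request.urlopen would perform the actual download; for large batches, NCBI's usage guidelines on request rates apply.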
SRS
• The Sequence Retrieval System (SRS) – A network browser for
databases in molecular biology.
• It is a powerful sequence information indexing, search and retrieval
system (http://srs.ebi.ac.uk/).
SRS
• SRS, developed at the European Bioinformatics Institute (EBI) at
Hinxton, UK, is a homogeneous interface to over 80 biological
databases.
• The types of databases included are sequence and sequence-related,
metabolic pathways, transcription factors, application results (e.g.,
BLAST), protein 3-D structure, genome, mapping, mutations, and
locus-specific mutations.
• One can access and query their contents and navigate among them.
SRS
• The Web page listing all the databases links each one to a
description page that includes the date of its last update.
• One can select one or more databases to search before entering the
query.
• Over 30 versions of SRS are currently running on the WWW. Each
includes a different subset of databases and associated analytical
tools.
SRS
• SRS Features:
• SRS databases are well indexed, which reduces the search time across
the large number of potential databases.
• SRS allows any flat-file database to be indexed to any other. The
advantage is that the derived indices can be searched rapidly, allowing
users to retrieve, link and access entries from all the interconnected
resources.
• The system has the particular strength that it can be readily
customized to use any defined set of databanks.
SRS
• Simple SRS queries
• By accession number
• Query on accession number: J00231
• By a single author or organism name: Ausubel, Rhizobium
• Boolean relations between keywords: and, or, but not
SRS
• Contd…
• Searching by dates: 01-Jan-1995:31-Dec-1995.
• Searching by size: 400:600
• Using hypertext links in an entry: Medline, Swiss-Prot and PDB
entries can be linked from within the EMBL database.
• Display of molecules via the RasMol plug-in
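The date and size queries above are range searches of the form low:high. The sketch below mimics that behaviour on a few in-memory records; it is not SRS code, the records and accession labels are hypothetical, and it assumes both ends of a range are inclusive.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Entry:
    accession: str   # hypothetical label, not a real database accession
    created: date
    length: int      # sequence length


def in_range(value, low, high) -> bool:
    """Return True when low <= value <= high (bounds assumed inclusive)."""
    return low <= value <= high


entries = [
    Entry("HYPO0001", date(1995, 3, 14), 450),
    Entry("HYPO0002", date(1995, 11, 2), 900),
    Entry("HYPO0003", date(1996, 1, 5), 500),
]

# Equivalent of combining 01-Jan-1995:31-Dec-1995 with size 400:600.
hits = [
    e.accession
    for e in entries
    if in_range(e.created, date(1995, 1, 1), date(1995, 12, 31))
    and in_range(e.length, 400, 600)
]
print(hits)
```

Combining the two ranges with "and" narrows the result set, in the same way as the Boolean relations between keywords.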
DBGET
• DBGET/LinkDB – An integrated bioinformatics database retrieval
system at GenomeNet, developed by the Institute for Chemical
Research, Kyoto University, and the Human Genome Center of the
University of Tokyo.
DBGET
• DBGET – Used to search and extract entries from a wide range of
molecular biology databases.
• LinkDB – Used to compute links between entries in different
databases.
• It is designed to be a network-distributed database system with an
open architecture, which is suitable for incorporating local databases
or establishing a server environment.
• http://www.genome.ad.jp/dbget/
DBGET
• DBGET/LinkDB is integrated with other search tools, such as BLAST,
FASTA and MOTIF, to conduct further retrievals instantly.
• DBGET provides access to about 20 databases, which are queried one
at a time. After querying one of these databases, DBGET presents
links to associated information in addition to the list of results.
• A unique feature of DBGET is its connection with the Kyoto
Encyclopedia of Genes and Genomes (KEGG) database – a database of
metabolic and regulatory pathways.
DBGET
• DBGET has three basic commands (or three basic modes in the Web
version), bfind, bget, and blink, to search and extract database
entries.
• bfind – Searches for entries by keywords.
• bget – Retrieves the database entries specified by the
combination dbname:identifier.
• blink – Retrieves entries linked to a given entry (the LinkDB
search).
• A notable feature of DBGET, different from other text search systems,
is that no keyword indexing is performed when a database is installed
or updated.
DBGET
• Instead, selected fields are extracted and stored in separate files
for bfind searches.
• This is an advantage for rapid database updates, but sometimes a
disadvantage for elaborate searching.
• To supplement bfind, the full text search STAG is provided.
• blink – The LinkDB search. Once entries of interest are found, it can
be used to retrieve related entries in a given database or in all
databases in GenomeNet.
Example

• Let’s consider an example to show how each system can be used to
retrieve the SwissProt entry P04391, an ornithine
carbamoyltransferase protein in Escherichia coli.
• In Entrez, enter the accession number P04391 in the protein database
query form and view the entry and its associated links and neighbours.
Example - SRS
• In SRS, first select the SwissProt database, then enter P04391 in the
query form and, once the entry is displayed, search for links to other
related databases.
Example – LinkDB
• However, the fastest way of gathering the related information for this
entry is to search LinkDB.
• Simply entering swissprot:P04391 displays a list of all links to all
the related databases.
Thank You
