Data Retrieval System: Text-Based Database Searching
There are three important data retrieval systems of particular relevance to molecular biologists:
Entrez (at NCBI), queried by GI (GenInfo Identifier) or accession number
Sequence Retrieval System, SRS (at EBI)
DBGET/LinkDB (at GenomeNet, Japan)
The advantage of these retrieval systems is that they not only return matches to a query, but also provide handy pointers to additional important information in related databases.
DATA RETRIEVAL SYSTEM
Text-based Database Searching
Submitted By: Dr. Shikha Thakur, Assistant Professor (Guest Faculty), TCSC, Mumbai, Maharashtra

• The amount of biologically relevant data accessible via the WWW is increasing at a very rapid rate.
• Scientists need easy and efficient ways of wading through the data and finding what is important for their research.
• Knowing how to access and search for information in the databases is essential. Depending on the type of data at hand, there are two basic ways of searching:
  • Using descriptive words to search text databases.
  • Using a nucleotide or protein sequence to search sequence databases.

Text-based Database Searching
• There are three important data retrieval systems of particular relevance to molecular biologists:
  • Entrez (at NCBI), queried by GI (GenInfo Identifier) or accession number
  • Sequence Retrieval System, SRS (at EBI)
  • DBGET/LinkDB (at GenomeNet, Japan)
• The advantage of these retrieval systems is that they not only return matches to a query, but also provide handy pointers to additional important information in related databases.
• The three systems differ in the databases they search and the links they provide to other information.
• In any of these systems, a query can be as simple as entering the accession number of a newly published sequence or as complex as searching multiple database fields for specific terms.
• Basic Search Concepts
  • Boolean Search – An advanced query using two or more terms combined with the Boolean operators AND, OR and NOT; the default operator is AND.
  • Broadening the Search – If a search returns no useful entries, change or remove terms.
  • Narrowing the Search – If a search returns too many entries, add terms or make the existing ones more specific.
  • Proximity Searching – To search with multiword terms or phrases, place quotes around the terms.
  • Wild Card – A wildcard character (*) prepended or appended to a search term makes the search less specific; e.g., to find all authors whose last name begins with Zav, search using Zav*.

Entrez
• Entrez is a molecular biology database and retrieval system developed by the National Center for Biotechnology Information (NCBI).
• It is an entry point for exploring distinct but integrated databases.
• http://www.ncbi.nlm.nih.gov/Entrez/
• The Entrez system provides access to:
  • Nucleotide sequence databases – GenBank, DDBJ and EMBL (EBI).
  • Protein sequence databases – Swiss-Prot, PIR, PRF, PDB, and translated protein sequences from DNA sequence databases.
  • Genome and chromosome mapping data.
  • Molecular modeling 3-D structure databases.
  • Literature database, PubMed – Provides easy access to MEDLINE and pre-MEDLINE articles.
  • Taxonomy database – Allows retrieval of DNA and protein sequences for any taxonomic group.
  • Specialized databases – OMIM, dbSNP, UniSTS, etc.
• The most valuable feature of Entrez is its exploitation of the concept of "neighbouring", which allows related entries in different databases to be linked to each other, whether or not they are cross-referenced directly.
• Neighbours and links are listed in order of similarity to the query.
• The similarity is based on pre-computed analysis of sequences, structures and the literature.
• Another particularly useful feature is Batch Entrez: the ability to retrieve large sets of data based on some criterion and download them to a local computer, so that the sequences can be worked on with the analytical tools available locally.

Entrez Features
1. Entrez Global Query – Search a subset of Entrez databases.
2. Batch Entrez – Upload a file of GI or accession numbers to retrieve sequences.
3. Making Links to Entrez – Linking to PubMed and GenBank.
4. E-Utilities – Entrez programming utilities.
5. LinkOut – External links to related resources.
6. Cubby – A stored-search feature for saving and updating searches; it also lets you customize your LinkOut display.

SRS
• The Sequence Retrieval System (SRS) is a network browser for databases in molecular biology.
• It is a powerful sequence information indexing, search and retrieval system (http://srs.ebi.ac.uk/).
• SRS is a homogeneous interface to over 80 biological databases, developed at the European Bioinformatics Institute (EBI) at Hinxton, UK.
• The types of databases included are sequence and sequence-related, metabolic pathways, transcription factors, application results (e.g., BLAST), protein 3D structure, genome, mapping, mutations, and locus-specific mutations.
• One can access and query their contents and navigate among them.
• The Web page listing all the databases links each one to a description page and includes the date of its last update. One can select one or more databases to search before entering the query.
• Over 30 versions of SRS are currently running on the WWW. Each includes a different subset of databases and associated analytical tools.
• SRS Features:
  • SRS databases are well indexed, which reduces the search time across the large number of potential databases.
  • SRS allows any flat-file database to be indexed to any other. The advantage is that the derived indices can be searched rapidly, allowing users to retrieve, link and access entries from all the interconnected resources.
  • The system has the particular strength that it can be readily customized to use any defined set of databanks.
• Simple SRS queries:
  • By accession number, e.g. J00231
  • By author or organism name, e.g. Ausubel, Rhizobium
  • Boolean relations between keywords: and, or, but not
  • Searching by dates: 01-Jan-1995:31-Dec-1995
  • Searching by size: 400:600
  • Using hypertext links in an entry: Medline, Swiss-Prot and PDB entries can be linked from within the EMBL database.
  • Display of molecules via the RasMol plug-in

DBGET
• DBGET/LinkDB is an integrated bioinformatics database retrieval system at GenomeNet, developed by the Institute for Chemical Research, Kyoto University, and the Human Genome Center of the University of Tokyo.
• DBGET is used to search and extract entries from a wide range of molecular biology databases.
• LinkDB is used to compute links between entries in different databases.
• It is designed to be a network-distributed database system with an open architecture, which is suitable for incorporating local databases or establishing a server environment.
• http://www.genome.ad.jp/dbget/
• DBGET/LinkDB is integrated with other search tools, such as BLAST, FASTA and MOTIF, so that further retrievals can be conducted instantly.
• DBGET provides access to about 20 databases, which are queried one at a time. After one of these databases is queried, DBGET presents links to associated information in addition to the list of results.
• A unique feature of DBGET is its connection with the Kyoto Encyclopedia of Genes and Genomes (KEGG) database, a database of metabolic and regulatory pathways.
• DBGET has three basic commands (or three basic modes in the Web version) to search and extract database entries:
  • bfind – Searches for entries by keywords.
  • bget – Retrieves the database entries specified by the combination dbname:identifier.
  • blink – The LinkDB search. Once entries of interest are found, it can be used to retrieve related entries in a given database or in all databases in GenomeNet.
• A notable feature of DBGET, different from other text search systems, is that no keyword indexing is performed when a database is installed or updated.
• Instead, selected fields are extracted and stored in separate files for bfind searches.
• This is an advantage for rapid database updates, but sometimes a disadvantage for elaborate searching.
• To supplement bfind, the full-text search STAG is provided.

Example
• Let’s consider an example to show how each system can be used to retrieve the SwissProt entry P04391, an ornithine carbamoyltransferase protein in Escherichia coli.
• In Entrez, enter the accession number P04391 in the protein database query form and view the entry and its associated links and neighbours.

Example – SRS
• In SRS, first select the SwissProt database, then enter P04391 in the query form and, once the entry is displayed, search for links to other related databases.

Example – LinkDB
• However, the fastest way of gathering the related information for this entry is to search LinkDB.
• By simply entering swissprot:P04391, a list of all links to all the related databases is displayed.

Thank You
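
The Entrez step of the example can also be scripted through the E-utilities listed under Entrez Features. The following is only a minimal sketch, not part of the original slides: it splits a DBGET-style dbname:identifier string and builds, without sending, an efetch request URL for the accession. The helper names split_dbget_id and efetch_url are hypothetical, and the choice of db=protein with FASTA output is an assumption about how one might want the entry returned.

```python
from urllib.parse import urlencode

# Base URL of NCBI's E-utilities efetch endpoint.
EFETCH_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def split_dbget_id(entry):
    """Split a DBGET-style 'dbname:identifier' string into its two parts.

    Hypothetical helper for illustration; it is not a DBGET command.
    """
    dbname, sep, identifier = entry.partition(":")
    if not sep or not dbname or not identifier:
        raise ValueError(f"expected 'dbname:identifier', got {entry!r}")
    return dbname, identifier


def efetch_url(accession, db="protein", rettype="fasta"):
    """Build (but do not send) an efetch request URL for one accession.

    Assumption: the accession resolves in the chosen Entrez database.
    """
    params = {"db": db, "id": accession, "rettype": rettype, "retmode": "text"}
    return f"{EFETCH_BASE}?{urlencode(params)}"


# The identifier used in the worked example above.
dbname, accession = split_dbget_id("swissprot:P04391")
url = efetch_url(accession)
print(dbname, accession)
print(url)
```

Opening the printed URL in a browser, or fetching it with any HTTP client, would be the scripted counterpart of typing P04391 into the Entrez protein query form. The links and neighbours shown on the Web page are not part of this response and would need a separate request.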