Entrez

Uploaded by

Uswa Rana

Databases:

• An organized set of data held in a computer, especially one that is accessible in various ways.

Why Databases
• The major goal in developing databases is to provide efficient and user-friendly access to the stored data.

[Diagram labels: Book Shelf, Book, Data, Databases (Software) for data storage, Retrieval System]

Retrieval Systems of NCBI:

Entrez
• Searches distinct health sciences databases.

SRS (Sequence Retrieval System)
• Information indexing and retrieval system designed for libraries with flat file format (EMBL nucleotide).

GetEntry
• DDBJ flat file search system by accession no.

ENTREZ:

• First distributed on CD-ROM by NCBI in 1991.

• Text-based search and retrieval system of NCBI for databases like PubMed, Protein Structures, Complete Genomes, Taxonomy, and many others.

• Its key feature is that it integrates information by cross-referencing between NCBI databases, based on preexisting and logical relationships between individual entries.

Continue…

• This is highly convenient: users do not have to visit multiple databases located in disparate places.

For Example:
• On a nucleotide sequence page, one may find cross-referencing links to the translated protein sequence, genome mapping data, the related PubMed literature, and protein structures if available.

Entrez basic retrieval links and tools:

BLAST:

• The Basic Local Alignment Search Tool (BLAST) compares primary biological sequence information, such as the amino-acid sequences of proteins or the nucleotides of DNA and/or RNA sequences.

VAST:

• The Vector Alignment Search Tool (VAST) is a computer algorithm developed at NCBI and used to identify similar three-dimensional protein structures.

Phylogeny tool:
• Generates a common tree for a set of taxa.
• How to retrieve data on phylogenetic relationships via Entrez at NCBI:
1) Search Google for NCBI Tree Viewer

Text-based database searching:

Boolean Search
• Provides a way of generating precise queries that produce well-defined sets of results. AND, OR, and NOT are the Boolean operators used.

• Broadening the search: if a search produces no useful entries, change or remove terms.

• Narrowing the search: if a search produces too many entries, change or add terms.

Text-based searching:

Boolean operators
• Used to perform complex queries in a database.
• A series of keywords is joined with the logical terms AND, OR, and NOT to indicate the relationships between the keywords used in a search.

AND  Results must contain both terms
OR   Results may contain either term or both
NOT  Excludes results containing the term that follows NOT
Example: promoters OR response elements NOT human AND mammals.
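
The Boolean operators can be pictured as set operations on record IDs. The sketch below is only an illustration: the toy index and its IDs are invented, the function name is hypothetical, and it assumes the operators are applied strictly left to right when no parentheses are given.

```python
# Toy index: each search term maps to the set of record IDs that mention it.
# The IDs are invented purely for illustration.
index = {
    "promoters": {1, 2, 3},
    "response elements": {3, 4},
    "human": {2, 4, 5},
    "mammals": {1, 2, 3, 4, 6},
}


def evaluate_left_to_right(first_term, steps):
    """Apply (operator, term) pairs in order, starting from first_term."""
    result = set(index[first_term])
    for operator, term in steps:
        matches = index[term]
        if operator == "AND":
            result &= matches   # keep records found in both sets
        elif operator == "OR":
            result |= matches   # keep records found in either set
        elif operator == "NOT":
            result -= matches   # drop records that contain the term
        else:
            raise ValueError(f"unknown operator: {operator}")
    return result


# promoters OR response elements NOT human AND mammals
hits = evaluate_left_to_right(
    "promoters",
    [("OR", "response elements"), ("NOT", "human"), ("AND", "mammals")],
)
print(sorted(hits))  # → [1, 3]
```

With this toy data, the OR step gives {1, 2, 3, 4}, the NOT step removes the human records to leave {1, 3}, and the AND step keeps both because they are also indexed under mammals.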
Continue…

Parentheses
• Used to force a particular order of evaluation, as in mathematical expressions.
• Enclosing individual concepts in parentheses changes the default priority.
• Items contained within parentheses are evaluated first.

Example:
• gene AND (acid OR base)
If multiple terms are entered without operators, they are automatically ANDed together.
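
A small sketch of why the parentheses matter, using invented records and hypothetical helper names. It assumes that an ungrouped query is evaluated left to right, and it mimics the automatic AND by joining bare terms.

```python
# Invented records: ID -> words indexed for that record.
records = {
    1: "gene acid",
    2: "gene base",
    3: "base",
    4: "gene",
}


def ids_with(term):
    """Return the IDs of records whose text contains the term as a word."""
    return {rid for rid, text in records.items() if term in text.split()}


gene, acid, base = ids_with("gene"), ids_with("acid"), ids_with("base")

# gene AND (acid OR base): the parenthesized part is evaluated first.
grouped = gene & (acid | base)

# gene AND acid OR base: without parentheses, assumed left to right.
left_to_right = (gene & acid) | base


def implicit_and(terms):
    """Join bare terms the way a search box ANDs them together."""
    return " AND ".join(terms)


print(sorted(grouped))        # → [1, 2]
print(sorted(left_to_right))  # → [1, 2, 3]
print(implicit_and(["gene", "acid"]))  # → gene AND acid
```

Record 3 mentions only "base", so it appears in the ungrouped result but not in the grouped one.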
Proximity searching:

• Finds only those terms that appear within a specified number of words of each other, in any order. The closer the terms are, the higher the document appears in the results list.

• Operators: NEAR, ADJ, SAME.

• To search with multiword terms or phrases, place quotes around the terms. A protein name, gene name, or gene symbol can also be used directly.
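
The idea behind a proximity operator can be sketched as a word-distance check. This is not how any particular database implements NEAR, ADJ, or SAME; the function name, the sample sentence, and the whitespace tokenization are all assumptions made for illustration.

```python
def within_distance(text, term_a, term_b, max_words):
    """True if term_a and term_b occur within max_words of each other, in any order."""
    words = text.lower().split()
    positions_a = [i for i, word in enumerate(words) if word == term_a.lower()]
    positions_b = [i for i, word in enumerate(words) if word == term_b.lower()]
    return any(abs(i - j) <= max_words for i in positions_a for j in positions_b)


sentence = "the insulin receptor gene is expressed in liver"

print(within_distance(sentence, "insulin", "gene", 3))   # → True  (2 words apart)
print(within_distance(sentence, "gene", "insulin", 3))   # → True  (order does not matter)
print(within_distance(sentence, "insulin", "liver", 3))  # → False (6 words apart)
```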
Continued…

• To search for authors, their names must be entered in a particular format: {Last name} {initials}
• No punctuation
• Only author fields will be searched in the database
• Searches can be further limited by adding [AUTHOR] to the query string.
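
The author format can be sketched as a small helper. The function name and the sample name are hypothetical; the code only applies the rules listed above (last name, then initials, no punctuation, optional [AUTHOR] tag).

```python
def author_query(last_name, given_names, tag=True):
    """Build '{Last name} {initials}' with no punctuation, optionally tagged [AUTHOR]."""
    # Strip periods and commas, then take the first letter of each given name.
    cleaned = given_names.replace(".", " ").replace(",", " ")
    initials = "".join(name[0].upper() for name in cleaned.split())
    query = f"{last_name.replace(',', '').strip()} {initials}"
    return f"{query}[AUTHOR]" if tag else query


print(author_query("Smith", "John Adam"))       # → Smith JA[AUTHOR]
print(author_query("Smith", "J. A.", tag=False))  # → Smith JA
```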
Continue…

Accession numbers or sequence identification numbers

• Can be searched, but specific formats are required (direct retrieval of the full sequence record), e.g.
CAA79696
NP_778203

• To find a match to an exact phrase, enclose it in quotation marks, e.g.
"contactin associated protein"
"duchenne muscular dystrophy"
Truncation:

Wildcard

• The character * appended to a search term makes a search less specific.
• It finds all terms that begin with a given string of text.
Example:
To look for all authors whose last names begin with Zav, search using Zav*.
• Only end-truncation is supported.
• Wildcards will only consider the first 150 matches to the string.
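
End-truncation can be pictured as prefix matching against an index of terms. In this sketch the term list and function name are made up, and taking the "first 150" in alphabetical order is an assumption; the slide does not say how the first matches are chosen.

```python
MAX_VARIATIONS = 150  # limit stated on the slide


def expand_wildcard(pattern, indexed_terms, limit=MAX_VARIATIONS):
    """Expand an end-truncated pattern such as 'Zav*' against a list of indexed terms."""
    if not pattern.endswith("*") or "*" in pattern[:-1]:
        raise ValueError("only end-truncation is supported, e.g. 'Zav*'")
    prefix = pattern[:-1].lower()
    # Assumption: candidates are considered in alphabetical order.
    matches = [term for term in sorted(indexed_terms) if term.lower().startswith(prefix)]
    return matches[:limit]


authors = ["Zavala", "Zhang", "Zavos", "Zavaleta"]
print(expand_wildcard("Zav*", authors))  # → ['Zavala', 'Zavaleta', 'Zavos']
```

If more than 150 indexed terms share the prefix, only the first 150 are used, so a very short prefix can silently miss records.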
Continue…

• Molecular weights can be searched in the following formats:

1) {weight}[Molecular Weight]
2) {weight minimum}:{weight maximum}[Molecular Weight]

• Other searches:
1) Accession numbers [ACCN]
2) Sequence length [SLEN]
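
The field formats above can be generated with two tiny helpers. The function names and the numeric values are hypothetical, and using the min:max range form with [SLEN] is an assumption carried over from the molecular weight format.

```python
def field_query(value, field):
    """Format a single-value fielded term, e.g. NP_778203[ACCN]."""
    return f"{value}[{field}]"


def range_query(minimum, maximum, field):
    """Format a {minimum}:{maximum}[field] range term."""
    if minimum > maximum:
        raise ValueError("minimum must not exceed maximum")
    return f"{minimum}:{maximum}[{field}]"


print(field_query("NP_778203", "ACCN"))                 # → NP_778203[ACCN]
print(range_query(10000, 20000, "Molecular Weight"))    # → 10000:20000[Molecular Weight]
print(range_query(500, 1500, "SLEN"))                   # → 500:1500[SLEN]
```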

Practical Example
Text-based Database Searching:

How to?
1) Basic
2) Advanced Method 1 (do a separate search for each term or phrase and combine searches using History)
3) Advanced Method 2 (stack your query one step at a time (iterative searching) using Preview/Index)
4) Complex Boolean Query (used often)

https://www.ncbi.nlm.nih.gov/Class/MLACourse/Modules/Entrez/index.html
Basic:

• I need to retrieve human nucleotide sequences associated with colon cancer.
• Just enter search terms without specifying search fields, other limits, or Boolean operators.

[Screenshot label: All databases]
Advanced Method 1:

• Do a separate search for each term or phrase and combine searches using History.

• Limits and History.
Advanced Method 2:

• Stack the query one step at a time (iterative searching) using Preview/Index.

Field      Term
Title      Colon cancer
Organism   Humans
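
Stacking a query one step at a time can be mimicked by building the query string incrementally. This is a sketch of the idea only: the helper name is hypothetical, and quoting multiword terms and wrapping the earlier steps in parentheses are choices made for this example rather than what Preview/Index itself outputs.

```python
def add_step(query, term, field):
    """AND one more fielded term onto the query built so far."""
    # Quote multiword terms so they are searched as a phrase.
    step = f'"{term}"[{field}]' if " " in term else f"{term}[{field}]"
    return step if not query else f"({query}) AND {step}"


query = ""
query = add_step(query, "colon cancer", "Title")
print(query)  # → "colon cancer"[Title]
query = add_step(query, "Humans", "Organism")
print(query)  # → ("colon cancer"[Title]) AND Humans[Organism]
```

Checking the result count after each step shows which term narrows the search most.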
Complex Boolean query:

Boolean Operators

Developed in 31st Oct 2007

Search builder:
Shortcut method:

• This method restricts the search to specific subsets of records, such as those from a specific organism, molecule type, or source database, i.e. Facets/Filters/Limits.
Facets/Filters/Limits:

Non-redundancy
How can I download sequence records to a file on my computer?

• Click the Send to menu that appears at the upper right of document summaries or record views.
• Select the File radio button.
• Then choose the desired format from the pull-down list.
• Click the Create File button to save the records.
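
The same download can also be scripted. The slides cover only the web interface, so the following is an aside: it assumes NCBI's E-utilities EFetch endpoint and its db, id, rettype, and retmode parameters, and the function name and defaults are hypothetical. The sketch only builds the request URL; it does not contact NCBI.

```python
from urllib.parse import urlencode

# Assumed endpoint: NCBI E-utilities EFetch (not mentioned in the slides).
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_url(accessions, db="nuccore", rettype="fasta"):
    """Build an EFetch URL that returns the given records as plain text."""
    params = {
        "db": db,                    # source database, e.g. nuccore or protein
        "id": ",".join(accessions),  # one or more accession numbers
        "rettype": rettype,          # record format, e.g. fasta or gb
        "retmode": "text",
    }
    return f"{EFETCH_URL}?{urlencode(params)}"


url = efetch_url(["NP_778203"], db="protein")
print(url)
# Saving the records would then be a matter of fetching this URL,
# for example with urllib.request.urlopen(url), and writing the bytes to a file.
```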
Facets/Filters:

1) Organism

2) Molecule type (limits results to a particular molecule type)

3) Source database (limits results to a particular database)
Molecule types:

cRNA (anti-sense RNA)

• A short section of a gene or other DNA element that is used to hybridize to a cDNA.

Non-coding RNA (ncRNA)
• An RNA molecule that is not translated into a protein.
• The DNA sequence from which a functional non-coding RNA is transcribed is often called an RNA gene.
• Abundant and functionally important types of non-coding RNAs include transfer RNAs (tRNAs) and ribosomal RNAs (rRNAs), as well as small RNAs.
Source databases:

INSDC

• The International Nucleotide Sequence Database Collaboration (INSDC) is a long-standing foundational initiative that operates between DDBJ, EMBL-EBI and NCBI.
• It covers the spectrum of data from raw reads, through alignments and assemblies, to functional annotation, enriched with contextual information relating to samples and experimental configurations.
Continue…

Third Party Annotation (TPA)

• A TPA sequence is derived or assembled from primary sequence data currently found in the DDBJ/EMBL/GenBank International Nucleotide Sequence Database.

• It can be a genomic or mRNA sequence and can be assembled or derived from primary genomic and/or mRNA sequences.
How do I change the format, number, or sorting order of records displayed?
Formats:

Abstract Syntax Notation One (ASN.1)

• NCBI uses ASN.1 for the storage and retrieval of data such as nucleotide and protein sequences, structures, genomes, PubMed records, and more.

GenInfo Identifier (GI numbers)

• A GI number is a simple series of digits assigned consecutively to each sequence record processed by NCBI. Each time a sequence record is changed, it is assigned a new GI number.
Additional filters:
