PC#1 Exercises Introduction To NCBI 2020 v2

The document provides a practical session guide for exploring the National Center for Biotechnology Information (NCBI) and its Entrez databases. It includes instructions for conducting searches, understanding database records, and utilizing Batch Entrez for retrieving sequences related to human cancer. Additionally, it covers the interpretation of search queries and the significance of unique identifiers in NCBI databases.

Uploaded by

marti.diez


Practical Session #1: Introduction to NCBI and Entrez databases.

I. Explore the National Center for Biotechnology Information (NCBI) website and get familiar
with its design, its environment and its major databases.

https://www.ncbi.nlm.nih.gov/

What is NCBI? How many databases are hosted by NCBI? Which of the following NCBI
databases could be considered primary databases? Protein, Nucleotide, CDD, PubMed,
Gene, Genome, RefSeq, BioProject.

II. The Entrez system

Perform a Global Query at NCBI through Entrez using the expression (all[Filter]). Which
database contains the largest number of records?

Now we are interested in finding all the information at NCBI related to the group of human
diseases known as cancer. Type the word “cancer” in the search box on the NCBI homepage
and run the search (Global Query). Note that the query is interpreted differently in different
databases. How many scientific papers contain this word? How many nucleotide sequences?
How many cancer-related functional genomics studies have been stored at NCBI? Why does
the Taxonomy database return only one record? (For discussion in class)

Perform a new Global Query, this time using the word “human”. How many entries (records) are
returned for each database? Would we get the same results from a
Global Query using the search expression (homo sapiens), (human[organism]) or (homo
sapiens[organism])? Why? How is the expression [organism] interpreted by each database?

If you are interested in studying human cancer, which of the following strategies would
produce the most useful set of results in a Global Query at NCBI?

cancer AND human

cancer[organism] AND human

cancer AND human[organism]

cancer OR human[organism]
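The four strategies can also be compared outside the browser. The following is a minimal sketch, assuming the E-utilities ESearch endpoint with rettype=count; the helper names urlencode and esearch_count_url are hypothetical, and the sketch queries one database at a time rather than running a true Global Query. It only prints the URLs and downloads nothing.

```shell
#!/usr/bin/env bash
# Base address of the NCBI E-utilities (same host as the efetch call used later in this session).
EUTILS="https://eutils.ncbi.nlm.nih.gov/entrez/eutils"

# Hypothetical helper: percent-encode only the characters that occur in these queries.
urlencode() {
  printf '%s' "$1" | sed -e 's/%/%25/g' -e 's/ /%20/g' -e 's/\[/%5B/g' -e 's/]/%5D/g'
}

# Hypothetical helper: build an ESearch URL that asks only for the number of hits.
# $1 = Entrez database name (e.g. pubmed, nuccore), $2 = query expression
esearch_count_url() {
  printf '%s/esearch.fcgi?db=%s&term=%s&rettype=count\n' "$EUTILS" "$1" "$(urlencode "$2")"
}

# One URL per strategy, here for the Nucleotide database (nuccore).
for term in 'cancer AND human' 'cancer[organism] AND human' \
            'cancer AND human[organism]' 'cancer OR human[organism]'; do
  esearch_count_url nuccore "$term"
done
```

Each printed URL can be opened in a browser or passed to a download tool such as curl or wget (assumed to be installed); the count appears in the returned XML. Changing the first argument (for example to pubmed) shows how the same expression behaves in another database.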

III. At NCBI each record is assigned a UID (“unique integer identifier”) for internal tracking. In
sequence databases, records are also identified by an accession number.

Which NCBI database does each of the following identifiers belong to?

CM000253.1

NG_011877.1

SRX4644664

NP_002266.2

CP027442.1

PRJNA490405

CAB37359.1

ADE87724.1
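The shape of an identifier already hints at where it comes from. The sketch below is a rough classifier, assuming the commonly documented prefix conventions (two letters plus underscore for RefSeq, PRJNA/PRJEB/PRJDB for BioProject, SRX-style prefixes for SRA, letter and digit counts for INSDC sequence records). The function names are hypothetical and the patterns are deliberately incomplete, so treat the output as a starting point and confirm each answer at NCBI.

```shell
#!/usr/bin/env bash
# Hypothetical helper: true if string $1 matches the extended regular expression $2.
matches() {
  printf '%s\n' "$1" | LC_ALL=C grep -Eq "$2"
}

# Hypothetical classifier: guesses the record family from the identifier format only.
classify_accession() {
  local acc="${1%%.*}"   # drop the ".version" suffix, if any
  if   matches "$acc" '^[ANWXY]P_[0-9]+$';        then echo "RefSeq protein"
  elif matches "$acc" '^[A-Z]{2}_[0-9]+$';        then echo "RefSeq nucleotide"
  elif matches "$acc" '^PRJ(NA|EB|DB)[0-9]+$';    then echo "BioProject"
  elif matches "$acc" '^[SED]R[XRSP][0-9]+$';     then echo "SRA"
  elif matches "$acc" '^[A-Z]{3}([0-9]{5}|[0-9]{7})$'; then echo "INSDC protein"
  elif matches "$acc" '^([A-Z][0-9]{5}|[A-Z]{2}([0-9]{6}|[0-9]{8}))$'; then echo "INSDC nucleotide"
  else echo "unknown"
  fi
}

# Classify the identifiers listed in this exercise.
for id in CM000253.1 NG_011877.1 SRX4644664 NP_002266.2 \
          CP027442.1 PRJNA490405 CAB37359.1 ADE87724.1; do
  printf '%s\t%s\n' "$id" "$(classify_accession "$id")"
done
```

The SRA test is placed before the protein test on purpose: an identifier made of three letters and seven digits would otherwise satisfy the protein pattern as well.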

IV. Open the NCBI entry with accession number NG_011877 and get familiar with the format
and the different fields used to store sequence information. This will open in the GenBank
Flat File Format.

What does this entry represent? Do you think this entry provides cross-references (links) to
other databases? From which organism was this sequence obtained? What is the UID or
identifier for this organism in the Taxonomy database? What does the underscore “_” in the
accession number stand for? Display the entry in FASTA format. What happened?
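The same record can be requested in either format from the command line. This is a minimal sketch, assuming the E-utilities EFetch endpoint with rettype set to gb (GenBank flat file) or fasta; the helper name efetch_url is hypothetical, and the script only prints the two URLs.

```shell
#!/usr/bin/env bash
# Base address of the NCBI E-utilities.
EUTILS="https://eutils.ncbi.nlm.nih.gov/entrez/eutils"

# Hypothetical helper: build an EFetch URL for one record as plain text.
# $1 = database, $2 = accession number, $3 = rettype (gb or fasta)
efetch_url() {
  printf '%s/efetch.fcgi?db=%s&id=%s&rettype=%s&retmode=text\n' "$EUTILS" "$1" "$2" "$3"
}

# GenBank flat file and FASTA views of the entry used in this exercise.
efetch_url nuccore NG_011877 gb
efetch_url nuccore NG_011877 fasta
```

Opening both URLs side by side (or downloading them with curl or wget, if available) makes it easy to see which fields of the flat file survive in the FASTA view.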

V. In the Taxonomy database explore all the information related to the organism Homo
sapiens.

Look at the lineage for this taxon. What order do humans belong to? What is the txid for this
mammalian order?

How many human protein sequences are there today at the NCBI?

VI. Advanced searches

With which of these strategies will you find all the human sequences stored in the nucleotide
database at the NCBI?

A. txid9606[Primary organism]

B. homo sapiens[Primary Organism]

C. homo sapiens[porgn]

D. human[porgn]

VII. Batch Entrez

Use Batch Entrez to upload a file of GIs or accession numbers from the Nucleotide or Protein
databases, or upload a list of record identifiers from other Entrez databases. Batch Entrez will
automatically retrieve the corresponding records.

In this exercise we will retrieve from the NCBI database all sequences related to Homo sapiens
tumor protein 53 (TP53) published in a paper with PubMed Central accession number PMC3675194.
This flat text file has a list of the accession numbers referenced in this paper.

1. Save the text file locally on your computer.

2. Open Batch Entrez.

https://www.ncbi.nlm.nih.gov/sites/batchentrez

3. Select the database from which the list of accessions will be queried.
4. Use the “Browse” button to select the file containing the list of identifiers from
your system directory.
5. After pressing the “Retrieve” button you will see a list of record summaries. Retrieve
them!
6. Optionally, select a format in which to display the data for viewing and/or saving.
Select “Send to file” to save the file.

How many records are on the list?

Which database do the entries belong to?

Do all entries represent human sequences?

Do all entries represent mRNA sequences?

Do all sequences belong to the same human subject?

Do all sequences have the same length?

If Batch Entrez does not work, you can obtain the same results with this link:
https://www.ncbi.nlm.nih.gov/nuccore/?term=KC820708:KC820786[pacc]
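The range in that link can also be expanded into an explicit list of accession numbers, for instance to rebuild an ID file for Batch Entrez. This is a minimal sketch, assuming the accessions are a fixed prefix followed by consecutive numbers without leading zeros; the function name expand_range is hypothetical, and nothing here checks that every accession in the range actually exists at NCBI.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print PREFIX+number for every number in an inclusive range.
# $1 = prefix (e.g. KC), $2 = first number, $3 = last number
expand_range() {
  seq "$2" "$3" | sed "s/^/$1/"
}

# Show the first three accessions of the range used in the link above.
expand_range KC 820708 820786 | head -n 3
```

Redirecting the full output to a file (expand_range KC 820708 820786 > my_IDs.txt, where my_IDs.txt is an arbitrary name) gives a one-accession-per-line list in the form Batch Entrez expects.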

You can download all these sequences as FASTA files using the E-utilities, under Linux / macOS /
Cygwin, etc.:

1) Download the PMC3675194-List_IDs.txt file.

2) Go to the download directory in a command-line shell (e.g. bash).
3) Execute the following two commands:
a. dos2unix PMC3675194-List_IDs.txt
b. cat PMC3675194-List_IDs.txt | xargs -tI% wget -O %.fasta "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nuccore&id=%&rettype=fasta&retmode=text"

Note that command “b” is a single line: it starts with “cat” and ends with the closing quotation
mark after “text”. The URL is quoted so that the shell does not interpret the “&” characters.
Command “a” is important; you should know why.
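A text file saved on Windows usually ends each line with a carriage return as well as a newline, and that extra character would otherwise travel with every ID into the output filename and the URL. The sketch below shows an alternative for systems without dos2unix, assuming the standard tr, grep and paste tools: it strips carriage returns and blank lines and joins the IDs with commas, so that a single EFetch request can ask for several records at once. The function name join_ids and the demo file are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: turn a one-ID-per-line file into a comma-separated list.
# $1 = path to the ID file (Unix or Windows line endings)
join_ids() {
  tr -d '\r' < "$1" | grep -v '^[[:space:]]*$' | paste -sd, -
}

# Demo with a small temporary file that uses Windows (CRLF) line endings.
demo=$(mktemp)
printf 'KC820708\r\nKC820709\r\n' > "$demo"

# Print (do not download) one EFetch URL that covers every ID in the file.
echo "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nuccore&id=$(join_ids "$demo")&rettype=fasta&retmode=text"
rm -f "$demo"
```

Compared with command “b”, this produces one multi-record FASTA download instead of one file per accession; which is preferable depends on what you plan to do with the sequences afterwards.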
