
Introduction to Bioinformatics Online Course: IBT

Module 1: Introduction to databases and resources

(Session 3)

Part I: Sequence identifiers and the sequence processing pipeline

Introduction to Bioinformatics online course: IBT

Bioinformatics Resources & Databases: Abeir Shalaby
Learning Outcomes
• Objective: Basic DNA sequence analysis

• ILOs: By the end of the session, the trainee will be able to:

• Define accession numbers and explain the significance of RefSeq identifiers
• Understand how to find a DNA sequence and save it in the correct format
• Identify features on the sequence and their applications, such as coding
regions, restriction enzyme sites, etc.
• Interpret sequence analysis results and understand the biological impact
of functional regions
• Find the six open reading frames and choose the correct one
• Design primers for amplification of a DNA sequence

How to identify genes using bioinformatics tools

Sequencing: Obtain the DNA or RNA sequence data for the gene of interest from the organism of
interest through wet-lab research, using sequencing technologies such as Sanger sequencing or
next-generation sequencing (NGS).

Preprocessing: Clean and process the sequence data to remove any errors or artefacts introduced
during sequencing, as well as any sequences that do not align with the reference genome or
transcriptome.

Gene identification and annotation: Annotate the genome or transcriptome to identify potential genes
and their features, such as coding regions, exons, introns, promoters, and regulatory elements.
This can be done using resources such as Ensembl, NCBI's RefSeq, or the UCSC Genome Browser.
A sequence record is called 'annotated' when biological information is added and linked to a
position in the sequence; the annotated information is represented in the sequence features table.
Gene prediction: Use computational methods to predict the location and structure of genes within
the genome or transcriptome. This can be done using tools such as Augustus, GeneMark, or Glimmer.

Validation: Validate the predicted genes computationally by comparing them to known genes in
related organisms, or experimentally by using techniques such as RNA sequencing or reverse
transcription polymerase chain reaction (RT-PCR) to confirm their expression.

Functional analysis: Once genes have been identified, further analysis can be done to determine
their functions, interactions, and pathways. This can be done using resources such as the Gene
Ontology (GO) and KEGG Pathway databases.
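Tools such as Augustus, GeneMark and Glimmer rely on trained statistical models, so the snippet below is not how they work. It is only a minimal sketch of the simplest form of computational gene prediction: a scan of all six reading frames (three per strand) for open reading frames (ORFs), assuming the standard genetic code (start ATG; stops TAA, TAG, TGA). The names `find_orfs` and `reverse_complement` and the `min_codons` threshold are illustrative choices, not part of any tool.

```python
"""Naive six-frame ORF scan: an illustration of gene prediction, not a real gene finder."""

STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code (assumed)
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq: str, min_codons: int = 30) -> list[tuple[str, int, int, int]]:
    """Return (strand, frame, start, end) for every ATG...stop ORF of at least min_codons codons.

    start/end are 0-based, end-exclusive coordinates on the forward strand,
    so a '+' strand ORF is seq[start:end]. The stop codon is included and
    counted towards min_codons. frame is 1, 2 or 3 on the strand being read.
    """
    seq = seq.upper()
    n = len(seq)
    orfs = []
    for strand, strand_seq in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            i = offset
            while i + 3 <= n:
                if strand_seq[i:i + 3] != "ATG":
                    i += 3
                    continue
                # Extend codon by codon until an in-frame stop codon is reached.
                j = i
                while j + 3 <= n and strand_seq[j:j + 3] not in STOP_CODONS:
                    j += 3
                if j + 3 > n:
                    break  # no stop codon downstream: no complete ORF left in this frame
                if (j + 3 - i) // 3 >= min_codons:
                    if strand == "+":
                        start, end = i, j + 3
                    else:  # map reverse-complement coordinates back to the forward strand
                        start, end = n - (j + 3), n - i
                    orfs.append((strand, offset + 1, start, end))
                i = j + 3  # resume after the stop; nested ATGs are not reported separately
    return orfs


if __name__ == "__main__":
    print(find_orfs("CCATGAAATAGCC", min_codons=3))  # [('+', 3, 2, 11)]
```

Choosing "the correct" frame in practice also means checking ORF length, similarity to known proteins and the existing annotation; the longest ORF is only a heuristic.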
Searching the database for your gene of interest

First, you have to determine which information you want:
- Nucleic acid (NA) sequences or protein sequences
- If NA, genomic sequences or RNA-derived sequences
- All possible sequences that exist, or only curated ones
- Retrieving the annotated sequence
- Finding and interpreting the annotated information represented in the sequence features table
(the different kinds of features, e.g., gene, mRNA, coding region, tRNA)

The characterization of genomic features using computational and experimental methods is called
gene identification or annotation. It aims to answer the following questions:
Which region codes for a protein?
Which DNA strand is used to encode the gene?
Where does the gene start and end?
Where are the exon-intron boundaries?
Where are the regulatory sequences for that gene?
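The answers to the strand, start/end and exon-intron questions are encoded in the location string of a feature (e.g., a CDS or mRNA) in the sequence features table, written in INSDC location syntax such as `join(12..78,134..202)` or `complement(...)`. A minimal sketch, assuming only simple ranges wrapped in at most `complement(join(...))`; `order(...)`, single-base positions, remote entries and per-part `complement` are deliberately not handled. The function name `parse_location` and the example locations are hypothetical.

```python
import re

RANGE = re.compile(r"<?(\d+)\.\.>?(\d+)")  # '<' / '>' mark partial ends and are ignored here


def parse_location(location: str) -> dict:
    """Parse a simple INSDC feature location into strand, span, exons and introns.

    Coordinates are 1-based and inclusive, as in GenBank flat files.
    """
    loc = location.replace(" ", "")
    strand = "+"
    if loc.startswith("complement(") and loc.endswith(")"):
        strand = "-"
        loc = loc[len("complement("):-1]
    if loc.startswith("join(") and loc.endswith(")"):
        loc = loc[len("join("):-1]

    exons = []
    for part in loc.split(","):
        match = RANGE.fullmatch(part)
        if match is None:
            raise ValueError(f"unsupported location part: {part!r}")
        exons.append((int(match.group(1)), int(match.group(2))))
    exons.sort()

    # Introns are the gaps between consecutive exons.
    introns = [(prev_end + 1, next_start - 1)
               for (_, prev_end), (next_start, _) in zip(exons, exons[1:])]
    # 'start'/'end' are the lowest/highest coordinates; on the '-' strand
    # the gene is read from 'end' back towards 'start'.
    return {
        "strand": strand,
        "start": exons[0][0],
        "end": exons[-1][1],
        "exons": exons,
        "introns": introns,
    }


if __name__ == "__main__":
    print(parse_location("complement(join(100..200,300..400))"))
```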
Data exchange between DDBJ, EMBL and GenBank occurs daily, so it is only necessary to submit the
sequence to one database, whichever one is most convenient, without regard for where the sequence
may be published.
[Figure: the three exchanging databases, DDBJ (National Institute of Genetics, NIG), EMBL-Bank and GenBank]

GenBank Numbers Used to Key (Uniquely Identify) Entries

• 1. SeqID:
Initially, the entry name in the LOCUS line was used as the only key to a GenBank entry.
This name attempted to reflect the organism and the function of the encoded gene.
Problem: it is impossible to do this systematically and uniquely as new knowledge accumulates,
so these entry names now change over time.
• 2. Accession numbers:
The accession number was then introduced as the primary key for referencing an entry in the
database. It always stays with the entry, even when the entry is updated.
a. GenBank accession number: one letter followed by 5 digits (e.g., X79797) or two letters followed by 6 digits (e.g., AF028831)
b. RefSeq accession: the newer type of entry (e.g., NC_001140)

The prefix letter reflects which of the three databases (GenBank, EMBL, DDBJ) is the primary database.
Because there are so many different IDs, accession numbers must be mapped to move between them,
for example by using an EBI tool to convert a GenBank accession number to an EBI accession number:
https://www.ebi.ac.uk/ebisearch/search?db=biotools&query=GenBank
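The two GenBank formats and the RefSeq style described above can be told apart by their shape alone. A minimal sketch, assuming only these three patterns (the INSDC also issues other formats, e.g. for WGS projects and for proteins, which this sketch reports as unrecognised); `classify_accession` is a hypothetical helper name, and a version suffix such as `.1` must be stripped before calling it.

```python
import re

# Shapes described in the text; other INSDC/RefSeq formats exist and are not covered.
ACCESSION_SHAPES = [
    ("GenBank/INSDC: 1 letter + 5 digits", re.compile(r"[A-Z]\d{5}")),
    ("GenBank/INSDC: 2 letters + 6 digits", re.compile(r"[A-Z]{2}\d{6}")),
    ("RefSeq: 2 letters + '_' + digits", re.compile(r"[A-Z]{2}_\d{6,}")),
]


def classify_accession(accession: str) -> str:
    """Return a label for the accession shape an unversioned identifier matches."""
    accession = accession.strip().upper()
    for label, pattern in ACCESSION_SHAPES:
        if pattern.fullmatch(accession):
            return label
    return "unrecognised"


if __name__ == "__main__":
    for acc in ("X79797", "AF028831", "NC_001140"):
        print(acc, "->", classify_accession(acc))
```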

GenBank Numbers Used to Key (Uniquely Identify) Entries
3. New SeqID number = Accession.Version, plus a unique GenInfo Identifier (GI, the "gi|" number)
• Accession number: identifies a sequence record; it does not change even when the sequence is changed.
• Version number: tracks changes to the sequence itself through an integer extension of the accession
number, called the "version". The initial version is ".1".
• Each version of a sequence gets a new unique NCBI identifier called a GI number, which follows "gi|".
Any change to the sequence data results in a new GI number.
So don't search by GI: a GI identifies only one version of a sequence, whereas the accession number stays with the record.

Entrez and BLAST results both present the following formatted text as part of the returned result:

gi|4557284|ref|NM_000646.1|AGLf| [4557284]
GI (GenInfo Identifier)   4557284
Accession number          NM_000646
Version                   NM_000646.1
LOCUS name                AGLf
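The pipe-delimited line above can be split mechanically into its parts. A minimal sketch, assuming the `gi|<GI>|<db>|<accession.version>|<locus>|` layout shown in the example; `parse_gi_defline` and `same_record` are hypothetical helper names. `same_record` illustrates why searching by accession is safer: two versions of one record share the accession, while each version would carry its own GI.

```python
def parse_gi_defline(defline: str) -> dict:
    """Split an NCBI 'gi|...|ref|ACCESSION.VERSION|LOCUS|' identifier into named fields."""
    # Keep only the identifier token, dropping a leading '>' and trailing text such as ' [4557284]'.
    token = defline.strip().lstrip(">").split()[0]
    fields = token.split("|")
    if len(fields) < 4 or fields[0] != "gi":
        raise ValueError(f"not a gi-style identifier: {defline!r}")
    accession_version = fields[3]
    accession, _, version = accession_version.partition(".")
    return {
        "gi": int(fields[1]),
        "database": fields[2],  # e.g. 'ref' for RefSeq
        "accession": accession,
        "version": accession_version,
        "version_number": int(version) if version else None,
        "locus": fields[4] if len(fields) > 4 else "",
    }


def same_record(a: str, b: str) -> bool:
    """True if two Accession.Version strings belong to the same record (versions may differ)."""
    return a.partition(".")[0] == b.partition(".")[0]


if __name__ == "__main__":
    print(parse_gi_defline("gi|4557284|ref|NM_000646.1|AGLf| [4557284]"))
```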
GeneCards Human Gene Database
• You can use the GeneCards database to get comprehensive information on human genes
(it covers human genes only): https://www.genecards.org/

RefSeq processing pipeline

Once sequence data are deposited in the public archival databases, they become available to the
automated RefSeq processing pipelines, which work in collaboration with authoritative scientists or
groups outside NCBI, together with curation by biological experts at NCBI.
The RefSeq processing pipelines:
1) Vertebrate curation pipeline: curation of RefSeq records, supported by an automated BLAST step.

2) Computational genome annotation pipeline: available RefSeq and INSDC data are aligned to an
assembled genome for gene prediction, to define the annotation models. This produces the new
MODEL RefSeq records.

3) Extraction from GenBank: all RefSeq bacterial and archaeal genomes, with the exception of RefSeq
Prokaryotic Reference Genomes, are annotated using NCBI's prokaryotic genome annotation pipeline;
records extracted from GenBank carry cross-references (db_xref) to the source GenBank record.
http://www.ncbi.nlm.nih.gov/RefSeq/
RefSeq Accession Numbers and Their Sources
• Model RefSeq: RNA and protein products that are generated by the eukaryotic genome annotation pipeline.
These records use accession prefixes XM_, XR_, and XP_.
• Known RefSeq: RNA and protein products that are mainly derived from GenBank cDNA and EST data and
are supported by the RefSeq eukaryotic curation group. These records use accession prefixes NM_, NR_, and NP_.

mRNAs and proteins
  NM_123456   Curated mRNA
  NP_123456   Curated protein
  NR_123456   Curated non-coding RNA
  XM_123456   Predicted mRNA
  XP_123456   Predicted protein
  XR_123456   Predicted non-coding RNA
Gene records
  NG_123456   Reference genomic sequence
Chromosomes
  NC_123455   Genomes, human chromosomes, microbial replicons, organelles
  AC_123455   Alternate assemblies
Assemblies
  NT_123456   Contig
  NW_123456   WGS supercontig
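Because the prefix alone tells you the molecule type and whether a record is curated or predicted, the table above can be turned into a lookup. A minimal sketch covering only the prefixes listed in that table (RefSeq uses further prefixes not shown here); `describe_refseq` is a hypothetical helper name.

```python
# Prefix -> (molecule/record type, category), transcribed from the table above.
REFSEQ_PREFIXES = {
    "NM": ("mRNA", "curated"),
    "NP": ("protein", "curated"),
    "NR": ("non-coding RNA", "curated"),
    "XM": ("mRNA", "predicted"),
    "XP": ("protein", "predicted"),
    "XR": ("non-coding RNA", "predicted"),
    "NG": ("reference genomic sequence", "gene record"),
    "NC": ("genome/chromosome/replicon/organelle", "chromosome"),
    "AC": ("alternate assembly", "chromosome"),
    "NT": ("contig", "assembly"),
    "NW": ("WGS supercontig", "assembly"),
}


def describe_refseq(accession: str) -> str:
    """Describe a RefSeq accession from its two-letter prefix; version suffixes are ignored."""
    prefix, sep, _ = accession.strip().upper().partition("_")
    if not sep or prefix not in REFSEQ_PREFIXES:
        return "not a RefSeq accession listed in this table"
    molecule, category = REFSEQ_PREFIXES[prefix]
    return f"{molecule} ({category})"


if __name__ == "__main__":
    for acc in ("NM_000646.1", "XP_123456", "NC_001140", "AF028831"):
        print(acc, "->", describe_refseq(acc))
```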
RefSeq Status Codes and Level of Curation
A combined approach uses both collaborator-supplied sequence information and automated
BLAST analysis to provide an initial RefSeq record.
The initial RefSeq record will have a status of INFERRED, PREDICTED, or PROVISIONAL and
may include enhanced feature annotation.
The status of a RefSeq record is presented in the COMMENT section in the header of the GenBank file.

• PROVISIONAL: submitted, but not yet reviewed.
• PREDICTED: not yet reviewed, and some aspect of the RefSeq record is predicted.
• INFERRED: not individually reviewed; predicted by genome sequence analysis, possibly supported
by homology, but not by experimental evidence.
• VALIDATED: additional manual curation, mainly by NCBI staff, e.g., checking for sequencing errors
and misassociation with a locus.
• REVIEWED: additional annotation, a summary description, and other functional information, as
available.
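Since the status is written in the COMMENT section, it can be pulled out of a downloaded GenBank file with a pattern match. A minimal sketch, assuming the status keyword appears immediately before the word `REFSEQ:` (for example `REVIEWED REFSEQ: ...`); the example comment text and the helper names `refseq_status` and `is_manually_curated` are illustrative assumptions, so check the pattern against real records before relying on it.

```python
import re

INITIAL_STATUSES = {"INFERRED", "PREDICTED", "PROVISIONAL"}  # initial RefSeq records
CURATED_STATUSES = {"VALIDATED", "REVIEWED"}                 # additional manual curation
STATUS_PATTERN = re.compile(
    r"\b(" + "|".join(sorted(INITIAL_STATUSES | CURATED_STATUSES)) + r")\s+REFSEQ:"
)


def refseq_status(comment: str):
    """Return the RefSeq status keyword found in a COMMENT block, or None if absent."""
    match = STATUS_PATTERN.search(comment)
    return match.group(1) if match else None


def is_manually_curated(status) -> bool:
    """True for statuses that involve additional manual curation (VALIDATED, REVIEWED)."""
    return status in CURATED_STATUSES


if __name__ == "__main__":
    # Hypothetical COMMENT text, modelled on the assumed 'STATUS REFSEQ:' layout.
    example = "PROVISIONAL REFSEQ: This record has not yet been subject to final NCBI review."
    status = refseq_status(example)
    print(status, is_manually_curated(status))
```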

• https://www.ncbi.nlm.nih.gov/books/NBK21105/#ch1.Appendix_GenBank_RefSeq_TPA_and_UniP

Retrieving RefSeq Records by Biological Data Type

The increasing size of the database and the diversity of available data sources have made it
convenient to split GenBank into smaller, discrete divisions:

PRI  Primate
ROD  Rodent
MAM  Mammalian
VRT  Vertebrate
INV  Invertebrate
PLN  Plant, Fungal
BCT  Bacterial
RNA  Structural RNA
VRL  Viral
PHG  Bacteriophage
SYN  Synthetic
EST  Expressed Sequence Tags
PAT  Patent
STS  Sequence Tagged Sites
GSS  Genome Survey Sequences
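Each GenBank record states its division as a three-letter code in the LOCUS line of the flat file, just before the date. A minimal sketch, assuming the usual whitespace-separated LOCUS layout in which the division is the second-to-last token; the LOCUS line in the example is made up for illustration and `division_of` is a hypothetical helper name.

```python
# Division codes from the list above. GenBank flat files use PLN for the plant/fungal division.
DIVISIONS = {
    "PRI": "Primate", "ROD": "Rodent", "MAM": "Mammalian", "VRT": "Vertebrate",
    "INV": "Invertebrate", "PLN": "Plant, Fungal", "BCT": "Bacterial",
    "RNA": "Structural RNA", "VRL": "Viral", "PHG": "Bacteriophage",
    "SYN": "Synthetic", "EST": "Expressed Sequence Tags", "PAT": "Patent",
    "STS": "Sequence Tagged Sites", "GSS": "Genome Survey Sequences",
}


def division_of(locus_line: str) -> tuple[str, str]:
    """Return (code, description) for the division named in a GenBank LOCUS line."""
    tokens = locus_line.split()
    if len(tokens) < 3 or tokens[0] != "LOCUS":
        raise ValueError("not a LOCUS line")
    code = tokens[-2]  # assumed layout: ... <division> <DD-MON-YYYY>
    return code, DIVISIONS.get(code, "not in this list")


if __name__ == "__main__":
    # Hypothetical LOCUS line, for illustration only.
    print(division_of("LOCUS       EXAMPLE01   1200 bp    mRNA    linear   PRI 01-JAN-2000"))
```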
