
UniGene

UniGene is an NCBI database that clusters EST sequences from dbEST and GenBank mRNA into gene-oriented clusters. Only ESTs with 3' ends are clustered to provide a more unique representation of transcripts. Contaminant sequences are removed before clustering the cleaned ESTs based on sequence overlaps. The final UniGene clusters represent unique genes and are annotated with gene and tissue information.

Uploaded by

Nandni Jha

UniGene

• UniGene is NCBI's EST cluster database.
• Each cluster is a set of overlapping EST
sequences.
• The database is constructed from combined
information in dbEST and the GenBank mRNA
database.
• Only ESTs with 3’ ends are clustered.
• The resulting 3’ EST sequences provide a more
unique representation of the transcripts.
• The next step is to remove contaminant
sequences, such as bacterial vector sequences.
• The cleaned ESTs are used to search against a
database of known unique genes with BLAST.
• The compiling step identifies sequence
overlaps and derives the final sequence.
• During this step, errors in individual ESTs are
corrected, then sequences are partitioned into
clusters and assembled into contigs.
• The final result is a set of nonredundant,
gene-oriented clusters known as UniGene clusters.
• Each UniGene cluster represents a unique gene
and is further annotated with its gene locus
information, as well as information about the
tissue type where the gene has been expressed.
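The overlap-based partitioning step described in the bullets above can be illustrated with a toy sketch. This is not NCBI's actual UniGene pipeline: it assumes exact suffix/prefix matches instead of alignment-based comparison, and the names `overlap_len`, `cluster_ests`, and the `min_len` threshold are hypothetical choices made only for illustration.

```python
from itertools import combinations


def overlap_len(a: str, b: str, min_len: int) -> int:
    """Longest suffix of `a` that exactly equals a prefix of `b`.

    Returns 0 when no overlap of at least `min_len` bases exists.
    """
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a[-k:] == b[:k]:
            return k
    return 0


def cluster_ests(ests: dict[str, str], min_len: int = 6) -> list[set[str]]:
    """Group EST names into clusters of (transitively) overlapping sequences.

    Toy assumption: two ESTs "overlap" if one contains the other or if they
    share an exact suffix/prefix match of at least `min_len` bases.
    """
    parent = {name: name for name in ests}

    def find(x: str) -> str:
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(ests, 2):
        sa, sb = ests[a], ests[b]
        if (sa in sb or sb in sa
                or overlap_len(sa, sb, min_len)
                or overlap_len(sb, sa, min_len)):
            parent[find(a)] = find(b)  # merge the two clusters

    clusters: dict[str, set[str]] = {}
    for name in ests:
        clusters.setdefault(find(name), set()).add(name)
    return list(clusters.values())


# Made-up example reads (not real ESTs).
example = {
    "est1": "ACGTACGTTAGC",
    "est2": "CGTTAGCGGATC",
    "est3": "TTTTTTTTGGGGCCCC",
}
clusters = cluster_ests(example, min_len=6)
```

Here `est1` and `est2` share a 7-base overlap and land in one cluster, while `est3` stays alone. A real pipeline would tolerate sequencing errors by using alignment scores rather than exact matches.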
GSS
• In the field of bioinformatics and computational
biology, genome survey sequences (GSSs) are
nucleotide sequences similar to ESTs.
• The only difference is that most of them are
genomic in origin rather than mRNA.
• Genome Survey Sequences are typically
generated and submitted to NCBI by labs
performing genome sequencing.
• They are used, amongst other things, as a
framework for the mapping and sequencing of
• Genome survey sequencing is a new way to
map genome sequences.
• Current genome sequencing approaches are
mostly high-throughput shotgun methods,
and GSS is often used in the first step of
sequencing.
• GSSs can provide an initial global view of a
genome, which includes both coding and
non-coding DNA and contains repetitive sections
of the genome.
UCSC
• The UCSC Genome Browser is an online
genome browser hosted by the University of
California, Santa Cruz.
• It is an interactive website offering access to
genome sequence data from a variety of
vertebrate and invertebrate species.
• The UCSC Genome Browser hosts genomes
from a variety of organisms. As of September
2009, this included 24 vertebrates, 14
mammals, 13 insects, 11 species of
• The UCSC Genome Browser is part of a
package of tools accessible from the UCSC
Genome Bioinformatics website.
• The UCSC Genome Browser provides users
with visualizations of genomic results such
as SNP association studies, linkage studies,
chromosomal positions of genes, evolutionary
relationships, and alignments.
• The site includes many tools, such as the Genome
Browser, BLAT, Gene Sorter, and Genome Graphs.
TIGR
• TIGR Gene Indices (www.tigr.org/tdb/tgi.shtml)
is an EST database that uses a different
clustering method from UniGene.
• It compiles data from dbEST, GenBank mRNA
and genomic DNA data, and TIGR’s own
sequence database.
• Sequences are clustered only if they are more
than 95% identical over a forty-nucleotide
region in pairwise comparisons.
• BLAST and FASTA are used to identify sequence
overlaps.
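The pairwise clustering criterion above can be sketched as a simple check. This is only an illustration under stated assumptions, not TIGR's implementation: it considers ungapped overlaps only (whereas the slides say BLAST and FASTA are used), it reads the rule as "identity above 95% across an overlap of at least 40 nucleotides", and the function names `identity` and `meets_tigr_like_threshold` are hypothetical.

```python
def identity(a: str, b: str) -> float:
    """Fraction of matching positions between two equal-length strings."""
    if not a:
        return 0.0
    matches = sum(x == y for x, y in zip(a, b))
    return matches / len(a)


def meets_tigr_like_threshold(
    s1: str,
    s2: str,
    min_overlap: int = 40,       # assumed reading of "forty-nucleotide region"
    min_identity: float = 0.95,  # assumed reading of "more than 95% identical"
) -> bool:
    """True if some ungapped overlap of s1 and s2 passes both thresholds."""
    # Slide s2 along s1; `offset` is where s2 starts relative to s1.
    for offset in range(-(len(s2) - min_overlap), len(s1) - min_overlap + 1):
        start1 = max(0, offset)
        start2 = max(0, -offset)
        length = min(len(s1) - start1, len(s2) - start2)
        if length < min_overlap:
            continue  # overlap too short to count
        seg1 = s1[start1:start1 + length]
        seg2 = s2[start2:start2 + length]
        if identity(seg1, seg2) > min_identity:
            return True
    return False


# Made-up sequences: seq_b starts 10 bases into seq_a and extends past its end.
seq_a = "ACGT" * 15
seq_b = seq_a[10:] + "TTTTTTTTTT"
same_cluster = meets_tigr_like_threshold(seq_a, seq_b)
```

A stricter threshold like this one tends to split sequences into more, smaller clusters than a looser overlap rule would.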
