Lec 3 Terms and Definitions in Bioinformatics

Terms and Definitions in Bioinformatics

Bioinformatics:

An absolute definition of bioinformatics has not been agreed upon.

The first level, however, can be defined as the design and application of
methods for the collection, organization, indexing, storage, and analysis
of biological sequences (both nucleic acids [DNA and RNA] and
proteins).

The next stage of bioinformatics is the derivation of knowledge
concerning the pathways, functions, and interactions of these genes
(functional genomics) and proteins (proteomics).

Bioinformatics is also referred to as computational biology.

Annotation (explanation): Text fields of information about a
biosequence which are added to a sequence database. Annotation (the
elucidation and description of biologically relevant features in the
sequence) consists of the description of the following items:

• Function(s) of the protein.
• Post-translational modification(s). For example, carbohydrates,
phosphorylation, acetylation, GPI-anchor, etc.
• Domains and sites. For example, calcium binding regions, ATP-binding
sites, zinc fingers, homeobox, kringle, etc.
• Secondary structure.
• Quaternary structure. For example, homodimer, heterotrimer, etc.
• Similarities to other proteins.
• Disease(s) associated with deficiencies in the protein.
• Sequence conflicts, variants, etc.
Assembly: The process of placing fragments of DNA that have been
sequenced into their correct position within the chromosome.
Base: One of five molecules which are assembled, along with a sugar
(deoxyribose in DNA, ribose in RNA) and a phosphate, to form
nucleotides.
Adenine (A), guanine (G), cytosine (C), and thymine (T) are found in
DNA while RNA is made from adenine (A), guanine (G), cytosine (C),
and uracil (U).
Base pair (bp): The complementary bases on opposite strands of DNA
which are held together by hydrogen bonding. The atomic structure of
these bases pre-selects the pairing of adenine with thymine (or with
uracil in RNA) and the pairing of guanine with cytosine.
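
The pairing rule for DNA can be sketched in a few lines of Python. This is a minimal illustration only: the names DNA_PAIRS, complement, and reverse_complement are chosen here for the example and do not come from any particular library.

```python
# Watson-Crick pairing for DNA: A pairs with T, G pairs with C.
DNA_PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(strand: str) -> str:
    """Return the base-by-base complement of a DNA strand."""
    return "".join(DNA_PAIRS[base] for base in strand.upper())


def reverse_complement(strand: str) -> str:
    """Return the opposite strand, written in its own 5'->3' direction."""
    return complement(strand)[::-1]


if __name__ == "__main__":
    print(complement("ATGC"))          # TACG
    print(reverse_complement("ATGC"))  # GCAT
```

Taking the reverse complement twice gives back the original strand, which is a quick sanity check for the pairing table.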

BLAST: Basic Local Alignment Search Tool.
A program for searching biosequence databases which was
developed and is maintained by a group at the National Center for
Biotechnology Information (NCBI). There are several versions of
BLAST:
BLASTP which searches a protein database,
BLASTN to search a nucleotide database,
TBLASTN which searches for a protein sequence in a nucleotide
database by translating nucleotide sequences in all 6 reading frames,
BLASTX which can search for a nucleotide sequence against a
protein database by translating the query via all 6 reading frames,
Gapped BLAST, and PSI-BLAST.
BLAST locates short patches of local similarity instead of calculating
the best overall alignment using gaps. The program uses a scoring
matrix to score each aligned pair of residues as positive, negative,
or zero. If the initial match scores highly, the search is extended in
both directions until the score falls off.
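
The "find a short match, then extend in both directions until the score falls off" idea can be sketched as follows. This is a simplified illustration under stated assumptions, not the actual BLAST implementation: it uses a flat match/mismatch score instead of a real scoring matrix, no gaps, and hypothetical names (extend, seed_and_extend, word, drop_off) and parameter values chosen for the example.

```python
MATCH, MISMATCH = 1, -1  # illustrative scores only, not BLAST defaults


def extend(query, subject, qi, si, step, drop_off):
    """Extend without gaps from (qi, si) in one direction.

    Returns (best_score, number_of_positions_kept).
    """
    score = best = best_len = n = 0
    while 0 <= qi < len(query) and 0 <= si < len(subject):
        score += MATCH if query[qi] == subject[si] else MISMATCH
        n += 1
        if score > best:
            best, best_len = score, n
        elif best - score >= drop_off:
            break  # the score has fallen too far below the best seen
        qi += step
        si += step
    return best, best_len


def seed_and_extend(query, subject, word=3, drop_off=2):
    """Return the best hit as (score, query_start, subject_start, length)."""
    best_hit = (0, 0, 0, 0)
    for qi in range(len(query) - word + 1):
        seed = query[qi:qi + word]
        si = subject.find(seed)
        while si != -1:
            r_score, r_len = extend(query, subject, qi + word, si + word, 1, drop_off)
            l_score, l_len = extend(query, subject, qi - 1, si - 1, -1, drop_off)
            hit = (word * MATCH + l_score + r_score,
                   qi - l_len, si - l_len, l_len + word + r_len)
            if hit[0] > best_hit[0]:
                best_hit = hit
            si = subject.find(seed, si + 1)
    return best_hit


if __name__ == "__main__":
    print(seed_and_extend("GATTACA", "CCGATTACAGG"))  # (7, 0, 2, 7)
```

The exact word size, scoring scheme, and drop-off threshold used by real BLAST programs differ from these toy values.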

BEAUTY (BLAST Enhanced Alignment Utility): A tool developed at
Baylor College of Medicine (Worley et al. 1995) which uses BLAST to
search several custom databases and incorporates sequence family
information, location of conserved domains, and information about any
annotated sites or domains directly into the BLAST query results.

BLITZ: EBI's ultra-fast protein database search which uses the
MPsearch algorithm.

cDNA (complementary DNA): An artificial piece of DNA that is
synthesized from an mRNA (messenger RNA) template and is created
using reverse transcriptase.
The single stranded form of cDNA is frequently used as a probe in the
preparation of a physical map of a genome. cDNA is preferred for
sequence analysis because the introns found in genomic DNA are removed
by splicing when the mature mRNA is produced, so they are absent from
the cDNA: DNA ----> mRNA ----> cDNA.
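
As a rough sketch of what reverse transcriptase produces, the function below derives the first-strand cDNA sequence from an mRNA sequence. The name first_strand_cdna and the example sequence are assumptions made for illustration; the code models only the base pairing, not the enzymatic reaction.

```python
# Pairing used when copying RNA into DNA: A-T, U-A, G-C, C-G.
RNA_TO_DNA = {"A": "T", "U": "A", "G": "C", "C": "G"}


def first_strand_cdna(mrna: str) -> str:
    """Return first-strand cDNA (5'->3') for an mRNA written 5'->3'.

    The cDNA is complementary and antiparallel to the mRNA, so the
    mRNA is read backwards while each base is complemented.
    """
    return "".join(RNA_TO_DNA[base] for base in reversed(mrna.upper()))


if __name__ == "__main__":
    print(first_strand_cdna("AUGGCC"))  # GGCCAT
```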

CLUSTAL W: A general purpose program for multiple alignments of
DNA and protein sequences developed by Thompson et al. in 1994.
Complementarity: The sequence-specific or shape-specific recognition
that occurs when two or more molecules bind together.

DNA forms double stranded helices because the complementary
orientation of the bases in each strand facilitates the formation of the
hydrogen bonds which hold the strands together.

Computational biology: See bioinformatics

Consensus sequence: A sequence of DNA having similar structure and
function in different organisms.
or
The most commonly occurring amino acid or nucleotide at each
position of an aligned series of proteins or polynucleotides.
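
The second definition (the most common residue at each aligned position) can be computed directly. The sketch below assumes the sequences are already aligned to equal length; the function name consensus and the sample sequences are invented for the example, and ties are resolved arbitrarily by whichever residue is seen first.

```python
from collections import Counter


def consensus(aligned):
    """Return the most common residue in each column of an alignment."""
    if len({len(seq) for seq in aligned}) != 1:
        raise ValueError("need at least one sequence, all of equal length")
    # zip(*aligned) walks the alignment column by column.
    return "".join(Counter(column).most_common(1)[0][0]
                   for column in zip(*aligned))


if __name__ == "__main__":
    print(consensus(["ACGT", "ACGA", "TCGT"]))  # ACGT
```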
Consensus map: The location of all consensus sequences in a series of
multiply aligned proteins or polynucleotides.
Conserved sequence: A sequence within DNA or protein that is
consistent across species or has remained unchanged within the species
over its evolutionary period.

EMBL: The European Molecular Biology Laboratory
(http://www.embl-heidelberg.de/) which is located in Heidelberg,
Germany.

EBI: The European Bioinformatics Institute (http://www.ebi.ac.uk/) is a
part of the EMBL.
EMBL Nucleotide Sequence Database: Europe's primary nucleotide
sequence resource. Main sources for DNA and RNA sequences are
direct submissions from individual researchers, genome sequencing
projects and patent applications. The database is produced in
collaboration with GenBank and the DNA Data Bank of Japan (DDBJ).
Each of the three groups collects a portion of the total sequence data
reported worldwide, and all new and updated database entries are
exchanged between the groups on a daily basis.

Entrez: A WWW-based database retrieval program created by the
National Center for Biotechnology Information (NCBI), a division of the
NIH.

EST (Expressed Sequence Tag): A partial sequence of a cDNA clone
that can be used to identify sites in a gene.

FASTA: An alignment program for protein and nucleotide sequences
created by Pearson and Lipman in 1988. The program is one of the many
heuristic algorithms proposed to speed up sequence comparison. The
basic idea is to add a fast prescreen step to locate the highly
matching segments between two sequences, and then extend these
matching segments to local alignments using more rigorous algorithms
such as Smith-Waterman.
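
A fast prescreen of this kind can be approximated by indexing short words (k-mers) of one sequence and counting which alignment offset (diagonal) collects the most shared words. The sketch below is an illustration of that idea only, not the FASTA program itself; kmer_index, best_diagonal, and the default k=2 are assumptions made for the example.

```python
from collections import Counter, defaultdict


def kmer_index(seq, k):
    """Map every k-mer in seq to the list of positions where it starts."""
    index = defaultdict(list)
    for i in range(len(seq) - k + 1):
        index[seq[i:i + k]].append(i)
    return index


def best_diagonal(query, target, k=2):
    """Return (diagonal, hits) for the offset with the most shared k-mers.

    A diagonal is target_position - query_position. Returns None when
    the two sequences share no k-mer.
    """
    index = kmer_index(target, k)
    diagonals = Counter()
    for qi in range(len(query) - k + 1):
        for ti in index.get(query[qi:qi + k], ()):
            diagonals[ti - qi] += 1
    return diagonals.most_common(1)[0] if diagonals else None


if __name__ == "__main__":
    print(best_diagonal("GATTACA", "CCGATTACAGG"))  # (2, 6)
```

Only the region around the best-scoring diagonal would then be passed to a slower, more rigorous alignment step.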

GenBank: The NIH genetic sequence database.
An annotated collection of all publicly available DNA sequences
which is located at http://www.ncbi.nlm.nih.gov/. There are
approximately 2,162,000,000 bases in 3,044,000 sequence records as of
December 1998.
GenBank is part of the International Nucleotide Sequence
Database Collaboration, which comprises the DNA Data Bank of
Japan (DDBJ), the European Molecular Biology Laboratory (EMBL),
and GenBank at NCBI. These three organizations exchange data on a
daily basis.

HUGO: The Human Genome Organization.

Multiple Alignment: A set of biosequences arranged in a table such that
each row of the table consists of one sequence padded by gaps. The
columns of the table highlight similarity (or residue conservation)
between positions of each biosequence. An Optimal Multiple Alignment
is one that has the highest degree of similarity, or the lowest cost.
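
To make the table picture concrete, the sketch below stores an alignment as equal-length rows padded with "-" and measures how conserved each column is. The function name column_conservation, the gap symbol, and the sample rows are assumptions for illustration; real tools use more refined conservation measures.

```python
from collections import Counter


def column_conservation(rows, gap="-"):
    """Return, per column, the fraction of rows sharing the top residue.

    Gap characters are ignored when picking the most common residue,
    but still count in the denominator.
    """
    if len({len(row) for row in rows}) != 1:
        raise ValueError("need at least one row, all of equal length")
    fractions = []
    for column in zip(*rows):
        residues = [c for c in column if c != gap]
        top = Counter(residues).most_common(1)[0][1] if residues else 0
        fractions.append(top / len(rows))
    return fractions


if __name__ == "__main__":
    alignment = ["AC-GT",
                 "ACGGT",
                 "A--GT"]
    for row in alignment:
        print(row)
    print(column_conservation(alignment))
```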

NCBI: The National Center for Biotechnology Information
(http://www.ncbi.nlm.nih.gov/), a division of the NIH, is the home of the
BLAST and Entrez servers.

NCGR: The National Center for Genome Resources
(http://www.ncgr.org/).

NHGRI: The National Human Genome Research Institute of the NIH
(http://www.nhgri.nih.gov/).

PDB (Protein Data Bank): An international repository for the results of
macromolecular studies using NMR, X-ray crystallography, or
homology methods. The results of structural studies of proteins, RNA,
DNA, viruses, and polysaccharides are presently available.
The term PDB also defines a standard file format for publishing protein
and nucleotide structures for use in computer programs.

Primer: A short sequence of nucleotides (usually eight) which serves
to prime the DNA polymerase process during DNA replication. Primers are
produced by the enzyme primase. Primers can also be customized to
'isolate' specific sections of DNA for replication using PCR.

Scoring Function (also cost function or weight function): The
methods used to evaluate the quality of the overlap between sequences.
A variety of scoring functions are used to evaluate single replacement
operations, multiple alignments (either whole or columns), and pairwise
alignments. The score of an alignment of two sequences (a and b) is the
sum of the score of all the replacement operations that lead from a to b.
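
The "sum over replacement operations" rule can be written out for a pairwise alignment as below. The match, mismatch, and gap values are placeholder assumptions chosen for the example (real scoring functions typically use a substitution matrix), and alignment_score is a name invented here.

```python
def alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Sum the score of every aligned column of two gapped sequences.

    a and b must already be aligned, i.e. have equal length with "-"
    marking gaps. The default scores are illustrative only.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    total = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            total += gap        # insertion or deletion
        elif x == y:
            total += match      # residue kept unchanged
        else:
            total += mismatch   # residue replaced by another
    return total


if __name__ == "__main__":
    # Columns: A/A +1, C/C +1, G/C -1, -/T -2, T/T +1
    print(alignment_score("ACG-T", "ACCTT"))  # 0
```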

Sequencing: Determining the order of nucleotides in a gene or the order
of amino acids in a protein.

Single nucleotide polymorphism (SNP): The most common type of
DNA sequence variation.
An SNP is a change in a single base pair at a particular position along
the DNA strand. When an SNP occurs, the gene's function may change,
as seen in the development of bacterial resistance to antibiotics or of
cancer in humans.
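
Single-base differences between two sequences of equal length can be listed with a simple comparison, as sketched below. The function name single_base_differences and the sequences are made up for the example; a difference found this way is only a candidate, since calling a true SNP also requires evidence that the variant occurs in a population.

```python
def single_base_differences(reference, sample):
    """List (position, reference_base, sample_base) for each mismatch.

    Positions are 1-based. Both sequences must have the same length
    and be aligned without gaps.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must have equal length")
    return [(i + 1, r, s)
            for i, (r, s) in enumerate(zip(reference, sample))
            if r != s]


if __name__ == "__main__":
    print(single_base_differences("ATGCCGTA", "ATGCTGTA"))  # [(5, 'C', 'T')]
```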
SWISS-PROT:
An annotated protein sequence database established in 1986 and
maintained collaboratively, since 1987, by the Department of Medical
Biochemistry of the University of Geneva and the EMBL Data Library
(now the EMBL Outstation - The European Bioinformatics Institute
(EBI)).
The SWISS-PROT protein sequence data bank consists of
sequence entries. Sequence entries are composed of different line-types,
each with their own format. For standardization purposes the format of
SWISS-PROT follows as closely as possible that of the EMBL
Nucleotide Sequence Database.
The Swiss Institute of Bioinformatics (SIB):
An academic institution established on March 30, 1998 as a non-
profit foundation.
The goals of this institute are to promote the development of
software tools and databases in the field of bioinformatics, to sustain a
high-quality research program in bioinformatics, to provide, in
collaboration with academic partners, a curriculum of courses and
seminars for the formation of research scientists in the field of
bioinformatics, and to offer services to the Swiss scientific user
community through the Swiss-EMBnet node (which is currently
maintained jointly by Swiss Institute for Cancer Research and the
University of Geneva).

TIGR: The Institute for Genomic Research, located in Rockville,
Maryland (http://www.tigr.org/).
