What Is Bioinformatics?
Bioinformatics is an interdisciplinary field of science that combines biology, computer science,
statistics, physics, chemistry, mathematics and engineering to develop methods and software tools
for understanding and interpreting biological data, particularly in genetics and genomics
What is DNA? Meaning, groups, facts
DNA (deoxyribonucleic acid) is the hereditary material in humans and almost all other organisms:
the biological instructions that make each species unique
• Information in DNA is stored as a code made up of four chemical bases: adenine (A), guanine
(G), cytosine (C), and thymine (T)
• DNA bases pair up with each other, A with T and C with G, to form units called base pairs
• Nucleotides are arranged in two long strands that form a spiral called a double helix
All double-stranded DNA follows Chargaff’s rule: “The total number of purines in a DNA molecule is
equal to the total number of pyrimidines”
• Structure of double helix resembles a ladder, with the base pairs forming the ladder’s rungs and
the sugar and phosphate molecules forming the vertical sidepieces of the ladder
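• Each strand of DNA in the double helix can function as a pattern for duplicating the sequence of bases
Purines (A and G) are bases with a double-ring structure; pyrimidines (C and T) have a single ring.
G-C pairs are held together by three hydrogen bonds, A-T pairs by two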
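The pairing rules and Chargaff's rule above can be checked with a short Python sketch. The example sequence and the function names are made up for illustration; only the A-T / C-G pairing comes from the notes.

```python
# Watson-Crick pairing rules from the notes: A with T, C with G.
PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}
PURINES = set("AG")      # double-ring bases
PYRIMIDINES = set("CT")  # single-ring bases


def complement_strand(strand):
    """Return the partner strand, read in the opposite direction."""
    return "".join(PAIR[base] for base in reversed(strand))


def purine_pyrimidine_counts(strand):
    """Count purines and pyrimidines over both strands of the duplex."""
    duplex = strand + complement_strand(strand)
    purines = sum(base in PURINES for base in duplex)
    pyrimidines = sum(base in PYRIMIDINES for base in duplex)
    return purines, pyrimidines


example = "ATGCGGATTA"  # hypothetical sequence, for illustration only
print(complement_strand(example))         # partner strand
print(purine_pyrimidine_counts(example))  # the two counts are equal
```

Because every purine on one strand faces a pyrimidine on the other, the two counts always match for a fully paired duplex, even when a single strand on its own is unbalanced.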
HEMOGLOBIN
A protein is a polymer of amino acids linked together by peptide bonds. Primary structure is the
sequence of amino acids in the chain
The backbone of the protein folds to form a regular repeating pattern called secondary structure
The protein folds upon itself where regions of secondary structure are interrupted by irregularly
folded loops and turns. It helps to visualize the helices as pink and the turns as white. This pattern
repeats for the entire length of the protein chain. The irregular folding of the whole protein into a
compact globular structure is called tertiary structure
Some proteins are actually a collection of smaller proteins called subunits. Hemoglobin is made
of four subunits. The arrangement of subunits in a protein is its quaternary structure. It helps to
visualize the subunits as different colors
AMINO ACIDS
Amino acids are linked together as a chain, and the true identity of a protein is derived not
only from its composition but also from the precise order of its constituent amino acids
• The first amino-acid sequence of a protein, insulin, was determined in 1951.
• The actual recipe for human insulin, from which all its biological properties derive, is the following
chain of 110 residues (the precursor form, preproinsulin)
insulin=MALWMRLLPLLALLALWGPDPAAAFVNQHLCGSHLVEALYLVCGERGFFYTP
KTRREAEDLQVGQVELGGGPGAGSLQPLALEGSLQKRGIVEQCCTSICSLYQLENYCN
The NH2 and COOH groups of atoms are used to form peptide bonds between successive residues in
the sequence
A protein molecule is made when a free NH2 group links chemically with a COOH group, forming
the peptide bond CO-NH
As a result of this chaining process, the protein molecule is left with an unused NH2 at one end
and an unused COOH at the other end, known as the N-terminus and C-terminus of the
protein chain
Books, databases, and other references define the sequence of a protein or of a protein fragment as
the succession of its constituent amino acids, listed in order from the N-terminus to the C-terminus.
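As a rough illustration of these conventions, the sketch below treats the insulin chain from the notes as a Python string written from N-terminus to C-terminus. The function name and the dictionary keys are hypothetical; the sequence is copied from the notes.

```python
from collections import Counter

# Sequence exactly as given in the notes, written N-terminus to C-terminus.
INSULIN = (
    "MALWMRLLPLLALLALWGPDPAAAFVNQHLCGSHLVEALYLVCGERGFFYTP"
    "KTRREAEDLQVGQVELGGGPGAGSLQPLALEGSLQKRGIVEQCCTSICSLYQLENYCN"
)


def describe_chain(sequence):
    """Summarize a protein chain given in N-to-C order."""
    return {
        "residues": len(sequence),
        # n residues in a linear chain are joined by n - 1 peptide bonds
        "peptide_bonds": len(sequence) - 1,
        "n_terminal_residue": sequence[0],   # carries the unused NH2
        "c_terminal_residue": sequence[-1],  # carries the unused COOH
        "composition": Counter(sequence),
    }


info = describe_chain(INSULIN)
print(info["residues"], info["peptide_bonds"])
print(info["n_terminal_residue"], info["c_terminal_residue"])

# Same composition, different order: no longer the same protein.
reversed_chain = INSULIN[::-1]
print(Counter(reversed_chain) == info["composition"])  # composition unchanged
print(reversed_chain == INSULIN)                       # order differs
```

Reversing the chain keeps every residue count identical, which is why composition alone cannot identify a protein.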
Databases
Primary Database
• Original submissions by the experimentalists who carried out the research
• Content controlled by the submitters
• Example: GenBank, SNP, GEO...
Secondary Database
• Built from primary data retrieved from the primary databases
• Content controlled by a third party (e.g. NCBI)
•Example: RefSeq, RefSNP, NCBI, Structure, Protein
Shortcuts :
When two sequences are descended from a common evolutionary origin, they are said to have a
homologous relationship or to share homology. Sequence similarity is the percentage of
aligned residues that are similar in physicochemical properties such as size, charge, and
hydrophobicity.
Sequence similarity and sequence identity are synonymous for nucleotide sequences. For protein
sequences, however, the two concepts are very different. In a protein sequence alignment,
sequence identity refers to the percentage of matches of the same amino acid residues between
two aligned sequences. Similarity refers to the percentage of aligned residues that have similar
physicochemical characteristics and can be more readily substituted for each other
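The difference between identity and similarity can be sketched in Python. The residue groups below are an assumed, simplified grouping chosen for illustration (real tools use substitution matrices), and dividing by the alignment length is only one of several conventions in use.

```python
# Hypothetical grouping of amino acids by physicochemical character.
# This is an illustrative assumption, not a standard scoring scheme.
SIMILAR_GROUPS = [
    set("AVLIM"),  # aliphatic / hydrophobic
    set("FWY"),    # aromatic
    set("ST"),     # hydroxyl-containing
    set("NQ"),     # amide
    set("DE"),     # negatively charged
    set("KRH"),    # positively charged
]


def are_similar(a, b):
    """Identical residues, or residues from the same group, count as similar."""
    return a == b or any(a in group and b in group for group in SIMILAR_GROUPS)


def identity_and_similarity(aligned_a, aligned_b):
    """Return (percent identity, percent similarity) over all alignment columns."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    identical = similar = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":  # gap columns count as neither
            continue
        if a == b:
            identical += 1
        if are_similar(a, b):
            similar += 1
    columns = len(aligned_a)
    return 100 * identical / columns, 100 * similar / columns


# Made-up alignment: 2 identical columns, 4 conservative swaps, 1 mismatch, 1 gap.
print(identity_and_similarity("MKVLDSG-", "MRILETPA"))
```

For a nucleotide alignment the grouping step has no counterpart, so the two percentages coincide, which matches the statement that the terms are synonymous there.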
Task
ALGORITHMS
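The first application of dynamic programming in local alignment is the Smith–Waterman algorithm
• In the simplified scheme described here, positive scores are assigned for matching residues and
zeros for mismatches. More generally, mismatches and gaps may carry penalties, but any cell score
that would fall below zero is reset to zero.
• As a result, no negative scores appear in the scoring matrix.
• This approach may be suitable for aligning divergent sequences or sequences with multiple
domains that may be of different origins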
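A minimal sketch of Smith–Waterman local alignment in Python follows. The scoring values (match +3, mismatch -3, linear gap -2) and the example sequences are illustrative assumptions, not taken from the notes. The defining step is that every cell is floored at zero, so an alignment can start and end anywhere.

```python
def smith_waterman(a, b, match=3, mismatch=-3, gap=-2):
    """Return (best score, aligned a, aligned b) for the best local alignment."""
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]  # first row and column stay at zero
    best, best_pos = 0, (0, 0)

    # Fill the matrix; no cell is ever allowed to go below zero.
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0, H[i - 1][j - 1] + s, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the highest-scoring cell until a zero is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        s = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + s:    # diagonal: residues aligned
            out_a.append(a[i - 1]); out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:    # up: gap in b
            out_a.append(a[i - 1]); out_b.append("-")
            i -= 1
        else:                                 # left: gap in a
            out_a.append("-"); out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


score, top, bottom = smith_waterman("TGTTACGG", "GGTTGACTA")
print(score)
print(top)
print(bottom)
```

This version reports a single best alignment and uses a linear gap penalty; programs such as those described below typically add substitution matrices, affine gap penalties, and multiple non-overlapping alignments.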
The most commonly used pairwise alignment web servers apply the local alignment strategy; they
include SIM, SSEARCH, and LALIGN. SIM (http://bioinformatics.iastate.edu/aat/align/align.html) is
a web-based program for pairwise alignment using the Smith–Waterman algorithm that finds the
best-scoring non-overlapping local alignments between two sequences.
•It is able to handle tens of kilobases of genomic sequence.
•The user has the option to set a scoring matrix and gap penalty scores.
•A specified number of best scored alignments are produced.
• rRNA (ribosomal RNA): any one of a number of specific RNA molecules that form part of
the structure of a ribosome and participate in the synthesis of proteins
• tRNA (transfer RNA): a set of small RNA molecules used in protein synthesis as an interface
(adaptor) between messenger RNA and amino acids.
DNA encodes hereditary information (genotype) -> decoded into RNA -> protein
(phenotype)
TRANSLATION
Conversion of the mRNA sequence into the amino acid sequence that makes a protein
• The mRNA leaves the nucleus and enters the cytoplasm
• Ribosomes attach to the mRNA
• tRNA (carrying an anti-codon) picks up the correct amino acid and carries it to the mRNA
strand, where the protein is formed
Ex:
– tRNA carries GAU (anti-codon) and looks for CUA on mRNA
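The codon/anti-codon matching and the reading of mRNA in triplets can be sketched in Python. The codon table is the standard genetic code written compactly; the function names and the example mRNA are hypothetical, and starting at the first AUG is a simplifying assumption.

```python
from itertools import product

# Standard genetic code, with codons enumerated in U, C, A, G order
# (UUU, UUC, UUA, UUG, UCU, ...); "*" marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

RNA_PAIR = {"A": "U", "U": "A", "C": "G", "G": "C"}


def codon_for_anticodon(anticodon):
    """Position-by-position complement, as in the GAU -> CUA example."""
    return "".join(RNA_PAIR[base] for base in anticodon)


def translate(mrna):
    """Translate from the first AUG until a stop codon or the end of the mRNA."""
    start = mrna.find("AUG")
    if start == -1:
        return ""
    protein = []
    for k in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[k:k + 3]]
        if amino_acid == "*":  # stop codon: release the chain
            break
        protein.append(amino_acid)
    return "".join(protein)


print(codon_for_anticodon("GAU"))    # the codon this tRNA looks for
print(CODON_TABLE["CUA"])            # the amino acid that codon specifies
print(translate("GGAUGGCCCUAUAA"))   # made-up mRNA: AUG GCC CUA UAA
```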
Transcription
• Transcription: the process that makes mRNA from DNA
1. DNA unzips into 2 separate strands
A. The hydrogen bonds between the base pairs are broken (by RNA polymerase during transcription;
DNA helicase does this job during replication)
2. Free-floating RNA nucleotides in the nucleus pair up with the bases of the unzipped DNA:
A. Cytosine (C) pairs with Guanine (G), and (G) with (C)
B. Uracil (U) in the RNA pairs with Adenine (A) in the DNA
C. Adenine (A) in the RNA pairs with Thymine (T) in the DNA
*** Remember: (T) is found only in DNA
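The pairing rules in step 2 amount to a simple lookup. In this hedged Python sketch the template strand is assumed to be written 3' to 5', so that the resulting mRNA reads 5' to 3'; the function name and the example strand are made up.

```python
# RNA base that pairs with each base on the DNA template strand.
DNA_TO_RNA = {"A": "U", "T": "A", "C": "G", "G": "C"}


def transcribe(template_strand):
    """Build the mRNA by pairing each template base with its RNA partner."""
    return "".join(DNA_TO_RNA[base] for base in template_strand.upper())


template = "TACCGGATT"  # hypothetical template strand, written 3' to 5'
mrna = transcribe(template)
print(mrna)
print("T" in mrna)  # thymine never appears in the RNA product
```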
TRANSCRIPTION VS TRANSLATION
RNA splicing
In molecular biology and genetics, splicing is a modification of the nascent pre-messenger
RNA (pre-mRNA) transcript in which introns are removed and exons are joined. For
nuclear-encoded genes, splicing takes place within the nucleus after or concurrently with transcription.
• Carried out by spliceosomes
• Spliceosomes
– Complex of proteins and several small nuclear ribonucleoproteins (snRNPs)
– Recognize splice sites (specific RNA sequences)
– Cleave out introns and splice together exons (coding regions)
In most eukaryotic genes, coding regions (exons) are
interrupted by noncoding regions (introns). During
transcription, the entire gene is copied into a pre-mRNA,
which includes exons and introns. During the process of
RNA splicing, introns are removed and exons joined to
form a contiguous coding sequence.
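As a rough sketch of the outcome of splicing (not of the spliceosome mechanism), the Python below removes introns whose positions are already known. The coordinates, the example transcript, and the function name are hypothetical; real splice sites are recognized from sequence signals rather than supplied by hand.

```python
def splice(pre_mrna, introns):
    """Remove introns given as 0-based, half-open (start, end) intervals."""
    exons = []
    position = 0
    for start, end in sorted(introns):
        if start < position or start >= end or end > len(pre_mrna):
            raise ValueError("introns must be valid, non-overlapping intervals")
        exons.append(pre_mrna[position:start])  # keep the exon before this intron
        position = end                          # skip over the intron
    exons.append(pre_mrna[position:])           # keep the final exon
    return "".join(exons)


# Made-up pre-mRNA: exon, intron, exon, intron, exon.
pre_mrna = "AUGGCC" + "GUAAGUCCCAG" + "CUA" + "GUCAG" + "UAA"
mature = splice(pre_mrna, [(6, 17), (20, 25)])
print(mature)  # exons joined into one contiguous coding sequence
```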
Function of RNA
Structural
• e.g., rRNA, which is a major structural component of ribosomes
BUT its role is not just structural, also:
Catalytic
• RNA in the ribosome has peptidyltransferase activity
• Enzymatic activity responsible for peptide bond formation between amino acids in the growing
peptide chain
• Also, many small RNAs are enzymes ("ribozymes")
Regulatory
Recently discovered important new roles for RNAs
In normal cells:
• in "defense" - esp. in plants
• in normal development e.g., siRNAs, miRNA
As tools:
• for gene therapy or to modify gene expression
• RNAi
• RNA aptamers