Week 2

This document provides an overview of protein structure and bioinformatics. It explains that proteins catalyze reactions in cells and regulate gene activity. Bioinformatics uses DNA sequence information to determine protein amino acid sequences and to find related proteins, from which their properties, structures, and functions can be deduced. The document outlines the four levels of protein structure: primary, secondary, tertiary, and quaternary. It also describes the differing properties of amino acids and how they are linked by peptide bonds to form polypeptide chains.

Bioinformatics

Protein Structure
Assoc. Prof. Dr. Gazi Erkan BOSTANCI

Slides are mainly based on ‘Understanding Bioinformatics’ by Marketa
Zvelebil and Jeremy O. Baum
• If there is one class of molecules which could be said to live life it
would be the proteins.

• They are responsible for catalyzing almost all the chemical reactions
in the cell (RNA has a more limited but important role, as we saw
earlier), they regulate all gene activity, and they provide much of the
cellular structure.

• There is speculation that life may have started with nucleic acid
chemistry only, but it is the extraordinary functional versatility of
proteins that has enabled life to reach its current complex state.
• Proteins can function as enzymes catalyzing a wide variety of
reactions necessary for life, and they can be important for the
structure of living systems, such as those proteins involved in the
cytoskeleton.
* The cytoskeleton is the skeleton of a cell and maintains the shape
of the cell.

• The size of a protein can vary from relatively small to quite large
macromolecules.
• The DNA sequence of a gene can be analyzed to give the amino acid
sequence of the protein product. In that aspect alone, the ready
availability of DNA sequences of genes and whole genomes from the
1980s onward revolutionized biology, as it opened up this vital
shortcut to determining the amino acid sequence of virtually any
protein.

• Bioinformatics uses this sequence information to find related proteins
and thus gather together knowledge that can help deduce the likely
properties of unknown proteins, plus their structures and functions.

• Knowing the relationship between a protein’s structure and its
function provides a greater understanding of how the protein works,
and thus often enables the researcher to propose experiments to
explore how modifying the structure will affect the function.
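
The step from a gene's DNA sequence to the amino acid sequence of its protein product can be sketched in a few lines of Python. This is a minimal illustration, not a full gene-finding pipeline: it assumes the standard genetic code, an uppercase or lowercase coding sequence containing only A, C, G, and T, and a reading frame that starts at the first base. The names `CODON_TABLE` and `translate` and the example sequence are made up for this sketch.

```python
# Standard genetic code, written compactly: codons are enumerated with the
# bases in the order T, C, A, G at each of the three codon positions.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Map each of the 64 codons to a one-letter amino acid code ("*" = stop).
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate a coding DNA sequence, read from its first base.

    Translation stops at the first stop codon. A trailing incomplete
    codon is ignored. Characters other than A, C, G, T raise KeyError.
    """
    dna = dna.upper()
    protein = []
    for pos in range(0, len(dna) - 2, 3):
        residue = CODON_TABLE[dna[pos:pos + 3]]
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)


# Made-up example sequence: ATG GCC AAA TGG TAA
print(translate("ATGGCCAAATGGTAA"))
```

The resulting one-letter string is the primary structure that sequence-based methods take as their input.
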
Primary and Secondary Structure
• A protein folds into a three-dimensional
structure, which is determined by its
protein sequence. The fold of the protein
consists of repeating structural units
called secondary structures, that will be
discussed in this section (see Flow
Diagram).

• The fold of the protein is very important for the way the protein
will function, and whether it will function correctly.
• Studying and understanding how proteins fold is therefore an
important area of bioinformatics, as is predicting the fold of a
protein from its sequence.
Protein structure can be considered on several different levels
• The analysis of protein structure by experimental techniques such as
X-ray crystallography and nuclear magnetic resonance (NMR) has
shown that proteins adopt distinct structural elements.

• In general there are four levels of protein structure to consider.

• The primary structure is the protein sequence, the types and order of
the amino acids in the protein chain.

• The secondary structure is the first level of protein folding, in which
parts of the chain fold to form generic structures that are found in all
proteins.

• The tertiary structure is formed by the further folding and packing
together of these elements to give the final three-dimensional
conformation unique to the protein.

• Many functional proteins are formed of more than one protein chain,
in which case the individual chains are called protein subunits. The
subunit composition and arrangement in such multisubunit proteins
is called the quaternary conformation.
• The structure adopted by a protein chain, and thus its function, is
determined entirely by its amino acid sequence, but the rules that
govern how a protein chain of a given sequence folds up are not yet
understood and it is impossible to predict the folded structure of a
protein de novo from its amino acid sequence alone.
  – There are several studies on this, including recent ones.

• Helping to solve this problem is one of the challenges facing
bioinformatics.
Amino acids are the building blocks of proteins
• Proteins are made up of 20 types of naturally occurring amino acids,
with a few other amino acids occurring infrequently.

• These 20 amino acids consist solely of the elements carbon (C),
nitrogen (N), oxygen (O), and hydrogen (H), with the exception of
cysteine and methionine, which also contain sulfur (S).

• The structure of an amino acid can be divided into a common main
chain part and a side chain that differs in chemical structure among
the different amino acids. The side chain is attached to the main
chain carbon atom known as the α-carbon (Cα).
• Diagram of an amino acid.
• (A) shows the chemical structure
of two amino acids, where R
represents the side chains, which
can be different as shown in (B).

• The amino acid consists of a central Cα atom with a main
chain N and C at either side of it. The C is bonded to an O
with a double bond.
The differing chemical and physical properties of amino acids are due to their side chains
• The functional properties of proteins are almost entirely due to the
side chains of the amino acids. Each type of amino acid has specific
chemical and physical properties that are conferred on it by the
structure and chemical properties of its side chain.

• The amino acids can, however, be classified into overlapping groups
that share some common physical and chemical properties, such as
size and electrical charge.
• The smallest amino acid is glycine, which has
only a hydrogen atom as its side chain. This
endows it with particular properties such as
great flexibility.

• The other extreme of side-chain flexibility is
represented by proline, an amino acid that
has a side chain bonded to the main-chain
nitrogen atom, resulting in a rigid structure.
• Some amino acids have uncharged side
chains and these are generally hydrophobic
(not liking water, therefore tend to be
buried within the protein surrounded by
other hydrophobic amino acids) while
others are positively or negatively charged.

• The charged or polar amino acids are
hydrophilic; they like to be surrounded by
water molecules with which they can form
interactions.
• As there are 20 distinct amino acids that occur in proteins, there can
be $20^n$ different polypeptide chains of length $n$.

• For example, a polypeptide chain 250 amino acids in length will be
one of more than $10^{325}$ alternative sequences.

• Clearly, the sequences that do occur are only a tiny fraction of those
possible. Often only a few sequence modifications are needed to
destabilize the three-dimensional conformation of a protein, and so it
is probable that the majority of these alternative sequences will not
adopt a stable conformation.
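
The size of sequence space quoted above is easy to check with exact integer arithmetic. The sketch below counts the possible chains of length 250 over a 20-letter alphabet and reports the order of magnitude; the function name `count_sequences` is chosen for this illustration only.

```python
import math


def count_sequences(length: int, alphabet_size: int = 20) -> int:
    """Number of distinct chains of the given length over the alphabet."""
    return alphabet_size ** length


n = 250
total = count_sequences(n)

# Python integers are arbitrary precision, so the count is exact.
print("decimal digits:", len(str(total)))
# The same order of magnitude from logarithms: n * log10(20).
print("log10 of the count:", n * math.log10(20))
print("more than 10^325:", total > 10 ** 325)
```

Because $\log_{10} 20 \approx 1.301$, each additional residue multiplies the number of possible sequences by 20 and adds about 1.3 to the exponent.
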
Amino acids are covalently linked together in the protein chain by peptide bonds
• The primary structure of a protein is the sequence of
amino acids in the linear protein chain, which consists of
covalently linked amino acids. This linear chain is often
called a polypeptide chain.
[Figure: Amino acid structure]

• The amino acids are linked by peptide bonds, which are
formed by a condensation reaction (the loss of a water
molecule) between the backbone carboxyl group of one
amino acid and the amino group of another.

• When linked together in this way, the individual amino
acids are conventionally called amino acid residues.
• Peptide bonds.
• (A) gives the chemical
formulae of the peptide bond
that is formed between amino
acids to make a polypeptide
chain.
• (B) illustrates the above in a
diagrammatic form.
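
The bookkeeping of the condensation reaction can be illustrated with elemental formulas: joining n amino acids into a linear chain forms n - 1 peptide bonds and releases n - 1 water molecules. The sketch below uses glycine (C2H5NO2) and alanine (C3H7NO2) as examples; the function name `condense` is invented for this illustration, and the code tracks atom counts only, not geometry or charge.

```python
from collections import Counter

# Elemental compositions of two free amino acids and of water.
GLYCINE = Counter({"C": 2, "H": 5, "N": 1, "O": 2})  # C2H5NO2
ALANINE = Counter({"C": 3, "H": 7, "N": 1, "O": 2})  # C3H7NO2
WATER = Counter({"H": 2, "O": 1})                    # H2O


def condense(*amino_acids: Counter) -> Counter:
    """Elemental composition of the linear peptide built from the inputs."""
    peptide = Counter()
    for amino_acid in amino_acids:
        peptide.update(amino_acid)  # add the atoms of each free amino acid
    # n amino acids form n - 1 peptide bonds; each bond releases one water.
    for _ in range(len(amino_acids) - 1):
        peptide.subtract(WATER)
    return peptide


dipeptide = condense(GLYCINE, ALANINE)
print(dict(dipeptide))
```

For the glycine and alanine pair this gives C5H10N2O3, which is the sum of the two free amino acids minus one H2O.
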
Implication for Bioinformatics
• In part, bioinformatics concerns itself with the analysis of protein
sequence to predict the secondary structure, the tertiary structure,
and the function of the protein, as well as its relationship to other
proteins.

• Different secondary structures tend to have subtle differences in
chemical environments, resulting in amino acid preferences.

• In addition, amino acid preferences are seen at particular locations in
proteins due to the functional role they play, for example as catalytic
residues or stabilizing the overall protein structure.
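
One common way to quantify such amino acid preferences is a propensity: how often a residue type occurs in a given structural state, relative to how often it occurs overall. The sketch below is a toy version under stated assumptions: the per-residue state string (with "H" for helix and "C" for everything else) and the example sequence are invented, and real propensity scales are derived from many experimentally determined structures rather than one short chain.

```python
from collections import Counter


def propensity(sequence: str, states: str, state: str = "H") -> dict:
    """Propensity of each residue type for one structural state.

    propensity = (fraction of the state's residues that are this type)
                 / (fraction of all residues that are this type)
    Values above 1 mean the residue type is enriched in that state.
    """
    if len(sequence) != len(states):
        raise ValueError("sequence and states must have the same length")
    overall = Counter(sequence)
    in_state = Counter(aa for aa, s in zip(sequence, states) if s == state)
    n_state = sum(in_state.values())
    if n_state == 0:
        raise ValueError(f"no residues are in state {state!r}")
    n_total = len(sequence)
    return {
        aa: (in_state[aa] / n_state) / (overall[aa] / n_total)
        for aa in overall
    }


# Invented toy data: one sequence and a matching per-residue state string.
toy_sequence = "AEALKGPGNG"
toy_states = "HHHHHCCCCC"
for aa, value in sorted(propensity(toy_sequence, toy_states).items()):
    print(aa, round(value, 2))
```
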
Evolution has aided sequence analysis
• Protein sequence similarity is a
powerful tool for characterizing
protein function and structure since
an enormous amount of information
is conserved throughout the
evolutionary process.
• Proteins that have a common
ancestor are referred to as being
homologous.
• Sequence alignment and database search techniques can identify
homologous proteins.

• Homologous proteins usually have a similar three-dimensional
structure with related active sites and binding domains. Therefore
homologous proteins will also often have related functions, although
this is not always the case.

• Most amino acids that change during evolution are found in regions
that are not structurally or functionally important, such as many of
the loop (or variable) regions.

• If the homologous protein is also functionally related then the amino
acids involved in function are often conserved during evolution,
which helps in identifying the function of a new protein.
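
To make the idea of sequence alignment concrete, the sketch below computes the score of the best global alignment of two sequences by dynamic programming, in the style of the Needleman-Wunsch algorithm. The scoring values (match +1, mismatch -1, gap -1) are assumptions chosen for readability; practical protein alignment uses substitution matrices and more elaborate gap penalties, and database search tools add heuristics on top of this idea.

```python
def global_alignment_score(a: str, b: str, match: int = 1,
                           mismatch: int = -1, gap: int = -1) -> int:
    """Score of the best global alignment of a and b (linear gap penalty)."""
    # prev[j] holds the best score for aligning a[:i-1] with b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # aligning a[:i] against an empty prefix of b
        for j in range(1, len(b) + 1):
            substitution = match if a[i - 1] == b[j - 1] else mismatch
            curr.append(max(
                prev[j - 1] + substitution,  # align a[i-1] with b[j-1]
                prev[j] + gap,               # gap in b
                curr[j - 1] + gap,           # gap in a
            ))
        prev = curr
    return prev[-1]


# Identical sequences score one match per position.
print(global_alignment_score("GATTACA", "GATTACA"))
# One residue missing from the second sequence costs a single gap.
print(global_alignment_score("ACGT", "AGT"))
```

Only two rows of the scoring table are kept at a time, which is enough for the score; recovering the alignment itself would require storing the full table or traceback pointers.
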
Visualization and computer manipulation of protein structures
• There are a number of programs available that read the coordinate
file and convert it to a visible three-dimensional representation of the
protein. The protein can be rotated, specific regions highlighted, and
some measurements can be calculated.
• Some of these programs are very powerful and can be of great use in
analyzing the structural properties and molecular function, as well as
allowing for the manual modification of the molecule.
• Some of the programs are free or low cost, such as Chimera, Yasara,
and DeepView. Others are extremely powerful programs that allow
the user to carry out computationally intensive modifications to the
molecule, but are expensive.
• Molecular representations.
• The different representations that can
be used to illustrate molecules, from
very simple ones that only use the Cα
or backbone atoms to space-filling
models of all atoms in the structure.
• There are many styles for viewing molecular structures, including
those with atomic-level detail such as space-filling models, ball and
stick models, and wireframe models (also called stick models or
skeletal models), as well as surface models.

• However, it is often desirable to have a simplified model of the
protein, such as backbone or Cα models and schematic (cartoon)
models. Such models can be displayed on a computer screen in
different styles and colors.

• Molecular models are usually based upon an atomic coordinate file,
which in general gives the (x,y,z) coordinates of each atom.
• ChimeraX demonstration: 2bbv.pdb (Black beetle virus, an RNA virus
of the black beetle, Heteronychus arator)
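
As a rough illustration of what a viewer reads from an atomic coordinate file, the sketch below pulls the (x,y,z) coordinates of the Cα atoms out of PDB-style ATOM records, which is all that a Cα model needs. It assumes the fixed-column PDB layout (atom name in columns 13 to 16, coordinates in columns 31 to 54). The helper `format_atom` and the three records are invented demo data with made-up coordinates; with a real file such as 2bbv.pdb the lines would be read from disk instead.

```python
import math


def format_atom(serial, name, res_name, chain, res_seq, x, y, z):
    """Build a minimal PDB-style ATOM record (hypothetical demo helper)."""
    return (f"ATOM  {serial:>5} {name:<4} {res_name:>3} {chain}{res_seq:>4}    "
            f"{x:8.3f}{y:8.3f}{z:8.3f}")


def parse_ca_atoms(lines):
    """Return (x, y, z) tuples for every C-alpha atom in ATOM records."""
    coords = []
    for line in lines:
        # Atom name occupies columns 13-16 (0-based slice 12:16).
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            coords.append((
                float(line[30:38]),  # x, columns 31-38
                float(line[38:46]),  # y, columns 39-46
                float(line[46:54]),  # z, columns 47-54
            ))
    return coords


# Invented records: coordinates are made up, not taken from any structure.
records = [
    format_atom(1, " N  ", "GLY", "A", 1, -1.2, 0.8, 0.0),
    format_atom(2, " CA ", "GLY", "A", 1, 0.0, 0.0, 0.0),
    format_atom(3, " CA ", "ALA", "A", 2, 3.8, 0.0, 0.0),
]

ca_atoms = parse_ca_atoms(records)
print(ca_atoms)
print("distance between consecutive C-alpha atoms:",
      math.dist(ca_atoms[0], ca_atoms[1]))
```

Connecting consecutive Cα positions with lines gives the simplest of the backbone representations described above.
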
